Hi, I’m calling IntermediateDecode once per second (I’m still testing). In the first few seconds decoding is fast: see the red lines in the first image, which are relatively short compared to the ones in the second image.
Here’s a video using the streaming feature that shows the decoding time increasing. (This is on a very old AMD CPU without AVX or AVX2, so it runs slower than real time.)
After just 10 seconds of streaming there is a noticeable difference in the decode time.
My understanding is that all the accumulated logits are used at every decode. Is this the cause of the slow decoding? @reuben, let me know if I’m wrong.
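To make the question concrete, here is a toy cost model of what I suspect is happening. It is not DeepSpeech code and calls no DeepSpeech API: the function name and the figure of 50 logit frames per second are my own assumptions, and it only counts how many frames each once-per-second decode would have to process in the two cases.

```python
# Toy cost model, NOT DeepSpeech code: it only counts how many logit
# frames each intermediate decode call would have to process.

FRAMES_PER_SECOND = 50  # assumed number of logit frames per second of audio


def frames_decoded_per_call(seconds, redecode_everything):
    """Return the frame count processed by each once-per-second decode call."""
    costs = []
    accumulated = 0
    for _ in range(seconds):
        accumulated += FRAMES_PER_SECOND  # one more second of logits arrives
        if redecode_everything:
            costs.append(accumulated)        # decode walks over all logits so far
        else:
            costs.append(FRAMES_PER_SECOND)  # decode only touches the new logits
    return costs


full = frames_decoded_per_call(10, redecode_everything=True)
incremental = frames_decoded_per_call(10, redecode_everything=False)

print("re-decode all:", full)
print("incremental:  ", incremental)
print("totals:", sum(full), "vs", sum(incremental))
```

If decoding really behaves like the first case, the work per call grows linearly with the length of the stream (and the total grows quadratically), which would be consistent with the longer red lines later in the stream.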