Conditions for recognizing 8 kHz files: what needs to be changed?


(TIgor) #1

I need to recognize speech in 8 kHz files up to 5 minutes long. With 16 kHz files of up to 10 seconds, everything works fine. How should I deal with a large file containing 40 or more words? Should I split it into 10-second pieces?
And for 8 kHz, do I need to change anything in order to train on 8 kHz data? Is there anything else I need to do?

(Lissyx) #2

Pre-trained models are 16 kHz only, so you need to re-train from scratch with 8 kHz data. Alternatively, you can try to upsample, but this will create artifacts and, in our experience, requires some more processing to avoid screwing up recognition.
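To make the upsampling option concrete, here is a minimal sketch that doubles the sample rate of 8 kHz audio by linear interpolation. It assumes the audio is already decoded into a list of integer PCM samples; the function name `upsample_2x_linear` is made up for illustration. Linear interpolation is a crude stand-in for a proper band-limited resampler and is exactly the kind of step that can introduce the artifacts mentioned above, so treat it as a starting point only.

```python
def upsample_2x_linear(samples):
    """Double the sample rate by inserting the midpoint between neighbours.

    `samples` is a list of integer PCM values (for example 16-bit mono).
    This is plain linear interpolation, not a band-limited resampler.
    """
    if not samples:
        return []
    out = []
    for current, following in zip(samples, samples[1:]):
        out.append(current)
        # Integer midpoint keeps the result within the original value range.
        out.append((current + following) // 2)
    # The last sample has no successor, so repeat it to get exactly 2x length.
    out.append(samples[-1])
    out.append(samples[-1])
    return out


if __name__ == "__main__":
    print(upsample_2x_linear([0, 10, 20]))
```

The output contains the original 8 kHz samples interleaved with interpolated ones, so the 16 kHz signal carries no new information above 4 kHz.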

(kdavis) #3

There is also an example of transcribing longer files described here.

In addition, they describe how to upsample to 16 kHz.
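The linked example is not reproduced here. For the long-file case, a naive alternative is to cut the audio into fixed-length pieces, as the original question suggests. The sketch below assumes the audio is available as a sequence of samples; `split_into_chunks` is a hypothetical helper, not part of any library. Fixed-length cuts can land in the middle of a word, so splitting on silence would likely give better transcripts.

```python
def split_into_chunks(samples, sample_rate, chunk_seconds=10):
    """Split a sequence of samples into consecutive fixed-length chunks.

    The final chunk may be shorter than `chunk_seconds`.
    """
    chunk_len = int(sample_rate * chunk_seconds)
    if chunk_len <= 0:
        raise ValueError("sample_rate * chunk_seconds must be positive")
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


if __name__ == "__main__":
    # Five minutes of silence at 8 kHz, cut into 10-second pieces.
    five_minutes = [0] * (8000 * 300)
    chunks = split_into_chunks(five_minutes, sample_rate=8000)
    print(len(chunks), len(chunks[0]))
```

Each chunk can then be transcribed on its own and the resulting texts joined in order.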

(TIgor) #4

Thanks for the answer. I am training from scratch. To train on 8 kHz audio data, are no additional changes to the files needed?

(TIgor) #5

Thanks. This is what I need for long files.

(Lissyx) #6

For training, no. For inference, you will need to change the 16 kHz references we have in the code.
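Since a sample-rate mismatch between the model and the input is easy to miss, a small guard before inference can help. The sketch below uses only Python's standard `wave` module; `check_sample_rate`, `make_silent_wav` and the `EXPECTED_RATE` constant are assumptions for illustration and do not correspond to anything in the project's code.

```python
import io
import wave

EXPECTED_RATE = 8000  # assumption: the rate the model was trained on


def check_sample_rate(wav_file, expected_rate=EXPECTED_RATE):
    """Return the WAV sample rate, or raise if it differs from expected_rate.

    `wav_file` may be a path or a binary file-like object.
    """
    with wave.open(wav_file, "rb") as wav:
        actual = wav.getframerate()
    if actual != expected_rate:
        raise ValueError(
            "expected %d Hz audio but got %d Hz" % (expected_rate, actual)
        )
    return actual


def make_silent_wav(rate, seconds=1):
    """Build an in-memory 16-bit mono WAV of silence, for demonstration."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(check_sample_rate(make_silent_wav(8000)))
```

Passing a 16 kHz file to `check_sample_rate` with the default setting raises a `ValueError` instead of silently producing bad transcripts.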

(TIgor) #7

I will need more data to train a model at 8 kHz than at 16 kHz, right?