Improving accuracy with 8kHz audio?

I’m testing out DeepSpeech with 8kHz audio and seeing very poor accuracy.

Anyone have general pointers on how to improve this?

Should I train my own model (if so, what’s the recommended dataset size?)

Are there other approaches to working with DeepSpeech and 8khz audio?

This is expected, because the model was trained on 16kHz audio. You can try upsampling, but in our tests we could not really get anything satisfactory (this is why we have a warning in place now). The best option would be to re-train (which would require quite a large dataset, thousands of hours of audio), or to record at 16kHz.

Thanks for confirming my suspicions.

Unfortunately my audio source’s native sample rate is 8kHz, so I’m stuck with it :frowning:

Maybe you can try to upsample and filter it properly?
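
For reference, here is a minimal sketch of that upsample-and-filter step in Python with scipy (assuming 16-bit mono PCM WAVs; the file names are placeholders, and keep in mind upsampling can’t restore frequency content above 4kHz):

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import resample_poly

rate, samples = wavfile.read("call_8k.wav")        # placeholder path, 8 kHz input
samples = samples.astype(np.float32) / 32768.0      # 16-bit PCM -> float

# resample_poly interpolates by a factor of 2 and applies a zero-phase
# low-pass FIR filter, which is the "filter it properly" part.
upsampled = resample_poly(samples, up=2, down=1)

out = np.clip(upsampled, -1.0, 1.0)
wavfile.write("call_16k.wav", 16000, (out * 32767).astype(np.int16))
```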

Is there a chance that downsampling the training data and re-training would give better results than upsampling?

That’s one idea, but it means re-training everything, which takes roughly one full week on our current infra, and it’s busy with other things right now (we need to evaluate some parameters for streaming and do some other tuning).

@cnelson

You could try 2 things: convert to 16kHz mono, AND normalize the WAV’s amplitude!

Check some of your WAVs with Audacity: the peak amplitude should ideally reach about ±0.5.

I helped a friend whose training results were bad: his WAVs’ amplitude was only about ±0.1, almost flat!

To the ear it sounded fine, but not to a computer!
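
In case it helps, a rough Python sketch of both suggestions (resample to 16kHz mono, then check the peak amplitude the way Audacity displays it and scale it toward ±0.5). The file names and the 0.5 target are illustrative, not an official DeepSpeech requirement:

```python
import numpy as np
from math import gcd
from scipy.io import wavfile
from scipy.signal import resample_poly

rate, samples = wavfile.read("input.wav")            # placeholder path
samples = samples.astype(np.float32) / 32768.0        # assumes 16-bit PCM

# Mix down to mono if the file has more than one channel.
if samples.ndim > 1:
    samples = samples.mean(axis=1)

# Resample to 16 kHz (low-pass filtering is handled by resample_poly).
if rate != 16000:
    g = gcd(16000, rate)
    samples = resample_poly(samples, up=16000 // g, down=rate // g)

peak = float(np.abs(samples).max())
print(f"peak amplitude: {peak:.2f}")                  # the figure Audacity shows

# If the recording is nearly flat (e.g. ~0.1), bring the peak up to ~0.5.
if 0 < peak < 0.5:
    samples *= 0.5 / peak

samples = np.clip(samples, -1.0, 1.0)
wavfile.write("output_16k_mono.wav", 16000, (samples * 32767).astype(np.int16))
```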