I’m testing out DeepSpeech with 8kHz audio and seeing very poor accuracy.
Does anyone have general pointers on how to improve this?
Should I train my own model? If so, what’s the recommended dataset size?
Are there other approaches to working with DeepSpeech and 8kHz audio?
lissyx
This is expected, because the model is trained on 16kHz audio. You can try upsampling, but in our tests we could not really get anything satisfactory (this is why we have a warning in place now). The best option would be to re-train (which would require quite a large dataset, thousands of hours of audio), or to record at 16kHz.
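For anyone who still wants to try the upsampling route mentioned above, here is a minimal sketch using only the Python standard library. It assumes 16-bit mono PCM WAV input and doubles the sample rate by plain linear interpolation, which is a crude stand-in for a proper resampler and is not what DeepSpeech itself does. The function names `upsample_2x` and `upsample_wav` are made up for this illustration. As noted above, upsampling did not give satisfactory results in testing, so treat this as an experiment rather than a fix.

```python
import array
import sys
import wave


def upsample_2x(samples):
    """Double the sample rate by linear interpolation.

    Crude on purpose: no filtering is applied, and no information
    above the original 4 kHz bandwidth is recovered.
    """
    out = array.array("h")
    n = len(samples)
    for i, s in enumerate(samples):
        # The last sample has no successor, so it is simply repeated.
        nxt = samples[i + 1] if i + 1 < n else s
        out.append(s)
        out.append((s + nxt) // 2)
    return out


def upsample_wav(src_path, dst_path):
    """Read a 16-bit mono WAV and write a copy at twice the sample rate."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 1 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        rate = src.getframerate()
        samples = array.array("h")
        samples.frombytes(src.readframes(src.getnframes()))
    # WAV data is little-endian; swap on big-endian hosts.
    if sys.byteorder == "big":
        samples.byteswap()

    doubled = upsample_2x(samples)
    if sys.byteorder == "big":
        doubled.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate * 2)
        dst.writeframes(doubled.tobytes())
```

Calling `upsample_wav("in_8k.wav", "out_16k.wav")` (both file names are placeholders) would produce a 16kHz file that the model accepts without the sample-rate warning, though the accuracy caveat above still applies.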
Is there a chance that downsampling the training data and re-training would have better results than up-sampling?
lissyx
That’s one idea, but it means re-training everything, which takes roughly one full week on our current infra, and that infra is busy with other things right now (we need to evaluate some parameters for streaming and do some other tuning).
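For reference, the downsampling idea raised above could be sketched as follows. This is only an illustration of converting 16kHz samples to 8kHz before re-training, assuming 16-bit PCM samples already loaded into a sequence. It averages each pair of adjacent samples (a very simple low-pass) and keeps one value per pair. The name `downsample_2x` is hypothetical, and a dedicated resampling tool would likely do a better job on a real training set.

```python
import array


def downsample_2x(samples):
    """Halve the sample rate by averaging adjacent sample pairs.

    The two-sample average acts as a very rough anti-aliasing
    filter. A trailing unpaired sample is dropped.
    """
    out = array.array("h")
    for i in range(0, len(samples) - 1, 2):
        out.append((samples[i] + samples[i + 1]) // 2)
    return out
```

This does not change the cost lissyx describes: the model would still need a full re-training run on the converted data.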