I have recently installed DeepSpeech and run it against a small WAV file: 55 words, 275 characters, 19.968 s long.
The accuracy is not good at all, with an error rate of 43.63%. If I use a small script to test the same audio with Google Speech Recognition, the error rate is only 20%. That is still high, but better.
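For reference, the comparison script is roughly the following (it uses the SpeechRecognition package, and the file name here is just illustrative):

```python
import speech_recognition as sr

r = sr.Recognizer()
with sr.AudioFile("my_audio_file.wav") as source:
    audio = r.record(source)       # read the whole file
print(r.recognize_google(audio))   # transcribe via Google's web speech API
```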
I have even cut that WAV down to 9 seconds, yet there is no change in accuracy. Initially I used the command as per the docs:
deepspeech models/output_graph.pb my_audio_file.wav models/alphabet.txt
and have since added the files lm.binary and trie to the command, yet tests reveal no change.
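For reference, the extended command was along these lines (assuming I have the argument order right):

deepspeech models/output_graph.pb my_audio_file.wav models/alphabet.txt models/lm.binary models/trie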
The audio is English and very clear. I followed the guidelines regarding the WAV specifications, and it has the same format as the sample WAVs for DeepSpeech.
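To double-check, I inspected the WAV header with a few lines of Python; as far as I understand, the released model expects 16 kHz, 16-bit PCM, mono:

```python
import wave

with wave.open("my_audio_file.wav", "rb") as w:
    print("channels:    ", w.getnchannels())  # expecting 1 (mono)
    print("sample width:", w.getsampwidth())  # expecting 2 bytes (16-bit)
    print("sample rate: ", w.getframerate())  # expecting 16000
```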
Regarding these sample model files:
lm.binary
trie
output_graph.pb
what bearing do they have on accuracy? Do I need to create a specific model to improve the accuracy, or ‘add to’ (train) the sample models? We have hundreds of audio files for just one speaker, and we wish to improve the accuracy substantially.
Just wondering: because this is for a specific purpose, is it better to create a model for that purpose only, rather than use the existing (sample) models?
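If fine-tuning the released model is the right approach, is this roughly the correct invocation? The CSV paths are placeholders and the flags are just my reading of the training docs (with CSVs of the form wav_filename,wav_filesize,transcript), so please correct me if any of this is off:

```
python DeepSpeech.py \
  --train_files data/speaker_train.csv \
  --dev_files data/speaker_dev.csv \
  --test_files data/speaker_test.csv \
  --checkpoint_dir path/to/released_checkpoint \
  --export_dir models/finetuned
```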