Custom code DeepSpeed comprehension

EIchizen · June 18, 2020, 1:28pm

I’ve tried to run DeepSpeech following these instructions:

$ virtualenv -p python3 $HOME/tmp/deepspeech-venv/
$ source $HOME/tmp/deepspeech-venv/bin/activate
$ pip3 install deepspeech
$ pip3 install --upgrade deepspeech
$ deepspeech --model deepspeech-0.7.3-models.tflite --scorer deepspeech-0.7.3-models.scorer --audio my_audio_file.wav

I have a few questions. I tried a 15 s WAV file as input, but it returns just 3 words. Is that because my file is too long? Sometimes it also returns words I don’t know, like gartano, atheromatous or ototachibana.
Besides, for some WAV files I get this warning:
Warning: original sample rate(44100) is different than 16000Hz. Resampling might produce erratic speed recognition
Does it mean that my WAV is 16 kHz, and that the input needs to have a sample rate of 44.1 kHz?
Furthermore, can we use DeepSpeech to generate film subtitles?
Last but not least, how can we get the script, so that we can modify it or add some custom code?

Thank you.

othiele · June 18, 2020, 2:55pm

Love the DeepSpeed

Use the proper pbmm model instead of the tflite one, use sox etc. to downsample, and please read this before posting:

https://discourse.mozilla.org/t/what-and-how-to-report-if-you-need-support/62071/2
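
The warning quoted above reads the other way round from the question: the file is at 44100 Hz and the model expects 16000 Hz. As a rough sketch of the downsampling advice, the Python below reads the WAV header with the standard-library wave module and builds a sox command line for a 16 kHz copy. The helper names (wav_info, needs_resampling, sox_command) are made up for illustration, and converting to mono 16-bit is an assumption about what the released models expect, not something stated in this thread.

```python
import wave

TARGET_RATE = 16000  # rate named in the DeepSpeech warning


def wav_info(path):
    """Return (sample_rate, channels, bits_per_sample) read from a WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth() * 8


def needs_resampling(path, rate=TARGET_RATE):
    """True when the file's sample rate differs from the target rate."""
    return wav_info(path)[0] != rate


def sox_command(src, dst, rate=TARGET_RATE):
    """Build the sox argument list for a mono, 16-bit copy of src at `rate` Hz.

    Assumption: mono 16-bit output is what the model wants.
    """
    return ["sox", src, "-r", str(rate), "-c", "1", "-b", "16", dst]
```

If sox is installed, the list returned by sox_command("my_audio_file.wav", "my_audio_file_16k.wav") can be passed to subprocess.run(..., check=True), and the converted file can then be given to --audio.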