Running DeepSpeech on multiple .wav files to infer the text


(Megha) #1

I am able to get output for a single audio .wav file. Below is the command I am using.

(deepspeech-venv) megha@megha-medion:~/Alu_Meg/DeepSpeech_Alug_Meg/DeepSpeech$ ./deepspeech my_exportdir/model.pb/output_graph.pb models/alphabet.txt myAudio_for_testing.wav

Here, myAudio_for_testing.wav is the audio file I used to produce the output below.

TensorFlow: v1.6.0-9-g236f83e
DeepSpeech: v0.1.1-44-gd68fde8
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-06-29 14:51:35.832686: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
heritor teay we decide the lunch ha annral limined eddition of y ye com im standmat

I am also saving the output to a CSV file for now, but this works for only one audio file at a time.
Here is my question:
I have around 2000 audio files like this. How can I process them one by one and get the output for each? I tried to write a Python script to read all of my .wav files, but since deepspeech relies on resources that live inside a virtual environment, I don't know how to invoke the deepspeech command from inside the script. Can you give me some hints on how to proceed? It would be a great help.
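One option, since the `./deepspeech` binary already works for you from the activated virtualenv, is to run a Python script inside that same virtualenv and drive the binary with `subprocess`. Below is a minimal sketch: the binary, model, and alphabet paths are copied from the command in the post above, while the directory name `audio_dir` and output file `results.csv` are placeholders you would adjust.

```python
import csv
import glob
import subprocess

def build_cmd(wav_path):
    # Paths copied from the working single-file command in the post;
    # adjust them to your own setup if they differ.
    return ["./deepspeech",
            "my_exportdir/model.pb/output_graph.pb",
            "models/alphabet.txt",
            wav_path]

def transcribe_all(audio_dir, out_csv):
    # Loop over every .wav file and append one CSV row per file.
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "transcript"])
        for wav in sorted(glob.glob(audio_dir + "/*.wav")):
            result = subprocess.run(build_cmd(wav),
                                    capture_output=True, text=True)
            writer.writerow([wav, result.stdout.strip()])

if __name__ == "__main__":
    transcribe_all("audio_dir", "results.csv")
```

Because the script only shells out to the same command you already run by hand, the virtualenv question reduces to: activate the venv first, then run `python batch_transcribe.py` (or invoke the venv's interpreter directly, e.g. `~/path/to/deepspeech-venv/bin/python batch_transcribe.py`).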

Thank you:)


(Lissyx) #2

You should write your own script, modeled on ours: https://github.com/mozilla/DeepSpeech/blob/v0.1.1/native_client/python/client.py#L73

That script performs only a single inference, but you can loop over as many files as you like.
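A sketch of what such a loop might look like, adapted from the v0.1.1 client.py linked above. The `from deepspeech.model import Model` import, the `Model(...)` constructor arguments, and the constants `N_FEATURES = 26`, `N_CONTEXT = 9`, `BEAM_WIDTH = 500` follow that file; the model and alphabet paths are copied from the original post and may need adjusting. Treat this as an illustration, not a tested batch tool.

```python
def transcribe_files(wav_paths, stt):
    """Apply a speech-to-text callable to each file; return (path, text) pairs."""
    return [(path, stt(path)) for path in sorted(wav_paths)]

def main():
    # Heavy imports kept local so the pure loop above stays dependency-free.
    import glob
    import scipy.io.wavfile as wav
    from deepspeech.model import Model

    # Constants as used in the v0.1.1 client.py.
    N_FEATURES, N_CONTEXT, BEAM_WIDTH = 26, 9, 500
    ds = Model("my_exportdir/model.pb/output_graph.pb",
               N_FEATURES, N_CONTEXT,
               "models/alphabet.txt", BEAM_WIDTH)

    def stt(path):
        # Read the wav and run one inference, as client.py does.
        fs, audio = wav.read(path)
        return ds.stt(audio, fs)

    for path, text in transcribe_files(glob.glob("audio_dir/*.wav"), stt):
        print(path, text)

if __name__ == "__main__":
    main()
```

Loading the model once outside the loop is the main win over shelling out to the binary 2000 times, since model startup dominates short transcriptions.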


(Abby) #3

Were you able to create your own script? Can you share it?