I just followed the steps you mentioned (now using the .pbmm file with the new deepspeech binary obtained from native_client.tar.xz), but I am getting the "File format '# Ea'... not understood" error shown below:
(deepspeech-venv) [centerstage@localhost new_native_client]$ deepspeech output_graph.pbmm …/models/alphabet.txt …/hiroshima-1.wav
Loading model from file output_graph.pbmm
2018-03-05 11:37:53.392392: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Data loss: Can't parse output_graph.pbmm as binary proto
Loaded model in 0.005s.
Traceback (most recent call last):
  File "/home/centerstage/tmp/deepspeech-venv/bin/deepspeech", line 11, in <module>
    sys.exit(main())
  File "/home/centerstage/tmp/deepspeech-venv/lib/python2.7/site-packages/deepspeech/client.py", line 66, in main
    fs, audio = wav.read(args.audio)
  File "/home/centerstage/tmp/deepspeech-venv/lib/python2.7/site-packages/scipy/io/wavfile.py", line 236, in read
    file_size, is_big_endian = _read_riff_chunk(fid)
  File "/home/centerstage/tmp/deepspeech-venv/lib/python2.7/site-packages/scipy/io/wavfile.py", line 168, in _read_riff_chunk
    "understood.".format(repr(str1)))
ValueError: File format '# Ea'... not understood.
One more point: the '-t' argument is not supported here.
I am able to install deepspeech on CentOS 7.3.1611 and I am able to perform speech-to-text conversion of a .wav audio file.
What I am concerned about right now is the high inference time (speech-to-text conversion time), which I need to reduce.
Is the fix you are providing [the Python 2.7 unicode deepspeech build] in any way meant to decrease the inference time? Should I install this new deepspeech version for my purpose?
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
23
No, it's only aimed at working. The -t flag is only available on the C++ client. Regarding the high memory usage, I still need you to test the C++ client with -t to get more information.
My deepspeech binary aborts when I invoke the deepspeech from native_client:
(deepspeech-venv) [centerstage@localhost Speech_Recognizer]$ ./new_native_client/deepspeech new_native_client/output_graph.pbmm hiroshima-1.wav models/alphabet.txt models/trie -t
2018-03-05 16:26:40.885164: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Error: Alphabet size does not match loaded model: alphabet has size 497, but model has 28 classes in its output. Make sure you're passing an alphabet file with the same size as the one used for training.
Loading the LM will be faster if you build a binary file.
Reading models/alphabet.txt
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
terminate called after throwing an instance of 'lm::FormatLoadException'
  what():  native_client/kenlm/lm/read_arpa.cc:65 in void lm::ReadARPACounts(util::FilePiece&, std::vector&) threw FormatLoadException.
first non-empty line was "a" not \data\. Byte: 218
Aborted (core dumped)
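For context: the FormatLoadException is KenLM's ARPA header check. ReadARPACounts skips blank lines and lines starting with '#', then requires the next line to be exactly \data\; here that line was "a", which is what happens when an alphabet file is passed in the language-model position. A small sketch of the same check (hypothetical helper, not KenLM code; note that lm.binary is a binary KenLM file and is detected differently, so this only applies to ARPA text models):

# Sketch of KenLM's ReadARPACounts header check (read_arpa.cc): blank lines and
# lines starting with '#' are skipped, and the next line must be exactly "\data\".
def looks_like_arpa(path):
    with open(path, 'rb') as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith(b'#'):
                continue
            return line == b'\\data\\'
    return False

print(looks_like_arpa('models/alphabet.txt'))  # False: the first such line is 'a', matching the error above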
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
25
Yes, because you have not read properly, and we changed the order of arguments. The WAV file or directory should now be the last one, just before -t.
(deepspeech-venv) [centerstage@localhost DeepSpeech]$ ./native_client/deepspeech …/models/output_graph.pb …/models/alphabet.txt …/models/lm.binary …/models/trie …/hiroshima-1.wav
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-03-05 16:47:52.381157: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
In sox_init()
In startRead
Searching for 66 6d 74 20
WAV Chunk fmt
Searching for 64 61 74 61
WAV Chunk LIST
WAV Chunk data
Reading Wave file: Microsoft PCM format, 1 channel, 16000 samp/sec
32000 byte/sec, 2 block align, 16 bits/samp, 147840 data bytes
73920 Samps/chans
Searching for 4c 49 53 54
on a bright cloud less morning
(deepspeech-venv) [centerstage@localhost DeepSpeech]$
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
27
But then you are not passing the -t and there is extra output that is not from our codebase. Please stick to our code.
(deepspeech-venv) [centerstage@localhost DeepSpeech]$ ./native_client/deepspeech …/models/output_graph.pb …/models/alphabet.txt …/models/lm.binary …/models/trie …/hiroshima-1.wav -t
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-03-05 16:54:48.750644: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
In sox_init()
In startRead
Searching for 66 6d 74 20
WAV Chunk fmt
Searching for 64 61 74 61
WAV Chunk LIST
WAV Chunk data
Reading Wave file: Microsoft PCM format, 1 channel, 16000 samp/sec
32000 byte/sec, 2 block align, 16 bits/samp, 147840 data bytes
73920 Samps/chans
Searching for 4c 49 53 54
on a bright cloud less morning
cpu_time_overall=51.56000 cpu_time_mfcc=0.01000 cpu_time_infer=51.55000
(deepspeech-venv) [centerstage@localhost DeepSpeech]$
(deepspeech-venv) [centerstage@localhost Speech_Recognizer]$ ./new_native_client/deepspeech new_native_client/output_graph.pbmm models/alphabet.txt models/lm.binary models/trie hiroshima-1.wav -t
2018-03-05 17:01:41.754827: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
In sox_init()
In startRead
Searching for 66 6d 74 20
WAV Chunk fmt
Searching for 64 61 74 61
WAV Chunk LIST
WAV Chunk data
Reading Wave file: Microsoft PCM format, 1 channel, 16000 samp/sec
32000 byte/sec, 2 block align, 16 bits/samp, 147840 data bytes
73920 Samps/chans
Searching for 4c 49 53 54
on a bright cloud less morning
cpu_time_overall=45.09000 cpu_time_mfcc=0.01000 cpu_time_infer=45.08000
(deepspeech-venv) [centerstage@localhost Speech_Recognizer]$
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
30
And yet there is still a ton of output that is not from our codebase. I cannot trust those results.
Then I created a virtual environment deepspeech-venv, installed deepspeech using 'pip install --upgrade deepspeech', and then installed the requirements listed in the requirements.txt file of the DeepSpeech directory.
I am already using a 12-CPU machine with CPU MHz: 1200.042.
Can you please suggest what CPU/system configuration is required to work with deepspeech?
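Since the Python package has no -t flag, the inference time of the pip-installed deepspeech can be measured from a short script instead. A rough sketch, assuming the 0.1.x Python API used by the bundled client.py (Model(model, n_features, n_context, alphabet, beam_width) and stt(audio, fs)); the constants mirror client.py's defaults:

# Time a single inference with the pip-installed deepspeech package (0.1.x-style API assumed).
import time
import scipy.io.wavfile as wav
from deepspeech.model import Model

ds = Model('output_graph.pbmm', 26, 9, 'models/alphabet.txt', 500)

fs, audio = wav.read('hiroshima-1.wav')  # 16 kHz, mono, 16-bit PCM expected
start = time.time()
print(ds.stt(audio, fs))
print('inference took %.2f s for %.2f s of audio' % (time.time() - start, len(audio) / float(fs)))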
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
35
Please understand that I have no idea why it is that slow on your system, and it might come from a lot of things. Is it bare-metal? Are you the only user? How much memory is available? What are the exact storage specs? Have you checked whether multiple threads are running in the deepspeech binary? htop -p <pid-of-deepspeech> should help with that.
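If htop is not convenient, the thread count and resident memory of the running deepspeech process can also be read from /proc on Linux. A small sketch (pass the PID reported by e.g. pgrep deepspeech):

# Read thread count and resident memory of a process from /proc/<pid>/status (Linux only).
def proc_status(pid):
    info = {}
    with open('/proc/%d/status' % pid) as f:
        for line in f:
            key, _, value = line.partition(':')
            if key in ('Threads', 'VmRSS'):
                info[key] = value.strip()
    return info

print(proc_status(12345))  # e.g. {'Threads': '4', 'VmRSS': '1800000 kB'}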
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
36
You have not answered why there is this extra debug information from SoX that I quoted earlier. I'm unsure of the exact code you are running right now, and the environment you are running it in. This might play a part as well.
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
37
So that's in both cases less than half of the performance. For an audio sample of 2.9 secs, my CPU on a similar codebase would decode it in ~5 secs. Your sample is 4.8 secs, so a quick back-of-the-envelope computation would give something around 20 secs as a baseline. Without further information on your system specifications, and on why there is this debug output, it's hard to assess whether there's anything wrong here.
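For what it's worth, the figures in this thread can be sanity-checked with a short computation. The audio duration follows from the sox output (147840 data bytes at 32000 bytes/sec, roughly 4.6 s); the clock-speed factor below is an assumption (a ~3 GHz reference CPU versus the reported 1200 MHz), included only to illustrate how a baseline around 20 s could be reached:

# Back-of-the-envelope check of the timings in this thread.
sample_seconds = 147840 / 32000.0  # ~4.62 s of audio, from the sox header
baseline_rtf = 5.0 / 2.9           # ~1.7x real time on the reference machine
clock_factor = 3000.0 / 1200.0     # ASSUMED ~3 GHz reference clock vs the reported 1200 MHz
baseline_estimate = sample_seconds * baseline_rtf * clock_factor
print('expected baseline: ~%.1f s' % baseline_estimate)  # ~19.9 s
print('measured: 51.56 s (.pb) and 45.09 s (.pbmm)')     # both more than twice the baseline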