Unable to install DeepSpeech on CentOS 6.9

Hello Lissyx,

I have downloaded DeepSpeech from the link below and installed it for Python using the README.md file present in the DeepSpeech directory.

There is one more README.md file, present in the native_client directory, which talks about building the TensorFlow and DeepSpeech libraries.

I am not getting your point from the above statement.
Can you please elaborate on it?

  1. My system contains an HDD.
  2. Total RAM is 15688592 kB.
  3. Loading the model takes 0.469 s, while loading the language model takes 2.297 s:
    Loaded model in 0.469s.
    Loading language model from files …/models/lm.binary …/models/trie
    Loaded language model in 2.297s.

Please find below the details on CPU.

  1. vendor name
    [centerstage@localhost ~]$ cat /proc/cpuinfo | grep vendor | uniq
    vendor_id : GenuineIntel

  2. model name
    [centerstage@localhost ~]$ cat /proc/cpuinfo | grep 'model name' | uniq
    model name : Intel(R) Xeon(R) CPU E5-2609 v3 @ 1.90GHz

  3. Architecture
    [centerstage@localhost ~]$ lscpu
    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Byte Order: Little Endian
    CPU(s): 12
    On-line CPU(s) list: 0-11
    Thread(s) per core: 1
    Core(s) per socket: 6
    Socket(s): 2
    NUMA node(s): 2
    Vendor ID: GenuineIntel
    CPU family: 6
    Model: 63
    Model name: Intel(R) Xeon(R) CPU E5-2609 v3 @ 1.90GHz
    Stepping: 2
    CPU MHz: 1200.117
    BogoMIPS: 3808.33
    Virtualization: VT-x
    L1d cache: 32K
    L1i cache: 32K
    L2 cache: 256K
    L3 cache: 15360K
    NUMA node0 CPU(s): 0-5
    NUMA node1 CPU(s): 6-11
    [centerstage@localhost ~]$

  4. frequency/speed of the processor
    [centerstage@localhost ~]$ lscpu | grep -i mhz
    CPU MHz: 1200.117

  5. Multiple processor
    [centerstage@localhost ~]$ cat /proc/cpuinfo | grep -i 'physical id' | uniq
    physical id : 0
    physical id : 1

  6. Number of cores
    [centerstage@localhost ~]$ cat /proc/cpuinfo | grep -i 'core id'
    core id : 0
    core id : 1
    core id : 2
    core id : 3
    core id : 4
    core id : 5
    core id : 0
    core id : 1
    core id : 2
    core id : 3
    core id : 4
    core id : 5

Thanks for those details. With an HDD and your CPU it might be slow, but I don't have an overview of how much. To try with mmap, please download native_client.tar.xz from https://tools.taskcluster.net/index/artifacts/project.deepspeech.deepspeech.native_client.master/cpu and convert_graphdef_memmapped_format from https://tools.taskcluster.net/index/project.deepspeech.tensorflow.pip.r1.5/cpu
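For reference, fetching and unpacking could look like this (the exact artifact URL comes from the index page above; the directory name is just an example):

$ wget <native_client.tar.xz link from the index page above>
$ mkdir new_native_client
$ tar -xJf native_client.tar.xz -C new_native_client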

Following those steps: https://github.com/mozilla/DeepSpeech/blob/master/README.md#making-a-mmap-able-model-for-inference, please produce output_graph.pbmm from output_graph.pb. Then, using the deepspeech binary from native_client.tar.xz, you can run inference (use .pbmm instead of .pb), and add an extra -t argument at the end.
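Concretely, the conversion from that README is a single command (the paths here are examples; point --in_graph at wherever your output_graph.pb lives):

$ ./convert_graphdef_memmapped_format --in_graph=models/output_graph.pb --out_graph=models/output_graph.pbmm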

In the end,

$ ./deepspeech …/models/output_graph.pbmm …/models/alphabet.txt  …/

With multiple audio files in .../, it will load the model once and perform multiple inferences. We should then be able to know better.
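As a sketch, assuming a hypothetical wavs/ directory holding a few 16 kHz mono WAV files (the second file name is made up):

$ mkdir wavs && cp ../hiroshima-1.wav ../another-sample.wav wavs/
$ ./deepspeech ../models/output_graph.pbmm ../models/alphabet.txt wavs/ -t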

Someone filed an issue for the lack of a Python 2.7 unicode build similar to yours. I've just merged the fix; it should be available as soon as https://tools.taskcluster.net/groups/PrzjPY-ITSK6cn9x4yr3yg completes. The Python package can then be installed from https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.cpu/artifacts/public/deepspeech-0.1.1-cp27-cp27m-manylinux1_x86_64.whl or https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.cpu/artifacts/public/deepspeech-0.1.1-cp27-cp27mu-manylinux1_x86_64.whl

This is not yet published to the PyPI registry (it will be with the v0.2.0 release).
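If it helps, pip can install directly from one of those wheel URLs. Which of the two you need depends on how your Python 2.7 was built: cp27mu is the wide-unicode (UCS-4) build that most Linux distribution Pythons use. A quick check plus an example install:

$ python -c "import sys; print(sys.maxunicode)"   # prints 1114111 for a wide (cp27mu) build, 65535 for cp27m
$ pip install https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.cpu/artifacts/public/deepspeech-0.1.1-cp27-cp27mu-manylinux1_x86_64.whl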

Hello Lissyx,

I just followed the steps you mentioned (now using the .pbmm file with the new deepspeech binary obtained from native_client.tar.xz) but I am getting a "File format '# Ea'... not understood" error, as shown below:

(deepspeech-venv) [centerstage@localhost new_native_client]$ deepspeech output_graph.pbmm …/models/alphabet.txt …/hiroshima-1.wav
Loading model from file output_graph.pbmm
2018-03-05 11:37:53.392392: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Data loss: Can't parse output_graph.pbmm as binary proto
Loaded model in 0.005s.
Traceback (most recent call last):
File "/home/centerstage/tmp/deepspeech-venv/bin/deepspeech", line 11, in <module>
sys.exit(main())
File "/home/centerstage/tmp/deepspeech-venv/lib/python2.7/site-packages/deepspeech/client.py", line 66, in main
fs, audio = wav.read(args.audio)
File "/home/centerstage/tmp/deepspeech-venv/lib/python2.7/site-packages/scipy/io/wavfile.py", line 236, in read
file_size, is_big_endian = _read_riff_chunk(fid)
File "/home/centerstage/tmp/deepspeech-venv/lib/python2.7/site-packages/scipy/io/wavfile.py", line 168, in _read_riff_chunk
"understood.".format(repr(str1)))
ValueError: File format '# Ea'... not understood.

One more point: the '-t' argument is not supported here:

(deepspeech-venv) [centerstage@localhost new_native_client]$ deepspeech output_graph.pbmm …/models/alphabet.txt …/hiroshima-1.wav -t
usage: deepspeech [-h] model audio alphabet [lm] [trie]
deepspeech: error: unrecognized arguments: -t
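Worth noting, looking at the usage line just printed: the Python client expects the arguments in the order model audio alphabet, so in the earlier command alphabet.txt ended up in the audio slot, and '# Ea' matches the first bytes of a text file (a # comment line) being read as a WAV header. Assuming that reading is right, the corrected call would be:

(deepspeech-venv) $ deepspeech output_graph.pbmm ../hiroshima-1.wav ../models/alphabet.txt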

Hello Lissyx,
Thanks for your work!

I am able to install deepspeech on CentOS 7.3.1611 and to perform speech-to-text conversion of a .wav audio file.

The point I am concerned about right now is the high inference time (the speech-to-text conversion time), which I need to reduce.

Is the fix that you are providing [the Python 2.7 unicode deepspeech build] in any way meant to decrease the inference time? Should I install this new deepspeech version to fulfill my purpose?

No, it's only aimed at making it work. The -t option is only available on the C++ client. Regarding the high memory usage, I still need you to test the C++ client with -t to get more information.

Hello Lissyx,

My deepspeech binary aborts when I invoke the deepspeech from native_client:

(deepspeech-venv) [centerstage@localhost Speech_Recognizer]$ ./new_native_client/deepspeech new_native_client/output_graph.pbmm hiroshima-1.wav models/alphabet.txt models/trie -t
2018-03-05 16:26:40.885164: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Error: Alphabet size does not match loaded model: alphabet has size 497, but model has 28 classes in its output. Make sure you're passing an alphabet file with the same size as the one used for training.
Loading the LM will be faster if you build a binary file.
Reading models/alphabet.txt
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
terminate called after throwing an instance of 'lm::FormatLoadException'
what(): native_client/kenlm/lm/read_arpa.cc:65 in void lm::ReadARPACounts(util::FilePiece&, std::vector&) threw FormatLoadException.
first non-empty line was "a" not \data\. Byte: 218
Aborted (core dumped)

Yes, because you have not read properly: we changed the order of arguments, and the WAV file or directory should now be the last one, just before -t. (That mis-ordering is also what produced the errors above: the WAV file landed in the alphabet slot, and alphabet.txt landed in the language-model slot, which is why KenLM tried to parse it as ARPA.)

Yeah…that was a mistake.

Please find the output below:

(deepspeech-venv) [centerstage@localhost DeepSpeech]$ ./native_client/deepspeech …/models/output_graph.pb …/models/alphabet.txt …/models/lm.binary …/models/trie …/hiroshima-1.wav
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-03-05 16:47:52.381157: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

In sox_init()

In startRead

Searching for 66 6d 74 20

WAV Chunk fmt
Searching for 64 61 74 61

WAV Chunk LIST
WAV Chunk data
Reading Wave file: Microsoft PCM format, 1 channel, 16000 samp/sec
32000 byte/sec, 2 block align, 16 bits/samp, 147840 data bytes
73920 Samps/chans
Searching for 4c 49 53 54
on a bright cloud less morning
(deepspeech-venv) [centerstage@localhost DeepSpeech]$

But then you are not passing the -t, and there is extra output that is not from our codebase. Please stick to our code.

Please find the output with the -t option:

(deepspeech-venv) [centerstage@localhost DeepSpeech]$ ./native_client/deepspeech …/models/output_graph.pb …/models/alphabet.txt …/models/lm.binary …/models/trie …/hiroshima-1.wav -t
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-03-05 16:54:48.750644: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

In sox_init()

In startRead

Searching for 66 6d 74 20

WAV Chunk fmt
Searching for 64 61 74 61

WAV Chunk LIST
WAV Chunk data
Reading Wave file: Microsoft PCM format, 1 channel, 16000 samp/sec
32000 byte/sec, 2 block align, 16 bits/samp, 147840 data bytes
73920 Samps/chans
Searching for 4c 49 53 54
on a bright cloud less morning
cpu_time_overall=51.56000 cpu_time_mfcc=0.01000 cpu_time_infer=51.55000
(deepspeech-venv) [centerstage@localhost DeepSpeech]$

I didn't get it; what extra output are you referring to?

Hello Lissyx,

I am posting one more output which uses mmap and the native_client.tar.xz from https://tools.taskcluster.net/index/artifacts/project.deepspeech.deepspeech.native_client.master/cpu, as you suggested:

(deepspeech-venv) [centerstage@localhost Speech_Recognizer]$ ./new_native_client/deepspeech new_native_client/output_graph.pbmm models/alphabet.txt models/lm.binary models/trie hiroshima-1.wav -t
2018-03-05 17:01:41.754827: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

In sox_init()

In startRead

Searching for 66 6d 74 20

WAV Chunk fmt
Searching for 64 61 74 61

WAV Chunk LIST
WAV Chunk data
Reading Wave file: Microsoft PCM format, 1 channel, 16000 samp/sec
32000 byte/sec, 2 block align, 16 bits/samp, 147840 data bytes
73920 Samps/chans
Searching for 4c 49 53 54
on a bright cloud less morning
cpu_time_overall=45.09000 cpu_time_mfcc=0.01000 cpu_time_infer=45.08000
(deepspeech-venv) [centerstage@localhost Speech_Recognizer]$

And yet there is still a ton of output that is not from our codebase. I cannot trust those results :frowning:
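One way to pin down where the extra SoX prints come from (a sketch, assuming they were added by local edits to the client sources rather than coming from upstream):

$ cd DeepSpeech
$ git status                                 # any locally modified files?
$ git diff native_client/                    # show local changes to the client code
$ grep -rn "In sox_init" native_client/      # search for the exact debug string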

Hello lissyx,

I followed https://github.com/mozilla/DeepSpeech for installation.

I have downloaded DeepSpeech using:
git clone https://github.com/mozilla/DeepSpeech

Downloaded the model from the link below:
https://github.com/mozilla/DeepSpeech/releases/download/v0.1.1/deepspeech-0.1.1-models.tar.gz

Then I created a virtual environment deepspeech-venv and installed deepspeech using
'pip install --upgrade deepspeech'
and then installed the requirements present in the requirements.txt file of the DeepSpeech directory.

Performed:
python util/taskcluster.py --target native_client/

And then, as you suggested, I used the native_client.tar.xz from https://tools.taskcluster.net/index/artifacts/project.deepspeech.deepspeech.native_client.master/cpu

and for mmap performed the operation below:
convert_graphdef_memmapped_format --in_graph=output_graph.pb --out_graph=output_graph.pbmm

Please let me know if I have made a mistake anywhere.

Where is this coming from?

It might just be slow because your CPU / system is slow.

Hello Lissyx,

I am already using a 12-CPU machine with a CPU speed of 1200.042 MHz.
Can you please suggest what CPU/system configuration is required to work with DeepSpeech?

Please understand that I have no idea why it is that slow on your system, and it might come from a lot of things. Is it bare-metal? Are you the only user? How much memory is available? What are the exact storage specs? Have you checked whether you have multiple threads running in the deepspeech binary? htop -p <pid-of-deepspeech> should help with that.
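A few commands that would answer most of those questions (illustrative, not a required procedure):

$ free -m                            # total and available memory
$ lsblk -d -o NAME,ROTA,SIZE         # ROTA=1 means a rotational disk (HDD)
$ htop -p $(pgrep -d, deepspeech)    # press H inside htop to toggle the per-thread view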

You have not answered why there is this extra debug information from SoX that I quoted earlier. I'm unsure of the exact code you are running right now, and of the environment you are running it in. This might play a part as well.

https://www.cpubenchmark.net/compare.php?cmp[]=2275&cmp[]=2427

This gives the ratings for your CPU (Xeon E5-2609 v3) versus mine (i7 4790K):

| | i7 4790K | Xeon E5-2609 v3 |
|---|---|---|
|Single Thread Rating|2530|1115|
|CPU Mark|11189|5940|

So that’s in both case less than half of the performances. For an audio sample of 2.9secs, my CPU on the similar codebase would decode it in ~5secs. Your sample is 4.8secs, so a quick back-of-the-enveloppe computation would give something around 20 secs, baseline. Without further informations on your system specifications, and why there is this debug output, it’s hard to assess if there’s anything wrong here.