Unable to install DeepSpeech on CentOS 6.9

Hello everyone, I am trying to run DeepSpeech on Ubuntu and I am getting this error.

How do I resolve this?

Hello Lissyx,

On applying the upgrade command, it shows the output below:

(deepspeech-venv) [centerstage@localhost DeepSpeech]$ pip install --upgrade deepspeech-0.1.1-cp27-cp27mu-manylinux1_x86_64.whl
deepspeech-0.1.1-cp27-cp27mu-manylinux1_x86_64.whl is not a supported wheel on this platform.
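As a side note, a quick way to see which wheel tags this platform's pip actually accepts (a sketch, assuming pip 9.x, where the internal pep425tags module is importable; it moved in later pip versions):

python -c "import pip; from pprint import pprint; pprint(pip.pep425tags.get_supported())"

If cp27mu is absent from the printed list, a cp27mu wheel will be rejected exactly like above.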

On running
(deepspeech-venv) [centerstage@localhost DeepSpeech]$ python -c 'import sysconfig; import pprint; pprint.pprint(sysconfig.get_config_vars())'

I got the output below, and this time I am also not able to install deepspeech:
https://pastebin.mozilla.org/9078692

Looks like I missed one part of the command: mv deepspeech-0.1.1-cp27-cp27mu-manylinux1_x86_64.whl deepspeech-0.1.1-cp27-cp27m-manylinux1_x86_64.whl && pip install --user --upgrade deepspeech-0.1.1-cp27-cp27m-manylinux1_x86_64.whl

Right, this might be the lack of --enable-unicode=ucs4 in the CONFIG_ARGS part. If that's the case, renaming to deepspeech-0.1.1-cp27-cp27m-manylinux1_x86_64.whl should trick it, but it might behave erratically with Unicode.
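A minimal check of which unicode variant the interpreter was built with (on Python 2, sys.maxunicode is 0x10FFFF for UCS-4 builds and 0xFFFF for UCS-2 builds):

python -c "import sys; print('ucs4 (cp27mu)' if sys.maxunicode > 0xffff else 'ucs2 (cp27m)')"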

It would be useful if you filed an issue on GitHub about supporting this setup :).

Hello lissyx,
Thanks for the response!

Now I am able to install deepspeech.
But when I tried to install the requirements, it threw an error while installing tensorflow 1.5.0:

(deepspeech-venv) [centerstage@localhost DeepSpeech]$ pip install -r requirements.txt
Collecting pandas (from -r requirements.txt (line 1))
/home/centerstage/tmp/deepspeech-venv/lib/python2.7/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:318: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/security.html#snimissingwarning.
SNIMissingWarning
/home/centerstage/tmp/deepspeech-venv/lib/python2.7/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:122: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning
Using cached pandas-0.22.0.tar.gz
Collecting progressbar2 (from -r requirements.txt (line 2))
Using cached progressbar2-3.35.2-py2.py3-none-any.whl
Collecting python-utils (from -r requirements.txt (line 3))
Using cached python_utils-2.3.0-py2.py3-none-any.whl
Collecting tensorflow==1.5.0 (from -r requirements.txt (line 4))
Could not find a version that satisfies the requirement tensorflow==1.5.0 (from -r requirements.txt (line 4)) (from versions: )
No matching distribution found for tensorflow==1.5.0 (from -r requirements.txt (line 4))
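For context, and worth verifying: at the time, TensorFlow published only cp27mu manylinux1 wheels for Linux Python 2.7, so a narrow-unicode (cp27m) interpreter sees exactly this empty "from versions:" list, and the renaming trick cannot help because pip never downloads a wheel at all. The SNIMissingWarning/InsecurePlatformWarning lines above are a separate symptom of an old Python 2.7 TLS stack; a commonly suggested mitigation is installing the SSL backport packages:

pip install pyopenssl ndg-httpsclient pyasn1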

Meanwhile I just switched to CentOS 7.3 and was able to successfully set up the DeepSpeech project along with TensorFlow.

This is only needed if you intend to train. The fact that you were able to install confirms it was just the unicode stuff. But if TensorFlow upstream itself fails, then all bets are off and we cannot help you with that.

Hello Lissyx,

Thanks for your help and the quick responses!!

I have a question; please let me know if this is the right place to ask it or whether I should open a new discussion for it.

When I ran the decoder using the default model on an audio .wav file of ~4 sec, it took ~38 sec of inference time.

(deepspeech-venv) [centerstage@localhost DeepSpeech]$ deepspeech ../models/output_graph.pb ../hiroshima-1.wav ../models/alphabet.txt ../models/lm.binary ../models/trie
Loading model from file ../models/output_graph.pb
2018-02-27 17:26:13.741657: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.469s.
Loading language model from files ../models/lm.binary ../models/trie
Loaded language model in 2.297s.
Running inference.
on a bright cloud less morning
Inference took 38.391s for 4.620s audio file.

Why is it taking this long?
How can I improve the speed?
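In case it helps isolate where the time goes, here is a rough sketch of timing the stages separately from Python. The Model and enableDecoderWithLM signatures and the constants are copied from the client.py bundled with deepspeech 0.1.1 (please double-check them against your installed version); the paths are the ones from my run above.

import time
import scipy.io.wavfile as wav
from deepspeech.model import Model

# Constants as used by the 0.1.1 client.py (verify against your copy)
N_FEATURES = 26
N_CONTEXT = 9
BEAM_WIDTH = 500
LM_WEIGHT = 1.75
WORD_COUNT_WEIGHT = 1.00
VALID_WORD_COUNT_WEIGHT = 1.00

t0 = time.time()
ds = Model('../models/output_graph.pb', N_FEATURES, N_CONTEXT, '../models/alphabet.txt', BEAM_WIDTH)
ds.enableDecoderWithLM('../models/alphabet.txt', '../models/lm.binary', '../models/trie',
                       LM_WEIGHT, WORD_COUNT_WEIGHT, VALID_WORD_COUNT_WEIGHT)
t1 = time.time()

fs, audio = wav.read('../hiroshima-1.wav')  # 16 kHz mono 16-bit WAV
t2 = time.time()
text = ds.stt(audio, fs)
t3 = time.time()

print(text)
print('model+LM load: %.3fs, inference: %.3fs' % (t1 - t0, t3 - t2))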

It depends on a lot of parameters: CPU, I/O subsystem. Can you give more details?

Hi Lissyx,

Below is the output of the top/iostat commands while the deepspeech binary is decoding the ~4 sec audio file.

  1. Working on a 12-CPU machine

top - 18:04:49 up 39 days, 2:50, 11 users, load average: 2.27, 1.64, 0.98
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
%Cpu(s): 17.0 us, 0.3 sy, 0.0 ni, 82.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 15688592 total, 2214132 free, 3858156 used, 9616304 buff/cache
KiB Swap: 16383996 total, 15744108 free, 639888 used. 10986616 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1344 centers+ 20 0 5362376 2.740g 1.519g S 201.0 18.3 0:57.56 deepspeech

  2. CentOS Linux release 7.3.1611 (Core)

  3. iostat output while the deepspeech binary is running

[root@localhost DeepSpeech]# iostat
Linux 3.10.0-514.26.2.el7.x86_64 (localhost.localdomain) Tuesday 27 February 2018 x86_64 (12 CPU)

avg-cpu: %user %nice %system %iowait %steal %idle
1.55 0.00 0.51 0.08 0.00 97.86

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 1.61 0.82 38.20 2763476 129148525

Please let me know if you need any more information.

What's your exact CPU model and amount of RAM? Hard drive or SSD? It could be the loading of the model itself that takes some time. Try the artifacts from TaskCluster on the master branch: the C++ client in native_client.tar.xz has a -t option, and you can use the mmap trick. It should all be accessible from the README. I'll give you direct pointers if you don't find them, but right now I cannot :-/.

Hello Lissyx,

I have downloaded DeepSpeech from the link below and installed it for Python using the README.md file present in the DeepSpeech directory.

There is one more README.md file, present in the native_client directory, which describes building the TensorFlow and DeepSpeech libraries.

I am not getting your point from the above statement.
Can you please elaborate on it?

  1. My system contains an HDD.
  2. Total RAM is 15688592 kB.
  3. Loading the model takes 0.469 sec, while loading the language model takes 2.297 sec:
    Loaded model in 0.469s.
    Loading language model from files ../models/lm.binary ../models/trie
    Loaded language model in 2.297s.

Please find below the details on CPU.

  1. vendor name
    [centerstage@localhost ~]$ cat /proc/cpuinfo | grep vendor | uniq
    vendor_id : GenuineIntel

  2. model name
    [centerstage@localhost ~]$ cat /proc/cpuinfo | grep 'model name' | uniq
    model name : Intel® Xeon® CPU E5-2609 v3 @ 1.90GHz

  3. Architecture
    [centerstage@localhost ~]$ lscpu
    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Byte Order: Little Endian
    CPU(s): 12
    On-line CPU(s) list: 0-11
    Thread(s) per core: 1
    Core(s) per socket: 6
    Socket(s): 2
    NUMA node(s): 2
    Vendor ID: GenuineIntel
    CPU family: 6
    Model: 63
    Model name: Intel® Xeon® CPU E5-2609 v3 @ 1.90GHz
    Stepping: 2
    CPU MHz: 1200.117
    BogoMIPS: 3808.33
    Virtualization: VT-x
    L1d cache: 32K
    L1i cache: 32K
    L2 cache: 256K
    L3 cache: 15360K
    NUMA node0 CPU(s): 0-5
    NUMA node1 CPU(s): 6-11
    [centerstage@localhost ~]$

  4. frequency/speed of the processor
    [centerstage@localhost ~]$ lscpu | grep -i mhz
    CPU MHz: 1200.117

  5. Multiple processor
    [centerstage@localhost ~]$ cat /proc/cpuinfo | grep -i 'physical id' | uniq
    physical id : 0
    physical id : 1

  6. Number of cores
    [centerstage@localhost ~]$ cat /proc/cpuinfo | grep -i 'core id'
    core id : 0
    core id : 1
    core id : 2
    core id : 3
    core id : 4
    core id : 5
    core id : 0
    core id : 1
    core id : 2
    core id : 3
    core id : 4
    core id : 5

Thanks for those details. With an HDD and that CPU it might be slow, but I don't have a sense of by how much. To try mmap, please download native_client.tar.xz from https://tools.taskcluster.net/index/artifacts/project.deepspeech.deepspeech.native_client.master/cpu and convert_graphdef_memmapped_format from https://tools.taskcluster.net/index/project.deepspeech.tensorflow.pip.r1.5/cpu

Following these steps, https://github.com/mozilla/DeepSpeech/blob/master/README.md#making-a-mmap-able-model-for-inference, please produce output_graph.pbmm from output_graph.pb. Then, using the deepspeech binary from native_client.tar.xz, you can run inference (use .pbmm instead of .pb) and add an extra -t argument at the end.
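For reference, the conversion step from that README boils down to something like this (paths adjusted to your layout):

$ ./convert_graphdef_memmapped_format --in_graph=../models/output_graph.pb --out_graph=../models/output_graph.pbmm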

In the end,

$ ./deepspeech ../models/output_graph.pbmm ../models/alphabet.txt ../

With multiple audio files in ../, it will load the model once and perform multiple inferences. We should then be able to know better.

Someone filed an issue similar to yours about the lack of a Python 2.7 narrow-unicode (cp27m) build. I've just merged the fix; it should be available as soon as https://tools.taskcluster.net/groups/PrzjPY-ITSK6cn9x4yr3yg completes. The Python package can then be installed from https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.cpu/artifacts/public/deepspeech-0.1.1-cp27-cp27m-manylinux1_x86_64.whl or https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.cpu/artifacts/public/deepspeech-0.1.1-cp27-cp27mu-manylinux1_x86_64.whl

This is not yet published to the PyPI registry (it will be with the v0.2.0 release).
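For example, picking the wheel that matches your interpreter's unicode build (cp27m in your case; pip accepts a direct URL):

pip install --upgrade https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.cpu/artifacts/public/deepspeech-0.1.1-cp27-cp27m-manylinux1_x86_64.whl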

Hello Lissyx,

I just followed the steps you mentioned (now using the .pbmm file with the new deepspeech binary obtained from native_client.tar.xz) but I am getting a "File format '# Ea'... not understood" error, as shown below:

(deepspeech-venv) [centerstage@localhost new_native_client]$ deepspeech output_graph.pbmm ../models/alphabet.txt ../hiroshima-1.wav
Loading model from file output_graph.pbmm
2018-03-05 11:37:53.392392: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Data loss: Can't parse output_graph.pbmm as binary proto
Loaded model in 0.005s.
Traceback (most recent call last):
File "/home/centerstage/tmp/deepspeech-venv/bin/deepspeech", line 11, in
sys.exit(main())
File "/home/centerstage/tmp/deepspeech-venv/lib/python2.7/site-packages/deepspeech/client.py", line 66, in main
fs, audio = wav.read(args.audio)
File "/home/centerstage/tmp/deepspeech-venv/lib/python2.7/site-packages/scipy/io/wavfile.py", line 236, in read
file_size, is_big_endian = _read_riff_chunk(fid)
File "/home/centerstage/tmp/deepspeech-venv/lib/python2.7/site-packages/scipy/io/wavfile.py", line 168, in _read_riff_chunk
"understood.".format(repr(str1)))
ValueError: File format '# Ea'... not understood.

One more point: the -t argument is not supported here.

(deepspeech-venv) [centerstage@localhost new_native_client]$ deepspeech output_graph.pbmm ../models/alphabet.txt ../hiroshima-1.wav -t
usage: deepspeech [-h] model audio alphabet [lm] [trie]
deepspeech: error: unrecognized arguments: -t

Hello Lissyx,
Thanks for your work!

I am able to install deepspeech on CentOS 7.3.1611 and to perform speech-to-text conversion of a .wav audio file.

The point I am concerned about right now is the high inference time (the speech-to-text conversion time), which I need to reduce.

Is the fix that you are providing (the Python 2.7 unicode deepspeech build) in any way meant to decrease the inference time? Should I install this new deepspeech version to fulfill my purpose?

No, it's only aimed at making the install work. The -t is only available on the C++ client. Regarding the high memory usage, I still need you to test the C++ client with -t to get more information.

Hello Lissyx,

My deepspeech binary aborts when I invoke the deepspeech from native_client:

(deepspeech-venv) [centerstage@localhost Speech_Recognizer]$ ./new_native_client/deepspeech new_native_client/output_graph.pbmm hiroshima-1.wav models/alphabet.txt models/trie -t
2018-03-05 16:26:40.885164: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Error: Alphabet size does not match loaded model: alphabet has size 497, but model has 28 classes in its output. Make sure you're passing an alphabet file with the same size as the one used for training.
Loading the LM will be faster if you build a binary file.
Reading models/alphabet.txt
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
terminate called after throwing an instance of 'lm::FormatLoadException'
what(): native_client/kenlm/lm/read_arpa.cc:65 in void lm::ReadARPACounts(util::FilePiece&, std::vector&) threw FormatLoadException.
first non-empty line was "a" not \data. Byte: 218
Aborted (core dumped)

Yes, because you have not read it properly: we changed the order of arguments. The WAV file or directory should now be the last one, just before -t.

Yeah, that was a mistake.

Please find the output below now:

(deepspeech-venv) [centerstage@localhost DeepSpeech]$ ./native_client/deepspeech ../models/output_graph.pb ../models/alphabet.txt ../models/lm.binary ../models/trie ../hiroshima-1.wav
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-03-05 16:47:52.381157: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

In sox_init()

In startRead

Searching for 66 6d 74 20

WAV Chunk fmt
Searching for 64 61 74 61

WAV Chunk LIST
WAV Chunk data
Reading Wave file: Microsoft PCM format, 1 channel, 16000 samp/sec
32000 byte/sec, 2 block align, 16 bits/samp, 147840 data bytes
73920 Samps/chans
Searching for 4c 49 53 54
on a bright cloud less morning
(deepspeech-venv) [centerstage@localhost DeepSpeech]$

But then you are not passing -t, and there is extra output that is not from our codebase. Please stick to our code.
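Something like this, reusing your paths and the mmap-able model, should print the timing breakdown:

$ ./native_client/deepspeech ../models/output_graph.pbmm ../models/alphabet.txt ../models/lm.binary ../models/trie ../hiroshima-1.wav -t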