We could, but I’ll have to be in the office to do that, so not before tomorrow at best. Besides, can you run sha1sum to verify everything? Models, audio, code, libs?
There you go:
224594024eed9d69d4c85a90a0f24fb2e3bfe19b ./models/trie
b90017e816572ddce84f5843f1fa21e6a377975e ./models/output_graph.pb
47d7a0e69778a3cc3c4012cbdee122401df0dca3 ./models/lm.binary
163607ce1fe2135c7b3b130b6b0bc16c47bb209c ./models/alphabet.txt
1f20d309fb3a07d5dad01f934adcaa8ee5656154 ./bin/README.mozilla
26d885f7a8557c293f42ae72c51f5340e4454bab ./bin/deepspeech
27e1ac471eead95e8389e67fef8abf5446475b0b ./bin/libtensorflow_framework.so
71ee4a4524235a94d463853b381ca0dc7c81171b ./bin/libdeepspeech.so
029ecf9d7b5a8dfdd5bdf7146a7c128111abab45 ./bin/libdeepspeech_utils.so
2be65d4ff1d982f78d4f45a900c5ee9627ad59da ./bin/libtensorflow_cc.so
6023b6a9d16e1febe8bf5d2e5ac1269e9f0fc116 ./bin/libctc_decoder_with_kenlm.so
8e6627932560a0b5a33e3a1ee8ebbc0da5e4fa29 ./bin/generate_trie
d22157abc0fc0b4ae96380c09528e23cf77290a9 ./bin/LICENSE
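(For anyone following along: a list like the one above can be produced and later re-verified with sha1sum. A minimal sketch, assuming you run it from the extracted release directory; paths are examples.)

```shell
# Save a SHA-1 list for every file under the release tree, then verify it.
# "." is used as the root here; point find at your extracted release instead.
find . -type f ! -name SHA1SUMS -exec sha1sum {} + > SHA1SUMS
sha1sum -c SHA1SUMS   # prints "OK" per file, non-zero exit on any mismatch
```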
I have realised a few things:
- The very first error I got was caused by lm.binary, which is around 1.5 GB. It does not fit into the RPi’s memory, and that is why it crashes.
- The second error (trying without lm.binary) was resolved by using your “LDC93S1” output_graph.pb. Of course, recognition was totally off in that case.
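The first point is easy to sanity-check. A generic Linux sketch (fits_in_ram is a hypothetical helper, not part of DeepSpeech):

```shell
# Sanity check: will a model file even fit in RAM?
fits_in_ram() {
  local bytes mem_kb
  bytes=$(stat -c%s "$1") || return 2                  # file size in bytes
  mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)  # total RAM in kB
  [ "$bytes" -lt $((mem_kb * 1024)) ]
}

# Example; on the Pi you would point this at ./models/lm.binary instead:
fits_in_ram /bin/sh && echo "fits" || echo "too big for RAM"
```

A 1.5 GB lm.binary against the Pi 3’s 1 GB of RAM fails this check before you even get to inference.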
So the questions arise:
- What is the lm.binary file and is it really needed?
- How can accuracy be achieved without the lm.binary file?
- Is there a proper output_graph.pb that I can use?
lm.binary is the language model, used to improve recognition when the acoustic model’s output was phonetically right but the word was wrong.
For output_graph.pb, I think the one I used for “good” decoding was the test overfit LDC93S1.
I just checked, @fotiDim, and in my tests above, perfect decoding was achieved on the overfitted LDC93S1 output_graph.pb:
pi@raspberrypi:~/tc/rpi3-cpu $ time ./deepspeech ~/models-release/models/output_graph.pb ~/models-release/audio/2830-3980-0043.wav ~/models-release/models/alphabet.txt -t
experience proves les
cpu_time_overall=80.82691 cpu_time_mfcc=0.02943 cpu_time_infer=80.79748
real 1m39.378s
user 1m17.210s
sys 0m6.850s
SHA1 for the release’s output_graph.pb and alphabet.txt matches yours, but as you can see in the post above, I have proper decoding.
That is weird then. Can I ask, as a last favor, that you share the .img of your SD card?
Send me an email in private
Hello,
I am trying to run inference with a custom-trained model using the native client on an Asus Tinker Board, a board similar to the Raspberry Pi with a 32-bit armv7l architecture.
I tried to compile from scratch, including TensorFlow, but ended up with the same error this issue reports (the SparseToDense one).
Then I tried your raspberry-ready compiled libraries downloaded from:
https://queue.taskcluster.net/v1/task/Wnpx0NPjS3G20t76jbl15Q/runs/0/artifacts/public/native_client.tar.xz
But unfortunately this new “Illegal instruction” error showed up:
Thread 1 “deepspeech” received signal SIGILL, Illegal instruction.
0xb692de84 in tensorflow::(anonymous namespace)::GraphConstructor::TryImport() () from /home/ftx/fonotexto/herramientas/DeepSpeech/nativelibs_jmd/libdeepspeech.so
(NOTE: in my raspberry pi3 these libraries work fine).
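Before recompiling, it can help to see what the CPU actually advertises (a generic Linux check, nothing DeepSpeech-specific):

```shell
# List the instruction-set extensions the CPU reports ("Features" on ARM,
# "flags" on x86). A SIGILL usually means the binary was compiled for an
# extension (e.g. a NEON/VFP variant) that is missing from this list.
grep -m1 -E '^(Features|flags)' /proc/cpuinfo
```

Comparing this list between the Pi 3 (where the libraries work) and the Tinker Board should show which instruction the prebuilt libdeepspeech.so relies on.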
So I would like to do a build specifically for the Tinker Board. Could you give me any hints on how to do that and avoid the SparseToDense error?
Thanks in advance,
Mar
Heads up: I have been working on moving to a newer GCC toolchain (from Linaro) and building armv7-a binaries. This URL should point to that: https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.arm/artifacts/public/native_client.tar.xz
Switching to this newer toolchain also made it cleaner to build for other targets. You can copy some of the build flags we have defined here: https://github.com/mozilla/tensorflow/blob/7455bfae5cc3b5eafbc9a5d8faee33e7772a4d12/tools/bazel.rc#L52-L56
Make sure you pass all those you need in --copt=, and adapt things like -mtune.
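As a concrete starting point, a sketch of what such an invocation might look like. The flag values are assumptions for the Tinker Board’s Cortex-A17 (RK3288), not something taken from the linked bazel.rc; check them against that file and your board before using them:

```shell
# Hypothetical bazel invocation for an armv7-a target such as the Tinker Board,
# run from the mozilla/tensorflow checkout with native_client symlinked in.
# -mtune/-mfpu values are illustrative for a Cortex-A17; verify for your CPU.
bazel build -c opt \
  --copt=-march=armv7-a \
  --copt=-mtune=cortex-a17 \
  --copt=-mfpu=neon-vfpv4 \
  --copt=-mfloat-abi=hard \
  //native_client:libdeepspeech.so
```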