After building the native client, I tried running it with the following command:
ARGS="–model …/…/models/Voice_180207/output_graph.pb --alphabet …/…/models/Voice_180207/alphabet.txt --audio …/…/…/audio/Voice_180207_1.wav" make run`
This returns the following:
LD_LIBRARY_PATH=/home/william/speech/deepspeech/tensorflow/bazel-bin/native_client: ./deepspeech --model ../../models/Voice_180207/output_graph.pb --alphabet ../../models/Voice_180207/alphabet.txt --audio ../../../audio/Voice_180207_1.wav
TensorFlow: b'v1.12.0-rc0-1797-g059c37c22c'
DeepSpeech: v0.4.0-alpha.1-12-gf69db72
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-12-14 14:59:51.008355: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
No transcription is printed, yet the script seems to exit successfully:
echo $?
0
Does the problem perhaps lie with the simplicity of the trained model? … I just wanted to try a baseline where a model is trained and tested on the same WAV file (mostly, to test the training part of DeepSpeech) - clearly the performance will not be realistic.
lissyx
Yeah, with one WAV sample, the default width of 2048, and only one epoch, you are right: it's trained and running, but it's just unable to learn anything, so the output is an empty string.
You should try bin/run-ldc93s1.sh for that kind of testing; it's designed to verify overfitting on a single file.
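From the repo root, that's just:

# Downloads the one-sentence LDC93S1 sample on first run, then overfits it.
# The script sets its own training flags, including the small n_hidden width.
./bin/run-ldc93s1.sh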
Using a pre-trained model would be ideal, but our audio contains quite a few medical terms that the available pre-trained model does not transcribe properly.
Training our own model using an existing medical corpus would be an option as well. But our own set of audio files is way too small and I haven’t had much luck finding existing ones.
Is the current pre-trained model based on Common Voice? Do there happen to be domain-specific subsets of the CV corpus?
Thanks for pointing me towards bin/run-ldc93s1.sh. Just out of curiosity, how was the value for n_hidden obtained (it’s quite specific, i.e., 494)?
lissyx
We use some Common Voice data; regarding domain-specific datasets, I don't think so. What you can do, however, is train from the English model, if you have a few hours of specific data, and then make a more specialized language model: that should help a lot in your case.
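Something along these lines (a sketch using the v0.4-era flag names, which have changed between releases; the checkpoint path and CSV files are placeholders):

# Continue training from the released English checkpoint on your own data.
# A negative --epoch value means "this many additional epochs".
python -u DeepSpeech.py \
    --n_hidden 2048 \
    --checkpoint_dir path/to/english/checkpoint \
    --epoch -3 \
    --train_files medical_train.csv \
    --dev_files medical_dev.csv \
    --test_files medical_test.csv \
    --learning_rate 0.0001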
We're working on API changes to allow using multiple language models, so in your case you could build one from medical terms.
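Roughly like this, assuming KenLM's tools are built and on PATH (corpus and output names are placeholders; data/lm/README.md has the exact options we use):

# medical_corpus.txt: one normalized transcript per line.
lmplz --order 5 --text medical_corpus.txt --arpa medical.arpa
# Convert the ARPA file to KenLM's binary format for faster loading.
build_binary -a 255 -q 8 trie medical.arpa lm.binary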
Am I right in assuming that, by creating a language model, one could improve accuracy without having to train one's own network? In that case, I'm quite excited. I will try the steps outlined in the Discourse post.
lissyx
You should follow the steps documented in the repo, under data/lm/README.md
Thanks for all your help. I’ve generated a language model but I cannot build generate_trie - I successfully built the native client using make deepspeech but this doesn’t create a generate_trie executable.
Running make generate_trie gives the following:
c++ -Wl,--no-as-needed -Wl,-rpath,\$ORIGIN -L/home/william/speech/deepspeech/tensorflow/bazel-bin/native_client -ldeepspeech generate_trie.cpp -o generate_trie
In file included from generate_trie.cpp:5:0:
ctcdecode/scorer.h:9:10: fatal error: lm/enumerate_vocab.hh: No such file or directory
#include "lm/enumerate_vocab.hh"
^~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
<builtin>: recipe for target 'generate_trie' failed
make: *** [generate_trie] Error 1
Did I miss something? …
lissyx
Like reading the documentation? Fetching generate_trie from our prebuilt native_client.tar.xz?
There’s no generate_trie target in the Makefile, that part depends on TensorFlow and thus Bazel.
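If you don't want to build it, the helper script fetches the prebuilt package (a sketch; check the script's --help for the exact flags on your checkout):

# Downloads and unpacks native_client.tar.xz, which includes generate_trie.
python util/taskcluster.py --arch cpu --target native_client_prebuilt
ls native_client_prebuilt/generate_trie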
I already said that I built the native client successfully, which is the only thing discussed in the documentation. Nothing else is discussed there - certainly nothing about fetching generate_trie from native_client.tar.xz.
lissyx
There are references to building generate_trie in native_client/README.md. Sorry if it's not clear enough; PRs to improve the docs are welcome.
Yes, thanks for pointing it out. I just downloaded the pre-built binaries and sure enough it's there. I had built the native_client binaries locally, using the bazel command from the docs, but perhaps something went wrong there … I dunno.
lissyx
Right, make sure your command line included //native_client:generate_trie; you should then have the binary in TensorFlow's bazel-bin/native_client/.
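Something like this, run from the TensorFlow checkout that contains the native_client symlink (options per the v0.4-era native_client/README.md; adjust to what your docs say):

bazel build --config=monolithic -c opt --copt=-O3 \
    //native_client:libdeepspeech.so //native_client:generate_trie
# The binary then lands in:
ls bazel-bin/native_client/generate_trie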
Thanks, the binaries were indeed under bazel-bin/native_client/ … I assumed they would appear under DeepSpeech/native_client … (maybe not a bad thing to add to the docs for those of us who are unacquainted with Bazel).
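For anyone else following along, invoking it should then look roughly like this (all paths are placeholders, and the argument order has changed between releases, so trust the binary's own usage message over this sketch):

./bazel-bin/native_client/generate_trie \
    path/to/alphabet.txt path/to/lm.binary path/to/output_trie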
lissyx
PRs are welcome. If it's not documented, it's likely we have not seen it as a pain point, and we might need help to explain it properly.