Native client not returning output

(William Vanwoensel) #1

I trained a very basic model based on a single WAV file of ca. 2m, using the same single file for training, validating and testing (with 1 epoch):

python3 ./ --train_files ../data/Voice_180207/train.csv --dev_files ../data/Voice_180207/train.csv --test_files ../data/Voice_180207/train.csv --epoch 1 --export_dir ../models/Voice_180207

After building the native client, I tried running it using the following code:

ARGS="--model ../../models/Voice_180207/output_graph.pb --alphabet ../../models/Voice_180207/alphabet.txt --audio ../../../audio/Voice_180207_1.wav" make run

This returns the following:

LD_LIBRARY_PATH=/home/william/speech/deepspeech/tensorflow/bazel-bin/native_client: ./deepspeech --model ../../models/Voice_180207/output_graph.pb --alphabet ../../models/Voice_180207/alphabet.txt --audio ../../../audio/Voice_180207_1.wav
TensorFlow: b'v1.12.0-rc0-1797-g059c37c22c'
DeepSpeech: v0.4.0-alpha.1-12-gf69db72
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-12-14 14:59:51.008355: I tensorflow/core/platform/] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA

It seems that the script returns with exit code 0, however:

echo $?

Does the problem perhaps lie with the simplicity of the trained model? … I just wanted to try a baseline where a model is trained and tested on the same WAV file (mostly, to test the training part of DeepSpeech) - clearly the performance will not be realistic.
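One way to tell "the client failed" apart from "the client ran but decoded nothing" is to capture the exit code and stdout together. The following is only a minimal sketch: `run_and_report` is a hypothetical helper, not part of DeepSpeech, and it assumes the client prints the transcript on stdout and its log messages on stderr.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and return (exit_code, transcript).

    Hypothetical helper for illustration. Assumption: the transcript is
    written to stdout, while version and log lines go to stderr.
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.returncode, result.stdout.strip()


if __name__ == "__main__":
    # Stand-in command that exits with 0 and prints nothing,
    # mimicking the behaviour described in this post.
    code, transcript = run_and_report([sys.executable, "-c", "pass"])
    if code == 0 and not transcript:
        print("exit code 0 but empty transcript: the model ran and decoded nothing")
```

Replacing the stand-in command with the real client invocation would show whether the empty result comes with a non-zero exit code or not.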

(Lissyx) #2

Yeah, with one WAV sample, the default width of 2048 and only one epoch, you are right: it’s trained and running, but it’s just unable to learn anything, so the output is an empty string.

You should try with bin/ for that kind of testing; it’s designed to verify overfitting on a single file.

(William Vanwoensel) #3

Thanks. How would you propose dealing with a total audio set of ca. 20min … I would think using less than the standard 70 epochs?

(Lissyx) #4

You might want to play with hyperparameters, and for sure, you want --n_hidden with a much smaller value than 2048
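As a rough sketch of what that could look like, the snippet below assembles the training flags with a smaller `--n_hidden`. It only uses flags that already appear in this thread. `build_train_args` is a hypothetical helper, and the values 512 and 30 are illustrative assumptions, not recommendations.

```python
def build_train_args(csv_path, export_dir, n_hidden, epochs):
    """Return the training flags as a list of strings.

    Hypothetical helper. The same CSV is reused for train/dev/test,
    as in the first post of this thread.
    """
    return [
        "--train_files", csv_path,
        "--dev_files", csv_path,
        "--test_files", csv_path,
        "--epoch", str(epochs),
        "--n_hidden", str(n_hidden),
        "--export_dir", export_dir,
    ]


if __name__ == "__main__":
    # 512 and 30 are placeholder values chosen for illustration only.
    args = build_train_args(
        "../data/Voice_180207/train.csv",
        "../models/Voice_180207",
        n_hidden=512,
        epochs=30,
    )
    print(" ".join(args))
```

The printed flags can be appended to the training command used earlier, which makes it easy to try several widths and epoch counts in a loop.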

(Lissyx) #5

@william.van.woensel Also, with such a small amount of data, you might be a good fit for transfer learning. I’ll defer to @josh_meyer for the specifics :slight_smile:

(William Vanwoensel) #6

Thanks for your feedback.

It would be ideal if I could use a pre-trained model - but our audio contains quite a few medical terms that, using the available pre-trained model, are not being transcribed properly.

Training our own model using an existing medical corpus would be an option as well. But our own set of audio files is way too small and I haven’t had much luck finding existing ones.

Is the current pre-trained model based on Common Voice? Do there happen to be domain-specific subsets of the CV corpus?

(Jahir) #7

You can train using the released checkpoints using your own data (I think that’s what lissyx meant).

(William Vanwoensel) #8

Thanks for pointing me towards bin/ Just out of curiosity, how was the value for n_hidden obtained (it’s quite specific, i.e., 494)?

(Lissyx) #9

We use some Common Voice data. Regarding domain-specific datasets, I don’t think there are any. What you can do, however, is train from the English model, if you have a few hours of specific data, and then make a more specialized language model: that should help a lot in your case.

We’re working on API changes to allow using multiple language models, so in your case you could build one from medical terms.

(William Vanwoensel) #10

With the talk about language models I found this Discourse post and this blog post.

Am I right in assuming that, by creating a language model, one could improve accuracy without having to train your own network? In that case, I’m quite excited. I will try the steps outlined in the Discourse post.
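To make the idea concrete, here is a toy sketch of how a domain language model can change which transcription wins without touching the acoustic network. It is not how DeepSpeech's decoder is implemented: the unigram model, the candidate list, and the acoustic scores are all made-up assumptions for illustration.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Build add-one-smoothed unigram log-probabilities from domain text."""
    counts = Counter(word for s in sentences for word in s.split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # +1 reserves probability mass for unseen words

    def logprob(word):
        return math.log((counts[word] + 1) / (total + vocab_size))

    return logprob


def rescore(candidates, logprob, lm_weight=1.0):
    """Pick the candidate maximising acoustic score + lm_weight * LM score."""
    def total_score(item):
        text, acoustic = item
        return acoustic + lm_weight * sum(logprob(w) for w in text.split())

    return max(candidates, key=total_score)[0]


if __name__ == "__main__":
    # Made-up domain text and made-up acoustic log-scores.
    lm = train_unigram(["the patient has tachycardia", "tachycardia was treated"])
    candidates = [
        ("the patient has tacky cardio", -4.0),
        ("the patient has tachycardia", -4.5),
    ]
    print(rescore(candidates, lm, lm_weight=0.0))  # acoustic score alone decides
    print(rescore(candidates, lm, lm_weight=1.0))  # domain LM shifts the choice
```

With the language model switched off, the acoustically better but wrong candidate wins. With it switched on, the candidate containing the medical term is preferred.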

(Lissyx) #11

You should follow the steps documented in the repo, under data/lm/ :slight_smile:

(William Vanwoensel) #12

Thanks for all your help. I’ve generated a language model but I cannot build generate_trie - I successfully built the native client using make deepspeech but this doesn’t create a generate_trie executable.

Running make generate_trie gives the following:

c++     -Wl,--no-as-needed -Wl,-rpath,\$ORIGIN -L/home/william/speech/deepspeech/tensorflow/bazel-bin/native_client  -ldeepspeech   generate_trie.cpp   -o generate_trie
In file included from generate_trie.cpp:5:0:
ctcdecode/scorer.h:9:10: fatal error: lm/enumerate_vocab.hh: No such file or directory
 #include "lm/enumerate_vocab.hh"
compilation terminated.
<builtin>: recipe for target 'generate_trie' failed
make: *** [generate_trie] Error 1

Did I miss something? …

(Lissyx) #13

Like reading the documentation? Fetching generate_trie from our prebuilt native_client.tar.xz?

There’s no generate_trie target in the Makefile, that part depends on TensorFlow and thus Bazel.

(William Vanwoensel) #14

I already said that I built native client successfully, which is the only thing discussed in the documentation. Nothing else is discussed there - certainly nothing about fetching generate-trie from native_client.tar.xz.

(Lissyx) #15

There are references to building generate_trie in native_client/ Sorry if it’s not clear enough; PRs to improve the docs are welcome.

We also had, I’m pretty sure, docs covering the fact that generate_trie is bundled in native_client.tar.xz, but I cannot find them anymore. Not sure what happened. So just grab it from there and that should be fine, or use the documented util/

(William Vanwoensel) #16

yes, thanks for pointing it out. I just downloaded the pre-built binaries and sure enough it’s there. I had built the native_client binaries locally, using the bazel command from the docs, but perhaps something went wrong there … I dunno.

(Lissyx) #17

Right, make sure your command line did include //native_client:generate_trie, and you should have the binary in TensorFlow’s bazel-bin/native_client/

(William Vanwoensel) #18

thanks, the binaries were indeed under bazel-bin/native_client/ … I assumed they would have appeared under DeepSpeech/native_client … (maybe not a bad thing to add to the docs for those of us who are unacquainted with Bazel)
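For anyone else unsure where Bazel put the output, a small search over the likely directories can help. This is only a sketch: `find_binary` is a hypothetical helper, and the two directory names in the usage example are assumptions based on the locations mentioned in this thread, so adjust them to your own checkout.

```python
import os


def find_binary(name, candidate_dirs):
    """Return the first path under candidate_dirs holding an executable
    file called name, or None if there is no such file."""
    for directory in candidate_dirs:
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


if __name__ == "__main__":
    # Assumed layout: adjust both paths to match your own checkouts.
    candidates = [
        os.path.join("DeepSpeech", "native_client"),
        os.path.join("tensorflow", "bazel-bin", "native_client"),
    ]
    print(find_binary("generate_trie", candidates))
```

The helper prints `None` when the binary is in neither place, which usually means the Bazel target was not part of the build command.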

(Lissyx) #19

PRs are welcome, if it’s not documented it’s likely we have not seen that as a pain point, and we might need help to properly explain it :slight_smile:

(William Vanwoensel) #20

ok, just submitted a minor PR for this issue