DeepSpeech model training

If you don't want early stopping and want to complete all epochs, set --noearly_stop.
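For example, a sketch of what such an invocation could look like (the file paths and other flags are placeholders, not specific to this thread); the command is only echoed here so nothing is executed:

```shell
# Sketch of a 0.5.1-style training invocation with early stopping disabled.
# All paths below are placeholders - substitute your own CSV files.
TRAIN_CMD="python3 DeepSpeech.py \
  --train_files data/train.csv \
  --dev_files data/dev.csv \
  --test_files data/test.csv \
  --noearly_stop"
echo "$TRAIN_CMD"
```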


Hi,
Yeah, but if the model loss is not decreasing then the model won't be of much use, right?
As suggested by @lissyx, I have started training with a 1e-6 learning rate; currently it's running on the 7th epoch.

I want to check whether, like the learning rate, I need to change the default value of any other parameters. Is there a good reference where I could get more information about the training parameters?

Hi @lissyx,
I was able to verify the re-training of the acoustic model (output_graph.pb) based on your comment, using the other parameters from the release notes of the 0.5.1 model.

Now we want to verify whether we can fine-tune the language model (lm.binary & trie) with our domain-related keywords. I followed the two discussions below, and what I understood is that "fine-tuning of the language model is not possible yet"… is that right?

You are basing your understanding on very old threads. Have a look at data/lm, it has everything you need.

@laxmikant04.yadav: if you are still looking for DeepSpeech results on German and details of the training process, check this paper and repository. It might be useful.

https://www.researchgate.net/publication/336532830_German_End-to-end_Speech_Recognition_based_on_DeepSpeech


Hi @lissyx,

I was able to create new trie and lm.binary files based on our organisation-specific keywords. I followed the two references below, and I am using DeepSpeech version 0.5.1:

1. TUTORIAL : How I trained a specific french model to control my robot
2. https://github.com/mozilla/DeepSpeech/tree/v0.5.1/data/lm

When I started training with the newly generated trie and lm.binary files to generate the acoustic model, train.csv and dev.csv gave no errors, but I got a fatal error at the test step:

Fatal Python error: Segmentation fault

When I looked around the forum, in some places I found that it could be because of a version mismatch. Could that be the case? If yes, where should I look first to get it right?

Note: When I train with the trie and lm.binary that are in the git repo, it works fine.
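One quick way to check for such a mismatch is to compare the installed ds-ctcdecoder wheel against the release you expect (the "0.5.1" below is an example; in a fresh shell the wheel may simply be absent):

```shell
# Compare the expected DeepSpeech release with the installed ds-ctcdecoder
# wheel; a mismatch between the two is a common cause of segfaults.
expected="0.5.1"
installed=$(pip3 show ds-ctcdecoder 2>/dev/null | awk '/^Version:/ {print $2}')
if [ -z "$installed" ]; then
  result="ds-ctcdecoder not installed"
elif [ "$installed" = "$expected" ]; then
  result="versions match: $installed"
else
  result="version mismatch: $installed vs $expected"
fi
echo "$result"
```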

Make sure you properly create the LM and trie files as documented in data/lm. The tutorial is likely out of date, so I'd advise not to spend too much time on it.

What's the size of your LM and trie? Can you verify you are using the right ones? Can you share your exact steps for producing the trie?

Hi @lissyx,

Below are the steps I used for generating the trie and lm.binary files.

Generating the language model –

  • Clone the DeepSpeech git repo, branch 0.5.1, with git lfs
  • Install the dependencies
    • pip3 install -r requirements.txt
  • Install the CTC decoder
    • pip3 install $(python3 util/taskcluster.py --decoder)
  • Clone TensorFlow in the same directory as DeepSpeech
  • In the TensorFlow directory run
    • git checkout origin/r1.13
  • As the TensorFlow version is 1.13, the matching Bazel version is 0.19.2
  • I am using Ubuntu 16.04, so based on https://docs.bazel.build/versions/master/install-ubuntu.html I executed the commands below
    • sudo apt-get install pkg-config zip g++ zlib1g-dev unzip python3
    • Downloaded the Bazel installer - bazel-0.19.2-installer-linux-x86_64.sh
    • chmod +x bazel-0.19.2-installer-linux-x86_64.sh
    • ./bazel-0.19.2-installer-linux-x86_64.sh --user
      • Used all the default/recommended options
    • export PATH="$PATH:$HOME/bin"
  • Navigated to the tensorflow directory and executed
    • ./configure
      • Used all the default/recommended options
    • ln -s ../DeepSpeech/native_client ./
    • bazel build --config=monolithic -c opt --copt=-O3 --copt="-D_GLIBCXX_USE_CXX11_ABI=0" --copt=-fvisibility=hidden //native_client:libdeepspeech.so //native_client:generate_trie
  • So far, I was able to compile DeepSpeech and I could see the binaries in the /tensorflow/bazel-bin/native_client directory.
  • Navigated to DeepSpeech/native_client,
    • Cloned the kenlm repo - git clone --depth 1 https://github.com/kpu/kenlm
    • In the kenlm directory, created a build folder
    • Navigated to the build folder and executed
      • cmake ..
      • make -j 4
  • After the above step I could see lmplz and build_binary in the /DeepSpeech/native_client/kenlm/build/bin directory
  • From the /DeepSpeech/native_client/kenlm/build/bin directory, executed
    • ./lmplz --order 5 --memory 50% --text /home/laxmikantm/proto_1/vocabulary.txt --arpa /tmp/lm.arpa --prune 0 0 0 1 --temp_prefix /tmp/
    • ./build_binary -a 255 -q 8 trie /tmp/lm.arpa /tmp/lm.binary
  • From the /tensorflow/bazel-bin/native_client directory
    • ./generate_trie /home/xxxxxxx/proto_1/vocabulary.txt /tmp/lm.binary /tmp/trie
  • After the above step I had trie and lm.binary in the tmp folder; I copied these files to a new folder and then used them from there.
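For anyone following along: the vocabulary.txt that lmplz consumes in the steps above is just plain text, one normalized sentence per line. A made-up example (these sentences are invented for illustration):

```shell
# A toy vocabulary file of the kind lmplz expects: one sentence per line,
# lowercased, no punctuation. Contents are invented for illustration.
cat > /tmp/vocabulary.txt <<'EOF'
turn the robot left
turn the robot right
stop the robot now
EOF
wc -l < /tmp/vocabulary.txt
```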

For testing purposes I have only 10 files, and the generated file sizes are:
trie - 75 bytes
lm.binary - 9.4K

You don't need to rebuild libdeepspeech.so or generate_trie, just download the prebuilt ones.

Check your trie creation, 75 bytes is wrong.
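A quick way to eyeball the generated artifacts (paths as used earlier in this thread); a trie of only a few dozen bytes usually means the generate_trie step got the wrong inputs:

```shell
# Print the sizes of the generated LM artifacts; report missing files
# instead of failing, so this is safe to run at any point.
for f in /tmp/lm.binary /tmp/trie; do
  if [ -f "$f" ]; then
    ls -l "$f"
  else
    echo "$f: missing"
  fi
done
```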

You may need to adjust the lmplz parameters as well, if you don't have a lot of data.

I don’t see a python virtualenv being setup, you should use one.

Especially this might get tricky; can you run pip3 list and share the output?

Hi @lissyx

Here is the list -

Package Version


absl-py 0.8.1
astor 0.8.0
attrdict 2.0.1
audioread 2.1.8
bcrypt 3.1.7
beautifulsoup4 4.8.1
bs4 0.0.1
certifi 2019.9.11
cffi 1.13.2
chardet 3.0.4
cryptography 2.8
cycler 0.10.0
decorator 4.4.1
ds-ctcdecoder 0.5.1
gast 0.3.2
grpcio 1.25.0
h5py 2.10.0
idna 2.8
joblib 0.14.0
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.0
kiwisolver 1.1.0
librosa 0.7.1
llvmlite 0.30.0
Markdown 3.1.1
matplotlib 3.0.3
mock 3.0.5
numba 0.46.0
numpy 1.15.4
pandas 0.24.2
paramiko 2.6.0
pip 19.3.1
pkg-resources 0.0.0
progressbar2 3.47.0
protobuf 3.10.0
pycparser 2.19
PyNaCl 1.3.0
pyparsing 2.4.4
python-dateutil 2.8.1
python-utils 2.3.0
pytz 2019.3
pyxdg 0.26
requests 2.22.0
resampy 0.2.2
scikit-learn 0.21.3
scipy 1.3.1
setuptools 41.6.0
six 1.13.0
SoundFile 0.10.2
soupsieve 1.9.5
sox 1.3.7
tensorboard 1.13.1
tensorflow 1.13.1
tensorflow-estimator 1.13.0
termcolor 1.1.0
urllib3 1.25.6
Werkzeug 0.16.0
wheel 0.33.6

I created a virtualenv; sorry, I forgot to mention it.

Sorry, I didn't get that. You mean downloading these files externally?
I can see generate_trie.cpp in the /DeepSpeech/native_client folder.

PS: forgive me for replying separately, I will keep it in mind from next time.

Yeah, generate_trie is bundled in native_client.tar.xz
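For completeness, unpacking it looks like this (tarball name taken from the 0.5.1 release asset mentioned earlier; the guard just reports if it hasn't been downloaded yet):

```shell
# Unpack the prebuilt native client next to DeepSpeech; generate_trie is
# one of the binaries inside the tarball.
TARBALL=native_client.amd64.cuda.linux.tar.xz
mkdir -p external_native_client
if [ -f "$TARBALL" ]; then
  tar -xf "$TARBALL" -C external_native_client
  ls external_native_client
else
  echo "$TARBALL not found - download it from the release page first"
fi
```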

Hi @lissyx,

I was able to generate the language model and test it using the steps below. Thank you for your help.

  • Created a virtualenv -
    • virtualenv -p python3 $HOME/tmp/deepspeech-venv/
    • source $HOME/tmp/deepspeech-venv/bin/activate
  • Cloned the DeepSpeech git repo, branch 0.5.1, with git lfs
  • Installed the dependencies
    • pip3 install -r requirements.txt
    • pip3 uninstall tensorflow
    • pip3 install 'tensorflow-gpu==1.13.1'
  • Installed the CTC decoder
    • pip3 install $(python3 util/taskcluster.py --decoder)
  • Downloaded the native client from https://github.com/mozilla/DeepSpeech/releases/download/v0.5.1/native_client.amd64.cuda.linux.tar.xz and extracted it into an external_native_client folder in the same directory as DeepSpeech.
  • Navigated to DeepSpeech/native_client,
    • Deleted the existing kenlm folder - rm -rf kenlm
    • Cloned the kenlm repo - git clone https://github.com/kpu/kenlm.git
    • In the kenlm directory, created a build folder
    • Navigated to the build folder and executed
      • cmake ..
      • make -j 4
  • From the /DeepSpeech/native_client/kenlm/build/bin directory, executed
    • ./lmplz --text /home/laxmikantm/proto_1/vocabulary.txt --arpa /tmp/words.arpa --order 5 --discount_fallback --temp_prefix /tmp/
    • ./build_binary -T -s trie /tmp/words.arpa /tmp/lm.binary
  • Then, using the external_native_client files, created the trie file
    • ./generate_trie /home/laxmikantm/DeepSpeech/data/alphabet.txt /tmp/lm.binary /tmp/trie
  • After the above step I had trie and lm.binary in the tmp folder; I copied these files to a new folder and then used them from there.
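A sketch of how the new files could be smoke-tested with the 0.5.1 client (the model and audio paths are placeholders); the command is echoed here rather than executed:

```shell
# 0.5.1-era client flags: --alphabet/--lm/--trie. Paths are placeholders;
# substitute your own acoustic model and a real WAV file.
INFER_CMD="deepspeech \
  --model output_graph.pb \
  --alphabet data/alphabet.txt \
  --lm /tmp/lm.binary \
  --trie /tmp/trie \
  --audio test.wav"
echo "$INFER_CMD"
```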

Thanks!!

Thanks @laxmikant04.yadav. Have you been able to identify the mistake you made before? If you can explain it clearly, it might help others.

Hi @lissyx,

Below are the major changes that made it work for me -

  • I am working with the DeepSpeech 0.5.1 model, but for language model creation I was referring to the latest docs, which are for version 0.6.0-alpha.14. So please refer to the docs for the correct version.

  • I was setting up TensorFlow and Bazel separately, but as you advised we don't need to do that. We can get native_client.tar.xz from the release page.

  • Instead of cloning kenLM as
    git clone --depth 1 https://github.com/kpu/kenlm , I cloned it as
    git clone https://github.com/kpu/kenlm.git . (I am not sure if it made a difference.)

  • Again, not so sure, but when I set up with plain tensorflow I got a segmentation fault (core dumped) error while testing the files, and when I did the set-up again in a new virtualenv with tensorflow-gpu it worked fine for me. (It could be an issue with the virtualenv.)

  • And most important, DO set up in a new virtualenv :smiley:

Thanks!!


If you are referring to the links, those are right if you select the right branch.

I’m not sure why people constantly go to the full build, if you see the pattern in the doc that leads to that, please feel free to open an issue / send a PR.

Should make no difference

Was your first virtualenv fresh, or an old one? I ran into that as well, with an old virtualenv on a Debian sid that gets upgraded regularly.

Hi @lissyx,

This comment may look like a repeat of the ones discussed earlier; forgive me for that.

But here are my observations -

  1. The newly trained language and acoustic models work well on the data I trained on.
  2. But when I recorded audio files for a few sentences (10 sentences) and created a whole new language and acoustic model, it is not giving good accuracy (very poor accuracy).

I know there is a bit of background noise, but we were expecting it to work on the training files, and as it was a fresh training run it was not using any previous files/checkpoints.

I tried changing the following hyper-parameters -
test batch size - 1/2/3
dev batch size - 1/2/3
train batch size - 1/2/3
learning rate - 0.001/0.0001/0.00001
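For the record, the sweep above can be scripted (flag names as in 0.5.1's DeepSpeech.py; batch sizes fixed at 1 for illustration); the commands are echoed here rather than executed:

```shell
# Echo one training command per learning rate; the values mirror the ones
# listed above. Replace echo with the real invocation to actually run it.
for lr in 0.001 0.0001 0.00001; do
  echo python3 DeepSpeech.py \
    --train_batch_size 1 --dev_batch_size 1 --test_batch_size 1 \
    --learning_rate "$lr"
done
```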

What I wanted to check is: do I really need to clean the audio files to have literally zero background noise, or are there training parameters that I need to tune to get it right?

Any suggestions please.

Sorry, I don’t get the exact question here.