Hi @lissyx,
I was able to generate a language model and test it using the steps below. Thank you for your help.
- Created a virtualenv:
- virtualenv -p python3 $HOME/tmp/deepspeech-venv/
- source $HOME/tmp/deepspeech-venv/bin/activate
- Cloned the DeepSpeech git repo, branch v0.5.1, with git lfs:
- git clone --branch v0.5.1 https://github.com/mozilla/DeepSpeech.git
- Installed the dependencies:
- pip3 install -r requirements.txt
- pip3 uninstall tensorflow
- pip3 install 'tensorflow-gpu==1.13.1'
- Installed the CTC decoder:
- pip3 install $(python3 util/taskcluster.py --decoder)
- Downloaded native_client from https://github.com/mozilla/DeepSpeech/releases/download/v0.5.1/native_client.amd64.cuda.linux.tar.xz and extracted it into an external_native_client folder in the same directory as DeepSpeech.
- Navigated to DeepSpeech/native_client
- Deleted the existing kenlm folder: rm -rf kenlm
- Cloned the kenlm repo: git clone https://github.com/kpu/kenlm.git
- In the kenlm directory, created a build folder
- Navigated to the build folder and executed:
- cmake ..
- make -j 4
- From the /DeepSpeech/native_client/kenlm/build/bin directory, executed:
- ./lmplz --text /home/laxmikantm/proto_1/vocabulary.txt --arpa /tmp/words.arpa --order 5 --discount_fallback --temp_prefix /tmp/
- ./build_binary -T -s trie /tmp/words.arpa /tmp/lm.binary
- Then, using the binaries in external_native_client, created the trie file:
- ./generate_trie /home/laxmikantm/DeepSpeech/data/alphabet.txt /tmp/lm.binary /tmp/trie
- After the above steps I had trie and lm.binary in the /tmp folder; I copied these files to a new folder and used them from there.
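For context, the vocabulary file passed to lmplz above is just plain text with one sentence per line. A minimal sketch of preparing one (corpus.txt and the exact cleanup rules here are my assumptions, not part of the original steps):

```shell
# Hypothetical example corpus; in practice this is your own text data.
cat > corpus.txt <<'EOF'
Hello world!
This is a TEST sentence.
EOF

# Lowercase and strip punctuation so every character appears in
# alphabet.txt, keeping one sentence per line for lmplz.
tr '[:upper:]' '[:lower:]' < corpus.txt \
  | tr -d '[:punct:]' > vocabulary.txt

cat vocabulary.txt
```

The cleanup matters because characters outside alphabet.txt can cause trouble at decode time.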
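One quick sanity check worth doing on the generated ARPA file before build_binary: the \data\ header lists one "ngram N=" line per order, so a model built with --order 5 should show five. A sketch using a tiny inline stand-in (substitute /tmp/words.arpa to check the real file):

```shell
# Fake a minimal ARPA header purely for illustration; the real file
# from the steps above would be /tmp/words.arpa.
cat > /tmp/example.arpa <<'EOF'
\data\
ngram 1=3
ngram 2=2
EOF

# Model order = number of "ngram N=" lines in the \data\ section.
order=$(grep -c '^ngram ' /tmp/example.arpa)
echo "model order: $order"
```

For this two-line header the check prints "model order: 2"; on /tmp/words.arpa from the steps above you would expect 5.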
Thanks!!