DeepSpeech model training

You don’t need to rebuild libdeepspeech.so or generate_trie; just download the prebuilt ones.

Check your trie creation, 75 bytes is wrong.
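A quick sanity check is simply to look at the size of the generated file; even for a small vocabulary, a valid trie should be far larger than a few dozen bytes. A minimal sketch, assuming the trie was written to /tmp/trie (an example path):

```bash
# Inspect the size of the generated trie; a handful of bytes means generation failed
ls -lh /tmp/trie
```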

You may need to adjust the lmplz parameters as well if you don’t have a lot of data.
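If lmplz complains because the corpus is too small, lowering the n-gram order and enabling fallback discounts usually helps. A hedged sketch, with placeholder file names:

```bash
# Build a lower-order LM from a small text corpus (file names are placeholders)
./lmplz --order 3 --discount_fallback \
        --text vocabulary.txt --arpa words.arpa
```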

I don’t see a python virtualenv being setup, you should use one.

Especially this part might get tricky. Can you run pip3 list and share the output?

Hi @lissyx

Here is the list:

Package Version


absl-py 0.8.1
astor 0.8.0
attrdict 2.0.1
audioread 2.1.8
bcrypt 3.1.7
beautifulsoup4 4.8.1
bs4 0.0.1
certifi 2019.9.11
cffi 1.13.2
chardet 3.0.4
cryptography 2.8
cycler 0.10.0
decorator 4.4.1
ds-ctcdecoder 0.5.1
gast 0.3.2
grpcio 1.25.0
h5py 2.10.0
idna 2.8
joblib 0.14.0
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.0
kiwisolver 1.1.0
librosa 0.7.1
llvmlite 0.30.0
Markdown 3.1.1
matplotlib 3.0.3
mock 3.0.5
numba 0.46.0
numpy 1.15.4
pandas 0.24.2
paramiko 2.6.0
pip 19.3.1
pkg-resources 0.0.0
progressbar2 3.47.0
protobuf 3.10.0
pycparser 2.19
PyNaCl 1.3.0
pyparsing 2.4.4
python-dateutil 2.8.1
python-utils 2.3.0
pytz 2019.3
pyxdg 0.26
requests 2.22.0
resampy 0.2.2
scikit-learn 0.21.3
scipy 1.3.1
setuptools 41.6.0
six 1.13.0
SoundFile 0.10.2
soupsieve 1.9.5
sox 1.3.7
tensorboard 1.13.1
tensorflow 1.13.1
tensorflow-estimator 1.13.0
termcolor 1.1.0
urllib3 1.25.6
Werkzeug 0.16.0
wheel 0.33.6

I created a virtualenv, sorry I forgot to mention that.

Sorry, I didn’t get that. You mean downloading these files externally?
I can see generate_trie.cpp in the /DeepSpeech/native_client folder.

PS: Forgive me for replying separately, I will keep that in mind next time.

Yeah, generate_trie is bundled in native_client.tar.xz
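For reference, grabbing the prebuilt binaries just means downloading and extracting the release tarball; the sketch below uses the 0.5.1 CUDA build linked later in this thread, and the target folder name is only an example:

```bash
# Fetch the prebuilt native client instead of rebuilding it from source
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.5.1/native_client.amd64.cuda.linux.tar.xz
mkdir -p external_native_client
tar -xJf native_client.amd64.cuda.linux.tar.xz -C external_native_client
ls external_native_client/generate_trie   # the prebuilt trie generator is included
```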

Hi @lissyx,

I was able to generate the language model and test it using the steps below. Thank you for your help.

  • Created virtualenv -
    • virtualenv -p python3 $HOME/tmp/deepspeech-venv/
    • source $HOME/tmp/deepspeech-venv/bin/activate
  • Clone the DeepSpeech git repo, branch 0.5.1, with git lfs
  • install the dependencies
    • pip3 install -r requirements.txt
    • pip3 uninstall tensorflow
    • pip3 install 'tensorflow-gpu==1.13.1'
  • install CTC decoder
    • pip3 install $(python3 util/taskcluster.py --decoder)
  • Downloaded native_client from https://github.com/mozilla/DeepSpeech/releases/download/v0.5.1/native_client.amd64.cuda.linux.tar.xz and extracted it into an external_native_client folder in the same directory as DeepSpeech.
  • Navigated to DeepSpeech/native_client,
    • Delete existing kenlm folder - rm -rf kenlm
    • Clone kenlm repo - git clone https://github.com/kpu/kenlm.git
    • In kenlm directory, create a build folder
    • Navigate to build folder and execute
      • cmake ..
      • make -j 4
  • From the /DeepSpeech/native_client/kenlm/build/bin directory, executed
    • ./lmplz --text /home/laxmikantm/proto_1/vocabulary.txt --arpa /tmp/words.arpa --order 5 --discount_fallback --temp_prefix /tmp/
    • ./build_binary -T -s trie /tmp/words.arpa /tmp/lm.binary
  • Now, using the external_native_client files, created the trie file
    • ./generate_trie /home/laxmikantm/DeepSpeech/data/alphabet.txt /tmp/lm.binary /tmp/trie
  • After the above steps I had trie and lm.binary in the /tmp folder; I copied these files to a new folder and used them from there. (The full sequence is consolidated in the shell sketch below.)
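For anyone who wants to reproduce this, here is the same sequence consolidated into one shell sketch. The paths are the ones from my run and should be adapted; it assumes you start from the root of the DeepSpeech 0.5.1 checkout:

```bash
# 1. Python environment and DeepSpeech 0.5.1 dependencies
virtualenv -p python3 $HOME/tmp/deepspeech-venv/
source $HOME/tmp/deepspeech-venv/bin/activate
pip3 install -r requirements.txt
pip3 uninstall -y tensorflow
pip3 install 'tensorflow-gpu==1.13.1'
pip3 install $(python3 util/taskcluster.py --decoder)

# 2. Build the KenLM tools
cd native_client
rm -rf kenlm
git clone https://github.com/kpu/kenlm.git
mkdir kenlm/build && cd kenlm/build
cmake ..
make -j 4

# 3. Build the language model (vocabulary path is from my setup)
cd bin
./lmplz --text /home/laxmikantm/proto_1/vocabulary.txt --arpa /tmp/words.arpa \
        --order 5 --discount_fallback --temp_prefix /tmp/
./build_binary -T -s trie /tmp/words.arpa /tmp/lm.binary

# 4. Generate the trie with the prebuilt generate_trie from native_client.tar.xz
cd /path/to/external_native_client   # wherever the prebuilt tarball was extracted
./generate_trie /home/laxmikantm/DeepSpeech/data/alphabet.txt /tmp/lm.binary /tmp/trie
```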

Thanks!!

Thanks @laxmikant04.yadav. Have you been able to identify the mistake you made before? If you can describe it clearly, it might help others.

Hi @lissyx,

Below are the major changes that made me get it right:

  • I am working with DeepSpeech model 0.5.1, but for language model creation I was referring to the latest docs, which are for version 0.6.0-alpha.14. So please refer to the docs for the correct version.

  • I was setting up TensorFlow and Bazel separately, but as you advised, we don’t need to do that. We can get native_client.tar.xz from the release page.

  • Instead of cloning KenLM as
    git clone --depth 1 https://github.com/kpu/kenlm , I cloned it as
    git clone https://github.com/kpu/kenlm.git . (I am not sure if that made a difference.)

  • Again, not entirely sure, but when I did the setup with plain tensorflow I got a segmentation fault (core dumped) error while testing the files. When I did the setup again in a new virtualenv with tensorflow-gpu it worked fine for me. (It could have been an issue with the old virtualenv.)

  • And most important of all, DO the setup in a fresh virtualenv :smiley:

Thanks!!


If you are referring to the links, those are right if you select the right branch.

I’m not sure why people constantly go for the full build. If you see the pattern in the docs that leads to that, please feel free to open an issue / send a PR.

Should make no difference

Was your first virtualenv fresh, or an old one? I ran into that as well, with an old virtualenv on a Debian sid that gets upgraded regularly.

Hi @lissyx,

This comment may look like a repeat of what was discussed earlier; forgive me for that.

But here are my observations:

  1. The newly trained language and acoustic models work well on the data I trained on.
  2. But when I recorded audio files for a few sentences (10 sentences) and created a whole new language and acoustic model, it is not giving good accuracy (very poor accuracy).

I know there is a bit of background noise, but we were expecting it to work on the training files, and as it was a fresh training run it was not using any previous files/checkpoints.

I tried changing the following hyper-parameters (an example invocation is sketched after the list):
test batch size- 1/2/3
dev batch size- 1/2/3
train batch size- 1/2/3
learning rate - 0.001/0.0001/0.00001
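For reference, this is roughly how those values map onto a DeepSpeech.py 0.5.1 run. The CSV paths and export directory are placeholders, not from my setup, and ./DeepSpeech.py --helpfull lists the exact flag set for a given checkout:

```bash
# Example training invocation with the hyper-parameters listed above (paths are placeholders)
python3 DeepSpeech.py \
  --train_files data/train.csv \
  --dev_files data/dev.csv \
  --test_files data/test.csv \
  --alphabet_config_path data/alphabet.txt \
  --lm_binary_path /tmp/lm.binary \
  --lm_trie_path /tmp/trie \
  --train_batch_size 2 \
  --dev_batch_size 2 \
  --test_batch_size 2 \
  --learning_rate 0.0001 \
  --export_dir /tmp/export/
```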

What I wanted to check is: do I really need to clean the audio files to have literally zero background noise, or are there some training parameters I need to play with to get it right?

Any suggestions please

Sorry, I don’t get the exact question here.

Hi @lissyx

I was asking whether the DeepSpeech 0.5.1 model is expected to work only on clean data (i.e. audio clips generated from text using an online tool), or whether it would work on self-recorded audio clips as well (with very minimal noise).

Currently, I have a POC where it works well on clean data with a new language and acoustic model, but it is not working well on self-recorded audio clips.

I have tried the hyper-parameters mentioned above.

Thanks!!!

It should work.

The devil lies in the details. Please be explicit about “working good”, as well as “self-recorded audio clips”.

“Working good” means getting 95%+ accuracy on audio clips from the training data.

“Self-recorded audio clips”: we have replicated Mozilla’s voice-web project in our organisation. I recorded the audio clips using that.

When using self-recorded clips, the accuracy is very low, approximately 20%.

Ok, can you document your workflow in more detail?

What’s your accent ?

British English (Indian accent)

That alone can account for a big part of the difference :confused:

Hi @lissyx

We have set up the voice-web project locally.

  • I deleted all the English sentence text files from the repository.

  • Added a new text file with the following words/sentences, so that I get only these words in the UI when clicking the Speak button.

laxmikant
deepak
smartek
smart bot hub
hello there
speech to text
welcome
great
dheeraj
one

  • Recorded these words from the UI.

  • Collected the audio files from S3 and created a tsv file with the required details. (We have a Node.js script to perform this task.)

  • Used the bin/import_cv2.py script to convert the mp3s to wav and create a csv (a rough sketch follows after this list).

  • Then used the DeepSpeech.py script to create an acoustic model.

Note: A language model had already been created with the same keywords, using the steps listed in my earlier comment.
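A rough sketch of that import step follows; the corpus directory name is just an example and the flag usage is an assumption on my side, so please check python3 bin/import_cv2.py --help for the exact arguments in your checkout:

```bash
# Convert the collected mp3 clips + tsv into wav + csv files for training
# (directory layout and --filter_alphabet usage are assumptions; verify with --help)
python3 bin/import_cv2.py --filter_alphabet data/alphabet.txt /path/to/collected_corpus

# The generated CSV files are then passed to DeepSpeech.py via
# --train_files / --dev_files / --test_files, as in the invocation sketched earlier.
```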

Thanks!!

Yeah, I am aware of that :frowning:

I was hoping it would “work good” when purely trained on that accent, as the model will first be tested within the organisation with Indian accents. That is why this is important for us.