Training a Vietnamese model

Following the result of my first training run, I don’t know whether this is a bad or a good result.


  1. Data: my dataset is about 12,000 wav files (train:dev:test = 9,500:1,700:800), and the transcripts are clean.

  2. I created alphabet.txt containing every character that appears in the transcripts.
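
For reference, a minimal shell sketch of that step, assuming the standard DeepSpeech CSV column order (wav_filename, wav_filesize, transcript) and GNU tools; the CSV names are the ones from the training command below:

# Collect every distinct character from the transcript column of the CSVs.
# Inspect the output by hand afterwards: the space character must be kept,
# and a literal '#' needs attention since alphabet.txt treats '#' lines as comments.
tail -q -n +2 train.csv dev.csv test.csv \
  | cut -d',' -f3- \
  | grep -o . \
  | sort -u > alphabet.txt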

  3. Language model: I built the LM from 100% of the transcript text of the data above.

./lmplz --text vocabulary.txt --arpa words.arpa -o 3
./build_binary -T -s words.arpa lm.binary

The lm.binary file is about 4.1 MB. Then I used the LM to generate the trie file (the trie is about 65 KB).
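
The trie itself comes from the generate_trie tool in the native client. Its argument list has changed between releases, so check generate_trie --help in your checkout; on the 0.4.x tree it is roughly:

# alphabet, binary LM, LM vocabulary, output trie path (0.4.x order; verify locally)
./generate_trie alphabet.txt lm.binary vocabulary.txt vntrie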

  4. Then I started training the stock DeepSpeech.py with this command line:
python3 DeepSpeech.py \
--train_files=/Users/tringuyen/Documents/DeepSpeech/train.csv \
--test_files=/Users/tringuyen/Documents/DeepSpeech/test.csv \
--dev_files=/Users/tringuyen/Documents/DeepSpeech/dev.csv \
--alphabet_config_path=/Users/tringuyen/Documents/DeepSpeech/mymodels/alphabet.txt \
--lm_binary_path=/Users/tringuyen/Documents/DeepSpeech/mymodels/vnlm.binary \
--lm_trie_path=/Users/tringuyen/Documents/DeepSpeech/mymodels/vntrie \
--checkpoint_dir=/Users/tringuyen/Documents/DeepSpeech/myresult/checkpoints \
--export_dir=/Users/tringuyen/Documents/DeepSpeech/myresult/export \
--summary_dir=/Users/tringuyen/Documents/DeepSpeech/myresult/summary \
--epoch=80 \
--train_batch_size=64 \
--dev_batch_size=64 \
--test_batch_size=32 \
--report_count=100 \
--use_seq_length=False \
--es_std_th=0.1 \
--es_mean_th=0.1

After 4 epochs it stopped training and printed these results:

I Finished validating epoch 3 - loss: 377.712136
I Early stop triggered as (for last 4 steps) validation loss: 377.712136 with standard deviation: 4.922094 and mean: 363.410538
Preprocessing ['/Users/tringuyen/Documents/DeepSpeech/test.csv']
Preprocessing done
WARNING:tensorflow:From /usr/local/lib/python3.7/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
Computing acoustic model predictions...
100% |##########|
Decoding predictions...
100% |##########|
Test - WER: 0.997571, CER: 0.974905, loss: 185.690430
--------------------------------------------------------------------------------
WER: 1.000000, CER: 5.000000, loss: 33.746929
 - src: "bà nói"
 - res: "ở "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 5.000000, loss: 36.778442
 - src: "ôi dào"
 - res: "ở "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 8.000000, loss: 47.495766
 - src: "vì tối đó"
 - res: "ở "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 7.000000, loss: 47.781513
 - src: "trán giô"
 - res: "ở "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 9.000000, loss: 48.339012
 - src: "cái khó là"
 - res: "ở "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 8.000000, loss: 50.391266
 - src: "làm khung"
 - res: "ở "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 25.000000, loss: 108.753731
 - src: "đàn ông có hai thứ để buồn"
 - res: "ở "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 23.000000, loss: 108.777031
 - src: "là tương đối có hiệu quả"
 - res: "ở "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 22.000000, loss: 108.902710
 - src: "chín mươi chín mươi mốt"
 - res: "ở "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 25.000000, loss: 110.358658
 - src: "để chọn máy in cho phù hợp"
 - res: "ở "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 25.000000, loss: 110.703247
 - src: "tôi có thể căm giận đủ thứ"
 - res: "ở "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 23.000000, loss: 110.721542
 - src: "được mỹ ráo riết tung ra"
 - res: "ở "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 23.000000, loss: 110.758110
 - src: "nhưng giấc ngủ chập chờn"
 - res: "ở "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 24.000000, loss: 111.529968
 - src: "dọc hai bên sông ngàn sâu"
 - res: "ở "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 22.000000, loss: 111.639183
 - src: "làm đẹp không đúng cách"
 - res: "ở "
--------------------------------------------------------------------------------
WER: 1.000000, CER: 25.000000, loss: 112.300140
 - src: "hai mươi tám hai mươi chín"
 - res: "ở "
--------------------------------------------------------------------------------
I Exporting the model...
WARNING:tensorflow:From /usr/local/lib/python3.7/site-packages/tensorflow/python/tools/freeze_graph.py:232: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.convert_variables_to_constants
WARNING:tensorflow:From /usr/local/lib/python3.7/site-packages/tensorflow/python/framework/graph_util_impl.py:245: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.extract_sub_graph
I Models exported at /Users/tringuyen/Documents/DeepSpeech/myresult/export


I don’t know what is going on with this result. Is it bad or good after 4 epochs? What can I do to improve it?

I had the same issue while training on Chinese.
Let me get back to you later about what I did.
But I think it helps to enlarge the language model by adding other text to the transcript corpus.
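
If you try that, rebuilding is just a rerun of the KenLM steps over the enlarged text. A sketch, with extra_text.txt standing in for whatever external text you collect:

# Append external text to the transcript corpus and rebuild the LM
cat vocabulary.txt extra_text.txt > lm_text.txt
./lmplz -o 3 --text lm_text.txt --arpa words.arpa
./build_binary words.arpa lm.binary
# Remember to regenerate the trie afterwards; it must match the new LM.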


I’m still stuck on this issue. I hope someone can help me.

Add --early_stop False on the command line; these flags can be found in util/flags.py.
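
For example, the earlier training command with early stopping disabled (a sketch; keep the remaining flags from the original post unchanged):

python3 DeepSpeech.py \
  --train_files=/Users/tringuyen/Documents/DeepSpeech/train.csv \
  --dev_files=/Users/tringuyen/Documents/DeepSpeech/dev.csv \
  --test_files=/Users/tringuyen/Documents/DeepSpeech/test.csv \
  --checkpoint_dir=/Users/tringuyen/Documents/DeepSpeech/myresult/checkpoints \
  --early_stop=False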

No no, I just want to know whether my data is too bad. Why are all the outputs the same?

I think you need to change the batch size and the learning rate; your model is not converging.
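
Both are ordinary flags on the training script. The values below are only illustrative starting points, not tuned recommendations:

python3 DeepSpeech.py \
  --train_files=/Users/tringuyen/Documents/DeepSpeech/train.csv \
  --dev_files=/Users/tringuyen/Documents/DeepSpeech/dev.csv \
  --test_files=/Users/tringuyen/Documents/DeepSpeech/test.csv \
  --train_batch_size=16 \
  --dev_batch_size=16 \
  --learning_rate=0.00005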

I am working on Urdu and facing the same issue.
Please help if somebody has found a solution to this problem.
My data is almost 100 hours.

@lissyx @kdavis Kindly help.

What does “same issue” mean in this context?

My trained model gives me only one word for all files, whether I check it on train files or test files.

When I trained on a very small corpus, almost 6 files, it learned only one word, though that word doesn’t belong to any of the data, and it gives the same word whether I decode a train file or a test file.

And when I tried almost 100 hours, it doesn’t give a single word in the results.
Epochs: 20, with early stop = TRUE
Learning rate: 0.0001

Why is that?

What happens if you run run-ldc93s1.sh as follows:

(.virtualenv) kdavis-19htdh:DeepSpeech kdavis$ ./bin/run-ldc93s1.sh


WER: 1.000000, CER: 48.000000, loss: 27.773754
 - src: "she had your dark suit in greasy wash water all year"
 - res: "edted"

This is the result of ldc93s1.

I’m not sure how, but your install is very broken. This is basically a “smoke test” that every one of our PRs has to pass.

If I were you, I’d check everything out from scratch and follow the README instructions again.


Are we supposed to train the model with the virtual environment activated?

It’s recommended that you use a virtual environment.
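
For completeness, the usual setup looks roughly like this (the environment path is arbitrary):

# Create and activate a Python 3 virtual environment,
# then install the training dependencies inside it
virtualenv -p python3 ~/deepspeech-venv
source ~/deepspeech-venv/bin/activate
pip3 install -r requirements.txt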

I followed the GitHub guidelines throughout and am working on 0.4.1 master.
I also got the exported model with the best validation loss.

Installation:
Linux 16.04 LTS
CUDA 9.0
cuDNN 7.1.3
Python 3.6.3
DeepSpeech 0.4.1 master
requirements.txt installed, but with tensorflow-gpu == 1.12.0
Installed Git LFS from the link given on GitHub
Bazel 0.5.1
Downloaded the pre-trained model, checked it on Common Voice utterances; the results are almost 99% correct
Installed the CTC decoder
Prepared the data in CSV format
Built the native client
Prepared a language model with KenLM and generated the trie file

I got no errors during training, got the .pb model at the end, and the test step reported a WER, but the decoded output is empty.

Please let me know where I am going wrong.
Thank you for all the help.

There are many steps here and a problem can creep in anywhere.

To help in debugging, can you supply the final training log?

log.zip (106.9 KB)

Here it is.

Don’t know if it’s the encoding, but the log looks like line noise.