Pre-trained model became worse after I trained it on Common Voice data

Hi,

I was fine-tuning the pre-trained model, and once training finished I got output_graph.pb.
When I checked the resulting model, it had become worse than the released DeepSpeech 0.5.1 pre-trained model.

Here are my command and parameters. Please advise on the best parameters to use.

python -u DeepSpeech1.py \
   --n_hidden 2048 \
   --epochs 75 \
   --checkpoint_dir /home/speech/DeepSpeech/data/checkpoint2/ \
   --train_files /home/speech/DeepSpeech/data/corpus/clips/train.csv \
   --dev_files /home/speech/DeepSpeech/data/corpus/clips/dev.csv \
   --test_files /home/speech/DeepSpeech/data/corpus/clips/test.csv \
   --train_batch_size 24 \
   --dev_batch_size 48 \
   --test_batch_size 48 \
   --dropout_rate 0.15 \
   --learning_rate 0.0001 \
   --lm_binary_path /home/speech/DeepSpeech/data/mycreatedOG/lm.binary \
   --lm_trie_path /home/speech/DeepSpeech/data/mycreatedOG/trie \
   --export_dir /home/speech/DeepSpeech/data/export/ \
  "$@"

I am using DeepSpeech 0.5.1.

We can’t help without more context.

@lissyx

I downloaded the checkpoint from https://github.com/mozilla/DeepSpeech/releases/download/v0.5.1/deepspeech-0.5.1-checkpoint.tar.gz and the Mozilla Common Voice data from https://voice.mozilla.org/data.

I am using
Deepspeech 0.5.1
GPU RTX 4000
Ubuntu 18.04
Tensorflow-GPU 1.14.0

Training ran fine. Here is the log:

I Restored variables from most recent checkpoint at /home/karthik/speech/DeepSpeech/data/checkpoint/model.v0.5.1, step 467356
I STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 1:00:04 | Steps: 6642 | Loss: 41.552052
WARNING:tensorflow:From /home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py:960: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
W0912 15:14:06.021101 140096652371776 deprecation.py:323] From /home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py:960: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
Epoch 0 |   Training | Elapsed Time: 1:13:28 | Steps: 7574 | Loss: 44.842819                                                                                                                                                                                            
Epoch 0 | Validation | Elapsed Time: 0:04:15 | Steps: 1528 | Loss: 50.622232 | Dataset: /home/karthik/speech/DeepSpeech/data/corpus/clips/dev.csv                                                                                                                      
I Saved new best validating model with loss 50.622232 to: /home/karthik/speech/DeepSpeech/data/checkpoint/best_dev-474930
Epoch 1 |   Training | Elapsed Time: 1:13:23 | Steps: 7574 | Loss: 40.343765                                                                                                                                                                                            
Epoch 1 | Validation | Elapsed Time: 0:04:13 | Steps: 1528 | Loss: 47.958261 | Dataset: /home/karthik/speech/DeepSpeech/data/corpus/clips/dev.csv                                                                                                                      
I Saved new best validating model with loss 47.958261 to: /home/karthik/speech/DeepSpeech/data/checkpoint/best_dev-482504
Epoch 2 |   Training | Elapsed Time: 1:13:34 | Steps: 7574 | Loss: 37.761659                                                                                                                                                                                            
Epoch 2 | Validation | Elapsed Time: 0:04:19 | Steps: 1528 | Loss: 47.888508 | Dataset: /home/karthik/speech/DeepSpeech/data/corpus/clips/dev.csv                                                                                                                      
I Saved new best validating model with loss 47.888508 to: /home/karthik/speech/DeepSpeech/data/checkpoint/best_dev-490078
Epoch 3 |   Training | Elapsed Time: 1:13:21 | Steps: 7574 | Loss: 35.337711                                                                                                                                                                                            
Epoch 3 | Validation | Elapsed Time: 0:04:15 | Steps: 1528 | Loss: 47.695037 | Dataset: /home/karthik/speech/DeepSpeech/data/corpus/clips/dev.csv                                                                                                                      
I Saved new best validating model with loss 47.695037 to: /home/karthik/speech/DeepSpeech/data/checkpoint/best_dev-497652
Epoch 4 |   Training | Elapsed Time: 1:13:17 | Steps: 7574 | Loss: 33.512327                                                                                                                                                                                            
Epoch 4 | Validation | Elapsed Time: 0:04:14 | Steps: 1528 | Loss: 48.027997 | Dataset: /home/karthik/speech/DeepSpeech/data/corpus/clips/dev.csv                                                                                                                      
I Early stop triggered as (for last 4 steps) validation loss: 48.027997 with standard deviation: 0.111347 and mean: 47.847269
I FINISHED optimization in 6:28:28.351865
INFO:tensorflow:Restoring parameters from /home/karthik/speech/DeepSpeech/data/checkpoint/best_dev-497652
I0912 20:42:31.164370 140096652371776 saver.py:1280] Restoring parameters from /home/karthik/speech/DeepSpeech/data/checkpoint/best_dev-497652
I Restored variables from best validation checkpoint at /home/karthik/speech/DeepSpeech/data/checkpoint/best_dev-497652, step 497652
Testing model on /home/karthik/speech/DeepSpeech/data/corpus/clips/test.csv
Test epoch | Steps: 3014 | Elapsed Time: 0:18:08                                                                                                                                                                                                                        
Test on /home/karthik/speech/DeepSpeech/data/corpus/clips/test.csv - WER: 0.562949, CER: 0.372234, loss: 56.315739
--------------------------------------------------------------------------------
WER: 3.000000, CER: 1.777778, loss: 120.665932
 - wav: file:///home/karthik/speech/DeepSpeech/data/corpus/clips/common_voice_en_54384.wav
 - src: "undefined"
 - res: "then after canister "
--------------------------------------------------------------------------------
WER: 2.500000, CER: 2.764706, loss: 214.952164
 - wav: file:///home/karthik/speech/DeepSpeech/data/corpus/clips/common_voice_en_17645060.wav
 - src: "did you know that"
 - res: "the two now that the denotat titulo that the notation that"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.571429, loss: 11.507010
 - wav: file:///home/karthik/speech/DeepSpeech/data/corpus/clips/common_voice_en_18320583.wav
 - src: "nosiree"
 - res: "no there"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 1.250000, loss: 16.147278
 - wav: file:///home/karthik/speech/DeepSpeech/data/corpus/clips/common_voice_en_191353.wav
 - src: "amen"
 - res: "the man"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.363636, loss: 20.434793
 - wav: file:///home/karthik/speech/DeepSpeech/data/corpus/clips/common_voice_en_16047346.wav
 - src: "kettledrums"
 - res: "cattle drams"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.818182, loss: 25.313591
 - wav: file:///home/karthik/speech/DeepSpeech/data/corpus/clips/common_voice_en_629809.wav
 - src: "kettledrums"
 - res: "go dream"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.470588, loss: 25.920713
 - wav: file:///home/karthik/speech/DeepSpeech/data/corpus/clips/common_voice_en_283146.wav
 - src: "medley hotchpotch"
 - res: "men may hutch punch"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 1.000000, loss: 32.957829
 - wav: file:///home/karthik/speech/DeepSpeech/data/corpus/clips/common_voice_en_3514384.wav
 - src: "stay tuned"
 - res: "the tune a prop"
--------------------------------------------------------------------------------
WER: 1.833333, CER: 1.250000, loss: 144.522491
 - wav: file:///home/karthik/speech/DeepSpeech/data/corpus/clips/common_voice_en_680693.wav
 - src: "find me the saga air cavalry"
 - res: "i made a saucerful time i see i was covered for"
--------------------------------------------------------------------------------
WER: 1.750000, CER: 0.451613, loss: 66.565109
 - wav: file:///home/karthik/speech/DeepSpeech/data/corpus/clips/common_voice_en_137155.wav
 - src: "that's an inherent disadvantage"
 - res: "the then and heron as a vantage"
--------------------------------------------------------------------------------
I Exporting the model...
INFO:tensorflow:Restoring parameters from /home/karthik/speech/DeepSpeech/data/checkpoint/train-505226
I0912 21:00:46.095046 140096652371776 saver.py:1280] Restoring parameters from /home/karthik/speech/DeepSpeech/data/checkpoint/train-505226
WARNING:tensorflow:From /home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/tensorflow/python/tools/freeze_graph.py:233: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.convert_variables_to_constants`
W0912 21:00:46.182916 140096652371776 deprecation.py:323] From /home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/tensorflow/python/tools/freeze_graph.py:233: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.convert_variables_to_constants`
WARNING:tensorflow:From /home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/tensorflow/python/framework/graph_util_impl.py:270: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
W0912 21:00:46.183073 140096652371776 deprecation.py:323] From /home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/tensorflow/python/framework/graph_util_impl.py:270: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
INFO:tensorflow:Froze 12 variables.
I0912 21:00:46.220419 140096652371776 graph_util_impl.py:311] Froze 12 variables.
INFO:tensorflow:Converted 12 variables to const ops.
I0912 21:00:46.297007 140096652371776 graph_util_impl.py:364] Converted 12 variables to const ops.
I Models exported at /home/karthik/speech/DeepSpeech/data/export/

Let me know if you need anything other than this.
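
The mean and standard deviation printed in the "Early stop triggered" line can be reproduced from the log itself. The following is a minimal sketch, assuming (this is inferred from the numbers, not confirmed anywhere in the thread) that both statistics are taken over the three validation losses that precede the final one, using the population standard deviation:

```python
import statistics

# Validation losses for epochs 1-3, copied from the log above.
previous_losses = [47.958261, 47.888508, 47.695037]
# Validation loss for epoch 4, the value reported in the early-stop line.
latest_loss = 48.027997

mean_loss = statistics.mean(previous_losses)
std_loss = statistics.pstdev(previous_losses)  # population std (divides by N)

print(f"mean: {mean_loss:.6f}, std: {std_loss:.6f}")
# The final loss is higher than each of the three before it.
print("latest above all previous:", latest_loss > max(previous_losses))
```

The last line only shows that the epoch 4 loss exceeds every loss in the window; whether that is the exact condition the trainer checks is an assumption.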

Which language? How much data is that?

What is this LM that you are using?

@lissyx

Language: English. It is 30 GB of data, with 60592 steps inside train.csv, 12000 steps in dev.csv, and 12500 steps in test.csv. I followed the DeepSpeech steps to convert MP3 to WAV, etc., and created the train.csv, test.csv, and dev.csv files.

The LM was generated by following https://github.com/mozilla/DeepSpeech/blob/master/data/lm/README.md
and the trie was generated the same way.

This should just be the canonical LM we release as lm.binary and trie. Can you try with our files, to make sure?

What steps? import_cv2.py does that for you. Also, you changed the learning rate and dropout. From previous experience, it seems you might want an even lower learning rate, and to reuse our dropout values.

yes

What are the best values for learning_rate and dropout_rate in my scenario?

You need to experiment yourself.

@lissyx

Following your advice, I tried the DeepSpeech lm.binary and trie, and it resulted in an error:

 I Restored variables from most recent checkpoint at /home/karthik/speech/DeepSpeech/data/checkpoint/model.v0.5.1, step 467356
I STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 1:00:04 | Steps: 6709 | Loss: 100.632820
WARNING:tensorflow:From /home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py:960: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
W0914 15:08:04.290839 140140694878016 deprecation.py:323] From /home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py:960: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
Epoch 0 |   Training | Elapsed Time: 1:12:27 | Steps: 7574 | Loss: 104.902475                                                                                                                                                                                           
Epoch 0 | Validation | Elapsed Time: 0:04:17 | Steps: 1528 | Loss: 99.321217 | Dataset: /home/karthik/speech/DeepSpeech/data/corpus/clips/dev.csv                                                                                                                       
I Saved new best validating model with loss 99.321217 to: /home/karthik/speech/DeepSpeech/data/checkpoint/best_dev-474930
Epoch 1 |   Training | Elapsed Time: 1:12:26 | Steps: 7574 | Loss: 98.436469                                                                                                                                                                                            
Epoch 1 | Validation | Elapsed Time: 0:04:14 | Steps: 1528 | Loss: 100.826695 | Dataset: /home/karthik/speech/DeepSpeech/data/corpus/clips/dev.csv                                                                                                                      
Epoch 2 |   Training | Elapsed Time: 1:12:25 | Steps: 7574 | Loss: 101.351250                                                                                                                                                                                           
Epoch 2 | Validation | Elapsed Time: 0:04:14 | Steps: 1528 | Loss: 101.032610 | Dataset: /home/karthik/speech/DeepSpeech/data/corpus/clips/dev.csv                                                                                                                      
Epoch 3 |   Training | Elapsed Time: 1:12:21 | Steps: 7574 | Loss: 104.625136                                                                                                                                                                                           
Epoch 3 | Validation | Elapsed Time: 0:04:14 | Steps: 1528 | Loss: 106.296748 | Dataset: /home/karthik/speech/DeepSpeech/data/corpus/clips/dev.csv                                                                                                                      
I Early stop triggered as (for last 4 steps) validation loss: 106.296748 with standard deviation: 0.762869 and mean: 100.393507
I FINISHED optimization in 5:06:43.507333
Loading the LM will be faster if you build a binary file.
Reading /home/karthik/speech/DeepSpeech/data/lm/lm.binary
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
terminate called after throwing an instance of 'lm::FormatLoadException'
  what():  ../kenlm/lm/read_arpa.cc:65 in void lm::ReadARPACounts(util::FilePiece&, std::vector<long unsigned int>&) threw FormatLoadException.
first non-empty line was "version https://git-lfs.github.com/spec/v1" not \data\. Byte: 43
Fatal Python error: Aborted

Thread 0x00007f7456e8f700 (most recent call first):
  File "/usr/lib/python3.6/threading.py", line 295 in wait
  File "/usr/lib/python3.6/queue.py", line 164 in get
  File "/home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/tensorflow/python/summary/writer/event_file_writer.py", line 159 in run
  File "/usr/lib/python3.6/threading.py", line 916 in _bootstrap_inner
  File "/usr/lib/python3.6/threading.py", line 884 in _bootstrap

Thread 0x00007f7455e8d700 (most recent call first):
  File "/usr/lib/python3.6/threading.py", line 295 in wait
  File "/usr/lib/python3.6/queue.py", line 164 in get
  File "/home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/tensorflow/python/summary/writer/event_file_writer.py", line 159 in run
  File "/usr/lib/python3.6/threading.py", line 916 in _bootstrap_inner
  File "/usr/lib/python3.6/threading.py", line 884 in _bootstrap

Current thread 0x00007f750c563740 (most recent call first):
  File "/home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/ds_ctcdecoder/swigwrapper.py", line 231 in __init__
  File "/home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/ds_ctcdecoder/__init__.py", line 22 in __init__
  File "/home/karthik/speech/DeepSpeech/evaluate.py", line 45 in evaluate
  File "DeepSpeech1.py", line 554 in test
  File "DeepSpeech1.py", line 824 in main
  File "/home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/absl/app.py", line 250 in _run_main
  File "/home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/absl/app.py", line 299 in run
  File "/home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40 in run
  File "DeepSpeech1.py", line 836 in <module>
Aborted (core dumped)

Kindly check and let me know if I made any error. This time I have reduced the learning rate and dropout.

python -u DeepSpeech1.py \
   --n_hidden 2048 \
   --epochs 75 \
   --checkpoint_dir /home/karthik/speech/DeepSpeech/data/checkpoint/ \
   --train_files /home/karthik/speech/DeepSpeech/data/corpus/clips/train.csv \
   --dev_files /home/karthik/speech/DeepSpeech/data/corpus/clips/dev.csv \
   --test_files /home/karthik/speech/DeepSpeech/data/corpus/clips/test.csv \
   --train_batch_size 8 \
   --dev_batch_size 8 \
   --test_batch_size 4 \
   --dropout_rate 0.05 \
   --learning_rate 0.001 \
   --lm_binary_path /home/karthik/speech/DeepSpeech/data/lm/lm.binary \
   --lm_trie_path /home/karthik/speech/DeepSpeech/data/lm/trie \
   --export_dir /home/karthik/speech/DeepSpeech/data/export/ \
  "$@"

@javi.rahman I’d try a batch size of 1 and a lower LR. BTW, why did you say that you had

60592 steps inside train.csv

and it shows just 7574?

@reyxuan

So, you want me to try with batch size 1 and lower learning rate 0.0001.

Because the train batch size is 8, it comes out to 7574 steps (60592/8).
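
The arithmetic behind that number can be sketched as follows. The helper name `steps_per_epoch` is hypothetical, and rounding up for a final partial batch is an assumption; with these figures the division is exact, so the rounding choice does not affect the result:

```python
import math

def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of batches needed to go through every sample once."""
    return math.ceil(num_samples / batch_size)

# 60592 rows in train.csv with --train_batch_size 8, as stated above.
print(steps_per_epoch(60592, 8))  # prints 7574
```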

@javi.rahman

So, you want me to try with batch size 1 and lower learning rate 0.0001.

Yes. I’d wait for @lissyx’s confirmation, but usually in DL the best way to debug is to use a batch size of 1.

It looks like you have not properly followed the documentation and set up git-lfs.
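
The KenLM error reports that the first line of lm.binary was "version https://git-lfs.github.com/spec/v1", which is the header of a Git LFS pointer file rather than real model data. Below is a minimal sketch for checking a file before training. The helper name `is_lfs_pointer` is hypothetical and not part of DeepSpeech, and the demo uses a throwaway file rather than the real lm.binary:

```python
import os
import tempfile

# Header line reported in the KenLM error message above.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path: str) -> bool:
    """Return True if the file starts with the Git LFS pointer header."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX

# Demo on a temporary file that mimics a pointer; for a real check,
# pass the path given to --lm_binary_path instead.
with tempfile.TemporaryDirectory() as tmp_dir:
    fake_lm = os.path.join(tmp_dir, "lm.binary")
    with open(fake_lm, "wb") as f:
        f.write(LFS_POINTER_PREFIX + b"\n")
    print(is_lfs_pointer(fake_lm))  # prints True
```

If the check returns True for the real file, installing git-lfs and running `git lfs pull` inside the clone should fetch the actual contents.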

You have increased the learning rate: 0.001 > 0.0001. Could you please first try with the same hyperparameters as the ones we released?

Have you made any changes to DeepSpeech.py? If you change the code, it’s going to be even harder to help …

Okay sure. I will check and let you know.

I have followed the linked thread (How to find the which file is making loss inf) to identify which files produce an inf loss in my steps.