I was fine-tuning the pre-trained model; training finished and I got an output_graph.pb.
When I checked the model, it was worse than the pre-trained DeepSpeech 0.5.1 model.
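For reference, a quick way to compare the exported graph against the release model is the 0.5.1 deepspeech client, something along these lines (paths and the audio file name here are only illustrative, not my exact ones):

deepspeech --model data/export/output_graph.pb \
           --alphabet data/alphabet.txt \
           --lm data/lm/lm.binary \
           --trie data/lm/trie \
           --audio some_test_clip.wav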
Here are my command and parameters. Please guide me on the best parameters to use.
I am using DeepSpeech 0.5.1, an RTX 4000 GPU, Ubuntu 18.04, and tensorflow-gpu 1.14.0.
Training went fine; here is the process:
I Restored variables from most recent checkpoint at /home/karthik/speech/DeepSpeech/data/checkpoint/model.v0.5.1, step 467356
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 1:00:04 | Steps: 6642 | Loss: 41.552052 WARNING:tensorflow:From /home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py:960: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
W0912 15:14:06.021101 140096652371776 deprecation.py:323] From /home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py:960: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
Epoch 0 | Training | Elapsed Time: 1:13:28 | Steps: 7574 | Loss: 44.842819
Epoch 0 | Validation | Elapsed Time: 0:04:15 | Steps: 1528 | Loss: 50.622232 | Dataset: /home/karthik/speech/DeepSpeech/data/corpus/clips/dev.csv
I Saved new best validating model with loss 50.622232 to: /home/karthik/speech/DeepSpeech/data/checkpoint/best_dev-474930
Epoch 1 | Training | Elapsed Time: 1:13:23 | Steps: 7574 | Loss: 40.343765
Epoch 1 | Validation | Elapsed Time: 0:04:13 | Steps: 1528 | Loss: 47.958261 | Dataset: /home/karthik/speech/DeepSpeech/data/corpus/clips/dev.csv
I Saved new best validating model with loss 47.958261 to: /home/karthik/speech/DeepSpeech/data/checkpoint/best_dev-482504
Epoch 2 | Training | Elapsed Time: 1:13:34 | Steps: 7574 | Loss: 37.761659
Epoch 2 | Validation | Elapsed Time: 0:04:19 | Steps: 1528 | Loss: 47.888508 | Dataset: /home/karthik/speech/DeepSpeech/data/corpus/clips/dev.csv
I Saved new best validating model with loss 47.888508 to: /home/karthik/speech/DeepSpeech/data/checkpoint/best_dev-490078
Epoch 3 | Training | Elapsed Time: 1:13:21 | Steps: 7574 | Loss: 35.337711
Epoch 3 | Validation | Elapsed Time: 0:04:15 | Steps: 1528 | Loss: 47.695037 | Dataset: /home/karthik/speech/DeepSpeech/data/corpus/clips/dev.csv
I Saved new best validating model with loss 47.695037 to: /home/karthik/speech/DeepSpeech/data/checkpoint/best_dev-497652
Epoch 4 | Training | Elapsed Time: 1:13:17 | Steps: 7574 | Loss: 33.512327
Epoch 4 | Validation | Elapsed Time: 0:04:14 | Steps: 1528 | Loss: 48.027997 | Dataset: /home/karthik/speech/DeepSpeech/data/corpus/clips/dev.csv
I Early stop triggered as (for last 4 steps) validation loss: 48.027997 with standard deviation: 0.111347 and mean: 47.847269
I FINISHED optimization in 6:28:28.351865
INFO:tensorflow:Restoring parameters from /home/karthik/speech/DeepSpeech/data/checkpoint/best_dev-497652
I0912 20:42:31.164370 140096652371776 saver.py:1280] Restoring parameters from /home/karthik/speech/DeepSpeech/data/checkpoint/best_dev-497652
I Restored variables from best validation checkpoint at /home/karthik/speech/DeepSpeech/data/checkpoint/best_dev-497652, step 497652
Testing model on /home/karthik/speech/DeepSpeech/data/corpus/clips/test.csv
Test epoch | Steps: 3014 | Elapsed Time: 0:18:08
Test on /home/karthik/speech/DeepSpeech/data/corpus/clips/test.csv - WER: 0.562949, CER: 0.372234, loss: 56.315739
--------------------------------------------------------------------------------
WER: 3.000000, CER: 1.777778, loss: 120.665932
- wav: file:///home/karthik/speech/DeepSpeech/data/corpus/clips/common_voice_en_54384.wav
- src: "undefined"
- res: "then after canister "
--------------------------------------------------------------------------------
WER: 2.500000, CER: 2.764706, loss: 214.952164
- wav: file:///home/karthik/speech/DeepSpeech/data/corpus/clips/common_voice_en_17645060.wav
- src: "did you know that"
- res: "the two now that the denotat titulo that the notation that"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.571429, loss: 11.507010
- wav: file:///home/karthik/speech/DeepSpeech/data/corpus/clips/common_voice_en_18320583.wav
- src: "nosiree"
- res: "no there"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 1.250000, loss: 16.147278
- wav: file:///home/karthik/speech/DeepSpeech/data/corpus/clips/common_voice_en_191353.wav
- src: "amen"
- res: "the man"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.363636, loss: 20.434793
- wav: file:///home/karthik/speech/DeepSpeech/data/corpus/clips/common_voice_en_16047346.wav
- src: "kettledrums"
- res: "cattle drams"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.818182, loss: 25.313591
- wav: file:///home/karthik/speech/DeepSpeech/data/corpus/clips/common_voice_en_629809.wav
- src: "kettledrums"
- res: "go dream"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.470588, loss: 25.920713
- wav: file:///home/karthik/speech/DeepSpeech/data/corpus/clips/common_voice_en_283146.wav
- src: "medley hotchpotch"
- res: "men may hutch punch"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 1.000000, loss: 32.957829
- wav: file:///home/karthik/speech/DeepSpeech/data/corpus/clips/common_voice_en_3514384.wav
- src: "stay tuned"
- res: "the tune a prop"
--------------------------------------------------------------------------------
WER: 1.833333, CER: 1.250000, loss: 144.522491
- wav: file:///home/karthik/speech/DeepSpeech/data/corpus/clips/common_voice_en_680693.wav
- src: "find me the saga air cavalry"
- res: "i made a saucerful time i see i was covered for"
--------------------------------------------------------------------------------
WER: 1.750000, CER: 0.451613, loss: 66.565109
- wav: file:///home/karthik/speech/DeepSpeech/data/corpus/clips/common_voice_en_137155.wav
- src: "that's an inherent disadvantage"
- res: "the then and heron as a vantage"
--------------------------------------------------------------------------------
I Exporting the model...
INFO:tensorflow:Restoring parameters from /home/karthik/speech/DeepSpeech/data/checkpoint/train-505226
I0912 21:00:46.095046 140096652371776 saver.py:1280] Restoring parameters from /home/karthik/speech/DeepSpeech/data/checkpoint/train-505226
WARNING:tensorflow:From /home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/tensorflow/python/tools/freeze_graph.py:233: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.convert_variables_to_constants`
W0912 21:00:46.182916 140096652371776 deprecation.py:323] From /home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/tensorflow/python/tools/freeze_graph.py:233: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.convert_variables_to_constants`
WARNING:tensorflow:From /home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/tensorflow/python/framework/graph_util_impl.py:270: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
W0912 21:00:46.183073 140096652371776 deprecation.py:323] From /home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/tensorflow/python/framework/graph_util_impl.py:270: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
INFO:tensorflow:Froze 12 variables.
I0912 21:00:46.220419 140096652371776 graph_util_impl.py:311] Froze 12 variables.
INFO:tensorflow:Converted 12 variables to const ops.
I0912 21:00:46.297007 140096652371776 graph_util_impl.py:364] Converted 12 variables to const ops.
I Models exported at /home/karthik/speech/DeepSpeech/data/export/
Let me know if you need anything other than this.
Language -> English. 30 GB of data, with 60592 rows in train.csv, 12000 rows in dev.csv, and 12500 rows in test.csv. I followed the DeepSpeech steps to convert the MP3s to WAV etc. and created train.csv, test.csv, and dev.csv.
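The MP3 to WAV conversion was roughly along these lines per clip, resampling to 16 kHz mono 16-bit, which is what DeepSpeech expects (the file names are placeholders):

sox some_clip.mp3 -r 16000 -c 1 -b 16 some_clip.wav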
lissyx:
This should just be the canonical LM we release as lm.binary and trie. Can you try with our files, to make sure?
What steps? import_cv2.py does that for you. Also, you changed the learning rate and dropout. From previous experience, it seems you might want an even lower learning rate, and to re-use our dropout values.
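Something along these lines, re-using the released LM and trie and lowering the learning rate; the values and paths below are only illustrative, adjust them to your setup:

# n_hidden must stay 2048 to match the released 0.5.1 checkpoint;
# the learning_rate / dropout_rate values are placeholders to tune.
python3 DeepSpeech.py \
    --train_files /home/karthik/speech/DeepSpeech/data/corpus/clips/train.csv \
    --dev_files /home/karthik/speech/DeepSpeech/data/corpus/clips/dev.csv \
    --test_files /home/karthik/speech/DeepSpeech/data/corpus/clips/test.csv \
    --checkpoint_dir /home/karthik/speech/DeepSpeech/data/checkpoint \
    --export_dir /home/karthik/speech/DeepSpeech/data/export \
    --n_hidden 2048 \
    --learning_rate 0.00005 \
    --dropout_rate 0.15 \
    --lm_binary_path data/lm/lm.binary \
    --lm_trie_path data/lm/trie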
Following your advice, I tried with the DeepSpeech lm.binary and trie, and it resulted in an error:
I Restored variables from most recent checkpoint at /home/karthik/speech/DeepSpeech/data/checkpoint/model.v0.5.1, step 467356
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 1:00:04 | Steps: 6709 | Loss: 100.632820 WARNING:tensorflow:From /home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py:960: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
W0914 15:08:04.290839 140140694878016 deprecation.py:323] From /home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py:960: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
Epoch 0 | Training | Elapsed Time: 1:12:27 | Steps: 7574 | Loss: 104.902475
Epoch 0 | Validation | Elapsed Time: 0:04:17 | Steps: 1528 | Loss: 99.321217 | Dataset: /home/karthik/speech/DeepSpeech/data/corpus/clips/dev.csv
I Saved new best validating model with loss 99.321217 to: /home/karthik/speech/DeepSpeech/data/checkpoint/best_dev-474930
Epoch 1 | Training | Elapsed Time: 1:12:26 | Steps: 7574 | Loss: 98.436469
Epoch 1 | Validation | Elapsed Time: 0:04:14 | Steps: 1528 | Loss: 100.826695 | Dataset: /home/karthik/speech/DeepSpeech/data/corpus/clips/dev.csv
Epoch 2 | Training | Elapsed Time: 1:12:25 | Steps: 7574 | Loss: 101.351250
Epoch 2 | Validation | Elapsed Time: 0:04:14 | Steps: 1528 | Loss: 101.032610 | Dataset: /home/karthik/speech/DeepSpeech/data/corpus/clips/dev.csv
Epoch 3 | Training | Elapsed Time: 1:12:21 | Steps: 7574 | Loss: 104.625136
Epoch 3 | Validation | Elapsed Time: 0:04:14 | Steps: 1528 | Loss: 106.296748 | Dataset: /home/karthik/speech/DeepSpeech/data/corpus/clips/dev.csv
I Early stop triggered as (for last 4 steps) validation loss: 106.296748 with standard deviation: 0.762869 and mean: 100.393507
I FINISHED optimization in 5:06:43.507333
Loading the LM will be faster if you build a binary file.
Reading /home/karthik/speech/DeepSpeech/data/lm/lm.binary
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
terminate called after throwing an instance of 'lm::FormatLoadException'
what(): ../kenlm/lm/read_arpa.cc:65 in void lm::ReadARPACounts(util::FilePiece&, std::vector<long unsigned int>&) threw FormatLoadException.
first non-empty line was "version https://git-lfs.github.com/spec/v1" not \data\. Byte: 43
Fatal Python error: Aborted
Thread 0x00007f7456e8f700 (most recent call first):
File "/usr/lib/python3.6/threading.py", line 295 in wait
File "/usr/lib/python3.6/queue.py", line 164 in get
File "/home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/tensorflow/python/summary/writer/event_file_writer.py", line 159 in run
File "/usr/lib/python3.6/threading.py", line 916 in _bootstrap_inner
File "/usr/lib/python3.6/threading.py", line 884 in _bootstrap
Thread 0x00007f7455e8d700 (most recent call first):
File "/usr/lib/python3.6/threading.py", line 295 in wait
File "/usr/lib/python3.6/queue.py", line 164 in get
File "/home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/tensorflow/python/summary/writer/event_file_writer.py", line 159 in run
File "/usr/lib/python3.6/threading.py", line 916 in _bootstrap_inner
File "/usr/lib/python3.6/threading.py", line 884 in _bootstrap
Current thread 0x00007f750c563740 (most recent call first):
File "/home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/ds_ctcdecoder/swigwrapper.py", line 231 in __init__
File "/home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/ds_ctcdecoder/__init__.py", line 22 in __init__
File "/home/karthik/speech/DeepSpeech/evaluate.py", line 45 in evaluate
File "DeepSpeech1.py", line 554 in test
File "DeepSpeech1.py", line 824 in main
File "/home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/absl/app.py", line 250 in _run_main
File "/home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/absl/app.py", line 299 in run
File "/home/karthik/tmp/deepspeech-venv/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40 in run
File "DeepSpeech1.py", line 836 in <module>
Aborted (core dumped)
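From that backtrace, it looks like the lm.binary on disk may just be a Git LFS pointer file rather than the actual language model (the KenLM reader sees the text "version https://git-lfs.github.com/spec/v1" instead of binary data). Assuming the repo was cloned with git and git-lfs is available, something like this should verify and fetch the real files:

cd /home/karthik/speech/DeepSpeech
head -c 60 data/lm/lm.binary    # a real lm.binary is binary data, not LFS pointer text
git lfs install
git lfs pull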
Kindly check and let me know if I made any error. This time I reduced the learning rate and dropout.
But the model (output_graph.pbmm) still looks bad compared to the old one.
My question is: should the Mozilla Common Voice corpus give good accuracy, or do I need to train with my own data?
I am training the English language. The checkpoint step count increased from 467356 to 520374.