Hi @lissyx
I tried what you suggested and am now able to train the model.
python3 DeepSpeech.py --n_hidden 2048 --checkpoint_dir deepspeech-0.6.1-checkpoint --epochs 100 --train_files …/deepspeech-0.6.1-models/audio/fluent_speech/csv/train.csv --dev_files …/deepspeech-0.6.1-models/audio/fluent_speech/csv/validate.csv --test_files …/deepspeech-0.6.1-models/audio/fluent_speech/csv/test.csv --learning_rate 0.000001 --use_cudnn_rnn true --use_allow_growth true --lm_binary_path …/deepspeech-0.6.1-models/lm.binary --lm_trie_path …/deepspeech-0.6.1-models/trie --train_batch_size 64 --dev_batch_size 64 --test_batch_size 64
My training data is: https://www.fluent.ai/research/fluent-speech-commands/
The model was training smoothly, but after epoch 12 early stopping triggered, and the run then ended with an error during the test phase without exporting the .pb file.
Output and error:
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:02:01 | Steps: 361 | Loss: 15.122960
Epoch 0 | Validation | Elapsed Time: 0:00:06 | Steps: 48 | Loss: 15.799511 | Dataset: ../deepspeech-0.6.1-models/audio/fluent_speech/csv/validate.csv
I Saved new best validating model with loss 15.799511 to: deepspeech-0.6.1-checkpoint/best_dev-234145
Epoch 1 | Training | Elapsed Time: 0:01:58 | Steps: 361 | Loss: 10.945183
Epoch 1 | Validation | Elapsed Time: 0:00:06 | Steps: 48 | Loss: 13.657758 | Dataset: ../deepspeech-0.6.1-models/audio/fluent_speech/csv/validate.csv
WARNING:tensorflow:From /home/glmr/anaconda3/envs/azeem_vir_env/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py:963: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
W0214 11:39:42.411926 139768476886848 deprecation.py:323] From /home/glmr/anaconda3/envs/azeem_vir_env/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py:963: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
I Saved new best validating model with loss 13.657758 to: deepspeech-0.6.1-checkpoint/best_dev-234506
Epoch 2 | Training | Elapsed Time: 0:01:58 | Steps: 361 | Loss: 9.513324
Epoch 2 | Validation | Elapsed Time: 0:00:06 | Steps: 48 | Loss: 12.472702 | Dataset: ../deepspeech-0.6.1-models/audio/fluent_speech/csv/validate.csv
I Saved new best validating model with loss 12.472702 to: deepspeech-0.6.1-checkpoint/best_dev-234867
Epoch 3 | Training | Elapsed Time: 0:01:58 | Steps: 361 | Loss: 8.588798
Epoch 3 | Validation | Elapsed Time: 0:00:06 | Steps: 48 | Loss: 11.621569 | Dataset: ../deepspeech-0.6.1-models/audio/fluent_speech/csv/validate.csv
I Saved new best validating model with loss 11.621569 to: deepspeech-0.6.1-checkpoint/best_dev-235228
Epoch 4 | Training | Elapsed Time: 0:01:58 | Steps: 361 | Loss: 7.921981
Epoch 4 | Validation | Elapsed Time: 0:00:06 | Steps: 48 | Loss: 10.949868 | Dataset: ../deepspeech-0.6.1-models/audio/fluent_speech/csv/validate.csv
I Saved new best validating model with loss 10.949868 to: deepspeech-0.6.1-checkpoint/best_dev-235589
Epoch 5 | Training | Elapsed Time: 0:01:58 | Steps: 361 | Loss: 7.381936
Epoch 5 | Validation | Elapsed Time: 0:00:06 | Steps: 48 | Loss: 10.450499 | Dataset: ../deepspeech-0.6.1-models/audio/fluent_speech/csv/validate.csv
I Saved new best validating model with loss 10.450499 to: deepspeech-0.6.1-checkpoint/best_dev-235950
Epoch 6 | Training | Elapsed Time: 0:01:58 | Steps: 361 | Loss: 6.925110
Epoch 6 | Validation | Elapsed Time: 0:00:06 | Steps: 48 | Loss: 9.995356 | Dataset: ../deepspeech-0.6.1-models/audio/fluent_speech/csv/validate.csv
I Saved new best validating model with loss 9.995356 to: deepspeech-0.6.1-checkpoint/best_dev-236311
Epoch 7 | Training | Elapsed Time: 0:01:58 | Steps: 361 | Loss: 6.541599
Epoch 7 | Validation | Elapsed Time: 0:00:06 | Steps: 48 | Loss: 9.585989 | Dataset: ../deepspeech-0.6.1-models/audio/fluent_speech/csv/validate.csv
I Saved new best validating model with loss 9.585989 to: deepspeech-0.6.1-checkpoint/best_dev-236672
Epoch 8 | Training | Elapsed Time: 0:01:57 | Steps: 361 | Loss: 6.200359
Epoch 8 | Validation | Elapsed Time: 0:00:06 | Steps: 48 | Loss: 9.234038 | Dataset: ../deepspeech-0.6.1-models/audio/fluent_speech/csv/validate.csv
I Saved new best validating model with loss 9.234038 to: deepspeech-0.6.1-checkpoint/best_dev-237033
Epoch 9 | Training | Elapsed Time: 0:01:57 | Steps: 361 | Loss: 5.890317
Epoch 9 | Validation | Elapsed Time: 0:00:06 | Steps: 48 | Loss: 8.920528 | Dataset: ../deepspeech-0.6.1-models/audio/fluent_speech/csv/validate.csv
I Saved new best validating model with loss 8.920528 to: deepspeech-0.6.1-checkpoint/best_dev-237394
Epoch 10 | Training | Elapsed Time: 0:01:57 | Steps: 361 | Loss: 5.630370
Epoch 10 | Validation | Elapsed Time: 0:00:06 | Steps: 48 | Loss: 8.636955 | Dataset: ../deepspeech-0.6.1-models/audio/fluent_speech/csv/validate.csv
I Saved new best validating model with loss 8.636955 to: deepspeech-0.6.1-checkpoint/best_dev-237755
Epoch 11 | Training | Elapsed Time: 0:01:58 | Steps: 361 | Loss: 5.385189
Epoch 11 | Validation | Elapsed Time: 0:00:06 | Steps: 48 | Loss: 8.378594 | Dataset: ../deepspeech-0.6.1-models/audio/fluent_speech/csv/validate.csv
I Saved new best validating model with loss 8.378594 to: deepspeech-0.6.1-checkpoint/best_dev-238116
Epoch 12 | Training | Elapsed Time: 0:01:58 | Steps: 361 | Loss: 5.166077
Epoch 12 | Validation | Elapsed Time: 0:00:06 | Steps: 48 | Loss: 8.148501 | Dataset: ../deepspeech-0.6.1-models/audio/fluent_speech/csv/validate.csv
I Saved new best validating model with loss 8.148501 to: deepspeech-0.6.1-checkpoint/best_dev-238477
I Early stop triggered as (for last 4 steps) validation loss: 8.148501 with standard deviation: 0.221324 and mean: 8.645359
I FINISHED optimization in 0:27:50.338386
WARNING:tensorflow:From /home/glmr/anaconda3/envs/azeem_vir_env/lib/python3.6/site-packages/tensorflow_core/contrib/rnn/python/ops/lstm_ops.py:597: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.add_weight` method instead.
W0214 12:03:12.865196 139768476886848 deprecation.py:323] From /home/glmr/anaconda3/envs/azeem_vir_env/lib/python3.6/site-packages/tensorflow_core/contrib/rnn/python/ops/lstm_ops.py:597: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.add_weight` method instead.
INFO:tensorflow:Restoring parameters from deepspeech-0.6.1-checkpoint/best_dev-238477
I0214 12:03:12.936584 139768476886848 saver.py:1284] Restoring parameters from deepspeech-0.6.1-checkpoint/best_dev-238477
I Restored variables from best validation checkpoint at deepspeech-0.6.1-checkpoint/best_dev-238477, step 238477
Testing model on ../deepspeech-0.6.1-models/audio/fluent_speech/csv/test.csv
Test epoch | Steps: 50 | Elapsed Time: 0:04:54
Traceback (most recent call last):
File "/home/glmr/anaconda3/envs/azeem_vir_env/lib/python3.6/site-packages/ds_ctcdecoder/swigwrapper.py", line 581, in <lambda>
__setattr__ = lambda self, name, value: _swig_setattr(self, OutputVectorVector, name, value)
KeyboardInterrupt
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "DeepSpeech.py", line 974, in <module>
absl.app.run(main)
File "/home/glmr/anaconda3/envs/azeem_vir_env/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/home/glmr/anaconda3/envs/azeem_vir_env/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "DeepSpeech.py", line 951, in main
test()
File "DeepSpeech.py", line 684, in test
samples = evaluate(FLAGS.test_files.split(','), create_model, try_loading)
File "/home/glmr/glmShare/AzeemData/DeepSpeech-0.6.1/evaluate.py", line 155, in evaluate
samples.extend(run_test(init_op, dataset=csv))
File "/home/glmr/glmShare/AzeemData/DeepSpeech-0.6.1/evaluate.py", line 122, in run_test
cutoff_prob=FLAGS.cutoff_prob, cutoff_top_n=FLAGS.cutoff_top_n)
File "/home/glmr/anaconda3/envs/azeem_vir_env/lib/python3.6/site-packages/ds_ctcdecoder/__init__.py", line 116, in ctc_beam_search_decoder_batch
batch_beam_results = swigwrapper.ctc_beam_search_decoder_batch(probs_seq, seq_lengths, native_alphabet, beam_size, num_processes, cutoff_prob, cutoff_top_n, scorer)
SystemError: <built-in function ctc_beam_search_decoder_batch> returned a result with an error set
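To sanity-check the early-stop message, here is a minimal sketch that recomputes its numbers from the validation losses in the log above. It assumes the mean and standard deviation are taken over the three losses preceding the latest one (this reproduces the logged 8.645359 and 0.221324), and that both thresholds are 0.5. The threshold values and the names `ES_STEPS`, `ES_MEAN_TH`, `ES_STD_TH`, `early_stop_stats` and `should_stop` are my own assumptions for illustration, not taken from the DeepSpeech source.

```python
from statistics import mean, pstdev

# Validation losses copied from the log above (epochs 0 to 12).
dev_losses = [
    15.799511, 13.657758, 12.472702, 11.621569, 10.949868,
    10.450499, 9.995356, 9.585989, 9.234038, 8.920528,
    8.636955, 8.378594, 8.148501,
]

ES_STEPS = 4      # window size, taken from "for last 4 steps" in the log
ES_MEAN_TH = 0.5  # assumed threshold on |latest - mean|
ES_STD_TH = 0.5   # assumed threshold on the standard deviation


def early_stop_stats(losses, steps=ES_STEPS):
    """Return (latest loss, mean, population std) for the last `steps` losses.

    The mean and std are computed over the window WITHOUT the latest loss.
    """
    window = losses[-steps:]
    previous = window[:-1]
    return window[-1], mean(previous), pstdev(previous)


def should_stop(losses, steps=ES_STEPS, mean_th=ES_MEAN_TH, std_th=ES_STD_TH):
    """Stop if the latest loss got worse than the window, or has plateaued."""
    if len(losses) < steps:
        return False
    latest, avg, std = early_stop_stats(losses, steps)
    got_worse = latest > max(losses[-steps:-1])
    plateaued = abs(latest - avg) < mean_th and std < std_th
    return got_worse or plateaued


latest, avg, std = early_stop_stats(dev_losses)
print(f"latest={latest:.6f} mean={avg:.6f} std={std:.6f}")
print("stop after epoch 11:", should_stop(dev_losses[:-1]))
print("stop after epoch 12:", should_stop(dev_losses))
```

Under these assumptions the check fires at epoch 12 even though the loss is still falling, because the latest loss is within 0.5 of the mean of the previous three. If that reading is right, early stopping worked as designed here, and the traceback comes from the later test step rather than from early stopping itself.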