I Could not find best validating checkpoint.
I Could not find most recent checkpoint.
I Initializing all variables.
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:01:04 | Steps: 410 | Loss: 114.940936
Epoch 0 | Validation | Elapsed Time: 0:00:00 | Steps: 10 | Loss: 152.060320 | Dataset: my-dev.csv
I Saved new best validating model with loss 152.060320 to: ../checkpoint/best_dev-410
Epoch 1 | Training | Elapsed Time: 0:01:00 | Steps: 410 | Loss: 111.319498
Epoch 1 | Validation | Elapsed Time: 0:00:00 | Steps: 10 | Loss: 144.299709 | Dataset: my-dev.csv
I Saved new best validating model with loss 144.299709 to: ../checkpoint/best_dev-820
Epoch 2 | Training | Elapsed Time: 0:01:00 | Steps: 410 | Loss: 111.206717
Epoch 2 | Validation | Elapsed Time: 0:00:00 | Steps: 10 | Loss: 141.602309 | Dataset: my-dev.csv
I Saved new best validating model with loss 141.602309 to: ../checkpoint/best_dev-1230
I FINISHED optimization in 0:03:18.015672
I Exporting the model...
I Could not find best validating checkpoint.
I Could not find most recent checkpoint.
E All initialization methods failed (['best', 'last']).
When loading checkpoints, the code respects the --load_checkpoint_dir flag. When saving, it respects the --save_checkpoint_dir flag. You should be able to run again with --load_checkpoint_dir and the export flags, and it’ll pick up the checkpoint saved during training.
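For example (paths here are illustrative, not from this thread), an export-only run that points --load_checkpoint_dir at the directory training saved into should pick up that checkpoint and write the graph, assuming DeepSpeech.py skips training when no --train_files are given:

./DeepSpeech.py \
  --load_checkpoint_dir ../checkpoint \
  --alphabet_config_path /path/to/alphabet.txt \
  --export_dir /path/to/export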
./DeepSpeech.py \
  --n_hidden 2048 \
  --save_checkpoint_dir /home/dimanshu/latestcheckpoiint/checkpoint \
  --load_checkpoint_dir /home/dimanshu/latestcheckpoiint/checkpoint \
  --epochs 100 \
  --train_files my-train.csv \
  --dev_files my-dev.csv \
  --test_files my-test.csv \
  --learning_rate 0.0001 \
  --train_cudnn true \
  --alphabet_config_path /home/dimanshu/alpha.txt \
  --export_dir /home/dimanshu/latestcheckpoiint/checkpoint
But after completing the training, when I check with sample data, it shows no result:
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 137.519836
- wav: file:///home/dimanshu/mydatadeepspeech/youtube-course-1/final_sound/5c45ebc9-8e10-4079-9a03-0688fbc3b96c.wav
- src: "every literals this called axiom now in"
- res: ""
Loading model from file /home/dimanshu/latestcheckpoiint/checkpoint/output_graph.pb
TensorFlow: v1.14.0-21-ge77504a
DeepSpeech: v0.6.1-0-g3df20fe
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2020-04-23 07:58:18.297305: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.146s.
Running inference.
Inference took 4.137s for 2.490s audio file.
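As a cross-check, the exported graph can also be loaded through the deepspeech Python package. This is a minimal sketch assuming the v0.6.1 package and a 16 kHz mono 16-bit WAV; test.wav and the beam width of 500 are placeholder values, not from this thread:

import wave

import numpy as np
from deepspeech import Model

MODEL_PATH = "/home/dimanshu/latestcheckpoiint/checkpoint/output_graph.pb"
WAV_PATH = "test.wav"  # placeholder: any 16 kHz mono 16-bit clip

# The v0.6.x constructor takes the model path and a beam width;
# the alphabet is embedded in the exported graph.
model = Model(MODEL_PATH, 500)

with wave.open(WAV_PATH, "rb") as wav:
    rate = wav.getframerate()
    audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

# v0.6.x still passes the sample rate to stt() (it was dropped in v0.7).
print(model.stt(audio, rate))

If this also prints an empty string, the problem is in the exported model rather than in the native client.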
GPU = Tesla T4, but now I'm using a V100
language = English
alphabet file = capital and small letters, special characters, and numbers
train and dev batch size = default
It would take too much time to train from scratch, so that's why I'm training over the existing latest checkpoint release, v0.6.1.
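For reference, a fine-tuning run over the released checkpoint could look like this (the load path is wherever the v0.6.1 checkpoint archive was unpacked; the epoch count is illustrative). As far as I know, the checkpoint will only load if the alphabet, and therefore the output layer size, matches the one the release was trained with:

./DeepSpeech.py \
  --n_hidden 2048 \
  --load_checkpoint_dir /path/to/deepspeech-0.6.1-checkpoint \
  --save_checkpoint_dir /home/dimanshu/latestcheckpoiint/checkpoint \
  --epochs 3 \
  --train_files my-train.csv \
  --dev_files my-dev.csv \
  --test_files my-test.csv \
  --learning_rate 0.0001 \
  --train_cudnn true \
  --alphabet_config_path /home/dimanshu/alpha.txt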
After 75 epochs:
Loading model from file …/best_path/output_graph.pb
TensorFlow: v1.14.0-21-ge77504a
DeepSpeech: v0.6.1-0-g3df20fe
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2020-04-27 05:04:41.270571: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 1.61s.
Running inference.
n l sth js
expected output = am global to make sure that it just
output from the model = n l sth js
Adding to @lissyx: try a dropout of 0.4, but as I said, you may need about 200k samples to get a better WER. Your results are OK for that amount of data if the language is quite diverse.
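In flag terms, that dropout suggestion maps to something like the following added to the training command above (flag name assumed from the DeepSpeech.py training flags, not from this thread):

  --dropout_rate 0.4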
After completing this, if I start the training again, it will resume from the best validating checkpoint, which is from the 4th epoch, so there is no point in the epochs after the 4th?
Yes, I will add more data to make it 200k.
1) So I have to fine-tune my model first and then start the training? And I will use dropout = 0.4 and learning rate = 0.0001.