Hi guys,
I'm running the command below to fine-tune a model, but a power failure killed the running process.
The latest log I have shows it stopped at epoch 2 (please see the log below).
My question is: is there any way to resume from this point? What's the correct way to do it?
Is --load_checkpoint_dir the parameter for that, or should I have trained with --save_checkpoint_dir (or something similar) from the start?
I'm using DeepSpeech v0.9.1.
Thank you
python3 DeepSpeech.py --n_hidden 2048 \
  --checkpoint_dir fine_tuning_checkpoints/ \
  --epochs 3 \
  --train_files /data/librivox-train-clean-100.csv \
  --dev_files /data/librivox-dev-clean.csv \
  --test_files /data/librivox-test-clean.csv \
  --learning_rate 0.0001 \
  --export_dir output_models/ \
  --train_cudnn
I1116 21:34:23.325432 140401849284480 utils.py:141] NumExpr defaulting to 2 threads.
I Loading best validating checkpoint from fine_tuning_checkpoints/best_dev-1466475
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam_1
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/bias/Adam
I Loading variable from checkpoint: layer_1/bias/Adam_1
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_1/weights/Adam
I Loading variable from checkpoint: layer_1/weights/Adam_1
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/bias/Adam
I Loading variable from checkpoint: layer_2/bias/Adam_1
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_2/weights/Adam
I Loading variable from checkpoint: layer_2/weights/Adam_1
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/bias/Adam
I Loading variable from checkpoint: layer_3/bias/Adam_1
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_3/weights/Adam
I Loading variable from checkpoint: layer_3/weights/Adam_1
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/bias/Adam
I Loading variable from checkpoint: layer_5/bias/Adam_1
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_5/weights/Adam
I Loading variable from checkpoint: layer_5/weights/Adam_1
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/bias/Adam
I Loading variable from checkpoint: layer_6/bias/Adam_1
I Loading variable from checkpoint: layer_6/weights
I Loading variable from checkpoint: layer_6/weights/Adam
I Loading variable from checkpoint: layer_6/weights/Adam_1
I Loading variable from checkpoint: learning_rate
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 3:36:24 | Steps: 28539 | Loss: 21.645701
Epoch 0 | Validation | Elapsed Time: 0:13:15 | Steps: 2703 | Loss: 15.384955 | Dataset: /data/librivox-dev-clean.csv
I Saved new best validating model with loss 15.384955 to: fine_tuning_checkpoints/best_dev-1495014
Epoch 1 | Training | Elapsed Time: 3:37:58 | Steps: 28539 | Loss: 19.061899
Epoch 1 | Validation | Elapsed Time: 0:09:12 | Steps: 2703 | Loss: 15.219610 | Dataset: /data/librivox-dev-clean.csv
I Saved new best validating model with loss 15.219610 to: fine_tuning_checkpoints/best_dev-1523553
Epoch 2 | Training | Elapsed Time: 0:08:18 | Steps: 2303 | Loss: 6.402097
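In case it helps to see what I'm planning to try: my untested guess, based on the flag descriptions, is that --load_checkpoint_dir and --save_checkpoint_dir both default to --checkpoint_dir, so simply re-running the exact same command should pick up from the most recent checkpoint saved in fine_tuning_checkpoints/ rather than start over. Please correct me if this is wrong:

```shell
# Untested guess: re-running with the same --checkpoint_dir should resume
# from the latest checkpoint found in fine_tuning_checkpoints/ (assuming
# --load_checkpoint_dir defaults to --checkpoint_dir when not given).
python3 DeepSpeech.py --n_hidden 2048 \
  --checkpoint_dir fine_tuning_checkpoints/ \
  --epochs 3 \
  --train_files /data/librivox-train-clean-100.csv \
  --dev_files /data/librivox-dev-clean.csv \
  --test_files /data/librivox-test-clean.csv \
  --learning_rate 0.0001 \
  --export_dir output_models/ \
  --train_cudnn
```

One thing I'm unsure about is how --epochs interacts with resuming, i.e. whether it counts epochs for this run only or in total.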