I am getting Validation Loss: inf

Hi Deepspeech,

I am training a model on the Common Voice dataset, using the following command:

python -u DeepSpeech.py \
--train_files /home/javi/train/train.csv \
--dev_files /home/javi/train/dev.csv \
--test_files /home/javi/train/test.csv \
--train_batch_size 80 \
--dev_batch_size 80 \
--test_batch_size 40 \
--n_hidden 375 \
--epoch 33 \
--validation_step 1 \
--early_stop True \
--earlystop_nsteps 6 \
--estop_mean_thresh 0.1 \
--estop_std_thresh 0.1 \
--dropout_rate 0.22 \
--learning_rate 0.00095 \
--report_count 100 \
--use_seq_length False \
--export_dir /home/javi/speech/tools/backup/export_modal/ \
--checkpoint_dir /home/javi/speech/tools/backup/checkout/ \
--alphabet_config_path /home/javi/speech/tools/backup/DeepSpeech/data/alphabet.txt \
--lm_binary_path /home/javi/speech/tools/backup/DeepSpeech/data/lm.binary

My output is below. Could someone please help me understand what is happening with my model training? It has been running for a very long time.
Is the command above correct for training the model?
I am getting Validation Loss: inf. Is that an error? If so, what kind of error?

Please help.

Output:
Instructions for updating:
Use tf.cast instead.
I Initializing variables…
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 21:24:41 | Steps: 60592 | Loss: 129.07832
Epoch 0 | Validation | Elapsed Time: 0:20:00 | Steps: 12229 | Loss: inf | Dataset: /home/javi/train/dev.csv
Epoch 1 | Training | Elapsed Time: 21:01:41 | Steps: 60592 | Loss: 129.14967
Epoch 1 | Validation | Elapsed Time: 0:16:15 | Steps: 12229 | Loss: inf | Dataset: /home/javi/train/dev.csv
Epoch 2 | Training | Elapsed Time: 06:00:55 | Steps: 27698 | Loss: 100.14967

This means your development/validation set contains one or more files that produce an infinite loss.

If you’re using the v0.5.1 release, modify your files as described here: How to find the which file is making loss inf

Run a separate training on just your /home/javi/train/dev.csv file and look through the printed output for any lines that say

The following files caused an infinite (or NaN) loss: … .wav

Then remove those wav files from your data.
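
Once you have the list of offending files, removing them from the CSV can be scripted. A minimal sketch, assuming the usual DeepSpeech CSV layout with a `wav_filename` column; the function name `drop_bad_rows` and the output file are just illustrative, not part of DeepSpeech:

```python
import csv

def drop_bad_rows(src_csv, dst_csv, bad_wavs):
    """Copy src_csv to dst_csv, skipping rows whose wav_filename is in bad_wavs.

    Assumes the CSV has a header row containing a `wav_filename` column.
    Returns (rows kept, rows dropped).
    """
    bad = set(bad_wavs)
    kept = dropped = 0
    with open(src_csv, newline="", encoding="utf-8") as fin, \
         open(dst_csv, "w", newline="", encoding="utf-8") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row["wav_filename"] in bad:
                dropped += 1          # listed as causing inf/NaN loss
            else:
                writer.writerow(row)  # keep everything else unchanged
                kept += 1
    return kept, dropped
```

You would call it with the paths copied from the "infinite (or NaN) loss" message, then point `--dev_files` at the new CSV. The paths in `bad_wavs` must match the `wav_filename` entries exactly.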


Any idea what type of wav file leads to an inf loss? I would like to write a script that tests the files and removes the bad ones, instead of training over all the files until a console message is printed.
The wav file that causes the infinite loss in my case seems fine: it does not appear to be corrupted, and its audio matches its transcript.
Any ideas?
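
For context, one known way a CTC loss (the loss DeepSpeech trains with) becomes infinite is when the transcript needs more time steps than the audio provides, so a clip can sound fine and still fail. A rough pre-check along those lines, as a sketch only: the helper names are made up, the 20 ms step is an assumption about the feature settings, and the frame count is approximate, so treat flagged files as candidates rather than a definitive list.

```python
import csv
import wave

def min_ctc_steps(transcript):
    """Minimum time steps CTC needs: one per character, plus one blank
    between every pair of identical adjacent characters."""
    repeats = sum(1 for a, b in zip(transcript, transcript[1:]) if a == b)
    return len(transcript) + repeats

def wav_duration_ms(path):
    """Duration of a PCM wav file in milliseconds."""
    with wave.open(path, "rb") as w:
        return 1000.0 * w.getnframes() / w.getframerate()

def suspicious_rows(csv_path, step_ms=20.0):
    """Return wav paths whose audio looks too short for their transcript.

    step_ms is the assumed feature step; adjust it to your configuration.
    Assumes `wav_filename` and `transcript` columns in the CSV.
    """
    flagged = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Approximate number of feature frames the audio yields.
            steps = int(wav_duration_ms(row["wav_filename"]) // step_ms)
            if steps < min_ctc_steps(row["transcript"]):
                flagged.append(row["wav_filename"])
    return flagged
```

This only covers the too-short-audio case; other causes of inf or NaN loss would not be caught by it.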

This is a really old thread and might not be up to date. Please don’t hijack, but open a new thread and give some information.