Training loss is inf but validation loss is decreasing

Hi,
we are trying to train the DeepSpeech model for the Indian spoken language ‘Hinglish’. We have around 1300+ hours of data with 27 characters in alphabet.txt, and are using the following parameters:

./DeepSpeech.py
--train_files data/train.csv
--dev_files data/dev.csv
--test_files data/test.csv
--alphabet_config_path data/alphabet.txt
--checkpoint_dir ~/checkpoint_dir
--export_dir ~/saved_models
--lm_binary_path data/lm.binary
--lm_trie_path data/trie
--report_count 200
--show_progressbar true
--train_batch_size 20
--test_batch_size 48
--dev_batch_size 48
--validation_step 1
--learning_rate 0.0001
--n_hidden 2048
--epoch 10
--dropout_rate 0.15

The train and the validation losses are as follows:

Training of Epoch 0 - loss: inf
Validation of Epoch 0 - loss: 95.800559
Training of Epoch 1 - loss: inf
Validation of Epoch 1 - loss: 81.028593
Training of Epoch 2 - loss: inf
Validation of Epoch 2 - loss: 77.50257
Training of Epoch 3 - loss: inf
Validation of Epoch 3 - loss: 75.459322
Training of Epoch 4 - loss: inf
Validation of Epoch 4 - loss: 74.528508

What changes can we make to the parameters to improve this?
Our dataset is not very clean, but it is not that bad either.

Any help is appreciated.

It’s not unusual that the first epochs show a training loss of inf. It’s related to your hyperparameters; maybe you should increase the learning rate?

Thanks for the reply!

I increased my learning rate to 0.001,
but it still shows the training loss as infinite for the first 4 epochs.

But this time the validation loss is high and is not decreasing much, like:

Validation of Epoch 0 - loss: 337.850228
Validation of Epoch 1 - loss: 336.426547
Validation of Epoch 2 - loss: 335.004593

What could be an appropriate learning rate for such a model?

Also, when I used a smaller dataset of 300+ hours, there was no such problem.
We increased our data by augmentation.
Ever since we increased the dataset to 1300+ hours, there hasn’t been a single epoch where the training loss was not inf.
Could this be caused by some problem with the dataset?

I found the problem:
there were some files in the dataset that were faulty.
By dividing my data into chunks and feeding each chunk separately, I was able to identify the faulty audio files.

Did you use any programmatic steps to identify these faulty files? I’ve used a few: identifying outputs longer than inputs (which impacts CTC loss), a very low average decibel level (likely a bad sample), and errors on open/read (corrupt files).

Hi

> error on open/read (corrupt files).

We had corrupt files in the dataset, so finding them programmatically was easy: simply try to load each file.

Thanks. I have done a similar check: as the files are converted to MFCCs, they are written to a dump file if…

  • wav.read error: corrupt file
  • Output > Input: CTC issue
  • avg decibel level = -inf: no energy in audio
  • avg decibel level <= -50: probably a bad sample

These files are then dropped from training/testing.
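A rough Python sketch of these checks (the open/read check happens when the file is first loaded, so it is omitted here; the helper names and the exact dB arithmetic are my assumptions, while the thresholds follow the list above):

```python
import math

def avg_decibel(samples, full_scale=32768.0):
    """Average level of int16 PCM samples in dBFS; -inf for pure silence."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")
    return 20.0 * math.log10(rms / full_scale)

def triage(samples, transcript, n_input_frames, numcontext=9):
    """Return a reason to drop the sample, or None if it looks usable."""
    # Output > Input: the label sequence must fit the usable input frames.
    if len(transcript) > n_input_frames - 2 * numcontext:
        return "ctc: output longer than input"
    db = avg_decibel(samples)
    if db == float("-inf"):
        return "no energy in audio"
    if db <= -50:
        return "probably a bad sample"
    return None
```

Each sample that returns a non-None reason goes to the dump file and is excluded from training/testing.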

However, I am still receiving inf train loss with decreasing validation loss. I’ll continue modifying my learning rate as suggested in other posts on this forum.

Hi @nishthajain1611

I’m trying out DeepSpeech. How did you solve this problem, and what were your final train & validation losses? How long did it take?

Hi @tuttlebr

What do you mean by output>input? And why would it affect CTC loss?

It would be great if you could drop a line or two about it.

Outputs represent the array of character labels, e.g. “loss” = [11, 14, 18, 18].
Input represents the MFCC representation of the audio as an array, e.g. [1, 7, 6].

To calculate the comparable input length (processing.py):
len(mfcc) - 2*numcontext

TensorFlow’s CTC loss function and the DeepSpeech preprocessing script don’t like outputs longer than inputs. The ignore_longer_outputs_than_inputs option specifies the behavior of CTCLoss when dealing with sequences that have longer outputs than inputs: if true, CTCLoss simply returns a zero gradient for those items; otherwise an InvalidArgument error is raised, stopping training.
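The same length rule can be applied as a pre-filter before training ever starts. The sketch below assumes 16 kHz audio and a 10 ms feature hop, neither of which comes from this thread, so check your actual featurizer settings before relying on the numbers:

```python
# Pre-filter samples whose transcript cannot fit the CTC input.
SAMPLE_RATE = 16000   # assumed; match your dataset
HOP_SECONDS = 0.010   # assumed feature hop; match your featurizer
NUMCONTEXT = 9        # DeepSpeech's n_context default at the time

def usable_input_steps(n_audio_samples):
    """Feature frames left after trimming NUMCONTEXT frames from each side."""
    n_frames = int(n_audio_samples / (SAMPLE_RATE * HOP_SECONDS))
    return n_frames - 2 * NUMCONTEXT

def fits_ctc(transcript, n_audio_samples):
    """True if CTC can align this transcript to audio of this length."""
    return len(transcript) <= usable_input_steps(n_audio_samples)
```

Dropping the samples where fits_ctc is False avoids needing ignore_longer_outputs_than_inputs at all.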

I just used this check to find valid/invalid audio samples in my data.

Legit. Understood. Need to check this out once.