Hi,
We are trying to train the DeepSpeech model for the Indian spoken language 'Hinglish'. We have around 1300+ hours of data, with 27 characters in alphabet.txt, and are using the following parameters:
The training and validation losses are as follows:
Training of Epoch 0 - loss: inf
Validation of Epoch 0 - loss: 95.800559
Training of Epoch 1 - loss: inf
Validation of Epoch 1 - loss: 81.028593
Training of Epoch 2 - loss: inf
Validation of Epoch 2 - loss: 77.50257
Training of Epoch 3 - loss: inf
Validation of Epoch 3 - loss: 75.459322
Training of Epoch 4 - loss: inf
Validation of Epoch 4 - loss: 74.528508
…
What changes can we make to the parameters to improve this?
Our dataset is not very clean, but not that bad either.
Any help is appreciated.
lissyx
It’s not unusual that the first epochs show a training loss of inf. It’s related to your hyperparameters; maybe you should increase the learning rate?
I increased my learning rate to 0.001,
but it still shows the training loss as infinite for the first 4 epochs.
This time, though, the validation loss is high and is not decreasing much.
Like :
Validation of Epoch 0 - loss: 337.850228
Validation of Epoch 1 - loss: 336.426547
Validation of Epoch 2 - loss: 335.004593
…
What could be an appropriate learning rate for such a model?
Also, when I used a smaller dataset of 300+ hours, there was no such problem.
We increased our data by augmentation.
And ever since we grew the dataset to 1300+ hours, there hasn’t been a single epoch where the training loss wasn’t inf.
Could this be caused by some problem with the dataset?
I found the problem:
there were some files in the dataset that were faulty.
By dividing my data into chunks and training on each chunk separately, I was able to identify the faulty audio files.
Did you use any programmatic steps to identify these faulty files? I’ve used a few: identifying outputs larger than inputs (which breaks the CTC loss), a very low average decibel level (likely a bad sample), and errors on open/read (corrupt files).
Thanks. I have done a similar check, where files being converted to MFCC are written to a dump file if:
wav.read error: corrupt file
Output > Input: CTC issue
avg decibel level = -inf: no energy in audio
avg decibel level <= -50: probably a bad sample
These files are then dropped from training/testing.
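The checks above can be sketched as a small standalone filter. This is a hypothetical helper, not the poster’s actual script: it covers the open/read, no-energy, and low-level checks with the thresholds quoted above, and leaves out the Output > Input check, which needs the MFCC pipeline.

```python
# Sketch of the dropped-file checks described above; thresholds follow
# the post: -inf dB means no energy, <= -50 dB is probably a bad sample.
import math
import struct
import wave

def avg_db(path):
    """Average signal level in dBFS for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as w:
        raw = w.readframes(w.getnframes())
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")          # no energy in audio
    return 20 * math.log10(rms / 32768.0)

def keep_sample(path):
    """Return True if the WAV file passes the quality checks."""
    try:
        level = avg_db(path)
    except (wave.Error, EOFError, OSError):
        return False                  # wav.read error: corrupt file
    if level == float("-inf"):
        return False                  # no energy in audio
    if level <= -50:
        return False                  # probably a bad sample
    return True
```

Running this over the manifest before training and dropping every path where `keep_sample` returns False mirrors the dump-file approach described above.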
However, I am still getting an inf training loss with a decreasing validation loss. I’ll continue adjusting my learning rate as suggested in other posts on this forum.
Outputs represent the array of character labels, e.g. “loss” = [11, 14, 18, 18].
Input represents the MFCC representation of the audio as an array (e.g. [1, 7, 6]).
To calculate the comparable input length (processing.py):
len(mfcc) - 2*numcontext
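The two quantities being compared can be illustrated with a minimal sketch, assuming a 0-indexed alphabet where ‘a’ = 0 … ‘z’ = 25 (which reproduces the [11, 14, 18, 18] encoding of “loss” above); the `encode` and `input_length` helpers are hypothetical names:

```python
# Hypothetical illustration of the output labels and the comparable
# input length from the post above.
def encode(transcript, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Map a transcript to its array of character labels."""
    return [alphabet.index(c) for c in transcript]

def input_length(num_mfcc_frames, numcontext=9):
    """Comparable input length: len(mfcc) - 2*numcontext."""
    return num_mfcc_frames - 2 * numcontext

labels = encode("loss")   # -> [11, 14, 18, 18]
```

CTC can only align an utterance when the label array is no longer than this input length, which is why outputs larger than inputs flag a faulty sample.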
TensorFlow’s CTC loss function and the DeepSpeech preprocessing script don’t like this. The ignore_longer_outputs_than_inputs option lets you specify the behavior of CTCLoss when dealing with sequences that have longer outputs than inputs: if True, CTCLoss simply returns a zero gradient for those items; otherwise an InvalidArgument error is raised, stopping training.
I just used this check to find valid/invalid audio samples in my data.
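As a pre-filter, that check reduces to a one-line predicate. A minimal sketch, assuming the len(mfcc) - 2*numcontext formula quoted above and DeepSpeech’s default context width of 9 (the function name is hypothetical):

```python
# Validity check used to sort samples before training: CTC can align an
# (input, output) pair only if the label sequence fits in the time steps
# the network actually sees, i.e. len(mfcc) - 2*numcontext.
def valid_for_ctc(num_mfcc_frames, label_length, numcontext=9):
    """True if this sample will not trigger the longer-outputs case."""
    return label_length <= num_mfcc_frames - 2 * numcontext

# E.g. a clip with 100 MFCC frames and numcontext = 9 leaves 82 steps,
# so transcripts up to 82 labels pass and longer ones are dropped.
```

Dropping the samples where this returns False avoids relying on ignore_longer_outputs_than_inputs at training time.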