Convergence problem - DeepSpeech not learning

Hi everyone,
I trained a DeepSpeech model on 300 hours of data (TED-LIUM + VoxForge) as a proof of concept, to establish that we can do large-scale training and get something useful (say, 85% accuracy).

Hardware: GTX 1080 Ti
Hyperparameters:

  1. n_hidden = 1024 (reduced, since I only had 300 hours of data)
  2. Dropout: 0.30
  3. Train batch size: 64

The language model was the pre-trained one released with DeepSpeech.
All other parameters were left at their defaults.
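
Roughly, the run looked like the sketch below. This is not a verbatim command: the CSV paths are placeholders for my importer output, and flag names may differ slightly between DeepSpeech checkouts, so check the flag definitions in yours.

```python
# Rough sketch of the training run (placeholder paths; flag names as in
# the Mozilla DeepSpeech training script, DeepSpeech.py, of this era).
import subprocess

subprocess.run([
    "python", "DeepSpeech.py",
    "--train_files", "ted_voxforge_train.csv",  # placeholder CSVs
    "--dev_files",   "ted_voxforge_dev.csv",
    "--test_files",  "ted_voxforge_test.csv",
    "--n_hidden", "1024",            # reduced for the smaller corpus
    "--dropout_rate", "0.30",
    "--train_batch_size", "64",
    "--checkpoint_dir", "checkpoints/",
], check=True)
```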

Here is how the model was learning:

  1. It learned for the first 10 epochs: both train and validation loss were going down.
  2. It did not learn anything in the next 10 epochs, and it was not even over-fitting.
    It looks like a case of underfitting, where both train and validation loss stay very high and stop decreasing (a minimal plateau check is sketched below).
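
To make "not learning" concrete, a plateau check might look like the sketch below. This is illustrative only, not DeepSpeech code; if I remember right, some checkouts of the training script also expose early-stopping flags that do something similar.

```python
# Minimal plateau check (illustrative helper, not part of DeepSpeech):
# flag a run as stalled when the dev loss has not improved by at least
# `min_delta` over the last `patience` epochs.
def has_plateaued(dev_losses, patience=4, min_delta=0.05):
    if len(dev_losses) <= patience:
        return False
    best_before = min(dev_losses[:-patience])
    best_recent = min(dev_losses[-patience:])
    return best_recent > best_before - min_delta

# Early epochs improve, later ones sit flat at a high loss:
losses = [180, 150, 130, 125, 124.99, 125.2, 125.1, 124.96]
print(has_plateaued(losses))  # True: no real improvement in the last 4 epochs
```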


Overall test accuracy was 20%, which is very poor (WER 80%).
It took a straight 9-10 hours to train this model (300 hours of audio) for 20 epochs on the 1080 Ti.
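
For anyone comparing numbers: the WER I am quoting is just the word-level edit distance divided by the number of reference words, so 20% "accuracy" corresponds to 80% WER. A minimal sketch:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost,  # substitution / match
                          d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1)         # insertion
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat on the mat", "the cat sat mat"))  # 2 errors / 6 words ≈ 0.33
```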

Can anyone give me some insight into what I should have done to get good accuracy? I have seen people getting ~39% WER on the TED corpus alone.

I also wanted to know whether there are tried-and-tested parameters that have yielded the best accuracy on the TED + VoxForge corpus.

Sir, I think you should keep the dropout value <= 0.12 and reduce the batch size; the model will learn better. :slightly_smiling_face:

Cool, will try. This was 10 days back.

To experiment, I started with 100 hours.
By last night, after doing all the parameter search,
I had achieved 76% accuracy (24% WER) on 100 hours of the VoxForge dataset alone.

This is by far the best I could achieve on VoxForge alone. Now I am trying to replace the uni-directional RNN with a bi-directional RNN in the current DeepSpeech setup.
:slight_smile:

Sir, if you don’t mind, may I ask one question:
are you developing an Indian-accent model, or US/UK-accented English?
I need some suggestions; I don’t have large datasets to build an Indian-accent model.
Any idea how I can prepare datasets for an Indian accent?


Right now, I am only experimenting. But yes, I may have to develop it for a UK English accent 1-2 months down the line.

You have to chunk the Indian-accented audio into 5-10 second clips and collect the transcriptions.
You can use Google Docs voice typing to transcribe and then manually correct it.
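
Something like the sketch below is what I mean by chunking. It assumes pydub (pip install pydub) with ffmpeg installed, and the file names are placeholders. In practice you may want pydub's split_on_silence instead of fixed windows, so you don't cut words in half.

```python
# Rough chunking sketch, assuming pydub and ffmpeg are available.
import os
from pydub import AudioSegment

CHUNK_MS = 10_000  # 10-second clips, as suggested above

os.makedirs("clips", exist_ok=True)
audio = AudioSegment.from_file("indian_accent_talk.wav")  # placeholder file
for i, start in enumerate(range(0, len(audio), CHUNK_MS)):
    clip = audio[start:start + CHUNK_MS]
    # DeepSpeech importers expect 16 kHz, mono, 16-bit WAV
    clip.set_frame_rate(16000).set_channels(1).set_sample_width(2).export(
        f"clips/clip_{i:04d}.wav", format="wav"
    )
```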

Or try following this

@abhijeetchar sir, thank you so much. I will try to prepare data this way. :slightly_smiling_face:

You can use Google Docs voice typing to transcribe and then manually correct it.

Sir, but Google does not offer an en-IN accent for transcription. :slightly_smiling_face:

That is why I said to then manually correct it further.

Also, that’s Google, buddy; their model may have been trained on 10,000 hours of speech from all accents.
As far as I know, it does a fair job even on Indian accents. I had tried it some time back.

Thank you so much, sir. I will do it. :slightly_smiling_face:

I would avoid that kind of suggestion; it’s very likely that Google’s terms of use prohibit that :confused:

Oh, is it?
Though I had tried playing a few audio clips into Google Docs voice typing, and it was able to transcribe them.

Also, there are business constraints which prevent such transcription methods.

I’m not saying it doesn’t work; I’m saying it’s likely not authorized to use Google’s service to train a competing system :slight_smile:

Cool man… I got your point.

Yes, a link to the terms of use prohibiting this is here: google terms of use


Hi Sir,

You mentioned that after tuning parameters you got 76% accuracy. Could you please tell me which parameters you tuned to get that accuracy? I tuned the learning rate to 0.0001 and the dropout to 0.40 with 160 epochs. Is that OK, or do I need to tune any other hyperparameters?

Thanks

Please read this first: