Evaluation of the model

Hey
@lissyx @reuben @kdavis

Hope everyone is doing well.

I had 3 lakh (300,000) audio clips, of which I used 70% for training, 15% for validation, and the other 15% for testing. The result I got on the test set after training with DeepSpeech 0.4.1 was WER 0.32, CER 0.11, and a loss of 6.

After that I eventually increased my dataset to 6 lakh (600,000) clips.
I just want to know: should I change my test set and take a new 10% of the 6 lakh clips to see whether the model does better on test than before, or is it wrong to evaluate that way?
Or should I keep my test set the same while I keep increasing my training set, and always test on that same set?

How do I know that the model is improving as I train on more audio? What is the right approach?

Here are a few tips:

  • As you have a lot of samples, taking a simple percentage cut can be wasteful of training data. I recommend using a sample size calculator like this one: https://www.surveymonkey.com/mp/sample-size-calculator/ (We use 99% confidence level with 1% margin of error for our dev/test sample sizes). A rough sketch of the underlying math is included right after this list.
  • You should update to current master as there have been a lot of improvements since v0.4.1
  • If you want to make apples to apples comparisons between different models then the validation/test sets need to be identical. If you’re continuously collecting data, fixed dev/test sets will tend to be more and more biased over time as new training data gets added. To handle this, I recommend making new dev/test sets occasionally and then passing multiple files to --dev_files/--test_files so that you can keep track of things correctly. You can think of it as a bit of a versioning scheme, having e.g. dev_v1.csv, dev_v2.csv, etc, as you collect data. That way you’ll be able to know if you’re regressing on a set that you previously did well on.
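For reference, the math behind that kind of sample size calculator is simple enough to reproduce yourself. A minimal sketch in Python, assuming 99% confidence (z of roughly 2.576), a 1% margin of error, and the worst-case proportion p = 0.5 (the function name is just illustrative):

    import math

    def sample_size(population, z=2.576, margin=0.01, p=0.5):
        # Cochran's formula for an effectively infinite population...
        n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
        # ...followed by the finite-population correction.
        return math.ceil(n0 / (1 + (n0 - 1) / population))

    print(sample_size(570838))  # ~16121; calculators that round z up to 2.58 report ~16170

On the versioning point, the train/dev/test file flags accept comma-separated lists, so as new sets are created you can pass something like --dev_files dev_v1.csv,dev_v2.csv --test_files test_v1.csv,test_v2.csv and keep track of each set separately.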

Hello @reuben

Thank you for your prompt and detailed response.

I used the calculator. I have 570838 files, so I entered that as the population size, kept the confidence level at 99% and the margin of error at 1%, and got a sample size of 16170. So do you mean that out of the 570838 files I should divide 16170 files between the test and dev sets?

Wouldn't that be too few?

If you have 570838 in your set, I'd use 538552 for training, 16143 for dev, and 16143 for test.
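If it helps, a split of that shape can be produced with a few lines of pandas. This is only a rough sketch: all_clips.csv is a placeholder for your combined CSV, and it assumes the usual wav_filename,wav_filesize,transcript column layout.

    import pandas as pd

    # Shuffle once so dev/test are representative of the whole collection.
    df = pd.read_csv("all_clips.csv").sample(frac=1, random_state=42)

    dev = df.iloc[:16143]
    test = df.iloc[16143:32286]
    train = df.iloc[32286:]  # the remaining ~538552 rows

    dev.to_csv("dev_v1.csv", index=False)
    test.to_csv("test_v1.csv", index=False)
    train.to_csv("train_v1.csv", index=False)

Naming them dev_v1.csv/test_v1.csv also fits the versioning scheme mentioned above.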

Thank you, I understand now how the split should be done.
After doing the same, I am getting a training loss of infinity while the validation loss is decreasing every epoch.
The learning rate I have set is 0.0001.
What can be the reason for such a result?

Previously, when I trained on the 3 lakh clips, I didn't get an infinite loss on the training set. Now, after increasing the data from 3 to 6 lakh, I am getting an infinite loss on the training set.

Can corrupt data be a reason? If yes, how do I identify and remove the corrupt files that are driving the training loss to infinity?

Corrupt data could be the cause, but I've never seen corrupt data cause this problem. I've only seen it arise as a result of the learning rate being too high.

What does halving the learning rate yield?

Okay, trying with a learning rate of 0.00005, which is half of 0.0001, and I will let you know the result.
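For reference, that corresponds to passing something like --learning_rate 0.00005 to DeepSpeech.py, assuming the flag name is the same in the version you're running.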

If you update to latest master, it’ll tell you exactly what files are causing the inf loss.
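In the meantime, a quick way to catch obviously broken audio is to walk the CSV and try to read every referenced WAV. A rough sanity-check sketch, assuming the standard wav_filename,wav_filesize,transcript columns:

    import csv
    import wave

    def find_bad_wavs(csv_path):
        # Return (path, reason) pairs for WAVs that fail to open or contain no frames.
        bad = []
        with open(csv_path, newline="") as f:
            for row in csv.DictReader(f):
                path = row["wav_filename"]
                try:
                    with wave.open(path, "rb") as w:
                        if w.getnframes() == 0:
                            bad.append((path, "empty audio"))
                except Exception as err:
                    bad.append((path, str(err)))
        return bad

    for path, reason in find_bad_wavs("train_v1.csv"):
        print(path, "->", reason)

Unreadable files are not the only possible cause, though: a clip whose transcript is longer than the number of output time steps (very short audio paired with a long transcript) will also make the CTC loss infinite.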

So you mean using the DeepSpeech 0.6.0 checkpoint instead of the DeepSpeech 0.4.1 one?

Yes, and the v0.6.0 code as well, of course.

Even after setting the learning rate to 0.00005, I am still getting an infinite loss on the training data. Can you tell me how, and in which file, v0.6 identifies the audio files that cause the infinite loss? And can I apply that file directly to 0.4.1 without updating to 0.6?

You cannot. I tried to introduce such a check into v0.5.1 prior to release, and it would be a lot of work. It is easier to update to v0.6.0. If you have a file which produces a NaN or infinite loss, it will print which one specifically was problematic.

Here:

it is checked for such files.
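Conceptually it's along these lines; this is a simplified illustration rather than the actual DeepSpeech source. The per-sample losses for a batch are inspected, and any non-finite value is reported together with the corresponding WAV filename.

    import numpy as np

    def report_non_finite(losses, wav_filenames):
        # losses: per-sample loss values for the current batch
        # wav_filenames: the file paths fed in with that batch
        for loss, path in zip(losses, wav_filenames):
            if not np.isfinite(loss):
                print("Non-finite loss ({}) for sample {}".format(loss, path))

    report_non_finite([12.3, float("inf"), 8.1], ["a.wav", "b.wav", "c.wav"])
    # -> Non-finite loss (inf) for sample b.wav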

Okay @Jendker, understood.

Thank you for the reply.

Also, I wanted to know: is the model's reported WER of 7.5% for v0.6 with the language model or without it?

This is with the language model.