I had 300,000 (3 lakh) audio files, of which I used 70% for training, 15% for validation, and 15% for testing. After training on DeepSpeech 0.4.1, the test results were WER 0.32, CER 0.11, and loss 6.
After that I gradually increased my dataset to 600,000 (6 lakh) files.
I just want to know: should I build a new test set by taking 10% of the 600,000 files and check whether the model does better on it than before, or is it wrong to evaluate that way?
Or should I keep my test set fixed while growing the training set, and keep evaluating on that same test set?
How do I know the model is improving as I train on more audio? What is the right approach?
As you have a lot of samples, taking a simple percentage cut can waste training data. I recommend using a sample size calculator like this one: https://www.surveymonkey.com/mp/sample-size-calculator/ (we use a 99% confidence level with a 1% margin of error for our dev/test sample sizes).
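For reference, calculators like the one linked typically use Cochran's sample-size formula with a finite-population correction. A minimal sketch of that computation (assuming z ≈ 2.58 for 99% confidence and the most conservative proportion p = 0.5, which is what such calculators default to):

```python
import math

def sample_size(population, z=2.58, margin=0.01, p=0.5):
    """Cochran's formula with finite-population correction.
    z=2.58 ~ 99% confidence; p=0.5 maximizes the required sample."""
    n0 = z**2 * p * (1 - p) / margin**2           # infinite-population estimate
    return math.ceil(n0 / (1 + n0 / population))  # shrink for finite population

print(sample_size(570838))  # → 16170, matching the calculator's output
```

The finite-population correction is why the required sample barely grows once the dataset is large: for any population the 99%/1% sample never exceeds 16,641 files here.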
You should update to current master, as there have been a lot of improvements since v0.4.1.
If you want to make apples to apples comparisons between different models then the validation/test sets need to be identical. If you’re continuously collecting data, fixed dev/test sets will tend to be more and more biased over time as new training data gets added. To handle this, I recommend making new dev/test sets occasionally and then passing multiple files to --dev_files/--test_files so that you can keep track of things correctly. You can think of it as a bit of a versioning scheme, having e.g. dev_v1.csv, dev_v2.csv, etc, as you collect data. That way you’ll be able to know if you’re regressing on a set that you previously did well on.
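One way to script that versioning, as a sketch: this assumes DeepSpeech's `wav_filename,wav_filesize,transcript` CSV import format, and the `make_versioned_split` helper and file names are hypothetical.

```python
import csv
import random

def make_versioned_split(new_files, dev_size, test_size, version, seed=42):
    """Carve disjoint dev/test sets out of a newly collected batch of
    samples and write DeepSpeech-style CSVs: dev_vN.csv, test_vN.csv,
    train_vN.csv. Each row is (wav_filename, wav_filesize, transcript)."""
    random.seed(seed)
    shuffled = random.sample(new_files, len(new_files))  # shuffled copy
    dev = shuffled[:dev_size]
    test = shuffled[dev_size:dev_size + test_size]
    train = shuffled[dev_size + test_size:]
    for name, rows in (("dev", dev), ("test", test), ("train", train)):
        with open(f"{name}_v{version}.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["wav_filename", "wav_filesize", "transcript"])
            writer.writerows(rows)
    return dev, test, train
```

At training time you would then pass every version, e.g. `--dev_files dev_v1.csv,dev_v2.csv --test_files test_v1.csv,test_v2.csv`, so a regression on an older set stays visible.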
I used the calculator: I had 570,838 files, so I entered that as the population size with 99% confidence and a 1% margin of error, and got a sample size of 16,170. Do you mean that out of the 570,838 files I should divide 16,170 between the test and dev sets?
Thank you, I understood how the distribution should be done.
After doing the same, I am getting a training loss of infinity, while the validation loss decreases every epoch.
The learning rate I have set is 0.0001.
What can be the reason for such a result?
Previously, when I trained on the 300,000 files, I did not get an infinite training loss; it only appeared after increasing the data from 300,000 to 600,000.
Can corrupt data be the reason? If yes, how can I identify and remove the corrupt files that are driving the training loss to infinity?
Corrupt data could be the cause, but I’ve never seen corrupt data cause this problem. I’ve only seen it arise as a result of the learning rate being too high.
Even after lowering the learning rate to 0.00005, I am still getting infinite loss on the training data. Can you tell me how, and in which file, v0.6 identifies the audio files that cause infinite loss? And can I apply that file directly to 0.4.1 without updating to 0.6?
You cannot. I tried to introduce such a check into v0.5.1 prior to release and it would have been too much work. It is easier to update to v0.6.0: if a file produces NaN or infinite loss, it will print which file specifically was problematic.
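Until you update, a rough pre-flight check outside DeepSpeech can catch the usual culprits. This is a sketch, not DeepSpeech's own check: the `check_wav` helper is hypothetical, and it assumes 16 kHz mono WAV input with an approximate 20 ms feature step. Infinite CTC loss typically appears when a clip yields fewer time steps than its transcript has characters, or when the file is unreadable:

```python
import wave

def check_wav(path, transcript, sample_rate=16000, window_step_ms=20):
    """Return a list of problems with one training sample (empty = OK)."""
    problems = []
    try:
        with wave.open(path) as w:
            frames, rate = w.getnframes(), w.getframerate()
    except (wave.Error, EOFError, FileNotFoundError) as e:
        return [f"unreadable: {e}"]
    if frames == 0:
        problems.append("empty audio")
    if rate != sample_rate:
        problems.append(f"unexpected sample rate {rate}")
    # rough count of feature frames the acoustic model will see
    n_steps = (frames / rate) * 1000 / window_step_ms
    if n_steps < len(transcript):
        problems.append("audio too short for transcript (CTC loss -> inf)")
    return problems
```

Running this over every row of the training CSV and dropping any file that reports a problem should remove the samples most likely to blow up the loss, though only the v0.6.0 check is authoritative about what DeepSpeech itself rejects.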