How to train a model on the Common Voice dataset using DeepSpeech v0.6.1?

As I’m a newbie to DeepSpeech, can anybody please guide me on how to train a model on the Common Voice dataset using checkpoints? When I try, I get an overfitting issue, so I have tried the options below:

  1. increased the dropout rate to 0.2
  2. decreased the learning rate to 0.000001

Any help will be appreciated. Thanks in advance!

Please, there are multiple questions here. Have you read the training doc? https://deepspeech.readthedocs.io/en/v0.6.1/TRAINING.html

Which language of Common Voice and which release are you referring to?
What do you mean by using checkpoints?

There’s no actionable question here.

I’m following this link https://deepspeech.readthedocs.io/en/v0.6.1/TRAINING.html#continuing-training-from-a-release-model to train a model on the Common Voice dataset.

What I mean is that the validation loss is always greater than the training loss in every scenario.

That does not answer:

  • all your command line parameters
  • which common voice dataset you are using exactly

That’s quite vague, and we don’t have a training log to use as a reference …

I’m using the Common Voice dataset for English. The parameters I’m using are:
--n_hidden 2048 --learning_rate 0.000001 --dropout_rate 0.5
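For reference, a complete fine-tuning invocation could be assembled roughly as below. This is only a sketch: the flag names follow the "continuing training from a release model" section of the v0.6.1 training doc linked above, while the checkpoint directory, the `train.csv`/`dev.csv` paths, and the epoch count are placeholders that were not given in this thread. The helper name `build_finetune_command` is made up for illustration.

```python
import shlex

def build_finetune_command(checkpoint_dir, train_csv, dev_csv, test_csv,
                           n_hidden=2048, learning_rate=0.000001,
                           dropout_rate=0.5, epochs=3):
    """Return the argument list for a fine-tuning run (it is not executed here)."""
    return [
        "python3", "DeepSpeech.py",
        # Value taken from the thread; it has to match the checkpoint being resumed.
        "--n_hidden", str(n_hidden),
        "--checkpoint_dir", checkpoint_dir,
        "--epochs", str(epochs),
        "--train_files", train_csv,
        "--dev_files", dev_csv,
        "--test_files", test_csv,
        "--learning_rate", str(learning_rate),
        "--dropout_rate", str(dropout_rate),
    ]

# All paths below are assumptions for illustration, except test.csv,
# which appears in the test report later in the thread.
cmd = build_finetune_command(
    checkpoint_dir="/path/to/deepspeech-0.6.1-checkpoint",
    train_csv="/home/user/en/clips/train.csv",
    dev_csv="/home/user/en/clips/dev.csv",
    test_csv="/home/user/en/clips/test.csv",
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Posting the full printed command together with the training log gives readers everything in one place.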

We are still lacking:

  • version of common voice you are training with
  • version of deepspeech checkpoints you are training with
  • training log

Can you please share all the information at once instead of having me ask again and again?


OK. So I’m using DeepSpeech v0.6.1 and downloaded the checkpoints for the same version. The Common Voice dataset was downloaded from https://voice.mozilla.org/en/datasets.
I’m getting the result below.

Test on /home/user/en/clips/test.csv - WER: 0.422665, CER: 0.253988, loss: 44.272888
--------------------------------------------------------------------------------
WER: 6.000000, CER: 3.222222, loss: 193.430511
 - wav: file:///home/user/en/clips/common_voice_en_54384.wav
 - src: "undefined"
 - res: "everything on her and he banterer "
--------------------------------------------------------------------------------
WER: 3.750000, CER: 3.882353, loss: 363.831543
 - wav: file:///home/user/en/clips/common_voice_en_17645060.wav
 - src: "did you know that"
 - res: "the two road the du know that did you know that they do know that did you know that"
--------------------------------------------------------------------------------
WER: 2.666667, CER: 0.655172, loss: 120.730492
 - wav: file:///home/user/en/clips/common_voice_en_125325.wav
 - src: "elizabeth reclined gracefully"
 - res: "it is a bet to an integrate full"
--------------------------------------------------------------------------------
WER: 2.285714, CER: 1.928571, loss: 343.015198
 - wav: file:///home/user/en/clips/common_voice_en_17832183.wav
 - src: "as you sow so shall you reap"
 - res: "i just she didn't fall it all over myself i just sit in front at all"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 1.000000, loss: 20.358313
 - wav: file:///home/user/en/clips/common_voice_en_191353.wav
 - src: "amen"
 - res: "the men"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.600000, loss: 21.027395
 - wav: file:///home/user/en/clips/common_voice_en_18442278.wav
 - src: "behave yourself"
 - res: "the head or self"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.785714, loss: 33.058380
 - wav: file:///home/user/en/clips/common_voice_en_17267925.wav
 - src: "any volunteers"
 - res: "in a woman to"
--------------------------------------------------------------------------------
WER: 1.833333, CER: 1.285714, loss: 141.307816
 - wav: file:///home/user/en/clips/common_voice_en_680693.wav
 - src: "find me the saga air cavalry"
 - res: "fin made the aga i will cover for time the saga a cabal"
--------------------------------------------------------------------------------
WER: 1.666667, CER: 0.600000, loss: 42.782768
 - wav: file:///home/user/en/clips/common_voice_en_18429519.wav
 - src: "ideas are uncopyrightable"
 - res: "idea for an operator well"
--------------------------------------------------------------------------------
WER: 1.666667, CER: 0.666667, loss: 45.606365
 - wav: file:///home/user/en/clips/common_voice_en_2421.wav
 - src: "programming requires brains"
 - res: "so came i guess in"
--------------------------------------------------------------------------------
I Exporting the model...

Now, my question is: even though I have followed all the steps in the documentation for training a model on the Common Voice dataset, I’m not able to get good accuracy. I just want to know whether there is anything wrong with my approach.
Thanks!

Please, we have had multiple releases of Common Voice, so this is not helping here.

The first problem is that you keep sharing fewer details than you should, and they are spread all over the thread. I have to go back and forth just to get a picture of what you are doing. This is not helping at all.

Now, you report 42% WER on Common Voice. This is mostly on par with what we and other tools get on that dataset.

So please explain what exactly you mean. If that’s the WER of the test set, then I don’t see anything we can really improve as of now.

Not knowing your exact Common Voice version is a huge problem here because, as far as I remember, some of its data went into the 0.6.1 model, so you might just be overfitting on it.
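To help read the per-sample numbers in the test report above: WER and CER are usually defined as an edit distance divided by the length of the reference, so a single sample can score above 1.0 when the output is much longer than the reference. The following is a minimal sketch of that standard definition, not DeepSpeech's own evaluation code; the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def wer(src, res):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = src.split()
    return edit_distance(ref_words, res.split()) / len(ref_words)

def cer(src, res):
    """Character error rate: character-level edit distance over reference length."""
    return edit_distance(src, res) / len(src)

# Two samples from the report: short references with longer outputs.
print(wer("amen", "the men"), cer("amen", "the men"))
print(wer("behave yourself", "the head or self"))
```

Under this definition the "amen" sample gives WER 2.0 and CER 1.0, matching the values in the report, which is why short references dominate the list of worst examples.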

How can I find out the Common Voice dataset version? I downloaded it from https://voice.mozilla.org/en/datasets.
The other details are:
CUDA 10.2 and TensorFlow 1.14. I have 2 Tesla K80 GPUs with 24 GiB.

The download has a release name. When did you download it?

I downloaded the Common Voice data using sudo wget https://voice-prod-bundler-ee1969a6ce8178826482b88e843c335139bd3fb4.s3.amazonaws.com/cv-corpus-3/en.tar.gz

So I think the Common Voice version is 3.

Right, you still have not explained exactly what you meant by “Im not able to get good accuracy result.”, though.

I meant that whenever I train a model, the validation loss is always greater than the training loss, and the test results are not very accurate.

Please be clear here; I’ve already answered that point.

Well, you have not shared any training log, so again, it is hard to help you there … How much do they diverge? Is the gap constant? Increasing? Decreasing? Have you tried other learning rates? Other dropout rates?

I have this training log with dropout rate 0.2 and learning rate 0.000001:
Epoch 0
Training loss 20.814306
validation loss 37.094707

Epoch 1
Training loss 20.219907
validation loss 36.780804

Epoch 2
Training loss 19.969551
validation loss 36.636615

Epoch 3
Training loss 19.783321
validation loss 36.561840

I don’t see anything suspicious on that.
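One way to check this for yourself is to look at the trend rather than the absolute gap. A validation loss that is higher than the training loss is not by itself a sign of overfitting; the usual warning sign is a validation loss that starts rising while the training loss keeps falling. The sketch below uses the four epochs from the log above; the helper name `summarize` is made up for illustration.

```python
# (epoch, training loss, validation loss) copied from the log above.
log = [
    (0, 20.814306, 37.094707),
    (1, 20.219907, 36.780804),
    (2, 19.969551, 36.636615),
    (3, 19.783321, 36.561840),
]

def summarize(log):
    """Return the per-epoch gap and whether validation loss improves every epoch."""
    val = [v for _, _, v in log]
    gaps = [round(v - t, 6) for _, t, v in log]
    val_improving = all(later < earlier for earlier, later in zip(val, val[1:]))
    return gaps, val_improving

gaps, val_improving = summarize(log)
print("gap per epoch:", gaps)
print("validation loss decreasing every epoch:", val_improving)
```

Here the validation loss still decreases at every epoch, even though the gap to the training loss grows slightly.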

OK. So what could be the reason that the test results are not correct?


Have you read my previous replies? You have 42% WER on the test set of Common Voice; that’s within the range of the literature.

As documented in the flags, the test report shows the worst examples … Why are they that bad? I don’t know, maybe those are broken transcripts?
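If broken transcripts are the suspicion, the test CSV can be scanned for them before training. The sketch below assumes the usual DeepSpeech CSV layout with `wav_filename`, `wav_filesize` and `transcript` columns; the set of values treated as suspicious is an assumption based on the "undefined" reference in the report above, and the sample rows use a placeholder file size.

```python
import csv
import io

# Transcript values treated as broken; this set is an assumption, extend as needed.
SUSPICIOUS = {"", "undefined"}

def find_suspicious_transcripts(csv_file):
    """Return the wav_filename of every row whose transcript looks broken."""
    reader = csv.DictReader(csv_file)
    return [row["wav_filename"] for row in reader
            if row["transcript"].strip().lower() in SUSPICIOUS]

# In-memory stand-in for test.csv; the file sizes are placeholders.
sample = io.StringIO(
    "wav_filename,wav_filesize,transcript\n"
    "common_voice_en_54384.wav,0,undefined\n"
    "common_voice_en_191353.wav,0,amen\n"
)
flagged = find_suspicious_transcripts(sample)
print(flagged)
```

For a real file, pass `open("/home/user/en/clips/test.csv", newline="", encoding="utf-8")` instead of the in-memory sample.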