How to train a model on the Common Voice dataset using DeepSpeech v0.6.1?

nehanagpure2016 · February 10, 2020, 11:53am

As I'm a newbie to DeepSpeech, can anybody please guide me on how to train a model on the Common Voice dataset using checkpoints? When I try, I run into an overfitting issue, so I have tried the options below:

increased dropout rate to 0.2
decreased learning rate to 0.000001
Any help will be appreciated. Thanks in advance!

lissyx · February 10, 2020, 11:55am

Please, there are multiple items here. Have you read the training doc? https://deepspeech.readthedocs.io/en/v0.6.1/TRAINING.html

Which language of Common Voice and which release are you referring to?
What do you mean by "using checkpoints"?

There’s no actionable question here.

nehanagpure2016 · February 10, 2020, 12:04pm

I'm following this link https://deepspeech.readthedocs.io/en/v0.6.1/TRAINING.html#continuing-training-from-a-release-model to train a model on the Common Voice dataset.

nehanagpure2016 · February 10, 2020, 12:05pm

What I mean is that I always get a validation loss greater than the training loss, in every scenario.

lissyx · February 10, 2020, 12:07pm

That does not answer:

all your command line parameters
which common voice dataset you are using exactly

That’s quite vague, and we don’t have a training log to have a reference …

nehanagpure2016 · February 11, 2020, 4:51am

I'm using the Common Voice dataset for English. The parameters I'm using are:
--n_hidden 2048 --learning_rate 0.000001 --dropout_rate 0.5

lissyx · February 11, 2020, 8:42am

We are still lacking:

version of common voice you are training with
version of deepspeech checkpoints you are training with
training log

Can you please share all the information at once instead of having me ask again and again?

nehanagpure2016 · February 12, 2020, 8:34am

OK. I'm using DeepSpeech v0.6.1 and downloaded the checkpoints for the same version. The Common Voice dataset was downloaded from https://voice.mozilla.org/en/datasets.
I'm getting the result below.

Test on /home/user/en/clips/test.csv - WER: 0.422665, CER: 0.253988, loss: 44.272888
--------------------------------------------------------------------------------
WER: 6.000000, CER: 3.222222, loss: 193.430511
 - wav: file:///home/user/en/clips/common_voice_en_54384.wav
 - src: "undefined"
 - res: "everything on her and he banterer "
--------------------------------------------------------------------------------
WER: 3.750000, CER: 3.882353, loss: 363.831543
 - wav: file:///home/user/en/clips/common_voice_en_17645060.wav
 - src: "did you know that"
 - res: "the two road the du know that did you know that they do know that did you know that"
--------------------------------------------------------------------------------
WER: 2.666667, CER: 0.655172, loss: 120.730492
 - wav: file:///home/user/en/clips/common_voice_en_125325.wav
 - src: "elizabeth reclined gracefully"
 - res: "it is a bet to an integrate full"
--------------------------------------------------------------------------------
WER: 2.285714, CER: 1.928571, loss: 343.015198
 - wav: file:///home/user/en/clips/common_voice_en_17832183.wav
 - src: "as you sow so shall you reap"
 - res: "i just she didn't fall it all over myself i just sit in front at all"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 1.000000, loss: 20.358313
 - wav: file:///home/user/en/clips/common_voice_en_191353.wav
 - src: "amen"
 - res: "the men"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.600000, loss: 21.027395
 - wav: file:///home/user/en/clips/common_voice_en_18442278.wav
 - src: "behave yourself"
 - res: "the head or self"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.785714, loss: 33.058380
 - wav: file:///home/user/en/clips/common_voice_en_17267925.wav
 - src: "any volunteers"
 - res: "in a woman to"
--------------------------------------------------------------------------------
WER: 1.833333, CER: 1.285714, loss: 141.307816
 - wav: file:///home/user/en/clips/common_voice_en_680693.wav
 - src: "find me the saga air cavalry"
 - res: "fin made the aga i will cover for time the saga a cabal"
--------------------------------------------------------------------------------
WER: 1.666667, CER: 0.600000, loss: 42.782768
 - wav: file:///home/user/en/clips/common_voice_en_18429519.wav
 - src: "ideas are uncopyrightable"
 - res: "idea for an operator well"
--------------------------------------------------------------------------------
WER: 1.666667, CER: 0.666667, loss: 45.606365
 - wav: file:///home/user/en/clips/common_voice_en_2421.wav
 - src: "programming requires brains"
 - res: "so came i guess in"
--------------------------------------------------------------------------------
I Exporting the model...

Now, my question is: even though I have followed all the steps in the documentation for training a model on the Common Voice dataset, I'm not able to get good accuracy. I just want to know whether there is anything wrong with my approach.
Thanks!

lissyx · February 12, 2020, 8:34am

Please, we have had multiple releases of Common Voice, so this is not helping here.

lissyx · February 12, 2020, 8:37am

The first thing that is wrong is that you keep not sharing as many details as you should, and what you do share is spread all over. I have to go back and forth just to get a picture of what you are doing. This is not helping at all.

Now, you report 42% WER on Common Voice. This is mostly on par with what we have and what other tools get on that dataset.

So please explain what exactly you mean. If that’s the WER of the test set, then I don’t see anything we can really improve as of now.

Not knowing your exact Common Voice version is a huge problem here because, as far as I remember, some of its data went into the 0.6.1 model, so you might just be overfitting on it.
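To make the WER figures in this thread concrete, here is a minimal sketch of a word error rate computation: word-level edit distance divided by the number of reference words. It is not taken from the DeepSpeech code base, and the function names are made up for illustration. It does reproduce the per-sample values for three of the `src`/`res` pairs in the test report, and it shows why a per-sample WER can be far above 1.0 when the reference is short and the hypothesis is long.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (substitutions, insertions and deletions all cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate of one sample: word edits / reference word count."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    return edit_distance(ref_words, hyp_words) / len(ref_words)


# src/res pairs copied from the test report in this thread
samples = [
    ("undefined", "everything on her and he banterer "),
    ("did you know that",
     "the two road the du know that did you know that "
     "they do know that did you know that"),
    ("amen", "the men"),
]

for src, res in samples:
    print(f"{wer(src, res):.6f}  {src!r}")
# 6.000000  'undefined'
# 3.750000  'did you know that'
# 2.000000  'amen'
```

A one-word reference against a six-word hypothesis already gives 6.0, so a handful of such samples at the top of the report says little about the 0.42 figure for the whole test set.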

nehanagpure2016 · February 12, 2020, 10:02am

How can I find out the Common Voice dataset version? I downloaded it from https://voice.mozilla.org/en/datasets.
The other details are as follows:
CUDA 10.2 and TensorFlow 1.14. I have 2 Tesla K80 GPUs with 24 GiB.

lissyx · February 12, 2020, 10:16am

The download has a release name. When did you download it?

nehanagpure2016 · February 12, 2020, 10:23am

I downloaded the Common Voice data using sudo wget https://voice-prod-bundler-ee1969a6ce8178826482b88e843c335139bd3fb4.s3.amazonaws.com/cv-corpus-3/en.tar.gz

So I think the Common Voice version is 3.

lissyx · February 12, 2020, 10:32am

Right, but you still have not explained exactly what you meant by “Im not able to get good accuracy result.”

nehanagpure2016 · February 12, 2020, 10:42am

I meant to say that whenever I train a model, the validation loss is always greater than the training loss. Also, the test results are not very accurate.

lissyx · February 12, 2020, 10:44am

Please be clear there, I’ve already answered on that point.

Well, you have not shared any training log, so again, it is hard to help you there. How much do they diverge? Is the gap constant? Increasing? Decreasing? Have you tried other learning rates? Other dropout values?

nehanagpure2016 · February 12, 2020, 1:04pm

I have this training log with dropout rate 0.2 and learning rate 0.000001:
Epoch 0
Training loss 20.814306
validation loss 37.094707

Epoch 1
Training loss 20.219907
validation loss 36.780804

Epoch 2
Training loss 19.969551
validation loss 36.636615

Epoch 3
Training loss 19.783321
validation loss 36.561840

lissyx · February 12, 2020, 1:09pm

I don’t see anything suspicious on that.
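For reference, the questions asked earlier (how much do the two losses diverge, and is the gap constant, increasing or decreasing?) can be answered directly from the four epochs posted above. The snippet below is only a sketch of that arithmetic; the variable names are illustrative and the numbers are copied from the log.

```python
# (epoch, training loss, validation loss) copied from the log above
log = [
    (0, 20.814306, 37.094707),
    (1, 20.219907, 36.780804),
    (2, 19.969551, 36.636615),
    (3, 19.783321, 36.561840),
]

# Gap between validation and training loss for each epoch
gaps = [round(dev - train, 6) for _, train, dev in log]

# Do both curves go down from every epoch to the next?
train_falling = all(b[1] < a[1] for a, b in zip(log, log[1:]))
dev_falling = all(b[2] < a[2] for a, b in zip(log, log[1:]))

for (epoch, train, dev), gap in zip(log, gaps):
    print(f"epoch {epoch}: train={train:.6f} dev={dev:.6f} gap={gap:.6f}")
print("training loss falling:", train_falling)
print("validation loss falling:", dev_falling)
# epoch 0: train=20.814306 dev=37.094707 gap=16.280401
# epoch 1: train=20.219907 dev=36.780804 gap=16.560897
# epoch 2: train=19.969551 dev=36.636615 gap=16.667064
# epoch 3: train=19.783321 dev=36.561840 gap=16.778519
# training loss falling: True
# validation loss falling: True
```

Both losses are still going down and the gap grows by only about 0.5 over four epochs. The usual sign of overfitting is a validation loss that starts rising while the training loss keeps falling, which these four epochs do not show.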

nehanagpure2016 · February 12, 2020, 1:26pm

OK. So what could be the reason that the test results are not correct?

lissyx · February 12, 2020, 1:32pm

Have you read my previous replies? You have 42% WER on the test set of Common Voice, which is within the range reported in the literature.

As documented in the flags, the test report shows the worst examples. Why are they that bad? I don’t know, maybe those are broken transcripts?
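One way to follow up on the broken-transcript idea is to scan the test CSV for transcripts that look like placeholders, such as the `"undefined"` reference in the first reported sample. The sketch below assumes the usual DeepSpeech CSV columns (`wav_filename`, `wav_filesize`, `transcript`). The set of suspicious values and the function name are hypothetical, and a clip whose speaker really said "undefined" would be flagged too, so the hits need a manual listen.

```python
import csv
import io

# Hypothetical placeholder values; adjust to whatever shows up in your data.
SUSPICIOUS = {"", "undefined", "null", "nan"}


def find_suspicious_transcripts(csv_file):
    """Return (wav_filename, transcript) pairs whose transcript is empty
    or matches one of the placeholder strings in SUSPICIOUS."""
    flagged = []
    for row in csv.DictReader(csv_file):
        transcript = (row.get("transcript") or "").strip().lower()
        if transcript in SUSPICIOUS:
            flagged.append((row["wav_filename"], transcript))
    return flagged


# Tiny in-memory stand-in for test.csv; the file sizes are dummy values.
sample = io.StringIO(
    "wav_filename,wav_filesize,transcript\n"
    "common_voice_en_54384.wav,0,undefined\n"
    "common_voice_en_191353.wav,0,amen\n"
)

flagged = find_suspicious_transcripts(sample)
print(flagged)
# [('common_voice_en_54384.wav', 'undefined')]
```

For a real file, pass an open handle instead of the in-memory sample, for example `open("/home/user/en/clips/test.csv", newline="", encoding="utf-8")`.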