As I'm a newbie to DeepSpeech, can anybody please guide me on how to train a model on the Common Voice dataset using checkpoints? When I try, I run into overfitting, so I have tried the following options:
increased the dropout rate to 0.2
decreased the learning rate to 0.000001
Any help will be appreciated. Thanks in advance!
OK. I'm using DeepSpeech v0.6.1 and downloaded the checkpoints for the same version. The Common Voice dataset is downloaded from
I'm getting the following result:
Test on /home/user/en/clips/test.csv - WER: 0.422665, CER: 0.253988, loss: 44.272888
WER: 6.000000, CER: 3.222222, loss: 193.430511
- wav: file:///home/user/en/clips/common_voice_en_54384.wav
- src: "undefined"
- res: "everything on her and he banterer "
WER: 3.750000, CER: 3.882353, loss: 363.831543
- wav: file:///home/user/en/clips/common_voice_en_17645060.wav
- src: "did you know that"
- res: "the two road the du know that did you know that they do know that did you know that"
WER: 2.666667, CER: 0.655172, loss: 120.730492
- wav: file:///home/user/en/clips/common_voice_en_125325.wav
- src: "elizabeth reclined gracefully"
- res: "it is a bet to an integrate full"
WER: 2.285714, CER: 1.928571, loss: 343.015198
- wav: file:///home/user/en/clips/common_voice_en_17832183.wav
- src: "as you sow so shall you reap"
- res: "i just she didn't fall it all over myself i just sit in front at all"
WER: 2.000000, CER: 1.000000, loss: 20.358313
- wav: file:///home/user/en/clips/common_voice_en_191353.wav
- src: "amen"
- res: "the men"
WER: 2.000000, CER: 0.600000, loss: 21.027395
- wav: file:///home/user/en/clips/common_voice_en_18442278.wav
- src: "behave yourself"
- res: "the head or self"
WER: 2.000000, CER: 0.785714, loss: 33.058380
- wav: file:///home/user/en/clips/common_voice_en_17267925.wav
- src: "any volunteers"
- res: "in a woman to"
WER: 1.833333, CER: 1.285714, loss: 141.307816
- wav: file:///home/user/en/clips/common_voice_en_680693.wav
- src: "find me the saga air cavalry"
- res: "fin made the aga i will cover for time the saga a cabal"
WER: 1.666667, CER: 0.600000, loss: 42.782768
- wav: file:///home/user/en/clips/common_voice_en_18429519.wav
- src: "ideas are uncopyrightable"
- res: "idea for an operator well"
WER: 1.666667, CER: 0.666667, loss: 45.606365
- wav: file:///home/user/en/clips/common_voice_en_2421.wav
- src: "programming requires brains"
- res: "so came i guess in"
I Exporting the model...
Now, my question is: even though I have followed all the steps in the documentation for training a model on the Common Voice dataset, I'm not able to get good accuracy. I just want to know whether there is anything wrong with my approach.
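When reading the test report above, note that a per-clip WER above 1 is possible: inserted words count as errors, so a one-word reference ("amen") decoded as two words ("the men") gives 2 edits / 1 word = 2.0. The minimal Python sketch below assumes the standard definition (edit distance divided by reference length); it is not DeepSpeech's own evaluation code, and the corpus-level aggregation (total word edits over total reference words, rather than the mean of per-clip WERs) is an assumption about how the headline figure is computed.

```python
# Minimal sketch of WER/CER via Levenshtein distance.
# Assumption: standard definitions, not DeepSpeech's own evaluate code.

def levenshtein(ref, hyp):
    """Minimum substitutions + insertions + deletions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete a reference token
                curr[j - 1] + 1,         # insert a hypothesis token
                prev[j - 1] + (r != h),  # substitute (free when tokens match)
            )
        prev = curr
    return prev[-1]


def wer(src, res):
    """Word error rate of one clip: word edits / number of reference words."""
    ref_words = src.split()
    return levenshtein(ref_words, res.split()) / len(ref_words)


def cer(src, res):
    """Character error rate of one clip: character edits / reference length."""
    return levenshtein(src, res) / len(src)


def corpus_wer(pairs):
    """Assumed corpus-level WER: total word edits / total reference words."""
    edits = sum(levenshtein(s.split(), r.split()) for s, r in pairs)
    words = sum(len(s.split()) for s, _ in pairs)
    return edits / words


if __name__ == "__main__":
    print(wer("amen", "the men"))  # 2.0 (one substitution + one insertion)
    print(cer("amen", "the men"))  # 1.0
    # A perfect 4-word clip pulls the corpus figure down to 2 / 5 = 0.4,
    # whereas the mean of the two per-clip WERs would be 1.0.
    print(corpus_wer([("amen", "the men"),
                      ("did you know that", "did you know that")]))  # 0.4
```

Because the worst clips are short, a handful of insertions inflates their per-clip WER far more than it moves the corpus-level number.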
Please, we have had multiple releases of Common Voice, so this is not helping here.
The first wrong thing is that you keep not sharing as many details as you should, and what you do share is spread all over. I have to go back and forth just to get a picture of what you are doing. This is not helping at all.
Now, you report 42% WER on Common Voice. This is mostly on par with what we have and what other tools get on that dataset.
So please explain what exactly you mean. If that's the WER on the test set, then I don't see anything we can really improve as of now.
Not knowing your exact Common Voice version is a huge problem here, because as far as I remember, some of that data went into the 0.6.1 model, so you might just be overfitting on it.
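Whether the released 0.6.1 model saw your test clips cannot be checked from this thread, but you can at least check whether the CSVs you fine-tune and test on share clips or sentences. A minimal sketch, assuming DeepSpeech-style CSVs with `wav_filename` and `transcript` columns (the column names are an assumption); `clip_names`, `transcripts` and `report_overlap` are hypothetical helpers, not part of DeepSpeech.

```python
import csv
from pathlib import PurePath


def clip_names(csv_lines, column="wav_filename"):
    """Set of clip basenames in a DeepSpeech-style CSV (column name assumed)."""
    # Compare basenames so the same clip under different directories still matches.
    return {PurePath(row[column]).name for row in csv.DictReader(csv_lines)}


def transcripts(csv_lines, column="transcript"):
    """Set of lowercased, whitespace-normalised transcripts."""
    return {" ".join(row[column].lower().split()) for row in csv.DictReader(csv_lines)}


def report_overlap(train_csv, test_csv):
    """Return (shared clip names, shared sentences) between two CSV files."""
    with open(train_csv, newline="", encoding="utf-8") as f:
        train_lines = list(f)
    with open(test_csv, newline="", encoding="utf-8") as f:
        test_lines = list(f)
    shared_clips = clip_names(train_lines) & clip_names(test_lines)
    shared_sentences = transcripts(train_lines) & transcripts(test_lines)
    return shared_clips, shared_sentences
```

For example, `report_overlap("train.csv", "test.csv")` on your own split files (hypothetical paths) should return two empty sets; any shared clip or sentence means the test WER is partly measuring memorisation.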
How can I find out the Common Voice dataset version? I downloaded it from
The other details are as follows:
CUDA 10.2 and TensorFlow 1.14. I have 2 Tesla K80 GPUs with 24 GiB.
The download has a release name. When did you download it?
I meant to say that whenever I'm training a model, the validation loss is always greater than the training loss. And lastly, the test results are not very accurate.
Please be clear there; I've already answered that point.
Well, you have not shared any training log, so again, it is hard to help you there... How much do they diverge? Is the gap constant? Increasing? Decreasing? Have you tried other learning rates? Other dropout values?
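The questions above (how much, constant, increasing, decreasing) can be answered from the per-epoch training and validation losses printed in the training log. A minimal sketch, assuming you copy those per-epoch numbers into two lists by hand (no log parsing is attempted); `loss_gap_report` and its `tolerance` threshold are hypothetical, not DeepSpeech features.

```python
def loss_gap_report(train_losses, dev_losses, tolerance=0.5):
    """Summarise how validation (dev) loss diverges from training loss.

    tolerance is an arbitrary threshold, in loss units per epoch, below which
    the gap's slope is reported as "roughly constant"; tune it to your scale.
    """
    if len(train_losses) != len(dev_losses) or len(train_losses) < 2:
        raise ValueError("need equal-length loss lists covering at least two epochs")
    gaps = [dev - train for train, dev in zip(train_losses, dev_losses)]
    n = len(gaps)
    mean_x = (n - 1) / 2
    mean_gap = sum(gaps) / n
    # Ordinary least-squares slope of the gap against the epoch index.
    slope = sum((x - mean_x) * (g - mean_gap) for x, g in enumerate(gaps)) / sum(
        (x - mean_x) ** 2 for x in range(n)
    )
    if abs(slope) < tolerance:
        trend = "roughly constant"
    elif slope > 0:
        trend = "increasing"
    else:
        trend = "decreasing"
    # Epoch (0-indexed) with the lowest validation loss: the checkpoint to keep.
    best_epoch = min(range(n), key=lambda i: dev_losses[i])
    return {"gaps": gaps, "slope": slope, "trend": trend, "best_dev_epoch": best_epoch}
```

A validation loss that sits slightly above the training loss with a roughly constant gap is common; a gap that keeps increasing while the validation loss starts rising is the usual sign of overfitting, and the best checkpoint is the one at `best_dev_epoch`.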
OK. So what could be the reason that the test results are not correct?
Have you read my previous replies? You have 42% WER on the test set of Common Voice; that's within the range reported in the literature.
As documented in the flags, the test report shows the top worst examples... Why are they that bad? I don't know; maybe those are broken transcripts?
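One way to check the broken-transcript hypothesis is to scan the test CSV for placeholder or out-of-alphabet transcripts, such as the `undefined` source text in the worst example of the report. A minimal sketch, assuming a DeepSpeech-style CSV with `wav_filename` and `transcript` columns and a lowercase English alphabet plus space and apostrophe; the placeholder list and `suspicious_rows` are hypothetical, not part of DeepSpeech.

```python
import csv

# Hypothetical placeholder strings; "undefined" is the one seen in the report.
PLACEHOLDERS = {"", "undefined", "null", "none", "nan"}
# Assumption: lowercase English letters, space and apostrophe are valid.
ALLOWED = set("abcdefghijklmnopqrstuvwxyz' ")


def suspicious_rows(csv_lines, column="transcript"):
    """Yield (line_no, wav_filename, transcript, reason) for dubious rows.

    line_no assumes one record per line, with the header on line 1.
    """
    for line_no, row in enumerate(csv.DictReader(csv_lines), start=2):
        text = row.get(column) or ""
        wav = row.get("wav_filename", "")
        if text.strip().lower() in PLACEHOLDERS:
            yield line_no, wav, text, "placeholder"
        elif not set(text) <= ALLOWED:
            bad = "".join(sorted(set(text) - ALLOWED))
            yield line_no, wav, text, f"unexpected chars: {bad!r}"
```

Usage: open the file with `open("test.csv", newline="", encoding="utf-8")` and iterate over `suspicious_rows(f)`; any flagged rows are candidates to drop or fix before trusting the worst-case examples.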