I am trying to improve DeepSpeech model accuracy for an Indian English dataset.
What should my data look like? Are there any requirements?
What I did:
First, I recorded 6 people's voices on sentences of 8-10 words on average, about 30 sentences each, with every person repeating the same sentences. I got reasonably good accuracy.
Then I tried again with 6 people and 40 sentences each, but this time some of the sentences had only one word, a keyword I want to predict correctly. Accuracy did not improve the way it did the first time.
So what are some requirements we should keep in mind while recording voice?
1. Should there be only sentences, not single words?
2. Does some surrounding noise affect accuracy?
3. Does fine-tuning an already fine-tuned model affect accuracy?
4. If I repeat those sentences again at varied decibel levels in the WAV files (data augmentation), will it affect accuracy? (I sketch what I mean after this list.)
5. What do you suggest for data augmentation?
  1. Will adding some noise help?
  2. Will varying speed and pitch help?
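For concreteness, here is a rough sketch of what I mean by varied decibels, added noise, and speed/pitch changes (file names and parameter values are just placeholders I made up):

```python
import numpy as np
import librosa
import soundfile as sf

# Load a clip as 16 kHz mono, which is what DeepSpeech expects.
audio, sr = librosa.load("clip.wav", sr=16000)

# (4) Vary the volume: scale the waveform by a gain in decibels.
gain_db = -6.0  # try several values, e.g. -12, -6, +3
sf.write("clip_gain.wav", audio * (10.0 ** (gain_db / 20.0)), sr)

# (5.1) Add some random background noise at a fixed level.
noise = 0.005 * np.random.randn(len(audio))
sf.write("clip_noise.wav", audio + noise, sr)

# (5.2) Vary speed and pitch.
faster = librosa.effects.time_stretch(audio, rate=1.1)          # 10% faster
shifted = librosa.effects.pitch_shift(audio, sr=sr, n_steps=2)  # up 2 semitones
sf.write("clip_fast.wav", faster, sr)
sf.write("clip_pitch.wav", shifted, sr)
```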
Sorry for the long post, but I did not know how else to frame this question.
lissyx:
At first glance, I think your data augmentation for fine-tuning is just not enough. There seem to be several problems here to address separately:
- Indian accent
- specific words
- noisy background
For the Indian accent, there is no better solution than having more than just a few dozen minutes of audio in that accent. You should look into the Common Voice dataset, filtered for the Indian accent; that should already be a good basis. Contributing to Common Voice would of course help a lot.
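As a rough sketch of that filtering (the exact column name can vary between Common Voice releases; the English releases of that era use an accent column in validated.tsv):

```python
import pandas as pd

# validated.tsv ships with the Common Voice release and is tab-separated.
df = pd.read_csv("validated.tsv", sep="\t")

# Keep only clips tagged with an Indian accent. Rows with no accent tag
# (NaN) simply won't match this comparison, so they are left out.
indian = df[df["accent"] == "indian"]
indian.to_csv("validated_indian.tsv", sep="\t", index=False)
print(f"{len(indian)} of {len(df)} clips tagged with the indian accent")
```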
For the noisy background, the only reliable solution is making the model noise-robust, which we are working on but which is not ready yet. It's done with data augmentation where we add noise; you can find more about it on GitHub.
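The idea, in a minimal sketch (the real augmentation code in the repo is more elaborate; file names here are placeholders, and I assume mono 16 kHz files):

```python
import numpy as np
import soundfile as sf

speech, sr = sf.read("clean.wav")
noise, _ = sf.read("noise.wav")
noise = np.resize(noise, len(speech))  # loop/trim noise to the clip length

# Scale the noise so the mix hits a chosen signal-to-noise ratio.
target_snr_db = 10.0
speech_power = np.mean(speech ** 2)
noise_power = np.mean(noise ** 2)
scale = np.sqrt(speech_power / (noise_power * 10 ** (target_snr_db / 10)))
sf.write("noisy.wav", speech + scale * noise, sr)
```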
For specific words, if you need them to be properly identified, the best solution is to re-build the language model and add your own words. A better long-term solution is helping us add support for multiple language models, which would give finer control over this and avoid re-building the base language model from scratch.
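In rough terms, assuming you have KenLM built and a text corpus ready (file names below are placeholders; the full packaging steps are documented in the DeepSpeech repo under data/lm):

```python
import subprocess

# 1. Append your domain-specific sentences/keywords to the LM training text.
with open("lm_corpus.txt", "a", encoding="utf-8") as f:
    f.write("my important keyword\n")

# 2. Train an n-gram model with KenLM and convert it to binary format.
#    Very small corpora may need lmplz's --discount_fallback flag.
subprocess.run("lmplz -o 5 < lm_corpus.txt > lm.arpa", shell=True, check=True)
subprocess.run(["build_binary", "lm.arpa", "lm.binary"], check=True)
```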
I just looked at the Common Voice dataset. There are a lot of NaN values in the accent column of validated.tsv, so what is the solution? I could only find about 16,000 Indian-accent samples out of 490,000.
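For reference, this is roughly how I counted (a quick sketch over validated.tsv from the English release):

```python
import pandas as pd

df = pd.read_csv("validated.tsv", sep="\t")
print(df["accent"].isna().sum(), "rows with no accent tag (NaN)")
print((df["accent"] == "indian").sum(), "indian-accent rows out of", len(df))
```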