How to handle low voices and female voices and improve engine performance

Tortoise · October 15, 2020, 8:24am

Dear Friends,

First of all, I wish you good health and safety from corona.

I am using DeepSpeech 0.6.1 and 0.8.2 with a model I trained myself on Common Voice data. Training runs with a batch size of 12; I cannot go higher at the moment because of out-of-memory errors. The hardware is 64 GB of RAM and 2x RTX 2080 Ti.
I have a few questions.

I am having problems with low voices, female voices and fast speech. To handle these, I would like to train further with audio augmentation and spectral augmentation.
Can you suggest or hint at augmentation values that might help with transcribing poor-quality audio? I have not used any augmentation so far.
Would it be a good idea to train with --automatic_mixed_precision=True? Both GPUs are already at their limit, so I cannot raise the batch size above 12 for train.csv.
In some cases, names, cities, or special feature words and events are not recognized well, even when spoken at normal or slow speed. Is there a simple, fast way to add such vocabulary to the trained engine (for example, through transfer learning) so that it starts recognizing these words?

othiele · October 15, 2020, 3:21pm

There is not much documentation on augmentation currently. The best approach is to search the forum to see what values others have used.
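
For the low-voice case specifically, one way to experiment outside the training pipeline is to generate quieter copies of existing clips. The following is a minimal sketch, not DeepSpeech's built-in augmentation: it assumes "low" means low volume, that samples are floats in [-1.0, 1.0], and the function names and the -20 dB to 0 dB range are hypothetical placeholders rather than recommended values.

```python
import math
import random


def apply_gain_db(samples, gain_db):
    """Scale float samples in [-1.0, 1.0] by gain_db decibels, clipping to that range."""
    factor = 10.0 ** (gain_db / 20.0)
    return [max(-1.0, min(1.0, s * factor)) for s in samples]


def random_volume_copies(samples, n_copies=3, min_db=-20.0, max_db=0.0, seed=None):
    """Return n_copies of the clip, each with a random gain drawn from [min_db, max_db].

    The default range only attenuates; it is an illustrative assumption,
    not a tuned setting.
    """
    rng = random.Random(seed)
    return [apply_gain_db(samples, rng.uniform(min_db, max_db)) for _ in range(n_copies)]


if __name__ == "__main__":
    # One second of a 440 Hz tone at 16 kHz stands in for a real utterance.
    tone = [0.5 * math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    copies = random_volume_copies(tone, n_copies=3, seed=0)
    for copy in copies:
        print(round(max(abs(s) for s in copy), 4))
```

The attenuated copies would be written back to WAV files and listed in train.csv with the original transcripts; how much attenuation actually helps has to be checked against a held-out set of quiet recordings.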

Automatic mixed precision worked well for me.

Build a custom language model that includes many sentences with the special entities. The model can’t predict what it doesn’t know.

Tortoise · October 15, 2020, 3:27pm

@othiele Thank you so much. So a custom model could be based on only 10 words or so, and then retraining the already trained engine would add them after a few iterations?

othiele · October 15, 2020, 6:47pm

Search the forum for custom language models. Basically, it is a text list of the possible sentences/chunks that should be recognized, which is then made into a scorer.
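
As a rough illustration of such a text list, the sketch below fills a few sentence templates with the special entities and appends the result to a corpus file. Everything in it is an assumption for illustration: the helper names, the templates, the entity list, and the choice to lowercase the text (on the assumption that the model's alphabet is lowercase).

```python
import itertools


def build_entity_sentences(templates, entities):
    """Fill each template's {entity} slot with each entity and lowercase the result."""
    sentences = []
    for template, entity in itertools.product(templates, entities):
        sentences.append(template.format(entity=entity).lower())
    return sentences


def append_to_corpus(corpus_path, sentences):
    """Append one sentence per line to the text corpus used for the language model."""
    with open(corpus_path, "a", encoding="utf-8") as corpus:
        for sentence in sentences:
            corpus.write(sentence + "\n")


if __name__ == "__main__":
    # Hypothetical templates and entities; replace them with real domain text.
    templates = ["i am travelling to {entity}", "the event takes place in {entity}"]
    entities = ["Heidelberg", "Stuttgart"]
    for sentence in build_entity_sentences(templates, entities):
        print(sentence)
```

The extended corpus file is then used as the input text when the language model and scorer are rebuilt; the acoustic model itself is not retrained in this step.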

Tortoise · October 16, 2020, 7:13am

@othiele Thank you so much. So I have to add those kinds of sentences to the corpus text file that is used to build the LM / scorer. Then the engine will be able to recognize such words and sentences once the scorer is rebuilt. Thank you so much! I will run my trials now.
