How to handle low voices and female voices and improve engine performance

Dear Friends,

First of all, I wish you good health and safety from corona.

I am using DeepSpeech 0.6.1 and 0.8.2 with a model I trained myself on Common Voice data. Training runs with a batch size of 12; I cannot go higher at the moment because of OOM errors on my hardware (64 GB RAM and 2x RTX 2080 Ti).
I have a few questions.

  1. I am having problems with low voices, female voices, and fast speech. To handle this, I would like to train further with audio augmentation and spectral augmentation.
    Can you suggest values that might help transcribe poor-quality audio better? I have not used any augmentation so far.

  2. Would it be a good idea to train with --automatic_mixed_precision=True? Both GPUs are already at their limit, and I cannot increase the batch size for train.csv beyond 12.

  3. In some cases, names, cities, or special words and event names are not recognized well by the engine, even when spoken at normal or slow speed. Is there a simple, fast way (for example, transfer learning) to add such a dictionary to the trained engine so that it starts recognizing these words?

There is not much documentation on augmentation at the moment. The best approach is to search the forum to see what values others have used.
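
As a rough illustration of what audio and spectral augmentation do, here is a minimal pure-Python sketch. It is not DeepSpeech's own augmentation code, the function names are hypothetical, and the value ranges are assumptions rather than recommendations. It lowers the volume to imitate quiet speakers, changes the speed to imitate fast speech, and zeroes random bands of a spectrogram in the style of SpecAugment.

```python
import random


def change_volume(samples, gain_db):
    """Scale float samples by a gain in dB and clip to [-1.0, 1.0]."""
    factor = 10 ** (gain_db / 20)
    return [max(-1.0, min(1.0, s * factor)) for s in samples]


def change_tempo(samples, rate):
    """Naive speed change by linear-interpolation resampling.

    rate > 1 shortens the audio (faster speech). This simple method
    also shifts the pitch, unlike a true time-stretch.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if not samples:
        return []
    last = len(samples) - 1
    n_out = max(1, int(len(samples) / rate))
    out = []
    for i in range(n_out):
        pos = i * rate
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def mask_spectrogram(spec, max_freq_width, max_time_width, rng):
    """Zero one random frequency band and one random time span.

    spec is a list of frames, each a list of frequency bins.
    Returns a masked copy; the input is left unchanged.
    """
    if not spec:
        return []
    n_frames, n_bins = len(spec), len(spec[0])
    f_width = rng.randint(0, min(max_freq_width, n_bins))
    f_start = rng.randint(0, n_bins - f_width)
    t_width = rng.randint(0, min(max_time_width, n_frames))
    t_start = rng.randint(0, n_frames - t_width)
    masked = [row[:] for row in spec]
    for t, row in enumerate(masked):
        in_time_mask = t_start <= t < t_start + t_width
        for f in range(n_bins):
            if in_time_mask or f_start <= f < f_start + f_width:
                row[f] = 0.0
    return masked


def augment(samples, rng):
    """Apply a random volume drop and speed change to one clip."""
    gain_db = rng.uniform(-12.0, 0.0)  # assumed range: quieter only
    rate = rng.uniform(0.9, 1.2)       # assumed range: mostly faster
    return change_tempo(change_volume(samples, gain_db), rate)
```

Whatever tool is used for the real training run, the idea is the same: start with mild ranges, check the dev loss, and widen them only if the quiet, fast, or higher-pitched test clips keep failing.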

Automatic mixed precision worked well for me :slight_smile:

Build a custom language model that includes many sentences with the special entities. The model can’t predict what it doesn’t know.

@othiele Thank you so much. So a custom model could be based on as few as 10 words or so, and retraining the already trained engine would then add them after a few iterations?

Search for "custom language model" here on the forum. Basically, it is a plain-text list of the possible sentences/chunks that should be recognized, which is then built into a scorer.
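
To sketch what such a text list might look like in practice, the snippet below fills a few sentence templates with the special entities and returns one sentence per line. The templates, entity names, and function names are made up for illustration, and lowercasing assumes the model's alphabet is lowercase only. The resulting file would then go into whatever language model and scorer build steps you already use.

```python
import itertools


def build_corpus(templates, entities):
    """Fill every template with every entity.

    Returns sorted, de-duplicated, lowercased sentences.
    """
    sentences = set()
    for template, entity in itertools.product(templates, entities):
        sentences.add(template.format(entity=entity).lower())
    return sorted(sentences)


def write_corpus(sentences, path):
    """Write one sentence per line, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8") as handle:
        for sentence in sentences:
            handle.write(sentence + "\n")


if __name__ == "__main__":
    # Hypothetical templates and entities, for illustration only.
    templates = [
        "i am travelling to {entity}",
        "book a ticket to {entity}",
    ]
    entities = ["Heidelberg", "Pune"]
    for line in build_corpus(templates, entities):
        print(line)
```

The more varied the sentences around each entity, the better the scorer can place it in context, so a handful of bare words is usually less useful than many short sentences that contain them.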

@othiele Thank you so much. So I have to add sentences of that type to the corpus text file that is used to build the LM / scorer. Then it will be able to recognize such words or sentences after training. Thank you so much! I will try it now.