Dear Friends,
First of all, I wish you good health and safety from corona.
I am using DeepSpeech 0.6.1 and 0.8.2 with a model I trained myself on Common Voice data. Training runs with a batch size of 12; I cannot go higher at the moment because of out-of-memory (OOM) errors on my hardware. The machine has 64 GB of RAM and 2x RTX 2080 Ti.
I have a few questions to ask.
- I am having problems with quiet voices, female voices, and fast speech. To handle this, I would like to train further with audio augmentation and spectral augmentation. Can you suggest values that might help with transcribing poor-quality audio? I have not used any augmentation so far.
- Would it be a good idea to train with --automatic_mixed_precision=True? Both GPUs are already overloaded, so I cannot raise the batch size for train.csv above 12.
- Some names, cities, and special words or event names are not recognized well, even when spoken at normal or slow speed. Is there a simple or fast way to add such vocabulary to the trained model (for example, through transfer learning) so that it starts recognizing these words?
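
To make clear what I mean by spectral augmentation in the first question, here is a minimal sketch in plain Python. It is only my own illustration of the general idea (zeroing one random band of frequency bins and one random run of time frames), not DeepSpeech's implementation. The function name `mask_spectrogram` and the two width parameters are made up for this example.

```python
import random


def mask_spectrogram(spec, max_freq_width=2, max_time_width=3, rng=None):
    """Return a masked copy of `spec`.

    `spec` is a list of time frames, each a list of frequency-bin values.
    One random band of frequency bins and one random run of time frames
    are set to 0.0. Name and parameters are hypothetical (illustration only).
    """
    rng = rng or random.Random()
    n_frames = len(spec)
    n_bins = len(spec[0])
    out = [list(frame) for frame in spec]  # copy, leave the input untouched

    # Frequency mask: zero the same band of bins in every frame.
    f_width = rng.randint(0, min(max_freq_width, n_bins))
    f_start = rng.randint(0, n_bins - f_width)
    for frame in out:
        for b in range(f_start, f_start + f_width):
            frame[b] = 0.0

    # Time mask: zero a run of consecutive frames.
    t_width = rng.randint(0, min(max_time_width, n_frames))
    t_start = rng.randint(0, n_frames - t_width)
    for t in range(t_start, t_start + t_width):
        out[t] = [0.0] * n_bins

    return out


# Toy input: 10 frames x 8 bins, all ones, so masked cells are easy to spot.
spec = [[1.0] * 8 for _ in range(10)]
masked = mask_spectrogram(spec, rng=random.Random(0))
zeroed = sum(v == 0.0 for frame in masked for v in frame)
print("zeroed cells:", zeroed)
```

What I am asking about is which mask widths (and which audio-level settings such as volume or speed changes) are reasonable starting points for this kind of data.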