Training DeepSpeech

I want to train a model for speech recognition on my own data.
I have WAV audio files at 44 kHz, and I do not want to use the Mozilla dataset.
What is the way to do that?

See the documentation section titled Training Your Own Model for the general idea, but use your data instead of Common Voice data. You'll make your own CSVs, alphabet.txt, and language model for your use case.
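
For the CSV part, here is a minimal sketch of what the importers produce: a header row `wav_filename,wav_filesize,transcript`, then one row per clip. The clip paths and transcripts below are placeholders for your own data.

```python
import csv
import os

# Build a DeepSpeech-style training CSV: wav_filename, wav_filesize, transcript.
# The clip paths and transcripts are placeholders for your own data.
samples = [
    ("clips/sample_0001.wav", "hello world"),
    ("clips/sample_0002.wav", "testing speech recognition"),
]

with open("train.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["wav_filename", "wav_filesize", "transcript"])
    for path, transcript in samples:
        writer.writerow([path, os.path.getsize(path), transcript])
```

alphabet.txt is just one character per line, covering every symbol that appears in your transcripts.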

Yes, I got the point about making my own CSV files and language model.
(1) I need help making the CSV files for my data. How should it be done?
(2) Can a language model in .arpa format be used?
Can you help me out with that?

Yes.

No, the .arpa file is used to create the LM binary file.
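
For reference, that conversion is done with the KenLM command-line tools; a minimal sketch, assuming lmplz and build_binary are built and on your PATH, and with corpus.txt as a placeholder for your text corpus (one transcript per line):

```python
import subprocess

# Estimate a 5-gram ARPA language model from the text corpus with KenLM.
subprocess.run(
    ["lmplz", "--order", "5", "--text", "corpus.txt", "--arpa", "lm.arpa"],
    check=True,
)

# Convert the intermediate .arpa file into KenLM's compact binary format.
subprocess.run(["build_binary", "lm.arpa", "lm.binary"], check=True)
```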

Please read the official docs, and I recommend reading:

OK, thanks a lot for your help. I need to clear up only one doubt:
I have WAV files at 44 kHz (not 16 kHz), they are stereo rather than mono, and the audio files for training are 90 seconds long.
Will that create a problem? Can we train DeepSpeech on data in that format?

Only 16-bit, 16 kHz, mono audio files are supported for inference and training.
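
If it helps, the conversion can be scripted with the Python standard library. This is a minimal sketch with placeholder paths (note that the audioop module is deprecated in recent Python versions; SoX's `sox in.wav -r 16000 -c 1 -b 16 out.wav` does the same in one line):

```python
import audioop
import wave

# Convert a 44.1 kHz stereo PCM WAV to 16 kHz, 16-bit, mono.
with wave.open("input_44k_stereo.wav", "rb") as src:
    params = src.getparams()
    frames = src.readframes(params.nframes)

# Downmix stereo to mono by averaging the two channels.
if params.nchannels == 2:
    frames = audioop.tomono(frames, params.sampwidth, 0.5, 0.5)

# Force 16-bit samples if the source uses another width.
if params.sampwidth != 2:
    frames = audioop.lin2lin(frames, params.sampwidth, 2)

# Resample from the source rate down to 16 kHz.
frames, _ = audioop.ratecv(frames, 2, 1, params.framerate, 16000, None)

with wave.open("output_16k_mono.wav", "wb") as dst:
    dst.setnchannels(1)
    dst.setsampwidth(2)      # 16-bit
    dst.setframerate(16000)
    dst.writeframes(frames)
```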

Will performance be affected if I convert my WAV files from 44 kHz to 16 kHz and from stereo to mono?
Please help me find a solution for data in my format.

Where in the documentation is this file format mentioned? Just curious.

That message you're quoting is from May. In the meantime, we have dropped the 16 kHz requirement. You can train with any sample rate and export with any sample rate, provided the --audio_win_len and --audio_win_step flags, which are in milliseconds, correspond to an integer number of samples at the exported sample rate. 16 bits per sample is still required at the API level.
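
To illustrate that constraint, here is a small sketch (the millisecond values are examples, not the project's defaults) that checks whether a window length in milliseconds lands on a whole number of samples at a given rate:

```python
# Check whether a window length/step in milliseconds corresponds to an
# integer number of samples at a given sample rate.
def samples_per_window(ms: float, sample_rate: int) -> float:
    return ms * sample_rate / 1000.0

for rate in (16000, 44100):
    for ms in (32, 20):  # example window length / step values
        n = samples_per_window(ms, rate)
        verdict = "integer" if n.is_integer() else "NOT an integer"
        print(f"{ms} ms at {rate} Hz -> {n} samples ({verdict})")
```

At 44.1 kHz, a 32 ms window is 1411.2 samples and would be rejected, while 20 ms gives exactly 882 samples and is fine.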