I want to train a model for speech recognition on my data.
I have WAV (audio) files at 44 kHz, and I do not want to use the Mozilla dataset.
What is the way to do that?
See the documentation section titled Training Your Own Model for the general idea, but use your data instead of Common Voice data. You’ll make your own CSVs, alphabet.txt, and language model for your use case.
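For reference, the training CSVs DeepSpeech consumes have three columns: `wav_filename`, `wav_filesize`, and `transcript`. A minimal sketch that writes one, where the `transcripts` mapping is a placeholder for wherever your labels actually live:

```python
import csv
import os

# Placeholder mapping of WAV path -> transcript; replace with your own data.
transcripts = {
    "clips/sample_0001.wav": "hello world",
    "clips/sample_0002.wav": "speech recognition example",
}

with open("train.csv", "w", newline="") as f:
    writer = csv.writer(f)
    # Column names DeepSpeech's training script expects.
    writer.writerow(["wav_filename", "wav_filesize", "transcript"])
    for wav_path, text in transcripts.items():
        writer.writerow([wav_path, os.path.getsize(wav_path), text])
```

You would build dev.csv and test.csv the same way from held-out clips.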
Yes, I got the point about making my own CSV files and language model.
(1) I need help making CSV files for my data. How should it be done?
(2) Can a language model in .arpa format be used?
Can you help me out with that?
Yes.
No, the .arpa file is used to create the LM binary file.
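Concretely, KenLM ships a `build_binary` tool for that step. A sketch, assuming KenLM is built under `kenlm/build` and your model is `lm.arpa` (both paths are placeholders for your setup):

```python
import subprocess

# Convert the ARPA language model into KenLM's binary format.
# Paths are placeholders -- point them at your KenLM build and ARPA file.
subprocess.run(
    ["kenlm/build/bin/build_binary", "lm.arpa", "lm.binary"],
    check=True,
)
```

The resulting lm.binary is what DeepSpeech loads at decode time; depending on your DeepSpeech version you may also need to generate a trie from the same files.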
Please read the official docs, and I also recommend reading:
OK, thanks a lot for your help. I just need to clear up one doubt.
I have WAV files at 44 kHz, not 16 kHz; they are stereo, not mono, and the audio files for training are 90 seconds long.
Will that create a problem? Can we train DeepSpeech on data in this format?
Only 16-bit, 16 kHz, mono audio files are supported for inference and training.
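If your data needs to meet that requirement, the conversion itself is mechanical. A minimal sketch using only the Python standard library, assuming 16-bit input WAVs (tools like sox or ffmpeg are the more common route for batch jobs):

```python
import audioop
import wave

def to_16k_mono(src_path, dst_path):
    """Convert a 44.1 kHz stereo, 16-bit WAV to 16 kHz mono, 16-bit."""
    with wave.open(src_path, "rb") as src:
        nchannels = src.getnchannels()
        sampwidth = src.getsampwidth()   # assumed to be 2 (16-bit)
        framerate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # Mix the two channels down to one with equal weights.
    if nchannels == 2:
        frames = audioop.tomono(frames, sampwidth, 0.5, 0.5)

    # Resample from the source rate to 16 kHz.
    frames, _ = audioop.ratecv(frames, sampwidth, 1, framerate, 16000, None)

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(sampwidth)
        dst.setframerate(16000)
        dst.writeframes(frames)
```

Note this only fixes the format; 90-second clips are long for training, and such recordings are typically segmented into shorter utterances with matching transcripts.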
Will performance be affected if I convert my WAV files from 44 kHz to 16 kHz and from stereo to mono?
Please help me find a solution for data in my format.
Where in the documentation is this file format mentioned? Just curious.
That message you’re quoting is from May. In the meantime, we have dropped the 16 kHz requirement. You can train with any sample rate, and export with any sample rate, provided the `--audio_win_len` and `--audio_win_step` flags in milliseconds correspond to an integer number of samples at the exported sample rate. 16 bits per sample is still required at the API level.
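In other words, the check is simple arithmetic: the window length in milliseconds times the sample rate must land on a whole number of samples. A quick sketch (the 32 ms window and 20 ms step used here are assumed defaults for those flags):

```python
def whole_samples(window_ms, sample_rate):
    """True if a window of `window_ms` milliseconds corresponds to an
    integer number of samples at `sample_rate` Hz."""
    return (window_ms * sample_rate / 1000).is_integer()

print(whole_samples(32, 16000))  # True:  512 samples
print(whole_samples(20, 16000))  # True:  320 samples
print(whole_samples(32, 44100))  # False: 1411.2 samples
print(whole_samples(20, 44100))  # True:  882 samples
```

So at 44.1 kHz the 32 ms window does not line up, and you would adjust the flag, e.g. a 30 ms window gives exactly 1323 samples.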