Need some clarification on training an already pretrained model

Hello,

I have successfully trained the DeepSpeech 0.4.1 model on my own dataset. These are the steps I followed:
I downloaded the Mozilla Common Voice 22 GB English corpus.
I was going to create my own TSV, but I could not figure out what client_id is in the corpus TSV,
so I just overwrote the sentences in the corpus TSV with my own and replaced the MP3 files referenced by the corresponding TSV paths, for 15 samples.
This time I am going to create a bigger dataset of around 600 samples, so my questions are: what is client_id in the Mozilla corpus, and how do I create a larger dataset for this model? Is there a script available for this?

  1. Accuracy on Indian-accented speech is very low. Will it help if I retrain the model using only the Mozilla Indian-accent samples, which were already used to train the released 0.4.1 model?

  2. Is there any preprocessing needed to minimize noise in the WAV file before giving it to the model for prediction? I am using PyAudio with these settings (a minimal capture sketch follows below):

    CHUNK = 1024
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 16000
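
Roughly how those settings are used to capture a 16 kHz mono int16 buffer for the model (a minimal sketch; the 5-second duration and buffer handling are just placeholders):

    import numpy as np
    import pyaudio

    CHUNK = 1024
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 16000
    SECONDS = 5  # placeholder recording length

    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT, channels=CHANNELS, rate=RATE,
                    input=True, frames_per_buffer=CHUNK)

    # read CHUNK-sized blocks until roughly SECONDS of audio is collected
    frames = [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * SECONDS))]

    stream.stop_stream()
    stream.close()
    p.terminate()

    # 16-bit mono samples at 16 kHz, the format the model expects for inference
    audio = np.frombuffer(b"".join(frames), dtype=np.int16)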

That sounds like you are going to train several times on the same data: bad idea.

That’s up to you

I’m not sure what you did there … You should just use import_cv2.py to import the released Common Voice dataset.

If you have your own dataset, why go through that complicated process?
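
For reference, the training CSVs DeepSpeech reads have just three columns (wav_filename, wav_filesize, transcript), so for a small custom set you can generate one directly instead of editing the Common Voice TSV. A rough sketch, assuming a folder of 16 kHz mono WAVs and a tab-separated transcripts file (both layouts are assumptions, not something from this thread):

    import csv
    import os

    WAV_DIR = "own_data/wav"                  # hypothetical folder of 16 kHz mono WAVs
    TRANSCRIPTS = "own_data/transcripts.txt"  # hypothetical "<wav name>\t<sentence>" lines
    OUT_CSV = "own_data/train1.csv"

    with open(TRANSCRIPTS) as f:
        pairs = [line.rstrip("\n").split("\t", 1) for line in f if line.strip()]

    with open(OUT_CSV, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for wav_name, sentence in pairs:
            wav_path = os.path.join(WAV_DIR, wav_name)
            # the default English alphabet is lowercase, so normalize the transcript
            writer.writerow([wav_path, os.path.getsize(wav_path), sentence.lower()])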

Thank you @lissyx. Quick question: what should the epoch number be when training the pretrained 0.4.1 model? Mine starts at 1341; is that right?

Preprocessing ['/home/sush/Desktop/git_lfs_deepspeech/DeepSpeech/own_data/train1.csv']
Preprocessing done
Preprocessing ['/home/sush/Desktop/git_lfs_deepspeech/DeepSpeech/own_data/dev1.csv']
Preprocessing done
W Parameter --validation_step needs to be >0 for early stopping to work
I STARTING Optimization
I Training epoch 1341...
 10% (17 of 168) |##                     | Elapsed Time: 0:03:48 ETA:   0:35:08

It’ll change depending on your train set and batch size, so just ignore that and always use negative values for the --epochs flag when fine-tuning. This has been updated in master to always be relative, to avoid confusion, FWIW.
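
For example, a fine-tuning run that continues the released checkpoint for three more epochs looks roughly like this (an illustrative sketch driven from Python, not something from this thread; all paths are placeholders, and the flag is spelled --epoch on some older releases, so check DeepSpeech.py's help for your version):

    import subprocess

    # "--epochs -3" means "train 3 more epochs from wherever the checkpoint
    # left off", rather than an absolute target epoch number.
    subprocess.run([
        "python", "DeepSpeech.py",
        "--checkpoint_dir", "deepspeech-0.4.1-checkpoint",  # placeholder paths
        "--train_files", "own_data/train1.csv",
        "--dev_files", "own_data/dev1.csv",
        "--test_files", "own_data/test1.csv",
        "--epochs", "-3",
    ], check=True)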

Thanks for the help, appreciated.