Creating an Indian accent model with ~115k files

Okay, is this helpful? Or do I need to make more changes?

That seems to be your root cause. Some of your data has no label. Please check your CSV and importer.

Okay, I corrected the label problem in the data. There was a problem with my CSV file.

I am going to run the model again. I have two major doubts about that:

  1. I have CUDA support, but the model may still run for a week. How do I check its progress as a percentage if I am running the script in a terminal?

  2. Also, I am training these ~115k files with the same parameters as listed below, taken from this link (TUTORIAL : How I trained a specific french model to control my robot):

    --train_batch_size 80 \
    --dev_batch_size 80 \
    --test_batch_size 40 \
    --n_hidden 375 \
    --epoch 33 \
    --validation_step 1 \
    --early_stop True \
    --earlystop_nsteps 6 \
    --estop_mean_thresh 0.1 \
    --estop_std_thresh 0.1 \
    --dropout_rate 0.22 \
    --learning_rate 0.00095 \
    --report_count 100 \
    --use_seq_length False

The length of my audio files is around 5 seconds each. Can you please let me know which parameters I need to research and tweak?

As far as I can see from your previous console dumps, you are running the training on macOS. TensorFlow dropped CUDA support on that platform several releases ago, so any training done there will be CPU-only. You should really aim for a Linux GPU-powered system. Progress reporting is controlled by --display_step; check the documentation for it. Note that each display step runs a WER computation, which consumes a lot of resources, so use it with care.
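As a side note, here is a quick way to confirm that TensorFlow actually sees a CUDA device before committing to a week-long run (a minimal sketch using the TensorFlow 1.x API that DeepSpeech targeted at the time; not from the original thread):

    import tensorflow as tf
    from tensorflow.python.client import device_lib

    # True only if TensorFlow was built with CUDA and a GPU is usable
    print(tf.test.is_gpu_available())

    # Lists local devices; look for something like '/device:GPU:0'
    print([d.name for d in device_lib.list_local_devices()])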

Please look on the forum; there is already a lot of information around that. I'm not sure how much more I can help you, except to say that the parameters used by Vincent for his robot are tailored to a specific, small dataset. You will likely have to increase n_hidden. Check the current GitHub issues related to benchmarks for the v0.2.0 project; we have run some tests there to estimate better sizes. We also document other good parameter values in the v0.1.1 release notes. Please check those.

115k files of 5 seconds each on average gets you around 160 hours of audio. That is not a bad starting point, but you should not expect too much from it.
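The arithmetic behind that estimate:

    # 115k clips at ~5 s each, converted to hours
    files, avg_secs = 115000, 5
    print(files * avg_secs / 3600.0)  # ~159.7 hours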

Actually, I have a CUDA-enabled, Linux GPU-powered system. Sorry, I forgot to mention that.

So, by tweaking --display_step, I can check how far my model has trained, as a percentage, if I train the model in the terminal?

Also, can you answer the second part of my previous question? Which of these parameters should change given that I have a total of ~115k audio files of 5 seconds average length (80k in train, 23k in dev, 12k in test)?

I took these parameters directly from this link (TUTORIAL : How I trained a specific french model to control my robot), but in that previous run I had only 5,000 files in total.

But now that I have such a huge number of files, I am skeptical about using the same values for these parameters.

Apologies, I did not check your second answer before posting my previous comment.

Please ignore the part where I ask you to answer the parameter-setting query.

We don't have a "percentage". But you set a number of epochs, and the display step will tell you which epoch you are on and the current WER …
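If you really want a rough percentage, you can derive one yourself from the epoch counter in the training output (a sketch; the current epoch here is a made-up example, and 33 is the --epoch value from the parameters quoted above):

    total_epochs = 33    # the value passed via --epoch
    current_epoch = 12   # example: read this off the training output
    print('%.1f%% of epochs done' % (100.0 * current_epoch / total_epochs))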

Hi,
While trying to run the model, I am getting errors like this:

Exception in thread Thread-7:
Traceback (most recent call last):
  File "/Users/naveen/anaconda3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/Users/naveen/anaconda3/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/naveen/Downloads/DeepSpeech/DeepSpeech/util/feeding.py", line 151, in _populate_batch_queue
    raise ValueError('Error: Audio file {} is too short for transcription.'.format(wav_file))

ValueError: Error: Audio file /Users/naveen/Downloads/all_datasets/DeepSpeech/TEST/g0907_e_tam_f_output.wav is too short for transcription.

Should I remove the file altogether, along with its corresponding entry in the CSV files, or is there another solution to this?

Hard to tell without more context :frowning:

Okay, as discussed earlier in this thread, I am trying to create a model with 160 hours of Indian-accent audio, but while running the model creation code I am facing this error for many files:

Exception in thread Thread-7:
Traceback (most recent call last):
  File "/Users/naveen/anaconda3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/Users/naveen/anaconda3/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/naveen/Downloads/DeepSpeech/DeepSpeech/util/feeding.py", line 151, in _populate_batch_queue
    raise ValueError('Error: Audio file {} is too short for transcription.'.format(wav_file))

ValueError: Error: Audio file /Users/naveen/Downloads/all_datasets/DeepSpeech/TEST/g0907_e_tam_f_output.wav is too short for transcription.

No, by "context" I meant: what is this file? I mean, if the file is too short, what's wrong with just removing it and its transcription?

There are multiple files. Initially there were 3. Then I removed each file and its transcription.

In fact, one of the files had a good-length transcription too. For example:

“eng_text_90-2_e_man_m_output.wav, 33964, tenaliraman approached thimmana and appeased him with his expertise in spontaneous poetry”

But the problem is that, after rerunning the code, I get the same error for more files.

If a single run had at least told me all the files causing this error, I would have removed them all at once. But that is where the problem lies: I get the error for a few files, and after that there is no output. Then when I rerun, I get the same error for new files.

So I am just removing files and their corresponding transcriptions and rerunning the code.

What would help here is documenting both the transcription AND the audio length. You might be able to search more broadly that way …
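For example, a minimal audit script along these lines (assuming the standard DeepSpeech CSV layout of wav_filename, wav_filesize, transcript; the file name train.csv is a placeholder):

    import csv
    import wave

    # Log audio duration alongside transcript length for every row,
    # so suspiciously short clips stand out before training starts.
    with open('train.csv', newline='') as f:
        for row in csv.DictReader(f):
            with wave.open(row['wav_filename']) as w:
                duration = w.getnframes() / float(w.getframerate())
            print(duration, len(row['transcript']), row['wav_filename'])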

What is the minimum length of audio that I should feed while training the model?

Have a look at the source code that generates the error; you'll get the answer there. The stack trace tells you it is at util/feeding.py:151.

I checked the condition in the code:

    # Number of feature frames extracted from the audio
    source = audiofile_to_input_vector(wav_file, self._model_feeder.numcep, self._model_feeder.numcontext)
    source_len = len(source)
    # Transcript encoded as one label per character
    target = text_to_char_array(transcript, self._alphabet)
    target_len = len(target)
    # CTC needs at least one input frame per output label
    if source_len < target_len:
        raise ValueError('Error: Audio file {} is too short for transcription.'.format(wav_file))

This tells me that the error is raised whenever the number of feature frames extracted from the audio is smaller than the number of characters in the transcript.

I tried to apply this condition to my audio files to filter out the offending ones, but I am not able to recreate text_to_char_array since it comes from other code. What are your suggestions at this point?

Read the source, Luke!

    $ git grep "def text_to_char_array"
    util/text.py:def text_to_char_array(original, alphabet):

Yeah, I checked that it comes from text.py, but since that code requires some 'config_file', I don't know how to recreate the function text_to_char_array independently for my purpose. Is there another way to filter out the shorter audio files?

Sorry to insist, but read the source. Your config_file is the … alphabet file. So I guess that is something you have?
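For what it's worth, you may not even need to recreate text_to_char_array for this check: it encodes one label per character of the transcript (check util/text.py to confirm), so its length is just the transcript's character count. A standalone pre-filter could then approximate the frame count from the audio duration (a sketch, assuming the default 10 ms MFCC step used for feature extraction; verify the exact arithmetic against audiofile_to_input_vector, and note the CSV column names are assumptions):

    import csv
    import wave

    MS_PER_STEP = 10  # assumption: MFCC window step behind each feature frame

    def too_short(wav_path, transcript):
        # Approximate source_len: one feature frame per 10 ms of audio
        with wave.open(wav_path) as w:
            n_frames = (w.getnframes() / float(w.getframerate())) * 1000 / MS_PER_STEP
        # target_len stand-in: one label per character, so no alphabet
        # file is needed for this length comparison
        return n_frames < len(transcript)

    # Flag every offending row in one pass instead of crash-and-rerun:
    with open('train.csv', newline='') as f:  # placeholder file name
        for row in csv.DictReader(f):
            if too_short(row['wav_filename'], row['transcript']):
                print('too short:', row['wav_filename'])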