Creating an Indian accent model with ~115k files

Okay, as discussed earlier in this thread, I am trying to create a model with 160 hours of Indian accent audio, but while running the model creation code I am facing this error for many files:

    Exception in thread Thread-7:
    Traceback (most recent call last):
      File "/Users/naveen/anaconda3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
        self.run()
      File "/Users/naveen/anaconda3/lib/python3.6/threading.py", line 864, in run
        self._target(*self._args, **self._kwargs)
      File "/Users/naveen/Downloads/DeepSpeech/DeepSpeech/util/feeding.py", line 151, in _populate_batch_queue
        raise ValueError('Error: Audio file {} is too short for transcription.'.format(wav_file))
    ValueError: Error: Audio file /Users/naveen/Downloads/all_datasets/DeepSpeech/TEST/g0907_e_tam_f_output.wav is too short for transcription.

No, the context I meant is "what is this file"? I mean, if the file is too short, what's wrong with just removing it and its transcription?

There are multiple files. Initially there were 3. Then I removed each file and its transcription.

In fact, one of the files had a good-length transcription too. Example:

    eng_text_90-2_e_man_m_output.wav, 33964, tenaliraman approached thimmana and appeased him with his expertise in spontaneous poetry

But the problem is that, after rerunning the code, I am getting the same error for more files.

If a single run had at least told me all the files that trigger this error, I could have removed them all at once. But that is where the problem is: I get the error for a few files, and after that there is no output. Then when I rerun, I get the same error for new files.

So I am just removing the files and their corresponding transcriptions and rerunning the code.

What would help here is to document what the transcription AND the audio length are. You might be able to search more broadly this way…
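For example, something like this in util/feeding.py would record both (a sketch; the variable names are those from the snippet quoted further down):

    # Sketch: report the transcript and both lengths instead of just the file
    # name, so a single failing run gives you enough to filter on.
    raise ValueError('Error: Audio file {} yields {} feature frames, fewer than the {} characters of "{}".'.format(
        wav_file, source_len, target_len, transcript))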

What is the minimum length of audio that I should feed while training the model?

Have a look at the source code that generates the error; you'll get the answer. The stack tells you it is at util/feeding.py:151.

I checked the condition in the code:

    # Feature frames extracted from the audio file...
    source = audiofile_to_input_vector(wav_file, self._model_feeder.numcep, self._model_feeder.numcontext)
    source_len = len(source)
    # ...and the per-character label sequence for the transcript.
    target = text_to_char_array(transcript, self._alphabet)
    target_len = len(target)
    # CTC requires at least as many input frames as output labels.
    if source_len < target_len:
        raise ValueError('Error: Audio file {} is too short for transcription.'.format(wav_file))

This tells me that the error is raised whenever the audio yields fewer feature frames than the transcript has characters. Since the features are computed at every 0.01 s time step (see the docstring quoted below), a transcript of N characters needs roughly at least N × 0.01 s of audio; the ~90-character example above would need at least about 0.9 s.

I tried to apply this condition to my audio files to filter out the short ones, but I am not able to recreate text_to_char_array, as it comes from another file. What are your suggestions at this point?

Read the source, Luke!

    $ git grep "def text_to_char_array"
    util/text.py:def text_to_char_array(original, alphabet):

Yeah, I checked that it comes from util/text.py, but since that code requires some 'config_file', I don't know how to recreate the text_to_char_array function independently for my purpose. Is there any other method to filter out the shorter audio files?

Sorry to insist, but read the source. Your config_file is the … alphabet file. So I guess it is something you have?
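A minimal sketch of calling it standalone (assuming util/text.py's Alphabet class takes the path to your alphabet file, and that you run this from the DeepSpeech checkout so util/ is importable):

    # Sketch: compute target_len outside of training.
    from util.text import Alphabet, text_to_char_array

    alphabet = Alphabet('data/alphabet.txt')  # path to your alphabet file (assumption)
    target = text_to_char_array('hello world', alphabet)
    print(len(target))  # this is target_len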

Okay, got it.

I want this function to work:

    # From DeepSpeech's util/audio.py; audioToInputVector is defined earlier
    # in the same file.
    import scipy.io.wavfile as wav

    def audiofile_to_input_vector(audio_filename, numcep, numcontext):
        r"""
        Given a WAV audio file at ``audio_filename``, calculates ``numcep`` MFCC features
        at every 0.01s time step with a window length of 0.025s. Appends ``numcontext``
        context frames to the left and right of each time step, and returns this data
        in a numpy array.
        """
        # Load wav files
        fs, audio = wav.read(audio_filename)

        return audioToInputVector(audio, fs, numcep, numcontext)

What do I feed in place of numcep and numcontext? How are they calculated, or where do they come from?

Can you read the source calling that? It's clearly trivial. Hint: git grep audiofile_to_input_vector
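For reference, a minimal standalone call would look like this (assuming the values come from DeepSpeech.py's --n_input and --n_context flags, whose defaults are 26 and 9 if I'm not misremembering):

    # Sketch: compute source_len for a single file with the assumed defaults
    # (26 cepstral coefficients, 9 context frames on each side).
    from util.audio import audiofile_to_input_vector

    source = audiofile_to_input_vector('g0907_e_tam_f_output.wav', 26, 9)
    print(len(source))  # this is source_len, the number of feature frames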

This got resolved. Thanks a lot.

I wrote a script to filter out all the files with source_len (audio) < target_len (transcript), then tested a training run, and it runs fine.
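In case someone else hits this, here is roughly what such a filter looks like (a sketch: the CSV and alphabet paths and the 26/9 values are assumptions; the columns are DeepSpeech's wav_filename, wav_filesize, transcript, and it must be run from the DeepSpeech checkout so util/ is importable):

    # Sketch: drop CSV rows whose audio yields fewer feature frames than the
    # transcript has characters, i.e. the exact condition feeding.py checks.
    import pandas

    from util.audio import audiofile_to_input_vector
    from util.text import Alphabet, text_to_char_array

    NUMCEP, NUMCONTEXT = 26, 9                # assumed n_input / n_context defaults
    alphabet = Alphabet('data/alphabet.txt')  # your alphabet file (assumption)
    df = pandas.read_csv('TEST.csv')          # your import CSV (assumption)

    def long_enough(row):
        source_len = len(audiofile_to_input_vector(row['wav_filename'], NUMCEP, NUMCONTEXT))
        target_len = len(text_to_char_array(row['transcript'], alphabet))
        return source_len >= target_len

    df[df.apply(long_enough, axis=1)].to_csv('TEST_filtered.csv', index=False)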

Now I need to use these files and run on a Linux platform with CUDA support.

I have tensorflow-gpu 1.4 and CUDA 8.0.

When I run the main training code, I get this:

    tensorflow.python.framework.errors_impl.NotFoundError: libcudart.so.9.0: cannot open shared object file: No such file or directory

Does this have something to do with my installation of TensorFlow or the CUDA binaries?

Your TensorFlow tries to use CUDA 9.0, not 8.0.
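You can check what the installed wheel actually links against with something like this (a sketch; the internal .so path is my assumption for TF 1.x wheels, and ldd is Linux-only):

    # Sketch: locate TensorFlow's native library without importing tensorflow
    # (the import itself is what fails when libcudart.so.9.0 is missing),
    # then ask the dynamic linker which libraries it wants.
    import os
    import subprocess
    import sysconfig

    site = sysconfig.get_paths()['purelib']
    lib = os.path.join(site, 'tensorflow', 'python', '_pywrap_tensorflow_internal.so')
    print(subprocess.check_output(['ldd', lib]).decode())  # look for libcudart.so.*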

So I uninstall CUDA 8 and install CUDA 9, right?

Well, you said TensorFlow GPU 1.4, which should be linked against CUDA 8.0, so I'm a bit doubtful about your setup. I cannot recommend anything.

Can I give you more information?

When I run nvcc --version, I get:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2016 NVIDIA Corporation
    Built on Tue_Jan_10_13:22:03_CST_2017
    Cuda compilation tools, release 8.0, V8.0.61

And when I run pip list | grep tensorflow, I get:

    tensorflow-gpu                     1.4.0
    tensorflow-tensorboard             0.4.0

Will getting TensorFlow 1.6 and CUDA 9.0 help?

Also, if it's tensorflow-gpu 1.4, and hence linked against CUDA 8.0, why is it trying to use CUDA 9.0?

How can I know? It's your setup, not mine :-(.

Okay, but can you recommend which TensorFlow/CUDA combination will work?