Creating an Indian accent model with ~115k files


I checked the condition in the code:

    source = audiofile_to_input_vector(wav_file, self._model_feeder.numcep, self._model_feeder.numcontext)
    source_len = len(source)
    target = text_to_char_array(transcript, self._alphabet)
    target_len = len(target)
    if source_len < target_len:
        raise ValueError('Error: Audio file {} is too short for transcription.'.format(wav_file))

This tells me that whenever the audio is shorter (in feature frames) than the transcript (in characters), it raises the error.

I tried to apply this condition to my audio files to filter out such files, but I am not able to recreate text_to_char_array, as it comes from another module. What are your suggestions at this point?

(Lissyx) #22

read the source, luke!

$ git grep "def text_to_char_array"
util/ text_to_char_array(original, alphabet):


Yeah, I checked that it comes from that code, but since that code requires some ‘config_file’, I don’t know how to recreate the function ‘text_to_char_array’ independently for my purpose. Is there any other method to filter out the shorter audio files?

(Lissyx) #24

Sorry to insist, but read the source. Your config_file is the … alphabet file. So I guess that it is something you have ?


okay got it.
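
For readers following along, a standalone stand-in for the transcript side of the check might look like the sketch below. It assumes the alphabet file lists one character per line and that lines starting with `#` are comments; `load_alphabet` and `text_to_indices` are hypothetical names for illustration, not the functions from the DeepSpeech source.

```python
def load_alphabet(alphabet_path):
    """Map each character in the alphabet file to an integer label.

    Assumption: one character per line, '#' lines are comments,
    and the label is the position among the non-comment lines.
    """
    labels = []
    with open(alphabet_path, encoding="utf-8") as f:
        for line in f:
            if line.startswith("#"):
                continue
            label = line.rstrip("\n")  # keeps a lone space as a valid label
            if label:
                labels.append(label)
    return {ch: i for i, ch in enumerate(labels)}


def text_to_indices(text, char_to_idx):
    """Hypothetical stand-in for text_to_char_array: one label per character.

    Raises KeyError if the transcript contains a character that is
    not in the alphabet, which is itself a useful data check.
    """
    return [char_to_idx[ch] for ch in text]
```

With this, `len(text_to_indices(transcript, load_alphabet(path)))` gives a target length to compare against the audio feature length.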

I want this function to work:

    def audiofile_to_input_vector(audio_filename, numcep, numcontext):
        """
        Given a WAV audio file at ``audio_filename``, calculates ``numcep`` MFCC features
        at every 0.01s time step with a window length of 0.025s. Appends ``numcontext``
        context frames to the left and right of each time step, and returns this data
        in a numpy array.
        """
        # Load wav files
        fs, audio =

        return audioToInputVector(audio, fs, numcep, numcontext)

What do I feed in for ‘numcep’ and ‘numcontext’? How are they calculated, or where do they come from?

(Lissyx) #26

Can you read the source calling that ? It’s clearly trivial. Hint: git grep audiofile_to_input_vector
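
As the first snippet in this thread shows, the caller passes both values from its model feeder (`self._model_feeder.numcep` and `self._model_feeder.numcontext`), so they are configuration, not something computed from the audio. To illustrate what `numcontext` does to the shape of the features, here is a minimal pure-Python sketch. `add_context` is a hypothetical helper, and zero-padding at the edges is an assumption; the real implementation may differ.

```python
def add_context(features, numcontext):
    """Attach `numcontext` frames of left and right context to each frame.

    features: list of T frames, each a list of `numcep` floats.
    Returns T rows, each of length numcep * (2 * numcontext + 1).
    Edges are padded with zero frames (an assumption for this sketch).
    """
    if not features:
        return []
    numcep = len(features[0])
    zero = [0.0] * numcep
    padded = [zero] * numcontext + list(features) + [zero] * numcontext
    rows = []
    for t in range(len(features)):
        # Window of past frames, the current frame, and future frames.
        window = padded[t : t + 2 * numcontext + 1]
        rows.append([value for frame in window for value in frame])
    return rows
```

Note that the number of rows stays equal to the number of time steps, so adding context does not change the length compared against the transcript.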


This got resolved. Thanks a lot.

I wrote code to filter out all the files with source_len (audio file) < target_len (transcript), then did a test run, and it runs fine.
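
The poster's filter is not shown in the thread. A rough pre-filter in the same spirit can be sketched without the DeepSpeech code, under two assumptions: the frame count is estimated from the WAV length using the 0.025 s window and 0.01 s step mentioned in the docstring above, and each transcript character counts as one label. `estimate_num_frames` and `is_long_enough` are hypothetical names. The real check uses the actual feature vector, whose length may differ (for example if the pipeline subsamples frames), so treat this as an estimate and keep a safety margin.

```python
import math
import wave


def estimate_num_frames(wav_path, winlen=0.025, winstep=0.01):
    """Estimate how many feature frames a WAV file yields.

    Assumes one frame per `winstep` seconds with a `winlen` window;
    this approximates, but does not call, the real MFCC code.
    """
    with wave.open(wav_path, "rb") as w:
        n_samples = w.getnframes()
        rate = w.getframerate()
    win = int(round(winlen * rate))
    step = int(round(winstep * rate))
    if n_samples <= win:
        return 1
    return 1 + math.ceil((n_samples - win) / step)


def is_long_enough(wav_path, transcript):
    """True if the estimated frame count covers one label per character."""
    return estimate_num_frames(wav_path) >= len(transcript)
```

Rows of the training CSV for which `is_long_enough` returns False can then be dropped before training.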

Now I need to use these files and train on a Linux machine with CUDA support.

I have tensorflow-gpu 1.4 and CUDA 8.0.

When I run the main training code, I get this:

tensorflow.python.framework.errors_impl.NotFoundError: cannot open shared object file: No such file or directory

Does this have something to do with my installation of TensorFlow or the CUDA binaries?

(Lissyx) #28

Your tensorflow tries to use CUDA 9.0, not 8.0.


So I uninstall CUDA 8 and install CUDA 9, right?

(Lissyx) #30

Well, you said TensorFlow GPU 1.4, which should be linked to CUDA 8.0, so I’m a bit doubtful about your setup. I cannot recommend anything.


Can I give you more information?

When I run ‘nvcc --version’, I get:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

And when I run ‘pip list | grep tensorflow’, I get:

tensorflow-gpu                     1.4.0      
tensorflow-tensorboard             0.4.0


Will getting TensorFlow 1.6 and CUDA 9.0 help?

Also, if it’s tensorflow-gpu 1.4 and hence linked to CUDA 8.0, why is it trying to use CUDA 9.0?

(Lissyx) #33

How can I know ? It’s your setup, not mine :-(.


Okay, but can you recommend which TensorFlow/CUDA combination will work?

(Lissyx) #35

You need to check TensorFlow’s upstream for that.


I checked ‘ldd deepspeech’. It shows:

$ ldd deepspeech 
deepspeech:
	=>  (0x00007ffd6d7f0000)
	=> not found
	=> /usr/lib/x86_64-linux-gnu/ (0x00007fc560d03000)
	=> /home/aa/Downloads/deepspeech/DeepSpeech/./ (0x00007fc54a4ca000)
	=> /home/aa/Downloads/deepspeech/DeepSpeech/./ (0x00007fc54a2c5000)
	=> /usr/lib/x86_64-linux-gnu/ (0x00007fc54a032000)
	=> /usr/lib/x86_64-linux-gnu/ (0x00007fc549cac000)
	=> /lib/x86_64-linux-gnu/ (0x00007fc549956000)
	=> /lib/x86_64-linux-gnu/ (0x00007fc54973f000)
	=> /lib/x86_64-linux-gnu/ (0x00007fc54935f000)
	=> /lib/x86_64-linux-gnu/ (0x00007fc54915b000)
	=> /lib/x86_64-linux-gnu/ (0x00007fc548f3c000)
	=> /lib/x86_64-linux-gnu/ (0x00007fc548d34000)
	=> /usr/lib/x86_64-linux-gnu/ (0x00007fc548ae8000)
	=> not found
	=> not found
	=> not found
	=> not found
	=> not found
	=> not found
	=> /usr/lib/x86_64-linux-gnu/ (0x00007fc5488b9000)
	/lib64/ (0x00007fc5616f9000)
	=> /usr/lib/x86_64-linux-gnu/ (0x00007fc5486af000)
	=> /usr/lib/x86_64-linux-gnu/ (0x00007fc54847d000)
	=> /lib/x86_64-linux-gnu/ (0x00007fc548260000)
	=> /usr/lib/x86_64-linux-gnu/ (0x00007fc54803e000)
	=> /usr/lib/x86_64-linux-gnu/ (0x00007fc547e30000)
	=>  (0x00007fff7f3a7000)
	=> not found
	=> not found
	=> /usr/lib/x86_64-linux-gnu/ (0x00007f1836f58000)
	=> not found
	=> not found
	=> not found
	=> not found
	=> /usr/lib/x86_64-linux-gnu/ (0x00007f1836d29000)
	=> /lib/x86_64-linux-gnu/ (0x00007f1836b25000)
	=> /lib/x86_64-linux-gnu/ (0x00007f18367cf000)
	=> /lib/x86_64-linux-gnu/ (0x00007f18365b0000)
	=> /usr/lib/x86_64-linux-gnu/ (0x00007f183622a000)
	=> /lib/x86_64-linux-gnu/ (0x00007f1836013000)
	=> /lib/x86_64-linux-gnu/ (0x00007f1835c33000)
	/lib64/ (0x00007f184e187000)
	=> /lib/x86_64-linux-gnu/ (0x00007f1835a2b000)
	=> /usr/lib/x86_64-linux-gnu/ (0x00007f18357df000)

I have CUDA 8.0 and TensorFlow 1.4 currently. What does this signify? Why is it showing ‘not found’ for some files above?

(Lissyx) #37

You are mixing two things here. This shows libdeepspeech's linkage. It has nothing to do with the TensorFlow python package you installed.

Just install CUDA 9.0 + CuDNNv7 locally and adjust LD_LIBRARY_PATH ?
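
A quick way to see which CUDA libraries the dynamic loader can actually find, before and after adjusting LD_LIBRARY_PATH, is to try loading them by name. This is only a sketch: the library name is missing from the error message quoted in this thread, so the sonames below are examples chosen for illustration, and `can_load` is a hypothetical helper.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # Example sonames only (assumption); substitute the name from your own error.
    for name in ("libcudart.so.8.0", "libcudart.so.9.0", "libcudnn.so.7"):
        status = "loadable" if can_load(name) else "NOT found by the loader"
        print(name, status)
```

On Linux, LD_LIBRARY_PATH is read when the process starts, so export it in the shell before launching Python instead of setting it from inside the script.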

(Vijay) #38

Hi @pra978, I am also trying to train my model on Indian English accent, where did you get your Indian English datasets?


The Indic TTS data from IIT Madras.

(SGang) #40

Hi @pra978 how did the training with Indic TTS go? Did you train from zero or did you train from the officially released checkpoint?