Creating an Indian accent model with ~115k files

I checked the condition in the code:

    source = audiofile_to_input_vector(wav_file, self._model_feeder.numcep, self._model_feeder.numcontext)
    source_len = len(source)
    target = text_to_char_array(transcript, self._alphabet)
    target_len = len(target)
    if source_len < target_len:
        raise ValueError('Error: Audio file {} is too short for transcription.'.format(wav_file))

This tells me that whenever the audio yields fewer feature frames than the transcript has characters (roughly, the audio is too short for the text spoken), it raises the error.

I tried to apply this condition to my audio files to filter such files out, but I am not able to recreate text_to_char_array since it comes from another module. What are your suggestions at this point?

Read the source, Luke!

$ git grep "def text_to_char_array"
util/text.py:def text_to_char_array(original, alphabet):

Yes, I checked that it comes from text.py, but since that code requires some ‘config_file’, I don’t know how to recreate the function ‘text_to_char_array’ independently for my purpose. Is there any other method to filter out the shorter audio files?

Sorry to insist, but read the source. Your config_file is the … alphabet file. So I guess it is something you already have?

Okay, got it.
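For later readers, here is a rough stand-alone sketch of what such a conversion can look like. It is not the code from util/text.py: the names `load_alphabet` and `text_to_labels` are made up for illustration, and it assumes the alphabet file lists one character per line with lines starting with `#` treated as comments. Check the real implementation before relying on it.

```python
def load_alphabet(alphabet_path):
    """Map each character in the alphabet file to an integer label.

    Assumption: one character per line, labels assigned in file order,
    lines starting with '#' are comments.
    """
    char_to_label = {}
    with open(alphabet_path, encoding="utf-8") as alphabet_file:
        for line in alphabet_file:
            if line.startswith("#"):
                continue
            char = line.rstrip("\n")  # keeps a line holding a single space
            if char == "":
                continue
            char_to_label[char] = len(char_to_label)
    return char_to_label


def text_to_labels(transcript, char_to_label):
    """Convert a transcript to a list of labels.

    Raises KeyError if the transcript contains a character that is
    missing from the alphabet.
    """
    return [char_to_label[char] for char in transcript]
```

The length of the returned list plays the role of target_len in the check quoted at the top of the thread.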

I want this function to work:

    def audiofile_to_input_vector(audio_filename, numcep, numcontext):
        r"""
        Given a WAV audio file at ``audio_filename``, calculates ``numcep`` MFCC features
        at every 0.01s time step with a window length of 0.025s. Appends ``numcontext``
        context frames to the left and right of each time step, and returns this data
        in a numpy array.
        """
        # Load wav files
        fs, audio = wav.read(audio_filename)

        return audioToInputVector(audio, fs, numcep, numcontext)

What do I pass for ‘numcep’ and ‘numcontext’? How are they calculated, or where do they come from?

Can you read the source that calls it? It’s clearly trivial. Hint: git grep audiofile_to_input_vector

This got resolved. Thanks a lot.
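For later readers: going by the docstring quoted above, each time step ends up with numcep * (2 * numcontext + 1) values once the context frames are appended. A toy illustration in plain Python, assuming zero padding at the edges; the helper name `add_context` is made up here and the sizes in the example are not values taken from the DeepSpeech source.

```python
def add_context(frames, numcontext):
    """Append numcontext frames of left and right context to every time step.

    frames is a list of per-step feature lists (each of length numcep).
    Assumption: steps near the edges are padded with zero frames.
    """
    numcep = len(frames[0])
    padding = [[0.0] * numcep for _ in range(numcontext)]
    padded = padding + frames + padding
    windows = []
    for step in range(len(frames)):
        # Window covers numcontext past frames, the current one, numcontext future ones.
        window = padded[step:step + 2 * numcontext + 1]
        windows.append([value for frame in window for value in frame])
    return windows
```

With three frames of two features each and numcontext=1, every output row holds 2 * (2 * 1 + 1) = 6 values, and the number of rows (the source_len in the length check) stays at three.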

I wrote a script to filter out all the files with source_len (audio file) < target_len (transcript), then tested a training run and it runs fine.
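For later readers, a rough sketch of such a filter (not the poster's actual script). It only approximates source_len: it assumes one feature frame per 0.01 s of audio, as the docstring earlier in the thread describes, and it counts one label per transcript character. The real feature extractor may produce a different number of frames, so the exact count should come from audiofile_to_input_vector itself. All function names here are hypothetical.

```python
import wave


def estimate_feature_frames(wav_path, steps_per_second=100):
    """Approximate the number of feature frames in a WAV file.

    Assumption: one frame per 0.01 s step (100 per second).
    """
    with wave.open(wav_path, "rb") as wav_file:
        n_samples = wav_file.getnframes()
        sample_rate = wav_file.getframerate()
    return n_samples * steps_per_second // sample_rate


def is_long_enough(wav_path, transcript, steps_per_second=100):
    """True if the estimated frame count covers every transcript character."""
    return estimate_feature_frames(wav_path, steps_per_second) >= len(transcript)


def filter_samples(samples, steps_per_second=100):
    """Keep only the (wav_path, transcript) pairs that pass the length check."""
    return [
        (wav_path, transcript)
        for wav_path, transcript in samples
        if is_long_enough(wav_path, transcript, steps_per_second)
    ]
```

Lowering steps_per_second makes the filter stricter, which is the safer direction if the real pipeline keeps fewer frames than this estimate.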

Now I need to use these files and run on a Linux machine with CUDA support.

I have tensorflow-gpu 1.4 and CUDA 8.0.

When I run the main training code, I get this:

tensorflow.python.framework.errors_impl.NotFoundError: libcudart.so.9.0: cannot open shared object file: No such file or directory

Does this have something to do with my installation of TensorFlow or the CUDA binaries?

Your tensorflow tries to use CUDA 9.0, not 8.0.
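A quick way to see which CUDA runtime libraries the dynamic loader can actually find is to try loading them by name. This is only a diagnostic sketch using Python's ctypes; the helper names are made up, and the library names in the example are the ones mentioned in this thread plus libcudart.so.8.0 for the CUDA 8.0 runtime.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can find and load the library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


def report(library_names):
    """Map each library name to whether it could be loaded."""
    return {name: can_load(name) for name in library_names}


if __name__ == "__main__":
    names = ["libcudart.so.8.0", "libcudart.so.9.0", "libcudnn.so.7"]
    for name, found in report(names).items():
        print("{}: {}".format(name, "found" if found else "NOT found"))
```

If libcudart.so.9.0 shows as not found here, whatever raised the NotFoundError above will keep failing until that library is on the loader's search path.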

So I uninstall CUDA 8 and install CUDA 9, right?

Well, you said TensorFlow GPU 1.4, which should be linked to CUDA 8.0, so I’m a bit doubtful about your setup. I cannot recommend anything.

Can I give you more information?

When I run ‘nvcc --version’, I get:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

And when I run ‘pip list | grep tensorflow’, I get:

tensorflow-gpu                     1.4.0      
tensorflow-tensorboard             0.4.0

Will getting TensorFlow 1.6 and CUDA 9.0 help?

Also, if it’s tensorflow-gpu 1.4 and hence linked to CUDA 8.0, why is it trying to use CUDA 9.0?

How can I know? It’s your setup, not mine :-(

Okay, but can you recommend which TensorFlow/CUDA combination will work?

You need to check TensorFlow’s upstream for that.

I checked ‘ldd deepspeech libdeepspeech.so’. It shows:

$ ldd deepspeech libdeepspeech.so 
deepspeech:
	linux-vdso.so.1 =>  (0x00007ffd6d7f0000)
	libcudart.so.9.0 => not found
	libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007fc560d03000)
	libdeepspeech.so => /home/aa/Downloads/deepspeech/DeepSpeech/./libdeepspeech.so (0x00007fc54a4ca000)
	libdeepspeech_utils.so => /home/aa/Downloads/deepspeech/DeepSpeech/./libdeepspeech_utils.so (0x00007fc54a2c5000)
	libsox.so.2 => /usr/lib/x86_64-linux-gnu/libsox.so.2 (0x00007fc54a032000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fc549cac000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc549956000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fc54973f000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc54935f000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc54915b000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc548f3c000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fc548d34000)
	libnvidia-fatbinaryloader.so.375.26 => /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.375.26 (0x00007fc548ae8000)
	libcusolver.so.9.0 => not found
	libcublas.so.9.0 => not found
	libcudnn.so.7 => not found
	libcufft.so.9.0 => not found
	libcurand.so.9.0 => not found
	libcudart.so.9.0 => not found
	libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007fc5488b9000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fc5616f9000)
	libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007fc5486af000)
	libpng16.so.16 => /usr/lib/x86_64-linux-gnu/libpng16.so.16 (0x00007fc54847d000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fc548260000)
	libmagic.so.1 => /usr/lib/x86_64-linux-gnu/libmagic.so.1 (0x00007fc54803e000)
	libgsm.so.1 => /usr/lib/x86_64-linux-gnu/libgsm.so.1 (0x00007fc547e30000)
libdeepspeech.so:
	linux-vdso.so.1 =>  (0x00007fff7f3a7000)
	libcusolver.so.9.0 => not found
	libcublas.so.9.0 => not found
	libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007f1836f58000)
	libcudnn.so.7 => not found
	libcufft.so.9.0 => not found
	libcurand.so.9.0 => not found
	libcudart.so.9.0 => not found
	libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f1836d29000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f1836b25000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f18367cf000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f18365b0000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f183622a000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f1836013000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1835c33000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f184e187000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f1835a2b000)
	libnvidia-fatbinaryloader.so.375.26 => /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.375.26 (0x00007f18357df000)

I currently have CUDA 8.0 and TensorFlow 1.4. What does this signify? Why does it show ‘not found’ for some of the libraries above?

You are mixing two things here. This shows libdeepspeech's linkage. It has nothing to do with the TensorFlow python package you installed.

Just install CUDA 9.0 + cuDNN v7 locally and adjust LD_LIBRARY_PATH?
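After adjusting LD_LIBRARY_PATH, rerunning ldd should show no more "not found" entries. A small sketch that lists the unresolved libraries from ldd output; the function names are made up for illustration, and it assumes the `name => not found` line format shown in the output above.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the unique library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            name = line.split("=>")[0].strip()
            if name not in missing:
                missing.append(name)
    return missing


def ldd_missing(binary_path):
    """Run ldd on a binary and return its unresolved libraries."""
    result = subprocess.run(
        ["ldd", binary_path],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return missing_libraries(result.stdout)
```

An empty list from `ldd_missing("./libdeepspeech.so")` would mean the loader can now resolve every dependency in the current environment.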

Hi @pra978, I am also trying to train a model on Indian-accented English. Where did you get your Indian English datasets?
Thanks.

The Indic TTS data from IIT Madras.

Hi @pra978, how did the training with Indic TTS go? Did you train from zero or did you train from the officially released checkpoint?