Creating an Indian accent model with ~115k files

pra978 · June 4, 2018, 7:40am

I checked the condition in the code:

    source = audiofile_to_input_vector(wav_file, self._model_feeder.numcep, self._model_feeder.numcontext)
    source_len = len(source)
    target = text_to_char_array(transcript, self._alphabet)
    target_len = len(target)
    if source_len < target_len:
        raise ValueError('Error: Audio file {} is too short for transcription.'.format(wav_file))

This tells me that whenever the audio yields fewer feature frames (source_len) than the transcript has characters (target_len), it raises the error. In other words, the clip is too short for the text spoken in it.
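To make the condition concrete: source_len counts MFCC time steps, not seconds, and target_len counts transcript characters. The worked example below is a sketch that assumes python_speech_features-style framing, using the 25 ms window and 10 ms step that the audiofile_to_input_vector docstring describes, with no further subsampling (check the feature code in your own checkout). A clip of N samples at sample rate f_s gives T frames, and with L transcript characters:

```latex
T = 1 + \left\lceil \frac{N - W}{S} \right\rceil \quad (N > W;\ \text{otherwise } T = 1), \qquad W = 0.025\,f_s,\ S = 0.01\,f_s
T_{1\,\mathrm{s},\,16\,\mathrm{kHz}} = 1 + \left\lceil \frac{16000 - 400}{160} \right\rceil = 1 + \lceil 97.5 \rceil = 99
\text{error raised} \iff T < L
```

So, under these assumptions, one second of 16 kHz audio can carry at most 99 transcript characters before this error fires.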

I tried to apply this condition to my audio files to filter out such files, but I am not able to recreate text_to_char_array, as it comes from another module. What do you suggest at this point?

lissyx · June 4, 2018, 7:46am

Read the source, Luke!

$ git grep "def text_to_char_array"
util/text.py:def text_to_char_array(original, alphabet):

pra978 · June 4, 2018, 8:14am

Yeah, I checked that it comes from util/text.py, but since that code requires a ‘config_file’, I don’t know how to recreate the function ‘text_to_char_array’ independently for my purpose. Is there any other way to filter out the shorter audio files?

lissyx · June 4, 2018, 9:14am

Sorry to insist, but read the source. Your config_file is the … alphabet file. So I guess it is something you have?
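If you only need the lengths for filtering, a standalone stand-in is enough. As far as I can tell from util/text.py, text_to_char_array maps each transcript character to its index in the alphabet, so target_len is simply the number of characters (and it should fail on a character missing from the alphabet). The sketch below uses hypothetical helpers (parse_alphabet, load_alphabet, text_to_labels, not DeepSpeech's own API) and assumes the alphabet file lists one symbol per line, with lines starting with '#' as comments and '\#' meaning a literal '#'. Check that against your own alphabet file.

```python
import io


def parse_alphabet(lines):
    """Build {symbol: label} from alphabet-file lines (assumed format, see above)."""
    symbols = []
    for raw in lines:
        line = raw.rstrip("\r\n")      # keep a lone " ": it is the space symbol
        if line.startswith("\\#"):     # escaped literal '#'
            symbols.append("#")
        elif line.startswith("#") or line == "":
            continue                   # comment or blank line (assumed ignorable)
        else:
            symbols.append(line)
    return {symbol: label for label, symbol in enumerate(symbols)}


def load_alphabet(path):
    """Read an alphabet file from disk (hypothetical helper)."""
    with io.open(path, encoding="utf-8") as f:
        return parse_alphabet(f)


def text_to_labels(transcript, alphabet):
    """One label per character; raises KeyError for characters not in the alphabet."""
    return [alphabet[ch] for ch in transcript]
```

For example, `len(text_to_labels(transcript, load_alphabet("data/alphabet.txt")))` (the path is only an example) gives the value to compare against source_len, and a KeyError flags transcripts with characters your alphabet does not cover.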

pra978 · June 4, 2018, 10:32am

Okay, got it.

I want this function to work:

    import scipy.io.wavfile as wav

    # audioToInputVector (MFCC + context stacking) lives elsewhere in DeepSpeech;
    # it is not defined in this snippet.

    def audiofile_to_input_vector(audio_filename, numcep, numcontext):
        r"""
        Given a WAV audio file at ``audio_filename``, calculates ``numcep`` MFCC features
        at every 0.01s time step with a window length of 0.025s. Appends ``numcontext``
        context frames to the left and right of each time step, and returns this data
        in a numpy array.
        """
        # Load the WAV file: returns (sample_rate, samples)
        fs, audio = wav.read(audio_filename)

        return audioToInputVector(audio, fs, numcep, numcontext)

What do I pass for ‘numcep’ and ‘numcontext’? How are they calculated, or where do they come from?

lissyx · June 4, 2018, 10:37am

Can you read the source that calls it? It’s clearly trivial. Hint: git grep audiofile_to_input_vector
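For reference, numcep is the number of MFCC coefficients per frame and numcontext is how many neighbouring frames get stacked on each side of every frame. I believe DeepSpeech.py of that period used n_input = 26 and n_context = 9, but treat those values as an assumption and grep your own checkout. The sketch below shows zero-padded context stacking in plain NumPy; add_context is a hypothetical name, not DeepSpeech's function. The point it illustrates: the number of rows, which is what source_len counts, does not change; only the width grows to (2 * numcontext + 1) * numcep.

```python
import numpy as np

# Assumed defaults (check DeepSpeech.py in your checkout): 26 MFCC
# coefficients per frame and 9 context frames on each side.
NUMCEP = 26
NUMCONTEXT = 9


def add_context(features, numcontext):
    """Append numcontext frames of left and right context to every frame.

    features: array of shape (T, numcep). Frames beyond either edge are zeros.
    Returns shape (T, (2 * numcontext + 1) * numcep); T, which is what
    source_len counts, does not change.
    """
    num_frames, numcep = features.shape
    window = 2 * numcontext + 1
    if num_frames == 0:
        return np.zeros((0, window * numcep))
    pad = np.zeros((numcontext, numcep), dtype=features.dtype)
    padded = np.concatenate([pad, features, pad])
    # Row t covers original frames t - numcontext .. t + numcontext, flattened.
    return np.stack([padded[t:t + window].reshape(-1) for t in range(num_frames)])


if __name__ == "__main__":
    fake_mfcc = np.random.rand(99, NUMCEP)  # e.g. 1 s of 16 kHz audio
    print(add_context(fake_mfcc, NUMCONTEXT).shape)  # (99, 494)
```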

pra978 · June 8, 2018, 7:58am

This got resolved. Thanks a lot.

I wrote code to filter out all the files where source_len (audio features) < target_len (transcript characters), then ran the training code on the filtered set and it runs fine.
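A filter along these lines can be written without loading DeepSpeech at all. This is a sketch, not the code from this thread. It assumes a DeepSpeech-style CSV with wav_filename and transcript columns, PCM WAV files readable by Python's wave module, python_speech_features-style framing (25 ms window, 10 ms step), and that target_len equals the transcript's character count. frames_from_samples, estimated_frames and filter_csv are hypothetical names.

```python
import csv
import math
import wave


def frames_from_samples(num_samples, sample_rate, win_s=0.025, step_s=0.01):
    """Frame count under python_speech_features-style framing (assumed)."""
    win = int(round(win_s * sample_rate))
    step = int(round(step_s * sample_rate))
    if num_samples <= win:
        return 1
    return 1 + int(math.ceil((num_samples - win) / float(step)))


def estimated_frames(wav_path):
    """Get sample count and rate from the WAV header, then estimate frames."""
    with wave.open(wav_path, "rb") as w:
        return frames_from_samples(w.getnframes(), w.getframerate())


def filter_csv(in_csv, out_csv):
    """Copy rows whose audio has at least as many frames as transcript characters."""
    kept = dropped = 0
    with open(in_csv, newline="") as fin, open(out_csv, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Mirrors the training check: source_len < target_len is an error.
            if estimated_frames(row["wav_filename"]) >= len(row["transcript"]):
                writer.writerow(row)
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

Usage would be something like `kept, dropped = filter_csv("train.csv", "train_filtered.csv")` (file names are examples). Reading only the header keeps a pass over ~115k files cheap compared with computing MFCCs for each one.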

Now I need to use these files for training on a Linux machine with CUDA support.

I have tensorflow-gpu 1.4 and CUDA 8.0.

When i run the main training code, i get this :

tensorflow.python.framework.errors_impl.NotFoundError: libcudart.so.9.0: cannot open shared object file: No such file or directory

Does this have something to do with my installation of TensorFlow or of the CUDA binaries?

lissyx · June 8, 2018, 8:01am

Your TensorFlow is trying to use CUDA 9.0, not 8.0.
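One quick way to see which CUDA runtime and cuDNN versions can actually be resolved on a machine is to try opening them with ctypes, which goes through dlopen and therefore the same search rules as the dynamic linker (LD_LIBRARY_PATH, the ldconfig cache, default system directories). A minimal sketch; can_load is a hypothetical helper and the soname list is just the versions discussed in this thread.

```python
import ctypes

# Sonames discussed in this thread; extend the list as needed.
CANDIDATES = ("libcudart.so.8.0", "libcudart.so.9.0", "libcudnn.so.6", "libcudnn.so.7")


def can_load(soname):
    """True if dlopen() can locate and load this shared library."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    for soname in CANDIDATES:
        print(soname, "found" if can_load(soname) else "NOT found")
```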

pra978 · June 8, 2018, 8:02am

So should I uninstall CUDA 8 and install CUDA 9?

lissyx · June 8, 2018, 8:03am

Well, you said TensorFlow GPU 1.4, which should be linked to CUDA 8.0, so I’m a bit doubtful about your setup. I cannot recommend anything.

pra978 · June 8, 2018, 8:07am

Can I give you more information?

When I run ‘nvcc --version’, I get:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

And when I run ‘pip list | grep tensorflow’, I get:

tensorflow-gpu                     1.4.0      
tensorflow-tensorboard             0.4.0
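Since pip can belong to a different interpreter than the python that runs DeepSpeech.py, it may be worth listing the installed TensorFlow distributions from inside the training interpreter itself. A small sketch using importlib.metadata, which needs Python 3.8 or newer; installed_with_prefix is a hypothetical helper.

```python
import sys
from importlib import metadata  # Python 3.8+


def installed_with_prefix(prefix):
    """Sorted (name, version) pairs for distributions whose name starts with prefix."""
    found = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name and name.lower().startswith(prefix.lower()):
            found.append((name, dist.version))
    return sorted(found)


if __name__ == "__main__":
    # Run this with the same interpreter you use for DeepSpeech.py.
    print("interpreter:", sys.executable)
    for name, version in installed_with_prefix("tensorflow"):
        print(name, version)
```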

pra978 · June 8, 2018, 8:08am

Would switching to TensorFlow 1.6 and CUDA 9.0 help?

Also, if it is tensorflow-gpu 1.4 and hence linked against CUDA 8.0, why is it trying to use CUDA 9.0?

lissyx · June 8, 2018, 8:13am

How can I know? It’s your setup, not mine :-(.

pra978 · June 8, 2018, 8:24am

Okay, but can you recommend a TensorFlow/CUDA combination that will work?

lissyx · June 8, 2018, 8:25am

You need to check TensorFlow’s upstream for that.
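For the versions in this thread, TensorFlow's published tested build configurations list tensorflow-gpu 1.4 as built against CUDA 8.0 with cuDNN 6, and 1.5 and 1.6 against CUDA 9.0 with cuDNN 7. A tiny lookup capturing just those rows; TF_GPU_BUILDS and required_cuda are hypothetical names, and the table should be double-checked upstream before relying on it.

```python
# (CUDA, cuDNN) versions that prebuilt tensorflow-gpu wheels were built against,
# for the releases discussed here only; verify against the upstream table.
TF_GPU_BUILDS = {
    "1.4": ("8.0", "6"),
    "1.5": ("9.0", "7"),
    "1.6": ("9.0", "7"),
}


def required_cuda(tf_version):
    """Map a version string such as '1.4.0' to (CUDA, cuDNN), or None if unknown."""
    major_minor = ".".join(tf_version.split(".")[:2])
    return TF_GPU_BUILDS.get(major_minor)
```

Here `required_cuda("1.4.0")` gives `("8.0", "6")`, which matches the installed CUDA 8.0, so a request for libcudart.so.9.0 is likely coming from some other binary rather than from the tensorflow-gpu 1.4 wheel itself.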

pra978 · June 11, 2018, 11:05am

I checked ‘ldd deepspeech libdeepspeech.so’. It shows:

$ ldd deepspeech libdeepspeech.so 
deepspeech:
	linux-vdso.so.1 =>  (0x00007ffd6d7f0000)
	libcudart.so.9.0 => not found
	libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007fc560d03000)
	libdeepspeech.so => /home/aa/Downloads/deepspeech/DeepSpeech/./libdeepspeech.so (0x00007fc54a4ca000)
	libdeepspeech_utils.so => /home/aa/Downloads/deepspeech/DeepSpeech/./libdeepspeech_utils.so (0x00007fc54a2c5000)
	libsox.so.2 => /usr/lib/x86_64-linux-gnu/libsox.so.2 (0x00007fc54a032000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fc549cac000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc549956000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fc54973f000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc54935f000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc54915b000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc548f3c000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fc548d34000)
	libnvidia-fatbinaryloader.so.375.26 => /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.375.26 (0x00007fc548ae8000)
	libcusolver.so.9.0 => not found
	libcublas.so.9.0 => not found
	libcudnn.so.7 => not found
	libcufft.so.9.0 => not found
	libcurand.so.9.0 => not found
	libcudart.so.9.0 => not found
	libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007fc5488b9000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fc5616f9000)
	libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007fc5486af000)
	libpng16.so.16 => /usr/lib/x86_64-linux-gnu/libpng16.so.16 (0x00007fc54847d000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fc548260000)
	libmagic.so.1 => /usr/lib/x86_64-linux-gnu/libmagic.so.1 (0x00007fc54803e000)
	libgsm.so.1 => /usr/lib/x86_64-linux-gnu/libgsm.so.1 (0x00007fc547e30000)
libdeepspeech.so:
	linux-vdso.so.1 =>  (0x00007fff7f3a7000)
	libcusolver.so.9.0 => not found
	libcublas.so.9.0 => not found
	libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007f1836f58000)
	libcudnn.so.7 => not found
	libcufft.so.9.0 => not found
	libcurand.so.9.0 => not found
	libcudart.so.9.0 => not found
	libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f1836d29000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f1836b25000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f18367cf000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f18365b0000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f183622a000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f1836013000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1835c33000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f184e187000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f1835a2b000)
	libnvidia-fatbinaryloader.so.375.26 => /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.375.26 (0x00007f18357df000)

I currently have CUDA 8.0 and TensorFlow 1.4. What does this mean? Why does it show ‘not found’ for some of the libraries above?
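With this much output it is easier to keep only the unresolved entries. A sketch with hypothetical helpers missing_libs and run_ldd: the first parses ldd text and returns the sonames reported as "not found", without duplicates; the second just runs ldd on the given files.

```python
import subprocess


def missing_libs(ldd_output):
    """Sonames reported as '=> not found' in ldd output, in order, without duplicates."""
    missing = []
    for line in ldd_output.splitlines():
        line = line.strip()
        if line.endswith("=> not found"):
            soname = line.split("=>")[0].strip()
            if soname not in missing:
                missing.append(soname)
    return missing


def run_ldd(*paths):
    """Run ldd on the given files and return its standard output as text."""
    result = subprocess.run(["ldd"] + list(paths), stdout=subprocess.PIPE,
                            universal_newlines=True, check=False)
    return result.stdout
```

On the output above, `missing_libs(run_ldd("deepspeech", "libdeepspeech.so"))` would list only the CUDA 9.0 and cuDNN 7 libraries (libcudart, libcusolver, libcublas, libcudnn, libcufft, libcurand).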

lissyx · June 11, 2018, 2:21pm

You are mixing two things here. This shows the linkage of the native client (deepspeech and libdeepspeech.so), which expects CUDA 9.0 and cuDNN 7. It has nothing to do with the TensorFlow Python package you installed.

Just install CUDA 9.0 + cuDNN v7 locally and adjust LD_LIBRARY_PATH?
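A sketch of the LD_LIBRARY_PATH part, assuming CUDA 9.0 sits under the default /usr/local/cuda-9.0 prefix with cuDNN 7's libraries copied into its lib64 directory (adjust both to your install). The dynamic loader reads LD_LIBRARY_PATH when a process starts, so the sketch sets it for a child process instead of changing os.environ inside an already running interpreter. env_with_cuda and run_with_cuda are hypothetical names.

```python
import os
import subprocess

# Assumption: default CUDA 9.0 install prefix, with cuDNN 7 copied into lib64.
CUDA_LIB_DIR = "/usr/local/cuda-9.0/lib64"


def env_with_cuda(lib_dir, base_env=None):
    """Copy of the environment with lib_dir prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    current = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = lib_dir + (":" + current if current else "")
    return env


def run_with_cuda(cmd, lib_dir=CUDA_LIB_DIR):
    """Start cmd as a child process so the loader sees the new search path."""
    return subprocess.run(cmd, env=env_with_cuda(lib_dir), check=False)
```

For instance, `run_with_cuda(["ldd", "libdeepspeech.so"])` should show whether the "not found" entries are gone. In a shell, the equivalent is exporting `LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:$LD_LIBRARY_PATH` before starting training.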

vijay · July 20, 2018, 10:23am

Hi @pra978, I am also trying to train a model on Indian English accents. Where did you get your Indian English datasets?
Thanks.

pra978 · July 20, 2018, 10:42am

The Indic TTS data from IIT Madras.

sayantangangs.91 · December 21, 2018, 6:25pm

Hi @pra978, how did the training with Indic TTS go? Did you train from scratch or from the officially released checkpoint?