TUTORIAL: How I trained a specific French model to control my robot

Hi Mark2.
I think you should ask Kenneth, the creator of the KenLM tools:
it's an LM problem, related to silences.
If I remember correctly, I saw issues about it on its GitHub!

Did you add silences to your “file”.txt before converting to ARPA?
I didn't!
I just added one sentence per line, without punctuation,
and I didn't have any problems.
Good luck!
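Here is a minimal sketch of the text preparation described above (one sentence per line, no punctuation) before building the ARPA file. It assumes Python 3, a UTF-8 input file, and an alphabet that keeps the apostrophe and the hyphen (French needs them, as in t'appelles or es-tu); normalize_sentence and prepare_corpus are hypothetical names:

```python
import re
import unicodedata

# Characters kept besides letters, digits and whitespace.
# Assumption: your alphabet.txt contains the apostrophe and the hyphen.
KEEP = {"'", "-"}


def normalize_sentence(line):
    """Lowercase a sentence and replace punctuation with spaces."""
    line = line.lower().replace("\u2019", "'")  # curly apostrophe -> straight
    chars = []
    for c in line:
        # Unicode category "L*" = letter (keeps é, à, ç...), "N*" = number
        if c.isspace() or c in KEEP or unicodedata.category(c)[0] in "LN":
            chars.append(c)
        else:
            chars.append(" ")
    # collapse runs of whitespace (including the newline) into single spaces
    return re.sub(r"\s+", " ", "".join(chars)).strip()


def prepare_corpus(src_path, dst_path):
    """Write one normalized sentence per line, skipping empty results."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            sentence = normalize_sentence(line)
            if sentence:
                dst.write(sentence + "\n")
```

For example, prepare_corpus("corpus_raw.txt", "vocabulary.txt") (hypothetical file names) turns "Qui es-tu, et qui est-il ?" into "qui es-tu et qui est-il".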


Thanks for sharing such a wonderful article… but can you please share a snapshot of your CSV? I am confused about whether we need to give the full path of the wav files or only their names.

Thanks for the compliments.

Here is a sample of a typical DeepSpeech CSV file:

wav_filename,wav_filesize,transcript
/home/nvidia/DeepSpeech/data/alfred/dev/record.1.wav,87404,qui es-tu et qui est-il
/home/nvidia/DeepSpeech/data/alfred/dev/record.2.wav,101804,quel est ton nom ou comment tu t'appelles
/home/nvidia/DeepSpeech/data/alfred/dev/record.3.wav,65324,est-ce que tu vas bien

You must keep the first line as is (it names the CSV columns).
Each following line gives 3 values, separated by commas:

  • the location of the wav file (I use the full path; perhaps a relative path could work?)
  • its size in bytes (you can get it with os.path.getsize("the wav file"))
  • the transcript (in the language spoken in the wav)

Take a look at …DeepSpeech/bin/import_ldc93s1.py, line 23, for CSV creation!
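The same three columns can also be written with Python's standard csv module. A minimal sketch, assuming Python 3 and the column names wav_filename, wav_filesize, transcript used by DeepSpeech's importers; write_deepspeech_csv is a hypothetical helper:

```python
import csv
import os

# Column names of the first line (assumed to match DeepSpeech's importers).
HEADER = ["wav_filename", "wav_filesize", "transcript"]


def write_deepspeech_csv(csv_path, samples):
    """samples: iterable of (wav_path, transcript) pairs."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        # plain "\n" line endings, like the sample above
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(HEADER)
        for wav_path, transcript in samples:
            wav_path = os.path.abspath(wav_path)  # store absolute paths to be safe
            writer.writerow([wav_path, os.path.getsize(wav_path), transcript])
```

Usage: write_deepspeech_csv("dev.csv", [("data/alfred/dev/record.1.wav", "qui es-tu et qui est-il")]). The csv module quotes a transcript automatically if it ever contains a comma.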

For the transcript, only use characters present in alphabet.txt; otherwise you'll get errors during training.
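To catch such characters before training, here is a minimal sketch that lists the CSV rows whose transcript uses characters missing from alphabet.txt. It assumes Python 3, a CSV whose first line is wav_filename,wav_filesize,transcript, and an alphabet.txt laid out like DeepSpeech's (one character per line, lines starting with # are comments, a line holding a single space for the space character); the function names are hypothetical:

```python
import csv


def load_alphabet(path):
    """Read alphabet.txt into a set of characters, skipping # comment lines."""
    chars = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")  # strip only the line ending, keep " "
            if line == "" or line.startswith("#"):
                continue
            chars.add(line)
    return chars


def find_bad_transcripts(csv_path, alphabet):
    """Yield (line_number, transcript, unknown_chars) for rows that would fail."""
    with open(csv_path, encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f)
        for line_no, row in enumerate(reader, start=2):  # line 1 is the header
            unknown = set(row["transcript"]) - alphabet
            if unknown:
                yield line_no, row["transcript"], unknown
```

Running it on each of your train/dev/test CSVs and fixing (or removing) every reported line should avoid the training errors mentioned above.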

Hope this helps.


But I have more than 16,000 wav files. How can I write them all to a CSV file?
Can we follow the same approach as DeepSpeech/bin/import_ldc93s1.py to write the CSV file?


Thanks for the help! When I tried relative paths it didn't work for me, but giving the full absolute path worked.

@gr8nishan, thanks for the info!
@phanthanhlong7695, try this:

Save it in a Python file, run it with Python 2, and answer the prompts. You'll get a nice finished CSV file!
With Python 3, you'll have to make a few minor changes (e.g. raw_input() becomes input()).

When asked for the prefix, enter only the part of the wav name before the number,
e.g. audio223 -> audio; audio.223 -> audio.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
import os
import fnmatch

print('\n\n°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°  ')
print('                         CSV creator :                           ')
print('                         -------------                           ')
print('      -  adding CSV columns,                                            ')
print('      -  files location, bytes size, and transcription.           ')
print('              Vincent FOUCAULT,     September 2017            ')

def process():
    directory = raw_input('Paste here the location of your wavs:\n>> ')
    directory = directory.replace('file://', '')
    textfile = raw_input('Paste here the location of your transcript text:\n>> ')
    textfile = textfile.replace('file://', '')
    # one transcript per line, in the same order as the wav numbers
    sentenceTextFile = open(textfile, 'rb')
    sentences = sentenceTextFile.readlines()
    sentenceTextFile.close()
    csv_file = raw_input('Paste here the complete CSV file link:\n>> ')
    csv_file = csv_file.replace('file://', '')
    transcriptions = open(csv_file, 'wb')

    wavDir = directory
    wav_prefix = raw_input('Enter the prefix of wav file (ex : if record.223.wav --> enter "record.") :\n>> ')
    # os.path.join copes with a trailing "/" on the directory
    wavs = os.path.join(directory, wav_prefix)
    print('your wav dir is : ' + directory)
    print('wave prefix name is : ' + wav_prefix)
    print('transcript is here : ' + textfile)
    print('you want to save CSV here : ' + csv_file)
    content = len(fnmatch.filter(os.listdir(wavDir), '*.wav'))
    print('\nNumber of wav found : ' + str(content) + '\n')
    # first line: the column names DeepSpeech expects
    transcriptions.write('wav_filename,wav_filesize,transcript\n')
    for i in range(content):
        # wavs are numbered from 1: record.1.wav, record.2.wav, ...
        wavPath = wavs + str(i + 1) + '.wav'
        transcript = sentences[i].strip()
        transcriptions.write(wavPath + ',' + str(os.path.getsize(wavPath)) + ',' + transcript + '\n')
    transcriptions.close()
    print('--->  CSV passed !')

if __name__ == "__main__":
    try:
        process()
        print('\n\n --->  Bye !!\n\n')
    except Exception as e:
        # missing wav, missing transcript line, wrong path...
        print('An error occurred !! Check your links. (' + str(e) + ')')
        print('GOOD LUCK !!')
        sys.exit(1)

Here is the terminal output:

your wav dir is : /media/nvidia/neo_backup/DeepSpeech/data/alfred/test2/
wave prefix name is : record.
transcript is here : /media/nvidia/neo_backup/DeepSpeech/data/alfred/text2/test.txt
you want to save CSV here : /media/nvidia/neo_backup/DeepSpeech/data/alfred/text2/test_final.csv

Number of wav found : 71

--->  CSV passed !

--->  Bye !!
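For Python 3, or when the wav numbers are not contiguous, here is a minimal sketch of the same idea that needs no prefix: it sorts the wavs by the number at the end of their name (so record.2.wav comes before record.10.wav) and pairs them with the lines of the transcript text, assuming line N of the text belongs to the Nth wav in that order. build_csv and wav_number are hypothetical names:

```python
import csv
import os
import re

# Trailing number before ".wav": "record.12.wav" -> 12, "audio223.wav" -> 223
NUMBER_RE = re.compile(r"(\d+)\.wav$")


def wav_number(name):
    match = NUMBER_RE.search(name)
    if match is None:
        raise ValueError("no number in wav name: " + name)
    return int(match.group(1))


def build_csv(wav_dir, transcript_path, csv_path):
    """Pair wavs (sorted numerically) with transcript lines; return the row count."""
    wavs = sorted((f for f in os.listdir(wav_dir) if f.endswith(".wav")), key=wav_number)
    with open(transcript_path, encoding="utf-8") as f:
        # blank lines in the transcript text are ignored
        sentences = [line.strip() for line in f if line.strip()]
    if len(wavs) != len(sentences):
        raise ValueError("%d wavs but %d transcript lines" % (len(wavs), len(sentences)))
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for name, sentence in zip(wavs, sentences):
            path = os.path.abspath(os.path.join(wav_dir, name))
            writer.writerow([path, os.path.getsize(path), sentence])
    return len(wavs)
```

With the paths from the terminal output above, the call would be build_csv("/media/nvidia/neo_backup/DeepSpeech/data/alfred/test2/", "/media/nvidia/neo_backup/DeepSpeech/data/alfred/text2/test.txt", "/media/nvidia/neo_backup/DeepSpeech/data/alfred/text2/test_final.csv"). Checking that the wav and sentence counts match catches the most common misalignment before it silently ruins training.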

Hi Mark,

I ran into the same problem. Were you able to find a solution?

Prafful’s MacBook Pro:~ naveen$ /Users/naveen/Downloads/kenlm/build/bin/build_binary -T -s /Users/naveen/Downloads/kenlm/build/words.arpa lm.binary
Reading /Users/naveen/Downloads/kenlm/build/words.arpa

/Users/naveen/Downloads/kenlm/lm/vocab.cc:305 in void lm::ngram::MissingSentenceMarker(const lm::ngram::Config &, const char *) threw SpecialWordMissingException.
The ARPA file is missing and the model is configured to reject these models. Run build_binary -s to disable this check. Byte: 191298

How did you build your ARPA file?
/bin/bin/./lmplz --text vocabulary.txt --arpa words.arpa --o 3


I only have a vague understanding of what caused that error in my case; I think it was related to wrong characters or a wrong encoding. But I fixed the problem by filtering out of the vocabulary all characters that are not present in my alphabet.

In Python, something like this:
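PERMITTED_CHARS = "1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ "
data = open("vocabulary.txt").read()
# keep newlines too, so each sentence stays on its own line
new_data = "".join(c for c in data if c in PERMITTED_CHARS or c == "\n")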

I am trying this process on macOS. I have got everything done except the trie file. When I try to generate the trie file using the details provided, I get this error:

“cannot execute binary file”

When I searched for this error, I saw that it's a Linux binary. Is that so?

Can anyone help me out?

By the way, this is what I am running:

/Users/naveen/generate_trie / /Users/naveen/Downloads/DeepSpeech/alphabet.txt / /Users/naveen/Downloads/DeepSpeech/lm.binary / /Users/naveen/Downloads/DeepSpeech/vocabulary.txt / /Users/naveen/Downloads/DeepSpeech/trie
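One way to confirm whether a downloaded generate_trie is a Linux build is to look at the first bytes of the file: Linux executables start with the ELF magic number, macOS ones with a Mach-O magic number, and a Linux (ELF) binary cannot run on macOS, which would explain "cannot execute binary file". A minimal sketch (binary_kind is a hypothetical name; the file command on macOS reports the same information):

```python
# First 4 bytes of common executable formats.
MAGIC = {
    b"\x7fELF": "ELF (Linux) executable",
    b"\xcf\xfa\xed\xfe": "Mach-O 64-bit (macOS) executable",
    b"\xce\xfa\xed\xfe": "Mach-O 32-bit (macOS) executable",
    b"\xca\xfe\xba\xbe": "Mach-O universal (macOS) binary",  # Java .class files share it
}


def binary_kind(path):
    """Guess the executable format of a file from its magic number."""
    with open(path, "rb") as f:
        head = f.read(4)
    return MAGIC.get(head, "unknown format")
```

For example, binary_kind("/Users/naveen/generate_trie") returning the ELF entry would mean the binary has to be rebuilt for macOS (or run on Linux).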


Yup, exactly like this. Finally, it got resolved when I ran build_binary -s to disable the check, as the error message suggested.

Hey, thank you for the tutorial, it's really helpful.
I have been trying to train a French model using this data: https://datashare.is.ed.ac.uk/handle/10283/2353
I split the data into 6800 files for training, 1950 for dev and 976 for test.
I followed all your steps, but the loss is really high and doesn't decrease much: it doesn't go below 160, and if I enable early stopping, training stops after 46 epochs.
Any thoughts?
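A split like this one can be made reproducible by shuffling the rows of a single CSV with a fixed seed. A minimal sketch, assuming the rows are already read without the header; the default fractions (20% dev, 10% test) are an assumption roughly matching the numbers above, and split_rows is a hypothetical name:

```python
import random


def split_rows(rows, dev_fraction=0.2, test_fraction=0.1, seed=0):
    """Shuffle CSV rows (header excluded) and cut them into train/dev/test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed -> same split on every run
    n_dev = int(len(rows) * dev_fraction)
    n_test = int(len(rows) * test_fraction)
    dev = rows[:n_dev]
    test = rows[n_dev:n_dev + n_test]
    train = rows[n_dev + n_test:]  # everything left goes to training
    return train, dev, test
```

Each list can then be written to its own CSV, with the header line added back at the top.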

I think the problem was the sample rate of the files: they were at 41000 Hz, and after I converted them to 16000 Hz it works better now.
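When mixing recordings from different sources, it can help to check every file before training. A minimal sketch (Python 3, standard wave module) that lists the wavs not in 16 kHz, mono, 16-bit PCM; that target format is an assumption based on DeepSpeech's released English models, and the sketch only detects files, it does not resample them:

```python
import os
import wave

# Assumed target format: 16 kHz, mono, 2 bytes (16 bits) per sample.
EXPECTED = {"rate": 16000, "channels": 1, "sample_width": 2}


def wav_format(path):
    """Read sample rate, channel count and sample width from a wav header."""
    with wave.open(path, "rb") as w:
        return {"rate": w.getframerate(),
                "channels": w.getnchannels(),
                "sample_width": w.getsampwidth()}


def find_bad_wavs(wav_dir):
    """Return (file name, format) for every wav that differs from EXPECTED."""
    bad = []
    for name in sorted(os.listdir(wav_dir)):
        if name.endswith(".wav"):
            fmt = wav_format(os.path.join(wav_dir, name))
            if fmt != EXPECTED:
                bad.append((name, fmt))
    return bad
```

An empty list means every wav in the directory already has the expected format.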

Very good…
And the wavs must be correctly sampled.
For example, open a test wav in Audacity: it should reach about ±0.5 amplitude.

The closer the peaks get to ±0.5, the better for training.
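To check the level without opening each file in Audacity, here is a minimal sketch (Python 3, standard library only) that reports the peak amplitude on the same -1.0 to 1.0 scale Audacity displays, so a result near 0.5 corresponds to the ±0.5 suggested above. It assumes 16-bit PCM wavs; peak_amplitude is a hypothetical name:

```python
import array
import sys
import wave


def peak_amplitude(path):
    """Peak |sample| of a 16-bit PCM wav, scaled to the -1.0..1.0 range."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit samples")
        frames = w.readframes(w.getnframes())
    samples = array.array("h", frames)  # signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()  # wav sample data is little-endian
    if len(samples) == 0:
        return 0.0
    return max(abs(s) for s in samples) / 32768.0
```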

PS: what is the total duration of your French wavs?
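A minimal sketch (Python 3, standard wave module) that adds up the duration of all wavs in one directory, which answers this kind of question quickly; total_duration_hours is a hypothetical name:

```python
import os
import wave


def total_duration_hours(wav_dir):
    """Sum of frames / sample rate over all wavs in a directory, in hours."""
    seconds = 0.0
    for name in os.listdir(wav_dir):
        if name.endswith(".wav"):
            with wave.open(os.path.join(wav_dir, name), "rb") as w:
                seconds += w.getnframes() / w.getframerate()
    return seconds / 3600.0
```

Running it on the train, dev and test directories and adding the results gives the total.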

It's about ten hours. I'm facing another problem: the ten hours are all from the same female voice. When I tried other recordings from a different, male speaker, it didn't work. Is the model sensitive to the voice itself?

No. The computer doesn't mind!
It's probably a wav format error, or some alphabet change (or a CSV problem).

Maybe I wasn't clear: I trained with a female voice only, then tested with a male voice and a different intonation, and it didn't give good output (random text).

Ah… not the same thing!

The model only knows this girl's voice!

This is why we need as many different speakers as possible, so the model can learn to understand an unknown one (that's the principle of deep learning!).
Hope this helps.


Yes, thank you. I will try to get more data and different speakers. Thank you again! :)

Do it right… perhaps I'll ask you to test your model! LOL

I don't see the total audio duration on your link!

Do you know the total audio duration of the French data on that website?