TUTORIAL : How I trained a specific french model to control my robot

(Phanthanhlong7695) #33

it done. i fixed this

(Phanthanhlong7695) #34

and how can i create trie file .


I’ve been able to use a Python tool to cut a WAV into word chunks - Longer audio files with Deep Speech

The audio outputs range from 1 second to 49 seconds. How will the longer (than 3 to 5 seconds) audio lengths affect the building of a model ?

(Vincent Foucault) #36

Well, as you can see in the deepspeech process,
A wav is cut using miliseconds.
Each part of the audio cut is “linked” to a vocabulary word character, and both are sent to “builder”.

There is a big error risks in this process, because a really small gap could result in lots of errors. (Big gap, characters errors…)

So, a small wav file, nearly 5s is the “best” compromise.

You could think : “so, I’ll use wav’s about 1 word only, to avoid gap”

It’s not a good idea : starting a word and continue a word after a previous one doesn’t produce same wave form (amplitude) beginning.
Ex: “hello”, "I say hello"
Often, the waveform beginning is highter in a start word.

Don’t hesitate to share with us your tests.

(Vincent Foucault) #38

Please be more explicit because I don’t understand your question.


Yes, and I appreciate your thread here is based on building the wav files used for training, by speaking. However, that is not always the case, as sometimes we may want to do the ‘same’ type of building (i.e. build our own models), but the WAV sources used are all from a WAV file. Hence the need to cut a WAV file into small pieces, and attempting to keep words within each cut.

That is, no broken words.

As you say, a small WAV of maximum duration of 5 seconds is ideal. I have been testing the "Python interface to the WebRTC Voice Activity Detector " at https://github.com/wiseman/py-webrtcvad

There is a python script there, example.py and I ran it against a 10 minute WAV file. The results were 56 WAV files, duration range from 00.63 seconds to 49.38 seconds.

Then the author of that package advised how to cut down the range duration size, as 49.38 seconds is a long way from your recommendation of 5 seconds max. The results then were 243 WAV files, duration range from 00.18 seconds to 13.44 seconds.

Of course some of those smaller duration sized WAV are just noise and even no noise, at least not that I could hear. Some 2 or 3 worded WAV’s were only 2 seconds long and there are quite a few that are just 1 word in duration.

Of those 243 WAV files, there are only 31 that exceed your recommendation of 5 seconds though, so that seems encouraging.

Longer audio files with Deep Speech
(Vincent Foucault) #41

Very good, Jehoshua.
Train it and tell us about wer…

(Matti Meikäläinen) #43

After running the command:

/bin/bin/./build_binary -T -s words.arpa lm.binary

I get for some (but not all) vocabularies the following error:

vocab.cc:305 in void lm::ngram::MissingSentenceMarker(const lm::ngram::Config&, const char*) threw SpecialWordMissingException.
The ARPA file is missing < /s > and the model is configured to reject these models. Run build_binary -s to disable this check. Byte: 106432571

Do you know what causes it?

(Vincent Foucault) #44

Hi Mark2.
I think you should ask to Kenneth, the creator of kenlm tools :
It’s a lm problem, regarding to silences.
I saw issues on it github, if I remember !

Did you add silences in your “file”.txt, before converting to arpa ?
Me, no !
I just added a sentence per lign, without punctuation
I didn’t have any problems
Good luck

Create A SubSet of existing models
(Gr8nishan) #45

Thanks for sharing such a wonderful article … but can you please share a snapshot of your csv as i am confused that do we need to give the full path of the wav files or only their name

(Vincent Foucault) #46

Thanks for compliments.

here is a sample of a typical deepspeech csv file :

/home/nvidia/DeepSpeech/data/alfred/dev/record.1.wav,87404,qui es-tu et qui est-il
/home/nvidia/DeepSpeech/data/alfred/dev/record.2.wav,101804,quel est ton nom ou comment tu t'appelles
/home/nvidia/DeepSpeech/data/alfred/dev/record.3.wav,65324,est-ce que tu vas bien 

You must respect the first line (needed to create columns for CSV usage)
And each next line inform 3 values, separated by a comma :

  • where is the wav file, (I use complete link, perhaps relative path could work ?!)
  • what is it size, (you can have size with this : os.path.getsize(“the wav file”))
  • what is the transcript (in the wav language)

Take a look at …DeepSpeech/bin/import_ldc93s1.py, L23 for CSV creation !!

About transcript, pay attention to only enter characters present in alphabet.txt, otherwise you’ll encounter errors when training.

Hope it will help you.

(Phanthanhlong7695) #47

but i have more than 16000 file wav. how can i write in csv file.
we can follow the same DeepSpeech/bin/import_ldc93s1.py to do write in csv file. That right ?

(Gr8nishan) #48

Thanks for the help when i was trying from relative path it was not working for me but giving the full absolute path worked

(Vincent Foucault) #49

@gr8nishan, thanks for info !
@phanthanhlong7695, try this :

save it in a python file :
run it as python2, and follow asks !! You’ll have nice finished CSV file !
if python3, you’ll have some minor changes to do !

when asked for prefix, enter only prefix wav (all before numbers)
ex : audio223 -> audio ; audio.223 -> audio.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
import os
import fnmatch

print('\n\n°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°  ')
print('                         CSV creator :                           ')  
print('                         -------------                           ')                
print('      -  adding CSV columns,                                            ')
print('      -  files location, bytes size, and transcription.           ')
print('              Vincent FOUCAULT,     Septembre 2017            ')

def process():
    directory = raw_input('Paste here the location of your wavs:\n>> ')
    directory = directory.replace('file://','')
    textfile = raw_input('Paste here the location of your transcript text:\n>> ')
    textfile = textfile.replace('file://','')
    sentenceTextFile = open(textfile, 'rb')
    sentences = sentenceTextFile.readlines()
    csv_file = raw_input('Paste here the complete CVS file link:\n>> ')
    csv_file = csv_file.replace('file://','')
    transcriptions = open(csv_file, 'wb')

    wavDir = directory
    wav_prefix = raw_input('Enter the prefix of wav file (ex : if record.223.wav --> enter "record.") :\n>> ')
    wavs = directory+"/"+wav_prefix
    print('your wav dir is : '+directory)
    print('wave prefix name is : '+wav_prefix)
    print('transcript is here : '+textfile)
    print('you want to save CSV here : '+csv_file)
    content = len(fnmatch.filter(os.listdir(wavDir), '*.wav'))
    print('\nNumber of wav found : '+str(content)+'\n')
    for i in range(content):
        wavPath = wavs+str(i+1)+'.wav'
if __name__ == "__main__":
        print('--->  CSV passed !')
        print('\n\n --->  Bye !!\n\n')
        print('An error occured !! Check your links.')
        print('GOOD LUCK !!')

Here is the terminal result :

your wav dir is : /media/nvidia/neo_backup/DeepSpeech/data/alfred/test2/
wave prefix name is : record.
transcript is here : /media/nvidia/neo_backup/DeepSpeech/data/alfred/text2/test.txt
you want to save CSV here : /media/nvidia/neo_backup/DeepSpeech/data/alfred/text2/test_final.csv

Number of wav found : 71

—> CSV passed !

—> Bye !!


Hi Mark,

I ran into the same problem as this. Were you able to find a solution to this??

Prafful’s MacBook Pro:~ naveen$ /Users/naveen/Downloads/kenlm/build/bin/build_binary -T -s /Users/naveen/Downloads/kenlm/build/words.arpa lm.binary
Reading /Users/naveen/Downloads/kenlm/build/words.arpa

/Users/naveen/Downloads/kenlm/lm/vocab.cc:305 in void lm::ngram::MissingSentenceMarker(const lm::ngram::Config &, const char *) threw SpecialWordMissingException.
The ARPA file is missing and the model is configured to reject these models. Run build_binary -s to disable this check. Byte: 191298

(Vincent Foucault) #51

How did you record your arpa ?
/bin/bin/./lmplz --text vocabulary.txt --arpa words.arpa --o 3

(Matti Meikäläinen) #52


I have quite vague understanding what caused that error in my case. I think something related to wrong characters or wrong encoding. But I fixed the problem by filtering out from the vocabulary all characters that are not present in my alphabet.

In Python something like that:
PERMITTED_CHARS = "1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ "
new_data = “”.join(c for c in data if c in PERMITTED_CHARS)


I am trying this process on macos. I have got everything done except the trie file. When i am trying to generate the trie file, i am getting this error using the details provided:-

“cannot execute binary file”

when i searched this error, i see that its a linux file. is it so??

Can anyone help me out?

btw, this is what i am running:

/Users/naveen/generate_trie / /Users/naveen/Downloads/DeepSpeech/alphabet.txt / /Users/naveen/Downloads/DeepSpeech/lm.binary / /Users/naveen/Downloads/DeepSpeech/vocabulary.txt / /Users/naveen/Downloads/DeepSpeech/trie



yup, like this only. Finally, this got resolved when i did " Run build_binary -s to disable this check. " as suggested

(Abdelrhman D) #55

Hey, thank you for the tutorial , it’s really helpful.
I have been trying to train a french model using this data. https://datashare.is.ed.ac.uk/handle/10283/2353
i divided the data 6800 files training, 1950 dev, 976 test.
i followed all your steps, but the loss is really high and it doesn’t decrease much , it doesn’t go below 160 , and if i enabled the early stop it would stop at 46 epochs
any thoughts ?