TUTORIAL: How I trained a specific French model to control my robot


Yes and no!

To create the trie, you need full text sentences, so that the probabilities (and therefore the accuracy) can be worked out…

vocab.txt doesn’t need the same sentence more than once.
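If your sentence list contains repeats, something like this could strip them before you use it as vocab.txt. This is only a sketch: the file names and the sample sentences are my own assumptions, not taken from the tutorial.

```shell
# Hypothetical file names; the sample sentences are made up for illustration.
work=$(mktemp -d)
printf '%s\n' "avance" "tourne a gauche" "avance" > "$work/corpus.txt"

# Keep the first occurrence of each sentence, preserving the original order.
awk '!seen[$0]++' "$work/corpus.txt" > "$work/vocab.txt"

cat "$work/vocab.txt"
```

Using awk here instead of sort -u keeps the sentences in their original order, which makes the result easier to check by eye.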

Hope this helps.

@elpimous_robot Thanks for the help


Ready to help again my friend

@elpimous_robot :blush:

@elpimous_robot can you help me with the issue below?

@victornoriega7 he is also helping me with this issue

Hello deep_learning:
Have you trained a Korean model successfully? I am training a Korean model, but I am running into a lot of trouble. Could you please share some of your experience with me? How did you train it? Thanks!

The way you explained the steps of training the model is amazing, I believe you helped a lot of people with this post, including me. I am a newbie regarding this whole concept and I’m stuck at the very first point of implementing a model for my native language. Was the CSV file containing the wav_name, wav_size and transcript created manually, or with a script that gathers this data? I have around 1500 .wav files and I’m a little concerned about creating the CSV manually (checking the size of each .wav, etc.). If someone has written a script to avoid all this struggle and would like to share it… Thank you! :smiley:

I did something like…

for i in *.wav; do
  fs=$(stat --printf="%s" "$i")  # file size in bytes (GNU stat)
  ts=$(cat "$i.txt")  # assumes the transcript for file.wav is in file.wav.txt
  echo "$i,$fs,$ts" >> mydata.csv
done
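For anyone who wants a slightly more defensive version of the loop above, here is a rough sketch wrapped in a function. The function name make_csv is made up for this post, and I am assuming the header row wav_filename,wav_filesize,transcript that DeepSpeech training CSVs use, plus the same file.wav.txt transcript convention as above. It uses wc -c instead of GNU stat so it also runs where stat --printf is not available.

```shell
# Sketch: build a DeepSpeech-style CSV from a directory of WAV files.
# make_csv is a hypothetical helper name, not part of DeepSpeech.
make_csv() {
  dir=$1
  out=$2
  # Header row (assumed column names for DeepSpeech training CSVs).
  echo "wav_filename,wav_filesize,transcript" > "$out"
  for wav in "$dir"/*.wav; do
    [ -f "$wav.txt" ] || continue         # skip clips without a transcript
    size=$(wc -c < "$wav" | tr -d ' ')    # file size in bytes
    text=$(cat "$wav.txt")
    printf '%s,%s,%s\n' "$wav" "$size" "$text" >> "$out"
  done
}
```

Usage would be something like make_csv ./wavs mydata.csv. Note that a transcript containing a comma would break the CSV layout, so check your text files first.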

Training on cpu?

If you have early stop on, it will stop training when it’s not seeing loss decrease significantly. As you’re on epoch 0 still you might need to adjust your settings a bit more or, if you’re cpu training, train on a gpu.


I was looking at Mozilla’s voice-corpus-tool and I can’t see all of the effects listed above by @elpimous_robot. The only ones I can see in the GitHub repo, and when I run ‘help’, are:


Distortion by mp3 compression
kbit: int - Virtual bandwidth in kBit/s

Resampling to different sample rate
rate: int - Sample rate to apply

augment [-times ] [-gain ]
Augment samples of current buffer with noise
source: string - CSV file with samples to augment onto current sample buffer
-times: int - How often to apply the augmentation source to the sample buffer
-gain: float - How much gain (in dB) to apply to augmentation audio before overlaying onto buffer samples

How do I create the file libctc_decoder_with_kenlm.so?

I only have the lm.binary and trie files created using KenLM.

I have seen that your command to generate the trie takes 5 parameters, but now it won’t accept 5 parameters; it requires only 4, and vocab.txt is not accepted.

Please help with that.

Please make an effort to understand that this tutorial was contributed by @elpimous_robot a long time ago, that the project has moved on since then, and that a lot of the instructions are deprecated.

Can we use this method to prepare data and train DeepSpeech v0.6.0?

Hello, I have a question. I have three types of files: train, dev and test. Must the words and audio in them be the same? I mean, must the dev and test files also be included in the train files?

Please read the guidelines, it says explicitly don’t hijack old threads …

Where are those guidelines?

Really? I posted them with your first thread, that you abandoned to hijack this one, but here you go:


Please stop spamming other threads, do your homework and we are happy to help.

Hi, I have a problem:
FATAL Flags parsing error: flag --alphabet_config_path=/content/gdrive/My\ drive/deepSpeech/alphabet_rus.txt: The file pointed to by --alphabet_config_path must exist and be readable.
Pass --helpshort or --helpfull to see help on flags.

Here is my command:

!python3 DeepSpeech.py \
--drop_source_layers 1 \
--alphabet_config_path "/content/gdrive/My\ drive/deepSpeech/alphabet_rus.txt" \
--save_checkpoint_dir /content/gdrive/My\ drive/deepSpeech/savecheckpoint \
--load_checkpoint_dir /content/gdrive/My\ drive/deepSpeech/loadcheckpoint \
--train_files /content/gdrive/My\ drive/deepSpeech/train/train.csv \
--dev_files /content/gdrive/My\ drive/deepSpeech/dev/dev.csv \
--test_files /content/gdrive/My\ drive/deepSpeech/test/test.csv \

The error message explains the problem. Please read your error messages before asking for help, and stop hijacking unrelated threads. This is spamming behavior and it makes the forum much less readable for everyone.
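For anyone landing here with the same message: judging only from the command shown (I cannot check whether the file really exists on that drive), one likely cause is that the --alphabet_config_path value is both double-quoted and backslash-escaped. Inside double quotes the shell keeps the backslash, so the path that reaches DeepSpeech contains a literal "\". A small sketch of the difference, where the variable names are just for illustration:

```shell
# Inside double quotes, "\ " is NOT an escaped space: the backslash stays in the string.
quoted="/content/gdrive/My\ drive/deepSpeech/alphabet_rus.txt"

# Quote the path OR escape the space, not both.
fixed="/content/gdrive/My drive/deepSpeech/alphabet_rus.txt"

printf '%s\n' "$quoted"
printf '%s\n' "$fixed"
```

The first line printed still contains the backslash, which is the path the flag parser tried to open. The other flags in the command escape the space without quotes, which is why only this one is flagged.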