TUTORIAL: How I trained a specific French model to control my robot

Yep.
Download the complete vocabulary file of the latest DeepSpeech model,
add your own sentences, and build the LM.
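
Roughly, the KenLM side of that looks like this (just a sketch; the file names are placeholders, and vocabulary.txt would be the downloaded vocabulary plus your own sentences, one per line):

lmplz -o 3 < vocabulary.txt > words.arpa
build_binary words.arpa lm.binary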

But, are you sure that your words aren't already in the model?

An easy way: record the needed sentences with a good online US text-to-speech,
convert them to 16 kHz mono, and test the model…
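
The conversion and a quick test could look like this (a sketch; file names are placeholders, and the client flags are the 0.4.x-era ones used elsewhere in this thread):

sox tts_sentence.wav -r 16000 -c 1 -b 16 tts_sentence_16k.wav
deepspeech --model output_graph.pb --alphabet alphabet.txt --lm lm.binary --trie trie --audio tts_sentence_16k.wav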

I did it for some tests, and it works perfectly.

Hope it will help.

Okay, I will try. Can you please share the link where I can get the vocabulary file of the latest model, if you know it?
Thanks

This is the issue I am facing…! Can anyone help me with this?

Hi @elpimous_robot, if you don't mind, can you please explain the lines below…
--early_stop True --earlystop_nsteps 6 --estop_mean_thresh 0.1 --estop_std_thresh 0.1 --dropout_rate 0.22

Hello.
Early stop and its parameters are used to limit overfitting.
The dropout rate does too.
Perhaps you could look into the TensorFlow training parameters.
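
As a rough sketch, those flags just go on the training command line (the CSV paths here are placeholders; the flag names are the ones you quoted, as used by DeepSpeech around 0.4.x):

python -u DeepSpeech.py \
  --train_files data/train.csv --dev_files data/dev.csv --test_files data/test.csv \
  --early_stop True --earlystop_nsteps 6 --estop_mean_thresh 0.1 --estop_std_thresh 0.1 \
  --dropout_rate 0.22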
Have a nice day.
Vincent

yeah … Thank you…

Hi, I am using DeepSpeech 0.4.1 to develop an Urdu-language ASR.
I developed the language model, prepared the data, and wrote the alphabet in alphabets.txt as per the guidelines given in this post.
Now I am trying to generate the trie file, but I am getting this error.
ERROR: VectorFst::Write: Write failed:

Please help. Thank you so much!

Could you give a bit more info on how you’ve attempted to generate the trie? For example the command line and arguments you ran.

/home/rc/Desktop/0.4.1/DeepSpeech-master/native-client-U/generate_trie //home/rc/Desktop/0.4.1/DeepSpeech-master/data/alphabet.txt //home/rc/Desktop/0.4.1/DeepSpeech-master/data/lm/lm.binary //home/rc/Desktop/0.4.1/DeepSpeech/data/trie

I am following this tutorial to generate the trie file.

One thing I changed was to bump the n_hidden size up to an even number (1024, based on issue #1241's results). The first time I ran my model with an odd number, it returned a warning and the WER wasn't great:
“BlockLSTMOp is inefficient when both batch_size and input_size are odd. You are using: batch_size=1, input_size=375”
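
For reference, that just means passing an even value for the flag on the training command, something like this (the CSV paths are placeholders):

python -u DeepSpeech.py --train_files data/train.csv --dev_files data/dev.csv --test_files data/test.csv --n_hidden 1024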

How to create lm/trie ?

Hi @yogesha,
Your first post… perhaps you could start with a simple "hello…":wink:

Welcome to this Discourse section.

Have a look at your DeepSpeech directory:
Deepspeech/bin/lm
In the files there, you'll find the commands for LM creation.

You'll also need to install the KenLM files! (See the beginning of the tutorial.)
Have a nice day YogeshA.
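
Once lm.binary is built with KenLM, the trie comes from the native client tool; roughly like this (a sketch with placeholder paths, matching the 0.4.x usage quoted earlier in this thread):

./generate_trie alphabet.txt lm.binary trie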

Hi, thanks for your tutorial. I have a little question: how can I compile this? I am not an expert in this topic.

Hi.
Yes, you need to compile the KenLM libs…

For a DeepSpeech native client compilation, if needed, see the README.md in native_client.
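
KenLM's standard CMake build goes roughly like this (a sketch; check the KenLM README for the required dependencies such as Boost):

git clone https://github.com/kpu/kenlm.git
cd kenlm
mkdir -p build && cd build
cmake ..
make -j 4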

That is indeed very informative in terms of coding up the robots


Hello @elpimous_robot

Thanks a lot for your tutorial, it really helps :slight_smile:

I have a question for you. You use sentences to train your model in order to use it with commands. Do these sentences concern only your commands?
Let me explain:
I want a robot that understands 10 orders.
Do I have to train my model only with sentences containing the words concerned by these orders?
Example: the order “What time is it?”
What kind of sentences do I have to have in my wav files?

Last question: do single-word orders mean single-word sentences for training?

Thanks again for this tutorial, I hope my question is not too vague :wink:


Hi.
What robot do you have??

Well, you need 10 orders… Only?
My robot works on a social approach, so I make it learn the same order asked in different forms…
Ex : what time is it - what is the time - could you tell me the time…

For single words like stop - yes - no…
I'd suggest you record only the word.
The LM works with probabilities.
If “stop” is learnt inside a whole sentence (ex: I want you to stop now),
the model might interpret a noise as a possible word near the “stop” learnt in the sentence (ex: “to stop”, or “stop now”).

A simple thing to keep in mind:
the more possibilities there are, the greater the risk of errors at inference.

What I'd do:
For a few limited orders,
Record the same sentences, max 5 s
Vary intonations
Vary the environment noise (rooms, walls, front, back… up, down…)
Vary the recording location (echoes are difficult to hear in the recordings, but heard by the robot).
For robot use, record the robot noise too (if wheels, record at different speeds… same for the Dynamixels…)…
To avoid overfitting during training, record roughly 50 to 100 sentences per order, minimum.

Use only your few limited orders for the LM and trie build.
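
For such a tiny, dedicated corpus, the LM build could look roughly like this (a sketch; orders.txt is a placeholder file with one order variant per line, and --discount_fallback lets lmplz cope with very little text):

lmplz -o 2 --discount_fallback < orders.txt > orders.arpa
build_binary orders.arpa lm.binary

Then build the trie from this lm.binary as before.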

You should obtain a small model with good results…

Hope that you have a good robot microphone.
(mine is a respeaker 4 mic array)

Hope it will help you…

PS : hi to everyone…:wave:

Vincent


Hi,

Thanks for your quick reply! Lots of useful stuff in there.

The final goal is for a robot (we don't know what type yet, it's still pretty much under discussion), so for now it'll be an app on a tablet. 100% of the tablet's resources will be for this app.

For now, we only need 10-20 orders, yeah. Such as “Register”: the tablet knows that “register” means “register the next 10 min of your CPU usage”, for example.

I thought of using a combination of my command words and what I call phoneme-words. It's a group of words where each of them represents a particular phoneme of the French language, like [é] or [p]. The goal is for my model to know all the phonemes and give quality recognition. But what you said:

makes me doubt this method.

If I get it right, you mean recording “Register” with a lot of variation, such as noise, radio noise, emotion, distance, intensity of the voice…

Does that mean my LM is a unigram?

I hadn't thought of the microphone quality, as I started out using Audacity and editing my recordings with it (respecting the required characteristics: mono, 16 kHz and 16 bits), but I will :slight_smile:

I forgot to say, it's a multi-speaker app. So can I count the 50-100 sentence variations per order per speaker, or is each speaker another variation? I don't want my recognizer to be locked to a single speaker…

Thanks again, your help is very useful! It's a vast subject, and reading your tutorial and responses helps me head in the right direction :smiley:

Hi, you have done a great job here. I have learned a lot of things from this thread; it's a perfect tutorial, especially for a beginner like me!
My model is working, so I decided to extend my dataset using data augmentation techniques. Specifically, I want to increase the speech speed, but when I call the voice-corpus-tool help command I can't find the “speed” parameter you mentioned. I would appreciate it if you could give an example of the command you used in your case.

Hey @ctzogka,

Could you tell us more about your use case, your data, and the different steps and difficulties you encountered? It might help people if you share this information :slight_smile:

Thanks !