hi @elpimous_robot, if you don't mind, can you please explain the lines below? --early_stop True --earlystop_nsteps 6 --estop_mean_thresh 0.1 --estop_std_thresh 0.1 --dropout_rate 0.22
Hello.
Early stopping and its parameters are used to limit overfitting.
The dropout rate too.
Perhaps you could look into the TensorFlow training parameters.
Have a nice day.
Vincent
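For a rough intuition of what those flags do: early stopping watches the recent validation losses and halts training once they stop improving. This is only a sketch of such a windowed rule with the thresholds from the flags above, not DeepSpeech's actual implementation:

```python
# Sketch of a windowed early-stopping rule, loosely mirroring flags like
# --earlystop_nsteps 6 --estop_mean_thresh 0.1 --estop_std_thresh 0.1.
# Illustration only, not DeepSpeech's real code.
from statistics import stdev

def should_stop(val_losses, nsteps=6, mean_thresh=0.1, std_thresh=0.1):
    """Stop when the last `nsteps` validation losses have plateaued:
    the improvement across the window is below `mean_thresh` and the
    losses barely fluctuate (std below `std_thresh`)."""
    if len(val_losses) < nsteps:
        return False
    window = val_losses[-nsteps:]
    improvement = window[0] - window[-1]  # positive if loss is falling
    return improvement < mean_thresh and stdev(window) < std_thresh

# Clearly improving run: keep training.
print(should_stop([2.0, 1.5, 1.1, 0.8, 0.6, 0.45]))          # False
# Plateaued run: stop to avoid overfitting.
print(should_stop([0.52, 0.50, 0.51, 0.50, 0.51, 0.50]))     # True
```

Dropout (--dropout_rate 0.22) attacks overfitting differently, by randomly zeroing activations during training.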
Hi, I am using DeepSpeech 0.4.1 to develop an Urdu-language ASR.
I built the language model, prepared the data, and wrote the alphabet in alphabets.txt as per the guidelines given in this post.
Now I am trying to generate the trie file, but I am getting this error:
ERROR: VectorFst::Write: Write failed:
One thing I changed was to bump the n_hidden size up to an even number (1024, based on issue #1241’s results). The first time I ran my model with an odd number, it returned a warning and the WER wasn’t great:
“BlockLSTMOp is inefficient when both batch_size and input_size are odd. You are using: batch_size=1, input_size=375”
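A trivial guard against this (my own snippet, not part of DeepSpeech) is to round an odd layer width up to the next even number before training:

```python
def even_n_hidden(n_hidden):
    """Round an odd layer width up to the next even number, since
    BlockLSTMOp warns when both batch_size and input_size are odd."""
    return n_hidden + (n_hidden % 2)

print(even_n_hidden(375))   # 376
print(even_n_hidden(1024))  # 1024 (already even, unchanged)
```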
I have a question for you. You use sentences to train your model in order to use it with commands. Do these sentences concern only your commands?
Let me explain:
I want a robot that understands 10 orders.
Do I have to train my model only on sentences containing the words involved in these orders?
Example: the order “What time is it?”
What kind of sentences do I have to have in my WAV files?
Last question: do single-word orders mean single-word sentences for training?
Thanks again for this tutorial, I hope my question is not too vague.
Well, you need 10 orders… Only?
My robot works on social interaction, so I make it learn the same order asked in different forms…
E.g.: what time is it - what is the time - could you tell me the time…
For single words like stop - yes - no…
I suggest you record only the word.
The LM works on probabilities.
If “stop” is learned inside a whole sentence (e.g. “I want you to stop now”),
the model might interpret a noise as a possible word near the “stop” it learned within that sentence (e.g. “to stop”, or “stop now”).
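To illustrate the probability point with a toy example (a sketch of bigram counting, not how KenLM actually works internally): a word buried in a sentence drags its neighbours into the model, while a standalone word does not.

```python
# Toy bigram counts over the LM training text. If "stop" only ever
# appears inside "i want you to stop now", the LM strongly expects
# "to" before it and "now" after it; trained standalone, "stop" alone
# is the likely decode. Illustration only.
from collections import Counter

def bigram_counts(corpus_lines):
    counts = Counter()
    for line in corpus_lines:
        words = ["<s>"] + line.split() + ["</s>"]
        counts.update(zip(words, words[1:]))
    return counts

full = bigram_counts(["i want you to stop now"])   # word inside a sentence
single = bigram_counts(["stop"])                   # word as its own order

print(full[("to", "stop")])     # 1 -> "to stop" becomes a likely sequence
print(single[("<s>", "stop")])  # 1 -> bare "stop" is the likely output
```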
A simple thing to keep in mind:
The more possibilities there are, the greater the risk of errors at inference.
What I would do:
For a few limited orders,
Record the same sentences, max 5 s each
Vary intonations
Vary the environment noise (rooms, walls, front, back… up, down…)
Vary the recording location (echoes can be hard to hear in the recordings, but the robot hears them).
Since it’s for a robot, record the robot’s own noise too (if it has wheels, record at different speeds… same for the Dynamixels…)…
To avoid overfitting during training, record perhaps 50 to 100 sentences per order, minimum.
Use only your few limited orders for the LM and trie build.
You should obtain a small model with good results…
I hope you have a good robot microphone.
(Mine is a ReSpeaker 4-Mic Array.)
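Putting the “use only your few limited orders” advice into practice, the LM training text can be generated directly from the order variants. A minimal sketch (the orders, variants, and file name are just examples; the resulting file is what you would then feed to your LM and trie build):

```python
# Generate the LM training corpus from a few limited orders and their
# phrasings only, per the advice above. Names here are examples.
orders = {
    "time": ["what time is it", "what is the time",
             "could you tell me the time"],
    "stop": ["stop"],
    "yes":  ["yes"],
    "no":   ["no"],
}

# One sentence per line, exactly the phrasings the robot should accept.
with open("vocabulary.txt", "w") as f:
    for variants in orders.values():
        for sentence in variants:
            f.write(sentence + "\n")

print(open("vocabulary.txt").read().splitlines())
```

Keeping this file restricted to the order phrasings is what makes the resulting model small and the decoding tightly constrained.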
Thanks for your quick reply! Lots of useful stuff in there.
The final goal is a robot (I don’t know what type yet, it’s still very much under discussion), so for now it will be an app on a tablet. 100% of the tablet’s resources will go to this app.
For now we only need 10-20 orders, yeah. Such as “Register”: the tablet knows that “Register” means “record the next 10 minutes of your CPU usage”, for example.
I thought of using a combination of my command words and what I call phoneme-words: a group of words where each one represents a particular phoneme of the French language, like [é] or [p]. The goal is for my model to know all the phonemes and get quality recognition. But what you said:
makes me doubt this method.
If I get it right, you mean recording “Register” with a lot of variation, such as noise, radio interference, emotion, distance, voice intensity…
Does that mean my LM is a unigram?
I hadn’t thought about microphone quality, since I started out using Audacity and editing my recordings in it (respecting the required characteristics: mono, 16 kHz, 16-bit), but I will.
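Those characteristics (mono, 16 kHz, 16-bit) can also be verified programmatically with Python’s standard wave module, so bad files don’t slip into the dataset. A small sketch (file names are examples):

```python
# Check a WAV file against the expected DeepSpeech input format:
# mono, 16 kHz sample rate, 16-bit samples.
import wave

def check_wav(path):
    with wave.open(path, "rb") as w:
        return (w.getnchannels() == 1 and
                w.getframerate() == 16000 and
                w.getsampwidth() == 2)   # 2 bytes = 16 bits

# Write a tiny conforming file just to demonstrate the check:
with wave.open("probe.wav", "wb") as w:
    w.setnchannels(1)
    w.setframerate(16000)
    w.setsampwidth(2)
    w.writeframes(b"\x00\x00" * 160)     # 10 ms of silence

print(check_wav("probe.wav"))  # True
```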
I forgot to say: it’s a multi-speaker app. So do the 50-100 sentence variations per order count per speaker, or is each speaker another variation? I don’t want my recognizer to be locked to specific speakers…
Thanks again, your help is very useful ! It’s a vast subject and reading your tutorial and response helps me get in the right direction
Hi, you have done a great job here. I have learned a lot of things from this thread, it’s a perfect tutorial especially for a beginner like me!
My model is working, so I decided to extend my dataset using data augmentation techniques. Specifically, I want to increase the speech speed, but when I call the voice-corpus-tool help command I can’t find the “speed” parameter you mentioned. I would appreciate it if you could give an example of the command you used in your case.
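I can’t speak for the exact voice-corpus-tool flag, but as a stop-gap, a speed change can be approximated by naively resampling the samples (note this also shifts pitch; proper tools such as sox’s tempo effect preserve it). A sketch assuming mono 16-bit WAV input, with example file names:

```python
# Naive speed perturbation by nearest-neighbour resampling: keeping the
# sample rate but dropping samples makes playback faster (pitch shifts
# too with this method). Assumes mono 16-bit PCM WAV input.
import struct
import wave

def change_speed(in_path, out_path, factor):
    with wave.open(in_path, "rb") as w:
        params = w.getparams()
        frames = w.readframes(w.getnframes())
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    # Pick every `factor`-th sample position (nearest neighbour).
    out = [samples[int(i * factor)]
           for i in range(int(len(samples) / factor))]
    with wave.open(out_path, "wb") as w:
        w.setparams(params)  # header frame count is fixed up on close
        w.writeframes(struct.pack("<%dh" % len(out), *out))

# Demo: 1 second of silence sped up by 1.25x -> 0.8 s of audio.
with wave.open("in.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(struct.pack("<%dh" % 16000, *([0] * 16000)))

change_speed("in.wav", "fast.wav", 1.25)
with wave.open("fast.wav", "rb") as w:
    print(w.getnframes())  # 12800 frames at 16 kHz = 0.8 s
```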
Could you tell us more about your use case, your data, the different steps, and the difficulties you encountered? It might help people if you share this information.