TUTORIAL: How I trained a specific French model to control my robot

rashed.genuity · August 4, 2019, 4:02am

Can anyone enlighten me? I am stuck on this error: Fatal Python Error: Segmentation fault.
I am using a virtual environment and ran DeepSpeech using the .sh file below.

This is my error log,

This is my .sh file

dipfcl · August 22, 2019, 1:13pm

I’m taking advantage of some online books in Spanish, which contain accented letters (áéíóú). Should I include them in alphabet.txt, or must I convert them to unaccented letters in the text? Also, should I make all the text lowercase and strip the extra symbols (-_¡!¿? …) in the process? Thanks in advance.

carlfm01 · August 22, 2019, 10:44pm

@dipfcl

FYI: Here’s my alphabet for Spanish:

a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
ü
á
é
í
ó
ú
ñ

alphabet.zip (221 Bytes)

Note that I removed the #; using the # fails with the trie.

carlfm01 · August 22, 2019, 10:45pm

From the public domain? I’m also training for Spanish.

elpimous_robot · August 23, 2019, 7:49am

Hi @carlfm01
Yes, you need to keep the accented letters.
You also need to preprocess your text to make it lowercase.
M and m are not the same letter for the DeepSpeech system, and mixing cases would make it nearly impossible for it to produce correct inferences…
Have a nice day
Vincent
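
For anyone who wants to script this preprocessing, here is a minimal sketch of the lowercase-and-clean step. It assumes the Spanish alphabet posted above plus a space character; `ALPHABET` and `normalize_transcript` are names I made up for illustration, and replacing unknown characters with a space is just one possible choice.

```python
import re
import unicodedata

# Assumption: the Spanish alphabet posted earlier in this thread, plus a space.
ALPHABET = set("abcdefghijklmnopqrstuvwxyzüáéíóúñ ")

def normalize_transcript(text: str) -> str:
    """Lowercase the text, keep only ALPHABET characters, collapse spaces."""
    # NFC makes sure accented letters are single code points (e.g. "é"),
    # so they can match the entries in ALPHABET.
    lowered = unicodedata.normalize("NFC", text).lower()
    # Anything outside the alphabet (punctuation, digits, symbols) becomes a space.
    kept = "".join(ch if ch in ALPHABET else " " for ch in lowered)
    return re.sub(r" +", " ", kept).strip()

print(normalize_transcript("¡Hola! ¿Cómo estás, Señor-Pérez?"))
# prints: hola cómo estás señor pérez
```

The accented letters survive, while case and the extra symbols are removed before the text reaches training.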

dipfcl · August 23, 2019, 8:21pm

Hello

@carlfm01 https://librivox.org

dipfcl · August 23, 2019, 8:25pm

Thanks for all the tips.

Your alphabet file is missing this instruction:

The last (non-comment) line needs to end with a newline.
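
A quick way to check that requirement, as a rough sketch: the helper below appends a newline only when the file's last byte is not one. `ensure_trailing_newline` is a hypothetical helper name, and the demo runs on a throwaway file rather than a real alphabet.txt.

```python
import tempfile
from pathlib import Path

def ensure_trailing_newline(path) -> bool:
    """Append a newline if the file does not already end with one.

    Returns True when the file was modified.
    """
    p = Path(path)
    data = p.read_bytes()
    if data and not data.endswith(b"\n"):
        p.write_bytes(data + b"\n")
        return True
    return False

# Demo on a temporary file whose last line ("ñ") lacks the final newline.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "alphabet.txt"
    demo.write_bytes("a\nb\nñ".encode("utf-8"))
    fixed = ensure_trailing_newline(demo)        # file is patched
    fixed_again = ensure_trailing_newline(demo)  # nothing left to do
    print(fixed, fixed_again)
    # prints: True False
```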

carlfm01 · August 23, 2019, 8:34pm

The forum removes it, check the one inside the zip file. Maybe this is useful for you: https://www.kaggle.com/carlfm01/120h-spanish-speech/

There’s a lot of new Spanish data on OpenSLR: http://www.openslr.org/resources.php

dipfcl · August 23, 2019, 9:13pm

Hello

I’m building this one, based on the following tutorial and some Python scripts:

Creating an open speech recognition dataset for (almost) any language | by Andreas Klintberg | Medium

elpimous_robot · December 11, 2019, 11:26am

Well done @hello56445.
Could you share your model with me?

lissyx · December 11, 2019, 5:05pm

Have you tried the model I shared in Modèle français 0.3 pour DeepSpeech v0.6 ?

elpimous_robot · December 11, 2019, 5:38pm

Hi @lissyx
I’ll test it this week.
Thanks friend

lissyx · December 11, 2019, 7:51pm

Please share and contribute those

deep_learning · January 10, 2020, 10:19am

Hello,
I also have limited audio data, with 10 commands in Korean. I would like to know the hyperparameters used for training your model. Could you provide the details of your model parameters? Also, how can I create a language model with just 10 commands?

elpimous_robot · January 10, 2020, 10:36am

Hello.
For a very small model, I tested with a small n_hidden, and the results were better!

Try with n_hidden = 464
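
As a rough sketch of how that value could be passed to training: the snippet only builds and prints a command line, it does not run anything. The CSV, alphabet and checkpoint paths are placeholders I made up, and no other hyperparameters are implied; only `--n_hidden 464` comes from this thread.

```python
import shlex

def build_train_command(n_hidden: int = 464) -> list:
    """Assemble a DeepSpeech.py training command (all paths are placeholders)."""
    return [
        "python", "DeepSpeech.py",
        "--n_hidden", str(n_hidden),              # value suggested in this thread
        "--train_files", "data/train.csv",        # hypothetical path
        "--dev_files", "data/dev.csv",            # hypothetical path
        "--test_files", "data/test.csv",          # hypothetical path
        "--alphabet_config_path", "data/alphabet.txt",  # hypothetical path
        "--checkpoint_dir", "checkpoints/",       # hypothetical path
    ]

cmd = build_train_command()
# Print a shell-safe version that can be pasted into a .sh file.
print(" ".join(shlex.quote(part) for part in cmd))
```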

deep_learning · January 10, 2020, 11:01am

Thanks. What about the other parameter values? Did you use the same values as the DeepSpeech models? Could you provide more details? It would be helpful for starting training.

Sudarshan.gurav14 · February 21, 2020, 5:16am

@elpimous_robot, as you said for

TRIE CREATION:

you use

alphabet.txt
lm.binary
vocab.txt

and you create the trie, right?

Is it required to use vocab.txt for trie creation?

Also, one more question:

When creating vocab.txt, suppose some transcripts are repeated across my wave files,

e.g.:
1000.wav 23093 hello good morning [male voice]
2000.wav 32424 hello good morning [female voice]

Should I copy both sentences or not?

elpimous_robot · February 21, 2020, 6:55pm

Hello

Yes, and no!

For trie creation, you need textual sentences, to work with probabilities, accuracy…

Vocab.txt doesn’t need the same sentence multiple times.

Hope this helps.
Vincent
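
Following that advice, here is a minimal sketch of collecting the sentences for vocab.txt without duplicates. It assumes the transcripts sit in a CSV with `wav_filename,wav_filesize,transcript` columns; `unique_transcripts` is a hypothetical helper and the third sample row is invented for the demo.

```python
import csv
import io

def unique_transcripts(csv_file) -> list:
    """Return the transcripts in first-seen order, without duplicates."""
    seen = set()
    ordered = []
    for row in csv.DictReader(csv_file):
        sentence = row["transcript"].strip()
        if sentence and sentence not in seen:
            seen.add(sentence)
            ordered.append(sentence)
    return ordered

# In-memory sample mirroring the question above (3000.wav is made up).
sample = io.StringIO(
    "wav_filename,wav_filesize,transcript\n"
    "1000.wav,23093,hello good morning\n"
    "2000.wav,32424,hello good morning\n"
    "3000.wav,18000,good night\n"
)
sentences = unique_transcripts(sample)
print("\n".join(sentences))
# prints:
# hello good morning
# good night
```

The resulting list can be written to vocab.txt one sentence per line; both wave files stay in the training CSV.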

Sudarshan.gurav14 · February 22, 2020, 12:58pm

@elpimous_robot Thanks for help

elpimous_robot · February 22, 2020, 6:38pm

Ready to help again my friend
