TUTORIAL: How I trained a specific French model to control my robot

Can anyone enlighten me? I am stuck on this error: Fatal Python error: Segmentation fault.
I am using a virtual environment and ran DeepSpeech using the .sh file below.

This is my error log,

This is my .sh file
shFile

I’m taking advantage of some online books in Spanish, which contain accented letters (áéíóú). Should I include them in alphabet.txt, or must I convert them to unaccented letters in the text? Finally, should I make all the text lowercase and, in the process, remove all the extra symbols (-_¡!¿? …)? Thanks in advance :wink:

@dipfcl

FYI: Here’s my alphabet for Spanish:

a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
ü
á
é
í
ó
ú
ñ

alphabet.zip (221 Bytes)

Notice that I removed the #; using the # fails with the trie.


From the public domain? I’m also training for Spanish.

Hi @carlfm01
Yes, you need to keep the accented letters.
You also need to preprocess your text to make it lowercase.
M and m are not the same letter to the DeepSpeech system, and with mixed case it would be nearly impossible for it to produce correct inferences…
Have a nice day
Vincent
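
A minimal sketch of the preprocessing described above (lowercase, keep accented letters, drop the extra symbols). The function name is hypothetical, and it assumes the alphabet is the Spanish one posted earlier in this thread plus a space as word separator; adjust ALLOWED to match your own alphabet.txt.

```python
import re
import unicodedata

# Assumption: the Spanish alphabet posted earlier in this thread, plus a space.
ALLOWED = set("abcdefghijklmnopqrstuvwxyzüáéíóúñ ")

def normalize_transcript(text: str) -> str:
    """Lowercase the text, keep accented letters, replace other symbols with spaces."""
    # NFC makes sure an accented letter is one code point (á, not a + combining accent).
    text = unicodedata.normalize("NFC", text).lower()
    # Anything outside the alphabet (¡ ! ¿ ? - _ digits ...) becomes a space.
    text = "".join(ch if ch in ALLOWED else " " for ch in text)
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()

print(normalize_transcript("¡Hola, Señor! ¿Cómo está?"))
```

Note that this replaces digits with spaces as well; if your books contain numbers, you would probably want to spell them out as words before running it.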

Hello

@carlfm01 https://librivox.org

Thanks for all the tips

Your alphabet file is missing this instruction:

The last (non-comment) line needs to end with a newline.

The forum removes it; check the one inside the zip file. Maybe this is useful for you: https://www.kaggle.com/carlfm01/120h-spanish-speech/
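
On the trailing-newline point, here is a small sketch of one way to generate the alphabet file so that every line, including the last one, ends with a newline. The helper names are hypothetical, and the letters are just the ones posted in this thread; whether you also need a line for the space character depends on your setup.

```python
# Letters posted earlier in this thread (assumption: this is the full alphabet you want).
SPANISH_LETTERS = "abcdefghijklmnopqrstuvwxyzüáéíóúñ"

def alphabet_text(chars: str) -> str:
    """Return the file content: one symbol per line, each line ending with a newline."""
    unique = list(dict.fromkeys(chars))  # drop duplicates, keep the original order
    return "".join(ch + "\n" for ch in unique)

def save_alphabet(chars: str, path: str = "alphabet.txt") -> None:
    # newline="\n" avoids writing \r\n line endings on Windows.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(alphabet_text(chars))

if __name__ == "__main__":
    save_alphabet(SPANISH_LETTERS)
```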

There’s a lot of new Spanish data on OpenSLR: http://www.openslr.org/resources.php

Hello

I’m doing this one, based on the following tutorial and some Python scripts :smile:

https://medium.com/@klintcho/creating-an-open-speech-recognition-dataset-for-almost-any-language-c532fb2bc0cf

Well done @hello56445.
Could you share your model with me? :wink:

Have you tried the model I shared in Modèle français 0.3 pour DeepSpeech v0.6?

Hi @lissyx
I’ll test it this week.
Thanks friend

Please share and contribute those

Hello,
I also have limited audio data, with 10 commands in Korean. I would like to know the hyperparameters used for training your model. Could you provide the details of your model parameters? And how can I create a language model with just 10 commands?

Hello.
For a very small model, I tested with a small n_hidden, and the results were better!

Try with n_hidden = 464

Thanks. What about the other parameter values? Did you use the same values as the DeepSpeech models? Could you provide more details? That would be helpful for starting training.

@elpimous_robot as you said for

TRIE CREATION :

you use

alphabet.txt
lm.binary
vocab.txt

and then you create the trie, right?

Is vocab.txt required for trie creation?

One more question:

When creating vocab.txt, suppose some transcripts are repeated across my WAV files,

eg:
1000.wav 23093 hello good morning [male voice]
2000.wav 32424 hello good morning [female voice]

Should I copy both sentences or not?

Hello

Yes, and no!

For trie creation, you need text sentences, so it can work with probabilities, accuracy…

vocab.txt doesn’t need the same sentence multiple times.

Hope this helps.
Vincent
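
A quick sketch of how the repeated sentences could be dropped when building vocab.txt. It assumes the transcripts are in a CSV with the wav_filename, wav_filesize, transcript columns; the function name is hypothetical.

```python
import csv
import io

def unique_transcripts(csv_file) -> list:
    """Return each non-empty transcript once, in order of first appearance."""
    reader = csv.DictReader(csv_file)
    # dict.fromkeys removes duplicates while keeping insertion order.
    seen = dict.fromkeys(row["transcript"].strip() for row in reader)
    return [t for t in seen if t]

# Same sentence recorded by two speakers, as in the question above.
sample = io.StringIO(
    "wav_filename,wav_filesize,transcript\n"
    "1000.wav,23093,hello good morning\n"
    "2000.wav,32424,hello good morning\n"
)
sentences = unique_transcripts(sample)
print(sentences)

# To write the result, one sentence per line:
# with open("vocab.txt", "w", encoding="utf-8", newline="\n") as f:
#     f.write("\n".join(sentences) + "\n")
```

With a real file, you would pass open("train.csv", encoding="utf-8", newline="") instead of the in-memory sample (train.csv being whatever your own CSV is called).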

@elpimous_robot Thanks for the help


Ready to help again, my friend