TUTORIAL: How I trained a specific French model to control my robot

Can anyone enlighten me? I am stuck on this error: Fatal Python Error: Segmentation fault.
I am using a virtual environment and ran DeepSpeech using the .sh file below.

This is my error log:

This is my .sh file:

I'm making use of some online books in Spanish, which contain accented letters (áéíóú). Should I include them in alphabet.txt, or must I convert them to unaccented letters in the text? Finally, should I make all letters lowercase in the text and, in the process, remove all the extra symbols (-_¡!¿? …)? Thanks in advance :wink:

@dipfcl

FYI: Here’s my alphabet for Spanish:

a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
ü
á
é
í
ó
ú
ñ

alphabet.zip (221 Bytes)

Note that I removed the #; using the # fails with the trie.
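
For anyone building a similar file for another corpus, here is a minimal sketch of deriving the character set from the transcripts, so that every character in the text (accented letters included) ends up in the alphabet. The function name and the two sample sentences are made up for illustration; this is not part of DeepSpeech itself.

```python
def alphabet_from_transcripts(transcripts):
    """Collect every distinct character used in the (lowercased) transcripts."""
    chars = set()
    for line in transcripts:
        chars.update(line.lower())
    # Line breaks separate sentences; they are not alphabet entries.
    chars.discard("\n")
    # Sorted by code point, so accented letters come after a-z.
    return sorted(chars)


# Hypothetical sample sentences, only to show the idea.
sample = ["El niño comió", "pingüino"]
for ch in alphabet_from_transcripts(sample):
    print(repr(ch))
```

Note that the space character shows up in the result too, which is why the sketch prints each entry with `repr`.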

From the public domain? I’m also training for Spanish.

Hi @carlfm01
Yes, you need to keep accented letters.
And you need to preprocess your text, to make it lowercase.
M and m are not the same letter for the DeepSpeech system, and with mixed case it would be nearly impossible for it to produce correct inferences…
Have a nice day
Vincent
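
The advice above (keep accents, lowercase everything, drop the extra symbols) can be sketched as a small preprocessing function. This is only a sketch: the allowed character set is assumed to be the Spanish alphabet posted earlier in the thread plus the space character, and the function name is made up for illustration.

```python
import re
import unicodedata

# Assumption: the Spanish alphabet posted earlier in this thread, plus space.
ALLOWED = set("abcdefghijklmnopqrstuvwxyzüáéíóúñ ")


def normalize_transcript(text):
    """Lowercase the text, keep accented letters, drop all other symbols."""
    # NFC makes sure "e" + combining accent becomes the single letter "é".
    text = unicodedata.normalize("NFC", text).lower()
    # Anything outside the alphabet (punctuation, digits, ...) becomes a space.
    text = "".join(ch if ch in ALLOWED else " " for ch in text)
    # Collapse runs of whitespace left behind by the removed symbols.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_transcript("¡Hola! ¿Cómo estás, Señor Pingüino?"))
# prints: hola cómo estás señor pingüino
```

Digits are simply dropped here; spelling numbers out as words would need a separate step.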

Hello

@carlfm01 https://librivox.org

Thanks for all the tips

Your alphabet file is missing this instruction:

The last (non-comment) line needs to end with a newline.

The forum removes it; check the one inside the zip file. Maybe this is useful for you: https://www.kaggle.com/carlfm01/120h-spanish-speech/
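
Since copy-pasting through the forum can lose that final newline, here is a minimal sketch of writing the alphabet file from a script and checking how it ends. The helper names and the short sample alphabet are assumptions for illustration only.

```python
import os
import tempfile


def write_alphabet(path, characters):
    """Write one character per line; every line, including the last, ends with a newline."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for ch in characters:
            f.write(ch + "\n")


def last_line_ends_with_newline(path):
    """Return True if the file's final byte is a newline."""
    with open(path, "rb") as f:
        return f.read().endswith(b"\n")


# Demo in a temporary directory, with a shortened sample alphabet.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "alphabet.txt")
    write_alphabet(path, "abcñ")
    print(last_line_ends_with_newline(path))
```

The same check can be run on an existing alphabet.txt before creating the trie.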

There’s a lot of new Spanish data on OpenSLR: http://www.openslr.org/resources.php

Hello

I’m doing this one … based on the following tutorial and some Python scripts :smile:

Creating an open speech recognition dataset for (almost) any language | by Andreas Klintberg | Medium

Well done @hello56445.
Could you share your model with me? :wink:

Have you tried the model I shared in Modèle français 0.3 pour DeepSpeech v0.6?

Hi @lissyx
I’ll test it this week.
Thanks friend

Please share and contribute those

Hello,
I also have limited audio data, with 10 commands in Korean. I would like to know the hyperparameters used for training your model. Could you provide the details of your model parameters? And how can I create a language model with just 10 commands?

Hello.
For a very small model, I tested with a small n_hidden, and the results were better!

Try with n_hidden = 464

Thanks. What about the other parameter values? Did you use the same values as the DeepSpeech models for everything else? Could you provide more details? It would be helpful for starting training.

@elpimous_robot as you said, for

TRIE CREATION:

you use

alphabet.txt
lm.binary
vocab.txt

and you create the trie, right?

Is it required to use vocab.txt for trie creation?

Also, one more question.

When creating vocab.txt, suppose some transcripts are repeated across my wav files,

e.g.:
1000.wav 23093 hello good morning [male voice]
2000.wav 32424 hello good morning [female voice]

Should I copy both sentences or not?

Hello

Yes and no!

For trie creation, you need textual sentences, to work with probabilities, accuracy…

vocab.txt doesn’t need the same sentence multiple times.

Hope this helps.
Vincent
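
As a minimal sketch of that last point, repeated transcripts can be collapsed when building vocab.txt from a training CSV. The column names (`wav_filename`, `wav_filesize`, `transcript`) are assumed here, the third sample row is hypothetical, and the function name is made up for illustration.

```python
import csv
import io


def unique_sentences(csv_file):
    """Return the transcripts in first-seen order, each one only once."""
    seen = {}
    for row in csv.DictReader(csv_file):
        sentence = row["transcript"].strip()
        if sentence:
            # dict keys keep insertion order and ignore repeats
            seen.setdefault(sentence, None)
    return list(seen)


# In-memory sample: the two repeated rows from the question, plus one hypothetical row.
sample = io.StringIO(
    "wav_filename,wav_filesize,transcript\n"
    "1000.wav,23093,hello good morning\n"
    "2000.wav,32424,hello good morning\n"
    "3000.wav,18000,good night\n"
)
sentences = unique_sentences(sample)

# One sentence per line, ending with a newline, ready to be saved as vocab.txt.
vocab_text = "\n".join(sentences) + "\n"
print(vocab_text, end="")
```

Both audio files stay in the training CSV; only the text used for the language model is deduplicated.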

@elpimous_robot Thanks for the help

Ready to help again my friend