Can we use DeepSpeech for Vietnamese speech to text?

@jaredoptimus1 did you fix that?
I saw issue https://github.com/mozilla/DeepSpeech/issues/1107, but there is no solution in there.

At the risk of being repetitive: this issue is just a mismatch between the characters used in the training data and those in the alphabet. You just need to add the missing ones.

@lissyx
For example: ma, mà, má, mả, mã, mạ
I think these marks (`, ~, ?, .) can’t stand alone; they always have to be attached to a letter, right?
I think so.

As long as you add those combined characters using proper Unicode, it should work. Make sure you use the same Unicode representation in both the alphabet and the transcriptions. If you still get an “invalid label” error after that, it means something might be bogus in our UTF-8 handling, and we will need more information on how to reproduce it.
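To illustrate what “the same Unicode representation” means: a Vietnamese letter such as “à” can be stored either as one precomposed code point or as a base letter followed by a combining accent, and the two do not compare equal even though they look identical. Below is a minimal Python sketch using the standard `unicodedata` module; normalizing everything to NFC is one reasonable convention assumed here, not something DeepSpeech itself mandates.

```python
import unicodedata
from typing import List


def to_nfc(text: str) -> str:
    """Return text in Unicode NFC form (precomposed characters where possible)."""
    return unicodedata.normalize("NFC", text)


def codepoints(text: str) -> List[str]:
    """List the code points of text, e.g. ['U+006D', 'U+00E0']."""
    return [f"U+{ord(ch):04X}" for ch in text]


composed = "m\u00e0"      # 'à' as a single code point (U+00E0)
decomposed = "ma\u0300"   # 'a' followed by COMBINING GRAVE ACCENT (U+0300)

print(composed == decomposed)                  # False: different code points
print(codepoints(composed))                    # ['U+006D', 'U+00E0']
print(codepoints(decomposed))                  # ['U+006D', 'U+0061', 'U+0300']
print(to_nfc(composed) == to_nfc(decomposed))  # True once both are normalized
```

Passing the alphabet, the transcripts and the vocabulary through the same normalization before training or trie generation should remove this particular kind of mismatch.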

@lissyx
I saved all files as UTF-8, but I still get the invalid label error.

You need to find where it comes from: you have some transcription containing characters that are not in the alphabet :). There’s nothing more we can do for now.
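Rather than guessing, you can compare every transcript against the alphabet directly. This is a minimal Python sketch, assuming the usual DeepSpeech layouts: alphabet.txt holds one character per line with lines starting with `#` treated as comments, and the train CSV has the columns `wav_filename,wav_filesize,transcript`. The function names are hypothetical helpers, not part of DeepSpeech.

```python
import csv
from typing import Iterable, List, Set, Tuple


def load_alphabet(lines: Iterable[str]) -> Set[str]:
    """Parse alphabet.txt content into a set of characters.

    Assumes one character per line and that '#' at line start marks a
    comment. Only the line ending is stripped, so a line holding a single
    space still yields the space character.
    """
    alphabet = set()
    for line in lines:
        entry = line.rstrip("\r\n")
        if entry and not entry.startswith("#"):
            alphabet.add(entry)
    return alphabet


def find_bad_rows(csv_lines: Iterable[str], alphabet: Set[str]) -> List[Tuple[int, str, List[str]]]:
    """Return (row_number, wav_filename, missing_chars) for every bad transcript.

    row_number counts the header as row 1.
    """
    bad = []
    reader = csv.DictReader(csv_lines)
    for row_number, row in enumerate(reader, start=2):
        missing = sorted(set(row["transcript"]) - alphabet)
        if missing:
            bad.append((row_number, row["wav_filename"], missing))
    return bad


def report(alphabet_path: str, csv_path: str) -> None:
    """Print each bad row, showing missing characters as code points."""
    with open(alphabet_path, encoding="utf-8") as f:
        alphabet = load_alphabet(f)
    with open(csv_path, encoding="utf-8", newline="") as f:
        for row_number, wav, missing in find_bad_rows(f, alphabet):
            print(row_number, wav, [f"U+{ord(ch):04X}" for ch in missing])
```

For example, `report("data/alphabet.txt", "data/train.csv")` (paths hypothetical). Printing code points rather than the characters themselves matters here, because a stray combining accent such as U+0300 looks the same on screen as a precomposed letter.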

Yeah, thank you for your help.
For example, on Windows: mà
and on Linux: ma`
Maybe that’s the error :smiley:
I’m trying to fix this. Thanks for your support.

The best I can suggest for this is a simple binary search: open your train CSV (if the error happens during training), keep the header line, remove the first half of the remaining lines, and re-run. If it works, the first half contains the offending character; if not, it’s in the second half. Either way, repeat the process on the half that contains it, removing half of it each time, until you get down to ONE line :slight_smile:
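The halving procedure above can be sketched in Python as follows. `fails` is a hypothetical stand-in for “re-run with only these lines and see whether the invalid label error comes back”; in practice each call is a full run of the training or trie tool, so N lines take roughly log2(N) re-runs (about 17 for 100,000 lines). The sketch assumes the error is caused by individual lines, and it works on data lines only, so the CSV header would have to be written back in front of every trial subset.

```python
from typing import Callable, Sequence


def bisect_bad_line(lines: Sequence[str], fails: Callable[[Sequence[str]], bool]) -> str:
    """Narrow the input down to one line that makes `fails` return True.

    Assumes fails(subset) is True exactly when the subset contains at least
    one bad line.
    """
    if not fails(lines):
        raise ValueError("the full input does not fail")
    lo, hi = 0, len(lines)  # invariant: lines[lo:hi] contains a bad line
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(lines[mid:hi]):  # re-run with the first half removed
            lo = mid              # still failing: culprit is in the second half
        else:
            hi = mid              # now works: culprit was in the first half
    return lines[lo]
```

With a toy predicate such as `lambda rows: any("\u0300" in r for r in rows)`, this returns the first row containing a combining grave accent; with real data, the predicate is whatever re-run you are bisecting.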

I know, but I cannot create the trie file, so how can I start training? I am still following the instructions :slight_smile:

Well, you don’t need the trie file for training. Worst case, you can just apply the same process to the input of generate_trie.

Does that mean I can delete this:
--lm_trie_path /home/nvidia/DeepSpeech/data/alfred/trie
Is that right?

What is this? Where is it coming from?

It comes from here:
DEEPSPEECH/bin/run-alfred.sh

I’m a bit lost about the state of your system now. When do you get the invalid label error: during training or during trie creation? And why are you trying to use a trie made for French on Vietnamese data?

The invalid label error happens during trie creation.

I did not use the French trie. How do I create a trie for Vietnamese?

We are going in circles here. You need to create it; Vincent documented how in his thread. If you hit the invalid label error during its creation, you need to find what is missing from your alphabet.

I am trying :frowning:

Hi, @phanthanhlong7695,

To create the trie file, you need three files:

  • alphabet.txt
  • lm.binary
  • vocabulary.txt

Invalid label during trie creation: it seems that you have unknown characters in your vocabulary.
A “label” is a character (a letter or a punctuation mark).
Check that all characters in your vocabulary are present in the alphabet.

If not, correct it, and restart the whole process.
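To see exactly which characters are missing, a minimal Python sketch can count every character of vocabulary.txt that is not in the alphabet and print its code point and Unicode name. It assumes the alphabet has already been read into a set of single characters (one per line of alphabet.txt, `#` comment lines skipped); `missing_chars`, `describe` and `print_missing` are hypothetical helper names.

```python
import unicodedata
from collections import Counter
from typing import Iterable, Set


def missing_chars(vocab_lines: Iterable[str], alphabet: Set[str]) -> Counter:
    """Count characters of the vocabulary that are absent from the alphabet."""
    counts = Counter()
    for line in vocab_lines:
        for ch in line.rstrip("\r\n"):  # the line ending itself is not a label
            if ch not in alphabet:
                counts[ch] += 1
    return counts


def describe(ch: str) -> str:
    """Format a character as its code point and Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<no name>')}"


def print_missing(vocab_path: str, alphabet: Set[str]) -> None:
    """Print missing characters from most to least frequent."""
    with open(vocab_path, encoding="utf-8") as f:
        for ch, n in missing_chars(f, alphabet).most_common():
            print(describe(ch), n)
```

If the output lists combining marks such as `U+0300 COMBINING GRAVE ACCENT`, the vocabulary is probably in decomposed form, and normalizing the vocabulary and alphabet to one form is likely a cleaner fix than adding bare accents to the alphabet.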

Well, I cannot do it for you, and I have a lot of other work to do. I gave you a process to find what is broken; apply it.