Trie file creation

Hello,
I bet this is the simplest question on the forum, but I still cannot understand what is wrong.

I have created all the necessary files:

  1. text.txt
  2. alphabet.txt
  3. words.arpa
  4. lm.binary

I do not understand how to create the trie file. I have gone through 3-4 tutorials and still do not understand.

I was looking for the generate_trie folder in the DeepSpeech folder, but could not find it.
My directory is /home/stass/train/DeepSpeech/native_client/
My files (alphabet.txt, lm.binary, etc.) are stored in /home/stass/latvian1

I have tried to run this command:
(deepspeech-venv1) stass@stass-VirtualBox:~/train/DeepSpeech/native_client/generate_trie$ /home/stass/train/DeepSpeech/native_client/generate_trie /home/stass/latvian1/alphabet.txt /home/stass/latvian1/lm.binary /home/stass/latvian1/train.txt /home/stass/latvian1/trie

It says "No such directory", which is obvious :slight_smile: I even tried to create such a directory. Any suggestions would be appreciated.

Please state what version you are using, as in the latest release you no longer need a trie file.

But it looks like you are using 0.6x, so you will need to get the native client. Try

python3 util/taskcluster.py --target native_client

in the DeepSpeech directory.

My version is 0.6.1. I just compared the latest version on GitHub with my folder structure, and on GitHub they do not have the native folder either. I did not know that you do not need it any more.

Does that mean that after creating lm.binary I can immediately proceed to the ./DeepSpeech.py --train_files …/data/CV/en/clips/train.csv --dev_files …/data/CV/en/clips/dev.csv --test_files …/data/CV/en/clips/test.csv command? Is creating the binary even necessary in this case, and how do I use it?

Starting with 0.7x you don’t, but that is still in beta. For now I would stick to the 0.6x branch and install the native client.

So what does this command do?

python3 util/taskcluster.py --target native_client

I ran it and it downloaded something. Is this enough, or do I now need to start building native_client?

Really? Just check whether you now have the generate_trie script :slight_smile: You should.
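One way to do that check is a small sketch like the following. It is not part of DeepSpeech: the helper name describe_tool is hypothetical, and the path is simply the one quoted earlier in the thread. It reports whether generate_trie is missing, is a directory (for example one created by hand), lacks the executable bit, or looks usable.

```python
import os


def describe_tool(path):
    """Return a short diagnosis of what sits at `path` (hypothetical helper)."""
    if not os.path.exists(path):
        return "missing"
    if os.path.isdir(path):
        # A directory with the tool's name cannot be executed.
        return "directory"
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"


if __name__ == "__main__":
    # Path taken from the thread; adjust it to your own checkout.
    tool = "/home/stass/train/DeepSpeech/native_client/generate_trie"
    print(tool, "->", describe_tool(tool))
```

If the result is "directory", the hand-made folder is sitting where the downloaded binary is expected. If it is "missing", the download did not include the tool for that version.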

Hello, no, it created the native_client folder without generate_trie. I have tried 2 times.

generate_trie is not required anymore, so verify how you call taskcluster.py (read --help please) and ensure you have a matching version. It should be the default behavior, but …

@lissyx I am currently using DeepSpeech v0.7.4 and have finished creating lm.binary. What is the next step? As you mentioned, we don't need generate_trie anymore, so what should I do next? Please explain it clearly.

I can find the file taskcluster.py

We have docs written for that, please read them before asking for help.

@lissyx can you give me a reference or link to the docs?

https://deepspeech.readthedocs.io/en/v0.8.0/Scorer.html
