Earlier I was working on macOS, where it failed; then here I got the above error again. Now I have switched back to macOS and successfully generated the trie.
Also, I would like to bring to your notice that this command:
“python3 util/taskcluster.py --arch osx --target .” (macOS-specific command)
is not mentioned on this page in the “training” section:
, instead it is mentioned somewhere earlier, which makes it kind of confusing and made me believe that “python3 util/taskcluster.py --target .” is the default command to download pre-built binaries for any OS.
I am mentioning this because I got stuck earlier on macOS with the error “cannot execute binary file”, since I had downloaded the binaries using “python3 util/taskcluster.py --target .”
I hope that will be useful/helpful to someone in future.
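For anyone hitting the same thing, here is a minimal sketch of choosing the download command per OS. It only uses the two command lines quoted above; the helper name taskcluster_command is made up for illustration, and it assumes that anything other than macOS should use the plain “--target .” form.

```python
import platform


def taskcluster_command(system_name=None, target="."):
    """Build the util/taskcluster.py command line for the given OS name.

    system_name follows platform.system() ("Darwin" on macOS, "Linux", ...).
    Hypothetical helper: only the two invocations quoted in this thread
    are covered.
    """
    system_name = system_name or platform.system()
    cmd = ["python3", "util/taskcluster.py"]
    if system_name == "Darwin":
        # macOS needs the osx binaries, otherwise the downloaded files
        # fail with "cannot execute binary file".
        cmd += ["--arch", "osx"]
    cmd += ["--target", target]
    return cmd


if __name__ == "__main__":
    print(" ".join(taskcluster_command()))
```

Running it on a Mac prints the “--arch osx” variant; on other systems it prints the default one.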
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
Two things:
Your CPU seems to lack AVX, according to Intel’s website. So Linux or macOS, same deal.
python util/taskcluster.py --help should document the usage for you, and thus tell you about osx.
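A quick way to double-check the AVX point locally is to look at the CPU feature list the OS reports. The sketch below only parses text you hand to it; where that text comes from is an assumption (typically the “flags” line of /proc/cpuinfo on Linux, or the output of sysctl -n machdep.cpu.features on an Intel Mac), and the function name has_avx is made up for illustration.

```python
import re


def has_avx(cpu_features_text):
    """Return True if any whitespace-separated feature name starts with 'avx'.

    Matching is case-insensitive, so 'avx', 'avx2' and 'AVX1.0' all count.
    """
    tokens = re.split(r"\s+", cpu_features_text.lower())
    return any(token.startswith("avx") for token in tokens)


if __name__ == "__main__":
    # Hypothetical feature string, not taken from a real machine.
    print(has_avx("fpu sse sse2 sse4_2"))
```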
Hey @lissyx, I tried to check for a mismatch between the text transcriptions and the alphabet, but I was not able to resolve this.
I am using the alphabet.txt file which is used for the DeepSpeech pre-trained model, as my text transcripts only contain English letters (a to z).
Further, regarding my text transcriptions, this is a 4-row snippet of how my different CSV files look (the 1st row starts with wav_filename, the next 3 rows start with /Users):
wav_filename,wav_filesize,transcript
/Users/naveen/Downloads/DeepSpeech/TEST/engtext_3488.wav,253470,hit by the stone the kite released its prey and the mouse at once ran to the sage asking him for protection
/Users/naveen/Downloads/DeepSpeech/TEST/engtext_3489.wav,202702,the kite addressed sage and said sage you have hit me with a stone which is not proper
/Users/naveen/Downloads/DeepSpeech/TEST/engtext_3490.wav,167212,are you not afraid of god surrender that mouse to me or you will go to hell
lissyx
Just a snippet is not useful. Try to instrument util/text.py to provide more context on the source of the error. It seems you lack “3” in the alphabet, according to the stack trace. And we have no number at all in data/alphabet.txt.
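As a rough sketch of that kind of check, done outside util/text.py: the script below lists every CSV row whose transcript contains a character missing from the alphabet. It assumes the alphabet file has one character per line, that lines starting with “#” are comments, and that “\#” stands for a literal hash; the function names and the sample data are made up for illustration.

```python
import csv
import io


def load_alphabet(lines):
    """Collect the allowed characters from alphabet.txt-style lines."""
    alphabet = set()
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("\\#"):
            alphabet.add("#")      # escaped hash is a real character (assumption)
        elif line.startswith("#"):
            continue               # comment line
        elif line != "":
            alphabet.add(line)     # includes the single-space line
    return alphabet


def find_unknown_characters(csv_file, alphabet):
    """Return (wav_filename, row_number, unknown_chars) for each bad row."""
    problems = []
    reader = csv.DictReader(csv_file)
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        unknown = sorted(set(row["transcript"]) - alphabet)
        if unknown:
            problems.append((row["wav_filename"], row_number, unknown))
    return problems


if __name__ == "__main__":
    # In-memory stand-ins for alphabet.txt and a training CSV.
    alphabet_lines = ["# comment\n", " \n"] + [c + "\n" for c in "abcdefghijklmnopqrstuvwxyz"] + ["'\n"]
    sample_csv = io.StringIO(
        "wav_filename,wav_filesize,transcript\n"
        "a.wav,100,hit by the stone\n"
        "b.wav,200,chapter 3 begins\n"
    )
    for problem in find_unknown_characters(sample_csv, load_alphabet(alphabet_lines)):
        print(problem)
```

To run it on real data, pass open file objects for alphabet.txt and each of the train, dev and test CSV files instead of the in-memory samples.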
Does it imply that any of my input files (text transcriptions, binary, trie file) might contain “3” or other numbers along with letters?
lissyx
That’s my guess, you have some transcription used during learning that seems to have a “3” somewhere. We have an issue open to help mitigate that, but nobody picked it up so far https://github.com/mozilla/DeepSpeech/issues/1107
Okay. I will be glad to do it. Just suggest a flow/process/format for creating the tooling that will be easy to follow.
lissyx
Well, whatever you can do is already much better than what we have: nothing :). I have to admit I have not really thought it through, so I don’t have a hard opinion on that.
Also, my model for the Indian-accent datasets finally got created, but this is the error I am getting while trying to run an inference. I am not able to resolve this.
lissyx
Check on the forum, you’re not the first one, this is because you trained with TensorFlow > r1.4 and you are using binaries that are r1.4 (like deepspeech v0.1.1)
To simplify:
If I use “util/taskcluster.py”, then where exactly do I specify my folder path to the “downloads”?
I am just not able to understand which command to use.
About the third part:
I think that for trie file creation, “generate_trie” is used as the first argument, and it is located inside the native_client folder. Also, native_client has the KenLM decoder; that’s why I think the model might have to be trained again.
Also,
can I simply unzip the “public/native_client.tar.xz” file instead of installing?
lissyx
Please make an effort and read what I am telling you.
This is exactly what I told you above, util/taskcluster.py will take care of downloading (by default) binaries from the latest master and extract content of native_client.tar.xz.
Sorry. I am really trying hard to understand what you told me, and I appreciate your help a lot.
My doubt is that while setting everything up, I had already used “util/taskcluster.py” as specified on the DeepSpeech GitHub page, with this command:
python3 util/taskcluster.py --arch osx --target .
and then I successfully trained my own model, but while running inference I got an error due to the mismatch between the binaries and the TensorFlow version, as I told you.
So what exactly do I do now? What I understand is that if I have to use “util/taskcluster.py” again, then I must change this command (python3 util/taskcluster.py --arch osx --target .) somehow so that it downloads and extracts the latest binaries.