FormatLoadException error when running the build_binary command

naimish101.viradia · January 3, 2020, 4:02am

Hi Everyone,

I am trying to train a very small model first to get myself used to all the commands.

The language I am trying to train is Gujarati. Is this an issue with the Unicode characters of my language?

I am using the following tutorial:

Reference Guide:

Command:
…/…/native_client/kenlm/build/bin/build_binary -T -s words.arpa lm.binary

Error:
/DeepSpeech/native_client/kenlm/lm/model.cc:100 in void lm::ngram::detail::GenericModel<Search, VocabularyT>::InitializeFromARPA(int, const char*, const lm::ngram::Config&)
[with Search = lm::ngram::detail::HashedSearch<lm::ngram::BackoffValue>; VocabularyT = lm::ngram::ProbingVocabulary] threw FormatLoadException.
This ngram implementation assumes at least a bigram model. Byte: 20
ERROR

Thanks in advance for help.

lissyx · January 3, 2020, 3:07pm

No, you are following an old tutorial instead of the up-to-date documentation under data/lm.
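
For readers hitting the same message: KenLM raises "This ngram implementation assumes at least a bigram model" when the ARPA header declares fewer than two n-gram orders, so the file most likely contains only unigrams. The sketch below reads the `\data\` header of an ARPA file and reports the highest order. The function names and the sample header are hypothetical and only illustrate the check; they are not part of KenLM or DeepSpeech.

```python
import io
import re


def arpa_ngram_counts(lines):
    """Return {order: count} parsed from the \\data\\ header of an ARPA file."""
    counts = {}
    in_header = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_header = True
            continue
        if not in_header:
            continue
        match = re.fullmatch(r"ngram\s+(\d+)\s*=\s*(\d+)", line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
        elif line.startswith("\\"):
            # The first section marker (e.g. \1-grams:) ends the header.
            break
    return counts


def max_order(counts):
    """Highest n-gram order declared in the header, or 0 if none was found."""
    return max(counts) if counts else 0


# Hypothetical unigram-only header, similar to what would trigger the error.
sample = io.StringIO("\\data\\\nngram 1=5\n\n\\1-grams:\n")
counts = arpa_ngram_counts(sample)
if max_order(counts) < 2:
    print("Header declares only unigrams; build_binary needs order >= 2.")
```

To run it on a real file, pass an open file object, for example `arpa_ngram_counts(open("words.arpa", encoding="utf-8"))`. If the highest order is 1, regenerating the ARPA file with an order of 2 or more should avoid the exception.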

naimish101.viradia · January 4, 2020, 10:53pm

As suggested by the error message, I added “-s” and it worked fine. Is there some sort of guide where I can understand the actual reasons behind errors such as these?

The ARPA file is missing <s>, which is resolved by using -s

and

Could not calculate Kneser-Ney discounts for 3-grams with adjusted count 4 because we didn’t observe any 3-grams with adjusted count 3; Is this small or artificial data?
Try deduplicating the input. To override this error for e.g. a class-based model, rerun with --discount_fallback

Thanks
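
On the second message: modified Kneser-Ney smoothing derives its discounts from "counts of counts", that is, how many distinct n-grams were seen exactly 1, 2, 3 and 4 times. If one of those is zero, the textbook formula divides by zero, which is typical of tiny or artificial corpora. The sketch below is a simplified illustration, assuming the highest order (where the adjusted count equals the raw count) and ignoring sentence boundaries. `count_of_counts` and `modified_kn_discounts` are hypothetical helper names, not KenLM functions.

```python
from collections import Counter


def count_of_counts(tokens, order, max_count=4):
    """Return [n1, n2, ..., n_max]: how many distinct n-grams occur k times."""
    ngrams = Counter(
        tuple(tokens[i:i + order]) for i in range(len(tokens) - order + 1)
    )
    freq = Counter(ngrams.values())
    return [freq.get(k, 0) for k in range(1, max_count + 1)]


def modified_kn_discounts(n):
    """Textbook modified Kneser-Ney discounts D1, D2, D3+ from [n1, n2, n3, n4]."""
    if len(n) != 4 or 0 in n:
        raise ValueError("need non-zero n1..n4, got %r" % (n,))
    n1, n2 = n[0], n[1]
    y = n1 / (n1 + 2 * n2)
    # D_k = k - (k + 1) * Y * n_{k+1} / n_k for k = 1, 2, 3
    return [k - (k + 1) * y * n[k] / n[k - 1] for k in (1, 2, 3)]


# Tiny artificial corpus: no trigram is seen 3 or 4 times.
tokens = "a b c a b c a b d".split()
n = count_of_counts(tokens, order=3)
try:
    print(modified_kn_discounts(n))
except ValueError as err:
    print("Cannot estimate discounts:", err)
```

With more varied text, all four counts of counts become non-zero and the discounts can be estimated. The `--discount_fallback` option mentioned in the message overrides the error instead.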

naimish101.viradia · January 5, 2020, 12:03am

I am now getting this error:

Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from /DeepSpeech/dvolume/gujarati/results/checkout/train-0
I0104 23:55:28.599471 139891706021696 saver.py:1280] Restoring parameters from /DeepSpeech/dvolume/gujarati/results/checkout/train-0
I Restored variables from most recent checkpoint at /DeepSpeech/dvolume/gujarati/results/checkout/train-0, step 0
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Epoch 0 | Validation | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000 | Dataset: /DeepSpeech/dvolume/gujarati/dev/dev.csv
Traceback (most recent call last):
  File "/DeepSpeech/DeepSpeech.py", line 966, in <module>
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/DeepSpeech/DeepSpeech.py", line 939, in main
    train()
  File "/DeepSpeech/DeepSpeech.py", line 646, in train
    dev_loss = dev_loss / total_steps
ZeroDivisionError: float division by zero

Is it because of my small dataset?

lissyx · January 5, 2020, 2:49pm

Likely, yes, this is why.
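
The log above reports "Steps: 0" for both training and validation, so `total_steps` was zero when the mean loss was computed. One plausible cause, assuming incomplete batches are dropped, is a set with fewer samples than the batch size. The sketch below only illustrates that arithmetic; `steps_per_epoch` and `mean_loss` are hypothetical helpers, not DeepSpeech functions.

```python
def steps_per_epoch(num_samples, batch_size, drop_remainder=True):
    """Number of batches produced from num_samples items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if drop_remainder:
        # A final, incomplete batch is discarded.
        return num_samples // batch_size
    # Ceiling division: the final, incomplete batch is kept.
    return -(-num_samples // batch_size)


def mean_loss(total_loss, total_steps):
    """Average loss per step, with an explicit error instead of ZeroDivisionError."""
    if total_steps == 0:
        raise ValueError(
            "no batches were produced; check the CSV contents and the batch size"
        )
    return total_loss / total_steps


# Hypothetical numbers: 3 validation samples with a batch size of 8.
print(steps_per_epoch(3, 8))
print(steps_per_epoch(3, 8, drop_remainder=False))
```

Under that assumption, using a batch size no larger than the smallest of the train, dev and test sets, or adding more samples, yields at least one step per epoch.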

ranjbar_1m · October 3, 2020, 11:08am

I have exactly this error, but I can’t find the reason.

othiele · October 3, 2020, 12:03pm

Please don’t hijack old threads, you already opened a new one.