Core dump error while training on Common Voice data

Reading data/lm/lm.binary
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
terminate called after throwing an instance of 'lm::FormatLoadException'
what(): .../kenlm/lm/read_arpa.cc:65 in void lm::ReadARPACounts(util::FilePiece&, std::vector&) threw FormatLoadException.
first non-empty line was "version https://git-lfs.github.com/spec/v1" not \data\. Byte: 43
Fatal Python error: Aborted

Thread 0x00007fa5f1f17700 (most recent call first):
File "/usr/lib/python3.7/threading.py", line 296 in wait
File "/usr/lib/python3.7/queue.py", line 170 in get
File "/root/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/summary/writer/event_file_writer.py", line 159 in run
File "/usr/lib/python3.7/threading.py", line 926 in _bootstrap_inner
File "/usr/lib/python3.7/threading.py", line 890 in _bootstrap

Thread 0x00007fa5f2718700 (most recent call first):
File "/usr/lib/python3.7/threading.py", line 296 in wait
File "/usr/lib/python3.7/queue.py", line 170 in get
File "/root/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/summary/writer/event_file_writer.py", line 159 in run
File "/usr/lib/python3.7/threading.py", line 926 in _bootstrap_inner
File "/usr/lib/python3.7/threading.py", line 890 in _bootstrap

Current thread 0x00007fa6623c2740 (most recent call first):
File "/root/tmp/deepspeech-train-venv/lib/python3.7/site-packages/ds_ctcdecoder/swigwrapper.py", line 279 in __init__
File "/root/tmp/deepspeech-train-venv/lib/python3.7/site-packages/ds_ctcdecoder/__init__.py", line 30 in __init__
File "/root/DeepSpeech/evaluate.py", line 48 in evaluate
File "./DeepSpeech.py", line 672 in test
File "./DeepSpeech.py", line 939 in main
File "/root/tmp/deepspeech-train-venv/lib/python3.7/site-packages/absl/app.py", line 250 in _run_main
File "/root/tmp/deepspeech-train-venv/lib/python3.7/site-packages/absl/app.py", line 299 in run
File "./DeepSpeech.py", line 962 in <module>
Aborted (core dumped)

I'm getting this error while training the model with the Common Voice dataset.

@saravananselvamohan It looks like you set up git-lfs incorrectly. Please check your setup.
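For instance, a quick way to confirm this (assuming the repository was cloned without git-lfs fetching the large files) is to look at the first line of the file:

head -n 1 data/lm/lm.binary
# a broken checkout prints the LFS pointer header instead of binary data:
# version https://git-lfs.github.com/spec/v1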

Does that mean there is a problem with the training environment I created? Training on the Common Voice data runs for almost 5 epochs, and after that I get this error.

Yes, the problem is that the data/lm/lm.binary and data/lm/trie files are not properly checked out. Please verify that you check them out through git-lfs, as documented.
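As a rough sketch (assuming git-lfs is installed and you are inside the DeepSpeech checkout), something like this should replace the pointer files with the real binaries:

git lfs install    # enable git-lfs hooks for this user/repository
git lfs pull       # download the actual content of all LFS-tracked files, including data/lm/lm.binary and data/lm/trie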

I still can't fix the issue…
I Early stop triggered as (for last 4 steps) validation loss: 264.626007 with standard deviation: 3.371662 and mean: 257.793589
I FINISHED optimization in 0:06:38.399252
Loading the LM will be faster if you build a binary file.
Reading data/lm/lm.binary
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
terminate called after throwing an instance of 'lm::FormatLoadException'
what(): .../kenlm/lm/read_arpa.cc:65 in void lm::ReadARPACounts(util::FilePiece&, std::vector&) threw FormatLoadException.
first non-empty line was "version https://git-lfs.github.com/spec/v1" not \data\. Byte: 43
Aborted (core dumped)

I've already told you what the issue was; I can't fix your setup for you.

Can you explain what you mean by "data/lm/lm.binary and data/lm/trie"? How do I check whether these files are properly checked out?

No, this is already documented, please read the documentation.

The very beginning of the documentation, https://github.com/mozilla/DeepSpeech/blob/master/TRAINING.rst#getting-the-training-code, says to "install Git Large File Storage".
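To answer the earlier question directly, a minimal sketch of how one might verify the checkout (assuming git-lfs is already installed):

git lfs ls-files                        # a "*" after the OID means the file content was downloaded; "-" means it is still a pointer
ls -lh data/lm/lm.binary data/lm/trie   # the real files are large, while an LFS pointer is only a few hundred bytes
git lfs pull                            # fetch the real content if only pointers are present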