./bin/run-ldc93s1.sh fails

Hi all,

I’m new to DeepSpeech and ML in general. I’m following the README at https://github.com/mozilla/DeepSpeech.

[Ubuntu16.04; Python3.6]

So far I’ve been able to:
i. Install python3.6.
ii. Install DeepSpeech (from pip3) in a virtualenv.
iii. Download pre-trained english model (wget -O - https://github.com/mozilla/DeepSpeech/releases/download/v0.3.0/deepspeech-0.3.0-models.tar.gz | tar xvfz -)
iv. Run deepspeech to get text for a .wav file (deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio my_audio_input.wav).
This worked well and I got the intended text output.
v. Download DeepSpeech project from github.
vi. Install all requirements via pip3 (pip3 install -r requirements.txt).
vii. Run python3 util/taskcluster.py --target .
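In case it helps anyone reproducing step iv: a small pre-flight check like the sketch below would have told me earlier whether the download/extract went wrong. (check_models is just a name I made up; the file list is simply what the v0.3.0 model tarball extracts.)

```shell
# Hypothetical helper (not part of DeepSpeech): verify the four files
# extracted from the v0.3.0 model tarball exist before running deepspeech,
# so a bad or partial download fails fast with a clear message.
check_models() {
  dir="$1"
  for f in output_graph.pbmm alphabet.txt lm.binary trie; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      return 1
    fi
  done
  echo "all model files present in $dir"
}
```

Usage would be something like: check_models models && deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio my_audio_input.wav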

Things were going too smoothly for their own good…

So now I’m trying to train a model and as recommended in the README, I ran -
./bin/run-ldc93s1.sh

And this is the output I’m getting:

+ [ ! -f DeepSpeech.py ]
+ [ ! -f data/ldc93s1/ldc93s1.csv ]
+ [ -d ]
+ python -c from xdg import BaseDirectory as xdg; print(xdg.save_data_path("deepspeech/ldc93s1"))
+ checkpoint_dir=/home/boparaim/.local/share/deepspeech/ldc93s1
+ python -u DeepSpeech.py --train_files data/ldc93s1/ldc93s1.csv --dev_files data/ldc93s1/ldc93s1.csv --test_files data/ldc93s1/ldc93s1.csv --train_batch_size 1 --dev_batch_size 1 --test_batch_size 1 --n_hidden 494 --epoch 75 --checkpoint_dir /home/boparaim/.local/share/deepspeech/ldc93s1
Preprocessing ['data/ldc93s1/ldc93s1.csv']
Preprocessing done
Preprocessing ['data/ldc93s1/ldc93s1.csv']
Preprocessing done
Preprocessing ['data/ldc93s1/ldc93s1.csv']
Preprocessing done
W Parameter --validation_step needs to be >0 for early stopping to work
I STARTING Optimization
I Training epoch 0…
I Training of Epoch 0 - loss: 358.054657
100% (1 of 1) |#################################################################################################################| Elapsed Time: 0:00:00 Time: 0:00:00
I Training epoch 1…
.
.
.
I Training epoch 73…
I Training of Epoch 73 - loss: 7.663087
100% (1 of 1) |#################################################################################################################| Elapsed Time: 0:00:00 Time: 0:00:00
I Training epoch 74…
100% (1 of 1) |#################################################################################################################| Elapsed Time: 0:00:00 ETA: 00:00:00Loading the LM will be faster if you build a binary file.
Reading data/lm/lm.binary
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
terminate called after throwing an instance of 'lm::FormatLoadException'
what(): native_client/kenlm/lm/read_arpa.cc:65 in void lm::ReadARPACounts(util::FilePiece&, std::vector&) threw FormatLoadException.
first non-empty line was "version https://git-lfs.github.com/spec/v1" not \data. Byte: 43
Aborted (core dumped)

I hope someone here can help me get over this issue. Thanks for your help.

Thanks. I repeated the whole thing and it worked this time. I had missed the very first instruction in the README.

To be clear - "Install Git Large File Storage, either manually or through a package like git-lfs if available on your system."
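For anyone else who hits this: the stack trace actually gives it away - the first line of data/lm/lm.binary was the Git LFS pointer text ("version https://git-lfs.github.com/spec/v1") instead of real LM data, which happens when the repo is cloned without git-lfs installed. A quick check could look like this sketch (is_lfs_pointer is just a name I made up):

```shell
# Detect whether a checked-out file is a Git LFS pointer stub rather than
# the real binary. LFS stubs are tiny text files whose first line is
# "version https://git-lfs.github.com/spec/v1".
is_lfs_pointer() {
  head -c 50 "$1" | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Demo with a fake pointer file, mimicking what an LFS stub looks like:
printf 'version https://git-lfs.github.com/spec/v1\noid sha256:abc\nsize 123\n' > /tmp/fake_pointer
if is_lfs_pointer /tmp/fake_pointer; then
  echo "LFS pointer detected - install git-lfs, then re-clone or run: git lfs pull"
fi
```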
