How to get a .pb model?


(Ugnius Malukas) #1

I am trying to train DeepSpeech with this script:

#!/bin/sh
python -u ../DeepSpeech.py \
  --export_tflite \
  --export_dir /media/ugnelis/Data/GIT/lithuanian-speach-to-text/DeepSpeech/mano \
  --train_files data/dataset.csv \
  --dev_files data/dataset.csv \
  --test_files data/dataset.csv \
  --alphabet data/alphabet.txt \
  --train_batch_size 1 \
  --dev_batch_size 1 \
  --test_batch_size 1 \
  --n_hidden 494 \
  --epoch 5 \
  --checkpoint_dir models \
"$@"

The problem is that I don’t get a “.pb” model as output, only TensorFlow checkpoints.
Do I need an additional flag to get the “.pb” model?

P.S. I hope this error is not blocking the “.pb” model export:

...
100% (1 of 1) |##########################| Elapsed Time: 0:00:00 Time:  0:00:00
Preprocessing ['data/dataset.csv']
WARNING:root:frame length (551) is greater than FFT size (512), frame will be truncated. Increase NFFT to avoid.
Preprocessing done
[scorer.cpp:63] FATAL: "(access(filename, 4)) == (0)" check failed. Invalid language model path
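For what it’s worth, the `access(filename, 4)` check in scorer.cpp is just a read-permission test (4 is `R_OK`), so that FATAL means the language-model path handed to the scorer doesn’t exist or isn’t readable. The same check can be sketched in shell (the file names here are throwaway examples, not DeepSpeech paths):

```shell
# Recreate the scorer's access(filename, R_OK) check in shell.
# 'present.bin' and 'missing.bin' are made-up names for illustration.
touch present.bin                       # a file that exists and is readable
for f in present.bin missing.bin; do
  if [ -r "$f" ]; then
    echo "$f: readable"
  else
    echo "$f: not readable - this is what triggers the FATAL"
  fi
done
```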

(Lissyx) #2

That’s wrong: that flag isn’t even in master, and it would not give you a .pb but a .tflite. Please stick to only --export_dir, as documented.
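For reference, the same script with the tflite flag dropped would look like this (paths and hyperparameters copied from the original post; a sketch, not a verified invocation):

```shell
#!/bin/sh
# Same invocation as above, minus --export_tflite.
# --export_dir alone should produce the .pb graph once training finishes.
python -u ../DeepSpeech.py \
  --train_files data/dataset.csv \
  --dev_files data/dataset.csv \
  --test_files data/dataset.csv \
  --alphabet data/alphabet.txt \
  --train_batch_size 1 \
  --dev_batch_size 1 \
  --test_batch_size 1 \
  --n_hidden 494 \
  --epoch 5 \
  --checkpoint_dir models \
  --export_dir /media/ugnelis/Data/GIT/lithuanian-speach-to-text/DeepSpeech/mano \
  "$@"
```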


(Ugnius Malukas) #3

I have tried both, with --export_tflite and without. I get neither a .pb nor a .tflite.


(Ugnius Malukas) #4

If I set --export_dir, it doesn’t even create the folder.


(Lissyx) #5

Again, --export_tflite exists only on my branch, so it’s not expected to work.


(Lissyx) #6

It should create it, but try forcing the creation. Also, the error about access() is worrying. Can you try mkdir -p /tmp/test-model and then run with --export_dir /tmp/test-model/?


(Ugnius Malukas) #7

It still doesn’t create the folder. Could the problem be that I am using the wrong Git branch?


(Lissyx) #8

I can’t tell unless you share more information: branch, version, and full logs.


(Ugnius Malukas) #9

I am running the latest version of the master branch, commit de279168ecfa47e79d4b155c4c529267441a051a.

The output with the error:

(deepspeech) user@computer:/home/user/GIT/DeepSpeech/temp$ ./train.sh 
Preprocessing ['data/dataset.csv']
Preprocessing done
Preprocessing ['data/dataset.csv']
Preprocessing done
W Parameter --validation_step needs to be >0 for early stopping to work
Preprocessing ['data/dataset.csv']
Preprocessing done
[scorer.cpp:63] FATAL: "(access(filename, 4)) == (0)" check failed. Invalid language model path

I don’t understand where the problem is.


(Ugnius Malukas) #10

If I run ./bin/run-ldc93s1.sh, I get:

+ [ ! -f DeepSpeech.py ]
+ [ ! -f data/ldc93s1/ldc93s1.csv ]
+ [ -d  ]
+ python -c from xdg import BaseDirectory as xdg; print(xdg.save_data_path("deepspeech/ldc93s1"))
+ checkpoint_dir=/home/ugnelis/.local/share/deepspeech/ldc93s1
+ python -u DeepSpeech.py --train_files data/ldc93s1/ldc93s1.csv --dev_files data/ldc93s1/ldc93s1.csv --test_files data/ldc93s1/ldc93s1.csv --train_batch_size 1 --dev_batch_size 1 --test_batch_size 1 --n_hidden 494 --epoch 75 --checkpoint_dir /home/ugnelis/.local/share/deepspeech/ldc93s1 --export_dir mano
Preprocessing ['data/ldc93s1/ldc93s1.csv']
Preprocessing done
Preprocessing ['data/ldc93s1/ldc93s1.csv']
Preprocessing done
W Parameter --validation_step needs to be >0 for early stopping to work
Preprocessing ['data/ldc93s1/ldc93s1.csv']
Preprocessing done
Loading the LM will be faster if you build a binary file.
Reading data/lm/lm.binary
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
terminate called after throwing an instance of 'lm::FormatLoadException'
  what():  ../kenlm/lm/read_arpa.cc:65 in void lm::ReadARPACounts(util::FilePiece&, std::vector<long unsigned int>&) threw FormatLoadException.
first non-empty line was "version https://git-lfs.github.com/spec/v1" not \data\. Byte: 43
Aborted (core dumped)
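That last trace shows what KenLM actually read: the first line of data/lm/lm.binary is a Git-LFS pointer stub, not model data. The stub is easy to spot; here is a quick check, simulated with a fabricated stub file (fake-lm.binary and its oid/size fields are invented for illustration):

```shell
# Simulate a Git-LFS pointer stub - the small text file that stands in
# for the real lm.binary when git-lfs was not set up at clone time.
printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize 945699324\n' \
  0000000000000000000000000000000000000000000000000000000000000000 > fake-lm.binary

# A stub betrays itself two ways: its first line names the LFS spec,
# and it is ~130 bytes instead of hundreds of megabytes.
head -n 1 fake-lm.binary
wc -c < fake-lm.binary
```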

(Lissyx) #11

It seems pretty obvious to me: you have not followed the docs and properly set up Git-LFS, so your language model file is not correct, as the first error already stated.
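The usual recovery, assuming git-lfs is installed, is the standard Git-LFS sequence run from inside the DeepSpeech checkout (a sketch of the fix, not a tested session):

```shell
# Replace Git-LFS pointer stubs with the real large files.
git lfs install   # register the LFS filters (once per machine)
git lfs pull      # fetch the actual content for stubs like data/lm/lm.binary
# Sanity check: the first line should no longer mention git-lfs.
head -n 1 data/lm/lm.binary
```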