Pretrained model: release version 0.4.0

Hi,
I am searching for the latest release of the pretrained model at https://github.com/mozilla/DeepSpeech/releases

I found a few files related to version 0.4.0, but I couldn’t find a big tar file (in the GB range) containing the model file/binaries like the one we have for version 0.3.0.
Can anyone help me get the latest pretrained binaries for DeepSpeech?
Thanks

I downloaded from Working models for 0.4.0

v0.4 hasn’t been released yet, so there’s no v0.4 model available.

@reuben then what are these files mentioned by @carlfm01 in the above post?

Also, if we are not using the CPU for training, do we need the CTC decoder? Because when I executed the command

python3 util/taskcluster.py --decoder
https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.cpu-ctc/artifacts/public/ds_ctcdecoder-0.4.0a0-cp36-cp36m-manylinux1_x86_64.whl

it asks me to download a CPU version. Can you guide me here, please?

Regards

Also, @reuben
I am trying to use the pre-trained model on my own data, prepared in the required format.

python3 DeepSpeech.py --n_hidden 2048 --train_files /opt/deepspeech/Abhay/deepspeech-git/DeepSpeech/data/data/trainingset1/3col/train.csv --dev_files /opt/deepspeech/Abhay/deepspeech-git/DeepSpeech/data/data/trainingset1/3col/dev.csv --test_files /opt/deepspeech/Abhay/deepspeech-git/DeepSpeech/data/data/trainingset1/3col/test.csv – epoch -3 --learning_rate 0.0001 – display_step 10 --validation_step 10 --checkpoint_dir /opt/deepspeech/Abhay/deepspeech-git/DeepSpeech/data/chkpntlibri/ --export_dir /opt/deepspeech/Abhay/deepspeech-git/DeepSpeech/data/20novrawf/output/ --summary_dir /opt/deepspeech/Abhay/deepspeech-git/DeepSpeech/data/20novrawf/summary/ --summary_secs 1000 --alphabet /opt/deepspeech/Abhay/deepspeech-git/DeepSpeech/models/alphabet.txt --lm /opt/deepspeech/Abhay/deepspeech-git/DeepSpeech/models/lm.binary --trie /opt/deepspeech/Abhay/deepspeech-git/DeepSpeech/models/trie

As you can see, I have only passed -3 epochs, so ideally it should just do 3 more epochs, but it is taking forever to train.

The training is running, but it has already crossed 66 epochs.

Things to note:

  1. I did not pass any " --model /dir/ " flag, as just the checkpoint downloaded from the 0.3.0 release was used for training (wget -O - https://github.com/mozilla/DeepSpeech/releases/download/v0.3.0/deepspeech-0.3.0-checkpoint.tar.gz | tar xvfz -)

Am I thinking correctly? For training on our data using the pre-trained model, we do not need to pass the --model parameter?

There isn’t any --model parameter in DeepSpeech.py, so I don’t know what you’re talking about. It looks like you passed – epoch -3 instead of --epoch -3 (note the dashes), probably some text editor screwing with you, so it defaulted to 75 epochs.

@reuben Thanks for the reply.

It was – epoch -3 only, pasted in the wrong format. But I have a question: if there is a gap between – and epoch, is that a problem? E.g. – epoch, or should there be no space, like –epoch?

Regarding --model: the models folder created by wget -O - https://github.com/mozilla/DeepSpeech/releases/download/v0.3.0/deepspeech-0.3.0-models.tar.gz | tar xvfz -
has two output graphs, one ending with .pb and one with .pbmm.

For me, --initialize_from_frozen_model was used to load the .pb, and --model was used to load the .pbmm. A lot of other users are using the same commands as well.

As stated in the README: deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio my_audio_file.wav

Yes, that is a problem; there can’t be a space there. The deepspeech binary, used for inference, is different from DeepSpeech.py, used for training. The latter has no --model parameter.
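To illustrate why the exact flag spelling matters, here is a minimal sketch assuming standard argparse-style parsing (DeepSpeech.py's actual parser may differ; the default of 75 epochs is the one mentioned earlier in the thread):

```python
import argparse

# Minimal sketch of flag parsing, assuming argparse-style behavior.
parser = argparse.ArgumentParser()
parser.add_argument("--epoch", type=int, default=75)

# Correct spelling: two ASCII hyphens, no space.
ok = parser.parse_args(["--epoch", "-3"])
print(ok.epoch)  # -3

# "– epoch" (an en dash, then a separate word) is not recognized as a
# flag at all, so the value silently falls back to the default.
bad, extras = parser.parse_known_args(["–", "epoch", "-3"])
print(bad.epoch)  # 75
```

This is why the training ran for far more than 3 epochs: the mangled flag was ignored rather than rejected.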

What would be the ideal training command if I want to use the pre-trained model and train on my own custom data?

I do have both the files:

wget -O - https://github.com/mozilla/DeepSpeech/releases/download/v0.3.0/deepspeech-0.3.0-models.tar.gz | tar xvfz -

wget -O - https://github.com/mozilla/DeepSpeech/releases/download/v0.3.0/deepspeech-0.3.0-checkpoint.tar.gz | tar xvfz -

What command should I use?

I am following this only. As there is no mention of using the frozen model, should I use it or not?
--initialize_from_frozen_model models/output_graph.pbmm

This code has been removed from master; there is no --initialize_from_frozen_model anymore.
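For reference, fine-tuning from the released checkpoint only requires pointing --checkpoint_dir at the extracted checkpoint directory; no model-loading flag is involved. A hedged sketch, using placeholder paths and the flag names seen earlier in this thread:

```shell
# Hypothetical sketch: continue training from the 0.3.0 checkpoint.
# Training resumes from whatever is in --checkpoint_dir; no --model
# or --initialize_from_frozen_model flag is needed.
python3 DeepSpeech.py \
  --n_hidden 2048 \
  --checkpoint_dir ./deepspeech-0.3.0-checkpoint/ \
  --epoch -3 \
  --train_files train.csv \
  --dev_files dev.csv \
  --test_files test.csv \
  --alphabet models/alphabet.txt \
  --lm models/lm.binary \
  --trie models/trie
```

Note the ASCII double hyphens throughout; a negative --epoch value means "this many additional epochs on top of the checkpoint", as discussed above.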

It contains version 0.2.0. Is that the latest binary?

Hi @bharat_patidar, I’m using the binary from master and the pbmm from 0.2.0 that I mentioned

Thanks Carlos and Reuben for the response.
Can you guys suggest some important attributes, like bit rate and accent, that should be taken care of to get the best out of the DeepSpeech model?

Check the documentation, it’s all covered: PCM 16 bits, 16kHz mono
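That expected format can be sanity-checked programmatically before feeding audio to the model. A small sketch using only the Python standard library (the helper name is made up for illustration):

```python
import wave

def check_deepspeech_format(path):
    """Return True if a WAV file matches the documented input format:
    16-bit PCM, 16 kHz sample rate, mono. (Format per the DeepSpeech
    docs; this helper itself is a hypothetical convenience.)"""
    with wave.open(path, "rb") as w:
        return (w.getsampwidth() == 2        # 2 bytes = 16-bit samples
                and w.getframerate() == 16000  # 16 kHz
                and w.getnchannels() == 1)     # mono
```

Running this on a 44.1 kHz or stereo file returns False, which is a quick way to catch the mismatch before blaming the model.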

@lissyx , @carlfm01 & @reuben
My input audio has a 44.1 kHz sampling rate, and I tried to downsample it to 16 kHz with both Audacity and sox, but I am getting very bad results after downsampling. I do get decent results with the original 44.1 kHz sampling rate, but when I tried to make the audio compatible with the model to get better results, I didn’t get what I expected.
Any clue or reason behind this?
Thank you so much, guys, for all the previous responses.

Can you share an example of the audio that you are using? If you can, share both versions of the audio.

That’s unclear. The results you shared above are with which audio files? Can you ensure it’s mono as well? Pushing stereo at 16kHz would kind of explain that.
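For what it's worth, a correct conversion has to do both steps, mix down to mono and resample, and skipping either one produces exactly this kind of degradation. The two steps can be sketched in pure Python with linear interpolation only (a naive illustration; a real pipeline should use sox or ffmpeg, which apply proper anti-aliasing filters):

```python
import struct
import wave

def convert_to_16k_mono(src_path, dst_path):
    """Naive conversion of a 16-bit PCM WAV to 16 kHz mono.
    Sketch only: linear interpolation, no anti-aliasing filter."""
    with wave.open(src_path, "rb") as w:
        n_ch = w.getnchannels()
        rate = w.getframerate()
        assert w.getsampwidth() == 2, "expects 16-bit PCM input"
        raw = w.readframes(w.getnframes())

    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    # Step 1: mix all channels down to mono by averaging per frame.
    mono = [sum(samples[i:i + n_ch]) // n_ch
            for i in range(0, len(samples), n_ch)]

    # Step 2: resample from `rate` to 16000 Hz by linear interpolation.
    target = 16000
    out_len = int(len(mono) * target / rate)
    out = []
    for j in range(out_len):
        pos = j * rate / target          # fractional source position
        i = int(pos)
        frac = pos - i
        nxt = mono[min(i + 1, len(mono) - 1)]
        out.append(int(mono[i] * (1 - frac) + nxt * frac))

    with wave.open(dst_path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(target)
        w.writeframes(struct.pack("<%dh" % len(out), *out))
```

Without the filtering step, frequencies above 8 kHz alias into the audible band, which is one plausible reason a crude downsample sounds (and recognizes) worse than the original.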

You want to retrain a new model from scratch with 44.1kHz? That’s going to require a lot of data and processing power.

Can you please tell me how the new data set should be arranged to retrain the pre-trained model?