Pretrained model: release version 0.4.0

Hi,
I am searching for the latest release of the pretrained model at https://github.com/mozilla/DeepSpeech/releases

I found a few files related to version 0.4.0, but I couldn’t find a big tar file (in the GB range) containing the model file/binaries like the one we have for version 0.3.0.
Can anyone help me get the latest pretrained binaries for DeepSpeech?
Thanks

I downloaded from Working models for 0.4.0

v0.4 hasn’t been released yet, so there’s no v0.4 model available.

@reuben then what are these files mentioned by @carlfm01 in the above post?

Also, if we are not using the CPU for training, do we need the CTC decoder? Because when I executed the command

python3 util/taskcluster.py --decoder
https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.cpu-ctc/artifacts/public/ds_ctcdecoder-0.4.0a0-cp36-cp36m-manylinux1_x86_64.whl

it asks me to download a CPU version. Can you guide me here, please?

Regards

Also, @reuben
I am trying to use the pre-trained model on my own data, prepared in the required format.

python3 DeepSpeech.py --n_hidden 2048 --train_files /opt/deepspeech/Abhay/deepspeech-git/DeepSpeech/data/data/trainingset1/3col/train.csv --dev_files /opt/deepspeech/Abhay/deepspeech-git/DeepSpeech/data/data/trainingset1/3col/dev.csv --test_files /opt/deepspeech/Abhay/deepspeech-git/DeepSpeech/data/data/trainingset1/3col/test.csv – epoch -3 --learning_rate 0.0001 – display_step 10 --validation_step 10 --checkpoint_dir /opt/deepspeech/Abhay/deepspeech-git/DeepSpeech/data/chkpntlibri/ --export_dir /opt/deepspeech/Abhay/deepspeech-git/DeepSpeech/data/20novrawf/output/ --summary_dir /opt/deepspeech/Abhay/deepspeech-git/DeepSpeech/data/20novrawf/summary/ --summary_secs 1000 --alphabet /opt/deepspeech/Abhay/deepspeech-git/DeepSpeech/models/alphabet.txt --lm /opt/deepspeech/Abhay/deepspeech-git/DeepSpeech/models/lm.binary --trie /opt/deepspeech/Abhay/deepspeech-git/DeepSpeech/models/trie

As you can see, I have only passed -3 epochs, so ideally it should just do 3 more epochs, but it is taking forever to train.

The training is running, but it has already crossed 66 epochs.

Things to note:

  1. I did not pass any " --model /dir/ " flag, as just the checkpoint downloaded from the 0.3.0 release was used for training (wget -O - https://github.com/mozilla/DeepSpeech/releases/download/v0.3.0/deepspeech-0.3.0-checkpoint.tar.gz | tar xvfz -)

Am I thinking correctly? For training on our data using the pre-trained model, we do not need to pass the --model parameter?

There isn’t any --model parameter in DeepSpeech.py, so I don’t know what you’re talking about. It looks like you passed – epoch -3 instead of --epoch -3 (note the dashes), probably some text editor screwing with you, so it defaulted to 75 epochs.

@reuben Thanks for the reply.

It was – epoch -3 only, pasted in the wrong format. But I have a question: if there is a gap between – and epoch, is that a problem? E.g. – epoch, or should there be no space, like –epoch?

Regarding --model: the models folder created by wget -O - https://github.com/mozilla/DeepSpeech/releases/download/v0.3.0/deepspeech-0.3.0-models.tar.gz | tar xvfz -
has two output graphs, one ending with .pb and one with .pbmm.

For me, --initialize_from_frozen_model was used to load the .pb, and --model was used to load the .pbmm. A lot of other users are using the same commands as well.

As stated in the README: deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio my_audio_file.wav

Yes, that is a problem; there can’t be a space there. The deepspeech binary, used for inference, is different from DeepSpeech.py, used for training. The latter has no --model parameter.
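To illustrate why the exact flag spelling matters, here is a minimal sketch assuming standard argparse-style parsing (DeepSpeech.py's actual parser may differ; the default of 75 epochs is the one mentioned earlier in the thread):

```python
import argparse

# Minimal sketch of flag parsing, assuming argparse-style behavior.
parser = argparse.ArgumentParser()
parser.add_argument("--epoch", type=int, default=75)

# Correct spelling: two ASCII hyphens, no space.
ok = parser.parse_args(["--epoch", "-3"])
print(ok.epoch)  # -3

# "– epoch" (an en dash, then a separate word) is not recognized as a
# flag at all, so the value silently falls back to the default.
bad, extras = parser.parse_known_args(["–", "epoch", "-3"])
print(bad.epoch)  # 75
```

This is why the training ran for far more than 3 epochs: the mangled flag was ignored rather than rejected.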

What would be the ideal training command if I want to use the pre-trained model and train on my own custom data?

I do have both the files:

wget -O - https://github.com/mozilla/DeepSpeech/releases/download/v0.3.0/deepspeech-0.3.0-models.tar.gz | tar xvfz -

wget -O - https://github.com/mozilla/DeepSpeech/releases/download/v0.3.0/deepspeech-0.3.0-checkpoint.tar.gz | tar xvfz -

What command should I use?

I am following this only. As there is no mention of using the frozen model, should I use it or not?
--initialize_from_frozen_model models/output_graph.pbmm

This code has been removed from master; there is no --initialize_from_frozen_model anymore.
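For reference, fine-tuning from the released checkpoint only requires pointing --checkpoint_dir at the extracted checkpoint directory; no model-loading flag is involved. A hedged sketch, using placeholder paths and the flag names seen earlier in this thread:

```shell
# Hypothetical sketch: continue training from the 0.3.0 checkpoint.
# Training resumes from whatever is in --checkpoint_dir; no --model
# or --initialize_from_frozen_model flag is needed.
python3 DeepSpeech.py \
  --n_hidden 2048 \
  --checkpoint_dir ./deepspeech-0.3.0-checkpoint/ \
  --epoch -3 \
  --train_files train.csv \
  --dev_files dev.csv \
  --test_files test.csv \
  --alphabet models/alphabet.txt \
  --lm models/lm.binary \
  --trie models/trie
```

Note the ASCII double hyphens throughout; a negative --epoch value means "this many additional epochs on top of the checkpoint", as discussed above.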

It contains version 0.2.0. Is that the latest binary?

Hi @bharat_patidar, I’m using the binary from master and the pbmm from 0.2.0 that I mentioned

Thanks Carlos and Reuben for the response.
Can you guys suggest some important attributes, like bit rate and accent, that should be taken care of to get the best out of the DeepSpeech model?

Check the documentation, it’s all covered: PCM 16 bits, 16kHz mono
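That expected format can be sanity-checked programmatically before feeding audio to the model. A small sketch using only the Python standard library (the helper name is made up for illustration):

```python
import wave

def check_deepspeech_format(path):
    """Return True if a WAV file matches the documented input format:
    16-bit PCM, 16 kHz sample rate, mono. (Format per the DeepSpeech
    docs; this helper itself is a hypothetical convenience.)"""
    with wave.open(path, "rb") as w:
        return (w.getsampwidth() == 2        # 2 bytes = 16-bit samples
                and w.getframerate() == 16000  # 16 kHz
                and w.getnchannels() == 1)     # mono
```

Running this on a 44.1 kHz or stereo file returns False, which is a quick way to catch the mismatch before blaming the model.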

@lissyx , @carlfm01 & @reuben
My input audio has a 44.1 kHz sampling rate, and I tried to downsample it to 16 kHz with both Audacity and sox, but I am getting very bad results after downsampling. I do get decent results with the original 44.1 kHz sampling rate, but when I tried to make the audio compatible with the model to get better results, I didn’t get what I expected.
Any clue or reason behind this?
Thank you so much, guys, for all the previous responses.

Can you share an example of the audio that you are using? If you can, share both versions of the audio.

That’s unclear. The results you shared above are with which audio files? Can you ensure it’s mono as well? Pushing stereo at 16kHz would kind of explain that.
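For what it's worth, a correct conversion has to do both steps, mix down to mono and resample, and skipping either one produces exactly this kind of degradation. The two steps can be sketched in pure Python with linear interpolation only (a naive illustration; a real pipeline should use sox or ffmpeg, which apply proper anti-aliasing filters):

```python
import struct
import wave

def convert_to_16k_mono(src_path, dst_path):
    """Naive conversion of a 16-bit PCM WAV to 16 kHz mono.
    Sketch only: linear interpolation, no anti-aliasing filter."""
    with wave.open(src_path, "rb") as w:
        n_ch = w.getnchannels()
        rate = w.getframerate()
        assert w.getsampwidth() == 2, "expects 16-bit PCM input"
        raw = w.readframes(w.getnframes())

    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    # Step 1: mix all channels down to mono by averaging per frame.
    mono = [sum(samples[i:i + n_ch]) // n_ch
            for i in range(0, len(samples), n_ch)]

    # Step 2: resample from `rate` to 16000 Hz by linear interpolation.
    target = 16000
    out_len = int(len(mono) * target / rate)
    out = []
    for j in range(out_len):
        pos = j * rate / target          # fractional source position
        i = int(pos)
        frac = pos - i
        nxt = mono[min(i + 1, len(mono) - 1)]
        out.append(int(mono[i] * (1 - frac) + nxt * frac))

    with wave.open(dst_path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(target)
        w.writeframes(struct.pack("<%dh" % len(out), *out))
```

Without the filtering step, frequencies above 8 kHz alias into the audible band, which is one plausible reason a crude downsample sounds (and recognizes) worse than the original.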

You want to retrain a new model from scratch with 44.1kHz? That’s going to require a lot of data and processing power.

Can you please tell me how the new data set should be arranged to retrain the pre-trained model?