New v0.4-alpha.2 and language model

Hi,

I am going to retrain a model on an Ubuntu server with the new prerelease 0.4-alpha.2 and the new CTC decoder, hoping it will improve the WER and give more or less the same results as the 0.1 version (with the same data).

Since the CTC decoder has changed, and with it the generate_trie binary, I am wondering whether quantization of the language model binary file is still mandatory.

Separately, I get this error from DeepSpeech.py:

    from ds_ctcdecoder import ctc_beam_search_decoder_batch, Scorer
    ModuleNotFoundError: No module named 'ds_ctcdecoder'

I have executed this before:
$ pip3 install -r requirements.txt
$ pip3 install deepspeech-gpu==0.4.0a2

Is there something else to install?
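
One quick way to confirm which of these packages Python can actually see is to ask the import system directly. The sketch below is only an illustration: the helper name `is_importable` is made up here, and `deepspeech` is assumed to be the import name of the inference bindings.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate a top-level module called module_name."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # ds_ctcdecoder is the module named in the traceback; deepspeech is an
    # assumed import name for the inference bindings.
    for name in ("ds_ctcdecoder", "deepspeech"):
        status = "found" if is_importable(name) else "missing"
        print(name + ": " + status)
```

Run it with the same python3 that runs DeepSpeech.py, since a package installed for a different interpreter would not be visible.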

I have executed this to get the url:
$ python3 util/taskcluster.py --arch gpu --decoder

https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.gpu-ctc/artifacts/public/ds_ctcdecoder-0.4.0a1-cp36-cp36m-manylinux1_x86_64.whl

I then tried to install it, but that fails as well:

$ pip3 install https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.gpu-ctc/artifacts/public/ds_ctcdecoder-0.4.0a1-cp36-cp36m-manylinux1_x86_64.whl

Collecting ds-ctcdecoder==0.4.0a1 from https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.gpu-ctc/artifacts/public/ds_ctcdecoder-0.4.0a1-cp36-cp36m-manylinux1_x86_64.whl
  HTTP error 404 while getting https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.gpu-ctc/artifacts/public/ds_ctcdecoder-0.4.0a1-cp36-cp36m-manylinux1_x86_64.whl
  Could not install requirement ds-ctcdecoder==0.4.0a1 from https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.gpu-ctc/artifacts/public/ds_ctcdecoder-0.4.0a1-cp36-cp36m-manylinux1_x86_64.whl because of error 404 Client Error: Not Found for url: https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.gpu-ctc/artifacts/public/ds_ctcdecoder-0.4.0a1-cp36-cp36m-manylinux1_x86_64.whl
Could not install requirement ds-ctcdecoder==0.4.0a1 from https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.gpu-ctc/artifacts/public/ds_ctcdecoder-0.4.0a1-cp36-cp36m-manylinux1_x86_64.whl because of HTTP error 404 Client Error: Not Found for url: https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.gpu-ctc/artifacts/public/ds_ctcdecoder-0.4.0a1-cp36-cp36m-manylinux1_x86_64.whl for URL https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.gpu-ctc/artifacts/public/ds_ctcdecoder-0.4.0a1-cp36-cp36m-manylinux1_x86_64.whl

Thanks,
Mar

Please check the documentation. This is just wrong.

Hi Alexander,

I found the --decoder parameter through taskcluster.py --help, not in any documentation. It was an experiment; forget about it.

I did the regular native client retrieval before with:

python3 util/taskcluster.py --arch gpu --target .

That works fine, but it provides nothing for the ds_ctcdecoder module.

Should I build the module myself, or is it available on TaskCluster?

Regards,
Mar

What is unclear in https://github.com/mozilla/DeepSpeech/blob/d3168391b3e9ee041bf8c517237cbacbd0c23da2/README.md#installing-prerequisites-for-training ?

I was experimenting with the URL, but if I run the exact command, nothing changes:

$ pip3 install $(python3 util/taskcluster.py --decoder)

Collecting ds-ctcdecoder==0.4.0a1 from https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.cpu-ctc/artifacts/public/ds_ctcdecoder-0.4.0a1-cp36-cp36m-manylinux1_x86_64.whl
  HTTP error 404 while getting https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.cpu-ctc/artifacts/public/ds_ctcdecoder-0.4.0a1-cp36-cp36m-manylinux1_x86_64.whl
  Could not install requirement ds-ctcdecoder==0.4.0a1 from https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.cpu-ctc/artifacts/public/ds_ctcdecoder-0.4.0a1-cp36-cp36m-manylinux1_x86_64.whl because of error 404 Client Error: Not Found for url: https://queue.taskcluster.net/v1/task/OjaaCF-MTOmXUUAE6x-hjQ/artifacts/public%2Fds_ctcdecoder-0.4.0a1-cp36-cp36m-manylinux1_x86_64.whl
Could not install requirement ds-ctcdecoder==0.4.0a1 from https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.cpu-ctc/artifacts/public/ds_ctcdecoder-0.4.0a1-cp36-cp36m-manylinux1_x86_64.whl because of HTTP error 404 Client Error: Not Found for url: https://queue.taskcluster.net/v1/task/OjaaCF-MTOmXUUAE6x-hjQ/artifacts/public%2Fds_ctcdecoder-0.4.0a1-cp36-cp36m-manylinux1_x86_64.whl for URL https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.cpu-ctc/artifacts/public/ds_ctcdecoder-0.4.0a1-cp36-cp36m-manylinux1_x86_64.whl
You are using pip version 10.0.1, however version 18.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

In the meantime I have built it:

cd native_client/ctcdecode
make bindings NUM_PROCESSES=8
pip3 install dist/*.whl

Is that OK?

Regards,
Mar

Now, that looks better, but can you verify the VERSION file? It seems the version is read as 0.4.0-alpha.1 from there, instead of 0.4.0-alpha.2.

Yes, the VERSION file is not OK:

    cat VERSION
    0.4.0-alpha.1

Yesterday I just ran:

git clone https://github.com/mozilla/DeepSpeech

(master branch)

Would it be better to get the source code package from https://github.com/mozilla/DeepSpeech/releases?

BR

Yes, if you pull the latest branch that includes 0.4.0-alpha.2, it should work better. Maybe we should have something more robust; it might be a good idea to file an issue on GitHub.

I forgot: you can try adding --branch v0.4.0-alpha.1 after --decoder.

Finally I did this:

  1. Download the source code from the releases page instead of the GitHub master branch:

    wget https://github.com/mozilla/DeepSpeech/archive/v0.4.0-alpha.2.zip
    unzip v0.4.0-alpha.2.zip

  2. Install requirements and deepspeech bindings (they were ok from the first try)

  3. Install the ds_ctcdecoder module (now it works fine).

$ pip3 install $(python3 util/taskcluster.py --decoder)

Collecting ds-ctcdecoder==0.4.0a2 from https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.cpu-ctc/artifacts/public/ds_ctcdecoder-0.4.0a2-cp36-cp36m-manylinux1_x86_64.whl
  Downloading https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.cpu-ctc/artifacts/public/ds_ctcdecoder-0.4.0a2-cp36-cp36m-manylinux1_x86_64.whl (1.6MB)
    100% |████████████████████████████████| 1.6MB 5.8MB/s
Requirement already satisfied: numpy>=1.7.0 in /home/jose/.local/lib/python3.6/site-packages (from ds-ctcdecoder==0.4.0a2) (1.15.0)
Installing collected packages: ds-ctcdecoder
Successfully installed ds-ctcdecoder-0.4.0a2

I have only just seen the hint about --branch v0.4.0-alpha.1, so I did not try it in the end. I don’t know why alpha.1 gives a 404.

It is rather unsettling that the GitHub master branch does not carry the latest version; from now on I will always use the archive page.

Best Regards,
Mar

That’s normal behavior given your system state and our current code :)

No, master in the URL is okay; the problem is that we read and build the version number from the VERSION file, so there was a discrepancy.
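
To make the discrepancy concrete: the wheel names in the pip logs use a PEP 440 style version (0.4.0a1), while the VERSION file holds 0.4.0-alpha.1. The following is a rough sketch of that mapping, not the actual code in util/taskcluster.py; the function names and the regular expression are assumptions made for illustration.

```python
import re

# Short pre-release markers as they appear in the wheel names (alpha -> a).
_PRE_RELEASE = {"alpha": "a", "beta": "b", "rc": "rc"}
_VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(?:-(alpha|beta|rc)\.(\d+))?$")


def wheel_version(version_text):
    """Turn '0.4.0-alpha.1' (VERSION file style) into '0.4.0a1' (wheel style)."""
    match = _VERSION_RE.match(version_text.strip())
    if match is None:
        raise ValueError("unrecognised version: %r" % version_text)
    release, kind, number = match.groups()
    if kind is None:
        return release
    return release + _PRE_RELEASE[kind] + number


def decoder_wheel_name(version_text, py_tag="cp36", abi_tag="cp36m",
                       platform="manylinux1_x86_64"):
    """Compose a wheel file name like the ones shown in the pip output."""
    return "ds_ctcdecoder-%s-%s-%s-%s.whl" % (
        wheel_version(version_text), py_tag, abi_tag, platform)


if __name__ == "__main__":
    # A checkout whose VERSION file still says alpha.1 asks for the a1 wheel,
    # which is the file name that returned 404 in this thread.
    print(decoder_wheel_name("0.4.0-alpha.1\n"))
    print(decoder_wheel_name("0.4.0-alpha.2\n"))
```

With a stale VERSION file the requested file name no longer matches what the master index serves, which would explain the 404 seen earlier.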

@mar_martinez This should now be the default behavior: https://github.com/mozilla/DeepSpeech/pull/1802
