Installing Deep Speech for the first time: thinking out loud

I’m installing DeepSpeech for the first time! I’m taking notes as I go along in this topic. Sorry for the length, but I hope the detail & my thinking process help :slight_smile:

  • Found this Discourse channel by clicking through the contact section from README :+1:
  • Ran into some problems with my Python installation. Had to reinstall virtualenv & update a path for python3. This is probably my bad for messing with python2 recently.
  • Once Python was sorted, everything worked well until #Transcribe an audio file.
$ deepspeech --model deepspeech-0.6.1-models/output_graph.pbmm --scorer deepspeech-0.6.1-models/kenlm.scorer --audio audio/2830-3980-0043.wav
usage: deepspeech [-h] --model MODEL [--lm [LM]] [--trie [TRIE]] --audio AUDIO
                  [--beam_width BEAM_WIDTH] [--lm_alpha LM_ALPHA]
                  [--lm_beta LM_BETA] [--version] [--extended] [--json]
deepspeech: error: unrecognized arguments: --scorer deepspeech-0.6.1-models/kenlm.scorer
(deepspeech-venv) 

Looks like scorer isn’t an argument anymore!
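
One way to confirm this from the terminal is to search the CLI's own help text for the flag. This is a minimal sketch, assuming a shell with a grep that supports -w (GNU and BSD grep both do). `has_flag` is a hypothetical helper written for this note, not something DeepSpeech ships.

```shell
# Hypothetical helper: succeed if the help text on stdin lists the given flag.
# -w matches whole words only, so "--lm" does not match "--lm_alpha".
has_flag() {
  grep -q -w -e "$1"
}

# Only query the CLI when it is actually installed.
if command -v deepspeech >/dev/null 2>&1; then
  if deepspeech --help | has_flag --scorer; then
    echo "installed CLI lists --scorer"
  else
    echo "installed CLI does not list --scorer"
  fi
fi
```

If the flag is missing, the installed release and the instructions being followed probably belong to different versions.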

  • Just realized I’ve been working in my home directory and very messily unzipped files here. Created a new directory, moved the uncompressed files and deleted the .tar.gz files. Might be worth adding these steps to the instructions?
  • Removed the scorer argument and it sort of worked? “experience proofsless”. Not sure what scorer does.
$ deepspeech --model deepspeech-0.6.1-models/output_graph.pbmm --audio audio/2830-3980-0043.wav
Loading model from file deepspeech-0.6.1-models/output_graph.pbmm
TensorFlow: v1.14.0-21-ge77504ac6b
DeepSpeech: v0.6.1-0-g3df20fe
2020-03-12 10:54:11.321259: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.0116s.
Running inference.
experience proofsless
Inference took 1.709s for 1.975s audio file.
(deepspeech-venv) 
  • I originally wasn’t sure it worked / what ‘experience proofsless’ meant till I listened to the .wav file.
  • Just realized I probably installed a stable version through pip, while the readme has instructions for master (very clearly printed). Bit confusing, since the readme explicitly uses pip3, which doesn’t install master.
  • Scrolling up in my terminal, I see that pip installed v0.6.1. Got here by switching to the 0.6.1 tag. Took me a minute to get past all the branches. Ran instructions I found there!
$ deepspeech --model deepspeech-0.6.1-models/output_graph.pbmm --lm deepspeech-0.6.1-models/lm.binary --trie deepspeech-0.6.1-models/trie --audio audio/2830-3980-0043.wav
Loading model from file deepspeech-0.6.1-models/output_graph.pbmm
TensorFlow: v1.14.0-21-ge77504ac6b
DeepSpeech: v0.6.1-0-g3df20fe
2020-03-12 11:22:44.920337: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.0103s.
Loading language model from files deepspeech-0.6.1-models/lm.binary deepspeech-0.6.1-models/trie
Loaded language model in 0.00578s.
Running inference.
experience proof less
Inference took 1.929s for 1.975s audio file.
(deepspeech-venv) 
  • No argument errors this time! And “experience proof less” is slightly more accurate?
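
The tidy-up mentioned in the notes above (new directory, move the unpacked files, delete the archives) might look roughly like this. The archive names are assumed to match the v0.6.1 release downloads, and `tidy_downloads` is a hypothetical helper; adjust both to whatever actually landed in your home directory.

```shell
# Hypothetical helper: move the unpacked DeepSpeech files from SRC to DEST
# and delete the downloaded archives left behind in SRC.
tidy_downloads() {
  src="$1"
  dest="$2"
  mkdir -p "$dest"

  # Directory names as used in the commands in this thread.
  for dir in deepspeech-0.6.1-models audio; do
    if [ -d "$src/$dir" ]; then
      mv "$src/$dir" "$dest/"
    fi
  done

  # Assumed archive names; rm -f stays quiet if they are already gone.
  rm -f "$src/deepspeech-0.6.1-models.tar.gz" "$src/audio-0.6.1.tar.gz"
}
```

Something like `tidy_downloads "$HOME" "$HOME/deepspeech-demo"` (directory name made up) would then leave everything in one place, and the transcription commands can be run from inside that directory.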

Suggestions

  • It makes sense to keep master documentation on the master branch. In this case, can the README’s install steps be updated to install the master branch itself (not the pip release)?
  • I think it’s still important to have instructions for installing a stable release via pip somewhere prominent. Is there a website or wiki that could house a quick getting started for the stable release (or whatever’s on pip)? The README can link out to this.
  • Can the instructions clarify what the expected output would be? I’m not convinced “experience proof less” is a 100% correct transcription.

Overall, great experience and fun tool :tada: Very much looking forward to playing with this more. I’d be happy to help with these changes if they sound good.

Next up, I’m going to try training a model :dizzy:

Thanks, it’s always hard to grasp how people not accustomed to the project experience it, and you have been quite focused and clear in your steps.

Not sure what you mean here. The documentation from master is accurate for master.

Well, switching versions gives you the proper one. I don’t see how we can do better.
README already links to readthedocs which defaults to v0.6.1.

Looks like this is perfect for a good first PR, don’t hesitate to send one.

The link to readthedocs could be highlighted. Right now it’s just the badge image, which I guess is not obvious, given the continued confusion about versioning of the docs.

It isn’t supposed to be a 100% correct transcription, as our model is not 100% accurate. Those samples were basically gotten at random from the LibriSpeech dataset, and it happens that the v0.6.1 model does not get that sample entirely right.

If this is what @abbycabs needs and it was not as obvious as I think, then a PR improving that is welcome: getting high-quality, first-hand newcomer feedback on what is unclear is always good.

Thanks all for your thoughts! Much appreciated :slight_smile:

For the master documentation: the instructions use pip3 install deepspeech, which installs a stable version afaik. Can the instructions be updated to install the master version? I thought that I was installing and using the master version by following these instructions, until I remembered how pip works.

I did not see the readthedocs! (Honestly, I skim over badges) Yes, highlighting that would have been very helpful for me.

I don’t know how except forcing ==VERSION, which we don’t want since it’s a pain to do. I know npm has some @stable and others :confused:
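
For anyone who does want to pin, a rough sketch of checking that the installed package matches the docs being read. It parses the `Version:` line that `pip3 show` prints; `installed_version` and `version_matches` are hypothetical helpers, and 0.6.1 is just the release discussed in this thread.

```shell
# Hypothetical helper: print the "Version:" field from `pip3 show` output on stdin.
installed_version() {
  awk '/^Version:/ { print $2 }'
}

# Hypothetical helper: succeed only if the version read from stdin equals $1.
version_matches() {
  [ "$(installed_version)" = "$1" ]
}
```

Usage would be along the lines of `pip3 install deepspeech==0.6.1` followed by `pip3 show deepspeech | version_matches 0.6.1 && echo "docs and package agree"`, then reading the documentation at the matching tag.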

This is helpful for understanding the expected outcome! Thank you.

There are no equivalent instructions we could put there for the master version, because more often than not there is no pre-trained model available that is compatible with master.

Ah, thanks both! In that case, a bit more clarity would be helpful. I’ll send along a PR shortly – huge thanks for the additional context.

The “getting started” commands on the master readme make reference to 0.6 models but use the syntax of 0.7. The mixing of different versions is confusing, plus that code will never ever work because the 0.6 models aren’t compatible with 0.7 even after it’s publicly released. It seems weird to put commands on the homepage readme that don’t work and won’t ever work. IMO that section should either be 100% 0.7 or 100% 0.6 but not both mixed together.

I also think it could be better explained that master is intended only for people who will train their own models.
