I’m installing DeepSpeech for the first time! I’m taking notes as I go along in this topic. Sorry for the length, but I hope the detail & my thinking process help.
- Found this Discourse channel by clicking through the contact section from README
- Running into some problems with my python installation. Had to reinstall virtualenv & update a path for python3. This is probably my bad for messing with python2 recently.
- Once python was sorted, everything worked well until the “Transcribe an audio file” step:
$ deepspeech --model deepspeech-0.6.1-models/output_graph.pbmm --scorer deepspeech-0.6.1-models/kenlm.scorer --audio audio/2830-3980-0043.wav
usage: deepspeech [-h] --model MODEL [--lm [LM]] [--trie [TRIE]] --audio AUDIO
[--beam_width BEAM_WIDTH] [--lm_alpha LM_ALPHA]
[--lm_beta LM_BETA] [--version] [--extended] [--json]
deepspeech: error: unrecognized arguments: --scorer deepspeech-0.6.1-models/kenlm.scorer
(deepspeech-venv)
Looks like --scorer isn’t an argument anymore!
- Just realized I’ve been working in my home directory and very messily unzipped files here. Created a new directory, moved the uncompressed files and deleted the .tar.gz files. Might be worth adding these steps to the instructions?
- Removed the --scorer argument and it sort of worked? “experience proofsless”. Not sure what scorer does.
$ deepspeech --model deepspeech-0.6.1-models/output_graph.pbmm --audio audio/2830-3980-0043.wav
Loading model from file deepspeech-0.6.1-models/output_graph.pbmm
TensorFlow: v1.14.0-21-ge77504ac6b
DeepSpeech: v0.6.1-0-g3df20fe
2020-03-12 10:54:11.321259: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.0116s.
Running inference.
experience proofsless
Inference took 1.709s for 1.975s audio file.
(deepspeech-venv)
- I originally wasn’t sure it worked / what ‘experience proofsless’ meant till I listened to the .wav file.
- Just realized I probably installed a stable version through pip and the readme has instructions for master (very clearly printed). Bit confusing since the readme explicitly uses pip3 & not master.
- Scrolling up in my terminal, I see that pip installed v0.6.1. Got to the matching README by switching to the 0.6.1 tag. Took me a minute to get past all the branches. Ran the instructions I found there!
$ deepspeech --model deepspeech-0.6.1-models/output_graph.pbmm --lm deepspeech-0.6.1-models/lm.binary --trie deepspeech-0.6.1-models/trie --audio audio/2830-3980-0043.wav
Loading model from file deepspeech-0.6.1-models/output_graph.pbmm
TensorFlow: v1.14.0-21-ge77504ac6b
DeepSpeech: v0.6.1-0-g3df20fe
2020-03-12 11:22:44.920337: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.0103s.
Loading language model from files deepspeech-0.6.1-models/lm.binary deepspeech-0.6.1-models/trie
Loaded language model in 0.00578s.
Running inference.
experience proof less
Inference took 1.929s for 1.975s audio file.
(deepspeech-venv)
- No argument errors this time! And “experience proof less” is slightly more accurate?
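For anyone hitting the same mismatch, here is a rough Python sketch of how the language-model flags could be picked from whatever the installed CLI advertises in its help text, rather than from the README. The helper names (`installed_help_text`, `pick_lm_flags`, `build_command`) are my own and not part of DeepSpeech, and the file names are just the ones from the commands above.

```python
import subprocess


def installed_help_text():
    """Return the help text of the installed deepspeech CLI (needs it on PATH)."""
    result = subprocess.run(
        ["deepspeech", "--help"], capture_output=True, text=True
    )
    return result.stdout


def pick_lm_flags(help_text, model_dir):
    """Choose language-model flags based on what the help text lists."""
    if "--scorer" in help_text:
        # The CLI accepts --scorer, as in the master README command.
        return ["--scorer", f"{model_dir}/kenlm.scorer"]
    if "--lm" in help_text and "--trie" in help_text:
        # The CLI wants --lm and --trie, as v0.6.1 from pip did for me.
        return ["--lm", f"{model_dir}/lm.binary", "--trie", f"{model_dir}/trie"]
    # Neither style is advertised: run without a language model.
    return []


def build_command(help_text, model_dir, audio):
    """Assemble the full transcription command as an argument list."""
    return [
        "deepspeech",
        "--model", f"{model_dir}/output_graph.pbmm",
        *pick_lm_flags(help_text, model_dir),
        "--audio", audio,
    ]
```

With DeepSpeech installed, `build_command(installed_help_text(), "deepspeech-0.6.1-models", "audio/2830-3980-0043.wav")` should give the --lm/--trie form on v0.6.1 and the --scorer form on a build whose help text lists that flag.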
Suggestions
- It makes sense to keep master documentation on the master branch. In that case, can the README there be updated to describe installing from the master branch (not pip)?
- I think it’s still important to have instructions for installing a stable release via pip somewhere prominent. Is there a website or wiki that could house a quick getting started for the stable release (or whatever’s on pip)? The README can link out to this.
- Can the instructions clarify what the expected output would be? I’m not convinced “experience proof less” is a 100% correct transcription.
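If the instructions did publish an expected transcript, comparing against it could be as simple as a word error rate check. Below is a minimal sketch of that, using a word-level edit distance. `ASSUMED_REFERENCE` is only my guess at what is spoken in the clip, not something taken from the project docs, so swap in the real transcript if one is published.

```python
# ASSUMPTION: my own guess at the spoken sentence, not an official transcript.
ASSUMED_REFERENCE = "experience proves this"


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edits needed to turn the first i-1 reference words
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                          # reference word dropped
                cur[j - 1] + 1,                       # extra hypothesis word
                prev[j - 1] + (ref_word != hyp_word)  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    for output in ("experience proofsless", "experience proof less"):
        print(output, word_error_rate(ASSUMED_REFERENCE, output))
```

A score of 0 would mean the output matches the reference word for word, which gives a concrete way to say whether a run “worked”.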
Overall, great experience and fun tool! Very much looking forward to playing with this more. I’d be happy to help with these changes if they sound good.
Next up, I’m going to try training a model.