New 0.9 alphas to test new features

As we prepare a 0.9 release, we have recently merged two big changes from contributors. These changes bring new features and welcome bugfixes, so it would be really helpful if everyone interested in DeepSpeech and building apps on it could take a look, debug, and report back before we ship a new stable release.

Please do so with the upcoming v0.9.0-alpha.10 (just merged and building as I write this, so it might still be a few hours until all the built packages hit the stores).

First, we need to thank @godefv for https://github.com/mozilla/DeepSpeech/pull/3279, which tackles and fixes a long-standing problem of the API sometimes reporting slightly incorrect timestamps. We expect no regression in the reported timestamps, nor in the latency and execution time of the decoder. Feedback on this is really important, both good and bad.

Then, we need to thank @josh_meyer for keeping up with us and shipping https://github.com/mozilla/DeepSpeech/pull/3297, which adds a new API feature: hot-word boosting. This new API lets the user boost the probability of some words from the scorer, and thus unlocks new use-cases. This is very new, so much debugging is welcome. It is exposed in all of our bindings (C, Python, JS, Java and .Net), but it might not be user-facing in all of the binaries of those bindings (the current implementation only exposes it in the deepspeech Python binary and the deepspeech C++ command line tool). Obviously PRs are welcome to add this, and it is a nice way to get into the project, as described in the follow-up issue https://github.com/mozilla/DeepSpeech/issues/3336.


Thanks for the heads up on the upcoming release.

I’ll see if I can get some time to look specifically at the new features, but I did dust off a simple demo project (https://github.com/nmstoker/SimpleSpeechLoop) I had working with DeepSpeech v0.7 and it worked flawlessly with v0.9.0-alpha.10 when I installed that version with pip. The project didn’t need any code changes (as expected).

Details

Platform OS

  • Linux

Python Environment

  • Python 3.7.6

  • Virtual env: Conda

Package Installation

  • DeepSpeech 0.9.0a10 installed via Pip

Package list from Conda (package count: 39)

Package Version
_libgcc_mutex 0.1
_openmp_mutex 4.5
ca-certificates 2020.4.5.1
certifi 2020.4.5.1
chardet 3.0.4
colorama 0.3.9
colorful 0.5.4
cursor 1.2.0
deepspeech-gpu 0.9.0a10
enum34 1.1.6
halo 0.0.18
idna 2.9
ld_impl_linux-64 2.34
libffi 3.2.1
libgcc-ng 9.2.0
libgomp 9.2.0
libstdcxx-ng 9.2.0
log-symbols 0.0.11
ncurses 6.1
numpy 1.18.3
openssl 1.1.1g
pip 20.1
pyaudio 0.2.11
python 3.7.6
python_abi 3.7
readline 8.0
requests 2.23.0
scipy 1.4.1
setuptools 46.1.3
six 1.11.0
spinners 0.0.23
sqlite 3.30.1
termcolor 1.1.0
tk 8.6.10
urllib3 1.25.9
webrtcvad 2.0.10
wheel 0.34.2
xz 5.2.5
zlib 1.2.11

- generated at 22:31 on Oct 04 2020 using Gather Up tool :gift:


I am using v0.9.0a10 on Nvidia Jetson/Xavier with a German DS model (trained with v0.7). It has been fed by my Mycroft Mark-I voice assistant for 9 days now without any issues (besides the rather high WER, but that is not a problem of v0.9 itself).


I’ve experimented with the new hot-word boosting feature via the Python API.
I’ll need to experiment a bit more but I’m pleased to say that it worked for me. Thanks @josh_meyer and everyone else who helped get it added :+1:

There was a small (obvious) issue with the doc string parameters, for which I submitted a tiny PR just earlier.

I’ve got to rush off shortly, but I can share my “hacked together” test code for this via a gist or something if it’s handy (although it’s straightforward to create an example by following the docs here: https://deepspeech.readthedocs.io/en/master/Python-API.html)

My environment was the same as my earlier post above.

One example I tested used the words “by” vs “buy”. By default, it would hear “by” in the phrase “i want you to go by turkey”, but with a modest boost of 10.0 for “buy” it would recognise “i want you to go buy turkey” (which, depending on context, could be the more likely intent).
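A minimal sketch of this kind of by/buy comparison, assuming the `addHotWord(word, boost)`, `clearHotWords()`, `stt()` and `enableExternalScorer()` methods described in the Python API docs. The function names `transcribe_with_hot_words` and `compare`, and all file paths passed to them, are hypothetical placeholders and not the test code mentioned above. It also assumes a scorer is enabled, since the hot-words are boosted within the scorer.

```python
def transcribe_with_hot_words(model, audio, hot_words):
    """Return model.stt(audio) with the given {word: boost} hot-words applied.

    `model` is assumed to expose addHotWord, clearHotWords and stt, as the
    deepspeech.Model class is documented to do in the 0.9 Python bindings.
    """
    model.clearHotWords()  # start from a clean slate
    for word, boost in hot_words.items():
        model.addHotWord(word, float(boost))
    try:
        return model.stt(audio)
    finally:
        model.clearHotWords()  # do not leak boosts into later calls


def compare(model_path, scorer_path, wav_path):
    """Hypothetical driver: every path argument is a placeholder."""
    # Imported lazily so the helper above can be used without these packages.
    import wave
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)
    model.enableExternalScorer(scorer_path)
    # Assumption: the WAV file is 16-bit mono at the model's sample rate.
    with wave.open(wav_path, "rb") as wav:
        audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

    print("no boost:", transcribe_with_hot_words(model, audio, {}))
    print("buy @ 10:", transcribe_with_hot_words(model, audio, {"buy": 10.0}))
```

Clearing the hot-words before and after each call keeps the two transcriptions independent, so any difference between them comes only from the boost being tested.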

A few points I wanted to check/raise:

  1. It looks as if a word that’s not in the scorer vocab at all may not get added. Is it meant to? It’s hard to tell as it may just be misinterpreting my UK English accent, but it seemed not to matter what factor I put for “Asterix” (ie the comic character) as it would never be returned.

  2. I also noticed that if you use an excessive boost, the words after the hot-word seem to get misinterpreted as individual letters (in the case below there wasn’t actually a word between “buy” and “turkey”):

Added hotword (buy, 50.0)
Recognized: i want you to go buy t h a t turkey

  3. Setting a hot-word a second time (i.e. when it had already been set and hadn’t been cleared) caused an error in the Python API. It should be possible to catch that in user-side code and just clear it then set it, but it would be handy if the API did this. (I think it was mentioned in one of the comments on the PR but I guess that’s still pending. It’s not a big deal at all; I just mention it so others don’t fall into this trap too.)

To clarify the first point, I tried adding the word in lower case (“asterix”) as the recognised words returned are lower case, but that made no difference. Boost values of 5, 10, 50, 100, 1000 and 10000 were tried.
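The user-side workaround for the repeated hot-word error mentioned above could be sketched roughly as follows. It assumes the Python API reports the failure as a `RuntimeError` and that `eraseHotWord(word)` removes a single entry, as the Python API docs describe. The helper name `set_hot_word` is hypothetical.

```python
def set_hot_word(model, word, boost):
    """Add `word` with `boost`, replacing any boost already set for it."""
    try:
        model.addHotWord(word, boost)
    except RuntimeError:
        # Assumption: adding a word that is already set raises RuntimeError.
        # Erase the old entry and add it again with the new boost. If the
        # failure had another cause (for example no scorer enabled), the
        # calls below are expected to raise again, so it is not hidden.
        model.eraseHotWord(word)
        model.addHotWord(word, boost)
```

With this, calling `set_hot_word(model, "buy", 10.0)` and later `set_hot_word(model, "buy", 50.0)` should leave only the second boost in place, which makes it easier to try a range of boost values against the same model instance.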