I’m a beginner with DeepSpeech. I installed the latest version as described here: https://deepspeech.readthedocs.io/en/v0.9.3/index.html
I’m now able to transcribe audio using both the CLI command and the native client. (BTW, I’m working on a small open-source project that shows how to use a DS server from Node.js: https://github.com/solyarisoftware/DeepSpeechJs)
I would like to use DS as an ASR for short sentences in a closed-domain chatbot, where users produce specific kinds of utterances, such as:
- spelled alphanumeric codes (e.g. M N Q U one two four six)
- specific named entities such as person names (e.g. Giuditta Del Buono)
If I understood correctly, I can also improve transcription accuracy “just” by building a custom scorer file (customApp.scorer) and using it at run time together with the pre-trained model, without re-training that model on custom audio files:
deepspeech \
  --model deepspeech-0.9.3-models.pbmm \
  --scorer customApp.scorer \
  --audio sample.wav
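For scripted experiments, the same CLI call could be wrapped from Python. This is only a minimal sketch: the helper names `build_deepspeech_command` and `transcribe` are hypothetical, the flags are exactly the ones in the command above, and it assumes the `deepspeech` executable is on the PATH and writes the transcript to standard output.

```python
import subprocess


def build_deepspeech_command(model, scorer, audio):
    """Build the argument list for the deepspeech CLI call shown in the post."""
    return ["deepspeech", "--model", model, "--scorer", scorer, "--audio", audio]


def transcribe(model, scorer, audio):
    """Run the CLI and return whatever it prints on stdout.

    Assumption: the transcript is the only thing written to stdout.
    """
    result = subprocess.run(
        build_deepspeech_command(model, scorer, audio),
        capture_output=True,  # collect stdout and stderr separately
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()


# Example call (needs DeepSpeech installed and the three files present):
# transcribe("deepspeech-0.9.3-models.pbmm", "customApp.scorer", "sample.wav")
```

Swapping only the `scorer` argument between runs would make it easy to compare the default scorer with a custom one on the same audio.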
BTW, is there any data or report that shows quantitatively how much accuracy improves when a custom scorer is used for specific closed-domain inputs?
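In the absence of published numbers, one way to quantify the effect on a small test set would be to compute the word error rate (WER) of the transcripts with and without the custom scorer. The sketch below is a plain word-level edit distance, not DeepSpeech's own evaluation code; the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference `one zero zero one` with the transcript `one zero zero nine` gives one substitution over four words, i.e. a WER of 0.25.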
I read the documentation about how to create my own scorer file, but I’m confused. Is there a step-by-step tutorial that shows how to proceed?
A step-by-step example would help a lot! Does one exist?
Where are data/lm/generate_lm.py and generate_scorer_package located?
What’s the format of the original text file containing custom sentences?
If, for example, I want the ASR to better recognize 4-digit numeric codes:
one zero zero zero
one zero zero one
one zero zero two
one zero zero three
...
...
nine nine nine nine
should the text file be a collection of every possible sentence, so in this case all numbers between 0000 and 9999 written out in words?
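If exhaustive enumeration does turn out to be the right input, such a file could be generated rather than written by hand. The following is a minimal sketch under two assumptions that are not confirmed by the post: that the corpus is plain text with one sentence per line, and that every code from 0000 to 9999 should appear exactly once. The names `spelled_codes`, `write_corpus`, and `codes.txt` are hypothetical.

```python
from itertools import product

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def spelled_codes(length=4, words=DIGIT_WORDS):
    """Yield every code of the given length as space-separated words."""
    # product() varies the last position fastest, so codes come out in numeric order
    for combo in product(words, repeat=length):
        yield " ".join(combo)


def write_corpus(path, sentences):
    """Write one sentence per line (assumed corpus format)."""
    with open(path, "w", encoding="utf-8") as handle:
        for sentence in sentences:
            handle.write(sentence + "\n")


# Example: write_corpus("codes.txt", spelled_codes())
```

The same generator could cover spelled alphanumeric codes by passing a longer `words` list that also contains the letter names, although the number of combinations grows quickly with the code length.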
One last point is not clear to me. For the best results in the general case, I would like to extend the pre-trained model’s scorer with a custom scorer. In that case, do I need to add my custom sentences at the end of the original pre-trained scorer? Or is building a separate custom scorer the way to go?
BTW, my configuration:
(deepspeech-venv) $ uname -a
Linux itd-giorgio-laptop 5.4.0-52-generic #57-Ubuntu SMP Thu Oct 15 10:57:00 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
(deepspeech-venv) $ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.2 LTS
Release: 20.04
Codename: focal
(deepspeech-venv) $ python --version
Python 3.8.5
(deepspeech-venv) $ deepspeech --version
DeepSpeech 0.9.3
(deepspeech-venv) $ sudo lshw -C display
*-display
  description: VGA compatible controller
  product: WhiskeyLake-U GT2 [UHD Graphics 620]
  vendor: Intel Corporation
  physical id: 2
  bus info: pci@0000:00:02.0
  version: 00
  width: 64 bits
  clock: 33MHz
  capabilities: pciexpress msi pm vga_controller bus_master cap_list rom
  configuration: driver=i915 latency=0
  resources: irq:129 memory:a1000000-a1ffffff memory:b0000000-bfffffff ioport:6000(size=64) memory:c0000-dffff