Hi all,
I’m a beginner with DeepSpeech. I installed the latest version (0.9.3) as described here: https://deepspeech.readthedocs.io/en/v0.9.3/index.html
I’m now able to transcribe audio using the CLI command and the native client. (BTW, I’m working on a small open-source project that shows how to use a DS server from Node.js: https://github.com/solyarisoftware/DeepSpeechJs)
Question 1:
I would like to use DS as an ASR for short sentences in a closed-domain chatbot, where there are specific kinds of user utterances, such as:
- spelled alphanumeric codes (e.g. M N Q U one two four six)
- specific named entities such as person names (e.g. Giorgio Robino, Giuditta Del Buono)
- etc.
If I understood correctly, I can improve the transcription accuracy of the pre-trained model “just” by building a custom scorer file (customApp.scorer) to be used at run-time (avoiding re-training the pre-trained model with custom audio files):
deepspeech \
--model deepspeech-0.9.3-models.pbmm \
--scorer customApp.scorer \
--audio sample.wav
Is that correct?
BTW, is there any data or report that shows quantitatively how much accuracy improves when using a custom scorer for specific closed-domain inputs?
Question 2:
I read documentation about how to create my own scorer file:
https://deepspeech.readthedocs.io/en/v0.9.3/Scorer.html#external-scorer-scripts
But I’m confused. Is there a step-by-step tutorial that shows how to proceed?

A step-by-step example would help a lot! Does one exist?
Where are data/lm/generate_lm.py and generate_scorer_package located?
What is the format of the original text file containing the custom sentences?
If, for example, I want the ASR to better understand 4-digit numeric codes:
one zero zero zero
one zero zero one
one zero zero two
one zero zero three
...
...
nine nine nine nine
Is the text a collection of all possible sentences, so in this case all numbers between 0000 and 9999 spelled out in words?
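For illustration, a text file like that could be produced with a short script rather than by hand. This is only a minimal sketch: it assumes the scorer tooling accepts a plain-text file with one lowercase sentence per line, and the names `four_digit_sentences`, `write_corpus` and `codes.txt` are hypothetical, chosen just for this example.

```python
from itertools import product

# Spoken form of each digit, indexed by its numeric value.
DIGIT_WORDS = [
    "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine",
]


def four_digit_sentences():
    """Return every 4-digit code from 0000 to 9999 as space-separated words."""
    return [" ".join(words) for words in product(DIGIT_WORDS, repeat=4)]


def write_corpus(path):
    """Write one sentence per line (assumed input format for the LM tooling)."""
    with open(path, "w", encoding="utf-8") as corpus:
        for sentence in four_digit_sentences():
            corpus.write(sentence + "\n")


if __name__ == "__main__":
    # Hypothetical output file name.
    write_corpus("codes.txt")
```

The resulting file has 10,000 lines, from "zero zero zero zero" to "nine nine nine nine".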
Question 3:
One last point is not clear to me. For the best result in the general case, I would like to extend the pretrained model’s scorer with a custom scorer. In this case, do I need to append my custom sentences to the original pretrained model’s scorer? Or is building a separate custom scorer the way to go?
BTW, my configuration:
(deepspeech-venv) $ uname -a
linux itd-giorgio-laptop 5.4.0-52-generic #57-Ubuntu SMP Thu Oct 15 10:57:00 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
(deepspeech-venv) $ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.2 LTS
Release: 20.04
Codename: focal
(deepspeech-venv) $ python --version
Python 3.8.5
(deepspeech-venv) $ deepspeech --version
DeepSpeech 0.9.3
(deepspeech-venv) $ sudo lshw -C display
*-display
description: VGA compatible controller
product: WhiskeyLake-U GT2 [UHD Graphics 620]
vendor: Intel Corporation
physical id: 2
bus info: pci@0000:00:02.0
version: 00
width: 64 bits
clock: 33MHz
capabilities: pciexpress msi pm vga_controller bus_master cap_list rom
configuration: driver=i915 latency=0
resources: irq:129 memory:a1000000-a1ffffff memory:b0000000-bfffffff ioport:6000(size=64) memory:c0000-dffff
Thanks!
giorgio


There are various measures of LM effectiveness (e.g. perplexity, entropy), and details are easy to find by searching. But again, with something like this I’d suggest it’s best to experiment: compiling the LM part is surprisingly quick (it takes much less time than asking a question, and then you won’t need to tie someone up answering it).
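One possible way to run such an experiment is to transcribe a handful of test recordings once with the default scorer and once with the custom one, then compare word error rate (WER) against the known reference text. WER is not mentioned above; it is just a common ASR metric used here as an assumption, and the function name and the sample transcripts below are hypothetical. A minimal sketch:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "one zero zero one"
    # Hypothetical transcripts, not real DeepSpeech output.
    with_default_scorer = "one zero zero won"
    with_custom_scorer = "one zero zero one"
    print(word_error_rate(reference, with_default_scorer))
    print(word_error_rate(reference, with_custom_scorer))
```

Averaging this over the same test set for each scorer gives a rough, quantitative comparison for the closed-domain inputs.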