Hi all,
I’m a beginner with DeepSpeech. I installed the latest version as described here: https://deepspeech.readthedocs.io/en/v0.9.3/index.html
I’m now able to transcribe audio using the CLI command and the native client. (BTW, I’m working on a small open-source project that shows how to use a DS server from Node.js: https://github.com/solyarisoftware/DeepSpeechJs)
Question 1:
I would like to use DS as an ASR for short sentences in a closed-domain chatbot, where user utterances are of specific kinds, such as:
- spelled alphanumeric codes (e.g. M N Q U one two four six)
- specific named entities such as person names (e.g. Giorgio Robino, Giuditta Del Buono)
- etc.
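To make the first kind of utterance concrete, here is a minimal Python sketch of how such a code could be spelled out as text. The helper name spell_code and the choice of lowercase output are my own assumptions for illustration, not something taken from the DeepSpeech documentation.

```python
# Hypothetical helper (not part of DeepSpeech): spell an alphanumeric code
# as space-separated tokens, e.g. "MNQU1246" -> "m n q u one two four six".
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def spell_code(code: str) -> str:
    """Return the code as one token per character, digits written as words."""
    tokens = []
    for ch in code:
        if ch in DIGIT_WORDS:
            tokens.append(DIGIT_WORDS[ch])
        elif ch.isascii() and ch.isalpha():
            # Assumption: text for the scorer is lowercase.
            tokens.append(ch.lower())
        else:
            raise ValueError(f"unsupported character in code: {ch!r}")
    return " ".join(tokens)


if __name__ == "__main__":
    print(spell_code("MNQU1246"))
```

Sentences produced this way are the kind of text I would expect to put into a custom corpus, one utterance per line.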
If I understood correctly, I can improve the transcription accuracy of the pre-trained model “just” by building a custom scorer file (customApp.scorer) to be used at run-time, without retraining the pre-trained model on custom audio files:
deepspeech \
--model deepspeech-0.9.3-models.pbmm \
--scorer customApp.scorer \
--audio sample.wav
Is that correct?
BTW, is there any data or report that shows quantitatively how much accuracy improves when using a custom scorer for specific closed-domain inputs?
Question 2:
I read the documentation about how to create my own scorer file:
https://deepspeech.readthedocs.io/en/v0.9.3/Scorer.html#external-scorer-scripts
But I’m confused. Is there a step-by-step tutorial that shows how to proceed?
A step-by-step example would help a lot! Does one exist?
Where are data/lm/generate_lm.py and generate_scorer_package located?
What is the format of the source text file containing the custom sentences?
For example, if I want the ASR to better recognize 4-digit numeric codes:
one zero zero zero
one zero zero one
one zero zero two
one zero zero three
...
...
nine nine nine nine
is the text a collection of all possible sentences, i.e. in this case all numbers from 0000 to 9999 spelled out in words?
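For reference, this is a minimal Python sketch of how I would generate such a file, assuming the expected format is plain text with one sentence per line (that format is my assumption, and it is exactly what I am asking about). The function names and the output file name are hypothetical.

```python
from itertools import product

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def digit_code_sentences(length: int = 4) -> list:
    """Return every code of `length` digits spelled out in words, in numeric order."""
    return [" ".join(words) for words in product(DIGIT_WORDS, repeat=length)]


def write_corpus(path: str, sentences: list) -> None:
    """Write one sentence per line (assumed input format for the scorer tools)."""
    with open(path, "w", encoding="utf-8") as f:
        for sentence in sentences:
            f.write(sentence + "\n")


if __name__ == "__main__":
    sentences = digit_code_sentences()
    print(len(sentences), "sentences")
    # To save them, e.g.: write_corpus("four_digit_codes.txt", sentences)
```

This enumerates all 10,000 combinations, from "zero zero zero zero" to "nine nine nine nine". Whether listing every combination is actually needed, or a smaller sample is enough for the language model, is part of my question.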
Question 3:
One last point is not clear to me. For the best results in the general case, I would like to extend the pre-trained model’s scorer with a custom scorer. In that case, do I need to append my custom sentences to the original pre-trained model scorer? Or is building a separate custom scorer the way to go?
BTW, my configuration:
(deepspeech-venv) $ uname -a
linux itd-giorgio-laptop 5.4.0-52-generic #57-Ubuntu SMP Thu Oct 15 10:57:00 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
(deepspeech-venv) $ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.2 LTS
Release: 20.04
Codename: focal
(deepspeech-venv) $ python --version
Python 3.8.5
(deepspeech-venv) $ deepspeech --version
DeepSpeech 0.9.3
(deepspeech-venv) $ sudo lshw -C display
*-display
description: VGA compatible controller
product: WhiskeyLake-U GT2 [UHD Graphics 620]
vendor: Intel Corporation
physical id: 2
bus info: pci@0000:00:02.0
version: 00
width: 64 bits
clock: 33MHz
capabilities: pciexpress msi pm vga_controller bus_master cap_list rom
configuration: driver=i915 latency=0
resources: irq:129 memory:a1000000-a1ffffff memory:b0000000-bfffffff ioport:6000(size=64) memory:c0000-dffff
Thanks!
giorgio