Tune Mozilla DeepSpeech to recognize specific sentences

Thank you for the reply. I will build it myself.

Hi, thanks for the wonderful tutorial. I trained a language model for only 12 words, but when I use it with the pretrained model, the output is empty for most of the words. I want to predict only those 12 words. How can I improve the accuracy of the language model? If I don't use the language model, the output is gibberish.

Hi @Ajay_Ganesan - this might be hard to diagnose.
I would start with confirming that it isn’t a recording quality or accent issue by trying to find the words in your list of 12 in another source, ideally one where they’re said by people with a US accent (as the majority of the acoustic model training has been done with US accented speakers). That would at least give a sense about whether it is equally challenged by those samples or not.
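To make that comparison a bit more concrete, you could transcribe both sets of recordings and compute a word error rate (WER) for each against what was actually said. Below is a minimal sketch of WER as word-level edit distance divided by the number of reference words; the function name is just for illustration. DeepSpeech's own test run reports WER as well, so this is only meant for quick ad-hoc checks on transcripts you've collected yourself.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two strings, divided by the
    number of reference words (case-insensitive, split on whitespace)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev holds the previous DP row: edit distances from the reference words
    # processed so far to each prefix of hyp
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # reference word deleted
                          curr[j - 1] + 1,     # extra word inserted
                          prev[j - 1] + cost)  # substitution (or match)
        prev = curr
    return prev[-1] / max(len(ref), 1)


print(word_error_rate("turn on the light", "turn the lights"))  # 0.5
```

Here the 0.5 comes from one deletion ("on") plus one substitution ("light" to "lights") over four reference words. If your own recordings score much worse than samples from another source, that points towards recording quality or accent rather than the LM.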

Can I also check whether you've stuck with an older version of DeepSpeech? Your comments above ask about 0.5.0, and I'm guessing you stuck with that to be able to follow the steps in the tutorial. As there have been some significant improvements since, I'd generally suggest trying 0.7.1. I realise you will have to make some changes to the steps, as the handling of the LM has changed a bit, but the principles are pretty similar (if anything it's easier now, and it's well documented).

Switching won't necessarily help if it's an accent issue, as I believe the model is still stronger with US accents. It does pretty well with my UK accent, but there are areas where it still struggles for me too. If you were in that situation, the way forward would be to look at fine-tuning the model, but you'd need a decent amount of transcribed audio, and I'd try to narrow down what the issue is before going down that route.
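If you do go down the fine-tuning route, the transcribed audio needs to be listed in CSV files, like the train/dev/test CSVs passed to DeepSpeech.py later in this thread. A rough sketch, assuming the `wav_filename,wav_filesize,transcript` column layout DeepSpeech's training CSVs use, and assuming your alphabet.txt is lower-case (hence the `.lower()`); `write_deepspeech_csv` is a hypothetical helper, not part of DeepSpeech.

```python
import csv
import os


def write_deepspeech_csv(rows, csv_path):
    """Write (wav_path, transcript) pairs as a training CSV.

    Assumed layout: wav_filename, wav_filesize, transcript. Transcripts are
    stripped and lower-cased on the assumption that alphabet.txt is lower-case.
    """
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for wav_path, transcript in rows:
            writer.writerow([os.path.abspath(wav_path),
                             os.path.getsize(wav_path),  # size in bytes
                             transcript.strip().lower()])


# Hypothetical usage:
# write_deepspeech_csv([("clips/0001.wav", "turn on the light")], "train.csv")
```

Absolute paths are used so the CSV still works if you launch training from a different directory.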

Hope that helps? Good luck!

Hi @nmstoker, this is very helpful, but I am getting an error while generating lm.binary and the other output files. Please help me generate lm.binary.

usage: ipykernel_launcher.py [-h] --vocabulary.txt VOCABULARY.TXT --output_dir OUTPUT_DIR --top_k TOP_K --kenlm_bins
KENLM_BINS --arpa_order ARPA_ORDER --max_arpa_memory MAX_ARPA_MEMORY --arpa_prune
ARPA_PRUNE --binary_a_bits BINARY_A_BITS --binary_q_bits BINARY_Q_BITS --binary_type
BINARY_TYPE [--discount_fallback]
ipykernel_launcher.py: error: the following arguments are required: --vocabulary.txt, --output_dir, --top_k, --kenlm_bins, --arpa_order, --max_arpa_memory, --arpa_prune, --binary_a_bits, --binary_q_bits, --binary_type

An exception has occurred, use %tb to see the full traceback.
SystemExit: 2

@Yugandhar_Gantala your post doesn’t seem to have enough information to investigate further. Can you give a bit more detail on what you’re actually doing, versions, environment etc. It looks like you’ve called some code and you haven’t passed the parameters.

Imagine I can’t see what you’re doing (because I cannot :slightly_smiling_face:)

Please search the forum, and if you post again, give us more to work on. There are several posts on building the scorer.
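For what it's worth, that usage message is argparse reporting that none of the required flags were given. The `ipykernel_launcher.py` name suggests the script was run inside a Jupyter kernel, where a bare `parse_args()` reads the kernel's own command line rather than anything you intended. A minimal illustration with a hypothetical parser (not the real generate_lm.py one) showing the failure and one workaround, passing an explicit argument list; running the script as a shell command with `!python3 ...` and all its flags is the other obvious option.

```python
import argparse


def build_parser():
    # Hypothetical parser with a few generate_lm.py-style flags, all required.
    parser = argparse.ArgumentParser(description="LM build options (illustrative)")
    parser.add_argument("--input_txt", required=True)
    parser.add_argument("--output_dir", required=True)
    parser.add_argument("--top_k", type=int, required=True)
    return parser


def parse_or_report(argv):
    """Parse argv, returning None instead of exiting when flags are missing."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit as exc:
        # argparse prints the usage/error text and exits with status 2,
        # which is the "SystemExit: 2" seen in the notebook.
        print("argparse exited with status", exc.code)
        return None


# No flags: fails the same way as in the post above.
parse_or_report([])

# Explicit list: behaves the same inside or outside a notebook.
args = parse_or_report(["--input_txt", "vocabulary.txt",
                        "--output_dir", "lm_out",
                        "--top_k", "500"])
print(args.top_k)  # 500
```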

Hey @nmstoker,
I have created my own vocabulary.txt file and I want to train the DeepSpeech 0.7.3 pretrained model on my own vocabulary. I followed the steps you mentioned above. Now I am trying to generate the output files (lm.binary, warpa.words, trie), but while generating them I get an error saying that the arguments "--vocabulary.txt, --output_dir, --top_k, --kenlm_bins, --arpa_order, --max_arpa_memory, --arpa_prune, --binary_a_bits, --binary_q_bits, --binary_type" are required.
What is generate_trie for? We will get a trie output file, right? Isn't that enough to train the model on vocabulary.txt?

@sujithvoona2 I think the issue is that the process has changed since the instructions above were written, so they won't work with 0.7.x. As @othiele suggests, it's best to look over the forum for how to build the scorer. You could also refer to the documentation for your version here: https://deepspeech.readthedocs.io/en/v0.7.3/


And more specifically, https://deepspeech.readthedocs.io/en/v0.7.3/Scorer.html
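One thing worth checking before building a scorer: the acoustic model can only emit characters listed in alphabet.txt, so a vocabulary file containing other characters (capitals, punctuation, accented letters) is a likely source of trouble. A rough sketch, assuming the usual alphabet.txt layout (one character per line, `#` lines as comments, a line holding a single space for the space character); it ignores any escaping of a literal `#`, and both function names are hypothetical.

```python
def load_alphabet(alphabet_path):
    """Return the set of characters listed in an alphabet.txt file.

    Assumes one character per line, '#' lines as comments, and a line holding
    a single space for the space character (no escape handling).
    """
    chars = set()
    with open(alphabet_path, encoding="utf-8") as f:
        for line in f:
            entry = line.rstrip("\r\n")  # keep a lone space as a real entry
            if entry and not entry.startswith("#"):
                chars.add(entry)
    return chars


def find_unknown_chars(vocab_path, alphabet):
    """Map each character missing from the alphabet to the first line it appears on."""
    unknown = {}
    with open(vocab_path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            for ch in line.rstrip("\r\n"):
                if ch not in alphabet and ch not in unknown:
                    unknown[ch] = lineno
    return unknown
```

Calling `find_unknown_chars("vocabulary.txt", load_alphabet("alphabet.txt"))` and getting an empty dict back means every character in the vocabulary is covered by the alphabet.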


Again, you are hijacking older threads. You have to build most of the system yourself: DeepSpeech only does speech to text, and the rest is up to you.
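As one small illustration of "the rest", take the use case from the top of this thread, restricting output to about a dozen words. A post-processing step could snap each word of the raw transcript to its closest allowed word. This is a sketch using Python's standard `difflib` with a hypothetical word list; it is not something DeepSpeech does itself, and it can't recover words when the transcript comes back empty.

```python
import difflib

# Hypothetical word list; swap in your own 12 words.
ALLOWED_WORDS = ["yes", "no", "stop", "start", "left", "right",
                 "up", "down", "open", "close", "on", "off"]


def snap_to_allowed(transcript, allowed=ALLOWED_WORDS, cutoff=0.6):
    """Replace each transcript word with its closest allowed word.

    Words whose best similarity ratio is below `cutoff` are dropped, so an
    empty or unrelated transcript comes back as an empty string.
    """
    snapped = []
    for word in transcript.lower().split():
        match = difflib.get_close_matches(word, allowed, n=1, cutoff=cutoff)
        if match:
            snapped.append(match[0])
    return " ".join(snapped)


print(snap_to_allowed("stopp lefft"))  # stop left
```

Raising `cutoff` makes it stricter (fewer false matches, more dropped words). Lowering it does the opposite.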

Thanks.

https://stackoverflow.com/questions/64183342/commant-to-ddep-speech-by-deep-speech-to-farsi-data-set/64183407#64183407

Hello @nmstoker,

I want to create my own scorer file. When executing the generate_lm.py script, I have this output:

Building lm.binary …
./data/lm/lm.binary
Usage: ./kenlm/build/bin/build_binary [-u log10_unknown_probability] [-s] [-i] [-v] [-w mmap|after] [-p probing_multiplier] [-T trie_temporary] [-S trie_building_mem] [-q bits] [-b bits] [-a bits] [type] input.arpa [output.mmap]

-u sets the log10 probability for if the ARPA file does not have one.
Default is -100. The ARPA file will always take precedence.
-s allows models to be built even if they do not have <s> and </s>.
-i allows buggy models from IRSTLM by mapping positive log probability to 0.
-v disables inclusion of the vocabulary in the binary file.
-w mmap|after determines how writing is done.
mmap maps the binary file and writes to it. Default for trie.
after allocates anonymous memory, builds, and writes. Default for probing.
-r "order1.arpa order2 order3 order4" adds lower-order rest costs from these
model files. order1.arpa must be an ARPA file. All others may be ARPA or
the same data structure as being built. All files must have the same
vocabulary. For probing, the unigrams must be in the same order.

type is either probing or trie. Default is probing.

probing uses a probing hash table. It is the fastest but uses the most memory.
-p sets the space multiplier and must be >1.0. The default is 1.5.

trie is a straightforward trie with bit-level packing. It uses the least
memory and is still faster than SRI or IRST. Building the trie format uses an
on-disk sort to save memory.
-T is the temporary directory prefix. Default is the output file name.
-S determines memory use for sorting. Default is 80%. This is compatible
with GNU sort. The number is followed by a unit: % for percent of physical
memory, b for bytes, K for Kilobytes, M for megabytes, then G,T,P,E,Z,Y.
Default unit is K for Kilobytes.
-q turns quantization on and sets the number of bits (e.g. -q 8).
-b sets backoff quantization bits. Requires -q and defaults to that value.
-a compresses pointers using an array of offsets. The parameter is the
maximum number of bits encoded by the array. Memory is minimized subject
to the maximum, so pick 255 to minimize memory.

-h print this help message.

Get a memory estimate by passing an ARPA file without an output file name.
Traceback (most recent call last):
  File "./data/lm/generate_lm.py", line 211, in <module>
    main()
  File "./data/lm/generate_lm.py", line 202, in main
    build_lm(args, data_lower, vocab_str)
  File "./data/lm/generate_lm.py", line 127, in build_lm
    "./data/lm/lm.binary",
  File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['./kenlm/build/bin/build_binary', '-a', '255', '-q', '8', '-v', 'tree', './data/lm/lm_filtered.arpa', './data/lm/lm.binary']' returned non-zero exit status 1.

My lm.arpa is created successfully, but it crashes when creating lm.binary. Do you have any idea why?

Thanks for your help!

I'd suggest you start by getting more details about the error by running the command that creates lm.binary directly (the script shows what it's doing, so take a look in there to figure out what you need to run in the terminal).
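Following that suggestion, here's a small sketch that rebuilds the build_binary call from the traceback above and runs it while capturing its output, instead of letting `check_call` raise. It also checks the type argument. The failing command passes `tree`, while build_binary's help text lists only `probing` or `trie`, so that may well be the culprit, though I can't be certain from here. The function names are hypothetical. It uses `stdout`/`stderr=PIPE` rather than `capture_output` because the traceback shows Python 3.6.

```python
import os
import subprocess

VALID_TYPES = ("probing", "trie")  # the only types build_binary's help text lists


def build_binary_cmd(kenlm_bins, arpa_path, binary_path,
                     binary_type="trie", a_bits=255, q_bits=8):
    """Rebuild the build_binary call from the traceback, with a type check."""
    if binary_type not in VALID_TYPES:
        raise ValueError("binary type must be 'probing' or 'trie', got %r" % binary_type)
    return [os.path.join(kenlm_bins, "build_binary"),
            "-a", str(a_bits), "-q", str(q_bits), "-v",
            binary_type, arpa_path, binary_path]


def run_and_report(cmd):
    """Run cmd and return (exit status, stdout + stderr) instead of raising."""
    # PIPE + universal_newlines rather than capture_output/text, which need 3.7+.
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)
    return proc.returncode, proc.stdout + proc.stderr
```

For example, `run_and_report(build_binary_cmd("./kenlm/build/bin", "./data/lm/lm_filtered.arpa", "./data/lm/lm.binary"))` returns the exit status together with everything build_binary printed.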

With that info you may be able to figure it out right away, or you can search around how KenLM works to track it down.

Also, I'd suggest you confirm you can generate the official scorer first, before branching off to make your own. If you know you can use the script to make the official one, you'll have some confidence your setup is workable (right now you don't know for sure that it works, and you've tried something new with it, so it's harder to isolate the problem). I realise it's tempting to try your own new thing, and people are often keen to run before they can walk :slightly_smiling_face:

Anyway, I’m sure that by being methodical you can figure it out yourself. Best of luck!

Thanks @nmstoker for your fast reply. I found my error and everything works well now.

Thanks again!


Hey kamil_BENTOUNES, I am getting the same error as you. Can you tell me what changes you made to solve it?

Thank you

Hello @Ahmad_Ali1,

Sorry, I don't really remember what I did to solve my error! But here are the commands I used to generate the .scorer file:

sudo python3 /path/to/generate_lm.py --input_txt /path/to/vocabulary.txt --output_dir ./path/to/output --top_k 500 --kenlm_bins /path/to/kenlm/build/bin --arpa_order 5 --max_arpa_memory "85%" --arpa_prune "0|0|1" --binary_a_bits 255 --binary_q_bits 8 --binary_type trie --discount_fallback

Then:

/path/to/native_client/generate_scorer_package --alphabet /path/to/alphabet.txt --lm /path/to/lm.binary --vocab /path/to/vocabulary.txt --package /path/to/output/use_case_eval.scorer --default_alpha 0.931289039105002 --default_beta 1.1834137581510284

I hope it helps!
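The same two steps can be scripted from Python. Here's a minimal sketch that builds both commands as argument lists, using the values from the post above (paths are placeholders, as in the post). Lists avoid shell quoting, which matters for values like `0|0|1` that contain a pipe. One thing to double-check against the docs for your version: the post passes the same vocabulary.txt to `--vocab`, whereas I believe the official scorer instructions pass the `vocab-<top_k>.txt` file that generate_lm.py writes to its output directory. The function names are hypothetical.

```python
import os
import subprocess
import sys


def scorer_commands(generate_lm, kenlm_bins, generate_scorer_package,
                    vocab_txt, alphabet_txt, out_dir, alpha, beta, top_k=500):
    """Return the generate_lm.py and generate_scorer_package commands as lists."""
    lm_cmd = [sys.executable, generate_lm,  # the post runs this with python3
              "--input_txt", vocab_txt,
              "--output_dir", out_dir,
              "--top_k", str(top_k),
              "--kenlm_bins", kenlm_bins,
              "--arpa_order", "5",
              "--max_arpa_memory", "85%",
              "--arpa_prune", "0|0|1",  # no shell, so the pipe needs no quoting
              "--binary_a_bits", "255",
              "--binary_q_bits", "8",
              "--binary_type", "trie",
              "--discount_fallback"]
    pkg_cmd = [generate_scorer_package,
               "--alphabet", alphabet_txt,
               "--lm", os.path.join(out_dir, "lm.binary"),  # written by step 1
               "--vocab", vocab_txt,  # as in the post; see the note above
               "--package", os.path.join(out_dir, "use_case_eval.scorer"),
               "--default_alpha", str(alpha),
               "--default_beta", str(beta)]
    return lm_cmd, pkg_cmd


def run_steps(commands):
    """Run each command in order, raising CalledProcessError at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

`run_steps(scorer_commands(...))` then stops at whichever step fails, so you know which of the two tools to investigate.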


Thanks for your helpful reply


Hi everyone,
I am creating a custom language model with DeepSpeech 0.7.4. I have 1 hour of audio, I created my own scorer file, and I am training for 500 epochs.
After that I used mic_vad_streaming.py, but my model does not work correctly.
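With a dataset this small, one quick sanity check is to confirm how much audio the CSVs actually list and that every file matches the format the model expects. Here's a sketch, assuming the `wav_filename` column of DeepSpeech's CSVs and the 16 kHz, 16-bit, mono format the released English models use (adjust if you changed the sample rate). `summarize_wavs` is a hypothetical helper.

```python
import csv
import wave


def summarize_wavs(csv_path):
    """Total the duration of the WAVs listed in a training CSV and collect
    files that are not 16 kHz, 16-bit, mono (assumed expected format)."""
    total_seconds = 0.0
    mismatched = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            with wave.open(row["wav_filename"], "rb") as w:
                rate = w.getframerate()
                total_seconds += w.getnframes() / rate
                # sample width is in bytes, so 2 means 16-bit
                if (rate, w.getsampwidth(), w.getnchannels()) != (16000, 2, 1):
                    mismatched.append(row["wav_filename"])
    return total_seconds, mismatched
```

Running it on train2.csv, dev.csv and test.csv shows how the hour of audio is split across them and whether any file needs resampling.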

%cd /content/DeepSpeech/

! python3 DeepSpeech.py \
  --train_files /content/drive/MyDrive/sound/train2.csv \
  --dev_files /content/drive/MyDrive/sound/dev.csv \
  --test_files /content/drive/MyDrive/sound/test.csv \
  --train_batch_size 1 \
  --test_batch_size 1 \
  --n_hidden 100 \
  --epochs 500 \
  --checkpoint_dir /content/drive/MyDrive/checkpoint3 \
  --export_dir /content/drive/MyDrive/model \
  --alphabet_config_path /content/drive/MyDrive/files/alphabet.txt \
  --scorer /content/drive/MyDrive/files/kenlm.scorer \
  --learning_rate 0.001 \