I tried to create my own scorer by following the instructions at https://github.com/mozilla/STT/blob/master/doc/Scorer.rst
Unfortunately, on a test audio recording, my own scorer produces “wsauanatfuantdav” instead of “why should one hall on the way”.
If I decrease the default_alpha parameter to about 0.001, it produces “wsae ae ae a ta w” instead.
I assumed the problem could be caused by incorrect scripts. To test this hypothesis, I built a scorer from the corpus mentioned in the instructions. With that scorer, the phrase was recognized correctly.
If anyone has run into a similar problem when building a scorer from their own data, please help me solve it.
Please explicitly state the exact steps you took. Often people say “I followed the documentation” and it turns out they did something wrong.
What model are you using? What data source for the scorer? Parameters?
I don’t understand what you mean.
Please correctly state your problem, right now it’s too vague.
- python3 generate_lm.py --input_txt …/…/…/deepspeech_data/simple_corpus.txt --output_dir ./ --top_k 500000 --kenlm_bins …/…/…/deepspeech_data/kenlm/build/bin/ --arpa_order 5 --max_arpa_memory "85%" --arpa_prune "0|0|1" --binary_a_bits 255 --binary_q_bits 8 --binary_type trie
- ./generate_scorer_package --alphabet /mount/export2/alphabets/vocab_en.txt --lm lm.binary --vocab vocab-500000.txt --package kenlm.scorer --default_alpha 0.931289039105002 --default_beta 1.1834137581510284
I used the given model mozilla/deepspeech-0.8.0-models.tflite
Unfortunately, I am not allowed to provide my corpus. My corpus is in English and contains approximately 1900 unique words.
The parameters I’ve tried using:
- --default_alpha 0.931289039105002 --default_beta 1.1834137581510284
- --default_alpha 0.0001 --default_beta 1.1834137581510284
- --default_alpha 3.756 --default_beta 4.756
All other parameters are the same as in the instructions.
The phrases recognized on my corpus with the three parameter sets above, in order:
- “wsauanatfuantdav”
- “wsae ae ae a ta w”
- “wsauanatfuantdav”
Sorry for my English. I mean I tried using the LibriSpeech corpus.
My problem is that when I use my own corpus, recognition is incorrect, even though the ARPA file I get looks correct.
Where are those values coming from?
From these instructions: DeepSpeech/doc/Scorer.rst at master · mozilla/DeepSpeech · GitHub
I also forgot to clarify that I had already used my corpus in another DeepSpeech implementation with these parameters: --default_alpha 3.756 --default_beta 4.756.
There I got good recognition accuracy.
Have you read and understood what those parameters are doing? They need to be adjusted for your pair of dataset and scorer. So if --default_alpha 3.756 --default_beta 4.756 works well, where is the problem?
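As a rough illustration of why these two values matter (a simplified sketch, not the decoder's actual code): the beam search is commonly described as ranking each candidate transcript by its acoustic log-probability, plus alpha times the language-model log-probability, plus beta times the word count. The function name `combined_score` and all numbers below are made up for illustration.

```python
def combined_score(acoustic_logp, lm_logp, word_count, alpha, beta):
    """Simplified candidate score: acoustic + alpha * LM + beta * word count."""
    return acoustic_logp + alpha * lm_logp + beta * word_count


# Hypothetical log-probabilities for two candidate transcripts of one clip.
# The acoustic model alone slightly prefers the garbled string; the language
# model strongly prefers the real sentence.
candidates = {
    "why should one hall on the way": {"acoustic": -12.0, "lm": -9.0, "words": 7},
    "wsauanatfuantdav": {"acoustic": -10.0, "lm": -30.0, "words": 1},
}


def best(alpha, beta):
    """Return the candidate with the highest combined score."""
    return max(
        candidates,
        key=lambda text: combined_score(
            candidates[text]["acoustic"],
            candidates[text]["lm"],
            candidates[text]["words"],
            alpha,
            beta,
        ),
    )


print(best(0.931289039105002, 1.1834137581510284))
print(best(0.0, 0.0))
```

With both weights at zero the language model has no say and the acoustic favourite wins, which is why alpha and beta have to be tuned for a specific acoustic model and scorer rather than carried over unchanged.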
I understand what these parameters are responsible for.
These parameters work well with another DeepSpeech implementation, but I need them to work well with Mozilla’s DeepSpeech implementation.
What do you mean? I don’t understand.
I mean that on the other DeepSpeech implementation, an LM built only with KenLM shows good results.
At the same time, an LM built according to the above-mentioned instructions for Mozilla's DeepSpeech does not recognize speech correctly.
This is a completely different project.
Sorry, but you are mixing a lot of things, and different projects here, I have no idea what you are doing.
Thanks for the help. I just figured out what the problem is. When generating the scorer package, I used my own alphabet, which is different from the alphabet used for training deepspeech-0.8.0-models.tflite.
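For anyone hitting the same symptom, one quick check is to compare the two alphabet files label by label. The sketch below is a hypothetical helper, not part of DeepSpeech. It assumes the alphabet file has one label per line, that lines starting with `#` are comments, that `\#` stands for a literal `#`, and that label order matters because the acoustic model's outputs are indexed by position.

```python
from itertools import zip_longest


def parse_alphabet(text):
    """Return the labels found in an alphabet file's text, in file order."""
    labels = []
    for line in text.split("\n"):
        line = line.rstrip("\r")
        if line == "" or line.startswith("#"):
            continue  # assumption: blank lines and '#' lines are not labels
        if line == "\\#":
            line = "#"  # assumption: '\#' escapes a literal '#'
        labels.append(line)  # a line holding a single space is the space label
    return labels


def diff_alphabets(mine, reference):
    """List (index, my_label, reference_label) for every position that differs."""
    return [
        (index, a, b)
        for index, (a, b) in enumerate(zip_longest(mine, reference))
        if a != b
    ]


# Tiny inline example; with real files, read each file's text first.
reference = parse_alphabet("# model alphabet\n \na\nb\nc\n'\n")
mine = parse_alphabet(" \na\nc\nb\n")
for index, a, b in diff_alphabets(mine, reference):
    print(f"index {index}: mine={a!r} reference={b!r}")
```

To use it on real files, pass the text of the alphabet given to `generate_scorer_package` (here `/mount/export2/alphabets/vocab_en.txt`) and the text of the alphabet file that ships with the acoustic model; an empty result means the two agree in both content and order.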
Not unusual, and that’s why “I followed the docs” is not enough for us to help diagnose.