Training my own scorer for DeepSpeech 0.7.4

ed15v · December 8, 2020, 6:58pm

Hi there,

I came across a pre-trained model for DeepSpeech 0.7.4 and want to adapt it to my use case. I use it with a voice assistant, so I created a list of all words that the assistant recognizes and called it alphabet.txt.

In the guide it says something about generating a KenLM model. Do I have to create my own, and if so, what kind of input does it need? Or should I stick with the given one and only create the scorer with my alphabet.txt?

Thank you in advance

lissyx · December 8, 2020, 7:00pm

Have you read "External scorer scripts" in the Mozilla DeepSpeech 0.10.0-alpha.3 documentation?

ed15v · December 8, 2020, 7:11pm

Yeah, but it’s still not clear to me. I already have an lm.binary that was created from a large text corpus. My question is whether it makes more sense to reuse it or, for example, to generate a file with all the possible sentences that can occur instead. Or to use a combination of the text corpus and the voice assistant's sentences.
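For illustration, a file with all the possible sentences could be produced by expanding intent templates. This is only a rough sketch: the templates, the slot names and the `build_corpus` helper are all hypothetical and are not part of DeepSpeech or KenLM. It assumes the language model is built from plain text with one lowercase sentence per line.

```python
import string
from itertools import product

# Hypothetical intent templates and slot values; replace with the assistant's own.
TEMPLATES = [
    "turn {state} the {device}",
    "what is the weather in {city}",
]
SLOTS = {
    "state": ["on", "off"],
    "device": ["light", "radio"],
    "city": ["berlin", "paris"],
}


def expand(template, slots):
    """Yield every sentence obtained by filling the template's slots."""
    # Collect the slot names that appear in the template, in order.
    names = [field for _, field, _, _ in string.Formatter().parse(template) if field]
    # Try every combination of values for those slots.
    for values in product(*(slots[name] for name in names)):
        yield template.format(**dict(zip(names, values))).lower()


def build_corpus(templates, slots):
    """Return all sentences for all templates, one string per sentence."""
    sentences = []
    for template in templates:
        sentences.extend(expand(template, slots))
    return sentences


if __name__ == "__main__":
    # One sentence per line; redirect the output to a text file if needed.
    for sentence in build_corpus(TEMPLATES, SLOTS):
        print(sentence)
```

The resulting lines could be used on their own or appended to a larger text corpus. Which of the two works better for a given assistant would have to be tested.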

lissyx · December 8, 2020, 7:12pm

Please define your problem; the correct handling will be obvious once you state exactly what you are trying to achieve.

ed15v · December 8, 2020, 7:29pm

I have a custom voice assistant with different skills installed, so there are various intents that can be triggered. I want to improve the accuracy of the speech-to-text so that the natural language processing has an easier time when there is ambiguity.

I’ve installed a custom model that was trained on different speech corpora, including Mozilla Common Voice. Since this model is for general use, I want to adapt it to my use case so that it becomes more accurate.

I have already created a list of all words that can occur when interacting with my voice assistant. I’m not sure whether I also need to write out all possible sentences in order to generate my own lm.binary, or whether using the given lm.binary is better in this case.
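As a small illustration of the difference between a word list and a list of sentences: an n-gram language model is estimated from word sequences, and a file with one word per line contains no word-to-word context. The sketch below counts bigrams in both kinds of input. The `ngram_counts` helper and the sample data are made up for this example, and KenLM performs its own counting and smoothing, so this only shows the idea.

```python
from collections import Counter


def ngram_counts(lines, order=2):
    """Count n-grams per line, padding each line with <s> and </s> markers."""
    counts = Counter()
    for line in lines:
        tokens = ["<s>"] + line.lower().split() + ["</s>"]
        for i in range(len(tokens) - order + 1):
            counts[tuple(tokens[i:i + order])] += 1
    return counts


# Hypothetical sample data: the same vocabulary, once as words, once as sentences.
word_list = ["turn", "on", "off", "the", "light"]
sentences = ["turn on the light", "turn off the light"]

from_words = ngram_counts(word_list)
from_sentences = ngram_counts(sentences)

if __name__ == "__main__":
    # Every bigram from the word list touches a sentence boundary marker.
    print(sorted(from_words))
    # The sentences also yield word-to-word bigrams such as ("the", "light").
    print(sorted(from_sentences))
```

With the word list, every bigram involves a boundary marker, so the model learns nothing about which word tends to follow which. The sentences supply that ordering information.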
