Possibilities for further tweaking/training the existing scorer and acoustic model

Hi all, I am very new to all this, so please accept my apologies if anything below does not make sense…

First I would like to ask if I understand two things correctly:


The whole solution architecture is composed of three layers/components:

Acoustic model + decoder (.pbmm file)

Language model (included in the .scorer file)

Classification model (TRIE) (included in the .scorer file)


In newer versions, the Language model and TRIE are combined into one component called the Scorer.


Usually the best way to create the Acoustic model + decoder (pbmm) and Scorer (scorer) components is to use the same text input set (plus the corresponding WAV files in the case of the Acoustic model).

Assuming the answers to the previous questions are yes, I would like to ask a few more specific questions:


a) Is it possible to amend the pre-built Scorer (based on the LibriSpeech corpus) with custom text input? I.e. keep using it but „tweak it“ by adding specific content (text input)?

b) If so, is it possible to increase the statistical weight of the added text so that it takes precedence over the „standard“ content (from the LibriSpeech corpus)?


Is it possible to do something similar with the Acoustic model + decoder component? I.e. take the supplied (example) model and train it further on specific content (text + audio files) to make it more effective in a specific area of use?

Thanks a lot for any input on this.

The trie is just a prefix tree encoding the vocabulary of the model.
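
To make "prefix tree" concrete, here is a minimal Python sketch of the general data structure. It is only an illustration, not the actual format stored inside the .scorer file; the class and method names (`Trie`, `insert`, `has_prefix`, `contains`) and the sample words are made up for this example.

```python
class TrieNode:
    """One node of the prefix tree: child nodes keyed by character."""

    def __init__(self):
        self.children = {}
        self.is_word = False  # True if a vocabulary word ends at this node


class Trie:
    """Hypothetical minimal prefix tree over a word vocabulary."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        # Walk down the tree, creating nodes for characters not seen yet.
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _walk(self, text):
        # Follow the characters of `text`; return None if the path breaks.
        node = self.root
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def has_prefix(self, prefix):
        # True if at least one vocabulary word starts with `prefix`.
        return self._walk(prefix) is not None

    def contains(self, word):
        # True only if `word` itself was inserted.
        node = self._walk(word)
        return node is not None and node.is_word


vocab = Trie()
for w in ["speech", "speed", "spell"]:
    vocab.insert(w)
```

A decoder can use a structure like this to discard partial character sequences that cannot lead to any word in the vocabulary, e.g. `vocab.has_prefix("spa")` is false for the three words above.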

Incorrect: it’s best to build the scorer from more text data than just your training transcripts.

Yes. LibriSpeech is open source, and so are the scripts we use to build the Scorer; just modify them according to your needs.

Not sure, probably not without writing some code.
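
One thing "writing some code" could mean, purely as an assumption and not something the Scorer scripts are confirmed to support, is repeating the custom text several times in the corpus before the language model is built, so that its n-gram counts grow relative to the LibriSpeech text. The toy sketch below shows the effect on plain unigram relative frequencies only; the function names, the `repeat` parameter and the sample sentences are hypothetical, and a real smoothed language model will not scale this simply.

```python
from collections import Counter


def build_weighted_corpus(base_sentences, custom_sentences, repeat=5):
    """Return the base text plus `repeat` copies of the custom text.

    Assumption: duplicating domain sentences is used here as a crude
    way to raise their counts; this is not a documented Scorer option.
    """
    return list(base_sentences) + list(custom_sentences) * repeat


def unigram_probs(sentences):
    """Relative frequency of each whitespace-separated word."""
    counts = Counter(word for s in sentences for word in s.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


# Made-up stand-ins for the general corpus and the custom text.
base = ["the cat sat on the mat", "the dog ran home"]
custom = ["start the centrifuge"]

plain = unigram_probs(build_weighted_corpus(base, custom, repeat=1))
boosted = unigram_probs(build_weighted_corpus(base, custom, repeat=5))
```

Comparing `plain["centrifuge"]` with `boosted["centrifuge"]` shows the custom word taking a larger share of the probability mass once its sentences are repeated. Whether that helps recognition in practice would have to be tested on your own data.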

Yes, it’s called fine-tuning.

Thank you for the clarification!