Building my own scorer for DeepSpeech

Jeremi_CAROL · November 10, 2021, 11:23am

Hello everyone,

I’m using DeepSpeech to build speech-to-text (STT) software for a specific domain, so I’m trying to build my own scorer as described in https://mozilla.github.io/deepspeech-playbook/SCORER.html
On that page we can read:
Preparing the text file
…
These phrases should not be copied from test.tsv, train.tsv or validated.tsv as you will bias the resultant model.
I don’t understand why giving the scorer phrases from the training or validation set would bias the model, and I can’t find a clear explanation online or in the DeepSpeech documentation. Can you help me?
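
For context, here is a minimal sketch of how I currently read that advice: drop from the scorer text any line that also appears as a transcript in the dataset TSV files. I am assuming the transcripts sit in a `sentence` column, as in the Common Voice TSV format; the function names and the in-memory sample data are placeholders of mine, not anything from the playbook.

```python
import csv
import io


def normalize(sentence):
    # Lowercase and collapse whitespace so trivial differences do not hide a match.
    return " ".join(sentence.lower().split())


def load_tsv_sentences(tsv_file, column="sentence"):
    # tsv_file is any open text file; `column` is assumed to hold the transcript.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {normalize(row[column]) for row in reader if row.get(column)}


def filter_scorer_lines(lines, excluded):
    # Keep non-empty lines whose normalized form is not in the excluded set.
    return [
        line.strip()
        for line in lines
        if line.strip() and normalize(line) not in excluded
    ]


if __name__ == "__main__":
    # Tiny in-memory stand-ins for test.tsv and the scorer text file.
    tsv = io.StringIO("path\tsentence\nclip1.mp3\tHello world\n")
    corpus = ["hello world", "a domain specific phrase"]
    print(filter_scorer_lines(corpus, load_tsv_sentences(tsv)))
```

With real data I would open train.tsv, test.tsv and validated.tsv in turn, take the union of the returned sets, and filter my domain text against it before building the scorer. That is only my interpretation of the warning, which is why I am asking whether the exclusion is really necessary.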

I also wonder why the LibriSpeech text corpus used by default for the DeepSpeech scorer contains lines like:

A
A A
A A A
A A A A
A A A A A
A A A A A A A A A A A A A A
A A A A A AH
A A A A A AH THE CRY WAS WRUNG FROM JOHNNIE
A A A A A BOVE SECOND SINGER DIMINUENDO
A A A A A MEN

Why are there so many A’s before every sentence?

Best regards