I’ve been reading the forums and other resources online, looking for different ways to alter the language model behind DeepSpeech.
The data/lm/README.md
seems pretty straightforward on how to train a language model using KenLM, including how to update the language model with additional vocabulary.
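For reference, once the ARPA/binary model is built per that README, it can be sanity-checked from Python with the kenlm package. This is just a sketch; "lm.binary" is a placeholder for wherever your model actually lives:

```python
import kenlm

# Load the binary model produced by KenLM's lmplz / build_binary steps
# ("lm.binary" is a placeholder path).
model = kenlm.Model("lm.binary")

# score() returns the total log10 probability of the sentence;
# bos/eos add begin/end-of-sentence context like the decoder would.
print(model.score("the quick brown fox", bos=True, eos=True))

# Per-token breakdown: (log10 prob, n-gram length used, is_oov) for each
# word, handy for checking whether new vocab actually made it into the LM.
for prob, ngram_len, oov in model.full_scores("my new vocab word"):
    print(prob, ngram_len, oov)
```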
I’ve also seen posts such as:
Fine tuning the language model
and
TUTORIAL : How I trained a specific french model to control my robot
which walk through examples of different ways of using the language model.
But I was wondering if anyone has tried another kind of language model, i.e. one not built with KenLM, such as BERT?
I’ve been looking at BERT lately (a state-of-the-art language model achieving the best results on many language tasks) and was wondering how it would fare behind the DeepSpeech acoustic model.
I’m aware it’s probably not a straightforward swap, since BERT is a masked model rather than the left-to-right scorer the CTC beam-search decoder expects, but I plan on spending a few more days trying to figure out if it’s possible.
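One way I could imagine wiring it up (purely a sketch, nothing I’ve run against DeepSpeech): rather than plugging BERT into the decoder itself, use it to rescore the decoder’s N-best transcripts with a pseudo-log-likelihood, masking one token at a time. The model name, the scoring trick, and the example N-best list below are all my assumptions:

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

# Assumed setup: plain pre-trained BERT from Hugging Face transformers.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Score a candidate transcript by masking each token in turn and
    summing BERT's log-probability of the original token (a common
    'pseudo-log-likelihood' trick, not a true sentence probability)."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        log_probs = torch.log_softmax(logits, dim=-1)
        total += log_probs[ids[i]].item()
    return total

# Hypothetical usage: rescore an N-best list from the acoustic model's
# decoder and keep the candidate BERT finds most fluent.
nbest = ["i scream for ice cream", "eye scream four ice cream"]
print(max(nbest, key=pseudo_log_likelihood))
```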
If anyone has any input or has tried something similar, I’d love to hear about it.