Recognizing phone numbers

I’m interested in using DeepSpeech to transcribe voicemails I receive. So far the biggest limitation in recognition is phone numbers. DeepSpeech seems biased against audio containing a string of digits in a row, which makes sense: saying “five five five three four four …” is uncommon in normal speech but very common in voicemails.

Is it possible to train a model that starts from the pre-trained model but also learns from additional known-good transcripts I have? Maybe feeding it a lot of recordings of people saying phone numbers would improve this.

This is definitely possible and one of the reasons we release the checkpoints[1].

What one does is “fine-tune” the model: continue training from the released checkpoint using your data set containing phone numbers.
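
As a rough sketch of the data-preparation side of this, the snippet below builds a training manifest from your own recordings. It assumes the three-column CSV layout (wav_filename, wav_filesize, transcript) that the DeepSpeech importers produce, and it assumes transcripts should have digits spelled out as words so they match what is actually spoken. The helper names (spell_out_digits, write_manifest) are made up for illustration, so check the layout against the version you are training with.

```python
import csv
import os
import re

# Spoken form of each digit; assumes speakers read numbers digit by digit.
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def spell_out_digits(text):
    """Lowercase the text, replace each digit with its word, drop punctuation."""
    spaced = re.sub(r"\d", lambda m: " " + DIGIT_WORDS[m.group()] + " ", text.lower())
    # Keep only letters, apostrophes and spaces (an assumed English alphabet).
    cleaned = re.sub(r"[^a-z' ]", " ", spaced)
    return " ".join(cleaned.split())


def write_manifest(samples, out):
    """Write (wav_path, transcript) pairs to an open text file as CSV rows.

    The column names are an assumption about the expected manifest layout.
    """
    writer = csv.writer(out)
    writer.writerow(["wav_filename", "wav_filesize", "transcript"])
    for wav_path, transcript in samples:
        size = os.path.getsize(wav_path)  # file size in bytes
        writer.writerow([wav_path, size, spell_out_digits(transcript)])
```

For example, spell_out_digits("Call me at 555-344.") gives "call me at five five five three four four". The resulting CSV would then be passed as the training set when you resume training from the checkpoint.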

In addition, you may need to rebuild the language model, using KenLM[2], and the trie from text that contains more numbers.
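
One way to get "text that contains more numbers" is to generate it. The sketch below produces spoken-style phone number sentences, one per line, which is the plain-text form KenLM builds a language model from. The carrier phrases, the 10-digit length, and the function names are all assumptions for illustration; real voicemail transcripts would be a better source if you have them.

```python
import random

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

# Hypothetical carrier phrases; replace with wording from your own voicemails.
CARRIERS = [
    "my number is {}",
    "call me back at {}",
    "you can reach me at {}",
    "{}",
]


def spoken_number(rng, length=10):
    """Return `length` random digits spelled out, e.g. 'five five five ...'."""
    return " ".join(DIGIT_WORDS[rng.randrange(10)] for _ in range(length))


def phone_number_sentences(count, seed=0):
    """Generate `count` sentences that each contain one spoken phone number."""
    rng = random.Random(seed)  # seeded so the corpus is reproducible
    return [rng.choice(CARRIERS).format(spoken_number(rng)) for _ in range(count)]


def append_to_corpus(path, sentences):
    """Append the sentences to an existing text corpus, one per line."""
    with open(path, "a", encoding="utf-8") as corpus:
        for sentence in sentences:
            corpus.write(sentence + "\n")
```

The generated lines would be mixed into the original language model text before rebuilding, so that ordinary sentences stay well represented alongside the digit strings.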