LM for names


I’m wondering how to build a language model (LM) for human names that can be used with DeepSpeech.

Option #1: Build a database of “First Last” names
Pros: works with KenLM directly
Cons: hard to find sources with valid “First Last” names.
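If separate first-name and last-name lists are easier to find than full names, one possible workaround for Option #1 is to pair them synthetically. The sketch below assumes two hypothetical input lists (`first_names`, `last_names`) and a hypothetical helper name. It produces one lowercase “first last” pair per line, which is the plain-text, one-sentence-per-line format KenLM’s `lmplz` reads. Lowercasing is an assumption based on the English DeepSpeech alphabet. Random pairing yields plausible combinations, not verified real names.

```python
import random


def build_full_name_corpus(first_names, last_names, n_pairs, seed=0):
    """Return n_pairs synthetic 'first last' lines (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the corpus is reproducible
    lines = []
    for _ in range(n_pairs):
        first = rng.choice(first_names)
        last = rng.choice(last_names)
        # Assumption: the target alphabet is lowercase (as in the English model)
        lines.append(f"{first.strip().lower()} {last.strip().lower()}")
    return lines
```

The lines could be written to a file, one per line, and passed to `lmplz` (e.g. `lmplz -o 2 < names_corpus.txt > names.arpa`). Sampling with a fixed seed keeps the corpus size bounded instead of taking the full cross product of both lists.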

Option #2: Build a database of mixed “First” or “Last” names.
Pros: easy to build such a database
Cons: KenLM doesn’t support unigram models
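For Option #2, a mixed list could still be given to KenLM as one name per line. Because `lmplz` wraps each line in `<s>` and `</s>` sentence markers, an order-2 model built from such a file would score single names through bigrams such as `<s> smith`. That might sidestep the unigram limitation, but it is an untested assumption, not a known DeepSpeech recipe. The sketch below only prepares the lines. `names` and the helper name are hypothetical.

```python
def build_single_name_corpus(names):
    """Normalize a mixed first/last name list, one name per line (hypothetical helper)."""
    lines = []
    for name in names:
        token = name.strip().lower()
        # Skip blank entries; duplicates are kept on purpose so that
        # frequent names get higher probability in the resulting model
        if token:
            lines.append(token)
    return lines
```

Keeping duplicates is a deliberate choice. Deduplicating would flatten the name distribution, so every name would look equally likely to the LM.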

Has anybody done this before? And could you please share your experience? Thanks!

Not to our knowledge. To date, other speech recognition engines also seem to have struggled badly with names, especially last names.