If you’re looking to implement this with a pre-trained model, a class-based language model paired with the new decoder works reasonably well. However, the pre-trained model’s vocabulary covers few proper names, and as currently implemented it’s too slow for real-time inference on mobile devices. If you’re training on your own audio, a smaller model (1024 or 1536 hidden units wide) can run in real time on a Snapdragon 835, for example, and paired with a class-based LM it should work reasonably well.
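To illustrate why a class-based LM helps with proper names, here is a minimal toy sketch (not the actual decoder integration; all names, class labels, and probabilities below are made up for illustration). Words are mapped to classes such as `<NAME>`, the n-gram model is trained over class sequences, and each word contributes P(word | class). New proper names can then be added to a class without retraining the n-gram counts:

```python
import math

# Hypothetical word-to-class mapping; proper names share one class.
CLASS_OF = {"call": "call", "alice": "<NAME>", "bob": "<NAME>", "home": "home"}

# Within-class unigram probabilities P(word | class), hand-set here.
CLASS_MEMBERS = {
    "<NAME>": {"alice": 0.5, "bob": 0.5},
    "call": {"call": 1.0},
    "home": {"home": 1.0},
}

# Bigram probabilities over classes (illustrative values only).
CLASS_BIGRAM = {
    ("<s>", "call"): 0.9,
    ("call", "<NAME>"): 0.7,
    ("call", "home"): 0.3,
}

def log_prob(words):
    """Score a sentence as the sum of class-bigram and
    word-given-class log-probabilities."""
    lp = 0.0
    prev = "<s>"
    for w in words:
        c = CLASS_OF[w]
        lp += math.log(CLASS_BIGRAM[(prev, c)])
        lp += math.log(CLASS_MEMBERS[c][w])
        prev = c
    return lp
```

With this factorization, "call alice" and "call bob" receive identical class-sequence scores, and supporting a new contact name only means extending `CLASS_MEMBERS["<NAME>"]`, not rebuilding the whole LM.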