Differences from core architecture and optional RNN-T decoding in future?

With reference to the following:

The architecture of the engine was originally motivated by that presented in Deep Speech: Scaling up end-to-end speech recognition. However, the engine currently differs in many respects from the engine it was originally motivated by. The core of the engine is a recurrent neural network (RNN) trained to ingest speech spectrograms and generate English text transcriptions.

Can anyone provide pointers to the stated differences? (Blogs/documentation/publications)

Also, can we look forward to (a compatible) RNN-T option for decoding in the near future?
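For anyone landing here from search, here is a minimal sketch of the kind of spectrogram-to-text RNN described in the quoted docs. It is illustrative PyTorch with made-up layer sizes, not the actual DeepSpeech implementation (which is TensorFlow-based and has its own layer configuration):

```python
import torch
import torch.nn as nn

class SpectrogramRNN(nn.Module):
    """Toy RNN acoustic model: spectrogram frames in, per-frame character
    log-probabilities out (decoded with CTC). Sizes are illustrative only."""

    def __init__(self, n_mel_bins=80, hidden=256, n_chars=29):
        # 29 = 26 letters + space + apostrophe + CTC blank (an assumption for this sketch)
        super().__init__()
        self.rnn = nn.LSTM(n_mel_bins, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_chars)

    def forward(self, spectrogram):          # (batch, time, n_mel_bins)
        features, _ = self.rnn(spectrogram)  # (batch, time, 2 * hidden)
        return self.out(features).log_softmax(dim=-1)

# Training would pair these per-frame outputs with transcripts via nn.CTCLoss.
# An RNN-T model would instead add a prediction network and a joint network.
logits = SpectrogramRNN()(torch.randn(1, 200, 80))
print(logits.shape)  # torch.Size([1, 200, 29])
```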

You are referring to these docs, right? Because they state the currently used architecture:

https://deepspeech.readthedocs.io/en/v0.7.1/DeepSpeech.html

@reuben and @lissyx can tell you more about the plans for mobile DeepSpeech including RNN-T.

And for next time, please try searching the forum and the GitHub repo first; this saves us all some time. Hint: there is info about transducers there.


I apologize for any inconvenience. I will certainly keep that in mind.

I came from the documentation and was wondering if there is an article or document, other than the architecture documentation, that discusses these differences explicitly.

Thank you for the hint. :slight_smile:

I generally avoid pinging the core team. Anyway, thanks for mentioning them!

Hello!

Since we are talking about architecture and future options, I would like to hear Reuben’s thoughts on the QuartzNet tests.