How does DeepSpeech work?

Is there an article or description of how DeepSpeech works?
Does it analyse syllables?

It’s based on a scientific paper by Baidu, "Deep Speech: Scaling up end-to-end speech recognition" (Hannun et al., 2014). You can read all about the method here: https://arxiv.org/abs/1412.5567
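On the syllable question: the model in that paper is a recurrent neural network trained with the CTC (Connectionist Temporal Classification) loss. For every short time step of audio it outputs a probability for each character of the alphabet. It does not output syllables or phonemes. The transcript is then read off those per-frame character probabilities. Below is a minimal sketch of the simplest way to do that, greedy CTC decoding: pick the best symbol per frame, merge repeats, drop the special "blank" symbol. The alphabet, the function names (`ctc_greedy_decode`, `frames_from`) and the one-hot input frames are hypothetical stand-ins for a real network's output. The paper itself combines the network output with an n-gram language model in a beam search, so treat this greedy version as a simplification.

```python
# Hypothetical alphabet: index 0 is the CTC "blank" symbol, then space and a-z.
ALPHABET = ["<blank>", " "] + [chr(c) for c in range(ord("a"), ord("z") + 1)]
BLANK = 0


def ctc_greedy_decode(frame_probs):
    """Collapse per-frame character probabilities into a transcript.

    frame_probs: one list per audio frame, each holding one probability per
    symbol in ALPHABET (what a softmax output layer would produce).
    """
    # 1. Pick the most likely symbol in every frame.
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars = []
    prev = None
    for idx in best:
        # 2. Merge consecutive repeats and 3. drop blanks.
        if idx != prev and idx != BLANK:
            chars.append(ALPHABET[idx])
        prev = idx
    return "".join(chars)


def frames_from(symbols):
    """Build toy one-hot 'probability' frames from a list of symbols."""
    frames = []
    for s in symbols:
        row = [0.0] * len(ALPHABET)
        row[ALPHABET.index(s)] = 1.0
        frames.append(row)
    return frames


frames = frames_from(["h", "h", "<blank>", "e", "l", "l", "<blank>", "l", "o"])
print(ctc_greedy_decode(frames))  # prints "hello"
```

The blank between the two "l" groups is what lets CTC spell a genuine double letter: "l l" collapses to one "l", while "l <blank> l" keeps both.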

Thank you very much!

There’s also a video from FOSDEM 2018 with some details: https://youtu.be/_VCfnHZvmBU