LSTM vs. GRU

The blog post "A Journey to <10% Word Error Rate" mentions that Mozilla DeepSpeech uses LSTM instead of GRU for its recurrent units.

What was the reasoning behind this decision?

Generally, LSTMs seem to outperform GRUs, but not universally.

So, with this general rule of thumb in mind, we tried LSTMs. However, we are currently running many benchmarks to see which is best, and we will have experimental validation for our final choice.
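One concrete trade-off such benchmarks weigh against accuracy is model size. An LSTM cell has four gate weight blocks (input, forget, cell candidate, output), while a GRU cell has three (reset, update, candidate), so at equal sizes an LSTM layer carries about 4/3 as many parameters. The sketch below assumes the textbook formulation with one bias vector per gate. Some frameworks use two bias vectors per gate, which changes the totals slightly. The layer sizes in the usage example are hypothetical and are not DeepSpeech's actual configuration.

```python
def lstm_params(input_size: int, hidden_size: int) -> int:
    # Four gate blocks (input, forget, cell candidate, output), each with
    # input weights, recurrent weights, and a single bias vector (assumed).
    per_gate = hidden_size * (input_size + hidden_size) + hidden_size
    return 4 * per_gate


def gru_params(input_size: int, hidden_size: int) -> int:
    # Three gate blocks (reset, update, candidate), same assumed layout.
    per_gate = hidden_size * (input_size + hidden_size) + hidden_size
    return 3 * per_gate


if __name__ == "__main__":
    # Hypothetical layer sizes chosen only for illustration.
    n_in, n_hidden = 2048, 2048
    print("LSTM:", lstm_params(n_in, n_hidden))
    print("GRU: ", gru_params(n_in, n_hidden))
```

With these sizes the LSTM layer has 33,562,624 parameters and the GRU layer has 25,171,968. Whether the extra capacity pays off in word error rate is exactly what the benchmarks have to settle.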