Can anyone tell me the WER on Switchboard when training only on the Switchboard dataset? And the WER on TED when training only on the TED dataset? What I actually want to know is how DeepSpeech performs when trained on a small dataset (around 200 hours).
As far as I remember, you cannot expect good results from datasets that are too small: the WER will not reach a good level unless the network overfits so much that it is barely usable for anything more general. That said, it may still work in your case. What exactly are you trying to achieve?
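For comparing numbers across setups, it helps to be explicit about how WER is computed: the word-level edit distance (substitutions + deletions + insertions) between reference and hypothesis, divided by the number of reference words. Below is a minimal sketch in plain Python; the function name `word_error_rate` is made up for illustration and is not part of DeepSpeech. It assumes whitespace tokenization and no text normalization, which may not match how published figures were scored.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are split on whitespace, with no casing or
    punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

Computing this on both a held-out split of your own data and on some unrelated speech is one way to see the overfitting effect mentioned above: a low WER on the former and a much higher WER on the latter would suggest the model does not generalize. Note that WER can exceed 1.0 when the hypothesis contains many inserted words.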
I have a small (around 200 hours), homogeneous dataset of telephone conversations. I am curious which algorithm I should use: deep learning or an HMM-based approach. Within deep learning there are also CNN- and RNN-based approaches. How much accuracy can I achieve if my test set is similar to my training set, i.e. also telephone conversations?