The 6.5% was with v0.1, but that turned out to be affected by the test data being accidentally included in the language model. v0.2 then had an increase in the error rate, both from reducing the model size for streaming and from removing the offending data from the language model's training corpus. v0.3 fixed some non-deterministic behavior bugs that were introduced in v0.2, but the acoustic model released is the same as the one in v0.2, so the v0.3 error rate is probably the more accurate figure. We're working on different features that will hopefully bring the WER back down to under 10%.
Can you please expand on this? I don't understand it clearly. Does it mean the transcripts of the test data set were included in the language corpus when building the language model?
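That's the usual reading of it: the evaluation transcripts ended up in the text corpus the language model was trained on, so the decoder is biased toward exactly the sentences it is scored on, and the measured WER comes out optimistically low. Purely for illustration (this is not the project's actual tooling, and the file names are made up), here is a minimal Python sketch of the kind of filtering step "removing the offending data" would involve, assuming the LM is built from a plain-text corpus with one sentence per line:

```python
# Hypothetical sketch: drop any LM-corpus line that matches a test
# transcript before building the n-gram language model.
# File names ("test_transcripts.txt", "lm_corpus.txt") are assumptions.

def normalize(line: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting
    # differences don't hide an exact-transcript match.
    return " ".join(line.lower().split())

# Transcripts the model is evaluated on; these must not appear in the
# LM training text, or the LM can effectively memorize the test set.
with open("test_transcripts.txt", encoding="utf-8") as f:
    test_set = {normalize(line) for line in f}

kept = dropped = 0
with open("lm_corpus.txt", encoding="utf-8") as src, \
     open("lm_corpus_clean.txt", "w", encoding="utf-8") as dst:
    for line in src:
        if normalize(line) in test_set:
            dropped += 1          # leaked test sentence: exclude it
        else:
            dst.write(line)
            kept += 1

print(f"kept {kept} lines, dropped {dropped} leaked test sentences")
# lm_corpus_clean.txt would then feed the actual LM build step.
```

Note this only catches exact (whitespace/case-insensitive) duplicates; near-duplicates would need fuzzier matching, but the principle is the same.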