Any reason 0.5.x models weren't trained on Common Voice data this time?

Comparing the 0.5.0 and 0.5.1 release notes with those from 0.4.0 and earlier, Common Voice no longer seems to be listed in the training set (the list of sources is given under the Hyperparameters section):

That seems to be backed up by @lissyx's comment here: Fine-tuning DeepSpeech Model (CommonVoice-DATA)

Is there any particular reason this was done?

Mainly I’m just curious, but I’m also in the process of trying to fine-tune with English English data extracted from Common Voice, and I thought it would be useful to know in case the CV data was left out of the 0.5.x models for some reason that would affect my efforts too.

It was just an oversight when training the 0.5.0 model. We’ll be back to business as usual in the next release.

Ah! Thanks for clarifying.

Oh, I didn’t realize the 0.5 model didn’t contain Common Voice data. I guess that’s probably the reason for the regressions I posted about here:

Is Common Voice data going to be included in 0.6.0, or in 0.5.2?

More likely in 0.6.0; there won’t be a 0.5.2.

Hello, I would like to add a question: is it reasonable to fine-tune a model that has already been trained on CV data (e.g. v0.4.0) using CV data?
Will I always get worse results on a standard test set compared to Mozilla’s exported model?
Do I have any chance of outperforming it?

Yes, for some use cases it’s reasonable.

For example, 0.4.0 was trained on various data sets. Common Voice made up about one tenth of the training data for 0.4.0, and the remaining nine tenths had a bit less noise than Common Voice. If your use case involves data similar to Common Voice, i.e. data with a bit of noise, it would make sense to fine-tune the 0.4.0 model on Common Voice to make it more robust to noise.

I don’t think 0.4.0 is optimal, so yes, you’d have a chance to outperform it.
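In case it helps, a fine-tuning run along those lines might look roughly like the sketch below. All paths are placeholders, and the flag names assume the DeepSpeech.py training CLI of the 0.4.x era; check the flags against your checked-out version before running.

```shell
# Hypothetical fine-tuning run against a released 0.4.0 checkpoint.
# All paths here are placeholders.
# --n_hidden must match the released model's geometry (2048 for 0.4.0);
# a negative --epochs value means "train this many additional epochs" on
# top of the checkpoint, and a lowered learning rate helps keep fine-tuning
# from clobbering the pretrained weights.
python3 DeepSpeech.py \
  --n_hidden 2048 \
  --checkpoint_dir path/to/0.4.0-checkpoint \
  --epochs -3 \
  --learning_rate 0.0001 \
  --train_files cv/cv-train.csv \
  --dev_files cv/cv-dev.csv \
  --test_files cv/cv-test.csv
```

Since training resumes from the checkpoint directory, the model keeps what it learned from the original mix of corpora while adapting to the Common Voice audio.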

Thank you, it’s clear now!