Comparing the 0.5.0 and 0.5.1 release notes with those from 0.4.0 and earlier, the newer notes do not seem to list Common Voice in the training set (the list of sources is given under the Hyperparameter section).
Mainly I’m just curious, but I’m also in the process of trying to fine-tune with English English data extracted from Common Voice, and thought it would be useful to hear, just in case the CV data hadn’t been used with the 0.5.x models for some reason that would affect my efforts too.
Hello, I would like to add a question: is it reasonable to fine-tune a model that has already been trained on CV data (e.g. v0.4.0), using CV data?
Will I always get worse results on a standard test set than the model exported by Mozilla?
Do I have any chance of outperforming it?
For example, 0.4.0 was trained on various data sets. Common Voice made up about one tenth of the training data for 0.4.0 and the remaining nine tenths had a bit less noise than Common Voice. If your use case involved data similar to Common Voice, i.e. data that had a bit of noise, it would make sense to fine tune the 0.4.0 model using Common Voice to make a model more robust to noise.
I don’t think 0.4.0 is optimal. So yes, you’d have a chance to outperform it.
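For anyone preparing a Common Voice subset for fine-tuning as described earlier in the thread, here is a minimal sketch of picking out clips by accent. It assumes the Common Voice TSV has `path`, `sentence`, and `accent` columns and that the label you want is something like `england`; both are assumptions to check against your own download, and `filter_by_accent` is just a hypothetical helper name, not part of DeepSpeech or Common Voice.

```python
import csv


def filter_by_accent(lines, accent):
    """Return (path, sentence) pairs for rows whose accent column equals `accent`.

    `lines` is any iterable of TSV lines (an open file works), with a header row.
    The column names below are assumed, not confirmed by the release notes.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        (row["path"], row["sentence"])
        for row in reader
        if row.get("accent") == accent
    ]
```

With a real file you would call it along the lines of `filter_by_accent(open("validated.tsv", newline="", encoding="utf-8"), "england")`, then write the selected clips out in whatever format your training setup expects.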