Comparing the 0.5.0 and 0.5.1 release notes with those from 0.4.0 and before, I see that Common Voice no longer seems to be listed in the training set. The list of sources under the Hyperparameter section is given as:
- train_files Fisher, LibriSpeech, and Switchboard training corpora.
That seems to be backed up by @lissyx's comment here: Fine-tuning DeepSpeech Model (CommonVoice-DATA)
Is there any particular reason this was done?
Mainly I’m just curious, but I’m also in the process of trying to fine-tune with English English data extracted from Common Voice, so it would be useful to know in case the CV data was left out of the 0.5.x models for some reason that would affect my efforts too.