Any reason 0.5.x models weren't trained on Common Voice data this time?

nmstoker · June 27, 2019, 12:18am

Comparing the 0.5.0 and 0.5.1 release notes with those from 0.4.0 and before, Common Voice no longer seems to be listed in the training set. The list of sources under the Hyperparameter section reads:

train_files Fisher, LibriSpeech, and Switchboard training corpora.

And that seems to be backed up by @lissyx's comment here: Fine-tuning DeepSpeech Model (CommonVoice-DATA)

Is there any particular reason this was done?

Mainly I’m just curious, but I’m also in the process of trying to fine-tune with English English data extracted from Common Voice, so I thought it would be useful to ask, just in case the CV data was left out of the 0.5.x models for some reason that would affect my efforts too.
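
For anyone wanting to try a similar extraction, here is a minimal sketch of the idea. It assumes the Common Voice metadata is a tab-separated file with an `accent` column and that English English clips are labelled `england`; both are assumptions to check against your own download. The function name `filter_by_accent` and the sample rows are made up for illustration.

```python
import csv
import io


def filter_by_accent(tsv_file, accent="england"):
    """Return rows from a Common Voice-style TSV whose accent column matches.

    Assumes a header row containing an "accent" column (an assumption,
    not something confirmed in this thread).
    """
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Rows with a missing or empty accent field are skipped.
    return [row for row in reader if (row.get("accent") or "").strip() == accent]


# Tiny in-memory sample standing in for the real metadata file (made-up rows).
sample = io.StringIO(
    "path\tsentence\taccent\n"
    "clip_a.mp3\thello there\tengland\n"
    "clip_b.mp3\tgood morning\tus\n"
    "clip_c.mp3\tno accent given\t\n"
)
english_rows = filter_by_accent(sample)
print([row["path"] for row in english_rows])
```

With a real download you would pass an opened file instead of the in-memory sample, then write the kept rows out in whatever CSV layout your training setup expects.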

reuben · June 27, 2019, 12:38am

It was just an oversight when training the 0.5.0 model. We’ll be back to business as usual in the next release.

nmstoker · June 27, 2019, 12:51am

Ah! Thanks for clarifying.

dabinat · June 27, 2019, 3:51am

Oh, I didn’t realize the 0.5 model didn’t contain Common Voice data. I guess that’s probably the reason for the regressions I posted about here:

yv001 · June 27, 2019, 8:00am

Is Common Voice data going to be included in 0.6.0 or in 0.5.2?

lissyx · June 27, 2019, 8:58am

More likely in 0.6.0; there won’t be a 0.5.2.

ctzogka · June 27, 2019, 10:00am

Hello, I would like to add a question: is it reasonable to fine-tune a model that has already been trained on CV data (e.g. v0.4.0) using CV data?
Will I always get worse results on a standard test set compared to the Mozilla exported model?
Do I have any chance of outperforming it?

kdavis · June 27, 2019, 10:23am

Yes, for some use cases it’s reasonable.

For example, 0.4.0 was trained on various data sets. Common Voice made up about one tenth of the training data for 0.4.0 and the remaining nine tenths had a bit less noise than Common Voice. If your use case involved data similar to Common Voice, i.e. data that had a bit of noise, it would make sense to fine tune the 0.4.0 model using Common Voice to make a model more robust to noise.

I don’t think 0.4.0 is optimal. So yes, you’d have a chance to outperform it.

ctzogka · June 27, 2019, 10:27am

Thank you, it’s clear now!