Can't find the Scripted French dataset on MDC

Hi everyone,

I am currently looking for the French Scripted Speech dataset on the new Mozilla Data Collective platform: https://datacollective.mozillafoundation.org/datasets.

When I filter by language (French), the only available result is:

  • Common Voice Spontaneous Speech 3.0 - French (which contains only 152 transcribed clips).

I am looking for the full scripted version (previously known as mvc-scripted-fr).

Could you please clarify:

  1. Is the historical scripted French dataset still being migrated to MDC?

  2. Is it now bundled within a global multi-language “Common Voice Corpus” entry instead of a standalone French one?

  3. Where can I find the most recent stable version of the scripted French data on this new platform?

Thank you for your help!


Hey @bitote7015, we are in the last steps of the release process. French scripted has been uploaded to MDC and is waiting for approval.


Thanks,

It’s now available: Common Voice Scripted Speech 25.0 - French | Mozilla Data Collective
