Adding a CC0 1.0 licensed voice database for Odia

psubhashish · July 15, 2020, 1:18pm

Two main categories on Wikimedia Commons list more than 4000 audio recordings of word pronunciations (here and here). It would be useful to bring those recordings to Common Voice, as the license is compatible (CC0 1.0) and they can be validated on Common Voice. Though these are recordings of words and not sentences, they are mostly drawn from a 1930 public-domain lexicon called Purnachandra Ordiya Bhashakosha and from words used in Wikipedia articles, and hence cover a wide variety of domains. Many words (and phonemes) that might not yet be recorded on Common Voice can be found in those recordings. I first made this request when Michael Henretty was around, as he was helping a lot in building the initial foundation of Common Voice (on his recommendation I, as the copyright holder of the recordings, made a request on Commons to relicense all the audio files under a CC0 1.0 license). I would really appreciate it if importing this audio from Wikimedia Commons to Common Voice could be added to the workflow.