How to add sentences and recordings. Kyrgyz. 10000 samples

Hello,
I’m supporting a group in Kyrgyzstan that has collected a dataset of 10,000 sentences with the associated audio. There is no copyright on the data. Can we get this added to the existing Kyrgyz dataset on Common Voice?

Any help would be greatly appreciated.


Hi @algutman03 welcome to the community!

The Common Voice dataset consists only of audio collected through our site. If you have a full dataset (text and audio) for your language, it would be better to publish it elsewhere, with details about the methodology and QA controls you used to collect it.

This dataset might be usable for #deep-speech model training together with the Common Voice dataset.


Thank you for the response.

Would it make sense to share the text on this site?

@omarov-abai999, I will ask the project sponsors if it’s possible to share with you directly.

Are you in Kyrgyzstan?

I’m from Kazakhstan, and I’m interested in transfer learning for this task.

If the sentences are CC0-licensed, you can add them to the Sentence Collector so they can be validated and then recorded on the Common Voice site.

Thanks!