How to add sentences and recordings. Kyrgyz. 10000 samples

algutman03 · March 10, 2020, 5:52pm

Hello,
I’m supporting a group in Kyrgyzstan that has collected a dataset with 10000 sentences and the associated audio. There is no copyright on the data. Can we get this added to the existing Kyrgyz dataset on Common Voice?

Any help would be greatly appreciated.

nukeador · March 11, 2020, 11:28am

Hi @algutman03, welcome to the community!

The Common Voice dataset consists only of audio recorded through our site. If you have a full dataset (text and audio) for your language, it would be better to publish it somewhere, with details about the methodology and QA controls you used to collect it.

This dataset might be usable for #deep-speech model training together with the Common Voice dataset.

algutman03 · March 23, 2020, 11:40am

Thank you for the response.

Would it make sense to share the text on this site?

algutman03 · March 23, 2020, 11:41am

@omarov-abai999, I will ask the project sponsors if it’s possible to share with you directly.

Are you in Kyrgyzstan?

omarov-abai999 · March 26, 2020, 5:50am

I’m from Kazakhstan, and I’m interested in the transfer learning task.

nukeador · March 26, 2020, 12:24pm

If the sentences are CC0-licensed, you can add them to the Sentence Collector so they can be validated and then recorded on the Common Voice site.

Thanks!
