Hi @david-song welcome to the community!
One of the main challenges of this project is finding a public-domain text corpus large enough to support the thousands of unique recording hours we need for a solid dataset.
The most successful approach so far has been the Wikipedia extraction, which gave us 2M+ sentences to read (and where we still need help).
If you happen to know of another large source of sentences with a public-domain license we can use, that would be great for planning our next steps: evolving the wikipedia-extractor tool so it can also extract and clean up sentences from other sources.
Thanks for your feedback!