Use deepspeech as one positive validation

carlfm01 · June 9, 2019, 9:13am

I’ve used this approach to create my Spanish dataset; you can read what I did here: Releasing my Spanish dataset - 120h of public domain data
As for the quality of the dataset, it is hard to tell. I need people to test it, because I can’t manually review 110k files. Probably a good idea is to sort them by loss and start the review with the highest-loss ones.
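
A minimal sketch of the sort-by-loss idea, assuming the per-file loss has already been computed and saved to a CSV. The column names `wav_filename` and `loss`, the helper `rank_by_loss`, and the sample values are all hypothetical and only there for illustration; this is not the script I actually used.

```python
import csv
import io


def rank_by_loss(rows, top_n=None):
    """Return rows sorted by loss, highest first, optionally truncated to top_n."""
    ranked = sorted(rows, key=lambda r: float(r["loss"]), reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Hypothetical per-clip losses (made-up values). In practice these would come
# from running the model over every file in the dataset.
SAMPLE_CSV = """wav_filename,loss
clip_0001.wav,12.4
clip_0002.wav,87.9
clip_0003.wav,3.1
clip_0004.wav,45.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Review the worst-scoring clips first: a high loss suggests the audio and
# the transcript may not match.
review_queue = rank_by_loss(rows, top_n=2)
for row in review_queue:
    print(row["wav_filename"], row["loss"])
```

A high loss does not prove a transcript is wrong (it can also mean noisy audio or a word the model handles badly), so the ranking only decides which files a person should listen to first.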

