Partial datasets for limited space, noisy datasets / creation

arpi.aszalos · September 2, 2020, 7:12am

Hey, I’m planning a noise injection training experiment for English, using three different noise magnitudes. Unfortunately my disk space is limited, so I’m looking for a dataset between 5 and 15 GB that can be used with DeepSpeech.

Also, is it possible to use only part of the English Common Voice dataset?
If so, how can I do that?

For inference testing, do I need to find dedicated noisy test sets, or can I create one from the Common Voice dataset by adding noise to the clips myself?
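
To make that last question concrete, this is roughly what I have in mind by adding noise myself. It is only a minimal sketch, not anything DeepSpeech or Common Voice provides: `mix_at_snr` and `rms` are my own hypothetical helpers, the clips are assumed to be already decoded to lists of float samples, and the three SNR values are placeholders for the three magnitudes.

```python
import math
import random

def rms(samples):
    """Root-mean-square level of a sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR in dB."""
    # Loop the noise if it is shorter than the speech, then trim to length.
    repeats = len(speech) // len(noise) + 1
    noise = (list(noise) * repeats)[:len(speech)]
    # SNR_dB = 20 * log10(rms_speech / rms_noise), solved for the noise gain.
    # Assumes the noise is not pure silence (rms(noise) > 0).
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / rms(noise)
    return [s + gain * n for s, n in zip(speech, noise)]

# Hypothetical stand-ins for a decoded clip and a noise recording.
random.seed(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [random.gauss(0.0, 1.0) for _ in range(4000)]

# Three noise magnitudes (values picked for illustration only).
noisy_versions = {snr: mix_at_snr(speech, noise, snr) for snr in (20, 10, 0)}
```

With real clips the mixed samples would presumably still need to be clipped or rescaled before being written back as 16-bit audio, and an array library would be much faster than plain lists.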

Some advice on this would also be appreciated. Thanks in advance.
