Looking for Common Voice Corpus English before 2019-02-25 (v1) release

makoto_wada_jp · June 15, 2021, 10:37am

@fyters Thank you for your comments. For background, I was skimming the 2018 paper "Audio Adversarial Examples: Targeted Attacks on Speech-to-Text", which mentions using the Mozilla Common Voice dataset, so I wondered exactly which data they are referring to. The paper says:

Evaluation Benchmark. To evaluate the effectiveness of our attack, we construct targeted audio adversarial examples on the first 100 test instances of the Mozilla Common Voice dataset.

With that background, I searched and found the post "Older English dataset question". I wish I had run into it earlier. There seems to be an older dataset whose files are named sample-000000.mp3, sample-000001.mp3, etc. This differs from the file names in Common Voice Corpus 1 English, which are 128 hexadecimal characters followed by the .mp3 extension, for example:

0000a0f45a2a9ca26455c76d7abfe5992806f8ad0f014a18616fb7dda86c508753765e61697993e5d2a0d9e2fab52a822b31ed5c3f7f3e5bc37495453f6b335f.mp3

0000a1804c153bbb8cc5360a0b59a4818e7b4639e8948794af5eb2f725bf9c6219d4da66c0ee1bcd911295f87d33fab29165049095de65542efbb1165d33999f.mp3

…
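
In case it helps anyone checking which release a local copy came from, here is a rough Python sketch that tells the two naming schemes apart. The patterns are my assumption, inferred only from the file names quoted in this post (six digits after "sample-", or 128 lowercase hex characters), and classify_clip_name is just a hypothetical helper name, not part of any Common Voice tooling.

```python
import re

# Assumed patterns, inferred only from the file names quoted in this thread.
OLD_STYLE = re.compile(r"sample-\d{6}\.mp3")     # e.g. sample-000000.mp3
HASH_STYLE = re.compile(r"[0-9a-f]{128}\.mp3")   # 128 hex characters + .mp3


def classify_clip_name(name: str) -> str:
    """Return 'old-style', 'hash-style', or 'unknown' for a clip file name."""
    if OLD_STYLE.fullmatch(name):
        return "old-style"
    if HASH_STYLE.fullmatch(name):
        return "hash-style"
    return "unknown"
```

Running it over the file names in a clips folder should show quickly whether the folder matches the older sample-NNNNNN.mp3 layout or the Corpus 1 layout. Anything that matches neither pattern comes back as "unknown" rather than being guessed at.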
