@fyters Thank you for your comments. As background, I was skimming a 2018 paper, "Audio Adversarial Examples: Targeted Attacks on Speech-to-Text", which mentions using Mozilla Common Voice, and I wondered exactly which data the authors were referring to. The relevant passage reads:
> Evaluation Benchmark. To evaluate the effectiveness of our attack, we construct targeted audio adversarial examples on the first 100 test instances of the Mozilla Common Voice dataset.
With that background, I searched and found the post "Older English dataset question". I wish I had run into it earlier: there seems to be an older dataset whose files are named sample-000000.mp3, sample-000001.mp3, etc. That differs from the file names in Common Voice Corpus 1 English, which are 128 hexadecimal characters followed by the .mp3 extension:
- 0000a0f45a2a9ca26455c76d7abfe5992806f8ad0f014a18616fb7dda86c508753765e61697993e5d2a0d9e2fab52a822b31ed5c3f7f3e5bc37495453f6b335f.mp3
- 0000a1804c153bbb8cc5360a0b59a4818e7b4639e8948794af5eb2f725bf9c6219d4da66c0ee1bcd911295f87d33fab29165049095de65542efbb1165d33999f.mp3
- …
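
In case it helps anyone checking which naming scheme their own copy uses, here is a rough sketch that sorts file names into the two styles. The patterns are inferred only from the names quoted in this post (six digits after `sample-`, or 128 lowercase hex characters), and the function name and the returned labels are my own, not anything official from Common Voice.

```python
import re

# Patterns inferred from the file names quoted in this post; treat them as assumptions.
OLD_STYLE = re.compile(r"sample-\d{6}\.mp3")      # e.g. sample-000000.mp3
HASH_STYLE = re.compile(r"[0-9a-f]{128}\.mp3")    # 128 hex characters + .mp3


def classify_clip_name(filename: str) -> str:
    """Return a label (hypothetical, for illustration) for the naming style."""
    if OLD_STYLE.fullmatch(filename):
        return "older-style name"
    if HASH_STYLE.fullmatch(filename):
        return "hash-style name"
    return "unknown"
```

For example, `classify_clip_name("sample-000001.mp3")` gives the older-style label, while the two long names listed above are recognized as hash-style.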