A few ideas to improve the Common Voice user experience and data quality. I don’t know where to send these, so I’ll post them here:
Background noise is very useful for speech analysis.
Could we occasionally (e.g. every 30 utterances) ask for a 10-second recording of just the background noise?
When recording, the 10-second limit seems too short for longer prompts. How about letting the limit vary between 10 and 15 seconds, depending on the length of the text? E.g.:
- Text with 5 words => limit is 10s
- Text with 12 words => limit is 12s
- Text with 15+ words => limit is 15s
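To make the rule above concrete, here is a rough sketch in Python. It assumes the limit is interpolated linearly between the three example points (5, 12 and 15 words) and clamped outside that range. The function name `recording_limit_seconds` and the interpolation itself are my own assumptions, not anything Common Voice does today.

```python
def recording_limit_seconds(text: str) -> float:
    """Return a recording limit between 10 and 15 seconds based on word count.

    Assumption: piecewise-linear interpolation through the example points
    (5 words -> 10s, 12 words -> 12s, 15 words -> 15s), clamped at both ends.
    """
    points = [(5, 10.0), (12, 12.0), (15, 15.0)]
    words = len(text.split())

    # Short prompts keep the current 10-second limit.
    if words <= points[0][0]:
        return points[0][1]
    # Very long prompts are capped at 15 seconds.
    if words >= points[-1][0]:
        return points[-1][1]

    # Interpolate inside the segment that contains the word count.
    for (w0, s0), (w1, s1) in zip(points, points[1:]):
        if w0 <= words <= w1:
            return s0 + (s1 - s0) * (words - w0) / (w1 - w0)
    return points[-1][1]
```

Counting words by splitting on whitespace is crude and would not work for languages written without spaces, so a character-based rule might be a better fit there.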
When recording, showing the current sound level or a level graph (in dB) would be very helpful for checking whether the microphone has the appropriate sensitivity.
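For reference, the level I have in mind can be computed per short frame as the RMS relative to full scale (dBFS). Below is a minimal sketch, assuming 16-bit PCM samples as plain integers; the names `rms_dbfs` and `level_trace`, the 480-sample frame size and the -120 dB floor are all just illustrative choices.

```python
import math


def rms_dbfs(samples, full_scale=32768.0, floor_db=-120.0):
    """RMS level of one frame in dB relative to full scale (0 dB = maximum)."""
    if not samples:
        return floor_db
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        # log10(0) is undefined, so report digital silence as the floor value.
        return floor_db
    rms = math.sqrt(mean_square)
    return max(floor_db, 20.0 * math.log10(rms / full_scale))


def level_trace(samples, frame_size=480):
    """Levels for consecutive frames; these are the values a graph would plot."""
    return [
        rms_dbfs(samples[start:start + frame_size])
        for start in range(0, len(samples), frame_size)
    ]
```

A meter would show the latest value of `level_trace`, and a graph would plot the whole list over time. In the browser the same numbers would have to come from the audio API, which I have not looked into.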
During listening, sometimes the recording is just silence.
I’ve noticed two potential reasons:
- The recording really is only silence. Then we could have a detector for that, so nobody needs to annotate it. If you need help, I can provide a command-line tool that detects most silent recordings.
- The web browser didn’t cache the audio properly, but it still lets me press “Play”. This looks like a bug, please double check. It happens more often when I have a slow connection on an Android phone.
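For the first case, a very simple energy-based check already catches most silent clips. This is only a sketch of the idea, not the tool I mentioned: it assumes the clip has been decoded to 16-bit mono WAV, and the -50 dB threshold, the 480-sample frames and the "at least 5 active frames" rule are guesses that would need tuning on real data.

```python
import math
import struct
import wave


def read_pcm16_mono(path):
    """Read a 16-bit mono WAV file into a list of integer samples."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        raw = wav.readframes(wav.getnframes())
    # WAV stores samples as little-endian signed 16-bit integers.
    return list(struct.unpack("<%dh" % (len(raw) // 2), raw))


def frame_db(frame, full_scale=32768.0):
    """Mean power of one non-empty frame in dB relative to full scale."""
    mean_square = sum(s * s for s in frame) / len(frame)
    if mean_square == 0:
        return -120.0
    return 10.0 * math.log10(mean_square / full_scale ** 2)


def is_silent(samples, frame_size=480, threshold_db=-50.0, min_active_frames=5):
    """Return True unless enough frames rise above the threshold."""
    active = 0
    # Only full frames are checked; a short tail is ignored.
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        if frame_db(samples[start:start + frame_size]) > threshold_db:
            active += 1
            if active >= min_active_frames:
                return False
    return True
```

Usage would be something like `is_silent(read_pcm16_mono("clip.wav"))`, where `clip.wav` is a placeholder file name. A fixed threshold will misjudge very quiet microphones, so clips flagged this way are probably better held for review than rejected outright.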
Showing the sound level graph (in dB) while listening would make it easier to judge whether a clip is too quiet, or whether the non-speech part at the beginning is too long and the actual speech starts late.
Showing a graph of the audio level while recording would also help avoid a fairly common problem: people start talking a bit too early, and the first word is truncated.