There has been some research into different modes of speaking: solo speech, whispered speech, NAM (non-audible murmur) speech, etc. Currently, Common Voice discards whispered data.
Why this is important: There are profound acoustic differences between whispered and normal speech; most notably, whisper lacks the vocal-fold vibration (and hence the fundamental frequency) of phonated speech. With whispered input, a model might learn a better representation of the human voice. A few whispered-speech datasets do exist (CHAINS, wTIMIT, wMRT), but they are small and cover a limited number of speakers.
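To make the acoustic difference concrete, here is a minimal, self-contained sketch (using synthetic stand-in signals, not real recordings): phonated speech is quasi-periodic, so its normalized autocorrelation shows a strong peak at the pitch period, while whisper is essentially filtered noise and shows no such peak. The `voicing_strength` function and the toy signals are illustrative assumptions, not part of any Common Voice pipeline.

```python
import numpy as np

def voicing_strength(signal, sr, fmin=75, fmax=400):
    """Peak of the normalized autocorrelation within the plausible
    pitch-lag range [sr/fmax, sr/fmin].  Voiced (phonated) speech
    yields a value near 1; whisper, lacking glottal periodicity,
    yields a value near 0."""
    x = signal - signal.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    ac = ac / ac[0]                      # lag 0 normalized to 1
    lo, hi = int(sr / fmax), int(sr / fmin)
    return float(ac[lo:hi].max())

sr = 16000
n = 4000                                 # 0.25 s of audio
t = np.arange(n) / sr
rng = np.random.default_rng(0)

# Toy "phonated" signal: 120 Hz pulse train plus a little noise.
voiced = np.sign(np.sin(2 * np.pi * 120 * t)) + 0.1 * rng.standard_normal(n)
# Toy "whispered" signal: broadband noise, no periodic glottal source.
whispered = rng.standard_normal(n)

print(voicing_strength(voiced, sr))      # high (strong pitch peak)
print(voicing_strength(whispered, sr))   # low (no pitch peak)
```

An ASR or speech-representation model trained only on phonated speech never sees the "low voicing" regime, which is one argument for keeping rather than discarding whispered contributions.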
Bottom line: Should we instead create a 'whisper' tag and collect that data?