How does DeepSpeech discriminate between speech and music?

vivek.mangipudi13 · December 19, 2017, 6:42am

Say I’m recording a radio DJ, and the resulting audio file contains:
music, some music, speech/voice, music, speech, speech, speech, hold music, end of audio
(Here, speech and music are assumed to be non-overlapping or only marginally overlapping.)

Q1. How do I ignore the non-speech parts and extract only the speech portions of the audio?
I.e., I want my final audio to contain only the speech portions.

Q2. How does DeepSpeech currently handle music when doing speech recognition?

Q3. Is there any pre-trained model for detecting the onset of speech, or the speech portions, in an audio file?

reuben · December 19, 2017, 9:14am

1. Use a Voice Activity Detection (VAD) tool.
2. It doesn’t. Transcription results for music will not make sense.
3. There are several available VAD tools. The WebRTC project has one, for example. There’s a topic here on Discourse that mentions other tools, but I don’t remember where it is.
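
As a rough illustration of the VAD approach, here is a minimal sketch that cuts the audio into short frames, asks a classifier whether each frame is speech, and keeps only the frames that are. It assumes the third-party `webrtcvad` Python bindings for the WebRTC VAD and a 16-bit mono WAV at 8, 16, 32 or 48 kHz. The function names (`split_frames`, `keep_speech`, `extract_speech`) are made up for this example and are not part of DeepSpeech. Also note that this VAD is built to separate voice from silence and noise, so how well it rejects music is something you would have to check on your own recordings.

```python
import wave

FRAME_MS = 30  # webrtcvad accepts frames of 10, 20 or 30 ms only


def split_frames(pcm, sample_rate, frame_ms=FRAME_MS):
    """Yield consecutive fixed-length frames of 16-bit mono PCM bytes.

    A trailing partial frame is dropped, since the VAD needs full frames.
    """
    frame_bytes = int(sample_rate * frame_ms / 1000) * 2  # 2 bytes per sample
    for start in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        yield pcm[start:start + frame_bytes]


def keep_speech(pcm, sample_rate, is_speech, frame_ms=FRAME_MS):
    """Concatenate only the frames for which is_speech(frame, rate) is true."""
    return b"".join(
        frame
        for frame in split_frames(pcm, sample_rate, frame_ms)
        if is_speech(frame, sample_rate)
    )


def extract_speech(in_path, out_path, aggressiveness=3):
    """Write a copy of in_path containing only the frames flagged as speech.

    Hypothetical helper: aggressiveness is the webrtcvad mode, 0 (least
    strict) to 3 (most strict about what counts as speech).
    """
    import webrtcvad  # third-party: pip install webrtcvad

    vad = webrtcvad.Vad(aggressiveness)
    with wave.open(in_path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        sample_rate = wf.getframerate()
        pcm = wf.readframes(wf.getnframes())

    speech = keep_speech(pcm, sample_rate, vad.is_speech)

    with wave.open(out_path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(sample_rate)
        out.writeframes(speech)
```

Calling `extract_speech("dj.wav", "speech_only.wav")` (file names are placeholders) would then give a file you can feed to DeepSpeech. Dropping frames one by one can clip word edges, so in practice you would probably want to smooth the decisions over a few neighbouring frames before cutting.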

yv001 · December 19, 2017, 9:41am

Some VAD tools are mentioned in this topic: “Longer audio files with Deep Speech”.
