How to train DeepSpeech that something ISN'T speech?

dabinat · July 19, 2019, 5:46pm

I’ve noticed that DeepSpeech will sometimes interpret instrumental music, sound effects and animal noises as speech. How can I tell it that something isn’t speech? Would I just provide a blank transcript with the clip?

Or is it not necessary - will DeepSpeech get better at interpreting this as it gets more samples of what speech actually is?

kdavis · July 19, 2019, 5:53pm

We have done a bit of this, though not with the express purpose you are suggesting.

One of our data sets, Fisher, has lots of “ummms”, “ahhhs”, and “hmmms” in it, and we didn’t give DeepSpeech the transcript for such disfluencies. It did have the transcript of the surrounding fluent speech, though, so it tends not to transcribe such disfluencies.

So you could do the same by adding background noise, such as music or animal noises, to your standard training data, and then train or fine-tune on that data set.
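
For anyone who wants to try this, here is a rough sketch of the mixing step. It is not taken from the DeepSpeech codebase: the function name mix_at_snr and the idea of choosing a signal-to-noise ratio in dB are assumptions made for illustration, and samples are assumed to be plain lists of floats in [-1.0, 1.0] that you have already decoded from your WAV files. The transcript of the speech clip stays unchanged, so the model sees the noise but is never asked to transcribe it.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db, rng=None):
    """Overlay a noise clip on a speech clip at the given SNR (in dB).

    Hypothetical helper, not a DeepSpeech API. Both inputs are sequences
    of float samples at the same sample rate.
    """
    rng = rng or random.Random()
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")

    n = len(speech)
    # Loop the noise if it is shorter than the speech, then take a random crop.
    reps = -(-n // len(noise))  # ceiling division
    looped = list(noise) * reps
    start = rng.randint(0, len(looped) - n)
    crop = looped[start:start + n]

    speech_power = sum(s * s for s in speech) / n
    noise_power = sum(x * x for x in crop) / n
    if speech_power == 0 or noise_power == 0:
        # Nothing meaningful to scale against; return the speech unchanged.
        return list(speech)

    # Scale the noise so that speech_power / scaled_noise_power == 10^(snr_db / 10).
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * x for s, x in zip(speech, crop)]
```

The result can exceed the [-1.0, 1.0] range, so clip or rescale it before writing it back to 16-bit audio. Drawing snr_db from a range of values rather than fixing it would give more varied training data, but what range works well is something you would have to find by experiment.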

dabinat · July 19, 2019, 10:14pm

Thanks, I’ll just extend the clip so it contains actual speech and transcribe that.
