How to prepare training data and test data for deepspeech speech to text

vivek.mangipudi13 · December 13, 2017, 8:27pm

I have a bunch of audio files, each with a “tring tring” ring tone or hold music and then the actual conversation.
What is the best way to detect and remove the hold music and the “tring tring” from the audio files, so that I have clean files with just the conversation?

The goal is to get a better speech-to-text transcription.

elpimous_robot · December 15, 2017, 10:59pm

Hello,

Well, in fact, you should start with clean voices, and then add modified copies of them, with added noise and other distortions (using voice-corpus-tool).

For example, say you live near a noisy road:
you record a lot of clean voice samples,
you record many samples of the road noise (e.g. 10 s each),
you duplicate the voice samples and mix this noise into the copies (the augment param in voice-corpus-tool).

In the end, you get a model that works in your own environment.
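To make the noise-mixing step concrete, here is a minimal sketch in plain Python. It is not how voice-corpus-tool does it internally; the names `mean_power` and `mix_at_snr` and the idea of choosing a target signal-to-noise ratio in dB are my own assumptions for illustration. It works on lists of float samples and loops the noise clip when it is shorter than the voice.

```python
import math

def mean_power(samples):
    """Average of the squared sample values."""
    return sum(s * s for s in samples) / len(samples)

def mix_at_snr(voice, noise, snr_db):
    """Return voice plus noise, with the noise scaled to the requested SNR in dB.

    Hypothetical helper for illustration; assumes the noise clip is not silent.
    """
    # Loop the noise clip if it is shorter than the voice, cut it if it is longer.
    looped = [noise[i % len(noise)] for i in range(len(voice))]
    # Pick the gain so that voice power / scaled noise power == 10 ** (snr_db / 10).
    gain = math.sqrt(mean_power(voice) / (mean_power(looped) * 10 ** (snr_db / 10)))
    return [v + gain * n for v, n in zip(voice, looped)]
```

You would call it once per clean sample with one of the short road recordings, e.g. `mix_at_snr(voice, road_noise, 10)`, and keep both the clean and the noisy copy in the training set. A lower `snr_db` gives a noisier copy.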

Now, about cutting your existing recordings: have a look at the ‘silence’ effect of SoX.
(But it is no miracle cure with noisy voice recordings: sox will not only remove noise, it will surely also remove important parts of the voice. Give it a try.)
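To see why level-based cutting is risky, here is a rough sketch of the general idea in plain Python. This is not SoX's actual algorithm, just a simple frame-by-frame RMS threshold; `keep_loud_frames`, `frame_len` and `threshold` are names I made up for the example.

```python
import math

def keep_loud_frames(samples, frame_len, threshold):
    """Keep only the frames whose RMS level reaches the threshold.

    Hypothetical helper: samples is a list of floats, frame_len is in samples.
    """
    kept = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= threshold:
            kept.extend(frame)
    return kept
```

Anything quieter than the threshold is dropped, whether it is background noise or a softly spoken word, and anything louder is kept. A threshold like this only measures level, so on its own it cannot tell loud hold music from speech.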
