Preprocessing, Silence, Lyric Recognition

nganferneejoan · April 10, 2019, 12:34pm

Hello! My schoolmates and I are working on a group project using DeepSpeech, and we have a few questions.

1. What preprocessing operations does DeepSpeech apply to audio files?
2. Will silence at the beginning and end of an audio file affect the package’s ability to do inference?
3. The main objective of our project is to recognize sung syllables (especially solfege). For example, someone sings the words “do re mi fa re do”. We want to get exactly the syllables that person said, regardless of the actual note they sang. (For instance, if they say ‘do’ but sing the pitch of ‘la’, we want ‘do’.) Is this Python package suitable for such a use case? If not, what packages would you suggest for this kind of project?
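
To make the second question above concrete, trimming leading and trailing silence before inference could be sketched roughly as below. This is only an illustration under stated assumptions, not something DeepSpeech is known to do: `trim_silence` and the `threshold` value are hypothetical, and the input is assumed to be a list of signed 16-bit mono PCM sample values.

```python
def trim_silence(samples, threshold=500):
    """Drop leading and trailing samples whose amplitude is below threshold.

    Hypothetical helper for illustration only; the threshold of 500 is an
    arbitrary assumption and would need tuning for real recordings.
    """
    start = 0
    end = len(samples)
    # Advance past quiet samples at the beginning.
    while start < end and abs(samples[start]) < threshold:
        start += 1
    # Step back past quiet samples at the end.
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


clip = [0, 10, -20, 3000, -4000, 100, 2500, 5, 0]
trimmed = trim_silence(clip)
print(trimmed)
```

Quiet samples in the middle of the clip are kept on purpose, since only the edges are in question here.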

Thanks a bunch!
