Longer audio files with Deep Speech

jehoshua · January 28, 2018, 2:36am

I tested the Python tool you mentioned (GitHub - wiseman/py-webrtcvad: Python interface to the WebRTC Voice Activity Detector) today on a small WAV file of 55 words (275 characters) lasting 19.968 seconds.

It split the WAV into 5 WAV files:

00:00:01.59 (3 words)
00:00:06.87 (21 words)
00:00:03.63 (12 words)
00:00:03.33 (10 words)
00:00:02.43 (9 words)

When I played all the files back in the same order as the original WAV, there appeared to be no word truncation. I don’t understand the aggressiveness setting, so I simply ran it with ‘0’.
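A minimal sketch of the splitting step itself, assuming the speech segments are already known as (start, end) times in seconds. It copies each span of a PCM WAV into its own file using only Python's standard `wave` module. The function name `write_segments` and the `chunk-NN.wav` file naming are hypothetical, chosen for illustration; this is not py-webrtcvad's own code.

```python
import wave
from pathlib import Path
from typing import List, Sequence, Tuple


def write_segments(
    src: str, segments: Sequence[Tuple[float, float]], out_dir: str
) -> List[Path]:
    """Copy each (start_s, end_s) span of a PCM WAV file into its own WAV file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    with wave.open(src, "rb") as w:
        params = w.getparams()  # channels, sample width, rate, ...
        rate = w.getframerate()
        for n, (start_s, end_s) in enumerate(segments):
            # Convert seconds to frame indices, clamping to the file length.
            first = int(round(start_s * rate))
            last = min(int(round(end_s * rate)), w.getnframes())
            w.setpos(first)
            data = w.readframes(last - first)
            path = out / f"chunk-{n:02d}.wav"
            with wave.open(str(path), "wb") as o:
                o.setparams(params)  # same format as the source file
                o.writeframes(data)  # frame count in the header is corrected on write
            paths.append(path)
    return paths
```

Keeping the source file's parameters means no resampling happens here, so the input should already be in whatever format the recognizer expects before it is split.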

While matching up the (real) transcript against the audio from this Python tool, I also found that wherever a sentence ended, the audio segment also ended. Possibly it is picking up the longer pause at the end of a sentence?
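A minimal sketch of how per-frame VAD decisions can be grouped into segments. It assumes the decisions are already computed; with py-webrtcvad that would be `webrtcvad.Vad(mode).is_speech(frame, sample_rate)` on 10, 20 or 30 ms frames of 16-bit mono PCM. Per the py-webrtcvad README, `mode` is an integer from 0 to 3, where 0 is the least aggressive about filtering out non-speech and 3 the most, so ‘0’ keeps the most audio. The `padding_ms` window and the 0.9 ratio are illustrative values, similar in spirit to the smoothing in the repository's example script but not taken from it. Because a segment only closes after a run of mostly non-speech frames as long as the window, short gaps between words stay inside one segment while longer pauses tend to end it; this is one plausible explanation for the sentence-aligned splits, not a confirmed one.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple


def collect_segments(
    voiced: Sequence[bool],
    frame_ms: int = 30,
    padding_ms: int = 300,
    ratio: float = 0.9,
) -> List[Tuple[float, float]]:
    """Group per-frame VAD decisions into (start_s, end_s) speech segments.

    voiced[i] is the decision for frame i, e.g. the result of
    webrtcvad.Vad(mode).is_speech(frame_bytes, sample_rate).
    """
    window = max(1, padding_ms // frame_ms)  # frames in the smoothing window
    ring: Deque[Tuple[int, bool]] = deque(maxlen=window)
    segments: List[Tuple[float, float]] = []
    triggered = False
    start = 0

    for i, is_speech in enumerate(voiced):
        ring.append((i, is_speech))
        if not triggered:
            # Open a segment once most of the recent window is speech,
            # starting it at the oldest frame still in the window.
            if sum(1 for _, v in ring if v) > ratio * window:
                triggered = True
                start = ring[0][0]
                ring.clear()
        else:
            # Close the segment only after a sustained run of non-speech,
            # so short gaps between words do not split it.
            if sum(1 for _, v in ring if not v) > ratio * window:
                triggered = False
                segments.append((start * frame_ms / 1000, (i + 1) * frame_ms / 1000))
                ring.clear()

    if triggered:  # audio ended while still inside a segment
        segments.append((start * frame_ms / 1000, len(voiced) * frame_ms / 1000))
    return segments


print(collect_segments([False] * 20 + [True] * 30 + [False] * 20))  # [(0.6, 1.8)]
```

Each segment keeps up to `padding_ms` of trailing non-speech, which gives some margin against clipping the last word.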

The total length of the 5 WAV files is 17.85 seconds, versus 19.968 seconds for the original, so about 2.12 seconds was dropped. What was dropped was obviously noise, not speech.
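The figures above can be cross-checked with a few lines of arithmetic. The durations and word counts are copied from the list in this post; the script just confirms that the five segments account for all 55 words while dropping roughly a tenth of the audio.

```python
# Segment durations (seconds) and word counts, as listed above.
segments = [(1.59, 3), (6.87, 21), (3.63, 12), (3.33, 10), (2.43, 9)]
original_s = 19.968

kept_s = round(sum(d for d, _ in segments), 3)
dropped_s = round(original_s - kept_s, 3)
words = sum(w for _, w in segments)

# Prints: kept 17.85 s, dropped 2.118 s (10.6%), 55 words
print(f"kept {kept_s} s, dropped {dropped_s} s ({dropped_s / original_s:.1%}), {words} words")
```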

hth
