Can DeepSpeech process longer audio files?

Well, the thread I pointed you at contains an answer from Kelly, who explicitly documents that, because of the network architecture and the current training dataset, processing very long audio will likely not work as expected :).
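If you need to handle long recordings anyway, the usual workaround is to split the audio into shorter chunks and transcribe them one by one. Here is a minimal Python sketch of that idea, assuming the 0.7+ Python bindings (`deepspeech` package) and a 16 kHz mono 16-bit WAV file; the file names and the fixed 10-second window are purely illustrative, and a VAD-based splitter (like the examples in the repo) that cuts on silence instead of mid-word will give better results:

```python
import wave

import numpy as np
from deepspeech import Model

# Illustrative paths; point these at your own files.
MODEL_PATH = "deepspeech-0.9.3-models.pbmm"
CHUNK_SECONDS = 10  # naive fixed window; splitting on silence is preferable

model = Model(MODEL_PATH)

with wave.open("long_recording.wav", "rb") as wav:
    assert wav.getframerate() == model.sampleRate(), "expects 16 kHz input"
    assert wav.getnchannels() == 1 and wav.getsampwidth() == 2, "expects mono 16-bit PCM"
    audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

# Transcribe each chunk independently and stitch the text back together.
chunk_len = CHUNK_SECONDS * model.sampleRate()
pieces = [model.stt(audio[i:i + chunk_len]) for i in range(0, len(audio), chunk_len)]
print(" ".join(p for p in pieces if p))
```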

Regarding accuracy, a number of other factors come into play. You say 46% on 19 seconds of audio, which is not what I would expect, but it can depend on a lot of things: given the dataset we have, the model behaves erratically if the input is not clean American English audio. It could also be microphone interference …
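As a side note, when comparing numbers it helps to agree on the metric. The standard one is word error rate (WER): the edit distance between reference and hypothesis, counted in words. A small self-contained sketch, nothing DeepSpeech-specific, just the textbook dynamic-programming formulation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic edit-distance DP table, over words instead of characters.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution
    return dist[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution ("word") and one deletion ("a") over 6 reference words: 2/6
print(wer("hello world this is a test", "hello word this is test"))  # 0.333...
```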

Besides, sorry, but DeepSpeech does not “damage” your computer. It is computationally intensive, but we know that, and again, we are working on it. All of that just takes time to do properly. If you are training for a specific speaker, I would suggest taking a look at TUTORIAL : How I trained a specific french model to control my robot, where Vincent produced a model dedicated to himself, smaller and running well on an NVIDIA GPU for his robot. It’s not magic: he was able to produce enough audio data to train seriously, but he also reduced the model size, making it much smaller and thus much less computationally intensive. There is a balance between the generalization capabilities of the model and its complexity.
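For the record, the knob for model width is `--n_hidden` on the training script. A hedged sketch of such an invocation, with purely illustrative values and file names (check the available flags against the release you actually run):

```
python3 DeepSpeech.py \
  --train_files data/train.csv \
  --dev_files data/dev.csv \
  --test_files data/test.csv \
  --n_hidden 512 \
  --epochs 30 \
  --export_dir models/my_speaker
```

A smaller `--n_hidden` shrinks every layer, so inference gets cheaper, at the cost of the model generalizing less well beyond the voices it was trained on.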

For CTRL+C, I guess you are referring to the Python bindings? It’s likely the usual mess of Python and threads; I am not even sure there is anything we can do about it.
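If it bites you in practice, one common workaround is to keep the main thread free to receive KeyboardInterrupt and run the blocking inference call in a worker thread. A sketch along those lines (the model path is a placeholder, and the in-flight native call itself still cannot be interrupted mid-run):

```python
import threading

import numpy as np
from deepspeech import Model

def transcribe(model, audio, result):
    # The native stt() call blocks and cannot be interrupted; it runs here
    # so the main thread stays responsive to CTRL+C.
    result["text"] = model.stt(audio)

model = Model("deepspeech-0.9.3-models.pbmm")  # placeholder path
audio = np.zeros(model.sampleRate(), dtype=np.int16)  # stand-in for real audio

result = {}
worker = threading.Thread(target=transcribe, args=(model, audio, result), daemon=True)
worker.start()
try:
    while worker.is_alive():
        worker.join(timeout=0.1)  # short timeouts keep SIGINT deliverable
    print(result.get("text", ""))
except KeyboardInterrupt:
    print("interrupted; the in-flight chunk still finishes in the background")
```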