Can DeepSpeech process longer audio files?

lissyx · January 23, 2018, 5:59am

Regarding longer audio files, you might find hints in the topic "Longer audio files with Deep Speech". Long story short, the current design with bidirectional recurrent layers requires us to have full knowledge of the audio we want to decode.
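To illustrate why a bidirectional design needs the whole audio up front, here is a toy sketch. It is not DeepSpeech code: the `forward_states` and `backward_states` helpers and the `decay` parameter are hypothetical, and the recurrence is a simple running sum standing in for a real recurrent layer.

```python
def forward_states(frames, decay=0.5):
    """Toy forward recurrence: the state at step t depends only on frames[0..t]."""
    states, h = [], 0.0
    for x in frames:
        h = decay * h + x
        states.append(h)
    return states


def backward_states(frames, decay=0.5):
    """Toy backward recurrence: the state at step t depends on frames[t..end]."""
    return forward_states(frames[::-1], decay)[::-1]


short = [1.0, 2.0]
longer = short + [4.0]  # the same audio with one more frame appended

# Forward states for the first two frames are unaffected by the new frame.
print(forward_states(short), forward_states(longer)[:2])

# Backward states for the same two frames change once the new frame arrives,
# so nothing can be finalized until the last frame is known.
print(backward_states(short), backward_states(longer)[:2])
```

In this toy, the backward pass cannot produce its first output until it has seen the last frame, which is the property that gets in the way of streaming.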

Regarding cancelling for long audio, that feels like a good idea, but it raises more questions: where do we draw the line? Moreover, it is not based on the audio length alone; it also depends on your hardware, which can vary widely.

In other words, given it’s alpha software, I think we should not spend our time on this kind of workaround and instead:

optimize the network to require fewer resources
make the system streamable

That being said, if you want to submit a workaround implementing this kind of limitation, we’ll be happy to help and review your patches.
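As a starting point for such a patch, a length check could look roughly like the sketch below. It uses only Python's standard `wave` module and assumes uncompressed WAV input. The `MAX_SECONDS` value and both function names are hypothetical, and as noted above a sensible limit depends on the hardware, so it should stay configurable.

```python
import wave

# Hypothetical default; the right value depends on the machine running inference.
MAX_SECONDS = 30.0


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds, read from its header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_audio_length(path, max_seconds=MAX_SECONDS):
    """Raise ValueError if the file is longer than max_seconds, else return its duration."""
    duration = wav_duration_seconds(path)
    if duration > max_seconds:
        raise ValueError(
            f"{path} is {duration:.1f}s long, above the {max_seconds:.1f}s limit"
        )
    return duration
```

Calling `check_audio_length("audio.wav")` before decoding would reject overly long input early instead of letting it exhaust memory partway through.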
