Can DeepSpeech process longer audio files?


#1

The other day I was testing DeepSpeech and by mistake specified a long audio file, far longer than 10 seconds: it was one of my 45-to-60-minute recordings.

The computer froze totally, no response, hard drive light on full. The only remedy was to power down. Now that computer makes a ‘clunk’ noise when I power up. Past experience has shown me this is an early warning of HDD failure.

Maybe the developers can check an audio duration when DeepSpeech starts up and if more than (say) 60 seconds, give the option of cancelling/aborting.
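For anyone wanting to guard against this themselves in the meantime, here is a rough sketch of such a check using only the Python standard library. The 60-second threshold and the confirmation prompt are purely illustrative, not anything DeepSpeech provides:

```python
import sys
import wave

MAX_SECONDS = 60  # arbitrary threshold, as suggested above


def check_duration(path, max_seconds=MAX_SECONDS):
    """Return the WAV duration in seconds, prompting before long files."""
    with wave.open(path, "rb") as wav:
        seconds = wav.getnframes() / wav.getframerate()
    if seconds > max_seconds:
        answer = input(f"{path} is {seconds:.0f}s long; continue? [y/N] ")
        if answer.strip().lower() != "y":
            sys.exit("Aborted: audio too long.")
    return seconds
```

Calling this before handing the file to DeepSpeech at least gives you a chance to bail out before the machine starts swapping.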

The computer I was using has 4 × Intel Core i5-2430M CPU @ 2.40GHz processors and 3.8 GiB of RAM, so not what I would call ‘underpowered’. It was a gift from my son… hmm, not impressed. :frowning:

I appreciate DS is still alpha, but one would hope that common sense would prevail.


(Lissyx) #2

Regarding longer audio files, you might find hints in Longer audio files with Deep Speech. Long story short, the current design with bidirectional recurrent layers requires us to have full knowledge of the audio we want to decode before inference can start.

Regarding cancelling long audio, that feels like a good idea, but it raises more questions: where do we draw the line? Moreover, it’s not just the audio length itself; it also depends on your hardware, and that can vary a great deal.

Put another way, given it’s alpha software, I think we should not spend our time on this kind of workaround and should instead:

  • optimize the network to require fewer resources
  • make the system streamable

That being said, if you want to submit a workaround implementing this kind of limit, we’ll be happy to help and review your patches :slight_smile:


#3

> Regarding longer audio files, you might have hints in Longer audio files with Deep Speech. Long story short, the actual design with bidirectional recurrent layers requires us to have full knowledge of the audio we want to decode.

Thanks, yes, I did read through that post before starting this thread, but it was more about how to train, not an answer as such to whether DeepSpeech can process (produce a transcription of) longer audio files. The audios we have run between 44 minutes and 1 hr 18 minutes. If I applied the solution in that thread of cutting up just one audio, I would need 936 wav files for DeepSpeech to do the training.
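For what it’s worth, chopping a file into fixed-length pieces can be done with the standard library alone; at 5-second chunks, a 78-minute recording does indeed come out to 936 files. A sketch (this splits at fixed offsets, which would slice words in half; a real pipeline would cut at silences, e.g. with a voice activity detector, instead):

```python
import wave


def split_wav(path, chunk_seconds=5, prefix="chunk"):
    """Split a WAV file into fixed-length chunks; the last may be shorter.

    Cutting at fixed offsets is only a sketch: a real pipeline would cut
    at silences so words are not sliced in half.
    """
    names = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = src.getframerate() * chunk_seconds
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            name = f"{prefix}_{index:04d}.wav"
            with wave.open(name, "wb") as dst:
                dst.setparams(params)
                dst.writeframes(frames)
            names.append(name)
            index += 1
    return names
```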

Even if I did that for one audio (and there are hundreds), what would the expected output be? What accuracy of transcript? I calculated the WER of a 19-second audio transcript (output from DeepSpeech) and the error rate was about 46%.
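For reference, that WER figure is just a word-level Levenshtein distance divided by the reference length; a minimal sketch of the computation:

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + insertions + deletions) / len(ref)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # One-row dynamic-programming Levenshtein distance over words.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev = d[0]
        d[0] = i
        for j, h in enumerate(hyp, 1):
            cur = d[j]
            d[j] = min(d[j] + 1,          # deletion
                       d[j - 1] + 1,      # insertion
                       prev + (r != h))   # substitution (or match)
            prev = cur
    return d[-1] / len(ref)
```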

Sure, if I spent the time building a specific model just for this speaker, that makes sense. But will I be able to run DeepSpeech on that computer after the training is done? Or will it consume so many resources that the computer freezes? More hard drive damage?

> Regarding the cancelling for long audio, that feels like a good idea but then it means more questions: where do we draw the line? And moreover, it’s not just based on the audio length itself, it also depends on your hardware, and it might be very very different.

Yes, good point. How about at least enabling Ctrl-C?

I’m wondering if I should just learn to touch type to produce the transcriptions. :wink:


(Lissyx) #4

Well, the thread I pointed you at contains an answer from Kelly, who explicitly documents that, because of the architecture of the network and the current training dataset, processing very long audio will likely not work as expected :).

Regarding the accuracy, there are a number of other factors. You say 46% on 19 seconds of audio; that’s not what we’d expect, but it can depend on a lot of things: the dataset we have makes the model behave erratically if you don’t have clean American English audio. It could also be microphone interference…

Besides, sorry, but DeepSpeech does not “damage” your computer. It’s computationally intensive, but we know that, and again, we are working on it. All of that takes time to accomplish properly.

If you train for a specific speaker, I would suggest taking a look at TUTORIAL : How I trained a specific french model to control my robot, where Vincent produced a model dedicated to himself, smaller and running well on an NVIDIA GPU for his robot. It’s not magic: he was able to produce enough audio data to train seriously, but he also reduced the model size, making it much smaller and thus much less computationally intensive. There’s a balance between the generalization capabilities of a model and its complexity.

For CTRL+C, I guess you are referring to the Python bindings? It’s likely the usual mess of Python and threads; I’m not even sure we can do anything about that.
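One workaround that sidesteps the Python-threads issue entirely is to run inference in a child process, which can be killed after a timeout (or interrupted with Ctrl-C while the parent is merely waiting) even when native code is busy. A generic sketch; the actual command you pass in would be whatever your `deepspeech` invocation looks like:

```python
import subprocess


def run_with_timeout(cmd, timeout_seconds):
    """Run a command, killing it if it exceeds the timeout.

    Because the child is a separate process, the timeout takes effect
    even while native inference code is busy; the parent is only waiting.
    Returns the CompletedProcess, or None if the command was killed.
    """
    try:
        return subprocess.run(cmd, timeout=timeout_seconds, check=True)
    except subprocess.TimeoutExpired:
        print(f"Killed after {timeout_seconds}s: {' '.join(cmd)}")
        return None
```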