Can DeepSpeech process longer audio files?

Regarding longer audio files, you might find some hints in Longer audio files with Deep Speech. Long story short, the current design with bidirectional recurrent layers requires us to have full knowledge of the audio we want to decode.

Thanks. Yes, I did read through that post before starting this thread, but it was more about how to train than an answer to the question ‘Can DeepSpeech process (produce a transcription of) longer audio files?’. Our audio files run from 44 minutes to 1 hr 18 minutes. If I applied the solution from that thread and cut up just one audio file, I would need 936 wav files for DeepSpeech to do the training.
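For reference, here is a rough sketch of the cutting-up step using only Python's standard `wave` module. The 5-second clip length is my assumption: it is what makes a 78-minute recording come out at 936 files (78 × 60 / 5 = 936). The names `chunk_count` and `split_wav` are hypothetical helpers, not part of DeepSpeech.

```python
import math
import wave


def chunk_count(duration_s, chunk_s):
    """Number of clips needed to cover duration_s seconds of audio."""
    return math.ceil(duration_s / chunk_s)


def split_wav(src_path, out_prefix, chunk_s=5.0):
    """Split a WAV file into consecutive clips of at most chunk_s seconds.

    Writes out_prefix_0000.wav, out_prefix_0001.wav, ... and returns
    the list of paths written. The last clip may be shorter.
    """
    paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(src.getframerate() * chunk_s)
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # end of the source audio
            out_path = f"{out_prefix}_{index:04d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Same channels, sample width and rate as the source;
                # the frame count in the header is fixed up on close.
                dst.setparams(params)
                dst.writeframes(frames)
            paths.append(out_path)
            index += 1
    return paths


# Assumed 5-second clips of a 1 hr 18 min recording:
print(chunk_count(78 * 60, 5))  # 936
```

Note that this cuts at fixed offsets, so it can split a word in half. Cutting on silences would probably be kinder to the transcripts, but that is beyond this sketch.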

Even if I did that for one audio file (and there are hundreds), what would the expected output be? How accurate would the transcript be? I calculated the WER of a 19-second audio transcript (output from DeepSpeech), and the error rate was about 46%.
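In case it helps anyone check the figure, this is roughly how a WER can be computed: the word-level edit distance (substitutions, insertions, deletions) between the reference transcript and the DeepSpeech output, divided by the number of reference words. This is a minimal sketch, not the exact script I used. `word_error_rate` is a hypothetical name, and it assumes both texts are already normalised (same casing, no punctuation).

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: word-level edit distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words (row-by-row Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of four:
print(word_error_rate("the quick brown fox", "the quick brown dog"))  # 0.25
```

A WER of about 46% therefore means nearly one edit for every two words in the reference.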

Sure, if I spent the time building a specific model just for this speaker, that makes sense. But will I be able to run DeepSpeech on that computer after the training is done? Or will it consume so many resources that the computer freezes? More hard drive damage?

Regarding the cancelling for long audio, that feels like a good idea but then it means more questions: where do we draw the line? And moreover, it’s not just based on the audio length itself, it also depends on your hardware, and it might be very very different.

Yes, good point. How about enabling Ctrl-C at least?

I’m wondering if I should just learn to touch type to produce the transcriptions. :wink: