Speech cutter

Hi, I’ve written a tool to align speech. The idea is to use the current text cleaning tools to create sentences for a chapter downloaded from LibriVox. Then we spot those sentences in the chapter’s audio using Windows speech recognition; if there’s a match above a specific confidence value, the offset and the duration of the speech are passed to ffmpeg to split the audio.
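
To make the last step concrete, here is a rough Python sketch of how matches could be turned into ffmpeg cuts. It is not the tool's actual code: the `Match` record, the `plan_cuts` helper, the `clip_NNNN.wav` naming and the 0.8 threshold are all assumptions for illustration. Only the ffmpeg options `-i`, `-ss` (start offset), `-t` (duration) and `-y` (overwrite) are real.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Match:
    """Hypothetical record for one sentence spotted by the recognizer."""
    sentence: str
    confidence: float   # assumed to be in the range 0.0 to 1.0
    offset_s: float     # where the sentence starts in the chapter audio
    duration_s: float   # how long the spoken sentence lasts


def ffmpeg_cut_command(src: str, dst: str, offset_s: float, duration_s: float) -> List[str]:
    """Build an ffmpeg argument list that extracts one clip from src."""
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-ss", f"{offset_s:.3f}",    # seek to the start of the sentence
        "-t", f"{duration_s:.3f}",   # keep only the sentence's duration
        dst,
    ]


def plan_cuts(matches: List[Match], src: str, min_confidence: float = 0.8) -> List[List[str]]:
    """Return one ffmpeg command per match that meets the confidence threshold."""
    commands = []
    for index, match in enumerate(matches):
        if match.confidence < min_confidence:
            continue  # skip sentences the recognizer was unsure about
        dst = f"clip_{index:04d}.wav"  # assumed output naming scheme
        commands.append(ffmpeg_cut_command(src, dst, match.offset_s, match.duration_s))
    return commands
```

Each returned command could then be executed with `subprocess.run(cmd, check=True)`. Keeping the command building separate from the execution makes it easy to inspect what would be cut before touching any audio.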

Would be amazing to get any suggestions to improve it.

Hope it helps.

So basically you take an audiobook’s audio and the book’s text, and use Windows speech recognition to determine where each sentence starts and ends?

This is very interesting, especially because LibriVox material is all public domain.

@josh_meyer maybe this is something we can automate to create a dataset with text-audio pairs that can be used by Deep Speech?

The only issue I see is that most books are read by just one person, and we really need a lot of diverse voices to train the algorithm.

Yes

:confused: yes, for example Spanish

@josh_meyer what do you think about this idea?