I’ve been able to use a Python tool to cut a WAV file into word chunks (see “Longer audio files with Deep Speech”).
The resulting audio clips range from 1 second to 49 seconds. How will clips longer than 3 to 5 seconds affect building a model?
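
To see how many of the chunks fall outside the 3 to 5 second range, something like the following might help. It is only a minimal sketch using Python's standard `wave` module, and it assumes the chunks are uncompressed PCM WAV files. The function names and the 5-second default are illustrative choices taken from the range mentioned above, not anything DeepSpeech requires.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        # Duration = number of frames / frames per second.
        return wav_file.getnframes() / wav_file.getframerate()


def split_by_duration(directory, max_seconds=5.0):
    """Group the WAV files in `directory` by length.

    Returns two lists of (file name, duration in seconds):
    clips at or under `max_seconds`, and clips longer than that.
    The 5.0 default is an assumption, not a DeepSpeech limit.
    """
    short_clips, long_clips = [], []
    for path in sorted(Path(directory).glob("*.wav")):
        seconds = wav_duration_seconds(path)
        if seconds <= max_seconds:
            short_clips.append((path.name, seconds))
        else:
            long_clips.append((path.name, seconds))
    return short_clips, long_clips
```

Calling `split_by_duration("chunks")`, where `chunks` is a hypothetical directory holding the cut files, returns the two lists, so the long clips can be inspected or re-cut separately.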