I’ve been working on a small project involving Python and long audio files, but I have been unable to track down package-specific Python documentation (à la https://pillow.readthedocs.io/en/5.3.x/) for non-streaming use cases. Is there a standard method, class, or process for processing large audio files that doesn’t involve slicing them into smaller ones?
I read a few articles that mention the slicing approach, but it is difficult to apply to my use case. The articles are also months or years old, so they don’t take into account this development, which mentions more efficient processing of longer files.
Any help you can provide would be much appreciated!
My demo was a 30-60 second clip (part of a 1-hour clip I planned to convert eventually). It ran for about 30 minutes before I stopped it. I can put together a snippet similar to my code if you think it would be useful.
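For scale, here is a back-of-envelope sketch of the numbers in the post above, assuming the audio is decoded to 16 kHz, 16-bit, mono PCM before transcription (the format requested later in this thread). A one-hour clip in that format is roughly 115 MB of raw samples, and 30 minutes spent on a 30-60 second clip means running 30 to 60 times slower than real time. The helper names are illustrative only.

```python
# Back-of-envelope sizing, ASSUMING audio decoded to 16 kHz, 16-bit, mono PCM.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # 16-bit samples
CHANNELS = 1          # mono


def pcm_bytes(duration_s: float) -> int:
    """Size in bytes of raw PCM audio for a clip of the given duration."""
    return int(duration_s * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Processing time divided by audio duration; above 1.0 is slower than real time."""
    return processing_s / audio_s


if __name__ == "__main__":
    print(pcm_bytes(60 * 60))               # one-hour clip: 115200000 bytes
    print(real_time_factor(30 * 60, 60))    # 30 minutes on a 60 s clip: 30.0
    print(real_time_factor(30 * 60, 30))    # 30 minutes on a 30 s clip: 60.0
```

A real-time factor that high on a short clip usually points at the setup (hardware, build, or input format) rather than at file length alone.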
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
Wait, a 30-60 second file was not decoded after 30 minutes? Can you share details on the hardware?
This worked! However, I got this result from my short audio file:
“een go at in e an e same is were er to o in tejest bused e bro o or litre an per andifolworesware o t al e o as aner reaerything is bid some min o so o o o e o la oro ah e or ee ingle is ateou ea to his head e ant oo the bot te o hii a se is bo i weo reb e be a arebut the a e o a o bo tha wy om back tothe oher”
Using the release 0.3.0 models and packages, I’m getting gibberish. Though my input has noise and reverb, I’m surprised by the lack of English words. Is this to be expected?
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
It really depends on your audio sources, to some extent. Can you first make sure it’s 16-bit PCM, 16 kHz, and mono? Could you share a sample?
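One way to check those three properties is Python’s standard-library `wave` module. This is a minimal sketch; the function name `check_deepspeech_format` is hypothetical, and it only inspects the WAV header, so it cannot detect artifacts introduced by an earlier conversion step.

```python
import wave


def check_deepspeech_format(source) -> list[str]:
    """Inspect a WAV file (path or binary file object) and return a list of
    problems; an empty list means 16-bit samples, 16 kHz, and mono.

    Note: wave.open() raises wave.Error for non-PCM WAV files
    (e.g. 32-bit float), which is itself a sign the file needs converting.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            problems.append(f"sample width is {wav.getsampwidth() * 8} bits, expected 16")
        if wav.getframerate() != 16000:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 16000")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected 1 (mono)")
    return problems


if __name__ == "__main__":
    import sys

    for problem in check_deepspeech_format(sys.argv[1]):
        print(problem)
```

Run it as `python check_wav.py recording.wav` (both file names are placeholders); no output means the header matches what was asked for.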
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
Mono or stereo? I assume it was MP3. It’s possible your conversion introduced some bad artifacts, but given the output, can we be sure of your exact setup and versions? The full output should include that information.
Sorry, the source was a stereo .m4a file. I’m running the latest non-alpha versions (0.3.0) of the Python package and models. Since the Python code only returns a string with the transcription, I’m not sure what “full output” I can provide. If you mean the CLI, I don’t have access to it, as I can only run the program in a specific environment.
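For a stereo .m4a source, one common option (not something prescribed in this thread) is to let ffmpeg decode, downmix, and resample in a single step. The sketch below assumes an `ffmpeg` binary is available on `PATH`; the helper names and file names are hypothetical. It uses only standard ffmpeg options: `-ac` (channel count), `-ar` (sample rate), and `-acodec pcm_s16le` (16-bit little-endian PCM).

```python
import shlex
import subprocess


def ffmpeg_to_16k_mono_wav_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that decodes `src` (e.g. a stereo .m4a) into a
    16 kHz, mono, 16-bit PCM WAV file at `dst`."""
    return [
        "ffmpeg",
        "-y",                    # overwrite dst if it already exists
        "-i", src,               # input file; ffmpeg picks the decoder
        "-ac", "1",              # downmix to a single channel
        "-ar", "16000",          # resample to 16 kHz
        "-acodec", "pcm_s16le",  # 16-bit signed little-endian PCM
        dst,
    ]


def convert(src: str, dst: str) -> None:
    """Run the conversion; raises subprocess.CalledProcessError if ffmpeg fails.
    ASSUMES an ffmpeg executable is on PATH."""
    subprocess.run(ffmpeg_to_16k_mono_wav_cmd(src, dst), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, for inspection.
    print(shlex.join(ffmpeg_to_16k_mono_wav_cmd("talk.m4a", "talk.wav")))
```

Building the argument list separately from running it makes it easy to log the exact command, which helps when comparing conversion settings across environments.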
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
libdeepspeech.so prints TensorFlow/DeepSpeech version information on stderr. We need that.
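When only the Python bindings can be run, that stderr output can still be collected from inside the script. Because a native library writes to file descriptor 2 directly, `contextlib.redirect_stderr` (which only swaps `sys.stderr`) will not see it. The sketch below instead redirects descriptor 2 itself into a temporary file. The context-manager name is hypothetical, and it assumes a POSIX-like environment where `os.dup`/`os.dup2` behave normally.

```python
import os
import sys
import tempfile
from contextlib import contextmanager


@contextmanager
def capture_fd_stderr():
    """Temporarily point file descriptor 2 at a temp file so output written by
    native code (which bypasses sys.stderr) can be read back afterwards.
    Yields a list that receives the captured text when the block exits."""
    captured = []
    sys.stderr.flush()               # don't mix pending Python output into the capture
    saved_fd = os.dup(2)             # keep a handle on the real stderr
    with tempfile.TemporaryFile(mode="w+b") as tmp:
        os.dup2(tmp.fileno(), 2)     # fd 2 now writes into the temp file
        try:
            yield captured
        finally:
            sys.stderr.flush()
            os.dup2(saved_fd, 2)     # restore the original stderr
            os.close(saved_fd)
            tmp.seek(0)
            captured.append(tmp.read().decode(errors="replace"))


if __name__ == "__main__":
    with capture_fd_stderr() as out:
        os.write(2, b"written straight to fd 2\n")
    print("captured:", out[0], end="")
```

Wrapping the code that loads the model and runs inference in `with capture_fd_stderr() as out:` and then printing `out[0]` should surface whatever the library wrote, including the version lines.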