Can DeepSpeech process dual-channel audio and convert it into text?

samir_panda · September 16, 2020, 5:33am

Hi
I would like to convert call center conversation recordings into text. It works fine for mono channel audio.

But I could not work out how to process dual-channel audio. I could not find anything in the documentation, so I am wondering whether DeepSpeech supports dual-channel audio or not.

Please help me in this regard.

othiele · September 16, 2020, 9:02am

DeepSpeech takes mono input, converts it into MFCC features (look it up), and basically processes that. So if you have dual-channel input, it has to be reduced to mono audio before it can be turned into an MFCC representation. I would suggest you either use ffmpeg or something similar to convert it to mono, or process each track separately. The metadata will even tell you when speech occurred, so you could reconstruct the conversation.
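
As a rough illustration of the "process each track separately" route, here is a minimal sketch using only Python's standard library. It assumes a 16-bit PCM stereo WAV file; the helper names `split_stereo`, `downmix_to_mono` and `write_mono` are made up for this example and are not part of DeepSpeech.

```python
import sys
import wave
from array import array


def split_stereo(path):
    """Return (left, right, sample_rate) from a 16-bit PCM stereo WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM")
        rate = wav.getframerate()
        samples = array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Stereo samples are interleaved: L0, R0, L1, R1, ...
    return samples[0::2], samples[1::2], rate


def downmix_to_mono(left, right):
    """Average the two channels into a single mono track."""
    return array("h", ((l + r) // 2 for l, r in zip(left, right)))


def write_mono(path, samples, rate):
    """Write 16-bit mono samples to a WAV file."""
    out = array("h", samples)
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(out.tobytes())
```

Each channel returned by `split_stereo` can be written to its own file with `write_mono` and transcribed on its own, or the two can be passed through `downmix_to_mono` first if a single combined transcript is enough.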

samir_panda · September 16, 2020, 3:57pm

Thanks… Can I pass the two audio files (one per channel, each from a different speaker) at the same time?

othiele · September 16, 2020, 6:31pm

No, you would need two instances of DeepSpeech, or you would have to combine both channels into mono.
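
If each channel is transcribed by its own instance, the two results can be interleaved afterwards by start time to rebuild the conversation. The sketch below is plain Python and does not call DeepSpeech at all; the `(start_seconds, text)` tuples, the speaker labels and the `merge_conversation` helper are hypothetical stand-ins for whatever timing data you pull out of the metadata.

```python
import heapq


def merge_conversation(channels):
    """Interleave per-speaker segments by start time.

    `channels` maps a speaker label to a list of (start_seconds, text)
    tuples that is already sorted by start time.
    """
    tagged = (
        [(start, speaker, text) for start, text in segments]
        for speaker, segments in channels.items()
    )
    return [(speaker, start, text) for start, speaker, text in heapq.merge(*tagged)]


# Hypothetical segments, one list per channel.
conversation = merge_conversation({
    "agent": [(0.0, "hello how can i help"), (6.2, "sure one moment")],
    "caller": [(2.9, "i want to check my order")],
})
for speaker, start, text in conversation:
    print(f"[{start:5.1f}s] {speaker}: {text}")
```

Sorting by start time is the simplest possible rule; overlapping speech would need something smarter.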