Can DeepSpeech process dual-channel audio and convert it into text?

I would like to convert call center conversation recordings into text. This works fine for mono audio.

However, I could not work out how to process dual-channel audio. I could not find anything in the documentation, so I am wondering whether DeepSpeech supports dual-channel audio at all.

Any help would be appreciated.

DeepSpeech takes mono input, converts it into MFCC features (look it up), and basically processes those. So if you have dual-channel input, you first need to get it into a form that can be turned into that MFCC representation. I would suggest you either use ffmpeg or something similar to convert it to mono, or process each channel separately. The metadata will even tell you when speech occurred, so you could reconstruct the conversation.
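To illustrate the "process each channel separately" option, here is a minimal sketch that splits a stereo WAV file into two mono WAV files using only the Python standard library. It assumes 16-bit PCM input; the function name `split_stereo_wav` and the file paths are made up for this example and are not part of DeepSpeech.

```python
import wave


def split_stereo_wav(src_path, left_path, right_path):
    """Write each channel of a 16-bit stereo WAV file to its own mono WAV file.

    Hypothetical helper for illustration; returns the sample rate in Hz.
    """
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected a 16-bit stereo PCM WAV file")
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # Each stereo frame is 4 bytes: two bytes of left sample, two of right.
    half = len(frames) // 2
    left = bytearray(half)
    right = bytearray(half)
    left[0::2] = frames[0::4]   # first byte of every left sample
    left[1::2] = frames[1::4]   # second byte of every left sample
    right[0::2] = frames[2::4]  # first byte of every right sample
    right[1::2] = frames[3::4]  # second byte of every right sample

    for path, data in ((left_path, left), (right_path, right)):
        with wave.open(path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(2)
            dst.setframerate(rate)
            dst.writeframes(bytes(data))
    return rate
```

Each resulting file can then be transcribed the same way as the mono recordings that already work. The sample rate is passed through unchanged, so any resampling the model needs still has to happen separately.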

Thanks. Can I pass the two audio files (one per channel, each from a different speaker) at the same time?

No, you would need two instances of DeepSpeech, or you combine both channels into mono.
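If you go the two-instance route, the per-channel results can be merged back into one conversation by time. The following is only a sketch: it assumes you have already turned each channel's output into a list of `(start_seconds, text)` pairs (how you derive those from the metadata is not shown), and the speaker labels `agent` and `customer` as well as the function name are hypothetical.

```python
def interleave_turns(agent_segments, customer_segments):
    """Merge two lists of (start_seconds, text) into one time-ordered list.

    Returns (start_seconds, speaker, text) tuples. The speaker labels are
    placeholders chosen for this example.
    """
    tagged = [(start, "agent", text) for start, text in agent_segments]
    tagged += [(start, "customer", text) for start, text in customer_segments]
    # Sort on the start time only; the sort is stable, so segments that
    # start at the same moment keep the agent-before-customer order.
    return sorted(tagged, key=lambda turn: turn[0])


if __name__ == "__main__":
    agent = [(0.0, "Hello, how can I help?"), (5.2, "Sure, one moment.")]
    customer = [(2.1, "Hi, I have a billing question.")]
    for start, speaker, text in interleave_turns(agent, customer):
        print(f"[{start:6.1f}s] {speaker}: {text}")
```

Sorting by start time is the simplest possible policy; overlapping speech would need a more careful treatment.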