Hi
I would like to convert call center conversation recordings into text. It works fine for mono channel audio.
But I can't figure out how to process dual-channel audio. I couldn't find anything about it in the documentation, so I'm wondering whether DeepSpeech supports dual-channel audio at all.
DeepSpeech takes mono input, converts it into MFCC features (look them up) and basically processes those. So if you have dual-channel input, you first need to get it into a form that can be turned into that MFCC representation. I would suggest you either use ffmpeg or something similar to mix it down to mono, or process each channel separately. With the second option, the metadata output (`sttWithMetadata` in the Python API) even gives you timestamps for when speech occurred, so you could reconstruct the conversation.
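For the "process each channel separately" route, here is a minimal sketch using only the Python standard library. It assumes 16-bit PCM stereo WAV with one speaker per channel, which is common for call-center recorders but worth checking for yours. Each mono WAV it returns could then be fed to DeepSpeech on its own (the released models expect 16 kHz audio, so you may still need to resample). The `merge_turns` helper and the "agent"/"customer" labels are hypothetical: it assumes you have already turned each channel's DeepSpeech metadata into `(start_time, text)` pairs, and it simply interleaves them by time.

```python
import array
import io
import wave


def split_stereo_wav(data: bytes) -> tuple[bytes, bytes]:
    """Split a 16-bit PCM stereo WAV (as bytes) into two mono WAVs (as bytes)."""
    with wave.open(io.BytesIO(data), "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM")
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # 'h' is a 2-byte signed sample. We only pick out whole samples and never
    # do arithmetic on them, so the host byte order doesn't matter here.
    samples = array.array("h")
    samples.frombytes(frames)
    # Stereo frames are interleaved: L0, R0, L1, R1, ...
    left, right = samples[0::2], samples[1::2]

    def to_wav(channel: array.array) -> bytes:
        buf = io.BytesIO()
        with wave.open(buf, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(2)
            dst.setframerate(rate)
            dst.writeframes(channel.tobytes())
        return buf.getvalue()

    return to_wav(left), to_wav(right)


def merge_turns(agent, customer):
    """Hypothetical helper: interleave (start_time, text) pairs from two
    channels into one transcript ordered by start time."""
    tagged = [(t, "agent", s) for t, s in agent]
    tagged += [(t, "customer", s) for t, s in customer]
    tagged.sort(key=lambda item: item[0])
    return [f"{who}: {text}" for _, who, text in tagged]
```

For example, `merge_turns([(0.0, "hello"), (2.5, "sure")], [(1.2, "hi there")])` gives `["agent: hello", "customer: hi there", "agent: sure"]`. Keeping the channels apart instead of mixing to mono also keeps overlapping speech from the two parties from being blended into one signal.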