How to apply the CTC decoder to process the microphone input?

alansiu · September 18, 2020, 1:47am

Hi everyone,

The trained model gives good results. I trained the model over 1000 times, the test function matched my expectations, and the loss is lower than 10. However, the results are poor when I use microphone input with web_microphone_websocket.

I suspect the main reason is that the training process includes the Connectionist Temporal Classification (CTC) decoder and the microphone websocket doesn’t, which would explain the difference.

When I say one word, e.g. “one”, for 3 seconds, the websocket returns 8 to 10 words.
When I say one word, e.g. “one”, for 0.5 seconds, the websocket returns 2 to 3 words.

So, how can I insert the CTC decoder to process the microphone input?

Thank you.

lissyx · September 18, 2020, 7:48am

What makes you think that? CTC is part of the model, so it’s used by any code relying on our model + libdeepspeech.so
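
For context, here is a minimal sketch of what CTC decoding does conceptually: consecutive repeated per-frame labels are merged and the blank symbol is dropped. This is a simplified greedy version for illustration only. The library performs beam-search decoding internally, and the function name `greedy_ctc_collapse` and the `_` blank symbol are hypothetical choices for this example, not part of any DeepSpeech API.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol, chosen only for this illustration


def greedy_ctc_collapse(frame_labels, blank=BLANK):
    """Merge consecutive repeated labels, then drop blank symbols."""
    # groupby yields one key per run of identical consecutive labels.
    merged = [label for label, _run in groupby(frame_labels)]
    return "".join(label for label in merged if label != blank)


# One label per audio frame; a word held for more frames still collapses
# to the same text.
print(greedy_ctc_collapse("oo_nnn__ee"))
print(greedy_ctc_collapse("ooooo___nnnnnn____eeee"))
```

Both calls print `one`. If decoding behaved like this sketch, holding a word for longer would not by itself produce extra words.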

It’s more likely that this example code is not very reliable in your case. Possible causes:

accent?
trained model?
processing too slow, dropping frames?
incorrect sound capture?
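
One way to check the sound capture is to save a few seconds of the microphone audio to a WAV file and inspect its format. The sketch below assumes the model expects 16 kHz, mono, 16-bit PCM audio. That is an assumption to confirm against your own training setup, and `describe_format_problems` is a hypothetical helper name.

```python
import wave

# Assumed target format; adjust these to match how your model was trained.
EXPECTED_RATE = 16000      # samples per second
EXPECTED_CHANNELS = 1      # mono
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample (16-bit PCM)


def describe_format_problems(wav_path):
    """Return a list of mismatches between a WAV file and the assumed format."""
    problems = []
    with wave.open(wav_path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected {EXPECTED_CHANNELS}")
    if width != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"{width * 8}-bit samples, expected {EXPECTED_SAMPLE_WIDTH * 8}-bit")
    return problems
```

Calling `describe_format_problems("capture.wav")` (the file name is a placeholder) returns an empty list when the recording matches the assumed format. Browsers often capture at 44.1 kHz or 48 kHz, so a mismatch here is worth ruling out before looking at the decoder.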

lissyx · September 18, 2020, 7:49am

PS: When you reach out for support, it’s nice to give a bit of context. I have no idea what exactly you are doing, so I can’t help you.

othiele · September 18, 2020, 9:26am

The examples work fine without doing the CTC decoding yourself.