How to apply the CTC decoder to process the microphone input?

Hi everyone,

The results from the trained model are good. I trained the model more than 1000 times, and the test function matched my expectations. The loss is below 10. However, the output is not what I expect when I use microphone input with web_microphone_websocket.

I suspect the main reason is that the training process includes the Connectionist Temporal Classification (CTC) decoder while the microphone websocket example does not, and that this causes the difference.

When I speak one word (e.g. "one") for 3 seconds, the websocket returns 8 to 10 words.
When I speak one word (e.g. "one") for 0.5 seconds, the websocket returns 2 to 3 words.

So, how can I insert the CTC decoder into the pipeline that processes the microphone input?

Thank you.

What makes you think that? CTC decoding is part of the model, so it is used by any code relying on our model + libdeepspeech.so.

It's more likely that this example code is simply not reliable in your case. Possible causes:

  • accent?
  • trained model?
  • processing too slow, dropping frames?
  • incorrect sound capture? (see the sketch below for a quick sanity check)
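
On the last point, one quick way to rule out capture problems is to dump exactly what reaches the recognizer to a WAV file and listen to it; a sample-rate or channel mismatch produces exactly this kind of garbage output. A minimal sketch (hypothetical helper, assuming the captured audio arrives as raw 16-bit PCM bytes):

```python
# Hypothetical sanity check: write the captured buffer to disk and inspect it.
# A sample-rate or channel mismatch is a common cause of garbage transcripts.
import wave

def dump_capture(raw_bytes: bytes, path: str = "capture_check.wav") -> None:
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)      # DeepSpeech expects mono
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(16000)  # 16 kHz, the sample rate the model expects
        wav.writeframes(raw_bytes)
    # Play back capture_check.wav: if it sounds too fast, too slow, or distorted,
    # the capture settings (not the decoder) are the problem.
```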

PS: when you ask for support, it's nice to give a bit of context. I have no idea what you are doing precisely, so I can't help you further.

The examples work fine without you doing the CTC decoding yourself.
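
For illustration, here is a minimal sketch using the Python deepspeech bindings (0.7+ streaming API assumed; the model/scorer paths and the mic_chunks() generator are placeholders): you feed raw 16 kHz, 16-bit mono PCM and get back text that has already been CTC-decoded inside the library.

```python
# Minimal sketch: CTC decoding happens inside the DeepSpeech model itself.
# Assumes the Python `deepspeech` bindings (0.7+); paths are placeholders.
import numpy as np
from deepspeech import Model

model = Model("deepspeech-0.9.3-models.pbmm")                  # acoustic model
model.enableExternalScorer("deepspeech-0.9.3-models.scorer")   # optional language model

# Streaming inference: feed raw 16 kHz, 16-bit mono PCM chunks as they arrive
# from the microphone; the decoder runs internally.
stream = model.createStream()
for chunk in mic_chunks():  # hypothetical generator yielding bytes of PCM audio
    stream.feedAudioContent(np.frombuffer(chunk, dtype=np.int16))
    print("partial:", stream.intermediateDecode())

print("final:", stream.finishStream())  # already CTC-decoded transcript
```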