Hi everyone,
The trained model gives good results. I trained it for over 1000 iterations, the output of the test function matched my expectations, and the loss is below 10. However, the results are not as expected when I use microphone input with web_microphone_websocket.
I suspect the main reason is that the training process includes the Connectionist Temporal Classification (CTC) decoder, while the microphone websocket pipeline does not, and that this causes the difference.
When I speak one word (e.g. "one") for 3 seconds, the websocket returns 8 to 10 words.
When I speak the same word for 0.5 seconds, the websocket returns 2 to 3 words.
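For reference, here is a minimal sketch of greedy (best-path) CTC decoding in Python. It assumes the acoustic model outputs one probability vector per audio frame and that the blank label has index 0; both are assumptions, not details from my setup. It illustrates one plausible cause of the symptoms above. CTC assigns a label to every frame, so a word held for longer covers more frames. If repeated frame labels are not merged and blanks are not removed, the output grows with the duration of the speech.

```python
from itertools import groupby
from typing import List, Sequence

BLANK_ID = 0  # assumption: index of the CTC blank label in the model's output layer


def argmax_per_frame(probs: Sequence[Sequence[float]]) -> List[int]:
    """Pick the most likely label index in each frame (the 'best path')."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in probs]


def ctc_greedy_decode(frame_ids: Sequence[int], blank: int = BLANK_ID) -> List[int]:
    """Greedy CTC decoding of per-frame label IDs.

    1. Merge consecutive identical labels (one label can span many frames).
    2. Drop the blank label.
    """
    collapsed = [label for label, _ in groupby(frame_ids)]
    return [label for label in collapsed if label != blank]


# Hypothetical per-frame output for the word "one" (label 1), blank = 0.
short_utterance = [0, 1, 1, 0]                   # spoken quickly: few frames
long_utterance = [0, 1, 1, 1, 1, 1, 1, 1, 0, 0]  # stretched out: many frames

print(ctc_greedy_decode(short_utterance))  # [1]
print(ctc_greedy_decode(long_utterance))   # [1]
```

Without the collapse step, the long utterance would yield seven copies of label 1 and the short one two, which matches the pattern of "more words for longer speech".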
So, how can I add the CTC decoder to process the microphone input?
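Because the websocket delivers audio in chunks, a decoder applied per chunk also has to remember the last frame label from the previous chunk. Otherwise a word that spans a chunk boundary is emitted twice. The sketch below is a hypothetical helper (`StreamingGreedyCTCDecoder` is not part of any library). It assumes you can get per-frame argmax label IDs from the model for each chunk, that chunks do not overlap, and that the blank index is 0. It only handles greedy decoding; a beam search decoder with a language model would need its own streaming state.

```python
from typing import Iterable, List


class StreamingGreedyCTCDecoder:
    """Greedy CTC decoding over audio that arrives in chunks (hypothetical helper)."""

    def __init__(self, blank: int = 0) -> None:
        self.blank = blank  # assumption: blank label index is 0
        self._prev = blank  # last frame label seen, carried across chunks

    def feed(self, frame_ids: Iterable[int]) -> List[int]:
        """Return the labels newly emitted by this chunk of per-frame label IDs."""
        emitted = []
        for label in frame_ids:
            # Emit only at the first frame of a run of a non-blank label.
            if label != self.blank and label != self._prev:
                emitted.append(label)
            self._prev = label
        return emitted

    def reset(self) -> None:
        """Call between utterances so state does not leak into the next one."""
        self._prev = self.blank


decoder = StreamingGreedyCTCDecoder(blank=0)
# The word "one" (label 1) is split across two websocket chunks.
print(decoder.feed([0, 1, 1]))  # [1]
print(decoder.feed([1, 1, 0]))  # [] (continuation of the same run)
```

Keeping the state in the decoder object, rather than decoding each chunk independently, is what prevents the duplicate at the boundary.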
Thank you.