Is there any possibility of getting the alternative transcripts if I am not using the bindings? From what I have seen, the sttWithMetadata bindings function is now capable of doing this, but I wanted to ask if there is any other way without the bindings.
Specifically, I am using an evaluate.py-like function for efficient inference on the GPU and would also like to get multiple alternative transcripts.
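For reference, this is roughly what I mean by the bindings doing it (a sketch, assuming a 0.7+ release of the deepspeech Python package; file names are placeholders):

```python
import wave

import numpy as np
from deepspeech import Model

model = Model('deepspeech-0.9.3-models.pbmm')

with wave.open('audio_16k_mono.wav', 'rb') as wav:  # 16 kHz mono 16-bit PCM
    audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

# The second argument is the number of candidate transcripts to return.
metadata = model.sttWithMetadata(audio, 5)
for candidate in metadata.transcripts:
    text = ''.join(token.text for token in candidate.tokens)
    print(candidate.confidence, text)
```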
lissyx:
You’d have to hack the CTC decoder Python interface to try and expose them, I guess, but that feels like a lot of painful work.
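Concretely, in an evaluate.py-style setup the batch CTC decoder already hands back a ranked list of (score, transcript) candidates per sample, and evaluate.py only keeps the best one. A rough sketch (the variables come from the usual evaluate.py loop; the num_results keyword is an assumption that may not exist in older ds_ctcdecoder versions):

```python
from ds_ctcdecoder import ctc_beam_search_decoder_batch, Scorer

# batch_logits / batch_lengths are the acoustic model outputs, exactly as in
# evaluate.py; alphabet, beam_width, num_processes and scorer as configured there.
decoded = ctc_beam_search_decoder_batch(
    batch_logits,
    batch_lengths,
    alphabet,
    beam_width,
    num_processes=num_processes,
    scorer=scorer,
    num_results=5,  # assumed kwarg: how many candidates to keep per sample
)

for candidates in decoded:
    # evaluate.py only uses candidates[0][1] (the best transcript); the rest
    # of the list holds the alternatives, ordered best first.
    for score, transcript in candidates:
        print(score, transcript)
```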
lissyx:
What is your exact need? We have had some people trying to get more efficient GPU inference using the library, so maybe there is something doable there.
Yes, exactly, and since there is already a way to do it with the bindings, maybe there is an opportunity to reuse that. I understand that I would have to try to hack it myself, then.
Nothing really special: I am running inference with the trained model loaded from the checkpoint, as recommended here: Deepspeech inference with multiple gpu - #5 by lissyx. I read the incoming JSON files, which list the files to process, and run inference on them.
lissyx:
Right, but why do you rely on this setup instead of a CUDA-enabled libdeepspeech.so? Do you need batching for a high volume of transcriptions?
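With the GPU wheel you load the model once and just loop over your file list, something like this (a sketch; `pip install deepspeech-gpu` pulls in a CUDA-enabled libdeepspeech.so, and the paths below are placeholders):

```python
import glob
import wave

import numpy as np
from deepspeech import Model

# Load once and reuse; with deepspeech-gpu the acoustic model runs through
# the CUDA-enabled libdeepspeech.so.
model = Model('deepspeech-0.9.3-models.pbmm')

def read_wav(path):
    with wave.open(path, 'rb') as wav:  # expects 16 kHz mono 16-bit PCM
        return np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

for path in sorted(glob.glob('to_process/*.wav')):
    print(path, model.stt(read_wav(path)))
```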
Yes, exactly. Should I then use libdeepspeech.so to achieve it? I checked the documentation for it, but it only mentions how to compile it. With evaluate.py it was easier because I had a specific example I could follow.
lissyx: