Hi all, and thanks again for this awesome project.
–
With the recent changes in the 0.5.1 release (and above), I’m having trouble getting access to what used to be the logits (a vector<float>) during the decode_metadata method.
Once upon a time we made some development-only changes to (our version of) the 0.4.x branch to add confidence estimates at the character level. This was discussed a bit here.
This worked great for what we were doing then, but now we want to do this again, for real this time ™. Unfortunately, for various reasons, we had to wait a bit before picking up the project of pushing this into production.
Problem is, we don’t want to make our production changes in an out-of-date branch, and in the meantime DeepSpeech has moved on (from 0.4.x to 0.5.1, and now 0.6.x), so merging our changes with the upstream changes is proving quite problematic. I’d appreciate any advice on the best way forward from here.
–
To be more specific…
Once upon a time, decode_metadata was handed a simple vector of logits, like this:
decode_metadata(const vector<float>& logits)
This was super handy because I could just save the probabilities (of the best guess) directly into the MetaDataItem like this (after making the relevant changes to MetaDataItem):
items[i].probability =
logits[best.timesteps[i] * num_classes + best.tokens[i]];
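For anyone who wants to see the lookup in isolation, here is a minimal standalone sketch of it. It assumes the logits are a flattened, row-major [time x num_classes] array, as they were in our 0.4.x-based code; `BestPath` and `token_probabilities` are hypothetical names for illustration, not DeepSpeech types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the best beam: for each emitted token,
// the time step it was emitted at and its index in the alphabet.
struct BestPath {
  std::vector<unsigned int> timesteps;
  std::vector<unsigned int> tokens;
};

// Returns, for each emitted token, the acoustic-model output at
// (timestep, token). Assumes `logits` is laid out row-major as
// [time x num_classes].
std::vector<float> token_probabilities(const std::vector<float>& logits,
                                       const BestPath& best,
                                       std::size_t num_classes) {
  assert(best.timesteps.size() == best.tokens.size());
  std::vector<float> out;
  out.reserve(best.tokens.size());
  for (std::size_t i = 0; i < best.tokens.size(); ++i) {
    const std::size_t idx = best.timesteps[i] * num_classes + best.tokens[i];
    assert(idx < logits.size());  // guard against out-of-range lookups
    out.push_back(logits[idx]);
  }
  return out;
}
```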
Also, we had some entropy-like calculations based on the full vector of logits for that time step, where low entropy was basically considered high confidence and, conversely, high entropy meant more uncertainty in the acoustic model.
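To give a rough idea of what I mean by entropy-like, here is a simplified sketch (not our exact code). It assumes the values for each time step are softmax probabilities that sum to 1, stored row-major as [time x num_classes]. The function names and the normalisation by log(num_classes) are just illustrative choices.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Shannon entropy (in nats) of the class distribution at one time step.
// Assumes each row of `probs` is a probability distribution.
double timestep_entropy(const std::vector<float>& probs,
                        std::size_t timestep,
                        std::size_t num_classes) {
  assert((timestep + 1) * num_classes <= probs.size());
  double entropy = 0.0;
  for (std::size_t c = 0; c < num_classes; ++c) {
    const double p = probs[timestep * num_classes + c];
    if (p > 0.0) entropy -= p * std::log(p);  // treat 0 * log(0) as 0
  }
  return entropy;
}

// Illustrative confidence in [0, 1]: 1 for a one-hot distribution,
// 0 for a uniform one (entropy divided by its maximum, log(num_classes)).
double timestep_confidence(const std::vector<float>& probs,
                           std::size_t timestep,
                           std::size_t num_classes) {
  if (num_classes < 2) return 1.0;
  const double max_entropy = std::log(static_cast<double>(num_classes));
  return 1.0 - timestep_entropy(probs, timestep, num_classes) / max_entropy;
}
```

The point is that this needs the whole row for a time step, not just the value of the winning character, which is why losing access to the full vector hurts.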
But the problem is, in the 0.5.1 release the signature has changed to
decode_metadata(DecoderState* state)
And in the master branch (aka 0.6.x) it’s just
decode_metadata()
So, my problem is: what’s the best way to get back to the logits? Do I need to change decode_raw and save them out into the DecoderState, or is there some way to save these values out without getting down into the ctc_decoder level?
Or, at the very least, is there a way to preserve just the confidence of the alphabet character picked at each step of the ctc_decoder output?
I guess I’ll try to muddle through, but any suggestions on how best to proceed here would be much appreciated!
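To make the first option a bit more concrete, this is roughly the kind of thing I imagine bolting onto the decoder state. It is purely a sketch: `ProbHistory` is a hypothetical helper, not anything in DeepSpeech, and it assumes the decoder is fed chunks as a flat row-major [time_dim x class_dim] array of doubles and that the timesteps reported for the best path count from the start of the stream.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: keeps a copy of every chunk of acoustic-model
// output so it can still be looked up when the metadata is built.
class ProbHistory {
 public:
  // Append one chunk laid out row-major as [time_dim x class_dim].
  void append(const double* probs, int time_dim, int class_dim) {
    assert(probs != nullptr && time_dim >= 0 && class_dim > 0);
    const std::size_t classes = static_cast<std::size_t>(class_dim);
    assert(class_dim_ == 0 || class_dim_ == classes);  // width must not change
    class_dim_ = classes;
    data_.insert(data_.end(), probs,
                 probs + static_cast<std::size_t>(time_dim) * classes);
  }

  // Value for `token` at absolute time step `timestep`.
  double at(std::size_t timestep, std::size_t token) const {
    assert(token < class_dim_);
    const std::size_t idx = timestep * class_dim_ + token;
    assert(idx < data_.size());
    return data_[idx];
  }

  std::size_t num_timesteps() const {
    return class_dim_ == 0 ? 0 : data_.size() / class_dim_;
  }

 private:
  std::vector<double> data_;
  std::size_t class_dim_ = 0;
};
```

The obvious downside is that it copies the whole output for the stream, so memory grows with audio length. If only the picked character’s confidence is needed, storing one value per emitted token would be much cheaper.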
Thanks in advance…