How to normalize confidence values?

GregorM · November 22, 2019, 4:08pm

Confidence from the metadata of the native client JSON call seems to be: “roughly the sum of the acoustic model logit values for each timestep/character that contributed to the creation of this transcription.”

So, to get values between 0 and 1, should I divide by the number of alphabet characters? And would a “good” transcription then score above 90%?

Any ideas, comments? What are typical good values?

GregorM · November 27, 2019, 9:11am

Anyone? It would be great to get some sort of input.

reuben · November 27, 2019, 3:49pm

You should only use it relatively, to compare against other confidence values. There’s no need to convert it to an absolute scale, and there’s no “good transcription” cutoff point.
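As a rough illustration of "relative" use, here is a minimal sketch that ranks a few candidate transcripts by confidence and reports each one's gap to the best. The `candidates` list, its numbers, and the helper names `rank_by_confidence` and `confidence_gaps` are all made up for this example; they are not part of the native client's API or output.

```python
# Hypothetical (transcript, confidence) pairs; the numbers are invented for illustration.
candidates = [
    ("turn on the light", -12.4),
    ("turn on the lights", -15.1),
    ("turn in the light", -21.8),
]


def rank_by_confidence(pairs):
    """Return the pairs sorted from highest to lowest confidence."""
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def confidence_gaps(pairs):
    """Return (transcript, gap) pairs, where gap is the distance to the best confidence."""
    ranked = rank_by_confidence(pairs)
    best = ranked[0][1]
    return [(text, best - conf) for text, conf in ranked]


ranked = rank_by_confidence(candidates)
gaps = confidence_gaps(candidates)

for text, gap in gaps:
    print(f"{text!r}: gap to best = {gap:.1f}")
```

Only the ordering and the gaps carry meaning here; the raw numbers say nothing on their own about whether the top candidate is correct.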

GregorM · November 28, 2019, 9:12am

Thanks, that helps. I guess the big players use their own test data to “normalize” their values to the 0…1 range.
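Purely as a guess at what such a normalization could look like, one option is a percentile rank against confidences collected on your own test set. This is only a sketch under that assumption: the `reference` values are made up, `percentile_rank` is a hypothetical helper, and the result is still a relative position within the reference set, not a probability that the transcription is correct.

```python
from bisect import bisect_right


def percentile_rank(score, reference_scores):
    """Return the fraction (0..1) of reference scores that are <= score."""
    if not reference_scores:
        raise ValueError("reference_scores must not be empty")
    ordered = sorted(reference_scores)
    return bisect_right(ordered, score) / len(ordered)


# Hypothetical confidences gathered on a test set (invented numbers).
reference = [-40.2, -31.5, -25.0, -18.3, -12.4, -9.9, -7.1, -3.6]

rank = percentile_rank(-12.4, reference)
print(f"percentile rank: {rank:.3f}")
```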