Reliability of Metadata.confidence

I’ve found the confidence to be nowhere near reliable. Even when the transcription was spot on, the confidence often comes out as 99% or 0%, seemingly at random. Do I understand correctly that the number given is a logit value, and that to get an actual usable probability/confidence I calculate:
exp(confidence) / (1 + exp(confidence)) right?
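For reference, here is a minimal sketch of that logistic mapping, assuming the value really is a single logit (the function name is mine, not part of the DeepSpeech API):

```python
import math

def logit_to_probability(confidence):
    """Map a logit to a probability via the logistic (sigmoid) function:
    exp(x) / (1 + exp(x))."""
    return math.exp(confidence) / (1.0 + math.exp(confidence))

# A logit of 0 corresponds to a probability of 0.5.
print(logit_to_probability(0.0))
```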
It also differs a lot between the default language model and a custom language model with just 20 vocabulary entries. With the custom model I get more reasonable results, but with the default model it’s almost always 0%, even when the transcription was good.

Is this normal?

A bit late, but I also looked into it. According to the docs (https://deepspeech.readthedocs.io/en/v0.6.1/Python-API.html#metadata), it is the sum of logits, so you should first divide it by the number of frames (time in seconds * 50, since frames are 20 ms long, i.e. 50 per second) and only then convert it to a probability. Even then, the result has to be taken with a grain of salt, as reuben pointed out in “How to normalize confidence values?”. Just use the logits to compare transcripts against each other; the value is monotonic.
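Putting that together, a sketch of the normalization might look like this. The function name and the 50-frames-per-second figure (from the 20 ms frame step mentioned above) are assumptions, not part of the DeepSpeech API:

```python
import math

def normalized_confidence(summed_logit, audio_seconds, frames_per_second=50):
    """Average a summed logit over the number of frames, then squash it
    into (0, 1) with the logistic function.

    frames_per_second=50 assumes DeepSpeech's 20 ms frame step; adjust if
    your model uses a different step size."""
    num_frames = audio_seconds * frames_per_second
    mean_logit = summed_logit / num_frames
    return math.exp(mean_logit) / (1.0 + math.exp(mean_logit))

# Example: a summed logit of -100 over 2 s of audio (100 frames)
# gives a mean logit of -1.0, i.e. roughly a 27% "confidence".
print(normalized_confidence(-100.0, 2.0))
```

As noted above, treat the result as a rough relative score rather than a calibrated probability.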