Per letter probability

Piero_Volante · February 7, 2021, 3:37pm

Hi! Does anyone know how to get the probability of each letter when running inference on a sentence with DeepSpeech? For example, if I have:
“The sky is blue”

T = 0.7, D = 0.2, H = 0.1 … (the sum over the alphabet = 1)
H = 0.9 …
E = 0.8 …
…
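To make the request concrete: DeepSpeech's acoustic model is trained with CTC, so it emits a probability distribution over the alphabet (plus a CTC "blank" symbol) for each audio time step, not one per letter of the final transcript. The sketch below uses a made-up toy alphabet and made-up logits, not real model output. It shows how each time step's scores are normalized with a softmax so that they sum to 1, which matches the "T = 0.7, D = 0.2, H = 0.1" shape above.

```python
import math

# Hypothetical toy alphabet; the position of the CTC blank here is arbitrary.
ALPHABET = ["<blank>", " ", "t", "h", "e", "d"]

def softmax(logits):
    """Turn one time step's raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def per_step_distributions(logit_rows):
    """Normalize every time step (row) independently."""
    return [softmax(row) for row in logit_rows]

def top_k(dist, k=3):
    """Return the k most likely symbols for one time step as (symbol, prob)."""
    ranked = sorted(zip(ALPHABET, dist), key=lambda p: p[1], reverse=True)
    return ranked[:k]

# Made-up logits for three time steps (illustrative only).
logits = [
    [0.1, 0.0, 2.5, 1.2, 0.3, 1.4],
    [0.2, 0.1, 0.4, 3.0, 0.5, 0.1],
    [0.3, 0.0, 0.2, 0.4, 2.8, 0.2],
]

for step, dist in enumerate(per_step_distributions(logits)):
    print(step, [(s, round(p, 2)) for s, p in top_k(dist)])
```

Because CTC can emit blanks or repeat a character across several steps, mapping these per-step rows onto the letters of "The sky is blue" still requires knowing at which step each letter was emitted.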

lissyx · February 7, 2021, 4:11pm

Have you had a look at the API, especially the Metadata data structure?

Piero_Volante · February 7, 2021, 4:50pm

Yes, I have already checked, but I can only get probabilities at the word level.

lissyx · February 7, 2021, 5:25pm

The Metadata data structure should expose character-level information. In Python we use it to get back to the word level: https://github.com/mozilla/DeepSpeech/blob/master/native_client/python/client.py#L38-L68, but the raw data you get from either the C API or the bindings should be character-level.
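A minimal sketch of what that character-level data looks like, assuming the DeepSpeech 0.9 Python bindings, where `Model.sttWithMetadata(audio, num_results)` returns a `Metadata` object whose `transcripts` are `CandidateTranscript` objects, each holding a list of `tokens` with `text`, `timestep` and `start_time`. To keep it runnable without a model, small dataclasses stand in for those objects; the word grouping is a simplified version of the idea in the linked client.py, not a copy of it.

```python
from dataclasses import dataclass
from typing import List

# Stand-ins mirroring the field names of DeepSpeech's TokenMetadata and
# CandidateTranscript (assumed from the 0.9 Python bindings). With a real
# model these would come from ds.sttWithMetadata(audio, 1).transcripts[0].
@dataclass
class TokenMetadata:
    text: str          # one character of the transcript
    timestep: int      # output time step at which it was emitted
    start_time: float  # seconds

@dataclass
class CandidateTranscript:
    tokens: List[TokenMetadata]

def characters(transcript):
    """Character-level view: one (char, start_time) pair per token."""
    return [(t.text, t.start_time) for t in transcript.tokens]

def words(transcript):
    """Group characters into words on spaces, keeping each word's start time."""
    result, current, start = [], "", 0.0
    for tok in transcript.tokens:
        if tok.text == " ":
            if current:
                result.append((current, start))
            current = ""
            continue
        if not current:
            start = tok.start_time  # first character of a new word
        current += tok.text
    if current:
        result.append((current, start))
    return result

# Toy transcript "the sky" with made-up timings (20 ms per step).
demo = CandidateTranscript(
    tokens=[TokenMetadata(c, i, i * 0.02) for i, c in enumerate("the sky")],
)
print(characters(demo)[:3])
print(words(demo))
```

Note that each token carries text and timing only; there is no per-character score on it.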

reuben · February 7, 2021, 7:25pm

Currently we only expose confidence scores per candidate transcript. Per-letter scores could be exposed, and someone was working on that a while ago, but sadly we never received a pull request.
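A sketch of what is available today, again assuming the 0.9 Python bindings: asking `sttWithMetadata(audio, num_results)` for several results gives an N-best list in which each `CandidateTranscript` has exactly one `confidence` value for the whole sentence. The dataclasses, the helper `make_candidate`, and the scores are all hypothetical stand-ins so the example runs without a model.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TokenMetadata:  # stand-in for DeepSpeech's TokenMetadata
    text: str
    timestep: int
    start_time: float

@dataclass
class CandidateTranscript:  # stand-in for DeepSpeech's CandidateTranscript
    tokens: List[TokenMetadata]
    confidence: float  # one score for the whole candidate, not per token

@dataclass
class Metadata:  # stand-in for DeepSpeech's Metadata
    transcripts: List[CandidateTranscript]

def make_candidate(text, confidence):
    """Hypothetical helper: build a stand-in candidate from a string."""
    return CandidateTranscript(
        tokens=[TokenMetadata(c, i, i * 0.02) for i, c in enumerate(text)],
        confidence=confidence,
    )

def ranked_candidates(metadata):
    """(text, confidence) for every candidate, best score first."""
    pairs = [("".join(t.text for t in c.tokens), c.confidence)
             for c in metadata.transcripts]
    return sorted(pairs, key=lambda p: p[1], reverse=True)

# Made-up N-best list, standing in for sttWithMetadata(audio, 3).
meta = Metadata(transcripts=[
    make_candidate("the sky is blew", -15.2),
    make_candidate("the sky is blue", -11.7),
    make_candidate("the ski is blue", -18.4),
])
for text, conf in ranked_candidates(meta):
    print(f"{conf:8.2f}  {text}")
```

The score tells you how the candidates compare with each other, but it does not say which individual letters were uncertain.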

lissyx · February 8, 2021, 8:42am

Right, I was not looking carefully: confidence is attached to CandidateTranscript, which holds a set of Tokens, and not to individual tokens. Sorry about the confusion, @Piero_Volante.
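Until per-token scores are exposed, one rough workaround (an assumption, not something DeepSpeech provides) is to take each token's `timestep` from the metadata and look up that character's probability in the acoustic model's output distribution at that step. This needs access to the per-step probabilities, which the bindings do not return. Here `probs` is a made-up matrix standing in for them, and the toy `ALPHABET` is hypothetical. The lookup also ignores any language-model rescoring done by the decoder, so the result is only a per-step acoustic estimate.

```python
from dataclasses import dataclass

# Hypothetical toy alphabet; the position of the CTC blank is arbitrary here.
ALPHABET = ["<blank>", " ", "t", "h", "e"]

@dataclass
class TokenMetadata:  # stand-in for DeepSpeech's TokenMetadata
    text: str
    timestep: int
    start_time: float

def per_token_probs(tokens, probs):
    """Probability of each emitted character at the step where it was emitted."""
    index = {sym: i for i, sym in enumerate(ALPHABET)}
    out = []
    for tok in tokens:
        row = probs[tok.timestep]  # distribution over ALPHABET at that step
        out.append((tok.text, row[index[tok.text]]))
    return out

def full_distribution(tokens, probs):
    """Whole per-step distribution at each token, like the T/D/H example."""
    return [dict(zip(ALPHABET, probs[tok.timestep])) for tok in tokens]

# Made-up per-step probabilities (each row sums to 1), 6 time steps.
probs = [
    [0.10, 0.05, 0.70, 0.10, 0.05],
    [0.80, 0.05, 0.05, 0.05, 0.05],
    [0.05, 0.00, 0.05, 0.85, 0.05],
    [0.70, 0.10, 0.05, 0.10, 0.05],
    [0.05, 0.05, 0.05, 0.05, 0.80],
    [0.90, 0.05, 0.00, 0.00, 0.05],
]
tokens = [TokenMetadata("t", 0, 0.00), TokenMetadata("h", 2, 0.04),
          TokenMetadata("e", 4, 0.08)]
print(per_token_probs(tokens, probs))
print(full_distribution(tokens, probs)[0])
```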

Related topics

Topic	Category	Replies	Views	Activity
How to obtain probabilities of each character	DeepSpeech	4	540	July 24, 2020
Per word confidence	DeepSpeech	9	1678	April 9, 2019
Obtain per-word confidence score	DeepSpeech	1	1043	September 12, 2019
How to get the confidence level of the words during prediction?	DeepSpeech	1	484	February 14, 2020
Simple way to get at raw probabilities/logits via python bindings?	DeepSpeech	3	866	October 11, 2018