Confidence of STT transcription

I am using DeepSpeech 0.6.1, which so far gives me the best transcriptions.
I tried 0.8.2, but perhaps because the scorer-building process is different, I have not been able to build a correct LM and scorer, so I could not get comparable results.

My question is: is it possible to get a confidence value for the STT transcription during transcription and alignment? My DeepSpeech is 0.6.1 and my DSAlign is also version 0.6.1.

I mean a confidence value for each fragment, for DeepSpeech 0.6.1 with DSAlign.
Please guide me.

As far as I know, the current DSAlign can only be run with the newer models (>= 0.7), but it is a bit old, so you could use an older commit; check the repo. Since Tilman is no longer on the project, though, you won’t have any support.

You are right. My question is just about the confidence. I have the old version as well; the difference between them is mostly logistics.
Can the newer version provide confidence?

Don’t know, you’ll have to try it yourself.


No, we can’t provide features from a newer version on an older one.

Help yourself, read the documentation of the API: https://deepspeech.readthedocs.io/en/v0.9.1/Structs.html#_CPPv4N19CandidateTranscript10confidenceE

We are happy to try and support people to the extent of our availability, but please help us by checking the documentation thoroughly first.

Basically, searching for “confidence” would have been a good start: https://deepspeech.readthedocs.io/en/v0.9.1/search.html?q=confidence&check_keywords=yes&area=default

Dear friends, this I know. Thank you again.
OK, in simple words:

How do I get this confidence value while transcribing? There is no flag for it that I can see among the deepspeech flags. If you can guide me: I want start time, end time, and transcript, together with the confidence for that fragment.

As said multiple times and shown in the linked docs, this is exposed in the metadata of the API.
Some CLI clients provide --json, but you should rather use the bindings for your language, if they exist, instead of relying on executing a third-party binary like that. But again, since you don’t share details on what you are using, we can’t help efficiently.
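For what it’s worth, here is a rough sketch of reading the confidence through the Python bindings (DeepSpeech >= 0.7); the model, scorer, and wav paths below are placeholders you would replace with your own. Note that, per the API docs, `confidence` is roughly a sum of acoustic-model logit values, not a normalized 0–1 probability.

```python
def best_transcript_and_confidence(metadata):
    """Return (text, confidence) for the highest-confidence candidate.

    `metadata` is the object returned by Model.sttWithMetadata(); each
    CandidateTranscript carries a `confidence` float and a list of
    per-character `tokens`.
    """
    best = max(metadata.transcripts, key=lambda t: t.confidence)
    text = "".join(token.text for token in best.tokens)
    return text, best.confidence

if __name__ == "__main__":
    import wave
    import numpy as np
    import deepspeech  # pip install deepspeech

    # Placeholder file names, not real paths from this thread.
    model = deepspeech.Model("deepspeech-0.9.1-models.pbmm")
    model.enableExternalScorer("deepspeech-0.9.1-models.scorer")
    with wave.open("audio.wav", "rb") as w:
        audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
    metadata = model.sttWithMetadata(audio, 1)
    text, conf = best_transcript_and_confidence(metadata)
    print(text, conf)
```

The helper itself only walks the returned metadata, so it works the same whether you request one candidate transcript or several.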

I am transcribing the wav file as below, with DSAlign and DeepSpeech 0.7.1:

    ./bin/align.sh --audio /home/data/audio.wav --script /home/data/raw_empty.txt --aligned /home/data/text_aligned.json --tlog /home/data/text_log.json --output-pretty

where I get the output below:

    {
        "duration": 5.06,
        "start": 0.0,
        "end": 5.06,
        "transcript": "I am a boy"
    }

But I want like this:

    {
        "duration": 5.06,
        "start": 0.0,
        "end": 5.06,
        "confidence": 0.95,
        "transcript": "I am a boy"
    }

Each transcription should have its own confidence value, from 0.00 to 1.00 (or none), depending on the quality of the audio fragment. Maybe now you get my point.
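If it helps, a fragment in that shape could be built with a small helper like the one below, assuming you patch DSAlign where it writes its aligned/tlog entries and already have a confidence value from the STT metadata. The function name and the exact patch location are hypothetical; only the field names mirror the JSON above.

```python
def fragment_with_confidence(start, end, transcript, confidence):
    """Build one aligned-fragment entry in the shape shown above,
    with an extra 'confidence' field taken from the STT metadata.

    Times are in seconds; values are rounded to two decimals to
    match the example output.
    """
    return {
        "duration": round(end - start, 2),
        "start": start,
        "end": end,
        "confidence": round(confidence, 2),
        "transcript": transcript,
    }
```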

Where is the code that produces it?

It is under the link

https://github.com/mozilla/DSAlign

Then you need to change DSAlign.

Thank you, I got it. I tried it and was successful with your guidance.
The confidence I am getting is one value per wav file, at the end of the transcript. Is it possible to get it for each word? Right now I get one value per audio file; I want to check how accurate/confident each individual word is.

Please explore the documentation of the Metadata structure.
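One caveat from the Metadata docs: DeepSpeech reports `confidence` per CandidateTranscript, not per word. The tokens are per character, each with `text` and `start_time`, so per-word confidence is not directly available, but you can at least recover word boundaries and start times. A sketch, assuming token objects shaped like DeepSpeech’s TokenMetadata:

```python
def words_from_tokens(tokens):
    """Group per-character TokenMetadata entries into words.

    Each token is expected to have `.text` (a single character) and
    `.start_time` (seconds). Only word text and start times can be
    recovered this way; DeepSpeech does not expose per-word confidence.
    """
    words, current, start = [], [], None
    for token in tokens:
        if token.text == " ":
            if current:
                words.append({"word": "".join(current), "start": start})
                current, start = [], None
        else:
            if not current:
                start = token.start_time
            current.append(token.text)
    if current:
        words.append({"word": "".join(current), "start": start})
    return words
```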
