Time Metadata

JanX2 · December 1, 2017, 6:11pm

Even though Deep Speech is designed for and trained on sentence-length snippets, for a lot of applications it would be great to have time metadata for each word. Maybe even per character or phoneme.

I have had a look at the code for the native client and I can’t see any obvious points where this could be bolted on or integrated. Any suggestions?

kdavis · December 1, 2017, 6:24pm

The CTC algorithm we use doesn’t lend itself to obtaining “time data”, such as where a particular character or phoneme starts or ends, nor does it need such data.

However, there is some research (I don’t remember the reference off the top of my head) which finds that modifications of CTC can mark, approximately, where a particular character starts.

That said, since we don’t need such “time data” for our work, I doubt we’ll get around to modifying our CTC to output it.

JanX2 · December 3, 2017, 8:35pm

It would open up whole new fields of application for Deep Speech in OSS.
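
To make the idea of approximate character timings concrete, here is a minimal sketch. It is not the CTC modification from the research mentioned above, and it is not how the Deep Speech native client works. It only does a plain greedy (best-path) CTC decode over per-frame label probabilities and records the frame at which each character is first emitted. The alphabet, the blank index, the 20 ms frame step, and the function name are all assumptions made for illustration. CTC output peaks are not guaranteed to line up with the real acoustic onset, so the times are rough at best.

```python
from typing import List, Sequence, Tuple

# All of these are illustrative assumptions, not Deep Speech's actual values.
ALPHABET = ["_", "a", "b", "c"]  # hypothetical label set; "_" stands for the CTC blank
BLANK = 0                        # assumed index of the blank label
FRAME_STEP_MS = 20               # assumed time between consecutive output frames


def greedy_ctc_with_times(
    frame_probs: Sequence[Sequence[float]],
    alphabet: Sequence[str] = ALPHABET,
    blank: int = BLANK,
    frame_step_ms: int = FRAME_STEP_MS,
) -> List[Tuple[str, int]]:
    """Greedy CTC decode that also returns the approximate start time (ms) of each character."""
    result: List[Tuple[str, int]] = []
    prev = blank
    for t, probs in enumerate(frame_probs):
        # Pick the most probable label for this frame.
        best = max(range(len(probs)), key=probs.__getitem__)
        # Standard CTC collapsing: skip blanks and skip repeats of the previous frame's label.
        if best != blank and best != prev:
            result.append((alphabet[best], t * frame_step_ms))
        prev = best
    return result


# Made-up per-frame probabilities over ALPHABET: blank, c, c, blank, a, a, b, blank.
frames = [
    [0.90, 0.05, 0.03, 0.02],
    [0.10, 0.10, 0.10, 0.70],
    [0.20, 0.10, 0.10, 0.60],
    [0.80, 0.10, 0.05, 0.05],
    [0.10, 0.70, 0.10, 0.10],
    [0.10, 0.60, 0.20, 0.10],
    [0.10, 0.20, 0.60, 0.10],
    [0.90, 0.05, 0.03, 0.02],
]

timings = greedy_ctc_with_times(frames)
print(timings)  # [('c', 20), ('a', 80), ('b', 120)]
```

Word-level times could then be approximated by taking the time of the first character of each word. A beam search decoder with a language model would need to carry the frame indices through its hypotheses instead, which this sketch does not attempt.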
