Handling missing words in transcripts?

Hello,

Are there any approaches to handling missing words in transcripts? Some of the transcripts I am working with are only partially filled in: the transcriber wasn’t able to make out a given word due to static in the original recordings (public communication audio domain), so they have been marking those items with a wildcard. In practice, is there a way to handle this in ASR, or are full transcripts strictly required for training?

Thank you.

Can you explain the context? Are you talking about handling missing words in a DeepSpeech-generated transcript, or are you referring to incomplete training data?

Incomplete training data is the issue, not words missing from DeepSpeech output. I am curious whether there are state-of-the-art methods for handling these kinds of gaps in a ground-truth transcript.

That’s really not something we have experience with. If the amount of data with missing words is negligible, you might just skip those recordings and/or ignore the discrepancy?
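If you go the skipping route, a minimal sketch of filtering out incomplete utterances, assuming a DeepSpeech-style CSV with a `transcript` column and `*` as the wildcard marker (both the column name and the marker are assumptions here; adjust to your data):

```python
import csv

WILDCARD = "*"  # assumed marker the transcriber used for unintelligible words

def filter_complete(rows, wildcard=WILDCARD):
    """Keep only utterances whose transcript contains no wildcard marker."""
    return [r for r in rows if wildcard not in r["transcript"]]

def write_filtered_csv(in_path, out_path, wildcard=WILDCARD):
    """Copy a transcript CSV, dropping rows with incomplete transcripts.

    Returns (total_rows, kept_rows) so you can check how much data you lose.
    """
    with open(in_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)
    kept = filter_complete(rows, wildcard)
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)
    return len(rows), len(kept)
```

The returned counts make it easy to verify the loss really is negligible before committing to dropping that data.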

That is kinda what I figured. I wasn’t sure if anyone from the team had some advice or thoughts on overcoming these types of situations with incomplete training data. Appreciate the response though.

I am just working in a domain where the recording quality is poor and static-filled and free transcripts don’t yet exist (Air Traffic Control data in the USA). There are some efforts to transcribe this data and make sense of it, but it is very challenging to get completely transcribed sentences. On roughly 50% of the recordings, the transcriber has trouble making out at least a word or so in each sentence.

I can’t speak for others, but IMHO incomplete data is just not suitable by default, unless careful analysis can ensure it’s not going to be problematic.

In that context, I think that either augmenting robust training data with noise matching the air traffic control context, or fine-tuning on a subset of valid, complete data, might be more efficient.

I don’t know the specifics of air traffic control, but I remember some very old feedback on this forum from someone trying to use DeepSpeech at inference time on that kind of data, and getting nice success with some low band filter.
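Assuming the "low band filter" mentioned above means a low-pass filter, here is a minimal single-pole IIR sketch in pure Python; the 300 Hz cutoff in the test is an arbitrary placeholder, and for real work you would reach for something like `scipy.signal.butter`:

```python
import math

def lowpass(samples, cutoff_hz, sample_rate):
    """Single-pole IIR low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out = []
    y = 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out
```

A single pole only rolls off at 6 dB/octave, so for radio static a steeper band-pass design is usually preferable; this just shows the shape of the preprocessing step.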

Also, maybe you can improve the inference situation by using a dedicated external scorer.

DSAlign might help you extract more reliable transcripts and skip the missing words?

Thank you for the info! I have experience with low-, high- and band-pass filters for removing some of the noise from the data. If you can find the post you referred to, I would love to see it.

Searching for “air trafic” yields this November 2018 thread: Retraining for poorer-quality audio