I need some clarification on ignore-longer-outputs-than-inputs flag

@kdavis @reuben I was training the DeepSpeech 0.5.0 model on data I scraped from YouTube, using the closed captions (VTT subtitles) as transcripts, when I got this error:

Not enough time for target transition sequence (required: 102, available: 0)0You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs

I passed ignore_longer_outputs_than_inputs=True to tf.nn.ctc_loss and the model started training again, but I need some clarification on this.

What does it mean?

Why do I get this error? It may be true that my transcripts do not match the audio 100%, but I remember giving this model completely wrong transcripts and it still trained on them.
And how can I know how many training samples it is ignoring once this flag is set? What if it is skipping all of the samples? I am not seeing even the slightest effect on the model after training all day…

So far there’s no better solution than filtering on min/max length and/or doing a binary search to find the offending samples.

How do I filter on min/max length? Sorry, I did not fully understand that. :roll_eyes::grimacing:
How do I find the offending samples? The error does not say which sample it is stuck on…

You can look at the data directly. If the audio is too short for its transcript, it won’t work. Audio windows have a 20ms step between them, so to get the number of windows from an audio file you can just divide its duration by 20ms, and then compare that with the length of the transcript.
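
As a rough sketch of that check, assuming the clips are plain WAV files and the samples are available as (path, transcript) pairs: the function names below are made up for illustration and are not part of DeepSpeech, and the window count is only an approximation of what the feature pipeline actually produces.

```python
import wave

# Step between audio windows, taken from the 20ms figure above.
WINDOW_STEP_MS = 20


def num_windows(wav_path):
    """Approximate number of audio windows: duration divided by the step."""
    with wave.open(wav_path, "rb") as wav:
        duration_ms = 1000.0 * wav.getnframes() / wav.getframerate()
    return int(duration_ms // WINDOW_STEP_MS)


def is_too_short(wav_path, transcript):
    """True if the audio has fewer windows than the transcript has characters."""
    return num_windows(wav_path) < len(transcript)


def find_offending(samples):
    """Return the (wav_path, transcript) pairs whose audio is too short.

    `samples` is assumed to be an iterable of (wav_path, transcript) tuples,
    e.g. read from whatever file lists the training data.
    """
    return [(path, text) for path, text in samples if is_too_short(path, text)]
```

Running `find_offending` over the training set once should list the clips to drop or fix, and the length of the result gives a rough idea of how many samples the flag would otherwise be skipping.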


Good answer. However, as far as I know, the CTC loss calculation inserts a blank character ‘-’ between repeated characters of the transcript, or something like that. That would make the comparison with the transcript length only an indicator rather than an exact check. @reuben, what do you think?

I don’t think CTC blanks are relevant here.