I need some clarification on ignore-longer-outputs-than-inputs flag

Sushantmkarande · June 17, 2019, 4:45am

@kdavis @reuben I was training the DeepSpeech 0.5.0 model on audio I scraped from YouTube, using its closed captions (VTT subtitles) as transcripts, when I got this error:

Not enough time for target transition sequence (required: 102, available: 0)0You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs

I passed ignore_longer_outputs_than_inputs=True to tf.nn.ctc_loss and the model started training again, but I need some clarification on this.

What does it mean?

Why do I get this error? It may be true that my transcripts do not match the audio 100%, but I remember giving this model completely wrong transcripts and it still trained on them.
And how do I know how many training samples it is ignoring once the flag is set? What if it is skipping all of the samples? I am not seeing even the slightest effect on the model after training all day.

lissyx · June 18, 2019, 6:36pm

So far there’s no better solution than filtering on min/max length and/or doing a binary search to find the offending samples.
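A minimal sketch of the binary-search idea, assuming you can wrap "run the CTC loss over this subset and see whether it raises" in a callback. The names `find_offending_sample` and `fails` are hypothetical and not part of DeepSpeech. It also assumes a subset fails exactly when it contains at least one bad sample.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def find_offending_sample(samples: Sequence[T],
                          fails: Callable[[Sequence[T]], bool]) -> T:
    """Bisect `samples` down to a single item that makes `fails` return True.

    `fails(subset)` is a hypothetical callback supplied by you: it should
    return True when computing the loss on `subset` raises the error.
    """
    if not fails(samples):
        raise ValueError("no offending sample found in this set")

    lo, hi = 0, len(samples)  # invariant: samples[lo:hi] contains an offender
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(samples[lo:mid]):
            hi = mid  # an offender is in the first half
        else:
            lo = mid  # otherwise it must be in the second half
    return samples[lo]
```

This finds one offender per run in roughly log2(N) checks. If there are several bad samples, remove the one it returns and run it again.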

Sushantmkarande · June 20, 2019, 5:31am

How do I filter on min/max length? Sorry, I did not fully understand that.
How do I find the offending samples? The error does not say which sample it is stuck on.

reuben · June 20, 2019, 12:26pm

You can look at the data directly. If the audio is too short for its transcript, it won’t work. Audio windows have a 20ms step between them, so to get the number of windows from an audio file you can just divide its duration by 20ms, and then compare that with the length of the transcript.
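As a rough sketch of that arithmetic (the helper names are made up for illustration, and the exact frame count depends on the feature-extraction settings, so treat the result as an approximation):

```python
STEP_MS = 20  # window step mentioned above


def num_windows(duration_seconds: float, step_ms: int = STEP_MS) -> int:
    """Approximate number of audio windows for a clip of the given duration."""
    duration_ms = round(duration_seconds * 1000)
    return duration_ms // step_ms


def audio_long_enough(duration_seconds: float, transcript: str) -> bool:
    """True if the clip has at least one window per transcript character."""
    return num_windows(duration_seconds) >= len(transcript)
```

For example, `audio_long_enough(0.34, "a transcript of 20ch")` compares 17 windows against 20 characters and returns False, which is the same kind of mismatch the error message reports.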

SamahZaro · August 12, 2019, 3:35am

Good answer. However, as far as I know, the CTC loss calculation inserts a blank character ‘-’ between repeated characters of the transcript, or something like that. That would make the comparison with the transcript length an indicator rather than an exact check. @reuben, what do you think?

reuben · August 12, 2019, 5:08am

I don’t think CTC blanks are relevant here.

agarwalaashish20 · September 14, 2019, 3:00pm

@reuben, @lissyx : I am using Deep Speech v0.5.0, and I am also encountering this error. I have set ignore_longer_outputs_than_inputs=True

total_loss = tf.nn.ctc_loss(labels=batch_y, inputs=logits, sequence_length=batch_seq_len, ignore_longer_outputs_than_inputs=True)

Now, when I run the training, my training loss is always infinity. How can I resolve this?

agarwalaashish20 · November 26, 2019, 4:17pm

@lissyx, could you please help with the above issue? Even after setting the flag, it didn’t work.

The training loss is inf and validation loss is decreasing. I am using German-Mailabs dataset.

Andreea_Georgiana_Sarca · June 17, 2020, 3:04pm

Hello!

I started training the system with another dataset and the training worked well (I let it train for about 100 epochs to see if everything worked), but when testing started I also got this wonderful error:

"tensorflow.python.framework.errors_impl.InvalidArgumentError: Not enough time for target transition sequence (required: 20, available: 17)1You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs
[[{{node CTCLoss}}]] "
I should mention that the flag ignore_longer_outputs_than_inputs=True had already been added to the ctc_loss call in DeepSpeech.py, and I still get the error!

Any ideas?

othiele · June 17, 2020, 3:11pm

I guess this could well be a new thread, but if I remember correctly, this is a different line in the DeepSpeech.py script?

Anyway, this means you have an input that probably doesn’t match the transcript. Are you sure about your data? Especially for testing this could be a problem.

Andreea_Georgiana_Sarca · June 17, 2020, 3:25pm

In DeepSpeech.py I modified and added the flag, because without it the training wouldn’t start:

# Compute the CTC loss using TensorFlow’s `ctc_loss`
total_loss = tfv1.nn.ctc_loss(labels=batch_y, inputs=logits, sequence_length=batch_seq_len, ignore_longer_outputs_than_inputs=True)

As I mentioned before, the error occurs again when it starts to test and it asks me again to set the flag that is already there.

Unfortunately I can’t say that I am sure of my data, because it was provided by another university that used it for other purposes (not speech recognition). For testing there are around 27000 audio files. I randomly checked some of them by listening to whether they match the transcript and didn’t notice any problem, so there is probably a mistake somewhere in the dataset.

lissyx · June 17, 2020, 3:26pm

Please search on the forum, this is already extensively documented as a data-level issue.

othiele · June 17, 2020, 4:12pm

Write a small script that checks transcription length vs. audio length in the csv. Then check outliers manually.
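Something along these lines could serve as a starting point. It is only a sketch: it assumes the CSV has `wav_filename` and `transcript` columns, that relative paths are relative to the CSV's directory, that the files are plain PCM WAV readable by Python's `wave` module, and that one window per 20ms is a good enough estimate. `scan_csv` is a made-up name.

```python
import csv
import os
import wave


def wav_duration_seconds(path):
    """Duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def scan_csv(csv_path, step_ms=20):
    """Return one dict per CSV row, most suspicious rows first.

    A row is flagged `too_long` when its transcript has more characters
    than the audio has (approximate) windows.
    """
    base_dir = os.path.dirname(os.path.abspath(csv_path))
    results = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # os.path.join keeps absolute wav paths unchanged
            wav_path = os.path.join(base_dir, row["wav_filename"])
            duration = wav_duration_seconds(wav_path)
            windows = round(duration * 1000) // step_ms
            chars = len(row["transcript"])
            results.append({
                "wav_filename": row["wav_filename"],
                "duration": duration,
                "windows": windows,
                "chars": chars,
                "chars_per_window": chars / windows if windows else float("inf"),
                "too_long": chars > windows,
            })
    # Highest characters-per-window first: these are the outliers to listen to
    results.sort(key=lambda r: r["chars_per_window"], reverse=True)
    return results
```

Rows with `too_long` set are the ones that would trigger the error. The rows just below them in the ranking are still worth checking by ear.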

carlfm01 · June 17, 2020, 6:06pm

A long time ago this solved my issue: https://github.com/mozilla/DeepSpeech/issues/1629#issuecomment-436864418

To gauge the quality of the set, I use ignore_longer_outputs_than_inputs on evaluate together with the test_output_file flag to generate a JSON file sorted by loss, then I use my own .NET app to listen to the worst examples from the generated JSON.
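If you don't have a dedicated app, a few lines of Python can pull out the worst entries. This is a sketch under assumptions: it treats the generated file as a JSON list of objects that each carry a numeric loss field, and the key name `"loss"` is a guess you may need to adjust after inspecting your own file. `worst_samples` is a hypothetical helper.

```python
import json


def worst_samples(json_path, n=20, loss_key="loss"):
    """Return the n entries with the highest loss from the report file.

    Assumes the file holds a JSON list of objects and that `loss_key`
    names a numeric field in each of them.
    """
    with open(json_path, encoding="utf-8") as f:
        samples = json.load(f)
    # Re-sort here so the result does not depend on the file's own ordering
    ranked = sorted(samples, key=lambda s: float(s[loss_key]), reverse=True)
    return ranked[:n]
```

You can then print each returned entry and listen to the corresponding audio files by hand.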

carlfm01 · June 17, 2020, 6:08pm

Never mind, audiofile_to_input_vector is no longer a thing.

michalis_p · September 9, 2021, 8:22am

I am having the same error. I have put the flag in the train.py and evaluation.py files but still get the same error.
For train.py I have put it as:
total_loss = tfv1.nn.ctc_loss(labels=batch_y, inputs=logits, sequence_length=batch_seq_len, ignore_longer_outputs_than_inputs=True)

and the same for evaluation.py.

What else could affect CTC and cause the same error?
I have checked the transcripts and the audio files and they don’t contain any non-ASCII characters.

jaychandra · March 3, 2023, 11:38am

Hey, I am having this error. How do I solve this issue?

lissyx · March 3, 2023, 1:57pm

We should never have accepted this workaround, and we should have named it --my-data-is-bogus-but-i-rather-prefer-generating-broken-models-than-fixing-it
