Can I use an unsupported flag in DeepSpeech?

I am training a model with a significant amount of data (about 260 hours, 215k files). It may not be enough for the end result, but a few minutes after I start training I get the error “Not enough time for target transition sequence” … “You can turn this error into a warning by using the flag ignore_longer_outputs_than_inputs”.

@lissyx @reuben Is there a way to use this flag?
Searching manually for the offending file (or files) is almost impossible.

Thank you.

Why? On every dataset where this occurred, some basic filtering checks to ensure a rough match between audio and text fixed the issue.

Well, you have to apply that locally, so technically you end up having to maintain a fork, which is never fun.


lissyx is right, you should look into your data.

But if you just want to do a test run, change line 228 of DeepSpeech.py to

total_loss = tfv1.nn.ctc_loss(labels=batch_y, inputs=logits, sequence_length=batch_seq_len, ignore_longer_outputs_than_inputs=True)

and line 70 of evaluate.py to

sequence_length=batch_x_len,
ignore_longer_outputs_than_inputs=True)

Those line numbers are for the 0.6 release.

It shouldn’t be that hard to identify the problematic files. Plot a histogram of WAV duration (easily computable from the file size) divided by transcript length for your training set and then look at the outliers.

For example, using pandas:

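import pandas
import matplotlib.pyplot as plt

df = pandas.read_csv('train.csv')
# Duration in seconds: subtract the 44-byte WAV header; assumes 16 kHz, 16-bit, mono PCM
duration = (df['wav_filesize'] - 44) / 16000 / 2
transcript_len = df['transcript'].str.len()
ratio = duration / transcript_len  # seconds of audio per transcript character
ratio.hist(bins=200)
plt.show()  # needed when running as a script rather than in a notebook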
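
To go from the histogram to actual file names, something like the following might help. It is only a sketch: the helper names seconds_per_char and shortest_audio_per_char are made up for illustration, and it assumes the same 16 kHz, 16-bit, mono WAV layout with a 44-byte header and the wav_filename, wav_filesize and transcript columns of the training CSV.

```python
import pandas as pd


def seconds_per_char(df, sample_rate=16000, bytes_per_sample=2, header_bytes=44):
    """Seconds of audio per transcript character for each row.

    Assumes mono PCM WAV files with a plain 44-byte header (an assumption,
    not something checked here).
    """
    duration = (df["wav_filesize"] - header_bytes) / (sample_rate * bytes_per_sample)
    return duration / df["transcript"].str.len()


def shortest_audio_per_char(df, n=20):
    """Return the n rows with the least audio per character, most suspicious first."""
    ratio = seconds_per_char(df)
    return df.assign(sec_per_char=ratio).nsmallest(n, "sec_per_char")
```

For example, shortest_audio_per_char(pd.read_csv('train.csv'))[['wav_filename', 'sec_per_char']] would list the 20 rows whose audio is shortest relative to their transcript, which are the likeliest candidates for the error.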

@lissyx, @othiele and @reuben Thank you for the fast replies.
I really appreciate it. :slight_smile:
@othiele Maybe I should have mentioned that this is for a test run.
You saved me quite some time with the solution provided.


Yeah, for a test run, just hack it as suggested. Once you get more serious, you should really fix your dataset or act at the importer level to eradicate those broken samples.
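
As a rough idea of what fixing it at the dataset level could look like, here is a minimal sketch that drops rows whose audio is too short for CTC to emit the transcript. The function names are hypothetical, and the numbers are assumptions: 16 kHz, 16-bit, mono WAV with a 44-byte header, and a 20 ms feature step. The frame count is only an estimate from the file size, so treat the result as a first pass rather than an exact reproduction of what training sees.

```python
import csv


def estimated_frames(wav_filesize, sample_rate=16000, bytes_per_sample=2,
                     header_bytes=44, step_ms=20):
    """Estimate the number of feature frames from the WAV file size.

    Assumes mono PCM and a fixed feature step (20 ms here, an assumption).
    """
    seconds = (wav_filesize - header_bytes) / (sample_rate * bytes_per_sample)
    return int(seconds * 1000 / step_ms)


def required_frames(transcript):
    """Minimum CTC input length: one frame per label, plus one blank
    between each pair of identical adjacent labels."""
    repeats = sum(a == b for a, b in zip(transcript, transcript[1:]))
    return len(transcript) + repeats


def split_rows(rows):
    """Split CSV rows into (kept, dropped) by the frame-count check."""
    kept, dropped = [], []
    for row in rows:
        frames = estimated_frames(int(row["wav_filesize"]))
        ok = frames >= required_frames(row["transcript"])
        (kept if ok else dropped).append(row)
    return kept, dropped


def filter_csv(src, dst):
    """Write the rows that pass the check to dst and return the dropped rows."""
    with open(src, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        kept, dropped = split_rows(reader)
    with open(dst, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)
    return dropped
```

Calling filter_csv('train.csv', 'train_clean.csv') would write a cleaned copy and return the dropped rows so they can be inspected before deciding whether the audio or the transcript is at fault.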