Checking for non-finite loss files not working?

Thanks for adding this feature in the latest DeepSpeech branch.

I tried this approach on the v0.5.1 master branch and lightly modified feeding.py, DeepSpeech.py, and evaluate.py.

feeding.py

def batch_fn(wav_filenames, features, features_len, transcripts):
    # Batch the filenames alongside the features so they can be reported later
    wav_filenames = wav_filenames.batch(batch_size)
    return tf.data.Dataset.zip((wav_filenames, features, transcripts))

DeepSpeech.py

non_finite_files = tf.gather(batch_filenames, tfv1.where(~tf.math.is_finite(total_loss)))

  • Have I written custom code (as opposed to running examples on an unmodified clone of the repository): Yes

  • OS Platform and Distribution: Linux Ubuntu 16.04

  • TensorFlow version (use command below): 1.13.1

  • Python version: 3.6.5

  • CUDA/cuDNN version: CUDA 10.0

  • GPU model and memory: 4 x 24 GB TITAN RTX

You can obtain the TensorFlow version with

python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"

b'v1.13.1-0-g6612da8951' 1.13.1

I have 1000 audio samples, and out of those, 100 files have wrong transcripts.

The training and validation losses are high, but why is it not identifying the non-finite-loss files (the files with wrong transcripts)?

If I made any mistakes, please correct me. Can someone explain how this actually works?

Thank you all.

This code identifies files that cause non-finite (inf or NaN) loss, nothing else. It doesn’t identify “wrong transcripts”.
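To make the distinction concrete, here is a small NumPy sketch of the same mask-and-gather pattern the snippet above uses (the function name `find_non_finite_files` is mine, for illustration only). A file is flagged only if its own loss value is inf or NaN; a merely *high* finite loss, which is what a misspelled transcript typically produces, is never flagged.

```python
import numpy as np

def find_non_finite_files(batch_filenames, per_example_loss):
    """Return the filenames whose per-example loss is inf or NaN.

    Mirrors the tf.where + tf.gather pattern: build a boolean mask
    of non-finite losses, then gather the matching filenames.
    """
    batch_filenames = np.asarray(batch_filenames)
    per_example_loss = np.asarray(per_example_loss, dtype=np.float64)
    mask = ~np.isfinite(per_example_loss)  # True where loss is inf or NaN
    return batch_filenames[mask].tolist()

# Only the second file has a non-finite loss; the third is merely high.
files = ["a.wav", "b.wav", "c.wav"]
losses = [12.3, float("inf"), 450.6]
print(find_non_finite_files(files, losses))  # ['b.wav']
```

Note that `450.6` is finite, so `c.wav` is not reported even though its loss is large.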


@reuben thanks.

Here are my questions:

– When will it throw an inf or NaN loss?
– How can I resolve an inf or NaN loss?

Thank you :slight_smile:

It doesn’t identify “wrong transcripts”.

By wrong transcripts, I mean transcripts with some spelling mistakes and mispronunciations.

For example, YouTube datasets segmented with their corresponding subtitle transcripts.

I don’t know exactly when; in my experience it tends to be when files have very long transcripts, taking almost as many characters as there are audio windows. You can resolve it by double-checking that nothing weird is happening, or in the worst case simply removing the file from your dataset.
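The "long transcript" case above can be checked before training: CTC loss becomes inf when the label sequence is longer than the number of feature frames. Here is a rough pre-filter sketch; the 32 ms window and 20 ms step are the DeepSpeech defaults as I understand them (adjust to your feature settings), and both function names are mine, not part of DeepSpeech.

```python
def num_audio_windows(duration_ms, win_len_ms=32, win_step_ms=20):
    """Approximate number of feature frames for an utterance,
    assuming fixed-size windows slid with a fixed step."""
    if duration_ms < win_len_ms:
        return 0
    return 1 + (duration_ms - win_len_ms) // win_step_ms

def may_cause_non_finite_loss(duration_ms, transcript):
    """Rough heuristic: CTC loss is inf when the label sequence is
    longer than the number of time frames, so flag such transcripts.
    (Repeated characters need extra blank frames, so the real limit
    is somewhat stricter than this.)"""
    return len(transcript) > num_audio_windows(duration_ms)

# A 1-second clip has 1 + (1000 - 32) // 20 = 49 windows,
# so a 60-character transcript cannot fit.
print(may_cause_non_finite_loss(1000, "x" * 60))        # True
print(may_cause_non_finite_loss(1000, "hello world"))   # False
```

Running this over a manifest is a cheap way to shortlist suspect files before a full training epoch.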


Thanks @reuben

That worst case is exactly what I am following: if I find the inf or NaN files, I will remove them.

Thank you.