Evaluate.py sorting

Is there a way to keep the test data in its input order when running evaluate.py? Or, if the order can't be preserved within the pool.map function, to at least return the file path with each result?

I think this is what is described in this issue: https://github.com/mozilla/DeepSpeech/issues/2180 but nobody has had time to work on it yet. If you want, you can send patches :)

Thanks for pointing this out, @lissyx. It looks like the sorting happens in a few places:

  1. create_dataset() in ./util/feeding.py where df is sorted by wav_filesize.
  2. calculate_report() in ./util/evaluate_tools.py where samples is sorted by loss then wer.

My untested hypothesis: If I drop these sorting operations when running evaluate.py, inference will occur in the same order as my input test.csv.
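For anyone who would rather not remove the sorts, here is a minimal sketch of an alternative on the pandas side: let the sort happen, then restore the input order from the original index. This is only a toy illustration, not what evaluate.py does. The data is invented, and the wav_filename and prediction columns are assumptions for the example; only wav_filesize comes from the sort mentioned above.

```python
import pandas as pd

# Toy stand-in for test.csv; the values are invented for illustration.
test_df = pd.DataFrame({
    "wav_filename": ["c.wav", "a.wav", "b.wav"],
    "wav_filesize": [300, 100, 200],
})

# A size-based sort reorders the rows, but each row keeps its
# original index label (0, 1, 2 from the input file).
sorted_df = test_df.sort_values(by="wav_filesize")

# Hypothetical per-row results, produced in the sorted order.
sorted_df = sorted_df.assign(prediction=["pred-a", "pred-b", "pred-c"])

# Sorting on the index brings the rows back to the input order.
restored = sorted_df.sort_index()
print(restored)
```

This only works while the results stay attached to the DataFrame rows. Once the data has gone through the feeding pipeline, the index is no longer available, so the file path would have to be carried along instead.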

Update: that appears to do the trick. Then merge the results back with the input CSV however you like, for example:

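import pandas as pd

# Collect the per-sample results in the order inference produced them.
df = pd.DataFrame({
    'GroundTruth': ground_truths,
    'Prediction': predictions,
    'loss': losses,
})

# Load the original test CSV so its columns (including the file path) can be joined back on.
input_csv = pd.read_csv(str(FLAGS.test_files))

# Column-wise concat; this only lines up if both frames share the same row order.
df = pd.concat([input_csv, df], axis=1)

df.to_csv(...)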
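One caveat: pd.concat with axis=1 aligns rows by index, so the approach above is only correct when the predictions come out in exactly the input order. If the file path can be returned with each result, a key-based merge does not depend on order at all. Here is a rough sketch with toy data. It assumes a wav_filename column is available on both sides, and the values are made up.

```python
import pandas as pd

# Toy stand-in for the input test CSV.
input_csv = pd.DataFrame({
    "wav_filename": ["a.wav", "b.wav", "c.wav"],
    "transcript": ["one", "two", "three"],
})

# Hypothetical results arriving in a different order,
# each tagged with the file it belongs to.
results = pd.DataFrame({
    "wav_filename": ["c.wav", "a.wav", "b.wav"],
    "Prediction": ["tree", "one", "too"],
    "loss": [0.9, 0.1, 0.5],
})

# A left merge keeps the row order of input_csv and matches on the key.
# validate raises if a file name is duplicated on either side.
merged = input_csv.merge(
    results, on="wav_filename", how="left", validate="one_to_one"
)
print(merged)
```

With this, the sorts in feeding.py and evaluate_tools.py could stay as they are, at the cost of having to thread the file name through to the results.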

I’ll give that a shot when I have a moment.