Evaluate.py sorting

tuttlebr · July 10, 2019, 11:17pm

Is there a way to keep the input order of the evaluate.py test data? Or, if the order cannot be preserved within the pool.map call, to at least return the file path for each result?

lissyx · July 11, 2019, 6:54am

I think this is what is described in this issue: https://github.com/mozilla/DeepSpeech/issues/2180 but nobody has had time to work on it yet. If you want, you can send patches.

tuttlebr · July 11, 2019, 2:45pm

Thanks for pointing this out, @lissyx. It looks like the sorting happens in a few places:

create_dataset() in ./util/feeding.py where df is sorted by wav_filesize.
calculate_report() in ./util/evaluate_tools.py where samples is sorted by loss then wer.

My untested hypothesis: If I drop these sorting operations when running evaluate.py, inference will occur in the same order as my input test.csv.
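
If removing the sort is not desirable (for example, if batching by file size is still wanted), another option might be to keep the sort and restore the input order afterwards. The snippet below is only a sketch of that idea in plain pandas, not code from DeepSpeech: the DataFrame stands in for test.csv, and the "prediction" values are placeholders for whatever evaluate.py would produce.

```python
import pandas as pd

# Hypothetical stand-in for test.csv (column names follow the DeepSpeech CSV layout).
df = pd.DataFrame({
    "wav_filename": ["c.wav", "a.wav", "b.wav"],
    "wav_filesize": [300, 100, 200],
    "transcript": ["three", "one", "two"],
})

# Sorting by file size reorders the rows, but each row keeps its original index label.
sorted_df = df.sort_values(by="wav_filesize")

# Placeholder per-row results, produced in the sorted order.
sorted_df["prediction"] = ["one", "two", "three"]

# Sorting on the preserved index brings the rows back to the input order.
restored = sorted_df.sort_index()
print(restored)
```

This only works if the original index survives the sort, i.e. nothing calls reset_index(drop=True) in between.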

Update: that appears to do the trick. The results can then be merged back onto the input CSV however you like, for example:

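import pandas as pd

# Results collected in evaluate.py, in the same order as the input rows
df = pd.DataFrame({
    'GroundTruth': ground_truths,
    'Prediction': predictions,
    'loss': losses
})

# Note: this assumes FLAGS.test_files points to a single CSV file
input_csv = pd.read_csv(str(FLAGS.test_files))

# Column-wise concat aligns on the row index, so both frames need matching order and length
df = pd.concat([input_csv, df], axis=1)

df.to_csv(...)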
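
For anyone who wants to try the merge step outside of evaluate.py, here is a small self-contained sketch. The CSV contents and result values are made up for illustration, and the length check and transcript comparison are additions of mine rather than anything DeepSpeech does. The point is that pd.concat with axis=1 aligns on the index, so a mismatch in length or order would silently produce misaligned rows or NaNs.

```python
import io
import pandas as pd

# Hypothetical test.csv contents, read from a string instead of a file.
csv_text = (
    "wav_filename,wav_filesize,transcript\n"
    "c.wav,300,three\n"
    "a.wav,100,one\n"
)
input_csv = pd.read_csv(io.StringIO(csv_text))

# Placeholder evaluation results, assumed to be in the same order as the input rows.
results = pd.DataFrame({
    "GroundTruth": ["three", "one"],
    "Prediction": ["tree", "one"],
    "loss": [1.5, 0.1],
})

# Guard against dropped samples before merging.
if len(input_csv) != len(results):
    raise ValueError("input CSV and results have different lengths")

# Reset both indexes so the column-wise concat lines rows up by position.
merged = pd.concat(
    [input_csv.reset_index(drop=True), results.reset_index(drop=True)],
    axis=1,
)

# Sanity check: the ground truth should match the transcript row for row.
mismatched = merged[merged["transcript"] != merged["GroundTruth"]]
print(merged)
print("mismatched rows:", len(mismatched))
```

If the mismatch count is non-zero, the results are probably not in input order and some sorting step is still active.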

I’ll give that a shot when I have a moment.