Is there a way to keep the input order of the evaluate.py test data? Or, if preserving order is not possible within the pool.map function, to at least return the file path?
I think this is what is described in issue #2180, "Add wav_filename to result set in evaluation.py" (mozilla/DeepSpeech on GitHub), but nobody has had time to work on it yet. If you want, you can send patches.
Thanks for pointing this out, @lissyx. It looks like the sorting happens in a few places:
- create_dataset() in ./util/feeding.py where df is sorted by wav_filesize.
- calculate_report() in ./util/evaluate_tools.py where samples is sorted by loss then wer.
My untested hypothesis: If I drop these sorting operations when running evaluate.py, inference will occur in the same order as my input test.csv.
Update: that appears to do the trick. The results can then be merged with the input CSV however you like, for example:
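import pandas as pd

# Per-sample results, in the order inference ran
df = pd.DataFrame({
    'GroundTruth': ground_truths,
    'Prediction': predictions,
    'loss': losses
})
# Original test CSV; its rows line up with df only because the sorts were removed
input_csv = pd.read_csv(str(FLAGS.test_files))
# Place the result columns next to the input columns, row by row
df = pd.concat([input_csv, df], axis=1)
df.to_csv(...)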
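For anyone copying this, here is a slightly more defensive sketch of the same merge. It is not code from DeepSpeech: the helper name attach_results and the sample data are made up for illustration, and the column names in the example (wav_filename, wav_filesize, transcript) are assumed to match your test CSV. The reason for the extra check is that pd.concat with axis=1 aligns rows on index labels, so if the number of results differs from the number of input rows you get NaN-padded rows rather than an error.

```python
import pandas as pd


def attach_results(input_csv, ground_truths, predictions, losses):
    """Append per-sample results to the input rows.

    Hypothetical helper: assumes the results are in the same order as the
    rows of input_csv (i.e. the sorting steps were removed).
    """
    results = pd.DataFrame({
        'GroundTruth': ground_truths,
        'Prediction': predictions,
        'loss': losses,
    })
    # Fail loudly on a row-count mismatch instead of letting concat pad with NaN.
    if len(results) != len(input_csv):
        raise ValueError(
            'expected %d results, got %d' % (len(input_csv), len(results)))
    # Reset the index so rows are matched by position (0..n-1) on both sides.
    return pd.concat([input_csv.reset_index(drop=True), results], axis=1)


# Made-up stand-in for the test CSV, deliberately not sorted by file size.
input_csv = pd.DataFrame({
    'wav_filename': ['b.wav', 'a.wav', 'c.wav'],
    'wav_filesize': [300, 100, 200],
    'transcript': ['hello', 'hi there', 'bye'],
})

merged = attach_results(
    input_csv,
    ground_truths=['hello', 'hi there', 'bye'],
    predictions=['hello', 'hi their', 'by'],
    losses=[0.1, 2.3, 1.7],
)
print(merged[['wav_filename', 'Prediction', 'loss']])
```

Each output row then carries its wav_filename, which also covers the "at least return the file path" part of the original question.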
I’ll give that a shot when I have a moment.