Printing filename in evaluate.py report

Hi,

I use evaluate.py to batch process a dataset, and I want to compare how the results differ when I plug in different language models. To do this, I changed evaluate.py to output the full report (all results) as a simple CSV (I only changed the print format). I plan to import the CSVs from the different runs as tables and join them. Unfortunately, the report has no column of unique values to join on.

The filenames would work perfectly as a key. However, I cannot figure out how to extract them from the TF iterator that consumes the dataset, or how to match them to the results if I import them manually as a list.

Any ideas for a straightforward solution to include filenames in the report produced by evaluate.py?

Isn’t this what you need? https://github.com/mozilla/DeepSpeech/issues/2180 Looks like @Tilman_Kamp has patches for that :slight_smile:

@lissyx I put up a PR for it.

@dko Can you check whether the patch that @Tilman_Kamp shared covers your use case? If so, we will merge it.

@Tilman_Kamp @lissyx Very thankful for this. It’s exactly what I needed.

I am using v0.5.1, and I had to make one small change: I deleted line 60 of evaluate.py, because it failed with a complaint that create_model() was getting an extra argument:

Original:
59 logits, _ = create_model(batch_x=batch_x,
60 batch_size=FLAGS.test_batch_size,
61 seq_length=batch_x_len,
62 dropout=no_dropout)

What worked for me on 0.5.1:
59 logits, _ = create_model(batch_x=batch_x,
60 seq_length=batch_x_len,
61 dropout=no_dropout)

(Indentation won’t show up properly on this post.)

Not sure if this is caused by a version mismatch or something else.

@dko The change got merged to master.
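
For anyone who wants to do the same comparison: once each report has a filename column, joining the per-model CSVs is a small script. Below is a minimal sketch using only the standard library. The column names `wav_filename` and `wer` and the labels `lm_a` / `lm_b` are assumptions for illustration, not what evaluate.py necessarily prints, so adjust them to match your own CSV header.

```python
import csv
import io


def load_report(fileobj, key="wav_filename"):
    """Read one CSV report into a dict keyed by the filename column."""
    return {row[key]: row for row in csv.DictReader(fileobj)}


def join_reports(reports, key="wav_filename", value="wer"):
    """Inner-join several reports on `key`, one `value` column per model.

    `reports` maps a model label to an open CSV file object.
    Column names are hypothetical; match them to your own report.
    """
    tables = {label: load_report(f, key) for label, f in reports.items()}
    # Keep only the files that appear in every report.
    common = set.intersection(*(set(t) for t in tables.values()))
    return [
        {key: name,
         **{f"{value}_{label}": tables[label][name][value] for label in tables}}
        for name in sorted(common)
    ]


# Tiny in-memory stand-ins for two CSV files produced with different LMs.
lm_a = io.StringIO("wav_filename,wer\na.wav,0.10\nb.wav,0.25\n")
lm_b = io.StringIO("wav_filename,wer\nb.wav,0.20\na.wav,0.15\n")

joined = join_reports({"lm_a": lm_a, "lm_b": lm_b})
for row in joined:
    print(row)
```

With real files, pass `open("report_lm_a.csv", newline="")` and so on instead of the `StringIO` objects. Row order in each CSV does not matter, since rows are matched by filename.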