Best practice for evaluating other datasets on the pretrained model (or any model for that matter)?

I'm trying to evaluate the pre-trained model on other datasets. What is best practice for this? Is it running DeepSpeech.py with just the --test_files argument and pointing it at the pre-trained checkpoint via the --checkpoint_dir argument? Or should I push each wav file of the dataset through the model in inference mode and calculate the WER myself?
Can I assume these will result in the same final WER?

That always depends on what you want to evaluate it for :slight_smile:

That is the standard way to measure it. Be sure to set --test_output_file to store the per-sample results and maybe use a higher --report_count. Check the flag documentation for the details.
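Roughly like this (wrapped in Python here just for illustration; the checkpoint and CSV paths are placeholders, and the exact flag names can differ between DeepSpeech versions, so check the flag definitions of your release):

```python
# Sketch of an evaluation-only run against the pre-trained checkpoint.
# Paths are placeholders; flag names may differ between DeepSpeech versions.
import subprocess

subprocess.run(
    [
        "python", "DeepSpeech.py",
        "--checkpoint_dir", "deepspeech-checkpoint/",  # pre-trained checkpoint dir (placeholder)
        "--test_files", "my_dataset/test.csv",         # wav_filename,wav_filesize,transcript CSV (placeholder)
        "--test_output_file", "test_results.json",     # stores per-sample results
        "--report_count", "50",                        # show more samples in the WER report
    ],
    check=True,
)
```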

Pushing the files through yourself is something you would only do if you need something special.
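If you do go the manual route, here is a minimal sketch assuming the deepspeech Python package (0.7+ API) and a CSV with wav_filename/transcript columns; the model, scorer and CSV paths are placeholders:

```python
# Sketch of the manual route: transcribe each wav with the exported model
# and collect (reference, hypothesis) pairs for scoring.
# Assumes the deepspeech 0.7+ Python API and 16 kHz mono 16-bit wav files.
import csv
import wave
import numpy as np
from deepspeech import Model

ds = Model("deepspeech-0.7.4-models.pbmm")                 # exported acoustic model (placeholder)
ds.enableExternalScorer("deepspeech-0.7.4-models.scorer")  # optional language model scorer

pairs = []
with open("my_dataset/test.csv") as f:                     # placeholder CSV path
    for row in csv.DictReader(f):
        with wave.open(row["wav_filename"], "rb") as w:
            audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
        pairs.append((row["transcript"], ds.stt(audio)))
```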

Search the forum; there was some discussion on how the WER is measured, but as far as I remember it comes out basically the same. Obviously you should use the same method for all your test sets.
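For reference, the usual definition is the word-level edit distance divided by the number of reference words, aggregated over the whole test set. A small sketch using the pairs collected above; keep in mind that text normalization (casing, punctuation) should also match whatever the built-in report does:

```python
# Corpus-level WER: total word-level edit distance divided by the
# total number of reference words, aggregated over all samples.
def edit_distance(ref, hyp):
    # Standard dynamic-programming Levenshtein distance over word lists.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[len(hyp)]

def corpus_wer(pairs):
    errors = sum(edit_distance(ref.split(), hyp.split()) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    return errors / words
```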
