I’m trying to evaluate the pre-trained model on other datasets. What is best practice for this? Is it running DeepSpeech.py with just the --test_files argument and loading the checkpoint via the --checkpoint_dir argument? Or should I push each wav file of the dataset through the model in inference mode and calculate the WER myself? Can I assume these will result in the same final WER?
That always depends on what you want to evaluate it for.

The first option (running DeepSpeech.py with --test_files against the checkpoint) is a good standard measurement. Be sure to use --test_output_file to store the results, and maybe use a higher --report_count. Check the details here.
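For reference, an invocation might look something like this. The paths and the report count are placeholders, not from the original post, and flag spellings can vary slightly between DeepSpeech releases, so check `--helpfull` for your version:

```bash
python3 DeepSpeech.py \
  --test_files /data/my_corpus/test.csv \
  --checkpoint_dir /data/checkpoints/deepspeech \
  --test_output_file /tmp/test_results.json \
  --report_count 100
```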
You would only go the manual inference route if you need something special.
Search the forum; there was some discussion on how WER is measured, but as far as I remember the two approaches come out basically the same. Obviously, you should use the same method for all your test sets.
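If you do compute WER yourself, the standard definition is the word-level Levenshtein distance divided by the reference length. A minimal sketch of that formula (not taken from DeepSpeech's own evaluation code) is below. Note that averaging per-file WERs and computing one corpus-level WER (total edits over total reference words) can give different numbers, which is one more reason to use the same method for every test set:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution plus one deletion over four reference words -> WER 0.5
print(wer("the cat sat down", "the bat sat"))
```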