I am training a model with 512 hidden nodes, a learning rate of 0.0001, and dropout of 0.2. The problem I am having is that when training on my current hardware I get a very high WER and CER, while training on different hardware gives much lower values. To be precise:
Hardware 1: WER = 0.868798 and CER = 0.494691
Hardware 2: WER = 0.423804 and CER = 0.277463
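For context on what these numbers mean: WER and CER are conventionally computed as the Levenshtein edit distance between the hypothesis and the reference, divided by the reference length, at the word or character level respectively. A minimal sketch (my own illustration, not the toolkit's actual evaluation code):

```python
def edit_distance(ref, hyp):
    # Classic dynamic-programming Levenshtein distance, works on any
    # sequence: a string (character level) or a list of words.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            # deletion, insertion, or substitution (free if chars match)
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[len(hyp)]

def wer(ref, hyp):
    # Word error rate: word-level edit distance / number of reference words.
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)

def cer(ref, hyp):
    # Character error rate: character-level edit distance / reference length.
    return edit_distance(ref, hyp) / len(ref)
```

So a WER of 0.87 means the decoder gets almost every word wrong, which usually points to a broken setup rather than a slightly worse model.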
To make matters worse, the two models were actually trained on slightly different versions: hardware 1 is running 0.5.0-alpha.8 and hardware 2 is running 0.5.0-alpha.11.
The two models were trained on exactly the same commands and language model.
Does anyone know what might be causing this? Is it possible that different hardware gives different results (I would find that quite strange, actually)? Or were there fundamental changes between the two versions that could explain such a big discrepancy?
Any help would be appreciated!