I am training a model with 512 hidden nodes, a learning rate of 0.0001, and dropout of 0.2. The problem I am having is that when training on my current hardware I get a very high WER and CER, while training on different hardware gives much lower values. To be precise:
Hardware 1: WER = 0.868798 and CER = 0.494691
Hardware 2: WER = 0.423804 and CER = 0.277463
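For context on what these numbers mean: WER and CER are conventionally computed as the Levenshtein edit distance between the hypothesis and the reference, divided by the reference length, at the word or character level respectively. A minimal sketch (my own illustration, not the toolkit's actual evaluation code):

```python
def edit_distance(ref, hyp):
    # Classic dynamic-programming Levenshtein distance, works on any
    # sequence: a string (character level) or a list of words.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            # deletion, insertion, or substitution (free if chars match)
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[len(hyp)]

def wer(ref, hyp):
    # Word error rate: word-level edit distance / number of reference words.
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)

def cer(ref, hyp):
    # Character error rate: character-level edit distance / reference length.
    return edit_distance(ref, hyp) / len(ref)
```

So a WER of 0.87 means the decoder gets almost every word wrong, which usually points to a broken setup rather than a slightly worse model.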
To make matters worse, the two models were actually trained on slightly different versions: hardware 1 is running 0.5.0-alpha.8 and hardware 2 is running 0.5.0-alpha.11.
The two models were trained on exactly the same commands and language model.
Does anyone know what might be causing this? Is it possible that different hardware gives different results (I would find that quite strange, actually)? Or were there fundamental changes between the two versions that could explain such a big discrepancy?
Any help would be appreciated!