I trained a DeepSpeech model twice with the same training set and configuration, but the two runs produced WERs that differed by 3%. I checked flags.py and saw that the random_seed flag is already set to a fixed value by default.
Am I missing something, or are the results simply not reproducible? How can I make sure the results I get are reliable?
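For context, here is a minimal sketch of what a fixed seed guarantees, assuming a plain Python setting rather than DeepSpeech's actual training code. The helper `shuffled_order` and the seed value 42 are hypothetical and only illustrate the idea: a generator seeded with the same value repeats the same sequence of draws.

```python
import random
from typing import List


def shuffled_order(seed: int, n: int) -> List[int]:
    """Hypothetical helper: return a shuffled order of n sample indices.

    A dedicated random.Random instance seeded with `seed` yields the same
    sequence of draws every time, so the resulting shuffle is repeatable.
    """
    rng = random.Random(seed)  # independent generator, not the global one
    order = list(range(n))
    rng.shuffle(order)         # in-place shuffle driven only by `rng`
    return order


# Two "runs" with the same (arbitrary) seed draw the same shuffle.
run_a = shuffled_order(seed=42, n=10)
run_b = shuffled_order(seed=42, n=10)
print(run_a == run_b)  # True
```

A seed only makes the generators it is actually applied to repeatable; any randomness from sources it does not reach is unaffected.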
Your help would be much appreciated!