I am trying to include batch normalization layers after the dense layers, much like the Deep Speech 2 paper does (next I plan to implement BN on the LSTMs). In summary, I am using the following batchnorm call inside the BiRNN definition:
layer_1 = tf.layers.batch_normalization(inputs=layer_1, momentum=FLAGS.batchnorm_momentum, training=training_ph)
where `training_ph` is a boolean placeholder set to `True` during training and `False` otherwise, and `batchnorm_momentum=0.9`. I am also passing the update ops to `session.run` during training (on line https://github.com/mozilla/DeepSpeech/blob/master/DeepSpeech.py#L1637):
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
_, _, current_step, batch_loss, batch_report = session.run([train_op, update_ops, global_step, loss, report_params], **extra_params)
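For what it's worth, the pattern the TF docs recommend is to make `train_op` depend on the update ops via `tf.control_dependencies`, so the moving statistics are refreshed on every training step without having to fetch them explicitly. Below is a minimal, self-contained sketch of that mechanism using a hand-registered update op on a toy variable (not DeepSpeech's actual graph; `moving_mean`, `w`, and the constants are made up for illustration):

```python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# Toy stand-in for the moving mean that batch_normalization tracks internally.
moving_mean = tf.get_variable("moving_mean", initializer=0.0, trainable=False)
batch_mean = tf.constant(10.0)  # pretend this is the current batch statistic
momentum = 0.9

# EMA update op, registered in UPDATE_OPS just like tf.layers.batch_normalization does.
update_mm = tf.assign(moving_mean, momentum * moving_mean + (1 - momentum) * batch_mean)
tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, update_mm)

# A trivial loss so there is something for the optimizer to minimize.
w = tf.get_variable("w", initializer=1.0)
loss = tf.square(w)

# The recommended pattern: gate train_op on the update ops.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_op)  # running train_op now also runs update_mm
    mm_value = sess.run(moving_mean)
    print(mm_value)  # 0.9 * 0.0 + 0.1 * 10.0 = 1.0
```

With this setup, running only `train_op` is enough; no separate fetch of `update_ops` in `session.run` is needed.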
Now to the problem: training seems to work (the loss goes down), but validation seems bugged. Since batchnorm behaves differently in training and validation, `training_ph` appears to be wired up correctly, and I suspect the batchnorm parameters (`moving_mean` and `moving_variance`) are not being properly updated.
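That suspicion would explain the symptom: in training mode the layer normalizes with the current batch statistics, while in inference mode it uses the moving averages, so if the update ops never run, validation normalizes with stale (near-initial) values. A pure-Python sketch of the two modes, assuming the standard EMA update with `momentum=0.9` (the batch statistics here are invented for illustration):

```python
def ema_update(moving, batch_stat, momentum=0.9):
    # Standard batchnorm moving-average update rule.
    return momentum * moving + (1 - momentum) * batch_stat

# TF initializes moving_mean=0, moving_variance=1.
moving_mean, moving_var = 0.0, 1.0

# Pretend three batches, each with mean 5 and variance 4.
for batch_mean, batch_var in [(5.0, 4.0)] * 3:
    moving_mean = ema_update(moving_mean, batch_mean)
    moving_var = ema_update(moving_var, batch_var)

x = 5.0
# Training mode: normalize with the current batch statistics.
train_out = (x - batch_mean) / batch_var ** 0.5
# Inference mode: normalize with the moving averages, which after only a few
# updates (or none, if the update ops never ran) are still far from the data.
infer_out = (x - moving_mean) / moving_var ** 0.5
print(train_out, infer_out)  # train_out is 0.0; infer_out is far from it
```

With high momentum the moving averages converge slowly, and if the update ops are silently skipped they never converge at all, which is consistent with a model that trains fine but validates badly.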
PS: I checked the ‘bnlstm’ branch, but it didn’t help me fix this. It seems @reuben was also having trouble making BN work; was he able to use it in the end?