Problem including BatchNorm after Dense layers

I am trying to include BatchNormalization layers after the dense layers, much like the DeepSpeech2 paper does (next I plan to implement BN on the LSTMs). In summary, I am using the following batchnorm call inside the BiRNN definition:

    layer_1 = tf.layers.batch_normalization(inputs=layer_1, momentum=FLAGS.batchnorm_momentum, training=training_ph)

where training_ph is a boolean placeholder set to True during training and False otherwise, and batchnorm_momentum=0.9. I am also passing the update ops to session.run during training (at https://github.com/mozilla/DeepSpeech/blob/master/DeepSpeech.py#L1637):

    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    _, _, current_step, batch_loss, batch_report = session.run([train_op, update_ops, global_step, loss, report_params], **extra_params)
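
For reference, the pattern the tf.layers.batch_normalization docs suggest is to make the update ops a control dependency of the train op instead of fetching them in session.run. I assume the two should be equivalent, but this is the variant I have not tried yet (just a sketch; optimizer, loss and global_step stand in for the actual DeepSpeech graph objects):

    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        # moving_mean/moving_variance updates now run before each weight update
        train_op = optimizer.minimize(loss, global_step=global_step)

    _, current_step, batch_loss = session.run([train_op, global_step, loss], **extra_params)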

Now to the problem: training seems to work (the loss goes down), but validation seems broken. Since batchnorm behaves differently in training and validation, the training_ph appears to be wired correctly, so I suspect the batchnorm parameters (moving_mean and moving_variance) are not being updated properly.
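
To check that suspicion, I'm planning to dump the moving statistics before and after a few training steps, roughly like this sketch (it filters variables by name, which I assume matches the names tf.layers.batch_normalization creates):

    # collect the moving statistics of every batchnorm layer in the graph
    bn_stats = [v for v in tf.global_variables()
                if 'moving_mean' in v.name or 'moving_variance' in v.name]

    before = session.run(bn_stats)
    # ... run a few training steps with training_ph fed as True ...
    after = session.run(bn_stats)

    # if these deltas stay at zero, the update ops are not actually running
    for v, b, a in zip(bn_stats, before, after):
        print(v.name, 'mean absolute change:', abs(a - b).mean())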

PS: I checked the 'bnlstm' branch, but it didn't help me fix this. It seems @reuben was having some trouble making BN work too; was he able to get it working in the end?