How can I use batch normalization in TensorFlow?
I would like to use batch normalization in TensorFlow, since I found it in the source file core/ops/nn_ops.cc. However, I could not find it documented on tensorflow.org.

BN has different semantics in MLPs and CNNs, so I am not sure exactly what this BN does.

I did not find a method called MovingMoments either.

The C++ code is copied here for reference:
REGISTER_OP("BatchNormWithGlobalNormalization")
    .Input("t: T")
    .Input("m: T")
    .Input("v: T")
    .Input("beta: T")
    .Input("gamma: T")
    .Output("result: T")
    .Attr("T: numbertype")
    .Attr("variance_epsilon: float")
    .Attr("scale_after_normalization: bool")
    .Doc(R"doc(
Batch normalization.

t: A 4D input Tensor.
m: A 1D mean Tensor with size matching the last dimension of t.
  This is the first output from MovingMoments.
v: A 1D variance Tensor with size matching the last dimension of t.
  This is the second output from MovingMoments.
beta: A 1D beta Tensor with size matching the last dimension of t.
  An offset to be added to the normalized tensor.
gamma: A 1D gamma Tensor with size matching the last dimension of t.
  If "scale_after_normalization" is true, this tensor will be multiplied
  with the normalized tensor.
variance_epsilon: A small float number to avoid dividing by 0.
scale_after_normalization: A bool indicating whether the resulted tensor
  needs to be multiplied with gamma.
)doc");
Update, July 2016: The easiest way to use batch normalization in TensorFlow is through the higher-level interfaces provided in contrib/layers, tflearn, or slim.
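For example, a minimal sketch of the contrib/layers route might look like the following; note that the layer sizes, placeholder names, and scope below are my own illustrative assumptions, not part of the original answer.

# A minimal sketch (TF 1.x with tf.contrib available); sizes and names are illustrative.
import tensorflow as tf

is_training = tf.placeholder(tf.bool, name='is_training')
x = tf.placeholder(tf.float32, [None, 64], name='x')

# contrib/layers keeps the moving mean/variance for you; with
# updates_collections=None the moving averages are updated in place.
h = tf.contrib.layers.fully_connected(x, 100, activation_fn=None)
h = tf.contrib.layers.batch_norm(h, center=True, scale=True,
                                 is_training=is_training,
                                 updates_collections=None, scope='bn1')
h = tf.nn.relu(h)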
Previous answer, if you want to DIY: the documentation string has improved since the release -- see the docs comment in the master branch instead of the one you found. It clarifies, in particular, that it expects the output of tf.nn.moments.
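For concreteness, here is a minimal sketch of how that op could be fed; the shapes, epsilon, and variable names are my own assumptions for illustration, not from the docs.

# A minimal sketch of feeding the op directly (shapes and epsilon are illustrative).
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 28, 28, 16])   # the 4D input tensor "t"
beta = tf.Variable(tf.zeros([16]))                    # offset, one per channel
gamma = tf.Variable(tf.ones([16]))                    # scale, one per channel

# "m" and "v" are simply the outputs of tf.nn.moments over the batch and spatial axes.
mean, variance = tf.nn.moments(x, [0, 1, 2])

y = tf.nn.batch_norm_with_global_normalization(
    x, mean, variance, beta, gamma,
    variance_epsilon=1e-5, scale_after_normalization=True)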
You can see a very simple example of its use in the batch_norm test code. For a more real-world use example, I have included below the helper class and use notes that I scribbled up for my own use (no warranty provided!):
"""A helper class for managing batch normalization state. This class is designed to simplify adding batch normalization (pdf/1502.03167v3.pdf) to your model by managing the state variables associated with it. Important use note: The function get_assigner() returns an op that must be executed to save the updated state. A suggested way to do this is to make execution of the model optimizer force it, eg, by: update_assignments = tf.group(bn1.get_assigner(), bn2.get_assigner()) with tf.control_dependencies([optimizer]): optimizer = tf.group(update_assignments) """ import tensorflow as tf class ConvolutionalBatchNormalizer(object): """Helper class that groups the normalization logic and variables. Use: ewma = tf.train.ExponentialMovingAverage(decay=0.99) bn = ConvolutionalBatchNormalizer(depth, 0.001, ewma, True) update_assignments = bn.get_assigner() x = bn.normalize(y, train=training?) (the output x will be batch-normalized). """ def __init__(self, depth, epsilon, ewma_trainer, scale_after_norm): self.mean = tf.Variable(tf.constant(0.0, shape=[depth]), trainable=False) self.variance = tf.Variable(tf.constant(1.0, shape=[depth]), trainable=False) self.beta = tf.Variable(tf.constant(0.0, shape=[depth])) self.gamma = tf.Variable(tf.constant(1.0, shape=[depth])) self.ewma_trainer = ewma_trainer self.epsilon = epsilon self.scale_after_norm = scale_after_norm def get_assigner(self): """Returns an EWMA apply op that must be invoked after optimization.""" return self.ewma_trainer.apply([self.mean, self.variance]) def normalize(self, x, train=True): """Returns a batch-normalized version of x.""" if train: mean, variance = tf.nn.moments(x, [0, 1, 2]) assign_mean = self.mean.assign(mean) assign_variance = self.variance.assign(variance) with tf.control_dependencies([assign_mean, assign_variance]): return tf.nn.batch_norm_with_global_normalization( x, mean, variance, self.beta, self.gamma, self.epsilon, self.scale_after_norm) else: mean = self.ewma_trainer.average(self.mean) variance = self.ewma_trainer.average(self.variance) local_beta = tf.identity(self.beta) local_gamma = tf.identity(self.gamma) return tf.nn.batch_norm_with_global_normalization( x, mean, variance, local_beta, local_gamma, self.epsilon, self.scale_after_norm)
Note that I called it ConvolutionalBatchNormalizer because it pins the use of tf.nn.moments to axes 0, 1, and 2, whereas for non-convolutional use you might only want axis 0.
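A quick illustration of that axis difference, with toy shapes of my own choosing:

import tensorflow as tf

dense_out = tf.placeholder(tf.float32, [None, 100])        # [batch, features]
conv_out = tf.placeholder(tf.float32, [None, 32, 32, 64])  # [batch, H, W, channels]

# Non-convolutional case: statistics over the batch axis only -> shape [100].
dense_mean, dense_var = tf.nn.moments(dense_out, [0])

# Convolutional case: statistics over batch and both spatial axes -> shape [64].
conv_mean, conv_var = tf.nn.moments(conv_out, [0, 1, 2])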
Feedback appreciated if you use it.
The following works fine for me; it does not require invoking the EMA apply op externally.
import numpy as np
import tensorflow as tf
from tensorflow.python import control_flow_ops

def batch_norm(x, n_out, phase_train, scope='bn'):
    """
    Batch normalization on convolutional maps.
    Args:
        x:           Tensor, 4D BHWD input maps
        n_out:       integer, depth of input maps
        phase_train: boolean tf.Variable, true indicates training phase
        scope:       string, variable scope
    Return:
        normed:      batch-normalized maps
    """
    with tf.variable_scope(scope):
        beta = tf.Variable(tf.constant(0.0, shape=[n_out]),
                           name='beta', trainable=True)
        gamma = tf.Variable(tf.constant(1.0, shape=[n_out]),
                            name='gamma', trainable=True)
        batch_mean, batch_var = tf.nn.moments(x, [0, 1, 2], name='moments')
        ema = tf.train.ExponentialMovingAverage(decay=0.5)

        def mean_var_with_update():
            ema_apply_op = ema.apply([batch_mean, batch_var])
            with tf.control_dependencies([ema_apply_op]):
                return tf.identity(batch_mean), tf.identity(batch_var)

        mean, var = tf.cond(phase_train,
                            mean_var_with_update,
                            lambda: (ema.average(batch_mean), ema.average(batch_var)))
        normed = tf.nn.batch_normalization(x, mean, var, beta, gamma, 1e-3)
    return normed
Example:
import math

n_in, n_out = 3, 16
ksize = 3
stride = 1
phase_train = tf.placeholder(tf.bool, name='phase_train')
input_image = tf.placeholder(tf.float32, name='input_image')
kernel = tf.Variable(tf.truncated_normal([ksize, ksize, n_in, n_out],
                                         stddev=math.sqrt(2.0/(ksize*ksize*n_out))),
                     name='kernel')
conv = tf.nn.conv2d(input_image, kernel, [1, stride, stride, 1], padding='SAME')
conv_bn = batch_norm(conv, n_out, phase_train)
relu = tf.nn.relu(conv_bn)

with tf.Session() as session:
    session.run(tf.initialize_all_variables())
    for i in range(20):
        test_image = np.random.rand(4, 32, 32, 3)
        sess_outputs = session.run([relu],
                                   {input_image.name: test_image,
                                    phase_train.name: True})
To add another option: as of TensorFlow 1.0 (February 2017) there is also the high-level tf.layers.batch_normalization API included in TensorFlow itself.

It is super simple to use:
# Set this to True for training and False for testing
training = tf.placeholder(tf.bool)

x = tf.layers.dense(input_x, units=100)
x = tf.layers.batch_normalization(x, training=training)
x = tf.nn.relu(x)
...except that it adds extra ops (to update its mean and variance variables) in such a way that they will not be dependencies of your training op. You can either just run the ops separately:
extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
sess.run([train_op, extra_update_ops], ...)
or add the update ops as dependencies of your training op manually, then just run your training op as normal:
extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
    train_op = optimizer.minimize(loss)
...
sess.run([train_op], ...)
There is also an "official" batch normalization layer coded by the developers. They don't have very good docs on how to use it, but here is how to use it (according to me):
from tensorflow.contrib.layers.python.layers import batch_norm as batch_norm

def batch_norm_layer(x, train_phase, scope_bn):
    bn_train = batch_norm(x, decay=0.999, center=True, scale=True,
                          updates_collections=None,
                          is_training=True,
                          reuse=None,  # is this right?
                          trainable=True,
                          scope=scope_bn)
    bn_inference = batch_norm(x, decay=0.999, center=True, scale=True,
                              updates_collections=None,
                              is_training=False,
                              reuse=True,  # is this right?
                              trainable=True,
                              scope=scope_bn)
    z = tf.cond(train_phase, lambda: bn_train, lambda: bn_inference)
    return z
To actually use it you need to create a placeholder for train_phase that indicates whether you are in the training or inference phase (as in train_phase = tf.placeholder(tf.bool, name='phase_train')). Its value can then be filled during inference or training with a tf.Session, as follows:
test_error = sess.run(fetches=cross_entropy, feed_dict={x: batch_xtest, y_:batch_ytest, train_phase: False})
or during training:
sess.run(fetches=train_step, feed_dict={x: batch_xs, y_:batch_ys, train_phase: True})
Here is the function they use to implement BN (available at the link given above, but I am pasting it here in case they change it):
@add_arg_scope
def batch_norm(inputs,
               decay=0.999,
               center=True,
               scale=False,
               epsilon=0.001,
               activation_fn=None,
               updates_collections=ops.GraphKeys.UPDATE_OPS,
               is_training=True,
               reuse=None,
               variables_collections=None,
               outputs_collections=None,
               trainable=True,
               scope=None):
  """Adds a Batch Normalization layer from http://arxiv.org/abs/1502.03167.

    "Batch Normalization: Accelerating Deep Network Training by Reducing
    Internal Covariate Shift"

    Sergey Ioffe, Christian Szegedy

  Can be used as a normalizer function for conv2d and fully_connected.

  Args:
    inputs: a tensor of size `[batch_size, height, width, channels]`
      or `[batch_size, channels]`.
    decay: decay for the moving average.
    center: If True, subtract `beta`. If False, `beta` is ignored.
    scale: If True, multiply by `gamma`. If False, `gamma` is
      not used. When the next layer is linear (also e.g. `nn.relu`), this can be
      disabled since the scaling can be done by the next layer.
    epsilon: small float added to variance to avoid dividing by zero.
    activation_fn: Optional activation function.
    updates_collections: collections to collect the update ops for computation.
      If None, a control dependency would be added to make sure the updates are
      computed.
    is_training: whether or not the layer is in training mode. In training mode
      it would accumulate the statistics of the moments into `moving_mean` and
      `moving_variance` using an exponential moving average with the given
      `decay`. When it is not in training mode then it would use the values of
      the `moving_mean` and the `moving_variance`.
    reuse: whether or not the layer and its variables should be reused. To be
      able to reuse the layer scope must be given.
    variables_collections: optional collections for the variables.
    outputs_collections: collections to add the outputs.
    trainable: If `True` also add variables to the graph collection
      `GraphKeys.TRAINABLE_VARIABLES` (see tf.Variable).
    scope: Optional scope for `variable_op_scope`.

  Returns:
    a tensor representing the output of the operation.
  """
  with variable_scope.variable_op_scope([inputs], scope, 'BatchNorm',
                                        reuse=reuse) as sc:
    inputs_shape = inputs.get_shape()
    dtype = inputs.dtype.base_dtype
    axis = list(range(len(inputs_shape) - 1))
    params_shape = inputs_shape[-1:]
    # Allocate parameters for the beta and gamma of the normalization.
    beta, gamma = None, None
    if center:
      beta_collections = utils.get_variable_collections(variables_collections,
                                                        'beta')
      beta = variables.model_variable('beta',
                                      shape=params_shape,
                                      dtype=dtype,
                                      initializer=init_ops.zeros_initializer,
                                      collections=beta_collections,
                                      trainable=trainable)
    if scale:
      gamma_collections = utils.get_variable_collections(variables_collections,
                                                         'gamma')
      gamma = variables.model_variable('gamma',
                                       shape=params_shape,
                                       dtype=dtype,
                                       initializer=init_ops.ones_initializer,
                                       collections=gamma_collections,
                                       trainable=trainable)
    # Create moving_mean and moving_variance variables and add them to the
    # appropriate collections.
    moving_mean_collections = utils.get_variable_collections(
        variables_collections, 'moving_mean')
    moving_mean = variables.model_variable('moving_mean',
                                           shape=params_shape,
                                           dtype=dtype,
                                           initializer=init_ops.zeros_initializer,
                                           trainable=False,
                                           collections=moving_mean_collections)
    moving_variance_collections = utils.get_variable_collections(
        variables_collections, 'moving_variance')
    moving_variance = variables.model_variable('moving_variance',
                                               shape=params_shape,
                                               dtype=dtype,
                                               initializer=init_ops.ones_initializer,
                                               trainable=False,
                                               collections=moving_variance_collections)
    if is_training:
      # Calculate the moments based on the individual batch.
      mean, variance = nn.moments(inputs, axis, shift=moving_mean)
      # Update the moving_mean and moving_variance moments.
      update_moving_mean = moving_averages.assign_moving_average(
          moving_mean, mean, decay)
      update_moving_variance = moving_averages.assign_moving_average(
          moving_variance, variance, decay)
      if updates_collections is None:
        # Make sure the updates are computed here.
        with ops.control_dependencies([update_moving_mean,
                                       update_moving_variance]):
          outputs = nn.batch_normalization(
              inputs, mean, variance, beta, gamma, epsilon)
      else:
        # Collect the updates to be computed later.
        ops.add_to_collections(updates_collections, update_moving_mean)
        ops.add_to_collections(updates_collections, update_moving_variance)
        outputs = nn.batch_normalization(
            inputs, mean, variance, beta, gamma, epsilon)
    else:
      outputs = nn.batch_normalization(
          inputs, moving_mean, moving_variance, beta, gamma, epsilon)
    outputs.set_shape(inputs.get_shape())
    if activation_fn:
      outputs = activation_fn(outputs)
    return utils.collect_named_outputs(outputs_collections, sc.name, outputs)
Based on the discussion on GitHub, I am fairly sure this is correct.
There also seems to be another useful link:
http://r2rt.com/implementing-batch-normalization-in-tensorflow.html
You can simply use the built-in batch_norm layer:
batch_norm = tf.cond(is_train,
                     lambda: tf.contrib.layers.batch_norm(prev, activation_fn=tf.nn.relu,
                                                          is_training=True, reuse=None),
                     lambda: tf.contrib.layers.batch_norm(prev, activation_fn=tf.nn.relu,
                                                          is_training=False, reuse=True))
where prev is the output of your previous layer (it can be either a fully-connected or a convolutional layer) and is_train is a boolean placeholder. Just use batch_norm as the input of the next layer.
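A hedged usage sketch of how this might be wired into a small network: the conv layer, the sizes, and the shared 'bn1' scope below are my own additions, and to be cautious about variable reuse both batch_norm calls share an explicit scope and are built before tf.cond, which then just selects between the two tensors.

import tensorflow as tf

is_train = tf.placeholder(tf.bool, name='is_train')
images = tf.placeholder(tf.float32, [None, 32, 32, 3])

# "prev" is the output of the previous layer, here a small conv layer.
prev = tf.contrib.layers.conv2d(images, num_outputs=16, kernel_size=3,
                                activation_fn=None)

bn_train = tf.contrib.layers.batch_norm(prev, activation_fn=tf.nn.relu,
                                        is_training=True, reuse=None, scope='bn1')
bn_test = tf.contrib.layers.batch_norm(prev, activation_fn=tf.nn.relu,
                                       is_training=False, reuse=True, scope='bn1')
batch_norm = tf.cond(is_train, lambda: bn_train, lambda: bn_test)

# Use batch_norm as the input of the next layer as usual.
logits = tf.contrib.layers.fully_connected(tf.contrib.layers.flatten(batch_norm),
                                           10, activation_fn=None)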
Just a heads up (I can't comment since I don't have 50 reputation): the answer above by @bgshi doesn't seem quite right. When phase_train is set to False, it still updates the EMA mean and variance. This can be verified with the following code snippet.
x = tf.placeholder(tf.float32, [None, 20, 20, 10], name='input')
phase_train = tf.placeholder(tf.bool, name='phase_train')

# generate random noise to pass into batch norm
x_gen = tf.random_normal([50, 20, 20, 10])
pt_false = tf.Variable(tf.constant(True))

# generate a constant input to pass into batch norm
y = x_gen.eval()

[bn, bn_vars] = batch_norm(x, 10, phase_train)

tf.initialize_all_variables().run()
train_step = lambda: bn.eval({x: x_gen.eval(), phase_train: True})
test_step = lambda: bn.eval({x: y, phase_train: False})
test_step_c = lambda: bn.eval({x: y, phase_train: True})

# Verify that these differ as expected: two different x's have different norms
print(train_step()[0][0][0])
print(train_step()[0][0][0])

# Verify that these are the same as expected: the same x (y) has the same norm
print(test_step_c()[0][0][0])
print(test_step_c()[0][0][0])

# THIS IS DIFFERENT but should be the same; it should only be reading from the EMA
print(test_step()[0][0][0])
print(test_step()[0][0][0])
Using the batch_norm layer built into TensorFlow, below is code that loads data, builds a network with one hidden ReLU layer and L2 regularization, and introduces batch normalization for both the hidden and the output layer. It runs and trains fine. Just FYI, this example is mostly built upon the data and code from the Udacity DeepLearning course. P.S. Yes, parts of this were discussed one way or another in previous answers, but I decided to gather everything into one code snippet so that you have an example of the whole process of training a network with batch normalization and evaluating it.
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle

pickle_file = '/home/maxkhk/Documents/Udacity/DeepLearningCourse/SourceCode/tensorflow/examples/udacity/notMNIST.pickle'

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)

image_size = 28
num_labels = 10

def reformat(dataset, labels):
  dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32)
  # Map 2 to [0.0, 1.0, 0.0 ...], 3 to [0.0, 0.0, 1.0 ...]
  labels = (np.arange(num_labels) == labels[:, None]).astype(np.float32)
  return dataset, labels

train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)


def accuracy(predictions, labels):
  return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
          / predictions.shape[0])


# for NeuralNetwork model code is below
# We will use SGD for training to save our time. Code is from Assignment 2
# beta is the new parameter - controls level of regularization.
# Feel free to play with it - the best one I found is 0.001
# notice, we introduce L2 for both biases and weights of all layers

batch_size = 128
beta = 0.001

# building tensorflow graph
graph = tf.Graph()
with graph.as_default():
  # Input data. For the training data, we use a placeholder that will be fed
  # at run time with a training minibatch.
  tf_train_dataset = tf.placeholder(tf.float32,
                                    shape=(batch_size, image_size * image_size))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)

  # introduce batchnorm
  tf_train_dataset_bn = tf.contrib.layers.batch_norm(tf_train_dataset)

  # now let's build our new hidden layer
  # that's how many hidden neurons we want
  num_hidden_neurons = 1024
  # its weights
  hidden_weights = tf.Variable(
      tf.truncated_normal([image_size * image_size, num_hidden_neurons]))
  hidden_biases = tf.Variable(tf.zeros([num_hidden_neurons]))

  # now the layer itself. It multiplies data by weights, adds biases
  # and takes ReLU over the result
  hidden_layer = tf.nn.relu(tf.matmul(tf_train_dataset_bn, hidden_weights) + hidden_biases)

  # adding the batch normalization layer
  hidden_layer_bn = tf.contrib.layers.batch_norm(hidden_layer)

  # time to go for the output linear layer
  # out weights connect hidden neurons to output labels
  # biases are added to output labels
  out_weights = tf.Variable(
      tf.truncated_normal([num_hidden_neurons, num_labels]))
  out_biases = tf.Variable(tf.zeros([num_labels]))

  # compute output
  out_layer = tf.matmul(hidden_layer_bn, out_weights) + out_biases

  # our real output is a softmax of the prior result
  # and we also compute its cross-entropy to get our loss
  # Notice - we introduce our L2 here
  loss = (tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
      out_layer, tf_train_labels) +
      beta * tf.nn.l2_loss(hidden_weights) +
      beta * tf.nn.l2_loss(hidden_biases) +
      beta * tf.nn.l2_loss(out_weights) +
      beta * tf.nn.l2_loss(out_biases)))

  # now we just minimize this loss to actually train the network
  optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

  # nice, now let's calculate the predictions on each dataset for evaluating the
  # performance so far
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(out_layer)
  valid_relu = tf.nn.relu(tf.matmul(tf_valid_dataset, hidden_weights) + hidden_biases)
  valid_prediction = tf.nn.softmax(tf.matmul(valid_relu, out_weights) + out_biases)

  test_relu = tf.nn.relu(tf.matmul(tf_test_dataset, hidden_weights) + hidden_biases)
  test_prediction = tf.nn.softmax(tf.matmul(test_relu, out_weights) + out_biases)

# now is the actual training on the ANN we built
# we will run it for some number of steps and evaluate the progress after
# every 500 steps

# number of steps we will train our ANN
num_steps = 3001

# actual training
with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  print("Initialized")
  for step in range(num_steps):
    # Pick an offset within the training data, which has been randomized.
    # Note: we could use better randomization across epochs.
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    # Generate a minibatch.
    batch_data = train_dataset[offset:(offset + batch_size), :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    # Prepare a dictionary telling the session where to feed the minibatch.
    # The key of the dictionary is the placeholder node of the graph to be fed,
    # and the value is the numpy array to feed to it.
    feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels}
    _, l, predictions = session.run(
        [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 500 == 0):
      print("Minibatch loss at step %d: %f" % (step, l))
      print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
      print("Validation accuracy: %.1f%%" % accuracy(
          valid_prediction.eval(), valid_labels))
  print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))
And here is a simple example of using this batchnorm class:
from bn_class import *

with tf.name_scope('Batch_norm_conv1') as scope:
    ewma = tf.train.ExponentialMovingAverage(decay=0.99)
    bn_conv1 = ConvolutionalBatchNormalizer(num_filt_1, 0.001, ewma, True)
    update_assignments = bn_conv1.get_assigner()
    a_conv1 = bn_conv1.normalize(a_conv1, train=bn_train)
h_conv1 = tf.nn.relu(a_conv1)
Here is a simple network with two hidden layers and batch normalization. It uses some handy functions from tensorflow.contrib.layers:
from tensorflow.contrib.layers import fully_connected, batch_norm
from tensorflow.contrib.framework import arg_scope

batch_norm_params = {
    'is_training': True,
    'decay': 0.9,
    'updates_collections': None,
    'scale': True,
}

with arg_scope(
        [fully_connected],
        normalizer_fn=batch_norm,
        normalizer_params=batch_norm_params):
    hidden1 = fully_connected(inputs, n_hidden1, scope="hidden1")
    hidden2 = fully_connected(hidden1, n_hidden2, scope="hidden2")
    logits = fully_connected(hidden2, n_outputs, scope="output", activation_fn=None)
Note: to run this you also need to add the inputs (e.g. a placeholder) and define the number of neurons you want in each layer (n_hidden1, n_hidden2, and n_outputs).
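For instance, a minimal setup might look like this; the sizes and the placeholder shape below are only illustrative assumptions, not part of the answer.

import tensorflow as tf

n_inputs = 28 * 28   # e.g. flattened MNIST-style images
n_hidden1 = 300
n_hidden2 = 100
n_outputs = 10

inputs = tf.placeholder(tf.float32, shape=[None, n_inputs], name="inputs")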
Edit: Matthew Rahtz's answer is the correct one.