原文地址:

http://ruishu.io/2016/12/27/batchnorm/

------------------------------------------------------------------------------

Update [11-21-2017]: Please see this code snippet for my current preferred implementation.

I recently made the switch to TensorFlow and am very happy with how easy it was to get things done using this awesome library. Tensorflow has come a long way since I first experimented with it in 2015, and I am happy to be back.

Since I am getting myself re-acquainted with TensorFlow, I decided that I should write a post about how to do batch normalization in TensorFlow. It’s kind of weird that batch normalization still presents such a challenge for new TensorFlow users, especially since TensorFlow comes with invaluable functions like tf.nn.moments, tf.nn.batch_normalization, and even tf.contrib.layers.batch_norm. One would think that using batch normalization in TensorFlow will be a cinch. But alas, confusion still crops up from time to time, and the devil really lies in the details.

Batch Normalization The Easy Way

Perhaps the easiest way to use batch normalization would be to simply use the tf.contrib.layers.batch_norm layer. So let’s give that a go! Let’s get some imports and data loading out of the way first.

import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from utils import show_graph
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Next, we define our typical fully-connected + batch normalization + nonlinearity set-up

def dense(x, size, scope):
return tf.contrib.layers.fully_connected(x, size,
activation_fn=None,
scope=scope) def dense_batch_relu(x, phase, scope):
with tf.variable_scope(scope):
h1 = tf.contrib.layers.fully_connected(x, 100,
activation_fn=None,
scope='dense')
h2 = tf.contrib.layers.batch_norm(h1,
center=True, scale=True,
is_training=phase,
scope='bn')
return tf.nn.relu(h2, 'relu')

One thing that might stand out is the phase term. We are going to use as a placeholder for a boolean which we will insert into feed_dict. It will serve as a binary indicator for whether we are in training phase=True or testing phase=False mode. Recall that batch normalization has distinct behaviors during training verus test time:

Training

  1. Normalize layer activations according to mini-batch statistics.

  2. During the training step, update population statistics approximation via moving average of mini-batch statistics.

Testing

  1. Normalize layer activations according to estimated population statistics.

  2. Do not update population statistics according to mini-batch statistcs from test data.

Now we can define our very simple neural network for MNIST classification.

tf.reset_default_graph()
x = tf.placeholder('float32', (None, 784), name='x')
y = tf.placeholder('float32', (None, 10), name='y')
phase = tf.placeholder(tf.bool, name='phase') h1 = dense_batch_relu(x, phase,'layer1')
h2 = dense_batch_relu(h1, phase, 'layer2')
logits = dense(h2, 10, 'logits') with tf.name_scope('accuracy'):
accuracy = tf.reduce_mean(tf.cast(
tf.equal(tf.argmax(y, 1), tf.argmax(logits, 1)),
'float32')) with tf.name_scope('loss'):
loss = tf.reduce_mean(
tf.nn.softmax_cross_entropy_with_logits(logits, y))

Now that we have defined our data and computational graph (a.k.a. model), we can train the model! Here is where we need to notice a very important note in the tf.contrib.layers.batch_norm documentation:

Note: When is_training is True the moving_mean and moving_variance need to be updated, by default the update_ops are placed in tf.GraphKeys.UPDATE_OPS so they need to be added as a dependency to the train_op, example:

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)

if update_ops:

updates = tf.group(*update_ops)

total_loss = control_flow_ops.with_dependencies([updates], total_loss)

If you are comfortable with TensorFlow’s underlying graph/ops mechanism, the note is fairly straight-forward. If not, here’s a simple way to think of it: when you execute an operation (such as train_step), only the subgraph components relevant to train_step will be executed. Unfortunately, the update_moving_averages operation is not a parent of train_step in the computational graph, so we will never update the moving averages! To get around this, we have to explicitly tell the graph:

Hey graph, update the moving averages before you finish the training step!

Unfortunately, the instructions in the documentation are a little out of date. Furthermore, if you think about it a little more, you may conclude that attaching the update ops to total_loss may not be desirable if you wish to compute the total_loss of the test set during test time. Personally, I think it makes more sense to attach the update ops to the train_step itself. So I modified the code a little and created the following training function

def train():
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
# Ensures that we execute the update_ops before performing the train_step
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
sess = tf.Session()
sess.run(tf.global_variables_initializer()) history = []
iterep = 500
for i in range(iterep * 30):
x_train, y_train = mnist.train.next_batch(100)
sess.run(train_step,
feed_dict={'x:0': x_train,
'y:0': y_train,
'phase:0': 1})
if (i + 1) % iterep == 0:
epoch = (i + 1)/iterep
tr = sess.run([loss, accuracy],
feed_dict={'x:0': mnist.train.images,
'y:0': mnist.train.labels,
'phase:0': 1})
t = sess.run([loss, accuracy],
feed_dict={'x:0': mnist.test.images,
'y:0': mnist.test.labels,
'phase:0': 0})
history += [[epoch] + tr + t]
print history[-1]
return history

And we’re done! We can now train our model and see what happens. Below, I provide a comparison of the model without batch normalization, the model with pre-activation batch normalization, and the model with post-activation batch normalization.

As you can see, batch normalization really does help with training (not always, but it certainly did in this simple example).

Additional Remarks

You have the choice of applying batch normalization either before or after the non-linearity, depending on your definition of the “activation distribution of interest” that you wish to normalize. It will probably end up being a hyperparameter that you’ll just have to tinker with.

There is also the question of what it means to share the same batch normalization layer across multiple models. I think this question should be treated with some care, and it really depends on what you think you’ll gain out of sharing the batch normalization layer. In particular, we should be careful about sharing the mini-batch statistics across multiple data streams if we expect the data streams to have distinct distributions, as is the case when using batch normalization in recurrent neural networks. As of the moment, the

tf.contrib.layers.batch_norm function does not allow this level of control.

But I do!… Kind of. It’s a fairly short piece of code, so it should be easy to modify it to fit your own purposes. *runs away*

-----------------------------------------------------------------------------

附(采用batch_normalization的代码):

import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data mnist = input_data.read_data_sets("MNIST_data/", one_hot=True) def dense(x, size, scope):
return tf.contrib.layers.fully_connected(x, size,
activation_fn=None, scope=scope) def dense_relu(x, size, scope):
with tf.variable_scope(scope):
h1 = dense(x, size, 'dense')
return tf.nn.relu(h1, 'relu') tf.reset_default_graph()
x = tf.placeholder('float32', (None, 784), name='x')
y = tf.placeholder('float32', (None, 10), name='y')
phase = tf.placeholder(tf.bool, name='phase') h1 = dense_relu(x, 100, 'layer1')
h1 = tf.contrib.layers.batch_norm(h1,
center=True, scale=True,
is_training=phase,
scope='bn_1') h2 = dense_relu(h1, 100, 'layer2')
h2 = tf.contrib.layers.batch_norm(h2,
center=True, scale=True,
is_training=phase,
scope='bn_2') logits = dense(h2, 10, scope='logits') with tf.name_scope('accuracy'):
accuracy = tf.reduce_mean(tf.cast(
tf.equal(tf.argmax(y, 1), tf.argmax(logits, 1)),
'float32')) with tf.name_scope('loss'):
loss = tf.reduce_mean(
tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y)) def train():
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
sess = tf.Session()
sess.run(tf.global_variables_initializer()) history = []
iterep = 500
for i in range(iterep * 30):
x_train, y_train = mnist.train.next_batch(100)
sess.run(train_step,
feed_dict={'x:0': x_train,
'y:0': y_train,
'phase:0': 1})
if (i + 1) % iterep == 0:
epoch = (i + 1)/iterep
tr = sess.run([loss, accuracy],
feed_dict={'x:0': mnist.train.images,
'y:0': mnist.train.labels,
'phase:0': 1})
t = sess.run([loss, accuracy],
feed_dict={'x:0': mnist.test.images,
'y:0': mnist.test.labels,
'phase:0': 0})
history += [[epoch] + tr + t]
print( history[-1] )
return history train()

【转载】 Tensorflow Guide: Batch Normalization (tensorflow中的Batch Normalization)的更多相关文章

  1. 使用TensorFlow中的Batch Normalization

    问题 训练神经网络是一个很复杂的过程,在前面提到了深度学习中常用的激活函数,例如ELU或者Relu的变体能够在开始训练的时候很大程度上减少梯度消失或者爆炸问题.但是却不能保证在训练过程中不出现该问题, ...

  2. Pytorch中的Batch Normalization操作

    之前一直和小伙伴探讨batch normalization层的实现机理,作用在这里不谈,知乎上有一篇paper在讲这个,链接 这里只探究其具体运算过程,我们假设在网络中间经过某些卷积操作之后的输出的f ...

  3. Tensorflow 保存模型 & 在java中调用

    本节涉及: 保存TensorFlow 的模型供其他语言使用 java中调用模型并进行预测计算 一.保存TensorFlow 的模型供其他语言使用 如果用户选择“y” ,则执行下面的步骤: 判断程序执行 ...

  4. 基于Anaconda安装Tensorflow 并实现在Spyder中的应用

    基于Anaconda安装Tensorflow 并实现在Spyder中的应用 Anaconda可隔离管理多个环境,互不影响.这里,在anaconda中安装最新的python3.6.5 版本. 一.安装 ...

  5. PyTorch中的Batch Normalization

    Pytorch中的BatchNorm的API主要有: 1 torch.nn.BatchNorm1d(num_features, 2 3 eps=1e-05, 4 5 momentum=0.1, 6 7 ...

  6. 深度学习中优化【Normalization】

    深度学习中优化操作: dropout l1, l2正则化 momentum normalization 1.为什么Normalization?     深度神经网络模型的训练为什么会很困难?其中一个重 ...

  7. tensorflow学习笔记——使用TensorFlow操作MNIST数据(2)

    tensorflow学习笔记——使用TensorFlow操作MNIST数据(1) 一:神经网络知识点整理 1.1,多层:使用多层权重,例如多层全连接方式 以下定义了三个隐藏层的全连接方式的神经网络样例 ...

  8. tensorflow学习笔记——使用TensorFlow操作MNIST数据(1)

    续集请点击我:tensorflow学习笔记——使用TensorFlow操作MNIST数据(2) 本节开始学习使用tensorflow教程,当然从最简单的MNIST开始.这怎么说呢,就好比编程入门有He ...

  9. 【TensorFlow】:解决TensorFlow的ImportError: DLL load failed: 动态链接库(DLL)初始化例程失败

    [背景] 在scikit-learn基础上系统结合数学和编程的角度学习了机器学习后(我的github:https://github.com/wwcom614/machine-learning),意犹未 ...

  10. CVPR2021 | 重新思考BatchNorm中的Batch

    ​ 前言 公众号在前面发过三篇分别对BatchNorm解读.分析和总结的文章(文章链接在文末),阅读过这三篇文章的读者对BatchNorm和归一化方法应该已经有了较深的认识和理解.在本文将介绍一篇关于 ...

随机推荐

  1. linux系统下,安装docker教程,以CentOS8为例

    查看本机的系统信息: 使用命令 lsb_release -a ,可以看到本机是CentOS系统,版本是8.4.2105 一.安装docker 1.Docker的安装要求CentOS系统内核版本要高于3 ...

  2. 喜讯!INFINI Easysearch 在墨天轮数据库排名中挺进前30!

    近日,2023 年 10 月的 墨天轮中国数据库流行度排行 火热出炉,本月共有 283 个数据库参与排名,中国数据库行业竞争日益激烈.其中,极限科技旗下软件产品 INFINI Easysearch 稳 ...

  3. ZeroPadding 参照

    加密时要处理. 解密时,不需要额外处理,直接NoPadding. import sun.misc.BASE64Decoder; import sun.misc.BASE64Encoder; impor ...

  4. VIVO IQOO 5G 开关

    VIVO IQOO 5G 开关 在拨号盘输入*#*#2288#*#*,然后点击网络模式选择. -

  5. Vue微前端架构与Qiankun实践理论指南

    title: Vue微前端架构与Qiankun实践理论指南 date: 2024/6/15 updated: 2024/6/15 author: cmdragon excerpt: 这篇文章介绍了微前 ...

  6. windows server 安装.net framework 3.5失败

    windows server如果高版本的.net framework 那么在安装.net framework3.5时会提示已安装高版本的不能安装低版本的了 ---------------------- ...

  7. 在AS中logcat的设置过滤信息图文教程

    [当前使用版本 1.4] logcat是调试代码的很好工具,但是因为跳出的信息过多让人目不暇接,未必能让人找到想要的信息,所以我们必须从中过滤出想要的信息 [样例]这里我们要搜索 System.out ...

  8. mysql8的collate问题和修改

    环境 os:centos 7.6 数据库:8.0.22 64bit 问题: 字段a,b它们的collate不一样,结果关联的时候,发现错误. 查询了以下,发现挺多的,逐个修改挺麻烦的,于是整理了如下s ...

  9. Wireshark找不到接口

    在管理员权限下的命令行窗口输入net start npf即可. 注意是管理员权限下的,否则会拒绝访问.

  10. 上交大开源镜像站下架 Docker Hub 镜像

    ​ 在现代软件开发中,Docker镜像已经成为不可或缺的工具.然而,最近频频出现的Docker镜像下架事件让许多开发者措手不及.突然失去依赖的镜像,不仅打乱了项目进程,还引发了许多不便.那么,面对Do ...