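Implementing Batch Normalization with the tf.nn.batch_normalization function

If you find this useful, feel free to join the discussion so we can learn from each other.

References

Andrew Ng's deeplearning.ai course

 Course notes

 Udacity course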



"""

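Most of the time you will be able to use the higher-level functions, but sometimes you may want to work at a lower level. For example, if you want to implement something new that TensorFlow does not yet offer a high-level implementation for,

such as batch normalization in an LSTM, then you need to know how the lower-level pieces fit together.

This version of the network is written almost entirely with the tf.nn package and uses tf.nn.batch_normalization for the normalization itself.

This implementation of 'fully_connected' is considerably more involved than the version written with the tf.layers package. However, if you have gone through the Batch_Normalization_Lesson notebook, it should look familiar.

To add batch normalization, we did the following:

1. Added the 'is_training' parameter to the function signature so that this information can be passed to the batch normalization layer.

2. Removed the bias and the activation function from the dense layer.

3. Added the variables gamma, beta, pop_mean, and pop_variance.

4. Used tf.cond to handle the difference between training and inference.

5. During training, we use tf.nn.moments to compute the mean and variance of the batch, update the population mean and variance with a moving average, and normalize with tf.nn.batch_normalization using the batch statistics.

  Note: the with tf.control_dependencies... block is required to force TensorFlow to update the population statistics before it performs the normalization.

6. During inference (forward passes that only predict and do not update the trainable parameters), tf.nn.batch_normalization uses the population mean and variance

  that were estimated with the moving average during training.

7. Passed the normalized values through a ReLU activation to produce the output.

8. If anything is unclear, see https://github.com/udacity/deep-learning/blob/master/batch-norm/Batch_Normalization_Lesson.ipynb

  for the implementation of 'fully_connected' with tf.nn.batch_normalization.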

"""

import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True, reshape=False)
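
As a rough illustration of steps 3 to 6 in the list above, the following framework-free sketch normalizes the values of a single unit in plain Python. The helper names `normalize`, `batch_norm_train`, and `batch_norm_infer` are hypothetical and not part of TensorFlow; the arithmetic is meant to mirror what `tf.nn.moments` and `tf.nn.batch_normalization` compute for one unit, assuming the same `decay` and `epsilon` values as the functions defined later in this post.

```python
import math

EPSILON = 1e-3  # assumed to match epsilon in the post's code
DECAY = 0.99    # assumed to match decay in the post's code


def normalize(x, mean, variance, beta, gamma):
    """Return gamma * (x - mean) / sqrt(variance + epsilon) + beta for a scalar x."""
    return gamma * (x - mean) / math.sqrt(variance + EPSILON) + beta


def batch_norm_train(values, pop_mean, pop_variance, beta=0.0, gamma=1.0):
    """Normalize with batch statistics and return updated population estimates."""
    batch_mean = sum(values) / len(values)
    batch_variance = sum((v - batch_mean) ** 2 for v in values) / len(values)
    # Moving-average update of the population statistics.
    new_pop_mean = pop_mean * DECAY + batch_mean * (1 - DECAY)
    new_pop_variance = pop_variance * DECAY + batch_variance * (1 - DECAY)
    outputs = [normalize(v, batch_mean, batch_variance, beta, gamma) for v in values]
    return outputs, new_pop_mean, new_pop_variance


def batch_norm_infer(values, pop_mean, pop_variance, beta=0.0, gamma=1.0):
    """Normalize with the stored population statistics; nothing is updated."""
    return [normalize(v, pop_mean, pop_variance, beta, gamma) for v in values]


batch = [1.0, 2.0, 3.0, 4.0]
train_out, pop_mean, pop_variance = batch_norm_train(batch, pop_mean=0.0, pop_variance=1.0)
infer_out = batch_norm_infer(batch, pop_mean, pop_variance)
print(train_out, pop_mean, pop_variance)
print(infer_out)
```

After a single update the population estimates have barely moved away from their initial values, so the training-mode and inference-mode outputs for the same batch differ noticeably.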

def fully_connected(prev_layer, num_units, is_training):

    """

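    Create a fully connected layer with num_units units that takes prev_layer as its input.

    :param prev_layer: Tensor

        The input to this layer.

    :param num_units: int

        The number of units (nodes) in this layer.

    :param is_training: bool or Tensor

        Whether the network is currently training. Tells the batch normalization layer whether to update the population statistics or to use them.

    :returns Tensor

        A new fully connected layer.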

    """

    layer = tf.layers.dense(prev_layer, num_units, use_bias=False, activation=None)

    gamma = tf.Variable(tf.ones([num_units]))

    beta = tf.Variable(tf.zeros([num_units]))

    pop_mean = tf.Variable(tf.zeros([num_units]), trainable=False)

    pop_variance = tf.Variable(tf.ones([num_units]), trainable=False)

    epsilon = 1e-3

    def batch_norm_training():

        batch_mean, batch_variance = tf.nn.moments(layer, [0])

        decay = 0.99

        train_mean = tf.assign(pop_mean, pop_mean*decay + batch_mean*(1 - decay))

        train_variance = tf.assign(pop_variance, pop_variance*decay + batch_variance*(1 - decay))

        with tf.control_dependencies([train_mean, train_variance]):

            return tf.nn.batch_normalization(layer, batch_mean, batch_variance, beta, gamma, epsilon)

    def batch_norm_inference():

        return tf.nn.batch_normalization(layer, pop_mean, pop_variance, beta, gamma, epsilon)

    batch_normalized_output = tf.cond(is_training, batch_norm_training, batch_norm_inference)

    return tf.nn.relu(batch_normalized_output)
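
To get a feel for how quickly `pop_mean` forgets its initial value with `decay = 0.99`, the sketch below repeats the same update rule in plain Python. The function name `moving_average` and the constant batch mean of 5.0 are assumptions made only for this illustration.

```python
def moving_average(initial, batch_value, decay, steps):
    """Apply estimate = estimate * decay + batch_value * (1 - decay) `steps` times."""
    estimate = initial
    for _ in range(steps):
        estimate = estimate * decay + batch_value * (1 - decay)
    return estimate


# pop_mean starts at 0.0; pretend every batch has a mean of exactly 5.0.
after_100 = moving_average(0.0, 5.0, decay=0.99, steps=100)
after_800 = moving_average(0.0, 5.0, decay=0.99, steps=800)
print(after_100, after_800)
```

With a constant input the estimate after n updates is 5.0 * (1 - 0.99 ** n), so after 100 updates it has covered only about 63% of the distance to the true value. Evaluations run very early in training therefore rely on population statistics that are still dominated by the initial zeros and ones.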

"""

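The changes we made to conv_layer are almost the same as the ones we made to fully_connected.

There is one important difference, however: a convolutional layer has multiple feature maps, and each feature map shares its weights across the whole input.

So we need to make sure that batch normalization is performed per feature map rather than per node in the layer.

To achieve that, we did the same things as in fully_connected, with two exceptions:

1. The size of gamma, beta, pop_mean, and pop_variance is set to the number of feature maps (output channels) instead of the number of output nodes.

2. We changed the arguments passed to tf.nn.moments so that it computes the mean and variance over the correct dimensions.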

"""

def conv_layer(prev_layer, layer_depth, is_training):

    """

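       Create a convolutional layer that takes the given layer as its input.

        :param prev_layer: Tensor

            The input to this layer.

        :param layer_depth: int

            The stride and the number of feature maps are derived from the layer's depth in the network.

            This is not good practice for CNNs, but it lets us build this example with very little code.

        :param is_training: bool or Tensor

            Whether the network is currently training. Tells the batch normalization layer whether to update the population statistics or to use them.

        :returns Tensor

            A new convolutional layer.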

        """

    strides = 2 if layer_depth%3 == 0 else 1

    in_channels = prev_layer.get_shape().as_list()[3]

    out_channels = layer_depth*4

    weights = tf.Variable(

        tf.truncated_normal([3, 3, in_channels, out_channels], stddev=0.05))

    layer = tf.nn.conv2d(prev_layer, weights, strides=[1, strides, strides, 1], padding='SAME')

    gamma = tf.Variable(tf.ones([out_channels]))

    beta = tf.Variable(tf.zeros([out_channels]))

    pop_mean = tf.Variable(tf.zeros([out_channels]), trainable=False)

    pop_variance = tf.Variable(tf.ones([out_channels]), trainable=False)

    epsilon = 1e-3

    def batch_norm_training():

        # Use the correct axes so the mean and variance are computed per feature map, not per node in the layer

        batch_mean, batch_variance = tf.nn.moments(layer, [0, 1, 2], keep_dims=False)

        decay = 0.99

        train_mean = tf.assign(pop_mean, pop_mean*decay + batch_mean*(1 - decay))

        train_variance = tf.assign(pop_variance, pop_variance*decay + batch_variance*(1 - decay))

        with tf.control_dependencies([train_mean, train_variance]):

            return tf.nn.batch_normalization(layer, batch_mean, batch_variance, beta, gamma, epsilon)

    def batch_norm_inference():

        return tf.nn.batch_normalization(layer, pop_mean, pop_variance, beta, gamma, epsilon)

    batch_normalized_output = tf.cond(is_training, batch_norm_training, batch_norm_inference)

    return tf.nn.relu(batch_normalized_output)
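
The per-feature-map statistics that `tf.nn.moments(layer, [0, 1, 2])` produces for an NHWC tensor can be sketched without TensorFlow. The helper `channel_moments` below is hypothetical and works on nested Python lists indexed as `[batch][height][width][channel]`; it only illustrates which values are pooled together for each channel.

```python
def channel_moments(batch):
    """Return per-channel (means, variances) for a nested list shaped [N][H][W][C]."""
    num_channels = len(batch[0][0][0])
    means, variances = [], []
    for c in range(num_channels):
        # Pool every value of channel c across batch, height, and width.
        values = [pixel[c] for image in batch for row in image for pixel in row]
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        means.append(mean)
        variances.append(variance)
    return means, variances


# Two images, each 1 pixel high and 2 pixels wide, with 2 channels.
batch = [
    [[[1.0, 10.0], [3.0, 10.0]]],
    [[[5.0, 10.0], [7.0, 10.0]]],
]
means, variances = channel_moments(batch)
print(means, variances)
```

The result has one mean and one variance per channel, which is why gamma, beta, pop_mean, and pop_variance in conv_layer are sized by out_channels.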

"""

To modify the training function, we did the following:

1. Added is_training, a placeholder that stores a boolean indicating whether or not the network is training.

2. Each time we call sess.run, we add the appropriate value of is_training to feed_dict to indicate whether we are training or predicting.

3. We did not need to wrap train_opt in the with tf.control_dependencies... block that we used in the network built with tf.layers.batch_normalization,

because we update the population statistics ourselves in conv_layer and fully_connected.

"""

def train(num_batches, batch_size, learning_rate):

    # Build placeholders for the input samples and labels

    inputs = tf.placeholder(tf.float32, [None, 28, 28, 1])

    labels = tf.placeholder(tf.float32, [None, 10])

    # Add a placeholder to indicate whether or not we're training the model

    is_training = tf.placeholder(tf.bool)

    # Feed the inputs into a series of convolutional layers (range(1, 20) builds 19 of them)

    layer = inputs

    for layer_i in range(1, 20):

        layer = conv_layer(layer, layer_i, is_training)

    # Flatten the output from the convolutional layers

    orig_shape = layer.get_shape().as_list()

    layer = tf.reshape(layer, shape=[-1, orig_shape[1]*orig_shape[2]*orig_shape[3]])

    # Add one fully connected layer with 100 units

    layer = fully_connected(layer, 100, is_training)

    # Create the output layer with 1 node for each class

    logits = tf.layers.dense(layer, 10)

    # Define loss and training operations

    model_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=labels))

    train_opt = tf.train.AdamOptimizer(learning_rate).minimize(model_loss)

    # Create operations to test accuracy

    correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))

    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # Train and test the network

    with tf.Session() as sess:

        sess.run(tf.global_variables_initializer())

        for batch_i in range(num_batches):

            batch_xs, batch_ys = mnist.train.next_batch(batch_size)

            # Train this batch

            sess.run(train_opt, {inputs: batch_xs, labels: batch_ys, is_training: True})

            # Periodically check the validation or training loss and accuracy

            if batch_i%100 == 0:

                loss, acc = sess.run([model_loss, accuracy], {inputs: mnist.validation.images,

                                                              labels: mnist.validation.labels,

                                                              is_training: False})

                print(

                    'Batch: {:>2}: Validation loss: {:>3.5f}, Validation accuracy: {:>3.5f}'.format(batch_i, loss, acc))

            elif batch_i%25 == 0:

                loss, acc = sess.run([model_loss, accuracy], {inputs: batch_xs, labels: batch_ys, is_training: False})

                print('Batch: {:>2}: Training loss: {:>3.5f}, Training accuracy: {:>3.5f}'.format(batch_i, loss, acc))

        # At the end, score the final accuracy for both the validation and test sets

        acc = sess.run(accuracy, {inputs: mnist.validation.images,

                                  labels: mnist.validation.labels,

                                  is_training: False})

        print('Final validation accuracy: {:>3.5f}'.format(acc))

        acc = sess.run(accuracy, {inputs: mnist.test.images,

                                  labels: mnist.test.labels,

                                  is_training: False})

        print('Final test accuracy: {:>3.5f}'.format(acc))

        # Score the first 100 test images individually, just to make sure batch normalization really worked

        correct = 0

        for i in range(100):

            correct += sess.run(accuracy, feed_dict={inputs: [mnist.test.images[i]],

                                                     labels: [mnist.test.labels[i]],

                                                     is_training: False})

        print("Accuracy on 100 samples:", correct/100)

num_batches = 800  # number of training batches (iterations)

batch_size = 64  # number of samples per batch

learning_rate = 0.002  # learning rate

tf.reset_default_graph()

with tf.Graph().as_default():

    train(num_batches, batch_size, learning_rate)
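
The rule conv_layer uses for strides and channel counts determines the size of the tensor that train flattens. The sketch below traces that rule in plain Python, assuming 28x28 inputs, the 19 layers produced by `range(1, 20)`, and the usual 'SAME' padding behaviour where the output size is ceil(input / stride). The function name `conv_schedule` is hypothetical.

```python
import math


def conv_schedule(input_size=28, num_layers=19):
    """Trace (layer_depth, stride, spatial_size, out_channels) for each conv_layer call."""
    size = input_size
    rows = []
    for layer_depth in range(1, num_layers + 1):
        stride = 2 if layer_depth % 3 == 0 else 1
        size = math.ceil(size / stride)  # 'SAME' padding: ceil(input / stride)
        rows.append((layer_depth, stride, size, layer_depth * 4))
    return rows


schedule = conv_schedule()
for row in schedule:
    print(row)

final_depth, _, final_size, final_channels = schedule[-1]
flattened = final_size * final_size * final_channels
print("flattened features:", flattened)
```

Under these assumptions every third layer halves the spatial size (28, 14, 7, 4, 2, 1), so the last convolutional layer outputs a 1x1 map with 76 channels and the fully connected layer receives 76 features.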

"""

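Once again, the model with batch normalization reaches high accuracy quickly.

In our run, however, it did not seem to learn anything during roughly the first 250 batches, and only then did the accuracy start to climb (in the log below, validation accuracy stays near 10% until after batch 300).

This just goes to show that even with batch normalization, it is important to give your network some time to learn.

PS: The model also reaches high accuracy when the 100 test images are predicted one at a time, and that is what batch normalization is really concerned with!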

"""

# Extracting MNIST_data/train-images-idx3-ubyte.gz

# Extracting MNIST_data/train-labels-idx1-ubyte.gz

# Extracting MNIST_data/t10k-images-idx3-ubyte.gz

# Extracting MNIST_data/t10k-labels-idx1-ubyte.gz

# 2018-03-18 19:35:28.568404: I D:\Build\tensorflow\tensorflow-r1.4\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX

# Batch:  0: Validation loss: 0.69113, Validation accuracy: 0.10020

# Batch: 25: Training loss: 0.57341, Training accuracy: 0.07812

# Batch: 50: Training loss: 0.45526, Training accuracy: 0.04688

# Batch: 75: Training loss: 0.37936, Training accuracy: 0.12500

# Batch: 100: Validation loss: 0.34601, Validation accuracy: 0.10700

# Batch: 125: Training loss: 0.34113, Training accuracy: 0.12500

# Batch: 150: Training loss: 0.33075, Training accuracy: 0.12500

# Batch: 175: Training loss: 0.34333, Training accuracy: 0.15625

# Batch: 200: Validation loss: 0.37085, Validation accuracy: 0.09860

# Batch: 225: Training loss: 0.40175, Training accuracy: 0.09375

# Batch: 250: Training loss: 0.48562, Training accuracy: 0.06250

# Batch: 275: Training loss: 0.67897, Training accuracy: 0.09375

# Batch: 300: Validation loss: 0.48383, Validation accuracy: 0.09880

# Batch: 325: Training loss: 0.43822, Training accuracy: 0.14062

# Batch: 350: Training loss: 0.43227, Training accuracy: 0.18750

# Batch: 375: Training loss: 0.39464, Training accuracy: 0.37500

# Batch: 400: Validation loss: 0.50557, Validation accuracy: 0.25940

# Batch: 425: Training loss: 0.32337, Training accuracy: 0.59375

# Batch: 450: Training loss: 0.14016, Training accuracy: 0.75000

# Batch: 475: Training loss: 0.11652, Training accuracy: 0.78125

# Batch: 500: Validation loss: 0.06241, Validation accuracy: 0.91280

# Batch: 525: Training loss: 0.01880, Training accuracy: 0.96875

# Batch: 550: Training loss: 0.03640, Training accuracy: 0.93750

# Batch: 575: Training loss: 0.07202, Training accuracy: 0.90625

# Batch: 600: Validation loss: 0.03984, Validation accuracy: 0.93960

# Batch: 625: Training loss: 0.00692, Training accuracy: 0.98438

# Batch: 650: Training loss: 0.01251, Training accuracy: 0.96875

# Batch: 675: Training loss: 0.01823, Training accuracy: 0.96875

# Batch: 700: Validation loss: 0.03951, Validation accuracy: 0.94080

# Batch: 725: Training loss: 0.02886, Training accuracy: 0.95312

# Batch: 750: Training loss: 0.06396, Training accuracy: 0.87500

# Batch: 775: Training loss: 0.02013, Training accuracy: 0.98438

# Final validation accuracy: 0.95820

# Final test accuracy: 0.95780

# Accuracy on 100 samples: 0.98
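
The single-image check matters because batch statistics carry no information when the batch holds one sample. The plain-Python sketch below shows this for one unit; the population values 2.0 and 4.0 are made-up numbers standing in for whatever the moving average would have accumulated, and `normalize` is a hypothetical helper.

```python
import math

EPSILON = 1e-3  # assumed to match epsilon in the post's code


def normalize(values, mean, variance):
    """Normalize each value with the given mean and variance (gamma=1, beta=0)."""
    return [(v - mean) / math.sqrt(variance + EPSILON) for v in values]


single = [3.0]  # a "batch" containing one sample for one unit

# Batch statistics of a single sample: the mean equals the sample, the variance is 0.
batch_mean = sum(single) / len(single)
batch_variance = sum((v - batch_mean) ** 2 for v in single) / len(single)
with_batch_stats = normalize(single, batch_mean, batch_variance)

# Hypothetical population estimates accumulated during training.
pop_mean, pop_variance = 2.0, 4.0
with_pop_stats = normalize(single, pop_mean, pop_variance)

print(with_batch_stats, with_pop_stats)
```

Normalizing with the sample's own statistics always yields zero regardless of the input, while the population statistics preserve how far the activation sits from its typical value. This is the reason the inference branch of tf.cond uses pop_mean and pop_variance.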
