The recommended form of batch_norm in reinforcement learning

# Names used below, as they appear in the source module: `lasagne`, `L`
# (lasagne.layers) and `BatchNormLayer` (defined earlier in the same file).
import lasagne
import lasagne.layers as L


def batch_norm(layer, **kwargs):
    """
    Apply batch normalization to an existing layer. This is a convenience
    function modifying an existing layer to include batch normalization: It
    will steal the layer's nonlinearity if there is one (effectively
    introducing the normalization right before the nonlinearity), remove
    the layer's bias if there is one (because it would be redundant), and add
    a :class:`BatchNormLayer` and :class:`NonlinearityLayer` on top.

    Parameters
    ----------
    layer : A :class:`Layer` instance
        The layer to apply the normalization to; note that it will be
        irreversibly modified as specified above
    **kwargs
        Any additional keyword arguments are passed on to the
        :class:`BatchNormLayer` constructor.

    Returns
    -------
    BatchNormLayer or NonlinearityLayer instance
        A batch normalization layer stacked on the given modified `layer`, or
        a nonlinearity layer stacked on top of both if `layer` was nonlinear.

    Examples
    --------
    Just wrap any layer into a :func:`batch_norm` call on creating it:

    >>> from lasagne.layers import InputLayer, DenseLayer, batch_norm
    >>> from lasagne.nonlinearities import tanh
    >>> l1 = InputLayer((64, 768))
    >>> l2 = batch_norm(DenseLayer(l1, num_units=500, nonlinearity=tanh))

    This introduces batch normalization right before its nonlinearity:

    >>> from lasagne.layers import get_all_layers
    >>> [l.__class__.__name__ for l in get_all_layers(l2)]
    ['InputLayer', 'DenseLayer', 'BatchNormLayer', 'NonlinearityLayer']
    """
    # Take over the layer's nonlinearity so the layer itself becomes linear.
    nonlinearity = getattr(layer, 'nonlinearity', None)
    if nonlinearity is not None:
        layer.nonlinearity = lasagne.nonlinearities.identity
    # Drop the bias: it is redundant once batch normalization follows.
    if hasattr(layer, 'b') and layer.b is not None:
        del layer.params[layer.b]
        layer.b = None
    # Normalize the linear output, then re-apply the original nonlinearity.
    layer = BatchNormLayer(layer, **kwargs)
    if nonlinearity is not None:
        layer = L.NonlinearityLayer(layer, nonlinearity)
    return layer

Source code:

https://gitee.com/devilmaycry812839668/rllab/blob/master/rllab/core/lasagne_layers.py

=================================================

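This is how batch_norm is used in rllab, a classic reinforcement learning framework. As the code shows, when batch normalization is applied to a linear layer, the layer's output is not passed through the nonlinearity first. The output is batch-normalized first, and the nonlinearity is applied afterwards. Note also that the linear layer used here has no bias parameter b: subtracting the batch mean cancels any constant offset, so the bias would be redundant. Concretely, for a layer that would otherwise compute tanh(w*x+b), the recommended form of batch_norm is: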

tanh( batch_norm( w*x ) )

rather than:

batch_norm( tanh( w*x + b ) )

The docstring states both points:

    This introduces batch normalization right before its nonlinearity:

    It will steal the layer's nonlinearity if there is one (effectively
    introducing the normalization right before the nonlinearity), remove
    the layer's bias if there is one (because it would be redundant)
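
The two orderings can be compared numerically. The following is only a minimal sketch in plain Python, not the rllab or Lasagne implementation: the `batch_norm` helper here is a hypothetical stand-in that normalizes a batch of scalars to zero mean and unit variance (no learned scale or shift), and the values of `w`, `b` and `xs` are made-up toy data. It illustrates why the bias can be dropped and how the two forms differ.

```python
import math


def batch_norm(column, eps=1e-5):
    """Hypothetical helper: normalize a batch of scalars to zero mean, unit variance."""
    mean = sum(column) / len(column)
    var = sum((v - mean) ** 2 for v in column) / len(column)
    return [(v - mean) / math.sqrt(var + eps) for v in column]


# Toy data (assumed for illustration): one unit, a batch of four scalar inputs.
w, b = 0.8, 3.0
xs = [-1.0, 0.5, 2.0, 4.0]

without_bias = batch_norm([w * x for x in xs])
with_bias = batch_norm([w * x + b for x in xs])

# The bias shifts every sample by the same amount, so subtracting the
# batch mean cancels it: both normalized batches agree.
bias_is_redundant = all(
    math.isclose(p, q, abs_tol=1e-9) for p, q in zip(without_bias, with_bias)
)

# Recommended form: tanh(batch_norm(w*x))
recommended = [math.tanh(v) for v in without_bias]

# Form advised against: batch_norm(tanh(w*x + b))
discouraged = batch_norm([math.tanh(w * x + b) for x in xs])

print("bias is redundant:", bias_is_redundant)
print("tanh(batch_norm(w*x))     =", recommended)
print("batch_norm(tanh(w*x + b)) =", discouraged)
```

In the recommended form the nonlinearity receives normalized inputs, and its outputs stay inside the range of tanh. In the other form the normalization is applied to values that tanh has already squashed, so the two results differ.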
