batch_norm在强化学习中建议使用的形式

def batch_norm(layer, **kwargs):
    """
    Apply batch normalization to an existing layer. This is a convenience
    function modifying an existing layer to include batch normalization: It
    will steal the layer's nonlinearity if there is one (effectively
    introducing the normalization right before the nonlinearity), remove
    the layer's bias if there is one (because it would be redundant), and add
    a :class:`BatchNormLayer` and :class:`NonlinearityLayer` on top.
 
    Parameters
    ----------
    layer : A :class:`Layer` instance
        The layer to apply the normalization to; note that it will be
        irreversibly modified as specified above
    **kwargs
        Any additional keyword arguments are passed on to the
        :class:`BatchNormLayer` constructor.
 
    Returns
    -------
    BatchNormLayer or NonlinearityLayer instance
        A batch normalization layer stacked on the given modified `layer`, or
        a nonlinearity layer stacked on top of both if `layer` was nonlinear.
 
    Examples
    --------
    Just wrap any layer into a :func:`batch_norm` call on creating it:
 
    >>> from lasagne.layers import InputLayer, DenseLayer, batch_norm
    >>> from lasagne.nonlinearities import tanh
    >>> l1 = InputLayer((64, 768))
    >>> l2 = batch_norm(DenseLayer(l1, num_units=500, nonlinearity=tanh))
 
    This introduces batch normalization right before its nonlinearity:
 
    >>> from lasagne.layers import get_all_layers
    >>> [l.__class__.__name__ for l in get_all_layers(l2)]
    ['InputLayer', 'DenseLayer', 'BatchNormLayer', 'NonlinearityLayer']
    """
    nonlinearity = getattr(layer, 'nonlinearity', None)
    if nonlinearity is not None:
        layer.nonlinearity = lasagne.nonlinearities.identity
    if hasattr(layer, 'b') and layer.b is not None:
        del layer.params[layer.b]
        layer.b = None
    layer = BatchNormLayer(layer, **kwargs)
    if nonlinearity is not None:
        layer = L.NonlinearityLayer(layer, nonlinearity)
    return layer

源代码地址：

https://gitee.com/devilmaycry812839668/rllab/blob/master/rllab/core/lasagne_layers.py

=================================================

这是经典reinforcement learning框架rllab中的batch_norm的使用。可以看到，在对一个线性层进行batch_norm的时候是不先对线性层的输出进行非线性变换的，而是先对其进行batch_norm，然后再进行非线性变换。而且要注意这里使用的线性层是不使用偏置参数b的，形象的来说，这里建议使用的对 tanh(w*x+b) 的 batch_norm 是这样运行的：

tanh( batch_norm( w*x ) )

而不是：

batch_norm( tanh( w*x + b ) )

    This introduces batch normalization right before its nonlinearity:

It
    will steal the layer's nonlinearity if there is one (effectively
    introducing the normalization right before the nonlinearity), remove
    the layer's bias if there is one (because it would be redundant)

=================================================

batch_norm在强化学习中建议使用的形式的更多相关文章

强化学习中的无模型基于值函数的 Q-Learning 和 Sarsa 学习
强化学习基础: 注: 在强化学习中奖励函数和状态转移函数都是未知的,之所以有已知模型的强化学习解法是指使用采样估计的方式估计出奖励函数和状态转移函数,然后将强化学习问题转换为可以使用动态规划求解的 ...
深度强化学习中稀疏奖励问题Sparse Reward
Sparse Reward 推荐资料 <深度强化学习中稀疏奖励问题研究综述>1 李宏毅深度强化学习Sparse Reward4 强化学习算法在被引入深度神经网络后,对大量样本的需求更加 ...
强化学习中REIINFORCE算法和AC算法在算法理论和实际代码设计中的区别
背景就不介绍了,REINFORCE算法和AC算法是强化学习中基于策略这类的基础算法,这两个算法的算法描述(伪代码)参见Sutton的reinforcement introduction(2nd). A ...
SpiningUP 强化学习中文文档
2020 OpenAI 全面拥抱PyTorch, 全新版强化学习教程已发布. 全网第一个中文译本新鲜出炉:http://studyai.com/course/detail/ba8e572a 个人认为 ...
强化学习中的经验回放（The Experience Replay in Reinforcement Learning）
一.Play it again: reactivation of waking experience and memory(Trends in Neurosciences 2010) SWR发放模式不 ...
(转) 深度强化学习综述：从AlphaGo背后的力量到学习资源分享（附论文）
本文转自:http://mp.weixin.qq.com/s/aAHbybdbs_GtY8OyU6h5WA 专题 | 深度强化学习综述:从AlphaGo背后的力量到学习资源分享(附论文) 原创 201 ...
深度强化学习资料（视频+PPT+PDF下载）
https://blog.csdn.net/Mbx8X9u/article/details/80780459 课程主页:http://rll.berkeley.edu/deeprlcourse/ 所有 ...
【整理】强化学习与MDP
[入门,来自wiki] 强化学习是机器学习中的一个领域,强调如何基于环境而行动,以取得最大化的预期利益.其灵感来源于心理学中的行为主义理论,即有机体如何在环境给予的奖励或惩罚的刺激下,逐步形成对刺激的 ...
强化学习读书笔记 - 05 - 蒙特卡洛方法(Monte Carlo Methods)
强化学习读书笔记 - 05 - 蒙特卡洛方法(Monte Carlo Methods) 学习笔记: Reinforcement Learning: An Introduction, Richard S ...
强化学习（五）—— 策略梯度及reinforce算法
1 概述在该系列上一篇中介绍的基于价值的深度强化学习方法有它自身的缺点,主要有以下三点: 1)基于价值的强化学习无法很好的处理连续空间的动作问题,或者时高维度的离散动作空间,因为通过价值更新策略时是 ...

随机推荐

Scrapy框架(一)--初识
scrapy初识什么是框架? 所谓的框架简单通用解释就是就是一个具有很强通用性并且集成了很多功能的项目模板,该模板可被应用在不同的项目需求中. 也可被视为是一个项目的半成品. 如何学习框架? 对于刚接 ...
MoneyPrinterPlus:AI自动短视频生成工具,赚钱从来没有这么容易过
这是一个轻松赚钱的项目. 短视频时代,谁掌握了流量谁就掌握了Money! 所以给大家分享这个经过精心打造的MoneyPrinterPlus项目. 它可以:使用AI大模型技术,一键批量生成各类短视频. ...
javascript 类class设置访问器setter时出现Maximum call stack size exceeded错误
Maximum call stack size exceeded这个错误的意思是调用栈溢出,但是自己写的代码基本不可能出现.所以可能的原因是A调用了B,然后B再调用A,形成了循环调用.或者说是A自己调 ...
Mysql中innodb的B+tree能存储多少数据？
引言 InnoDB一棵3层B+树可以存放多少行数据?这个问题的简单回答是:约2千万.为什么是这么多呢?因为这是可以算出来的,要搞清楚这个问题,我们先从InnoDB索引数据结构.数据组织方式说起. 在计 ...
高德的API来查询行政区域查询
高德的API来查询行政区域查询 1.api接口文档地址 https://lbs.amap.com/api/webservice/guide/api/district GET https://resta ...
FreeRTOS简单内核实现6 优先级
0.思考与回答 0.1.思考一如何实现 RTOS 内核支持多优先级? 因为不支持优先级,所以所有的任务都插入了一个名为 pxReadyTasksLists 的就绪链表中,相当于所有任务的优先级都是一 ...
解决Vue中使用history路由模式出现404的问题
背景 vue中默认的路由模式是hash,会出现烦人的符号#,如http://127.0.0.1/#/. 改为history模式可以解决这个问题,但是有一个坑是:强刷新.回退等操作会出现404. Vue ...
BST-Treap名次树指针实现板子 Ver2.1
为了更好的阅读体验,请点击这里这里只有板子没有原理QWQ 可实现 1.插入 x 数 2.删除 x 数(若有多个相同的数,只删除一个) 3.查询 x 数的排名(排名定义为比当前数小的数的个数 +1) ...
带有ttl的Lru在Rust中的实现及源码解析
TTL是Time To Live的缩写,通常意味着元素的生存时间是多长. 应用场景数据库:在redis中我们最常见的就是缓存我们的数据元素,但是我们又不想其保留太长的时间,因为数据时间越长污染的可能 ...
Django部署在CENTOS7上
项目结构 /data/playback_project/├── PlayBack└── script /data/playback_project/PlayBack├── app01├── db.sq ...

batch_norm在强化学习中建议使用的形式

batch_norm在强化学习中建议使用的形式的更多相关文章

随机推荐

热门专题