The state of the art of non-linearity is to use ReLU instead of sigmoid function in deep neural network, what are the advantages?

I know that training a network when ReLU is used would be faster, and it is more biological inspired, what are the other advantages? (That is, any disadvantages of using sigmoid)?

Best answer in stackexchange:

Two additional major benefits of ReLUs are sparsity and a reduced likelihood of vanishing gradient. But first recall the definition of a ReLU is h=max(0,a)h=max(0,a) where a=Wx+ba=Wx+b.

One major benefit is the reduced likelihood of the gradient to vanish. This arises when a>0a>0. In this regime the gradient has a constant value. In contrast, the gradient of sigmoids becomes increasingly small as the absolute value of x increases. The constant gradient of ReLUs results in faster learning.

The other benefit of ReLUs is sparsity. Sparsity arises when a≤0a≤0. The more such units that exist in a layer the more sparse the resulting representation. Sigmoids on the other hand are always likely to generate some non-zero value resulting in dense representations. Sparse representations seem to be more beneficial than dense representations.

Reference: http://stats.stackexchange.com/questions/126238/what-are-the-advantages-of-relu-over-sigmoid-function-in-deep-neural-network

ReLU

ReLU的全称是rectified linear unit。上面的回答基本上涵盖了它胜过sigmoid function的几个方面：

faster
more biological inspired
sparsity
less chance of vanishing gradient (梯度消失问题)

早期使用sigmoid或tanh激活函数的DL在做unsupervised learning时因为 gradient vanishing problem 的问题会无法收敛。ReLU则这没有这个问题。

What are the advantages of ReLU over sigmoid function in deep neural network?的更多相关文章

Sigmoid function in NN
X = [ones(m, ) X]; temp = X * Theta1'; t = size(temp, ); temp = [ones(t, ) temp]; h = temp * Theta2' ...
S性能 Sigmoid Function or Logistic Function
S性能 Sigmoid Function or Logistic Function octave码 x = -10:0.1:10; y = zeros(length(x), 1); for i = 1 ...
logistic function 和 sigmoid function
简单说, 只要曲线是 “S”形的函数都是sigmoid function: 满足公式<1>的形式的函数都是logistic function. 两者的相同点是: 函数曲线都是“S”形. ...
Sigmoid Function
本系列文章由 @yhl_leo 出品,转载请注明出处. 文章链接: http://blog.csdn.net/yhl_leo/article/details/51734189 Sigmodi 函数是一 ...
sigmoid function vs softmax function
DIFFERENCE BETWEEN SOFTMAX FUNCTION AND SIGMOID FUNCTION 二者主要的区别见于, softmax 用于多分类,sigmoid 则主要用于二分类: ...
sigmoid function的直观解释
Sigmoid function也叫Logistic function, 在logistic regression中扮演将回归估计值h(x)从 [-inf, inf]映射到[0,1]的角色. 公式为: ...
神经网络中的激活函数具体是什么？为什么ReLu要好过于tanh和sigmoid function?（转）
为什么引入激活函数? 如果不用激励函数(其实相当于激励函数是f(x) = x),在这种情况下你每一层输出都是上层输入的线性函数,很容易验证,无论你神经网络有多少层,输出都是输入的线性组合,与没有隐藏层 ...
ReLU 和sigmoid 函数对比
详细对比请查看:http://www.zhihu.com/question/29021768/answer/43517930 . 激活函数的作用: 是为了增加神经网络模型的非线性.否则你想想,没有激活 ...
小白学习之pytorch框架(5)-多层感知机(MLP)-(tensor、variable、计算图、ReLU()、sigmoid()、tanh())
先记录一下一开始学习torch时未曾记录(也未好好弄懂哈)导致又忘记了的tensor.variable.计算图计算图计算图直白的来说,就是数学公式(也叫模型)用图表示,这个图即计算图.借用 htt ...

随机推荐

Cocos2d-x环境搭建
资源列表官网上下载最新的cocos2d-x-3.3. 安装JDK,Eclipse,CDT插件,ADT插件. 下载Android SDK,更新.因为国内经常访问不了google,用vpn速度也有点慢. ...
【iCore3 双核心板】例程六：IWDG看门狗实验——复位ARM
实验指导书及代码包下载: http://pan.baidu.com/s/1c0frjHm iCore3 购买链接: https://item.taobao.com/item.htm?id=524229 ...
nginx https ssl 设置受信任证书[原创]
1. 安装nginx 支持ssl模块 http://nginx.org/en/docs/configure.html yum -y install openssh openssh-devel (htt ...
Composer ： php依赖管理工具
原始时代我记得在当时用php的时候还没有composer,只有个pear,但是不好用呀,还不如直接在互联网上到处复制代码了,更快更不容易出错,当时也没有github这么好的社区工具了总结如下代码 ...
distribution数据库过大问题
从事件探查器中监控到如下语句执行时间查过 1分钟: EXEC dbo .sp_MSdistribution_cleanup @min_distretention = 0, @max_distreten ...
Fatal Error: TXK Install Service oracle.apps.fnd.txk.config.ProcessStateException: OUI process failed : Exit=255 See log for details
安装EBS的时候,database pre-install checks检查报警,显示"!" 一开始忽略了该报警,继续安装.在post-install checks的时候又报了错误 ...
Java回顾之Spring基础
第一篇:Java回顾之I/O 第二篇:Java回顾之网络通信第三篇:Java回顾之多线程第四篇:Java回顾之多线程同步第五篇:Java回顾之集合第六篇:Java回顾之序列化第七篇:Java ...
【java开发系列】—— spring简单入门示例
1 JDK安装 2 Struts2简单入门示例前言作为入门级的记录帖,没有过多的技术含量,简单的搭建配置框架而已.这次讲到spring,这个应该是SSH中的重量级框架,它主要包含两个内容:控制反转 ...
Leetcode: Strong Password Checker
A password is considered strong if below conditions are all met: It has at least 6 characters and at ...
javascript中的自增与自减
一直都对自增与自减的执行顺序有点糊涂,今天查了资料,来总结一下 a++(a--),就是指当时计算a,当下一次使用这个变量的时候才执行++或者-- ++a(--a),就是指当时就计算++或者-- 例1: ...

What are the advantages of ReLU over sigmoid function in deep neural network?

The state of the art of non-linearity is to use ReLU instead of sigmoid function in deep neural network, what are the advantages?

I know that training a network when ReLU is used would be faster, and it is more biological inspired, what are the other advantages? (That is, any disadvantages of using sigmoid)?

ReLU

What are the advantages of ReLU over sigmoid function in deep neural network?的更多相关文章

随机推荐

热门专题