Introduction to Deep Learning Algorithms
See Yoshua Bengio's survey Learning Deep Architectures for AI for a recent overview of deep learning.
Depth
The computations involved in producing an output from an input can be represented by a flow graph: a graph in which each node represents an elementary computation and a value (the result of applying that computation to the values at the children of the node). The set of computations allowed at each node, together with the set of possible graph structures, defines a family of functions. Input nodes have no children; output nodes have no parents.
The flow graph for the expression sin(a^2 + b/a) could be represented by a graph with two input nodes a and b, one node for the division b/a taking a and b as input (i.e. as children), one node for the square (taking only a as input), one node for the addition (whose value would be a^2 + b/a) taking as input the nodes a^2 and b/a, and finally one output node computing the sine, with a single input coming from the addition node.
A particular property of such flow graphs is depth: the length of the longest path from an input to an output.
Traditional feedforward neural networks can be considered to have depth equal to the number of layers (i.e. the number of hidden layers plus 1, for the output layer). Support Vector Machines (SVMs) have depth 2 (one for the kernel outputs or for the feature space, and one for the linear combination producing the output).
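The flow-graph view and its notion of depth can be made concrete in a few lines of code. The sketch below is illustrative (the node class and names are not from the article): it builds the graph for sin(a^2 + b/a) described earlier and computes its depth as the longest input-to-output path.

```python
import math

# A minimal flow-graph sketch: each node holds an elementary operation
# and its children, i.e. the nodes whose values it takes as input.

class Node:
    def __init__(self, op=None, children=(), name=None):
        self.op = op                   # None marks an input node
        self.children = list(children)
        self.name = name               # used to look up input values

    def value(self, env):
        if self.op is None:            # input node: read from the environment
            return env[self.name]
        return self.op(*(c.value(env) for c in self.children))

    def depth(self):
        # Depth of a node = length of the longest path from an input to it.
        if not self.children:
            return 0
        return 1 + max(c.depth() for c in self.children)

# The flow graph for sin(a^2 + b/a) described in the text.
a = Node(name="a")
b = Node(name="b")
square = Node(lambda x: x * x, [a])
div = Node(lambda x, y: x / y, [b, a])
add = Node(lambda x, y: x + y, [square, div])
out = Node(math.sin, [add])

print(out.value({"a": 2.0, "b": 6.0}))  # sin(2^2 + 6/2) = sin(7.0)
print(out.depth())                      # longest input-to-output path: 3
```

Note that the input node a is shared by the square and division nodes: a flow graph is a graph, not a tree, which is what lets deep architectures reuse intermediate results.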
Motivations for Deep Architectures
The main motivations for studying learning algorithms for deep architectures are the following:
Insufficient depth can hurt
Depth 2 is enough in many cases (e.g. logical gates, formal [threshold] neurons, sigmoid-neurons, Radial Basis Function [RBF] units like in SVMs) to represent any function with a given target accuracy. But this may come with a price: the required number of nodes in the graph (i.e. computations, and also number of parameters, when we try to learn the function) may grow very large. Theoretical results show that there exist function families for which the required number of nodes may grow exponentially with the input size. This has been shown for logical gates, formal neurons, and RBF units. In the latter case Hastad has shown families of functions which can be efficiently (compactly) represented with O(n) nodes (for n inputs) when depth is d, but for which an exponential number (O(2^n)) of nodes is needed if depth is restricted to d-1.
One can see a deep architecture as a kind of factorization. Most randomly chosen functions can’t be represented efficiently, whether with a deep or a shallow architecture. But many that can be represented efficiently with a deep architecture cannot be represented efficiently with a shallow one (see the polynomials example in the Bengio survey paper). The existence of a compact and deep representation indicates that some kind of structure exists in the underlying function to be represented. If there was no structure whatsoever, it would not be possible to generalize well.
The brain has a deep architecture
For example, the visual cortex is well-studied and shows a sequence of areas each of which contains a representation of the input, and signals flow from one to the next (there are also skip connections and at some level parallel paths, so the picture is more complex). Each level of this feature hierarchy represents the input at a different level of abstraction, with more abstract features further up in the hierarchy, defined in terms of the lower-level ones.
Note that representations in the brain are in between dense distributed and purely local: they are sparse: about 1% of neurons are active simultaneously in the brain. Given the huge number of neurons, this is still a very efficient (exponentially efficient) representation.
Cognitive processes seem deep
- Humans organize their ideas and concepts hierarchically.
- Humans first learn simpler concepts and then compose them to represent more abstract ones.
- Engineers break up solutions into multiple levels of abstraction and processing.
It would be nice to learn / discover these concepts (knowledge engineering failed because of poor introspection?). Introspection of linguistically expressible concepts also suggests a sparse representation: only a small fraction of all possible words/concepts are applicable to a particular input (say a visual scene).
Breakthrough in Learning Deep Architectures
Before 2006, attempts at training deep architectures failed: training a deep supervised feedforward neural network tended to yield worse results (in both training and test error) than shallow ones (with 1 or 2 hidden layers).
Three papers changed that in 2006, spearheaded by Hinton’s revolutionary work on Deep Belief Networks (DBNs):
- Hinton, G. E., Osindero, S. and Teh, Y., A fast learning algorithm for deep belief nets, Neural Computation 18:1527-1554, 2006.
- Yoshua Bengio, Pascal Lamblin, Dan Popovici and Hugo Larochelle, Greedy Layer-Wise Training of Deep Networks, in J. Platt et al. (Eds), Advances in Neural Information Processing Systems 19 (NIPS 2006), pp. 153-160, MIT Press, 2007.
- Marc'Aurelio Ranzato, Christopher Poultney, Sumit Chopra and Yann LeCun, Efficient Learning of Sparse Representations with an Energy-Based Model, in J. Platt et al. (Eds), Advances in Neural Information Processing Systems 19 (NIPS 2006), MIT Press, 2007.
The following key principles are found in all three papers:
- Unsupervised learning of representations is used to (pre-)train each layer.
- Layers are pretrained one at a time, each on top of the previously trained ones; the representation learned at each level becomes the input to the next layer.
- Supervised training is then used to fine-tune all the layers (together with one or more additional layers dedicated to producing predictions).
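The three principles above can be sketched in NumPy. This is a toy illustration, not any of the papers' methods: the unsupervised learner here is a simple auto-encoder (the DBN paper uses RBMs), the data, layer sizes, and learning rates are made up, and for brevity the fine-tuning step only trains the added output layer rather than all layers as the papers do.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_autoencoder(X, n_hidden, lr=0.1, epochs=200):
    """Train a one-hidden-layer auto-encoder on X by full-batch gradient
    descent on squared reconstruction error; return the encoder (W, b)."""
    n_in = X.shape[1]
    W = rng.normal(0, 0.1, (n_in, n_hidden))   # encoder weights
    b = np.zeros(n_hidden)
    V = rng.normal(0, 0.1, (n_hidden, n_in))   # decoder weights
    c = np.zeros(n_in)
    for _ in range(epochs):
        H = sigmoid(X @ W + b)                 # bottleneck code
        err = (H @ V + c) - X                  # reconstruction error
        dH = (err @ V.T) * H * (1 - H)         # backprop through sigmoid
        V -= lr * (H.T @ err) / len(X); c -= lr * err.mean(axis=0)
        W -= lr * (X.T @ dH) / len(X);  b -= lr * dH.mean(axis=0)
    return W, b

# Made-up toy data: label depends on the first four features.
X = rng.normal(size=(200, 8))
y = (X[:, :4].sum(axis=1) > 0).astype(float)

# Principles 1 and 2: greedy, layer-wise unsupervised pretraining --
# each auto-encoder is trained on the representation of the layer below.
layers, H = [], X
for n_hidden in (6, 4):
    W, b = train_autoencoder(H, n_hidden)
    layers.append((W, b))
    H = sigmoid(H @ W + b)

# Principle 3 (simplified): supervised fine-tuning of a logistic output
# layer on top of the pretrained stack.
w, b_out = np.zeros(H.shape[1]), 0.0
for _ in range(500):
    p = sigmoid(H @ w + b_out)
    w -= 0.5 * (H.T @ (p - y)) / len(X)
    b_out -= 0.5 * (p - y).mean()

acc = ((sigmoid(H @ w + b_out) > 0.5) == y).mean()
print(f"training accuracy: {acc:.2f}")
```

The key structural point is the loop over layers: no labels are used until the stack is built, and each layer only ever sees the codes produced by the layer beneath it.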
The DBNs use RBMs (Restricted Boltzmann Machines) for unsupervised learning of a representation at each layer. The Bengio et al. paper explores and compares RBMs and auto-encoders (a neural network that predicts its own input, through a bottleneck internal layer of representation). The Ranzato et al. paper uses a sparse auto-encoder (which is similar to sparse coding) in the context of a convolutional architecture. Auto-encoders and convolutional architectures will be covered later in the course.
Since 2006, a plethora of other papers on the subject of deep learning has been published, some of them exploiting other principles to guide training of intermediate representations. See Learning Deep Architectures for AI for a survey.