Convolutional Neural Network


Overview

A Convolutional Neural Network (CNN) consists of one or more convolutional layers (often with a subsampling step) followed by one or more fully connected layers as in a standard multilayer neural network. The architecture of a CNN is designed to take advantage of the 2D structure of an input image (or other 2D input such as a speech signal). This is achieved with local connections and tied weights followed by some form of pooling, which results in translation-invariant features. Another benefit of CNNs is that they are easier to train and have many fewer parameters than fully connected networks with the same number of hidden units. In this article we will discuss the architecture of a CNN and the backpropagation algorithm used to compute the gradient with respect to the parameters of the model in order to use gradient-based optimization. See the respective tutorials on convolution and pooling for more details on those specific operations.

Architecture

A CNN consists of a number of convolutional and subsampling layers optionally followed by fully connected layers. The input to a convolutional layer is an $m \times m \times r$ image where $m$ is the height and width of the image and $r$ is the number of channels, e.g. an RGB image has $r=3$. The convolutional layer will have $k$ filters (or kernels) of size $n \times n \times q$ where $n$ is smaller than the dimension of the image and $q$ can either be the same as the number of channels $r$ or smaller, and may vary for each kernel. The size of the filters gives rise to the locally connected structure; the filters are each convolved with the image to produce $k$ feature maps of size $(m-n+1) \times (m-n+1)$. Each map is then subsampled, typically with mean or max pooling over $p \times p$ contiguous regions, where $p$ ranges from 2 for small images (e.g. MNIST) up to usually no more than 5 for larger inputs. Either before or after the subsampling layer an additive bias and sigmoidal nonlinearity is applied to each feature map. The figure below illustrates a full layer in a CNN consisting of convolutional and subsampling sublayers. Units of the same color have tied weights.

Fig 1: First layer of a convolutional neural network with pooling. Units of the same color have tied weights and units of different color represent different filter maps.

After the convolutional layers there may be any number of fully connected layers. The densely connected layers are identical to the layers in a standard multilayer neural network.
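
To make the size bookkeeping above concrete, here is a minimal NumPy sketch of one convolution-plus-mean-pooling stage. The sizes ($m=28$, $n=5$, $p=2$, a single channel and a single filter) and the random image and filter are purely illustrative, not part of the tutorial.

```python
import numpy as np
from scipy.signal import convolve2d  # 2-D convolution with a 'valid' mode

# Illustrative sizes only: image m x m, filter n x n, pooling regions p x p.
m, n, p = 28, 5, 2
image = np.random.randn(m, m)   # single-channel input (r = 1)
filt = np.random.randn(n, n)    # one of the k filters

# A 'valid' convolution yields an (m - n + 1) x (m - n + 1) feature map.
feature_map = convolve2d(image, filt, mode='valid')
assert feature_map.shape == (m - n + 1, m - n + 1)   # (24, 24)

# Mean pooling over non-overlapping p x p regions.
h = feature_map.shape[0] // p
pooled = feature_map[:h * p, :h * p].reshape(h, p, h, p).mean(axis=(1, 3))
assert pooled.shape == (h, h)                        # (12, 12)
```

A full layer would repeat this for all $k$ filters (summing over the $r$ input channels) and apply the additive bias and sigmoid nonlinearity elementwise either before or after the pooling step, as described above.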

Backpropagation

Let $\delta^{(l+1)}$ be the error term for the $(l+1)$-st layer in the network with a cost function $J(W,b;x,y)$ where $(W,b)$ are the parameters and $(x,y)$ are the training data and label pairs. If the $l$-th layer is densely connected to the $(l+1)$-st layer, then the error for the $l$-th layer is computed as

$$\delta^{(l)} = \left( (W^{(l)})^T \delta^{(l+1)} \right) \bullet f'(z^{(l)})$$

and the gradients are

$$\nabla_{W^{(l)}} J(W,b;x,y) = \delta^{(l+1)} (a^{(l)})^T, \qquad \nabla_{b^{(l)}} J(W,b;x,y) = \delta^{(l+1)}.$$
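
As a quick check of these two formulas, the following sketch computes $\delta^{(l)}$ and the parameter gradients for one densely connected layer with a sigmoid activation; all sizes and values are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: layer l has 100 units, layer l+1 has 50 units.
a_l = np.random.randn(100, 1)        # a^(l), activation of layer l
z_l = np.random.randn(100, 1)        # z^(l), pre-activation of layer l
W_l = np.random.randn(50, 100)       # W^(l)
delta_next = np.random.randn(50, 1)  # delta^(l+1), error from the layer above

# delta^(l) = ((W^(l))^T delta^(l+1)) .* f'(z^(l)), with f the sigmoid
f_prime = sigmoid(z_l) * (1.0 - sigmoid(z_l))
delta_l = (W_l.T @ delta_next) * f_prime

# Gradients for this layer's parameters.
grad_W = delta_next @ a_l.T          # nabla_W J = delta^(l+1) (a^(l))^T
grad_b = delta_next                  # nabla_b J = delta^(l+1)
```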

If the $l$-th layer is a convolutional and subsampling layer then the error is propagated through as

$$\delta_k^{(l)} = \text{upsample}\left( (W_k^{(l)})^T \delta_k^{(l+1)} \right) \bullet f'(z_k^{(l)})$$

Where $k$ indexes the filter number and $f'(z_k^{(l)})$ is the derivative of the activation function. The upsample operation has to propagate the error through the pooling layer by calculating the error w.r.t. each unit incoming to the pooling layer. For example, if we have mean pooling then upsample simply distributes the error for a single pooling unit uniformly among the units which feed into it in the previous layer. In max pooling the unit which was chosen as the max receives all the error, since very small changes in the input would perturb the result only through that unit.
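
Below is a minimal sketch of the two upsample variants just described. The helper names `upsample_mean` and `upsample_max` are my own, and the $1/p^2$ factor assumes the forward pass averaged over each $p \times p$ region.

```python
import numpy as np

def upsample_mean(delta_pooled, p):
    # Spread each pooled unit's error uniformly over its p x p input region.
    return np.kron(delta_pooled, np.ones((p, p))) / (p * p)

def upsample_max(delta_pooled, pre_pool, p):
    # Route each pooled unit's error entirely to the unit that was the max
    # of its p x p region in the pre-pooling feature map.
    delta = np.zeros_like(pre_pool)
    h, w = delta_pooled.shape
    for i in range(h):
        for j in range(w):
            block = pre_pool[i*p:(i+1)*p, j*p:(j+1)*p]
            r, c = np.unravel_index(np.argmax(block), block.shape)
            delta[i*p + r, j*p + c] = delta_pooled[i, j]
    return delta
```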

Finally, to calculate the gradient w.r.t. the filter maps, we rely on the border-handling convolution operation again and flip the error matrix $\delta_k^{(l)}$ the same way we flip the filters in the convolutional layer.

$$\nabla_{W_k^{(l)}} J(W,b;x,y) = \sum_{i=1}^{m} (a_i^{(l)}) \ast \text{rot90}(\delta_k^{(l+1)}, 2), \qquad \nabla_{b_k^{(l)}} J(W,b;x,y) = \sum_{a,b} (\delta_k^{(l+1)})_{a,b}.$$

Where $a^{(l)}$ is the input to the $l$-th layer, and $a^{(1)}$ is the input image. The operation $(a_i^{(l)}) \ast \delta_k^{(l+1)}$ is the "valid" convolution between the $i$-th input in the $l$-th layer and the error w.r.t. the $k$-th filter.
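
As a minimal sketch of these last two expressions for a single input and filter (sizes illustrative), note that `scipy.signal.convolve2d` performs a true convolution, so passing `np.rot90(delta_k, 2)` as its second argument implements the formula literally.

```python
import numpy as np
from scipy.signal import convolve2d

# Illustrative sizes: 28 x 28 input, 5 x 5 filter, hence a 24 x 24 error map.
a_l = np.random.randn(28, 28)      # a^(l), one input to the l-th layer
delta_k = np.random.randn(24, 24)  # delta_k^(l+1), error for the k-th feature map

# 'valid' convolution of the input with the flipped error gives the filter gradient.
grad_Wk = convolve2d(a_l, np.rot90(delta_k, 2), mode='valid')
assert grad_Wk.shape == (5, 5)     # same shape as the filter W_k

# The bias gradient is simply the sum of the error over the feature map.
grad_bk = delta_k.sum()
```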

from: http://ufldl.stanford.edu/tutorial/supervised/ConvolutionalNeuralNetwork/
