<转>卷积神经网络是如何学习到平移不变的特征

After some thought, I do not believe that pooling operations are responsible for the translation invariant property in CNNs. I believe that invariance (at least to translation) is due to the convolution filters (not specifically the pooling) and due to the fully-connected layer.

For instance, let's use the Fig. 1 as reference:

The blue volume represents the input image, while the green and yellow volumes represent layer 1 and layer 2 output activation volumes (see CS231n Convolutional Neural Networks for Visual Recognition if you are not familiar with these volumes). At the end, we have a fully-connected layer that is connected to all activation points of the yellow volume.

These volumes are build using a convolution plus a pooling operation. The pooling operation reduces the height and width of these volumes, while the increasing number of filters in each layer increases the volume depth.

For the sake of the argument, let's suppose that we have very "ludic" filters, as show in Fig. 2:

the first layer filters (which will generate the green volume) detect eyes, noses and other basic shapes (in real CNNs, first layer filters will match lines and very basic textures);
The second layer filters (which will generate the yellow volume) detect faces, legs and other objects that are aggregations of the first layer filters. Again, this is only an example: real life convolution filters may detect objects that have no meaning to humans.

Now suppose that there is a face at one of the corners of the image (represented by two red and a magenta point). The two eyes are detected by the first filter, and therefore will represent two activations at the first slice of the green volume. The same happens for the nose, except that it is detected for the second filter and it appears at the second slice. Next, the face filter will find that there are two eyes and a nose next to each other, and it generates an activation at the yellow volume (within the same region of the face at the input image). Finally, the fully-connected layer detects that there is a face (and maybe a leg and an arm detected by other filters) and it outputs that it has detected an human body.

Now suppose that the face has moved to another corner of the image, as shown in Fig. 3:

The same number of activations occurs in this example, however they occur in a different region of the green and yellow volumes. Therefore, any activation point at the first slice of the yellow volume means that a face was detected, INDEPENDENTLY of the face location. Then the fully-connected layer is responsible to "translate" a face and two arms to an human body. In both examples, an activation was received at one of the fully-connected neurons. However, in each example, the activation path inside the FC layer was different, meaning that a correct learning at the FC layer is essential to ensure the invariance property.

It must be noticed that the polling operation only "compresses" the activation volumes, if there was no polling in this example, an activation at the first slice of the yellow volume would still mean a face.

In conclusion, what makes a CNN invariant to object translation is the architecture of the neural network: the convolution filters and the fully-connected layer. Additionally, I believe that if a CNN is trained showing faces only at one corner, during the learning process, the fully-connected layer may become insensitive to faces in other corners.

source:

https://www.quora.com/How-is-a-convolutional-neural-network-able-to-learn-invariant-features/answer/Jean-Da-Rolt

<转>卷积神经网络是如何学习到平移不变的特征的更多相关文章

深度学习之卷积神经网络（CNN）
卷积神经网络(CNN)因为在图像识别任务中大放异彩,而广为人知,近几年卷积神经网络在文本处理中也有了比较好的应用.我用TextCnn来做文本分类的任务,相比TextRnn,训练速度要快非常多,准确性也 ...
TensorFlow学习笔记（四）图像识别与卷积神经网络
一.卷积神经网络简介卷积神经网络(Convolutional Neural Network,CNN)是一种前馈神经网络,它的人工神经元可以响应一部分覆盖范围内的周围单元,对于大型图像处理有出色表现. ...
经典卷积神经网络的学习（一）—— AlexNet
AlexNet 为卷积神经网络和深度学习正名,以绝对优势拿下 ILSVRC 2012 年冠军,引起了学术界的极大关注,掀起了深度学习研究的热潮. AlexNet 在 ILSVRC 数据集上达到 16. ...
【RS】Automatic recommendation technology for learning resources with convolutional neural network - 基于卷积神经网络的学习资源自动推荐技术
[论文标题]Automatic recommendation technology for learning resources with convolutional neural network ( ...
Python CNN卷积神经网络代码实现
# -*- coding: utf-8 -*- """ Created on Wed Nov 21 17:32:28 2018 @author: zhen "& ...
Python之TensorFlow的卷积神经网络-5
一.卷积神经网络(Convolutional Neural Networks, CNN)是一类包含卷积计算且具有深度结构的前馈神经网络(Feedforward Neural Networks),是深度 ...
TensorFlow实战之实现AlexNet经典卷积神经网络
本文根据最近学习TensorFlow书籍网络文章的情况,特将一些学习心得做了总结,详情如下.如有不当之处,请各位大拿多多指点,在此谢过. 一.AlexNet模型及其基本原理阐述 1.关于AlexNet ...
卷积神经网络之AlexNet
由于受到计算机性能的影响,虽然LeNet在图像分类中取得了较好的成绩,但是并没有引起很多的关注. 知道2012年,Alex等人提出的AlexNet网络在ImageNet大赛上以远超第二名的成绩夺冠,卷 ...
卷积神经网络(CNN)基础介绍
本文是对卷积神经网络的基础进行介绍,主要内容包含卷积神经网络概念.卷积神经网络结构.卷积神经网络求解.卷积神经网络LeNet-5结构分析.卷积神经网络注意事项. 一.卷积神经网络概念上世纪60年代. ...

随机推荐

hadoop 集群的配置
在经过几天折腾,终于将hadoop环境搭建成功,整个过程中遇到各种坑,反复了很多遍,光虚拟机就重新安装了4.5次,接下来就把搭建的过程详细叙述一下 0.相关工具: 1,系统环境说明: 我这边给出我的集 ...
cessss
[文字] 关注1啊啊啊啊啊点击关注微信点击关注微信2 点击关注微信3 关注2啊啊啊啊啊啊啊啊啊啊啊关注3啊啊啊啊啊啊啊啊关注4啊啊啊啊啊啊啊啊关注5啊啊啊啊啊啊啊啊关注6啊啊啊啊啊啊啊啊啊 ...
基于 Angularjs&Node.js 云编辑器架构设计及开发实践
基于 Angularjs&Node.js 云编辑器架构设计及开发实践一.产品背景二.总体架构 1. 前端架构 a.前端层次 b.核心基础模块设计 c.业务模块设计 2. Node.js端设 ...
JS特效之Tab标签切换
在我们平时浏览网站的时候,经常看到的特效有图片轮播.导航子菜单的隐藏.tab标签的切换等等.这段时间学习了JS后,开始要写出一些简单的特效.今天我也分享一个简单tab标签切换的例子.先附上代码: HT ...
JavaScript基本语法（一）
前段时间学习了HTML和CSS,也实战了一些结构较简单的项目.在还没运用到JS的知识时,做出来的效果总觉得少了些什么.虽然总体布局与一些基本的特效,也能用HTML+CSS就能完成.但如今开始进入Jav ...
心无旁骛，向死而生：WGDC2016给创企上的一堂课
"这是最好的时代,也是最坏的时代:这是希望的春天,也是失望的冬天." ------狄更斯 WGDC2016落幕已经一月有余,我仍然记得会议结束后,穿过高大宽敞的国家会议中心大厅,走 ...
js---OOP浅谈
对象化编程-------简单地去理解就是把javascript能涉及到的范围分成各种对象,对象下面再次划分对象.编程出发点多是对象,或者说基于对象.所说的对象既包含变量,网页,窗口等等对象的含义 ...
sqlite 管理软件
★SQLite的官方网站 http://www.sqlite.org/ ★SQLite的官方网址提供数据库查看软件:http://www.sqlite.org/cvstrac/wiki?p=Manag ...
Android 手机卫士--md5加密过程
在之前的文章中,我们将用户的密码使用SharedPreferences存储,我们打开/data/data/com.wuyudong.mobilesafe/shared_prefs文件夹下的 confi ...
GamePinTu
package com.example.administrator.pintu; import android.content.Context; import android.graphics.Bit ...

<转>卷积神经网络是如何学习到平移不变的特征

<转>卷积神经网络是如何学习到平移不变的特征的更多相关文章

随机推荐

热门专题