前言:研究卷积神经网络,把阅读到的一些文献经典的部分翻译一下,写成博客,代码后续给出,不足之处还请大家指出。

本文来自:tony-tan.com

Github:github.com/Tony-Tan

大型卷积神经网络在图片分类上很成功,然而我们不知道他为什么能表现的如此不错,或者如何提高。

#Abstract:

In this paper we address both issues. We introduce a novel visualization technique that gives insight into the function of inter-mediate feature layers and the operation of the classifier

我们研究一个优秀的可视化技术,能够给出函数内部特征层以及分类层的信息

Used in a diagnostic role, these visualizations allow us to find model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark.

可视化使我们找到比Kri在ImageNet分类更好的网络架构。

We also perform an ablation study to discover the performance contribution from different model layers.

我们通过切块研究发现不同层对分类的作用。

We show our ImageNet model generalizes well to other datasets: when the softmax classifier is retrained, it convincingly beats the current state-of-the-art results on Caltech-101 and Caltech-256 datasets.

我们展示了我们的 ImageNet 模型在其他数据集上获得优秀的表现:当我们重新训练SoftMax分类器。其结果信服的打败了当前SOTA结果,在Caltech-101和Caltech-256数据集上

评论:作者要解决的是可视化深度学习模型,来给出内部的结构,工作原理,以及内在结构的相关性等。并且在这个基础上反向选择优化不同的深度架构(模型)来得到更好的模型,并给出了监督学习的Pre-training方法,在不同测试数据集上表现不俗。

#Introduction

卷积神经网络很牛,在各种分类比赛上获得state-of-the-art的结果。

卷积神经网络在各大测试集上获得好结果的原因:

Several factors are responsible for this renewed interest in convnet

models:

(i) the availability of much larger training sets, with

大量的训练数据

(ii) powerful GPU implementations, making the training of very large models practical

GPU的高效计算

(iii) better model regularization strategies, such as Dropout (Hinton et al., 2012).

更优秀的网络结构(例如Dropout)

Without clear understanding of how and why they work, the development of better models is reduced to trial-and-error.

如果不知道内部原因,我们的新模型只能停留在实验,观察的基础上

In this paper we introduce a visualization technique that reveals the input stimuli that excite individual feature maps at any layer in the model. It also allows us to observe the evolution of features during training and to diagnose potential problems with the model.

本文提出了一种可视化技术,其能够揭示输入是如何激活那些独立的特征映射在模型中的任一层。这项技术也允许我们来观察特征在训练过程中的进化过程来判断模型潜在的问题。

The visualization technique we propose uses a multi-layered Deconvolutional Network (deconvnet), as proposed by (Zeiler et al., 2011), to project the feature activations back to the input pixel space.

我们提出了使用多层逆卷积网络,(Zeiler et al 2014年)提出的,将特征反向映射会到输入层观察结果

We also perform a sensitivity analysis of the classifier output by occluding portions of the input image, revealing which parts of the scene are important for classification

通过遮挡输入图片的部分,对分类器进行分析,来揭示哪些部分对分类结果产生相对重的影响

Using these tools, we start with the architecture of (Krizhevsky et al., 2012) and explore different architectures, discovering ones that outperform their results on ImageNet.

使用这些工具,我们开始使用此架构探索不同的架构,认识在ImageNet上表现出色的结构

We then explore the generalization ability of the model to other datasets, just retraining the softmax classifier on top.

我们随后探索架构对于其他数据集的范化能力,在只重新训练softmax分类器的基础上。

As such, this is a form of supervised pre-training, which contrasts with the unsupervised pre-training methods popularized by (Hinton et al., 2006) and others (Bengio et al., 2007; Vincent et al., 2008)

监督学习的Pre-training来对比无监督的Pre-training方法(Hinton et al., 2006 Bengio et al., 2007; Vincent et al., 2008)

评论:主要就是说,以前都是不知道为啥深度学习会工作,不知道如何优化,只是考实验观察,现在我们能牛x的知道为啥能工作了,虽然没有数学证明,但我们知道怎么调了,知道工作原理,知道怎么Pre-training。。。

#Related Work

Our approach, by contrast, provides a non-parametric view of invariance, showing which patterns from the training set activate the feature map.

我们的工作提出了一种无参数的不变性观点,来展示训练数据的哪些部分激活了特征映射

评论:没有评论

#Approach

We use standard fully supervised convnet models throughout the paper, as defined by (LeCun et al., 1989) and (Krizhevsky et al., 2012).

我们在整篇文章使用标准完全监督卷积网络模型,在 (LeCun et al., 1989) and (Krizhevsky et al., 2012)定义的。

(i) convolution of the previous layer output (or, in the case of the 1st layer, the input image) with a set of learned filters;

(ii) pass- ing the responses through a rectified linear function (relu(x) = max(x, 0));

(iii) [optionally] max pooling over local neighborhoodsand

(iv) [optionally] a local contrast operation that normalizes the responses across feature maps.

The top few layers of the network are conventional fully-connected

networks and the final layer is a softmax classifie

##Visualization with a Deconvnet

We present a novel way to map these activities back to the input pixel space, showing what input pattern originally caused a given activation in the feature maps.

我们提出了一个高级的方法来映射激活反向到输入像素空间,来展示在特征空间哪一部分输入引起了这个给定的激活

In (Zeiler et al., 2011), deconvnets were proposed as a way of performing unsupervised learning. Here, they are not used in any learning capacity, just as a probe of an already trained convent.

在(Zeiler et al., 2011),Deconvnets 被提出作为一种表现非监督学习的方法。这里他们不再用于任何其学习能力,只是用于研究已经训练好的Convnet

To examine a given convnet activation, we set all other activations in the layer to zero and pass the feature maps as input to the attached deconvnet layer.

为了测试一个给定的神经元激活,我们设置所有其他的同层神经元激活值为零,传导特征映射作为输入来激活deconvnet层

Then we successively (i) unpool, (ii) rectify and (iii) filter to reconstruct the activity in the layer beneath that gave rise to the chosen activation. This is then repeated until input pixel space is reached.

在激活选定特征的神经元后Unpool,rectify,filter来重建本层的激活。

###Unpooling:

In the convnet, the max pooling operation is non-invertible, however we can obtain an approximate inverse by recording the locations of the maxima within each pooling region in a set of switch variables. In the deconvnet, the unpooling operation uses these switches to place the reconstructions from the layer above into appropriate locations, preserving the structure of the stimulus. See Fig. 1(bottom) for an illustration of the procedure

最大池化不可逆,我们通过记录位置来进行近似,记录被称为一组switch值,在deconvnet中,逆池化使用这些switch值来定位重建上一层,保留激活分布。在Fig1中说明

###Rectification:

The convnet uses relu non-linearities, which rectify the feature maps

thus ensuring the feature maps are always positive. To obtain valid feature reconstructions at each layer (which also should be positive),we pass the reconstructed signal through a relu non-linearity.

Convnet使用ReLu非线性函数,保证激活值非负;为保证每层特征可重建,我们让所有重建信号经过ReLu层。

###Filtering:

The convnet uses learned filters to con-volve the feature maps from the previous layer. To invert this, the deconvnet uses transposed versions of the same filters, but applied to the rectified maps, not the output of the layer beneath.

Convnet使用学习到的Filters从前一层来获取特征映射。相反,deconvnet使用同一filter的转置,但是操作的对象是整流结果(Rectification),而不是之前的层。

###总结:

Since the model is trained discriminatively, they implicitly show

which parts of the input image are discriminative

由于模型是训练成有区别的,因此他们理所应当的展示输入图片的那些部分是有区别的。

Note that these projections are not samples from the model, since

there is no generative process involved

注意这些映射不是从模型中采样,因为没有范化处理涉及。

评论:此段描述了具体如何反向将特征映射到像素空间。

#Training Details

The architecture, shown in Fig. 3, is similar to that used by (Krizhevsky et al., 2012) for ImageNet classification

结构在Fig 3中(本文第一张图)。。。

训练方法:

The model was trained on the ImageNet 2012 train- ing set (1.3 million images, spread over 1000 different classes). Each RGB image was preprocessed by resizing the smallest dimension to 256, cropping the center 256x256 region, subtracting the per-pixel mean (across all images) and then using 10 different sub-crops of size 224x224 (corners + center with(out) horizontal flips). Stochastic gradient descent with a mini-batch size of 128 was used to update the parameters, starting with a learning rate of 10−2, in conjunction with a momentum term of 0.9. We anneal the learning rate throughout training manually when the validation error plateaus. Dropout (Hinton et al., 2012) is used in the fully connected layers (6 and 7) with a rate of 0.5.

此处翻译略过,描述了卷积神经网络的训练方法。

Visualization of the first layer filters during training reveals that a few of them dominate, as shown in Fig. 6(a). To combat this, we renormalize each filter in the convolutional layers whose RMS value exceeds a fixed radius of 10−1 to this fixed radius

第一层 Filter的可视化在训练过程揭示,其中一部分起支配作用,如Fig 6 a 所示,为了对抗这种情况,我们重新归一化RMS值超过fixed-radius的0.1倍的每一个在卷基层的Filter

评论:详细的训练过程

#Convnet Visualization

关于特征:

Feature Visualization: Fig. 2 shows feature visualizations from our model once training is complete. However, instead of showing the single strongest activation for a given feature map, we show the top 9 activations.

特征可视化:图2显示的特征可视化是当模型训练完成时就确定的,然而不显示对于给定特征映射的单一强刺激而显示top9

Alongside these visualizations we show the corresponding image patches. These have greater variation than visualizations as the latter solely focus on the discriminant structure within each patch.

沿着这个可视化我们可以观察到相当的当前图像区域。这有相当大的可视化成都相对于单独把注意力放到每一个path。

The projections from each layer show the hierarchical nature of the features in the network.

不同层的映射表现出网络中不同自然层级的特征

Feature Evolution during Training: Fig. 4 visualizes the progression during training of the strongest activation (across all training examples) within a given feature map projected back to pixel space.

特征在训练过程中的进化:图4,训练较强反应的神经元(在所有训练样本中)在给定特征映射逆向投影到像素空间的过程中的可视化。

Sudden jumps in appearance result from a change in the image from which the strongest activation originates.

表面上突然的跳跃来自图像最强的激活区域(此区域能够激发网络中的部分神经元产生大的特征变化)

The lower layers of the model can be seen to converge within a few epochs. However, the upper layers only develop after a considerable number of epochs (40-50), demonstrating the need to let the models train until fully converged.

模型的较低层可以在一定周期内观察到。然而,高层的网络只在相当大的周期后才能被建立起来,表明模型需要继续训练到完全收敛

Feature Invariance: Fig. 5 shows 5 sample images being translated, rotated and scaled by varying degrees while looking at the changes in the feature vectors from the top and bottom layers of the model, relative to the untransformed feature.

特征独立性:图5,显示五个样本图经过变换,旋转,缩放多种随机模型,然后从底层到高层观察特征向量与未变换的特征向量进行对比

小的变换对于模型的第一层有显著影响,但是对于顶层特征影响不大,对于变换和缩放大致呈线性

The network output is stable to translations and scalings.

网络输出对于变换和尺度缩放稳定。

In general, the output is not invariant to rotation, except for object with rotational symmetry (e.g. entertainment center).

然而,输出对于旋转变换不稳定,除非是对齐式的旋转。

评论:训练过程中的特征是怎么来的。

##Architecture Selection

While visualization of a trained model gives insight into its operation, it can also assist with selecting good architectures in the first place.

当可视化一个训练好的模型给出了其内部的操作,也能帮助我们选取更好的架构。

The first layer filters are a mix of extremely high and low frequency information, with little coverage of the mid frequencies.

第一层filter是混合了极其高频和极其低频的信息,只有少量的中频信息。

Additionally, the 2nd layer visualization shows aliasing artifacts caused by the large stride 4 used in the 1st layer convolutions.

第二层可视化展示了混淆的手工结果由在第一层中较大的步长(4)引起的

To remedy these problems, we

解决办法:

(i) reduced the 1st layer filter size from 11x11 to 7x7

(ii) made the stride of the convolution 2, rather than 4.

This new architecture retains much more information in the 1st and 2nd layer fea- tures, as shown in Fig. 6© & (e). More importantly, it also improves the classification performance as shown in Section 5.1.

评论:本段讲如何选取架构,说明步长在其中的影响

##Occlusion Sensitivity

With image classification approaches, a natural question is if the model is truly identifying the location of the object in the image, or just using the surrounding context.

对于图像分类的应用,一个自然的问题是模型是否只利用图片中的物体,还是使用周围的上下文信息

Fig. 7 attempts to answer this question by systematically occluding different portions of the input image with a grey square, and monitoring the output of the classifier.

图7试图回答这个问题,通过系统的遮挡输入图片不同的位置,使用一个灰色方框,然后监视分类器的输出

When the occluder covers the image region that appears in the visualization, we see a strong drop in activity in the feature map.

当遮挡覆盖到可视化中出现的区域,我们发现在特征映射层有一个强烈的drop

Fig. 4 and Fig. 2. 图4和图2

评论:遮挡不同区域的影响,不同区域敏感度不同。

##Correspondence Analysis

一致性分析

Deep models differ from many existing recognition approaches in that there is no explicit mechanism for establishing correspondence between specific object parts in different images (e.g. faces have a particular spatial configuration of the eyes and nose)

深度学习模型与现存其他识别机制不同在于:其不存在对于在不同图片之间某些物体的特殊部分之间的准确的区别关系(例如:脸部存在一个鼻子和脸的特别空间关系)

不同的特征向量计算公式:



We then measure the consistency of this difference vector delta between all related image pairs (i, j):

我们然后计算所有图片对之间的不同。

where H is Hamming distance.

H为汉明距离

A lower value indicates greater consistency in the change resulting

from the masking operation, hence tighter correspondence between the

same object parts in different images (i.e. blocking the left eye)

在遮挡操作的变换结果中,一个较低的值表示部件之间较大的相关性fig8:

Table 1

评论:不同特征的独立性验证,如果你有鼻子眼睛嘴的脸部特征,遮住鼻子对最后的特征向量影响不大,说明他们之间的相关性比较强,类似于一张图如果有鼻子,基本也有眼睛,所以你遮住眼睛也会得到差不多的特征向量。

总结:简单的学习了一下这篇文章,后面第五部分讲的是经验,关于如何训练高质量的网络,会在下一篇推出,欢迎收看。

Visualizing and Understanding Convolutional Networks的更多相关文章

  1. [论文解读]CNN网络可视化——Visualizing and Understanding Convolutional Networks

    概述 虽然CNN深度卷积网络在图像识别等领域取得的效果显著,但是目前为止人们对于CNN为什么能取得如此好的效果却无法解释,也无法提出有效的网络提升策略.利用本文的反卷积可视化方法,作者发现了AlexN ...

  2. 0 - Visualizing and Understanding Convolutional Networks(阅读翻译)

    卷积神经网络的可视化理解(Visualizing and Understanding Convolutional Networks) 摘要(Abstract) 近来,大型的卷积神经网络模型在Image ...

  3. 深度学习论文翻译解析(十):Visualizing and Understanding Convolutional Networks

    论文标题:Visualizing and Understanding Convolutional Networks 标题翻译:可视化和理解卷积网络 论文作者:Matthew D. Zeiler  Ro ...

  4. Visualizing and Understanding Convolutional Networks论文复现笔记

    目录 Visualizing and Understanding Convolutional Networks 论文复现笔记 Abstract Introduction Approach Visual ...

  5. 深度学习研究理解5:Visualizing and Understanding Convolutional Networks(转)

    Visualizing and understandingConvolutional Networks 本文是Matthew D.Zeiler 和Rob Fergus于(纽约大学)13年撰写的论文,主 ...

  6. 论文笔记:Visualizing and Understanding Convolutional Networks

    2014 ECCV 纽约大学 Matthew D. Zeiler, Rob Fergus 简单介绍(What) 提出了一种可视化的技巧,能够看到CNN中间层的特征功能和分类操作. 通过对这些可视化信息 ...

  7. 【网络结构可视化】Visualizing and Understanding Convolutional Networks(ZF-Net) 论文解析

    目录 0. 论文地址 1. 概述 2. 可视化结构 2.1 Unpooling 2.2 Rectification: 2.3 Filtering: 3. Feature Visualization 4 ...

  8. ZFNet: Visualizing and Understanding Convolutional Networks

    目录 论文结构 反卷积 ZFnet的创新点主要是在信号的"恢复"上面,什么样的输入会导致类似的输出,通过这个我们可以了解神经元对输入的敏感程度,比如这个神经元对图片的某一个位置很敏 ...

  9. Fully Convolutional Networks for semantic Segmentation(深度学习经典论文翻译)

    摘要 卷积网络在特征分层领域是非常强大的视觉模型.我们证明了经过端到端.像素到像素训练的卷积网络超过语义分割中最先进的技术.我们的核心观点是建立"全卷积"网络,输入任意尺寸,经过有 ...

随机推荐

  1. .Net面试题二

    谈谈创建线程的方式 1.列举.Net页面之间传值的方式 2..Net中aspx页面从客户端浏览器开始请求到服务器返回响应所经历的过程 CLR主要运行过程 ASP.NET运行管道所有事件 3.如何理解委 ...

  2. arcgisJs之featureLayer中feature的获取

    arcgisJs之featureLayer中feature的获取 在featureLayer中source可以获取到一个Graphic数组,但是这个数组属于原数据数组.当使用 applyEdits修改 ...

  3. git托管代码

    安装git 在github上创建仓库 搞钥匙 在vscode上托管 初次在不同电脑上上传代码报错 解决:git pull --rebase origin master 链接:https://blog. ...

  4. js 操作对象的小技巧

    来源:https://www.w3cplus.com/javascript/javascript-tips.html 1.使用...运算符合并对象或数组中的对象 同样使用ES的...运算符可以替代人工 ...

  5. Docker Ubuntu容器安装ping

    apt-get update apt-get install iputils-ping apt-get install net-tools

  6. docker 卸载旧版本

    列出docker安装过的相关包: sudo yum list installed | grep docker 删除相关安装包 sudo yum -y remove docker-ce.x86_64su ...

  7. python3中OS模块

    os模块 OS模块简单的来说它是一个Python的系统编程的操作模块,可以处理文件和目录这些我们日常手动需要做的操作. 可以查看OS模块的帮助文档: import os:#导入os模块 help(os ...

  8. 《python解释器源码剖析》第7章--python中的set对象

    7.0 序 集合和字典一样,都是性能非常高效的数据结构,性能高效的原因就在于底层使用了哈希表.因此集合和字典的原理本质上是一样的,都是把值映射成索引,通过索引去查找. 7.1 PySetObject ...

  9. Spring源码解析--Spring的整体架构

    概述 Spring是一个开放源代码的设计层面框架,他解决的是业务逻辑层和其他各层的松耦合问题,因此它将面向接口的编程思想贯穿整个系统应用.Spring是于2003 年兴起的一个轻量级的Java 开发框 ...

  10. OSI七层协议模型、TCP/IP四层模型

    OSI七层协议模型 TCP/IP四层模型 首先我们梳理一下每层模型的职责: 链路层:对0和1进行分组,定义数据帧,确认主机的物理地址,传输数据: 网络层:定义IP地址,确认主机所在的网络位置,并通过I ...