Saining: [arXiv 2017] Aggregated Residual Transformations for Deep Neural Networks

Authors and links

Authors

Paper download
Code download

Main idea

What problem does it solve?

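　　Networks such as ResNet, VGG, and Inception are built by stacking repeated building blocks, but the number and size of the filters in those blocks cannot be set arbitrarily and have to be tuned by hand. Because there are many such hyperparameters, and because they do not transfer directly across vision tasks or even across datasets, a module tailored to one task or dataset may perform well but generalizes poorly. This paper introduces a new building block that can replace the ResNet building block; the resulting model is called ResNeXt. The main advantage of ResNeXt is that every building block in the network follows the same template, so there is no need to tune the hyperparameters of each block in each stage: a single block design is stacked repeatedly to form the whole network. Experiments show that ResNeXt outperforms ResNet at the same model size.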

How is it solved?

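　　The ResNet block (Figure 1, left) is replaced by the ResNeXt block (Figure 1, right). In effect, the single 64-filter bottleneck path on the left becomes 32 parallel paths on the right, each with 4 filters in its bottleneck (32 × 4 = 128 bottleneck channels in total, at roughly the same complexity). The outputs of the 32 paths are summed element-wise (values at the same position in the same channel are added together), and the result is then added to the shortcut.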

Figure 1. Left: A block of ResNet [13]. Right: A block of ResNeXt with cardinality = 32, with roughly the same complexity. A layer is shown as (# in channels, filter size, # out channels)
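The aggregation described above can be written compactly. As a sketch, let x be the block input, y the block output, C the cardinality, and T_i the transformation computed by path i (these symbol names are labels chosen here for illustration):

```latex
y = x + \sum_{i=1}^{C} \mathcal{T}_i(x)
```

With C = 32, the sum runs over the 32 paths in Figure 1 (right), and the leading x is the shortcut.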

Cardinality and bottleneck

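　　The paper proposes a new dimension for measuring model capacity (the ability of a model to fit a variety of functions). Previously, capacity was described in terms of two dimensions, width and depth. The new dimension, "cardinality", is the size of the set of transformations in a building block. As Figure 2 shows, the structures (a), (b), and (c) are equivalent, and the paper uses (c). Concretely, cardinality is the number of paths in Figure 2(b) or the number of groups in Figure 2(c): each path or group represents one transformation, so the number of paths or groups is the cardinality. The bottleneck is the number of channels (equivalently, the number of filters) of the intermediate feature map inside each path or group. In Figure 2(a), for example, each path applies 4 filters of size 1×1×256 to the 256-channel input, producing a feature map with 4 channels at the same spatial size, so the bottleneck width is 4.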

Figure 2. Equivalent building blocks of ResNeXt. (a): Aggregated residual transformations, the same as Fig. 1 right. (b): A block equivalent to (a), implemented as early concatenation. (c): A block equivalent to (a,b), implemented as grouped convolutions [23]. Notations in bold text highlight the reformulation changes. A layer is denoted as (# input channels, filter size, # output channels).
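To make the equivalence between forms (a) and (b) concrete, here is a minimal sketch in plain Python. It looks at a single pixel, where a 1×1 convolution reduces to a matrix-vector product, and checks that summing the per-path projections gives the same result as concatenating the path activations and applying one wide projection. The sizes `C`, `d`, and `out_dim` are small hypothetical values chosen for readability (the block in the figure uses C = 32, d = 4, and 256 output channels), and the helper `matvec` is introduced here only for illustration.

```python
import random

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

random.seed(0)
C, d, out_dim = 4, 3, 5  # hypothetical small sizes, not the paper's

# Bottleneck activations of each path at one pixel: C vectors of length d.
h = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(C)]
# Final 1x1 convolution of each path: C matrices of shape out_dim x d.
W = [[[random.uniform(-1, 1) for _ in range(d)] for _ in range(out_dim)]
     for _ in range(C)]

# Form (a): project each path separately, then sum the outputs element-wise.
path_outputs = [matvec(W[i], h[i]) for i in range(C)]
aggregated = [sum(column) for column in zip(*path_outputs)]

# Form (b): concatenate the activations and apply one wide 1x1 convolution
# whose weight matrix is the per-path matrices placed side by side.
h_cat = [x for path in h for x in path]
W_cat = [[w for i in range(C) for w in W[i][r]] for r in range(out_dim)]
merged = matvec(W_cat, h_cat)

max_diff = max(abs(a - b) for a, b in zip(aggregated, merged))
print(max_diff)  # differs from zero only by floating-point rounding
```

The same argument applies at every pixel, which is why the concatenation in (b) can replace the explicit sum in (a).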

ResNet vs. ResNeXt

Architecture comparison

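　　The building blocks of ResNet and ResNeXt are compared for depth = 3, that is, three layers per block; Figure 2 shows the ResNeXt side in its equivalent forms.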

Configuration comparison

　　Table 1 compares the building-block configurations of ResNet-50 and ResNeXt-50. In the table, C=32 means cardinality = 32, and the bottleneck width is 4, as in Figure 2.

Table 1. (Left) ResNet-50. (Right) ResNeXt-50 with a 32×4d template (using the reformulation in Fig. 3(c)). Inside the brackets are the shape of a residual block, and outside the brackets is the number of stacked blocks on a stage. “C=32” suggests grouped convolutions [23] with 32 groups. The numbers of parameters and FLOPs are similar between these two models.

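Model size calculation

　　Taking Figure 3 as an example, the ResNet block has 256 · 64 + 3 · 3 · 64 · 64 + 64 · 256 ≈ 70k parameters.

The ResNeXt block has C · (256 · d + 3 · 3 · d · d + d · 256) parameters, where C is the cardinality (32) and d is the bottleneck width (4), which again gives ≈ 70k parameters.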

Figure 3. Left: A block of ResNet [13]. Right: A block of ResNeXt with cardinality = 32, with roughly the same complexity. A layer is shown as (# in channels, filter size, # out channels)
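The two parameter counts above can be checked with a few lines of Python. This is a minimal sketch that follows the formulas in the text and counts convolution weights only; the function and argument names are made up for this example.

```python
def resnet_block_params(in_ch=256, width=64):
    """Weights of a 1x1 -> 3x3 -> 1x1 bottleneck block."""
    return in_ch * width + 3 * 3 * width * width + width * in_ch

def resnext_block_params(cardinality=32, d=4, in_ch=256):
    """Weights of `cardinality` parallel paths, each a 1x1 -> 3x3 -> 1x1
    bottleneck of width d."""
    return cardinality * (in_ch * d + 3 * 3 * d * d + d * in_ch)

resnet_params = resnet_block_params()
resnext_params = resnext_block_params()
print(resnet_params)   # 69632
print(resnext_params)  # 70144
```

Both values round to roughly 70k, which is the sense in which the two blocks have similar complexity.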

Experimental results
- Shows that ResNeXt outperforms ResNet, and that larger cardinality gives better results

Table 2. Ablation experiments on ImageNet-1K. (Top): ResNet-50 with preserved complexity (∼4.1 billion FLOPs); (Bottom): ResNet-101 with preserved complexity (∼7.8 billion FLOPs). The error rate is evaluated on the single crop of 224×224 pixels.
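"Preserved complexity" means that when the cardinality C changes, the bottleneck width d has to change with it so that the block stays at about the same size. The sketch below illustrates that trade-off with the per-block formula C · (256 · d + 3 · 3 · d · d + d · 256): for each C it picks the d whose parameter count is closest to the 64-wide ResNet block. The "closest count" criterion and the names `block_params` and `matching_width` are assumptions made for this example, not a procedure taken from the paper.

```python
def block_params(cardinality, d, in_ch=256):
    """Convolution weights of one block with `cardinality` paths of width d."""
    return cardinality * (in_ch * d + 3 * 3 * d * d + d * in_ch)

# Reference size: a single path of width 64, i.e. the ResNet bottleneck block.
BASELINE = block_params(1, 64)

def matching_width(cardinality, max_d=128):
    """Width d in 1..max_d whose block size is closest to BASELINE."""
    return min(range(1, max_d + 1),
               key=lambda d: abs(block_params(cardinality, d) - BASELINE))

widths = {c: matching_width(c) for c in (1, 2, 4, 8, 32)}
print(widths)  # {1: 64, 2: 40, 4: 24, 8: 14, 32: 4}
```

Under this criterion, raising the cardinality from 1 to 32 shrinks the per-path width from 64 to 4 while the block stays near 70k parameters.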

- Shows that increasing cardinality is more effective than increasing the width or depth of the model

Table 3. Comparisons on ImageNet-1K when the number of FLOPs is increased to 2× of ResNet-101’s. The error rate is evaluated on the single crop of 224×224 pixels. The highlighted factors are the factors that increase complexity.
