0、Abstract

In this paper, we propose a new channel pruning method based on artificial bee colony algorithm (ABC), dubbed as ABCPruner, which aims to efficiently find optimal pruned structure, i.e., channel number in each layer, rather than selecting “important” channels as previous works did.

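The abstract opens by introducing a new channel-level pruning method, ABCPruner (based on the artificial bee colony algorithm), and states that it aims to find the optimal pruned structure rather than picking out the most important channels as earlier work did.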

..., we first propose to shrink the combinations where the preserved channels are limited to a specific space, ... . And then, we formulate the search of optimal pruned structure as an optimization problem and integrate the ABC algorithm to solve it in an automatic manner to lessen human interference.

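The rough flow of the new method: the channel combinations are first **shrunk** to a specific space, then "searching for the optimal pruned structure (within this space)" is treated as an **optimization** problem, which the ABC algorithm solves automatically.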

1、Introduction

Channel pruning targets at removing the entire channel in each layer, which is straightforward but challenging because removing channels in one layer might drastically change the input of the next layer.

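The paper briefly mentions the drawback of channel pruning: removing some of the channels in one layer is very likely to affect the input of the next layer. (This is only mentioned in passing; the paper never gives a solution to it.)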

Most cutting-edge practice implements channel pruning by selecting channels (filters) based on rule-of-thumb designs.

  1. The first is to identify the most important filter weights in a pre-trained model, ...
  2. The second performs channel pruning based on handcrafted rules to regularize the retraining of a full model followed by pruning and fine-tuning...

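The paper then says that most existing methods implement channel pruning with rules of thumb. One class works on the filter weights of a pre-trained model, ranking them by magnitude and discarding the unimportant ones. The other relies on handcrafted rules, i.e., hyper-parameters such as the pruning rate are decided manually before pruning. The paper cites other work here, which is omitted.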

The motivation of our ABCPruner is two-fold.

  1. First, [Liu et al., 2019b] showed that the essence of channel pruning lies in finding optimal pruned structure, i.e., channel number in each layer, instead of selecting “important” channels.
  2. Second, [He et al., 2018b] proved the feasibility of applying automatic methods for controlling hyper-parameters to channel pruning, which requires less human interference.

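Based on the two situations summarized above and on other papers, the authors argue that, first, the essence of channel pruning is finding the optimal pruned structure rather than the important channels, and second, less human interference is preferable, so they follow earlier work and apply automatic hyper-parameter control to pruning.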

[A more detailed explanation of the method]

Given a CNN with \(L\) layers, the combinations of pruned structure could be \(\prod _ { j = 1 } ^ { L } c _ { j }\), where \(L\) is the number of layers and \(c_j\) is the channel number in the \(j\)-th layer. The combination overhead is extremely intensive.

To solve this problem, we propose to shrink the combinations by limiting the number of preserved channels to \(\{ 0.1 c _{ j } , 0.2 c _{ j } , \ldots , \alpha c _{ j } \}\) where the value of α falls in {10%,20%, ...,100%}, which shows that there are 10α feasible solutions for each layer, ...

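For a network with L convolutional layers, the number of possible pruning schemes is \(\prod _ { j = 1 } ^ { L } c _ { j }\). The authors consider this too many and want fewer. (The paper gives no justification for how the space is reduced; reading the whole paper, it seems to be proposed purely for convenience.)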

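The authors therefore limit each layer to at most ten pruning options. Concretely, given the number of filters c in a layer, the ten values 10%c, 20%c, ..., 100%c form that layer's search space. This is what the authors mean by shrinking the channel combinations to a specific space.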
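To get a feel for how much the space shrinks, the two counts can be compared directly. The sketch below is only an illustration: the layer widths are those of the 13 convolutional layers of VGG-16, chosen here as an example, and the function names are hypothetical.

```python
import math

def full_search_space(channels):
    """Number of structures when layer j may keep any of 1..c_j channels."""
    return math.prod(channels)

def shrunk_search_space(num_layers, alpha):
    """Number of structures when each layer picks one of 10*alpha levels."""
    levels = round(10 * alpha)
    return levels ** num_layers

# Conv-layer widths of VGG-16 (13 convolutional layers), used as an example.
vgg16 = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]

full = full_search_space(vgg16)
shrunk = shrunk_search_space(len(vgg16), alpha=1.0)
print(f"full:   {full:.3e}")
print(f"shrunk: {shrunk:.3e}")
```

Even with α = 100% the shrunk space has 10^13 elements, so exhaustive search is still out of reach, which is why a search heuristic such as ABC is needed afterwards.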

  1. As shown in Fig.1, we first initialize a structure set, each element of which represents the preserved channel number in each layer.
  2. The filter weights of the full model are randomly selected and assigned to initialize each structure.
  3. We train it for a given number of epochs to measure its fitness, a.k.a, accuracy performance in this paper.
  4. Then, ABC is introduced to update the structure set.
  5. Similarly, filter assignment, training and fitness calculation for the updated structures are conducted. We continue the search for some cycles.
  6. Finally, the one with the best fitness is considered as the optimal pruned structure, and its trained weights are reserved as a warm-up for fine-tuning.

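The whole pruning scheme:

  1. Initialize a structure set; each element of a structure is the number of channels to keep in the corresponding layer (in practice several structures are initialized, each representing one pruning scheme)
  2. Prune each layer according to the structure, choosing at random which filters of the full model to keep
  3. Train for a given number of epochs and measure accuracy
  4. Use ABC to update the structure set
  5. Repeat steps 2, 3 and 4
  6. Pick the best structure and fine-tune it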
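Step 2 can be sketched as follows. This is a minimal illustration, not the authors' code: weights are plain nested lists with one number per (output, input) channel pair, and `pick_filters` and `assign_weights` are hypothetical names. The point it shows is that when output filters of one layer are dropped, the matching input channels of the next layer must be dropped too, so that shapes stay consistent.

```python
import random

def pick_filters(c_full, c_keep, rng):
    """Randomly choose which c_keep of the c_full filters survive (sorted indices)."""
    return sorted(rng.sample(range(c_full), c_keep))

def assign_weights(full_weights, structure, rng):
    """Build pruned weights from a full model.

    full_weights[j] is a nested list of shape [c_out][c_in] (kernel values
    are collapsed to one number per (out, in) pair to keep the sketch small).
    structure[j] is the number of output channels to keep in layer j.
    """
    pruned = []
    kept_prev = None  # indices kept in the previous layer = this layer's inputs
    for layer, c_keep in zip(full_weights, structure):
        kept = pick_filters(len(layer), c_keep, rng)
        in_idx = kept_prev if kept_prev is not None else range(len(layer[0]))
        pruned.append([[layer[o][i] for i in in_idx] for o in kept])
        kept_prev = kept
    return pruned

rng = random.Random(0)
# Toy "full model": 3 input channels -> 8 filters -> 6 filters.
full = [
    [[rng.random() for _ in range(3)] for _ in range(8)],
    [[rng.random() for _ in range(8)] for _ in range(6)],
]
pruned = assign_weights(full, structure=[4, 3], rng=rng)
print([(len(w), len(w[0])) for w in pruned])  # (out, in) shape per layer
```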



2、Related Work

Network Pruning: weight pruning and channel pruning.

AutoML: automatic pruning

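This section briefly covers the characteristics and current state of weight pruning and channel pruning, then mentions the benefits of AutoML. The present method was inspired by the claim in [1810.05270] Rethinking the Value of Network Pruning (arxiv.org) that "the key to channel pruning lies in the pruned structure rather than in selecting 'important' channels".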

3、The Proposed ABCPruner

Given a CNN model \(N\) that contains \(L\) convolutional layers and its filters set \(W\), we refer to \(C = (c_1, c_2, ..., c_L)\) as the network structure of \(N\), where \(c_j\) is the channel number of the \(j\)-th layer. Channel pruning aims to remove a portion of filters in \(W\) while keeping a comparable or even better accuracy.

For any pruned model \(\mathcal { N } ^ { \prime }\), we denote its structure as \(C^ { \prime } = \left( c _ { 1 } ^ { \prime } , c _ { 2 } ^ { \prime } , \ldots , c _ { L } ^ { \prime } \right)\), where \(c _ { j } ^ { \prime } \leq c _ { j }\) is the channel number of the pruned model in the \(j\)-th layer.

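Notation: model \(N\), number of convolutional layers \(L\), filter set \(W\), structure \(C = (c_1, c_2, ..., c_L)\), where \(c_j\) is the number of channels in the \(j\)-th layer. The structure used by the pruned network is \(C^ { \prime } = \left( c _ { 1 } ^ { \prime } , c _ { 2 } ^ { \prime } , \ldots , c _ { L } ^ { \prime } \right)\), with \(c _ { j } ^ { \prime } \leq c _ { j }\).

Goal: remove part of the filters in \(W\) while keeping accuracy essentially unchanged.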

3.1 Combination Shrinkage

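As mentioned earlier, each layer's \(c _ { i } ^ { \prime }\) is bounded above by \(c_i\), the number of filters in that layer. Without further restriction, combining the choices of all layers gives far too many cases. The authors therefore restrict each \(c _ { i } ^ { \prime }\) to stepped percentages of \(c_i\), specifically: \(c _{ i } ^ { \prime } \in \left\{ 0.1 c _{ i } , 0.2 c _{ i } , \ldots , \alpha c _{ i } \right\} ^ { L }\)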
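For one layer, the candidate set can be enumerated as below. This is a small sketch under one assumption that these notes do not settle: fractional channel counts are floored to an integer, with a minimum of one channel. The function name is hypothetical.

```python
def candidate_widths(c, alpha):
    """Allowed preserved-channel counts {0.1c, 0.2c, ..., alpha*c} for one layer.

    Flooring to an integer (with a minimum of 1) is an assumption of this sketch.
    """
    levels = round(10 * alpha)
    return [max(1, (k * c) // 10) for k in range(1, levels + 1)]

print(candidate_widths(64, 0.7))
print(candidate_widths(512, 1.0))
```

With α = 70% a 64-filter layer has seven candidates, and with α = 100% a 512-filter layer has ten, the last one being the unpruned width.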

3.2 Optimal Pruned Structure

Given the training set \(\mathcal { T } _ { \text {train} }\) and test set \(\mathcal { T } _ { \text {test} }\), we aim to find the optimal combination of \(C ^ { \prime }\), such that the pruned model \(\mathcal { N } ^ { \prime }\) trained/fine-tuned on \(\mathcal { T } _ { \text {train} }\) obtains the best accuracy. To that effect, we formulate our channel pruning problem as:

\[\left( C ^ { \prime } \right) ^ { * } = \underset { C ^ { \prime } } { \arg \max } \operatorname { acc } \left( \mathcal { N } ^ { \prime } \left( C ^ { \prime } , \mathbf { W } ^ { \prime } ; \mathcal { T } _ { \text {train} } \right) ; \mathcal { T } _ { \text {test} } \right)
\]

s.t. \(c _{ i } ^ { \prime } \in \left\{ 0.1 c _{ i } , 0.2 c _{ i } , \ldots , \alpha c _{ i } \right\} ^ { L }\)

where \(\mathbf { W } ^ { \prime }\) are the weights of the pruned model trained/fine-tuned on \(\mathcal { T } _ { \text {train} }\), and \(\operatorname { acc } (\cdot)\) denotes the accuracy on \(\mathcal { T } _ { \text {test} }\) for \(\mathcal { N } ^ { \prime }\) with structure \(C ^ { \prime }\).

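Within one search (i.e., within one user-defined structure space), different \(C^ { \prime } = \left( c _ { 1 } ^ { \prime } , c _ { 2 } ^ { \prime } , \ldots , c _ { L } ^ { \prime } \right)\) are initialized from \(C = (c_1, c_2, ..., c_L)\), then trained, fine-tuned and compared, and the structure that reaches the highest accuracy is taken as the result.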

3.3 Automatic Structure Search

In particular, we initialize a set of \(n\) pruned structures \(\left\{ C _ { j } ^ { \prime } \right\} _ { j = 1 } ^ { n }\) with the \(i\)-th element \(c^{\prime}_{ji}\) of \(C^{\prime}_j\) randomly sampled from \(\left\{ 0.1 c _{ i } , 0.2 c _{ i } , \ldots , \alpha c _{ i } \right\}\). Accordingly, we obtain a set of pruned model \(\left\{ \mathcal { N } _ { j } ^ { \prime } \right\} _ { j = 1 } ^ { n }\) and a set of pruned weights \(\left\{ \mathbf { W } _ { j } ^ { \prime } \right\} _ { j = 1 } ^ { n }\).

Each pruned structure \(C^{\prime}_j\) represents a potential solution to the optimization problem.

In practice, automatic structure search starts by randomly initializing several structures.
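The random initialization can be sketched as below. The encoding is an assumption made for illustration: each structure is stored as a list of integer levels k in 1..10α, where level k means "keep k × 10% of that layer's channels", and converting levels to channel counts floors the result. `init_structures` and `to_channels` are hypothetical names.

```python
import random

def init_structures(n, num_layers, alpha, rng):
    """Sample n structures; each entry is a level k in 1..10*alpha,
    meaning 'keep k*10% of that layer's channels'."""
    levels = round(10 * alpha)
    return [[rng.randint(1, levels) for _ in range(num_layers)] for _ in range(n)]

def to_channels(structure, channels):
    """Turn levels into channel counts (flooring is an assumption)."""
    return [max(1, k * c // 10) for k, c in zip(structure, channels)]

rng = random.Random(0)
channels = [64, 128, 256, 512]
population = init_structures(n=3, num_layers=len(channels), alpha=0.8, rng=rng)
for s in population:
    print(s, "->", to_channels(s, channels))
```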

Once these structures are obtained, the authors use the ABC algorithm to update them.

A brief introduction to the ABC algorithm follows.


ABC algorithm | Artificial bee colony algorithm (Wikipedia)

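1、Principle

The standard ABC algorithm divides the artificial colony into three kinds of bees: employed bees, onlooker bees and scout bees.

  • Employed bees gather nectar: starting from the food position they remember, they search the neighbourhood of that food source for other food;
  • Employed bees share the food positions they found with the onlooker bees, which choose the better food sources;
  • If no food source above the threshold is found within a limited number of attempts, the food source is abandoned, and the employed bee becomes a scout bee that searches randomly for a new food source.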

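2、Implementation

2.1 At the start, the whole colony is initialized. The colony size is 2SN, with equal numbers of employed bees and onlooker bees, SN each. The number of food sources equals the number of employed bees, also SN. \({\displaystyle X_{i}=\{x_{i,1},x_{i,2},\ldots ,x_{i,n}\}}\) denotes the \(i\)-th food source (a candidate solution), and \(n\) is its dimension.

2.2 Each employed bee then generates a new position \(V_i\) from its remembered position \(X_{i}\), using: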

\[{\displaystyle v_{i,k}=x_{i,k}+\Phi _{i,k}\times (x_{i,k}-x_{j,k})}
\]

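\(\Phi _{i,k}\) is a random number in [-1, 1], \(k\) is a randomly chosen dimension, and \(x_{j,k}\) is the \(k\)-th dimension of a randomly chosen other food source \(X_j\) (\(j \neq i\)).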
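The update rule can be written out as a short function. This is a sketch of the generic ABC step on real-valued vectors; `neighbour` is a hypothetical name, and it changes a single randomly chosen dimension while leaving the rest of \(X_i\) untouched.

```python
import random

def neighbour(population, i, rng):
    """Generate V_i from X_i by perturbing one randomly chosen dimension k
    relative to a randomly chosen other food source X_j (j != i)."""
    x_i = population[i]
    j = rng.choice([idx for idx in range(len(population)) if idx != i])
    k = rng.randrange(len(x_i))
    phi = rng.uniform(-1.0, 1.0)
    v_i = list(x_i)  # copy so the remembered position is not modified
    v_i[k] = x_i[k] + phi * (x_i[k] - population[j][k])
    return v_i, k

rng = random.Random(1)
population = [[1.0, 2.0, 3.0], [4.0, 0.0, -1.0], [2.5, 2.5, 2.5]]
v, k = neighbour(population, 0, rng)
print("changed dimension", k, "->", v)
```

For the discrete pruning levels the result would presumably also have to be rounded and clipped to a valid level; that detail is an assumption here and is not covered by the formula above.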

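2.3 The food at the two positions is then compared: the fitness values (a term borrowed from genetic algorithms) of \(X_i\) and \(V_i\) are evaluated, and the better of the two is kept.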

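2.4 Each onlooker bee then picks a food source with probability proportional to its fitness and searches around it in the same way; the globally best food source found so far is recorded. The selection probability is:

\[{\displaystyle P_{i}={\frac {\mathrm {fit} _{i}}{\sum _{j}{\mathrm {fit} _{j}}}}}
\]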
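The selection probability amounts to a roulette wheel over the food sources. A minimal sketch, assuming non-negative fitness values with a positive sum (accuracies satisfy this); the function names and the fitness values are made up for illustration.

```python
import random

def selection_probabilities(fitness):
    """P_i = fit_i / sum_j fit_j (assumes non-negative fitness, positive total)."""
    total = sum(fitness)
    return [f / total for f in fitness]

def pick_source(fitness, rng):
    """Roulette-wheel choice of a food source index for one onlooker bee."""
    return rng.choices(range(len(fitness)), weights=fitness, k=1)[0]

fitness = [0.90, 0.60, 0.30, 0.20]  # e.g. accuracies of four structures
probs = selection_probabilities(fitness)
print([round(p, 2) for p in probs])

rng = random.Random(0)
counts = [0] * len(fitness)
for _ in range(10_000):
    counts[pick_source(fitness, rng)] += 1
print(counts)
```

Better food sources are visited more often, but weaker ones still get an occasional visit, which keeps some exploration in the search.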

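2.5 If a food source's position has not changed after a certain number of trials (called the limit), it is abandoned and a new food source is initialized at random.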
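Putting steps 2.1 to 2.5 together gives the loop below. It is a toy sketch, not the authors' implementation: solutions are integer level vectors in 1..`levels`, new values are rounded and clipped (an assumption), and `toy_fitness` with its `target` is a made-up stand-in for "train the pruned model briefly and measure accuracy".

```python
import random

def abc_search(fitness_fn, dim, levels, n_sources=5, cycles=30, limit=3, seed=0):
    """Toy ABC over integer vectors with entries in 1..levels."""
    rng = random.Random(seed)

    def random_source():
        return [rng.randint(1, levels) for _ in range(dim)]

    sources = [random_source() for _ in range(n_sources)]
    fits = [fitness_fn(s) for s in sources]
    trials = [0] * n_sources  # failed attempts per source since last improvement
    best_i = max(range(n_sources), key=fits.__getitem__)
    best, best_fit = list(sources[best_i]), fits[best_i]

    def try_improve(i):
        """Perturb one dimension of source i and keep the better of the two."""
        j = rng.choice([m for m in range(n_sources) if m != i])
        k = rng.randrange(dim)
        phi = rng.uniform(-1.0, 1.0)
        cand = list(sources[i])
        moved = cand[k] + phi * (cand[k] - sources[j][k])
        cand[k] = min(levels, max(1, round(moved)))  # round and clip to a valid level
        cand_fit = fitness_fn(cand)
        if cand_fit > fits[i]:
            sources[i], fits[i], trials[i] = cand, cand_fit, 0
        else:
            trials[i] += 1

    for _ in range(cycles):
        # Employed bees: one local move per food source.
        for i in range(n_sources):
            try_improve(i)
        # Onlooker bees: revisit sources with probability proportional to fitness.
        for _ in range(n_sources):
            try_improve(rng.choices(range(n_sources), weights=fits, k=1)[0])
        # Record the best source before any source is abandoned.
        for i in range(n_sources):
            if fits[i] > best_fit:
                best, best_fit = list(sources[i]), fits[i]
        # Scout bees: replace sources that failed more than `limit` times.
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = random_source()
                fits[i], trials[i] = fitness_fn(sources[i]), 0
                if fits[i] > best_fit:
                    best, best_fit = list(sources[i]), fits[i]
    return best, best_fit

target = [3, 7, 5, 9]  # hypothetical "ideal" levels, only for the toy fitness

def toy_fitness(structure):
    """Stand-in for 'train briefly, then measure accuracy'; peaks at `target`."""
    return 1.0 / (1.0 + sum((a - t) ** 2 for a, t in zip(structure, target)))

best, best_fit = abc_search(toy_fitness, dim=4, levels=10)
print(best, round(best_fit, 3))
```

In the pruning setting each call to the fitness function means training a pruned network for a few epochs, so the number of sources and cycles directly determines the search cost.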

Algorithm flow

Novelty

  1. Introducing ABC into pruning
