PNAS

2018-ECCV-Progressive Neural Architecture Search

  • Johns Hopkins University && Google AI && Stanford
  • GitHub: 300+ stars
  • Citations: 504

Motivation

current techniques usually fall into one of two categories: evolutionary algorithms (EA) or reinforcement learning (RL).

Although both EA and RL methods have been able to learn network structures that outperform manually designed architectures, they require significant computational resources.

The two current NAS approaches, EA and RL, are both computationally expensive.

Contribution

we describe a method that requires 5 times fewer model evaluations during the architecture search.

Only 1/5 as many models need to be evaluated.

We propose to use heuristic search to search the space of cell structures, starting with simple (shallow) models and progressing to complex ones, pruning out unpromising structures as we go.

Progressive search: start from shallow structures and gradually move to more complex ones, pruning unpromising structures along the way.

Since this process is expensive, we also learn a model or surrogate function which can predict the performance of a structure without needing to train it.

A surrogate (predictor) is introduced that estimates how good a model is by directly predicting its performance, instead of training each candidate network from scratch.

Several advantages:

First, the simple structures train faster, so we get some initial results to train the surrogate quickly.

The simple structures are small and train quickly, so the cost of obtaining the initial results for the surrogate is almost negligible.

Second, we only ask the surrogate to predict the quality of structures that are slightly different (larger) from the ones it has seen.

The predictor only needs to extrapolate to structures slightly different from those it has already seen.

Third, we factorize the search space into a product of smaller search spaces, allowing us to potentially search models with many more blocks.

The large search space is factorized into a product of smaller search spaces.

we show that our approach is 5 times more efficient than the RL method of [41] in terms of number of models evaluated, and 8 times faster in terms of total compute.

Compared with the RL method, 5 times fewer models are evaluated and the total compute is 8 times lower.

Method

Search Space

we first learn a cell structure, and then stack this cell a desired number of times, in order to create the final CNN.

First learn a cell structure, then stack the cell the desired number of times to build the final CNN.

A cell takes an H×W×F tensor as input; with stride 1 it outputs an H×W×F tensor, and with stride 2 it outputs an H/2 × W/2 × 2F tensor.

A cell consists of B blocks. Each block has 2 inputs and 1 output and is specified by a 5-tuple \(\left(I_{1}, I_{2}, O_{1}, O_{2}, C\right)\). The output of the c-th cell is denoted \(H^c\), and the output of block b in cell c is denoted \(H^c_b\).

The inputs of a block are chosen from the set containing the outputs of all earlier blocks in the current cell, plus the output of the previous cell and the output of the cell before that.

The operator space contains 8 possible operations.
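To make the factorized search space concrete, here is a minimal Python sketch that counts the 5-tuples available at each block position and multiplies them together. It assumes the paper's 8 operations and that the combination operator C is fixed to element-wise addition (so it contributes no extra factor); the function names are illustrative, not the authors' code.

```python
# Sketch: size of the factorized cell search space.
OPS = [
    "identity", "sep_conv_3x3", "sep_conv_5x5", "sep_conv_7x7",
    "avg_pool_3x3", "max_pool_3x3", "conv_1x7_7x1", "dilated_conv_3x3",
]

def num_blocks_at_position(b, num_ops=len(OPS)):
    """Number of distinct 5-tuples (I1, I2, O1, O2, C) for block b:
    block b may read from the previous two cells (2 choices) plus the
    b-1 earlier blocks of the current cell; C is fixed to addition."""
    num_inputs = 2 + (b - 1)
    return (num_inputs ** 2) * (num_ops ** 2)

def search_space_size(B=5):
    """|B_1| * |B_2| * ... * |B_B|: the factorized cell search space."""
    size = 1
    for b in range(1, B + 1):
        size *= num_blocks_at_position(b)
    return size

print(search_space_size(B=5))  # ~5.6e14 combinations (fewer unique cells after symmetries)
```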

we stack a predefined number of copies of the basic cell (with the same structure, but untied weights), using either stride 1 or stride 2, as shown in Figure 1 (right).

After the best cell structure is found, it is stacked a predefined number of times to build the full network on the right; the copies share the structure but not the weights (each copy is trained anew).

The number of stride-1 cells between stride-2 cells is then adjusted accordingly, with up to N repeats.

The number of stride-1 (Normal) cells between stride-2 cells is controlled by the hyperparameter N.

we only use one cell type (we do not distinguish between Normal and Reduction cells, but instead emulate a Reduction cell by using a Normal cell with stride 2),

Normal and Reduction cells are not distinguished; a Reduction cell is emulated by a Normal cell with stride 2.
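As a rough sketch of how the single cell type is stacked into a full network, the snippet below builds the (stride, filter-count) schedule the text describes. The stem, N, and the number of reduction stages depend on the dataset, so the defaults here are placeholders, not the paper's exact configuration.

```python
def macro_architecture(N=2, num_reductions=2, F=24):
    """Schedule of (stride, filters) for stacking copies of one cell type.
    N stride-1 ("Normal") cells are followed by one stride-2 cell, which
    halves the spatial size and doubles the filter count; each copy reuses
    the cell structure but is trained with its own (untied) weights."""
    schedule, filters = [], F
    for _ in range(num_reductions):
        schedule += [(1, filters)] * N   # N Normal cells
        filters *= 2
        schedule += [(2, filters)]       # one Reduction-style cell (stride 2)
    schedule += [(1, filters)] * N       # final stack of Normal cells
    return schedule

print(macro_architecture())  # e.g. [(1, 24), (1, 24), (2, 48), ..., (1, 96)]
```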

Progressive Neural Architecture Search

Many previous approaches directly search in the space of full cells, or worse, full CNNs.

Previous methods search the space of full cells directly, or worse, full CNNs.

While this is a more direct approach, we argue that it is difficult to directly navigate in an exponentially large search space, especially at the beginning where there is no knowledge of what makes a good model.

Although this approach is direct, the search space is exponentially large, and at the beginning there is no prior knowledge of what makes a good model to guide the search.

The search starts from cells containing a single block. All possible \(B_1\) cells are trained, the predictor is trained on their results, and then each \(B_1\) cell is expanded into \(B_2\) cells.

Training all possible \(B_2\) cells is too expensive, so the predictor scores every candidate \(B_2\) cell and the best K are selected. This process then repeats: train the K selected \(B_2\) cells, update the predictor with their results, expand them into \(B_3\) cells, use the predictor to pick the best K again, and so on.
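A minimal sketch of this progressive loop (roughly Algorithm 1 in the paper) is shown below. The expensive training step and the surrogate are passed in as callables, and the enumeration helpers are illustrative assumptions, so treat this as an outline rather than the authors' implementation; in particular, the paper accumulates all trained cells when refitting the surrogate, which is simplified here.

```python
from itertools import product

def enumerate_blocks(position, ops):
    """All (I1, I2, O1, O2, 'add') tuples available at this block position."""
    inputs = range(position + 1)   # previous two cells + earlier blocks
    return [(i1, i2, o1, o2, "add")
            for i1, i2, o1, o2 in product(inputs, inputs, ops, ops)]

def enumerate_cells(num_blocks, ops):
    assert num_blocks == 1, "only the 1-block level is enumerated exhaustively"
    return [[blk] for blk in enumerate_blocks(1, ops)]

def progressive_search(B, K, ops, train_and_eval, fit_predictor, predict):
    """train_and_eval(cells) -> accuracies (expensive); fit_predictor/predict
    wrap the surrogate. Returns the best cell found with B blocks."""
    # Level 1: train every 1-block cell and fit the surrogate on the results.
    candidates = enumerate_cells(1, ops)
    accuracies = train_and_eval(candidates)
    surrogate = fit_predictor(candidates, accuracies)

    for b in range(2, B + 1):
        # Expand every surviving cell by one more block (cheap).
        expanded = [cell + [blk] for cell in candidates
                    for blk in enumerate_blocks(b, ops)]
        # Rank all expansions with the surrogate and keep the top K.
        scores = predict(surrogate, expanded)
        ranked = sorted(zip(scores, expanded), key=lambda t: t[0], reverse=True)
        candidates = [cell for _, cell in ranked[:K]]
        # Train only the K survivors (expensive), then refresh the surrogate.
        accuracies = train_and_eval(candidates)
        surrogate = fit_predictor(candidates, accuracies)

    return max(zip(accuracies, candidates), key=lambda t: t[0])[1]
```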

Performance Prediction with Surrogate Model

Requirements of the Predictor

  • Handle variable-sized inputs
  • Correlated with true performance
  • Sample efficiency
  • The requirement that the predictor be able to handle variable-sized strings immediately suggests the use of an RNN.

Two Predictor Methods

RNN and MLP (multilayer perceptron)

However, since the sample size is very small, we fit an ensemble of 5 predictors. We observed empirically that this reduced the variance of the predictions.

Because the sample size is small, an ensemble of 5 predictors is used (RNN-ensemble or MLP-ensemble), which reduces the variance of the predictions.
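Below is a small scikit-learn sketch of the MLP-ensemble idea. It is not the paper's exact predictor (the paper embeds each token of the cell string and trains each of the 5 members on a different 4/5 split of the available data, which is only mimicked here with random 80% subsets); `encode_cell` and the hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def encode_cell(cell, max_blocks=5):
    """Flatten a cell (list of (I1, I2, O1, O2) index tuples) into a
    fixed-length, zero-padded vector so the MLP can handle cells of
    different sizes (the variable-sized-input requirement)."""
    vec = np.zeros(max_blocks * 4)
    for b, (i1, i2, o1, o2) in enumerate(cell):
        vec[4 * b: 4 * b + 4] = [i1, i2, o1, o2]
    return vec

class EnsemblePredictor:
    """Five MLP regressors whose averaged output is the predicted accuracy;
    averaging reduces the variance caused by the tiny training set."""
    def __init__(self, n_members=5, seed=0):
        self.members = [MLPRegressor(hidden_layer_sizes=(100,),
                                     max_iter=2000, random_state=s)
                        for s in range(n_members)]
        self.rng = np.random.default_rng(seed)

    def fit(self, cells, accuracies):
        X = np.stack([encode_cell(c) for c in cells])
        y = np.asarray(accuracies)
        for m in self.members:
            # Each member sees a different random 80% subset of the data.
            idx = self.rng.choice(len(X), size=int(0.8 * len(X)), replace=False)
            m.fit(X[idx], y[idx])
        return self

    def predict(self, cells):
        X = np.stack([encode_cell(c) for c in cells])
        return np.mean([m.predict(X) for m in self.members], axis=0)
```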

Experiments

Performance of the Surrogate Predictors

we train the predictor on the observed performance of cells with up to b blocks, but we apply it to cells with b+1 blocks.

Train on cells with up to b blocks, then predict on cells with b+1 blocks.

We therefore consider predictive accuracy both for cells with sizes that have been seen before (but which have not been trained on), and for cells which are one block larger than the training data.

Prediction accuracy is considered in two settings: on unseen cells of a size already seen during training, and on cells one block larger than the training data.

From all cells with b blocks, 10k cells are randomly selected to form the dataset \(U_{b,1:R}\), and each is trained for 20 epochs.

randomly select K = 256 models (each of size b) from \(U_{b,1:R}\) to generate a training set \(S_{b,t,1:K}\);

In each trial, 256 models are randomly selected from U as the training set S.

In total, 20 × 256 = 5120 data points are used for training.

We now use this random dataset to evaluate the performance of the predictors using the pseudocode in Algorithm 2, where A(H) returns the true validation set accuracies of the models in some set H.

A(H) returns the true validation accuracies of the models in the set H after training.

For a given b, the training set is a subset of all cells with b blocks. The first row reports the correlation between predicted and true accuracies on the size-b training cells (256 × 20 = 5120 points); the second row reports the correlation on the full dataset of size-(b+1) cells (10k).
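The sketch below loosely re-creates this evaluation in the spirit of Algorithm 2: in each of 20 trials a predictor is fit on K = 256 random size-b cells, and the Spearman rank correlation between predicted and true accuracies is measured both on the size-b pool and on the unseen size-(b+1) pool. `make_predictor` could be the EnsemblePredictor sketched above; the bookkeeping in the paper's Algorithm 2 is richer than this.

```python
import numpy as np
from scipy.stats import spearmanr

def evaluate_predictor(U_b, acc_b, U_b1, acc_b1, make_predictor,
                       K=256, trials=20, seed=0):
    """Average Spearman correlation between predicted and true accuracies,
    on the training-size pool (U_b) and the one-block-larger pool (U_b1)."""
    rng = np.random.default_rng(seed)
    acc_b, acc_b1 = np.asarray(acc_b), np.asarray(acc_b1)
    rho_same, rho_larger = [], []
    for _ in range(trials):
        # Fit the predictor on a random subset of K size-b cells.
        idx = rng.choice(len(U_b), size=K, replace=False)
        predictor = make_predictor().fit([U_b[i] for i in idx], acc_b[idx])
        # Rank correlation on cells of the same size as the training data.
        rho, _ = spearmanr(predictor.predict(U_b), acc_b)
        rho_same.append(rho)
        # Rank correlation on cells one block larger (the case we care about).
        rho, _ = spearmanr(predictor.predict(U_b1), acc_b1)
        rho_larger.append(rho)
    return float(np.mean(rho_same)), float(np.mean(rho_larger))
```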

We see that the predictor performs well on models from the training set, but not so well when predicting larger models. However, performance does increase as the predictor is trained on more (and larger) cells.

The predictor performs well on the training cells {B=b} but less well on the larger cells {B=b+1}; however, performance improves as the predictor is trained on more (and larger) cells.

We see that for predicting the training set, the RNN does better than the MLP, but for predicting the performance on unseen larger models (which is the setting we care about in practice), the MLP seems to do slightly better.

The RNN predictor does better on the training cells {B=b}, while the MLP does slightly better on the unseen larger cells {B=b+1}, which is the setting we care about in practice.

Conclusion

The main contribution of this work is to show how we can accelerate the search for good CNN structures by using progressive search through the space of increasingly complex graphs

Progressive search (cells of gradually increasing complexity) accelerates NAS.

combined with a learned prediction function to efficiently identify the most promising models to explore.

A learned predictor is used to identify the most promising candidate networks. (A predictor network P is introduced to search for the best structure of the target network, e.g. network C searches for the best structure of network B, which in turn searches for the best structure of network A, nested like matryoshka dolls.)

The resulting models achieve the same level of performance as previous work but with a fraction of the computational cost.

SOTA-level performance is achieved at a fraction of the cost.
