(转)Illustrated: Efficient Neural Architecture Search ---Guide on macro and micro search strategies in ENAS
2019-03-27 09:41:07
This blog is copied from: https://towardsdatascience.com/illustrated-efficient-neural-architecture-search-5f7387f9fb6
Designing neural networks for various tasks like image classification and natural language understanding often requires significant architecture engineering and expertise. Enter Neural Architecture Search (NAS), a task to automate the manual process of designing neural networks. NAS owes its growing research interest to the increasing prominence of deep learning models of late.
There are many ways to search for or discover neural architectures. Over the past couple of years, the community has seen different search methods proposed including:
- Reinforcement learning
  - Neural Architecture Search with Reinforcement Learning (Zoph and Le, 2016)
  - NASNet (Zoph et al., 2017)
  - ENAS (Pham et al., 2018)
- Evolutionary algorithm
  - Hierarchical Evo (Liu et al., 2017)
  - AmoebaNet (Real et al., 2018)
- Sequential model-based optimisation (SMBO)
  - PNAS (Liu et al., 2017)
- Bayesian optimisation
  - Auto-Keras (Jin et al., 2018)
  - NASBOT (Kandasamy et al., 2018)
- Gradient-based optimisation
  - SNAS (Xie et al., 2018)
  - DARTS (Liu et al., 2018)
In this post, we will look at Efficient Neural Architecture Search (ENAS), which employs reinforcement learning to build convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The authors (Hieu Pham, Melody Guan, Barret Zoph, Quoc V. Le, and Jeff Dean) proposed a predefined neural network to generate new neural networks, guided by a reinforcement learning framework using macro and micro search. That’s right: a neural network building another neural network.
The purpose of this article is to provide the readers a tutorial on how the macro and micro search strategies lead to generating neural networks. While the illustrations and animations serve to guide the readers, the sequence of animations does not necessarily reflect the actual flow of operations (due to vectorisation, etc.).
We shall narrow the scope of this tutorial to neural architecture search for CNNs in an image classification task. This article assumes that the reader is familiar with the basics of RNNs, CNNs, and reinforcement learning. Familiarity with deep learning concepts like transfer learning and skip/residual connections will greatly help as they are heavily used in the architecture search. It is not required to have read the paper, but it would speed up your understanding.
Contents
0. Overview
1. Search Strategy
1.1. Macro Search
1.2. Micro Search
2. Notes
3. Summary
4. Implementations
5. References
0. Overview
In ENAS, there are 2 types of neural networks involved:
- Controller – a predefined RNN, which is a long short-term memory (LSTM) cell
- Child model – the desired CNN for image classification
Like most other NAS algorithms, ENAS involves 3 concepts:
- Search space — all the different possible architectures or child models that can possibly be generated
- Search strategy — a method to generate these architectures or child models
- Performance evaluation — a method to measure the effectiveness of the generated child models
Let’s see how these five ideas form the ENAS story.
The controller controls or directs the building of the child model’s architecture by “generating a set of instructions” (or, more rigorously, making decisions or sampling decisions) using a certain search strategy. These decisions are things like what types of operations (convolutions, pooling etc.) to perform at a particular layer of the child model. Using these decisions, a child model is built. A generated child model is one of the many possible child models that can be built in the search space.
This particular child model is then trained to convergence (~95% training accuracy) using stochastic gradient descent, minimising the loss between the predicted and ground-truth classes (for an image classification task). This is done for a specified number of epochs, what I’d like to call child epochs, say 100. A validation accuracy is then obtained from this trained model.
Then, we update the controller’s parameters using REINFORCE, a policy-gradient reinforcement learning algorithm, to maximise the expected reward, which is the validation accuracy. This parameter update aims to improve the controller so that it generates decisions that yield higher validation accuracies.
This entire process (from 3 paragraphs before this) is just one epoch — let’s call it controller epoch. We then repeat this for a specified number of controller epochs, say 2000.
Of all the 2000 child models generated, the one with the highest validation accuracy earns the honour of being the neural network for your image classification task. However, this child model must go through one more round of training (again specified by the number of child epochs) before it can be deployed.
A pseudo algorithm for the entire training is written below:
CONTROLLER_EPOCHS = 2000
CHILD_EPOCHS = 100

Build controller network

for i in 1..CONTROLLER_EPOCHS:
    1. Generate a child model
    2. Train this child model for CHILD_EPOCHS
    3. Obtain val_acc
    4. Update controller parameters

Get child model with the highest val_acc
Train this child model for CHILD_EPOCHS
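The pseudo algorithm can also be sketched as a runnable Python loop. Everything below is a stand-in: the stub functions and the controller_state dictionary are hypothetical placeholders for the real LSTM controller, child-model training and evaluation.

```python
import random

CONTROLLER_EPOCHS = 20   # the post uses 2000
CHILD_EPOCHS = 5         # the post uses 100

def generate_child_model(controller_state):
    # Stub: the real controller samples operations and connections.
    return {"id": random.randrange(1_000_000)}

def train_child(child, epochs):
    pass  # stub: SGD on the classification loss for `epochs` child epochs

def validation_accuracy(child):
    return random.random()  # stub: evaluate on held-out data

def update_controller(controller_state, reward):
    pass  # stub: one REINFORCE step, reward = validation accuracy

random.seed(0)
controller_state = {}
best_child, best_acc = None, -1.0
for _ in range(CONTROLLER_EPOCHS):
    child = generate_child_model(controller_state)   # 1. generate a child model
    train_child(child, CHILD_EPOCHS)                 # 2. train it
    acc = validation_accuracy(child)                 # 3. obtain val_acc
    update_controller(controller_state, reward=acc)  # 4. update the controller
    if acc > best_acc:
        best_child, best_acc = child, acc

train_child(best_child, CHILD_EPOCHS)  # final round of training before deployment
```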
This entire problem is essentially a reinforcement learning framework with the archetypal elements:
- Agent — Controller
- Action — The decisions taken to build the child network
- Reward — Validation accuracy from the child network
The aim of this reinforcement learning task is to maximise the reward (validation accuracy) from the actions taken (decisions taken to build child model architecture) by the agent (controller).
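To see the mechanics behind the controller update, here is a minimal, self-contained REINFORCE sketch on a toy two-action policy. The invented rewards and the tiny policy are not the ENAS controller; they only illustrate how rewarding sampled actions shifts the policy towards actions that earn higher rewards.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_step(logits, action, reward, lr=0.1):
    # REINFORCE: move the logits along reward * grad(log pi(action)).
    probs = softmax(logits)
    grad = [-p for p in probs]   # d log pi(a) / d logit_i = 1{i==a} - p_i
    grad[action] += 1.0
    return [l + lr * reward * g for l, g in zip(logits, grad)]

rng = random.Random(0)
logits = [0.0, 0.0]              # toy policy over two "architecture decisions"
for _ in range(200):
    p0 = softmax(logits)[0]
    action = 0 if rng.random() < p0 else 1
    reward = 0.9 if action == 0 else 0.1  # pretend action 0 yields better children
    logits = reinforce_step(logits, action, reward)

final_probs = softmax(logits)    # the policy now prefers action 0
```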
1. Search Strategy
Recall in the previous section that the controller generates the child model’s architecture using a certain search strategy. There are two questions that you should ask in this statement — (1) how does the controller make decisions and (2) what search strategy?
How does the controller make decisions?
This brings us to the model of the controller, which is an LSTM. This LSTM samples decisions via softmax classifiers, in an auto-regressive fashion: the decision in the previous step is fed as input embedding into the next step.
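A rough sketch of this autoregressive sampling, assuming a stub in place of the LSTM step (the toy logits and the operation list are illustrative, not the paper's exact parameterisation):

```python
import math
import random

OPS = ["conv3x3", "conv5x5", "sep3x3", "sep5x5", "max3x3", "avg3x3"]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample(probs, rng):
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

def lstm_step(prev_decision, hidden):
    # Stub for one LSTM step: the real controller embeds the previous
    # decision and updates a hidden state; these toy logits just make the
    # output depend on both, as the autoregression requires.
    logits = [((prev_decision + hidden + i) % 7) / 3.0 for i in range(len(OPS))]
    return logits, hidden + 1

rng = random.Random(0)
decisions, prev, hidden = [], 0, 0
for _ in range(4):                       # four autoregressive time steps
    logits, hidden = lstm_step(prev, hidden)
    prev = sample(softmax(logits), rng)  # softmax classifier over operations
    decisions.append(OPS[prev])
```

Each sampled decision feeds into the next step, which is exactly what lets earlier choices influence later ones.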
What are the search strategies?
The authors of ENAS proposed 2 strategies for searching for or generating an architecture.
- Macro search
- Micro search
Macro search is an approach where the controller designs the entire network. Examples of publications that use this include NAS by Zoph and Le, FractalNet and SMASH. On the other hand, micro search is an approach where the controller designs modules or building blocks, which are combined to build the final network. Some papers that implement this approach are Hierarchical NAS, Progressive NAS and NASNet.
In the following 2 sub-sections we will see how ENAS implements these 2 strategies.
1.1 Macro Search
In macro search, the controller makes 2 decisions for every layer in the child model:
- the operation to perform on the previous layer (see Notes for the list of operations)
- the previous layer to connect to for skip connections
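These per-layer decisions can be sketched as follows. The operation list matches the six operations listed in the Notes, but the uniform sampling and the 50% skip-connection probability are illustrative stand-ins for what a trained controller would output:

```python
import random

OPS = ["conv3x3", "conv5x5", "sep3x3", "sep5x5", "max3x3", "avg3x3"]

def sample_macro_architecture(num_layers, rng):
    arch = []
    for layer in range(1, num_layers + 1):
        op = rng.choice(OPS)
        # Layer 1 has nothing earlier to connect to; every later layer may
        # take a skip connection from each preceding layer.
        skips = [prev for prev in range(1, layer) if rng.random() < 0.5]
        arch.append({"layer": layer, "op": op, "skip_from": skips})
    return arch

arch = sample_macro_architecture(4, random.Random(0))
```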
In this macro search example, we will see how the controller generates a 4-layer child model. Each layer in this child model is colour-coded with red, green, blue and purple respectively.
Convolutional Layer 1 (Red)
We’ll start by running the first time step of the controller. The output of this time step is softmaxed to get a vector, which translates to a conv3×3 operation.
What this means for the child model is that we perform a convolution with a 3×3 filter on the input image.

The output from the first time step (conv3×3) of the controller corresponds to building the first layer (red) in the child model. This means the child model will first perform 3×3 convolution on the input image.
I know I mentioned that the controller needs to make 2 decisions but there’s only 1 here. Since this is the first layer, we can only sample one decision which is the operation to perform, because there’s nothing else to connect to except for the input image itself.
Convolutional Layer 2 (Green)
To build the subsequent convolutional layers, the controller makes 2 decisions (no more lies): (i) the operation and (ii) the layer(s) to connect to. Here, we see that it generated 1 and sep5×5.
What this means for the child model is that we first perform a sep5×5 operation on the output of the previous layer. Then, this output is concatenated along the depth dimension with the output of Layer 1, i.e. the output from the red layer.

The outputs from the 2nd and 3rd time step (1 and sep5×5) in the controller correspond to building Convolutional Layer 2 (green) in the child model.
Convolutional Layer 3 (Blue)
We repeat the previous step again to generate the 3rd convolutional layer. Again, we see here that the controller generates 2 things: (i) operation and (ii) layer(s) to connect to. Below, the controller generated 1 and 2, and the operation max3×3.
So, the child model performs the operation max3×3 on the output of the previous layer (Layer 2, green). Then, the result of this operation is concatenated along the depth dimension with Layers 1 and 2.

The outputs from the 4th and 5th time step (1,2 and max3×3) in the controller correspond to building Convolutional Layer 3 (blue) in the child model.
Convolutional Layer 4 (Purple)
We repeat the previous step again to generate the 4th convolutional layer. This time the controller generated 1 and 3, and the operation conv5×5.
The child model performs the operation conv5×5 on the output of the previous layer (Layer 3, blue). Then, the result of this operation is concatenated along the depth dimension with Layers 1 and 3.

The outputs from the 6th and 7th time step (1,3 and conv5×5) in the controller correspond to building Convolutional Layer 4 (purple) in the child model.
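The depth-wise concatenations in this walkthrough can be summarised with a little channel arithmetic. Assuming, purely for illustration, that every operation emits 16 channels and that 1×1 convolutions (see Notes) restore each layer's output to 16 channels before it is reused:

```python
# The four layers sampled in this walkthrough:
arch = [
    {"layer": 1, "op": "conv3x3", "skip_from": []},
    {"layer": 2, "op": "sep5x5",  "skip_from": [1]},
    {"layer": 3, "op": "max3x3",  "skip_from": [1, 2]},
    {"layer": 4, "op": "conv5x5", "skip_from": [1, 3]},
]

C = 16  # illustrative channel count per operation output
concat_channels = {}
for spec in arch:
    # each layer's op output (C channels) is concatenated along depth
    # with one C-channel tensor per skip connection
    concat_channels[spec["layer"]] = C * (1 + len(spec["skip_from"]))
```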
End
And there you have it — a child model generated using the macro search! Now on to micro search. Heads up: micro search isn’t as straightforward as macro search.
1.2 Micro Search
As mentioned earlier, micro search designs modules or building blocks which are then connected together to form the final architecture. ENAS calls these building blocks convolutional cells and reduction cells. Simply put, a convolutional cell or reduction cell is just a block of operations. Both are similar — the only thing different about reduction cells is that the operations are applied with a stride of 2, thus reducing the spatial dimensions.
How to connect these cells to form the final network, you may ask?
The final network
Below is an image that gives you a quick overview of the final generated child model.

Fig. 1.2.1: Overview of the final neural network generated. Image source.
Let’s come back to this in a bit.
Building units for networks derived from micro search
There’s sort of a hierarchy of the ‘building units’ of child networks derived from micro search. From biggest to smallest:
- block
- convolutional cell / reduction cell
- node
A child model consists of several blocks. Each block consists of N convolutional cells and 1 reduction cell, in that order, as mentioned above. Each convolutional/reduction cell comprises B nodes. And each node consists of standard convolutional operations (we’ll see this later). (N and B are hyperparameters that can be tuned by the architect.)
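This hierarchy can be sketched as a skeleton builder (the dictionaries are illustrative placeholders for real cells):

```python
N = 3  # convolutional cells per block (tunable hyperparameter)
B = 4  # nodes per cell (tunable hyperparameter)

def build_child_skeleton(num_blocks, n, b):
    # Each block is n convolutional cells followed by 1 reduction cell.
    blocks = []
    for _ in range(num_blocks):
        cells = [{"type": "conv", "nodes": b} for _ in range(n)]
        cells.append({"type": "reduction", "nodes": b})
        blocks.append(cells)
    return blocks

skeleton = build_child_skeleton(3, N, B)  # a 3-block child model
```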
Below is a child model with 3 blocks. Each block consists of N=3 convolutional cells and 1 reduction cell. The operations within each cell are not shown here.

Fig. 1.2.2: A child model with 3 blocks, each consisting of 3 convolutional cells and 1 reduction cell. Image source.
So how to generate this child model from micro search, you may ask? Continue reading!
Generate a child model from micro search
For this micro search tutorial, let’s build a child model that has 1 block, for simplicity’s sake. Each block (there’s only one though) comprises N=3 convolutional cells and 1 reduction cell. Each cell comprises B=4 nodes. This means our generated child model should look like this:

Fig. 1.2.3: A neural network generated from micro search which has 1 block, consisting of 3 convolutional cells and 1 reduction cell. The individual operations are not shown here.
Let’s now build a convolutional cell!
Fast forward
To explain how to build a convolutional cell, let me take you to a stage where we have already built the first 2 convolutional cells for you. Notice that the last operations from each of these 2 cells are add operations. Let’s just take it for granted for now.

Fig. 1.2.4: Two convolutional cells already built in micro search.
With 2 convolutional cells built for us, let’s move on to the third.
Convolutional Cell #3
Now, let’s ‘prepare’ the third convolutional cell — the cell that you and I will be building together.

Fig. 1.2.5: ‘Preparing’ the third convolutional cell in micro search.
Recall that every convolutional cell consists of 4 nodes. Now you might say: Okay sure so where are these nodes?
The first two nodes — read this very carefully and slowly — are the two previous cells from the current cell — yes, cells. What about the other 2 nodes? These 2 nodes fall in this very convolutional cell that we are building right now. Let’s make known where these nodes are:

Fig. 1.2.6: Identifying the 4 nodes while building Convolutional Cell #3.
From this section onwards, you can safely disregard the ‘Convolutional cell’ labels you see on the image above and concentrate on the ‘Nodes’ labels:
Node 1 — red (Convolutional Cell #1)
Node 2 — blue (Convolutional Cell #2)
Node 3 — green
Node 4 — purple
If you’re wondering if these nodes will change for every convolutional cell we’re building, the answer is yes! Every cell will ‘assign’ the nodes in this manner.
You might also wonder — since we’ve already built the operations in Node 1 and Node 2 (which are Convolutional Cells #1 and #2), what’s there left to build in these nodes? You asked the right question.
Convolutional Cell #3: Node 1 (red) and Node 2 (blue)
For any cell that we’re building, the first 2 nodes do not have to be built; instead, they serve as inputs to the other nodes. In our example, since we are building 4 nodes, Nodes 1 and 2 can be inputs to Nodes 3 and 4. So, yay! We don’t have to do anything for Nodes 1 and 2 and can move on to building Nodes 3 and 4. Phew!
Convolutional Cell #3: Node 3 (Green)
Node 3 is where the building starts. Unlike in macro search, where the controller samples 2 decisions for every layer, here in micro search the controller samples 4 decisions (or rather, 2 sets of decisions):
- 2 nodes to connect to
- the respective 2 operations to perform on the nodes to connect to
With 4 decisions to make, the controller runs 4 time steps. Have a look below:

Fig. 1.2.7: The outputs of the first four controller time steps (2, 1, avg5×5, sep5×5), which will be used to build Node 3.
From the above we see that the controller sampled 2, 1, avg5×5, and sep5×5 across the four time steps. How does this translate to the architecture of the child model? Let’s see:

Fig. 1.2.8: How the outputs of the first four controller time steps (2, 1, avg5×5, sep5×5) are translated to build Node 3.
From the above, there are three things that just happened:
- The output from Node 2 (blue) undergoes the avg5×5 operation.
- The output from Node 1 (red) undergoes a sep5×5 operation.
- Both the results from these two operations undergo an add operation.
The output from this node is the tensor resulting from the add operation. This also explains why Nodes 1 and 2 (Convolutional Cells #1 and #2) end with add operations.
Convolutional Cell #3: Node 4 (Purple)
Now for Node 4. We repeat the same steps, just that the controller now has 3 nodes to choose from (Nodes 1, 2 and 3). Below, the controller generated 3, 1, id and avg3×3.

Fig. 1.2.9: The outputs of the next four controller time steps (3, 1, id, avg3×3), which will be used to build Node 4.
This translates to building the following:

Fig. 1.2.10: How the outputs of the next four controller time steps (3, 1, id, avg3×3) are translated to build Node 4.
What just happened?
- The output from Node 3 (green) undergoes an id operation.
- The output from Node 1 (red) undergoes an avg3×3 operation.
- Both the results from these two operations undergo an add operation.
And that’s it, we’re done with Convolutional Cell #3.
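The two nodes we just built can be traced end to end with a toy numeric sketch, where plain numbers stand in for feature maps and the lambdas are arbitrary stand-ins for the real operations:

```python
# Scalar values stand in for feature maps; the lambdas are arbitrary
# stand-ins for the real convolution/pooling operations.
OPS = {
    "id":     lambda x: x,
    "avg5x5": lambda x: x / 2,
    "sep5x5": lambda x: x * 3,
    "avg3x3": lambda x: x / 4,
}

def node_output(inputs, pick_a, op_a, pick_b, op_b):
    # A node applies one sampled op to each of its two sampled inputs
    # and combines the two results with an add.
    return OPS[op_a](inputs[pick_a]) + OPS[op_b](inputs[pick_b])

inputs = {1: 8.0, 2: 4.0}  # toy outputs of Nodes 1 and 2 (the two previous cells)

# Node 3 sampled (2, 1, avg5x5, sep5x5): 4/2 + 8*3
node3 = node_output(inputs, 2, "avg5x5", 1, "sep5x5")
inputs[3] = node3          # Node 3's output becomes a candidate input

# Node 4 sampled (3, 1, id, avg3x3): 26 + 8/4
node4 = node_output(inputs, 3, "id", 1, "avg3x3")
```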
Reduction Cell
Recall that after every N convolutional cells, we need a reduction cell. Since N=3 in this tutorial and we’ve just finished Convolutional Cell #3, it’s time to build a reduction cell. As mentioned earlier, a reduction cell is sampled in the same way as a convolutional cell, except that each sampled operation is applied with a stride of 2.
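The effect of reduction cells on spatial dimensions can be sketched with simple arithmetic (the 32×32 input size is illustrative):

```python
def spatial_size_after(cells, input_size):
    # Convolutional cells preserve spatial size; each reduction cell
    # applies its sampled operations with stride 2, halving it.
    size = input_size
    for cell in cells:
        if cell == "reduction":
            size //= 2
    return size

one_block = ["conv", "conv", "conv", "reduction"]  # N=3, as in this tutorial
after_one_block = spatial_size_after(one_block, 32)
after_three_blocks = spatial_size_after(one_block * 3, 32)
```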
End
And so that wraps up generating a child model out of the micro search strategy. Phew! I hope that wasn’t too much for you, because it was for me when I first read the paper.
2. Notes
Because this post mainly shows the macro and micro search strategies, I’ve left out many small details (especially on the concept of transfer learning). Let me briefly cover them:
- What’s so ‘efficient’ in ENAS? Answer: transfer learning. If a computation between two nodes has been done (trained) before, the weights from the convolutional filters and 1×1 convolutions (to maintain number of channel outputs; not mentioned in the previous sections) will be reused. This is what makes ENAS faster than its predecessors!
- It is possible that the controller samples a decision where no skip connection is needed.
- There are 6 operations available for the controller: convolutions with filter sizes 3×3 and 5×5, depthwise-separable convolutions with filter sizes 3×3 and 5×5, max pooling and average pooling of kernel size 3×3.
- Do read up on the concatenate operation at the end of each cell, which ties up the ‘loose ends’ of any unused node outputs.
- Do read up briefly on the policy gradient algorithm (REINFORCE) used in reinforcement learning.
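The parameter sharing described in the first note can be sketched as a weight cache keyed by where a weight sits and which operation it implements; the key structure and stand-in 'tensors' here are illustrative, not the paper's exact bookkeeping:

```python
import random

shared = {}  # weights keyed by (position, operation)

def get_weights(position, op, rng):
    # A child model that samples (position, op) reuses any weights a
    # previous child already trained there, instead of re-initialising.
    key = (position, op)
    if key not in shared:
        shared[key] = [rng.random() for _ in range(3)]  # stand-in tensor
    return shared[key]

rng = random.Random(0)
w_first_child = get_weights(2, "sep5x5", rng)
w_later_child = get_weights(2, "sep5x5", rng)  # same decision -> same weights
w_other = get_weights(3, "conv3x3", rng)       # different decision -> new weights
```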
3. Summary
Macro search (for an entire network)
The final child model is as shown below.

Fig. 3.1: Generating a convolutional neural network with macro search.
Micro search (for a convolutional cell)
Note that only part of the final child model is shown here.

Fig. 3.2: Generating a convolutional neural network with micro search. Only part of the full architecture is shown.
4. Implementations
5. References
Efficient Neural Architecture Search via Parameter Sharing (Pham et al., 2018)
Neural Architecture Search with Reinforcement Learning (Zoph and Le, 2016)
Learning Transferable Architectures for Scalable Image Recognition (Zoph et al., 2017)
That’s it! Remember to read the ENAS paper Efficient Neural Architecture Search via Parameter Sharing. If you have any questions, please highlight and leave a comment.
Special thanks to Ren Jie Tan, Derek, and Yu Xuan Tay for ideas, suggestions and corrections to this article.