1. Introduction


In this work, inspired by metric learning based on deep neural features and by memory-augmented neural networks, the authors propose Matching Networks, which map a small labelled support set and an unlabelled example to its label. They then define one-shot learning problems on vision and language tasks and obtain improved one-shot accuracy on ImageNet and Omniglot. The novelty of their work is twofold: at the modelling level, and at the level of the training procedure.

2. Model


Their non-parametric approach to one-shot learning rests on two components. First, the model architecture follows recent advances in neural networks augmented with memory: given a support set $S$, the model defines a function $c_S$ (a classifier) for each $S$. Second, they employ a training strategy tailored to one-shot learning from the support set $S$.

2.1 Model Architecture

Matching Networks are able to produce sensible test labels for unobserved classes without any changes to the network. We wish to map from a support set of $k$ image-label pairs $S=\{(x_i,y_i)\}_{i=1}^k$ to a classifier $c_S(\hat{x})$ which, given a test example $\hat{x}$, defines a probability distribution over outputs $\hat{y}$. Furthermore, define the mapping $S\rightarrow c_S(\hat{x})$ to be $P(\hat{y} \mid \hat{x},S)$, where $P$ is parameterised by a neural network. Thus, when given a new support set of examples $S'$ from which to one-shot learn, we simply use the parametric neural network defined by $P$ to make predictions about the appropriate label $\hat{y}$ for each test example $\hat{x}$: $P(\hat{y} \mid \hat{x},S')$. In general, the predicted output class for a given unseen input example $\hat{x}$ and a support set $S$ becomes $\arg\max_y P(y\mid \hat{x},S)$. The model in its simplest form computes $\hat{y}$ as follows:

$$ \hat{y}=\sum_{i=1}^k a(\hat{x},x_i)y_i $$

where $x_i,y_i$ are the samples and labels from the support set $S=\{(x_i,y_i)\}_{i=1}^k$, and $a$ is an attention mechanism. Here, the attention kernel is a softmax over cosine similarities: $$ a(\hat{x},x_i)=\frac{e^{c(f(\hat{x}),g(x_i))}}{\sum_{j=1}^k e^{c(f(\hat{x}),g(x_j))}} $$ where $c$ denotes cosine similarity and the embedding functions $f$ and $g$ are appropriate neural networks for embedding $\hat{x}$ and $x_i$.
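To make the prediction rule concrete, here is a minimal NumPy sketch of the weighted-label computation above. The function name `matching_net_predict` and the toy data are illustrative only: in the paper, $f$ and $g$ are deep networks trained end-to-end, whereas here the embeddings are simply given vectors.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity c(u, v) between two embedding vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)

def matching_net_predict(query_emb, support_embs, support_labels_onehot):
    """Compute y_hat = sum_i a(x_hat, x_i) * y_i with a softmax-cosine kernel.

    query_emb:             f(x_hat), shape (d,)
    support_embs:          g(x_i) for each support example, shape (k, d)
    support_labels_onehot: one-hot labels y_i, shape (k, n_classes)
    """
    # Softmax over cosine similarities -- the attention kernel a(x_hat, x_i).
    sims = np.array([cosine_similarity(query_emb, e) for e in support_embs])
    attn = np.exp(sims) / np.exp(sims).sum()
    # Convex combination of the support labels gives the output distribution.
    return attn @ support_labels_onehot

# Toy usage: 3 support examples of 2 classes, 4-d embeddings (all hypothetical).
rng = np.random.default_rng(0)
support = rng.normal(size=(3, 4))
labels = np.eye(2)[[0, 1, 0]]                   # classes of the 3 support points
query = support[0] + 0.1 * rng.normal(size=4)   # query near the first support point
print(matching_net_predict(query, support, labels))  # typically most mass on class 0
```

Since the output is a convex combination of one-hot labels, it is automatically a valid probability distribution, and the $\arg\max$ rule above reduces to picking the class with the largest attention mass.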

2.2 Training Strategy

Let us define a task $T$ as a distribution over possible label sets $L$. To form an "episode" for computing gradients and updating the model, we first sample $L$ from $T$ (e.g., $L$ could be the label set {cats; dogs}). We then use $L$ to sample the support set $S$ and a batch $B$ (i.e., both $S$ and $B$ are labelled examples of cats and dogs). The Matching Net is then trained to minimise the error when predicting the labels in the batch $B$ conditioned on the support set $S$. This is a form of meta-learning, since the training procedure explicitly learns to learn from a given support set so as to minimise a loss over a batch. More precisely, the Matching Nets training objective is as follows:

$$ \theta = \arg\max_{\theta} E_{L\sim T}\Big[E_{S\sim L,B\sim L}\Big[\sum_{(x,y)\in B}\log P_{\theta}(y\mid x,S)\Big]\Big] $$

Training $\theta$ with this objective yields a model that works well when sampling $S'\sim T'$ from a different distribution of novel labels.
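Below is a minimal sketch of how one episode might be constructed, assuming the data are stored as a dict from class name to a list of examples. The helper `sample_episode` and the parameters `n_way`, `k_shot`, and `batch_size` are illustrative names, not from the paper.

```python
import random

def sample_episode(data_by_class, n_way=2, k_shot=1, batch_size=4):
    """One meta-learning episode: sample a label set L from the task T,
    then a support set S and a batch B that both use only labels in L.

    data_by_class: dict mapping class name -> list of examples.
    """
    # L ~ T: pick a small label set, e.g. {"cats", "dogs"}.
    L = random.sample(sorted(data_by_class), n_way)
    support, batch = [], []
    for label in L:
        examples = random.sample(data_by_class[label], k_shot + batch_size // n_way)
        support += [(x, label) for x in examples[:k_shot]]   # S ~ L
        batch   += [(x, label) for x in examples[k_shot:]]   # B ~ L
    return support, batch

# Training would loop over episodes: sample (S, B), compute the loss
# -sum_{(x,y) in B} log P_theta(y | x, S), and take a gradient step.
toy = {"cats": list(range(10)), "dogs": list(range(10)), "birds": list(range(10))}
S, B = sample_episode(toy)
print("support:", S)
print("batch:  ", B)
```

The key design choice is that train and test conditions match: the model never sees a "global" classification problem, only small support-conditioned ones, which is exactly the regime it faces at one-shot test time.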

3. Appendix


3.1 The Fully Conditional Embedding $f$

The embedding function for an example $\hat{x}$ in the batch $B$ is as follows:

$$ f(\hat{x},S)=\mathrm{attLSTM}(f'(\hat{x}),g(S),K) $$

where $f'$ is a neural network, $K$ is a fixed number of "processing" steps (following prior work on processing sets with attention), and $g(S)$ denotes the embedding function $g$ applied to each element $x_i$ of the set $S$. Thus, the state after $k$ processing steps is as follows:

$$ \hat{h}_k,c_k = \mathrm{LSTM}(f'(\hat{x}),[h_{k-1},r_{k-1}],c_{k-1}) $$

$$ h_k = \hat{h}_k+f'(\hat{x}) $$

$$ r_{k-1}=\sum_{i=1}^{|S|}a(h_{k-1},g(x_i))g(x_i) $$

$$ a(h_{k-1},g(x_i))=\mathrm{softmax}\big(h_{k-1}^{T}g(x_i)\big) $$
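A hedged PyTorch sketch of these $K$ processing steps follows. Note one simplification: the equations condition the LSTM on the concatenation $[h_{k-1}, r_{k-1}]$, whereas this sketch folds the read vector into the hidden state by addition so that all dimensions stay at $d$; the cell and the random toy inputs are illustrative, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def fce_f(f_prime_x, g_S, K, lstm_cell):
    """Fully conditional embedding f(x_hat, S) via K attention-LSTM steps.

    f_prime_x: f'(x_hat), base embedding of the query, shape (d,)
    g_S:       g(x_i) stacked for the support set, shape (|S|, d)
    lstm_cell: torch.nn.LSTMCell(input_size=d, hidden_size=d)
    """
    d = f_prime_x.shape[0]
    h = torch.zeros(1, d)   # h_0
    c = torch.zeros(1, d)   # c_0
    r = torch.zeros(1, d)   # r_0, the read vector over the support set
    x = f_prime_x.unsqueeze(0)
    for _ in range(K):
        # Paper conditions on [h_{k-1}, r_{k-1}]; adding r into the hidden
        # state is a simplification that keeps every dimension at d.
        h_hat, c = lstm_cell(x, (h + r, c))
        h = h_hat + x                          # h_k = h_hat_k + f'(x_hat)
        attn = F.softmax(h @ g_S.t(), dim=1)   # a(h_k, g(x_i)) over the set
        r = attn @ g_S                         # r_k = sum_i a(.) g(x_i)
    return h.squeeze(0)

# Toy usage with random embeddings (d = 8, |S| = 5, K = 3).
d, K = 8, 3
cell = torch.nn.LSTMCell(d, d)
out = fce_f(torch.randn(d), torch.randn(5, d), K, cell)
print(out.shape)  # torch.Size([8])
```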

3.2 The Fully Conditional Embedding $g$

The encoding function for the elements of the support set $S$, $g(x_i,S)$, is defined as a bidirectional LSTM. Let $g'(x_i)$ be a neural network; then we define $g(x_i,S)=\vec{h}_i+h_i^{\leftarrow}+g'(x_i)$ with:

$$ \vec{h}_i,\vec{c}_i=\mathrm{LSTM}(g'(x_i),\vec{h}_{i-1},\vec{c}_{i-1}) $$

$$ h_i^{\leftarrow},c_i^{\leftarrow}=\mathrm{LSTM}(g'(x_i),h_{i+1}^{\leftarrow},c_{i+1}^{\leftarrow}) $$
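A short PyTorch sketch of this bidirectional encoding, assuming the base embeddings $g'(x_i)$ are already stacked in support-set order; `fce_g` is an illustrative name, and `torch.nn.LSTM` with `bidirectional=True` plays the role of the forward and backward LSTMs in the two equations above.

```python
import torch

def fce_g(g_prime_S, lstm):
    """Fully conditional embedding g(x_i, S) over the whole support set.

    g_prime_S: g'(x_i) stacked in support-set order, shape (|S|, d)
    lstm:      torch.nn.LSTM(d, d, bidirectional=True), which runs one
               forward and one backward pass over the support sequence
    """
    seq = g_prime_S.unsqueeze(1)   # (|S|, batch=1, d)
    out, _ = lstm(seq)             # (|S|, 1, 2d): [forward; backward] states
    h_fwd, h_bwd = out.squeeze(1).chunk(2, dim=1)
    # g(x_i, S) = h_fwd_i + h_bwd_i + g'(x_i): a skip connection around the biLSTM,
    # so each support embedding is conditioned on the rest of the set.
    return h_fwd + h_bwd + g_prime_S

# Toy usage (d = 8, |S| = 5).
d = 8
lstm = torch.nn.LSTM(input_size=d, hidden_size=d, bidirectional=True)
emb = fce_g(torch.randn(5, d), lstm)
print(emb.shape)  # torch.Size([5, 8])
```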

Reference: https://arxiv.org/abs/1606.04080
