1. Introduction


In this work, inspired by metric learning based on deep neural features and by memory-augmented neural networks, the authors propose Matching Networks, which map a small labelled support set and an unlabelled example to its label. They then define one-shot learning problems on vision and language tasks and report improved one-shot accuracy on ImageNet and Omniglot. The novelty of their work is twofold: at the modeling level, and in the training procedure.

2. Model


Their non-parametric approach to one-shot learning is based on two components. First, the model architecture follows recent advances in neural networks augmented with memory: given a support set $S$, the model defines a function $c_S$ (a classifier) for each $S$. Second, they employ a training strategy tailored for one-shot learning from the support set $S$.

2.1 Model Architecture

Matching Networks are able to produce sensible test labels for unobserved classes without any changes to the network. We wish to map from a support set of $k$ image-label pairs $S=\{(x_i,y_i)\}_{i=1}^k$ to a classifier $c_S(\hat{x})$ which, given a test example $\hat{x}$, defines a probability distribution over outputs $\hat{y}$. Furthermore, define the mapping $S\rightarrow c_S(\hat{x})$ to be $P(\hat{y} \mid \hat{x},S)$, where $P$ is parameterised by a neural network. Thus, when given a new support set of examples $S'$ from which to one-shot learn, we simply use the parametric neural network defined by $P$ to make predictions about the appropriate label $\hat{y}$ for each test example $\hat{x}$: $P(\hat{y} \mid \hat{x},S')$. In general, the predicted output class for an unseen input example $\hat{x}$ and a support set $S$ is $\arg\max_y P(y\mid \hat{x},S)$. The model in its simplest form computes $\hat{y}$ as follows:

$$ \hat{y}=\sum_{i=1}^k a(\hat{x},x_i)y_i $$

where $x_i,y_i$ are the samples and labels from the support set $S=\{(x_i,y_i)\}_{i=1}^k$, and $a$ is an attention mechanism. Here, the attention kernel is the softmax over the cosine similarity $c$ (called the cosine distance in the paper):

$$ a(\hat{x},x_i)=\frac{e^{c(f(\hat{x}),g(x_i))}}{\sum_{j=1}^k e^{c(f(\hat{x}),g(x_j))}} $$

where the embedding functions $f$ and $g$ are appropriate neural networks that embed $\hat{x}$ and $x_i$, respectively.
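The prediction rule and the attention kernel can be illustrated with a minimal sketch in plain Python. It assumes the inputs are already embedded (so $f$ and $g$ act as the identity here) and that labels are integer class indices standing in for one-hot vectors $y_i$. The function names `cosine`, `attention`, and `predict` are hypothetical and do not come from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity c(u, v) between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # small constant avoids division by zero

def attention(query, support):
    """Softmax over cosine similarities: a(x_hat, x_i) for every support item."""
    scores = [cosine(query, s) for s in support]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(query, support, labels, num_classes):
    """y_hat = sum_i a(x_hat, x_i) * y_i, with y_i treated as one-hot vectors."""
    weights = attention(query, support)
    y_hat = [0.0] * num_classes
    for w, label in zip(weights, labels):
        y_hat[label] += w
    return y_hat

# Toy support set: two examples of class 0 and one of class 1 (assumed data).
support = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
labels = [0, 0, 1]
probs = predict([0.8, 0.2], support, labels, num_classes=2)
predicted_class = max(range(2), key=lambda c: probs[c])
```

Because the attention weights sum to one and the labels are one-hot, `probs` is itself a distribution over classes, which is why the arg max over it gives the predicted label.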

2.2 Training Strategy

Let us define a task $T$ as a distribution over possible label sets $L$. To form an "episode" to compute gradients and update the model, we first sample $L$ from $T$ (e.g., $L$ could be the label set {cats, dogs}). We then use $L$ to sample the support set $S$ and a batch $B$ (i.e., both $S$ and $B$ are labelled examples of cats and dogs). The Matching Net is then trained to minimise the error in predicting the labels in the batch $B$ conditioned on the support set $S$. This is a form of meta-learning, since the training procedure explicitly learns to learn from a given support set to minimise a loss over a batch. More precisely, the Matching Nets training objective is as follows:

$$ \theta = \arg\max_{\theta}E_{L\sim T}\Big[E_{S\sim L,B\sim L}\Big[\sum_{(x,y)\in B}\log P_{\theta}(y\mid x,S)\Big]\Big] $$

Training $\theta$ with this objective yields a model that works well when the support set $S'\sim T'$ is sampled from a different distribution of novel labels.
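The episode construction can be sketched as follows. This is an illustration under stated assumptions, not the authors' training code: the helper names (`sample_episode`, `episode_log_likelihood`, `uniform_proba`) are hypothetical, the dataset is a toy dictionary, $S$ and $B$ are kept disjoint by choice, and a uniform predictor stands in for $P_\theta$ so that the objective value can be checked by hand.

```python
import math
import random

def sample_episode(data_by_class, n_way, k_shot, batch_per_class, rng):
    """Sample a label set L, then a support set S and a batch B from L."""
    label_set = rng.sample(sorted(data_by_class), n_way)
    support, batch = [], []
    for label in label_set:
        # Draw disjoint items for S and B (an assumption of this sketch).
        items = rng.sample(data_by_class[label], k_shot + batch_per_class)
        support += [(x, label) for x in items[:k_shot]]
        batch += [(x, label) for x in items[k_shot:]]
    return label_set, support, batch

def episode_log_likelihood(batch, support, predict_proba):
    """Sum of log P(y | x, S) over the batch: the quantity being maximised."""
    return sum(math.log(predict_proba(x, support)[y]) for x, y in batch)

def uniform_proba(x, support):
    """Placeholder for P_theta: uniform over the labels present in S."""
    present = sorted({y for _, y in support})
    return {y: 1.0 / len(present) for y in present}

# Toy data: integers stand in for images (assumed, for illustration only).
data_by_class = {
    "cat": [0, 1, 2, 3, 4],
    "dog": [10, 11, 12, 13, 14],
    "bird": [20, 21, 22, 23, 24],
}
rng = random.Random(0)
label_set, support, batch = sample_episode(
    data_by_class, n_way=2, k_shot=1, batch_per_class=2, rng=rng
)
objective = episode_log_likelihood(batch, support, uniform_proba)
```

In actual training, the expectation in the objective would be approximated by averaging this quantity over many sampled episodes and following its gradient with respect to $\theta$.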

3. Appendix


3.1 The Fully Conditional Embedding $f$

The embedding function for an example $\hat{x}$ in the batch $B$ is as follows:

$$ f(\hat{x},S)=attLSTM(f'(\hat{x}),g(S),K) $$

where $f'$ is a neural network and $K$ is the fixed number of "processing" steps for which the LSTM is unrolled. $g(S)$ denotes the embedding function $g$ applied to each element $x_i$ of the set $S$. The state after $k$ processing steps is as follows:

$$ \hat{h}_k,c_k = LSTM(f'(\hat{x}),[h_{k-1},r_{k-1}],c_{k-1}) $$

$$ h_k = \hat{h}_k+f'(\hat{x}) $$

$$ r_{k-1}=\sum_{i=1}^{|S|}a(h_{k-1},g(x_i))g(x_i) $$

$$ a(h_{k-1},g(x_i))=softmax(h_{k-1}^Tg(x_i)) $$
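The four equations above can be traced with a small sketch in plain Python. Several details are assumptions of the sketch rather than statements from the source: the initial states $h_0$ and $c_0$ are zero, the concatenation $[h_{k-1}, r_{k-1}]$ is fed to a standard LSTM cell as its hidden state, the cell output $\hat{h}_k$ has the same dimension as $f'(\hat{x})$, and the weights are random. All function names are hypothetical.

```python
import math
import random

def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def lstm_cell(x, hidden, c, W, b):
    """Standard LSTM cell; W has 4d rows and len(x) + len(hidden) columns."""
    d = len(c)
    v = x + hidden  # list concatenation of input and hidden state
    z = [sum(w * u for w, u in zip(row, v)) + bias for row, bias in zip(W, b)]
    i = [sigmoid(t) for t in z[:d]]            # input gate
    f = [sigmoid(t) for t in z[d:2 * d]]       # forget gate
    o = [sigmoid(t) for t in z[2 * d:3 * d]]   # output gate
    g = [math.tanh(t) for t in z[3 * d:]]      # candidate cell update
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new

def read_out(h, g_S):
    """r = sum_i a(h, g(x_i)) g(x_i), with a = softmax(h^T g(x_i))."""
    weights = softmax([sum(p * q for p, q in zip(h, g_i)) for g_i in g_S])
    return [sum(w * g_i[j] for w, g_i in zip(weights, g_S)) for j in range(len(h))]

def att_lstm(f_prime_x, g_S, K, W, b):
    """Run K processing steps and return h_K as the embedding f(x_hat, S)."""
    d = len(f_prime_x)
    h, c = [0.0] * d, [0.0] * d  # zero initial state (assumption)
    for _ in range(K):
        r = read_out(h, g_S)                              # r_{k-1}
        h_hat, c = lstm_cell(f_prime_x, h + r, c, W, b)   # h_hat_k, c_k
        h = [p + q for p, q in zip(h_hat, f_prime_x)]     # skip connection
    return h

rng = random.Random(0)
d = 3
W = [[rng.uniform(-0.1, 0.1) for _ in range(3 * d)] for _ in range(4 * d)]
b = [0.0] * (4 * d)
g_S = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(4)]
f_prime_x = [0.5, -0.2, 0.1]
embedding = att_lstm(f_prime_x, g_S, K=2, W=W, b=b)
```

The skip connection $h_k = \hat{h}_k + f'(\hat{x})$ means that with all-zero weights the output reduces to $f'(\hat{x})$ itself, which is a convenient sanity check for an implementation.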

3.2 The Fully Conditional Embedding $g$

The encoding function for the elements of the support set $S$, $g(x_i,S)$, is a bidirectional LSTM. Let $g'(x_i)$ be a neural network; then we define $g(x_i,S)=\vec{h}_i+h_i^{\leftarrow}+g'(x_i)$ with:

$$ \vec{h}_i,\vec{c}_i=LSTM(g'(x_i),\vec{h}_{i-1},\vec{c}_{i-1}) $$

$$ h_i^{\leftarrow},c_i^{\leftarrow}=LSTM(g'(x_i),h_{i+1}^{\leftarrow},c_{i+1}^{\leftarrow}) $$
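A minimal sketch of this bidirectional embedding, again in plain Python, is given below. It assumes zero initial states at both ends of the sequence, separate weights for the two directions, and hidden states of the same dimension as $g'(x_i)$; none of these details are stated in the text above. The names `run_direction` and `embed_support` are hypothetical.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def lstm_cell(x, h, c, W, b):
    """Standard LSTM cell; W has 4d rows and len(x) + len(h) columns."""
    d = len(c)
    v = x + h  # list concatenation of input and hidden state
    z = [sum(w * u for w, u in zip(row, v)) + bias for row, bias in zip(W, b)]
    i = [sigmoid(t) for t in z[:d]]
    f = [sigmoid(t) for t in z[d:2 * d]]
    o = [sigmoid(t) for t in z[2 * d:3 * d]]
    g = [math.tanh(t) for t in z[3 * d:]]
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new

def run_direction(inputs, W, b):
    """Run an LSTM over the sequence and collect the hidden state at each step."""
    d = len(inputs[0])
    h, c = [0.0] * d, [0.0] * d  # zero initial state (assumption)
    states = []
    for x in inputs:
        h, c = lstm_cell(x, h, c, W, b)
        states.append(h)
    return states

def embed_support(g_prime_S, W_fwd, b_fwd, W_bwd, b_bwd):
    """g(x_i, S) = forward h_i + backward h_i + g'(x_i) for every i."""
    fwd = run_direction(g_prime_S, W_fwd, b_fwd)
    # The backward pass reads the sequence reversed; flip its states back.
    bwd = run_direction(g_prime_S[::-1], W_bwd, b_bwd)[::-1]
    return [
        [p + q + x for p, q, x in zip(hf, hb, gx)]
        for hf, hb, gx in zip(fwd, bwd, g_prime_S)
    ]

rng = random.Random(1)
d = 2

def random_weights():
    return [[rng.uniform(-0.1, 0.1) for _ in range(2 * d)] for _ in range(4 * d)]

g_prime_S = [[0.3, -0.1], [0.0, 0.4], [-0.2, 0.2]]
bias = [0.0] * (4 * d)
g_S = embed_support(g_prime_S, random_weights(), bias, random_weights(), bias)
```

Each $g(x_i, S)$ depends on the whole support set through the two recurrent passes, so the result also depends on the order in which the support examples are presented.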

Reference: https://arxiv.org/abs/1606.04080
