Matching Networks for One Shot Learning
1. Introduction
In this work, inspired by metric learning based on deep neural features and memory augment neural networks, authors propose matching networks that map a small labelled support set and an unlabelled example to its label. Then they define one-shot learning problems on vision and language tasks and obtain an improving one-shot accuracy on ImageNet and Omnight. The novelty of their work is twofold: at the modeling level, and at the training procedure.
2. Model
Their non-parametric approach to solving one-shot is based on two components. First, the model architecture follows recent advances in neural networks augmented with memory. Given a support set $S$, the model difines a function $c_S$(or classifier) for each $S$ Sencond, we employ a training strategy which is tailored for one-shot learning from the support set $S$

2.1 Model Architecture
Matching Networks are able to produce sensible test labels for unobserved classes without any changes to the network. We wish to map from a support set of $k$ examples of images-label pairs $S={(x_i,y_i)}_{i=1}^k$ to a classfier $c_S(\hat{x})$ which,given a test example $\hat{x}$, defines a probability distribution over outputs $\hat{y}$. Furthmore, difine the mapping $S\rightarrow c_S(\hat{x})$ to be $P(\hat{y} \mid \hat{x},S)$ where $P$ is parameterised by a neural network. Thus, When given a new support set of examples $S'$ from which to one-shot learn, we simply use the parametric neural network defined by $P$ to make predictions about the appropriate label $\hat{y}$ for each test example $\hat{x}$: $P(\hat{y} \mid \hat{x},S')$. In general, our predicted output class for a given input unseen example $\hat{x}$ and a support set $S$ becomes $arg \max_y P(y\mid \hat{x},S)$. The model in its simplest form computes $\hat{y}$ as follows:
$$ \hat{y}=\sum_{i=1}^k a(\hat{x},x_i)y_i $$
where $x_i,y_i$ are the samples and labels from the support set $S=\{(x_i,y_i)\}_{i=1}^k$, and $a$ is an attention mechanism. Here,the attention kernel function is the softmax over the cosine distance. $$ a(\hat{x},x_i)=\frac{e^{c(f(\hat{x}),g(x_i))}}{\sum_{j=1}^k e^{c(f(\hat{x}),g(x_j))}} $$ where embeding functions $f$ and $g$ are, actually, appropriate neural networks to embed $\hat{x}$ and $x_i$
2.2 Training Strategy
Let us define a tast $T$ as distribution over possible label sets $L$. To form an “episode” to compute gradients and update our model, we first sample $L$ from $T$(e.g.,$L$ could be the label set {cats; dogs}). We then use $L$ to sample the support set $S$ and a batch $B$ (i.e., both $S$ and $B$ are labelled examples of cats and dogs). The Matching Net is then trained to minimise the error predicting the labels in the batch B conditioned on the support set $S$. This is a form of meta-learning since the training procedure explicitly learns to learn from a given support set to minimise a loss over a batch. More precisely, the Matching Nets training objective is as follows:
$$ \theta = arg\max_{\theta}E_{L\sim T}\Big[E_{S\sim L,B\sim L}\Big[\sum_{(x,y)\in B}\log P_{\theta}(y\mid x,S)\Big]\Big] $$
Training $\theta$ with this objective function yields a model which works well when sampling $S'\sim T'$ from a different distribution of novel labels
3. Appendix
3.1 The Fully Conditional Embedding $f$
The embedding function for an example $\hat{x}$ in the batch $B$ is as follows:
$$ f(\hat{x},S)=attLSTM(f'(\hat{x}),g(S),K) $$
where $f'$ is a neural network. $K$ is the number of "processing" steps following work. $g(S)$ represents the embedding function $g$ applied to each element $x_i$ from the set $S$. Thus, the state after $k$ processing steps is as follows:
$$ \hat{h}_k,c_k = LSTM(f'(\hat{x}),[h_{k-1},r_{k-1}],c_{k-1}) $$
$$ h_k = \hat{h}_k+f'(\hat{x}) $$
$$ r_{k-1}=\sum_{i=1}^{|S|}a(h_{k-1},g(x_i))g(x_i) $$
$$ a(h_{k-1},g(x_i))=softmax(h_{k-1}^Tg(x_i)) $$
3.2 The Fully Conditional Embedding $g$
The encoding function for the elements in the support set $S$, $g(x_i,S)$ as a bidirectional LSTM. Let g'(x_i) be a neural network, then we difine $g(x_i,S)=\vec{h}_i+h_i^{\leftarrow}+g'(x_i)$ with:
$$ \vec{h}_i,\vec{c}_i=LSTM(g'(x_i),\vec{h}_{i-1},\vec{c}_{i-1}) $$
$$ h_i^{\leftarrow},c_i^{\leftarrow}=LSTM(g'(x_i),h_{i+1}^{\leftarrow},c_{i+1}^{\leftarrow}) $$
Reference: https://arxiv.org/abs/1606.04080
Matching Networks for One Shot Learning的更多相关文章
- (转)Paper list of Meta Learning/ Learning to Learn/ One Shot Learning/ Lifelong Learning
Meta Learning/ Learning to Learn/ One Shot Learning/ Lifelong Learning 2018-08-03 19:16:56 本文转自:http ...
- Multi-attention Network for One Shot Learning
Multi-attention Network for One Shot Learning 2018-05-15 22:35:50 本文的贡献点在于: 1. 表明类别标签信息对 one shot l ...
- (六)6.11 Neurons Networks implements of self-taught learning
在machine learning领域,更多的数据往往强于更优秀的算法,然而现实中的情况是一般人无法获取大量的已标注数据,这时候可以通过无监督方法获取大量的未标注数据,自学习( self-taught ...
- 论文笔记系列-Speeding Up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves
I. 背景介绍 1. 学习曲线(Learning Curve) 我们都知道在手工调试模型的参数的时候,我们并不会每次都等到模型迭代完后再修改超参数,而是待模型训练了一定的epoch次数后,通过观察学习 ...
- CS229 6.11 Neurons Networks implements of self-taught learning
在machine learning领域,更多的数据往往强于更优秀的算法,然而现实中的情况是一般人无法获取大量的已标注数据,这时候可以通过无监督方法获取大量的未标注数据,自学习( self-taught ...
- 零样本学习 - (Zero shot learning,ZSL)
https://zhuanlan.zhihu.com/p/41846072 https://zhuanlan.zhihu.com/p/38418698 https://zhuanlan.zhihu.c ...
- 18 Issues in Current Deep Reinforcement Learning from ZhiHu
深度强化学习的18个关键问题 from: https://zhuanlan.zhihu.com/p/32153603 85 人赞了该文章 深度强化学习的问题在哪里?未来怎么走?哪些方面可以突破? 这两 ...
- (zhuan) Where can I start with Deep Learning?
Where can I start with Deep Learning? By Rotek Song, Deep Reinforcement Learning/Robotics/Computer V ...
- Few-Shot/One-Shot Learning
Few-Shot/One-Shot Learning指的是小样本学习,目的是克服机器学习中训练模型需要海量数据的问题,期望通过少量数据即可获得足够的知识. Matching Networks for ...
随机推荐
- 开机自启动 centos 7
开机自启动,写入必要的命令即可.vim /etc/rc.d/rc.local
- gdb调试嵌入式环境搭建
1.下载gdb源代码 http://ftp.gnu.org/gnu/gdb/ 2.编译 解压#tar zxvf gdb-7.9.1.tar.gz,cd到解压的目录中. 2.1编译arm-linux-g ...
- 流媒体压力测试rtmp&hls(含推流和拉流)
http://blog.csdn.net/sinat_34194127/article/details/50816045 [root@localhost ~]# yum install git unz ...
- JNDI在Spring和tomcat下的使用
1. 是什么 JNDI是 Java 命名与目录接口(Java Naming and Directory Interface),在J2EE规范中是重要的规范之一.JNDI 在 J2EE 中的角色就是&q ...
- MySQL视图-(视图创建,修改,删除,查看,更新数据)
视图是一种虚拟存在的表,对于使用视图的用户来说基本上是透明的.视图并不在数据库中实际存在,行和列数据来自定义视图的查询总使用的表,并且是在使用视图时动态生成的. 视图相对于普通表的优势: 简单:使用视 ...
- 异常处理,MD5
异常处理. try except raise try: 代码 except 异常类: 除了错, 如何处理异常 except 异常类: 除了错, 如何处理异常 except 异常类: 除了错, 如何处理 ...
- J2EE十三个技术规范
从事Java开发的童鞋都知道,java是一种非常棒的语言,能够实现跨平台运行.它屏蔽了具体的平台环境的要求,也就是说,无论是windows,还是Unix.Linux系统,只要支持Java虚拟机,就可以 ...
- Tomcat 配置详解和优化
2018年01月09日 18:14:41 tianxiaojun2014 阅读数:306 转自:https://www.cnblogs.com/xbq8080/p/6417671.html htt ...
- 转载:oracle 启动过程--oracle深入研究
Oracle数据库的启动-nomount状态深入解析 通常所说的Oracle Server主要由两个部分组成:Instance和Database.Instance是指一组后台进程(在Windows上是 ...
- python入门(三):循环
1.for i in xxx xxx: 序列(列表,元祖,字符串) xxx: 可迭代对象 >>> for i in "abc": ... print(i) ...