Paper Notes: Visual Question Answering as a Meta Learning Task

Visual Question Answering as a Meta Learning Task
ECCV 2018

2018-09-13 19:58:08

Paper: http://openaccess.thecvf.com/content_ECCV_2018/papers/Damien_Teney_Visual_Question_Answering_ECCV_2018_paper.pdf

1. Introduction:

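This paper proposes a new approach to VQA that brings in meta-learning: by supplying a support set, the neural network learns to learn.

The core technical contribution is adapting a state-of-the-art VQA model to the meta-learning setting. The resulting model is a deep neural network that uses dynamic parameters, also known as fast weights, which are determined at test time from the support set.

One capability of the resulting system is that it learns to produce completely novel answers (answers never seen in the training data). Another is better handling of rare answers, which matters because the answer distribution in VQA is severely class-imbalanced.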

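The contributions of the paper are:

1. VQA is cast as a meta-learning problem: a support set is provided at test time for the model to imitate;

2. A neural network architecture and a training procedure suited to the meta-learning scenario are described;

3. The model can produce novel answers, handles rare answers well, and is more sample-efficient.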

2. VQA in a Meta Learning Setting :

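1) Conventional VQA model:

　　Given an image I and a question Q, the model predicts an answer from the answer set A;

2) Extension to the meta-learning setting:

　　The model is additionally given a support set S. The support set can include novel examples S' provided at test time: S = T ∪ S', where T is the training set.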
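To make the relation S = T ∪ S' concrete, here is a minimal sketch in plain Python. The `Example` record, the `image_id` field (standing in for image features), and the toy questions are all hypothetical and not from the paper; the point is only that examples added at test time can carry answers that never occurred in the training set.

```python
from typing import NamedTuple


class Example(NamedTuple):
    """One supervised VQA example (hypothetical record layout)."""
    question: str
    image_id: str  # hypothetical identifier standing in for image features
    answer: str


def build_support_set(training_set, novel_examples):
    """Return S = T ∪ S', dropping exact duplicates while keeping order."""
    seen = set()
    support = []
    for ex in list(training_set) + list(novel_examples):
        if ex not in seen:
            seen.add(ex)
            support.append(ex)
    return support


# T: examples available during training.
T = [
    Example("What color is the bus?", "img_001", "red"),
    Example("How many dogs are there?", "img_002", "two"),
]
# S': novel examples supplied only at test time.
S_prime = [Example("What animal is this?", "img_900", "okapi")]

S = build_support_set(T, S_prime)

# Answers that appear in the support set but never in the training set.
novel_answers = {ex.answer for ex in S} - {ex.answer for ex in T}
```

In this toy setup, any answer in `novel_answers` could only be produced by a model that actually exploits the support set at test time.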

3. Proposed Model:

The authors split the VQA system into two parts. The first is perception: the embedding part that encodes the input question and image. The second is the classifier part that handles the reasoning and the actual question answering.

3.1. Non-linear mapping $f_{\theta}(\cdot)$:

The role of the non-linear mapping is to map the question/image embedding h to a representation suitable for the classifier that follows.

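Following the setup of [34], the paper uses a gated hyperbolic tangent layer, defined as:

$$f_{\theta}(h) = \sigma(W h + b) \circ \tanh(W' h + b')$$

where $\sigma$ is the logistic activation function, W, W', b, b' are learnable parameters, and $\circ$ denotes the element-wise product. These parameters are referred to collectively as $\theta$. Conventional methods train them with backpropagation and gradient descent, which yields static parameters. The method proposed here instead adapts the parameters at test time, depending on the input h and the available support set. Concretely, it uses static parameters $\theta^s$ together with dynamic parameters $\theta^d$ determined at test time, combined linearly as:

$$\theta = \theta^s + w\,\theta^d$$

where w is a vector of learned weights. The dynamic weights can therefore be seen as an adjustment made to the static ones depending on the input h.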
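The following is a minimal, dependency-free sketch of such a layer, assuming the usual gated tanh form $\sigma(Wh + b) \circ \tanh(W'h + b')$ and assuming that the combination $\theta^s + w\,\theta^d$ is applied element-wise over a flattened parameter vector. The helper names (`affine`, `unpack`, `combine`) and the flat parameter layout are illustrative choices, not the authors' implementation.

```python
import math


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, b, h):
    """Compute W h + b for a matrix W (list of rows), bias b, and vector h."""
    return [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i
            for row, b_i in zip(W, b)]


def gated_tanh(h, W, b, W_prime, b_prime):
    """f_theta(h) = sigmoid(W h + b) * tanh(W' h + b'), element-wise."""
    gate = [sigmoid(z) for z in affine(W, b, h)]
    value = [math.tanh(z) for z in affine(W_prime, b_prime, h)]
    return [g * v for g, v in zip(gate, value)]


def combine(theta_s, theta_d, w):
    """theta = theta_s + w * theta_d, element-wise (assumed layout: flat lists)."""
    return [s + w_i * d for s, d, w_i in zip(theta_s, theta_d, w)]


def unpack(theta, n_in, n_out):
    """Split a flat parameter vector into W, b, W', b' (hypothetical ordering)."""
    it = iter(theta)

    def matrix():
        return [[next(it) for _ in range(n_in)] for _ in range(n_out)]

    def vector():
        return [next(it) for _ in range(n_out)]

    W, b = matrix(), vector()
    W_prime, b_prime = matrix(), vector()
    return W, b, W_prime, b_prime


# Toy 1-in/1-out layer: theta is laid out as [W, b, W', b'].
theta_s = [0.0, 0.0, 1.0, 0.0]   # static parameters (learned by gradient descent)
theta_d = [2.0, 0.0, 0.0, 0.0]   # dynamic parameters (chosen at test time)
w = [0.5, 0.5, 0.5, 0.5]         # learned mixing weights

theta = combine(theta_s, theta_d, w)
out = gated_tanh([1.0], *unpack(theta, 1, 1))
```

Here the dynamic part only shifts the gate weight from 0 to 1, so the output changes from `0.5 * tanh(1)` to `sigmoid(1) * tanh(1)`: the static layer is adjusted rather than replaced.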

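The set of candidate dynamic weights is stored in an associative memory M. The memory is a collection of key/value pairs $(\tilde{h}_i, \tilde{\theta}^d_i)$, as large as the support set. At test time, the appropriate dynamic weights are retrieved from the memory by soft key matching:

$$\theta^d = \sum_i \mathrm{softmax}_i\big(d_{cos}(h, \tilde{h}_i)\big)\, \tilde{\theta}^d_i$$

where $d_{cos}$ denotes the cosine similarity function. The result is a weighted sum of the memory values, weighted by the similarity between the input h and the memory keys $\tilde{h}_i$.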
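Soft key matching can be sketched in a few lines of plain Python. This sketch assumes that the cosine similarities are normalized with a softmax before weighting the memory values, and that all vectors are non-zero; the function names and the toy memory contents are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve_dynamic_weights(h, memory):
    """Return (theta_d, attention) for input h.

    memory is a list of (key, value) pairs; theta_d is the sum of the
    values weighted by softmax(cosine similarity between h and each key).
    """
    attention = softmax([cosine_similarity(h, key) for key, _ in memory])
    dim = len(memory[0][1])
    theta_d = [0.0] * dim
    for a, (_, value) in zip(attention, memory):
        for j in range(dim):
            theta_d[j] += a * value[j]
    return theta_d, attention


# Toy memory: two entries, keys in embedding space, values in parameter space.
memory = [
    ([1.0, 0.0], [1.0, 0.0, 0.0]),
    ([0.0, 1.0], [0.0, 1.0, 0.0]),
]
theta_d, attention = retrieve_dynamic_weights([1.0, 0.0], memory)
```

Because the query matches the first key exactly, the first memory value receives the larger weight, but the second still contributes: the match is soft, not a hard lookup.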

Mapping to Candidate Answers :

To be continued...
