Paper Reading: Learning Visual Question Answering by Bootstrapping Hard Attention

Learning Visual Question Answering by Bootstrapping Hard Attention

Google DeepMind ECCV-2018

2018-08-05 19:24:44

Paper: https://arxiv.org/abs/1808.00300

Introduction:

　　This paper attempts to use hard attention alone to pick out the most useful features for learning the VQA task.

Soft Attention:

　　Existing attention models are predominantly based on soft attention, in which all information is adaptively re-weighted before being aggregated. This can improve accuracy by isolating important information and avoiding interference from unimportant information.
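　　As a rough illustration of "adaptively re-weighted before being aggregated", the sketch below applies a softmax over per-region relevance scores and sums the weighted feature vectors. The feature values, the scores, and the function names are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(features, scores):
    """Re-weight every feature vector by its softmax weight, then sum them."""
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights


# Hypothetical region features (3 regions, 2 channels) and relevance scores.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [2.0, 0.5, 0.1]

pooled, weights = soft_attention(features, scores)
print(weights)  # every region receives a non-zero weight
print(pooled)   # a single aggregated feature vector
```

　　Note that every region still contributes to the result; low-scoring regions are down-weighted, not discarded.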

Hard Attention:

　　It has the potential to improve accuracy and learning efficiency by focusing computation on the important parts of an image. But beyond this, it offers better computational efficiency because it only fully processes the information deemed most relevant.
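　　To make the contrast with soft attention concrete, here is a minimal sketch of one possible hard selection rule: keep only the k highest-scoring regions and drop the rest, so later layers never process them. The top-k rule, the scores, and the name `hard_attention_topk` are assumptions for illustration, not necessarily the selection mechanism used in the paper.

```python
def hard_attention_topk(features, scores, k):
    """Keep only the k highest-scoring feature vectors and discard the rest."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:k])  # restore the original region order
    return [features[i] for i in kept], kept


# Hypothetical region features (4 regions, 2 channels) and relevance scores.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
scores = [0.1, 2.0, 0.5, 1.5]

selected, kept = hard_attention_topk(features, scores, k=2)
print(kept)      # indices of the regions that survive
print(selected)  # only these vectors are passed on for further processing
```

　　Because only k of the regions are passed on, the cost of any downstream computation scales with k instead of with the full number of regions.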

　　However, hard attention has a serious drawback: because the choice of which information to process is discrete and thus non-differentiable, gradients cannot be backpropagated into the selection mechanism to support gradient-based optimization. Policy-gradient methods can work around the non-differentiability through sampling, but this remains a very active area of research.
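　　For readers unfamiliar with the sampling workaround, the sketch below shows a REINFORCE-style score-function estimator, one common policy-gradient technique: a region index is sampled from a softmax policy, and the gradient of the expected reward with respect to the logits is estimated without differentiating through the discrete choice. The reward function, logits, and sample count are hypothetical, and this is not presented as the method used in the paper.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_gradient(logits, reward_fn, num_samples, rng):
    """Monte Carlo estimate of d E[reward] / d logits via the score function."""
    probs = softmax(logits)
    n = len(logits)
    grad = [0.0] * n
    for _ in range(num_samples):
        # Discrete, non-differentiable choice of which region to attend to.
        a = rng.choices(range(n), weights=probs)[0]
        r = reward_fn(a)
        for i in range(n):
            # d log p(a) / d logit_i = 1[i == a] - p_i
            indicator = 1.0 if i == a else 0.0
            grad[i] += r * (indicator - probs[i]) / num_samples
    return grad


def reward(a):
    """Hypothetical reward: only region 2 leads to a correct answer."""
    return 1.0 if a == 2 else 0.0


rng = random.Random(0)
logits = [0.0, 0.0, 0.0]
grad = reinforce_gradient(logits, reward, num_samples=5000, rng=rng)
print(grad)  # positive for region 2, negative for the other regions
```

　　The estimate is unbiased but noisy, which is one reason such estimators can be hard to train with in practice.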

Approach Details:

To be updated.
