论文阅读:Learning Visual Question Answering by Bootstrapping Hard Attention
Learning Visual Question Answering by Bootstrapping Hard Attention
Google DeepMind ECCV-2018
2018-08-05 19:24:44
Paper:https://arxiv.org/abs/1808.00300
Introduction:
本文尝试仅仅用 hard attention 的方法来抠出最有用的 feature,进行 VQA 任务的学习。
Soft Attention:
Existing attention models are predominantly based on soft attention, in which all information is adaptively re-weighted before being aggregated. This can improve accuracy by isolating important information and avoiding interference from unimportant information.
Hard Attention:
It has the potential to improve accuracy and learning efficiency by focusing computation on the important parts of an image. But beyond this, it offers better computational efficiency because it only fully processes the information deemed most relevant.
但是,hard attention 有一个很致命的缺陷:由于图像中信息的选择是离散的,这导致基于梯度的学习方法,如 deep learning based methods,不可求导。然后,就无法利用 back-propagation 的方法进行区域的选择,来支持基于梯度的优化(because the choice of which information to process is discrete and thus non-differentiable, gradients cannot be backpropagated into the selection mechanism to support gradient-based optimization.)。当然有一些基于 Policy Gradient 的方法可以通过采样的方法,来处理梯度不可导的问题,但是这方面的研究,也仍然是非常的火热。

Approach Details:
待更新 、、、
--
论文阅读:Learning Visual Question Answering by Bootstrapping Hard Attention的更多相关文章
- 论文笔记:Visual Question Answering as a Meta Learning Task
Visual Question Answering as a Meta Learning Task ECCV 2018 2018-09-13 19:58:08 Paper: http://openac ...
- Learning Conditioned Graph Structures for Interpretable Visual Question Answering
Learning Conditioned Graph Structures for Interpretable Visual Question Answering 2019-05-29 00:29:4 ...
- Hierarchical Question-Image Co-Attention for Visual Question Answering
Hierarchical Question-Image Co-Attention for Visual Question Answering NIPS 2016 Paper: https://arxi ...
- Visual Question Answering with Memory-Augmented Networks
Visual Question Answering with Memory-Augmented Networks 2018-05-15 20:15:03 Motivation: 虽然 VQA 已经取得 ...
- 【自然语言处理】--视觉问答(Visual Question Answering,VQA)从初始到应用
一.前述 视觉问答(Visual Question Answering,VQA),是一种涉及计算机视觉和自然语言处理的学习任务.这一任务的定义如下: A VQA system takes as inp ...
- 论文:Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering-阅读总结
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering-阅读总结 笔记不能简单的抄写文中 ...
- 论文阅读笔记二十二:End-to-End Instance Segmentation with Recurrent Attention(CVPR2017)
论文源址:https://arxiv.org/abs/1605.09410 tensorflow 代码:https://github.com/renmengye/rec-attend-public 摘 ...
- 第八讲_图像问答Image Question Answering
第八讲_图像问答Image Question Answering 课程结构 图像问答的描述 具备一系列AI能力:细分识别,物体检测,动作识别,常识推理,知识库推理..... 先要根据问题,判断什么任务 ...
- Deep Reinforcement Learning for Dialogue Generation 论文阅读
本文来自李纪为博士的论文 Deep Reinforcement Learning for Dialogue Generation. 1,概述 当前在闲聊机器人中的主要技术框架都是seq2seq模型.但 ...
随机推荐
- mysql主从配置,读写分离
Mysql主从配置,实现读写分离 大型网站为了软解大量的并发访问,除了在网站实现分布式负载均衡,远远不够.到了数据业务层.数据访问层,如果还是传统的数据结构,或者只是单单靠一台服务器扛,如此多的数据库 ...
- html5-section元素
<!DOCTYPE html><html lang="en"><head> <meta charset="UTF-8&qu ...
- codeforces 980B Marlin
题意: 有一个城市有4行n列,n是奇数,有一个村庄在(1,1),村民的活动地点是(4,n): 有一个村庄在(4,1),村民的活动地点是(1,n): 现在要修建k个宾馆,不能修建在边界上,问能否给出一种 ...
- OpenCV-3.3.0测试
安装包目录下/samples/cpp里是各种例程 其中example_cmake里CMakeLists.txt已写好,直接cmake,make就可以,example.cpp是一个调用笔记本摄像头并显示 ...
- python 读写json数据
json 模块提供了一种很简单的方式来编码和解码JSON 数据. 字符串操作 其中两个主要的函数是json.dumps() 和json.loads() ,要比其他序列化函数库如pickle 的接口少得 ...
- 系统批量运维管理器Fabric详解
系统批量运维管理器Fabric详解 Fabrici 是基于python现实的SSH命令行工具,简化了SSH的应用程序部署及系统管理任务,它提供了系统基础的操作组件,可以实现本地或远程shell命令,包 ...
- mxnet 查看 Sym shape
import mxnet as mximport numpy as npimport randomimport mxnet as mximport sysdata_shape = {'data':(6 ...
- URL的解析,C语言实现
源: URL的解析,C语言实现 c语言实现urlencode和decode
- Java 安全套接字编程以及keytool 使用最佳实践
概述 利用 Java 的 JSSE(Java Secure Socket Extension)技术,我们可以方便的编写安全套接字程序,关于 JSSE 的介绍,可以参阅 Oracle 网站提供的 JSS ...
- kivy Properties
Introduction to Properties¶ Properties are an awesome way to define events and bind to them. Essenti ...