Learning Latent Graph Representations for Relational VQA

The key mechanism of transformer-based models is cross-attention, which implicitly forms a graph over tokens and acts as a diffusion operator, propagating information through that graph for question answering that requires reasoning over the scene.
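To make the "attention as a graph" reading concrete, here is a minimal sketch in plain Python. It is an illustration under simplifying assumptions, not the paper's implementation: the learned query, key and value projections are omitted, and the token vectors, `attention_graph`, and `propagate` are hypothetical names chosen for this example. The attention matrix is treated as a soft, row-stochastic adjacency matrix, and one attention step is one diffusion step over it.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_graph(queries, keys):
    """Scaled dot-product attention weights.

    Entry [i][j] is the soft edge weight from key token j to query token i.
    Each row sums to 1, so the matrix is a row-stochastic adjacency matrix.
    """
    scale = math.sqrt(len(keys[0]))
    return [
        softmax([sum(q * k for q, k in zip(qi, kj)) / scale for kj in keys])
        for qi in queries
    ]

def propagate(adjacency, values):
    """One diffusion step: each query token receives a weighted average of the values."""
    dim = len(values[0])
    return [
        [sum(w * v[d] for w, v in zip(row, values)) for d in range(dim)]
        for row in adjacency
    ]

# Toy cross-attention: 2 question tokens attend over 3 image tokens.
# (Assumption: raw vectors stand in for the projected queries, keys and values.)
question_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

A = attention_graph(question_tokens, image_tokens)  # 2 x 3 soft adjacency
updated = propagate(A, image_tokens)                # 2 x 2 updated question tokens
```

Stacking several such steps corresponds to multi-hop propagation over the token graph, which is the sense in which attention can be read as a diffusion operator.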

We reinterpret and reformulate the transformer-based model to explicitly construct latent graphs over tokens, and thereby improve performance when answering visual questions about relations between objects.

Coincidentally, transformer-based language encoders not only take advantage of the tokenization trend but are also intrinsically built for information fusion and alignment, thanks to their core self-attention mechanism.

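This success of transformer-based VQA systems demonstrates the validity of two insights: image tokenization, and pairwise token interactions between text tokens and image tokens.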

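We observe that the pairwise token interactions collectively form a graph, and that traversing this graph constitutes a form of reasoning, which may explain the reasoning abilities claimed for these transformer-based models.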

We reinterpret transformer-based VQA systems as graph convolutions.

We show that our model benefits from its latent graph representations.

To the best of our knowledge, current transformer-based models cannot benefit from graph information, and there has been no work on taking advantage of scene graphs, or graph representations in general, for VQA.

In our model, the goal is to learn to generate a latent graph representation and then perform node classification on the resulting heterogeneous graph.

A typical task for a GCN is node classification, as GCN is capable of learning node representations from a given static homogeneous graph.
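As a rough illustration of node classification with a GCN, the sketch below applies one graph-convolution layer in the commonly used form A_norm · X · W, where A_norm is the symmetrically normalized adjacency with self-loops. Everything concrete here is an assumption made for the example: the four-node graph, the feature values, and the hand-set identity weight matrix (which would normally be learned). The nonlinearity is omitted for brevity.

```python
import math

def normalize_adjacency(adj):
    """Add self-loops, then apply symmetric normalization D^{-1/2} (A + I) D^{-1/2}."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain dense matrix product of two lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def gcn_layer(adj_norm, features, weights):
    """One graph convolution: aggregate neighbor features, then apply a linear map."""
    return matmul(matmul(adj_norm, features), weights)

def predict(logits):
    """Pick the highest-scoring class for every node."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]

# Static homogeneous toy graph: edges 0-1 and 2-3 (undirected).
adjacency = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
# Node 1 has noisy features that, taken alone, would favor class 1.
features = [[1.0, 0.0], [0.4, 0.6], [0.0, 1.0], [0.0, 1.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]  # hand-set for illustration, not learned

logits = gcn_layer(normalize_adjacency(adjacency), features, weights)
predictions = predict(logits)
```

In this toy setup node 1 is assigned the same class as its neighbor, node 0, because the layer averages features over the neighborhood before classifying. The graph itself is fixed, which is the limitation the latent-graph approach sets out to remove.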

Graph Transformer Networks (GTN) are a model for handling heterogeneous graphs (graphs with various types of edges) as well as for generating new graphs.
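The idea of generating a new graph from a heterogeneous one can be sketched as follows. This is a simplified illustration in the spirit of GTN, not the published architecture: each step softly selects among the per-edge-type adjacency matrices with softmax weights, and two selections are multiplied to obtain a graph of two-hop meta-paths. The node roles, edge types, and selection scores below are hypothetical and hand-set; in a trained model the scores would be learned.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_select(adjacencies, scores):
    """Convex combination of the edge-type adjacency matrices."""
    weights = softmax(scores)
    n = len(adjacencies[0])
    return [
        [sum(w * a[i][j] for w, a in zip(weights, adjacencies)) for j in range(n)]
        for i in range(n)
    ]

def compose(q1, q2):
    """Matrix product: entry [i][j] scores two-hop paths i -> k -> j."""
    n = len(q1)
    return [[sum(q1[i][k] * q2[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

# Hypothetical nodes: 0 = question word, 1 = object A, 2 = object B.
word_to_object = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]    # edge type 1
object_to_object = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]  # edge type 2
edge_types = [word_to_object, object_to_object]

# Hand-set scores: the first hop favors type 1, the second hop favors type 2.
first_hop = soft_select(edge_types, [5.0, -5.0])
second_hop = soft_select(edge_types, [-5.0, 5.0])

# New latent graph whose edges are word -> object -> object meta-paths.
meta_path_graph = compose(first_hop, second_hop)
```

Here the generated graph contains a strong edge from node 0 to node 2 that exists in neither input edge type, which is what "generating new graphs" amounts to in this sketch.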

In short: how to use scene graphs and graph representations, together with the graph-convolution view of the transformer mechanism, for VQA.
