Learning Latent Graph Representations for Relational VQA
The key mechanism of transformer-based models is cross-attentions, which implicitly form graphs over tokens and act as diffusion operators to facilitate information propagation through the graph for question-answering that requires some reasoning over the scene.
基于transformer的模型的关键机制是交叉关注,交叉关注在tokens上隐式地形成图,并充当扩散操作符,以促进信息通过图传播,用于需要对场景进行一些推理的问答。
We reinterpret and reformulate the transformer-based model to explicitly construct latent graphs over tokens and thereby support improved performance for answering visual questions about relations between objects.
我们重新解释和表述基于transformer的模型,以显式地在tokens上构造潜在图,从而支持改进性能,以回答关于对象之间关系的可视化问题。
Coincidentally, transformer-based language encoders can not only take advantage of the tokenization trend but also are intrinsically built for information fusion and alignments due to its core self-attention mechanism.
巧合的是,基于transformer的语言编码器不仅可以利用标记化趋势,而且由于其核心的自我注意机制,其本质上是为信息融合和对齐而构建的。
基于transformer的VQA系统的这种成功表明了两个见解的有效性:图像标记化,以及文本标记和图像标记之间的成对标记交互。
我们观察到成对的tokens交互共同形成了一个图,并且遍历这个图形成了一种推理,这可能是对这些基于transformer的模型的推理能力声明的解释
we reinterpret transformer-based VQA systems as graph convolutions,
We show that our model benefits from its latent graph representations
To the best of our knowledge, current transformer-based models cannot benefit from graph information, and there have not been work on taking advantage of scene graphs or graph representations in general for VQA.
In our model, the goal is to learn to generate a latent graph representation and then perform node classification on the resulting heterogeneous graph.
A typical task for a GCN is node classification, as GCN is capable of learning node representations from a given static homogeneous graph.
Graph Transformer Networks (GTN) are a model for handling heterogeneous graphs, graphs with various types of edges, as well as generating new graphs.
如何利用场景图scene graph和图表示,并利用transformer机制的图卷积,提供VQA。

Learning Latent Graph Representations for Relational VQA的更多相关文章
- 论文解读(GMT)《Accurate Learning of Graph Representations with Graph Multiset Pooling》
论文信息 论文标题:Accurate Learning of Graph Representations with Graph Multiset Pooling论文作者:Jinheon Baek, M ...
- 论文解读(GraRep)《GraRep: Learning Graph Representations with Global Structural Information》
论文题目:<GraRep: Learning Graph Representations with Global Structural Information>发表时间: CIKM论文作 ...
- 论文解读(LG2AR)《Learning Graph Augmentations to Learn Graph Representations》
论文信息 论文标题:Learning Graph Augmentations to Learn Graph Representations论文作者:Kaveh Hassani, Amir Hosein ...
- Learning Conditioned Graph Structures for Interpretable Visual Question Answering
Learning Conditioned Graph Structures for Interpretable Visual Question Answering 2019-05-29 00:29:4 ...
- 论文解读(DeepWalk)《DeepWalk: Online Learning of Social Representations》
一.基本信息 论文题目:<DeepWalk: Online Learning of Social Representations>发表时间: KDD 2014论文作者: Bryan P ...
- 论文解读( N2N)《Node Representation Learning in Graph via Node-to-Neighbourhood Mutual Information Maximization》
论文信息 论文标题:Node Representation Learning in Graph via Node-to-Neighbourhood Mutual Information Maximiz ...
- 【ML】ICML2015_Unsupervised Learning of Video Representations using LSTMs
Unsupervised Learning of Video Representations using LSTMs Note here: it's a learning notes on new L ...
- 【CV】ICCV2015_Unsupervised Learning of Visual Representations using Videos
Unsupervised Learning of Visual Representations using Videos Note here: it's a learning note on Prof ...
- 论文笔记之:Learning Cross-Modal Deep Representations for Robust Pedestrian Detection
Learning Cross-Modal Deep Representations for Robust Pedestrian Detection 2017-04-11 19:40:22 Moti ...
随机推荐
- i2c调试工具分享
i2c-tools简介 在嵌入式开发仲,有时候需要确认硬件是否正常连接,设备是否正常工作,设备的地址是多少等等,这里我们就需要使用一个用于测试I2C总线的工具--i2c-tools. i2c-tool ...
- 在IDEA中已经配置postgis数据库驱动并且能在Java类中连接数据库,但在servlet中无法连接数据库且导致Tomcat自动断开连接的解决方案
最近在IDEA中用JDBC连接PostgreSQL数据库时遇到了这样一个奇怪的事情: 从PostgreSQL JDBC Driver官网下载好JDBC驱动之后,在IDEA的Project Struct ...
- 10个 Linux 命令,让你的操作更有效率
点击上方"开源Linux",选择"设为星标" 回复"学习"获取独家整理的学习资料! 根据老九大师兄口头阐述,Linux是最适合开发的操作系统 ...
- 干货 | 手把手教你搭建一套OpenStack云平台
1 前言 今天我们为一位朋友搭建一套OpenStack云平台. 我们使用Kolla部署stein版本的OpenStack云平台. kolla是用于自动化部署OpenStack的一个项目,它基于dock ...
- 深度解析javaScript常见数据类型检查校验
前言 在JavaScript中,数据类型分为两大类,一种是基础数据类型,另一种则是复杂数据类型,又叫引用数据类型 基础数据类型:数字Number 字符串String 布尔Boolean Null Un ...
- 关于利用STL栈求解四则中缀表达式以及中缀表达式转逆波兰表达式和逆波兰表达式的求解
今天总结一下栈的一个重要应用---四则数学表达式的求解 数学表达式的求解是栈的一个重要的应用,在计算机的应用中 如果求解一个四则运算表达式,我们可能会直接写一个程序例如什么printf("% ...
- CAD图与互联网地图网页端相互叠加显示技术分析和实现
需求分析 之前相关的博文中介绍了如果在Web网页端展示CAD图形(唯杰地图云端图纸管理平台 https://vjmap.com/app/cloud),当一些CAD图纸有实际地理坐标位置时,如地形图等, ...
- async用法
<!DOCTYPE html> <html> <head> <meta charset="utf-8"> <meta http ...
- Android.mk编译App源码
在Andriod源码环境编译APP主要考虑如何引入第三方jar包和arr包的问题,初次尝试,步步是坑,这里给出一个模板: LOCAL_PATH := $(call my-dir) include $( ...
- Kafka消息的压缩机制
最近在做 AWS cost saving 的事情,对于 Kafka 消息集群,计划通过压缩消息来减少消息存储所占空间,从而达到减少 cost 的目的.本文将结合源码从 Kafka 支持的消息压缩类型. ...