Learning Latent Graph Representations for Relational VQA
The key mechanism of transformer-based models is cross-attention, which implicitly forms a graph over tokens and acts as a diffusion operator, propagating information through that graph for question answering that requires reasoning over the scene.
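The following minimal pure-Python sketch makes the graph view concrete. It assumes single-head scaled dot-product cross-attention, with text tokens supplying the queries and image tokens supplying the keys and values. The query, key and value vectors are given directly rather than produced by learned projections. The helper names (`cross_attention_graph`, `diffuse`) and the toy vectors are hypothetical. The softmax-normalized score matrix is read as a row-stochastic adjacency matrix of a bipartite text-to-image graph. Multiplying that matrix by the value vectors is one diffusion step along the graph.

```python
import math


def matmul(a, b):
    """Matrix product of list-of-lists matrices a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_graph(text_q, image_k):
    """Edge weights of the implicit bipartite graph from text tokens to image tokens.

    Row i is softmax(q_i . k_j / sqrt(d)) over image tokens j, so each row sums
    to 1, i.e. the result is a row-stochastic adjacency matrix.
    """
    d = len(text_q[0])
    scores = matmul(text_q, transpose(image_k))
    return [softmax([s / math.sqrt(d) for s in row]) for row in scores]


def diffuse(adjacency, image_v):
    """One diffusion step: each text node takes a weighted average of its image neighbours' values."""
    return matmul(adjacency, image_v)


# Toy input (hypothetical): 2 text tokens, 3 image tokens, feature size 2.
text_q = [[1.0, 0.0], [0.0, 1.0]]
image_k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

A = cross_attention_graph(text_q, image_k)  # 2 x 3 edge-weight matrix
H = diffuse(A, image_v)                     # updated text-token features
```

Each text token ends up connected to every image token with a learned, input-dependent weight. This is why the graph is called implicit: it is never materialized as an edge list.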
We reinterpret and reformulate the transformer-based model to explicitly construct latent graphs over tokens, thereby improving performance on visual questions about relations between objects.
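Top-k sparsification is one simple way to turn implicit attention into an explicit graph. It keeps only the k strongest attention weights per token and treats them as edges. The sketch below uses that technique purely for illustration. It is not necessarily how the latent graph is built in this work. The function `topk_edges` and the attention values are hypothetical.

```python
def topk_edges(attention, k):
    """Turn a dense attention matrix into an explicit sparse edge list.

    For each source node i, keep the k targets j with the largest weight and
    return (i, j, weight) triples. Ties are broken by the lower index j.
    """
    edges = []
    for i, row in enumerate(attention):
        ranked = sorted(range(len(row)), key=lambda j: (-row[j], j))
        for j in ranked[:k]:
            edges.append((i, j, row[j]))
    return edges


# Hypothetical row-stochastic attention over three tokens.
attention = [
    [0.70, 0.20, 0.10],
    [0.05, 0.15, 0.80],
    [0.30, 0.40, 0.30],
]
edges = topk_edges(attention, k=1)  # [(0, 0, 0.7), (1, 2, 0.8), (2, 1, 0.4)]
```

Sparsifying in this way yields a discrete edge list that graph algorithms can consume directly. The cost is that weak token interactions are discarded.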
Coincidentally, transformer-based language encoders not only take advantage of the tokenization trend but are also intrinsically built for information fusion and alignment, owing to their core self-attention mechanism.
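The success of transformer-based VQA systems demonstrates the effectiveness of two insights: image tokenization, and pairwise interactions between text tokens and image tokens.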
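We observe that pairwise token interactions collectively form a graph, and that traversing this graph constitutes a form of reasoning, which may explain the reasoning abilities claimed for these transformer-based models.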
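Matrix powers can illustrate the traversal idea. Suppose A is a row-stochastic token adjacency matrix. Then entry (i, j) of A^k sums the weights of all length-k walks from token i to token j, so stacking attention steps lets information cross several hops. The toy graph and its weights below are hypothetical. The self-loop on the last node only keeps every row summing to 1.

```python
def matmul(a, b):
    """Matrix product of list-of-lists matrices a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def k_hop(adjacency, hops):
    """Return adjacency**hops; entry (i, j) sums the weights of all walks of length `hops` from i to j."""
    n = len(adjacency)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # identity = 0 hops
    for _ in range(hops):
        result = matmul(result, adjacency)
    return result


# Hypothetical token graph: 0 = "man" -> 1 = "holding" -> 2 = "umbrella".
A = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],  # self-loop keeps the row summing to 1
]
two_hop = k_hop(A, 2)
print(A[0][2], two_hop[0][2])  # 0.0 1.0: "umbrella" is reachable from "man" only in two hops
```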
We reinterpret transformer-based VQA systems as graph convolutions.
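Here is a minimal sketch of the graph-convolution reading. It assumes a single attention head with no residual connection, layer normalization or output projection. A graph convolution aggregates node features with an adjacency matrix and then applies a weight matrix. Self-attention is the same operation, except that the adjacency matrix is computed from the features themselves. The function names are illustrative and are not taken from the paper.

```python
import math


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def graph_conv(adjacency, h, w):
    """Graph convolution: aggregate neighbour features with `adjacency`, then project with `w`."""
    return matmul(matmul(adjacency, h), w)


def self_attention(h, w_q, w_k, w_v):
    """Single-head self-attention expressed as graph_conv with a data-dependent adjacency."""
    q, k = matmul(h, w_q), matmul(h, w_k)
    d = len(q[0])
    scores = matmul(q, transpose(k))
    adjacency = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    # softmax(QK^T / sqrt(d)) (H W_v) == (softmax(QK^T / sqrt(d)) H) W_v by associativity.
    return graph_conv(adjacency, h, w_v)


h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three token features
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out = self_attention(h, zeros, zeros, identity)
```

With zero query and key weights, every score is equal. The adjacency is therefore uniform over all tokens, and each output row is the mean of the inputs, i.e. mean aggregation over a complete graph.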
We show that our model benefits from its latent graph representations.
To the best of our knowledge, current transformer-based models cannot benefit from graph information, and there has been no work on exploiting scene graphs, or graph representations in general, for VQA.
In our model, the goal is to learn to generate a latent graph representation and then perform node classification on the resulting heterogeneous graph.
A typical task for a GCN is node classification, since a GCN learns node representations from a given static, homogeneous graph.
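For reference, here is a sketch of a two-layer GCN node classifier in the style of Kipf and Welling. It assumes an undirected, unweighted graph with self-loops added and the symmetric normalization D^-1/2 (A + I) D^-1/2. The weights are hand-picked identity matrices rather than trained ones. The example therefore only shows how labels propagate along edges, and training with a cross-entropy loss is omitted. The graph and features are hypothetical.

```python
import math


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Symmetric normalisation D^-1/2 (A + I) D^-1/2, with D the degree matrix of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_norm, h, w, relu=True):
    """One GCN layer: propagate with the normalised adjacency, project with w, optional ReLU."""
    out = matmul(matmul(a_norm, h), w)
    return [[max(0.0, v) for v in row] for row in out] if relu else out


def classify_nodes(adj, features, w1, w2):
    """Two-layer GCN; returns the argmax class index for every node."""
    a_norm = normalize_adjacency(adj)
    hidden = gcn_layer(a_norm, features, w1)
    logits = gcn_layer(a_norm, hidden, w2, relu=False)
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


# Two components: 0-1 and 2-3. Only nodes 0 and 3 have informative features.
adj = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
features = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
eye = [[1.0, 0.0], [0.0, 1.0]]  # hand-picked, untrained weights
labels = classify_nodes(adj, features, eye, eye)  # [0, 0, 1, 1]
```

Nodes 1 and 2 carry no features of their own, yet each receives the class of its only neighbour.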
Graph Transformer Networks (GTN) are a model for handling heterogeneous graphs (graphs with multiple edge types) and for generating new graphs.
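A Graph Transformer layer, as described for GTN, softly selects edge types. It takes softmax-weighted combinations of the per-type adjacency matrices and multiplies two such selections to obtain a new meta-path graph. The sketch below assumes that formulation with a single channel and without the normalization used in the original model. The edge types (`holds`, `above`) and the selection logits are hypothetical. They are chosen so that the composed path is "holds" followed by "above".

```python
import math


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def soft_select(adjacencies, logits):
    """Softmax-weighted sum of the per-edge-type adjacency matrices."""
    weights = softmax(logits)
    n = len(adjacencies[0])
    return [[sum(w * a[i][j] for w, a in zip(weights, adjacencies)) for j in range(n)]
            for i in range(n)]


def gt_layer(adjacencies, logits_1, logits_2):
    """Compose two softly selected edge types into one meta-path adjacency matrix."""
    return matmul(soft_select(adjacencies, logits_1), soft_select(adjacencies, logits_2))


# Hypothetical scene graph; nodes: 0 = person, 1 = umbrella, 2 = table.
holds = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # person holds umbrella
above = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]  # umbrella above table
# Logits favour "holds" for the first hop and "above" for the second.
meta_path = gt_layer([holds, above], [10.0, 0.0], [0.0, 10.0])
```

Neither edge type connects node 0 directly to node 2, but the composed meta-path graph links them. This is how a GTN can generate new graphs from a heterogeneous one.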
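Key question: how to exploit scene graphs and graph representations, together with the graph convolution implicit in the transformer mechanism, for VQA.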
