The key mechanism of transformer-based models is cross-attentions, which implicitly form graphs over tokens and act as diffusion operators to facilitate information propagation through the graph for question-answering that requires some reasoning over the scene.

基于transformer的模型的关键机制是交叉关注,交叉关注在tokens上隐式地形成图,并充当扩散操作符,以促进信息通过图传播,用于需要对场景进行一些推理的问答。

We reinterpret and reformulate the transformer-based model to explicitly construct latent graphs over tokens and thereby support improved performance for answering visual questions about relations between objects.

我们重新解释和表述基于transformer的模型,以显式地在tokens上构造潜在图,从而支持改进性能,以回答关于对象之间关系的可视化问题。

Coincidentally, transformer-based language encoders can not only take advantage of the tokenization trend but also are intrinsically built for information fusion and alignments due to its core self-attention mechanism.

巧合的是,基于transformer的语言编码器不仅可以利用标记化趋势,而且由于其核心的自我注意机制,其本质上是为信息融合和对齐而构建的。

基于transformer的VQA系统的这种成功表明了两个见解的有效性:图像标记化,以及文本标记和图像标记之间的成对标记交互。

我们观察到成对的tokens交互共同形成了一个图,并且遍历这个图形成了一种推理,这可能是对这些基于transformer的模型的推理能力声明的解释

we reinterpret transformer-based VQA systems as graph convolutions,

We show that our model benefits from its latent graph representations

To the best of our knowledge, current transformer-based models cannot benefit from graph information, and there have not been work on taking advantage of scene graphs or graph representations in general for VQA.

In our model, the goal is to learn to generate a latent graph representation and then perform node classification on the resulting heterogeneous graph.

A typical task for a GCN is node classification, as GCN is capable of learning node representations from a given static homogeneous graph.

Graph Transformer Networks (GTN)  are a model for handling heterogeneous graphs, graphs with various types of edges, as well as generating new graphs.

如何利用场景图scene graph和图表示,并利用transformer机制的图卷积,提供VQA。

Learning Latent Graph Representations for Relational VQA的更多相关文章

  1. 论文解读(GMT)《Accurate Learning of Graph Representations with Graph Multiset Pooling》

    论文信息 论文标题:Accurate Learning of Graph Representations with Graph Multiset Pooling论文作者:Jinheon Baek, M ...

  2. 论文解读(GraRep)《GraRep: Learning Graph Representations with Global Structural Information》

    论文题目:<GraRep: Learning Graph Representations with Global Structural Information>发表时间:  CIKM论文作 ...

  3. 论文解读(LG2AR)《Learning Graph Augmentations to Learn Graph Representations》

    论文信息 论文标题:Learning Graph Augmentations to Learn Graph Representations论文作者:Kaveh Hassani, Amir Hosein ...

  4. Learning Conditioned Graph Structures for Interpretable Visual Question Answering

    Learning Conditioned Graph Structures for Interpretable Visual Question Answering 2019-05-29 00:29:4 ...

  5. 论文解读(DeepWalk)《DeepWalk: Online Learning of Social Representations》

    一.基本信息 论文题目:<DeepWalk: Online Learning of Social Representations>发表时间:  KDD 2014论文作者:  Bryan P ...

  6. 论文解读( N2N)《Node Representation Learning in Graph via Node-to-Neighbourhood Mutual Information Maximization》

    论文信息 论文标题:Node Representation Learning in Graph via Node-to-Neighbourhood Mutual Information Maximiz ...

  7. 【ML】ICML2015_Unsupervised Learning of Video Representations using LSTMs

    Unsupervised Learning of Video Representations using LSTMs Note here: it's a learning notes on new L ...

  8. 【CV】ICCV2015_Unsupervised Learning of Visual Representations using Videos

    Unsupervised Learning of Visual Representations using Videos Note here: it's a learning note on Prof ...

  9. 论文笔记之:Learning Cross-Modal Deep Representations for Robust Pedestrian Detection

    Learning Cross-Modal Deep Representations for Robust Pedestrian Detection 2017-04-11  19:40:22  Moti ...

随机推荐

  1. petite-vue源码剖析-逐行解读@vue-reactivity之effect

    当我们通过effect将副函数向响应上下文注册后,副作用函数内访问响应式对象时即会自动收集依赖,并在相应的响应式属性发生变化后,自动触发副作用函数的执行. // ./effect.ts export ...

  2. XCTF练习题---CRYPTO---混合编码解析

    XCTF练习题---CRYPTO---混合编码解析 flag:cyberpeace{welcometoattackanddefenceworld} 解题步骤: 1.观察题目,下载附件进行查看 2.看到 ...

  3. python数据可视化-matplotlib入门(5)-饼图和堆叠图

    饼图常用于统计学模块,画饼图用到的方法为:pie( ) 一.pie()函数用来绘制饼图 pie(x, explode=None, labels=None, colors=None, autopct=N ...

  4. 1.10 Linux桌面环境(桌面系统)大比拼[附带优缺点

    早期的 Linux 系统都是不带界面的,只能通过命令来管理,比如运行程序.编辑文档.删除文件等.所以,要想熟练使用 Linux,就必须记忆很多命令. 后来随着 Windows 的普及,计算机界面变得越 ...

  5. springMvc和Hibernate集成实现用户添加

    源码:http://pan.baidu.com/s/1i4xVLE9(百度云) 步骤:一.创建数据库(mysql) 二.导入相应jar包(注意不同数据库jdbc.jar包)配置web.xml.spri ...

  6. Cubieboard安装系统

    2013年买的一个小玩意. 一.硬件 1.1 相关资料 http://www.cubieforums.com http://cubie.cc 1.2 cubieboard1 1.3 无线网卡 水星 M ...

  7. Nexus5x 修改Android开机动画

    1.制作帧动画 这里随便从网上找了一个gif图片,导入PS中,打开后会形成很多帧图层,选择导航栏中的文件->脚本->将图层导出到文件可以将所有图层导出来.要注意文件命名,Android会按 ...

  8. 设计并实现大数类 BigNum

    学习任务:设计并实现大数类 BigNum 代码示例: import java.util.Scanner; public class BigNum { private double num; publi ...

  9. monit 命令详解(monit)

    monit是Monit软件的主操作控制命令. 语法 monit [options]+ [command] 选项(options) -c file 指定要使用的配置文件 -d n 每间隔多少秒运行一次M ...

  10. CoaXPress 时间戳 Time Stamping

    背景 在CXP2.0之前,CXP没有定义Time Stamping时间戳的概念,但是用户对Time Stamping是有实际需求的,比如我们要对比多台设备拍摄同一个物体不同角度的照片,或者记录触发完成 ...