The key mechanism of transformer-based models is cross-attentions, which implicitly form graphs over tokens and act as diffusion operators to facilitate information propagation through the graph for question-answering that requires some reasoning over the scene.

基于transformer的模型的关键机制是交叉关注,交叉关注在tokens上隐式地形成图,并充当扩散操作符,以促进信息通过图传播,用于需要对场景进行一些推理的问答。

We reinterpret and reformulate the transformer-based model to explicitly construct latent graphs over tokens and thereby support improved performance for answering visual questions about relations between objects.

我们重新解释和表述基于transformer的模型,以显式地在tokens上构造潜在图,从而支持改进性能,以回答关于对象之间关系的可视化问题。

Coincidentally, transformer-based language encoders can not only take advantage of the tokenization trend but also are intrinsically built for information fusion and alignments due to its core self-attention mechanism.

巧合的是,基于transformer的语言编码器不仅可以利用标记化趋势,而且由于其核心的自我注意机制,其本质上是为信息融合和对齐而构建的。

基于transformer的VQA系统的这种成功表明了两个见解的有效性:图像标记化,以及文本标记和图像标记之间的成对标记交互。

我们观察到成对的tokens交互共同形成了一个图,并且遍历这个图形成了一种推理,这可能是对这些基于transformer的模型的推理能力声明的解释

we reinterpret transformer-based VQA systems as graph convolutions,

We show that our model benefits from its latent graph representations

To the best of our knowledge, current transformer-based models cannot benefit from graph information, and there have not been work on taking advantage of scene graphs or graph representations in general for VQA.

In our model, the goal is to learn to generate a latent graph representation and then perform node classification on the resulting heterogeneous graph.

A typical task for a GCN is node classification, as GCN is capable of learning node representations from a given static homogeneous graph.

Graph Transformer Networks (GTN)  are a model for handling heterogeneous graphs, graphs with various types of edges, as well as generating new graphs.

如何利用场景图scene graph和图表示,并利用transformer机制的图卷积,提供VQA。

Learning Latent Graph Representations for Relational VQA的更多相关文章

  1. 论文解读(GMT)《Accurate Learning of Graph Representations with Graph Multiset Pooling》

    论文信息 论文标题:Accurate Learning of Graph Representations with Graph Multiset Pooling论文作者:Jinheon Baek, M ...

  2. 论文解读(GraRep)《GraRep: Learning Graph Representations with Global Structural Information》

    论文题目:<GraRep: Learning Graph Representations with Global Structural Information>发表时间:  CIKM论文作 ...

  3. 论文解读(LG2AR)《Learning Graph Augmentations to Learn Graph Representations》

    论文信息 论文标题:Learning Graph Augmentations to Learn Graph Representations论文作者:Kaveh Hassani, Amir Hosein ...

  4. Learning Conditioned Graph Structures for Interpretable Visual Question Answering

    Learning Conditioned Graph Structures for Interpretable Visual Question Answering 2019-05-29 00:29:4 ...

  5. 论文解读(DeepWalk)《DeepWalk: Online Learning of Social Representations》

    一.基本信息 论文题目:<DeepWalk: Online Learning of Social Representations>发表时间:  KDD 2014论文作者:  Bryan P ...

  6. 论文解读( N2N)《Node Representation Learning in Graph via Node-to-Neighbourhood Mutual Information Maximization》

    论文信息 论文标题:Node Representation Learning in Graph via Node-to-Neighbourhood Mutual Information Maximiz ...

  7. 【ML】ICML2015_Unsupervised Learning of Video Representations using LSTMs

    Unsupervised Learning of Video Representations using LSTMs Note here: it's a learning notes on new L ...

  8. 【CV】ICCV2015_Unsupervised Learning of Visual Representations using Videos

    Unsupervised Learning of Visual Representations using Videos Note here: it's a learning note on Prof ...

  9. 论文笔记之:Learning Cross-Modal Deep Representations for Robust Pedestrian Detection

    Learning Cross-Modal Deep Representations for Robust Pedestrian Detection 2017-04-11  19:40:22  Moti ...

随机推荐

  1. JDK内置锁深入探究

    一.序言 本文讲述仅针对 JVM 层次的内置锁,不涉及分布式锁. 锁有多种分类形式,比如公平锁与非公平锁.可重入锁与非重入锁.独享锁与共享锁.乐观锁与悲观锁.互斥锁与读写锁.自旋锁.分段锁和偏向锁/轻 ...

  2. linux常用理论(一)

    第一周 1.按系列罗列Linux的发行版,并描述不同发行版之间的联系与区别. Debian Redhat issue 2.安装Centos7.9和ubuntu操作系统,创建一个自己名字的用户名,并可以 ...

  3. IDEA编译项目后,target目录下的jsp文件不更新

    tomcat目录说明 先来看一下tomcat的目录: |-bin |-conf |-lib |-logs |-temp |-webapps |-work tomcat 的核心是servlet容器,叫 ...

  4. Docker Compose 的介绍、安装与使用

    什么是 Docker Compose? Compose 是 Docker 官方的开源项目,负责实现Docker容器集群的快速编排,开源代码在 https://github.com/docker/com ...

  5. jfinal极速开发

    下载jfinal项目,上面都配置好了不用自己新建从头配置.https://jfinal.com/ idea打开项目 配置数据库 resources目录下demo-config-dev.txt # co ...

  6. .NET 中 GC 的模式与风格

    垃圾回收(GC)是托管语言必备的技术之一.GC 的性能是影响托管语言性能的关键.我们的 .NET 既能写桌面程序 (WINFROM , WPF) 又能写 web 程序 (ASP.NET CORE),甚 ...

  7. mysql忘记root密码实现免密登录

    1.配置my.cnf文件,跳过授权表: skip-grant-tables 2.重启mysqld服务 3.z直接mysql登录 4.use mysql这个数据库 5.设置密码: update user ...

  8. 使用 VS Code + Markdown 编写 PDF 文档

    背景介绍 作为一个技术人员,基本都需要编写技术相关文档,而且大部分技术人员都应该掌握 markdown 这个技能,使用 markdown 来编写并生成 PDF 文档将会是一个不错的体验,以下就介绍下如 ...

  9. 48. Rotate Image - LeetCode

    Question 48. Rotate Image Solution 把这个二维数组(矩阵)看成一个一个环,循环每个环,循环每条边,每个边上的点进行旋转 public void rotate(int[ ...

  10. 用 notion 管理信用卡与花呗

    用 notion 管理信用卡与花呗 Notion原文,排版更佳 概述 不需要提醒功能和安卓用户可以忽略Scriptable和快捷指令 app的设置 Notion 建立信用卡表格,录入信用卡基本信息,自 ...