Do Transformers Really Perform Bad for Graph Representation?

microsoft/Graphormer: This is the official implementation for "Do Transformers Really Perform Bad for Graph Representation?". (github.com)

1 Introduction

The authors find that the key problem is how to restore the graph structural information that is lost by the Transformer's self-attention layers. Unlike sequential data (NLP, speech) or grid data (CV), structural information is a property unique to graph data, and it plays an important role in predicting graph properties.

There have been many attempts to bring the Transformer into the graph domain, but so far the only effective way has been to replace some key modules (e.g., feature aggregation) in classic GNN variants with softmax attention [47, 7, 22, 48, 58, 43, 13].

  • [47] Graph attention networks. ICLR, 2018.
  • [7] Graph transformer for graph-to-sequence learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 7464–7471, 2020.
  • [22] Heterogeneous graph transformer. In Proceedings of The Web Conference 2020, pages 2704–2710, 2020.
  • [48] Direct multi-hop attention based graph neural network. arXiv preprint arXiv:2009.14332, 2020.
  • [58] Graph-BERT: Only attention is needed for learning graph representations. arXiv preprint arXiv:2001.05140, 2020.
  • [43] Self-supervised graph transformer on large-scale molecular data. Advances in Neural Information Processing Systems, 33, 2020.
  • [13] A generalization of transformer networks to graphs. AAAI Workshop on Deep Learning on Graphs: Methods and Applications, 2021.
For each node, self-attention only computes the semantic similarity between it and the other nodes, without considering either the structural information of the graph attached to the nodes or the relation between node pairs.
Based on this, the researchers propose Graphormer for graph prediction tasks: a standard Transformer model equipped with three structural encodings (Centrality Encoding, Spatial Encoding, and Edge Encoding) that help Graphormer encode the structural information of graph data.
  • Centrality Encoding: captures node importance in the graph. In particular, we leverage degree centrality for the centrality encoding, where a learnable vector is assigned to each node according to its degree and added to the node features in the input layer.
  • Spatial Encoding: captures the structural relation between nodes.
  • Edge Encoding: incorporates edge features into the attention computation.
By using the above encodings, we further prove mathematically that Graphormer has strong expressive power, since many popular GNN variants are just special cases of it.
 

2 Graphormer

2.1 Structural Encodings in Graphormer

2.1.1 Centrality Encoding

In Graphormer, we use the degree centrality, which is one of the standard centrality measures in the literature, as an additional signal to the neural network. To be specific, we develop a Centrality Encoding which assigns each node two real-valued embedding vectors according to its indegree and outdegree.
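A minimal PyTorch sketch of this idea, assuming degrees are clipped to a fixed vocabulary size; the module name, embedding sizes, and argument names are illustrative assumptions, not the official implementation:

```python
import torch
import torch.nn as nn

class CentralityEncoding(nn.Module):
    """Adds learnable in-degree and out-degree embeddings to the input node features."""
    def __init__(self, hidden_dim, max_in_degree=512, max_out_degree=512):
        super().__init__()
        # one learnable vector per (clipped) degree value
        self.in_degree_emb = nn.Embedding(max_in_degree, hidden_dim)
        self.out_degree_emb = nn.Embedding(max_out_degree, hidden_dim)

    def forward(self, x, in_degree, out_degree):
        # x: [num_nodes, hidden_dim] node features
        # in_degree / out_degree: [num_nodes] integer degrees (assumed already clipped)
        return x + self.in_degree_emb(in_degree) + self.out_degree_emb(out_degree)
```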

2.1.2 Spatial Encoding

An advantage of the Transformer is its global receptive field.

Spatial Encoding:

In this paper, we choose φ(v_i, v_j) to be the distance of the shortest path (SPD) between v_i and v_j if the two nodes are connected. If not, we set the output of φ to a special value, i.e., -1. We assign each (feasible) output value a learnable scalar which serves as a bias term in the self-attention module. Denoting A_ij as the (i, j)-element of the Query-Key product matrix A, we have:

A_ij = (h_i W_Q)(h_j W_K)^T / √d + b_{φ(v_i, v_j)}
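A minimal PyTorch sketch of how this SPD-based bias can be added to the query-key scores; the bias-table size and the handling of unreachable pairs are illustrative assumptions:

```python
import torch
import torch.nn as nn

class SpatialEncodingBias(nn.Module):
    """Maps shortest-path distances (SPD) to learnable scalar biases on attention scores."""
    def __init__(self, max_dist=20):
        super().__init__()
        self.max_dist = max_dist
        # index 0 is reserved for unreachable pairs (phi = -1); 1..max_dist for SPD values
        self.bias_table = nn.Embedding(max_dist + 1, 1)

    def forward(self, q, k, spd):
        # q, k: [num_nodes, d] query/key projections
        # spd: [num_nodes, num_nodes] long tensor of shortest-path distances, -1 if unreachable
        d = q.size(-1)
        attn = q @ k.transpose(-1, -2) / d ** 0.5          # (h_i W_Q)(h_j W_K)^T / sqrt(d)
        idx = spd.clamp(-1, self.max_dist - 1) + 1         # shift so that unreachable -> index 0
        return attn + self.bias_table(idx).squeeze(-1)     # add b_{phi(v_i, v_j)}
```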

2.1.3 Edge Encoding in the Attention

In many graph tasks, edges also have structural features.

In the first method, the edge features are added to the associated nodes' features [21, 29].

  • [21] Open graph benchmark: Datasets for machine learning on graphs. arXiv preprint arXiv:2005.00687, 2020.
  • [29] DeeperGCN: All you need to train deeper GCNs. arXiv preprint arXiv:2006.07739, 2020.

In the second method, for each node, its associated edges' features will be used together with the node features in the aggregation [15, 51, 25].

  • [51] How powerful are graph neural networks? In International Conference on Learning Representations, 2019.
  • [25] Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.

However, such ways of using edge features only propagate the edge information to the associated nodes, which may not be an effective way to leverage edge information in the representation of the whole graph.

Graphormer therefore introduces a new edge encoding: for each node pair (v_i, v_j), find the shortest path SP_ij = (e_1, e_2, ..., e_N) and average the dot-products of each edge feature x_{e_n} with a learnable embedding w_n^E for its position on the path, then add the result as a bias term to the attention score:

c_ij = (1/N) Σ_{n=1}^{N} x_{e_n} (w_n^E)^T,    A_ij = (h_i W_Q)(h_j W_K)^T / √d + b_{φ(v_i, v_j)} + c_ij
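A rough PyTorch sketch of this edge encoding, assuming the edge features along each shortest path have already been gathered and zero-padded per node pair; the shapes and padding scheme are assumptions made for illustration:

```python
import torch
import torch.nn as nn

class EdgeEncodingBias(nn.Module):
    """Averages dot-products between edge features along the shortest path and
    position-specific learnable weight vectors, producing a per-pair attention bias."""
    def __init__(self, edge_dim, max_path_len=20):
        super().__init__()
        # one learnable weight vector w_n^E per position n on the shortest path
        self.edge_weights = nn.Parameter(torch.randn(max_path_len, edge_dim))

    def forward(self, path_edge_feats, path_lens):
        # path_edge_feats: [num_nodes, num_nodes, max_path_len, edge_dim]
        #   zero-padded edge features of the shortest path between each node pair
        # path_lens: [num_nodes, num_nodes] number of edges on each shortest path
        dots = (path_edge_feats * self.edge_weights).sum(-1)   # x_{e_n} . w_n^E per position
        return dots.sum(-1) / path_lens.clamp(min=1)           # average over the path -> c_ij
```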

2.2 Implementation Details of Graphormer

Graphormer Layer (a minimal sketch follows the list below). Graphormer applies layer normalization before the attention and feed-forward blocks (pre-LN), i.e., h'^(l) = MHA(LN(h^(l-1))) + h^(l-1) and h^(l) = FFN(LN(h'^(l))) + h'^(l), where

  • MHA: multi-head self-attention
  • FFN: the feed-forward block
  • LN: layer normalization
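
A minimal PyTorch sketch of this pre-LN layer; here nn.MultiheadAttention's additive attn_mask stands in for the per-head learned spatial/edge biases, which is a simplification of the actual implementation:

```python
import torch.nn as nn

class GraphormerLayer(nn.Module):
    """Pre-LN Transformer block: h' = MHA(LN(h)) + h ; h_out = FFN(LN(h')) + h'."""
    def __init__(self, hidden_dim, num_heads, ffn_dim):
        super().__init__()
        self.ln1 = nn.LayerNorm(hidden_dim)
        self.mha = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(hidden_dim)
        self.ffn = nn.Sequential(
            nn.Linear(hidden_dim, ffn_dim),
            nn.GELU(),
            nn.Linear(ffn_dim, hidden_dim),
        )

    def forward(self, h, attn_bias=None):
        # h: [batch, n, hidden_dim]
        # attn_bias: optional additive float mask of shape [n, n] or
        # [batch * num_heads, n, n], carrying the spatial / edge encodings described above
        x = self.ln1(h)
        h = h + self.mha(x, x, x, attn_mask=attn_bias, need_weights=False)[0]
        h = h + self.ffn(self.ln2(h))
        return h
```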

Special Node:

A virtual node [VNode] is added and connected to every node in the graph, and its spatial encoding with respect to every node is a distinct learnable scalar.
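A small sketch of how the virtual node could be prepended and given its distinct learnable scalar bias; names and shapes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class VirtualNode(nn.Module):
    """Prepends a graph-level [VNode] token whose spatial bias to every real node
    is a single distinct learnable scalar rather than a shortest-path distance."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.vnode_emb = nn.Parameter(torch.zeros(1, hidden_dim))
        self.vnode_bias = nn.Parameter(torch.zeros(1))  # shared scalar for (VNode, v) pairs

    def forward(self, h, attn_bias):
        # h: [n, hidden_dim] node features; attn_bias: [n, n] bias for real node pairs
        n = h.size(0)
        h = torch.cat([self.vnode_emb, h], dim=0)   # [n + 1, hidden_dim]
        bias = h.new_zeros(n + 1, n + 1)
        bias[1:, 1:] = attn_bias
        bias[0, :] = self.vnode_bias                # VNode attends to every node
        bias[:, 0] = self.vnode_bias                # every node attends to VNode
        return h, bias
```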

3 Experiments

3.1 OGB Large-Scale Challenge

3.2 Graph Representation
