Do Transformers Really Perform Badfor Graph Representation?

microsoft/Graphormer: This is the official implementation for "Do Transformers Really Perform Bad for Graph Representation?". (github.com)

1 Introduction

作者们发现关键问题在于如何补回Transformer模型的自注意力层丢失掉的图结构信息！不同于序列数据(NLP, Speech)或网格数据(CV)，图的结构信息是图数据特有的属性，且对图的性质预测起着重要的作用。

There are many attempts of leveraging Transformer into the graph domain, but the only effective way is replacing some key modules (e.g., feature aggregation) in classic GNN variants by the softmax attention[47,7,22,48,58,43,13]

[47] Graph attention networks. ICLR, 2018.
[7] Graph transformer for graph-to-sequence learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 7464–7471, 2020.
[22] Heterogeneous graph transformer. In Proceedings of The Web Conference 2020, pages 2704–2710, 2020.
[48] Direct multi-hop attention based graph neural network.arXiv preprint arXiv:2009.14332, 2020.
[58] Graph-bert: Only attention is needed forlearning graph representations.arXiv preprint arXiv:2001.05140, 2020.
[43] Self-supervised graph transformer on large-scale molecular data. Advances in Neural Information ProcessingSystems, 33, 2020.
[13] generalization of transformer networks to graphs. AAAI Workshop on Deep Learning on Graphs: Methods and Applications, 2021

对于每个节点，the self-attention 只计算节点和其他节点之间的语义相似性，而不考虑反映在节点上的图的结构信息和节点对之间的关系。

基于此，研究人员们在图预测任务上提出了Graphormer模型 —— 一个标准的Transformer模型，并且带有三种结构信息编码(中心性编码Centrality Encoding、空间编码Spatial Encoding以及边编码Edge Encoding)，帮助Graphormer模型编码图数据的结构信息。

Centrality Encoding: capture the node importance in the graph. In particular, we leverage the degree centrality for the centrality encoding, where a learnable vectoris assigned to each node according to its degree and added to the node features in the input layer.
Spatial Encoding: capture the structural relation between nodes.
Edge Encoding

通过使用上述编码，我们进一步从数学上证明了Graphormer具有很强的表达能力，因为许多流行的GNN变体只是它的特例。

2 Graphormer

2.1 Structural Encodings in Graphormer

2.1.1 a Centrality Encoding

In Graphormer, we use the degree centrality, which is one of the standard centrality measures inliterature, as an additional signal to the neural network. To be specific, we develop a Centrality Encoding which assigns each node two real-valued embedding vectors according to its indegree and outdegree.

2.1.2 a Centrality Encoding

An advantage of Transformer is its global receptive field.

Spatial Encoding:

In this paper, we choose φ(vi,vj) to be the distance of the shortest path (SPD) between vi and vj if the two nodes are connected. If not, we set the output ofφto be a special value, i.e., -1. We assign each (feasible) output value a learnable scalar which will serve as a bias term in the self-attention module. Denote Aij as the (i,j)-element of the Query-Key product matrix A, we have:

2.1.3 Edge Encoding in the Attention

In many graph tasks, edges also have structural features.

In the first method, the edge features areadded to the associated nodes’ features [21,29].

[21] Open graph benchmark: Datasets for machine learning on graphs.arXiv preprintarXiv:2005.00687, 2020.
[29] Deepergcn: All you need to train deepergcns.arXiv preprint arXiv:2006.07739, 2020

In the second method, for each node, its associated edges’ features will be used together with the node features in the aggregation [15,51,25].

[51] How powerful are graph neural networks?InInternational Conference on Learning Representations, 2019.
[25] Semi-supervised classification with graph convolutional networks.arXiv preprint arXiv:1609.02907, 2016

However, such ways of using edge feature only propagate the edge information to its associated nodes, which may not be an effective way to leverage edge information in representation of the whole graph.

a new edge encoding method in Graphormer:

3.2 Implementation Details of Graphormer

Graphormer Layer:

MHA: multi-head self-attention (MHA)
FFN: the feed-forward blocks
LN: the layer normalization

Special Node:

生成一个VNODE连接图中所有的点，而它与所有节点的 spatial encodings 是 a distinct learnable scalar

3 Experiments

3.1 OGB Large-Scale Challenge

3.2 Graph Representation

Transformers for Graph Representation的更多相关文章

论文解读（Graphormer）《Do Transformers Really Perform Bad for Graph Representation?》
论文信息论文标题:Do Transformers Really Perform Bad for Graph Representation?论文作者:Chengxuan Ying, Tianle Ca ...
论文解读GALA《Symmetric Graph Convolutional Autoencoder for Unsupervised Graph Representation Learning》
论文信息 Title:<Symmetric Graph Convolutional Autoencoder for Unsupervised Graph Representation Learn ...
论文解读（SUGRL）《Simple Unsupervised Graph Representation Learning》
Paper Information Title:Simple Unsupervised Graph Representation LearningAuthors: Yujie Mo.Liang Pen ...
论文解读（GMI）《Graph Representation Learning via Graphical Mutual Information Maximization》2
Paper Information 论文作者:Zhen Peng.Wenbing Huang.Minnan Luo.Q. Zheng.Yu Rong.Tingyang Xu.Junzhou Huang ...
论文解读（GMI）《Graph Representation Learning via Graphical Mutual Information Maximization》
Paper Information 论文作者:Zhen Peng.Wenbing Huang.Minnan Luo.Q. Zheng.Yu Rong.Tingyang Xu.Junzhou Huang ...
论文解读（GRCCA）《 Graph Representation Learning via Contrasting Cluster Assignments》
论文信息论文标题:Graph Representation Learning via Contrasting Cluster Assignments论文作者:Chun-Yang Zhang, Hon ...
论文解读（MERIT）《Multi-Scale Contrastive Siamese Networks for Self-Supervised Graph Representation Learning》
论文信息论文标题:Multi-Scale Contrastive Siamese Networks for Self-Supervised Graph Representation Learning ...
论文解读（SUBG-CON）《Sub-graph Contrast for Scalable Self-Supervised Graph Representation Learning》
论文信息论文标题:Sub-graph Contrast for Scalable Self-Supervised Graph Representation Learning论文作者:Yizhu Ji ...
论文阅读 Dynamic Graph Representation Learning Via Self-Attention Networks
4 Dynamic Graph Representation Learning Via Self-Attention Networks link:https://arxiv.org/abs/1812. ...

随机推荐

Mybatis 遍历 List<Map<String,Object>>
在上一篇博客中总结了MyBatis Plus 实现多表分页模糊查询(链接在最后).返回类型是编写一个专门的vo类.这次是返回List < Map > 前言编写一个专门的vo返回类,主 ...
jenkins 下使用ansible 跨服务器控制操作
例如: A服务器地址:172.16.1.203 B服务器地址:172.16.1.204 当jenkins 在A 服务器并且用户aa, 控制B 服务器的用户bb的操作 (1)B服务器用ssh-key ...
Java堆的理解
堆的核心概述所有的对象实例以及数组都应当在运行时分配在堆上从实际实用角度看 --"几乎所有的对象实例都在堆中分配内存" 数组和对象可能永远不会存储在栈上,因为栈帧中保存引用,这 ...
.NET平台系列9 .NET Core 3.0 / .NET Core 3.1 详解
系列目录 [已更新最新开发文章,点击查看详细] .NET Core 3.0 于 2019年9月23日发布,重点是增加对同时支持使用 Windwos Forms.WPF 和 Entity Frm ...
[Java] Spring 原理
IOC(Inverse of Control)控制反转依赖对象的获得被反转了,由自己创建变为从IOC容器获取优点代码更简介,不需要new对象面向接口编程,使用者与具体类解耦,易扩展方便进行A ...
PyCharm和JDK安装与配置（windows）
原创 PyCharm和JDK安装与配置(windows) mb5cd21e691f31a关注0人评论2024人阅读2020-03-20 21:08:41 一.PyCharm安装与配置 PyChar ...
与find不同，locate并不是实时查找。你需要更新数据库，以获得最新的文件索引信息。updatedb
find是实时查找,如果需要更快的查询,可试试locate:locate会为文件系统建立索引数据库,如果有文件更新,需要定期执行更新命令来更新索引库: $locate string 寻找包含有stri ...
gcc 版本
$ gcc --versiongcc (Ubuntu 5.4.0-6kord1~16.04.4k2) 5.4.0 20160609Copyright (C) 2015 Free Software Fo ...
Ansible_常用文件模块使用详解
一.Ansibel常用文件模块使用详解 1.file模块 1️⃣:file模块常用的参数列表: path 被管理文件的路径 state状态常用参数: absent 删除 ...
Scala 中为什么不建议用 return 关键字
在scala中使用 return 的话,编译的时候会提示the latest statement is method is automatically returned, use of th retu ...

Transformers for Graph Representation