Do Transformers Really Perform Badfor Graph Representation?

microsoft/Graphormer: This is the official implementation for "Do Transformers Really Perform Bad for Graph Representation?". (github.com)

1 Introduction

作者们发现关键问题在于如何补回Transformer模型的自注意力层丢失掉的图结构信息!不同于序列数据(NLP, Speech)或网格数据(CV),图的结构信息是图数据特有的属性,且对图的性质预测起着重要的作用。

There are many attempts of leveraging Transformer into the graph domain, but the only effective way is replacing some key modules (e.g., feature aggregation) in classic GNN variants by the softmax attention[47,7,22,48,58,43,13]

  • [47] Graph attention networks. ICLR, 2018.
  • [7] Graph transformer for graph-to-sequence learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 7464–7471, 2020.
  • [22] Heterogeneous graph transformer. In Proceedings of The Web Conference 2020, pages 2704–2710, 2020.
  • [48] Direct multi-hop attention based graph neural network.arXiv preprint arXiv:2009.14332, 2020.
  • [58] Graph-bert: Only attention is needed forlearning graph representations.arXiv preprint arXiv:2001.05140, 2020.
  • [43] Self-supervised graph transformer on large-scale molecular data. Advances in Neural Information ProcessingSystems, 33, 2020.
  • [13] generalization of transformer networks to graphs. AAAI Workshop on Deep Learning on Graphs: Methods and Applications, 2021
对于每个节点,the self-attention 只计算节点和其他节点之间的语义相似性,而不考虑反映在节点上的图的结构信息和节点对之间的关系。
基于此,研究人员们在图预测任务上提出了Graphormer模型 —— 一个标准的Transformer模型,并且带有三种结构信息编码(中心性编码Centrality Encoding、空间编码Spatial Encoding以及边编码Edge Encoding),帮助Graphormer模型编码图数据的结构信息。
  • Centrality Encoding: capture the node importance in the graph. In particular, we leverage the degree centrality for the centrality encoding, where a learnable vectoris assigned to each node according to its degree and added to the node features in the input layer.
  • Spatial Encoding: capture the structural relation between nodes.
  • Edge Encoding
通过使用上述编码,我们进一步从数学上证明了Graphormer具有很强的表达能力,因为许多流行的GNN变体只是它的特例。
 

2 Graphormer

2.1 Structural Encodings in Graphormer

2.1.1 a Centrality Encoding

In Graphormer, we use the degree centrality, which is one of the standard centrality measures inliterature, as an additional signal to the neural network. To be specific, we develop a Centrality Encoding which assigns each node two real-valued embedding vectors according to its indegree and outdegree.

2.1.2 a Centrality Encoding

An advantage of Transformer is its global receptive field.

Spatial Encoding:

In this paper, we choose φ(vi,vj) to be the distance of the shortest path (SPD) between vi and vj if the two nodes are connected. If not, we set the output ofφto be a special value, i.e., -1. We assign each (feasible) output value a learnable scalar which will serve as a bias term in the self-attention module. Denote Aij as the  (i,j)-element of the Query-Key product matrix A, we have:

2.1.3 Edge Encoding in the Attention

In many graph tasks, edges also have structural features.

In the first method, the edge features areadded to the associated nodes’ features [21,29].

  • [21] Open graph benchmark: Datasets for machine learning on graphs.arXiv preprintarXiv:2005.00687, 2020.
  • [29] Deepergcn: All you need to train deepergcns.arXiv preprint arXiv:2006.07739, 2020

In the second method, for each node, its associated edges’ features will be used together with the node features in the aggregation [15,51,25].

  • [51] How powerful are graph neural networks?InInternational Conference on Learning Representations, 2019.
  • [25] Semi-supervised classification with graph convolutional networks.arXiv preprint arXiv:1609.02907, 2016

However, such ways of using edge feature only propagate the edge information to its associated nodes, which may not be an effective way to leverage edge information in representation of the whole graph.

a new edge encoding method in Graphormer:

3.2 Implementation Details of Graphormer

Graphormer Layer:

  • MHA: multi-head self-attention (MHA)
  • FFN: the feed-forward blocks
  • LN: the layer normalization

Special Node:

生成一个VNODE连接图中所有的点,而它与所有节点的 spatial encodings 是 a distinct learnable scalar

3 Experiments

3.1 OGB Large-Scale Challenge

3.2 Graph Representation

Transformers for Graph Representation的更多相关文章

  1. 论文解读(Graphormer)《Do Transformers Really Perform Bad for Graph Representation?》

    论文信息 论文标题:Do Transformers Really Perform Bad for Graph Representation?论文作者:Chengxuan Ying, Tianle Ca ...

  2. 论文解读GALA《Symmetric Graph Convolutional Autoencoder for Unsupervised Graph Representation Learning》

    论文信息 Title:<Symmetric Graph Convolutional Autoencoder for Unsupervised Graph Representation Learn ...

  3. 论文解读(SUGRL)《Simple Unsupervised Graph Representation Learning》

    Paper Information Title:Simple Unsupervised Graph Representation LearningAuthors: Yujie Mo.Liang Pen ...

  4. 论文解读(GMI)《Graph Representation Learning via Graphical Mutual Information Maximization》2

    Paper Information 论文作者:Zhen Peng.Wenbing Huang.Minnan Luo.Q. Zheng.Yu Rong.Tingyang Xu.Junzhou Huang ...

  5. 论文解读(GMI)《Graph Representation Learning via Graphical Mutual Information Maximization》

    Paper Information 论文作者:Zhen Peng.Wenbing Huang.Minnan Luo.Q. Zheng.Yu Rong.Tingyang Xu.Junzhou Huang ...

  6. 论文解读(GRCCA)《 Graph Representation Learning via Contrasting Cluster Assignments》

    论文信息 论文标题:Graph Representation Learning via Contrasting Cluster Assignments论文作者:Chun-Yang Zhang, Hon ...

  7. 论文解读(MERIT)《Multi-Scale Contrastive Siamese Networks for Self-Supervised Graph Representation Learning》

    论文信息 论文标题:Multi-Scale Contrastive Siamese Networks for Self-Supervised Graph Representation Learning ...

  8. 论文解读(SUBG-CON)《Sub-graph Contrast for Scalable Self-Supervised Graph Representation Learning》

    论文信息 论文标题:Sub-graph Contrast for Scalable Self-Supervised Graph Representation Learning论文作者:Yizhu Ji ...

  9. 论文阅读 Dynamic Graph Representation Learning Via Self-Attention Networks

    4 Dynamic Graph Representation Learning Via Self-Attention Networks link:https://arxiv.org/abs/1812. ...

随机推荐

  1. vscode 将本地项目上传到github、从github克隆项目以及删除github上的某个文件夹

    一.将本地项目上传到github 1.创建本地仓库(文件夹) mkdir study//创建文件夹studycd study //进入study文件夹 2.通过命令git init把这个文件夹变成Gi ...

  2. 手写一个LRU工具类

    LRU概述 LRU算法,即最近最少使用算法.其使用场景非常广泛,像我们日常用的手机的后台应用展示,软件的复制粘贴板等. 本文将基于算法思想手写一个具有LRU算法功能的Java工具类. 结构设计 在插入 ...

  3. Java 进行时间处理

    Java 进行时间处理 一.Calendar (1).Calender介绍 Calendar的中文翻译是日历,实际上,在历史上有着许多种计时的方法.所以为了计时的统一,必需指定一个日历的选择.那现在最 ...

  4. LightningChart JS 3.0 新功能上线

    在这次的LC JS更新中,首次将极坐标图引入图表库. 这种全新的图表类型可以通过API轻松地进行样式设置.极坐标可以用作独立图表或在仪表板中使用. 另外,用于 XY图表的对数轴也添加到了这次的更新,L ...

  5. 【Web前端HTML5&CSS3】08-盒模型补充

    笔记来源:尚硅谷Web前端HTML5&CSS3初学者零基础入门全套完整版 目录 盒模型补充及田径场实战 1. 盒子大小 2. 轮廓 3. 阴影 4. 圆角 圆 椭圆 盒模型补充及田径场实战 1 ...

  6. Envoy:开启访问日志,access_log

    access_log: - name: envoy.listener.accesslog typed_config: "@type": type.googleapis.com/en ...

  7. 运维实战案例之“Too many open files”错误与解决方法

    运维实战案例之"Too many open files"错误与解决方法   技术小甜 2017-11-16 15:02:00 浏览869 服务器 shell tomcat 脚本 o ...

  8. docker部署harbor私有镜像库(3)

    一.harbor介绍 在实际生产运维中,往往需要把镜像发布到几十.上百台或更多的节点上.这时单台Docker主机上镜像已无法满足,项目越来越多,镜像就越来越多,都放到一台Docker主机上是不行的,我 ...

  9. linux初级之总结复习

    一.linux命令复习 1.ls:列出当前目录下的文件 -h: -l: -d: -a: 2. man: 命令帮助手册 3. cd: 切换目录 -:  ~: ..: cd: 4. pwd: 显示当前工作 ...

  10. JDK、JRE 和 JVM 的区别

    JDK JDK 是 Java Development Kit 的缩写,JDK 是 Java 语言的软件开发工具包( SDK ).它提供了Java 开发.编译.运行需要的文件和环境. 如果你是 Java ...