Transformers for Graph Representation
Do Transformers Really Perform Bad for Graph Representation?
1 Introduction
The authors find that the key problem is how to restore the graph structural information that the Transformer's self-attention layers throw away. Unlike sequence data (NLP, speech) or grid data (CV), structural information is a property unique to graph data, and it plays an important role in predicting graph properties.
There have been many attempts to bring the Transformer into the graph domain, but so far the only effective way has been to replace some key modules (e.g., feature aggregation) in classic GNN variants with softmax attention [47, 7, 22, 48, 58, 43, 13]:
- [47] Graph Attention Networks. ICLR, 2018.
- [7] Graph Transformer for Graph-to-Sequence Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 7464–7471, 2020.
- [22] Heterogeneous Graph Transformer. In Proceedings of The Web Conference 2020, pages 2704–2710, 2020.
- [48] Direct Multi-hop Attention Based Graph Neural Network. arXiv preprint arXiv:2009.14332, 2020.
- [58] Graph-BERT: Only Attention Is Needed for Learning Graph Representations. arXiv preprint arXiv:2001.05140, 2020.
- [43] Self-Supervised Graph Transformer on Large-Scale Molecular Data. Advances in Neural Information Processing Systems, 33, 2020.
- [13] A Generalization of Transformer Networks to Graphs. AAAI Workshop on Deep Learning on Graphs: Methods and Applications, 2021.
Graphormer restores the lost structural information with three encodings:
- Centrality Encoding: captures node importance in the graph. In particular, degree centrality is used, where a learnable vector is assigned to each node according to its degree and added to the node features in the input layer.
- Spatial Encoding: captures the structural relation between nodes.
- Edge Encoding: incorporates edge features directly into the attention computation.
2 Graphormer
2.1 Structural Encodings in Graphormer

2.1.1 Centrality Encoding
In Graphormer, we use degree centrality, one of the standard centrality measures in the literature, as an additional signal to the neural network. To be specific, we develop a Centrality Encoding which assigns each node two real-valued embedding vectors according to its in-degree and out-degree, added to the node features in the input layer: $h_i^{(0)} = x_i + z^-_{\deg^-(v_i)} + z^+_{\deg^+(v_i)}$ (for undirected graphs, a single degree embedding is used).
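A minimal PyTorch sketch of such a degree-based centrality encoding (module and argument names like `max_degree` are illustrative, not from the paper's code):

```python
import torch
import torch.nn as nn

class CentralityEncoding(nn.Module):
    """Add learnable in-/out-degree embeddings to the input node features."""
    def __init__(self, max_degree: int, hidden_dim: int):
        super().__init__()
        # one learnable vector z^- / z^+ per possible degree value
        self.in_degree_emb = nn.Embedding(max_degree + 1, hidden_dim)
        self.out_degree_emb = nn.Embedding(max_degree + 1, hidden_dim)

    def forward(self, x: torch.Tensor, in_degree: torch.Tensor, out_degree: torch.Tensor):
        # x: [num_nodes, hidden_dim]; degrees: [num_nodes] integer tensors
        # h_i^(0) = x_i + z^-_{deg^-(v_i)} + z^+_{deg^+(v_i)}
        return x + self.in_degree_emb(in_degree) + self.out_degree_emb(out_degree)
```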

2.1.2 Spatial Encoding
An advantage of the Transformer is its global receptive field: each node can attend to every other node in a single layer. The byproduct, however, is that the attention itself is blind to the graph structure, so Graphormer injects it via a spatial encoding: a function φ(v_i, v_j) that measures the spatial relation between each pair of nodes.

In this paper, we choose φ(v_i, v_j) to be the shortest-path distance (SPD) between v_i and v_j if the two nodes are connected. If not, we set the output of φ to a special value, i.e., -1. We assign each (feasible) output value a learnable scalar which serves as a bias term in the self-attention module. Denoting $A_{ij}$ as the $(i,j)$-element of the Query-Key product matrix $A$, we have:

$$A_{ij} = \frac{(h_i W_Q)(h_j W_K)^T}{\sqrt{d}} + b_{\phi(v_i, v_j)}$$

where $b_{\phi(v_i, v_j)}$ is the learnable scalar indexed by $\phi(v_i, v_j)$ and shared across all layers.
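A sketch of how this bias might be implemented, assuming the shortest-path distances have been precomputed (e.g., BFS from each node); `max_dist` and the index shift for the unreachable case are illustrative choices:

```python
import torch
import torch.nn as nn

class SpatialEncoding(nn.Module):
    """Learnable attention bias indexed by shortest-path distance."""
    def __init__(self, max_dist: int, num_heads: int):
        super().__init__()
        # slot 0 holds the bias for the "unreachable" value (phi = -1);
        # slots 1..max_dist+1 hold the biases for SPD 0..max_dist
        self.bias = nn.Embedding(max_dist + 2, num_heads)

    def forward(self, spd: torch.Tensor):
        # spd: [num_nodes, num_nodes] integer SPD matrix, -1 for disconnected pairs
        idx = (spd + 1).clamp(max=self.bias.num_embeddings - 1)
        # result: [num_heads, num_nodes, num_nodes], added to raw attention scores
        return self.bias(idx).permute(2, 0, 1)
```

Because the bias depends only on φ, one table can be shared across all layers, matching the paper's description.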

2.1.3 Edge Encoding in the Attention
In many graph tasks, edges also have structural features.
In the first method, the edge features are added to the associated nodes' features [21, 29].
- [21] Open Graph Benchmark: Datasets for Machine Learning on Graphs. arXiv preprint arXiv:2005.00687, 2020.
- [29] DeeperGCN: All You Need to Train Deeper GCNs. arXiv preprint arXiv:2006.07739, 2020.
In the second method, for each node, its associated edges' features are used together with the node features in the aggregation [15, 51, 25].
- [51] How Powerful Are Graph Neural Networks? In International Conference on Learning Representations, 2019.
- [25] Semi-Supervised Classification with Graph Convolutional Networks. arXiv preprint arXiv:1609.02907, 2016.
However, such ways of using edge features only propagate the edge information to the associated nodes, which may not be an effective way to leverage edge information in the representation of the whole graph.

Graphormer therefore proposes a new edge encoding: for each node pair $(v_i, v_j)$, take (one of) the shortest paths $SP_{ij} = (e_1, e_2, \dots, e_N)$ between them and average the dot-products of the edge features with learnable position-wise embeddings along the path,

$$c_{ij} = \frac{1}{N}\sum_{n=1}^{N} x_{e_n} (w_n^E)^T,$$

where $x_{e_n}$ is the feature of the $n$-th edge on the path and $w_n^E$ is the $n$-th learnable edge embedding. The term $c_{ij}$ is then added to $A_{ij}$ as a bias, alongside $b_{\phi(v_i, v_j)}$.
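A sketch of this path-based bias, assuming the (zero-padded) edge features along each shortest path have been gathered in advance; the tensor names and padding scheme are illustrative:

```python
import torch
import torch.nn as nn

class EdgeEncoding(nn.Module):
    """Attention bias from edge features along shortest paths."""
    def __init__(self, max_path_len: int, edge_dim: int, num_heads: int):
        super().__init__()
        # one learnable weight vector w_n^E per path position and head
        self.edge_weights = nn.Parameter(torch.randn(max_path_len, num_heads, edge_dim))

    def forward(self, path_edge_feat: torch.Tensor, path_len: torch.Tensor):
        # path_edge_feat: [N, N, max_path_len, edge_dim], zero-padded edge
        #                 features along a shortest path between each node pair
        # path_len:       [N, N] number of edges on each path
        # dot product of each edge feature with its position's weight vector
        c = torch.einsum('ijle,lhe->ijlh', path_edge_feat, self.edge_weights)
        # average over the path positions (zero padding contributes nothing)
        c = c.sum(dim=2) / path_len.clamp(min=1).unsqueeze(-1)
        # result: [num_heads, N, N], added to the attention scores
        return c.permute(2, 0, 1)
```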

2.2 Implementation Details of Graphormer
Graphormer Layer:
- MHA: multi-head self-attention
- FFN: the feed-forward network
- LN: layer normalization

Unlike the original Transformer, the layer normalization is applied before MHA and FFN instead of after (the pre-LN variant):

$$h'^{(l)} = \mathrm{MHA}(\mathrm{LN}(h^{(l-1)})) + h^{(l-1)}, \qquad h^{(l)} = \mathrm{FFN}(\mathrm{LN}(h'^{(l)})) + h'^{(l)}$$
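A minimal pre-LN layer sketch in PyTorch; the structural biases (spatial and edge encodings) would enter as an additive float `attn_mask`, which `nn.MultiheadAttention` adds to the raw attention scores. Dimension names are illustrative:

```python
import torch
import torch.nn as nn

class GraphormerLayer(nn.Module):
    """Pre-LN Transformer layer: LN is applied before MHA and FFN."""
    def __init__(self, hidden_dim: int, num_heads: int, ffn_dim: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(hidden_dim)
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(hidden_dim)
        self.ffn = nn.Sequential(
            nn.Linear(hidden_dim, ffn_dim),
            nn.GELU(),
            nn.Linear(ffn_dim, hidden_dim),
        )

    def forward(self, h: torch.Tensor, attn_bias: torch.Tensor = None):
        # h'^(l) = MHA(LN(h^(l-1))) + h^(l-1)
        x = self.ln1(h)
        x, _ = self.attn(x, x, x, attn_mask=attn_bias)
        h = h + x
        # h^(l) = FFN(LN(h'^(l))) + h'^(l)
        h = h + self.ffn(self.ln2(h))
        return h
```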

Special Node:
A virtual node [VNode] is generated and connected to every node in the graph, while its spatial encodings with respect to all nodes are a distinct learnable scalar, so the model can distinguish these virtual connections from physical ones.
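A sketch of how the [VNode] could be prepended to a graph; `vnode_dist_idx` is an assumed reserved slot in the spatial-encoding table whose entry is the distinct learnable scalar:

```python
import torch

def add_virtual_node(x: torch.Tensor, spd: torch.Tensor,
                     vnode_emb: torch.Tensor, vnode_dist_idx: int):
    """Prepend a [VNode] token connected to every node (illustrative sketch).

    x:              [num_nodes, hidden_dim] node features
    spd:            [num_nodes, num_nodes] shortest-path distance matrix
    vnode_emb:      [hidden_dim] learnable embedding of the virtual node
    vnode_dist_idx: reserved distance index mapping to the distinct scalar
    """
    n = x.size(0)
    x = torch.cat([vnode_emb.unsqueeze(0), x], dim=0)       # [n+1, hidden_dim]
    spd_new = spd.new_full((n + 1, n + 1), vnode_dist_idx)  # VNode rows/cols
    spd_new[1:, 1:] = spd                                   # keep real distances
    spd_new[0, 0] = 0                                       # VNode self-distance
    return x, spd_new
```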
3 Experiments
3.1 OGB Large-Scale Challenge

3.2 Graph Representation
