Paper Notes: Graph Attention Networks

Graph Attention Networks

2018-02-06 16:52:49

Abstract：

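　　This paper proposes graph attention networks (GATs), a novel architecture that operates on graph-structured data and uses masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations.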

　　Input: graph-structured data.

　　Method: masked self-attentional layers.

　　Goal: to address the shortcomings of prior methods based on graph convolutions or their approximations.

　　Approach: by stacking layers in which nodes are able to attend over their neighborhoods' features, the model can (implicitly) specify different weights for different nodes in a neighborhood, without requiring any kind of costly matrix operation or depending on knowing the graph structure upfront.

Introduction：

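　　Background: CNNs have been applied widely to grid-structured data and perform well on tasks such as object detection, semantic segmentation, and machine translation. Many kinds of data, however, do not have a grid-like structure, for example 3D meshes, social networks, telecommunication networks, biological networks, and brain connectomes.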

　　Several earlier works have tried to combine RNNs with graph-structured inputs in order to learn representations of them.

　　Currently there are two common ways to bring convolution to the graph domain:

　　1. spectral approaches

　　2. non-spectral approaches (spatial methods)

　　The paper briefly introduces both families and reviews some recent related work.

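　　The paper then turns to attention mechanisms, which are now widely used across many settings. One of their advantages is that they allow for dealing with variable-sized inputs, focusing on the most relevant parts of the input to make decisions. When attention is used to compute a representation of a single sequence, it is usually called self-attention or intra-attention. Combining self-attention with CNNs or RNNs has produced very good results.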

　　Inspired by this recent work, the authors propose an attention-based architecture for node classification on graph-structured data. The idea is to compute the hidden representation of each node in the graph by attending over its neighbors, following a self-attention strategy. This attention mechanism has several interesting properties:

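　　1. The operation is efficient, since it is parallelizable across node-neighbor pairs.

　　2. It can be applied to graph nodes of different degrees, by specifying arbitrary weights for each node's neighbors.

　　3. The model is directly applicable to inductive learning problems, including tasks where the model has to generalize to completely unseen graphs.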

　　Our approach of sharing a neural network computation across edges is reminiscent of the formulation of relational networks (Santoro et al., 2017), wherein relations between objects (regional features from an image extracted by a convolutional neural network) are aggregated across all object pairs, by employing a shared mechanism. 　　

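　　The authors run experiments on four benchmarks (the Cora, Citeseer, and Pubmed citation networks and a protein-protein interaction dataset) and match or exceed state-of-the-art results, which shows the potential of attention-based models on arbitrarily structured graphs.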

GAT Architecture：

1. Graph Attentional Layer

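　　The input to the proposed attentional layer is a set of node features, $\mathbf{h} = \{\vec{h}_1, \vec{h}_2, \ldots, \vec{h}_N\}$ with $\vec{h}_i \in \mathbb{R}^{F}$, where $N$ is the number of nodes and $F$ is the number of features per node. The layer produces a new set of node features (of potentially different dimension $F'$) as its output: $\mathbf{h}' = \{\vec{h}'_1, \vec{h}'_2, \ldots, \vec{h}'_N\}$ with $\vec{h}'_i \in \mathbb{R}^{F'}$.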

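　　To obtain sufficient expressive power to transform the input features into higher-level features, at least one learnable linear transformation is required. To that end, as an initial step, a shared linear transformation, parametrized by a weight matrix $\mathbf{W} \in \mathbb{R}^{F' \times F}$, is applied to every node. Self-attention is then performed on the nodes: a shared attentional mechanism $a : \mathbb{R}^{F'} \times \mathbb{R}^{F'} \to \mathbb{R}$ computes the attention coefficients

$$e_{ij} = a(\mathbf{W}\vec{h}_i, \mathbf{W}\vec{h}_j) \qquad (1)$$

　　which indicate the importance of node $j$'s features to node $i$. In its most general form, the model allows every node to attend on every other node, dropping all structural information. The graph structure is injected into the mechanism by performing masked attention: $e_{ij}$ is computed only for nodes $j \in N_{i}$, where $N_{i}$ is some neighborhood of node $i$ in the graph. In the experiments, this is exactly the set of first-order neighbors of $i$ (including $i$ itself).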
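　　The masked scoring step described above can be sketched in plain Python. This is only an illustrative sketch, not the authors' code: the helper names `linear` and `masked_scores`, the toy graph, and the dot-product stand-in for the mechanism $a$ are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]


def linear(W: Matrix, h: Vector) -> Vector:
    """Apply the shared weight matrix W (F' x F) to one node feature vector."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def masked_scores(
    features: List[Vector],
    W: Matrix,
    a: Callable[[Vector, Vector], float],
    neighbors: Dict[int, List[int]],
) -> Dict[Tuple[int, int], float]:
    """Compute e_ij = a(W h_i, W h_j) only for j in N_i (masked attention)."""
    transformed = [linear(W, h) for h in features]
    return {
        (i, j): a(transformed[i], transformed[j])
        for i, nbrs in neighbors.items()
        for j in nbrs
    }


def dot(x: Vector, y: Vector) -> float:
    """Hypothetical stand-in for the attention mechanism a."""
    return sum(p * q for p, q in zip(x, y))


# Toy path graph 0 - 1 - 2; each neighborhood also contains the node itself.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 2.0], [0.0, 1.0]]
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}

scores = masked_scores(features, W, dot, neighbors)
```

　　Nodes 0 and 2 are not adjacent, so no score is ever computed for the pair (0, 2). That is the whole effect of the mask: the cost scales with the number of edges rather than with $N^2$.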

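　　To make the coefficients easily comparable across different nodes, they are normalized across all choices of $j$ using the softmax function:

$$\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in N_{i}} \exp(e_{ik})} \qquad (2)$$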
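　　As a small illustration of this normalization, the sketch below applies a softmax to the raw scores of a single node over its neighborhood. The function name `neighborhood_softmax` and the example scores are made up for the illustration; subtracting the maximum before exponentiating is a standard numerical-stability trick and does not change the result.

```python
import math
from typing import Dict


def neighborhood_softmax(raw: Dict[int, float]) -> Dict[int, float]:
    """Normalize the raw scores e_ij of one node i over its neighbors j."""
    peak = max(raw.values())  # shifting by the max avoids overflow in exp
    exps = {j: math.exp(e - peak) for j, e in raw.items()}
    total = sum(exps.values())
    return {j: v / total for j, v in exps.items()}


# Raw scores of a hypothetical node whose neighbors are nodes 0, 1 and 2.
alpha = neighborhood_softmax({0: 2.0, 1: 5.0, 2: 7.0})

# Equal raw scores give equal weights, whatever the node's degree.
uniform = neighborhood_softmax({3: 1.0, 4: 1.0})
```

　　Because the sum runs only over the neighbors of $i$, the weights of every node sum to one regardless of its degree, which is what makes them comparable across nodes.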

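　　In the experiments, the attention mechanism $a$ is a single-layer feedforward neural network, parametrized by a weight vector $\vec{a} \in \mathbb{R}^{2F'}$ and followed by a LeakyReLU nonlinearity (with negative input slope 0.2). Fully expanded, the coefficients computed by the attention mechanism can be written as:

$$\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(\vec{a}^T [\mathbf{W}\vec{h}_i \,\|\, \mathbf{W}\vec{h}_j]\right)\right)}{\sum_{k \in N_{i}} \exp\left(\mathrm{LeakyReLU}\left(\vec{a}^T [\mathbf{W}\vec{h}_i \,\|\, \mathbf{W}\vec{h}_k]\right)\right)} \qquad (3)$$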

　　where $\cdot^T$ denotes transposition and $\|$ denotes the concatenation operation.
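　　The unnormalized score inside this attention mechanism, $\mathrm{LeakyReLU}(\vec{a}^T [\mathbf{W}\vec{h}_i \,\|\, \mathbf{W}\vec{h}_j])$, can be sketched as follows. The weight vector and the transformed features are arbitrary illustrative values, and `attention_score` is a hypothetical helper name.

```python
from typing import List


def leaky_relu(x: float, slope: float = 0.2) -> float:
    """LeakyReLU with negative-input slope 0.2."""
    return x if x > 0 else slope * x


def attention_score(a: List[float], wh_i: List[float], wh_j: List[float]) -> float:
    """e_ij = LeakyReLU(a^T [W h_i || W h_j])."""
    concat = wh_i + wh_j  # list concatenation plays the role of ||
    return leaky_relu(sum(w * x for w, x in zip(a, concat)))


a = [1.0, -1.0, 0.5, -2.0]  # illustrative weight vector of length 2F' = 4
wh_0 = [1.0, 0.0]           # illustrative transformed features W h_0
wh_1 = [2.0, 2.0]           # illustrative transformed features W h_1

e_01 = attention_score(a, wh_0, wh_1)  # importance of node 1 to node 0
e_10 = attention_score(a, wh_1, wh_0)  # importance of node 0 to node 1
```

　　The two scores differ because the first half of $\vec{a}$ multiplies the attending node and the second half multiplies the attended node, so the score is not symmetric in $i$ and $j$.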

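　　Once obtained, the normalized attention coefficients are used to compute a linear combination of the corresponding features, which (after potentially applying a nonlinearity $\sigma$) gives the final output feature vector of every node:

$$\vec{h}'_i = \sigma\left(\sum_{j \in N_{i}} \alpha_{ij} \mathbf{W}\vec{h}_j\right) \qquad (4)$$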
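　　Putting the pieces together, a single-head attentional layer can be sketched in plain Python as follows. This is a minimal, unoptimized sketch rather than the authors' implementation: the function name `gat_layer`, the default `tanh` nonlinearity, and the toy inputs are assumptions, and every neighborhood is assumed to be non-empty (for example because it contains the node itself).

```python
import math
from typing import Callable, Dict, List

Vector = List[float]
Matrix = List[Vector]


def gat_layer(
    features: List[Vector],
    W: Matrix,
    a: Vector,
    neighbors: Dict[int, List[int]],
    sigma: Callable[[float], float] = math.tanh,  # assumed nonlinearity
    slope: float = 0.2,
) -> List[Vector]:
    """Single-head layer: h'_i = sigma(sum over j in N_i of alpha_ij W h_j)."""
    wh = [[sum(w * x for w, x in zip(row, h)) for row in W] for h in features]
    out = []
    for i in range(len(features)):
        # Raw scores e_ij = LeakyReLU(a^T [W h_i || W h_j]) for j in N_i only.
        raw = []
        for j in neighbors[i]:
            s = sum(w * x for w, x in zip(a, wh[i] + wh[j]))
            raw.append(s if s > 0 else slope * s)
        # Softmax over the neighborhood (max-shifted for numerical stability).
        peak = max(raw)
        exps = [math.exp(e - peak) for e in raw]
        total = sum(exps)
        alpha = [v / total for v in exps]
        # Attention-weighted sum of transformed neighbor features, then sigma.
        agg = [
            sum(al * wh[j][d] for al, j in zip(alpha, neighbors[i]))
            for d in range(len(W))
        ]
        out.append([sigma(v) for v in agg])
    return out


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}

# With a = 0 every score is equal, so the layer reduces to neighborhood averaging.
averaged = gat_layer(features, identity, [0.0] * 4, neighbors, sigma=lambda v: v)
```

　　The zero-weight case is a useful sanity check: uniform attention recovers a plain mean over each neighborhood, and a learned $\vec{a}$ is what lets the layer depart from that.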

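　　To stabilize the learning process of self-attention, the authors found it beneficial to extend the mechanism to multi-head attention, similarly to "Attention Is All You Need" (Vaswani et al., 2017). Specifically, $K$ independent attention mechanisms execute the transformation of Equation (4), and their features are then concatenated, giving the following output representation:

$$\vec{h}'_i = \Big\Vert_{k=1}^{K} \sigma\left(\sum_{j \in N_{i}} \alpha_{ij}^{k} \mathbf{W}^{k}\vec{h}_j\right) \qquad (5)$$

　　where $\alpha_{ij}^{k}$ are the normalized attention coefficients computed by the $k$-th attention mechanism and $\mathbf{W}^{k}$ is the corresponding weight matrix. In this setting each node's output $\vec{h}'_i$ has $KF'$ features rather than $F'$.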

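　　In particular, if multi-head attention is performed on the final (prediction) layer of the network, concatenation is no longer sensible. Instead, averaging is employed, and the final nonlinearity (usually a softmax or logistic sigmoid for classification problems) is delayed until after the average:

$$\vec{h}'_i = \sigma\left(\frac{1}{K} \sum_{k=1}^{K} \sum_{j \in N_{i}} \alpha_{ij}^{k} \mathbf{W}^{k}\vec{h}_j\right) \qquad (6)$$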
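　　The difference between the two ways of combining heads can be sketched as follows. The sketch starts from the per-head aggregates $\sum_{j} \alpha_{ij}^{k} \mathbf{W}^{k}\vec{h}_j$ of a single node, with made-up values; the helper names are hypothetical, and the nonlinearity is applied elementwise for simplicity, whereas a softmax output would act on the whole averaged vector.

```python
import math
from typing import Callable, List

Vector = List[float]


def combine_concat(heads: List[Vector], sigma: Callable[[float], float]) -> Vector:
    """Hidden layers: apply sigma per head, then concatenate (K * F' features)."""
    return [sigma(v) for head in heads for v in head]


def combine_average(heads: List[Vector], sigma: Callable[[float], float]) -> Vector:
    """Output layer: average the heads first, then apply sigma (F' features)."""
    k = len(heads)
    return [sigma(sum(vals) / k) for vals in zip(*heads)]


# Pre-activation aggregates of one node for K = 3 heads with F' = 2 each.
heads = [[1.0, -2.0], [3.0, 0.0], [-1.0, 2.0]]

hidden = combine_concat(heads, math.tanh)       # length K * F' = 6
output = combine_average(heads, lambda v: v)    # length F' = 2
```

　　Concatenation multiplies the feature dimension by $K$, which is acceptable for hidden layers, while averaging keeps the dimension at $F'$ so that it can match the number of classes in the prediction layer.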

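　　The paper illustrates the proposed attention weighting mechanism with a schematic (Figure 1 in the paper). Its left panel shows the attention mechanism $a(\mathbf{W}\vec{h}_i, \mathbf{W}\vec{h}_j)$, parametrized by the weight vector $\vec{a}$ and followed by a LeakyReLU activation. Its right panel shows multi-head attention by a node over its neighborhood, with the features from each head concatenated or averaged to obtain $\vec{h}'_i$.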
