Paper information

Title: Rumor Detection on Social Media with Event Augmentations
Authors: Zhenyu He, Ce Li, Fan Zhou, Yi Yang
Venue: SIGIR 2021
Paper link: download
Code link: download

1 Introduction

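　　Existing deep learning methods for rumor detection have been very successful, but they need large, reliably labeled datasets for training, which are time-consuming to build and make these methods data-inefficient. To address this, the paper proposes RDEA (Rumor Detection on social media with Event Augmentations), which integrates three augmentation strategies that modify reply attributes and event structure in order to extract meaningful rumor propagation patterns and learn intrinsic representations of user engagement.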

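　　Contributions:

- Three interpretable data augmentation strategies are designed, which had not been well explored for rumor event graph data;
- Contrastive self-supervised learning is used for pre-training on rumor datasets;
- RDEA substantially outperforms the supervised learning methods it is compared with.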

2 Methodology

　　The overall framework consists of three main modules:

- event graph data augmentation
- contrastive pre-training
- model fine-tuning

2.1 Event Augmentation

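2.1.1 Node mask

　　Two kinds of users take part in a rumor event:

- malicious users
- naive users

　　Malicious users spread false information deliberately, while naive users unintentionally help them spread it. Masking the features of some nodes is therefore a feasible augmentation.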

　　Given the node feature matrix without the root node, $E^{-r} \in \mathbb{R}^{(|\mathcal{V}|-1) \times d}$, and a mask rate $p_{m}$, the masked node feature matrix is:

　　　　$E_{\text {mask }}^{-r}=M \odot E^{-r} $

　　where $M \in\{0,1\}^{(|\mathcal{V}|-1) \times d}$ is the mask matrix, which zeroes out $ (|\mathcal{V}|-1) \times p_{m}$ randomly chosen rows of the node feature matrix.
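　　The masking step can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the function name `mask_node_features`, the `seed` argument, and rounding the number of masked rows down are all assumptions made here.

```python
import random


def mask_node_features(features, mask_rate, seed=None):
    """Zero out a random subset of rows of the reply feature matrix.

    `features` holds one row per reply node (the root row is assumed to be
    excluded by the caller). Hypothetical helper, for illustration only.
    """
    rng = random.Random(seed)
    num_rows = len(features)
    # Assumption: the number of masked rows is rounded down.
    num_masked = int(num_rows * mask_rate)
    masked_rows = set(rng.sample(range(num_rows), num_masked))
    # Row i of the implicit mask matrix M is all zeros if i is masked,
    # all ones otherwise, so M * E either clears or copies the row.
    return [
        [0.0] * len(row) if i in masked_rows else list(row)
        for i, row in enumerate(features)
    ]


replies = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
masked = mask_node_features(replies, mask_rate=0.5, seed=0)
```

　　With four reply rows and a mask rate of 0.5, two rows come back as zeros and the other two are unchanged.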

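2.1.2 Subgraph

　　In the early stage of an event, users usually support true rumors. If the model sees too much of the full life cycle of a rumor event during training, the accuracy of early rumor detection suffers, so the paper uses random walks to generate a subgraph $G_{i\_sub}$ of each rumor event.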
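　　One possible way to sample such a subgraph is sketched below. The notes do not give the walk's details, so the start node, the walk length, the restart-on-dead-end rule, and the name `random_walk_subgraph` are assumptions for illustration.

```python
import random


def random_walk_subgraph(adjacency, start, walk_length, seed=None):
    """Collect the nodes visited by a random walk and the edges among them.

    `adjacency` maps each node to a list of its neighbors (undirected).
    Hypothetical helper; the paper's exact sampling procedure may differ.
    """
    rng = random.Random(seed)
    visited = {start}
    current = start
    for _ in range(walk_length):
        neighbors = adjacency.get(current, [])
        if not neighbors:
            # Assumption: restart from the start node at a dead end.
            current = start
            continue
        current = rng.choice(neighbors)
        visited.add(current)
    # Keep each undirected edge once (u < v) if both endpoints were visited.
    edges = [
        (u, v)
        for u in sorted(visited)
        for v in adjacency.get(u, [])
        if v in visited and u < v
    ]
    return visited, edges


# Toy propagation tree: node 0 is the source post.
tree = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
nodes, edges = random_walk_subgraph(tree, start=0, walk_length=3, seed=1)
```

　　Starting the walk at the source post keeps the root in every sampled subgraph, which is a design choice of this sketch rather than something stated in the notes.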

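2.1.3 Edge dropping

　　Formally, given an adjacency matrix $A$ with $N_{e}$ edges and a drop rate $p_{d}$, the adjacency matrix $A_{drop}$ after applying DropEdge is computed as:

　　　　$A_{d r o p}=A-A^{\prime}$

　　where $A^{\prime}$ is the adjacency matrix formed by $N_{e} \times p_{d} $ randomly sampled edges.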
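　　On an edge list, subtracting $A^{\prime}$ from $A$ amounts to removing the sampled edges. The sketch below assumes an edge-list representation and rounds the number of dropped edges down; `drop_edges` is a hypothetical name.

```python
import random


def drop_edges(edges, drop_rate, seed=None):
    """Remove a random fraction of edges (edge-list form of A - A')."""
    rng = random.Random(seed)
    # Assumption: the number of dropped edges is rounded down.
    num_dropped = int(len(edges) * drop_rate)
    dropped = set(rng.sample(range(len(edges)), num_dropped))
    # The sampled indices play the role of A'; everything else is kept.
    return [edge for i, edge in enumerate(edges) if i not in dropped]


event_edges = [(0, 1), (0, 2), (1, 3), (1, 4)]
kept = drop_edges(event_edges, drop_rate=0.5, seed=0)
```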

2.2 Contrastive Pre-training

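　　This section describes how contrastive pre-training between the input event and its augmented versions is used to capture mutual information.

　　Formally, for a node $j$ and an event graph $G$, the self-supervised learning process is:

　　　　$\begin{aligned}h_{j}^{(k)} &=\operatorname{GCL}\left(h_{j}^{(k-1)}\right) \\h^{j} &=\operatorname{CONCAT}\left(\left\{h_{j}^{(k)}\right\}_{k=1}^{K}\right)\\H(G) &=\operatorname{READOUT}\left(\left\{h^{j}\right\}_{j=1}^{|\mathcal{V}|}\right)\end{aligned}$

　　where $h_{j}^{(k)}$ is the feature vector of node $j$ at layer $k$, and GCL is the graph convolutional encoder. $h^{j}$ concatenates the feature vectors from all GCL layers into a single vector that captures information at different scales centered on each node, and $H(G)$ is the global representation of the given event graph obtained with the READOUT function. The paper chooses GIN as the GCL and the mean as the READOUT function. The goal of contrastive pre-training is to maximize the mutual information (MI) over the rumor propagation graph dataset, computed as: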
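　　The three encoder steps (layer-wise update, concatenation across layers, mean readout) can be traced on a toy graph. The layer below is only a simplified stand-in for GIN: it sums a node's features with its neighbors' and applies ReLU, and it omits the learned MLP that a real GIN layer has. All function names are hypothetical.

```python
def gin_like_layer(h, adjacency, eps=0.0):
    """Simplified GIN-style update: ReLU((1 + eps) * h_j + sum of neighbors).

    A real GIN layer feeds this sum through a learned MLP, omitted here.
    """
    out = []
    for j, h_j in enumerate(h):
        agg = [(1.0 + eps) * x for x in h_j]
        for u in adjacency[j]:
            agg = [a + x for a, x in zip(agg, h[u])]
        out.append([max(0.0, a) for a in agg])
    return out


def encode(h0, adjacency, num_layers):
    """Return per-node vectors h^j and the graph vector H(G)."""
    layer_outputs = []
    h = h0
    for _ in range(num_layers):
        h = gin_like_layer(h, adjacency)
        layer_outputs.append(h)
    # h^j: concatenate node j's vectors from all K layers.
    node_repr = [
        sum((layer[j] for layer in layer_outputs), [])
        for j in range(len(h0))
    ]
    # H(G): mean READOUT over all node vectors.
    dim = len(node_repr[0])
    graph_repr = [
        sum(vec[i] for vec in node_repr) / len(node_repr) for i in range(dim)
    ]
    return node_repr, graph_repr


# Star graph: node 0 is connected to nodes 1 and 2.
adjacency = [[1, 2], [0], [0]]
node_repr, graph_repr = encode([[1.0], [2.0], [3.0]], adjacency, num_layers=2)
```

　　Because each layer's output is kept, a node's final vector mixes its 1-hop and 2-hop neighborhoods, which is the "different scales" idea described above.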

　　　　$\begin{aligned}I_{\psi}\left(h^{j}(G) ; H(G)\right):=& \mathbb{E}\left[-\operatorname{sp}\left(-T_{\psi}\left(h^{j}\left(G_{i}^{\text{pos}}\right), H\left(G_{i}\right)\right)\right)\right] \\&-\mathbb{E}\left[\operatorname{sp}\left(T_{\psi}\left(h^{j}\left(G_{i}^{\text{neg}}\right), H\left(G_{i}\right)\right)\right)\right]\end{aligned}$

　　where $I_{\psi}$ is the mutual information estimator, $T_{\psi}$ is the discriminator, $G_{i}$ is the graph of the input event, $G_{i}^{\text{pos}}$ is a positive sample of $G_{i}$, $G_{i}^{\text{neg}}$ is a negative sample of $G_{i}$, and $\operatorname{sp}(z)=\log \left(1+e^{z}\right)$ is the softplus function. A positive sample can be $G_{i}\left(E_{\text{mask}}^{-r}\right)$, $G_{i\_sub}$, or $G_{i}\left(A_{drop}\right)$; the negative samples are the local representations of the other event graphs in the same batch.
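　　The estimator can be evaluated numerically as sketched below. The discriminator $T_{\psi}$ is learned in the paper; a plain dot product is swapped in here as an assumption so the example stays self-contained, and the function names are hypothetical.

```python
import math


def softplus(z):
    """Numerically stable log(1 + e^z)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dot_discriminator(local_vec, global_vec):
    # Assumption: a dot product stands in for the learned T_psi.
    return sum(a * b for a, b in zip(local_vec, global_vec))


def mi_estimate(pos_locals, neg_locals, global_vec):
    """E[-sp(-T(pos, H))] - E[sp(T(neg, H))], with expectations as means."""
    pos_term = sum(
        -softplus(-dot_discriminator(h, global_vec)) for h in pos_locals
    ) / len(pos_locals)
    neg_term = sum(
        softplus(dot_discriminator(h, global_vec)) for h in neg_locals
    ) / len(neg_locals)
    return pos_term - neg_term


global_vec = [1.0, 0.0]
aligned = [[2.0, 0.0]]    # local vector that agrees with the global one
opposed = [[-2.0, 0.0]]   # local vector that disagrees with it
good = mi_estimate(aligned, opposed, global_vec)
bad = mi_estimate(opposed, aligned, global_vec)
```

　　The estimate is higher when positive locals score high and negative locals score low against $H(G_i)$, which is the direction pre-training pushes in.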

　　After contrastive pre-training on the event graphs, we obtain the pre-trained vector $H\left(G_{i}\right)$ of the input event graph $G_{i}$. Then, for an event $C_{i}=\left[r_{i}, x_{1}^{i}, x_{2}^{i}, \cdots, x_{\left|\mathcal{V}_{i}\right|-1}^{i}, G_{i}\right]$, the textual graph vector $o_{i}$ is obtained by averaging the raw features of the source post and all of its reply posts, $o_{i}=\frac{1}{n_{i}}\left(\sum_{j=1}^{\left|\mathcal{V}_{i}\right|-1} x_{j}^{i}+r_{i}\right)$. To emphasize the source post, the contrastive vector, the textual graph vector, and the source post features are concatenated:

　　　　$\mathbf{S}_{i}=\operatorname{CONCAT}\left(H\left(G_{i}\right), o_{i}, r_{i}\right)$
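　　A small worked example of the averaging and concatenation follows. It assumes $n_{i}$ is the total number of posts (source plus replies), which is what makes $o_{i}$ a mean; the helper names are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def build_event_vector(graph_repr, root_feat, reply_feats):
    """Return CONCAT(H(G_i), o_i, r_i) as one flat list."""
    # o_i: average over the reply features and the source post feature.
    text_vec = mean_vector(reply_feats + [root_feat])
    return graph_repr + text_vec + root_feat


graph_repr = [0.5, -0.5]              # H(G_i) from pre-training
root_feat = [1.0, 3.0]                # r_i, source post features
reply_feats = [[2.0, 0.0], [3.0, 3.0]]
event_vec = build_event_vector(graph_repr, root_feat, reply_feats)
```

　　The source post features appear twice in the result, once inside the average and once on their own, which is how the concatenation emphasizes the source post.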

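2.3 Fine-tuning

　　Pre-training uses the text features and yields a pre-trained event representation that also incorporates the raw features and the source post information. In the fine-tuning stage, the model parameters are initialized from the pre-trained parameters and the model is then trained with labels.

　　The vector $\mathbf{S}_{i}$ built above is passed through a fully connected layer for classification: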

　　　　$\hat{\mathbf{y}}_{i}=\operatorname{softmax}\left(F C\left(\mathbf{S}_{i}\right)\right)$

　　Finally, the cross-entropy loss is used:

　　　　$\mathcal{L}(Y, \hat{Y})=-\sum_{i=1}^{|C|} \mathbf{y}_{i} \log \hat{\mathbf{y}}_{i}+\lambda\|\Theta\|_{2}^{2}$

　　where $\|\Theta\|_{2}^{2}$ is the $L_{2}$ regularization term, $\Theta$ denotes the model parameters, and $\lambda$ is the trade-off coefficient.
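　　The classification head and loss can be checked on a tiny example. The sketch assumes the conventional negative log-likelihood form of cross-entropy with one-hot labels given as class indices, and treats the parameters as a flat list; the function names are hypothetical.

```python
import math


def softmax(logits):
    """Softmax with the max subtracted for numerical stability."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def regularized_cross_entropy(batch_logits, batch_labels, params, lam):
    """Summed cross-entropy over events plus lam * squared L2 norm."""
    cross_entropy = 0.0
    for logits, label in zip(batch_logits, batch_labels):
        probs = softmax(logits)
        # With a one-hot y_i, only the true class's log-probability remains.
        cross_entropy -= math.log(probs[label])
    l2_penalty = sum(p * p for p in params)
    return cross_entropy + lam * l2_penalty


# One event, two classes, uninformative logits; two toy parameters.
loss = regularized_cross_entropy([[0.0, 0.0]], [0], [1.0, 2.0], lam=0.1)
```

　　Here the cross-entropy part is $\log 2$ and the penalty is $0.1 \times (1^2 + 2^2) = 0.5$.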

3 Experiments

3.1 Baselines

- DTC [3]: A rumor detection approach applying decision tree that utilizes tweet features to obtain information credibility.
- SVM-TS [10]: A linear SVM-based time-series model that leverages handcrafted features to make predictions.
- RvNN [11]: A recursive tree-structured model with GRU units that learn rumor representations via the tree structure.
- PPC_RNN+CNN [8]: A rumor detection model combining RNN and CNN for early-stage rumor detection, which learns the rumor representations by modeling user and source tweets.
- Bi-GCN [2]: A GCN-based model on directed graphs that learns rumor representations through a bi-directional propagation structure.

3.2 Performance Comparison

3.3 Ablation study

　　-R denotes our model without root feature enhancement
　　-T denotes our model without the textual graph
　　-A denotes our model without event augmentation
　　-M denotes our model without mutual information

3.4 Limited labeled data

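　　Figure 3 shows how performance changes as the fraction of labeled data varies.

　　We observe that RDEA is more label-sensitive than Bi-GCN on both datasets. Moreover, the fewer the labels, the larger the improvement, which indicates the robustness and data efficiency of RDEA.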

3.5 Early Rumor Detection
