论文信息

论文标题：Unsupervised Domain Adaptation for COVID-19 Information Service with Contrastive Adversarial Domain Mixup
论文作者：Huimin Zeng, Zhenrui Yue, Ziyi Kou, Lanyu Shang, Yang Zhang, Dong Wang
论文来源：aRxiv 2022
论文地址：download
论文代码：download

1 Introduction

2 Problem Statement

　　Regarding misinformation detection, we aim at training a model $f$ , which takes an input text $\boldsymbol{x}$ (a COVID-19 claim or a piece of news) to predict whether the information contained in $\boldsymbol{x}$ is valid or not (i.e., a binary classification task). Moreover, in our domain adaptation problem, we use $\mathcal{P}$ to denote source domain data distribution and $\mathcal{Q}$ for the target domain data distribution. Each data point ($\boldsymbol{x}$, $y$) contains an input segment of COVID-19 claim or news ($\boldsymbol{x}$) and a label $y \in\{0,1\}$ ( $y=1$ for true information and $y=0$ for false information). To differentiate the notations of the data sampled from the source distribution $\mathcal{P}$ and the target distribution $\mathcal{Q}$ , we further introduce two definitions of the domain data:

- Source domain: The subscript $s$ is used to denote the source domain data: $\mathcal{X}_{s}=\left\{\left(\boldsymbol{x}_{s}, y_{s}\right) \mid\left(\boldsymbol{x}_{s}, y_{s}\right) \sim \mathcal{P}\right\}$ .
- Target domain: Similarly, the subscript t is used to denote the target domain data: $\mathcal{X}_{t}=\left\{\boldsymbol{x}_{t} \mid \boldsymbol{x}_{t} \sim \mathcal{P}\right\}$ . Note that in our unsupervised setting, the ground truth labels of target domain data $y_{t}$ are not used during training.

　　Our goal is to adapt a classifier $f$ trained on $\mathcal{P}$ to $\mathcal{Q}$ . For a given target domain input $\boldsymbol{x}_{t}$ , a well-adapted model aims at making predictions as correctly as possible.

3 Method

　　整体框架：

3.1 Domain Discriminator

　　第一步是训练一个域鉴别器 $f_{D}$ 来分类输入数据是属于源域还是属于目标域。该域鉴别器与 COVID 模型共享相同的 BERT Encoder $f_{e}$，并具有不同的二进制分类模块 $f_{D}$。域鉴别器以 BERT Encoder 中的标记 [CLS] 表示作为输入，以预测输入数据的域，如所示：

　　　　$\hat{y}=f_{D}(\boldsymbol{z}) \quad\quad(1)$

　　其中，$z$ 是 token [CLS] 的表示。

　　对于 $f_{D}$ 的训练，明确地将源域数据的域标签 $y_{D}$ 定义为 $y_{D}=0$，将目标域数据的域标签定义为 $y_{D}=1$。因此，对域鉴别器的训练可以表述为：

　　　　$\underset{f_{D}}{\text{min}} \;\; \mathbb{E}_{\left(\boldsymbol{x}, y_{D}\right) \sim \mathcal{X}^{\prime}}\left[l\left(f_{D}\left(f_{e}(\boldsymbol{x})\right), y_{D}\right)\right] \quad\quad(2)$

　　其中，$\mathcal{X}^{\prime}$ 表示带有域标签的源域和目标域训练数据的合并数据集。

3.2 Adversarial Domain Mixup

　　在训练了域鉴别器后，我们提出直接干扰来自源域和目标域的输入数据的潜在表示到域鉴别器的决策边界，如 Figure 1b 所示。为此，来自两个域的扰动表示（即域对抗表示）可以变得更接近，表明域间隙减小。在此，从两个域生成的域对抗性表示在模型的潜在特征空间中形成了一个平滑的中间域混合。在数学上，通过求解一个优化问题，可以找到干扰训练样本 $ \boldsymbol{x}$ 的潜在表示 $ \boldsymbol{z}$ 的最优扰动 $\delta^{*}$：

　　　　$\begin{array}{r}\mathcal{A}\left(f_{e}, f_{D}, \boldsymbol{x}, y_{D}, \epsilon\right)=\underset{\boldsymbol{\delta}}{\text{max}} \left[l\left(f_{D}(\boldsymbol{z}+\boldsymbol{\delta}), y_{D}\right)\right] \\\text { s.t. } \quad\|\boldsymbol{\delta}\| \leq \epsilon, \quad \boldsymbol{z}=f_{e}(\boldsymbol{x})\end{array}\quad\quad(3)$

　　注意，在上面的方程中，我们引入了一个超参数 $\epsilon$ 来约束扰动 $\delta$ 的范数，从而避免了无穷大解。最后，将 $\text{Eq.3}$ 应用于合并训练集 $\mathcal{X}^{\prime}$ 中的所有训练样本，得到对抗域混合 $\mathcal{Z}^{\prime}$：

　　　　$\begin{aligned}\mathcal{Z}^{\prime} & =\left\{\boldsymbol{z}^{\prime} \mid \boldsymbol{z}^{\prime}=\boldsymbol{z}+\mathcal{A}\left(f_{e}, f_{D}, \boldsymbol{x}, y_{D}, \epsilon\right),\left(\boldsymbol{x}, y_{D}\right) \in \mathcal{X}^{\prime}\right\} \\& :=\mathcal{Z}_{s}^{\prime} \cup \mathcal{Z}_{t}^{\prime}\end{aligned}\quad\quad(4)$

　　其中，$\mathcal{Z}_{s}^{\prime}$ 是扰动的源特性，$\mathcal{Z}_{t}^{\prime}$ 是受干扰的目标特征。我们使用投影梯度下降（PGD）来近似 $\text{Eq.3}$ 的解，如在[7]，[8]。

3.3 Contrastive Domain Adaptation

　　接下来，受[6]的启发，我们提出了 $\mathcal{Z}_{a d v}$ 的双重对比自适应损失，以进一步将源数据域的知识适应到目标数据域。首先，我们减少了类内表示之间的域差异。也就是说，如果一个表示从源数据域的标签是真（或假）和一个表示从目标数据域的伪标签是真（或假），那么这两个表示被视为类内表示，我们减少域之间的差异。其次，如 Figure 1c 所示，真实信息和虚假信息的表示之间的差异将被扩大。

　　为了计算我们提出的对比自适应损失，我们建议使用径向基函数（RBF）来度量标记类之间的差异。在[11]中，RBF 被证明是量化深度神经网络中不确定性的有效工具。由于我们的伪标记过程是为了自动过滤出目标域数据的低置信度标签，因此使用RBF来衡量标记类之间的差异可以有效地提高伪标签的质量，最终有助于模型的域适应。

　　在形式上，使用 RBF 内核的定义：$k\left(z_{1}, z_{2}\right)=\exp \left[-\frac{\left\|\boldsymbol{z}_{1}-\boldsymbol{z}_{2}\right\|^{2}}{2 \sigma^{2}}\right]$

　　我们定义了错误信息检测任务的类感知损失如下：

　　　　$\begin{aligned}\mathcal{L}_{\text {con }}\left(\mathcal{Z}^{\prime}\right) =&-\sum_{i=1}^{\left|\mathcal{Z}_{s}^{\prime}\right|} \sum_{j=1}^{\left|\mathcal{Z}_{t}^{\prime}\right|} \frac{\mathbb{1}\left(y_{s}^{(i)}=0, \hat{y}_{t}^{(j)}=0\right) k\left(\boldsymbol{z}_{s}^{(i)}, \boldsymbol{z}_{t}^{(j)}\right)}{\sum_{l=1}^{\left|\mathcal{Z}_{s}^{\prime}\right|} \sum_{m=1}^{\left|\mathcal{Z}_{t}^{\prime}\right|} \mathbb{1}\left(y_{s}^{(l)}=0, \hat{y}_{t}^{(m)}=0\right)} \\& -\sum_{i=1}^{\left|\mathcal{Z}_{s}^{\prime}\right|} \sum_{j=1}^{\left|\mathcal{Z}_{t}^{\prime}\right|} \frac{\mathbb{1}\left(y_{s}^{(i)}=1, \hat{y}_{t}^{(j)}=1\right) k\left(\boldsymbol{z}_{s}^{(i)}, \boldsymbol{z}_{t}^{(j)}\right)}{\sum_{l=1}^{\left|\mathcal{Z}_{s}^{\prime}\right|} \sum_{m=1}^{\left|\mathcal{Z}_{t}^{\prime}\right|} \mathbb{1}\left(y_{s}^{(l)}=1, \hat{y}_{t}^{(m)}=1\right)} \\& +\sum_{i=1}^{\left|\mathcal{Z}_{s}^{\prime}\right|} \sum_{j=1}^{\left|\mathcal{Z}_{s}^{\prime}\right|} \frac{\mathbb{1}\left(y_{s}^{(i)}=1, y_{s}^{(j)}=0\right) k\left(\boldsymbol{z}_{s}^{(i)}, \boldsymbol{z}_{s}^{(j)}\right)}{\sum_{l=1}^{\left|\mathcal{Z}_{s}^{\prime}\right|} \sum_{m=1}^{\left|\mathcal{Z}_{s}^{\prime}\right|} \mathbb{1}\left(y_{s}^{(l)}=1, y_{s}^{(m)}=0\right)} \\& +\sum_{i=1}^{\left|\mathcal{Z}_{t}^{\prime}\right|} \sum_{j=1}^{\left|\mathcal{Z}_{t}^{\prime}\right|} \frac{\mathbb{1}\left(\hat{y}_{t}^{(i)}=1, \hat{y}_{t}^{(j)}=0\right) k\left(\boldsymbol{z}_{t}^{(i)}, \boldsymbol{z}_{t}^{(j)}\right)}{\sum_{l=1}^{\left|\mathcal{Z}_{t}^{\prime}\right|} \sum_{m=1}^{\left|\mathcal{Z}_{t}^{\prime}\right|} \mathbb{1}\left(\hat{y}_{t}^{(l)}=1, \hat{y}_{t}^{(m)}=0\right)}\end{aligned}\quad\quad(5)$

　　其中，$\hat{y}_{t}$ 为目标域样本的伪标签，$z$ 表示标记 CLS 的表示。

3.4 Overall Contrastive Adaptation Loss

　　现在，我们将任务分类问题的交叉熵损失和上述对比自适应损失合并为 COVID 模型的单一优化目标：

　　　　$\mathcal{L}_{\text {all }}=\mathcal{L}_{c e}(\boldsymbol{\mathcal { X }})+\lambda \mathcal{L}_{\text {con }}\left(\mathcal{Z}^{\prime}\right) \quad\quad(6)$

　　其中，$\mathcal{L}_{c e}$ 代表交叉熵损失函数。

4 Experiment

　　在我们的实验中，我们使用了三个 source misinformation datasets ：GossipCop , LIAR and PHEME，两个 COVID misinformation datasets：Constraint and ANTiVax。

　　本文使用 DAAT 作为 Encoder ，之前的工作使用 RoBERTa，

　　Results：

虚假新闻检测（CADM）《Unsupervised Domain Adaptation for COVID-19 Information Service with Contrastive Adversarial Domain Mixup》的更多相关文章

Domain Adaptation （3）论文翻译
Abstract The recent success of deep neural networks relies on massive amounts of labeled data. For a ...
Unsupervised Domain Adaptation by Backpropagation
目录概主要内容代码 Ganin Y. and Lempitsky V. Unsupervised Domain Adaptation by Backpropagation. ICML 2015. ...
Deep Transfer Network: Unsupervised Domain Adaptation
转自:http://blog.csdn.net/mao_xiao_feng/article/details/54426101 一.Domain adaptation 在开始介绍之前,首先我们需要知道D ...
论文阅读 | A Curriculum Domain Adaptation Approach to the Semantic Segmentation of Urban Scenes
paper链接:https://arxiv.org/pdf/1812.09953.pdf code链接:https://github.com/YangZhang4065/AdaptationSeg 摘 ...
Domain Adaptation （1）选题讲解
1 所选论文论文题目: <Unsupervised Domain Adaptation with Residual Transfer Networks> 论文信息: NIPS2016, ...
A Primer on Domain Adaptation Theory and Applications
目录概主要内容符号说明 Prior shift Covariate shift KMM Concept shift Subspace mapping Wasserstein distance 应 ...
关于模式识别中的domain generalization 和 domain adaptation
今晚听了李文博士的报告"Domain Generalization and Adaptation using Low-Rank Examplar Classifiers",讲的很精 ...
【论文笔记】Domain Adaptation via Transfer Component Analysis
论文题目:<Domain Adaptation via Transfer Component Analysis> 论文作者:Sinno Jialin Pan, Ivor W. Tsang, ...
域适应（Domain adaptation）
定义在迁移学习中, 当源域和目标的数据分布不同 ,但两个任务相同时,这种特殊的迁移学习叫做域适应 (Domain Adaptation). Domain adaptation有哪些实现手段呢? ...
Domain Adaptation论文笔记
领域自适应问题一般有两个域,一个是源域,一个是目标域,领域自适应可利用来自源域的带标签的数据(源域中有大量带标签的数据)来帮助学习目标域中的网络参数(目标域中很少甚至没有带标签的数据).领域自适应如今 ...

随机推荐

bfs与dfs基础
bfs像二叉树的层序遍历像这个图走bfs就{1, 2, 3, 4, 5, 6, 7, 8}这样走: dfs就{1, 2, 5, 6, 3, 7, 8, 4}. bfs与queue相结合,走到哪就把哪 ...
Kafka之配置信息
Kafka之配置信息一.Broker配置信息属性默认值描述 broker.id 必填参数,broker的唯一标识 log.dirs /tmp/kafka-logs Kafka数据存放的目录 ...
Rdt2.1 和 Rdt2.2的详细解释
Rdt2.1 和 Rdt2.2的详细解释目录 Rdt2.1 和 Rdt2.2的详细解释这俩为啥会出现? 解决之道 Rdt 2.1 Rdt2.2 可靠数据传递中Rdt1.0, Rdt2.0, Rdt ...
Dubbo 02: 直连式
直连式需要用到两个相互独立的maven的web项目项目1:o1-link-userservice-provider 作为服务的提供者项目2:o2-link-consumer 作为使用服务的消费者 ...
抛砖系列之redis监控命令
前言 redis是一款非常流行的kv数据库,以高性能著称,其高吞吐.低延迟等特性让广大开发者趋之若鹜,每每看到别人发出的redis故障报告都让我产生一种居安思危,以史为鉴的危机感,恰逢今年十一西安烟雨 ...
【SSM】学习笔记（一）—— Spring入门
原视频:https://www.bilibili.com/video/BV1Fi4y1S7ix?p=1 P1~P42 目录一.Spring 概述 1.1.Spring 家族 1.2.Spring 发 ...
记录因Sharding Jdbc批量操作引发的一次fullGC
周五晚上告警群突然收到了一条告警消息,点开一看,应用 fullGC 了. 于是赶紧联系运维下载堆内存快照,进行分析. 内存分析使用 MemoryAnalyzer 打开堆文件 mat 下载地址:htt ...
Fibonacci数列与杨辉三角
Fibonacci数列:除第一个与第二个数之外,其余数均由前两个数相加得到: 1, 1, 2, 3, 5, 8, 13, 21, 34, ... 通过生成器,程序如下: def fib(max): m ...
springboot滚动分页展示列表（类似layui瀑布流效果）
背景: 公司项目要求获取用户关联的好友列表,要求分页查询,十条数据一页,滚动页面是点击加载更多,显示下一页列表. 示例图: 实现: 本项目采用的前端模板是freemaker,主要前端页面代码(没有 ...
Jenkinsfile 同时检出多个 Git 仓库
前置通常,在 Jenkinsfile 中使用 Git 仓库是这样的: stage('Checkout git repo') { steps { checkout([ $class: 'GitSCM' ...

虚假新闻检测（CADM）《Unsupervised Domain Adaptation for COVID-19 Information Service with Contrastive Adversarial Domain Mixup》