论文信息

论文标题：Joint domain alignment and discriminative feature learning for unsupervised deep domain adaptation
论文作者：Chao Chen , Zhihong Chen , Boyuan Jiang , Xinyu Jin
论文来源：AAAI 2019
论文地址：download
论文代码：download
引用次数：175

1 Introduction

　　近年来，大多数工作集中于减少不同领域之间的分布差异来学习共享的特征表示，由于所有的域对齐方法只能减少而不能消除域偏移，因此分布在簇边缘或远离相应类中心的目标域样本很容易被从源域学习到的超平面误分类。为缓解这一问题，提出联合域对齐和判别特征学习，有利于域对齐和分类。具体提出了一种基于实例的判别特征学习方法和一种基于中心的判别特征学习方法，两者均保证了域不变特征具有更好的类内紧凑性和类间可分性。大量的实验表明，在共享特征空间中学习鉴别特征可以显著提高性能。
　　域适应，关注如何从源域的大量标记样本和目标域有限或没有标记的目标样本学习分类，可以分为如下三种方法：

- feature-based domain adaptation
- instance-based domain adaptation
- classifier-based domain adaptation

2 Method

　　总体框架如下：

2.1 Problem statement

　　In this work, following the settings of unsupervised domain adaptation, we define the labeled source data as $\mathcal{D}^{s}= \left\{\mathbf{X}^{s}, \mathbf{Y}^{s}\right\}=\left\{\left(\boldsymbol{x}_{i}^{s}, y_{i}^{s}\right)\right\}_{i=1}^{n_{s}}$ and define the unlabeled target data as $\mathcal{D}^{t}=\left\{\mathbf{X}^{t}\right\}=\left\{\boldsymbol{x}_{i}^{t}\right\}_{i=1}^{n_{t}}$ , where $\mathbf{x}^{s}$ and $\mathbf{x}^{t}$ have the same dimension $\mathbf{x}^{s(t)} \in \mathbb{R}^{d}$ . Let $\boldsymbol{\Theta}$ denotes the shared parameters to be learned. $\mathbf{H}_{s} \in \mathbb{R}^{b \times L}$ and $\mathbf{H}_{t} \in \mathbb{R}^{b \times L}$ denote the learned deep features in the bottleneck layer regard to the source stream and target stream, respectively. $b$ indicates the batch size during the training stage and $L$ is the number of hidden neurons in the bottleneck layer. Then, the networks can be trained by minimizing the following loss function.

　　　　$\begin{array}{l}\mathcal{L}\left(\boldsymbol{\Theta} \mid \mathbf{X}_{s}, \mathbf{Y}_{s}, \mathbf{X}_{t}\right)=\mathcal{L}_{s}+\lambda_{1} \mathcal{L}_{c}+\lambda_{2} \mathcal{L}_{d} \quad\quad(1)\\\mathcal{L}_{s}=\frac{1}{n_{s}} \sum_{i=1}^{n_{s}} c\left(\boldsymbol{\Theta} \mid \boldsymbol{x}_{i}^{s}, y_{i}^{s}\right) \quad\quad \quad\quad \quad\quad\quad\quad(2)\\\mathcal{L}_{c}=C O R A L\left(\mathbf{H}_{s}, \mathbf{H}_{t}\right) \quad\quad \quad\quad \quad\quad \quad\quad(3)\\\mathcal{L}_{d}=\mathcal{J}_{d}\left(\boldsymbol{\Theta} \mid \mathbf{X}^{s}, \mathbf{Y}^{s}\right) \quad\quad \quad\quad \quad\quad\quad\quad \quad\quad(4)\end{array}$

　　其中

- $\mathcal{L}_{s}$ 代表源域分类损失；
- $\mathcal{L}_{c}=\operatorname{CORAL}\left(\mathbf{H}_{s}, \mathbf{H}_{t}\right) $ 表示通过相关性对齐度量的域差异损失；
- $\mathcal{J}_{d}\left(\boldsymbol{\Theta} \mid \mathbf{X}^{s}, \mathbf{Y}^{s}\right) $ 代表鉴别损失，保证了域不变特征具有更好的类内紧致性和类间可分性；

2.2 Correlation Alignment ($\text{CORAL}$)

　　为学习域不变特征，通过对齐源特征和目标特征的协方差来减少域差异。域差异损失如下：

　　　　$\mathcal{L}_{c}=\operatorname{CORAL}\left(\mathbf{H}_{s}, \mathbf{H}_{t}\right)=\frac{1}{4 L^{2}}\left\|\operatorname{Cov}\left(\mathbf{H}_{s}\right)-\operatorname{Cov}\left(\mathbf{H}_{t}\right)\right\|_{F}^{2}\quad\quad(5)$

　　其中：

- $\|\cdot\|_{F}^{2}$ 为矩阵 $\text{Frobenius}$ 范数；　　
- $\operatorname{Cov}\left(\mathbf{H}_{s}\right)$ 和 $\operatorname{Cov}\left(\mathbf{H}_{t}\right)$ 表示 $\text{bottleneck layer}$ 中源特征和目标特征的协方差矩阵；　　
  - $\operatorname{Cov}\left(\mathbf{H}_{s}\right)=\mathbf{H}_{s}^{\top} \mathbf{J}_{b} \mathbf{H}_{s}$
  - $\operatorname{Cov}\left(\mathbf{H}_{t}\right)=\mathbf{H}_{t}^{\top} \mathbf{J}_{b} \mathbf{H}_{t}$
    - $\mathbf{J}_{b}=\mathbf{I}_{b}-\frac{1}{b} \mathbf{1}_{n} \mathbf{1}_{n}^{T^{s}}$ 是 $\text{centralized matrix}$；
    - $\mathbf{1}_{b} \in \mathbb{R}^{b}$ 全 $1$ 列向量；
    - $b$ 是批大小；

　　注意，训练过程是通过小批量 $\text{SGD}$ 实现的，因此，在每次迭代中，只有一批训练样本被对齐。

2.3 Discriminative Feature Learning

　　为学习更具判别性的特征，提出两种判别特征学习方法：基于实例的判别特征学习和基于中心的判别特征学习。

　　注意，整个训练阶段都是基于小批量 $\text{SGD}$ 的。因此，下面给出的鉴别损失是基于一批样本的。

2.3.1 Instance-Based Discriminative Loss

　　基于实例的判别特征学习的动机是：同一类的样本在特征空间中应该尽可能地接近，不同类的样本之间应有较大距离。

　　基于实例的判别损失 $\mathcal{L}_{d}^{I}$ 可以表示为：

　　　　$\mathcal{J}_{d}^{I}\left(\mathbf{h}_{i}^{s}, \mathbf{h}_{j}^{s}\right)=\left\{\begin{array}{ll}\max \left(0,\left\|\mathbf{h}_{i}^{s}-\mathbf{h}_{j}^{s}\right\|_{2}-m_{1}\right)^{2} & C_{i j}=1 \\\max \left(0, m_{2}-\left\|\mathbf{h}_{i}^{s}-\mathbf{h}_{j}^{s}\right\|_{2}\right)^{2} & C_{i j}=0\end{array}\right.\quad\quad(6)$

　　　　$\mathcal{L}_{d}^{I}=\sum\limits _{i, j=1}^{n_{s}} \mathcal{J}_{d}^{I}\left(\mathbf{h}_{i}^{s}, \mathbf{h}_{j}^{s}\right)\quad\quad(7)$

　　其中：

- $\mathbf{H}_{s}=\left[\mathbf{h}_{1}^{s} ; \mathbf{h}_{2}^{s} ; \cdots ; \mathbf{h}_{b}^{s}\right] $；
- $C_{i j}=1$ 表示 $\mathbf{h}_{i}^{s}$ 和 $\mathbf{h}_{j}^{s}$ 来自同一个类，$C_{i j}=0$ 表示 $\mathbf{h}_{i}^{s}$ 和 $\mathbf{h}_{j}^{s}$ 来自不同的类；
- $m_{2}$ 大于 $m_{1}$；

　　从 $\text{Eq.6}$、$\text{Eq.7}$ 中可以看出，判别损失会使类内样本之间的距离不超过 $m_{1}$，而类间样本之间的距离至少 $m_{2}$。

　　为简洁起见，将深度特征 $\mathbf{H}_{s}$ 的成对距离表示为 $\mathbf{D}^{H} \in \mathbb{R}^{b \times b}$，其中 $\mathbf{D}_{i j}^{H}=\left\|\mathbf{h}_{i}^{s}-\mathbf{h}_{j}^{s}\right\|_{2}$。设 $\mathbf{L} \in \mathbb{R}^{b \times b}$ 表示指示器矩阵，如果第 $i$ 个样本和第 $j$ 个样本来自同一个类，则表示 $\mathbf{L}_{i j}=1$，如果它们来自不同的类，则表示 $\mathbf{L}_{i j}=0$。然后，基于实例的判别损失可简化为：

　　　　$\begin{aligned}\mathcal{L}_{d}^{I} =\alpha\left\|\max \left(0, \mathbf{D}^{H}-m_{1}\right)^{2} \circ \mathbf{L}\right\|_{\text {sum }}+\left\|\max \left(0, m_{2}-\mathbf{D}^{H}\right)^{2} \circ(1-\mathbf{L})\right\|_{s u m}\end{aligned}\quad\quad(8)$

2.3.2 Center-Based Discriminative Loss

　　基于实例的鉴别损失需要计算样本之间的成对距离，计算成本较高。受 Center Loss 惩罚每个样本到相应类中心的距离的启发，本文提出基于中心的判别特征学习：

　　　　$\begin{array}{c}\mathcal{L}_{d}^{C}=\beta \sum\limits _{i=1}^{n_{s}} \max \left(0,\left\|\mathbf{h}_{i}^{s}-\mathbf{c}_{y_{i}}\right\|_{2}^{2}-m_{1}\right)+\sum\limits_{i, j=1, i \neq j}^{c} \max \left(0, m_{2}-\left\|\mathbf{c}_{i}-\mathbf{c}_{j}\right\|_{2}^{2}\right)\end{array}\quad\quad(9)$

　　其中：

- $\beta$ 为权衡参数；
- $m_{1}$ 和 $m_{2}$ 为两个约束边距 $\left(m_{1}<m_{2}\right)$；
- $\mathbf{c}_{y_{i}} \in \mathbb{R}^{d}$ 表示第 $y_{i}$ 类的质心，$y_{i} \in\{1,2, \cdots, c\}$，$c$ 表示类数；

　　理想情况下，类中心 $\mathbf{c}_{i}$ 应通过平均所有样本的深层特征来计算。但由于本文是基于小批量进行更新的，因此很难用整个训练集对深度特征进行平均。在此，本文做了一个必要的修改，对于 $\text{Eq.9}$ 中判别损失的第二项，用于度量类间可分性的 $\mathbf{c}_{i}$ 和 $\mathbf{c}_{j}$ 是通过对当前一批深度特征进行平均来近似计算的，称之为 “批类中心” 。相反，用于测量类内紧致性的 $\mathbf{c}_{y_{i}}$ 应该更准确，也更接近 “全局类中心”。因此，在每次迭代中更新 $\mathbf{c}_{y_{i}}$ 为

　　　　$\begin{array}{l}\Delta \mathbf{c}_{j}=\frac{\sum\limits _{i=1}^{b} \delta\left(y_{i}=j\right)\left(\mathbf{c}_{j}-\mathbf{h}_{i}^{s}\right)}{1+\sum\limits_{i=1}^{b} \delta\left(y_{i}=j\right)} \quad\quad(10) \\\mathbf{c}_{j}^{t+1}=\mathbf{c}_{j}^{t}-\gamma \cdot \Delta \mathbf{c}_{j}^{t}\quad\quad(11)\end{array}$

　　“全局类中心” 在第一次迭代中被初始化为“批类中心”，在每次迭代中通过 $\text{Eq.10}$、$\text{Eq.11}$ 进行更新，其中 $\gamma$ 是更新“全局类中心”的学习速率。为简洁起见，$\text{Eq.9}$ 可以简化为

　　　　$\begin{array}{r}\mathcal{L}_{d}^{C}=\beta\left\|\max \left(0, \mathbf{H}^{c}-m_{1}\right)\right\|_{\text {sum }}+ \left\|\max \left(0, m_{2}-\mathbf{D}^{c}\right) \circ \mathbf{M}\right\|_{\text {sum }}\end{array}$

　　其中：

- $\mathbf{H}^{c}=\left[\mathbf{h}_{1}^{c} ; \mathbf{h}_{2}^{c} ; \ldots ; \mathbf{h}_{b}^{c}\right]$，$\mathbf{h}_{i}^{c}=\left\|\mathbf{h}_{i}^{s}-\mathbf{c}_{y_{i}}\right\|_{2}^{2}$ 表示第 $i$ 个样本深层特征与其对应的中心 $\mathbf{c}_{y_{i}}$ 之间的距离；
- $ \mathbf{D}^{c} \in \mathbb{R}^{c \times c} $ 表示“批类中心”的成对距离，即 $\mathbf{D}_{i j}^{c}=\left\|\mathbf{c}_{i}-\mathbf{c}_{j}\right\|_{2}^{2} $；

　　不同于 $\text{Center Loss }$，它只考虑类内的紧致性，本文不仅惩罚了深度特征与其相应的类中心之间的距离，而且在不同类别的中心之间加强了较大的边际。

2.4 Training

　　所提出的 $\text{Instance-Based joint discriminative domain adaptation (JDDA-I)}$和 $\text{Center-Based joint discriminative domain adaptation (JDDA-C)}$ 都可以通过小批量SGD轻松实现。对于 $\text{JDDA-I}$，总损失为 $\mathcal{L}=\mathcal{L}_{s}+\lambda_{1} \mathcal{L}_{c}+\lambda_{2}^{I} \mathcal{L}_{d}^{I}$，$\mathcal{L}_{c}$ 代表源域的分类损失。因此，参数 $\Theta$ 可以通过标准的反向传播直接更新

　　　　$\Theta^{t+1}=\Theta^{t}-\eta \frac{\partial\left(\mathcal{L}_{s}+\lambda_{1} \mathcal{L}_{c}+\lambda_{2}^{I} \mathcal{L}_{d}^{I}\right)}{\partial \mathbf{x}_{\mathbf{i}}}\quad\quad(13)$

　　由于 “global class center” 不能通过一批样本来计算，因此 $\text{JDDA-C}$ 必须在每次迭代中同时更新 $\Theta$ 和“全局类中心”：

　　　　$\boldsymbol{\Theta}^{t+1}=\boldsymbol{\Theta}^{t}-\eta \frac{\partial\left(\mathcal{L}_{s}+\lambda_{1} \mathcal{L}_{c}+\lambda_{2}^{C} \mathcal{L}_{d}^{C}\right)}{\partial \mathbf{x}_{\mathbf{i}}}$

　　　　$\mathbf{c}_{j}^{t+1}=\mathbf{c}_{j}^{t}-\gamma \cdot \Delta \mathbf{c}_{j}^{t} \quad j=1,2, \cdots, c\quad\quad(14)$

3 Experiments

====

迁移学习（JDDA）《Joint domain alignment and discriminative feature learning for unsupervised deep domain adaptation》的更多相关文章

[论文阅读] A Discriminative Feature Learning Approach for Deep Face Recognition (Center Loss)
原文: A Discriminative Feature Learning Approach for Deep Face Recognition 用于人脸识别的center loss. 1)同时学习每 ...
Center Loss - A Discriminative Feature Learning Approach for Deep Face Recognition
URL:http://ydwen.github.io/papers/WenECCV16.pdf这篇论文主要的贡献就是提出了Center Loss的损失函数,利用Softmax Loss和Center ...
A Discriminative Feature Learning Approach for Deep Face Recognition
url: https://kpzhang93.github.io/papers/eccv2016.pdf year: ECCV2016 abstract 对于人脸识别任务来说, 网络学习到的特征具有判 ...
Sebastian Ruder : NLP 领域知名博主博士论文面向自然语言处理的神经网络迁移学习
Sebastian Ruder 博士的答辩 PPT<Neural Transfer Learning for Natural Language Processing>介绍了面向自然语言的迁 ...
《A Survey on Transfer Learning》迁移学习研究综述翻译
迁移学习研究综述 Sinno Jialin Pan and Qiang Yang,Fellow, IEEE 摘要: 在许多机器学习和数据挖掘算法中,一个重要的假设就是目前的训练数据和将来的训练数据 ...
Domain adaptation：连接机器学习（Machine Learning）与迁移学习（Transfer Learning）
domain adaptation(域适配)是一个连接机器学习(machine learning)与迁移学习(transfer learning)的新领域.这一问题的提出在于从原始问题(对应一个 so ...
迁移学习(Transformer)，面试看这些就够了！(附代码)
1. 什么是迁移学习迁移学习(Transformer Learning)是一种机器学习方法,就是把为任务 A 开发的模型作为初始点,重新使用在为任务 B 开发模型的过程中.迁移学习是通过从已学习的相 ...
中文NER的那些事儿2. 多任务，对抗迁移学习详解&代码实现
第一章我们简单了解了NER任务和基线模型Bert-Bilstm-CRF基线模型详解&代码实现,这一章按解决问题的方法来划分,我们聊聊多任务学习,和对抗迁移学习是如何优化实体识别中边界模糊,垂直 ...
【迁移学习】2010-A Survey on Transfer Learning
资源:http://www.cse.ust.hk/TL/ 简介: 一个例子: 关于照片的情感分析. 源:比如你之前已经搜集了大量N种类型物品的图片进行了大量的人工标记(label),耗费了巨大的人力物 ...
【深度学习系列】迁移学习Transfer Learning
在前面的文章中,我们通常是拿到一个任务,譬如图像分类.识别等,搜集好数据后就开始直接用模型进行训练,但是现实情况中,由于设备的局限性.时间的紧迫性等导致我们无法从头开始训练,迭代一两百万次来收敛模型, ...

随机推荐

SQL生成脚本
右键要生成脚本的数据库选择task 选择Generate script 选择需要生成脚本的table.view.procedure
Linux的挖矿木马病毒清除（kswapd0进程）
1.top查看资源使用情况看到这些进程一直在变化,但是,主要是由于kswapd0进程在作怪,占据了99%以上的CUP,查找资料后,发现它就是挖矿进程 2.排查kswapd0进程执行命令netsta ...
vue-axios更改操作
<template> <div class="nav"> <label for="">新部门</label>&l ...
python字符串的一些操作
# 1.变量的多次赋值 print('1.变量的多次赋值') name = '小明' # 没有意义的 name = '小刚' # 对前面创建的变量名称进行覆盖 # 删除原来的数据,写入新的数据 pri ...
(线段树) P4588 数学计算
小豆现在有一个数 x,初始值为 1.小豆有 QQ 次操作,操作有两种类型: 1 m:将 x变为 x × m,并输出 x mod M 2 pos:将 x 变为 x 除以第 pos次操作所乘的数(保证第 ...
KatalonRecorder系列（一）：基本使用+XPath元素定位
一.简介 Katalon Recorder是基于selenium的浏览器插件,支持火狐和chrome.可以录制web上的操作并回放,还能导入导出脚本. 二.安装可在谷歌商店或者火狐附件组件中搜索并选 ...
领域驱动设计（DDD）在美团点评业务系统的实践
前言至少 30 年以前,一些软件设计人员就已经意识到领域建模和设计的重要性,并形成一种思潮,Eric Evans 将其定义为领域驱动设计(Domain-Driven Design,简称 DDD).在 ...
C++ undered_map哈希表用法——leetcode两数之和
undered_map 头文件:#include<undered_map> 创建表undered_map<key,value> Map_name; 插入元素 a[key]=va ...
黏包现象、struct模块和解决黏包问题的流程、UDP协议、并发编程理论、多道程序设计技术及进程理论 _
目录黏包现象二.struct模块及解决黏包问题的流程三.粘包代码实战 UDP协议(了解) 并发编程理论多道技术进程理论进程的并行与并发进程的三状态黏包现象什么是粘包 1.服务端连续执 ...
css网页布局设置总结
目录 1 网页样式 1.1 引入方法 1.1.1内联样式 1.1.2内部样式表 1.1.3链接外部样式 1.1.4导入外部样式 1.2 基础语法 1.3 选择器的分类 1.3.1标记选择器 1.3.2 ...

迁移学习（JDDA） 《Joint domain alignment and discriminative feature learning for unsupervised deep domain adaptation》