Attribute2Image --- Conditional Image Generation from Visual Attributes: Paper Notes

Attribute2Image --- Conditional Image Generation from Visual Attributes

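Target: This paper proposes a generative model that synthesizes images from visual attributes.

Conditioning on concrete attributes makes the generated images more realistic and reduces sampling uncertainty.

Based on this assumption, the paper proposes a learning framework that yields an attribute-conditioned generative model.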

3. Attribute-conditioned Generative Modeling of Images.

3.1 Base Model: Conditional Variational Auto-Encoder (CVAE)

For background on this section, see the blog post: http://www.cnblogs.com/wangxiaocvpr/p/6231019.html

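Given attributes y and a latent variable z, the goal is to build a model that generates realistic images conditioned on y and z. Here $p_\theta$ is treated as a generator with parameters $\theta$.

Conditional image generation is a simple two-step process:

1. Randomly sample the latent variable z from the prior distribution p(z);

2. Given y and z as conditioning variables, generate the image x from $p_\theta (x|y, z)$.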
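The two steps above can be sketched in a few lines of Python. This is only an illustration: `toy_generator` is a hypothetical linear stand-in for the mean of $p_\theta (x|y, z)$, and the dimensions and random weights are made up rather than taken from the paper.

```python
import random

def sample_prior(dim, rng):
    """Step 1: draw z from an isotropic standard Gaussian prior p(z)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def toy_generator(y, z, weights):
    """Step 2: hypothetical linear stand-in for the generator.

    Each output pixel is a weighted sum over the concatenated [y, z] vector.
    """
    inputs = list(y) + list(z)
    return [sum(w * v for w, v in zip(row, inputs)) for row in weights]

rng = random.Random(0)
attr_dim, latent_dim, num_pixels = 3, 4, 6

# Made-up generator weights; a real model would learn these.
weights = [[rng.uniform(-1.0, 1.0) for _ in range(attr_dim + latent_dim)]
           for _ in range(num_pixels)]

y = [1.0, 0.0, 1.0]                    # attribute vector (illustrative)
z = sample_prior(latent_dim, rng)      # step 1
x = toy_generator(y, z, weights)       # step 2

# Same attributes, fresh latent sample: a different image for the same y.
x_other = toy_generator(y, sample_prior(latent_dim, rng), weights)
```

Keeping y fixed while resampling z is what lets one attribute description map to many different images.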

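The learning objective is to find the parameters $\theta$ that maximize the log-likelihood $\log p_\theta (x|y)$. A VAE instead maximizes a variational lower bound of the log-likelihood. Specifically, an auxiliary distribution $q_\phi(z|x,y)$ is introduced to approximate the true posterior $p_\theta(z|x,y)$, which gives the bound

$$\mathcal{L}_{CVAE}(x, y; \theta, \phi) = -KL(q_\phi(z|x,y) \,\|\, p_\theta(z)) + \mathbb{E}_{q_\phi(z|x,y)}[\log p_\theta(x|y,z)]$$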

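Here the prior $p_\theta (z)$ is assumed to be an isotropic multivariate Gaussian distribution, and the two conditional distributions p and q are multivariate Gaussians. The auxiliary proposal distribution q is called the recognition model, and the conditional data distribution p is called the generation model.

The first term of the lower bound above, KL(q|p), is a regularizer that reduces the gap between the prior p(z) and the proposal distribution q; the second term is the log-likelihood of the samples.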

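In practice, we usually consider a deterministic generation function: the mean $x = \mu_{\theta}(z, y)$ of the conditional distribution $p_{\theta}(x|z,y)$ given z and y. The standard deviation function $\sigma_\theta(z, y)$ is then a fixed constant shared by all pixels, since the latent factors capture all of the data variation. The second term can therefore be rewritten as a reconstruction error L(·,·) (i.e., an l2 loss):

$$\mathcal{L}_{CVAE}(x, y; \theta, \phi) = -KL(q_\phi(z|x,y) \,\|\, p_\theta(z)) - \frac{1}{S}\sum_{s=1}^{S} L(\mu_\theta(z^{(s)}, y), x)$$

where $z^{(1)}, \dots, z^{(S)}$ are samples drawn from q.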
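As a rough sketch of how the two terms are computed, the snippet below evaluates the negative lower bound for a diagonal Gaussian q and a standard normal prior, using the closed-form KL divergence and a sampled l2 reconstruction error. The names `negative_lower_bound` and `toy_generator`, the sample count, and the numbers are assumptions for illustration, and the constant factor from the fixed standard deviation is assumed to be absorbed into the loss.

```python
import math
import random

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def l2_loss(x_hat, x):
    """Squared-error reconstruction loss L(x_hat, x)."""
    return sum((a - b) ** 2 for a, b in zip(x_hat, x))

def negative_lower_bound(x, y, mu, log_var, generator, rng, num_samples=4):
    """KL regularizer plus a Monte Carlo estimate of the reconstruction error."""
    recon = 0.0
    for _ in range(num_samples):
        z = reparameterize(mu, log_var, rng)
        recon += l2_loss(generator(y, z), x)
    return kl_to_standard_normal(mu, log_var) + recon / num_samples

def toy_generator(y, z):
    """Hypothetical generator: pixel i is y[0] plus the i-th latent coordinate."""
    return [y[0] + v for v in z]

rng = random.Random(0)
x, y = [1.0, 2.0], [1.0]
mu, log_var = [0.0, 1.0], [-2.0, -2.0]   # made-up recognition-model outputs
loss = negative_lower_bound(x, y, mu, log_var, toy_generator, rng)
```

Minimizing this quantity is the same as maximizing the bound; in a real model `mu` and `log_var` would come from the recognition network rather than being fixed by hand.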

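3.2. Disentangling CVAE with a Layered Representation.

An image can be viewed as a composition of a foreground layer and a background layer:

$$x = x_F \odot (1 - g) + x_B \odot g$$

where $\odot$ denotes the element-wise product. g is an occlusion layer, or gating function, that determines the visibility of background pixels, while 1-g determines the visibility of foreground pixels.

However, a model based on this formula may suffer from incorrectly predicted masks, because it gates the foreground region with an imperfect mask estimate.

Instead, we approximate the following formulation, which is more robust to mask prediction errors:

$$x = x_F + x_B \odot g$$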
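A two-pixel toy example makes the robustness argument concrete. It compares a composition that gates both layers with one that leaves the foreground ungated, under a deliberately imperfect mask. The pixel values and the function names `compose_gated` and `compose_additive` are invented for illustration, and the foreground layer is assumed to be zero outside the object.

```python
def compose_gated(x_f, x_b, g):
    """x = x_F * (1 - g) + x_B * g, element-wise."""
    return [f * (1.0 - m) + b * m for f, b, m in zip(x_f, x_b, g)]

def compose_additive(x_f, x_b, g):
    """x = x_F + x_B * g, element-wise; the foreground is not gated."""
    return [f + b * m for f, b, m in zip(x_f, x_b, g)]

# Two pixels: pixel 0 belongs to the foreground, pixel 1 to the background.
x_f = [0.8, 0.0]       # foreground layer, zero outside the object
x_b = [0.2, 0.5]       # background layer
g_true = [0.0, 1.0]    # ideal mask: background visible only at pixel 1
g_pred = [0.3, 1.0]    # imperfect mask: leaks at the foreground pixel

target = compose_gated(x_f, x_b, g_true)

# Error at the foreground pixel when the imperfect mask is used.
err_gated = abs(compose_gated(x_f, x_b, g_pred)[0] - target[0])
err_additive = abs(compose_additive(x_f, x_b, g_pred)[0] - target[0])
print(err_gated, err_additive)
```

With a perfect mask both compositions give the same image here; with the leaky mask the ungated foreground keeps its full intensity, so only the background leak contributes to the error.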

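When the lighting is stable and the background is at a distance, we can safely assume that foreground and background pixels are generated from independent latent factors.

To this end, we propose a disentangled representation in the latent space, z = [zF, zB]. Together with the attributes y, zF captures the foreground factors, while zB captures the background factors. Accordingly, the foreground layer xF is generated from $\mu_{\theta_F}(y, z_F)$ and the background layer xB is generated from $\mu_{\theta_B}(z_B)$. Since the shape and position of the foreground determine the background occlusion, the gating layer g is generated from $s(y, z_F)$, where the last layer of s(·) is a sigmoid function.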

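In summary, the layered generation process is as follows:

1. Sample the foreground and background latent variables zF and zB;

2. Given y and zF, generate the foreground layer xF and the gating layer g; given zB, generate the background layer xB;

3. Compose the image x.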
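The three steps can be sketched as follows. The linear "decoders" and their random weights are hypothetical placeholders for the learned networks, and the final composition assumes the form in which the foreground is added ungated and the background is multiplied by g.

```python
import math
import random

def linear(inputs, weights):
    """Hypothetical one-layer decoder: one weighted sum per output pixel."""
    return [sum(w * v for w, v in zip(row, inputs)) for row in weights]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def random_weights(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def generate_layered(y, z_f, z_b, params):
    """Steps 2 and 3: decode the layers, then compose the image."""
    x_f = linear(y + z_f, params["foreground"])                  # from (y, zF)
    g = [sigmoid(v) for v in linear(y + z_f, params["gate"])]    # from (y, zF)
    x_b = linear(z_b, params["background"])                      # from zB only
    x = [f + b * m for f, b, m in zip(x_f, x_b, g)]
    return x, x_f, x_b, g

rng = random.Random(0)
attr_dim, fg_dim, bg_dim, num_pixels = 2, 3, 3, 5
params = {
    "foreground": random_weights(num_pixels, attr_dim + fg_dim, rng),
    "gate": random_weights(num_pixels, attr_dim + fg_dim, rng),
    "background": random_weights(num_pixels, bg_dim, rng),
}

y = [1.0, 0.0]
z_f = [rng.gauss(0.0, 1.0) for _ in range(fg_dim)]   # step 1: sample zF
z_b = [rng.gauss(0.0, 1.0) for _ in range(bg_dim)]   # step 1: sample zB
x, x_f, x_b, g = generate_layered(y, z_f, z_b, params)

# Resampling only zB changes the background but not the foreground or the gate.
z_b_new = [rng.gauss(0.0, 1.0) for _ in range(bg_dim)]
x2, x_f2, x_b2, g2 = generate_layered(y, z_f, z_b_new, params)
```

Because the foreground and the gate depend only on y and zF, swapping zB leaves them untouched, which is the disentanglement the layered model is after.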

Learning. Learning the layered generative model in a fully unsupervised manner is very challenging, because xF, xB and g must all be inferred from the image x alone.

In this paper we further assume that the foreground layer xF (as well as the gating variable g) is observed during training. We train the model to maximize the joint log-likelihood $\log p_\theta (x, x_F, g|y)$ instead of $\log p_\theta(x|y)$. With the disentangled latent variables zF and zB, we refer to the layered model as a disentangling conditional variational auto-encoder (disCVAE). Figure 2 compares the graphical models of disCVAE and the vanilla CVAE.

Based on the layered generation process, the generation model is written as follows:

The recognition model is written as:

The variational lower bound $L_{disCVAE}$ is written as:

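4. Posterior Inference via Optimization.

Once the attribute-conditioned generative model is trained, generating an image x given attributes y and a latent variable z is straightforward.

However, how to infer the latent variable z given an image x and its corresponding attributes y is unknown. In practice, latent variable inference is very useful because it enables model evaluation on novel images.

First, note that the recognition model q may not be directly usable for inferring z.

On the one hand, as an approximation, we do not know how far it is from the true posterior p, because the KL divergence between them is dropped from the variational learning objective;

On the other hand, such an approximation does not even exist in other models such as GANs.

We present a general approach to posterior inference via optimization in the latent space: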

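Note that the generation model or likelihood term may be non-Gaussian, or even a deterministic function with no proper probabilistic definition.

To make the algorithm more general, we therefore rewrite the inference procedure above as the following energy minimization problem:

where L is the image reconstruction loss and R is the prior regularization term. Taking the simple Gaussian model as an example, posterior inference can be rewritten as:

Note that we use the mean function $\mu$ as a general image generation function. Because $\mu$ is a complex neural network, optimizing Eq. (9) is essentially error back-propagation to the latent variable z, which we solve with the ADAM method.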
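To illustrate the idea of inference as energy minimization, here is a minimal sketch that recovers z for a fixed generator by gradient steps on z alone. Everything specific is an assumption: the generator is a hypothetical linear map instead of a neural network (so the gradient can be written by hand), the energy is an l2 reconstruction loss plus a squared-norm regularizer with a made-up weight `lam`, and the Adam update is a plain hand-written version of the standard rule.

```python
import math
import random

def generate(y, z, weights):
    """Hypothetical linear generator mu(z, y) over the concatenated [y, z]."""
    inputs = y + z
    return [sum(w * v for w, v in zip(row, inputs)) for row in weights]

def energy(z, x, y, weights, lam):
    """E(z) = reconstruction loss L + prior regularizer R (both assumed l2)."""
    recon = sum((a - b) ** 2 for a, b in zip(generate(y, z, weights), x))
    return recon + lam * sum(v * v for v in z)

def energy_grad(z, x, y, weights, lam):
    """Gradient of E with respect to z; the generator weights stay fixed."""
    residual = [a - b for a, b in zip(generate(y, z, weights), x)]
    offset = len(y)
    return [2.0 * sum(r * row[offset + j] for r, row in zip(residual, weights))
            + 2.0 * lam * z[j] for j in range(len(z))]

def infer_latent(x, y, weights, latent_dim, lam=0.01, steps=500, lr=0.05):
    """Minimize E(z) over z with a hand-written Adam update."""
    z = [0.0] * latent_dim
    m = [0.0] * latent_dim
    v = [0.0] * latent_dim
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    for t in range(1, steps + 1):
        grad = energy_grad(z, x, y, weights, lam)
        for j in range(latent_dim):
            m[j] = beta1 * m[j] + (1.0 - beta1) * grad[j]
            v[j] = beta2 * v[j] + (1.0 - beta2) * grad[j] ** 2
            m_hat = m[j] / (1.0 - beta1 ** t)
            v_hat = v[j] / (1.0 - beta2 ** t)
            z[j] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return z

rng = random.Random(0)
attr_dim, latent_dim, num_pixels = 2, 3, 8
weights = [[rng.uniform(-1.0, 1.0) for _ in range(attr_dim + latent_dim)]
           for _ in range(num_pixels)]

y = [1.0, 0.0]
z_true = [1.0, -0.5, 0.8]             # latent code used to render the image
x = generate(y, z_true, weights)      # the "observed" image

z_est = infer_latent(x, y, weights, latent_dim)
e_start = energy([0.0] * latent_dim, x, y, weights, 0.01)
e_end = energy(z_est, x, y, weights, 0.01)
```

With a neural network generator the hand-derived gradient would be replaced by back-propagation through the network, but the loop structure stays the same: the generator is frozen and only z is updated.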

The difference between this approach and recently proposed neural network visualization and texture synthesis algorithms is:

We use generation models for recognition, while others use recognition models for generation.

Experiments:
