Paper notes: Generative Adversarial Text to Image Synthesis

Generative Adversarial Text to Image Synthesis

ICML 2016

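　　Abstract: This paper links text and images by generating images from text descriptions, combining a CNN with a GAN for effective unsupervised learning.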

　　Attribute representation is a very interesting direction. Going from image to text can be viewed as a recognition problem; going from text to image is not so simple.

　　This is because two sub-problems have to be solved:

　　1. learning a text feature representation that captures the important visual details;

　　2. use these features to synthesize a compelling image that a human might mistake for real.

　　Fortunately, deep learning already offers fairly good solutions to both problems, namely natural language representation and image synthesis.

　　However, one problem remains: the distribution of images conditioned on a text description is highly multimodal, in the sense that there are very many plausible configurations of pixels that correctly illustrate the description.

　　Background:

　　1. GANs.

　　　　Omitted here; see related blog posts.

　　2. Deep symmetric structured joint embedding.

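　　To obtain a visually discriminative representation of text descriptions, the authors follow a CVPR 2016 paper that uses a CNN and a recurrent text encoder to learn a correspondence function with images. The resulting text classifier is trained with the following structured loss:

$$\frac{1}{N} \sum_{n=1}^{N} \Delta(y_n, f_v(v_n)) + \Delta(y_n, f_t(t_n))$$

　　where $\{ v_n, t_n, y_n \}$ is the training set, $\Delta$ is the 0-1 loss, $v_n$ is an image, $t_n$ is the corresponding text description, and $y_n$ is the class label.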

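　　The classifiers $f_v$ and $f_t$ are parameterized as follows:

$$f_v(v) = \arg\max_{y \in \mathcal{Y}} \mathbb{E}_{t \sim \mathcal{T}(y)} [\phi(v)^T \varphi(t)]$$

$$f_t(t) = \arg\max_{y \in \mathcal{Y}} \mathbb{E}_{v \sim \mathcal{V}(y)} [\phi(v)^T \varphi(t)]$$

　　where $\phi$ is the image encoder, $\varphi$ is the text encoder, $\mathcal{T}(y)$ is the set of text descriptions of class $y$, and $\mathcal{V}(y)$ is the set of images of class $y$. The intuition here is that a text encoding should have a higher compatibility score with images of the corresponding class than with images of any other class, and vice versa.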
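　　The compatibility score can be illustrated with a small sketch. It assumes the image and text encoders have already produced fixed-length feature vectors; the function name `classify_image` and the toy 2-D features are made up for illustration and are not from the paper. An image is assigned to the class whose captions have the highest average inner product with the image feature.

```python
import numpy as np

def classify_image(image_feat, class_text_feats):
    """Pick the class whose caption features best match the image feature.

    image_feat: (d,) vector, standing in for the image encoder output.
    class_text_feats: dict mapping class label -> (k, d) array, standing in
        for the text encoder outputs of k captions of that class.
    """
    scores = {
        # Average inner product = empirical expected compatibility score.
        label: float(np.mean(feats @ image_feat))
        for label, feats in class_text_feats.items()
    }
    return max(scores, key=scores.get), scores

# Toy features, invented for this example only.
image_feat = np.array([1.0, 0.0])
class_text_feats = {
    "bird": np.array([[0.9, 0.1], [0.8, 0.2]]),
    "flower": np.array([[0.1, 0.9], [0.2, 0.7]]),
}
predicted, scores = classify_image(image_feat, class_text_feats)
```

　　The text classifier is the mirror image: fix a caption feature and average its inner products over the image features of each class.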

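　　Method:

　　The approach is to train a deep convolutional generative adversarial network (DC-GAN) conditioned on text features.

　　1. Network architecture.

　　Basic components: the generator G and the discriminator D.

　　Together they make up the overall network framework proposed in the paper.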

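　　First consider the generator G: the text is encoded into a feature representation, which is combined with a noise vector and fed into the subsequent deconvolutional network to produce an image.

　　Next, the discriminator: after the image has passed through the convolutional layers, the text information is concatenated along the depth dimension onto the resulting convolutional feature map, and the network then outputs a binary (real or fake) score.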
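　　The two conditioning steps described above can be sketched as array operations. This is only a shape-level illustration, not the authors' implementation: the sizes (100-d noise, 1024-d text embedding compressed to 128-d, a 512-channel 4x4 feature map), the leaky-ReLU compression layer, and all function names are assumptions made for the example, and the same projection matrix is reused for G and D purely for brevity.

```python
import numpy as np

Z_DIM, TEXT_DIM, CODE_DIM = 100, 1024, 128  # assumed sizes, for illustration

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def generator_input(z, text_emb, W):
    """Compress the text embedding, then append it to the noise vector."""
    code = leaky_relu(text_emb @ W)
    return np.concatenate([z, code], axis=1)

def discriminator_fuse(feat_map, text_code):
    """Replicate the text code over the spatial grid and concatenate on depth."""
    n, _, h, w = feat_map.shape
    tiled = np.broadcast_to(
        text_code[:, :, None, None], (n, text_code.shape[1], h, w)
    )
    return np.concatenate([feat_map, tiled], axis=1)

rng = np.random.default_rng(0)
batch = 4
z = rng.standard_normal((batch, Z_DIM))
text_emb = rng.standard_normal((batch, TEXT_DIM))
W = rng.standard_normal((TEXT_DIM, CODE_DIM)) * 0.01  # hypothetical projection

g_in = generator_input(z, text_emb, W)            # (4, 228): fed to the deconv stack
feat_map = rng.standard_normal((batch, 512, 4, 4))  # stand-in for D's conv features
text_code = leaky_relu(text_emb @ W)
fused = discriminator_fuse(feat_map, text_code)   # (4, 640, 4, 4): fed to D's last layers
```

　　Every spatial position of `fused` carries the same text code, so the final layers of D can compare local image features against the description.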

　　2. Matching-aware discriminator (GAN-CLS):

　　The most straightforward way to train a conditional GAN is to view (text, image) pairs as joint observations and train the discriminator to judge each pair as real or fake. This type of conditioning is naive in the sense that the discriminator has no explicit notion of whether real training images match the text embedding context.

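　　In the naive GAN, the discriminator observes two kinds of inputs: real images with matching text, and synthetic images with arbitrary text. It therefore has to separate two sources of error implicitly:

　　unrealistic images (for any text), and realistic images of the wrong class that mismatch the conditioning information.

　　Based on the intuition that this may complicate the learning dynamics, the authors modify the GAN training procedure to separate these error sources.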

　　In addition to the real / fake inputs given to the discriminator during training, a third type of input is added: real images with mismatched text, which the discriminator must learn to score as fake. By learning image / text matching in addition to image realism, the discriminator can provide an additional signal to the generator.

　　Algorithm 1 in the paper summarizes the training procedure.
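　　As a rough sketch of the objectives in that procedure, assume the discriminator has already produced three probabilities for one training example: `s_r` for a real image with its matching text, `s_w` for a real image with mismatched text, and `s_f` for a generated image with the matching text. The function name and the scalar form are choices made for this illustration; the two mismatch terms are averaged so that "fake" cases do not outweigh the real one.

```python
import math

def gan_cls_objectives(s_r, s_w, s_f):
    """Return (discriminator objective, generator objective), both to be increased.

    s_r: D(real image, matching text)
    s_w: D(real image, mismatched text)
    s_f: D(generated image, matching text)
    All three are probabilities strictly between 0 and 1.
    """
    d_obj = math.log(s_r) + 0.5 * (math.log(1.0 - s_w) + math.log(1.0 - s_f))
    g_obj = math.log(s_f)
    return d_obj, g_obj

# A discriminator that rejects mismatched text scores higher than one that accepts it.
d_good, _ = gan_cls_objectives(s_r=0.9, s_w=0.1, s_f=0.1)
d_bad, _ = gan_cls_objectives(s_r=0.9, s_w=0.9, s_f=0.1)
```

　　Compared with the naive conditional GAN, the only new ingredient is the `s_w` term, which penalizes D for accepting a realistic image paired with the wrong description.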

　　3. Learning with manifold interpolation (GAN-INT)

　　Deep networks have been shown to learn representations in which interpolations between embedding pairs tend to be near the data manifold.

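　　Motivated by this finding, a large amount of additional text embeddings can be generated by simply interpolating between embeddings of training set captions.

　　Crucially, these interpolated text embeddings need not correspond to any actual human-written text, so there is no additional labeling cost.

　　This can be viewed as adding an extra term to the generator objective to minimize:

$$\mathbb{E}_{t_1, t_2 \sim p_{data}} [\log(1 - D(G(z, \beta t_1 + (1 - \beta) t_2)))]$$

　　where $z$ is drawn from the noise distribution and $\beta$ interpolates between the text embeddings $t_1$ and $t_2$.

　　Because the interpolated embeddings are synthetic, the discriminator has no corresponding real image and text pairs to train on. However, D learns to predict whether an image and a text match.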
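　　The interpolation itself is a one-line operation on a batch of caption embeddings. The sketch below pairs each embedding with a randomly chosen partner from the same batch; the helper name `interpolate_embeddings`, the random pairing scheme, and the default mixing weight of 0.5 are assumptions for this example.

```python
import numpy as np

def interpolate_embeddings(text_embs, beta=0.5, rng=None):
    """Mix each caption embedding with a randomly chosen partner from the batch.

    text_embs: (n, d) array of caption embeddings.
    beta: interpolation weight; beta=1 returns the original embeddings.
    """
    rng = np.random.default_rng() if rng is None else rng
    partners = text_embs[rng.permutation(len(text_embs))]
    return beta * text_embs + (1.0 - beta) * partners

rng = np.random.default_rng(0)
text_embs = rng.standard_normal((6, 8))  # toy embeddings, for illustration
mixed = interpolate_embeddings(text_embs, beta=0.5, rng=rng)
```

　　The mixed embeddings are passed to the generator in place of real caption embeddings, and the generator is updated so that D scores the resulting images as matching.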

　　4. Inverting the generator for style transfer.

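　　If the text encoding $\varphi(t)$ captures the image content (e.g. flower shape and colors), then in order to generate a realistic image the noise sample $z$ should capture style factors such as background color and pose. With a trained GAN, one may wish to transfer the style of a query image onto the content of a particular text description. To achieve this, a convolutional network can be trained to invert G, regressing from samples $\hat{x} \leftarrow G(z, \varphi(t))$ back onto $z$. A simple squared loss is used to train the style encoder:

$$\mathcal{L}_{style} = \mathbb{E}_{t, z \sim \mathcal{N}(0, 1)} \| z - S(G(z, \varphi(t))) \|_2^2$$

　　where $S$ is the style encoder network. With a trained generator and style encoder, style transfer from a query image $x$ onto text $t$ proceeds as follows:

$$s \leftarrow S(x), \quad \hat{x} \leftarrow G(s, \varphi(t))$$

　　where $\hat{x}$ is the result image and $s$ is the predicted style.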
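　　The two steps can be sketched with stand-in networks. In the example below, `toy_G` and `toy_S` are deliberately trivial placeholders (the "image" is just the noise vector and text code glued together) so that the code runs without a trained model; only `style_loss` and `style_transfer` mirror the procedure described above, and all names are hypothetical.

```python
import numpy as np

def style_loss(G, S, z, text_code):
    """Squared error between sampled noise z and the style encoder's estimate."""
    z_hat = S(G(z, text_code))
    return float(np.mean(np.sum((z - z_hat) ** 2, axis=-1)))

def style_transfer(G, S, query_image, text_code):
    """Predict the style of the query image, then render it with new text content."""
    s = S(query_image)
    return G(s, text_code)

# Toy stand-ins for a trained generator and style encoder (not real networks).
Z_DIM = 3

def toy_G(z, text_code):
    return np.concatenate([z, text_code], axis=-1)

def toy_S(x):
    return x[..., :Z_DIM]

rng = np.random.default_rng(0)
z = rng.standard_normal((5, Z_DIM))
text_code = rng.standard_normal((5, 2))
loss = style_loss(toy_G, toy_S, z, text_code)  # zero here, since toy_S inverts toy_G exactly

query = toy_G(z[0], text_code[0])              # image with some style and old content
new_text = np.array([1.0, -1.0])
result = style_transfer(toy_G, toy_S, query, new_text)
```

　　With real networks the style encoder is only an approximate inverse, so the loss is driven toward zero by training rather than being zero by construction.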

　　Experiments.
