Paper notes: "StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks"

Source: arXiv, 2016 (not yet formally published)

Motivation

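Synthesizing photo-realistic images from text descriptions is a very challenging task. Existing generation methods can usually synthesize only the rough object and lose the vivid details. StackGAN reaches the goal in two steps: Stage-I generates a low-resolution image with the rough shape and basic colors from the text, and Stage-II takes the text and the Stage-I result as input and generates the high-resolution details. With StackGAN the authors generate 256×256 images from text, which was state of the art at the time. The training datasets are CUB and Oxford-102.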

Introduction

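Among existing work, [20][22] use GANs to generate images from text descriptions, but only at a low resolution of 64×64. To overcome this limitation, the authors describe how StackGAN decomposes the task into two stages.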

Model

Stage-I GAN

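For G, the input text description is first encoded by a pretrained encoder $\phi$ into a text embedding $\phi_t$ (in earlier work this embedding was then nonlinearly transformed into the conditioning latent variable). The embedding is usually high-dimensional, typically more than 100 dimensions, which with limited data hurts the continuity of the latent manifold that G has to learn. The authors therefore propose an augmentation mechanism (Conditioning Augmentation) that produces more conditioning variables for G: they build a Gaussian distribution $\mathcal{N}(\mu(\phi_t), \Sigma(\phi_t))$ whose mean and covariance are functions of the text embedding, and sample the conditioning variable from it at random. In the paper's words: "The proposed formulation encourages robustness to small perturbations along the conditioning manifold, and thus yields more training pairs given a small number of image-text pairs." During training, the authors also use the KL divergence $D_{KL}(\mathcal{N}(\mu(\phi_t), \Sigma(\phi_t)) \,\|\, \mathcal{N}(0, I))$ as a regularization term, to make the manifold smoother and to avoid overfitting.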
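The sampling step and the KL regularizer can be sketched in a few lines. This is only a minimal illustration, not the authors' implementation: it assumes a diagonal covariance parameterized by a log-variance vector, and the names `sample_conditioning` and `kl_to_standard_normal` are made up for this note. In the paper the mean and variance come from a fully connected layer on top of the text embedding; here they are simply passed in as lists.

```python
import math
import random


def sample_conditioning(mu, log_var, rng):
    """Draw c_hat = mu + sigma * eps with eps ~ N(0, I).

    mu, log_var: lists of floats of equal length (assumed diagonal Gaussian).
    rng: a random.Random instance, so sampling is reproducible.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


# Hypothetical 3-dimensional example values, for illustration only.
mu = [0.5, -1.0, 0.0]
log_var = [0.0, -2.0, 1.0]
c_hat = sample_conditioning(mu, log_var, random.Random(0))
print(len(c_hat), kl_to_standard_normal(mu, log_var))
```

Each call draws a different conditioning vector for the same sentence, which is where the "more training pairs" come from; the KL term keeps the learned Gaussian from collapsing to a point or drifting far from the standard normal.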

Loss function:
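The formula did not survive in these notes. As far as I recall from the paper, the Stage-I objectives have the following form, where $I_0$ is a real image, $t$ its text description, $\phi_t$ the text embedding, $z$ the noise vector, $\hat{c}_0$ the conditioning variable sampled from the Gaussian, and $\lambda$ the weight of the regularizer. The discriminator $D_0$ maximizes $\mathcal{L}_{D_0}$ and the generator $G_0$ minimizes $\mathcal{L}_{G_0}$. Please check the exact notation against the original paper.

```latex
\begin{aligned}
\mathcal{L}_{D_0} &= \mathbb{E}_{(I_0, t) \sim p_{data}}\left[\log D_0(I_0, \phi_t)\right]
  + \mathbb{E}_{z \sim p_z,\, t \sim p_{data}}\left[\log\left(1 - D_0(G_0(z, \hat{c}_0), \phi_t)\right)\right] \\
\mathcal{L}_{G_0} &= \mathbb{E}_{z \sim p_z,\, t \sim p_{data}}\left[\log\left(1 - D_0(G_0(z, \hat{c}_0), \phi_t)\right)\right]
  + \lambda\, D_{KL}\left(\mathcal{N}(\mu_0(\phi_t), \Sigma_0(\phi_t)) \,\|\, \mathcal{N}(0, I)\right)
\end{aligned}
```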

Stage-II GAN

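Stage-II takes the low-resolution image generated in the previous stage together with the text description as input, and aims to fill in the details that were lost in the first stage.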

Loss function:
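Again the formula is missing from these notes. As far as I recall, the Stage-II objectives have the same shape as in Stage-I, with the noise replaced by the Stage-I output. Here $I$ is a real high-resolution image, $\phi_t$ the text embedding, $\hat{c}$ the Stage-II conditioning variable, and $D$, $G$ the Stage-II discriminator and generator ($D$ maximizes $\mathcal{L}_D$, $G$ minimizes $\mathcal{L}_G$). Treat this as a reconstruction and verify it against the paper.

```latex
\begin{aligned}
\mathcal{L}_{D} &= \mathbb{E}_{(I, t) \sim p_{data}}\left[\log D(I, \phi_t)\right]
  + \mathbb{E}_{S_0 \sim p_{G_0},\, t \sim p_{data}}\left[\log\left(1 - D(G(S_0, \hat{c}), \phi_t)\right)\right] \\
\mathcal{L}_{G} &= \mathbb{E}_{S_0 \sim p_{G_0},\, t \sim p_{data}}\left[\log\left(1 - D(G(S_0, \hat{c}), \phi_t)\right)\right]
  + \lambda\, D_{KL}\left(\mathcal{N}(\mu(\phi_t), \Sigma(\phi_t)) \,\|\, \mathcal{N}(0, I)\right)
\end{aligned}
```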

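Here $S_0$ is the low-resolution image generated in the previous stage. The random noise z is not used in this stage, since the randomness is assumed to be carried by $S_0$ already. Both stages share the same pretrained text encoder, but the fully connected layers that follow it are different and produce different means and variances, so Stage-II can generate more detailed information than Stage-I (this step feels abrupt to me; I do not understand why this alone would yield richer information).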

Other:

Datasets: CUB and Oxford-102, with the text annotations provided by [21]; each image comes with 10 descriptions.

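Evaluation metric: generation quality is measured with the Inception score recommended by [26],

$I = \exp\left(\mathbb{E}_x\, D_{KL}\left(p(y|x) \,\|\, p(y)\right)\right)$

where x is a generated sample and y is the label predicted by the Inception model [28].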
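To make the metric concrete, here is a small sketch that computes $\exp(\mathbb{E}_x D_{KL}(p(y|x) \,\|\, p(y)))$ from a list of predicted class distributions. It assumes the Inception model's softmax outputs are already available as plain lists of probabilities; the function name `inception_score` and the toy inputs are my own, and real evaluations use many more samples and classes.

```python
import math


def inception_score(probs):
    """Inception score from per-sample class distributions.

    probs: list of lists; probs[i][j] = p(y = j | x_i), each row sums to 1.
    """
    n = len(probs)
    k = len(probs[0])
    # Marginal label distribution p(y), averaged over all generated samples.
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    mean_kl = 0.0
    for p in probs:
        # KL(p(y|x) || p(y)); zero-probability terms contribute nothing.
        mean_kl += sum(pj * (math.log(pj) - math.log(marginal[j]))
                       for j, pj in enumerate(p) if pj > 0.0)
    return math.exp(mean_kl / n)


# Toy cases: confident and diverse predictions versus uninformative ones.
sharp = [[1.0, 0.0], [0.0, 1.0]]
flat = [[0.5, 0.5], [0.5, 0.5]]
print(inception_score(sharp), inception_score(flat))
```

The score is high when each sample gets a confident label (low-entropy $p(y|x)$) while the labels over all samples are spread out (high-entropy $p(y)$); it is bounded above by the number of classes.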

Limitations: in my opinion, the paper does not study multi-object generation; a breakthrough in that direction would make a good paper.

Source code: https://github.com/hanzhanggit/StackGAN

Follow-up papers:

StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks

AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks

[20] S. Reed, Z. Akata, S. Mohan, S. Tenka, B. Schiele, and H. Lee. Learning what and where to draw. In NIPS, 2016.

[21] S. Reed, Z. Akata, B. Schiele, and H. Lee. Learning deep representations of fine-grained visual descriptions. In CVPR, 2016.

[22] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee. Generative adversarial text-to-image synthesis. In ICML, 2016.

[26] T. Salimans, I. J. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. Improved techniques for training GANs. In NIPS, 2016.

[28] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In CVPR, 2016.
