Paper notes: "StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks"

Source: arXiv, 2016 (not yet formally published)

Motivation

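Synthesizing photo-realistic images from text descriptions is a highly challenging task. Existing generation methods can usually synthesize only the rough object and lose the vivid details. StackGAN reaches the goal in two steps: Stage-I generates a low-resolution sketch of the overall shape and basic colors from the text; Stage-II takes the text and the Stage-I result as input and generates the high-resolution details. With StackGAN the authors generate 256×256 text-conditioned images, which they report as state of the art. The training datasets are CUB and Oxford-102.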

Introduction

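Among existing work, [20] and [22] use GANs to generate images from text descriptions, but only at a low resolution of 64×64. To overcome this limitation, the authors describe how StackGAN decomposes the task into two stages.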

Model

Stage-I GAN

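For the generator G, the input text description is passed through a pretrained nonlinear encoder $\phi$ to obtain a latent variable, the text embedding $\varphi_t$. This embedding is usually high-dimensional (typically more than 100 dimensions), so with limited training data the latent manifold is discontinuous, which hurts the learning of G. The authors therefore propose a conditioning augmentation mechanism that produces more conditioning variables for G: they construct a Gaussian distribution whose mean $\mu_0(\varphi_t)$ and diagonal covariance $\Sigma_0(\varphi_t)$ are functions of the text embedding, and randomly sample the conditioning variable from it. In the paper's words: "The proposed formulation encourages robustness to small perturbations along the conditioning manifold, and thus yields more training pairs given a small number of image-text pairs." During training, the authors also use the KL divergence between this conditioning Gaussian and the standard Gaussian, $D_{KL}(\mathcal{N}(\mu_0(\varphi_t), \Sigma_0(\varphi_t)) \,\|\, \mathcal{N}(0, I))$, as a regularization term to enforce smoothness of the manifold and avoid overfitting.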
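As a minimal sketch of the conditioning augmentation described above (not the authors' code), the sampling step can be written with the reparameterization trick, and the KL regularizer has a closed form for a diagonal Gaussian. The function names, the use of a log-variance parameterization, and the toy values of `mu` and `log_var` are all assumptions made for illustration; in the real model they would be predicted from the text embedding by a fully connected layer.

```python
import math
import random

def conditioning_augmentation(mu, log_var, rng):
    """Sample c = mu + sigma * eps with eps ~ N(0, I) (reparameterization)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

rng = random.Random(0)
mu = [0.5, -1.0, 0.0]       # hypothetical mean predicted from the text embedding
log_var = [0.0, -2.0, 0.2]  # hypothetical log-variance (diagonal covariance)

c_hat = conditioning_augmentation(mu, log_var, rng)  # one conditioning sample
kl = kl_to_standard_normal(mu, log_var)              # regularization term
```

Calling `conditioning_augmentation` repeatedly for the same text yields different conditioning vectors, which is how a single image-text pair turns into many training pairs; the KL term is zero exactly when the predicted Gaussian is the standard Gaussian.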

Loss function:
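A hedged reconstruction of the Stage-I objectives, assuming the standard conditional-GAN form and the paper's notation: $I_0$ is a real image, $t$ its text description with embedding $\varphi_t$, $z$ the noise vector, $\hat{c}_0$ the conditioning variable sampled from $\mathcal{N}(\mu_0(\varphi_t), \Sigma_0(\varphi_t))$, and $\lambda$ the regularization weight. The discriminator $D_0$ maximizes $\mathcal{L}_{D_0}$ and the generator $G_0$ minimizes $\mathcal{L}_{G_0}$; the exact form should be checked against the paper.

$$
\mathcal{L}_{D_0} = \mathbb{E}_{(I_0, t) \sim p_{data}} \left[ \log D_0(I_0, \varphi_t) \right] + \mathbb{E}_{z \sim p_z,\, t \sim p_{data}} \left[ \log \left( 1 - D_0(G_0(z, \hat{c}_0), \varphi_t) \right) \right]
$$

$$
\mathcal{L}_{G_0} = \mathbb{E}_{z \sim p_z,\, t \sim p_{data}} \left[ \log \left( 1 - D_0(G_0(z, \hat{c}_0), \varphi_t) \right) \right] + \lambda\, D_{KL}\left( \mathcal{N}(\mu_0(\varphi_t), \Sigma_0(\varphi_t)) \,\|\, \mathcal{N}(0, I) \right)
$$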

Stage-II GAN

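This stage takes the low-resolution image generated by the previous stage, together with the text description, as input; the model aims to recover the details that were lost in Stage-I.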

Loss function:
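A hedged reconstruction of the Stage-II objectives, written by analogy with the conditional-GAN form and assuming the paper's notation: $I$ is a real high-resolution image, $t$ its text description with embedding $\varphi_t$, $\hat{c}$ the conditioning variable sampled from $\mathcal{N}(\mu(\varphi_t), \Sigma(\varphi_t))$, and $\lambda$ the regularization weight. The discriminator $D$ maximizes $\mathcal{L}_D$ and the generator $G$ minimizes $\mathcal{L}_G$; the exact form should be checked against the paper.

$$
\mathcal{L}_{D} = \mathbb{E}_{(I, t) \sim p_{data}} \left[ \log D(I, \varphi_t) \right] + \mathbb{E}_{S_0 \sim p_{G_0},\, t \sim p_{data}} \left[ \log \left( 1 - D(G(S_0, \hat{c}), \varphi_t) \right) \right]
$$

$$
\mathcal{L}_{G} = \mathbb{E}_{S_0 \sim p_{G_0},\, t \sim p_{data}} \left[ \log \left( 1 - D(G(S_0, \hat{c}), \varphi_t) \right) \right] + \lambda\, D_{KL}\left( \mathcal{N}(\mu(\varphi_t), \Sigma(\varphi_t)) \,\|\, \mathcal{N}(0, I) \right)
$$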

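Here $S_0$ is the low-resolution image generated in the previous stage; the random noise variable $z$ does not appear in this generation stage. Both stages share the pretrained text encoder, but the fully connected layers that follow it are different and produce different means and variances, so Stage-II can generate more detailed information than Stage-I (this transition feels abrupt; I also do not understand why this alone would produce richer information).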

Other:

Datasets: CUB and Oxford-102, using the captions provided by [21]; each image comes with 10 captions.

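Evaluation metric: the Inception score recommended by [26] is used to assess generation quality:

$$I = \exp\left( \mathbb{E}_x\, D_{KL}\left( p(y \mid x) \,\|\, p(y) \right) \right)$$

where $x$ is a generated sample and $y$ is the label predicted by the Inception model [28].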
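To make the metric concrete, here is a small sketch that computes an Inception score from per-sample class distributions $p(y \mid x)$. It assumes those distributions are already available as plain lists (in practice they would come from the Inception model's softmax output); the function name `inception_score` and the toy inputs are illustrative assumptions.

```python
import math

def inception_score(probs):
    """probs: list of per-sample class distributions p(y|x), each summing to 1."""
    n = len(probs)
    k = len(probs[0])
    # Marginal p(y): average of p(y|x) over all generated samples.
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    # Mean KL( p(y|x) || p(y) ), skipping zero-probability classes.
    mean_kl = 0.0
    for p in probs:
        mean_kl += sum(pj * math.log(pj / mj)
                       for pj, mj in zip(p, marginal) if pj > 0)
    mean_kl /= n
    return math.exp(mean_kl)

# Confident and diverse predictions score high; identical ones score 1.
diverse = inception_score([[1.0, 0.0], [0.0, 1.0]])
collapsed = inception_score([[0.5, 0.5], [0.5, 0.5]])
```

The score grows when each sample is classified confidently (sharp $p(y \mid x)$) and the samples together cover many classes (flat $p(y)$), which is why it is used as a proxy for both quality and diversity.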

Limitations: in my view, the paper does not study multi-object generation; a breakthrough in that direction would make a good paper.

PyTorch source code: https://github.com/hanzhanggit/StackGAN

Follow-up papers:

StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks

AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks

[20] S. Reed, Z. Akata, S. Mohan, S. Tenka, B. Schiele, and H. Lee. Learning what and where to draw. In NIPS, 2016.

[21] S. Reed, Z. Akata, B. Schiele, and H. Lee. Learning deep representations of fine-grained visual descriptions. In CVPR, 2016.

[22] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee. Generative adversarial text-to-image synthesis. In ICML, 2016.

[26] T. Salimans, I. J. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. Improved techniques for training GANs. In NIPS, 2016.

[28] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In CVPR, 2016.
