Deep Convolutional Generative Adversarial Networks

we introduced the basic ideas behind how GANs work. We showed that they can draw samples from some simple, easy-to-sample distribution, like a uniform or normal distribution, and transform them into samples that appear to match the distribution of some dataset. And while our example of matching a 2D Gaussian distribution got the point across, it is not especially exciting.

In this section, we will demonstrate how you can use GANs to generate photorealistic images. We will be basing our models on the deep convolutional GANs (DCGAN) introduced in :cite:Radford.Metz.Chintala.2015. We will borrow the convolutional architecture that have proven so successful for discriminative computer vision problems and show how via GANs, they can be leveraged to generate photorealistic images.

import matplotlib.pyplot as plt
from torch.utils.data import DataLoader
from torch import nn
import numpy as np
from torch.autograd import Variable
import torch
from torchvision.datasets import ImageFolder
from torchvision.transforms import transforms
import zipfile
cuda = True if torch.cuda.is_available() else False
print(cuda)
True

The Pokemon Dataset

The dataset we will use is a collection of Pokemon sprites obtained from pokemondb. First download, extract and load this dataset.

We resize each image into \(64\times 64\). The ToTensor transformation will project the pixel value into \([0, 1]\), while our generator will use the tanh function to obtain outputs in \([-1, 1]\). Therefore we normalize the data with \(0.5\) mean and \(0.5\) standard deviation to match the value range.

data_dir='/home/kesci/input/pokemon8600/'
batch_size=256
transform=transforms.Compose([
transforms.Resize((64,64)),
transforms.ToTensor(),
transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))
])
pokemon=ImageFolder(data_dir+'pokemon',transform)
data_iter=DataLoader(pokemon,batch_size=batch_size,shuffle=True)

Let's visualize the first 20 images.

fig=plt.figure(figsize=(4,4))
imgs=data_iter.dataset.imgs
for i in range(20):
img = plt.imread(imgs[i*150][0])
plt.subplot(4,5,i+1)
plt.imshow(img)
plt.axis('off')
plt.show()

The Generator

The generator needs to map the noise variable \(\mathbf z\in\mathbb R^d\), a length-\(d\) vector, to a RGB image with width and height to be \(64\times 64\) . In :numref:sec_fcn we introduced the fully convolutional network that uses transposed convolution layer (refer to :numref:sec_transposed_conv) to enlarge input size. The basic block of the generator contains a transposed convolution layer followed by the batch normalization and ReLU activation.

class G_block(nn.Module):
def __init__(self, in_channels, out_channels, kernel_size=4,strides=2, padding=1):
super(G_block,self).__init__()
self.conv2d_trans=nn.ConvTranspose2d(in_channels, out_channels, kernel_size=kernel_size,
stride=strides, padding=padding, bias=False)
self.batch_norm=nn.BatchNorm2d(out_channels,0.8)
self.activation=nn.ReLU()
def forward(self,x):
return self.activation(self.batch_norm(self.conv2d_trans(x)))

In default, the transposed convolution layer uses a \(k_h = k_w = 4\) kernel, a \(s_h = s_w = 2\) strides, and a \(p_h = p_w = 1\) padding. With a input shape of \(n_h^{'} \times n_w^{'} = 16 \times 16\), the generator block will double input's width and height.

\[\begin{aligned}
n_h^{'} \times n_w^{'} &= [(n_h k_h - (n_h-1)(k_h-s_h)- 2p_h] \times [(n_w k_w - (n_w-1)(k_w-s_w)- 2p_w]\\
&= [(k_h + s_h (n_h-1)- 2p_h] \times [(k_w + s_w (n_w-1)- 2p_w]\\
&= [(4 + 2 \times (16-1)- 2 \times 1] \times [(4 + 2 \times (16-1)- 2 \times 1]\\
&= 32 \times 32 .\\
\end{aligned}
\]

Tensor=torch.cuda.FloatTensor
x=Variable(Tensor(np.zeros((2,3,16,16))))
g_blk=G_block(3,20)
g_blk.cuda()
print(g_blk(x).shape)
torch.Size([2, 20, 32, 32])

If changing the transposed convolution layer to a \(4\times 4\) kernel, \(1\times 1\) strides and zero padding. With a input size of \(1 \times 1\), the output will have its width and height increased by 3 respectively.

x=Variable(Tensor(np.zeros((2,3,1,1))))
g_blk=G_block(3,20,strides=1,padding=0)
g_blk.cuda()
print(g_blk(x).shape)
torch.Size([2, 20, 4, 4])

The generator consists of four basic blocks that increase input's both width and height from 1 to 32. At the same time, it first projects the latent variable into \(64\times 8\) channels, and then halve the channels each time. At last, a transposed convolution layer is used to generate the output. It further doubles the width and height to match the desired \(64\times 64\) shape, and reduces the channel size to \(3\). The tanh activation function is applied to project output values into the \((-1, 1)\) range.

class net_G(nn.Module):
def __init__(self,in_channels):
super(net_G,self).__init__() n_G=64
self.model=nn.Sequential(
G_block(in_channels,n_G*8,strides=1,padding=0),
G_block(n_G*8,n_G*4),
G_block(n_G*4,n_G*2),
G_block(n_G*2,n_G),
nn.ConvTranspose2d(
n_G,3,kernel_size=4,stride=2,padding=1,bias=False
),
nn.Tanh()
)
def forward(self,x):
x=self.model(x)
return x def weights_init_normal(m):
classname = m.__class__.__name__
if classname.find("Conv") != -1:
torch.nn.init.normal_(m.weight.data, mean=0, std=0.02)
elif classname.find("BatchNorm2d") != -1:
torch.nn.init.normal_(m.weight.data, mean=1.0, std=0.02)
torch.nn.init.constant_(m.bias.data, 0.0)

Generate a 100 dimensional latent variable to verify the generator's output shape.

x=Variable(Tensor(np.zeros((1,100,1,1))))
generator=net_G(100)
generator.cuda()
generator.apply(weights_init_normal)
print(generator(x).shape)
torch.Size([1, 3, 64, 64])

Discriminator

The discriminator is a normal convolutional network network except that it uses a leaky ReLU as its activation function. Given \(\alpha \in[0, 1]\), its definition is

\[\textrm{leaky ReLU}(x) = \begin{cases}x & \text{if}\ x > 0\\ \alpha x &\text{otherwise}\end{cases}.
\]

As it can be seen, it is normal ReLU if \(\alpha=0\), and an identity function if \(\alpha=1\). For \(\alpha \in (0, 1)\), leaky ReLU is a nonlinear function that give a non-zero output for a negative input. It aims to fix the "dying ReLU" problem that a neuron might always output a negative value and therefore cannot make any progress since the gradient of ReLU is 0.

alphas = [0, 0.2, 0.4, .6]
x = np.arange(-2, 1, 0.1)
Y = [nn.LeakyReLU(alpha)(Tensor(x)).cpu().numpy()for alpha in alphas]
plt.figure(figsize=(4,4))
for y in Y:
plt.plot(x,y)
plt.show()

The basic block of the discriminator is a convolution layer followed by a batch normalization layer and a leaky ReLU activation. The hyper-parameters of the convolution layer are similar to the transpose convolution layer in the generator block.

class D_block(nn.Module):
def __init__(self,in_channels,out_channels,kernel_size=4,strides=2,
padding=1,alpha=0.2):
super(D_block,self).__init__()
self.conv2d=nn.Conv2d(in_channels,out_channels,kernel_size,strides,padding,bias=False)
self.batch_norm=nn.BatchNorm2d(out_channels,0.8)
self.activation=nn.LeakyReLU(alpha)
def forward(self,X):
return self.activation(self.batch_norm(self.conv2d(X)))

A basic block with default settings will halve the width and height of the inputs, as we demonstrated in :numref:sec_padding. For example, given a input shape $n_h = n_w = 16 $, with a kernel shape \(k_h = k_w = 4\), a stride shape \(s_h = s_w = 2\), and a padding shape \(p_h = p_w = 1\), the output shape will be:

\[
\begin{aligned}
n_h^{'} \times n_w^{'} &= \lfloor(n_h-k_h+2p_h+s_h)/s_h\rfloor \times \lfloor(n_w-k_w+2p_w+s_w)/s_w\rfloor\\
&= \lfloor(16-4+2\times 1+2)/2\rfloor \times \lfloor(16-4+2\times 1+2)/2\rfloor\\
&= 8 \times 8 .\\
\end{aligned}

\]

x = Variable(Tensor(np.zeros((2, 3, 16, 16))))
d_blk = D_block(3,20)
d_blk.cuda()
print(d_blk(x).shape)
torch.Size([2, 20, 8, 8])

The discriminator is a mirror of the generator.

class net_D(nn.Module):
def __init__(self,in_channels):
super(net_D,self).__init__()
n_D=64
self.model=nn.Sequential(
D_block(in_channels,n_D),
D_block(n_D,n_D*2),
D_block(n_D*2,n_D*4),
D_block(n_D*4,n_D*8)
)
self.conv=nn.Conv2d(n_D*8,1,kernel_size=4,bias=False)
self.activation=nn.Sigmoid()
# self._initialize_weights()
def forward(self,x):
x=self.model(x)
x=self.conv(x)
x=self.activation(x)
return x

It uses a convolution layer with output channel \(1\) as the last layer to obtain a single prediction value.

x = Variable(Tensor(np.zeros((1, 3, 64, 64))))
discriminator=net_D(3)
discriminator.cuda()
discriminator.apply(weights_init_normal)
print(discriminator(x).shape)
torch.Size([1, 1, 1, 1])

Training

Compared to the basic GAN in :numref:sec_basic_gan, we use the same learning rate for both generator and discriminator since they are similar to each other. In addition, we change \(\beta_1\) in Adam (:numref:sec_adam) from \(0.9\) to \(0.5\). It decreases the smoothness of the momentum, the exponentially weighted moving average of past gradients, to take care of the rapid changing gradients because the generator and the discriminator fight with each other. Besides, the random generated noise Z, is a 4-D tensor and we are using GPU to accelerate the computation.

def update_D(X,Z,net_D,net_G,loss,trainer_D):
batch_size=X.shape[0]
Tensor=torch.cuda.FloatTensor
ones=Variable(Tensor(np.ones(batch_size,)),requires_grad=False).view(batch_size,1)
zeros = Variable(Tensor(np.zeros(batch_size,)),requires_grad=False).view(batch_size,1)
real_Y=net_D(X).view(batch_size,-1)
fake_X=net_G(Z)
fake_Y=net_D(fake_X).view(batch_size,-1)
loss_D=(loss(real_Y,ones)+loss(fake_Y,zeros))/2
loss_D.backward()
trainer_D.step()
return float(loss_D.sum()) def update_G(Z,net_D,net_G,loss,trainer_G):
batch_size=Z.shape[0]
Tensor=torch.cuda.FloatTensor
ones=Variable(Tensor(np.ones((batch_size,))),requires_grad=False).view(batch_size,1)
fake_X=net_G(Z)
fake_Y=net_D(fake_X).view(batch_size,-1)
loss_G=loss(fake_Y,ones)
loss_G.backward()
trainer_G.step()
return float(loss_G.sum()) def train(net_D,net_G,data_iter,num_epochs,lr,latent_dim):
loss=nn.BCELoss()
Tensor=torch.cuda.FloatTensor
trainer_D=torch.optim.Adam(net_D.parameters(),lr=lr,betas=(0.5,0.999))
trainer_G=torch.optim.Adam(net_G.parameters(),lr=lr,betas=(0.5,0.999))
plt.figure(figsize=(7,4))
d_loss_point=[]
g_loss_point=[]
d_loss=0
g_loss=0
for epoch in range(1,num_epochs+1):
d_loss_sum=0
g_loss_sum=0
batch=0
for X in data_iter:
X=X[:][0]
batch+=1
X=Variable(X.type(Tensor))
batch_size=X.shape[0]
Z=Variable(Tensor(np.random.normal(0,1,(batch_size,latent_dim,1,1)))) trainer_D.zero_grad()
d_loss = update_D(X, Z, net_D, net_G, loss, trainer_D)
d_loss_sum+=d_loss
trainer_G.zero_grad()
g_loss = update_G(Z, net_D, net_G, loss, trainer_G)
g_loss_sum+=g_loss d_loss_point.append(d_loss_sum/batch)
g_loss_point.append(g_loss_sum/batch)
print(
"[Epoch %d/%d] [D loss: %f] [G loss: %f]"
% (epoch, num_epochs, d_loss_sum/batch_size, g_loss_sum/batch_size)
) plt.ylabel('Loss', fontdict={ 'size': 14})
plt.xlabel('epoch', fontdict={ 'size': 14})
plt.xticks(range(0,num_epochs+1,3))
plt.plot(range(1,num_epochs+1),d_loss_point,color='orange',label='discriminator')
plt.plot(range(1,num_epochs+1),g_loss_point,color='blue',label='generator')
plt.legend()
plt.show()
print(d_loss,g_loss) Z = Variable(Tensor(np.random.normal(0, 1, size=(21, latent_dim, 1, 1))),requires_grad=False)
fake_x = generator(Z)
fake_x=fake_x.cpu().detach().numpy()
plt.figure(figsize=(14,6))
for i in range(21):
im=np.transpose(fake_x[i])
plt.subplot(3,7,i+1)
plt.imshow(im)
plt.show()

Now let's train the model.

if __name__ == '__main__':
lr,latent_dim,num_epochs=0.005,100,50
train(discriminator,generator,data_iter,num_epochs,lr,latent_dim)

Summary

  • DCGAN architecture has four convolutional layers for the Discriminator and four "fractionally-strided" convolutional layers for the Generator.
  • The Discriminator is a 4-layer strided convolutions with batch normalization (except its input layer) and leaky ReLU activations.
  • Leaky ReLU is a nonlinear function that give a non-zero output for a negative input. It aims to fix the “dying ReLU” problem and helps the gradients flow easier through the architecture.

Exercises

  • What will happen if we use standard ReLU activation rather than leaky ReLU?
  • Apply DCGAN on Fashion-MNIST and see which category works well and which does not.

DCGAN的更多相关文章

  1. 学习笔记GAN003:GAN、DCGAN、CGAN、InfoGAN

    ​GAN应用集中在图像生成,NLP.Robt Learning也有拓展.类似于NLP中的Actor-Critic. https://arxiv.org/pdf/1610.01945.pdf . Gen ...

  2. 学习笔记GAN004:DCGAN main.py

    Scipy 高端科学计算:http://blog.chinaunix.net/uid-21633169-id-4437868.html import os #引用操作系统函数文件 import sci ...

  3. DCGAN 论文简单解读

    DCGAN的全称是Deep Convolution Generative Adversarial Networks(深度卷积生成对抗网络).是2014年Ian J.Goodfellow 的那篇开创性的 ...

  4. DCGAN 代码简单解读

    之前在DCGAN文章简单解读里说明了DCGAN的原理.本次来实现一个DCGAN,并在数据集上实际测试它的效果.本次的代码来自github开源代码DCGAN-tensorflow,感谢carpedm20 ...

  5. 【深度学习】--DCGAN从入门到实例应用

    一.前述 DCGAN就是Deep Concolutions应用到GAN上,但是和传统的卷积应用还有一些区别,最大的区别就是没有池化层.本文将详细分析卷积在GAN上的应用. 二.具体 1.DCGAN和传 ...

  6. 用Tensorflow实现DCGAN

    1. GAN简介 最近几年,深度神经网络在图像识别.语音识别以及自然语言处理方面的应用有了爆炸式的增长,并且都达到了极高的准确率,某些方面甚至超过了人类的表现.然而人类的能力远超出图像识别和语音识别的 ...

  7. Tensorflow:DCGAN生成手写数字

    参考地址:https://blog.csdn.net/miracle_ma/article/details/78305991 使用DCGAN(deep convolutional GAN):深度卷积G ...

  8. 通过DCGAN进行生成花朵

    环境是你要安装Keras和Tensorflow 先来个network.py,里面定义了生成器网络和鉴别器网络: # -*- coding: UTF-8 -*- """ D ...

  9. 学习笔记GAN002:DCGAN

    Ian J. Goodfellow 论文:https://arxiv.org/abs/1406.2661 两个网络:G(Generator),生成网络,接收随机噪声Z,通过噪声生成样本,G(z).D( ...

  10. talk is cheap, show me the code——dcgan,wgan,wgan-gp的tensorflow实现

    最近学习了生成对抗网络(GAN),基于几个经典GAN网络结构做了些小实验,包括dcgan,wgan,wgan-gp.坦率的说,wgan,wgan-gp论文的原理还是有点小复杂,我也没有完全看明白,因此 ...

随机推荐

  1. String类与StringBuffer类

    String类与StringBuffer类   一.String类和StringBuffer类的区别 String类是不可变类,新建的对象为不可变对象(String类的内容和长度是固定的),一旦被创建 ...

  2. 简单看看ReentrantLock

    前面我们分析了AQS的基本原理,然后也试着基于AQS实现了一个可重入的锁了,现在我们再来看看官方的ReentrantLock锁,这个锁是可重入的独占锁,也就是说同时只有一个线程可以获取该锁,而且这个线 ...

  3. jdk1.7推出的Fork/Join提高业务代码处理性能

    jdk1.7推出的Fork/Join提高业务代码处理性能 jdk1.7之后推出了Fork/Join框架,其原理个人理解为:递归多线程并发处理业务代码,以下为我模拟我们公司业务代码做的一个案例,性能可提 ...

  4. SciPy 积分

    章节 SciPy 介绍 SciPy 安装 SciPy 基础功能 SciPy 特殊函数 SciPy k均值聚类 SciPy 常量 SciPy fftpack(傅里叶变换) SciPy 积分 SciPy ...

  5. 安装双版本python2 和 python 3 所产生得问题 解决yum对python依赖版本问题

    错误 解决办法 一是升级yum  直接使用python3以上版本 二是修改yum的解释器为旧版本python2.7,即将连接文件   /usr/bin/python    软连接回   /usr/bi ...

  6. 【转载】Jmeter关联-正则表达式提取器

            今天研发同事提供了一个验证token的接口,要验证token的正确性,现在将整个过程做如下记录: 场景:验证token的正确性 原理:首先用户登录成功后,会在Response head ...

  7. PAT (Advanced Level) 1136~1139:1136模拟 1137模拟 1138 前序中序求后序 1139模拟

    1136 A Delayed Palindrome(20 分) 题意:给定字符串A,判断A是否是回文串.若不是,则将A反转得到B,A和B相加得C,若C是回文串,则A被称为a delayed palin ...

  8. JS+ES6 - 向数组的开头添加一个或更多元素

  9. 使用delphi TThread类创建线程备忘录

    备忘,不常用经常忘了细节 TMyThread = class(TThread) private { Private declarations } protected procedure Execute ...

  10. SQL server 注入 和 SQL server 扩展(10.29 第二十九天)

    Step1:检测注入点 Step2: select * from sysobjects   (sysobjects 系统对象表,保存当前数据库的对象) select * from users wher ...