Deep Convolution Auto-encoder

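I. Concepts

An autoencoder is a network architecture for data compression in which the compression and decompression functions are learned from the data itself rather than designed by hand. Its two core parts are the encoder and the decoder: the encoder compresses the input into a latent representation, and the decoder reconstructs the output from that representation. Both are built from neural networks, and the whole model is trained like any other neural network, by minimizing the difference between the input and the output.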
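
To make the encode/decode idea concrete, here is a minimal sketch that swaps the neural network for the simplest possible stand-in: a linear autoencoder fitted in closed form with SVD (that is, PCA). It is not the convolutional model built later in this article. The synthetic data, the `fit_linear_autoencoder` helper, and the latent sizes tried are all illustrative assumptions.

```python
import numpy as np

def fit_linear_autoencoder(x, k):
    """Fit a k-dimensional linear encoder/decoder pair with SVD (PCA)."""
    mean = x.mean(axis=0)
    # Rows of vt are principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    basis = vt[:k]                                  # shape: (k, n_features)
    encode = lambda data: (data - mean) @ basis.T   # compress to k numbers
    decode = lambda code: code @ basis + mean       # reconstruct the input
    return encode, decode

rng = np.random.default_rng(0)
# Hypothetical data: 200 samples in 10 dimensions that lie on a 3-D subspace.
x = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 10))

errors = {}
for k in (1, 2, 3):
    encode, decode = fit_linear_autoencoder(x, k)
    # Reconstruction error: the quantity an autoencoder is trained to minimize.
    errors[k] = float(np.mean((x - decode(encode(x))) ** 2))
    print("latent size", k, "reconstruction MSE", errors[k])
```

The reconstruction error shrinks as the latent representation grows, and it is essentially zero once the latent size matches the true dimensionality of the data. A neural autoencoder pursues the same objective but can learn nonlinear encodings.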

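II. Uses
1. Image denoising;

2. Data compression and dimensionality reduction.

However, its compression performance falls short of traditional codecs such as JPEG (for images) or MP3 (for audio), and an autoencoder has trouble generalizing to datasets other than the one it was trained on.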

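III. Implementing a convolutional autoencoder:
1. Load the data:

Our data is based on the MNIST dataset. First download the data and put it in the MNIST_data directory; you can get it from the link at the end of this article or find a copy online.

[Figure: directory structure]

[Figure: the MNIST dataset]

Load the dataset: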

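# Jupyter magic: render matplotlib figures inline (omit when running as a plain script)
%matplotlib inline

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# Note: this MNIST helper, like the rest of the code in this article, uses the TensorFlow 1.x API
from tensorflow.examples.tutorials.mnist import input_data
# validation_size=0 keeps every training image in mnist.train
mnist = input_data.read_data_sets('MNIST_data/', validation_size=0)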

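2. Visualize the data:

Look at one image:

# Each image is stored as a flat vector of 784 values; reshape it to 28x28 for display
img = mnist.train.images[2]
plt.imshow(img.reshape((28, 28)), cmap='Greys_r')

[Figure: output of the code above]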

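3. Build the network:

The encoder part of the network is a typical convolutional pyramid: each convolutional layer is followed by a max-pooling layer that reduces the spatial dimensions. The decoder has to turn a narrow representation into a wide reconstructed image. For example, the representation could be a 4x4x8 max-pooling layer; this is the output of the encoder and also the input to the decoder. We want a 28x28x1 image out of the decoder, so we have to work our way back up from the narrow decoder input.

[Figure: schematic of the network]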

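Here the final encoder layer has size 4x4x8 = 128. The original image has size 28x28x1 = 784, so the encoded vector is roughly 16% of the size of the original image. These are only suggested sizes for each layer. You can change the depth and size of the network, but remember that the goal is to find a small representation of the input data.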
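
The sizes quoted above can be checked with a few lines of arithmetic. This sketch assumes stride-2 pooling with 'same' padding, where the output size is the input size divided by the stride, rounded up. `same_pool` is a hypothetical helper written for this illustration, not a TensorFlow function.

```python
import math

def same_pool(size, stride=2):
    """Spatial size after a pooling layer with 'same' padding."""
    return math.ceil(size / stride)

size = 28
sizes = [size]
for _ in range(3):          # three max-pooling layers in the encoder
    size = same_pool(size)
    sizes.append(size)

encoded_values = sizes[-1] * sizes[-1] * 8   # 8 channels in the last encoder layer
input_values = 28 * 28 * 1
ratio = encoded_values / input_values

print(sizes)                          # [28, 14, 7, 4]
print(encoded_values, input_values)   # 128 784
print("{:.1%}".format(ratio))         # 16.3%
```

The rounding up is why the last pooling step goes from 7 to 4 rather than 3.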

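In the encoder we use convolutional and max-pooling layers to keep shrinking the input. In the decoder we need to upsample the 4x4x8 representation back to the original 28x28x1. This step is often done with transposed convolution (commonly called deconvolution; for background, see this article). The network here instead resizes each layer with nearest-neighbor interpolation and follows the resize with an ordinary convolution. In TensorFlow this is easy to do with tf.image.resize_images or tf.image.resize_nearest_neighbor, and the network definition in this section uses the latter.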
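
To show what nearest-neighbor resizing does to a feature map, here is a small NumPy sketch. `resize_nearest` is a hypothetical helper, and its floor(i * in / out) index rule is one common convention that I am assuming for illustration; TensorFlow's exact rounding may differ depending on its options.

```python
import numpy as np

def resize_nearest(image, out_h, out_w):
    """Nearest-neighbor resize of an (H, W, C) array using floor(i * in / out) indexing."""
    in_h, in_w = image.shape[:2]
    rows = (np.arange(out_h) * in_h) // out_h   # source row for each output row
    cols = (np.arange(out_w) * in_w) // out_w   # source column for each output column
    return image[rows][:, cols]

# Stand-in for the 4x4x8 encoder output.
encoded = np.arange(4 * 4 * 8, dtype=np.float32).reshape(4, 4, 8)
up = resize_nearest(encoded, 7, 7)

print(up.shape)                   # (7, 7, 8)
print((np.arange(7) * 4) // 7)    # [0 0 1 1 2 2 3]
```

Resizing only copies existing values and has no trainable weights, which is why each resize in the decoder is followed by a convolution that can learn to refine the enlarged feature map.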

inputs_ = tf.placeholder(tf.float32, (None, 28, 28, 1), name='inputs')
targets_ = tf.placeholder(tf.float32, (None, 28, 28, 1), name='targets')

### Encoder: compress
conv1 = tf.layers.conv2d(inputs_, 16, (3,3), padding='same', activation=tf.nn.relu)
# shape now: 28x28x16
maxpool1 = tf.layers.max_pooling2d(conv1, (2,2), (2,2), padding='same')
# shape now: 14x14x16
conv2 = tf.layers.conv2d(maxpool1, 8, (3,3), padding='same', activation=tf.nn.relu)
# shape now: 14x14x8
maxpool2 = tf.layers.max_pooling2d(conv2, (2,2), (2,2), padding='same')
# shape now: 7x7x8
conv3 = tf.layers.conv2d(maxpool2, 8, (3,3), padding='same', activation=tf.nn.relu)
# shape now: 7x7x8
encoded = tf.layers.max_pooling2d(conv3, (2,2), (2,2), padding='same')
# shape now: 4x4x8

### Decoder: reconstruct
upsample1 = tf.image.resize_nearest_neighbor(encoded, (7,7))
# shape now: 7x7x8
conv4 = tf.layers.conv2d(upsample1, 8, (3,3), padding='same', activation=tf.nn.relu)
# shape now: 7x7x8
upsample2 = tf.image.resize_nearest_neighbor(conv4, (14,14))
# shape now: 14x14x8
conv5 = tf.layers.conv2d(upsample2, 8, (3,3), padding='same', activation=tf.nn.relu)
# shape now: 14x14x8
upsample3 = tf.image.resize_nearest_neighbor(conv5, (28,28))
# shape now: 28x28x8
conv6 = tf.layers.conv2d(upsample3, 16, (3,3), padding='same', activation=tf.nn.relu)
# shape now: 28x28x16

# No activation here: the loss below expects raw logits
logits = tf.layers.conv2d(conv6, 1, (3,3), padding='same', activation=None)
# shape now: 28x28x1

# Sigmoid squashes the logits into [0, 1] to give the reconstructed image
decoded = tf.nn.sigmoid(logits, name='decoded')

# Per-pixel sigmoid cross-entropy between the target image and the logits
loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=targets_, logits=logits)
cost = tf.reduce_mean(loss)
# Minimize the cost with the Adam optimizer
opt = tf.train.AdamOptimizer(0.001).minimize(cost)
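
The loss above is computed from the raw logits rather than from the sigmoid output. The NumPy sketch below compares the textbook cross-entropy with the overflow-safe rearrangement max(x, 0) - x*z + log(1 + exp(-|x|)), which, as far as I know, is the form the TensorFlow documentation gives for this op. The function names and sample values here are my own and purely illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def naive_bce(labels, logits):
    """Textbook binary cross-entropy applied to sigmoid(logits)."""
    p = sigmoid(logits)
    return -(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))

def stable_bce(labels, logits):
    """Algebraically equivalent form that avoids overflow for large |logits|."""
    return np.maximum(logits, 0.0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))

logits = np.array([-3.0, -0.5, 0.0, 2.0])
labels = np.array([0.0, 0.3, 1.0, 0.9])   # pixel intensities in [0, 1] act as soft labels

print(naive_bce(labels, logits))
print(stable_bce(labels, logits))
print(stable_bce(labels, logits).mean())   # scalar cost, analogous to tf.reduce_mean(loss)
```

Both functions agree on moderate logits, but only the second stays finite when a logit is very large, which is one reason to keep `logits` and `decoded` as separate tensors in the network.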
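
4. Train the network:

sess = tf.Session()

epochs = 20
batch_size = 200
sess.run(tf.global_variables_initializer())
for e in range(epochs):
    for ii in range(mnist.train.num_examples//batch_size):
        batch = mnist.train.next_batch(batch_size)
        # Reshape the flat 784-value rows into 28x28x1 images
        imgs = batch[0].reshape((-1, 28, 28, 1))
        # The input is also the target: the network learns to reconstruct it
        batch_cost, _ = sess.run([cost, opt], feed_dict={inputs_: imgs,
                                                         targets_: imgs})

    # Report the loss of the last batch once per epoch
    print("Epoch: {}/{}...".format(e+1, epochs),
          "Training loss: {:.4f}".format(batch_cost))

Note: 20 epochs can take a while, so be patient.

[Figure: output at the end of training]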

5. Plot the reconstructed images with matplotlib and compare them with the originals:

fig, axes = plt.subplots(nrows=2, ncols=10, sharex=True, sharey=True, figsize=(20,4))
in_imgs = mnist.test.images[:10]
reconstructed = sess.run(decoded, feed_dict={inputs_: in_imgs.reshape((10, 28, 28, 1))})

# Top row: original test images; bottom row: reconstructions
for images, row in zip([in_imgs, reconstructed], axes):
    for img, ax in zip(images, row):
        ax.imshow(img.reshape((28, 28)), cmap='Greys_r')
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)

fig.tight_layout(pad=0.1)
sess.close()

[Figure: output, originals in the top row and reconstructions in the bottom row]

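As you can see, some of the reconstructions have picked up a little noise. That completes the convolutional autoencoder. In practice, though, the autoencoder as built so far is not very useful. Autoencoders can, however, denoise images quite successfully simply by being trained on noisy images. We can create noisy images by adding Gaussian noise to the training images and then clipping the values to the range 0 to 1. We then use the noisy images as the input and the original, clean images as the targets.

[Figure: an example of the noisy images I generated and the denoised results]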

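IV. Image denoising with a convolutional autoencoder
1. Network structure:

Because denoising is a harder problem for the network, we use convolutional layers with more feature maps: the encoder's convolutional layers have depths 32-32-16, and the decoder mirrors them in reverse order. Otherwise the architecture is the same as the autoencoder in Section III.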

# Start from an empty graph so the Section III network is not carried over
tf.reset_default_graph()

inputs_ = tf.placeholder(tf.float32, (None, 28, 28, 1), name='inputs')
targets_ = tf.placeholder(tf.float32, (None, 28, 28, 1), name='targets')

### Encoder
conv1 = tf.layers.conv2d(inputs_, 32, (3,3), padding='same', activation=tf.nn.relu)
# shape now: 28x28x32
maxpool1 = tf.layers.max_pooling2d(conv1, (2,2), (2,2), padding='same')
# shape now: 14x14x32
conv2 = tf.layers.conv2d(maxpool1, 32, (3,3), padding='same', activation=tf.nn.relu)
# shape now: 14x14x32
maxpool2 = tf.layers.max_pooling2d(conv2, (2,2), (2,2), padding='same')
# shape now: 7x7x32
conv3 = tf.layers.conv2d(maxpool2, 16, (3,3), padding='same', activation=tf.nn.relu)
# shape now: 7x7x16
encoded = tf.layers.max_pooling2d(conv3, (2,2), (2,2), padding='same')
# shape now: 4x4x16

### Decoder
upsample1 = tf.image.resize_nearest_neighbor(encoded, (7,7))
# shape now: 7x7x16
conv4 = tf.layers.conv2d(upsample1, 16, (3,3), padding='same', activation=tf.nn.relu)
# shape now: 7x7x16
upsample2 = tf.image.resize_nearest_neighbor(conv4, (14,14))
# shape now: 14x14x16
conv5 = tf.layers.conv2d(upsample2, 32, (3,3), padding='same', activation=tf.nn.relu)
# shape now: 14x14x32
upsample3 = tf.image.resize_nearest_neighbor(conv5, (28,28))
# shape now: 28x28x32
conv6 = tf.layers.conv2d(upsample3, 32, (3,3), padding='same', activation=tf.nn.relu)
# shape now: 28x28x32

logits = tf.layers.conv2d(conv6, 1, (3,3), padding='same', activation=None)
# shape now: 28x28x1

decoded = tf.nn.sigmoid(logits, name='decoded')

loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=targets_, logits=logits)
cost = tf.reduce_mean(loss)
opt = tf.train.AdamOptimizer(0.001).minimize(cost)
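
To get a feel for how much larger the denoising network is, the sketch below counts trainable parameters for both networks from their layer depths. It assumes 3x3 kernels and one bias per output channel, which matches the `tf.layers.conv2d` calls as written as far as I can tell; pooling and resizing layers add no parameters. `conv_params` and `count_params` are hypothetical helpers.

```python
def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of one 2-D convolution layer."""
    return (kernel * kernel * in_channels + 1) * out_channels

def count_params(channels):
    """channels lists the depth at the input and after every conv layer, in order."""
    return sum(conv_params(c_in, c_out) for c_in, c_out in zip(channels, channels[1:]))

# Depths read off the two network definitions in this article:
# input, conv1..conv3 (encoder), conv4..conv6 (decoder), logits.
basic = [1, 16, 8, 8, 8, 8, 16, 1]
denoising = [1, 32, 32, 16, 16, 32, 32, 1]

basic_total = count_params(basic)
denoising_total = count_params(denoising)
print(basic_total, denoising_total)   # 4385 30689
```

The denoising network has roughly seven times as many parameters, which is consistent with the longer training time mentioned in the next step.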

2. Train the network for 100 epochs (this takes longer; training on a GPU is recommended):

sess = tf.Session()
epochs = 100
batch_size = 200
# Sets how much noise we're adding to the MNIST images
noise_factor = 0.5
sess.run(tf.global_variables_initializer())
for e in range(epochs):
    for ii in range(mnist.train.num_examples//batch_size):
        batch = mnist.train.next_batch(batch_size)
        # Get images from the batch
        imgs = batch[0].reshape((-1, 28, 28, 1))

        # Add random noise to the input images
        noisy_imgs = imgs + noise_factor * np.random.randn(*imgs.shape)
        # Clip the images to be between 0 and 1
        noisy_imgs = np.clip(noisy_imgs, 0., 1.)

        # Noisy images as inputs, original images as targets
        batch_cost, _ = sess.run([cost, opt], feed_dict={inputs_: noisy_imgs,
                                                         targets_: imgs})

    # Report the loss of the last batch once per epoch
    print("Epoch: {}/{}...".format(e+1, epochs),
          "Training loss: {:.4f}".format(batch_cost))

[Figure: loss after 100 epochs]

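3. Test the denoising:

We add noise to the test images and pass them through the autoencoder. Even though it is sometimes hard to tell what the original digit is, the network does a good job of removing the noise.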

fig, axes = plt.subplots(nrows=2, ncols=10, sharex=True, sharey=True, figsize=(20,4))
in_imgs = mnist.test.images[:10]
# Corrupt the test images the same way as during training
noisy_imgs = in_imgs + noise_factor * np.random.randn(*in_imgs.shape)
noisy_imgs = np.clip(noisy_imgs, 0., 1.)

reconstructed = sess.run(decoded, feed_dict={inputs_: noisy_imgs.reshape((10, 28, 28, 1))})

# Top row: noisy inputs; bottom row: denoised reconstructions
for images, row in zip([noisy_imgs, reconstructed], axes):
    for img, ax in zip(images, row):
        ax.imshow(img.reshape((28, 28)), cmap='Greys_r')
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)

fig.tight_layout(pad=0.1)

[Figure: output, noisy inputs in the top row and denoised reconstructions in the bottom row]

As you can see, the denoising works well in most cases.

Source code and data: see here
