[Papers] Semantic Segmentation Papers(1)
Tags: Paper
A summary of several semantic segmentation papers I have read: FCN, DeconvNet, SegNet, and U-Net. The DeepLab papers will be summarized later.
FCN
Abstract
Proposes an end-to-end FCN that takes an arbitrarily sized image as input and outputs a label map of the same size. The skip architecture in FCN combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations.
Introduction
Uses a supervised pretrained classification network for pixel-wise prediction.
The central difficulty of semantic segmentation is the inherent tension between semantic information and location information.
Related Work
FCN
As the pioneering work applying deep learning to segmentation, FCN laid the foundation for later networks such as U-Net, ENet, and SegNet, in particular the idea of using deconvolution to upsample the coarse map.
Adapting classifiers for dense prediction
A fully connected layer can be viewed as a special case of convolution whose kernel covers the entire feature map. After the final fully connected layers are rewritten this way, the network outputs a label map, and adding a spatial loss enables end-to-end dense learning.
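To make this concrete, here is a minimal PyTorch sketch (my own illustration, not the paper's code; layer sizes are assumptions) of turning a fully connected classifier head into convolutions so the network accepts arbitrary input sizes and emits a coarse score map:

```python
import torch
import torch.nn as nn

num_classes = 21

# A toy classifier head: a 512-channel 7x7 feature map flattened into an fc layer.
fc = nn.Linear(512 * 7 * 7, 4096)

# The same computation as a convolution over the whole feature map:
# a 7x7 convolution with 4096 outputs can reuse the fc weights directly.
conv = nn.Conv2d(512, 4096, kernel_size=7)
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(4096, 512, 7, 7))
    conv.bias.copy_(fc.bias)

score = nn.Conv2d(4096, num_classes, kernel_size=1)  # 1x1 scoring layer

x = torch.randn(1, 512, 14, 14)        # a feature map larger than at training time
coarse = score(conv(x))                # -> (1, 21, 8, 8) coarse score map
print(coarse.shape)
```

Because only convolutions remain, feeding a larger input simply yields a larger score map instead of a shape error.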
Shift-and-stitch is filter rarefaction
rarefaction: thinning out, i.e. making something sparser
à trous algorithm
FCN also mentions the atrous (dilated) convolution that DeepLab later builds on.
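For reference, a dilated (à trous) convolution is just a standard convolution with gaps between the kernel taps; a minimal sketch (channel sizes are arbitrary assumptions):

```python
import torch
import torch.nn as nn

# A 3x3 convolution with dilation=2 covers a 5x5 area without adding parameters
# or reducing resolution (padding=2 keeps the spatial size unchanged).
dilated = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)

x = torch.randn(1, 64, 32, 32)
print(dilated(x).shape)   # (1, 64, 32, 32)
```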
Upsampling is backwards strided convolution
In a sense, upsampling with factor \(f\) is convolution with a fractional input stride of \(1/f\). So long as \(f\) is integral, a natural way to upsample is therefore backwards convolution (sometimes called deconvolution) with an output stride of \(f\).
This passage is a bit hard to follow at first.
Explanations of deconvolution:
- https://datascience.stackexchange.com/questions/6107/what-are-deconvolutional-layers
- https://github.com/vdumoulin/conv_arithmetic
Transposed convolution with stride two and padding:
It is called a transposed convolution because this operation often appears in the backward pass: backpropagation can be carried out by multiplying with the transpose of the weight matrix. Why is the filter in the figure said to have stride 2 when it clearly slides with stride 1? The stride 2 refers to the corresponding original convolution (before transposition): because zeros are inserted between the input pixels, a stride of 1 here is equivalent to a stride of 2 in the original convolution.
Shown above is a transposed convolution. 'stride two' means stride in the corresponding original convolution is two. This is precisely why you have 1 (=2-1, 2 being the original stride) layer of zeros in between rows and columns. Transposed convolution is generally used in backward pass. It is called transposed because of the analogy with fully connected layer where you multiply with the transpose of the weight matrix during a backward pass.
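To make the "upsampling is backwards strided convolution" point concrete, here is a small PyTorch sketch (my own, not FCN's code) of upsampling by a factor f with a transposed convolution whose weights are initialized to bilinear interpolation, matching the bilinear initialization described in the FCN paper:

```python
import torch
import torch.nn as nn

def bilinear_kernel(channels, kernel_size):
    # Bilinear interpolation weights: one filter per channel, no cross-channel mixing.
    factor = (kernel_size + 1) // 2
    center = factor - 1 if kernel_size % 2 == 1 else factor - 0.5
    og = torch.arange(kernel_size, dtype=torch.float32)
    filt = 1 - torch.abs(og - center) / factor
    kernel2d = filt[:, None] * filt[None, :]
    weight = torch.zeros(channels, channels, kernel_size, kernel_size)
    for c in range(channels):
        weight[c, c] = kernel2d
    return weight

f = 2  # upsampling factor: the output stride of the backwards convolution
up = nn.ConvTranspose2d(21, 21, kernel_size=2 * f, stride=f, padding=f // 2, bias=False)
with torch.no_grad():
    up.weight.copy_(bilinear_kernel(21, 2 * f))

coarse = torch.randn(1, 21, 8, 8)
print(up(coarse).shape)   # (1, 21, 16, 16): upsampled by factor f
```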
Patchwise training is loss sampling
Segmentation Architecture
The author's fully convolutional network consists mainly of in-network upsampling and a pixelwise loss, plus a skip architecture.
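A minimal sketch of the skip idea (roughly the FCN-16s variant; layer names, channel counts, and spatial sizes are my own assumptions): the stride-32 score map is upsampled 2×, added to 1×1-conv scores from pool4, and the sum is upsampled 16× back to the input resolution.

```python
import torch
import torch.nn as nn

num_classes = 21
score_pool4 = nn.Conv2d(512, num_classes, kernel_size=1)   # scores from the shallow, fine layer
upscore2 = nn.ConvTranspose2d(num_classes, num_classes, kernel_size=4, stride=2, padding=1)
upscore16 = nn.ConvTranspose2d(num_classes, num_classes, kernel_size=32, stride=16, padding=8)

# Hypothetical maps for a 256x256 input: pool4 features at stride 16,
# final classifier scores at stride 32.
pool4 = torch.randn(1, 512, 16, 16)
score_conv7 = torch.randn(1, num_classes, 8, 8)

fused = upscore2(score_conv7) + score_pool4(pool4)   # combine coarse semantics + fine appearance
out = upscore16(fused)                               # back to input resolution
print(out.shape)                                     # (1, 21, 256, 256)
```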
Learning Deconvolution Network for Semantic Segmentation
Abstract
deep deconvolution + proposal-wise prediction
The deconvolution network is composed of deconvolution and unpooling (upsampling) layers.
1. Introduction
Most existing CNN-based semantic segmentation networks take the coarse label map produced by a classification network (16×16 in FCN) and upsample it with a bilinear-interpolation-style deconvolution. However, the input to this deconvolution is a feature map that has already gone through convolution and pooling and has lost much of its structured detail, so deconvolution alone often fails to give good results.
Some methods combine FCN with a Conditional Random Field (CRF) to address this.
2. Related Work
FCN:
Because FCN has a fixed-size receptive field, it cannot correctly label objects that are too small relative to the receptive field, and it tends to predict multiple classes for objects that are too large.
FCN+CRF
3. System Architecture
The encoder of the network is the VGG classification network, and the decoder is a deconvolution network that unpools and deconvolves the feature map produced by the classifier. The final output is a probability map giving, for each pixel, the probability of belonging to each class, from which the per-pixel label is obtained. It is worth pointing out early that DeconvNet keeps the fully connected layers of VGG; these layers contain a huge number of parameters, so the trained model takes up a lot of space. For application-level products, SegNet (discussed later) is the better choice: it removes the fully connected layers, so both training speed and memory footprint are much smaller.
Unpooling and Deconvolution
Unpooling
What is pooling?
Pooling in convolution network is designed to filter noisy activations in a lower layer by abstracting activations in a receptive field with a single representative value.
Although pooling makes the activations more robust, it also discards the spatial information inside the receptive field. This structural information can matter a lot for segmentation, which requires dense prediction.
How is unpooling implemented?
Record the locations of the maximum activations during pooling, and place the values back at those locations when unpooling.
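A tiny PyTorch sketch (my own illustration) of switched unpooling: max pooling records the argmax locations, and unpooling writes each value back to its recorded location, leaving zeros elsewhere.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 4, 4)

# Pool while recording where each maximum came from.
pooled, indices = F.max_pool2d(x, kernel_size=2, stride=2, return_indices=True)

# Unpool: write each value back to its recorded location, zeros everywhere else.
unpooled = F.max_unpool2d(pooled, indices, kernel_size=2, stride=2)
print(unpooled.shape)   # (1, 1, 4, 4), mostly zeros (sparse)
```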
Deconvolution
The output of unpooling is sparse; deconvolution turns it into an enlarged, dense activation map, and the border pixels of the enlarged map are then cropped so that it matches the size of the unpooled input.
Unpooling and deconvolution play different roles in the network: unpooling is example-specific while deconvolution is class-specific. Example-specific means that, for whatever object is present, unpooling reconstructs the object's structure from the location information recorded during pooling. But since every pixel must be classified, recovering the object structure alone is not enough: noise and activations from non-target classes remain around it. Deconvolution amplifies the activations belonging to the target class and suppresses those of the other classes. Combining the two, the deconvolution network on the decoder side can output a fairly accurate segmentation map.
In these two respects the decoders of DeconvNet and SegNet are quite similar: upsample to a sparse activation map, then use deconvolution/convolution to obtain a dense activation map.
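A minimal sketch of one decoder stage in this style (the class name and channel sizes are my own assumptions): unpool with the stored indices to get a sparse, enlarged map, then densify it with a deconvolution, batch norm, and ReLU.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderStage(nn.Module):
    # One decoder stage: unpool with stored indices, then densify the sparse map.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x, indices):
        x = F.max_unpool2d(x, indices, kernel_size=2, stride=2)  # sparse, 2x larger
        return F.relu(self.bn(self.deconv(x)))                   # dense activation map

# Usage: the indices come from the matching encoder max-pooling layer.
_, idx = F.max_pool2d(torch.randn(1, 64, 16, 16), 2, stride=2, return_indices=True)
feat = torch.randn(1, 64, 8, 8)
print(DecoderStage(64, 32)(feat, idx).shape)   # (1, 32, 16, 16)
```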
The visualization of activation maps in the paper also shows that the encoder gradually abstracts features (detail to coarse) while the decoder goes the other way (coarse to detail).
instance-wise segmentation vs. image-level segmentation
I did not fully understand this part.
Training
- Batch Normalization
- Two-stage Training
ensemble with FCN
The detailed network structure is given in the paper.
Inference
At inference time, before each image is fed into the network, the authors use edge-box to generate candidate proposals so that objects can be detected at different scales. For each test image, about 2000 candidates are generated, and the top 50 by objectness score are fed into the network. The instance-wise segmentation mentioned earlier should be related to this step; the paper does not explain it in much detail.
Overall, although the idea behind DeconvNet is fairly novel (I do not know whether SegNet borrowed from it), the network is clearly very deep and hard to train, it keeps the fully connected layers, and it needs edge-box to generate candidate proposals, so it is not an end-to-end network. For practical use I would still recommend SegNet.
SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
Abstract
The novelty of SegNet lies in the manner in which the decoder upsamples its lower resolution input feature map(s). Specifically, the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling. This eliminates the need for learning to upsample.
1. Introduction
Reusing the encoder's max-pooling indices on the decoder side:
- improves boundary delineation
- reduces the number of parameters enabling end-to-end training
Architecture
without fully connected layers (parameters reduced from 134M to 14.7M)
encoder
conv + batch norm + ReLU + max pooling (2×2)
To retain the spatial information that would otherwise be lost during max pooling, SegNet chooses to store the max-pooling indices.
decoder
The decoder upsamples its input feature maps using the stored max-pooling indices, which yields sparse feature maps; trainable filter banks (convolutions) followed by batch norm then densify them.
Several decoder variants are compared.
Training
- median frequency balancing (a small class-weight sketch follows this list)
- natural frequency balancing
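A small sketch of how median frequency balancing could be computed (my own illustration, not the SegNet code): the weight of class c is median(freq)/freq(c), where freq(c) is the pixel frequency of c over the images in which c appears, so rare classes get weights larger than 1.

```python
import numpy as np

def median_frequency_weights(label_maps, num_classes):
    # freq(c) = pixels of class c / total pixels in images where c appears;
    # weight(c) = median(freq) / freq(c), so rare classes get weights > 1.
    class_pixels = np.zeros(num_classes)
    image_pixels = np.zeros(num_classes)
    for lbl in label_maps:                      # each lbl is an HxW integer array
        for c in range(num_classes):
            n = np.sum(lbl == c)
            if n > 0:
                class_pixels[c] += n
                image_pixels[c] += lbl.size
    freq = class_pixels / np.maximum(image_pixels, 1)
    return np.median(freq[freq > 0]) / np.maximum(freq, 1e-12)

# Toy usage: two 4x4 label maps, 3 classes.
labels = [np.random.randint(0, 3, (4, 4)) for _ in range(2)]
print(median_frequency_weights(labels, 3))
```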
Analysis
BF: boundary F1 measure
SegNet and DeconvNet are similar in that both save the max-pooling indices on the encoder side and use them to upsample on the decoder side. The feature map obtained this way is still sparse, so convolutional/deconvolutional layers follow to produce a better, dense feature map. The difference is that SegNet has no fully connected layers, which makes it a much lighter framework.
U-Net
Abstract
- use data augmentation to train the model
- contracting path to capture context
- symmetric expanding path enables precise localization
Introduction
- High resolution features from the contracting path are combined with the upsampled output
- overlap-tile strategy (I did not fully understand this part)
- elastic deformation for augmentation
- a weighted loss is used to handle the touching-border problem between objects
Network Architecture
The left side is the contracting path and the right side is the expansive path.
The contracting path uses 3×3 convolutions + ReLU + 2×2 max pooling; the number of feature channels doubles after each pooling step.
The expansive path uses upsampling + a 2×2 convolution (which halves the number of feature channels) + concatenation with the corresponding feature map from the contracting path + 3×3 convolutions + ReLU.
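A minimal sketch of one expansive-path step (the class name and the center-crop helper are my own; channel and spatial sizes follow the paper's figure): a 2×2 up-convolution halves the channels, the corresponding contracting-path map is center-cropped and concatenated, and two unpadded 3×3 convolutions follow.

```python
import torch
import torch.nn as nn

def center_crop(feat, target_h, target_w):
    # Crop the contracting-path feature map to the size of the upsampled one
    # (needed because unpadded 3x3 convolutions shrink the maps).
    _, _, h, w = feat.shape
    top, left = (h - target_h) // 2, (w - target_w) // 2
    return feat[:, :, top:top + target_h, left:left + target_w]

class UpBlock(nn.Module):
    def __init__(self, in_ch):                       # e.g. in_ch = 1024
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, in_ch // 2, kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, in_ch // 2, 3), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch // 2, in_ch // 2, 3), nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)                                # 2x2 up-conv, channels halved
        skip = center_crop(skip, x.shape[2], x.shape[3])
        return self.conv(torch.cat([skip, x], dim=1)) # concat, then two 3x3 convs

# Toy usage: 1024-channel bottom map, 512-channel skip from the contracting path.
out = UpBlock(1024)(torch.randn(1, 1024, 28, 28), torch.randn(1, 512, 64, 64))
print(out.shape)   # (1, 512, 52, 52)
```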
Training
- energy function:
\[E = \sum_{x\in\Omega} w(x)\log\big(p_{l(x)}(x)\big)\]
- weight map (a weighted-loss sketch follows this list):
\[w(x) = w_c(x) + w_0 \cdot \exp\left(-\frac{(d_1(x)+d_2(x))^2}{2\sigma^2}\right)\]
- the weights of each layer are initialized from a Gaussian distribution with std \(\sqrt{2/N}\)
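A small sketch (my own illustration) of the per-pixel weighted cross-entropy that this energy function describes, with the weight map w(x) precomputed offline:

```python
import torch
import torch.nn.functional as F

def weighted_cross_entropy(logits, target, weight_map):
    # logits: (N, C, H, W); target: (N, H, W) integer labels; weight_map: (N, H, W).
    log_p = F.log_softmax(logits, dim=1)
    picked = log_p.gather(1, target.unsqueeze(1)).squeeze(1)  # log p_{l(x)}(x) per pixel
    return -(weight_map * picked).mean()                      # weighted cross-entropy

logits = torch.randn(2, 3, 8, 8, requires_grad=True)
target = torch.randint(0, 3, (2, 8, 8))
w = torch.ones(2, 8, 8)   # w_c(x) plus the border-separation term, computed offline
print(weighted_cross_entropy(logits, target, w))
```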
Experiments
Good results are achieved on both medical datasets.
Overall, the U-Net architecture is quite simple, and according to the authors it is well suited to small datasets: the first dataset, from the EM segmentation challenge, contains only 30 images of size 512×512.