[Papers] Semantic Segmentation Papers(1)
Tags: Paper
A summary of several semantic segmentation papers I have read: FCN, DeconvNet, SegNet, and U-Net. The DeepLab papers will be summarized later.
FCN
Abstract
Proposes an end-to-end FCN that takes an arbitrarily sized image as input and outputs a label map of the same size. The skip architecture in FCN combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations.
Introduction
Uses a supervised pretrained classification network for pixel-wise prediction.
The central difficulty of semantic segmentation is the inherent tension between semantic information and location information.
Related Work
FCN
As the pioneering work applying deep learning to segmentation, FCN laid the foundation for later networks such as U-Net, ENet, and SegNet, in particular the idea of using deconvolution to upsample the coarse map.
Adapting classifiers for dense prediction
A fully connected layer can be viewed as a special case of convolution whose kernel covers the entire feature map. After the final fully connected layers are rewritten this way, the network outputs a label map, and adding a spatial loss enables end-to-end dense learning.
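To make this concrete, here is a minimal PyTorch sketch (my own illustration, not the paper's code; layer sizes are assumptions) of turning a fully connected classifier head into convolutions so the network accepts arbitrary input sizes and emits a coarse score map:

```python
import torch
import torch.nn as nn

num_classes = 21

# A toy classifier head: a 512-channel 7x7 feature map flattened into an fc layer.
fc = nn.Linear(512 * 7 * 7, 4096)

# The same computation as a convolution over the whole feature map:
# a 7x7 convolution with 4096 outputs can reuse the fc weights directly.
conv = nn.Conv2d(512, 4096, kernel_size=7)
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(4096, 512, 7, 7))
    conv.bias.copy_(fc.bias)

score = nn.Conv2d(4096, num_classes, kernel_size=1)  # 1x1 scoring layer

x = torch.randn(1, 512, 14, 14)        # a feature map larger than at training time
coarse = score(conv(x))                # -> (1, 21, 8, 8) coarse score map
print(coarse.shape)
```

Because only convolutions remain, feeding a larger input simply yields a larger score map instead of a shape error.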
Shift-and-stitch is filter rarefaction
rarefaction: thinning out, i.e. making something sparser
à trous algorithm
FCN also mentions the atrous (dilated) convolution that DeepLab later builds on.
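For reference, a dilated (à trous) convolution is just a standard convolution with gaps between the kernel taps; a minimal sketch (channel sizes are arbitrary assumptions):

```python
import torch
import torch.nn as nn

# A 3x3 convolution with dilation=2 covers a 5x5 area without adding parameters
# or reducing resolution (padding=2 keeps the spatial size unchanged).
dilated = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)

x = torch.randn(1, 64, 32, 32)
print(dilated(x).shape)   # (1, 64, 32, 32)
```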
Upsampling is backwards strided convolution
In a sense, upsampling with factor \(f\) is convolution with a fractional input stride of \(1/f\). So long as \(f\) is integral, a natural way to upsample is therefore backwards convolution (sometimes called deconvolution) with an output stride of \(f\).
This passage is a bit hard to follow at first.
Explanations of deconvolution:
- https://datascience.stackexchange.com/questions/6107/what-are-deconvolutional-layers
- https://github.com/vdumoulin/conv_arithmetic
Transposed convolution with stride two and padding:
It is called a transposed convolution because this operation often appears in the backward pass: backpropagation can be carried out by multiplying with the transpose of the weight matrix. Why is the filter in the figure said to have stride 2 when it clearly slides with stride 1? The stride 2 refers to the corresponding original convolution (before transposition): because zeros are inserted between the input pixels, a stride of 1 here is equivalent to a stride of 2 in the original convolution.
Shown above is a transposed convolution. 'stride two' means stride in the corresponding original convolution is two. This is precisely why you have 1 (=2-1, 2 being the original stride) layer of zeros in between rows and columns. Transposed convolution is generally used in backward pass. It is called transposed because of the analogy with fully connected layer where you multiply with the transpose of the weight matrix during a backward pass.
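To make the "upsampling is backwards strided convolution" point concrete, here is a small PyTorch sketch (my own, not FCN's code) of upsampling by a factor f with a transposed convolution whose weights are initialized to bilinear interpolation, matching the bilinear initialization described in the FCN paper:

```python
import torch
import torch.nn as nn

def bilinear_kernel(channels, kernel_size):
    # Bilinear interpolation weights: one filter per channel, no cross-channel mixing.
    factor = (kernel_size + 1) // 2
    center = factor - 1 if kernel_size % 2 == 1 else factor - 0.5
    og = torch.arange(kernel_size, dtype=torch.float32)
    filt = 1 - torch.abs(og - center) / factor
    kernel2d = filt[:, None] * filt[None, :]
    weight = torch.zeros(channels, channels, kernel_size, kernel_size)
    for c in range(channels):
        weight[c, c] = kernel2d
    return weight

f = 2  # upsampling factor: the output stride of the backwards convolution
up = nn.ConvTranspose2d(21, 21, kernel_size=2 * f, stride=f, padding=f // 2, bias=False)
with torch.no_grad():
    up.weight.copy_(bilinear_kernel(21, 2 * f))

coarse = torch.randn(1, 21, 8, 8)
print(up(coarse).shape)   # (1, 21, 16, 16): upsampled by factor f
```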
Patchwise training is loss sampling
Segmentation Architecture
The author's fully convolutional network consists mainly of in-network upsampling and a pixelwise loss, plus a skip architecture.
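A minimal sketch of the skip idea (roughly the FCN-16s variant; layer names, channel counts, and spatial sizes are my own assumptions): the stride-32 score map is upsampled 2×, added to 1×1-conv scores from pool4, and the sum is upsampled 16× back to the input resolution.

```python
import torch
import torch.nn as nn

num_classes = 21
score_pool4 = nn.Conv2d(512, num_classes, kernel_size=1)   # scores from the shallow, fine layer
upscore2 = nn.ConvTranspose2d(num_classes, num_classes, kernel_size=4, stride=2, padding=1)
upscore16 = nn.ConvTranspose2d(num_classes, num_classes, kernel_size=32, stride=16, padding=8)

# Hypothetical maps for a 256x256 input: pool4 features at stride 16,
# final classifier scores at stride 32.
pool4 = torch.randn(1, 512, 16, 16)
score_conv7 = torch.randn(1, num_classes, 8, 8)

fused = upscore2(score_conv7) + score_pool4(pool4)   # combine coarse semantics + fine appearance
out = upscore16(fused)                               # back to input resolution
print(out.shape)                                     # (1, 21, 256, 256)
```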
Learning Deconvolution Network for Semantic Segmentation
Abstract
deep deconvolution + proposal-wise prediction
The deconvolution network is composed of deconvolution and unpooling (upsampling) layers.
1. Introduction
Most existing CNN-based semantic segmentation networks take the coarse label map produced by a classification network (16×16 in FCN) and upsample it with a bilinear-interpolation-style deconvolution. However, the input to this deconvolution is a feature map that has already gone through convolution and pooling and has lost much of its structured detail, so deconvolution alone often fails to give good results.
Some methods combine FCN with a Conditional Random Field (CRF) to address this.
2. Related Work
FCN:
Because FCN has a fixed-size receptive field, it cannot correctly label objects that are too small relative to the receptive field, and it tends to predict multiple classes for objects that are too large.
FCN+CRF
3. System Architecture
The encoder of the network is the VGG classification network, and the decoder is a deconvolution network that unpools and deconvolves the feature map produced by the classifier. The final output is a probability map giving, for each pixel, the probability of belonging to each class, from which the per-pixel label is obtained. It is worth pointing out early that DeconvNet keeps the fully connected layers of VGG; these layers contain a huge number of parameters, so the trained model takes up a lot of space. For application-level products, SegNet (discussed later) is the better choice: it removes the fully connected layers, so both training speed and memory footprint are much smaller.
Unpooling and Deconvolution
Unpooling
What is pooling?
Pooling in convolution network is designed to filter noisy activations in a lower layer by abstracting activations in a receptive field with a single representative value.
Although pooling makes the activations more robust, it also discards the spatial information inside the receptive field. This structural information can matter a lot for segmentation, which requires dense prediction.
How is unpooling implemented?
Record the locations of the maximum activations during pooling, and place the values back at those locations when unpooling.
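A tiny PyTorch sketch (my own illustration) of switched unpooling: max pooling records the argmax locations, and unpooling writes each value back to its recorded location, leaving zeros elsewhere.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 4, 4)

# Pool while recording where each maximum came from.
pooled, indices = F.max_pool2d(x, kernel_size=2, stride=2, return_indices=True)

# Unpool: write each value back to its recorded location, zeros everywhere else.
unpooled = F.max_unpool2d(pooled, indices, kernel_size=2, stride=2)
print(unpooled.shape)   # (1, 1, 4, 4), mostly zeros (sparse)
```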
Deconvolution
The output of unpooling is sparse; deconvolution turns it into an enlarged, dense activation map, and the border pixels of the enlarged map are then cropped so that it matches the size of the unpooled input.
Unpooling and deconvolution play different roles in the network: unpooling is example-specific while deconvolution is class-specific. Example-specific means that, for whatever object is present, unpooling reconstructs the object's structure from the location information recorded during pooling. But since every pixel must be classified, recovering the object structure alone is not enough: noise and activations from non-target classes remain around it. Deconvolution amplifies the activations belonging to the target class and suppresses those of the other classes. Combining the two, the deconvolution network on the decoder side can output a fairly accurate segmentation map.
In these two respects the decoders of DeconvNet and SegNet are quite similar: upsample to a sparse activation map, then use deconvolution/convolution to obtain a dense activation map.
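A minimal sketch of one decoder stage in this style (the class name and channel sizes are my own assumptions): unpool with the stored indices to get a sparse, enlarged map, then densify it with a deconvolution, batch norm, and ReLU.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderStage(nn.Module):
    # One decoder stage: unpool with stored indices, then densify the sparse map.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x, indices):
        x = F.max_unpool2d(x, indices, kernel_size=2, stride=2)  # sparse, 2x larger
        return F.relu(self.bn(self.deconv(x)))                   # dense activation map

# Usage: the indices come from the matching encoder max-pooling layer.
_, idx = F.max_pool2d(torch.randn(1, 64, 16, 16), 2, stride=2, return_indices=True)
feat = torch.randn(1, 64, 8, 8)
print(DecoderStage(64, 32)(feat, idx).shape)   # (1, 32, 16, 16)
```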
The visualization of activation maps in the paper also shows that the encoder gradually abstracts features (detail to coarse) while the decoder goes the other way (coarse to detail).
instance-wise segmentation vs. image-level segmentation
I did not fully understand this part.
Training
- Batch Normalization
- Two-stage Training
ensemble with FCN
The detailed network structure is given in the paper.
Inference
At inference time, before each image is fed into the network, the authors use edge-box to generate candidate proposals so that objects can be detected at different scales. For each test image, about 2000 candidates are generated, and the top 50 by objectness score are fed into the network. The instance-wise segmentation mentioned earlier should be related to this step; the paper does not explain it in much detail.
Overall, although the idea behind DeconvNet is fairly novel (I do not know whether SegNet borrowed from it), the network is clearly very deep and hard to train, it keeps the fully connected layers, and it needs edge-box to generate candidate proposals, so it is not an end-to-end network. For practical use I would still recommend SegNet.
SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
Abstract
The novelty of SegNet lies in the manner in which the decoder upsamples its lower resolution input feature map(s). Specifically, the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling. This eliminates the need for learning to upsample.
1. Introduction
Reusing the encoder's max-pooling indices on the decoder side:
- improves boundary delineation
- reduces the number of parameters enabling end-to-end training
Architecture
without fully connected layers (parameters reduced from 134M to 14.7M)
encoder
conv + batch norm + ReLU + max pooling (2×2)
To retain the spatial information that would otherwise be lost during max pooling, SegNet chooses to store the max-pooling indices.
decoder
The decoder upsamples its input feature maps using the stored max-pooling indices, which yields sparse feature maps; trainable filter banks (convolutions) followed by batch norm then densify them.
Several decoder variants are compared.
Training
- median frequency balancing (a small class-weight sketch follows this list)
- natural frequency balancing
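A small sketch of how median frequency balancing could be computed (my own illustration, not the SegNet code): the weight of class c is median(freq)/freq(c), where freq(c) is the pixel frequency of c over the images in which c appears, so rare classes get weights larger than 1.

```python
import numpy as np

def median_frequency_weights(label_maps, num_classes):
    # freq(c) = pixels of class c / total pixels in images where c appears;
    # weight(c) = median(freq) / freq(c), so rare classes get weights > 1.
    class_pixels = np.zeros(num_classes)
    image_pixels = np.zeros(num_classes)
    for lbl in label_maps:                      # each lbl is an HxW integer array
        for c in range(num_classes):
            n = np.sum(lbl == c)
            if n > 0:
                class_pixels[c] += n
                image_pixels[c] += lbl.size
    freq = class_pixels / np.maximum(image_pixels, 1)
    return np.median(freq[freq > 0]) / np.maximum(freq, 1e-12)

# Toy usage: two 4x4 label maps, 3 classes.
labels = [np.random.randint(0, 3, (4, 4)) for _ in range(2)]
print(median_frequency_weights(labels, 3))
```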
Analysis
BF: boundary F1 measure
SegNet and DeconvNet are similar in that both save the max-pooling indices on the encoder side and use them to upsample on the decoder side. The feature map obtained this way is still sparse, so convolutional/deconvolutional layers follow to produce a better, dense feature map. The difference is that SegNet has no fully connected layers, which makes it a much lighter framework.
U-Net
Abstract
- use data augmentation to train the model
- contracting path to capture context
- symmetric expanding path enables precise localization
Introduction
- High resolution features from the contracting path are combined with the upsampled output
- overlap-tile strategy (I did not fully understand this part)
- elastic deformation for augmentation
- a weighted loss is used to handle the touching-border problem between objects
Network Architecture
The left side is the contracting path and the right side is the expansive path.
The contracting path uses 3×3 convolutions + ReLU + 2×2 max pooling; the number of feature channels doubles after each pooling step.
The expansive path uses upsampling + a 2×2 convolution (which halves the number of feature channels) + concatenation with the corresponding feature map from the contracting path + 3×3 convolutions + ReLU.
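A minimal sketch of one expansive-path step (the class name and the center-crop helper are my own; channel and spatial sizes follow the paper's figure): a 2×2 up-convolution halves the channels, the corresponding contracting-path map is center-cropped and concatenated, and two unpadded 3×3 convolutions follow.

```python
import torch
import torch.nn as nn

def center_crop(feat, target_h, target_w):
    # Crop the contracting-path feature map to the size of the upsampled one
    # (needed because unpadded 3x3 convolutions shrink the maps).
    _, _, h, w = feat.shape
    top, left = (h - target_h) // 2, (w - target_w) // 2
    return feat[:, :, top:top + target_h, left:left + target_w]

class UpBlock(nn.Module):
    def __init__(self, in_ch):                       # e.g. in_ch = 1024
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, in_ch // 2, kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, in_ch // 2, 3), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch // 2, in_ch // 2, 3), nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)                                # 2x2 up-conv, channels halved
        skip = center_crop(skip, x.shape[2], x.shape[3])
        return self.conv(torch.cat([skip, x], dim=1)) # concat, then two 3x3 convs

# Toy usage: 1024-channel bottom map, 512-channel skip from the contracting path.
out = UpBlock(1024)(torch.randn(1, 1024, 28, 28), torch.randn(1, 512, 64, 64))
print(out.shape)   # (1, 512, 52, 52)
```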
Training
- energy function:
\[E = \sum_{x\in\Omega} w(x)\log\big(p_{l(x)}(x)\big)\]
- weight map (a weighted-loss sketch follows this list):
\[w(x) = w_c(x) + w_0 \cdot \exp\left(-\frac{(d_1(x)+d_2(x))^2}{2\sigma^2}\right)\]
- the weights of each layer are initialized from a Gaussian distribution with std \(\sqrt{2/N}\)
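A small sketch (my own illustration) of the per-pixel weighted cross-entropy that this energy function describes, with the weight map w(x) precomputed offline:

```python
import torch
import torch.nn.functional as F

def weighted_cross_entropy(logits, target, weight_map):
    # logits: (N, C, H, W); target: (N, H, W) integer labels; weight_map: (N, H, W).
    log_p = F.log_softmax(logits, dim=1)
    picked = log_p.gather(1, target.unsqueeze(1)).squeeze(1)  # log p_{l(x)}(x) per pixel
    return -(weight_map * picked).mean()                      # weighted cross-entropy

logits = torch.randn(2, 3, 8, 8, requires_grad=True)
target = torch.randint(0, 3, (2, 8, 8))
w = torch.ones(2, 8, 8)   # w_c(x) plus the border-separation term, computed offline
print(weighted_cross_entropy(logits, target, w))
```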
Experiments
Good results are achieved on both medical datasets.
Overall, the U-Net architecture is quite simple, and according to the authors it is well suited to small datasets: the first dataset, from the EM segmentation challenge, contains only 30 images of size 512×512.