Two-Stage Object Detection Networks: Mask RCNN Explained

Difference between ROI Pooling and ROI Align
Mask R-CNN network architecture
Backbone network: FPN
Anchor generation rules
Experiments
References

Mask RCNN was proposed by Kaiming He et al. in a paper first released in 2017 and presented at ICCV 2017.

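Difference between ROI Pooling and ROI Align

RoI Pooling (from Fast/Faster RCNN) quantizes twice: the floating-point RoI coordinates are first rounded to the feature-map grid, and the RoI is then divided into bins whose boundaries are rounded to whole cells again before each bin is max-pooled. These roundings misalign the pooled features with the actual region. That matters little for classification but hurts pixel-accurate masks. RoI Align removes both quantizations: it keeps the exact RoI coordinates, computes feature values at a few regularly sampled points in each bin by bilinear interpolation, and aggregates them with max or average pooling. For a more detailed walkthrough, see:
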
Understanding Region of Interest — (RoI Align and RoI Warp)
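To make the difference between the two operations concrete, here is a minimal single-channel NumPy sketch. It is not taken from any library, and the function names are hypothetical. It assumes the box is (y1, x1, y2, x2) in feature-map units, that feature cell k spans [k, k+1) with its value located at the centre k + 0.5, a 2 × 2 output, and 2 × 2 sample points per bin averaged for RoI Align. All of these are illustrative choices.

```python
import numpy as np


def bilinear(feat, y, x):
    """Bilinearly interpolate a 2-D array at the real-valued index (y, x)."""
    h, w = feat.shape
    y = min(max(y, 0.0), h - 1.0)  # clamp to the valid index range
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * feat[y0, x0] + (1 - dy) * dx * feat[y0, x1]
            + dy * (1 - dx) * feat[y1, x0] + dy * dx * feat[y1, x1])


def roi_pool(feat, box, out=2):
    """Simplified RoI Pooling: two quantization steps, then max per bin."""
    y1, x1, y2, x2 = (int(round(v)) for v in box)  # quantization 1: box corners
    h, w = max(y2 - y1, 1), max(x2 - x1, 1)
    res = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            # quantization 2: bin edges snapped to whole cells
            ys, ye = y1 + int(np.floor(i * h / out)), y1 + int(np.ceil((i + 1) * h / out))
            xs, xe = x1 + int(np.floor(j * w / out)), x1 + int(np.ceil((j + 1) * w / out))
            res[i, j] = feat[ys:ye, xs:xe].max()
    return res


def roi_align(feat, box, out=2, samples=2):
    """Simplified RoI Align: no rounding, bilinear samples averaged per bin."""
    y1, x1, y2, x2 = box
    bin_h, bin_w = (y2 - y1) / out, (x2 - x1) / out
    res = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            vals = []
            for sy in range(samples):
                for sx in range(samples):
                    # regularly spaced sample points inside bin (i, j)
                    y = y1 + (i + (sy + 0.5) / samples) * bin_h
                    x = x1 + (j + (sx + 0.5) / samples) * bin_w
                    # shift by 0.5 because cell k's value sits at k + 0.5
                    vals.append(bilinear(feat, y - 0.5, x - 0.5))
            res[i, j] = np.mean(vals)
    return res
```

With feat = np.arange(16, dtype=float).reshape(4, 4) and the box (0.6, 0.6, 2.4, 2.4), roi_pool rounds the box down to the single cell (1, 1) and returns 5 in every bin. roi_align returns [[2.75, 3.65], [6.35, 7.25]], which still varies across the region the box actually covers.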

Mask R-CNN Network Architecture

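Mask RCNN builds on Faster RCNN, with three main changes:

Features are extracted with FPN, a multi-scale feature pyramid network.
ROI Pooling is replaced by ROI Align.
A mask segmentation branch with an FCN structure is added to the second stage. It runs in parallel with the existing classification and bounding-box regression branches on each RoI proposed by the RPN.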

The network architecture is shown in the following figure:

[Figure: Mask R-CNN network architecture]

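Mask RCNN follows a detect-then-segment approach: it first detects objects, then predicts a mask inside each detected box. The design is simple and direct. Because each mask only has to separate one object from its background, the problem is also arguably easier for the network to learn.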
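The detect-then-segment order is easiest to see at inference time. The paper resizes each detection's m × m mask output to the RoI size and binarizes it at a threshold of 0.5, using only the mask of the class predicted by the classification branch. The sketch below shows that step under several assumptions. paste_mask is a hypothetical helper, the (m, m, K) layout of the mask probabilities is assumed, and nearest-neighbour resizing stands in for the bilinear resizing most implementations use.

```python
import numpy as np


def paste_mask(mask_probs, class_id, box, image_shape, threshold=0.5):
    """Hypothetical helper: turn one detection's m x m mask into a
    full-image boolean mask.

    mask_probs : (m, m, K) per-class sigmoid probabilities from the mask head
    class_id   : class predicted by the box/classification branch
    box        : (y1, x1, y2, x2) integer pixel coordinates of the detection
    """
    m = mask_probs[:, :, class_id]  # only the predicted class's mask is used
    y1, x1, y2, x2 = box
    h, w = y2 - y1, x2 - x1
    # Nearest-neighbour resize from m x m to h x w (bilinear is more common).
    rows = np.arange(h) * m.shape[0] // h
    cols = np.arange(w) * m.shape[1] // w
    resized = m[rows[:, None], cols[None, :]]
    full = np.zeros(image_shape, dtype=bool)
    full[y1:y2, x1:x2] = resized >= threshold  # binarize at 0.5
    return full
```

Only pixels inside the detected box can ever be set. The segmentation quality therefore depends on the detection step coming first.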

Backbone Network: FPN

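An important property of convolutional networks is that deep layers respond mainly to semantic features, while shallow layers respond mainly to low-level image features such as edges and textures. FPN combines the two through a top-down pathway with lateral connections. Mask RCNN uses ResNet combined with FPN as its feature extractor.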
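ResNet's stages produce C2 to C5 at strides 4, 8, 16 and 32. The FPN top-down path adds a 2× upsampled coarser map to the next finer one, so neighbouring levels must differ in size by exactly a factor of two. The sketch below traces side lengths through the resnet_graph code for a square input. It assumes every stride-2 step rounds up, which is what the 7 × 7 conv after 3-pixel zero padding, the "same"-padded max-pool and the stride-2 1 × 1 convs produce. Both function names are hypothetical.

```python
import math


def backbone_sides(n):
    """Side lengths of C2..C5 for an n x n input (stride-2 steps round up)."""
    s = math.ceil(n / 2)   # conv1: 7x7, stride 2, after 3-pixel zero padding
    s = math.ceil(s / 2)   # 3x3 max-pool, stride 2, padding "same"
    sides = {"C2": s}      # stage 2 uses strides=(1, 1)
    for name in ("C3", "C4", "C5"):
        s = math.ceil(s / 2)  # first conv_block of each stage has stride 2
        sides[name] = s
    return sides


def topdown_adds_match(n):
    """True if every 2x-upsampled coarse map matches the next finer level."""
    c = backbone_sides(n)
    return all(2 * c[coarse] == c[fine]
               for fine, coarse in (("C4", "C5"), ("C3", "C4"), ("C2", "C3")))
```

For a 1024 × 1024 input this gives C2 to C5 sides of 256, 128, 64 and 32, and every Add lines up. For 1000 × 1000, C4 is 63 while the upsampled C5 is 64, so the top-down Add would fail. A side length that is a multiple of 32 keeps every halving exact.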

The FPN code is in ./mrcnn/model.py of the Matterport Mask_RCNN repository (a Keras implementation). The core part is:

# Excerpt from MaskRCNN.build() in mrcnn/model.py; KL is keras.layers.
# Bottom-up pathway: ResNet stages C2..C5 (the C1 output is discarded).
if callable(config.BACKBONE):
    _, C2, C3, C4, C5 = config.BACKBONE(input_image, stage5=True,
                                        train_bn=config.TRAIN_BN)
else:
    _, C2, C3, C4, C5 = resnet_graph(input_image, config.BACKBONE,
                                     stage5=True, train_bn=config.TRAIN_BN)

# Top-down Layers
# TODO: add assert to verify feature map sizes match what's in config
P5 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c5p5')(C5)
# Each level: upsample the coarser P by 2 and add a 1x1 lateral conv of C.
P4 = KL.Add(name="fpn_p4add")([
    KL.UpSampling2D(size=(2, 2), name="fpn_p5upsampled")(P5),
    KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c4p4')(C4)])
P3 = KL.Add(name="fpn_p3add")([
    KL.UpSampling2D(size=(2, 2), name="fpn_p4upsampled")(P4),
    KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c3p3')(C3)])
P2 = KL.Add(name="fpn_p2add")([
    KL.UpSampling2D(size=(2, 2), name="fpn_p3upsampled")(P3),
    KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c2p2')(C2)])

# Attach 3x3 conv to all P layers to get the final feature maps.
P2 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p2")(P2)
P3 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p3")(P3)
P4 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p4")(P4)
P5 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p5")(P5)
# P6 is used for the 5th anchor scale in RPN. Generated by
# subsampling from P5 with stride of 2.
P6 = KL.MaxPooling2D(pool_size=(1, 1), strides=2, name="fpn_p6")(P5)

# Note that P6 is used in RPN, but not in the classifier heads.
rpn_feature_maps = [P2, P3, P4, P5, P6]
mrcnn_feature_maps = [P2, P3, P4, P5]
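In the code above, P6 feeds only the RPN, and each RoI for the heads is pooled from exactly one of P2 to P5. The FPN paper picks that level from the RoI's size in input pixels, using k = ⌊k0 + log2(√(wh) / 224)⌋ with k0 = 4, clamped to the available levels. A minimal sketch of that rule follows. The function name is hypothetical, and implementations differ in rounding details.

```python
import math


def fpn_level_for_roi(h, w, k0=4, canonical=224, k_min=2, k_max=5):
    """Pick the pyramid level P_k that an h x w RoI (input pixels) is
    pooled from, following the FPN paper's assignment rule."""
    k = math.floor(k0 + math.log2(math.sqrt(h * w) / canonical))
    return max(k_min, min(k_max, k))  # only P2..P5 feed the heads
```

A 224 × 224 RoI maps to P4, halving its side moves it one level finer (112 × 112 goes to P3), and very small or very large RoIs are clamped to P2 or P5.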

The resnet_graph function called above is defined as follows:

def resnet_graph(input_image, architecture, stage5=False, train_bn=True):
    """Build a ResNet graph.
        architecture: Can be resnet50 or resnet101
        stage5: Boolean. If False, stage5 of the network is not created
        train_bn: Boolean. Train or freeze Batch Norm layers
    """
    # conv_block, identity_block and BatchNorm are helpers defined
    # elsewhere in mrcnn/model.py; KL is keras.layers.
    assert architecture in ["resnet50", "resnet101"]

    # Stage 1
    x = KL.ZeroPadding2D((3, 3))(input_image)
    x = KL.Conv2D(64, (7, 7), strides=(2, 2), name='conv1', use_bias=True)(x)
    x = BatchNorm(name='bn_conv1')(x, training=train_bn)
    x = KL.Activation('relu')(x)
    C1 = x = KL.MaxPooling2D((3, 3), strides=(2, 2), padding="same")(x)

    # Stage 2
    x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1), train_bn=train_bn)
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='b', train_bn=train_bn)
    C2 = x = identity_block(x, 3, [64, 64, 256], stage=2, block='c', train_bn=train_bn)

    # Stage 3
    x = conv_block(x, 3, [128, 128, 512], stage=3, block='a', train_bn=train_bn)
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='b', train_bn=train_bn)
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='c', train_bn=train_bn)
    C3 = x = identity_block(x, 3, [128, 128, 512], stage=3, block='d', train_bn=train_bn)

    # Stage 4 (the only stage whose depth differs between resnet50 and resnet101)
    x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a', train_bn=train_bn)
    block_count = {"resnet50": 5, "resnet101": 22}[architecture]
    for i in range(block_count):
        x = identity_block(x, 3, [256, 256, 1024], stage=4, block=chr(98 + i), train_bn=train_bn)
    C4 = x

    # Stage 5
    if stage5:
        x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a', train_bn=train_bn)
        x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b', train_bn=train_bn)
        C5 = x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c', train_bn=train_bn)
    else:
        C5 = None
    return [C1, C2, C3, C4, C5]
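The two architectures differ only in block_count for stage 4. A quick count shows where the names come from. Stages 2 to 5 have 3, 4, 6 and 3 bottleneck blocks for resnet50 (3, 4, 23 and 3 for resnet101), and each block has three conv layers. Adding conv1 and the final classification layer of the original ImageNet network gives 50 and 101. The helper below is hypothetical and only does that arithmetic.

```python
def resnet_depth(architecture):
    """Count weighted layers implied by resnet_graph's block layout."""
    block_count = {"resnet50": 5, "resnet101": 22}[architecture]
    # Bottleneck blocks per stage 2..5: one conv_block plus identity_blocks.
    blocks_per_stage = [1 + 2, 1 + 3, 1 + block_count, 1 + 2]
    # 3 convs per bottleneck block, + conv1, + the ImageNet classifier FC layer
    # (the FC layer is counted in the name but is not built by resnet_graph).
    return 3 * sum(blocks_per_stage) + 2


# Stage-4 identity blocks are named with chr(98 + i): 'b', 'c', ...
stage4_names = [chr(98 + i) for i in range(22)]
```

For resnet101, the stage-4 identity blocks therefore run from 'b' to 'w'.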

Anchor Generation Rules

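In Faster RCNN, SCALE can be set to several values on its single feature map (the original Faster R-CNN uses 3 scales × 3 aspect ratios, i.e. 9 anchors per position). In Mask RCNN, each feature level corresponds to only one SCALE, i.e. the 16 set above. The range of object sizes is covered by the different pyramid levels instead.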
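The following NumPy sketch illustrates per-level anchor generation: one scale per pyramid level, several aspect ratios, and one set of anchors centred on every feature-map cell. The scale/stride pairs (32 to 512 with strides 4 to 64) and the ratios (0.5, 1, 2) are assumed values that mirror common Matterport defaults. The helper names are hypothetical, and centres are placed at cell index × stride without a half-stride offset.

```python
import numpy as np


def anchors_for_level(scale, ratios, feat_shape, feature_stride):
    """(y1, x1, y2, x2) anchors for one pyramid level, in input-image pixels."""
    ratios = np.asarray(ratios, dtype=np.float64)
    # ratio = width / height; the area stays scale**2 for every ratio
    heights = scale / np.sqrt(ratios)
    widths = scale * np.sqrt(ratios)
    # one centre per feature-map cell: cell index * stride
    cy, cx = np.meshgrid(np.arange(feat_shape[0]) * feature_stride,
                         np.arange(feat_shape[1]) * feature_stride, indexing="ij")
    cy, cx = cy.reshape(-1, 1), cx.reshape(-1, 1)  # (H*W, 1) against (R,) sizes
    boxes = np.stack([cy - 0.5 * heights, cx - 0.5 * widths,
                      cy + 0.5 * heights, cx + 0.5 * widths], axis=-1)
    return boxes.reshape(-1, 4)  # row = cell * len(ratios) + ratio index


def all_anchors(image_side=1024, scales=(32, 64, 128, 256, 512),
                strides=(4, 8, 16, 32, 64), ratios=(0.5, 1, 2)):
    """Concatenate anchors for P2..P6, one scale per level."""
    per_level = [anchors_for_level(s, ratios, (image_side // st, image_side // st), st)
                 for s, st in zip(scales, strides)]
    return np.concatenate(per_level, axis=0)
```

With these assumed settings, a 1024 × 1024 input gets 3 × (256² + 128² + 64² + 32² + 16²) = 261,888 anchors. Small anchors sit on the fine P2 grid, and large ones on the coarse P6 grid.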

Experiments

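Kaiming He and his co-authors ran many ablation experiments in the paper, comparing one module at a time, and published the results as tables.

[Table: ablation results from the Mask R-CNN paper]

The tables show that:

Sigmoid vs. softmax: predicting an independent binary mask per class with a per-pixel sigmoid gives a sizable improvement over a per-pixel softmax across classes.
Backbone choice: deeper networks and FPN both perform better, possibly because FPN combines information from feature maps of different resolutions and can therefore capture finer details.
RoI Align vs. RoI Pooling: RoI Align gives a clear improvement on both instance segmentation and object detection. RoIAlign is in effect a more precise RoIPooling without coordinate quantization, so using it in Faster RCNN should also help.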
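The sigmoid-versus-softmax comparison concerns how the mask loss treats classes. The mask head outputs one m × m map per class. For an RoI whose ground-truth class is k, only channel k is trained, using a per-pixel sigmoid and binary cross-entropy, so the classes never compete. The following NumPy sketch shows that loss for a single RoI. The function name and the (K, m, m) layout are assumptions.

```python
import numpy as np


def mask_loss(mask_logits, gt_mask, gt_class):
    """Per-RoI mask loss with independent per-class sigmoids.

    mask_logits : (K, m, m) raw mask-head outputs, one channel per class
    gt_mask     : (m, m) binary target for this RoI
    gt_class    : index of the RoI's ground-truth class
    """
    z = mask_logits[gt_class]  # other classes' channels are ignored
    # numerically stable binary cross-entropy on logits:
    # max(z, 0) - z * y + log(1 + exp(-|z|))
    loss = np.maximum(z, 0) - z * gt_mask + np.log1p(np.exp(-np.abs(z)))
    return loss.mean()  # average over the m x m pixels
```

Changing the logits of any other class leaves this loss unchanged. With a per-pixel softmax, every class channel would enter the normalization and the masks would compete.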

References

He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. ICCV 2017.
