Understanding the proposal_layer.py Layer

The proposal_layer uses the trained RPN to generate region proposals for Fast R-CNN.

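The whole proposal_layer process:
1. Generate all anchors and apply the 4 predicted coordinate transformations to turn them into proposals. As before, anchors are generated by sliding over every position of the last feature map; multiplying the positions by 16 (the feat_stride) maps them onto the original image, which gives the initial region proposals. Bounding-box regression then transforms each of them into a better region proposal, and these are the final proposal coordinates.
2. Handle proposals whose coordinates extend past the image boundary. The code calls clip_boxes, so these boxes are clipped to the image rather than discarded.
3. Discard every proposal whose width or height is smaller than min_size.
4. Sort all proposals by score, from highest to lowest.
5. Keep the pre_nms_topN highest-scoring proposals. This is the selection made before NMS.
6. Apply NMS.
7. Keep the post_nms_topN highest-scoring proposals. This is the selection made after NMS.
The result is the set of region proposals passed to the Fast R-CNN network.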
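Steps 2 to 7 can be sketched in plain NumPy. This is only an illustration under stated assumptions: `greedy_nms` and `select_proposals` are hypothetical helper names, the NMS here is a simple greedy version rather than the `nms` wrapper the layer imports, and both top-N values are assumed to be positive.

```python
import numpy as np

def greedy_nms(boxes, scores, thresh):
    """Greedy NMS on (N, 4) boxes given as x1, y1, x2, y2; returns kept indices."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # Intersection of the current best box with every remaining box
        w = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]) + 1)
        h = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]) + 1)
        inter = w * h
        iou = inter / (areas[i] + areas[rest] - inter)
        # Drop every remaining box that overlaps the best box by more than thresh
        order = rest[iou <= thresh]
    return np.array(keep, dtype=np.int64)

def select_proposals(proposals, scores, im_h, im_w, min_size,
                     pre_nms_top_n, post_nms_top_n, nms_thresh):
    proposals = proposals.astype(np.float64).copy()
    # Step 2: clip boxes to the image (boxes are kept, only their coordinates change)
    proposals[:, 0::2] = np.clip(proposals[:, 0::2], 0, im_w - 1)
    proposals[:, 1::2] = np.clip(proposals[:, 1::2], 0, im_h - 1)
    # Step 3: remove boxes with width or height below min_size
    ws = proposals[:, 2] - proposals[:, 0] + 1
    hs = proposals[:, 3] - proposals[:, 1] + 1
    keep = np.where((ws >= min_size) & (hs >= min_size))[0]
    proposals, scores = proposals[keep], scores[keep]
    # Steps 4-5: sort by score and keep the best pre_nms_top_n
    order = scores.argsort()[::-1][:pre_nms_top_n]
    proposals, scores = proposals[order], scores[order]
    # Steps 6-7: NMS, then keep the best post_nms_top_n
    keep = greedy_nms(proposals, scores, nms_thresh)[:post_nms_top_n]
    return proposals[keep], scores[keep]

# Toy input: five boxes on a 100 x 100 image
boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [-5, -5, 3, 3],
                  [60, 60, 100, 120], [20, 20, 22, 22]])
box_scores = np.array([0.9, 0.8, 0.7, 0.6, 0.95])
rois, kept_scores = select_proposals(boxes, box_scores, 100, 100, min_size=16,
                                     pre_nms_top_n=6000, post_nms_top_n=300,
                                     nms_thresh=0.7)
```

In this toy input the two tiny boxes are removed by the size filter, the second box is suppressed by NMS because it overlaps the first too strongly, and the box that sticks out of the image survives in clipped form.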

# --------------------------------------------------------

# Faster R-CNN

# Copyright (c) 2015 Microsoft

# Licensed under The MIT License [see LICENSE for details]

# Written by Ross Girshick and Sean Bell

# --------------------------------------------------------

import caffe

import numpy as np

import yaml

from fast_rcnn.config import cfg

from generate_anchors import generate_anchors

from fast_rcnn.bbox_transform import bbox_transform_inv, clip_boxes

from fast_rcnn.nms_wrapper import nms

DEBUG = False

class ProposalLayer(caffe.Layer):

    """

    Outputs object detection proposals by applying estimated bounding-box

    transformations to a set of regular boxes (called "anchors").

    """

    def setup(self, bottom, top):

        # parse the layer parameter string, which must be valid YAML

        layer_params = yaml.load(self.param_str_)

        self._feat_stride = layer_params['feat_stride']

        anchor_scales = layer_params.get('scales', (8, 16, 32))

        self._anchors = generate_anchors(scales=np.array(anchor_scales))

        self._num_anchors = self._anchors.shape[0]

        if DEBUG:

            print 'feat_stride: {}'.format(self._feat_stride)

            print 'anchors:'

            print self._anchors

        # rois blob: holds R regions of interest, each is a 5-tuple

        # (n, x1, y1, x2, y2) specifying an image batch index n and a

        # rectangle (x1, y1, x2, y2)

        top[0].reshape(1, 5)

        # scores blob: holds scores for R regions of interest

        if len(top) > 1:

            top[1].reshape(1, 1, 1, 1)

    def forward(self, bottom, top):

        # Algorithm:

        #

        # for each (H, W) location i

        #   generate A anchor boxes centered on cell i

        #   apply predicted bbox deltas at cell i to each of the A anchors

        # clip predicted boxes to image

        # remove predicted boxes with either height or width < threshold

        # sort all (proposal, score) pairs by score from highest to lowest

        # take top pre_nms_topN proposals before NMS

        # apply NMS with threshold 0.7 to remaining proposals

        # take after_nms_topN proposals after NMS

        # return the top proposals (-> RoIs top, scores top)

        assert bottom[0].data.shape[0] == 1, \

            'Only single item batches are supported'

        cfg_key = str(self.phase) # either 'TRAIN' or 'TEST'

        pre_nms_topN  = cfg[cfg_key].RPN_PRE_NMS_TOP_N   # number of top-scoring proposals kept before NMS

        post_nms_topN = cfg[cfg_key].RPN_POST_NMS_TOP_N  # number of top-scoring proposals kept after NMS

        nms_thresh    = cfg[cfg_key].RPN_NMS_THRESH

        min_size      = cfg[cfg_key].RPN_MIN_SIZE

        # the first set of _num_anchors channels are bg probs

        # the second set are the fg probs, which we want

        scores = bottom[0].data[:, self._num_anchors:, :, :]

        bbox_deltas = bottom[1].data  # the 4 regression deltas per anchor predicted by the trained RPN

        im_info = bottom[2].data[0, :]

        if DEBUG:

            print 'im_size: ({}, {})'.format(im_info[0], im_info[1])

            print 'scale: {}'.format(im_info[2])

        # 1. Generate proposals from bbox deltas and shifted anchors

        height, width = scores.shape[-2:]  # as in anchor_target_layer, read the height and width of the last feature map from the RPN score map

        if DEBUG:

            print 'score map size: {}'.format(scores.shape)

        # Enumerate all shifts

        shift_x = np.arange(0, width) * self._feat_stride

        shift_y = np.arange(0, height) * self._feat_stride

        shift_x, shift_y = np.meshgrid(shift_x, shift_y)

        shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),

                            shift_x.ravel(), shift_y.ravel())).transpose()

        # Enumerate all shifted anchors:

        #

        # add A anchors (1, A, 4) to

        # cell K shifts (K, 1, 4) to get

        # shift anchors (K, A, 4)

        # reshape to (K*A, 4) shifted anchors

        A = self._num_anchors

        K = shifts.shape[0]

        anchors = self._anchors.reshape((1, A, 4)) + \

                  shifts.reshape((1, K, 4)).transpose((1, 0, 2))

        anchors = anchors.reshape((K * A, 4))  # as in anchor_target_layer: all anchor coordinates, shape (K*A, 4)

        # Transpose and reshape predicted bbox transformations to get them

        # into the same order as the anchors:

        #

        # bbox deltas will be (1, 4 * A, H, W) format

        # transpose to (1, H, W, 4 * A)

        # reshape to (1 * H * W * A, 4) where rows are ordered by (h, w, a)

        # in slowest to fastest order

        bbox_deltas = bbox_deltas.transpose((0, 2, 3, 1)).reshape((-1, 4))  # give bbox_deltas the same (K*A, 4) shape as anchors for the step below

        # Same story for the scores:

        #

        # scores are (1, A, H, W) format

        # transpose to (1, H, W, A)

        # reshape to (1 * H * W * A, 1) where rows are ordered by (h, w, a)

        scores = scores.transpose((0, 2, 3, 1)).reshape((-1, 1))  # scores become (K*A, 1): one column, rows in the same order as anchors

        # Convert anchors into proposals via bbox transformations

        proposals = bbox_transform_inv(anchors, bbox_deltas)  # apply bbox_deltas to the anchors to obtain proposals

        # 2. clip predicted boxes to image

        proposals = clip_boxes(proposals, im_info[:2])

        # 3. remove predicted boxes with either height or width < threshold

        # (NOTE: convert min_size to input image scale stored in im_info[2])

        keep = _filter_boxes(proposals, min_size * im_info[2])

        proposals = proposals[keep, :]

        scores = scores[keep]

        # 4. sort all (proposal, score) pairs by score from highest to lowest

        # 5. take top pre_nms_topN (e.g. 6000)

        order = scores.ravel().argsort()[::-1]

        if pre_nms_topN > 0:

            order = order[:pre_nms_topN]

        proposals = proposals[order, :]

        scores = scores[order]

        # 6. apply nms (e.g. threshold = 0.7)

        # 7. take after_nms_topN (e.g. 300)

        # 8. return the top proposals (-> RoIs top)

        keep = nms(np.hstack((proposals, scores)), nms_thresh)

        if post_nms_topN > 0:

            keep = keep[:post_nms_topN]

        proposals = proposals[keep, :]

        scores = scores[keep]

        # Output rois blob

        # Our RPN implementation only supports a single input image, so all

        # batch inds are 0

        batch_inds = np.zeros((proposals.shape[0], 1), dtype=np.float32)

        blob = np.hstack((batch_inds, proposals.astype(np.float32, copy=False)))

        top[0].reshape(*(blob.shape))

        top[0].data[...] = blob

        # [Optional] output scores blob

        if len(top) > 1:

            top[1].reshape(*(scores.shape))

            top[1].data[...] = scores

    def backward(self, top, propagate_down, bottom):

        """This layer does not propagate gradients."""

        pass

    def reshape(self, bottom, top):

        """Reshaping happens during the call to forward."""

        pass

def _filter_boxes(boxes, min_size):

    """Remove all boxes with any side smaller than min_size."""

    ws = boxes[:, 2] - boxes[:, 0] + 1

    hs = boxes[:, 3] - boxes[:, 1] + 1

    keep = np.where((ws >= min_size) & (hs >= min_size))[0]

    return keep
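
The anchor enumeration in forward (shifts plus broadcasting) is easier to follow on a tiny grid. The sketch below reuses the same NumPy calls on a 2 x 3 feature map. The two base anchors are made-up values for illustration only; the real layer gets its base anchors from generate_anchors.

```python
import numpy as np

feat_stride = 16
# Hypothetical base anchors (A = 2), chosen only to keep the output small
base_anchors = np.array([[-8, -8, 23, 23],
                         [-24, -24, 39, 39]])
height, width = 2, 3  # size of the feature map

# One (x, y, x, y) shift per feature-map cell, in image pixels
shift_x = np.arange(0, width) * feat_stride
shift_y = np.arange(0, height) * feat_stride
shift_x, shift_y = np.meshgrid(shift_x, shift_y)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                    shift_x.ravel(), shift_y.ravel())).transpose()

A = base_anchors.shape[0]
K = shifts.shape[0]
# (1, A, 4) + (K, 1, 4) broadcasts to (K, A, 4): every anchor at every cell
anchors = (base_anchors.reshape((1, A, 4)) +
           shifts.reshape((K, 1, 4))).reshape((K * A, 4))
```

Each cell contributes A consecutive rows, so the anchor with index a at cell (h, w) sits in row (h * width + w) * A + a.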

This is the prototxt for this layer:

layer {

  name: 'proposal'

  type: 'Python'

  bottom: 'rpn_cls_prob_reshape'

  bottom: 'rpn_bbox_pred'

  bottom: 'im_info'

  top: 'rois'

  top: 'scores'

  python_param {

    module: 'rpn.proposal_layer'

    layer: 'ProposalLayer'

    param_str: "'feat_stride': 16"

  }

}

As you can see, bottom[1] is rpn_bbox_pred.

So bbox_deltas = bottom[1].data in the code above holds the 4 learned coordinate deltas. Training the RPN learns these 4 deltas, not the 4 box coordinates directly.
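To show how 4 deltas turn an anchor into a proposal, here is a minimal sketch that assumes the usual Faster R-CNN parameterization (center shifts scaled by anchor size, log-space width and height). `decode_deltas` is a hypothetical name; the layer itself calls bbox_transform_inv from fast_rcnn.bbox_transform, which may differ in detail.

```python
import numpy as np

def decode_deltas(anchors, deltas):
    """Apply (dx, dy, dw, dh) deltas to (x1, y1, x2, y2) anchors."""
    widths = anchors[:, 2] - anchors[:, 0] + 1.0
    heights = anchors[:, 3] - anchors[:, 1] + 1.0
    ctr_x = anchors[:, 0] + 0.5 * widths
    ctr_y = anchors[:, 1] + 0.5 * heights

    dx, dy, dw, dh = deltas[:, 0], deltas[:, 1], deltas[:, 2], deltas[:, 3]
    # Center moves by a fraction of the anchor size; size is scaled by exp(delta)
    pred_ctr_x = dx * widths + ctr_x
    pred_ctr_y = dy * heights + ctr_y
    pred_w = np.exp(dw) * widths
    pred_h = np.exp(dh) * heights

    return np.stack((pred_ctr_x - 0.5 * pred_w,
                     pred_ctr_y - 0.5 * pred_h,
                     pred_ctr_x + 0.5 * pred_w,
                     pred_ctr_y + 0.5 * pred_h), axis=1)

anchor = np.array([[0.0, 0.0, 15.0, 15.0]])  # a 16 x 16 anchor
unchanged = decode_deltas(anchor, np.zeros((1, 4)))
shifted = decode_deltas(anchor, np.array([[0.5, 0.0, 0.0, 0.0]]))
```

With all deltas at zero the box keeps the anchor's center and size; with dx = 0.5 it moves right by half the anchor width (8 pixels here).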

        # bbox deltas will be (1, 4 * A, H, W) format

        # transpose to (1, H, W, 4 * A)

        # reshape to (1 * H * W * A, 4) where rows are ordered by (h, w, a)

        # in slowest to fastest order

        bbox_deltas = bbox_deltas.transpose((0, 2, 3, 1)).reshape((-1, 4))

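This part of the code needs some explanation. The bbox deltas are the 4 transformation values to be learned. They are learned during training: they are produced by a convolution, the rpn_bbox_pred layer, whose output is a feature map of shape (1, 4 × number of anchors, h, w). Before this feature map can be added to, or otherwise combined with, the generated anchors, the two must have matching shapes. So the code reshapes the bbox deltas into the same layout as anchors, many rows by 4 columns, where the 4 columns hold dx, dy, dw and dh.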

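Note: anchors and bbox deltas both end up with shape (H*W*A, 4), and scores with shape (H*W*A, 1). All three list their rows in (h, w, a) order from slowest to fastest: the anchor index a changes fastest, then w, then h. The first row is (h=0, w=0, a=0), the second is (h=0, w=0, a=1), and only after all A anchors of a cell are listed does w advance; h advances last.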
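
The row ordering produced by transpose and reshape can be checked on a tiny blob. The sketch below builds a fake rpn_bbox_pred whose values encode (channel, h, w), and a fake score blob, then applies the same transpose and reshape as the layer. All sizes and values are made up for illustration, and `row_index` is a hypothetical helper.

```python
import numpy as np

A, H, W = 2, 2, 3

# Fake bbox deltas of shape (1, 4*A, H, W); value = 100*channel + 10*h + w
deltas = np.zeros((1, 4 * A, H, W))
for c in range(4 * A):
    for h in range(H):
        for w in range(W):
            deltas[0, c, h, w] = 100 * c + 10 * h + w

# Same operation as in forward: (1, 4A, H, W) -> (1, H, W, 4A) -> (H*W*A, 4)
rows = deltas.transpose((0, 2, 3, 1)).reshape((-1, 4))

def row_index(h, w, a):
    """Row of anchor a at cell (h, w): a is fastest, then w, then h."""
    return (h * W + w) * A + a

# Fake scores of shape (1, 2*A, H, W); the last A channels are the fg scores
scores = np.arange(2 * A * H * W, dtype=np.float64).reshape((1, 2 * A, H, W))
fg_rows = scores[:, A:, :, :].transpose((0, 2, 3, 1)).reshape((-1, 1))
```

Row row_index(h, w, a) of rows holds channels 4a to 4a+3 at cell (h, w), and the same row of fg_rows holds the foreground score of that anchor, so deltas, scores and anchors line up row by row.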
