Understanding the ROI pooling layer in Faster R-CNN

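Before SPPnet appeared, R-CNN used crop and warp to resize image regions, which distorts the image and loses information. After SPPnet was released (for a description of that network, see an earlier write-up of mine: http://www.cnblogs.com/gongxijun/p/7172134.html), Ross Girshick (rbg) carried the SPPnet idea over into his next model, Fast R-CNN. ROI pooling follows the same idea as SPPnet, but the implementation is different. Let's look at the layer parameters: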

layer {
  name: "roi_pool5"
  type: "ROIPooling"
  bottom: "conv5_3"
  bottom: "rois"
  top: "pool5"
  roi_pooling_param {
    pooled_w: 7
    pooled_h: 7
    spatial_scale: 0.0625 # 1/16
  }
}
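The spatial_scale value of 0.0625 is 1/16, which matches the total stride of VGG16 up to conv5_3 (four 2x2 poolings). As a minimal sketch of what this parameter does, the snippet below maps an ROI from image coordinates onto the feature map. The FeatureRoi struct and the MapRoiToFeatureMap function are hypothetical names introduced for illustration; they are not part of Caffe.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper type (not Caffe API): an ROI expressed in feature-map
// cells, with inclusive corner coordinates.
struct FeatureRoi {
  int start_w, start_h, end_w, end_h;
  int width, height;
};

// Scale the image-space corners (x1, y1, x2, y2) by spatial_scale and round
// to the nearest feature-map cell. Width and height are forced to be >= 1 so
// that a degenerate ROI still covers one cell.
FeatureRoi MapRoiToFeatureMap(float x1, float y1, float x2, float y2,
                              float spatial_scale) {
  assert(spatial_scale > 0.0f);
  FeatureRoi r;
  r.start_w = static_cast<int>(std::round(x1 * spatial_scale));
  r.start_h = static_cast<int>(std::round(y1 * spatial_scale));
  r.end_w = static_cast<int>(std::round(x2 * spatial_scale));
  r.end_h = static_cast<int>(std::round(y2 * spatial_scale));
  r.width = std::max(r.end_w - r.start_w + 1, 1);
  r.height = std::max(r.end_h - r.start_h + 1, 1);
  return r;
}
```

For example, calling MapRoiToFeatureMap(32, 64, 255, 191, 0.0625f) maps an image-space box onto a 15 x 9 cell region of the feature map whose top-left cell is (2, 4).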

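Reading the source code, the author borrows SPPnet's spatial pyramid pooling, but unlike SPPnet only a single scale (pooled_w, pooled_h) is used here: the region of the feature map covered by each ROI is divided into a pooled_w x pooled_h grid, and max pooling is applied to each bin. With pooled_w = pooled_h = 7, every ROI yields a fixed-size n*7*7 feature map, where n is the number of channels, so the top blob has shape (num_rois, channels, 7, 7).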
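To make the grid division concrete, here is a small sketch of how the bin boundaries along one axis can be computed, following the floor/ceil rule used in the layer's forward pass. BinRanges is a hypothetical helper written for this post, not a Caffe function, and it uses float where Caffe uses the template type Dtype.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Hypothetical helper (not Caffe API): splits one axis of an ROI into
// `pooled` bins and returns each bin as a half-open range [start, end) in
// feature-map cells. roi_start is the first cell of the ROI, roi_len is its
// length in cells, and limit is the feature-map size along this axis.
std::vector<std::pair<int, int> > BinRanges(int roi_start, int roi_len,
                                            int pooled, int limit) {
  assert(pooled > 0 && roi_len > 0 && limit >= 0);
  const float bin_size =
      static_cast<float>(roi_len) / static_cast<float>(pooled);
  std::vector<std::pair<int, int> > ranges;
  for (int p = 0; p < pooled; ++p) {
    // start (included) = floor(p * bin_size)
    // end (excluded)   = ceil((p + 1) * bin_size)
    int start = static_cast<int>(std::floor(p * bin_size));
    int end = static_cast<int>(std::ceil((p + 1) * bin_size));
    // Shift into feature-map coordinates and clip to the map.
    start = std::min(std::max(start + roi_start, 0), limit);
    end = std::min(std::max(end + roi_start, 0), limit);
    ranges.push_back(std::make_pair(start, end));
  }
  return ranges;
}
```

When the ROI length is not a multiple of the grid size, neighbouring bins overlap by one cell: for an ROI of 15 cells starting at cell 2 and split into 7 bins, the first two bins are [2, 5) and [4, 7).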

// ------------------------------------------------------------------
// Fast R-CNN
// Copyright (c) 2015 Microsoft
// Licensed under The MIT License [see fast-rcnn/LICENSE for details]
// Written by Ross Girshick
// ------------------------------------------------------------------

#include <cfloat>

#include "caffe/fast_rcnn_layers.hpp"

using std::max;
using std::min;
using std::floor;
using std::ceil;

namespace caffe {

template <typename Dtype>
void ROIPoolingLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  ROIPoolingParameter roi_pool_param = this->layer_param_.roi_pooling_param();
  CHECK_GT(roi_pool_param.pooled_h(), 0)
      << "pooled_h must be > 0";
  CHECK_GT(roi_pool_param.pooled_w(), 0)
      << "pooled_w must be > 0";
  pooled_height_ = roi_pool_param.pooled_h();  // size of the output grid
  pooled_width_ = roi_pool_param.pooled_w();
  spatial_scale_ = roi_pool_param.spatial_scale();
  LOG(INFO) << "Spatial scale: " << spatial_scale_;
}

template <typename Dtype>
void ROIPoolingLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  // bottom[0] is the feature map (conv5_3), bottom[1] holds the ROIs
  channels_ = bottom[0]->channels();
  height_ = bottom[0]->height();
  width_ = bottom[0]->width();
  // One pooled_height_ x pooled_width_ output per channel for every ROI
  top[0]->Reshape(bottom[1]->num(), channels_, pooled_height_,
      pooled_width_);
  max_idx_.Reshape(bottom[1]->num(), channels_, pooled_height_,
      pooled_width_);
}

template <typename Dtype>
void ROIPoolingLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  // ROI data, one row per ROI: (batch_index, x1, y1, x2, y2)
  const Dtype* bottom_rois = bottom[1]->cpu_data();
  // Number of ROIs
  int num_rois = bottom[1]->num();  // number of candidate regions
  // Number of images in the batch (not the channel count; the channel
  // count, 512 after conv5 in VGG16, is stored in channels_)
  int batch_size = bottom[0]->num();
  int top_count = top[0]->count();  // number of output values
  Dtype* top_data = top[0]->mutable_cpu_data();
  caffe_set(top_count, Dtype(-FLT_MAX), top_data);
  int* argmax_data = max_idx_.mutable_cpu_data();
  caffe_set(top_count, -1, argmax_data);

  // For each ROI R = [batch_index x1 y1 x2 y2]: max pool over R
  for (int n = 0; n < num_rois; ++n) {
    int roi_batch_ind = bottom_rois[0];
    // Shrink by a factor of 16: map the ROI from original image
    // coordinates onto the conv5 feature map
    int roi_start_w = round(bottom_rois[1] * spatial_scale_);
    int roi_start_h = round(bottom_rois[2] * spatial_scale_);
    int roi_end_w = round(bottom_rois[3] * spatial_scale_);
    int roi_end_h = round(bottom_rois[4] * spatial_scale_);
    CHECK_GE(roi_batch_ind, 0);
    CHECK_LT(roi_batch_ind, batch_size);

    // Size of the ROI on the feature map (at least 1 x 1)
    int roi_height = max(roi_end_h - roi_start_h + 1, 1);
    int roi_width = max(roi_end_w - roi_start_w + 1, 1);
    // Size (bin_size_w, bin_size_h) of each bin when the ROI is divided
    // into a pooled_height_ x pooled_width_ grid
    const Dtype bin_size_h = static_cast<Dtype>(roi_height)
                             / static_cast<Dtype>(pooled_height_);
    const Dtype bin_size_w = static_cast<Dtype>(roi_width)
                             / static_cast<Dtype>(pooled_width_);

    // Feature map of the image this ROI belongs to (selected by
    // roi_batch_ind, the first of the five ROI values)
    const Dtype* batch_data = bottom_data + bottom[0]->offset(roi_batch_ind);

    for (int c = 0; c < channels_; ++c) {
      for (int ph = 0; ph < pooled_height_; ++ph) {
        for (int pw = 0; pw < pooled_width_; ++pw) {
          // Compute pooling region for this output unit:
          //  start (included) = floor(ph * roi_height / pooled_height_)
          //  end (excluded) = ceil((ph + 1) * roi_height / pooled_height_)
          int hstart = static_cast<int>(floor(static_cast<Dtype>(ph)
                                              * bin_size_h));
          int wstart = static_cast<int>(floor(static_cast<Dtype>(pw)
                                              * bin_size_w));
          int hend = static_cast<int>(ceil(static_cast<Dtype>(ph + 1)
                                           * bin_size_h));
          int wend = static_cast<int>(ceil(static_cast<Dtype>(pw + 1)
                                           * bin_size_w));

          // Shift the bin into feature-map coordinates and clip it
          hstart = min(max(hstart + roi_start_h, 0), height_);
          hend = min(max(hend + roi_start_h, 0), height_);
          wstart = min(max(wstart + roi_start_w, 0), width_);
          wend = min(max(wend + roi_start_w, 0), width_);

          bool is_empty = (hend <= hstart) || (wend <= wstart);

          const int pool_index = ph * pooled_width_ + pw;
          if (is_empty) {
            top_data[pool_index] = 0;
            argmax_data[pool_index] = -1;
          }

          for (int h = hstart; h < hend; ++h) {
            for (int w = wstart; w < wend; ++w) {
              const int index = h * width_ + w;
              if (batch_data[index] > top_data[pool_index]) {
                // Keep the maximum of each bin: this is the max pooling
                top_data[pool_index] = batch_data[index];
                argmax_data[pool_index] = index;
              }
            }
          }
        }
      }
      // Increment all data pointers by one channel
      batch_data += bottom[0]->offset(0, 1);
      top_data += top[0]->offset(0, 1);
      argmax_data += max_idx_.offset(0, 1);
    }
    // Increment ROI data pointer
    bottom_rois += bottom[1]->offset(1);
  }
}

template <typename Dtype>
void ROIPoolingLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  NOT_IMPLEMENTED;
}

#ifdef CPU_ONLY
STUB_GPU(ROIPoolingLayer);
#endif

INSTANTIATE_CLASS(ROIPoolingLayer);
REGISTER_LAYER_CLASS(ROIPooling);

}  // namespace caffe

After these steps we have a feature map of fixed size, which can then be fed into the fully connected layers. Hopefully that made things clear.
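To try the whole procedure outside of Caffe, the forward pass can be condensed into a standalone function. The sketch below makes some simplifying assumptions: it handles a single channel of a single image stored row-major in a std::vector<float>, it does not record the argmax indices, and RoiMaxPool is a hypothetical name rather than a Caffe function.

```cpp
#include <algorithm>
#include <cassert>
#include <cfloat>
#include <cmath>
#include <vector>

// Hypothetical standalone version (not Caffe API) of ROI max pooling for one
// channel. feat is a row-major height x width feature map and roi holds
// {x1, y1, x2, y2} in image coordinates. Returns pooled_h * pooled_w values.
std::vector<float> RoiMaxPool(const std::vector<float>& feat, int height,
                              int width, const float roi[4],
                              float spatial_scale, int pooled_h,
                              int pooled_w) {
  assert(pooled_h > 0 && pooled_w > 0);
  assert(static_cast<int>(feat.size()) == height * width);
  // Map the ROI onto the feature map.
  const int start_w = static_cast<int>(std::round(roi[0] * spatial_scale));
  const int start_h = static_cast<int>(std::round(roi[1] * spatial_scale));
  const int end_w = static_cast<int>(std::round(roi[2] * spatial_scale));
  const int end_h = static_cast<int>(std::round(roi[3] * spatial_scale));
  const int roi_h = std::max(end_h - start_h + 1, 1);
  const int roi_w = std::max(end_w - start_w + 1, 1);
  const float bin_h = static_cast<float>(roi_h) / pooled_h;
  const float bin_w = static_cast<float>(roi_w) / pooled_w;

  std::vector<float> out(pooled_h * pooled_w, -FLT_MAX);
  for (int ph = 0; ph < pooled_h; ++ph) {
    for (int pw = 0; pw < pooled_w; ++pw) {
      // Bin boundaries, shifted to feature-map coordinates and clipped.
      int hstart = static_cast<int>(std::floor(ph * bin_h)) + start_h;
      int hend = static_cast<int>(std::ceil((ph + 1) * bin_h)) + start_h;
      int wstart = static_cast<int>(std::floor(pw * bin_w)) + start_w;
      int wend = static_cast<int>(std::ceil((pw + 1) * bin_w)) + start_w;
      hstart = std::min(std::max(hstart, 0), height);
      hend = std::min(std::max(hend, 0), height);
      wstart = std::min(std::max(wstart, 0), width);
      wend = std::min(std::max(wend, 0), width);

      float& best = out[ph * pooled_w + pw];
      if (hend <= hstart || wend <= wstart) {
        best = 0.0f;  // an empty bin outputs 0
        continue;
      }
      // Max pooling over the cells of this bin.
      for (int h = hstart; h < hend; ++h) {
        for (int w = wstart; w < wend; ++w) {
          best = std::max(best, feat[h * width + w]);
        }
      }
    }
  }
  return out;
}
```

As a quick check, take a 4 x 4 feature map holding the values 0 to 15 in row-major order, an ROI of {0, 0, 3, 3}, a spatial_scale of 1 and a 2 x 2 output grid: each bin then covers one 2 x 2 quadrant, and the result is the maximum of each quadrant. Whatever the ROI size, the output always has pooled_h * pooled_w entries, which is what lets the fully connected layers accept it.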

--- End.
