Object Detection: A Detailed Walkthrough of the Faster-RCNN PyTorch Code (Data Preprocessing)

First, here is the GitHub repository of the original code: https://github.com/chenyuntc/simple-faster-rcnn-pytorch (I am not the author of the code; this post only explains it.)

Today I finished reading train.py, the last file of simple-faster-rcnn-pytorch-master, so it is time to write a proper summary. I plan to analyze the PyTorch implementation of Faster-RCNN in detail across four blog posts, organized as follows:

1 Data loading and preprocessing in Faster-RCNN (corresponding to the /simple-faster-rcnn-pytorch-master/data folder): https://www.cnblogs.com/kerwins-AC/p/9734381.html

2 Model preparation in Faster-RCNN (corresponding to the /simple-faster-rcnn-pytorch-master/model/utils/ folder): https://www.cnblogs.com/kerwins-AC/p/9752679.html

3 The Faster-RCNN model itself (corresponding to the /simple-faster-rcnn-pytorch-master/model/ folder): not yet finished

4 Training code for Faster-RCNN (corresponding to /simple-faster-rcnn-pytorch-master/train.py and trainer.py): https://www.cnblogs.com/kerwins-AC/p/9728731.html

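This post covers the data preprocessing part of the code, which corresponds to the following files:

First comes dataset.py. Let's look at its structure with a function flow chart:

Then, as usual, we go through it one function at a time and analyze what each one does.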

1 def inverse_normalize(img) is implemented as follows:

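 import numpy as np
 from utils.config import opt  # the repo's global configuration object

 def inverse_normalize(img):
     # Undo the normalization so the image can be visualized (returns RGB, roughly 0-255).
     if opt.caffe_pretrain:
         img = img + (np.array([122.7717, 115.9465, 102.9801]).reshape(3, 1, 1))
         return img[::-1, :, :]  # BGR -> RGB
     # approximate un-normalize for visualize
     return (img * 0.225 + 0.45).clip(min=0, max=1) * 255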

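This function approximately undoes the normalization so that a preprocessed image can be visualized again. It first checks opt.caffe_pretrain to see whether a caffe-pretrained model is used. If so, it adds back the caffe mean [122.7717, 115.9465, 102.9801] and reverses the channel order with img[::-1, :, :], turning the BGR image back into RGB. Otherwise it approximately inverts the PyTorch normalization using a single std (0.225) and mean (0.45) for all three channels, clips the result to [0, 1], and multiplies by 255. Either way the output is an RGB image in roughly the 0-255 range.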
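To see what inverse_normalize is for, the following NumPy sketch normalizes a random image the caffe way and then inverts it. It assumes only numpy. It passes caffe_pretrain as an explicit argument instead of reading the repo's global opt, and caffe_normalize is a trimmed copy of the function discussed in section 3. It is an illustration under these assumptions, not code from the repository.

```python
import numpy as np

# Caffe-style mean copied from the snippet above (applied after the RGB -> BGR swap).
CAFFE_MEAN = np.array([122.7717, 115.9465, 102.9801]).reshape(3, 1, 1)


def caffe_normalize(img):
    """CHW RGB image in [0, 1] -> CHW BGR image, 0-255 scale, mean subtracted."""
    img = img[[2, 1, 0], :, :] * 255
    return (img - CAFFE_MEAN).astype(np.float32)


def inverse_normalize(img, caffe_pretrain):
    # Assumption: the flag is passed in instead of reading opt.caffe_pretrain.
    if caffe_pretrain:
        img = img + CAFFE_MEAN
        return img[::-1, :, :]  # BGR -> RGB
    # approximate inverse of the torchvision normalization (one mean/std for all channels)
    return (img * 0.225 + 0.45).clip(min=0, max=1) * 255


rng = np.random.default_rng(0)
rgb = rng.random((3, 4, 5))  # fake CHW RGB image in [0, 1]
restored = inverse_normalize(caffe_normalize(rgb), caffe_pretrain=True)
print(np.allclose(restored, rgb * 255, atol=1e-3))  # True: exact up to float32 rounding
```

The caffe branch is an exact inverse up to float32 rounding. The PyTorch branch is only approximate, because it uses one mean and one std for all channels.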

2 def pytorch_normalze(img) (the name is misspelled this way in the source) is implemented as follows:

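 import torch as t
 from torchvision import transforms as tvtsf

 def pytorch_normalze(img):
     """
     https://github.com/pytorch/vision/issues/223
     return appr -1~1 RGB
     """
     # ImageNet per-channel mean/std expected by torchvision's pretrained models
     normalize = tvtsf.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
     img = normalize(t.from_numpy(img))  # numpy CHW -> tensor, then (x - mean) / std per channel
     return img.numpy()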

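The function first builds a normalization transform, normalize = tvtsf.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]), using the ImageNet per-channel mean and std that torchvision's pretrained models expect. It then converts the numpy array to a tensor and normalizes it with img = normalize(t.from_numpy(img)), subtracting each channel's mean and dividing by its std. Finally it converts the result back to numpy.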
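The arithmetic behind tvtsf.Normalize is just a per-channel shift and scale. The sketch below reproduces it with NumPy broadcasting. It assumes the input is a CHW RGB array in [0, 1], which is what preprocess() passes in. pytorch_normalize_np is a hypothetical name and is not part of the repository.

```python
import numpy as np

# ImageNet statistics passed to tvtsf.Normalize in pytorch_normalze
MEAN = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
STD = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)


def pytorch_normalize_np(img):
    """NumPy sketch of the same arithmetic on a CHW RGB image in [0, 1]:
    out[c] = (img[c] - mean[c]) / std[c] for each channel c."""
    return (img - MEAN) / STD


img = np.full((3, 2, 2), 0.5)
out = pytorch_normalize_np(img)
print(out[:, 0, 0])  # one normalized value per channel
```

Working through the extremes shows that inputs in [0, 1] actually map to roughly [-2.12, 2.64]. The "appr -1~1" in the docstring is therefore a loose approximation.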

3 def caffe_normalize(img) is implemented as follows:

 import numpy as np

 def caffe_normalize(img):
     """
     return appr -125-125 BGR
     """
     img = img[[2, 1, 0], :, :]  # RGB-BGR
     img = img * 255  # preprocess() divided by 255, so go back to the 0-255 range
     mean = np.array([122.7717, 115.9465, 102.9801]).reshape(3, 1, 1)
     img = (img - mean).astype(np.float32, copy=True)
     return img

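Caffe models expect images in BGR order, so img[[2, 1, 0], :, :] converts the RGB image to BGR. Because preprocess() has already divided the pixel values by 255, img = img * 255 brings them back to the 0-255 range. Then mean = np.array([122.7717, 115.9465, 102.9801]).reshape(3, 1, 1) sets the per-channel mean.

Finally, subtracting the mean from the image completes the caffe-style normalization. Unlike the PyTorch version, there is no division by a std.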
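A quick NumPy sketch of the caffe path on a single-pixel image makes the channel swap and the mean subtraction visible. It repeats the function body from above so it can run on its own. The input pixel values are made up for illustration.

```python
import numpy as np

CAFFE_MEAN = np.array([122.7717, 115.9465, 102.9801]).reshape(3, 1, 1)


def caffe_normalize(img):
    img = img[[2, 1, 0], :, :]  # fancy indexing reorders RGB -> BGR (and copies)
    img = img * 255             # back to the 0-255 range
    return (img - CAFFE_MEAN).astype(np.float32, copy=True)


# 1x1 "image": full red, half green, no blue, stored as CHW RGB in [0, 1]
rgb = np.array([1.0, 0.5, 0.0]).reshape(3, 1, 1)
bgr = caffe_normalize(rgb)
# BGR pixel before the mean: 0, 127.5, 255 -> after: about -122.77, 11.55, 152.02
print(bgr.ravel())
```

With inputs in [0, 1] the output lies between 0 - 122.7717 ≈ -122.8 and 255 - 102.9801 ≈ 152.0. The docstring's "-125-125" is therefore also only approximate.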

4 def preprocess(img, min_size=600, max_size=1000) is implemented as follows:

 from skimage import transform as sktsf
 from utils.config import opt

 def preprocess(img, min_size=600, max_size=1000):
     """Preprocess an image for feature extraction.

     The length of the shorter edge is scaled to :obj:`self.min_size`.
     After the scaling, if the length of the longer edge is longer than
     :obj:`self.max_size`, the image is scaled to fit the longer edge
     to :obj:`self.max_size`.

     After resizing the image, the image is subtracted by a mean image value
     :obj:`self.mean`.

     Args:
         img (~numpy.ndarray): An image. This is in CHW and RGB format.
             The range of its value is :math:`[0, 255]`.

     Returns:
         ~numpy.ndarray: A preprocessed image.

     """
     C, H, W = img.shape
     scale1 = min_size / min(H, W)
     scale2 = max_size / max(H, W)
     # the smaller factor keeps the shorter side <= min_size and the longer side <= max_size
     scale = min(scale1, scale2)
     img = img / 255.  # pixel values to [0, 1]
     img = sktsf.resize(img, (C, H * scale, W * scale), mode='reflect', anti_aliasing=False)
     if opt.caffe_pretrain:
         normalize = caffe_normalize
     else:
         normalize = pytorch_normalze
     return normalize(img)

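This is the image preprocessing function. C, H, W = img.shape reads the number of channels, the height and the width of the image.

scale1 = min_size / min(H, W)

scale2 = max_size / max(H, W)

scale = min(scale1, scale2) sets the scale factor. The logic is intuitive. scale1 would stretch the shorter side to exactly min_size, and scale2 would stretch the longer side to exactly max_size. Taking the smaller of the two guarantees that, after resizing, the shorter side is at most min_size and the longer side is at most max_size. Usually the shorter side ends up equal to min_size. Only elongated images (aspect ratio above 1000/600) are limited by max_size instead.

img = img / 255. scales the pixel values to [0, 1].

img = sktsf.resize(img, (C, H * scale, W * scale), mode='reflect', anti_aliasing=False) resizes the image to that target size.

Then, depending on opt.caffe_pretrain, the function applies either caffe_normalize or pytorch_normalze from above and returns the normalized image.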
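The choice of scale can be tried out in plain Python. compute_scale is a hypothetical helper that repeats the first lines of preprocess(); it does not resize anything.

```python
def compute_scale(H, W, min_size=600, max_size=1000):
    """Return the scale factor preprocess() would apply to an H x W image."""
    scale1 = min_size / min(H, W)  # would make the shorter side exactly min_size
    scale2 = max_size / max(H, W)  # would make the longer side exactly max_size
    return min(scale1, scale2)     # the smaller factor respects both limits


for H, W in [(375, 500), (200, 1000)]:
    s = compute_scale(H, W)
    print((H, W), '->', s, (round(H * s), round(W * s)))
```

For a 375×500 image the short side wins: the scale is 1.6 and the image becomes 600×800. For a 200×1000 image the long-side limit wins instead. The scale is 1.0, so the shorter side stays at 200.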

5 class Transform(object) is implemented as follows:

 from data import util

 class Transform(object):

     def __init__(self, min_size=600, max_size=1000):
         self.min_size = min_size
         self.max_size = max_size

     def __call__(self, in_data):
         img, bbox, label = in_data
         _, H, W = img.shape
         img = preprocess(img, self.min_size, self.max_size)
         _, o_H, o_W = img.shape
         scale = o_H / H
         # rescale the boxes by the same factors as the image
         bbox = util.resize_bbox(bbox, (H, W), (o_H, o_W))

         # horizontally flip
         img, params = util.random_flip(
             img, x_random=True, return_param=True)
         # flip the boxes with the same flag that was used for the image
         bbox = util.flip_bbox(
             bbox, (o_H, o_W), x_flip=params['x_flip'])

         return img, bbox, label, scale

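The __init__ method stores the minimum and maximum image sizes. In this PyTorch code, min_size=600 and max_size=1000.

__call__ unpacks img, bbox, label from in_data: the image, its bounding boxes and their labels.

_, H, W = img.shape then reads the height and width of the image.

img = preprocess(img, self.min_size, self.max_size) rescales the image between the minimum and maximum sizes and then normalizes it.

_, o_H, o_W = img.shape reads the shape of the rescaled image.

scale = o_H / H divides the new height by the old one to get the scale factor.

bbox = util.resize_bbox(bbox, (H, W), (o_H, o_W)) rescales the bounding boxes so they match the resized image.

img, params = util.random_flip(img, x_random=True, return_param=True) randomly flips the image horizontally. This is data augmentation: a mirrored object is still the same object, so flipping makes the network more robust.

Finally, util.flip_bbox flips the bounding boxes using the same params['x_flip'] flag. The boxes are therefore mirrored only when the image was. The method returns img, bbox, label, scale.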
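The box bookkeeping in Transform can be sketched in NumPy. This assumes boxes are stored as (y_min, x_min, y_max, x_max) rows. resize_bbox, random_flip_x and flip_bbox below are minimal re-implementations written for illustration, not the code in data/util.py.

```python
import numpy as np


def resize_bbox(bbox, in_size, out_size):
    """Scale (y_min, x_min, y_max, x_max) boxes from in_size=(H, W) to out_size."""
    bbox = bbox.astype(np.float64)  # float copy, so in-place scaling is safe
    y_scale = out_size[0] / in_size[0]
    x_scale = out_size[1] / in_size[1]
    bbox[:, [0, 2]] *= y_scale
    bbox[:, [1, 3]] *= x_scale
    return bbox


def random_flip_x(img, rng):
    """Flip a CHW image left-right with probability 0.5 and report the flag."""
    x_flip = bool(rng.integers(2))
    if x_flip:
        img = img[:, :, ::-1]
    return img, {'x_flip': x_flip}


def flip_bbox(bbox, size, x_flip=False):
    """Mirror boxes horizontally inside an image of size (H, W)."""
    _, W = size
    bbox = bbox.copy()
    if x_flip:
        x_min = W - bbox[:, 3]  # the old right edge becomes the new left edge
        x_max = W - bbox[:, 1]
        bbox[:, 1] = x_min
        bbox[:, 3] = x_max
    return bbox


box = np.array([[10., 20., 110., 220.]])        # one box on a 375x500 image
box = resize_bbox(box, (375, 500), (600, 800))  # -> [[16., 32., 176., 352.]]

rng = np.random.default_rng(0)
img = np.zeros((3, 600, 800))
img, params = random_flip_x(img, rng)
box = flip_bbox(box, (600, 800), x_flip=params['x_flip'])  # same flag as the image
```

The important detail is that flip_bbox is driven by the flag returned for the image. Image and boxes are therefore always flipped together. Flipping the box above on its own would give [[16., 448., 176., 768.]].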

6 class Dataset is implemented as follows:

 from data.voc_dataset import VOCBboxDataset

 class Dataset:
     def __init__(self, opt):
         self.opt = opt
         self.db = VOCBboxDataset(opt.voc_data_dir)
         self.tsf = Transform(opt.min_size, opt.max_size)

     def __getitem__(self, idx):
         ori_img, bbox, label, difficult = self.db.get_example(idx)
         img, bbox, label, scale = self.tsf((ori_img, bbox, label))
         # TODO: check whose stride is negative to fix this instead copy all
         # some of the strides of a given numpy array are negative.
         return img.copy(), bbox.copy(), label.copy(), scale

     def __len__(self):
         return len(self.db)

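__init__ sets self.opt = opt, self.db = VOCBboxDataset(opt.voc_data_dir) and self.tsf = Transform(opt.min_size, opt.max_size).

__getitem__ can be understood simply as fetching one example at a time from the dataset on disk and then calling the Transform above. The image is rescaled between the minimum and maximum sizes and normalized, and the bounding boxes are resized to match. Image and boxes are then randomly flipped together, while the labels pass through unchanged. The method returns one sample (img, bbox, label, scale). As the TODO comment notes, the .copy() calls are needed because some of the numpy arrays can end up with negative strides.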
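Flipping an array with a reversed slice such as img[:, :, ::-1] returns a view with a negative stride instead of copying data. This is the likely source of the negative strides mentioned in the TODO comment. The NumPy sketch below shows the strides before and after .copy(), assuming a float32 image of shape (3, 4, 5). torch.from_numpy refuses arrays with negative strides, which is presumably why __getitem__ copies everything before the data reaches PyTorch.

```python
import numpy as np

img = np.zeros((3, 4, 5), dtype=np.float32)
flipped = img[:, :, ::-1]     # a view: no data copied, last stride is negative
print(flipped.strides)        # (80, 20, -4)
contiguous = flipped.copy()   # new C-contiguous array with positive strides
print(contiguous.strides)     # (80, 20, 4)
```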

7 class TestDataset is implemented as follows:

 from data.voc_dataset import VOCBboxDataset

 class TestDataset:
     def __init__(self, opt, split='test', use_difficult=True):
         self.opt = opt
         self.db = VOCBboxDataset(opt.voc_data_dir, split=split, use_difficult=use_difficult)

     def __getitem__(self, idx):
         ori_img, bbox, label, difficult = self.db.get_example(idx)
         img = preprocess(ori_img)  # resize + normalize only, no flipping
         return img, ori_img.shape[1:], bbox, label, difficult

     def __len__(self):
         return len(self.db)

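TestDataset does something similar to Dataset, but it reads a different part of the data. Its signature, def __init__(self, opt, split='test', use_difficult=True), shows that it loads the VOC test split (split='test') from voc_data_dir into self.db. Because use_difficult=True, it also keeps the objects marked as difficult. When processing images it does not call Transform. Instead it calls preprocess() directly, which only rescales and normalizes the image. The ground-truth bboxes are therefore left in the coordinates of the original image, and no random flipping is applied, since augmentation has no place in evaluation. In place of scale, __getitem__ returns the original image size ori_img.shape[1:] together with bbox, label and difficult. This lets the evaluation code relate predictions to the original annotations. That completes the walkthrough of dataset.py.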
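Because TestDataset keeps the ground-truth boxes in original image coordinates and returns the original size, boxes predicted on the resized network input have to be mapped back before they can be compared. The sketch below shows that rescaling with a hypothetical helper, to_original_coords, and assumes (y_min, x_min, y_max, x_max) boxes. This step is not part of dataset.py, so treat the sketch as an illustration only.

```python
import numpy as np


def to_original_coords(pred_bbox, resized_size, original_size):
    """Hypothetical helper: rescale (y_min, x_min, y_max, x_max) boxes predicted on
    the preprocessed image back to the original image size (H, W)."""
    y_scale = original_size[0] / resized_size[0]
    x_scale = original_size[1] / resized_size[1]
    return pred_bbox * np.array([y_scale, x_scale, y_scale, x_scale])


pred = np.array([[16., 32., 176., 352.]])  # box on the 600x800 network input
orig = to_original_coords(pred, (600, 800), (375, 500))
print(orig)  # box 10, 20, 110, 220 on the 375x500 original
```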
