pytorch实现yolov3(4) 非极大值抑制nms
在上一篇里我们实现了forward函数.得到了prediction.此时预测出了特别多的box以及各种class probability,现在我们要从中过滤出我们最终的预测box.
理解了yolov3的输出的格式及每一个位置的含义,并不难理解源码.我在阅读源码的过程中主要的困难在于对pytorch不熟悉,所以在这篇文章里,关于其中涉及的一些pytorch中的函数的用法我都已经用加粗标示了并且给出了相应的链接,测试代码等.
obj score threshold
我们设置一个obj score thershold,超过这个值的才认为是有效的.
conf_mask = (prediction[:,:,4] > confidence).float().unsqueeze(2)
prediction = prediction*conf_mask
prediction是1*boxnum*boxattr
prediction[:,:,4]是1*boxnum 元素值为boxattr的index=4的那个值.
torch中的Tensor index和numpy是类似的,参看下列代码输出
import torch
x = torch.Tensor(1,3,10) # Create an un-initialized Tensor of size 2x3
print(x)
print(x.shape) # Print out the Tensor
y = x[:,:,4]
print(y)
print(y.shape)
z = x[:,:,4:6]
print(z)
print(z.shape)
print((y>0.5).float().unsqueeze(2))
#### 输出如下
tensor([[[2.5226e-18, 1.6898e-04, 1.0413e-11, 7.7198e-10, 1.0549e-08,
4.0516e-11, 1.0681e-05, 2.9575e-18, 6.7333e+22, 1.7591e+22],
[1.7184e+25, 4.3222e+27, 6.1972e-04, 7.2443e+22, 1.7728e+28,
7.0367e+22, 5.9018e-10, 2.6540e-09, 1.2972e-11, 5.3370e-08],
[2.7001e-06, 2.6801e-09, 4.1292e-05, 2.1511e+23, 3.2770e-09,
2.5125e-18, 7.7052e+31, 1.9447e+31, 5.0207e+28, 1.1492e-38]]])
torch.Size([1, 3, 10])
tensor([[1.0549e-08, 1.7728e+28, 3.2770e-09]])
torch.Size([1, 3])
tensor([[[1.0549e-08, 4.0516e-11],
[1.7728e+28, 7.0367e+22],
[3.2770e-09, 2.5125e-18]]])
torch.Size([1, 3, 2])
tensor([[[0.],
[0.],
[0.]]])
Squeeze and unsqueeze 降低维度,升高维度.
t = torch.ones(2,1,2,1) # Size 2x1x2x1
r = torch.squeeze(t) # Size 2x2
r = torch.squeeze(t, 1) # Squeeze dimension 1: Size 2x2x1
# Un-squeeze a dimension
x = torch.Tensor([1, 2, 3])
r = torch.unsqueeze(x, 0) # Size: 1x3 表示在第0个维度添加1维
r = torch.unsqueeze(x, 1) # Size: 3x1 表示在第1个维度添加1维
这样prediction中objscore<threshold的已经变成了0.
nms
tensor.new() 创建一个和原有tensor的dtype一致的新tensor https://stackoverflow.com/questions/49263588/pytorch-beginner-tensor-new-method
#得到box坐标(top-left corner x, top-left corner y, right-bottom corner x, right-bottom corner y)
box_corner = prediction.new(prediction.shape)
box_corner[:,:,0] = (prediction[:,:,0] - prediction[:,:,2]/2)
box_corner[:,:,1] = (prediction[:,:,1] - prediction[:,:,3]/2)
box_corner[:,:,2] = (prediction[:,:,0] + prediction[:,:,2]/2)
box_corner[:,:,3] = (prediction[:,:,1] + prediction[:,:,3]/2)
prediction[:,:,:4] = box_corner[:,:,:4]
原始的prediction中boxattr存放的是x,y,w,h,...,不方便我们处理,我们将其转换成(top-left corner x, top-left corner y, right-bottom corner x, right-bottom corner y)
接下来我们挨个处理每一张图片对应的feature map.
batch_size = prediction.size(0)
write = False
for ind in range(batch_size):
#image_pred.shape=boxnum\*boxattr
image_pred = prediction[ind] #image Tensor box_num*box_attr
#confidence threshholding
#NMS
#返回每一行的最大值,及最大值所在的列.
max_conf, max_conf_score = torch.max(image_pred[:,5:5+ num_classes], 1)
#升级成和image_pred同样的维度
max_conf = max_conf.float().unsqueeze(1)
max_conf_score = max_conf_score.float().unsqueeze(1)
seq = (image_pred[:,:5], max_conf, max_conf_score)
#沿着列的方向拼接. 现在image_pred变成boxnum\*7
image_pred = torch.cat(seq, 1)
这里涉及到torch.max的用法,参见https://blog.csdn.net/Z_lbj/article/details/79766690
torch.max(input, dim, keepdim=False, out=None) -> (Tensor, LongTensor)
按维度dim 返回最大值.可以这么记忆,沿着第dim维度比较.torch.max(0)即沿着行的方向比较,即得到每列的最大值.
假设input是二维矩阵,即行*列,行是第0维,列是第一维.
- torch.max(a,0) 返回每一列中最大值的那个元素,且返回索引(返回最大元素在这一列的行索引)
- torch.max(a,1) 返回每一行中最大值的那个元素,且返回其索引(返回最大元素在这一行的列索引)
c=torch.Tensor([[1,2,3],[6,5,4]])
print(c)
a,b=torch.max(c,1)
print(a)
print(b)
##输出如下:
tensor([[1., 2., 3.],
[6., 5., 4.]])
tensor([3., 6.])
tensor([2, 0])
torch.cat用法,参见https://pytorch.org/docs/stable/torch.html
torch.cat(tensors, dim=0, out=None) → Tensor
>>> x = torch.randn(2, 3)
>>> x
tensor([[ 0.6580, -1.0969, -0.4614],
[-0.1034, -0.5790, 0.1497]])
>>> torch.cat((x, x, x), 0)
tensor([[ 0.6580, -1.0969, -0.4614],
[-0.1034, -0.5790, 0.1497],
[ 0.6580, -1.0969, -0.4614],
[-0.1034, -0.5790, 0.1497],
[ 0.6580, -1.0969, -0.4614],
[-0.1034, -0.5790, 0.1497]])
>>> torch.cat((x, x, x), 1)
tensor([[ 0.6580, -1.0969, -0.4614, 0.6580, -1.0969, -0.4614, 0.6580,
-1.0969, -0.4614],
[-0.1034, -0.5790, 0.1497, -0.1034, -0.5790, 0.1497, -0.1034,
-0.5790, 0.1497]])
接下来我们只处理obj_score非0的数据(obj_score<obj_threshold转变为0)
non_zero_ind = (torch.nonzero(image_pred[:,4]))
try:
image_pred_ = image_pred[non_zero_ind.squeeze(),:].view(-1,7)
except:
continue
#For PyTorch 0.4 compatibility
#Since the above code with not raise exception for no detection
#as scalars are supported in PyTorch 0.4
if image_pred_.shape[0] == 0:
continue
ok,接下来我们对每一种class做nms.
首先取到我们有哪些类别
#Get the various classes detected in the image
img_classes = unique(image_pred_[:,-1]) # -1 index holds the class index
然后依次对每一种类别做处理
for cls in img_classes:
#perform NMS
#get the detections with one particular class
#取出当前class为当前class且class prob!=0的行
cls_mask = image_pred_*(image_pred_[:,-1] == cls).float().unsqueeze(1)
class_mask_ind = torch.nonzero(cls_mask[:,-2]).squeeze()
image_pred_class = image_pred_[class_mask_ind].view(-1,7)
#sort the detections such that the entry with the maximum objectness
#confidence is at the top
#按照obj score从高到低做排序
conf_sort_index = torch.sort(image_pred_class[:,4], descending = True )[1]
image_pred_class = image_pred_class[conf_sort_index]
idx = image_pred_class.size(0) #Number of detections
for i in range(idx):
#Get the IOUs of all boxes that come after the one we are looking at
#in the loop
try:
#计算第i个和其后每一行的的iou
ious = bbox_iou(image_pred_class[i].unsqueeze(0), image_pred_class[i+1:])
except ValueError:
break
except IndexError:
break
#Zero out all the detections that have IoU > treshhold
#把与第i行iou>nms_conf的认为是同一个目标的box,将其转成0
iou_mask = (ious < nms_conf).float().unsqueeze(1)
image_pred_class[i+1:] *= iou_mask
#把iou>nms_conf的移除掉
non_zero_ind = torch.nonzero(image_pred_class[:,4]).squeeze()
image_pred_class = image_pred_class[non_zero_ind].view(-1,7)
batch_ind = image_pred_class.new(image_pred_class.size(0), 1).fill_(ind) #Repeat the batch_id for as many detections of the class cls in the image
seq = batch_ind, image_pred_class
其中计算iou的代码如下,不多解释了.iou=交叠面积/总面积
def bbox_iou(box1, box2):
"""
Returns the IoU of two bounding boxes
"""
#Get the coordinates of bounding boxes
b1_x1, b1_y1, b1_x2, b1_y2 = box1[:,0], box1[:,1], box1[:,2], box1[:,3]
b2_x1, b2_y1, b2_x2, b2_y2 = box2[:,0], box2[:,1], box2[:,2], box2[:,3]
#get the corrdinates of the intersection rectangle
inter_rect_x1 = torch.max(b1_x1, b2_x1)
inter_rect_y1 = torch.max(b1_y1, b2_y1)
inter_rect_x2 = torch.min(b1_x2, b2_x2)
inter_rect_y2 = torch.min(b1_y2, b2_y2)
#Intersection area
inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1, min=0) * torch.clamp(inter_rect_y2 - inter_rect_y1 + 1, min=0)
#Union Area
b1_area = (b1_x2 - b1_x1 + 1)*(b1_y2 - b1_y1 + 1)
b2_area = (b2_x2 - b2_x1 + 1)*(b2_y2 - b2_y1 + 1)
iou = inter_area / (b1_area + b2_area - inter_area)
return iou
关于nms可以看下https://blog.csdn.net/shuzfan/article/details/52711706
tensor index操作用法如下:
image_pred_ = torch.Tensor([[1,2,3,4,9],[5,6,7,8,9]])
#print(image_pred_[:,-1] == 9)
has_9 = (image_pred_[:,-1] == 9)
print(has_9)
###执行顺序是(image_pred_[:,-1] == 9).float().unsqueeze(1) 再做tensor乘法
cls_mask = image_pred_*(image_pred_[:,-1] == 9).float().unsqueeze(1)
print(cls_mask)
class_mask_ind = torch.nonzero(cls_mask[:,-2]).squeeze()
image_pred_class = image_pred_[class_mask_ind]
输出如下:
tensor([1, 1], dtype=torch.uint8)
tensor([[1., 2., 3., 4., 9.],
[5., 6., 7., 8., 9.]])
torch.sort用法如下:
d=torch.Tensor([[1,2,3],[6,5,4]])
e=d[:,2]
print(e)
print(torch.sort(e))
输出
tensor([3., 4.])
torch.return_types.sort(
values=tensor([3., 4.]),
indices=tensor([0, 1]))
总结一下我们做nms的流程
每一个image,会预测出N个detetction信息,包括4+1+C(4个坐标信息,1个obj score以及C个class probability)
- 首先过滤掉obj_score < confidence的行
- 每一行只取class probability最高的作为预测出来的类别
- 将所有的预测按照obj_score从大到小排序
- 循环每一种类别,开始做nms
- 比较第一个box与其后所有box的iou,删除iou>threshold的box,即剔除所有相似box
- 比较下一个box与其后所有box的iou,删除所有与该box相似的box
- 不断重复上述过程,直至不再有相似box
- 至此,实现了当前处理的类别的多个box均是独一无二的box.
write_results最终的返回值是一个n*8的tensor,其中8是(batch_index,4个坐标,1个objscore,1个class prob,一个class index)
def write_results(prediction, confidence, num_classes, nms_conf = 0.4):
print("prediction.shape=",prediction.shape)
#将obj_score < confidence的行置为0
conf_mask = (prediction[:,:,4] > confidence).float().unsqueeze(2)
prediction = prediction*conf_mask
#得到box坐标(top-left corner x, top-left corner y, right-bottom corner x, right-bottom corner y)
box_corner = prediction.new(prediction.shape)
box_corner[:,:,0] = (prediction[:,:,0] - prediction[:,:,2]/2)
box_corner[:,:,1] = (prediction[:,:,1] - prediction[:,:,3]/2)
box_corner[:,:,2] = (prediction[:,:,0] + prediction[:,:,2]/2)
box_corner[:,:,3] = (prediction[:,:,1] + prediction[:,:,3]/2)
#修改prediction第三个维度的前四列
prediction[:,:,:4] = box_corner[:,:,:4]
batch_size = prediction.size(0)
write = False
for ind in range(batch_size):
#image_pred.shape=boxnum\*boxattr
image_pred = prediction[ind] #image Tensor
#confidence threshholding
#NMS
##取出每一行的class score最大的一个
max_conf_score,max_conf = torch.max(image_pred[:,5:5+ num_classes], 1)
max_conf = max_conf.float().unsqueeze(1)
max_conf_score = max_conf_score.float().unsqueeze(1)
seq = (image_pred[:,:5], max_conf_score, max_conf)
image_pred = torch.cat(seq, 1) #现在变成7列,分别为左上角x,左上角y,右下角x,右下角y,obj score,最大probabilty,相应的class index
print(image_pred.shape)
non_zero_ind = (torch.nonzero(image_pred[:,4]))
try:
image_pred_ = image_pred[non_zero_ind.squeeze(),:].view(-1,7)
except:
continue
#For PyTorch 0.4 compatibility
#Since the above code with not raise exception for no detection
#as scalars are supported in PyTorch 0.4
if image_pred_.shape[0] == 0:
continue
#Get the various classes detected in the image
img_classes = unique(image_pred_[:,-1]) # -1 index holds the class index
for cls in img_classes:
#perform NMS
#get the detections with one particular class
#取出当前class为当前class且class prob!=0的行
cls_mask = image_pred_*(image_pred_[:,-1] == cls).float().unsqueeze(1)
class_mask_ind = torch.nonzero(cls_mask[:,-2]).squeeze()
image_pred_class = image_pred_[class_mask_ind].view(-1,7)
#sort the detections such that the entry with the maximum objectness
#confidence is at the top
#按照obj score从高到低做排序
conf_sort_index = torch.sort(image_pred_class[:,4], descending = True )[1]
image_pred_class = image_pred_class[conf_sort_index]
idx = image_pred_class.size(0) #Number of detections
for i in range(idx):
#Get the IOUs of all boxes that come after the one we are looking at
#in the loop
try:
#计算第i个和其后每一行的的iou
ious = bbox_iou(image_pred_class[i].unsqueeze(0), image_pred_class[i+1:])
except ValueError:
break
except IndexError:
break
#Zero out all the detections that have IoU > treshhold
#把与第i行iou>nms_conf的认为是同一个目标的box,将其转成0
iou_mask = (ious < nms_conf).float().unsqueeze(1)
image_pred_class[i+1:] *= iou_mask
#把iou>nms_conf的移除掉
non_zero_ind = torch.nonzero(image_pred_class[:,4]).squeeze()
image_pred_class = image_pred_class[non_zero_ind].view(-1,7)
batch_ind = image_pred_class.new(image_pred_class.size(0), 1).fill_(ind) #Repeat the batch_id for as many detections of the class cls in the image
seq = batch_ind, image_pred_class
if not write:
output = torch.cat(seq,1) #沿着列方向,shape 1*8
write = True
else:
out = torch.cat(seq,1)
output = torch.cat((output,out)) #沿着行方向 shape n*8
try:
return output
except:
return 0
pytorch实现yolov3(4) 非极大值抑制nms的更多相关文章
- 非极大值抑制(NMS)
非极大值抑制顾名思义就是抑制不是极大值的元素,搜索局部的极大值.这个局部代表的是一个邻域,邻域有两个参数可变,一个是邻域的维数,二是邻域的大小.这里不讨论通用的NMS算法,而是用于在目标检测中提取分数 ...
- 非极大值抑制(NMS)的几种实现
因为之前对比了RoI pooling的几种实现,发现python.pytorch的自带工具函数速度确实很慢,所以这里再对Faster-RCNN中另一个速度瓶颈NMS做一个简单对比试验. 这里做了四组对 ...
- Pytorch从0开始实现YOLO V3指南 part4——置信度阈值和非极大值抑制
本节翻译自:https://blog.paperspace.com/how-to-implement-a-yolo-v3-object-detector-from-scratch-in-pytorch ...
- 非极大值抑制(Non-Maximum Suppression,NMS)
概述 非极大值抑制(Non-Maximum Suppression,NMS),顾名思义就是抑制不是极大值的元素,可以理解为局部最大搜索.这个局部代表的是一个邻域,邻域有两个参数可变,一是邻域的维数,二 ...
- 目标检测 非极大值抑制(Non-Maximum Suppression,NMS)
非极大值抑制(Non-Maximum Suppression,NMS),顾名思义就是抑制不是极大值的元素,可以理解为局部最大搜索.也可以理解为只取置信度最高的一个识别结果. 举例:  如图所示,现在 ...
- 非极大值抑制(NMS)
转自:https://www.cnblogs.com/makefile/p/nms.html 概述 非极大值抑制(Non-Maximum Suppression,NMS),顾名思义就是抑制不是极大值的 ...
- 非极大值抑制Non-Maximum Suppression(NMS)
非极大值抑制(Non-Maximum Suppression,NMS) 概述 非极大值抑制(Non-Maximum Suppression,NMS),顾名思义就是抑制不是极大值的元素,可以理解为局 ...
- 输出预测边界框,NMS非极大值抑制
我们预测阶段时: 生成多个锚框 每个锚框预测类别和偏移量 但是,当同一个目标上可能输出较多的相似的预测边界框.我们可以移除相似的预测边界框.——NMS(非极大值抑制). 对于一个预测边界框B,模型会计 ...
- IoU与非极大值抑制(NMS)的理解与实现
1. IoU(区域交并比) 计算IoU的公式如下图,可以看到IoU是一个比值,即交并比. 在分子中,我们计算预测框和ground-truth之间的重叠区域: 分母是并集区域,或者更简单地说,是预测框和 ...
随机推荐
- Python 推断素数
a = raw_input() #输入数字 a = int(a) #铸造成int b=True #的标记 for i in range(2,a): #从2开始循环本身 if a%i==0: #除了自己 ...
- 谷歌推出备份新工具:Google Drive将同步计算机文件
Google 正在将云端硬盘 Drive 转变成更强大的文件备份工具.很快,Google Drive 将能监测并备份你电脑上的(几乎)所有文件,只要是你勾选的文档,Drive 就能同步至云端. 具体来 ...
- WPF DispatcherTimer(定时器应用) 无人触摸60s自动关闭窗口
原文:WPF DispatcherTimer(定时器应用) 无人触摸60s自动关闭窗口 如果无人触摸:60s自动关闭窗口 xmal:部分 <s:SurfaceWindow x:Class=&qu ...
- 将exe和dll文件打包成单一的启动文件
当我们用 VS 或其它编程工具生成了可执行exe要运行它必须要保证其目录下有一大堆dll库文件,看起来很不爽,用专业的安装程序生成软件又显得繁琐,下面这个方法教你如何快速把exe文件和dll文件打包成 ...
- 照片美妆---基于Haar特征的Adaboost级联人脸检测分类器
原文:照片美妆---基于Haar特征的Adaboost级联人脸检测分类器 本文转载自张雨石http://blog.csdn.net/stdcoutzyx/article/details/3484223 ...
- SQLServer 服务器架构迁移
原文:SQLServer 服务器架构迁移 最近服务器架构迁移,将原来的服务器架构迁移到新的服务器,新的服务器在硬件方面比之前更好!原来服务器使用双向同步,并且为水平划分到多个数据库服务器.迁移过程中, ...
- C#数据导出Excel详细介绍
概要: excel导出在C#代码中应用己经很广泛了,我这里就做些总结,供自己和读者学习用. Excel知识点.一.添加引用和命名空间 添加Microsoft.Office.Interop.Excel引 ...
- C# 中使用OPenCV(Emgu)心得
原文:C# 中使用OPenCV(Emgu)心得 首先介绍一下自己的情况,2010年的3月份开始接触学习C#编程,之前C#和OpenCV都是零基础,由于全都是自学进度比较慢,中间也走了不少弯路.进过三个 ...
- 【Python】:用python做下百度2014笔试题
国庆节最后一天,明天就要上班了,闲来无事做做百度2014笔试题,好久没用过C++了,索性就用python简单的写一下,体验下题目难度.题目是从[大卫David]那里copy过来的. 1.给定任意一个正 ...
- GzipStream的简单使用压缩和解压
压缩和解压都需要用到三个流实例,分别是文件读取流.文件写入流.压缩流. 读取流和写入流有多种形式,压缩流就一种GzipStream. 不同的是对于压缩,是需要用文件写入流作为创建压缩流实例的参数, 压 ...