VOC数据集目标检测

最近在做与目标检测模型相关的工作,很多都要求VOC格式的数据集.

PASCAL VOC挑战赛（The PASCAL Visual Object Classes ）是一个世界级的计算机视觉挑战赛, PASCAL全称：Pattern Analysis, Statical Modeling and Computational Learning，是一个由欧盟资助的网络组织。很多模型都基于此数据集推出.比如目标检测领域的yolo,ssd等等.

voc数据集结构

看下目录结构

:~/git_projects/models/research/VOCdevkit/VOC2012$ tree -d

.

├── Annotations

├── ImageSets

│   ├── Action

│   ├── Layout

│   ├── Main

│   └── Segmentation

├── JPEGImages

├── SegmentationClass

└── SegmentationObject

JPEGImages

这个目录下存放的是图片数据.
Annotations下存放的是xml文件,描述了图片信息.

~/git_projects/models/research/VOCdevkit/VOC2012/Annotations$ cat 2012_004331.xml

<annotation>

	<filename>2012_004331.jpg</filename>

	<folder>VOC2012</folder>

	<object>

		<name>person</name>

		<actions>

			<jumping>1</jumping>

			<other>0</other>

			<phoning>0</phoning>

			<playinginstrument>0</playinginstrument>

			<reading>0</reading>

			<ridingbike>0</ridingbike>

			<ridinghorse>0</ridinghorse>

			<running>0</running>

			<takingphoto>0</takingphoto>

			<usingcomputer>0</usingcomputer>

			<walking>0</walking>

		</actions>

		<bndbox>

			<xmax>208</xmax>

			<xmin>102</xmin>

			<ymax>230</ymax>

			<ymin>25</ymin>

		</bndbox>

		<difficult>0</difficult>

		<pose>Unspecified</pose>

		<point>

			<x>155</x>

			<y>119</y>

		</point>

	</object>

	<segmented>0</segmented>

	<size>

		<depth>3</depth>

		<height>375</height>

		<width>500</width>

	</size>

	<source>

		<annotation>PASCAL VOC2012</annotation>

		<database>The VOC2012 Database</database>

		<image>flickr</image>

	</source>

</annotation>

对应的图片为

我们注意需要关注的就是节点下的数据,尤其是bndbox下的数据.xmin,ymin构成了boundingbox的左上角,xmax,ymax构成了boundingbox的右下角.

啥叫boundingbox? 模型检测出目标了,会画一个框框,标定这个框框内的东西,认为是一个object.

3. ImageSets

Action下存放的是人的动作（例如running、jumping等等，这也是VOC challenge的一部分）
Layout下存放的是具有人体部位的数据（人的head、hand、feet等等，这也是VOC challenge的一部分）
Segmentation下存放的是可用于分割的数据。
Main下存放的是图像物体识别的数据，总共分为20类。

我们主要关注Main下面的文件.

一共63个文件,train.txt/val.txt/trainval.txt里面记录的是对应的数据集图片名字. 剩下60个文件=20*3. 一共20个类别,每个类别有xxx_train.txt,xxx_val.txt,xxx_trainval.txt.

1代表正样本,-1代表负样本

看一下aeroplane_train.txt中的部分内容

2011_003177  1    //意思是2011_003177.jpg中有aeroplane

2011_003183 -1    //意思是2011_003183.jpg中没有aeroplane

2011_003184 -1

2011_003187 -1

2011_003188 -1

2011_003192 -1

2011_003194 -1

2011_003216 -1

2011_003223 -1

2011_003230 -1

2011_003236 -1

2011_003238 -1

2011_003246 -1

2011_003247 -1

2011_003253 -1

2011_003255 -1

2011_003259 -1

2011_003274 -1

看一下train.txt中的内容  只含图片名称

2011_003187

2011_003188

2011_003192

2011_003194

2011_003216

2011_003223

2011_003230

2011_003236

2011_003238

制作自己的voc数据集

数据准备
标定图片:生成label文件,文件内容为类别及boundingbox信息
生成符合VOC格式要求的文件主要是Annotations/.xml ImageSets/main/.txt

数据准备这一步,你的数据可能来自公开数据集,或者合作方的私有数据.

数据集的标注这一步可以使用labelIImg 标注自己的图片https://github.com/tzutalin/labelImg

在做数据集格式转换的过程里,不可避免的要写很多脚本,每个人的需求不同,转换前拿到的文件内的数据格式不同,需要的脚本也都有所差异.这里提供几个我自己用的脚本.

#数据集划分

import os

import random

root_dir='./park_voc/VOC2007/'

## 0.7train 0.1val 0.2test

trainval_percent = 0.8

train_percent = 0.7

xmlfilepath = root_dir+'Annotations'

txtsavepath = root_dir+'ImageSets/Main'

total_xml = os.listdir(xmlfilepath)

num = len(total_xml)  # 100

list = range(num)

tv = int(num*trainval_percent)  # 80

tr = int(tv*train_percent)  # 80*0.7=56

trainval = random.sample(list, tv)

train = random.sample(trainval, tr)

ftrainval = open(root_dir+'ImageSets/Main/trainval.txt', 'w')

ftest = open(root_dir+'ImageSets/Main/test.txt', 'w')

ftrain = open(root_dir+'ImageSets/Main/train.txt', 'w')

fval = open(root_dir+'ImageSets/Main/val.txt', 'w')

for i in list:

    name = total_xml[i][:-4]+'\n'

    if i in trainval:

        ftrainval.write(name)

        if i in train:

            ftrain.write(name)

        else:

            fval.write(name)

    else:

        ftest.write(name)

ftrainval.close()

ftrain.close()

fval.close()

ftest .close()

#.txt-->.xml

#! /usr/bin/python

# -*- coding:UTF-8 -*-

import os, sys

import glob

from PIL import Image

# VEDAI 图像存储位置

src_img_dir = "/home/train/dataset-expand/park_voc/VOC2007/JPEGImages"

# VEDAI 图像的 ground truth 的 txt 文件存放位置

src_txt_dir = "/home/train/dataset-expand/label_expand"

src_xml_dir = "/home/train/dataset-expand/park_voc/VOC2007/Annotations"

img_Lists = glob.glob(src_img_dir + '/*.jpg')

img_basenames = [] # e.g. 100.jpg

for item in img_Lists:

    img_basenames.append(os.path.basename(item))

img_names = [] # e.g. 100

for item in img_basenames:

    temp1, temp2 = os.path.splitext(item)

    img_names.append(temp1)

for img in img_names:

    im = Image.open((src_img_dir + '/' + img + '.jpg'))

    width, height = im.size

    # open the crospronding txt file

    gt = open(src_txt_dir + '/' + img.replace('img','label',1) + '.txt').read().splitlines()

    #gt = open(src_txt_dir + '/gt_' + img + '.txt').read().splitlines()

    # write in xml file

    #os.mknod(src_xml_dir + '/' + img + '.xml')

    xml_file = open((src_xml_dir + '/' + img + '.xml'), 'w')

    xml_file.write('<annotation>\n')

    xml_file.write('    <folder>VOC2007</folder>\n')

    xml_file.write('    <filename>' + str(img) + '.jpg' + '</filename>\n')

    xml_file.write('    <size>\n')

    xml_file.write('        <width>' + str(width) + '</width>\n')

    xml_file.write('        <height>' + str(height) + '</height>\n')

    xml_file.write('        <depth>3</depth>\n')

    xml_file.write('    </size>\n')

    # write the region of image on xml file

    for img_each_label in gt:

        spt = img_each_label.split(',') #这里如果txt里面是以逗号‘，’隔开的，那么就改为spt = img_each_label.split(',')。

        xml_file.write('    <object>\n')

        xml_file.write('        <name>' + str(spt[4]) + '</name>\n')

        xml_file.write('        <pose>Unspecified</pose>\n')

        xml_file.write('        <truncated>0</truncated>\n')

        xml_file.write('        <difficult>0</difficult>\n')

        xml_file.write('        <bndbox>\n')

        xml_file.write('            <xmin>' + str(spt[0]) + '</xmin>\n')

        xml_file.write('            <ymin>' + str(spt[1]) + '</ymin>\n')

        xml_file.write('            <xmax>' + str(spt[2]) + '</xmax>\n')

        xml_file.write('            <ymax>' + str(spt[3]) + '</ymax>\n')

        xml_file.write('        </bndbox>\n')

        xml_file.write('    </object>\n')

    xml_file.write('</annotation>')

目标检测判断标准

今天先不写了,待补充.