Faster R-CNN Training Explained
http://blog.csdn.net/zy1034092330/article/details/62044941
Training py-faster-rcnn on your own data: a detailed walkthrough with code
https://huangying-zhan.github.io/2016/09/22/detection-faster-rcnn
Summary
This post records my experience with py-faster-rcnn, including how to set up py-faster-rcnn from scratch, how to run a demo training on the PASCAL VOC dataset, how to train on your own dataset, and some errors I encountered. All the steps are based on Ubuntu 14.04 + CUDA 8.0. Faster R-CNN is an important research result for object detection with an end-to-end deep convolutional neural network architecture. For details, please refer to the original paper.
The source code for adding your own dataset is available at https://github.com/Huangying-Zhan/py-faster-rcnn
Part 1. Setup py-faster-rcnn
This part gives brief instructions for installing py-faster-rcnn. The instructions mainly follow the py-faster-rcnn repository.
Clone the Faster R-CNN repo
# Make sure to clone with --recursive
$ git clone --recursive https://github.com/rbgirshick/py-faster-rcnn.git
Let's call the directory $FRCN.
Build the Cython modules
$ cd $FRCN/lib
$ make
Build Caffe and PyCaffe
For this part, please refer to the official Caffe installation instructions or my post about Caffe installation. If you have experience with Caffe, just follow the instructions below.
$ cd $FRCN/caffe-fast-rcnn
$ cp Makefile.config.example Makefile.config
# Modify Makefile.config: uncomment this line
WITH_PYTHON_LAYER := 1
# Modify Makefile.config according to your needs, such as settings for GPU support, cuDNN, CUDA version, Anaconda, OpenCV, etc.
# After modifying Makefile.config
$ make all -j4 # -j4 only speeds up compilation. 4 is the number of CPU cores; change it to match your machine.
# This assumes you have installed the prerequisites for PyCaffe; otherwise, go back to the Caffe installation instructions.
$ make pycaffe -j4
Download pre-computed Faster R-CNN models
$ cd $FRCN
$ ./data/scripts/fetch_faster_rcnn_models.sh
Run the demo
In this part you might run into various errors, such as missing packages. Some errors I encountered and their solutions are provided at the end of this post. For other errors, search for the error message online and you should be able to find a solution.
$ ./tools/demo.py
Part 2. Demo Training on PASCAL VOC
In this part, training with py-faster-rcnn is explained. First, the original training procedure on the PASCAL VOC dataset is walked through. The purpose is to understand the structure of the dataset and the training steps.
2.1. Prepare dataset and Pre-trained model
Download VOC dataset
$ cd $FRCN/data
$ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
$ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
$ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCdevkit_08-Jun-2007.tar
$ tar xvf VOCdevkit_08-Jun-2007.tar
$ tar xvf VOCtrainval_06-Nov-2007.tar
$ tar xvf VOCtest_06-Nov-2007.tar
$ ln -s VOCdevkit VOCdevkit2007 # create a symlink
Download pre-trained models
$ cd $FRCN
$ ./data/scripts/fetch_imagenet_models.sh
$ ./data/scripts/fetch_faster_rcnn_models.sh
2.2. Training
py-faster-rcnn provides two training methods. One uses the alternating optimization algorithm; the other is the approximate joint training method. This post covers approximate joint training. For details, please refer to the Faster R-CNN paper.
$ cd $FRCN
# ./experiments/scripts/faster_rcnn_end2end.sh [GPU_ID] [NET] [DATASET]
# Running this command directly might raise the error "AssertionError: Selective search data not found at:". For the solution, please refer to Part 4.
$ ./experiments/scripts/faster_rcnn_end2end.sh 0 ZF pascal_voc
Here are some remarks on the logic behind the training script.
faster_rcnn_end2end.sh
This shell script is the top layer of the whole pipeline. It reads the input arguments, including the GPU ID, network structure (ZF-Net, VGG, or others), dataset (PASCAL VOC, COCO, or others), and extra configurations.
# Part of the script
GPU_ID=$1
NET=$2
NET_lc=${NET,,}
DATASET=$3
array=( $@ )
len=${#array[@]}
EXTRA_ARGS=${array[@]:3:$len}
EXTRA_ARGS_SLUG=${EXTRA_ARGS// /_}
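To make the argument handling above easier to follow, here is a minimal Python sketch of the same splitting logic. The function name `parse_script_args` and the example extra arguments are hypothetical and only serve as illustration; the real work is done by the shell script itself.

```python
def parse_script_args(argv):
    """Split positional arguments the way the shell snippet above does."""
    gpu_id, net, dataset = argv[0], argv[1], argv[2]
    extra_args = argv[3:]  # everything after the third argument
    return {
        "GPU_ID": gpu_id,
        "NET": net,
        "NET_lc": net.lower(),                    # ${NET,,}
        "DATASET": dataset,
        "EXTRA_ARGS": " ".join(extra_args),       # ${array[@]:3:$len}
        "EXTRA_ARGS_SLUG": "_".join(extra_args),  # ${EXTRA_ARGS// /_}
    }


# Hypothetical invocation with three extra arguments.
args = parse_script_args(["0", "ZF", "pascal_voc", "--set", "EXP_DIR", "demo"])
print(args["NET_lc"], args["EXTRA_ARGS_SLUG"])
```

The first three arguments are mandatory; anything after them is passed through unchanged to the Python tools.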
Then it calls two programs, train_net.py followed by test_net.py. As the names suggest, train_net.py trains a model while test_net.py evaluates the performance of the trained model.
# Part of the script
time ./tools/train_net.py --gpu ${GPU_ID} \
--solver models/${PT_DIR}/${NET}/faster_rcnn_end2end/solver.prototxt \
--weights data/imagenet_models/${NET}.v2.caffemodel \
--imdb ${TRAIN_IMDB} \
--iters ${ITERS} \
--cfg experiments/cfgs/faster_rcnn_end2end.yml \
${EXTRA_ARGS}
set +x
NET_FINAL=`grep -B 1 "done solving" ${LOG} | grep "Wrote snapshot" | awk '{print $4}'`
set -x
time ./tools/test_net.py --gpu ${GPU_ID} \
--def models/${PT_DIR}/${NET}/faster_rcnn_end2end/test.prototxt \
--net ${NET_FINAL} \
--imdb ${TEST_IMDB} \
--cfg experiments/cfgs/faster_rcnn_end2end.yml \
${EXTRA_ARGS}
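The `NET_FINAL` line in the script recovers the path of the last snapshot from the training log. As a rough illustration, the same lookup can be sketched in Python. The function name `final_snapshot_path` and the sample log text are assumptions made up for this example, and the sketch only approximates the grep/awk pipeline.

```python
def final_snapshot_path(log_text):
    """Return the snapshot path logged just before "done solving", or None."""
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if "done solving" in line and i > 0:
            previous = lines[i - 1]          # grep -B 1 keeps one line of context
            if "Wrote snapshot" in previous:
                fields = previous.split()
                if len(fields) >= 4:
                    return fields[3]         # awk '{print $4}'
    return None


# Hypothetical log excerpt; real logs contain many more lines.
sample_log = """\
Iteration 70000, loss = 0.42
Wrote snapshot to: /tmp/output/zf_faster_rcnn_iter_70000.caffemodel
done solving
"""
print(final_snapshot_path(sample_log))
```

If training is interrupted before "done solving" is logged, no path is found, which is why the test step depends on training finishing normally.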
faster_rcnn_end2end.yml
As we can see from faster_rcnn_end2end.sh, --cfg points to faster_rcnn_end2end.yml, which means that this file stores many important configurations. Here are some of the original configurations provided.
EXP_DIR: faster_rcnn_end2end
TRAIN:
  HAS_RPN: True
  IMS_PER_BATCH: 1
  BBOX_NORMALIZE_TARGETS_PRECOMPUTED: True
  RPN_POSITIVE_OVERLAP: 0.7
  RPN_BATCHSIZE: 256
  PROPOSAL_METHOD: gt
  BG_THRESH_LO: 0.0
TEST:
  HAS_RPN: True
However, if you wish to add your own configurations, such as the number of iterations between model snapshots during training, you may refer to $FRCN/lib/fast_rcnn/config.py. This file contains all the configuration parameters. You don't need to edit config.py itself; just add a statement to faster_rcnn_end2end.yml and the program parses it automatically. Items not declared in the .yml file keep their default values.
# Example of adding SNAPSHOT_ITERS to the configuration
EXP_DIR: faster_rcnn_end2end
TRAIN:
  HAS_RPN: True
  IMS_PER_BATCH: 1
  BBOX_NORMALIZE_TARGETS_PRECOMPUTED: True
  RPN_POSITIVE_OVERLAP: 0.7
  RPN_BATCHSIZE: 256
  PROPOSAL_METHOD: gt
  BG_THRESH_LO: 0.0
  SNAPSHOT_ITERS: 10000 # This line is an example of adding an argument.
TEST:
  HAS_RPN: True
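The idea that values in the .yml file override defaults while undeclared items keep their default can be sketched as a recursive dictionary merge. This is a simplified illustration and not the code in config.py: the name `merge_overrides` is hypothetical, the default values below are placeholders, and the overrides are written as a plain dict instead of being loaded from YAML.

```python
import copy

# Placeholder defaults for illustration; the real values live in config.py.
DEFAULTS = {
    "EXP_DIR": "default",
    "TRAIN": {"HAS_RPN": False, "IMS_PER_BATCH": 2, "SNAPSHOT_ITERS": 10000},
    "TEST": {"HAS_RPN": False},
}


def merge_overrides(defaults, overrides):
    """Return a copy of defaults with overrides applied recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if key not in merged:
            raise KeyError("{} is not a valid config key".format(key))
        if isinstance(value, dict) and isinstance(merged[key], dict):
            merged[key] = merge_overrides(merged[key], value)
        else:
            merged[key] = value
    return merged


# Overrides as they might look after loading a .yml file.
overrides = {
    "EXP_DIR": "faster_rcnn_end2end",
    "TRAIN": {"HAS_RPN": True, "SNAPSHOT_ITERS": 5000},
    "TEST": {"HAS_RPN": True},
}
cfg = merge_overrides(DEFAULTS, overrides)
print(cfg["TRAIN"]["SNAPSHOT_ITERS"], cfg["TRAIN"]["IMS_PER_BATCH"])
```

Rejecting unknown keys is a deliberate choice in this sketch: a misspelled option then fails loudly instead of being silently ignored.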
train_net.py
Basically, this file does three things.
# Read dataset
imdb, roidb = combined_roidb(args.imdb_name)
# Pass configurations from `faster_rcnn_end2end.sh` and `faster_rcnn_end2end.yml` to lower layer programs/functions
# Call `fast_rcnn.train_net` for training
train_net(args.solver, roidb, output_dir, pretrained_model=args.pretrained_model, max_iters=args.max_iters)
combined_roidb & pascal_voc.py
Recall that in faster_rcnn_end2end.sh you passed the argument --imdb ${TRAIN_IMDB}. combined_roidb simply traces this back and reads the datasets, such as train, val, and test, using functions in $FRCN/lib/datasets/pascal_voc.py.
fast_rcnn.train_net
This function is located in $FRCN/lib/fast_rcnn/train.py. It is the core of the whole training pipeline since it loads solver.prototxt, but most of the time you don't need to care about this part.
solver.prototxt & train.prototxt
If you are familiar with Caffe, you should know the purpose of solver.prototxt and train.prototxt. Otherwise, you are advised to go through Caffe's MNIST tutorial. Here the idea is described only briefly.
Basically, solver.prototxt tells the program where to find your ConvNet structure prototxt and some training settings, such as learning rate, learning policy, etc.
train_net: "models/pascal_voc/ZF/faster_rcnn_end2end/train.prototxt"
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 50000
display: 20
average_loss: 100
momentum: 0.9
weight_decay: 0.0005
snapshot_prefix: "zf_faster_rcnn"
iter_size: 2
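To see what the settings above mean for the learning rate, here is a small sketch of Caffe's "step" policy, where the rate is base_lr * gamma^floor(iter / stepsize). The function name `step_learning_rate` is made up for this example; the default values are the ones from the solver file shown above.

```python
def step_learning_rate(iteration, base_lr=0.001, gamma=0.1, stepsize=50000):
    """Learning rate under the "step" policy at a given iteration."""
    return base_lr * gamma ** (iteration // stepsize)


for it in (0, 49999, 50000, 100000):
    print(it, step_learning_rate(it))
```

With these settings the rate stays at 0.001 for the first 50000 iterations and is then divided by 10 at every further multiple of 50000.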
train.prototxt describes the network structure, including the number of layers, the type of each layer, the number of neurons in each layer, etc. Again, refer to Caffe's MNIST tutorial to understand train.prototxt.
Part 3. Training on new dataset
In this part, basketball detection is used as an example to illustrate how to train on a new dataset using py-faster-rcnn.

3.1. Prepare dataset
The dataset used in this part is downloaded from ImageNet.
DISCLAIMER: This dataset should be only used for non-commercial research activities. Please follow the ImageNet rules about the use of the dataset.
Download dataset
A link to download the Basketball Dataset is provided in the original post. The dataset has the following structure.
|-- basketball
    |-- JPEGImages
        Contains all raw .JPEG images
    |-- ImageSets
        .txt files listing the training set and validation set. Image names are listed without file extensions
    |-- Annotations
        Bounding box annotations for each image. The annotation files are written in .xml format.
# Unzip the folder
$ mv basketball.tar.gz $FRCN/data/
$ cd $FRCN/data
$ tar xzf basketball.tar.gz
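Before training, it can help to confirm that every image listed in an ImageSets file has both an image and an annotation. The following is a minimal sketch under assumptions: the function name `find_missing_files`, the file name train.txt, and the image IDs are hypothetical, and the demo builds a throwaway directory instead of touching the real dataset.

```python
import os
import tempfile


def find_missing_files(root, image_set_file, image_ext=".JPEG"):
    """List files referenced by an image-set .txt that are absent on disk."""
    with open(image_set_file) as f:
        image_ids = [line.strip() for line in f if line.strip()]
    missing = []
    for image_id in image_ids:
        image_path = os.path.join(root, "JPEGImages", image_id + image_ext)
        xml_path = os.path.join(root, "Annotations", image_id + ".xml")
        for path in (image_path, xml_path):
            if not os.path.isfile(path):
                missing.append(path)
    return missing


# Tiny throwaway dataset to demonstrate the check.
root = tempfile.mkdtemp()
for sub in ("JPEGImages", "ImageSets", "Annotations"):
    os.makedirs(os.path.join(root, sub))
open(os.path.join(root, "JPEGImages", "img_0001.JPEG"), "w").close()
open(os.path.join(root, "Annotations", "img_0001.xml"), "w").close()
set_file = os.path.join(root, "ImageSets", "train.txt")
with open(set_file, "w") as f:
    f.write("img_0001\nimg_0002\n")

missing = find_missing_files(root, set_file)
print(len(missing))
```

In the demo, img_0002 is listed but has neither an image nor an annotation, so both paths are reported.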
Add a dataset python file
Add a basketball.py to $FRCN/lib/datasets/. You may check the source code for reference. If you wish to modify this file, you can basically just find and replace basketball with your new dataset name.
Add basketball_eval.py
Add a basketball_eval.py to $FRCN/lib/datasets/. Again, check the source code for reference, and find and replace basketball.
Update /lib/datasets/factory.py
The purpose of basketball.py is to read one part of the whole dataset, such as train_set or val_set. The purpose of the function in factory.py is to get all sets of the whole dataset.
Add config file
As mentioned in Part 2.2 (faster_rcnn_end2end.yml), we need a config.yml to store configurations. Here we can use the original faster_rcnn_end2end.yml as a reference. There are many configurations you can set in this file; we may set EXP_DIR first and others if necessary.
$ cd $FRCN/experiments/cfgs
$ cp faster_rcnn_end2end.yml config.yml
Update imdb.py
A new dataset may conflict with the original PASCAL VOC dataset in its annotation conventions. For example, ImageNet images start with index 0 in row and column while the PASCAL VOC dataset starts with index 1. In imdb.py, a piece of code should be inserted in append_flipped_images(). Refer to the source code.
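The role of factory.py described earlier in this section, mapping a dataset name such as basketball_train to an object that reads that split, can be sketched as a small registry. This is only an illustration modelled loosely on that idea: `ToyDataset` is a hypothetical stand-in for the class in basketball.py, and the registry below is not the actual contents of factory.py.

```python
class ToyDataset(object):
    """Hypothetical stand-in for the dataset class defined in basketball.py."""

    def __init__(self, image_set):
        self.name = "basketball_" + image_set
        self.image_set = image_set


_sets = {}
for split in ("train", "val"):
    name = "basketball_{}".format(split)
    # Bind split as a default argument so each lambda keeps its own value.
    _sets[name] = (lambda split=split: ToyDataset(split))


def get_imdb(name):
    """Look up a dataset by name, e.g. the value passed to --imdb."""
    if name not in _sets:
        raise KeyError("Unknown dataset: {}".format(name))
    return _sets[name]()


imdb = get_imdb("basketball_train")
print(imdb.name)
```

Storing constructors rather than instances means a split is only loaded when it is actually requested.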
3.2. Prepare network and pre-trained model
To train our own model, we basically don't need to train from scratch unless we have a huge dataset comparable to ImageNet. Instead, we can fine-tune a pre-trained Faster R-CNN model, because a pre-trained Faster R-CNN contains many good lower-level features that are generally useful. Even if you are using a new, self-defined architecture (i.e. no pre-trained Faster R-CNN model exists), it is suggested to first train a Faster R-CNN following the Faster R-CNN training method and then fine-tune it on your own dataset.
For simplicity, the network and model adopted in this part is ZF-net and a pre-trained Faster R-CNN (ZF) respectively.
$ cd $FRCN/models
# copy a well-defined network and make modification based on it
$ mkdir basketball
$ cp ./pascal_voc/ZF/faster_rcnn_end2end/* ./basketball/
$ cd basketball
Now we should modify all the files in basketball/, including:
solver.prototxt
- train_net
- snapshot_prefix
- Others if necessary
train.prototxt & test.prototxt
For train.prototxt and test.prototxt, we basically need to update the number of outputs in the final layers. In this basketball dataset we only need 2 classes (background + basketball) and 8 outputs for the bounding box regressor. The original pascal_voc setup has 21 classes including background and 21*4 = 84 bounding box regressor outputs.
$ cd $FRCN/models/basketball
$ grep 21 *
$ grep 84 *
# These two commands help you to check the lines that you should modify in the files.
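The two numbers to look for follow directly from the class count. As a quick worked example, the helper below (a hypothetical name, `head_output_sizes`) computes the output sizes of cls_score and bbox_pred from a list of object classes.

```python
def head_output_sizes(foreground_classes):
    """Return (cls_score outputs, bbox_pred outputs) for the given object classes."""
    num_classes = len(foreground_classes) + 1  # +1 for background
    return num_classes, 4 * num_classes        # 4 box coordinates per class


print(head_output_sizes(["basketball"]))
```

For the 20 PASCAL VOC object classes the same formula gives 21 and 84, which is why those are the values to search for and replace.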
In this part, there are two more items we need to modify. Since we are fine-tuning a pre-trained ConvNet model on our own dataset and the number of outputs of the last fully-connected layers (cls_score & bbox_pred) has changed, the original weights in the pre-trained model no longer fit our network: the dimensions are different. For details, refer to Caffe's fine-tuning tutorial. The solution is to rename the layers so that their weights are initialized randomly instead of being copied from the pre-trained model (copying from the pre-trained model would actually cause an error).
name: "cls_score" -> name: "cls_score_basketball"
name: "bbox_pred" -> name: "bbox_pred_basketball"
However, renaming the layers may cause problems later since “cls_score” and “bbox_pred” are used as keys in testing. Therefore, we can train the model according to the following procedure.
- Rename the layers to cls_score_basketball and bbox_pred_basketball
- Fine-tune pre-trained Faster R-CNN (FRCN) model and snapshot at iteration 0. Let’s call the snapshot Basketball_0.caffemodel. Stop training.
- Rename the layers back to cls_score and bbox_pred.
- Fine-tune Basketball_0.caffemodel to get our final model.
The details and code will be explained in the following part.
3.3. Training and evaluation
Before training on your new dataset, you may need to check $FRCN/data/cache and remove cached files if necessary. The cache stores information about previously used datasets, which may cause problems during training.
Rename the layers
As mentioned in the previous part, rename the two layers.
Reminder: if you are using find and replace, search for the name with quotes (i.e. “cls_score”). If you search for cls_score without quotes, you may also change other layers, since there is a layer named rpn_cls_score.
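The quoting pitfall in the reminder can be made concrete with a short sketch. The function name `rename_layers` and the sample text are assumptions for illustration; the sample is a heavily simplified stand-in for a real prototxt file, and editing the files by hand works just as well.

```python
def rename_layers(prototxt_text, mapping):
    """Rename layers by matching the full quoted name after "name:"."""
    for old, new in mapping.items():
        prototxt_text = prototxt_text.replace(
            'name: "{}"'.format(old), 'name: "{}"'.format(new)
        )
    return prototxt_text


# Simplified stand-in for a prototxt file.
sample = (
    'layer { name: "rpn_cls_score" }\n'
    'layer { name: "cls_score" }\n'
    'layer { name: "bbox_pred" }\n'
)
renamed = rename_layers(
    sample,
    {"cls_score": "cls_score_basketball", "bbox_pred": "bbox_pred_basketball"},
)
print(renamed)
```

Because the match includes the opening quote, rpn_cls_score is left alone. Swapping the keys and values of the mapping renames the layers back for the second fine-tuning.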
First fine-tuning
The purpose of the first fine-tuning is to get a caffemodel whose final fully-connected layers have outputs for two classes.
$ ./tools/train_net.py --gpu 0 --weights data/faster_rcnn_models/ZF_faster_rcnn_final.caffemodel --imdb basketball_train --cfg experiments/cfgs/config.yml --solver models/basketball/solver.prototxt --iters 0
After this fine-tuning, we should have the model we need.
Rename the layers back
Rename the two layers back to “cls_score” and “bbox_pred”.
Second fine-tuning
This fine-tuning trains the model for our final use. The pre-trained model in this stage is the snapshot saved by the first fine-tuning.
$ ./tools/train_net.py --gpu 0 --weights output/basketball/train/zf_faster_rcnn_basketball_iter_0.caffemodel --imdb basketball_train --cfg experiments/cfgs/config.yml --solver models/basketball/solver.prototxt --iters 10000
Evaluation / Testing
To test the performance of the trained model, we can use the provided test_net.py.
$ ./tools/test_net.py --gpu 0 --def models/basketball/test.prototxt --net output/basketball/train/zf_faster_rcnn_basketball_iter_20000.caffemodel --imdb basketball_val --cfg experiments/cfgs/config.yml
At the end, you should be able to see something like this.

After this long journey, training with py-faster-rcnn is complete!
Part 4. Errors and solutions
no easydict, cv2
# Without Anaconda
$ sudo pip install easydict
$ sudo apt-get install python-opencv
# With Anaconda
$ conda install -c verydeep easydict
$ conda install opencv
# Normally, people follow the online instructions at https://anaconda.org/auto/easydict and install auto/easydict. However, this easydict (ver. 1.4) has a problem passing configuration values and causes many unexpected errors, while verydeep/easydict (ver. 1.6) does not.
AssertionError: Selective search data not found
Solution: install verydeep/easydict rather than auto/easydict
$ conda install -c verydeep easydict
box [:, 0] > box[:, 2]
Solution: add the following code block in imdb.py
def append_flipped_images(self):
    num_images = self.num_images
    widths = self._get_widths()
    for i in xrange(num_images):
        boxes = self.roidb[i]['boxes'].copy()
        oldx1 = boxes[:, 0].copy()
        oldx2 = boxes[:, 2].copy()
        # Mirror the x coordinates around the image width
        boxes[:, 0] = widths[i] - oldx2
        boxes[:, 2] = widths[i] - oldx1
        # If a flipped box ends up with x2 < x1, reset x1 to 0
        for b in range(len(boxes)):
            if boxes[b][2] < boxes[b][0]:
                boxes[b][0] = 0
        assert (boxes[:, 2] >= boxes[:, 0]).all()
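To check the flipping arithmetic in isolation, here is a minimal standalone sketch. It is an assumption-laden simplification: it uses plain Python lists instead of the NumPy arrays in imdb.py, and the function name `flip_boxes` is made up for this example.

```python
def flip_boxes(boxes, width):
    """Mirror [x1, y1, x2, y2] boxes horizontally within an image of given width."""
    flipped = []
    for x1, y1, x2, y2 in boxes:
        new_x1 = width - x2
        new_x2 = width - x1
        if new_x2 < new_x1:
            new_x1 = 0  # same fallback as the snippet above
        assert new_x2 >= new_x1
        flipped.append([new_x1, y1, new_x2, y2])
    return flipped


print(flip_boxes([[10, 20, 110, 220]], width=500))
```

A well-formed box stays well-formed after flipping; the fallback only triggers when the input box already has x1 greater than x2.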
For the ImageNet detection dataset, there is no need to subtract one from the coordinates
# Load object bounding boxes into a data frame.
for ix, obj in enumerate(objs):
    bbox = obj.find('bndbox')
    # ImageNet pixel indexes are already 0-based, so no "- 1" here
    x1 = float(bbox.find('xmin').text)
    y1 = float(bbox.find('ymin').text)
    x2 = float(bbox.find('xmax').text)
    y2 = float(bbox.find('ymax').text)
    cls = self._class_to_ind[obj.find('name').text.lower().strip()]
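The loop above runs inside the dataset class, so it cannot be executed on its own. As a self-contained sketch of the same parsing step, the example below reads a small hand-written annotation with the standard library. The sample XML, the function name `load_boxes`, and the class mapping are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the VOC-style layout used by the snippet above.
SAMPLE_XML = """<annotation>
  <object>
    <name>Basketball</name>
    <bndbox>
      <xmin>0</xmin><ymin>12</ymin><xmax>99</xmax><ymax>140</ymax>
    </bndbox>
  </object>
</annotation>"""


def load_boxes(xml_text, class_to_ind):
    """Return (x1, y1, x2, y2, class_index) tuples without any "- 1" shift."""
    boxes = []
    for obj in ET.fromstring(xml_text).findall("object"):
        bbox = obj.find("bndbox")
        coords = [float(bbox.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")]
        cls = class_to_ind[obj.find("name").text.lower().strip()]
        boxes.append(tuple(coords) + (cls,))
    return boxes


class_to_ind = {"__background__": 0, "basketball": 1}
print(load_boxes(SAMPLE_XML, class_to_ind))
```

Note that xmin is 0 in the sample: subtracting one here would produce a negative coordinate, which is the kind of value that can break the later flipping step.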