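Multi-label input in Caffe: adding a data layer

I recently ran into a sequence learning problem (CRNN) in which one image corresponds to several labels. The Caffe source does not support multi-label data input out of the box.

If you are used to calling the create_imagenet.sh script to convert raw data to LMDB, this is where you hit a pitfall. Reading the convert_imageset source shows that it treats everything before the last space on a line as the input path and everything after it as the label, so it cannot carry more than one label per image.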
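
To make the limitation concrete, here is a minimal Python sketch of the last-space split described above. It only mimics the described behaviour and is not the convert_imageset source; split_like_convert_imageset is a hypothetical name used for illustration.

```python
def split_like_convert_imageset(line):
    """Split a list-file line at its last space, as described above.

    Everything before the last space is treated as the image path and
    everything after it as the (single) label.
    """
    pos = line.rfind(' ')
    return line[:pos], line[pos + 1:]


# With a multi-label line, only the last label survives as the label;
# the other labels end up glued onto the file name.
path, label = split_like_convert_imageset('image1.jpg 3 7 1 9')
print(path)   # image1.jpg 3 7 1
print(label)  # 9
```

The "path" that comes out is not a real file name, which is why the stock script cannot be used for multi-label lists.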

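Common solutions

1. Switch to a framework that supports multi-label classification, such as mxnet. But if you have already chosen Caffe for the job, you probably do not want to switch.
2. Use HDF5 plus a Slice layer. Caffe requires a single HDF5 file to stay under 2 GB, so a large dataset has to be split across several HDF5 files, and a Slice layer is used to separate the labels. See the article "生成hdf5文件用于多标签训练" (generating HDF5 files for multi-label training).
3. Use two data inputs (two LMDBs), one that outputs only images and one that outputs only labels. This is somewhat harder than the previous two approaches.
4. Modify the Caffe source. See the article "caffe 实现多标签输入" (multi-label input in Caffe).

My own takeaway is that a data layer is worth writing in Python, because it is simple and quick to do and does not hurt efficiency, whereas compute layers still need to be written in C++.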

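The solution used in this post

My approach is to convert the data to LMDB with Python, then declare a Python-module layer in the prototxt that reads the converted LMDB data.

Steps

1. Preparing the data

Data preparation is the same as for single-label classification, except that each image now has several labels, so in train.txt and val.txt the labels are separated by spaces. For example: image1.jpg label1 label2 label3 label4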
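
Before converting, it can help to check that every line of the list file parses and carries the expected number of labels. The following is a minimal sketch under two assumptions that are not stated in the post: labels are integers, and image paths contain no spaces. parse_label_line and check_label_counts are hypothetical helper names.

```python
def parse_label_line(line):
    """Split one list-file line into (image_path, [label, ...])."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError('expected an image path and at least one label: %r' % line)
    return fields[0], [int(x) for x in fields[1:]]


def check_label_counts(lines):
    """Return the set of distinct label counts found in the non-empty lines."""
    return {len(parse_label_line(l)[1]) for l in lines if l.strip()}


lines = ['image1.jpg 3 7 1 9', 'image2.jpg 0 2 4 4']
print(parse_label_line(lines[0]))  # ('image1.jpg', [3, 7, 1, 9])
print(check_label_counts(lines))   # {4}
```

If check_label_counts returns more than one value, some images have fewer labels than others, and the data layer will have to pad them.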

2. Converting the data to LMDB

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import random
import sys

import cv2
import lmdb

train_path = 'train.txt'                      # training-set label file
val_path = 'val.txt'                          # validation-set label file
train_lmdb = '/path/to/your/data_train_lmdb'  # output path for the training LMDB (a directory is enough)
val_lmdb = '/path/to/your/data_val_lmdb'      # output path for the validation LMDB (a directory is enough)


def load_txt(txt, shuffle):
    """Load a label file into a list of [image_path, label1, label2, ...]."""
    if txt is None:
        sys.exit("no txt path given")
    if not os.path.exists(txt):
        sys.exit("the txt file does not exist: %s" % txt)
    # Store the file line by line in a list
    file_content = []
    with open(txt, 'r') as fr:
        for line in fr:
            line = line.strip()
            if line:
                file_content.append(line.split())
    # Shuffle the data
    if shuffle:
        random.shuffle(file_content)
    return file_content


if __name__ == '__main__':
    # This run converts the validation data. Since there is both train data
    # and val data, change val_path and val_lmdb to train_path and train_lmdb
    # and run the script a second time.
    content = load_txt(val_path, True)
    env = lmdb.Environment(val_lmdb, map_size=int(1e12))
    with env.begin(write=True) as txn:
        for i, fields in enumerate(content):
            pic_path = fields[0]
            # Read as a single-channel uint8 image. The original post used
            # skimage's io.imread(pic_path, as_grey=True), which returns
            # 64-bit floats for colour images; cv2.imread in grayscale mode
            # keeps the data in a form that JPEG encoding accepts.
            img = cv2.imread(pic_path, cv2.IMREAD_GRAYSCALE)
            if img is None:
                sys.exit("cannot read image: %s" % pic_path)
            # LMDB is a key-value store, so both keys and values are byte strings
            image_key = ("image-%09d" % i).encode('ascii')
            txn.put(image_key, cv2.imencode('.jpg', img)[1].tobytes())
            # The labels are joined with spaces; the reader splits on spaces
            # again. (The original loop appended the last label twice.)
            multi_labels = " ".join(fields[1:])
            label_key = ("label-%09d" % i).encode('ascii')
            txn.put(label_key, multi_labels.encode('ascii'))
        # Store the total number of samples for the data layer
        txn.put(b"num-samples", str(len(content)).encode('ascii'))
    print(len(content))

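Run the script twice, once configured for train and once for val. The result is two directories containing the training and validation sets in LMDB format, which brings us back to familiar ground, since calling the bundled script produces the same kind of output.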
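
The key layout the script is meant to produce can be summarised with a plain dict standing in for the LMDB. This is only an illustration: build_records is a hypothetical helper, and the image values are placeholders instead of real JPEG bytes.

```python
def build_records(content):
    """Return the key -> value layout written by the conversion step.

    `content` is a list of [image_path, label1, label2, ...]. Image bytes
    are replaced by a placeholder string here; the real script stores
    JPEG-encoded data under the image keys.
    """
    records = {}
    for i, fields in enumerate(content):
        records['image-%09d' % i] = '<jpeg bytes of %s>' % fields[0]
        records['label-%09d' % i] = ' '.join(fields[1:])
    records['num-samples'] = str(len(content))
    return records


records = build_records([['a.jpg', '3', '7'], ['b.jpg', '1', '9']])
for key in sorted(records):
    print(key, '->', records[key])
```

Each sample contributes one image key and one label key with the same zero-padded index, and a single num-samples entry records how many samples there are.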

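3. Defining the dataLayer

This step provides the layer that the prototxt will name as the one reading the input data.

All it has to do is read back the LMDB data generated in the previous step.

First, let's look at the format of the official Python interface.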

# This is an example of a loss layer
import caffe
import numpy as np


class EuclideanLossLayer(caffe.Layer):
    """
    Compute the Euclidean Loss in the same manner as the C++ EuclideanLossLayer
    to demonstrate the class interface for developing layers in Python.
    """

    # Set up parameters
    def setup(self, bottom, top):
        # check input pair
        if len(bottom) != 2:
            raise Exception("Need two inputs to compute distance.")

    def reshape(self, bottom, top):
        # check input dimensions match
        if bottom[0].count != bottom[1].count:
            raise Exception("Inputs must have the same dimension.")
        # difference is shape of inputs
        self.diff = np.zeros_like(bottom[0].data, dtype=np.float32)
        # loss output is scalar
        top[0].reshape(1)

    # Forward computation
    def forward(self, bottom, top):
        self.diff[...] = bottom[0].data - bottom[1].data
        top[0].data[...] = np.sum(self.diff**2) / bottom[0].num / 2.

    # Backward propagation
    def backward(self, top, propagate_down, bottom):
        for i in range(2):
            if not propagate_down[i]:
                continue
            if i == 0:
                sign = 1
            else:
                sign = -1
            bottom[i].diff[...] = sign * self.diff / bottom[i].num
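
The arithmetic in forward and backward can be checked by hand on a tiny input. The sketch below redoes it with plain Python lists and no Caffe; euclidean_loss is a hypothetical helper written only for this check.

```python
def euclidean_loss(a, b):
    """Sum of squared differences divided by 2N, plus both gradients.

    `a` and `b` are lists of N equally sized rows (one row per sample),
    playing the role of bottom[0].data and bottom[1].data.
    """
    n = len(a)
    diff = [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    loss = sum(d * d for row in diff for d in row) / n / 2.0
    grad_a = [[d / n for d in row] for row in diff]    # sign = +1
    grad_b = [[-d / n for d in row] for row in diff]   # sign = -1
    return loss, grad_a, grad_b


loss, grad_a, grad_b = euclidean_loss([[1.0, 2.0], [3.0, 4.0]],
                                      [[0.0, 2.0], [3.0, 2.0]])
print(loss)    # 1.25
print(grad_a)  # [[0.5, 0.0], [0.0, 1.0]]
```

The squared differences sum to 5, and dividing by N = 2 and then by 2 gives 1.25. The two gradients differ only in sign, which is what the sign variable in backward encodes.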

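Now that we know what the interface looks like, we can follow the same pattern. But first, let's see how parameters are defined in the prototxt, because that determines what we can pass into the data layer. Here is the official definition.

4. Defining the prototxt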

message PythonParameter {
  optional string module = 1;
  optional string layer = 2;
  // This value is set to the attribute `param_str` of the `PythonLayer` object
  // in Python before calling the `setup()` method. This could be a number,
  // string, dictionary in Python dict format, JSON, etc. You may parse this
  // string in `setup` method and use it in `forward` and `backward`.
  // (Author's note) This is the key field: it is how we tell the layer how to
  // read the LMDB data.
  optional string param_str = 3 [default = ''];
  // DEPRECATED
  optional bool share_in_parallel = 4 [default = false];
}

Here is an example:

layer {
  name: "data"
  type: "Python"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  python_param {
    module: "dataLayer"
    layer: "CRNNDataLayer"
    param_str: "{'data' : '/path/to/your/data_train_lmdb', 'batch_size' : 128}"
  }
}
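
The param_str value is a Python dict literal that the layer has to turn back into a dict inside setup. A minimal sketch of that parsing step using ast.literal_eval from the standard library, which is a safer alternative to eval for this kind of literal:

```python
import ast

param_str = "{'data' : '/path/to/your/data_train_lmdb', 'batch_size' : 128}"

# literal_eval only accepts Python literals (dicts, strings, numbers, ...),
# so arbitrary expressions in the prototxt are rejected instead of executed.
params = ast.literal_eval(param_str)
print(params['data'])        # /path/to/your/data_train_lmdb
print(params['batch_size'])  # 128
```

Any extra setting the layer needs can be added as another key in the same dict.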

As we can see, it loads a Python module named dataLayer, so we need to define that module. The implementation is as follows.

import ast
import random

import caffe
import cv2
import lmdb
import numpy as np

# Placeholder values: the original post uses channel, height, width and
# label_size without defining them. Set these to match your network input
# and the number of labels per image.
CHANNEL = 1      # grayscale images
HEIGHT = 32      # assumed input height
WIDTH = 100      # assumed input width
LABEL_SIZE = 4   # assumed number of labels per image


class CRNNDataLayer(caffe.Layer):
    def setup(self, bottom, top):
        # Read the parameters passed in from the prototxt
        params = ast.literal_eval(self.param_str)
        self.env = lmdb.open(params['data'], readonly=True, lock=False)
        self.txn = self.env.begin()
        # Total number of samples, stored when the LMDB was generated
        self.max_num = int(self.txn.get(b'num-samples'))
        self.batch_size = int(params['batch_size'])
        # two tops: data and label
        if len(top) != 2:
            raise Exception("Need to define two tops: data and label.")
        # data layers have no bottoms
        if len(bottom) != 0:
            raise Exception("Do not define a bottom.")

    def reshape(self, bottom, top):
        # load the next batch of images and labels
        self.data, self.label = self.load_data()
        # reshape tops to fit the batch
        top[0].reshape(*self.data.shape)
        top[1].reshape(*self.label.shape)

    def forward(self, bottom, top):
        # assign output
        top[0].data[...] = self.data
        top[1].data[...] = self.label

    # This is a data layer, so backward does nothing
    def backward(self, top, propagate_down, bottom):
        pass

    def load_data(self):
        # Pick a random starting index and read consecutive samples from there
        rnd = random.randint(0, max(0, self.max_num - self.batch_size - 1))
        # Array for the images: batch size, channel, height, width
        img_list = np.zeros((self.batch_size, CHANNEL, HEIGHT, WIDTH),
                            dtype=np.float32)
        # Array for the labels: batch size, labels per image
        label_seq = np.ones((self.batch_size, LABEL_SIZE), dtype=np.float32)
        i = 0  # number of samples loaded so far
        j = 0  # offset from the starting index
        while i < self.batch_size:
            # Wrap around so skipped samples never push the index out of range
            idx = (rnd + j) % self.max_num
            image_key = ('image-%09d' % idx).encode('ascii')
            label_key = ('label-%09d' % idx).encode('ascii')
            try:
                img_array = np.frombuffer(self.txn.get(image_key), dtype=np.uint8)
                # cv2.IMREAD_GRAYSCALE replaces cv2.CV_LOAD_IMAGE_GRAYSCALE,
                # which was removed in OpenCV 3
                imgdata = cv2.imdecode(img_array, cv2.IMREAD_GRAYSCALE)
                # Resize to the network input size; cv2 expects (width, height)
                image = cv2.resize(imgdata, (WIDTH, HEIGHT))
                image = (image - 128.0) / 128
                img_list[i] = image
                label = self.txn.get(label_key).decode('ascii')
                label_list = [int(_) for _ in label.split()]
                # Put the labels into the array in order
                label_seq[i, :len(label_list)] = label_list
                i += 1
            except Exception as e:
                # Skip samples that cannot be read or decoded
                print(e)
            j += 1
        return img_list, label_seq
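
One detail of load_data is easy to miss: the label array is initialised with ones (np.ones), so an image with fewer labels than the array has slots ends up padded with 1. A pure-Python sketch of that behaviour, where pad_labels and its arguments are hypothetical names used only for illustration:

```python
def pad_labels(labels, label_size, pad_value=1.0):
    """Mimic `row = ones(label_size); row[:len(labels)] = labels`."""
    if len(labels) > label_size:
        raise ValueError('got %d labels, but only %d slots'
                         % (len(labels), label_size))
    row = [pad_value] * label_size
    row[:len(labels)] = [float(x) for x in labels]
    return row


print(pad_labels([3, 7], 4))        # [3.0, 7.0, 1.0, 1.0]
print(pad_labels([3, 7, 1, 9], 4))  # [3.0, 7.0, 1.0, 9.0]
```

If 1 is also a real class index, padded slots cannot be told apart from genuine labels, so depending on the loss it may be worth choosing a different fill value.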

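5. Recompiling Caffe

Because we added a Python module, its directory has to be added to the PYTHONPATH environment variable. Otherwise Caffe will not be able to find the module.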

vim ~/.bash_profile

export PYTHONPATH=$PYTHONPATH:(the directory containing dataLayer.py)

source ~/.bash_profile
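
To confirm that the PYTHONPATH change took effect before starting a training run, a quick check from Python can help. This is a minimal sketch for Python 3 (importlib.util.find_spec is not available in Python 2); module_is_importable is a hypothetical helper name.

```python
import importlib.util


def module_is_importable(name):
    """Return True if `name` can be found on the current sys.path."""
    return importlib.util.find_spec(name) is not None


# 'dataLayer' is the module name given in the prototxt.
print(module_is_importable('dataLayer'))
```

If this prints False in a fresh shell, the export line has not been picked up or points at the wrong directory.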

Compile

WITH_PYTHON_LAYER=1 make && make pycaffe

Done

I have tested this approach myself and it works.
