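Caffe's layer_factory
While timing the individual layers of a neural network, I ran into a very strange problem: comparing Caffe's own GPU implementation against the cuDNN implementation, convolution performance differed enormously, yet the pooling layer showed almost no change. When I found time to read the code, it turned out that layer_factory was the cause. This post covers the following:
1. The factory pattern
2. layer_factory in detail
3. The pitfall in layer_factory
4. Impact analysis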
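1. The factory pattern
The factory pattern is one of the classic design patterns. It targets situations where you cannot foresee at coding time which class will need to be instantiated, and where the system should not depend on the details of how product classes are created, composed, and represented. Its limitation is that it fits best in projects where the set of products is rarely extended.
The factory pattern has three roles:
Factory role: produces a concrete product according to some logic
Abstract product role: the parent of the concrete products, usually implemented as an interface in Java or an abstract class in C++
Concrete product role: the product instances themselves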
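The three roles above can be sketched in a few lines of C++. This is a minimal illustration only; `Shape`, `Circle`, `Square`, and `MakeShape` are hypothetical names invented for the example and have nothing to do with Caffe.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Abstract product role: the common base class of every concrete product.
class Shape {
 public:
  virtual ~Shape() {}
  virtual std::string Name() const = 0;
};

// Concrete product roles: the classes that actually get instantiated.
class Circle : public Shape {
 public:
  std::string Name() const override { return "circle"; }
};

class Square : public Shape {
 public:
  std::string Name() const override { return "square"; }
};

// Factory role: decides at run time which concrete product to create.
// The caller only ever sees the abstract Shape interface.
std::unique_ptr<Shape> MakeShape(const std::string& kind) {
  if (kind == "circle") return std::unique_ptr<Shape>(new Circle());
  if (kind == "square") return std::unique_ptr<Shape>(new Square());
  throw std::invalid_argument("unknown shape: " + kind);
}
```

A caller would write something like `MakeShape("circle")->Name()` without ever naming the `Circle` class.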
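2. layer_factory in detail
As of Caffe 1.0 there are three families of operators: the CPU version, Caffe's own CUDA version, and the cuDNN version. The layer_factory files are responsible for assembling these operators. The factory pattern here means that, at run time, the matching version of each operator is selected according to the user's settings.
The following is based on http://zhuanlan.zhihu.com/hacker-and-painter/20456649
layer_factory.hpp is the header file of layer_factory: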
/**
* @brief A layer factory that allows one to register layers.
* During runtime, registered layers could be called by passing a LayerParameter
* protobuffer to the CreateLayer function:
*
* LayerRegistry<Dtype>::CreateLayer(param);
*
* There are two ways to register a layer. Assuming that we have a layer like:
*
* template <typename Dtype>
* class MyAwesomeLayer : public Layer<Dtype> {
* // your implementations
* };
*
* and its type is its C++ class name, but without the "Layer" at the end
* ("MyAwesomeLayer" -> "MyAwesome").
*
* If the layer is going to be created simply by its constructor, in your c++
* file, add the following line:
*
* REGISTER_LAYER_CLASS(MyAwesome);
*
* Or, if the layer is going to be created by another creator function, in the
* format of:
*
* template <typename Dtype>
* Layer<Dtype*> GetMyAwesomeLayer(const LayerParameter& param) {
* // your implementation
* }
*
* (for example, when your layer has multiple backends, see GetConvolutionLayer
* for a use case), then you can register the creator function instead, like
*
* REGISTER_LAYER_CREATOR(MyAwesome, GetMyAwesomeLayer)
*
* Note that each layer type should only be registered once.
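*/

#ifndef CAFFE_LAYER_FACTORY_H_
#define CAFFE_LAYER_FACTORY_H_

#include <map>
#include <string>

#include "caffe/common.hpp"
#include "caffe/proto/caffe.pb.h"

namespace caffe {

template <typename Dtype>
class Layer;

// LayerRegistry's job is simple: it stores each layer's creator function in a map keyed by the layer's type string, so that layers can be created flexibly by name. In short, it is where layer classes get registered.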
template <typename Dtype>
class LayerRegistry {
public:
// Creator is a function pointer type; the function returns a shared_ptr to a Layer<Dtype>
typedef shared_ptr<Layer<Dtype> > (*Creator)(const LayerParameter&);
// CreatorRegistry maps a type string to its Creator
typedef std::map<string, Creator> CreatorRegistry;

static CreatorRegistry& Registry() {
static CreatorRegistry* g_registry_ = new CreatorRegistry();
return *g_registry_;
}

// Adds a creator.
// Inserts the given type string and function pointer into the registry
static void AddCreator(const string& type, Creator creator) {
CreatorRegistry& registry = Registry();
CHECK_EQ(registry.count(type), 0)
<< "Layer type " << type << " already registered.";
registry[type] = creator;
}

// Get a layer using a LayerParameter.
// Creates a layer of the type named in the parameter
static shared_ptr<Layer<Dtype> > CreateLayer(const LayerParameter& param) {
LOG(INFO) << "Creating layer " << param.name();
// Read the type string from the parameter
const string& type = param.type();
// Check that a Creator is registered for this type
CreatorRegistry& registry = Registry();
CHECK_EQ(registry.count(type), 1) << "Unknown layer type: " << type
<< " (known types: " << LayerTypeList() << ")";
// Call the Creator registered for this layer type
return registry[type](param);
}

private:
// Layer registry should never be instantiated - everything is done with its
// static variables.
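// Instantiation is forbidden: the class has only static members, so the constructor is private
LayerRegistry() {}
// Returns the list of registered layer types
static string LayerTypeList() {
// Get the registry
CreatorRegistry& registry = Registry();
string layer_types;
// Walk the registry, appending each type name to the layer_types string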
for (typename CreatorRegistry::iterator iter = registry.begin();
iter != registry.end(); ++iter) {
if (iter != registry.begin()) {
layer_types += ", ";
}
layer_types += iter->first;
}
return layer_types;
}
};

// LayerRegisterer
// A helper whose constructor registers a layer;
// it is used by the macros below
template <typename Dtype>
class LayerRegisterer {
public:
// Constructor of the layer registerer
LayerRegisterer(const string& type,
shared_ptr<Layer<Dtype> > (*creator)(const LayerParameter&)) {
// LOG(INFO) << "Registering layer type: " << type;
// Delegate to LayerRegistry::AddCreator to insert the creator into the registry
LayerRegistry<Dtype>::AddCreator(type, creator);
}
};
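// For convenience, the authors also provide macros for registering your own layer classes.
// REGISTER_LAYER_CREATOR defines two static LayerRegisterer objects,
// g_creator_f_<type> and g_creator_d_<type>, one for float and one for double
#define REGISTER_LAYER_CREATOR(type, creator) \
static LayerRegisterer<float> g_creator_f_##type(#type, creator<float>); \
static LayerRegisterer<double> g_creator_d_##type(#type, creator<double>)

/* REGISTER_LAYER_CLASS registers a layer class of your own whose class name is <type>Layer.
For example, with type = Bias the macro expands to the following.
A function template that simply calls your class's constructor and returns the new instance:
Creator_BiasLayer(const LayerParameter& param)
A static variable g_creator_f_Bias of type LayerRegisterer<float>, which binds the type string "Bias" to that creator function in the float registry:
static LayerRegisterer<float> g_creator_f_Bias("Bias", Creator_BiasLayer<float>)
A static variable g_creator_d_Bias of type LayerRegisterer<double>, which does the same in the double registry:
static LayerRegisterer<double> g_creator_d_Bias("Bias", Creator_BiasLayer<double>)
*/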
#define REGISTER_LAYER_CLASS(type) \
template <typename Dtype> \
shared_ptr<Layer<Dtype> > Creator_##type##Layer(const LayerParameter& param) \
{ \
return shared_ptr<Layer<Dtype> >(new type##Layer<Dtype>(param)); \
} \
REGISTER_LAYER_CREATOR(type, Creator_##type##Layer)

} // namespace caffe

#endif // CAFFE_LAYER_FACTORY_H_
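The registration mechanism in the header can be tried out in isolation. The sketch below is a stripped-down imitation, not Caffe code: `MiniLayer`, `MiniRegistry`, `MiniRegisterer`, and `MINI_REGISTER_LAYER_CLASS` are hypothetical names, the `Dtype` template parameter and `LayerParameter` are dropped, and plain `assert` stands in for `CHECK_EQ`. It shows the key trick: constructing a static object at program start-up has the side effect of adding a creator to the map.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for caffe::Layer; not the real class.
struct MiniLayer {
  virtual ~MiniLayer() {}
  virtual std::string type() const = 0;
};

class MiniRegistry {
 public:
  typedef std::shared_ptr<MiniLayer> (*Creator)();
  typedef std::map<std::string, Creator> CreatorMap;

  // Function-local static: constructed on first use, so the map already
  // exists whenever a static registerer object calls AddCreator.
  static CreatorMap& Registry() {
    static CreatorMap* registry = new CreatorMap();
    return *registry;
  }

  static void AddCreator(const std::string& type, Creator creator) {
    assert(Registry().count(type) == 0);  // each type is registered once
    Registry()[type] = creator;
  }

  static std::shared_ptr<MiniLayer> Create(const std::string& type) {
    assert(Registry().count(type) == 1);  // the type must be known
    return Registry()[type]();
  }
};

// Registration happens as a side effect of constructing a static object.
struct MiniRegisterer {
  MiniRegisterer(const std::string& type, MiniRegistry::Creator creator) {
    MiniRegistry::AddCreator(type, creator);
  }
};

// Generates a creator function for <type>Layer and registers it under #type.
#define MINI_REGISTER_LAYER_CLASS(type)                     \
  std::shared_ptr<MiniLayer> Creator_##type##Layer() {      \
    return std::shared_ptr<MiniLayer>(new type##Layer());   \
  }                                                         \
  static MiniRegisterer g_creator_##type(#type, Creator_##type##Layer)

struct ReLULayer : MiniLayer {
  std::string type() const override { return "ReLU"; }
};
MINI_REGISTER_LAYER_CLASS(ReLU);
```

After start-up, `MiniRegistry::Create("ReLU")` returns a `ReLULayer` even though no code ever called `AddCreator` explicitly, which is how new layers can be added without editing the factory itself.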
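With the header covered, here is the implementation (this listing differs from the 1.0 release in places, but not in any way that matters here).
layer_factory.cpp: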
// Make sure we include Python.h before any system header
// to avoid _POSIX_C_SOURCE redefinition
#ifdef WITH_PYTHON_LAYER
#include <boost/python.hpp>
#endif
#include <string>

#include "caffe/layer.hpp"
#include "caffe/layer_factory.hpp"
#include "caffe/proto/caffe.pb.h"
#include "caffe/vision_layers.hpp"

#ifdef WITH_PYTHON_LAYER
#include "caffe/python_layer.hpp"
#endif

namespace caffe {

// A function that returns a convolution layer instance
// Get convolution layer according to engine.
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetConvolutionLayer(
const LayerParameter& param) {
// Read from the parameter which engine should do the computation: CUDNN, CAFFE, or DEFAULT
// As caffe.proto shows, engine is an enum
ConvolutionParameter_Engine engine = param.convolution_param().engine();
if (engine == ConvolutionParameter_Engine_DEFAULT) {
engine = ConvolutionParameter_Engine_CAFFE;
#ifdef USE_CUDNN
engine = ConvolutionParameter_Engine_CUDNN;
#endif
}
if (engine == ConvolutionParameter_Engine_CAFFE) {
// Instantiate Caffe's own convolution layer
return shared_ptr<Layer<Dtype> >(new ConvolutionLayer<Dtype>(param));
#ifdef USE_CUDNN
} else if (engine == ConvolutionParameter_Engine_CUDNN) {
// Instantiate the cuDNN convolution layer
return shared_ptr<Layer<Dtype> >(new CuDNNConvolutionLayer<Dtype>(param));
#endif
} else { // anything else is an error
LOG(FATAL) << "Layer " << param.name() << " has unknown engine.";
}
}

// Register the convolution layer: the type name is Convolution, and instances are obtained through GetConvolutionLayer
REGISTER_LAYER_CREATOR(Convolution, GetConvolutionLayer);

// Pooling layer creator; same logic as the convolution layer
// Get pooling layer according to engine.
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetPoolingLayer(const LayerParameter& param) {
PoolingParameter_Engine engine = param.pooling_param().engine();
if (engine == PoolingParameter_Engine_DEFAULT) {
engine = PoolingParameter_Engine_CAFFE;
#ifdef USE_CUDNN
engine = PoolingParameter_Engine_CUDNN;
#endif
}
if (engine == PoolingParameter_Engine_CAFFE) {
return shared_ptr<Layer<Dtype> >(new PoolingLayer<Dtype>(param));
#ifdef USE_CUDNN
} else if (engine == PoolingParameter_Engine_CUDNN) {
PoolingParameter p_param = param.pooling_param();
if (p_param.pad() || p_param.pad_h() || p_param.pad_w() ||
param.top_size() > 1) {
LOG(INFO) << "CUDNN does not support padding or multiple tops. "
<< "Using Caffe's own pooling layer.";
return shared_ptr<Layer<Dtype> >(new PoolingLayer<Dtype>(param));
}
return shared_ptr<Layer<Dtype> >(new CuDNNPoolingLayer<Dtype>(param));
#endif
} else {
LOG(FATAL) << "Layer " << param.name() << " has unknown engine.";
}
}

// Register the pooling layer
REGISTER_LAYER_CREATOR(Pooling, GetPoolingLayer);

// Creator and registration for the ReLU layer
// Get relu layer according to engine.
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetReLULayer(const LayerParameter& param) {
ReLUParameter_Engine engine = param.relu_param().engine();
if (engine == ReLUParameter_Engine_DEFAULT) {
engine = ReLUParameter_Engine_CAFFE;
#ifdef USE_CUDNN
engine = ReLUParameter_Engine_CUDNN;
#endif
}
if (engine == ReLUParameter_Engine_CAFFE) {
return shared_ptr<Layer<Dtype> >(new ReLULayer<Dtype>(param));
#ifdef USE_CUDNN
} else if (engine == ReLUParameter_Engine_CUDNN) {
return shared_ptr<Layer<Dtype> >(new CuDNNReLULayer<Dtype>(param));
#endif
} else {
LOG(FATAL) << "Layer " << param.name() << " has unknown engine.";
}
}

REGISTER_LAYER_CREATOR(ReLU, GetReLULayer);

// Creator and registration for the sigmoid layer
// Get sigmoid layer according to engine.
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetSigmoidLayer(const LayerParameter& param) {
SigmoidParameter_Engine engine = param.sigmoid_param().engine();
if (engine == SigmoidParameter_Engine_DEFAULT) {
engine = SigmoidParameter_Engine_CAFFE;
#ifdef USE_CUDNN
engine = SigmoidParameter_Engine_CUDNN;
#endif
}
if (engine == SigmoidParameter_Engine_CAFFE) {
return shared_ptr<Layer<Dtype> >(new SigmoidLayer<Dtype>(param));
#ifdef USE_CUDNN
} else if (engine == SigmoidParameter_Engine_CUDNN) {
return shared_ptr<Layer<Dtype> >(new CuDNNSigmoidLayer<Dtype>(param));
#endif
} else {
LOG(FATAL) << "Layer " << param.name() << " has unknown engine.";
}
}

REGISTER_LAYER_CREATOR(Sigmoid, GetSigmoidLayer);

// Creator and registration for the softmax layer
// Get softmax layer according to engine.
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetSoftmaxLayer(const LayerParameter& param) {
SoftmaxParameter_Engine engine = param.softmax_param().engine();
if (engine == SoftmaxParameter_Engine_DEFAULT) {
engine = SoftmaxParameter_Engine_CAFFE;
#ifdef USE_CUDNN
engine = SoftmaxParameter_Engine_CUDNN;
#endif
}
if (engine == SoftmaxParameter_Engine_CAFFE) {
return shared_ptr<Layer<Dtype> >(new SoftmaxLayer<Dtype>(param));
#ifdef USE_CUDNN
} else if (engine == SoftmaxParameter_Engine_CUDNN) {
return shared_ptr<Layer<Dtype> >(new CuDNNSoftmaxLayer<Dtype>(param));
#endif
} else {
LOG(FATAL) << "Layer " << param.name() << " has unknown engine.";
}
}

REGISTER_LAYER_CREATOR(Softmax, GetSoftmaxLayer);

// Creator and registration for the tanh layer
// Get tanh layer according to engine.
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetTanHLayer(const LayerParameter& param) {
TanHParameter_Engine engine = param.tanh_param().engine();
if (engine == TanHParameter_Engine_DEFAULT) {
engine = TanHParameter_Engine_CAFFE;
#ifdef USE_CUDNN
engine = TanHParameter_Engine_CUDNN;
#endif
}
if (engine == TanHParameter_Engine_CAFFE) {
return shared_ptr<Layer<Dtype> >(new TanHLayer<Dtype>(param));
#ifdef USE_CUDNN
} else if (engine == TanHParameter_Engine_CUDNN) {
return shared_ptr<Layer<Dtype> >(new CuDNNTanHLayer<Dtype>(param));
#endif
} else {
LOG(FATAL) << "Layer " << param.name() << " has unknown engine.";
}
}

REGISTER_LAYER_CREATOR(TanH, GetTanHLayer);

// Creator and registration for the Python layer
#ifdef WITH_PYTHON_LAYER
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetPythonLayer(const LayerParameter& param) {
Py_Initialize();
try {
bp::object module = bp::import(param.python_param().module().c_str());
bp::object layer = module.attr(param.python_param().layer().c_str())(param);
return bp::extract<shared_ptr<PythonLayer<Dtype> > >(layer)();
} catch (bp::error_already_set) {
PyErr_Print();
throw;
}
}

REGISTER_LAYER_CREATOR(Python, GetPythonLayer);
#endif

// Layers that use their constructor as their default creator should be
// registered in their corresponding cpp files. Do not register them here.
} // namespace caffe
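Every creator in the listing follows the same engine-resolution rule. The sketch below restates that rule as a small testable function. It is an illustration under assumptions: the `Engine` enum and `ResolveBackend` are hypothetical names, and a run-time flag `cudnn_compiled_in` stands in for the `USE_CUDNN` preprocessor switch so that both build configurations can be exercised.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the DEFAULT / CAFFE / CUDNN enums in caffe.proto.
enum Engine { ENGINE_DEFAULT, ENGINE_CAFFE, ENGINE_CUDNN };

// Returns the backend a creator function would pick.
// cudnn_compiled_in plays the role of the USE_CUDNN macro.
std::string ResolveBackend(Engine requested, bool cudnn_compiled_in) {
  Engine engine = requested;
  if (engine == ENGINE_DEFAULT) {
    // DEFAULT means "cuDNN if it was compiled in, otherwise Caffe".
    engine = cudnn_compiled_in ? ENGINE_CUDNN : ENGINE_CAFFE;
  }
  if (engine == ENGINE_CAFFE) return "caffe";
  if (engine == ENGINE_CUDNN && cudnn_compiled_in) return "cudnn";
  return "unknown engine";  // the listing calls LOG(FATAL) in this case
}
```

Note the last case: explicitly requesting the cuDNN engine in a build without cuDNN falls through to the error branch rather than silently using the Caffe layer.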
3. The pitfall in layer_factory
In the current code, the pooling layer's creator function contains the following:
// CuDNN assumes layers are not being modified in place, thus
// breaking our index tracking for updates in some cases in Caffe.
// Until there is a workaround in Caffe (index management) or
// cuDNN, use Caffe layer to max pooling, or don't use in place
// layers after max pooling layers
if (param.pooling_param().pool() == PoolingParameter_PoolMethod_MAX) {
return shared_ptr<Layer<Dtype> >(new PoolingLayer<Dtype>(param));
} else {
return shared_ptr<Layer<Dtype> >(new CuDNNPoolingLayer<Dtype>(param));
}
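The direct consequence is that whenever you use max pooling, you always get Caffe's own .cu implementation and can never reach the cuDNN version. That explains why our earlier measurements of the MaxPool layer never changed.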
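The branch quoted above boils down to a one-line decision. The sketch below captures only that branch, assuming the engine has already resolved to cuDNN; `PoolMethod` and `PoolingBackendUnderCuDNN` are hypothetical names used for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of two of the pooling methods named in caffe.proto.
enum PoolMethod { POOL_MAX, POOL_AVE };

// Which implementation runs once the engine has resolved to cuDNN,
// following only the branch quoted above.
std::string PoolingBackendUnderCuDNN(PoolMethod method) {
  if (method == POOL_MAX) {
    return "caffe";  // max pooling is always routed to Caffe's own layer
  }
  return "cudnn";
}
```

In other words, a benchmark that wants to compare the two pooling backends has to use a pooling method other than MAX; with MAX, both runs execute the same Caffe kernel.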
4. Impact analysis
But why don't Caffe's authors use cuDNN's max pooling? Looking it up in the NVIDIA cuDNN User Manual, we find:
4.144. cudnnPoolingForward
cudnnStatus_t cudnnPoolingForward(
cudnnHandle_t handle,
const cudnnPoolingDescriptor_t poolingDesc,
const void *alpha,
const cudnnTensorDescriptor_t xDesc,
const void *x,
const void *beta,
const cudnnTensorDescriptor_t yDesc,
void *y)
This function computes pooling of input values (i.e., the maximum or average of several adjacent values) to produce an output with smaller height and/or width.
Parameters
handle: Input. Handle to a previously created cuDNN context.
poolingDesc: Input. Handle to a previously initialized pooling descriptor.
alpha, beta: Input. Pointers to scaling factors (in host memory) used to blend the computation result with prior value in the output layer as follows: dstValue = alpha[0]*result + beta[0]*priorDstValue. Refer to this section for additional details.
xDesc: Input. Handle to the previously initialized input tensor descriptor. Must be of type FLOAT, or DOUBLE, or HALF, or INT8. See cudnnDataType_t.
x: Input. Data pointer to GPU memory associated with the tensor descriptor xDesc.
yDesc: Input. Handle to the previously initialized output tensor descriptor. Must be of type FLOAT, or DOUBLE, or HALF, or INT8. See cudnnDataType_t.
y: Output. Data pointer to GPU memory associated with the output tensor descriptor yDesc.
The possible error values returned by this function and their meanings are listed below.
Returns
CUDNN_STATUS_SUCCESS: The function launched successfully.
CUDNN_STATUS_BAD_PARAM: At least one of the following conditions are met:
- The dimensions n, c of the input tensor and output tensors differ.
- The datatype of the input tensor and output tensors differs.
CUDNN_STATUS_NOT_SUPPORTED: The function does not support the provided configuration. See the following for some examples of non-supported configurations:
- The wStride of input tensor or output tensor is not 1.
CUDNN_STATUS_EXECUTION_FAILED: The function failed to launch on the GPU.
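What is odd here is that the function accepts only two data tensors, the input x and the output y, so there is no way to update the mask (the per-output argmax indices that Caffe's max pooling records). I don't quite follow the cuDNN designers' reasoning, but as things stand, keeping the results correct appears to rule out cuDNN's PoolingForward for now.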
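To make the role of the mask concrete, here is a toy 1-D max pooling with non-overlapping windows. It is a sketch for illustration, not Caffe's implementation and not a statement about how cuDNN computes gradients; `MaxPoolForward` and `MaxPoolBackward` are hypothetical names. The forward pass records, for each output, the index of the winning input, and the backward pass uses those indices to route gradients.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Forward pass of 1-D max pooling with non-overlapping windows.
// Besides the pooled values it records, for every output element, the
// index of the input element that won: this is the "mask".
void MaxPoolForward(const std::vector<float>& bottom, std::size_t window,
                    std::vector<float>* top, std::vector<std::size_t>* mask) {
  assert(window > 0 && bottom.size() % window == 0);
  top->clear();
  mask->clear();
  for (std::size_t start = 0; start < bottom.size(); start += window) {
    std::size_t best = start;
    for (std::size_t i = start + 1; i < start + window; ++i) {
      if (bottom[i] > bottom[best]) best = i;
    }
    top->push_back(bottom[best]);
    mask->push_back(best);
  }
}

// Backward pass: each top gradient is routed to the input position stored
// in the mask; every other input position receives zero gradient.
std::vector<float> MaxPoolBackward(const std::vector<float>& top_diff,
                                   const std::vector<std::size_t>& mask,
                                   std::size_t bottom_size) {
  std::vector<float> bottom_diff(bottom_size, 0.0f);
  for (std::size_t i = 0; i < top_diff.size(); ++i) {
    bottom_diff[mask[i]] += top_diff[i];
  }
  return bottom_diff;
}
```

A forward function whose only output is `top` gives the caller no place to receive `mask`, which is the mismatch described above.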