Adding a New Layer in Paddle

Implement C++ Class

The C++ class of the layer implements the initialization, forward, and backward parts of the layer. It must derive from the base class paddle::Layer and override the following functions:

constructor and destructor.
init function. It initializes the parameters and settings.
forward. It implements the forward part of the layer.
backward. It implements the backward part of the layer.
prefetch. It determines which rows of the parameter matrix to prefetch from the parameter server. You do not need to override this function if your layer does not need remote sparse update (most layers do not).

Header file:

namespace paddle {

/**
 * A layer has full connections to all neurons in the previous layer.
 * It computes an inner product with a set of learned weights, and
 * (optionally) adds biases.
 *
 * The config file api is fc_layer.
 */
class FullyConnectedLayer : public Layer {
protected:
  WeightList weights_;
  std::unique_ptr<Weight> biases_;

public:
  explicit FullyConnectedLayer(const LayerConfig& config)
      : Layer(config) {}
  ~FullyConnectedLayer() {}

  bool init(const LayerMap& layerMap, const ParameterMap& parameterMap);

  Weight& getWeight(int idx) { return *weights_[idx]; }

  void prefetch();
  void forward(PassType passType);
  void backward(const UpdateCallback& callback = nullptr);
};

}  // namespace paddle

It defines the parameters as class variables. The Weight class is used as the abstraction of a parameter, and it supports multi-threaded updates. How this class is used is shown in the implementations below.

weights_ is a list of weights for the transformation matrices. The current implementation can have more than one input, so it keeps a list of weights, one per input.
biases_ is a weight for the bias vector.

The fully connected layer does not have layer configuration hyper-parameters. If a layer does have hyper-parameters, a common practice is to store them in LayerConfig& config and copy them into class variables in the constructor.

The following code snippet implements the init function.

First, every init function must call the init function of the base class: Layer::init(layerMap, parameterMap). This call initializes the required variables and connections for each layer.
Then it initializes all the weight matrices $W$. The current implementation can have more than one input, so it keeps a list of weights (the inputs of the current layer may come from several layers, and each input layer has its own weight).
Finally, it initializes the bias.

bool FullyConnectedLayer::init(const LayerMap& layerMap,
                               const ParameterMap& parameterMap) {
  /* Initialize the basic parent class */
  Layer::init(layerMap, parameterMap);

  /* initialize the weightList */
  CHECK(inputLayers_.size() == parameters_.size());
  for (size_t i = 0; i < inputLayers_.size(); i++) {
    // Obtain the parameter dimensions
    // height: number of neurons in the input layer
    size_t height = inputLayers_[i]->getSize();
    // width: number of neurons in the current layer
    size_t width = getSize();

    // create a new weight
    if (parameters_[i]->isSparse()) {
      CHECK_LE(parameters_[i]->getSize(), width * height);
    } else {
      CHECK_EQ(parameters_[i]->getSize(), width * height);
    }
    Weight* w = new Weight(height, width, parameters_[i]);

    // append the new weight to the list
    weights_.emplace_back(w);
  }

  /* initialize biases_ */
  if (biasParameter_.get() != NULL) {
    biases_ = std::unique_ptr<Weight>(new Weight(1, getSize(), biasParameter_));
  }

  return true;
}
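The size check inside the loop can be tried outside of Paddle. The following is a plain-Python sketch, not Paddle code: check_parameter_size is a hypothetical helper that mirrors the CHECK_EQ / CHECK_LE logic above, under the assumption that a dense weight stores every entry of the height x width matrix while a sparse one may store fewer.

```python
def check_parameter_size(param_size, height, width, sparse):
    """Return True if a parameter holding `param_size` values can back a
    height x width weight matrix (hypothetical helper, not a Paddle API)."""
    full = height * width
    if sparse:
        # Mirrors CHECK_LE: a sparse parameter may store fewer values.
        return param_size <= full
    # Mirrors CHECK_EQ: a dense parameter stores every entry.
    return param_size == full
```

For example, a dense 3 x 4 weight needs exactly 12 values, whereas a sparse one is accepted with 10.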

The implementation of the forward part has the following steps.

Every layer must call Layer::forward(passType); at the beginning of its forward function.
Then it allocates memory for the output using reserveOutput(batchSize, size);. This step is necessary because batches may have different sizes. reserveOutput changes the size of the output accordingly. For efficiency, new memory is allocated only when the matrix has to grow; when it shrinks, the existing memory block is reused.
Then it computes $\sum_i W_i x + b$ using Matrix operations. getInput(i).value retrieves the matrix of the i-th input. Each input is a $batchSize \times dim$ matrix, where each row represents a single input in a batch. For a complete list of supported matrix operations, please refer to paddle/math/Matrix.h and paddle/math/BaseMatrix.h.
Finally, it applies the activation function using forwardActivation();. It automatically applies the activation function specified in the network configuration.

void FullyConnectedLayer::forward(PassType passType) {
  Layer::forward(passType);

  /* malloc memory for the output_ if necessary */
  // batchSize is the number of samples, size is the number of neurons
  int batchSize = getInput(0).getBatchSize();
  int size = getSize();

  {
    // Set up the size of the output.
    reserveOutput(batchSize, size);
  }

  MatrixPtr outV = getOutputValue();

  // Apply the transformation matrix to each input.
  for (size_t i = 0; i != inputLayers_.size(); ++i) {
    auto input = getInput(i);
    CHECK(input.value) << "The input of 'fc' layer must be matrix";
    // The first input overwrites the output; later inputs are accumulated.
    i == 0 ? outV->mul(input.value, weights_[i]->getW(), 1, 0)
           : outV->mul(input.value, weights_[i]->getW(), 1, 1);
  }

  /* add the bias-vector */
  if (biases_.get() != NULL) {
    outV->addBias(*(biases_->getW()), 1);
  }

  /* activation */ {
    forwardActivation();
  }
}
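To make the arithmetic of the forward pass concrete, here is a minimal plain-Python sketch of what the loop and the bias step compute, before the activation is applied. It is not Paddle code: matmul and fc_forward are illustrative names, matrices are plain lists of rows, and each input is assumed to be a batchSize x dim matrix multiplied by a dim x size weight.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))]
            for r in range(len(a))]


def fc_forward(inputs, weights, bias=None):
    """Compute sum_i inputs[i] * weights[i] (+ bias), without activation.

    inputs[i]  : batch_size x dim_i matrix, one sample per row
    weights[i] : dim_i x size matrix
    bias       : list of `size` values, or None
    """
    batch_size = len(inputs[0])
    size = len(weights[0][0])
    out = [[0.0] * size for _ in range(batch_size)]
    for x, w in zip(inputs, weights):
        prod = matmul(x, w)
        for r in range(batch_size):
            for c in range(size):
                out[r][c] += prod[r][c]  # accumulate over all inputs
    if bias is not None:
        for row in out:
            for c in range(size):
                row[c] += bias[c]        # the same bias is added to every row
    return out


x1 = [[1.0, 2.0], [3.0, 4.0]]            # batch of 2, dim 2
w1 = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]  # 2 x 3
x2 = [[1.0], [2.0]]                      # batch of 2, dim 1
w2 = [[1.0, 1.0, 1.0]]                   # 1 x 3
result = fc_forward([x1, x2], [w1, w2], bias=[0.5, 0.5, 0.5])
```

With two inputs, the output is the sum of both products plus the bias, one row per sample in the batch.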

The implementation of the backward part has the following steps.

backwardActivation() computes the gradients of the activation. The gradients are multiplied in place into the gradients of the output, which can be retrieved using getOutputGrad().
Compute the gradients of the bias. Notice that we can use biases_->getWGrad() to get the gradient matrix of the corresponding parameter. After the gradient of a parameter is updated, the layer must call getParameterPtr()->incUpdate(callback);. This is used for parameter updates over multiple threads or multiple machines.
Then it computes the gradients of the transformation matrices and inputs, and it calls incUpdate for the corresponding parameter. This lets the framework know whether it has gathered all the gradients for a parameter, so that it can overlap other work (e.g., network communication).

void FullyConnectedLayer::backward(const UpdateCallback& callback) {
  /* Do derivation for activations.*/ {
    // Backpropagate through the activation; the result is written
    // in place into the output gradient.
    backwardActivation();
  }

  if (biases_ && biases_->getWGrad()) {
    // Gradient of the loss with respect to this layer's bias
    biases_->getWGrad()->collectBias(*getOutputGrad(), 1);
    biases_->getParameterPtr()->incUpdate(callback);
  }

  bool syncFlag = hl_get_sync_flag();

  for (size_t i = 0; i != inputLayers_.size(); ++i) {
    /* Calculate the W-gradient for the current layer */
    if (weights_[i]->getWGrad()) {
      MatrixPtr input_T = getInputValue(i)->getTranspose();
      MatrixPtr oGrad = getOutputGrad();
      {
        weights_[i]->getWGrad()->mul(input_T, oGrad, 1, 1);
      }
    }

    /* Calculate the input layers error */
    MatrixPtr preGrad = getInputGrad(i);
    if (NULL != preGrad) {
      MatrixPtr weights_T = weights_[i]->getW()->getTranspose();
      preGrad->mul(getOutputGrad(), weights_T, 1, 1);
    }

    {
      weights_[i]->getParameterPtr()->incUpdate(callback);
    }
  }
}
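The three gradients computed in the backward pass can be written out in a few lines of plain Python. This is a sketch under simplifying assumptions, not Paddle code: fc_backward, matmul and transpose are illustrative names, a single input is used, and the gradients are computed from scratch, whereas the C++ code above accumulates into existing gradient buffers.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))]
            for r in range(len(a))]


def fc_backward(x, w, out_grad):
    """Gradients for one input of a fully connected layer.

    x        : batch x dim input
    w        : dim x size weight
    out_grad : batch x size gradient of the loss w.r.t. the layer output
               (already multiplied by the activation derivative)
    """
    w_grad = matmul(transpose(x), out_grad)           # dim x size
    bias_grad = [sum(col) for col in zip(*out_grad)]  # sum over the batch
    x_grad = matmul(out_grad, transpose(w))           # batch x dim
    return w_grad, bias_grad, x_grad


x = [[1.0, 2.0], [3.0, 4.0]]
w = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
out_grad = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
w_grad, bias_grad, x_grad = fc_backward(x, w, out_grad)
```

The shapes line up with the C++ code: the weight gradient has the shape of the weight, the bias gradient has one value per output neuron, and the input gradient has the shape of the input.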

The prefetch function specifies the rows that need to be fetched from the parameter server during training. It is only useful for remote sparse training. In remote sparse training, the full parameter matrix is stored in a distributed fashion on the parameter server. When the layer trains on a batch, only a subset of the input positions is non-zero in that batch. Thus, the layer only needs the rows of the transformation matrix that correspond to these non-zero positions. The prefetch function specifies the ids of these rows.

Most layers do not need remote sparse training, in which case you do not need to override this function.

void FullyConnectedLayer::prefetch() {
  for (size_t i = 0; i != inputLayers_.size(); ++i) {
    auto* sparseParam =
        dynamic_cast<SparsePrefetchRowCpuMatrix*>(weights_[i]->getW().get());
    if (sparseParam) {
      MatrixPtr input = getInputValue(i);
      sparseParam->addRows(input);
    }
  }
}
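The idea behind prefetch can be shown with a small plain-Python sketch. It is an illustration only, not how addRows is implemented: rows_to_prefetch is a hypothetical function, and each sample of the batch is assumed to be a sparse row stored as a {column index: value} dictionary.

```python
def rows_to_prefetch(batch):
    """Return the sorted ids of the weight rows needed for this batch.

    batch: list of samples, each a dict {column_index: value} holding
    only the stored entries of a sparse input row.
    """
    needed = set()
    for sample in batch:
        for col, value in sample.items():
            if value != 0:
                # Input position `col` is non-zero, so row `col` of the
                # transformation matrix takes part in the product.
                needed.add(col)
    return sorted(needed)


batch = [{3: 1.0, 7: 0.5}, {3: 2.0, 42: 1.0}]
row_ids = rows_to_prefetch(batch)
```

Only three rows of the transformation matrix are needed for this batch, no matter how many rows the full matrix has.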

Finally, you can use REGISTER_LAYER(fc, FullyConnectedLayer); to register the layer. fc is the identifier of the layer, and FullyConnectedLayer is the class name of the layer.

namespace paddle {
REGISTER_LAYER(fc, FullyConnectedLayer);
}

If the cpp file is put into paddle/gserver/layers, it will be automatically added to the compilation list.
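Registration maps a layer identifier to the class that implements it, so that the framework can create a layer from the identifier found in a configuration. How REGISTER_LAYER does this internally is not shown here; the sketch below only illustrates the general registry pattern in plain Python, and LAYER_REGISTRY, register_layer and create_layer are hypothetical names.

```python
LAYER_REGISTRY = {}  # hypothetical registry: identifier -> layer class


def register_layer(identifier):
    """Class decorator that records a layer class under `identifier`."""
    def decorator(cls):
        if identifier in LAYER_REGISTRY:
            raise ValueError("layer %r is already registered" % identifier)
        LAYER_REGISTRY[identifier] = cls
        return cls
    return decorator


@register_layer("fc")
class FullyConnectedLayer(object):
    def __init__(self, config):
        self.config = config


def create_layer(identifier, config):
    """Look up the class registered for `identifier` and instantiate it."""
    return LAYER_REGISTRY[identifier](config)


layer = create_layer("fc", {"size": 512})
```

The identifier used at registration time must match the one used in the configuration, otherwise the lookup fails.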

Implement Python Wrapper

Implementing a Python wrapper allows us to use the added layer in configuration files. All the Python wrappers are in the file python/paddle/trainer/config_parser.py. An example of the Python wrapper for the fully connected layer is listed below. It has the following steps:

Use @config_layer('fc') as the decorator of the Python wrapper class. fc is the identifier of the layer.
Implement the __init__ constructor function.
- It first calls the base constructor, super(FCLayer, self).__init__(name, 'fc', size, inputs=inputs, **xargs). FCLayer is the Python wrapper class name, and fc is the layer identifier name. Both must be correct for the wrapper to work.
- Then it computes the size and format (whether sparse) of each transformation matrix, creates the input parameters accordingly, and creates the bias parameter.

@config_layer('fc')
class FCLayer(LayerBase):
    def __init__(
            self,
            name,
            size,
            inputs,
            bias=True,
            **xargs):
        super(FCLayer, self).__init__(name, 'fc', size, inputs=inputs, **xargs)
        for input_index in xrange(len(self.inputs)):
            input_layer = self.get_input_layer(input_index)
            psize = self.config.size * input_layer.size
            dims = [input_layer.size, self.config.size]
            format = self.inputs[input_index].format
            sparse = format == "csr" or format == "csc"
            if sparse:
                psize = self.inputs[input_index].nnz
            self.create_input_parameter(input_index, psize, dims, sparse, format)
        self.create_bias_parameter(bias, self.config.size)
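The parameter size computed in the loop depends on whether the input is sparse. The standalone sketch below reproduces just that calculation so it can be run without Paddle; input_parameter_shape is a hypothetical helper, and its arguments stand in for input_layer.size, self.config.size, and the format and nnz attributes of the input.

```python
def input_parameter_shape(input_size, layer_size, fmt=None, nnz=None):
    """Return (psize, dims, sparse) for one input of an fc layer.

    Hypothetical helper mirroring the loop body of FCLayer.__init__.
    """
    sparse = fmt in ("csr", "csc")
    dims = [input_size, layer_size]
    if sparse:
        # A sparse parameter only stores its non-zero entries.
        psize = nnz
    else:
        # A dense parameter stores the full input_size x layer_size matrix.
        psize = layer_size * input_size
    return psize, dims, sparse


dense = input_parameter_shape(128, 512)
sparse = input_parameter_shape(128, 512, fmt="csr", nnz=1000)
```

The dims are the same in both cases; only the number of stored values differs.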

In the network configuration, the layer can be specified using the following code snippet. The arguments of this class are:

name is the name identifier of the layer instance.
type is the type of the layer, specified using layer identifier.
size is the output size of the layer.
bias specifies whether this layer instance has bias.
inputs specifies a list of layer instance names as inputs.

Layer(
    name = "fc1",
    type = "fc",
    size = 512,
    bias = True,
    inputs = [Input("pool3")]
)

It is also recommended to implement a helper for the Python wrapper, which makes it easier to write models. You can refer to python/paddle/trainer_config_helpers/layers.py for examples.

http://doc.paddlepaddle.org/doc/howto/dev/new_layer_en.html

Paddle source code analysis:

http://wiki.babel.baidu.com/twiki/bin/view/Main/Paddle%E6%BA%90%E7%A0%81%E5%89%96%E6%9E%90--Layer#2.2 backward函数

http://wiki.baidu.com/pages/viewpage.action?pageId=353372756
