1. Forward computation: ConvolutionLayer<Dtype>::Forward_cpu

Note: the backward pass is not covered here.

In the forward pass, Forward_cpu calls BaseConvolutionLayer<Dtype>::forward_cpu_gemm and BaseConvolutionLayer<Dtype>::forward_cpu_bias, both defined in base_conv_layer.cpp.

    template <typename Dtype>
    void ConvolutionLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
        const vector<Blob<Dtype>*>& top)
    {
        const Dtype* weight = this->blobs_[0]->cpu_data();     // weight parameters
        for (int i = 0; i < bottom.size(); ++i) {              // number of inputs; usually just one
            const Dtype* bottom_data = bottom[i]->cpu_data();    // i-th input: N*Ci*Hi*Wi
            Dtype* top_data = top[i]->mutable_cpu_data();        // i-th output: N*Co*Ho*Wo
            for (int n = 0; n < this->num_; ++n) {               // loop over the batch
                // forward_cpu_gemm takes the start of the n-th input sample (Ci*Hi*Wi) and the
                // weights (Co*Ci*Kh*Kw), and writes the n-th output sample (Co*Ho*Wo)
                this->forward_cpu_gemm(bottom_data + n * this->bottom_dim_, weight, top_data + n * this->top_dim_);
                if (this->bias_term_) {                            // the layer has a bias term
                    const Dtype* bias = this->blobs_[1]->cpu_data(); // bias parameters
                    this->forward_cpu_bias(top_data + n * this->top_dim_, bias);  // add the bias to the output
                }
            }
        }
    }

Before the forward pass, the size of the output features is computed by compute_output_shape:

    template <typename Dtype>
    void ConvolutionLayer<Dtype>::compute_output_shape() {
        const int* kernel_shape_data = this->kernel_shape_.cpu_data();
        const int* stride_data = this->stride_.cpu_data();
        const int* pad_data = this->pad_.cpu_data();
        const int* dilation_data = this->dilation_.cpu_data();   // kernel dilation per axis, default 1; dilation inserts zeros between kernel taps
        this->output_shape_.clear();
        for (int i = 0; i < this->num_spatial_axes_; ++i) {   // spatial axes (H and W): num_spatial_axes_ = 2 for 2-D convolution
            // i + 1 to skip channel axis
            const int input_dim = this->input_shape(i + 1); // inline int input_shape(int i) {return (*bottom_shape_)[channel_axis_ + i];}
            const int kernel_extent = dilation_data[i] * (kernel_shape_data[i] - 1) + 1;  // size of the dilated kernel
            const int output_dim = (input_dim + 2 * pad_data[i] - kernel_extent) / stride_data[i] + 1;  // output size along this axis
            this->output_shape_.push_back(output_dim);   // output height / width
        }
    }
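To make the formula concrete, here is a standalone sketch of the per-axis computation. conv_output_dim is a hypothetical helper name, not a Caffe function; it only restates the kernel_extent and output_dim lines above.

```cpp
#include <cassert>

// Per-axis output size, as in compute_output_shape above.
// Integer division floors for the non-negative values produced by valid parameters.
int conv_output_dim(int input_dim, int kernel, int pad, int stride,
                    int dilation) {
  // A dilated kernel inserts (dilation - 1) zeros between neighbouring taps.
  const int kernel_extent = dilation * (kernel - 1) + 1;
  return (input_dim + 2 * pad - kernel_extent) / stride + 1;
}
```

For example, a 224-wide input with a 7*7 kernel, pad 3 and stride 2 gives (224 + 6 - 7) / 2 + 1 = 112, while a 3*3 kernel with dilation 2 behaves like a 5*5 kernel when sizing the output.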

2. forward_cpu_gemm

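This function first checks whether the convolution is 1*1 (Caffe's is_1x1_ flag also requires stride 1 and zero padding, in which case the input already has the required layout). If it is not, the function calls conv_im2col_cpu to turn the input Ci*Hi*Wi into the temporary matrix col_buffer_ of shape (Ci*Kh*Kw)*(Ho*Wo).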

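It then calls caffe_cpu_gemm once per group, each call computing one group's slice of the output. With group_ = 1 (the default), the whole output is computed in a single call: output (Co*(Ho*Wo)) = 1 * weights (Co*(Ci*Kh*Kw)) * col_buff ((Ci*Kh*Kw)*(Ho*Wo)) + 0 * output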

    template <typename Dtype>
    void BaseConvolutionLayer<Dtype>::forward_cpu_gemm(const Dtype* input,
        const Dtype* weights, Dtype* output, bool skip_im2col) {  // skip_im2col defaults to false
        const Dtype* col_buff = input;
        if (!is_1x1_) {  // not a 1*1 convolution
            if (!skip_im2col)
            {
                // Call conv_im2col_cpu (inline, in base_conv_layer.hpp), which wraps im2col_cpu, to turn
                // the input Ci*Hi*Wi into the temporary (Ci*Kh*Kw)*(Ho*Wo) matrix.
                // Forward_cpu calls this function once per sample in the batch, so there is no batch loop here.
                conv_im2col_cpu(input, col_buffer_.mutable_cpu_data());
            }
            col_buff = col_buffer_.cpu_data();
        }
        for (int g = 0; g < group_; ++g) {  // group_ defaults to 1
            caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, conv_out_channels_ / group_,  // M = Co / group_
                conv_out_spatial_dim_, kernel_dim_,    // N = Ho*Wo, K = kernel_dim_ = (Ci / group_)*Kh*Kw
                (Dtype)1., weights + weight_offset_ * g, col_buff + col_offset_ * g,
                (Dtype)0., output + output_offset_ * g);
        }
    }
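The sketch below shows how the group loop partitions the work for one sample. The offsets are my reading of how BaseConvolutionLayer sets weight_offset_, col_offset_ and output_offset_ (each group uses Co/group output channels and Ci/group input channels), so check them against base_conv_layer.cpp; gemm_nn and grouped_forward_gemm are hypothetical helpers, and the plain triple loop stands in for caffe_cpu_gemm with alpha = 1, beta = 0.

```cpp
#include <cassert>
#include <vector>

// Naive row-major C(MxN) = A(MxK) * B(KxN); overwrites C (beta = 0).
static void gemm_nn(int M, int N, int K, const float* A, const float* B,
                    float* C) {
  for (int m = 0; m < M; ++m)
    for (int n = 0; n < N; ++n) {
      float acc = 0.f;
      for (int k = 0; k < K; ++k) acc += A[m * K + k] * B[k * N + n];
      C[m * N + n] = acc;
    }
}

// Hypothetical sketch of the group loop for one sample.
// weights: Co x ((Ci/G)*Kh*Kw), col: (Ci*Kh*Kw) x (Ho*Wo), out: Co x (Ho*Wo).
void grouped_forward_gemm(const float* weights, const float* col, float* out,
                          int co, int ci, int kernel_area, int out_spatial,
                          int group) {
  const int kernel_dim = ci / group * kernel_area;     // K for one group
  const int weight_offset = co * kernel_dim / group;   // Co/G rows of weights
  const int col_offset = kernel_dim * out_spatial;     // K rows of col
  const int output_offset = co * out_spatial / group;  // Co/G rows of output
  for (int g = 0; g < group; ++g)
    gemm_nn(co / group, out_spatial, kernel_dim, weights + weight_offset * g,
            col + col_offset * g, out + output_offset * g);
}
```

With group = 2, each output channel only sees the col rows of its own half of the input channels, which is why both the weight pointer and the col pointer advance per group.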

3. conv_im2col_cpu

This inline function wraps im2col_cpu (or im2col_nd_cpu for N-D convolution) for convenience:

    inline void conv_im2col_cpu(const Dtype* data, Dtype* col_buff) {
        if (!force_nd_im2col_ && num_spatial_axes_ == 2) {  // ordinary 2-D convolution
            im2col_cpu(data, conv_in_channels_,
                conv_input_shape_.cpu_data()[1], conv_input_shape_.cpu_data()[2],  // Hi, Wi (index 0 is channels)
                kernel_shape_.cpu_data()[0], kernel_shape_.cpu_data()[1],
                pad_.cpu_data()[0], pad_.cpu_data()[1],
                stride_.cpu_data()[0], stride_.cpu_data()[1],
                dilation_.cpu_data()[0], dilation_.cpu_data()[1], col_buff);
        }
        else {  // N-D convolution, or N-D im2col forced
            im2col_nd_cpu(data, num_spatial_axes_, conv_input_shape_.cpu_data(),
                col_buffer_shape_.data(), kernel_shape_.cpu_data(),
                pad_.cpu_data(), stride_.cpu_data(), dilation_.cpu_data(), col_buff);
        }
    }

4. im2col_cpu

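This function converts an image into the column (col) format that convolution needs. In figure a, the solid black box contains the features (or pixels), the dashed region is the zero padding around the border, and the red dashed box is a 3*3 kernel. Take the 7*9 input image in a (its values run from 1 to 63 so that b is easier to read): after one zero is padded on each of the four borders, this function produces the col format shown in b, where the red dashed lines mark the col-format pixels that correspond to the positions in a. In b, "…" stands for five consecutive, increasing values. The matrix in b can be viewed either as a single row vector of length kernel_h*kernel_w*output_h*output_w or as a 2-D (kernel_h*kernel_w)*(output_h*output_w) matrix in which each row has length output_h*output_w; with several input channels, the per-channel blocks are stacked, giving channels*kernel_h*kernel_w rows. Data in this col format can be multiplied with the kernel as an ordinary matrix product, which speeds up the computation.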
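The same layout can be written as an index formula. The notation is mine, read off the im2col_cpu loops: c is the input channel, (k_h, k_w) the kernel tap, (o_h, o_w) the output position, s the stride, p the padding and d the dilation. Row r of the col matrix and column j are

$$
r = (c\,K_h + k_h)\,K_w + k_w, \qquad j = o_h\,W_o + o_w,
$$

$$
\mathrm{col}[r, j] =
\begin{cases}
x[c,\, h,\, w] & 0 \le h < H_i,\ 0 \le w < W_i \\
0 & \text{otherwise}
\end{cases}
\qquad h = o_h s_h - p_h + k_h d_h, \quad w = o_w s_w - p_w + k_w d_w .
$$

The out-of-range branch is exactly where the zero padding appears in figure b.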

The code is below. The output_rows loop corresponds to the range of the blue arrow in b, and the output_col loop corresponds to the orange half-box in b.

    template <typename Dtype>
    void im2col_cpu(const Dtype* data_im, const int channels,  // channels: number of input channels
        const int height, const int width, const int kernel_h, const int kernel_w,
        const int pad_h, const int pad_w,  // zero padding added to the height and width borders
        const int stride_h, const int stride_w,   // strides along height and width
        const int dilation_h, const int dilation_w, // kernel dilation along height and width, default 1; dilation inserts zeros between kernel taps // https://blog.csdn.net/wangyuxi__/article/details/83003357
        Dtype* data_col) {  // (channels*kernel_h*kernel_w)*(output_h*output_w) buffer; each row holds one kernel tap's input value at every output position
        const int output_h = (height + 2 * pad_h - (dilation_h * (kernel_h - 1) + 1)) / stride_h + 1;  // output height and width
        const int output_w = (width + 2 * pad_w - (dilation_w * (kernel_w - 1) + 1)) / stride_w + 1;
        const int channel_size = height * width;  // number of values in one input channel
        for (int channel = channels; channel--; data_im += channel_size)   // after each channel, move the input pointer to the next channel
        {
            for (int kernel_row = 0; kernel_row < kernel_h; kernel_row++)
            {
                for (int kernel_col = 0; kernel_col < kernel_w; kernel_col++)
                {
                    int input_row = -pad_h + kernel_row * dilation_h;  // input row read by this kernel row at the first output row
                    for (int output_rows = output_h; output_rows; output_rows--)  // loop over output rows
                    {
                        if (!is_a_ge_zero_and_a_lt_b(input_row, height))   // a < 0 or a >= b: this row lies in the padding
                        {
                            for (int output_cols = output_w; output_cols; output_cols--)  // fill the whole row with 0
                            {
                                *(data_col++) = 0;
                            }
                        }
                        else {   // this row lies inside the input
                            int input_col = -pad_w + kernel_col * dilation_w;  // input column read by this kernel column at the first output column
                            for (int output_col = output_w; output_col; output_col--)  // loop over output columns
                            {
                                if (is_a_ge_zero_and_a_lt_b(input_col, width))    // this column lies inside the input
                                {
                                    *(data_col++) = data_im[input_row * width + input_col];  // copy the input value into data_col
                                }
                                else   // this column lies in the padding
                                {
                                    *(data_col++) = 0;
                                }
                                input_col += stride_w;  // advance the input column by stride_w
                            }
                        }
                        input_row += stride_h;  // advance the input row by stride_h
                    }
                }
            }
        }
    }
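To check that im2col followed by a single matrix product really is a convolution, the sketch below compares it against a direct sliding-window convolution. It is a simplified, self-contained version under stated assumptions: float data, a square kernel, the same padding and stride on both axes, and no dilation. The names im2col, conv_im2col and conv_direct are hypothetical, and the weights are laid out Co*Ci*k*k as in the weight blob.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compact im2col with the same loop order as im2col_cpu
// (channel -> kernel_row -> kernel_col -> output row -> output col), dilation 1.
std::vector<float> im2col(const std::vector<float>& im, int channels, int height,
                          int width, int kh, int kw, int pad, int stride) {
  const int oh = (height + 2 * pad - kh) / stride + 1;
  const int ow = (width + 2 * pad - kw) / stride + 1;
  std::vector<float> col;
  col.reserve(static_cast<std::size_t>(channels) * kh * kw * oh * ow);
  for (int c = 0; c < channels; ++c)
    for (int r = 0; r < kh; ++r)
      for (int s = 0; s < kw; ++s)
        for (int y = 0; y < oh; ++y)
          for (int x = 0; x < ow; ++x) {
            const int h = y * stride - pad + r, w = x * stride - pad + s;
            const bool inside = h >= 0 && h < height && w >= 0 && w < width;
            col.push_back(inside ? im[(c * height + h) * width + w] : 0.f);
          }
  return col;
}

// Convolution as one product: out(Co x P) = W(Co x K) * col(K x P), K = C*k*k.
std::vector<float> conv_im2col(const std::vector<float>& im,
                               const std::vector<float>& weight, int channels,
                               int height, int width, int co, int k, int pad,
                               int stride) {
  const std::vector<float> col =
      im2col(im, channels, height, width, k, k, pad, stride);
  const int K = channels * k * k;
  const int P = static_cast<int>(col.size()) / K;  // P = Ho*Wo
  std::vector<float> out(static_cast<std::size_t>(co) * P, 0.f);
  for (int m = 0; m < co; ++m)
    for (int kk = 0; kk < K; ++kk)
      for (int p = 0; p < P; ++p)
        out[m * P + p] += weight[m * K + kk] * col[kk * P + p];
  return out;
}

// Direct sliding-window convolution, used as the reference.
std::vector<float> conv_direct(const std::vector<float>& im,
                               const std::vector<float>& weight, int channels,
                               int height, int width, int co, int k, int pad,
                               int stride) {
  const int oh = (height + 2 * pad - k) / stride + 1;
  const int ow = (width + 2 * pad - k) / stride + 1;
  std::vector<float> out(static_cast<std::size_t>(co) * oh * ow, 0.f);
  for (int m = 0; m < co; ++m)
    for (int y = 0; y < oh; ++y)
      for (int x = 0; x < ow; ++x) {
        float acc = 0.f;
        for (int c = 0; c < channels; ++c)
          for (int r = 0; r < k; ++r)
            for (int s = 0; s < k; ++s) {
              const int h = y * stride - pad + r, w = x * stride - pad + s;
              if (h >= 0 && h < height && w >= 0 && w < width)
                acc += weight[((m * channels + c) * k + r) * k + s] *
                       im[(c * height + h) * width + w];
            }
        out[(m * oh + y) * ow + x] = acc;
      }
  return out;
}
```

For a 1-channel 3*3 input holding 1..9 with a 2*2 kernel and no padding, im2col yields the 4*4 matrix whose rows are {1,2,4,5}, {2,3,5,6}, {4,5,7,8}, {5,6,8,9}: one row per kernel tap, one column per output position.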

5. BaseConvolutionLayer<Dtype>::forward_cpu_bias

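This function computes output = 1 * bias (C*1) * bias_multiplier_ (1*(H*W)) + 1 * output, where C is the number of output channels Co, H is the output height Ho and W is the output width Wo. The result is the Co*Ho*Wo output of one sample in the batch. Because bias_multiplier_ is all ones, the product is a rank-1 matrix whose row c is bias[c] repeated Ho*Wo times, so the call adds bias[c] to every pixel of output channel c.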

    template <typename Dtype>
    void BaseConvolutionLayer<Dtype>::forward_cpu_bias(Dtype* output,
        const Dtype* bias) {
        caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, num_output_,   // M = number of output channels Co
            out_spatial_dim_, 1, (Dtype)1., bias, bias_multiplier_.cpu_data(),  // N = Ho*Wo, K = 1
            (Dtype)1., output);   // beta = 1: accumulate onto the existing output
    }

bias_multiplier_ is a 1*(Ho*Wo) vector; BaseConvolutionLayer<Dtype>::Reshape sets all of its values to 1:

    out_spatial_dim_ = top[0]->count(first_spatial_axis);  // Ho*Wo
    if (bias_term_) {
        vector<int> bias_multiplier_shape(1, out_spatial_dim_);
        bias_multiplier_.Reshape(bias_multiplier_shape);
        caffe_set(bias_multiplier_.count(), Dtype(1),  // bias_multiplier_ is a 1*(Ho*Wo) vector of ones
            bias_multiplier_.mutable_cpu_data());
    }
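A minimal sketch of why a K = 1 gemm with an all-ones multiplier is the same as broadcasting the bias over each channel. add_bias_gemm is a hypothetical name, and its explicit loop plays the role of caffe_cpu_gemm with alpha = beta = 1.

```cpp
#include <cassert>
#include <vector>

// output (Co x P) += bias (Co x 1) * ones (1 x P): a gemm with K = 1.
void add_bias_gemm(std::vector<float>& output, const std::vector<float>& bias,
                   int out_spatial) {
  const std::vector<float> multiplier(out_spatial, 1.f);  // like bias_multiplier_
  const int co = static_cast<int>(bias.size());
  for (int m = 0; m < co; ++m)
    for (int p = 0; p < out_spatial; ++p)
      output[m * out_spatial + p] += bias[m] * multiplier[p];  // alpha = beta = 1
}
```

Using gemm here keeps the bias on the same BLAS path as the convolution itself; a plain per-channel add would give the same numbers.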

6. caffe_cpu_gemm

For float, this function calls cblas_sgemm to compute C = alpha * op(A) * op(B) + beta * C on row-major matrices:

    template<>
    void caffe_cpu_gemm<float>(const CBLAS_TRANSPOSE TransA,
        const CBLAS_TRANSPOSE TransB, const int M, const int N, const int K,
        const float alpha, const float* A, const float* B, const float beta,
        float* C) {
      int lda = (TransA == CblasNoTrans) ? K : M;  // row length of A as stored (row-major)
      int ldb = (TransB == CblasNoTrans) ? N : K;  // row length of B as stored
      cblas_sgemm(CblasRowMajor, TransA, TransB, M, N, K, alpha, A, lda, B,
          ldb, beta, C, N);                          // ldc = N
    }

For details on cblas_sgemm, see: http://www.cnblogs.com/darkknightzh/p/5553336.html
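To see what lda and ldb mean for row-major storage, here is a naive reference gemm that picks them exactly as caffe_cpu_gemm does. The Op enum and ref_gemm are hypothetical stand-ins for CBLAS_TRANSPOSE and cblas_sgemm, meant for checking shapes and transposes rather than for speed.

```cpp
#include <cassert>
#include <vector>

enum class Op { NoTrans, Trans };  // hypothetical stand-in for CBLAS_TRANSPOSE

// Row-major reference: C(MxN) = alpha * op(A)(MxK) * op(B)(KxN) + beta * C.
// lda/ldb are the stored row lengths, chosen as in caffe_cpu_gemm; ldc = N.
void ref_gemm(Op ta, Op tb, int M, int N, int K, float alpha, const float* A,
              const float* B, float beta, float* C) {
  const int lda = (ta == Op::NoTrans) ? K : M;  // A stored MxK, or KxM if transposed
  const int ldb = (tb == Op::NoTrans) ? N : K;  // B stored KxN, or NxK if transposed
  for (int i = 0; i < M; ++i)
    for (int j = 0; j < N; ++j) {
      float acc = 0.f;
      for (int k = 0; k < K; ++k) {
        const float a = (ta == Op::NoTrans) ? A[i * lda + k] : A[k * lda + i];
        const float b = (tb == Op::NoTrans) ? B[k * ldb + j] : B[j * ldb + k];
        acc += a * b;
      }
      C[i * N + j] = alpha * acc + beta * C[i * N + j];
    }
}
```

In the convolution calls above both operands are CblasNoTrans, so lda = K = kernel_dim_ and ldb = N = Ho*Wo, which are simply the row lengths of the weight matrix and of col_buff.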
