Training LeNet on MNIST with Caffe

We will assume that you have Caffe successfully compiled. If not, please refer to the Installation page. In this tutorial, we will assume that your Caffe installation is located at CAFFE_ROOT.

Prepare Datasets

You will first need to download and convert the data format from the MNIST website. To do this, simply run the following commands:

cd $CAFFE_ROOT

./data/mnist/get_mnist.sh

./examples/mnist/create_mnist.sh

If it complains that wget or gunzip are not installed, you need to install them respectively. After running the script there should be two datasets, mnist_train_lmdb, and mnist_test_lmdb.

LeNet: the MNIST Classification Model

Before we actually run the training program, let’s explain what will happen. We will use the LeNetnetwork, which is known to work well on digit classification tasks. We will use a slightly different version from the original LeNet implementation, replacing the sigmoid activations with Rectified Linear Unit (ReLU) activations for the neurons.

The design of LeNet contains the essence of CNNs that are still used in larger models such as the ones in ImageNet. In general, it consists of a convolutional layer followed by a pooling layer, another convolution layer followed by a pooling layer, and then two fully connected layers similar to the conventional multilayer perceptrons. We have defined the layers in$CAFFE_ROOT/examples/mnist/lenet_train_test.prototxt.

Define the MNIST Network

This section explains the lenet_train_test.prototxt model definition that specifies the LeNet model for MNIST handwritten digit classification. We assume that you are familiar with Google Protobuf, and assume that you have read the protobuf definitions used by Caffe, which can be found at$CAFFE_ROOT/src/caffe/proto/caffe.proto.

Specifically, we will write a caffe::NetParameter (or in python, caffe.proto.caffe_pb2.NetParameter) protobuf. We will start by giving the network a name:

name: "LeNet"

Writing the Data Layer

Currently, we will read the MNIST data from the lmdb we created earlier in the demo. This is defined by a data layer:

layer {

  name: "mnist"

  type: "Data"

  data_param {

    source: "mnist_train_lmdb"

    backend: LMDB

    batch_size: 64

    scale: 0.00390625

  }

  top: "data"

  top: "label"

}

Specifically, this layer has name mnist, type data, and it reads the data from the given lmdb source. We will use a batch size of 64, and scale the incoming pixels so that they are in the range [0,1). Why 0.00390625? It is 1 divided by 256. And finally, this layer produces two blobs, one is the data blob, and one is the label blob.

Writing the Convolution Layer

Let’s define the first convolution layer:

layer {

  name: "conv1"

  type: "Convolution"

  param { lr_mult: 1 }

  param { lr_mult: 2 }

  convolution_param {

    num_output: 20

    kernel_size: 5

    stride: 1

    weight_filler {

      type: "xavier"

    }

    bias_filler {

      type: "constant"

    }

  }

  bottom: "data"

  top: "conv1"

}

This layer takes the data blob (it is provided by the data layer), and produces the conv1 layer. It produces outputs of 20 channels, with the convolutional kernel size 5 and carried out with stride 1.

The fillers allow us to randomly initialize the value of the weights and bias. For the weight filler, we will use the xavier algorithm that automatically determines the scale of initialization based on the number of input and output neurons. For the bias filler, we will simply initialize it as constant, with the default filling value 0.

lr_mults are the learning rate adjustments for the layer’s learnable parameters. In this case, we will set the weight learning rate to be the same as the learning rate given by the solver during runtime, and the bias learning rate to be twice as large as that - this usually leads to better convergence rates.

Writing the Pooling Layer

Phew. Pooling layers are actually much easier to define:

layer {

  name: "pool1"

  type: "Pooling"

  pooling_param {

    kernel_size: 2

    stride: 2

    pool: MAX

  }

  bottom: "conv1"

  top: "pool1"

}

This says we will perform max pooling with a pool kernel size 2 and a stride of 2 (so no overlapping between neighboring pooling regions).

Similarly, you can write up the second convolution and pooling layers. Check$CAFFE_ROOT/examples/mnist/lenet_train_test.prototxt for details.

Writing the Fully Connected Layer

Writing a fully connected layer is also simple:

layer {

  name: "ip1"

  type: "InnerProduct"

  param { lr_mult: 1 }

  param { lr_mult: 2 }

  inner_product_param {

    num_output: 500

    weight_filler {

      type: "xavier"

    }

    bias_filler {

      type: "constant"

    }

  }

  bottom: "pool2"

  top: "ip1"

}

This defines a fully connected layer (known in Caffe as an InnerProduct layer) with 500 outputs. All other lines look familiar, right?

Writing the ReLU Layer

A ReLU Layer is also simple:

layer {

  name: "relu1"

  type: "ReLU"

  bottom: "ip1"

  top: "ip1"

}

Since ReLU is an element-wise operation, we can do in-place operations to save some memory. This is achieved by simply giving the same name to the bottom and top blobs. Of course, do NOT use duplicated blob names for other layer types!

After the ReLU layer, we will write another innerproduct layer:

layer {

  name: "ip2"

  type: "InnerProduct"

  param { lr_mult: 1 }

  param { lr_mult: 2 }

  inner_product_param {

    num_output: 10

    weight_filler {

      type: "xavier"

    }

    bias_filler {

      type: "constant"

    }

  }

  bottom: "ip1"

  top: "ip2"

}

Writing the Loss Layer

Finally, we will write the loss!

layer {

  name: "loss"

  type: "SoftmaxWithLoss"

  bottom: "ip2"

  bottom: "label"

}

The softmax_loss layer implements both the softmax and the multinomial logistic loss (that saves time and improves numerical stability). It takes two blobs, the first one being the prediction and the second one being the label provided by the data layer (remember it?). It does not produce any outputs - all it does is to compute the loss function value, report it when backpropagation starts, and initiates the gradient with respect to ip2. This is where all magic starts.

Additional Notes: Writing Layer Rules

Layer definitions can include rules for whether and when they are included in the network definition, like the one below:

layer {

  // ...layer definition...

  include: { phase: TRAIN }

}

This is a rule, which controls layer inclusion in the network, based on current network’s state. You can refer to $CAFFE_ROOT/src/caffe/proto/caffe.proto for more information about layer rules and model schema.

In the above example, this layer will be included only in TRAIN phase. If we change TRAIN with TEST, then this layer will be used only in test phase. By default, that is without layer rules, a layer is always included in the network. Thus, lenet_train_test.prototxt has two DATA layers defined (with differentbatch_size), one for the training phase and one for the testing phase. Also, there is an Accuracy layer which is included only in TEST phase for reporting the model accuracy every 100 iteration, as defined in lenet_solver.prototxt.

Define the MNIST Solver

Check out the comments explaining each line in the prototxt$CAFFE_ROOT/examples/mnist/lenet_solver.prototxt:

# The train/test net protocol buffer definition

net: "examples/mnist/lenet_train_test.prototxt"

# test_iter specifies how many forward passes the test should carry out.

# In the case of MNIST, we have test batch size 100 and 100 test iterations,

# covering the full 10,000 testing images.

test_iter: 100

# Carry out testing every 500 training iterations.

test_interval: 500

# The base learning rate, momentum and the weight decay of the network.

base_lr: 0.01

momentum: 0.9

weight_decay: 0.0005

# The learning rate policy

lr_policy: "inv"

gamma: 0.0001

power: 0.75

# Display every 100 iterations

display: 100

# The maximum number of iterations

max_iter: 10000

# snapshot intermediate results

snapshot: 5000

snapshot_prefix: "examples/mnist/lenet"

# solver mode: CPU or GPU

solver_mode: GPU

Training and Testing the Model

Training the model is simple after you have written the network definition protobuf and solver protobuf files. Simply run train_lenet.sh, or the following command directly:

cd $CAFFE_ROOT

./examples/mnist/train_lenet.sh

train_lenet.sh is a simple script, but here is a quick explanation: the main tool for training is caffewith action train and the solver protobuf text file as its argument.

When you run the code, you will see a lot of messages flying by like this:

I1203 net.cpp:66] Creating Layer conv1

I1203 net.cpp:76] conv1 <- data

I1203 net.cpp:101] conv1 -> conv1

I1203 net.cpp:116] Top shape: 20 24 24

I1203 net.cpp:127] conv1 needs backward computation.

These messages tell you the details about each layer, its connections and its output shape, which may be helpful in debugging. After the initialization, the training will start:

I1203 net.cpp:142] Network initialization done.

I1203 solver.cpp:36] Solver scaffolding done.

I1203 solver.cpp:44] Solving LeNet

Based on the solver setting, we will print the training loss function every 100 iterations, and test the network every 1000 iterations. You will see messages like this:

I1203 solver.cpp:204] Iteration 100, lr = 0.00992565

I1203 solver.cpp:66] Iteration 100, loss = 0.26044

...

I1203 solver.cpp:84] Testing net

I1203 solver.cpp:111] Test score #0: 0.9785

I1203 solver.cpp:111] Test score #1: 0.0606671

For each training iteration, lr is the learning rate of that iteration, and loss is the training function. For the output of the testing phase, score 0 is the accuracy, and score 1 is the testing loss function.

And after a few minutes, you are done!

I1203 solver.cpp:84] Testing net

I1203 solver.cpp:111] Test score #0: 0.9897

I1203 solver.cpp:111] Test score #1: 0.0324599

I1203 solver.cpp:126] Snapshotting to lenet_iter_10000

I1203 solver.cpp:133] Snapshotting solver state to lenet_iter_10000.solverstate

I1203 solver.cpp:78] Optimization Done.

The final model, stored as a binary protobuf file, is stored at

lenet_iter_10000

which you can deploy as a trained model in your application, if you are training on a real-world application dataset.

Um… How about GPU training?

You just did! All the training was carried out on the GPU. In fact, if you would like to do training on CPU, you can simply change one line in lenet_solver.prototxt:

# solver mode: CPU or GPU

solver_mode: CPU

and you will be using CPU for training. Isn’t that easy?

MNIST is a small dataset, so training with GPU does not really introduce too much benefit due to communication overheads. On larger datasets with more complex models, such as ImageNet, the computation speed difference will be more significant.

How to reduce the learning rate at fixed steps?

Look at lenet_multistep_solver.prototxt

【Caffe 测试】Training LeNet on MNIST with Caffe的更多相关文章

windows下使用caffe测试mnist数据集
在win10机子上装了caffe,感谢大神们的帖子,要入坑caffe-windows的朋友们看这里,还有这里,安装下来基本没什么问题. 好了,本博文写一下使用caffe测试mnist数据集的步骤. 1 ...
基于LeNet的手写汉字识别(caffe)
我假设已经成功编译caffe,如果没有,请参考http://caffe.berkeleyvision.org/installation.html 在本教程中,我假设你的caffe安装目录是CAFFE_ ...
Caffe介绍与测试及相关Hi35xx平台下caffe yolox的使用参考
这一篇我大概讲讲Caffe框架下MNIST的实现与基于Hi35xx平台下caffe yolox的运用等,供大家参考 1.Caffe介绍与测试 caffe全称Caffe Convolutional Ar ...
Caffe测试单独的算子
最近有一个需求是测试单独算子在CPU.Caffe使用的GPU.cuDNN上的性能,一个是使用caffe的time问题,还有一个是使用单独的test功能. time选项的使用,大家都比较熟悉,单独的te ...
Caffe学习笔记1--Ubuntu 14.04 64bit caffe安装
本篇博客主要用于记录Ubuntu 14.04 64bit操作系统搭建caffe环境,目前针对的的是CPU版本: 1.安装依赖库 sudo apt-get install libprotobuf-dev ...
LeNet训练MNIST
jupyter notebook: https://github.com/Penn000/NN/blob/master/notebook/LeNet/LeNet.ipynb LeNet训练MNIST ...
Caffe学习笔记（三）：Caffe数据是如何输入和输出的？
Caffe学习笔记(三):Caffe数据是如何输入和输出的? Caffe中的数据流以Blobs进行传输,在<Caffe学习笔记(一):Caffe架构及其模型解析>中已经对Blobs进行了简 ...
Caffe学习笔记（一）：Caffe架构及其模型解析
Caffe学习笔记(一):Caffe架构及其模型解析写在前面:关于caffe平台如何快速搭建以及如何在caffe上进行训练与预测,请参见前面的文章<caffe平台快速搭建:caffe+wind ...
Caffe学习笔记（二）：Caffe前传与反传、损失函数、调优
Caffe学习笔记(二):Caffe前传与反传.损失函数.调优在caffe框架中,前传/反传(forward and backward)是一个网络中最重要的计算过程:损失函数(loss)是学习的驱动 ...

随机推荐

js生成随机字符串或者随机数
//返回一个指定范围内的随机数 function createRandomNum(Min,Max){ let Range = Max - Min; let Rand = Math.random(); ...
javascript 事件流及应用
当页面元素触发事件的时候,该元素的容器以及整个页面都会按照特定顺序发生该元素的触发事件,事件传播的顺序叫做事件流 1.事件流的分类: A.冒泡型事件(所有浏览器都支持) 由明确的事件源到最不确定 ...
Spring-Boot-XML-Restful-Service
http://docs.spring.io/spring-boot/docs/current/reference/htmlsingle/#howto-write-an-xml-rest-service ...
PDO操作mysql数据库(二)
从 MySQL 数据库读取数据 <?php $server = "localhost"; $user = "root"; $pwd = "123 ...
使用reinterpret_cast的危险
关键字: c++ cast // Cast.cpp : Defines the entry point for the console application. // #include "s ...
不建议使用NSUserDefault存储大量数据
NSUserDefaults 其实是一个 plist 文件(有待验证),即使只是修改一个 key 都会 load 整个文件,不适合存储大量数据. NSUserDefaults是保存成文本格式的,容易被 ...
C语言写解一元二次方程程序心得
前言:在网上看到不少解一元二次方程的小程序,在使用时总得出一大堆小数,感觉很不爽,遂自己重新写了一遍. 首先,先回忆一下一元二次方程的求根公式: 分别读取二次项.一次项和常数项系数并且求出delta ...
Jquery效果代码--（二）
//jQuery 效果- 隐藏和显示.通过 jQuery,您可以使用 hide() 和 show() 方法来隐藏和显示 HTML 元素: //掩藏效果演示: $(document).ready(fun ...
把内表 itab1 的 n1 到 n2 行内容附加到 itab2 内表中去.
语法:append lines of itab1 [ from n1 ] [ to n2 ] to itab2. DATA:BEGIN OF gt_00 OCCURS 0, l_01 ...
statspack系列6
原文:http://jonathanlewis.wordpress.com/2006/12/27/analysing-statspack-6/ 作者:Jonathan Lewis 下面是一段时间以前网 ...

【Caffe 测试】Training LeNet on MNIST with Caffe