mxnet设置动态学习率（learning rate）

https://blog.csdn.net/xiaotao_1/article/details/78874336

如果learning rate很大，算法会在局部最优点附近来回跳动，不会收敛；
　　如果learning rate太小，算法每步的移动距离很短，就会导致算法收敛速度很慢。
　　所以我们可以先设置一个比较大的学习率，随着迭代次数的增加慢慢降低它。mxnet中有现成的类class，我们可以直接引用。
　　这里有三种mxnet.lr_scheduler。
　　第一种是：

mxnet.lr_scheduler.FactorScheduler(step, factor=1, stop_factor_lr=1e-08)
# Reduce the learning rate by a factor for every n steps.
# It returns a new learning rate by:
base_lr * pow(factor, floor(num_update/step))

# Parameters:
step (int) – Changes the learning rate for every n updates.
factor (float, optional) – The factor to change the learning rate.
stop_factor_lr (float, optional) – Stop updating the learning rate if it is less than this value.
1
2
3
4
5
6
7
8
9
　　例如：

lr_sch = mxnet.lr_scheduler.FactorScheduler(step=500, factor=0.9)
model.fit(
train_iter,
eval_data=val_iter,
optimizer='sgd',
optimizer_params={'learning_rate': 0.1, 'lr_scheduler': lr_sch},
eval_metric=metric,
num_epoch=num_epoch,
1
2
3
4
5
6
7
8
　　这里就表示：初始学习率是0.1 。经过500次参数更新后，学习率变为0.1×0.90.1×0.9。经过1000次参数更新之后，学习率变为0.1×0.9×0.90.1×0.9×0.9
　　第二种是：

class mxnet.lr_scheduler.LRScheduler(base_lr=0.01)
# Base class of a learning rate scheduler.
# A scheduler returns a new learning rate based on the number of updates that have been performed.
Parameters: base_lr (float, optional) – The initial learning rate.

__call__(num_update)
# Return a new learning rate.
# The num_update is the upper bound of the number of updates applied to every weight.
# Assume the optimizer has updated i-th weight by k_i times, namely optimizer.update(i, weight_i) is called by k_i times. Then:
num_update = max([k_i for all i])
Parameters: num_update (int) – the maximal number of updates applied to a weight.
1
2
3
4
5
6
7
8
9
10
11
　　第三种是：

class mxnet.lr_scheduler.MultiFactorScheduler(step, factor=1)
# Reduce the learning rate by given a list of steps.
# Assume there exists k such that:
step[k] <= num_update and num_update < step[k+1]

# Then calculate the new learning rate by:
base_lr * pow(factor, k+1)
# Parameters:
step (list of int) – The list of steps to schedule a change
factor (float) – The factor to change the learning rate.
1
2
3
4
5
6
7
8
9
10
11
参考：https://mxnet.incubator.apache.org/api/python/optimization/optimization.html#mxnet.lr_scheduler.LRScheduler
---------------------
作者：xiaotao_1
来源：CSDN
原文：https://blog.csdn.net/xiaotao_1/article/details/78874336
版权声明：本文为博主原创文章，转载请附上博文链接！

mxnet设置动态学习率（learning rate）的更多相关文章

深度学习: 学习率 (learning rate)
Introduction 学习率 (learning rate),控制模型的学习进度 : lr 即 stride (步长) ,即反向传播算法中的 ηη : ωn←ωn−η∂L∂ωnωn←ωn−η∂ ...
学习率(Learning rate)的理解以及如何调整学习率
1. 什么是学习率(Learning rate)? 学习率(Learning rate)作为监督学习以及深度学习中重要的超参,其决定着目标函数能否收敛到局部最小值以及何时收敛到最小值.合适的学习率 ...
学习率 Learning Rate
本文从梯度学习算法的角度中看学习率对于学习算法性能的影响,以及介绍如何调整学习率的一般经验和技巧. 在机器学习中,监督式学习(Supervised Learning)通过定义一个模型,并根据训练集上的 ...
Dynamic learning rate in training - 培训中的动态学习率
I'm using keras 2.1.* and want to change the learning rate during training. I know about the schedul ...
权重衰减（weight decay）与学习率衰减（learning rate decay）
本文链接:https://blog.csdn.net/program_developer/article/details/80867468“微信公众号” 1. 权重衰减(weight decay)L2 ...
Keras 自适应Learning Rate (LearningRateScheduler)
When training deep neural networks, it is often useful to reduce learning rate as the training progr ...
跟我学算法-吴恩达老师(mini-batchsize，指数加权平均，Momentum 梯度下降法，RMS prop， Adam 优化算法， Learning rate decay)
1.mini-batch size 表示每次都只筛选一部分作为训练的样本,进行训练,遍历一次样本的次数为(样本数/单次样本数目) 当mini-batch size 的数量通常介于1,m 之间当 ...
TensorFlow使用记录 (三）： Learning Rate Scheduling
file: tensorflow/python/training/learning_rate_decay.py 参考:tensorflow中常用学习率更新策略神经网络中通过超参数 learning ...
Batchsize与learning rate
https://www.zhihu.com/question/64134994 1.增加batch size会使得梯度更准确,但也会导致variance变小,可能会使模型陷入局部最优: 2.因此增大b ...

随机推荐

【LeetCode每天一题】Combination Sum II（组合和II）
Given a collection of candidate numbers (candidates) and a target number (target), find all unique c ...
开启Laravel之旅的标准姿势
1.github下载最新的laravel https://github.com/laravel/laravel 2.下载到本地,改名,composer install,安装项目的依赖包 compose ...
numpy.meshgrid()
numpy提供的numpy.meshgrid()函数可以让我们快速生成坐标矩阵X,Y 语法:X,Y = numpy.meshgrid(x, y)输入:x,y,就是网格点的横纵坐标列向量(非矩阵)输出: ...
JavaScript原型继承的实例
// 创建构造函数实例(获取DOM节点) <div id="app">测试字符</div>
C#学习入门第二篇
4.转义字符\b 退格符\n 换行\r 回车,移到本行开头\t 水平制表符\\ 代表反斜线字符“\“\' 代表一个单引号字符@字在字符串前面表示 ...
C#6.0中10大新特性的应用和总结
微软发布C#6.0.VS2015等系列产品也有一段时间了,但是网上的教程却不多,这里真对C#6.0给大家做了一些示例,分享给大家. 微软于2015年7月21日发布了Visual Studio 20 ...
PB中datewindow单双行显示不同颜色
调出datewindow,找到detail中的列,右击properties,左侧Background中的color属性添加 IF(MOD(GETROW(),2)=0,RGB( 255, 250, 20 ...
tcl脚本
tcl,全名tool command language,是一种通用的工具语言. 1)每个命令之间,通过换行符或者分号隔开: 2)tcl的每个命令包含一个或者多个单词,默认第一个单词表示命令,第二个单词 ...
websocket服务器握手协议
测试网页代码如下 <!DOCTYPE html> <html> <head> <title>测试 websocket 世界最简单案例</title ...
HDU1007.Quoit Design
-- 点我 -- 题目大意 :给你一堆点,求一个最小圆能够覆盖两个点的半径(最近两点距离的一半): 最多100000个点,暴力即O(n^2)会超时,考虑二分,先求左边最短距离dl,右边dr, 和一个点 ...

mxnet设置动态学习率（learning rate）

mxnet设置动态学习率（learning rate）的更多相关文章

随机推荐

热门专题