https://blog.csdn.net/xiaotao_1/article/details/78874336

如果learning rate很大,算法会在局部最优点附近来回跳动,不会收敛;
  如果learning rate太小,算法每步的移动距离很短,就会导致算法收敛速度很慢。
  所以我们可以先设置一个比较大的学习率,随着迭代次数的增加慢慢降低它。mxnet中有现成的类class,我们可以直接引用。
  这里有三种mxnet.lr_scheduler。
  第一种是:

mxnet.lr_scheduler.FactorScheduler(step, factor=1, stop_factor_lr=1e-08)
# Reduce the learning rate by a factor for every n steps.
# It returns a new learning rate by:
base_lr * pow(factor, floor(num_update/step))

# Parameters:
step (int) – Changes the learning rate for every n updates.
factor (float, optional) – The factor to change the learning rate.
stop_factor_lr (float, optional) – Stop updating the learning rate if it is less than this value.
1
2
3
4
5
6
7
8
9
  例如:

lr_sch = mxnet.lr_scheduler.FactorScheduler(step=500, factor=0.9)
model.fit(
train_iter,
eval_data=val_iter,
optimizer='sgd',
optimizer_params={'learning_rate': 0.1, 'lr_scheduler': lr_sch},
eval_metric=metric,
num_epoch=num_epoch,
1
2
3
4
5
6
7
8
  这里就表示:初始学习率是0.1 。经过500次参数更新后,学习率变为0.1×0.90.1×0.9。经过1000次参数更新之后,学习率变为0.1×0.9×0.90.1×0.9×0.9
  第二种是:

class mxnet.lr_scheduler.LRScheduler(base_lr=0.01)
# Base class of a learning rate scheduler.
# A scheduler returns a new learning rate based on the number of updates that have been performed.
Parameters: base_lr (float, optional) – The initial learning rate.

__call__(num_update)
# Return a new learning rate.
# The num_update is the upper bound of the number of updates applied to every weight.
# Assume the optimizer has updated i-th weight by k_i times, namely optimizer.update(i, weight_i) is called by k_i times. Then:
num_update = max([k_i for all i])
Parameters: num_update (int) – the maximal number of updates applied to a weight.
1
2
3
4
5
6
7
8
9
10
11
  第三种是:

class mxnet.lr_scheduler.MultiFactorScheduler(step, factor=1)
# Reduce the learning rate by given a list of steps.
# Assume there exists k such that:
step[k] <= num_update and num_update < step[k+1]

# Then calculate the new learning rate by:
base_lr * pow(factor, k+1)
# Parameters:
step (list of int) – The list of steps to schedule a change
factor (float) – The factor to change the learning rate.
1
2
3
4
5
6
7
8
9
10
11
参考:https://mxnet.incubator.apache.org/api/python/optimization/optimization.html#mxnet.lr_scheduler.LRScheduler
---------------------
作者:xiaotao_1
来源:CSDN
原文:https://blog.csdn.net/xiaotao_1/article/details/78874336
版权声明:本文为博主原创文章,转载请附上博文链接!

mxnet设置动态学习率(learning rate)的更多相关文章

  1. 深度学习: 学习率 (learning rate)

    Introduction 学习率 (learning rate),控制 模型的 学习进度 : lr 即 stride (步长) ,即反向传播算法中的 ηη : ωn←ωn−η∂L∂ωnωn←ωn−η∂ ...

  2. 学习率(Learning rate)的理解以及如何调整学习率

    1. 什么是学习率(Learning rate)?   学习率(Learning rate)作为监督学习以及深度学习中重要的超参,其决定着目标函数能否收敛到局部最小值以及何时收敛到最小值.合适的学习率 ...

  3. 学习率 Learning Rate

    本文从梯度学习算法的角度中看学习率对于学习算法性能的影响,以及介绍如何调整学习率的一般经验和技巧. 在机器学习中,监督式学习(Supervised Learning)通过定义一个模型,并根据训练集上的 ...

  4. Dynamic learning rate in training - 培训中的动态学习率

    I'm using keras 2.1.* and want to change the learning rate during training. I know about the schedul ...

  5. 权重衰减(weight decay)与学习率衰减(learning rate decay)

    本文链接:https://blog.csdn.net/program_developer/article/details/80867468“微信公众号” 1. 权重衰减(weight decay)L2 ...

  6. Keras 自适应Learning Rate (LearningRateScheduler)

    When training deep neural networks, it is often useful to reduce learning rate as the training progr ...

  7. 跟我学算法-吴恩达老师(mini-batchsize,指数加权平均,Momentum 梯度下降法,RMS prop, Adam 优化算法, Learning rate decay)

    1.mini-batch size 表示每次都只筛选一部分作为训练的样本,进行训练,遍历一次样本的次数为(样本数/单次样本数目) 当mini-batch size 的数量通常介于1,m 之间    当 ...

  8. TensorFlow使用记录 (三): Learning Rate Scheduling

    file: tensorflow/python/training/learning_rate_decay.py 参考:tensorflow中常用学习率更新策略 神经网络中通过超参数 learning ...

  9. Batchsize与learning rate

    https://www.zhihu.com/question/64134994 1.增加batch size会使得梯度更准确,但也会导致variance变小,可能会使模型陷入局部最优: 2.因此增大b ...

随机推荐

  1. SOA架构父工程的pom配置

    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/20 ...

  2. 【Java】-NO.20.Exam.1.Java.1.001- 【1z0-807】- OCEA

    1.0.0 Summary Tittle:[Java]-NO.20.Exam.1.Java.1.001-[1z0-807] Style:EBook Series:Java Since:2017-10- ...

  3. socket发送http报文的疑惑(求高手指点一二)

    给8080或80端口的服务端(自己写的serverSocket服务端)发送字符串,此字符串按照http协议拼接而成,既是所谓的http报文.服务端接受成功.如果在报头与消息体之间少了“\r\n\r\n ...

  4. [LeetCode] 613. Shortest Distance in a Line_Easy tag: SQL

    Table point holds the x coordinate of some points on x-axis in a plane, which are all integers. Writ ...

  5. Amazon RDS多区域高可用测试

    最近在AWS上面需要部署一组多区域RDS集群,AWS的多区域简单理解就是RDS一主一从分别在当地的两个机房(两个区域).所以就有了下面各方面的测试. 我们需要测试什么? Primary挂掉时,Seco ...

  6. Mybatis select、insert、update、delete 增删改查操作

    MyBatis 是支持普通 SQL 查询,存储过程和高级映射的优秀持久层框架. MyBatis 消除了几乎所有的 JDBC 代码和参数的手工设置以及对结果集的检索.MyBatis 可以使用简单的XML ...

  7. PO模型

    大神绕道而行,自我小白的笔记,仅此 一.创建文件夹,创建xxx.ini文件用来存放界面的定位元素,用 [界面_element]-->界面, 来划分界面界面元素,维护方便.定位元素的格式:  us ...

  8. 如何解决“504 Gateway Time-out”错误

    做网站的同学经常会发现一些nginx服务器访问时候提示504 Gateway Time-out错误,一般情况下是由nginx默认的fastcgi进程响应慢引起的,但也有其他情况,这里我总结了一些解决办 ...

  9. 008-副文本编辑器UEditor

    <!DOCTYPE HTML> <html lang="en-US"> <head> <meta charset="UTF-8& ...

  10. Android -- 自定义ViewGroup实现FlowLayout效果

    1,在开发的时候,常在我们的需求中会有这种效果,添加一个商品的一些热门标签,效果图如下: 2,从上面效果可以看得出来,这是一个自定义的ViewGroup,然后实现换行效果,让我们一起来实现一下 自定义 ...