Piecewise constant decay

Piecewise constant decay assigns a different constant learning rate to each of a set of predefined training-step intervals: the rate starts relatively large and becomes smaller interval by interval. The interval boundaries should be tuned to the amount of training data; in general, the larger the dataset, the smaller the intervals should be. TensorFlow provides tf.train.piecewise_constant to implement piecewise constant learning-rate decay.
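
A minimal TF 1.x sketch of piecewise constant decay; the step boundaries and learning-rate values below are illustrative, not taken from the original post:

import tensorflow as tf

global_step = tf.Variable(0, trainable=False)
boundaries = [100000, 110000]   # step boundaries between the intervals
values = [1.0, 0.5, 0.1]        # learning rate used in each interval
learning_rate = tf.train.piecewise_constant(global_step, boundaries, values)
# Pass global_step to minimize() later so that it is incremented automatically.
optimizer = tf.train.GradientDescentOptimizer(learning_rate)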

Exponential decay

Exponential decay is one of the most commonly used schedules: the learning rate decreases exponentially with the current training step. The TensorFlow function implementing it is tf.train.exponential_decay().

TensorFlow's exponential decay is a very flexible way to set the learning rate. It resolves the usual trade-off between a learning rate that is too large and one that is too small: start with a relatively large rate to quickly reach a reasonably good set of parameters, then gradually shrink the rate as training proceeds, so the parameters settle close to the optimum without requiring an excessive number of iterations. The exponential_decay function reduces the learning rate exponentially according to the following formula:

decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)

In this formula, decayed_learning_rate is the learning rate actually used at the current step, learning_rate is the initial learning rate, decay_rate is the decay factor, and decay_steps controls over how many steps one full decay factor is applied. As global_step increases, the learning rate falls steadily.

tf.train.exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False, name=None)

learning_rate: a scalar float32 or float64 Tensor or a Python number; the initial learning rate.

global_step: a scalar int32 or int64 Tensor or a Python number; the current step used in the decay computation. Must not be negative.

decay_steps: a scalar int32 or int64 Tensor or a Python number; must be positive. The number of steps over which the decay is applied.

decay_rate: a scalar float32 or float64 Tensor or a Python number; the decay rate.

staircase: Boolean, default False. If False the learning rate decays continuously; if True, global_step / decay_steps is an integer division, so the learning rate drops in discrete steps.
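
Putting the arguments together, a minimal usage sketch (the numbers are illustrative and `loss` is assumed to be an existing loss tensor):

import tensorflow as tf

global_step = tf.Variable(0, trainable=False)
# Start at 0.1 and multiply by 0.96 every 10000 steps (staircase=True).
learning_rate = tf.train.exponential_decay(learning_rate=0.1,
                                           global_step=global_step,
                                           decay_steps=10000,
                                           decay_rate=0.96,
                                           staircase=True)
train_op = tf.train.GradientDescentOptimizer(learning_rate) \
                   .minimize(loss, global_step=global_step)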

Natural exponential decay

Natural exponential decay is a special case of exponential decay: the learning rate still decays exponentially with the training step, but with base e. The TensorFlow function is tf.train.natural_exp_decay().
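
Per the TF 1.x documentation, its schedule and signature mirror exponential_decay, with e as the base; the call below is a sketch with illustrative values:

decayed_learning_rate = learning_rate * exp(-decay_rate * global_step / decay_steps)

import tensorflow as tf

global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.natural_exp_decay(learning_rate=0.1,
                                           global_step=global_step,
                                           decay_steps=10000,
                                           decay_rate=0.5,
                                           staircase=False)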

Polynomial decay

Polynomial decay works as follows: you define an initial learning rate and a final (minimum) learning rate, and the learning rate is lowered from the initial value to the final value according to the configured polynomial schedule. You can also choose what happens once the minimum is reached: either keep using that minimum learning rate, or raise the learning rate again to some value and decay it back down to the minimum, repeating this cycle. The TensorFlow function is tf.train.polynomial_decay(); its schedule is:

global_step = min(global_step, decay_steps)
decayed_learning_rate = (learning_rate - end_learning_rate) *
                        (1 - global_step / decay_steps) ^ (power) +
                        end_learning_rate
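
A usage sketch with illustrative values; setting cycle=True gives the "decay, rise again, decay back" behavior described above:

import tensorflow as tf

global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.polynomial_decay(learning_rate=0.1,
                                          global_step=global_step,
                                          decay_steps=10000,
                                          end_learning_rate=0.0001,
                                          power=1.0,    # 1.0 means linear decay
                                          cycle=False)  # True restarts the decay after decay_steps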

Cosine decay

Cosine decay follows the cosine function, so the learning-rate curve is roughly cosine-shaped. The TensorFlow implementation is tf.train.cosine_decay().
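
For reference, the basic tf.train.cosine_decay schedule in TF 1.x is the following, where alpha acts as a floor so the learning rate never falls below alpha * learning_rate; the call is a sketch with illustrative values:

global_step = min(global_step, decay_steps)
cosine_decay = 0.5 * (1 + cos(pi * global_step / decay_steps))
decayed_learning_rate = learning_rate * ((1 - alpha) * cosine_decay + alpha)

import tensorflow as tf

global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.cosine_decay(learning_rate=0.1,
                                      global_step=global_step,
                                      decay_steps=10000,
                                      alpha=0.0)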

Improved variants of cosine decay include:
linear cosine decay, implemented by tf.train.linear_cosine_decay()
noisy linear cosine decay, implemented by tf.train.noisy_linear_cosine_decay()

Inverse time decay

Inverse time decay makes one quantity inversely proportional to another; applied to neural networks, the learning rate is roughly inversely proportional to the number of training steps.

The TensorFlow function implementing inverse time decay is tf.train.inverse_time_decay().
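
Per the TF 1.x docs its schedule (with staircase=False) is the one below; the call is a sketch with illustrative values:

decayed_learning_rate = learning_rate / (1 + decay_rate * global_step / decay_steps)

import tensorflow as tf

global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.inverse_time_decay(learning_rate=0.1,
                                            global_step=global_step,
                                            decay_steps=1000,
                                            decay_rate=0.5)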

Moving average of the loss curve during training

- Depends only on Python (the averaging utilities; the train/validate loops shown with them use PyTorch)

# Moving-average utilities plus the train / validate loops that use them. In the
# original project this spans a utils module and a training script; `config`,
# `plot_line`, `utils` and `model.fit` are project-specific objects.
import os
import random
from collections import OrderedDict

from torch.autograd import Variable


def print_loss(config, title, loss_dict, epoch, iters, current_iter, need_plot=False):
    data_str = ''
    for k, v in loss_dict.items():
        if data_str != '':
            data_str += ', '
        data_str += '{}: {:.10f}'.format(k, v)
        if need_plot and config.vis is not None:
            # step is the progress rate of the whole dataset (split by batchsize)
            plot_line(config, title, k, (epoch - 1) * iters + current_iter, v)
    print('[{}] [{}] Epoch [{}/{}], Iter [{}/{}]'.format(title, config.experiment_name, epoch, config.epochs, current_iter, iters))
    print('    {}'.format(data_str))


class AverageWithinWindow():
    def __init__(self, win_size):
        self.win_size = win_size
        self.cache = []
        self.average = 0
        self.count = 0

    def update(self, v):
        if self.count < self.win_size:
            # window not full yet: plain running average
            self.cache.append(v)
            self.count += 1
            self.average = (self.average * (self.count - 1) + v) / self.count
        else:
            # window full: overwrite the oldest value and adjust the average
            idx = self.count % self.win_size
            self.average += (v - self.cache[idx]) / self.win_size
            self.cache[idx] = v
            self.count += 1


class DictAccumulator():
    def __init__(self, win_size=None):
        self.accumulator = OrderedDict()
        self.total_num = 0
        self.win_size = win_size

    def update(self, d):
        self.total_num += 1
        for k, v in d.items():
            if not self.win_size:
                self.accumulator[k] = v + self.accumulator.get(k, 0)
            else:
                self.accumulator.setdefault(k, AverageWithinWindow(self.win_size)).update(v)

    def get_average(self):
        average = OrderedDict()
        for k, v in self.accumulator.items():
            if not self.win_size:
                average[k] = v * 1.0 / self.total_num
            else:
                average[k] = v.average
        return average


def train(epoch, train_loader, model):
    loss_accumulator = utils.DictAccumulator(config.loss_average_win_size)
    grad_accumulator = utils.DictAccumulator(config.loss_average_win_size)
    score_accumulator = utils.DictAccumulator(config.loss_average_win_size)
    iters = len(train_loader)

    for i, (inputs, targets) in enumerate(train_loader):
        inputs = inputs.cuda()
        print(inputs.shape)
        targets = targets.cuda()
        inputs = Variable(inputs)
        targets = Variable(targets)

        net_outputs, loss, grad, lr_dict, score = model.fit(inputs, targets, update=True, epoch=epoch,
                                                            cur_iter=i + 1, iter_one_epoch=iters)
        loss_accumulator.update(loss)
        grad_accumulator.update(grad)
        score_accumulator.update(score)

        # log the window-averaged values once per window
        if (i + 1) % config.loss_average_win_size == 0:
            need_plot = True
            if hasattr(config, 'plot_loss_start_iter'):
                need_plot = (i + 1 + (epoch - 1) * iters >= config.plot_loss_start_iter)
            elif hasattr(config, 'plot_loss_start_epoch'):
                need_plot = (epoch >= config.plot_loss_start_epoch)

            utils.print_loss(config, "train_loss", loss_accumulator.get_average(), epoch=epoch, iters=iters, current_iter=i+1, need_plot=need_plot)
            utils.print_loss(config, "grad", grad_accumulator.get_average(), epoch=epoch, iters=iters, current_iter=i+1, need_plot=need_plot)
            utils.print_loss(config, "learning rate", lr_dict, epoch=epoch, iters=iters, current_iter=i+1, need_plot=need_plot)
            utils.print_loss(config, "train_score", score_accumulator.get_average(), epoch=epoch, iters=iters, current_iter=i+1, need_plot=need_plot)

    # save sample outputs from the last batch every few epochs
    if epoch % config.save_train_hr_interval_epoch == 0:
        k = random.randint(0, net_outputs['output'].size(0) - 1)
        for name, out in net_outputs.items():
            utils.save_tensor(out.data[k], os.path.join(config.TRAIN_OUT_FOLDER, 'epoch_%d_k_%d_%s.png' % (epoch, k, name)))


def validate(valid_loader, model):
    loss_accumulator = utils.DictAccumulator()
    score_accumulator = utils.DictAccumulator()

    # loss of the whole validation dataset
    for i, (inputs, targets) in enumerate(valid_loader):
        inputs = inputs.cuda()
        targets = targets.cuda()
        inputs = Variable(inputs, volatile=True)
        targets = Variable(targets)

        loss, score = model.fit(inputs, targets, update=False)
        loss_accumulator.update(loss)
        score_accumulator.update(score)

    return loss_accumulator.get_average(), score_accumulator.get_average()

- Depends on PyTorch (logging utilities and training loop from maskrcnn-benchmark)

# metric_logger.py from maskrcnn-benchmark: windowed smoothing of logged values.
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
import time
from collections import defaultdict
from collections import deque
from datetime import datetime

import torch

from .comm import is_main_process


class SmoothedValue(object):
    """Track a series of values and provide access to smoothed values over a
    window or the global series average.
    """

    def __init__(self, window_size=20):
        self.deque = deque(maxlen=window_size)
        self.series = []
        self.total = 0.0
        self.count = 0

    def update(self, value):
        self.deque.append(value)
        self.series.append(value)
        self.count += 1
        self.total += value

    @property
    def median(self):
        d = torch.tensor(list(self.deque))
        return d.median().item()

    @property
    def avg(self):
        d = torch.tensor(list(self.deque))
        return d.mean().item()

    @property
    def global_avg(self):
        return self.total / self.count


class MetricLogger(object):
    def __init__(self, delimiter="\t"):
        self.meters = defaultdict(SmoothedValue)
        self.delimiter = delimiter

    def update(self, **kwargs):
        for k, v in kwargs.items():
            if isinstance(v, torch.Tensor):
                v = v.item()
            assert isinstance(v, (float, int))
            self.meters[k].update(v)

    def __getattr__(self, attr):
        if attr in self.meters:
            return self.meters[attr]
        return object.__getattr__(self, attr)

    def __str__(self):
        loss_str = []
        for name, meter in self.meters.items():
            loss_str.append(
                "{}: {:.4f} ({:.4f})".format(name, meter.median, meter.global_avg)
            )
        return self.delimiter.join(loss_str)


class TensorboardLogger(MetricLogger):
    def __init__(self,
                 log_dir='logs',
                 exp_name='maskrcnn-benchmark',
                 start_iter=0,
                 delimiter='\t'):
        super(TensorboardLogger, self).__init__(delimiter)
        self.iteration = start_iter
        self.writer = self._get_tensorboard_writer(log_dir, exp_name)

    @staticmethod
    def _get_tensorboard_writer(log_dir, exp_name):
        try:
            from tensorboardX import SummaryWriter
        except ImportError:
            raise ImportError(
                'To use tensorboard please install tensorboardX '
                '[ pip install tensorflow tensorboardX ].'
            )

        if is_main_process():
            timestamp = datetime.fromtimestamp(time.time()).strftime('%Y%m%d-%H:%M')
            tb_logger = SummaryWriter('{}/{}-{}'.format(log_dir, exp_name, timestamp))
            return tb_logger
        else:
            return None

    def update(self, **kwargs):
        super(TensorboardLogger, self).update(**kwargs)
        if self.writer:
            for k, v in kwargs.items():
                if isinstance(v, torch.Tensor):
                    v = v.item()
                assert isinstance(v, (float, int))
                self.writer.add_scalar(k, v, self.iteration)
            self.iteration += 1


# do_train below comes from maskrcnn_benchmark/engine/trainer.py; it additionally
# needs `import logging`, `import datetime` (the module, for datetime.timedelta)
# and `reduce_loss_dict` from maskrcnn_benchmark.utils.comm.
def do_train(
    model,
    data_loader,
    optimizer,
    scheduler,
    checkpointer,
    device,
    checkpoint_period,
    arguments,
    tb_log_dir,
    tb_exp_name,
    use_tensorboard=False
):
    logger = logging.getLogger("maskrcnn_benchmark.trainer")
    logger.info("Start training")
    meters = TensorboardLogger(log_dir=tb_log_dir,
                               exp_name=tb_exp_name,
                               start_iter=arguments['iteration'],
                               delimiter=" ") \
        if use_tensorboard else MetricLogger(delimiter=" ")

    max_iter = len(data_loader)
    start_iter = arguments["iteration"]
    model.train()
    start_training_time = time.time()
    end = time.time()

    for iteration, (images, targets, _) in enumerate(data_loader, start_iter):
        data_time = time.time() - end
        iteration = iteration + 1
        arguments["iteration"] = iteration

        scheduler.step()

        images = images.to(device)
        targets = [target.to(device) for target in targets]

        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())

        # reduce losses over all GPUs for logging purposes
        loss_dict_reduced = reduce_loss_dict(loss_dict)
        losses_reduced = sum(loss for loss in loss_dict_reduced.values())
        meters.update(loss=losses_reduced, **loss_dict_reduced)

        optimizer.zero_grad()
        losses.backward()
        optimizer.step()

        batch_time = time.time() - end
        end = time.time()
        meters.update(time=batch_time, data=data_time)

        eta_seconds = meters.time.global_avg * (max_iter - iteration)
        eta_string = str(datetime.timedelta(seconds=int(eta_seconds)))

        if iteration % 20 == 0 or iteration == max_iter:
            logger.info(
                meters.delimiter.join(
                    [
                        "eta: {eta}",
                        "iter: {iter}",
                        "{meters}",
                        "lr: {lr:.6f}",
                        "max mem: {memory:.0f}",
                    ]
                ).format(
                    eta=eta_string,
                    iter=iteration,
                    meters=str(meters),
                    lr=optimizer.param_groups[0]["lr"],
                    memory=torch.cuda.max_memory_allocated() / 1024.0 / 1024.0,
                )
            )
        if iteration % checkpoint_period == 0:
            checkpointer.save("model_{:07d}".format(iteration), **arguments)
        if iteration == max_iter:
            checkpointer.save("model_final", **arguments)

    total_training_time = time.time() - start_training_time
    total_time_str = str(datetime.timedelta(seconds=total_training_time))
    logger.info(
        "Total training time: {} ({:.4f} s / it)".format(
            total_time_str, total_training_time / (max_iter)
        )
    )

- Depends on PyTorch (torchnet-style moving-average meter)

# MovingAverageValueMeter (torchnet-style) plus a DeepLab-style training loop that
# logs the window-averaged loss to TensorBoard. `CONFIG`, `poly_lr_scheduler`,
# `criterion`, `SummaryWriter`, `tqdm`, `F` and `osp` come from the surrounding project.
import math

from . import meter
import torch


class MovingAverageValueMeter(meter.Meter):
    def __init__(self, windowsize):
        super(MovingAverageValueMeter, self).__init__()
        self.windowsize = windowsize
        self.valuequeue = torch.Tensor(windowsize)
        self.reset()

    def reset(self):
        self.sum = 0.0
        self.n = 0
        self.var = 0.0
        self.valuequeue.fill_(0)

    def add(self, value):
        # circular buffer: replace the oldest value and update the running sums
        queueid = (self.n % self.windowsize)
        oldvalue = self.valuequeue[queueid]
        self.sum += value - oldvalue
        self.var += value * value - oldvalue * oldvalue
        self.valuequeue[queueid] = value
        self.n += 1

    def value(self):
        n = min(self.n, self.windowsize)
        mean = self.sum / max(1, n)
        std = math.sqrt(max((self.var - n * mean * mean) / max(1, n - 1), 0))
        return mean, std


def main():
    # ..... (dataset, model and optimizer setup omitted in the original)

    # TensorBoard Logger
    writer = SummaryWriter(CONFIG.LOG_DIR)
    loss_meter = MovingAverageValueMeter(20)

    model.train()
    model.module.scale.freeze_bn()

    for iteration in tqdm(
        range(1, CONFIG.ITER_MAX + 1),
        total=CONFIG.ITER_MAX,
        leave=False,
        dynamic_ncols=True,
    ):
        # Set a learning rate
        poly_lr_scheduler(
            optimizer=optimizer,
            init_lr=CONFIG.LR,
            iter=iteration - 1,
            lr_decay_iter=CONFIG.LR_DECAY,
            max_iter=CONFIG.ITER_MAX,
            power=CONFIG.POLY_POWER,
        )

        # Clear gradients (ready to accumulate)
        optimizer.zero_grad()

        iter_loss = 0
        for i in range(1, CONFIG.ITER_SIZE + 1):
            try:
                images, labels = next(loader_iter)
            except StopIteration:
                # restart the data loader when it is exhausted
                loader_iter = iter(loader)
                images, labels = next(loader_iter)

            images = images.to(device)
            labels = labels.to(device).unsqueeze(1).float()

            # Propagate forward
            logits = model(images)

            # Loss
            loss = 0
            for logit in logits:
                # Resize labels for {100%, 75%, 50%, Max} logits
                labels_ = F.interpolate(labels, logit.shape[2:], mode="nearest")
                labels_ = labels_.squeeze(1).long()
                # Compute crossentropy loss
                loss += criterion(logit, labels_)

            # Backpropagate (just compute gradients wrt the loss)
            loss /= float(CONFIG.ITER_SIZE)
            loss.backward()

            iter_loss += float(loss)

        loss_meter.add(iter_loss)

        # Update weights with accumulated gradients
        optimizer.step()

        # TensorBoard
        if iteration % CONFIG.ITER_TB == 0:
            writer.add_scalar("train_loss", loss_meter.value()[0], iteration)
            for i, o in enumerate(optimizer.param_groups):
                writer.add_scalar("train_lr_group{}".format(i), o["lr"], iteration)
            if False:  # This produces a large log file
                for name, param in model.named_parameters():
                    name = name.replace(".", "/")
                    writer.add_histogram(name, param, iteration, bins="auto")
                    if param.requires_grad:
                        writer.add_histogram(
                            name + "/grad", param.grad, iteration, bins="auto"
                        )

        # Save a model
        if iteration % CONFIG.ITER_SAVE == 0:
            torch.save(
                model.module.state_dict(),
                osp.join(CONFIG.SAVE_DIR, "checkpoint_{}.pth".format(iteration)),
            )

        # Save a model (short term)
        if iteration % 100 == 0:
            torch.save(
                model.module.state_dict(),
                osp.join(CONFIG.SAVE_DIR, "checkpoint_current.pth"),
            )

    torch.save(
        model.module.state_dict(), osp.join(CONFIG.SAVE_DIR, "checkpoint_final.pth")
    )
