Deep Learning 32: 自己写的keras的一个callbacks函数,解决keras中不能在每个epoch实时显示学习速率learning rate的问题
一.问题:
keras中不能在每个epoch实时显示学习速率learning rate,从而方便调试,实际上也是为了调试解决这个问题:Deep Learning 31: 不同版本的keras,对同样的代码,得到不同结果的原因总结
二.解决方法
1.把下面代码加入keras文件callbacks.py中:
class DisplayLearningRate(Callback):
'''Display Learning rate .
'''
def __init__(self):
super(DisplayLearningRate, self).__init__() def on_epoch_begin(self, epoch, logs={}):
assert hasattr(self.model.optimizer, 'lr'), \
'Optimizer must have a "lr" attribute.'
lr_now = K.get_value(self.model.optimizer.lr) print('Epoch %05d: Learning rate is %s' % (epoch, lr_now))
2.应用方法如下:
history = model.fit(X_train,
Y_train,
batch_size=batch_size,
nb_epoch=nb_epoch,
show_accuracy=False,
verbose=,
validation_data=(X_test, Y_test),
callbacks = [
keras.callbacks.DisplayLearningRate(),
keras.callbacks.ModelCheckpoint(filepath, monitor='val_loss', verbose=, save_best_only=True, mode='auto'), # 该回调函数将在每个epoch后保存模型到filepath
# keras.callbacks.EarlyStopping(monitor='val_loss', patience=, verbose=, mode='auto')# 当监测值不再改善时,该回调函数将中止训练.当early stop被激活(如发现loss相比上一个epoch训练没有下降),则经过patience个epoch后停止训练
])
三.总结
按照上面的方法试了之后发现,每个epoch显示的learning rate都是一样的,原来按照这样显示的是最开始初始化时的learning rate,每次epoch学习速率更新后,并没有把值赋给初始时的learning rate,所以才会这样,那么要怎么样才能实时显示每个epoch的学习速率呢? 我觉得应该是显示optimizer中的updates.
四.最终办法
# set the decay as 1e-1 to see the Ir change between epochs.
sgd = SGD(lr=0.1, decay=1e-1, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
optimizer=sgd,
metrics=['accuracy'])
class LossHistory(Callback):
def on_epoch_begin(self, batch, logs={}):
lr = self.lr * (1. / (1. + self.decay * self.iterations))
print('Ir:', lr)
history=LossHistory()
model.fit(X_train, Y_train,
batch_size= batch_size,
nb_epoch= nb_epoch,
callbacks= [history])
参考:http://stackoverflow.com/questions/40144805/print-learning-rate-evary-epoch-in-sgd
下面我分别把keras==0.3.3和1.2.0时的optimizer.py分别贴出来:
keras==0.3.3时的optimizer.py如下:
from __future__ import absolute_import
from . import backend as K
import numpy as np
from .utils.generic_utils import get_from_module
from six.moves import zip def clip_norm(g, c, n):
if c > 0:
g = K.switch(n >= c, g * c / n, g)
return g def kl_divergence(p, p_hat):
return p_hat - p + p * K.log(p / p_hat) class Optimizer(object):
'''Abstract optimizer base class. Note: this is the parent class of all optimizers, not an actual optimizer
that can be used for training models. All Keras optimizers support the following keyword arguments: clipnorm: float >= 0. Gradients will be clipped
when their L2 norm exceeds this value.
clipvalue: float >= 0. Gradients will be clipped
when their absolute value exceeds this value.
'''
def __init__(self, **kwargs):
self.__dict__.update(kwargs)
self.updates = [] def get_state(self):
return [K.get_value(u[0]) for u in self.updates] def set_state(self, value_list):
assert len(self.updates) == len(value_list)
for u, v in zip(self.updates, value_list):
K.set_value(u[0], v) def get_updates(self, params, constraints, loss):
raise NotImplementedError def get_gradients(self, loss, params):
grads = K.gradients(loss, params)
if hasattr(self, 'clipnorm') and self.clipnorm > 0:
norm = K.sqrt(sum([K.sum(K.square(g)) for g in grads]))
grads = [clip_norm(g, self.clipnorm, norm) for g in grads]
if hasattr(self, 'clipvalue') and self.clipvalue > 0:
grads = [K.clip(g, -self.clipvalue, self.clipvalue) for g in grads]
return grads def get_config(self):
return {"name": self.__class__.__name__} class SGD(Optimizer):
'''Stochastic gradient descent, with support for momentum,
decay, and Nesterov momentum. # Arguments
lr: float >= 0. Learning rate.
momentum: float >= 0. Parameter updates momentum.
decay: float >= 0. Learning rate decay over each update.
nesterov: boolean. Whether to apply Nesterov momentum.
'''
def __init__(self, lr=0.01, momentum=0., decay=0., nesterov=False,
*args, **kwargs):
super(SGD, self).__init__(**kwargs)
self.__dict__.update(locals())
self.iterations = K.variable(0.)
self.lr = K.variable(lr)
self.momentum = K.variable(momentum)
self.decay = K.variable(decay) def get_updates(self, params, constraints, loss):
grads = self.get_gradients(loss, params)
lr = self.lr * (1.0 / (1.0 + self.decay * self.iterations))
self.updates = [(self.iterations, self.iterations + 1.)] for p, g, c in zip(params, grads, constraints):
m = K.variable(np.zeros(K.get_value(p).shape)) # momentum
v = self.momentum * m - lr * g # velocity
self.updates.append((m, v)) if self.nesterov:
new_p = p + self.momentum * v - lr * g
else:
new_p = p + v self.updates.append((p, c(new_p))) # apply constraints
return self.updates def get_config(self):
return {"name": self.__class__.__name__,
"lr": float(K.get_value(self.lr)),
"momentum": float(K.get_value(self.momentum)),
"decay": float(K.get_value(self.decay)),
"nesterov": self.nesterov} class RMSprop(Optimizer):
'''RMSProp optimizer. It is recommended to leave the parameters of this optimizer
at their default values. This optimizer is usually a good choice for recurrent
neural networks. # Arguments
lr: float >= 0. Learning rate.
rho: float >= 0.
epsilon: float >= 0. Fuzz factor.
'''
def __init__(self, lr=0.001, rho=0.9, epsilon=1e-6, *args, **kwargs):
super(RMSprop, self).__init__(**kwargs)
self.__dict__.update(locals())
self.lr = K.variable(lr)
self.rho = K.variable(rho) def get_updates(self, params, constraints, loss):
grads = self.get_gradients(loss, params)
accumulators = [K.variable(np.zeros(K.get_value(p).shape)) for p in params]
self.updates = [] for p, g, a, c in zip(params, grads, accumulators, constraints):
# update accumulator
new_a = self.rho * a + (1 - self.rho) * K.square(g)
self.updates.append((a, new_a)) new_p = p - self.lr * g / K.sqrt(new_a + self.epsilon)
self.updates.append((p, c(new_p))) # apply constraints
return self.updates def get_config(self):
return {"name": self.__class__.__name__,
"lr": float(K.get_value(self.lr)),
"rho": float(K.get_value(self.rho)),
"epsilon": self.epsilon} class Adagrad(Optimizer):
'''Adagrad optimizer. It is recommended to leave the parameters of this optimizer
at their default values. # Arguments
lr: float >= 0. Learning rate.
epsilon: float >= 0.
'''
def __init__(self, lr=0.01, epsilon=1e-6, *args, **kwargs):
super(Adagrad, self).__init__(**kwargs)
self.__dict__.update(locals())
self.lr = K.variable(lr) def get_updates(self, params, constraints, loss):
grads = self.get_gradients(loss, params)
accumulators = [K.variable(np.zeros(K.get_value(p).shape)) for p in params]
self.updates = [] for p, g, a, c in zip(params, grads, accumulators, constraints):
new_a = a + K.square(g) # update accumulator
self.updates.append((a, new_a))
new_p = p - self.lr * g / K.sqrt(new_a + self.epsilon)
self.updates.append((p, c(new_p))) # apply constraints
return self.updates def get_config(self):
return {"name": self.__class__.__name__,
"lr": float(K.get_value(self.lr)),
"epsilon": self.epsilon} class Adadelta(Optimizer):
'''Adadelta optimizer. It is recommended to leave the parameters of this optimizer
at their default values. # Arguments
lr: float >= 0. Learning rate. It is recommended to leave it at the default value.
rho: float >= 0.
epsilon: float >= 0. Fuzz factor. # References
- [Adadelta - an adaptive learning rate method](http://arxiv.org/abs/1212.5701)
'''
def __init__(self, lr=1.0, rho=0.95, epsilon=1e-6, *args, **kwargs):
super(Adadelta, self).__init__(**kwargs)
self.__dict__.update(locals())
self.lr = K.variable(lr) def get_updates(self, params, constraints, loss):
grads = self.get_gradients(loss, params)
accumulators = [K.variable(np.zeros(K.get_value(p).shape)) for p in params]
delta_accumulators = [K.variable(np.zeros(K.get_value(p).shape)) for p in params]
self.updates = [] for p, g, a, d_a, c in zip(params, grads, accumulators,
delta_accumulators, constraints):
# update accumulator
new_a = self.rho * a + (1 - self.rho) * K.square(g)
self.updates.append((a, new_a)) # use the new accumulator and the *old* delta_accumulator
update = g * K.sqrt(d_a + self.epsilon) / K.sqrt(new_a + self.epsilon) new_p = p - self.lr * update
self.updates.append((p, c(new_p))) # apply constraints # update delta_accumulator
new_d_a = self.rho * d_a + (1 - self.rho) * K.square(update)
self.updates.append((d_a, new_d_a))
return self.updates def get_config(self):
return {"name": self.__class__.__name__,
"lr": float(K.get_value(self.lr)),
"rho": self.rho,
"epsilon": self.epsilon} class Adam(Optimizer):
'''Adam optimizer. Default parameters follow those provided in the original paper. # Arguments
lr: float >= 0. Learning rate.
beta_1/beta_2: floats, 0 < beta < 1. Generally close to 1.
epsilon: float >= 0. Fuzz factor. # References
- [Adam - A Method for Stochastic Optimization](http://arxiv.org/abs/1412.6980v8)
'''
def __init__(self, lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8,
*args, **kwargs):
super(Adam, self).__init__(**kwargs)
self.__dict__.update(locals())
self.iterations = K.variable(0)
self.lr = K.variable(lr)
self.beta_1 = K.variable(beta_1)
self.beta_2 = K.variable(beta_2) def get_updates(self, params, constraints, loss):
grads = self.get_gradients(loss, params)
self.updates = [(self.iterations, self.iterations+1.)] t = self.iterations + 1
lr_t = self.lr * K.sqrt(1 - K.pow(self.beta_2, t)) / (1 - K.pow(self.beta_1, t)) for p, g, c in zip(params, grads, constraints):
# zero init of moment
m = K.variable(np.zeros(K.get_value(p).shape))
# zero init of velocity
v = K.variable(np.zeros(K.get_value(p).shape)) m_t = (self.beta_1 * m) + (1 - self.beta_1) * g
v_t = (self.beta_2 * v) + (1 - self.beta_2) * K.square(g)
p_t = p - lr_t * m_t / (K.sqrt(v_t) + self.epsilon) self.updates.append((m, m_t))
self.updates.append((v, v_t))
self.updates.append((p, c(p_t))) # apply constraints
return self.updates def get_config(self):
return {"name": self.__class__.__name__,
"lr": float(K.get_value(self.lr)),
"beta_1": float(K.get_value(self.beta_1)),
"beta_2": float(K.get_value(self.beta_2)),
"epsilon": self.epsilon} class Adamax(Optimizer):
'''Adamax optimizer from Adam paper's Section 7. It is a variant
of Adam based on the infinity norm. Default parameters follow those provided in the paper. # Arguments
lr: float >= 0. Learning rate.
beta_1/beta_2: floats, 0 < beta < 1. Generally close to 1.
epsilon: float >= 0. Fuzz factor. # References
- [Adam - A Method for Stochastic Optimization](http://arxiv.org/abs/1412.6980v8)
'''
def __init__(self, lr=0.002, beta_1=0.9, beta_2=0.999, epsilon=1e-8,
*args, **kwargs):
super(Adamax, self).__init__(**kwargs)
self.__dict__.update(locals())
self.iterations = K.variable(0)
self.lr = K.variable(lr)
self.beta_1 = K.variable(beta_1)
self.beta_2 = K.variable(beta_2) def get_updates(self, params, constraints, loss):
grads = self.get_gradients(loss, params)
self.updates = [(self.iterations, self.iterations+1.)] t = self.iterations + 1
lr_t = self.lr / (1 - K.pow(self.beta_1, t)) for p, g, c in zip(params, grads, constraints):
# zero init of 1st moment
m = K.variable(np.zeros(K.get_value(p).shape))
# zero init of exponentially weighted infinity norm
u = K.variable(np.zeros(K.get_value(p).shape)) m_t = (self.beta_1 * m) + (1 - self.beta_1) * g
u_t = K.maximum(self.beta_2 * u, K.abs(g))
p_t = p - lr_t * m_t / (u_t + self.epsilon) self.updates.append((m, m_t))
self.updates.append((u, u_t))
self.updates.append((p, c(p_t))) # apply constraints
return self.updates def get_config(self):
return {"name": self.__class__.__name__,
"lr": float(K.get_value(self.lr)),
"beta_1": float(K.get_value(self.beta_1)),
"beta_2": float(K.get_value(self.beta_2)),
"epsilon": self.epsilon} # aliases
sgd = SGD
rmsprop = RMSprop
adagrad = Adagrad
adadelta = Adadelta
adam = Adam
adamax = Adamax def get(identifier, kwargs=None):
return get_from_module(identifier, globals(), 'optimizer',
instantiate=True, kwargs=kwargs)
keras==1.2.0时的optimizer.py如下:
from __future__ import absolute_import
from . import backend as K
from .utils.generic_utils import get_from_module
from six.moves import zip def clip_norm(g, c, n):
if c > 0:
g = K.switch(n >= c, g * c / n, g)
return g def optimizer_from_config(config, custom_objects={}):
all_classes = {
'sgd': SGD,
'rmsprop': RMSprop,
'adagrad': Adagrad,
'adadelta': Adadelta,
'adam': Adam,
'adamax': Adamax,
'nadam': Nadam,
'tfoptimizer': TFOptimizer,
}
class_name = config['class_name']
if class_name in custom_objects:
cls = custom_objects[class_name]
else:
if class_name.lower() not in all_classes:
raise ValueError('Optimizer class not found:', class_name)
cls = all_classes[class_name.lower()]
return cls.from_config(config['config']) class Optimizer(object):
'''Abstract optimizer base class. Note: this is the parent class of all optimizers, not an actual optimizer
that can be used for training models. All Keras optimizers support the following keyword arguments: clipnorm: float >= 0. Gradients will be clipped
when their L2 norm exceeds this value.
clipvalue: float >= 0. Gradients will be clipped
when their absolute value exceeds this value.
'''
def __init__(self, **kwargs):
allowed_kwargs = {'clipnorm', 'clipvalue'}
for k in kwargs:
if k not in allowed_kwargs:
raise TypeError('Unexpected keyword argument '
'passed to optimizer: ' + str(k))
self.__dict__.update(kwargs)
self.updates = []
self.weights = [] def get_updates(self, params, constraints, loss):
raise NotImplementedError def get_gradients(self, loss, params):
grads = K.gradients(loss, params)
if hasattr(self, 'clipnorm') and self.clipnorm > 0:
norm = K.sqrt(sum([K.sum(K.square(g)) for g in grads]))
grads = [clip_norm(g, self.clipnorm, norm) for g in grads]
if hasattr(self, 'clipvalue') and self.clipvalue > 0:
grads = [K.clip(g, -self.clipvalue, self.clipvalue) for g in grads]
return grads def set_weights(self, weights):
'''Sets the weights of the optimizer, from Numpy arrays. Should only be called after computing the gradients
(otherwise the optimizer has no weights). # Arguments
weights: a list of Numpy arrays. The number
of arrays and their shape must match
number of the dimensions of the weights
of the optimizer (i.e. it should match the
output of `get_weights`).
'''
params = self.weights
weight_value_tuples = []
param_values = K.batch_get_value(params)
for pv, p, w in zip(param_values, params, weights):
if pv.shape != w.shape:
raise ValueError('Optimizer weight shape ' +
str(pv.shape) +
' not compatible with '
'provided weight shape ' + str(w.shape))
weight_value_tuples.append((p, w))
K.batch_set_value(weight_value_tuples) def get_weights(self):
'''Returns the current weights of the optimizer,
as a list of numpy arrays.
'''
return K.batch_get_value(self.weights) def get_config(self):
config = {}
if hasattr(self, 'clipnorm'):
config['clipnorm'] = self.clipnorm
if hasattr(self, 'clipvalue'):
config['clipvalue'] = self.clipvalue
return config @classmethod
def from_config(cls, config):
return cls(**config) class SGD(Optimizer):
'''Stochastic gradient descent, with support for momentum,
learning rate decay, and Nesterov momentum. # Arguments
lr: float >= 0. Learning rate.
momentum: float >= 0. Parameter updates momentum.
decay: float >= 0. Learning rate decay over each update.
nesterov: boolean. Whether to apply Nesterov momentum.
'''
def __init__(self, lr=0.01, momentum=0., decay=0.,
nesterov=False, **kwargs):
super(SGD, self).__init__(**kwargs)
self.__dict__.update(locals())
self.iterations = K.variable(0.)
self.lr = K.variable(lr)
self.momentum = K.variable(momentum)
self.decay = K.variable(decay)
self.inital_decay = decay def get_updates(self, params, constraints, loss):
grads = self.get_gradients(loss, params)
self.updates = [] lr = self.lr
if self.inital_decay > 0:
lr *= (1. / (1. + self.decay * self.iterations))
self.updates .append(K.update_add(self.iterations, 1)) # momentum
shapes = [K.get_variable_shape(p) for p in params]
moments = [K.zeros(shape) for shape in shapes]
self.weights = [self.iterations] + moments
for p, g, m in zip(params, grads, moments):
v = self.momentum * m - lr * g # velocity
self.updates.append(K.update(m, v)) if self.nesterov:
new_p = p + self.momentum * v - lr * g
else:
new_p = p + v # apply constraints
if p in constraints:
c = constraints[p]
new_p = c(new_p) self.updates.append(K.update(p, new_p))
return self.updates def get_config(self):
config = {'lr': float(K.get_value(self.lr)),
'momentum': float(K.get_value(self.momentum)),
'decay': float(K.get_value(self.decay)),
'nesterov': self.nesterov}
base_config = super(SGD, self).get_config()
return dict(list(base_config.items()) + list(config.items())) class RMSprop(Optimizer):
'''RMSProp optimizer. It is recommended to leave the parameters of this optimizer
at their default values
(except the learning rate, which can be freely tuned). This optimizer is usually a good choice for recurrent
neural networks. # Arguments
lr: float >= 0. Learning rate.
rho: float >= 0.
epsilon: float >= 0. Fuzz factor.
decay: float >= 0. Learning rate decay over each update.
'''
def __init__(self, lr=0.001, rho=0.9, epsilon=1e-8, decay=0.,
**kwargs):
super(RMSprop, self).__init__(**kwargs)
self.__dict__.update(locals())
self.lr = K.variable(lr)
self.rho = K.variable(rho)
self.decay = K.variable(decay)
self.inital_decay = decay
self.iterations = K.variable(0.) def get_updates(self, params, constraints, loss):
grads = self.get_gradients(loss, params)
shapes = [K.get_variable_shape(p) for p in params]
accumulators = [K.zeros(shape) for shape in shapes]
self.weights = accumulators
self.updates = [] lr = self.lr
if self.inital_decay > 0:
lr *= (1. / (1. + self.decay * self.iterations))
self.updates.append(K.update_add(self.iterations, 1)) for p, g, a in zip(params, grads, accumulators):
# update accumulator
new_a = self.rho * a + (1. - self.rho) * K.square(g)
self.updates.append(K.update(a, new_a))
new_p = p - lr * g / (K.sqrt(new_a) + self.epsilon) # apply constraints
if p in constraints:
c = constraints[p]
new_p = c(new_p)
self.updates.append(K.update(p, new_p))
return self.updates def get_config(self):
config = {'lr': float(K.get_value(self.lr)),
'rho': float(K.get_value(self.rho)),
'decay': float(K.get_value(self.decay)),
'epsilon': self.epsilon}
base_config = super(RMSprop, self).get_config()
return dict(list(base_config.items()) + list(config.items())) class Adagrad(Optimizer):
'''Adagrad optimizer. It is recommended to leave the parameters of this optimizer
at their default values. # Arguments
lr: float >= 0. Learning rate.
epsilon: float >= 0. # References
- [Adaptive Subgradient Methods for Online Learning and Stochastic Optimization](http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf)
'''
def __init__(self, lr=0.01, epsilon=1e-8, decay=0., **kwargs):
super(Adagrad, self).__init__(**kwargs)
self.__dict__.update(locals())
self.lr = K.variable(lr)
self.decay = K.variable(decay)
self.inital_decay = decay
self.iterations = K.variable(0.) def get_updates(self, params, constraints, loss):
grads = self.get_gradients(loss, params)
shapes = [K.get_variable_shape(p) for p in params]
accumulators = [K.zeros(shape) for shape in shapes]
self.weights = accumulators
self.updates = [] lr = self.lr
if self.inital_decay > 0:
lr *= (1. / (1. + self.decay * self.iterations))
self.updates.append(K.update_add(self.iterations, 1)) for p, g, a in zip(params, grads, accumulators):
new_a = a + K.square(g) # update accumulator
self.updates.append(K.update(a, new_a))
new_p = p - lr * g / (K.sqrt(new_a) + self.epsilon)
# apply constraints
if p in constraints:
c = constraints[p]
new_p = c(new_p)
self.updates.append(K.update(p, new_p))
return self.updates def get_config(self):
config = {'lr': float(K.get_value(self.lr)),
'decay': float(K.get_value(self.decay)),
'epsilon': self.epsilon}
base_config = super(Adagrad, self).get_config()
return dict(list(base_config.items()) + list(config.items())) class Adadelta(Optimizer):
'''Adadelta optimizer. It is recommended to leave the parameters of this optimizer
at their default values. # Arguments
lr: float >= 0. Learning rate.
It is recommended to leave it at the default value.
rho: float >= 0.
epsilon: float >= 0. Fuzz factor. # References
- [Adadelta - an adaptive learning rate method](http://arxiv.org/abs/1212.5701)
'''
def __init__(self, lr=1.0, rho=0.95, epsilon=1e-8, decay=0.,
**kwargs):
super(Adadelta, self).__init__(**kwargs)
self.__dict__.update(locals())
self.lr = K.variable(lr)
self.decay = K.variable(decay)
self.inital_decay = decay
self.iterations = K.variable(0.) def get_updates(self, params, constraints, loss):
grads = self.get_gradients(loss, params)
shapes = [K.get_variable_shape(p) for p in params]
accumulators = [K.zeros(shape) for shape in shapes]
delta_accumulators = [K.zeros(shape) for shape in shapes]
self.weights = accumulators + delta_accumulators
self.updates = [] lr = self.lr
if self.inital_decay > 0:
lr *= (1. / (1. + self.decay * self.iterations))
self.updates.append(K.update_add(self.iterations, 1)) for p, g, a, d_a in zip(params, grads, accumulators, delta_accumulators):
# update accumulator
new_a = self.rho * a + (1. - self.rho) * K.square(g)
self.updates.append(K.update(a, new_a)) # use the new accumulator and the *old* delta_accumulator
update = g * K.sqrt(d_a + self.epsilon) / K.sqrt(new_a + self.epsilon) new_p = p - lr * update
# apply constraints
if p in constraints:
c = constraints[p]
new_p = c(new_p)
self.updates.append(K.update(p, new_p)) # update delta_accumulator
new_d_a = self.rho * d_a + (1 - self.rho) * K.square(update)
self.updates.append(K.update(d_a, new_d_a))
return self.updates def get_config(self):
config = {'lr': float(K.get_value(self.lr)),
'rho': self.rho,
'decay': float(K.get_value(self.decay)),
'epsilon': self.epsilon}
base_config = super(Adadelta, self).get_config()
return dict(list(base_config.items()) + list(config.items())) class Adam(Optimizer):
'''Adam optimizer. Default parameters follow those provided in the original paper. # Arguments
lr: float >= 0. Learning rate.
beta_1/beta_2: floats, 0 < beta < 1. Generally close to 1.
epsilon: float >= 0. Fuzz factor. # References
- [Adam - A Method for Stochastic Optimization](http://arxiv.org/abs/1412.6980v8)
'''
def __init__(self, lr=0.001, beta_1=0.9, beta_2=0.999,
epsilon=1e-8, decay=0., **kwargs):
super(Adam, self).__init__(**kwargs)
self.__dict__.update(locals())
self.iterations = K.variable(0)
self.lr = K.variable(lr)
self.beta_1 = K.variable(beta_1)
self.beta_2 = K.variable(beta_2)
self.decay = K.variable(decay)
self.inital_decay = decay def get_updates(self, params, constraints, loss):
grads = self.get_gradients(loss, params)
self.updates = [K.update_add(self.iterations, 1)] lr = self.lr
if self.inital_decay > 0:
lr *= (1. / (1. + self.decay * self.iterations)) t = self.iterations + 1
lr_t = lr * K.sqrt(1. - K.pow(self.beta_2, t)) / (1. - K.pow(self.beta_1, t)) shapes = [K.get_variable_shape(p) for p in params]
ms = [K.zeros(shape) for shape in shapes]
vs = [K.zeros(shape) for shape in shapes]
self.weights = [self.iterations] + ms + vs for p, g, m, v in zip(params, grads, ms, vs):
m_t = (self.beta_1 * m) + (1. - self.beta_1) * g
v_t = (self.beta_2 * v) + (1. - self.beta_2) * K.square(g)
p_t = p - lr_t * m_t / (K.sqrt(v_t) + self.epsilon) self.updates.append(K.update(m, m_t))
self.updates.append(K.update(v, v_t)) new_p = p_t
# apply constraints
if p in constraints:
c = constraints[p]
new_p = c(new_p)
self.updates.append(K.update(p, new_p))
return self.updates def get_config(self):
config = {'lr': float(K.get_value(self.lr)),
'beta_1': float(K.get_value(self.beta_1)),
'beta_2': float(K.get_value(self.beta_2)),
'decay': float(K.get_value(self.decay)),
'epsilon': self.epsilon}
base_config = super(Adam, self).get_config()
return dict(list(base_config.items()) + list(config.items())) class Adamax(Optimizer):
'''Adamax optimizer from Adam paper's Section 7. It is a variant
of Adam based on the infinity norm. Default parameters follow those provided in the paper. # Arguments
lr: float >= 0. Learning rate.
beta_1/beta_2: floats, 0 < beta < 1. Generally close to 1.
epsilon: float >= 0. Fuzz factor. # References
- [Adam - A Method for Stochastic Optimization](http://arxiv.org/abs/1412.6980v8)
'''
def __init__(self, lr=0.002, beta_1=0.9, beta_2=0.999,
epsilon=1e-8, decay=0., **kwargs):
super(Adamax, self).__init__(**kwargs)
self.__dict__.update(locals())
self.iterations = K.variable(0.)
self.lr = K.variable(lr)
self.beta_1 = K.variable(beta_1)
self.beta_2 = K.variable(beta_2)
self.decay = K.variable(decay)
self.inital_decay = decay def get_updates(self, params, constraints, loss):
grads = self.get_gradients(loss, params)
self.updates = [K.update_add(self.iterations, 1)] lr = self.lr
if self.inital_decay > 0:
lr *= (1. / (1. + self.decay * self.iterations)) t = self.iterations + 1
lr_t = lr / (1. - K.pow(self.beta_1, t)) shapes = [K.get_variable_shape(p) for p in params]
# zero init of 1st moment
ms = [K.zeros(shape) for shape in shapes]
# zero init of exponentially weighted infinity norm
us = [K.zeros(shape) for shape in shapes]
self.weights = [self.iterations] + ms + us for p, g, m, u in zip(params, grads, ms, us): m_t = (self.beta_1 * m) + (1. - self.beta_1) * g
u_t = K.maximum(self.beta_2 * u, K.abs(g))
p_t = p - lr_t * m_t / (u_t + self.epsilon) self.updates.append(K.update(m, m_t))
self.updates.append(K.update(u, u_t)) new_p = p_t
# apply constraints
if p in constraints:
c = constraints[p]
new_p = c(new_p)
self.updates.append(K.update(p, new_p))
return self.updates def get_config(self):
config = {'lr': float(K.get_value(self.lr)),
'beta_1': float(K.get_value(self.beta_1)),
'beta_2': float(K.get_value(self.beta_2)),
'decay': float(K.get_value(self.decay)),
'epsilon': self.epsilon}
base_config = super(Adamax, self).get_config()
return dict(list(base_config.items()) + list(config.items())) class Nadam(Optimizer):
'''
Nesterov Adam optimizer: Much like Adam is essentially RMSprop with momentum,
Nadam is Adam RMSprop with Nesterov momentum. Default parameters follow those provided in the paper.
It is recommended to leave the parameters of this optimizer
at their default values. # Arguments
lr: float >= 0. Learning rate.
beta_1/beta_2: floats, 0 < beta < 1. Generally close to 1.
epsilon: float >= 0. Fuzz factor. # References
- [Nadam report](http://cs229.stanford.edu/proj2015/054_report.pdf)
- [On the importance of initialization and momentum in deep learning](http://www.cs.toronto.edu/~fritz/absps/momentum.pdf)
'''
def __init__(self, lr=0.002, beta_1=0.9, beta_2=0.999,
epsilon=1e-8, schedule_decay=0.004, **kwargs):
super(Nadam, self).__init__(**kwargs)
self.__dict__.update(locals())
self.iterations = K.variable(0.)
self.m_schedule = K.variable(1.)
self.lr = K.variable(lr)
self.beta_1 = K.variable(beta_1)
self.beta_2 = K.variable(beta_2)
self.schedule_decay = schedule_decay def get_updates(self, params, constraints, loss):
grads = self.get_gradients(loss, params)
self.updates = [K.update_add(self.iterations, 1)] t = self.iterations + 1 # Due to the recommendations in [2], i.e. warming momentum schedule
momentum_cache_t = self.beta_1 * (1. - 0.5 * (K.pow(0.96, t * self.schedule_decay)))
momentum_cache_t_1 = self.beta_1 * (1. - 0.5 * (K.pow(0.96, (t + 1) * self.schedule_decay)))
m_schedule_new = self.m_schedule * momentum_cache_t
m_schedule_next = self.m_schedule * momentum_cache_t * momentum_cache_t_1
self.updates.append((self.m_schedule, m_schedule_new)) shapes = [K.get_variable_shape(p) for p in params]
ms = [K.zeros(shape) for shape in shapes]
vs = [K.zeros(shape) for shape in shapes] self.weights = [self.iterations] + ms + vs for p, g, m, v in zip(params, grads, ms, vs):
# the following equations given in [1]
g_prime = g / (1. - m_schedule_new)
m_t = self.beta_1 * m + (1. - self.beta_1) * g
m_t_prime = m_t / (1. - m_schedule_next)
v_t = self.beta_2 * v + (1. - self.beta_2) * K.square(g)
v_t_prime = v_t / (1. - K.pow(self.beta_2, t))
m_t_bar = (1. - momentum_cache_t) * g_prime + momentum_cache_t_1 * m_t_prime self.updates.append(K.update(m, m_t))
self.updates.append(K.update(v, v_t)) p_t = p - self.lr * m_t_bar / (K.sqrt(v_t_prime) + self.epsilon)
new_p = p_t # apply constraints
if p in constraints:
c = constraints[p]
new_p = c(new_p)
self.updates.append(K.update(p, new_p))
return self.updates def get_config(self):
config = {'lr': float(K.get_value(self.lr)),
'beta_1': float(K.get_value(self.beta_1)),
'beta_2': float(K.get_value(self.beta_2)),
'epsilon': self.epsilon,
'schedule_decay': self.schedule_decay}
base_config = super(Nadam, self).get_config()
return dict(list(base_config.items()) + list(config.items())) class TFOptimizer(Optimizer): def __init__(self, optimizer):
self.optimizer = optimizer
self.iterations = K.variable(0.)
self.updates = [] def get_updates(self, params, constraints, loss):
if constraints:
raise ValueError('TF optimizers do not support '
'weights constraints. Either remove '
'all weights constraints in your model, '
'or use a Keras optimizer.')
grads = self.optimizer.compute_gradients(loss, params)
opt_update = self.optimizer.apply_gradients(
grads, global_step=self.iterations)
self.updates.append(opt_update)
return self.updates @property
def weights(self):
raise NotImplementedError def get_config(self):
raise NotImplementedError def from_config(self, config):
raise NotImplementedError # aliases
sgd = SGD
rmsprop = RMSprop
adagrad = Adagrad
adadelta = Adadelta
adam = Adam
adamax = Adamax
nadam = Nadam def get(identifier, kwargs=None):
if K.backend() == 'tensorflow':
# Wrap TF optimizer instances
import tensorflow as tf
if isinstance(identifier, tf.train.Optimizer):
return TFOptimizer(identifier)
# Instantiate a Keras optimizer
return get_from_module(identifier, globals(), 'optimizer',
instantiate=True, kwargs=kwargs)
Deep Learning 32: 自己写的keras的一个callbacks函数,解决keras中不能在每个epoch实时显示学习速率learning rate的问题的更多相关文章
- python3 写CSV文件多一个空行的解决办法
Python文档中有提到: open('eggs.csv', newline='') 也就是说,打开文件的时候多指定一个参数.Python文档中也有这样的示例: import csvwith open ...
- 【转载】 迁移学习(Transfer learning),多任务学习(Multitask learning)和端到端学习(End-to-end deep learning)
--------------------- 作者:bestrivern 来源:CSDN 原文:https://blog.csdn.net/bestrivern/article/details/8700 ...
- 我的Keras使用总结(4)——Application中五款预训练模型学习及其应用
本节主要学习Keras的应用模块 Application提供的带有预训练权重的模型,这些模型可以用来进行预测,特征提取和 finetune,上一篇文章我们使用了VGG16进行特征提取和微调,下面尝试一 ...
- 我的Keras使用总结(1)——Keras概述与常见问题整理
今天整理了自己所写的关于Keras的博客,有没发布的,有发布的,但是整体来说是有点乱的.上周有空,认真看了一周Keras的中文文档,稍有心得,整理于此.这里附上Keras官网地址: Keras英文文档 ...
- 我的Keras使用总结(5)——Keras指定显卡且限制显存用量,常见函数的用法及其习题练习
Keras 是一个高层神经网络API,Keras是由纯Python编写而成并基于TensorFlow,Theano以及CNTK后端.Keras为支持快速实验而生,能够将我们的idea迅速转换为结果.好 ...
- 强化学习(Reinforcement Learning)中的Q-Learning、DQN,面试看这篇就够了!
1. 什么是强化学习 其他许多机器学习算法中学习器都是学得怎样做,而强化学习(Reinforcement Learning, RL)是在尝试的过程中学习到在特定的情境下选择哪种行动可以得到最大的回报. ...
- 变量分割技术、判别学习(discriminative learning method)
基于模型的优化方法(model-based optimization method): 小波变换.卡尔曼滤波.中值滤波.均值滤波: 优点:对于处理不同的逆问题都非常灵活:缺点:为了更好的效果而采用各种 ...
- 写一个PHP函数,实现扫描并打印出指定目录下(含子目录)的所有jpg文件名
写一个PHP函数,实现扫描并打印出指定目录下(含子目录)的所有jpg文件名 <?php $dir = "E:\照片\\";//打印文件夹中所有jpg文件 function p ...
- 【深度学习系列】迁移学习Transfer Learning
在前面的文章中,我们通常是拿到一个任务,譬如图像分类.识别等,搜集好数据后就开始直接用模型进行训练,但是现实情况中,由于设备的局限性.时间的紧迫性等导致我们无法从头开始训练,迭代一两百万次来收敛模型, ...
随机推荐
- 【luogu】P1772物流运输(最短路+DP)
题目链接 对于本题我们设ext[i][j]计算第i个码头在前j天总共有几天不能用(其实就一前缀和),设dis[i][j]是从第i天到第j天不变运输路线的最短路径,设f[i]是前i天运输货物的最小花费. ...
- BZOJ 1419 Red is good ——期望DP
定义f[i][j]表示还剩i张红牌,j张黑牌的时候能取得的期望最大值 显然有$f[i][j]=max(0,\frac {i}{i+j}(f[i-1][j]+1)+ \frac {j}{i+j}(f[i ...
- 【DFS序+树状数组】HDU 3887 Counting Offspring
http://acm.hdu.edu.cn/showproblem.php?pid=3887 [题意] 给定一棵树,给定这棵树的根 对于每个结点,统计子树中编号比他小的结点个数 编号从小到大一次输出 ...
- 联合权值(codevs 3728)
Description 无向连通图 G 有 n 个点,n−1 条边.点从 1 到 n 依次编号,编号为 i 的点的权值为 Wi,每条边的长度均为 1.图上两点 (u,v) 的距离定义为 u 点到 v ...
- wamp Apache和mysql服务无法启动的终极解决方法!!!!!!
用了几年的wampserver 突然宣告无法启动,Apache和mysql都崩溃了,在计算机的服务选项里面也是无法启动的,系统报了一个未知错误,重装了N个版本的PHP集成开发环境,都宣告失败! 我想应 ...
- 页面中用Context.Handler传递
最近被WCF弄得身心疲惫.今天抽空看了一下页面传值的一些技巧.传统的cookie session 什么的就不介绍了 今天介绍Context的用法 首先要应用using System.Runtim ...
- 【BZOJ3224】普通平衡树(splay)
题意: 您需要写一种数据结构(可参考题目标题),来维护一些数,其中需要提供以下操作:1. 插入x数2. 删除x数(若有多个相同的数,因只删除一个)3. 查询x数的排名(若有多个相同的数,因输出最小的排 ...
- Oracle常用操作【自己的练习】
Oracle查询的时候条件要用单引号包裹,不能用双引号;Oracle的in子查询里面的值最多有1000个........ 连接orcl数据库 C:\Windows\system32@orcl as s ...
- R语言入门视频笔记--9--随机与数据描述分析
古典概型的样本总量是一定的,且每种可能的可能性是相同的, 1.中位数:median(x) 2.百分位数:quantile(x)或者quantile(x,probe=seq(0,1,0.2)) #后面这 ...
- Python入门--13--递归
什么是递归: 有调用函数自身的行为 有一个正确的返回条件 设置递归的深度: import sys sys.setrecursionlimit(10000) #可以递归一万次 用普通的方法也就是非递归版 ...