I. The Question

For the past few days I have been puzzling over one question:

Why does the same code fit well under Keras 0.3.3, with no overfitting and validation accuracy consistently above training accuracy, yet overfit under Keras 1.2.0, with validation error consistently above training error?

II. The Answer

Today I finally found the cause: the optimizer implementations in these two Keras versions are different, even though their default parameters are the same. Since my code optimizes with Adam, I will use the Adam class in optimizers.py as the example:

1. The Adam implementation in optimizers.py for keras==0.3.3:

class Adam(Optimizer):
    '''Adam optimizer.

    Default parameters follow those provided in the original paper.

    # Arguments
        lr: float >= 0. Learning rate.
        beta_1/beta_2: floats, 0 < beta < 1. Generally close to 1.
        epsilon: float >= 0. Fuzz factor.

    # References
        - [Adam - A Method for Stochastic Optimization](http://arxiv.org/abs/1412.6980v8)
    '''
    def __init__(self, lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8,
                 *args, **kwargs):
        super(Adam, self).__init__(**kwargs)
        self.__dict__.update(locals())
        self.iterations = K.variable(0)
        self.lr = K.variable(lr)
        self.beta_1 = K.variable(beta_1)
        self.beta_2 = K.variable(beta_2)

    def get_updates(self, params, constraints, loss):
        grads = self.get_gradients(loss, params)
        self.updates = [(self.iterations, self.iterations + 1.)]

        t = self.iterations + 1
        lr_t = self.lr * K.sqrt(1 - K.pow(self.beta_2, t)) / (1 - K.pow(self.beta_1, t))

        for p, g, c in zip(params, grads, constraints):
            # zero init of moment
            m = K.variable(np.zeros(K.get_value(p).shape))
            # zero init of velocity
            v = K.variable(np.zeros(K.get_value(p).shape))

            m_t = (self.beta_1 * m) + (1 - self.beta_1) * g
            v_t = (self.beta_2 * v) + (1 - self.beta_2) * K.square(g)
            p_t = p - lr_t * m_t / (K.sqrt(v_t) + self.epsilon)

            self.updates.append((m, m_t))
            self.updates.append((v, v_t))
            self.updates.append((p, c(p_t)))  # apply constraints
        return self.updates

    def get_config(self):
        return {"name": self.__class__.__name__,
                "lr": float(K.get_value(self.lr)),
                "beta_1": float(K.get_value(self.beta_1)),
                "beta_2": float(K.get_value(self.beta_2)),
                "epsilon": self.epsilon}

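For reference, this is the update rule from the paper linked in the docstring, written in the notation of the listing (lr_t is the bias-corrected step size, m and v the moment and velocity estimates):

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\,g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\,g_t^2 \\
lr_t &= lr \cdot \sqrt{1-\beta_2^t} \,/\, (1-\beta_1^t) \\
p_t &= p_{t-1} - lr_t \, m_t / (\sqrt{v_t} + \epsilon)
\end{aligned}
$$
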
2. The Adam implementation in optimizers.py for keras==1.2.0:

class Adam(Optimizer):
    '''Adam optimizer.

    Default parameters follow those provided in the original paper.

    # Arguments
        lr: float >= 0. Learning rate.
        beta_1/beta_2: floats, 0 < beta < 1. Generally close to 1.
        epsilon: float >= 0. Fuzz factor.

    # References
        - [Adam - A Method for Stochastic Optimization](http://arxiv.org/abs/1412.6980v8)
    '''
    def __init__(self, lr=0.001, beta_1=0.9, beta_2=0.999,
                 epsilon=1e-8, decay=0., **kwargs):
        super(Adam, self).__init__(**kwargs)
        self.__dict__.update(locals())
        self.iterations = K.variable(0)
        self.lr = K.variable(lr)
        self.beta_1 = K.variable(beta_1)
        self.beta_2 = K.variable(beta_2)
        self.decay = K.variable(decay)
        self.inital_decay = decay  # sic: this misspelling is in the Keras source

    def get_updates(self, params, constraints, loss):
        grads = self.get_gradients(loss, params)
        self.updates = [K.update_add(self.iterations, 1)]

        lr = self.lr
        if self.inital_decay > 0:
            lr *= (1. / (1. + self.decay * self.iterations))

        t = self.iterations + 1
        lr_t = lr * K.sqrt(1. - K.pow(self.beta_2, t)) / (1. - K.pow(self.beta_1, t))

        shapes = [K.get_variable_shape(p) for p in params]
        ms = [K.zeros(shape) for shape in shapes]
        vs = [K.zeros(shape) for shape in shapes]
        self.weights = [self.iterations] + ms + vs

        for p, g, m, v in zip(params, grads, ms, vs):
            m_t = (self.beta_1 * m) + (1. - self.beta_1) * g
            v_t = (self.beta_2 * v) + (1. - self.beta_2) * K.square(g)
            p_t = p - lr_t * m_t / (K.sqrt(v_t) + self.epsilon)

            self.updates.append(K.update(m, m_t))
            self.updates.append(K.update(v, v_t))

            new_p = p_t
            # apply constraints
            if p in constraints:
                c = constraints[p]
                new_p = c(new_p)
            self.updates.append(K.update(p, new_p))
        return self.updates

    def get_config(self):
        config = {'lr': float(K.get_value(self.lr)),
                  'beta_1': float(K.get_value(self.beta_1)),
                  'beta_2': float(K.get_value(self.beta_2)),
                  'decay': float(K.get_value(self.decay)),
                  'epsilon': self.epsilon}
        base_config = super(Adam, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

Comparing the two listings, the implementations are clearly not the same: 1.2.0 adds a decay parameter, registers the moment estimates as persistent optimizer weights (self.weights), and applies constraints via a dict lookup rather than by zipping them positionally. My code always used Adam with its default parameters, which is why the same script produces different results under the two versions.
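The decay term is the most visible difference, so here is a minimal NumPy sketch (my own illustration, not Keras code) of the effective step size each listing computes, with t standing in for the iteration counter. Note that with the default decay=0. the two formulas coincide, so the remaining behavioral differences come from the surrounding bookkeeping (persistent moments, constraint handling) rather than from the step size itself:

import numpy as np

lr, beta_1, beta_2 = 0.001, 0.9, 0.999  # defaults shared by both versions

def lr_t_033(t):
    # keras 0.3.3: bias-corrected step size only
    return lr * np.sqrt(1 - beta_2 ** t) / (1 - beta_1 ** t)

def lr_t_120(t, decay=0.0):
    # keras 1.2.0: same bias correction, with an optional decay factor applied
    # first (decay scales with the running iteration count, roughly t - 1)
    base = lr / (1.0 + decay * (t - 1)) if decay > 0 else lr
    return base * np.sqrt(1.0 - beta_2 ** t) / (1.0 - beta_1 ** t)

for t in (1, 10, 100):
    print(t, lr_t_033(t), lr_t_120(t), lr_t_120(t, decay=1e-4))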

III. The Solution

This problem can be avoided in the following ways:

1. Specify the optimizer's parameters explicitly in your own code instead of relying on the defaults, for example:

adam = optimizers.Adam(lr=1e-3)  # illustrative value; pick the rate your task needs
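Pinning every hyperparameter removes any dependence on what a particular version chose as defaults. A minimal sketch, assuming model, X_train, and Y_train already exist in your code; the values shown are the documented defaults of both versions, and the loss is a placeholder:

from keras import optimizers

# every Adam hyperparameter given explicitly (decay exists only from 1.x onward)
adam = optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8)
model.compile(loss='categorical_crossentropy', optimizer=adam)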

However, the official Keras documentation explicitly states that these optimizers are best used with their default parameters, so the second method below can be used instead.

2. Control the learning rate explicitly during training, i.e. pass a schedule callback through the callbacks argument of fit.

For example:

# Callback that implements the learning rate schedule
# (boundary epoch and rates below are illustrative values)
schedule = Step([20], [1e-3, 1e-4])

history = model.fit(X_train, Y_train,
                    batch_size=batch_size, nb_epoch=nb_epoch,
                    validation_data=(X_test, Y_test),
                    callbacks=[
                        schedule,
                        # saves the model to filepath after every epoch
                        keras.callbacks.ModelCheckpoint(filepath, monitor='val_loss', verbose=1,
                                                        save_best_only=True, mode='auto')
                        # stops training once the monitored value stops improving: when early
                        # stopping is triggered (e.g. val_loss fails to drop relative to the
                        # previous epoch), training halts after `patience` more epochs
                        # , keras.callbacks.EarlyStopping(monitor='val_loss', patience=5, verbose=1, mode='auto')
                    ],
                    verbose=1, shuffle=True)

The Step callback is defined as follows:

class Step(Callback):

    def __init__(self, steps, learning_rates, verbose=0):
        self.steps = steps        # epoch boundaries, in ascending order
        self.lr = learning_rates  # one more rate than there are boundaries
        self.verbose = verbose

    def change_lr(self, new_lr):
        old_lr = K.get_value(self.model.optimizer.lr)
        K.set_value(self.model.optimizer.lr, new_lr)
        if self.verbose == 1:
            print('Learning rate is %g' % new_lr)

    def on_epoch_begin(self, epoch, logs={}):
        # pick the rate for the interval the current epoch falls into
        for i, step in enumerate(self.steps):
            if epoch < step:
                self.change_lr(self.lr[i])
                return
        self.change_lr(self.lr[i + 1])

    def get_config(self):
        config = {'class': type(self).__name__,
                  'steps': self.steps,
                  'learning_rates': self.lr,
                  'verbose': self.verbose}
        return config

    @classmethod
    def from_config(cls, config):
        offset = config.get('epoch_offset', 0)
        steps = [step - offset for step in config['steps']]
        return cls(steps, config['learning_rates'],
                   verbose=config.get('verbose', 0))
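If you do not need Step's get_config/from_config serialization, the built-in keras.callbacks.LearningRateScheduler can express the same schedule. A sketch using the same illustrative boundary and rates as above:

from keras.callbacks import LearningRateScheduler

def step_schedule(epoch):
    # one rate before the boundary epoch, another after (illustrative values)
    return 1e-3 if epoch < 20 else 1e-4

lr_schedule = LearningRateScheduler(step_schedule)
# then pass it in place of `schedule`: model.fit(..., callbacks=[lr_schedule])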
