ON LARGE BATCH TRAINING FOR DEEP LEARNING: GENERALIZATION GAP AND SHARP MINIMA

2024-09-02 15:36:27 原文

目录

概
主要内容
- 一些解决办法

Keskar N S, Mudigere D, Nocedal J, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima[J]. arXiv: Learning, 2016.

作者代码

@article{keskar2016on,

title={On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima},

author={Keskar, Nitish Shirish and Mudigere, Dheevatsa and Nocedal, Jorge and Smelyanskiy, Mikhail and Tang, Ping Tak Peter},

journal={arXiv: Learning},

year={2016}}

概

本文主要阐述了一种现象, 就是在我们训练网络的时候, 小的batch_size会比大的batch_size效果更好(表现在准确率上).

主要内容

因为作者主要是进行实验论证的, 所以就介绍一下结果, 我们用LB表示大的batch_size, SB表示小的batch_size.

作者认为, LB会导致参数尖化, 而SB会导致平坦的解, 个人感觉这种就是一个灵敏度的问题. 作者也说, LB会导致\(\nabla^2 f(x)\)呈现某个特征值特别大(绝对值), 其余特征值很小的情况, 而SB的\(\nabla^2 f(x)\)的特征值分布往往比较均匀.

注: 这里的\(x\)指的是网络的参数而非样本.

记LB训练后所对应的解为\(x^*_l\), 而SB训练后所对应的解为\(x^*_s\), 作者沿着俩个点的连续探索其landscape,

\[f(ax^*_l+(1-a)x_s^*), \quad a \in [-1,2],
\]

其结果如下

显然, 在\(\alpha=1\)处(即\(x=x_l^*\))左右的未知变化特别大, 这也反应了尖的特性.

一些解决办法

data augmentation, 效果显著
conservative training, 即采用proximal下降

\[\tag{5}
x_{k+1} = \argmin_x \frac{1}{|B_k|} \sum_{i \in B_k} f_i(x) + \frac{\lambda}{2} \|x - x_k\|_2^2,
\]

其中\(f_i\)表示输入为第\(i\)个样本.

3. robust training, 即利用原样本和对抗样本进行训练, 但是效果不是很明显(有可能是Goodfellow的机制不对? 新的是不需要利用原样本的).

ON LARGE BATCH TRAINING FOR DEEP LEARNING: GENERALIZATION GAP AND SHARP MINIMA的更多相关文章

16 On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima 1609.04836v1
Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, Ping Tak Peter Tang N ...
Deep Learning中的Large Batch Training相关理论与实践
背景 [作者:DeepLearningStack,阿里巴巴算法工程师,开源TensorFlow Contributor] 在分布式训练时,提高计算通信占比是提高计算加速比的有效手段,当网络通信优化到一 ...
[Converge] Feature Selection in training of Deep Learning
特征相关性对于DL的影响链接:https://www.zhihu.com/question/47908908/answer/110987483 经验一: 1. 输入特征最好不相关.如果某些维输入的 ...
Spectral Norm Regularization for Improving the Generalizability of Deep Learning论文笔记
Spectral Norm Regularization for Improving the Generalizability of Deep Learning论文笔记 2018年12月03日 00: ...
Deep Learning in a Nutshell: History and Training
Deep Learning in a Nutshell: History and Training This series of blog posts aims to provide an intui ...
[C3] Andrew Ng - Neural Networks and Deep Learning
About this Course If you want to break into cutting-edge AI, this course will help you do so. Deep l ...
（转）分布式深度学习系统构建简介 Distributed Deep Learning
HOME ABOUT CONTACT SUBSCRIBE VIA RSS DEEP LEARNING FOR ENTERPRISE Distributed Deep Learning, Part ...
A Full Hardware Guide to Deep Learning
A Full Hardware Guide to Deep Learning Deep Learning is very computationally intensive, so you will ...
A Full Hardware Guide to Deep Learning深度学习电脑配置
https://study.163.com/provider/400000000398149/index.htm?share=2&shareId=400000000398149( 欢迎关注博 ...

随机推荐

linux 实用指令搜索查找类
linux 实用指令搜索查找类目录 linux 实用指令搜索查找类 find指令 locate指令 grep指令和管道符号 | find指令说明从指定目录向下递归地遍历其各个子目录,将满足条件的 ...
python写的多项式符号乘法
D:\>poly.py(x - 1) * (x^2 + x + 1) = x^3 - 1 1 import ply.lex as lex # pip install ply 2 import p ...
day03 MySQL数据库之主键与外键
day03 MySQL数据库之主键与外键昨日内容回顾针对库的基本SQL语句 # 增 create database meng; # 查 show databases; shwo create da ...
flink03-----1.Task的划分 2.共享资源槽 3.flink的容错
1. Task的划分在flink中,划分task的依据是发生shuffle(也叫redistrubute),或者是并行度发生变化 1. wordcount为例 package cn._51doit ...
大数据学习day33----spark13-----1.两种方式管理偏移量并将偏移量写入redis 2. MySQL事务的测试 3.利用MySQL事务实现数据统计的ExactlyOnce（sql语句中出现相同key时如何进行累加（此处时出现相同的单词））4 将数据写入kafka
1.两种方式管理偏移量并将偏移量写入redis (1)第一种:rdd的形式一般是使用这种直连的方式,但其缺点是没法调用一些更加高级的api,如窗口操作.如果想更加精确的控制偏移量,就使用这种方式代 ...
栈常考应用之括号匹（C++）
思路在注释里.还是使用链栈的API,为啥使用链栈呢,因为喜欢链栈. //header.h #pragma once #include<iostream> using namespace s ...
Redis缓存穿透、缓存击穿以及缓存雪崩
作为一个内存数据库,redis也总是免不了有各种各样的问题,这篇文章主要是针对其中三个问题进行讲解:缓存穿透.缓存击穿和缓存雪崩.并给出一些解决方案.这三个问题是基本问题也是面试常问问题. 这篇文章我 ...
Spring 与 SpringBoot 的区别
概述 Spring 与 SpringBoot 有什么区别???梳理一下 Spring 和 SpringBoot 到底有什么区别,从 Spring 和 SpringBoot 两方面入手. Spring ...
用户信息系统_serviceImpl
package com.hopetesting.service.impl;import com.hopetesting.dao.UserDao;import com.hopetesting.dao.i ...
深入浅出 Docker
一.什么Docker 从作用的角度: Docker是一个为开发人员和系统管理员开发.迁移和运行应用程序的平台.应用程序通过Docker打包成Docker Image后,可以实现统一的方式来下载.启动. ...