A Deep Neural Network’s Loss Surface Contains Every Low-dimensional Pattern
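Overview
The authors give a theoretical analysis of the loss surface: they prove that a sufficiently large neural network can approximate every low-dimensional loss pattern.
Related work
The paper relies on universal approximators in several places.
Main content
Lemma 1
\(\mathcal{F}\) defines the universal approximators: any function \(f\) on the same domain can be approximated by elements of \(\mathcal{F}\). \(\sigma(f_\theta)\) extends the range, and this does not affect the universal approximator property.
Theorem 1
Proof: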
Let the weight matrix of the network's first layer be \(\theta_W \in \mathbb{R}^{d \times k}\), its bias vector \(\theta_b\), and the remaining parameters of the network \(\theta'\); write \(\theta = \{\theta_W, \theta_b, \theta'\}\). The network output is then
\[
f_{\theta}(x) = f_{\{\theta_W, \theta_b, \theta' \}}(x) = g_{\theta'}(\langle x, \theta_W \rangle + \theta_b).
\]
The loss over \(N\) sample points is
\[
L(\theta) = \frac{1}{N} \sum_i \ell (f_{\theta}(x_i), y_i).
\]
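The decomposition into a first layer and "the rest of the network" can be sketched numerically. This is only an illustration: the one-layer tanh form of `g`, the squared-error choice for \(\ell\), and all sizes and random values are assumptions made here, not taken from the paper.

```python
import numpy as np

def first_layer(x, theta_W, theta_b):
    # Pre-activation of the first layer: <x, theta_W> + theta_b.
    return x @ theta_W + theta_b

def g(u, theta_rest):
    # Hypothetical "rest of the network": tanh followed by a linear readout.
    V, c = theta_rest
    return np.tanh(u) @ V + c

def loss(theta_W, theta_b, theta_rest, X, Y):
    # L(theta) = (1/N) * sum_i l(f_theta(x_i), y_i), with squared error as l (assumed).
    preds = g(first_layer(X, theta_W, theta_b), theta_rest)
    return np.mean((preds - Y) ** 2)

rng = np.random.default_rng(0)
d, k, N = 3, 4, 5                      # input dim, first-layer width, sample count
X = rng.normal(size=(N, d))
Y = rng.normal(size=N)
theta_W = rng.normal(size=(d, k))
theta_b = rng.normal(size=k)
theta_rest = (rng.normal(size=k), 0.0)

L_value = loss(theta_W, theta_b, theta_rest, X, Y)
```

Here `L_value` is a single non-negative number, the average per-sample loss for one setting of \(\theta\).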
Now suppose the target \(z\)-dimensional loss pattern (which should be a continuous function) is
\[
\mathcal{T}(h_1,h_2,\ldots, h_z):[0,1]^z \rightarrow [0, 1].
\]
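We now want to treat some of the network's parameters as the variables \(h_1,\ldots,h_z\), so that the loss approximates \(\mathcal{T}\).
Set \(\theta_W=0\) (so the network output does not depend on \(x\)) and \(\theta_b=[h_1,\ldots, h_z,0,\ldots,0]\) (which implicitly assumes \(k \ge z\)).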
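A small check of this step, as a sketch with made-up sizes (\(d=3\), \(k=5\), \(z=2\)) and made-up values for \(h\): with a zero weight matrix, the first-layer pre-activation equals the bias for every input, so the first \(z\) coordinates carry exactly \(h\).

```python
import numpy as np

def first_layer(x, theta_W, theta_b):
    # Pre-activation of the first layer: <x, theta_W> + theta_b.
    return x @ theta_W + theta_b

d, k, z = 3, 5, 2                        # illustrative sizes, requires k >= z
h = np.array([0.3, 0.8])                 # the z free variables, each in [0, 1]
theta_W = np.zeros((d, k))               # zero weights: the input is ignored
theta_b = np.concatenate([h, np.zeros(k - z)])

rng = np.random.default_rng(1)
x_a = rng.normal(size=d)
x_b = rng.normal(size=d)
u_a = first_layer(x_a, theta_W, theta_b)
u_b = first_layer(x_b, theta_W, theta_b)
```

Both `u_a` and `u_b` equal `theta_b`, so whatever the remaining layers compute is a function of \(h\) alone.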
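By the universal approximation theorem, we can make \(q_{\theta'}\) (the remaining network, written \(g_{\theta'}\) in the output formula above) an approximator. Correspondingly,
define \(\sigma(p):=\frac{1}{N}\sum_i \ell(p,y_i)\) with \(p = q_{\theta'}(h_1,\ldots, h_z)\). As long as \(\sigma\) satisfies the conditions of Lemma 1, there exists \(\theta_{\epsilon}(\mathcal{T})\) such that \(L(h_1,h_2,\ldots, h_z, \theta_{\epsilon}(\mathcal{T}))\) approximates \(\mathcal{T}\).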
Theorem 2
Honestly, I did not fully understand this theorem. Judging from the proof, the global minimum seems to refer to the minimum of \(\mathcal{T}(h)\).
Proof:
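Keep \(\theta_b\) as before and set only the first \(z\) columns of \(\theta_W\) to 0. The (pre-activation) output of the first layer is then \((h_1,\ldots,h_z,\phi(x))\), so the network output becomes \(q_{\theta'}(h_1,\ldots,h_z,\phi(x))\).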
Let \(h^* := \arg \min_{h \in [0,1]^z} \mathcal{T}(h)\), and assume \(L^*=\mathcal{T}(h^*)\) (?). Assume that the per-sample loss \(\ell_i(p) = \ell (p, y_i)\) is invertible with a smooth inverse (a property that is common among loss functions).
Under this assumption, we have
\[
q_{\theta'}(h, \phi(x_i)) \approx \ell_i^{-1}(\mathcal{T}(h)).
\]
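To see why this target is the right one, here is a sketch in which the network output is replaced by the ideal value \(\ell_i^{-1}(\mathcal{T}(h))\) itself. Everything concrete is an assumption for illustration: the squared-error loss restricted to the branch \(p \ge y_i\) (where it is invertible), the particular 2-D pattern, and the labels.

```python
import numpy as np

def target_pattern(h):
    # Hypothetical 2-D target pattern T: [0,1]^2 -> [0,1].
    return 0.5 * (np.sin(3.0 * h[0]) ** 2 + h[1] ** 2)

def loss_inverse(t, y):
    # Inverse of l_i(p) = (p - y_i)^2 on the branch p >= y_i (assumed loss).
    return y + np.sqrt(t)

def ideal_output(h, Y):
    # The values that q_{theta'}(h, phi(x_i)) is asked to approximate, one per sample.
    return loss_inverse(target_pattern(h), Y)

def loss_surface(h, Y):
    # L(h) = (1/N) * sum_i l_i(q(h, phi(x_i))), with q replaced by its ideal target.
    preds = ideal_output(h, Y)
    return np.mean((preds - Y) ** 2)

Y = np.array([-1.0, 0.2, 3.5])           # made-up labels
grid = [np.array([a, b])
        for a in np.linspace(0.0, 1.0, 5)
        for b in np.linspace(0.0, 1.0, 5)]
max_gap = max(abs(loss_surface(h, Y) - target_pattern(h)) for h in grid)
```

On this grid `max_gap` is at floating-point rounding level: if every per-sample output hits \(\ell_i^{-1}(\mathcal{T}(h))\), the averaged loss reproduces \(\mathcal{T}(h)\) regardless of the labels. A real network only approximates these targets, which is where the error terms discussed next come in.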
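The paper says this approximation also follows from the approximation theorem. For a fixed \(i\) it clearly holds; as for how to guarantee it for all \(i=1,\ldots,N\) simultaneously, I have an idea.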
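Suppose the distance between the two (in the \(\infty\)-norm) is \(\epsilon_i^h \in \mathbb{R}\). Bounding the averaged loss through these per-sample errors then gives \(|L(h^*)-\mathcal{T}(h^*)|<\epsilon\).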
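One possible way to spell out this step is sketched below. It assumes, beyond what the notes state, that each \(\ell_i\) is Lipschitz with constant \(K_i\) on the range of outputs involved, and it reads \(\epsilon_i^h\) as \(|q_{\theta'}(h,\phi(x_i)) - \ell_i^{-1}(\mathcal{T}(h))|\). Since \(\mathcal{T}(h) = \frac{1}{N}\sum_i \ell_i(\ell_i^{-1}(\mathcal{T}(h)))\),
\[
|L(h) - \mathcal{T}(h)| = \left| \frac{1}{N}\sum_i \Big[ \ell_i\big(q_{\theta'}(h,\phi(x_i))\big) - \ell_i\big(\ell_i^{-1}(\mathcal{T}(h))\big) \Big] \right| \le \frac{1}{N}\sum_i K_i\, \epsilon_i^h .
\]
Under these assumptions, making every \(\epsilon_i^h\) smaller than \(\epsilon / \max_i K_i\) is enough to get \(|L(h)-\mathcal{T}(h)|<\epsilon\), in particular at \(h = h^*\).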
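The question I care about more is whether one can choose suitable loss patterns (which amounts to choosing a suitable space) so that the network does well in some respect (for example, resistance to overfitting, or optimality).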