Cho Y, Saul L K. Kernel Methods for Deep Learning[C]. neural information processing systems, 2009: 342-350.

@article{cho2009kernel,

title={Kernel Methods for Deep Learning},

author={Cho, Youngmin and Saul, Lawrence K},

pages={342--350},

year={2009}}

这篇文章介绍了一种新的核函数, 其启发来自于神经网络的运算.



其中\(\Theta(z)=\frac{1}{2}(1+\mathrm{sign}(z))\).

主要内容

主要性质, 公式(1)可以表示成:

\[k_n(\mathbf{x}, \mathbf{y}) = \frac{1}{\pi} \|\mathbf{x}\|^n\|\mathbf{y}\|^n J_n(\theta).
\tag{2}
\]

其中:

\[J_n(\theta) = (-1)^n (\sin \theta)^{2n+1} (\frac{1}{\sin \theta} \frac{\partial}{\partial \theta})^n(\frac{\pi-\theta}{\sin \theta}).
\tag{3}
\]
\[\theta = \cos^{-1} (\frac{\mathbf{x}\cdot \mathbf{y}}{\|\mathbf{x}\| \|\mathbf{y}\|}).
\tag{4}
\]

特别的:

其证明如下:



第(17)的证明我没有推, 因为 contour integration 暂时不了解.

细心的读者可能会发现, 最后的结果是\(\frac{\partial^n}{\partial(\cos \theta)^n}\), 注意对于一个函数\(f(\cos \theta)\), 我们可以令\(g(\theta) = f(\cos \theta)\)则:

\[\frac{\partial f}{\partial \cos \theta} = \frac{\partial{g}}{\partial \theta} \frac{\partial\theta}{\partial \cos \theta},
\]

\[\mathrm{d}\cos \theta =-\sin \theta \mathrm{d} \theta.
\]

便得结论.

与深度学习的联系

如果我们把注意力集中在某一层, 假设输入为\(\mathbf{x}\), 输出为:

\[\mathbf{f}(\mathbf{x}) = g(W\mathbf{x}) \in \mathbb{R}^m,
\]

其中\(g(z) = \Theta(z) z^n\)是激活函数, 不同的n有如下的表现:



\(n=1\)便是我们熟悉的ReLU.

考虑俩个输入\(\mathbf{x},\mathbf{y}\)所对应的输出\(\mathbf{f}(\mathbf{x}),\mathbf{f}(\mathbf{y})\)的内积:

\[\mathbf{f}(\mathbf{x}) \cdot \mathbf{f}(\mathbf{y}) = \sum_{i=1}^m \Theta(\mathbf{w}_i \cdot \mathbf{x}) \Theta(\mathbf{w}_i \cdot \mathbf{y}) (\mathbf{w}_i \cdot \mathbf{x})^n (\mathbf{w}_i \cdot \mathbf{y})^n
\]

如果每个权重\(W_{ij}\)都服从标准正态分布, 则:

\[\lim_{m \rightarrow \infty} \frac{2}{m} \mathbf{f} (\mathbf{x}) \cdot \mathbf{f}(\mathbf{x}) = k_n(\mathbf{x}, \mathbf{y}).
\]

实验

实验失败了, 代码如下.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.svm import NuSVC
"""
Arc_cosine kernel
"""
class Arc_cosine: def __init__(self, n=1):
self.n = n
self.own_kernel = self.kernels(n) def kernel0(self, x, y):
norm_x = np.linalg.norm(x)
norm_y = np.linalg.norm(y)
cos_value = x @ y / (norm_x *
norm_y)
angle = np.arccos(cos_value)
return 1 - angle / np.pi def kernel1(self, x, y):
norm_x = np.linalg.norm(x)
norm_y = np.linalg.norm(y)
cos_value = x @ y / (norm_x *
norm_y)
angle = np.arccos(cos_value)
sin_value = np.sin(angle)
return (norm_x * norm_y) ** self.n * \
(sin_value + (np.pi - angle) *
cos_value) / np.pi def kernel2(self, x, y):
norm_x = np.linalg.norm(x)
norm_y = np.linalg.norm(y)
cos_value = x @ y / (norm_x *
norm_y)
angle = np.arccos(cos_value)
sin_value = np.sin(angle)
return (norm_x * norm_y) ** self.n * \
3 * sin_value * cos_value + \
(np.pi - angle) * (1 + 2 * cos_value ** 2) def kernels(self, n):
if n is 0:
return self.kernel0
elif n is 1:
return self.kernel1
elif n is 2:
return self.kernel2
else:
raise ValueError("No such kernel, n should be "
"0, 1 or 2") def kernel(self, X, Y):
m = X.shape[0]
n = Y.shape[0]
C = np.zeros((m, n))
for i in range(m):
for j in range(n):
C[i, j] = self.own_kernel(
X[i], Y[j]
)
return C def __call__(self, X, Y):
return self.kernel(X, Y)

在俩个数据上进行SVM, 数据如下:





在SVM上跑:

'''
#生成圈圈数据
def generate_data(circle, r1, r2, nums=300):
variance = 1
rs1 = np.random.randn(nums) * variance + r1
rs2 = np.random.randn(nums) * variance + r2
angles = np.linspace(0, 2*np.pi, nums)
data1 = (rs1 * np.sin(angles) + circle[0],
rs1 * np.cos(angles) + circle[1])
data2 = (rs2 * np.sin(angles) + circle[0],
rs2 * np.cos(angles) + circle[1])
df1 = pd.DataFrame({'x':data1[0], 'y': data1[1],
'label':np.ones(nums)})
df2 = pd.DataFrame({'x':data2[0], 'y': data2[1],
'label':-np.ones(nums)}) return df1, df2
''' #生成十字数据
def generate_data(left, right, down, up,
circle=(0., 0.), nums=300):
variance = 1
y1 = np.random.rand(nums) * variance + circle[1]
x2 = np.random.rand(nums) * variance + circle[0]
x1 = np.linspace(left, right, nums)
y2 = np.linspace(down, up, nums)
df1 = pd.DataFrame(
{'x': x1,
'y': y1,
'label':np.ones_like(x1)}
)
df2 = pd.DataFrame(
{'x': x2,
'y': y2,
'label':-np.ones_like(x2)}
)
return df1, df2 def pre_test(left, right, func, nums=100):
x1, y1 = left
x2, y2 = right
x = np.linspace(x1, x2, nums)
y = np.linspace(y1, y2, nums)
X,Y = np.meshgrid(x,y)
m, n = X.shape
Z = func(np.vstack((X.reshape(1, -1),
Y.reshape(1, -1))).T).reshape(m, n) return X, Y, Z df1, df2 = generate_data(-10, 10, -10, 10)
df = df1.append(df2)
classifer2 = NuSVC(kernel=Arc_cosine(n=1))
classifer2.fit(df.iloc[:, :2], df['label'])
X, Y, Z = pre_test((-10, -10), (10, 10), classifer2.predict)
plt.contourf(X, Y, Z)
plt.show()

预测结果均为:

而在一般的RBF上, 结果都是很好的:

在多项式核上也ok:

如果有人能发现代码中的错误,请务必指正.

Kernel Methods for Deep Learning的更多相关文章

  1. (转) Ensemble Methods for Deep Learning Neural Networks to Reduce Variance and Improve Performance

    Ensemble Methods for Deep Learning Neural Networks to Reduce Variance and Improve Performance 2018-1 ...

  2. 深度学习的集成方法——Ensemble Methods for Deep Learning Neural Networks

    本文主要参考Ensemble Methods for Deep Learning Neural Networks一文. 1. 前言 神经网络具有很高的方差,不易复现出结果,而且模型的结果对初始化参数异 ...

  3. Paper List ABOUT Deep Learning

    Deep Learning 方向的部分 Paper ,自用.一 RNN 1 Recurrent neural network based language model RNN用在语言模型上的开山之作 ...

  4. Deep Learning方向的paper

    转载 http://hi.baidu.com/chb_seaok/item/6307c0d0363170e73cc2cb65 个人阅读的Deep Learning方向的paper整理,分了几部分吧,但 ...

  5. Kernel Functions for Machine Learning Applications

    In recent years, Kernel methods have received major attention, particularly due to the increased pop ...

  6. Deep Learning and the Triumph of Empiricism

    Deep Learning and the Triumph of Empiricism By Zachary Chase Lipton, July 2015 Deep learning is now ...

  7. How To Improve Deep Learning Performance

    如何提高深度学习性能 20 Tips, Tricks and Techniques That You Can Use ToFight Overfitting and Get Better Genera ...

  8. My deep learning reading list

    My deep learning reading list 主要是顺着Bengio的PAMI review的文章找出来的.包括几本综述文章,将近100篇论文,各位山头们的Presentation.全部 ...

  9. Deep Learning关于Vision的Reading List

    最近开始学习深度学习了,加油! 下文转载自:http://blog.sina.com.cn/s/blog_bda0d2f10101fpp4.html 主要是顺着Bengio的PAMI review的文 ...

随机推荐

  1. A Child's History of England.28

    By such means, and by taxing and oppressing the English people in every possible way, the Red King b ...

  2. Oracle—merge into语法

    oracle的merge into语法,在这种情况下: 基于某些字段,存在就更新,不存在就插入: 不需要先去判断一下记录是否存在,直接使用merge into merge into 语法: MERGE ...

  3. GO 时间处理

    比较大小 比较大小 先把当前时间格式化成相同格式的字符串,然后使用time的Before, After, Equal 方法即可. time1 := "2015-03-20 08:50:29& ...

  4. Copy constructor vs assignment operator in C++

    Difficulty Level: Rookie Consider the following C++ program. 1 #include<iostream> 2 #include&l ...

  5. ubuntu基础

    下载地址: http://cdimage.ubuntu.com/releases/ #:配置多网卡静态IP地址和路由 root@ubuntu:~# vim /etc/netplan/01-netcfg ...

  6. Oracle常用函数(SQL语句)

    使用sql函数,您可以在一个select语句的查询当中,直接计算数据库资料的平均值.总数.最小值.最大值.总和.标准差.变异数等统计.使用recordset对象时,也可使用这些sql函数. sql函数 ...

  7. CentOs 7 yum 安装Nginx

    打开官网下载文档:http://nginx.org/en/download.html 2进入操作系统 centOs 7,建立文件夹 nginx ,进入nginx ,拷贝 上图1编辑命令:/etc/yu ...

  8. 【力扣】122. 买卖股票的最佳时机 II

    给定一个数组,它的第 i 个元素是一支给定股票第 i 天的价格. 设计一个算法来计算你所能获取的最大利润.你可以尽可能地完成更多的交易(多次买卖一支股票). 注意:你不能同时参与多笔交易(你必须在再次 ...

  9. 【C++】最长回文子串/动态规划

    ACM #include <bits/stdc++.h> using namespace std; const int maxn = 1010; char S[maxn]; int dp[ ...

  10. Jenkins多分支构建

    目录 一.创建多分支pipeline 二.根据分支部署 gitlab触发与多分支 Generic Webhook多分支 一.创建多分支pipeline 在实际中,需要多分支同时进行开发.如果每个分支都 ...