Kernel Methods for Deep Learning
@article{cho2009kernel,
title={Kernel Methods for Deep Learning},
author={Cho, Youngmin and Saul, Lawrence K},
pages={342--350},
year={2009}}
Introduction
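This paper introduces a new family of kernel functions, inspired by the computation performed in neural networks:
\[
k_n(\mathbf{x}, \mathbf{y}) = 2 \int \mathrm{d}\mathbf{w}\, \frac{e^{-\frac{\|\mathbf{w}\|^2}{2}}}{(2\pi)^{d/2}}\, \Theta(\mathbf{w} \cdot \mathbf{x})\, \Theta(\mathbf{w} \cdot \mathbf{y})\, (\mathbf{w} \cdot \mathbf{x})^n (\mathbf{w} \cdot \mathbf{y})^n,
\tag{1}
\]
where \(\Theta(z)=\frac{1}{2}(1+\mathrm{sign}(z))\) is the Heaviside step function.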
Main content
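The main property is that equation (1) can be written as:
\[
k_n(\mathbf{x}, \mathbf{y}) = \frac{1}{\pi} \|\mathbf{x}\|^n \|\mathbf{y}\|^n J_n(\theta),
\tag{2}
\]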
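where:
\[
J_n(\theta) = (-1)^n (\sin \theta)^{2n+1} \left( \frac{1}{\sin \theta} \frac{\partial}{\partial \theta} \right)^n \left( \frac{\pi - \theta}{\sin \theta} \right),
\tag{3}
\]
\[
\theta = \cos^{-1} \left( \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\| \|\mathbf{y}\|} \right).
\tag{4}
\]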
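In particular:
\[
\begin{aligned}
J_0(\theta) &= \pi - \theta, \\
J_1(\theta) &= \sin \theta + (\pi - \theta) \cos \theta, \\
J_2(\theta) &= 3 \sin \theta \cos \theta + (\pi - \theta)(1 + 2 \cos^2 \theta).
\end{aligned}
\]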

The proof is as follows:

I did not derive equation (17), because I am not yet familiar with contour integration.
The attentive reader may notice that the final result is expressed through \(\frac{\partial^n}{\partial(\cos \theta)^n}\). For a function \(f(\cos \theta)\), set \(g(\theta) = f(\cos \theta)\); then:
\[
\frac{\partial g}{\partial \theta} = -\sin \theta \, \frac{\partial f}{\partial (\cos \theta)}.
\]
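Moreover, since \(\frac{\partial}{\partial(\cos \theta)} = -\frac{1}{\sin \theta} \frac{\partial}{\partial \theta}\), we have
\[
\frac{\partial^n}{\partial(\cos \theta)^n} = (-1)^n \left( \frac{1}{\sin \theta} \frac{\partial}{\partial \theta} \right)^n,
\]
which gives the conclusion.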
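As a quick check, the first two cases can be worked out by hand. The sketch below assumes \(J_n\) has the form given in the paper, \(J_n(\theta) = (-1)^n (\sin \theta)^{2n+1} \left( \frac{1}{\sin \theta} \frac{\partial}{\partial \theta} \right)^n \left( \frac{\pi - \theta}{\sin \theta} \right)\):
\[
\begin{aligned}
J_0(\theta) &= \sin \theta \cdot \frac{\pi - \theta}{\sin \theta} = \pi - \theta, \\
J_1(\theta) &= -\sin^3 \theta \cdot \frac{1}{\sin \theta} \frac{\partial}{\partial \theta} \left( \frac{\pi - \theta}{\sin \theta} \right)
= -\sin^2 \theta \cdot \frac{-\sin \theta - (\pi - \theta) \cos \theta}{\sin^2 \theta}
= \sin \theta + (\pi - \theta) \cos \theta.
\end{aligned}
\]
Divided by \(\pi\), these agree with the expressions returned by kernel0 and kernel1 in the experiment code (for unit-norm inputs in the case of kernel1).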
Connection to deep learning
Focus on a single layer, with input \(\mathbf{x}\) and output:
\[
\mathbf{f}(\mathbf{x}) = g(W \mathbf{x}),
\]
where \(g(z) = \Theta(z) z^n\) is the activation function, applied elementwise. Different values of \(n\) behave as follows:

\(n=1\) is the familiar ReLU.
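The activation family \(g(z) = \Theta(z) z^n\) can be evaluated directly. The following is a minimal sketch; the function name `activation` and the sample points are illustrative choices, not part of the original post. It uses `np.heaviside(z, 0.5)` so that \(\Theta(0) = \frac{1}{2}\), matching the definition of \(\Theta\) via `sign`.

```python
import numpy as np


def activation(z, n):
    """g(z) = Theta(z) * z**n, with Theta(0) = 1/2 (hypothetical helper)."""
    z = np.asarray(z, dtype=float)
    return np.heaviside(z, 0.5) * z ** n


z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
step = activation(z, 0)   # n = 0: step function
ramp = activation(z, 1)   # n = 1: ReLU
quad = activation(z, 2)   # n = 2: one-sided quadratic

# For n = 1 the activation coincides with the usual ReLU max(z, 0).
print(np.allclose(ramp, np.maximum(z, 0.0)))
```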
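Consider the inner product of the outputs \(\mathbf{f}(\mathbf{x}),\mathbf{f}(\mathbf{y})\) for two inputs \(\mathbf{x},\mathbf{y}\):
\[
\mathbf{f}(\mathbf{x}) \cdot \mathbf{f}(\mathbf{y}) = \sum_{i=1}^{m} \Theta(\mathbf{w}_i \cdot \mathbf{x})\, \Theta(\mathbf{w}_i \cdot \mathbf{y})\, (\mathbf{w}_i \cdot \mathbf{x})^n (\mathbf{w}_i \cdot \mathbf{y})^n,
\]
where \(\mathbf{w}_i\) is the \(i\)-th row of \(W\) and the sum runs over the \(m\) output units.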
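If every weight \(W_{ij}\) is drawn from a standard normal distribution, then as the number of output units \(m\) tends to infinity:
\[
\lim_{m \rightarrow \infty} \frac{2}{m}\, \mathbf{f}(\mathbf{x}) \cdot \mathbf{f}(\mathbf{y}) = k_n(\mathbf{x}, \mathbf{y}).
\]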
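This limit can be checked numerically. The sketch below is a rough Monte Carlo illustration, not the paper's experiment: it draws a wide random layer with standard normal weights, computes \(\frac{2}{m}\mathbf{f}(\mathbf{x}) \cdot \mathbf{f}(\mathbf{y})\), and compares it with the closed form \(\frac{1}{\pi}\|\mathbf{x}\|^n\|\mathbf{y}\|^n J_n(\theta)\) for \(n = 0, 1, 2\). The function names, the test vectors, the width `m` and the seed are all assumptions chosen for illustration.

```python
import numpy as np


def J(theta, n):
    """Closed forms of J_n(theta) for n = 0, 1, 2."""
    if n == 0:
        return np.pi - theta
    if n == 1:
        return np.sin(theta) + (np.pi - theta) * np.cos(theta)
    if n == 2:
        return (3 * np.sin(theta) * np.cos(theta)
                + (np.pi - theta) * (1 + 2 * np.cos(theta) ** 2))
    raise ValueError("only n = 0, 1, 2 are implemented")


def arc_cosine_kernel(x, y, n):
    """k_n(x, y) = (1/pi) * |x|^n * |y|^n * J_n(theta)."""
    norm_x, norm_y = np.linalg.norm(x), np.linalg.norm(y)
    cos_theta = np.clip(x @ y / (norm_x * norm_y), -1.0, 1.0)
    return (norm_x * norm_y) ** n * J(np.arccos(cos_theta), n) / np.pi


def random_layer_estimate(x, y, n, m, rng):
    """(2/m) * f(x).f(y) for one random layer with m units, W_ij ~ N(0, 1)."""
    W = rng.standard_normal((m, x.shape[0]))
    fx = np.heaviside(W @ x, 0.5) * (W @ x) ** n
    fy = np.heaviside(W @ y, 0.5) * (W @ y) ** n
    return 2.0 / m * (fx @ fy)


rng = np.random.default_rng(0)
x = np.array([1.0, 0.5])
y = np.array([0.2, 1.0])

results = {}
for n in (0, 1, 2):
    exact = arc_cosine_kernel(x, y, n)
    approx = random_layer_estimate(x, y, n, m=200_000, rng=rng)
    results[n] = (exact, approx)
    print(f"n={n}: closed form {exact:.4f}, random layer {approx:.4f}")
```

The agreement is only statistical: the estimate fluctuates with the seed, and the fluctuation grows with \(n\) because higher powers of Gaussian variables have heavier tails.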
Experiments
The experiment failed; the code is as follows.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.svm import NuSVC


class Arc_cosine:
    """Arc-cosine kernel of degree n (n = 0, 1 or 2)."""

    def __init__(self, n=1):
        self.n = n
        self.own_kernel = self.kernels(n)

    @staticmethod
    def _norms_and_angle(x, y):
        norm_x = np.linalg.norm(x)
        norm_y = np.linalg.norm(y)
        # Clip the cosine: rounding error can push it slightly outside
        # [-1, 1] (e.g. when x == y), and np.arccos then returns NaN.
        cos_value = np.clip(x @ y / (norm_x * norm_y), -1.0, 1.0)
        return norm_x, norm_y, cos_value, np.arccos(cos_value)

    def kernel0(self, x, y):
        _, _, _, angle = self._norms_and_angle(x, y)
        return 1 - angle / np.pi

    def kernel1(self, x, y):
        norm_x, norm_y, cos_value, angle = self._norms_and_angle(x, y)
        sin_value = np.sin(angle)
        return (norm_x * norm_y) ** self.n * \
            (sin_value + (np.pi - angle) * cos_value) / np.pi

    def kernel2(self, x, y):
        norm_x, norm_y, cos_value, angle = self._norms_and_angle(x, y)
        sin_value = np.sin(angle)
        # The whole J_2 term is parenthesised and divided by pi; the
        # original expression applied the norm factor to the first term only.
        return (norm_x * norm_y) ** self.n * \
            (3 * sin_value * cos_value
             + (np.pi - angle) * (1 + 2 * cos_value ** 2)) / np.pi

    def kernels(self, n):
        # Compare integers with ==, not `is`.
        if n == 0:
            return self.kernel0
        elif n == 1:
            return self.kernel1
        elif n == 2:
            return self.kernel2
        else:
            raise ValueError("No such kernel, n should be "
                             "0, 1 or 2")

    def kernel(self, X, Y):
        # Gram matrix C[i, j] = k_n(X[i], Y[j]), computed pair by pair.
        m = X.shape[0]
        n = Y.shape[0]
        C = np.zeros((m, n))
        for i in range(m):
            for j in range(n):
                C[i, j] = self.own_kernel(X[i], Y[j])
        return C

    def __call__(self, X, Y):
        return self.kernel(X, Y)
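The pairwise loop above can also be written with matrix operations. The following is a sketch of one possible vectorized Gram-matrix computation, not the post's original code; the function name `arc_cosine_gram` is hypothetical. It clips the cosine before calling `np.arccos`, and it assumes the convention \(\cos \theta = 0\) for rows with zero norm, which the original code does not handle.

```python
import numpy as np


def arc_cosine_gram(X, Y, n=1):
    """Gram matrix K[i, j] = k_n(X[i], Y[j]) for n in {0, 1, 2}."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    norm_x = np.linalg.norm(X, axis=1)[:, None]   # shape (m, 1)
    norm_y = np.linalg.norm(Y, axis=1)[None, :]   # shape (1, k)
    norms = norm_x * norm_y                       # shape (m, k)

    # Assumed convention: a zero-norm row gets cos(theta) = 0.
    safe_norms = np.where(norms > 0, norms, 1.0)
    cos_t = np.clip((X @ Y.T) / safe_norms, -1.0, 1.0)
    theta = np.arccos(cos_t)
    sin_t = np.sin(theta)

    if n == 0:
        J = np.pi - theta
    elif n == 1:
        J = sin_t + (np.pi - theta) * cos_t
    elif n == 2:
        J = 3 * sin_t * cos_t + (np.pi - theta) * (1 + 2 * cos_t ** 2)
    else:
        raise ValueError("n should be 0, 1 or 2")
    return norms ** n * J / np.pi


rng = np.random.default_rng(0)
X_demo = rng.standard_normal((40, 2))
K = arc_cosine_gram(X_demo, X_demo, n=1)

# Two quick diagnostics for a kernel that is about to be fed to an SVM:
# every entry should be finite, and the matrix should be symmetric.
print(np.isfinite(K).all(), np.allclose(K, K.T))
```

A matrix built this way could be passed to scikit-learn through `NuSVC(kernel="precomputed")`, fitting on the train-by-train Gram matrix and predicting on the test-by-train one. Checking the Gram matrix for NaN entries first is one way to narrow down a failing fit.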
We run an SVM on two datasets, shown below:


Running the SVM:
'''
# Generate ring-shaped data (disabled; kept for the first dataset)
def generate_data(circle, r1, r2, nums=300):
    variance = 1
    rs1 = np.random.randn(nums) * variance + r1
    rs2 = np.random.randn(nums) * variance + r2
    angles = np.linspace(0, 2*np.pi, nums)
    data1 = (rs1 * np.sin(angles) + circle[0],
             rs1 * np.cos(angles) + circle[1])
    data2 = (rs2 * np.sin(angles) + circle[0],
             rs2 * np.cos(angles) + circle[1])
    df1 = pd.DataFrame({'x': data1[0], 'y': data1[1],
                        'label': np.ones(nums)})
    df2 = pd.DataFrame({'x': data2[0], 'y': data2[1],
                        'label': -np.ones(nums)})
    return df1, df2
'''
# Generate cross-shaped data
def generate_data(left, right, down, up,
                  circle=(0., 0.), nums=300):
    variance = 1
    y1 = np.random.rand(nums) * variance + circle[1]
    x2 = np.random.rand(nums) * variance + circle[0]
    x1 = np.linspace(left, right, nums)
    y2 = np.linspace(down, up, nums)
    df1 = pd.DataFrame(
        {'x': x1,
         'y': y1,
         'label': np.ones_like(x1)}
    )
    df2 = pd.DataFrame(
        {'x': x2,
         'y': y2,
         'label': -np.ones_like(x2)}
    )
    return df1, df2


def pre_test(left, right, func, nums=100):
    # Evaluate func on a nums x nums grid spanning the box [left, right].
    x1, y1 = left
    x2, y2 = right
    x = np.linspace(x1, x2, nums)
    y = np.linspace(y1, y2, nums)
    X, Y = np.meshgrid(x, y)
    m, n = X.shape
    Z = func(np.vstack((X.reshape(1, -1),
                        Y.reshape(1, -1))).T).reshape(m, n)
    return X, Y, Z


df1, df2 = generate_data(-10, 10, -10, 10)
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it.
df = pd.concat([df1, df2], ignore_index=True)
classifer2 = NuSVC(kernel=Arc_cosine(n=1))
# Fit on plain arrays so that predict can later be called on a NumPy grid.
classifer2.fit(df.iloc[:, :2].to_numpy(), df['label'].to_numpy())
X, Y, Z = pre_test((-10, -10), (10, 10), classifer2.predict)
plt.contourf(X, Y, Z)
plt.show()
On both datasets the predictions were:

With an ordinary RBF kernel, by contrast, the results are good on both:


A polynomial kernel also works fine:


If anyone can spot the error in the code, please point it out.