scikit-learn 学习笔记-- Generalized Linear Models （三）

Bayesian regression

前面介绍的线性模型都是从最小二乘，均方误差的角度去建立的，从最简单的最小二乘到带正则项的 lasso，ridge 等。而 Bayesian regression 是从 Bayesian 概率模型的角度出发的，虽然最后也会转换成一个能量函数的形式。

从前面的线性模型中，我们都假设如下的关系：

y=wx" role="presentation">y=wxy=wx

上面这个关系式其实是直接从值的角度来考虑，其实我们也可以假设如下的关系：

y=wx+ϵ" role="presentation">y=wx+ϵy=wx+ϵ

这个 ϵ" role="presentation" style="position: relative;">ϵϵ 表示一种误差，或者噪声，如果估计的值非常准确，那么 ϵ=0" role="presentation" style="position: relative;">ϵ=0ϵ=0, 否则，这将是一个随机数。

如果我们有一组训练样本，那么每个观察值 y" role="presentation" style="position: relative;">yy 都会有个对应的 ϵ" role="presentation" style="position: relative;">ϵϵ, 而且我们假设 ϵ" role="presentation" style="position: relative;">ϵϵ 是满足独立同分布的。那么我们可以用概率的形式表示为：

对于一组训练集，我们可以表示为：

最后，利用最大似然估计，可以将上面的表达式转化为一个能量最小的形式。上面是从最大似然估计的角度去求系数。

下面我们考虑从最大后验概率的角度，

p(w|y)=p(y|w)p(w|α)p(α)" role="presentation">p(w|y)=p(y|w)p(w|α)p(α)p(w|y)=p(y|w)p(w|α)p(α)

p(w|α)=N(w|0,α−1I)" role="presentation">p(w|α)=N(w|0,α−1I)p(w|α)=N(w|0,α−1I)

p(α)" role="presentation" style="position: relative;">p(α)p(α) 本身是服从 gamma 分布的。

sklearn 上也给出了一个例子：

import numpy as np

import matplotlib.pyplot as plt

from scipy import stats

from sklearn.linear_model import BayesianRidge, LinearRegression

# #############################################################################

# Generating simulated data with Gaussian weights

np.random.seed(0)

n_samples, n_features = 100, 100

X = np.random.randn(n_samples, n_features)  # Create Gaussian data

# Create weights with a precision lambda_ of 4.

lambda_ = 4.

w = np.zeros(n_features)

# Only keep 10 weights of interest

relevant_features = np.random.randint(0, n_features, 10)

for i in relevant_features:

    w[i] = stats.norm.rvs(loc=0, scale=1. / np.sqrt(lambda_))

# Create noise with a precision alpha of 50.

alpha_ = 50.

noise = stats.norm.rvs(loc=0, scale=1. / np.sqrt(alpha_), size=n_samples)

# Create the target

y = np.dot(X, w) + noise

# #############################################################################

# Fit the Bayesian Ridge Regression and an OLS for comparison

clf = BayesianRidge(compute_score=True)

clf.fit(X, y)

ols = LinearRegression()

ols.fit(X, y)

# #############################################################################

# Plot true weights, estimated weights, histogram of the weights, and

# predictions with standard deviations

lw = 2

plt.figure(figsize=(6, 5))

plt.title("Weights of the model")

plt.plot(clf.coef_, color='lightgreen', linewidth=lw,

         label="Bayesian Ridge estimate")

plt.plot(w, color='gold', linewidth=lw, label="Ground truth")

plt.plot(ols.coef_, color='navy', linestyle='--', label="OLS estimate")

plt.xlabel("Features")

plt.ylabel("Values of the weights")

plt.legend(loc="best", prop=dict(size=12))

plt.figure(figsize=(6, 5))

plt.title("Histogram of the weights")

plt.hist(clf.coef_, bins=n_features, color='gold', log=True,

         edgecolor='black')

plt.scatter(clf.coef_[relevant_features], 5 * np.ones(len(relevant_features)),

            color='navy', label="Relevant features")

plt.ylabel("Features")

plt.xlabel("Values of the weights")

plt.legend(loc="upper left")

plt.figure(figsize=(6, 5))

plt.title("Marginal log-likelihood")

plt.plot(clf.scores_, color='navy', linewidth=lw)

plt.ylabel("Score")

plt.xlabel("Iterations")

# Plotting some predictions for polynomial regression

def f(x, noise_amount):

    y = np.sqrt(x) * np.sin(x)

    noise = np.random.normal(0, 1, len(x))

    return y + noise_amount * noise

degree = 10

X = np.linspace(0, 10, 100)

y = f(X, noise_amount=0.1)

clf_poly = BayesianRidge()

clf_poly.fit(np.vander(X, degree), y)

X_plot = np.linspace(0, 11, 25)

y_plot = f(X_plot, noise_amount=0)

y_mean, y_std = clf_poly.predict(np.vander(X_plot, degree), return_std=True)

plt.figure(figsize=(6, 5))

plt.errorbar(X_plot, y_mean, y_std, color='navy',

             label="Polynomial Bayesian Ridge Regression", linewidth=lw)

plt.plot(X_plot, y_plot, color='gold', linewidth=lw,

         label="Ground Truth")

plt.ylabel("Output y")

plt.xlabel("Feature X")

plt.legend(loc="lower left")

plt.show()

scikit-learn 学习笔记-- Generalized Linear Models （三）的更多相关文章

scikit-learn 学习笔记-- Generalized Linear Models (一)
scikit-learn 是非常优秀的一个有关机器学习的 Python Lib,包含了除深度学习之外的传统机器学习的绝大多数算法,对于了解传统机器学习是一个很不错的平台.每个算法都有相应的例子,既可以 ...
scikit-learn 学习笔记-- Generalized Linear Models （二）
Lasso regression 今天介绍另外一种带正则项的线性回归, ridge regression 的正则项是二范数,还有另外一种是一范数的,也就是lasso 回归,lasso 回归的正则项是系 ...
Andrew Ng机器学习公开课笔记 -- Generalized Linear Models
网易公开课,第4课 notes,http://cs229.stanford.edu/notes/cs229-notes1.pdf 前面介绍一个线性回归问题,符合高斯分布一个分类问题,logstic回 ...
机器学习-scikit learn学习笔记
scikit-learn官网:http://scikit-learn.org/stable/ 通常情况下,一个学习问题会包含一组学习样本数据,计算机通过对样本数据的学习,尝试对未知数据进行预测. 学习 ...
[Scikit-learn] 1.1 Generalized Linear Models - from Linear Regression to L1&L2
Introduction 一.Scikit-learning 广义线性模型 From: http://sklearn.lzjqsdd.com/modules/linear_model.html#ord ...
[Scikit-learn] 1.5 Generalized Linear Models - SGD for Regression
梯度下降一.亲手实现“梯度下降” 以下内容其实就是<手动实现简单的梯度下降>. 神经网络的实践笔记,主要包括: Logistic分类函数反向传播相关内容 Link: http://pe ...
[Scikit-learn] 1.5 Generalized Linear Models - SGD for Classification
NB: 因为softmax,NN看上去是分类,其实是拟合(回归),拟合最大似然. 多分类参见:[Scikit-learn] 1.1 Generalized Linear Models - Logist ...
[Scikit-learn] 1.1 Generalized Linear Models - Logistic regression & Softmax
二分类:Logistic regression 多分类:Softmax分类函数对于损失函数,我们求其最小值, 对于似然函数,我们求其最大值. Logistic是loss function,即: 在逻 ...
广义线性模型（Generalized Linear Models）
前面的文章已经介绍了一个回归和一个分类的例子.在逻辑回归模型中我们假设: 在分类问题中我们假设: 他们都是广义线性模型中的一个例子,在理解广义线性模型之前需要先理解指数分布族. 指数分布族(The E ...

随机推荐

【译】Using Objects to Organize Your Code
耗了一个晚上吐血翻译不过也学到了不少...<使用对象来组织你的代码>,翻译中发现原作者在原文中有部分代码有误或不全,本文已修改和添加~ 丽贝卡·墨菲原文链接:http://rmurphey ...
Grid Search学习
转自:https://www.cnblogs.com/ysugyl/p/8711205.html Grid Search:一种调参手段:穷举搜索:在所有候选的参数选择中,通过循环遍历,尝试每一种可能性 ...
Ajax保留浏览器历史的两种解决方案（Hash&Pjax）
总是在github down点东西,github整个界面做的不错,体验也很好~对于其中的源代码滑动的特效最为喜欢了~刚开始以为这个只是普通的ajax请求效果,但是发现这个特效能够导致浏览器地址栏跟随变 ...
(android实战)破解apk
简单的总结几个关键步骤: 一.工具准备:apktool , dex2jar , jd-gui 二.使用dex2jar + jd-gui 得到apk的java源码 1.用解压工具从 apk包中取出 cl ...
禁止或强制使用堆分配---《C++必知必会》条款34
有时候,指明一些特定类的对象不应该被分配到堆(heap)上是个好主意.通常这是为了确保该对象的析构函数一定会得到调用.维护对象本身(body object)的引用计数的句柄对象(handle obje ...
ACM ICPC, JUST Collegiate Programming Contest (2018) Solution
A:Zero Array 题意:两种操作, 1 p v 将第p个位置的值改成v 2 查询最少的操作数使得所有数都变为0 操作为可以从原序列中选一个非0的数使得所有非0的数减去它,并且所有数不能 ...
哈尔滨理工大学软件与微电子学院第八届程序设计竞赛同步赛（高年级） Solution
A: Solved. 分别处理出每个%7后余数的数字个数,再组合一下 #include <bits/stdc++.h> using namespace std; #define ll lo ...
《零起点，python大数据与量化交易》
<零起点,python大数据与量化交易>,这应该是国内第一部,关于python量化交易的书籍. 有出版社约稿,写本量化交易与大数据的书籍,因为好几年没写书了,再加上近期"前海智库 ...
Cup fungus in Corvobado Nation Park,Costa Rica
python webdriver 测试框架-数据驱动xml驱动方式
数据驱动xml驱动的方式存数据的xml文件:TestData.xml: <?xml version="1.0" encoding="utf-8"?> ...

scikit-learn 学习笔记-- Generalized Linear Models （三）

Bayesian regression

scikit-learn 学习笔记-- Generalized Linear Models （三）的更多相关文章

随机推荐

热门专题