Foundations of Machine Learning: The PAC Learning Framework(2)


(1) Learning bound for a finite hypothesis set: the consistent case.

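The previous post introduced the definition of PAC-learnability and proved that one example concept class is PAC-learnable. This section treats the PAC problem when the hypothesis set is finite and the algorithm $\mathcal{A}$ is consistent with the sample $S$. The next section treats the inconsistent case.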

Consistent: if the hypothesis $h_S$ returned by an algorithm makes no error on the training sample $S$, we say that $h_S$ is consistent with $S$.

Theorem 1.1 (learning bound for finite $H$, consistent case). Let $H$ be a finite set of functions mapping $\mathcal{X}$ to $\mathcal{Y}$. Suppose that for any target concept $c\in H$ and any i.i.d. sample $S$, the algorithm $\mathcal{A}$ always returns a consistent hypothesis $h_S$, that is, $\widehat{\mathcal{R}}(h_S)=0$ (the consistency requirement). Then for any $\epsilon, \delta > 0$, if $m\geq\frac{1}{\epsilon}(\log|H|+\log\frac{1}{\delta})$, the following inequality holds:

$$\mathop{Pr}_{S\sim D^m}[\mathcal{R}(h_S) \leq \epsilon] \geq 1-\delta$$

Equivalently, for any $\delta > 0$ and any sample size $m$, the following inequality holds with probability at least $1-\delta$:

\begin{align}\mathcal{R}(h_S)\leq\frac{1}{m}(\log|H|+\log\frac{1}{\delta}) \label{equ:2}\end{align}

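Proof: we do not know which consistent hypothesis $h_S \in H$ the algorithm will select, because the choice depends on the training sample $S$, so we cannot bound its error directly. Instead, we bound the probability that any consistent hypothesis in $H$ has error greater than $\epsilon$. This uniform bound then applies in particular to the hypothesis the algorithm selects: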

\begin{eqnarray*}
& & \mathop{Pr}\limits_{S \sim D^m}[\exists h\in H: \widehat{\mathcal{R}}(h)=0 \wedge \mathcal{R}(h)>\epsilon] \\
&=& \mathop{Pr}\limits_{S \sim D^m}[(\widehat{\mathcal{R}}(h_1)=0 \wedge \mathcal{R}(h_1)>\epsilon) \vee (\widehat{\mathcal{R}}(h_2)=0 \wedge \mathcal{R}(h_2)>\epsilon) \vee \cdots] \\
&\leq& \sum\limits_{h\in H}\mathop{Pr}[\widehat{\mathcal{R}}(h)=0 \wedge \mathcal{R}(h)> \epsilon] \qquad \text{(union bound)} \\
&\leq& \sum\limits_{h\in H}\mathop{Pr}[\widehat{\mathcal{R}}(h)=0 \mid \mathcal{R}(h)> \epsilon] \qquad \text{(definition of conditional probability)} \\
&\leq& |H|(1-\epsilon)^m \\
&\leq& |H|\exp(-m\epsilon)
\end{eqnarray*}

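Here $\mathop{Pr}[\widehat{\mathcal{R}}(h)=0 \mid \mathcal{R}(h)> \epsilon]$ is the probability that a hypothesis $h$ with $\mathcal{R}(h)>\epsilon$ makes no error on the sample $S$. Such an $h$ errs on each sample point with probability greater than $\epsilon$, and the $m$ points are drawn i.i.d., so this probability is at most $(1-\epsilon)^m$. The last step uses $1-\epsilon\leq e^{-\epsilon}$.
Setting $\delta =|H|\exp(-m\epsilon)$ gives $\epsilon = \frac{1}{m}(\log|H|+\log\frac{1}{\delta})$, which is the second form of the bound.
Likewise, requiring $|H|\exp(-m\epsilon)\leq\delta$ gives $m \geq \frac{1}{\epsilon}(\log|H|+\log\frac{1}{\delta})$, which is the first form. This completes the proof.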

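The theorem shows that when the hypothesis set is finite, any consistent algorithm is a PAC-learning algorithm. Inequality \ref{equ:2} also shows that the upper bound on the generalization error decreases as $m$ grows and increases as $|H|$ grows, but the decrease is at rate $O(\frac{1}{m})$ whereas the increase is only $O(\log|H|)$.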
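To make the two forms of Theorem 1.1 concrete, here is a minimal Python sketch that evaluates them numerically. The function names `consistent_sample_size` and `consistent_error_bound` and the example values ($|H|=1000$, $\epsilon=0.1$, $\delta=0.05$) are illustrative choices, not part of the original notes.

```python
import math


def consistent_sample_size(h_size: int, eps: float, delta: float) -> int:
    """Smallest integer m with m >= (1/eps) * (log|H| + log(1/delta))."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / eps)


def consistent_error_bound(h_size: int, m: int, delta: float) -> float:
    """Bound on R(h_S) that holds with probability at least 1 - delta."""
    return (math.log(h_size) + math.log(1.0 / delta)) / m


# Illustrative values only.
m = consistent_sample_size(h_size=1000, eps=0.1, delta=0.05)
bound = consistent_error_bound(h_size=1000, m=m, delta=0.05)
print(m)      # 100
print(bound)  # slightly below 0.1
```

Squaring $|H|$ only doubles the $\log|H|$ term, while doubling $m$ halves the whole bound, which is the $O(\log|H|)$ versus $O(\frac{1}{m})$ behaviour noted above.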

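Example: consider the concept class of conjunctions of at most $n$ Boolean literals over the variables $(x_1,x_2,...,x_n)$, such as $x_1\wedge \bar{x}_2\wedge x_5$, taking $n=5$ here. Each example is labeled by the conjunction: $(1,0,0,0,1)$ is labeled positive and $(0,1,1,1,0)$ is labeled negative. Now consider the following algorithm: for each positive example $(b_1,b_2,...,b_n)$, if $b_i=1$ then $\bar{x}_i$ is ruled out as a literal of the conjunction, and if $b_i=0$ then $x_i$ is ruled out. The hypothesis set of this algorithm consists of the conjunctions $a_1 \wedge a_2 \wedge ... \wedge a_n$, where each $a_i$ is $x_i$, $\bar{x}_i$, or absent, so $|H|=3^n$.
The hypothesis built this way (the conjunction of all literals not ruled out) is consistent with the sample, so the algorithm is consistent. Since $\log|H| = n\log 3$, the theorem above gives: for all $\epsilon >0,\delta>0$, this concept class is PAC-learnable with $m \geq \frac{1}{\epsilon}(n\log 3 + \log\frac{1}{\delta})$.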
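The elimination algorithm for conjunctions can be sketched in Python as follows. This is an illustrative sketch, not code from the original notes: the representation of a literal as a pair `(i, v)` (variable index `i`, required value `v`), the helper names, and the three-example sample are all assumptions made for the demonstration, with the target taken to be $x_1\wedge \bar{x}_2\wedge x_5$.

```python
import math


def learn_conjunction(examples, n):
    """Keep every literal that no positive example rules out.

    A literal (i, 1) stands for x_{i+1}; a literal (i, 0) stands for its negation.
    """
    literals = {(i, v) for i in range(n) for v in (0, 1)}
    for bits, label in examples:
        if label:  # only positive examples eliminate literals
            for i, b in enumerate(bits):
                # b = 1 rules out the negated literal, b = 0 rules out the plain one
                literals.discard((i, 1 - b))
    return literals


def predict(literals, bits):
    """The conjunction is true iff every remaining literal is satisfied."""
    return all(bits[i] == v for i, v in literals)


def conjunction_sample_size(n, eps, delta):
    """Smallest integer m with m >= (1/eps) * (n*log(3) + log(1/delta))."""
    return math.ceil((n * math.log(3) + math.log(1.0 / delta)) / eps)


# Hypothetical sample labeled by x1 AND (NOT x2) AND x5, with n = 5.
examples = [
    ((1, 0, 0, 0, 1), True),
    ((1, 0, 1, 1, 1), True),
    ((0, 1, 1, 1, 0), False),
]
learned = learn_conjunction(examples, n=5)
print(sorted(learned))
print(all(predict(learned, bits) == label for bits, label in examples))
print(conjunction_sample_size(n=5, eps=0.1, delta=0.02))
```

On this sample the two positive examples leave exactly the literals of the target, and the learned conjunction also rejects the negative example, so it is consistent with the sample.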

(2) Learning bound for a finite hypothesis set: the inconsistent case.

We first state Hoeffding's inequality, which later proofs use heavily.

Hoeffding's inequality: let $X_1,...,X_m$ be independent random variables with $X_i$ taking values in $[a_i,b_i]$. Then for any $\varepsilon >0$, the following inequalities hold, where $S_m=\sum_{i=1}^mX_i$:

$$\mathop{Pr}[S_m-E[S_m]\geq \varepsilon]\leq e^{-2\varepsilon^2/\sum_{i=1}^m(b_i-a_i)^2}$$

$$\mathop{Pr}[S_m-E[S_m]\leq -\varepsilon]\leq e^{-2\varepsilon^2/\sum_{i=1}^m(b_i-a_i)^2}$$
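As a sanity check on the inequality, the following Python sketch compares the upper-tail bound with a simulated tail frequency. Everything specific here is an assumption chosen for illustration: the variables are uniform on $[0,2]$, $m=50$, $\varepsilon=10$, and the simulation uses 2000 trials with a fixed seed.

```python
import math
import random


def hoeffding_bound(eps, ranges):
    """exp(-2 * eps^2 / sum_i (b_i - a_i)^2) for X_i taking values in [a_i, b_i]."""
    return math.exp(-2.0 * eps ** 2 / sum((b - a) ** 2 for a, b in ranges))


def empirical_upper_tail(eps, ranges, trials=2000, seed=0):
    """Simulated frequency of S_m - E[S_m] >= eps with X_i uniform on [a_i, b_i]."""
    rng = random.Random(seed)
    expected = sum((a + b) / 2.0 for a, b in ranges)  # E[S_m] for uniform X_i
    hits = 0
    for _ in range(trials):
        s = sum(rng.uniform(a, b) for a, b in ranges)
        if s - expected >= eps:
            hits += 1
    return hits / trials


ranges = [(0.0, 2.0)] * 50  # illustrative choice: m = 50 variables on [0, 2]
bound = hoeffding_bound(10.0, ranges)
freq = empirical_upper_tail(10.0, ranges)
print(bound, freq)
```

The bound depends only on the ranges $[a_i,b_i]$ and not on the distribution inside them, so the simulated frequency is expected to sit well below it.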

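The previous section covered PAC-learnability in the consistent case. In practice, however, an algorithm usually makes some errors on the training set, which is the inconsistent case treated in this section.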

Corollary 1.1 Fix $\epsilon>0$ and let $S$ denote an i.i.d. sample of size $m$. Then for any hypothesis $h:\mathcal{X}\rightarrow \{ 0,1 \}$, the following inequalities hold:

$$\mathop{Pr}\limits_{S\sim D^m}[ \widehat{\mathcal{R}}(h)-\mathcal{R}(h)\geq \epsilon ]\leq \exp(-2m\epsilon^2),$$

$$\mathop{Pr}\limits_{S\sim D^m}[ \widehat{\mathcal{R}}(h)-\mathcal{R}(h)\leq -\epsilon ]\leq \exp(-2m\epsilon^2).$$

By the union bound, these two inequalities give:

$$\mathop{Pr}\limits_{S\sim D^m}[ \mid\widehat{\mathcal{R}}(h)-\mathcal{R}(h)\mid \geq \epsilon ]\leq 2\exp(-2m\epsilon^2). $$

Proof: for a sample $S=(x_1,...,x_m)$, let $X_i=\mathbb{I}(h(x_i)\neq c(x_i))$. Then:

$$\widehat{\mathcal{R}}(h)=\frac{1}{m}\sum_{i=1}^m \mathbb{I}(h(x_i)\neq c(x_i))=\frac{1}{m}\sum_{i=1}^mX_i.$$

So $S_m=m\widehat{\mathcal{R}}(h)$, and since $E[\widehat{\mathcal{R}}(h)]=\mathcal{R}(h)$:

$$E[S_m]=m\mathcal{R}(h)$$

Each $X_i$ takes values in $[0,1]$, so $\sum_{i=1}^m(b_i-a_i)^2=m$, and Hoeffding's inequality gives, for any $\epsilon'>0$:

$$\mathop{Pr}\limits_{S\sim D^m}[m\widehat{\mathcal{R}}(h)-m\mathcal{R}(h)\geq \epsilon']\leq e^{-2\epsilon'^2/m},$$

that is:

$$\mathop{Pr}\limits_{S\sim D^m}[\widehat{\mathcal{R}}(h)-\mathcal{R}(h)\geq \frac{\epsilon'}{m}]\leq e^{-2\epsilon'^2/m}.$$

Setting $\epsilon=\frac{\epsilon'}{m}$ gives $\mathop{Pr}\limits_{S\sim D^m}[ \widehat{\mathcal{R}}(h)-\mathcal{R}(h)\geq \epsilon ]\leq \exp(-2m\epsilon^2).$
The second inequality follows in the same way from the lower-tail form of Hoeffding's inequality:
$$\mathop{Pr}\limits_{S\sim D^m}[ \widehat{\mathcal{R}}(h)-\mathcal{R}(h)\leq -\epsilon ]\leq \exp(-2m\epsilon^2)$$
Applying the union bound then gives:

$$\mathop{Pr}\limits_{S\sim D^m}[ \mid\widehat{\mathcal{R}}(h)-\mathcal{R}(h)\mid \geq \epsilon ]\leq 2\exp(-2m\epsilon^2).$$

This completes the proof.
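For a fixed hypothesis, $m\widehat{\mathcal{R}}(h)$ follows a binomial distribution with parameters $m$ and $\mathcal{R}(h)$, so the upper-tail probability in Corollary 1.1 can be computed exactly and compared with the bound. The Python sketch below does this for illustrative values ($m=100$, $\mathcal{R}(h)=0.3$, $\epsilon=0.1$) that are assumptions, not figures from the original notes.

```python
import math


def exact_upper_tail(m, risk, eps):
    """Exact Pr[R_hat(h) - R(h) >= eps] when m * R_hat(h) ~ Binomial(m, risk)."""
    threshold = m * (risk + eps)
    return sum(
        math.comb(m, k) * risk ** k * (1.0 - risk) ** (m - k)
        for k in range(m + 1)
        if k >= threshold - 1e-12  # small tolerance for floating-point rounding
    )


def one_sided_bound(m, eps):
    """The bound exp(-2 * m * eps^2) from Corollary 1.1."""
    return math.exp(-2.0 * m * eps ** 2)


# Illustrative values only.
exact = exact_upper_tail(m=100, risk=0.3, eps=0.1)
bound = one_sided_bound(m=100, eps=0.1)
print(exact, bound)
```

The bound is distribution-free, so it is usually loose for any particular value of $\mathcal{R}(h)$; what matters for the later results is that it holds for every $h$ and decays exponentially in $m$.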

The corollary above yields the following result.

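Corollary 1.2 (generalization bound for a single hypothesis). Fix a hypothesis $h:\mathcal{X}\rightarrow \{ 0,1 \}$. Then for any $\delta>0$, the following inequality holds with probability at least $1-\delta$:

$$\mathcal{R}(h)\leq \widehat{\mathcal{R}}(h)+\sqrt{\frac{\log\frac{2}{\delta}}{2m}}.$$

Proof: set $\delta=2e^{-2m\epsilon^2}$ in the two-sided bound of Corollary 1.1 and solve for $\epsilon$, which gives $\epsilon = \sqrt{\frac{\log\frac{2}{\delta}}{2m}}$. This completes the proof.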
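The gap term in Corollary 1.2 is easy to evaluate numerically. The short Python sketch below does so; the function name `single_hypothesis_gap` and the values $m=1000$, $\delta=0.05$ are illustrative assumptions.

```python
import math


def single_hypothesis_gap(m: int, delta: float) -> float:
    """sqrt(log(2/delta) / (2m)): the gap between R(h) and R_hat(h) in Corollary 1.2."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * m))


# Illustrative values only.
gap_1k = single_hypothesis_gap(m=1000, delta=0.05)
gap_4k = single_hypothesis_gap(m=4000, delta=0.05)
print(gap_1k, gap_4k)
```

Because the gap scales as $1/\sqrt{m}$, quadrupling the sample size only halves it, in contrast to the $1/m$ rate of the consistent case.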

Building on the two corollaries above and extending the bound so that it holds for all $h\in H$ simultaneously gives the result we want:

Theorem 1.2 Let $H$ be a finite hypothesis set. Then for any $\delta>0$, the following inequality holds with probability at least $1-\delta$:

$$\forall h\in H,\ \mathcal{R}(h)\leq\widehat{\mathcal{R}}(h)+\sqrt{\frac{\log|H|+\log\frac{2}{\delta}}{2m}}.$$

Proof: let $h_1,...,h_{|H|}$ be the elements of $H$. Applying the union bound and Corollary 1.1 gives:

\begin{eqnarray*}
& & \mathop{Pr}[\exists h\in H: |\widehat{\mathcal{R}}(h)-\mathcal{R}(h)|>\epsilon] \\
&=& \mathop{Pr}[(|\widehat{\mathcal{R}}(h_1)-\mathcal{R}(h_1)|>\epsilon) \vee \cdots \vee (|\widehat{\mathcal{R}}(h_{|H|})-\mathcal{R}(h_{|H|})|>\epsilon)] \\
&\leq& \sum\limits_{h\in H}\mathop{Pr}[|\widehat{\mathcal{R}}(h)-\mathcal{R}(h)|>\epsilon] \\
&\leq& 2|H|\exp(-2m\epsilon^2).
\end{eqnarray*}

Setting $\delta=2|H|\exp(-2m\epsilon^2)$ and solving for $\epsilon$ gives the stated bound. This completes the proof.

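As in the consistent case, the theorem shows that the upper bound on the generalization error depends on $m$ and $\log|H|$, but here both appear under a square root. The theorem also shows the following: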

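The larger $|H|$ is, the smaller the empirical error that can be achieved, but the larger the term $\sqrt{\frac{\log|H|+\log\frac{2}{\delta}}{2m}}$ becomes, so there is a trade-off.
The larger $m$ is, the smaller $\sqrt{\frac{\log|H|+\log\frac{2}{\delta}}{2m}}$ becomes, but the empirical error tends to grow because a larger sample is harder to fit, so there is a trade-off here as well.
For the same empirical error, we should prefer the smallest possible $|H|$, which agrees with Occam's razor.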
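The dependence of the complexity term in Theorem 1.2 on $|H|$ and $m$ can be tabulated with a short Python sketch. The function names and the grid of values are illustrative assumptions; `sample_size_for_gap` simply inverts the complexity term for $m$ and is not a result stated in the original notes.

```python
import math


def finite_h_gap(h_size: int, m: int, delta: float) -> float:
    """sqrt((log|H| + log(2/delta)) / (2m)): the complexity term in Theorem 1.2."""
    return math.sqrt((math.log(h_size) + math.log(2.0 / delta)) / (2.0 * m))


def sample_size_for_gap(h_size: int, eps: float, delta: float) -> int:
    """Smallest integer m for which the complexity term is at most eps."""
    return math.ceil((math.log(h_size) + math.log(2.0 / delta)) / (2.0 * eps ** 2))


# Illustrative grid: the term grows slowly with |H| and shrinks like 1/sqrt(m).
for h_size in (10, 1000, 10 ** 6):
    row = [round(finite_h_gap(h_size, m, 0.05), 4) for m in (100, 1000, 10000)]
    print(h_size, row)

m_needed = sample_size_for_gap(h_size=1000, eps=0.05, delta=0.05)
print(m_needed)
```

The table only covers the complexity term. The empirical error $\widehat{\mathcal{R}}(h)$ depends on the data and on how rich $H$ is, so the sketch shows one side of the trade-off, not the whole bound.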
