1 Introduction

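In the previous post, "Learning Theory: Predictor-Rejector Multi-Class Abstention" (《学习理论:预测器-拒绝器多分类弃权学习》), we introduced the basic concepts and methods of learning with abstention, including the following single-stage predictor-rejector abstention loss \(L_{\text{abst}}\) for multi-class classification:

\[L_{\text{abst}}(h, r, x, y) = \underbrace{\mathbb{I}_{\text{h}(x) \neq y}\mathbb{I}_{r(x) > 0}}_{\text{no abstention}} + \underbrace{c(x) \mathbb{I}_{r(x)\leqslant 0}}_{\text{abstention}}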
\]

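Here \((x, y)\in \mathcal{X}\times \mathcal{Y}\) (with label set \(\mathcal{Y} = \{1, \cdots, n\}\), \(n\geqslant 2\)), and \((h, r)\in \mathcal{H}\times\mathcal{R}\) is a predictor-rejector pair, where \(\mathcal{H}\) is a family of functions from \(\mathcal{X}\) to \(\mathbb{R}^n\) (one score \({h(x)}_y\) per label) and \(\mathcal{R}\) is a family of functions from \(\mathcal{X}\) to \(\mathbb{R}\). The map \(\text{h}(x) = \text{arg max}_{y\in \mathcal{Y}} {h(x)}_y\) directly outputs the predicted label of the instance \(x\). To simplify the discussion, we assume from here on that the cost \(c\in (0, 1)\) is a constant.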
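As a concrete reading of the definition, here is a minimal Python sketch of \(L_{\text{abst}}\) for a single example. The function names and the 0-based label indexing are my own conventions for illustration (the text uses labels \(1, \cdots, n\)), and the cost is taken to be the constant \(c\).

```python
from typing import Sequence


def predict_label(scores: Sequence[float]) -> int:
    """Return h(x) = argmax_y h(x)_y as a 0-based label index."""
    return max(range(len(scores)), key=lambda y: scores[y])


def abstention_loss(scores: Sequence[float], r_x: float, y: int, c: float) -> float:
    """Target loss L_abst for one example (hypothetical helper).

    scores: the score vector h(x); r_x: the rejector value r(x);
    y: the true label (0-based); c: the constant abstention cost in (0, 1).
    """
    if r_x > 0:
        # The rejector accepts: pay the 0-1 misclassification loss.
        return 1.0 if predict_label(scores) != y else 0.0
    # The rejector abstains (r(x) <= 0): pay the cost c.
    return c
```

For example, with scores `[0.1, 2.0, -1.0]` the predicted label is `1`; the loss is `0.0` if the example is accepted and the true label is `1`, and `c` whenever `r_x <= 0`.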

Let \(\mathcal{l}\) be a surrogate for the multi-class 0-1 loss defined over the labels \(\mathcal{Y}\). Building on it, we define the abstention surrogate loss \(L\):

\[L(h, r, x, y) = \mathcal{l}(h, x, y)\phi(-\alpha r(x)) + \psi(c) \phi(\beta r(x))
\]

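where \(\psi\) is a non-decreasing function, \(\phi\) is a non-increasing auxiliary function that upper-bounds \(z \mapsto \mathbb{I}_{z \leqslant 0}\), and \(\alpha\), \(\beta\) are positive constants. For simplicity, we mainly analyze \(\phi(z) = \exp(-z)\) below, although a similar analysis applies to other choices of \(\phi\).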
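The surrogate can be sketched in a few lines of Python for \(\phi(z) = \exp(-z)\). This is only an illustration under my own naming: `base_loss` stands for the already-computed value \(\mathcal{l}(h, x, y)\), and the defaults \(\alpha = \beta = 1\), \(\psi(z) = z\) are arbitrary choices, not values prescribed by the text.

```python
import math


def phi(z: float) -> float:
    """Auxiliary function phi(z) = exp(-z); it upper-bounds the indicator of z <= 0."""
    return math.exp(-z)


def surrogate_abstention_loss(base_loss, r_x, c, alpha=1.0, beta=1.0, psi=lambda z: z):
    """L = l * phi(-alpha * r(x)) + psi(c) * phi(beta * r(x)) for one example."""
    return base_loss * phi(-alpha * r_x) + psi(c) * phi(beta * r_x)


# Accepting (r(x) = 2) puts a large weight on the base loss and a small one
# on the abstention cost; rejecting (r(x) = -2) does the opposite.
accept = surrogate_abstention_loss(base_loss=0.1, r_x=2.0, c=0.3)
reject = surrogate_abstention_loss(base_loss=0.9, r_x=-2.0, c=0.3)
```

At \(r(x) = 0\) both weights equal 1, so the surrogate reduces to \(\mathcal{l} + \psi(c)\).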

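In the previous post we also stated the \((\mathcal{H}, \mathcal{R})\)-consistency bound satisfied by the single-stage surrogate loss:

Theorem 1 (\((\mathcal{H}, \mathcal{R})\)-consistency bound for single-stage surrogate losses). Assume that \(\mathcal{H}\) is symmetric and complete. Then, for \(\alpha=\beta\) and \(\mathcal{l} = \mathcal{l}_{\text{mae}}\), or \(\mathcal{l} = \mathcal{l}_{\rho}\) with \(\psi(z) = z\), or \(\mathcal{l} = \mathcal{l}_{\rho - \text{hinge}}\) with \(\psi(z) = nz\), the following \((\mathcal{H}, \mathcal{R})\)-consistency bound holds for all \(h\in \mathcal{H}, r\in \mathcal{R}\) and any distribution: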

\[R_{L_{\text{abst}}}(h, r) - R_{L_{\text{abst}}}^{*}(\mathcal{H}, \mathcal{R}) + M_{L_{\text{abst}}}(\mathcal{H}, \mathcal{R}) \leqslant \Gamma(R_L(h, r) - R_{L}^{*}(\mathcal{H}, \mathcal{R}) + M_{L}(\mathcal{H}, \mathcal{R}))
\]

where \(\Gamma (z) = \max\{2n\sqrt{z}, nz\}\) for \(\mathcal{l} = \mathcal{l}_{\text{mae}}\); \(\Gamma (z) = \max\{2\sqrt{z}, z\}\) for \(\mathcal{l}=\mathcal{l}_{\rho}\); and \(\Gamma (z) = \max\{2\sqrt{nz}, z\}\) for \(\mathcal{l} = \mathcal{l}_{\rho - \text{hinge}}\).

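In the previous post, however, we did not give the detailed proof of this \((\mathcal{H}, \mathcal{R})\)-consistency bound. In this post we work through the proof (my advisor also asked me to read the analysis in these papers closely and to master the proof technique for the single-stage method).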

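2 Preliminaries for the analysis

Assume that the labeled sample \(S=((x_1, y_1), \cdots, (x_m, y_m))\) is drawn i.i.d. from \(p(x, y)\). For the target loss \(L_{\text{abst}}\) and the surrogate loss \(L\), we define the expected abstention loss \(R_{L_{\text{abst}}}(h, r)\) (the generalization error of the target loss) and the expected abstention surrogate loss \(R_{L}(h, r)\) (the generalization error of the surrogate loss) as follows: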

\[R_{L_{\text{abst}}}(h, r) = \mathbb{E}_{p(x, y)}\left[L_{\text{abst}}(h, r, x, y)\right], \quad R_{L}(h, r) = \mathbb{E}_{p(x, y)}\left[L(h, r, x, y)\right]
\]

Let \(R_{L_{\text{abst}}}^{*}(\mathcal{H}, \mathcal{R}) = \inf_{h\in \mathcal{H}, r\in \mathcal{R}}R_{L_{\text{abst}}}(h, r)\) and \(R_{L}^{*}(\mathcal{H}, \mathcal{R}) = \inf_{h\in \mathcal{H}, r\in \mathcal{R}}R_{L}(h, r)\) denote the infima of \(R_{L_{\text{abst}}}\) and \(R_L\) over \(\mathcal{H}\times \mathcal{R}\).

To simplify the analysis that follows, we use the product rule of probability to write \(R_L(h, r)\) as:

\[R_{L}(h, r) = \mathbb{E}_{p(x, y)}\left[L(h, r, x, y)\right] = \mathbb{E}_{p(x)}\underbrace{\left[\mathbb{E}_{p(y\mid x)}\left[L(h, r, x, y)\right]\right]}_{\text{conditional risk }C_L}
\]

We call the inner conditional expectation the conditional risk of the surrogate loss \(L\) (also called its pointwise risk). Since \(y\) is averaged out, it depends only on \(h\), \(r\) and \(x\), so we write it as \(C_L(h, r, x)\):

\[C_L(h, r, x) = \mathbb{E}_{p(y\mid x)}\left[L(h, r, x, y)\right] = \sum_{y\in \mathcal{Y}}p(y\mid x)L(h, r, x, y)
\]
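Computationally, the conditional risk at a fixed \(x\) is just a probability-weighted sum over labels. The sketch below is a hypothetical helper (names and the example distribution are mine) that takes the conditional distribution \(p(\cdot\mid x)\) as a list and the loss as a function of the label.

```python
from typing import Callable, Sequence


def conditional_risk(p: Sequence[float], loss_of_label: Callable[[int], float]) -> float:
    """C(h, r, x) = sum_y p(y | x) * loss(h, r, x, y) at one fixed x."""
    return sum(p_y * loss_of_label(y) for y, p_y in enumerate(p))


# Illustrative values: three labels, predicted label 1 (0-based), cost c = 0.3.
p = [0.2, 0.5, 0.3]
c = 0.3

# Rejector accepts (r(x) > 0): the target loss is the 0-1 error of label 1.
risk_accept = conditional_risk(p, lambda y: 0.0 if y == 1 else 1.0)

# Rejector abstains (r(x) <= 0): the target loss is c for every label.
risk_abstain = conditional_risk(p, lambda y: c)
```

Here `risk_accept` equals \(1 - p(\text{h}(x)\mid x) = 0.5\) and `risk_abstain` equals \(c\), up to floating-point rounding.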

We write \(C^*_L(\mathcal{H}, \mathcal{R}, x) = \inf_{h\in \mathcal{H}, r\in \mathcal{R}} C_L(h, r, x)\) for the best-in-class conditional risk of \(L\). Likewise, \(C_{L_{\text{abst}}}\) denotes the conditional risk of the target loss \(L_{\text{abst}}\), and \(C^*_{L_{\text{abst}}}\) its best-in-class conditional risk.

Using \(R_{L}^*(\mathcal{H}, \mathcal{R})\) and \(C^*_L(\mathcal{H}, \mathcal{R}, x)\), we can express the minimizability gap:

\[M_L(\mathcal{H}, \mathcal{R}) = R_{L}^*(\mathcal{H}, \mathcal{R}) - \mathbb{E}_{p(x)}\left[C_L^*(\mathcal{H}, \mathcal{R}, x)\right]
\]

\(M_{L_{\text{abst}}}\) is defined analogously.

Substituting the definitions of the two minimizability gaps (the terms \(R^*\) cancel), we can rewrite the \((\mathcal{H}, \mathcal{R})\)-consistency bound to be proved:

\[R_{L_{\text{abst}}}(h, r) - R_{L_{\text{abst}}}^{*}(\mathcal{H}, \mathcal{R}) + M_{L_{\text{abst}}}(\mathcal{H}, \mathcal{R}) \leqslant \Gamma(R_L(h, r) - R_{L}^{*}(\mathcal{H}, \mathcal{R}) + M_{L}(\mathcal{H}, \mathcal{R}))\\
\Leftrightarrow R_{L_{\text{abst}}}(h, r) - \mathbb{E}_{p(x)}\left[C_{L_{\text{abst}}}^*(\mathcal{H}, \mathcal{R}, x)\right] \leqslant \Gamma\left(R_{L}(h, r) - \mathbb{E}_{p(x)}\left[C_{L}^*(\mathcal{H}, \mathcal{R}, x)\right]\right)
\]

Since \(R_{L_{\text{abst}}}(h, r)\) and \(R_L(h, r)\) equal \(\mathbb{E}_{p(x)}C_{L_{\text{abst}}}(h, r, x)\) and \(\mathbb{E}_{p(x)}C_{L}(h, r, x)\) respectively, the inequality above becomes

\[\mathbb{E}_{p(x)}\underbrace{\left[C_{L_{\text{abst}}}(h, r, x) - C_{L_{\text{abst}}}^*(\mathcal{H}, \mathcal{R}, x)\right]}_{\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)} \leqslant \Gamma\left(\mathbb{E}_{p(x)}\underbrace{\left[C_{L}(h, r, x) - C_{L}^*(\mathcal{H}, \mathcal{R}, x)\right]}_{\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)}\right)
\]

We abbreviate the two terms inside the expectations as \(\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)\) and \(\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\), where \(\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\) is called the calibration gap. Taking \(\Gamma(\cdot)\) to be concave, as the analysis assumes, Jensen's inequality gives:

\[\mathbb{E}_{p(x)}\left[\Gamma\left(\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\right)\right] \leqslant \Gamma\left(\mathbb{E}_{p(x)}\left[\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\right]\right)
\]

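Hence the original inequality follows once we prove the pointwise inequality below: taking the expectation over \(p(x)\) on both sides and then applying the Jensen step above yields the bound.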

\[\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma \left(\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\right)
\]

As we will see, the key step in proving the \((\mathcal{H}, \mathcal{R})\)-consistency bound is precisely to show that \(\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)\) is bounded by \(\Gamma \left(\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\right)\).

3 Expressing \(\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)\)

We first look at how to express \(\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) = C_{L_{\text{abst}}}(h, r, x) - C^*_{L_{\text{abst}}}(\mathcal{H}, \mathcal{R}, x)\). By definition:

\[\begin{aligned}
C_{L_{\text{abst}}}(h, r, x) &= \sum_{y\in \mathcal{Y}}p(y\mid x)L_{\text{abst}}(h, r, x, y) \\
&= \sum_{y\in \mathcal{Y}}p(y\mid x) \mathbb{I}_{\text{h}(x) \neq y}\mathbb{I}_{r(x) > 0} + c(x) \mathbb{I}_{r(x)\leqslant 0}
\end{aligned}
\]

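Since the expectation is over \(y\) only, the sum in the last line acts only on \(\mathbb{I}_{\text{h}(x) \neq y}\) (the abstention term does not depend on \(y\), and the probabilities sum to 1). To express \(C_{L_{\text{abst}}}(h, r, x)\) further, we distinguish two cases according to the sign of \(r(x)\):

  1. \(r(x) > 0\): in this case \(C_{L_{\text{abst}}}(h, r, x) = \sum_{y\in \mathcal{Y}}p(y\mid x) \mathbb{I}_{\text{h}(x) \neq y} = 1 - p(\text{h}(x)\mid x)\).
  2. \(r(x) \leqslant 0\): in this case \(C_{L_{\text{abst}}}(h, r, x) = c\).

Next we turn to \(C^*_{L_{\text{abst}}}\). Assume that the rejector family \(\mathcal{R}\) is complete (that is, for every \(x\in \mathcal{X}\), \(\{r(x): r\in \mathcal{R}\} = \mathbb{R}\)). Then \(\mathcal{R}\) is also regular for abstention (that is, for every \(x\in \mathcal{X}\) there exist \(r_1, r_2\in \mathcal{R}\) with \(r_1(x) > 0\) and \(r_2(x) \leqslant 0\)). We therefore have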

\[\begin{aligned}
C^*_{L_{\text{abst}}}(\mathcal{H}, \mathcal{R}, x) &= \inf_{h\in \mathcal{H}, r\in \mathcal{R}}C_{L_{\text{abst}}}(h, r, x)\\
& = \min \left\{\min_{h\in \mathcal{H}}\left(1 - p\left( \text{h}(x)\mid x\right)\right), c\right\}\\
& = 1 - \max\left\{\max_{h\in \mathcal{H}}p\left(\text{h}(x)\mid x\right), 1 - c\right\}
\end{aligned}
\]

Assume that \(\mathcal{H}\) is symmetric and complete (see the previous post for the precise definitions). Then \(\{\text{h}(x): h\in \mathcal{H}\} = \mathcal{Y}\), and so

\[C^*_{L_{\text{abst}}}(\mathcal{H}, \mathcal{R}, x) = 1 - \max\left\{\max_{y\in \mathcal{Y}}p\left(y\mid x\right), 1 - c\right\}
\]

To express \(C^*_{L_{\text{abst}}}(\mathcal{H}, \mathcal{R}, x)\) further, we distinguish two cases according to how \(\max_{y\in \mathcal{Y}}p(y\mid x)\) compares with \((1 - c)\):

  1. \(\max_{y\in \mathcal{Y}}p(y\mid x) > 1 - c\): in this case \(C^*_{L_{\text{abst}}}(\mathcal{H}, \mathcal{R}, x) = 1 - \max_{y\in \mathcal{Y}}p(y\mid x)\).
  2. \(\max_{y\in \mathcal{Y}}p(y\mid x) \leqslant 1 - c\): in this case \(C^*_{L_{\text{abst}}}(\mathcal{H}, \mathcal{R}, x) = c\).

Combining the two case analyses, we obtain:

\[\begin{aligned}
\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) &= C_{L_{\text{abst}}}(h, r, x) - C^*_{L_{\text{abst}}}(\mathcal{H}, \mathcal{R}, x) \\
& = \left\{\begin{aligned}
&\max_{y\in \mathcal{Y}}p(y\mid x) - p(\text{h}(x)\mid x)\quad &\text{if } \max_{y\in \mathcal{Y}} p(y\mid x) > (1 - c),r(x) > 0 \\
&1 - c - p(\text{h}(x)\mid x) \quad &\text{if } \max_{y\in \mathcal{Y}} p(y\mid x) \leqslant (1 - c),r(x) > 0 \\
&0 \quad &\text{if } \max_{y\in \mathcal{Y}} p(y\mid x) \leqslant (1 - c),r(x) \leqslant 0 \\
&\max_{y\in \mathcal{Y}}p(y\mid x) - 1 + c \quad &\text{if } \max_{y\in \mathcal{Y}} p(y\mid x) > (1 - c),r(x) \leqslant 0 \\
\end{aligned}\right.
\end{aligned}
\]
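The four-case formula can be cross-checked against the direct definition \(C_{L_{\text{abst}}} - C^*_{L_{\text{abst}}}\). The following Python sketch does this at a single point \(x\); the function names, the 0-based label `pred` standing for \(\text{h}(x)\), and the test values are my own choices.

```python
def delta_c_abst_direct(p, pred, r_x, c):
    """Conditional risk of L_abst minus its best-in-class value at one x."""
    risk = (1.0 - p[pred]) if r_x > 0 else c
    best = min(1.0 - max(p), c)
    return risk - best


def delta_c_abst_cases(p, pred, r_x, c):
    """The same quantity via the four-case formula."""
    p_max = max(p)
    if r_x > 0:
        if p_max > 1 - c:
            return p_max - p[pred]
        return 1 - c - p[pred]
    if p_max > 1 - c:
        return p_max - 1 + c
    return 0.0


def formulas_agree():
    """Compare both computations on a small grid of illustrative inputs."""
    for p in ([0.7, 0.2, 0.1], [0.4, 0.35, 0.25]):
        for pred in range(3):
            for r_x in (-1.0, 0.0, 0.5):
                for c in (0.2, 0.5, 0.8):
                    a = delta_c_abst_direct(p, pred, r_x, c)
                    b = delta_c_abst_cases(p, pred, r_x, c)
                    if abs(a - b) > 1e-12:
                        return False
    return True
```

Both functions should agree up to rounding, and the gap is never negative, since \(C^*_{L_{\text{abst}}}\) is an infimum.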

4 Expressing \(\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\)

4.1 Setting up the case analysis

Next we look at how to express \(\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x) = C_L(h, r, x) - C^*_L(\mathcal{H}, \mathcal{R}, x)\). By definition, if \(\alpha = \beta\) and \(\phi(z) = \exp(-z)\), we have:

\[\begin{aligned}
C_L(h, r, x) &= \sum_{y\in \mathcal{Y}}p(y\mid x)L(h, r, x, y) \\
&= \sum_{y\in \mathcal{Y}}p(y\mid x) \mathcal{l}(h, x, y)e^{\alpha r(x)} + \psi(c) e^{-\alpha r(x)}
\end{aligned}
\]

Since the expectation is over \(y\) only, the sum in the last line acts only on \(\mathcal{l}(h, x, y)\). Below we treat \(C_L(h, r, x)\) separately for the following three choices of \(\mathcal{l}\) and \(\psi(z)\):

  1. \(\mathcal{l} = \mathcal{l}_{\text{mae}}\), \(\psi(z) = z\);
  2. \(\mathcal{l} = \mathcal{l}_{\rho}\), \(\psi(z) = z\);
  3. \(\mathcal{l} = \mathcal{l}_{\rho-\text{hinge}}\), \(\psi(z) = nz\).

The three losses \(\mathcal{l}\) are defined in the previous post; for convenience, I restate the definitions here:

  • Mean absolute error loss: \(\mathcal{l}_{\text{mae}}(h, x, y) = 1 - \frac{e^{{h(x)}_y}}{\sum_{y^{\prime}\in \mathcal{Y}}e^{{h(x)}_{y^{\prime}}}}\);
  • Constrained \(\rho\)-hinge loss: \(\mathcal{l}_{\rho-\text{hinge}}(h, x, y) = \sum_{y^{\prime}\neq y}\phi_{\rho-\text{hinge}}(-{h(x)}_{y^{\prime}}), \rho > 0\), where \(\phi_{\rho-\text{hinge}}(z) = \max\{0, 1 - \frac{z}{\rho}\}\) is the \(\rho\)-hinge loss, subject to the constraint \(\sum_{y\in \mathcal{Y}}{h(x)}_y=0\);
  • \(\rho\)-margin loss: \(\mathcal{l}_{\rho}(h, x, y) = \phi_{\rho}({\rho_h (x, y)})\), where \(\rho_{h}(x, y) = h(x)_y - \max_{y^{\prime} \neq y}h(x)_{y^{\prime}}\) is the confidence margin and \(\phi_{\rho}(z) = \min\{\max\{0, 1 - \frac{z}{\rho}\}, 1\}, \rho > 0\) is the \(\rho\)-margin loss.
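To make the three definitions concrete, here is a small Python sketch that evaluates each base loss on a score vector \(h(x)\). The function names and the 0-based labels are illustrative assumptions; for the constrained \(\rho\)-hinge loss the caller is assumed to pass scores that already sum to zero.

```python
import math


def softmax(scores):
    """Numerically stable softmax of the score vector h(x)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def loss_mae(scores, y):
    """Mean absolute error loss: 1 minus the softmax probability of label y."""
    return 1.0 - softmax(scores)[y]


def loss_rho_hinge(scores, y, rho):
    """Constrained rho-hinge loss; assumes sum(scores) == 0."""
    return sum(max(0.0, 1.0 + s / rho) for k, s in enumerate(scores) if k != y)


def loss_rho_margin(scores, y, rho):
    """rho-margin loss applied to the confidence margin rho_h(x, y)."""
    margin = scores[y] - max(s for k, s in enumerate(scores) if k != y)
    return min(max(0.0, 1.0 - margin / rho), 1.0)
```

For the zero-sum scores `[2.0, 0.0, -2.0]` with `rho = 1.0`, label `0` has margin 2, so its \(\rho\)-margin loss is `0.0`, while label `1` has margin -2 and loss `1.0`.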

4.2 \(\mathcal{l} = \mathcal{l}_{\text{mae}}\), \(\psi(z) = z\)

In this case \(C_L(h, r, x)\) can be written as:

\[\begin{aligned}
C_L(h, r, x) &= \sum_{y\in \mathcal{Y}}p(y\mid x) \underbrace{\left(1 - \frac{e^{{h(x)}_y}}{\sum_{y^{\prime}\in \mathcal{Y}}e^{{h(x)}_{y^{\prime}}}}\right)}_{\mathcal{l}_{\text{mae}}}e^{\alpha r(x)} + c e^{-\alpha r(x)} \\
&= \sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right)e^{\alpha r(x)} + c e^{-\alpha r(x)}
\end{aligned}
\]

where \(s_h(x, y) = \frac{e^{{h(x)}_y}}{\sum_{y^{\prime}\in \mathcal{Y}}e^{{h(x)}_{y^{\prime}}}}\) is the softmax score of label \(y\).

Therefore

\[\begin{aligned}
C_L^*(\mathcal{H}, \mathcal{R}, x) &= \inf_{h\in \mathcal{H}, r\in\mathcal{R}} \left\{\sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right)e^{\alpha r(x)} + c e^{-\alpha r(x)}\right\} \\
&= \inf_{r\in\mathcal{R}} \left\{\inf_{h\in \mathcal{H}}\left\{\sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right)\right\}e^{\alpha r(x)} + c e^{-\alpha r(x)}\right\}
\end{aligned}
\]

Since \(\mathcal{H}\) is assumed to be symmetric and complete, we have

\[\begin{aligned}
&\inf_{h\in \mathcal{H}}\left\{\sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right)\right\} \\
&= 1 - \sup_{h\in \mathcal{H}}\sum_{y\in \mathcal{Y}}p(y\mid x)s_h(x, y) \\
&= 1 - \max_{y\in \mathcal{Y}}p(y\mid x)\quad \left(s_h(x, y)\in (0, 1)\right)
\end{aligned}
\]

In fact, for any \(h\in \mathcal{H}\) we have:

\[\begin{aligned}
&\sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right) - \left(1 - \max_{y\in \mathcal{Y}}p\left(y\mid x\right)\right) \\
&= \max_{y\in \mathcal{Y}} p(y\mid x) - \sum_{y\in \mathcal{Y}}p(y\mid x)s_h(x, y) \\
&= \max_{y\in \mathcal{Y}} p(y\mid x) - \left(p\left(\text{h}(x)\mid x\right)s_h\left(x, \text{h}(x)\right) + \sum_{y\neq \text{h}(x)}p(y\mid x)s_h(x, y)\right) \\
&\geqslant \max_{y\in \mathcal{Y}} p(y\mid x) - \left(p\left(\text{h}(x)\mid x\right)s_h\left(x, \text{h}(x)\right) + \max_{y\in \mathcal{Y}}p(y\mid x)\left(1 - s_h\left(x, \text{h}(x)\right)\right)\right) \\
&= s_h\left(x, \text{h}(x)\right)\left(\max_{y\in \mathcal{Y}}p(y\mid x) - p\left(\text{h}(x)\mid x\right)\right) \\
&\geqslant \frac{1}{n} \left(\max_{y\in \mathcal{Y}}p(y\mid x) - p\left(\text{h}(x)\mid x\right)\right)
\end{aligned}
\]
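The bound just derived is easy to sanity-check numerically. The sketch below samples random conditional distributions and score vectors and compares the two sides; it is a spot check under my own sampling choices (uniform weights, Gaussian scores, a fixed seed), not a proof.

```python
import math
import random


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mae_gap_and_bound(p, scores):
    """Return (lhs, rhs) of the inequality above at one point x.

    lhs = sum_y p(y|x) (1 - s_h(x, y)) - (1 - max_y p(y|x))
    rhs = (max_y p(y|x) - p(h(x)|x)) / n
    """
    n = len(p)
    s = softmax(scores)
    pred = max(range(n), key=lambda y: scores[y])  # h(x), 0-based
    lhs = sum(p_y * (1.0 - s_y) for p_y, s_y in zip(p, s)) - (1.0 - max(p))
    rhs = (max(p) - p[pred]) / n
    return lhs, rhs


def inequality_holds(trials=1000, n=5, seed=0):
    """Check lhs >= rhs on random instances, up to a rounding tolerance."""
    rng = random.Random(seed)
    for _ in range(trials):
        raw = [rng.random() for _ in range(n)]
        total = sum(raw)
        p = [v / total for v in raw]
        scores = [rng.gauss(0.0, 2.0) for _ in range(n)]
        lhs, rhs = mae_gap_and_bound(p, scores)
        if lhs < rhs - 1e-12:
            return False
    return True
```

The factor \(\frac{1}{n}\) comes from \(s_h(x, \text{h}(x)) \geqslant \frac{1}{n}\): the largest softmax score among \(n\) labels is at least the average.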

We will use this bound repeatedly in the proofs below. Its right-hand side is non-negative for every \(h\), and it vanishes exactly when \(\text{h}(x)\) is a Bayes-optimal label (that is, \(p(\text{h}(x)\mid x) = \max_{y\in \mathcal{Y}} p(y\mid x)\)). Intuitively, \(\mathbb{E}_{p(y\mid x)}\left[\mathcal{l}_{\text{mae}}\right]\) can approach its infimum \(1 - \max_{y\in \mathcal{Y}}p(y\mid x)\) only when the predicted label is Bayes-optimal.

Therefore

\[C_L^*(\mathcal{H}, \mathcal{R}, x) = \inf_{r\in\mathcal{R}} \left\{\left(1 - \max_{y\in \mathcal{Y}}p(y\mid x)\right)e^{\alpha r(x)} + c e^{-\alpha r(x)}\right\}
\]

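Denote the expression to be minimized by the functional \(F(r)\). Since \(\mathcal{R}\) is complete, the minimization can be carried out pointwise in the value \(r(x)\), and the functional derivative is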

\[\frac{\delta F}{\delta r(x)} = \alpha \left(1 - \max_{y\in \mathcal{Y}}p(y\mid x)\right)e^{\alpha r(x)} - c\alpha e^{-\alpha r(x)}
\]

Setting \(\frac{\delta F}{\delta r(x)} = 0\) (for all \(x\in \mathcal{X}\)) gives \(r^*(x) = -\frac{1}{2\alpha}\log \left(\frac{1 - \max_{y\in \mathcal{Y}}p(y\mid x)}{c}\right)\). Substituting this into \(F(r)\) yields:

\[C_L^*(\mathcal{H}, \mathcal{R}, x) = 2\sqrt{c(1 - \max_{y\in \mathcal{Y}}p(y\mid x))}
\]
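The closed form can be checked numerically by scanning \(r\) on a grid around \(r^*\). In the sketch below, `a` stands for \(1 - \max_{y}p(y\mid x)\); the values of `a`, `c` and `alpha` are illustrative only, and `a > 0` is assumed so that the logarithm is defined.

```python
import math


def surrogate_in_r(a, c, alpha, r):
    """F(r) = a * exp(alpha * r) + c * exp(-alpha * r), with a = 1 - max_y p(y|x)."""
    return a * math.exp(alpha * r) + c * math.exp(-alpha * r)


def r_star(a, c, alpha):
    """Stationary point of F; requires a > 0, i.e. max_y p(y|x) < 1."""
    return -math.log(a / c) / (2.0 * alpha)


a, c, alpha = 0.2, 0.3, 2.0            # illustrative values only
best_r = r_star(a, c, alpha)
closed_form = 2.0 * math.sqrt(c * a)   # claimed minimum value of F

# Scan a grid around r* and record the smallest value of F found.
grid = [best_r + 0.01 * k for k in range(-300, 301)]
grid_min = min(surrogate_in_r(a, c, alpha, r) for r in grid)
```

With these values \(a < c\), so `best_r` is positive: the minimizing rejector accepts exactly when \(\max_{y}p(y\mid x) > 1 - c\), which matches the case split used for the target loss.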

Therefore

\[\begin{aligned}
\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x) &= C_L(h, r, x) - C^*_L(\mathcal{H}, \mathcal{R}, x) \\
&= \sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right)e^{\alpha r(x)} + c e^{-\alpha r(x)} - 2\sqrt{c(1 - \max_{y\in \mathcal{Y}}p(y\mid x))}
\end{aligned}
\]

To relate \(\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)\) to \(\Gamma \left(\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\right)\), we proceed as in Section 3 and split \(\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\) into cases according to how \(\max_{y\in \mathcal{Y}} p(y\mid x)\) compares with \(1 - c\) and according to the sign of \(r(x)\):

  1. \(\max_{y\in \mathcal{Y}} p(y\mid x) > (1 - c)\), \(r(x) > 0\):

    In this case

    \[\begin{aligned}
    \Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x) &= \sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right)e^{\alpha r(x)} + c e^{-\alpha r(x)} - 2\sqrt{c\left(1 - \max_{y\in \mathcal{Y}}p(y\mid x)\right)} \\
    &\geqslant \sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right)e^{\alpha r(x)} + c e^{-\alpha r(x)} - ce^{-\alpha r(x)} - \left(1 - \max_{y\in \mathcal{Y}}p(y\mid x)\right)e^{\alpha r(x)} \quad (\text{AM-GM inequality}) \\
    &= \left(\sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right) - \left(1 - \max_{y\in \mathcal{Y}}p(y\mid x)\right)\right)e^{\alpha r(x)} \\
    &\geqslant \sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right) - \left(1 - \max_{y\in \mathcal{Y}}p(y\mid x)\right) \quad \left(e^{\alpha r(x)} > 1\right)\\
    &\geqslant \frac{1}{n} \left(\max_{y\in \mathcal{Y}}p(y\mid x) - p\left(\text{h}(x)\mid x\right)\right) \\
    &= \frac{1}{n} \Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)
    \end{aligned}
    \]

    (Here the AM-GM inequality is the inequality of arithmetic and geometric means, applied in the form \(2\sqrt{ab} = 2\sqrt{a e^{\alpha r(x)}\cdot b e^{-\alpha r(x)}} \leqslant a e^{\alpha r(x)} + b e^{-\alpha r(x)}\) with \(a = 1 - \max_{y\in \mathcal{Y}}p(y\mid x)\) and \(b = c\). The step that drops the factor \(e^{\alpha r(x)}\) uses that the bracket is non-negative.)

    Taking \(\Gamma_1 (z) = nz\), we obtain \(\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma_1 \left(\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\right)\).

  2. \(\max_{y\in \mathcal{Y}} p(y\mid x) \leqslant (1 - c)\), \(r(x) > 0\):

    \[ \begin{aligned}
    \Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x) &= \sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right)e^{\alpha r(x)} + c e^{-\alpha r(x)} - 2\sqrt{c\left(1 - \max_{y\in \mathcal{Y}}p(y\mid x)\right)} \\
    & \geqslant \underbrace{\sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h\left(x, y\right)\right)}_{\geqslant c}e^{\alpha r(x)} + c e^{-\alpha r(x)} - 2\sqrt{c\left(\sum_{y\in \mathcal{Y}}p(y\mid x)\left(1 - s_h(x, y)\right)\right)} \\
    & \geqslant \sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right) + c - 2\sqrt{c\left(\sum_{y\in \mathcal{Y}}p(y\mid x)\left(1 - s_h(x, y)\right)\right)} \\
    &= \left(\sqrt{\sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right)} - \sqrt{c}\right)^2 \\
    &= \left(\frac{\sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right) - c}{\sqrt{\sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right)} + \sqrt{c}}\right)^2 \\
    &\geqslant \left(\frac{\sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right) - \left(1 - \max_{y\in \mathcal{Y}}p(y\mid x)\right) + \left(1 - \max_{y\in \mathcal{Y}}p(y\mid x) - c\right)}{2}\right)^2 \\
    &\geqslant \left(\frac{\frac{1}{n} \left(\max_{y\in \mathcal{Y}}p(y\mid x) - p\left(\text{h}(x)\mid x\right)\right) + \frac{1}{n}\left(1 - \max_{y\in \mathcal{Y}}p(y\mid x) - c\right)}{2}\right)^2 \\
    &= \frac{1}{4n^2}\left(1 - c - p\left(\text{h}(x)\mid x\right)\right)^2 \\
    &= \frac{\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)^2}{4n^2}
    \end{aligned}
    \]

    Taking \(\Gamma_2 (z) = 2n\sqrt{z}\), we obtain \(\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma_2 \left(\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\right)\).

  3. \(\max_{y\in \mathcal{Y}} p(y\mid x) \leqslant (1 - c)\), \(r(x) \leqslant 0\):

    Here \(\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) = 0\), so \(\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma\left(\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\right)\) holds for any \(\Gamma \geqslant 0\).

  4. \(\max_{y\in \mathcal{Y}} p(y\mid x) > (1 - c)\), \(r(x) \leqslant 0\):

    \[ \begin{aligned}
    \Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x) &= \sum_{y\in \mathcal{Y}}p(y\mid x) \left(1 - s_h(x, y)\right)e^{\alpha r(x)} + c e^{-\alpha r(x)} - 2\sqrt{c\left(1 - \max_{y\in \mathcal{Y}}p(y\mid x)\right)} \\
    &\geqslant \left(1 - \max_{y\in \mathcal{Y}}p(y\mid x)\right)\underbrace{e^{\alpha r(x)}}_{\leqslant 1} + c \underbrace{e^{-\alpha r(x)}}_{\geqslant 1} - 2\sqrt{c\left(1 - \max_{y\in \mathcal{Y}}p(y\mid x)\right)} \\
    &\geqslant 1 - \max_{y\in \mathcal{Y}}p(y\mid x) + c - 2\sqrt{c\left(1 - \max_{y\in \mathcal{Y}}p(y\mid x)\right)} \\
    &= \left(\sqrt{1 - \max_{y\in \mathcal{Y}}p(y\mid x)} - \sqrt{c}\right)^2 \\
    &= \left(\frac{1 - \max_{y\in \mathcal{Y}}p(y\mid x) - c}{\sqrt{1 - \max_{y\in \mathcal{Y}}p(y\mid x)} + \sqrt{c}}\right)^2 \\
    &\geqslant \left(\frac{\max_{y\in \mathcal{Y}}p(y\mid x) - 1 + c}{2}\right)^2 \\
    &= \frac{\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)^2}{4}
    \end{aligned}
    \]

    Taking \(\Gamma_3 (z) = 2\sqrt{z}\), we obtain \(\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma_3 \left(\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\right)\).

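In summary, writing \(\Gamma_1(z) = nz\), \(\Gamma_2(z) = 2n\sqrt{z}\) and \(\Gamma_3(z) = 2\sqrt{z}\) for the functions obtained in cases 1, 2 and 4, and taking \(\Gamma(z) = \max\{\Gamma_1(z), \Gamma_2(z), \Gamma_3(z)\} = \max\{2n\sqrt{z}, nz\}\), we always have \(\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma \left(\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\right)\). This proves the \((\mathcal{H}, \mathcal{R})\)-consistency bound for the single-stage surrogate loss with \(\mathcal{l} = \mathcal{l}_{\text{mae}}\), \(\psi(z) = z\).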
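As a sanity check on the whole case analysis for \(\mathcal{l}_{\text{mae}}\), the pointwise inequality can be tested on random instances. The sketch below is a numerical spot check, not a proof; the sampling ranges, the seed, the choice \(\alpha = 1\) and the tolerance are arbitrary assumptions of mine.

```python
import math
import random


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gaps_mae(p, scores, r_x, c, alpha):
    """Return (delta_C_abst, delta_C_L) for l = l_mae, psi(z) = z, alpha = beta."""
    pred = max(range(len(p)), key=lambda y: scores[y])
    p_max = max(p)
    # Target side: conditional risk minus its best-in-class value.
    risk_abst = (1.0 - p[pred]) if r_x > 0 else c
    delta_abst = risk_abst - min(1.0 - p_max, c)
    # Surrogate side: C_L(h, r, x) - 2 * sqrt(c * (1 - max_y p(y|x))).
    s = softmax(scores)
    expected_mae = sum(p_y * (1.0 - s_y) for p_y, s_y in zip(p, s))
    risk_l = expected_mae * math.exp(alpha * r_x) + c * math.exp(-alpha * r_x)
    delta_l = risk_l - 2.0 * math.sqrt(c * (1.0 - p_max))
    return delta_abst, delta_l


def gamma_mae(z, n):
    """Gamma(z) = max{2 n sqrt(z), n z}."""
    return max(2.0 * n * math.sqrt(z), n * z)


def check_random_instances(trials=2000, n=4, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        raw = [rng.random() for _ in range(n)]
        total = sum(raw)
        p = [v / total for v in raw]
        scores = [rng.uniform(-3.0, 3.0) for _ in range(n)]
        r_x = rng.uniform(-2.0, 2.0)
        c = rng.uniform(0.05, 0.95)
        d_abst, d_l = gaps_mae(p, scores, r_x, c, alpha=1.0)
        d_l = max(d_l, 0.0)  # guard against tiny negative rounding error
        if d_abst > gamma_mae(d_l, n) + 1e-6:
            return False
    return True
```

`check_random_instances()` returning `True` only says that no sampled point violates the bound; the guarantee itself comes from the four cases above.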

4.3 \(\mathcal{l} = \mathcal{l}_{\rho}\), \(\psi(z) = z\)

In this case \(C_L(h, r, x)\) can be written as:

\[\begin{aligned}
C_L(h, r, x) &= \sum_{y\in \mathcal{Y}}p(y\mid x) \underbrace{\min\left\{\max\left\{0, 1 - \frac{\rho_h(x, y)}{\rho}\right\}, 1\right\}}_{\mathcal{l}_{\rho}}e^{\alpha r(x)} + c e^{-\alpha r(x)} \\
&= \left(1 - \sum_{y\in \mathcal{Y}} p(y\mid x)\max\left\{\min\left\{1, \frac{\rho_h(x, y)}{\rho}\right\}, 0\right\}\right)e^{\alpha r(x)} + c e^{-\alpha r(x)} \\
&= \left(1 - p\left(\text{h}(x)\mid x\right)\min\left\{1, \frac{\rho_h\left(x, \text{h}(x)\right)}{\rho}\right\}\right)e^{\alpha r(x)} + c e^{-\alpha r(x)}
\end{aligned}
\]

where \(\rho_h(x, y) = h(x)_y - \max_{y^{\prime}\neq y}h(x)_{y^{\prime}}\) is the margin. The last step uses that \(\rho_h(x, y) \leqslant 0\) for every \(y \neq \text{h}(x)\), so only the term \(y = \text{h}(x)\) survives the \(\max\{\cdot, 0\}\).

Since \(\mathcal{H}\) is assumed to be symmetric and complete, we have

\[\begin{aligned}
&\inf_{h\in \mathcal{H}}\left\{1 - p\left(\text{h}(x)\mid x\right)\min\left\{1, \frac{\rho_h\left(x, \text{h}(x)\right)}{\rho}\right\}\right\} \\
&= 1 - \sup_{h\in \mathcal{H}}p\left(\text{h}(x)\mid x\right)\min\left\{1, \frac{\rho_h\left(x, \text{h}(x)\right)}{\rho}\right\} \\
&= 1 - \max_{y\in \mathcal{Y}}p\left(y\mid x\right)\quad \left(\min\left\{1, \frac{\rho_h\left(x, \text{h}(x)\right)}{\rho}\right\}\in [0, 1]\right)
\end{aligned}
\]

In fact, for any \(h\in \mathcal{H}\) we have:

\[\begin{aligned}
&\left(1 - p\left(\text{h}(x)\mid x\right)\min\left\{1, \frac{\rho_h\left(x, \text{h}(x)\right)}{\rho}\right\}\right) - \left(1 - \max_{y\in \mathcal{Y}}p\left(y\mid x\right)\right) \\
&= \max_{y\in \mathcal{Y}} p(y\mid x) - \min \left\{1, \frac{\rho_h\left(x, \text{h}(x)\right)}{\rho}\right\}p\left(\text{h}(x)\mid x\right) \\
&\geqslant \max_{y\in \mathcal{Y}}p\left(y\mid x\right) - p\left(\text{h}(x)\mid x\right)
\end{aligned}
\]

As in the proof for \(\mathcal{l}_{\text{mae}}\), we will use this bound repeatedly below.

As for \(\mathcal{l}_{\text{mae}}\), minimizing over \(r\) then gives

\[C_L^*(\mathcal{H}, \mathcal{R}, x) = 2\sqrt{c(1 - \max_{y\in \mathcal{Y}}p\left(y\mid x\right))}
\]

Therefore

\[\begin{aligned}
\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x) &= C_L(h, r, x) - C^*_L(\mathcal{H}, \mathcal{R}, x) \\
&= \left(1 - p\left(\text{h}(x)\mid x\right)\min\left\{1, \frac{\rho_h\left(x, \text{h}(x)\right)}{\rho}\right\}\right)e^{\alpha r(x)} + c e^{-\alpha r(x)} - 2\sqrt{c(1 - \max_{y\in \mathcal{Y}}p\left(y\mid x\right))}
\end{aligned}
\]

To relate \(\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)\) to \(\Gamma \left(\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\right)\), we proceed as in the proof for \(\mathcal{l}_{\text{mae}}\) and split \(\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\) into cases according to how \(\max_{y\in \mathcal{Y}} p(y\mid x)\) compares with \(1 - c\) and according to the sign of \(r(x)\):

  1. \(\max_{y\in \mathcal{Y}} p(y\mid x) > (1 - c)\), \(r(x) > 0\):

    In this case

    \[\begin{aligned}
    \Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x) &= \left(1 - p\left(\text{h}(x)\mid x\right)\min\left\{1, \frac{\rho_h\left(x, \text{h}(x)\right)}{\rho}\right\}\right)e^{\alpha r(x)} + c e^{-\alpha r(x)} - 2\sqrt{c\left(1 - \max_{y\in \mathcal{Y}}p\left(y\mid x\right)\right)} \\
    &\geqslant \max_{y\in \mathcal{Y}}p(y\mid x) - p\left(\text{h}(x)\mid x\right) \\
    &= \Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)
    \end{aligned}
    \]

    (Since the steps mirror those for \(\mathcal{l}_{\text{mae}}\), they are abbreviated here and in the cases below.)

    Taking \(\Gamma_1 (z) = z\), we obtain \(\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma_1 (\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x))\).

  2. \(\max_{y\in \mathcal{Y}} p(y\mid x) \leqslant (1 - c)\), \(r(x) > 0\):

    \[\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x) \geqslant \frac{1}{4}\left(1 - c - p\left(\text{h}\left(x\right)\mid x\right)\right)^2 = \frac{\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)^2}{4}
    \]

    Taking \(\Gamma_2 (z) = 2\sqrt{z}\), we obtain \(\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma_2 (\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x))\).

  3. \(\max_{y\in \mathcal{Y}} p(y\mid x) \leqslant (1 - c)\), \(r(x) \leqslant 0\):

    Here \(\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) = 0\), so \(\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma (\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x))\) holds for any \(\Gamma \geqslant 0\).

  4. \(\max_{y\in \mathcal{Y}} p(y\mid x) > (1 - c)\), \(r(x) \leqslant 0\):

    \[\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x) \geqslant \left(\frac{\max_{y\in \mathcal{Y}}p(y\mid x) - 1 + c}{2}\right)^2 = \frac{\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)^2}{4}
    \]

    Taking \(\Gamma_3 (z) = 2\sqrt{z}\), we obtain \(\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma_3 (\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x))\).

In summary, taking \(\Gamma(z) = \max\{\Gamma_1(z), \Gamma_2(z), \Gamma_3(z)\} = \max\{2\sqrt{z}, z\}\), we always have \(\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma (\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x))\). This proves the \((\mathcal{H}, \mathcal{R})\)-consistency bound for the single-stage surrogate loss with \(\mathcal{l} = \mathcal{l}_{\rho}\), \(\psi(z) = z\).
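The same kind of numerical spot check can be run for the \(\rho\)-margin loss. As before, this is a sketch under my own assumptions (random sampling, fixed seed, \(\alpha = 1\), \(\rho = 1\), a small tolerance) and is not part of the proof.

```python
import math
import random


def rho_margin_loss(scores, y, rho):
    """l_rho(h, x, y) = min{max{0, 1 - rho_h(x, y) / rho}, 1}."""
    margin = scores[y] - max(s for k, s in enumerate(scores) if k != y)
    return min(max(0.0, 1.0 - margin / rho), 1.0)


def gaps_rho_margin(p, scores, r_x, c, alpha, rho):
    """Return (delta_C_abst, delta_C_L) for l = l_rho, psi(z) = z, alpha = beta."""
    n = len(p)
    pred = max(range(n), key=lambda y: scores[y])
    p_max = max(p)
    risk_abst = (1.0 - p[pred]) if r_x > 0 else c
    delta_abst = risk_abst - min(1.0 - p_max, c)
    # Conditional expectation of the base loss, computed from its definition.
    expected = sum(p[y] * rho_margin_loss(scores, y, rho) for y in range(n))
    risk_l = expected * math.exp(alpha * r_x) + c * math.exp(-alpha * r_x)
    delta_l = risk_l - 2.0 * math.sqrt(c * (1.0 - p_max))
    return delta_abst, delta_l


def gamma_rho(z):
    """Gamma(z) = max{2 sqrt(z), z}."""
    return max(2.0 * math.sqrt(z), z)


def check_random_instances(trials=2000, n=4, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        raw = [rng.random() for _ in range(n)]
        total = sum(raw)
        p = [v / total for v in raw]
        scores = [rng.uniform(-3.0, 3.0) for _ in range(n)]
        r_x = rng.uniform(-2.0, 2.0)
        c = rng.uniform(0.05, 0.95)
        d_abst, d_l = gaps_rho_margin(p, scores, r_x, c, alpha=1.0, rho=1.0)
        d_l = max(d_l, 0.0)  # guard against tiny negative rounding error
        if d_abst > gamma_rho(d_l) + 1e-6:
            return False
    return True
```

Note that no factor \(n\) appears in \(\Gamma\) here, because the \(\rho\)-margin loss is bounded below directly by the 0-1 error of \(\text{h}(x)\) in expectation.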

4.4 \(\mathcal{l} = \mathcal{l}_{\rho-\text{hinge}}\), \(\psi(z) = nz\)

In this case \(C_L(h, r, x)\) can be written as:

\[\begin{aligned}
C_L(h, r, x) &= \sum_{y\in \mathcal{Y}}p(y\mid x) \underbrace{\sum_{y^{\prime} \neq y}\max\left\{0, 1 + \frac{h(x)_{y^{\prime}}}{\rho}\right\}}_{\mathcal{l}_{\rho-\text{hinge}}}e^{\alpha r(x)} + nce^{-\alpha r(x)} \\
&= \sum_{y\in \mathcal{Y}}\left(1 - p(y\mid x)\right)\max\left\{0, 1 + \frac{h(x)_y}{\rho}\right\}e^{\alpha r(x)} + nce^{-\alpha r(x)} \\
\end{aligned}
\]

Since \(\mathcal{H}\) is assumed to be symmetric and complete, we have

\[\begin{aligned}
&\inf_{h\in \mathcal{H}}\left\{\sum_{y\in \mathcal{Y}}\left(1 - p(y\mid x)\right)\max\left\{0, 1 + \frac{h(x)_y}{\rho}\right\}\right\} \\
&= n - \sup_{h\in \mathcal{H}}\sum_{y\in \mathcal{Y}}p(y\mid x)\max\left\{0, 1 + \frac{h(x)_y}{\rho}\right\} \\
&= n\left(1 - \max_{y\in \mathcal{Y}}p\left(y\mid x\right)\right)
\end{aligned}
\]

In fact, given \(h\), define \(h_{\rho}\) by \(h_{\rho}(x)_y = \left\{\begin{aligned}
&h(x)_y\quad &\text{if } y\notin \left\{y_{\max}, \text{h}(x)\right\} \\
&-\rho \quad &\text{if } y = \text{h}(x) \\
&h\left(x\right)_{y_{\max}} + h\left(x\right)_{\text{h}(x)} + \rho \quad &\text{if } y = y_{\max} \\
\end{aligned}\right.\), which satisfies the constraint \(\sum_{y\in \mathcal{Y}}h_{\rho}(x)_y=0\), where \(y_{\max} = \text{arg max}_{y\in \mathcal{Y}}p(y\mid x)\). Then for any \(h\in \mathcal{H}\) we have:

\[\begin{aligned}
&\sum_{y\in \mathcal{Y}}\left(1 - p(y\mid x)\right)\max\left\{0, 1 + \frac{h(x)_y}{\rho}\right\} - n\left(1 - \max_{y\in \mathcal{Y}}p\left(y\mid x\right)\right) \\
&\geqslant \sum_{y\in \mathcal{Y}}\left(1 - p(y\mid x)\right)\min\left\{n, \max\left\{0, 1 + \frac{h(x)_y}{\rho}\right\}\right\} - n\left(1 - \max_{y\in \mathcal{Y}}p\left(y\mid x\right)\right) \\
&\geqslant \sum_{y\in \mathcal{Y}}\left(1 - p(y\mid x)\right)\min\left\{n, \max\left\{0, 1 + \frac{h(x)_y}{\rho}\right\}\right\} \\
&\quad - \sum_{y\in \mathcal{Y}}\left(1 - p(y\mid x)\right)\min\left\{n, \max\left\{0, 1 + \frac{h_{\rho}(x)_y}{\rho}\right\}\right\} \\
&= \left(p(y_{\text{max}}\mid x) - p(\text{h}(x)\mid x)\right)\min\left\{n, 1 + \frac{h(x)_{\text{h}(x)}}{\rho}\right\} \\
&\geqslant \max_{y\in \mathcal{Y}}p\left(y\mid x\right) - p\left(\text{h}\left(x\right)\mid x\right)
\end{aligned}
\]

As in the proofs for \(\mathcal{l}_{\text{mae}}\) and \(\mathcal{l}_{\rho}\), we will use this bound repeatedly below.

As for \(\mathcal{l}_{\text{mae}}\) and \(\mathcal{l}_{\rho}\), minimizing over \(r\) then gives

\[C_L^*(\mathcal{H}, \mathcal{R}, x) = 2\sqrt{n^2c(1 - \max_{y\in \mathcal{Y}}p\left(y\mid x\right))}
\]

Therefore

\[\begin{aligned}
\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x) &= C_L(h, r, x) - C^*_L(\mathcal{H}, \mathcal{R}, x) \\
&= \sum_{y\in \mathcal{Y}}\left(1 - p(y\mid x)\right)\max\left\{0, 1 + \frac{h(x)_y}{\rho}\right\}e^{\alpha r(x)} + nc e^{-\alpha r(x)} - 2n\sqrt{c(1 - \max_{y\in \mathcal{Y}}p\left(y\mid x\right))}
\end{aligned}
\]

To relate \(\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)\) to \(\Gamma \left(\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\right)\), we proceed as in the proofs for \(\mathcal{l}_{\text{mae}}\) and \(\mathcal{l}_{\rho}\) and split \(\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)\) into cases according to how \(\max_{y\in \mathcal{Y}} p(y\mid x)\) compares with \(1 - c\) and according to the sign of \(r(x)\):

  1. \(\max_{y\in \mathcal{Y}} p(y\mid x) > (1 - c)\), \(r(x) > 0\):

    In this case

    \[\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x)
    \geqslant \max_{y\in \mathcal{Y}}p(y\mid x) - p(\text{h}(x)\mid x) = \Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)
    \]

    Taking \(\Gamma_1 (z) = z\), we obtain \(\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma_1 (\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x))\).

  2. \(\max_{y\in \mathcal{Y}} p(y\mid x) \leqslant (1 - c)\), \(r(x) > 0\):

    \[\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x) \geqslant \frac{1}{4n}\left(1 - c - p\left(\text{h}\left(x\right)\mid x\right)\right)^2 = \frac{\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)^2}{4n}
    \]

    Taking \(\Gamma_2 (z) = 2\sqrt{nz}\), we obtain \(\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma_2 (\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x))\).

  3. \(\max_{y\in \mathcal{Y}} p(y\mid x) \leqslant (1 - c)\), \(r(x) \leqslant 0\):

    Here \(\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) = 0\), so \(\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma (\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x))\) holds for any \(\Gamma \geqslant 0\).

  4. \(\max_{y\in \mathcal{Y}} p(y\mid x) > (1 - c)\), \(r(x) \leqslant 0\):

    \[\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x) \geqslant n\left(\frac{\max_{y\in \mathcal{Y}}p(y\mid x) - 1 + c}{2}\right)^2 = \frac{n\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x)^2}{4}
    \]

    Taking \(\Gamma_3 (z) = 2\sqrt{z/n}\), we obtain \(\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma_3 (\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x))\).

In summary, taking \(\Gamma(z) = \max\{\Gamma_1(z), \Gamma_2(z), \Gamma_3(z)\} = \max\{2\sqrt{nz}, z\}\), we always have \(\Delta C_{L_\text{abst}, \mathcal{H}, \mathcal{R}}(h, r, x) \leqslant \Gamma (\Delta C_{L, \mathcal{H}, \mathcal{R}}(h, r, x))\). This proves the \((\mathcal{H}, \mathcal{R})\)-consistency bound for the single-stage surrogate loss with \(\mathcal{l} = \mathcal{l}_{\rho-\text{hinge}}\), \(\psi(z) = nz\).
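For completeness, here is the analogous numerical spot check for the constrained \(\rho\)-hinge loss with \(\psi(z) = nz\). The scores are centred so that they satisfy the zero-sum constraint; the sampling ranges, seed, \(\alpha = 1\), \(\rho = 1\) and the tolerance are again my own illustrative assumptions.

```python
import math
import random


def rho_hinge_loss(scores, y, rho):
    """Constrained rho-hinge loss; scores are assumed to sum to zero."""
    return sum(max(0.0, 1.0 + s / rho) for k, s in enumerate(scores) if k != y)


def gaps_rho_hinge(p, scores, r_x, c, alpha, rho):
    """Return (delta_C_abst, delta_C_L) for l = l_rho-hinge, psi(z) = n z."""
    n = len(p)
    pred = max(range(n), key=lambda y: scores[y])
    p_max = max(p)
    risk_abst = (1.0 - p[pred]) if r_x > 0 else c
    delta_abst = risk_abst - min(1.0 - p_max, c)
    expected = sum(p[y] * rho_hinge_loss(scores, y, rho) for y in range(n))
    risk_l = expected * math.exp(alpha * r_x) + n * c * math.exp(-alpha * r_x)
    # Best-in-class surrogate risk: 2 n sqrt(c (1 - max_y p(y|x))).
    delta_l = risk_l - 2.0 * n * math.sqrt(c * (1.0 - p_max))
    return delta_abst, delta_l


def gamma_hinge(z, n):
    """Gamma(z) = max{2 sqrt(n z), z}."""
    return max(2.0 * math.sqrt(n * z), z)


def check_random_instances(trials=2000, n=4, seed=2):
    rng = random.Random(seed)
    for _ in range(trials):
        raw = [rng.random() for _ in range(n)]
        total = sum(raw)
        p = [v / total for v in raw]
        scores = [rng.uniform(-3.0, 3.0) for _ in range(n)]
        mean = sum(scores) / n
        scores = [s - mean for s in scores]  # enforce sum_y h(x)_y = 0
        r_x = rng.uniform(-2.0, 2.0)
        c = rng.uniform(0.05, 0.95)
        d_abst, d_l = gaps_rho_hinge(p, scores, r_x, c, alpha=1.0, rho=1.0)
        d_l = max(d_l, 0.0)  # guard against tiny negative rounding error
        if d_abst > gamma_hinge(d_l, n) + 1e-6:
            return False
    return True
```

Centring the scores leaves the predicted label unchanged, so the target-side gap is unaffected by the constraint.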

References

  • [1] Mao A, Mohri M, Zhong Y. Predictor-rejector multi-class abstention: Theoretical analysis and algorithms[C]//International Conference on Algorithmic Learning Theory. PMLR, 2024: 822-867.
  • [2] Cortes C, DeSalvo G, Mohri M. Boosting with abstention[J]. Advances in Neural Information Processing Systems, 2016, 29.
  • [3] Ni C, Charoenphakdee N, Honda J, et al. On the calibration of multiclass classification with rejection[J]. Advances in Neural Information Processing Systems, 2019, 32.
  • [4] Han Bao: Learning Theory Bridges Loss Functions
  • [5] Crammer K, Singer Y. On the algorithmic implementation of multiclass kernel-based vector machines[J]. Journal of machine learning research, 2001, 2(Dec): 265-292.
  • [6] Awasthi P, Mao A, Mohri M, et al. Multi-Class $ H $-Consistency Bounds[J]. Advances in neural information processing systems, 2022, 35: 782-795.
