Stacked generalization (stacking) and feature-weighted linear stacking (FWLS)
https://en.wikipedia.org/wiki/Ensemble_learning
Stacking
Stacking (sometimes called stacked generalization) involves training a learning algorithm to combine the predictions of several other learning algorithms. First, all of the other algorithms are trained using the available data, then a combiner algorithm is trained to make a final prediction using all the predictions of the other algorithms as additional inputs. If an arbitrary combiner algorithm is used, then stacking can theoretically represent any of the ensemble techniques described in this article, although in practice, a single-layer logistic regression model is often used as the combiner.
Stacking typically yields performance better than any single one of the trained models.[22] It has been successfully used on both supervised learning tasks (regression,[23] classification, and distance learning[24]) and unsupervised learning (density estimation).[25] It has also been used to estimate bagging's error rate.[3][26] It has been reported to outperform Bayesian model averaging.[27] The two top performers in the Netflix competition utilized blending, which may be considered a form of stacking.[28]
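To make the mechanics concrete, here is a minimal stacking sketch assuming scikit-learn is available; the dataset and base learners are illustrative choices, not taken from the article, but the combiner is the single-layer logistic regression the text mentions.

```python
# Minimal stacking sketch (illustrative estimators, not from the original text).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Level-0 models: their out-of-fold predictions become inputs to the combiner.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("svm", SVC(probability=True, random_state=0)),
]

# Level-1 combiner: a logistic regression, as the text notes is the common choice.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,  # out-of-fold predictions keep training labels from leaking to the combiner
)
stack.fit(X_train, y_train)
print("stacked accuracy:", stack.score(X_test, y_test))
```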
https://arxiv.org/pdf/0911.0460.pdf
[Significantly improves the accuracy of collaborative filtering]
Ensemble methods, such as stacking, are designed to boost predictive accuracy by blending the predictions of multiple machine learning models. Recent work has shown that the use of meta-features, additional inputs describing each example in a dataset, can boost the performance of ensemble methods, but the greatest reported gains have come from nonlinear procedures requiring significant tuning and training time. Here, we present a linear technique, Feature-Weighted Linear Stacking (FWLS), that incorporates meta-features for improved accuracy while retaining the well-known virtues of linear regression regarding speed, stability, and interpretability. FWLS combines model predictions linearly using coefficients that are themselves linear functions of meta-features. This technique was a key facet of the solution of the second place team in the recently concluded Netflix Prize competition. Significant increases in accuracy over standard linear stacking are demonstrated on the Netflix Prize collaborative filtering dataset.
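The FWLS model described in the abstract can be written as ŷ(x) = Σ_i Σ_j v_ij f_j(x) g_i(x), where the g_i(x) are model predictions and the f_j(x) are meta-features, so fitting it reduces to linear regression on the pairwise products of predictions and meta-features. Below is a compact sketch assuming only NumPy; the function names, toy data, and the ridge penalty used to stabilize the solve are illustrative, not code from the paper.

```python
# Compact FWLS sketch. `model_preds` (n x M) holds level-0 predictions g_i(x);
# `meta_feats` (n x F) holds meta-features f_j(x), including a constant column
# so plain linear stacking is recovered as a special case. Names are illustrative.
import numpy as np

def fwls_fit(model_preds, meta_feats, y, ridge=1.0):
    """Fit coefficients v_ij so that y_hat = sum_ij v_ij * f_j(x) * g_i(x),
    i.e. each model's blending weight is a linear function of the meta-features."""
    n, M = model_preds.shape
    _, F = meta_feats.shape
    # Design matrix of all products f_j(x) * g_i(x): shape (n, M*F).
    design = (model_preds[:, :, None] * meta_feats[:, None, :]).reshape(n, M * F)
    # Ridge-regularized least squares (the penalty here is an illustrative choice).
    A = design.T @ design + ridge * np.eye(M * F)
    v = np.linalg.solve(A, design.T @ y)
    return v.reshape(M, F)

def fwls_predict(model_preds, meta_feats, v):
    n = model_preds.shape[0]
    design = (model_preds[:, :, None] * meta_feats[:, None, :]).reshape(n, -1)
    return design @ v.reshape(-1)

# Toy usage: two models, a constant meta-feature plus one descriptor.
rng = np.random.default_rng(0)
g = rng.normal(size=(500, 2))                         # level-0 predictions
f = np.column_stack([np.ones(500), rng.random(500)])  # meta-features
y = 0.7 * g[:, 0] + 0.3 * g[:, 1] * f[:, 1] + rng.normal(scale=0.1, size=500)
v = fwls_fit(g, f, y)
print("learned coefficients v_ij:\n", v)
```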
[A blend of blends: stacking and blending]
“Stacking” is a technique in which the predictions of a collection of models are given as inputs to a second-level learning algorithm. This second-level algorithm is trained to combine the model predictions optimally to form a final set of predictions. Many machine learning practitioners have had success using stacking and related techniques to boost prediction accuracy beyond the level obtained by any of the individual models. In some contexts, stacking is also referred to as blending, and we will use the terms interchangeably here. Since its introduction [23], modellers have employed stacking successfully on a wide variety of problems, including chemometrics [8], spam filtering [16], and large collections of datasets drawn from the UCI Machine Learning Repository [21, 7]. One prominent recent example of the power of model blending was the Netflix Prize collaborative filtering competition. The team BellKor’s Pragmatic Chaos won the $1 million prize using a blend of hundreds of different models [22, 11, 14]. Indeed, the winning solution was a blend at multiple levels, i.e., a blend of blends.

Intuition suggests that the reliability of a model may vary as a function of the conditions in which it is used. For instance, in a collaborative filtering context where we wish to predict the preferences of customers for various products, the amount of data collected may vary significantly depending on which customer or which product is under consideration. Model A may be more reliable than model B for users who have rated many products, but model B may outperform model A for users who have only rated a few products. In an attempt to capitalize on this intuition, many researchers have developed approaches that attempt to improve the accuracy of stacked regression by adapting the blending on the basis of side information. Such an additional source of information, like the number of products rated by a user or the number of days since a product was released, is often referred to as a “meta-feature,” and we will use that terminology here.
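The user-activity intuition above maps directly onto meta-features in the illustrative FWLS sketch: alongside a constant column (which recovers standard linear stacking), one can add, say, the log of the number of products a user has rated, so the learned blend can shift weight between models for heavy and light raters. The snippet below is a hypothetical construction reusing the `fwls_fit` sketch from earlier, not code from the paper.

```python
# Hypothetical meta-feature construction for the collaborative-filtering intuition
# described above; pairs with the illustrative fwls_fit sketch shown earlier.
import numpy as np

def build_meta_features(n_ratings_per_user):
    """Constant column + log user activity: lets the blend favour one model
    for heavy raters and another for light raters."""
    counts = np.asarray(n_ratings_per_user, dtype=float)
    return np.column_stack([
        np.ones_like(counts),   # constant term recovers plain linear stacking
        np.log1p(counts),       # "support" meta-feature: how much data the user has
    ])

# Usage with the earlier sketch: f = build_meta_features(counts); v = fwls_fit(g, f, y)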