3.1.7. Cross validation of time series data
Time series data is characterised by correlation between observations that are close in time (autocorrelation). Classical cross-validation techniques such as KFold and ShuffleSplit, however, assume that samples are independent and identically distributed. Applied to time series data, they leave training and testing instances unreasonably correlated, which yields poor estimates of generalisation error. It is therefore very important to evaluate a time series model on “future” observations, that is, those least like the observations used to train it. One way to achieve this is provided by TimeSeriesSplit.
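The difference can be checked directly on the index sets. The snippet below is a minimal sketch on toy data (12 time-ordered samples); the helper name `trains_on_future` is made up for this illustration and is not part of scikit-learn. It asks whether any split trains on samples that come later than its earliest test sample.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

# Toy data: 12 samples assumed to be ordered in time.
X = np.arange(12).reshape(-1, 1)


def trains_on_future(cv, X):
    """Return True if any split trains on a sample later than its earliest test sample."""
    return any(train.max() > test.min() for train, test in cv.split(X))


kfold_leaks = trains_on_future(KFold(n_splits=3), X)
tscv_leaks = trains_on_future(TimeSeriesSplit(n_splits=3), X)

print("KFold trains on future samples:", kfold_leaks)
print("TimeSeriesSplit trains on future samples:", tscv_leaks)
```

Even without shuffling, KFold places later observations in the training set of its early folds, whereas TimeSeriesSplit always keeps the training indices before the test indices.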
3.1.7.1. Time Series Split
TimeSeriesSplit is a variation of k-fold which returns the first k folds as the train set and the (k+1)-th fold as the test set. Note that, unlike standard cross-validation methods, successive training sets are supersets of those that come before them. Also, it adds all surplus data to the first training partition, which is always used to train the model.
This class can be used to cross-validate time series data samples that are observed at fixed time intervals.
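The superset and surplus behaviour described above can be seen when the number of samples does not divide evenly. This is a small sketch on toy data, assuming the default parameters of TimeSeriesSplit apart from `n_splits`; the variable names are illustrative only.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# 10 samples cannot be divided evenly into n_splits + 1 = 4 folds.
X = np.arange(10).reshape(-1, 1)
splits = list(TimeSeriesSplit(n_splits=3).split(X))

for train, test in splits:
    print(len(train), "training samples, test indices", test)

# Check that each training set contains the previous one.
nested = all(set(a).issubset(b) for (a, _), (b, _) in zip(splits, splits[1:]))
print("Successive training sets are nested:", nested)
```

With these settings each test fold holds 10 // 4 = 2 samples, so the two surplus samples end up in the first training set, which has 4 samples instead of 2.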
Example of 3-split time series cross-validation on a dataset with 6 samples:
>>> import numpy as np
>>> from sklearn.model_selection import TimeSeriesSplit
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> tscv = TimeSeriesSplit(n_splits=3)
>>> print(tscv)
TimeSeriesSplit(n_splits=3)
>>> for train, test in tscv.split(X):
...     print("%s %s" % (train, test))
[0 1 2] [3]
[0 1 2 3] [4]
[0 1 2 3 4] [5]
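In practice the splitter is usually handed to an evaluation helper rather than iterated by hand. The following is a hedged sketch, not part of the original example: it passes a TimeSeriesSplit instance as the `cv` argument of `cross_val_score`, using a LinearRegression model and a noise-free linear trend that were chosen purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Toy series: the time index is the only feature, the target is a linear trend.
t = np.arange(20, dtype=float).reshape(-1, 1)
y = 2.0 * t.ravel() + 1.0

# Each score comes from a model trained only on samples preceding its test fold.
tscv = TimeSeriesSplit(n_splits=4)
scores = cross_val_score(LinearRegression(), t, y, cv=tscv)
print(scores)
```

One score is returned per split. Because the toy target is an exact linear function of time, the scores here say nothing about real forecasting difficulty; they only show how the pieces fit together.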