Autocorrelation in Time Series Data
Why Time Series Data Is Unique
A time series is a series of data points indexed in time. The fact that time series data is ordered makes it unique in the data space because it often displays serial dependence序列依赖. Serial dependence occurs when the value of a datapoint at one time is statistically dependent on another datapoint in another time. However, this attribute of time series data violates违反 one of the fundamental assumptions of many statistical analyses — that data is statistically independent.
What Is Autocorrelation?
Autocorrelation is a type of serial dependence. Specifically, autocorrelation is when a time series is linearly related to a lagged version of itself. By contrast, correlation is simply when two independent variables are linearly related.
Why Autocorrelation Matters
Often, one of the first steps in any data analysis is performing regression analysis. However, one of the assumptions of regression analysis is that the data has no autocorrelation. This can be frustrating because if you try to do a regression analysis on data with autocorrelation, then your analysis will be misleading.
Additionally, some time series forecasting methods (specifically regression modeling) rely on the assumption that there isn’t any autocorrelation in the residuals (the difference between the fitted model and the data). People often use the residuals to assess whether their model is a good fit while ignoring that assumption that the residuals have no autocorrelation (or that the errors are independent and identically distributed or i.i.d). This mistake can mislead people into believing that their model is a good fit when in fact it isn’t. I highly recommend reading this article about How (not) to use Machine Learning for time series forecasting: Avoiding the pitfalls in which the author demonstrates how the increasingly popular LSTM (Long Short Term Memory) Network can appear to be an excellent univariate time series predictor, when in reality it’s just overfitting the data. He goes further to explain how this misconception is the result of accuracy metrics failing due to the presence of autocorrelation.
Finally, perhaps the most compelling aspect of autocorrelation analysis is how it can help us uncover hidden patterns in our data and help us select the correct forecasting methods. Specifically, we can use it to help identify seasonality and trend in our time series data. Additionally, analyzing the autocorrelation function (ACF) and partial autocorrelation function (PACF) in conjunction is necessary for selecting the appropriate ARIMA model for your time series prediction.
How to Determine if Your Time Series Data Has Autocorrelation by python
For this exercise, I’m using InfluxDB and the InfluxDB Python CL. I am using available data from the National Oceanic and Atmospheric Administration’s (NOAA) Center for Operational Oceanographic Products and Services. Specifically, I will be looking at the water levels and water temperatures of a river in Santa Monica.
Autocorrelation in Time Series Data的更多相关文章
- 3.1.7. Cross validation of time series data
3.1.7. Cross validation of time series data Time series data is characterised by the correlation bet ...
- 增长中的时间序列存储(Scaling Time Series Data Storage) - Part I
本文摘译自 Netflix TechBlog : Scaling Time Series Data Storage - Part I 重点:扩容.缓存.冷热分区.分块. 时序数据 - 会员观看历史 N ...
- 时间序列大数据平台建设(Time Series Data,简称TSD)
来源:https://blog.csdn.net/bluishglc/article/details/79277455 引言在大数据的生态系统里,时间序列数据(Time Series Data,简称T ...
- vehicle time series data analysis
以HADOOP为代表的云计算提供的仅仅是一个算法执行环境,为大数据的并行计算提供了在现有软硬件水平下最好的(近似)方法.并不能解决大数据应用中的全部问题.从详细应用而言,通过物联网方式接入IT圈的数据 ...
- PP: Tripoles: A new class of relationships in time series data
Problem: ?? mining relationships in time series data; A new class of relationships in time series da ...
- Time Series data 与 sequential data 的区别
It is important to note the distinction between time series and sequential data. In both cases, the ...
- PP: Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data
From: Stanford University; Jure Leskovec, citation 6w+; Problem: subsequence clustering. Challenging ...
- Anomaly Detection for Time Series Data with Deep Learning——本质分类正常和异常的行为,对于检测异常行为,采用预测正常行为方式来做
A sample network anomaly detection project Suppose we wanted to detect network anomalies with the un ...
- Time series data mining
from here 论文Timeseries data mining(2012)中提出:时间序列数据挖掘包括7个基本任务和3个基础问题: 7 tasks: query by content clust ...
随机推荐
- 大数据才是未来,Oracle、SQL Server成昨日黄花?
1. 引子**** 有人在某个专注SQL的公众号留言如下: 这个留言触碰到一个非常敏感的问题:搞关系型数据库还有前途吗?现在都2020年了,区块链正火热,AI人才已经"过剩",大数 ...
- P1308 统计单词数(cin,getline() ,transform() )
题目描述 一般的文本编辑器都有查找单词的功能,该功能可以快速定位特定单词在文章中的位置,有的还能统计出特定单词在文章中出现的次数. 现在,请你编程实现这一功能,具体要求是:给定一个单词,请你输出它在给 ...
- [大数据技术]datax的安装以及使用
1.datax简述 DataX 是阿里巴巴集团内被广泛使用的离线数据同步工具/平台,实现包括 MySQL.Oracle.SqlServer.Postgre.HDFS.Hive.ADS.HBase.Ta ...
- Spark学习之路 (八)SparkCore的调优之开发调优[转]
前言 在大数据计算领域,Spark已经成为了越来越流行.越来越受欢迎的计算平台之一.Spark的功能涵盖了大数据领域的离线批处理.SQL类处理.流式/实时计算.机器学习.图计算等各种不同类型的计算操作 ...
- Tomcat配置绝对路径
JSP访问项目工程下的地址 <video width="640" height="264" controls src="http://local ...
- window服务session隔离
在window服务中抓取窗体是做不到的,因为window系统的session隔离机制:如果想要调用外部程序,可以通过 创建代理进程 进行操作(通过非托管代码CreateProcessAsUser函数进 ...
- CentOS MySQL自动备份shell脚本
先执行 vim/mysqlBack/back.sh 然后添加以下内容 ## 记录日志 # 以下配置信息请自己修改 mysql_user="root" #MySQL备份用户 mys ...
- 为什么要使用Redis? —— Redis实战经验
(序言,从一张思维导图开始,慢慢介绍我自己关于Redis的实战经验) 现在很多互联网应用的服务端都使用到了Redis,到底大家为什么要用Redis呢?Redis有很多特性,比如高性能.高可用.数据类型 ...
- C++->List的使用注释
List容器的应用: //----------单链队列-------队列的链式存储结构--------------- typedef struct QNode{ ...
- jQuery---淘宝精品案例
淘宝精品案例 <!DOCTYPE html> <html> <head lang="en"> <meta charset="UT ...