Why Time Series Data Is Unique

A time series is a sequence of data points indexed by time. Because the data is ordered, it often displays serial dependence, which sets it apart from most other kinds of data. Serial dependence occurs when the value of a data point at one time is statistically dependent on the value of a data point at another time. This property violates one of the fundamental assumptions of many statistical analyses: that the data points are statistically independent.

What Is Autocorrelation?

Autocorrelation is a type of serial dependence. Specifically, autocorrelation is when a time series is linearly related to a lagged version of itself. By contrast, ordinary correlation describes a linear relationship between two different variables.
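To make the definition concrete, here is a minimal sketch that treats lag-k autocorrelation as the Pearson correlation between a series and a copy of itself shifted by k steps. The function names and the sample values are made up for illustration; they are not taken from the NOAA data set.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lagged_autocorrelation(series, lag):
    """Correlate the series with a copy of itself shifted by `lag` steps."""
    if not 0 < lag < len(series) - 1:
        raise ValueError("lag must be between 1 and len(series) - 2")
    # series[:-lag] holds the earlier values, series[lag:] the later ones.
    return pearson(series[:-lag], series[lag:])


# Hypothetical readings: a rising series with a small wobble.
levels = [1.0, 1.2, 1.1, 1.5, 1.7, 1.6, 2.0, 2.2, 2.1, 2.5]
lag1 = lagged_autocorrelation(levels, 1)
print(f"lag-1 autocorrelation: {lag1:.3f}")
```

A value close to 1 means each point is well predicted by the point before it; a value close to -1 means the series tends to flip direction at every step.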

Why Autocorrelation Matters

Often, one of the first steps in any data analysis is performing regression analysis. However, one of the assumptions of regression analysis is that the errors are not autocorrelated. If you run a regression on autocorrelated data without accounting for this, the standard errors are typically biased, so significance tests and confidence intervals become unreliable and the analysis can be misleading.

Additionally, some time series forecasting methods (specifically regression modeling) rely on the assumption that there isn’t any autocorrelation in the residuals (the differences between the fitted model and the data). People often use the residuals to assess whether their model is a good fit while ignoring the assumption that the residuals have no autocorrelation (or that the errors are independent and identically distributed, i.i.d.). This mistake can mislead people into believing that their model is a good fit when in fact it isn’t. I highly recommend reading the article “How (not) to use Machine Learning for time series forecasting: Avoiding the pitfalls”, in which the author demonstrates how the increasingly popular LSTM (Long Short-Term Memory) network can appear to be an excellent univariate time series predictor when in reality it’s just overfitting the data. He goes on to explain how this misconception results from accuracy metrics failing in the presence of autocorrelation.
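One common way to check residuals for first-order autocorrelation is the Durbin-Watson statistic, which is not mentioned in the article above and is shown here only as an illustration. Values near 2 suggest little first-order autocorrelation, values near 0 suggest positive autocorrelation, and values near 4 suggest negative autocorrelation. The sketch below fits a straight line to made-up data containing a slow cycle and then tests the residuals; the helper names and the data are assumptions for the example.

```python
import math


def fit_line(y):
    """Ordinary least squares fit of y = intercept + slope * t for t = 0, 1, 2, ..."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Made-up data: a linear trend plus a slow cycle that a straight line cannot capture.
y = [0.5 * t + 3.0 * math.sin(2 * math.pi * t / 20) for t in range(60)]
intercept, slope = fit_line(y)
residuals = [v - (intercept + slope * t) for t, v in enumerate(y)]
dw = durbin_watson(residuals)
print(f"Durbin-Watson statistic: {dw:.3f}")
```

Here the straight line captures the trend, yet the statistic falls far below 2 because the leftover cycle makes neighbouring residuals move together. A goodness-of-fit number alone would not reveal this.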

Finally, perhaps the most compelling aspect of autocorrelation analysis is how it can help us uncover hidden patterns in our data and select the correct forecasting methods. Specifically, we can use it to identify seasonality and trend in our time series data. Additionally, analyzing the autocorrelation function (ACF) and the partial autocorrelation function (PACF) together is a standard way to choose the order of an ARIMA model for time series prediction.
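As a rough illustration of how the ACF can expose seasonality, the sketch below computes the sample ACF of a made-up series that repeats every 12 steps and lists the lags at which the ACF peaks. The series, the function name, and the simple peak rule are assumptions for this example; in practice a library routine and an ACF plot would usually be used instead.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag, normalized by the lag-0 sum of squares."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


# Made-up series with a repeating 12-step cycle (for example, monthly seasonality).
series = [math.sin(2 * math.pi * t / 12) for t in range(120)]
acf_values = sample_acf(series, 30)

# A lag is treated as a peak when its ACF value exceeds both neighbours.
peaks = [
    k for k in range(1, 30)
    if acf_values[k] > acf_values[k - 1] and acf_values[k] > acf_values[k + 1]
]
print("ACF peaks at lags:", peaks)
```

Peaks at multiples of the same lag are the usual signature of a seasonal cycle of that length, whereas an ACF that decays slowly without such peaks more often points to a trend.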

How to Determine if Your Time Series Data Has Autocorrelation with Python

For this exercise, I’m using InfluxDB and the InfluxDB Python client library (CL). I am using publicly available data from the National Oceanic and Atmospheric Administration’s (NOAA) Center for Operational Oceanographic Products and Services. Specifically, I will be looking at the water levels and water temperatures of a river in Santa Monica.

