Why Time Series Data Is Unique

A time series is a series of data points indexed in time. The fact that time series data is ordered makes it unique in the data space because it often displays serial dependence序列依赖. Serial dependence occurs when the value of a datapoint at one time is statistically dependent on another datapoint in another time. However, this attribute of time series data violates违反 one of the fundamental assumptions of many statistical analyses — that data is statistically independent.

What Is Autocorrelation?

Autocorrelation is a type of serial dependence. Specifically, autocorrelation is when a time series is linearly related to a lagged version of itself. By contrast, correlation is simply when two independent variables are linearly related.

Why Autocorrelation Matters

Often, one of the first steps in any data analysis is performing regression analysis. However, one of the assumptions of regression analysis is that the data has no autocorrelation. This can be frustrating because if you try to do a regression analysis on data with autocorrelation, then your analysis will be misleading.

Additionally, some time series forecasting methods (specifically regression modeling) rely on the assumption that there isn’t any autocorrelation in the residuals (the difference between the fitted model and the data). People often use the residuals to assess whether their model is a good fit while ignoring that assumption that the residuals have no autocorrelation (or that the errors are independent and identically distributed or i.i.d). This mistake can mislead people into believing that their model is a good fit when in fact it isn’t. I highly recommend reading this article about How (not) to use Machine Learning for time series forecasting: Avoiding the pitfalls in which the author demonstrates how the increasingly popular LSTM (Long Short Term Memory) Network can appear to be an excellent univariate time series predictor, when in reality it’s just overfitting the data. He goes further to explain how this misconception is the result of accuracy metrics failing due to the presence of autocorrelation.

Finally, perhaps the most compelling aspect of autocorrelation analysis is how it can help us uncover hidden patterns in our data and help us select the correct forecasting methods. Specifically, we can use it to help identify seasonality and trend in our time series data. Additionally, analyzing the autocorrelation function (ACF) and partial autocorrelation function (PACF) in conjunction is necessary for selecting the appropriate ARIMA model for your time series prediction.

How to Determine if Your Time Series Data Has Autocorrelation by python

For this exercise, I’m using InfluxDB and the InfluxDB Python CL. I am using available data from the National Oceanic and Atmospheric Administration’s (NOAA) Center for Operational Oceanographic Products and Services. Specifically, I will be looking at the water levels and water temperatures of a river in Santa Monica.

More.

Autocorrelation in Time Series Data的更多相关文章

  1. 3.1.7. Cross validation of time series data

    3.1.7. Cross validation of time series data Time series data is characterised by the correlation bet ...

  2. 增长中的时间序列存储(Scaling Time Series Data Storage) - Part I

    本文摘译自 Netflix TechBlog : Scaling Time Series Data Storage - Part I 重点:扩容.缓存.冷热分区.分块. 时序数据 - 会员观看历史 N ...

  3. 时间序列大数据平台建设(Time Series Data,简称TSD)

    来源:https://blog.csdn.net/bluishglc/article/details/79277455 引言在大数据的生态系统里,时间序列数据(Time Series Data,简称T ...

  4. vehicle time series data analysis

    以HADOOP为代表的云计算提供的仅仅是一个算法执行环境,为大数据的并行计算提供了在现有软硬件水平下最好的(近似)方法.并不能解决大数据应用中的全部问题.从详细应用而言,通过物联网方式接入IT圈的数据 ...

  5. PP: Tripoles: A new class of relationships in time series data

    Problem: ?? mining relationships in time series data; A new class of relationships in time series da ...

  6. Time Series data 与 sequential data 的区别

    It is important to note the distinction between time series and sequential data. In both cases, the ...

  7. PP: Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data

    From: Stanford University; Jure Leskovec, citation 6w+; Problem: subsequence clustering. Challenging ...

  8. Anomaly Detection for Time Series Data with Deep Learning——本质分类正常和异常的行为,对于检测异常行为,采用预测正常行为方式来做

    A sample network anomaly detection project Suppose we wanted to detect network anomalies with the un ...

  9. Time series data mining

    from here 论文Timeseries data mining(2012)中提出:时间序列数据挖掘包括7个基本任务和3个基础问题: 7 tasks: query by content clust ...

随机推荐

  1. Spring-Security无法正常捕捉到UsernameNotFoundException异常

    前言 在Web应用开发中,安全一直是非常重要的一个方面.在庞大的spring生态圈中,权限校验框架也是非常完善的.其中,spring security是非常好用的.今天记录一下在开发中遇到的一个spr ...

  2. java文本文件加密

    加密方法是通过输入流对源文件字符逐个读取,对其读取到字符的ascll值进行异或运算,并将其放入新文件中,解密时只要用相同的密钥进行ascll异或运算并向新文件输出即可,即对文件首次用该程序处理为加密, ...

  3. vim和emacs

    vim和emacs 在编程界一直有两大神器的传说.这两大神器一个是emacs,一个是vim.一个是神的编辑器,一个是编辑器之神. 程序员的圈子里面也一直流传着一个段子,说是世界上的程序员分为三种.使用 ...

  4. dubbo的使用

    dubbo现在用的也不多,基本都在用spring cloud那一套,所以不详细写这个dubbo了. 1.zookeeper的安装 2.demo示例 我们需要把提供者注册到dubbo注册中心,消费者去订 ...

  5. 敏捷@Scrum基础知识

    敏捷,用以概括一套全新的软件开发价值观:敏捷宣言由价值观和原则组成. (一)敏捷核心价值观 敏捷宣言 个体和交互        胜过      过程和工具 可以工作的软件    胜过      面面俱 ...

  6. 第31届IMO 第2题

    题目 设n>=3,考虑一个圆上由2n-1个不同点构成的集合E.现给E中恰好k个点染上黑色,如果至少有一对黑点使得这两个黑点之间的弧上(两段弧中的某一个)包含恰好E中的n个点,就成这样的染色方法是 ...

  7. id、css命名规范

    main热点:hot新闻:news下载:download子导航:subnav菜单:menu子菜单:submenu搜索:search友情链接:friendlink页脚:footer版权:copyrigh ...

  8. ubuntu python 安装使用虚拟环境 virtualenv

    1,虚拟环境是干啥用的? 我在电脑上装了cuda,显卡驱动,cudnn等一堆配套文件,然后又依赖于cuda和驱动安装了tensorflow2.0的gpu测试版,不知为何,我每次跑完tf2程序电脑都会卡 ...

  9. HTML文档快捷键

    一.web浏览器 1.刷新网页   F5 二.VS Code 常用快捷键 1.快速生成HTML代码 首先,建立一个空文档,选择编程语言为HTML: 其次,按下!(英文状态下),再按下tab键,就可以了 ...

  10. java exec python program

    I find three methods, the first is using jython, the module of jython can transform the type of data ...