From: Stanford University; Jure Leskovec (60,000+ citations).

Problem:

subsequence clustering.

Challenging:

Discovering patterns is challenging because it requires simultaneous segmentation and clustering of the time series, and because interpreting the clustering results is difficult.

Why is discovering time series patterns a challenge? Think for yourself! There are already many distance measures (DTW, manifold distance) and clustering methods (kNN, k-means, etc.). But I admit that interpretation is difficult.

Introduction:

A long time series is broken down into a sequence of states/patterns, so the time series can be expressed as a sequential timeline of a few key states. This makes it possible to discover repeated patterns, understand trends, detect anomalies, and better interpret large, high-dimensional datasets.

Key steps: simultaneously segment and cluster the time series.

Unsupervised learning: the results are hard to interpret; after clustering, you still have to inspect the data itself.

How to discover interpretable structure in the data?

Traditional clustering methods are not particularly well suited to discovering interpretable structure in the data. This is because they typically rely on distance-based metrics such as DTW.

Distance-based algorithms are at a disadvantage when handling multivariate time series: they cannot see subtle structural similarities in the data.
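For reference, a distance-based metric of the kind mentioned above can be sketched as follows. This is a textbook DTW for univariate sequences with an absolute-difference cost, not code from the paper; the function name `dtw_distance` is chosen here for illustration.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local cost of matching a[i-1] with b[j-1]
            # extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

The measure only compares raw values point by point (after warping); it says nothing about how different sensors relate to each other, which is the kind of structure the notes are after.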

The paper proposes a new method for multivariate time series clustering, TICC:

  • Define each cluster as a dependency network showing the relationships between the different sensors in a short subsequence.
  • Each cluster is a Markov random field (MRF).
  • In these MRFs, an edge represents a partial correlation between two variables.
  • Learn each cluster's MRF by estimating a sparse Gaussian inverse covariance matrix.
  • This network has multiple layers.
  • The number of layers corresponds to the window size of a short subsequence.
  • The inverse covariance matrix defines the adjacency matrix of the MRF dependency network.
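The link between an inverse covariance matrix and the MRF edges can be made concrete with a small sketch. It uses the standard identity that the partial correlation between variables i and j is -theta[i][j] / sqrt(theta[i][i] * theta[j][j]); the matrix `theta` below is a made-up 3-variable example, and the function names are illustrative, not taken from the TICC code.

```python
import math

def partial_correlations(theta):
    """Partial correlation matrix from an inverse covariance (precision) matrix."""
    n = len(theta)
    rho = [[1.0] * n for _ in range(n)]  # diagonal stays 1 by convention
    for i in range(n):
        for j in range(n):
            if i != j:
                rho[i][j] = -theta[i][j] / math.sqrt(theta[i][i] * theta[j][j])
    return rho

def adjacency(theta, tol=1e-8):
    """MRF adjacency: an edge (i, j) exists iff the off-diagonal entry is nonzero."""
    n = len(theta)
    return [[1 if i != j and abs(theta[i][j]) > tol else 0 for j in range(n)]
            for i in range(n)]

# Hypothetical precision matrix for three sensors arranged in a chain 0 - 1 - 2.
theta = [[2.0, -1.0, 0.0],
         [-1.0, 2.0, -1.0],
         [0.0, -1.0, 2.0]]
rho = partial_correlations(theta)
adj = adjacency(theta)
```

Here the zero entry at (0, 2) means sensors 0 and 2 are conditionally independent given sensor 1, so the network has no edge between them. A sparse inverse covariance therefore reads directly as a sparse, interpretable graph.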

Related work:

time series clustering and convex optimization;

variations of DTW; symbolic representations; rule-based motif discovery;

However, these methods generally rely on distance-based metrics.

TICC is a model-based clustering method, like clustering based on ARMA, Gaussian mixture, or hidden Markov models.

  • Define each cluster by a Gaussian inverse covariance.
  • The Gaussian inverse covariance thus defines a Markov random field that encodes the structural representation of the cluster.
  • K clusters, hence K inverse covariances.
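To see how segmentation and clustering can happen at the same time, here is a minimal sketch of the assignment step, assuming the K cluster models are already fixed. It assumes each point t has a cost `costs[t][k]` under cluster k (for example a negative log-likelihood under that cluster's Gaussian) and that switching clusters between consecutive points costs a penalty `beta`. The name `assign_clusters` and the example numbers are hypothetical; this is a generic Viterbi-style dynamic program, not the authors' implementation.

```python
def assign_clusters(costs, beta):
    """Return the minimum-total-cost cluster index for every time step.

    costs[t][k]: cost of putting point t in cluster k.
    beta: penalty paid whenever consecutive points are in different clusters.
    """
    T, K = len(costs), len(costs[0])
    best = [list(costs[0])]  # best[t][k]: min cost of any path ending in k at t
    back = []                # back[t-1][k]: predecessor cluster chosen for (t, k)
    for t in range(1, T):
        row, ptr = [], []
        for k in range(K):
            # staying in k is free; arriving from another cluster costs beta
            prev = min(range(K), key=lambda j: best[-1][j] + (0 if j == k else beta))
            row.append(best[-1][prev] + (0 if prev == k else beta) + costs[t][k])
            ptr.append(prev)
        best.append(row)
        back.append(ptr)
    # backtrack from the cheapest final state
    k = min(range(K), key=lambda j: best[-1][j])
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    return path[::-1]

# Hypothetical costs for 5 points and 2 clusters; point 2 slightly prefers cluster 1.
costs = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.9], [0.0, 1.0], [0.0, 1.0]]
no_penalty = assign_clusters(costs, beta=0.0)
with_penalty = assign_clusters(costs, beta=1.0)
```

With `beta=0.0` every point goes to its individually cheapest cluster; with a positive `beta` the isolated switch at point 2 is no longer worth it, so the assignment forms one contiguous segment.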

Selecting the number of clusters: cross-validation, normalized mutual information, BIC, or silhouette score.

I can't follow this part T_T
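As one example of these criteria, BIC-based selection can be sketched with the general formula BIC = p * ln(n) - 2 * ln(L), where p is the number of free parameters, n the number of samples, and L the maximized likelihood. The candidate values below are invented purely for illustration, and how to count parameters for a TICC model is left open here.

```python
import math

def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion; lower is better."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood

def pick_by_bic(candidates, n_samples):
    """candidates maps K -> (log_likelihood, n_params); returns the K with the lowest BIC."""
    def score(k):
        log_likelihood, n_params = candidates[k]
        return bic(log_likelihood, n_params, n_samples)
    return min(candidates, key=score)

# Hypothetical fits for K = 2, 3, 4 on 500 samples (made-up numbers).
candidates = {2: (-1200.0, 20), 3: (-1000.0, 30), 4: (-995.0, 40)}
best_k = pick_by_bic(candidates, n_samples=500)
```

The penalty term grows with the number of parameters, so a larger K is only chosen when it improves the likelihood enough to pay for the extra model complexity.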

Supplementary knowledge:

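1. For unsupervised learning, interpreting the results and choosing intermediate parameters currently rely entirely on experience.

2. Aarhus data, Martin: multivariate time series forecasting.

3. Toeplitz matrices: matrices with constant diagonals, i.e. every entry on a given descending diagonal has the same value.

4. TICC code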
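The Toeplitz structure from item 3 can be illustrated with a short sketch: entry (i, j) depends only on i - j. The helper names `toeplitz` and `is_toeplitz` are defined here for illustration; as the paper's title suggests, TICC applies this kind of constraint (in block form) to each cluster's inverse covariance.

```python
def toeplitz(first_col, first_row):
    """Build a Toeplitz matrix from its first column and first row."""
    if first_col[0] != first_row[0]:
        raise ValueError("first_col[0] and first_row[0] must match")
    n, m = len(first_col), len(first_row)
    # below/on the diagonal use the column, above it use the row
    return [[first_col[i - j] if i >= j else first_row[j - i] for j in range(m)]
            for i in range(n)]

def is_toeplitz(matrix):
    """True if every descending diagonal holds a single value."""
    return all(matrix[i][j] == matrix[i - 1][j - 1]
               for i in range(1, len(matrix))
               for j in range(1, len(matrix[0])))

example = toeplitz([1, 2, 3], [1, 4, 5])
```

In `example`, the main diagonal is all 1s, the diagonal below it all 2s, and the one above it all 4s, so the matrix is fully described by its first row and column.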

