Paper: Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data (TICC)

From: Stanford University; Jure Leskovec (60,000+ citations)

Problem:

subsequence clustering.

Challenging:

Discovering patterns is challenging because it requires simultaneous segmentation and clustering of the time series, and because interpreting the cluster results is difficult.

Why is discovering time series patterns a challenge? Think it through yourself! There are already many distance measures (DTW, manifold distance) and distance-based methods built on them (kNN, k-means, etc.). But I admit the interpretation is difficult.

Introduction:

long time series --breakdown--> a sequence of states/patterns --> the time series can be expressed as a sequential timeline of a few key states --> discover repeated patterns / understand trends / detect anomalies / better interpret large, high-dimensional datasets.

Key steps: simultaneously segment and cluster the time series.
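
One way to picture "segment and cluster at the same time": give every time step a cost for each cluster, then choose the label sequence that minimizes the total cost plus a penalty for every change of cluster between neighboring steps. The sketch below does only that assignment step with dynamic programming. The function name `assign_with_switch_penalty`, the cost table, and the penalty `beta` are illustrative assumptions, not code from the paper; as I understand it, in TICC the per-step cost would be the negative log-likelihood of a window under each cluster's model.

```python
def assign_with_switch_penalty(costs, beta):
    """Pick one cluster label per time step.

    costs[t][k] is the (hypothetical) cost of putting step t in cluster k.
    The result minimizes sum of chosen costs + beta * (number of label changes).
    Assumes beta >= 0.
    """
    num_steps = len(costs)
    num_clusters = len(costs[0])
    best = list(costs[0])   # best[k]: cheapest path so far that ends in cluster k
    back = []               # back[t-1][k]: label at step t-1 on that cheapest path
    for t in range(1, num_steps):
        # Cheapest cluster to switch away from.
        cheapest_prev = min(range(num_clusters), key=lambda k: best[k])
        new_best, pointers = [], []
        for k in range(num_clusters):
            stay = best[k]
            switch = best[cheapest_prev] + beta
            if stay <= switch:
                new_best.append(stay + costs[t][k])
                pointers.append(k)
            else:
                new_best.append(switch + costs[t][k])
                pointers.append(cheapest_prev)
        best = new_best
        back.append(pointers)
    # Walk the back-pointers from the cheapest final state.
    k = min(range(num_clusters), key=lambda j: best[j])
    labels = [k]
    for pointers in reversed(back):
        k = pointers[k]
        labels.append(k)
    labels.reverse()
    return labels


# Made-up costs for 6 time steps and 2 clusters; step 2 is a noisy "blip".
example_costs = [
    [0.1, 1.0],
    [0.2, 1.0],
    [0.6, 0.5],
    [0.1, 1.0],
    [1.0, 0.1],
    [1.0, 0.2],
]
no_penalty = assign_with_switch_penalty(example_costs, beta=0.0)
with_penalty = assign_with_switch_penalty(example_costs, beta=0.5)
print(no_penalty)    # [0, 0, 1, 0, 1, 1]
print(with_penalty)  # [0, 0, 0, 0, 1, 1]
```

With `beta=0` every step is labeled independently, so the blip creates two tiny segments. A positive `beta` absorbs the blip and leaves two contiguous segments, which is the segmentation effect.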

Unsupervised learning: the results are hard to interpret; after clustering, you still have to look at the data itself.

How to discover interpretable structure in the data?

Traditional clustering methods are not particularly well suited to discovering interpretable structure in the data, because they typically rely on distance-based metrics such as DTW.

Distance-based algorithms are at a disadvantage on multivariate time series: they cannot see subtle structural similarities in the data.
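
To make "distance-based" concrete, here is a minimal sketch of classic dynamic time warping (DTW) for two univariate sequences, using the absolute difference as the local cost. The function name `dtw_distance` is my own, and this is the textbook O(n·m) recursion without any windowing or speed-ups, not an implementation taken from the paper.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences.

    d[i][j] holds the cheapest cumulative cost of aligning a[:i] with b[:j].
    """
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local_cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            d[i][j] = local_cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


# Same shape, but the second sequence starts one step late.
print(dtw_distance([0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0]))  # 0.0
# A constant offset cannot be warped away.
print(dtw_distance([1, 1, 1], [2, 2, 2]))                 # 3.0
```

The output is a single number per pair of subsequences. It says how far apart two shapes are, but nothing about which sensors move together, which is the structure the notes above say distance-based methods miss.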

TICC, the new method proposed for multivariate time series clustering:

  • Define each cluster as a dependency network showing the relationships between the different sensors in a short subsequence.
  • Each cluster is a Markov random field (MRF).
  • In these MRFs, an edge represents a partial correlation between two variables.
  • Learn each cluster's MRF by estimating a sparse Gaussian inverse covariance matrix.
  • This network has multiple layers.
  • The number of layers corresponds to the window size of a short subsequence.
  • The inverse covariance matrix defines the adjacency matrix of the MRF dependency network.
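
The link between an inverse covariance matrix and the network can be shown on a tiny example. For a Gaussian with inverse covariance Θ, the partial correlation between variables i and j is -Θ_ij / sqrt(Θ_ii · Θ_jj), so a zero entry means no edge. The sketch below uses a hand-written 3×3 matrix for three hypothetical sensors; the helper names and the matrix are assumptions for illustration, and no estimation (for example graphical lasso) is performed here.

```python
import math


def partial_correlations(theta):
    """Partial correlation matrix implied by an inverse covariance matrix."""
    p = len(theta)
    rho = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i == j:
                rho[i][j] = 1.0  # diagonal set to 1 by convention
            else:
                rho[i][j] = -theta[i][j] / math.sqrt(theta[i][i] * theta[j][j])
    return rho


def adjacency(theta, tol=1e-12):
    """Edges of the MRF: nonzero off-diagonal entries of theta."""
    p = len(theta)
    return [
        [1 if i != j and abs(theta[i][j]) > tol else 0 for j in range(p)]
        for i in range(p)
    ]


# Made-up sparse inverse covariance for sensors 0, 1, 2 (a chain 0 - 1 - 2).
theta = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]
rho = partial_correlations(theta)
adj = adjacency(theta)
print(rho[0][1])  # 0.5
print(adj)        # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```

Sensors 0 and 2 have no edge: given sensor 1, they are conditionally independent in this model, even though their ordinary correlation is generally nonzero. Sparsity in Θ is what keeps the network readable.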

Related work:

time series clustering and convex optimization;

variations of DTW; symbolic representations; rule-based motif discovery;

However, these methods generally rely on distance-based metrics.

TICC: a model-based clustering method, like clustering based on ARMA, Gaussian mixture, or hidden Markov models.

  • Define each cluster by a Gaussian inverse covariance.
  • The Gaussian inverse covariance in turn defines a Markov random field encoding the structural representation.
  • K clusters, hence K inverse covariances.

Selecting the number of clusters: cross-validation; normalized mutual information; BIC or silhouette score.
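
As a reminder of how one of these criteria works, here is a minimal sketch of selecting K by BIC, using the standard definition BIC = k·ln(n) − 2·ln(L), where lower is better. The log-likelihoods, the parameter count per cluster, and the sample size below are made-up numbers purely for illustration; how to count free parameters for a particular model is a separate modeling decision not covered here.

```python
import math


def bic(log_likelihood, num_params, num_samples):
    """Bayesian information criterion; lower values are preferred."""
    return num_params * math.log(num_samples) - 2.0 * log_likelihood


# Hypothetical fitted log-likelihoods for K = 2, 3, 4 clusters.
log_likelihood_by_k = {2: -1200.0, 3: -1100.0, 4: -1090.0}
params_per_cluster = 10   # assumed, for illustration only
num_samples = 500         # assumed, for illustration only

scores = {
    k: bic(ll, k * params_per_cluster, num_samples)
    for k, ll in log_likelihood_by_k.items()
}
best_k = min(scores, key=scores.get)
print(best_k)  # 3
```

Going from K=3 to K=4 improves the likelihood only slightly, so the extra parameters cost more than they gain and BIC prefers K=3.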

I can't really follow this part yet T T

Supplementary knowledge:

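1. For unsupervised learning, interpreting the results and choosing intermediate parameters currently rely entirely on experience.

2. Aarhus data, Martin: multivariate time series forecasting.

3. Toeplitz matrices: diagonal-constant matrices, i.e., every entry depends only on the difference between its row and column index.

4. TICC code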
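
To connect the Toeplitz note with the multilayer network above, here is a small sketch that builds a block Toeplitz matrix from a list of lag blocks. The layout (block row r, block column c holds the lag r−c block below the diagonal and its transpose above) is how I read the structure in the paper; the function names and the numbers are hypothetical. With n sensors and window size w the result is (n·w)×(n·w).

```python
def transpose(matrix):
    """Transpose a list-of-lists matrix."""
    return [list(row) for row in zip(*matrix)]


def block_toeplitz(blocks):
    """Build a block Toeplitz matrix.

    blocks[k] is the n x n block for lag k, k = 0 .. w-1.
    Block (r, c) is blocks[r - c] if r >= c, else transpose(blocks[c - r]).
    """
    w = len(blocks)
    n = len(blocks[0])
    size = n * w
    out = [[0.0] * size for _ in range(size)]
    for r in range(w):
        for c in range(w):
            block = blocks[r - c] if r >= c else transpose(blocks[c - r])
            for i in range(n):
                for j in range(n):
                    out[r * n + i][c * n + j] = block[i][j]
    return out


# Scalar case (n = 1): an ordinary Toeplitz matrix with constant diagonals.
scalar = block_toeplitz([[[1]], [[2]], [[3]]])
print(scalar)  # [[1, 2, 3], [2, 1, 2], [3, 2, 1]]

# Two sensors, window size 3: lag-0 block is symmetric, lag-1 and lag-2 need not be.
lag0 = [[4.0, 1.0], [1.0, 4.0]]
lag1 = [[0.5, 0.2], [0.0, 0.5]]
lag2 = [[0.1, 0.0], [0.0, 0.1]]
big = block_toeplitz([lag0, lag1, lag2])
print(len(big), len(big[0]))  # 6 6
```

Because the same lag block is reused along each block diagonal, the relationship between two sensors one step apart is the same no matter where in the window it occurs. The whole matrix is symmetric as long as the lag-0 block is.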

