From: Stanford University; Jure Leskovec (60,000+ citations).

Problem:

subsequence clustering.

Challenging:

Discovering patterns is challenging because it requires simultaneously segmenting and clustering the time series, and because the resulting clusters are difficult to interpret.

Why is discovering time series patterns a challenge? Think for yourself! There are already many distance measures (DTW, manifold distance) and clustering methods (kNN, k-means, etc.). But I admit that interpretation is difficult.

Introduction:

long time series --(break down)--> a sequence of states/patterns --> the time series can be expressed as a sequential timeline of a few key states --> discover repeated patterns / understand trends / detect anomalies / better interpret large, high-dimensional datasets.

Key steps: simultaneously segment and cluster the time series.
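To make the "sequential timeline of a few key states" concrete, here is a minimal sketch that collapses per-time-step cluster labels into segments. The labels and the helper name `to_segments` are made up for illustration; nothing here comes from the paper's code.

```python
from itertools import groupby


def to_segments(labels):
    """Collapse per-time-step labels into (state, start, end) runs.

    start and end are inclusive indices into the original sequence.
    """
    segments = []
    start = 0
    for state, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((state, start, start + length - 1))
        start += length
    return segments


# Hypothetical cluster assignment for 8 time steps.
labels = ["A", "A", "A", "B", "B", "A", "A", "C"]
timeline = to_segments(labels)
print(timeline)
# [('A', 0, 2), ('B', 3, 4), ('A', 5, 6), ('C', 7, 7)]
```

State "A" shows up twice, which is the kind of repeated pattern the notes talk about.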

Unsupervised learning: hard to interpret; after clustering, you still have to look at the data itself.

How to discover interpretable structure in the data?

Traditional clustering methods are not particularly well suited to discovering interpretable structure in the data, because they typically rely on distance-based metrics such as DTW.

Distance-based algorithms are at a disadvantage on multivariate time series: they cannot see subtle structural similarities in the data.
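For reference, this is roughly what a distance-based metric like DTW computes. It is a textbook dynamic-programming sketch for univariate sequences, with absolute difference as the local cost (my choice, not something the paper prescribes).

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0, same shape, stretched in time
print(dtw_distance([0, 0], [1, 1]))           # 2.0
```

The result is a single number per pair of sequences. It says how far apart two shapes are, but nothing about how the sensors relate to each other, which is the structure the notes say distance-based methods miss.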

The paper proposes a new method for multivariate time series clustering, TICC (Toeplitz Inverse Covariance-Based Clustering):

  • Each cluster is defined as a dependency network showing the relationships between the different sensors in a short subsequence.
  • Each cluster is a Markov random field (MRF).
  • In these MRFs, an edge represents a partial correlation between two variables.
  • Each cluster's MRF is learned by estimating a sparse Gaussian inverse covariance matrix.
  • This network has multiple layers.
  • The number of layers corresponds to the window size of a short subsequence.
  • The inverse covariance matrix defines the adjacency matrix of the MRF dependency network.
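A small sketch of two ideas from the list above, under my own assumptions: (1) stacking w consecutive observations into one vector, so that each time step in the window becomes one "layer", and (2) reading edges and partial correlations off an inverse covariance matrix. The matrix `theta` is hand-written for illustration, not estimated from data. The function names are hypothetical, and the exact window alignment used in the paper may differ.

```python
import math


def stack_windows(series, w):
    """Concatenate every run of w consecutive observations into one vector.

    series: list of T observations, each a list of n sensor readings.
    Returns T - w + 1 vectors of length n * w.
    """
    return [
        [value for obs in series[t:t + w] for value in obs]
        for t in range(len(series) - w + 1)
    ]


def partial_correlation(theta, i, j):
    """Partial correlation of variables i and j given all the others."""
    return -theta[i][j] / math.sqrt(theta[i][i] * theta[j][j])


def mrf_edges(theta, tol=1e-8):
    """An edge (i, j) exists where the off-diagonal entry is nonzero."""
    n = len(theta)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if abs(theta[i][j]) > tol
    ]


# Two sensors, four time steps, window size 2.
series = [[1, 10], [2, 20], [3, 30], [4, 40]]
windows = stack_windows(series, w=2)
print(windows[0])  # [1, 10, 2, 20]

# Hand-written sparse inverse covariance over three variables.
theta = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
]
print(mrf_edges(theta))                  # [(0, 1)]
print(partial_correlation(theta, 0, 1))  # 0.5
```

The zero entries are what make the network sparse: variable 2 has no edge to anything, so it is conditionally independent of the other two.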

Related work:

Time series clustering and convex optimization.

Variations of DTW; symbolic representations; rule-based motif discovery.

However, these methods generally rely on distance-based metrics.

TICC is a model-based clustering method, like clustering based on ARMA, Gaussian mixture, or hidden Markov models.

  • Each cluster is defined by a Gaussian inverse covariance.
  • The Gaussian inverse covariance in turn defines a Markov random field encoding the cluster's structural representation.
  • K clusters means K inverse covariances.
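A rough sketch of what "each cluster is a Gaussian inverse covariance" means for assignment: score a point under each cluster's Gaussian and pick the best. This is my simplification. It ignores the segmentation side (keeping neighboring time points in the same cluster), and the two inverse covariances are invented for the example.

```python
import math


def logdet_spd(theta):
    """log det of a symmetric positive-definite matrix via Cholesky."""
    n = len(theta)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = theta[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))


def log_likelihood(x, mean, theta):
    """Gaussian log-density parameterized by the inverse covariance theta."""
    n = len(x)
    d = [xi - mi for xi, mi in zip(x, mean)]
    quad = sum(d[i] * theta[i][j] * d[j] for i in range(n) for j in range(n))
    return -0.5 * quad + 0.5 * logdet_spd(theta) - 0.5 * n * math.log(2 * math.pi)


def assign(x, clusters):
    """Index of the cluster (mean, theta) with the highest log-likelihood."""
    return max(range(len(clusters)), key=lambda k: log_likelihood(x, *clusters[k]))


# Two made-up clusters with the same mean but opposite dependency structure.
clusters = [
    ([0.0, 0.0], [[2.0, -1.5], [-1.5, 2.0]]),  # sensors move together
    ([0.0, 0.0], [[2.0, 1.5], [1.5, 2.0]]),    # sensors move in opposition
]
print(assign([1.0, 1.0], clusters))   # 0
print(assign([1.0, -1.0], clusters))  # 1
```

Both test points are equally far from the shared mean, so the assignment is driven purely by the dependency structure, not by distance.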

Selecting the number of clusters: cross-validation, normalized mutual information, BIC, or silhouette score.

I can't follow this part yet T T
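Of the options above, BIC is the easiest to write down: BIC = k ln(n) - 2 ln(L), where k is the number of free parameters, n the number of samples, and L the maximized likelihood; lower is better. A minimal sketch follows, where the log-likelihoods and parameter counts are placeholder numbers I made up, not results from the paper.

```python
import math


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion; lower is better."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def select_k(candidates, n_samples):
    """candidates maps K -> (log_likelihood, n_params); returns the best K."""
    return min(candidates, key=lambda k: bic(*candidates[k], n_samples))


# Placeholder fits for K = 2, 3, 4 on 1000 samples (illustrative only).
candidates = {
    2: (-1500.0, 20),
    3: (-1400.0, 30),
    4: (-1395.0, 40),
}
best_k = select_k(candidates, n_samples=1000)
print(best_k)  # 3
```

Going from K = 3 to K = 4 barely improves the likelihood here, so the penalty on the extra parameters wins.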

Supplementary knowledge:

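1. For unsupervised learning, interpreting the results and choosing intermediate parameters currently rely entirely on experience.

2. Aarhus data, Martin: multivariate time series forecasting.

3. Toeplitz matrices: matrices whose diagonals are constant.

4. TICC code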
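For item 3, a tiny sketch of what "constant diagonals" means, using scalar entries. The helper names are mine, and as far as I understand the entries in TICC's case would be blocks rather than scalars.

```python
def toeplitz(first_col, first_row=None):
    """Build a Toeplitz matrix from its first column and first row.

    If first_row is omitted, the matrix is symmetric.
    """
    if first_row is None:
        first_row = first_col
    n, m = len(first_col), len(first_row)
    return [
        [first_col[i - j] if i >= j else first_row[j - i] for j in range(m)]
        for i in range(n)
    ]


def is_toeplitz(mat):
    """True if every entry equals its upper-left diagonal neighbor."""
    return all(
        mat[i][j] == mat[i - 1][j - 1]
        for i in range(1, len(mat))
        for j in range(1, len(mat[0]))
    )


t = toeplitz([1, 2, 3])
for row in t:
    print(row)
# [1, 2, 3]
# [2, 1, 2]
# [3, 2, 1]
print(is_toeplitz(t))                 # True
print(is_toeplitz([[1, 2], [3, 4]]))  # False
```

An entry depends only on i - j, that is, on how far apart the row and column are. For a stacked window that corresponds to a relationship depending on the time lag, not on the absolute position in the window.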

Reference:

1. How to explain the conditional random field (CRF) model with a simple, easy-to-understand example?
