PP: Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data

keeps_you_warm 2024-08-29 02:22:34 原文

From: Stanford University; Jure Leskovec, citation 6w+;

Problem:

subsequence clustering.

Challenging:

discover patterns is challenging because it requires simultaneous segmentation and clustering of the time series + interpreting the cluster results is difficult.

Why discover time series patterns is a challenge?? thinking by yourself!! there are already so many distance measures(DTW, manifold distance) and clustering methods(knn,k-means etc.). But I admit the interpretation is difficult.

Introduction:

long time series ----breakdown-----> a sequence of states/patterns ------> so time series can be expressed as a sequential timeline of a few key states. -------> discover repeated patterns/ understand trends/ detect anomalies/ better interpret large and high-dimensional datasets.

Key steps: simultaneously segment and cluster the time series.

Unsupervised learning: hard to interpretation, after clustering, you have to view data itself.

how to discover interpretable structure in the data?

Traditional clustering methods are not particularly well-suited to discover interpretable structure in the data. This is because they typically rely on distance-based metrics

distance-based metrics, DTW.

距离式的算法，在处理multivariate time series上有劣势，看不到细微的数据结构相似性。

Propose a new method for multivariate time series clustering TICC:

define each cluster as a dependency network showing the relationships between the different sensors in a short subsequence.
each cluster is a markov random field.
In thes MRFs, an edge represents a partial correlation between two variables.
learn each cluster's MRF by estimating a sparse Gaussian inverse covariance matrix.
This network has multiple layers.
the number of layers corresponds to the window size of a short subsequence.
逆协方差矩阵定义了MRF dependency network 的adjaccency matrix.

Related work:

time series clustering and convex optimization;

variations of dtw; symbolic representations; rule-based motif discovery;

However, these methods generally rely on distance-based metrics.

TICC ------ a model-based clustering method, like ARMA, Gaussian mixture or hidden markov models.

define each cluster by a Gaussian inverse covariance.
so the Gaussian inverse covariance defines a Markov random field encoding the structural representation.
K clusters/ inverse covariances.

selecting the number of clusters: cross-validation; mornalized mutual information; BIC or silhouette score.

看不懂哇 T T

Supplementary knowledge:

1. 对于unsupervised learning, 目前对结果的解释或者中间参数的选取，全是靠经验。

2. Aarhus data, Martin, 做多变量time series 预测。

3. Toeplitz Matrices: 常对角矩阵。

Reference:

1. 如何用简单易懂的例子解释条件随机场（CRF）模型？

PP: Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data的更多相关文章

PP: Tripoles: A new class of relationships in time series data
Problem: ?? mining relationships in time series data; A new class of relationships in time series da ...
图Lasso求逆协方差矩阵(Graphical Lasso for inverse covariance matrix)
图Lasso求逆协方差矩阵(Graphical Lasso for inverse covariance matrix) 作者:凯鲁嘎吉 - 博客园 http://www.cnblogs.com/ka ...
PP: Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network
PROBLEM: OmniAnomaly multivariate time series anomaly detection + unsupervised 主体思想: input: multivar ...
PP: Deep r -th Root of Rank Supervised Joint Binary Embedding for Multivariate Time Series Retrieval
from: Dacheng Tao 悉尼大学 PROBLEM: time series retrieval: given the current multivariate time series se ...
PP: Unsupervised deep embedding for clustering analysis
Problem: unsupervised clustering represent data in feature space; learn a non-linear mapping from da ...
[转]Multivariate Time Series Forecasting with LSTMs in Keras
1. Air Pollution Forecasting In this tutorial, we are going to use the Air Quality dataset. This is ...
PP: A dual-stage attention-based recurrent neural network for time series prediction
Problem: time series prediction The nonlinear autoregressive exogenous model: The Nonlinear autoregr ...
PP: Deep clustering based on a mixture of autoencoders
Problem: clustering A clustering network transforms the data into another space and then selects one ...
PP: Time series clustering via community detection in Networks
Improvement can be done in fulture:1. the algorithm of constructing network from distance matrix. 2. ...

随机推荐

Sql Server Proc 先看看简单吧
CREATE PRoc [名字] { @参数数据类型, @参数数据类型 OUTPUT[输入] } AS begin select INSERT UPDATE (SQL) end --基本语句快 - ...
在本地搭建git服务器
GitHub就是一个免费托管开源代码的远程仓库.但是对于某些视源代码如生命的商业公司来说,既不想公开源代码,又舍不得给GitHub交保护费,那就只能自己搭建一台Git服务器作为私有仓库使用. 搭建Gi ...
POST注入之sqlmap
POST注入方法一加—form跑数据库sqlmap.py -u http://59.63.200.79:8815/Pass-05/index.php —form —dbs跑出数据库后查询表名假设库名 ...
小白的linux笔记3：对外联通——开通ssh和ftp和smb共享
1.SSH的开通.https://www.cnblogs.com/DiDiao-Liang/articles/8283686.html 安装:yum install sshd或yum install ...
在Oracle中使用sqlload做数据迁移
前提:检查sqlload是否可用,输入sqlldr,提示有版本即可 1.创建测试表(已有则跳过)create table testTable(user varchar2(255),name var ...
使用JAVA导出EXCEL表格（POI）
一.POI概述 Jakarta POI 是一套用于访问微软格式文档的Java API.POI提供API给Java程序对Microsoft Office格式档案读和写的功能.在许多企业办公系统中,经常会 ...
java 相关书籍介绍
自己做开发也有两年多了吧,其中也关注过许多大牛的博客,买过许多的书看. 自己也是个比较爱阅读的人,从小的时候被老爸逼着每次寒暑假看书,到后来慢慢长大爱上了阅读,习惯了看书. 农村的小孩吗,那时候又不像 ...
JAVA输出流与输入流
输出流编程入门的第一个程序,输出一串字符串 public class C { public static void main(String[] args) { System.out.println( ...
Pikachu-URL重定向
不安全的url跳转不安全的url跳转问题可能发生在一切执行了url地址跳转的地方.如果后端采用了前端传进来的(可能是用户传参,或者之前预埋在前端页面的url地址)参数作为了跳转的目的地,而又没有做判 ...
python下timer定时器常用的两种实现方法
方法一,使用线程中现成的: 这种一般比较常用,特别是在线程中的使用方法,下面是一个例子能够很清楚的说明它的具体使用方法: #! /usr/bin/python3 #! -*- conding: u ...