PP: Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data

keeps_you_warm 2024-08-29 02:22:34 原文

From: Stanford University; Jure Leskovec, citation 6w+;

Problem:

subsequence clustering.

Challenging:

discover patterns is challenging because it requires simultaneous segmentation and clustering of the time series + interpreting the cluster results is difficult.

Why discover time series patterns is a challenge?? thinking by yourself!! there are already so many distance measures(DTW, manifold distance) and clustering methods(knn,k-means etc.). But I admit the interpretation is difficult.

Introduction:

long time series ----breakdown-----> a sequence of states/patterns ------> so time series can be expressed as a sequential timeline of a few key states. -------> discover repeated patterns/ understand trends/ detect anomalies/ better interpret large and high-dimensional datasets.

Key steps: simultaneously segment and cluster the time series.

Unsupervised learning: hard to interpretation, after clustering, you have to view data itself.

how to discover interpretable structure in the data?

Traditional clustering methods are not particularly well-suited to discover interpretable structure in the data. This is because they typically rely on distance-based metrics

distance-based metrics, DTW.

距离式的算法，在处理multivariate time series上有劣势，看不到细微的数据结构相似性。

Propose a new method for multivariate time series clustering TICC:

define each cluster as a dependency network showing the relationships between the different sensors in a short subsequence.
each cluster is a markov random field.
In thes MRFs, an edge represents a partial correlation between two variables.
learn each cluster's MRF by estimating a sparse Gaussian inverse covariance matrix.
This network has multiple layers.
the number of layers corresponds to the window size of a short subsequence.
逆协方差矩阵定义了MRF dependency network 的adjaccency matrix.

Related work:

time series clustering and convex optimization;

variations of dtw; symbolic representations; rule-based motif discovery;

However, these methods generally rely on distance-based metrics.

TICC ------ a model-based clustering method, like ARMA, Gaussian mixture or hidden markov models.

define each cluster by a Gaussian inverse covariance.
so the Gaussian inverse covariance defines a Markov random field encoding the structural representation.
K clusters/ inverse covariances.

selecting the number of clusters: cross-validation; mornalized mutual information; BIC or silhouette score.

看不懂哇 T T

Supplementary knowledge:

1. 对于unsupervised learning, 目前对结果的解释或者中间参数的选取，全是靠经验。

2. Aarhus data, Martin, 做多变量time series 预测。

3. Toeplitz Matrices: 常对角矩阵。

Reference:

1. 如何用简单易懂的例子解释条件随机场（CRF）模型？

PP: Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data的更多相关文章

PP: Tripoles: A new class of relationships in time series data
Problem: ?? mining relationships in time series data; A new class of relationships in time series da ...
图Lasso求逆协方差矩阵(Graphical Lasso for inverse covariance matrix)
图Lasso求逆协方差矩阵(Graphical Lasso for inverse covariance matrix) 作者:凯鲁嘎吉 - 博客园 http://www.cnblogs.com/ka ...
PP: Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network
PROBLEM: OmniAnomaly multivariate time series anomaly detection + unsupervised 主体思想: input: multivar ...
PP: Deep r -th Root of Rank Supervised Joint Binary Embedding for Multivariate Time Series Retrieval
from: Dacheng Tao 悉尼大学 PROBLEM: time series retrieval: given the current multivariate time series se ...
PP: Unsupervised deep embedding for clustering analysis
Problem: unsupervised clustering represent data in feature space; learn a non-linear mapping from da ...
[转]Multivariate Time Series Forecasting with LSTMs in Keras
1. Air Pollution Forecasting In this tutorial, we are going to use the Air Quality dataset. This is ...
PP: A dual-stage attention-based recurrent neural network for time series prediction
Problem: time series prediction The nonlinear autoregressive exogenous model: The Nonlinear autoregr ...
PP: Deep clustering based on a mixture of autoencoders
Problem: clustering A clustering network transforms the data into another space and then selects one ...
PP: Time series clustering via community detection in Networks
Improvement can be done in fulture:1. the algorithm of constructing network from distance matrix. 2. ...

随机推荐

配置 Apache James 邮件服务器以使用加密邮件通讯协议
可先参照: 使用 Apache James 3.3.0(开源免费) 搭建内网电子邮件服务器(基于 Windows + Amazon Corretto 8)https://www.cnblogs.com ...
mysql 主主备份
1.1.主主备份原理. 主主备份实际上是互为主从,主要是为了去缓解写入压力. 1.2.环境准备两台机器ip分别为 100.100.100.105 (主1) 100.100.100.106(主2) 安 ...
今天带来compass的使用方式
一.为什么我们要使用compass呢 Experience cleaner markup without presentational classes. It’s chock full of the ...
uniapp后台api设计(微信user表)
MySQL 创建数据库: CREATE DATABASE [IF NOT EXISTS] <数据库名> [[DEFAULT] CHARACTER SET <字符集名>] [[ ...
【54】目标检测之Bounding Box预测
Bounding Box预测(Bounding box predictions) 在上一篇笔记中,你们学到了滑动窗口法的卷积实现,这个算法效率更高,但仍然存在问题,不能输出最精准的边界框.在这个笔记中 ...
GHM论文笔记（CVPR2019）
目录作者要解决的问题 Focal loss(CVPR2017) Focal loss的解决方案 Focal loss的不足设计思路梯度与样本的关系梯度分布计算方法:将0-1的梯度切bin,计算 ...
Spring Boot源码（五）：BeanFactoryPostProcessor和BeanPostProcessor
BeanFactoryPostProcessor是spring BeanFactory加载Bean后调用, BeanPostProcessor是Bean初始化前后调用. BeanFactoryPost ...
VSCode常用插件之ESLint使用
更多VSCode插件使用请访问:VSCode常用插件汇总 ESLint这是VS Code ESLint扩展,将ESLint JavaScript集成到VS Code中. 首先简单说一下使用流程: 1. ...
安装Kubernetes到CentOS（Minikube）
运行环境系统版本:CentOS Linux release 7.6.1810 (Core) 软件版本:Docker-ce-18.06.0.Kubectl-1.15.0.Kubernetes-v1.1 ...
Jquery实现挂号平台首页源码
带二级导航.轮播海报.二级联动.搜索功能.Tab选项卡按照国际惯例先放图 index.html <!DOCTYPE html> <html lang="zh-cn&quo ...