PP: Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data
From: Stanford University; Jure Leskovec, citation 6w+;
Problem:
subsequence clustering.
Challenging:
discover patterns is challenging because it requires simultaneous segmentation and clustering of the time series + interpreting the cluster results is difficult.
Why discover time series patterns is a challenge?? thinking by yourself!! there are already so many distance measures(DTW, manifold distance) and clustering methods(knn,k-means etc.). But I admit the interpretation is difficult.
Introduction:
long time series ----breakdown-----> a sequence of states/patterns ------> so time series can be expressed as a sequential timeline of a few key states. -------> discover repeated patterns/ understand trends/ detect anomalies/ better interpret large and high-dimensional datasets.
Key steps: simultaneously segment and cluster the time series.
Unsupervised learning: hard to interpretation, after clustering, you have to view data itself.
how to discover interpretable structure in the data?
Traditional clustering methods are not particularly well-suited to discover interpretable structure in the data. This is because they typically rely on distance-based metrics
distance-based metrics, DTW.
距离式的算法,在处理multivariate time series上有劣势,看不到细微的数据结构相似性。
Propose a new method for multivariate time series clustering TICC:
- define each cluster as a dependency network showing the relationships between the different sensors in a short subsequence.
- each cluster is a markov random field.
- In thes MRFs, an edge represents a partial correlation between two variables.
- learn each cluster's MRF by estimating a sparse Gaussian inverse covariance matrix.
- This network has multiple layers.
- the number of layers corresponds to the window size of a short subsequence.
- 逆协方差矩阵定义了MRF dependency network 的adjaccency matrix.
Related work:
time series clustering and convex optimization;
variations of dtw; symbolic representations; rule-based motif discovery;
However, these methods generally rely on distance-based metrics.
TICC ------ a model-based clustering method, like ARMA, Gaussian mixture or hidden markov models.
- define each cluster by a Gaussian inverse covariance.
- so the Gaussian inverse covariance defines a Markov random field encoding the structural representation.
- K clusters/ inverse covariances.
selecting the number of clusters: cross-validation; mornalized mutual information; BIC or silhouette score.
看不懂哇 T T
Supplementary knowledge:
1. 对于unsupervised learning, 目前对结果的解释或者中间参数的选取,全是靠经验。
2. Aarhus data, Martin, 做多变量time series 预测。
3. Toeplitz Matrices: 常对角矩阵。
4. ticc code
Reference:
PP: Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data的更多相关文章
- PP: Tripoles: A new class of relationships in time series data
Problem: ?? mining relationships in time series data; A new class of relationships in time series da ...
- 图Lasso求逆协方差矩阵(Graphical Lasso for inverse covariance matrix)
图Lasso求逆协方差矩阵(Graphical Lasso for inverse covariance matrix) 作者:凯鲁嘎吉 - 博客园 http://www.cnblogs.com/ka ...
- PP: Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network
PROBLEM: OmniAnomaly multivariate time series anomaly detection + unsupervised 主体思想: input: multivar ...
- PP: Deep r -th Root of Rank Supervised Joint Binary Embedding for Multivariate Time Series Retrieval
from: Dacheng Tao 悉尼大学 PROBLEM: time series retrieval: given the current multivariate time series se ...
- PP: Unsupervised deep embedding for clustering analysis
Problem: unsupervised clustering represent data in feature space; learn a non-linear mapping from da ...
- [转]Multivariate Time Series Forecasting with LSTMs in Keras
1. Air Pollution Forecasting In this tutorial, we are going to use the Air Quality dataset. This is ...
- PP: A dual-stage attention-based recurrent neural network for time series prediction
Problem: time series prediction The nonlinear autoregressive exogenous model: The Nonlinear autoregr ...
- PP: Deep clustering based on a mixture of autoencoders
Problem: clustering A clustering network transforms the data into another space and then selects one ...
- PP: Time series clustering via community detection in Networks
Improvement can be done in fulture:1. the algorithm of constructing network from distance matrix. 2. ...
随机推荐
- Git的基本使用 -- 文件的添加、撤销、对比、删除
显示当前工作区.暂存区.仓库的状态 git status 当工作区的所有文件都提交到仓库,并和仓库保持一致时 有修改的文件时,会显示有改动的文件,并提示如何提交这些修改 添加到暂存区,还未提交到仓库时 ...
- CentOS安装图解及配置
CentOS-7-x86_64-Minimal安装图解 界面说明: Install CentOS 7 安装CentOS 7 Test this media & install CentOS ...
- FIB表与RIB表的区别与联系
RIB (route information base) 和 FIB (forwarding information base),又称Ip路由表 和 CEF表,它们之间的关系可以用下面这张图片来高度概 ...
- 原创: idea 的maven 方式配置 开发javaWeb(idea+tomcat +maven
通过idea 编辑器来配置基于maven创建站点,在tomcat上配置 在采用idea 配置之前,首先要确保maven和 tomcat 正确运行. maven 配置链接 tomcat 配置链接 实际你 ...
- XSStrike工具的安装使用
0x01简介 XSStrike 是一款用于探测并利用XSS漏洞的脚本 XSStrike目前所提供的产品特性: 对参数进行模糊测试之后构建合适的payload 使用payload对参数进行穷举匹配 内置 ...
- 面试再问ThreadLocal,别说你不会!
ThreadLocal是什么 以前面试的时候问到ThreadLocal总是一脸懵逼,只知道有这个哥们,不了解他是用来做什么的,更不清楚他的原理了.表面上看他是和多线程,线程同步有关的一个工具类,但其 ...
- Hadoop学习之路(8)Yarn资源调度系统详解
文章目录 1.Yarn介绍 2.Yarn架构 2.1 .ResourceManager 2.2 .ApplicationMaster 2.3 .NodeManager 2.4 .Container 2 ...
- maven的核心概念——仓库
第十章仓库 10.1 分类 [1]本地仓库:为当前本机电脑上的所有Maven工程服务. [2]远程仓库 (1)私服:架设在当前局域网环境下,为当前局域网范围内的所有Maven工程服务. (2)中央仓库 ...
- MVC5+EF6入门完整教程7:排序过滤分页
https://www.cnblogs.com/miro/p/4134241.html 前置准备 – 应用之前样式,增加测试数据 界面样式修改前: 下面对Views --> Account -- ...
- H5解决active伪类失效---点击后背景效果
<body ontouchstart></body> 给body注册一个空事件即可