Improvements that can be made in the future:
1. the algorithm for constructing the network from the distance matrix.
2. evolution of a sliding time window.
3. the later processing or visual analysis of the generated graphs.

Thinking:

1. What's the ground truth in load profiles?

For clustering there is no ground truth, so how should the parameters or options in step 2, step 3 and step 4 be tuned? In this paper the time series are labelled, so the authors use the Rand index (RI) to guide their selection of parameters, for example k and ε.

Assumption: similar time series tend to connect to each other and form communities.

Background and related works

Shape-based distance measures; feature-based distance measures; structure-based distance measures; time series clustering; community detection in networks.

Methodology

  1. data normalization
  2. time series distance calculation
  3. network construction
  4. community detection

Which steps influence the clustering results:

distance calculation algorithm; network construction method; community detection method.

2. distance matrix

Calculate the distance for each pair of time series in the data set and construct a distance matrix D, where d_ij is the distance between series X_i and X_j. The choice of distance measure strongly influences the network construction and the clustering result.
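As a minimal sketch of this step, the matrix D can be filled pair by pair. The Euclidean distance is used here only as a placeholder, and the names `euclidean`, `distance_matrix` and the toy data are illustrative assumptions, not taken from the paper.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def distance_matrix(series, dist=euclidean):
    """Return the symmetric n x n matrix D with D[i][j] = dist(X_i, X_j)."""
    n = len(series)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # The measure is assumed symmetric, so each pair is computed once.
            D[i][j] = D[j][i] = dist(series[i], series[j])
    return D

# Hypothetical toy data set with three short series.
toy = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [3.0, 4.0, 0.0]]
D = distance_matrix(toy)
```

Any other measure can be swapped in through the `dist` argument, as long as it takes two series and returns a number.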

3. network construction

Two common methods: k-NN and ε-NN. To be explored further.
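The two constructions can be sketched as follows. This is an illustration under assumptions: the k-NN graph is made undirected by keeping an edge if either endpoint selects the other, and the ε-NN graph links pairs with distance less than or equal to ε. The paper may define the details differently, and the function names and toy matrix are hypothetical.

```python
def knn_graph(D, k):
    """Undirected k-NN graph: link each vertex to its k nearest neighbours."""
    n = len(D)
    edges = set()
    for i in range(n):
        # Sort the other vertices by distance to i (ties broken by index).
        neighbours = sorted((j for j in range(n) if j != i),
                            key=lambda j: (D[i][j], j))
        for j in neighbours[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

def eps_graph(D, eps):
    """Undirected eps-NN graph: link every pair whose distance is <= eps."""
    n = len(D)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if D[i][j] <= eps}

# Hypothetical 4-vertex distance matrix with two tight pairs: (0, 1) and (2, 3).
D = [
    [0.0, 1.0, 8.0, 9.0],
    [1.0, 0.0, 7.0, 8.0],
    [8.0, 7.0, 0.0, 2.0],
    [9.0, 8.0, 2.0, 0.0],
]
knn_edges = knn_graph(D, k=1)
eps_edges = eps_graph(D, eps=2.0)
```

On this toy matrix both constructions keep only the two short edges, so the two tight pairs would show up as separate components.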

Experiments

45 time series data sets.

Purpose: check the performance of each combination of step 2, step 3 and step 4 on each data set.

Evaluation index: Rand index.
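For reference, the Rand index is the fraction of pairs of items on which two labelings agree, counting a pair as an agreement when both labelings put it in the same group or both put it in different groups. A minimal sketch (the function name and the example labels are illustrative, and at least two items are assumed):

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """Fraction of item pairs on which the two labelings agree."""
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        # A pair agrees if it is "together" in both or "apart" in both.
        agree += same_true == same_pred
        total += 1
    return agree / total

# Cluster ids do not need to match, only the grouping matters.
perfect = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
partial = rand_index([0, 0, 1, 1], [0, 0, 0, 1])
```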

Vary the parameters: the k of k-NN from 1 to n-1;  the epsilon of epsilon-NN from min(D) to max(D) in 100 steps.
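A sketch of how these two parameter grids could be generated. Two assumptions are made here: min(D) is taken over the off-diagonal entries (the diagonal is always 0), and "100 steps" is read as 100 evenly spaced values including both endpoints. The function name and toy matrix are hypothetical.

```python
def parameter_grids(D, steps=100):
    """Return candidate k values (1 .. n-1) and eps values (min(D) .. max(D))."""
    n = len(D)
    off_diag = [D[i][j] for i in range(n) for j in range(i + 1, n)]
    lo, hi = min(off_diag), max(off_diag)
    ks = list(range(1, n))
    eps_values = [lo + (hi - lo) * s / (steps - 1) for s in range(steps)]
    return ks, eps_values

# Hypothetical 4 x 4 distance matrix.
D = [
    [0.0, 1.0, 8.0, 9.0],
    [1.0, 0.0, 7.0, 8.0],
    [8.0, 7.0, 0.0, 2.0],
    [9.0, 8.0, 2.0, 0.0],
]
ks, eps_values = parameter_grids(D)
```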

Step 2 (distance measure): Manhattan, Euclidean, infinity norm, DTW, short time series, DISSIM, complexity-invariant, wavelet transform, Pearson correlation, integrated periodogram.

Step 3 (network construction): vary the parameters k and ε.

Step 4 (community detection): fast greedy; multilevel; walktrap; infomap; label propagation.

Results

1. the effect of k and ε on the clustering results (RI).

The k-NN construction method only allows discrete values of k, while the ε-NN method accepts continuous values. When k and ε are small, vertices tend to make only a few connections.

?? What is the meaning of A, B, C, D in Figure 5?

2. the statistical test of the effect of different distance methods. Friedman test and Nemenyi test.

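Comparing multiple algorithms across multiple data sets:

  • If the samples satisfy the assumptions of repeated-measures ANOVA (e.g. normality, equal variances), prefer ANOVA.
  • If the samples do not satisfy the ANOVA assumptions, use the Friedman test with the Nemenyi test as the post-hoc test.
  • If the sample sizes differ, or Friedman-Nemenyi cannot be used for some specific reason, try Kruskal-Wallis with Dunn's test. Note that this approach is meant for independent measurements, so it has to be judged case by case.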
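To make the Friedman-Nemenyi procedure concrete, here is a minimal sketch using the usual textbook formulas: algorithms are ranked on each data set (rank 1 is best, ties share the mean rank), the Friedman statistic is 12N / (k(k+1)) * (sum of squared average ranks - k(k+1)^2 / 4), and the Nemenyi critical difference is q_alpha * sqrt(k(k+1) / (6N)). The scores below are made up for illustration, no tie correction is applied, and q_alpha must be looked up in a table by the caller.

```python
import math

def average_ranks(scores):
    """scores[i][j] = score of algorithm j on data set i (higher is better)."""
    n_sets, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1  # mean of ranks pos+1 .. end+1
            for idx in order[pos:end + 1]:
                rank_sums[idx] += shared
            pos = end + 1
    return [s / n_sets for s in rank_sums]

def friedman_statistic(scores):
    """Friedman chi-square statistic computed from the average ranks."""
    n_sets, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    return 12 * n_sets / (k * (k + 1)) * (
        sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4
    )

def nemenyi_critical_difference(q_alpha, k, n_sets):
    """Two algorithms differ if their average ranks differ by more than this."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n_sets))

# Hypothetical RI scores: 4 data sets (rows) x 3 algorithms (columns).
scores = [
    [0.90, 0.80, 0.70],
    [0.85, 0.80, 0.60],
    [0.95, 0.70, 0.75],
    [0.90, 0.85, 0.80],
]
ranks = average_ranks(scores)
stat = friedman_statistic(scores)
```

The statistic is then compared against a chi-square distribution with k - 1 degrees of freedom; only if the null hypothesis is rejected does the pairwise Nemenyi comparison make sense.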

The DTW measure gives the best results for both network construction methods.
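For readers unfamiliar with DTW, the following is a minimal sketch of the classic dynamic-programming formulation, assuming the absolute difference as local cost and no warping-window constraint. The paper's exact DTW variant is not specified in these notes, so this is illustrative only.

```python
def dtw_distance(x, y):
    """Dynamic time warping distance with |a - b| as the local cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat y[j-1]
                                     cost[i][j - 1],      # repeat x[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]

# The same bump, shifted by one time step (hypothetical data).
a = [0, 0, 1, 2, 1, 0]
b = [0, 1, 2, 1, 0, 0]
shifted = dtw_distance(a, b)
pointwise = sum(abs(p - q) for p, q in zip(a, b))
```

Because the warping path can absorb the shift, the DTW distance between the two series is zero while the pointwise (Manhattan) distance is not, which is one intuition for why DTW suits time-shifted patterns.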

3. the statistical test of the effect of community detection algorithms. Friedman test and Nemenyi test.

4. comparison to rival methods.

i. some classic clustering algorithms: k-medoids, complete-linkage, single-linkage, average-linkage, median-linkage, centroid-linkage and diana;

ii. three up-to-date ones: Zhang’s method [41], Maharaj’s method [24] and PDC [5]

5. detect time series clusters with time-shifts

Assumption: clustering algorithms should be capable of detecting groups of time series that have similar variations in time.

CBF data set: 30 series in three groups, all clustered correctly.

6. detect shape patterns

1000 time series of length 128, four groups.

detect shape patterns (UD, DD, DU, UU);

Discussion

1. the same idea can be extended to multivariate time series clustering.

2. evaluate the simulation results using different indexes.

3. As future work, we plan to propose automatic strategies for choosing the best number of neighbors (k and ε) and for speeding up the network construction, instead of using the naive method.

4. We also plan to apply the idea to solve other kinds of problems in time series analysis, such as time series prediction.   ??

Supplementary knowledge: 

1. box plot

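It shows the maximum, minimum and median of a data set, together with the upper and lower quartiles.

Below is a concrete example of a box plot: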

                            +-----+-+
  *           o     |-------|   + | |---|
                            +-----+-+

+---+---+---+---+---+---+---+---+---+---+   score
0   1   2   3   4   5   6   7   8   9  10

This data set shows:

  • minimum = 5
  • lower quartile (Q1) = 7
  • median (Med, i.e. Q2) = 8.5
  • upper quartile (Q3) = 9
  • maximum = 10
  • mean = 8
  • interquartile range = Q3 - Q1 = 2 (i.e. ΔQ)
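The summary values of a box plot can be computed as in the sketch below. The sample is hypothetical: it was chosen so that its summary matches the values listed above and is not the data behind the original figure. Quartiles are computed with the median-of-halves convention; other conventions can give slightly different Q1 and Q3.

```python
from statistics import mean, median

def five_number_summary(values):
    """Minimum, Q1, median, Q3 and maximum using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For odd n the middle value belongs to neither half.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return {
        "min": data[0],
        "q1": median(lower),
        "median": median(data),
        "q3": median(upper),
        "max": data[-1],
    }

# Hypothetical sample, not the original data of the figure.
sample = [5, 7, 7, 8, 9, 9, 9, 10]
summary = five_number_summary(sample)
iqr = summary["q3"] - summary["q1"]
avg = mean(sample)
```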

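2. Change of mindset: the experiment section is also important, not optional, and should be read carefully.

3. Statistical tests

How are machine learning algorithms commonly compared?

"All models are wrong, but some are useful." (George Box, statistician)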

4. univariate and multivariate time series. 

Univariate time series: Only one variable is varying over time. For example, data collected from a sensor measuring the temperature of a room every second. Therefore, each second, you will only have a one-dimensional value, which is the temperature.

Multivariate time series: Multiple variables are varying over time. For example, a tri-axial accelerometer. There are three accelerations, one for each axis (x, y, z), and they vary simultaneously over time.

Considering the data you showed in the question, you are dealing with a multivariate time series, where value_1, value_2 and value_3 are three variables changing simultaneously over time.
