Unsupervised Classification - Sprawl Classification Algorithm

Idea

Points (data) in the same cluster are near each other, or are connected to each other through other points of the cluster.

So:

  • For a distance d, every point in a cluster can always find at least one other point in the same cluster that is closer than d.
  • Distances between points in different clusters are greater than d.

    These conditions may not hold in every case. For example, when clusters share common points, they do not hold.

    So the method needs some improvement.

Sprawl Classification Algorithm

  • Input:

    • data: Training Data
    • d: The minimum distance between clusters
    • minConnectedPoints: The minimum number of connected points
  • Output:
    • Result: an array of classified data
  • Logic:
Load data into TotalCache.
i = 0
while (TotalCache.size > 0)
{
    Take any point A from TotalCache and put A into Cache2.
    Remove A from TotalCache.
    while (Cache2.size > 0)
    {
        In TotalCache, find the points 'nearPoints' whose distance to any point in Cache2 is less than d.
        Put Cache2 points into Cache1.
        Clear Cache2.
        Put nearPoints into Cache2.
        Remove nearPoints from TotalCache.
    }
    Add Cache1 points into Result[i].
    Clear Cache1.
    i++
}
Return Result
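The logic above can be sketched in Python as follows, with the expansion step repeated until no new near points are found. This is a minimal brute-force sketch, not the author's code: it assumes Euclidean distance and points given as equal-length tuples of numbers, the function name `sprawl_classify` is hypothetical, and `minConnectedPoints` is left out because the logic above does not use it.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def sprawl_classify(data: List[Point], d: float) -> List[List[Point]]:
    """Brute-force sprawl classification (hypothetical helper name).

    Starting from any unassigned point, keep adding every unassigned point
    that is closer than d to a point added in the previous step, until no
    new point is found; the collected points form one cluster.
    """
    total_cache = list(range(len(data)))  # indices of points not yet in any cluster
    result: List[List[Point]] = []
    while total_cache:
        cache1: List[int] = []            # points already in the current cluster
        cache2 = [total_cache.pop()]      # take any point A as the first frontier
        while cache2:
            near_points: List[int] = []
            remaining: List[int] = []
            for i in total_cache:
                # near if closer than d to any point in the current frontier
                if any(math.dist(data[i], data[j]) < d for j in cache2):
                    near_points.append(i)
                else:
                    remaining.append(i)
            cache1.extend(cache2)         # move the frontier into the cluster
            cache2 = near_points          # newly found points become the frontier
            total_cache = remaining       # remove them from TotalCache
        result.append([data[i] for i in cache1])
    return result
```

For example, `sprawl_classify([(0, 0), (0, 0.5), (5, 5)], 0.6)` returns two clusters: one with the first two points and one with (5, 5). Every frontier point is compared with every unassigned point, so this is the brute-force version whose cost is discussed under Improvement.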

Note: Because the algorithm computes distances between points, the data may need to be normalized first so that each feature has the same weight.
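One common choice is min-max scaling, which maps each feature to the range [0, 1]. The sketch below assumes that scheme, since the text does not say which normalization to use; `min_max_normalize` is a hypothetical helper. After scaling, d has to be chosen on the normalized scale.

```python
from typing import List, Sequence, Tuple


def min_max_normalize(data: List[Sequence[float]]) -> List[Tuple[float, ...]]:
    """Rescale every feature to [0, 1] so that each feature has the same weight."""
    columns = list(zip(*data))            # one tuple of values per feature
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    normalized = []
    for point in data:
        normalized.append(tuple(
            (x - lo) / (hi - lo) if hi > lo else 0.0  # a constant feature maps to 0.0
            for x, lo, hi in zip(point, lows, highs)))
    return normalized
```

For example, `min_max_normalize([(0, 100), (5, 300), (10, 200)])` returns `[(0.0, 0.0), (0.5, 1.0), (1.0, 0.5)]`, so the second feature no longer dominates the distance.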

Improvement

A big problem is that the method needs a lot of computation for the distances between points. In the worst case, the number of distance calculations is \(\frac{n(n-1)}{2}\), where \(n\) is the number of points.
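To give a sense of scale, plugging an illustrative value of \(n = 10\,000\) points into this bound gives:

\[
\frac{n(n-1)}{2} = \frac{10\,000 \times 9\,999}{2} = 49\,995\,000
\]

so the brute-force approach may need about 50 million distance calculations for a dataset of that size.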

Improvement ideas:

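  • Checking the distance along a single feature first may be quicker.

    We do not need to calculate the exact distance for every pair, because we only need to know whether the distance is less than \(d\).

    For two points \(x_1\) and \(x_2\), the distance (e.g. Euclidean distance) is greater than or equal to \(d\) whenever there is a feature \(i\) with \( \vert x_1[i] - x_2[i] \vert \geqslant d \), so such a pair can be rejected without computing the full distance.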
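  • Split the data into multiple areas

    For a dataset with \(m\) dimensions (features), we can split the dataset into multiple smaller datasets, each lying in an \(m\)-dimensional cell of size \(d^{m}\).

    We can imagine each cell as an \(m\)-dimensional cube with side length \(d\), with neighbouring cubes adjoining each other.

    Two points closer than \(d\) differ by less than \(d\) in every feature, so they lie either in the same cube or in adjacent cubes. Therefore, for each point we only need to calculate distances to points in its own cube and in the neighbouring cubes.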
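Both improvement ideas can be combined. The sketch below is one possible way to do it, not the author's code, and assumes Euclidean distance: `closer_than` applies the single-feature rejection before computing the full distance, `build_grid` assigns each point to a cube of side d by flooring each coordinate divided by d, and `neighbours` searches only the \(3^{m}\) cubes around a point for \(m\) features (its own cube plus the adjacent ones). All function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]
Cell = Tuple[int, ...]


def closer_than(p: Point, q: Point, d: float) -> bool:
    """True if the Euclidean distance between p and q is less than d.

    Rejects the pair as soon as one feature differs by d or more, so the full
    distance is only computed for pairs that pass every per-feature check.
    """
    for a, b in zip(p, q):
        if abs(a - b) >= d:
            return False
    return math.dist(p, q) < d


def cell_of(p: Point, d: float) -> Cell:
    """Integer coordinates of the cube of side d that contains p."""
    return tuple(math.floor(x / d) for x in p)


def build_grid(data: List[Point], d: float) -> Dict[Cell, List[int]]:
    """Map each cube to the indices of the points inside it."""
    grid: Dict[Cell, List[int]] = defaultdict(list)
    for i, p in enumerate(data):
        grid[cell_of(p, d)].append(i)
    return grid


def neighbours(data: List[Point], grid: Dict[Cell, List[int]], i: int, d: float) -> List[int]:
    """Indices of points closer than d to data[i], searching only adjacent cubes."""
    cell = cell_of(data[i], d)
    found = []
    for offset in product((-1, 0, 1), repeat=len(cell)):
        key = tuple(c + o for c, o in zip(cell, offset))
        for j in grid.get(key, []):       # .get does not add empty cubes to the grid
            if j != i and closer_than(data[i], data[j], d):
                found.append(j)
    return found


def sprawl_classify_grid(data: List[Point], d: float) -> List[List[Point]]:
    """Sprawl classification that looks for near points through the grid."""
    grid = build_grid(data, d)
    unassigned = set(range(len(data)))
    result: List[List[Point]] = []
    while unassigned:
        seed = unassigned.pop()           # any point A
        cluster, frontier = [seed], [seed]
        while frontier:
            near = set()
            for i in frontier:
                near.update(j for j in neighbours(data, grid, i, d) if j in unassigned)
            unassigned -= near
            cluster.extend(near)
            frontier = list(near)
        result.append([data[i] for i in cluster])
    return result
```

The number of neighbouring cubes, \(3^{m}\), grows quickly with the number of features, so this grid is likely to help most on low-dimensional data.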

Cons

  • Needs a large amount of computation.
  • Needs improvement to handle clusters that share common points.
