[comment]: # Unsupervised Classification - Sprawl Classification Algorithm

Idea

Points (data) in the same cluster are near each other, or are connected to each other through other points of the cluster.

So:

For a given distance d, every point in a cluster can always find at least one other point of the same cluster within distance d.
The distance between any two points in different clusters is at least d.

These conditions are not always correct. For example, they fail when clusters have common points (overlap).

So the algorithm still needs some improvement for that case.

Sprawl Classification Algorithm

Input:
- data: Training data
- d: The minimum distance between clusters
- minConnectedPoints: The minimum number of connected points (not used in the logic below)
Output:
- Result: an array of clusters, each holding the data points classified into it
Logic:

Load data into TotalCache.

i = 0

while (TotalCache.size > 0)

{

    Take any point A from TotalCache and put A into Cache2.

    Remove A from TotalCache.

    while (Cache2.size > 0)

    {

        In TotalCache, find the points 'nearPoints' that are less than d from any point in Cache2.

        Move the Cache2 points into Cache1.

        Clear Cache2.

        Put nearPoints into Cache2.

        Remove nearPoints from TotalCache.

    }

    Add the Cache1 points to Result[i].

    Clear Cache1.

    i++

}

Return Result
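
The logic above can be sketched in Python roughly as follows. This is only an illustration under some assumptions: points are tuples of numbers, the distance is Euclidean (`math.dist`), and the name `sprawl_classify` is made up for this example. `minConnectedPoints` is left out because the logic does not use it.

```python
import math

def sprawl_classify(data, d):
    """Group points so that each point is within d of another point in its group."""
    total_cache = list(data)          # points not yet assigned to any cluster
    result = []
    while total_cache:
        cache1 = []                   # points of the current cluster already expanded
        cache2 = [total_cache.pop()]  # frontier: points whose neighbours are still unknown
        while cache2:
            near, rest = [], []
            for p in total_cache:
                # Assumption: Euclidean distance; p joins if it is near any frontier point.
                if any(math.dist(p, q) < d for q in cache2):
                    near.append(p)
                else:
                    rest.append(p)
            cache1.extend(cache2)     # the frontier is now fully expanded
            cache2 = near             # newly found points become the next frontier
            total_cache = rest
        result.append(cache1)
    return result
```

For example, `sprawl_classify([(0, 0), (0.5, 0), (1.0, 0), (5, 5), (5.4, 5)], 0.6)` yields two clusters: the three points near the origin and the two points near `(5, 5)`.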

Note: As the algorithm needs to calculate the distances between points, the data may need to be normalized first so that each feature has the same weight.
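
One possible way to do this is min-max scaling, which maps every feature to the range [0, 1]. The note does not name a normalization method, so this choice and the helper name `min_max_normalize` are assumptions for illustration only.

```python
def min_max_normalize(data):
    """Rescale every feature (column) of the data to the range [0, 1]."""
    columns = list(zip(*data))
    lows = [min(c) for c in columns]
    # A constant feature has span 0; use 1.0 instead to avoid dividing by zero.
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [
        tuple((x - lo) / span for x, lo, span in zip(row, lows, spans))
        for row in data
    ]
```

After scaling, the same `d` applies to every feature, so `d` should be chosen on the [0, 1] scale.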

Improvement

A big problem is that the method needs too many distance calculations between points. For $n$ points, the maximum number of calculations is $\frac{n(n - 1)}{2}$.
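
As a rough check of that bound, assuming each unordered pair of points is compared at most once:

$$
(n - 1) + (n - 2) + \dots + 1 = \frac{n(n - 1)}{2}
$$

For example, $n = 10000$ points give up to $\frac{10000 \times 9999}{2} = 49995000$ distance calculations.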

Improvement ideas:

Checking the distance for one feature first may be quicker.

We do not need to calculate the real distance for each pair, because we only need to know whether the distance is less than $d$.

For two points $x1$ and $x2$, the distance is greater than or equal to $d$ whenever there is a feature $i$ with $ \vert x1[i] - x2[i] \vert \geqslant d$ (this holds for the Euclidean distance, among others).
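
A minimal sketch of this early-exit check, assuming Euclidean distance and points given as tuples (the function name `is_near` is hypothetical):

```python
import math

def is_near(x1, x2, d):
    """Return True when the Euclidean distance between x1 and x2 is less than d."""
    # Cheap rejection: a gap of at least d on any single feature rules the pair out.
    for a, b in zip(x1, x2):
        if abs(a - b) >= d:
            return False
    # No single feature decided it, so compute the full distance.
    return math.dist(x1, x2) < d
```

The full distance is still needed when every per-feature gap is below $d$, because the combined distance can exceed $d$ anyway, as with `(0, 0)` and `(0.8, 0.8)` for `d = 1`.
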

Split the data into multiple areas

For a dataset with n dimensions (features), we can split the dataset into multiple smaller datasets, each lying in an n-dimensional space with side length $d$ (size $d^{n}$).

We can imagine each small space as an n-dimensional cube, with the cubes adjoining each other.

So we only need to compare a point with the points in its own space and in the neighbouring spaces.
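
The idea can be sketched as a grid lookup. The sketch below assumes Euclidean distance and cubes of side length `d` aligned to the origin; the names `build_grid` and `near_points` are invented for this example.

```python
import math
from collections import defaultdict
from itertools import product

def build_grid(points, d):
    """Map each cube index (one integer per feature) to the points inside that cube."""
    grid = defaultdict(list)
    for p in points:
        grid[tuple(math.floor(x / d) for x in p)].append(p)
    return grid

def near_points(p, grid, d):
    """Return the points closer than d to p, looking only at p's cube and adjacent cubes."""
    cell = tuple(math.floor(x / d) for x in p)
    found = []
    # Points closer than d differ by at most 1 in every cube index.
    for offset in product((-1, 0, 1), repeat=len(cell)):
        neighbour = tuple(c + o for c, o in zip(cell, offset))
        for q in grid.get(neighbour, ()):
            if q != p and math.dist(p, q) < d:   # skips p itself (and exact duplicates of p)
                found.append(q)
    return found
```

Each lookup visits $3^{n}$ cubes, so this is likely to help most when the number of features is small.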

Cons

Needs a large amount of calculation.
Needs improvement to handle clusters which have common points.
