[comment]: # Unsupervised Classification - Sprawl Classification Algorithm

Idea

Points (data) in same cluster are near each others, or are connected by each others.

So:

  • For a distance d,every points in a cluster always can find some points in the same cluster.
  • Distances between points in difference clusters are bigger than the distance d.

    The above condition maybe not correct totally, e.g. in the case of clusters which have common points, the condition will be incorrect.

    So need some improvement.

Sprawl Classification Algorithm

  • Input:

    • data: Training Data
    • d: The minimum distance between clusters
    • minConnectedPoints: The minimum connected points:
  • Output:
    • Result: an array of classified data
  • Logical:
Load data into TotalCache.
i = 0
while (TotalCache.size > 0)
{
Find a any point A from TotalCache, put A into Cache2.
Remove A from TotalCache
In TotalCache, find points 'nearPoints' less than d from any point in the Cache2.
Put Cache2 points into Cache1.
Clear Cache2.
Put nearPoints into Cache2.
Remove nearPoints from TotalCache.
if Cache2.size = 0, add Cache1 points into Result[i].
Clear Cache1.
i++
}
Return Result

Note: As the algorithm need to calculating the distances between points, maybe need to normalize data first to each feature has same weight.

Improvement

A big problem is the method need too much calculation for the distances between points. The max times is \(/frac{n * (n - 1)}{2}\).

Improvement ideas:

  • Check distance for one feature first maybe quicker.

    We need not to calculate the real distance for each pair, because we only need to make sure whether the distance is less than \(d\),

    if points x1, x2, the distance will be bigger or equals to \(d\) when there is a $ \vert x1[i] - x2[i] \vert \geqslant d$.
  • Split data in multiple area

    For n dimensions (features) dataset, we can split the dataset into multiple smaller datasets, each dataset is in a n dimension space whose size \(d^{n}\).

    We can image that each small space is a n dimensions cube and adjoin each other.

    so we only need to calculate points in the current space and neighbour spaces.

Cons

  • Need a amount of calculating.
  • Need to improve to handle clusters which have common points.

Unsupervised Classification - Sprawl Classification Algorithm的更多相关文章

  1. 微软亚洲实验室一篇超过人类识别率的论文:Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification ImageNet Classification

    在该文章的两大创新点:一个是PReLU,一个是权值初始化的方法.下面我们分别一一来看. PReLU(paramter ReLU) 所谓的PRelu,即在 ReLU激活函数的基础上加入了一个参数,看一个 ...

  2. What are the advantages of different classification algorithms?

    What are the advantages of different classification algorithms? For instance, if we have large train ...

  3. Classification / Recognition

    转载 https://handong1587.github.io/deep_learning/2015/10/09/recognition.html#facenet Classification / ...

  4. sklearn中的metrics模块中的Classification metrics

    metrics是sklearn用来做模型评估的重要模块,提供了各种评估度量,现在自己整理如下: 一.通用的用法:Common cases: predefined values 1.1 sklearn官 ...

  5. 机器学习-TensorFlow应用之classification和ROC curve

    概述 前面几节讲的是linear regression的内容,这里咱们再讲一个非常常用的一种模型那就是classification,classification顾名思义就是分类的意思,在实际的情况是非 ...

  6. 学习笔记之k-nearest neighbors algorithm (k-NN)

    k-nearest neighbors algorithm - Wikipedia https://en.wikipedia.org/wiki/K-nearest_neighbors_algorith ...

  7. Exploratory Undersampling for Class-Imbalance Learning

    Abstract - Undersampling is a popular method in dealing with class-imbalance problems, which uses on ...

  8. arcmap Command

    The information in this document is useful if you are trying to programmatically find a built-in com ...

  9. A Gentle Guide to Machine Learning

    A Gentle Guide to Machine Learning Machine Learning is a subfield within Artificial Intelligence tha ...

随机推荐

  1. Unity中Mesh分解与边缘高亮加上深度检测

    一个比较简单的需求,不过遇到些坑,记录下. 房间有多个模型,每个模型可能多个SubMesh,点击后,需要能具体到是那个SubMesh,并且在这个SubMesh上显示边缘高光,以及能个性这单个SubMe ...

  2. dnspod动态域名使用感受

    继花生壳不能用之后,3322也开始不太好用了,首先就是360把所有3322的域名全部判定为危险域名,甚至拦截程序对于3322url的api请求. 所以想把3322换成我们自己的独立域名,但是3322他 ...

  3. mvc 分页视图 js 失效

    MVC的分页视图确实是好东西,比ajax直观,可是联动后 之前绑定的js事件失效,所以我们在绑定的时候,要注意使用jquery的 动态绑定功能 最常见的用法应该是 select 的 change 事件 ...

  4. 给MySQL官方提交的bug report备忘

    1.  Bug #72215 When LOCK_plugin conflicts very much, one uninstall-audit-plugin operation crash  htt ...

  5. 彻底解决Google浏览器CSS居中问题

    div做的界面时,又出现CSS hack(CSS兼容浏览器问题)在IE内核浏览器或者firefox浏览器中都能居中,没有居中的可以用其特殊标签来设定居中,如下划线 _ IE6优先识别,!importa ...

  6. 编译升级php之路(5.5.7 到 5.5.37)

    为在一台旧服务器上能使用slim,共经历了: 1.安装composer(需要高版本php,原来是5.5.7) 2.升级php版本到5.5.37(编译出错,准备使用docker) 3.升级centos内 ...

  7. html中css三种常见的样式选择器 zz

    1:标签选择器 标签选择器,是所有带有某种标签的都生效.这里以p为例,也就是所有的带有p标记的都会这样的样式 <html><head><styletype="t ...

  8. 使用hessian+protocol buffer+easyUI综合案例--登陆

    首先先简单介绍下hessian ,protocol buffer, easyUI框架 hessian: Hessian是一个轻量级的remoting on http工具,采用的是Binary RPC协 ...

  9. C++输入输出

    i. C++如何输入一行,按回车键结束 #include <sstream>1 getline(cin, line); istringstream input(line); ii. C++ ...

  10. SNF开发平台WinForm之九-代码生成器使用说明-SNF快速开发平台3.3-Spring.Net.Framework

    下面就具体的使用说明: 1.获取代码生成器的授权码(根据本机)-----还原数据库-------改config-----代码生成器 改代码生成器Config 2.登录代码生成器 3.查看是否连接成功 ...