Unsupervised Classification - Sprawl Classification Algorithm
[comment]: # Unsupervised Classification - Sprawl Classification Algorithm
Idea
Points (data) in same cluster are near each others, or are connected by each others.
So:
- For a distance d,every points in a cluster always can find some points in the same cluster.
- Distances between points in difference clusters are bigger than the distance d.
The above condition maybe not correct totally, e.g. in the case of clusters which have common points, the condition will be incorrect.
So need some improvement.
Sprawl Classification Algorithm
- Input:
- data: Training Data
- d: The minimum distance between clusters
- minConnectedPoints: The minimum connected points:
- Output:
- Result: an array of classified data
- Logical:
Load data into TotalCache.
i = 0
while (TotalCache.size > 0)
{
Find a any point A from TotalCache, put A into Cache2.
Remove A from TotalCache
In TotalCache, find points 'nearPoints' less than d from any point in the Cache2.
Put Cache2 points into Cache1.
Clear Cache2.
Put nearPoints into Cache2.
Remove nearPoints from TotalCache.
if Cache2.size = 0, add Cache1 points into Result[i].
Clear Cache1.
i++
}
Return Result
Note: As the algorithm need to calculating the distances between points, maybe need to normalize data first to each feature has same weight.
Improvement
A big problem is the method need too much calculation for the distances between points. The max times is \(/frac{n * (n - 1)}{2}\).
Improvement ideas:
- Check distance for one feature first maybe quicker.
We need not to calculate the real distance for each pair, because we only need to make sure whether the distance is less than \(d\),
if points x1, x2, the distance will be bigger or equals to \(d\) when there is a $ \vert x1[i] - x2[i] \vert \geqslant d$. - Split data in multiple area
For n dimensions (features) dataset, we can split the dataset into multiple smaller datasets, each dataset is in a n dimension space whose size \(d^{n}\).
We can image that each small space is a n dimensions cube and adjoin each other.
so we only need to calculate points in the current space and neighbour spaces.
Cons
- Need a amount of calculating.
- Need to improve to handle clusters which have common points.
Unsupervised Classification - Sprawl Classification Algorithm的更多相关文章
- 微软亚洲实验室一篇超过人类识别率的论文:Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification ImageNet Classification
在该文章的两大创新点:一个是PReLU,一个是权值初始化的方法.下面我们分别一一来看. PReLU(paramter ReLU) 所谓的PRelu,即在 ReLU激活函数的基础上加入了一个参数,看一个 ...
- What are the advantages of different classification algorithms?
What are the advantages of different classification algorithms? For instance, if we have large train ...
- Classification / Recognition
转载 https://handong1587.github.io/deep_learning/2015/10/09/recognition.html#facenet Classification / ...
- sklearn中的metrics模块中的Classification metrics
metrics是sklearn用来做模型评估的重要模块,提供了各种评估度量,现在自己整理如下: 一.通用的用法:Common cases: predefined values 1.1 sklearn官 ...
- 机器学习-TensorFlow应用之classification和ROC curve
概述 前面几节讲的是linear regression的内容,这里咱们再讲一个非常常用的一种模型那就是classification,classification顾名思义就是分类的意思,在实际的情况是非 ...
- 学习笔记之k-nearest neighbors algorithm (k-NN)
k-nearest neighbors algorithm - Wikipedia https://en.wikipedia.org/wiki/K-nearest_neighbors_algorith ...
- Exploratory Undersampling for Class-Imbalance Learning
Abstract - Undersampling is a popular method in dealing with class-imbalance problems, which uses on ...
- arcmap Command
The information in this document is useful if you are trying to programmatically find a built-in com ...
- A Gentle Guide to Machine Learning
A Gentle Guide to Machine Learning Machine Learning is a subfield within Artificial Intelligence tha ...
随机推荐
- 在线PDF编辑网站http://www.pdfescape.com
网站地址:http://www.pdfescape.com 先转载一个简单介绍的文章 如果你以前很少阅读PDF文档,电脑中也没有PDF阅读器:adobe reader,foxit reader之类的软 ...
- 部署tomcat在windows服务器下,将tomcat控制台日志记录到日志文件中
在Linux系统中,Tomcat 启动后默认将很多信息都写入到 catalina.out 文件中,我们可以通过tail -f catalina.out 来跟踪Tomcat 和相关应用运行的情况. ...
- Eclipse中如何修改SVN的地址
在SVN服务端的IP更改后,客户端SVN的连接地址可以在Eclipse中进行修改,方法如下: 首先:在Eclipse中选择Windows-> Show View->others 就会出现[ ...
- SSAS:菜鸟摸门
官方:SSAS 多维模型 Analysis Services 多维解决方案使用多维数据集结构来分析多个维度之间的业务数据. 多维模式是 Analysis Services 的默认服务器模式. 它包括针 ...
- Ubuntu中QT使用FFmpeg的奇怪问题
FFmpeg都已经编译安装好了,QT的程序中调用av_register_all却总是在链接阶段报错,经过长时间的摸索,发现时静态链接库的问题,网上给出的答案都只能解决部分问题,所需的全部链接库如下: ...
- ML的灌水现象
(http://demonstrate.ycool.com/post.3137870.html) 看了几天 paper 和书,发现自己果然就是 zt好多东西就是不懂,那些人做的真快,我才建立起一种大致 ...
- 图文安装Windows Template Library - WTL Version 9.0
从http://wtl.sourceforge.net/下载 WTL 9.0,或者点此链接下载:WTL90_4140_Final.zip,然后解压到你的VC目录下面, 我的地址是:C:\Program ...
- Linux SSH 连接不上
http://blog.csdn.net/cryhelyxx/article/details/46473783 在xshell下用ssh登录远程主机centos出现以下问题: Connection e ...
- Sublime Text3 插件集合
下载地址:http://download.csdn.net/detail/yinluhui/9029791 [包含的插件有: AndyJS2.BracketHighlighter.emmet-subl ...
- php 使用zendstudio 生成webservice文件 wsdl
首先新建一个项目 在项目中新建下面这些文件 php类文件 test.php <?php class test { public function __construct() { } public ...