Unsupervised Classification - Sprawl Classification Algorithm
[comment]: # Unsupervised Classification - Sprawl Classification Algorithm
Idea
Points (data) in same cluster are near each others, or are connected by each others.
So:
- For a distance d,every points in a cluster always can find some points in the same cluster.
- Distances between points in difference clusters are bigger than the distance d.
The above condition maybe not correct totally, e.g. in the case of clusters which have common points, the condition will be incorrect.
So need some improvement.
Sprawl Classification Algorithm
- Input:
- data: Training Data
- d: The minimum distance between clusters
- minConnectedPoints: The minimum connected points:
- Output:
- Result: an array of classified data
- Logical:
Load data into TotalCache.
i = 0
while (TotalCache.size > 0)
{
Find a any point A from TotalCache, put A into Cache2.
Remove A from TotalCache
In TotalCache, find points 'nearPoints' less than d from any point in the Cache2.
Put Cache2 points into Cache1.
Clear Cache2.
Put nearPoints into Cache2.
Remove nearPoints from TotalCache.
if Cache2.size = 0, add Cache1 points into Result[i].
Clear Cache1.
i++
}
Return Result
Note: As the algorithm need to calculating the distances between points, maybe need to normalize data first to each feature has same weight.
Improvement
A big problem is the method need too much calculation for the distances between points. The max times is \(/frac{n * (n - 1)}{2}\).
Improvement ideas:
- Check distance for one feature first maybe quicker.
We need not to calculate the real distance for each pair, because we only need to make sure whether the distance is less than \(d\),
if points x1, x2, the distance will be bigger or equals to \(d\) when there is a $ \vert x1[i] - x2[i] \vert \geqslant d$. - Split data in multiple area
For n dimensions (features) dataset, we can split the dataset into multiple smaller datasets, each dataset is in a n dimension space whose size \(d^{n}\).
We can image that each small space is a n dimensions cube and adjoin each other.
so we only need to calculate points in the current space and neighbour spaces.
Cons
- Need a amount of calculating.
- Need to improve to handle clusters which have common points.
Unsupervised Classification - Sprawl Classification Algorithm的更多相关文章
- 微软亚洲实验室一篇超过人类识别率的论文:Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification ImageNet Classification
在该文章的两大创新点:一个是PReLU,一个是权值初始化的方法.下面我们分别一一来看. PReLU(paramter ReLU) 所谓的PRelu,即在 ReLU激活函数的基础上加入了一个参数,看一个 ...
- What are the advantages of different classification algorithms?
What are the advantages of different classification algorithms? For instance, if we have large train ...
- Classification / Recognition
转载 https://handong1587.github.io/deep_learning/2015/10/09/recognition.html#facenet Classification / ...
- sklearn中的metrics模块中的Classification metrics
metrics是sklearn用来做模型评估的重要模块,提供了各种评估度量,现在自己整理如下: 一.通用的用法:Common cases: predefined values 1.1 sklearn官 ...
- 机器学习-TensorFlow应用之classification和ROC curve
概述 前面几节讲的是linear regression的内容,这里咱们再讲一个非常常用的一种模型那就是classification,classification顾名思义就是分类的意思,在实际的情况是非 ...
- 学习笔记之k-nearest neighbors algorithm (k-NN)
k-nearest neighbors algorithm - Wikipedia https://en.wikipedia.org/wiki/K-nearest_neighbors_algorith ...
- Exploratory Undersampling for Class-Imbalance Learning
Abstract - Undersampling is a popular method in dealing with class-imbalance problems, which uses on ...
- arcmap Command
The information in this document is useful if you are trying to programmatically find a built-in com ...
- A Gentle Guide to Machine Learning
A Gentle Guide to Machine Learning Machine Learning is a subfield within Artificial Intelligence tha ...
随机推荐
- openssl命令行工具简介 - RSA操作
原文链接: http://www.cnblogs.com/aLittleBitCool/archive/2011/09/22/2185418.html 首先介绍下命令台下openssl工具的简单使用: ...
- 系统UINavigationController使用相关参考
闲来无事便在网上google&baidu了一番UINavigationController的相关文章,然后又看了下官方文档:看看更新到iOS7之后UINavigationController的 ...
- IOC容器特性注入第一篇:程序集反射查找
学习kooboo的框架发现它的注入容器方法比较特别,同样是利用MVC的注入点,但它是查找网站下面bin所有的DLL利用反射查找特性找到对应的服务注入到容器. 这样的好处很简单:完全可以不用关心IOC容 ...
- .Net基础
标题 状态 内容 NET应用程序是如何执行的? http://www.cnblogs.com/kingmoon/archive/2012/07/16/2594459.html ...
- DB2解除锁表
背景 生产环境中,我几乎没有遇到过锁表.多是在开发过程中遇到的,比如团队开发中经常会遇到多个功能访问同一张表的情况.如果有开发人员在这张表加了排它锁,然后又忘记提交事务,那么其他开发人员就要一直等待了 ...
- 三种ViewController跳转的异同
- (void)presentViewController:(UIViewController *)viewControllerToPresent animated:(BOOL)flag comple ...
- 十、EnterpriseFrameWork框架的分层架构及意义(控制器、业务对象、实体、Dao之间关系)
本章内容主要包括两个方面,一.是框架分层(控制器.业务对象.实体.Dao)的详细说明,二.是对比常用三层结构的区别和优势: 本文要点: 1.框架中的各个分层详细说明 2.对比常用三层结构的区别和优势 ...
- 【转】ContextMenuStrip菜单应用
测试可用的代码: #region 右键快捷菜单单击事件 private void contextMenuStrip1_ItemClick(object sender, EventArgs e) { T ...
- zk框架中利用map类型传值来创建window,并且传值
@Command @NotifyChange("accList") public void clear(@BindingParam("id") String a ...
- 【cs229-Lecture20】策略搜索
本节内容: 1.POMDP: 2.Policy search算法:reinforced和Pegasus: 马尔科夫决策过程(Partially Observable Markov Decision P ...