https://en.wikipedia.org/wiki/K-means_clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.
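The "nearest mean" criterion above is usually written as minimizing the within-cluster sum of squared distances. The formulation below is a sketch of that standard objective; the symbols (the observations $x$, the partition $S = \{S_1, \dots, S_k\}$, and the cluster means $\mu_i$) are notation assumed here and do not appear in the text above.

```latex
\[
  \underset{S}{\arg\min} \; \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2 ,
  \qquad
  \mu_i = \frac{1}{\lvert S_i \rvert} \sum_{x \in S_i} x
\]
```

Assigning each observation to its nearest mean is what partitions the data space into Voronoi cells around the $\mu_i$.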

The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms are commonly employed and converge quickly to a local optimum. These heuristics resemble the expectation-maximization (EM) algorithm for mixtures of Gaussian distributions: both proceed by iterative refinement, and both use cluster centers to model the data. However, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.
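The text does not name a specific heuristic. The most common one is often called Lloyd's algorithm, and the following is a minimal pure-Python sketch of it under that assumption. The function names (`kmeans`, `squared_distance`, `mean_point`), the random initialization from the data points, and the `max_iter` and `seed` parameters are illustrative choices, not taken from the source.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    """Iterative refinement: alternate assignment and mean update."""
    rng = random.Random(seed)
    # Assumed initialization: pick k distinct observations as starting centers.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: a local optimum
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = mean_point(members)
    return centers, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
```

Because the result is only a local optimum, the outcome generally depends on the initial centers; in practice the procedure is often repeated from several starting points and the best partition kept.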

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier to the cluster centers obtained by k-means to classify new data into the existing clusters. This is known as the nearest centroid classifier or Rocchio algorithm.
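Classifying new data against existing cluster centers can be sketched in a few lines. The function name `nearest_centroid` and the example centers are hypothetical; the centers stand in for whatever a previous k-means run produced.

```python
def nearest_centroid(point, centers):
    """Return the index of the center closest to `point` (1-nearest neighbor)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centers[j])),
    )


# Hypothetical centers, standing in for the output of a k-means run.
example_centers = [(0.0, 0.0), (10.0, 10.0)]
near_origin = nearest_centroid((1, 2), example_centers)
near_far = nearest_centroid((9, 8), example_centers)
```

No clustering is rerun here: the centers are fixed, and each new point is simply labeled with the cluster whose center is nearest.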
