cluster analysis in data mining
https://en.wikipedia.org/wiki/K-means_clustering
k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.
The problem is computationally difficult (NP-hard); however, there are efficient heuristic algorithms that are commonly employed and converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions via an iterative refinement approach employed by both algorithms. Additionally, they both use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.
The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier on the cluster centers obtained by k-means to classify new data into the existing clusters. This is known as nearest centroid classifier or Rocchio algorithm[citation needed].
cluster analysis in data mining的更多相关文章
- Machine Learning and Data Mining(机器学习与数据挖掘)
Problems[show] Classification Clustering Regression Anomaly detection Association rules Reinforcemen ...
- Cluster analysis
https://en.wikipedia.org/wiki/Cluster_analysis Cluster analysis or clustering is the task of groupin ...
- Data Mining的十种分析方法——摘自《市场研究网络版》谢邦昌教授
Data Mining的十种分析方法: 记忆基础推理法(Memory-Based Reasoning:MBR) 记忆基础推理法最主要的概念是用已知的案例(case)来预测未来案例的一些属 ...
- A web crawler design for data mining
Abstract The content of the web has increasingly become a focus for academic research. Computer prog ...
- Weka 3: Data Mining Software in Java
官方网站: Weka 3: Data Mining Software in Java 相关使用方法博客 WEKA使用教程(经典教程转载) (实例数据:bank-data.csv) Weka初步一.二. ...
- data mining,machine learning,AI,data science,data science,business analytics
数据挖掘(data mining),机器学习(machine learning),和人工智能(AI)的区别是什么? 数据科学(data science)和商业分析(business analytics ...
- 数据挖掘(data mining),机器学习(machine learning),和人工智能(AI)的区别是什么? 数据科学(data science)和商业分析(business analytics)之间有什么关系?
本来我以为不需要解释这个问题的,到底数据挖掘(data mining),机器学习(machine learning),和人工智能(AI)有什么区别,但是前几天因为有个学弟问我,我想了想发现我竟然也回答 ...
- 论文翻译:Data mining with big data
原文: Wu X, Zhu X, Wu G Q, et al. Data mining with big data[J]. IEEE transactions on knowledge and dat ...
- 18 Candidates for the Top 10 Algorithms in Data Mining
Classification============== #1. C4.5 Quinlan, J. R. 1993. C4.5: Programs for Machine Learning.Morga ...
随机推荐
- Sql server之路 (一)基础学习
查询 1.Select * from表名 2.Select 字段1,字段2,from表名 3.Select 字段1,字段2,...from表名 where 字段1 in('内容') 插入 1.inse ...
- Android下拉刷新完全解析,教你如何一分钟实现下拉刷新功能 (转)
转载请注明出处:http://blog.csdn.net/guolin_blog/article/details/9255575 最 近项目中需要用到ListView下拉刷新的功能,一开始想图省事,在 ...
- linux下用top命令查看cpu利用率超过100%
今天跑了一个非常耗时的批量插入操作..通过top命令查看cpu以及内存的使用的时候,cpu的时候查过了120%..以前没注意..通过在top的情况下按大键盘的1,查看的cpu的核数为4核. 通过网上查 ...
- php生成二维码的插件phpqrcode
参考网址: http://www.thinkphp.cn/topic/7749.html http://blog.csdn.net/stxyc/article/details/44650971 php ...
- Memcached启停脚本小结
编写配置文件 编写启动脚本 vim /etc/rc.d/init.d/memcached startesac and $<!= 0); } elsif (open PIDHANDLE," ...
- Jquery操作
一.文档操作 1.内部插入:append(),appendTo(),prepend(): 2.外部插入:after(),before(): 3.删除操作:remove(),empty(): 4.克隆操 ...
- 哈希表--HashSet<T>
.Net3.5之后出现了HashSet<T>,硬翻译过来就是“哈希集合”,跟“哈希”两字挂钩说明这种集合的内部实现用到了哈希算法,用Reflector工具就可以发现,HashSet< ...
- 关于Java的数据结构HashMap,ArrayList的使用总结及使用场景和原理分析
使用,必须要知道其原理,在课堂上学过散列函数的用法及其原理.但一直不知道怎么实践. 后来,在实际项目中,需要做一个流量分析预处理程序.每5分钟会接收到现网抓来的数据包并解析,每个文本文件大概200M左 ...
- 浅谈MySQL索引背后的数据结构及算法
摘要 本文以MySQL数据库为研究对象,讨论与数据库索引相关的一些话题.特别需要说明的是,MySQL支持诸多存储引擎,而各种存储引擎对索引的支持也各不相同,因此MySQL数据库支持多种索引类型,如BT ...
- BZOJ4303 : 数列
将每个点看成二维坐标点$(i,a_i)$,那么每次操作的范围都是一个矩形. 于是建立KD-Tree,通过打标记支持操作即可. 时间复杂度$O(m\sqrt{n})$. #include<cstd ...