https://en.wikipedia.org/wiki/K-means_clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, there are efficient heuristic algorithms that are commonly employed and converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions via an iterative refinement approach employed by both algorithms. Additionally, they both use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier on the cluster centers obtained by k-means to classify new data into the existing clusters. This is known as nearest centroid classifier or Rocchio algorithm[citation needed].

cluster analysis in data mining的更多相关文章

  1. Machine Learning and Data Mining(机器学习与数据挖掘)

    Problems[show] Classification Clustering Regression Anomaly detection Association rules Reinforcemen ...

  2. Cluster analysis

    https://en.wikipedia.org/wiki/Cluster_analysis Cluster analysis or clustering is the task of groupin ...

  3. Data Mining的十种分析方法——摘自《市场研究网络版》谢邦昌教授

    Data Mining的十种分析方法: 记忆基础推理法(Memory-Based Reasoning:MBR)        记忆基础推理法最主要的概念是用已知的案例(case)来预测未来案例的一些属 ...

  4. A web crawler design for data mining

    Abstract The content of the web has increasingly become a focus for academic research. Computer prog ...

  5. Weka 3: Data Mining Software in Java

    官方网站: Weka 3: Data Mining Software in Java 相关使用方法博客 WEKA使用教程(经典教程转载) (实例数据:bank-data.csv) Weka初步一.二. ...

  6. data mining,machine learning,AI,data science,data science,business analytics

    数据挖掘(data mining),机器学习(machine learning),和人工智能(AI)的区别是什么? 数据科学(data science)和商业分析(business analytics ...

  7. 数据挖掘(data mining),机器学习(machine learning),和人工智能(AI)的区别是什么? 数据科学(data science)和商业分析(business analytics)之间有什么关系?

    本来我以为不需要解释这个问题的,到底数据挖掘(data mining),机器学习(machine learning),和人工智能(AI)有什么区别,但是前几天因为有个学弟问我,我想了想发现我竟然也回答 ...

  8. 论文翻译:Data mining with big data

    原文: Wu X, Zhu X, Wu G Q, et al. Data mining with big data[J]. IEEE transactions on knowledge and dat ...

  9. 18 Candidates for the Top 10 Algorithms in Data Mining

    Classification============== #1. C4.5 Quinlan, J. R. 1993. C4.5: Programs for Machine Learning.Morga ...

随机推荐

  1. 【现代程序设计】homework-05

    本次作业要求设计服务器和客户端,由于之前对网络编程是一窍不通,上上节课听宗学长讲述Tcp的时候心里想这个东西还真是高大上啊一点儿都听不懂,但是上个周末看了看C#网络编程的博客和书之后,发现这个东西入门 ...

  2. 通过解析PE头。读取dll模块 和 dll模块函数

    win32 int main(){ //001e1000 ::MessageBox(NULL, TEXT("111"), TEXT("222"), 0); HM ...

  3. 如何开启SQL Server 2008的远程联机

    需要开启SQL Server 2008 远程联机,需按如下操作步骤执行: 1.首先需要在{程序}-{Microsoft SQL Server 2008}-{配置工具}-{SQL Server 配置管理 ...

  4. 解决:jquery-1.11.1.min.js红叉问题

    工程中导入jquery-1.11.1.min.js,工程正常运行.但是jquery-1.11.1.min.js一直显示红叉. 解决方法如下: 红叉的原因是:myeclipse没有去验证它! 选中js文 ...

  5. poj 1114 完全背包 dp

    如果可以每个物品拿多件,则从小到大遍历,否则从大到小遍历. G - Piggy-Bank Time Limit:1000MS     Memory Limit:32768KB     64bit IO ...

  6. 在Windows下利用MinGW编译FFmpeg

    目录 [隐藏]  1 环境与软件 2 第一步:安装MinGW 3 第二步:配置编译环境 4 第三步:配置SDL 5 第四步:编译 5.1 编译faac 5.2 编译fdk-aac 5.3 编译x264 ...

  7. java生成字符串md5函数类

    import java.security.MessageDigest; /** * Md5 工具 */ public class Md5Util { private static MessageDig ...

  8. bzoj1030 [JSOI2007]文本生成器

    1030: [JSOI2007]文本生成器 Time Limit: 1 Sec  Memory Limit: 162 MBSubmit: 2654  Solved: 1100[Submit][Stat ...

  9. CodeForces Round 193 Div2

    I know,this is another failure.It seems like that failure is a habit with me.This time I solved two ...

  10. POJ 2955 (区间DP)

    题目链接: http://poj.org/problem?id=2955 题目大意:括号匹配.对称的括号匹配数量+2.问最大匹配数. 解题思路: 看起来像个区间问题. DP边界:无.区间间隔为0时,默 ...