K-means algorithm: PRML reading notes

The K-means algorithm uses squared Euclidean distance as the measure of dissimilarity between a data point and a prototype vector. Suppose we have a data set $\{x_1, \dots, x_N\}$ consisting of $N$ observations of a random $D$-dimensional Euclidean variable $x$. Our goal is to partition the data set into some number $K$ of clusters, where we shall suppose for the moment that the value of $K$ is given. For each data point $x_n$ we introduce binary indicator variables $r_{nk} \in \{0, 1\}$, $k = 1, \dots, K$, where $r_{nk} = 1$ if $x_n$ is assigned to cluster $k$ and $r_{nj} = 0$ for all $j \neq k$. We can then define an objective function, sometimes called a distortion measure, given by

$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \|x_n - \mu_k\|^2 .$$

$J$ is the sum of the squared distances of each data point to its assigned vector $\mu_k$. We can think of the $\mu_k$ as representing the centres of the clusters. Our goal is to find values for the $\{r_{nk}\}$ and the $\{\mu_k\}$ that minimize $J$.

First we choose some initial values for the $\mu_k$. Then in the first phase we minimize $J$ with respect to the $r_{nk}$, keeping the $\mu_k$ fixed. In the second phase we minimize $J$ with respect to the $\mu_k$, keeping the $r_{nk}$ fixed. This two-stage optimization is then repeated until convergence.

In the first phase, the terms of $J$ for different $n$ are independent, so we simply assign the $n$-th data point to the closest cluster centre:

$$r_{nk} = \begin{cases} 1 & \text{if } k = \arg\min_j \|x_n - \mu_j\|^2, \\ 0 & \text{otherwise.} \end{cases}$$

In the second phase, $J$ is a quadratic function of $\mu_k$, and it can be minimized by setting its derivative with respect to $\mu_k$ to zero, giving

$$2 \sum_{n=1}^{N} r_{nk} (x_n - \mu_k) = 0 ,$$

which we can solve for $\mu_k$:

$$\mu_k = \frac{\sum_n r_{nk} x_n}{\sum_n r_{nk}} .$$

The denominator is the number of points assigned to cluster $k$, so this result has a simple interpretation: set $\mu_k$ equal to the mean of all the data points $x_n$ assigned to cluster $k$. For this reason, the procedure is known as the K-means algorithm.
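The two-phase procedure described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not a reference implementation: the function names (`assign`, `update`, `distortion`, `k_means`), the choice to initialise the centres by sampling K data points at random, the rule that an empty cluster keeps its previous centre, and the toy data set are all assumptions made here rather than anything specified in the notes.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance ||a - b||^2 between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign(points, centres):
    """Phase 1: with the centres fixed, pick the closest centre for each point.

    Instead of storing the full indicator matrix r_nk, we store for each n
    the single index k for which r_nk = 1.
    """
    return [
        min(range(len(centres)), key=lambda k: squared_distance(x, centres[k]))
        for x in points
    ]


def update(points, labels, centres):
    """Phase 2: with the assignments fixed, set each mu_k to its cluster mean."""
    new_centres = []
    for k, old in enumerate(centres):
        members = [x for x, label in zip(points, labels) if label == k]
        if not members:
            # Assumption: an empty cluster keeps its previous centre.
            new_centres.append(list(old))
            continue
        new_centres.append(
            [sum(x[d] for x in members) / len(members) for d in range(len(old))]
        )
    return new_centres


def distortion(points, labels, centres):
    """Distortion measure J: sum of squared distances to the assigned centres."""
    return sum(squared_distance(x, centres[k]) for x, k in zip(points, labels))


def k_means(points, k, max_iter=100, seed=0):
    """Alternate the two phases until the assignments stop changing."""
    rng = random.Random(seed)
    # Assumption: initial centres are k distinct data points chosen at random.
    centres = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centres)
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        centres = update(points, labels, centres)
    return centres, labels, distortion(points, labels, centres)


# Toy data (hypothetical): two well-separated groups of three points each.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centres, labels, J = k_means(points, k=2)
print("centres:", centres)
print("labels:", labels)
print("J:", J)
```

Neither phase can increase J, which is why the loop can stop as soon as the assignments repeat. The result may still be a local rather than a global minimum of J, so in practice it is common to rerun from several initialisations.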
