ML | k-means
what's xxx
k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. The problem is computationally difficult (NP-hard)
k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.
Given a set of observations $(x_1, x_2, …, x_n)$, where each observation is a d-dimensional real vector, k-means clustering aims to partition the n observations into k sets (k ≤ n) $S = {S_1, S_2, …, S_k}$ so as to minimize the within-cluster sum of squares 平方和(WCSS):
$\underset{\mathbf{S}} {\operatorname{arg\,min}} \sum_{i=1}^{k} \sum_{\mathbf x_j \in S_i} \left\| \mathbf x_j - \boldsymbol\mu_i \right\|^2 $
where $μ_i$ is the mean of points in $S_i$.
Algorithm
heuristic
1. Assignment step: $S_i^{(t)} = \big \{ x_p : \big \| x_p - m^{(t)}_i \big \|^2 \le \big \| x_p - m^{(t)}_j \big \|^2 \ \forall j, 1 \le j \le k \big\}$,
where each $x_p$ is assigned to exactly one $S^{(t)}$, even if it could be is assigned to two or more of them.
2. Update step: Calculate the new means to be the centroids of the observations in the new clusters.
$m^{(t+1)}_i = \frac{1}{|S^{(t)}_i|} \sum_{x_j \in S^{(t)}_i} x_j $
Since the arithmetic mean is a least-squares estimator, this also minimizes the within-cluster sum of squares (WCSS) objective.
The algorithm has converged when the assignments no longer change. Since both steps optimize the WCSS objective, and there only exists a finite number of such partitionings, the algorithm must converge to a (local) optimum. There is no guarantee that the global optimum is found using this algorithm.
ML | k-means的更多相关文章
- KNN 与 K - Means 算法比较
KNN K-Means 1.分类算法 聚类算法 2.监督学习 非监督学习 3.数据类型:喂给它的数据集是带label的数据,已经是完全正确的数据 喂给它的数据集是无label的数据,是杂乱无章的,经过 ...
- 软件——机器学习与Python,聚类,K——means
K-means是一种聚类算法: 这里运用k-means进行31个城市的分类 城市的数据保存在city.txt文件中,内容如下: BJ,2959.19,730.79,749.41,513.34,467. ...
- 快速查找无序数组中的第K大数?
1.题目分析: 查找无序数组中的第K大数,直观感觉便是先排好序再找到下标为K-1的元素,时间复杂度O(NlgN).在此,我们想探索是否存在时间复杂度 < O(NlgN),而且近似等于O(N)的高 ...
- 网络费用流-最小k路径覆盖
多校联赛第一场(hdu4862) Jump Time Limit: 2000/1000 MS (Java/Others) Memory Limit: 32768/32768 K (Java/Ot ...
- numpy.ones_like(a, dtype=None, order='K', subok=True)返回和原矩阵一样形状的1矩阵
Return an array of ones with the same shape and type as a given array. Parameters: a : array_like Th ...
- Abstractive Summarization
Sequence-to-sequence Framework A Neural Attention Model for Abstractive Sentence Summarization Alexa ...
- R 语言实战-Part 4 笔记
R 语言实战(第二版) part 4 高级方法 -------------第13章 广义线性模型------------------ #前面分析了线性模型中的回归和方差分析,前提都是假设因变量服从正态 ...
- 当我们在谈论kmeans(2)
本稿为初稿,后续可能还会修改:如果转载,请务必保留源地址,非常感谢! 博客园:http://www.cnblogs.com/data-miner/ 其他:建设中- 当我们在谈论kmeans(2 ...
- scikit-learn包的学习资料
http://scikit-learn.org/stable/modules/clustering.html#k-means http://my.oschina.net/u/175377/blog/8 ...
- HDU 3584 Cube (三维 树状数组)
题目链接:http://acm.hdu.edu.cn/showproblem.php?pid=3584 Cube Problem Description Given an N*N*N cube A, ...
随机推荐
- 采用Atlas+Keepalived实现MySQL读写分离、读负载均衡
========================================================================================== 一.基础介绍 == ...
- Python PyAudio 安装使用
Python PyAudio安装: Python3.7 无法安装pyaudio pip install pyaudio提示error: Microsoft Visual C++ 14.0 is req ...
- 用decimal模块增加python的浮点数精度
浮点数python默认是17位精度,也就是小数点后16位(16位以后的全部四舍五入了),虽然有16位,但是这个精度越往后越不准. 如果有特殊需求,需要更多的精度,可以用decimal模块,通过更改其里 ...
- SolrCloud下DIH实践
创建Collection 在/usr/local/solrcloud/solr/server/solr文件夹下创建coreTest文件夹 将/usr/local/solrcloud/solr/serv ...
- Linux学习-SELinux 初探
什么是 SELinux 什么是 SELinux 呢?其实他是『 Security Enhanced Linux 』的缩写,字面上的意义就是安全强化的 Linux 之意! 当初设计的目标:避免资源的误用 ...
- jeecg使用uploadify上传组件
在jeecg框架的系统内使用uploadify组件进行上传操作,有时无法正常发送请求,一直被重定向到登录请求,有可能使系统对上传操作进行了过滤,需要将这个上传请求放到非拦截序列里,才能正常使用. 第二 ...
- 反射的妙用-类名方法名做参数进行方法调用实例demo
首先声明一点,大家都会说反射的效率低下,但是大多数的框架能少了反射吗?当反射能为我们带来代码上的方便就可以用,如有不当之处还望大家指出 1,项目结构图如下所示:一个ClassLb类库项目,一个为测试用 ...
- Maven项目下Tomcat插件选择方法
1. 进入Tomcat官网:http://tomcat.apache.org/ 选择Maven plugin 2. 选择版本 3. 查看版本对应的插件版本: 有两种方式添加:如下图所示:
- TOJ4277: Sequence 组合数学
4277: Sequence Time Limit(Common/Java):2000MS/6000MS Memory Limit:65536KByte Total Submit: 39 ...
- 算法golang篇
1.slice反转,偏移 func reverse(s []int) { , len(s) - ; i < j; i, j = i+, j- { s[i], s[j] = s[j], s[i] ...