A Python Implementation of the k-means Algorithm

 # coding=utf-8
 import codecs
 import numpy
 from numpy import *
 import pylab

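 def loadDataSet(fileName):
     """Read a tab-separated text file into a list of rows of floats."""
     dataMat = []
     # The with-block closes the file when reading is done.
     with codecs.open(fileName) as fr:
         for line in fr:
             line = line.strip()
             if not line:
                 continue  # skip blank lines instead of failing on float('')
             # list() keeps this working on Python 3, where map() is lazy.
             dataMat.append(list(map(float, line.split('\t'))))
     return dataMat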

 def distMeasure(vecA, vecB):
     """Return the Euclidean distance between two vectors."""
     return sqrt(sum(power(vecA - vecB, 2)))

 def kMeansInitCentroids(X, K):
     """
     Initialize K centroids to be used in K-Means on the dataset X.

     Each coordinate is drawn uniformly at random between the minimum and
     the maximum of the corresponding column of X. Returns a K x n matrix.
     """
     n = shape(X)[1]
     centroids = mat(zeros((K, n)))
     for j in range(n):
         col = array(X)[:, j]
         minJ = min(col)
         rangeJ = float(max(col) - minJ)
         centroids[:, j] = minJ + rangeJ * random.rand(K, 1)
     return centroids

 def findClosestCentroids(X, centroids):
     """
     Compute the centroid membership of every example.

     X holds one example per row. Returns an m x 2 matrix: column 0 is the
     index of the closest centroid (in the range 0..K-1) and column 1 is the
     squared distance from the example to that centroid.
     """
     m = shape(X)[0]  # number of examples
     K = shape(centroids)[0]
     clusterAssment = mat(zeros((m, 2)))
     # The centroids do not change during this step, so one pass is enough.
     for i in range(m):
         minDist = inf
         minIndex = -1
         # Compare example i with each of the K centroids (Euclidean
         # distance) and keep the index of the nearest one.
         for j in range(K):
             distJI = distMeasure(centroids[j, :], X[i, :])
             if distJI < minDist:
                 minDist = distJI
                 minIndex = j
         clusterAssment[i, :] = minIndex, minDist ** 2
     return clusterAssment

 def computeCentroids(X, clusterAssment, K):
     """
     Return the new centroids, computed as the means of the data points
     assigned to each centroid.

     X holds one data point per row, column 0 of clusterAssment holds the
     centroid index (0..K-1) assigned to each point, and K is the number of
     centroids. Returns a K x n matrix whose rows are the cluster means.
     """
     n = shape(X)[1]
     centroids = mat(zeros((K, n)))
     for centroid in range(K):
         # nonzero() returns one index array per dimension; the first one
         # holds the row numbers of the points assigned to this centroid.
         ptsInClust = X[nonzero(clusterAssment[:, 0].A == centroid)[0]]
         # The mean of an empty cluster would be NaN, so an empty cluster's
         # row is left at zero.
         if len(ptsInClust) > 0:
             centroids[centroid, :] = mean(ptsInClust, axis=0)
     return centroids

 def show(dataSet, k, centroids, clusterAssment):
     """Plot the first two features of every point, marked by cluster, plus the centroids."""
     from matplotlib import pyplot as plt
     numSamples, dim = dataSet.shape
     # One marker style per cluster, so at most 10 clusters can be shown.
     mark = ['or', 'ob', 'og', 'ok', '^r', '+r', 'sr', 'dr', '<r', 'pr']
     for i in range(numSamples):
         markIndex = int(clusterAssment[i, 0])
         plt.plot(dataSet[i, 0], dataSet[i, 1], mark[markIndex])
     # Larger markers for the centroids.
     mark = ['Dr', 'Db', 'Dg', 'Dk', '^b', '+b', 'sb', 'db', '<b', 'pb']
     for i in range(k):
         plt.plot(centroids[i, 0], centroids[i, 1], mark[i], markersize=12)
     plt.show()

 def runkMeans(X, initial_centroids, max_iters, plot_progress=False):
     """
     Run the K-Means algorithm on the data matrix X, where each row of X is
     a single example.

     initial_centroids holds the starting centroids, one per row. max_iters
     is the total number of iterations of K-Means to execute. plot_progress
     is a true/false flag (False by default); this implementation accepts
     it but does not plot intermediate progress.

     Returns centroids, a K x n matrix of the computed centroids, and
     clusterAssment, an m x 2 matrix holding each example's centroid index
     (0..K-1) and its squared distance to that centroid.
     """
     (m, n) = shape(X)
     K = shape(initial_centroids)[0]
     centroids = initial_centroids
     clusterAssment = zeros((m, 2))
     # Alternate the assignment step and the centroid update step.
     for i in range(max_iters):
         clusterAssment = findClosestCentroids(X, centroids)
         centroids = computeCentroids(X, clusterAssment, K)
     return centroids, clusterAssment

 def main():
     K = 5
     max_iters = 10
     dataSet = loadDataSet('E://PythonSpace//TextClustering//data//test2.txt')
     X = array(dataSet)
     # Standardize with the mean and standard deviation of the whole matrix.
     X = (X - mean(X)) / std(X)
     initial_centroids = kMeansInitCentroids(X, K)
     myCentroids, clusterAssment = runkMeans(X, initial_centroids, max_iters, False)
     print("-------------------------------------")
     show(X, K, myCentroids, clusterAssment)

 if __name__ == '__main__':
     main()
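The two steps that the listing repeats, assigning each point to its nearest centroid and then moving each centroid to the mean of its points, can also be written with NumPy broadcasting instead of nested Python loops. The following is only a minimal sketch under a few assumptions: the names `assign_clusters` and `update_centroids` are hypothetical and do not appear in the listing, plain arrays are used instead of `mat`, and an empty cluster simply keeps its previous centroid, which is one possible policy among several.

```python
import numpy as np


def assign_clusters(X, centroids):
    """Return the nearest-centroid index and squared distance for each row of X."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # diffs[i, j, :] = X[i] - centroids[j], shape (m, K, n) through broadcasting
    diffs = X[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)      # shape (m, K)
    idx = sq_dists.argmin(axis=1)            # nearest centroid per example
    return idx, sq_dists[np.arange(X.shape[0]), idx]


def update_centroids(X, idx, old_centroids):
    """Mean of each cluster; an empty cluster keeps its previous centroid (assumed policy)."""
    X = np.asarray(X, dtype=float)
    new_centroids = np.array(old_centroids, dtype=float)  # work on a copy
    for k in range(new_centroids.shape[0]):
        members = X[idx == k]
        if len(members) > 0:
            new_centroids[k] = members.mean(axis=0)
    return new_centroids


# Tiny made-up example: two obvious groups and one centroid far from all points.
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
start = np.array([[0.0, 0.0], [10.0, 10.0], [100.0, 100.0]])
idx_demo, sq_demo = assign_clusters(X_demo, start)
centroids_demo = update_centroids(X_demo, idx_demo, start)
print(idx_demo)
print(centroids_demo)
```

In the demo data the third centroid attracts no points, so it stays where it started rather than turning into NaN, while the first two move to the means of their groups.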

This implementation draws on Andrew Ng's Machine Learning assignment (https://github.com/rieder91/MachineLearning/blob/master/Exercise%207/ex7/runkMeans.m)

and on the blog post http://www.cnblogs.com/MrLJC/p/4127553.html
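kMeansInitCentroids in the listing draws each initial centroid uniformly inside the bounding box of the data, so a centroid can start in a region that contains no points. A widely used alternative is to pick K distinct examples from the data as the starting centroids. The sketch below illustrates that alternative and is not what the listing does; the function name `init_centroids_from_samples` and its `seed` parameter are hypothetical.

```python
import numpy as np


def init_centroids_from_samples(X, K, seed=None):
    """Pick K distinct rows of X at random to serve as initial centroids."""
    X = np.asarray(X, dtype=float)
    if K > X.shape[0]:
        raise ValueError("K cannot exceed the number of examples")
    # seed is a hypothetical convenience parameter for reproducible runs
    rng = np.random.RandomState(seed)
    rows = rng.permutation(X.shape[0])[:K]   # K distinct row indices
    return X[rows]


# Made-up data: 10 examples with 2 features each.
X_demo = np.arange(20, dtype=float).reshape(10, 2)
init_demo = init_centroids_from_samples(X_demo, 3, seed=0)
print(init_demo)
```

Because every starting centroid coincides with an actual data point, each one is at distance zero from at least one example in the first assignment step.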

Result of running the script: [figure: the plot produced by show()]
