K-means clustering with TensorFlow

Example on the Iris dataset:

# -*- coding: utf-8 -*-

# K-means with TensorFlow

#----------------------------------

#

# This script shows how to do k-means with TensorFlow.
# It uses the TensorFlow 1.x graph/session API (tf.Session, tf.global_variables_initializer).

import numpy as np

import matplotlib.pyplot as plt

import tensorflow as tf

from sklearn import datasets

from scipy.spatial import cKDTree

from sklearn.decomposition import PCA

from tensorflow.python.framework import ops

ops.reset_default_graph()

sess = tf.Session()

iris = datasets.load_iris()

num_pts = len(iris.data)

num_feats = len(iris.data[0])

# Set k-means parameters

# There are 3 types of iris flowers, see if we can predict them

k=3

generations = 25

data_points = tf.Variable(iris.data)

cluster_labels = tf.Variable(tf.zeros([num_pts], dtype=tf.int64))

# Randomly choose starting points

rand_starts = np.array([iris.data[np.random.choice(len(iris.data))] for _ in range(k)])

centroids = tf.Variable(rand_starts)

# To compute the distance between every data point and every centroid, we

#  tile the centroids into a tensor of shape (num_pts, k, num_feats).

centroid_matrix = tf.reshape(tf.tile(centroids, [num_pts, 1]), [num_pts, k, num_feats])

# Then we repeat each data point k (3) times to get the same (num_pts, k, num_feats) shape

point_matrix = tf.reshape(tf.tile(data_points, [1, k]), [num_pts, k, num_feats])

distances = tf.reduce_sum(tf.square(point_matrix - centroid_matrix), axis=2)

# Assign each point to its nearest centroid (smallest squared distance) with tf.argmin()

centroid_group = tf.argmin(distances, 1)

# Find the group average

def data_group_avg(group_ids, data):

    # Sum the points in each group (use k rather than a hard-coded 3)

    sum_total = tf.unsorted_segment_sum(data, group_ids, k)

    # Count the points in each group

    num_total = tf.unsorted_segment_sum(tf.ones_like(data), group_ids, k)

    # Calculate the average; a group with no members yields 0/0 = NaN here

    avg_by_group = sum_total / num_total

    return avg_by_group

means = data_group_avg(centroid_group, data_points)

update = tf.group(centroids.assign(means), cluster_labels.assign(centroid_group))

init = tf.global_variables_initializer()

sess.run(init)

for i in range(generations):

    print('Calculating gen {}, out of {}.'.format(i + 1, generations))

    _, centroid_group_count = sess.run([update, centroid_group])

    group_count = []

    for ix in range(k):

        group_count.append(np.sum(centroid_group_count==ix))

    print('Group counts: {}'.format(group_count))

[centers, assignments] = sess.run([centroids, cluster_labels])

# Find which cluster assignment corresponds to which true iris label

# First, we need a function that returns the most common element of a list

def most_common(my_list):

    return max(set(my_list), key=my_list.count)

label0 = most_common(list(assignments[0:50]))

label1 = most_common(list(assignments[50:100]))

label2 = most_common(list(assignments[100:150]))

group0_count = np.sum(assignments[0:50]==label0)

group1_count = np.sum(assignments[50:100]==label1)

group2_count = np.sum(assignments[100:150]==label2)

accuracy = (group0_count + group1_count + group2_count)/150.

print('Accuracy: {:.2}'.format(accuracy))

# Also plot the output

# First use PCA to project the 4-dimensional data onto 2 dimensions

pca_model = PCA(n_components=2)

reduced_data = pca_model.fit_transform(iris.data)

# Transform centers

reduced_centers = pca_model.transform(centers)

# Step size of mesh for plotting

h = .02

# Plot the decision boundary. For that, we will assign a color to each point in the mesh.

x_min, x_max = reduced_data[:, 0].min() - 1, reduced_data[:, 0].max() + 1

y_min, y_max = reduced_data[:, 1].min() - 1, reduced_data[:, 1].max() + 1

xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

# Get k-means classifications for the grid points

# Stack the flattened grid coordinates into an (n_grid_points, 2) array

xy_pts = np.column_stack([xx.ravel(), yy.ravel()])

mytree = cKDTree(reduced_centers)

dist, indexes = mytree.query(xy_pts)

# Put the result into a color plot

indexes = indexes.reshape(xx.shape)

plt.figure(1)

plt.clf()

plt.imshow(indexes, interpolation='nearest',

           extent=(xx.min(), xx.max(), yy.min(), yy.max()),

           cmap=plt.cm.Paired,

           aspect='auto', origin='lower')

# Plot each of the true iris data groups

symbols = ['o', '^', 'D']

label_name = ['Setosa', 'Versicolour', 'Virginica']

for i in range(3):

    temp_group = reduced_data[(i*50):(50)*(i+1)]

    plt.plot(temp_group[:, 0], temp_group[:, 1], symbols[i], markersize=10, label=label_name[i])

# Plot the centroids as a white X

plt.scatter(reduced_centers[:, 0], reduced_centers[:, 1],

            marker='x', s=169, linewidths=3,

            color='w', zorder=10)

plt.title('K-means clustering on Iris Dataset\n'

          'Centroids are marked with white cross')

plt.xlim(x_min, x_max)

plt.ylim(y_min, y_max)

plt.legend(loc='lower right')

plt.show()
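
The tile-and-reshape distance computation and the `data_group_avg` update in the script above can be hard to follow in graph form. As a rough illustration only, here is a minimal NumPy sketch of one such iteration. The function name `kmeans_step` and the tiny data set are hypothetical and not part of the original script, and keeping the old centroid for an empty cluster is an assumption made here (the TensorFlow version would produce NaN in that case).

```python
import numpy as np

def kmeans_step(points, centroids):
    """One k-means iteration: assign points to the nearest centroid, then average."""
    # Broadcasting (n, 1, d) - (1, k, d) -> (n, k, d) plays the role of tf.tile + tf.reshape
    diffs = points[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    # Squared Euclidean distance from every point to every centroid, shape (n, k)
    distances = np.sum(diffs ** 2, axis=2)
    # Index of the nearest centroid for each point, shape (n,)
    labels = np.argmin(distances, axis=1)
    new_centroids = centroids.copy()
    for j in range(centroids.shape[0]):
        members = points[labels == j]
        # Assumption: an empty cluster keeps its previous centroid
        if len(members) > 0:
            new_centroids[j] = members.mean(axis=0)
    return new_centroids, labels

# Hypothetical toy data: two well-separated groups in 2-D
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
new_centroids, labels = kmeans_step(points, centroids)
print(labels)         # [0 0 1 1]
print(new_centroids)  # rows [0, 0.5] and [10, 10.5]
```

Calling `kmeans_step` repeatedly, feeding `new_centroids` back in, corresponds to the `for i in range(generations)` loop in the script.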

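The accuracy step in the script relies on the Iris rows being sorted by class (slices 0:50, 50:100, 100:150). A slightly more general sketch of the same majority-label idea, assuming an explicit array of true labels, might look like the following. The function name `cluster_accuracy` and the sample arrays are hypothetical. Like the original, this can over-count if two true classes are mostly assigned to the same cluster.

```python
import numpy as np

def cluster_accuracy(assignments, true_labels):
    """Fraction of points that fall in the most common cluster of their true class."""
    correct = 0
    for label in np.unique(true_labels):
        # Cluster ids given to the points of this true class
        cluster_ids = assignments[true_labels == label]
        # Size of the most common cluster id within the class
        correct += np.bincount(cluster_ids).max()
    return correct / len(assignments)

# Hypothetical example: 6 points, 2 true classes, 3 cluster ids
true_labels = np.array([0, 0, 0, 1, 1, 1])
assignments = np.array([2, 2, 1, 0, 0, 0])
acc = cluster_accuracy(assignments, true_labels)
print('Accuracy: {:.2}'.format(acc))  # Accuracy: 0.83
```

With the script's variables, the call would be `cluster_accuracy(assignments, iris.target)`, since `iris.target` holds the true class of each row.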