The Rookie's Road: Machine Learning with Hierarchical Clustering, and My Own Understanding

Personally, I find this algorithm a bit underwhelming, and the final result it gives is not especially clear either.

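The principle is simple: from all the samples, pick the two with the smallest Euclidean distance, group them into one class, and replace them with their mean as a new sample, so the total number of samples drops by one. Repeat this until only one sample is left. The result is a tree in which every node has either two children or none.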
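The merge loop described above can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions, not the listing from this post: the names `euclidean`, `agglomerate`, and `history` are made up for the example, and each cluster is kept as a simple (vector, member indices) pair instead of a tree node.

```python
import math

def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(points):
    """Repeatedly merge the closest pair until one cluster remains.

    Each cluster is (vector, members); members lists original indices.
    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    clusters = [(list(p), [i]) for i, p in enumerate(points)]
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters whose vectors are closest.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = euclidean(clusters[i][0], clusters[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        (va, ma), (vb, mb) = clusters[i], clusters[j]
        # The new sample is the plain mean of the two merged vectors.
        merged = ([(x + y) / 2.0 for x, y in zip(va, vb)], ma + mb)
        history.append((ma, mb, d))
        # Delete the higher index first so the lower index stays valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return history

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
history = agglomerate(points)
for members_a, members_b, d in history:
    print(members_a, members_b, round(d, 3))
```

With these four points, the two tight pairs are merged first, and the last merge joins the two pairs at a much larger distance. That jump in merge distance is what lets you read classes off the tree.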

The algorithm can roughly separate the samples into classes, but I don't think it is very practical.

Source code

 from numpy import *

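 class cluster_node:
     # One node of the cluster tree: leaves are samples, inner nodes are merges.
     def __init__(self,vec,left=None,right=None,distance=0.0,id=None,count=1):
         self.left=left          # left child (None for a leaf)
         self.right = right      # right child (None for a leaf)
         self.vec = vec          # feature vector of this node
         self.distance = distance  # distance between the two merged children
         self.id = id            # sample index for leaves, negative for merged nodes
         self.count = count      # number of samples in the node (defaults to 1)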

 def L2dist(v1,v2):
     # Euclidean distance: square each difference first, then sum, then take the root.
     return sqrt(sum((v1-v2)**2))

 def L1dist(v1,v2):
     # Manhattan distance: sum of the absolute differences.
     return sum(abs(v1-v2))

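 def hcluster(features,distance=L2dist):
     # Cache of pairwise distances, keyed by the ids of the two clusters.
     distances={}
     # Merged clusters get negative ids so they never collide with sample indices.
     currentclustid=-1
     # Start with one leaf cluster per sample.
     clust=[cluster_node(array(features[i]),id=i) for i in range(len(features))]
     while len(clust)>1:
         lowestpair=(0,1)
         closest=distance(clust[0].vec,clust[1].vec)
         # Scan every pair to find the two closest clusters.
         for i in range(len(clust)):
             for j in range(i+1,len(clust)):
                 if (clust[i].id,clust[j].id) not in distances:
                     distances[(clust[i].id,clust[j].id)]=distance(clust[i].vec,clust[j].vec)
                 d=distances[(clust[i].id,clust[j].id)]
                 if d<closest:
                     closest=d
                     lowestpair=(i,j)
         # The merged cluster's vector is the mean of the two closest vectors.
         mergevec=[(clust[lowestpair[0]].vec[i]+clust[lowestpair[1]].vec[i])/2.0 for i in range(len(clust[lowestpair[1]].vec))]
         newcluster=cluster_node(array(mergevec),left=clust[lowestpair[0]],right=clust[lowestpair[1]],distance=closest,id=currentclustid)
         currentclustid-=1
         # Delete the higher index first so the lower index stays valid.
         del clust[lowestpair[1]]
         del clust[lowestpair[0]]
         clust.append(newcluster)
     # The single remaining node is the root of the tree.
     return clust[0]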

 def extract_clusters(clust,dist):
     # Cut the tree: return the subtrees whose merge distance is below dist.
     if clust.distance<dist:
         return [clust]
     else:
         cl=[]
         cr=[]
         if clust.left is not None:
             cl=extract_clusters(clust.left,dist=dist)
         if clust.right is not None:
             cr=extract_clusters(clust.right,dist=dist)
         return cl+cr

 def get_cluster_element(clust):
     # Return the ids of all original samples (leaves) under this node.
     if clust.id>=0:
         return [clust.id]
     else:
         cl=[]
         cr=[]
         if clust.left is not None:
             cl=get_cluster_element(clust.left)
         if clust.right is not None:
             cr=get_cluster_element(clust.right)
         return cl+cr

 def printclust(clust,labels=None,n=0):
     # Indent by depth; end='' keeps the indentation on the same line as the label.
     for i in range(n):print(' ',end='')
     if clust.id<0:
         # A negative id marks a merged (inner) node.
         print('-')
     else:
         if labels is None:print(clust.id)
         else:print(labels[clust.id])
     if clust.left is not None:printclust(clust.left,labels=labels,n=n+1)
     if clust.right is not None:printclust(clust.right,labels=labels,n=n+1)

 def getheight(clust):
     # Height here means the number of leaves under the node.
     if clust.left is None and clust.right is None:return 1
     return getheight(clust.left)+getheight(clust.right)

 def getdepth(clust):
     # Depth is the largest accumulated merge distance from this node down to a leaf.
     if clust.left is None and clust.right is None:return 0
     return max(getdepth(clust.left),getdepth(clust.right))+clust.distance

To save time, I only wrote the algorithm itself and left out a practical application.

The recursion in this code is used nicely, and so is the class that defines each node.
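To see why the recursion works so well here, the following is a small self-contained sketch on a hand-built tree. The `Node` class and the functions `leaves` and `cut` are simplified stand-ins that I made up for illustration; they follow the same idea as the node class and the recursive helpers in the listing, but they are not the same code.

```python
class Node:
    """Simplified tree node: leaves carry id >= 0, merged nodes id < 0."""
    def __init__(self, id, left=None, right=None, distance=0.0):
        self.id = id
        self.left = left
        self.right = right
        self.distance = distance

def leaves(node):
    """Collect the ids of all leaves below node, left to right."""
    if node.left is None and node.right is None:
        return [node.id]
    return leaves(node.left) + leaves(node.right)

def cut(node, threshold):
    """Return the subtrees whose merge distance is below threshold.

    Assumes threshold > 0, so every leaf (distance 0.0) stops the recursion.
    """
    if node.distance < threshold:
        return [node]
    return cut(node.left, threshold) + cut(node.right, threshold)

# Hand-built tree: (0,1) merged at 1.0, (2,3) merged at 1.0, root at 7.0.
a = Node(-1, Node(0), Node(1), distance=1.0)
b = Node(-2, Node(2), Node(3), distance=1.0)
root = Node(-3, a, b, distance=7.0)

groups = [leaves(n) for n in cut(root, threshold=2.0)]
print(groups)  # [[0, 1], [2, 3]]
```

Each function only handles one node and hands the rest to its two children, which is why a tree where every node has either two children or none is so convenient to process.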

