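Hands-on: Clustering Gene Expression Data with Hierarchical Clustering
Clustering yeast gene expression data by building a hierarchical tree
1. Principle
Compute a distance matrix from the gene expression data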

↓

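- Initially, every point is its own cluster
- At each step, pick the two clusters with the smallest distance, merge them, and then update the distance between the new cluster and every remaining cluster
- The distance between the new cluster and another cluster can be computed in either of two ways:
- ① the shortest distance between any two points of the two clusters (single linkage)
- ② the average of the distances over all pairs of points from the two clusters (average linkage)

- Add the new cluster to the distance matrix, replacing the two smaller clusters it was built from
- Repeat until only one cluster remains; the sequence of merges forms a tree
- Cutting the tree at a suitable level gives the desired number of clusters, 6 in this example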
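The merge loop described in the list above can be sketched on a toy distance matrix. This is a minimal illustration under stated assumptions, not the implementation used later in this post: the function name `agglomerate`, its `linkage` argument, and the 4-point matrix are all made up for the example.

```python
def agglomerate(dist, k, linkage="average"):
    """Merge clusters until k remain; returns sorted lists of point indices.

    dist is a symmetric matrix (list of lists) of point-to-point distances.
    """
    clusters = {i: [i] for i in range(len(dist))}
    # d[a][b] holds the current cluster-to-cluster distances.
    d = {a: {b: dist[a][b] for b in clusters if b != a} for a in clusters}
    next_id = len(dist)
    while len(clusters) > k:
        # Pick the closest pair of clusters.
        a, b = min(((p, q) for p in d for q in d[p] if p < q),
                   key=lambda pq: d[pq[0]][pq[1]])
        na, nb = len(clusters[a]), len(clusters[b])
        new_row = {}
        for c in clusters:
            if c in (a, b):
                continue
            if linkage == "single":
                # Rule 1: shortest distance between the two clusters.
                new_row[c] = min(d[a][c], d[b][c])
            else:
                # Rule 2: average over all point pairs (size-weighted mean).
                new_row[c] = (na * d[a][c] + nb * d[b][c]) / (na + nb)
        # Replace a and b with the merged cluster in the distance table.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        del d[a], d[b]
        for c in d:
            del d[c][a], d[c][b]
            d[c][next_id] = new_row[c]
        d[next_id] = new_row
        next_id += 1
    return sorted(sorted(c) for c in clusters.values())


# Four points on a line at positions 0, 1, 5 and 6.
dist = [[0, 1, 5, 6],
        [1, 0, 4, 5],
        [5, 4, 0, 1],
        [6, 5, 1, 0]]
print(agglomerate(dist, 2))  # [[0, 1], [2, 3]]
```

Stopping the loop when `k` clusters remain is equivalent to building the full tree and cutting it at the level where `k` branches are left.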

Expression data for 230 yeast genes
http://bioinformaticsalgorithms.com/data/realdatasets/Clustering/230genes_log_expression.txt
2. Implementation in Python with scikit-learn
Installing scikit-learn
See https://scikit-learn.org/stable/install.html
Code (Python 3.7)
from os.path import dirname, join

import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import AgglomerativeClustering


def InputData(dataset):
    # Split each line into fields; the first line is the header.
    dataset = [line.split() for line in dataset]
    points = []
    for line in dataset[1:]:
        # Rows have either 10 or 9 fields; the last 7 are the expression values.
        if len(line) == 10:
            points.append(list(map(float, line[3:])))
        elif len(line) == 9:
            points.append(list(map(float, line[2:])))
    return points


if __name__ == '__main__':
    # The data file is expected next to this script.
    path = join(dirname(__file__), '230genes_log_expression.txt')
    with open(path) as fin:
        dataset = fin.read().strip().split('\n')
    X = np.array(InputData(dataset))

    # Default settings: Ward linkage on Euclidean distance.
    clustering = AgglomerativeClustering(n_clusters=6).fit(X)
    lb = clustering.labels_

    # Group row indices by cluster label.
    clusters = [[] for i in range(6)]
    for i in range(len(X)):
        clusters[lb[i]].append(i)

    # One subplot per cluster, one line per gene across its 7 values.
    fig = plt.figure()
    x = [i for i in range(1, 8)]
    for c in range(6):
        plt.subplot(231 + c)
        for i in clusters[c]:
            plt.plot(x, X[i], linewidth=0.5, linestyle='-', marker='')
    plt.show()
Result

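Note
The first time sklearn is imported under Python 3.7, the interpreter prints a warning (a DeprecationWarning, not an error):
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses import imp
The imp module has been deprecated since Python 3.4 in favour of importlib, but it still works in Python 3.7, so the script runs normally despite the message. The warning comes from the copy of joblib bundled with this scikit-learn release, and upgrading scikit-learn usually removes it. Editing the file named in the message to change import imp into import importlib also silences it, but the two modules do not expose the same functions, so that edit can break code that still calls imp.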
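Since the message is only a warning, one option is to silence it around the import with the standard `warnings` module. A minimal sketch: the helper name `import_quietly` is hypothetical, and `json` stands in for `sklearn.cluster` so that the snippet runs without scikit-learn installed.

```python
import importlib
import warnings


def import_quietly(module_name):
    """Import a module while ignoring DeprecationWarning raised during the import."""
    with warnings.catch_warnings():
        # The filter only applies inside this block; it is restored on exit.
        warnings.simplefilter("ignore", DeprecationWarning)
        return importlib.import_module(module_name)


# "json" is a stand-in; in this post's setting it would be "sklearn.cluster".
mod = import_quietly("json")
print(mod.__name__)  # json
```

Other warnings, and DeprecationWarnings raised outside the `with` block, are still reported as usual.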
3. Implementing the hierarchical tree by hand
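The code in this section measures the dissimilarity between two genes as 1 minus the Pearson correlation of their expression profiles, so the distance ranges from 0 (profiles rise and fall together) to 2 (profiles move in opposite directions). The following standalone sketch shows that formula on toy vectors; `pearson_distance` and the sample values are illustrative and not part of the original script.

```python
import math


def pearson_distance(x, y):
    """Return 1 - r, where r is the Pearson correlation of x and y."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x)
    sy = sum((b - my) ** 2 for b in y)
    return 1 - cov / math.sqrt(sx * sy)


up = [1.0, 2.0, 3.0]
print(pearson_distance(up, [2.0, 4.0, 6.0]))  # same shape, different scale: 0.0
print(pearson_distance(up, [3.0, 2.0, 1.0]))  # opposite trend: 2.0
```

Because only the shape of the profile matters, two genes with very different expression levels but the same pattern end up close together, which is usually what is wanted when grouping co-regulated genes.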
'''
coder Lokwongho 2018.11
'''
from os.path import dirname, join
import math

import numpy as np


######## Get the Distance Matrix ########
def PearsonCorrelationDistance(Expres):
    # Distance between genes i and j is 1 - Pearson correlation of their profiles.
    n = len(Expres)
    l = len(Expres[0])
    Distance = np.zeros(shape=[n, n], dtype=float)
    u = [sum(i) / l for i in Expres]  # mean of each profile
    Sigma = []  # sum of squared deviations of each profile
    for i in range(n):
        sig = 0
        for x in range(l):
            sig += (Expres[i][x] - u[i]) ** 2
        Sigma.append(sig)
    for i in range(n):
        for j in range(i, n):
            sigU = 0
            for x in range(l):
                sigU += (Expres[i][x] - u[i]) * (Expres[j][x] - u[j])
            Distance[i][j] = 1 - sigU / math.sqrt(Sigma[i] * Sigma[j])
            Distance[j][i] = Distance[i][j]
    return Distance


######## Build the Hierarchical Tree ########
def locateMin(size):
    # Find the closest pair among the clusters that have not been merged yet.
    minVal = 999999
    minId = [-1, -1]
    for x in range(size):
        for y in range(size):
            if x != y and matrix[x][y] < minVal and delete[x] == False and delete[y] == False:
                minVal = matrix[x][y]
                minId = [x, y]
    return [minId, minVal]


def buildTree():
    global matrix
    global delete
    Tree = [[] for i in range(maxn)]  # adjacency list of [neighbour, edge length]
    ptr = n  # index of the next internal node
    delete = [False for i in range(maxn)]
    num = [0 for i in range(maxn)]  # number of leaves under each node
    for i in range(n):
        num[i] = 1
    weight = [0 for i in range(maxn)]  # height of each node
    while ptr < maxn:
        [[minX, minY], minVal] = locateMin(ptr)
        Tree[ptr].append([minX, minVal / 2 - weight[minX]])
        Tree[ptr].append([minY, minVal / 2 - weight[minY]])
        Tree[minX].append([ptr, minVal / 2 - weight[minX]])
        Tree[minY].append([ptr, minVal / 2 - weight[minY]])
        weight[ptr] = minVal / 2
        delete[minX] = True
        delete[minY] = True
        num[ptr] = num[minX] + num[minY]
        # Average linkage: size-weighted mean of the distances from the two merged clusters.
        for i in range(ptr):
            if delete[i] == False:
                tmp = (matrix[minX][i] * num[minX] + matrix[minY][i] * num[minY]) / (num[minX] + num[minY])
                matrix[ptr][i] = tmp
                matrix[i][ptr] = tmp
        ptr += 1
    return Tree


def HierarchicalClustering(DMatrix):
    global maxn
    global n
    global Tree
    global matrix
    n = len(DMatrix)
    maxn = 2 * n - 1  # n leaves plus n - 1 internal nodes
    matrix = np.zeros(shape=(maxn, maxn))
    for i in range(n):
        matrix[i][:n] = DMatrix[i]
    Tree = buildTree()


######## Input and output Data ########
def InputData(dataset):
    # Split each line into fields; the first line is the header.
    dataset = [line.split() for line in dataset]
    name = [item[1] for item in dataset[1:]]
    points = []
    for line in dataset[1:]:
        # Rows have either 10 or 9 fields; the last 7 are the expression values.
        if len(line) == 10:
            points.append(list(map(float, line[3:])))
        elif len(line) == 9:
            points.append(list(map(float, line[2:])))
    return [points, name]


'''
Reference to 'whoami_T' in csdn for print the tree
https://blog.csdn.net/weixin_39722498/article/details/81534247
'''
def tree(lst):
    # Print a nested list as a tree diagram, to the console and to the file f.
    l = len(lst)
    if l == 0:
        print('-' * 3)
    else:
        for i, j in enumerate(lst):
            if i != 0:
                f.write(tabs[0])
                print(tabs[0], end='')
            if l == 1:
                s = '=' * 3
            elif i == 0:
                s = '┬' + '-' * 2
            elif i + 1 == l:
                s = '└' + '─' * 2
            else:
                s = '├' + '─' * 2
            f.write(s)
            print(s, end='')
            if isinstance(j, list) or isinstance(j, tuple):
                if i + 1 == l:
                    tabs[0] += blank[0] * 3
                else:
                    tabs[0] += '│' + blank[0] * 2
                tree(j)
            else:
                print(name[j])
                f.write(name[j] + "\n")
    tabs[0] = tabs[0][:-3]


def traversalTree(v):
    # Convert the subtree rooted at v into nested lists of leaf indices.
    global visited
    if v < n:
        return [v]
    visited[v] = 1
    items = []
    for i in Tree[v]:
        if visited[i[0]] == 0:
            items += traversalTree(i[0])
    return [items]


def outPut():
    global visited
    global tabs
    global blank
    # Padding character: chr(183) == '·'. chr(32) == ' ' can be used on Linux,
    # and chr(12288) (full-width space) in the Windows console.
    blank = [chr(183)]
    tabs = ['']
    visited = [0 for i in range(maxn)]
    TreeList = traversalTree(maxn - 1)
    tree(TreeList)


###########################
if __name__ == '__main__':
    # The data file is expected next to this script.
    path = join(dirname(__file__), '230genes_log_expression.txt')
    with open(path) as fin:
        dataset = fin.read().strip().split('\n')
    [points, name] = InputData(dataset)
    DMatrix = PearsonCorrelationDistance(points)
    HierarchicalClustering(DMatrix)
    # The tree is printed to the console and also written to the file "output".
    with open("output", "w", encoding="utf-8") as f:
        outPut()
===┬--┬--┬--┬--YJL109C
···│··│··│··└──YNL174W
···│··│··└──┬--YLR129w
···│··│·····└──┬--YGR264C
···│··│········└──┬--YLR449W
···│··│···········└──YNL002C
···│··└──┬--┬--YOL039W
···│·····│··└──┬--YNL067W
···│·····│·····└──┬--YDR382W
···│·····│········└──YGL031C
···│·····└──┬--YBR238C
···│········└──┬--YNL065W
···│···········└──┬--YNL303W
···│··············└──┬--┬--┬--YLR249W
···│·················│··│··└──YPL131W
···│·················│··└──┬--┬--YBR032W
···│·················│·····│··└──┬--YNL069C
···│·················│·····│·····└──┬--YBL027W
···│·················│·····│········└──YNL119W
···│·················│·····└──┬--┬--┬--┬--┬--YML063W
···│·················│········│··│··│··│··└──┬--YHL015W
···│·················│········│··│··│··│·····└──YDL136w
···│·················│········│··│··│··└──┬--YLR062C
···│·················│········│··│··│·····└──┬--YLL045c
···│·················│········│··│··│········└──YBR181C
···│·················│········│··│··└──┬--YPL220W
···│·················│········│··│·····└──┬--┬--YDR418W
···│·················│········│··│········│··└──YIL018W
···│·················│········│··│········└──┬--YER131w
···│·················│········│··│···········└──YLR198C
···│·················│········│··└──┬--┬--YGL102C
···│·················│········│·····│··└──YJL190C
···│·················│········│·····└──┬--┬--YBR189W
···│·················│········│········│··└──┬--YJL136C
···│·················│········│········│·····└──YBR191W
···│·················│········│········└──┬--YMR121C
···│·················│········│···········└──┬--YLR076C
···│·················│········│··············└──YLR325C
···│·················│········└──┬--┬--┬--YJR145C
···│·················│···········│··│··└──YLR344W
···│·················│···········│··└──┬--YHL033C
···│·················│···········│·····└──YOL040C
···│·················│···········└──┬--┬--┬--YDR064W
···│·················│··············│··│··└──┬--YHR089C
···│·················│··············│··│·····└──YDL208W
···│·················│··············│··└──┬--YIL069C
···│·················│··············│·····└──┬--YNL096C
···│·················│··············│········└──┬--YOR234C
···│·················│··············│···········└──┬--YDR417C
···│·················│··············│··············└──YGR148C
···│·················│··············└──┬--┬--YKL009W
···│·················│·················│··└──┬--YNL301C
···│·················│·················│·····└──YOL120C
···│·················│·················└──┬--┬--YLR340W
···│·················│····················│··└──┬--YJR123W
···│·················│····················│·····└──YLR048w
···│·················│····················└──┬--┬--YGR214W
···│·················│·······················│··└──YOR309C
···│·················│·······················└──┬--┬--YOR310C
···│·················│··························│··└──YOR312C
···│·················│··························└──┬--YEL054c
···│·················│·····························└──YKR059W
···│·················└──┬--┬--┬--┬--┬--YGL076C
···│····················│··│··│··│··└──YNL175C
···│····················│··│··│··└──┬--YPR137W
···│····················│··│··│·····└──YDL083C
···│····················│··│··└──┬--┬--YJL148W
···│····················│··│·····│··└──┬--YGL078C
···│····················│··│·····│·····└──YAL012W
···│····················│··│·····└──┬--YMR217W
···│····················│··│········└──┬--YGR103W
···│····················│··│···········└──YLR196W
···│····················│··└──┬--YLR186W
···│····················│·····└──┬--YDR025W
···│····················│········└──YBR247C
···│····················└──┬--┬--┬--┬--┬--YHR128W
···│·······················│··│··│··│··└──YKL081W
···│·······················│··│··│··└──┬--┬--YLR180W
···│·······················│··│··│·····│··└──YNL141W
···│·······················│··│··│·····└──┬--YDR060w
···│·······················│··│··│········└──YLR413W
···│·······················│··│··└──┬--┬--┬--YMR290C
···│·······················│··│·····│··│··└──YPL012W
···│·······················│··│·····│··└──┬--YDR144C
···│·······················│··│·····│·····└──YIL053W
···│·······················│··│·····└──┬--┬--YJL177W
···│·······················│··│········│··└──YBR249C
···│·······················│··│········└──┬--YLL044W
···│·······················│··│···········└──YLR448W
···│·······················│··└──┬--┬--YBR048W
···│·······················│·····│··└──YMR131C
···│·······················│·····└──┬--┬--YLR339C
···│·······················│········│··└──YNR053C
···│·······················│········└──┬--YPL226W
···│·······················│···········└──┬--YDR398W
···│·······················│··············└──YAL003W
···│·······················└──┬--YLR355C
···│··························└──┬--YGR160W
···│·····························└──YMR093W
···└──┬--YFR053C
······└──┬--┬--┬--YDL085w
·········│··│··└──┬--YBR117C
·········│··│·····└──┬--┬--YBR116C
·········│··│········│··└──┬--YKL187C
·········│··│········│·····└──YLR267W
·········│··│········└──┬--YDL199c
·········│··│···········└──┬--YBL043W
·········│··│··············└──YBL049W
·········│··└──┬--┬--YBL108W
·········│·····│··└──YCL025C
·········│·····└──┬--┬--YBR051W
·········│········│··└──┬--┬--┬--YNL117W
·········│········│·····│··│··└──YCR010C
·········│········│·····│··└──┬--YER024w
·········│········│·····│·····└──YGR067C
·········│········│·····└──┬--YDL215C
·········│········│········└──┬--YLR377C
·········│········│···········└──┬--YER065c
·········│········│··············└──YKR097W
·········│········└──┬--┬--YLR142w
·········│···········│··└──┬--YIL125W
·········│···········│·····└──┬--YNL195C
·········│···········│········└──┬--YJL089W
·········│···········│···········└──YAL054C
·········│···········└──┬--┬--YJR095W
·········│··············│··└──┬--YOL084W
·········│··············│·····└──┬--YLR174W
·········│··············│········└──┬--YHR096C
·········│··············│···········└──YJL045W
·········│··············└──┬--YEL012w
·········│·················└──┬--YBL045C
·········│····················└──┬--YGL259W
·········│·······················└──YLR312C
·········└──┬--┬--YCR021c
············│··└──┬--YDR343C
············│·····└──┬--YER053c
············│········└──YMR250W
············└──┬--┬--┬--┬--┬--YIL136W
···············│··│··│··│··└──┬--YMR090W
···············│··│··│··│·····└──YNL305C
···············│··│··│··└──┬--┬--YHR087W
···············│··│··│·····│··└──YKL142W
···············│··│··│·····└──┬--YDR031w
···············│··│··│········└──YJR096W
···············│··│··└──┬--YDR516C
···············│··│·····└──YIL111W
···············│··└──┬--┬--YDR342C
···············│·····│··└──YBR183W
···············│·····└──┬--┬--┬--┬--YBR072W
···············│········│··│··│··└──┬--YKL085W
···············│········│··│··│·····└──YLR327C
···············│········│··│··└──┬--┬--YGR008C
···············│········│··│·····│··└──┬--YNL200C
···············│········│··│·····│·····└──┬--YNL173C
···············│········│··│·····│········└──YOL053C
···············│········│··│·····└──┬--YGR248W
···············│········│··│········└──YNL015W
···············│········│··└──┬--YLR258W
···············│········│·····└──┬--YKL103C
···············│········│········└──┬--YGL037C
···············│········│···········└──YLR178C
···············│········└──┬--┬--┬--YHL021C
···············│···········│··│··└──YML128C
···············│···········│··└──┬--YFR015C
···············│···········│·····└──YMR105C
···············│···········└──┬--┬--YFL014W
···············│··············│··└──YOR374W
···············│··············└──┬--YGR244C
···············│·················└──┬--YLL041c
···············│····················└──┬--YER150w
···············│·······················└──YNL160W
···············└──┬--┬--┬--┬--┬--YLL026w
··················│··│··│··│··└──YDR258C
··················│··│··│··└──┬--YDR258C
··················│··│··│·····└──YLR149C
··················│··│··└──┬--┬--YBR139W
··················│··│·····│··└──YLR304C
··················│··│·····└──┬--┬--YEL024w
··················│··│········│··└──┬--YDR529C
··················│··│········│·····└──YPR149W
··················│··│········└──┬--YDR178W
··················│··│···········└──YEL011w
··················│··└──┬--┬--┬--YHR051W
··················│·····│··│··└──┬--YNL052W
··················│·····│··│·····└──YNR001C
··················│·····│··└──┬--YKL141W
··················│·····│·····└──YOR065W
··················│·····└──┬--YBL100C
··················│········└──YPR184W
··················└──┬--┬--YIL162W
·····················│··└──┬--┬--┬--YKL193C
·····················│·····│··│··└──┬--YIL113W
·····················│·····│··│·····└──YMR170C
·····················│·····│··└──┬--┬--YOL032W
·····················│·····│·····│··└──┬--YKL151C
·····················│·····│·····│·····└──┬--YDR272W
·····················│·····│·····│········└──YCL035C
·····················│·····│·····└──┬--YFL054C
·····················│·····│········└──┬--YNL274C
·····················│·····│···········└──┬--YLR270W
·····················│·····│··············└──┬--YDR272W
·····················│·····│·················└──YHR104W
·····················│·····└──┬--YLR356W
·····················│········└──YBR241C
·····················└──┬--YDL204w
························└──┬--┬--┬--YBL015W
···························│··│··└──┬--YKL217W
···························│··│·····└──YMR107W
···························│··└──┬--YMR191W
···························│·····└──┬--YKL109W
···························│········└──YML054C
···························└──┬--YGL191W
······························└──┬--┬--┬--YGR243W
·································│··│··└──┬--YBL048W
·································│··│·····└──YGR236C
·································│··└──┬--YDR070c
·································│·····└──┬--YBR147W
·································│········└──┬--YNL134C
·································│···········└──YNL194C
·································└──┬--┬--┬--YBL064C
····································│··│··└──YGR043C
····································│··└──┬--┬--YDR171W
····································│·····│··└──YBL078C
····································│·····└──┬--YKL026C
····································│········└──┬--YOR215C
····································│···········└──┬--YFR033C
····································│··············└──YGR088W
····································└──┬--YER067w
·······································└──┬--YDR533C
··········································└──YOR178C
Hierarchical Tree
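The printed tree is easier to inspect if the genes under a given branch can be listed directly. The sketch below works on the same nested-list shape that `traversalTree` returns (integers for leaves, lists for internal nodes, with the root wrapped in one extra single-element list) and reads off the two top-level groups. The helper names `leaves` and `top_level_groups` and the toy input are assumptions made for this example.

```python
def leaves(node):
    """Flatten a nested-list subtree into the list of leaf indices it contains."""
    if isinstance(node, int):
        return [node]
    out = []
    for child in node:
        out.extend(leaves(child))
    return out


def top_level_groups(tree_list):
    """Return the leaf indices under each child of the root."""
    node = tree_list
    # Unwrap the extra single-element list around the root.
    while isinstance(node, list) and len(node) == 1:
        node = node[0]
    if isinstance(node, int):
        return [[node]]
    return [leaves(child) for child in node]


# Toy tree with 5 leaves: ((0, (1, 2)), (3, 4)), wrapped like traversalTree's result.
toy = [[[0, [1, 2]], [3, 4]]]
print(top_level_groups(toy))  # [[0, 1, 2], [3, 4]]
```

Mapping the returned indices through the `name` list gives the gene names in each group.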