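AP (Affinity Propagation) Study
To be completed...
The AP algorithm, i.e. affinity propagation, is a clustering algorithm proposed by Brendan J. Frey and Delbert Dueck in Science in 2007 (article link, Wikipedia).
So far I have only made a preliminary study of the MATLAB source code provided on the official website: apcluster.m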
%APCLUSTER Affinity Propagation Clustering (Frey/Dueck, Science 2007)
% [idx,netsim,dpsim,expref]=APCLUSTER(s,p) clusters data, using a set
% of real-valued pairwise data point similarities as input. Clusters
% are each represented by a cluster center data point (the "exemplar").
% The method is iterative and searches for clusters so as to maximize
% an objective function, called net similarity.
%
% For N data points, there are potentially N^2-N pairwise similarities;
% this can be input as an N-by-N matrix 's', where s(i,k) is the
% similarity of point i to point k (s(i,k) needn't equal s(k,i)). In
% fact, only a smaller number of relevant similarities are needed; if
% only M similarity values are known (M < N^2-N) they can be input as
% an M-by-3 matrix with each row being an (i,j,s(i,j)) triple.
%
% APCLUSTER automatically determines the number of clusters based on
% the input preference 'p', a real-valued N-vector. p(i) indicates the
% preference that data point i be chosen as an exemplar. Often a good
% choice is to set all preferences to median(s); the number of clusters
% identified can be adjusted by changing this value accordingly. If 'p'
% is a scalar, APCLUSTER assumes all preferences are that shared value.
%
% The clustering solution is returned in idx. idx(j) is the index of
% the exemplar for data point j; idx(j)==j indicates data point j
% is itself an exemplar. The sum of the similarities of the data points to
% their exemplars is returned as dpsim, the sum of the preferences of
% the identified exemplars is returned in expref and the net similarity
% objective function returned is their sum, i.e. netsim=dpsim+expref.
%
% [ ... ]=apcluster(s,p,'NAME',VALUE,...) allows you to specify
% optional parameter name/value pairs as follows:
%
%   'maxits'     maximum number of iterations (default: 1000)
%   'convits'    if the estimated exemplars stay fixed for convits
%                iterations, APCLUSTER terminates early (default: 100)
%   'dampfact'   update equation damping level in [0.5, 1). Higher
%                values correspond to heavy damping, which may be needed
%                if oscillations occur. (default: 0.9)
%   'plot'       (no value needed) Plots netsim after each iteration
%   'details'    (no value needed) Outputs iteration-by-iteration
%                details (greater memory requirements)
%   'nonoise'    (no value needed) APCLUSTER adds a small amount of
%                noise to 's' to prevent degenerate cases; this disables that.
%
% Copyright (c) B.J. Frey & D. Dueck (2006). This software may be
% freely used and distributed for non-commercial purposes.
% (RUN APCLUSTER WITHOUT ARGUMENTS FOR DEMO CODE)
function [idx,netsim,dpsim,expref]=apcluster(s,p,varargin);
if nargin==0, % display demo
    fprintf('Affinity Propagation (APCLUSTER) sample/demo code\n\n');
    fprintf('N=100; x=rand(N,2); % Create N, 2-D data points\n');
    fprintf('M=N*N-N; s=zeros(M,3); % Make ALL N^2-N similarities\n');
    fprintf('j=1;\n');
    fprintf('for i=1:N\n');
    fprintf(' for k=[1:i-1,i+1:N]\n');
    fprintf(' s(j,1)=i; s(j,2)=k; s(j,3)=-sum((x(i,:)-x(k,:)).^2);\n');
    fprintf(' j=j+1;\n');
    fprintf(' end;\n');
    fprintf('end;\n');
    fprintf('p=median(s(:,3)); % Set preference to median similarity\n');
    fprintf('[idx,netsim,dpsim,expref]=apcluster(s,p,''plot'');\n');
    fprintf('fprintf(''Number of clusters: %%d\\n'',length(unique(idx)));\n');
    fprintf('fprintf(''Fitness (net similarity): %%g\\n'',netsim);\n');
    fprintf('figure; % Make a figures showing the data and the clusters\n');
    fprintf('for i=unique(idx)''\n');
    fprintf(' ii=find(idx==i); h=plot(x(ii,1),x(ii,2),''o''); hold on;\n');
    fprintf(' col=rand(1,3); set(h,''Color'',col,''MarkerFaceColor'',col);\n');
    fprintf(' xi1=x(i,1)*ones(size(ii)); xi2=x(i,2)*ones(size(ii)); \n');
    fprintf(' line([x(ii,1),xi1]'',[x(ii,2),xi2]'',''Color'',col);\n');
    fprintf('end;\n');
    fprintf('axis equal tight;\n\n');
    return;
end;
start = clock;
% Handle arguments to function
if nargin<2 error('Too few input arguments');
else
    maxits=1000; convits=100; lam=0.9; plt=0; details=0; nonoise=0;
    i=1;
    while i<=length(varargin)
        if strcmp(varargin{i},'plot')
            plt=1; i=i+1;
        elseif strcmp(varargin{i},'details')
            details=1; i=i+1;
        elseif strcmp(varargin{i},'sparse')
            % [idx,netsim,dpsim,expref]=apcluster_sparse(s,p,varargin{:});
            fprintf('''sparse'' argument no longer supported; see website for additional software\n\n');
            return;
        elseif strcmp(varargin{i},'nonoise')
            nonoise=1; i=i+1;
        elseif strcmp(varargin{i},'maxits')
            maxits=varargin{i+1};
            i=i+2;
            if maxits<=0 error('maxits must be a positive integer'); end;
        elseif strcmp(varargin{i},'convits')
            convits=varargin{i+1};
            i=i+2;
            if convits<=0 error('convits must be a positive integer'); end;
        elseif strcmp(varargin{i},'dampfact')
            lam=varargin{i+1};
            i=i+2;
            if (lam<0.5)||(lam>=1)
                error('dampfact must be >= 0.5 and < 1');
            end;
        else i=i+1;
        end;
    end;
end;
if lam>0.9
    fprintf('\n*** Warning: Large damping factor in use. Turn on plotting\n');
    fprintf(' to monitor the net similarity. The algorithm will\n');
    fprintf(' change decisions slowly, so consider using a larger value\n');
    fprintf(' of convits.\n\n');
end;

% Check that standard arguments are consistent in size
if length(size(s))~=2 error('s should be a 2D matrix');
elseif length(size(p))>2 error('p should be a vector or a scalar');
elseif size(s,2)==3
    tmp=max(max(s(:,1)),max(s(:,2)));
    if length(p)==1 N=tmp; else N=length(p); end;
    if tmp>N
        error('data point index exceeds number of data points');
    elseif min(min(s(:,1)),min(s(:,2)))<=0
        error('data point indices must be >= 1');
    end;
elseif size(s,1)==size(s,2)
    N=size(s,1);
    if (length(p)~=N)&&(length(p)~=1)
        error('p should be scalar or a vector of size N');
    end;
else error('s must have 3 columns or be square'); end;

% Construct similarity matrix
if N>3000
    fprintf('\n*** Warning: Large memory request. Consider activating\n');
    fprintf(' the sparse version of APCLUSTER.\n\n');
end;
if size(s,2)==3 && size(s,1)~=3,
    S=-Inf*ones(N,N,class(s));
    for j=1:size(s,1), S(s(j,1),s(j,2))=s(j,3); end;
else S=s;
end;

if S==S', symmetric=true; else symmetric=false; end;
realmin_=realmin(class(s)); realmax_=realmax(class(s));

% In case user did not remove degeneracies from the input similarities,
% avoid degenerate solutions by adding a small amount of noise to the
% input similarities
if ~nonoise
    rns=randn('state'); randn('state',0);
    S=S+(eps*S+realmin_*100).*rand(N,N);
    randn('state',rns);
end;

% Place preferences on the diagonal of S
if length(p)==1 for i=1:N S(i,i)=p; end;
else for i=1:N S(i,i)=p(i); end;
end;

% Numerical stability -- replace -INF with -realmax
n=find(S<-realmax_); if ~isempty(n), warning('-INF similarities detected; changing to -REALMAX to ensure numerical stability'); S(n)=-realmax_; end; clear('n');
if ~isempty(find(S>realmax_,1)), error('+INF similarities detected; change to a large positive value (but smaller than +REALMAX)'); end;

% Allocate space for messages, etc
dS=diag(S); A=zeros(N,N,class(s)); R=zeros(N,N,class(s)); t=1;
if plt, netsim=zeros(1,maxits+1); end;
if details
    idx=zeros(N,maxits+1);
    netsim=zeros(1,maxits+1);
    dpsim=zeros(1,maxits+1);
    expref=zeros(1,maxits+1);
end;

% Execute parallel affinity propagation updates
e=zeros(N,convits); dn=0; i=0;
if symmetric, ST=S; else ST=S'; end; % saves memory if it's symmetric
while ~dn
    i=i+1;

    % Compute responsibilities
    A=A'; R=R';
    for ii=1:N,
        old = R(:,ii);
        AS = A(:,ii) + ST(:,ii); [Y,I]=max(AS); AS(I)=-Inf;
        [Y2,I2]=max(AS);
        R(:,ii)=ST(:,ii)-Y;
        R(I,ii)=ST(I,ii)-Y2;
        R(:,ii)=(1-lam)*R(:,ii)+lam*old; % Damping
        R(R(:,ii)>realmax_,ii)=realmax_;
    end;
    A=A'; R=R';

    % Compute availabilities
    for jj=1:N,
        old = A(:,jj);
        Rp = max(R(:,jj),0); Rp(jj)=R(jj,jj);
        A(:,jj) = sum(Rp)-Rp;
        dA = A(jj,jj); A(:,jj) = min(A(:,jj),0); A(jj,jj) = dA;
        A(:,jj) = (1-lam)*A(:,jj) + lam*old; % Damping
    end;

    % Check for convergence
    E=((diag(A)+diag(R))>0); e(:,mod(i-1,convits)+1)=E; K=sum(E);
    if i>=convits || i>=maxits,
        se=sum(e,2);
        unconverged=(sum((se==convits)+(se==0))~=N);
        if (~unconverged&&(K>0))||(i==maxits) dn=1; end;
    end;

    % Handle plotting and storage of details, if requested
    if plt||details
        if K==0
            tmpnetsim=nan; tmpdpsim=nan; tmpexpref=nan; tmpidx=nan;
        else
            I=find(E); notI=find(~E); [tmp c]=max(S(:,I),[],2); c(I)=1:K; tmpidx=I(c);
            tmpdpsim=sum(S(sub2ind([N N],notI,tmpidx(notI))));
            tmpexpref=sum(dS(I));
            tmpnetsim=tmpdpsim+tmpexpref;
        end;
    end;
    if details
        netsim(i)=tmpnetsim; dpsim(i)=tmpdpsim; expref(i)=tmpexpref;
        idx(:,i)=tmpidx;
    end;
    if plt,
        netsim(i)=tmpnetsim;
        figure(234);
        plot(((netsim(1:i)/10)*100)/10,'r-'); xlim([0 i]); % plot barely-finite stuff as infinite
        xlabel('# Iterations');
        ylabel('Fitness (net similarity) of quantized intermediate solution');
        % drawnow;
    end;
end; % iterations
I=find((diag(A)+diag(R))>0); K=length(I); % Identify exemplars
if K>0
    [tmp c]=max(S(:,I),[],2); c(I)=1:K; % Identify clusters
    % Refine the final set of exemplars and clusters and return results
    for k=1:K ii=find(c==k); [y j]=max(sum(S(ii,ii),1)); I(k)=ii(j(1)); end;
    notI=reshape(setdiff(1:N,I),[],1);
    [tmp c]=max(S(:,I),[],2); c(I)=1:K; tmpidx=I(c);
    tmpdpsim=sum(S(sub2ind([N N],notI,tmpidx(notI))));
    tmpexpref=sum(dS(I));
    tmpnetsim=tmpdpsim+tmpexpref;
else
    tmpidx=nan*ones(N,1); tmpnetsim=nan; tmpdpsim=nan; tmpexpref=nan;
end;
if details
    netsim(i+1)=tmpnetsim; netsim=netsim(1:i+1);
    dpsim(i+1)=tmpdpsim; dpsim=dpsim(1:i+1);
    expref(i+1)=tmpexpref; expref=expref(1:i+1);
    idx(:,i+1)=tmpidx; idx=idx(:,1:i+1);
else
    netsim=tmpnetsim; dpsim=tmpdpsim; expref=tmpexpref; idx=tmpidx;
end;
if plt||details
    fprintf('\nNumber of exemplars identified: %d (for %d data points)\n',K,N);
    fprintf('Net similarity: %g\n',tmpnetsim);
    fprintf(' Similarities of data points to exemplars: %g\n',dpsim(end));
    fprintf(' Preferences of selected exemplars: %g\n',tmpexpref);
    fprintf('Number of iterations: %d\n\n',i);
    fprintf('Elapsed time: %g sec\n',etime(clock,start));
end;
if unconverged
    fprintf('\n*** Warning: Algorithm did not converge. Activate plotting\n');
    fprintf(' so that you can monitor the net similarity. Consider\n');
    fprintf(' increasing maxits and convits, and, if oscillations occur\n');
    fprintf(' also increasing dampfact.\n\n');
end;
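The heart of the listing is the pair of damped message updates inside the while loop: responsibilities first, then availabilities. As a rough illustration only, here is a minimal pure-Python sketch of those two updates. It is not a port of apcluster.m: it assumes a dense similarity matrix with the preferences already on the diagonal, runs a fixed number of iterations instead of the convits test, and omits the noise injection and the final exemplar refinement. The function name `affinity_propagation`, its parameters, and the toy data are my own assumptions, and indices are 0-based here.

```python
import statistics


def affinity_propagation(S, damping=0.9, iterations=200):
    """Illustrative sketch: S[i][k] is the similarity of i to k, S[k][k] the preference."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i,k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i,k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=scores.__getitem__)
            first = scores[best]
            second = max(scores[k] for k in range(n) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else first)
                R[i][k] = (1 - damping) * new + damping * R[i][k]
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k) for i' not in {i,k})
        # a(k,k) = sum of positive r(i',k) for i' != k
        for k in range(n):
            pos = [max(R[i][k], 0.0) for i in range(n)]
            pos[k] = R[k][k]
            total = sum(pos)
            for i in range(n):
                new = total - pos[i]
                if i != k:
                    new = min(new, 0.0)
                A[i][k] = (1 - damping) * new + damping * A[i][k]
    # A point is an exemplar when a(k,k) + r(k,k) > 0
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    if not exemplars:
        return [None] * n
    labels = [max(exemplars, key=lambda k: S[i][k]) for i in range(n)]
    for k in exemplars:
        labels[k] = k  # an exemplar always labels itself
    return labels


# Toy data (assumed for illustration): two well-separated groups on a line.
points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
n = len(points)
S = [[-(a - b) ** 2 for b in points] for a in points]
# Shared preference = median of the off-diagonal similarities.
preference = statistics.median(S[i][k] for i in range(n) for k in range(n) if i != k)
for i in range(n):
    S[i][i] = preference
labels = affinity_propagation(S)
print(labels)
```

On data this cleanly separated, each group should end up sharing one exemplar. Lowering the preference makes every exemplar more costly and so tends to produce fewer clusters, which is the adjustment the help text describes for 'p'.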
Sample data actually used:
The s matrix and the value of p:
s=[1 0.85 0.9 0.5 0.45 0.5 0.4 0.4 0.5 0.45;
0.85 1 0.85 0.6 0.65 0.7 0.6 0.55 0.8 0.7;
0.9 0.85 1 0.75 0.7 0.65 0.55 0.5 0.6 0.5;
0.5 0.6 0.75 1 0.9 0.7 0.7 0.85 0.5 0.45;
0.45 0.65 0.7 0.9 1 0.9 0.9 0.85 0.6 0.65;
0.5 0.7 0.65 0.7 0.9 1 0.85 0.75 0.75 0.75;
0.4 0.6 0.55 0.7 0.9 0.85 1 0.85 0.5 0.55;
0.4 0.55 0.5 0.85 0.85 0.75 0.85 1 0.3 0.25;
0.5 0.8 0.6 0.5 0.6 0.75 0.5 0.3 1 0.9;
0.45 0.7 0.5 0.45 0.65 0.75 0.55 0.25 0.9 1;
];
p=median(median(s));
Final run result:
idx =
     1
     1
     1
     5
     5
     5
     5
     5
     9
     9
netsim = 8.1875
dpsim = 6.2000
expref = 1.9875
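The reported numbers can be cross-checked by hand from the definitions in the help text (netsim = dpsim + expref). The following Python sketch is only a sanity check and does not rerun the clustering: it recomputes p the way MATLAB's median(median(s)) does for a matrix (the median of the per-column medians) and then rebuilds dpsim, expref and netsim from the reported idx. The variable names mirror the MATLAB outputs, and the conversion to 0-based indices is my own.

```python
import statistics

# Similarity matrix from the example above.
s = [
    [1, 0.85, 0.9, 0.5, 0.45, 0.5, 0.4, 0.4, 0.5, 0.45],
    [0.85, 1, 0.85, 0.6, 0.65, 0.7, 0.6, 0.55, 0.8, 0.7],
    [0.9, 0.85, 1, 0.75, 0.7, 0.65, 0.55, 0.5, 0.6, 0.5],
    [0.5, 0.6, 0.75, 1, 0.9, 0.7, 0.7, 0.85, 0.5, 0.45],
    [0.45, 0.65, 0.7, 0.9, 1, 0.9, 0.9, 0.85, 0.6, 0.65],
    [0.5, 0.7, 0.65, 0.7, 0.9, 1, 0.85, 0.75, 0.75, 0.75],
    [0.4, 0.6, 0.55, 0.7, 0.9, 0.85, 1, 0.85, 0.5, 0.55],
    [0.4, 0.55, 0.5, 0.85, 0.85, 0.75, 0.85, 1, 0.3, 0.25],
    [0.5, 0.8, 0.6, 0.5, 0.6, 0.75, 0.5, 0.3, 1, 0.9],
    [0.45, 0.7, 0.5, 0.45, 0.65, 0.75, 0.55, 0.25, 0.9, 1],
]

# median(median(s)) on a matrix: column medians first, then their median.
p = statistics.median(statistics.median(col) for col in zip(*s))

# Exemplar assignment reported by apcluster, converted from 1-based to 0-based.
idx = [k - 1 for k in [1, 1, 1, 5, 5, 5, 5, 5, 9, 9]]

exemplars = sorted(set(idx))
# dpsim: similarity of every non-exemplar point to its exemplar.
dpsim = sum(s[j][k] for j, k in enumerate(idx) if j != k)
# expref: one shared preference p per selected exemplar.
expref = p * len(exemplars)
netsim = dpsim + expref
print(f"p={p:.4f} dpsim={dpsim:.4f} expref={expref:.4f} netsim={netsim:.4f}")
```

This gives p = 0.6625, and with three exemplars (points 1, 5 and 9) the sums agree with the reported dpsim = 6.2000, expref = 1.9875 and netsim = 8.1875. Note that median(median(s)) here also includes the diagonal entries of 1, whereas the help text suggests the median of the pairwise similarities, so a slightly different p would be equally defensible.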