Weka EM covariance

description 1:

Dear All,

I am trying to find out what is the real meaning of the minStdDev parameter in the EM clustering algorithm. Can anyone help me?

I have not looked at the code, but I suspect that the minStdDev is used as the first estimate of the covariance of a Gaussian in the mixture model. Am I correct?

I have found the equations or perhaps similar equations to the ones used to calculate the parameters for a Gaussian mixture model in the EM algorithm and there are three, which have these functions:

    The first one calculates the probability of each Gaussian.
    The second calculates the mean of each Gaussian
    The third calculates the covariance matrix of each Gaussian

But this means to start off with there has to be an initial guess at the parameters for the Gaussian mixture model ie the probability or weighting factor for each Gaussian is needed, as is the mean and Covariance matrix.

If I am wrong how is the EM algorithm initiated ie how is the initial guess at the mixture model arrived at? Does minStdDev have any part to play in it? Also is a full covariance matrix calculated in the EM algorithm or are just the standard deviations or variances calculated, ie are right elliptical Gaussians used?

I am guessing that the random number generator is used to pick one or more data points at random as initial values for the means.

This question really follows up on my previous postings about differences between Mac and PC using the EM algorithm and worries about the stability of the algorithm. I was (naively) using the default value of 1.0E-6. However after a reply to a previous posting I have tried scaling the data to be between -1 and +1 and alsozero mean and unit SD. When I try these scaled data sets Mac and PC produce the same result. So I realised that ought to think about the value of minStdDev.

Many thanks for your help in advance.

John Black

description 2:

EM in java is a naive implementation. That is, it treats each  
attribute independently of the others given the cluster (much the same  
as naive Bayes for classification). Therefore, a full covariance  
matrix is not computed, just the means and standard deviations of each  
numeric attribute.

The minStdDev parameter is there simply to help prevent numerical  
problems. This can be a problem when multiplying large densities  
(arising from small standard deviations) when there are many singleton  
or near-singleton values. The standard deviation for a given attribute  
will not be allowed to be less than the minStdDev value.

EM is initialized with the best result out of 10 executions of  
SimpleKMeans (with different seed values).

Hope this helps.

Cheers,
Mark.

Weka EM 协方差的更多相关文章

  1. Weka:call for the EM algorithm to achieve clustering.(EM算法)

    EM算法: 在Eclipse中写出读取文件的代码然后调用EM算法计算输出结果: package EMAlg; import java.io.*; import weka.core.*; import ...

  2. Weka中EM算法详解

    private void EM_Init (Instances inst) throws Exception { int i, j, k; // 由于EM算法对初始值较敏感,故选择run k mean ...

  3. GMM的EM算法实现

    转自:http://blog.csdn.net/abcjennifer/article/details/8198352 在聚类算法K-Means, K-Medoids, GMM, Spectral c ...

  4. 【EM】代码理解

    本来想自己写一个EM算法的,但是操作没两步就进行不下去了.对那些数学公式着实不懂.只好从网上找找代码,看看别人是怎么做的. 代码:来自http://blog.sina.com.cn/s/blog_98 ...

  5. 高斯混合聚类及EM实现

    一.引言 我们谈到了用 k-means 进行聚类的方法,这次我们来说一下另一个很流行的算法:Gaussian Mixture Model (GMM).事实上,GMM 和 k-means 很像,不过 G ...

  6. [转载]GMM的EM算法实现

    在聚类算法K-Means, K-Medoids, GMM, Spectral clustering,Ncut一文中我们给出了GMM算法的基本模型与似然函数,在EM算法原理中对EM算法的实现与收敛性证明 ...

  7. GMM及EM算法

    GMM及EM算法 标签(空格分隔): 机器学习 前言: EM(Exception Maximizition) -- 期望最大化算法,用于含有隐变量的概率模型参数的极大似然估计: GMM(Gaussia ...

  8. GMM的EM算法

    在聚类算法K-Means, K-Medoids, GMM, Spectral clustering,Ncut一文中我们给出了GMM算法的基本模型与似然函数,在EM算法原理中对EM算法的实现与收敛性证明 ...

  9. EM算法 大白话讲解

    假设有一堆数据点,它是由两个线性模型产生的.公式如下: 模型参数为a,b,n:a为线性权值或斜率,b为常数偏置量,n为误差或者噪声. 一方面,假如我们被告知这两个模型的参数,则我们可以计算出损失. 对 ...

随机推荐

  1. Service完全解析(转)

    今天我们来讲一下Android中Service的相关内容. Service在Android中和Activity是属于同一级别上的组件,我们可以将他们认为是两个好哥们,Activity仪表不凡,迷倒万千 ...

  2. Mac下配置环境变量

    1.创建并以 TextEdit 的方式打开 ~/.bash_profile 文件,如果没有则 touch ~/.bash_profile; 然后打开 vim ~/.bash_profile 2.新增环 ...

  3. 可以Ping通和DNS解析,但打不开网页的解决办法

    一. 网络故障表现为: 1.Ping地址正常,能ping通任何本来就可以ping通地址,如网关.域名. 2.能DNS解析域名. 3.无法打开网页,感觉是网页打开的一瞬间就显示无网络连接. 4.只需要连 ...

  4. 浅谈MySQL Replication(复制)基本原理

    1.MySQL Replication复制进程MySQL的复制(replication)是一个异步的复制,从一个MySQL instace(称之为Master)复制到另一个MySQL instance ...

  5. Android应用开发学习—Toast使用方法大全

    Toast 是一个 View 视图,快速的为用户显示少量的信息. Toast 在应用程序上浮动显示信息给用户,它永远不会获得焦点,不影响用户的输入等操作,主要用于 一些帮助 / 提示. Toast 最 ...

  6. 【进阶——最小费用最大流】hdu 1533 Going Home (费用流)Pacific Northwest 2004

    题意: 给一个n*m的矩阵,其中由k个人和k个房子,给每个人匹配一个不同的房子,要求所有人走过的曼哈顿距离之和最短. 输入: 多组输入数据. 每组输入数据第一行是两个整型n, m,表示矩阵的长和宽. ...

  7. Hanoi塔问题

    说明:河内之塔(Towers of Hanoi)是法国人M.Claus(Lucas)于1883年从泰国带至法国的,河内为越战时北越的首都,即现在的胡志明市:1883年法国数学家 Edouard Luc ...

  8. Ubuntu 升级到13.10之后出现Apache2启动失败的问题

    昨天看到Ubuntu 13.04提示有新的发行版Ubuntu 13.10了,手痒了一下,没有忍住就升级了. 结果升级完毕之后发现Apache2服务启动失败了,失败信息是: Invalid comman ...

  9. 判断CString字符串中各位是数字,大小写字母,符号,汉字.xml

    pre{ line-height:1; color:#1e1e1e; background-color:#e9e9ff; font-size:16px;}.sysFunc{color:#627cf6; ...

  10. kali ssh 登录

    kali 开启ssh 登录:可在windows 中通过 xshell 登录,方便操作. 修改sshd_config文件, vi /etc/ssh/sshd_config 将#PasswordAuthe ...