What is an intuitive explanation of the relation between PCA and SVD?

Mike Tamir, CSO - GalvanizeU accredited Masters program creating top tier Data Scientists...
The result of this process is a ranked list of "directions" in the feature space, ordered from most variance to least. The directions along which there is the greatest variance are referred to as the "principal components" (of variation in the data). The common wisdom is that by focusing exclusively on how the data is distributed along these directions, one can capture most of the information represented in the original feature space without having to deal with such a high number of dimensions, which can be of great benefit in statistical modeling and Data Science applications (see: When and where do we use SVD?).
What is the Formal Relation between SVD and PCA?
Let the matrix M be our data matrix, where the m rows represent our data points and the n columns represent the features of each data point. The data may already have been mean-centered and normalized by the column-wise standard deviations (most off-the-shelf implementations provide these options).
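For concreteness, here is a minimal sketch of that preprocessing step. NumPy is an assumption here (the answer names no library), and the variable names are purely illustrative:

```python
import numpy as np

# Toy data matrix M: m = 5 data points (rows), n = 3 features (columns).
rng = np.random.default_rng(0)
M = rng.normal(loc=10.0, scale=2.0, size=(5, 3))

# Mean-center each column, then optionally normalize by the column-wise
# standard deviations, as many off-the-shelf implementations can do.
M_centered = M - M.mean(axis=0)
M_standardized = M_centered / M_centered.std(axis=0, ddof=1)
```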
SVD: Because in most cases a data matrix M will not have exactly the same number of data points as features (i.e. m ≠ n), M will not be a square matrix, and a diagonalization of the form M = UΣU^T (where U is an m×m orthogonal matrix of the eigenvectors of M and Σ is the m×m diagonal matrix of its eigenvalues) will not exist. However, even when n ≠ m, an analogue of this decomposition is possible, and M can be factored as M = UΣV^T, where
- U is an m×m orthogonal matrix whose columns are the "left singular vectors" of M.
- V is an n×n orthogonal matrix whose columns are the "right singular vectors" of M.
- And Σ is an m×n matrix whose only non-zero entries are the diagonal entries Σ_ii, referred to as the "singular values" of M.
- Note that u⃗, v⃗, and σ form a left singular vector, right singular vector, and singular value triple for a given matrix M if they satisfy the following equations (checked numerically in the sketch below):
- Mv⃗ = σu⃗, and
- M^T u⃗ = σv⃗
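A small sketch of these definitions, assuming NumPy (not part of the original answer), which computes the SVD and verifies the two defining equations for the first singular triple:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 3))                # m = 5, n = 3

# full_matrices=True returns U as m x m, Vt as n x n, and the singular values
# (the diagonal of the m x n matrix Sigma) as a 1-D array of length min(m, n).
U, s, Vt = np.linalg.svd(M, full_matrices=True)

# Check M v = sigma u and M^T u = sigma v for the first triple.
u, v, sigma = U[:, 0], Vt[0, :], s[0]
print(np.allclose(M @ v, sigma * u))       # True
print(np.allclose(M.T @ u, sigma * v))     # True
```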
PCA: PCA sidesteps the problem of M not being diagonalizable by working directly with the n×n "covariance matrix" M^T M (strictly, the covariance matrix is M^T M up to a 1/(m−1) factor when M is mean-centered, but the scaling does not affect the eigenvectors). Because M^T M is symmetric it is guaranteed to be diagonalizable. So PCA works by finding the eigenvectors of the covariance matrix and ranking them by their respective eigenvalues. The eigenvectors with the greatest eigenvalues are the principal components of the data matrix.
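A sketch of that procedure, again assuming NumPy (the scaling factor is left out since it does not change the eigenvectors):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(100, 3))
M = M - M.mean(axis=0)                   # mean-center the columns first

C = M.T @ M                              # n x n (unnormalized) covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)     # eigh: for symmetric matrices, ascending order

# Rank the eigenvectors by eigenvalue, largest first: the principal components.
order = np.argsort(eigvals)[::-1]
principal_components = eigvecs[:, order]   # columns are the principal directions
ranked_eigvals = eigvals[order]
```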
Now, a little bit of matrix algebra shows that the principal components obtained by diagonalizing the covariance matrix M^T M are exactly the right singular vectors of M found through SVD (i.e. the columns of the matrix V):
From SVD we have M = UΣV^T, so...
- M^T M = (UΣV^T)^T (UΣV^T)
- M^T M = (VΣ^T U^T)(UΣV^T)
- but since U is orthogonal, U^T U = I,
so
- M^T M = V(Σ^T Σ)V^T = VΣ^2 V^T
where Σ^2 = Σ^T Σ is an n×n diagonal matrix whose diagonal elements are the squares Σ_ii^2 of the entries of Σ. So the matrix of eigenvectors V in PCA is the same as the matrix of right singular vectors from SVD, and the eigenvalues generated in PCA are just the squares of the singular values from SVD.
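This identity is easy to check numerically; the following sketch (NumPy assumed) compares the eigendecomposition of M^T M with the SVD of M:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(100, 3))
M = M - M.mean(axis=0)

U, s, Vt = np.linalg.svd(M, full_matrices=False)
eigvals, eigvecs = np.linalg.eigh(M.T @ M)
order = np.argsort(eigvals)[::-1]

# Eigenvalues of M^T M are the squared singular values of M.
print(np.allclose(eigvals[order], s**2))                      # True
# The eigenvectors match the right singular vectors (columns of V), up to sign.
print(np.allclose(np.abs(eigvecs[:, order]), np.abs(Vt.T)))   # True
```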
So is it ever better to use SVD over PCA?
Yes. While formally both routes yield the same principal components and their corresponding eigen/singular values, the extra step of explicitly forming the covariance matrix M^T M squares the condition number of the problem and can lead to numerical rounding errors when computing the eigenvalues and eigenvectors.
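A small numerical illustration of this point, using a Läuchli-style matrix (this example is mine, not the answer's): once M^T M is formed explicitly, the term eps² underflows relative to 1 in double precision, so the small singular value is lost.

```python
import numpy as np

eps = 1e-8
M = np.array([[1.0, 1.0],
              [eps, 0.0],
              [0.0, eps]])               # nearly rank-deficient columns

# Singular values computed directly from M: the tiny one (~1e-8) survives.
print(np.linalg.svd(M, compute_uv=False))        # approx [1.414e+00, 1.0e-08]

# Going through M^T M squares the condition number: 1 + eps**2 rounds to 1,
# so the small eigenvalue comes out as (numerically) zero.
eigvals = np.linalg.eigvalsh(M.T @ M)
print(np.sqrt(np.clip(eigvals, 0.0, None)))      # approx [0.0, 1.414e+00]
```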
Another answer:
I'll be assuming your data matrix is an m×d matrix organized so that rows are data samples (m samples) and columns are features (d features).
The first point is that SVD performs low-rank matrix approximation.
Your input to SVD is a number k (smaller than both m and d), and the SVD procedure will return a set of k vectors of d dimensions each (which can be organized in a k×d matrix) and a set of k coefficients for each data sample (there are m data samples, so these can be organized in an m×k matrix), such that for each sample, the linear combination of its k coefficients with the k vectors best reconstructs that data sample (in the Euclidean-distance sense). This holds for all data samples simultaneously.
So in a sense, the SVD procedure finds the optimal k vectors that together span a subspace in which most of the data samples approximately lie (up to a small reconstruction error).
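A sketch of this view of SVD as low-rank approximation, assuming NumPy (the variable names are illustrative, not from the answer):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))            # m = 50 samples, d = 10 features
k = 3

U, s, Vt = np.linalg.svd(X, full_matrices=False)

components = Vt[:k, :]                   # k x d: the k basis vectors
coefficients = U[:, :k] * s[:k]          # m x k: the k coefficients per sample

# Each sample is approximated by the linear combination of its k coefficients
# with the k vectors; this is the best rank-k reconstruction in the
# least-squares (Euclidean) sense.
X_approx = coefficients @ components
print(np.linalg.norm(X - X_approx))      # total reconstruction error
```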
PCA on the other hand is:
1) subtract the mean sample from each row of the data matrix.
2) perform SVD on the resulting (mean-centered) matrix, as in the sketch below.
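A minimal sketch of those two steps, assuming NumPy (the function name pca_via_svd is just illustrative):

```python
import numpy as np

def pca_via_svd(X, k):
    """PCA as described above: 1) subtract the mean sample, 2) SVD the result."""
    X_centered = X - X.mean(axis=0)                              # step 1
    U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)    # step 2
    components = Vt[:k, :]               # top-k principal directions (k x d)
    scores = X_centered @ components.T   # projection of each sample (m x k)
    return components, scores

rng = np.random.default_rng(0)
X = rng.normal(loc=1000.0, size=(50, 4))
components, scores = pca_via_svd(X, k=2)
```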
So, the second point is that PCA gives you as output the subspace that spans the deviations from the mean data sample, while SVD provides a subspace that spans the data samples themselves (or, equivalently, a subspace that spans the deviations from zero).
Note that these two subspaces are usually NOT the same, and will be the same only if the mean data sample is zero.
In order to understand a little better why they are not the same, let's think of a data set where all feature values for all data samples are in the range 999-1001, and each feature's mean is 1000.
From the SVD point of view, the main way in which these samples deviate from zero is along the vector (1, 1, 1, ..., 1).
From the PCA point of view, on the other hand, the main way in which these data samples deviate from the mean data sample depends on the precise distribution of the data around the mean data sample...
In short, we can think of SVD as "something that compactly summarizes the main ways in which my data is deviating from zero" and PCA as "something that compactly summarizes the main ways in which my data is deviating from the mean data sample".
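The following sketch reproduces a toy version of that example (NumPy assumed; the specific numbers are mine). The data sit near 1000 in every feature, with all of the actual variation along one arbitrary direction; raw SVD picks out (1, 1, ..., 1)/sqrt(d), while PCA (SVD of the centered data) picks out the direction of the spread.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)

# Samples near 1000 in every feature; the spread is along `direction` only.
X = 1000.0 + rng.normal(size=(200, 1)) * direction * 0.5

# SVD on the raw data: the top right singular vector is dominated by the
# offset from zero, so it is close to (1, 1, ..., 1) / sqrt(d).
_, _, Vt_raw = np.linalg.svd(X, full_matrices=False)
print(np.round(np.abs(Vt_raw[0]), 3))    # roughly 0.447 in every coordinate

# SVD on the mean-centered data (i.e. PCA): the top direction is the spread.
Xc = X - X.mean(axis=0)
_, _, Vt_pca = np.linalg.svd(Xc, full_matrices=False)
print(np.round(np.abs(Vt_pca[0]), 3))    # matches np.abs(direction)
print(np.round(np.abs(direction), 3))
```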