What is an intuitive explanation of the relation between PCA and SVD?

3 Answers

Mike Tamir, CSO - GalvanizeU accredited Masters program creating top tier Data Scientists...

5.2k Views • Upvoted by Ricky Kwok, Ph.D. in Applied Math from UC Davis
 
There is a very direct mathematical relation between SVD (Singular Value Decomposition) and PCA (Principal Component Analysis) - see below.  For this reason, the two algorithms deliver essentially the same result: a set of "new axes" constructed from linear combinations of the original feature-space axes in which the dataset is plotted.  These "new axes" are useful because they systematically break down the variance in the data points (how widely the data points are distributed) based on each direction's contribution to the variance in the data.



The result of this process is a ranked list of "directions" in the feature space, ordered from most variance to least.  The directions along which there is greatest variance are referred to as the "principal components" (of variation in the data), and the common wisdom is that by focusing exclusively on the way the data is distributed along these dimensions, one can capture most of the information represented in the original feature space without having to deal with such a high number of dimensions, which can be of great benefit in statistical modeling and Data Science applications (see: When and where do we use SVD?).

What is the Formal Relation between SVD and PCA?
Let the matrix M be our data matrix, where the m rows represent our data points and the n columns represent the features of the data points.  The data may already have been mean-centered and normalized by the standard deviations column-wise (most off-the-shelf implementations provide these options).

SVD: Because in most cases a data matrix M will not have exactly the same number of data points as features (i.e. m ≠ n), the matrix M will not be a square matrix, and a diagonalization of the form M = UΣU^T, where U is an m×m orthogonal matrix of the eigenvectors of M and Σ is the diagonal m×m matrix of the eigenvalues of M, will not exist.  However, even when n ≠ m, an analogue of this decomposition is possible, and M can be factored as M = UΣV^T, where

  1. U is an m×m orthogonal matrix of the "left singular vectors" of M.
  2. V is an n×n orthogonal matrix of the "right singular vectors" of M.
  3. And Σ is an m×n matrix whose only non-zero entries are the diagonal entries Σ_{i,i}, referred to as the "singular values" of M.
  • Note: u, v, and σ form a left-singular-vector, right-singular-vector, and singular-value triple for a given matrix M if they satisfy the following equations:
  • M v = σ u, and
  • M^T u = σ v

PCA: PCA sidesteps the problem of M not being diagonalizable by working directly with the n×n "covariance matrix" M^T M.  Because M^T M is symmetric, it is guaranteed to be diagonalizable.  So PCA works by finding the eigenvectors of the covariance matrix and ranking them by their respective eigenvalues.  The eigenvectors with the greatest eigenvalues are the Principal Components of the data matrix.
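To make this concrete, here is a minimal NumPy sketch of that covariance-matrix route (the random data matrix, its size, and the column-wise mean-centering are just illustrative assumptions):

```python
# Minimal sketch: PCA via eigendecomposition of the covariance matrix M^T M.
# M is an illustrative (m x n) data matrix, mean-centered column-wise.
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(100, 5))
M -= M.mean(axis=0)                     # column-wise mean centering

C = M.T @ M                             # the n x n "covariance matrix" M^T M
eigvals, eigvecs = np.linalg.eigh(C)    # eigh, since C is symmetric
order = np.argsort(eigvals)[::-1]       # rank directions from most to least variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

principal_components = eigvecs          # columns = principal components
scores = M @ principal_components       # the data expressed in the new axes
```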

Now, a little bit of matrix algebra can be done to show that the Principal Components of a PCA diagonalization of the covariance matrix M^T M are exactly the right singular vectors found through SVD (i.e. the columns of the matrix V):

From SVD we have M = UΣV^T, so...

  • M^T M = (UΣV^T)^T (UΣV^T)
  • M^T M = (VΣ^T U^T)(UΣV^T)
  • but since U is orthogonal, U^T U = I,

so

  • M^T M = VΣ^2 V^T

where Σ^2 = Σ^T Σ is an n×n diagonal matrix whose diagonal elements are the squares Σ_{i,i}^2 of the singular values of M.  So the matrix of eigenvectors V in PCA is the same as the matrix of right singular vectors from SVD, and the eigenvalues generated in PCA are just the squares of the singular values from SVD.
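A quick numerical check of this identity (same kind of illustrative random, mean-centered matrix as above; for a generic matrix the singular values are distinct, so the vectors match column-by-column up to sign):

```python
# Check: right singular vectors of M = eigenvectors of M^T M,
#        eigenvalues of M^T M = squared singular values of M.
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(100, 5))
M -= M.mean(axis=0)

# PCA route: eigendecomposition of the covariance matrix, sorted descending
eigvals, eigvecs = np.linalg.eigh(M.T @ M)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

# SVD route: np.linalg.svd returns singular values in descending order
U, s, Vt = np.linalg.svd(M, full_matrices=False)

assert np.allclose(eigvals, s**2)                        # eigenvalues are sigma^2
for i in range(M.shape[1]):
    assert np.isclose(abs(Vt[i] @ eigvecs[:, i]), 1.0)   # same vectors, up to sign
```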

So is it ever better to use SVD over PCA?
Yes. While formally both solutions can be used to calculate the same principal components and their corresponding eigen/singular values, the extra step of calculating the covariance matrix M^T M can lead to numerical rounding errors when calculating the eigenvalues/vectors.
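A small sketch of why this matters (the 3×2 matrix below is just a standard illustrative example: its exact singular values are sqrt(2 + ε^2) and ε, and forming M^T M squares them, pushing ε^2 below double-precision resolution):

```python
# Forming M^T M squares the singular values, so a tiny singular value
# can be rounded away entirely, while SVD on M itself recovers it.
import numpy as np

eps = 1e-8
M = np.array([[1.0, 1.0],
              [eps, 0.0],
              [0.0, eps]])              # exact singular values: sqrt(2 + eps^2), eps

s_svd = np.linalg.svd(M, compute_uv=False)
eig = np.linalg.eigvalsh(M.T @ M)       # 1 + eps^2 rounds to 1.0 in float64
s_cov = np.sqrt(np.clip(eig, 0.0, None))[::-1]

print(s_svd)    # approx [1.414e+00, 1.000e-08]  -> small singular value preserved
print(s_cov)    # approx [1.414e+00, 0.000e+00]  -> small singular value lost
```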

  

David Beniaguev

486 Views
 
I would like to refine two points that I think are important:

I'll be assuming your data matrix is an m×n matrix organized such that rows are data samples (m samples) and columns are features (n features).

The first point is that SVD performs low-rank matrix approximation.
Your input to (truncated) SVD is a number k (smaller than m or n), and the SVD procedure will return a set of k vectors of n dimensions (which can be organized in a k×n matrix) and a set of k coefficients for each data sample (there are m data samples, so these can be organized in an m×k matrix), such that the linear combination of each sample's k coefficients with the k vectors reconstructs that data sample as well as possible (in the squared Euclidean-distance sense, summed over all data samples).
So in a sense, the SVD procedure finds the optimal k vectors that together span a subspace in which most of the data samples lie (up to a small reconstruction error).
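Here is a minimal sketch of this idea (the data matrix, its size, and k are illustrative assumptions; this is the usual "truncated SVD" construction built from NumPy's full SVD rather than any particular library's routine):

```python
# Rank-k approximation: k basis vectors of length n, k coefficients per sample.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))          # m = 200 samples, n = 30 features
k = 5

U, s, Vt = np.linalg.svd(X, full_matrices=False)
basis = Vt[:k]                          # (k x n): the k spanning vectors
coeffs = U[:, :k] * s[:k]               # (m x k): k coefficients per sample

X_k = coeffs @ basis                    # best rank-k reconstruction of X
err = np.linalg.norm(X - X_k)           # Frobenius reconstruction error
print(err, np.sqrt(np.sum(s[k:] ** 2))) # equals the norm of the discarded singular values
```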

PCA on the other hand is:
1) subtract the mean sample from each row of the data matrix.
2) perform SVD on the resulting matrix.

So, the second point is that PCA gives you as output the subspace that spans the deviations from the mean data sample, while SVD provides you with a subspace that spans the data samples themselves (or, you can view this as a subspace that spans the deviations from zero).

Note that these two subspaces are usually NOT the same, and will be the same only if the mean data sample is zero.

In order to understand a little better why they are not the same, let's think of a data set where all feature values for all data samples are in the range 999-1001, and each feature's mean is 1000.

From the SVD point of view, the main way in which these samples deviate from zero is along the vector (1,1,1,...,1).
From the PCA point of view, on the other hand, the main way in which these data samples deviate from the mean data sample depends on the precise distribution of the data around the mean data sample...
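A small sketch of that example (the sizes and the uniform noise are just illustrative assumptions):

```python
# Features around 1000: SVD on the raw data vs. PCA on the centered data.
import numpy as np

rng = np.random.default_rng(0)
X = 1000.0 + rng.uniform(-1.0, 1.0, size=(500, 10))   # values in [999, 1001]

# SVD of the raw data: the top right-singular vector is essentially the
# "offset from zero" direction, close to (1, 1, ..., 1) / sqrt(10).
_, _, Vt_raw = np.linalg.svd(X, full_matrices=False)
print(Vt_raw[0])            # roughly +/-0.316 in every coordinate

# PCA: center first, then SVD; the top direction now describes the spread
# of the deviations around the mean sample, not the offset from zero.
Xc = X - X.mean(axis=0)
_, _, Vt_pca = np.linalg.svd(Xc, full_matrices=False)
print(Vt_pca[0])            # generally not aligned with (1, ..., 1) / sqrt(10)
```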

In short, we can think of SVD as "something that compactly summarizes the main ways in which my data is deviating from zero" and PCA as "something that compactly summarizes the main ways in which my data is deviating from the mean data sample".

  

Tigran Ishkhanov

1.3k Views
 
PCA is a statistical technique in which SVD is used as a low-level linear algebra algorithm. One can apply SVD to any matrix C. In PCA this matrix C arises from the data and has a statistical meaning: the element c_ij is the covariance between the i-th and j-th coordinates of your dataset after mean-normalization.
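For instance (a minimal sketch with an illustrative random data matrix; np.cov is only used here as a cross-check):

```python
# After mean-normalization, C[i, j] is the covariance of features i and j.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
Xc = X - X.mean(axis=0)                  # mean-normalization

C = Xc.T @ Xc / (X.shape[0] - 1)         # sample covariance matrix
assert np.allclose(C, np.cov(X, rowvar=False))

U, s, Vt = np.linalg.svd(C)              # SVD applied to the covariance matrix C
```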

  
 
 