Comparison

                              LSA               pLSA
1. Theoretical background     Linear algebra    Probability and statistics
2. Objective function         Frobenius norm    Likelihood function
3. Polysemy                   No                Yes
4. Folding-in                 Straightforward   Complicated

1. LSA stems from linear algebra: it is nothing more than a truncated Singular Value Decomposition (SVD) of the term-document matrix. pLSA, on the other hand, has a strong probabilistic grounding as a latent variable model.

2. SVD is a least-squares method: it finds the low-rank matrix approximation that minimizes the Frobenius norm of its difference from the original matrix. Moreover, as is well known in machine learning, the least-squares solution coincides with the maximum likelihood solution when the experimental errors are Gaussian. LSA therefore makes an implicit assumption of Gaussian noise on the term counts. In contrast, the objective function maximized in pLSA is the likelihood function of multinomial sampling.
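The least-squares property can be checked numerically. The sketch below is only an illustration under assumed inputs: the count matrix `A` is randomly generated (hypothetical data, not from any real corpus) and the rank `k = 2` is arbitrary. It verifies that the Frobenius error of the truncated SVD equals the square root of the sum of the squared discarded singular values, and that an arbitrary rank-k matrix does no better.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical term-document count matrix: 6 terms x 5 documents.
A = rng.poisson(lam=2.0, size=(6, 5)).astype(float)

k = 2  # number of latent dimensions kept (arbitrary choice for the demo)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-k approximation built from the k largest singular values.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius norm of the residual, and the value predicted from the
# discarded singular values.
frob_error = np.linalg.norm(A - A_k, ord="fro")
expected_error = np.sqrt(np.sum(s[k:] ** 2))

# Some other rank-k matrix, for comparison: it cannot beat the truncated SVD.
B = rng.normal(size=(6, k)) @ rng.normal(size=(k, 5))
other_error = np.linalg.norm(A - B, ord="fro")

print(frob_error, expected_error, other_error)
```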

The values in the concept-term matrix found by LSA are not normalized and may even be negative. The values found by pLSA, on the other hand, are probabilities, which means they are interpretable and can be combined with other models.
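To make the contrast concrete, here is a minimal sketch of pLSA fitted by EM, next to LSA on the same data. Everything in it is an assumption made for illustration: the function name `plsa`, the asymmetric parameterization P(w|d) = sum over z of P(z|d) P(w|z), the toy document-term matrix `N`, the number of topics and the iteration count. It is not a reference implementation.

```python
import numpy as np

def plsa(N, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM on a document-term count matrix N (docs x terms)."""
    rng = np.random.default_rng(seed)
    n_docs, n_terms = N.shape
    # Random initial P(z|d) and P(w|z); every row is normalized to sum to 1.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_terms))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    log_liks = []
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]   # (docs, topics, terms)
        p_w_d = joint.sum(axis=1)                       # P(w|d), (docs, terms)
        # Multinomial log-likelihood of the current parameters
        # (up to a constant that does not depend on them).
        log_liks.append(np.sum(N * np.log(p_w_d + 1e-12)))
        post = joint / (p_w_d[:, None, :] + 1e-12)
        # M-step: re-estimate both distributions from expected counts.
        weighted = N[:, None, :] * post                 # n(d,w) * P(z|d,w)
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    return p_z_d, p_w_z, log_liks

# Hypothetical toy corpus: 5 documents x 6 terms.
N = np.array([
    [3, 2, 0, 0, 1, 0],
    [2, 3, 1, 0, 0, 0],
    [0, 0, 3, 2, 0, 1],
    [0, 1, 2, 3, 0, 0],
    [1, 0, 0, 1, 2, 3],
], dtype=float)

p_z_d, p_w_z, log_liks = plsa(N, n_topics=2)

# LSA on the same matrix: the concept-term factors are unnormalized.
U, s, Vt = np.linalg.svd(N, full_matrices=False)
lsa_concept_term = Vt[:2]

print(p_w_z.sum(axis=1))       # each pLSA topic is a distribution over terms
print(lsa_concept_term.min())  # LSA factors contain negative entries
```

Each row of `p_w_z` can be read directly as a distribution over terms, whereas the rows of `lsa_concept_term` only have a geometric meaning.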

Note: applying SVD to the data matrix is equivalent to PCA (Principal Component Analysis) when the data is centered (has zero mean).
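This equivalence can be sketched as follows, assuming randomly generated data (the matrix `X`, its size and its column scales are hypothetical). After centering, the eigenvalues of the sample covariance matrix match the squared singular values divided by n - 1, and the principal axes match the right singular vectors up to sign.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 50 samples x 4 features with clearly different variances.
X = rng.normal(size=(50, 4)) * np.array([4.0, 2.0, 1.0, 0.5])
Xc = X - X.mean(axis=0)  # center each feature
n = Xc.shape[0]

# PCA via eigendecomposition of the sample covariance matrix.
cov = Xc.T @ Xc / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]          # sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# SVD of the centered data matrix.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_vars = s ** 2 / (n - 1)

# Cosine between each principal axis and the matching right singular vector.
axis_cosines = np.abs(np.sum(eigvecs * Vt.T, axis=0))

print(eigvals, svd_vars, axis_cosines)
```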

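3. Both LSA and pLSA can handle synonymy, but LSA cannot handle polysemy, because each word is represented by a single point in the latent space. pLSA can, because a word may have a high probability under several latent topics.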

4. LSA and pLSA analyze a corpus of documents in order to find a new low-dimensional representation of it. To be comparable, new documents that were not originally in the corpus must be projected into the lower-dimensional space too. This is called “folding-in”. Clearly, folded-in documents don’t contribute to learning the factored representation, so it is necessary to rebuild the model from all the documents from time to time.

In LSA, folding-in is as easy as a matrix-vector product. In pLSA, this requires several iterations of the EM algorithm.
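The difference between the two folding-in procedures can be sketched as below. All names and numbers are hypothetical: `fold_in_lsa` and `fold_in_plsa` are illustrative helpers, the term-document matrix `A` and the topic-term distributions `p_w_z` are made up, and the pLSA version assumes that P(w|z) is kept fixed while only the topic mixture of the new document is re-estimated.

```python
import numpy as np

def fold_in_lsa(q, U_k, s_k):
    """Project a new term-count vector q into the k-dimensional LSA space."""
    # A single matrix-vector product: S_k^{-1} U_k^T q.
    return (U_k.T @ q) / s_k

def fold_in_plsa(q, p_w_z, n_iter=50):
    """Estimate P(z|q) for a new document q by EM, keeping P(w|z) fixed."""
    n_topics = p_w_z.shape[0]
    p_z_q = np.full(n_topics, 1.0 / n_topics)  # uniform starting point
    for _ in range(n_iter):
        # E-step: P(z|q,w) is proportional to P(z|q) * P(w|z).
        joint = p_z_q[:, None] * p_w_z                       # (topics, terms)
        post = joint / (joint.sum(axis=0, keepdims=True) + 1e-12)
        # M-step: only the topic mixture of the new document is updated.
        p_z_q = (post * q[None, :]).sum(axis=1)
        p_z_q /= p_z_q.sum()
    return p_z_q

# --- LSA: hypothetical term-document matrix (4 terms x 3 documents) ---
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 3.0],
              [0.0, 1.0, 2.0]])
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k = U[:, :k], s[:k]

# Folding in a document that is already in the corpus recovers its coordinates.
doc0_folded = fold_in_lsa(A[:, 0], U_k, s_k)

# --- pLSA: hypothetical topic-term distributions (2 topics x 4 terms) ---
p_w_z = np.array([[0.4, 0.4, 0.1, 0.1],
                  [0.1, 0.1, 0.4, 0.4]])
q = np.array([3.0, 2.0, 0.0, 0.0])  # new document using only the first two terms
p_z_q = fold_in_plsa(q, p_w_z)

print(doc0_folded, Vt[:k, 0])
print(p_z_q)
```

The LSA projection is a closed-form product, while the pLSA estimate needs an iterative loop whose length (`n_iter`) has to be chosen.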
