Topic modeling [Classic Models]
http://www.cs.princeton.edu/~blei/topicmodeling.html
Topic models are a suite of algorithms that uncover the hidden thematic structure in document collections. These algorithms help us develop new ways to search, browse and summarize large archives of texts.
Below, you will find links to introductory materials, corpus browsers based on topic models, and open source software (from my research group) for topic modeling.
Introductory materials
- I wrote a general introduction to topic modeling.
- John Lafferty and I wrote a more technical review paper about this field.
- Here are slides from some recent tutorials about topic modeling:
- Here is a video from a talk on dynamic and correlated topic models applied to the journal Science. (Here are the slides.)
- David Mimno maintains a bibliography of topic modeling papers and software.
- The topic models mailing list is a good forum for discussing topic modeling.
Corpus browsers based on topic models
The structure uncovered by topic models can be used to explore an otherwise unorganized collection. The following are browsers of large collections of documents, built with topic models.
- A 100-topic browser of the dynamic topic model fit to Science (1882-2001).
- A 100-topic browser of the correlated topic model fit to Science (1980-2000).
- A 50-topic browser of latent Dirichlet allocation fit to the 2006 arXiv.
- A 20-topic browser of latent Dirichlet allocation fit to The American Political Science Review.
Also see Sean Gerrish's discipline browser for an interesting application of topic modeling at JSTOR.
To build your own browsers, see Allison Chaney's excellent Topic Model Visualization Engine (TMVE). For example, here is a browser of 100,000 Wikipedia articles that uses TMVE.
Topic modeling software
Our research group has released many open-source software packages for topic modeling. Please post questions, comments, and suggestions about this code to the topic models mailing list.
| Link | Model/Algorithm | Language | Author | Notes |
| --- | --- | --- | --- | --- |
| lda-c | Latent Dirichlet allocation | C | D. Blei | This implements variational inference for LDA. |
| class-slda | Supervised topic models for classification | C++ | C. Wang | Implements supervised topic models with a categorical response. |
| lda | R package for Gibbs sampling in many models | R | J. Chang | Implements many models and is fast. Supports LDA, RTMs (for networked documents), MMSB (for network data), and sLDA (with a continuous response). |
| online lda | Online inference for LDA | Python | M. Hoffman | Fits topic models to massive data. The demo downloads random Wikipedia articles and fits a topic model to them. |
| online hdp | Online inference for the HDP | Python | C. Wang | Fits hierarchical Dirichlet process topic models to massive data. The algorithm determines the number of topics. |
| tmve (online) | Topic Model Visualization Engine | Python | A. Chaney | A package for creating corpus browsers. See, for example, Wikipedia. |
| ctr | Collaborative modeling for recommendation | C++ | C. Wang | Implements variational inference for collaborative topic models. These models recommend items to users based on item content and other users' ratings. |
| dtm | Dynamic topic models and the influence model | C++ | S. Gerrish | This implements topics that change over time and a model of how individual documents predict that change. |
| hdp | Hierarchical Dirichlet processes | C++ | C. Wang | Topic models where the data determine the number of topics. This implements Gibbs sampling. |
| ctm-c | Correlated topic models | C | D. Blei | This implements variational inference for the CTM. |
| diln | Discrete infinite logistic normal | C | J. Paisley | This implements the discrete infinite logistic normal, a Bayesian nonparametric topic model that finds correlated topics. |
| hlda | Hierarchical latent Dirichlet allocation | C | D. Blei | This implements a topic model that finds a hierarchy of topics. The structure of the hierarchy is determined by the data. |
| turbotopics | Turbo topics | Python | D. Blei | Turbo topics find significant multiword phrases in topics. |
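
Several packages in the table fit LDA by Gibbs sampling. To give a rough sense of what that involves, here is a minimal sketch of a collapsed Gibbs sampler in plain Python. It is not taken from any of the packages listed above; the function name `lda_gibbs`, the hyperparameter defaults `alpha` and `beta`, and the toy corpus are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
              iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch only).

    docs is a list of documents, each a list of integer word ids.
    Returns the document-topic and topic-word count tables.
    """
    rng = random.Random(seed)
    # Count tables: document-topic, topic-word, and per-topic totals.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Assign every token to a random topic to start.
    assign = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token,
                # up to a normalizing constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


# Hypothetical toy corpus: two "biology" and two "astronomy" documents.
vocab = ["gene", "dna", "cell", "star", "galaxy", "orbit"]
docs = [[0, 1, 2, 0, 1], [1, 2, 0, 2], [3, 4, 5, 3], [4, 5, 3, 4, 5]]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2, vocab_size=len(vocab))
for k, counts in enumerate(topic_word):
    top = sorted(range(len(vocab)), key=lambda w: -counts[w])[:3]
    print("topic", k, [vocab[w] for w in top])
```

The sketch keeps only count tables and resamples one token at a time, which is easy to follow but slow. For real corpora, the packages in the table are the better choice.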