(Repost) Predictive learning vs. representation learning

When you take a machine learning class, there’s a good chance it’s divided into a unit on supervised learning and a unit on unsupervised learning. We certainly care about this distinction for a practical reason: there are often orders of magnitude more data available if we don’t need to collect ground-truth labels. But we also tend to think it matters for more fundamental reasons. In particular, the following are some common intuitions:

In supervised learning, the particular algorithm is usually less important than engineering and tuning it really well. In unsupervised learning, we’d think carefully about the structure of the data and build a model which reflects that structure.
In supervised learning, except in small-data settings, we throw whatever features we can think of at the problem. In unsupervised learning, we carefully pick the features we think best represent the aspects of the data we care about.
Supervised learning seems to have many algorithms with strong theoretical guarantees, and unsupervised learning very few.
Off-the-shelf algorithms perform very well on a wide variety of supervised tasks, but unsupervised learning requires more care and expertise to come up with an appropriate model.

I’d argue that this is deceptive. I think the real division in machine learning isn’t between supervised and unsupervised, but between what I’ll term predictive learning and representation learning. I haven’t heard it described in precisely this way before, but I think this distinction reflects a lot of our intuitions about how to approach a given machine learning problem.

In predictive learning, we observe data drawn from some distribution, and we are interested in predicting some aspect of this distribution. In textbook supervised learning, for instance, we observe a bunch of pairs (x, y), and given some new example x, we’re interested in predicting something about the corresponding y. In density modeling (a form of unsupervised learning), we observe unlabeled data x, and we are interested in modeling the distribution the data comes from, perhaps so we can perform inference in that distribution. In each of these cases, there is a well-defined predictive task where we try to predict some aspect of the observable values, possibly given some other aspect.
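To make the density modeling case concrete, here is a minimal sketch of what "well-defined predictive task" can mean there: fit a model to training data and score it by the average log-likelihood it assigns to held-out data. The one-dimensional Gaussian, the synthetic data, and the deliberately wrong baseline are all assumptions chosen for illustration, not anything from the papers cited in this post.

```python
import math
import random

def fit_gaussian(xs):
    """Maximum-likelihood mean and variance of a 1-D Gaussian."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return mean, var

def avg_log_likelihood(xs, mean, var):
    """Average log-density of the points xs under N(mean, var)."""
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        for x in xs
    ) / len(xs)

rng = random.Random(0)
# Synthetic unlabeled data (assumed for the example): samples from N(2, 1.5^2).
data = [rng.gauss(2.0, 1.5) for _ in range(2000)]
train, held_out = data[:1500], data[1500:]

mean, var = fit_gaussian(train)
fitted_score = avg_log_likelihood(held_out, mean, var)
# A deliberately mis-specified model, N(0, 1), as a point of comparison.
baseline_score = avg_log_likelihood(held_out, 0.0, 1.0)

print("fitted model:", fitted_score)
print("baseline model:", baseline_score)
```

The point is only that there is a single number to optimize and compare, exactly as with held-out accuracy in supervised learning.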

In representation learning, our goal isn’t to predict observables, but to learn something about the underlying structure. In cognitive science and AI, a representation is a formal system which maps to some domain of interest in systematic ways. A good representation allows us to answer queries about the domain by manipulating that system. In machine learning, representations often take the form of vectors, either real- or binary-valued, and we can manipulate these representations with operations like Euclidean distance and matrix multiplication. For instance, PCA learns representations of data points as vectors. We can ask how similar two data points are by checking the Euclidean distance between them.
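As a rough illustration of the PCA example, the sketch below learns a two-dimensional representation of three-dimensional points and then answers a similarity query with Euclidean distance. Everything specific here is an assumption for the sake of a self-contained example: the toy two-cluster data, the helper names `pca` and `encode`, and the use of power iteration with deflation in place of a library eigensolver.

```python
import math
import random

def pca(points, k, iters=200):
    """Return (mean, components): top-k principal directions via power iteration."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    # d x d covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(k):
        v = [1.0 / math.sqrt(d)] * d
        for _step in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        eigval = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: remove the direction just found so the next pass finds the next one.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return mean, components

def encode(point, mean, components):
    """Represent a point by its coordinates along the principal directions."""
    return [sum((point[j] - mean[j]) * comp[j] for j in range(len(mean)))
            for comp in components]

rng = random.Random(0)
# Hypothetical toy data: two noisy clusters in 3-D.
cluster_a = [[rng.gauss(0, 0.3) for _ in range(3)] for _ in range(50)]
cluster_b = [[5 + rng.gauss(0, 0.3), 5 + rng.gauss(0, 0.3), rng.gauss(0, 0.3)]
             for _ in range(50)]
points = cluster_a + cluster_b

mean, components = pca(points, k=2)
codes = [encode(p, mean, components) for p in points]

# Query: is point 0 more similar to point 1 (same cluster) or to the last point?
same_cluster = math.dist(codes[0], codes[1])
different_cluster = math.dist(codes[0], codes[-1])
print(same_cluster, different_cluster)
```

Nothing here predicts an observable; the learned vectors are only useful because distance between them tracks a relationship in the data that we care about.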

In representation learning, the goal isn’t to make predictions about observables, but to learn a representation which would later help us to answer various queries. Sometimes the representations are meant for people, such as when we visualize data as a two-dimensional embedding. Sometimes they’re meant for machines, such as when the binary vector representations learned by deep Boltzmann machines are fed into a supervised classifier. In either case, what’s important is that mathematical operations map to the underlying relationships in the data in systematic ways.

Whether your goal is prediction or representation learning influences the sorts of techniques you’ll use to solve the problem. If you’re doing predictive learning, you’ll probably try to engineer a system which exploits as much information as possible about the data, carefully using a validation set to tune parameters and monitor overfitting. If you’re doing representation learning, there’s no good quantitative criterion, so you’ll more likely build a model based on your intuitions about the domain, and then keep staring at the learned representations to see if they make intuitive sense.
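The predictive workflow described above can be sketched in a few lines: pick a tunable parameter, score each setting on a validation set, and keep the best. The example below is an assumption-laden toy, using k-nearest-neighbor regression on synthetic noisy sine data, with a hypothetical list of candidate values for k.

```python
import math
import random

rng = random.Random(0)

def sample(n):
    """Draw n noisy observations of y = sin(x) (synthetic data for the example)."""
    xs = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0, 0.3)) for x in xs]

train, valid = sample(200), sample(100)

def knn_predict(x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def validation_mse(k):
    """Mean squared error on the validation set for a given k."""
    return sum((knn_predict(x, k) - y) ** 2 for x, y in valid) / len(valid)

candidates = [1, 3, 5, 10, 25, 50, 100, 200]
scores = {k: validation_mse(k) for k in candidates}
best_k = min(scores, key=scores.get)

for k in candidates:
    print(k, round(scores[k], 4))
print("selected k:", best_k)
```

With k equal to the full training set size the model just predicts the mean, so the validation score makes underfitting visible in the same way it makes overfitting visible at small k. No comparable number exists for judging a learned representation, which is the contrast the paragraph above draws.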

In other words, this contrast parallels the differences I listed above between supervised and unsupervised learning. This shouldn’t be surprising, because the two dimensions are strongly correlated: most supervised learning is predictive learning, and most unsupervised learning is representation learning. So to see which of these dimensions is really the crux of the issue, let’s look at cases where the two differ.

Language modeling is a perfect example of an application which is unsupervised but predictive. The goal is to take a large corpus of unlabeled text (such as Wikipedia) and learn a distribution over English sentences. The problem is motivated by Bayesian models for speech recognition: a distribution over sentences can be used as a prior for what a person is likely to say. The goal, then, is to model the distribution, and any additional structure is unnecessary. Log-linear models, such as that of Mnih and Hinton [1], are very good at this, and recurrent neural nets [2] are even better. These are the sorts of approaches we’d normally apply in a supervised setting: very good at making predictions, but often hard to interpret. One state-of-the-art algorithm for density modeling of text is PAQ [3], which is a heavily engineered ensemble of sequential predictors, somewhat reminiscent of the winning entries of the Netflix competition.
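For a sense of what "a distribution over sentences" looks like in code, here is a deliberately tiny sketch: a bigram model with add-one smoothing, trained on three made-up sentences. This is far simpler than the log-linear, recurrent, or PAQ-style models mentioned above and is not meant to represent any of them; the corpus, the boundary tokens, and the function names are all assumptions for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for a large unlabeled text collection.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat saw the dog",
]

BOS, EOS = "<s>", "</s>"
sentences = [[BOS] + s.split() + [EOS] for s in corpus]

# Words that can be predicted: every token except the start marker.
vocab = sorted({w for s in sentences for w in s[1:]})
context_counts = Counter(w for s in sentences for w in s[:-1])
bigram_counts = Counter((a, b) for s in sentences for a, b in zip(s, s[1:]))

def prob(word, prev):
    """Add-one smoothed estimate of P(word | prev)."""
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))

def sentence_log_prob(sentence):
    """Log-probability the model assigns to a whole sentence."""
    words = [BOS] + sentence.split() + [EOS]
    return sum(math.log(prob(b, a)) for a, b in zip(words, words[1:]))

plausible = sentence_log_prob("the cat sat on the mat")
scrambled = sentence_log_prob("mat the on sat cat the")
print(plausible, scrambled)
```

The model exposes no interpretable structure at all; it is judged purely by how much probability it puts on text that people actually produce, which is what makes this an unsupervised but predictive problem.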

On the flip side, supervised neural nets are often used to learn representations. One example is Collobert-Weston networks [4], which attempt to solve a number of supervised NLP tasks by learning representations which are shared between them. Some of the tasks are fairly simple and have a large amount of labeled data, such as predicting which of two words should be used to fill in the blank. Others are harder and have less data available, such as semantic role labeling. The simpler tasks are artificial, and they are there to help learn a representation of words and phrases as vectors, where similar words and phrases map to nearby vectors; this representation should then help performance on the harder tasks. We don’t care about the performance on those tasks per se; we care whether the learned embeddings reflect the underlying structure. To debug and tune the algorithm, we’d focus on whether the representations make intuitive sense, rather than on the quantitative performance. There are no theoretical guarantees that such an approach would work — it all depends on our intuitions of how the different tasks are related.

Based on these two examples, it seems like it’s the predictive/representation dimension which determines how we should approach the problem, rather than supervised/unsupervised.

In machine learning, we tend to think there’s no solid theoretical framework for unsupervised learning. But really, the problem is that we haven’t begun to formally characterize the problem of representation learning. If you just want to build a density modeler, that’s about as well understood as the supervised case. But if the goal is to learn representations which capture the underlying structure, that’s much harder to formalize. In my next post, I’ll try to take a stab at characterizing what representation learning is actually about.

[1] Mnih, A., and Hinton, G. E. Three new graphical models for statistical language modelling. ICML 2007

[2] Sutskever, I., Martens, J., and Hinton, G. E. Generating text with recurrent neural networks. ICML 2011

[3] Mahoney, M. Adaptive weighting of context models for lossless data compression. Florida Institute of Technology Tech report, 2005

[4] Collobert, R., and Weston, J. A unified architecture for natural language processing: deep neural networks with multitask learning. ICML 2008

Posted in Machine Learning.

By Roger Grosse – February 4, 2013
