Self-Taught Learning
The promise of self-taught learning and unsupervised feature learning is that if we can get our algorithms to learn from unlabeled data, then we can easily obtain and learn from massive amounts of it. Even though a single unlabeled example is less informative than a single labeled example, if we can get tons of the former (for example, by downloading random unlabeled images, audio clips, or text documents off the internet), and if our algorithms can exploit this unlabeled data effectively, then we might be able to achieve better performance than the massive hand-engineering and massive hand-labeling approaches.
Learning features
We have already seen how an autoencoder can be used to learn features from unlabeled data. Concretely, suppose we have an unlabeled training set $\{x_u^{(1)}, x_u^{(2)}, \ldots, x_u^{(m_u)}\}$ with $m_u$ unlabeled examples. (The subscript "u" stands for "unlabeled.") We can then train a sparse autoencoder on this data (perhaps with appropriate whitening or other pre-processing).
[Figure: sparse autoencoder trained on the unlabeled data]
Having trained the parameters $W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}$ of this model, given any new input $x$, we can now compute the corresponding vector of activations $a$ of the hidden units. As we saw previously, this often gives a better representation of the input than the original raw input $x$. We can also visualize the algorithm for computing the features/activations $a$ as the following neural network:
[Figure: network mapping the input $x$ to the hidden activations $a$]
This is just the sparse autoencoder that we previously had, with the final layer removed.
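The feature computation described above can be sketched in a few lines. This is a minimal illustration, not the tutorial's own code: it assumes sigmoid hidden units, writes the first-layer weights and biases as `W1` and `b1` (names chosen here), and uses made-up parameter values in place of a trained autoencoder.

```python
import math

def sigmoid(z):
    """Logistic activation, assumed here as the hidden-unit nonlinearity."""
    return 1.0 / (1.0 + math.exp(-z))

def hidden_activations(W1, b1, x):
    """Return a = sigmoid(W1 x + b1), i.e. the autoencoder without its output layer.

    W1 is a list of rows (one per hidden unit), b1 a list of biases,
    and x the input vector as a list of floats.
    """
    return [
        sigmoid(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
        for row, b_i in zip(W1, b1)
    ]

# Hypothetical toy parameters standing in for a trained autoencoder.
W1 = [[1.0, -1.0, 0.0],
      [0.0, 0.0, 0.0]]
b1 = [0.0, 0.0]
x = [2.0, 2.0, 5.0]

a = hidden_activations(W1, b1, x)
print(a)  # both pre-activations are 0, so this prints [0.5, 0.5]
```

Only the first-layer parameters are needed at this stage; the output-layer parameters matter during autoencoder training but play no role when extracting features.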
Now, suppose we have a labeled training set $\{(x_l^{(1)}, y^{(1)}), (x_l^{(2)}, y^{(2)}), \ldots, (x_l^{(m_l)}, y^{(m_l)})\}$ of $m_l$ examples. (The subscript "l" stands for "labeled.") We can now find a better representation for the inputs. In particular, rather than representing the first training example as $x_l^{(1)}$, we can feed $x_l^{(1)}$ as the input to our autoencoder, and obtain the corresponding vector of activations $a_l^{(1)}$. To represent this example, we can either replace the original feature vector with $a_l^{(1)}$, or concatenate the two feature vectors together, getting the representation $(x_l^{(1)}, a_l^{(1)})$.
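Thus, our training set now becomes $\{(a_l^{(1)}, y^{(1)}), (a_l^{(2)}, y^{(2)}), \ldots, (a_l^{(m_l)}, y^{(m_l)})\}$ (if we use the replacement representation, and use $a_l^{(i)}$ to represent the $i$-th training example), or $\{((x_l^{(1)}, a_l^{(1)}), y^{(1)}), ((x_l^{(2)}, a_l^{(2)}), y^{(2)}), \ldots, ((x_l^{(m_l)}, a_l^{(m_l)}), y^{(m_l)})\}$ (if we use the concatenated representation). In practice, the concatenated representation often works better; but for memory or computation reasons, we will sometimes use the replacement representation as well.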
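To make the two representations concrete, here is a small sketch that builds both versions of the labeled training set. The function names and the toy values are illustrative assumptions; the activations in `A_l` are made up rather than produced by a real autoencoder.

```python
def replacement_features(x, a):
    """Replacement representation: keep only the learned activations a."""
    return list(a)

def concatenated_features(x, a):
    """Concatenated representation: raw input x followed by the activations a."""
    return list(x) + list(a)

def build_training_set(X, A, y, concatenate=True):
    """Pair each new feature vector with its label."""
    make = concatenated_features if concatenate else replacement_features
    return [(make(x, a), label) for x, a, label in zip(X, A, y)]

# Hypothetical toy data: two labeled examples, 3 raw features, 2 activations each.
X_l = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
A_l = [[0.9, 0.1], [0.2, 0.8]]
y = [1, 0]

replaced = build_training_set(X_l, A_l, y, concatenate=False)
concatenated = build_training_set(X_l, A_l, y, concatenate=True)
print(replaced[0])      # ([0.9, 0.1], 1)
print(concatenated[0])  # ([0.1, 0.2, 0.3, 0.9, 0.1], 1)
```

The concatenated vectors are longer (input size plus hidden size), which is the memory and computation cost mentioned above.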
Finally, we can train a supervised learning algorithm such as an SVM, logistic regression, etc. to obtain a function that makes predictions on the $y$ values. Given a test example $x_{\rm test}$, we would then follow the same procedure: first, feed it to the autoencoder to get $a_{\rm test}$. Then, feed either $a_{\rm test}$ or $(x_{\rm test}, a_{\rm test})$ to the trained classifier to get a prediction.
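The whole procedure (encode the labeled inputs, train a classifier, then encode and classify a test example) can be sketched as follows. This is an illustration under stated assumptions: the autoencoder weights `W1` and `b1` are made-up values rather than trained ones, the classifier is a plain batch-gradient-descent logistic regression written for this example, and the concatenated representation is used.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode(W1, b1, x):
    """Hidden activations of the (assumed already trained) autoencoder."""
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(W1, b1)]

def featurize(W1, b1, x):
    """Concatenated representation: raw input followed by its activations."""
    return list(x) + encode(W1, b1, x)

def train_logistic_regression(F, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    n = len(F[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for f, label in zip(F, y):
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - label
            grad_w = [g + err * fi for g, fi in zip(grad_w, f)]
            grad_b += err
        w = [wi - lr * g / len(F) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(F)
    return w, b

def predict_proba(w, b, f):
    """Predicted probability that the label is 1."""
    return sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)

# Hypothetical autoencoder parameters and a tiny labeled set (label = first input).
W1 = [[2.0, -1.0], [-1.0, 1.5]]
b1 = [0.0, 0.0]
X_l = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y_l = [0, 0, 1, 1]

# Training phase: encode every labeled input, then fit the classifier.
F_l = [featurize(W1, b1, x) for x in X_l]
w, b = train_logistic_regression(F_l, y_l)

# Test phase: the test input goes through exactly the same encoder.
x_test = [0.9, 0.2]
p_test = predict_proba(w, b, featurize(W1, b1, x_test))
print("P(y = 1 | x_test) =", p_test)
```

The point to notice is that `featurize` is the single path for both training and test inputs, so the classifier always sees features produced the same way.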
On pre-processing the data
During the feature learning stage where we were learning from the unlabeled training set $\{x_u^{(1)}, x_u^{(2)}, \ldots, x_u^{(m_u)}\}$, we may have computed various pre-processing parameters. For example, one may have computed a mean value of the data and subtracted off this mean to perform mean normalization, or used PCA to compute a matrix $U$ to represent the data as $U^T x$ (or used PCA whitening or ZCA whitening). If this is the case, then it is important to save these preprocessing parameters, and to use the same parameters during the labeled training phase and the test phase, so as to make sure we are always transforming the data the same way before feeding it into the autoencoder. In particular, if we have computed a matrix $U$ using the unlabeled data and PCA, we should keep the same matrix $U$ and use it to preprocess the labeled examples and the test data. We should not re-estimate a different $U$ matrix (or data mean for mean normalization, etc.) using the labeled training set, since that might result in a dramatically different pre-processing transformation, which would make the input distribution to the autoencoder very different from what it was actually trained on.
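The rule above can be illustrated with mean normalization, the simplest of the pre-processing steps mentioned. In this sketch the mean is estimated once from the unlabeled data and then reused unchanged; the function names and the toy numbers are assumptions made for the example, and a PCA matrix would be saved and reused in the same way.

```python
def fit_mean(X_u):
    """Estimate the per-feature mean from the unlabeled data only."""
    m = len(X_u)
    return [sum(column) / m for column in zip(*X_u)]

def apply_mean_normalization(X, mean):
    """Subtract a previously saved mean; nothing is re-estimated here."""
    return [[v - mu for v, mu in zip(x, mean)] for x in X]

# Hypothetical toy data.
X_u = [[1.0, 10.0], [3.0, 30.0]]   # unlabeled examples
X_l = [[2.0, 25.0]]                # labeled examples
X_test = [[4.0, 20.0]]             # test examples

mean_u = fit_mean(X_u)             # saved once: [2.0, 20.0]

# The same saved mean is applied in every phase.
X_u_centered = apply_mean_normalization(X_u, mean_u)
X_l_centered = apply_mean_normalization(X_l, mean_u)
X_test_centered = apply_mean_normalization(X_test, mean_u)

print(X_l_centered)     # [[0.0, 5.0]]
print(X_test_centered)  # [[2.0, 0.0]]
```

Calling `fit_mean(X_l)` instead would center the single labeled example to all zeros, a different transformation from the one the autoencoder saw during training.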
On the terminology of unsupervised feature learning
There are two common unsupervised feature learning settings, depending on what type of unlabeled data you have. The more general and powerful setting is the self-taught learning setting, which does not assume that your unlabeled data $x_u$ has to be drawn from the same distribution as your labeled data $x_l$. The more restrictive setting, where the unlabeled data comes from exactly the same distribution as the labeled data, is sometimes called the semi-supervised learning setting. This distinction is best explained with an example, which we now give.
Suppose your goal is a computer vision task where you'd like to distinguish between images of cars and images of motorcycles; so, each labeled example in your training set is either an image of a car or an image of a motorcycle. Where can we get lots of unlabeled data? The easiest way would be to obtain some random collection of images, perhaps downloaded off the internet. We could then train the autoencoder on this large collection of images, and obtain useful features from them. Because here the unlabeled data is drawn from a different distribution than the labeled data (i.e., perhaps some of our unlabeled images may contain cars/motorcycles, but not every image downloaded is either a car or a motorcycle), we call this self-taught learning.
In contrast, if we happen to have lots of unlabeled images lying around that are all images of either a car or a motorcycle, but where the data is just missing its label (so we don't know which ones are cars and which ones are motorcycles), then we could use this form of unlabeled data to learn the features. This setting, where each unlabeled example is drawn from the same distribution as the labeled examples, is sometimes called the semi-supervised setting. In practice, we often do not have this sort of unlabeled data (where would you get a database of images where every image is either a car or a motorcycle, but just missing its label?), and so in the context of learning features from unlabeled data, the self-taught learning setting is more broadly applicable.
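Self-taught learning vs. semi-supervised learning
Semi-supervised learning assumes that the unlabeled data and the labeled data share the same distribution.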