The promise of self-taught learning and unsupervised feature learning is that if we can get our algorithms to learn from unlabeled data, then we can easily obtain and learn from massive amounts of it. Even though a single unlabeled example is less informative than a single labeled example, if we can get tons of the former---for example, by downloading random unlabeled images/audio clips/text documents off the internet---and if our algorithms can exploit this unlabeled data effectively, then we might be able to achieve better performance than massive hand-engineering and massive hand-labeling approaches.

Learning features

We have already seen how an autoencoder can be used to learn features from unlabeled data. Concretely, suppose we have an unlabeled training set $\{ x_u^{(1)}, x_u^{(2)}, \ldots, x_u^{(m_u)} \}$ with $m_u$ unlabeled examples. (The subscript "u" stands for "unlabeled.") We can then train a sparse autoencoder on this data (perhaps with appropriate whitening or other pre-processing):

Having trained the parameters of this model, given any new input $x$, we can now compute the corresponding vector of activations $a$ of the hidden units. As we saw previously, this often gives a better representation of the input than the original raw input $x$. We can also visualize the algorithm for computing the features/activations as the following neural network:

This is just the sparse autoencoder that we previously had, with the final layer removed.
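To make this feed-forward step concrete, here is a minimal sketch in Python/NumPy. This is an illustration rather than the tutorial's own code; the names `W1`, `b1` and the sigmoid activation are assumptions about the trained sparse autoencoder.

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic activation."""
    return 1.0 / (1.0 + np.exp(-z))

def feed_forward_autoencoder(W1, b1, X):
    """Compute hidden-unit activations a = sigmoid(W1 x + b1).

    W1 : (hidden_size, visible_size) first-layer weights of the trained
         sparse autoencoder (the final/decoder layer is discarded)
    b1 : (hidden_size,) first-layer biases
    X  : (visible_size, n_examples) inputs, one example per column
    Returns a (hidden_size, n_examples) matrix of activations.
    """
    return sigmoid(W1 @ X + b1[:, np.newaxis])
```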

Now, suppose we have a labeled training set $\{ (x_l^{(1)}, y^{(1)}), (x_l^{(2)}, y^{(2)}), \ldots, (x_l^{(m_l)}, y^{(m_l)}) \}$ of $m_l$ examples. (The subscript "l" stands for "labeled.") We can now find a better representation for the inputs. In particular, rather than representing the first training example as $x_l^{(1)}$, we can feed $x_l^{(1)}$ as the input to our autoencoder and obtain the corresponding vector of activations $a_l^{(1)}$. To represent this example, we can either just replace the original feature vector with $a_l^{(1)}$, or we can concatenate the two feature vectors together, getting a representation $(x_l^{(1)}, a_l^{(1)})$.

Thus, our training set now becomes $\{ (a_l^{(1)}, y^{(1)}), (a_l^{(2)}, y^{(2)}), \ldots, (a_l^{(m_l)}, y^{(m_l)}) \}$ (if we use the replacement representation, and use $a_l^{(i)}$ to represent the $i$-th training example), or $\{ ((x_l^{(1)}, a_l^{(1)}), y^{(1)}), ((x_l^{(2)}, a_l^{(2)}), y^{(2)}), \ldots, ((x_l^{(m_l)}, a_l^{(m_l)}), y^{(m_l)}) \}$ (if we use the concatenated representation). In practice, the concatenated representation often works better; but for memory or computation reasons, we will sometimes use the replacement representation as well.
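As a sketch of the two options, reusing the hypothetical `feed_forward_autoencoder` above (here `X_l` is an assumed matrix holding the labeled inputs, one example per column):

```python
# Activations a_l^{(i)} for every labeled example.
A_l = feed_forward_autoencoder(W1, b1, X_l)

X_replace = A_l                     # replacement representation
X_concat = np.vstack([X_l, A_l])    # concatenated representation (x, a)
```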

Finally, we can train a supervised learning algorithm such as an SVM, logistic regression, etc. to obtain a function that makes predictions on the $y$ values. Given a test example $x_{\rm test}$, we would then follow the same procedure: first, feed it to the autoencoder to get $a_{\rm test}$. Then, feed either $a_{\rm test}$ or $(x_{\rm test}, a_{\rm test})$ to the trained classifier to get a prediction.
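For instance, here is a minimal sketch using scikit-learn's logistic regression (an assumption on our part; any off-the-shelf classifier would do), continuing with the concatenated representation from above:

```python
from sklearn.linear_model import LogisticRegression

# Train on the concatenated features; scikit-learn expects one example
# per row, hence the transposes.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_concat.T, y_l)

# At test time, follow the same procedure for x_test.
a_test = feed_forward_autoencoder(W1, b1, x_test[:, np.newaxis])
features_test = np.vstack([x_test[:, np.newaxis], a_test])
prediction = clf.predict(features_test.T)
```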

On pre-processing the data

During the feature learning stage, where we were learning from the unlabeled training set $\{ x_u^{(1)}, x_u^{(2)}, \ldots, x_u^{(m_u)} \}$, we may have computed various pre-processing parameters. For example, we may have computed a mean value of the data and subtracted off this mean to perform mean normalization, or used PCA to compute a matrix $U$ and represented the data as $Ux$ (or used PCA whitening or ZCA whitening). If this is the case, then it is important to save away these pre-processing parameters and to use the same parameters during the labeled training phase and the test phase, so as to make sure we are always transforming the data the same way before feeding it into the autoencoder. In particular, if we have computed a matrix $U$ using the unlabeled data and PCA, we should keep the same matrix $U$ and use it to preprocess the labeled examples and the test data. We should not re-estimate a different matrix $U$ (or data mean for mean normalization, etc.) using the labeled training set, since that might result in a dramatically different pre-processing transformation, which would make the input distribution to the autoencoder very different from what it was actually trained on.
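A minimal sketch of this rule, using scikit-learn's PCA for whitening (an assumption; any PCA implementation works, and `fit` here also stores the data mean, so it covers mean normalization as well):

```python
from sklearn.decomposition import PCA

# Estimate the pre-processing parameters on the UNLABELED data only.
# fit() stores both the data mean and the projection matrix internally.
pca = PCA(whiten=True)
pca.fit(X_u.T)

# Reuse the exact same frozen transform in every subsequent phase.
X_u_white = pca.transform(X_u.T).T     # feature-learning (autoencoder) phase
X_l_white = pca.transform(X_l.T).T     # labeled training phase
X_t_white = pca.transform(X_test.T).T  # test phase
```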

On the terminology of unsupervised feature learning

There are two common unsupervised feature learning settings, depending on what type of unlabeled data you have. The more general and powerful setting is the self-taught learning setting, which does not assume that your unlabeled data $x_u$ has to be drawn from the same distribution as your labeled data $x_l$. The more restrictive setting, where the unlabeled data comes from exactly the same distribution as the labeled data, is sometimes called the semi-supervised learning setting. This distinction is best explained with an example, which we now give.

Suppose your goal is a computer vision task where you'd like to distinguish between images of cars and images of motorcycles; so, each labeled example in your training set is either an image of a car or an image of a motorcycle. Where can we get lots of unlabeled data? The easiest way would be to obtain some random collection of images, perhaps downloaded off the internet. We could then train the autoencoder on this large collection of images, and obtain useful features from them. Because here the unlabeled data is drawn from a different distribution than the labeled data (i.e., perhaps some of our unlabeled images may contain cars/motorcycles, but not every image downloaded is either a car or a motorcycle), we call this self-taught learning.

In contrast, if we happen to have lots of unlabeled images lying around that are all images of either a car or a motorcycle, but where the data is just missing its label (so you don't know which ones are cars, and which ones are motorcycles), then we could use this form of unlabeled data to learn the features. This setting---where each unlabeled example is drawn from the same distribution as your labeled examples---is sometimes called the semi-supervised setting. In practice, we often do not have this sort of unlabeled data (where would you get a database of images where every image is either a car or a motorcycle, but just missing its label?), and so in the context of learning features from unlabeled data, the self-taught learning setting is more broadly applicable.

Self-taught learning vs. semi-supervised learning

Semi-supervised learning assumes that the unlabeled data and the labeled data come from the same distribution.
