1. Two main types of transfer learning:

    • ConvNet as fixed feature extractor: take a ConvNet pretrained on a large dataset (e.g. ImageNet), such as AlexNet or VGGNet, remove the last few layers (typically the final classifier), and treat the rest of the ConvNet as a fixed feature extractor for the new dataset. The extracted features are called CNN codes; if they were ReLU-thresholded during pretraining (as is usually the case), they should be ReLU-thresholded here as well (important for performance). Once the CNN codes have been extracted for all images, train a linear classifier (Linear SVM or Softmax classifier) on them for the new dataset (a sketch of this strategy follows this list);
    • Fine-tuning the ConvNet: step 1: replace the classifier on top of the pretrained ConvNet and retrain it on the new dataset; step 2: continue backpropagation with a small learning rate to fine-tune the pretrained weights. Two options: fine-tune all layers of the ConvNet, or keep some earlier layers fixed (due to overfitting concerns) and only fine-tune some higher-level portion of the network;
    • Rationale: the layers near the input image are generally thought to extract basic features such as edges, texture, and color, while layers closer to the output extract features that are increasingly high-level, abstract, and task-specific. Hence the most common recipe: replace the last few layers of the pretrained network and retrain only those on the new dataset (the earlier layers stay fixed and act as a feature extractor), then optionally fine-tune the whole network with a small learning rate.
    • Some open-source pretrained models: Model Zoo
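
A minimal sketch of the fixed-feature-extractor strategy, assuming PyTorch/torchvision as the framework (the notes themselves do not prescribe one) and a made-up 5-class target task; the frozen AlexNet backbone and the single new linear layer mirror the description above.

```python
# Sketch only: torchvision's ImageNet-pretrained AlexNet as a fixed feature
# extractor; the 5-class head is a made-up example.
import torch
import torch.nn as nn
from torchvision import models

alexnet = models.alexnet(pretrained=True)
alexnet.eval()                                  # the extractor stays fixed
for p in alexnet.parameters():                  # freeze every pretrained weight
    p.requires_grad = False

# Everything up to the last hidden FC layer (including its ReLU) produces the
# 4096-D, ReLU-thresholded "CNN codes".
feature_extractor = nn.Sequential(
    alexnet.features,
    alexnet.avgpool,
    nn.Flatten(),
    *list(alexnet.classifier.children())[:-1],  # drop the final 1000-way layer
)

num_new_classes = 5                             # e.g. 5 flower categories (made up)
linear_classifier = nn.Linear(4096, num_new_classes)

with torch.no_grad():
    codes = feature_extractor(torch.randn(8, 3, 224, 224))  # dummy image batch
logits = linear_classifier(codes)               # only this layer gets trained
```

Since the extractor never changes, in practice the codes would be computed once for the whole dataset and cached, and only the linear classifier trained on them.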

  • When and how to fine-tune?

    • Four main scenarios:
    • New dataset is small and similar to the original dataset: overfitting on the small dataset is a concern, so train a linear classifier on the CNN codes;
    • New dataset is large and similar to the original dataset: overfitting is less of a worry, so try fine-tuning the entire network;
    • New dataset is small but very different from the original dataset: train only a linear classifier; because the new data differs a lot from the original data, it may work better to train the SVM classifier on activations from somewhere earlier in the network;
    • New dataset is large and very different from the original dataset: initialize with the pretrained weights and fine-tune the entire network on the new data (see the fine-tuning sketch after this list).
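
A hedged sketch of the fine-tuning recipe (the two-step procedure summarized at the top of these notes), again assuming PyTorch/torchvision; the cut-off index for the frozen earlier layers and the 5-class head are illustrative choices, not values from the notes.

```python
import torch.nn as nn
import torch.optim as optim
from torchvision import models

model = models.vgg16(pretrained=True)

# Step 1: replace the 1000-way ImageNet classifier with a new head for the task.
num_new_classes = 5                             # made-up example value
model.classifier[6] = nn.Linear(4096, num_new_classes)

# Step 2 (optional, mainly for smaller datasets): keep earlier conv blocks fixed
# and only fine-tune the higher-level portion of the network.
for p in model.features[:17].parameters():      # first three conv blocks stay frozen
    p.requires_grad = False

# Fine-tune whatever is left unfrozen with a small learning rate.
optimizer = optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-4, momentum=0.9,
)
```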

  • Practical advice

    • A few additional things to keep in mind when performing Transfer Learning:
    • Constraints from pretrained models: using a pretrained network somewhat constrains the architecture you can use on the new dataset, e.g. you cannot arbitrarily take out Conv layers from the pretrained network;
    • Learning rates: when fine-tuning the ConvNet weights (which are already relatively good), use a smaller learning rate than for the new linear classifier, whose weights are randomly initialized (see the sketch below);
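
A sketch of this advice, assuming PyTorch parameter groups (the notes do not name a framework): the pretrained weights get a small learning rate, the freshly initialized classifier a larger one; the actual values are placeholders.

```python
import torch.nn as nn
import torch.optim as optim
from torchvision import models

model = models.vgg16(pretrained=True)
model.classifier[6] = nn.Linear(4096, 5)             # new, randomly initialized head

optimizer = optim.SGD(
    [
        {"params": model.features.parameters(), "lr": 1e-4},        # pretrained conv weights
        {"params": model.classifier[:6].parameters(), "lr": 1e-4},  # pretrained FC weights
        {"params": model.classifier[6].parameters(), "lr": 1e-2},   # fresh classifier
    ],
    momentum=0.9,
)
```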

Original English notes:

(These notes are currently in draft form and under development)

Table of Contents:

● Transfer Learning

● Additional References

Transfer Learning

In practice, very few people train an entire Convolutional Network from scratch (with random initialization), because it is relatively rare to have a dataset of sufficient size. Instead, it is common to pretrain a ConvNet on a very large dataset (e.g. ImageNet, which contains 1.2 million images with 1000 categories), and then use the ConvNet either as an initialization or a fixed feature extractor for the task of interest. The three major Transfer Learning scenarios look as follows:

● ConvNet as fixed feature extractor. Take a ConvNet pretrained on ImageNet, remove the last fully-connected layer (this layer’s outputs are the 1000 class scores for a different task like ImageNet), then treat the rest of the ConvNet as a fixed feature extractor for the new dataset. In an AlexNet, this would compute a 4096-D vector for every image that contains the activations of the hidden layer immediately before the classifier. We call these features CNN codes. It is important for performance that these codes are ReLUd (i.e. thresholded at zero) if they were also thresholded during the training of the ConvNet on ImageNet (as is usually the case). Once you extract the 4096-D codes for all images, train a linear classifier (e.g. Linear SVM or Softmax classifier) for the new dataset.

● Fine-tuning the ConvNet. The second strategy is to not only replace and retrain the classifier on top of the ConvNet on the new dataset, but to also fine-tune the weights of the pretrained network by continuing the backpropagation. It is possible to fine-tune all the layers of the ConvNet, or it’s possible to keep some of the earlier layers fixed (due to overfitting concerns) and only fine-tune some higher-level portion of the network. This is motivated by the observation that the earlier features of a ConvNet contain more generic features (e.g. edge detectors or color blob detectors) that should be useful to many tasks, but later layers of the ConvNet become progressively more specific to the details of the classes contained in the original dataset. In case of ImageNet for example, which contains many dog breeds, a significant portion of the representational power of the ConvNet may be devoted to features that are specific to differentiating between dog breeds.

● Pretrained models. Since modern ConvNets take 2-3 weeks to train across multiple GPUs on ImageNet, it is common to see people release their final ConvNet checkpoints for the benefit of others who can use the networks for fine-tuning. For example, the Caffe library has a Model Zoo where people share their network weights.

When and how to fine-tune? How do you decide what type of transfer learning you should perform on a new dataset? This is a function of several factors, but the two most important ones are the size of the new dataset (small or big), and its similarity to the original dataset (e.g. ImageNet-like in terms of the content of images and the classes, or very different, such as microscope images). Keeping in mind that ConvNet features are more generic in early layers and more original-dataset-specific in later layers, here are some common rules of thumb for navigating the 4 major scenarios:

  1. New dataset is small and similar to original dataset. Since the data is small, it is not a good idea to fine-tune the ConvNet due to overfitting concerns. Since the data is similar to the original data, we expect higher-level features in the ConvNet to be relevant to this dataset as well. Hence, the best idea might be to train a linear classifier on the CNN codes.
  2. New dataset is large and similar to the original dataset. Since we have more data, we can have more confidence that we won’t overfit if we were to try to fine-tune through the full network.
  3. New dataset is small but very different from the original dataset. Since the data is small, it is likely best to only train a linear classifier. Since the dataset is very different, it might not be best to train the classifier from the top of the network, which contains more dataset-specific features. Instead, it might work better to train the SVM classifier from activations somewhere earlier in the network (see the sketch after this list).
  4. New dataset is large and very different from the original dataset. Since the dataset is very large, we may expect that we can afford to train a ConvNet from scratch. However, in practice it is very often still beneficial to initialize with weights from a pretrained model. In this case, we would have enough data and confidence to fine-tune through the entire network.
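
A sketch of scenario 3, under the assumption that we tap activations from an earlier conv block and feed them to scikit-learn's LinearSVC; the layer cut-off, the pooling step, and the dummy data are all illustrative, not part of the original notes.

```python
import torch
import torch.nn as nn
from torchvision import models
from sklearn.svm import LinearSVC

backbone = models.vgg16(pretrained=True).features[:10].eval()  # early conv blocks only
pool = nn.AdaptiveAvgPool2d(1)                                  # collapse spatial dims

def early_codes(images):
    """images: float tensor of shape (N, 3, 224, 224)."""
    with torch.no_grad():
        feats = pool(backbone(images))       # (N, C, 1, 1) activations from an early layer
    return feats.flatten(1).numpy()          # (N, C) feature matrix

# Dummy stand-ins for the new (very different) dataset.
X_train = torch.randn(16, 3, 224, 224)
y_train = torch.randint(0, 5, (16,)).numpy()
svm = LinearSVC().fit(early_codes(X_train), y_train)
```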

    Practical advice. There are a few additional things to keep in mind when performing Transfer Learning:

    ● Constraints from pretrained models. Note that if you wish to use a pretrained network, you may be slightly constrained in terms of the architecture you can use for your new dataset. For example, you can’t arbitrarily take out Conv layers from the pretrained network. However, some changes are straight-forward: Due to parameter sharing, you can easily run a pretrained network on images of different spatial size. This is clearly evident in the case of Conv/Pool layers because their forward function is independent of the input volume spatial size (as long as the strides “fit”). In case of FC layers, this still holds true because FC layers can be converted to a Convolutional Layer: For example, in an AlexNet, the final pooling volume before the first FC layer is of size [6x6x512]. Therefore, the FC layer looking at this volume is equivalent to having a Convolutional Layer that has receptive field size 6x6, and is applied with padding of 0.
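
A small sketch of the FC-to-Conv equivalence, reusing the [6x6x512] numbers quoted above and assuming PyTorch: the FC weights are reshaped into 6x6 filters, and the resulting conv layer produces the same outputs on the pooled volume.

```python
import torch
import torch.nn as nn

fc = nn.Linear(6 * 6 * 512, 4096)                        # original FC layer
conv = nn.Conv2d(512, 4096, kernel_size=6, padding=0)    # equivalent conv layer

# Reuse the FC weights: each of the 4096 output units becomes one 512x6x6 filter.
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(4096, 512, 6, 6))
    conv.bias.copy_(fc.bias)

x = torch.randn(1, 512, 6, 6)                            # the pooled volume
same = torch.allclose(conv(x).flatten(1), fc(x.flatten(1)), atol=1e-4)  # True
```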

    ● Learning rates. It’s common to use a smaller learning rate for ConvNet weights that are being fine-tuned, in comparison to the (randomly-initialized) weights for the new linear classifier that computes the class scores of your new dataset. This is because we expect that the ConvNet weights are relatively good, so we don’t wish to distort them too quickly and too much (especially while the new Linear Classifier above them is being trained from random initialization).

    Additional References

    ● CNN Features off-the-shelf: an Astounding Baseline for Recognition trains SVMs on features from ImageNet-pretrained ConvNet and reports several state of the art results.

    ● DeCAF reported similar findings in 2013. The framework in this paper (DeCAF) was a Python-based precursor to the C++ Caffe library.

    ● How transferable are features in deep neural networks? studies the transfer learning performance in detail, including some unintuitive findings about layer co-adaptations.

  • Transfer learning (transfer-learning)

    • Transfer learning reference: the CS231n course notes
    • Transfer learning in TensorFlow: How to Retrain Inception's Final Layer for New Categories
    • In practice, you usually do not train a huge network from scratch yourself. Many models already exist that were trained for weeks on large datasets such as ImageNet.
    • Goal: learn to use the pretrained VGGNet model to classify flower images
    • VGGNet is great because it's simple and has great performance, coming in second in the ImageNet competition. The idea here is that we keep all the convolutional layers, but replace the final fully connected layers with our own classifier. This way we can use VGGNet as a feature extractor for our images then easily train a simple classifier on top of that. What we'll do is take the first fully connected layer with 4096 units, including thresholding with ReLUs. We can use those values as a code for each image, then build a classifier on top of those codes.
    • Using VGGNet to classify images of flowers (a minimal sketch follows this list):
    • GitHub: tensorflow-vgg (TensorFlow VGG16 and VGG19)
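
A minimal sketch of the VGGNet-as-feature-extractor idea described above, using torchvision's VGG16 as a stand-in for the linked tensorflow-vgg code; the 5-class flower head, the dummy data, and the tiny training loop are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torchvision import models

vgg = models.vgg16(pretrained=True).eval()
for p in vgg.parameters():
    p.requires_grad = False

# All conv layers plus the first fully connected layer (4096 units) and its ReLU:
# the ReLU-thresholded activations are the per-image codes.
codes_net = nn.Sequential(
    vgg.features,
    vgg.avgpool,
    nn.Flatten(),
    *list(vgg.classifier.children())[:2],    # Linear(25088, 4096) + ReLU
)

# Dummy stand-ins for preprocessed flower images and their labels.
images = torch.randn(16, 3, 224, 224)
labels = torch.randint(0, 5, (16,))

with torch.no_grad():
    codes = codes_net(images)                # compute (and normally cache) the codes

head = nn.Linear(4096, 5)                    # simple classifier on top of the codes
opt = torch.optim.SGD(head.parameters(), lr=1e-2)
for _ in range(10):                          # tiny training loop on the cached codes
    loss = nn.functional.cross_entropy(head(codes), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```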
