1. Two main types of transfer learning:

    • ConvNet as fixed feature extractor: take a ConvNet pretrained on a large dataset (e.g. ImageNet), such as AlexNet or VGGNet, remove the last few layers (typically the final classifier), and treat the rest of the ConvNet as a fixed feature extractor for the new dataset. The extracted features are called CNN codes; if they were ReLU-thresholded during pretraining (as is usually the case), they should be ReLU-thresholded here as well (important for performance). Once the CNN codes have been extracted for all images, train a linear classifier (Linear SVM or Softmax classifier) on them for the new dataset (a sketch of this strategy follows this list);
    • Fine-tuning the ConvNet: step 1: replace the classifier on top of the pretrained ConvNet and retrain it on the new dataset; step 2: continue backpropagation with a small learning rate to fine-tune the pretrained weights. Two options: fine-tune all layers of the ConvNet, or keep some earlier layers fixed (due to overfitting concerns) and only fine-tune some higher-level portion of the network;
    • Rationale: the layers near the input image are generally thought to extract basic features such as edges, texture, and color, while layers closer to the output extract features that are increasingly high-level, abstract, and task-specific. Hence the most common recipe: replace the last few layers of the pretrained network and retrain only those on the new dataset (the earlier layers stay fixed and act as a feature extractor), then optionally fine-tune the whole network with a small learning rate.
    • Some open-source pretrained models: Model Zoo
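
A minimal sketch of the fixed-feature-extractor strategy, assuming PyTorch/torchvision as the framework (the notes themselves do not prescribe one) and a made-up 5-class target task; the frozen AlexNet backbone and the single new linear layer mirror the description above.

```python
# Sketch only: torchvision's ImageNet-pretrained AlexNet as a fixed feature
# extractor; the 5-class head is a made-up example.
import torch
import torch.nn as nn
from torchvision import models

alexnet = models.alexnet(pretrained=True)
alexnet.eval()                                  # the extractor stays fixed
for p in alexnet.parameters():                  # freeze every pretrained weight
    p.requires_grad = False

# Everything up to the last hidden FC layer (including its ReLU) produces the
# 4096-D, ReLU-thresholded "CNN codes".
feature_extractor = nn.Sequential(
    alexnet.features,
    alexnet.avgpool,
    nn.Flatten(),
    *list(alexnet.classifier.children())[:-1],  # drop the final 1000-way layer
)

num_new_classes = 5                             # e.g. 5 flower categories (made up)
linear_classifier = nn.Linear(4096, num_new_classes)

with torch.no_grad():
    codes = feature_extractor(torch.randn(8, 3, 224, 224))  # dummy image batch
logits = linear_classifier(codes)               # only this layer gets trained
```

Since the extractor never changes, in practice the codes would be computed once for the whole dataset and cached, and only the linear classifier trained on them.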

  • When and how to fine-tune?

    • Four main scenarios:
    • New dataset is small and similar to the original dataset: overfitting on the small dataset is a concern, so train a linear classifier on the CNN codes;
    • New dataset is large and similar to the original dataset: overfitting is less of a worry, so try fine-tuning the entire network;
    • New dataset is small but very different from the original dataset: train only a linear classifier; because the new data differs a lot from the original data, it may work better to train the SVM classifier on activations from somewhere earlier in the network;
    • New dataset is large and very different from the original dataset: initialize with the pretrained weights and fine-tune the entire network on the new data (see the fine-tuning sketch after this list).
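
A hedged sketch of the fine-tuning recipe (the two-step procedure summarized at the top of these notes), again assuming PyTorch/torchvision; the cut-off index for the frozen earlier layers and the 5-class head are illustrative choices, not values from the notes.

```python
import torch.nn as nn
import torch.optim as optim
from torchvision import models

model = models.vgg16(pretrained=True)

# Step 1: replace the 1000-way ImageNet classifier with a new head for the task.
num_new_classes = 5                             # made-up example value
model.classifier[6] = nn.Linear(4096, num_new_classes)

# Step 2 (optional, mainly for smaller datasets): keep earlier conv blocks fixed
# and only fine-tune the higher-level portion of the network.
for p in model.features[:17].parameters():      # first three conv blocks stay frozen
    p.requires_grad = False

# Fine-tune whatever is left unfrozen with a small learning rate.
optimizer = optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-4, momentum=0.9,
)
```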

  • Practical advice

    • A few additional things to keep in mind when performing Transfer Learning:
    • Constraints from pretrained models: using a pretrained network somewhat constrains the architecture you can use on the new dataset, e.g. you cannot arbitrarily take out Conv layers from the pretrained network;
    • Learning rates: when fine-tuning the ConvNet weights (which are already relatively good), use a smaller learning rate than for the new linear classifier, whose weights are randomly initialized (see the sketch below);
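
A sketch of this advice, assuming PyTorch parameter groups (the notes do not name a framework): the pretrained weights get a small learning rate, the freshly initialized classifier a larger one; the actual values are placeholders.

```python
import torch.nn as nn
import torch.optim as optim
from torchvision import models

model = models.vgg16(pretrained=True)
model.classifier[6] = nn.Linear(4096, 5)             # new, randomly initialized head

optimizer = optim.SGD(
    [
        {"params": model.features.parameters(), "lr": 1e-4},        # pretrained conv weights
        {"params": model.classifier[:6].parameters(), "lr": 1e-4},  # pretrained FC weights
        {"params": model.classifier[6].parameters(), "lr": 1e-2},   # fresh classifier
    ],
    momentum=0.9,
)
```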

Original English notes:

(These notes are currently in draft form and under development)

Table of Contents:

● Transfer Learning

● Additional References

Transfer Learning

In practice, very few people train an entire Convolutional Network from scratch (with random initialization), because it is relatively rare to have a dataset of sufficient size. Instead, it is common to pretrain a ConvNet on a very large dataset (e.g. ImageNet, which contains 1.2 million images with 1000 categories), and then use the ConvNet either as an initialization or a fixed feature extractor for the task of interest. The three major Transfer Learning scenarios look as follows:

● ConvNet as fixed feature extractor. Take a ConvNet pretrained on ImageNet, remove the last fully-connected layer (this layer’s outputs are the 1000 class scores for a different task like ImageNet), then treat the rest of the ConvNet as a fixed feature extractor for the new dataset. In an AlexNet, this would compute a 4096-D vector for every image that contains the activations of the hidden layer immediately before the classifier. We call these features CNN codes. It is important for performance that these codes are ReLUd (i.e. thresholded at zero) if they were also thresholded during the training of the ConvNet on ImageNet (as is usually the case). Once you extract the 4096-D codes for all images, train a linear classifier (e.g. Linear SVM or Softmax classifier) for the new dataset.

● Fine-tuning the ConvNet. The second strategy is to not only replace and retrain the classifier on top of the ConvNet on the new dataset, but to also fine-tune the weights of the pretrained network by continuing the backpropagation. It is possible to fine-tune all the layers of the ConvNet, or it’s possible to keep some of the earlier layers fixed (due to overfitting concerns) and only fine-tune some higher-level portion of the network. This is motivated by the observation that the earlier features of a ConvNet contain more generic features (e.g. edge detectors or color blob detectors) that should be useful to many tasks, but later layers of the ConvNet become progressively more specific to the details of the classes contained in the original dataset. In case of ImageNet for example, which contains many dog breeds, a significant portion of the representational power of the ConvNet may be devoted to features that are specific to differentiating between dog breeds.

● Pretrained models. Since modern ConvNets take 2-3 weeks to train across multiple GPUs on ImageNet, it is common to see people release their final ConvNet checkpoints for the benefit of others who can use the networks for fine-tuning. For example, the Caffe library has a Model Zoo where people share their network weights.

When and how to fine-tune? How do you decide what type of transfer learning you should perform on a new dataset? This is a function of several factors, but the two most important ones are the size of the new dataset (small or big), and its similarity to the original dataset (e.g. ImageNet-like in terms of the content of images and the classes, or very different, such as microscope images). Keeping in mind that ConvNet features are more generic in early layers and more original-dataset-specific in later layers, here are some common rules of thumb for navigating the 4 major scenarios:

  1. New dataset is small and similar to original dataset. Since the data is small, it is not a good idea to fine-tune the ConvNet due to overfitting concerns. Since the data is similar to the original data, we expect higher-level features in the ConvNet to be relevant to this dataset as well. Hence, the best idea might be to train a linear classifier on the CNN codes.
  2. New dataset is large and similar to the original dataset. Since we have more data, we can have more confidence that we won’t overfit if we were to try to fine-tune through the full network.
  3. New dataset is small but very different from the original dataset. Since the data is small, it is likely best to only train a linear classifier. Since the dataset is very different, it might not be best to train the classifier from the top of the network, which contains more dataset-specific features. Instead, it might work better to train the SVM classifier from activations somewhere earlier in the network (see the sketch after this list).
  4. New dataset is large and very different from the original dataset. Since the dataset is very large, we may expect that we can afford to train a ConvNet from scratch. However, in practice it is very often still beneficial to initialize with weights from a pretrained model. In this case, we would have enough data and confidence to fine-tune through the entire network.
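
A sketch of scenario 3, under the assumption that we tap activations from an earlier conv block and feed them to scikit-learn's LinearSVC; the layer cut-off, the pooling step, and the dummy data are all illustrative, not part of the original notes.

```python
import torch
import torch.nn as nn
from torchvision import models
from sklearn.svm import LinearSVC

backbone = models.vgg16(pretrained=True).features[:10].eval()  # early conv blocks only
pool = nn.AdaptiveAvgPool2d(1)                                  # collapse spatial dims

def early_codes(images):
    """images: float tensor of shape (N, 3, 224, 224)."""
    with torch.no_grad():
        feats = pool(backbone(images))       # (N, C, 1, 1) activations from an early layer
    return feats.flatten(1).numpy()          # (N, C) feature matrix

# Dummy stand-ins for the new (very different) dataset.
X_train = torch.randn(16, 3, 224, 224)
y_train = torch.randint(0, 5, (16,)).numpy()
svm = LinearSVC().fit(early_codes(X_train), y_train)
```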

    Practical advice. There are a few additional things to keep in mind when performing Transfer Learning:

    ● Constraints from pretrained models. Note that if you wish to use a pretrained network, you may be slightly constrained in terms of the architecture you can use for your new dataset. For example, you can’t arbitrarily take out Conv layers from the pretrained network. However, some changes are straight-forward: Due to parameter sharing, you can easily run a pretrained network on images of different spatial size. This is clearly evident in the case of Conv/Pool layers because their forward function is independent of the input volume spatial size (as long as the strides “fit”). In case of FC layers, this still holds true because FC layers can be converted to a Convolutional Layer: For example, in an AlexNet, the final pooling volume before the first FC layer is of size [6x6x512]. Therefore, the FC layer looking at this volume is equivalent to having a Convolutional Layer that has receptive field size 6x6, and is applied with padding of 0.
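
A small sketch of the FC-to-Conv equivalence, reusing the [6x6x512] numbers quoted above and assuming PyTorch: the FC weights are reshaped into 6x6 filters, and the resulting conv layer produces the same outputs on the pooled volume.

```python
import torch
import torch.nn as nn

fc = nn.Linear(6 * 6 * 512, 4096)                        # original FC layer
conv = nn.Conv2d(512, 4096, kernel_size=6, padding=0)    # equivalent conv layer

# Reuse the FC weights: each of the 4096 output units becomes one 512x6x6 filter.
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(4096, 512, 6, 6))
    conv.bias.copy_(fc.bias)

x = torch.randn(1, 512, 6, 6)                            # the pooled volume
same = torch.allclose(conv(x).flatten(1), fc(x.flatten(1)), atol=1e-4)  # True
```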

    ● Learning rates. It’s common to use a smaller learning rate for ConvNet weights that are being fine-tuned, in comparison to the (randomly-initialized) weights for the new linear classifier that computes the class scores of your new dataset. This is because we expect that the ConvNet weights are relatively good, so we don’t wish to distort them too quickly and too much (especially while the new Linear Classifier above them is being trained from random initialization).

    Additional References

    ● CNN Features off-the-shelf: an Astounding Baseline for Recognition trains SVMs on features from ImageNet-pretrained ConvNet and reports several state of the art results.

    ● DeCAF reported similar findings in 2013. The framework in this paper (DeCAF) was a Python-based precursor to the C++ Caffe library.

    ● How transferable are features in deep neural networks? studies the transfer learning performance in detail, including some unintuitive findings about layer co-adaptations.

  • Transfer learning (transfer-learning)

    • Transfer learning reference: the CS231n course notes
    • Transfer learning in TensorFlow: How to Retrain Inception's Final Layer for New Categories
    • In practice, you usually do not train a huge network from scratch yourself. Many models already exist that were trained for weeks on large datasets such as ImageNet.
    • Goal: learn to use the pretrained VGGNet model to classify flower images
    • VGGNet is great because it's simple and has great performance, coming in second in the ImageNet competition. The idea here is that we keep all the convolutional layers, but replace the final fully connected layers with our own classifier. This way we can use VGGNet as a feature extractor for our images then easily train a simple classifier on top of that. What we'll do is take the first fully connected layer with 4096 units, including thresholding with ReLUs. We can use those values as a code for each image, then build a classifier on top of those codes.
    • Using VGGNet to classify images of flowers (a minimal sketch follows this list):
    • GitHub: tensorflow-vgg (TensorFlow VGG16 and VGG19)
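
A minimal sketch of the VGGNet-as-feature-extractor idea described above, using torchvision's VGG16 as a stand-in for the linked tensorflow-vgg code; the 5-class flower head, the dummy data, and the tiny training loop are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torchvision import models

vgg = models.vgg16(pretrained=True).eval()
for p in vgg.parameters():
    p.requires_grad = False

# All conv layers plus the first fully connected layer (4096 units) and its ReLU:
# the ReLU-thresholded activations are the per-image codes.
codes_net = nn.Sequential(
    vgg.features,
    vgg.avgpool,
    nn.Flatten(),
    *list(vgg.classifier.children())[:2],    # Linear(25088, 4096) + ReLU
)

# Dummy stand-ins for preprocessed flower images and their labels.
images = torch.randn(16, 3, 224, 224)
labels = torch.randint(0, 5, (16,))

with torch.no_grad():
    codes = codes_net(images)                # compute (and normally cache) the codes

head = nn.Linear(4096, 5)                    # simple classifier on top of the codes
opt = torch.optim.SGD(head.parameters(), lr=1e-2)
for _ in range(10):                          # tiny training loop on the cached codes
    loss = nn.functional.cross_entropy(head(codes), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```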
