卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning

Goals for the lecture:
Introduction & overview of the key methods and developments.
[Good starting point for you to start reading and understanding papers!]
原文链接:

@
Probabilistic Graphical Models | Elements of Meta-Learning
01 Intro to Meta-Learning

Motivation and some examples
When is standard machine learning not enough?
Standard ML finally works for well-defined, stationary tasks.

But how about the complex dynamic world, heterogeneous data from people and the interactive robotic systems?

General formulation and probabilistic view
What is meta-learning?
Standard learning: Given a distribution over examples (single task), learn a function that minimizes the loss:

Learning-to-learn: Given a distribution over tasks, output an adaptation rule that can be used at test time to generalize from a task description

A Toy Example: Few-shot Image Classification


Other (practical) Examples of Few-shot Learning




Gradient-based and other types of meta-learning
Model-agnostic Meta-learning (MAML) 与模型无关的元学习
- Start with a common model initialization \(\theta\)
- Given a new task \(T_i\) , adapt the model using a gradient step:

- Meta-training is learning a shared initialization for all tasks:


Does MAML Work?

MAML from a Probabilistic Standpoint
Training points: 
testing points:
MAML with log-likelihood loss对数似然损失:


One More Example: One-shot Imitation Learning 模仿学习

Prototype-based Meta-learning

Prototypes:

Predictive distribution:

Does Prototype-based Meta-learning Work?

Rapid Learning or Feature Reuse 特征重用




Neural processes and relation of meta-learning to GPs
Drawing parallels between meta-learning and GPs
In few-shot learning:
- Learn to identify functions that generated the data from just a few examples.
- The function class and the adaptation rule encapsulate our prior knowledge.
Recall Gaussian Processes (GPs): 高斯过程
- Given a few (x, y) pairs, we can compute the predictive mean and variance.
- Our prior knowledge is encapsulated in the kernel function.

Conditional Neural Processes 条件神经过程




On software packages for meta-learning
A lot of research code releases (code is fragile and sometimes broken)
A few notable libraries that implement a few specific methods:
- Torchmeta (https://github.com/tristandeleu/pytorch-meta)
- Learn2learn (https://github.com/learnables/learn2learn)
- Higher (https://github.com/facebookresearch/higher)

Takeaways
- Many real-world scenarios require building adaptive systems and cannot be solved using “learn-once” standard ML approach.
- Learning-to-learn (or meta-learning) attempts extend ML to rich multitask scenarios—instead of learning a function, learn a learning algorithm.
- Two families of widely popular methods:
- Gradient-based meta-learning (MAML and such)
- Prototype-based meta-learning (Protonets, Neural Processes, ...)
- Many hybrids, extensions, improvements (CAIVA, MetaSGD, ...)
- Is it about adaptation or learning good representations? Still unclear and depends on the task; having good representations might be enough.
- Meta-learning can be used as a mechanism for causal discovery.因果发现 (See Bengio et al., 2019.)
02 Elements of Meta-RL
What is meta-RL and why does it make sense?
Recall the definition of learning-to-learn
Standard learning: Given a distribution over examples (single task), learn a function that minimizes the loss:

Learning-to-learn: Given a distribution over tasks, output an adaptation rule that can be used at test time to generalize from a task description

Meta reinforcement learning (RL): Given a distribution over environments, train a policy update rule that can solve new environments given only limited or no initial experience.

Meta-learning for RL

On-policy and off-policy meta-RL
On-policy RL: Quick Recap 符合策略的RL:快速回顾

REINFORCE algorithm:

On-policy Meta-RL: MAML (again!)
- Start with a common policy initialization \(\theta\)
- Given a new task \(T_i\) , collect data using initial policy, then adapt using a gradient step:

- Meta-training is learning a shared initialization for all tasks:


Adaptation as Inference 适应推理
Treat policy parameters, tasks, and all trajectories as random variables随机变量

meta-learning = learning a prior and adaptation = inference

Off-policy meta-RL: PEARL


Key points:
- Infer latent representations z of each task from the trajectory data.
- The inference networkq is decoupled from the policy, which enables off-policy learning.
- All objectives involve the inference and policy networks.

Adaptation in nonstationary environments 不稳定环境
Classical few-shot learning setup:
- The tasks are i.i.d. samples from some underlying distribution.
- Given a new task, we get to interact with it before adapting.
- What if we are in a nonstationary environment (i.e. changing over time)? Can we still use meta-learning?

Example: adaptation to a learning opponent
Each new round is a new task. Nonstationary environment is a sequence of tasks.
Continuous adaptation setup:
- The tasks are sequentially dependent.
- meta-learn to exploit dependencies

Continuous adaptation
Treat policy parameters, tasks, and all trajectories as random variables

RoboSumo: a multiagent competitive env
an agent competes vs. an opponent, the opponent’s behavior changes over time

Takeaways
- Learning-to-learn (or meta-learning) setup is particularly suitable for multi-task reinforcement learning
- Both on-policy and off-policy RL can be “upgraded” to meta-RL:
- On-policy meta-RL is directly enabled by MAML
- Decoupling task inference and policy learning enables off-policy methods
- Is it about fast adaptation or learning good multitask representations? (See discussion in Meta-Q-Learning: https://arxiv.org/abs/1910.00125)
- Probabilistic view of meta-learning allows to use meta-learning ideas beyond distributions of i.i.d. tasks, e.g., continuous adaptation.
- Very active area of research.
卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning的更多相关文章
- 李飞飞确认将离职!谷歌云AI总帅换人,卡耐基·梅隆老教授接棒
https://mp.weixin.qq.com/s/i1uwZALu1BcOq0jAMvPdBw 看点:李飞飞正式回归斯坦福,新任谷歌云AI总帅还是个教授,不过这次是全职. 智东西9月11日凌晨消息 ...
- 知乎:在卡内基梅隆大学 (Carnegie Mellon University) 就读是怎样一番体验?
转自:http://www.zhihu.com/question/24295398 知乎 Yu Zhang 知乎搜索 首页 话题 发现 消息 调查类问题名校就读体验修改 在卡内基梅隆大学 (Car ...
- 卡内基梅隆大学软件工程研究所先后制定用于评价软件系统成熟度的模型CMM和CMMI
SEI(美国卡内基梅隆大学软件工程研究所(Software Engineering Institute, SEI))开发的CMM模型有: 用于软件的(SW-CMM;SW代表'software即软件') ...
- 洛谷P3389 高斯消元 / 高斯消元+线性基学习笔记
高斯消元 其实开始只是想搞下线性基,,,后来发现线性基和高斯消元的关系挺密切就一块儿在这儿写了好了QwQ 先港高斯消元趴? 这个算法并不难理解啊?就会矩阵运算就过去了鸭,,, 算了都专门为此写个题解还 ...
- 【敬业福bug】支付宝五福卡敬业福太难求 被炒至200元
016年央视春晚官方独家互动合作伙伴--支付宝,正式上线春晚红包玩法集福卡活动. 用户新加入10个支付宝好友,就可以获成3张福卡.剩下2张须要支付宝好友之间相互赠送.交换,终于集齐5张福卡就有机会平分 ...
- 【转载】 准人工智能分享Deep Mind报告 ——AI“元强化学习”
原文地址: https://www.sohu.com/a/231895305_200424 ------------------------------------------------------ ...
- (@WhiteTaken)设计模式学习——享元模式
继续学习享元模式... 乍一看到享元的名字,一头雾水,学习了以后才觉得,这个名字确实比较适合这个模式. 享元,即共享对象的意思. 举个例子,如果制作一个五子棋的游戏,如果每次落子都实例化一个对象的话, ...
- 大学启示录I 浅谈大学生的学习与就业
教育触感 最近看了一些书,有了一些思考,以下纯属博主脑子被抽YY的一些无关大雅的思考,如有雷同,纯属巧合.. 现实总是令人遗憾的,我们当中太多人已经习惯于沿着那一成不变的"典型成功道路&qu ...
- python学习(十)元类
python 可以通过`type`函数创建类,也可通过type判断数据类型 import socket from io import StringIO import sys class TypeCla ...
随机推荐
- python给图片添加文字
如何用几行代码给图片加上想要的文字呢? 下面为大家说下实现过程. 关注公众号 "轻松学编程"了解更多. 有图如下,想添加自写的诗句 诗句 静安心野 朝有赤羽暮落霞, 小舟载我湖旋停 ...
- python求平均数及打印出低于平均数的值列表
刚学Python的时候还是要多动手进行一些小程序的编写,要持续不断的进行,知识才能掌握的牢.今天就讲一下Python怎么求平均数,及打印出低于平均数的数值列表 方法一: scores1 = [91, ...
- ZOJ 1005 Jugs(BFS)
Jugs In the movie "Die Hard 3", Bruce Willis and Samuel L. Jackson were confronted with th ...
- 使用sql导出数据_mysql
在mysql中 使用sql 脚本导出数据的方式之一: select * from table_name where x=y INFO OUTFILE "/tmp/table_name.tx ...
- 英特尔与 Facebook 合作采用第三代英特尔® 至强® 可扩展处理器和支持 BFloat16 加速的英特尔® 深度学习加速技术,提高 PyTorch 性能
英特尔与 Facebook 曾联手合作,在多卡训练工作负载中验证了 BFloat16 (BF16) 的优势:在不修改训练超参数的情况下,BFloat16 与单精度 32 位浮点数 (FP32) 得到了 ...
- 常用DOS指令
Windows的DOS命令,其实是Windows系统的cmd命令,它是由原来的MS-DOS系统保留下来的. MS-DOS称为微软磁盘操作系统,最开始从西雅图公司(蒂姆·帕特森)买过来 MS-DOS系 ...
- 7. 基于MLlib的机器学习
*以下内容由<Spark快速大数据分析>整理所得. 读书笔记的第七部分是讲的是如何使用Spark中提供机器学习函数的MLlib库,在集群中并行运行机器学习算法. MLlib是Spark中提 ...
- 设计模式之工厂模式(Factory模式)
在面向对象系统设计中经常遇到以下两类问题: 1)为了提高内聚(Cohesion)和松耦合(Coupling),我们经常会抽象出一些类的公共接口以形成抽象基类或者接口.这样我们可以通过声明一个指向基类的 ...
- WSL2:我在原生的Win10玩转Linux系统
原文地址:梁桂钊的博客 博客地址:http://blog.720ui.com 欢迎关注公众号:「服务端思维」.一群同频者,一起成长,一起精进,打破认知的局限性. WSL2:我在原生的Win10玩转Li ...
- 8.java设计模式之装饰者模式
基本需求: 咖啡的种类有很多种,调料也有很多种,下单时,可以点单品咖啡也可以点单品咖啡+调料的组合,并计算下单时花费的金额 传统方式: 方式一 创建一个抽象类Drink,让所有的单品咖啡和组合咖啡都继 ...