Goals for the lecture:

Introduction & overview of the key methods and developments.

[Good starting point for you to start reading and understanding papers!]

原文链接:



@

Probabilistic Graphical Models | Elements of Meta-Learning

01 Intro to Meta-Learning

Motivation and some examples

When is standard machine learning not enough?

Standard ML finally works for well-defined, stationary tasks.



But how about the complex dynamic world, heterogeneous data from people and the interactive robotic systems?

General formulation and probabilistic view

What is meta-learning?

Standard learning: Given a distribution over examples (single task), learn a function that minimizes the loss:



Learning-to-learn: Given a distribution over tasks, output an adaptation rule that can be used at test time to generalize from a task description

A Toy Example: Few-shot Image Classification



Other (practical) Examples of Few-shot Learning







Gradient-based and other types of meta-learning

Model-agnostic Meta-learning (MAML) 与模型无关的元学习

  • Start with a common model initialization \(\theta\)
  • Given a new task \(T_i\) , adapt the model using a gradient step:

  • Meta-training is learning a shared initialization for all tasks:



Does MAML Work?

MAML from a Probabilistic Standpoint

Training points:

testing points:

MAML with log-likelihood loss对数似然损失:



One More Example: One-shot Imitation Learning 模仿学习

Prototype-based Meta-learning



Prototypes:



Predictive distribution:



Does Prototype-based Meta-learning Work?

Rapid Learning or Feature Reuse 特征重用







Neural processes and relation of meta-learning to GPs

Drawing parallels between meta-learning and GPs

In few-shot learning:

  • Learn to identify functions that generated the data from just a few examples.
  • The function class and the adaptation rule encapsulate our prior knowledge.

Recall Gaussian Processes (GPs): 高斯过程

  • Given a few (x, y) pairs, we can compute the predictive mean and variance.
  • Our prior knowledge is encapsulated in the kernel function.

Conditional Neural Processes 条件神经过程







On software packages for meta-learning

A lot of research code releases (code is fragile and sometimes broken)

A few notable libraries that implement a few specific methods:



Takeaways

  • Many real-world scenarios require building adaptive systems and cannot be solved using “learn-once” standard ML approach.
  • Learning-to-learn (or meta-learning) attempts extend ML to rich multitask scenarios—instead of learning a function, learn a learning algorithm.
  • Two families of widely popular methods:
    • Gradient-based meta-learning (MAML and such)
    • Prototype-based meta-learning (Protonets, Neural Processes, ...)
    • Many hybrids, extensions, improvements (CAIVA, MetaSGD, ...)
  • Is it about adaptation or learning good representations? Still unclear and depends on the task; having good representations might be enough.
  • Meta-learning can be used as a mechanism for causal discovery.因果发现 (See Bengio et al., 2019.)

02 Elements of Meta-RL

What is meta-RL and why does it make sense?

Recall the definition of learning-to-learn

Standard learning: Given a distribution over examples (single task), learn a function that minimizes the loss:



Learning-to-learn: Given a distribution over tasks, output an adaptation rule that can be used at test time to generalize from a task description



Meta reinforcement learning (RL): Given a distribution over environments, train a policy update rule that can solve new environments given only limited or no initial experience.

Meta-learning for RL

On-policy and off-policy meta-RL

On-policy RL: Quick Recap 符合策略的RL:快速回顾



REINFORCE algorithm:

On-policy Meta-RL: MAML (again!)

  • Start with a common policy initialization \(\theta\)
  • Given a new task \(T_i\) , collect data using initial policy, then adapt using a gradient step:

  • Meta-training is learning a shared initialization for all tasks:





    Adaptation as Inference 适应推理

    Treat policy parameters, tasks, and all trajectories as random variables随机变量



    meta-learning = learning a prior and adaptation = inference



    Off-policy meta-RL: PEARL



Key points:

  • Infer latent representations z of each task from the trajectory data.
  • The inference networkq is decoupled from the policy, which enables off-policy learning.
  • All objectives involve the inference and policy networks.

Adaptation in nonstationary environments 不稳定环境

Classical few-shot learning setup:

  • The tasks are i.i.d. samples from some underlying distribution.
  • Given a new task, we get to interact with it before adapting.
  • What if we are in a nonstationary environment (i.e. changing over time)? Can we still use meta-learning?



    Example: adaptation to a learning opponent

    Each new round is a new task. Nonstationary environment is a sequence of tasks.

Continuous adaptation setup:

  • The tasks are sequentially dependent.
  • meta-learn to exploit dependencies

Continuous adaptation

Treat policy parameters, tasks, and all trajectories as random variables

RoboSumo: a multiagent competitive env

an agent competes vs. an opponent, the opponent’s behavior changes over time

Takeaways

  • Learning-to-learn (or meta-learning) setup is particularly suitable for multi-task reinforcement learning
  • Both on-policy and off-policy RL can be “upgraded” to meta-RL:
    • On-policy meta-RL is directly enabled by MAML
    • Decoupling task inference and policy learning enables off-policy methods
  • Is it about fast adaptation or learning good multitask representations? (See discussion in Meta-Q-Learning: https://arxiv.org/abs/1910.00125)
  • Probabilistic view of meta-learning allows to use meta-learning ideas beyond distributions of i.i.d. tasks, e.g., continuous adaptation.
  • Very active area of research.

卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning的更多相关文章

  1. 李飞飞确认将离职!谷歌云AI总帅换人,卡耐基·梅隆老教授接棒

    https://mp.weixin.qq.com/s/i1uwZALu1BcOq0jAMvPdBw 看点:李飞飞正式回归斯坦福,新任谷歌云AI总帅还是个教授,不过这次是全职. 智东西9月11日凌晨消息 ...

  2. 知乎:在卡内基梅隆大学 (Carnegie Mellon University) 就读是怎样一番体验?

    转自:http://www.zhihu.com/question/24295398   知乎 Yu Zhang 知乎搜索 首页 话题 发现 消息 调查类问题名校就读体验修改 在卡内基梅隆大学 (Car ...

  3. 卡内基梅隆大学软件工程研究所先后制定用于评价软件系统成熟度的模型CMM和CMMI

    SEI(美国卡内基梅隆大学软件工程研究所(Software Engineering Institute, SEI))开发的CMM模型有: 用于软件的(SW-CMM;SW代表'software即软件') ...

  4. 洛谷P3389 高斯消元 / 高斯消元+线性基学习笔记

    高斯消元 其实开始只是想搞下线性基,,,后来发现线性基和高斯消元的关系挺密切就一块儿在这儿写了好了QwQ 先港高斯消元趴? 这个算法并不难理解啊?就会矩阵运算就过去了鸭,,, 算了都专门为此写个题解还 ...

  5. 【敬业福bug】支付宝五福卡敬业福太难求 被炒至200元

    016年央视春晚官方独家互动合作伙伴--支付宝,正式上线春晚红包玩法集福卡活动. 用户新加入10个支付宝好友,就可以获成3张福卡.剩下2张须要支付宝好友之间相互赠送.交换,终于集齐5张福卡就有机会平分 ...

  6. 【转载】 准人工智能分享Deep Mind报告 ——AI“元强化学习”

    原文地址: https://www.sohu.com/a/231895305_200424 ------------------------------------------------------ ...

  7. (@WhiteTaken)设计模式学习——享元模式

    继续学习享元模式... 乍一看到享元的名字,一头雾水,学习了以后才觉得,这个名字确实比较适合这个模式. 享元,即共享对象的意思. 举个例子,如果制作一个五子棋的游戏,如果每次落子都实例化一个对象的话, ...

  8. 大学启示录I 浅谈大学生的学习与就业

    教育触感 最近看了一些书,有了一些思考,以下纯属博主脑子被抽YY的一些无关大雅的思考,如有雷同,纯属巧合.. 现实总是令人遗憾的,我们当中太多人已经习惯于沿着那一成不变的"典型成功道路&qu ...

  9. python学习(十)元类

    python 可以通过`type`函数创建类,也可通过type判断数据类型 import socket from io import StringIO import sys class TypeCla ...

随机推荐

  1. .NetCore中简单使用EasyNetQ

    前言 我们在.Net中使用RabbitMQ,最原始的就是基于RabbitMQ.Client进行编码,在这个过程中我们需要通过代码约定和维护队列,Exchange等.如果是自行编码封装通用型的Rabbi ...

  2. Git的全局及单个仓库配置

    我们先来了解一下在git中的配置文件路径: /etc/gitconfig 文件: 包含系统上每一个用户及他们仓库的通用配置. 如果在执行 git config 时带上 --system 选项,那么它就 ...

  3. 7. 组合你的UI

    1. UI布局关键概念 一个组合应用UI的根节点被称作Shell,一般只有一个Shell.Shell作为应用的主页,包含一个或者多个域.域是内容占位符,可以包含一个或者多个View.有很多控件可以作为 ...

  4. SU+GIS,让SketchUp模型在地图上活起来

    一.SU+GIS的场景展示 skp与卫星地图和倾斜摄影模型相结合人工模型与实景模型完美融合 这么一看是不是直接秒杀了单纯看看skp后联想的规划效果? 二.如何快速把草图大师的结果和GIS结合呢?在图新 ...

  5. pandas_知识总结_基础

    # Pandas 知识点总结 # Pandas数据结构:Series 和 DataFrame import pandas as pd import numpy as np # 一,Series: # ...

  6. [MIT6.006] 8. Hashing with Chaining 散列表

    一.字典 在之前课里,如果我们要实现插入,删除和查找,使用树结构,最好的时间复杂度是AVL下的Ο(log2n),使用线性结构,最好的复杂度为基数排序Ο(n).但如果使用字典数据类型去做,时间复杂度可为 ...

  7. Exactly Once 语义

    将服务器的 ACK 级别设置为-1,可以保证 Producer 到 Server 之间不会丢失数据,即 At Least Once 语义. 相对的,将服务器 ACK 级别设置为 0,可以保证生产者每条 ...

  8. centos6 virbox安装

    yum install kernel-devel yum update kernel* wget http://download.virtualbox.org/virtualbox/debian/or ...

  9. Vue 组件化开发

    组件化开发 基本概念 在最开始的时候,已经大概的聊了聊Vue是单页面开发,用户总是在一个页面上进行操作,看到的不同内容也是由不同组件构成的. 通过用户的操作,Vue将会向用户展示某些组件,也会隐藏某些 ...

  10. 如何剔掉 sql 语句中的尾巴,我用 C# 苦思了五种办法

    一:背景 1. 讲故事 这几天都在修复bug真的太忙了,期间也遇到了一个挺有趣bug,和大家分享一下,这是一块sql挺复杂的报表相关业务,不知道哪一位大佬在错综复杂的 嵌套 + 平行 if判断中sql ...