Temporal Credit Assignment in Reinforcement Learning [Classic Reinforcement Learning Paper]

Sutton's publications page:

http://incompleteideas.net/publications.html

PhD thesis: Temporal Credit Assignment in Reinforcement Learning

http://incompleteideas.net/publications.html#PhDthesis

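I have recently been working on a reinforcement learning project and have come to appreciate how remarkable Sutton, often called the father of reinforcement learning, really is. He proposed temporal-difference (TD) learning and was central to the development of policy gradient methods. Although the modern framework of reinforcement learning arguably took shape starting with Q-learning, Sutton was one of the earliest researchers in the field and is probably the person who has contributed the most to its classic ideas. His undergraduate degree, from Stanford, was in psychology, and he only moved to computer science for his master's and doctoral studies at the University of Massachusetts Amherst, which makes his record all the more impressive.

His doctoral thesis is one of the most classic works in reinforcement learning and well worth reading. I searched for it for a long time. Probably because it was published so long ago, none of the databases I tried had indexed it. The only institution that seems to hold it is the university that granted his PhD, the University of Massachusetts, but access there is limited to its own students, so several days of searching turned up nothing. Today it occurred to me to look on the author's personal homepage, and sure enough it was there, so I am bookmarking it here.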

----------------------------------------------------------------------------------------------------------------

Appendix: (content from the Publications section of Sutton's homepage)

Rich Sutton's Publications

First, a quick guide to the highlights, roughly in order of the work's popularity or potential current interest:

The 2nd edition of Reinforcement Learning: An Introduction
Emphatic TD(λ); Yu's convergence proof
Weighted importance sampling version of LSTD(λ), linear-complexity algorithms
True online TD(λ)
The predictive approach to knowledge representation; PEAK; Horde; nexting
Fast gradient-based TD algorithms, nonlinear case, GQ(lambda), control, Maei's thesis
RL book
Temporal-difference learning; TD(lambda) details
The TD model of Pavlovian conditioning; earlier Sutton-Barto model; more biological 1982 & 1986; and instrumental learning
Dyna; as an integrated architecture; with FA 1996, 2008
The options paper; UAV example; precursor not superseded;
Policy gradient methods; Incremental Natural Actor-Critic Algorithms
PhD thesis, introduced actor-critic architectures and "temporal credit assignment"
PSRs; the predictive representations hypothesis; TD networks; with options
RL for RoboCup soccer keepaway
RL with continuous state and action spaces
Step-size adaptation by meta-gradient descent; IDBD; improved; earliest pub; in classical conditioning; in human category learning, in tracking
Random representations; representation search; feature discovery; more
Pole-balancing; tracking nonstationarity
Exponentiated-gradient RL; fuller TR
A study in alpha and lambda
Two problems with backprop

Also, some RL pubs that aren't mine, available for researchers:

For any broken links, please send email to rich@richsutton.com.
