Awesome Reinforcement Learning

A curated list of resources dedicated to reinforcement learning.

We have pages for other topics: awesome-rnnawesome-deep-visionawesome-random-forest

Maintainers: Hyunsoo KimJiwon Kim

We are looking for more contributors and maintainers!

Contributing

Please feel free to pull requests

Table of Contents

Codes

Theory

Lectures

Books

  • Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction [Book] [Code]
  • Csaba Szepesvari, Algorithms for Reinforcement Learning [Book]
  • David Poole and Alan Mackworth, Artificial Intelligence: Foundations of Computational Agents [Book Chapter]
  • Dimitri P. Bertsekas and John N. Tsitsiklis, Neuro-Dynamic Programming [Book (Amazon)] [Summary]
  • Mykel J. Kochenderfer, Decision Making Under Uncertainty: Theory and Application [Book (Amazon)]

Surveys

  • Leslie Pack Kaelbling, Michael L. Littman, Andrew W. Moore, Reinforcement Learning: A Survey, JAIR, 1996. [Paper]
  • S. S. Keerthi and B. Ravindran, A Tutorial Survey of Reinforcement Learning, Sadhana, 1994. [Paper]
  • Matthew E. Taylor, Peter Stone, Transfer Learning for Reinforcement Learning Domains: A Survey, JMLR, 2009. [Paper]
  • Jens Kober, J. Andrew Bagnell, Jan Peters, Reinforcement Learning in Robotics, A Survey, IJRR, 2013. [Paper]
  • Michael L. Littman, "Reinforcement learning improves behaviour from evaluative feedback." Nature 521.7553 (2015): 445-451. [Paper]
  • Marc P. Deisenroth, Gerhard Neumann, Jan Peter, A Survey on Policy Search for Robotics, Foundations and Trends in Robotics, 2014. [Book]

Papers / Thesis

  • Foundational Papers

    • Marvin Minsky, Steps toward Artificial Intelligence, Proceedings of the IRE, 1961. [Paper]

      • discusses issues in RL such as the "credit assignment problem"
    • Ian H. Witten, An Adaptive Optimal Controller for Discrete-Time Markov Environments, Information and Control, 1977. [Paper]
      • earliest publication on temporal-difference (TD) learning rule.
  • Methods

    • Dynamic Programming (DP):

      • Christopher J. C. H. Watkins, Learning from Delayed Rewards, Ph.D. Thesis, Cambridge University, 1989. [Thesis]
    • Monte Carlo:
      • Andrew Barto, Michael Duff, Monte Carlo Inversion and Reinforcement Learning, NIPS, 1994. [Paper]
      • Satinder P. Singh, Richard S. Sutton, Reinforcement Learning with Replacing Eligibility Traces, Machine Learning, 1996. [Paper]
    • Temporal-Difference:
      • Richard S. Sutton, Learning to predict by the methods of temporal differences. Machine Learning 3: 9-44, 1988.[Paper]
    • Q-Learning (Off-policy TD algorithm):
      • Chris Watkins, Learning from Delayed Rewards, Cambridge, 1989. [Thesis]
    • Sarsa (On-policy TD algorithm):
      • G.A. Rummery, M. Niranjan, On-line Q-learning using connectionist systems, Technical Report, Cambridge Univ., 1994. [Report]
      • Richard S. Sutton, Generalization in Reinforcement Learning: Successful examples using sparse coding, NIPS, 1996. [Paper]
    • R-Learning (learning of relative values)
      • Andrew Schwartz, A Reinforcement Learning Method for Maximizing Undiscounted Rewards, ICML, 1993.[Paper-Google Scholar]
    • Function Approximation methods (Least-Sqaure Temporal Difference, Least-Sqaure Policy Iteration)
      • Steven J. Bradtke, Andrew G. Barto, Linear Least-Squares Algorithms for Temporal Difference Learning, Machine Learning, 1996. [Paper]
      • Michail G. Lagoudakis, Ronald Parr, Model-Free Least Squares Policy Iteration, NIPS, 2001. [Paper] [Code]
    • Policy Search / Policy Gradient
      • Richard Sutton, David McAllester, Satinder Singh, Yishay Mansour, Policy Gradient Methods for Reinforcement Learning with Function Approximation, NIPS, 1999. [Paper]
      • Jan Peters, Sethu Vijayakumar, Stefan Schaal, Natural Actor-Critic, ECML, 2005. [Paper]
      • Jens Kober, Jan Peters, Policy Search for Motor Primitives in Robotics, NIPS, 2009. [Paper]
      • Jan Peters, Katharina Mulling, Yasemin Altun, Relative Entropy Policy Search, AAAI, 2010. [Paper]
      • Freek Stulp, Olivier Sigaud, Path Integral Policy Improvement with Covariance Matrix Adaptation, ICML, 2012.[Paper]
      • Nate Kohl, Peter Stone, Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion, ICRA, 2004.[Paper]
      • Marc Deisenroth, Carl Rasmussen, PILCO: A Model-Based and Data-Efficient Approach to Policy Search, ICML, 2011. [Paper]
      • Scott Kuindersma, Roderic Grupen, Andrew Barto, Learning Dynamic Arm Motions for Postural Recovery, Humanoids, 2011. [Paper]
    • Hierarchical RL
      • Richard Sutton, Doina Precup, Satinder Singh, Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, Artificial Intelligence, 1999. [Paper]
      • George Konidaris, Andrew Barto, Building Portable Options: Skill Transfer in Reinforcement Learning, IJCAI, 2007.[Paper]
    • Deep Learning + Reinforcement Learning (A sample of recent works on DL+RL)
      • V. Mnih, et. al., Human-level Control through Deep Reinforcement Learning, Nature, 2015. [Paper]
      • Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard Lewis, Xiaoshi Wang, Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning, NIPS, 2014. [Paper]
      • Sergey Levine, Chelsea Finn, Trevor Darrel, Pieter Abbeel, End-to-End Training of Deep Visuomotor Policies. ArXiv, 16 Oct 2015. [ArXiv]
      • Tom Schaul, John Quan, Ioannis Antonoglou, David Silver, Prioritized Experience Replay, ArXiv, 18 Nov 2015.[ArXiv]
      • Hado van Hasselt, Arthur Guez, David Silver, Deep Reinforcement Learning with Double Q-Learning, ArXiv, 22 Sep 2015. [ArXiv]
      • Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu, Asynchronous Methods for Deep Reinforcement Learning, ArXiv, 4 Feb 2016.[ArXiv]

Applications

Game Playing

  • Traditional Games

    • Backgammon - "TD-Gammon" game play using TD(λ) (Tesauro, ACM 1995) [Paper]
    • Chess - "KnightCap" program using TD(λ) (Baxter, arXiv 1999) [arXiv]
    • Chess - Giraffe: Using deep reinforcement learning to play chess (Lai, arXiv 2015) [arXiv]
  • Computer Games

Robotics

  • Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion (Kohl, ICRA 2004) [Paper]
  • Robot Motor SKill Coordination with EM-based Reinforcement Learning (Kormushev, IROS 2010) [Paper] [Video]
  • Generalized Model Learning for Reinforcement Learning on a Humanoid Robot (Hester, ICRA 2010) [Paper] [Video]
  • Autonomous Skill Acquisition on a Mobile Manipulator (Konidaris, AAAI 2011) [Paper] [Video]
  • PILCO: A Model-Based and Data-Efficient Approach to Policy Search (Deisenroth, ICML 2011) [Paper]
  • Incremental Semantically Grounded Learning from Demonstration (Niekum, RSS 2013) [Paper]
  • Efficient Reinforcement Learning for Robots using Informative Simulated Priors (Cutler, ICRA 2015) [Paper] [Video]

Control

  • An Application of Reinforcement Learning to Aerobatic Helicopter Flight (Abbeel, NIPS 2006) [Paper] [Video]
  • Autonomous helicopter control using Reinforcement Learning Policy Search Methods (Bagnell, ICRA 2011) [Paper]

Operations Research

  • Scaling Average-reward Reinforcement Learning for Product Delivery (Proper, AAAI 2004) [Paper]
  • Cross Channel Optimized Marketing by Reinforcement Learning (Abe, KDD 2004) [Paper]

Human Computer Interaction

  • Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System (Singh, JAIR 2002)[Paper]

Tutorials / Websites

Online Demos

Awesome Reinforcement Learning的更多相关文章

  1. Machine Learning Algorithms Study Notes(5)—Reinforcement Learning

    Reinforcement Learning 对于控制决策问题的解决思路:设计一个回报函数(reward function),如果learning agent(如上面的四足机器人.象棋AI程序)在决定 ...

  2. (转) Playing FPS games with deep reinforcement learning

    Playing FPS games with deep reinforcement learning 博文转自:https://blog.acolyer.org/2016/11/23/playing- ...

  3. (zhuan) Deep Reinforcement Learning Papers

    Deep Reinforcement Learning Papers A list of recent papers regarding deep reinforcement learning. Th ...

  4. (转) Deep Learning Research Review Week 2: Reinforcement Learning

      Deep Learning Research Review Week 2: Reinforcement Learning 转载自: https://adeshpande3.github.io/ad ...

  5. Learning Roadmap of Deep Reinforcement Learning

    1. 知乎上关于DQN入门的系列文章 1.1 DQN 从入门到放弃 DQN 从入门到放弃1 DQN与增强学习 DQN 从入门到放弃2 增强学习与MDP DQN 从入门到放弃3 价值函数与Bellman ...

  6. Open source packages on Deep Reinforcement Learning

    智能车 self driving car + 强化学习 reinforcement learning + 神经网络 模拟 https://github.com/MorvanZhou/my_resear ...

  7. (转) Deep Reinforcement Learning: Playing a Racing Game

    Byte Tank Posts Archive Deep Reinforcement Learning: Playing a Racing Game OCT 6TH, 2016 Agent playi ...

  8. 论文笔记之:Dueling Network Architectures for Deep Reinforcement Learning

    Dueling Network Architectures for Deep Reinforcement Learning ICML 2016 Best Paper 摘要:本文的贡献点主要是在 DQN ...

  9. getting started with building a ROS simulation platform for Deep Reinforcement Learning

    Apparently, this ongoing work is to make a preparation for futural research on Deep Reinforcement Le ...

  10. (转) Deep Learning in a Nutshell: Reinforcement Learning

    Deep Learning in a Nutshell: Reinforcement Learning   Share: Posted on September 8, 2016by Tim Dettm ...

随机推荐

  1. SharePoint 2013 开发——开发并部署webpart

    博客地址:http://blog.csdn.net/FoxDave webpart我们就不详细阐述了,在APP的开发中,自定义属性设置通过APP webpart的URL查询字符串传递,它通过IFR ...

  2. javascript prototype 剖析

    学过javascript的一定对prototype不陌生,但是这个究竟是个什么东西,就不一定很清楚. 我们先对prototype进行一个定义:每个函数都有一个prototype属性,这个属性是指向一个 ...

  3. hdu1506 dp

    //Accepted 1428 KB 62 ms // #include <cstdio> #include <cstring> #include <iostream&g ...

  4. cometd的服务器配置

    CometDServlet必须在web.xml中进行配置,如下: <servlet>        <servlet-name>cometd</servlet-name& ...

  5. ie7下 滚动条内容不动问题

    ie7+ 版式正常 ie7滚动内容不跟着动 解决方法 加上 overflow-x: hidden;    overflow-y: auto;    *position:relative;    *le ...

  6. JQuery blockUI

    1 $.blockUI({//界面锁定之后 ,显示样式和提示消息 css: { width: 'auto', left: '20px', right: '20px' }, message: '< ...

  7. EnterpriseLibrary4 自己封装程序集实现log打印

      注意:1)要引用响应的程序集,必须是41的          2)配置文件 using Microsoft.Practices.EnterpriseLibrary.Common.Configura ...

  8. 解决input之间的空隙

    <!doctype html> <html> <head> <meta charset="UTF-8"> <meta name ...

  9. 四则运算<3>单元测试

    经过分析图一的结果正确,因为输出到文件是为了打印,不要求在线答题功能,因此为实现答题功能. 经过分析,结果正确,满足了选择要求. 选择这六组测试用例的原因是这六组用例将有无乘数法,有无括号,有无负数, ...

  10. 关于oracle出现ORA-06143:连接未打开 解决方案

    原因:程序所在的路径中含有()和中文 用plsql连接正常,连接字符串也检查不出毛病,换到另一个程序照样使用,折腾了半天,最后才发现程序所在的路径中含有()和中文,所以可能导致出现这种很难排查的问题出 ...