https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume4/kaelbling96a-html/node26.html

【平均-打折奖励】

Schwartz [106] examined the problem of adapting Q-learning to an average-reward framework. Although his R-learning algorithm seems to exhibit convergence problems for some MDPs, several researchers have found the average-reward criterion closer to the true problem they wish to solve than a discounted criterion and therefore prefer R-learning to Q-learning [69].

To discount or not to discount in reinforcement learning: A case study comparing R learning and Q learning的更多相关文章

  1. (转) Deep Learning Research Review Week 2: Reinforcement Learning

      Deep Learning Research Review Week 2: Reinforcement Learning 转载自: https://adeshpande3.github.io/ad ...

  2. (转) Using the latest advancements in AI to predict stock market movements

    Using the latest advancements in AI to predict stock market movements 2019-01-13 21:31:18 This blog ...

  3. 强化学习(Reinfment Learning) 简介

    本文内容来自以下两个链接: https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/ https: ...

  4. (zhuan) Deep Reinforcement Learning Papers

    Deep Reinforcement Learning Papers A list of recent papers regarding deep reinforcement learning. Th ...

  5. (转) Deep Learning in a Nutshell: Reinforcement Learning

    Deep Learning in a Nutshell: Reinforcement Learning   Share: Posted on September 8, 2016by Tim Dettm ...

  6. (转) Deep Reinforcement Learning: Pong from Pixels

    Andrej Karpathy blog About Hacker's guide to Neural Networks Deep Reinforcement Learning: Pong from ...

  7. Deep Reinforcement Learning: Pong from Pixels

    这是一篇迟来很久的关于增强学习(Reinforcement Learning, RL)博文.增强学习最近非常火!你一定有所了解,现在的计算机能不但能够被全自动地训练去玩儿ATARI(译注:一种游戏机) ...

  8. [DQN] What is Deep Reinforcement Learning

    已经成为DL中专门的一派,高大上的样子 Intro: MIT 6.S191 Lecture 6: Deep Reinforcement Learning Course: CS 294: Deep Re ...

  9. Deep Reinforcement Learning 基础知识

    Introduction 深度增强学习Deep Reinforcement Learning是将深度学习与增强学习结合起来从而实现从Perception感知到Action动作的端对端学习的一种全新的算 ...

随机推荐

  1. HDU 6229 Wandering Robots(2017 沈阳区域赛 M题,结论)

    题目链接  HDU 6229 题意 在一个$N * N$的格子矩阵里,有一个机器人. 格子按照行和列标号,左上角的坐标为$(0, 0)$,右下角的坐标为$(N - 1, N - 1)$ 有一个机器人, ...

  2. Theam,style

    Theam <!-- Base application theme. --> <!--<style name="AppTheme" parent=" ...

  3. centos中httpd Server not started: (13)Permission denied: make_sock: could not bind to address [::]:8888

    Install semanage tools: sudo yum -y install policycoreutils-python Allow port 88 for httpd: sudo sem ...

  4. Objective-C的self.用法的一些总结

    最近有人问我关于什么时候用self.赋值的问题, 我总结了一下, 发出来给大家参考. 有什么问题请大家斧正. 关于什么时间用self. , 其实是和Obj-c的存取方法有关, 不过网上很多人也都这么解 ...

  5. iscroll 子表左右滚动同时保持页面整体上下滚动

    if ( this.options.preventDefault && !utils.isBadAndroid && !utils.preventDefaultExce ...

  6. cordova热更新插件调试

    有更新www目录内容后,首先sencha app build,然后进入 cordova目录 运行 cordova-hcp build, 然后查看 chcp.json文件时间,然后压缩cordova目录 ...

  7. js CacheQueue

    (function(){ var CacheQueue=function(name,weightValue,maxLength,clearTimerTime){ //public this.name ...

  8. Another unnamed CacheManager already exists in the same VM

    今天学习Spring 缓存机制.遇到不少问题~ 好不easy缓存的单元測试用例调试成功了,在同一项目下单元測试另外一个文件时,发生了异常: org.springframework.beans.fact ...

  9. 基于SNMP的交换机入侵的内网渗透

    前言:局域网在管理中常常使用SNMP协议来进行设备的管理和监控,而SNMP的弱点也成为了我们此次渗透的关键. 使用SNMP管理设备只需要一个community string,而这个所谓的密码经常采用默 ...

  10. Shell脚本之:变量

    与编译型语言不同,shell脚本是一种解释型语言. 执行这类程序时,解释器(interpreter)需要读取我们编写的源代码(source code),并将其转换成目标代码(object code), ...