To discount or not to discount in reinforcement learning: A case study comparing R learning and Q learning

https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume4/kaelbling96a-html/node26.html

[Average reward vs. discounted reward]

Schwartz [106] examined the problem of adapting Q-learning to an average-reward framework. Although his R-learning algorithm seems to exhibit convergence problems for some MDPs, several researchers have found the average-reward criterion closer to the true problem they wish to solve than a discounted criterion and therefore prefer R-learning to Q-learning [69].
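To make the contrast concrete, here is a minimal tabular sketch of the two update rules. It follows the commonly cited textbook form of R-learning, not code from the cited papers. The function names, the step sizes `alpha` and `beta`, the list-of-lists tables, and the choice to update the average-reward estimate `rho` only after greedy actions are all assumptions made for illustration.

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Discounted Q-learning: bootstrap from gamma * max over next-state values."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def r_learning_update(R, rho, s, a, r, s_next, beta=0.1, alpha=0.01):
    """Average-reward R-learning (sketch): subtract the running
    average-reward estimate rho instead of discounting the future.
    Returns the updated rho; R is modified in place."""
    # Assumption: rho is adjusted only when the action taken was greedy,
    # so exploratory actions do not bias the average-reward estimate.
    was_greedy = R[s][a] == max(R[s])
    R[s][a] += beta * (r - rho + max(R[s_next]) - R[s][a])
    if was_greedy:
        rho += alpha * (r - rho + max(R[s_next]) - max(R[s]))
    return rho


# Usage: two states, two actions, all values initialised to zero.
Q = [[0.0, 0.0], [0.0, 0.0]]
q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)

R = [[0.0, 0.0], [0.0, 0.0]]
rho = r_learning_update(R, 0.0, s=0, a=1, r=1.0, s_next=1)
```

The only structural difference is the target: Q-learning shrinks future value by `gamma`, while R-learning leaves it undiscounted and subtracts `rho`, so its values measure reward relative to the long-run average.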
