Temporal-Difference Learning for Prediction
In Monte Carlo Learning, we've got the estimation of value function:

Gt is the episode return from time t, which can be calculated by:

Please recall, Gt can be only calculated at the end of a given episode. This reveals a disadvantage of Monte Carlo Learning: have to wait until the end of episodes.
TD(0) algorithm replace Gt of the equation to the immediate reward and estimated value function of the next state:

The algorithm updates the Estimated State-Value Function at time t+1, because everything in the equation is determined. This means we will wait until the agent reaching the next state, so that the agent can get the immediate reward Rt+1 and know which state the system will transition to at time t+1.
The equations below are State-Value Function for Dynamic Programming, in which the whole environment is known. Compare to these equations:

TD algorithm is quite like 6.4 Bellman Equation, but it does not take expectation. Instead, it uses the knowledge till now to estimate how much reward I am going to get from this state. The whole algorithm can be demonstrated as:

TD Target, TD Error
Bias/ Viriance trade-off
Bootstraping
Temporal-Difference Learning for Prediction的更多相关文章
- 【PPT】 Least squares temporal difference learning
最小二次方时序差分学习 原文地址: https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd= ...
- PP: Multi-Horizon Time Series Forecasting with Temporal Attention Learning
Problem: multi-horizon probabilistic forecasting tasks; Propose an end-to-end framework for multi-ho ...
- [Reinforcement Learning] Model-Free Prediction
上篇文章介绍了 Model-based 的通用方法--动态规划,本文内容介绍 Model-Free 情况下 Prediction 问题,即 "Estimate the value funct ...
- [Machine Learning] 机器学习常见算法分类汇总
声明:本篇博文根据http://www.ctocio.com/hotnews/15919.html整理,原作者张萌,尊重原创. 机器学习无疑是当前数据分析领域的一个热点内容.很多人在平时的工作中都或多 ...
- (转) Deep Learning Research Review Week 2: Reinforcement Learning
Deep Learning Research Review Week 2: Reinforcement Learning 转载自: https://adeshpande3.github.io/ad ...
- Awesome Reinforcement Learning
Awesome Reinforcement Learning A curated list of resources dedicated to reinforcement learning. We h ...
- Machine Learning 学习笔记1 - 基本概念以及各分类
What is machine learning? 并没有广泛认可的定义来准确定义机器学习.以下定义均为译文,若以后有时间,将补充原英文...... 定义1.来自Arthur Samuel(上世纪50 ...
- Distributional Reinforcement Learning with Quantile Regression
郑重声明:原文参见标题,如有侵权,请联系作者,将会撤销发布! arXiv:1710.10044v1 [cs.AI] 27 Oct 2017 In AAAI Conference on Artifici ...
- 3. Distributional Reinforcement Learning with Quantile Regression
C51算法理论上用Wasserstein度量衡量两个累积分布函数间的距离证明了价值分布的可行性,但在实际算法中用KL散度对离散支持的概率进行拟合,不能作用于累积分布函数,不能保证Bellman更新收敛 ...
随机推荐
- 基于TCP/UDP协议的socket
基于TCP协议的socket tcp是基于链接的,必须先启动服务端,然后再启动客户端去链接服务端 server端 import socket sk = socket.socket() sk.bind( ...
- P4513 最大连续字段和 (线段树+区间合并)
题目链接:https://www.luogu.org/problem/P4513 题目大意:单点修改和求区间最大连续字段和 解题思路:很容易想到是用线段树来做,但是如何进行维护呢? 每个维护区间 [L ...
- VS2017报错:未提供初始值设定项
今天在使用VS2017写程序时,报错: 出错的代码如下: #include "pch.h" #include <iostream> #include <threa ...
- Html5+ 开发APP 后台运行代码
function backRunning(){ if(plus.os.name == 'Android'){ var main = plus.android.runtimeMainActivity() ...
- python类和self解析
在介绍Python的self用法之前,先来介绍下Python中的类和实例……我们知道,面向对象最重要的概念就是类(class)和实例(instance),类是抽象的模板,比如学生这个抽象的事物,可以用 ...
- flask项目中设置logo
<link rel="shortcut icon" href="{{ url_for('static', filename='favicon.ico') }}&qu ...
- TCP/IP基础总结性学习(7)
确保 Web 安全的 HTTPS 在 HTTP 协议中有可能存在信息窃听或身份伪装等安全问题.使用 HTTPS 通信机制可以有效地防止这些问题. 一. HTTP 的缺点 HTTP 主要有这些不足,例举 ...
- JavaWeb DOM1
一.BOM的概述 browser object modal :浏览器对象模型. 浏览器对象:window对象. Window 对象会在 <body> 或 <frameset> ...
- jetSonNano darknet ubdefined reference to 'pow',undefined reference to 'sqrtf'....
我在用CMakelist编译工程时,遇到了这个一连串基础数学函数找不到的问题,如下图所示: 我当时在工程中明明引用了 #include "math.h"头函数,这是因为你的工程在预 ...
- koa2的安装
参考: https://www.jianshu.com/p/6b816c609669 1.1 安装koa-generator 在终端输入: $ npm install -g koa-generator ...