In Monte Carlo Learning, we've got the estimation of value function:

Gt is the episode return from time t, which can be calculated by:

Please recall, Gt can be only calculated at the end of a given episode. This reveals a disadvantage of Monte Carlo Learning: have to wait until the end of episodes.

TD(0) algorithm replace Gt of the equation to the immediate reward and estimated value function of the next state:

The algorithm updates the Estimated State-Value Function at time t+1, because everything in the equation is determined. This means we will wait until the agent reaching the next state, so that the agent can get the immediate reward Rt+1 and know which state the system will transition to at time t+1.

The equations below are State-Value Function for Dynamic Programming, in which the whole environment is known. Compare to these equations:

TD algorithm is quite like 6.4 Bellman Equation, but it does not take expectation. Instead, it uses the knowledge till now to estimate how much reward I am going to get from this state. The whole algorithm can be demonstrated as:

TD Target, TD Error

Bias/ Viriance trade-off

Bootstraping

Temporal-Difference Learning for Prediction的更多相关文章

  1. 【PPT】 Least squares temporal difference learning

    最小二次方时序差分学习 原文地址: https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd= ...

  2. PP: Multi-Horizon Time Series Forecasting with Temporal Attention Learning

    Problem: multi-horizon probabilistic forecasting tasks; Propose an end-to-end framework for multi-ho ...

  3. [Reinforcement Learning] Model-Free Prediction

    上篇文章介绍了 Model-based 的通用方法--动态规划,本文内容介绍 Model-Free 情况下 Prediction 问题,即 "Estimate the value funct ...

  4. [Machine Learning] 机器学习常见算法分类汇总

    声明:本篇博文根据http://www.ctocio.com/hotnews/15919.html整理,原作者张萌,尊重原创. 机器学习无疑是当前数据分析领域的一个热点内容.很多人在平时的工作中都或多 ...

  5. (转) Deep Learning Research Review Week 2: Reinforcement Learning

      Deep Learning Research Review Week 2: Reinforcement Learning 转载自: https://adeshpande3.github.io/ad ...

  6. Awesome Reinforcement Learning

    Awesome Reinforcement Learning A curated list of resources dedicated to reinforcement learning. We h ...

  7. Machine Learning 学习笔记1 - 基本概念以及各分类

    What is machine learning? 并没有广泛认可的定义来准确定义机器学习.以下定义均为译文,若以后有时间,将补充原英文...... 定义1.来自Arthur Samuel(上世纪50 ...

  8. Distributional Reinforcement Learning with Quantile Regression

    郑重声明:原文参见标题,如有侵权,请联系作者,将会撤销发布! arXiv:1710.10044v1 [cs.AI] 27 Oct 2017 In AAAI Conference on Artifici ...

  9. 3. Distributional Reinforcement Learning with Quantile Regression

    C51算法理论上用Wasserstein度量衡量两个累积分布函数间的距离证明了价值分布的可行性,但在实际算法中用KL散度对离散支持的概率进行拟合,不能作用于累积分布函数,不能保证Bellman更新收敛 ...

随机推荐

  1. Python核心技术与实战——十一|程序的模块化

    我们现在已经总结了Python的基本招式和套路,现在可以写一些不那么简单的系统性工程或代码量较大的应用程序.这时候,一个简单的.py文件就会显得过于臃肿,无法承担一个重量级软件开发的重任.这就需要这一 ...

  2. luoguP1445 [Violet]樱花

    链接P1445 [Violet]樱花 求方程 \(\frac {1}{X}+\frac {1}{Y}=\frac {1}{N!}\) 的正整数解的组数,其中\(N≤10^6\),模\(10^9+7\) ...

  3. bzoj3011 [Usaco2012 Dec]Running Away From the Barn 左偏树

    题目传送门 https://lydsy.com/JudgeOnline/problem.php?id=3011 题解 复习一下左偏树板子. 看完题目就知道是左偏树了. 结果这个板子还调了好久. 大概已 ...

  4. synchronized和lock的使用分析(优缺点对比详解)

    1.synchronized加同步格式: synchronized(需要一个任意的对象(锁)){ 代码块中放操作共享数据的代码. } synchromized缺陷synchronized是java中的 ...

  5. CSS3——制作带动画效果的小图片

    下了一个软件:ScreenToGif用来截取动态图片,终于可以展示我的小动图啦,嘻嘻,敲开心! main.html <!DOCTYPE html> <html lang=" ...

  6. jetSonNano darknet ubdefined reference to 'pow',undefined reference to 'sqrtf'....

    我在用CMakelist编译工程时,遇到了这个一连串基础数学函数找不到的问题,如下图所示: 我当时在工程中明明引用了 #include "math.h"头函数,这是因为你的工程在预 ...

  7. mybatis中延迟加载Lazy策略

    延迟加载: lazy策略原理:只有在使用查询sql返回的数据是才真正发出sql语句到数据库,否则不发出(主要用在多表的联合查询) 1.一对一延迟加载: 假设数据库中有person表和card表:其中p ...

  8. HDU 1298 T9 ( 字典树 )

    题意 : 给你 w 个单词以及他们的频率,现在给出模拟 9 键打字的一串数字,要你在其模拟打字的过程中给出不同长度的提示词,出现的提示词应当是之前频率最高的,当然提示词不需要完整的,也可以是 w 个单 ...

  9. 小波神经网络(WNN)

    人工神经网络(ANN) 是对人脑若干基本特性通过数学方法进行的抽象和模拟,是一种模仿人脑结构及其功能的非线性信息处理系统. 具有较强的非线性逼近功能和自学习.自适应.并行处理的特点,具有良好的容错能力 ...

  10. Leetcode 15. Sum(二分或者暴力或者哈希都可以)

    15. 3Sum Medium Given an array nums of n integers, are there elements a, b, c in nums such that a +  ...