Temporal-Difference Control: SARSA and Q-Learning
SARSA
SARSA algorithm also estimate Action-Value functions rather than State-Value function. The difference between SARSA and Monte Carlo is: SARSA does not need to wait the actual return untill the end of the episode, instead it learns from each time step using estimations of the return.
In every step, the agent takes an action A from state S, then it receives a reward R and gets to a new state S'. Based on the policy π, we know the algorithm will greedily pick the action A'. So now we have:S,A,R,S',A', and the task is to estimate Q function of S,A pair.

We borrow the idea of estimating State-Value functions and use it onto Action-Value function estimation, then we get:

Here is the Sudo code for SARSA:

On-Policy vs Off-Policy
If we look into the learning process, there are actually two steps, firstly taking an action A from state S based on policy π, geting the reward R, and the next state S' coming; the second step is using the Q-function of action A' followd the same policy π. Both of the two steps use the same policy π, but actually they can be different. On the first step, the policy is called Target Policy, which is the policy that we will update. The second policy is Behavior Policy, this is how we pick the oprimal action from S'. Q-Learning uses different Policies on the two steps.
Q-Learning
From state S', Q-Learning algorithm picks the action maximizing the Q-function. It stands at state S', looking into all possible actions, and then chooses the best one.



Temporal-Difference Control: SARSA and Q-Learning的更多相关文章
- 强化学习9-Deep Q Learning
之前讲到Sarsa和Q Learning都不太适合解决大规模问题,为什么呢? 因为传统的强化学习都有一张Q表,这张Q表记录了每个状态下,每个动作的q值,但是现实问题往往极其复杂,其状态非常多,甚至是连 ...
- 增强学习(五)----- 时间差分学习(Q learning, Sarsa learning)
接下来我们回顾一下动态规划算法(DP)和蒙特卡罗方法(MC)的特点,对于动态规划算法有如下特性: 需要环境模型,即状态转移概率\(P_{sa}\) 状态值函数的估计是自举的(bootstrapping ...
- 【PPT】 Least squares temporal difference learning
最小二次方时序差分学习 原文地址: https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd= ...
- 论文笔记之:Human-level control through deep reinforcement learning
Human-level control through deep reinforcement learning Nature 2015 Google DeepMind Abstract RL 理论 在 ...
- 如何用简单例子讲解 Q - learning 的具体过程?
作者:牛阿链接:https://www.zhihu.com/question/26408259/answer/123230350来源:知乎著作权归作者所有.商业转载请联系作者获得授权,非商业转载请注明 ...
- 强化学习_Deep Q Learning(DQN)_代码解析
Deep Q Learning 使用gym的CartPole作为环境,使用QDN解决离散动作空间的问题. 一.导入需要的包和定义超参数 import tensorflow as tf import n ...
- 深度强化学习介绍 【PPT】 Human-level control through deep reinforcement learning (DQN)
这个是平时在实验室讲reinforcement learning 的时候用到PPT, 交期末作业.汇报都是一直用的这个,觉得比较不错,保存一下,也为分享,最早该PPT源于师弟汇报所做.
- The Difference between Gamification and Game-Based Learning
http://inservice.ascd.org/the-difference-between-gamification-and-game-based-learning/ Have you trie ...
- deep Q learning小笔记
1.loss 是什么 2. Q-Table的更新问题变成一个函数拟合问题,相近的状态得到相近的输出动作.如下式,通过更新参数 θθ 使Q函数逼近最优Q值 深度神经网络可以自动提取复杂特征,因此,面对高 ...
随机推荐
- VB之Collection---Collection集合类
你看到的这个文章来自于http://www.cnblogs.com/ayanmw 由于要对一些数据进行处理,比较麻烦,实现某个算法要处理大量不同的不同类型的数据. 所以考虑到一些因素,又在使用VB6( ...
- ubuntu 16.04 安装后需要做的事情
1. 更改软件源 sudo gedit /etc/apt/source.list 在底部加入:(如果可以,把Ubuntu官方源注释掉“#_____”) # deb cdrom:[Ubuntu 16.0 ...
- docker安装MySQL5.7示例!!坑
docker pull mysql 一.错误的启动 [root@localhost ~]# docker run ‐‐name mysql01 ‐d mysql 42f0981990 ...
- LOJ6279 果树
我丢 之前sun在某校集训给我看过 当时也没想起来 今天补省集的锅的时候发现 wok这题我还听过?! 身败名裂.jpg (可是你记性不好这事情不已经人尽皆知了吗? 咳咳 回归正题 考虑对于两个同色的点 ...
- flask_session
flask_session和Flask中的session相比,比较简单,省去了 secret_key 首先,导入flask_session 模块 from flask_session import ...
- Mybatis入门教程之新增、更新、删除功能_java - JAVA
文章来源:嗨学网 敏而好学论坛www.piaodoo.com 欢迎大家相互学习 上一节说了Mybatis的框架搭建和简单查询,这次我们来说一说用Mybatis进行基本的增删改操作: 一. 插入一条数据 ...
- Linux学习-通过loganalyzer展示MySQL中rsyslog日志
一.实验环境 系统:CentOS7.6 软件包:apache,php,mariadb-server (都是基于光盘yum源) 源码包:loganalyzer-4.1.7.tar.gz (http:// ...
- F5设备部署
旁挂组网(组网模式一) 所谓旁挂组网模式,就是指在BIG-IP LTM上只配置一个Vlan,使用一个端口(或者Trunk端口)连接在网络中,所有的处理均在这一个Vlan中运行.通常有三种常见配置模式. ...
- Django模板自定义标签和过滤器,模板继承(extend),Django的模型层
上回精彩回顾 视图函数: request对象 request.path 请求路径 request.GET GET请求数据 QueryDict {} request.POST POST请求数据 Quer ...
- 在volist中用遍历
$('.InColor').each(function(){ if($(this).val()==1){ $('.absolute').css({"color":"gra ...