Monte Carlo Control

Problem of State-Value Function

Similar as Policy Iteration in Model-Based Learning, Generalized Policy Iteration will be used in Monte Carlo Control. In Policy Iteration, we keep doing Policy Evaluation and Policy Improvement untill our policy converging to Optimal Policy.

Every time when we improve the policy, the action that gives the best return(reward+value function of the next state) will be picked.

The problem of this algorithm if we directly transfering to Monte Carlo is: it is based on the Transition Matrix.

Monte Carlo Control based on Q function

The idea of Policy Iteration can be used to Estimite Action-Value Function, and it is very useful for Model-Free problem. The process of choosing actions does not depend on State-Value function, because the return from a specific action is given by Monte Carlo estimation.

Q function can be updated by:

When we improve the policy, we just pick the action that produce the maximum Q value.

Exploration-exploitation Dilemma and ε-Greedy Exploration:

In Model-Based Policy Iteration algorithm, we update all State-Value function within a single policy evaluation process, so that we can choose the best actions from the whole action space whiled improving policies. Nevertheless, Monte Carlo Learning only updates the Action-Value functions whose actions were taken on the previous episode. So there are probabily some actions having better returns than the actions we have tried. Sometimes we need to give them a trial. We call that problem the Exploration-Exploitation Delemma.

It is necessary to try some new opened restaurant, rather than going to the usual place every day.

ε-Greedy Exploration is the algorithm that gives the agent probability=ε to choose randomly actions and 1-ε to stay on the optimal action.

Monte Carlo Control的更多相关文章

增强学习（四） ----- 蒙特卡罗方法(Monte Carlo Methods)
1. 蒙特卡罗方法的基本思想蒙特卡罗方法又叫统计模拟方法,它使用随机数(或伪随机数)来解决计算的问题,是一类重要的数值计算方法.该方法的名字来源于世界著名的赌城蒙特卡罗,而蒙特卡罗方法正是以概率为基 ...
Monte Carlo Policy Evaluation
Model-Based and Model-Free In the previous several posts, we mainly talked about Model-Based Reinfor ...
Monte Carlo方法简介(转载)
Monte Carlo方法简介(转载) 今天向大家介绍一下我现在主要做的这个东东. Monte Carlo方法又称为随机抽样技巧或统计实验方法,属于计算数学的一个分支,它是在上世纪四十年代 ...
PRML读书会第十一章 Sampling Methods（MCMC， Markov Chain Monte Carlo，细致平稳条件，Metropolis-Hastings，Gibbs Sampling，Slice Sampling，Hamiltonian MCMC）
主讲人网络上的尼采 (新浪微博: @Nietzsche_复杂网络机器学习) 网络上的尼采(813394698) 9:05:00 今天的主要内容:Markov Chain Monte Carlo,M ...
Monte Carlo Approximations
准备总结几篇关于 Markov Chain Monte Carlo 的笔记. 本系列笔记主要译自A Gentle Introduction to Markov Chain Monte Carlo (M ...
（转）Markov Chain Monte Carlo
Nice R Code Punning code better since 2013 RSS Blog Archives Guides Modules About Markov Chain Monte ...
[其他] 蒙特卡洛(Monte Carlo)模拟手把手教基于EXCEL与Crystal Ball的蒙特卡洛成本模拟过程实例：
http://www.cqt8.com/soft/html/723.html下载,官网下载 (转帖)1.定义: 蒙特卡洛(Monte Carlo)模拟是一种通过设定随机过程,反复生成时间序列,计算参数 ...
Introduction to Monte Carlo Tree Search （蒙特卡罗搜索树简介）
Introduction to Monte Carlo Tree Search (蒙特卡罗搜索树简介) 部分翻译自“Monte Carlo Tree Search and Its Applicati ...
（转）Monte Carlo method 蒙特卡洛方法
转载自:维基百科蒙特卡洛方法 https://zh.wikipedia.org/wiki/%E8%92%99%E5%9C%B0%E5%8D%A1%E7%BE%85%E6%96%B9%E6%B3%9 ...

随机推荐

$strobe$monitor$display
$strobe:当该时刻的所有事件处理完后,在这个时间步的结尾打印一行格式化的文本,语法$strobe( Argument,...);$fstrobe( Mcd, Argument,...);Mc ...
Red Hat Enterprise Linux 8.0 安装
Red Hat Enterprise Linux 8.0 安装本次安装通过使用VMware Workstation 15 pro 进行. 1.新建虚拟机 2.点击首页的创建新的虚拟机,或者点击标签栏 ...
03python面向对象编程4
http://c.biancheng.net/view/2287.html 1.1定义类和对象在面向对象的程序设计过程中有两个重要概念:类(class)和对象(object,也被称为实例,insta ...
At grand 022 GCD序列构造 dp/floyd二进制变换最少费用
A diverse words指的是每一个字母在单词中出现的次数不超过1的单词题目要求你求出字典序比当前给定单词大的字典序最小单词 1.如果给定的单词长度小于26 就遍历一次在单词尾部加上字典序最小 ...
boost::regex
https://blog.51cto.com/liam2199/2108548 正则表达式
mybatis返回自增主键问题踩坑
1 <insert id="insert" keyProperty="id" useGeneratedKeys="true"  par ...
poj 3352 : Road Construction 【ebcc】
题目链接题意:给出一个连通图,求最少加入多少条边可使图变成一个边-双连通分量模板题,熟悉一下边连通分量的定义.最后ans=(leaf+1)/2.leaf为原图中size为1的边-双连通分量 #i ...
mybaits 时间查询DATE_FORMAT
<if test="accountdayInout.inoutDateStart!=null"> and DATE_FORMAT(t.inout_date,'%Y-%m ...
java 8 接口默认方法
解决问题:在java8 之前的版本,在修改已有的接口的时候,需要修改实现该接口的实现类. 作用:解决接口的修改与现有的实现不兼容的问题.在不影响原有实现类的结构下修改新的功能方法案例: 首先定义一个 ...
BZOJ 2097: [Usaco2010 Dec]Exercise 奶牛健美操二分 + 贪心 + 树上问题
Code: #include<bits/stdc++.h> using namespace std; #define setIO(s) freopen(s".in",& ...

Monte Carlo Control

Monte Carlo Control的更多相关文章

随机推荐

热门专题