Monte Carlo Control
Problem of State-Value Function
Similar as Policy Iteration in Model-Based Learning, Generalized Policy Iteration will be used in Monte Carlo Control. In Policy Iteration, we keep doing Policy Evaluation and Policy Improvement untill our policy converging to Optimal Policy.

Every time when we improve the policy, the action that gives the best return(reward+value function of the next state) will be picked.

The problem of this algorithm if we directly transfering to Monte Carlo is: it is based on the Transition Matrix.
Monte Carlo Control based on Q function
The idea of Policy Iteration can be used to Estimite Action-Value Function, and it is very useful for Model-Free problem. The process of choosing actions does not depend on State-Value function, because the return from a specific action is given by Monte Carlo estimation.


Q function can be updated by:

When we improve the policy, we just pick the action that produce the maximum Q value.

Exploration-exploitation Dilemma and ε-Greedy Exploration:
In Model-Based Policy Iteration algorithm, we update all State-Value function within a single policy evaluation process, so that we can choose the best actions from the whole action space whiled improving policies. Nevertheless, Monte Carlo Learning only updates the Action-Value functions whose actions were taken on the previous episode. So there are probabily some actions having better returns than the actions we have tried. Sometimes we need to give them a trial. We call that problem the Exploration-Exploitation Delemma.
It is necessary to try some new opened restaurant, rather than going to the usual place every day.

ε-Greedy Exploration is the algorithm that gives the agent probability=ε to choose randomly actions and 1-ε to stay on the optimal action.

Monte Carlo Control的更多相关文章
- 增强学习(四) ----- 蒙特卡罗方法(Monte Carlo Methods)
1. 蒙特卡罗方法的基本思想 蒙特卡罗方法又叫统计模拟方法,它使用随机数(或伪随机数)来解决计算的问题,是一类重要的数值计算方法.该方法的名字来源于世界著名的赌城蒙特卡罗,而蒙特卡罗方法正是以概率为基 ...
- Monte Carlo Policy Evaluation
Model-Based and Model-Free In the previous several posts, we mainly talked about Model-Based Reinfor ...
- Monte Carlo方法简介(转载)
Monte Carlo方法简介(转载) 今天向大家介绍一下我现在主要做的这个东东. Monte Carlo方法又称为随机抽样技巧或统计实验方法,属于计算数学的一个分支,它是在上世纪四十年代 ...
- PRML读书会第十一章 Sampling Methods(MCMC, Markov Chain Monte Carlo,细致平稳条件,Metropolis-Hastings,Gibbs Sampling,Slice Sampling,Hamiltonian MCMC)
主讲人 网络上的尼采 (新浪微博: @Nietzsche_复杂网络机器学习) 网络上的尼采(813394698) 9:05:00 今天的主要内容:Markov Chain Monte Carlo,M ...
- Monte Carlo Approximations
准备总结几篇关于 Markov Chain Monte Carlo 的笔记. 本系列笔记主要译自A Gentle Introduction to Markov Chain Monte Carlo (M ...
- (转)Markov Chain Monte Carlo
Nice R Code Punning code better since 2013 RSS Blog Archives Guides Modules About Markov Chain Monte ...
- [其他] 蒙特卡洛(Monte Carlo)模拟手把手教基于EXCEL与Crystal Ball的蒙特卡洛成本模拟过程实例:
http://www.cqt8.com/soft/html/723.html下载,官网下载 (转帖)1.定义: 蒙特卡洛(Monte Carlo)模拟是一种通过设定随机过程,反复生成时间序列,计算参数 ...
- Introduction to Monte Carlo Tree Search (蒙特卡罗搜索树简介)
Introduction to Monte Carlo Tree Search (蒙特卡罗搜索树简介) 部分翻译自“Monte Carlo Tree Search and Its Applicati ...
- (转)Monte Carlo method 蒙特卡洛方法
转载自:维基百科 蒙特卡洛方法 https://zh.wikipedia.org/wiki/%E8%92%99%E5%9C%B0%E5%8D%A1%E7%BE%85%E6%96%B9%E6%B3%9 ...
随机推荐
- Linux-date函数
rhel7 date函数 显示本地时间?设定当前系统的时间,以一定格式显示当前时间,如X-X-X /X:X:X 使用man date命令查看关于date的使用方法 SYNOPSIS ...
- 动态路由协议RIP
RIP Routing Information Protocol,属IGP协议,是距离矢量型动态路由协议(直接发送路由信息的协议为距离矢量型协议),使用UDP协议,端口号520. 贝尔曼福特算法 RI ...
- Codeforces Round #426 (Div. 2) - D
题目链接:http://codeforces.com/contest/834/problem/D 题意:给定一个长度为n的序列和一个k,现在让你把这个序列分成刚好k段,并且k段的贡献之和最大.对于每一 ...
- 手写与copy
m_Font.CreateFont( 14, // 字体高度 0 , // 宽度由系统确定 0 , // 文本不倾斜 0 , // 字体不倾斜 FW_NORMAL, // 字体粗度 0 , // 非斜 ...
- WPF Geometry="M0,0 L1,0 1,0.1, 0,0.1Z" 画方格背景图
此项目源码下载地址:https://github.com/lizhiqiang0204/Tile 方格效果: 前端代码如下: <Window x:Class="WpfApp1.Main ...
- classloader加载class的流程及自定义ClassLoader
java应用环境中不同的class分别由不同的ClassLoader负责加载.一个jvm中默认的classloader有Bootstrap ClassLoader.Extension ClassLoa ...
- button标签设置line-height问题
默认设置line-height是不会有问题的. 加了边框后就会出现问题. 如果想要解决的话.就调整行高,自己满意为止.
- 【微信小程序】使用vscode编写微信小程序项目
1. 在微信开发者工具(以下简称:开发者)中新建一个模板微信小程序 2. 在开发者中将模拟器分隔开 3. 设置在保存时编译 4. 在vscode中打开项目目录 5. 下载代码提示插件 这样就可以在vs ...
- Vux的安装使用
1.Vux的安装 1.1.vue-cli的vux模板生成项目 可以直接使用 vue-cli 的模板生成一个 vux 项目 vue init airyland/vux2 projectName 由此可以 ...
- 【rust】rust安装,运行第一个Rust 程序 (1)
安装 Rust 在 Unix 类系统如 Linux 和 macOS 上,打开终端并输入: curl https://sh.rustup.rs -sSf | sh 回车后安装过程出现如下显示: info ...