Optimal Value Functions and Optimal Policy
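The Optimal Value Function tells us how much reward the best policy can collect from a state s, i.e. the best-case scenario given state s. It is defined as the maximum of the state-value function over all policies:

$$v_*(s) = \max_{\pi} v_{\pi}(s)$$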

Value Function and Optimal State-Value Function
Let us first compare the Value Function with the Optimal Value Function. For example, in the student study case, the value function for the blue circle state under the 50:50 policy is 7.4.

However, when we consider the Optimal State-Value Function, 'branches' that may prevent us from getting the best score are pruned. For instance, the optimal scenario for the blue circle state is to continue studying with 100% probability rather than going to the pub.
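The comparison can be reproduced numerically. The sketch below is a minimal illustration, not the author's code: it assumes the student example is the "Student MDP" commonly drawn in David Silver's RL lectures, with no discounting (gamma = 1) and "Sleep" as the terminal state. The state names, action names, rewards and transition probabilities in `MDP` are assumptions, since the original figure is not available here. It evaluates the 50:50 policy by repeated Bellman expectation backups, then computes the optimal values by repeated Bellman optimality backups.

```python
# Assumed Student MDP (not given in the text): gamma = 1, "Sleep" is terminal.
# MDP[state][action] = (immediate reward, {next_state: probability})
GAMMA = 1.0
MDP = {
    "Facebook": {"facebook": (-1.0, {"Facebook": 1.0}),
                 "quit": (0.0, {"Class1": 1.0})},
    "Class1": {"facebook": (-1.0, {"Facebook": 1.0}),
               "study": (-2.0, {"Class2": 1.0})},
    "Class2": {"sleep": (0.0, {"Sleep": 1.0}),
               "study": (-2.0, {"Class3": 1.0})},
    "Class3": {"study": (10.0, {"Sleep": 1.0}),
               "pub": (1.0, {"Class1": 0.2, "Class2": 0.4, "Class3": 0.4})},
}


def q_value(v, state, action):
    """One-step lookahead: reward plus discounted value of successor states."""
    reward, transitions = MDP[state][action]
    # Terminal states are absent from v and count as 0.
    return reward + GAMMA * sum(p * v.get(s2, 0.0) for s2, p in transitions.items())


def evaluate_uniform_policy(sweeps=2000):
    """State values when every available action is equally likely (50:50)."""
    v = {s: 0.0 for s in MDP}
    for _ in range(sweeps):
        v = {s: sum(q_value(v, s, a) for a in MDP[s]) / len(MDP[s]) for s in MDP}
    return v


def value_iteration(sweeps=2000):
    """Optimal state values: keep only the best action in every state."""
    v = {s: 0.0 for s in MDP}
    for _ in range(sweeps):
        v = {s: max(q_value(v, s, a) for a in MDP[s]) for s in MDP}
    return v


v_random = evaluate_uniform_policy()
v_star = value_iteration()
print(round(v_random["Class3"], 1), round(v_star["Class3"], 1))  # 7.4 10.0
```

Under these assumed numbers, the state with the study and pub actions is worth about 7.4 under the 50:50 policy and 10 once the pub branch is pruned, which matches the figures quoted in the text.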

Optimal Action-Value Function
Next we move to the Action-Value Function. The following equation shows that the Optimal Action-Value Function likewise comes from the policy that gives the best action returns:

$$q_*(s, a) = \max_{\pi} q_{\pi}(s, a)$$

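The Optimal Action-Value Function is closely related to the Optimal State-Value Function by:

$$q_*(s, a) = \mathcal{R}_s^a + \gamma \sum_{s'} \mathcal{P}_{ss'}^a \, v_*(s')$$

Here $\mathcal{R}_s^a$ is the immediate reward for taking action a in state s, $\mathcal{P}_{ss'}^a$ is the probability of moving from s to s' under action a, and $\gamma$ is the discount factor.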

The equation gives the best return available once action a is taken at state s. Under this condition, the probability of reaching each successor state and the immediate reward are fixed by the environment, so the only variable is the state-value function of the successor states. Therefore, when the environment's dynamics are known, knowing the Optimal State-Value Function is enough to obtain the Optimal Action-Value Function.
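Conversely, the Optimal State-Value Function is obtained by choosing the best action, i.e. the one with the highest immediate reward plus discounted Optimal State-Value of the successor states:

$$v_*(s) = \max_a q_*(s, a) = \max_a \left( \mathcal{R}_s^a + \gamma \sum_{s'} \mathcal{P}_{ss'}^a \, v_*(s') \right)$$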

Returning to the student example, once the Optimal State-Value Function is known, the Optimal Action-Value Function can be calculated from it by a one-step lookahead over each action's immediate reward and successor states.
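As a small worked illustration of that lookahead, the sketch below computes the optimal action values for the state with the study and pub actions. Everything numeric here is an assumption rather than something stated in the text: the optimal state values in `v_star`, the rewards, the transition probabilities and gamma = 1 are taken from the commonly used Student MDP, and the names are hypothetical.

```python
GAMMA = 1.0  # assumed: no discounting

# Assumed optimal state values; "Sleep" is the terminal state with value 0.
v_star = {"Facebook": 6.0, "Class1": 6.0, "Class2": 8.0, "Class3": 10.0, "Sleep": 0.0}

# Assumed actions available in Class3: (immediate reward, {next_state: probability})
class3_actions = {
    "study": (10.0, {"Sleep": 1.0}),
    "pub": (1.0, {"Class1": 0.2, "Class2": 0.4, "Class3": 0.4}),
}

# q*(s, a) = R(s, a) + gamma * sum over s' of P(s' | s, a) * v*(s')
q_star_class3 = {
    action: reward + GAMMA * sum(p * v_star[s2] for s2, p in transitions.items())
    for action, (reward, transitions) in class3_actions.items()
}

# v*(s) = max over a of q*(s, a): the best action value recovers the state value.
best_value = max(q_star_class3.values())
print(q_star_class3, best_value)
```

With these assumed inputs, studying is worth 10 and going to the pub is worth 1 + 0.2 * 6 + 0.4 * 8 + 0.4 * 10 = 9.4, so taking the maximum gives back the optimal state value of 10.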

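Finally, we can derive the best policy from the Optimal Action-Value Function:

$$\pi_*(a \mid s) = \begin{cases} 1 & \text{if } a = \arg\max_{a'} q_*(s, a') \\ 0 & \text{otherwise} \end{cases}$$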


This means the policy simply picks the best action in every state rather than following a probability distribution over actions. Finding such a deterministic policy is the goal of Reinforcement Learning, as it tells the agent which action to take to complete the task.
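Extracting such a policy from a table of action values takes only a few lines. The sketch below is illustrative only: `greedy_policy` is a hypothetical helper, and the entries of `q_star` are assumed values for two states of the student example, not numbers given in the text.

```python
def greedy_policy(q):
    """Put probability 1 on the highest-valued action of each state, 0 elsewhere."""
    policy = {}
    for state, action_values in q.items():
        # Ties are broken in favour of the first action listed.
        best = max(action_values, key=action_values.get)
        policy[state] = {a: 1.0 if a == best else 0.0 for a in action_values}
    return policy


# Assumed optimal action values, for illustration only.
q_star = {
    "Class2": {"sleep": 0.0, "study": 8.0},
    "Class3": {"study": 10.0, "pub": 9.4},
}

pi_star = greedy_policy(q_star)
print(pi_star["Class3"])  # {'study': 1.0, 'pub': 0.0}
```

Because the result assigns probability 1 to a single action per state, it can equally be stored as a plain mapping from state to action.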