API - Reinforcement Learning

Functions related to reinforcement learning.

`discount_episode_rewards`([rewards, gamma, mode])	Take a 1D float array of rewards and compute the discounted rewards for an episode.
`cross_entropy_reward_loss`(logits, actions, ...)	Calculate the loss for a policy gradient network.
`log_weight`(probs, weights[, name])	Log weight.
`choice_action_by_probs`([probs, action_list])	Choose and return an action according to the given action probability distribution.

Reward functions

tensorlayer.rein.discount_episode_rewards(rewards=[], gamma=0.99, mode=0)

Take a 1D float array of rewards and compute the discounted rewards for an
episode. In the default mode (mode == 0), a non-zero reward is treated as the end of an episode.

Parameters:

rewards : numpy list

A list of rewards.

gamma : float

The discount factor.

mode : int

If mode == 0, reset the discount process whenever a non-zero reward is encountered (as in the Ping-pong game).
If mode == 1, do not reset the discount process.

Examples

>>> import numpy as np
>>> import tensorlayer as tl
>>> rewards = np.asarray([0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1])
>>> gamma = 0.9
>>> # mode=0 (default): discounting restarts at every non-zero reward
>>> discount_rewards = tl.rein.discount_episode_rewards(rewards, gamma)
>>> print(discount_rewards)
[ 0.72899997  0.81        0.89999998  1.          0.72899997  0.81
  0.89999998  1.          0.72899997  0.81        0.89999998  1.        ]
>>> # mode=1: discounting carries over across non-zero rewards
>>> discount_rewards = tl.rein.discount_episode_rewards(rewards, gamma, mode=1)
>>> print(discount_rewards)
[ 1.52110755  1.69011939  1.87791049  2.08656716  1.20729685  1.34144104
  1.49048996  1.65610003  0.72899997  0.81        0.89999998  1.        ]
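The difference between the two modes is easiest to see in a small re-implementation. The following is a minimal sketch in plain Python, not the library's source: the name `discount_rewards_sketch` is hypothetical, and it assumes the result is a backward running sum that mode 0 resets at every non-zero reward.

```python
def discount_rewards_sketch(rewards, gamma=0.99, mode=0):
    """Hypothetical helper: backward running sum of discounted rewards."""
    discounted = [0.0] * len(rewards)
    running_add = 0.0
    # Walk backwards so each step already knows its discounted future return.
    for t in reversed(range(len(rewards))):
        if mode == 0 and rewards[t] != 0:
            # Assumed behaviour: a non-zero reward marks an episode boundary.
            running_add = 0.0
        running_add = running_add * gamma + rewards[t]
        discounted[t] = running_add
    return discounted


rewards = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1]
print([round(x, 4) for x in discount_rewards_sketch(rewards, gamma=0.9)])
print([round(x, 4) for x in discount_rewards_sketch(rewards, gamma=0.9, mode=1)])
```

With mode 0 every block of four steps is discounted independently; with mode 1 the rewards of later blocks leak back into earlier steps, which is why the leading values exceed 1.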

Loss functions

Weighted Cross Entropy

tensorlayer.rein.cross_entropy_reward_loss(logits, actions, rewards, name=None)

Calculate the loss for a policy gradient network.

Parameters:

logits : tensor

The network outputs before softmax. This function applies softmax internally.

actions : tensor or placeholder

The actions taken by the agent.

rewards : tensor or placeholder

The rewards.

Examples

>>> import tensorflow as tf
>>> import tensorlayer as tl
>>> from tensorlayer.layers import InputLayer, DenseLayer
>>> # D, H, learning_rate and decay_rate are assumed to be defined beforehand
>>> states_batch_pl = tf.placeholder(tf.float32, shape=[None, D])
>>> network = InputLayer(states_batch_pl, name='input')
>>> network = DenseLayer(network, n_units=H, act=tf.nn.relu, name='relu1')
>>> network = DenseLayer(network, n_units=3, name='out')
>>> probs = network.outputs  # raw logits despite the name; no softmax applied yet
>>> sampling_prob = tf.nn.softmax(probs)  # action probabilities for sampling
>>> actions_batch_pl = tf.placeholder(tf.int32, shape=[None])
>>> discount_rewards_batch_pl = tf.placeholder(tf.float32, shape=[None])
>>> loss = tl.rein.cross_entropy_reward_loss(probs, actions_batch_pl, discount_rewards_batch_pl)
>>> train_op = tf.train.RMSPropOptimizer(learning_rate, decay_rate).minimize(loss)
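As a rough illustration of what a reward-weighted cross-entropy computes, the sketch below works through the arithmetic in plain Python. It assumes the loss is the sum over the batch of the softmax cross-entropy of each taken action multiplied by its reward; the reduction actually used by `cross_entropy_reward_loss` is not stated here, and the helper name is hypothetical.

```python
import math


def reward_weighted_cross_entropy(logits, actions, rewards):
    """Hypothetical helper: sum_i rewards[i] * -log softmax(logits[i])[actions[i]]."""
    total = 0.0
    for row, action, reward in zip(logits, actions, rewards):
        # Numerically stable log-sum-exp for the softmax normaliser.
        peak = max(row)
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        cross_entropy = log_norm - row[action]  # equals -log p(action)
        total += reward * cross_entropy
    return total


logits = [[2.0, 0.5, 0.1], [0.3, 0.3, 0.3]]  # two states, three actions
actions = [0, 2]                              # actions that were taken
rewards = [1.0, 0.5]                          # discounted rewards
print(reward_weighted_cross_entropy(logits, actions, rewards))
```

Under this reading, minimising the loss raises the probability of actions that received positive rewards and lowers it for actions with negative rewards.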

Log weight

tensorlayer.rein.log_weight(probs, weights, name='log_weight')

Log weight.

Parameters:

probs : tensor

If this is a network output, it should usually be scaled to [0, 1] via softmax first.

weights : tensor
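The description of this function is terse, so here is one plausible reading in plain Python: the mean of log(probs) multiplied element-wise by weights. This reduction is an assumption that the text above does not confirm, and the helper name is hypothetical.

```python
import math


def log_weight_sketch(probs, weights):
    """Hypothetical helper: mean of log(p) * w over paired entries."""
    if len(probs) != len(weights):
        raise ValueError("probs and weights must have the same length")
    terms = [math.log(p) * w for p, w in zip(probs, weights)]
    return sum(terms) / len(terms)


probs = [0.7, 0.2, 0.9]     # probabilities in (0, 1], e.g. softmax outputs
weights = [1.0, -0.5, 2.0]  # per-sample weights
print(log_weight_sketch(probs, weights))
```

Because log(p) is undefined for p <= 0, the inputs need to be valid probabilities, which is presumably why the parameter description recommends applying softmax first.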

Sampling functions

tensorlayer.rein.choice_action_by_probs(probs=[0.5, 0.5], action_list=None)

Choose and return an action according to the given action probability distribution.

Parameters:

probs : a list of float.

The probability distribution over all actions.

action_list : None or a list of actions given as integers, strings or other objects.

If None, the returned action is an integer between 0 and len(probs)-1.

Examples

>>> from tensorlayer.rein import choice_action_by_probs
>>> # The outputs are random samples and will differ between runs
>>> for _ in range(5):
...     a = choice_action_by_probs([0.2, 0.4, 0.4])
...     print(a)
0
1
1
2
1
>>> for _ in range(3):
...     a = choice_action_by_probs([0.5, 0.5], ['a', 'b'])
...     print(a)
a
b
b
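Sampling from a discrete distribution can be sketched with the standard library alone. The helper below is hypothetical and is not the library's code. It assumes action_list defaults to the indices 0 to len(probs)-1, as the parameter description states, and the optional `rng` argument is an addition for repeatable demos.

```python
import random


def choose_action_sketch(probs, action_list=None, rng=random):
    """Hypothetical helper: draw one action according to probs."""
    if action_list is None:
        action_list = list(range(len(probs)))
    if len(action_list) != len(probs):
        raise ValueError("action_list and probs must have the same length")
    # choices() samples with replacement using the given weights.
    return rng.choices(action_list, weights=probs, k=1)[0]


rng = random.Random(0)  # seeded only to make the demo repeatable
print([choose_action_sketch([0.2, 0.4, 0.4], rng=rng) for _ in range(5)])
print([choose_action_sketch([0.5, 0.5], ['a', 'b'], rng=rng) for _ in range(3)])
```

In a policy gradient loop, probs would typically be the softmax output of the network for the current state.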


TensorLayer Official Chinese Documentation 1.7.4: API - Reinforcement Learning
