Implementing the DQN Algorithm to Play CartPole
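First, install TensorFlow 1.2 and Python 3.6.
Next, install NumPy. On Windows, use this prebuilt wheel:
numpy-1.13.1+mkl-cp36-cp36m-win_amd64.whl
On Linux, running pip install numpy is enough.
Then install SciPy, again using the 64-bit wheel for Windows 10:
scipy-0.19.1-cp36-cp36m-win_amd64.whl
Both wheels can be downloaded from http://www.lfd.uci.edu/~gohlke/pythonlibs/ , which provides prebuilt Windows packages.
Next, install the gym module: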
D:\AI\sample\tensorforce>pip install gym
Its PyPI page is https://pypi.python.org/pypi/gym/0.2.0
Finally, install the TensorForce library:
git clone git@github.com:reinforceio/tensorforce.git
cd tensorforce
pip install -e .
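Before running the example, it can help to confirm that every package from the steps above actually imports. The following is a minimal sketch, not part of TensorForce: the helper name report_versions is hypothetical, and it assumes each package exposes a __version__ attribute (a package without one is reported as "unknown").

```python
import importlib


def report_versions(module_names):
    """Map each module name to its __version__ string, or None if it fails to import."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Imports can fail with more than ImportError (for example a
            # broken native library), so any failure counts as "not usable".
            versions[name] = None
            continue
        # Assumption: the package exposes __version__; fall back to "unknown".
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    packages = ["tensorflow", "numpy", "scipy", "gym", "tensorforce"]
    for name, version in report_versions(packages).items():
        print(name, version if version is not None else "NOT INSTALLED")
```

Any package reported as NOT INSTALLED should be reinstalled before moving on to the example.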
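Now the reinforcement learning example can be run. Note that the command below selects TRPOAgent with the TRPO configuration files, so this particular run trains with TRPO rather than DQN: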
python examples/openai_gym.py CartPole-v0 -a TRPOAgent -c examples/configs/trpo_cartpole.json -n examples/configs/trpo_cartpole_network.json
The output is as follows:
[2017-07-24 11:05:28,928] Making new env: CartPole-v0
2017-07-24 11:05:30.056045: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.056185: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.057611: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.059002: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.059684: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.060401: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.061129: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.061744: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.406967: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 1060 6GB
major: 6 minor: 1 memoryClockRate (GHz) 1.7845
pciBusID 0000:01:00.0
Total memory: 6.00GiB
Free memory: 4.99GiB
2017-07-24 11:05:30.407097: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:961] DMA: 0
2017-07-24 11:05:30.408846: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:971] 0: Y
2017-07-24 11:05:30.409518: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0)
[2017-07-24 11:05:31,317] Starting TRPOAgent for Environment 'OpenAIGym(CartPole-v0)'
[2017-07-24 11:05:34,728] Finished episode 50 after 16 timesteps
[2017-07-24 11:05:34,728] Episode reward: 16.0
[2017-07-24 11:05:34,729] Average of last 500 rewards: 2.6
[2017-07-24 11:05:34,730] Average of last 100 rewards: 13.0
[2017-07-24 11:05:39,224] Finished episode 100 after 81 timesteps
[2017-07-24 11:05:39,224] Episode reward: 81.0
[2017-07-24 11:05:39,226] Average of last 500 rewards: 6.49
[2017-07-24 11:05:39,226] Average of last 100 rewards: 32.45
[2017-07-24 11:05:47,185] Finished episode 150 after 177 timesteps
[2017-07-24 11:05:47,185] Episode reward: 177.0
[2017-07-24 11:05:47,186] Average of last 500 rewards: 13.364
[2017-07-24 11:05:47,187] Average of last 100 rewards: 53.82
[2017-07-24 11:06:00,082] Finished episode 200 after 136 timesteps
[2017-07-24 11:06:00,083] Episode reward: 136.0
[2017-07-24 11:06:00,084] Average of last 500 rewards: 24.414
[2017-07-24 11:06:00,085] Average of last 100 rewards: 89.62
[2017-07-24 11:06:18,207] Finished episode 250 after 144 timesteps
[2017-07-24 11:06:18,208] Episode reward: 144.0
[2017-07-24 11:06:18,209] Average of last 500 rewards: 40.096
[2017-07-24 11:06:18,210] Average of last 100 rewards: 133.66
[2017-07-24 11:06:35,440] Finished episode 300 after 143 timesteps
[2017-07-24 11:06:35,440] Episode reward: 143.0
[2017-07-24 11:06:35,442] Average of last 500 rewards: 55.014
[2017-07-24 11:06:35,443] Average of last 100 rewards: 153.0
[2017-07-24 11:06:52,119] Finished episode 350 after 169 timesteps
[2017-07-24 11:06:52,119] Episode reward: 169.0
[2017-07-24 11:06:52,120] Average of last 500 rewards: 69.586
[2017-07-24 11:06:52,121] Average of last 100 rewards: 147.45
[2017-07-24 11:07:05,441] Finished episode 400 after 73 timesteps
[2017-07-24 11:07:05,441] Episode reward: 73.0
[2017-07-24 11:07:05,441] Average of last 500 rewards: 81.13
[2017-07-24 11:07:05,441] Average of last 100 rewards: 130.58
[2017-07-24 11:07:15,715] Finished episode 450 after 123 timesteps
[2017-07-24 11:07:15,715] Episode reward: 123.0
[2017-07-24 11:07:15,716] Average of last 500 rewards: 89.798
[2017-07-24 11:07:15,716] Average of last 100 rewards: 101.06
[2017-07-24 11:07:23,885] Finished episode 500 after 51 timesteps
[2017-07-24 11:07:23,885] Episode reward: 51.0
[2017-07-24 11:07:23,886] Average of last 500 rewards: 96.714
[2017-07-24 11:07:23,886] Average of last 100 rewards: 77.92
[2017-07-24 11:07:32,642] Finished episode 550 after 62 timesteps
[2017-07-24 11:07:32,642] Episode reward: 62.0
[2017-07-24 11:07:32,643] Average of last 500 rewards: 101.236
[2017-07-24 11:07:32,643] Average of last 100 rewards: 70.19
[2017-07-24 11:07:42,235] Finished episode 600 after 53 timesteps
[2017-07-24 11:07:42,235] Episode reward: 53.0
[2017-07-24 11:07:42,236] Average of last 500 rewards: 104.336
[2017-07-24 11:07:42,237] Average of last 100 rewards: 70.56
[2017-07-24 11:07:52,850] Finished episode 650 after 72 timesteps
[2017-07-24 11:07:52,851] Episode reward: 72.0
[2017-07-24 11:07:52,851] Average of last 500 rewards: 105.88
[2017-07-24 11:07:52,852] Average of last 100 rewards: 77.04
[2017-07-24 11:08:02,817] Finished episode 700 after 80 timesteps
[2017-07-24 11:08:02,817] Episode reward: 80.0
[2017-07-24 11:08:02,818] Average of last 500 rewards: 102.986
[2017-07-24 11:08:02,818] Average of last 100 rewards: 82.87
[2017-07-24 11:08:13,726] Finished episode 750 after 77 timesteps
[2017-07-24 11:08:13,726] Episode reward: 77.0
[2017-07-24 11:08:13,727] Average of last 500 rewards: 96.038
[2017-07-24 11:08:13,727] Average of last 100 rewards: 84.45
[2017-07-24 11:08:25,828] Finished episode 800 after 59 timesteps
[2017-07-24 11:08:25,828] Episode reward: 59.0
[2017-07-24 11:08:25,829] Average of last 500 rewards: 90.658
[2017-07-24 11:08:25,829] Average of last 100 rewards: 91.36
[2017-07-24 11:08:42,711] Finished episode 850 after 177 timesteps
[2017-07-24 11:08:42,711] Episode reward: 177.0
[2017-07-24 11:08:42,712] Average of last 500 rewards: 90.34
[2017-07-24 11:08:42,712] Average of last 100 rewards: 118.96
[2017-07-24 11:09:03,141] Finished episode 900 after 167 timesteps
[2017-07-24 11:09:03,141] Episode reward: 167.0
[2017-07-24 11:09:03,142] Average of last 500 rewards: 95.556
[2017-07-24 11:09:03,142] Average of last 100 rewards: 155.07
[2017-07-24 11:09:27,219] Finished episode 950 after 200 timesteps
[2017-07-24 11:09:27,219] Episode reward: 200.0
[2017-07-24 11:09:27,220] Average of last 500 rewards: 105.26
[2017-07-24 11:09:27,220] Average of last 100 rewards: 175.66
[2017-07-24 11:09:53,139] Finished episode 1000 after 200 timesteps
[2017-07-24 11:09:53,139] Episode reward: 200.0
[2017-07-24 11:09:53,141] Average of last 500 rewards: 118.344
[2017-07-24 11:09:53,141] Average of last 100 rewards: 191.86
[2017-07-24 11:10:19,722] Finished episode 1050 after 200 timesteps
[2017-07-24 11:10:19,722] Episode reward: 200.0
[2017-07-24 11:10:19,723] Average of last 500 rewards: 131.222
[2017-07-24 11:10:19,724] Average of last 100 rewards: 200.0
[2017-07-24 11:10:45,962] Finished episode 1100 after 200 timesteps
[2017-07-24 11:10:45,963] Episode reward: 200.0
[2017-07-24 11:10:45,963] Average of last 500 rewards: 144.198
[2017-07-24 11:10:45,963] Average of last 100 rewards: 199.83
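To read a log like the one above programmatically, the sketch below extracts each reported episode together with its 100-episode average and finds the first report that reaches a target. The function names parse_progress and first_solved are hypothetical, and the threshold of 195.0 is an assumption taken from gym's registered reward threshold for CartPole-v0, not something the log itself states.

```python
import re

# Matches "... Finished episode 50 after 16 timesteps"
EPISODE_RE = re.compile(r"Finished episode (\d+) after (\d+) timesteps")
# Matches "... Average of last 100 rewards: 13.0"
AVG100_RE = re.compile(r"Average of last 100 rewards: ([0-9.]+)")

# Assumption: gym's reward threshold for CartPole-v0.
SOLVED_THRESHOLD = 195.0


def parse_progress(log_text):
    """Return (episode, avg_last_100) pairs in the order they appear in the log."""
    progress = []
    episode = None
    for line in log_text.splitlines():
        match = EPISODE_RE.search(line)
        if match:
            episode = int(match.group(1))
            continue
        match = AVG100_RE.search(line)
        if match and episode is not None:
            progress.append((episode, float(match.group(1))))
            episode = None
    return progress


def first_solved(progress, threshold=SOLVED_THRESHOLD):
    """Return the first reported episode whose average meets the threshold, else None."""
    for episode, average in progress:
        if average >= threshold:
            return episode
    return None


# Two reports copied from the log above.
SAMPLE = """\
[2017-07-24 11:09:53,139] Finished episode 1000 after 200 timesteps
[2017-07-24 11:09:53,139] Episode reward: 200.0
[2017-07-24 11:09:53,141] Average of last 500 rewards: 118.344
[2017-07-24 11:09:53,141] Average of last 100 rewards: 191.86
[2017-07-24 11:10:19,722] Finished episode 1050 after 200 timesteps
[2017-07-24 11:10:19,722] Episode reward: 200.0
[2017-07-24 11:10:19,723] Average of last 500 rewards: 131.222
[2017-07-24 11:10:19,724] Average of last 100 rewards: 200.0
"""

if __name__ == "__main__":
    progress = parse_progress(SAMPLE)
    print(progress)                # [(1000, 191.86), (1050, 200.0)]
    print(first_solved(progress))  # 1050
```

Under that assumed threshold, the run first reports a qualifying average at episode 1050. Also note that at episode 50 the 500-episode average (2.6) equals the 100-episode average (13.0) multiplied by 100/500, which suggests the script divides by the full window size even before that many episodes have been played, so the early averages understate actual performance.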