先安装tensorflow 1.2版本和python 3.6,

接着安装:

numpy-1.13.1+mkl-cp36-cp36m-win_amd64.whl

的版本,这个是windows下的,如果linux下直接使用pip install numpy就可以了。

再接着安装scipy版本,也是windows 10下64位版本:

scipy-0.19.1-cp36-cp36m-win_amd64.whl

下载这些文件是通过网站:http://www.lfd.uci.edu/~gohlke/pythonlibs/  ,它是提供WINDOWS的版本。

接着下来,就是安装gym模块:

D:\AI\sample\tensorforce>pip install gym

它的网站连接是https://pypi.python.org/pypi/gym/0.2.0

最后,就是安装tesorforce库:

git clone git@github.com:reinforceio/tensorforce.git
cd tensorforce
pip install -e .

这样就可以运行强化学习的例子:

python examples/openai_gym.py CartPole-v0 -a TRPOAgent -c examples/configs/trpo_cartpole.json -n examples/configs/trpo_cartpole_network.json

结果输出如下:

[2017-07-24 11:05:28,928] Making new env: CartPole-v0
2017-07-24 11:05:30.056045: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.056185: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.057611: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.059002: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.059684: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.060401: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.061129: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.061744: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.406967: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 1060 6GB
major: 6 minor: 1 memoryClockRate (GHz) 1.7845
pciBusID 0000:01:00.0
Total memory: 6.00GiB
Free memory: 4.99GiB
2017-07-24 11:05:30.407097: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:961] DMA: 0
2017-07-24 11:05:30.408846: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:971] 0:   Y
2017-07-24 11:05:30.409518: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0)
[2017-07-24 11:05:31,317] Starting TRPOAgent for Environment 'OpenAIGym(CartPole-v0)'
[2017-07-24 11:05:34,728] Finished episode 50 after 16 timesteps
[2017-07-24 11:05:34,728] Episode reward: 16.0
[2017-07-24 11:05:34,729] Average of last 500 rewards: 2.6
[2017-07-24 11:05:34,730] Average of last 100 rewards: 13.0
[2017-07-24 11:05:39,224] Finished episode 100 after 81 timesteps
[2017-07-24 11:05:39,224] Episode reward: 81.0
[2017-07-24 11:05:39,226] Average of last 500 rewards: 6.49
[2017-07-24 11:05:39,226] Average of last 100 rewards: 32.45
[2017-07-24 11:05:47,185] Finished episode 150 after 177 timesteps
[2017-07-24 11:05:47,185] Episode reward: 177.0
[2017-07-24 11:05:47,186] Average of last 500 rewards: 13.364
[2017-07-24 11:05:47,187] Average of last 100 rewards: 53.82
[2017-07-24 11:06:00,082] Finished episode 200 after 136 timesteps
[2017-07-24 11:06:00,083] Episode reward: 136.0
[2017-07-24 11:06:00,084] Average of last 500 rewards: 24.414
[2017-07-24 11:06:00,085] Average of last 100 rewards: 89.62
[2017-07-24 11:06:18,207] Finished episode 250 after 144 timesteps
[2017-07-24 11:06:18,208] Episode reward: 144.0
[2017-07-24 11:06:18,209] Average of last 500 rewards: 40.096
[2017-07-24 11:06:18,210] Average of last 100 rewards: 133.66
[2017-07-24 11:06:35,440] Finished episode 300 after 143 timesteps
[2017-07-24 11:06:35,440] Episode reward: 143.0
[2017-07-24 11:06:35,442] Average of last 500 rewards: 55.014
[2017-07-24 11:06:35,443] Average of last 100 rewards: 153.0
[2017-07-24 11:06:52,119] Finished episode 350 after 169 timesteps
[2017-07-24 11:06:52,119] Episode reward: 169.0
[2017-07-24 11:06:52,120] Average of last 500 rewards: 69.586
[2017-07-24 11:06:52,121] Average of last 100 rewards: 147.45
[2017-07-24 11:07:05,441] Finished episode 400 after 73 timesteps
[2017-07-24 11:07:05,441] Episode reward: 73.0
[2017-07-24 11:07:05,441] Average of last 500 rewards: 81.13
[2017-07-24 11:07:05,441] Average of last 100 rewards: 130.58
[2017-07-24 11:07:15,715] Finished episode 450 after 123 timesteps
[2017-07-24 11:07:15,715] Episode reward: 123.0
[2017-07-24 11:07:15,716] Average of last 500 rewards: 89.798
[2017-07-24 11:07:15,716] Average of last 100 rewards: 101.06
[2017-07-24 11:07:23,885] Finished episode 500 after 51 timesteps
[2017-07-24 11:07:23,885] Episode reward: 51.0
[2017-07-24 11:07:23,886] Average of last 500 rewards: 96.714
[2017-07-24 11:07:23,886] Average of last 100 rewards: 77.92
[2017-07-24 11:07:32,642] Finished episode 550 after 62 timesteps
[2017-07-24 11:07:32,642] Episode reward: 62.0
[2017-07-24 11:07:32,643] Average of last 500 rewards: 101.236
[2017-07-24 11:07:32,643] Average of last 100 rewards: 70.19
[2017-07-24 11:07:42,235] Finished episode 600 after 53 timesteps
[2017-07-24 11:07:42,235] Episode reward: 53.0
[2017-07-24 11:07:42,236] Average of last 500 rewards: 104.336
[2017-07-24 11:07:42,237] Average of last 100 rewards: 70.56
[2017-07-24 11:07:52,850] Finished episode 650 after 72 timesteps
[2017-07-24 11:07:52,851] Episode reward: 72.0
[2017-07-24 11:07:52,851] Average of last 500 rewards: 105.88
[2017-07-24 11:07:52,852] Average of last 100 rewards: 77.04
[2017-07-24 11:08:02,817] Finished episode 700 after 80 timesteps
[2017-07-24 11:08:02,817] Episode reward: 80.0
[2017-07-24 11:08:02,818] Average of last 500 rewards: 102.986
[2017-07-24 11:08:02,818] Average of last 100 rewards: 82.87
[2017-07-24 11:08:13,726] Finished episode 750 after 77 timesteps
[2017-07-24 11:08:13,726] Episode reward: 77.0
[2017-07-24 11:08:13,727] Average of last 500 rewards: 96.038
[2017-07-24 11:08:13,727] Average of last 100 rewards: 84.45
[2017-07-24 11:08:25,828] Finished episode 800 after 59 timesteps
[2017-07-24 11:08:25,828] Episode reward: 59.0
[2017-07-24 11:08:25,829] Average of last 500 rewards: 90.658
[2017-07-24 11:08:25,829] Average of last 100 rewards: 91.36
[2017-07-24 11:08:42,711] Finished episode 850 after 177 timesteps
[2017-07-24 11:08:42,711] Episode reward: 177.0
[2017-07-24 11:08:42,712] Average of last 500 rewards: 90.34
[2017-07-24 11:08:42,712] Average of last 100 rewards: 118.96
[2017-07-24 11:09:03,141] Finished episode 900 after 167 timesteps
[2017-07-24 11:09:03,141] Episode reward: 167.0
[2017-07-24 11:09:03,142] Average of last 500 rewards: 95.556
[2017-07-24 11:09:03,142] Average of last 100 rewards: 155.07
[2017-07-24 11:09:27,219] Finished episode 950 after 200 timesteps
[2017-07-24 11:09:27,219] Episode reward: 200.0
[2017-07-24 11:09:27,220] Average of last 500 rewards: 105.26
[2017-07-24 11:09:27,220] Average of last 100 rewards: 175.66
[2017-07-24 11:09:53,139] Finished episode 1000 after 200 timesteps
[2017-07-24 11:09:53,139] Episode reward: 200.0
[2017-07-24 11:09:53,141] Average of last 500 rewards: 118.344
[2017-07-24 11:09:53,141] Average of last 100 rewards: 191.86
[2017-07-24 11:10:19,722] Finished episode 1050 after 200 timesteps
[2017-07-24 11:10:19,722] Episode reward: 200.0
[2017-07-24 11:10:19,723] Average of last 500 rewards: 131.222
[2017-07-24 11:10:19,724] Average of last 100 rewards: 200.0
[2017-07-24 11:10:45,962] Finished episode 1100 after 200 timesteps
[2017-07-24 11:10:45,963] Episode reward: 200.0
[2017-07-24 11:10:45,963] Average of last 500 rewards: 144.198
[2017-07-24 11:10:45,963] Average of last 100 rewards: 199.83

1. RPG游戏从入门到精通


2. WiX安装工具的使用

3. 俄罗斯方块游戏开发
http://edu.csdn.net/course/detail/51104. boost库入门基础
http://edu.csdn.net/course/detail/50295.Arduino入门基础
http://edu.csdn.net/course/detail/49316.Unity5.x游戏基础入门
http://edu.csdn.net/course/detail/48107. TensorFlow API攻略
http://edu.csdn.net/course/detail/44958. TensorFlow入门基本教程
http://edu.csdn.net/course/detail/43699. C++标准模板库从入门到精通 
http://edu.csdn.net/course/detail/332410.跟老菜鸟学C++
http://edu.csdn.net/course/detail/290111. 跟老菜鸟学python
http://edu.csdn.net/course/detail/259212. 在VC2015里学会使用tinyxml库
http://edu.csdn.net/course/detail/259013. 在Windows下SVN的版本管理与实战 
http://edu.csdn.net/course/detail/257914.Visual Studio 2015开发C++程序的基本使用 
http://edu.csdn.net/course/detail/257015.在VC2015里使用protobuf协议
http://edu.csdn.net/course/detail/258216.在VC2015里学会使用MySQL数据库
http://edu.csdn.net/course/detail/2672

实现DQN算法玩CartPole的更多相关文章

  1. 教你从头到尾利用DQN自动玩flappy bird(全程命令提示,GPU+CPU版)【转】

    转自:http://blog.csdn.net/v_JULY_v/article/details/52810219?locationNum=3&fps=1 目录(?)[-] 教你从头到尾利用D ...

  2. DQN算法

    DQN算法:基础入门看看 # -*- coding: utf-8 -*- import random import gym import numpy as np from collections im ...

  3. 【强化学习】DQN 算法改进

    DQN 算法改进 (一)Dueling DQN Dueling DQN 是一种基于 DQN 的改进算法.主要突破点:利用模型结构将值函数表示成更加细致的形式,这使得模型能够拥有更好的表现.下面给出公式 ...

  4. DQN算法原理详解

    一. 概述 强化学习算法可以分为三大类:value based, policy based 和 actor critic. 常见的是以DQN为代表的value based算法,这种算法中只有一个值函数 ...

  5. TensorFlow利用A3C算法训练智能体玩CartPole游戏

    本教程讲解如何使用深度强化学习训练一个可以在 CartPole 游戏中获胜的模型.研究人员使用 tf.keras.OpenAI 训练了一个使用「异步优势动作评价」(Asynchronous Advan ...

  6. DRL 教程 | 如何保持运动小车上的旗杆屹立不倒?TensorFlow利用A3C算法训练智能体玩CartPole游戏

    本教程讲解如何使用深度强化学习训练一个可以在 CartPole 游戏中获胜的模型.研究人员使用 tf.keras.OpenAI 训练了一个使用「异步优势动作评价」(Asynchronous Advan ...

  7. 【转】【强化学习】Deep Q Network(DQN)算法详解

    原文地址:https://blog.csdn.net/qq_30615903/article/details/80744083 DQN(Deep Q-Learning)是将深度学习deeplearni ...

  8. 强化学习算法DQN

    1 DQN的引入 由于q_learning算法是一直更新一张q_table,在场景复杂的情况下,q_table就会大到内存处理的极限,而且在当时深度学习的火热,有人就会想到能不能将从深度学习中借鉴方法 ...

  9. [DQN] What is Deep Reinforcement Learning

    已经成为DL中专门的一派,高大上的样子 Intro: MIT 6.S191 Lecture 6: Deep Reinforcement Learning Course: CS 294: Deep Re ...

随机推荐

  1. 20145331 实验一 "Java开发环境的熟悉"

    20145331 实验一 "Java开发环境的熟悉" 实验内容 使用JDK和IDE编译.运行简单的Java程序.题目: 实现四则运算,并进行测试. 编写代码 1.首先第一步就是要输 ...

  2. Linux读书笔记1/2章

    linux的内核设计: 第一章 1.1Linux历史: 历经时间的考验,今天Unix已经发展成一个支持抢占式多任务.多线程.虚拟内存.换页.动态链接.TCP/Ip网络的现代化操作系统. 1.2追寻Li ...

  3. Redis中RedisTemplate和Redisson管道的使用

    当对Redis进行高频次的命令发送时,由于网络IO的原因,会耗去大量的时间.所以Redis提供了管道技术,就是将命令一次性批量的发送给Redis,从而减少IO. 一.Jedis对redis的管道进行操 ...

  4. Django安装及创建工程

    Django MTV模型介绍 Django的MTV分别代表: Model(模型):负责业务对象与数据库的对象(ORM) Template(模版):负责如何把页面展示给用户 View(视图):负责业务逻 ...

  5. Restful Api CRUD 标准示例 (Swagger2+validator)

    为什么要写这篇贴? 要写一个最简单的CRUD 符合 Restful Api    规范的  一个Controller, 想百度搜索一下 直接复制拷贝 简单修改一下 方法内代码. 然而, 搜索结果让我无 ...

  6. JAVA8 HashMap 源码阅读

    序 阅读java源码可能是每一个java程序员的必修课,只有知其所以然,才能更好的使用java,写出更优美的程序,阅读java源码也为我们后面阅读java框架的源码打下了基础.阅读源代码其实就像再看一 ...

  7. GTS--阿里巴巴分布式事务全新解决方案

    现代IT应用中,服务化SOA作为主流的技术架构被广泛应用到各种信息系统.原来一个系统被分拆成若干个服务的集合,产生了跨服务调用的分布式事务问题.随着Dubbo.SpringCloud等微服务框架的流行 ...

  8. Android 进行解析并显示服务端返回的数据

    例子说明:用户通过访问web资源的最新电影资讯,服务器端生成XML或JSON格式数据,返回Android客户端进行显示. 此案例开发需要两个方面 WEB开发和Android开发. 一.web开发相对比 ...

  9. 永久以管理员身份运行cmd

    系统:win7 1,下图输入 cmd,找到cmd 2,发送到桌面快捷方式 3,在桌面上的cmd,右键,属性 点高级,进入后,勾上 管理员.

  10. poj2187凸包最远点对

    暴力过了 #include<map> #include<set> #include<cmath> #include<queue> #include< ...