Deep Reinforcement Learning from Self-Play in Imperfect-Information Games

Heinrich, Johannes, and David Silver. "Deep reinforcement learning from self-play in imperfect-information games." arXiv preprint arXiv:1603.01121(2016).

这篇文章提出了基于深度学习的自我博弈达到纳什均衡的训练方法。这个方法避免了人为的先验知识的误导，采用了端到端的训练方式，达到了人类专家级水平。

方法：

通过自我博弈产生训练数据，用来训练Qlearning网络和有监督学习网络。然后对这两个网络做ensemble

Deep Reinforcement Learning from Self-Play in Imperfect-Information Games的更多相关文章

(转) Playing FPS games with deep reinforcement learning
Playing FPS games with deep reinforcement learning 博文转自:https://blog.acolyer.org/2016/11/23/playing- ...
(zhuan) Deep Reinforcement Learning Papers
Deep Reinforcement Learning Papers A list of recent papers regarding deep reinforcement learning. Th ...
Learning Roadmap of Deep Reinforcement Learning
1. 知乎上关于DQN入门的系列文章 1.1 DQN 从入门到放弃 DQN 从入门到放弃1 DQN与增强学习 DQN 从入门到放弃2 增强学习与MDP DQN 从入门到放弃3 价值函数与Bellman ...
(转) Deep Reinforcement Learning: Playing a Racing Game
Byte Tank Posts Archive Deep Reinforcement Learning: Playing a Racing Game OCT 6TH, 2016 Agent playi ...
论文笔记之：Dueling Network Architectures for Deep Reinforcement Learning
Dueling Network Architectures for Deep Reinforcement Learning ICML 2016 Best Paper 摘要:本文的贡献点主要是在 DQN ...
getting started with building a ROS simulation platform for Deep Reinforcement Learning
Apparently, this ongoing work is to make a preparation for futural research on Deep Reinforcement Le ...
(转) Deep Reinforcement Learning: Pong from Pixels
Andrej Karpathy blog About Hacker's guide to Neural Networks Deep Reinforcement Learning: Pong from ...
论文笔记之：Asynchronous Methods for Deep Reinforcement Learning
Asynchronous Methods for Deep Reinforcement Learning ICML 2016 深度强化学习最近被人发现貌似不太稳定,有人提出很多改善的方法,这些方法有很 ...
论文笔记之：Deep Reinforcement Learning with Double Q-learning
Deep Reinforcement Learning with Double Q-learning Google DeepMind Abstract 主流的 Q-learning 算法过高的估计在特 ...
论文笔记之：Playing Atari with Deep Reinforcement Learning
Playing Atari with Deep Reinforcement Learning <Computer Science>, 2013 Abstract: 本文提出了一种深度学习方 ...

随机推荐

C语言笔记本
在此记录一些常见的C语言错误,可以当作学习C语言的笔记,需要的时候可以回过头看看. 1.关于“++” #include int main() { int a,b,cd; a=10; b=a++; c= ...
Android事件分发机制源代码分析
小小感慨一下,做android有一段时间了,一直以来都是习惯整理笔记存到有道笔记上,没有写博客的习惯. 以后逐步分类整理出来,也算"复习"一遍了 - _ - . android的事 ...
Java和C++通过Socket通信中文乱码的解决
理想的开发状态是我开始就是C开发,一直是C的开发,现在还是C的开发,若干年后,幸运的话,我可以成为C语言的高手或者专家…… 更实际的情况是我开始是C开发,后来变成了JAVA开发,然后又做起了VC++的 ...
【转载】BasicDataSource配置说明
commons DBCP 配置参数简要说明在配置时,主要难以理解的主要有:removeAbandoned .logAbandoned.removeAbandonedTimeout.maxWait这四 ...
redis实践：用户注册登录功能
本节将使用PHP和Redis实现用户注册登录功能,下面分模块来介绍具体实现方法. 1．注册需求描述:用户注册时需要提交邮箱.登录密码和昵称.其中邮箱是用户的唯一标识,每个用户的邮箱不能重复,但允许用 ...
设置PDF文件默认缩放比例
android.content.res.TypedArray-深入理解android自定义属性(AttributeSet,TypedArray)
属性自定义属性,首先要定义出来属性,我们新建一个attrs.xml: <?xml version="1.0" encoding="utf-8"?> ...
angular学习笔记(十)-src和href处理
本篇主要介绍angular中图片的src和链接的href的处理: 用到了以下两个属性: ng-src: 绑定了数据的路径表达式 ng-href: 绑定了数据的路径表达式例如: <!DOCTYP ...
基于python的直播间接口测试实战详解结合项目
基于python的直播间接口测试详解一.基本用例内容描述以设置白名单 /advisor/setUserWhiteList.do接口为例,该方法为POST at first,先要导入一些常用到的模块 ...
JSP/Servlet中文乱码处理总结
下面的任何一条缺一不可,注意,我之所以全部都用的XXX,意思就是这几个最好全部都一致! 1.HTML中要用meta content="text/html; charset=XXX" ...

Deep Reinforcement Learning from Self-Play in Imperfect-Information Games

Deep Reinforcement Learning from Self-Play in Imperfect-Information Games的更多相关文章

随机推荐

热门专题