Playing FPS games with deep reinforcement learning

博文转自：https://blog.acolyer.org/2016/11/23/playing-fps-games-with-deep-reinforcement-learning/

When I wrote up ‘Asynchronous methods for deep learning’ last month, I made a throwaway remark that after Go the next challenge for deep learning systems would be to win an esports competition against the best human teams. Can you imagine the theatre!

Source: ‘League of Legends’ video game championship is like the World Cup, Super Bowl combined – Fortune:http://fortune.com/2015/10/29/league-of-legends-video-game-championship/

Since those are team competitions, it would need to be a team of collaborating software agents playing against human teams. Which would make for some very cool AI technology.

Today’s paper isn’t quite at that level yet, but it does show that progress is already being made on playing first-person shooter (FPS) games in 3D environments.

In this paper, we tackle the task of playing an FPS game in a 3D environment. This task is much more challenging than playing most Atari games as it involves a wide variety of skills, such as navigating through a map, collecting items, recognizing and fighting enemies, etc. Furthermore, states are partially observable, and the agent navigates a 3D environment in a first-person perspective, which makes the task more suitable for real-world robotics applications.

Lample and Chaplot develop an AI agent for playing death matches. I’m not really an FPS kind of person, and had no idea what a deathmatch was. It turns out to be a scenario in which the objective is to maximize the number of kills by a player/agent. Nice. The agent uses separate neural networks for navigation tasks and for action tasks. Experimentation is done using the VizDoom framework for developing AI bots that play Doom. It turns out there’s even been a recent VizDoom competition, with the ‘full deathmatch’ category won by a team from Intel Labs. Here’s a video of their entry in action:

Deep Recurrent Q-Networks

The core of the system is built on a DRQN architecture (Deep Recurrent Q-Network). A regular Deep-Q network, such as that used to play Atari games, receives a full (or very close to full) observation of the environment at each step. But in a game like DOOM where agent’s field of view is limited to 90 degrees centred around its position it only receives apartial observation.

In 2015 Hausknecht and Stone introduced Deep Recurrent Q-Networks which include an extra parameter at each step representing the hidden state of the agent. This can be accomplished by layering a recurrent neural network such as an LSTM on top of a normal DNQ network.

Two models

In a deathmatch, you need to explore the map to collect items and find enemies, and then you need to fight enemies when you find them. Lample and Chaplot use two networks, one for navigation, and one for action. The current phase of the game (and hence which model to use at any given time) is determined by predicting whether or not an enemy is visible in the current frame (action model if so, navigation model otherwise).

There are various advantages of splitting the task into two phases and training a different network for each phase. First, this makes the architecture modular and allows different models to be trained and tested independently… Furthermore, the navigation phase only requires three actions (move forward, turn left, and turn right), which dramatically reduces the number of state-action pairs required to learn the Q-function and makes training much faster. More importantly, using two networks also mitigates ‘camper’ behaviour, i.e. the tendency to stay in one area of the map and wait for enemies, which was exhibited by the agent when we tried to train a single DQN or DRQN for the deathmatch task.

Training

When trained using a vanilla DRQN approach, agents tended either to fire at will, hoping for enemies to wander into their crossfire, or not fire at all when given a penalty for using ammunition. This is because the agent could not effectively learn to detect enemies. To address this, the team gave the agent additional information that it could use during training (but not during actual gameplay or testing). At each training step, in addition to receiving a video frame, the agent received a boolean value for each entity (enemy, health pack, weapon, ammo and so on) indicating whether or not it appeared in the frame.

We modified the DRQN architecture to incorporate this information and to make it sensitive to game features. In the initial model, the output of the CNN is given to a LSTM that predicts a score for each action based on the current frame and its hidden state. We added two fully-connected layers of size 512, and k-connected to the output of the CNN, where k is the number of game features we want to detect… Although a lot of game information was available, we only used an indicator about the presence of enemies on the current frame.

Jointly training the DRQN model and the game feature detection allows the kernels of the convolutional layers to capture relevant information about the game with only a few hours of training needed to reach an optimal enemy detection accuracy of 90%.

The reward function for the action network includes:

positive rewards for kills
negative rewards for suicides
positive rewards for picking up objects
negative rewards for losing health
negative rewards for shooting or losing ammo

The navigation network was simply given a positive reward for picking up an item, and a negative reward for walking on lava.

A frame-skip of 4 turned out to be best overall balance between training speed and performance (the agent receives a screen input every 4+1 frames, and the action decided by the network is repeated over all the skipped frames). During back-propagation, only action states with enough history to give a reasonable estimation are updated.

Fighting! (evaluation)

Evaluation is done using the delightful kill to death ratio (K/D) as the scoring metric. Table 2 below shows how well the agent performed both on known maps (limited deathmatch) and on unknown maps (full deathmatch).

You can watch the agent play in these videos.

Here’s how it stacks up against human opposition:

The authors conclude:

In this paper, we have presented a complete architecture for playing deathmatch scenarios in FPS games. We introduced a method to augment a DRQN model with high-level game information, and modularized our architecture to incorporate independent networks responsible for different phases of the game. These methods lead to dramatic improvements over the standard DRQN model when applied to complicated tasks like a deathmatch. We showed that the proposed model is able to outperform built-in bots as well as human players and demonstrated the generalizability of our model to unknown maps. Moreover, our methods are complementary to recent improvements in DQN, and could easily be combined with dueling architectures (Wang, de Freitas, and Lanctot 2015), and priorized replay (Schaul et al. 2015).

(转) Playing FPS games with deep reinforcement learning的更多相关文章

Playing FPS Games with Deep Reinforcement Learning
论文不同点: (1)用两套网络分别实现移动和射击. (2)使用LSTM来处理不完全信息. 疑问: (1)为什么对于射击使用RNN,对导航却没有使用RNN.一般来说,当我们看见视野里面有敌人的时候,我们 ...
(zhuan) Deep Reinforcement Learning Papers
Deep Reinforcement Learning Papers A list of recent papers regarding deep reinforcement learning. Th ...
【资料总结】| Deep Reinforcement Learning 深度强化学习
在机器学习中,我们经常会分类为有监督学习和无监督学习,但是尝尝会忽略一个重要的分支,强化学习.有监督学习和无监督学习非常好去区分,学习的目标,有无标签等都是区分标准.如果说监督学习的目标是预测,那么强 ...
(转) Deep Reinforcement Learning: Playing a Racing Game
Byte Tank Posts Archive Deep Reinforcement Learning: Playing a Racing Game OCT 6TH, 2016 Agent playi ...
论文笔记之：Playing Atari with Deep Reinforcement Learning
Playing Atari with Deep Reinforcement Learning <Computer Science>, 2013 Abstract: 本文提出了一种深度学习方 ...
Deep Reinforcement Learning from Self-Play in Imperfect-Information Games
Heinrich, Johannes, and David Silver. "Deep reinforcement learning from self-play in imperfect- ...
Paper Reading 1 - Playing Atari with Deep Reinforcement Learning
来源:NIPS 2013 作者:DeepMind 理解基础: 增强学习基本知识深度学习特别是卷积神经网络的基本知识创新点:第一个将深度学习模型与增强学习结合在一起从而成功地直接从高维的输入学习控 ...
(转) Deep Reinforcement Learning: Pong from Pixels
Andrej Karpathy blog About Hacker's guide to Neural Networks Deep Reinforcement Learning: Pong from ...
论文笔记之：Asynchronous Methods for Deep Reinforcement Learning
Asynchronous Methods for Deep Reinforcement Learning ICML 2016 深度强化学习最近被人发现貌似不太稳定,有人提出很多改善的方法,这些方法有很 ...

随机推荐

软件测试第四周--关于int.parse()的类型转换问题
先来归纳一下我们用过的所有类型转换方法: 1. 隐式类型转换,即使用(int) 直接进行强制类型转换.这种方法的优点是简单粗暴,直接指定转换类型,没有任何保护措施,所以也很容易抛出异常导致程序崩溃.当 ...
计算机●编程语言●JAVA
<Java编程思想(第4版)> 2016-04-27 12:38 ☆ 对JAVA知识面比较全的介绍.但也只是介绍,没有进入主题好好分析透彻.所以适合有几年工作经验的java码农(虽然 ...
wchar_t内置还是别名？小问题一则（升级公司以前代码遇到的问题）
问题: 原来的2008工程用2010编译后,运行程序出现无法定位程序输入点 *@basic_string@_WU@*和*@basic_string@G@* 解决: 关闭“语言选项”中“将WChar_t ...
python selenium与自动化
大学是学习过java,但是工作中没用,忘完了,而且哪怕以后有了机会,就是很不愿意去学这个语言,开始喜欢上了c#,但是随着学的升入,感觉.net太庞大了,要学习那么多,总感觉我学这个要做什么,感觉要做的 ...
谈谈黑客攻防技术的成长规律（aullik5）
黑莓末路昨晚听FM里谈到了RIM这家公司,有分析师认为它需要很悲催的裁员90%,才能保证活下去.这是一个意料之中,但又有点兔死狐悲的消息.可能在不久的将来,RIM这家公司就会走到尽头,或被收购,或申 ...
Android深度探索--HAL与驱动开发----第七章读书笔记
首先创建led驱动的设备文件,可以使用cdev_init,register_chrdev_region,cdev_add等建立主设备号的设备文件.步骤如下: 1使用cdev_init初始化cdev 2 ...
13个JavaScript图表（JS图表）图形绘制插件【转】
现在网络上又有越来越多的免费的(JS 图表)JavaScript图表图形绘制插件.我之前给一家网站做过复杂的图形,我们用的是 highchart.在那段时间,没有很多可供选择的插件.但现在不同了,很容 ...
Ubuntu 16.04 64位安装insight 6.8
1. apt-get install insight已经不管用. 2. 编译源码死都有问题. 3. 拜拜,用KDBG.
BZOJ3669 （动态树）
Problem 魔法森林 (NOI2014) 题目大意给n个点,m条边的无向图,每条边有两个权值a,b. 求一条从1-->n的路径,使得这条路径上max(a)+max(b)最小.输出最小值即可 ...
android 返回键操作
cocos2dx项目移植到android平台上对于 android手机返回键,主菜单键等键的相关操作,本篇详细对返回键做个简单的介绍说明, 不足不对之处,请同猿们指出. 首先在主activity下,即 ...

(转) Playing FPS games with deep reinforcement learning