https://medium.com/aureliantactics/integrating-new-games-into-retro-gym-12b237d3ed75

OpenAI’s Retro Gym is a great tool for running Reinforcement Learning (RL) algorithms on classic video game systems like Super Nintendo, Genesis, Game Boy, Atari, and more. The latest version comes configured to train your RL agent on dozens of games (roms not included). What if you want to add your own game? I’ll walk through the steps to do so in this post.

Pick a Game

To integrate a game into Retro Gym you’ll need to find a rom of the game. For the Retro Contest you had to buy the Sonic the Hedgehog roms from Steam. There are more classic Sega roms on Steam. When you get the rom, check that it has the correct extension. The Super Nintendo (SNES) game I chose came as both a .smc and a .sfc rom. Retro Gym wants .sfc for SNES. I chose the puzzle game Bust A Move (aka Puzzle Bobble) for my rom.

Install Retro Gym

I did my installation on a paperspace.com ML-in-a-box instance (like in this post). TLDR:

git clone --recursive https://github.com/openai/retro.git gym-retro
cd gym-retro
pip3 install -e .

Install the Integration UI

The Integration UI lets you play the game in order to create save states and find key areas of the game’s memory. Using these areas of the game’s memory you can come up with a reward function, a done condition, and other useful data for training your RL agent. Install steps for Linux from Retro Gym’s README.md:

sudo apt-get install capnproto libcapnp-dev libqt5opengl5-dev qtbase5-dev
cmake . -DBUILD_UI=ON -UPYLIB_DIRECTORY
make -j$(grep -c ^processor /proc/cpuinfo)
./gym-retro-integration

If you have some install problems, the issues section of the repo has some pointers.

Configure the Rom

The ./gym-retro-integration line launches the Integration UI. Here your goal is to:

  • Create states for the Retro Gym environment to load
  • Inspect the rom’s memory for points of interest
  • Find the address and types of those points of interest

The official guide and this issue thread (in particular MaxStrange) offer some pointers.

Creating states is simple enough. From the official guide:

  1. Open Gym Retro Integration after setting up the UI.
  2. Load a new game — Command-Shift-O
  3. Select the ROM of the game you’d like to integrate in the menu.
  4. Name the game.
  5. The game will open. To see what keys correspond to what controls in-game, go to Window > Control.
  6. Using the available controls, select a level, option, mode, character, etc. and take note of these options.
  7. When you are finally at the first playable moment of the game, pause the game (in the integrator, not within the actual game) (Command-P), and save the state (Command-S). This moment can be hard to find, and you might have to go back through and restart the game (Command-R) to find and save that exact state.
  8. Save the state — include the options you chose in the previous menus — e.g. SailorMoon.QueenBerylsCastle.Easy.Level1.state
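
The naming convention in step 8 can be sketched as a small helper. The function name `state_filename` is my own and not part of Retro Gym; it only joins the game name and the chosen options with dots, dropping characters that are awkward in file names.

```python
def state_filename(game, *options):
    """Join the game name and the chosen options into a state file name."""
    parts = [game, *options]
    # Keep only letters and digits so spaces and apostrophes do not end up in the name.
    cleaned = ["".join(ch for ch in part if ch.isalnum()) for part in parts]
    return ".".join(cleaned) + ".state"


print(state_filename("SailorMoon", "Queen Beryl's Castle", "Easy", "Level1"))
```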

Examining the Game’s Memory

Inspecting the rom for points of interest is a trial-and-error process. I recommend reading through the official guide and the issue thread I linked to for some tips to speed this up. I used a tool called BizHawk to do this (requires Windows). BizHawk has some convenient tools like a RAM Search that lets you search through RAM values and add them to a RAM Watch list. Retro Gym has a similar tool, which I tested and which also works. I was already familiar with BizHawk, so I mainly used that. There are many guides to using BizHawk, and it has some convenient frame-by-frame playthrough methods.

In Bust A Move I was interested in finding the memory addresses for three specific things:

  • Bubbles popped (alternative for the reward function)
  • Game over condition
  • Score (alternative for the reward function)

Finding the bubbles value in memory was simple. I played the game for a bit in the game mode where the bubble count is displayed at the top of the screen. Then I searched the RAM for that value. In BizHawk this is done with the RAM Search tool. I found memory addresses with the bubbles value (i.e. if 16 bubbles had been popped, I searched for all memory addresses with a value of 16). I then added them to the RAM Watch list. I continued to play the game and watched which memory address followed the bubble count as it increased. This can also be done in Retro Gym. In Retro Gym, I searched for the bubbles value in the address list and then created a variable which I could monitor to confirm that this value was correct.
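
The search-and-narrow idea can be sketched in a few lines of Python. This is not how BizHawk or Retro Gym implement their search; the RAM dumps below are tiny made-up byte strings and `narrow_candidates` is a hypothetical helper, used only to show how each new observation shrinks the candidate list.

```python
def narrow_candidates(snapshots):
    """Return addresses whose byte equals the on-screen value in every snapshot.

    snapshots: list of (ram_bytes, displayed_value) pairs taken at
    different moments of play.
    """
    candidates = None
    for ram, shown in snapshots:
        matches = {addr for addr, byte in enumerate(ram) if byte == shown}
        # Keep only addresses that also matched in all earlier snapshots.
        candidates = matches if candidates is None else candidates & matches
    return sorted(candidates or [])


# Synthetic 8-byte "RAM" dumps, not real Bust A Move memory.
snapshots = [
    (bytes([16, 0, 16, 3, 1, 16, 7, 0]), 16),  # 16 bubbles popped
    (bytes([16, 0, 19, 3, 1, 19, 7, 0]), 19),  # 19 bubbles popped
    (bytes([16, 0, 23, 3, 1, 20, 7, 0]), 23),  # 23 bubbles popped
]
print([hex(addr) for addr in narrow_candidates(snapshots)])
```

Three addresses match the first observation, two survive the second, and only one follows the counter through all three.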

I found the game over condition by looking for all values that were 1 during the game and 0 when the round was lost. I came up with 100 values that did that and tested that the values were consistent in different game modes and playthroughs. I picked one of the hundred values to be my game over condition.
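
A sketch of that filter, again on made-up dumps: keep every address that reads 1 in all dumps taken during play and 0 in all dumps taken after losing. The helper name `find_flags` and the data are assumptions for illustration.

```python
def find_flags(playing_dumps, lost_dumps, playing_value=1, lost_value=0):
    """Addresses equal to playing_value in every playing dump and lost_value in every lost dump."""
    size = len(playing_dumps[0])
    return [
        addr
        for addr in range(size)
        if all(dump[addr] == playing_value for dump in playing_dumps)
        and all(dump[addr] == lost_value for dump in lost_dumps)
    ]


# Synthetic 6-byte dumps, invented for this example.
playing = [bytes([1, 1, 0, 1, 5, 1]), bytes([1, 0, 0, 1, 6, 1])]
lost = [bytes([0, 0, 0, 1, 7, 0]), bytes([0, 1, 0, 1, 7, 0])]
print(find_flags(playing, lost))
```

More than one address usually passes, which matches my experience of ending up with many candidates and having to test them across modes and playthroughs.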

Score was trickier. The hints in the official guide helped me figure out what was going on, in particular that one value can be broken up over multiple addresses and that those addresses are often located near each other. Score is not stored in one location but as combinations of powers of 10 spread over several locations. The 10s are stored in one address, the hundreds and thousands in another, and the ten-thousands and hundred-thousands in a third. While the 10s are stored simply as 0 to 9 (i.e. 2 for 20, 9 for 90 etc.), the 100s and 1000s share one address according to the following formula:

(number of 100s) + 16*(number of 1000s) = value stored in address
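
Here is a Python sketch of decoding a score stored this way. The argument names follow the variable names used in data.json further down, and the ten-thousands and hundred-thousands are assumed to be packed the same way as the hundreds and thousands, as in script.lua at the end of the post.

```python
def decode_score(score_jyuu, score_hyaku, score_man):
    """Rebuild the displayed score from the three stored values."""
    tens = score_jyuu * 10
    hundreds = (score_hyaku % 16) * 100            # low half of the byte
    thousands = (score_hyaku // 16) * 1000         # high half of the byte
    ten_thousands = (score_man % 16) * 10000
    hundred_thousands = (score_man // 16) * 100000
    return tens + hundreds + thousands + ten_thousands + hundred_thousands


# 4 hundreds and 3 thousands are stored together as 4 + 16*3 = 52,
# 2 ten-thousands and 1 hundred-thousand as 2 + 16*1 = 18.
print(decode_score(5, 52, 18))  # 123450
```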

When you have your memory addresses you’ll need to convert them from hex to decimal and add the system-specific rombase number. The rombase number is found in the .json files located in retro/cores. For SNES that meant converting the bubbles hex address (000A5C) to decimal (2652), adding the rombase number (8257536), and using the result (8260188) in a game-specific data.json file (see below).
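
The conversion is easy to script. A minimal sketch, assuming the SNES rombase quoted above; the function name `data_json_address` is my own.

```python
SNES_ROMBASE = 8257536  # rombase value quoted in the text for SNES


def data_json_address(hex_address, rombase=SNES_ROMBASE):
    """Convert a hex RAM address from the search tool into a data.json address."""
    return int(hex_address, 16) + rombase


print(data_json_address("000A5C"))  # 2652 + 8257536 = 8260188
```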

The system specific .json file also has the allowed types. You will need the type along with the address for the data.json file. See the official guide, the README.md file, the .json file located in retro/cores, and whatever tool you use to find the address for how to find the type.
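
To make type strings such as "|u1" and "<u4" less opaque, here is a simplified reader in Python. It assumes the three parts are byte order, format, and size in bytes, and it only handles unsigned ("u") and signed ("i") integers; check the official guide for the full list your system allows. The bytes are made up and `read_value` is not a Retro Gym function.

```python
def read_value(ram, offset, type_string):
    """Read one integer from ram using a type string such as '|u1' or '<u4'."""
    endian, kind, size = type_string[0], type_string[1], int(type_string[2:])
    if kind not in ("u", "i"):
        raise ValueError("this sketch only handles 'u' (unsigned) and 'i' (signed)")
    # '|' is used when byte order does not apply (single bytes); treat it as little-endian here.
    byteorder = "big" if endian == ">" else "little"
    raw = ram[offset:offset + size]
    return int.from_bytes(raw, byteorder, signed=(kind == "i"))


ram = bytes([0x10, 0x27, 0x00, 0x00, 0xFF])  # made-up bytes
print(read_value(ram, 0, "<u4"))  # bytes 10 27 00 00, little-endian
print(read_value(ram, 4, "|u1"))
print(read_value(ram, 4, "|i1"))
```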

Create Your Game Files

Each game in Retro Gym has the following files that you’ll want to create for your integrated game:

  • metadata.json: Tells Retro Gym the default state to load
  • data.json: File that tells Retro Gym what memory addresses to read
  • scenario.json: Creates the reward function and the done condition for your RL agent. Optionally, this file can link to a .lua script to create more advanced functions.
  • script.lua (optional): Helps create more advanced rewards and done functions.

Look at any existing game, like Sonic (for the Retro Contest setup) or Airstriker (a Genesis rom that comes with Retro Gym), to see examples. After creating these files I moved my BustAMove-Snes directory from retro/retro/data/contrib to retro/retro/data/stable in order to run the game. Let’s walk through creating the files.

metadata.json:

{
    "default_state": "BustAMove.Challengeplay0"
}

data.json:

{
    "info": {
        "gameover": {
            "address": 8294221,
            "type": "|u1"
        },
        "bubbles": {
            "address": 8260188,
            "type": "<u4"
        },
        "score_jyuu": {
            "address": 8259924,
            "type": "|u1"
        },
        "score_hyaku": {
            "address": 8259925,
            "type": "|u1"
        },
        "score_man": {
            "address": 8259928,
            "type": "<u4"
        }
    }
}
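
A quick way to double-check a data.json is to go the other way: subtract the rombase and compare the hex offsets with what your RAM watch shows. A small sketch, assuming the SNES rombase from earlier and using a shortened copy of the file inline; `ram_offsets` is a hypothetical helper.

```python
import json

SNES_ROMBASE = 8257536  # SNES rombase quoted earlier in the post

# Shortened copy of the data.json above, inlined so the example runs on its own.
data_json = """
{"info": {
  "gameover": {"address": 8294221, "type": "|u1"},
  "bubbles": {"address": 8260188, "type": "<u4"},
  "score_jyuu": {"address": 8259924, "type": "|u1"}
}}
"""


def ram_offsets(text, rombase=SNES_ROMBASE):
    """Map each variable name to its hex RAM offset (address minus rombase)."""
    info = json.loads(text)["info"]
    return {name: format(entry["address"] - rombase, "06X") for name, entry in info.items()}


print(ram_offsets(data_json))
```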

Those familiar with gym know that every time you call a gym environment.step() function an observation, reward, done, and info are returned. Whatever you put in your data.json file will be accessible from this info. Example:

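import retro

env = retro.make(game='BustAMove-Snes', state='BustAMove-Snes.Challengeplay0')
env.reset()
while True:
    # Take a random action; _info holds the variables defined in data.json
    _obs, _rew, done, _info = env.step(env.action_space.sample())
    print('I have popped {}'.format(_info['bubbles']))
    if done:
        # Stop once the done condition from scenario.json is met
        break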

scenario.json:

{
    "done": {
        "script": "lua:done_check"
    },
    "reward": {
        "script": "lua:correct_bubbles"
    },
    "scripts": [
        "script.lua"
    ]
}

This scenario.json directs to script.lua to do the calculations. Alternatively, Airstriker does the work in the scenario file and doesn’t use a lua script:

{
    "done": {
        "condition": "all",
        "variables": {
            "gameover": {
                "op": "equal",
                "reference": 1
            },
            "lives": {
                "op": "zero"
            }
        }
    },
    "reward": {
        "variables": {
            "score": {
                "reward": 1.0
            }
        }
    }
}
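
To show how I read the "done" block, here is a toy evaluator in Python. It is not Retro Gym’s implementation. It assumes each op is checked against the variable’s current value, that "all" means every check must pass, and that a missing "condition" means any single check is enough. It covers only the two ops used above.

```python
def is_done(done_spec, info):
    """Evaluate a scenario.json-style "done" block against an info dict."""
    checks = []
    for name, rule in done_spec["variables"].items():
        value = info[name]
        if rule["op"] == "equal":
            checks.append(value == rule["reference"])
        elif rule["op"] == "zero":
            checks.append(value == 0)
        else:
            raise ValueError("op not covered by this sketch: " + rule["op"])
    # Assumption: "all" requires every check, anything else requires at least one.
    combine = all if done_spec.get("condition", "any") == "all" else any
    return combine(checks)


done_spec = {
    "condition": "all",
    "variables": {
        "gameover": {"op": "equal", "reference": 1},
        "lives": {"op": "zero"},
    },
}
print(is_done(done_spec, {"gameover": 1, "lives": 0}))  # True
print(is_done(done_spec, {"gameover": 1, "lives": 2}))  # False
```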

script.lua:

-- Reward: number of bubbles popped since the last increase
previous_bubbles = 0
function correct_bubbles()
    if data.bubbles > previous_bubbles then
        local delta = data.bubbles - previous_bubbles
        previous_bubbles = data.bubbles
        return delta
    else
        return 0
    end
end

-- Done: gameover is 1 during play and 0 once the round is lost
function done_check()
    if data.gameover == 0 then
        return true
    end
    return false
end

-- Alternative reward: increase in score, rebuilt from the packed digits
previous_score = 0
function correct_score ()
    local current_score = 0
    local hundreds = (data.score_hyaku % 16)*100
    local thousands = (math.floor(data.score_hyaku/16))*1000
    local ten_thousands = (data.score_man % 16)*10000
    local hundred_thousands = (math.floor(data.score_man/16))*100000
    current_score = data.score_jyuu * 10 + hundreds + thousands + ten_thousands + hundred_thousands
    if current_score > previous_score then
        local delta = current_score - previous_score
        previous_score = current_score
        return delta
    else
        return 0
    end
end
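
If you want to try the reward logic without launching the emulator, the same delta rule is easy to mirror in Python. This is a sketch of the behaviour of correct_bubbles, not code that Retro Gym runs; the class name `DeltaReward` and the input sequence are made up.

```python
class DeltaReward:
    """Return only positive changes of a counter, like correct_bubbles in script.lua."""

    def __init__(self):
        self.previous = 0

    def __call__(self, current):
        if current > self.previous:
            delta = current - self.previous
            self.previous = current
            return delta
        return 0


reward = DeltaReward()
print([reward(value) for value in [0, 3, 3, 7, 0, 2]])  # [0, 3, 0, 4, 0, 0]
```

Note what happens at the end of the sequence: if the counter drops back to 0, no reward is paid until it climbs above the old maximum again, because the previous value is never reset.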

Feel free to ask any questions. I can also add some pictures if that would be helpful.
