How rival bots battled their way to poker supremacy
http://www.nature.com/news/how-rival-bots-battled-their-way-to-poker-supremacy-1.21580
Artificial-intelligence programs harness game-theory strategies and deep learning to defeat human professionals in two-player hold 'em.

Juice/Alamy
Top professional poker players have been beaten by AI bots at no-limit hold 'em.
A complex variant of poker is the latest game to be mastered by artificial intelligence (AI). And it has been conquered not once, but twice, by two rival bots developed by separate research teams.
Both algorithms play a ‘no-limit’ two-player version of Texas Hold 'Em. And each has in recent months hit a crucial AI milestone: beating human professional players.
The game first fell in December to DeepStack, developed by computer scientists at the University of Alberta in Edmonton, Canada, with collaborators from Charles University and the Czech Technical University in Prague. A month later, Libratus, developed by a team at Carnegie Mellon University (CMU) in Pittsburgh, Pennsylvania, achieved the feat.
Over the past decade, the groups have pushed each other to build ever-better bots, and the team behind DeepStack has now formally published details of its AI in Science [1]. But the two bots have yet to play each other.
Nature looks at how the two AIs stack up, what the accomplishments could mean for online casinos and what’s left for AI to conquer.
Why do AI researchers care about poker?
AIs have mastered several board games, including chess and the complex-strategy game Go. But poker has a key difference from board games that adds complexity: players must work out their strategies without being able to see all of the information about the game on the table. They must consider what cards their opponents might have and what the opponents might guess about their hand based on previous betting.
Games that have such ‘imperfect information’ mirror real-life problem-solving scenarios, such as auctions and financial negotiations, and poker has become an AI test bed for these situations.
Algorithms have already cracked simpler forms of poker: the Alberta team essentially solved a limited version of two-player hold ’em poker in 2015. The form played by DeepStack and Libratus is still a two-player game, but there are no limits on how much an individual player can bet or raise — which makes it considerably more complex for an AI to navigate.
How did the human-versus-AI games unfold?
Over 4 weeks beginning in November last year, DeepStack beat 10 of 11 professional players by a statistically significant margin, playing 3,000 hands against each.
Then, in January, Libratus beat four stronger professionals, who are considered specialists at this form of the game, over a total of around 120,000 hands. The computer ended up almost US$1.8 million ahead in virtual chips.
What are the mathematics behind the algorithms?
Game theory. Both AIs aim to find a strategy that is guaranteed not to lose, regardless of how an opponent plays. And because one-on-one poker is a zero-sum game — meaning that one player’s loss is always the opponent’s gain — game theory says that such a strategy always exists. Whereas a human player might exploit a weak opponent’s errors to win big, an AI with this strategy isn’t concerned by margins — it plays only to win. That means it also won't be thrown by surprising behaviour.
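The maximin idea behind that "guaranteed not to lose" strategy can be sketched in a few lines of Python. The payoff matrix below is invented for illustration and has nothing to do with real poker values; the point is only that the player picks the mix that maximises its worst-case expected payoff:

```python
# A minimal sketch of a maximin (security) strategy in a tiny
# zero-sum game. Payoffs are illustrative, not real poker values.

# Payoff matrix for the row player: A[i][j] is row's payoff when row
# plays action i and column plays j. Zero-sum: column receives -A[i][j].
A = [[ 1, -2],
     [-1,  2]]

def worst_case(p):
    """Row's expected payoff against the opponent's best response,
    when row plays action 0 with probability p."""
    vs_col0 = p * A[0][0] + (1 - p) * A[1][0]
    vs_col1 = p * A[0][1] + (1 - p) * A[1][1]
    return min(vs_col0, vs_col1)

# Grid-search the mixing probability; a fine grid suffices for a demo.
best_p = max((i / 1000 for i in range(1001)), key=worst_case)
print(best_p, worst_case(best_p))  # -> 0.5 0.0
```

For this matrix the maximin mix is 50:50 and guarantees a value of zero, which is exactly the "play only not to lose" behaviour described above: no opponent strategy, however surprising, can push the expected payoff below that guarantee.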
Previous poker-playing algorithms have generally tried to work out strategies ahead of time, computing massive ‘game trees’ that outline solutions for all the different ways that a game could unfold. But the number of possibilities is so huge — 10^160 — that mapping all of them is impossible. So researchers settled for solving fewer possibilities. In a game, an algorithm compares a live situation to those that it has previously calculated. It finds the closest one and ‘translates’ the corresponding action to the table.
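The 'translation' step can be pictured with a toy sketch: the live bet is mapped to the nearest bet size in the precomputed abstraction, and the stored strategy for that abstract situation is reused. The bet sizes and the plain absolute-distance metric below are stand-ins, not any team's actual abstraction:

```python
# A hedged sketch of 'translation': map a live bet to the closest
# precomputed abstract bet size. Values and metric are illustrative.

PRECOMPUTED_BETS = [50, 100, 200, 400, 800]  # abstract bet sizes (chips)

def translate(live_bet):
    """Return the abstract bet size nearest to the live bet."""
    return min(PRECOMPUTED_BETS, key=lambda b: abs(b - live_bet))

print(translate(130))  # -> 100
print(translate(550))  # -> 400
```

Real systems are subtler (for instance, distances between bet sizes are often measured on something closer to a logarithmic scale), but the principle is the same: a continuous space of possible bets is collapsed onto a small precomputed menu.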
Now, however, both DeepStack and Libratus have found ways to compute solutions in real time — as is done by computers that play chess and Go.
How do the approaches of the two AIs compare?
Instead of trying to work out the whole game tree ahead of time, DeepStack recalculates only a short tree of possibilities at each point in a game.
The developers created this approach using deep learning, a technique that uses brain-inspired architectures known as neural networks (and that helped a computer to beat one of the world’s best players at Go).
By playing itself in more than 11 million game situations, and learning from each one, DeepStack gained an ‘intuition’ about the likelihood of winning from a given point in the game. This allows it to calculate fewer possibilities in a relatively short time — about 5 seconds — and make real-time decisions.
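The shape of that computation, a short lookahead whose leaves are scored by learned 'intuition', can be sketched schematically. Everything here is a stand-in: the toy game, the stub value function and plain minimax search (DeepStack's actual re-solving is built on counterfactual regret minimization, and its value estimates come from trained neural networks):

```python
# A schematic of depth-limited lookahead in the spirit of DeepStack:
# search only a short tree, and replace the subtrees below it with a
# learned value estimate. All components here are illustrative stubs.

def value_estimate(state):
    """Stand-in for a trained network scoring a game state."""
    return state  # toy: the state itself serves as its value

def lookahead(state, depth, maximizing):
    if depth == 0:
        return value_estimate(state)  # 'intuition' cuts the tree here
    children = [state + 1, state - 1]  # toy continuations of the state
    values = [lookahead(c, depth - 1, not maximizing) for c in children]
    return max(values) if maximizing else min(values)

print(lookahead(0, depth=2, maximizing=True))  # -> 0
```

The payoff of this structure is exactly the trade-off described above: the tree searched at each decision stays small enough to be recomputed live, because the learned estimate stands in for everything deeper.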
The Libratus team has yet to publish its method, so it’s not as clear how the program works. What we do know is that early in a hand, it uses previously calculated possibilities and the ‘translation’ approach, although it refines that strategy as the game gives up more information. But for the rest of each hand, as the possible outcomes narrow, the algorithm also computes solutions in real time.
And Libratus also has a learning element. Its developers added a self-improvement module that automatically analyses the bot's playing strategy to learn how opponents have exploited its weaknesses. They then use that information to permanently patch the holes in the AI's approach.
The two methods require substantially different computing power: DeepStack trained using 175 core years — the equivalent of running a processing unit for 175 years or a few hundred computers for a few months. And during games it can run off a single laptop. Libratus, by contrast, uses a supercomputer before and during the match, and the equivalent of around 2,900 core years.
Can they bluff?
Yes. People often see bluffing as something human, but to a computer, it has nothing to do with reading an opponent, and everything to do with the mathematics of the game. Bluffing is merely a strategy to ensure that a player’s betting pattern never reveals to an opponent the cards that they have.
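That mathematical view of bluffing can be made concrete with a classic textbook result (a simplified model, not either bot's actual strategy): when betting B into a pot of P, making bluffs a fraction B / (P + 2B) of your bets leaves the opponent exactly indifferent between calling and folding, so the betting pattern reveals nothing:

```python
# The mathematics of bluffing, sketched on a classic toy result.
# A caller who calls a bet B into pot P wins P + B against a bluff
# and loses B against a value hand; indifference requires
#   x * (P + B) = (1 - x) * B,  so  x = B / (P + 2B),
# where x is the fraction of bets that are bluffs.

def bluff_fraction(pot, bet):
    """Bluffing frequency that makes the opponent indifferent."""
    return bet / (pot + 2 * bet)

# A pot-sized bet should be a bluff one time in three.
print(bluff_fraction(pot=100, bet=100))  # -> 0.3333...
```

The key point survives the simplification: the bluffing frequency falls out of the pot odds alone, with no need to read the opponent.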
OK, so which result was more impressive?
It depends on whom you ask. Experts could quibble over the intricacies of both methods, but overall, both AIs played enough hands to generate statistically significant wins — and both against professional players.
Libratus played more hands, but DeepStack didn’t need to, because its team used a sophisticated statistical method that enabled them to prove a significant result from fewer games. Libratus beat much better professionals than did DeepStack, but on average, DeepStack won by a bigger margin.
Will the two AIs now face off?
Maybe. A sticking point is likely to be the big difference in computing power and so the speed of play between the AIs. This could make it difficult to find rules to which both sides can agree.
University of Alberta computer scientist Michael Bowling, one of the developers of DeepStack, says that his team is up for playing Libratus. But Libratus developer Tuomas Sandholm at CMU says that he first wants to see DeepStack beat Baby Tartanian8, one of his team’s earlier and weaker AIs.
Bowling stresses that the match would carry a big caveat: the winner might not be the better bot. Both are trying to play the perfect game, but the strategy closest to that ideal doesn’t always come out in head-to-head play. One program could accidentally hit on a hole in the opponent’s strategy, but that wouldn’t necessarily mean that the strategy overall has more or bigger holes. Unless one team wins by a substantial margin, says Bowling, “my feeling is it won’t be as informative as people would like it to be”.
Does this mean the end of online poker?
No. Many online poker sites forbid the use of computer programs in matches, although top players have started to train against machines.
Now that computers have passed another AI milestone, what’s left to tackle?
There are a few mountains left for the AI community to climb. In part, this is because many of the games that remain unsolved, such as bridge, have more complicated rules, and so have made for less obvious targets.
The natural next move for both teams is to tackle multiplayer poker. This could mean almost starting from scratch because zero-sum game theory does not apply: in three-player poker, for instance, a bad move by one opponent can indirectly hinder, rather than always advantage, another player.
But the intuition of deep learning could help to find solutions even where the theory doesn’t apply, says Bowling. His team’s first attempts to apply similar methods in the three-player version of limited Texas Hold ’Em have turned out surprisingly well, he says.
Another challenge is training an AI to play games without being told the rules, and instead discovering them as it goes along. This scenario more realistically mirrors real-world problem-solving situations that humans face.
The ultimate test will be to explore how much imperfect-information algorithms really can help to tackle messy real-world problems with incomplete information, such as in finance and cybersecurity.
Nature 543, 160–161 (09 March 2017)
doi:10.1038/nature.2017.21580
References
1. Moravčík, M. et al. Science http://dx.doi.org/10.1126/science.aam6960 (2017).