How rival bots battled their way to poker supremacy
http://www.nature.com/news/how-rival-bots-battled-their-way-to-poker-supremacy-1.21580
Artificial-intelligence programs harness game-theory strategies and deep learning to defeat human professionals in two-player hold 'em.

Juice/Alamy
Top professional poker players have been beaten by AI bots at no-limit hold 'em.
A complex variant of poker is the latest game to be mastered by artificial intelligence (AI). And it has been conquered not once, but twice, by two rival bots developed by separate research teams.
Both algorithms play a ‘no-limit’ two-player version of Texas Hold 'Em. And each has in recent months hit a crucial AI milestone: beating human professional players.
The game first fell in December to DeepStack, developed by computer scientists at the University of Alberta in Edmonton, Canada, with collaborators from Charles University and the Czech Technical University in Prague. A month later, Libratus, developed by a team at Carnegie Mellon University (CMU) in Pittsburgh, Pennsylvania, achieved the feat.
Over the past decade, the groups have pushed each other to build ever-better bots, and the team behind DeepStack has now formally published details of its AI in Science [1]. But the two bots have yet to play each other.
Nature looks at how the two AIs stack up, what the accomplishments could mean for online casinos and what’s left for AI to conquer.
Why do AI researchers care about poker?
AIs have mastered several board games, including chess and the complex-strategy game Go. But poker has a key difference from board games that adds complexity: players must work out their strategies without being able to see all of the information about the game on the table. They must consider what cards their opponents might have and what the opponents might guess about their hand based on previous betting.
Games that have such ‘imperfect information’ mirror real-life problem-solving scenarios, such as auctions and financial negotiations, and poker has become an AI test bed for these situations.
Algorithms have already cracked simpler forms of poker: the Alberta team essentially solved a limited version of two-player hold ’em poker in 2015. The form played by DeepStack and Libratus is still a two-player game, but there are no limits on how much an individual player can bet or raise — which makes it considerably more complex for an AI to navigate.
How did the human-versus-AI games unfold?
Over 4 weeks beginning in November last year, DeepStack beat 10 of 11 professional players by a statistically significant margin, playing 3,000 hands against each.
Then, in January, Libratus beat four stronger professionals, who are considered specialists at this form of the game, over a total of around 120,000 hands. The computer ended up almost US$1.8 million ahead in virtual chips.
What are the mathematics behind the algorithms?
Game theory. Both AIs aim to find a strategy that is guaranteed not to lose, regardless of how an opponent plays. And because one-on-one poker is a zero-sum game — meaning that one player’s loss is always the opponent’s gain — game theory says that such a strategy always exists. Whereas a human player might exploit a weak opponent’s errors to win big, an AI with this strategy isn’t concerned by margins — it plays only to win. That means it also won't be thrown by surprising behaviour.
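The maximin idea behind that "guaranteed not to lose" strategy can be sketched in a few lines of Python. The payoff matrix below is invented for illustration and has nothing to do with real poker values; the point is only that the player picks the mix that maximises its worst-case expected payoff:

```python
# A minimal sketch of a maximin (security) strategy in a tiny
# zero-sum game. Payoffs are illustrative, not real poker values.

# Payoff matrix for the row player: A[i][j] is row's payoff when row
# plays action i and column plays j. Zero-sum: column receives -A[i][j].
A = [[ 1, -2],
     [-1,  2]]

def worst_case(p):
    """Row's expected payoff against the opponent's best response,
    when row plays action 0 with probability p."""
    vs_col0 = p * A[0][0] + (1 - p) * A[1][0]
    vs_col1 = p * A[0][1] + (1 - p) * A[1][1]
    return min(vs_col0, vs_col1)

# Grid-search the mixing probability; a fine grid suffices for a demo.
best_p = max((i / 1000 for i in range(1001)), key=worst_case)
print(best_p, worst_case(best_p))  # -> 0.5 0.0
```

For this matrix the maximin mix is 50:50 and guarantees a value of zero, which is exactly the "play only not to lose" behaviour described above: no opponent strategy, however surprising, can push the expected payoff below that guarantee.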
Previous poker-playing algorithms have generally tried to work out strategies ahead of time, computing massive ‘game trees’ that outline solutions for all the different ways that a game could unfold. But the number of possibilities is so huge — 10^160 — that mapping all of them is impossible. So researchers settled for solving fewer possibilities. In a game, an algorithm compares a live situation to those that it has previously calculated. It finds the closest one and ‘translates’ the corresponding action to the table.
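The 'translation' step can be pictured with a toy sketch: the live bet is mapped to the nearest bet size in the precomputed abstraction, and the stored strategy for that abstract situation is reused. The bet sizes and the plain absolute-distance metric below are stand-ins, not any team's actual abstraction:

```python
# A hedged sketch of 'translation': map a live bet to the closest
# precomputed abstract bet size. Values and metric are illustrative.

PRECOMPUTED_BETS = [50, 100, 200, 400, 800]  # abstract bet sizes (chips)

def translate(live_bet):
    """Return the abstract bet size nearest to the live bet."""
    return min(PRECOMPUTED_BETS, key=lambda b: abs(b - live_bet))

print(translate(130))  # -> 100
print(translate(550))  # -> 400
```

Real systems are subtler (for instance, distances between bet sizes are often measured on something closer to a logarithmic scale), but the principle is the same: a continuous space of possible bets is collapsed onto a small precomputed menu.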
Now, however, both DeepStack and Libratus have found ways to compute solutions in real time — as is done by computers that play chess and Go.
How do the approaches of the two AIs compare?
Instead of trying to work out the whole game tree ahead of time, DeepStack recalculates only a short tree of possibilities at each point in a game.
The developers created this approach using deep learning, a technique that uses brain-inspired architectures known as neural networks (and that helped a computer to beat one of the world’s best players at Go).
By playing itself in more than 11 million game situations, and learning from each one, DeepStack gained an ‘intuition’ about the likelihood of winning from a given point in the game. This allows it to calculate fewer possibilities in a relatively short time — about 5 seconds — and make real-time decisions.
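The shape of that computation, a short lookahead whose leaves are scored by learned 'intuition', can be sketched schematically. Everything here is a stand-in: the toy game, the stub value function and plain minimax search (DeepStack's actual re-solving is built on counterfactual regret minimization, and its value estimates come from trained neural networks):

```python
# A schematic of depth-limited lookahead in the spirit of DeepStack:
# search only a short tree, and replace the subtrees below it with a
# learned value estimate. All components here are illustrative stubs.

def value_estimate(state):
    """Stand-in for a trained network scoring a game state."""
    return state  # toy: the state itself serves as its value

def lookahead(state, depth, maximizing):
    if depth == 0:
        return value_estimate(state)  # 'intuition' cuts the tree here
    children = [state + 1, state - 1]  # toy continuations of the state
    values = [lookahead(c, depth - 1, not maximizing) for c in children]
    return max(values) if maximizing else min(values)

print(lookahead(0, depth=2, maximizing=True))  # -> 0
```

The payoff of this structure is exactly the trade-off described above: the tree searched at each decision stays small enough to be recomputed live, because the learned estimate stands in for everything deeper.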
The Libratus team has yet to publish its method, so it’s not as clear how the program works. What we do know is that early in a hand, it uses previously calculated possibilities and the ‘translation’ approach, although it refines that strategy as the game gives up more information. But for the rest of each hand, as the possible outcomes narrow, the algorithm also computes solutions in real time.
And Libratus also has a learning element. Its developers added a self-improvement module that automatically analyses the bot's playing strategy to learn how opponents have exploited its weaknesses. They then use that information to permanently patch the holes in the AI's approach.
The two methods require substantially different computing power: DeepStack trained using 175 core years — the equivalent of running a processing unit for 175 years or a few hundred computers for a few months. And during games it can run off a single laptop. Libratus, by contrast, uses a supercomputer before and during the match, and the equivalent of around 2,900 core years.
Can they bluff?
Yes. People often see bluffing as something human, but to a computer, it has nothing to do with reading an opponent, and everything to do with the mathematics of the game. Bluffing is merely a strategy to ensure that a player’s betting pattern never reveals to an opponent the cards that they have.
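That mathematical view of bluffing can be made concrete with a classic textbook result (a simplified model, not either bot's actual strategy): when betting B into a pot of P, making bluffs a fraction B / (P + 2B) of your bets leaves the opponent exactly indifferent between calling and folding, so the betting pattern reveals nothing:

```python
# The mathematics of bluffing, sketched on a classic toy result.
# A caller who calls a bet B into pot P wins P + B against a bluff
# and loses B against a value hand; indifference requires
#   x * (P + B) = (1 - x) * B,  so  x = B / (P + 2B),
# where x is the fraction of bets that are bluffs.

def bluff_fraction(pot, bet):
    """Bluffing frequency that makes the opponent indifferent."""
    return bet / (pot + 2 * bet)

# A pot-sized bet should be a bluff one time in three.
print(bluff_fraction(pot=100, bet=100))  # -> 0.3333...
```

The key point survives the simplification: the bluffing frequency falls out of the pot odds alone, with no need to read the opponent.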
OK, so which result was more impressive?
It depends on whom you ask. Experts could quibble over the intricacies of both methods, but overall, both AIs played enough hands to generate statistically significant wins — and both against professional players.
Libratus played more hands, but DeepStack didn’t need to, because its team used a sophisticated statistical method that enabled them to prove a significant result from fewer games. Libratus beat much better professionals than did DeepStack, but on average, DeepStack won by a bigger margin.
Will the two AIs now face off?
Maybe. A sticking point is likely to be the big difference in computing power and so the speed of play between the AIs. This could make it difficult to find rules to which both sides can agree.
University of Alberta computer scientist Michael Bowling, one of the developers of DeepStack, says that his team is up for playing Libratus. But Libratus developer Tuomas Sandholm at CMU says that he first wants to see DeepStack beat Baby Tartanian8, one of his team’s earlier and weaker AIs.
Bowling stresses that the match would carry a big caveat: the winner might not be the better bot. Both are trying to play the perfect game, but the strategy closest to that ideal doesn’t always come out in head-to-head play. One program could accidentally hit on a hole in the opponent’s strategy, but that wouldn’t necessarily mean that the strategy overall has more or bigger holes. Unless one team wins by a substantial margin, says Bowling, “my feeling is it won’t be as informative as people would like it to be”.
Does this mean the end of online poker?
No. Many online poker sites forbid the use of computer programs in matches, although top players have started to train against machines.
Now that computers have passed another AI milestone, what’s left to tackle?
There are a few mountains left for the AI community to climb. In part, this is because many of the games that remain unsolved, such as bridge, have more complicated rules, and so have made for less obvious targets.
The natural next move for both teams is to tackle multiplayer poker. This could mean almost starting from scratch because zero-sum game theory does not apply: in three-player poker, for instance, a bad move by one opponent can indirectly hinder, rather than always advantage, another player.
But the intuition of deep learning could help to find solutions even where the theory doesn’t apply, says Bowling. His team’s first attempts to apply similar methods in the three-player version of limited Texas Hold ’Em have turned out surprisingly well, he says.
Another challenge is training an AI to play games without being told the rules, and instead discovering them as it goes along. This scenario more realistically mirrors real-world problem-solving situations that humans face.
The ultimate test will be to explore how much imperfect-information algorithms really can help to tackle messy real-world problems with incomplete information, such as in finance and cybersecurity.
Nature 543, 160–161 (09 March 2017)
doi:10.1038/nature.2017.21580
References
1. Moravčík, M. et al. Science http://dx.doi.org/10.1126/science.aam6960 (2017).