Bull Session
AI Plays Poker
June 1, 2018
Episode Summary
This week on The Digital Life, our special guest is Noam Brown, a PhD student in Computer Science at Carnegie Mellon University, who with his advisor, Professor Tuomas Sandholm, created Libratus, an AI that decisively defeated four of the world’s best human poker professionals in a Man vs. Machine competition. The breakthrough was published in Science, received widespread mainstream news coverage, and continues to be cited as one of the milestone achievements of AI. Join us as we discuss poker, the application of AI to imperfect information games, and the possibilities for this kind of artificial intelligence to be used in negotiation and other real-world scenarios.
Resources:
Noam Brown
Superhuman AI for heads-up no-limit poker: Libratus beats top professionals
How computers were finally able to best poker pros
Inside Libratus, the Poker AI That Out-Bluffed the Best Humans
And this is particularly challenging for AI. It makes things way more difficult; it makes it way harder to compute a strategy. And so, for a long time, researchers in AI just sort of ignored the issue, and they focused on these perfect information games like chess and Go, and just sort of pretended that problems like poker didn’t exist, really.
Which is really unsatisfying, and so there were some of us, myself included, that thought this was a problem we should address, because the truth is, most real world situations involve hidden information. You can make an AI that plays chess, but it’s not going to be that useful in the real world if there’s hidden information involved.
And we took a very different approach compared to the AIs used for perfect information games, and ultimately it was successful. I think that, particularly after AlphaGo and the success that AI has had recently with perfect information games, the fact that those techniques could not be used in imperfect information games, hidden information games like poker, really highlighted the importance of our research and of our accomplishment in beating humans at poker.
So, we don’t use techniques that are specific to poker. We don’t, for example, tell the bots to bluff at a certain percentage, or say that with these cards you should raise, and with these cards, you should fold. We’re developing a way for the AI to determine strategy on its own, so if it were faced with a different imperfect information game … And, by the way, I’m using “game” very loosely here to mean any sort of strategic interaction. I’m using game in the sense of game theory, not in the sense of a “let’s play Monopoly” sort of game.
The AI starts by knowing nothing about the game, and it plays totally randomly, and it plays itself. It plays a copy of itself in that game for trillions of iterations. Trillions of hands of poker, for example. And as it plays, it learns from experience: if it was in a particular situation and it raised and lost money, well, could it have gotten more money if it had folded instead? So, after the hand is over, it will review its decisions, and it will ask at each decision point, what would have happened if I had taken some other action instead? If I had raised instead of called, would I have gotten more money? And if the answer is yes, then it will have regret on that action. It will regret not having taken that action, and if it finds itself at that decision point later on, or a similar decision point, then it will take the action that it regrets with higher probability.
And, in truth, this is actually very similar to how humans learn. If you’ve ever played poker with some humans, then you’ll know it’s very common for a person to ask, “What would have happened if I had raised instead? Would you have called me?” And that’s exactly what the bot is doing. It’s asking that hypothetical question of, what would have happened if I had done this other thing instead? And it’s able to get an answer because it’s playing against a copy of itself, and so it can ask itself that question, and it can give itself an answer.
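To make this concrete, here is a minimal sketch of regret matching, the kind of self-play update he is describing (it is the core of the counterfactual regret minimization family of algorithms that Libratus builds on). The actions, payoff numbers, and class here are illustrative only, not Libratus’s actual code:

```python
import random

class RegretMatcher:
    """Regret-matching learner for a single decision point. After each
    hand it compares the payoff of the action it took with the payoff
    each alternative would have earned, accumulates the difference as
    regret, and plays high-regret actions with higher probability."""

    def __init__(self, actions):
        self.actions = actions
        self.regret = {a: 0.0 for a in actions}

    def strategy(self):
        # Play each action in proportion to its positive regret;
        # with no positive regret yet, play uniformly at random.
        positive = {a: max(r, 0.0) for a, r in self.regret.items()}
        total = sum(positive.values())
        if total == 0:
            return {a: 1.0 / len(self.actions) for a in self.actions}
        return {a: p / total for a, p in positive.items()}

    def sample_action(self):
        strat = self.strategy()
        return random.choices(list(strat), weights=list(strat.values()))[0]

    def update(self, counterfactual_payoffs, realized_payoff):
        # "What would have happened if I had raised instead?" The bot
        # can answer because it plays a copy of itself and can replay
        # the hand under each alternative action.
        for action, payoff in counterfactual_payoffs.items():
            self.regret[action] += payoff - realized_payoff

# Example: the bot called and lost 100 chips; folding would have lost
# only 10, and raising would have won 50.
rm = RegretMatcher(["fold", "call", "raise"])
rm.update({"fold": -10, "call": -100, "raise": 50}, realized_payoff=-100)
print(rm.strategy())  # raise now gets the most weight (~0.62)
```

In the example at the bottom, raising accumulates the largest regret (it would have won 50 instead of losing 100), so the learner shifts most of its probability onto raising the next time it reaches a similar decision point.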
I think the idea that a Nash equilibrium exists, the idea that this perfect strategy exists in poker, is surprising to a lot of people. But if you think about it for a bit, you can see this in smaller games. For example, in rock, paper, scissors, we all know what the Nash equilibrium strategy is. It’s to throw rock, paper, and scissors with one-third probability each. And if you were to just play that strategy, then no matter what your opponent does, you’re not going to lose in expectation.
Now, in the case of rock, paper, scissors, you’re not going to win in expectation. You’re just going to tie in expectation. But in a complicated game like poker, if you are able to play the Nash equilibrium strategy, then it’s likely your opponent will make mistakes, and by playing the Nash equilibrium strategy, you will, in practice, win, because you’re playing the perfect strategy.
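As a quick sanity check on that rock, paper, scissors claim, here is a short Python verification that the uniform one-third strategy earns exactly zero in expectation against any opponent strategy (the names here are just for illustration):

```python
import itertools

ACTIONS = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(a, b):
    # Payoff to the first player: +1 win, 0 tie, -1 loss.
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1

def expected_value(my_strategy, opp_strategy):
    # Sum the payoff over every joint outcome, weighted by both strategies.
    return sum(my_strategy[a] * opp_strategy[b] * payoff(a, b)
               for a, b in itertools.product(ACTIONS, ACTIONS))

uniform = {a: 1 / 3 for a in ACTIONS}
all_rock = {"rock": 1.0, "paper": 0.0, "scissors": 0.0}

# Even against an all-rock opponent, the uniform strategy neither wins
# nor loses in expectation; the same holds for any opponent strategy.
print(expected_value(uniform, all_rock))  # 0.0
```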
So, we’re not trying to adapt to the opponent. In fact, during the competition, we never looked at the cards the opponent had, for example. We never cared which player we were playing against. We were always playing the same exact strategy, no matter who the opponent was.
But it’s not perfect, and so actually what it does is, during the competition, if it finds itself in a particular situation, when it’s actually playing against a human in a particular hand of poker, and it’s on the third betting round, for example, it will compute in real time a closer approximation of the Nash equilibrium for the situation that it’s in at that moment. So, it will take 20 seconds to figure out, let me find a much better strategy for this particular situation, but one that fits within this overarching blueprint strategy that I’ve computed for the entire game as a whole.
That’s the second component, and I think that was actually the big breakthrough with Libratus. Nobody had really found an effective way of doing real-time equilibrium computation in imperfect information games before. But of course, with chess AIs and Go AIs, thinking in real time is a big part of those AIs, and it’s kind of surprising, in retrospect, that people didn’t really focus on this earlier in imperfect information games.
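Here is a schematic sketch of that blueprint-plus-real-time-solving loop. Every name in it (the state methods, the solver callable) is a placeholder standing in for much heavier machinery, not Libratus’s actual interface:

```python
def play_hand(initial_state, blueprint, solve_subgame,
              refine_round=3, budget_seconds=20):
    """Follow the precomputed blueprint strategy early in the hand; then,
    once the remaining subgame is small enough (e.g., from the third
    betting round on), solve that subgame in real time for a closer
    equilibrium approximation that stays consistent with the blueprint
    above it."""
    state, strategy = initial_state, blueprint
    while not state.is_terminal():
        if strategy is blueprint and state.betting_round() >= refine_round:
            # Spend ~20 seconds computing a finer strategy for just
            # this subgame, constrained by the overarching blueprint.
            strategy = solve_subgame(state, blueprint, budget_seconds)
        state = state.apply(strategy.action(state))
    return state.payoff()
```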
Now, the third component, which is what you described, is this idea that it was, in some sense, learning from the opponents. Now, I want to make it clear, it was not adapting to the opponents. It was not trying to exploit the opponents in any way. What happened is, because the AI is not perfect, it’s not computing a perfect Nash equilibrium, there are some parts of the game tree, some different situations, where it’s playing suboptimally. And that’s a problem, because if it’s playing suboptimally, then there are opportunities for the humans to exploit it in those situations.
And that’s what the humans were constantly trying to do. Every single day, they were trying to find out, where are the weaknesses where we can take advantage of this AI? So, at the end of each day the AI would review the situations that it was finding itself in most frequently. It was trying to find the situations where the humans were, not necessarily successfully, but at least trying to exploit it. And it would come up with a much better strategy, a much closer approximation of the Nash equilibrium, for those particular situations that the humans were focusing on. And then the next day it would have a much better strategy in those situations, so it would be far less exploitable at those points.
And so, this led to a sort of cat and mouse game, because every day, the humans would try to find a weakness and take advantage of it, and at the end of each day, the AI would fix those weaknesses in preparation for the next day. So, as the competition progressed, the holes in the AI’s strategy got smaller and smaller over time, and the humans had less and less opportunity to exploit it.
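A similarly schematic sketch of the end-of-day fix-up might look like this; again, the names and data structures are illustrative assumptions, not the real system:

```python
from collections import Counter

def nightly_self_improvement(blueprint, solve_subgame, day_logs, top_k=3):
    """Overnight, find the situations the day's play reached most often
    (i.e., where the humans were probing), re-solve those subgames to a
    closer equilibrium approximation, and patch the blueprint before the
    next day's session.

    blueprint: dict mapping a situation identifier to its strategy.
    day_logs: list of situation identifiers reached during the day.
    solve_subgame: callable returning an improved strategy for one
        situation, consistent with the surrounding blueprint.
    """
    hot_spots = [s for s, _ in Counter(day_logs).most_common(top_k)]
    for situation in hot_spots:
        blueprint[situation] = solve_subgame(situation, blueprint)
    return blueprint
```

Run every night, a loop like this shrinks exactly the holes the humans were probing, which is what produced the cat and mouse dynamic described above.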
Some of the cool things that it did … I think the best example was that it was using bet sizes that were very different from human convention. In the human poker world, you typically bet a fraction of the pot, and that fraction is about 0.5 times the pot to one times the pot. So, if there is $300 in the pot, you might bet $150, or you might bet $300, and maybe in some really rare circumstances you might bet $500 at most.
But the AI didn’t feel a need to constrain its bets to those amounts, and it would sometimes bet three times the pot, five times the pot, sometimes even 20 times the pot. It would have no problem putting $20,000 into a $200 pot. And this was a big shock to the humans. It was very frustrating for them in particular, because they could be in a situation where they have a really, really strong hand, maybe the second-best hand that’s possible, and then suddenly the bot bets $20,000 into a $500 pot, and the bot’s basically saying, I’m either bluffing or I have the best hand.
And so, this human who’s sitting there with the second-best hand now has to think for a really long time: is he really going to lay down the second-best hand that’s possible just because the AI is saying that it has something better? And you could see the humans sometimes taking five or 10 minutes to make a decision. It was very frustrating for them. It was very satisfying for me, but I could understand the consternation in those moments. So, that was one difference.
I think, also, a big difference was the bluffing. It learned to bluff, of course, because you have to bluff to play poker well. But some of the situations it chose to bluff in were pretty different from what humans would do. And this was something that … I’m not a very advanced player, but I have a pretty decent idea of how to play the game. When we were developing the AI, I would look at the hands and try to get a sense of how well it was doing, and I would see these really weird situations where it would bluff with hands that didn’t make sense to me.
I actually called up my friend who’s a better poker player, he plays professionally, and I said, “Is this a smart move? Can you tell me if this is something that a human would do?” And he looked at the hand and said, “You have a bug in your program. There’s no way that it can be bluffing in that situation.” So, I looked at the code, and the code looked totally fine.
So, I called up an even better poker player, one of the best in the world, and asked him, “In your opinion, is this a smart thing to do?” And he said, “Okay, this is not something that I would do, but let me take some time to think about it.” And he came back the next day, and he said, “I thought about it, I crunched the numbers, and this is not something that any human would ever do, but it’s actually a brilliant move.” And he said, “This bot is thinking two moves ahead of a human.”
Now, we’re not specifically looking at financial markets right now, but I think that down the road, that is potentially something that we will look into, and I think that it’s going to have a huge impact. Well, I mean, some people would love to own an AI that can trade on the financial markets and do it well. So, there’s a lot of interest in this, but there are steps that need to be taken to get to that point. I wouldn’t say it’s going to happen within three years, but maybe 10 years from now, we will see this being used in financial markets.
But the real world isn’t necessarily like that. Maybe military situations are like that, but if you’re dealing with a negotiation, for example, you have win-win outcomes. It’s not zero sum, and that’s important to understand: you and your adversary can both win in this game, and the techniques that are needed are a little bit different from the zero-sum setting. So, figuring out how to adapt the techniques that we have to this general-sum setting is one of the things that we’re looking at in the short term, and I think that’s the smaller obstacle to overcome.
The bigger obstacle is that when you’re moving from a game to the real world, your strategies and your payoffs are not well-defined. In poker, it’s very clear what actions you can take in any given situation, and it’s very clear what the payoffs are for those actions. You win a certain amount of money at the end of the hand. But if you move to a negotiation, for example, your actions are not as clearly defined. You can negotiate over all sorts of things, and that may not be well-defined at the start. And the payoffs are also not as well-defined. How do you value certain outcomes?
In financial markets, for example, you may even think that it’s clearly defined, that it’s just dollar value, but that’s not necessarily the case, actually. It could be risk, it could be … Maybe you value short-term liquidity, you value selling an asset in the short term in order to have cash on hand. There’s all sorts of things that make this situation more complicated.
So, I would say that if you are able to define the model, then you can use these techniques in the short term, I would say within the next few years. But the fact that you can’t easily define the model in a lot of situations, that’s going to be an obstacle for AI, and that’s something that a lot of people in AI are working on right now. You’ve seen a lot of success in AIs for games, and less success in the real world, and the main reason is that it’s not really clear how to construct the model in the real world.
And in the longer term, I think it’s also important to understand that there are limitations to what AI can do, and I think that those limitations are being overcome every year to some extent, but some are just so far off that it’s hard to imagine AI, for example, writing a novel. I don’t think that AIs will ever, or at least not in my lifetime, have the understanding or the creativity to be able to write a prize-winning novel.
So, I think that things that rely on verbal skills, on an understanding of the world, those are going to take a long time for AI to replace. And I think also things that involve human interaction, for example, day care. You’re not going to see an AI taking care of kids any time soon, and I don’t think that’s going to change.
And people might have thought that playing poker, being able to bluff and read your opponent, was something uniquely human, a sign of human intuition, and that if an AI were ever able to do that, it would really be a sign of the robot apocalypse, of AIs doing everything. And now we’ve seen an AI that can bluff. Certainly, that means AIs can do a lot of things now, but they still can’t do everything.
So, I’m saying now that an AI can’t write a prize-winning novel, and if it ever did, I would be very terrified. But then I think back to those people 50 years ago who said that if an AI could ever play chess, it would be the end of the world. I could be wrong, and it could be the case that an AI writes a prize-winning novel and it won’t mean anything. It’s hard to quantify, hard to say what makes certain tasks harder and certain tasks easier. And anything I say, I could be wrong.
You can find The Digital Life on iTunes, SoundCloud, Stitcher, Player FM, and Google Play, and if you’d like to follow us outside of the show, you can follow me on Twitter @JonFollett. That’s J-O-N F-O-L-L-E-T-T. And, of course, the whole show is brought to you by GoInvo, a studio designing the future of health care and emerging technologies, which you can check out at GoInvo.com. That’s G-O-I-N-V-O.com. Dirk?