By Mateusz Wyszyński.
We recently witnessed one of the biggest game AI events in history – AlphaGo became the first computer program to beat the world champion in a game of Go. The publication can be found here. To achieve this result, developers from DeepMind combined several techniques from machine learning and tree search. One of them is the Monte Carlo Tree Search (MCTS) algorithm. This algorithm is fairly simple to understand and, interestingly, has applications outside of game AI. Below, I will explain the concept behind the MCTS algorithm and briefly describe how it was used at the European Space Agency for planning interplanetary flights.
Perfect Information Games
Monte Carlo Tree Search is an algorithm used when playing a so-called perfect information game. In short, perfect information games are games in which, at any point in time, each player has perfect information about all actions that have previously taken place. Examples of such games are Chess, Go and Tic-Tac-Toe. But just because every move is known doesn't mean that every possible outcome can be calculated and extrapolated. For example, the number of possible legal game positions in Go is over 10^170 – far more than the number of atoms in the observable universe.
Every time, we choose to play on the machine with the highest upper confidence bound for x_i (the "+" term in the formula above).
This is a solution to the Multi-Armed Bandit Problem. Now note that we can use it for our perfect information game: just treat each possible next move (child node) as a slot machine. Each time we choose to play a move, we end up winning, losing or drawing – this is our pay-out. For simplicity, I will assume that we are only interested in winning, so the pay-out is 1 if we have won and 0 otherwise.
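The selection rule described above can be sketched in a few lines of R. This is an illustrative UCB1 sketch, not code from the original post; the function and variable names are my own:

```r
# UCB1 selection sketch (illustrative; names are my own).
# wins[i] / plays[i] : empirical mean pay-out of move i (1 = win, 0 otherwise)
ucb1 <- function(wins, plays, c = sqrt(2)) {
  total <- sum(plays)                       # total plays across all moves
  mean_payout <- wins / plays               # exploitation term
  bonus <- c * sqrt(log(total) / plays)     # exploration term (the "+" part)
  which.max(mean_payout + bonus)            # index of the move to try next
}

# Example: move 2 has a slightly lower mean pay-out but has been tried
# rarely, so its exploration bonus makes it the next pick.
ucb1(wins = c(12, 1), plays = c(40, 4))     # returns 2
```

Note how the rarely-tried move wins the comparison: the `log(total) / plays` term grows for under-explored arms, which is exactly what keeps the algorithm from prematurely locking onto one move.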
Real-world application example
MAB algorithms have multiple practical applications in the real world, for example price engine optimization or finding the best online campaign. Let's focus on the first one and see how we can implement it in R. Imagine you are selling your products online and want to introduce a new one, but are not sure how to price it. You came up with 4 price candidates based on your expert knowledge and experience: 99$, 100$, 115$ and 120$. Now you want to test how those prices will perform and which one to choose eventually. During the first day of your experiment, 4000 people visited your shop while the first price (99$) was tested, and 368 of them bought the product. For the rest of the prices we have the following outcome:
100$: 4060 visits and 355 purchases,
115$: 4011 visits and 373 purchases,
120$: 4007 visits and 230 purchases.
Now let’s look at the calculations in R and check which price was performing best during the first day of our experiment.
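A minimal base-R sketch of that calculation is below. The original post relies on the `bandit` package for this step; here I simulate the Beta posteriors directly so the snippet is self-contained, and the variable names are my own:

```r
set.seed(42)
visits    <- c(4000, 4060, 4011, 4007)
purchases <- c(368, 355, 373, 230)
prices    <- c("99$", "100$", "115$", "120$")

# Draw from the Beta posterior of each price's conversion rate
# (uniform Beta(1, 1) prior assumed)
draws <- sapply(seq_along(prices), function(i)
  rbeta(100000, purchases[i] + 1, visits[i] - purchases[i] + 1))

# Bayesian probability that each price has the highest conversion rate:
# the share of posterior draws in which that price comes out on top
winner <- max.col(draws, ties.method = "first")
p_best <- table(factor(winner, levels = seq_along(prices))) / nrow(draws)
names(p_best) <- prices
round(p_best, 3)   # 115$ comes out on top with probability around 0.5
```

The same probabilities can be obtained in one call from the `bandit` package, which is what the experiment below uses.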
We calculated the Bayesian probability that each price performs best and can see that 115$ has the highest probability (0.5). On the other hand, 120$ seems a bit too expensive for the customers.
The experiment continues for a few more days.
Day 2 results:
After the second day, the 115$ price still shows the best results, with 100$ performing very similarly.
With the bandit package we can also perform a significance analysis, which is handy for an overall comparison of the proportions. At this point we can see that 120$ is still performing badly, so we drop it from the experiment and continue for the next day. The chance that this alternative is the best, according to p_best, is negligible.
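As an illustration of such a significance check, base R's `prop.test` runs a chi-squared test of equal conversion rates across all prices. I reuse the day-one numbers here for the sake of a self-contained example; the actual experiment would use the cumulative counts:

```r
# Day-one counts (reused here purely for illustration)
visits    <- c(4000, 4060, 4011, 4007)
purchases <- c(368, 355, 373, 230)

# Chi-squared test of the null hypothesis that all four prices
# convert at the same rate
test <- prop.test(purchases, visits)
test$p.value   # tiny p-value: the conversion rates clearly differ
```

The tiny p-value is driven almost entirely by the 120$ arm (5.7% conversion versus roughly 9% for the others), which is consistent with dropping it from the experiment.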
Day 3 results:
Day 3 results led us to conclude that 115$ will generate the highest conversion rate and revenue. We are still unsure about the exact conversion probability for the best price, 115$, but whatever it is, one of the other prices might beat it by as much as 2.67% (the 95% quantile of the value remaining).
The histograms below show what happens to the value-remaining distribution – the distribution of the improvement another price might still have over the current best price – as the experiment continues. With a larger sample we become much more confident about the conversion rate, and over time the other prices have ever lower chances of beating 115$.
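The value-remaining quantity can be sketched directly from the posterior draws. The `bandit` package provides this computation as well; below is a self-contained base-R version using the day-one counts (the later days' counts are not reproduced here), with my own variable names:

```r
set.seed(42)
visits    <- c(4000, 4060, 4011, 4007)
purchases <- c(368, 355, 373, 230)

# Posterior draws of each conversion rate (uniform Beta(1, 1) prior)
draws <- sapply(seq_along(visits), function(i)
  rbeta(100000, purchases[i] + 1, visits[i] - purchases[i] + 1))

# "Value remaining": per posterior draw, the relative amount by which
# the best of the arms beats the currently leading arm (zero whenever
# the leader itself is best in that draw)
leader <- which.max(purchases / visits)   # 115$ on day one
value_remaining <- (apply(draws, 1, max) - draws[, leader]) / draws[, leader]

# 95% quantile: the lift another price might still have over the leader
quantile(value_remaining, 0.95)
```

As more data accumulates, this distribution concentrates near zero, which is exactly the narrowing shown in the histograms above.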
If you found this example interesting, check out our other post about dynamic pricing.
We are ready to learn how the Monte Carlo Tree Search algorithm works - next page.