An Introduction to Monte Carlo Tree Search

A great explanation of the concept behind the Monte Carlo Tree Search algorithm and a brief example of how it was used at the European Space Agency for planning interplanetary flights.

By Mateusz Wyszyński.


We recently witnessed one of the biggest game AI events in history – AlphaGo became the first computer program to beat the world champion in a game of Go. The publication can be found here. To achieve this result, the developers at DeepMind combined several techniques from machine learning and tree search. One of them is the Monte Carlo Tree Search (MCTS) algorithm. This algorithm is fairly simple to understand and, interestingly, has applications beyond game AI. Below, I will explain the concept behind the MCTS algorithm and briefly describe how it was used at the European Space Agency for planning interplanetary flights.

Perfect Information Games

Monte Carlo Tree Search is an algorithm used when playing a so-called perfect information game. In short, perfect information games are games in which, at any point in time, each player has perfect information about all actions that have previously taken place. Examples of such games are Chess, Go and Tic-Tac-Toe. But just because every move is known doesn't mean that every possible outcome can be calculated and extrapolated. For example, the number of possible legal game positions in Go is over 10^170.

Every perfect information game can be represented in the form of a tree.
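The upper-bound formula referred to below does not appear in this excerpt; it is presumably the standard UCB1 (Upper Confidence Bound) rule, reproduced here for reference:

$$
\bar{x}_i \pm \sqrt{\frac{2 \ln n}{n_i}}
$$

where $\bar{x}_i$ is the average pay-out of machine $i$, $n_i$ is the number of times machine $i$ has been played, and $n$ is the total number of plays across all machines.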

Each time, we choose to play the machine with the highest upper confidence bound for xi (so the "+" sign in the formula above).

This is a solution to the Multi-Armed Bandit problem. Now note that we can use it for our perfect information game: just treat each possible next move (child node) as a slot machine. Each time we choose to play a move, we end up winning, losing or drawing. This is our pay-out. For simplicity, I will assume that we are only interested in winning, so the pay-out is 1 if we have won and 0 otherwise.
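As a sketch of how this selection rule looks in code (my own illustration, not from the original article; `wins` and `visits` are hypothetical per-move counters, and every move is assumed to have been tried at least once):

```r
# UCB1 move selection: pick the child node (move) with the highest
# upper confidence bound. Assumes every move has visits >= 1.
ucb1_select <- function(wins, visits, c = sqrt(2)) {
  n <- sum(visits)                               # total plays at this node
  ucb <- wins / visits + c * sqrt(log(n) / visits)
  which.max(ucb)                                 # index of the move to try next
}

wins   <- c(10, 4, 0)      # pay-outs observed for each candidate move
visits <- c(20, 10, 1)
ucb1_select(wins, visits)  # returns 3: the rarely tried move gets the exploration bonus
```

Note how the third move wins despite a zero average pay-out: its large exploration term reflects how little we know about it.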

Real world application example

MAB algorithms have multiple practical applications in the real world, for example price engine optimization or finding the best online campaign. Let's focus on the first one and see how we can implement it in R. Imagine you are selling your products online and want to introduce a new one, but are not sure how to price it. You came up with 4 price candidates based on your expert knowledge and experience: 99$, 100$, 115$ and 120$. Now you want to test how those prices perform and decide which one to choose eventually. During the first day of your experiment, 4000 people visited your shop while the first price (99$) was tested, and 368 of them bought the product. For the rest of the prices we have the following outcome:

  • 100$: 4060 visits and 355 purchases,
  • 115$: 4011 visits and 373 purchases,
  • 120$: 4007 visits and 230 purchases.

Now let's look at the calculations in R and check which price performed best during the first day of our experiment.



library(bandit)  # provides sim_post, prob_winner, significance_analysis and value_remaining

visits_day1 <- c(4000, 4060, 4011, 4007)
purchase_day1 <- c(368, 355, 373, 230)
prices <- c(99, 100, 115, 120)

# Draw samples from the posterior distribution of each price's conversion rate
post_distribution <- sim_post(purchase_day1, visits_day1, ndraws = 10000)
# Probability that each price converts best
probability_winning <- prob_winner(post_distribution)
names(probability_winning) <- prices
probability_winning

##     99    100    115    120 
## 0.3960 0.0936 0.5104 0.0000

We calculated the Bayesian probability that each price performs best and can see that the price 115$ has the highest probability (about 0.51). On the other hand, 120$ seems a bit too much for the customers.
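For intuition, this Bayesian calculation can be sketched directly with Beta posteriors (a sketch assuming uniform Beta(1, 1) priors; the bandit package's internals may differ in the details):

```r
# Sketch of the Bayesian calculation: sample each price's conversion rate
# from its posterior Beta(1 + purchases, 1 + non-purchases) distribution,
# then estimate how often each price comes out on top.
set.seed(1)  # arbitrary seed, for reproducibility only
visits_day1   <- c(4000, 4060, 4011, 4007)
purchase_day1 <- c(368, 355, 373, 230)

draws <- sapply(seq_along(visits_day1), function(i)
  rbeta(10000, 1 + purchase_day1[i],
        1 + visits_day1[i] - purchase_day1[i]))

# Fraction of posterior draws in which each price has the highest rate
prob_best <- tabulate(max.col(draws), nbins = 4) / nrow(draws)
prob_best  # close to the prob_winner output above
```

The winner is decided draw by draw, so the resulting probabilities always sum to 1 across the four prices.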

The experiment continues for a few more days.

Day 2 results:

visits_day2 <- c(8030, 8060, 8027, 8037)
purchase_day2 <- c(769, 735, 786, 420)

post_distribution <- sim_post(purchase_day2, visits_day2, ndraws = 1000000)
probability_winning <- prob_winner(post_distribution)
names(probability_winning) <- prices
probability_winning

##       99      100      115      120 
## 0.308623 0.034632 0.656745 0.000000

After the second day the price 115$ still shows the best results, with 99$ and 100$ performing very similarly.

Using the bandit package we can also perform significance analysis, which is handy for an overall comparison of proportions using prop.test.

significance_analysis(purchase_day2, visits_day2)

##   successes totals estimated_proportion        lower      upper
## 1       769   8030           0.09576588 -0.004545319 0.01369494
## 2       735   8060           0.09119107  0.030860453 0.04700507
## 3       786   8027           0.09791952 -0.007119595 0.01142688
## 4       420   8037           0.05225831           NA         NA
##   significance rank best       p_best
## 1 3.322143e-01    2    1 3.086709e-01
## 2 1.437347e-21    3    1 2.340515e-06
## 3 6.637812e-01    1    1 6.564434e-01
## 4           NA    4    0 1.548068e-39
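The test behind this table is R's built-in prop.test for equality of proportions; for example, testing 115$ against 120$ directly (an illustration only, not necessarily the exact pairing significance_analysis uses internally):

```r
# Two-sample test for equal conversion rates: 115$ vs 120$ after day 2.
# A tiny p-value here means the gap between 786/8027 and 420/8037 is
# far too large to be sampling noise.
prop.test(x = c(786, 420), n = c(8027, 8037))
```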

At this point we can see that 120$ is still performing badly, so we drop it from the experiment and continue with the remaining prices the next day. According to p_best, the chances that this alternative is the best are negligible.

Day 3 results:

visits_day3 <- c(15684, 15690, 15672, 8037)  # 120$ was dropped, so its counts stay at the day 2 values
purchase_day3 <- c(1433, 1440, 1495, 420)

post_distribution <- sim_post(purchase_day3, visits_day3, ndraws = 1000000)
probability_winning <- prob_winner(post_distribution)
names(probability_winning) <- prices
probability_winning

##       99      100      115      120 
## 0.087200 0.115522 0.797278 0.000000

# Distribution of the lift another price might still have over the current best
value_remaining <- value_remaining(purchase_day3, visits_day3)
potential_value <- quantile(value_remaining, 0.95)
potential_value

##        95% 
## 0.02670002

The day 3 results lead us to conclude that 115$ will generate the highest conversion rate and revenue. We are still unsure about the exact conversion probability for the best price, 115$, but whatever it is, one of the other prices might still beat it by as much as 2.67% (the 95% quantile of the value-remaining distribution).
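The value-remaining statistic itself can be sketched from posterior draws: for each draw, compute the relative lift the best price in that draw has over the price we currently believe is best (a sketch of the idea; the bandit package's value_remaining is the reference implementation):

```r
# Value remaining: per posterior draw, the relative improvement the best
# price in that draw has over the current champion.
set.seed(1)  # arbitrary seed, for reproducibility only
visits_day3   <- c(15684, 15690, 15672, 8037)
purchase_day3 <- c(1433, 1440, 1495, 420)

draws <- sapply(seq_along(visits_day3), function(i)
  rbeta(10000, 1 + purchase_day3[i],
        1 + visits_day3[i] - purchase_day3[i]))

champion <- which.max(purchase_day3 / visits_day3)  # 115$ in this data
lift <- (apply(draws, 1, max) - draws[, champion]) / draws[, champion]
quantile(lift, 0.95)  # comparable to the 2.67% figure above
```

When the champion itself is best in a draw, the lift is zero, so the distribution piles up at 0 as confidence grows.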

The histograms below show what happens to the value-remaining distribution (the distribution of improvements another price might have over the current best price) as the experiment continues. With a larger sample we become much more confident about the conversion rate, and over time the other prices have lower chances of beating the 115$ price.

If you found this example interesting, check out our other post about dynamic pricing.

We are ready to learn how the Monte Carlo Tree Search algorithm works - next page.