Monte Carlo models in Python

We were talking to a customer about the types of machine learning and experiments they were doing. One of their experiments was a Monte Carlo model. Most of the people in the meeting had never heard of Monte Carlo simulations before. I LOVE Monte Carlo models. They have so many applications. I wanted to jot down a little bit about Monte Carlo models as a reference for you to come back to. I will solve a really basic probability problem with a Monte Carlo simulation in Python.

Monte Carlo simulations (MCS) enable the investigation of stochastic problems (Alexa: define stochastic. A process which is random and non-deterministic). MCS were originally conceived by Polish-American scientist Stanislaw Ulam in 1946. The story goes that Ulam was playing solitaire while recovering from surgery (Canfield solitaire, a particular type of solitaire in which, once the cards are drawn, you either win or lose) and was contemplating what the probability of winning was once a hand was drawn. How many games would he need to play before he won?

In 1947, Ulam was working on the first general-purpose computer (ENIAC) and thought this type of problem was well suited to general-purpose computing. John von Neumann saw the benefit of this process and applied it to the diffusion of neutrons. The name "Monte Carlo" was inspired by Ulam's uncle, who liked to gamble in Monte Carlo. The first paper on MCS was published in 1949.

MCS is a process which "learns about a system by random sampling". Based on this description, an MCS can be used to simulate many different problems; probability is a simple example. Understanding probability and how to solve probability problems can be hard, and some probabilities are beyond simple calculation. Writing a model which solves these problems through repetition and random sampling is easy. As an example, suppose the birth ratio of boys to girls is 51:49. Based on this ratio, what is the probability of having two children who are both girls? In Python, using a Monte Carlo simulation, solving this is trivial.
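This particular problem is simple enough to solve by hand: assuming the two births are independent, the answer is 0.49 × 0.49 = 0.2401. That makes it a handy check on any simulation. The sketch below is an illustration and not the script from this post: the function name `estimate_two_girls`, the constant `P_GIRL` and the seed value are assumptions made up for the example. It shows the estimate settling towards the exact value as the number of trials grows.

```python
import random

P_GIRL = 0.49  # taken from the 51:49 boy-to-girl ratio in the text


def estimate_two_girls(n_trials, seed=None):
    """Estimate P(both children are girls) by simulating n_trials families."""
    rng = random.Random(seed)  # local generator so a run can be reproduced
    both_girls = 0
    for _ in range(n_trials):
        # Each birth is an independent draw: a girl with probability 0.49.
        first_is_girl = rng.random() < P_GIRL
        second_is_girl = rng.random() < P_GIRL
        if first_is_girl and second_is_girl:
            both_girls += 1
    return both_girls / n_trials


exact = P_GIRL * P_GIRL  # analytic answer, assuming independent births
for n in (1_000, 100_000):
    estimate = estimate_two_girls(n, seed=42)
    print(f"{n:>7} trials: estimate={estimate:.4f}, exact={exact:.4f}")
```

The script that follows builds the same kind of simulation step by step, drawing from a list of 100 babies instead of comparing a random number against 0.49.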

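import random

# Build a population of 100 babies matching the 51:49 ratio: 49 girls, 51 boys.
babies = []
for i in range(0, 49):
    babies += ['g']
for i in range(0, 51):
    babies += ['b']

Simulations = 1000000

for i in range(0, Simulations):
    # Draw two children with replacement so each birth is an independent draw.
    # (random.sample draws without replacement, which slightly lowers the odds
    # of a second girl once the first has been drawn.)
    girls = random.choices(babies, k=2)
    # Compare each child individually; comparing the whole list to "g" is always False.
    if girls[0] == "g" and girls[1] == "g":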