4.0 Bayesian inference

Finding model parameters using Bayesian inference


In general, statistical inference is used to draw conclusions about unknown quantities based on observed real-world data. Bayesian inference is the process of using such data to determine the unknown parameters of a probabilistic model and to predict new values, using probability distributions as a measure of uncertainty. This process is often called model fitting.

There are multiple ways to fit a model, each with its own pros and cons. We will use simulation as our method of choice because of its flexibility and relative simplicity.

Bayes' theorem

Bayesian inference is based on Bayes' theorem. To understand it we should refresh our knowledge of conditional probability. By definition, the conditional probability of the event (hypothesis) $A$ given $B$ is

$$ P(A \mid B) = \frac{P(A \cap B)}{P(B)} $$

where $P(A \cap B)$ is the probability that both events $A$ and $B$ occur. It can be useful to think about probabilities graphically:
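The definition is easy to check numerically. Here is a small sketch (the die example and all names are ours, not from the text) that computes $P(A \mid B)$ on a fair six-sided die, where $A$ = "the roll is even" and $B$ = "the roll is greater than 3":

```javascript
// Sample space of a fair six-sided die (hypothetical illustration)
const space = [1, 2, 3, 4, 5, 6]
const A = x => x % 2 === 0   // event A: roll is even
const B = x => x > 3         // event B: roll is greater than 3

// Probability of an event = favorable outcomes / total outcomes
const prob = event => space.filter(event).length / space.length

const pAandB = prob(x => A(x) && B(x))   // P(A ∩ B) = 2/6 (rolls 4 and 6)
const pB = prob(B)                       // P(B) = 3/6

// Conditional probability by the definition above
console.log(pAandB / pB)                 // P(A | B) = 2/3
```

Of the three outcomes in $B$ ({4, 5, 6}), two are even, which matches the ratio of the two probabilities.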

We can find the conditional probability of the event $B$ given the event $A$ using the same definition:

$$ P(B \mid A) = \frac{P(B \cap A)}{P(A)} $$

Here comes the most interesting part. Since $P(B \cap A)$ equals $P(A \cap B)$, we can combine these two formulas to get:

$$ P(A \mid B)P(B) = P(B \mid A)P(A) $$

Dividing both sides by $P(B)$ gives the common form of Bayes' theorem:

$$ P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B)} $$

The best thing about Bayes' theorem is that it lets us obtain the inverse conditional probability: knowing $P(B \mid A)$, we can compute $P(A \mid B)$.
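The classic "rare disease" example shows why this inversion matters. The numbers below are hypothetical: a disease $D$ affects 1% of a population, a test detects it with probability 0.99, but also gives a false positive for 5% of healthy people. Bayes' theorem turns the known $P(+ \mid D)$ into the quantity we actually care about, $P(D \mid +)$:

```javascript
// Hypothetical rates, chosen for illustration only
const pD = 0.01             // P(D): prior probability of the disease
const pPosGivenD = 0.99     // P(+ | D): true positive rate
const pPosGivenNotD = 0.05  // P(+ | not D): false positive rate

// P(+) as a sum over the two disjoint cases (sick or healthy)
const pPos = pPosGivenD * pD + pPosGivenNotD * (1 - pD)

// Bayes' theorem: P(D | +) = P(+ | D) P(D) / P(+)
const pDGivenPos = pPosGivenD * pD / pPos

console.log(pDGivenPos.toFixed(4))  // ≈ 0.1667
```

Despite the accurate test, a positive result implies only about a 17% chance of disease, because the prior $P(D)$ is so small.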

Law of total probability

If we have multiple disjoint events (hypotheses) $A_n$ whose union is the entire sample space, we can find the probability of any event $B$ as the sum of the probabilities of all events $A_n \cap B,\ n = 1, 2, 3, \dots$ or alternatively

$$ P(B) = \sum\limits_{n} P(B \mid A_n)P(A_n) $$
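This sum is straightforward to compute. In the sketch below (the scenario and numbers are ours) a bag contains three coin types $A_1, A_2, A_3$ in proportions 0.5, 0.3, 0.2, landing heads with probabilities 0.5, 0.9, 0.1; $B$ is the event that a randomly drawn coin lands heads:

```javascript
// Hypothetical partition of the sample space into three coin types
const pA = [0.5, 0.3, 0.2]        // P(A_n), priors summing to 1
const pBgivenA = [0.5, 0.9, 0.1]  // P(B | A_n), heads probability per type

// Law of total probability: P(B) = sum_n P(B | A_n) P(A_n)
const pB = pA.reduce((sum, p, n) => sum + pBgivenA[n] * p, 0)

console.log(pB)  // 0.5*0.5 + 0.9*0.3 + 0.1*0.2 = 0.54
```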

Extended form of Bayes' theorem

Usually we don't know $P(B)$ in Bayes' formula. Using the law of total probability, we can rewrite it as:

$$ P(A_i \mid B) = \frac{P(B \mid A_i)P(A_i)}{\sum\limits_{n} P(B \mid A_n)P(A_n)} $$

With these formulas we can already solve many probabilistic problems analytically (examples)
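The extended formula can be wrapped in a small helper function (a sketch; the function name and numbers are ours). Given priors $P(A_n)$ and likelihoods $P(B \mid A_n)$, it returns the full posterior over the hypotheses:

```javascript
// Extended Bayes' theorem over a discrete set of hypotheses:
// posterior_i = P(B | A_i) P(A_i) / sum_n P(B | A_n) P(A_n)
function posterior(priors, likelihoods) {
  const joint = priors.map((p, n) => likelihoods[n] * p)  // P(B | A_n) P(A_n)
  const pB = joint.reduce((a, b) => a + b, 0)             // normalizing constant
  return joint.map(j => j / pB)
}

// Hypothetical example: three coin types with priors 0.5, 0.3, 0.2 and
// heads probabilities 0.5, 0.9, 0.1 — which type, given one observed head?
console.log(posterior([0.5, 0.3, 0.2], [0.5, 0.9, 0.1]))
// ≈ [0.463, 0.5, 0.037]
```

Note how observing a head shifts belief toward the heads-biased second coin type, even though its prior was smaller.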

Bayesian inference

Using the Bayesian approach, we can reason about unknown model parameters as random variables. In the machine learning literature, parameters are usually denoted $\theta$ and data is denoted $y$. In Bayesian inference we are interested in the probability distribution of the model parameters given the observed data, $p(\theta \mid y)$, or the probability distribution of future data given the data we already have, $p(\hat{y} \mid y)$. Using Bayes' theorem we can find $p(\theta \mid y)$:

$$ p(\theta \mid y) = \frac{p(y \mid \theta)p(\theta)}{p(y)} $$

According to the law of total probability, in the discrete case $p(y) = \sum\limits_{\theta}p(y \mid \theta)p(\theta)$ is a sum over all possible values of the model parameter $\theta$.
In the continuous case, $p(y) = \int p(y \mid \theta)p(\theta)\,d\theta$


  • $p(\theta)$ is the prior distribution of $\theta$
  • $p(y \mid \theta)$ is the likelihood of the data
  • $p(\theta \mid y)$ is the posterior distribution we are interested in

Sometimes the factor $p(y)$ is omitted because it doesn't depend on $\theta$, so the formula becomes

$$ p(\theta \mid y) \propto p(y \mid \theta)p(\theta) $$

where $\propto$ means "proportional to". When you design a model in StatSim, the program converts it to the probabilistic model $p(y, \theta)$ and then uses different algorithms to sample from the posterior distribution $p(\theta \mid y)$
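To make the proportionality concrete, here is a minimal grid approximation of a posterior (a teaching sketch, not StatSim's actual sampler; the data and grid size are ours). For a coin with unknown bias $\theta$, a flat prior, and 7 heads in 10 flips, we evaluate the unnormalized posterior $p(y \mid \theta)p(\theta)$ on a grid and normalize by its sum, which plays the role of $p(y)$:

```javascript
// Hypothetical data: 7 heads out of 10 coin flips
const heads = 7, flips = 10

// Grid of candidate parameter values theta in [0, 1]
const grid = Array.from({ length: 101 }, (_, i) => i / 100)

// Unnormalized posterior with a flat prior: p(y | theta) * const
const unnorm = grid.map(t => Math.pow(t, heads) * Math.pow(1 - t, flips - heads))

// Normalizing sum — the discrete stand-in for p(y)
const Z = unnorm.reduce((a, b) => a + b, 0)
const post = unnorm.map(u => u / Z)

// Posterior mean; the analytic answer here is the Beta(8, 4) mean, 8/12
const mean = grid.reduce((s, t, i) => s + t * post[i], 0)
console.log(mean.toFixed(3))  // ≈ 0.667
```

Grid approximation only works for very low-dimensional $\theta$, which is why real tools rely on sampling algorithms instead.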


By Anton Zemlyansky