3.1 Random variables & Distributions

Add random variables to your models, choose the correct probability distribution, and analyze the random output.

In [1]:
var iframer = require('./iframer')

“From where we stand the rain seems random. If we could stand somewhere else, we would see the order in it.” ― Tony Hillerman, Coyote Waits

Random variable block

To add a random variable to a model, click Add block, then choose Random variable. A new block with 5 main parts will appear in the model bar:

  1. Each random variable should have a name. StatSim automatically names a new random variable, but you can change the name at any time in the Name field.

  2. The second field determines the variable size. If this value is greater than 1, a random array is created.

  3. The third required field is a probability distribution. It defines the randomness of the variable, that is, how likely certain values are compared to others. There are more than 20 built-in distributions in StatSim. See the full list of probability distributions in WebPPL.

  4. Each distribution has its own parameters. For example, the Uniform distribution needs a and b, which set the lower and upper limits of the random variable; all values between them are equally likely. The Gaussian distribution asks for the mean mu and the standard deviation sigma.

  5. Show in output makes the variable appear in the model results. If you are not interested in the value of a specific random variable and need it only as part of an expression or something else, just clear this checkbox.

Once you have created and configured a random variable, use it in your model like any other block, such as data or an expression.
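To make the fields of the block more concrete, here is a minimal plain-JavaScript sketch of what "a distribution plus parameters plus size" means. It is not StatSim's implementation: the names drawVariable, samplers and mulberry32 are made up for this illustration, and only the Uniform and Gaussian distributions are covered.

```javascript
// Seeded pseudo-random generator (mulberry32) so the sketch is reproducible.
// Illustration only; not part of StatSim.
function mulberry32(seed) {
  let a = seed | 0
  return function () {
    a = (a + 0x6D2B79F5) | 0
    let t = Math.imul(a ^ (a >>> 15), a | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Hypothetical samplers keyed by distribution name; parameter names follow the text
const samplers = {
  // Uniform: every value between a and b is equally likely
  Uniform: ({ a, b }, rand) => a + (b - a) * rand(),
  // Gaussian: Box-Muller transform of two uniform draws
  Gaussian: ({ mu, sigma }, rand) => {
    const u1 = 1 - rand() // in (0, 1], avoids log(0)
    const u2 = rand()
    const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2)
    return mu + sigma * z
  }
}

// Draw one value of a random variable described like a Random variable block.
// size > 1 yields an array, mirroring the variable size field.
function drawVariable({ distribution, params, size = 1 }, rand = Math.random) {
  const sampler = samplers[distribution]
  if (!sampler) throw new Error('Unknown distribution: ' + distribution)
  if (size === 1) return sampler(params, rand)
  return Array.from({ length: size }, () => sampler(params, rand))
}

const rand = mulberry32(42)
const u = drawVariable({ distribution: 'Uniform', params: { a: 0, b: 5 } }, rand)
const g = drawVariable({ distribution: 'Gaussian', params: { mu: 10, sigma: 3 }, size: 4 }, rand)
console.log(u) // a single number in [0, 5)
console.log(g) // an array of 4 numbers scattered around 10
```

Passing the generator as an argument is only a convenience of the sketch: with Math.random (the default) every run gives different values, which is exactly what a stochastic model does.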

Choosing a simulation method

After you finish designing a stochastic model, there is one more thing to do before running the simulation. By default, StatSim uses the Deterministic simulation method, which simply executes the model once, just like a regular program. If you run a stochastic model with this method, the output will be a single scalar value, which is probably not what you need. In the next version StatSim will choose a simulation method automatically depending on the added blocks; however, understanding the different types of methods is still useful.

  1. Deterministic - the model is evaluated once, as a regular deterministic program.
  2. Enumeration - the app enumerates all possible execution paths and tries to find the solution analytically. It works only with discrete distributions, and even then not always.
  3. Rejection Sampling - generates samples according to the created blocks and rejects those that fail the Condition block. Each sample is independent, which is good. However, the rejection sampler can get stuck when the condition is rarely satisfied.
  4. MCMC (Markov Chain Monte Carlo) - generates samples with the Metropolis-Hastings algorithm. This method doesn't draw samples directly from the model's random variables; instead it uses their likelihoods and the Observer likelihoods (strictly speaking, log-likelihoods) to choose the next step in the parameter space.
  5. HMC (Hamiltonian Monte Carlo) - a sampling algorithm similar to MCMC. It reduces the correlation between sampled states by using Hamiltonian evolution. It's a good option when the input to the Factor block is a function of multiple variables. However, it's more sensitive to changes in the method parameters.
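The difference between Enumeration and Rejection Sampling is easier to see on a toy model. The sketch below is plain JavaScript, not StatSim code, and the model is my own assumption: two fair dice, a condition "the sum is at least 10", and the question "how likely is a 6 on the first die?". Enumeration visits all 36 outcomes and gets the exact answer; rejection sampling draws independent rolls and throws away those that fail the condition.

```javascript
// Toy model (an assumption of this sketch): two fair dice, condition "sum >= 10"
const faces = [1, 2, 3, 4, 5, 6]
const condition = (d1, d2) => d1 + d2 >= 10

// Enumeration: visit every execution path (all 36 pairs) and count exactly
function enumerate() {
  let kept = 0
  let hits = 0
  for (const d1 of faces) {
    for (const d2 of faces) {
      if (!condition(d1, d2)) continue
      kept += 1
      if (d1 === 6) hits += 1
    }
  }
  return hits / kept
}

// Rejection sampling: draw independent samples, discard those failing the condition
function rejection(nSamples, rand = Math.random) {
  const roll = () => 1 + Math.floor(rand() * 6)
  let hits = 0
  let accepted = 0
  let proposals = 0
  while (accepted < nSamples) {
    const d1 = roll()
    const d2 = roll()
    proposals += 1
    if (!condition(d1, d2)) continue
    accepted += 1
    if (d1 === 6) hits += 1
  }
  return { estimate: hits / accepted, acceptanceRate: accepted / proposals }
}

const exact = enumerate()
const approx = rejection(10000)
console.log('enumeration:', exact) // 0.5 (3 of the 6 pairs that pass the condition)
console.log('rejection estimate:', approx.estimate)
console.log('acceptance rate:', approx.acceptanceRate) // about 1/6
```

Only about one roll in six passes the condition here. With a stricter condition the acceptance rate drops further and the while loop needs more and more proposals per accepted sample, which is how a rejection sampler gets stuck.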

There are some more simulation methods in StatSim, but for the moment these 5 are enough. I won't dive into details here and will probably add a separate section later.

Example: Gaussian distribution

Let's create a simple example model containing only one random variable that goes directly to the output. After running the model, we will try to understand what all the output blocks mean. Keep in mind that random variables are not displayed in the output by default, so you should select the Display in output checkbox. Also check the simulation method: it should not be Deterministic.

In [2]:
iframer('https://statsim.com/app/?a=%5B%7B%22b%22%3A%5B%7B%22d%22%3A%22Gaussian%22%2C%22n%22%3A%22G%22%2C%22o%22%3Afalse%2C%22p%22%3A%7B%22mu%22%3A%2210%22%2C%22sigma%22%3A%223%22%7D%2C%22sh%22%3Atrue%2C%22t%22%3A0%2C%22dims%22%3A%221%22%7D%5D%2C%22mod%22%3A%7B%22n%22%3A%22GaussianTest%22%2C%22e%22%3A%22%22%2C%22s%22%3A1%2C%22m%22%3A%22rejection%22%7D%2C%22met%22%3A%7B%22sm%22%3A%2210000%22%7D%7D%5D')

The GaussianTest model contains one random variable named G with the Gaussian distribution and parameters mu equal to 10 and sigma equal to 3. Now click Run.

Trace

When StatSim finishes a simulation, it has a large number of samples from the output distributions. In the case of the GaussianTest model, those samples represent a Gaussian random variable. The first block in the output is the trace. This chart shows the values the random variable took during the simulation, in the order they were generated.

Autocovariogram

When the MCMC simulation method is used, samples are not generated independently but are chosen according to the MH (Metropolis-Hastings) algorithm. The drawback of this method is correlation between consecutive samples, which reduces the amount of information each sample carries. The autocovariogram shows how strongly the samples are correlated. As a rule of thumb, try to keep the autocorrelation chart close to 0.
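Here is a rough sketch of where that correlation comes from. It is a textbook random-walk Metropolis-Hastings sampler in plain JavaScript, not StatSim's sampler: the function names, the uniform proposal and the step values are my assumptions. The target is the Gaussian with mu 10 and sigma 3 from the example, and autocorrelation computes the sample autocorrelation at a given lag.

```javascript
// Log-density of a Gaussian(mu, sigma), up to an additive constant
const logGaussian = (x, mu, sigma) => -0.5 * ((x - mu) / sigma) ** 2

// Random-walk Metropolis-Hastings with a symmetric uniform proposal.
// step is the proposal half-width (a tuning parameter of this sketch).
function metropolisHastings(logDensity, start, nSamples, step, rand = Math.random) {
  const chain = []
  let current = start
  let currentLogP = logDensity(current)
  for (let i = 0; i < nSamples; i++) {
    const proposal = current + (2 * rand() - 1) * step
    const proposalLogP = logDensity(proposal)
    // Accept with probability min(1, p(proposal) / p(current))
    if (Math.log(rand()) < proposalLogP - currentLogP) {
      current = proposal
      currentLogP = proposalLogP
    }
    chain.push(current) // a rejected proposal repeats the current value
  }
  return chain
}

// Sample autocorrelation at a given lag: 1 at lag 0, near 0 for independent samples
function autocorrelation(xs, lag) {
  const n = xs.length
  const mean = xs.reduce((s, x) => s + x, 0) / n
  let num = 0
  let den = 0
  for (let t = 0; t < n; t++) {
    den += (xs[t] - mean) ** 2
    if (t + lag < n) num += (xs[t] - mean) * (xs[t + lag] - mean)
  }
  return num / den
}

const target = x => logGaussian(x, 10, 3)
const narrow = metropolisHastings(target, 10, 5000, 0.5)
const wide = metropolisHastings(target, 10, 5000, 6)
console.log('lag-1, narrow proposals:', autocorrelation(narrow, 1))
console.log('lag-1, wide proposals:', autocorrelation(wide, 1))
```

Each new state is either a small move from the previous one or a copy of it, so neighbouring samples are similar. Tiny steps give a lag-1 autocorrelation very close to 1; larger steps bring it down, at the cost of more rejected proposals.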

Histogram

A histogram divides the sample space into equal bins and counts how many sample values fall into each bin. Given a large sample, a histogram can provide enough information about the shape of the distribution.
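The binning itself takes only a few lines. This is a minimal sketch in plain JavaScript, assuming bins of equal width between the smallest and the largest sample; the function name histogram and the from/to/count fields are made up, and StatSim may choose the bins differently.

```javascript
// Split the range [min, max] into nBins equal bins and count samples per bin
function histogram(samples, nBins) {
  const min = samples.reduce((m, x) => Math.min(m, x), Infinity)
  const max = samples.reduce((m, x) => Math.max(m, x), -Infinity)
  const width = (max - min) / nBins
  const bins = Array.from({ length: nBins }, (_, i) => ({
    from: min + i * width,
    to: min + (i + 1) * width,
    count: 0
  }))
  for (const x of samples) {
    // The maximum value belongs to the last bin; identical samples all go to bin 0
    const i = width === 0 ? 0 : Math.min(Math.floor((x - min) / width), nBins - 1)
    bins[i].count += 1
  }
  return bins
}

const bins = histogram([1, 2, 2, 3, 3, 3, 4, 4, 5], 4)
console.log(bins.map(b => b.count)) // counts per bin: 1, 2, 3, 3
```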

PDF

In StatSim, the PDF (probability density function) is just a smoothed histogram chart.

CDF

The cumulative distribution function shows the probability that the random variable takes a value less than or equal to a specific value x.
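From a set of samples this probability can be estimated as the fraction of samples that do not exceed x. A minimal sketch in plain JavaScript (the name empiricalCdf is made up, and I'm not claiming StatSim computes the chart exactly this way):

```javascript
// Empirical CDF: the fraction of samples that are less than or equal to x
function empiricalCdf(samples) {
  const sorted = [...samples].sort((p, q) => p - q)
  return function (x) {
    // Binary search for the number of sorted values <= x
    let lo = 0
    let hi = sorted.length
    while (lo < hi) {
      const mid = (lo + hi) >> 1
      if (sorted[mid] <= x) lo = mid + 1
      else hi = mid
    }
    return lo / sorted.length
  }
}

const cdf = empiricalCdf([5, 1, 3, 2, 2])
console.log(cdf(0), cdf(2), cdf(5)) // 0 0.6 1
```

The result always starts at 0 below the smallest sample, never decreases, and reaches 1 at the largest sample.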

Summary block

Summary displays the main point estimates: quantiles, mean, standard deviation, and extrema.
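For reference, these estimates can be computed from raw samples as in the sketch below. It is plain JavaScript with made-up names (quantile, summarize), and two conventions in it are my assumptions rather than StatSim's documented behaviour: quantiles use linear interpolation between neighbouring sorted values, and the standard deviation divides by n - 1.

```javascript
// Quantile with linear interpolation between neighbouring order statistics
function quantile(sorted, p) {
  const pos = (sorted.length - 1) * p
  const lower = Math.floor(pos)
  const upper = Math.ceil(pos)
  return sorted[lower] + (sorted[upper] - sorted[lower]) * (pos - lower)
}

function summarize(samples) {
  const sorted = [...samples].sort((p, q) => p - q)
  const n = sorted.length
  const mean = sorted.reduce((s, x) => s + x, 0) / n
  // Sample standard deviation (divides by n - 1)
  const variance = sorted.reduce((s, x) => s + (x - mean) ** 2, 0) / (n - 1)
  return {
    min: sorted[0],
    q25: quantile(sorted, 0.25),
    median: quantile(sorted, 0.5),
    q75: quantile(sorted, 0.75),
    max: sorted[n - 1],
    mean,
    sd: Math.sqrt(variance)
  }
}

console.log(summarize([4, 1, 5, 2, 3]))
// min 1, q25 2, median 3, q75 4, max 5, mean 3, sd about 1.58
```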

By Anton Zemlyansky