Non-uniform Discrete Distribution

In this lesson, we are going to extend the Bernoulli distribution to more than two possible outcomes, and discuss some mechanisms for efficiently implementing the Sample() method on this distribution.

A few lessons back we made a version of the Bernoulli distribution that took integer “odds”. That is, the distribution randomly produces either zero or one, and we can say “produce on average 2323 zeros for every 1717 ones”.


Extending Bernoulli’s Distribution

An obvious extension of this would be to produce 00, 11, or 22 randomly, similarly with some ratios: say, average 1010 zeros, 1111 ones, 55 twos out of each 2626 samples in the long run. Of course, there’s no need to stop at three outputs; basically, we want the distribution of any number from 00 to nn, with any integer ratios between them. This is a non-uniform discrete distribution (also called a “categorical distribution”); if a Bernoulli distribution is flipping an unfair coin, this is rolling an unfair nn-sided die.

Non-uniform Discrete Distribution

We’ve seen that we can do that by making an array with 1010 zeros, 1111 ones and 55 twos, and then calling ToUniform on that, but what if we wanted weights that were 437:563:501:499437:563:501:499 or something? It seems wasteful to make an array with two thousand elements just to solve this problem.

Let’s try to write the code to solve this problem without thinking too hard about it, and see where we get stuck:

public sealed class WeightedInteger : IDiscreteDistribution<int>
{
  private readonly List<int> weights;
  public static IDiscreteDistribution<int> Distribution(params int[] weights) => Distribution((IEnumerable<int>)weights);
  public static IDiscreteDistribution<int> Distribution(IEnumerable<int> weights)
  {
    List<int> w = weights.ToList();
    if (w.Any(x => x < 0) || !w.Any(x => x > 0))
      throw new ArgumentException();

    if (w.Count == 1)
      return Singleton<int>.Distribution(0);

    if (w.Count == 2)
      return Bernoulli.Distribution(w[0], w[1]);

    return new WeightedInteger(w);
  }

  private WeightedInteger(List<int> weights)
  {
    this.weights = weights;
  }

  public IEnumerable<int> Support() => Enumerable.Range(0, weights.Count).Where(x => weights[x] != 0);
  public int Weight(int i) => 0 <= i && i < weights.Count ? weights[i] : 0

}

No surprises here! This is all straightforward code like we’ve seen before.

Exercise: There are a few places where we could be a little smarter; for instance, if all the weights are zero except one, then we could make a singleton distribution for that case as well. If the last weight is zero, we can discard that weight. Can you give it a shot?

Implementing Sample

The interesting question is: how do we implement Sample? It is by no means obvious. Over the next few lessons, we’ll look at several techniques for solving this problem. In this lesson, we will learn the “inverse” technique.

Inverse Technique

How do you generate a non-uniform random number when given a standard uniform distribution? In brief:

  1. Get the probability distribution function.
  2. Integrate it to get the cumulative distribution — that is the function where c(x) is the probability that a sample is less than xx.
  3. The cumulative distribution is monotone nondecreasing and always between 00 and 11, so it has an inverse which we call the quantile function. Find it.
  4. If you sample from a standard continuous uniform distribution and then apply the quantile function to the result, the sample will conform to the desired distribution.

Read up more on this in the lesson on generating non-uniform data.

This technique works well with discrete distributions. (For discrete quantities we call the “probability distribution” function the “probability mass” function). In our case, we can simply treat the mass function as a non-normalized distribution. Here’s the non-normalized mass function for weights 10,11,510, 11, 5.

Get hands-on with 1200+ tech skills courses.