Generating Random Non-Uniform Data in C#
Uniform Probability Distribution
When building simulations of real-world phenomena, or when generating test data for algorithms that will be consuming information from the real world, it is often highly desirable to produce pseudo-random data that conform to some non-uniform probability distribution.
But perhaps I have already lost some readers who do not remember STATS 101 all those years ago. I sure don’t. Let’s take a step back.
The .NET Framework conveniently provides you with a pseudo-random number generator that produces an approximately uniform distribution. (The set of real values representable by doubles is not uniformly distributed, and the Random class is not documented as producing a uniform distribution. But in practice it is a reasonable approximation.)
By a “uniform distribution” we mean that if you took a whole lot of those random numbers and put them in two buckets, based on whether they were greater than one half or smaller than one half, then you would expect to see roughly half the numbers in each bucket; there is no bias towards either side. But moreover: if you took those same numbers and put them in ten buckets, based on whether they were in the range 0.0-0.1, 0.1-0.2, 0.2-0.3, and so on, you would also expect to see no particular bias towards any of those buckets either. In fact, no matter how many buckets of uniform size you make, if you have a large enough sample of random numbers then each bucket will end up with approximately the same number of items in it.
That’s what we mean by a “uniform probability distribution”:
The number of items you find in the bucket is proportional to the size of the bucket, and has nothing to do with the position of the bucket.
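The "size, not position" property can be checked directly. Below is a minimal sketch that measures what fraction of `Random.NextDouble()` samples (documented to fall in [0.0, 1.0)) land in a few intervals. The names `IntervalCheck` and `FractionInRange` are hypothetical, chosen only for illustration, and the seed is arbitrary.

```csharp
using System;

static class IntervalCheck
{
    // Hypothetical helper: the fraction of `samples` draws from
    // Random.NextDouble() that fall in the half-open interval [low, high).
    public static double FractionInRange(double low, double high, int samples, Random random)
    {
        int hits = 0;
        for (int i = 0; i < samples; i++)
        {
            double x = random.NextDouble(); // 0.0 <= x < 1.0
            if (x >= low && x < high)
                hits++;
        }
        return (double)hits / samples;
    }

    static void Main()
    {
        var random = new Random(2024); // fixed seed so runs are repeatable

        // Two buckets of the same width (0.1) at different positions...
        Console.WriteLine(FractionInRange(0.0, 0.1, 100000, random)); // should be close to 0.1
        Console.WriteLine(FractionInRange(0.8, 0.9, 100000, random)); // should be close to 0.1

        // ...and one bucket three times as wide.
        Console.WriteLine(FractionInRange(0.4, 0.7, 100000, random)); // should be close to 0.3
    }
}
```

The two narrow intervals catch roughly the same share even though they sit at opposite ends of the range, while the wider interval catches about three times as much.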
Here I’ve generated one hundred thousand pseudo-random numbers between zero and one, put them into fifty buckets, and graphed the number of items found in each bucket.
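An experiment of the same shape can be sketched in a few lines of C#, with the graph replaced by a text bar chart. `Histogram.Build` is a hypothetical helper, and since no seed is passed to `Random`, the exact counts differ from run to run. With a uniform source every bar should come out roughly the same length.

```csharp
using System;

static class Histogram
{
    // Hypothetical helper: sort `samples` draws from Random.NextDouble()
    // into `bucketCount` equal-width buckets covering [0, 1).
    public static int[] Build(int samples, int bucketCount, Random random)
    {
        var counts = new int[bucketCount];
        for (int i = 0; i < samples; i++)
        {
            // Bucket index is floor(x * bucketCount); the clamp guards against
            // any rounding that could push x * bucketCount up to bucketCount.
            int index = Math.Min((int)(random.NextDouble() * bucketCount), bucketCount - 1);
            counts[index]++;
        }
        return counts;
    }

    static void Main()
    {
        const int samples = 100000, bucketCount = 50;
        int[] counts = Build(samples, bucketCount, new Random());

        int max = 0;
        foreach (int c in counts) max = Math.Max(max, c);

        for (int b = 0; b < bucketCount; b++)
        {
            // Scale each bar so that the fullest bucket is 60 characters wide.
            int width = (int)(60.0 * counts[b] / max);
            Console.WriteLine($"{(double)b / bucketCount:F2} {counts[b],5} {new string('#', width)}");
        }
    }
}
```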