Better Estimation of the Expected Value (Continued)

In this lesson, we will keep our restriction to distributions with support from 0.0 to 1.0 for pedagogical reasons, but our aim is to find a technique that gets us back to sampling from arbitrary distributions.

In the previous lesson, we implemented a better technique for estimating the expected value of a function f applied to samples from a distribution p:

  1. Compute the total area (including negative areas) under the function x => f(x) * p.Weight(x).
  2. Compute the total area under x => p.Weight(x).
    • This is 1.0 for a normalized PDF, or the normalizing constant of a non-normalized PDF; if we already know it, we don’t have to compute it.
  3. The quotient of these areas is the expected value.
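
The three steps above can be sketched as follows. This is a minimal Python sketch rather than the previous lesson's implementation: the helper names area and expected_value, the non-normalized weight function x * (1 - x), and the profit function f(x) = x are all assumptions chosen for illustration, with weight standing in for p.Weight.

```python
def area(g, slices=1000):
    """Left Riemann sum of g over 0.0 to 1.0 using equal-width slices."""
    width = 1.0 / slices
    return sum(g(i * width) for i in range(slices)) * width


def expected_value(f, weight, slices=1000):
    """Quotient of the two areas described in steps 1 to 3."""
    numerator = area(lambda x: f(x) * weight(x), slices)  # step 1
    denominator = area(weight, slices)                    # step 2
    return numerator / denominator                        # step 3


# Hypothetical stand-ins, not taken from the lesson:
def weight(x):
    # Non-normalized weight; its true area over 0.0 to 1.0 is 1/6.
    return x * (1.0 - x)


def profit(x):
    return x


estimate = expected_value(profit, weight)
print(estimate)
```

Because this illustrative weight function is symmetric about 0.5 and the profit function is just x, the estimate should come out very close to 0.5, which gives us an easy sanity check on the quadrature.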

Drawbacks of Using Quadrature to Get an Approximate Numerical Solution

Essentially, our technique was to use quadrature to get an approximate numerical solution to an integral calculus problem.

However, we also noted that it seems like there might still be room for improvement, in two main areas:

  • This technique only works when we have a good bound on the support of the distribution; for our contrived example, we chose a “profit function” and a distribution where we said that we were only interested in the region from 0.0 to 1.0.
  • Our initial intuition, that we should estimate “the average of many samples” by averaging many samples, still seems correct; can we get back there?

The argument that we are going to make here (several times!) is: two things that are both equal to the same third thing are also equal to each other.

Recall that we arrived at our quadrature implementation by estimating that our continuous distribution’s expected value is close to the expected value of a very similar discrete distribution. We are going to make our argument a little bit more general here by removing the assumption that p is a normalized distribution. That means that we’ll need to know the normalizing factor np, which as we’ve noted is Area(p.Weight).

We said that we could estimate the expected value like this:

  1. Imagine that we create a 1000-sided “unfair die” discrete distribution.
  2. Each side corresponds to a 0.001 wide slice from the range 0.0 to 1.0; let’s say that we have a variable x that takes on values 0.000, 0.001, 0.002, and so on, corresponding to the 1000 sides.
  3. The weight of each side is the probability of choosing this slice: p.Weight(x) / 1000 / np
  4. The value of each side is the “profit function” f(x)
  5. The expected value of “rolling this die” is the sum of (value times weight): the sum of f(x) * (p.Weight(x) / 1000 / np) over our thousand values of x.
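
To make the five steps concrete, here is a minimal Python sketch of the “unfair die” sum. The function name die_expected_value, the weight function x * (1 - x) with normalizing factor 1/6, and the profit function f(x) = x are assumptions made up for this illustration; weight stands in for p.Weight and n_p for np.

```python
SIDES = 1000  # one side per 0.001 wide slice of 0.0 to 1.0


def die_expected_value(f, weight, n_p):
    """Sum of (value times weight) over the sides of the unfair die."""
    total = 0.0
    for side in range(SIDES):
        x = side / SIDES                       # 0.000, 0.001, 0.002, ...
        side_weight = weight(x) / SIDES / n_p  # probability of this slice
        total += f(x) * side_weight            # value times weight
    return total


# Hypothetical stand-ins, not taken from the lesson:
def weight(x):
    # Non-normalized weight; its area over 0.0 to 1.0 is 1/6.
    return x * (1.0 - x)


def profit(x):
    return x


n_p = 1.0 / 6.0  # normalizing factor, Area(weight)

die_estimate = die_expected_value(profit, weight, n_p)
side_weight_total = sum(weight(s / SIDES) / SIDES / n_p for s in range(SIDES))
print(die_estimate, side_weight_total)
```

The side weights should sum to very nearly 1.0, as the weights of a discrete distribution must, and for these stand-in functions the die's expected value should land very close to 0.5.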

Here’s the trick:

  • Consider the standard continuous uniform distribution u. That’s a perfectly good distribution with support 0.0 to 1.0.
  • Consider the function w(x) which is x => f(x) * p.Weight(x) / np. That’s a perfectly good function from double to double.

Question: What is an estimate of the expected value of w over samples from u?
