What Is the Average Value as the Number of Samples Increases?
In the previous lesson, we reviewed the meaning of “expected value”: when you take a whole bunch of samples from a distribution and apply a function to each of them, what is the average of the function’s value as the number of samples gets large?
Revisiting the Naive Implementation of Expected Value
We gave a naive implementation:
public static double ExpectedValue<T>(
    this IDistribution<T> d,
    Func<T, double> f) =>

public static double ExpectedValue(
    this IDistribution<double> d) =>
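The bodies of the two methods are cut off in the listing above. As a minimal sketch of what a naive estimator of this kind could look like, here is a self-contained version. Everything beyond the two signatures is an assumption made for illustration: the one-method shape of IDistribution<T>, the Samples helper, the SeededUniform test distribution, and the sampleCount parameter, which stands in for the hard-coded count so that the arbitrary choice is at least visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the distribution interface; only Sample() is needed here.
public interface IDistribution<T>
{
    T Sample();
}

public static class NaiveExpectedValue
{
    // Hypothetical helper: an endless stream of samples drawn from d.
    public static IEnumerable<T> Samples<T>(this IDistribution<T> d)
    {
        while (true)
            yield return d.Sample();
    }

    // Average f over the first sampleCount samples.
    // Average() throws if sampleCount is zero, so callers must pass a positive count.
    public static double ExpectedValue<T>(
        this IDistribution<T> d,
        Func<T, double> f,
        int sampleCount) =>
        d.Samples().Take(sampleCount).Select(f).Average();

    // Convenience overload for distributions that already produce doubles.
    public static double ExpectedValue(
        this IDistribution<double> d,
        int sampleCount) =>
        d.ExpectedValue(x => x, sampleCount);
}

// Hypothetical test distribution: uniform on [0, 1), seeded for repeatability.
public sealed class SeededUniform : IDistribution<double>
{
    private readonly Random random;
    public SeededUniform(int seed) => random = new Random(seed);
    public double Sample() => random.NextDouble();
}
```

Calling new SeededUniform(42).ExpectedValue(10) and then the same thing with a count of 100000 is a quick way to see how much the estimate depends on the count: the small run can land noticeably away from one half, while the large run should sit very close to it.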
Issues with the Naive Approach
Though short and sweet, this implementation has some problems; the most obvious one is the hard-coded sample count in there; where did it come from? Nowhere in particular, that’s where.
It seems highly likely that this count is either too big, in which case we could compute a reasonable estimate with fewer samples, or too small, in which case we’re missing some important values.
Let’s explore a scenario where the number of samples is too small. The scenario will be contrived but not entirely unrealistic.
Let’s suppose we have an investment strategy; we invest a certain amount of money with this strategy, and when the strategy is complete, we have either more or less money than we started with. To simplify things, let’s say that the “outcome” of this strategy is just a number in a fixed range: the low end of the range indicates that the strategy has completely failed, resulting in a loss, and the high end indicates that the strategy has completely succeeded, resulting in the maximum possible return.
Before we go on, we want to talk a bit about that “resulting in a loss” part. If you’re a normal, sensible investor with some money to invest, you buy a stock or a mutual fund with it because you believe it will increase in value. If the price goes up, you sell and pocket the profit. If it goes down, you sell and take the loss. But in no case do you ever lose more than the amount you invested. (Though of course, you do pay fees on each trade whether it goes up or down; let’s suppose those are negligible.) Our goal is to get a positive return on that investment.
Now consider the following much riskier strategy for speculating in the market: suppose there is a stock which we believe will go down, not up. We borrow a hundred shares from someone willing to lend them to us and sell them at the current price. We pay the lender interest for their trouble; that interest is the money we have actually spent. Now if the stock goes down, we repurchase a hundred shares at the lower price and return them to the owner. If the difference between the sale and the repurchase is larger than the interest, we’ve made a return on the money we “invested”. This is a “short sale”, and as you can see, its return is measured against the interest paid rather than the full price of the shares, so it can be far larger than the return on the long strategy.
(We say “invested” in scare quotes because this isn’t investing; it’s speculation. Which is a genteel word for “gambling”.)
But perhaps you also see the danger here. Suppose the stock goes down, but only slightly. We buy back the shares, and we’ve gained only a little on the trade, not enough to cover what we spent, so we’ve lost half of our stake; in the “long” strategy, we would only have lost fifty cents. Your losses can easily be bigger with the short strategy than with the long strategy.
But worse, what if the stock goes up? We have to buy back those shares at the higher price, so the “return” on what we “invested” is negative, and the total loss can be larger than the amount we put in. In a short sale, if things go catastrophically wrong, you can end up losing more than your original investment. A lot more!
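To make the arithmetic of the two strategies concrete, here is a small sketch of the profit calculations. The class name, method names, and parameters are ours, chosen for illustration, and fees other than the lender’s interest are ignored, as in the discussion above.

```csharp
using System;

public static class TradeOutcomes
{
    // Long position: buy shares now, sell them later.
    // The worst case is a sale price of zero, so the loss never exceeds buyPrice * shares.
    public static decimal LongProfit(decimal buyPrice, decimal sellPrice, int shares) =>
        (sellPrice - buyPrice) * shares;

    // Short position: sell borrowed shares now, buy them back later,
    // and pay the lender a fee (the "interest" in the text).
    // There is no upper limit on buybackPrice, so the loss is unbounded.
    public static decimal ShortProfit(
        decimal sellPrice, decimal buybackPrice, int shares, decimal interest) =>
        (sellPrice - buybackPrice) * shares - interest;
}
```

With made-up numbers (not the ones from the example above): shorting 100 shares at 10 with 10 in interest gives ShortProfit(10m, 9m, 100, 10m) = 90 if the price falls to 9, ShortProfit(10m, 9.95m, 100, 10m) = -5 if it falls only to 9.95, and ShortProfit(10m, 12m, 100, 10m) = -210 if it rises to 12. That last loss is twenty-one times the 10 we put in.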
As we often say, foreshadowing is the sign of a quality blog; let’s continue with our example.
We have a process that produces values in that range, each indicating the “success level” of the strategy. What is the distribution of success level? Let’s suppose it’s a straightforward bell-shaped curve with the mean on the “success” side: