Google Ads

Saturday, September 10, 2011

The Variance Problem

Instead of doing the introduction thing (which is, frankly, boring) I figured I would leap right in head first.

I'm sure many of you have seen Vi Hart's video on "The 12 Days of Christmas" (The Gauss Chrismath Special)  This video presents a math problem that is highly applicable to a computer programming problem I had been working on right around the same time.  Remember this video: we'll be coming back to it.  Here we go.

You are presented with four numbers and are asked to find the two numbers that are the closest to each other, and give the lowest ratio of variance according to the following formula:

The strange "v" like symbol in the denominator means "|x1| or |x2|, whichever is greater." The formula is asking to take the absolute value of the difference between the two numbers, and divide by the larger of the absolute values of the two numbers.

The question is this: what is the minimum number of comparisons to determine the minimum variance?  Since we have 4 numbers, one might be tempted to give the obvious answer: 42=16, since these four numbers can be combined in groups of two in 16 different ways.    If we were to make a table of all possible combinations, and the four numbers being compared were A, B, C, and D, it would look like this:

A,AA,BA,CA,D
B,AB,BB,CB,D
C,AC,BC,CC,D
D,AD,BD,CD,D

But there is a lot of redundancy here.  Try the numbers 4 and -6 in the formula above:

Now, let's reverse these numbers:


It doesn't matter what order they're put in.  Thus, (A,B) = (B,A). That means that we can eliminate almost half of the above table (shaded in yellow):

A,AA,BA,CA,D
B,AB,BB,CB,D
C,AC,BC,CC,D
D,AD,BD,CD,D

Also, since A=A, do we really need to compare it?  No matter what number we plug into the variance formula in both the x1 and the x2 positions (say, 4 and 4), it will always evaluate to 0.  That means that we can get rid of these "self-tests":

A,AA,BA,CA,D
B,AB,BB,CB,D
C,AC,BC,CC,D
D,AD,BD,CD,D

So, we're left with 6 groupings that we need to get the variance for.

Now, let's say we have 5 numbers we need to compare?  We'd have 52 = 25 possible combinations:

A,AA,BA,CA,DA,E
B,AB,BB,CB,DB,E
C,AC,BC,CC,DC,E
D,AD,BD,CD,DD,E
E,AE,BE,CE,DE,E

Now, going through our process above, we can eliminate many of these:

A,AA,BA,CA,DA,E
B,AB,BB,CB,DB,E
C,AC,BC,CC,DC,E
D,AD,BD,CD,DD,E
E,AE,BE,CE,DE,E

This will leave us with 10 comparisons that need to be made.  Wait a minute, we eliminated half of the tests, 2 × 10 = 20.  And we had 5 numbers in this case, 20 + 5 = 25.  Before, we had (2 × 6) + 4 = 16.  So what we've done is 2t+n=n2, where t is the number of comparisons left after taking out duplicate test and self-tests, and n is the count of numbers to be compared.  The problem here is that we know n.  We need to know t.  So let's solve for t:

Now, this is odd, because this looks very similar to Ms. Hart's formula for triangular numbers:

Which is the formula to tell you the sums of all integers between 1 and n.  Why the difference of n between P and t?  Remember above that we subtracted all of the self-tests from our group, which, when summing consecutive integers, we don't have to do.  In fact, it could be said that t = P - n.

Coming up, the VBA function that can compare any arbitrary count of variables and return the lowest variance ratio.

No comments:

Post a Comment