Tuesday, July 24, 2007

Induction and Science: post hoc analysis and evidence (and how not to pick the Super Bowl winners)

In discussions of science with creationists, they will often note that inductive logic is part of the scientific method. "A watch implies a watchmaker" is a piece of induction: here are some parts and some gears that seem to be assembled for some purpose, so a designer is inferred. They argue this is equivalent to what scientists do when interpreting fossils, or the geologic column, or a wide variety of other evidence.

However, these are not the same at all. Induction is an important part of science, true. It is often how we construct our hypotheses. Religion and pseudoscience stop at this step, however, which is why they are not science. Science goes one step further and tests its induction with evidentiary experimentation, i.e., some procedure that could in theory produce data contrary to the hypothesis. Hypotheses that are based on past data, even a good amount of it, are nowhere near as powerful as those that have been put through the falsification wringer. I will illustrate why using polynomials that fit a sequence of numbers.

It is a mathematical fact that given any sequence of n integers, a polynomial of degree at most n-1 exists that will produce the sequence given inputs of 1, 2, 3, etc. For example, the sequence:

2, 4, 6

can be represented by the function F(x) = 2x. However, it is also true that the number of polynomials that will produce said sequence is infinite. G(x) = 2x + (x-1)(x-2)(x-3) will also produce 2, 4, 6, as will H(x) = 2x - 5*(x-1)(x-2)(x-3).
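Both claims are easy to check by machine. The following is a minimal Python sketch, not part of the original argument: the helper name `lagrange_eval` and the sample sequence 3, 1, 4, 1, 5 are my own illustrative choices. It evaluates the interpolating polynomial in Lagrange form using exact fractions, and confirms that F, G, and H all produce 2, 4, 6.

```python
from fractions import Fraction

def lagrange_eval(values, x):
    """Evaluate at x the polynomial of degree <= n-1 that takes
    values[0], values[1], ... at inputs 1, 2, ..., n (Lagrange form)."""
    n = len(values)
    total = Fraction(0)
    for i in range(1, n + 1):
        term = Fraction(values[i - 1])
        for j in range(1, n + 1):
            if j != i:
                # This factor is 1 when x == i and 0 when x == j.
                term *= Fraction(x - j, i - j)
        total += term
    return total

def F(x):
    return 2 * x

def G(x):
    return 2 * x + (x - 1) * (x - 2) * (x - 3)

def H(x):
    return 2 * x - 5 * (x - 1) * (x - 2) * (x - 3)

# All three formulas agree on the first three inputs.
for name, f in [("F", F), ("G", G), ("H", H)]:
    print(name, [f(x) for x in (1, 2, 3)])  # each prints [2, 4, 6]

# An arbitrary sequence (chosen only for illustration) is reproduced too.
arbitrary = [3, 1, 4, 1, 5]
print([int(lagrange_eval(arbitrary, x)) for x in range(1, 6)])  # [3, 1, 4, 1, 5]
```

The extra term (x-1)(x-2)(x-3) is zero at x = 1, 2, and 3, so any multiple of it can be added to 2x without disturbing the first three outputs. That is where the infinite family comes from.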

So, if there are infinitely many possible solutions, then how can we know if we have the correct formula? Why, by predicting the next number in the sequence and seeing if we are correct. F(x) predicts 8 (2 × 4). Likewise, G(x) predicts 14, and H(x) predicts -22. Now if we look to the sequence and see:

2, 4, 6, 8

we now have some confirmation of the correctness of F(x), and may discard H(x) and G(x). This is analogous to scientific confirmation. Without it, all the just-so stories, whether they be about UFOs, or creationism, or 9/11 conspiracies, are no more valuable than G(x) and H(x) were. It doesn't matter how much they "explain", because it's EASY to come up with a theory AFTER THE FACT to explain things. Until the theory starts making accurate predictions, it is just one of an infinite number of contenders.
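The predict-then-check step can be sketched in a few lines of Python. This is only an illustration; the names `candidates`, `predictions`, and `survivors` are mine.

```python
# Three candidate formulas that all fit the known data 2, 4, 6.
candidates = {
    "F": lambda x: 2 * x,
    "G": lambda x: 2 * x + (x - 1) * (x - 2) * (x - 3),
    "H": lambda x: 2 * x - 5 * (x - 1) * (x - 2) * (x - 3),
}

# Each candidate commits to a prediction BEFORE the fourth number is seen.
predictions = {name: f(4) for name, f in candidates.items()}
print(predictions)  # {'F': 8, 'G': 14, 'H': -22}

# The observation arrives; candidates that predicted wrongly are discarded.
observed_fourth = 8
survivors = [name for name, p in predictions.items() if p == observed_fourth]
print(survivors)  # ['F']
```

All three candidates explained the first three numbers equally well. Only the risky prediction separates them.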

But validation never ends in science, even in theory. That's why talk of evolution being "proved" is nonsense. Go back to F(x) above. It correctly predicted the 4th number in the sequence would be 8. Does that mean it has been proved to be the solution? No, because we still have the same problem we had before, except with four data points instead of three. There is now a new infinite set of potential formulas that could account for 2, 4, 6, 8. J(x) = 2x + (x - 1)(x - 2)(x - 3)(x - 4) also produces that sequence, but unlike F(x), which predicts 10 for the 5th element, J(x) predicts 34. The only thing F(x) has going for it over J(x) is one confirmed prediction. That's right, even though F(x) and J(x) both explain just as much history, F(x) is more likely to be right. Ponder that the next time someone starts prattling on about how much creationism explains.
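To see how cheaply rivals like J(x) can be manufactured, here is a small Python sketch. The helper `rival` and its parameter `c` are hypothetical names of my own; it builds a polynomial that agrees with F on the first n inputs and disagrees afterward.

```python
from math import prod

def F(x):
    return 2 * x

def rival(c, n):
    """Return a polynomial that matches F at x = 1..n but, for any
    nonzero c, differs from F at every other integer input."""
    def poly(x):
        # The product is zero exactly at x = 1, 2, ..., n.
        return F(x) + c * prod(x - k for k in range(1, n + 1))
    return poly

J = rival(1, 4)  # J(x) = 2x + (x-1)(x-2)(x-3)(x-4)
print([J(x) for x in range(1, 5)])  # [2, 4, 6, 8]
print(F(5), J(5))                   # 10 34

# Every nonzero c yields another rival with its own 5th-term prediction.
print([rival(c, 4)(5) for c in (-2, -1, 1, 2)])  # [-38, -14, 34, 58]
```

No matter how many terms have been confirmed, the same trick works with a larger n, which is why confirmation raises confidence without ever amounting to proof.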

The more confirmatory data points, the more confidence in our theories. However, no matter how many confirmations we get, no matter how many numbers in the sequence are correctly predicted, it remains possible that the theory we have been using is incorrect. Still, theories like evolution are treated roughly as fact by laymen and scientists alike (even though they are scientific theories), because they have been confirmed countless times. Contrast this with hypotheses like Intelligent Design, which make no predictions, but instead look at known data and claim they knew it was J(x) all along.

To end on a humorous note, this has applications in gambling as well. Many sportscasters come up with all sorts of goofy after-the-fact theories on how to predict winners. The late Pete Axthelm was the master of this, as was a local favorite here, Norm Hitzges, giving us stats like "The Raiders are 4-0 the last four times they played an AFC Central opponent on grass after a loss." HUH? That means diddly. That's just one of the many patterns one can find if one looks hard enough, and it is amazing how consistently badly these theories perform once codified.

None is better than my personal favorite for predicting Super Bowl winners. As of 1997 this strategy was 15-4 in picking the winner. 15-4!!! That's a 79% win rate. Surely there is something to this theory, whatever it is. Brace yourself, here it is:

When a human mascot plays an animal mascot, bet on the human.

Yes, you read that right. With all the victories of the Giants, Cowboys, 49ers and Redskins over the Dolphins, Bengals, Bills, and Broncos, the pattern was clear. But guess what happened after that? Since then (thanks to the Patriots) the theory has gone only 4-3, solidly within chance. In other words, it's nonsense, regardless of what the past shows. However, if you disagree and want to bet this way in the next Super Bowl, let me know. I'll expect appropriate odds.
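For the curious, the records above can be put in rough numbers with a short Python sketch. It rests on a loud assumption of mine that is not in the post's data: every game is treated as an independent fair coin flip. The helper name `prob_at_least` is likewise my own.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Probability of k or more wins in n independent games, each won
    with probability p (binomial tail). ASSUMPTION: coin-flip games."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(round(15 / 19, 3))      # 0.789, the 79% quoted for the 15-4 record
print(prob_at_least(4, 7))    # 0.5, a 4-3 record or better is a coin toss
print(prob_at_least(15, 19))  # roughly 0.0096
```

The last number looks impressive only if the rule had been written down before any of the 19 games were played. It was not: it was picked, after the fact, out of the countless patterns one could have looked for, which is exactly the G(x) and H(x) problem.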
