I am going to be blogging a lot about football, but understand that it is almost never going to be about football per se. Football, having become our national pastime, if not an obsession, tells us a lot about our culture, and the psychological, social, and philosophical issues it raises often run far deeper than any set of X’s and O’s could.
For example, every year in college football, the debate rages: which state has the most high school talent? While some diehards who recall players named Montana and Marino might lobby for Pennsylvania, or maybe even Ohio, the consensus these days is that the Holy Trinity of college football recruiting is Texas, Florida, and California. I see the debate, however, as a lesson in unconsciously biased samples, and in how easy it is to get fooled by them.
The debate then tends to focus on comparing the college programs of those states as a proxy for state talent. Typical were the comments from the announcers of today’s Texas/Central Florida game, who touted the various successful programs in Florida as evidence of the superiority of Florida high school talent. After all, college students rarely travel too far from home to get their education. Ohio State tends to draw a majority of its players from Ohio or the surrounding states, as USC does from California and UT does from Texas. So having good college teams in one’s state is treated as a strong proxy for high school talent generally. However, when it comes to deciding between the top states, there is a serious flaw in this analysis that I’ve never seen discussed, and it has to do with the geography of the states in question. To illustrate the problem, consider the likely results of an analysis consistent with the reasoning above. The states would probably come out something like this:
Now consider the geography of these states, and one fact leaps off the page: the more coastline a state has, and the less centrally located it is within the 48 states, the better it does. This is the inevitable result of the limited distances college athletes are willing to travel from home to go to school, combined with interstate recruiting. Mathematically, the more out-of-state schools there are within X miles of a state's high school athletes, the more likely it is that those athletes will play for a school in another state. Thus, the optimal position for a state with a lot of homegrown talent that wants to keep it playing ball at home would be on the edge of the map, preferably, ahem, a peninsula. Now look at the list again, and think about how much more limited the choice of schools is for a kid out of Miami, versus one in Los Angeles, versus one in Dallas. The Dallas kid is surrounded by out-of-state options, the LA kid slightly less so, and the Miami kid has to travel hundreds of miles before reaching ANY options outside of the state. Ohio and Pennsylvania are so centrally located that, well, there is no stopping the bleeding of talent beyond their borders.
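To make the arithmetic concrete, here is a rough sketch in Python. It rests on a deliberately crude assumption that is mine, not anything measured: every recruit picks uniformly at random among the schools within range of home. The three states, their talent totals, and their school counts are all made up for illustration. Each state produces exactly the same amount of talent and has the same number of in-state schools, so the only thing that differs is how many out-of-state schools are within reach.

```python
from dataclasses import dataclass


@dataclass
class State:
    name: str
    talent: int                # hypothetical number of top recruits produced
    in_state_schools: int      # hypothetical in-state schools within range
    out_of_state_schools: int  # hypothetical out-of-state schools within range


def retention_rate(state: State) -> float:
    """Share of recruits who stay home, assuming each recruit picks
    uniformly at random among all schools within range (an assumption)."""
    total = state.in_state_schools + state.out_of_state_schools
    return state.in_state_schools / total


def retained_talent(state: State) -> float:
    """Expected number of homegrown recruits playing for in-state schools."""
    return state.talent * retention_rate(state)


# Made-up states with identical talent and identical in-state options.
states = [
    State("Peninsula", talent=100, in_state_schools=4, out_of_state_schools=1),
    State("Coastal", talent=100, in_state_schools=4, out_of_state_schools=3),
    State("Central", talent=100, in_state_schools=4, out_of_state_schools=8),
]

for s in states:
    print(f"{s.name}: keeps {retention_rate(s):.0%} of its talent, "
          f"about {retained_talent(s):.0f} of {s.talent} recruits")
```

Under these invented numbers, the peninsula keeps the largest share of its recruits and the central state the smallest, even though all three produce identical talent. Anyone ranking the states by the strength of their college teams would be measuring the map at least as much as the high schools.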
So essentially, the uneven geography biases the correlation between high school talent and college success. It tilts the analysis in favor of talent-rich states that are the most geographically isolated, and against those with a geographically central location. Florida college teams aren’t so good merely because Florida has superior homegrown talent. They also keep far more of that talent at home than Texas or Ohio ever could, because the kids have fewer options. This is the sort of bias that can exist in many proxies, and it is important to think about what those biases might be to avoid erroneous conclusions.