First off, it should be stated that there are no exact cutoffs in probability and statistics. As I explain in Ballparking, even a career .200 hitter like Mario Mendoza has a (small) chance of hitting .400 over the course of an entire season. There's no magic number above which we can definitively say, "These results are statistically significant." Fluctuations happen in any sample size no matter how large. That said, if we have a random sampling of statistically independent events, we can make definitive statements like the following:[2]
There's a 95% chance that Jose Iglesias's average over his next 92 at bats will be between X and Y.
Here, X and Y define what's called a confidence interval. We have limited data, but given the information we do have, we're 95% certain that Iglesias's batting average over the next 92 at bats will fall between the two numbers X and Y. What are those two numbers? Wikipedia's entry for sample size determination gives a good description of how to calculate them. The width W of the confidence interval is given by
where n is the sample size, i.e. the number of at bats. Since Iglesias has had 92 at bats so far, we have W = 0.045. Iglesias is currently batting .435. If we believe Iglesias's stats represent a random unbiased sample, then we would expect there to be a 95% chance Iglesias's next 92 at bats will give a batting average between .412 and .457.
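For readers who want to play with the numbers, here is a minimal Python sketch of the textbook normal-approximation interval for a proportion. It is not necessarily the formula used in this post, so its output need not match the W = 0.045 quoted above; under these assumptions the interval comes out noticeably wider. The function names are made up for illustration, and the 40 hits is an assumption inferred from a .435 average over 92 at bats.

```python
import math

def wald_interval(hits, at_bats, z=1.96):
    """Normal-approximation (Wald) interval for a batting average.

    Treats each at bat as an independent trial with a fixed chance of a
    hit, which is an idealization of real baseball.
    """
    p_hat = hits / at_bats
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / at_bats)
    return p_hat - half_width, p_hat + half_width

def conservative_width(at_bats, z=2.0):
    """Worst-case total width, using the bound p * (1 - p) <= 0.25."""
    return 2 * z * math.sqrt(0.25 / at_bats)

if __name__ == "__main__":
    # Assumption: 40 hits in 92 at bats, which rounds to a .435 average.
    low, high = wald_interval(40, 92)
    print(f"approximate 95% interval: {low:.3f} to {high:.3f}")
    print(f"worst-case total width:   {conservative_width(92):.3f}")
```

The z value of 1.96 corresponds to 95% under the normal approximation; the worst-case version rounds it up to 2 and assumes the widest possible spread, so it gives an upper bound on the width rather than an estimate.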
Despite the fact that Iglesias's average is almost certainly a random fluctuation, the shortstop still shows a lot of promise. If we look at his 2012 season, during which he hit an abysmal .118 in 25 games, we notice he still nets a positive 0.3 WAR. Extended over a 162-game season, that works out to a not-terrible WAR of 1.9. Why? His defensive capabilities more than adequately compensate for poor hitting. Over his career, he's averaging 5.4 WAR per 162 games, which is more than double the 2.6 WAR averaged by current starting shortstop Stephen Drew and over seven times the 0.7 WAR averaged by current starting third baseman Will Middlebrooks. Even given the small sample size, it's tough to argue that Iglesias doesn't deserve a spot in the starting lineup.
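The per-162-game scaling above is simple proportional arithmetic. As a rough sketch (the helper name per_162 is made up for illustration, and scaling linearly assumes a player would contribute at the same rate over a full season):

```python
def per_162(war, games):
    """Scale a WAR total linearly to a 162-game season."""
    return war * 162 / games

if __name__ == "__main__":
    # 2012 season figures quoted above: 0.3 WAR in 25 games.
    print(f"2012 pace: {per_162(0.3, 25):.1f} WAR per 162 games")

    # Career per-162 averages quoted above.
    iglesias, drew, middlebrooks = 5.4, 2.6, 0.7
    print(f"Iglesias vs. Drew:        {iglesias / drew:.1f}x")
    print(f"Iglesias vs. Middlebrooks: {iglesias / middlebrooks:.1f}x")
```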
If you like math and sports or know someone who does, make sure to check out my book Ballparking: Practical Math for Impractical Sports Questions.
Aaron Santos is a physicist and author of the books How Many Licks? Or How to Estimate Damn Near Anything and Ballparking: Practical Math for Impractical Sports Questions. Follow him on Twitter at @aarontsantos.
[1] "WAR" is one of those newfangled stats that sabermetricians like to throw around. It stands for "wins above replacement" and is supposed to represent the number of extra wins a player is expected to contribute compared to a standard replacement player.
[2] Strictly speaking, it's a bit more complicated if we're talking about actual baseball players rather than mathematical probability distributions. For example, a player's theoretical batting average is not constant over time. It can increase or decrease depending on the player's age or health.
I think a comment like "There's a 95% chance that Jose Iglesias's average over his next 92 at bats will be between X and Y" is nonsense. The best you can do is say "if someone offered me 19:1 odds to bet that he'd be outside of [X,Y] I'd be indifferent to which side of the bet I'd want."
It's certainly a frequentist approach. The implication is that "in a long sequence of identical stretches of 92 at bats, Iglesias would hit outside of [X,Y] in approximately 5% of them" or even "in an infinite sequence of identical stretches of 92 at bats the percentage of them outside of [X,Y] would approach 5%." There are SO many assumptions built into such a statement and none of them are demonstrably true or can even be reasonably assumed to be true.
Probability is one of those subjects (like entropy perhaps?) that's easy to talk about and run numbers on in a facile way but extraordinarily deep if you try to pin it down and truly understand what you're describing.
Thank you, Aaron, for getting even deeper into this question. I have one thought to add: Iglesias is also benefiting from being put in the lineup on nights when he is more likely to succeed (i.e. against left-handed pitchers). So his ABs might be statistically random but very selective in actual games. Now that he is platooning more, we'll see if the AVGs of the three players you mentioned level off. In the end, I just hope whoever is playing is hitting.