Perhaps the most used and certainly the most overused phrase in sports in 2014, particularly in baseball, is the punchy little three-word alliteration “small sample size.”
With the magical utterance or writing of those words, a slump or a hot streak can be dismissed, something that looks to be a trend can be brushed aside and every performance can be saddled with a caveat.
You’ve probably heard it by now. And maybe you wonder, like I do, what exactly it means.
In basic terms, it’s easy to discern. Joe Mauer was 3-for-4 with a walk Wednesday, meaning his on-base percentage was .800 for the day. If he can keep up that pace for the rest of the year, he will be the American League MVP and cement his place in Cooperstown. We know, of course, that he won’t do that. Extrapolating one game over the course of a 162-game season is fun, but it’s not real.
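For anyone who wants to check the math on that .800, here is a minimal sketch using the standard on-base percentage formula. The function name and its parameters are my own, chosen for illustration; only the 3-for-4-with-a-walk line comes from Wednesday's game.

```python
def on_base_percentage(hits, walks, at_bats, hit_by_pitch=0, sac_flies=0):
    """Times on base divided by the plate appearances that count toward OBP."""
    times_on_base = hits + walks + hit_by_pitch
    opportunities = at_bats + walks + hit_by_pitch + sac_flies
    if opportunities == 0:
        raise ValueError("no plate appearances to measure")
    return times_on_base / opportunities


# Wednesday's line: 3-for-4 with a walk, so 4 times on base in 5 trips.
mauer_wednesday = on_base_percentage(hits=3, walks=1, at_bats=4)
print(f"{mauer_wednesday:.3f}")  # 0.800
```

Five plate appearances is all it takes to produce that number, which is exactly why nobody should read much into it.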
But at what point does a small sample size become an adequate sample size? The question was posed on Twitter, and I was told that entire statistical books have been written about such things.
Fortunately, Fangraphs.com has summarized those principles and applied them specifically to baseball. The site refers to “stabilization points,” the number of at-bats, plate appearances, etc., it generally takes before a sample size becomes indicative of a trend going forward. Within that summary, though, we find wildly divergent points.
For example, Fangraphs finds that a player’s strikeout rate will stabilize after just 60 plate appearances. But on-base percentage takes 460 plate appearances to stabilize. It’s fascinating stuff, but it also raises more questions. Forgive me, but Mauer offers the most interesting data to explore those questions.
Mauer has more than 5,000 career plate appearances, meaning his entire career is a massive sample size. This season, he has a little more than 250 plate appearances, a big enough sample to project some of his slump-ridden stats but not others.
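To see how that splits, here is a rough sketch that compares a plate-appearance total against the two Fangraphs stabilization points mentioned above. The names are hypothetical, the 250 is an approximation of "a little more than 250," and treating a stabilization point as a hard cutoff is a simplification on my part, not how Fangraphs describes it.

```python
# Stabilization points cited above, in plate appearances.
STABILIZATION_POINTS = {
    "strikeout rate": 60,
    "on-base percentage": 460,
}


def has_stabilized(plate_appearances, points=STABILIZATION_POINTS):
    """Map each stat to True if the sample has reached its stabilization point."""
    return {
        stat: plate_appearances >= threshold
        for stat, threshold in points.items()
    }


MAUER_2014_PA = 250  # assumption: rounded down from "a little more than 250"

status = has_stabilized(MAUER_2014_PA)
for stat, stable in status.items():
    verdict = "big enough sample" if stable else "still a small sample"
    print(f"{stat}: {verdict}")
```

By that yardstick, the 2014 strikeout rate is worth taking seriously, while the 2014 on-base percentage is still a small sample.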
The larger question is which sample to trust: the career numbers that suggest Mauer is one of the best hitters in baseball or the 2014 numbers that, with Mauer getting older and striking out more, suggest he is an average hitter.
In the smallest sample size (Wednesday’s game), Mauer is red hot. In the 2014 sample, he’s struggling. In the career sample, he’s great.
This is probably why former Gophers coach Glen Mason was fond of saying, “Figures lie, and liars figure.”