Talking About Sample Sizes

This is going to come off a bit self-indulgent, but I think it needs to be put out there.

When I evaluate a player, I start with the statistics, generally the catch-all ones like wOBA, TAv, and OPS, before moving into the narrower metrics. From there, I move on to the information on the player’s Baseball Prospectus page; the prior annual comments are brilliant for a timeline of a player’s progress or decline. At this point, I try to talk to someone who knows the player or the team the player is coming from. This isn’t always an option, but I try to take advantage of first-hand knowledge when I can.

The key to evaluating players from the outside probably isn’t too different from evaluating them from the inside: either way, information is king. The Rays have access to more (and better) information than Tommy and I do, but we can make do with what’s available. As is the case with any information, you have to filter out the biases involved. This can be easier with people’s opinions, particularly those of people you know well, than with numbers, which sometimes come in a variety of small sample sizes.

Here’s the thing, though: not all sample sizes are created equal.

Let me provide an example before moving on to my real grievance. Let’s say you were asked to identify who the better-hitting catcher is between John Jaso and Jeff Mathis. If the anonymous questioner only gave you 2011 data, then you would see that Jaso is hitting .167/.205/.286 (44 plate appearances) while Mathis entered Monday night’s game with a slash line of .214/.227/.381 (46 plate appearances). If you had no previous knowledge, no ability to look up old information, nothing but 2011 data, then you might say Mathis, but alas, you are rarely that restricted.

A quick Baseball-Reference search shows that Jaso has a career .251/.352/.363 line (in 458 plate appearances) while Mathis has hit .201/.262/.308 since 2008 started (864 plate appearances). The question then becomes this: what is more likely, a bad hitter having a good stretch for 458 plate appearances or a good hitter having a bad stretch for 864 plate appearances? My guess would be the former, since smaller samples fluctuate more, but you still have more data to digest. You can look at minor league lines, scouting reports, and peripheral numbers that may point to a harsher regression for one player than for the other, and on and on until you have a vivid picture built from all the available information.
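To put a rough number on how much more a small sample can wobble, here is a minimal sketch. It assumes each plate appearance is an independent trial with a fixed chance of reaching base, which is a simplification, and the .320 on-base rate is a hypothetical round figure rather than anyone's actual talent level. The function name is mine, not something from a stats package.

```python
import math


def obp_standard_error(true_obp: float, plate_appearances: int) -> float:
    """Binomial standard error of an observed on-base rate.

    Assumes independent plate appearances with a constant true rate,
    which real hitters only approximate.
    """
    return math.sqrt(true_obp * (1.0 - true_obp) / plate_appearances)


# Hypothetical true on-base rate, chosen only for illustration.
ASSUMED_TRUE_OBP = 0.320

# Sample sizes mentioned above: a 2011 line and the two longer track records.
for pa in (44, 458, 864):
    se = obp_standard_error(ASSUMED_TRUE_OBP, pa)
    print(f"{pa:>4} PA: observed OBP typically within +/- {se:.3f} of the true rate")
```

Under those assumptions the spread at 44 plate appearances is several times wider than at 458 or 864, which is the whole reason the 2011 lines alone tell you so little.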

Here’s my real point of contention. It comes from a comparison Heath Baywood—a good friend and a brilliant colleague—made a week or two ago about Elliot Johnson’s treatment versus that of Reid Brignac. Both have struggled offensively to start the season—Johnson is hitting .200/.286/.280 through 30 plate appearances and Brignac is hitting .245/.288/.245 through 52 plate appearances—and both have played shortstop, but otherwise, the comparison doesn’t work.

It doesn’t work because we have more information available to us than the 2011 data. Using that information, we come to learn that Brignac is nearly two full years younger, that he had relative success in 2010, that he hit roughly the same as Johnson in Triple-A despite the age gap and spending less time there, that Brignac has the pedigree of a highly touted prospect whereas Johnson fell off most lists, and even that Johnson cleared waivers prior to last season. Basically, we learn that there is every reason to believe Brignac is the better hitter, despite looking worse than Johnson in a slightly larger sample size.

If Brignac’s struggles continue through 100 plate appearances, you get annoyed; if Johnson’s do, you get antsy. If Brignac’s run through 400 plate appearances, you get concerned; if Johnson’s do, he gets fired. Maybe that seems unfair, but based on all the information available, nobody can honestly argue that Johnson’s talent level is more assured than Brignac’s, and that’s exactly why their sample sizes aren’t equal.
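One way to make that concrete is to blend what a player just did with what you believed about him beforehand. The sketch below is a simple weighted average (a beta-binomial style shrinkage), not a method any team or projection system is confirmed to use, and every number in it is hypothetical: the prior rates, the prior weights, and the 50-plate-appearance lines are invented for illustration and are not Brignac's or Johnson's figures.

```python
def estimate_true_rate(
    prior_rate: float,
    prior_weight_pa: float,
    times_on_base: int,
    plate_appearances: int,
) -> float:
    """Blend a prior belief with an observed sample.

    prior_weight_pa says how many plate appearances the prior is "worth":
    the more you already know about a player, the larger it is, and the
    less a new small sample moves the estimate.
    """
    return (prior_rate * prior_weight_pa + times_on_base) / (
        prior_weight_pa + plate_appearances
    )


# Two hypothetical players with the identical 50 PA slump (14 times on base).
# Player A: strong track record, so a high prior rate with a heavy weight.
player_a = estimate_true_rate(0.330, 600, times_on_base=14, plate_appearances=50)
# Player B: thinner, weaker track record, so a lower prior with less weight.
player_b = estimate_true_rate(0.300, 200, times_on_base=14, plate_appearances=50)

print(f"Player A estimate: {player_a:.3f}")
print(f"Player B estimate: {player_b:.3f}")
```

Same 50 plate appearances, different conclusions: the player you already know a lot about barely moves, while the one with the shakier resume gets dragged further toward the slump.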

This isn’t just a Brignac-Johnson thing either (although I do believe in Brignac’s ability more than Johnson’s), as the two major takeaways apply regardless of the player. Those are: 1) use all of the information available, because ignoring data isn’t something to be proud of if you want your analysis taken seriously, and 2) keep in mind that 50 good (or bad) plate appearances for one player are not always equal to 50 good (or bad) plate appearances for another, at least not in terms of true talent level and expectations going forward.
