BABIP: The Pokes, The Quails, and The Inexplicable .410 Batting Average through April

Everybody knows a hit is a hit. Hits advance baserunners, they drive in runs, they kick up grass and dirt and miss gloves.

Now, batting average (AVG) is the traditional measure of a baseball player's skill. I personally don't subscribe to this method of evaluation, but for the sake of this article, we're only going to deal with batting average and therefore hits. Not worrying about walks, not worrying about homers, not worrying about strikeouts.

Building off of that last sentence, we should define a key concept: Three True Outcomes. The Three True Outcomes are Strikeout, Walk, and Home Run. The reason these are considered "true" over any other possible outcome is the fact that all 3 are strictly Pitcher-Batter interactions.

If Tim Lincecum punches out Adam Dunn, it had absolutely nothing to do with how Fred Lewis positioned himself.

If Oliver Perez walks Adam Dunn however, does it matter that Luis Castillo wasn't properly aligned for the shift?

Finally, if Adam Dunn takes Ted Lilly yard, will it matter if Aramis Ramirez was perfectly poised to turn a 5-4-3?

Clearly not.

These are the fundamentals that such metrics as FIP are built around, just the things that pitchers can control.

But obviously, not everything in baseball is a K, BB, or HR. This is where BABIP comes into play.

Join me after the jump and we'll discuss why.

BABIP stands for Batting Average on Balls in Play. It only takes hits that aren't home runs, and batted outs that aren't strikeouts. The exact formula is as follows:

BABIP = (H - HR) / (AB - K - HR + SF)

where H is hits, HR is home runs, AB is at-bats, K is strikeouts, and SF is sacrifice flies (formula via Wikipedia).
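For anyone who wants to run the numbers themselves, here is a minimal Python sketch of the standard formula, (H - HR) / (AB - K - HR + SF). The function name `babip` and the stat line in the example are made up for illustration; they aren't any real player's numbers.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies=0):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    # Balls in play: at-bats that didn't end in a strikeout or a homer,
    # plus sacrifice flies (which don't count as at-bats but were put in play).
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play, BABIP is undefined")
    return (hits - home_runs) / balls_in_play


# Hypothetical stat line: 30 H, 5 HR, 100 AB, 20 K, 1 SF.
print(round(babip(hits=30, home_runs=5, at_bats=100, strikeouts=20, sac_flies=1), 3))
# prints 0.329
```

Notice that the 5 homers come out of both the top and the bottom of the fraction, and the 20 strikeouts come out of the bottom, so only the balls a fielder could actually have caught are left.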

Well this is all good and fine, but what does it mean? How is it used?

Well, for starters, league-wide BABIP tends to normalize to around .300 every season. It'll certainly vary a bit, but .300 is a good benchmark. Keep this in mind.

Now think about hits. While they all count, we all know that some hits are more valuable than others. A line drive single to CF is probably a more "legit" hit than a squirter that rolls between 2B and 3B, just past 2 diving gloves. Why? Well, a line drive single has the height to get past the infield and drops well in front of the CF, so it eludes all the gloves in a sense of "yeah, nobody's going to get to that". That squirter, however, could have been a 5-3 or 6-3 if you had a 3B or a SS with better range.

Maybe that 2nd hit is a seeing-eye grounder. Maybe it's a dying quail. Maybe it's just a lucky shot that jussssst gets past the outstretched glove. Maybe it's because the opposing defense sucks. Like I said, it still counts, but how much of that do you credit to the batter and how much do you fault the fielders, and how much is just statistical randomness?

It's the same kind of thing you can apply to a lineout to CF and a 2B8. The balls were hit the same, but one found glove, one found grass. There's an element of randomness that has to be considered on batted balls. Sure, we could break down every swing and every batter and every pitch, but (1) who has time for that, and (2) the real-life differences mostly wash out as statistically insignificant.

Now there are a couple of ways to use BABIP. You can use it to evaluate a pitcher, a batter, or a team: how they're performing, and whether it's sustainable.

For starters, let's apply it to pitchers. We'll use Matt Herges between 2007 and 2008 for this example. In 2007, Herges had a 2.96 ERA; he struck out 5.55 batters per 9 innings (K9), walked 2.77 (BB9), and gave up 0.74 homers (HR9). Very solid. Compare that to his career line of 6.13 K9, 3.41 BB9, and 0.84 HR9. Not too far off at all, right? Now let's look at 2008: K9 6.44, BB9 3.36, HR9 0.70. Again, not very far off of his career numbers, but it came with a 5.04 ERA! You could argue (based on the rate stats) that he pitched much the same in 2008 as he did in 2007, so what went wrong?

This is where we look at BABIP. In 2007, Herges was sporting a nice, proud .219 BABIP. Measured against the .300 benchmark above, this is incredibly low. In 2008, his BABIP was .353. This is very high. Now, considering the fact that all the fielding-independent numbers didn't change much, we can probably thank a bit of randomness as to why the ball found gloves or found ground.

Remember how good the Rockies' defense was in 2007. That will help any pitcher look better. Batted balls find gloves and become outs. In 2008, however, a lot of key injuries definitely sank the defense, and you could very likely blame the lack of defense for part of Herges' inflated ERA.
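To see the Herges comparison side by side, here's a rough Python sketch using only the numbers quoted above. The .300 baseline is the rough benchmark from this article, and the helper name `babip_gap_points` is made up for illustration.

```python
LEAGUE_BABIP = 0.300  # rough benchmark used in this article

# Matt Herges' numbers as quoted in the text above.
career = {"k9": 6.13, "bb9": 3.41, "hr9": 0.84}
seasons = {
    2007: {"era": 2.96, "k9": 5.55, "bb9": 2.77, "hr9": 0.74, "babip": 0.219},
    2008: {"era": 5.04, "k9": 6.44, "bb9": 3.36, "hr9": 0.70, "babip": 0.353},
}


def babip_gap_points(babip, baseline=LEAGUE_BABIP):
    """Gap from the baseline in 'points' (thousandths of batting average)."""
    return round((babip - baseline) * 1000)


for year, line in seasons.items():
    # How far each fielding-independent rate sits from the career line.
    rate_gaps = {stat: round(line[stat] - career[stat], 2) for stat in career}
    print(year, "ERA", line["era"],
          "| rate gaps vs career:", rate_gaps,
          "| BABIP gap:", babip_gap_points(line["babip"]), "points")
```

The rate gaps stay small in both seasons, while the BABIP gap swings from 81 points under the benchmark to 53 points over it. That swing is the part of the ERA jump the rate stats can't explain.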

Similarly, with a batter, you look at their career BABIP to see what they're doing differently. If they're walking a similar amount as usual, hitting for the same power, hitting the same number of line drives, yet still aren't performing, maybe they're just hitting them right at fielders. Or if they're performing amazingly, perhaps they're getting the benefit of the opposite. Some batters are just high BABIP hitters, though, and some are low. It really varies, based on how many balls the batter puts into play.

Our real-player example is Matt Kemp. He's batting .392 right now, with a BABIP of .486. Those tell me that he's incredibly hot right now, but there's a very good chance that both of those numbers come down closer to his career norms. Interestingly, Kemp sports a .381 career BABIP, which is extremely high. His minor league BABIP was .368, which does support his major league numbers, but his BABIP might see a big drop this season to begin pushing his career total down.

So back to the idea of a .300 benchmark. The statistics (and by this I don't mean the WARPLVORP5 or whatever, I mean the literal statistics: mean, standard deviation, etc.) suggest that about 30% of balls in play will become hits. That includes everything. To be very specific, the NL average right now is .299. Last season it was .298, the season before .301.

If a team is significantly above or below that .300 mark, and they're doing most everything else like they normally would, there's a good chance that those hits are just doing the same stuff mentioned above: finding/missing gloves/holes/no-man's-land.

Much as I'd like to simply credit/fault a team's defense, a lot of hits are hits for no reason other than... well, no reason! They squeak past fielders and land where they ain't. You could fault the weather, the humidity, the sun, the wind, the ceiling of the Trop, pigeons, whatever. Sometimes hits fall. There's no right or wrong to it, it just is.
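To put a rough size on that randomness, here's a back-of-the-envelope Python sketch. It leans on a big simplifying assumption that isn't in the analysis above: every ball in play is treated as an independent coin flip with the same .300 chance of falling for a hit. The sample sizes are hypothetical round numbers, and `babip_standard_error` is just an illustrative name.

```python
import math


def babip_standard_error(balls_in_play, true_rate=0.300):
    """Standard error of an observed BABIP under a simple binomial model.

    Assumes each ball in play is an independent trial with the same hit
    probability, which is a simplification, not a claim about real baseball.
    """
    return math.sqrt(true_rate * (1 - true_rate) / balls_in_play)


# Hypothetical sample sizes, from a few weeks of one batter to a big team sample.
for n in (50, 100, 500, 4000):
    se = babip_standard_error(n)
    low, high = 0.300 - 2 * se, 0.300 + 2 * se
    print(f"{n:>5} balls in play: roughly {low:.3f} to {high:.3f} by chance alone")
```

Under that assumption, the range of "normal" shrinks as the balls in play pile up. That's why a wild number through April says a lot less than the same number over a full season.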

The reason I bring this up in today's article is simply to recap a bit of this season's slow start.

The Dodgers have a 19.5% line drive rate (LD%), which is right around league average. They also have a .338 BABIP, a good 40 points above league average, which suggests that they're hitting the ball well, but they're also getting a bit lucky. The Rockies are at .282, about 20 points below league average, but they're also tied for the worst LD% in the NL (17.0%), signifying that not only are they getting unlucky, they're also not making good contact with the ball. This isn't to say that not catching breaks is the only thing wrong with the Rockies' bats, but it suggests that a combination of poor contact and more bad breaks than the norm is sinking the offense.

So to summarize this whole debacle, let's recap:

1. BABIP is the batting average on balls that stay in the park.

2. BABIP normalizes league-wide over the course of a season.

3. If a pitcher/batter/team is looking incredibly good/poor and they're not doing anything more special than they ever have, maybe they're getting the breaks and the gaps.

4. When a manager says something like "He's just hitting it where we ain't" that's BABIP.

5. BABIP is really good to determine if someone's just hot or cold, especially if they're not doing anything differently than they're used to.

6. While not really a "luck factor", you can attribute an abnormal BABIP alongside an otherwise normal performance to statistical deviation, or "bad breaks".

7. Finally, this isn't intended to be a big excuse for why we're losing. We're doing a lot of things wrong, it just seems like missing breaks isn't helping things either.

That's all for this week, RowBots. Maybe we'll actually win a few!