Out Of The Blue Clear Sky: SIERA
Before we resume discussing advanced pitching metrics, a quick note: David Pinto posted that ESPN will be doing a lot more with advanced statistics in their programming this season. It will be interesting to see how they integrate it: in a separate, brief segment, or in their typical shove-it-down-your-throat fashion. It will also be interesting to see, since they contract with writers from both Baseball Prospectus and Fangraphs, which advanced metrics they use for pitchers, batters and fielders.
SIERA
Skill-Interactive Earned Run Average, or SIERA for short, was introduced about a year ago by Baseball Prospectus (the link takes you to their glossary page where there are further links to the five-part introduction). It's a lengthy introduction, but the authors, Matt Swartz and Eric Seidman, note that SIERA is a successor to Nate Silver's QERA, which has a "simple" formula: QERA =(2.69+K%*(-3.4)+BB%*3.88+GB%*(-0.66))^2.
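To make the QERA formula concrete, here is a minimal Python sketch that simply transcribes it. The function name `qera` and the sample rates are made up for illustration, and the sketch assumes that K%, BB% and GB% are entered as fractions (0.20 rather than 20), which is what keeps the result on an ERA-like scale.

```python
def qera(k_rate, bb_rate, gb_rate):
    """QERA as quoted above: (2.69 - 3.4*K% + 3.88*BB% - 0.66*GB%) squared.

    Assumption: all three rates are fractions, e.g. 0.20 for 20%.
    """
    base = 2.69 + k_rate * (-3.4) + bb_rate * 3.88 + gb_rate * (-0.66)
    return base ** 2


if __name__ == "__main__":
    # Hypothetical pitcher: 20% strikeouts, 8% walks, 45% ground balls.
    print(round(qera(0.20, 0.08, 0.45), 2))
```

Because the whole expression is squared, each extra strikeout or walk moves the result more for a pitcher who already projects poorly than for one who projects well.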
You can click over to BP for the SIERA formula, but essentially the authors attempt to unfold all of the individual components of QERA, apply an appropriate multiplier, and correct for the fact that GB% is actually a percentage of balls in play instead of a percentage of plate appearances (such as K% or BB%). For ground balls, SIERA uses (GB-(FB+PU))/PA, which puts less weight on them. SIERA is way more arithmetically complicated than the simple examination that follows, but hopefully this will be a helpful introduction (or re-introduction) to the metric.
What does that accomplish?
Taking a step back, let's recall the video referenced in the FIP and xFIP overview. In that, FIP is said to track the pitcher, umpire, stadium and luck. The use of HR in FIP's formula is largely responsible for it tracking the stadium and luck (xFIP reduces the presence of stadium and luck by using expected HRs). SIERA, which at its core relies on Ks, BBs, and GBs, claims to go a step further by eliminating the effects of the stadium and luck. It accomplishes this partially by treating GBs as a pitcher skill (as an Ubaldo lover, an idea I greatly appreciate) instead of considering HRs.* Removing HR from the equation essentially eliminates the key park and luck factor, but HRs are still accounted for by the notion that high Ks and lots of GBs should result in fewer HR. In particular, high Ks matter in limiting the possibility of HRs and XBH. The umpire, by way of controlling the strike zone, is still accounted for in SIERA.
*As stated in their introduction, HR/FB is highly variable from year to year and dependent on luck. Also, as noted in BP's Part 1, there is an inherent unfairness in treating all HRs the same.
On its face, SIERA does eliminate the effects of the stadium. As far as I know, there are no park factors listed for ground balls. However, there are park factors for walks. I'm not a believer, but there's no denying that 1) less foul territory could allow for more opportunities to draw a walk (or other outcomes) by decreasing the opportunity for foul pop outs, and 2) the optics of a park, or its perception as a difficult place to hit HR, could change a hitter's approach. Yet neither of those is necessarily borne out when you sort the park factors by BBs. There are lots of favorable parks for HR in the top ten for walks. In sum, SIERA goes further than FIP or xFIP in removing park effects, but might not entirely eliminate them.
It's a bit tough to say that SIERA completely eliminates defense and/or luck when ground balls are one of the three core factors. In a vacuum, ground balls are always better than fly balls because they can't become home runs. There's (probably) less variation in trajectory and fewer potential landing spots (the ball has to touch the infield to qualify, and there's less space by volume in the infield than the outfield). It's a little like the Woody Hayes saying in football, "Three things can happen when you pass the ball, and two of them are bad." Three things can happen on a fly ball - a home run, batter reaches, and an out - and two of them are bad. Only two things can happen on a ground ball: batter reaches or an out.
However, the key reason that ground balls are considered better than fly balls is that they preclude the greatest danger (HRs) and are more likely to be converted to outs by the defense. In that loose sense, SIERA still tracks a bit of defense.
We will go into how pitchers end up ranking (versus other metrics), but I want to discuss one more formulaic aspect of SIERA (and other advanced pitching metrics). Follow over the jump, and then share your own thoughts, likes and displeasures with this crazy little metric...
SIERA = same philosophy, better outcome?
This was touched on in the re-introduction to FIP and xFIP, but these advanced metrics aren't necessarily designed to replace ERA. They are intended to estimate the runs a pitcher should have allowed based on things solely within his control (with differences of opinion on what is/is not within a pitcher's control). SIERA aims to be a better predictor of how many runs a pitcher should allow than other advanced metrics. To that end, in Part 4 of their introduction, the BP authors demonstrated that SIERA serves as a better indicator of next season park-adjusted ERA than any other metric, and is a better indicator of same-season park adjusted ERA than any metric that treats home runs as luck and not a pitcher skill (QERA and xFIP).
Why is this significant? In Swartz's followup on SIERA this year, he reiterates that it is the best estimator of pitchers' skill levels (in this case, future ERA). The ideology makes sense: ground balls are better, so eliminate explicit references in a formula to HR. Swartz elaborates on SIERA's success as an estimator:
- Ground balls matter more for pitchers who get more walks and fewer strikeouts because they allow more runners to reach first base.
- Ground-ball pitchers allow fewer hits and fewer extra-base hits on ground balls than non-ground-ball pitchers, and SIERA acknowledges this effect due to its negative coefficient on ground-ball rate squared.
- Pitchers with higher ground-ball rates (but not too high) allow the highest BABIPs and SIERA picks up on this reversing effect of ground balls on BABIP due to their correlation.
- Pitchers with higher strikeout rates allow lower BABIPs and lower HR/FB rates, and SIERA picks up on this correlation. This is why the coefficient on strikeout rate in SIERA is so negative--because pitchers with high strikeout rates not only prevent runs by getting outs, but because they also allow fewer hits on balls in play and fewer home runs on fly balls.
- Pitchers with higher strikeout rates get more ground balls in double-play situations.
- Pitchers with lower walk rates issue more of their walks strategically, and thus the average damage of a walk from a high walk pitcher is higher, another effect which SIERA picks up.
From the FIP/xFIP re-intro, we understand that an "earned run" does not actually mean the pitcher is responsible for the run. Instead, pitchers should be judged on what could loosely be called "expected earned runs" (this is not meant to be a defined term). Metrics that attempt to determine expected earned runs are built from peripherals over which the pitcher has a high degree of control. SIERA is the best at using peripheral statistics (Ks, BBs, GBs) to predict a pitcher's ability to limit runs.
Who looks good?
With all of that introductory stuff in mind, you would think that SIERA would treat Ubaldo pretty well. Instead, SIERA saw 2010 Ubaldo (3.57 SIERA, 21st) about the same as 2009 Ubaldo (3.60 SIERA, 20th). On June 17, 2010, right before Ubaldo entered his only prolonged skid of the season, Swartz wrote an article arguing that Ubaldo was in line for some serious regression. On that day, prior to the start versus the Twins that spawned the infamous article in which Jack Moore yawned at Ubaldo's first half, Ubaldo's SIERA was 3.43. As Swartz noted, that's pretty impressive and lends itself well to having a low ERA, but not as low as Ubaldo's to that point (1.16). The main reason identified by Swartz was Ubaldo's unsustainably low BABIP, particularly his BABIP on line drives.
It's not a new story around Purple Row that Ubaldo pitched about the same in each half of 2010, and in some ways he was a better pitcher in the second half (+1.6 K/9). Unfortunately, he walked more, his BABIP returned to regular levels, and his LOB% regressed to normal. Ubaldo can improve on the walks in 2011, but the incredibly high LOB% and low BABIP weren't sustainable over a whole season, and, while they could occur again at times in 2011, shouldn't be expected over an entire season.
In many senses, SIERA is fair to Ubaldo. It basically tells us that he is a really good pitcher who offsets a relatively high walk rate (among the peers in his talent group) with a good ground ball rate and lots of strikeouts.
Let's take a look at how SIERA stacks up against the ERA, FIP and xFIP for pitchers with 150 IP+ from 2008-2010. I've listed the top three pitchers by SIERA, plus one pitcher immediately above and below Ubaldo (overall SIERA ranking in parentheses). Click on the year to see the full table at Baseball Prospectus.
2008

| Pitcher | ERA | FIP | xFIP | SIERA |
| --- | --- | --- | --- | --- |
| Lincecum (1) | 2.62 | 2.62 | 3.17 | 2.96 |
| Sabathia (2) | 2.70 | 2.91 | 3.10 | 3.05 |
| Beckett (3) | 4.03 | 3.24 | 3.24 | 3.09 |
| Wolf (43) | 4.69 | 4.17 | 4.29 | 4.07 |
| Ubaldo (44) | 3.99 | 3.83 | 4.20 | 4.07 |
| Kuroda (45) | 3.73 | 3.59 | 3.93 | 4.10 |

2009

| Pitcher | ERA | FIP | xFIP | SIERA |
| --- | --- | --- | --- | --- |
| Vazquez (1) | 2.87 | 2.77 | 2.82 | 2.68 |
| Lincecum (2) | 2.48 | 2.34 | 2.87 | 2.73 |
| Verlander (3) | 3.45 | 2.80 | 3.26 | 2.79 |
| Pineiro (19) | 3.49 | 3.27 | 3.68 | 3.56 |
| Ubaldo (20) | 3.47 | 3.36 | 3.63 | 3.60 |
| Gallardo (21) | 3.73 | 3.97 | 3.76 | 3.61 |

2010

| Pitcher | ERA | FIP | xFIP | SIERA |
| --- | --- | --- | --- | --- |
| Halladay (1) | 2.44 | 3.01 | 2.92 | 2.93 |
| Weaver (2) | 3.01 | 3.06 | 3.51 | 2.97 |
| Liriano (3) | 3.62 | 2.66 | 3.06 | 3.02 |
| Shields (20) | 5.18 | 4.24 | 3.72 | 3.57 |
| Ubaldo (21) | 2.88 | 3.10 | 3.73 | 3.58 |
| Marcum (22) | 3.64 | 3.74 | 3.90 | 3.59 |
One temptation is to dismiss SIERA based on Javier Vazquez topping the chart in 2009. But don't forget that he had a 9.77 K/9 and 1.81 BB/9 (5.41 K/BB), with a 41.7% GB rate. Considering Timmy pitched only 6 more innings than Javy, it says a lot about the weights in SIERA's formula that Javy was able to beat out Tim Lincecum's 10.42 K/9, 2.72 BB/9 and 48.9% GB rate. And, while James Shields' ERA looks terrible, especially next to Ubaldo's, he walked nearly 1.5 fewer batters per 9 innings than Ubaldo (while striking out only .5 fewer).
While I think Ubaldo's a better pitcher than a lot of the people ranked ahead of him (and probably would be ranked higher if you threw un-regressed HR/FB into this mix), nobody ahead of him on the list had a higher walk rate in 2008, 09, or 10. The walks will always hurt his advanced metric score and ranking because of the increased potential for runs to score.
What's it worth?
SIERA is an interesting metric. The premise, that Ks, BBs, and ground balls are a good recipe to measure run prevention, is sound. The formula is a little cumbersome to grasp, but the reasoning behind it is similar to FIP and xFIP: Ks and BBs matter a ton in a pitcher's ability to prevent runs, but ground balls are better than any other batted ball in a variety of ways and circumstances. They're more likely to help a pitcher out of a jam with less than two outs, and they're high up on the expected out values without allowing for the possible negative outcome of a HR.
However, it's a big step to eliminate HRs altogether (as opposed to reducing the weight given to them in FIP, or providing expected HRs as in xFIP). While it may be a step forward in reducing the presence of luck in the evaluation of a pitcher's true skill, luck is certainly part of the game, particularly for a pitcher who constantly flirts with bad luck in the form of a fly ball rate well above the league average (walking the line and crossing the line are hard to distinguish at different points in time). Few pitchers, without off-setting it with a tremendous amount of Ks, are going to be able to flirt with that line without crossing it in a season or two.
SIERA is also a good reminder that advanced metrics are, in part or whole, meant to be better predictors of future ERA than ERA itself. When that is the definition of true talent level, people are going to have different ideas about how to get there. If the measure of true talent level is instead how well a metric assimilates and properly weights the things a pitcher can control, with a constant added to make it look like ERA, other metrics may be better.
Comments
I've already noticed
Multiple times on Baseball Tonight where they’ll talk for a bit about a player’s BABIP or FIP, explaining each one in layman’s terms. Pretty cool, I think.
"The designated hitter rule is like letting someone else take Wilt Chamberlain's free throws." - Rick Wise
by The Toddfather's Goatee on Mar 9, 2011 4:13 PM MST
Boring, boring, boring
Sure glad someone has the time to play with all those numbers. Probably not nearly as fun as watching a good pitcher actually throw to someone!
by Real Perspective on Mar 9, 2011 4:47 PM MST
Why...
do you feel that watching a good pitcher actually throw to someone and recognizing the value of advanced sabermetrics are mutually exclusive ideas? They shouldn’t be.
-C
It’s rough to sit through these games and not have someone that can’t hit a Ball?
I'm sure the people at Baseball Pro. who developed SIERA
have watched more pitchers throw than you or I will ever hope to see.
Comments like these don’t sit well with people who work hard to a) bridge the gap between the sabr and non-sabr communities and b) do it without being condescending.
Rocktober is not a time of year, it is a religion.
my mother would disagree
and I disagree with my mother (about the understanding part that is)
"There have been only two geniuses in the world. Willie Mays and Willie Shakespeare." ~Tallulah Bankhead
"Love is the most important thing in the world, but baseball is pretty good too." ~Greg, age 8
JFK
What is wrong with you?
Nobody’s making you read this. I’m fine with you thinking it’s boring, but deacs put a lot of work into this (and it shows), why would you say something so rude? Great work deacs, my head hurts in a good way and I’ll appreciate the game all the more because of it :-)
"The designated hitter rule is like letting someone else take Wilt Chamberlain's free throws." - Rick Wise
by The Toddfather's Goatee on Mar 9, 2011 11:24 PM MST
Something tells me he's joking...
Could be wrong though.
by CentralCaliRox on Mar 9, 2011 11:35 PM MST
from everything I've read
he’s not joking with his comment
"There have been only two geniuses in the world. Willie Mays and Willie Shakespeare." ~Tallulah Bankhead
"Love is the most important thing in the world, but baseball is pretty good too." ~Greg, age 8
JFK
I kind of think he/she is too....
2011 Colorado Zombies-DeadWalking to the NL West crown
Tulo, CarGo, Ubaldo ,The Toddfather - oh my
Original Thugget Loyalists United #4, UNugg #4, QPU Emeritus, PR Gynocracy VP
Thanks
It really comes down to whether someone is intrigued by, or believes in, the idea that a pitcher has certain things in his control. The better he is at those few things (K, BB, GB, HR, etc.), the better he is at reducing the opportunity for the other team to score runs.
If that doesn’t pique someone’s interest, they’re probably not going to care.
Om nom nom

"The designated hitter rule is like letting someone else take Wilt Chamberlain's free throws." - Rick Wise
by The Toddfather's Goatee on Mar 10, 2011 8:10 AM MST
Can you explain to me..
how Roy Oswalt can have the exact same SIERA as Ricky Nolasco when it appears Roy dominated Ricky in every other statistical category in 2010? I’m not trying to poke fun, just trying to understand.
| Pitcher | BABIP | WHIP | AVG | ERA | VORP |
| --- | --- | --- | --- | --- | --- |
| Oswalt | 0.260 | 1.02 | 0.213 | 2.76 | 56.4 |
| Nolasco | 0.323 | 1.28 | 0.273 | 4.51 | 13.4 |
.260 BABIP vs. .323 BABIP
That’s it right there
Oswalt also had a higher LOB%, although not unsustainably high.
Baseball Pro.‘s VORP doesn’t use DIPS stats, so the BABIP discrepancy isn’t accounted for. Also, VORP is cumulative, so it can’t be used alongside other rate statistics.
Rocktober is not a time of year, it is a religion.
But Oswalt's BABIP is better than Nolasco's..
Shouldn’t a pitcher want his hitters to have a lower BABIP? So when you add that into the other stats, how can he have a better SIERA?
BABIP isn't taken into consideration in SIERA
I don’t know if there is a correlation between strikeouts and BABIP or ground balls and BABIP. Oswalt and Nolasco struck out about the same number of batters per nine innings, and Nolasco walked .5 fewer batters per nine innings. Very similar except for the BABIP, which (partially) explains the difference in ERA. Also, Nolasco gave up more home runs, and more home runs as a share of fly balls (which he had many more of than Oswalt).
Those two things partially explain the huge gap in ERA. If you accept, based on HR/FB and BABIP, that the difference in their performance isn’t as great as ERA suggests, then it’s easier to reconcile them having an identical SIERA.
What SIERA can tell you is how good a pitcher is at run prevention based on three weighted factors: strikeouts, walks, ground balls. Nolasco had a slight edge in Ks and BBs, and Oswalt had an edge in ground balls. It balanced out according to the formula below.
OK, upon examination, there is a positive correlation between GB% and BABIP
y = 0.2829x + 0.3717
R² = 0.00939
Not a strong correlation, but the trend is definitely positive. That is to say, the higher your GB%, the higher your BABIP. At least in 2010.
by Andrew Martin on Mar 9, 2011 9:56 PM MST
Over a larger sample, 2002-2010, 656 different pitchers of 160+IP
the correlation becomes much stronger
y = 1.354x + 0.0398
R² = 0.1152
GB% undeniably raises your BABIP
by Andrew Martin on Mar 9, 2011 9:59 PM MST
think about it
a GB isn’t ever going to be a homer, a strikeout, or a walk (duh), but has the chance to squirt through for a single. Fly balls could become HR (not counted into BABIP) and are typically the easiest outs.
by Andrew Martin on Mar 9, 2011 10:08 PM MST
Yea
Each batted ball type has an expected component BABIP:
LD= ~.730
GB= ~.240
FB= ~.135
Thus looking at the batted-ball profile, we generate an xBABIP based on the percentages.
A (very) rough model:
xBABIP = ((.730 x LD) + (.240 x GB) + (.135 x FB))/(Balls in Play)
Example:
Sample A: 200 Balls in Play, 18.0 LD%, 47.0 GB%, 35.0 FB%
Thus 36 line drives, 94 grounders, and 70 flyballs
xBABIP = ((.730 × 36) + (.240 × 94) + (.135 × 70))/200
xBABIP = .291
Rocktober is not a time of year, it is a religion.
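The rough xBABIP model in the comment above is easy to check in a few lines of Python. This is only a sketch of that back-of-the-envelope calculation: the function name `xbabip` is made up for illustration, and the component values (.730, .240, .135) are the approximate figures quoted in the comment, not official constants.

```python
# Approximate component BABIPs quoted in the comment above.
LD_BABIP = 0.730
GB_BABIP = 0.240
FB_BABIP = 0.135


def xbabip(line_drives, ground_balls, fly_balls):
    """Expected BABIP from counts of each batted-ball type.

    Assumption: balls in play are just the sum of the three counts.
    """
    balls_in_play = line_drives + ground_balls + fly_balls
    expected_hits = (LD_BABIP * line_drives
                     + GB_BABIP * ground_balls
                     + FB_BABIP * fly_balls)
    return expected_hits / balls_in_play


if __name__ == "__main__":
    # Sample A: 200 balls in play at 18% LD, 47% GB, 35% FB.
    print(round(xbabip(36, 94, 70), 3))
```

Working from counts rather than percentages keeps the division by balls in play explicit, which matches how the example in the comment is laid out.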
Interestingly enough, FB% has a stronger NEGATIVE correlation to BABIP
y = -1.6715x + 0.8556
R² = 0.19187
by Andrew Martin on Mar 9, 2011 10:07 PM MST
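For anyone who wants to get a feel for the three trend lines posted in this thread, here is a hedged Python sketch that evaluates each line and converts R² into a correlation coefficient. Two things are assumptions, not facts from the comments: the names `FITS`, `predict` and `correlation` are made up, and the comments never say which variable is x. Plugging in a BABIP of about .300 as x returns GB% and FB% values in a believable range (roughly 45% and 35%), so the sketch treats x as BABIP and y as the batted-ball rate.

```python
import math

# (slope, intercept, R^2) as posted in the comments above.
FITS = {
    "GB% vs BABIP, 2010": (0.2829, 0.3717, 0.00939),
    "GB% vs BABIP, 2002-2010": (1.354, 0.0398, 0.1152),
    "FB% vs BABIP (sample not stated)": (-1.6715, 0.8556, 0.19187),
}


def predict(slope, intercept, x):
    """Value of the fitted line y = slope * x + intercept."""
    return slope * x + intercept


def correlation(slope, r_squared):
    """For a one-variable linear fit, r is sqrt(R^2) with the slope's sign."""
    return math.copysign(math.sqrt(r_squared), slope)


if __name__ == "__main__":
    for label, (slope, intercept, r2) in FITS.items():
        y = predict(slope, intercept, 0.300)  # assumed: x is BABIP
        r = correlation(slope, r2)
        print(f"{label}: y at x=.300 is {y:.3f}, r is {r:+.3f}")
```

Seen as correlation coefficients, the 2010-only fit is close to zero, while the larger-sample GB% and FB% fits are modest in size and opposite in sign.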
Nice.
I wanted to say that it was a positive correlation, but only based off Aaron Cook’s start vs. the Twins last summer. 20 batters faced, 14 ground balls. Somehow, every single one seemed like a single up the middle. An anecdotal way of saying what you said above.
You know what would be interesting?
A stat called OPSBIP (On Base + Slugging on Balls In Play). It might be a better way to tell us just how much a pitcher is getting burned (or saved) by BABIP.
For instance, ground balls, which produce a higher BABIP than flyballs, will almost always result in a single if they are hits (the one exception being the ground ball doubles down the lines). However, flyballs will often go into the gap, creating doubles and triples, which obviously hurt the pitcher more than a groundball single. BABIP counts all of these the same, and they are not.
While a pitcher who gets mainly groundballs should be expected to have a higher BABIP than a pitcher who gets mainly flyballs, I bet a stat like OPSBIP would tag the flyball pitcher with the bigger number.
22 more days until the Rockies Home Opener!!!!!!!
by RhodeIslandRoxfan on Mar 10, 2011 8:07 AM MST
Groundballs have a higher average BABIP than flyballs
If two pitchers share a similar LD% (like Oswalt/Nolasco), and one has a higher GB% (Oswalt), we would normally expect the higher-GB pitcher to have a higher BABIP.
Rocktober is not a time of year, it is a religion.
Yes, but BABIP is considered a "luck" stat for pitchers because
(for the most part) pitchers have no control over their opponents’ BABIP. Watch this video; it does a much better job explaining it.
So 2010 Oswalt = Lucky, thus a better ERA
2010 Nolasco = Unlucky, thus a worse ERA
Rocktober is not a time of year, it is a religion.
I think it has to do with their ground ball rates
I apologize for not doing the actual math, but their underlying numbers (K/9 – Nolasco: 8.39, Oswalt: 8.21; BB/9 – Nolasco: 1.88, Oswalt: 2.34) are fairly close.
Oswalt’s ground ball rate (45.7%, Nolasco: 40%) must offset the deficit in the other two categories
Only Ks, BBs and GBs are taken into consideration in the SIERA formula
SIERA = 6.145 - 16.986*(SO/PA) + 11.434*(BB/PA) - 1.858*((GB-FB-PU)/PA) + 7.653*((SO/PA)^2) +/- 6.664*(((GB-FB-PU)/PA)^2) + 10.130*(SO/PA)*((GB-FB-PU)/PA) - 5.195*(BB/PA)*((GB-FB-PU)/PA), where +/- is a negative sign when (GB-FB-PU)/PA is positive and vice versa.
BABIP is more helpful in explaining the ERA disparity than the SIERA similarity.
BABIP leads to an inflated (or deflated) ERA because a larger (or smaller) share of balls put in play are ending up as hits. That consideration doesn’t really apply to SIERA, unless there is some correlation between Ks and BABIP or GBs and BABIP. (I could be missing something here.)
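For readers who would rather plug numbers in than parse the formula, here is a minimal Python sketch of it. The coefficients are copied from the formula quoted in this comment; the function name `siera` and its argument names are made up for illustration. One assumption is worth flagging: the strikeout/ground-ball interaction term is taken to be 10.130*(SO/PA)*((GB-FB-PU)/PA), which is how the term reads once its parentheses are balanced.

```python
def siera(so, bb, gb, fb, pu, pa):
    """SIERA from raw counts, following the formula quoted in this thread.

    so, bb: strikeouts and walks; gb, fb, pu: ground balls, fly balls
    and pop-ups; pa: plate appearances.
    """
    k = so / pa
    w = bb / pa
    net_gb = (gb - fb - pu) / pa
    # The squared ground-ball term is subtracted when the net ground-ball
    # rate is positive and added when it is negative.
    sign = -1.0 if net_gb > 0 else 1.0
    return (6.145
            - 16.986 * k
            + 11.434 * w
            - 1.858 * net_gb
            + 7.653 * k ** 2
            + sign * 6.664 * net_gb ** 2
            + 10.130 * k * net_gb   # assumed form of the interaction term
            - 5.195 * w * net_gb)


if __name__ == "__main__":
    # Hypothetical season line, not a real pitcher.
    print(round(siera(so=200, bb=60, gb=280, fb=180, pu=30, pa=850), 2))
```

Note that BABIP, home runs and earned runs never appear as inputs, which is the point made in this comment: two pitchers with very different ERAs can land on the same SIERA.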
I think we're saying the same thing lol
I just mean that the BABIP discrepancy accounts for the difference in ERA, assuming SIERA is an accurate pitching metric.
We would expect, provided SIERA is accurate and that the sample size is large enough, that both of their BABIPs would regress towards the mean and that their ERA’s would approach their SIERA values.
Rocktober is not a time of year, it is a religion.
I wish I knew more about their formulation
something about nonlinear regression to try and determine pitcher run values just doesn’t sit well with me. I’m sure there’s a better algebraic explanation, but yeah.
by Andrew Martin on Mar 9, 2011 9:45 PM MST
Thank you for writing this awesome article.
And thanks RMN for that awesome Excel work. Very nice.
The Martha Stewart of processed foods.
Super Overlady Of the Ubaldo Lovers Club.
Proud Member of the PR gynocracy.
Video tips on posting links and images to Purple Row - Click Here -
for the record
Fangraphs has an “export to Excel” option so running those 3 correlations took like 10 minutes.
by Andrew Martin on Mar 10, 2011 12:06 PM MST via mobile
Great Read.
I love learning new things.
The Martha Stewart of processed foods.
Super Overlady Of the Ubaldo Lovers Club.
Proud Member of the PR gynocracy.
Video tips on posting links and images to Purple Row - Click Here -
