By way of introduction for those who don't know me: I've been doing my own projections since 2003. I thought I'd take the time to put together a little guide to the science of player projections: what they do, what they don't do, and how to evaluate them.
The core of every system: the multi-year weighted average
The mechanics of all projection systems are quite similar: you take a weighted average of the player's past 3-4 years of performance, adjust it for age, and voilà, there's your projection. Where the systems differ is in 1) the relative weights given to each year, 2) the aging patterns used, and most importantly 3) the minor league equivalencies used.
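For anyone who wants to see the bare mechanics, here is a minimal sketch of that core calculation. The function name `project`, the 5/4/3 weights, and the additive age adjustment are all assumptions made for illustration; they are not the weights or aging curve of any particular system, and real systems also account for things like playing time.

```python
from typing import Sequence


def project(rates: Sequence[float], weights: Sequence[float], age_adj: float = 0.0) -> float:
    """Weighted average of past seasons (most recent first), plus an age adjustment.

    `age_adj` is a hypothetical additive correction; a real aging curve
    would depend on the player's age and the stat being projected.
    """
    if not rates or len(rates) != len(weights):
        raise ValueError("need one weight per season")
    baseline = sum(r * w for r, w in zip(rates, weights)) / sum(weights)
    return baseline + age_adj


# Made-up pitcher: ERAs of 4.20, 4.60 and 5.00 over the last three years,
# most recent first, with illustrative weights of 5/4/3 and a small
# improvement expected from aging.
era = project([4.20, 4.60, 5.00], weights=[5, 4, 3], age_adj=-0.05)
print(f"projected ERA: {era:.2f}")
```

Note what the function cannot do: its output is always the multi-year baseline nudged by the age term. With nothing but these inputs, there is no way for it to produce a breakout.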
On breakouts, collapses, and consistent thinking
To the uninitiated, the term "projection system" suggests something capable of saying bold and interesting things about the future, a Magic 8 ball of sorts. It doesn't work that way. The whole point of a projection system is using the past to generate a best estimate of the future; a breakout or collapse is by definition a large deviation from the standard pattern, and not a best estimate.
The implication of this is that if you see a player who is projected to improve over his multi-year baseline performance level to an extent that aging alone cannot possibly explain, it doesn't mean that the system "sees something in him". That's not possible. What it means is quite simply that the projection system is broken.
I was looking at the CAIRO projections the other day. CAIRO generates the following ERA projections for select Rockies starters: Chad Bettis 4.40, Jorge De La Rosa 4.46, Aaron Cook 4.54, Rob Scahill 4.54, Juan Nicasio 4.55, Erik Stavert 4.55, Jason Hammel 4.55, Kenny Durst 4.74, Nicholas Schnaitmann 4.83, Esmil Rogers 4.87.
Does that look reasonable to any of you? I'm not a big Esmil Rogers fan, but look at those schmucks who project ahead of him. Kenny Durst, seriously? Durst is the same age as Rogers, and has spent the last two years getting shellacked in Modesto. How can he be better than Rogers? Does CAIRO see something in him? Of course not. It can't, because there's nothing to see. Durst's K rate is way below average even for the California League, and he doesn't do anything particularly well (BB rate right at league average). And that's all that CAIRO knows.
Now, for all I know, Durst has Strasburg-like stuff and just hasn't been getting results. I have no idea. But the point is that even if that's the case, CAIRO doesn't know it either. CAIRO only knows the numbers. And it's producing results that are hopelessly inconsistent with those numbers. Ergo, the system is broken. And when you're trying to interpret its results, intellectual consistency is key: the system is just as broken when it's forecasting surprisingly good performance for a guy who we want to like (Chad Bettis) as when it's doing the same for a random nobody like Durst. It's not as if the system is simply wrong with regard to Durst but somehow sees real potential in Bettis... you can't dismiss one projection while approving of the other, because the same problems afflict the projections of both pitchers and the underlying process is identical.
Now, CAIRO is an admittedly extreme example. But other systems are hardly immune to this sort of nonsense. ZiPS, for example, sees Esmil Rogers and Gio Gonzalez as roughly equivalent pitchers. Gonzalez and Rogers are the same age, making comparisons easy... just look at their last three seasons. Has Rogers ever been as good as Gonzalez? Not even close, right? So why do they project equally? Beats me. But the point, again, is that they shouldn't. When you have two guys the same age, the one with the indisputably superior performance record should have a significantly better projection. If he doesn't, the proper interpretation is not that the projection system is trying to tell us something interesting and unexpected, it's that the projection system is broken. Plain and simple.
The biggest source of error: MLEs
If you look again at those CAIRO pitching projections I listed above, the first question that should pop into your head is "How on earth are random A-ball organizational soldiers projecting to be just as good as established big leaguers?" The only possible answer to that question is that CAIRO is using horribly wrong minor league equivalencies (MLEs). And CAIRO isn't alone in that.
Looking at any of the popular projection systems (ZiPS, CHONE, PECOTA, Oliver), you'll see a recurring pattern: their MLEs are too generous. The quickest way to judge this is to see how many players they have projected above replacement level. Let's look at CHONE, which is generally considered to be the best of those systems, and I think rightfully so. Last year's CHONE projections had 352 starting pitchers projected above replacement. Think about what that means. 352 pitchers across 30 organizations works out to 11-12 pitchers per organization. With 5 or 6 SPs on a big league staff, that leaves roughly another full rotation per organization... so basically, CHONE was declaring all of AAA to be above replacement level. That's logically impossible; replacement level has to be higher than AAA average, for reasons I trust are obvious to all. (And before anyone asks: the issue here is not in where replacement level is set. CHONE puts replacement level at the same place everyone else does, more or less. The issue is in having too many players projecting too close to MLB average.)
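The back-of-envelope count above is easy to check. The snippet below just redoes the arithmetic with the figures quoted in the text (352 starters, 30 organizations, a rotation of 5 or 6); the variable names are mine.

```python
# Figures from the text: 352 SPs projected above replacement, 30 organizations.
pitchers_above_replacement = 352
organizations = 30

per_org = pitchers_above_replacement / organizations

# From the text: 5 or 6 starters on a big league staff.
for rotation_size in (5, 6):
    beyond_mlb = per_org - rotation_size
    print(f"rotation of {rotation_size}: {per_org:.1f} per org, "
          f"{beyond_mlb:.1f} beyond the MLB rotation")
```

Either way you slice it, each organization ends up with about one extra rotation's worth of "above replacement" starters outside the majors.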
So why does everyone get MLEs wrong?
I don't know for sure. But I can guess. The standard method for evaluating the accuracy of a projection system is to look at how the system fared in forecasting a post-facto selected group of players, namely those who accumulated a certain minimum amount of PA or IP at the MLB level in a given year. This means that if a guy flames out in AAA and never reaches the majors, or if he reaches the majors but performs horribly and doesn't get much playing time, he gets omitted from the accuracy study. In other words, the way to look accurate is to aim high on everyone, because the underperformers just end up being ignored. But looking accurate and being accurate are not the same thing. The only reasonable way to evaluate projection accuracy is to look at as wide a population as possible... of course, the reason that nobody does that is that it would require using MLEs to measure the performance of minor leaguers, and then we get into circular argument territory.
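To make the "aim high and look accurate" point concrete, here is a toy simulation. Everything in it is an assumption for illustration: talent and luck are both drawn from a standard normal distribution, value is on an arbitrary scale where higher is better, the "honest" system projects true talent exactly, the "optimistic" system adds 0.5 to every projection, and a player only makes the accuracy study if his actual performance clears a cutoff (a crude stand-in for earning enough PA or IP).

```python
import math
import random


def rmse(pairs):
    """Root mean squared error over (projection, actual) pairs."""
    return math.sqrt(sum((proj - actual) ** 2 for proj, actual in pairs) / len(pairs))


rng = random.Random(0)  # fixed seed so the run is repeatable
OPTIMISM = 0.5          # hypothetical upward bias of the optimistic system
CUTOFF = 0.0            # hypothetical performance needed to stay in the sample

# Each player: (true talent, actual performance = talent + luck).
players = []
for _ in range(20000):
    talent = rng.gauss(0.0, 1.0)
    actual = talent + rng.gauss(0.0, 1.0)
    players.append((talent, actual))

honest_all = [(t, a) for t, a in players]
optimistic_all = [(t + OPTIMISM, a) for t, a in players]

# The usual accuracy study: keep only players who performed well enough.
honest_kept = [(p, a) for p, a in honest_all if a > CUTOFF]
optimistic_kept = [(p, a) for p, a in optimistic_all if a > CUTOFF]

print(f"everyone:   honest {rmse(honest_all):.3f}  optimistic {rmse(optimistic_all):.3f}")
print(f"study only: honest {rmse(honest_kept):.3f}  optimistic {rmse(optimistic_kept):.3f}")
```

Under these assumptions the honest system has the lower error across the whole population, but the optimistic system has the lower error among the players who survive the cutoff, because the survivors are disproportionately the ones who got lucky. The size of the effect depends entirely on the made-up parameters; the direction is the point.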
So what's the best way to calculate MLEs? Simple. Look at everyone who played in both AAA and MLB in the same year, and determine the average difference in their performance between levels. Repeat the process for AA to AAA, High-A to AA, etc. This method isn't theoretically perfect by any means (there are selective sampling issues at play), but it's realistically as good as we can do. And it yields a nice, intuitive progression from one step on the ladder to the next, reflecting the self-evident truth (which several projection systems appear to reject) that if you take a generic A-baller and put him in the big leagues, he'll get slaughtered.
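A minimal sketch of that paired-comparison method, with the caveats stated up front: the function name `level_gap`, the row layout, and the sample ERAs are all invented for illustration. It uses a simple unweighted average of differences, whereas a real implementation would presumably weight by playing time and might work with ratios rather than raw gaps.

```python
from statistics import mean


def level_gap(seasons, lower, upper):
    """Average (upper - lower) stat for players who appeared at both levels
    in the same year. `seasons` rows are (player, year, level, stat)."""
    by_player_year = {}
    for player, year, level, stat in seasons:
        by_player_year.setdefault((player, year), {})[level] = stat
    diffs = [levels[upper] - levels[lower]
             for levels in by_player_year.values()
             if lower in levels and upper in levels]
    if not diffs:
        raise ValueError(f"no players appeared at both {lower} and {upper}")
    return mean(diffs)


# Made-up ERA lines. Player D is ignored for AAA -> MLB because his two
# stints came in different years.
seasons = [
    ("A", 2010, "AAA", 3.50), ("A", 2010, "MLB", 4.50),
    ("B", 2010, "AAA", 4.00), ("B", 2010, "MLB", 5.20),
    ("C", 2010, "AA", 3.20), ("C", 2010, "AAA", 3.80),
    ("D", 2009, "AAA", 3.00), ("D", 2010, "MLB", 4.00),
]

aaa_to_mlb = level_gap(seasons, "AAA", "MLB")
aa_to_aaa = level_gap(seasons, "AA", "AAA")

# Chaining the steps gives a rough AA -> MLB translation.
aa_to_mlb = aa_to_aaa + aaa_to_mlb
print(f"AAA->MLB: +{aaa_to_mlb:.2f}  AA->AAA: +{aa_to_aaa:.2f}  AA->MLB: +{aa_to_mlb:.2f}")
```

Because each step is measured only between adjacent levels, the penalties stack as you go down the ladder, which is exactly the progression described above: the further a guy is from the majors, the worse his numbers translate.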
A good projection system does not surprise. The value of a projection system isn't that it tells us things we don't know, it's that it tells us things we do know in a systematic way that avoids the inconsistencies and biases characteristic of human thought.
This is the philosophy behind my NEIFI projections (link goes to the projection spreadsheet). NEIFI doesn't produce particularly interesting results. I don't want it to. Interesting != accurate. NEIFI simply puts everyone on a level playing field, and generates easily explicable and eminently reasonable results for every single player it looks at. Is it the be-all and end-all of player evaluation? Absolutely not. But it's the most reliable starting point I know of.
Of course, I know that NEIFI doesn't serve everyone's needs. It's not a very good tool for fantasy baseball, because it projects only overall park-adjusted value stats and makes no attempt to project raw stats or even components. So to those of you using other projection systems for fantasy purposes, all I can say is: be really careful, especially when dealing with rookies (where the MLE problem takes center stage).