Introducing ClR – a Stat to Rate Closer Effectiveness

I want to start a conversation today in an area where I am no expert:  baseball statistics.  While I can do the math and manipulate spreadsheets, it’s the application of certain stats that can get me into trouble, so I go into this with some caution.

Everything that follows comes from a discussion earlier in the month about the value of a closer, as in “is Craig Kimbrel really worth that much?”  Stated alternately, “are closers over-valued by teams?”  It is relevant today as well since we’re talking a lot about Fantasy Baseball, and I (for one) was wondering about the value of selecting closers in the higher rounds of a fantasy league draft.

In trying to find an answer to this question, I realized that we don’t really have a single suitable stat to provide us with a satisfactory answer to the question.  That is where the digging began.

What Should We Measure?

In any statistical pursuit, it is helpful to identify metrics of success.  Specifically, what are the criteria by which I could call a Closer “successful”?

I offer the following suggestions:

  • Did the Closer do his job – did he get the Save?
  • Does he keep runners off base and dampen the chances for a comeback?
  • In some manner, did he dominate?  Did he control the action, regardless of the situation?

Sounds a bit vague, but that’s essentially it.  The result matters the most, but there is such a thing as the ‘Heart Attack Factor’ (yes, I made that up), in which fans cringe whenever a comeback bid starts to blossom.  Once one or two runners get on, you start feeling like all that is needed is “one big hit” to bust open a 9th inning rally.

In my mind, the elite closers keep such situations to a minimum.

Finding Suitable Stats

That was step 1.  Step 2 is finding statistics (one or more) to match those criteria.

BSv% – Blown Save Percentage

A Blown Save percentage is necessary, for it speaks directly to the ultimate result:  did the Closer complete his task?  But right away, I can tell that one stat won’t be sufficient: this has nothing to do with base-runners, dominance, etc.  I want to include BSv%, but we’ll definitely need some additional help.

As a numeric quantity, though, the Blown Save ratio should be small.  After all, if a ‘Closer’ is losing half his save attempts, he’s going to lose his job.  So I expect to have to scale this up a bit.
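
To make the size of that number concrete, here is a minimal sketch of the ratio in Python.  The function name is my own invention for illustration, and I am assuming BSv% means blown saves divided by save opportunities, expressed as a fraction.  The sample figures are Norm Charlton's 10 blown saves out of 24, cited later in this article.

```python
def blown_save_rate(blown_saves: int, save_opportunities: int) -> float:
    """Fraction of save opportunities that were blown (0.0 to 1.0).

    Hypothetical helper: assumes BSv% = blown saves / save opportunities.
    """
    if save_opportunities <= 0:
        raise ValueError("need at least one save opportunity")
    return blown_saves / save_opportunities


# Norm Charlton's season as cited in this article: 10 blown saves out of 24.
charlton_bsv = blown_save_rate(10, 24)
print(f"{charlton_bsv:.3f}")  # 0.417
```

Even a disastrous season stays well under 1.0 on this scale, which is why the ratio needs a multiplier before it can carry much weight next to the other stats.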

FIP – Fielding Independent Pitching

Admittedly, I hate this one for starting pitchers.  But it actually has a useful role here.  Check out the formula for FIP:

((13*HR)+(3*(BB+HBP))-(2*K))/IP + constant

Don’t worry about the constant.  That’s for scaling purposes to make it closer to ERA, which doesn’t concern me in this context.  The important bits are the rest of that line:

  • High penalty for yielding home runs.  Yes.  Homers are the bane of a closer.
  • Penalty for yielding base-runners or doing stupid things:  giving up walks and hit batsmen.  Good.
  • Reward for strikeouts.  This addresses dominance – leaves fewer things to chance (like your own defense).

I am not concerned about the innings scaling.  Closers will tend to pitch relatively few innings in a season – having more innings will tend to smooth over bumpy outings, and that’s fine – we want to know who is dominant overall.
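
As a rough sketch of the part of FIP I care about, here is the formula above in Python with the constant left off.  The function names and both stat lines are made up for illustration; they are not real pitchers.  The small `innings` helper assumes the usual box-score notation in which "61.2" means 61 and two-thirds innings.

```python
def innings(ip_text: str) -> float:
    """Convert box-score innings such as '61.2' (61 and 2/3) to a true decimal."""
    whole, _, outs = ip_text.partition(".")
    return int(whole) + (int(outs) if outs else 0) / 3


def fip_core(hr: int, bb: int, hbp: int, k: int, ip: float) -> float:
    """FIP without its scaling constant: (13*HR + 3*(BB+HBP) - 2*K) / IP."""
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip


# Two hypothetical 60-inning closers (invented numbers).
strikeout_artist = fip_core(hr=3, bb=15, hbp=1, k=90, ip=innings("60.0"))
hittable_closer = fip_core(hr=8, bb=30, hbp=3, k=45, ip=innings("60.0"))
print(f"{strikeout_artist:.2f}")  # -1.55
print(f"{hittable_closer:.2f}")   # 1.88
```

Note that the strikeout-heavy line goes negative without the constant.  That is harmless here, since all we need is a way to order pitchers, not a number that looks like ERA.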

WHIP – Walks plus Hits per inning pitched

The idea behind using WHIP is about the quantity of base-runners… but in a different manner from that of FIP:  hits are added.  FIP by itself seems incomplete, in that hits are (intentionally, by its definition) excluded.  WHIP includes this, so that’s helpful.

There is a flaw in adding this: with WHIP in the mix, Walks would be (a) counted twice (they also appear in FIP) and (b) counted in two different ways.  Additionally, the “penalty” for hits allowed is effectively “1”, which means we need to alter the formulas so that hits are scaled suitably against the HR penalty (13) and the Walk penalty (3) of FIP.  We’ll deal with that later.
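
For reference, a minimal sketch of WHIP in Python.  The function name is my own; the sample numbers are the Norm Charlton figures quoted later in this article (47 walks and 69 hits in 69 innings).

```python
def whip(bb: int, h: int, ip: float) -> float:
    """Walks plus hits per inning pitched."""
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (bb + h) / ip


# Norm Charlton's season as cited in this article: 47 BB, 69 H, 69 IP.
charlton_whip = whip(bb=47, h=69, ip=69.0)
print(f"{charlton_whip:.2f}")  # 1.68
```

Every walk and every hit counts exactly once here, whether the hit is a bloop single or a home run, which is the scaling mismatch with FIP described above.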

IS THAT ENOUGH?

I think so.  You could argue about ERA, but I’m frankly less concerned about runs scored than I am about the results.  There are a couple of reasons for this:

  • With so few innings pitched, ERA figures can be blown up from a single bad outing.
  • ERA can also be distorted when a later pitcher allows inherited runners to score, since those runs are charged to the original pitcher who was “responsible for” them.
  • Sometimes allowing a run – even two – doesn’t mess up the outcome.  It does speak to the dominance factor, and it might make fans have to change their undies, but ultimately it’s the result that should matter most.

My Proposal

Again, this is not digging nearly to the depths that fangraphs.com would aspire to, but this should be a useful metric.

First off, I am going to create a blend of FIP and WHIP that accounts for the overlap problems noted above:

WWHIP (Weighted WHIP) = ((10 * HR) + (5 * H) + (3 * (BB+HBP)) - (2 * K)) / IP

This adds penalties for yielding bases via hits.  The multiplier changes account for the overlaps (home runs are also hits, but hits can include doubles and triples).  The numbers reflect both the relative frequency of each kind of hit and the impact of the total bases.  That weighting has to rely on league-wide aggregates, since the major statistical websites do not break out doubles and triples allowed by pitchers as part of their standard charts.

I am tempted to increase the multiplier on strikeouts, but will yield to the FIP creators on that – plus there are effective closer-relievers who are not strikeout artists.
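
Here is a minimal Python sketch of the WWHIP formula as written above.  The function name is mine, and the stat line is invented for illustration rather than taken from a real pitcher.

```python
def wwhip(hr: int, h: int, bb: int, hbp: int, k: int, ip: float) -> float:
    """Weighted WHIP: (10*HR + 5*H + 3*(BB+HBP) - 2*K) / IP."""
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (10 * hr + 5 * h + 3 * (bb + hbp) - 2 * k) / ip


# Because a home run is also a hit, a single homer costs 10 + 5 = 15 points
# before dividing by innings.
one_homer = wwhip(hr=1, h=1, bb=0, hbp=0, k=0, ip=1.0)
print(one_homer)  # 15.0

# Hypothetical closer: 4 HR, 50 H, 20 BB, 2 HBP, 70 K in 60 innings.
example = wwhip(hr=4, h=50, bb=20, hbp=2, k=70, ip=60.0)
print(f"{example:.2f}")  # 3.60
```

The first call shows the overlap handling in action: the home run multiplier drops from FIP's 13 to 10, but the effective cost of a homer ends up at 15 once the hit penalty is added in.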

Because of my desired emphasis on the results, I will add a scaled-up Blown Save percentage to yield a final number.  That becomes the following:

ClR (Closer Rating) = WWHIP + (10 * BSv%)
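
Putting the two pieces together, a minimal Python sketch might look like this.  The function names and the sample stat line are hypothetical.  I am assuming BSv% enters the formula as a fraction (blown saves divided by save opportunities) rather than as a number out of 100; that reading is consistent with the ratings under 2.00 quoted later in this article.

```python
def wwhip(hr: int, h: int, bb: int, hbp: int, k: int, ip: float) -> float:
    """Weighted WHIP: (10*HR + 5*H + 3*(BB+HBP) - 2*K) / IP."""
    return (10 * hr + 5 * h + 3 * (bb + hbp) - 2 * k) / ip


def closer_rating(hr: int, h: int, bb: int, hbp: int, k: int, ip: float,
                  blown_saves: int, save_opportunities: int) -> float:
    """ClR = WWHIP + 10 * BSv%, with BSv% assumed to be a fraction (0 to 1)."""
    if ip <= 0 or save_opportunities <= 0:
        raise ValueError("need positive innings and save opportunities")
    bsv = blown_saves / save_opportunities
    return wwhip(hr, h, bb, hbp, k, ip) + 10 * bsv


# Hypothetical closer: WWHIP of 3.60, plus 5 blown saves in 35 chances.
rating = closer_rating(hr=4, h=50, bb=20, hbp=2, k=70, ip=60.0,
                       blown_saves=5, save_opportunities=35)
print(f"{rating:.2f}")  # 5.03
```

Lower is better.  In this made-up line, the blown saves contribute about 1.43 of the 5.03 total, so the result term matters without drowning out the base-runner and strikeout terms.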

Let’s See How it Works

There are 1176 “pitcher seasons” since the 1990 season that meet the following criteria:

  • Minimum 20 innings pitched per year
  • Minimum 5 saves in that season
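
As a sketch of how that filtering and ranking could be done in code, assuming one record per pitcher season: the class, field names, and all four seasons below are invented for illustration, and save opportunities are assumed to equal saves plus blown saves.

```python
from dataclasses import dataclass


@dataclass
class PitcherSeason:
    """One pitcher season; every field name here is hypothetical."""
    name: str
    ip: float
    saves: int
    blown_saves: int
    hr: int
    h: int
    bb: int
    hbp: int
    k: int

    def qualifies(self) -> bool:
        # The article's criteria: at least 20 innings and at least 5 saves.
        return self.ip >= 20 and self.saves >= 5

    def clr(self) -> float:
        wwhip = (10 * self.hr + 5 * self.h
                 + 3 * (self.bb + self.hbp) - 2 * self.k) / self.ip
        # Assumption: save opportunities = saves + blown saves.
        opportunities = self.saves + self.blown_saves
        return wwhip + 10 * self.blown_saves / opportunities


# Invented seasons, not real players.
seasons = [
    PitcherSeason("Closer B", 55.0, 20, 8, 8, 60, 30, 3, 45),
    PitcherSeason("Closer A", 60.0, 40, 3, 3, 40, 15, 1, 90),
    PitcherSeason("Setup C", 70.0, 2, 4, 5, 55, 20, 2, 75),   # too few saves
    PitcherSeason("Callup D", 12.0, 6, 1, 1, 10, 5, 0, 14),   # too few innings
]

ranked = sorted((s for s in seasons if s.qualifies()), key=PitcherSeason.clr)
for s in ranked:
    print(f"{s.name}: {s.clr():.2f}")
```

Sorting in ascending order puts the best (lowest) ratings first; reversing the sort would give the bottom of the chart instead.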

The numbers are interesting… both on the top and the bottom of the scale:

WORST ClR (CLOSER RATING) SINCE 1990:

[table id=10 /]

It’s interesting seeing some of the names on this chart – including Billy Wagner and Norm Charlton.  But if you look at the stats, you can see why they are near the bottom of the list:  they pretty well stunk as closers during that year.  Charlton walked 47 in 69 innings and gave up 69 hits – 7 of those leaving the park, blowing 10 saves out of 24.  That should be a bottom-of-the-chart result.

BEST ClR (CLOSER RATING) SINCE 1990:

[table id=11 /]

The good news is that the players you would hope are on the top of the charts are actually there. Craig Kimbrel, Greg Holland and Aroldis Chapman made this group twice each.  Dennis Eckersley is here.  Billy Wagner had a much better year in 1999 than in 2000.  He gave up 5 homers that year, but had a silly number of strikeouts (124) in ~75 innings.

Kimbrel’s 4 blown saves in 2014 are the most in this group, but he was dominant otherwise: the few HR and hits he yielded keep him on this leaderboard.

In 2014, the best closers were Chapman, Kimbrel, and Holland – all rating under 2.00.

The team with the most closer trouble, by this measure, was probably Toronto (Brett Cecil/6.27, Casey Janssen/7.43, and Sergio Santos/12.61).  The White Sox were right with them, though (Zach Putnam/4.83, Jake Petricka/7.15, Ronald Belisario/9.44, and Matt Lindstrom/11.88).

The Point of ClR

The overall idea here was to create something that might be useful in ranking closers against each other, and thus determining whether they are all the same (more-or-less) in value, or if a Craig Kimbrel at 1.82 ClR really is significantly better than a Drew Storen (ClR 5.52 in 2014).

But it’s mostly about how difficult the closer makes his own task.  More hits, more walks … it makes life rougher on himself and his team.  That is what I sought to capture here.  A higher ClR means that more pitchers have to warm up “just in case.”  It means that Managers have to start thinking about contingency plans.  It means that teams lose.

I hope to monitor this through the 2015 season and see how it works.  In the meantime, I’d like to get your thoughts as well.