Intro to Sabermetrics 101

Linear Weights: The Positive and Negative (Runs)
  • By Michael Jong
  • November 13th, 2009

Last week we discussed Runs Created as one of the innovators in the field of run estimators. We talked about the benefits, flaws, and limitations of Runs Created. One of the mentioned limitations is the inability to use the Runs Created formula as initially written to find the run contribution of individual players, because the Runs Created formula was designed at the team level.

This week, we’ll talk about a system that can work on both the individual and team levels with good accuracy, but has flaws of its own. The system in question is the linear weights system. Linear weights is a simple concept: assign each event that occurs in baseball a run value relative to some baseline, multiply each value by the number of times that event was recorded, and add up the results. What follows is a slightly more in-depth look at a brief history of the design, how the weights are produced, and the benefits and downsides of using linear weights.
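As a minimal sketch of that bookkeeping, here is the calculation in Python. The run values are round placeholder numbers chosen for illustration (they are not quoted from any of the readings), and the batting line is hypothetical.

```python
# Illustrative run values per event, relative to an average baseline.
# These are placeholder numbers, not any published set of weights.
ILLUSTRATIVE_WEIGHTS = {
    "1B": 0.47,
    "2B": 0.77,
    "3B": 1.04,
    "HR": 1.40,
    "BB": 0.31,
    "OUT": -0.27,
}


def linear_weights_runs(event_counts, weights):
    """Sum (count * run value) over every event type."""
    return sum(weights[event] * count for event, count in event_counts.items())


# Hypothetical season line for one hitter.
hitter = {"1B": 100, "2B": 30, "3B": 3, "HR": 20, "BB": 60, "OUT": 400}

runs_vs_baseline = linear_weights_runs(hitter, ILLUSTRATIVE_WEIGHTS)
print(round(runs_vs_baseline, 1))
```

The same function works for a team line, since the only inputs are event counts and one weight per event.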

The readings for the lecture are the following:

1. Two Jim Furtado pieces will be referenced here. First off, we have his general piece describing a brief account of the various run estimators developed in the past. I’ll be looking at the linear weights sections.

2. Second, we have Furtado’s own linear weights formula, Extrapolated Runs, explained by the man himself here.

3. The TangoTiger Wiki article on linear weights.

4. The TangoTiger Wiki article on batting runs.

5. Tom Tango’s 1999-2002 Run Expectancy Matrix.

6. Part 3 of Tom Tango’s run estimator series, particularly those charts I showed in our Runs Created discussion.

7. Phil Birnbaum’s two-part look at how we should not use multiple linear regression to determine linear weights (Parts 1 and 2)

Now, that looks like a lot of resources, but don’t worry, you won’t need to understand or even read through all of it (although you should, it’s great stuff). I’ll pull the parts I think are helpful to this most simplistic of explanations. Let’s get started.

Brief History

The founding father of the linear weights method is generally considered to be Pete Palmer, co-author of one of the classic sabermetric scripts, The Hidden Game of Baseball. Palmer used computer simulations (run at his work office late at night, after everyone had left, or so the tale goes) and examinations of World Series play-by-play data to determine a run expectation table, much like the one you see linked above by Tom Tango. From those run expectations, he was able to determine linear weights for each batting event.

Palmer debuted this weights system in The Hidden Game of Baseball, and it received its fair share of lauding and criticism. One of the biggest opponents of Palmer’s linear weights method was Bill James, who proclaimed quite simply that “the creation of runs is not linear.” Oddly enough, James later supported Paul Johnson’s Estimated Runs Produced system, which in fact turned out to be a linear weights system in disguise.

With Retrosheet now making play-by-play data available for seasons extending back into the 1950s, that data has become far more attainable for laypeople (hey, even I have a database, though I haven’t learned to put it to good use yet). As a result, determining run expectancies, and thus linear weights, via this method has become even easier. However, even though the data that feeds linear methods is easier to attain, linear weights still has the same advantages and disadvantages as it did when Palmer first published his findings.

The Methods

There are multiple ways to build a linear weights model for run scoring. The main difference between common models is in their choice of baseline. Palmer’s system is representative of systems based upon an average baseline; the commonly used offensive statistic wOBA uses the same sort of scale. In these systems, the value of an out is measured at around -0.27 runs compared to the average. Counter to that are systems that use an absolute baseline. In these systems, such as Paul Johnson’s ERP or Jim Furtado’s Extrapolated Runs, the value of an out is around -0.09 runs, representative of a baseline of zero runs.
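One rough way to see how the two out values relate: if the hit and walk weights are held the same in both systems, the entire gap between a zero baseline and an average baseline has to be absorbed by the out, and that gap works out to league runs per out. The sketch below assumes a hypothetical run environment of 4.8 runs per 27 outs; real systems do not keep the other weights exactly identical, so this is only an approximation.

```python
# Hypothetical league run environment (an assumption for illustration).
LEAGUE_RUNS_PER_GAME = 4.8
OUTS_PER_GAME = 27

# Out value on an absolute (zero-run) baseline, as quoted in the text.
ABSOLUTE_OUT_VALUE = -0.09

# Every out also uses up an average share of the league's scoring,
# so shifting to an average baseline subtracts runs per out.
runs_per_out = LEAGUE_RUNS_PER_GAME / OUTS_PER_GAME
average_out_value = ABSOLUTE_OUT_VALUE - runs_per_out

print(round(average_out_value, 2))
```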

These systems can be built in multiple ways. As mentioned above, Palmer used simulations and play-by-play data from the World Series to build a run expectancy table and measure the differences in base/out states before and after each event. This is quite similar to the empirical method of using play-by-play data that has now become possible thanks to the efforts at Retrosheet. Part 2 of Phil Birnbaum’s linked series gives an easy explanation as to how this is done theoretically.
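A minimal sketch of the empirical approach: the run value of a single play is the run expectancy after the play, minus the run expectancy before it, plus any runs that scored, and the weight for an event is the average of those values over every time the event occurred. The table below is a toy with placeholder numbers (not Tango’s published matrix), and the plays are a hypothetical sample covering only bases-empty situations.

```python
# Toy run expectancy table: (bases, outs) -> expected runs to end of inning.
# Placeholder values for illustration only.
RUN_EXPECTANCY = {
    ("---", 0): 0.55, ("---", 1): 0.30, ("---", 2): 0.12,
    ("1--", 0): 0.95, ("1--", 1): 0.57, ("1--", 2): 0.25,
}
INNING_OVER = 0.0  # no more runs can score after the third out


def run_value(before, after, runs_scored):
    """Change in run expectancy caused by one play, plus runs scored on it."""
    re_before = RUN_EXPECTANCY[before]
    re_after = INNING_OVER if after is None else RUN_EXPECTANCY[after]
    return re_after - re_before + runs_scored


def event_weight(plays):
    """Average run value over all recorded occurrences of one event."""
    values = [run_value(before, after, runs) for before, after, runs in plays]
    return sum(values) / len(values)


# Hypothetical play-by-play sample: (state before, state after, runs scored).
singles = [
    (("---", 0), ("1--", 0), 0),
    (("---", 1), ("1--", 1), 0),
    (("---", 2), ("1--", 2), 0),
]
outs = [
    (("---", 0), ("---", 1), 0),
    (("---", 1), ("---", 2), 0),
    (("---", 2), None, 0),  # third out ends the inning
]

single_weight = event_weight(singles)
out_weight = event_weight(outs)
print(round(single_weight, 3), round(out_weight, 3))
```

With real play-by-play data the sample would cover all 24 base/out states, and each state would appear in proportion to how often it actually came up.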

Another method to determine the weights is to use dynamic run estimators to determine changes in expected run scoring from any given event. The easiest way to do this is to take an average team’s statistics and plug them into a run estimator such as Runs Created or BaseRuns, and then add one event and measure the difference in runs scored. This is the so-called “+1 method.” The issue with this simple methodology is that, even when adding a singular event, you are changing the given run environment and thus not really determining the weight for that event in the defined environment. In order to properly do that, a little bit of differential math is needed, but it is definitely doable.
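To make the “+1 method” concrete, here is a sketch using the basic Runs Created formula, (H + BB) × TB / (AB + BB), on a hypothetical set of team totals. The second function stands in for the differential approach by using a very small step instead of a whole extra single; that is a numerical approximation chosen for illustration, not a derivation from any of the readings.

```python
def basic_runs_created(ab, h, bb, tb):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


def add_singles(stats, amount):
    """Return a copy of the stat line with `amount` singles added.

    A single adds one at-bat, one hit, and one total base.
    """
    return dict(
        stats,
        ab=stats["ab"] + amount,
        h=stats["h"] + amount,
        tb=stats["tb"] + amount,
    )


def plus_one_single(stats):
    """The +1 method: add one whole single and measure the change in runs."""
    return basic_runs_created(**add_singles(stats, 1)) - basic_runs_created(**stats)


def marginal_single(stats, step=1e-6):
    """Finite-difference estimate of the single's value at the current environment."""
    change = basic_runs_created(**add_singles(stats, step)) - basic_runs_created(**stats)
    return change / step


# Hypothetical team season totals.
team = {"ab": 5500, "h": 1450, "bb": 550, "tb": 2300}

plus_one = plus_one_single(team)
marginal = marginal_single(team)
print(round(plus_one, 4), round(marginal, 4))
```

The two numbers need not agree exactly, because the whole extra single nudges the run environment it is being measured in.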

A third method for determining the weights is one that has recently received some scorn: multiple linear regression. Examples are given in the linked Furtado piece (#2 from above). In Part 1 of Birnbaum’s two links, he discusses why these designs can be flawed, and there is a whole lot of discussion in the comments there as well. The basic tenet here is that regression analysis carries no logical weight in terms of baseball, and thus events can easily be underweighted or overweighted because the model does not understand the interrelation between run scoring and the events involved. Rather, the linear regression model only understands that runs were scored and that a certain number of each of these events accompanied that number of runs.

Finally, there is something in between all of this. A combination of trial and error, regression, play-by-play analysis, and common sense can lead people to the right coefficients. Paul Johnson’s initial model of ERP was based on a common-sense skeleton and a lot of trial and error work. Furtado’s Extrapolated Runs builds off of Johnson’s model and improves upon it with a combination of many of the sources above, yielding very accurate results over the time period measured.

Ironically, despite all of the differences in these methodologies, they often come close in terms of the actual values of the weights. Most systems nail home runs and walks at pretty much the same weights as those determined empirically. The weight of the double seems to be the one missed the most often, especially by linear regression models, which drastically underrate the value of the double found empirically. Not much has been found as to why that is.

The Advantages

Well, linear weights would not be a viable system if it were not accurate, and indeed it has proven to be. In Jim Furtado’s study on the accuracy of run estimators, he has Palmer’s Batting Runs right along with the other major metrics, and has his own linear model, Extrapolated Runs, at the top of the leaderboard. All of these methods have a root mean square error of around 20-25 runs.

Perhaps the biggest advantage to using linear weights models occurs at the individual level rather than the team level. If you take the assumption that a batter’s individual events do not significantly affect the run environment faced at any given state, you can use linear weights to approximate the context-neutral value of each batting event for a player. Because batting events occur sporadically between all base/out states for most players (i.e. players do not get to control what base/out states they face), this seems to be a solid assumption for hitters. Thus, we can count all of a player’s events, multiply by the respective weights, and total them for a run total above or below the given baseline. In fact, the use of the rate stat wOBA, commonly referred to as the premier offensive statistic among much of the sabermetric community, is based on linear weights calculated from run expectancies. Linear weights are the most common method of calculating offensive contribution for individual players.

It should be noted here that linear models for pitchers are far less appealing than those for hitters. The difference is control over the run environment. Hitters do not determine what states they see when they step up to the plate. Pitchers, on the other hand, have a definite control over the run environments they see, as they are the ones who assist in allowing baserunners and getting outs directly (along with the defense). Thus, a linear model that assumes an equal distribution of all base/out states is not really correct for pitchers even though it works well for hitters.

The Disadvantages

If linear weights seem so good, why did James despise them so? Well, to put it succinctly, James was right. Runs are not scored linearly, and one need only look at the extremes to find that out. Take Furtado’s XR formula shown in his “Introducing XR” article. Say we have a team that played a game in which they singled once in the first and once in the ninth, while recording 27 strikeouts the rest of the way. It is quite obvious to anyone here that this team would have scored zero runs in this case, yet Furtado’s equation for XR has this team scoring -1.6 runs, a physical impossibility.

Let’s use another example. Let’s say our team hit one home run and struck out the remainder of the time. This team should score one run, but the XR model yields -1.2 runs in that calculation, again impossible. This is not necessarily an indictment only on the model posed by Furtado. These extremes can be shown to be troublesome to any linear model, like Palmer’s Batting Runs or wOBA.
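The arithmetic behind both examples can be checked with a short sketch. The three coefficients below are the XR values for a single, a home run, and a strikeout as they are commonly published; since the formula is not reproduced in this piece, treat them as an assumption and check them against Furtado’s article. Only the terms needed for these two games are included.

```python
# Subset of Extrapolated Runs coefficients (assumed; verify against the
# "Introducing XR" article). All other XR terms are zero in these games.
XR_SINGLE = 0.50
XR_HOME_RUN = 1.44
XR_STRIKEOUT = -0.098


def xr_subset(singles=0, home_runs=0, strikeouts=0):
    """XR estimate using only the single, home run, and strikeout terms."""
    return (
        XR_SINGLE * singles
        + XR_HOME_RUN * home_runs
        + XR_STRIKEOUT * strikeouts
    )


# Two singles and 27 strikeouts: the team actually scores zero runs.
two_single_game = xr_subset(singles=2, strikeouts=27)

# One home run and 27 strikeouts: the team actually scores one run.
one_homer_game = xr_subset(home_runs=1, strikeouts=27)

print(round(two_single_game, 1), round(one_homer_game, 1))
```

In both games the 27 strikeouts alone cost more runs than the hits add, which is how a linear formula ends up below zero.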

Why? Because, like Runs Created, any linear weights model is designed to work within the framework of typical MLB performance. In both of the examples above, the teams posted very low OBP and SLG, reaching a point at which it becomes impossible for the model to accurately portray reality. Sure enough, check out this chart from Tango’s run estimator series. It shows how the scoring rate changes as the OPS of the games he split increases. As you can see, linear weights is very poor at estimating scoring rates at a game OPS below .200 (such as in the games described here) and above .950.

Ultimately, linear weights systems are built to fit baseball scoring, but not model it. Thus, while the accuracy of these models is high when applied to major league teams, they do not apply well at extreme values because they weren’t designed to. Linear weights is especially prey to this because the weights themselves must be determined from league run scoring; there simply is not another way of building a linear regression, play-by-play run expectancy, or even an acceptable framework without having a run environment in mind. If the run environment changes, the weights have to be changed.

Conclusions

None of the disadvantages should be seen as something that hobbles linear weights, but rather something that limits its use. For individual players, linear weights are very good for a wide range of players, simply because few hitters are so good or so bad as to meet the limits of run expectancy models. Still, the players and teams at the extremes will be valued incorrectly; the greatest players will be undercut by linear weights, while the worst players will be severely undersold. But for most, if not all of baseball, linear weights systems, if built properly, will be very accurate in quantifying a player’s contributions and reconciling mostly with the totals at the team level.

Ultimately, that is the appeal of linear weights: the ability to correlate well at both the individual and team levels. It may not model the reality of baseball, but because many of the models above match up well with the empirical models using play-by-play data, it fits run scoring quite well and thus is still very useful.
