<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Intro to Sabermetrics 101</title>
	<atom:link href="http://fanhuddle.com/statistics/feed/" rel="self" type="application/rss+xml" />
	<link>http://fanhuddle.com/statistics</link>
	<description>Just another Welcome to Fanhuddle weblog</description>
	<lastBuildDate>Fri, 20 Nov 2009 20:25:59 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>BaseRuns: The Ultimate in Run Estimation (so far)</title>
		<link>http://fanhuddle.com/statistics/2009/11/20/baseruns-the-ultimate-in-run-estimation-so-far/</link>
		<comments>http://fanhuddle.com/statistics/2009/11/20/baseruns-the-ultimate-in-run-estimation-so-far/#comments</comments>
		<pubDate>Fri, 20 Nov 2009 20:25:59 +0000</pubDate>
		<dc:creator>Michael Jong</dc:creator>
				<category><![CDATA[Fundamentals]]></category>
		<category><![CDATA[BaseRuns]]></category>
		<category><![CDATA[David Smyth]]></category>
		<category><![CDATA[Patriot]]></category>

		<guid isPermaLink="false">http://fanhuddle.com/statistics/?p=218</guid>
		<description><![CDATA[In the last few Fundamentals sections, I&#8217;ve been discussing run estimators. First, we started with one of the originals, Runs Created. Then, we discussed a little about linear weights. Both system types had their share of positives and negatives. One common thing I mentioned for both is that, because both systems were designed with certain [...]]]></description>
			<content:encoded><![CDATA[<p>In the last few Fundamentals sections, I&#8217;ve been discussing run estimators. First, we started with one of the originals, <a href="http://fanhuddle.com/statistics/2009/10/30/runs-created-the-first-step/">Runs Created</a>. Then, we discussed a little about <a href="http://fanhuddle.com/statistics/2009/11/13/linear-weights-positives-negatives/">linear weights</a>. Both system types had their share of positives and negatives. One common thing I mentioned for both is that, because both systems were designed with certain run environments in mind (linear weights by definition must be based on certain environments, while Runs Created was initially made through observational analysis of the major league environment by Bill James), they falter when taken &#8220;out of their element.&#8221;</p>
<p>Today, we&#8217;re going to discuss the current premier run estimator, BaseRuns. BaseRuns is considered the best run estimation tool at the moment because its model comes closer to reality than that of any other system. As Bill James mentioned, runs are not scored/created linearly, so linear weights is ultimately a system that approximates the environment it is assigned (and it does this very well). RC is a dynamic estimator, intended to model baseball reality, but its structure is flawed: like linear weights, it fits run scoring more than it models it. BaseRuns, on the other hand, offers us the best, most intuitive model for run scoring as of yet.</p>
<p>Here are the references/readings for the discussion:</p>
<p>1. Patriot&#8217;s <a href="http://gosu02.tripod.com/id108.html">excellent BaseRuns write-up</a>, upon which most of this piece will be based.</p>
<p>2. <a href="http://www.tangotiger.net/rc3.html">Part 3</a> of Tom Tango&#8217;s run estimator series, particularly those charts I showed in our last two discussions.</p>
<p>3. TangoTiger&#8217;s wiki article on <a href="http://www.tangotiger.net/wiki/index.php?title=Base_Runs">BaseRuns</a>.</p>
<p>Let&#8217;s dive into BaseRuns.<span id="more-218"></span></p>
<p><strong>The Model</strong></p>
<p>David Smyth is credited with developing BaseRuns in the early 1990s. Initially, he attempted to work from the basic design of Runs Created shown here:</p>
<p>Runs = A*B/C</p>
<p>where A is the on-base factor, B is the advancement factor, and C is the opportunity factor. However, Smyth found in his work that this design did not model run scoring well. He changed the model to this basic structure.</p>
<p><strong>Runs = A*[B/(B+C)] + D</strong></p>
<p>where A is once again an on-base factor, B represents an advancement factor, C represents the number of outs, and D represents home runs. You can see by this construction that the general run-scoring model is a measure of baserunners (A) multiplied by the percentage of baserunners who score (advancement of runners B over total opportunities B+C), plus any home runs hit (D).</p>
<p>The simplest formula for the BaseRuns components is shown as follows.</p>
<p>A = H + W &#8211; HR<br />
B = (1.4*TB &#8211; .6*H &#8211; 3*HR + .1*W)*1.02<br />
C = AB &#8211; H<br />
D = HR</p>
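<p>As a rough sketch, the simple version above translates directly into a few lines of Python. The function name and the team totals are invented for illustration; only the component formulas come from the list above.</p>

```python
def base_runs_simple(ab, h, tb, hr, w):
    """Estimate runs from the simplest BaseRuns components."""
    a = h + w - hr                                      # baserunners, excluding home runs
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * w) * 1.02  # advancement factor
    c = ab - h                                          # outs
    d = hr                                              # home runs always score
    return a * b / (b + c) + d


# Hypothetical team-season totals, made up for this example.
estimate = base_runs_simple(ab=5500, h=1450, tb=2300, hr=160, w=550)
print(round(estimate, 1))
```

<p>The term b / (b + c) is the estimated share of baserunners who score, so it stays between 0 and 1 whenever b and c are positive.</p>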
<p>A more complicated model including most recorded official statistics is as follows.</p>
<p>A = H + W + HBP &#8211; HR &#8211; .5*IW<br />
B = (1.4*TB &#8211; .6*H &#8211; 3*HR + .1*(W + HBP &#8211; IW) + .9*(SB &#8211; CS &#8211; GDP))*1.1<br />
C = AB &#8211; H + CS + GDP<br />
D = HR</p>
<p>Finally, there is a model used for pitchers.</p>
<p>A = H + W &#8211; HR<br />
B = (1.4*TBe &#8211; .6*H &#8211; 3*HR + .1*W)*1.1<br />
C = 3*IP<br />
D = HR<br />
Where TBe = 1.12*H + 4*HR</p>
<p>Here, TBe serves as an estimate of total bases against for pitchers. Many sources publish actual total bases allowed, and that figure can be used in its stead.</p>
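<p>A minimal sketch of the pitcher version might look like the following. The function name, the optional tb argument, and the sample season line are assumptions made for illustration; the formulas are the ones listed above.</p>

```python
def base_runs_pitcher(ip, h, hr, w, tb=None):
    """Estimate runs allowed with the pitcher version of BaseRuns.

    ip is innings pitched as a decimal (83 1/3 innings is 83.333, not 83.1).
    If actual total bases allowed (tb) are unknown, estimate them as TBe.
    """
    if tb is None:
        tb = 1.12 * h + 4 * hr                          # TBe estimate
    a = h + w - hr
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * w) * 1.1
    c = 3 * ip                                          # outs recorded
    d = hr
    return a * b / (b + c) + d


# Hypothetical starter's season line, made up for this example.
runs_allowed = base_runs_pitcher(ip=200, h=180, hr=20, w=60)
print(round(runs_allowed, 1), round(9 * runs_allowed / 200, 2))
```

<p>The second printed number converts the estimate to runs per nine innings, which is usually the easier figure to compare across pitchers.</p>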
<p><strong>The differences between RC and BaseRuns</strong></p>
<p>The primary and perhaps most important difference between RC and BaseRuns is how each system handles home runs. By not separating home runs from other factors, RC handles home runs inherently incorrectly. As Patriot explains in his article, in an environment where every plate appearance is a home run, RC values each home run at four runs, which makes no sense. On the other hand, every home run must be worth at least one run, even in an environment where no other plate appearance results in a positive event. The example given in the article cites a game in which a player hits a home run, then the team records 27 outs. In this case, RC would estimate that this team scored 0.14 runs, another impossibility. This problem is of course also seen in linear weights. Each home run is valued at approximately 1.4 runs in a normal context. However, once you leave that context, the fixed value breaks down: a solo home run clearly cannot amount to 1.4 runs.</p>
<p>BaseRuns handles the home run separately from the on-base factor and deducts some amount from the advancement factors. It correctly values a solo home run as worth one run, regardless of the context provided.</p>
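<p>The two extreme cases above are easy to check by hand or in code. The sketch below assumes the basic Runs Created formula, (H + W) * TB / (AB + W), which reproduces the 0.14 figure quoted from Patriot&#8217;s article, and the simplest BaseRuns version given earlier. The function names are made up for this example.</p>

```python
def runs_created_basic(ab, h, tb, w):
    """Basic Runs Created: (H + W) * TB / (AB + W)."""
    return (h + w) * tb / (ab + w)


def base_runs_simple(ab, h, tb, hr, w):
    """Simplest BaseRuns version from the formulas given earlier."""
    a = h + w - hr
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * w) * 1.02
    c = ab - h
    return a * b / (b + c) + hr


# Game 1: one home run followed by 27 outs (28 at-bats, no walks).
solo = dict(ab=28, h=1, tb=4, w=0)
rc_solo = runs_created_basic(**solo)
bsr_solo = base_runs_simple(hr=1, **solo)

# Game 2: ten plate appearances, every one a home run, no outs.
all_hr = dict(ab=10, h=10, tb=40, w=0)
rc_all = runs_created_basic(**all_hr)
bsr_all = base_runs_simple(hr=10, **all_hr)

print(round(rc_solo, 2), bsr_solo)   # RC falls well short of the one run scored
print(rc_all, bsr_all)               # RC credits four runs per home run
```

<p>In both games the A factor in BaseRuns is zero, so the estimate collapses to the D term: exactly one run per home run.</p>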
<p><strong>The Advantages</strong></p>
<p>The biggest advantage in BaseRuns is obvious: it is the best model for estimating how runs are scored intuitively. When looking at the basic structure, it makes perfect sense. Runners that get on base need to be advanced to score, and the advancement factor divided by the opportunities factor of (Outs + Advanced Bases) comes out as a percentage of baserunners that score. Multiplying these factors obviously comes out to an estimate of how many runs score from players that get on base. Adding home runs separately accounts for the fact that each home run must by definition provide at least one run. Furthermore, removing home runs from the baserunner factor makes sense, as home runs do not actually put players on base. Basically, the fundamental structure of BaseRuns makes baseball sense, and that is why it works so well.</p>
<p>As evidenced by much of the research on the topic, it does indeed work very well, even in extreme environments. Check out <a href="http://www.tangotiger.net/rc3ops.jpg">this chart</a> from Tango&#8217;s series. You can see in the chart that the scoring percentage estimated by BaseRuns and the actual empirical data match very closely even as we move from extremely low run environments to extremely high ones.</p>
<p>An additional advantage comes from the customization element. As with RC, there are ways to adjust the factors to fit new data points, because the factors are realities of baseball that are easy to understand. Patriot points out that the B factor of advancement can be changed depending on the environment. It can also be changed to fit an actual percentage of runners scored, though this would feel like &#8220;cheating&#8221; by tailoring the system to meet the dataset. Nevertheless, it is possible to tailor factors to include different points of data and to tinker with new formulas for the baserunners scoring rate.</p>
<p>Finally, because BaseRuns fits empirical data so well, even in the extremes, linear weights can be determined from BaseRuns by using a little differentiation. In fact, Tango has done this exact thing in <a href="http://www.tangotiger.net/customlwts.html">determining linear weights for various run environments</a>.</p>
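<p>To make the differentiation idea concrete, here is a hedged sketch that approximates the partial derivative of BaseRuns with respect to each event using central differences. It is a numerical stand-in for the calculus, not necessarily the procedure Tango used, and the team totals and function names are invented for illustration.</p>

```python
def base_runs(singles, doubles, triples, hr, w, outs):
    """Simplest BaseRuns version, written in terms of event counts."""
    h = singles + doubles + triples + hr
    tb = singles + 2 * doubles + 3 * triples + 4 * hr
    a = h + w - hr
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * w) * 1.02
    c = outs
    return a * b / (b + c) + hr


def linear_weights(team, step=1e-4):
    """Approximate d(runs)/d(event) for each event by central differences."""
    weights = {}
    for event in team:
        up = dict(team)
        down = dict(team)
        up[event] += step
        down[event] -= step
        weights[event] = (base_runs(**up) - base_runs(**down)) / (2 * step)
    return weights


# Hypothetical team-season event counts, made up for this example.
team = dict(singles=970, doubles=290, triples=30, hr=160, w=550, outs=4050)
for event, weight in linear_weights(team).items():
    print(event, round(weight, 3))
```

<p>Changing the team totals changes the run environment, and the weights move with it, which is the point of deriving them from a dynamic estimator.</p>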
<p>In short, BaseRuns is a dynamic estimator with what we perceive as the correct factors necessary for real baseball run scoring.</p>
<p><strong>The Disadvantages</strong></p>
<p>While BaseRuns is the best estimate we have at the moment, it is by no means perfect. One standing issue with BaseRuns is its estimation of stranded runners in an inning. As mentioned on the wiki, BaseRuns can overestimate the number of runners stranded, putting the number past three even though that is the limit of stranded runners in an inning. Also described in the wiki, BaseRuns can overestimate runs scored in an OBP range of .500-.800, as can be seen in <a href="http://www.tangotiger.net/rc3oba.jpg">this chart</a>. However, this estimate is still leagues closer to reality than what any one linear weights model or RC would give.</p>
<p>Finally, there is still an issue with applying BaseRuns to individual players. Clearly, the model was designed to run on teams, so applying it to individuals is a misuse of the equation. In applying it to hitters, the same problem from RC is seen: one hitter&#8217;s on-base events (OBP) and advancement events (SLG) do not interact with each other, but rather with the on-base/advancement events of his team. However, just like in RC, use of a theoretical team model can fix this. Also, use of BaseRuns for pitchers, given that pitchers play behind a defense, is acceptable for estimating runs allowed by pitchers and their defense.</p>
<p><strong>Conclusions</strong></p>
<p>BaseRuns stands as the best model we currently have available to estimate the impact of events in terms of runs. It is dynamic, which counters the use of linear models that do not model baseball reality. At its basic structure, it is a better model for actual run scoring than Runs Created or other designed dynamic models. Because of its accuracy, its applications are wide-ranging. The only thing left to do is find a better estimation of baserunner scoring rate, but the basic B/(B+C) is good enough to fit very well in all sorts of run environments. While linear weights are still useful for individual players, BaseRuns can help contribute to the system&#8217;s accuracy by helping determine the weights themselves. At a team level, there simply is no reason to use anything other than BaseRuns to estimate run production.</p>
<p align="left"><a class="tt" href="http://twitter.com/home/?status=BaseRuns%3A+The+Ultimate+in+Run+Estimation+%28so+far%29+http://3fryg.th8.us" title="Post to Twitter"><img class="nothumb" src="http://fanhuddle.com/statistics/wp-content/plugins/tweet-this/icons/tt-twitter.png" alt="Post to Twitter" /></a> <a class="tt" href="http://twitter.com/home/?status=BaseRuns%3A+The+Ultimate+in+Run+Estimation+%28so+far%29+http://3fryg.th8.us" title="Post to Twitter">Tweet This Post</a></p>]]></content:encoded>
			<wfw:commentRss>http://fanhuddle.com/statistics/2009/11/20/baseruns-the-ultimate-in-run-estimation-so-far/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Player Profile: The Rookies of the Year</title>
		<link>http://fanhuddle.com/statistics/2009/11/16/player-profile-the-rookies-of-the-year/</link>
		<comments>http://fanhuddle.com/statistics/2009/11/16/player-profile-the-rookies-of-the-year/#comments</comments>
		<pubDate>Mon, 16 Nov 2009 21:19:51 +0000</pubDate>
		<dc:creator>Michael Jong</dc:creator>
				<category><![CDATA[Player Profile]]></category>
		<category><![CDATA[Andrew Bailey]]></category>
		<category><![CDATA[Andrew McCutchen]]></category>
		<category><![CDATA[Brett Anderson]]></category>
		<category><![CDATA[Chris Coghlan]]></category>

		<guid isPermaLink="false">http://fanhuddle.com/statistics/?p=215</guid>
		<description><![CDATA[Today, the AL and NL Rookie of the Year awards were announced, and let me congratulate the Florida Marlins&#8217; Chris Coghlan and the Oakland Athletics&#8217; Andrew Bailey on winning the NL and AL awards respectively. Both players had excellent seasons in their own way, and were among the best rookies of each league.
That being said, [...]]]></description>
			<content:encoded><![CDATA[<p>Today, the AL and NL Rookie of the Year awards were announced, and let me congratulate the <strong>Florida Marlins&#8217; Chris Coghlan</strong> and the <strong>Oakland Athletics&#8217; Andrew Bailey</strong> on winning the NL and AL awards respectively. Both players had excellent seasons in their own way, and were among the best rookies of each league.</p>
<p>That being said, the numbers suggest that they were NOT the best rookies. Here, we&#8217;re going to look a little more into some numbers to find out what players should have been in the mix.<span id="more-215"></span></p>
<p><strong>Andrew Bailey</strong></p>
<p>Bailey put up an excellent season as the closer for the Oakland A&#8217;s. Let&#8217;s get this right out of the way, though: here at Intro to Sabermetrics 101, we <em>do not discuss saves</em> as a meaningful statistic. Yes, Bailey saved 26 of 30 attempts this season for the A&#8217;s, but we <em>do not consider this a measure of skill or production</em>. This goes as well for RBIs. OK, let&#8217;s get on with Bailey.</p>
<p>In 83 1/3 innings this season, Andrew Bailey struck out 91 batters, walked 21 unintentionally, and gave up only five home runs. That equates to a strikeout rate of 28.1%, an unintentional walk rate of 6.5%, and a HR/FB% of 5.6% <a href="http://www.fangraphs.com/statss.aspx?playerid=1368&amp;position=P">according to FanGraphs</a>. Comparing those strikeout and walk rates to the American League reliever averages of 19.4% and 8.8% respectively also shines a favorable light on Bailey&#8217;s season. By all accounts, Bailey&#8217;s year was excellent.</p>
<p>So why did I say that he may not have been the right candidate for the award? This sentiment stems from the fact that, for the most part, relievers simply are not as valuable as starters, and there was a rookie starter who did very well this season. In fact, that rookie starter resides on the same team as Bailey.</p>
<p>According to FanGraphs, Bailey was worth 2.4 WAR, tied with teammate <strong>Michael Wuertz</strong> and second only to <strong>Chicago White Sox</strong> reliever <strong>Matt Thornton</strong> in WAR for American League relievers. If you prefer Rally&#8217;s method of determining defense-independent pitching (subtracting a prorated defensive value based on team defensive runs), Bailey comes out at a much more favorable 4 WAR. However, teammate and fellow rookie <strong>Brett Anderson</strong> came out with an equally impressive rookie campaign. Anderson&#8217;s 6.0% unintentional walk rate and 20.4% strikeout rate as a starter appear just as impressive as Bailey&#8217;s performance as a reliever. Sure enough, even though Anderson suffered from a bit of a long-ball problem, his FIP came out to 3.69, as compared to Bailey&#8217;s 2.56. As a general rule of thumb, we expect relievers to knock off one run per nine innings from their runs allowed due to their shortened workload. Adding that run back puts Bailey at roughly 3.56, so the two appear to come out fairly even.</p>
<p>Anderson recorded 3.8 WAR as a starter <a href="http://www.fangraphs.com/statss.aspx?playerid=8223&amp;position=P">according to FanGraphs</a>, though Rally&#8217;s method yielded a less flattering 2.3 WAR. It may be a matter of &#8220;what&#8217;s your flavor&#8221; for determining pitcher production outside of defense, but it is definitely worth noting that, skill-wise, it appears that Anderson and Bailey were very close, and that Anderson&#8217;s playing time as a starter would win out. That being said, the AL argument seems more debatable.</p>
<p>There were other candidates, and it would be a shame not to mention them. Surprisingly, <strong>Texas Rangers</strong> shortstop <strong>Elvis Andrus</strong> did not get the award, despite the acclaim he received as a defensive wizard and the player who fixed Texas&#8217; pitching staff. Thanks to Andrus&#8217; +10 glove at shortstop, he was worth 3 WAR <a href="http://www.fangraphs.com/statss.aspx?playerid=8709&amp;position=SS">according to FanGraphs</a>, which was well in the hunt.</p>
<p>Finally, I applaud the members of the BBWAA for avoiding <strong>Rick Porcello</strong> as their RoY choice. Porcello may indeed have a bright future ahead of him, but this season the best thing he did was put the ball on the ground (54% GB%) and let his excellent team defense behind him take care of it (<strong>Detroit Tigers&#8217;</strong> team bUZR: +43 runs). Porcello struck out just 12% of hitters he faced, while walking 7.2% of them. In addition, despite the excess of ground balls, Porcello still allowed 23 home runs on the year, three more than Anderson in five fewer innings of work.</p>
<p><strong>Chris Coghlan</strong></p>
<p>This debate is something that I have argued over in my Marlin Maniac blog for <a href="http://marlinmaniac.com/2009/10/14/bba-ballot-national-league-rookie-of-the-year/">quite</a> <a href="http://marlinmaniac.com/2009/10/23/coghlan-vs-mccutchen/">some</a> <a href="http://marlinmaniac.com/2009/11/16/thoughts-coghlan-nl-roy/">time</a>. While pitchers such as the <strong>Chicago Cubs&#8217; Randy Wells</strong>, <strong>Atlanta Braves&#8217; Tommy Hanson</strong>, and <strong>Philadelphia Phillies&#8217; J.A. Happ</strong> deserved consideration and mention in this debate, the <strong>Pittsburgh Pirates&#8217; Andrew McCutchen</strong> likely lost the most in this award vote. In my opinion, McCutchen deserved the award over Coghlan, and the argument is based soundly on defense. However, the BBWAA clearly does not vote on defense, and this may have been why Coghlan took it over McCutchen.</p>
<p>We discussed in the glossary the concept of positional adjustment. The basis of it is simple: different positions vary in degree of difficulty, so the pool of players that can play certain positions is smaller or larger than others. As a result, we give adjustments for the value of a player having played in a more difficult position, based on previous players and how well they have done playing multiple positions. The gap between defense in a corner outfield and center field is 10 runs per 162 games. In the time that McCutchen and Coghlan played this season, that gap was almost eight runs. Without even considering the quality of defense between the two players, Chris Coghlan would have to have been six runs better offensively to break even with McCutchen.</p>
<p>However, the difference between McCutchen&#8217;s and Coghlan&#8217;s park-adjusted wOBA was about .004 points. The difference in their OPS was .014 points. This difference was a bit lower than what it needed to be to even Coghlan and McCutchen.</p>
<p>Then, when you look at the defensive numbers and scouting evaluations of both players, you can tell that there was a pretty significant defensive gap between Coghlan and McCutchen, not including the difference in position. This should have made the choice between them simpler, but it seems that the BBWAA will continue to ignore defense or position and instead award the player with the better offense every time.</p>
<p>While I am happy to see two deserving players win the award, I can safely claim that they were, at the very least, questionable choices. Of course, this is the norm almost every season, so this one should be of no surprise.</p>
<p align="left"><a class="tt" href="http://twitter.com/home/?status=Player+Profile%3A+The+Rookies+of+the+Year+http://7aonb.th8.us" title="Post to Twitter"><img class="nothumb" src="http://fanhuddle.com/statistics/wp-content/plugins/tweet-this/icons/tt-twitter.png" alt="Post to Twitter" /></a> <a class="tt" href="http://twitter.com/home/?status=Player+Profile%3A+The+Rookies+of+the+Year+http://7aonb.th8.us" title="Post to Twitter">Tweet This Post</a></p>]]></content:encoded>
			<wfw:commentRss>http://fanhuddle.com/statistics/2009/11/16/player-profile-the-rookies-of-the-year/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Linear Weights: The Positive and Negative (Runs)</title>
		<link>http://fanhuddle.com/statistics/2009/11/13/linear-weights-positives-negatives/</link>
		<comments>http://fanhuddle.com/statistics/2009/11/13/linear-weights-positives-negatives/#comments</comments>
		<pubDate>Sat, 14 Nov 2009 00:08:42 +0000</pubDate>
		<dc:creator>Michael Jong</dc:creator>
				<category><![CDATA[Fundamentals]]></category>
		<category><![CDATA[Jim Furtado]]></category>
		<category><![CDATA[Linear Weights]]></category>
		<category><![CDATA[Pete Palmer]]></category>
		<category><![CDATA[Tom Tango]]></category>

		<guid isPermaLink="false">http://fanhuddle.com/statistics/?p=193</guid>
		<description><![CDATA[Last week we discussed Runs Created as one of the innovators in the field of run estimators. We talked about the benefits, flaws, and limitations of Runs Created. One of the mentioned limitations is the inability to use the Runs Created formula as initially written to find the run contribution of individual players, because the [...]]]></description>
			<content:encoded><![CDATA[<p>Last week we discussed Runs Created as one of the innovators in the field of run estimators. We talked about the benefits, flaws, and limitations of Runs Created. One of the mentioned limitations is the inability to use the Runs Created formula as initially written to find the run contribution of individual players, because the Runs Created formula was designed at the team level.</p>
<p>This week, we&#8217;ll talk about a system that can work on both the individual and team levels with good accuracy, but has its own endemic flaws. The system in question is the linear weights system. Linear weights is a simple concept: to each event that occurs in baseball, assign a run value over some baseline, then multiply those values by the number of times each event was recorded and add up the results. What follows is a slightly more in-depth look at a brief history of the design, how the weights are produced, and the benefits and downsides of using linear weights.</p>
<p>The readings for the lecture are the following:</p>
<p>1. Two Jim Furtado pieces will be referenced here. First off, we have his general piece describing <a href="http://www.baseballthinkfactory.org/btf/scholars/furtado/articles/Why_Do_We_Need_Another_Player_Evaluation_Method.htm">a brief account of the various run estimators</a> developed in the past. I&#8217;ll be looking at the linear weights sections.</p>
<p>2. Second, we have Furtado&#8217;s own linear weights formula, Extrapolated Runs, explained by the man himself <a href="http://www.baseballthinkfactory.org/btf/scholars/furtado/articles/IntroducingXR.htm">here</a>.</p>
<p>3. The TangoTiger Wiki article on <a href="http://www.tangotiger.net/wiki/index.php?title=Linear_Weights">linear weights</a>.</p>
<p>4. The TangoTiger Wiki article on <a href="http://www.tangotiger.net/wiki/index.php?title=Batting_Runs">batting runs</a>.</p>
<p>5. Tom Tango&#8217;s <a href="http://www.tangotiger.net/RE9902.html">1999-2002 Run Expectancy Matrix</a>.</p>
<p>6. <a href="http://www.tangotiger.net/rc3.html">Part 3</a> of Tom Tango&#8217;s run estimator series, particularly those charts I showed in our Runs Created discussion.</p>
<p>7. Phil Birnbaum&#8217;s two-part look at how we should not use multiple linear regression to determine linear weights (Parts <a href="http://sabermetricresearch.blogspot.com/2009/10/dont-use-regression-to-calculate-linear.html">1</a> and <a href="http://sabermetricresearch.blogspot.com/2009/10/dont-use-regression-to-calculate-linear_30.html">2</a>)</p>
<p>Now, that looks like a lot of resources, but don&#8217;t worry, you won&#8217;t need to understand or even read through all of it (although you should, it&#8217;s great stuff). I&#8217;ll pull the parts I think are helpful to this most simplistic of explanations. Let&#8217;s get started.<span id="more-193"></span></p>
<p><strong>Brief History</strong></p>
<p>The founding father of the linear weights method is generally considered to be Pete Palmer, co-author of one of the classic sabermetric scripts, <em>The Hidden Game of Baseball</em>. Palmer used computer simulations (run at his work office late at night, after everyone had left, or so the tale goes) and examinations of World Series play-by-play data to determine a run expectation table, much like the one you see linked above by Tom Tango. From those run expectations, he was able to determine linear weights for each batting event.</p>
<p>Palmer debuted this weights system in <em>The Hidden Game of Baseball</em>, and it received its fair share of lauding and criticism. One of the biggest opponents of Palmer&#8217;s linear weights method was Bill James, who proclaimed quite simply that &#8220;the creation of runs is not linear.&#8221; Oddly enough, James later supported Paul Johnson&#8217;s Estimated Runs Produced system, which in fact turned out to be a linear weights system in disguise.</p>
<p>With Retrosheet making play-by-play data available for seasons extending back into the 1950s, that data has become even more attainable for laypeople (hey, even I have a database, though I haven&#8217;t learned to put it to good use yet). As a result, determining run expectancies, and thus linear weights, via this method has become even easier. However, even as the data that feeds linear methods has become easier to attain, linear weights still has the same advantages and disadvantages as it did when Palmer first published his findings.</p>
<p><strong>The Methods</strong></p>
<p>There are multiple ways to build a linear weights model for run scoring. The main difference between common models is in their choice of baseline. Palmer&#8217;s system is representative of systems based upon an average baseline; the commonly used offensive statistic wOBA uses the same sort of scale. In these systems, the value of an out is measured at around -0.27 runs compared to the average. Counter to that are systems that use an absolute baseline. In these systems, such as Paul Johnson&#8217;s ERP or Jim Furtado&#8217;s Extrapolated Runs, the value of an out is around -0.09 runs, representative of a baseline of zero runs.</p>
<p>These systems can be built in multiple ways. As mentioned above, Palmer used simulations and play-by-play data from the World Series to build a run expectancy table and measure differences in base/out states that occur after each event. This is quite similar to the empirical method of using play-by-play data that has now become possible thanks to the efforts at Retrosheet. Part 2 of Phil Birnbaum&#8217;s linked series gives an easy explanation as to how this is done theoretically.</p>
<p>Another method to determine the weights is to use dynamic run estimators to determine changes in expected run scoring from any given event. The easiest way to do this is to take an average team&#8217;s statistics and plug them into a run estimator such as Runs Created or BaseRuns, and then add one event and measure the difference in runs scored. This is the so-called &#8220;+1 method.&#8221; The issue with this simple methodology is that, even when adding a singular event, you are changing the given run environment and thus not really determining the weight for that event in the defined environment. In order to properly do that, a little bit of differential math is needed, but it is definitely doable.</p>
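<p>As an illustration of the &#8220;+1 method,&#8221; the sketch below plugs a team into the basic Runs Created formula, (H + W) * TB / (AB + W), adds one event at a time, and records the change in estimated runs. The team totals, event table, and function names are assumptions made for this example, and any dynamic estimator could be swapped in for Runs Created.</p>

```python
def runs_created_basic(ab, h, tb, w):
    """Basic Runs Created: (H + W) * TB / (AB + W)."""
    return (h + w) * tb / (ab + w)


# How one extra event of each type changes the four inputs.
EVENT_DELTAS = {
    "single": dict(ab=1, h=1, tb=1, w=0),
    "double": dict(ab=1, h=1, tb=2, w=0),
    "triple": dict(ab=1, h=1, tb=3, w=0),
    "home_run": dict(ab=1, h=1, tb=4, w=0),
    "walk": dict(ab=0, h=0, tb=0, w=1),
    "out": dict(ab=1, h=0, tb=0, w=0),
}


def plus_one_weights(team):
    """Value each event as the change in estimated runs from adding one of it."""
    baseline = runs_created_basic(**team)
    weights = {}
    for event, delta in EVENT_DELTAS.items():
        bumped = {stat: team[stat] + delta[stat] for stat in team}
        weights[event] = runs_created_basic(**bumped) - baseline
    return weights


# Hypothetical team-season totals, made up for this example.
team = dict(ab=5500, h=1450, tb=2280, w=550)
for event, weight in plus_one_weights(team).items():
    print(event, round(weight, 3))
```

<p>These values come from made-up totals and from Runs Created specifically, so they should be read as a demonstration of the procedure, not as usable weights.</p>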
<p>A third method for determining the weights is one that has recently received some scorn: multiple linear regression. Examples are given in the linked Furtado piece (#2 from above). In Part 1 of Birnbaum&#8217;s two links, he discusses why these designs can be flawed, and there is a whole lot of discussion in the comments there as well. The basic tenet here is that regression analysis carries no logical weight in terms of baseball, and thus events can easily be under- or overweighted because the model does not understand the interrelation between run scoring and the events involved. Rather, the linear regression model only understands that runs were scored and that a certain amount of each of these events accompanied that number of runs.</p>
<p>Finally, there is something in between all of this. A combination of trial and error, regression, play-by-play analysis, and common sense can lead people to the right coefficients. Paul Johnson&#8217;s initial model of ERP was based on a common-sense skeleton and a lot of trial and error work. Furtado&#8217;s Extrapolated Runs builds off of Johnson&#8217;s model and improves upon it with a combination of many of the sources above, yielding very accurate results over the time period measured.</p>
<p>Ironically, despite all of the differences in these methodologies, they often come close in terms of the actual values of the weights. Most systems nail home runs and walks at pretty much the same weights as those determined empirically. The weight of the double seems to be the one missed the most often, especially by linear regression models, which drastically underrate the value of the double found empirically. Not much has been found as to why that is.</p>
<p><strong>The Advantages</strong></p>
<p>Well, linear weights would not be a viable system if it were not accurate, and indeed it has proven to be. In Jim Furtado&#8217;s study on the <a href="http://www.baseballthinkfactory.org/btf/scholars/furtado/articles/accuracy.htm">accuracy of run estimators</a>, he has Palmer&#8217;s Batting Runs right along with the other major metrics, and has his own linear model, Extrapolated Runs, at the top of the leaderboard. All of these methods have a root mean square error of around 20-25 runs.</p>
<p>Perhaps the biggest advantage to using linear weights models occurs at the individual level rather than the team level. If you take the assumption that a batter&#8217;s individual events do not significantly affect the run environment faced at any given state, you can use linear weights to approximate the context-neutral value of each batting event for a player. Because batting events occur sporadically between all base/out states for most players (i.e. players do not get to control what base/out states they face), this seems to be a solid assumption for hitters. Thus, we can count all of a player&#8217;s events, multiply by the respective weights, and total them for a run total above or below the given baseline. In fact, the use of the rate stat wOBA, commonly referred to as the premier offensive statistic among much of the sabermetric community, is based on linear weights calculated from run expectancies. Linear weights are the most common method of calculating offensive contribution for individual players.</p>
<p>It should be noted here that linear models for pitchers are far less appealing than those for hitters. The difference is control over the run environment. Hitters do not determine what states they see when they step up to the plate. Pitchers, on the other hand, have a definite control over the run environments they see, as they are the ones who assist in allowing baserunners and getting outs directly (along with the defense). Thus, a linear model that assumes an equal distribution of all base/out states is not really correct for pitchers even though it works well for hitters.</p>
<p><strong>The Disadvantages</strong></p>
<p>If linear weights seem so good, why did James despise them so? Well, to put it succinctly, James was right. Runs are not scored linearly, and one need only look at the extremes to find that out. Take Furtado&#8217;s XR formula shown in his &#8220;Introducing XR&#8221; article. Say we have a team that played a game in which they singled once in the first and once in the ninth, while recording 27 strikeouts the rest of the way. It is quite obvious to anyone here that this team would have scored zero runs in this case, yet Furtado&#8217;s equation for XR has this team scoring -1.6 runs, a physical impossibility.</p>
<p>Let&#8217;s use another example. Let&#8217;s say our team hit one home run and struck out the remainder of the time. This team should score one run, but the XR model yields -1.2 runs in that calculation, which is again impossible. This is not necessarily an indictment of Furtado&#8217;s model alone. These extremes can be shown to be troublesome for any linear model, like Palmer&#8217;s Batting Runs or wOBA.</p>
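<p>Both calculations can be reproduced with a few lines of Python. This sketch uses only the three XR terms the examples need, and the coefficients (.50 per single, 1.44 per home run, -.098 per strikeout) are my reading of Furtado&#8217;s published formula, so check them against his article before relying on them. The function name is made up for this example.</p>

```python
# Only the XR terms needed for the two extreme games above. Coefficients are
# assumed from Furtado's "Introducing XR" article and should be verified there.
XR_SINGLE = 0.50
XR_HOME_RUN = 1.44
XR_STRIKEOUT = -0.098


def xr_partial(singles=0, home_runs=0, strikeouts=0):
    """Partial Extrapolated Runs using singles, home runs and strikeouts only."""
    return (XR_SINGLE * singles
            + XR_HOME_RUN * home_runs
            + XR_STRIKEOUT * strikeouts)


# Two singles and 27 strikeouts: zero runs actually score.
two_singles_game = xr_partial(singles=2, strikeouts=27)

# One home run and 27 strikeouts: exactly one run actually scores.
solo_homer_game = xr_partial(home_runs=1, strikeouts=27)

print(round(two_singles_game, 1), round(solo_homer_game, 1))
```

<p>Because every event carries a fixed weight, enough outs will always drag a linear estimate below zero, no matter which linear model supplies the weights.</p>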
<p>Why? Because, like Runs Created, any linear weights model is designed to work within the framework of typical MLB performance. In both of these examples, the teams posted very low OBP and SLG, which puts them at a point where it becomes impossible for the model to accurately portray reality. Sure enough, check out <a href="http://www.tangotiger.net/rc3ops.jpg">this chart</a> from Tango&#8217;s run estimator series. It shows how the scoring rate changes as the OPS of the games he grouped increases. As you can see, linear weights is very poor at estimating scoring rates at a game OPS below .200 (such as the ones described here) and above .950.</p>
<p>Ultimately, linear weights systems are built to fit baseball scoring, not to model it. Thus, while the accuracy of these models is high when applied to major league teams, they do not apply well at extreme values because they weren&#8217;t designed to. Linear weights is especially prone to this because the weights themselves must be determined from league run scoring; there simply is no way of building a linear regression, a play-by-play run expectancy table, or even an acceptable framework without having a run environment in mind. If the run environment changes, the weights have to change with it.</p>
<p><strong>Conclusions</strong></p>
<p>None of the disadvantages should be seen as something that hobbles linear weights, but rather as something that limits its use. For individual players, linear weights work very well across a wide range of talent, simply because few hitters are so good or so bad as to meet the limits of run expectancy models. Still, the players and teams at the extremes will be valued incorrectly; the greatest players will be undercut by linear weights, while the worst players will be severely undersold. But for most, if not all, of baseball, linear weights systems, if built properly, will be very accurate in quantifying a player&#8217;s contributions and will mostly reconcile with the totals at the team level.</p>
<p>Ultimately, that is the appeal of linear weights: the ability to correlate well at both the individual and team levels. It may not model the reality of baseball, but because many of the models above match up well with the empirical models using play-by-play data, it fits run scoring quite well and thus is still very useful.</p>
]]></content:encoded>
			<wfw:commentRss>http://fanhuddle.com/statistics/2009/11/13/linear-weights-positives-negatives/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Player Profile: Matt Holliday</title>
		<link>http://fanhuddle.com/statistics/2009/11/09/player-profile-matt-holliday/</link>
		<comments>http://fanhuddle.com/statistics/2009/11/09/player-profile-matt-holliday/#comments</comments>
		<pubDate>Mon, 09 Nov 2009 23:12:00 +0000</pubDate>
		<dc:creator>Michael Jong</dc:creator>
				<category><![CDATA[Player Profile]]></category>

		<guid isPermaLink="false">http://fanhuddle.com/statistics/?p=204</guid>
		<description><![CDATA[With the World Series finished and the Hot Stove season kicking into full gear, most of us will be tuned in very closely to MLBTradeRumors for all the latest transactional developments. Recently on MLBTR, the gang over there released the 2010 Top 50 Free Agents list, complete with both the ranked list of players and [...]]]></description>
			<content:encoded><![CDATA[<p>With the World Series finished and the Hot Stove season kicking into full gear, most of us will be tuned in very closely to MLBTradeRumors for all the latest transactional developments. Recently on MLBTR, the gang over there released the 2010 Top 50 Free Agents list, complete with both the ranked list of players and the teams with whom the MLBTR crew expects them to land.</p>
<p>So who tops the list for MLBTR? It should come as no surprise that corner outfielder <strong>Matt Holliday</strong> is at the top of every team&#8217;s wish list. The outfielder split the season between the <strong>Oakland Athletics</strong> and the <strong>St. Louis Cardinals</strong> and excelled as he always had back in his days at Coors Field with the <strong>Colorado Rockies</strong>. How much can we expect Holliday to be worth? Let&#8217;s take a look.<span id="more-204"></span></p>
<p>Matt Holliday began his season in Oakland, arriving as part of a trade from the Rockies to the A&#8217;s which included talented reliever <strong>Huston Street</strong> and enigmatic outfielder <strong>Carlos Gonzalez</strong> among others heading to Colorado. At the time of the move, a lot was made about how Holliday would play away from Coors Field, which was his home for five years before. The question was deemed particularly relevant because of the drastic home/road splits displayed by Holliday. In his career, Holliday has displayed a 1.3:1 ratio of home OPS to road OPS (including this season), which most folks attributed to the effect of Coors Field on every hitter.</p>
<p>Well, this season Holliday split time playing in the cavernous Oakland Coliseum and Busch Stadium, which also plays like a good pitcher&#8217;s park. Holliday put up a .982 OPS at home and an .830 OPS on the road, good for a 1.18:1 ratio. That narrower split is probably well explained by the move away from Coors to two pitcher-friendly environments, and what remains is probably mostly taken care of by park adjustments. It seems like Holliday just enjoys the comfort of his home stadium a bit more than others.</p>
<p>Still, detractors point out the early problems Holliday had when he started in Oakland. He had a .286 wOBA in his first month in Oakland, though conveniently people forget that he had a nice May (.386), solid June (.365), and awesome July (.421) before he was dealt to St. Louis in a deal where the Cardinals gave up way too much (prized 1B/3B prospect <strong>Brett Wallace</strong> and two other very useful parts).</p>
<p>Of course, perhaps the most &#8220;damning&#8221; of all aspects of Holliday&#8217;s season this year is that, once he arrived back at a familiar National League facility and faced National League pitching, Holliday went on a torrid streak, hitting .353/.419/.604 with a .423 wOBA in 270 PA in St. Louis. Cardinals fans were less concerned with the haul they gave away to Billy Beane and Oakland after witnessing the old Colorado Matt Holliday tear up NL pitching.</p>
<p>Ultimately, Holliday ended the season with a .395 park-adjusted wOBA, good for 36 runs above average on offense in total. Combine that with his above average (if awkward looking, at least to the scouts) defense, and you have a player who has once again posted a 5+ WAR season. Holliday&#8217;s production this season was worth $25.6M on the open market, <a href="http://www.fangraphs.com/statss.aspx?playerid=1873&amp;position=OF">according to FanGraphs</a>.</p>
<p>But of course, the key isn&#8217;t what he was worth before but rather what he should be worth in the future. Projecting from Holliday&#8217;s park-adjusted wOBAs over the last five years gives a projected value of <strong>.393</strong> when regressed with 220 PA of the league mean of .330. Projecting Holliday&#8217;s defensive value was slightly trickier. While UZR consistently has him as an above average left fielder (career UZR/150 of +7 runs), the scouting reports heard from most folks have him as below average. This is reflected in the way the Fans have voted in the Fans Scouting Report over the last four years, rating him as a -3 defender per 150 games based on a weighted average of the 2005-2008 results. Given this discrepancy, it was definitely important to take both inputs into consideration. Weighing UZR at 75% and the Fans at 25% for all seasons except the most recent (for 2009, only UZR was used), I got a projected value of <strong>five runs above average per 150 games</strong> for Holliday defensively.</p>
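<p>For anyone curious how those two steps work mechanically, here is a rough Python sketch. The 220 PA of regression, the .330 league mean, and the 75/25 split come from the paragraph above; everything else is an assumption. In particular, the observed wOBA and weighted PA total fed into the first function are hypothetical placeholders (not Holliday&#8217;s actual weighted inputs), and the defensive blend uses the career-level figures rather than the season-by-season weighting described above.</p>

```python
def regress_to_mean(observed_rate, plate_appearances, league_mean, regression_pa):
    """Blend an observed rate with `regression_pa` PA of league-average play."""
    total = observed_rate * plate_appearances + league_mean * regression_pa
    return total / (plate_appearances + regression_pa)


def blend_defense(uzr_runs, fans_runs, uzr_share=0.75):
    """Weighted average of two defensive estimates (runs per 150 games)."""
    return uzr_share * uzr_runs + (1 - uzr_share) * fans_runs


# Hypothetical observed wOBA and weighted PA, chosen only to illustrate.
projected_woba = regress_to_mean(0.400, 2000, 0.330, 220)

# Career-level figures quoted above: +7 (UZR/150) and -3 (Fans).
projected_defense = blend_defense(7, -3)

print(round(projected_woba, 3), projected_defense)
```

<p>The fewer plate appearances behind the observed rate, the harder the 220 PA of league average pulls the projection back toward .330.</p>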
<p>Given these inputs and a league average wOBA of .330, this is what we might expect to see from Holliday next year.</p>
<p><strong>34.3 wRAA + 5 defense + -6.9 positional adj. (LF) + 21.9 replacement adj. = 54.2 Runs Above Replacement, or 5.4 WAR</strong></p>
<p>That 5.4 WAR, at the current market rate, would be expected to be worth <strong>$24.3M</strong>, with both numbers very similar to what he produced this season. Consider also that Holliday is a Scott Boras client, meaning the money should definitely flow from whoever signs him (MLBTR has the <strong>New York Mets</strong> as the big winner). I suspect that Holliday will be looking for the type of money <strong>Mark Teixeira</strong> received last year from the Yankees. This is actually a fair amount even if you consider that teams may not believe he is a good defender; if you take the Fans&#8217; -3 runs per 150 games instead of the +5 runs estimate, that knocks off 0.8 wins, or $3.6M, from his expected value. That kind of discount puts him right at <strong>$20.7M</strong>, a bit over what Teixeira is making this season in the first year of his eight-year, $180M deal.</p>
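<p>The arithmetic behind those figures can be sketched in Python as follows. The four run components are the rounded values quoted above; the 10 runs per win and the dollars-per-win rate are not stated outright, so I back them out of the quoted totals (54.2 runs to 5.4 WAR, and $24.3M for 5.4 WAR) and treat them as assumptions. Summing the rounded components gives 54.3 runs, a rounding step away from the 54.2 quoted.</p>

```python
RUNS_PER_WIN = 10.0            # assumption implied by 54.2 runs -> 5.4 WAR
MILLIONS_PER_WIN = 24.3 / 5.4  # assumption implied by $24.3M for 5.4 WAR


def war_and_value(wraa, defense, positional, replacement):
    """Return (runs above replacement, WAR, value in $M) for one projection."""
    runs = wraa + defense + positional + replacement
    war = runs / RUNS_PER_WIN
    return runs, war, war * MILLIONS_PER_WIN


# Projection with the +5 defensive estimate.
runs, war, value = war_and_value(34.3, 5.0, -6.9, 21.9)

# Same projection with the Fans' -3 defensive estimate swapped in.
low_runs, low_war, low_value = war_and_value(34.3, -3.0, -6.9, 21.9)

discount = value - low_value
print(round(runs, 1), round(war, 1), round(discount, 1))
```

<p>The eight-run swing on defense is worth 0.8 wins, which at the implied rate of $4.5M per win is the $3.6M discount mentioned above.</p>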
<p>There should be absolutely no concern about the supposed &#8220;Coors effect&#8221; on Matt Holliday. Holliday is a great hitter regardless of home park, a <strong>Larry Walker</strong> rather than a <strong>Dante Bichette</strong>. The bidding war that will undoubtedly occur for his services heading into three or so prime seasons of play will be deserved, and Holliday is expected to receive a hefty sum. Good luck, and may the battle with Scott Boras begin.</p>
]]></content:encoded>
			<wfw:commentRss>http://fanhuddle.com/statistics/2009/11/09/player-profile-matt-holliday/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Runs Created: A First Step</title>
		<link>http://fanhuddle.com/statistics/2009/10/30/runs-created-the-first-step/</link>
		<comments>http://fanhuddle.com/statistics/2009/10/30/runs-created-the-first-step/#comments</comments>
		<pubDate>Fri, 30 Oct 2009 16:34:03 +0000</pubDate>
		<dc:creator>Michael Jong</dc:creator>
				<category><![CDATA[Fundamentals]]></category>
		<category><![CDATA[Bill James]]></category>
		<category><![CDATA[Dan Fox]]></category>
		<category><![CDATA[Patriot]]></category>
		<category><![CDATA[Runs Created]]></category>
		<category><![CDATA[Tom Tango]]></category>

		<guid isPermaLink="false">http://fanhuddle.com/statistics/?p=173</guid>
		<description><![CDATA[Today&#8217;s Friday lecture will discuss the first of many steps taken to estimate a number of runs. Bill James&#8217; Runs Created, initially published in an early version of the classic Bill James Historical Abstract (in Dan Fox&#8217;s explanation, he has it as the 1979 version), was one of the first run estimation models and is [...]]]></description>
			<content:encoded><![CDATA[<p>Today&#8217;s Friday lecture will discuss the first of many steps taken to estimate a number of runs. Bill James&#8217; Runs Created, initially published in an early version of the classic <em>Bill James Historical Abstract</em> (in Dan Fox&#8217;s explanation, he has it as the 1979 version), was one of the first run estimation models and is likely the most well known. As a result, it will be the first of a few run estimators that we&#8217;ll look into here.</p>
<p>As texts for this discussion, I&#8217;ll refer to the following two pieces.</p>
<p>1. Patriot&#8217;s explanation of Runs Created shown <a href="http://gosu02.tripod.com/id104.html">here</a>.</p>
<p>2. Dan Fox&#8217;s explanation shown <a href="http://danagonistes.blogspot.com/2004/10/brief-history-of-run-estimation-runs.html">here</a>.</p>
<p>I may also be referring to some of the charts Tom Tango posts <a href="http://www.tangotiger.net/runscreated.html">here</a> in his series on how runs are actually created.</p>
<p>Again, none of this is my original research; full credit for that goes to Bill James for designing the formulas mentioned and to Patriot, Fox, and Tango for independently reviewing the model. I&#8217;ll simply be guiding the reader through the texts provided.</p>
<p>OK, let&#8217;s finally start stepping into Runs Created. First off, I&#8217;ll once again point out to you the formula mentioned in the last discussion. Here is the essential model for all Runs Created formulas.</p>
<p><strong>Runs = A*B/C</strong></p>
<p>Where A = on-base factor; B = Advancement factor; and C = Opportunities</p>
<p>Now, before we move past the jump and start defining these factors, consider whether or not this seems fundamentally sound from a baseball perspective.<span id="more-173"></span></p>
<p><strong>The Factors</strong></p>
<p>As I mentioned before, in the discussion on runs, the factors of Runs Created are actually somewhat obvious when looking at them. The A factor, the on-base factor, sounds like it is some measure of the number of baserunners a team produces. The B factor, the advancement factor, sounds like how far these baserunners are moved. Finally, the C factor, the opportunity factor, measures some number of chances to hit. Think for a moment what that may look like in terms of baseball.</p>
<p>OK, now that we (may) have done that, here are the first inputs for Runs Created.</p>
<p>A = H + BB</p>
<p>B = TB</p>
<p>C = AB + BB</p>
<p>Where H = hits, BB = walks,  TB = Total Bases, and AB = At-Bats</p>
<p>Rewriting the equation with the substituted variables gets you RC = (H + BB) * TB/(AB + BB). Here you can see that the formula contains (H + BB)/(AB + BB), which is essentially on-base percentage (OBP) without HBP. If you added HBP into both the A and C factors, you&#8217;d get roughly RC = OBP * TB, or RC = OBP * SLG * AB.</p>
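<p>As a quick illustration, the basic formula fits in a few lines of Python. The team totals below are hypothetical numbers made up for this example, and the function name is my own.</p>

```python
def runs_created_basic(hits, walks, total_bases, at_bats):
    """Basic Runs Created: A * B / C with A = H + BB, B = TB, C = AB + BB."""
    a = hits + walks       # on-base factor
    b = total_bases        # advancement factor
    c = at_bats + walks    # opportunity factor
    return a * b / c


# Hypothetical team season totals, invented for illustration.
team_rc = runs_created_basic(hits=1400, walks=550, total_bases=2200, at_bats=5500)

# The same number written as (on-base rate without HBP) * total bases.
on_base_rate = (1400 + 550) / (5500 + 550)
same_rc = on_base_rate * 2200

print(round(team_rc, 1), round(same_rc, 1))
```

<p>Writing it the second way makes the &#8220;getting on base times advancing runners&#8221; structure of the formula easier to see.</p>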
<p>As you can see, the first formula was simple and elegant. It was intuitive, in that it was designed based on how the game of baseball is played (more on that later). And finally, it seemed to work, boasting decent root mean square errors (RMSE) at the team level compared to team runs scored.</p>
<p>Because the factors were intuitively part of the game and fairly obviously defined, it was simple to come up with improvements on the basic model by incorporating other statistics. The texts provided run down the changes added since the original version, and both Patriot and Dan Fox do an excellent job explaining the reasoning behind these changes, so I will not go in-depth into these. Instead, I&#8217;ll simply present the TECH-1 formula published in 1984 and the latest revision from 2005.</p>
<p><strong>1984 TECH-1:</strong></p>
<p>A = H+W+HB-CS-DP<br />
B = TB+.26(W+HB-IW)+.52(SB+SH+SF)<br />
C = AB+W+HB+SH+SF</p>
<p><strong>2005 Revision:</strong></p>
<p>A = H+W+HB-CS-DP<br />
B = 1.125S+1.69D+3.02T+3.73HR+.29(W-IW+HB)+.492(SB+SH+SF)-.04K<br />
C = AB+W+HB+SH+SF</p>
<p>These additions are not terribly groundbreaking. The out factors of caught stealing and double plays were added because they take away baserunners. The B factor changes reflect the addition of stolen bases and the fact that walks and other on-base events advance runners as well. The additions to the opportunities factor C just include additional events that are a part of plate appearances. The coefficients placed in front of the various inputs in the B factor were designed to improve the fit for the run environment.</p>
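<p>Here is a sketch of both sets of factors in Python, using the coefficients exactly as listed above. The team stat line is hypothetical, and the dictionary keys simply follow the abbreviations in the formulas (S, D, T for singles, doubles, triples; W for walks; IW for intentional walks; K for strikeouts). TECH-1 is combined as A*B/C; the 2005 factors are only computed here, not combined.</p>

```python
def tech1_1984(s):
    """1984 TECH-1 Runs Created, combined as A * B / C."""
    a = s["H"] + s["W"] + s["HB"] - s["CS"] - s["DP"]
    b = (s["TB"] + 0.26 * (s["W"] + s["HB"] - s["IW"])
         + 0.52 * (s["SB"] + s["SH"] + s["SF"]))
    c = s["AB"] + s["W"] + s["HB"] + s["SH"] + s["SF"]
    return a * b / c


def factors_2005(s):
    """A, B and C factors from the 2005 revision (hit types weighted separately)."""
    a = s["H"] + s["W"] + s["HB"] - s["CS"] - s["DP"]
    b = (1.125 * s["S"] + 1.69 * s["D"] + 3.02 * s["T"] + 3.73 * s["HR"]
         + 0.29 * (s["W"] - s["IW"] + s["HB"])
         + 0.492 * (s["SB"] + s["SH"] + s["SF"]) - 0.04 * s["K"])
    c = s["AB"] + s["W"] + s["HB"] + s["SH"] + s["SF"]
    return a, b, c


# Hypothetical team totals, invented for illustration.
team = {"S": 950, "D": 280, "T": 30, "HR": 160, "AB": 5500, "W": 550, "IW": 40,
        "HB": 50, "SB": 100, "CS": 40, "DP": 120, "SH": 40, "SF": 45, "K": 1050}
team["H"] = team["S"] + team["D"] + team["T"] + team["HR"]
team["TB"] = team["S"] + 2 * team["D"] + 3 * team["T"] + 4 * team["HR"]

rc_1984 = tech1_1984(team)
a_2005, b_2005, c_2005 = factors_2005(team)
print(round(rc_1984, 1), round(b_2005, 2))
```

<p>Comparing the two B factors for the same team shows the effect of the revision: the home run counts for 3.73 rather than a full four bases, and each strikeout now costs a little.</p>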
<p><strong>The Advantage</strong></p>
<p>Runs Created is derived from a fairly simple formula that is also dynamic rather than linear, which better reflects how baseball runs are scored, and it boasts acceptably low errors across the span of MLB talent and the course of a full season at the team level. Jim Furtado did <a href="http://www.baseballthinkfactory.org/btf/scholars/furtado/articles/accuracy.htm">a study over the 1955-1997 period</a> and had RC at a root mean square error (RMSE) of 25 runs, five runs behind his own model, which ranked the best in the set of models listed. This effect can be seen graphically in <a href="http://www.tangotiger.net/rc3oba.jpg">this chart</a> by Tom Tango, plotting a range of OBP versus a percentage of runners scored.</p>
<p><strong>The Flaws</strong></p>
<p>As RC has continued to develop, it has gotten more and more complex. The latest version no longer has the simplicity of A*B/C, as additions have turned the basic function into RC = (A+2.4C)(B+3C)/(9C)-.9C. Yet this added complexity has not solved the critical problem with RC: it is <em>not</em> an intuitive model of how baseball works, but rather a model that reflects the context and environment of a normal MLB season. In other words, RC does a fine job predicting &#8220;normal&#8221; major league teams, but struggles at the extremes because its basic formula is not actually grounded in baseball realism, but rather modeled on MLB results.</p>
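<p>A short Python sketch shows how the newer form behaves next to plain A*B/C. As I understand the construction, the 2.4C and 3C terms stand in for eight average teammates (each with A = .3C and B = .375C), and the .9C subtracted at the end is what those teammates would produce on their own; treat that reading as an assumption. The player factors below are hypothetical.</p>

```python
def runs_created_basic(a, b, c):
    """Original combination: A * B / C."""
    return a * b / c


def runs_created_team_context(a, b, c):
    """Later combination quoted above: (A + 2.4C)(B + 3C) / (9C) - 0.9C."""
    return (a + 2.4 * c) * (b + 3 * c) / (9 * c) - 0.9 * c


# Hypothetical above-average hitter, invented for illustration.
star_basic = runs_created_basic(250, 330, 650)
star_context = runs_created_team_context(250, 330, 650)

# A hitter exactly at the assumed average rates (A = .3C, B = .375C).
avg_basic = runs_created_basic(180, 225, 600)
avg_context = runs_created_team_context(180, 225, 600)

print(round(star_basic, 1), round(star_context, 1))
print(round(avg_basic, 1), round(avg_context, 1))
```

<p>The two forms agree for the average hitter and pull apart as the player moves away from average, with the newer form trimming the estimate for the strong hitter.</p>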
<p>The chart provided by Tom Tango linked above is a great example of this fundamental problem. If you can spot the grey line that represents RC&#8217;s scoring rate compared to the black line which represents an empirically determined scoring rate based on data clumped by games of a certain range of OBP, you can see that RC begins drastically overestimating actual scoring rate somewhere past the .400 OBP mark. Why? Again, the .400 OBP mark is not typical for your normal MLB team, but RC was designed with those teams in mind. This discrepancy is even more apparent when Tango groups games by OPS, shown <a href="http://www.tangotiger.net/rc3ops.jpg">here</a>. In that chart, the grey line jumps significantly off the mark at around .900 OPS and is of course <em>well</em> off base past an OPS of 1.00.</p>
<p>This flaw is exacerbated when examining the HR in RC. Patriot had this to say on the mistreatment of the HR in RC:</p>
<p><em>&#8220;A home run always produces at least one run, no matter what. In RC, a team with 1 HR and 100 outs will be projected to score 1*4/101 runs, a far cry from the one run that we know will score. And in an offensive context where no outs are made, all runners will eventually score, and each event, be it a walk, a single, a home run&#8211;any on base event at all&#8211;will be worth precisely one run. In a 1.000 OBA context, RC puts a HR at 1*4/1 = 4 runs. This flaw is painfully obvious at that kind of extreme point, but the distorting effects begin long before that.&#8221;</em></p>
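<p>Patriot&#8217;s two extreme cases are easy to check with the basic A*B/C form. In this Python sketch the inputs are exactly the ones in the quote: one home run (A = 1, B = 4) against either 101 or 1 opportunities.</p>

```python
def runs_created_basic(a, b, c):
    """Basic Runs Created: on-base factor * advancement factor / opportunities."""
    return a * b / c


# One home run and 100 outs: one run actually scores.
one_hr_100_outs = runs_created_basic(a=1, b=4, c=101)

# A 1.000 OBA context: a home run should be worth exactly one run.
hr_with_no_outs = runs_created_basic(a=1, b=4, c=1)

print(round(one_hr_100_outs, 3), hr_with_no_outs)
```

<p>In both cases the true answer is one run, yet the formula lands far below it at one extreme and far above it at the other.</p>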
<p>We&#8217;ve seen that this distortion effect indeed occurs well before that. What is the end result? When measuring teams at the extremes, RC is too optimistic at the high end and too pessimistic at the low end. This is an issue that Bill James himself recognized quite some time ago; as Patriot mentions, the home run issue is one of the reasons why James changed the B factor for 2005 from total bases to a set of separate weights for each hit type. Patriot describes how this has affected the linear weights of each event as compared to empirically determined linear weights, and you can read that in his piece.</p>
<p>What I&#8217;d like to point out is that, while James has tinkered with the formula to attempt to address concerns with RC at different run environments, he has not addressed the fundamental concern of RC&#8217;s design. Ultimately, all of these moves are patches to fix a greater issue with the formula. In Dan Fox&#8217;s piece, he mentions that the authors of <em>Curve Ball: Baseball, Statistics, and the Role of Chance in the Game</em>, Jim Albert and Jay Bennett, think that product models for run estimation such as RC are not effective for the extremes.</p>
<p><strong>Do not use run estimators for individual players</strong></p>
<p>The initial formula for RC seemed friendly enough to be applied to players. In fact, James did this in part to help in reconciling total team RC with actual team runs. However, there is a clear issue with applying run estimators such as RC on an individual level. Such estimators were not designed at the individual level, and interactions between the various factors do not make sense within the context of individuals. Applying RC to an individual player assumes that the player is both getting on base and driving himself home, an impossibility outside of home runs. Thus, it models something more like the number of runs produced by a lineup made up entirely of that player over the given number of opportunities.</p>
<p>It is important to mention here that all run estimators other than linear weights have this issue. When we later discuss the best of these estimators, BaseRuns, you&#8217;ll see that applying BaseRuns to individual hitters is also fundamentally flawed. This is not a fault of the run estimator; it simply was not designed to do this calculation.</p>
<p>However, a method that can possibly be used to evaluate a player&#8217;s run contribution is to place him on a hypothetical team of players of a certain level/baseline and see the difference in run production between the team with and without the player. This is the so-called &#8220;Theoretical Team&#8221; analysis, which originated from Bill James&#8217; work and will be discussed here some time in a later Friday lecture.</p>
<p><strong>Conclusion</strong></p>
<p>All of that being said, RC still holds a valuable and important place in the history of baseball statistical analysis. The introduction of RC&#8217;s basic formula A*B/C, while ultimately not effective outside of the MLB environment, influenced others in their work on run estimation. The basic formula was at the root of the design of BaseRuns by David Smyth, currently the best run estimator at the team level. The theoretical team model originated from work using RC as a team run estimator. In other words, while RC is not and should not be considered an ideal run estimator for work in the present time, its fingerprints are on many designs in the field of run estimation to this day. It was a first step into the topic, but was not the last.</p>
<p><strong>References</strong></p>
<p>1. Patriot. &#8220;Runs Created.&#8221; http://gosu02.tripod.com/id104.html</p>
<p>2. Fox, D. &#8220;A Brief History of Run Estimation: Runs Created.&#8221; Dan Agonistes. 07 Oct 2004. http://danagonistes.blogspot.com/2004/10/brief-history-of-run-estimation-runs.html</p>
<p>3. Tango, T. &#8220;How Runs are Really Created: Third Installment.&#8221; Tangotiger.net. http://www.tangotiger.net/rc3.html</p>
<p>4. Furtado, J. &#8220;Methods and Accuracy in Run Estimation Tools.&#8221; Baseball Think Factory. http://www.baseballthinkfactory.org/btf/scholars/furtado/articles/accuracy.htm</p>
<p>5. Albert, J., Bennett, J. <em>Curve Ball: Baseball, Statistics, and the Role of Chance in the Game</em>. Springer. ISBN: 978-0-387-00193-7</p>
<p><strong>Reading Materials</strong></p>
<p>Here&#8217;s the reading material for the weekend.</p>
<p>1. The next discussion on run estimators will be on linear weights of all kinds! I&#8217;ll deal with them in a general fashion, but you can check out Jim Furtado&#8217;s explanation of his own system, Extrapolated Runs, <a href="http://www.baseballthinkfactory.org/btf/scholars/furtado/articles/IntroducingXR.htm">here</a>.</p>
<p>2. BtB colleague Tommy Bennett has a look at <a href="http://www.beyondtheboxscore.com/2009/10/30/1107169/how-the-economy-will-affect">GDP and MLB salary growth</a> that was very intriguing. Check out the discussion thread over at <a href="http://www.insidethebook.com/ee/index.php/site/comments/mlb_salaries_v_gdp/">The Book blog</a> as well.</p>
]]></content:encoded>
			<wfw:commentRss>http://fanhuddle.com/statistics/2009/10/30/runs-created-the-first-step/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Value of Walks and Baseball&#8217;s &#8220;Clock&#8221;</title>
		<link>http://fanhuddle.com/statistics/2009/10/28/baseballs-clock/</link>
		<comments>http://fanhuddle.com/statistics/2009/10/28/baseballs-clock/#comments</comments>
		<pubDate>Wed, 28 Oct 2009 12:56:42 +0000</pubDate>
		<dc:creator>Michael Jong</dc:creator>
				<category><![CDATA[Fundamentals]]></category>

		<guid isPermaLink="false">http://fanhuddle.com/statistics/?p=157</guid>
		<description><![CDATA[In the previous Friday lecture, I discussed a bit about runs, how they&#8217;re determined, and why it is important that all events be placed in terms of them. Runs are indeed valuable, but they are just half of what determines wins and losses in a baseball game. Today I&#8217;ll discuss the other important aspect of [...]]]></description>
			<content:encoded><![CDATA[<p>In the previous Friday lecture, I discussed a bit about runs, how they&#8217;re determined, and why it is important that all events be placed in terms of them. Runs are indeed valuable, but they are just half of what determines wins and losses in a baseball game. Today I&#8217;ll discuss the other important aspect of baseball, baseball&#8217;s &#8220;clock.&#8221;</p>
<p>But first, let&#8217;s start with a question that I once pondered when I first got into sabermetrics and one that, while many people may generally understand, isn&#8217;t always best explained: why are walks so valuable? Or, to ask it in a way more commonly asked, why do sabermetricians glorify the walk so much?<span id="more-157"></span></p>
<p>Here are the answers:</p>
<p>1) Walks are only as valuable as they are in terms of runs (remember, we learned that on Friday).</p>
<p>2) Sabermetricians aren&#8217;t glorifying walks per se.</p>
<p>If those answers sound vague, allow me to explain further. I present you <a href="http://www.tangotiger.net/customlwts.html">this chart</a> of custom linear weights for each offensive event based on a run estimator called BaseRuns (more on BaseRuns as we continue our run estimator discussions, which start up again on Friday). Weights were determined by Tom Tango. Let&#8217;s focus in on the five-run environment as an example. In a five-run environment, the run value above average of a walk is 0.327 runs. If you&#8217;ll notice, the walk is the <em>lowest valued batting event</em> on the chart, discounting the intentional walk. Compare that to the single, which is worth 0.488 runs, or the home run, valued at 1.406 runs. A walk is worth a little less than a third of a run, which is less than a quarter of the value of a home run and only 67% of the value of a single. Clearly, the walk isn&#8217;t some highly valuable event in comparison to these others.</p>
<p>So clearly sabermetricians, the same people who built these models that claim the walk has a low run value compared to any hit, are not arbitrarily overvaluing the walk. However, I lied a bit when I said the walk is the lowest valued batting event; actually, it&#8217;s the lowest valued <em>positive</em> batting event. Further down the column, there&#8217;s another event that can occur in a batting situation that is negatively valued: the out (there is also the strikeout, but there&#8217;s a very small difference in value between the two). Therein lies part of the key to why walks are evaluated as important.</p>
<p>Everyone knows that baseball has no traditional clock that counts down until every digit hits zero and the game ends. But in actuality, baseball games do have definite clocks in terms of outs. Every out counts down toward the end of the game, and ultimately the game ends when the losing team records its 27th out (outside of extra innings). Thus, the out is the only limited resource in the game of baseball, which makes each out made a significant team loss and each out remaining a valuable commodity.</p>
<p>And there is the reason why walks are so &#8220;valued&#8221; by sabermetricians. It is not the pure production that a walk provides but rather the value of the walk against the negative value of the out. Essentially, walks, like all other positive batting events, are inherently valuable as &#8220;not outs.&#8221; In a five-run environment, the out is worth 0.306 runs <em>below</em> average. Even a lowly walk in the discussed environment is still worth about 0.63 runs more than an out. Then consider that drawing walks is (theoretically) more repeatable as a skill than hitting, because walks only involve the pitcher&#8217;s and hitter&#8217;s skills, while hits also involve the fielders to some degree, and you can see why walks are valued highly <em>at a player-evaluation level</em>.</p>
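<p>The comparison against the out is simple subtraction, sketched here in Python with the five-run-environment values quoted above from Tango&#8217;s chart (the dictionary and function names are my own).</p>

```python
# Runs above average per event in a five-run environment, as quoted above.
RUN_VALUES = {"walk": 0.327, "single": 0.488, "home_run": 1.406, "out": -0.306}


def value_over_out(event):
    """Run value of an event measured against making an out instead."""
    return RUN_VALUES[event] - RUN_VALUES["out"]


walk_over_out = value_over_out("walk")
single_over_out = value_over_out("single")

# How the walk compares with the hits on the plain run-value scale.
walk_vs_single = RUN_VALUES["walk"] / RUN_VALUES["single"]
walk_vs_home_run = RUN_VALUES["walk"] / RUN_VALUES["home_run"]

print(round(walk_over_out, 3), round(single_over_out, 3))
print(round(walk_vs_single, 2), round(walk_vs_home_run, 2))
```

<p>Measured against the average, the walk is worth about two-thirds of a single; measured against the out, the gap narrows, since both events share the same benefit of not using up an out.</p>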
<p>This value difference between safe events and out events is the main reason why OBP has gained traction as a useful statistic for evaluating players. While its name, &#8220;on-base percentage,&#8221; frames the statistic as a &#8220;percentage of times on base,&#8221; its most important aspect is really best framed as &#8220;percentage of opportunities without recording an out.&#8221; Because outs are the only commodity during a baseball game in which teams are limited in access, gathering players who minimize their outs compared to a league average is a viable way to improve an offense. And again, the easiest way to improve OBP at a team level is to find players who draw large numbers of walks. Again, the walks aren&#8217;t valuable necessarily, but their status as repeatable skills that avoid making outs make them a valuable player quality.</p>
<p>Now we have two defined qualities that determine the outcome of a game of baseball: runs, which are a measure of production for teams, and outs, which are a measure of time or opportunity for teams. With those two values, we can determine the quality of a team&#8217;s or individual&#8217;s play. Our future discussions will then go into the various methods of evaluating this quality.</p>
<p><strong>References</strong></p>
<p>1. Tango, Tom M. &#8220;Sabermetrics 301: Custom Linear Weights.&#8221; Tangotiger.net. Link provided above.</p>
<p><strong>Reading Materials</strong></p>
<p>1. Remember that for this Friday, the required reading is Patriot&#8217;s <a href="http://gosu02.tripod.com/id104.html">in-depth explanation of runs created</a>. He did an excellent job of explaining it, so think of me as the guide leading you through the text, though the text is still the authority on this matter.</p>
<p>2. If you&#8217;re interested in how <em>not</em> to determine runs from component stats for <em>individual</em> <em>players</em>, Phil Birnbaum tells you that <a href="http://sabermetricresearch.blogspot.com/2009/10/dont-use-regression-to-calculate-linear.html">linear regression is not the way to go</a>. Basically, those linear regressions on a team level don&#8217;t get the linear weights right on an individual level.</p>
<p>3. MGL gives us a primer on how we should <a href="http://www.insidethebook.com/ee/index.php/site/article/is_ryan_howard_really_that_bad_versus_lhp/">project lefty-righty splits for any player</a>, and says that the projection says <strong>Ryan Howard&#8217;s</strong> game against lefties should be better than advertised.</p>
<p>Now, we&#8217;ll open the floor for any commentary or discussion. Have at it folks, feedback is much appreciated.</p>
]]></content:encoded>
			<wfw:commentRss>http://fanhuddle.com/statistics/2009/10/28/baseballs-clock/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Player Profile: Chone Figgins</title>
		<link>http://fanhuddle.com/statistics/2009/10/26/player-profile-chone-figgins/</link>
		<comments>http://fanhuddle.com/statistics/2009/10/26/player-profile-chone-figgins/#comments</comments>
		<pubDate>Mon, 26 Oct 2009 18:01:09 +0000</pubDate>
		<dc:creator>Michael Jong</dc:creator>
				<category><![CDATA[Player Profile]]></category>
		<category><![CDATA[Chone Figgins]]></category>

		<guid isPermaLink="false">http://fanhuddle.com/statistics/?p=139</guid>
		<description><![CDATA[Last night, the New York Yankees eliminated the Los Angeles Angels on their way to their first trip to the World Series since 2003. While the Yankees go on and prepare for a bout against the Philadelphia Phillies, a busy offseason for the Angels begins, one filled with important decisions to make. The Angels have [...]]]></description>
			<content:encoded><![CDATA[<p>Last night, the <strong>New York Yankees</strong> eliminated the <strong>Los Angeles Angels</strong> on their way to their first trip to the World Series since 2003. While the Yankees go on and prepare for a bout against the <strong>Philadelphia Phillies</strong>, a busy offseason for the Angels begins, one filled with important decisions to make. The Angels have a number of players up for free agency, including ace starter <strong>John Lackey</strong>, outfielder <strong>Bobby Abreu</strong>, and perhaps one of the most coveted pieces of the offseason, third baseman <a href="http://www.fangraphs.com/statss.aspx?playerid=1580&amp;position=3B/OF"><strong>Chone Figgins</strong></a>. In today&#8217;s lecture, we&#8217;ll discuss a little bit about Chone Figgins and what he may bring to the table as a free agent pickup.</p>
<p>Figgins began his career as a part-timer in 2003, but has since become a regular in the Angels&#8217; lineup. At the start of his career, Figgins displayed the classic skillset of the speedy slap-hitter: a contact hitter with a low strikeout rate, speed on the bases, and no power. However, while many managers still employ the speedy slap-hitter in their lineups, oftentimes at the top of the lineup, few hitters of this type can be successful offensive contributors in the majors. Even the best speedy singles hitters in baseball rarely muster much more than a .370 wOBA for any given season, simply because it is so difficult to deliver offensive value relying entirely on singles and baserunning. For example, in <strong>Ichiro Suzuki&#8217;s</strong> 2004 season, during which he broke the record for most hits in a season, Ichiro recorded a .379 wOBA according to FanGraphs. In that season alone, there were 15 qualified position players with a wOBA higher than Ichiro&#8217;s, in what was perhaps the best slap-hitter season ever.</p>
<p>For a player like Figgins, whose skillset would be expected to struggle to be productive in the major leagues, there are only two options for improving as a big-league player:</p>
<div style="float:right;margin-left:5px">
<p><a href="http://view.picapp.com/default.aspx?term=\Chone Figgins&amp;iid=6893001" target="_blank"><img class="alignleft" style="border: 0pt none" src="http://cdn.picapp.com/ftp/Images/3/5/f/1/ALCS_Game_5yankeesangels_3752.JPG?adImageId=6727591&amp;imageId=6893001" border="0" alt="ALCS Game 5yankees@angels" width="234" height="156" /></a>1) Draw more walks.<br />
2) Play tremendous defense.</p></div>
<p>The first option is used to complement the player&#8217;s offensive skillset. Contact hitters usually hit for a high average, but such an average usually fluctuates around a norm due to the dependence on BABIP. As such, their on-base percentage tends to fluctuate as well, and given that these players lack the power to compensate for a low OBP, their down seasons will result in very little value. However, by increasing plate patience and walks, which are far more repeatable skills than singles hitting and BABIP, speedy contact hitters can maintain a steadier OBP, even in the down years, and thus boost their overall average value. Not to mention, an increase in walks obviously allows speedsters to get on base and take advantage of one of their primary tools.</p>
<p>In this regard, Figgins has demonstrated significant improvement since the beginning of his career.</p>
<p><a href="http://fanhuddle.com/statistics/files/2009/10/Figgins1.gif"><img class="aligncenter size-full wp-image-142" src="http://fanhuddle.com/statistics/files/2009/10/Figgins1.gif" alt="Figgins1" width="450" height="306" /></a></p>
<p>The graph shows that Figgins&#8217; walk rate has steadily increased since he started playing full-time in 2003. This year, he posted a career-high walk rate of 13.9% and led the American League in walks. The value of these walks can be seen when comparing his batting average with his on-base percentage.</p>
<p><a href="http://fanhuddle.com/statistics/files/2009/10/Figgins2.gif"><img class="aligncenter size-full wp-image-147" src="http://fanhuddle.com/statistics/files/2009/10/Figgins2.gif" alt="Figgins2" width="450" height="307" /></a></p>
<p>Changes in Figgins&#8217; batting average began to affect him less and less as his career continued. This culminated in the 2008 campaign, in which Figgins posted a .276 batting average, the second lowest of his career, while still getting on base more than 36% of the time. This mitigated the damage of the poor batting average and kept his production at the plate at a reasonable level. This season, with his career-high walk rate and a return to normalcy for Figgins&#8217; batting average, he was able to post a .395 OBP, a career high, on his way to a park-adjusted .358 wOBA in my calculations, a season worth 17.5 runs above average for a player with an ISO less than .100.</p>
<p>The second point, regarding stellar defensive play, was a path long taken by many players of yesteryear and accepted by managers everywhere. It was the idea that a player could have a light bat if he played a premium defensive position and played it well. Traditionally, this occurred primarily with &#8220;up the middle&#8221; defensive positions, particularly catcher and shortstop. But with the advent of defensive metrics like UZR, Plus/Minus, and TotalZone, we now have a much better idea of a player&#8217;s level of defense and how much that defense contributes to his worth in terms of runs.</p>
<p>Figgins started his career as a utility man, playing primarily center field. None of the major defensive metrics show that Figgins was a particularly good center fielder in his time there. In 2004 the Angels gave Figgins significant time at third base, and again they did so in 2007. In 2008, Figgins began to take to the position, posting 8.2 runs above average in 89 defensive games according to UZR and 9.9 runs above average according to TotalZone. This season, Figgins stepped it up a notch according to UZR, notching 14.5 runs above average in his time there. The <a href="http://www.tangotiger.net/scouting/">fans seem to generally agree</a>, rating Figgins an above-average defender at the positions he&#8217;s played since 2004.</p>
<p>What about the whole package? According to FanGraphs, Figgins&#8217; excellent offense and defense have been worth 5.9 WAR this season; my calculations, based in part on their data and BP&#8217;s Equivalent Baserunning Runs, have him at 6.2 WAR. This breakout season came at the appropriate time given his impending free agency. At the current free agent WAR rate, Figgins would be worth somewhere between $26.4M and $27.9M. Of course, he won&#8217;t be paid like a 6 WAR player because a lot of his value is derived from defense. Running a projection using 8/4/2/1 weights for Figgins&#8217; offensive numbers and 5/4/3/2/1 weights for his defensive numbers (with defense weighted between UZR, TotalZone, and Fans Scouting Report numbers converted to runs, whenever available), I got the following projected line, taken at 136 games played and 639 PA to account for his health issues the last two years.</p>
<p><strong>6.3 wRAA + 7.2 Defense + 2.1 Positional Adj. (3B) + 21.3 Replacement = 36.9 Runs Above Replacement, or about 3.7 WAR</strong></p>
<p>Given his defensive reputation according to the Fans Scouting Report this year, I would trust most of those numbers as fairly indicative of his performance. He isn&#8217;t a 6 WAR player, but he is a solidly above-average player that any team could use. At this year&#8217;s market rate, Figgins would be worth <strong>$16.6M</strong>, and I don&#8217;t doubt that someone will give him money around that value. Given Figgins&#8217; age (he&#8217;ll be turning 32 before the next season starts), we could expect some downturn on the order of half a WAR. At an estimated <strong>$14.4M</strong> accounting for his age, it&#8217;s likely that he&#8217;ll be paid his worth in the offseason this year.</p>
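<p>The arithmetic behind that projected line can be sketched in Python. Two inputs here are assumptions rather than figures stated in the post: roughly 10 runs per win, and about $4.5M per win, which is simply backed out of the dollar values quoted above. Treat the output as a rough check, not as the exact calculation used.</p>

```python
# Projected run components from the post (136 games, 639 PA).
components = {
    "wRAA": 6.3,
    "defense": 7.2,
    "positional_adj_3b": 2.1,
    "replacement": 21.3,
}

RUNS_PER_WIN = 10.0      # assumption: common rule of thumb, not stated in the post
DOLLARS_PER_WAR = 4.5    # assumption: $M per win, inferred from the post's figures
AGING_PENALTY_WAR = 0.5  # from the post: a downturn "on the order of half a WAR"

total_runs = sum(components.values())
war = total_runs / RUNS_PER_WIN
market_value = war * DOLLARS_PER_WAR
aged_value = (war - AGING_PENALTY_WAR) * DOLLARS_PER_WAR

print(f"Total runs: {total_runs:.1f}")
print(f"WAR: {war:.1f}")
print(f"Market value: about ${market_value:.1f}M")
print(f"Age-adjusted value: about ${aged_value:.1f}M")
```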
<p>Figgins is the example of a player who, with a limited offensive skillset, still excelled by maximizing repeatable skills like walks to help his offense and by contributing deftly on the defensive side with his glove. Now that he&#8217;s settled into a position, it&#8217;s possible he may produce defensively at the level he has shown for the life of his likely contract, which should be around two to three years in length if the participating teams are intelligent enough. At that length and value, you could do a whole lot worse than Chone Figgins.</p>
]]></content:encoded>
			<wfw:commentRss>http://fanhuddle.com/statistics/2009/10/26/player-profile-chone-figgins/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The International Currency of Baseball</title>
		<link>http://fanhuddle.com/statistics/2009/10/16/the-international-currency-of-baseball/</link>
		<comments>http://fanhuddle.com/statistics/2009/10/16/the-international-currency-of-baseball/#comments</comments>
		<pubDate>Fri, 16 Oct 2009 16:19:17 +0000</pubDate>
		<dc:creator>Michael Jong</dc:creator>
				<category><![CDATA[Fundamentals]]></category>
		<category><![CDATA[Bill James]]></category>
		<category><![CDATA[Run Estimators]]></category>
		<category><![CDATA[Runs Created]]></category>

		<guid isPermaLink="false">http://fanhuddle.com/statistics/?p=126</guid>
		<description><![CDATA[We&#8217;ll begin our first discussion on baseball with a commonly asked question. I&#8217;m sure you&#8217;ve seen it before, so this discussion won&#8217;t be long and tedious, but it&#8217;s an important one to set the foundation of what we&#8217;ll be building upon here in our Intro course. So the question is, when we think of a [...]]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ll begin our first discussion on baseball with a commonly asked question. I&#8217;m sure you&#8217;ve seen it before, so this discussion won&#8217;t be long and tedious, but it&#8217;s an important one to set the foundation of what we&#8217;ll be building upon here in our Intro course. So the question is, when we think of a good player, what do we think of him doing that makes him good?</p>
<p>Well, that&#8217;s actually a fairly easy question. The best players contribute the most to helping their teams win. But what do they do to contribute to wins? What does a good player do that gets wins for his team? Here&#8217;s where your opinion may diverge. Some people prefer players who rack up hits. Some downplay large numbers of hits in favor of extra bases and, of course, home run pop. Others prefer players who walk and get on base. Still others prefer speed. Others even go into the mythical realm of clutch, grit, and good ol&#8217; fashioned ballplayer-ness and look to grab those types.</p>
<p>So which one is right? Which type of player among these listed most contributes to wins?<span id="more-126"></span></p>
<div style="float:right;margin-left:5px"></div>
<p>The answer is &#8220;it depends.&#8221; The reason is twofold: 1) when I mention these types of hitters, I don&#8217;t know the exact numbers that we&#8217;re talking about here, and 2) wins ultimately aren&#8217;t measured by hits, walks, strikeouts, or even home runs, though homers get closer to the point. No, games are ultimately measured in <em>runs</em>, and that&#8217;s a crucial point when we evaluate players. If you don&#8217;t score runs, you can&#8217;t win games, and if you allow too many, your chances of winning are bad too.</p>
<p>Now you may think to yourself &#8220;I&#8217;m supposed to evaluate players based on runs scored?&#8221; Well, no, not entirely. League runs scored, yes. Individual runs scored, no. We have to derive the run contributions of players within a certain context, whether at the league or team level. However, once we get down to the individual level, contributions from other players enter into the mix, particularly in runs scored, so we can&#8217;t use that as a measurement of run production, odd as it may seem.</p>
<div style="float:right;margin-left:5px"><a href="http://view.picapp.com/default.aspx?term=\runs scored&amp;iid=6740573" target="_blank"><img style="border: 0pt none" src="http://cdn.picapp.com/ftp/Images/9/6/0/0/Twins_vs_Tigers_09e1.JPG?adImageId=5826189&amp;imageId=6740573" border="0" alt="Twins vs. Tigers" width="234" height="155" /></a>But if there is no tracked stat called &#8220;runs&#8221; for a player that measures his production to the team, don&#8217;t we have to depend on things like hits, walks, homers, and other measurables? Yes and no. Yes, those stats that a player accrues over the course of a season or a career are indeed the things we must use to evaluate a player&#8217;s production, but the key is that those stats have no meaning without being converted to runs. Of course, intrinsically more hits, more walks, and more home runs yield more runs, but to what extent? How much does each of those events weigh in determining runs?</div>
<p>Here we enter the world of run estimators, and that will be the primary fundamentals discussion next week. For now, I&#8217;ll open with a beginning look at run estimators. You saw one type of run estimator in the glossary: linear weights. There I described the creation of linear weights using an empirical analysis, mainly the run expectancy differences between base/out states. We&#8217;re going to drop back a bit and talk about one of the first run estimators created and the elementary basis for all other run estimators, Runs Created.</p>
<p>Runs Created was first introduced in 1979 with a purpose of determining a number of total runs contributed by players on a team. I&#8217;ll go into more detail about the construct of RC in one of our next fundamentals discussions, but I&#8217;ll leave you with the essential formula of RC. RC at its absolute base is represented in this form:</p>
<p><strong>Runs = A*B/C</strong></p>
<p>Where A = on-base factor; B = Advancement factor; and C = Opportunities</p>
<p>Without knowing what exactly goes into these inputs, it should be clear what James means by each term. On-base factor must have something to do with baserunners, advancement factor must have something to do with movement of these baserunners, and the opportunities factor must be measured by something like at-bats or plate appearances. That being said, the question to consider for next Friday, when we&#8217;ll dive into Runs Created, is whether this construct makes sense.</p>
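<p>As a preview, the A*B/C structure can be sketched in Python. The inputs below follow the commonly cited basic version of Runs Created (A = hits + walks, B = total bases, C = at-bats + walks); that choice is an assumption here, since this post has not yet defined the factors, and the team totals are invented for illustration.</p>

```python
def runs_created(hits, walks, total_bases, at_bats):
    """Basic Runs Created in the A * B / C form (assumed basic version)."""
    a = hits + walks      # on-base factor
    b = total_bases       # advancement factor
    c = at_bats + walks   # opportunities
    return a * b / c


# Hypothetical team season totals, invented for illustration.
estimate = runs_created(hits=1500, walks=500, total_bases=2400, at_bats=5500)
print(f"Estimated runs: {estimate:.0f}")
```

<p>Note how the on-base and advancement factors multiply each other: getting runners on and moving them along interact, which is worth keeping in mind when asking whether the construct makes sense.</p>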
<p><strong>Reading Assignments</strong></p>
<p>1. Colin Wyers has a great article about <a href="http://www.hardballtimes.com/main/article/what-are-little-runs-made-of/">runs and how they happen to be made</a>. He breaks down how fractions of runs get made so that we can all understand just what the heck 0.5 runs is!</p>
<p>2. Here&#8217;s a piece by Patriot at his old site <a href="http://gosu02.tripod.com/id104.html">regarding Runs Created</a>. This is good reading for the next piece, as I&#8217;ll be breaking down and trying to further simplify a lot of things that Patriot discusses in the article, so it would be nice to get familiar with the topic.</p>
]]></content:encoded>
			<wfw:commentRss>http://fanhuddle.com/statistics/2009/10/16/the-international-currency-of-baseball/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Player Profile: Carl Pavano</title>
		<link>http://fanhuddle.com/statistics/2009/10/12/player-profile-carl-pavano/</link>
		<comments>http://fanhuddle.com/statistics/2009/10/12/player-profile-carl-pavano/#comments</comments>
		<pubDate>Mon, 12 Oct 2009 20:43:24 +0000</pubDate>
		<dc:creator>Michael Jong</dc:creator>
				<category><![CDATA[Player Profile]]></category>
		<category><![CDATA[Carl Pavano]]></category>

		<guid isPermaLink="false">http://fanhuddle.com/statistics/?p=98</guid>
		<description><![CDATA[
Last night, Carl Pavano put up a dominant outing for the Minnesota Twins against his former team, the New York Yankees, in what ultimately turned into a loss. Pavano was excellent, turning in nine strikeouts in seven innings while not allowing a walk. His two mistakes, a fastball to Alex Rodriguez followed shortly by one [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: center">
<p>Last night, <strong>Carl Pavano</strong> put up a dominant outing for the <strong>Minnesota Twins</strong> against his former team, the <strong>New York Yankees</strong>, in what ultimately turned into a loss. Pavano was excellent, turning in nine strikeouts in seven innings while not allowing a walk. His two mistakes, a fastball to <strong>Alex Rodriguez</strong> followed shortly by one to <strong>Jorge Posada</strong>, ultimately spelled the end of his and the Twins&#8217; season, as Yankees starter <strong>Andy Pettitte</strong> was equally impressive in getting the win. While the quality of this outing surprised everyone, it should not surprise us as much as his surface numbers and his career would suggest.<span id="more-98"></span></p>
<p>Now, on the surface, perhaps our class would say that Pavano&#8217;s 5.10 ERA would be a stretch to call anything but bad, and if you were to compare it to the American League average ERA of 4.44, you would probably be correct. However, if you checked out the reading in the glossary about defense-independent pitching, you&#8217;d know that pitchers rarely have good control over their balls in play, and Pavano has not had the best of luck, with a BABIP of .330 that is decently above his career line of .308. If you just looked at that ERA and those hits allowed, you would say that Pavano was struggling. Let&#8217;s look at the aspects of pitching that he can better control and see how well Pavano is stacking up. We can see a clue that Pavano has been decent in his WHIP (Walks plus Hits per Inning Pitched). I don&#8217;t endorse WHIP because it counts hits as something a pitcher can control, and while I&#8217;m sure to some degree pitchers can control their hits (pitchers do have a measure of control over home runs and I&#8217;m sure less control over their hits in general), hits aren&#8217;t nearly as controllable as walks. Pavano&#8217;s WHIP on the season is 1.37, right in line with a career 1.39 WHIP. How was this accomplished? Pavano is walking batters at a 4.6% clip (UIBB% of 4.4%), which stands as the fifth lowest walk rate in the majors this season, behind <strong>Joel Pineiro</strong>, <strong>Roy Halladay</strong>, <strong>Dan Haren</strong>, and <strong>Cliff Lee</strong>. That&#8217;s some pretty exclusive company to have.</p>
<div style="float:right;margin-left:5px"><a href="http://view.picapp.com/default.aspx?term=\Carl Pavano&amp;iid=6782537" target="_blank"><img src="http://cdn.picapp.com/ftp/Images/2/5/5/3/Twins_Pavano_pitches_aa33.JPG?adImageId=5291557&amp;imageId=6782537" border="0" alt="Twins Pavano pitches during game 3 of the ALDS in Minneapolis" width="234" height="175" /></a>In addition to the low walk rate, the second lowest rate Pavano has posted in his career, he&#8217;s also struck out hitters at his second highest career rate, at 17.2%. Combining a low walk rate and a high strikeout rate has resulted in a renaissance campaign for Pavano, the type of season he had not had since he was a member of the <strong>Florida Marlins</strong> and a season the Yankees were hoping he would deliver when he signed that four year, $39.95M contract in the 2005 offseason. Four injury-riddled years later and Pavano has put up a 4.00 FIP and a 4.54 tRA as calculated by StatCorner.</div>
<p>What do Pavano&#8217;s pitches look like? I grabbed all of his Pitch f/x data for the season and checked it out myself. Let&#8217;s start off with a horizontal vs. vertical movement chart.</p>
<p><a href="http://fanhuddle.com/statistics/files/2009/10/Pavano2.gif"><img class="aligncenter size-medium wp-image-103" src="http://fanhuddle.com/statistics/files/2009/10/Pavano2-300x205.gif" alt="Pavano2" width="300" height="205" align="middle" /></a></p>
<p>Click on there and check out the full version. What you&#8217;ll see in the blob over Pavano&#8217;s picture are the pitches he threw broken down into their movement components. This is typically an easy way to visualize a pitcher&#8217;s pitches, as different pitches have general horizontal/vertical break combinations that are easily viewed in the coordinate plane. For example, sliders are typically neutral breaking pitches from release to plate; while there is obvious movement in flight, the pitch in general ends up where you would expect gravity to take it. As a result, pitches in the neutral (0,0) area or around that region that are of a certain speed tend to be classified as sliders. On the other hand, fastballs are &#8220;straight&#8221; from release to plate, and as a result are doing a bit of gravity defying. In the coordinate system, they appear to have &#8220;rise,&#8221; or high positive vertical break. We&#8217;ll have more on Pitch f/x analysis in a future glossary.</p>
<p>In Pavano&#8217;s chart, you can see some general trends, though the separation of movement is not terribly strong, in part due to the nature of his pitches and in part due to the nature of the graph (i.e. it does not contain velocity information). In general, however, you can see that Pavano has a typical high-rising four-seamer, complemented with his changeup that has significantly better drop (the difference between the cluster of vertical break is around three inches) and a slider-type offering with similar dip to the change and more movement away from the right handers (all Pitch f/x graphs, unless otherwise noted, are from the catcher perspective, so righties are on the left side of the graph, lefties on the right side).</p>
<p>Pavano&#8217;s pitch distribution appears to be something along the lines of 56.3% fastball, 29.7% changeup, 13.1% slider, with the remainder being classified here as &#8220;cutters,&#8221; though it&#8217;s questionable whether this pitch actually exists. Still, we get the idea of the general distribution: Pavano likes to use his fastball, presumably to get strikes, since it appears he does not walk a lot of hitters, and uses either the changeup or slider to punch out hitters, depending on situation and handedness.</p>
<p><a href="http://fanhuddle.com/statistics/files/2009/10/Pavano3.gif"><img class="aligncenter size-medium wp-image-106" src="http://fanhuddle.com/statistics/files/2009/10/Pavano3-300x205.gif" alt="Pavano3" width="300" height="205" /></a></p>
<p><a href="http://fanhuddle.com/statistics/files/2009/10/PavanoTable.gif"><img class="aligncenter size-full wp-image-122" src="http://fanhuddle.com/statistics/files/2009/10/PavanoTable.gif" alt="PavanoTable" width="284" height="158" /></a></p>
<p>When we look a bit more into the numbers afforded to us by Pitch f/x, we can see some more characterization about the type of pitcher Pavano has been this season. Here, I detail Pavano&#8217;s &#8220;watch%,&#8221; measured as called strikes/pitches taken. You can see in the location chart, however, that this is generally only an approximation (though it is what was actually called by the umpires). We know umpires often don&#8217;t adhere to the strike zone as defined in the rulebook, and one of the primary places where they are consistently wrong is the outside part of the zone against left-handers. Without going into a lefty-righty split, I can tell you that those left-side called strikes that are outside the delineated zone are almost entirely to left-handed hitters.</p>
<p>In any case, what we can garner from this is that Pavano has mostly had a consistent zone, not getting terribly squeezed by the umps as a whole. Looking at the watch rates for his individual pitches, not including his &#8220;cutter,&#8221; you can see very little difference between the watch% of each pitch. In general, Pavano is placing his stuff and hitters are laying off of it at about an equal rate for each pitch.</p>
<p><a href="http://fanhuddle.com/statistics/files/2009/10/Pavano4.gif"><img class="aligncenter size-medium wp-image-107" src="http://fanhuddle.com/statistics/files/2009/10/Pavano4-300x205.gif" alt="Pavano4" width="300" height="205" /></a></p>
<table class="zebra" border="1" cellpadding="6">
<tbody>
<tr class="even">
<td>Pavano</td>
<td>Whiffs</td>
<td>Other Swings</td>
<td>Whiff%</td>
</tr>
<tr>
<td>Fastball</td>
<td>72</td>
<td>676</td>
<td>9.6%</td>
</tr>
<tr class="even">
<td>Changeup</td>
<td>139</td>
<td>366</td>
<td>27.5%</td>
</tr>
<tr>
<td>Slider</td>
<td>53</td>
<td>142</td>
<td>27.2%</td>
</tr>
<tr class="even">
<td>Total</td>
<td>264</td>
<td>1,184</td>
<td>18.2%</td>
</tr>
</tbody>
</table>
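<p>The Whiff% column is just whiffs divided by all swings (whiffs plus other swings). A short Python sketch reproduces it from the counts in the table; the function and variable names are illustrative.</p>

```python
# (whiffs, other swings) per pitch type, copied from the table above.
swings = {
    "Fastball": (72, 676),
    "Changeup": (139, 366),
    "Slider": (53, 142),
}


def whiff_pct(whiffs, other_swings):
    """Whiffs as a percentage of all swings."""
    return 100.0 * whiffs / (whiffs + other_swings)


rates = {pitch: whiff_pct(w, o) for pitch, (w, o) in swings.items()}

# The "Total" row pools the counts rather than averaging the three rates.
total_whiffs = sum(w for w, _ in swings.values())
total_other = sum(o for _, o in swings.values())
rates["Total"] = whiff_pct(total_whiffs, total_other)

for pitch, rate in rates.items():
    print(f"{pitch}: {rate:.1f}%")
```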
<p>As you can see from the location chart, Pavano isn&#8217;t the most adept pitcher at missing bats; his 17.2% strikeout rate is actually a tad below the American League average for the year. He doesn&#8217;t induce a whole lot of whiffs, though he gets them primarily low with his breaking stuff. He also does a solid job inducing swings outside the zone (31.6% according to FanGraphs). His fastball is not effective at getting whiffs, drawing them on only 9.6% of swings this year, but this is what you would expect given an average fastball velocity in the low 90&#8217;s. The slider and changeup seem equally impressive at missing bats, as both are around 27% missed swings.</p>
<p>Looking at Pitch f/x and his peripherals, you would have to think that Pavano has pitched at an average level and has been a victim of poor timing or poor defense (behind him played the Twins and the <strong>Cleveland Indians</strong>, who posted team bUZR&#8217;s of -36.2 and -33.4 runs respectively). Pavano does a decent job keeping the ball on the ground (43% this year, 44.7% career), is likely to do an average job of keeping balls in the park (10.7% HR/FB% this year, 10.2% career), and does an excellent job of being a strike-thrower and limiting walks. This year he&#8217;ll be a free agent again after signing an incentive-laden deal with the Indians, and he will benefit from a weak starting pitching class this offseason. Pavano likely needs assistance from his defense because he doesn&#8217;t miss a lot of bats, so signing him isn&#8217;t a perfect move for everyone. However, pitchers with ERAs above 5 who also happen to be worth 3.7 WAR on the season are often undervalued, and a team that signs him to a short-term deal and puts a decent defense behind him may get lucky and pick up a solid starter for far less than that production would normally cost in free agency.</p>
<p><strong>Required Reading:</strong></p>
<p>Here&#8217;s the first of our biweekly &#8220;reading assignments.&#8221; Make sure you check out these articles.</p>
<p>1. Here&#8217;s my instant analysis on <a href="http://www.beyondtheboxscore.com/2009/10/12/1080920/instant-analysis-pavano-and">the duel between Pavano and Pettitte</a> over at <a href="http://www.beyondtheboxscore.com/">Beyond the Box Score</a>.</p>
<p>2. Kincaid offers us a two-part primer on FIP (<a href="http://www.3-dbaseball.net/2009/10/evaluating-pitchers-with-fip-part-i.html">Part 1</a> and <a href="http://www.3-dbaseball.net/2009/10/evaluating-pitchers-with-fip-part-ii.html">Part 2</a>). If you ever thought to yourself whether those coefficients were &#8220;bogus,&#8221; this is for you.</p>
<p>3. Here&#8217;s one of BtB colleague Jack Moore&#8217;s opening pieces over at FanGraphs, showing the <a href="http://www.fangraphs.com/blogs/index.php/relating-batting-average-and-woba/">correlation between batting average and wOBA</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://fanhuddle.com/statistics/2009/10/12/player-profile-carl-pavano/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Intro to Sabermetrics 101: Glossary Sect. 3</title>
		<link>http://fanhuddle.com/statistics/2009/10/08/intro-to-saber101-glossary3/</link>
		<comments>http://fanhuddle.com/statistics/2009/10/08/intro-to-saber101-glossary3/#comments</comments>
		<pubDate>Thu, 08 Oct 2009 18:12:42 +0000</pubDate>
		<dc:creator>Michael Jong</dc:creator>
				<category><![CDATA[Course Materials]]></category>
		<category><![CDATA[Glossary]]></category>
		<category><![CDATA[Resources]]></category>

		<guid isPermaLink="false">http://fanhuddle.com/statistics/?p=69</guid>
		<description><![CDATA[Replacement Level
Source: There are tons of sources on replacement level in baseball, but of course, I&#8217;ll only cite the good ones. Here&#8217;s The Book Wiki&#8217;s article. There&#8217;s the Keith Woolner piece that&#8217;s involved in defining VORP (we don&#8217;t and won&#8217;t talk about VORP here). Here&#8217;s a good one by Sean &#8220;Rally&#8221; Smith showing some examples [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Replacement Level</strong></p>
<p>Source: There are tons of sources on replacement level in baseball, but of course, I&#8217;ll only cite the good ones. Here&#8217;s <a href="http://www.tangotiger.net/wiki/index.php?title=Replacement_Level">The Book Wiki&#8217;s</a> article. There&#8217;s the Keith Woolner <a href="http://www.stathead.com/bbeng/woolner/vorpdescnew.htm">piece</a> that&#8217;s involved in defining VORP (we don&#8217;t and won&#8217;t talk about VORP here). Here&#8217;s <a href="http://www.hardballtimes.com/main/article/replacement-level-article/">a good one</a> by Sean &#8220;Rally&#8221; Smith showing some examples based on his CHONE projections. Also, there&#8217;s Dave Cameron&#8217;s excellent series on <a href="http://www.fangraphs.com/blogs/index.php/2009-replacement-level-right-field">replacement level examples</a> (this was the last position player one, but it has all the previous links).</p>
<p>I won&#8217;t go into much detail, but Woolner&#8217;s piece and the later WAR pieces go over the economic reasons why replacement level is a convenient baseline for establishing a player&#8217;s value. Replacement value is best defined as the value provided by freely available talent, either in the minor leagues or on the free agent market. The major league minimum is around $400K, and all teams must pay at least that much for talent. Since that is the floor for salaries, there&#8217;s no way to optimize economic value any further; even if players are below that level of talent, teams still have to pay the league minimum salary.</p>
<p>As a result, the league minimum can be seen as a baseline at which a certain amount of production can be attained, whether it is through the farm system (the classic Quad-A player) or through a free agent journeyman. It&#8217;s &#8220;free&#8221; to teams because they have to fill a roster and have to at least pay each player the minimum. Any player paid more than the minimum is supposed to produce above this level of talent. For position players, the classic example is <strong>Willie Bloomquist</strong>.</p>
<p>Currently, most of the research has a team of replacement level players theoretically winning around 48 games (you&#8217;ll hear figures between 47 and 50), a .300 win% team. That&#8217;s one bad team, but that&#8217;s what you&#8217;d expect for 25 players paid at the league minimum AND performing at that level of talent. This is contrary to what Woolner&#8217;s article mentions, and at a later time I may do a compilation of articles regarding this.</p>
<p><strong>Wins Above Replacement (WAR)</strong></p>
<p>Source: The awesome <a href="http://www.fangraphs.com/blogs/index.php/glossary/#winvalues">FanGraphs Win Value series</a> by Dave Cameron, the WAR Lords of the Diamond two-part series (<a href="http://www.beyondtheboxscore.com/2009/6/12/906943/war-lords-of-the-diamond-position">position players</a> and <a href="http://www.beyondtheboxscore.com/2009/6/20/919602/war-lords-of-the-diamond-pitchers">pitchers</a>) by Jabberwocky over at <a href="http://www.purplerow.com/">Purple Row</a> and reposted on Beyond the Box Score, and originator Tom Tango&#8217;s <a href="http://www.insidethebook.com/ee/index.php/site/comments/how_to_calculate_war/">explanation</a>.</p>
<p>Wins Above Replacement (WAR) is the result of a lot of different runs-based numbers being plugged into a big algorithm to achieve a number in familiar terms (wins) compared to a baseline (replacement level). Actually, it&#8217;s not all that complicated, and the process is explained in all of the pieces linked above. The versions most commonly quoted are FanGraphs&#8217; values and values from <a href="http://www.baseballprojection.com/war/playerindex.htm">Rally&#8217;s historical WAR database</a>. Both use similar processes but work on slightly different inputs. WAR is calculated differently for position players and pitchers, and WAR for pitchers is of particular interest due to the variety of inputs used.</p>
<p><strong>How I Do It</strong></p>
<p>The FanGraphs series by Dave Cameron does a wonderful job explaining all of the intricacies of the WAR calculation, most of which are actually fairly easy to explain. I follow the same basic process but use different inputs, so my versions are always a bit different. Keep in mind that when I quote WAR here, as when I quote any other stat, I will mention the source, whether it be my homebrewed version, the FanGraphs version, or Rally&#8217;s version.</p>
<p>It&#8217;s important here to make sure your terminology is correct. WAR refers to a specific process as outlined in the above links and below (generally). Do not confuse this with Baseball Prospectus&#8217; WARP1/2/3, and definitely do not confuse this with VORP. There are supposedly numerous issues with BP&#8217;s wins statistics, particularly with their low replacement level.</p>
<p>For position players:</p>
<p>1) Offense: I will use wOBA as the primary entrant on offense, mostly because it is the most easily accessible set of linear weights and it is so darn convenient. For this, I&#8217;ll use custom linear weights derived using the methods shown <a href="http://www.insidethebook.com/ee/index.php/site/comments/woba_year_by_year_calculations/">here</a>. These values will not include pitchers hitting. For baserunning, I&#8217;ll use BP&#8217;s Equivalent Baserunning Runs (EqBRR), as it seems like the best baserunning metric in the business.</p>
<p>2) Defense: For defense, I&#8217;ll use three different inputs: bUZR (FanGraphs), TotalZone (B-R and Rally&#8217;s site), and Fan Scouting Report data (Tango&#8217;s site). I&#8217;ll weight the two zone-based metrics at .375 each and weight FSR data converted to runs at .25.</p>
<p>3) Positional Adjustment: I&#8217;ll use the same positional adjustments found on FanGraphs. Here&#8217;s <a href="http://www.insidethebook.com/ee/index.php/site/comments/fielding_position_adjustments/">the research done</a> by Tango using 2002-2005 UZR data provided by MGL. After much work, here&#8217;s the scale in <a href="http://www.fangraphs.com/blogs/index.php/explaining-win-values-part-three">its entirety</a> posted as part of the Win Values series over at FG. The initial study focusing on this was done by comparing players who played multiple positions in that time period. The results now being used for this era (and this is somewhat era-sensitive, as Tango mentions in <a href="http://www.google.com/url?sa=t&amp;source=custom&amp;client=pub-9367275287626489&amp;sigafs=IskSIC4St6d0i8HO&amp;flav=0000&amp;ct=res&amp;cd=11&amp;url=http%3A%2F%2Fwww.insidethebook.com%2Fee%2Findex.php%2Fsite%2Fcomments%2Fand_more_on_positional_adjustments%2F&amp;ei=4FXPSo2BB4XoM-T4oJQD&amp;usg=AFQjCNFp6vB5F9ejSSPTlT7cB3Xhnzq_5w">this post</a>) make intuitive baseball sense. Catchers receive the biggest bonus, as they play the most difficult and scarce position. First basemen receive the largest penalty among fielders, as they play the easiest position to replace on the field. The other positions are in between; shortstops receive 7.5 runs more for their work, while corner outfielders receive that much of a penalty for theirs. DH&#8217;s receive a -17.5 run penalty for being eminently replaceable.</p>
<p>4) Replacement adjustment: With all the research done on replacement level, the general consensus is a value of 20 runs below average per 600 plate appearances. The replacement adjustment matters because playing time itself has value; any time a player spends on the field would otherwise go to a replacement player who provides far less worth, so elite players who do not accrue playing time are docked value. To account for differences in league talent level, we can use 18 runs per 600 PA in the National League and 22 runs per 600 PA in the American League (Tango accounts for this slightly differently, using a rate per 162 games).</p>
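<p>To make the four steps above concrete, here&#8217;s a rough Python sketch of how the pieces could be added up. The defensive weights, the league replacement values, and the SS, corner outfield, and DH adjustments come from the text above. Everything else is an assumption for illustration only: the function names, the wOBA scale of 1.15, the 10 runs per win conversion, prorating the positional adjustment by games out of 162, and the remaining positional values (taken from the FanGraphs scale linked above). Treat it as a sketch, not as my exact spreadsheet.</p>

```python
# Positional adjustments in runs per 162 games. The text gives SS, corner OF
# and DH; the other values are assumed from the FanGraphs scale it links to.
POSITION_RUNS = {
    "C": 12.5, "SS": 7.5, "2B": 2.5, "3B": 2.5, "CF": 2.5,
    "LF": -7.5, "RF": -7.5, "1B": -12.5, "DH": -17.5,
}

# Replacement level in runs below average per 600 PA, by league (from the text).
REPLACEMENT_RUNS_PER_600 = {"NL": 18.0, "AL": 22.0}

WOBA_SCALE = 1.15    # assumption: typical wOBA-to-runs scale, not from the text
RUNS_PER_WIN = 10.0  # assumption: common rule of thumb, not from the text


def batting_runs(woba, league_woba, pa):
    """Runs above average on offense, derived from wOBA."""
    return (woba - league_woba) / WOBA_SCALE * pa


def defense_runs(uzr, total_zone, fsr):
    """Blend the two zone metrics at .375 each and FSR (in runs) at .25."""
    return 0.375 * uzr + 0.375 * total_zone + 0.25 * fsr


def positional_runs(position, games):
    """Positional adjustment, prorated by games played (an assumption)."""
    return POSITION_RUNS[position] * games / 162.0


def replacement_runs(league, pa):
    """Credit for playing time over a replacement-level player."""
    return REPLACEMENT_RUNS_PER_600[league] * pa / 600.0


def position_player_war(woba, league_woba, pa, baserunning, uzr, total_zone,
                        fsr, position, games, league):
    """Sum every run component, then convert runs to wins."""
    runs = (batting_runs(woba, league_woba, pa)
            + baserunning
            + defense_runs(uzr, total_zone, fsr)
            + positional_runs(position, games)
            + replacement_runs(league, pa))
    return runs / RUNS_PER_WIN


# Hypothetical player: a good-hitting AL shortstop with a full season of PA.
example = position_player_war(
    woba=0.360, league_woba=0.330, pa=600, baserunning=1.0,
    uzr=8.0, total_zone=4.0, fsr=0.0, position="SS", games=150, league="AL")
print(round(example, 1))
```

<p>Keeping each component in its own function makes it easy to swap an input (say, a different defensive metric) without touching the rest of the calculation.</p>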
<p>For pitchers:</p>
<p>1) Pitching runs: There are a lot of different defense independent metrics that attempt to use component stats to determine run values per nine innings. For the purposes of my calculation of WAR, I&#8217;ll be using two inputs, FanGraphs&#8217; FIP/.92 (FIP scaled to runs allowed instead of ERA) and StatCorner&#8217;s tRA. Both do similar things, but tRA accounts for batted balls, while FIP uses the traditional home runs, strikeouts, and walks.</p>
<p>2) Pythagenpat: In order to determine wins, we&#8217;ll use Pythagenpat directly. For runs allowed, we use the pitcher&#8217;s runs per nine innings. For runs scored, we use the league average. We can then determine the run environment exponent and derive a win%, which of course is in terms of wins per nine innings.</p>
<p>3) Replacement level: As discussed in the Tango post initially linked, the replacement level for pitchers is different depending on their role. I don&#8217;t use a separate level for closers, but I do use one for starters and one for relievers. For the National League, the replacement level is .390 win% for starters and .480 for relievers, while those values are .370 and .460 respectively for the American League. To use these adjustments, we use the simple formula:</p>
<p>Win% (pitcher) &#8211; Win% (replacement level)</p>
<p>to determine the wins above replacement. For relievers, there is an additional leverage index calculation that is used to give half of the credit for the leverage situations a reliever faces. Starters do not get such an adjustment because they will pitch on average around a 1.00 LI. The equation for reliever wins is:</p>
<p>(Win% (reliever) &#8211; Win% (replacement level)) * (1 + average LI)/2</p>
<p>You can multiply this difference by Innings Pitched/9 to determine WAR. Voila!</p>
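<p>Here&#8217;s a minimal Python sketch of the pitcher steps above. The replacement levels are the ones quoted in the text. The Pythagenpat exponent constant of 0.287 is an assumption (a commonly cited value, not one given above), as are the function names. The sketch also reads the leverage adjustment as (1 + LI)/2, so that an average LI of 1.00 leaves the result unchanged, which matches the note above about starters. It takes a single runs-per-nine figure as input and does not try to guess how FIP/.92 and tRA get combined.</p>

```python
PYTHAGENPAT_CONSTANT = 0.287  # assumption: commonly cited value, not from the text

# Replacement-level win% by (league, role), as quoted in the text.
REPLACEMENT_WIN_PCT = {
    ("NL", "SP"): 0.390, ("NL", "RP"): 0.480,
    ("AL", "SP"): 0.370, ("AL", "RP"): 0.460,
}


def fip_to_ra9(fip):
    """Scale FIP from an ERA basis to a runs-allowed basis (FIP/.92)."""
    return fip / 0.92


def pythagenpat_win_pct(ra9, league_ra9):
    """Win% for a pitcher allowing ra9 with league-average run support."""
    # The exponent depends on the run environment (total runs per game).
    exponent = (ra9 + league_ra9) ** PYTHAGENPAT_CONSTANT
    scored = league_ra9 ** exponent
    allowed = ra9 ** exponent
    return scored / (scored + allowed)


def pitcher_war(ra9, league_ra9, innings, league, role, average_li=1.0):
    """Wins above replacement for a starter ("SP") or reliever ("RP")."""
    win_pct = pythagenpat_win_pct(ra9, league_ra9)
    wins_per_nine = win_pct - REPLACEMENT_WIN_PCT[(league, role)]
    if role == "RP":
        # Relievers get half the credit for the leverage they pitch in.
        wins_per_nine *= (1.0 + average_li) / 2.0
    return wins_per_nine * innings / 9.0


# Hypothetical NL starter: 3.70 FIP over 200 innings in a 4.60 R/9 league.
print(round(pitcher_war(fip_to_ra9(3.70), 4.60, 200.0, "NL", "SP"), 1))
```

<p>A pitcher who allows exactly the league-average rate comes out at a .500 win%, so a league-average NL starter is worth .110 wins above replacement for every nine innings he throws.</p>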
<p>More on leverage and other topics a little bit later.</p>
]]></content:encoded>
			<wfw:commentRss>http://fanhuddle.com/statistics/2009/10/08/intro-to-saber101-glossary3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
