Friday 21 February 2014

Baseball Players’ Compensation Statistical Analysis.


Introduction.
Statistical concepts can be applied to the game of baseball, and as such, one is able to measure the individual productivity of the players in relation to their compensation. This statistical appraisal is facilitated by the availability of ample data about the performance and productivity of an individual. Moreover, the marginal revenue product (MRP) of an individual baseball player is relatively independent, and hence the contribution of this player to the team can be easily evaluated. It thus follows that one can estimate the marginal revenue product of each baseball player, and then make a comparison between it and the salary that the player receives (Fields 3).
MRP and the Team Model.
Employing an additional unit of labor generates additional income for a business entity, since the services of that labor are used to create a product that is then sold. This marginal income depends on two variables: the marginal product (the change in physical output) and the marginal revenue generated by each unit of that output. The relationship is given by the equation below.
Marginal income = K (constant) × marginal product × marginal revenue.
The marginal revenue product (MRP) refers to the additional income that is derived from employing an additional worker. Theoretically, a business entity is able to pay the employee a wage that roughly equals his MRP. The theory of competitive labor markets postulates that the remuneration of an employee must equal his MRP (Fields 7).
In a baseball team, the MRP of a player depends on his playing skills and performance, which determine his contribution to team performance and, through it, to team revenue. The effect of a player on team revenue is therefore indirect, as is explained hereafter. Superb baseball skills contribute to an improvement in team performance, and this increases the number of victories for the team. This translates to an increase in broadcast revenues and gate receipts. Thus, it is apparent that the market worth of an individual player is defined by the amount of revenue that the team accrues from such a player (Fields 8).
The MRP of a player is also related to his contribution to the performance variables of the team. These performance variables determine the winning percentages. Winning percentages affect the team revenue. It is thus apparent that the team revenue is related (albeit indirectly) to performance variables. Hence, the assumption that the production function of a team is linear can be expressed mathematically as (Fields 8):
WINPCT = α0 + β1·RC + β2·ERA + β3·NATLG + β4·CONT + β5·OUT + e    (1)
Where:
- WINPCT is the percentage of games that the team wins.
- RC is the total runs that the team has created in a particular season.
- ERA is the average number of earned runs that the team's pitchers allow per nine innings.
- NATLG is 1 if the team is in the National League; otherwise it is 0.
- CONT is 1 if the team finished within 5 games of first place in the division; otherwise it is 0.
- OUT is 1 if the team finished 20 or more games out of first place in the division; otherwise it is 0.
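The coding of the three dummy variables can be sketched in a few lines of Python. This is only an illustration: the function names and the "NL" league code are assumptions, and "within 5 games" is read here as five games or fewer.

```python
def division_dummies(games_behind):
    """Return (CONT, OUT) for a team's final distance from first place.

    Assumption: "within 5 games" includes exactly 5 games behind.
    """
    cont = 1 if games_behind <= 5 else 0   # contender: within 5 games of first place
    out = 1 if games_behind >= 20 else 0   # out of the race: 20 or more games behind
    return cont, out


def league_dummy(league):
    """Return NATLG: 1 for the National League, 0 otherwise.

    Assumption: the league is passed as the hypothetical code "NL" or "AL".
    """
    return 1 if league.strip().upper() == "NL" else 0


if __name__ == "__main__":
    print(division_dummies(3))    # a contending team
    print(division_dummies(24))   # a team far out of first place
    print(league_dummy("NL"))
```

A team that finishes between 5 and 20 games behind gets 0 for both CONT and OUT, so it serves as the baseline category in equation (1).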
The number of runs created is a useful and reliable measure of overall offensive production. ERA serves as the most apposite defensive measure (especially for team pitching), since it reflects how well a team prevents runs from scoring. However, ERA does not account for runs that result from errors. The performance variables are as follows: hitting, walks, slugging averages, stolen bases, pitching and runs (Fields 8).
A margin of two runs or fewer is usually enough to decide many of a team's games in a season, so hitting and pitching performance alone do not fully explain the outcome of such games. Dummy variables are therefore needed to account for these other factors. The dummy variables that have been introduced in equation (1) are OUT and CONT. The CONT variable accounts for team morale. Team morale is determined by the quality of team management, instantaneous decision making and the composure of players; as such, it significantly influences the probability of a team winning a majority of its close matches. The OUT variable accounts for the price of disappointment after losses, and the cost of buying minor league players. NATLG accounts for the quality of play. Usually, the American League has more runs than the National League (Fields 8).
Meta-analysis has shown that the variation in the total team revenue is a linear function of team characteristics and WINPCT; and, it is thus expressed mathematically as follows (Fields 9):
TOTREV = α0 + σ1·WINPCT + σ2·NATLG + γ1·TEAM + γ2·YEAR + e    (2)
Where:
- TOTREV is the total operating revenues of the team.
- WINPCT is the winning percentage of the team.
- NATLG is 1 if the team is in the National League; otherwise it is 0.
- TEAM is the vector of the team dummies.
- YEAR is the vector of the year dummies.
Equation (2) above is based on the following set of hypotheses (Fields 9):
i. Fan attendance is directly related to TOTREV.
ii. Both fan attendance and TOTREV are positively influenced by the number of team wins, since fans respond positively to winning teams.
iii. The partial coefficient of TOTREV with respect to WINPCT provides a measure of team marginal revenue.
iv. The team dummies adjust for differences between teams, while the year dummies account for differences in total revenue across years.
The projected MRP for baseball players is calculated using both equations (1) and (2) (Fields 9).
Statistics and variable creation.
The data used cover an entire decade (1990-1999). However, random sampling has been used to create an abridged version of the statistics collected within this period. The remunerations have been adjusted, using the appropriate weights, into constant 1999 dollar values. Variable creation is used to adjust for differences among different measures, such as hitting and pitching. Hence, the individual performance of a hitter is measured using Runs Created (RC) as per the equation below (Fields 10):
RC = [Total Bases × (Hits + Walks)] / (Walks + At Bats)    (3)
The above equation enables a team to compute the total runs that each player has created for it, while eliminating dependency on teammates. Runs Batted In (RBI) also measure the offensive capability of a player, but they are influenced by dependency, because they reflect the number of chances that the player had to drive in runners (Fields 10).
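Equation (3) can be sketched as a small Python function. The function name and the season line used in the example are hypothetical and serve only to illustrate the arithmetic.

```python
def runs_created(hits, walks, total_bases, at_bats):
    """Runs Created as in equation (3): TB x (H + BB) / (BB + AB)."""
    opportunities = walks + at_bats
    if opportunities == 0:
        # A player with no plate appearances has created no runs.
        return 0.0
    return total_bases * (hits + walks) / opportunities


if __name__ == "__main__":
    # Hypothetical season line: 150 hits, 50 walks, 250 total bases, 500 at bats.
    print(round(runs_created(150, 50, 250, 500), 1))
```

Because the formula uses only the player's own hits, walks, total bases and at bats, it does not depend on how often teammates were on base, which is the dependency that affects RBI.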
The individual performance of a pitcher is measured using the Earned Run Average (ERA) as per the equation below (Fields 11):
ERA = (9 × Earned Runs) / Innings Pitched    (4)
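The ERA formula translates directly into code. The sketch below uses a hypothetical function name and a made-up pitching line.

```python
def earned_run_average(earned_runs, innings_pitched):
    """Earned Run Average: earned runs allowed per nine innings pitched."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9 * earned_runs / innings_pitched


if __name__ == "__main__":
    # Hypothetical pitcher: 60 earned runs allowed over 200 innings.
    print(round(earned_run_average(60, 200), 2))
```

A lower value is better, since it means fewer earned runs allowed per nine innings.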
The descriptive statistics of the relevant baseball games have been collected, collated and analyzed by various statistical agencies. Random samples have been extracted from the collated descriptive statistics, and tabulated as per the tables below (Fields 12).
Table 1 below uses variable creation to adjust for errors, dependencies and other anomalies that exist in baseball statistics (Fields 21).
Table 1: RBI, Runs, and Runs Created in 1990 and 1999 by selected RBI Leaders (Fields 21)
PLAYER          RBI    R      RC
Selected 1990 Scores
Bonilla, B      120    112    129
Bonds, B        114    104    191
McGwire, M      108    87     142
Sandberg, R     100    116    156
Selected 1999 Scores
Palmero, C      142    96     204
Williams, M     142    98     138
Delgado, C      134    113    157
Guerrero, V     131    102    173
Note:
- RBI is Runs Batted In.
- R is Runs Scored.
- RC is Runs Created, calculated using equation (3) above.
Table 2: Team Revenues and Statistics from 1990-99 (An abridged version) (Fields 23).
TEAM            REV     WINPCT    RC       ERA      OUT     CONT
Baltimore       97.5    .513      792.8    4.392    .300    .200
California      61.9    .473      730.5    4.471    .400    .300
Cleveland       83.2    .534      851.8    4.380    .200    .600
Detroit         54.6    .452      775.9    5.011    .500    0
Minnesota       47.8    .462      758.9    4.776    .500    .100
Oakland         61.1    .496      766.7    4.639    .200    .300
Tampa Bay       79.2    .407      760.8    4.705    .100    0
Atlanta         85.9    .596      759.8    3.498    .100    .800
Colorado        91.9    .478      875.8    5.344    .429    .286
Philadelphia    61.3    .471      712.7    4.298    .700    .100
Florida         60.4    .442      683.5    4.397    .571    0
San Diego       56.9    .484      712.1    4.007    .200    .200
Los Angeles     91.9    .513      705.9    3.692    .300    .600
Note:
- REV is team revenue in millions of 1999 dollars, adjusted using the CPI.
- WINPCT is winning percentage.
- RC is Runs Created.
- ERA is Earned Run Average.
- OUT is a dummy variable for teams finishing twenty or more games out of first place in their respective division.
- CONT is a dummy variable for teams finishing within five games of first place in the division.
Table 3: Estimates of MRP, Actual Salaries, and Production Statistics (Fields 28).
PITCHERS
PLAYER           ERA     IP%     MRP          SALARY
Roger Clemens    2.41    0.19    7,487,971    3,180,323
John Smiley      3.86    0.11    3,627,553    5,592,679
Denny Neagle     2.97    0.16    4,777,413    2,442,192
HITTERS
PLAYER           RC      MRP        SALARY
Javier Lopez     38.4    871,398    125,533
Bill Spiers      37.8    858,150    424,729
Ozzie Guillen    28.1    657,806    500,000
Note:
- ERA is earned run average.
- IP% is the percentage of team innings pitched.
- RC is runs created.
- MRP is the estimated marginal revenue product in real 1999 dollars.
- SALARY is the actual salary in real 1999 dollars.
The following two tables show descriptive statistics that have been collated and analyzed.
Table 4: Compilation of the Average Total Team Revenues and other Descriptive Statistics by Year (Fields 24).
YEAR    REV     RC       ERA     OUT     CONT
1990    66.1    725.6    3.86    .308    .269
1991    70.8    720.5    3.91    .346    .192
1992    72.4    706.1    3.74    .423    .192
1993    73.1    776.7    4.18    .393    .214
1994    45.4    590.9    4.51    .071    .464
1995    55.1    728.3    4.45    .428    .321
1996    70.0    848.1    4.62    .214    .418
1997    82.1    817.7    4.39    .179    .321
1998    84.5    815.0    4.43    .433    .233
1999    94.6    865.0    4.71    .533    .267

Table 5: Average Salary among Professional Baseball Players in Constant 1999 Dollars (Fields 25).
YEAR    HITTERS                    PITCHERS
1990    $827,507 (745,348)         $763,856 (687,340)
1991    $1,122,047 (1,123,599)     $1,149,394 (1,135,239)
1992    $1,277,637 (1,411,698)     $1,320,195 (1,417,248)
1993    $1,257,828 (1,504,463)     $1,259,855 (1,492,623)
1994    $1,313,949 (1,545,810)     $1,249,382 (1,502,080)
1995    $1,213,318 (1,784,463)     $1,062,461 (1,643,588)
1996    $1,086,363 (1,623,599)     $858,016 (1,312,878)
1997    $1,304,520 (1,805,201)     $1,060,511 (1,512,825)
1998    $1,468,324 (1,989,561)     $1,221,230 (1,620,667)
1999    $1,748,757 (2,124,505)     $1,656,944 (2,004,616)
Note:
- Standard deviations are in parentheses.
Statistical analysis.
The winning percentage function for a baseball team was estimated as shown in the equation below (Fields 13):
WINPCT = 0.547 + 0.000235 RC – 0.052 ERA – 0.004 NATLG + 0.046 CONT – 0.043 OUT    (4)
         (0.021)  (0.000024)    (0.004)      (0.004)        (0.005)       (0.005)

It is apparent from equation (4) above that each coefficient gives the change in the winning percentage (WINPCT) associated with a one-unit change in the corresponding variable. For instance, a single additional run created (RC) increases the winning percentage by 0.000235, while an increase of one unit in ERA lowers it by 0.052. The NATLG coefficient (–0.004) is small, and the difference in WINPCT between the two leagues is not statistically significant. Based on equation (4) above, a contending team is likely to finish 0.046 above other teams that have equivalent player performance. The revenue estimates from equation (2) show that an increase of 0.1 in WINPCT increases total team revenue by $9,630,000. NATLG shows that there is no statistically significant difference between the total team revenues collected in the American League and the National League. The year dummies show that the 1994-1995 strike did have an adverse effect on total team revenue (Fields 14).
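The estimated winning percentage function can be evaluated with a short Python sketch. The coefficients are those reported above, except that the CONT coefficient is entered as positive, following the statement that a contender finishes 0.046 above comparable teams; this sign is an assumption. The function name is hypothetical, and applying a season-level equation to the decade averages for Atlanta in Table 2 is for illustration only.

```python
# Coefficients of the estimated winning percentage function.
# Assumption: CONT enters with a positive sign, as the discussion of
# contending teams implies.
COEF = {
    "const": 0.547,
    "RC": 0.000235,
    "ERA": -0.052,
    "NATLG": -0.004,
    "CONT": 0.046,
    "OUT": -0.043,
}


def predicted_winpct(rc, era, natlg, cont, out):
    """Predicted WINPCT from the linear team production function."""
    return (COEF["const"]
            + COEF["RC"] * rc
            + COEF["ERA"] * era
            + COEF["NATLG"] * natlg
            + COEF["CONT"] * cont
            + COEF["OUT"] * out)


if __name__ == "__main__":
    # Atlanta's 1990-99 averages from Table 2 (a National League team).
    atlanta = predicted_winpct(rc=759.8, era=3.498, natlg=1, cont=0.800, out=0.100)
    print(round(atlanta, 3))
```

For these inputs the sketch gives roughly 0.572, compared with the .596 that Table 2 reports for Atlanta.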
The MRP calculation assumes that no externalities influence individual performance, so that the linear summation of the individual performances in a team provides a measure of team performance. The calculation of an individual MRP uses equations (1) and (2), as follows. An increase of one point (0.001) in WINPCT raises the team's TOTREV by $96,306. RC is the most apposite measure for a hitter, while ERA is the most apposite measure for a pitcher. An increase of a single unit in RC increases the team’s WINPCT by 0.000235, while a decrease of a single unit in ERA increases the team’s WINPCT by 0.054. Thus, the MRP for a hitter is (Fields 14):
MRP of a hitter = 0.000235 × annual RC × 1,000 × $96,306    (5)
Here the factor of 1,000 converts the change in WINPCT into points. Hence, a high RC increases the value of MRP.
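The hitter calculation can be sketched in Python as follows. One assumption is made explicit: $96,306 is taken as the revenue from one point (0.001) of WINPCT, which is what the figure of $9,630,000 per 0.1 of WINPCT implies. The function and constant names are hypothetical.

```python
RC_COEFFICIENT = 0.000235   # change in WINPCT per run created
REVENUE_PER_POINT = 96_306  # dollars per 0.001 of WINPCT (assumed scaling)
POINTS_PER_UNIT = 1_000     # number of 0.001 points in 1.0 of WINPCT


def hitter_mrp(annual_rc):
    """Estimated marginal revenue product of a hitter, in dollars."""
    winpct_gain = RC_COEFFICIENT * annual_rc
    return winpct_gain * POINTS_PER_UNIT * REVENUE_PER_POINT


if __name__ == "__main__":
    # Javier Lopez's RC of 38.4 from Table 3.
    print(round(hitter_mrp(38.4)))
```

For an RC of 38.4 the sketch returns about $869,000, which is close to, though not identical with, the $871,398 listed for Javier Lopez in Table 3.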
The total ERA of a team is the weighted average of its pitchers’ ERAs. Each pitcher's ERA is weighted by his share of team innings pitched (IP%), and as such, the individual ERA productivity function must be multiplied by his IP% so as to obtain (Fields 15):
MRP of a pitcher = $96,306 × 1,000 × IP% × {0.547 – (0.054 × ERA)}    (6)
Here $96,306 is the revenue from one point (0.001) of WINPCT, and the factor of 1,000 converts WINPCT into points. Thus, a high ERA reduces the MRP.
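The pitcher calculation can be sketched in the same way. As an assumption, $96,306 is again treated as the revenue from one point (0.001) of WINPCT, and the ERA coefficient of 0.054 is taken from the pitcher equation above; the names are hypothetical.

```python
INTERCEPT = 0.547           # constant of the winning percentage function
ERA_COEFFICIENT = 0.054     # change in WINPCT per unit of ERA, as in the pitcher equation
REVENUE_PER_POINT = 96_306  # dollars per 0.001 of WINPCT (assumed scaling)
POINTS_PER_UNIT = 1_000     # number of 0.001 points in 1.0 of WINPCT


def pitcher_mrp(era, ip_share):
    """Estimated marginal revenue product of a pitcher, in dollars.

    ip_share is the pitcher's share of team innings pitched (IP%),
    expressed as a fraction such as 0.19.
    """
    productivity = INTERCEPT - ERA_COEFFICIENT * era
    return REVENUE_PER_POINT * POINTS_PER_UNIT * ip_share * productivity


if __name__ == "__main__":
    # ERA of 2.41 and IP% of 0.19, the first pitcher row of Table 3.
    print(round(pitcher_mrp(2.41, 0.19)))
```

With an ERA of 2.41 and an IP% of 0.19 the sketch gives roughly $7.6 million, the same order of magnitude as the $7,487,971 in Table 3; the gap is plausibly due to the rounding of IP% in the table.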
Studies have shown that miscellaneous inputs such as trading capabilities, managerial performance and investment in stadiums, do not influence the MRP (Fields 15).
Statistical regression analysis.
The statistical analysis of the compensation of baseball players uses a linear regression model, as is explained below. Salary is related to the MRP by regressing salary on the projected MRP as follows (Fields 15):
Salary_i = α + µ·MRP_i + e_i    (7)
Where: i denotes an individual player, MRP_i is the projected MRP of that player, and µ is the coefficient on the projected MRP.
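A regression of salary on projected MRP can be sketched with ordinary least squares in plain Python. This is a minimal illustration rather than the procedure used in the source: the function name is hypothetical, and the data in the example are made up so that the true coefficient is known in advance.

```python
def ols_fit(mrp, salary):
    """Fit salary = alpha + mu * mrp by ordinary least squares.

    Returns the pair (alpha, mu).
    """
    n = len(mrp)
    if n != len(salary) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(mrp) / n
    mean_y = sum(salary) / n
    sxx = sum((x - mean_x) ** 2 for x in mrp)
    if sxx == 0:
        raise ValueError("MRP values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(mrp, salary))
    mu = sxy / sxx                  # slope: share of MRP paid out as salary
    alpha = mean_y - mu * mean_x    # intercept
    return alpha, mu


if __name__ == "__main__":
    # Made-up players who are each paid half of their MRP plus $100,000.
    mrp = [1_000_000, 2_000_000, 3_000_000, 4_000_000]
    salary = [0.5 * m + 100_000 for m in mrp]
    alpha, mu = ols_fit(mrp, salary)
    print(round(alpha), round(mu, 2))
```

A fitted µ below 1 means that each additional dollar of MRP is matched by less than a dollar of salary.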
If the player is paid his full marginal revenue product, then µ = 1. Based on Table 3, µ is below 1, and as such the players are underpaid. For instance, the ratio of salary to MRP for Ozzie Guillen is 0.76 (500,000/657,806), and this implies that he was paid 76% of his estimated value. The ratio falls short of 1 by 0.24 (1 – 0.76 = 0.24), and this shows that the player is underpaid. From the perspective of the performance of the baseball team, such a player is worth more than his remuneration.
For pitchers like Roger Clemens, the µ is:
µ = 3,180,323/7,487,971 = 0.42.
This implies that the pitcher is underpaid. Moreover, Table 3 shows that pitchers are more underpaid than hitters.
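The salary-to-MRP ratios for all six players in Table 3 can be computed with a short Python sketch. The figures are those of Table 3; the variable and function names are hypothetical.

```python
# (MRP, salary) in real 1999 dollars, as listed in Table 3.
TABLE_3 = {
    "Roger Clemens": (7_487_971, 3_180_323),
    "John Smiley": (3_627_553, 5_592_679),
    "Denny Neagle": (4_777_413, 2_442_192),
    "Javier Lopez": (871_398, 125_533),
    "Bill Spiers": (858_150, 424_729),
    "Ozzie Guillen": (657_806, 500_000),
}


def salary_to_mrp_ratios(table):
    """Return each player's salary as a fraction of his estimated MRP."""
    return {player: salary / mrp for player, (mrp, salary) in table.items()}


if __name__ == "__main__":
    for player, ratio in salary_to_mrp_ratios(TABLE_3).items():
        status = "underpaid" if ratio < 1 else "paid at or above MRP"
        print(f"{player}: {ratio:.2f} ({status})")
```

A ratio below 1 indicates that the player receives less than his estimated MRP, while a ratio above 1 indicates the opposite.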
Conclusion.
Statistical concepts can be applied to the game of baseball, and as such, one is able to measure the individual productivity of the players in relation to their compensation. The MRP of an individual baseball player is relatively independent, and hence the contribution of this player to the team can be easily evaluated. Theoretically, a business entity is able to pay the employee a wage that roughly equals his MRP. If the player is paid his full marginal revenue product, then µ is equal to 1. However, the estimated µ is less than 1, and this shows that the baseball players are underpaid. Also, pitchers are more underpaid than hitters.

Works cited.
Fields, Brian. "Estimating the Value of Major League Baseball Players." PhD Thesis, Greenville: East Carolina University Press, 2007. Print.


