
PART 1
Regression Analysis with Cross-Sectional Data

Part 1 of the text covers regression analysis with cross-sectional data. It builds upon a solid base of college algebra and basic concepts in probability and statistics. Appendices A, B, and C contain complete reviews of these topics.

Chapter 2 begins with the simple linear regression model, where we explain one variable in terms of another variable. Although simple regression is not widely used in applied econometrics, it is used occasionally and serves as a natural starting point because the algebra and interpretations are relatively straightforward.

Chapters 3 and 4 cover the fundamentals of multiple regression analysis, where we allow more than one variable to affect the variable we are trying to explain. Multiple regression is still the most commonly used method in empirical research, and so these chapters deserve careful attention. Chapter 3 focuses on the algebra of the method of ordinary least squares (OLS), while also establishing conditions under which the OLS estimator is unbiased and best linear unbiased. Chapter 4 covers the important topic of statistical inference.

Chapter 5 discusses the large sample, or asymptotic, properties of the OLS estimators. This provides justification of the inference procedures in Chapter 4 when the errors in a regression model are not normally distributed. Chapter 6 covers some additional topics in regression analysis, including advanced functional form issues, data scaling, prediction, and goodness-of-fit. Chapter 7 explains how qualitative information can be incorporated into multiple regression models.

Chapter 8 illustrates how to test for and correct the problem of heteroskedasticity, or nonconstant variance, in the error terms. We show how the usual OLS statistics can be adjusted, and we also present an extension of OLS, known as weighted least squares, that explicitly accounts for different variances in the errors. Chapter 9 delves further into the very important problem of correlation between the error term and one or more of the explanatory variables.
We demonstrate how the availability of a proxy variable can solve the omitted variables problem. In addition, we establish the bias and inconsistency in the OLS estimators in the presence of certain kinds of measurement errors in the variables. Various data problems are also discussed, including the problem of outliers.

Copyright 2012 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.

2 The Simple Regression Model

The simple regression model can be used to study the relationship between two variables. For reasons we will see, the simple regression model has limitations as a general tool for empirical analysis. Nevertheless, it is sometimes appropriate as an empirical tool. Learning how to interpret the simple regression model is good practice for studying multiple regression, which we will do in subsequent chapters.

2.1 Definition of the Simple Regression Model

Much of applied econometric analysis begins with the following premise: y and x are two variables, representing some population, and we are interested in explaining y in terms of x, or in studying how y varies with changes in x. We discussed some examples in Chapter 1, including: y is soybean crop yield and x is amount of fertilizer; y is hourly wage and x is years of education; and y is a community crime rate and x is number of police officers.

In writing down a model that will explain y in terms of x, we must confront three issues. First, since there is never an exact relationship between two variables, how do we allow for other factors to affect y? Second, what is the functional relationship between y and x? And third, how can we be sure we are capturing a ceteris paribus relationship between y and x (if that is a desired goal)?

We can resolve these ambiguities by writing down an equation relating y to x. A simple equation is

\[ y = \beta_0 + \beta_1 x + u. \tag{2.1} \]

Equation (2.1), which is assumed to hold in the population of interest, defines the simple linear regression model. It is also called the two-variable linear regression model or bivariate linear regression model because it relates the two variables x and y. We now discuss the meaning of each of the quantities in (2.1). [Incidentally, the term "regression" has origins that are not especially important for most modern econometric applications, so we will not explain it here. See Stigler (1986) for an engaging history of regression analysis.]
When related by (2.1), the variables y and x have several different names used interchangeably, as follows: y is called the dependent variable, the explained variable, the response variable, the predicted variable, or the regressand; x is called the independent variable, the explanatory variable, the control variable, the predictor variable, or the regressor. (The term covariate is also used for x.) The terms "dependent variable" and "independent variable" are frequently used in econometrics. But be aware that the label "independent" here does not refer to the statistical notion of independence between random variables (see Appendix B).

The terms "explained" and "explanatory" variables are probably the most descriptive. "Response" and "control" are used mostly in the experimental sciences, where the variable x is under the experimenter's control. We will not use the terms "predicted variable" and "predictor," although you sometimes see these in applications that are purely about prediction and not causality. Our terminology for simple regression is summarized in Table 2.1.

TABLE 2.1 Terminology for Simple Regression

y                      x
Dependent variable     Independent variable
Explained variable     Explanatory variable
Response variable      Control variable
Predicted variable     Predictor variable
Regressand             Regressor

© Cengage Learning, 2013

The variable u, called the error term or disturbance in the relationship, represents factors other than x that affect y. A simple regression analysis effectively treats all factors affecting y other than x as being unobserved. You can usefully think of u as standing for "unobserved."

Equation (2.1) also addresses the issue of the functional relationship between y and x. If the other factors in u are held fixed, so that the change in u is zero, \( \Delta u = 0 \), then x has a linear effect on y:

\[ \Delta y = \beta_1 \Delta x \quad \text{if } \Delta u = 0. \tag{2.2} \]

Thus, the change in y is simply \( \beta_1 \) multiplied by the change in x. This means that \( \beta_1 \) is the slope parameter in the relationship between y and x, holding the other factors in u fixed; it is of primary interest in applied economics. The intercept parameter \( \beta_0 \), sometimes called the constant term, also has its uses, although it is rarely central to an analysis.
EXAMPLE 2.1 SOYBEAN YIELD AND FERTILIZER

Suppose that soybean yield is determined by the model

\[ \mathit{yield} = \beta_0 + \beta_1 \mathit{fertilizer} + u, \tag{2.3} \]

so that y = yield and x = fertilizer. The agricultural researcher is interested in the effect of fertilizer on yield, holding other factors fixed. This effect is given by \( \beta_1 \). The error term u contains factors such as land quality, rainfall, and so on. The coefficient \( \beta_1 \) measures the effect of fertilizer on yield, holding other factors fixed: \( \Delta \mathit{yield} = \beta_1 \Delta \mathit{fertilizer} \).
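The ceteris paribus reading of the slope in (2.2) and Example 2.1 can be checked with a few lines of Python. This is only an illustrative sketch: the parameter values (an intercept of 20 and a slope of 0.05), the fertilizer amounts, and the value of u are hypothetical and do not come from the text. They are chosen only to show that, with u held fixed, the change in yield equals the slope times the change in fertilizer.

```python
# Hypothetical population parameters, chosen only for illustration.
BETA0 = 20.0   # intercept (assumed value)
BETA1 = 0.05   # slope: effect of one more unit of fertilizer (assumed value)


def soybean_yield(fertilizer, u):
    """Model (2.3): yield = beta0 + beta1 * fertilizer + u."""
    return BETA0 + BETA1 * fertilizer + u


u_fixed = 1.5              # unobserved factors (land quality, rainfall) held fixed
delta_fertilizer = 10.0    # change in x

yield_before = soybean_yield(100.0, u_fixed)
yield_after = soybean_yield(100.0 + delta_fertilizer, u_fixed)
delta_yield = yield_after - yield_before

# With u unchanged, delta_yield matches BETA1 * delta_fertilizer (up to rounding).
print(delta_yield, BETA1 * delta_fertilizer)
```

If u were allowed to change between the two evaluations, the difference in yield would mix the effect of fertilizer with the effect of the other factors, which is exactly the problem the rest of the section addresses.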

EXAMPLE 2.2 A SIMPLE WAGE EQUATION

A model relating a person's wage to observed education and other unobserved factors is

\[ \mathit{wage} = \beta_0 + \beta_1 \mathit{educ} + u. \tag{2.4} \]

If wage is measured in dollars per hour and educ is years of education, then \( \beta_1 \) measures the change in hourly wage given another year of education, holding all other factors fixed. Some of those factors include labor force experience, innate ability, tenure with current employer, work ethic, and numerous other things.

The linearity of (2.1) implies that a one-unit change in x has the same effect on y, regardless of the initial value of x. This is unrealistic for many economic applications. For example, in the wage-education example, we might want to allow for increasing returns: the next year of education has a larger effect on wages than did the previous year. We will see how to allow for such possibilities in Section 2.4.

The most difficult issue to address is whether model (2.1) really allows us to draw ceteris paribus conclusions about how x affects y. We just saw in equation (2.2) that \( \beta_1 \) does measure the effect of x on y, holding all other factors (in u) fixed. Is this the end of the causality issue? Unfortunately, no. How can we hope to learn in general about the ceteris paribus effect of x on y, holding other factors fixed, when we are ignoring all those other factors?

Section 2.5 will show that we are only able to get reliable estimators of \( \beta_0 \) and \( \beta_1 \) from a random sample of data when we make an assumption restricting how the unobservable u is related to the explanatory variable x. Without such a restriction, we will not be able to estimate the ceteris paribus effect, \( \beta_1 \). Because u and x are random variables, we need a concept grounded in probability.

Before we state the key assumption about how x and u are related, we can always make one assumption about u. As long as the intercept \( \beta_0 \) is included in the equation, nothing is lost by assuming that the average value of u in the population is zero. Mathematically,

\[ E(u) = 0. \tag{2.5} \]

Assumption (2.5) says nothing about the relationship between u and x, but simply makes a statement about the distribution of the unobserved factors in the population. Using the previous examples for illustration, we can see that assumption (2.5) is not very restrictive. In Example 2.1, we lose nothing by normalizing the unobserved factors affecting soybean yield, such as land quality, to have an average of zero in the population of all cultivated plots. The same is true of the unobserved factors in Example 2.2. Without loss of generality, we can assume that things such as average ability are zero in the population of all working people. If you are not convinced, you should work through Problem 2 to see that we can always redefine the intercept in equation (2.1) to make (2.5) true.

We now turn to the crucial assumption regarding how u and x are related. A natural measure of the association between two random variables is the correlation coefficient. (See Appendix B for definition and properties.) If u and x are uncorrelated, then, as random variables, they are not linearly related. Assuming that u and x are uncorrelated goes a long way toward defining the sense in which u and x should be unrelated in equation (2.1). But it does not go far enough, because correlation measures only linear dependence between u and x. Correlation has a somewhat counterintuitive feature: it is possible for u to be uncorrelated with x while being correlated with functions of x, such as \( x^2 \). (See Section B.4 for further discussion.) This possibility is not acceptable for most regression purposes, as it causes problems for interpreting the model and for deriving statistical properties. A better assumption involves the expected value of u given x.

Because u and x are random variables, we can define the conditional distribution of u given any value of x. In particular, for any x, we can obtain the expected (or average) value of u for that slice of the population described by the value of x. The crucial assumption is that the average value of u does not depend on the value of x. We can write this assumption as

\[ E(u \mid x) = E(u). \tag{2.6} \]

Equation (2.6) says that the average value of the unobservables is the same across all slices of the population determined by the value of x and that the common average is necessarily equal to the average of u over the entire population. When assumption (2.6) holds, we say that u is mean independent of x. (Of course, mean independence is implied by full independence between u and x, an assumption often used in basic probability and statistics.) When we combine mean independence with assumption (2.5), we obtain the zero conditional mean assumption, \( E(u \mid x) = 0 \). It is critical to remember that equation (2.6) is the assumption with impact; assumption (2.5) essentially defines the intercept, \( \beta_0 \).

Let us see what (2.6) entails in the wage example. To simplify the discussion, assume that u is the same as innate ability. Then (2.6) requires that the average level of ability is the same regardless of years of education. For example, if \( E(\mathit{abil} \mid 8) \) denotes the average ability for the group of all people with eight years of education, and \( E(\mathit{abil} \mid 16) \) denotes the average ability among people in the population with sixteen years of education, then (2.6) implies that these must be the same. In fact, the average ability level must be the same for all education levels. If, for example, we think that average ability increases with years of education, then (2.6) is false. (This would happen if, on average, people with more ability choose to become more educated.)
As we cannot observe innate ability, we have no way of knowing whether or not average ability is the same for all education levels. But this is an issue that we must address before relying on simple regression analysis.

EXPLORING FURTHER 2.1
Suppose that a score on a final exam, score, depends on classes attended (attend) and unobserved factors that affect exam performance (such as student ability). Then

\[ \mathit{score} = \beta_0 + \beta_1 \mathit{attend} + u. \tag{2.7} \]

When would you expect this model to satisfy (2.6)?

In the fertilizer example, if fertilizer amounts are chosen independently of other features of the plots, then (2.6) will hold: the average land quality will not depend on the amount of fertilizer. However, if more fertilizer is put on the higher-quality plots of land, then the expected value of u changes with the level of fertilizer, and (2.6) fails.

The zero conditional mean assumption gives \( \beta_1 \) another interpretation that is often useful. Taking the expected value of (2.1) conditional on x and using \( E(u \mid x) = 0 \) gives

\[ E(y \mid x) = \beta_0 + \beta_1 x. \tag{2.8} \]

Equation (2.8) shows that the population regression function (PRF), \( E(y \mid x) \), is a linear function of x. The linearity means that a one-unit increase in x changes the expected value of y by the amount \( \beta_1 \). For any given value of x, the distribution of y is centered about \( E(y \mid x) \), as illustrated in Figure 2.1.

It is important to understand that equation (2.8) tells us how the average value of y changes with x; it does not say that y equals \( \beta_0 + \beta_1 x \) for all units in the population.
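The two fertilizer scenarios just described can be mimicked in a small simulation. The sketch below rests on assumptions that are not in the text: land quality u is drawn from a standard normal distribution, there are only two fertilizer levels ("low" and "high"), and in the second scenario every plot with above-average quality receives the high dose. It compares the average of u across fertilizer levels, which is what assumption (2.6) is about.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is reproducible
N_PLOTS = 20_000         # hypothetical number of plots


def mean_quality_by_dose(dose_depends_on_quality):
    """Return (mean of u on low-dose plots, mean of u on high-dose plots)."""
    low, high = [], []
    for _ in range(N_PLOTS):
        u = rng.gauss(0.0, 1.0)  # unobserved land quality, population mean zero as in (2.5)
        if dose_depends_on_quality:
            high_dose = u > 0.0              # better plots get more fertilizer
        else:
            high_dose = rng.random() < 0.5   # dose chosen independently of u
        (high if high_dose else low).append(u)
    return statistics.mean(low), statistics.mean(high)


indep_low, indep_high = mean_quality_by_dose(False)
dep_low, dep_high = mean_quality_by_dose(True)

print("independent assignment:", round(indep_low, 3), round(indep_high, 3))
print("quality-based assignment:", round(dep_low, 3), round(dep_high, 3))
```

Under independent assignment the two group averages of u should be close to each other (and to zero), so (2.6) is plausible. Under quality-based assignment the averages differ sharply across fertilizer levels, which is the failure of (2.6) described above.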

FIGURE 2.1 \( E(y \mid x) \) as a linear function of x. [Figure: y plotted against x, showing the line \( E(y \mid x) = \beta_0 + \beta_1 x \) and the values \( x_1 \), \( x_2 \), \( x_3 \) on the horizontal axis.] © Cengage Learning, 2013

For example, suppose that x is the high school grade point average and y is the college GPA, and we happen to know that \( E(\mathit{colGPA} \mid \mathit{hsGPA}) = 1.5 + 0.5\,\mathit{hsGPA} \). [Of course, in practice, we never know the population intercept and slope, but it is useful to pretend momentarily that we do to understand the nature of equation (2.8).] This GPA equation tells us the average college GPA among all students who have a given high school GPA. So suppose that hsGPA = 3.6. Then the average colGPA for all high school graduates who attend college with hsGPA = 3.6 is 1.5 + 0.5(3.6) = 3.3. We are certainly not saying that every student with hsGPA = 3.6 will have a 3.3 college GPA; this is clearly false. The PRF gives us a relationship between the average level of y at different levels of x. Some students with hsGPA = 3.6 will have a college GPA higher than 3.3, and some will have a lower college GPA. Whether the actual colGPA is above or below 3.3 depends on the unobserved factors in u, and those differ among students even within the slice of the population with hsGPA = 3.6.

Given the zero conditional mean assumption \( E(u \mid x) = 0 \), it is useful to view equation (2.1) as breaking y into two components. The piece \( \beta_0 + \beta_1 x \), which represents \( E(y \mid x) \), is called the systematic part of y (that is, the part of y explained by x), and u is called the unsystematic part, or the part of y not explained by x. In Chapter 3, when we introduce more than one explanatory variable, we will discuss how to determine how large the systematic part is relative to the unsystematic part.

In the next section, we will use assumptions (2.5) and (2.6) to motivate estimators of \( \beta_0 \) and \( \beta_1 \) given a random sample of data. The zero conditional mean assumption also plays a crucial role in the statistical analysis in Section 2.6.
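The distinction between the PRF and individual outcomes in the GPA example can be made concrete with a short simulation. The intercept 1.5 and slope 0.5 are the values used in the text; everything else is an assumption made for illustration, in particular the normal distribution of u with standard deviation 0.4 and the number of simulated students. The sketch also ignores the fact that real GPAs are bounded.

```python
import random
import statistics


def prf_col_gpa(hs_gpa):
    """Population regression function from the text: E(colGPA | hsGPA)."""
    return 1.5 + 0.5 * hs_gpa


rng = random.Random(42)  # fixed seed for reproducibility
HS_GPA = 3.6
N_STUDENTS = 10_000      # hypothetical number of students in this slice

# ASSUMPTION: unobserved factors u are Normal(0, 0.4) within the slice hsGPA = 3.6.
col_gpas = [prf_col_gpa(HS_GPA) + rng.gauss(0.0, 0.4) for _ in range(N_STUDENTS)]

mean_col_gpa = statistics.mean(col_gpas)
share_above = sum(g > prf_col_gpa(HS_GPA) for g in col_gpas) / N_STUDENTS

print("PRF value at hsGPA = 3.6:", round(prf_col_gpa(HS_GPA), 2))
print("simulated average colGPA:", round(mean_col_gpa, 3))
print("share of students above the PRF value:", round(share_above, 3))
```

The simulated average sits near the PRF value, while individual students land both above and below it, depending on their draw of u.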

2.2 Deriving the Ordinary Least Squares Estimates

Now that we have discussed the basic ingredients of the simple regression model, we will address the important issue of how to estimate the parameters \( \beta_0 \) and \( \beta_1 \) in equation (2.1). To do this, we need a sample from the population. Let \( \{(x_i, y_i): i = 1, \ldots, n\} \) denote a random sample of size n from the population. Because these data come from (2.1), we can write

\[ y_i = \beta_0 + \beta_1 x_i + u_i \tag{2.9} \]

for each i. Here, \( u_i \) is the error term for observation i because it contains all factors affecting \( y_i \) other than \( x_i \).

As an example, \( x_i \) might be the annual income and \( y_i \) the annual savings for family i during a particular year. If we have collected data on fifteen families, then n = 15. A scatterplot of such a data set is given in Figure 2.2, along with the (necessarily fictitious) population regression function.

FIGURE 2.2 Scatterplot of savings and income for 15 families, and the population regression \( E(\mathit{savings} \mid \mathit{income}) = \beta_0 + \beta_1 \mathit{income} \). [Figure: savings on the vertical axis, income on the horizontal axis.] © Cengage Learning, 2013

We must decide how to use these data to obtain estimates of the intercept and slope in the population regression of savings on income.

There are several ways to motivate the following estimation procedure. We will use (2.5) and an important implication of assumption (2.6): in the population, u is uncorrelated with x. Therefore, we see that u has zero expected value and that the covariance between x and u is zero:

\[ E(u) = 0 \tag{2.10} \]

and

\[ \mathrm{Cov}(x, u) = E(xu) = 0, \tag{2.11} \]

where the first equality in (2.11) follows from (2.10). (See Section B.4 for the definition and properties of covariance.) In terms of the observable variables x and y and the unknown parameters \( \beta_0 \) and \( \beta_1 \), equations (2.10) and (2.11) can be written as

\[ E(y - \beta_0 - \beta_1 x) = 0 \tag{2.12} \]

and

\[ E[x(y - \beta_0 - \beta_1 x)] = 0, \tag{2.13} \]

respectively. Equations (2.12) and (2.13) imply two restrictions on the joint probability distribution of (x, y) in the population. Since there are two unknown parameters to estimate, we might hope that equations (2.12) and (2.13) can be used to obtain good estimators of \( \beta_0 \) and \( \beta_1 \). In fact, they can be. Given a sample of data, we choose estimates \( \hat{\beta}_0 \) and \( \hat{\beta}_1 \) to solve the sample counterparts of (2.12) and (2.13):

\[ n^{-1} \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \tag{2.14} \]

and

\[ n^{-1} \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0. \tag{2.15} \]

This is an example of the method of moments approach to estimation. (See Section C.4 for a discussion of different estimation approaches.) These equations can be solved for \( \hat{\beta}_0 \) and \( \hat{\beta}_1 \).

Using the basic properties of the summation operator from Appendix A, equation (2.14) can be rewritten as

\[ \bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}, \tag{2.16} \]

where \( \bar{y} = n^{-1} \sum_{i=1}^{n} y_i \) is the sample average of the \( y_i \) and likewise for \( \bar{x} \). This equation allows us to write \( \hat{\beta}_0 \) in terms of \( \hat{\beta}_1 \), \( \bar{y} \), and \( \bar{x} \):

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}. \tag{2.17} \]

Therefore, once we have the slope estimate \( \hat{\beta}_1 \), it is straightforward to obtain the intercept estimate \( \hat{\beta}_0 \), given \( \bar{y} \) and \( \bar{x} \).

Dropping the \( n^{-1} \) in (2.15) (since it does not affect the solution) and plugging (2.17) into (2.15) yields

\[ \sum_{i=1}^{n} x_i [y_i - (\bar{y} - \hat{\beta}_1 \bar{x}) - \hat{\beta}_1 x_i] = 0, \]

which, upon rearrangement, gives

\[ \sum_{i=1}^{n} x_i (y_i - \bar{y}) = \hat{\beta}_1 \sum_{i=1}^{n} x_i (x_i - \bar{x}). \]

From basic properties of the summation operator [see (A.7) and (A.8)],

\[ \sum_{i=1}^{n} x_i (x_i - \bar{x}) = \sum_{i=1}^{n} (x_i - \bar{x})^2 \quad \text{and} \quad \sum_{i=1}^{n} x_i (y_i - \bar{y}) = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}). \]

Therefore, provided that

\[ \sum_{i=1}^{n} (x_i - \bar{x})^2 > 0, \tag{2.18} \]

the estimated slope is

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}. \tag{2.19} \]

Equation (2.19) is simply the sample covariance between x and y divided by the sample variance of x. (See Appendix C. Dividing both the numerator and the denominator by n - 1 changes nothing.) This makes sense because \( \beta_1 \) equals the population covariance divided by the variance of x when \( E(u) = 0 \) and \( \mathrm{Cov}(x, u) = 0 \). An immediate implication is that if x and y are positively correlated in the sample, then \( \hat{\beta}_1 \) is positive; if x and y are negatively correlated, then \( \hat{\beta}_1 \) is negative.

Although the method for obtaining (2.17) and (2.19) is motivated by (2.6), the only assumption needed to compute the estimates for a particular sample is (2.18). This is hardly an assumption at all: (2.18) is true provided the \( x_i \) in the sample are not all equal to the same value. If (2.18) fails, then we have either been unlucky in obtaining our sample from the population or we have not specified an interesting problem (x does not vary in the population). For example, if y = wage and x = educ, then (2.18) fails only if everyone in the sample has the same amount of education (for example, if everyone is a high school graduate; see Figure 2.3). If just one person has a different amount of education, then (2.18) holds, and the estimates can be computed.

FIGURE 2.3 A scatterplot of wage against education when \( \mathit{educ}_i = 12 \) for all i. [Figure: wage on the vertical axis, educ on the horizontal axis, with all points at educ = 12.] © Cengage Learning, 2013
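Formulas (2.17) and (2.19), together with condition (2.18), translate directly into code. The following is a minimal sketch rather than a production routine: the function name `ols_simple` and the income and savings figures are hypothetical, introduced only for illustration, and the check for (2.18) uses an exact comparison with zero, which is adequate for the small example here.

```python
def ols_simple(x, y):
    """Return (intercept, slope) from formulas (2.17) and (2.19).

    Raises ValueError if condition (2.18) fails, that is, if the x values
    show no variation in the sample.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Denominator of (2.19): sum of squared deviations of x from its mean.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("condition (2.18) fails: all x values are equal")
    # Numerator of (2.19): sum of cross-products of deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # equation (2.17)
    return intercept, slope


# Hypothetical annual income and savings (thousands of dollars) for five families.
income = [30.0, 45.0, 52.0, 70.0, 95.0]
savings = [1.0, 2.5, 2.0, 5.5, 9.0]
b0_hat, b1_hat = ols_simple(income, savings)
print("intercept estimate:", b0_hat)
print("slope estimate:", b1_hat)
```

Calling `ols_simple` with, say, `[12, 12, 12]` as the x values raises an error, which corresponds to the situation in Figure 2.3 where everyone has the same amount of education.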

The estimates given in (2.17) and (2.19) are called the ordinary least squares (OLS) estimates of \( \beta_0 \) and \( \beta_1 \). To justify this name, for any \( \hat{\beta}_0 \) and \( \hat{\beta}_1 \) define a fitted value for y when \( x = x_i \) as

\[ \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i. \tag{2.20} \]

This is the value we predict for y when \( x = x_i \) for the given intercept and slope. There is a fitted value for each observation in the sample. The residual for observation i is the difference between the actual \( y_i \) and its fitted value:

\[ \hat{u}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i. \tag{2.21} \]

Again, there are n such residuals. [These are not the same as the errors in (2.9), a point we return to in Section 2.5.] The fitted values and residuals are indicated in Figure 2.4.

Now, suppose we choose \( \hat{\beta}_0 \) and \( \hat{\beta}_1 \) to make the sum of squared residuals,

\[ \sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2, \tag{2.22} \]

as small as possible. The appendix to this chapter shows that the conditions necessary for \( (\hat{\beta}_0, \hat{\beta}_1) \) to minimize (2.22) are given exactly by equations (2.14) and (2.15), without \( n^{-1} \). Equations (2.14) and (2.15) are often called the first order conditions for the OLS estimates, a term that comes from optimization using calculus (see Appendix A). From our previous calculations, we know that the solutions to the OLS first order conditions are given by (2.17) and (2.19). The name "ordinary least squares" comes from the fact that these estimates minimize the sum of squared residuals.

FIGURE 2.4 Fitted values and residuals. [Figure: y plotted against x, showing the OLS line \( \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \), a fitted value \( \hat{y}_i \), and a residual \( \hat{u}_i \).] © Cengage Learning, 2013
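The link between the first order conditions and the minimization of (2.22) can be checked numerically. The sketch below uses a small made-up data set (the numbers are hypothetical), computes the estimates from (2.17) and (2.19), and then verifies two things: the residuals satisfy (2.14) and (2.15) up to rounding error, and nudging the intercept or slope away from the OLS values increases the sum of squared residuals.

```python
def sum_squared_residuals(b0, b1, x, y):
    """Objective (2.22) evaluated at a candidate intercept b0 and slope b1."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
b1_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))      # equation (2.19)
b0_hat = y_bar - b1_hat * x_bar                       # equation (2.17)

residuals = [yi - b0_hat - b1_hat * xi for xi, yi in zip(x, y)]  # equation (2.21)

# First order conditions (2.14) and (2.15), without the 1/n factor.
foc_intercept = sum(residuals)
foc_slope = sum(xi * ui for xi, ui in zip(x, residuals))

# Sum of squared residuals at the OLS estimates and at nearby candidate values.
ssr_ols = sum_squared_residuals(b0_hat, b1_hat, x, y)
ssr_nearby = [
    sum_squared_residuals(b0_hat + d0, b1_hat + d1, x, y)
    for d0 in (-0.1, 0.0, 0.1)
    for d1 in (-0.1, 0.0, 0.1)
    if (d0, d1) != (0.0, 0.0)
]

print("first order conditions:", foc_intercept, foc_slope)
print("SSR at OLS:", ssr_ols, "smallest nearby SSR:", min(ssr_nearby))
```

A grid of eight nearby points is of course not a proof of minimality; the chapter appendix referenced in the text gives the calculus argument.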

When we view ordinary least squares as minimizing the sum of squared residuals, it is natural to ask: Why not minimize some other function of the residuals, such as the absolute values of the residuals? In fact, as we will discuss in the more advanced Section 9.4, minimizing the sum of the absolute values of the residuals is sometimes very useful. But it does have some drawbacks. First, we cannot obtain formulas for the resulting estimators; given a data set, the estimates must be obtained by numerical optimization routines. As a consequence, the statistical theory for estimators that minimize the sum of the absolute residuals is very complicated. Minimizing other functions of the residuals, say, the sum of the residuals each raised to the fourth power, has similar drawbacks. (We would never choose our estimates to minimize, say, the sum of the residuals themselves, as residuals large in magnitude but with opposite signs would tend to cancel out.) With OLS, we will be able to derive unbiasedness, consistency, and other important statistical properties relatively easily. Plus, as the motivation in equations (2.13) and (2.14) suggests, and as we will see in Section 2.5, OLS is suited for estimating the parameters appearing in the conditional mean function (2.8).

Once we have determined the OLS intercept and slope estimates, we form the OLS regression line:

\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x, \tag{2.23} \]

where it is understood that \( \hat{\beta}_0 \) and \( \hat{\beta}_1 \) have been obtained using equations (2.17) and (2.19). The notation \( \hat{y} \), read as "y hat," emphasizes that the predicted values from equation (2.23) are estimates. The intercept, \( \hat{\beta}_0 \), is the predicted value of y when x = 0, although in some cases it will not make sense to set x = 0. In those situations, \( \hat{\beta}_0 \) is not, in itself, very interesting. When using (2.23) to compute predicted values of y for various values of x, we must account for the intercept in the calculations. Equation (2.23) is also called the sample regression function (SRF) because it is the estimated version of the population regression function \( E(y \mid x) = \beta_0 + \beta_1 x \).
It is important to remember that the PRF is something fixed, but unknown, in the population. Because the SRF is obtained for a given sample of data, a new sample will generate a different slope and intercept in equation (2.23).

In most cases, the slope estimate, which we can write as

\[ \hat{\beta}_1 = \Delta \hat{y} / \Delta x, \tag{2.24} \]

is of primary interest. It tells us the amount by which \( \hat{y} \) changes when x increases by one unit. Equivalently,

\[ \Delta \hat{y} = \hat{\beta}_1 \Delta x, \tag{2.25} \]

so that given any change in x (whether positive or negative), we can compute the predicted change in y.

We now present several examples of simple regression obtained by using real data. In other words, we find the intercept and slope estimates with equations (2.17) and (2.19). Since these examples involve many observations, the calculations were done using an econometrics software package. At this point, you should be careful not to read too much into these regressions; they are not necessarily uncovering a causal relationship. We have said nothing so far about the statistical properties of OLS. In Section 2.5, we consider statistical properties after we explicitly impose assumptions on the population model equation (2.1).

EXAMPLE 2.3 CEO SALARY AND RETURN ON EQUITY

For the population of chief executive officers, let y be annual salary (salary) in thousands of dollars. Thus, y = 856.3 indicates an annual salary of $856,300, and y = 1,452.6 indicates a salary of $1,452,600. Let x be the average return on equity (roe) for the CEO's firm for the previous three years. (Return on equity is defined in terms of net income as a percentage of common equity.) For example, if roe = 10, then average return on equity is 10%.

To study the relationship between this measure of firm performance and CEO compensation, we postulate the simple model

\[ \mathit{salary} = \beta_0 + \beta_1 \mathit{roe} + u. \]

The slope parameter \( \beta_1 \) measures the change in annual salary, in thousands of dollars, when return on equity increases by one percentage point. Because a higher roe is good for the company, we think \( \beta_1 > 0 \).

The data set CEOSAL1.RAW contains information on 209 CEOs for the year 1990; these data were obtained from Business Week (5/6/91). In this sample, the average annual salary is $1,281,120, with the smallest and largest being $223,000 and $14,822,000, respectively. The average return on equity for the years 1988, 1989, and 1990 is 17.18%, with the smallest and largest values being 0.5 and 56.3%, respectively.

Using the data in CEOSAL1.RAW, the OLS regression line relating salary to roe is

\[ \widehat{\mathit{salary}} = 963.191 + 18.501\,\mathit{roe}, \qquad n = 209, \tag{2.26} \]

where the intercept and slope estimates have been rounded to three decimal places; we use "salary hat" to indicate that this is an estimated equation. How do we interpret the equation? First, if the return on equity is zero, roe = 0, then the predicted salary is the intercept, 963.191, which equals $963,191 since salary is measured in thousands. Next, we can write the predicted change in salary as a function of the change in roe: \( \Delta \widehat{\mathit{salary}} = 18.501\,(\Delta \mathit{roe}) \). This means that if the return on equity increases by one percentage point, \( \Delta \mathit{roe} = 1 \), then salary is predicted to change by about 18.5, or $18,500.
Because (2.26) is a linear equation, this is the estimated change regardless of the initial salary.

We can easily use (2.26) to compare predicted salaries at different values of roe. Suppose roe = 30. Then \( \widehat{\mathit{salary}} = 963.191 + 18.501(30) = 1{,}518.221 \) in thousands of dollars, that is, $1,518,221, which is just over $1.5 million. However, this does not mean that a particular CEO whose firm had roe = 30 earns $1,518,221. Many other factors affect salary. This is just our prediction from the OLS regression line (2.26).

The estimated line is graphed in Figure 2.5, along with the population regression function \( E(\mathit{salary} \mid \mathit{roe}) \). We will never know the PRF, so we cannot tell how close the SRF is to the PRF. Another sample of data will give a different regression line, which may or may not be closer to the population regression line.
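The arithmetic behind the predictions in Example 2.3 is easy to reproduce. The short sketch below hard-codes the rounded estimates reported in (2.26); the function name `predicted_salary` is hypothetical, and the code does not read CEOSAL1.RAW or re-estimate anything. Note the units: salary is in thousands of dollars and roe is in percent.

```python
def predicted_salary(roe):
    """Fitted line (2.26): predicted salary in thousands of dollars; roe in percent."""
    return 963.191 + 18.501 * roe


salary_at_0 = predicted_salary(0)     # the intercept: prediction when roe = 0
salary_at_30 = predicted_salary(30)   # prediction when roe = 30
# Predicted change for a one-percentage-point increase in roe (any starting point works).
change_per_point = predicted_salary(11) - predicted_salary(10)

print(f"roe = 0:  {salary_at_0:.3f} thousand dollars")
print(f"roe = 30: {salary_at_30:.3f} thousand dollars (${salary_at_30 * 1000:,.0f})")
print(f"change per percentage point: {change_per_point:.3f} thousand dollars")
```

Because the line is linear, the one-point change is the same whether it is computed from roe = 10 to 11 or between any other pair of adjacent values.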

FIGURE 2.5 The OLS regression line \(\widehat{\mathit{salary}} = 963.191 + 18.501\,\mathit{roe}\) and the (unknown) population regression function \(E(\mathit{salary} \mid \mathit{roe}) = \beta_0 + \beta_1\,\mathit{roe}\). [Figure: salary on the vertical axis against roe on the horizontal axis; the OLS line has intercept 963.191.] Cengage Learning, 2013

EXAMPLE 2.4 WAGE AND EDUCATION

For the population of people in the workforce in 1976, let y = wage, where wage is measured in dollars per hour. Thus, for a particular person, if wage = 6.75, the hourly wage is $6.75. Let x = educ denote years of schooling; for example, educ = 12 corresponds to a complete high school education. Since the average wage in the sample is $5.90, the Consumer Price Index indicates that this amount is equivalent to $19.06 in 2003 dollars. Using the data in WAGE1.RAW, where n = 526 individuals, we obtain the following OLS regression line (or sample regression function):

\[ \widehat{\mathit{wage}} = -0.90 + 0.54\,\mathit{educ}, \qquad n = 526. \tag{2.27} \]

EXPLORING FURTHER 2.2 The estimated wage from (2.27), when educ = 8, is $3.42 in 1976 dollars. What is this value in 2003 dollars? (Hint: You have enough information in Example 2.4 to answer this question.)

We must interpret this equation with caution. The intercept of −0.90 literally means that a person with no education has a predicted hourly wage of −90 cents an hour. This, of course, is silly. It turns out that only 18 people in the sample of 526 have less than eight years of education. Consequently, it is not surprising that the regression line does poorly at very low levels of education. For a person with eight years of education, the predicted wage is \(\widehat{\mathit{wage}} = -0.90 + 0.54(8) = 3.42\), or $3.42 per hour (in 1976 dollars).

The slope estimate in (2.27) implies that one more year of education increases hourly wage by 54 cents an hour. Therefore, four more years of education increase the predicted wage by 4(0.54) = 2.16, or $2.16 per hour. These are fairly large effects. Because of the linear nature of (2.27), another year of education increases the wage by the same amount, regardless of the initial level of education. In Section 2.4, we discuss some methods that allow for nonconstant marginal effects of our explanatory variables.

EXAMPLE 2.5 VOTING OUTCOMES AND CAMPAIGN EXPENDITURES

The file VOTE1.RAW contains data on election outcomes and campaign expenditures for 173 two-party races for the U.S. House of Representatives in 1988. There are two candidates in each race, A and B. Let votea be the percentage of the vote received by Candidate A and sharea be the percentage of total campaign expenditures accounted for by Candidate A. Many factors other than sharea affect the election outcome (including the quality of the candidates and possibly the dollar amounts spent by A and B). Nevertheless, we can estimate a simple regression model to find out whether spending more relative to one's challenger implies a higher percentage of the vote.

The estimated equation using the 173 observations is

\[ \widehat{\mathit{votea}} = 26.81 + 0.464\,\mathit{sharea}, \qquad n = 173. \tag{2.28} \]

This means that if Candidate A's share of spending increases by one percentage point, Candidate A receives almost one-half a percentage point (0.464) more of the total vote. Whether or not this is a causal effect is unclear, but it is not unbelievable. If sharea = 50, votea is predicted to be about 50, or half the vote.

EXPLORING FURTHER 2.3 In Example 2.5, what is the predicted vote for Candidate A if sharea = 60 (which means 60%)? Does this answer seem reasonable?
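The arithmetic in Examples 2.4 and 2.5 can be checked with a short Python sketch. The helper `predict` is a hypothetical name introduced here; the coefficients are the rounded estimates printed in (2.27) and (2.28).

```python
def predict(intercept, slope, x):
    """Value of a fitted simple regression line at x."""
    return intercept + slope * x


# Wage equation (2.27): intercept -0.90, slope 0.54 (dollars per hour, 1976).
wage_at_8 = predict(-0.90, 0.54, 8)
wage_at_12 = predict(-0.90, 0.54, 12)
wage_at_16 = predict(-0.90, 0.54, 16)

# Constant marginal effect: four more years add 4 * 0.54 at any starting level.
gain_8_to_12 = wage_at_12 - wage_at_8
gain_12_to_16 = wage_at_16 - wage_at_12

# Voting equation (2.28): intercept 26.81, slope 0.464 (percentage points).
vote_at_50 = predict(26.81, 0.464, 50)

print(f"predicted wage at educ = 8: {wage_at_8:.2f}")
print(f"gain from educ 8 to 12:  {gain_8_to_12:.2f}")
print(f"gain from educ 12 to 16: {gain_12_to_16:.2f}")
print(f"predicted votea at sharea = 50: {vote_at_50:.2f}")
```

The two gains are equal because (2.27) is linear in educ, which is the point made at the end of Example 2.4.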
In some cases, regression analysis is not used to determine causality but to simply look at whether two variables are positively or negatively related, much like a standard correlation analysis. An example of this occurs in Computer Exercise C3, where you are asked to use data from Biddle and Hamermesh (1990) on time spent sleeping and working to investigate the tradeoff between these two factors.

A Note on Terminology

In most cases, we will indicate the estimation of a relationship through OLS by writing an equation such as (2.26), (2.27), or (2.28). Sometimes, for the sake of brevity, it is useful to indicate that an OLS regression has been run without actually writing out the equation. We will often indicate that equation (2.23) has been obtained by OLS in saying that we run the regression of

\[ y \text{ on } x, \tag{2.29} \]

or simply that we regress y on x. The positions of y and x in (2.29) indicate which is the dependent variable and which is the independent variable: we always regress the dependent variable on the independent variable. For specific applications, we replace y and x with their names. Thus, to obtain (2.26), we regress salary on roe, or to obtain (2.28), we regress votea on sharea.

When we use such terminology in (2.29), we will always mean that we plan to estimate the intercept, \(\hat{\beta}_0\), along with the slope, \(\hat{\beta}_1\). This case is appropriate for the vast majority of applications. Occasionally, we may want to estimate the relationship between y and x assuming that the intercept is zero (so that x = 0 implies that \(\hat{y} = 0\)); we cover this case briefly in Section 2.6. Unless explicitly stated otherwise, we always estimate an intercept along with a slope.

2.3 Properties of OLS on Any Sample of Data

In the previous section, we went through the algebra of deriving the formulas for the OLS intercept and slope estimates. In this section, we cover some further algebraic properties of the fitted OLS regression line. The best way to think about these properties is to remember that they hold, by construction, for any sample of data. The harder task, considering the properties of OLS across all possible random samples of data, is postponed until Section 2.5.

Several of the algebraic properties we are going to derive will appear mundane. Nevertheless, having a grasp of these properties helps us to figure out what happens to the OLS estimates and related statistics when the data are manipulated in certain ways, such as when the measurement units of the dependent and independent variables change.

Fitted Values and Residuals

We assume that the intercept and slope estimates, \(\hat{\beta}_0\) and \(\hat{\beta}_1\), have been obtained for the given sample of data. Given \(\hat{\beta}_0\) and \(\hat{\beta}_1\), we can obtain the fitted value \(\hat{y}_i\) for each observation. [This is given by equation (2.20).] By definition, each fitted value \(\hat{y}_i\) is on the OLS regression line.
The OLS residual associated with observation i, \(\hat{u}_i\), is the difference between \(y_i\) and its fitted value, as given in equation (2.21). If \(\hat{u}_i\) is positive, the line underpredicts \(y_i\); if \(\hat{u}_i\) is negative, the line overpredicts \(y_i\). The ideal case for observation i is when \(\hat{u}_i = 0\), but in most cases no residual is exactly equal to zero. In other words, none of the data points need actually lie on the OLS line.

EXAMPLE 2.6 CEO SALARY AND RETURN ON EQUITY

Table 2.2 contains a listing of the first 15 observations in the CEO data set, along with the fitted values, called salaryhat, and the residuals, called uhat. The first four CEOs have lower salaries than what we predicted from the OLS regression line (2.26); in other words, given only the firm's roe, these CEOs make less than what we predicted. As can be seen from the positive uhat, the fifth CEO makes more than predicted from the OLS regression line.
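As a rough illustration of fitted values and residuals, the sketch below recomputes salaryhat and uhat for the first five CEOs, using the roe and salary values listed in Table 2.2 and the rounded coefficients printed in (2.26). Because the coefficients are rounded, the results agree with the table only to about two decimal places. The variable names are chosen here for illustration.

```python
# Rounded coefficients from the estimated line (2.26).
BETA0_HAT = 963.191
BETA1_HAT = 18.501

# (roe, salary) for the first five CEOs as listed in Table 2.2.
# Assumption: salary is in thousands of dollars.
observations = [(14.1, 1095), (10.9, 1001), (23.5, 1122), (5.9, 578), (13.8, 1368)]

# Fitted value: the point on the OLS line at each observed roe.
fitted = [BETA0_HAT + BETA1_HAT * roe for roe, _ in observations]

# Residual: actual salary minus fitted salary.
residuals = [salary - yhat for (_, salary), yhat in zip(observations, fitted)]

for i, (yhat, uhat) in enumerate(zip(fitted, residuals), start=1):
    if uhat > 0:
        verdict = "line underpredicts salary"
    elif uhat < 0:
        verdict = "line overpredicts salary"
    else:
        verdict = "point lies on the line"
    print(f"obs {i}: salaryhat = {yhat:9.3f}, uhat = {uhat:9.3f} ({verdict})")
```

The first four residuals come out negative and the fifth positive, matching the discussion in Example 2.6.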

TABLE 2.2 Fitted Values and Residuals for the First 15 CEOs

obsno   roe     salary   salaryhat   uhat
1       14.1    1095     1224.058    -129.0581
2       10.9    1001     1164.854    -163.8542
3       23.5    1122     1397.969    -275.9692
4        5.9     578     1072.348    -494.3484
5       13.8    1368     1218.508     149.4923
6       20.0    1145     1333.215    -188.2151
7       16.4    1078     1266.611    -188.6108
8       16.3    1094     1264.761    -170.7606
9       10.5    1237     1157.454      79.54626
10      26.3     833     1449.773    -616.7726
11      25.9     567     1442.372    -875.3721
12      26.8     933     1459.023    -526.0231
13      14.8    1339     1237.009     101.9911
14      22.3     937     1375.768    -438.7678
15      56.3    2011     2004.808       6.191895

Cengage Learning, 2013

Algebraic Properties of OLS Statistics

There are several useful algebraic properties of OLS estimates and their associated statistics. We now cover the three most important of these.

(1) The sum, and therefore the sample average of the OLS residuals, is zero. Mathematically,

\[ \sum_{i=1}^{n} \hat{u}_i = 0. \tag{2.30} \]

This property needs no proof; it follows immediately from the OLS first order condition (2.14), when we remember that the residuals are defined by \(\hat{u}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\). In other words, the OLS estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are chosen to make the residuals add up to zero (for any data set). This says nothing about the residual for any particular observation i.

(2) The sample covariance between the regressors and the OLS residuals is zero. This follows from the first order condition (2.15), which can be written in terms of the residuals as

\[ \sum_{i=1}^{n} x_i \hat{u}_i = 0. \tag{2.31} \]

The sample average of the OLS residuals is zero, so the left-hand side of (2.31) is proportional to the sample covariance between \(x_i\) and \(\hat{u}_i\).

(3) The point \((\bar{x}, \bar{y})\) is always on the OLS regression line. In other words, if we take equation (2.23) and plug in \(\bar{x}\) for x, then the predicted value is \(\bar{y}\). This is exactly what equation (2.16) showed us.
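The three properties hold for any sample, which makes them easy to check numerically. The sketch below fits a simple regression to a small made-up data set (the numbers are hypothetical and are not taken from any of the text's data files) using the usual OLS formulas, slope equal to the sample covariance of x and y divided by the sample variance of x, and intercept equal to \(\bar{y} - \hat{\beta}_1 \bar{x}\). The function name `ols_simple` is an illustrative choice.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of the OLS line for a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.6]

b0, b1 = ols_simple(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Property (1): the residuals sum to zero.
sum_resid = sum(residuals)

# Property (2): the residuals are orthogonal to the regressor.
sum_x_resid = sum(xi * ui for xi, ui in zip(x, residuals))

# Property (3): the line passes through the point of sample means.
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)
gap_at_means = (b0 + b1 * x_bar) - y_bar

print(f"sum of residuals:         {sum_resid:.2e}")
print(f"sum of x_i * residual_i:  {sum_x_resid:.2e}")
print(f"fitted value at x_bar minus y_bar: {gap_at_means:.2e}")
```

All three printed quantities are zero up to floating-point rounding error, whatever data are supplied, provided the x values are not all identical.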

EXAMPLE 2.7 WAGE AND EDUCATION

For the data in WAGE1.RAW, the average hourly wage in the sample is 5.90, rounded to two decimal places, and the average education is 12.56. If we plug educ = 12.56 into the OLS regression line (2.27), we get \(\widehat{\mathit{wage}} = -0.90 + 0.54(12.56) = 5.8824\), which equals 5.9 when rounded to the first decimal place. These figures do not exactly agree because we have rounded the average wage and education, as well as the intercept and slope estimates. If we did not initially round any of the values, we would get the answers to agree more closely, but to little useful effect.

Writing each \(y_i\) as its fitted value, plus its residual, provides another way to interpret an OLS regression. For each i, write

\[ y_i = \hat{y}_i + \hat{u}_i. \tag{2.32} \]

From property (1), the average of the residuals is zero; equivalently, the sample average of the fitted values, \(\hat{y}_i\), is the same as the sample average of the \(y_i\), or \(\bar{\hat{y}} = \bar{y}\). Further, properties (1) and (2) can be used to show that the sample covariance between \(\hat{y}_i\) and \(\hat{u}_i\) is zero. Thus, we can view OLS as decomposing each \(y_i\) into two parts, a fitted value and a residual. The fitted values and residuals are uncorrelated in the sample.

Define the total sum of squares (SST), the explained sum of squares (SSE), and the residual sum of squares (SSR) (also known as the sum of squared residuals), as follows:

\[ \mathrm{SST} \equiv \sum_{i=1}^{n} (y_i - \bar{y})^2. \tag{2.33} \]

\[ \mathrm{SSE} \equiv \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2. \tag{2.34} \]

\[ \mathrm{SSR} \equiv \sum_{i=1}^{n} \hat{u}_i^2. \tag{2.35} \]

SST is a measure of the total sample variation in the \(y_i\); that is, it measures how spread out the \(y_i\) are in the sample. If we divide SST by n − 1, we obtain the sample variance of y, as discussed in Appendix C. Similarly, SSE measures the sample variation in the \(\hat{y}_i\) (where we use the fact that \(\bar{\hat{y}} = \bar{y}\)), and SSR measures the sample variation in the \(\hat{u}_i\). The total variation in y can always be expressed as the sum of the explained variation SSE and the unexplained variation SSR. Thus,

\[ \mathrm{SST} = \mathrm{SSE} + \mathrm{SSR}. \tag{2.36} \]

Proving (2.36) is not difficult, but it requires us to use all of the properties of the summation operator covered in Appendix A.
Write

\[
\begin{aligned}
\sum_{i=1}^{n} (y_i - \bar{y})^2
&= \sum_{i=1}^{n} \bigl[(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})\bigr]^2 \\
&= \sum_{i=1}^{n} \bigl[\hat{u}_i + (\hat{y}_i - \bar{y})\bigr]^2 \\
&= \sum_{i=1}^{n} \hat{u}_i^2 + 2 \sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \\
&= \mathrm{SSR} + 2 \sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) + \mathrm{SSE}.
\end{aligned}
\]

Now, (2.36) holds if we show that

\[ \sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) = 0. \tag{2.37} \]

But we have already claimed that the sample covariance between the residuals and the fitted values is zero, and this covariance is just (2.37) divided by n − 1. Thus, we have established (2.36).

Some words of caution about SST, SSE, and SSR are in order. There is no uniform agreement on the names or abbreviations for the three quantities defined in equations (2.33), (2.34), and (2.35). The total sum of squares is called either SST or TSS, so there is little confusion here. Unfortunately, the explained sum of squares is sometimes called the "regression sum of squares." If this term is given its natural abbreviation, it can easily be confused with the term "residual sum of squares." Some regression packages refer to the explained sum of squares as the "model sum of squares."

To make matters even worse, the residual sum of squares is often called the "error sum of squares." This is especially unfortunate because, as we will see in Section 2.5, the errors and the residuals are different quantities. Thus, we will always call (2.35) the residual sum of squares or the sum of squared residuals. We prefer to use the abbreviation SSR to denote the sum of squared residuals, because it is more common in econometric packages.

Goodness-of-Fit

So far, we have no way of measuring how well the explanatory or independent variable, x, explains the dependent variable, y. It is often useful to compute a number that summarizes how well the OLS regression line fits the data. In the following discussion, be sure to remember that we assume that an intercept is estimated along with the slope.

Assuming that the total sum of squares, SST, is not equal to zero (which is true except in the very unlikely event that all the \(y_i\) equal the same value), we can divide (2.36) by SST to get 1 = SSE/SST + SSR/SST. The R-squared of the regression, sometimes called the coefficient of determination, is defined as

\[ R^2 \equiv \mathrm{SSE}/\mathrm{SST} = 1 - \mathrm{SSR}/\mathrm{SST}. \tag{2.38} \]

\(R^2\) is the ratio of the explained variation compared to the total variation; thus, it is interpreted as the fraction of the sample variation in y that is explained by x. The second equality in (2.38) provides another way for computing \(R^2\).

From (2.36), the value of \(R^2\) is always between zero and one, because SSE can be no greater than SST. When interpreting \(R^2\), we usually multiply it by 100 to change it into a percent: \(100 \cdot R^2\) is the percentage of the sample variation in y that is explained by x.

If the data points all lie on the same line, OLS provides a perfect fit to the data. In this case, \(R^2 = 1\). A value of \(R^2\) that is nearly equal to zero indicates a poor fit of the OLS line: very little of the variation in the \(y_i\) is captured by the variation in the \(\hat{y}_i\) (which all lie on the OLS regression line). In fact, it can be shown that \(R^2\) is equal to the square of the sample correlation coefficient between \(y_i\) and \(\hat{y}_i\). This is where the term "R-squared" came from. (The letter R was traditionally used to denote an estimate of a population correlation coefficient, and its usage has survived in regression analysis.)
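The decomposition (2.36) and the two expressions for R-squared in (2.38) can be verified numerically. The following is a minimal sketch on made-up data (the numbers are hypothetical, not from the text's data sets); `ols_simple` is an illustrative helper that applies the usual OLS slope and intercept formulas for a regression with an intercept.

```python
import math


def ols_simple(x, y):
    """Return (intercept, slope) of the OLS line for a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.6]

b0, b1 = ols_simple(x, y)
y_bar = sum(y) / len(y)
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

sst = sum((yi - y_bar) ** 2 for yi in y)     # total sum of squares, as in (2.33)
sse = sum((fi - y_bar) ** 2 for fi in fitted)  # explained sum of squares, as in (2.34)
ssr = sum(ui ** 2 for ui in residuals)       # residual sum of squares, as in (2.35)

r2_from_sse = sse / sst
r2_from_ssr = 1 - ssr / sst

# Sample correlation between y and the fitted values.
f_bar = sum(fitted) / len(fitted)
cov_yf = sum((yi - y_bar) * (fi - f_bar) for yi, fi in zip(y, fitted))
corr = cov_yf / math.sqrt(sst * sum((fi - f_bar) ** 2 for fi in fitted))

print(f"SST = {sst:.6f}, SSE + SSR = {sse + ssr:.6f}")
print(f"R-squared via SSE/SST:     {r2_from_sse:.6f}")
print(f"R-squared via 1 - SSR/SST: {r2_from_ssr:.6f}")
print(f"squared corr(y, yhat):     {corr ** 2:.6f}")
```

The three R-squared figures coincide only because an intercept is estimated; without one, (2.36) need not hold.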

EXAMPLE 2.8 CEO SALARY AND RETURN ON EQUITY

In the CEO salary regression, we obtain the following:

\[ \widehat{\mathit{salary}} = 963.191 + 18.501\,\mathit{roe}, \qquad n = 209,\; R^2 = 0.0132. \tag{2.39} \]

We have reproduced the OLS regression line and the number of observations for clarity. Using the R-squared (rounded to four decimal places) reported for this equation, we can see how much of the variation in salary is actually explained by the return on equity. The answer is: not much. The firm's return on equity explains only about 1.3% of the variation in salaries for this sample of 209 CEOs. That means that 98.7% of the salary variations for these CEOs is left unexplained! This lack of explanatory power may not be too surprising because many other characteristics of both the firm and the individual CEO should influence salary; these factors are necessarily included in the errors in a simple regression analysis.

In the social sciences, low R-squareds in regression equations are not uncommon, especially for cross-sectional analysis. We will discuss this issue more generally under multiple regression analysis, but it is worth emphasizing now that a seemingly low R-squared does not necessarily mean that an OLS regression equation is useless. It is still possible that (2.39) is a good estimate of the ceteris paribus relationship between salary and roe; whether or not this is true does not depend directly on the size of R-squared. Students who are first learning econometrics tend to put too much weight on the size of the R-squared in evaluating regression equations. For now, be aware that using R-squared as the main gauge of success for an econometric analysis can lead to trouble.

Sometimes, the explanatory variable explains a substantial part of the sample variation in the dependent variable.

EXAMPLE 2.9 VOTING OUTCOMES AND CAMPAIGN EXPENDITURES

In the voting outcome equation in (2.28), \(R^2 = 0.856\). Thus, the share of campaign expenditures explains over 85% of the variation in the election outcomes for this sample. This is a sizable portion.
2.4 Units of Measurement and Functional Form

Two important issues in applied economics are (1) understanding how changing the units of measurement of the dependent and/or independent variables affects OLS estimates and (2) knowing how to incorporate popular functional forms used in economics into regression analysis. The mathematics needed for a full understanding of functional form issues is reviewed in Appendix A.