The Simple Regression Model

1 The Simple Regression Model
Ping Yu
School of Economics and Finance, The University of Hong Kong
Ping Yu (HKU) SLR 1 / 75

2 Definition of the Simple Regression Model

3 Definition of the Simple Regression Model
The simple linear regression (SLR) model is also called the two-variable linear regression model or the bivariate linear regression model.
The SLR model is usually written as
$$y = \beta_0 + \beta_1 x + u,$$
where $\beta_0$ is called the intercept (parameter) or the constant term, and $\beta_1$ is called the slope (parameter).
Table: Terminology for SLR
y                   | x                    | u
Dependent variable  | Independent variable | Error term
Explained variable  | Explanatory variable | Disturbance
Response variable   | Control variable     | Unobservable
Predicted variable  | Predictor variable   |
Regressand          | Regressor            |
                    | Covariate            |

4 Definition of the Simple Regression Model
Interpretation of the SLR Model
The SLR model tries to "explain variable $y$ in terms of variable $x$" or "study how $y$ varies with changes in $x$":
$$\frac{\partial y}{\partial x} = \beta_1 \quad \text{as long as} \quad \frac{\partial u}{\partial x} = 0,$$
where $\partial y/\partial x$ means "by how much does the dependent variable change if only the independent variable is increased by one unit?" In general, $\partial y/\partial x = \beta_1 + \partial u/\partial x$.
- The $\partial$ in a partial derivative is the counterpart of the $d$ in an ordinary derivative (e.g., $dy/dx$), where $d$ is the first letter of "delta" ($\Delta$), which usually denotes a small change in mathematics.
In other words, $\Delta y/\Delta x = \beta_1$ only if $\Delta u/\Delta x = 0$, i.e., all other things remain equal when the independent variable is increased by one unit.
The simple linear regression model is rarely applicable in practice, but its discussion is useful for pedagogical reasons.

5 Definition of the Simple Regression Model
Two SLR Examples
Example (Soybean Yield and Fertilizer):
$$yield = \beta_0 + \beta_1 fertilizer + u,$$
where $\beta_1$ measures the effect of fertilizer on yield, holding all other factors fixed, and $u$ contains factors such as rainfall, land quality, presence of parasites, ...
Example (A Simple Wage Equation):
$$wage = \beta_0 + \beta_1 educ + u,$$
where $\beta_1$ measures the change in hourly wage given another year of education, holding all other factors fixed, and $u$ contains factors such as labor force experience, innate ability, tenure with current employer, work ethic, ...

6 Definition of the Simple Regression Model
(*) When Is There a Causal Interpretation of $\beta_1$?
Although $\partial u/\partial x = 0$ implies that $\beta_1$ has a causal interpretation for each individual, it hardly ever holds in practice.
Also, because we usually can observe only one pair of $(x, y)$ for each individual, we cannot identify the individual causal effect, which requires $y$ values for at least two $x$ values.
So, we are usually interested in the average causal effect. $\beta_1$ can be interpreted as the average causal effect under the conditional mean independence assumption (called the zero conditional mean assumption in the textbook):
$$E[u|x] = 0.$$
The explanatory variable must not contain information about the mean of the unobserved factors.
$E[u|x] = 0$ implies $Cov(x, u) = 0$ [proof not required, $Cov(x, u)$ will be defined later]. So in practice, we just argue why $x$ and $u$ are correlated to invalidate a causal interpretation.

7 Definition of the Simple Regression Model
[Review] Mean
For a random variable (r.v.) $X$, the mean (or expectation) of $X$, denoted as $E[X]$ (or $E(X)$) and sometimes $\mu_X$ (or simply $\mu$), is a weighted average of all possible values of $X$.
For example, in the population, a proportion $p$ (e.g., 17% in the US) of individuals are college graduates, and the remaining are not. Define $X = 1(\text{college graduate})$, where $1(\cdot)$ is the indicator function, which equals 1 when the statement in the parentheses is true and zero otherwise. The distribution of $X$ is
$$X = \begin{cases} 1, & \text{with probability } p, \\ 0, & \text{with probability } 1 - p, \end{cases}$$
so $E[X] = 0 \cdot (1 - p) + 1 \cdot p = p$.
For a general discrete r.v. $X$ with probability mass function (pmf)
$$P(X = x_j) = p_j, \quad j = 1, \dots, J,$$
where $p_j \ge 0$ and $p_1 + \dots + p_J = \sum_{j=1}^{J} p_j = 1$, we have $E[X] = \sum_{j=1}^{J} x_j p_j$.

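8 Definition of the Simple Regression Model
[Review] Mean (continued)
The mean of a continuous r.v. can be defined through an approximation by a discrete r.v. For a continuous r.v. taking values on $(a, b)$, where $a$ can be $-\infty$ and $b$ can be $\infty$, [figure here]
$$E[X] \approx \sum_{i=0}^{(b-a)/\Delta - 1} \left(a + \left(i + \tfrac{1}{2}\right)\Delta\right) P\left(a + i\Delta < X \le a + (i+1)\Delta\right) \approx \sum_{i=0}^{(b-a)/\Delta - 1} \left(a + \left(i + \tfrac{1}{2}\right)\Delta\right) f\left(a + \left(i + \tfrac{1}{2}\right)\Delta\right) \Delta \to \int_a^b x f(x)\,dx,$$
where, as $\Delta \to 0$, the sum becomes the integral, $a + (i + \tfrac{1}{2})\Delta \to x$, and $\Delta \to dx$.
For example, if $X \sim N(\mu, \sigma^2)$, the normal distribution with mean $\mu$ and variance $\sigma^2$, then $E[X] = \mu$.
Two Useful Properties:
(i) "the mean of the sum is the sum of the means", $E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i]$;
(ii) for any constants $a$ and $b$, $E[a + bX] = a + bE[X]$.

9 Definition of the Simple Regression Model
Figure: Probability Density Function (pdf) of Wage: $wage \sim \exp(N(\mu, \sigma^2))$, $a = 0$, $b = \infty$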

10 Definition of the Simple Regression Model
[Review] Conditional Mean
For two r.v.s $Y$ and $X$, the conditional mean of $Y$ given $X = x$, denoted as $E[Y|X = x]$ (or $E(Y|X = x)$), is the mean of $Y$ for the (slice of) individuals with $X = x$.
For example, if $Y$ is the hourly wage and $X = 1(\text{college graduate})$, then
$$E[Y|X = 1] = \int y f(y|X = 1)\,dy$$
is the average wage for college graduates, where $f(y|X = 1)$ is the density of wage among college graduates.
The conditional mean $E[Y|X = x]$ can be any function of $x$.

11 Definition of the Simple Regression Model
[Review] Conditional Mean (continued)
One Useful Property: $E[g(X)Y|X = x] = g(x)E[Y|X = x]$ for any function $g(\cdot)$, i.e., conditioning on $X$ means $X$ can be treated as a constant.
- $g(x)$ plays the role of $b$ in the second property of the mean.
The two properties of the mean still apply to the conditional mean:
- (i) "the conditional mean of the sum is the sum of the conditional means", $E\left[\sum_{i=1}^{n} Y_i \,\middle|\, X = x\right] = \sum_{i=1}^{n} E[Y_i|X = x]$;
- (ii) for any constants $a$ and $b$, $E[a + bY|X = x] = a + bE[Y|X = x]$.

12 Definition of the Simple Regression Model
Conditional Mean Independence
Although $E[u|x]$ can be any function of $x$, conditional mean independence restricts it to be the constant zero.
$E[u|x] = 0$ means that for whatever value $x$ takes, the mean of $u$ given that specific $x$ value is zero.
The zero in $E[u|x] = 0$ is just a normalization. If $E[u|x] = c \ne 0$, then redefine $u^* = u - c$ and $\beta_0^* = \beta_0 + c$. Now,
$$y = \beta_0 + \beta_1 x + u = (\beta_0 + c) + \beta_1 x + (u - c) \equiv \beta_0^* + \beta_1 x + u^*,$$
where $E[u^*|x] = E[u - c|x] = E[u|x] - c = c - c = 0$, and $\equiv$ means "defined as".
So the key here is that $E[u|x]$ is a constant, not depending on $x$.

13 Definition of the Simple Regression Model
A Classical Example: Return to Schooling
Recall the wage equation
$$wage = \beta_0 + \beta_1 educ + u,$$
where for simplicity, suppose $educ = 1(\text{college graduate})$, and $u$ represents innate ability.
If $E[u|educ = 1] = E[u|educ = 0] = 0$, then
$$E[wage|educ = 1] - E[wage|educ = 0] = E[\beta_0 + \beta_1 educ + u|educ = 1] - E[\beta_0 + \beta_1 educ + u|educ = 0] = (\beta_0 + \beta_1) + E[u|educ = 1] - \beta_0 - E[u|educ = 0] = (\beta_0 + \beta_1) - \beta_0 = \beta_1.$$
- Although $u \ne 0$ for each individual, its mean within each education group is zero. This is what "all other relevant factors are balanced" under random assignment of $x$ means in Chapter 1.
The conditional mean independence assumption is unlikely to hold here because individuals with more education will also be more intelligent on average.

14 Definition of the Simple Regression Model
Causality and Correlation: Polio and Ice-cream
By 1910, frequent epidemics became regular events throughout the developed world, primarily in cities during the summer months. At its peak in the 1940s and 1950s, polio would paralyze or kill over half a million people worldwide every year. - From Wiki
Another Example: A pretty woman caused death? (inauspicious or unlucky?)

15 Definition of the Simple Regression Model
Population Regression Function (PRF)
As in the return-to-schooling example, the conditional mean independence assumption implies that
$$E[y|x] = E[\beta_0 + \beta_1 x + u|x] = \beta_0 + \beta_1 x + E[u|x] = \beta_0 + \beta_1 x.$$
This means that the average value of the dependent variable can be expressed as a linear function of the explanatory variable, although in general $E[y|x]$ can be any function of $x$.
The PRF is unknown. It is a theoretical relationship that assumes a linear model and conditional mean independence. We need to estimate the PRF.

16 Definition of the Simple Regression Model
Figure: $E[y|x]$ as a Linear Function of $x$

17 Deriving the Ordinary Least Squares Estimates

18 Deriving the Ordinary Least Squares Estimates
A Random Sample
In order to estimate the regression model we need data.

19 Deriving the Ordinary Least Squares Estimates
Figure: Scatterplot of Savings and Income for 15 Families, and the Population Regression $E[savings|income] = \beta_0 + \beta_1 income$

20 Deriving the Ordinary Least Squares Estimates
Ordinary Least Squares (OLS) Estimation
The OLS estimates of $\beta = (\beta_0, \beta_1)$ try to fit a regression line through the data points as well as possible.
Figure: $\hat{u}_i(\beta)$ for Three Possible $\beta$ Values $\beta^1$, $\beta^2$ and $\beta^3$: $n = 10$

21 Deriving the Ordinary Least Squares Estimates
What Does "As Good As Possible" Mean?
Define the residuals at an arbitrary $\beta$ as $\hat{u}_i(\beta) = y_i - \beta_0 - \beta_1 x_i$.
Minimize the sum of squared residuals [figure here]:
$$\min_{\beta_0, \beta_1} SSR(\beta) \equiv \min_{\beta_0, \beta_1} \sum_{i=1}^{n} \hat{u}_i(\beta)^2 = \min_{\beta_0, \beta_1} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 \implies \hat\beta = (\hat\beta_0, \hat\beta_1),$$
where $\hat\beta$ is the solution to the first order conditions (FOCs) for the OLS estimates.
It turns out that
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \quad \text{and} \quad \hat\beta_0 = \bar{y} - \bar{x}\hat\beta_1,$$
where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the sample mean of $x$, and $\bar{y}$ is similarly defined.
In modern times, $\hat\beta$ can be easily obtained through STATA.
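The closed-form expressions for the slope and intercept can also be evaluated directly. The following is a minimal Python sketch rather than the course's STATA workflow; the function name `ols_slr` and the four data points are illustrative assumptions, not taken from the slides.

```python
def ols_slr(x, y):
    """Return (beta0_hat, beta1_hat) for the simple regression of y on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x has no sample variation, so the slope is undefined")
    beta1_hat = s_xy / s_xx
    beta0_hat = y_bar - x_bar * beta1_hat
    return beta0_hat, beta1_hat


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]
beta0_hat, beta1_hat = ols_slr(x, y)
print(beta0_hat, beta1_hat)
```

For these made-up points the numerator is 9.5 and the denominator is 5, so the slope is 1.9 and the intercept is zero up to rounding.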

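22 Deriving the Ordinary Least Squares Estimates
Figure: Objective Functions of OLS Estimation

23 Deriving the Ordinary Least Squares Estimates
Derivation of OLS Estimates
The FOCs are
$$-2\sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0, \quad -2\sum_{i=1}^{n} x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$$
$$\iff \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0, \quad \frac{1}{n}\sum_{i=1}^{n} x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0.$$
- Recall that $dx^2/dx = 2x$ and $d(ax + b)/dx = a$, so by the chain rule,
$$\frac{d(y_i - \beta_0 - \beta_1 x_i)^2}{d\beta_0} = 2(y_i - \beta_0 - \beta_1 x_i)\frac{d(y_i - \beta_0 - \beta_1 x_i)}{d\beta_0} = -2(y_i - \beta_0 - \beta_1 x_i),$$
$$\frac{d(y_i - \beta_0 - \beta_1 x_i)^2}{d\beta_1} = 2(y_i - \beta_0 - \beta_1 x_i)\frac{d(y_i - \beta_0 - \beta_1 x_i)}{d\beta_1} = -2x_i(y_i - \beta_0 - \beta_1 x_i).$$
From the first equation,
$$\bar{y} = \hat\beta_0 + \bar{x}\hat\beta_1 \implies \hat\beta_0 = \bar{y} - \bar{x}\hat\beta_1.$$
Substituting $\hat\beta_0$ into the second equation, we have
$$\frac{1}{n}\sum_{i=1}^{n} x_i \left(y_i - (\bar{y} - \bar{x}\hat\beta_1) - \hat\beta_1 x_i\right) = 0 \implies \frac{1}{n}\sum_{i=1}^{n} x_i (y_i - \bar{y}) = \hat\beta_1 \frac{1}{n}\sum_{i=1}^{n} x_i (x_i - \bar{x}).$$

24 Deriving the Ordinary Least Squares Estimates
Derivation of OLS Estimates (continued)
So
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} x_i (y_i - \bar{y})}{\sum_{i=1}^{n} x_i (x_i - \bar{x})} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},$$
where
$$\sum_{i=1}^{n} x_i (y_i - \bar{y}) - \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} [x_i - (x_i - \bar{x})](y_i - \bar{y}) = \bar{x}\sum_{i=1}^{n} (y_i - \bar{y}) = \bar{x}(n\bar{y} - n\bar{y}) = 0,$$
$$\sum_{i=1}^{n} x_i (x_i - \bar{x}) - \sum_{i=1}^{n} (x_i - \bar{x})^2 = \bar{x}\sum_{i=1}^{n} (x_i - \bar{x}) = 0.$$
- The step $\sum_{i=1}^{n} (y_i - \bar{y}) = n\bar{y} - n\bar{y}$ uses $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, so $\sum_{i=1}^{n} y_i = n\bar{y}$.
Alternative Expression for $\hat\beta_1$:
$$\hat\beta_1 = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\widehat{Cov}(x, y)}{\widehat{Var}(x)}.$$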

25 Deriving the Ordinary Least Squares Estimates
[Review] Covariance and Variance
The population covariance between two r.v.s $X$ and $Y$, sometimes denoted as $\sigma_{XY}$, is defined as
$$Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)].$$
Intuition: [figure here]
- If $X > \mu_X$ and $Y > \mu_Y$, then $(X - \mu_X)(Y - \mu_Y) > 0$, which is also true when $X < \mu_X$ and $Y < \mu_Y$. If instead $X > \mu_X$ and $Y < \mu_Y$, or vice versa, then $(X - \mu_X)(Y - \mu_Y) < 0$.
- If $\sigma_{XY} > 0$, then, on average, when $X$ is above/below its mean, $Y$ is also above/below its mean. If $\sigma_{XY} < 0$, then, on average, when $X$ is above/below its mean, $Y$ is below/above its mean.
A positive covariance indicates that two r.v.s move in the same direction, while a negative covariance indicates they move in opposite directions.

26 Deriving the Ordinary Least Squares Estimates
Figure: Positive, Negative and Zero Covariance (panels: Positive Covariance; Negative Covariance; Zero Covariance; Zero Covariance (Quadratic))

27 Deriving the Ordinary Least Squares Estimates
[Review] Covariance and Variance (continued)
Alternative Expressions of $Cov(X, Y)$:
$$Cov(X, Y) = E[XY - \mu_X Y - \mu_Y X + \mu_X \mu_Y] = E[XY] - \mu_X \mu_Y - \mu_Y \mu_X + \mu_X \mu_Y = E[XY] - \mu_X \mu_Y = E[(X - \mu_X)Y] = E[X(Y - \mu_Y)],$$
where the last two equalities indicate that demeaning one of $X$ and $Y$ is enough.
Covariance measures the amount of linear dependence between two r.v.s. This is why $\widehat{Cov}(x, y)$ appears in $\hat\beta_1$, which measures the linear relationship between $y$ and $x$.
- If $E[X] = 0$ and $E[X^3] = 0$, then $Cov(X, X^2) = E[X^3] - E[X]E[X^2] = 0$ although $X$ and $X^2$ are quadratically related. [figure here]
$Var(X) = Cov(X, X) = E[(X - \mu_X)^2] = E[X^2] - \mu_X^2$ is the covariance of $X$ with itself, denoted as $\sigma_X^2$ or simply $\sigma^2$. (We will discuss more on it later.)
- The definition of $Var(X)$ implies $E[X^2] = Var(X) + E[X]^2$: the second moment is the variance plus the first moment squared.

28 Deriving the Ordinary Least Squares Estimates
[Review] Method of Moments
The method of moments (MoM) was put forward by Karl Pearson (1857-1936) in 1894.
The basic idea is to replace $E[\cdot]$ by $\frac{1}{n}\sum_{i=1}^{n}$. So the MoM estimator is often called the sample analog or sample counterpart.
For example, $E[X]$ can be estimated by the sample mean
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
$Cov(X, Y)$ can be estimated by the sample covariance
$$\widehat{Cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}).$$
- Recall that demeaning one of $X$ and $Y$ is enough (see the expressions for $\hat\beta_1$ in the derivation of the OLS estimates)!
$Var(X)$ can be estimated by the sample variance
$$\widehat{Var}(X) = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2.$$
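The sample-analog idea can be sketched in a few lines of Python. This is only an illustration: the helper names (`sample_mean`, `sample_cov`, `sample_var`) and the data are assumptions made for the example, and the estimators divide by $n$, as in the MoM definitions above.

```python
def sample_mean(v):
    """Sample analog of E[X]: the average of the observations."""
    return sum(v) / len(v)


def sample_cov(x, y):
    """Sample analog of Cov(X, Y): average product of deviations (divides by n)."""
    x_bar, y_bar = sample_mean(x), sample_mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / len(x)


def sample_var(x):
    """Sample analog of Var(X), i.e. the sample covariance of x with itself."""
    return sample_cov(x, x)


# Hypothetical data, used only for illustration.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]

cov_xy = sample_cov(x, y)
var_x = sample_var(x)

# Demeaning only one of the two variables gives the same covariance.
y_bar = sample_mean(y)
cov_one_sided = sum(xi * (yi - y_bar) for xi, yi in zip(x, y)) / len(x)

# The ratio of sample covariance to sample variance is the OLS slope.
slope = cov_xy / var_x
print(cov_xy, cov_one_sided, var_x, slope)
```

The $1/n$ factors cancel in the ratio, which is why the slope does not depend on whether the covariance and variance divide by $n$ or by $n - 1$, as long as both use the same divisor.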

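29 Deriving the Ordinary Least Squares Estimates
OLS Calculation: A Cooked Numerical Example
Table: Components of OLS Calculation, $n = 4$ [table values not recoverable]
Columns: $y_i$, $x_i$, $y_i - \bar{y}$, $x_i - \bar{x}$, $(x_i - \bar{x})(y_i - \bar{y})$, $x_i(y_i - \bar{y})$, $(x_i - \bar{x})^2$, $x_i(x_i - \bar{x})$, with a final row of column sums $\sum_{i=1}^{4}$.
$$\hat\beta_1 = \frac{\sum_{i=1}^{4} x_i(y_i - \bar{y})}{\sum_{i=1}^{4} x_i(x_i - \bar{x})} = \frac{\sum_{i=1}^{4} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{4} (x_i - \bar{x})^2}, \quad \hat\beta_0 = \bar{y} - \bar{x}\hat\beta_1.$$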

30 Deriving the Ordinary Least Squares Estimates
History of Ordinary Least Squares
The least-squares method is usually credited to Gauss (1809), but it was first published as an appendix to Legendre (1805), which is on the paths of comets. Nevertheless, Gauss claimed that he had been using the method since 1795, at the age of 18.
C.F. Gauss (1777-1855), Göttingen
A.-M. Legendre (1752-1833), École Normale

31 Deriving the Ordinary Least Squares Estimates
CEO Salary and Return on Equity
We will provide three empirical examples of OLS estimation.
Suppose the SLR model is
$$salary = \beta_0 + \beta_1 roe + u,$$
where $salary$ is the CEO salary in thousands of dollars, and $roe$ is the return on equity of the CEO's firm in percent.
The fitted regression is
$$\widehat{salary} = 963.191 + 18.501\, roe,$$
where $\hat\beta_1 = 18.501 > 0$, which means that if the return on equity increases by one percentage point, then the salary is predicted to change by \$18,501. [figure here]
Even if $roe = 0$, the predicted salary of the CEO is \$963,191.
Causal Interpretation of $\hat\beta_1$? Think about what factors are included in $u$ (e.g., market share, sales, tenure, character of the CEO, etc.) and check whether $Cov(x, u) = 0$.
- What is the difference between tenure and experience?

32 Deriving the Ordinary Least Squares Estimates

33 Deriving the Ordinary Least Squares Estimates
Wage and Education
Suppose the SLR model is
$$wage = \beta_0 + \beta_1 educ + u,$$
where $wage$ is the hourly wage in dollars, and $educ$ is years of education.
The fitted regression is
$$\widehat{wage} = -0.90 + 0.54\, educ,$$
where $\hat\beta_1 = 0.54 > 0$, which means that in the sample, one more year of education was associated with an increase in hourly wage of \$0.54 (which is quite large! e.g., four years of college would increase the wage by $\$0.54 \times 4 = \$2.16$ per hour).
$\hat\beta_0 = -0.90$ means that when $educ = 0$, the predicted wage is negative. Does this make sense? [figure here]
Do you think the return to education is constant? (see the later discussion in this chapter)
Causal Interpretation of $\hat\beta_1$? No.

34 Deriving the Ordinary Least Squares Estimates
Figure: $\widehat{wage} = -0.90 + 0.54\, educ$: only two people have $educ = 0$

35 Deriving the Ordinary Least Squares Estimates
Voting Outcomes and Campaign Expenditures (Two Parties)
Suppose the SLR model is
$$voteA = \beta_0 + \beta_1 shareA + u,$$
where $voteA$ is the percentage of the vote for candidate A, and $shareA$ is the percentage of total campaign expenditures spent by candidate A.
The fitted regression is
$$\widehat{voteA} = 26.81 + \hat\beta_1\, shareA,$$
where $\hat\beta_1 > 0$, which means that if candidate A's share of spending increases by one percentage point, he or she receives about one half of a percentage point more of the total vote.
If candidate A does not spend anything on the campaign, then he or she will receive about 26.81% of the total vote.
If $shareA = 50$, then $\widehat{voteA}$ is roughly 50.
Causal Interpretation of $\hat\beta_1$? Maybe OK - $u$ includes the quality of the candidates, the dollar amounts (not percentages) spent by A and B, etc.

36 Properties of OLS on Any Sample of Data

37 Properties of OLS on Any Sample of Data
a: Fitted Values and Residuals
$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$ is called the fitted or predicted value at $x_i$.
$\hat{u}_i \equiv \hat{u}_i(\hat\beta) = y_i - \hat\beta_0 - \hat\beta_1 x_i = y_i - \hat{y}_i$ is called the residual, which is the deviation of $y_i$ from the fitted regression line. [figure here]
- $\hat{u}_i$ is different from $u_i = y_i - \beta_0 - \beta_1 x_i$. The latter is unobservable while the former is a by-product of OLS estimation.
$\hat{y} = \hat\beta_0 + \hat\beta_1 x$ is called the OLS regression line or sample regression function (SRF).

38 Properties of OLS on Any Sample of Data
Figure: Fitted Values and Residuals

39 Properties of OLS on Any Sample of Data
b: Algebraic Properties of OLS Statistics
Check the figure of fitted values and residuals to understand the following properties. The key is the two FOCs, and all other results are corollaries.
$\sum_{i=1}^{n} \hat{u}_i = 0$: it must be the case that some residuals are positive and others are negative, so the fitted regression line must lie in the middle of the data points.
- This property implies $\bar{y} = \bar{\hat{y}} + \bar{\hat{u}} = \bar{\hat{y}}$, where the first equality holds because $y_i = \hat{y}_i + \hat{u}_i$.
$\sum_{i=1}^{n} x_i \hat{u}_i = 0$:
$$\frac{1}{n}\sum_{i=1}^{n} x_i \hat{u}_i = \frac{1}{n}\sum_{i=1}^{n} x_i (\hat{u}_i - \bar{\hat{u}}) = \widehat{Cov}(x, \hat{u}) = 0,$$
recalling that we need only demean one of $x$ and $\hat{u}$.
- These two properties are the sample analogs of $E[u] = 0$ and $Cov(x, u) = 0$, which are implied by $E[u|x] = 0$ [proof not required].
- These two properties imply
$$\sum_{i=1}^{n} \hat{y}_i \hat{u}_i = \sum_{i=1}^{n} (\hat\beta_0 + \hat\beta_1 x_i)\hat{u}_i = \hat\beta_0 \sum_{i=1}^{n} \hat{u}_i + \hat\beta_1 \sum_{i=1}^{n} x_i \hat{u}_i = 0,$$
which means $\widehat{Cov}(\hat{y}, \hat{u}) = 0$.
$\bar{y} = \hat\beta_0 + \bar{x}\hat\beta_1$: the fitted regression line passes through $(\bar{x}, \bar{y})$. This is the first FOC, equivalent to $\sum_{i=1}^{n} \hat{u}_i = 0$.
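These algebraic properties hold for any sample, so they can be checked numerically. A minimal Python sketch follows; the data and variable names are illustrative assumptions, not the slides' own numerical example.

```python
# Hypothetical data, used only to check the algebraic properties numerically.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS estimates from the closed-form formulas.
beta1_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
beta0_hat = y_bar - x_bar * beta1_hat

fitted = [beta0_hat + beta1_hat * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

sum_u = sum(residuals)                                        # first FOC
sum_xu = sum(xi * ui for xi, ui in zip(x, residuals))         # second FOC
sum_yhat_u = sum(fi * ui for fi, ui in zip(fitted, residuals))
mean_fitted = sum(fitted) / n

# All three sums are zero and mean_fitted equals y_bar, up to rounding error.
print(sum_u, sum_xu, sum_yhat_u, mean_fitted, y_bar)
```

Because of floating-point rounding the sums may print as tiny nonzero numbers, so comparisons should use a tolerance rather than exact equality.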

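40 Properties of OLS on Any Sample of Data
The Cooked Numerical Example Revisited
Table: Check Algebraic Properties of OLS Statistics, with $\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$ and $\hat{u}_i = y_i - \hat{y}_i$ [table values not recoverable]
Columns: $y_i$, $x_i$, $\hat{y}_i$, $\hat{u}_i$, $x_i \hat{u}_i$, $\hat{y}_i \hat{u}_i$, with rows for the sum $\sum_{i=1}^{4}$ and the mean $\frac{1}{4}\sum_{i=1}^{4}$.
The table also checks $\bar{y} = \hat\beta_0 + \bar{x}\hat\beta_1$, with $\bar{y} = 7$.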

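41 Properties of OLS on Any Sample of Data
c: Measures of Variation
How well does the explanatory variable explain the dependent variable?
Measures of Variation:
$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2, \quad SSE = \sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \quad SSR = SSR(\hat\beta) = \sum_{i=1}^{n} \hat{u}_i^2,$$
where the second expression for SSE uses $\bar{\hat{y}} = \bar{y}$, and
- SST = total sum of squares, which represents the total variation in the dependent variable,
- SSE = explained sum of squares, which represents the variation explained by the regression,
- SSR = residual sum of squares, which represents the variation not explained by the regression.
It can be shown that $SST = SSE + SSR$.

42 Properties of OLS on Any Sample of Data
(*) Decomposition of Total Variation
Note that
$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} [(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})]^2 = \sum_{i=1}^{n} [\hat{u}_i + (\hat{y}_i - \bar{y})]^2$$
$$= \sum_{i=1}^{n} \hat{u}_i^2 + 2\sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = SSR + 2\sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) + SSE = SSR + SSE,$$
where the last equality holds because $\sum_{i=1}^{n} \hat{u}_i \hat{y}_i = 0$ and $\sum_{i=1}^{n} \hat{u}_i \bar{y} = \bar{y}\sum_{i=1}^{n} \hat{u}_i = 0$.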

43 Properties of OLS on Any Sample of Data
Goodness-of-Fit
The R-squared of the regression, also called the coefficient of determination, is defined as
$$R^2 = \frac{SSE}{SST} = \frac{SST - SSR}{SST} = 1 - \frac{SSR}{SST}.$$
R-squared measures the fraction of the total variation that is explained by the regression.
$0 \le R^2 \le 1$. When is $R^2 = 0$? When is $R^2 = 1$? [figure here]
- $R^2$ tries to explain variation, not level; a constant cannot explain variation (it explains only level), so $R^2 = 0$ if only the constant contributes to the regression: if $\hat\beta_1 = 0$, then $\hat\beta_0 = \bar{y} - \bar{x}\hat\beta_1 = \bar{y}$, so
$$SSR = \sum_{i=1}^{n} (y_i - \hat\beta_0 - x_i\hat\beta_1)^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 = SST.$$
- $R^2$ is defined only if there is an intercept; we need to use the constant to absorb the level of $y$, and then use $x_i$ to measure the variation of $y_i$:
$$SSE = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat\beta_0 + x_i\hat\beta_1 - \bar{y})^2 = \sum_{i=1}^{n} (\bar{y} - \bar{x}\hat\beta_1 + x_i\hat\beta_1 - \bar{y})^2 = \hat\beta_1^2 \sum_{i=1}^{n} (x_i - \bar{x})^2 = \hat\beta_1^2\, SST_x.$$
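The three sums of squares and the R-squared can be computed directly from their definitions. This is a minimal sketch in Python; the function name `variation_measures` and the data are assumptions made for illustration only.

```python
def variation_measures(x, y):
    """Return SST, SSE, SSR and R-squared for the simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    beta1_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
                 / sum((xi - x_bar) ** 2 for xi in x))
    beta0_hat = y_bar - x_bar * beta1_hat
    fitted = [beta0_hat + beta1_hat * xi for xi in x]

    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
    sse = sum((fi - y_bar) ** 2 for fi in fitted)            # explained variation
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # residual variation
    return {"SST": sst, "SSE": sse, "SSR": ssr, "R2": sse / sst}


# Hypothetical data, used only for illustration.
result = variation_measures([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 8.0])
print(result)
```

For these made-up points, SST = 18.75 splits into SSE = 18.05 and SSR = 0.70, so the R-squared is about 0.963 whether it is computed as SSE/SST or as 1 - SSR/SST.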

44 Properties of OLS on Any Sample of Data
Figure: Data Patterns for $R^2 = 0$ and $R^2 = 1$
Caution: A high R-squared does not necessarily mean that the regression has a causal interpretation! [check the following two examples]

45 Properties of OLS on Any Sample of Data
Two Examples of R-Squared
CEO Salary and Return on Equity:
$$\widehat{salary} = 963.191 + 18.501\, roe, \quad n = 209, \quad R^2 \approx 0.013.$$
The regression explains only 1.3% of the total variation in salaries.
- It is quite standard to have a low $R^2$ for cross-sectional data because a lot of heterogeneity is contained in $u$.
Voting Outcomes and Campaign Expenditures:
$$\widehat{voteA} = 26.81 + \hat\beta_1\, shareA, \quad n = 173, \quad R^2 = 0.856.$$
The regression explains 85.6% of the total variation in election outcomes.

46 Units of Measurement and Functional Form

47 Units of Measurement and Functional Form
b: Incorporating Nonlinearities in Simple Regression
The effects of changing units of measurement on OLS statistics will be discussed in Chapter 6.
Regression of log wages on years of education:
$$\log(wage) = \beta_0 + \beta_1 educ + u,$$
where $\log(\cdot)$ denotes the natural logarithm. [figure here]
This is often called the semi-log or log-linear regression model.
This changes the interpretation of the regression coefficient:
$$\beta_1 = \frac{\partial \log(wage)}{\partial educ} = \frac{1}{wage} \cdot \frac{\partial wage}{\partial educ} = \frac{\partial wage / wage}{\partial educ},$$
where $\partial wage/wage$ is the proportional change of wage. [see the next slide for math review]
Or,
$$100\beta_1 = \frac{100 \cdot \Delta wage / wage}{\Delta educ} = \frac{\%\Delta wage}{\Delta educ},$$
where $\%\Delta$ is read as "percentage change of", and $\Delta$ is read as "change of".

48 Units of Measurement and Functional Form
[Review] Derivative of Logarithmic Functions
Figure: $\log(x)$: $x > 0$; $wage > 0$
Recall that $\frac{d \log x}{dx} = \frac{1}{x}$, or $d \log x = \frac{dx}{x}$.
The derivative gets smaller and smaller as $x$ gets larger and larger:
$$\lim_{x \to 0} \frac{d \log x}{dx} = \infty, \quad \left.\frac{d \log x}{dx}\right|_{x=1} = 1, \quad \lim_{x \to \infty} \frac{d \log x}{dx} = 0.$$

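49 Units of Measurement and Functional Form
A Log Wage Equation
The fitted regression line is
$$\widehat{\log(wage)} = \hat\beta_0 + 0.083\, educ, \quad n = 526, \quad R^2 = 0.186,$$
which implies
$$\widehat{wage} \approx e^{\hat\beta_0 + 0.083\, educ}.$$
The wage increases by 8.3% for every additional year of education (= return to education).
For example, suppose the current wage is \$10 per hour (which implies that $educ = (\log(10) - \hat\beta_0)/0.083$), and education is increased by one year. Then, using the first-order approximation $e^{0.083} \approx 1 + 0.083$,
$$\Delta wage = \exp(\log(10) + 0.083) - 10 \approx 0.83,$$
and
$$\frac{\Delta wage / wage}{\Delta educ} = \frac{+\$0.83 / \$10}{+1 \text{ year}} = 0.083 = 8.3\%.$$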
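The 8.3% figure is a first-order approximation to the change implied by the exponential form. The short Python sketch below compares the two for a \$10 wage; the variable names are assumptions made for the example, and only the slope 0.083 comes from the text (the intercept cancels out of the change in wage).

```python
import math

beta1_hat = 0.083   # slope of the fitted log wage equation, from the text
wage = 10.0         # current hourly wage in dollars, as in the example

# First-order approximation: a one-year increase changes log(wage) by beta1_hat,
# so the wage changes by roughly beta1_hat * wage.
approx_change = beta1_hat * wage
approx_pct = 100.0 * beta1_hat

# Exact change implied by the exponential form: wage * (exp(beta1_hat) - 1).
exact_change = wage * (math.exp(beta1_hat) - 1.0)
exact_pct = 100.0 * (math.exp(beta1_hat) - 1.0)

print(approx_change, exact_change)
print(approx_pct, exact_pct)
```

The exact percentage change is slightly larger than $100\hat\beta_1$ because $e^z > 1 + z$ for $z \ne 0$; the gap is small when the coefficient is small and grows with its size.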

50 Units of Measurement and Functional Form
Figure: $\widehat{wage} = \exp(\hat\beta_0 + \hat\beta_1 educ)$
When the wage level is higher, the increase in wage for one more year of education is larger, but the percentage increase of wage is the same.

51 Units of Measurement and Functional Form
Constant Elasticity Model
CEO Salary and Firm Sales:
$$\log(salary) = \beta_0 + \beta_1 \log(sales) + u,$$
where $sales$ is measured in millions of dollars.
This changes the interpretation of the regression coefficient:
$$\beta_1 = \frac{\partial \log(salary)}{\partial \log(sales)} = \frac{\partial salary / salary}{\partial sales / sales} = \frac{\%\Delta salary}{\%\Delta sales} = \text{elasticity}.$$
The log-log form postulates a constant elasticity model, whereas the semi-log form assumes a semi-elasticity model, with $100\beta_1$ called the semi-elasticity of $y$ with respect to $x$: in the log wage equation,
$$\text{elasticity} = \frac{\partial \log(wage)}{\partial \log(educ)} = \frac{\partial \log(wage)}{\partial educ / educ} = \beta_1 educ,$$
which depends on $educ$. The elasticity is larger for a higher education level.

52 Units of Measurement and Functional Form
CEO Salary and Firm Sales
The fitted regression line is
$$\widehat{\log(salary)} = \hat\beta_0 + 0.257 \log(sales), \quad n = 209, \quad R^2 = 0.211,$$
which implies
$$\widehat{salary} \approx e^{\hat\beta_0 + 0.257 \log(sales)} = e^{\hat\beta_0}\, sales^{0.257}.$$
The salary increases by 0.257% for every 1% increase of sales.
Figure: $\widehat{salary} = e^{\hat\beta_0}\, sales^{0.257}$

53 Units of Measurement and Functional Form
Summary of Functional Forms Involving Logarithms
Table: Summary of Functional Forms Involving Logarithms
Model       | Dependent Variable | Independent Variable | Interpretation of $\beta_1$
Level-level | $y$                | $x$                  | $\Delta y = \beta_1 \Delta x$
Level-log   | $y$                | $\log(x)$            | $\Delta y = (\beta_1/100)\, \%\Delta x$
Log-level   | $\log(y)$          | $x$                  | $\%\Delta y = (100\beta_1)\, \Delta x$
Log-log     | $\log(y)$          | $\log(x)$            | $\%\Delta y = \beta_1\, \%\Delta x$

54 Expected Values and Variances of the OLS Estimators

55 Expected Values and Variances of the OLS Estimators
Statistical Properties of OLS Estimators
A property such as $\sum_{i=1}^{n} \hat{u}_i = 0$ is satisfied by any sample of data, i.e., it must hold regardless of the values of $\{(x_i, y_i) : i = 1, \dots, n\}$.
We now treat $\hat\beta_0$ and $\hat\beta_1$ as estimators, i.e., as random variables, because they are calculated from a random sample.
Recall that
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \quad \text{and} \quad \hat\beta_0 = \bar{y} - \bar{x}\hat\beta_1,$$
where the data $\{(x_i, y_i) : i = 1, \dots, n\}$ is random and depends on the particular sample that has been drawn.
Caution: distinguish a random variable from its realization!
Question: What will the estimators estimate on average, and how large is their variability in repeated samples? That is,
$$E[\hat\beta_0] = ?, \quad E[\hat\beta_1] = ? \quad \text{and} \quad Var(\hat\beta_0) = ?, \quad Var(\hat\beta_1) = ?$$

56 Expected Values and Variances of the OLS Estimators
Standard Assumptions for the SLR Model
A scientific approach requires assumptions!
Assumption SLR.1 (Linear in Parameters):
$$y = \beta_0 + \beta_1 x + u.$$
- In the population, the relationship between $y$ and $x$ is linear.
- The "linear" in linear regression means "linear in parameters", e.g., $y = \beta_0 + \beta_1 \log(x) + u$ is a linear regression.
Assumption SLR.2 (Random Sampling): The data $\{(x_i, y_i) : i = 1, \dots, n\}$ is a random sample drawn from the population, i.e., each data point follows the population equation,
$$y_i = \beta_0 + \beta_1 x_i + u_i.$$

57 Expected Values and Variances of the OLS Estimators
Discussion of Random Sampling: Wage and Education
The population consists, for example, of all workers of country A.
In the population, a linear relationship between wages (or log wages) and years of education holds.
Draw a worker completely at random from the population.
The wage and the years of education of the worker drawn are random because one does not know beforehand which worker is drawn.
Throw the worker back into the population and repeat the random draw $n$ times.
The wages and years of education of the sampled workers are used to estimate the linear relationship between wages and education.

58 Expected Values and Variances of the OLS Estimators
Figure: Graph of $y_i = \beta_0 + \beta_1 x_i + u_i$.

59 Expected Values and Variances of the OLS Estimators
Standard Assumptions for the SLR Model (continued)
Assumption SLR.3 (Sample Variation in Explanatory Variable):
$$\sum_{i=1}^{n} (x_i - \bar{x})^2 > 0.$$
- The values of the explanatory variable are not all the same (otherwise it would be impossible to study how much the dependent variable changes when the explanatory variable changes by one unit, i.e., $\beta_1$). [figure here]
Assumption SLR.4 (Zero Conditional Mean):
$$E[u_i|x_i] = 0.$$
- The value of the explanatory variable must contain no information about the mean of the unobserved factors.

60 Expected Values and Variances of the OLS Estimators
Figure: A Scatterplot of Wage Against Education When $educ_i = 12$ for All $i$

61 Expected Values and Variances of the OLS Estimators
a: Unbiasedness of OLS
Theorem 2.1: Under assumptions SLR.1-SLR.4,
$$E[\hat\beta_0] = \beta_0 \quad \text{and} \quad E[\hat\beta_1] = \beta_1$$
for any values of $\beta_0$ and $\beta_1$.
How to understand unbiasedness?
The estimated coefficients may be smaller or larger, depending on the sample that is the result of a random draw.
However, on average, they will be equal to the values that characterize the true relationship between $y$ and $x$ in the population.
"On average" means if sampling was repeated, i.e., if drawing the random sample and doing the estimation was repeated many times.
In a given sample, estimates may differ considerably from true values.
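The "repeated sampling" reading of unbiasedness can be illustrated with a small Monte Carlo experiment. The sketch below is in Python and rests on assumptions chosen for the example only: the true parameters, the uniform distribution of $x$, the standard normal errors, the sample size, and the helper names `ols_slr` and `average_estimates`.

```python
import random


def ols_slr(x, y):
    """Closed-form OLS intercept and slope for the simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    beta1_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
                 / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - x_bar * beta1_hat, beta1_hat


def average_estimates(beta0, beta1, n, reps, seed):
    """Average the OLS estimates over `reps` independently drawn samples of size n."""
    rng = random.Random(seed)
    total_b0 = total_b1 = 0.0
    for _ in range(reps):
        x = [rng.uniform(0.0, 10.0) for _ in range(n)]
        # Errors are drawn independently of x, so E[u|x] = 0 holds by construction.
        u = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
        b0, b1 = ols_slr(x, y)
        total_b0 += b0
        total_b1 += b1
    return total_b0 / reps, total_b1 / reps


# Hypothetical population: beta0 = 1, beta1 = 2.
mean_b0, mean_b1 = average_estimates(beta0=1.0, beta1=2.0, n=50, reps=2000, seed=0)
print(mean_b0, mean_b1)
```

Individual samples give estimates scattered around the true values, while the averages over many samples should land close to $\beta_0 = 1$ and $\beta_1 = 2$. Making the error depend on $x$ in the simulation would violate SLR.4 and pull the averages away from the truth.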

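62 Expected Values and Variances of the OLS Estimators
(*) Proof of Unbiasedness of OLS
Proof. We always condition on $\{x_i, i = 1, \dots, n\}$, i.e., the $x$ values can be treated as fixed. Note that
$$\hat\beta_1 - \beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) y_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} - \beta_1 \overset{SLR.1\text{-}2}{=} \frac{\sum_{i=1}^{n} (x_i - \bar{x})(\beta_0 + \beta_1 x_i + u_i)}{\sum_{i=1}^{n} (x_i - \bar{x})^2} - \beta_1$$
$$= \frac{\sum_{i=1}^{n} (x_i - \bar{x})\beta_0}{\sum_{i=1}^{n} (x_i - \bar{x})^2} + \beta_1 \frac{\sum_{i=1}^{n} (x_i - \bar{x}) x_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} + \frac{\sum_{i=1}^{n} (x_i - \bar{x}) u_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} - \beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) u_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2},$$
where the last equality holds because $\sum_{i=1}^{n} (x_i - \bar{x})\beta_0 = \beta_0 \sum_{i=1}^{n} (x_i - \bar{x}) = 0$ and $\sum_{i=1}^{n} (x_i - \bar{x}) x_i = \sum_{i=1}^{n} (x_i - \bar{x})^2$.

63 Expected Values and Variances of the OLS Estimators
(*) Proof of Unbiasedness of OLS (continued)
Proof. Now, with (i) and (ii) denoting the two properties of the mean,
$$E[\hat\beta_1 - \beta_1] = E\left[\frac{\sum_{i=1}^{n} (x_i - \bar{x}) u_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right] \overset{(ii)}{=} \frac{E\left[\sum_{i=1}^{n} (x_i - \bar{x}) u_i\right]}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \overset{(i)}{=} \frac{\sum_{i=1}^{n} E[(x_i - \bar{x}) u_i]}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \overset{(ii)}{=} \frac{\sum_{i=1}^{n} (x_i - \bar{x}) E[u_i]}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \overset{SLR.2,\,SLR.4}{=} 0,$$
where the last step uses $E[u_i|x_1, \dots, x_n] \overset{SLR.2}{=} E[u_i|x_i] \overset{SLR.4}{=} 0$.
Further, since $\bar{y} = \beta_0 + \beta_1 \bar{x} + \bar{u}$,
$$E[\hat\beta_0] = E[\bar{y} - \bar{x}\hat\beta_1] = E[\beta_0 + \beta_1 \bar{x} + \bar{u} - \bar{x}\hat\beta_1] = \beta_0 - \bar{x} E[\hat\beta_1 - \beta_1] + E[\bar{u}] = \beta_0,$$
where the last equality holds because $\hat\beta_1$ is unbiased, and $E[\bar{u}] = \frac{1}{n}\sum_{i=1}^{n} E[u_i] = 0$ by Assumption SLR.4.
The key assumption for unbiasedness is Assumption SLR.4.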

64 Expected Values and Variances of the OLS Estimators
b: Variances of the OLS Estimators
Unbiasedness is not the only desirable property of the OLS estimator. [intuition here: gunfire] [figure here]
Depending on the sample, the estimates will be nearer to or farther away from the true population values.
How far can we expect our estimates to be away from the true population values on average (= sampling variability)?
Sampling variability is measured by the estimators' variances. [see the next slides for a review of variance]

65 Expected Values and Variances of the OLS Estimators
Figure: Random Variables with the Same Mean BUT Different Distributions

66 Expected Values and Variances of the OLS Estimators
[Review] Variance
Recall that $Var(X) = E[(X - E[X])^2]$ measures how spread out the distribution of a r.v. $X$ is [see the preceding figure].
For example, consider two random variables $X$ and $Y$ with [figure here]
$$P(X = 2) = P(X = -2) = 1/2 \quad \text{and} \quad P(Y = 1) = P(Y = -1) = 1/2.$$
- Obviously, $X$ is more spread out than $Y$ although both have mean zero.
- If we check their variances, then indeed,
$$Var(X) = \tfrac{1}{2} \cdot 2^2 + \tfrac{1}{2} \cdot (-2)^2 = 4 > 1 = \tfrac{1}{2} \cdot 1^2 + \tfrac{1}{2} \cdot (-1)^2 = Var(Y).$$
The standard deviation of a r.v., denoted as $sd(X)$, is simply the square root of the variance:
$$sd(X) = \sqrt{Var(X)}.$$
The name "standard deviation" came from Karl Pearson. (We will show a photo of him in the next chapter.)
Variance measures the expected squared "deviation" from the mean and has the unit of the squared unit of $X$.
- By taking the square root in $sd(X)$, we get back to the "standard" (original) unit of $X$.

67 Expected Values and Variances of the OLS Estimators
[Review] Variance (continued)
If $X = 1(\text{college graduate})$, then $Var(X) = p(1 - p)^2 + (1 - p)(0 - p)^2 = p(1 - p)$.
If $X \sim N(\mu, \sigma^2)$, then $Var(X) = \sigma^2$.
Two Useful Properties:
(i) for independent r.v.s, "the variance of the sum is the sum of the variances", $Var\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} Var(X_i)$;
(ii) for any constants $a$ and $b$, $Var(a + bX) = b^2 Var(X)$.
These two properties imply
$$Var(\bar{x}) = Var\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) \overset{(ii)}{=} \frac{1}{n^2} Var\left(\sum_{i=1}^{n} x_i\right) \overset{SLR.2+(i)}{=} \frac{1}{n^2}\sum_{i=1}^{n} Var(x_i) \overset{SLR.2}{=} \frac{n\, Var(x)}{n^2} = \frac{Var(x)}{n}.$$

68 Expected Values and Variances of the OLS Estimators
[Review] Conditional Variance
As with the conditional mean, the conditional variance of $Y$ given $X = x$, denoted as $Var(Y|X = x)$, is the variance of $Y$ for the (slice of) individuals with $X = x$.
Apply the second property of variance to the conditional variance to obtain
$$Var(y_i|x_i) = Var(y_i - \beta_0 - \beta_1 x_i|x_i) = Var(u_i|x_i),$$
where, as mentioned for the conditional mean, conditional on $x_i$, $\beta_0 + \beta_1 x_i$ can be treated as a constant like $a$ in (ii).
Although $E[y_i|x_i] = \beta_0 + \beta_1 x_i$ is linear in $x_i$ (Assumptions SLR.1, SLR.2 and SLR.4), $Var(y_i|x_i)$ is assumed not to depend on $x_i$ (Assumption SLR.5 below).

69 Expected Values and Variances of the OLS Estimators
Homoskedasticity
Assumption SLR.5 (Homoskedasticity):
$$Var(u_i|x_i) = \sigma^2.$$
- The value of the explanatory variable must contain no information about the variability of the unobserved factors.

70 Expected Values and Variances of the OLS Estimators
Heteroskedasticity
When $Var(u_i|x_i)$ depends on $x_i$, the error term is said to exhibit heteroskedasticity.
Figure: An Example of Heteroskedasticity: Wage and Education

71 Expected Values and Variances of the OLS Estimators
Variances of OLS Estimators
Theorem 2.2: Under assumptions SLR.1-SLR.5,
$$Var(\hat\beta_1) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sigma^2}{SST_x},$$
$$Var(\hat\beta_0) = \frac{\sigma^2\, n^{-1}\sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sigma^2\, n^{-1}\sum_{i=1}^{n} x_i^2}{SST_x}.$$
The sampling variability of the estimated regression coefficients is higher the larger the variability of the unobserved factors, and lower the larger the variation in the explanatory variable. [figure here]
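The formula for the variance of the slope estimator can be compared with the spread of slope estimates across simulated samples. This Python sketch holds the $x$ values fixed, as in the conditional argument used in the proofs; the design points, $\sigma = 2$, the true coefficients, the number of replications, and the function name `ols_slope` are assumptions made for the example.

```python
import random


def ols_slope(x, y):
    """Closed-form OLS slope for the simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
            / sum((xi - x_bar) ** 2 for xi in x))


# Fixed design: x = 0, 1, ..., 19 (hypothetical values).
x = [float(i) for i in range(20)]
x_bar = sum(x) / len(x)
sst_x = sum((xi - x_bar) ** 2 for xi in x)

sigma = 2.0                          # assumed error standard deviation
theory_var = sigma ** 2 / sst_x      # sigma^2 / SST_x from Theorem 2.2

rng = random.Random(1)
draws = []
for _ in range(4000):
    # Homoskedastic errors: the same sigma for every x value.
    y = [1.0 + 0.5 * xi + rng.gauss(0.0, sigma) for xi in x]
    draws.append(ols_slope(x, y))

mean_draw = sum(draws) / len(draws)
mc_var = sum((d - mean_draw) ** 2 for d in draws) / len(draws)
print(theory_var, mc_var)
```

The simulated variance should be close to $\sigma^2/SST_x$. Spreading the $x$ values further apart raises $SST_x$ and shrinks both numbers, while a larger $\sigma$ inflates them.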

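72 Expected Values and Variances of the OLS Estimators
Figure: Relative Difficulty in Identifying $\beta_1$

73 Expected Values and Variances of the OLS Estimators
(*) Proof of Theorem 2.2
Proof. $Var(\hat\beta_1)$ is more important, so we concentrate on it here. As in the proof of Theorem 2.1, we condition on $\{x_i, i = 1, \dots, n\}$. With (i) and (ii) denoting the two properties of variance,
$$Var(\hat\beta_1) = Var(\hat\beta_1 - \beta_1) = Var\left(\frac{\sum_{i=1}^{n} (x_i - \bar{x}) u_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right) \overset{(ii)}{=} \frac{Var\left(\sum_{i=1}^{n} (x_i - \bar{x}) u_i\right)}{SST_x^2} \overset{SLR.2+(i)}{=} \frac{\sum_{i=1}^{n} Var((x_i - \bar{x}) u_i)}{SST_x^2}$$
$$\overset{(ii)}{=} \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 Var(u_i)}{SST_x^2} \overset{SLR.2,\,SLR.5}{=} \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sigma^2}{SST_x^2} = \frac{\sigma^2\, SST_x}{SST_x^2} = \frac{\sigma^2}{SST_x},$$
where the step marked SLR.2, SLR.5 uses $Var(u_i|x_1, \dots, x_n) \overset{SLR.2}{=} Var(u_i|x_i) \overset{SLR.5}{=} \sigma^2$.
The key assumption to get this simple formula for $Var(\hat\beta_1)$ is Assumption SLR.5.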

74 Expected Values and Variances of the OLS Estimators
c: Estimating the Error Variance
(Assuming homoskedasticity) $Var(u_i|x_i) = \sigma^2 = Var(u_i)$ [proof not required].
- The variance of $u$ does not depend on $x$, i.e., it is equal to the unconditional variance.
The sample analog of $Var(u_i)$ is
$$\tilde\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (\hat{u}_i - \bar{\hat{u}})^2 = \frac{1}{n}\sum_{i=1}^{n} \hat{u}_i^2 = \frac{SSR}{n}.$$
- Note that $\hat{u}_i = \beta_0 + \beta_1 x_i + u_i - \hat\beta_0 - \hat\beta_1 x_i = u_i - (\hat\beta_0 - \beta_0) - (\hat\beta_1 - \beta_1) x_i$, so
$$E[\hat{u}_i - u_i] = -E[\hat\beta_0 - \beta_0] - E[\hat\beta_1 - \beta_1] x_i = 0.$$
This is why we can use $\hat{u}_i$ to substitute for $u_i$ in the genuine sample analog of $Var(u_i)$, say, $\frac{1}{n}\sum_{i=1}^{n} (u_i - \bar{u})^2$.
One could estimate the variance of the errors by calculating the variance of the residuals in the sample; unfortunately this estimate would be biased.
An unbiased estimate of the error variance can be obtained by subtracting the number of estimated regression coefficients from the number of observations:
$$\hat\sigma^2 = \frac{1}{n - 2}\sum_{i=1}^{n} \hat{u}_i^2 = \frac{SSR}{n - 2}.$$

75 Expected Values and Variances of the OLS Estimators
c: Estimating the Error Variance (continued)
Theorem 2.3 (Unbiased Estimation of $\sigma^2$): Under assumptions SLR.1-SLR.5,
$$E[\hat\sigma^2] = \sigma^2.$$
$\hat\sigma = \sqrt{\hat\sigma^2}$ is called the standard error of the regression (SER).
The estimated standard deviations of the regression coefficients are called standard errors. They measure how precisely the regression coefficients are estimated:
$$se(\hat\beta_1) = \sqrt{\widehat{Var}(\hat\beta_1)} = \sqrt{\frac{\hat\sigma^2}{SST_x}}, \quad se(\hat\beta_0) = \sqrt{\widehat{Var}(\hat\beta_0)} = \sqrt{\frac{\hat\sigma^2\, n^{-1}\sum_{i=1}^{n} x_i^2}{SST_x}},$$
i.e., we plug in $\hat\sigma^2$ for the unknown $\sigma^2$.
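The plug-in standard errors can be computed from the residuals in a few lines. The following Python sketch is illustrative only: the function name `slr_standard_errors` and the data are assumptions, and the formulas are the homoskedastic ones from Theorems 2.2 and 2.3.

```python
import math


def slr_standard_errors(x, y):
    """Return (sigma2_hat, se_beta0, se_beta1) under homoskedasticity (SLR.5)."""
    n = len(x)
    if n <= 2:
        raise ValueError("need more than 2 observations to estimate sigma^2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sst_x = sum((xi - x_bar) ** 2 for xi in x)
    beta1_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sst_x
    beta0_hat = y_bar - x_bar * beta1_hat

    # Residual sum of squares and the degrees-of-freedom-adjusted variance estimate.
    ssr = sum((yi - beta0_hat - beta1_hat * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = ssr / (n - 2)

    se_beta1 = math.sqrt(sigma2_hat / sst_x)
    se_beta0 = math.sqrt(sigma2_hat * (sum(xi ** 2 for xi in x) / n) / sst_x)
    return sigma2_hat, se_beta0, se_beta1


# Hypothetical data, used only for illustration.
sigma2_hat, se_beta0, se_beta1 = slr_standard_errors(
    [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 8.0]
)
print(sigma2_hat, se_beta0, se_beta1)
```

For these made-up points, SSR = 0.70 and $n - 2 = 2$, so $\hat\sigma^2 = 0.35$, and the standard errors are $\sqrt{0.07}$ for the slope and $\sqrt{0.525}$ for the intercept.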

Properties and Hypothesis Testing

Chapter 3 Properties and Hypothesis Testing 3.1 Types of data The regression techniques developed in previous chapters can be applied to three different kinds of data. 1. Cross-sectional data. 2. Time series data.