Regression
Econometrics
Douglas G. Steigerwald
UC Santa Barbara
Overview
Reference: B. Hansen, Econometrics, Sections 2.20-2.23 and 2.25-2.29
- Best linear projection is often called regression, a term coined by Francis Galton (1886)
- Regression coefficient: β = [Var(x)]⁻¹ Cov(x, y)
- Regression measures correlation, not causality
- (y, x) jointly normally distributed yields the "classic regression model"
- The short regression coefficient is often a bound ("omitted variable bias")
- The random coefficient model implies E(y|x) is linear in x
Origin: Francis Galton
- Galton introduces regression, correlation, and the standard deviation
- His 1886 paper studies the joint distribution of heights (y, x)
  y: child's height; x: parent's height
- In essence, he estimates E(y|x); the slope is 2/3
- On average, children do not attain the height of their parents: regression to the mean
- He deduces that this conclusion is a fallacy if the marginal distributions of y and x are the same (heights are stable)
- If heights are stable, µ_y = µ_x = µ
- Children's heights are a linear combination of the average height and the parents' height:
  y = xβ + α + u with α = (1 − β)µ, so
  P(y|x) = (1 − β)µ + xβ
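A minimal simulation of Galton's setup, with illustrative assumed values (mean height µ = 68, slope β = 2/3, and an error variance chosen so that Var(y) = Var(x)), recovers the projection slope and intercept:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, beta, sd = 68.0, 2 / 3, 2.5    # assumed mean height, slope, and sd of x
n = 200_000

x = rng.normal(mu, sd, n)                         # parent heights
u = rng.normal(0, sd * np.sqrt(1 - beta**2), n)   # Var(u) chosen so Var(y) = Var(x)
y = (1 - beta) * mu + beta * x + u                # P(y|x) = (1 − β)µ + xβ

b = np.cov(x, y)[0, 1] / np.var(x)                # projection slope
a = y.mean() - b * x.mean()                       # projection intercept
print(f"slope ≈ {b:.3f}, intercept ≈ {a:.2f}, (1 − β)µ = {(1 - beta) * mu:.2f}")
print(f"Var(y)/Var(x) ≈ {np.var(y) / np.var(x):.3f}")   # ≈ 1: heights are stable
```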
Regression Fallacy
- If heights are stable, Var(y) = Var(x)
- The slope coefficient is then always less than 1:
  β = Cov(x, y)/Var(x) = Corr(x, y)
  −1 ≤ Corr(x, y) ≤ 1, so β < 1 in general
- Regression fallacy: β < 1 implies Var(y) < Var(x)
  clearly false: we derived β < 1 with Var(y) equal to Var(x)!
- Subtle: Secrist (1933), The Triumph of Mediocrity in Business
  department stores: regress 1930 profit on 1920 profit; interprets β < 1 as a triumph of mediocrity
- What condition implies Var(y) < Var(x)?
  Var(y) = β²Var(x) + Var(u)
  Var(y)/Var(x) < 1 iff β² < 1 − Var(u)/Var(x)
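The variance condition is easy to check with assumed numbers; here β = 0.8, Var(x) = 1, Var(u) = 0.5 gives β < 1 yet Var(y) > Var(x):

```python
# Worked check with assumed values: β < 1 does not imply Var(y) < Var(x).
beta, var_x, var_u = 0.8, 1.0, 0.5
var_y = beta**2 * var_x + var_u        # Var(y) = β²Var(x) + Var(u)
print(var_y / var_x)                   # 1.14 > 1 even though β = 0.8 < 1
print(beta**2 < 1 - var_u / var_x)     # False: the variance-reduction condition fails
```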
Reverse Regression
- Galton also noted that one could regress x on y: the reverse regression (parent height on child height)
  x = yβ* + α* + u*
- β* = Corr(x, y) = β and α* = (1 − β)µ = α: the coefficients are identical
- A natural feature of joint distributions (not causal)
- Inverting a projection (or CE) does not yield a projection (or CE):
  x = y(1/β) − α(1/β) − u(1/β)
  a valid equation, but not a linear projection (nor a CE)
- Hence the projection of x on y does not have slope 1/β
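A quick simulation check, reusing the assumed height parameters from the sketch above: both projection slopes equal Corr(x, y), and neither equals 1/β:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, beta, sd, n = 68.0, 2 / 3, 2.5, 200_000   # assumed parameters as above
x = rng.normal(mu, sd, n)
y = (1 - beta) * mu + beta * x + rng.normal(0, sd * np.sqrt(1 - beta**2), n)

b_yx = np.cov(x, y)[0, 1] / np.var(x)   # projection of y on x
b_xy = np.cov(x, y)[0, 1] / np.var(y)   # reverse projection of x on y
print(f"y on x: {b_yx:.3f}, x on y: {b_xy:.3f}, 1/β = {1 / beta:.3f}")
# both slopes ≈ 2/3 = Corr(x, y); neither is 1/β = 1.5
```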
Regression Coefficients
- In regression the intercept is often separated; if so,
  y = xᵀβ + α + u
  where x does not contain the value 1
- Ey = Exᵀβ + α, or µ_y = µ_xᵀβ + α, so α = µ_y − µ_xᵀβ
- The regression model is sometimes written in centered (deviations-from-mean) form:
  y − µ_y = (x − µ_x)ᵀβ + u
- β_lpc = [E((x − µ_x)(x − µ_x)ᵀ)]⁻¹ E[(x − µ_x)(y − µ_y)]
  β = [Var(x)]⁻¹ Cov(x, y)
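A sketch of the centered computation, assuming two correlated covariates and coefficients α = 1.5, β = (2, −1):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
x = rng.multivariate_normal([1.0, 2.0], [[1.0, 0.3], [0.3, 1.0]], n)
y = 1.5 + x @ np.array([2.0, -1.0]) + rng.normal(size=n)   # assumed α and β

xc = x - x.mean(axis=0)                                # deviations from means
yc = y - y.mean()
beta = np.linalg.solve(xc.T @ xc / n, xc.T @ yc / n)   # [Var(x)]⁻¹ Cov(x, y)
alpha = y.mean() - x.mean(axis=0) @ beta               # α = µ_y − µ_xᵀβ
print(beta, alpha)                                     # ≈ [2, -1] and ≈ 1.5
```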
Coefficient Decomposition
- xᵀ = (x₁, x₂ᵀ) with dim(x₁) = 1
  y = x₁β₁ + x₂ᵀβ₂ + u
- Matrix expression: β₁ = [E(x̃x̃ᵀ)]⁻¹ E(x̃ỹ)
  the correlation between x₁ and y after the correlation with x₂ is removed
- "Two-step" procedure:
  regression 1: x₁ on x₂, giving x₁ = x₂ᵀγ₂ + u₁
  regression 2: y on u₁, giving β₁ = E(u₁y)/E(u₁²)
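The two-step procedure can be verified by simulation; a minimal sketch with assumed values β₁ = 2, β₂ = (1, 3), and x₁ built to be correlated with x₂:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
x2 = rng.normal(size=(n, 2))
x1 = x2 @ np.array([0.5, -0.2]) + rng.normal(size=n)    # x1 correlated with x2
y = 2.0 * x1 + x2 @ np.array([1.0, 3.0]) + rng.normal(size=n)

# Step 1: regress x1 on x2, keep the residual u1
g2 = np.linalg.lstsq(x2, x1, rcond=None)[0]
u1 = x1 - x2 @ g2
# Step 2: regress y on u1
beta1 = (u1 @ y) / (u1 @ u1)
print(beta1)   # ≈ 2.0, the coefficient on x1 in the long regression
```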
Normal Regression
- Assume (y, x) are jointly normally distributed, which implies (u, x) are jointly normally distributed
- Best linear projection: y = xᵀβ + u, where β = [E(xxᵀ)]⁻¹ E(xy)
- By construction E(xu) = 0; together with the jointly normal distribution, this implies x and u are independent
- E(u|x) = E(u) = 0 and E(u²|x) = E(u²) = σ²
- The jointly normal distribution yields the "classic regression model"
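An illustrative check of the two conditional-moment implications, binning a simulated jointly normal pair (assumed values: intercept 1, slope 2, σ² = 1):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)   # (y, x) jointly normal
u = y - (1.0 + 2.0 * x)                  # projection error

for lo, hi in [(-3, -1), (-1, 1), (1, 3)]:
    m = (x > lo) & (x <= hi)
    # conditional mean ≈ 0 and conditional variance ≈ σ² = 1 in every bin
    print(f"x in ({lo},{hi}]: E(u|x) ≈ {u[m].mean():+.3f}, "
          f"E(u²|x) ≈ {(u[m] ** 2).mean():.3f}")
```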
Review: Best Linear Predictor Error
- Best linear predictor (linear approximation): P(y|x) = xᵀβ_lpc
- Decomposition: y = xᵀβ_lpc + u with u = e + [E(y|x) − xᵀβ_lpc]
- The error consists of two components:
  e: deviation from the conditional mean, with E(e|x) = 0
  E(y|x) − xᵀβ_lpc: error from the approximation of the conditional mean
Approximation Error: Example 1
- y = x + x², x ~ N(0, 1), so E(y|x) = x + x² and e ≡ 0
- Linear approximation: y = βx + α + u
  α = µ_y − βµ_x = 1
  β = E(xy)/E(x²) = 1
- The approximation P(y|x) = x + 1 differs sharply from E(y|x)
- Linear predictor error: u = (x + x²) − (x + 1) = x² − 1
- u is a function of x, but uncorrelated with x: E(xu) = E(x³ − x) = 0
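A direct numerical verification of this example (only the seed and sample size are choices):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=1_000_000)
y = x + x**2                              # E(y|x) = x + x², e ≡ 0

beta = np.cov(x, y)[0, 1] / np.var(x)     # ≈ 1
alpha = y.mean() - beta * x.mean()        # ≈ 1
u = y - (alpha + beta * x)                # ≈ x² − 1
print(np.corrcoef(x, u)[0, 1])            # ≈ 0: uncorrelated with x
print(np.corrcoef(x**2, u)[0, 1])         # ≈ 1: yet a deterministic function of x
```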
Approximation Error: Grouped Data
- β differs over groups: could indicate a nonlinear E(y|x)
- Group 1: x ~ N(2, 1); group 2: x ~ N(4, 1)
- Here the conditional mean function is common to both groups, so ∇ₓE(y|x) does not differ over groups:
  y|x ~ N(m(x), 1), m(x) = 2x − x²/6, ∇ₓE(y|x) = 2 − x/3
- The linear approximation does differ over groups
- Implication: the conditional mean is nonlinear
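A minimal simulation of the two groups, using the stated m(x) = 2x − x²/6 and unit-variance noise: the group-specific projection slopes diverge even though m(x) is common:

```python
import numpy as np

rng = np.random.default_rng(6)

def group_slope(mu_x, n=500_000):
    x = rng.normal(mu_x, 1, n)
    y = 2 * x - x**2 / 6 + rng.normal(0, 1, n)   # same m(x) in both groups
    return np.cov(x, y)[0, 1] / np.var(x)        # group-specific projection slope

print(group_slope(2))   # ≈ 1.33 for the group centered at x = 2
print(group_slope(4))   # ≈ 0.67 for the group centered at x = 4
```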
Approximation Error: Omitted Covariates
- Long regression (Goldberger coined these terms):
  y = x₁ᵀβ₁ + x₂ᵀβ₂ + u, E(xu) = 0
  β = [E(xxᵀ)]⁻¹ E(xy)
- Short regression:
  y = x₁ᵀγ₁ + u₁, E(x₁u₁) = 0
  γ₁ = [E(x₁x₁ᵀ)]⁻¹ E(x₁y)
  recall: both the coefficient and the error change
- The linear projection coefficient from the short regression is
  γ₁ = β₁ + [E(x₁x₁ᵀ)]⁻¹ E(x₁x₂ᵀ)β₂ + [E(x₁x₁ᵀ)]⁻¹ E(x₁u)
  E(xu) = 0 ⇒ E(x₁u) = 0
- γ₁ ≠ β₁ unless E(x₁x₂ᵀ) = 0 (or β₂ = 0): "omitted variable" bias
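A simulation sketch of omitted variable bias with assumed values β₁ = 2, β₂ = 3, and Cov(x₁, x₂) ≠ 0; the short coefficient matches β₁ plus the bias term:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500_000
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)               # E(x1 x2) ≠ 0
y = 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)     # long regression: β1 = 2, β2 = 3

gamma1 = (x1 @ y) / (x1 @ x1)                    # short regression coefficient
bias = (x1 @ x2) / (x1 @ x1) * 3.0               # [E(x1²)]⁻¹ E(x1 x2) β2
print(gamma1, 2.0 + bias)                        # the two agree: γ1 = β1 + bias
```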
Short Regression Coefficient
- In many cases the short regression coefficient is a bound
- Long regression: linear projection of log(wage) on x = (ed, ab)
  ab: intellectual ability, unobserved
- Short regression: linear projection of log(wage) on x = ed
- ed and ab are likely positively correlated
- Conditional on ed, ab likely increases wages
- Therefore γ₁ = β₁ + c with c > 0: an upper bound (not very useful here)
Approximation: Random Coefficient Model
- The (linear) random coefficient model yields a linear CE
- Random coefficient model: y = xᵀη
  η: individual-specific component, random, independent of x
  η equals ∇ₓy, the true causal effect, i.e. the change in the response variable due to a change in the covariate
- Example: y = log(wages), x = ed, η: individual-specific return to schooling
Random Coefficient Model Yields a Linear CE
- η = β + v with E(η) = β and Var(η) = Ω
  the distribution of v is independent of x, with mean 0 and covariance Ω
- Conditional expectation function:
  E(y|x) = xᵀE(η|x) = xᵀE(η) = xᵀβ
- y = xᵀβ + e with e = xᵀv
  E(e|x) = 0, Var(e|x) = xᵀΩx
- (Conditional) heteroskedasticity in a regression can be evidence that y is generated by a random coefficient model
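A sketch with a scalar covariate and assumed values β = 2, Ω = 0.25: the conditional variance of e grows with x², the random-coefficient signature:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 1_000_000
x = rng.uniform(1, 5, n)                          # scalar covariate
beta, omega = 2.0, 0.25                           # assumed E(η) and Var(η)
eta = beta + rng.normal(0, np.sqrt(omega), n)     # η = β + v, v independent of x
y = x * eta

e = y - x * beta                                  # e = xv
for lo in (1, 2, 3, 4):
    m = (x >= lo) & (x < lo + 1)
    # Var(e|x) tracks x²Ω within each bin of x
    print(f"x in [{lo},{lo + 1}): Var(e|x) ≈ {e[m].var():.2f}, "
          f"x²Ω ≈ {(x[m] ** 2).mean() * omega:.2f}")
```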
Review
- How do we express β in terms of covariances? (consider a single covariate)
  β = Cov(x, y)/Var(x)
- How does this change if Var(x) = Var(y)?
  β = Cov(x, y)/(sd(x) sd(y)) = Corr(x, y)
- How does this change if we regress x on y?
  β = Cov(x, y)/Var(y) = Corr(x, y): identical!
- How does the short (regression) coefficient differ from the long coefficient?
  γ₁(short) = β₁(long) + [Var(x₁)]⁻¹ Cov(x₁, x₂) β₂(long)
- When does the short (regression) coefficient provide a useful bound on the long coefficient?
  when the bias attenuates the measured response (the sign of β₁(long) differs from the sign of Cov(x₁, x₂) β₂(long))
Matrix Expression
- Partition the regressor moment matrix, with Q₁₂ = E(x₁x₂ᵀ):
  Q := [Q₁₁ Q₁₂; Q₂₁ Q₂₂], Q⁻¹ =: [Q¹¹ Q¹²; Q²¹ Q²²]
- Q¹¹ = (Q₁₁.₂)⁻¹, where Q₁₁.₂ = Q₁₁ − Q₁₂Q₂₂⁻¹Q₂₁:
  the variation in the component of x₁ that is uncorrelated with x₂
- y − x₂ᵀQ₂₂⁻¹E(x₂y): the component of y that is uncorrelated with x₂
- Q¹² = −(Q₁₁.₂)⁻¹Q₁₂Q₂₂⁻¹
- β = Q⁻¹E(xy), so β₁ = Q¹¹E(x₁y) + Q¹²E(x₂y)
- Substituting Q¹² = −Q¹¹Q₁₂Q₂₂⁻¹:
  β₁ = Q¹¹ E[x₁(y − x₂ᵀQ₂₂⁻¹E(x₂y))]
  (return to Coefficient Decomposition)
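A numerical check of the partitioned expression (assumed design: three correlated regressors, with dim(x₁) = 1 and β₁ = 2); the partitioned formula reproduces the full-system solve:

```python
import numpy as np

rng = np.random.default_rng(9)
n, k1 = 200_000, 1
x = rng.normal(size=(n, 3)) @ rng.normal(size=(3, 3))   # correlated regressors
y = x @ np.array([2.0, 1.0, -1.0]) + rng.normal(size=n)

Q = x.T @ x / n                                  # Q = E(xxᵀ)
Exy = x.T @ y / n                                # E(xy)
Q11, Q12 = Q[:k1, :k1], Q[:k1, k1:]
Q21, Q22 = Q[k1:, :k1], Q[k1:, k1:]

Q11_2 = Q11 - Q12 @ np.linalg.solve(Q22, Q21)    # Q₁₁.₂
beta1 = np.linalg.solve(Q11_2, Exy[:k1] - Q12 @ np.linalg.solve(Q22, Exy[k1:]))
print(beta1, np.linalg.solve(Q, Exy)[:k1])       # both ≈ 2.0
```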