Quantitative Methods in Economics Panel data and generalized least squares

Size: px

Start display at page:

Download "Quantitative Methods in Economics Panel data and generalized least squares"

Scott Nicholson
5 years ago
Views:

1 Quantitative Methods in Economics Panel data and generalized least squares Maximilian Kasy Harvard University, fall / 22

2 Roadmap, Part I 1. Linear predictors and least squares regression 2. Conditional expectations 3. Some functional forms for linear regression 4. Regression with controls and residual regression 5. Panel data and generalized least squares 2 / 22

3 Takeaways for these slides Panel data: more than one outcome Estimator: GLS generalizes LS to multidimensional outcome Example 1: Siblings earnings and education Example 2: Agricultural output over time 3 / 22

4 Panel data Data: Y 1 X 1 X X 1K Y =.., X =.. =... Y M X M1... X MK X M Example: Population of families with Y t = log(earnings) and Z t =(education, age) for family member t, K 1 X t is constructed (polynomials, etc.) from Z t (t = 1,...,M). 4 / 22

5 Generalized linear predictor Linear predictor with weighting: β 1 β =.. = arg min E[(Y Xc) Φ(Y Xc)], c R β K K Φ is a M M symmetric, positive-definite matrix. Notation: EΦ (Y X) = Xβ. 5 / 22

6 Inner Product: If U and V are M 1 random vectors, and Φ is a (nonrandom) M M positive-definite matrix, U,V Φ = E(U ΦV). Norm: V Φ = V,V 1/2 Φ. β = argmin c Y Xc 2 Φ. Questions for you Find the first order conditions for this minimization problem. 6 / 22

7 Solution: Let Xβ = ( X (1)... X (K )) β = X (1) β X (K ) β K, Orthogonal projection: Y Xβ,X (k) Φ = 0 (k = 1,...,K ) E[(Y Xβ) ΦX] = 0 Solving for β : and thus E(Y ΦX) β E(X ΦX) = 0 E(X ΦX)β = E(X ΦY ), β = [E(X ΦX)] 1 E(X ΦY ). 7 / 22

8 Sample version - generalized least squares (GLS) Data: ( ) x i = x (1) i... x (K ) i y i1 y i =.., y = y im y 1.. y n, x i11... x i1k =.., x = x im1... x imk y is nm 1 and x is nm K. Let A be a nm nm symmetric, positive-definite matrix. GLS estimator: b = argmin(y xc) A(y xc). c x 1.. x n. 8 / 22

9 Inner Product: u,v R nm, Norm: Orthogonal projection: u,v A = u Av. v A = v,v 1/2 A. b = argmin c y xc A. y xb,x (k) A = 0 (k = 1,...,K ) (y xb) Ax = 0 Solving for b: y Ax b (x Ax) = 0 (x Ax)b = x Ay, and thus b = (x Ax) 1 x Ay. 9 / 22

10 Rewriting the GLS estimator Factorization of A : A = C C, where C is nm nm and nonsingular. Define ỹ = Cy, x = Cx. Note that with ṽ = Cv, v 2 A = v Av = v C Cv = ṽ ṽ = ṽ 2 I, 10 / 22

11 thus b = argmin y xc A = argmin ỹ xc I = ( x x) 1 x ỹ. c c We can obtain a GLS fit using a least-squares program. In Matlab, b = x\ỹ. 11 / 22

12 Panel data: Application to siblings Consider siblings in families with two children Y it = log(earnings) of sibling t in family i Z it = education of sibling t in family i (t = 1,2) Data: W i = (Y i1,y i2,z i1,z i2 ) i.i.d. (i = 1,...,n). 12 / 22

13 Latent variable model Assume that earnings are determined by E (Y i1 Z i1,z i2,a i ) = γ 1 + γ 2 Z i1 + γ 3 A i, E (Y i2 Z i1,z i2,a i ) = γ 1 + γ 2 Z i2 + γ 3 A i. Here A i is a family background variable that is not observed. Note that, by assumption, sibling education does not show up given A i! Now consider a regression of earnings on own- and sibling education: E (Y i1 1,Z i1,z i2 ) = γ 1 + γ 2 Z i1 + γ 3 E (A i 1,Z i1,z i2 ), E (Y i2 1,Z i1,z i2 ) = γ 1 + γ 2 Z i2 + γ 3 E (A i 1,Z i1,z i2 ). 13 / 22

14 Next, consider a hypothetical regression of A i on both siblings education: E (A i 1,Z i1,z i2 ) = λ 0 + λ 1 Z i1 + λ 2 Z i2. Questions for you Assume λ 1 = λ 2 and combine these equations to derive an expression for E (Y it 1,Z it,z i1 + Z i2 ). 14 / 22

15 Solution: E (Y i1 1,Z i1,z i1 + Z i2 ) = (γ 1 + γ 3 λ 0 ) + γ 2 Z i1 + γ 3 λ 1 (Z i1 + Z i2 ), E (Y i2 1,Z i2,z i1 + Z i2 ) = (γ 1 + γ 3 λ 0 ) + γ 2 Z i2 + γ 3 λ 1 (Z i1 + Z i2 ). If the latent variable model is true, then the coefficient on own-education equals γ 2. Generalized Linear Predictor: ( ) Yi1 Y i =, X i = Y i2 E Φ (Y i X i ) = X i β = ( ) 1 Zi1 (Z i1 + Z i2 ), 1 Z i2 (Z i1 + Z i2 ) ( ) β1 + β 2 Z i1 + β 3 (Z i1 + Z i2 ). β 1 + β 2 Z i2 + β 3 (Z i1 + Z i2 ) 15 / 22

16 Panel data: Application to production functions Suppose output is determined by Q it = L γ it F iv it (i = 1,...,n; t = 1,...,T ), Q it = output for farm i in year t, L it = labor input, 0 < γ < 1 F i = soil quality, V it = rainfall. Profit maximization at time t, where V it is uncertain: max L Questions for you E[P t Q it W t L Info it ] = maxp t [L γ F i E(V it Info it )] W t L L Derive the first order condition for the firms profit maximization problem. Solve for the implied demand for labor. 16 / 22

17 Solution: First order condition: Derived demand for labor: γp t L γ 1 F i E(V it Info it ) = W t. logl it = 1 1 γ [logγ log W t P t + logf i + loge(v it Info it )]. 17 / 22

18 We can write the production function as logq it = γ logl it + logf i + logv it. Equivalently Y it = γz it + A i + logv it, with Y it = logq it, Z it = logl it, and A i = logf i. Hypothetical regression of output at time t on labor input at all times and soil quality (unobserved): E(Y it Z i1,...,z it,a i ) = γz it + A i + E(logV it Z i1,...,z it,a i ) 18 / 22

19 One more assumption: strict exogeneity of Z conditional on A: E(logV it Z i1,...,z it,a i ) = constant, then Questions for you Take first differences : Calculate Then calculate E(Y it Z i1,...,z it,a i ) = γz it + A i + constant E(Y it Y i,t 1 Z i1,...,z it,a i ). E(Y it Y i,t 1 Z i1,...,z it ). 19 / 22

20 Solution: Strict exogeneity conditional on A E(Y it Y i,t 1 Z i1,...,z it,a i ) = γz it + A i (γz i,t 1 + A i ) = γ(z it Z i,t 1 ), and thus E(Y it Y i,t 1 Z i1,...,z it ) = γ(z it Z i,t 1 ). 20 / 22

21 Generalized Linear Predictor: Y i2 Y i1 Z i2 Z i1 Ydif i =., X i =., Y it Y i,t 1 Z it Z i,t 1 EΦ (Ydif i X i ) = X i β = β(z i2 Z i1 ). β(z it Z i,t 1 ). If the latent variable model with the strict exogeneity assumption is true, then β = γ. 21 / 22

22 Within group estimator Strict exogeneity conditional on A where E[ 1 T T t=1y it Z i1,...,z it,a i ] = 1 T Z i = 1 T T t=1 (γz it + A i + constant) = γ Zi + A i + constant, T t=1z it, Ȳ i = 1 T Consider de-meaned regression: T t=1 Y it. E(Y it Ȳ i Z i1,...,z it,a i ) = γz it + A i (γ Zi + A i ) = γ(z it Zi ), and thus E(Y it Ȳ i Z i1,...,z it ) = γ(z it Zi ). 22 / 22

23 Generalized Linear Predictor Define demeaned variables, Y i1 Ȳ i Z i1 Zi Ydev i =., X i =.. Y it Ȳ i Z it Zi Generalized linear predictor: β(z i1 Zi ) EΦ (Ydev i X i ) = X i β =.. β(z it Zi ) If the latent variable model with the strict exogeneity assumption is true, then β = γ. 23 / 22

24 Will return to panel data when talking about differences in differences! 24 / 22

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the