Day 4 Nonlinear methods

Size: px

Start display at page:

Download "Day 4 Nonlinear methods"

Cody McDowell
6 years ago
Views:

1 Day 4 Nonlinear methods c A. Colin Cameron Univ. of Calif.- Davis Frontiers in Econometrics Bavarian Graduate Program in Economics. Based on A. Colin Cameron and Pravin K. Trivedi (2009, 2010), Microeconometrics using Stata (MUS), Stata Press. and A. Colin Cameron and Pravin K. Trivedi (2005), Microeconometrics: Methods and Applications (MMA), C.U.P. March 21-25, 2011 c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program in March Economics 21-25,. Based 2011 on A. 1 Colin / 52 C

2 1. ntroduction 1. ntroduction Consider nonlinear estimator, one for which there is no explicit solution for bθ. example we work with is Poisson MLE Topics include interpreting coe cients general estimation theory key nonlinear estimators: MLE and NLS calculating marginal e ects statistical inference computational methods c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program in March Economics 21-25,. Based 2011 on A. 2 Colin / 52 C

3 1. ntroduction Overview 1 ntroduction 2 Poisson Regression 3 Data example 4 Marginal e ects 5 Estimation theory 6 Maximum likelihood estimator 7 Nonlinear least squares estimator 8 Statistical inference 9 Gradient methods 10 Appendix: Theory for Extremum Estimator 11 Appendix: Theory for Poisson MLE c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program in March Economics 21-25,. Based 2011 on A. 3 Colin / 52 C

4 2. Poisson Regression Model 2. Poisson Regression Model Poisson regression is a leading example of a nonlinear model. Consider count data with y = 0, 1, 2,... OLS has problem that E[y i jx i ] = xi 0 β < 0 is possible And OLS is ine cient (based on homoskedasticity, normality). So what do we do? Starting point from statistics is Poisson: Poisson density (or more precisely probability mass function): Pr[Y = yjµ] = e µ µ y /y! where µ = E[y] > 0, V[y] = µ, y! = y (y 1) 1. For regression the mean µ > 0 varies with regressors x Conditional mean for Poisson regression model: where x i and β are K 1 vectors. µ i = E[y i jx i ] = exp(x 0 i β), µ i > 0 c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program in March Economics 21-25,. Based 2011 on A. 4 Colin / 52 C

5 2. Poisson Regression Maximum Likelihood Maximum likelihood Likelihood principle Likelihood principle says choose that value of β which makes the probability of observing our data (y, X) as high as possible. Joint density f (yjx, β) gives probability of observing y given β (and X). Likelihood L(βjy, X) = f (yjx, β) reinterprets as probability of β given y (and X). Maximize the likelihood, or equivalently the log-likelihood. n general f (y i jx i, β) is conditional (on x i ) density for one observation. f (yjx, β) = f (y 1 jx 1, θ) f (y N jx N, θ) = N i =1 f (y i jx i, β) is joint conditional density for N independent observations. L(βjy, X) = N i =1 f (y i jx i, β) is the likelihood function. ln L(βjy, X) = ln( N i =1 f (y i jx i, β)) = N i =1 ln f (y i jx i, β) is the log-likelihood function. c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program in March Economics 21-25,. Based 2011 on A. 5 Colin / 52 C

6 2. Poisson Regression MLE for Poisson MLE for Poisson Conditional density for one observation is e µ µ y /y! with µ = e x0 β f (y i jx i, β) = exp( exp(xi 0 β)) exp(xi 0 β) y i /y i! ln f (y i jx i, β) = exp(xi 0 β) + y i xi 0 β ln(y i!) Log-likelihood for N independent observations: ln L(β) = N i=1 ln f (y i jx i, β) = N i=1 f exp(x0 i β) + y i x 0 i β ln(y i!)g. Di erentiate w.r.t. β: ln L(β) β = N i=1 f exp(x0 i β)x i + y i x i g = N i=1 (y i exp(x 0 i β))x i ML rst-order conditions N i=1 (y i exp(x 0 i β))x i = 0. c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program in March Economics 21-25,. Based 2011 on A. 6 Colin / 52 C

7 2. Poisson Regression The Complications The Complications We cannot solve the nonlinear rst-order conditions in β. How do we compute β? b F Use an iterative gradient method such as Newton-Raphson. How do we do asymptotic theory for b β? F Linearize the nonlinear rst-order conditions. The conditional mean E[yjx] is nonlinear in x and β. How do we interpret β? F Use the marginal e ect E[y jx]/ x = exp(x 0 β)β. F This varies with the evaluation point x. Similar complications hold for all nonlinear models. c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program in March Economics 21-25,. Based 2011 on A. 7 Colin / 52 C

8 3. Data Example Poisson for Doctor visits 3. Data Example: Poisson for doctor visits Data from MUS chapter 10. Use Poisson regression as dependent variable docvis is a count.. use mus10data.dta, clear. quietly keep if year02==1. describe docvis private chronic female income storage display value variable name type format label variable label docvis int %8.0g number of doctor visits private byte %8.0g = 1 if private insurance chronic byte %8.0g = 1 if a chronic condition female byte %8.0g = 1 if female income float %9.0g ncome in $ / summarize docvis private chronic female income Variable Obs Mean Std. Dev. Min Max docvis private chronic female income c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program in March Economics 21-25,. Based 2011 on A. 8 Colin / 52 C

9 3. Data Example Poisson for Doctor visits. histogram docvis if docvis < 40, width(1) Density number of doctor visits c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program in March Economics 21-25,. Based 2011 on A. 9 Colin / 52 C

10 3. Data Example Poisson MLE Poisson MLE: default standard errors (do not use) Converges quickly (2 iterations) and coe cients highly statistically signi cant.. poisson docvis private chronic female income teration 0: log likelihood = teration 1: log likelihood = teration 2: log likelihood = Poisson regression Number of obs = 4412 LR chi2(4) = Prob > chi2 = Log likelihood = Pseudo R2 = docvis Coef. Std. Err. z P> z [95% Conf. nterval] private chronic female income _cons c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 10 Colin / 52 C

11 3. Data Example Robust standard errors Poisson MLE: robust standard errors (use these) Same coe cient estimates, di erent s.e. s, still highly statistically signi cant.. poisson docvis private chronic female income, vce(robust) teration 0: log pseudolikelihood = teration 1: log pseudolikelihood = teration 2: log pseudolikelihood = Poisson regression Number of obs = 4412 Wald chi2(4) = Prob > chi2 = Log pseudolikelihood = Pseudo R2 = Robust docvis Coef. Std. Err. z P> z [95% Conf. nterval] private chronic female income _cons c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 11 Colin / 52 C

12 3. Data Example Robust standard errors For Poisson the default standard errors can be much too small. Reason: Algebra reveals that the default s.e. s are based on Poisson assumption that V[y] = µ and instead for these data V[y] >> µ.. * Comparison of standard errors. quietly poisson docvis private chronic female income. estimates store DEFAULT. quietly poisson docvis private chronic female income, vce(robust). estimates store ROBUST. estimates table DEFAULT ROBUST, b(%9.4f) se(%9.3f) stats(n r2 F) Variable DEFAULT ROBUST private chronic female income _cons N r2 F legend: b/se c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 12 Colin / 52 C

13 4. Marginal e ects 4. Marginal e ects nterpret coe cients using ME j = E[y jx] x j, the marginal e ect on the conditional mean of a one unit change in the j th regressor. For Poisson ME j = E[y jx] x j = exp(x0 β) x j = exp(x 0 β)β j. 1. For Poisson the sign of β j equals the sign of ME j Reason: exp(x 0 β) > For Poisson if β j is twice β k then ME j is twice ME k Reason: ME j MEk = exp(x0 β)β j exp(x 0 β)β k = β j β. k This is the case for any single-index model with E[yjx] = g(x 0 β). 3. ME j di ers with the point of evaluation x 4. For Poisson: β j is a semi-elasticity since ME j /E[yjx] = β j. c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 13 Colin / 52 C

14 4. Marginal e ects AME, MEM and MER AME, MEM and MER: Three estimates of ME 1. AME: Average marginal e ect = 1 N N i=1me j. Stata 11: margins, dydx(*) or Stata 10 add-on margeff. 2. MEM: Marginal e ect at mean = ME at x = x. Stata 11: margins, dydx(*) atmean or Stata 10 mfx. 3. MER: Marginal e ect at a representative value = ME at x = x. Stata 11: margins, dydx(*) at() or Stata 10 mfx. For Poisson these are (1) 1 N N i =1 exp(x0 i b β); (2) exp(x 0 b β); (3) exp(x 0b β). Preceding method uses derivatives. For discrete regressors use nite di erence method F ME = E[y jd = 1, x] E[y jd = 0, x] c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 14 Colin / 52 C

15 4. Marginal e ects Data Example Average Marginal e ect (AME) Large e ects: e.g. Doctor visits 90% higher for female Here AME is 10%-40% higher than MEM. * Marginal effects AME and MEM. margins, dydx(*) Warning: cannot perform check for estimable functions. Average marginal effects Number of obs = 4412 Model VCE : Robust Expression : Predicted number of events, predict() dy/dx w.r.t. : private chronic female income Delta-method dy/dx Std. Err. z P> z [95% Conf. nterval] private chronic female income c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 15 Colin / 52 C

16 4. Marginal e ects Data Example Marginal e ect at mean (MEM) Here AME was about 10%-40% higher than MEM. * Marginal effects AME. margins, dydx(*) atmean Warning: cannot perform check for estimable functions. Conditional marginal effects Number of obs = 4412 Model VCE : Robust Expression : Predicted number of events, predict() dy/dx w.r.t. : private chronic female income at : private = (mean) chronic = (mean) female = (mean) income = (mean) Delta-method dy/dx Std. Err. z P> z [95% Conf. nterval] private chronic female income c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 16 Colin / 52 C

17 4. Marginal e ects Data Example Finite di erence method used to compute AME Use pre x i. to declare regressors as indicator variables Compared to calculus AME: F lower for private, higher for chronic, similar for female.. * Marginal effects using finite difference for binary regressors. quietly poisson docvis i.private i.chronic i.female income, vce(robust). margins, dydx(*) Average marginal effects Number of obs = 4412 Model VCE : Robust Expression : Predicted number of events, predict() dy/dx w.r.t. : 1.private 1.chronic 1.female income Delta-method dy/dx Std. Err. z P> z [95% Conf. nterval] 1.private chronic female income Note: dy/dx for factor levels is the discrete change from the base level. More generally pre xes i. and c. create factor variables to get marginal e ects with interactions. c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 17 Colin / 52 C

18 5. Estimation Theory m-estimator 5. Estimation Theory: m-estimator An m-estimator bθ of the q 1 parameter vector θ maximizes Q N (θ) = 1 N N i=1 q(y i, x i, θ). Q N (θ) is the average or sum of N scalar sub-functions q i (). MLE: q(y i, x i, θ) = ln f (y i jx i, θ). OLS: q(y i, x i, β) = 1 2 (y i xi 0 β)2. NLS: q(y i, x i, β) = 1 2 (y i g(x i, β)) 2 for speci ed g(x i, β). Consider the local maximum of Q N (θ) that solves the estimating equation 1 q(y i, x i, θ) N N i=1 θ = 0. b θ c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 18 Colin / 52 C

19 5. Estimation Theory Consistency Consistency nformally bθ! p θ 0 if the sample condition 1 N N i=1 holds in the population: ] = 0. θ0 E[ q(y i,x i,θ) θ q(y i,x i,θ) θ = 0 b θ Here θ 0 is the true value in the data generating process (d.g.p.). Formal condition: bθ! p θ 0 if plim Q N (θ) is maximized at θ = θ 0. For Poisson with 1 N N i=1(y i exp(xi 0β))x i = 0 we require E[(y i exp(xi 0 β 0 ))x i ] = 0, which is the case if E[y i jx i ] = exp(xi 0 β 0 ). Poisson MLE consistent if the conditional mean is correctly speci ed. For other models correct mean may be insu cient for consistency. c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 19 Colin / 52 C

20 5. Estimation Theory Asymptotic normality Asymptotic normality De ne the following: gradient term: g i (θ) = q i (θ)/ θ. Hessian term: H i (θ) = 2 q i (θ)/ θ θ 0 = g i (θ)/ θ 0 Asymptotic normality: bθ a N [θ 0, V[bθ]], The variance-covariance matrix of the estimator (VCE) is V[bθ] = (E [ i H i (θ)]) 1 E i j g i (θ)g j (θ) 0 E i H i (θ) 0 1 f data are independent over i we use the robust sandwich estimate V[bθ] = 1 i H i (bθ) i g i (bθ)g i (bθ) 0 i H i (bθ) 0 1 c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 20 Colin / 52 C

21 5. Estimation Theory Limit normal derivation Limit normal derivation The rst-order conditions are 1 N N i=1 g i (bθ) = 0. An exact- rst order Taylor series expansion yields 1 N i g i (bθ) = 1 N i g i (θ 0 ) + 1 N i g i (θ) θ 0 θ + ( bθ θ 0 ) = 1 N i g i (θ 0 ) + 1 N i H i (θ + )(bθ θ 0 ). Set this to zero ( rst-order conditions) and solve (bθ θ 0 ) = 1 N i H i (θ + ) 1 1 N i g i (θ 0 ) Linearized so like ( β b 1 β 0 ) = 1N i x i x 1N i i x i u i. c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 21 Colin / 52 C

22 5. Estimation Theory Limit normal derivation We have p N( bθ θ 0 ) = 1 1 N i H i (θ + 1 ) p N i g i (θ 0 ) by a LLN and a CLT, where d! A0 1 N [0, B 0 ] d! N [0, A0 1 B 0A0 1] A 0 = plim 1 N i H i (θ 0 ) = lim E[ 1 N i H i (θ 0 )] B 0 = plim 1 N i j g i (θ 0 )g j (θ 0 ) 0 = lim E[ 1 N i j g i (θ 0 )g j (θ 0 ) 0 ] c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 22 Colin / 52 C

23 5. Estimation Theory Di erent Estimates of the VCE Di erent Estimates of the VCE Need to get estimate bv[bθ] of V[bθ] V[bθ] = (E [ i H i (θ)]) 1 E i j g i (θ)g j (θ) 0 (E [ i H i (θ) 0 ]) 1 Leading estimates Robust for independent: 1 bv rob [bθ] = i bh 1 N i N q i bg i bg 0 i i bh i Cluster-robust: 1 1 bv clus [bθ] = c bh C cc C 1 c bg c bg c 0 c bh cc Default for MLE: 1 bv def [bθ] = i E[H i (θ)]j b θ Clusters rewrite f.o.c. as C c=1 g c (bθ) = 0 for the c th of C clusters, where g c (θ) = i :i2c g i (θ). c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 23 Colin / 52 C

24 5. Estimation Theory Poisson Example Poisson Example For Poisson example N i=1 g i (β) = N i=1(y i exp(x 0 i β))x i N i=1 H i (β) = N i=1 exp(x 0 i β)x i x 0 i Three standard estimates are bv rob [ β] b = i e x0 bβ 1 i x i xi 0 i (y i e x0 bβ i ) 2 x i x 0 i i e x0 bβ i x i xi 0 bv clu [ β] b = i e x0 bβ 1 i x i xi 0 C c C 1 c bg c bg c 0 i e x0 bβ i x i xi 0 bv def [ β] b = i e x0 bβ 1 i x i xi bv clu [ b β] used bg c = i :i 2c (y i e x0 i bβ )x i. bv def [ b β] relies on E[(y i e x0 i β ) 2 ] = e x0 i β (so imposes V[yjx] = E[yjx]). c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 24 Colin / 52 C

25 6. Maximum likelihood estimator Properties 6. Maximum Likelihood Estimator (MLE) Special case of the preceding m-estimator theory. With N independent observations bθ ML maximizes the log-likelihood function: ln L(θ) = ln N i=1 f (y i jx i, θ) = N i=1 ln f (y i jx i, θ). The following properties hold provided essentially that 1. the true density is f (y i jx i, θ 0 ); and 2. the range of y does not depend on θ (regularity conditions) The MLE is consistent: bθ ML p! θ0. The MLE is asymptotically normal: bθ ML V[bθ ML ] = h 1 N i=1 E ln fi ln f i θ θ = ibθ 0 a N [θ, V[ bθ ML ]] with h N i=1 E 2 1 ln f i θ θ. ibθ 0 c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 25 Colin / 52 C

26 6. Maximum likelihood estimator Terminology ML Terminology Score: ln L(θ)/ θ. For correctly speci ed model E[ ln L(θ)/ θ] = 0. h i h i ln L(θ) nformation matrix: V θ = ln L(θ) ln L(θ) E θ θ 0. nformation matrix equality: h i h i ln L(θ) ln L(θ) E θ = θ 0 E 2 ln L(θ) θ θ 0 Holds if density correctly speci ed The asymptotic variance of the MLE is minus the inverse of the information matrix. The MLE is fully e cient as its variance matrix is the Cramer-Rao lower bound. c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 26 Colin / 52 C

27 6. Maximum likelihood estimator Quasi-MLE Quasi-MLE: Misspeci ed density What if f (y i jx i, θ) is misspeci ed? The MLE is then called the quasi-mle or pseudo-mle The quasi-mle is inconsistent in general. p bθ QML! θ that maximizes plim N 1 ln L(θ). n general θ 6= θ 0, though there are some notable exceptions. θ = θ 0 if the speci ed density is in the linear exponential family (Poisson, binomial, one-parameter gamma, normal) and E[y jx] is correctly speci ed. The quasi-mle is still asymptotically normal with bθ QML a N [θ, V[bθ QML ]]. Given independence over i, robust standard errors are based on h i bv[bθ QML ] = i E 2 ln f i b 1 h i θ θ 0 i E ln fi ln f i b 1 θ θ θ 0 θ h i i E 2 ln f i b 1 θ θ. 0 θ c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 27 Colin / 52 C

28 7. Nonlinear least squares De nition 7. Nonlinear Least Squares: De nition Nonlinear regression model: y i = g(x i, β) + u i, where the function g() is speci ed. Exponential mean example: g(x i, β) = exp(x 0 i β). Nonlinear least squares (NLS) estimator b β NLS minimizes the sum of squared residuals S N (β) = N i=1 (y i g(x i, β)) 2. There is no explicit solution for b β NLS in general. So compute using iterative gradient methods. c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 28 Colin / 52 C

29 7. Nonlinear least squares Properties Nonlinear Least Squares: Properties Assume that E[y i jx i ] = g(x i, β) so conditional mean correct. NLS is consistent and asymptotically normal with a bβ NLS N [β, V[ βnls b ]]. Given independence over i robust estimate of VCE is bv[ b β] = N i=1 g i g β i bβ N i=1 bu 2 i g i β 1 bβ β 0 g i bβ bβ β 0 N i=1 g i g β i bβ β 0 bβ 1 where bu i = (y i g(x i, b β)) is the NLS residual. Exactly the same as OLS except g i / βj bβ replaces x i! c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 29 Colin / 52 C

30 7. Nonlinear least squares Data Example NLS regression for the model y = exp(x 0 β) + u.. * Nonlinear least-squares regression (command nl). generate one = 1. nl (docvis = exp({xb: private chronic female income one})), vce(robust) nolog (obs = 4412) Nonlinear regression Number of obs = 4412 R-squared = Adj R-squared = Root MSE = Res. dev. = Robust docvis Coef. Std. Err. t P> t [95% Conf. nterval] /xb_private /xb_chronic /xb_female /xb_income /xb_one NLS slope coe s are: 0.71, 1.06, 0.43, Poisson slope coe s: 0.79, 1.09, 0.49, NLS robust standard errors are: 0.109, 0.056, 0.069, Poisson robust standard errors: 0.109, 0.056, 0.058, c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 30 Colin / 52 C

31 8. Statistical inference Wald test 8. Statistical inference: Wald test Wald test of linear restrictions on θ: same method as for OLS. Wald test of h nonlinear restrictions: H 0 : h(θ) = 0 against H a : h(θ) 6= 0. Base test on does h(bθ) ' 0? By Taylor series with R = h(θ)/ θ 0 The Wald test statistics are h(bθ) ' h(θ) + br(bθ θ) = br(bθ θ) under H 0 : h(θ) = 0 a N [0, brv[bθ]br 0 ] W = h(bθ) 0 [brbv[bθ]br 0 ] 1 h(bθ) a χ 2 (h) under H 0 F = (W/h) a F (h, N q) under H 0. c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 31 Colin / 52 C

32 8. Statistical inference Con dence interval (delta method) Con dence interval (delta method) Consider scalar γ = g(θ) for speci ed function g(). A 95% con dence interval for γ is then bγ 1.96 sbγ 2. where sbγ 2 = γ b θ 0 bv[bθ] γ θ θ. b θ Derivation: g(bθ) ' g(θ) + g 0 (bθ)(bθ θ) bγ ' γ + γ b ( bθ θ) θ 0 θ Var[bγ] = γ b Var[ bθ] γ θ 0 θ θ b θ called the delta method as derivative taken. c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 32 Colin / 52 C

33 8. Statistical inference Data example Data example Wald test of H 0 : β 2 /β 2 = 1 against H a : β 2 /β 2 = 1. 95% con dence interval for γ = β 2 /β * Wald test of nonlinear hypothesis. quietly poisson docvis private chronic female income, vce(robust). testnl _b[female]/_b[private]=1 (1) _b[female]/_b[private] = 1 chi2(1) = Prob > chi2 = * Delta method confidence interval. nlcom _b[female]/_b[private] - 1 _nl_1: _b[female]/_b[private] - 1 docvis Coef. Std. Err. z P> z [95% Conf. nterval] _nl_ Reject H 0 as p = ; 95% conf. interval is ( 0.59, 0.18). c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 33 Colin / 52 C

34 8. Statistical inference Likelihood-based tests Likelihood-based tests Test H 0 : h(θ 0 ) = 0 which imposes h restrictions. Three tests Wald, likelihood ratio, and Lagrange multiplier (or score) test are asymptotically equivalent under H 0 (all χ 2 (h)) asymptotically equivalent under local alternatives h(θ 0 ) = c/ p N. Wald most often used as can robustify. De ne L(θ) is the likelihood function. bθ u = unrestricted MLE that maximizes ln L(θ). eθ r = restricted MLE that maximizes ln L(θ) λ 0 h(θ). (where λ is vector of Lagrangian multipliers.) c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 34 Colin / 52 C

35 8. Statistical inference Likelihood-based tests 1 Wald test: Does h(bθ u ) ' 0? W = h(bθ u ) 0 [brv[ b β]br 0 ] 1 h(bθ u ). 2 Likelihood Ratio Test: Does ln L(eθ r ) ' ln L(bθ u )? LR = 2[ln L(eθ r ) ln L(bθ u )]. 3 Score test: Does ln L(θ)/ θ ' 0 when evaluated at θ = eθ r? [L(θ) is for unrestricted model so ln L(θ)/ θ = 0 at θ = bθ u ]. Score = LM = ln L e θ 0 (N ea 1 ) ln L θ θ r 4 LM test: does Lagrange multiplier λ e r ' 0? Equals 3. as maximizing ln L(θ) λ 0 h(θ) w.r.t. θ implies λ e r. e θ r ln L θ e θ r = h(θ)0 θ Score vector is a full rank matrix multiple of the Lagrange multipliers. c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 35 Colin / 52 C e θ r

36 9. Gradient methods De nition 11. Gradient methods Consider estimator bθ that is a local maximum, solving g(bθ) = 0, where g(θ) = Q (θ) θ. terative gradient methods update the s th round estimate bθ s by a matrix weighted average of the gradient bθ s+1 = bθ s + A s g(bθ s ), s = 1,..., S. motivation: change bθ in direction with greatest impact on Q(bθ) i.e. where the gradient is largest, though need to scale the gradient. the weighting matrix A s is a q q matrix that depends on bθ s Newton-Raphson is leading example with A s = H(θ) = g (θ) θ 0 1 = 2 Q (θ) θ θ 0 1 is the Hessian matrix for the optimization problem. H(bθ s ) 1, where c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 36 Colin / 52 C

37 9. Gradient methods Newton-Raphson Newton-Raphson Motivate NR: second-order Taylor series expansion around bθ s Q(θ) = Q(bθ s ) + g(bθ s ) 0 (θ bθ s ) (θ bθ s ) 0 H(bθ s )(θ bθ s ) + R, where R is a remainder that we now ignore. Maximize the approximation Q (θ) w.r.t. respect to θ. Q (θ) θ = g(bθ s ) + H(bθ s )(θ bθ s ) = 0. Solving yields the Newton-Raphson iteration (θ bθ s ) = H(bθ s ) 1 g(bθ s ). This increases Q(θ) if H(bθ s ) is negative de nite (and we ignore R) This works especially well if Q(θ) is globally concave. c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 37 Colin / 52 C

38 9. Gradient methods Stopping criteria Gradient methods: Stopping criteria Stop iterations when (1) small change in the coe cient vector (tolerance) (2) small change in the objective function (ltolerance) (3) small gradient relative to the Hessian (nrtolerance) (4) small gradient relative to the coe cients (gtolerance) (5) maximum number of iterations reached (i.e. NOT CONVERGED) Stata default values for these criteria can be changed See help maximize. Stata built-in commands use (1), (2) and (5) only. c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 38 Colin / 52 C

39 9. Gradient methods Poisson Example Poisson Example For the Poisson MLE Q(β) = 1 N N i=1f exp(x 0 i β) + y i x 0 i β ln y i!g g(β) = 1 N N i=1(y i exp(x 0 i β))x i H(β) = 1 N N i=1 exp(x 0 i β)x i x 0 i. The Newton-Raphson iterations are bβ s+1 = β b h s + 1 N N i=1 exp(xi 0 β b i 1 s )x i xi 0 1 N N i=1(y i exp(xi 0 β b s ))x i. H(β) = X 0 DX where X is N k regressors and D = Diag[exp(xi 0 β)] is an N N diagonal matrix with positive entries. So H(β) is negative de nite for all β and objective function is globally concave. NR will work very well here unless regressors are highly multicollinear. c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 39 Colin / 52 C

40 9. Gradient methods Application Application Apply Newton-Raphson to Poisson MLE using Mata.. * Newton-Raphson in Mata for Poisson MLE. * Set up data and local macros for dependent variable and regressors. generate cons = 1. local y docvis. local xlist private chronic female income cons. * Mata commands for Poisson MLE NR iterations. mata: mata (type end to exit) : st_view(y=.,., "`y'") // read in stata data to y and X : st_view(x=.,., tokens("`xlist'")) : b = J(cols(X),1,0) // compute starting values : n = rows(x) : iter = 1 // initialize number of iterations : cha = 1 // initialize stopping criterion c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 40 Colin / 52 C

41 9. Gradient methods Application : do { > mu = exp(x*b) > grad = X'(y-mu) // k x 1 gradient vector > hes = cross(x, mu, X) // negative of the k x k Hessian matrix > bold = b > b = bold + cholinv(hes)*grad > cha = (bold-b)'(bold-b)/(bold'bold) > iter = iter + 1 > } while (cha > 1e-16) // end of iteration loops : mu = exp(x*b) : hes = cross(x, mu, X) : vgrad = cross(x, (y-mu):^2, X) : vb = cholinv(hes)*vgrad*cholinv(hes)*n/(n-cols(x)) : iter // num iterations 13 : cha // stopping criterion e-24 : st_matrix("b",b') // pass results from Mata to Stata : st_matrix("v",vb) // pass results from Mata to Stata : end c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 41 Colin / 52 C

42 9. Gradient methods Application Mata computes b β in b and bv[ b β] in V. Use Stata command ereturn to nicely display results.. * Present results, nicely formatted using Stata command ereturn. matrix colnames b = `xlist'. matrix colnames V = `xlist'. matrix rownames V = `xlist'. ereturn post b V. ereturn display Coef. Std. Err. z P> z [95% Conf. nterval] private chronic female income cons Results are the same as from Stata command poisson. c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 42 Colin / 52 C

43 10. Appendix: Theory for Extremum estimator De nition 10. Appendix: Theory for extremum estimator More general framework than m-estimators. Additionally includes generalized method of moments (GMM). Extremum estimator bθ maximizes Q N (θ) = Q N (y, X, θ). For a global maximum, so bθ = arg max θ2θ Q N (θ). Usually a local maximum, solving Q N (θ) θ = 0. b θ Leading examples: MLE, NLS, and GMM. c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 43 Colin / 52 C

44 10. Appendix: Theory for Extremum estimator Consistency Consistency f Q N (θ) p! Q 0 (θ) (where Q 0 (θ) is nonstochastic) then the local maximum p! to each other So bθ p! θ 0 where θ 0 is the local maximum of Q 0 (θ). c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 44 Colin / 52 C

45 10. Appendix: Theory for Extremum estimator Consistency Consistency of Local Maximum (Amemiya (1985, Theorem 4.1.2)). Make the assumptions: (i) The parameter space Θ is an open subset of R q ; (ii) Q N (θ) is a measurable function of the data for all θ 2 Θ, and Q N (θ)/ θ exists and is continuous in an open neighborhood of θ 0 ; (iii) The objective function Q N (θ) converges uniformly in probability to Q 0 (θ) in an open neighborhood of θ 0, and Q 0 (θ) attains a unique local maximum at θ 0. Then one of the solutions to Q N (θ)/ θ = 0 is consistent for θ 0. c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 45 Colin / 52 C

46 10. Appendix: Theory for Extremum estimator Limit normal distribution Limit normal distribution Based on exact rst-order Taylor series (mean-value theorem). For f () di erentiable there is always x + between x and x 0 such that f (x) = f (x 0 ) + f 0 (x + )(x x 0 ). Extend to f() a vector function of the vector θ f(bθ) = f(θ 0 ) + f(θ) θ 0 (bθ θ 0 ), θ + Here f(θ) = Q N (θ)/ θ is already a rst derivative. Then Q N (θ) = Q N (θ) θ θ + 2 Q N (θ) θ0 θ θ 0 (bθ θ 0 ). θ + b θ Set the right-hand side to 0 and solve for p N( bθ θ 0 ) p N(bθ θ 0 ) = 2 Q N (θ) θ θ 0 θ + 1 p Q N (θ) N θ. θ0 c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 46 Colin / 52 C

47 10. Appendix: Theory for Extremum estimator Limit normal distribution Asymptotic Normality of Local Maximum (Amemiya (1985, Theorem 4.1.3)). n addition to preceding assumptions for consistency assume: (i) 2 Q N (θ)/ θ θ 0 exists and is continuous in an open convex neighborhood of θ 0 ; (ii) 2 Q N (θ)/ θ θ 0 θ + converges in prob. to nite nonsingular matrix A 0 = plim 2 Q N (θ)/ θ θ 0 θ0 for any sequence θ + such that θ + p! θ 0 ; (iii) p d N Q N (θ)/ θj θ0! N [0, B0 ], where B 0 = plim hn Q N (θ)/ θ Q N (θ)/ θ 0 i θ0. Then p N( bθ θ 0 ) d! N [0, A 1 0 B 0A 1 0 ] where bθ is the consistent solution to Q N (θ)/ θ = 0. c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 47 Colin / 52 C

48 11. Appendix: Theory for Poisson MLE Consistency 11. Appendix: Theory for Poisson MLE The Poisson quasi-mle maximizes Q N (β) = N 1 i n o e x0 i β + y i xi 0 β ln y i! Then n o Q 0 (β) = plimn 1 i e x0 i β + y i xi 0 β ln y i! n h = lim N 1 i E e x0 βi o i + E[y i xi 0 β] E [ln y i!] by LLN n h = lim N 1 i E e x0 βi h i o i + E e x0 i β 0 x 0 i β E [ln y i!]. The key step is plim N 1 i y i x 0 i β = lim N 1 i E y i xi 0 β by LLN h i = lim N 1 i E e x0 i β 0 x 0 i β if E[y i jx i ] = exp(xi 0 β 0 ). c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 48 Colin / 52 C

49 11. Appendix: Theory for Poisson MLE Consistency Poisson Consistency Then Q 0 (β) = lim N 1 i n h E e x0 βi h i i + E e x0 i β 0 x 0 i β o E [ln y i!] so (noting that E[ln y i!] depends on β 0 and not β) Q 0 (β) β i i = lim N 1 i E he x0 i β x i + lim N 1 i E he x0 i β 0 xi = 0 at β = β 0. So b β is consistent for β as Q N (β)! p Q 0 (β) and Q 0 (β) is maximized at β = β 0. c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 49 Colin / 52 C

50 11. Appendix: Theory for Poisson MLE Consistency A simpler heuristic proof is that β b is consistent for β if " # Q N (β) E = N 1 β i (y i exp(xi 0 β 0 ))x i β0 = 0. This is the case if the d.g.p. such that E[y i jx i ] = exp(x 0 i β 0 ) i.e. that the conditional mean is correctly speci ed this result holds more generally if the speci ed density is in linear exponential family. c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 50 Colin / 52 C

51 11. Appendix: Theory for Poisson MLE Limit normal distribution Poisson MLE: Limit normal distribution For Poisson p N( b β β0 ) = = First term by a LLN N 1 i 1 p N 1 i H i (β ) + N i g i (β 0 ) 1 N 1 i e x0 i β+ x i xi 0 N 1/2 i (y i e x0 i β 0 )x e x0 i β+ x i x 0 i p! A 0 = lim i E[e x0 i β 0 xi x 0 i ] Second term by a CLT and with independence over i N 1/2 i (y i exp(xi 0 β 0 ))x i h i p! N 0, B 0 = lim N 1 i E[E[(y i e x0 i β 0 ) 2 jx i ]x i xi 0 ] Combining p N( b β β0 ) d! N [0, A 1 0 B 0A 1 0 ]. c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 51 Colin / 52 C

52 11. Appendix: Theory for Poisson MLE Limit normal distribution Note that if y i is Poisson distributed then E[(y i exp(x 0 i β 0 )) 2 jx i ] = V[y i jx i ] = E[y i jx i ] = exp(x 0 i β 0 ) so B 0 = A 0 and p N( β b β0 )! d N [0, A0 1]. c A. Colin Cameron Univ. of Calif.- Davis (Frontiers BGPE in Course: Econometrics Nonlinear Bavarian methods Graduate Program inmarch Economics 21-25, Based on A. 52 Colin / 52 C

Day 2A Instrumental Variables, Two-stage Least Squares and Generalized Method of Moments

Day 2A Instrumental Variables, Two-stage Least Squares and Generalized Method of Moments Day 2A nstrumental Variables, Two-stage Least Squares and Generalized Method of Moments c A. Colin Cameron Univ. of Calif.- Davis Frontiers in Econometrics Bavarian Graduate Program in Economics. Based