Advanced Econometrics

Advanced Econometrics Dr. Andrea Beccarini Center for Quantitative Economics Winter 2013/2014 Andrea Beccarini (CQE) Econometrics Winter 2013/2014 1 / 156

General information: Aims and prerequisites
Objective: learn to understand and use advanced econometric estimation techniques
Applications in micro- and macroeconometrics and finance
Prerequisites: Statistical Foundations (random vectors, stochastic convergence, estimators)

General information: Literature
Russell Davidson and James MacKinnon, Econometric Theory and Methods, Oxford University Press, 2004.
Various textbooks

General information: Schedule
Least squares estimation and method of moments
Maximum likelihood estimation
Instrumental variables estimation
GMM
Indirect inference

Least squares: Linear regression
Multiple linear regression model: $y = X\beta + u$, $u \sim N(0, \sigma^2 I)$
OLS estimator: $\hat\beta = (X'X)^{-1} X'y$
Covariance matrix: $\mathrm{Cov}(\hat\beta) = \sigma^2 (X'X)^{-1}$
Gauss-Markov theorem

Least squares: Nonlinear regression
Notation of Davidson and MacKinnon (2004): $y_t = x_t(\beta) + u_t$, $u_t \sim \mathrm{IID}(0, \sigma^2)$
$x_t(\beta)$ is a nonlinear function of the parameter vector $\beta$
Example: $y_t = \beta_1 + \beta_2 x_{t1} + \frac{1}{\beta_2} x_{t2} + u_t$

Least squares: Nonlinear regression
Minimize the sum of squared residuals $\sum_{t=1}^{T} (y_t - x_t(\beta))^2$ with respect to $\beta$
Usually, the minimization must be done numerically
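The numerical minimization can be sketched for the example $y_t = \beta_1 + \beta_2 x_{t1} + \frac{1}{\beta_2} x_{t2} + u_t$. This is a minimal illustration, not the course's own code: the simulated data, the parameter values, the search interval for $\beta_2$ and the use of a plain grid search (instead of a Newton-type optimizer) are all assumptions. For a fixed $\beta_2$ the best $\beta_1$ is a sample mean, so only $\beta_2$ has to be searched.

```python
import random

def simulate(n, beta1, beta2, sigma, seed=0):
    """Draw (y, x1, x2) from y = beta1 + beta2*x1 + x2/beta2 + u (assumed design)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        y = beta1 + beta2 * x1 + x2 / beta2 + rng.gauss(0, sigma)
        data.append((y, x1, x2))
    return data

def concentrated_ssr(beta2, data):
    """SSR for a given beta2 after concentrating out beta1 (its optimum is a mean)."""
    partial = [y - beta2 * x1 - x2 / beta2 for y, x1, x2 in data]
    beta1 = sum(partial) / len(partial)
    ssr = sum((p - beta1) ** 2 for p in partial)
    return ssr, beta1

def nls_grid(data, lo=0.5, hi=4.0, steps=700):
    """Grid search over beta2 on [lo, hi]; returns (beta1_hat, beta2_hat)."""
    best = None
    for k in range(steps + 1):
        b2 = lo + (hi - lo) * k / steps
        ssr, b1 = concentrated_ssr(b2, data)
        if best is None or ssr < best[0]:
            best = (ssr, b1, b2)
    return best[1], best[2]

data = simulate(n=400, beta1=1.0, beta2=2.0, sigma=0.5)
b1_hat, b2_hat = nls_grid(data)
print(b1_hat, b2_hat)
```

A grid search is slow but cannot get stuck in a local minimum inside the interval; in practice a gradient-based optimizer started from several values would usually be preferred.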

Method of moments: Definition of moments
Raw moment of order $p$: $\mu_p = E(X^p)$
Empirical raw moment of order $p$: $\hat\mu_p = \frac{1}{n} \sum_{i=1}^{n} X_i^p$ for a simple random sample $X_1, \ldots, X_n$

Method of moments: Basic idea, Step 1
Write $r$ theoretical moments as functions of $r$ unknown parameters:
$\mu_1 = g_1(\theta_1, \ldots, \theta_r)$, ..., $\mu_r = g_r(\theta_1, \ldots, \theta_r)$
Of course, central moments may be used as well

Method of moments: Basic idea, Step 2
Invert the system of equations: write the $r$ unknown parameters as functions of the $r$ theoretical moments:
$\theta_1 = h_1(\mu_1, \ldots, \mu_r)$, ..., $\theta_r = h_r(\mu_1, \ldots, \mu_r)$

Method of moments: Basic idea, Step 3
Replace all theoretical moments by empirical moments:
$\hat\theta_1 = h_1(\hat\mu_1, \ldots, \hat\mu_r)$, ..., $\hat\theta_r = h_r(\hat\mu_1, \ldots, \hat\mu_r)$
The estimators $\hat\theta_1, \ldots, \hat\theta_r$ are moment estimators

Method of moments: Properties of moment estimators
Moment estimators are consistent since, for continuous $h_1$,
$\mathrm{plim}\,\hat\theta_1 = \mathrm{plim}\, h_1(\hat\mu_1, \hat\mu_2, \ldots) = h_1(\mathrm{plim}\,\hat\mu_1, \mathrm{plim}\,\hat\mu_2, \ldots) = h_1(\mu_1, \mu_2, \ldots) = \theta_1$
In general, moment estimators are neither unbiased nor efficient
Since the empirical moments are asymptotically normal (why?), moment estimators are also asymptotically normal (delta method) [P]

Method of moments: Example
Let $X \sim \mathrm{Exp}(\lambda)$ with unknown parameter $\lambda$ and let $X_1, \ldots, X_n$ be a random sample
Step 1: We know that $E(X) = \mu_1 = 1/\lambda$
Step 2 (inversion): $\lambda = 1/\mu_1$
Step 3: The estimator is $\hat\lambda = \frac{1}{\hat\mu_1} = \frac{1}{\frac{1}{n}\sum_i X_i} = \frac{1}{\bar X_n}$
Is $\hat\lambda$ unbiased?
Alternative: $\mathrm{Var}(X) = 1/\lambda^2$, then $\hat\lambda = 1/\sqrt{S^2}$
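The two moment estimators for the exponential example can be compared on simulated data. This is only a sketch: the true value $\lambda = 2$, the sample size and the seed are assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(1)
lam_true = 2.0                      # assumed true parameter
sample = [rng.expovariate(lam_true) for _ in range(5000)]

# Estimator based on the first raw moment: E(X) = 1/lambda
lam_mean = 1.0 / (sum(sample) / len(sample))

# Alternative based on the variance: Var(X) = 1/lambda^2
lam_var = 1.0 / statistics.variance(sample) ** 0.5

print(lam_mean, lam_var)
```

Both estimators are consistent, but they generally give different values in a finite sample, which shows that the method of moments does not deliver a unique estimator.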

Maximum likelihood: Basic idea
The basic idea is very natural: choose the parameters such that the probability (likelihood) of the observations $x_1, \ldots, x_n$, as a function of the unknown parameters $\theta_1, \ldots, \theta_r$, is maximized
Likelihood function:
$L(\theta; x_1, \ldots, x_n) = \begin{cases} P(X_1 = x_1, \ldots, X_n = x_n; \theta) & \text{(discrete case)} \\ f_{X_1, \ldots, X_n}(x_1, \ldots, x_n; \theta) & \text{(continuous case)} \end{cases}$

Maximum likelihood: Basic idea
For simple random samples: $L(\theta; x_1, \ldots, x_n) = \prod_{i=1}^{n} f_X(x_i; \theta)$
Maximize the likelihood: $L(\hat\theta; x_1, \ldots, x_n) = \max_{\theta \in \Theta} L(\theta; x_1, \ldots, x_n)$
ML estimate: $\hat\theta = \arg\max L(\theta; x_1, \ldots, x_n)$
ML estimator: $\hat\theta = \arg\max L(\theta; X_1, \ldots, X_n)$

Maximum likelihood: Basic idea
Because sums are easier to deal with than products, and because sums are subject to limit laws, it is common to maximize the log-likelihood
$\ln L(\theta) = \sum_{i=1}^{n} \ln f_X(X_i; \theta)$
The ML estimator is the same as before, since
$\hat\theta = \arg\max \ln L(\theta; X_1, \ldots, X_n) = \arg\max L(\theta; X_1, \ldots, X_n)$

Maximum likelihood: Basic idea
Usually, we find $\hat\theta$ by solving the system of equations
$\partial \ln L / \partial \theta_1 = 0$, ..., $\partial \ln L / \partial \theta_r = 0$
The gradient vector $g(\theta) = \partial \ln L(\theta) / \partial \theta$ is called the score vector or score
If the log-likelihood is not differentiable, other maximization methods must be used

Maximum likelihood: Example
Let $X \sim \mathrm{Exp}(\lambda)$ with density $f(x; \lambda) = \lambda e^{-\lambda x}$ for $x \ge 0$ and $f(x; \lambda) = 0$ else
Likelihood of an i.i.d. random sample: $L(\lambda; x_1, \ldots, x_n) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i}$
Log-likelihood: $\ln L(\lambda; x_1, \ldots, x_n) = n \ln \lambda - \lambda \sum_{i=1}^{n} x_i$

Maximum likelihood: Example
Set the derivative to zero:
$\frac{\partial \ln L(\lambda)}{\partial \lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i \overset{!}{=} 0$, hence $\hat\lambda = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar x}$
The ML estimator for $\lambda$ is $\hat\lambda = 1/\bar X$
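A quick numerical check of the exponential example: the score should vanish at $\hat\lambda = 1/\bar x$, and the log-likelihood should be lower at neighbouring values. The simulated sample (true $\lambda = 4$, $n = 1000$, fixed seed) is an assumption made only for this sketch.

```python
import math
import random

def log_likelihood(lam, xs):
    """ln L(lambda) = n ln(lambda) - lambda * sum(x_i)."""
    return len(xs) * math.log(lam) - lam * sum(xs)

def score(lam, xs):
    """Derivative of the log-likelihood: n / lambda - sum(x_i)."""
    return len(xs) / lam - sum(xs)

rng = random.Random(2)
xs = [rng.expovariate(4.0) for _ in range(1000)]   # assumed true lambda = 4

lam_hat = len(xs) / sum(xs)                        # analytical ML estimate 1 / x-bar
print(lam_hat, score(lam_hat, xs))
print(log_likelihood(lam_hat, xs) > log_likelihood(1.05 * lam_hat, xs))
```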

Maximum likelihood: Properties of ML estimators, preliminaries
The log-likelihood and the score vector are
$\ln L(\theta) = \sum_{i=1}^{n} \ln f_X(X_i; \theta)$ and $\frac{\partial \ln L(\theta)}{\partial \theta} = \sum_{i=1}^{n} \frac{\partial \ln f_X(X_i; \theta)}{\partial \theta}$
The contributions $\ln f_X(X_i; \theta)$ are random variables
The contributions $\partial \ln f_X(X_i; \theta) / \partial \theta$ are random vectors
Hence, limit laws can be applied to the (normalized) sums

Maximum likelihood: Properties of ML estimators, preliminaries
For all $\theta$,
$\int e^{\ln L(\theta)}\, dx = \int L(\theta; x_1, \ldots, x_n)\, dx = 1$
since $L(\theta)$ is a joint density function of $X_1, \ldots, X_n$

Maximum likelihood: Properties of ML estimators, preliminaries
Define the matrix $G(\theta, X_1, \ldots, X_n)$ of gradient contributions: $G_{ij}(\theta, X_i) = \frac{\partial \ln f_X(X_i; \theta)}{\partial \theta_j}$
The column sums are the gradient vector with elements $g_j(\theta) = \sum_{i=1}^{n} G_{ij}(\theta, X_i)$
The expected gradient vector is $E_\theta(g(\theta)) = 0$ [P]

Maximum likelihood: Properties of ML estimators, preliminaries
The covariance matrix of the gradient vector, $\mathrm{Cov}(g(\theta)) = E(g(\theta)\, g(\theta)')$, is called the information matrix (and often denoted $I(\theta)$)
Information matrix equality [P]: $\mathrm{Cov}(g(\theta)) = -E(H(\theta))$, that is,
$\mathrm{Cov}\left(\frac{\partial \ln L(\theta)}{\partial \theta}\right) = -E\left(\frac{\partial^2 \ln L(\theta)}{\partial \theta\, \partial \theta'}\right)$

Maximum likelihood: Properties of ML estimators
1. Equivariance: If $\hat\theta$ is the ML estimator for $\theta$, then $h(\hat\theta)$ is the ML estimator for $h(\theta)$
2. Consistency: $\mathrm{plim}\,\hat\theta_n = \theta$
3. Asymptotic normality: $\sqrt{n}(\hat\theta_n - \theta) \overset{d}{\to} U \sim N(0, V(\theta))$
4. Asymptotic efficiency: $V(\theta)$ is the Cramér-Rao bound
5. Computability (analytical or numerical); the covariance matrix of the estimator is a by-product of the numerical method

Maximum likelihood: Properties of ML estimators
Equivariance: Let $\hat\theta$ be the ML estimator of $\theta$
Let $\psi = h(\theta)$ be a one-to-one function of $\theta$ with inverse $h^{-1}(\psi) = \theta$
Then the ML estimator of $\psi$ satisfies
$\frac{d \ln L(h^{-1}(\psi))}{d\psi} = \frac{d \ln L(\theta)}{d\theta} \cdot \frac{d h^{-1}(\psi)}{d\psi} = 0$
which holds at $\hat\psi = h(\hat\theta)$

Maximum likelihood: Properties of ML estimators
Consistency
The parameter $\theta$ is identified if for all $\tilde\theta \ne \theta$ and data $x_1, \ldots, x_n$:
$\ln L(\tilde\theta \mid x_1, \ldots, x_n) \ne \ln L(\theta \mid x_1, \ldots, x_n)$
The parameter $\theta$ is asymptotically identified if for all $\tilde\theta \ne \theta_0$:
$\mathrm{plim}\, \frac{1}{n} \ln L(\tilde\theta) \ne \mathrm{plim}\, \frac{1}{n} \ln L(\theta_0)$
where $\theta_0$ is the true value of the parameter [P]

Maximum likelihood: Properties of ML estimators
Asymptotic normality
By definition, the ML estimator satisfies $g(\hat\theta) = 0$
A first-order Taylor series expansion of $g$ around the true parameter vector $\theta_0$ gives [P]
$g(\hat\theta) = g(\theta_0) + H(\theta_0)(\hat\theta - \theta_0) + \text{rest}$

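Maximum likelihood: Covariance matrix estimation
The (approximate) covariance matrix of $\hat\theta$ is
$\mathrm{Cov}(\hat\theta) = [-E(H(\theta_0))]^{-1} = \left[-E\left(\frac{\partial^2 \ln L(\theta_0)}{\partial \theta_0\, \partial \theta_0'}\right)\right]^{-1}$
A consistent estimator of $\mathrm{Cov}(\hat\theta)$ is
$\widehat{\mathrm{Cov}}(\hat\theta) = [-H(\hat\theta)]^{-1} = \left(-\frac{\partial^2 \ln L(\hat\theta)}{\partial \hat\theta\, \partial \hat\theta'}\right)^{-1}$
Often, $H(\hat\theta)$ is a by-product of numerical optimization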

Maximum likelihood: Covariance matrix estimation
An alternative consistent covariance matrix estimator is
$\widehat{\mathrm{Cov}}(\hat\theta) = \left[G(\hat\theta; X_1, \ldots, X_n)'\, G(\hat\theta; X_1, \ldots, X_n)\right]^{-1}$
This estimator is called the outer-product-of-the-gradient (OPG) estimator
Advantage: Only the first derivatives are required
Disadvantage: Less reliable in small samples

Maximum likelihood: Example
Numerical estimation of the parameters of $N(\mu, \sigma^2)$
Let $X_1, \ldots, X_{50}$ be a random sample from $X \sim N(\mu, \sigma^2)$ with $\mu = 5$ and $\sigma^2 = 9$
Density function: $f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2} \frac{(x - \mu)^2}{\sigma^2}\right)$
Log-likelihood function: $\ln L(\mu, \sigma^2) = \sum_{i=1}^{n} \ln f_X(x_i)$

Maximum likelihood: Example
See numnormal.r
Point estimates: $(\hat\mu, \hat\sigma^2)' = (3.64025, 6.90869)'$
Estimated covariance matrix derived numerically from $H(\hat\theta)$:
$\widehat{\mathrm{Cov}}(\hat\mu, \hat\sigma^2) = \begin{pmatrix} 0.13817 & 0.00016 \\ 0.00016 & 1.90918 \end{pmatrix}$

Maximum likelihood: Example
See numnormal.r
Point estimates: $(\hat\mu, \hat\sigma^2)' = (3.64025, 6.90869)'$
Estimated covariance matrix derived from theory:
$\widehat{\mathrm{Cov}}(\hat\mu, \hat\sigma^2) = \begin{pmatrix} 0.13817 & 0 \\ 0 & 1.90920 \end{pmatrix}$
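The theoretical covariance matrix can be reproduced from the point estimate $\hat\sigma^2$ alone. The sketch below is not the contents of numnormal.r; it uses the standard result that the inverse expected information for $(\mu, \sigma^2)$ in the normal model is diagonal with entries $\sigma^2/n$ and $2\sigma^4/n$, evaluated at $\hat\sigma^2 = 6.90869$ and $n = 50$. The function name is hypothetical.

```python
def normal_ml_covariance(sigma2_hat, n):
    """Inverse expected information for (mu, sigma^2) in the N(mu, sigma^2) model,
    evaluated at the ML estimate of sigma^2."""
    return [[sigma2_hat / n, 0.0],
            [0.0, 2.0 * sigma2_hat ** 2 / n]]

cov = normal_ml_covariance(sigma2_hat=6.90869, n=50)
for row in cov:
    print([round(v, 5) for v in row])
```

The diagonal entries agree with the values on the slide to the reported precision, and the numerically derived matrix differs from them only in the last digits.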

Maximum likelihood: Example of violated regularity conditions
Let $X$ be uniformly distributed on the interval $[0, \theta]$
The density function is $f_X(x) = \begin{cases} 1/\theta & \text{for } 0 \le x \le \theta \\ 0 & \text{else} \end{cases}$
The likelihood function is $L(\theta \mid x_1, \ldots, x_n) = \begin{cases} \left(\frac{1}{\theta}\right)^n & \text{for } \theta \ge \max_i x_i \\ 0 & \text{else} \end{cases}$

Maximum likelihood: Example of violated regularity conditions
[Figure: likelihood $L(\theta)$ plotted against $\theta$]
$L(\theta)$ is not differentiable at $\max_i x_i$
The maximum is at $\hat\theta = \max_i x_i$
The estimator is consistent but not asymptotically normal
Illustration in R

Maximum likelihood: Dependent observations
Maximum likelihood estimation is still possible if the observations are dependent
The joint density of the observations can be factorized as
$f_{X_1, \ldots, X_T}(x_1, \ldots, x_T) = f_{X_1}(x_1) \prod_{t=2}^{T} f_{X_t \mid X_1 = x_1, \ldots, X_{t-1} = x_{t-1}}(x_t)$

Maximum likelihood: Dependent observations
Log-likelihood: $\ln L = \ln f_{X_1}(x_1) + \sum_{t=2}^{T} \ln f_{X_t \mid X_1 = x_1, \ldots, X_{t-1} = x_{t-1}}(x_t)$
If $T$ is large, one may ignore $\ln f_{X_1}(x_1)$
Computing the log-likelihood is straightforward if
$f_{X_t \mid X_1 = x_1, \ldots, X_{t-1} = x_{t-1}}(x_t) = f_{X_t \mid X_{t-1} = x_{t-1}}(x_t)$

Maximum likelihood: The three classical tests
Wald test, Lagrange multiplier test and likelihood ratio test (W, LM, LR)
Hypotheses: $H_0: r(\theta) = 0$ vs $H_1: r(\theta) \ne 0$
Often, $r$ is a scalar-valued function and $\theta$ is a scalar
The function $r$ may be nonlinear!

Maximum likelihood: The three classical tests
Basic test ideas:
Wald test: If $r(\theta) = 0$ is true, then $r(\hat\theta_{ML})$ will be close to 0
Likelihood ratio test: If $r(\theta) = 0$ is true, then $\ln L(\hat\theta_R)$ will not be far below $\ln L(\hat\theta_{ML})$
Lagrange multiplier test: If $r(\theta) = 0$ is true, the score function $g(\hat\theta_R) = \partial \ln L(\hat\theta_R) / \partial \theta$ will be close to 0

Maximum likelihood: The three classical tests
Example: Let $X_1, \ldots, X_n$ be a random sample from $X \sim \mathrm{Exp}(\lambda)$
Test $H_0: \lambda = 4$ against $H_1: \lambda \ne 4$
Different notation: $H_0: r(\lambda) = 0$ where $r(\lambda) = \lambda - 4$
See threetests.r

Maximum likelihood: Wald test
Hypotheses: $H_0: r(\theta) = 0$, $H_1: r(\theta) \ne 0$ with functions $r = (r_1, \ldots, r_m)'$
$m$ is the number of restrictions
Wald test: If $r(\theta) = 0$ is true, then $r(\hat\theta_{ML})$ will be close to 0

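Maximum likelihood: Wald test
Asymptotically, under $H_0$ (by the delta method!),
$r(\hat\theta_{ML}) \sim N\left(0, \mathrm{Cov}(r(\hat\theta_{ML}))\right)$ with $\mathrm{Cov}(r(\hat\theta_{ML})) = \frac{\partial r(\hat\theta_{ML})}{\partial \theta'}\, \mathrm{Cov}(\hat\theta_{ML})\, \frac{\partial r(\hat\theta_{ML})'}{\partial \theta}$
Remember: If $X \sim N(\mu, \Sigma)$, then $(X - \mu)' \Sigma^{-1} (X - \mu) \sim \chi^2_m$
Wald test statistic: $W = r(\hat\theta_{ML})' \left[\mathrm{Cov}(r(\hat\theta_{ML}))\right]^{-1} r(\hat\theta_{ML}) \overset{asy}{\sim} \chi^2_m$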

Maximum likelihood: Wald test
Remarks:
Reject $H_0$ if $W$ is larger than the $(1 - \alpha)$-quantile of the $\chi^2_m$-distribution
Usually, $\mathrm{Cov}(r(\hat\theta_{ML}))$ must be replaced by $\widehat{\mathrm{Cov}}(r(\hat\theta_{ML}))$
The Wald test is not invariant with respect to re-parametrizations
The Wald test only requires the unrestricted ML estimator
Ideal if $\hat\theta_{ML}$ is much easier to calculate than $\hat\theta_R$

Maximum likelihood: Likelihood ratio test
Is $\ln L(\hat\theta_{ML})$ significantly larger than $\ln L(\hat\theta_R)$?
LR test statistic: $LR = -2 \ln\left(\frac{L(\hat\theta_R)}{L(\hat\theta_{ML})}\right) = -2\left(\ln L(\hat\theta_R) - \ln L(\hat\theta_{ML})\right)$
Asymptotic distribution: $LR \overset{asy}{\sim} \chi^2_m$

Maximum likelihood: Likelihood ratio test
Remarks:
Reject $H_0$ if $LR$ is larger than the $(1 - \alpha)$-quantile of the $\chi^2_m$-distribution
To compute $LR$, one requires both the unrestricted estimator $\hat\theta_{ML}$ and the restricted estimator $\hat\theta_R$
Ideal if both $\hat\theta_{ML}$ and $\hat\theta_R$ are easy to calculate
The LR test is often used to compare different models to each other

Maximum likelihood: Lagrange multiplier test
Is $g(\hat\theta_R)$ significantly different from 0?
The test is based on the restricted estimator $\hat\theta_R$
Lagrange approach: $\max_\theta \ln L(\theta)$ s.t. $r(\theta) = 0$
LM test statistic: $LM = g(\hat\theta_R)' \left[I(\hat\theta_R)\right]^{-1} g(\hat\theta_R) \overset{asy}{\sim} \chi^2_m$ with $I(\hat\theta_R) = -E\left(\frac{\partial^2 \ln L(\hat\theta_R)}{\partial \theta\, \partial \theta'}\right)$

Maximum likelihood: Lagrange multiplier test
Remarks:
Reject $H_0$ if $LM$ is larger than the $(1 - \alpha)$-quantile of the $\chi^2_m$-distribution
The LM test only requires the restricted estimator
Ideal if $\hat\theta_R$ is much easier to calculate than $\hat\theta_{ML}$
The LM test is often used to test for misspecification (heteroskedasticity, autocorrelation, omitted variables etc.)
Asymptotically, the three tests are equivalent
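For the exponential example with $H_0: \lambda = \lambda_0$, all three statistics have closed forms, because $\ln L(\lambda) = n \ln \lambda - \lambda \sum x_i$, $g(\lambda) = n/\lambda - \sum x_i$ and $I(\lambda) = n/\lambda^2$. The following sketch is not threetests.r; the function name and the input values ($n = 100$, $\sum x_i = 20$) are assumptions for illustration.

```python
import math

def three_tests(n, sum_x, lam0):
    """Wald, LR and LM statistics for H0: lambda = lam0 in the Exp(lambda) model."""
    lam_hat = n / sum_x                          # unrestricted ML estimate

    def loglik(lam):
        return n * math.log(lam) - lam * sum_x

    def info(lam):
        return n / lam ** 2                      # information matrix (scalar here)

    score0 = n / lam0 - sum_x                    # score at the restricted estimate
    wald = (lam_hat - lam0) ** 2 * info(lam_hat)
    lr = 2.0 * (loglik(lam_hat) - loglik(lam0))
    lm = score0 ** 2 / info(lam0)
    return wald, lr, lm

CRIT_95 = 3.841                                  # 0.95-quantile of chi-square(1)
w, lr, lm = three_tests(n=100, sum_x=20.0, lam0=4.0)
print(w, lr, lm)                                 # compare each with CRIT_95
```

With these inputs $\hat\lambda = 5$ and all three statistics exceed the 5% critical value. In this one-parameter model the Wald and LM statistics happen to coincide algebraically, while LR differs in finite samples.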

Maximum likelihood: The three classical tests
Multivariate case
Example: Production function $Y_i = X_{i1}^{a_1} X_{i2}^{a_2} + u_i$ where $u_i \sim N(0, 0.05^2)$
Log-likelihood function $\ln L(a_1, a_2)$
ML estimators $\hat a_1$ and $\hat a_2$
Hypothesis test of $a_1 + a_2 = 1$, or $a_1 + a_2 - 1 = 0$
See classtest.r

Instrumental variables: Preliminaries
OLS is not consistent if $E(u_t \mid X_t) \ne 0$
Define an information set $\Omega_t$ (a $\sigma$-algebra) such that $E(u_t \mid \Omega_t) = 0$
This moment condition can be used for estimation
Variables in $\Omega_t$ are called instrumental variables (or instruments)
We denote the instrument vector by $W_t$

Instrumental variables: Correlation between errors and disturbances (I)
Errors in variables
Consider the model $y_t = \alpha + \beta x_t^* + \varepsilon_t$, $\varepsilon_t \sim \mathrm{iid}(0, \sigma_\varepsilon^2)$
The exogenous variable $x_t^*$ is unobservable
We can only observe $x_t = x_t^* + v_t$ where $v_t \sim \mathrm{iid}(0, \sigma_v^2)$ are independent of everything else
Estimators of $y_t = \alpha + \beta x_t + u_t$ are inconsistent [P]

Instrumental variables: Correlation between errors and disturbances (II)
Omitted variables bias
Let $y_t = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + \varepsilon_t$
If $x_2$ is unobservable, one estimates $y_t = \alpha + \beta_1 x_{1t} + u_t$ where $u_t = \beta_2 x_{2t} + \varepsilon_t$
If $x_{2t}$ and $x_{1t}$ are correlated, then so are $u_t$ and $x_{1t}$

Instrumental variables: Correlation between errors and disturbances (III)
Endogeneity
Standard example: supply and demand curves determine both price and quantity
$q_t = \gamma_d p_t + X_t^d \beta_d + u_t^d$
$q_t = \gamma_s p_t + X_t^s \beta_s + u_t^s$
Solve for $q_t$ and $p_t$:
$\begin{pmatrix} q_t \\ p_t \end{pmatrix} = \begin{pmatrix} 1 & -\gamma_d \\ 1 & -\gamma_s \end{pmatrix}^{-1} \left( \begin{pmatrix} X_t^d \beta_d \\ X_t^s \beta_s \end{pmatrix} + \begin{pmatrix} u_t^d \\ u_t^s \end{pmatrix} \right)$

Instrumental variables: Correlation between errors and disturbances (III)
Since $q_t$ and $p_t$ depend on both $u_t^d$ and $u_t^s$, single-equation OLS estimation of
$q_t = \gamma_d p_t + X_t^d \beta_d + u_t^d$
$q_t = \gamma_s p_t + X_t^s \beta_s + u_t^s$
is inconsistent
The right-hand-side variable $p_t$ is correlated with the error term
The condition $E(u_t \mid \Omega_t) = 0$ is violated if $p_t$ is in $\Omega_t$

Instrumental variables: Correlation between errors and disturbances
Warning! Inconsistency is not always a problem
If we simply want to forecast, we can use inconsistent estimators
Trivial example:
[Figure: scatter plot of y against x with the true regression line, titled "Positive correlation between u and X"]

Instrumental variables: The simple IV estimator
Let $W$ denote the $T \times K$ matrix of instruments
All columns of $X$ with $X_t \in \Omega_t$ should be included in $W$
Then $E(u_t \mid W_t) = 0$ implies the moment condition $E(W'u) = E(W'(y - X\beta)) = 0$
The IV estimator is a method of moments estimator
The solution is $\hat\beta_{IV} = (W'X)^{-1} W'y$
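A small simulation illustrates the simple IV estimator for a single regressor without intercept, where $(W'X)^{-1} W'y$ reduces to $\sum w_t y_t / \sum w_t x_t$. The data-generating process (regressor $x_t = w_t + e_t$, disturbance $u_t = e_t$, $\beta = 2$) is an assumption chosen so that OLS converges to $\beta + 0.5$ while IV is consistent.

```python
import random

def dot(a, b):
    """Inner product of two equally long lists."""
    return sum(ai * bi for ai, bi in zip(a, b))

rng = random.Random(3)
beta, n = 2.0, 20000                 # assumed true parameter and sample size
w, x, y = [], [], []
for _ in range(n):
    w_t = rng.gauss(0, 1)            # instrument, independent of the disturbance
    e_t = rng.gauss(0, 1)            # common shock
    x_t = w_t + e_t                  # regressor, correlated with the disturbance
    u_t = e_t                        # disturbance
    w.append(w_t)
    x.append(x_t)
    y.append(beta * x_t + u_t)

beta_ols = dot(x, y) / dot(x, x)     # (X'X)^{-1} X'y, inconsistent here
beta_iv = dot(w, y) / dot(w, x)      # (W'X)^{-1} W'y
print(beta_ols, beta_iv)
```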

Instrumental variables: Properties
The simple IV estimator is consistent if $\mathrm{plim}\, \frac{1}{n} W'X = S_{WX}$ is deterministic and nonsingular [P]
The simple IV estimator is asymptotically normal [P],
$\sqrt{n}(\hat\beta_{IV} - \beta) \overset{d}{\to} U \sim N\left(0, \sigma^2 (S_{WX})^{-1} S_{WW} (S_{WX}')^{-1}\right)$
where $S_{WW} = \mathrm{plim}\, \frac{1}{n} W'W$

Instrumental variables: How to find instruments
Instruments must be
1. exogenous, i.e. $\mathrm{plim}\, \frac{1}{n} W'u = 0$
2. valid, i.e. $\mathrm{plim}\, \frac{1}{n} W'X = S_{WX}$ non-singular
Natural experiments (weather, earthquakes, ...)
Angrist and Pischke (2009): "Good instruments come from a combination of institutional knowledge and ideas about the processes determining the variable of interest."

Instrumental variables: How to find instruments
Examples: Natural experiments
1. Brückner and Ciccone: Rain and the Democratic Window of Opportunity, Econometrica 79 (2011) 923-947.
2. Angrist and Evans: Children and Their Parents' Labor Supply: Evidence from Exogenous Variation in Family Size, American Economic Review 88 (1998) 450-77.

Instrumental variables: How to find instruments
Examples: Institutional arrangements
1. Angrist and Krueger: Does Compulsory School Attendance Affect Schooling and Earnings?, Quarterly Journal of Economics 106 (1991) 979-1014.
2. Levitt: The Effect of Prison Population Size on Crime Rates: Evidence from Prison Overcrowding Litigation, Quarterly Journal of Economics 111 (1996) 319-351.

Instrumental variables: How to find instruments
In a time series context, one can sometimes use lagged endogenous regressors as instrumental variables
Example: $y_t = \alpha + \beta x_t + u_t$ with $E(u_t \mid x_t) \ne 0$
If $\mathrm{Cov}(x_t, x_{t-1}) \ne 0$ but $\mathrm{Cov}(u_t, x_{t-1}) = 0$, then $x_{t-1}$ can be used as an instrumental variable
Attention: $\mathrm{Cov}(u_t, x_{t-1}) = 0$ is not always obvious

Instrumental variables: How to find instruments
Example (Measurement error in time series)
Consider the model
$y_t = \alpha + \beta x_t^* + u_t$
$x_t^* = \rho x_{t-1}^* + \varepsilon_t$
$x_t = x_t^* + v_t$
Then $x_{t-1}$ is a valid instrument for a regression of $y_t$ on $x_t$, and $\alpha$ and $\beta$ will be estimated consistently.

Instrumental variables: How to find instruments
Example (Omitted variable bias in time series)
Consider the model
$y_t = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + u_t$
$x_{1t} = \rho_{11} x_{1,t-1} + \rho_{12} x_{2,t-1} + \varepsilon_{1t}$
$x_{2t} = \rho_{21} x_{1,t-1} + \rho_{22} x_{2,t-1} + \varepsilon_{2t}$
Then $x_{1,t-1}$ is not a valid instrument for a regression of $y_t$ on $x_{1t}$, and $\alpha$ and $\beta_1$ will not be estimated consistently.

Instrumental variables: How to find instruments
Example (Endogeneity in time series)
Consider the model
$y_t = \alpha + \beta_1 x_t + \beta_2 y_{t-1} + u_t$
$x_t = \gamma + \delta_1 y_t + \delta_2 x_{t-1} + v_t$
Then $x_{t-1}$ is a valid instrument for a regression of $y_t$ on $x_t$ and $y_{t-1}$, and $\alpha$, $\beta_1$ and $\beta_2$ will be estimated consistently.

Instrumental variables: Generalized IV estimation
If the number of instruments $L$ is larger than the number of parameters $K$, the model is overidentified
Right-multiply the $T \times L$ matrix $W$ by an $L \times K$ matrix $J$ to obtain a $T \times K$ instrument matrix $WJ$
The columns of $WJ$ are linear combinations of the instruments in $W$
One can show that the asymptotically optimal matrix is $J = (W'W)^{-1} W'X$

Instrumental variables: Generalized IV estimation
The generalized IV estimator is
$\hat\beta_{IV} = ((WJ)'X)^{-1} (WJ)'y$
$= \left(X'W(W'W)^{-1}W'X\right)^{-1} X'W(W'W)^{-1}W'y$
$= (X'P_W X)^{-1} X'P_W y$
with $P_W = W(W'W)^{-1}W'$
Consistency and asymptotic normality still hold

Instrumental variables: Generalized IV estimation
The two-stage least squares (2SLS) interpretation
The matrix $J = (W'W)^{-1} W'X$ is similar to $\hat\beta$ in the standard OLS model; hence, $WJ$ is similar to $X\hat\beta$
The optimal instruments are obtained if we regress the endogenous regressors on the instruments (1st stage), and then use the fitted values as regressors (2nd stage)
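The two-stage interpretation can be checked numerically for one endogenous regressor and two instruments (no intercept). The simulated design and all coefficient values are assumptions made for this sketch; the $2 \times 2$ first-stage system $(W'W) J = W'X$ is solved by Cramer's rule to keep the example free of external libraries.

```python
import random

def dot(a, b):
    """Inner product of two equally long lists."""
    return sum(ai * bi for ai, bi in zip(a, b))

rng = random.Random(4)
beta, n = 1.5, 20000                          # assumed true parameter and sample size
w1, w2, x, y = [], [], [], []
for _ in range(n):
    a, b, e = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    x_t = 0.8 * a + 0.5 * b + e               # regressor depends on both instruments
    w1.append(a)
    w2.append(b)
    x.append(x_t)
    y.append(beta * x_t + e)                  # disturbance e is correlated with x

# 1st stage: J = (W'W)^{-1} W'X via Cramer's rule for the 2x2 system
s11, s12, s22 = dot(w1, w1), dot(w1, w2), dot(w2, w2)
r1, r2 = dot(w1, x), dot(w2, x)
det = s11 * s22 - s12 ** 2
j1 = (s22 * r1 - s12 * r2) / det
j2 = (s11 * r2 - s12 * r1) / det
x_fit = [j1 * a + j2 * b for a, b in zip(w1, w2)]   # fitted values P_W X

# 2nd stage: OLS of y on the fitted values
beta_2sls = dot(x_fit, y) / dot(x_fit, x_fit)
# Generalized IV formula (X'P_W X)^{-1} X'P_W y
beta_giv = dot(x_fit, y) / dot(x_fit, x)
print(beta_2sls, beta_giv)
```

Both formulas give the same number up to rounding because the first-stage residuals are orthogonal to the fitted values, so $X'P_W X = (P_W X)'(P_W X)$.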

Instrumental variables: Finite sample properties
The finite sample properties of IV estimators are complex
In the overidentified case, the first $L - K$ moments exist, but higher moments do not
If the expectation exists, IV estimators are in general biased
The simple IV estimator has very heavy tails; even the first moment does not exist!
The estimator can be extremely far off the true value
See ivfinite.r

Instrumental variables: Hypothesis testing
Exact hypothesis tests are usually not feasible
Asymptotic tests are based on the asymptotic normality
An estimator of the covariance matrix of $\hat\beta_{IV}$ is
$\widehat{\mathrm{Cov}}(\hat\beta_{IV}) = \hat\sigma^2 (X'P_W X)^{-1}$
with $P_W = W(W'W)^{-1}W'$ and $\hat\sigma^2 = \frac{1}{n} (y - X\hat\beta_{IV})'(y - X\hat\beta_{IV})$

Instrumental variables: Hypothesis testing
Asymptotic t-test
$H_0: \beta_i = \beta_{i0}$, $H_1: \beta_i \ne \beta_{i0}$
Under the null hypothesis, the test statistic
$t = \frac{\hat\beta_i - \beta_{i0}}{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_i)}}$
is asymptotically $N(0, 1)$

Instrumental variables: Hypothesis testing
Asymptotic Wald test (similar to an F-test)
$H_0: \beta_2 = \beta_{20}$, $H_1: \beta_2 \ne \beta_{20}$, where $\beta_2$ is a length $L$ subvector of $\beta$
Under the null hypothesis, the test statistic
$W = (\hat\beta_2 - \beta_{20})' \left[\widehat{\mathrm{Cov}}(\hat\beta_2)\right]^{-1} (\hat\beta_2 - \beta_{20})$
is asymptotically $\chi^2$ with $L$ degrees of freedom

Instrumental variables: Hypothesis testing
Testing overidentifying restrictions
The identifying restrictions are $E(u_t \mid W_t) = 0$ or $E(W'u) = 0$
If the model is just identified, the validity of the restrictions cannot be tested
If the model is overidentified, one can test if the overidentifying restrictions hold, i.e. if the instruments are valid and exogenous

Instrumental variables: Hypothesis testing
Basic test idea: Check if the IV residuals can be explained by the full set of instruments
Compute the IV residuals $\hat u$
Regress the residuals on all instruments $W$
Under the null hypothesis, the test statistic $nR^2 \overset{asy}{\sim} \chi^2_m$, where $m$ is the degree of overidentification

Instrumental variables: Hypothesis testing
Davidson and MacKinnon (2004, p. 338): "Even if we do not know quite how to interpret a significant value of the overidentification test statistic, it is always a good idea to compute it. If it is significantly larger than it should be by chance under the null hypothesis, one should be extremely cautious in interpreting the estimates, because it is quite likely either that the model is specified incorrectly or that some of the instruments are invalid."

Instrumental variables: Hypothesis testing
Durbin-Wu-Hausman test
$H_0: E(X'u) = 0$, $H_1: E(W'u) = 0$
Test if IV estimation is really necessary or if OLS would do
Under $H_1$, OLS is inconsistent, but IV is still consistent
Basic test idea: Compare $\hat\beta_{OLS}$ and $\hat\beta_{IV}$. If they are too different, reject $H_0$

Instrumental variables: Hypothesis testing
The difference between the estimators is
$\hat\beta_{IV} - \hat\beta_{OLS} = (X'P_W X)^{-1} X'P_W y - (X'X)^{-1} X'y$
$= (X'P_W X)^{-1} \left(X'P_W y - (X'P_W X)(X'X)^{-1} X'y\right)$
$= (X'P_W X)^{-1} \left(X'P_W \left(I - X(X'X)^{-1}X'\right) y\right)$
$= (X'P_W X)^{-1} \left(X'P_W M_X y\right)$

Instrumental variables: Hypothesis testing
We need to test if $X'P_W M_X y$ is significantly different from 0
This term is identically equal to zero for all variables in $X$ that are instruments (i.e. that are also in $W$)
Denote by $\tilde X$ all possibly endogenous regressors
To test if $\tilde X'P_W M_X y$ is significantly different from zero, perform a Wald test of $\delta = 0$ in the regression
$y = X\beta + P_W \tilde X \delta + u$

GMM: Model description
Hansen, L. (1982), Large Sample Properties of Generalized Method of Moments Estimators, Econometrica 50, 1029-1054: "In this paper we study the large sample properties of a class of generalized method of moments (GMM) estimators which subsumes many standard econometric estimators. To motivate this class, consider an econometric model whose parameter vector we wish to estimate. The model implies a family of orthogonality conditions that embed any economic theoretical restrictions that we wish to impose or test."

GMM: Model description
John Cochrane (2005), Asset Pricing, p. 196: "Most of the effort involved with GMM is simply mapping a given problem into the very general notation."

GMM: Model description
Describe the model by elementary zero functions $E_\theta(f_t(\theta, y_t)) = 0$, where everything can be vector-valued
Parameter vector $\theta$ of length $K$
Observation vectors $y_t$
Identification condition: $E_{\theta_0}(f_t(\theta, y_t)) \ne 0$ for all $\theta \ne \theta_0$

GMM: Model description
Example (Linear regression model)
Consider the standard model $y = X\beta + u$, $u \sim N(0, \sigma^2 I)$, independent of $X$
Parameter vector $\theta = ?$
Observations $y_t = ?$
Elementary zero functions $f_t(\theta, y_t) = ?$

GMM: Model description
Example (Lognormal distribution)
Suppose there is a random sample $X_1, \ldots, X_n$ from $X \sim LN(\mu, \sigma^2)$
Parameter vector $\theta = ?$
Observations $y_t = ?$
Elementary zero functions $f_t(\theta, y_t) = ?$

GMM: Model description
Example (Asset pricing)
The basic asset pricing formula is $p_t = E(m_{t+1} x_{t+1} \mid \Omega_t)$ with asset price $p$, stochastic discount factor $m$, payoff $x$, and information set $\Omega_t$.
Parameter vector $\theta = ?$
Observations $y_t = ?$
Elementary zero functions $f_t(\theta, y_t) = ?$

GMM: Model description
Stack all elementary zero functions: $f(\theta, y) = \begin{pmatrix} f_1(\theta, y_1) \\ \vdots \\ f_n(\theta, y_n) \end{pmatrix}$
Covariance matrix: $E(f(\theta, y)\, f(\theta, y)') = \Omega$
The dimension of $\Omega$ depends on the dimension of $f_t(\theta, y_t)$

GMM: Model description
Example (Linear regression model)
The covariance matrix $\Omega$ is $E(f(\theta, y)\, f(\theta, y)') = E(uu') = \sigma^2 I$
If there are autocorrelation and heteroskedasticity, $E(uu') = \Omega$

GMM: Model description
Example (Lognormal distribution)
The covariance matrix $\Omega$ is
$E(f(\theta, y)\, f(\theta, y)') = E \begin{pmatrix} f_{11}^2 & f_{11} f_{12} & \cdots & f_{11} f_{n1} & f_{11} f_{n2} \\ f_{12} f_{11} & f_{12}^2 & \cdots & f_{12} f_{n1} & f_{12} f_{n2} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ f_{n1} f_{11} & f_{n1} f_{12} & \cdots & f_{n1}^2 & f_{n1} f_{n2} \\ f_{n2} f_{11} & f_{n2} f_{12} & \cdots & f_{n2} f_{n1} & f_{n2}^2 \end{pmatrix} = ?$

GMM: Model description
Example (Asset pricing)
The covariance matrix $\Omega$ is
$E(f(\theta, y)\, f(\theta, y)') = E \begin{pmatrix} f_{11}^2 & \cdots & f_{11} f_{n1} \\ \vdots & \ddots & \vdots \\ f_{n1} f_{11} & \cdots & f_{n1}^2 \end{pmatrix} = ?$

GMM: Estimating equations
To estimate $\theta$, we need $K$ estimating equations
In general, they are weighted averages of the $f_t$
In most cases, the estimating equations are based on $L \ge K$ instrumental variables $W$
If $L > K$, we need to form linear combinations
Let $W$ be the $n \times L$ matrix of instruments and $J$ be an $L \times K$ matrix of full rank
Define the $n \times K$ matrix $Z = WJ$

GMM: Estimating equations
Theoretical moment conditions (orthogonality conditions): $E(Z_t' f_t(\theta, y_t)) = 0$
The estimating equations are the empirical counterpart: $\frac{1}{n} Z' f(\theta, y) = 0$
Solving this system yields the GMM estimator $\hat\theta$

GMM: Estimating equations
Example (Linear regression model)
The $K$ moment conditions for the linear regression model are
$E(Z_t' f_t(\theta, y_t)) = E(X_t'(y_t - X_t \beta)) = 0$
and the estimating equations are $\frac{1}{n} X'(y - X\beta) = 0$.

GMM: Estimating equations
Example (Lognormal distribution). The two moment conditions for the lognormal distribution are
$$E\left( Z_t' f_t(\theta, y_t) \right) = E\left( \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} f_{t1}(\theta, y_t) \\ f_{t2}(\theta, y_t) \end{pmatrix} \right) = E \begin{pmatrix} X_t - \exp\left(\mu + \tfrac{1}{2}\sigma^2\right) \\ X_t^2 - \exp\left(2\mu + 2\sigma^2\right) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

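GMM: Estimating equations
Example (contd). The estimating equations are
$$\frac{1}{n} Z' f(\theta, y) = \frac{1}{n} \begin{pmatrix} 1 & 0 & 1 & 0 & \cdots & 1 & 0 \\ 0 & 1 & 0 & 1 & \cdots & 0 & 1 \end{pmatrix} \begin{pmatrix} f_{11} \\ f_{12} \\ \vdots \\ f_{n1} \\ f_{n2} \end{pmatrix} = \begin{pmatrix} \frac{1}{n} \sum_{t=1}^{n} \left( X_t - \exp\left(\mu + \tfrac{1}{2}\sigma^2\right) \right) \\ \frac{1}{n} \sum_{t=1}^{n} \left( X_t^2 - \exp\left(2\mu + 2\sigma^2\right) \right) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$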
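
The two lognormal estimating equations are exactly identified, so they can be solved in closed form by taking logs of the two empirical raw moments. The following Python sketch illustrates this; the function names, the seed and the simulated sample (with $\mu = 0.5$, $\sigma = 0.4$) are assumptions made for illustration and are not part of the lecture.

```python
import math
import random


def lognormal_moment_estimates(x):
    """Solve the two empirical moment equations for (mu, sigma^2)."""
    n = len(x)
    m1 = sum(x) / n                 # (1/n) * sum of X_t
    m2 = sum(v * v for v in x) / n  # (1/n) * sum of X_t^2
    # Taking logs of  m1 = exp(mu + sigma2/2)  and  m2 = exp(2*mu + 2*sigma2)
    # gives a linear system in (mu, sigma2) with this closed-form solution.
    sigma2 = math.log(m2) - 2.0 * math.log(m1)
    mu = 2.0 * math.log(m1) - 0.5 * math.log(m2)
    return mu, sigma2


def estimating_equations(x, mu, sigma2):
    """Evaluate the two components of (1/n) Z'f(theta, y)."""
    n = len(x)
    e1 = sum(x) / n - math.exp(mu + 0.5 * sigma2)
    e2 = sum(v * v for v in x) / n - math.exp(2.0 * mu + 2.0 * sigma2)
    return e1, e2


# Hypothetical data: 5000 lognormal draws with mu = 0.5 and sigma = 0.4.
rng = random.Random(1)
sample = [rng.lognormvariate(0.5, 0.4) for _ in range(5000)]
mu_hat, sigma2_hat = lognormal_moment_estimates(sample)
residuals = estimating_equations(sample, mu_hat, sigma2_hat)
print(mu_hat, sigma2_hat, residuals)
```

At the solution both estimating equations are zero up to floating-point error, which is what distinguishes the exactly identified case from the overidentified case treated later.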

GMM: Properties of GMM estimators
Consistency. Assume that a law of large numbers applies to $\frac{1}{n} Z' f(\theta, y)$. Define the limiting estimation functions
$$\alpha(\theta) = \operatorname{plim} \frac{1}{n} Z' f(\theta, y)$$
and the limiting estimation equations $\alpha(\theta) = 0$. The GMM estimator $\hat\theta$ is consistent if the asymptotic identification condition holds: $\alpha(\theta) \ne \alpha(\theta_0)$ for all $\theta \ne \theta_0$ [P]

GMM: Properties of GMM estimators
Asymptotic normality. Simplified notation: $f_t(\theta) = f_t(\theta, y_t)$, $f(\theta) = f(\theta, y)$. Additional assumption: $f_t(\theta)$ is continuously differentiable at $\theta_0$. First-order Taylor series expansion of $\frac{1}{n} Z' f(\theta) = 0$ in $\hat\theta$ around $\theta_0$ [P]

GMM: Asymptotic efficiency
The asymptotic distribution of $\sqrt{n}\,(\hat\theta - \theta_0)$ is normal with mean 0 and covariance matrix
$$\left( \operatorname{plim} \frac{1}{n} Z' F(\theta_0) \right)^{-1} \left( \operatorname{plim} \frac{1}{n} Z' \Omega Z \right) \left( \operatorname{plim} \frac{1}{n} F(\theta_0)' Z \right)^{-1}$$
What is the optimal choice of $Z$ in the estimating equations? The optimal choice depends on assumptions about the matrices $F(\theta)$ and $\Omega$.

GMM: Asymptotic efficiency
If $\Omega = \sigma^2 I$ and $E\left(F_t(\theta_0) f_t(\theta_0)\right) = 0$, the optimal choice is $Z = F(\theta_0)$. Problem: $Z$ depends on the unknown $\theta_0$. Solution: solve the estimating equations
$$\frac{1}{n} F(\theta)' f(\theta) = 0$$

GMM: Asymptotic efficiency
If $\Omega = \sigma^2 I$ and $E\left(F_t(\theta_0) f_t(\theta_0)\right) \ne 0$ but $W_t \in \Omega_t$ (here $\Omega_t$ denotes the information set at time $t$, not the covariance matrix), the optimal choice is $Z = P_W F(\theta_0)$. Problem: $Z$ depends on the unknown $\theta_0$. Solution: solve the estimating equations
$$\frac{1}{n} F(\theta)' P_W f(\theta) = 0$$

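GMM: Asymptotic efficiency
Suppose the covariance matrix $\Omega$ is unknown. Since $Z = WJ$, the covariance matrix of $\sqrt{n}\,(\hat\theta - \theta_0)$ is
$$\left( \operatorname{plim} \frac{1}{n} J' W' F_0 \right)^{-1} \left( \operatorname{plim} \frac{1}{n} J' W' \Omega W J \right) \left( \operatorname{plim} \frac{1}{n} F_0' W J \right)^{-1}$$
For the optimal $J = (W' \Omega W)^{-1} W' F_0$ this becomes
$$\left( \operatorname{plim} \frac{1}{n} F_0' W \left( W' \Omega W \right)^{-1} W' F_0 \right)^{-1}$$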

GMM: Asymptotic efficiency
Although $\Omega$ cannot be estimated consistently, the term $\frac{1}{n} W' \Omega W$ can be estimated consistently (we will do that later). If $\hat\Sigma$ is an estimator of $\frac{1}{n} W' \Omega W$, the optimal estimating equations are
$$\frac{1}{n} J' W' f(\theta) = \frac{1}{n} F(\theta)' W \hat\Sigma^{-1} W' f(\theta) = 0$$
and the estimated covariance matrix of $\hat\theta$ is
$$\widehat{\operatorname{Cov}}(\hat\theta) = n \left( \hat F' W \hat\Sigma^{-1} W' \hat F \right)^{-1}$$

GMM: Alternative notation
Attention: many textbooks use a different notation (and so does the gmm package in R). The two approaches are equivalent. The moment conditions are written as
$$E\left( g(\theta, y_t) \right) = E\left( W_t' f_t(\theta, y_t) \right) = 0$$
The number of moment conditions $L$ can be larger than the number of parameters $K$.

GMM: Alternative notation
If $L > K$, the $L$ estimating equations
$$\bar g_n(\theta, y) = \frac{1}{n} \sum_{t=1}^{n} g(\theta, y_t) = 0$$
cannot be solved exactly. The GMM estimator is defined by
$$\hat\theta = \arg\min_{\theta} \; \bar g_n(\theta, y)' A_n \, \bar g_n(\theta, y)$$
where $A_n$ is a sequence of $L \times L$ weighting matrices (which can be chosen by the user) with limit $A$.

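GMM: Alternative notation
The GMM estimator based on $\bar g_n$ is consistent, $\hat\theta \xrightarrow{p} \theta_0$. Asymptotic normality: define the $L \times K$ matrix
$$G(\theta) = \frac{\partial \bar g_n(\theta, y)}{\partial \theta'} = \frac{1}{n} \sum_{t=1}^{n} \frac{\partial g(\theta, y_t)}{\partial \theta'}$$
Assume that $\sqrt{n}\, \bar g_n(\theta_0, y) \xrightarrow{d} N(0, V)$. Then
$$\sqrt{n}\,(\hat\theta - \theta_0) \xrightarrow{d} N\left( 0, \; (G' A G)^{-1} G' A V A G \, (G' A G)^{-1} \right)$$
[P] Asymptotically optimal weighting matrix $A$ [P]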

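GMM: Equivalence
The two GMM approaches (based on $f_t$ and $g$) are equivalent. The first-order condition of $\bar g(\theta)' A \, \bar g(\theta)$ is
$$\underset{K \times L}{G'} \; \underset{L \times L}{A} \; \underset{L \times 1}{\bar g} = \underset{K \times 1}{0}$$
which is the same as
$$\underset{K \times L}{J'} \; \underset{L \times n}{W'} \; \underset{n \times 1}{f} = \underset{K \times 1}{0}$$
List of equivalences [P]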

GMM: Covariance matrix estimation
The covariance matrix of the elementary zero functions is often unknown:
$$E\left( f(\theta, y)\, f(\theta, y)' \right) = \Omega$$
There may be heteroskedasticity and autocorrelation in $\Omega$. Although $\Omega$ cannot be estimated consistently, the term $\frac{1}{n} W' \Omega W$ can be estimated consistently.

GMM: Covariance matrix estimation
Write
$$\Sigma = \operatorname*{plim}_{n \to \infty} \frac{1}{n} W' \Omega W$$
Assume that a suitable law of large numbers holds, so that
$$\Sigma = \lim_{n \to \infty} \frac{1}{n} \sum_{t=1}^{n} \sum_{s=1}^{n} E\left( f_t f_s W_t' W_s \right)$$
where $f_t = f_t(\theta, y_t)$.

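GMM: Covariance matrix estimation
Define the autocovariance matrices
$$\Gamma(j) = \begin{cases} \dfrac{1}{n} \sum_{t=j+1}^{n} E\left( f_t f_{t-j} W_t' W_{t-j} \right) & \text{for } j \ge 0 \\[1ex] \dfrac{1}{n} \sum_{t=-j+1}^{n} E\left( f_{t+j} f_t W_{t+j}' W_t \right) & \text{for } j < 0 \end{cases}$$
Then
$$\Sigma = \lim_{n \to \infty} \sum_{j=-n+1}^{n-1} \Gamma(j) = \lim_{n \to \infty} \left( \Gamma(0) + \sum_{j=1}^{n-1} \left( \Gamma(j) + \Gamma'(j) \right) \right)$$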

GMM: Covariance matrix estimation
The autocovariance matrix $\Gamma(j)$, $j \ge 0$, can be estimated by
$$\hat\Gamma(j) = \frac{1}{n} \sum_{t=j+1}^{n} \hat f_t \hat f_{t-j} W_t' W_{t-j}$$
Newey-West estimator of $\Sigma$:
$$\hat\Sigma = \hat\Gamma(0) + \sum_{j=1}^{p} \left( 1 - \frac{j}{p+1} \right) \left( \hat\Gamma(j) + \hat\Gamma'(j) \right)$$
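
The two formulas above translate almost line by line into code. The sketch below is a plain-Python illustration, not the lecture's implementation: it assumes the residuals $\hat f_t$ are scalars and stores the products $v_t = \hat f_t W_t'$ as lists of length $L$, so that $\hat\Gamma(j) = \frac{1}{n} \sum_t v_t v_{t-j}'$. The function names and the small data set are hypothetical.

```python
def gamma_hat(v, j):
    """(1/n) * sum_{t=j+1}^{n} v_t v_{t-j}' for j >= 0, where v_t = f_t * W_t'."""
    n, L = len(v), len(v[0])
    G = [[0.0] * L for _ in range(L)]
    for t in range(j, n):
        for a in range(L):
            for b in range(L):
                G[a][b] += v[t][a] * v[t - j][b] / n
    return G


def newey_west(v, p):
    """Gamma(0) + sum_{j=1}^{p} (1 - j/(p+1)) * (Gamma(j) + Gamma(j)')."""
    L = len(v[0])
    S = gamma_hat(v, 0)
    for j in range(1, p + 1):
        weight = 1.0 - j / (p + 1)
        G = gamma_hat(v, j)
        for a in range(L):
            for b in range(L):
                S[a][b] += weight * (G[a][b] + G[b][a])
    return S


# Hypothetical inputs: residuals f_t and the rows W_t of an n x 2 instrument matrix.
f_hat = [0.5, -1.0, 0.8, -0.3, 0.1, 0.4]
W = [[1.0, 0.2], [1.0, -0.5], [1.0, 0.9], [1.0, 0.1], [1.0, -0.7], [1.0, 0.3]]
v = [[f * w for w in row] for f, row in zip(f_hat, W)]
sigma_hat = newey_west(v, p=2)
print(sigma_hat)
```

The linearly declining weights $1 - j/(p+1)$ are what keep $\hat\Sigma$ positive semidefinite; with $p = 0$ the estimator reduces to $\hat\Gamma(0)$.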

GMM: Test of overidentifying restrictions
The GMM estimators minimize the criterion function
$$\frac{1}{n} f(\theta)' W \hat\Sigma^{-1} W' f(\theta)$$
Asymptotically, the minimized value (Hansen's J statistic, also called Hansen's overidentification statistic or the Hansen-Sargan statistic) is distributed as $\chi^2_{L-K}$ if the overidentifying restrictions hold. If the null hypothesis is rejected, then something went wrong, e.g. the model is misspecified.

Indirect inference: Basic idea
Anthony Smith, Jr. (New Palgrave Dictionary of Economics): "Indirect inference is a simulation-based method for estimating the parameters of economic models. Its hallmark is the use of an auxiliary model to capture aspects of the data upon which to base the estimation. The parameters of the auxiliary model can be estimated using either the observed data or data simulated from the economic model. Indirect inference chooses the parameters of the economic model so that these two estimates of the parameters of the auxiliary model are as close as possible."

Indirect inference: The true model
Economic model: $y_t = G(y_{t-1}, x_t, u_t; \beta)$, $t = 1, \ldots, T$, with exogenous variables $x_t$ and endogenous variables $y_t$. The random errors $u_t$ are i.i.d. with cdf $F$. The parameter vector $\beta$ has dimension $K$. Suppose that standard estimation methods for $\beta$ are intractable. It must be possible (and easy) to simulate $y_1, \ldots, y_T$ given $y_0$ (assumed to be known), $x_1, \ldots, x_T$ and $\beta$.

Indirect inference: The auxiliary model
The true model is too complicated for estimation of $\beta$. Instead, estimate an auxiliary model with parameter vector $\theta$. The dimension $L$ of $\theta$ must be at least as large as the dimension $K$ of $\beta$. The auxiliary model must be suitable (but is allowed to be misspecified) and easy and fast to estimate. Often, the auxiliary model is a standard time series model.

Indirect inference: Estimating the auxiliary model
For given $\beta$ (and $y_0, x_1, \ldots, x_T$), the auxiliary model's parameters $\theta$ are estimated
1. from the observed data $x_1, \ldots, x_T, y_1, \ldots, y_T$, resulting in the estimator $\hat\theta$
2. from $H$ simulated datasets $x_1, \ldots, x_T, \tilde y_1^{(h)}, \ldots, \tilde y_T^{(h)}$ for $h = 1, \ldots, H$, resulting in the estimators $\tilde\theta^{(h)}(\beta)$
Define
$$\tilde\theta(\beta) = \frac{1}{H} \sum_{h=1}^{H} \tilde\theta^{(h)}(\beta)$$

Indirect inference: Optimization
Compute the difference between the vectors $\hat\theta$ and $\tilde\theta(\beta)$:
$$Q(\beta) = \left( \hat\theta - \tilde\theta(\beta) \right)' W \left( \hat\theta - \tilde\theta(\beta) \right)$$
where $W$ is a positive definite weighting matrix. The indirect inference estimator of $\beta$ is
$$\hat\beta = \arg\min_{\beta} Q(\beta)$$

Indirect inference: Remarks
The simulations have to be done with the same set of random errors. Indirect inference is similar to GMM: the auxiliary parameters are the moments. The asymptotic distribution of $\hat\beta$ can be derived (see Gourieroux et al., 1993). The weighting matrix $W$ can be chosen optimally.

Indirect inference: A simple example (Gourieroux et al., 1993)
Consider the MA(1) process $y_t = \varepsilon_t - \beta \varepsilon_{t-1}$ with $\varepsilon_t \sim N(0, 1)$ and $\beta = 0.5$ for $t = 1, \ldots, 250$. The maximum likelihood estimator $\hat\beta_{ML}$ is not trivial. Indirect inference estimator $\hat\beta_{II}$ of $\beta$? Auxiliary model: AR(3) with parameters $\theta$. No weighting: the matrix $W$ is the identity matrix.

Indirect inference: A simple example (Gourieroux et al., 1993)
Compare the distribution of $\hat\beta_{ML}$ and $\hat\beta_{II}$:
Step 1: Simulate a time series $y_1, \ldots, y_{250}$
Step 2: Compute $\hat\beta_{ML}$
Step 3: Estimate $\hat\theta$ from $y_1, \ldots, y_{250}$
Step 4: For given $\beta$, simulate 10 paths $\tilde y_1^{(h)}, \ldots, \tilde y_{250}^{(h)}$
Step 5: Estimate $\tilde\theta(\beta)$ from the simulated paths
Step 6: Repeat steps 4 and 5 for different $\beta$ until the difference between $\hat\theta$ and $\tilde\theta(\beta)$ is minimized
Step 7: Save $\hat\beta_{II}$ and start again at step 1
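
Steps 1 and 3 to 6 for a single replication can be sketched as follows. This is a minimal illustration under several assumptions that are not in the slides: the AR(3) auxiliary model is fitted by OLS without intercept, the minimization in step 6 is replaced by a grid search over $\beta \in [-0.9, 0.9]$, and all function names and seeds are hypothetical. Step 2 (maximum likelihood) and the outer Monte Carlo loop of step 7 are omitted.

```python
import random


def simulate_ma1(beta, eps):
    """y_t = eps_t - beta * eps_{t-1}; eps must have length T + 1."""
    return [eps[t] - beta * eps[t - 1] for t in range(1, len(eps))]


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            fac = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= fac * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_ar3(y):
    """OLS of y_t on (y_{t-1}, y_{t-2}, y_{t-3}) without intercept."""
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for t in range(3, len(y)):
        lags = (y[t - 1], y[t - 2], y[t - 3])
        for i in range(3):
            b[i] += lags[i] * y[t]
            for j in range(3):
                A[i][j] += lags[i] * lags[j]
    return solve3(A, b)


def indirect_inference(y, H=10, seed=0):
    """Grid-search minimizer of Q(beta) with W equal to the identity matrix."""
    T = len(y)
    theta_hat = fit_ar3(y)                                   # step 3
    rng = random.Random(seed)
    # The same H sets of random errors are reused for every candidate beta.
    shocks = [[rng.gauss(0.0, 1.0) for _ in range(T + 1)] for _ in range(H)]
    best_beta, best_q = None, float("inf")
    for beta in [i / 100 for i in range(-90, 91)]:           # step 6
        theta_bar = [0.0] * 3
        for eps in shocks:                                   # steps 4 and 5
            theta_h = fit_ar3(simulate_ma1(beta, eps))
            for i in range(3):
                theta_bar[i] += theta_h[i] / H
        q = sum((a - c) ** 2 for a, c in zip(theta_hat, theta_bar))
        if q < best_q:
            best_beta, best_q = beta, q
    return best_beta


# Step 1: one simulated "observed" series with beta = 0.5 and T = 250.
rng = random.Random(42)
eps_obs = [rng.gauss(0.0, 1.0) for _ in range(251)]
y_obs = simulate_ma1(0.5, eps_obs)
beta_ii = indirect_inference(y_obs)
print(beta_ii)
```

Drawing the shocks once and reusing them for every $\beta$ makes $Q(\beta)$ a smooth function of $\beta$; redrawing them at each candidate value would add simulation noise to the objective.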

Bootstrap: Basic idea
Point of departure: an unknown distribution function $F$ (univariate or multivariate), an unknown parameter vector $\theta = \theta(F)$, a simple random sample $X_1, \ldots, X_n$ from $F$, and an estimator $\hat\theta = \hat\theta(X_1, \ldots, X_n)$. Why is the distribution of $\hat\theta$ of interest?

Bootstrap: Basic idea
Basic bootstrap idea: approximate the unknown distribution of $\hat\theta(X_1, \ldots, X_n)$ for $X_1, \ldots, X_n$ i.i.d. from $F$ by the distribution of $\hat\theta(X_1^*, \ldots, X_n^*)$ for $X_1^*, \ldots, X_n^*$ i.i.d. from $\hat F$. The distribution of $\hat\theta$ under $\hat F$ is usually found by Monte-Carlo simulations based on resamples (pseudo samples).

Bootstrap: Basic idea
How is $F$ estimated?
parametric: parametric bootstrap
nonparametric: nonparametric bootstrap
smoothed: smooth bootstrap
model based
Applications: bias and standard errors, confidence intervals, hypothesis tests

Bootstrap: Example 1
Nonparametric bootstrap of the standard error of $\hat\theta = \bar X = \frac{1}{n} \sum_{i=1}^{n} X_i$. Simple random sample $X_1, \ldots, X_{20}$. Estimation of the unknown cdf $F$ by the empirical distribution function
$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} 1(X_i \le x)$$

Bootstrap: Example 1 (contd)
How is $\bar X$ distributed under $F$? How is $\bar X$ distributed under $\hat F = F_n$? Estimation of the distribution of $\bar X$ under $F_n$ by Monte-Carlo simulation. Calculation of the standard deviation of $\bar X$ under $F_n$. The distribution of $\bar X$ under $F_n$ is an approximation of the distribution of $\bar X$ under $F$.

Bootstrap: Example 1 (still contd): The algorithm (R script: bootex1.r)
1. Draw a random sample $X_1^*, \ldots, X_{20}^*$ from $F_n$ (resampling)
2. Compute $\bar X^* = \frac{1}{20} \sum_{i=1}^{20} X_i^*$
3. Repeat steps 1 and 2 a large number $B$ of times, save the results as $\bar X_1^*, \ldots, \bar X_B^*$
4. Compute the standard error
$$SE(\bar X) = \sqrt{ \frac{1}{B-1} \sum_{i=1}^{B} \left( \bar X_i^* - \bar X_{\cdot}^* \right)^2 }$$
where $\bar X_{\cdot}^*$ is the mean of $\bar X_1^*, \ldots, \bar X_B^*$.
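
The four steps can be sketched in Python as follows. This is an independent illustration and not the content of bootex1.r; the function name, the seed and the simulated sample of size 20 are assumptions. The last line computes the exact standard deviation of $\bar X$ under $F_n$ for comparison.

```python
import math
import random


def bootstrap_se_mean(x, B=2000, seed=0):
    """Nonparametric bootstrap standard error of the sample mean."""
    rng = random.Random(seed)
    n = len(x)
    boot_means = []
    for _ in range(B):                                  # step 3
        resample = [rng.choice(x) for _ in range(n)]    # step 1: draw from F_n
        boot_means.append(sum(resample) / n)            # step 2
    centre = sum(boot_means) / B
    # step 4: standard deviation of the B bootstrap means
    return math.sqrt(sum((m - centre) ** 2 for m in boot_means) / (B - 1))


# Hypothetical sample of size 20.
data_rng = random.Random(123)
x = [data_rng.gauss(10.0, 2.0) for _ in range(20)]
se_boot = bootstrap_se_mean(x)

# Exact standard deviation of the mean under F_n, for comparison.
x_bar = sum(x) / len(x)
se_plugin = math.sqrt(sum((v - x_bar) ** 2 for v in x) / len(x)) / math.sqrt(len(x))
print(se_boot, se_plugin)
```

For the sample mean the Monte-Carlo step is not strictly needed because the answer is available in closed form; the simulation becomes useful for estimators without such a formula.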

Bootstrap: Example 2
Parametric bootstrap of the bias of $\hat\theta = \hat\lambda = 1 / \bar X$ for the exponential distribution $X \sim Exp(\lambda)$. Simple random sample $X_1, \ldots, X_8$. Estimation of the unknown distribution function $F$ by
$$F_{\hat\lambda}(x) = 1 - \exp\left( -\hat\lambda x \right)$$

Bootstrap: Example 2 (contd)
How is $\hat\lambda$ distributed under $F$? How is $\hat\lambda$ distributed under $\hat F = F_{\hat\lambda}$? Estimation of the distribution of $\hat\lambda$ under $F_{\hat\lambda}$ by Monte-Carlo simulation. Find the expectation of $\hat\lambda$ under $F_{\hat\lambda}$. The distribution of $\hat\lambda$ under $F_{\hat\lambda}$ approximates the distribution of $\hat\lambda$ under $F$.

Bootstrap: Example 2 (still contd): The algorithm (R script: bootex2.r)
1. Compute $\hat\lambda = 1 / \bar X$ from $X_1, \ldots, X_8$
2. Draw a simple random sample $X_1^*, \ldots, X_8^*$ from $F_{\hat\lambda}$
3. Compute $\hat\lambda^* = 1 / \bar X^*$
4. Repeat steps 2 and 3 a large number $B$ of times, save the results as $\hat\lambda_1^*, \ldots, \hat\lambda_B^*$
5. Estimate the bias by
$$\left( \frac{1}{B} \sum_{b=1}^{B} \hat\lambda_b^* \right) - \hat\lambda$$
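
A Python sketch of this algorithm is given below. It is an illustration rather than the content of bootex2.r; the function name, the seed and the eight data values are hypothetical. Because $\sum X_i^*$ follows a gamma distribution under $F_{\hat\lambda}$, the exact bias under $F_{\hat\lambda}$ is $\hat\lambda / (n - 1)$, which gives a reference value for the simulation.

```python
import random


def bootstrap_bias_rate(x, B=5000, seed=0):
    """Parametric bootstrap estimate of the bias of lambda_hat = 1 / mean(x)."""
    rng = random.Random(seed)
    n = len(x)
    lam_hat = n / sum(x)                                           # step 1
    lam_star = []
    for _ in range(B):                                             # step 4
        resample = [rng.expovariate(lam_hat) for _ in range(n)]    # step 2
        lam_star.append(n / sum(resample))                         # step 3
    bias = sum(lam_star) / B - lam_hat                             # step 5
    return bias, lam_hat


# Hypothetical sample of size 8.
x = [0.12, 0.85, 0.40, 0.07, 1.30, 0.22, 0.58, 0.31]
bias, lam_hat = bootstrap_bias_rate(x)
print(bias, lam_hat / (len(x) - 1))   # simulated bias vs. exact bias under F_lambda_hat
```

A bias-corrected estimate would then be $\hat\lambda$ minus the estimated bias.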

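Bootstrap: General approach for bootstrap standard errors
Original sample $X_1, \ldots, X_n$, giving the edf $\hat F = F_n$ or $\hat F = F_{\hat\theta}$
1. resample: $X_1^*, \ldots, X_n^*$, giving $\hat\theta_1^*$
2. resample: $X_1^*, \ldots, X_n^*$, giving $\hat\theta_2^*$
...
B. resample: $X_1^*, \ldots, X_n^*$, giving $\hat\theta_B^*$
$$SE(\hat\theta) = \sqrt{ \frac{1}{B-1} \sum_{b=1}^{B} \left( \hat\theta_b^* - \bar\theta^* \right)^2 }$$
where $\bar\theta^*$ is the mean of $\hat\theta_1^*, \ldots, \hat\theta_B^*$.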

Bootstrap: Bootstrapping confidence intervals
General definition: an interval $\left[ \hat\theta_{low}(X_1, \ldots, X_n) \,;\, \hat\theta_{high}(X_1, \ldots, X_n) \right]$ is called a $(1 - \alpha)$-confidence interval if
$$P\left( \hat\theta_{low} \le \theta \le \hat\theta_{high} \right) = 1 - \alpha$$
If the equality holds only asymptotically, the interval is called an asymptotic $(1 - \alpha)$-confidence interval. Note: the interval limits are random variables.

Bootstrap: Naive bootstrap confidence intervals (R script: bootnaiv.r)
The naive confidence intervals are sometimes called the other percentile method. Generate a large number $B$ of resamples and compute $\hat\theta_1^*, \ldots, \hat\theta_B^*$. Let $\hat\theta_{(1)}^* \le \hat\theta_{(2)}^* \le \ldots \le \hat\theta_{(B)}^*$ be the order statistics. The naive $(1 - \alpha)$-confidence interval is
$$\left[ \hat\theta_{((\alpha/2)B)}^* \,;\, \hat\theta_{((1 - \alpha/2)B)}^* \right]$$
Why is this approach often problematic?
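
The naive interval simply reads off two order statistics of the bootstrap replications. The sketch below illustrates this for a nonparametric bootstrap of the mean; it is not the content of bootnaiv.r, and the function names, the seed and the data are assumptions. It presumes that $(\alpha/2)B$ is an integer.

```python
import random


def naive_bootstrap_ci(x, estimator, alpha=0.05, B=2000, seed=0):
    """Naive interval: the alpha/2 and 1 - alpha/2 order statistics of theta*."""
    rng = random.Random(seed)
    n = len(x)
    theta_star = sorted(
        estimator([rng.choice(x) for _ in range(n)]) for _ in range(B)
    )
    # Order statistic k (1-based) sits at list index k - 1.
    lower = theta_star[round(alpha / 2 * B) - 1]
    upper = theta_star[round((1 - alpha / 2) * B) - 1]
    return lower, upper


def mean(values):
    return sum(values) / len(values)


# Hypothetical sample.
x = [2.1, 3.4, 1.9, 5.0, 2.8, 4.2, 3.3, 2.5, 3.9, 4.6]
lower, upper = naive_bootstrap_ci(x, mean)
print(lower, upper)
```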

Bootstrap: Percentile bootstrap confidence intervals
To determine confidence intervals we look at the distribution of $\hat\theta - \theta$. Let $c_1$ and $c_2$ be the $\alpha/2$- and $(1 - \alpha/2)$-quantiles, i.e.
$$P\left( c_1 \le \hat\theta - \theta \le c_2 \right) = 1 - \alpha$$
Then $\left[ \hat\theta - c_2 ,\, \hat\theta - c_1 \right]$ is the $(1 - \alpha)$-confidence interval.

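Bootstrap: Percentile bootstrap confidence intervals
Approximate the distribution of $\hat\theta - \theta$ by bootstrapping $\hat\theta^* - \hat\theta$. Let $c_1^*$ and $c_2^*$ be the $\alpha/2$- and $(1 - \alpha/2)$-quantiles, i.e.
$$P\left( c_1^* \le \hat\theta^* - \hat\theta \le c_2^* \right) = 1 - \alpha$$
We obtain $c_1^* = \hat\theta_{((\alpha/2)B)}^* - \hat\theta$ and $c_2^* = \hat\theta_{((1 - \alpha/2)B)}^* - \hat\theta$, and
$$\left[ \hat\theta - c_2^* ,\, \hat\theta - c_1^* \right] = \left[ 2\hat\theta - \hat\theta_{((1 - \alpha/2)B)}^* \,;\, 2\hat\theta - \hat\theta_{((\alpha/2)B)}^* \right]$$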

Bootstrap: Percentile bootstrap confidence intervals
Algorithm of the percentile method:
1. Compute $\hat\theta$ from the original sample $X_1, \ldots, X_n$
2. Generate a large number $B$ of resamples and compute $\hat\theta_1^*, \ldots, \hat\theta_B^*$
3. Let $\hat\theta_{(1)}^* \le \hat\theta_{(2)}^* \le \ldots \le \hat\theta_{(B)}^*$ be the order statistics
4. The bootstrap $(1 - \alpha)$-confidence interval is
$$\left[ 2\hat\theta - \hat\theta_{((1 - \alpha/2)B)}^* \,;\, 2\hat\theta - \hat\theta_{((\alpha/2)B)}^* \right]$$

Bootstrap: Example 3
Parametric bootstrap 0.95-confidence interval for $\lambda$ of an exponential distribution. Simple random sample $X_1, \ldots, X_8$. Estimate $\lambda$ by $\hat\lambda = 1 / \bar X$. Estimate the unknown distribution function $F$ by
$$F_{\hat\lambda}(x) = 1 - \exp\left( -\hat\lambda x \right)$$

Bootstrap: Example 3 (contd): The algorithm (R script: bootex3.r)
1. Compute $\hat\lambda = 1 / \bar X$ from $X_1, \ldots, X_8$
2. Draw a simple random sample $X_1^*, \ldots, X_8^*$ from $F_{\hat\lambda}$
3. Compute $\hat\lambda^* = 1 / \bar X^*$
4. Repeat steps 2 and 3 a large number $B$ of times, save the results as $\hat\lambda_1^*, \ldots, \hat\lambda_B^*$
5. The bootstrap 0.95-confidence interval (with $\alpha = 0.05$) is
$$\left[ 2\hat\lambda - \hat\lambda_{((1 - \alpha/2)B)}^* \,;\, 2\hat\lambda - \hat\lambda_{((\alpha/2)B)}^* \right]$$
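
The algorithm can be sketched in Python as follows. The code is an illustration and not the content of bootex3.r; the function name, the seed and the eight data values are hypothetical, and $(\alpha/2)B$ is assumed to be an integer.

```python
import random


def bootstrap_ci_rate(x, alpha=0.05, B=4000, seed=0):
    """Percentile-method interval [2*lam - lam*_(hi) ; 2*lam - lam*_(lo)]."""
    rng = random.Random(seed)
    n = len(x)
    lam_hat = n / sum(x)                                            # step 1
    lam_star = sorted(                                              # steps 2 to 4
        n / sum(rng.expovariate(lam_hat) for _ in range(n)) for _ in range(B)
    )
    q_low = lam_star[round(alpha / 2 * B) - 1]          # order statistic (alpha/2)B
    q_high = lam_star[round((1 - alpha / 2) * B) - 1]   # order statistic (1-alpha/2)B
    return 2 * lam_hat - q_high, 2 * lam_hat - q_low, lam_hat       # step 5


# Hypothetical sample of size 8.
x = [0.12, 0.85, 0.40, 0.07, 1.30, 0.22, 0.58, 0.31]
ci_low, ci_high, lam_hat = bootstrap_ci_rate(x)
print(ci_low, lam_hat, ci_high)
```

Nothing in this construction keeps the lower limit positive: with $n = 8$ the bootstrap distribution of $\hat\lambda^*$ is strongly right-skewed, so reflecting its upper quantile around $\hat\lambda$ can produce a negative lower limit.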

Bootstrap: Hypothesis testing
Test the hypotheses $H_0: \theta = \theta_0$ against $H_1: \theta \ne \theta_0$ at significance level $\alpha$. Assumption: random sample (univariate or multivariate). Test statistic: $T = \hat\theta - \theta_0$.

Bootstrap: Hypothesis testing
Reject $H_0$ if the value of the test statistic is less than the $\alpha/2$-quantile of $T$ or greater than the $(1 - \alpha/2)$-quantile of $T$. The p-value of the test is $P(|T| > |t|)$. How can we estimate the distribution of $T$ under $H_0$? Wald approach: bootstrap distribution of $T^* = \hat\theta^* - \hat\theta$, where $\hat\theta^* = \hat\theta(X_1^*, \ldots, X_n^*)$ is calculated from resamples drawn under the alternative hypothesis.

Bootstrap: Hypothesis testing
Lagrange multiplier approach: bootstrap distribution of $T^{\#} = \hat\theta^{\#} - \theta_0$. Attention: $\hat\theta^{\#} = \hat\theta(X_1^{\#}, \ldots, X_n^{\#})$ is calculated from resamples drawn under the null hypothesis! This approach is particularly suitable for the parametric bootstrap (but can also be used for other bootstraps).

Bootstrap: Hypothesis testing: General algorithm
1. Compute the test statistic $T$ from $X_1, \ldots, X_n$
2. Draw a resample under the null hypothesis, $X_1^{\#}, \ldots, X_n^{\#}$, or draw a resample under the alternative hypothesis, $X_1^*, \ldots, X_n^*$
3. Compute the test statistic $T^{\#}$ or $T^*$ for the resample
4. Repeat steps 2 and 3 a large number $B$ of times; save the results as $T_1^{\#}, \ldots, T_B^{\#}$ or $T_1^*, \ldots, T_B^*$
5. Calculate the $\alpha/2$-quantile $c_1^{\#}$ (or $c_1^*$) and the $(1 - \alpha/2)$-quantile $c_2^{\#}$ (or $c_2^*$)
6. Reject $H_0$ if the test statistic $T$ is less than $c_1^{\#}$ (or $c_1^*$) or greater than $c_2^{\#}$ (or $c_2^*$)

Bootstrap: Example 4
Parametric bootstrap for the parameter $\lambda$ of an exponential distribution, $X \sim Exp(\lambda)$. Random sample $X_1, \ldots, X_8$. Hypotheses $H_0: \lambda = \lambda_0 = 2$ against $H_1: \lambda \ne \lambda_0$ (at level $\alpha = 0.05$). Test statistic: $T = \hat\lambda - 2$. Bootstrap of the distribution of $T$ under the alternative hypothesis (Wald approach): R script bootex4a.r.

Bootstrap: Example 4 (contd)
Bootstrap of the distribution of $T$ under the null hypothesis (LM approach): R script bootex4b.r. Under the null hypothesis, $X^{\#} \sim Exp(\lambda_0)$ with $\lambda_0 = 2$. Hence, the distribution of $T^{\#}$ is found by an ordinary Monte-Carlo simulation! If $T < T_{((\alpha/2)B)}^{\#}$ or $T > T_{((1 - \alpha/2)B)}^{\#}$, reject $H_0$.
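
Both variants of Example 4 differ only in the rate used to draw the resamples: $\lambda_0$ for the LM approach and $\hat\lambda$ for the Wald approach, with the resampled statistic centred at that same rate. The sketch below is an illustration and not the content of bootex4a.r or bootex4b.r; the function name, the `approach` argument, the seed and the data are hypothetical.

```python
import random


def bootstrap_test_rate(x, lam0=2.0, alpha=0.05, B=4000, approach="LM", seed=0):
    """Two-sided bootstrap test of H0: lambda = lam0 for exponential data."""
    rng = random.Random(seed)
    n = len(x)
    lam_hat = n / sum(x)
    t_obs = lam_hat - lam0                     # test statistic T
    # LM: resample under H0 (rate lam0), T# = lam# - lam0.
    # Wald: resample under the alternative (rate lam_hat), T* = lam* - lam_hat.
    rate = lam0 if approach == "LM" else lam_hat
    t_boot = sorted(
        n / sum(rng.expovariate(rate) for _ in range(n)) - rate for _ in range(B)
    )
    c1 = t_boot[round(alpha / 2 * B) - 1]          # alpha/2-quantile
    c2 = t_boot[round((1 - alpha / 2) * B) - 1]    # (1 - alpha/2)-quantile
    reject = t_obs < c1 or t_obs > c2
    return t_obs, c1, c2, reject


# Hypothetical sample of size 8.
x = [0.12, 0.85, 0.40, 0.07, 1.30, 0.22, 0.58, 0.31]
print(bootstrap_test_rate(x, approach="LM"))
print(bootstrap_test_rate(x, approach="Wald"))
```

Because the spread of $\hat\lambda$ scales with the rate used for resampling, the two approaches can give different critical values, and therefore different decisions, when $\hat\lambda$ is far from $\lambda_0$.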

Bootstrap: Example 5
Nonparametric test for equality of two expectations. Two independent variables $X$ and $Y$ with expectations $\mu_X$, $\mu_Y$ and unknown variances $\sigma_X^2$, $\sigma_Y^2$. Hypotheses $H_0: \mu_X = \mu_Y$ against $H_1: \mu_X \ne \mu_Y$. Samples $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$. Test statistic:
$$T = \frac{\hat\mu_X - \hat\mu_Y}{\sqrt{\hat\sigma_X^2 + \hat\sigma_Y^2}}$$