Lecture 14: More on structural estimation
Economics 8379, George Washington University
Instructor: Prof. Ben Williams
Traditional MLE and GMM

MLE requires a full specification of a model for the distribution of $y \mid x$. Typically,
$$f(y_1, \dots, y_n \mid x_1, \dots, x_n) = \prod_{i=1}^n f_\theta(y_i \mid x_i)$$
where $f_\theta(y_i \mid x_i)$ is known if $\theta \in \mathbb{R}^K$ is known.

GMM is sometimes preferred because it is based only on moments: $E(w_i\, m(y_i, x_i, \theta)) = 0$.

Both are difficult when $f_\theta(y_i \mid x_i)$ or $m$ is difficult to compute.
MLE vs GMM

Two reasons $f_\theta(y_i \mid x_i)$ or $m$ can be difficult to compute:
- latent variable: $f_\theta(y \mid x) = \int f_\theta(y \mid x, u)\, f(u)\, du$
- $y_i$ is determined conditional on $x_i$ and unobserved shock(s) via an economic model, which may involve dynamic optimization, solution of a Nash equilibrium, etc.
Early examples

Multinomial probit:
$$\ell(\beta) = \sum_{i=1}^n \sum_{j=1}^J 1(y_i = j) \log(\Pr(y_i = j \mid X_i))$$
where
$$\Pr(y_i = j \mid X_i) = \Pr\left(X_{ij}\beta + \varepsilon_{ij} \geq \max_{l \neq j}\, X_{il}\beta + \varepsilon_{il}\right)$$
and $\varepsilon_i = (\varepsilon_{i1}, \dots, \varepsilon_{iJ}) \sim N(0, \Sigma)$.
Early examples

Random coefficients logit: same form for the likelihood, with
$$\Pr(y_i = j \mid X_i) = \int \frac{\exp(X_{ij}\beta_i)}{\sum_{l=1}^J \exp(X_{il}\beta_i)}\, f(\beta_i)\, d\beta_i$$
and $\beta_i \sim N(\bar\beta, \Sigma_\beta)$.
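The integral over $\beta_i$ has no closed form, but it can be approximated by Monte Carlo: draw $\beta_i$ from $N(\bar\beta, \Sigma_\beta)$ and average the resulting logit probabilities. A minimal sketch (the function name and interface are hypothetical, not from the lecture):

```python
import numpy as np

def rc_logit_prob(X, j, beta_bar, chol_Sigma, S=1000, seed=0):
    """Approximate Pr(y = j | X) in a random-coefficients logit
    by simulation: draw beta_s ~ N(beta_bar, Sigma) and average
    the logit probability over the S draws. X is (J, K): one row
    of characteristics per alternative."""
    rng = np.random.default_rng(seed)
    K = len(beta_bar)
    # beta_s = beta_bar + L z_s, where Sigma = L L' (Cholesky factor)
    z = rng.standard_normal((S, K))
    betas = beta_bar + z @ chol_Sigma.T          # (S, K)
    u = X @ betas.T                              # (J, S) utilities
    u -= u.max(axis=0)                           # numerical stability
    p = np.exp(u)
    p /= p.sum(axis=0)                           # logit prob per draw
    return p[j].mean()                           # average over draws
```

With `chol_Sigma` equal to the zero matrix the coefficients are degenerate at $\bar\beta$ and the simulated probability reduces to the ordinary logit probability exactly.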
Maximum Simulated Likelihood

Suppose that $f(y_i \mid X_i, \theta) = \int g(y_i \mid X_i, u, \theta)\, \psi(u)\, du$.
- simulate $u_{i1}, \dots, u_{iS}$ i.i.d. $\psi(\cdot)$ for each $i$ and replace $\ell_i(\theta) = \log(f(y_i \mid X_i, \theta))$ with
$$\hat\ell_i(\theta) = \log\left(S^{-1} \sum_{s=1}^S g(y_i \mid X_i, u_{is}, \theta)\right)$$
- then $\hat\theta_{MSL} = \arg\max_\theta \sum_{i=1}^n \hat\ell_i(\theta)$.
Maximum Simulated Likelihood
- Only consistent and asymptotically normal if $\sqrt{n}/S \to 0$. Take $S$ as a multiple of the sample size if feasible.
- Do not draw a new simulation sample in each iteration of the optimization routine!
- Sometimes this naive simulator can be improved by importance sampling and other variance-reduction techniques. See Section 12.7 in CT.
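A concrete MSL sketch for an assumed toy model (not one from the lecture): a binary choice model with an extra standard normal latent shock, $\Pr(y_i = 1 \mid x_i, u_i) = \Phi(\theta_0 + \theta_1 x_i + u_i)$. Note that the draws $u_{is}$ are generated once and held fixed across optimizer iterations, as the slide advises; the function name is hypothetical.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def msl_fit(y, x, S=200, seed=0):
    """MSL for the stylized model Pr(y=1 | x, u) = Phi(t0 + t1*x + u),
    u ~ N(0,1). The latent-shock integral is replaced by a simulation
    average over S draws per observation."""
    rng = np.random.default_rng(seed)
    n = len(y)
    u = rng.standard_normal((n, S))   # draw once, reuse every iteration

    def neg_ll(theta):
        index = theta[0] + theta[1] * x
        p = norm.cdf(index[:, None] + u)          # (n, S)
        g = np.where(y[:, None] == 1, p, 1 - p)   # g(y_i | x_i, u_is, theta)
        fhat = g.mean(axis=1)                     # S^{-1} sum_s g
        return -np.sum(np.log(np.clip(fhat, 1e-300, None)))

    return minimize(neg_ll, x0=np.zeros(2), method="Nelder-Mead").x
```

Redrawing `u` inside `neg_ll` would make the objective jump between iterations and break the optimizer, which is exactly why the draws are fixed up front.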
Method of Simulated Moments

Suppose we want to estimate $\theta$ based on the moment condition
$$E(w_i\, m(y_i, x_i, \theta_0)) = 0$$
where $m(y_i, x_i, \theta) = \int h(y_i, x_i, u, \theta)\, \psi(u)\, du$ requires simulation.
- then draw $u_{is}$, $s = 1, \dots, S$, for each $i$ and compute
$$\hat m(y_i, x_i, \theta) = S^{-1} \sum_{s=1}^S h(y_i, x_i, u_{is}, \theta)$$
Method of Simulated Moments

If $E(\hat m(y_i, x_i, \theta) \mid y_i, x_i) = m(y_i, x_i, \theta)$ (unbiased simulation) and the usual GMM conditions are satisfied, then minimizing
$$\left\| \sum_{i=1}^n w_i\, \hat m(y_i, x_i, \theta) \right\|$$
provides a consistent, asymptotically normal estimator.
- if, in addition, $S \to \infty$, then the estimator is asymptotically equivalent to the GMM estimator
- for finite $S$, the asymptotic variance is inflated by a factor of $1 + S^{-1}$, though this can be improved, e.g., by importance sampling
Method of Simulated Moments
- variance estimation requires either simulation or the bootstrap
- Gourieroux and Monfort (1991) provide more general conditions under which $S \to \infty$ is not necessary
- Pakes and Pollard (1989) provide some examples.
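A toy MSM sketch under assumed names and an assumed model (not from the slides): $y_i = \exp(\mu + \sigma u_i)$ with $u_i \sim N(0,1)$ and $\sigma$ known, using the moment $E[y_i - \exp(\mu + \sigma u)] = 0$. For the sake of illustration we pretend the inner expectation has no closed form and simulate it; because the simulator enters the moment linearly, it is unbiased, so fixed $S$ suffices.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def msm_estimate(y, S=500, sigma=1.0, seed=0):
    """MSM for the toy model y_i = exp(mu + sigma*u_i), u_i ~ N(0,1),
    sigma known. The moment E[exp(mu + sigma*u)] is replaced by a
    simulation average over S draws per observation."""
    rng = np.random.default_rng(seed)
    n = len(y)
    u = rng.standard_normal((n, S))   # fixed draws across mu evaluations

    def objective(mu):
        h = np.exp(mu + sigma * u)            # h(y_i, x_i, u_is, theta)
        m_hat = y - h.mean(axis=1)            # unbiased simulator of m
        return (m_hat.sum() / n) ** 2         # squared sample moment

    res = minimize_scalar(objective, bounds=(-5, 5), method="bounded")
    return res.x
```

Because the moment is exactly identified here, minimizing the squared sample moment drives it to (approximately) zero; with overidentification one would minimize a weighted quadratic form instead.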
Indirect inference
- auxiliary model, e.g., a likelihood: $\ell_n(\theta) = \sum_{i=1}^n \log(f(y_i \mid X_i, \theta))$
- an auxiliary estimate, e.g., $\hat\theta = \arg\max_\theta \ell_n(\theta)$
- economic model, e.g., $y_i = G(X_i, u_i; \beta)$ for $i = 1, \dots, n$ and $u_i$ i.i.d. $F_u$
- simulate $\{y_i^m(\beta)\}$ and obtain $\tilde\theta(\beta)$ by maximizing $\sum_{m=1}^M \sum_{i=1}^n \log(f(y_i^m(\beta) \mid X_i, \theta))$
Indirect inference
- $\hat\beta = \arg\min_\beta D(\hat\theta, \tilde\theta(\beta))$
- $D$ is a metric function; Smith (2008) suggests Wald, LR, and LM metrics
- consistent and asymptotically normal for $M$ fixed, $n \to \infty$
- variance inflated by $1 + M^{-1}$
- very easy to implement despite the lack of efficiency
Indirect inference

The following are typical conditions required for indirect inference:
- the economic model is correctly specified and well-behaved
- the auxiliary likelihood function is well-behaved in the limit, despite the fact that it is misspecified
- binding function: $\ell_n(\theta) \to_p \ell(\theta; \beta, F_u)$ when the data are generated by the economic model with parameters $\beta$ and distribution $F_u$
  - define $\theta(\beta) = \arg\max_\theta \ell(\theta; \beta, F_u)$
  - $\theta_0 = \theta(\beta_0)$ is the pseudo-true value; $\theta(\beta)$ is the binding function
Indirect inference
- under appropriate regularity conditions, $\hat\theta \to_p \theta_0$ and $\tilde\theta(\beta) \to_p \theta(\beta)$
- thus the identification condition: is $\beta_0$ the unique solution to $\theta_0 = \theta(\beta)$?
  - requires $\dim(\theta) \geq \dim(\beta)$
- simulation avoids needing to know the binding function
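The mechanics can be sketched under an assumed toy setup (all names hypothetical): the structural model is a probit, the auxiliary model is a misspecified linear probability model estimated by OLS, and the simulated indicator is smoothed so that the objective is differentiable in $\beta$.

```python
import numpy as np
from scipy.special import expit
from scipy.optimize import minimize

def indirect_inference(y, X, M=10, h=0.05, seed=0):
    """Indirect inference sketch. Assumed structural model:
    y_i = 1{X_i'beta + u_i > 0}, u_i ~ N(0,1). Auxiliary model:
    OLS of y on X (a misspecified linear probability model).
    The indicator is smoothed by a logistic kernel with
    bandwidth h to keep the simulated objective smooth."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    ols = lambda yy: XtX_inv @ (X.T @ yy)   # auxiliary estimator
    theta_hat = ols(y)                      # auxiliary estimate on the data
    u = rng.standard_normal((M, n))         # draws fixed across beta

    def distance(beta):
        # average the auxiliary estimate over M simulated datasets,
        # approximating the binding function by simulation
        y_sim = expit((X @ beta + u) / h)               # (M, n)
        theta_sim = np.mean([ols(y_sim[m]) for m in range(M)], axis=0)
        d = theta_hat - theta_sim
        return d @ d                        # squared Euclidean metric

    return minimize(distance, x0=np.zeros(k), method="Nelder-Mead").x
```

The squared Euclidean distance here is the simplest choice of $D$; the Wald, LR, and LM metrics mentioned above weight the discrepancy by features of the auxiliary likelihood instead.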
Rust (1987)

Agents solve a dynamic control problem where $d_t$ is a discrete control variable and the state variables are $(x_t, \varepsilon_t)$. We observe $(d_t, x_t)_{t=1}^T$. The agent solves:
$$\max_{d_t, d_{t+1}, \dots} E\left(\sum_{s=t}^T \beta^{s-t}\, U(d_s, x_s, \varepsilon_s) \,\middle|\, d_t, x_t, \varepsilon_t\right)$$
Suppose
1. $U(d_s, x_s, \varepsilon_s) = u(d_s, x_s) + \varepsilon_s(d_s)$
2. $f(x_{t+1}, \varepsilon_{t+1} \mid d_t, x_t, \varepsilon_t) = f(x_{t+1} \mid d_t, x_t)\, f(\varepsilon_{t+1})$

Then the solution is characterized by the integrated value function.
Rust (1987)

In particular, the integrated value function solves the Bellman equation:
$$\bar V(x_t) = \int \max_d \left\{ u(d, x_t) + \varepsilon_t(d) + \beta \sum_{x_{t+1}} \bar V(x_{t+1})\, f(x_{t+1} \mid d, x_t) \right\} f(\varepsilon_t)\, d\varepsilon_t$$
and the decision rule is
$$d_t = \arg\max_d \left\{ u(d, x_t) + \varepsilon_t(d) + \beta \sum_{x_{t+1}} \bar V(x_{t+1})\, f(x_{t+1} \mid d, x_t) \right\}$$
Rust (1987)

Then
$$\Pr(d_t = d \mid x_t) = \Pr\left(v(d, x_t) + \varepsilon_t(d) > v(d', x_t) + \varepsilon_t(d') \text{ for all } d' \neq d\right)$$
where $v(d, x_t) = u(d, x_t) + \beta \sum_{x_{t+1}} \bar V(x_{t+1})\, f(x_{t+1} \mid d, x_t)$.

This can in principle be implemented through maximum likelihood since the log-likelihood can be written as
$$\sum_{i=1}^n \sum_{t=1}^T \left[ \log(f(x_{i,t+1} \mid d_{i,t}, x_{i,t})) + \log(\Pr(d_{i,t} \mid x_{i,t})) \right]$$
Rust (1987)

This can be adapted to a finite horizon problem too, as discussed in Arcidiacono and Ellickson (2011): start with the last period and use backwards recursion to solve for period-specific value functions. The model can also include a choice-dependent outcome variable, $y_t$, by adding $\log(f(y_{it} \mid d_{it}, x_{it}))$ to the log-likelihood.
Rust (1987)

Rust (1987) assumes that the $\varepsilon_t(d)$ are independent type 1 extreme value, which leads to a logistic expression for the choice probability:
$$\Pr(d_t = d \mid x_t) = \frac{\exp\left(u(d, x_t) + \beta \sum_{x_{t+1}} \bar V(x_{t+1})\, f(x_{t+1} \mid x_t, d)\right)}{\sum_{d'} \exp\left(u(d', x_t) + \beta \sum_{x_{t+1}} \bar V(x_{t+1})\, f(x_{t+1} \mid x_t, d')\right)}$$
and for the integrated value function:
$$\bar V(x_t) = \log\left(\sum_d \exp\left(u(d, x_t) + \beta \sum_{x_{t+1}} \bar V(x_{t+1})\, f(x_{t+1} \mid d, x_t)\right)\right)$$
Rust (1987)

Rust (1987) nested fixed point algorithm:
- Estimate $f(x_{t+1} \mid d_t, x_t)$ separately.
- Start with an initial value for the other parameters, $\hat\theta_0$.
- Using this parameter value, solve for the value function through successive iterations of the Bellman equation. This will be a vector (because $x_t$ is discretized), $V(\hat\theta_0)$.
- Given $\hat\theta_0$ and $V(\hat\theta_0)$, compute the ingredients needed to update:
$$\hat\theta_1 = \hat\theta_0 + H(\hat\theta_0)^{-1} J(\hat\theta_0)$$
- Iterate this until convergence. Check other initial values.
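The inner loop of the algorithm can be sketched as follows for the type 1 extreme value case on a discretized state space. This is a generic contraction-mapping sketch, not Rust's actual implementation (which also uses Newton-Kantorovich steps to accelerate convergence); the interface is assumed.

```python
import numpy as np
from scipy.special import logsumexp

def solve_value_function(u_flow, trans, beta=0.95, tol=1e-10, max_iter=5000):
    """Solve the logsumexp Bellman fixed point
        V(x) = log sum_d exp( u(d,x) + beta * sum_x' V(x') f(x'|d,x) )
    by successive approximation.
    u_flow: (D, X) flow utilities u(d, x).
    trans:  (D, X, X) transition matrices f(x'|d, x)."""
    D, X = u_flow.shape
    V = np.zeros(X)
    for _ in range(max_iter):
        # choice-specific values v(d, x) = u(d, x) + beta * E[V | d, x]
        v = u_flow + beta * trans @ V          # (D, X)
        V_new = logsumexp(v, axis=0)           # integrate out EV1 shocks
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V
```

Given the solved $V$, the conditional choice probabilities in the logit formula above are `np.exp(v - V)` with `v = u_flow + beta * trans @ V`, which is the quantity the outer likelihood-maximization loop needs at each trial value of $\hat\theta$.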
Keane and Wolpin (1997)

Keane and Wolpin (1997) extend this in several ways:
- permanent unobserved heterogeneity
- payoff variables that are choice-censored
- nonseparable unobservables
- unobservables correlated across choice alternatives
Keane and Wolpin (1997)

The log-likelihood for the model is a mixture:
$$\log\left(\sum_{l=1}^L L_i(\theta, \omega_l)\, \pi_{l \mid x_{i1}}\right)$$
where
$$L_i(\theta, \omega_l) = \prod_{t=1}^{T_i} \Pr(d_{i,t} \mid x_{i,t}, \theta, \omega_l)\, f(y_{it} \mid d_{it}, x_{it}, \theta, \omega_l) \prod_{t=1}^{T_i - 1} f(x_{i,t+1} \mid d_{i,t}, x_{i,t}, \theta, \omega_l)$$
There is a type-specific integrated value function:
$$\bar V_l(x_t) = \int \max_d \left\{ U(d, x_t, \omega_l, \varepsilon_t) + \beta \sum_{x_{t+1}} \bar V_l(x_{t+1})\, f(x_{t+1} \mid d, x_t, \omega_l) \right\} f(\varepsilon_t)\, d\varepsilon_t$$
Keane and Wolpin (1997)

Keane and Wolpin (1997) develop a new method to:
- simulate the integrals, which will not be in closed form even if the errors are type 1 extreme value
- interpolate the value function, computed on only part of the state space

This is called the simulation and interpolation method.