ADVANCED FINANCIAL ECONOMETRICS PROF. MASSIMO GUIDOLIN


Massimo Guidolin (Massimo.Guidolin@unibocconi.it), Dept. of Finance

LECTURE 3: REVIEW OF BASIC ESTIMATION METHODS: GMM AND OTHER EXTREMUM ESTIMATORS; SIMULATION-BASED ESTIMATORS

OVERVIEW
1) Definition of Generalized Method of Moments estimators: unconditional and conditional moment restrictions
2) Extremum estimators: asymptotic normality
3) Efficient GMM estimators
4) Goodness-of-fit tests in GMM (extremum estimators)
5) Sequential, partial estimation
6) Simulation-based estimators (brief introduction): SMM
7) Simulation-based estimators (brief introduction): MCMC
8) Some notes on SNP auxiliary models

THE GMM: DEFINITION
In modern econometrics, GMM is the leading case of limited-information estimator. We want to estimate a parameter vector θ_0 in the admissible parameter space Φ ⊂ R^K; estimation is based on a sample of z_t, a sub-vector of the complete set of variables appearing in a DAPM. The restrictions on the distribution of z_t to be used in estimating θ_0 are summarized as a set of restrictions on the moments of functions of z_t. The moment restrictions may be conditional or unconditional. In the unconditional case, E[h(z_t; θ)] = 0 is satisfied uniquely by θ = θ_0, where h is an M-dimensional vector of functions, M ≥ K. h may define standard central or noncentral moments of returns, the orthogonality of forecast errors to variables, etc.

THE GMM: UNCONDITIONAL MOMENTS
Definition 4 [Just-identified GMM, K = M]: Because the function defined as H_0(θ) ≡ E[h(z_t; θ)] satisfies H_0(θ_0) = 0, a natural estimation strategy is to replace H_0 by its sample counterpart, H_T(θ) ≡ (1/T) Σ_{t=1}^T h(z_t; θ), and to choose the θ_T that sets H_T(θ_T) = 0_M. If H_T converges to its population counterpart as T gets large by a LLN, H_T(θ) → H_0(θ) for all θ ∈ Φ, then under regularity conditions we should expect that θ_T → θ_0.
Next suppose that M > K: then there is in general no unique way of solving for the K unknowns using the M equations H_T(θ) = 0.

THE GMM: UNCONDITIONAL MOMENTS
Definition 5 [Over-identified GMM, K < M]: Let {a_T : T ≥ 1} be a sequence of s × M matrices of rank s, K ≤ s ≤ M, and consider the function a_T H_T(θ). Then the GMM criterion function is a quadratic form,
Q_T(θ) = ||a_T H_T(θ)||^2 = H_T(θ)′ W_T H_T(θ), with W_T ≡ a_T′ a_T, (*)
where ||·|| denotes the Euclidean norm, ||x|| = (x′x)^{1/2}; the quadratic form follows from the definition of the norm. GMM estimators are optimal, in the sense of being asymptotically most efficient, when they can be represented as the solution to (*) for an appropriate choice of W_T.
Let's now focus on stronger, conditional moment restrictions: E[h(z_{t+n}; θ_0) | I_t] = 0.
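To make the quadratic-form criterion concrete, here is a minimal numerical sketch in Python. The moment vector h, the simulated Gaussian data, and the identity first-step weighting matrix are illustrative assumptions, not the lecture's own example.

```python
# Minimal sketch of the over-identified GMM criterion Q_T(θ) = H_T(θ)' W_T H_T(θ).
# Illustration only: the moment vector h, the data-generating process, and the
# identity weighting matrix are hypothetical choices, not the lecture's example.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
z = rng.normal(loc=1.0, scale=2.0, size=2000)   # simulated sample {z_t}

def h(z, theta):
    """M = 3 moment conditions in K = 2 parameters (mu, sigma2):
    E[z - mu] = 0, E[(z - mu)^2 - sigma2] = 0, E[(z - mu)^3] = 0 (symmetry)."""
    mu, sigma2 = theta
    u = z - mu
    return np.column_stack([u, u**2 - sigma2, u**3])

def Q_T(theta, z, W):
    H_T = h(z, theta).mean(axis=0)   # sample counterpart of E[h(z_t; theta)]
    return H_T @ W @ H_T             # quadratic form in the sample moments

W = np.eye(3)                        # first-step weighting matrix
res = minimize(Q_T, x0=np.array([0.0, 1.0]), args=(z, W), method="Nelder-Mead")
print("GMM estimate of (mu, sigma2):", res.x)
```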

THE GMM: CONDITIONAL MOMENTS
To construct an estimator of θ_0, we choose K sample moment equations in the K unknowns θ. Because h(z_{t+n}; θ_0) is orthogonal to any random variable in I_t, we have much more flexibility in choosing these moment conditions. We can afford to introduce a class of K × M full-rank instrument matrices A_t with elements in I_t. For any A_t, we use
E[A_t h(z_{t+n}; θ_0)] = 0. (**)
A family of GMM estimators is indexed by A_t ∈ A, where θ_T^A is the solution to the corresponding sample moment equations, (1/T) Σ_t A_t h(z_{t+n}; θ_T^A) = 0. If the sample mean of A_t h(z_{t+n}; θ) converges to its population counterpart for all θ ∈ Φ, and A_t and h are chosen so that θ_0 is the unique element of Φ satisfying (**), then we might reasonably expect θ_T^A to converge to θ_0 as T → ∞.

THE GMM: CONDITIONAL MOMENTS
The large-sample distribution of θ_T^A depends, in general, on the choice of A_t. The conditional GMM estimator is not the extremum of a criterion function; it is instead the solution to K moment equations in K unknowns, and θ_T solves the sample counterpart of these equations. The class of conditional GMM estimators θ_T^A offers more flexibility in choosing the weights on h; as a result, they are often more efficient than their unconditional counterparts. This means that they allow us to exploit more information about the distribution of z_t than (*) in the estimation of θ_0.
Example 2 [Linear projections, a.k.a. regressions]: If we define δ_0 ≡ (E[x_t x_t′])^{-1} E[x_t y_t], then by construction δ_0 satisfies E[x_t (y_t − x_t′ δ_0)] = 0.

THE GMM: CONDITIONAL MOMENTS
Why is this notable/remarkable? Because the celebrated orthogonal projection theorem tells us that the unique solution to a standard projection (i.e., population regression) problem is given by the δ_0 ∈ R^K satisfying E[x_t (y_t − x_t′ δ_0)] = 0. Notice that this means that projections/regressions are just a special case of GMM based on unconditional moment restrictions.
Example 1 (cont'd) [CIR model]: Because the conditional mean E[r_{t+Δ} | r_t] for any finite interval Δ is known in closed form (an exponentially weighted average of r_t and the long-run mean), it is natural to base GMM estimation on the forecast-error condition E[r_{t+Δ} − E(r_{t+Δ} | r_t) | r_t] = 0; instruments can then be set to any g(r_t), so that estimation is based on E[(r_{t+Δ} − E(r_{t+Δ} | r_t)) g(r_t)] = 0.
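As a small worked illustration of the claim that regressions are a special case of GMM, the sketch below solves the sample counterpart of E[x_t (y_t − x_t′δ)] = 0 and recovers the OLS formula; the data and variable names are hypothetical, not taken from the lecture.

```python
# Minimal sketch: the linear-projection moment condition E[x_t (y_t - x_t'δ)] = 0,
# solved in its sample counterpart, reproduces the OLS estimator.
import numpy as np

rng = np.random.default_rng(1)
T = 1000
x = np.column_stack([np.ones(T), rng.normal(size=T)])   # regressors (with constant)
delta_true = np.array([0.5, 2.0])
y = x @ delta_true + rng.normal(size=T)

# Just-identified GMM: set the K sample moments (1/T) Σ x_t (y_t - x_t'δ) to zero.
# Solving X'(y - Xδ) = 0 gives δ = (X'X)^{-1} X'y, i.e., OLS.
delta_gmm = np.linalg.solve(x.T @ x, x.T @ y)
print("GMM / OLS estimate:", delta_gmm)
```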

THE GMM: CONDITIONAL MOMENTS
Example 2 [Two-factor SV model]: Assume that v is a stochastic-volatility factor and therefore unobserved: the interest rate is no longer Markov given its own past history. Although the variance of r_t conditioned on r_{t−1} is not known in closed form, nor is the form of the density of r_t conditioned on J-histories of r, the conditional-mean restriction from Example 1 remains correct. Hence there are two possible estimation strategies: (i) approximate the log-likelihood function; (ii) GMM based on either the unconditional or the conditional restrictions from Example 1. However, this GMM estimator ignores entirely the structure of the volatility process; thus, not only are we unable to recover any information about the parameters of volatility, but knowledge of the functional form of the volatility equation is also ignored.

EXTREMUM ESTIMATORS
Substantially more information about f(r_t | r_{t−1}; θ_0) can be used in estimation, but to accomplish this we have to extend the GMM estimation strategy to allow for unobserved state variables. Under some technical conditions on the sequence of weight matrices {W_T}, one can show that the GMM estimator of θ_0 is (strongly) consistent for θ_0.
Definition 6 [Extremum Estimators]: EEs are estimators obtained by either maximizing or minimizing a criterion function over the admissible parameter space. Obviously, the specific properties of EEs will depend on the adopted criterion function. In fact, ML, QML, GMM (unconditional and conditional), and regression (LP) estimators are all cases of EEs. As already mentioned in Lecture 2, EEs share a common and important statistical property: they have an asymptotically normal distribution.

EXTREMUM ESTIMATORS: ASYMPTOTIC NORMALITY
Suppose that θ_T is strongly consistent for θ_0. To show asymptotic normality of θ_T, we focus on the first-order conditions for the maximization or minimization of Q_T, the sample mean of the function D(z_t; θ): (1/T) Σ_t ∂D(z_t; θ_T)/∂θ = 0. Thus the FOCs take the form of sample moment conditions, whose population counterpart, representing the FOCs for Q_0, is E[∂D(z_t; θ_0)/∂θ] = 0. Hansen (1982, ECMA) has argued that, in some ways, EEs are just generalizations of GMM to a generic instrument matrix A and to generic restrictions often written as scores or orthogonality conditions.

EXTREMUM ESTIMATORS: ASYMPTOTIC NORMALITY
One common misconception at this point is that a simple application of the classical CLT delivers asymptotic normality. However, the classical CLT is based on the strong assumption that z_t is IID: in financial econometrics, the assumption of independence is typically too strong, as it rules out persistence in the state variables and time-varying conditional volatilities. Also the assumption that {X_t} is a stationary and ergodic time series, which is much weaker than an IID assumption, is not sufficient to establish a CLT. The problem is that an ergodic time series can be highly persistent, so that X_t and X_s, for s ≠ t, are too highly correlated for the scaled sample mean to converge to a normal limit.

EXTREMUM ESTIMATORS: ASYMPTOTIC NORMALITY
A weaker assumption than IID-ness that nevertheless delivers a CLT is that z_t follows a martingale difference sequence, i.e. (referring to a generic time series {X_t}), E[X_t | X_{t−1}, X_{t−2}, ...] = 0 with probability one: X_t is mean-independent of its past. If {X_t} is also stationary and ergodic and E[X_1^2] is finite, then a CLT applies to (1/√T) Σ_t X_t. Formally stated, asymptotic normality consists of √T (θ_T − θ_0) converging in distribution to N(0, Σ_0). However, the specific structure of Σ_0 will depend on the estimator under consideration.
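The simulation below illustrates the persistence point above under assumed parameter values: for an IID (hence martingale-difference) series the variance of the scaled sample mean matches Var(X_t), while for a highly persistent but stationary AR(1) it is far larger, so a naive application of the classical CLT would use the wrong variance.

```python
# Minimal simulation: Var(sqrt(T) * sample mean) for an IID series vs. a highly
# persistent (but stationary and ergodic) AR(1). All parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(9)
T, n_rep, phi = 1000, 1000, 0.95

def scaled_mean_variance(simulate):
    means = np.array([simulate().mean() for _ in range(n_rep)])
    return T * means.var()

iid = lambda: rng.normal(size=T)                      # IID (hence MDS) case

def ar1():
    x = np.empty(T)
    prev = rng.normal() / np.sqrt(1 - phi**2)         # start from stationary dstr.
    for t in range(T):
        prev = phi * prev + rng.normal()
        x[t] = prev
    return x

print("Var(X_t), IID case:            ", 1.0)
print("Var(sqrt(T)*mean), IID case:   ", round(scaled_mean_variance(iid), 2))
print("Var(X_t), AR(1) case:          ", round(1 / (1 - phi**2), 2))
print("Var(sqrt(T)*mean), AR(1) case: ", round(scaled_mean_variance(ar1), 2))
```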

EXTREMUM ESTIMATORS: ASYMPTOTIC NORMALITY
1. Maximum Likelihood Estimator (MLE): here the moment function is the score of the conditional log-likelihood, ∂ log f(y_t | x_t; θ)/∂θ, and A_t = I_K. The key equality derives from the fact that, from the MLE FOCs, one has E[∂ log f(y_t | x_t; θ_0)/∂θ] = 0. Differentiating under the integral sign and using the chain rule yields the information-matrix equality, E[s_t(θ_0) s_t(θ_0)′] = −E[∂² log f(y_t | x_t; θ_0)/∂θ ∂θ′], where s_t(θ) denotes the score.

EXTREMUM ESTIMATORS: ASYMPTOTIC NORMALITY
Putting everything together, Σ_0 equals the inverse of the information matrix. In actual implementations, the asymptotic covariance can be estimated either as the inverse of the sample mean of the outer product of the likelihood scores, or as minus the inverse of the sample mean of the second-derivative matrix, both evaluated at b_T^ML. Asymptotically, the two objects are identical, but in small samples they do not have to coincide.
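A minimal sketch of the two covariance estimators just mentioned, for a simple Gaussian MLE with analytic scores and Hessian; the example and all symbols are illustrative, not the lecture's.

```python
# Outer product of scores (OPG) vs. minus the inverse Hessian as estimates of the
# MLE covariance, for an i.i.d. Gaussian sample with parameters (mu, sigma2).
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=0.3, scale=1.5, size=500)
T = x.size

# MLE of (mu, sigma2)
mu_hat = x.mean()
s2_hat = ((x - mu_hat) ** 2).mean()
u = x - mu_hat

# Per-observation scores of the Gaussian log-likelihood, evaluated at the MLE
scores = np.column_stack([u / s2_hat,
                          -0.5 / s2_hat + 0.5 * u**2 / s2_hat**2])

# Hessian of the log-likelihood, summed over observations
H = np.zeros((2, 2))
H[0, 0] = (-1.0 / s2_hat) * T
H[0, 1] = H[1, 0] = (-u / s2_hat**2).sum()
H[1, 1] = (0.5 / s2_hat**2 - u**2 / s2_hat**3).sum()

V_opg = np.linalg.inv(scores.T @ scores)   # OPG-based covariance of the MLE
V_hess = np.linalg.inv(-H)                 # Hessian-based covariance of the MLE
print("OPG std. errors:    ", np.sqrt(np.diag(V_opg)))
print("Hessian std. errors:", np.sqrt(np.diag(V_hess)))
```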

EXTREMUM ESTIMATORS: ASYMPTOTIC NORMALITY
The asymptotic covariance matrix of b_T^ML attains the Cramér-Rao lower bound, the inverse of the information matrix. Even though the MLE may be biased in small samples, as T gets large it is the most efficient estimator, in the sense of having the smallest asymptotic covariance matrix among all consistent estimators of β_0.
2. GMM: because the FOCs set A_T H_T(θ_T) = 0 and assuming that A_T converges in probability to A_0 = d_0′ W_0, the asymptotic covariance takes a sandwich form. If the probability limit of the distance matrix defining the GMM criterion function is chosen to be W_0 = Σ_0^{-1}, then the covariance simplifies to (d_0′ Σ_0^{-1} d_0)^{-1}, where d_0 ≡ E[∂h(z_t; θ_0)/∂θ′].

EXTREMUM ESTIMATORS: ASYMPTOTIC NORMALITY
3. QMLE: it is essentially the same as the MLE case, but now the information-matrix equality no longer holds, so the outer-product and Hessian-based matrices enter the asymptotic covariance separately, in sandwich form.

EFFICIENT GMM ESTIMATOR
Under QMLE, Σ_0 and d_0 are different from those under MLE. In particular, no further simplification obtains, so the sandwich form of the covariance matrix must be retained. In many circumstances, a researcher estimating a GMM will have considerable latitude in choosing either A_0 or h(z_t, θ), or both. Therefore, a natural question is: which is the optimal GMM estimator among all admissible estimators? A natural answer is simply: the most efficient one, in the sense of having the smallest asymptotic covariance matrix among all estimators that exploit the same information about the distribution of z_t.

EFFICIENT GMM ESTIMATOR
Notice that the choice of a weighting matrix A_t makes sense only when M > K, i.e., when there are more moment conditions than parameters. In the case of GMM based on unconditional moment restrictions (sometimes called "fixed" GMM), it is easy to show that the optimal choice satisfies A_0 = d_0′ Σ_0^{-1}. To relate this observation back to the standard GMM criterion function expressed as a quadratic form in H_T(θ): because A_0 = d_0′ W_0, the optimal GMM estimator is obtained by setting W_0 = Σ_0^{-1}. As Σ_0 is the asymptotic covariance matrix of the sample moment H_T(θ_0), this choice of W_0 gives the most weight to those moment conditions that are most precisely estimated, in the sense of having a small (asymptotic) variance.

EFFICIENT GMM ESTIMATOR
In the case of the conditional moment restriction-based GMM, the optimal choice of the weights becomes time-varying: the optimal instruments are built from the conditional analogues of d_0 and Σ_0, and plugging these into A_t = d_t′ W_t gives the efficient conditional GMM estimator. Notice that MLE represents a special case of optimal GMM in which the optimality derives from the choice of the moment condition(s) and not of the weighting matrix (i.e., A_t = I_K for all t).

GMM GOODNESS-OF-FIT TESTS
In minimizing Q_T(θ) over the choice of θ ∈ Θ, the GMM estimator is chosen to set K linear combinations of the M sample moment conditions H_T to zero (the K first-order conditions). Yet, if the model is correctly specified, all M sample moment equations H_T(θ_T) should be close to zero. Therefore one can construct a goodness-of-fit test of the model by examining whether the linear combinations of H_T(θ_T) that are not set to zero in estimation are in fact close to 0. It turns out that the minimized value of the GMM criterion function, scaled by the sample size, T·Q_T(θ_T), is such a goodness-of-fit test statistic. Under the null hypothesis that the model is correctly specified, T·Q_T(θ_T) is asymptotically distributed as a chi-square with M − K degrees of freedom.
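The self-contained sketch below strings together first-step GMM, the efficient second step with W_T = Σ̂^{-1}, and the overidentification test T·Q_T(θ_T) against a chi-square(M − K) distribution; the moment conditions are the same hypothetical Gaussian ones used in the earlier sketch, not the lecture's.

```python
# Two-step efficient GMM and the overidentification (J) test on an illustrative
# over-identified Gaussian example (M = 3 moments, K = 2 parameters).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

rng = np.random.default_rng(0)
z = rng.normal(loc=1.0, scale=2.0, size=2000)
M, K = 3, 2

def h(z, theta):
    mu, sigma2 = theta
    u = z - mu
    return np.column_stack([u, u**2 - sigma2, u**3])

def Q_T(theta, z, W):
    H_T = h(z, theta).mean(axis=0)
    return H_T @ W @ H_T

# Step 1: first-step estimate with identity weighting
theta1 = minimize(Q_T, np.array([0.0, 1.0]), args=(z, np.eye(M)),
                  method="Nelder-Mead").x

# Step 2: W = Sigma_hat^{-1}, where Sigma_hat estimates the covariance of h(z_t; theta)
Sigma_hat = np.cov(h(z, theta1), rowvar=False, bias=True)
res2 = minimize(Q_T, theta1, args=(z, np.linalg.inv(Sigma_hat)),
                method="Nelder-Mead")

# J test: T * Q_T(theta_T) ~ chi2(M - K) under correct specification
T = len(z)
J = T * res2.fun
print("efficient GMM estimate:", res2.x)
print("J =", round(J, 3), " p-value =", round(1 - chi2.cdf(J, df=M - K), 3))
```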

SEQUENTIAL ESTIMATORS (ANOTHER CASE OF QML)
Notice that the efficient-GMM first-order condition sets to zero a weighted average of sample moment conditions. With complex estimation problems, it is often convenient to break the problem down into multiple steps to exploit computational advantages. For instance, given a partition of the parameters into (θ_0, λ_0), first the sub-vector θ_0 is estimated using a subset of the available moment equations, and then the sub-vector λ_0 is estimated in a second stage using additional moment conditions that also depend on θ_0. An important issue with sequential estimation is whether the asymptotic distribution of λ_T is affected by the first-stage estimation of θ_0. The answer is generally "yes"; however, it is not always yes and, fortunately, there is a simple way to check...

SEQUENTIAL ESTIMATORS (ANOTHER CASE OF QML)
Consider the GMM problem based on the stacked moment conditions E[h_1(z_t; θ_0)] = 0 and E[h_2(z_t; θ_0, λ_0)] = 0, where θ_0 has dimension K_1, h_1 is an M_1-vector function, λ_0 has dimension K_2, and h_2 is an M_2-vector function. A first-stage estimator θ_T of θ_0 is obtained by solving A_{1T} H_{1T}(θ_T) = 0 for some K_1 × M_1 matrix A_{1T} with probability limit A_{10}. Similarly, the second-stage estimator λ_T of λ_0 is obtained as the solution to the corresponding second-stage sample moment equations, evaluated at θ_T. (**)

SEQUENTIAL ESTIMATORS (ANOTHER CASE OF QML)
Taking mean-value expansions of the two sets of sample moment equations around (θ_0, λ_0), at suitable intermediate values, and solving for (θ_T − θ_0) and (λ_T − λ_0), shows how first-stage estimation error enters the second stage. The asymptotic distribution of the λ_T that solves this system is the same as that of the λ_T that solves (**) with θ_0 known if and only if the cross-sensitivity term E[∂h_2(z_t; θ_0, λ_0)/∂θ′] is zero. Under this condition, the correct limiting distribution of λ_T is obtained by treating θ_0 as if it were known: there is no effect of the first-stage pre-estimation of θ_0 on the (limiting) distribution of λ_T.

SEQUENTIAL ESTIMATORS (ANOTHER CASE OF QML)
The condition above therefore provides a useful check of whether sequential estimation affects inference in the second stage.
Example 3 [Two-stage estimation of GARCH models]: Consider
r_{t+1} = μ + σ_{t+1} ε_{t+1}, with ε_{t+1} IID N(0,1),
σ²_{t+1} = ω + α (r_t − μ)² + β σ²_t.
Typically, estimation of [μ, ω, α, β] is performed by MLE; this is consistent with this lecture, as we know that MLE is just a special case of GMM. However, it is not infrequent to see papers in which μ is first estimated by OLS and, in a second step, using e_{t+1} = r_{t+1} − μ̂, MLE is applied to the GARCH model to estimate [ω, α, β], as if μ̂ were a known value.
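A minimal sketch of the two-step procedure just described, with simulated data, a hand-rolled GARCH(1,1) Gaussian likelihood, and illustrative parameter values; this is not the lecture's own code.

```python
# Two-step estimation: estimate mu by OLS (regression on a constant = sample mean),
# then fit a GARCH(1,1) by MLE on the residuals, treating mu_hat as known.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Simulate a GARCH(1,1) with mean mu
T, mu, omega, alpha, beta = 3000, 0.05, 0.05, 0.08, 0.90
r = np.empty(T)
sigma2 = omega / (1 - alpha - beta)
r_prev = mu
for t in range(T):
    sigma2 = omega + alpha * (r_prev - mu) ** 2 + beta * sigma2
    r[t] = mu + np.sqrt(sigma2) * rng.normal()
    r_prev = r[t]

# Step 1: "OLS" estimate of mu
mu_hat = r.mean()
e = r - mu_hat

# Step 2: Gaussian MLE of (omega, alpha, beta) on the residuals e_t
def neg_loglik(params, e):
    omega, alpha, beta = params
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        return np.inf
    s2 = np.empty_like(e)
    s2[0] = e.var()                 # initialize at the unconditional variance
    for t in range(1, len(e)):
        s2[t] = omega + alpha * e[t - 1] ** 2 + beta * s2[t - 1]
    return 0.5 * np.sum(np.log(2 * np.pi * s2) + e**2 / s2)

res = minimize(neg_loglik, x0=np.array([0.1, 0.1, 0.8]), args=(e,),
               method="Nelder-Mead")
print("two-step estimates (omega, alpha, beta):", res.x)
```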

SEQUENTIAL ESTIMATORS (ANOTHER CASE OF QML)
Problem: μ should also be estimated by GLS and not OLS, where GLS is equivalent to MLE in this simple case. A quick look at the log-likelihood function reveals that μ also enters the portion that contains [ω, α, β], i.e., the conditional-variance component; clearly such a procedure is not equivalent to full MLE.
For several reasons, the implementation of GMM and MLE in the analysis of DAPMs may be computationally demanding, if not infeasible. One such circumstance is when there are unobserved state variables, e.g., in the case of stochastic volatility:

SEQUENTIAL ESTIMATORS (ANOTHER CASE OF QML)
here the volatility factor is unobserved. Because of the unobservability of volatility, discretely sampled returns {r_t} are not Markov conditioned on their own history, and the form of the conditional distribution of r_t is unknown. Apart from a few special cases, the moments of r_t, expressed as functions of the unknown parameters, are also unknown. This problem may be rendered even more challenging by the presence of jumps in returns or volatility. There are (at least) two solutions among estimation methods: (1) the Simulated Method of Moments (SMM); (2) Markov chain Monte Carlo (MCMC) methods. Both estimators are applicable to DAPMs without latent variables, but they are most useful when latent variables (e.g., jumps or stochastic volatility) are present, in which case these methods often dominate others, both in tractability and in efficiency.

SIMULATED METHOD OF MOMENTS
SMM extends the GMM estimator to a class of DAPMs for which moment restrictions do not have analytic representations in terms of observable variables and unknown parameters. SMM is not just GMM applied to simulated data: the reason is that functions of the current value of the simulated state depend on the unknown parameter vector in two ways, through the structure of the model (as in any GMM problem) and indirectly, through the generation of the data by simulation. The feedback effect of the latter dependence on the transition law of the simulated state implies that the (first-moment-continuity) conditions used to establish the uniform convergence of sample to population criterion functions in GMM fail.

SIMULATED METHOD OF MOMENTS
Assume that a given R^N-valued state process {Y_t} is generated by the difference equation Y_{t+1} = H(Y_t, ε_{t+1}, β_0), where the parameter vector β_0 ∈ Θ ⊂ R^K and {ε_t} is an IID sequence of R^p-valued random variables. The number of shocks, p, need not equal the dimension of the state vector, N. Letting Z_t ≡ (Y_t, Y_{t−1}, ..., Y_{t−l+1}) for some positive integer l < ∞, the estimation of β_0 is based on the moments of the observation function g(Z_t, β), taking values in R^M. Moments of the observed series are calculated as sample moments of the observed g*_t ≡ g(Z_t, β). The function H may be known, or determined implicitly by the numerical solution of a discrete-time model, or by a discrete-time approximation of a continuous-time model.

SIMULATED METHOD OF MOMENTS
Example 2 (cont'd): Consider the two-factor stochastic-volatility interest rate model. A standard (Euler) discretization scheme gives a bivariate discrete-time system driven by bivariate N(0, I_2) shocks. If the function mapping β into E[g(Z_t, β)] is known and independent of t, the GMM estimator is applicable.

SIMULATED METHOD OF MOMENTS
Unfortunately, the form of E[g(Z_t, β)] is known only for special cases, such as when κ = 0 or in a few other special parameter configurations. SMM circumvents the requirement that this mapping be known by making the much weaker assumption that we have access to an R^p-valued sequence of simulated shocks; these random variables are identical in distribution to, and independent of, {ε_t}. The idea is that, for any R^N-valued initial point Y_0 and any parameter vector β ∈ Θ, the simulated state process {Y_t^β} can be constructed inductively by letting Y_0^β = Y_0 and Y_{t+1}^β = H(Y_t^β, ε̂_{t+1}, β). Likewise, the simulated observation process is constructed as g_t^β = g(Z_t^β, β), where Z_t^β stacks the last l values of the simulated state. The SMM estimator of β_0 is then the parameter vector b_T that best matches the sample moments of the actual and simulated observation processes, {g*_t} and {g_t^{b_T}}.

SIMULATED METHOD OF MOMENTS
Let T(T) denote the simulation sample size generated for a given sample size T of actual observations, where T(T) → ∞ as T → ∞. For any β, let G_T(β) denote the difference between the sample moments of the actual and simulated series. If {g*_t} and {g_t^{b_T}} satisfy a LLN and under adequate identification conditions, b_T converges to β_0. In the over-identified case, one also selects a sequence W = {W_T} of M × M positive-semidefinite matrices such that b_T minimizes the quadratic form G_T(β)′ W_T G_T(β). Although mechanically this is the case, SMM does not simply extend GMM by replacing the population moment E[g(Z_t, β)] with its sample counterpart calculated from simulated data.
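A minimal SMM sketch under assumed ingredients: an AR(1) "structural" model, three matched moments (mean, variance, first autocovariance), an identity weighting matrix, common random numbers held fixed across parameter values, and a burn-in for the initial state. None of these choices come from the lecture.

```python
# SMM: match sample moments of observed data with moments of a series simulated
# from the model at candidate parameters (same shocks reused for every beta).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)

def simulate_ar1(beta, shocks, y0=0.0, burn=200):
    c, phi, sigma = beta
    y = np.empty(len(shocks))
    prev = y0
    for t, e in enumerate(shocks):
        prev = c + phi * prev + sigma * e      # Y_{t+1} = H(Y_t, eps_{t+1}, beta)
        y[t] = prev
    return y[burn:]                            # discard transient induced by Y_0

def moments(y):
    ybar = y.mean()
    return np.array([ybar, y.var(), np.mean((y[1:] - ybar) * (y[:-1] - ybar))])

# "Observed" data (in practice these would be actual observations)
beta_true = np.array([0.2, 0.9, 1.0])
y_obs = simulate_ar1(beta_true, rng.normal(size=3200))
m_obs = moments(y_obs)

# Simulated shocks: drawn once and reused for every candidate beta (T(T) >> T)
shocks_sim = np.random.default_rng(5).normal(size=15200)

def smm_criterion(beta):
    G = m_obs - moments(simulate_ar1(beta, shocks_sim))
    return G @ G                               # identity weighting matrix

res = minimize(smm_criterion, x0=np.array([0.0, 0.5, 0.5]), method="Nelder-Mead")
print("SMM estimate (c, phi, sigma):", res.x)
```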

SIMULATED METHOD OF MOMENTS
What are the main differences between GMM and SMM?
1. The key difference is the parameter dependency of the simulated series {g_t^β}: it depends on β not only directly, but also indirectly, through the dependence of the entire past history of the simulated process {Y_t^β} on β.
2. Unless the environment is IID, simulations require initial conditions for the state Y_t: even if the transition function of the Markov process {Y_t} is stationary, the simulated process {Y_t^β} is generally non-stationary. This derives from the fact that the initial simulated state Y_0^β is typically not drawn from the ergodic distribution of the process. Practically, one can discard an initial portion of the simulated state variables {Y_t^β} to mitigate transient effects.
What are the key statistical properties of SMM? As usual, (weak) consistency and asymptotic normality.

SIMULATED METHOD OF MOMENTS
Weak consistency: plim b_T = β_0. The proof of consistency is interesting because strong consistency is difficult to obtain. Duffie and Singleton (1993, ECMA) give conditions on H that guarantee that the compounding effects of simulation on the properties of estimators damp out over time, and use these conditions to prove strong consistency. However, their damping conditions are not satisfied by many diffusion-based DAPMs. Because the simulated state process is usually not initialized with a draw from its ergodic distribution, one needs a condition that allows the use of an arbitrary initial state, knowing that the state process converges rapidly to its stationary distribution. Such a condition is geometric ergodicity: a condition ensuring that the simulated state process satisfies a LLN with an asymptotic distribution that is invariant to initial conditions.

SIMULATED METHOD OF MOMENTS
Let P_x^t denote the t-step transition probability of a time-homogeneous Markov process {X_t}, i.e., the distribution of X_t given the initial point X_0 = x. {X_t} is ρ-ergodic, for some ρ ∈ (0, 1], if there is a measure π (the ergodic distribution) such that, for every initial point x, ρ^{-t} ||P_x^t − π|| → 0, where ||·|| is the total variation norm. If {X_t} is ρ-ergodic for some ρ < 1, then {X_t} is geometrically ergodic. In calculating asymptotic distributions, geometric ergodicity can substitute for stationarity, since it means that the process converges geometrically to its stationary distribution. As usual, the condition is imposed on {Y_t^β} because it delivers a (S)LLN for the simulated series {g_t^β}.

SIMULATED METHOD OF MOMENTS
Because of the (S)LLN, the criterion function Q_T(β) converges almost surely to the asymptotic criterion function Q_0 : Θ → R defined by Q_0(β) = G(β)′ W_0 G(β). What is the optimal sequence W = {W_T} of M × M symmetric positive-definite matrices? Similarly to GMM, optimality obtains when W_T → W_0 = Σ_0^{-1}; Σ_0 is a function solely of the moments of {g*_t}, and hence of the data, not of β or of the moments of the simulated process {g_t^β}. When it comes to asymptotic normality, one needs to justify the assumption of uniform continuity of the observations as a function of β: with simulations, a perturbation of β affects not only the current observation, but also the transitions between past states, a dependence that compounds over time. Formally, one has:

SIMULATED METHOD OF MOMENTS
√T (b_T − β_0) → N(0, Ω_0) in distribution as T → ∞, where Ω_0 depends on the ratio T/T(T). As T/T(T) → 0, AsyVar(b_T) → (D_0′ Σ_0^{-1} D_0)^{-1}, the covariance matrix obtained when an analytic expression for E[g(Z_t, β)] is known. Knowledge of E[g(Z_t, β)] increases efficiency, but if the simulated sample size T(T) is chosen to be large relative to the size T of the actual sample, then there is essentially no loss in efficiency. Typically, in applications of SMM to asset pricing, it is assumed that T(T) is large and T/T(T) → 0. So far, we have stated that one has freedom to pick the moments that enter the definition of G_T(β), and that at best SMM may require an optimal choice of the weighting matrix, W_T → W_0 = Σ_0^{-1}.

KLIC-BASED MOMENT SELECTION: EMM
In principle, one would want to choose moment equations that capture some of the known features of the data, such as persistence, conditional heteroskedasticity, and non-normality. Gallant and Tauchen (1996, ECT) proposed a clever application of SMM that allows one to easily capture these features. Their approach is based on the minimization of the Kullback-Leibler information criterion (KLIC), KLIC = E[log(p(Y_t | Y_{t−1}) / f(Y_t | Y_{t−1}; δ))], where p is the density of the actual but unknown DGP and f is just an approximation (the auxiliary model). The KLIC can be interpreted as a measure of our ignorance about the true structure of the DGP. However, minimizing the KLIC for given p has clear implications for the denominator f: one should maximize E[log f(Y_t | Y_{t−1}; δ)], as in MLE.

KLIC-BASED MOMENT SELECTION: EMM
Gallant and Tauchen's idea is simple and yet powerful: let Ỹ_t denote the observed sub-vector of the state process Y_t. What is the difference? In Example 2, Ỹ_t = r_t while Y_t = (r_t, v_t), because stochastic volatility is not observable. Let f(Ỹ_t | Ỹ_{t−1}; δ) be a conditional density function of the data that captures parametrically the features of the data that one is interested in representing by {g_t}. Applying ML to f gives δ_T, the maximizer of the sample mean of log f(Ỹ_t | Ỹ_{t−1}; δ). There is no presumption that the density f is the true conditional density of Ỹ_t, or that δ_T is a consistent estimator of any of the parameters of the true DGP for Ỹ_t; rather, δ_T is a consistent estimator of the δ_0 that minimizes the KLIC. At this point, GT's proposal is simple: having chosen f and estimated δ_T by MLE, let the score of this log-likelihood be the vector of moments used to estimate β_0.

KLIC-BASED MOMENT SELECTION: EMM
The moment conditions are therefore the sample scores ∂ log f(Ỹ_t | Ỹ_{t−1}; δ_T)/∂δ, and the corresponding function for simulated data is obtained by evaluating the same scores at the simulated observations Ỹ_t^β. In simulating Ỹ_t^β, it is generally necessary to simulate the entire state vector Y_t^β and then select the observed components Ỹ_t^β entering the moment conditions: the sample moments entering the SMM criterion depend only on simulated data.
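A minimal sketch of the EMM recipe just described, under illustrative assumptions: a Gaussian AR(1) auxiliary model estimated by MLE on the observed data, whose scores, averaged over a long simulated path from a hypothetical mean-reverting "structural" model, serve as the moment conditions.

```python
# EMM-style estimation: auxiliary-model scores evaluated on simulated data.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)

def simulate_structural(beta, shocks, burn=200):
    kappa, theta, sigma = beta                 # hypothetical mean-reverting model
    y = np.empty(len(shocks))
    prev = theta
    for t, e in enumerate(shocks):
        prev = prev + kappa * (theta - prev) + sigma * e
        y[t] = prev
    return y[burn:]

# Observed data (simulated here only for the sake of the example)
beta_true = np.array([0.1, 1.0, 0.3])
y_obs = simulate_structural(beta_true, rng.normal(size=3200))

# Auxiliary model: Gaussian AR(1), whose exact MLE coincides with OLS on lagged y
X = np.column_stack([np.ones(len(y_obs) - 1), y_obs[:-1]])
c_hat, phi_hat = np.linalg.lstsq(X, y_obs[1:], rcond=None)[0]
s2_hat = np.mean((y_obs[1:] - c_hat - phi_hat * y_obs[:-1]) ** 2)
delta_T = (c_hat, phi_hat, s2_hat)

def aux_score_mean(y, delta):
    """Mean score of the auxiliary Gaussian AR(1) log-likelihood at delta."""
    c, phi, s2 = delta
    e = y[1:] - c - phi * y[:-1]
    return np.array([np.mean(e / s2),
                     np.mean(e * y[:-1] / s2),
                     np.mean(-0.5 / s2 + 0.5 * e**2 / s2**2)])

shocks_sim = np.random.default_rng(7).normal(size=15200)   # common random numbers

def emm_criterion(beta):
    G = aux_score_mean(simulate_structural(beta, shocks_sim), delta_T)
    return G @ G                                            # identity weighting

res = minimize(emm_criterion, x0=np.array([0.2, 0.5, 0.5]), method="Nelder-Mead")
print("EMM estimate (kappa, theta, sigma):", res.x)
```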

KLIC-BASED MOMENT SELECTION: EMM
Of course, simulation is common but not needed for this KLIC-driven method to work, i.e., it can be applied to GMM too. However, you already know that when, in a GMM, you set the (unconditional) moment restrictions equal to the scores, then MLE obtains (as well as the Cramér-Rao lower bound). One wonders about the efficiency of SMM when the moment conditions are picked this way. Gallant and Tauchen speak of the Efficient Method of Moments when the auxiliary model is picked in a special way (see below, GT's seminonparametric framework). Several studies have examined the small-sample properties of SMM and compared them to the properties of standard GMM and (when feasible) ML estimators. The properties of SMM depend on the choice of auxiliary model, the number of moments, and the sample size.

KLIC-BASED MOMENT SELECTION: EMM
Chumacero (1997, SNDE): SMM is more efficient and often shows less bias than GMM; however, tests of overidentifying restrictions using SMM tend to reject the models too often. Andersen, Chung and Sorensen (1999, JoE): SMM performed well vs. GMM based on a less systematic choice of instruments; for their models and parameters, the overall goodness-of-fit chi-square statistics from simulated moments led to reliable inference.
Example 4 [Matching simple features of the data, model-free case]: If Y_t is an observed scalar process and we care about first-order serial correlation and conditional heteroskedasticity that depends on lagged squared projection errors, then one sets the auxiliary criterion to the corresponding least-squares problem; the FOCs with respect to the auxiliary parameters deliver the moment conditions, and the components of G_T(β) are their sample counterparts evaluated at the simulated data.

KLIC-BASED MOMENT SELECTION: EMM
An alternative estimation strategy for diffusion models, including models with latent state variables, is the method of Markov chain Monte Carlo (MCMC). Its conceptual foundations draw upon Bayesian theory. MCMC generates estimates not just of the parameters of the model, but also of the latent volatility, jump times, and jump sizes, i.e., of the latent variables. Under approximate ML, QML, GMM, and SMM, a time series of values of the latent variables (e.g., stochastic volatility) is usually computed after estimation using filtering methods.

MARKOV CHAIN MONTE CARLO ESTIMATION
MCMC allows the separation and quantification of estimation risk and model specification risk, and infrequent observations or missing data are easily accommodated. The basic idea is to combine a prior distribution over the unknown parameters with the conditional density of the state vector to obtain a joint posterior distribution of the parameters and the states conditional on the observables. From this joint posterior distribution, the marginal posterior distributions of the states and parameters can be computed. The mean or median, standard deviation, quantiles, and so on, of the posterior distribution of the parameters can then be computed. Denote by Θ the parameter vector of interest, by X a vector of (possibly latent) state variables, and by Y the vector of observed asset prices or yields. The MCMC algorithm constructs a Markov chain that converges to the joint distribution p(Θ, X | Y).

MARKOV CHAIN MONTE CARLO ESTIMATION
From this distribution one can determine both p(Θ | Y) (which gives the parameter estimates) and p(X | Y) (which provides estimates of the unobserved states). Key to this construction is the Clifford-Hammersley theorem: under a positivity condition, knowing p(Θ | X, Y) and p(X | Θ, Y) is equivalent to knowing p(X, Θ | Y). What gives the MCMC algorithm its traction is that the first two distributions are often much easier to characterize than the joint distribution p(X, Θ | Y). When it is feasible to simulate from both densities, the MCMC algorithm uses a Gibbs sampler: given the realizations up to iteration g − 1, X^g is drawn from p(X | Θ^{g−1}, Y), and Θ^g is drawn from p(Θ | X^g, Y). When direct sampling from one of the conditional densities is not feasible, researchers replace the corresponding Gibbs step with

MARKOV CHAIN MONTE CARLO ESTIMATION
Metropolis-Hastings sampling. Suppose that simulation from the conditional density p(X | Θ, Y) is not feasible. The basic idea is to start from a proposal distribution q(X^{g+1} | X^g) that is known and from which samples can easily be drawn. A single Gibbs sampling step is then replaced by two steps: draw a candidate from q, and accept it with the appropriate acceptance probability (otherwise retain the current value).
Example 5 [Simple Brownian motion model]: Consider an arithmetic Brownian motion, dY_t = μ dt + σ dW_t, which can be discretized (over a unit time step) into Y_{t+1} = Y_t + μ + σ ε_{t+1}, with ε_{t+1} ~ N(0, 1).

MARKOV CHAIN MONTE CARLO ESTIMATION
Of course, the log-likelihood function of this model is known in closed form, and the first line of action would consist of estimating it by MLE. However, adopt a Bayesian approach under independent priors, with objects of interest p(μ | σ², Y) and p(σ² | μ, Y). From Bayes' rule, each conditional posterior is proportional to the likelihood times the prior. Typical choices of conjugate priors are p(μ) normal and p(σ²) inverted gamma; the conditional posterior of μ is then normal, because a normal likelihood combined with a normal prior yields a normal posterior, while that of σ² is inverted gamma. The MCMC draws from known densities (i.e., it is a Gibbs sampler).
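A minimal sketch of the Gibbs sampler for the discretized model above, assuming a normal prior on μ and an inverse-gamma prior on σ²; hyperparameters and data are illustrative.

```python
# Gibbs sampler for Gaussian increments y_t = mu + sigma*eps_t with conjugate priors:
# mu ~ N(m0, v0) and sigma^2 ~ InvGamma(a0, b0).
import numpy as np

rng = np.random.default_rng(8)
y = rng.normal(loc=0.1, scale=0.5, size=500)       # observed increments
T = y.size

m0, v0, a0, b0 = 0.0, 10.0, 2.0, 1.0               # prior hyperparameters

n_draws, burn = 5000, 500
mu_draws = np.empty(n_draws)
s2_draws = np.empty(n_draws)
mu, s2 = 0.0, 1.0                                   # initial values of the chain

for g in range(n_draws):
    # Draw mu | sigma^2, Y  (normal posterior: normal likelihood x normal prior)
    v_post = 1.0 / (T / s2 + 1.0 / v0)
    m_post = v_post * (y.sum() / s2 + m0 / v0)
    mu = rng.normal(m_post, np.sqrt(v_post))

    # Draw sigma^2 | mu, Y  (inverse-gamma posterior)
    a_post = a0 + 0.5 * T
    b_post = b0 + 0.5 * np.sum((y - mu) ** 2)
    s2 = 1.0 / rng.gamma(a_post, 1.0 / b_post)      # InvGamma(a,b) via 1/Gamma(a, 1/b)

    mu_draws[g], s2_draws[g] = mu, s2

print("posterior means:", mu_draws[burn:].mean(), s2_draws[burn:].mean())
```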

GALLANT AND TAUCHEN'S SNP MODEL
A flexible family of auxiliary models is constructed as follows: let μ_{Y,t−1} be the linear projection of Y_t onto L_μ lags of Y; allow for ARCH-like errors by transforming the innovations of this autoregression by a matrix R_{Y,t−1} with elements that are linear in the absolute values of L_r past values of the innovations. Let z_t be the standardized Y_t, and approximate the conditional density of Y_t by the product of the standard normal density and the square of a Hermite polynomial in z_t, with coefficients that can be made functions of time (i.e., of the conditioning information). The Hermite expansion serves to introduce non-normality in the conditional distribution of Y_t by scaling the conditional normal density by a polynomial in lagged values of Y_t.

GALLANT AND TAUCHEN S SNP MODEL normal density by a polynomial in lagged values of Y t SNP auxiliary models give another advantage: using Wald tests of individual moment conditions, you can test the null hypotheses that elements of mean score vector from the auxiliary model, G (β 0 ), are zero using the sample scores Rejection of the null that a particular mean score is zero would suggest that the DAPM does not adequately describe the features of the conditional distribution of Y governed by the associated parameter in the auxiliary model a.a. 14/15 p. 49