COPYRIGHT NOTICE: Kenneth J. Singleton: Empirical Dynamic Asset Pricing is published by Princeton University Press and copyrighted, 2006, by Princeton University Press. All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher, except for reading and browsing via the World Wide Web. Users are not permitted to mount this file on any network servers. Follow links for Class Use and other Permissions. For more information send email to: permissions@pupress.princeton.edu

2. Model Specification and Estimation Strategies

A DAPM may: (1) provide a complete characterization of the joint distribution of all of the variables being studied; or (2) imply restrictions on some moments of these variables, but not reveal the form of their joint distribution. A third possibility is that there is not a well-developed theory for the joint distribution of the variables being studied. Which of these cases obtains for the particular DAPM being studied determines the feasible estimation strategies; that is, the feasible choices of D in the definition of an estimation strategy. This chapter introduces the maximum likelihood (ML), generalized method of moments (GMM), and linear least-squares projection (LLP) estimators and begins our development of the interplay between model formulation and the choice of an estimation strategy discussed in Chapter 1.

2.1. Full Information about Distributions

Suppose that a DAPM yields a complete characterization of the joint distribution of a sample of size T on a vector of variables y_t, y^T ≡ {y_1, ..., y_T}. Let L_T(β) = L(y^T; β) denote the family of joint density functions of y^T implied by the DAPM and indexed by the K-dimensional parameter vector β. Suppose further that the admissible parameter space associated with this DAPM is Θ ⊂ R^K and that there is a unique β_0 ∈ Θ that describes the true probability model generating the asset price data. In this case, we can take L_T(β) to be our sample criterion function, called the likelihood function of the data, and obtain the maximum likelihood (ML) estimator b_T^ML by maximizing L_T(β). In ML estimation, we start with the joint density function of y^T, evaluate the random variable y^T at the realization comprising the observed historical sample, and then maximize the value of this density over the choice of β. This amounts to maximizing, over all admissible β, the likelihood that the realized sample was drawn from the density L_T(β).

ML estimation, when feasible, is the most econometrically efficient estimator within a large class of consistent estimators (see Chapter 3). In practice, it turns out that studying L_T is less convenient than working with a closely related objective function based on the conditional density function of y_t. Many of the DAPMs that we examine in later chapters, for which ML estimation is feasible, lead directly to knowledge of the density function of y_t conditioned on y_{t-1}, f_t(y_t | y_{t-1}; β), and imply that

    f_t(y_t | y_{t-1}; β) = f(y_t | y^J_{t-1}; β),    (2.1)

where y^J_t ≡ (y_t, y_{t-1}, ..., y_{t-J+1}) is a J-history of y_t. The right-hand side of (2.1) is not indexed by t, implying that the conditional density function does not change with time. In such cases, the likelihood function L_T becomes

    L_T(β) = [∏_{t=J+1}^T f(y_t | y^J_{t-1}; β)] × f_m(y^J_J; β),    (2.2)

where f_m(y^J_J) is the marginal, joint density function of the first J observations. Taking logarithms gives the log-likelihood function l_T ≡ (1/T) log L_T,

    l_T(β) = (1/T) ∑_{t=J+1}^T log f(y_t | y^J_{t-1}; β) + (1/T) log f_m(y^J_J; β).    (2.3)

Since the logarithm is a monotonic transformation, maximizing l_T gives the same ML estimator b_T^ML as maximizing L_T. The first-order conditions for the sample criterion function (2.3) are

    ∂l_T(b_T^ML)/∂β = (1/T) ∑_{t=J+1}^T ∂ log f(y_t | y^J_{t-1}; b_T^ML)/∂β + (1/T) ∂ log f_m(y^J_J; b_T^ML)/∂β = 0,    (2.4)

where it is presumed that, among all estimators satisfying (2.4), b_T^ML is the one that maximizes l_T.

Footnote: A sufficient condition for the time invariance in (2.1) is that the time series {y_t} is a strictly stationary process. Stationarity does not preclude time-varying conditional densities, but rather just that the functional form of these densities does not change over time.

Footnote: It turns out that b_T^ML need not be unique for fixed T, even though β_0 is the unique minimizer of the population objective function Q_0. However, this technical complication need not concern us in this introductory discussion.

Choosing z_t = (y_t, y^J_{t-1}) and

    D(z_t; β) ≡ ∂ log f(y_t | y^J_{t-1}; β)/∂β    (2.5)

as the function defining the moment conditions to be used in estimation, it is seen that (2.4) gives first-order conditions of the form introduced in Chapter 1, except for the last term in (2.4). For the purposes of the large-sample arguments developed more formally in Chapter 3, we can safely ignore the last term in (2.4), since this term converges to zero as T → ∞. When the last term is omitted from (2.4), the objective function is referred to as the approximate log-likelihood function, whereas (2.3) is the exact log-likelihood function. Typically there is no ambiguity as to which likelihood is being discussed, and we refer simply to the log-likelihood function l_T.

Focusing on the approximate log-likelihood function, fixing β, and taking the limit as T → ∞ gives, under the assumption that sample moments converge to their population counterparts, the associated population criterion function

    Q_0(β) = E[log f(y_t | y^J_{t-1}; β)].    (2.6)

To see that the β_0 generating the observed data is a maximizer of (2.6), and hence that this choice of Q_0 underlies a sensible estimation strategy, we observe that, since the conditional density integrates to 1,

    0 = ∂/∂β ∫ f(y_t | y^J_{t-1}; β_0) dy_t
      = ∫ [∂ log f(y_t | y^J_{t-1}; β_0)/∂β] f(y_t | y^J_{t-1}; β_0) dy_t
      = E[ ∂ log f(y_t | y^J_{t-1}; β_0)/∂β | y^J_{t-1} ],    (2.7)

which, by the law of iterated expectations, implies that

    ∂Q_0(β_0)/∂β = E[∂ log f(y_t | y^J_{t-1}; β_0)/∂β] = E[D(z_t; β_0)] = 0.    (2.8)

Thus, for ML estimation, (2.8) is the set of constraints on the joint distribution of y^T used in estimation, the ML version of the generic moment conditions introduced in Chapter 1.

Footnote: The fact that the sum in (2.3) begins at J + 1 is inconsequential, because we are focusing on the properties of b_T^ML (or θ_T) for large T, and J is fixed a priori by the asset pricing theory. There are circumstances where the small-sample properties of b_T^ML may be substantially affected by inclusion or omission of the term log f_m(y^J_J; β) from the likelihood function. Some of these are explored in later chapters.

Critical to (2.8) being satisfied by β_0 is the assumption that the conditional density f implied by the DAPM is in fact the density from which the data are drawn.

An important special case of this estimation problem is where {y_t} is an independently and identically distributed (i.i.d.) process. In this case, if f_m(y_t; β) denotes the density function of the vector y_t evaluated at β, then the log-likelihood function takes the simple form

    l_T(β) ≡ (1/T) log L_T(β) = (1/T) ∑_{t=1}^T log f_m(y_t; β).    (2.9)

This is an immediate implication of the independence assumption, since the joint density function of y^T factors into the product of the marginal densities of the y_t. The ML estimator of β_0 is obtained by maximizing (2.9) over β ∈ Θ. The corresponding population criterion function is Q_0(β) = E[log f_m(y_t; β)]. Though the simplicity of (2.9) is convenient, most dynamic asset pricing theories imply that at least some of the observed variables y^T are not independently distributed over time. Dependence might arise, for example, because of mean reversion in an asset return or persistence in the volatility of one or more variables (see the next example). Such time variation in conditional moments is accommodated in the formulation (2.1) of the conditional density of y_t, but not by (2.9).

Example 2.1. Cox, Ingersoll, and Ross [Cox et al., 1985b] (CIR) developed a theory of the term structure of interest rates in which the instantaneous short-term rate of interest, r, follows the mean-reverting diffusion

    dr = κ(r̄ − r) dt + σ √r dB.    (2.10)

An implication of (2.10) is that the conditional density of r_{t+1} given r_t is

    f(r_{t+1} | r_t; β_0) = c e^{−(u_t + v_{t+1})} (v_{t+1}/u_t)^{q/2} I_q(2 √(u_t v_{t+1})),    (2.11)

where

    c = 2κ / [σ²(1 − e^{−κ})],    (2.12)
    u_t = c e^{−κ} r_t,    (2.13)
    v_{t+1} = c r_{t+1},    (2.14)

q = 2κr̄/σ² − 1, and I_q is the modified Bessel function of the first kind of order q. This is the density function of a noncentral χ² with 2q + 2 degrees of freedom and noncentrality parameter 2u_t. For this example, ML estimation would proceed by substituting (2.11) into (2.3) and solving for b_T^ML. The short-rate process (2.10) is the continuous-time version of an interest-rate process that is mean reverting to a long-run mean of r̄ and that has a conditional volatility of σ√r. This process is Markovian and, therefore, y^J_t = y_t (J = 1), which explains the single lag in the conditioning information in (2.11).
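The transition law in (2.11)-(2.14) says that, conditional on r_t, the variable 2c·r_{t+1} is noncentral χ² with 2q + 2 degrees of freedom and noncentrality parameter 2u_t. The following sketch illustrates the resulting approximate ML estimator in Python on simulated data; it is a minimal illustration, not part of the text, and the names simulate_cir and cir_neg_loglik, the monthly sampling interval, and the starting values are all assumptions of this example.

    import numpy as np
    from scipy.stats import ncx2
    from scipy.optimize import minimize

    def simulate_cir(rbar, kappa, sigma, T=2000, dt=1/12, r0=0.05, seed=0):
        # simulate a CIR path using the exact noncentral chi-square transition
        rng = np.random.default_rng(seed)
        r = np.empty(T)
        r[0] = r0
        c = 2 * kappa / (sigma**2 * (1 - np.exp(-kappa * dt)))
        df = 4 * kappa * rbar / sigma**2
        for t in range(1, T):
            nc = 2 * c * np.exp(-kappa * dt) * r[t - 1]
            r[t] = rng.noncentral_chisquare(df, nc) / (2 * c)
        return r

    def cir_neg_loglik(params, r, dt=1/12):
        # approximate log-likelihood: the marginal density of r_1 is dropped,
        # as in the discussion following (2.4)
        rbar, kappa, sigma = params
        if min(rbar, kappa, sigma) <= 0:
            return np.inf
        c = 2 * kappa / (sigma**2 * (1 - np.exp(-kappa * dt)))
        df = 4 * kappa * rbar / sigma**2            # 2q + 2 degrees of freedom
        nc = 2 * c * np.exp(-kappa * dt) * r[:-1]   # noncentrality 2u_t
        # density of r_{t+1} given r_t: 2c * r_{t+1} ~ ncx2(df, nc)
        return -np.sum(ncx2.logpdf(2 * c * r[1:], df, nc) + np.log(2 * c))

    r = simulate_cir(rbar=0.06, kappa=0.5, sigma=0.1)
    fit = minimize(cir_neg_loglik, x0=[0.05, 0.3, 0.2], args=(r,),
                   method="Nelder-Mead")
    print(fit.x)   # ML estimates b_T^ML of (rbar, kappa, sigma)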

Though desirable for its efficiency, ML may not be, and indeed typically is not, a feasible estimation strategy for DAPMs, as often they do not provide us with complete knowledge of the relevant conditional distributions. Moreover, in some cases, even when these distributions are known, the computational burdens may be so great that one may prefer an estimation strategy that uses only a portion of the available information. This is a consideration in the preceding example, given the presence of the modified Bessel function in the conditional density of r. Later in this chapter we consider the case where only limited information about the conditional distribution is known or, for computational or other reasons, is used in estimation.

2.2. No Information about the Distribution

At the opposite end of the knowledge spectrum about the distribution of y^T is the case where we do not have a well-developed DAPM to describe the relationships among the variables of interest. In such circumstances, we may be interested in learning something about the joint distribution of a vector of variables z_t (which is presumed to include some asset prices or returns). For instance, we are often in a situation of wondering whether certain variables are correlated with each other or whether one variable can predict another. Without knowledge of the joint distribution of the variables of interest, researchers typically proceed by projecting one variable onto another to see if they are related. The properties of the estimators in such projections are examined under this case of no information. Additionally, there are occasions when we reject a theory and a replacement theory that explains the rejection has yet to be developed. On such occasions, many have resorted to projections of one variable onto others with the hope of learning more about the source of the initial rejection. Following is an example of this second situation.

Footnote: Projections, and in particular linear projections, are a simple and often informative first approach to examining statistical dependencies among variables. More complex, nonlinear relations can be explored with nonparametric statistical methods. The applications of nonparametric methods to asset pricing problems are explored in subsequent chapters.

Example 2.2. Several scholars writing in the 1970s argued that, if foreign currency markets are informationally efficient, then the forward price for delivery of foreign exchange one period hence (F_t) should equal the market's best forecast of the spot exchange rate next period (S_{t+1}):

    F_t = E[S_{t+1} | I_t],    (2.15)

where I_t denotes the market's information at date t. This theory of exchange rate determination was often evaluated by projecting S_{t+1} − F_t onto a vector x_t and testing whether the coefficients on x_t are zero (e.g., Hansen and Hodrick, 1980). The evidence suggested that these coefficients are not zero, which was interpreted as evidence of a time-varying market risk premium λ_t ≡ E[S_{t+1} | I_t] − F_t (see, e.g., Grauer et al., 1976, and Stockman, 1978). Theory has provided limited guidance as to which variables determine the risk premiums or the functional forms of the premiums. Therefore, researchers have projected the spread S_{t+1} − F_t onto a variety of variables known at date t and thought to potentially explain variation in the risk premium. The objective of the latter studies was to test for dependence of λ_t on the explanatory variables, say x_t.

To be more precise about what is meant by a projection, let L² denote the set of (scalar) random variables that have finite second moments:

    L² = {random variables x such that E x² < ∞}.    (2.16)

We define an inner product on L² by

    ⟨x | y⟩ = E(xy),  x, y ∈ L²,    (2.17)

and a norm by

    ‖x‖ = [⟨x | x⟩]^{1/2} = [E(x²)]^{1/2}.    (2.18)

We say that two random variables x and y in L² are orthogonal to each other if E(xy) = 0. Note that being orthogonal is not equivalent to being uncorrelated, as the means of the random variables may be nonzero.

Let A be the closed linear subspace of L² generated by all linear combinations of the K random variables {x_1, x_2, ..., x_K}. Suppose that we want to project the random variable y ∈ L² onto A in order to obtain its best linear predictor. Letting δ ≡ (δ_1, ..., δ_K)′, the best linear predictor is that element of A that minimizes the distance between y and the linear space A:

    min_{z ∈ A} ‖y − z‖ ≡ min_{δ ∈ R^K} ‖y − δ_1 x_1 − ... − δ_K x_K‖.    (2.19)

The orthogonal projection theorem tells us that the unique solution to (2.19) is given by the δ_0 ∈ R^K satisfying

    E[(y − x′δ_0) x] = 0,  x ≡ (x_1, ..., x_K)′;    (2.20)

that is, the forecast error u ≡ (y − x′δ_0) is orthogonal to all linear combinations of x. The solution to the first-order condition (2.20) is

    δ_0 = E[x x′]^{−1} E[x y].    (2.21)

In terms of our notation for criterion functions, the population criterion function associated with least-squares projection is

    Q_0(δ) = E[(y_t − x_t′ δ)²],    (2.22)

and this choice is equivalent to choosing z_t = (y_t, x_t′)′ and the function D as

    D(z_t; δ) = (y_t − x_t′ δ) x_t.    (2.23)

The interpretation of this choice is a bit different than in most estimation problems, because our presumption is that one is proceeding with estimation in the absence of a DAPM from which restrictions on the distribution of (y_t, x_t) can be deduced. In the case of a least-squares projection, we view the moment equation

    E[D(y_t, x_t; δ_0)] = E[(y_t − x_t′ δ_0) x_t] = 0    (2.24)

as the moment restriction that defines δ_0. The sample least-squares objective function is

    Q_T(δ) = (1/T) ∑_{t=1}^T (y_t − x_t′ δ)²,    (2.25)

with minimizer

    δ_T = [∑_{t=1}^T x_t x_t′]^{−1} ∑_{t=1}^T x_t y_t.    (2.26)

Footnote: The orthogonal projection theorem says that if L is an inner product space, M is a closed linear subspace of L, and y is an element of L, then z* ∈ M is the unique solution to min_{z ∈ M} ‖y − z‖ if and only if y − z* is orthogonal to all elements of M. See, e.g., Luenberger (1969).
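A minimal numerical check of (2.24)-(2.26) on simulated data, loosely framed in terms of the forward-spot example (the regressors, coefficients, and sample size here are invented for illustration): the sample analogue of (2.21) delivers δ_T, and the fitted errors are orthogonal, in sample, to every linear combination of x_t.

    import numpy as np

    rng = np.random.default_rng(1)
    T, K = 500, 3
    x = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])  # x_t includes a constant
    y = x @ np.array([0.1, 0.5, -0.3]) + rng.normal(size=T)         # y_t plays the role of S_{t+1} - F_t

    # delta_T = [sum x_t x_t']^{-1} sum x_t y_t, the sample analogue of (2.21)
    delta_T = np.linalg.solve(x.T @ x, x.T @ y)

    # the forecast errors are orthogonal to x_t: sample version of (2.24)
    u = y - x @ delta_T
    print(x.T @ u / T)   # each entry is (numerically) zero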

The estimator δ_T is also obtained directly by replacing the population moments in (2.21) by their sample counterparts. In the context of the pricing model for foreign currency prices, researchers have projected (S_{t+1} − F_t) onto a vector of explanatory variables x_t. The variable being predicted in such analyses, (S_{t+1} − F_t), is not the risk premium, λ_t = E[(S_{t+1} − F_t) | I_t]. Nevertheless, the resulting predictor in the population, x_t′ δ_0, is the same regardless of whether λ_t or (S_{t+1} − F_t) is the variable being forecast. To see this, we digress briefly to discuss the difference between best linear prediction and best prediction.

The predictor x_t′ δ_0 is the best linear predictor, which is defined by the condition that the projection error u_t = y_t − x_t′ δ_0 is orthogonal to all linear combinations of x_t. Predicting y_t using linear combinations of x_t is only one of many possible approaches to prediction. In particular, we could also consider prediction based on both linear and nonlinear functions of the elements of x_t. Pursuing this idea, let V denote the closed linear subspace of L² generated by all random variables g(x_t) with finite second moments:

    V = {g(x_t) : g : R^K → R, and g(x_t) ∈ L²}.    (2.27)

Consider the new minimization problem min_{z_t ∈ V} ‖y_t − z_t‖. By the orthogonal projection theorem, the unique solution z_t* to this problem has the property that (y_t − z_t*) is orthogonal to all z_t ∈ V. One representation of z_t* is the conditional expectation E[y_t | x_t]. This follows immediately from the properties of conditional expectations: the error ε_t ≡ y_t − E[y_t | x_t] satisfies

    E[ε_t g(x_t)] = E[(y_t − E[y_t | x_t]) g(x_t)] = 0,    (2.28)

for all g(x_t) ∈ V. Clearly A ⊂ V, so the best predictor is at least as good as the best linear predictor. The precise sense in which best prediction is better is that, whereas ε_t is orthogonal to all functions of the conditioning information x_t, u_t is orthogonal to only linear combinations of x_t.

There are circumstances where the best and best linear predictors coincide. This is true whenever the conditional expectation E[y_t | x_t] is linear in x_t. One well-known case where this holds is when (y_t, x_t) is distributed as a multivariate normal random vector. However, normality is not necessary for best and best linear predictors to coincide. For instance, consider again Example 2.1. The conditional mean E[r_{t+Δ} | r_t] for a positive time interval Δ is given by (Cox et al., 1985b)

    µ_{rt}(Δ) ≡ E[r_{t+Δ} | r_t] = r_t e^{−κΔ} + r̄ (1 − e^{−κΔ}),    (2.29)

which is linear in r_t, yet neither the joint distribution of (r_{t+Δ}, r_t) nor the distribution of r_{t+Δ} conditioned on r_t is normal. (The latter is noncentral chi-square.)
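The distinction between u_t and ε_t can be seen in a small simulation in which the best predictor E[y | x] = x² is nonlinear; the data-generating process is an invented illustration. Both errors are orthogonal to linear functions of x, but only the best-prediction error is orthogonal to the nonlinear function g(x) = x².

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(size=100_000)
    y = x**2 + rng.normal(size=x.size)        # best predictor: E[y | x] = x^2

    # best linear predictor of y given (1, x): sample analogue of (2.21)
    X = np.column_stack([np.ones_like(x), x])
    delta = np.linalg.solve(X.T @ X, X.T @ y)

    u = y - X @ delta                         # linear projection error u_t
    eps = y - x**2                            # best-prediction error eps_t
    print(np.mean(u * x), np.mean(eps * x))          # both approximately 0
    print(np.mean(u * x**2), np.mean(eps * x**2))    # only eps is orthogonal to x^2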

With these observations in mind, we can now complete our argument that the properties of risk premiums can be studied by linearly projecting (S_{t+1} − F_t) onto x_t. Letting Proj[· | x_t] denote linear least-squares projection onto x_t, we get

    Proj[λ_t | x_t] = Proj[(S_{t+1} − F_t − ε_{t+1}) | x_t] = Proj[(S_{t+1} − F_t) | x_t],    (2.30)

where ε_{t+1} ≡ (S_{t+1} − F_t) − λ_t. The first equality follows from the definition of the risk premium as λ_t = E[S_{t+1} − F_t | I_t], and the second follows from the fact that ε_{t+1} is orthogonal to all functions of x_t, including linear functions.

2.3. Limited Information: GMM Estimators

In between the cases of full information and no information about the joint distribution of y^T are all of the intermediate cases of limited information. Suppose that estimation of a parameter vector θ_0 in the admissible parameter space Θ ⊂ R^K is to be based on a sample z^T, where z_t is a subvector of the complete set of variables y_t appearing in a DAPM. The restrictions on the distribution of z^T to be used in estimating θ_0 are summarized as a set of restrictions on the moments of functions of z_t. These moment restrictions may be either conditional or unconditional.

2.3.1. Unconditional Moment Restrictions

Consider first the case where a DAPM implies that the unconditional moment restriction

    E[h(z_t; θ_0)] = 0    (2.31)

is satisfied uniquely by θ_0 ∈ Θ, where h is an M-dimensional vector with M ≥ K. The function h may define standard central or noncentral moments of asset returns, the orthogonality of forecast errors to variables in agents' information sets, and so on. Illustrations based on Example 2.1 are presented later in this section.

Footnote: There is no requirement that the dimension of Θ be as large as the dimension of the parameter space considered in full-information estimation; often Θ is a lower-dimensional subspace, just as z_t may be a subvector of y_t. However, for notational convenience, we always set the dimension of the parameter vector of interest to K, whether it is θ_0 or β_0.

To develop an estimator of θ_0 based on (2.31), consider first the case of K = M: the number of moment restrictions equals the number of parameters to be estimated. The function H_0 : Θ → R^M defined by H_0(θ) ≡ E[h(z_t; θ)] satisfies H_0(θ_0) = 0. Therefore, a natural estimation strategy for θ_0 is to replace H_0 by its sample counterpart,

    H_T(θ) = (1/T) ∑_{t=1}^T h(z_t; θ),    (2.32)

and choose the estimator θ_T to set (2.32) to zero. If H_T converges to its population counterpart as T gets large, H_T(θ) → H_0(θ) for all θ ∈ Θ, then under regularity conditions we should expect that θ_T → θ_0. The estimator θ_T is an example of what Hansen (1982b) refers to as a generalized method-of-moments, or GMM, estimator of θ_0.

Next suppose that M > K. Then there is not in general a unique way of solving for the K unknowns using the M equations H_T(θ) = 0, and our strategy for choosing θ_T must be modified. We proceed to form K linear combinations of the M moment equations to end up with K equations in the K unknown parameters. That is, letting Ā denote the set of K × M (constant) matrices of rank K, we select an A ∈ Ā and set

    D_A(z_t; θ) = A h(z_t; θ),    (2.33)

with this choice of D_A determining the estimation strategy. Different choices of A ∈ Ā index (lead to) different estimation strategies. To arrive at a sample counterpart to (2.33), we select a possibly sample-dependent matrix A_T with the property that A_T → A (almost surely) as the sample size T gets large. Then the K-vector θ_T^A (the superscript A indicating that the estimator is A-dependent) is chosen to satisfy the K equations (1/T) ∑_t D_T(z_t; θ_T^A) = 0, where D_T(z_t; θ_T^A) = A_T h(z_t; θ_T^A). Note that we are now allowing D_T to be sample dependent directly, and not only through its dependence on θ_T^A. This will frequently be the case in subsequent applications.

The construction of GMM estimators using this choice of D_T can be related to the approach to estimation involving a criterion function as follows. Let {a_T : T ≥ 1} be a sequence of s × M matrices of rank s, K ≤ s ≤ M, and consider the function

    Q_T(θ) = ‖a_T H_T(θ)‖²,    (2.34)

where ‖·‖ denotes the Euclidean norm. Then

    argmin_θ ‖a_T H_T(θ)‖ = argmin_θ ‖a_T H_T(θ)‖² = argmin_θ H_T(θ)′ a_T′ a_T H_T(θ),    (2.35)

and we can think of our criterion function Q_T as being the quadratic form

    Q_T(θ) = H_T(θ)′ W_T H_T(θ),    (2.36)

where W_T ≡ a_T′ a_T is often referred to as the distance matrix. This is the GMM criterion function studied by Hansen (1982b).
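A sketch of the quadratic-form criterion (2.36) for an overidentified toy problem: M = 3 moment conditions identify the K = 2 parameters (mean and variance) of a normal sample, using the fact that the third central moment of a normal is zero. The identity distance matrix and all names are assumptions of this illustration.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(3)
    z = rng.normal(1.0, 2.0, size=2000)

    def H_T(theta, z):
        # sample moments H_T(theta) = (1/T) sum h(z_t; theta); here M = 3 > K = 2
        mu, sig2 = theta
        h = np.column_stack([z - mu,
                             (z - mu)**2 - sig2,
                             (z - mu)**3])    # normal: third central moment is 0
        return h.mean(axis=0)

    def Q_T(theta, z, W):
        # GMM criterion Q_T(theta) = H_T(theta)' W_T H_T(theta), as in (2.36)
        H = H_T(theta, z)
        return H @ W @ H

    W = np.eye(3)   # a (generally suboptimal) choice of distance matrix
    fit = minimize(Q_T, x0=[0.0, 1.0], args=(z, W), method="Nelder-Mead")
    print(fit.x)    # approximately (1.0, 4.0)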

The first-order conditions for this minimization problem are

    [∂H_T(θ_T)′/∂θ] W_T H_T(θ_T) = 0.    (2.37)

By setting

    A_T = [∂H_T(θ_T)′/∂θ] W_T,    (2.38)

we obtain the D_T(z_t; θ) associated with Hansen's GMM estimator. The population counterpart to Q_T in (2.36) is

    Q_0(θ) = E[h(z_t; θ)]′ W_0 E[h(z_t; θ)].    (2.39)

The corresponding population D_0(z_t; θ) is given by

    D_0(z_t; θ) = E[∂h(z_t; θ_0)′/∂θ] W_0 h(z_t; θ) ≡ A_0 h(z_t; θ),    (2.40)

where W_0 is the (almost sure) limit of W_T as T gets large. Here D_0 is not sample dependent, possibly in contrast to D_T.

Whereas the first-order conditions to (2.36) give an estimator in the class Ā [with A defined by (2.40)], not all GMM estimators in Ā solve the first-order conditions from minimizing an objective function of the form (2.36). Nevertheless, it turns out that the optimal GMM estimators in Ā, in the sense of being asymptotically most efficient (see Chapter 3), can be represented as the solution to (2.36) for an appropriate choice of W_T. Therefore, the large-sample properties of GMM estimators are henceforth discussed relative to the sequence of objective functions {Q_T(·) : T ≥ 1} in (2.36).
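The efficient choice of W_T is formalized in Chapter 3; a common feasible version, sketched here under the assumption that the moment functions are serially uncorrelated (with dependence, a HAC-type estimate would replace S), is the two-step procedure: a first-step estimate with W_T = I, then W_T equal to the inverse of the sample second-moment matrix of h evaluated at that estimate. The moment functions reuse the toy normal example above; everything named here is an assumption of the sketch.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(4)
    z = rng.normal(1.0, 2.0, size=2000)

    def h(theta, z):
        # M = 3 moment functions, K = 2 parameters
        mu, sig2 = theta
        return np.column_stack([z - mu, (z - mu)**2 - sig2, (z - mu)**3])

    def gmm_step(theta0, z, W):
        Q = lambda th: h(th, z).mean(0) @ W @ h(th, z).mean(0)
        return minimize(Q, theta0, method="Nelder-Mead").x

    theta1 = gmm_step([0.0, 1.0], z, np.eye(3))        # first step: W_T = I
    # since H_T(theta1) is near zero, the sample covariance of h approximates E[h h']
    S = np.cov(h(theta1, z), rowvar=False, bias=True)
    theta2 = gmm_step(theta1, z, np.linalg.inv(S))     # second step: W_T = S^{-1}
    print(theta2)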

2.3.2. Conditional Moment Restrictions

In some cases, a DAPM implies the stronger, conditional moment restrictions

    E[h(z_{t+n}; θ_0) | I_t] = 0, for given n ≥ 1,    (2.41)

where the possibility of n > 1 is introduced to allow the conditional moment restrictions to apply to asset prices or other variables more than one period in the future. Again, the dimension of h is M, and the information set I_t may be generated by variables other than the history of z_t.

To construct an estimator of θ_0 based on (2.41), we proceed as in the case of unconditional moment restrictions and choose K sample moment equations in the K unknowns θ. However, because h(z_{t+n}; θ_0) is orthogonal to any random variable in the information set I_t, we have much more flexibility in choosing these moment equations than in the preceding case. Specifically, we introduce a class Â_t of K × M full-rank instrument matrices A_t, with each A_t ∈ Â_t having elements in I_t. For any A_t ∈ Â_t, (2.41) implies that

    E[A_t h(z_{t+n}; θ_0)] = 0    (2.42)

at θ = θ_0. Therefore, we can define a family of GMM estimators θ_T^A, indexed by the choice of A_t, as the solutions to the corresponding sample moment equations,

    (1/T) ∑_t A_t h(z_{t+n}; θ_T^A) = 0.    (2.43)

If the sample mean of A_t h(z_{t+n}; θ) in (2.43) converges to its population counterpart in (2.42) for all θ ∈ Θ, and A_t and h are chosen so that θ_0 is the unique element of Θ satisfying (2.42), then we might reasonably expect θ_T^A to converge to θ_0 as T gets large. The large-sample distribution of θ_T^A depends, in general, on the choice of A_t.

The GMM estimator, as just defined, is not the extreme value of a specific criterion function. Rather, (2.42) defines θ_0 as the solution to K moment equations in K unknowns, and θ_T^A solves the sample counterpart (2.43) of these equations. In this case, D_0 is chosen directly as

    D_0(z_{t+n}, A_t; θ) = D_T(z_{t+n}, A_t; θ) = A_t h(z_{t+n}; θ).    (2.44)

Once we have chosen an A_t in Â_t, we can view a GMM estimator constructed from (2.43) as, trivially, a special case of an estimator based on unconditional moment restrictions: expression (2.42) is taken to be the basic set of K moment equations that we start with. However, the important distinguishing feature of the class of estimators Â_t, compared to the class Ā, is that the former class offers much more flexibility in choosing the weights on h. We will see in Chapter 3 that the most efficient estimator in the class Â_t is often more efficient than its counterpart in Ā. That is, (2.41) allows one to exploit more information about the distribution of z_t than (2.31) in the estimation of θ_0. As is discussed more extensively in the context of subsequent applications, this GMM estimation strategy is a generalization of the instrumental variables estimators proposed for classical simultaneous equations models by Amemiya (1974) and Jorgenson and Laffont (1974), among others.
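The mechanics of (2.42)-(2.43) can be sketched with a hypothetical AR(1) data-generating process and the instrument choice A_t = (1, z_t)′, which turns one conditional restriction (M = 1) into K = 2 exactly identified unconditional equations; with this particular A_t the GMM estimator coincides with a linear projection, illustrating the remark that different choices of instruments in I_t index different estimators.

    import numpy as np

    rng = np.random.default_rng(5)
    T = 2000
    z = np.zeros(T)
    for t in range(1, T):
        # AR(1): E[z_{t+1} - a - b z_t | I_t] = 0, a conditional moment restriction
        z[t] = 0.02 + 0.9 * z[t - 1] + rng.normal(scale=0.1)

    # instruments A_t = (1, z_t)' in I_t give the sample equations
    # (1/T) sum A_t h(z_{t+1}; theta) = 0, as in (2.43); exactly identified
    A = np.column_stack([np.ones(T - 1), z[:-1]])
    y = z[1:]
    a_hat, b_hat = np.linalg.solve(A.T @ A, A.T @ y)
    print(a_hat, b_hat)   # approximately (0.02, 0.9)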

2.3.3. Linear Projection as a GMM Estimator

Perhaps the simplest example of a GMM estimator based on the moment restriction (2.31) is linear least-squares projection. Suppose that we project y_t onto x_t. Then the best linear predictor is defined by the moment equation (2.20). Thus, if we define

    h(y_t, x_t; δ) = (y_t − x_t′ δ) x_t,    (2.45)

then by construction δ_0 satisfies E[h(y_t, x_t; δ_0)] = 0. One might be tempted to view linear projection as a special case of a GMM estimator in Â_t by choosing n = 0,

    A_t = x_t and h(y_t, x_t; δ) = y_t − x_t′ δ.    (2.46)

However, importantly, we are not free to select among other choices of A_t ∈ Â_t in constructing a GMM estimator of the linear predictor x_t′ δ_0. Therefore, least-squares projection is appropriately viewed as a GMM estimator in Ā.

Circumstances change if a DAPM implies the stronger moment restriction

    E[(y_t − x_t′ δ_0) | x_t] = 0.    (2.47)

Now we are no longer in an environment of complete ignorance about the distribution of (y_t, x_t), as it is being assumed that x_t′ δ_0 is the best, not just the best linear, predictor of y_t. In this case, we are free to choose

    A_t = g(x_t) and h(y_t, x_t; δ) = y_t − x_t′ δ,    (2.48)

for any g : R^K → R^K. Thus, the assumption that the best predictor is linear puts us in the case of conditional moment restrictions and opens up the possibility of selecting estimators in Â_t defined by the functions g.

2.3.4. Quasi-Maximum Likelihood Estimation

Another important example of a limited-information estimator that is a special case of a GMM estimator is the quasi-maximum likelihood (QML) estimator. Suppose that n = 1 and that I_t is generated by the J-history y_t^J of a vector of observed variables y_t.

Footnote: We employ the usual, informal notation of letting I_t or y_t^J denote the σ-algebra (information set) used to construct conditional moments and distributions.

Suppose further that the functional forms of the population mean and variance of y_{t+1}, conditioned on I_t, are known, and let θ denote the vector of parameters governing these first two conditional moments. Then ML estimation of θ_0 based on the classical normal conditional likelihood function gives an estimator that converges to θ_0 and is normally distributed in large samples (see, e.g., Bollerslev and Wooldridge, 1992). Referring back to the introductory remarks in Chapter 1, we see that the function D (= D_0 = D_T) determining the moments used in estimation in this case is

    D(z_t; θ) = ∂ log f^N(y_t | y^J_{t−1}; θ)/∂θ,    (2.49)

where z_t = (y_t, y^J_{t−1}) and f^N is the normal density function conditioned on y^J_{t−1}. Thus, for QML to be an admissible estimation strategy for this DAPM, it must be the case that θ_0 satisfies

    E[∂ log f^N(y_t | y^J_{t−1}; θ_0)/∂θ] = 0.    (2.50)

The reason that θ_0 does in fact satisfy (2.50) is that the first two conditional moments of y_t are correctly specified and the normal distribution is fully characterized by its first two moments. This intuition is formalized in Chapter 3. The moment equation (2.50) defines a GMM estimator.

2.3.5. Illustrations Based on Interest Rate Models

Consider again the one-factor interest rate model presented in Example 2.1. Equation (2.29) implies that we can choose

    h(z_{t+1}; θ_0) = r_{t+1} − r̄ (1 − e^{−κ}) − e^{−κ} r_t,    (2.51)

where z_{t+1} = (r_{t+1}, r_t). Furthermore, for any vector function g(r_t) : R → R², we can set A_t = g(r_t) and

    E[(r_{t+1} − r̄ (1 − e^{−κ}) − e^{−κ} r_t) g(r_t)] = 0.    (2.52)

Therefore, a GMM estimator θ_T^A = (r̄_T, κ_T) of θ_0 = (r̄, κ) can be constructed from the sample moment equations

    (1/T) ∑_t [r_{t+1} − r̄_T (1 − e^{−κ_T}) − e^{−κ_T} r_t] g(r_t) = 0.    (2.53)

Each choice of g(r_t) ∈ Â_t gives rise to a different GMM estimator that in general has a different large-sample distribution. Linear projection of r_{t+1} onto r_t is obtained as the special case with g(r_t) = (1, r_t)′, M = 1, K = 2, and θ = (κ, r̄).
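Continuing the CIR example, the special case g(r_t) = (1, r_t)′ makes (2.53) the normal equations of a linear projection of r_{t+1} on (1, r_t), from which (r̄_T, κ_T) can be backed out. The sketch below uses an Euler-discretized simulation and an explicit sampling interval Δ = 1/12, both assumptions of this illustration (the text sets Δ = 1).

    import numpy as np

    rng = np.random.default_rng(6)
    rbar, kappa, sigma, dt, T = 0.06, 0.5, 0.1, 1/12, 5000
    r = np.empty(T)
    r[0] = rbar
    for t in range(1, T):
        # Euler scheme: adequate for illustration, not an exact CIR simulation
        r[t] = (r[t-1] + kappa * (rbar - r[t-1]) * dt
                + sigma * np.sqrt(max(r[t-1], 0.0) * dt) * rng.normal())

    # with g(r_t) = (1, r_t)', the sample moment equations (2.53) are those of a
    # projection of r_{t+1} on (1, r_t): intercept = rbar*(1 - exp(-kappa*dt)),
    # slope = exp(-kappa*dt)
    G = np.column_stack([np.ones(T - 1), r[:-1]])
    intercept, slope = np.linalg.solve(G.T @ G, G.T @ r[1:])
    kappa_T = -np.log(slope) / dt
    rbar_T = intercept / (1 - slope)
    print(rbar_T, kappa_T)   # approximately (0.06, 0.5)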

Turning to the implementation of QML estimation in this example, the mean of r_{t+1} conditioned on r_t is given by (2.29) (with Δ = 1), and the conditional variance is given by (Cox et al., 1985b)

    σ²_{rt} ≡ Var[r_{t+1} | r_t] = r_t (σ²/κ)(e^{−κ} − e^{−2κ}) + r̄ (σ²/2κ)(1 − e^{−κ})².    (2.54)

If we set ε_{t+1} ≡ (r_{t+1} − µ_{rt})/σ_{rt}, it follows that the discretely sampled short rates (r_1, r_2, ...) follow the model

    r_{t+1} = r̄ (1 − e^{−κ}) + e^{−κ} r_t + σ_{rt} ε_{t+1},    (2.55)

where the error term ε_{t+1} in (2.55) has (conditional) mean zero and variance one. For this model, θ_0 = (r̄, κ, σ²) = β_0 (the parameter vector that describes the entire distribution of r_t), though this is often not true in other applications of QML.

The conditional distribution of r_{t+1} is a noncentral χ². However, suppose we ignore this fact and proceed to construct a likelihood function based on our knowledge of (2.29) and (2.54), assuming that r_{t+1} is distributed as a normal conditional on r_t. Then the log-likelihood function is (l^q to indicate that this is QML)

    l_T^q(θ) = −(1/2) log(2π) − (1/2T) ∑_t [ log σ²_{rt} + (r_{t+1} − µ_{rt})²/σ²_{rt} ].    (2.56)

Computing first-order conditions gives

    ∂l_T^q(θ_T^q)/∂θ_j = (1/T) ∑_t [ −(1/(2σ̂²_{rt})) ∂σ̂²_{rt}/∂θ_j + ((r_{t+1} − µ̂_{rt})²/(2σ̂⁴_{rt})) ∂σ̂²_{rt}/∂θ_j + ((r_{t+1} − µ̂_{rt})/σ̂²_{rt}) ∂µ̂_{rt}/∂θ_j ] = 0,  j = 1, 2, 3,    (2.57)

where θ_T^q denotes the QML estimator and µ̂_{rt} and σ̂²_{rt} are µ_{rt} and σ²_{rt} evaluated at θ_T^q. As suggested in the preceding section, this estimation strategy is admissible because the first two conditional moments are correctly specified.
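A minimal sketch of this Gaussian QML estimator, built directly from (2.29) and (2.54): the normality assumption enters only through the quasi-likelihood, while the conditional mean and variance are the correct CIR moments. The Euler-simulated data, sampling interval, and function names are assumptions of this example.

    import numpy as np
    from scipy.optimize import minimize

    def cir_qml_negloglik(params, r, dt=1/12):
        # Gaussian quasi-likelihood: correct first two conditional moments of
        # the CIR model; normality is assumed for estimation purposes only
        rbar, kappa, sigma = params
        if min(rbar, kappa, sigma) <= 0:
            return np.inf
        e = np.exp(-kappa * dt)
        mu = r[:-1] * e + rbar * (1 - e)                         # eq. (2.29)
        var = (r[:-1] * (sigma**2 / kappa) * (e - e**2)
               + rbar * (sigma**2 / (2 * kappa)) * (1 - e)**2)   # eq. (2.54)
        return 0.5 * np.sum(np.log(2 * np.pi * var) + (r[1:] - mu)**2 / var)

    rng = np.random.default_rng(7)
    rbar0, kappa0, sigma0, dt, T = 0.06, 0.5, 0.1, 1/12, 5000
    r = np.empty(T)
    r[0] = rbar0
    for t in range(1, T):   # Euler scheme with reflection at zero, for illustration
        r[t] = abs(r[t-1] + kappa0 * (rbar0 - r[t-1]) * dt
                   + sigma0 * np.sqrt(r[t-1] * dt) * rng.normal())
    theta_q = minimize(cir_qml_negloglik, x0=[0.05, 0.3, 0.2], args=(r, dt),
                       method="Nelder-Mead").x
    print(theta_q)   # QML estimates of (rbar, kappa, sigma)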

Though one might want to pursue GMM or QML estimation for this interest rate example because of their computational simplicity, this is not the best illustration of a limited-information problem, because the true likelihood function is known. However, a slight modification of the interest rate process places us in an environment where GMM is a natural estimation strategy.

Example 2.3. Suppose we extend the one-factor model introduced in Example 2.1 to the following two-factor model:

    dr = κ(r̄ − r) dt + σ_r √v dB_r,
    dv = ν(v̄ − v) dt + σ_v √v dB_v.    (2.58)

In this two-factor model of the short rate, v plays the role of a stochastic volatility for r. Similar models have been studied by Andersen and Lund (1997a) and Dai and Singleton (2000). The volatility shock in this model is unobserved, so estimation and inference must be based on the sample r^T, and r_t is no longer a Markov process conditioned on its own past history. An implication of the assumptions that r mean reverts to the long-run value r̄ and that the conditional mean of r does not depend on v is that (2.29) is still satisfied in this two-factor model. However, the variance of r_{t+1} conditioned on r_t is not known in closed form, nor is the form of the density of r_{t+1} conditioned on r_t. Thus, neither ML nor QML estimation strategies are easily pursued. Faced with this limited information, one convenient strategy for estimating θ_0 ≡ (r̄, κ) is to use the moment equations (2.53) implied by (2.52). This GMM estimator of θ_0 ignores entirely the known structure of the volatility process and, indeed, σ_r is not an element of θ_0. Thus, not only are we unable to recover any information about the parameters of the volatility equation using (2.53), but knowledge of the functional form of the volatility equation is ignored. It turns out that substantially more information about f(r_t | r_{t−1}; θ_0) can be used in estimation, but to accomplish this we have to extend the GMM estimation strategy to allow for unobserved state variables. This extension is explored in depth in a later chapter.

Footnote: Asymptotically efficient estimation strategies based on approximations to the true conditional density function of r have been developed for this model. These are described in a later chapter.

Table 2.1. Summary of Population and Sample Objective Functions for Various Estimators

Maximum likelihood:
    Population objective: max_{β ∈ Θ} E[log f(y_t | y^J_{t−1}; β)]
    Sample objective: max_{β ∈ Θ} (1/T) ∑_{t=J+1}^T log f(y_t | y^J_{t−1}; β)
    Population F.O.C.: E[∂ log f(y_t | y^J_{t−1}; β_0)/∂β] = 0
    Sample F.O.C.: (1/T) ∑_{t=J+1}^T ∂ log f(y_t | y^J_{t−1}; b_T^ML)/∂β = 0

GMM:
    Population objective: min_{θ ∈ Θ} E[h(z_t; θ)]′ W_0 E[h(z_t; θ)]
    Sample objective: min_{θ ∈ Θ} H_T(θ)′ W_T H_T(θ), with H_T(θ) = (1/T) ∑_{t=1}^T h(z_t; θ)
    Population F.O.C.: A_0 E[h(z_t; θ_0)] = 0
    Sample F.O.C.: A_T (1/T) ∑_{t=1}^T h(z_t; θ_T) = 0

Least-squares projection:
    Population objective: min_{δ ∈ R^K} E[(y_t − x_t′ δ)²]
    Sample objective: min_{δ ∈ R^K} (1/T) ∑_{t=1}^T (y_t − x_t′ δ)²
    Population F.O.C.: E[(y_t − x_t′ δ_0) x_t] = 0
    Sample F.O.C.: (1/T) ∑_{t=1}^T (y_t − x_t′ δ_T) x_t = 0

2.3.6. GMM Estimation of Pricing Kernels

As a final illustration, suppose that the pricing kernel q* in a DAPM is a function of a state vector x_t and a parameter vector θ_0. In preference-based DAPMs, the pricing kernel can be interpreted as an agent's intertemporal marginal rate of substitution of consumption, in which case x_t might involve consumptions of goods and θ_0 is the vector of parameters describing the agent's preferences. Alternatively, q* might simply be parameterized directly as a function of financial variables. In Chapter 1 it was noted that

    E[(q*_{t+n}(x_{t+n}; θ_0) r_{t+n} − 1) | I_t] = 0,    (2.59)

for investment horizon n and the appropriate information set I_t. If r_{t+n} is chosen to be a vector of returns on M securities, M ≥ K, then (2.59) represents M conditional moment restrictions that can be used to construct a GMM estimator of θ_0 (Hansen and Singleton, 1982). Typically, there are more than K securities at one's disposal for empirical work, in which case one may wish to select M > K. A K × M matrix A_t ∈ Â_t can then be used to construct K unconditional moment equations to be used in estimation:

    E[A_t (q*_{t+n}(x_{t+n}; θ_0) r_{t+n} − 1)] = 0.    (2.60)

Any A_t with elements in I_t is an admissible choice for constructing a GMM estimator (subject to minimal regularity conditions).

2.4. Summary of Estimators

The estimators introduced in this chapter are summarized in Table 2.1, along with their respective first-order conditions. The large-sample properties of ML, GMM, and LLP estimators are explored in Chapter 3.
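To make the GMM column of Table 2.1 concrete for the pricing-kernel restriction (2.59)-(2.60): the canonical preference-based example of Hansen and Singleton (1982) takes q*_{t+1} = b (c_{t+1}/c_t)^{−γ}. The sketch below assumes that parameterization with a single return (M = 1 per instrument), and constructs simulated data so that (2.59) holds by design; the instruments, seed, and all other names are assumptions of this illustration, not results from the text.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(8)
    T, b0, gamma0 = 2000, 0.98, 2.0
    g = np.exp(0.02 + 0.02 * rng.standard_normal(T))   # gross consumption growth c_{t+1}/c_t
    eta = np.exp(rng.normal(-0.005, 0.1, T))           # E[eta] = 1, independent of I_t
    ret = g**gamma0 / b0 * eta                         # built so E[b0 g^{-gamma0} ret - 1 | I_t] = 0

    def euler_h(theta, g, ret):
        # h(z_{t+1}; theta) = q*_{t+1} r_{t+1} - 1 with the CRRA kernel q* = b g^{-gamma}
        b, gamma = theta
        return b * g**(-gamma) * ret - 1.0

    def gmm_obj(theta, g, ret, inst, W):
        h = euler_h(theta, g, ret)
        Hbar = (inst * h[:, None]).mean(axis=0)        # (1/T) sum A_t h(z_{t+1}; theta)
        return Hbar @ W @ Hbar

    # instruments A_t in I_t: a constant and lagged variables, as in (2.60)
    inst = np.column_stack([np.ones(T - 1), ret[:-1], g[:-1]])
    theta = minimize(gmm_obj, x0=[0.9, 1.0],
                     args=(g[1:], ret[1:], inst, np.eye(3)),
                     method="Nelder-Mead").x
    print(theta)   # approximately (0.98, 2.0)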