Chapter 11 GMM: General Formulas and Application


Main Content
- General GMM Formulas
- Testing Moments
- Standard Errors of Anything by Delta Method
- Using GMM for Regressions
- Prespecified Weighting Matrices and Moment Conditions
- Estimating on One Group of Moments, Testing on Another
- Estimating the Spectral Density Matrix

11.1 General GMM Formulas

GMM procedures can be used to implement a host of estimation and testing exercises. To estimate a model, you just have to remember (or look up) a few very general formulas and map them into your case. Express the model as

E[f(x_t, b)] = 0.

Everything is a vector: f can represent a vector of L sample moments, x_t can be M data series, and b can be N parameters.

Definition of the GMM Estimate

Define the sample moments

g_T(b) = (1/T) Σ_{t=1}^{T} f(x_t, b).

We estimate parameters by setting some linear combination of these sample means to zero:

b̂_T: set a_T g_T(b̂_T) = 0,

where a_T is a matrix that defines which linear combinations of g_T(b) will be set to zero. If you estimate b by min g_T(b)' W g_T(b), the first-order conditions are

(∂g_T(b)/∂b)' W g_T(b) = 0,

which means a_T = (∂g_T/∂b)' W.

Standard Errors of the Estimate

Hansen (1982), Theorem 3.1, gives the asymptotic distribution of the GMM estimate:

√T (b̂ − b) → N[0, (a d)^{-1} a S a' (a d)^{-1'}],

where

d = ∂g_T(b)/∂b = E[∂f(x_t, b)/∂b],
S = Σ_{j=-∞}^{∞} E[f(x_t, b) f(x_{t-j}, b)'],
a = plim a_T.

In practical terms, this means to use

var(b̂) = (1/T) (a d)^{-1} a S a' (a d)^{-1'}.

Distribution of the Moments

Hansen's Lemma 4.1 gives the sampling distribution of the moments g_T(b̂):

√T g_T(b̂) → N[0, (I − d(a d)^{-1} a) S (I − d(a d)^{-1} a)'].

The I − d(a d)^{-1} a terms account for the fact that in each sample some linear combinations of g_T are set to zero exactly. Hence this covariance matrix is singular.

χ² Test

A sum of squared standard normals is distributed χ², so we have

T g_T(b̂)' [(I − d(a d)^{-1} a) S (I − d(a d)^{-1} a)']^{-1} g_T(b̂) ~ χ²,

with degrees of freedom given by the number of nonzero linear combinations of g_T: the number of moments less the number of estimated parameters.

It does, but with a hitch: the variance-covariance matrix is singular, so you have to pseudo-invert it. For example, you can perform an eigenvalue decomposition Σ = QΛQ' and then invert only the non-zero eigenvalues.
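As a concrete sketch of this pseudo-inversion (the singular matrix here is made up, not from the text):

```python
import numpy as np

# Hypothetical rank-2 covariance matrix in 3 dimensions (singular)
A = np.array([[1.0, 0.5],
              [0.5, 2.0],
              [0.0, 1.0]])
V = A @ A.T                        # symmetric positive semi-definite, rank 2

# Eigenvalue decomposition V = Q diag(lam) Q'
lam, Q = np.linalg.eigh(V)

# Invert only the eigenvalues that are numerically non-zero
tol = 1e-10 * lam.max()
lam_inv = np.array([1.0 / l if l > tol else 0.0 for l in lam])
V_pinv = Q @ np.diag(lam_inv) @ Q.T

# Agrees with the Moore-Penrose pseudo-inverse
print(np.allclose(V_pinv, np.linalg.pinv(V)))   # True
```

The tolerance guards against eigenvalues that are zero only up to floating-point error, which is exactly the situation the singular covariance matrix of g_T produces.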

Efficient Estimates

Hansen shows that one particular choice of a is statistically optimal:

a = d' S^{-1}.

This choice is the first-order condition to min_{b} g_T(b)' S^{-1} g_T(b), which we studied in the last chapter. With this weighting matrix, the asymptotic distribution of b̂ reduces to

√T (b̂ − b) → N[0, (d' S^{-1} d)^{-1}].

With the optimal weights a = d'S^{-1}, the variance of the moments simplifies to

cov(g_T) = (1/T) (S − d(d' S^{-1} d)^{-1} d').

Proof: Substituting a = d'S^{-1} into

var(g_T) = (1/T) (I − d(a d)^{-1} a) S (I − d(a d)^{-1} a)'

gives

(I − d(d'S^{-1}d)^{-1} d'S^{-1}) S (I − S^{-1} d(d'S^{-1}d)^{-1} d')
= (S − d(d'S^{-1}d)^{-1} d') (I − S^{-1} d(d'S^{-1}d)^{-1} d')
= S − d(d'S^{-1}d)^{-1} d' − d(d'S^{-1}d)^{-1} d' + d(d'S^{-1}d)^{-1} d' S^{-1} d (d'S^{-1}d)^{-1} d'
= S − d(d'S^{-1}d)^{-1} d'.

Rather than use this matrix in a test, there is an equivalent and simpler way to construct the test:

J_T = T g_T(b̂)' S^{-1} g_T(b̂) ~ χ²(#moments − #parameters).

Alternatively, note that S^{-1} is a pseudo-inverse of the second-stage T cov(g_T).

Proof: A pseudo-inverse times cov(g_T) should result in an idempotent matrix of the same rank as cov(g_T):

S^{-1} (S − d(d'S^{-1}d)^{-1} d') = I − S^{-1} d(d'S^{-1}d)^{-1} d'.

Then, check that the result is idempotent:

(I − S^{-1} d(d'S^{-1}d)^{-1} d')(I − S^{-1} d(d'S^{-1}d)^{-1} d') = I − S^{-1} d(d'S^{-1}d)^{-1} d'.

This derivation not only verifies that the J statistic has the same distribution as T g_T' cov(g_T)^{+} g_T, but that they are numerically the same in every sample.

Model Comparisons

You often want to compare one model to another. If one model can be expressed as a special, or restricted, case of the other, unrestricted, model, we can perform a statistical comparison that looks very much like a likelihood ratio test:

J_T(restricted) − J_T(unrestricted) ~ χ²(#restrictions).

If the restricted model is really true, the criterion should not rise much when we impose the restrictions. This is a χ² difference test due to Newey and West (1987), who call it the D-test.

11.2 Testing Moments

How do we test one pricing error, or a group of them? (1) Use the formula for var(g_T). (2) Use a χ² difference test. We can use the sampling distribution of g_T to evaluate the significance of individual pricing errors, constructing a t-test (for a single moment) or a χ² test (for groups of moments).

Alternatively, you can use the difference approach. Start with a general model that includes all the moments, and form an estimate of the spectral density matrix S. Set to zero the moments you want to test, and denote by g_T^s(b) the vector of moments including the zeros (s for "smaller"):

T g_T(b̂)' S^{-1} g_T(b̂) − T g_T^s(b̂^s)' S^{-1} g_T^s(b̂^s) ~ χ²(#eliminated moments).

If the moments we want to test truly are zero, the χ² criterion should not be that much lower when they are excluded.

11.3 Standard Errors of Anything by Delta Method

Suppose we want to estimate a quantity that is a nonlinear function of sample means,

b = φ[E(x_t)] = φ(μ).

In this case, we have

var(b̂) = (1/T) (dφ/dμ)' [Σ_{j=-∞}^{∞} cov(x_t, x_{t-j}')] (dφ/dμ).

For example, a correlation coefficient can be written as a function of sample means as

corr(x_t, y_t) = [E(x_t y_t) − E(x_t)E(y_t)] / √{[E(x_t²) − E(x_t)²][E(y_t²) − E(y_t)²]}.

Thus, take

μ = [E(x_t), E(x_t²), E(y_t), E(y_t²), E(x_t y_t)]'.
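A minimal numpy sketch of this delta-method calculation on made-up data, using a numerical gradient of φ and keeping only the j = 0 term of the long-run covariance (appropriate for iid draws):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5000
x = rng.standard_normal(T)
y = 0.5 * x + rng.standard_normal(T)

# Moment vector h_t = [x, x^2, y, y^2, xy]; corr is a function phi of its mean
h = np.column_stack([x, x**2, y, y**2, x * y])
mu = h.mean(axis=0)

def phi(m):
    Ex, Ex2, Ey, Ey2, Exy = m
    return (Exy - Ex * Ey) / np.sqrt((Ex2 - Ex**2) * (Ey2 - Ey**2))

# Numerical gradient d(phi)/d(mu) at the sample means
eps = 1e-6
grad = np.array([(phi(mu + eps * e) - phi(mu - eps * e)) / (2 * eps)
                 for e in np.eye(5)])

# iid case: only the j = 0 covariance term enters the sum
Sigma = np.cov(h.T, bias=True)
se = np.sqrt(grad @ Sigma @ grad / T)

rho_hat = phi(mu)
print(rho_hat, se)   # point estimate and its delta-method standard error
```

With serially correlated data the same code applies, with Sigma replaced by a long-run (HAC) estimate of the covariance of h_t.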

11.4 Using GMM for Regressions

Mapping a statistical procedure into GMM makes it easy to develop an asymptotic distribution that corrects for statistical problems such as non-normality, serial correlation, and conditional heteroskedasticity. As an example, we map OLS regressions into GMM. When the errors do not obey the OLS assumptions, OLS is still consistent, and often more robust than GLS, but its standard errors need to be corrected.

OLS picks parameters β to minimize the variance of the residual:

min_{β} E_T[(y_t − x_t'β)²].

We find β̂ from the first-order condition, which states that the residual is orthogonal to the right-hand variables:

g_T(β) = E_T[x_t(y_t − x_t'β)] = 0.

This case is exactly identified: we set the sample moments exactly to zero and there is no weighting matrix (a = I). We can solve for the estimate analytically,

β̂ = [E_T(x_t x_t')]^{-1} E_T(x_t y_t).

This is the familiar OLS formula, but its standard errors need to be corrected.

We can use GMM to obtain the standard errors. Here

f(x_t, β) = x_t(y_t − x_t'β) = x_t ε_t,  d = −E(x_t x_t'),

so the general formula gives

var(β̂) = (1/T) E(x_t x_t')^{-1} [Σ_{j=-∞}^{∞} E(ε_t x_t x_{t-j}' ε_{t-j})] E(x_t x_t')^{-1}.

Serially Uncorrelated, Homoskedastic Errors

Formally, the OLS assumptions are

E(ε_t | x_t, x_{t-1}, ..., ε_{t-1}, ε_{t-2}, ...) = 0,
E(ε_t² | x_t, x_{t-1}, ..., ε_{t-1}, ε_{t-2}, ...) = constant = σ_ε².

The first assumption means that only the j = 0 term enters the sum:

Σ_{j=-∞}^{∞} E(ε_t x_t x_{t-j}' ε_{t-j}) = E(ε_t² x_t x_t').

The second assumption means that

E(ε_t² x_t x_t') = E(ε_t²) E(x_t x_t').

Hence the standard errors reduce to our old form:

var(β̂) = σ_ε² (1/T) E(x_t x_t')^{-1} = σ_ε² (X'X)^{-1}.

Heteroskedastic Errors

If we drop the homoskedasticity assumption

E(ε_t² | x_t, x_{t-1}, ..., ε_{t-1}, ε_{t-2}, ...) = constant = σ_ε²,

the standard errors become

var(β̂) = (1/T) E(x_t x_t')^{-1} E(ε_t² x_t x_t') E(x_t x_t')^{-1}.

These are known as heteroskedasticity-consistent standard errors, or White standard errors, after White (1980).
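A sketch of the White formula on simulated data (the regression design and error process are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 2000
x = np.column_stack([np.ones(T), rng.standard_normal(T)])
beta_true = np.array([1.0, 2.0])
# heteroskedastic errors: the error variance grows with the regressor
eps = rng.standard_normal(T) * (0.5 + x[:, 1] ** 2)
y = x @ beta_true + eps

Exx = x.T @ x / T
beta = np.linalg.solve(Exx, x.T @ y / T)          # OLS as exactly identified GMM
e = y - x @ beta

# White: var(beta) = (1/T) E(xx')^{-1} E(e^2 xx') E(xx')^{-1}
S0 = (x * (e ** 2)[:, None]).T @ x / T
var_white = np.linalg.inv(Exx) @ S0 @ np.linalg.inv(Exx) / T
se_white = np.sqrt(np.diag(var_white))

# conventional homoskedastic standard errors for comparison
se_ols = np.sqrt(np.diag(np.linalg.inv(Exx) * e.var() / T))
print(se_white, se_ols)   # the White slope s.e. is roughly double the conventional one
```

Because the error variance rises with |x|, the conventional formula understates the sampling uncertainty of the slope; the White sandwich corrects it.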

Hansen-Hodrick Errors

When the regression uses overlapping data,

y_{t+k} = β_k' x_t + ε_{t+k},

then under the null that one-period returns are unforecastable, we still see correlation in the ε_t due to the overlapping observations. Unforecastable returns imply

E(ε_t ε_{t-j}) = 0 for |j| ≥ k.

Under this condition, the standard errors are

var(β̂_k) = (1/T) E(x_t x_t')^{-1} [Σ_{j=-k+1}^{k-1} E(ε_t x_t x_{t-j}' ε_{t-j})] E(x_t x_t')^{-1}.

11.5 Prespecified Weighting Matrices and Moment Conditions

In the last chapter, our final estimates were based on the efficient S^{-1} weighting matrix. A prespecified weighting matrix lets you specify which moments, or linear combinations of moments, GMM will value in the minimization. You can also go one step further and impose which linear combinations a_T of moment conditions will be set to zero in estimation, rather than use the choice resulting from a minimization.

For example, suppose g_T = [g_T^1, g_T^2]' and W = I, but ∂g_T/∂b = [1, 10]', so that the second moment is 10 times more sensitive to the parameter value than the first moment. Then GMM with the fixed weighting matrix sets

1 × g_T^1 + 10 × g_T^2 = 0.

If we want GMM to pay equal attention to the two moments, we can fix the a_T matrix directly. Using a prespecified weighting matrix is not the same thing as ignoring the correlation of the errors u_t in the distribution theory.

How to Use Prespecified Weighting Matrices

If we use a fixed weighting matrix W, the first-order conditions to min_{b} g_T(b)' W g_T(b) are

(∂g_T(b)/∂b)' W g_T(b) = d' W g_T(b) = 0,

so the variance-covariance matrix of the estimated coefficients is

var(b̂) = (1/T) (d'Wd)^{-1} d'WSWd (d'Wd)^{-1}.

The variance-covariance matrix of the moments is

var(g_T) = (1/T) (I − d(d'Wd)^{-1} d'W) S (I − Wd(d'Wd)^{-1} d').

The last equation can be the basis of a χ² test for the overidentifying restrictions.

If we interpret (·)^{-1} to be a generalized inverse, then

g_T' var(g_T)^{-1} g_T ~ χ²(#moments − #parameters).

If var(g_T) is singular, you can invert only the non-zero eigenvalues.

Motivations for Prespecified Weighting Matrices

Robustness, as with OLS vs. GLS. When errors are autocorrelated or heteroskedastic, we correctly model the error covariance matrix, and the regression is perfectly specified, the GLS procedure can improve efficiency. But if the error covariance matrix is modeled incorrectly, the GLS estimates can be much worse than OLS. In these cases, it is often a good idea to use OLS estimates, and simply correct the standard errors of those OLS estimates.

For GMM, first-stage or other fixed-weighting-matrix estimates may give up something in asymptotic efficiency, standard errors, and model-fit tests, but they are still consistent and are more robust to statistical and economic problems. We still use the S matrix in computing standard errors. When the parameter estimates differ greatly between the first and second stage, we should determine the cause: is the change truly an efficiency gain, or a symptom of model misspecification?

Near-Singular S

The spectral density matrix is often nearly singular, since asset returns are highly correlated with each other. As a result, second-stage GMM tries to minimize differences, and differences of differences, of asset returns in order to extract statistically orthogonal components with the lowest variance. This feature leads GMM to place a lot of weight on poorly estimated, economically uninteresting, or otherwise non-robust aspects of the data.

For example, suppose that S is given by

S = [ 1   ρ ]
    [ ρ   1 ],

so

S^{-1} = 1/(1−ρ²) [  1   −ρ ]
                  [ −ρ    1 ].

We can write S^{-1} = C'C, where

C = [ 1/√(1−ρ²)   −ρ/√(1−ρ²) ]
    [ 0             1         ].

Then the GMM criterion min g_T' S^{-1} g_T is equivalent to min (C g_T)'(C g_T).

C g_T gives the linear combinations of moments that efficient GMM is trying to minimize. As ρ → 1, the (2,2) element of C stays at 1, but the (1,1) and (1,2) elements get very large. If ρ = 0.95, then

C = [ 3.20   −3.04 ]
    [ 0        1   ].

This means that GMM pays some attention to the second moment, but more than three times as much weight to the difference between the first and second moments. Through this decomposition of S, we can see which moments GMM is prizing.
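The decomposition above can be verified directly; this sketch builds C from the closed form and checks C'C = S^{-1} at ρ = 0.95:

```python
import numpy as np

def C_matrix(rho):
    # upper-triangular C with C'C = S^{-1}, where S = [[1, rho], [rho, 1]]
    s = np.sqrt(1.0 - rho**2)
    return np.array([[1.0 / s, -rho / s],
                     [0.0,      1.0    ]])

rho = 0.95
S = np.array([[1.0, rho], [rho, 1.0]])
C = C_matrix(rho)

print(np.allclose(C.T @ C, np.linalg.inv(S)))   # True
print(np.round(C, 2))                           # first row about [3.20, -3.04]
```

As ρ approaches 1 the first row blows up, making explicit how heavily the efficient criterion weights the difference between the two moments.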

GMM wants to focus on well-measured moments. In asset pricing applications, the errors are close to uncorrelated over time, so GMM is looking for portfolios with small values of var(m_{t+1} R_{t+1}^e). Those will be assets with small return variance. Then GMM will pay most attention to correctly pricing the sample minimum-variance portfolio. The trouble is that the sample minimum-variance portfolio may have little to do with the true minimum-variance portfolio: like any portfolio on the sample frontier, its composition largely reflects luck.

Economically Interesting Moments

The optimal weighting matrix makes GMM pay close attention to linear combinations of moments with small sampling error, in both estimation and evaluation. We may instead want to force the estimation and evaluation to pay attention to economically interesting moments.

Level Playing Field

The S matrix changes as the model and its parameters change. As the S matrix changes, which assets the GMM estimate tries hard to price well changes as well. For example, take a model m_t and create a new model by simply adding noise that is unrelated to asset returns (in sample), m_t' = m_t + ε_t. Then the moment condition g_T = E_T(m_t' R_t^e) is unchanged. However, the spectral density matrix S = Σ_j E[(m_t + ε_t)² R_t^e R_t^{e'}] rises dramatically. This can reduce the J_T statistic, leading to a false sense of improvement.

Level Playing Field (continued)

If the sample contains a nearly risk-free portfolio of the test assets, or a portfolio with apparently small variance of m_{t+1} R_{t+1}^e, then the J_T test will focus on the pricing of this portfolio and may lead to a false rejection, since there is an eigenvalue of S that is too small. Some stylized facts, such as the RMSE and the pricing errors, are as interesting as the statistical tests.

Some Prespecified Weighting Matrices

One choice is the second-moment matrix of payoffs in place of S: W = E(xx')^{-1} (Hansen and Jagannathan (1997)). The minimum distance (in the second-moment norm) between a candidate discount factor y and the space of true discount factors is the same as the minimum value of the GMM criterion with W = E(xx')^{-1} as weighting matrix. Why is this true?

Proof: The distance between y and the nearest valid m is the same as the distance between proj(y | X) and x*, where

x* = p' E(xx')^{-1} x

is the portfolio of x that prices x. From the OLS formula,

proj(y | X) = E(yx') E(xx')^{-1} x.

Then, the distance is

‖proj(y|X) − x*‖ = ‖E(yx')E(xx')^{-1}x − p'E(xx')^{-1}x‖
= ‖[E(yx') − p'] E(xx')^{-1} x‖
= √{ [E(xy) − p]' E(xx')^{-1} E(xx') E(xx')^{-1} [E(xy) − p] }
= √{ [E(xy) − p]' E(xx')^{-1} [E(xy) − p] }
= √{ g_T' E(xx')^{-1} g_T }.
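The identity at the heart of this proof, that the second-moment distance between proj(y|X) and x* equals the GMM criterion g' E(xx')^{-1} g, can be checked numerically (the payoffs, prices, and discount factor below are made up):

```python
import numpy as np

rng = np.random.default_rng(4)
T, N = 5000, 2
x = 1.0 + 0.1 * rng.standard_normal((T, N))   # hypothetical payoffs
p = np.ones(N)                                # their (assumed) prices
y = 0.98 + 0.02 * rng.standard_normal(T)      # candidate discount factor

Exx = x.T @ x / T
g = x.T @ y / T - p                           # pricing errors E(xy) - p

# portfolio coefficients of proj(y|X) and of x* = p' E(xx')^{-1} x
c_proj = np.linalg.solve(Exx, x.T @ y / T)
c_star = np.linalg.solve(Exx, p)

# squared second-moment distance between the two payoffs
dist2 = (c_proj - c_star) @ Exx @ (c_proj - c_star)
print(np.isclose(dist2, g @ np.linalg.solve(Exx, g)))   # True: equals g' E(xx')^{-1} g
```

The equality holds exactly in every sample, since c_proj − c_star = E(xx')^{-1} g, mirroring the algebra above.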

Identity Matrix

Using the identity matrix as weights has a particular advantage with large systems in which S is nearly singular: it avoids most of the problems associated with inverting a near-singular S matrix.

Comparing the Second-Moment and Identity Matrices

The second-moment matrix E(xx')^{-1} and the optimal weighting matrix S^{-1} give an objective that is invariant to the initial choice of assets or portfolios A:

[E(yAx) − Ap]' E(Axx'A')^{-1} [E(yAx) − Ap] = [E(yx) − p]' E(xx')^{-1} [E(yx) − p].

This is not true of the identity or other fixed matrices; there, the results depend on the initial choice of portfolios. On the other hand, the second-moment matrix is often even more nearly singular than the spectral density matrix, so it is no help in overcoming the near-singularity of S.

11.6 Estimating on One Group of Moments, Testing on Another

We can use one set of moments for estimation and another for testing. We can also estimate using one set of asset returns and then see how the model does out of sample on another set of assets. We can do all this very simply by using an appropriate weighting matrix or a prespecified moment matrix, for example

a_T = [I_N, 0_{N×M}].

11.7 Estimating the Spectral Density Matrix

The optimal weighting matrix S depends on population moments and on the parameters b:

S = Σ_{j=-∞}^{∞} E(u_t u_{t-j}'),  u_t = f(x_t, b) = m_{t+1}(b) x_{t+1} − p_t.

There are a lot of parameters. How do we estimate this matrix in practice?

Use a sensible first-stage W, or transform the data. For the first-stage b estimates, we should use a sensible weighting matrix. Sometimes some moments will have different units than others, for example the moment formed by R_{t+1} × d_t/p_t versus the moment formed by R_{t+1} × 1. It is also useful to start with moments that are not horrendously correlated with each other.

For example, you might consider R^a and R^a − R^b rather than R^a and R^b, i.e., weight the original moments by W = A'A with

A = [ 1    0 ]        W = [  2   −1 ]
    [ 1   −1 ],           [ −1    1 ].

Remove means?

Under the null, E(u_t) = 0, so it does not matter asymptotically whether we estimate the covariance matrix with means removed,

(1/T) Σ_{t=1}^{T} (u_t − ū)(u_t − ū)',  ū = (1/T) Σ_{t=1}^{T} u_t,

or not. Hansen and Singleton (1982) advocate removing the means in sample. However, this method can also make the estimated S matrices nearly singular, since

E(uu') = cov(u, u') + E(u)E(u)'

and E(u)E(u)' is a singular matrix.

Downweight higher-order correlations

When we estimate S, we want to construct consistent estimates that are automatically positive semi-definite in every sample. An example is the Newey-West estimate,

Ŝ = Σ_{j=-k}^{k} ((k − |j|)/k) (1/T) Σ_{t} u_t u_{t-j}'.

The Newey-West estimator is based on the variance of k-period sums, so it is positive semi-definite:

var(Σ_{j=1}^{k} u_{t-j}) = k E(u_t u_t') + (k−1)[E(u_t u_{t-1}') + E(u_{t-1} u_t')] + ... + [E(u_t u_{t-k+1}') + E(u_{t-k+1} u_t')]
= k Σ_{j=-k}^{k} ((k − |j|)/k) E(u_t u_{t-j}').
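The Newey-West formula above can be sketched in a few lines of numpy; here it is tried on a made-up MA(1) series whose long-run variance is 4 while its per-period variance is only 2:

```python
import numpy as np

def newey_west(u, k):
    """Newey-West estimate of the long-run covariance S of u_t (u is T x L).

    The (k - |j|) / k weights guarantee a positive semi-definite result."""
    u = u - u.mean(axis=0)          # optionally remove sample means
    T = u.shape[0]
    S = u.T @ u / T                 # j = 0 term
    for j in range(1, k):
        w = (k - j) / k
        Gj = u[j:].T @ u[:-j] / T   # estimate of E(u_t u_{t-j}')
        S += w * (Gj + Gj.T)
    return S

rng = np.random.default_rng(2)
T = 10_000
e = rng.standard_normal(T + 1)
u = (e[1:] + e[:-1])[:, None]       # MA(1): long-run variance 4, per-period variance 2
print(newey_west(u, k=20)[0, 0])    # close to 4; k = 1 would give about 2
```

With k = 1 only the j = 0 term survives and the estimate misses the serial correlation entirely; a wider window recovers the full long-run variance, at the cost of more sampling noise.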

What value of k, or how wide a window if of another shape, should you use? k should increase with sample size, but not as quickly as the sample size increases.

Consider parametric structures for autocorrelation and heteroskedasticity

GMM is not inherently tied to nonparametric covariance matrix estimates. We can impose a parametric structure on the S matrix. For example, if we model a scalar u as an AR(1) with autocorrelation ρ and variance σ_u², then

S = Σ_{j=-∞}^{∞} E(u_t u_{t-j}) = σ_u² Σ_{j=-∞}^{∞} ρ^{|j|} = σ_u² (1 + ρ)/(1 − ρ),

so we only need to estimate two parameters.
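The closed form can be checked against a truncated version of the infinite sum (the values of σ_u and ρ here are arbitrary):

```python
import numpy as np

sigma_u, rho = 1.5, 0.7
S_closed = sigma_u**2 * (1 + rho) / (1 - rho)     # = 12.75

# truncated sum over lags agrees once enough terms are included
j = np.arange(-200, 201)
S_sum = sigma_u**2 * np.sum(rho ** np.abs(j))
print(S_closed, S_sum)   # both 12.75 (up to truncation error)
```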

Use the null to limit correlations

In the asset pricing setup, the null hypothesis specifies that

E_t(u_{t+1}) = E_t(m_{t+1} R_{t+1} − 1) = 0,

as well as E(u_{t+1}) = 0, so the u_t are serially uncorrelated. In this situation, you can use

Ŝ = (1/T) Σ_{t=1}^{T} u_t u_t'.

However, the null might not be correct, and if the null is not correct, this is an inconsistent estimate. On the other hand, if the null is correct, estimating extra lags that should be zero under the null only loses a little bit of power.

Monte Carlo evidence suggests that imposing the null hypothesis can help with the power of test statistics, and the small-sample performance of nonparametric estimators with many lags is not very good. We can test the autocorrelation of u_t to decide whether Ŝ = (1/T) Σ_t u_t u_t' is appropriate. If there is a lot of correlation, this is an indication that something is wrong with the estimate or the model.

Size problems: consider a factor or other parametric cross-sectional structure

When the number of moments is more than around 1/10 the number of data points, S estimates tend to become unstable and near-singular. It might be better to estimate an S that imposes a factor structure on all the primitive assets. One might also use a highly structured estimate of S as the weighting matrix, while using a less constrained estimate for the standard errors.

Alternatives to the two-stage procedure: iteration and one-step

Iterate: use

b̂_2 = argmin_{b} g_T(b)' S^{-1}(b̂_1) g_T(b),

where b̂_1 is a first-stage estimate, held fixed in the minimization over b; then use b̂_2 to find S(b̂_2), find

b̂_3 = argmin_{b} g_T(b)' S^{-1}(b̂_2) g_T(b),

and so on, until the estimates converge to one value.

This procedure is also likely to produce estimates that do not depend on the initial weighting matrix.

Pick b and S simultaneously: as we search over b, S changes too. The objective becomes

min_{b} g_T(b)' S^{-1}(b) g_T(b).

The first-order conditions are

2 (∂g_T(b)/∂b)' S^{-1}(b) g_T(b) + g_T(b)' [∂S^{-1}(b)/∂b] g_T(b) = 0.
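A minimal sketch of the iterated procedure, on a toy problem (made-up data: two noisy measurements of the same mean, where the linear moments give the minimizer in closed form at each step, so no numerical search is needed):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 5000
b_true = 2.0
# two noisy measurements of the same mean, with very different variances
x = b_true + rng.standard_normal(T)
y = b_true + 3.0 * rng.standard_normal(T)
data = np.column_stack([x, y])      # moment errors u_t(b) = data_t - b

W = np.eye(2)                       # first-stage weighting matrix
for _ in range(20):
    m = data.mean(axis=0)
    iota = np.ones(2)
    b = (iota @ W @ m) / (iota @ W @ iota)   # closed-form minimizer (linear moments)
    u = data - b
    S = u.T @ u / T                 # iid case: S is just the covariance of u_t
    W = np.linalg.inv(S)            # update the weighting matrix and repeat

print(b)   # close to 2; the converged W puts most weight on the precise moment x
```

After convergence the estimate no longer depends on the initial W = I: the efficient weighting matrix down-weights the noisy y-moment roughly in proportion to its variance.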

In the iteration method, each step involves a numerical search over g_T(b)' S^{-1} g_T(b), so it may be much quicker to minimize once over g_T(b)' S^{-1}(b) g_T(b). On the other hand, the latter is not a locally quadratic form, so the search may run into greater numerical difficulties.

The End. Thank you!