Empirical Likelihood Estimation of Lévy Processes


Naoto Kunitomo and Takashi Owada

December 2006

Summary: We propose an estimation procedure for the parameters of Lévy processes and infinitely divisible distributions. The empirical likelihood method gives an easy way to estimate the parameters of infinitely divisible distributions, including the stable distributions. The maximum empirical likelihood estimator based on the empirical characteristic function is consistent, asymptotically normal, and asymptotically efficient when the number of restrictions is large. Test procedures and some extensions to regression and estimating-equations problems with infinitely divisible disturbances are developed. A simple empirical example on the analysis of stock index returns in Japan is discussed.

Key Words: Lévy process, infinitely divisible distribution, stable distribution, empirical likelihood, empirical characteristic function, regression and estimating equation.

This paper is a revised version of Discussion Paper CIRJE-F-272 (May 2004), Graduate School of Economics, University of Tokyo, which was presented at the European Econometric Society Meeting at Vienna (August 2006) and the Institute of Statistical Mathematics at Tokyo (September 2006). We thank Professor Y. Miyahara of Nagoya City University in particular for helpful comments. Kunitomo: Professor, Graduate School of Economics, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan (kunitomo@e.u-tokyo.ac.jp). Owada: Graduate Student, Graduate School of Economics, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan.

1. Introduction

There has been growing interest in applications of Lévy processes and the class of infinitely divisible distributions in several research fields, including financial economics. One interesting class of infinitely divisible distributions is the class of stable distributions. Since these are important classes of probability distributions, they have been studied extensively by mathematicians over several decades; see Feller (1971), Zolotarev (1986), and Sato (1999) for the details of related problems in the probability literature. Stable distributions have been applied to modeling the fat-tail phenomena sometimes observed in financial economics and other applied areas of statistics; see Mandelbrot (1963), Paulson et al. (1975), and Nolan (2001) for earlier studies of the subject in the statistics literature. More recently, the more general Lévy processes and other classes of infinitely divisible distributions have been used in analyses of financial data; see Barndorff-Nielsen et al. (2001) and Carr et al. (2002) for recent examples. Several estimation methods for the key parameters of stable distributions have been proposed and developed over the past few decades. DuMouchel (1971) investigated the parametric maximum likelihood estimation method, and Nolan (1997) developed a numerical algorithm for evaluating the likelihood. Since no explicit form of the likelihood function is available for stable distributions except in very special cases, Fama and Roll (1968, 1971) proposed a practical estimation method based on the percentiles of distributions, which McCulloch (1986) later improved. Another method, based on the empirical characteristic function, was originally proposed by Press (1972), with related studies by Paulson et al. (1975), Koutrouvelis (1980), Kogon and Williams (1998), and Feuerverger and McDunnough (1981a, 1981b).
These classical estimation methods can be extended to the more general Lévy processes. The main purpose of this paper is to develop a new parameter estimation procedure for Lévy processes and some classes of infinitely divisible distributions based on the empirical likelihood approach. The empirical likelihood method was originally proposed by Owen (1988, 1990) for constructing nonparametric confidence intervals, and it was later extended to the estimating-equations problem by Qin and Lawless (1994). In this paper we first show that the empirical likelihood approach can be applied to the estimation of the unknown parameters of stable distributions, and that the resulting computational burden is not very heavy. In particular, the maximum empirical likelihood (MEL) estimator for the parameters of a stable distribution has desirable asymptotic properties: it is consistent, asymptotically normal, and asymptotically efficient when the number of restrictions on the empirical characteristic function is large, under a set of regularity conditions. It is also possible to develop empirical likelihood ratio statistics for the parameters of stable distributions with desirable asymptotic properties. More importantly, it is rather straightforward to extend our estimation method from the parameters of stable distributions to the more general estimation problem of Lévy processes and infinitely divisible distributions. We can also apply our method to regression and general estimating-equation problems with stable disturbances or other infinitely divisible disturbances. We shall show that our method can estimate the parameters of the equations and the parameters of the stable (or other infinitely divisible) disturbance distributions at the same time. It does not seem easy to solve this estimation problem by the conventional methods proposed in past studies, and in this sense our estimation method has an advantage over other methods. For the estimating-equations problem, Qin and Lawless (1994) established some asymptotic properties of the MEL estimator, and Kitamura et al. (2001) extended their results in one direction. In this respect our study has some technical novelty, because we consider the case in which the number of restrictions grows with the sample size. Hence this paper can be regarded as an extension of Qin and Lawless (1994) in an important direction.
In Section 2 we formulate the empirical likelihood estimation method for the class of stable distributions in the standard situation and state our main results on the asymptotic properties of the MEL estimator and the related testing procedure. In Section 3 we discuss the estimation problem for Lévy processes and several infinitely divisible distributions, as well as an extension to the estimating-equations problem when the disturbance terms follow a stable distribution (or some infinitely divisible distribution). In Section 4 we report some simulation results, and in Section 5 we give an empirical analysis of stock index returns in Japan using the CGMY process. Concluding remarks are given in Section 6, and the proofs of the main results are given in Section 7.

2. Empirical Likelihood Estimation of Stable Distributions

We first consider the situation in which X_i (i = 1, ..., n) are a sequence of independently and identically distributed random variables following a stable distribution. Let the characteristic function of X_i be denoted by φ_θ(t), with real part φ^R_θ(t) and imaginary part φ^I_θ(t). We adopt the formulation of the characteristic function by Chambers et al. (1976) for the class of stable distributions, which is represented as

φ_θ(t) = φ^R_θ(t) + i φ^I_θ(t),   (2.1)

where

φ^R_θ(t) = e^{-|γt|^α} cos[ δt + βγt( |γt|^{α-1} - 1 ) tan(πα/2) ],
φ^I_θ(t) = e^{-|γt|^α} sin[ δt + βγt( |γt|^{α-1} - 1 ) tan(πα/2) ],

and the parameter space is given by Θ = { 0 < α ≤ 2, |β| ≤ 1, γ > 0, δ ∈ R }. In the following analysis we denote the vector of unknown parameters by θ = (α, β, γ, δ)′ and the stable distribution associated with θ by H_θ. There are two non-standard problems in the estimation of the vector of unknown parameters θ. It is well known in probability theory that, except in some special cases (the normal distribution, the Cauchy distribution, and the Lévy distribution), there is no simple explicit form of the probability density function and distribution function. This makes the direct estimation of the unknown parameters, including the parametric maximum likelihood method, difficult. Also, since stable distributions do not necessarily have first and/or second moments, some standard techniques of asymptotic theory are not directly applicable.

2.1 Empirical Likelihood Method

In order to estimate the unknown parameters of stable distributions, we propose to use the empirical likelihood approach, which is similar to the one developed by Qin and Lawless (1994). Although stable distributions do not necessarily have first and second moments, we can utilize the information in the empirical characteristic function.
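The characteristic function above can be evaluated directly with complex arithmetic. The following minimal sketch (Python with numpy; the function name is our own choice, and it is valid only away from the boundary α = 1, where tan(πα/2) diverges) may clarify the roles of the four parameters:

```python
import numpy as np

def stable_cf(t, alpha, beta, gamma, delta):
    """Stable characteristic function in the Chambers-Mallows-Stuck
    parameterization; assumes alpha != 1."""
    gt = gamma * t
    modulus = np.exp(-np.abs(gt) ** alpha)        # e^{-|gamma t|^alpha}
    phase = delta * t + beta * gt * (np.abs(gt) ** (alpha - 1.0) - 1.0) \
        * np.tan(np.pi * alpha / 2.0)
    return modulus * (np.cos(phase) + 1j * np.sin(phase))  # phi^R + i phi^I
```

Note that at α = 2 the skewness term vanishes for every β (tan π = 0), which is exactly the identification problem at the Gaussian boundary discussed later in this section.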
We define the empirical likelihood function by

L_n(H_θ) = Π_{k=1}^n ( H_θ(X_k) - H_θ(X_k-) ) = Π_{k=1}^n p_k,

where p_k (k = 1, ..., n) is the probability assigned to the data point X_k. Without any restriction other than p_k ≥ 0 and Σ_{k=1}^n p_k = 1, the empirical likelihood function L_n(H_θ) is maximized at p_k = 1/n (k = 1, ..., n). Hence the empirical likelihood ratio function is given by R_n(H_θ) = Π_{k=1}^n n p_k. Then we define the maximum empirical likelihood estimator θ̂_n of the vector of unknown coefficients by maximizing R_n(H_θ) under the restrictions

P_n = { p_k ≥ 0 (k = 1, ..., n), Σ_{k=1}^n p_k = 1, Σ_{k=1}^n p_k ( cos(t_l X_k) - φ^R_θ(t_l) ) = 0, Σ_{k=1}^n p_k ( sin(t_l X_k) - φ^I_θ(t_l) ) = 0 (l = 1, ..., m) }.   (2.2)

In these restrictions m is the number of restriction points on the characteristic function (we take m ≥ 2), and the two terms Σ_{k=1}^n p_k cos(t_l X_k) and Σ_{k=1}^n p_k sin(t_l X_k) are the real and imaginary parts of the empirical characteristic function evaluated at m different points t = t_l (t_1 < t_2 < ··· < t_m; l = 1, ..., m). The choice of m is important and may depend on the sample size n; we shall discuss this problem later. Define the 2m-vector

g(X_k, θ) = ( g^R(X_k, θ)′, g^I(X_k, θ)′ )′,   (2.3)

where

g^R(X_k, θ) = ( cos(t_1 X_k) - φ^R_θ(t_1), ..., cos(t_m X_k) - φ^R_θ(t_m) )′,
g^I(X_k, θ) = ( sin(t_1 X_k) - φ^I_θ(t_1), ..., sin(t_m X_k) - φ^I_θ(t_m) )′,

and φ^R_θ(t_k) and φ^I_θ(t_k) are given by (2.1) evaluated at t = t_k (k = 1, ..., m). Then we have the orthogonality condition E_{θ_0}[ g(X, θ_0) ] = 0, where E_{θ_0}[·] is the expectation operator and θ_0 is the vector of true parameter values. We assume that the convex hull P_n(θ) = { Σ_{k=1}^n p_k g(X_k, θ) : p_k ≥ 0, Σ_{k=1}^n p_k = 1 } contains 0, and we use the Lagrangian form

L_n(θ) = Σ_{k=1}^n log(n p_k) - μ ( Σ_{k=1}^n p_k - 1 ) - n λ′ Σ_{k=1}^n p_k g(X_k, θ),   (2.4)

where μ and λ = (λ_1, ..., λ_{2m})′ are the Lagrange multipliers, λ being a 2m-vector. Differentiating L_n(θ) with respect to p_k gives p_k = 1/[ μ + nλ′g(X_k, θ) ] (k = 1, ..., n). Then μ̂ = n, p̂_k = (1/n)[ 1 + λ′g(X_k, θ) ]^{-1}, and λ = λ(θ) is the solution of 0 = Σ_{k=1}^n p̂_k g(X_k, θ). When the 2m × 2m matrix (1/n) Σ_{k=1}^n g(X_k, θ) g(X_k, θ)′

is positive definite and p̂_k > 0 (k = 1, ..., n), the matrix

∂²/∂λ∂λ′ { -(1/n) Σ_{k=1}^n log[ 1 + λ′g(X_k, θ) ] } = (1/n) Σ_{k=1}^n g(X_k, θ) g(X_k, θ)′ / [ 1 + λ′g(X_k, θ) ]²

is also positive definite, and λ = λ(θ) is the unique solution of

argmin_λ { -(1/n) Σ_{k=1}^n log[ 1 + λ′g(X_k, θ) ] }.

We define the maximum empirical likelihood (MEL) estimator of the vector of unknown parameters θ by maximizing the log-likelihood function l_n(θ), given by

l_n(θ) = Σ_{k=1}^n log( n p̂_k ) = - Σ_{k=1}^n log[ 1 + λ′g(X_k, θ) ].   (2.5)

The numerical maximization in the MEL estimation is usually done by the two-step optimization procedure proposed by Owen; see Owen (2001) for the details.

2.2 Asymptotic Properties of MEL Estimation

We shall investigate the asymptotic properties of the MEL estimator of θ. For the problem of general estimating equations, Qin and Lawless (1994) proved the consistency and the asymptotic normality of the MEL estimator under a set of regularity conditions. When the number of restrictions m is fixed, we have an analogous result in our situation, which is the starting point of the subsequent developments.

Theorem 2.1 : We assume that X_1, ..., X_n are a sequence of i.i.d. random variables with the stable distribution H_θ and that the vector of true parameters θ_0 = (α_0, β_0, γ_0, δ_0)′ is in Int(Θ_1), where Θ_1 = { (α, β, γ, δ) : α ∈ [ε, 1-ε] ∪ [1+ε, 2-ε], |β| ≤ 1, ε ≤ γ ≤ M, -M ≤ δ ≤ M } with ε a sufficiently small positive number and M a sufficiently large positive number. Let the MEL estimator be θ̂_n = argmax_θ R_n(θ), where

R_n(θ) = max { Π_{k=1}^n n p_k : Σ_{k=1}^n p_k g(X_k, θ) = 0, p_k ≥ 0, Σ_{k=1}^n p_k = 1 },   (2.6)

and the 2m-vector of restrictions g(·, ·) is defined by (2.3). Let the Lagrange multiplier λ̂_n be the solution of

(1/n) Σ_{k=1}^n g(X_k, θ̂_n) / [ 1 + λ′g(X_k, θ̂_n) ] = 0.   (2.7)

Then as n → +∞,

√n ( (θ̂_n - θ_0)′, λ̂_n′ )′ →_d N_{4+2m}( 0, diag(Ω_m, Γ_m) ),   (2.8)

where Ω_m = [ B_m(θ_0)′ A_m(θ_0)^{-1} B_m(θ_0) ]^{-1}, Γ_m = A_m(θ_0)^{-1} [ A_m(θ_0) - B_m(θ_0) Ω_m B_m(θ_0)′ ] A_m(θ_0)^{-1}, and we define the 2m-vector Φ_θ = ( φ^R_θ(t_1), ..., φ^R_θ(t_m), φ^I_θ(t_1), ..., φ^I_θ(t_m) )′, the 2m × 4 matrix B_m(θ) = ∂Φ_θ/∂θ′, and the 2m × 2m matrix A_m(θ) = E_θ[ g(X, θ) g(X, θ)′ ], whose (i, j)-th elements are given by

(1/2){ φ^R_θ(t_i + t_j) + φ^R_θ(t_i - t_j) } - φ^R_θ(t_i) φ^R_θ(t_j)   (1 ≤ i, j ≤ m),
(1/2){ φ^I_θ(t_i + t_{j-m}) - φ^I_θ(t_i - t_{j-m}) } - φ^R_θ(t_i) φ^I_θ(t_{j-m})   (1 ≤ i ≤ m, m+1 ≤ j ≤ 2m),
(1/2){ φ^I_θ(t_{i-m} + t_j) + φ^I_θ(t_{i-m} - t_j) } - φ^I_θ(t_{i-m}) φ^R_θ(t_j)   (m+1 ≤ i ≤ 2m, 1 ≤ j ≤ m),
(1/2){ φ^R_θ(t_{i-m} - t_{j-m}) - φ^R_θ(t_{i-m} + t_{j-m}) } - φ^I_θ(t_{i-m}) φ^I_θ(t_{j-m})   (m+1 ≤ i, j ≤ 2m),

respectively. The above statement is based on Qin and Lawless (1994) (their Lemma 1 and Theorem 1), and its proof consists of checking their sufficient conditions in our situation. We need some regularity conditions on the functions g(x, θ) with respect to θ, and we use a neighborhood N(θ_0) of θ_0 with some smoothness conditions; it is rather straightforward to verify those conditions in our situation. For instance, we can utilize the fact that for θ ∈ N(θ_0) (a compact set) we have

‖g(x, θ)‖ = [ Σ_{l=1}^m ( cos(t_l x) - φ^R_θ(t_l) )² + Σ_{l=1}^m ( sin(t_l x) - φ^I_θ(t_l) )² ]^{1/2} ≤ 2√(2m).

It is also possible to show directly that ∂g(x, θ)/∂θ_j and ∂²g(x, θ)/∂θ_j∂θ_k are continuous in N(θ_0), and that both ‖∂g(x, θ)/∂θ_j‖ and ‖∂²g(x, θ)/∂θ_j∂θ_k‖ (j, k = 1, ..., 4) are bounded in N(θ_0) (N(θ_0) ⊂ Θ_1). Next we consider the density function f_θ(x) of the stable distribution with the vector of unknown parameters θ = (α, β, γ, δ)′. By arguments similar to DuMouchel (1973), we can show that f_θ(x) has the following properties: (i) For x ∈ R, f_θ(x) as a function of θ is continuous in Int(Θ_1), and for any

θ ∈ Int(Θ_1) it is twice continuously differentiable. (ii) Since for any θ ∈ Int(Θ_1)

∂²/∂θ∂θ′ ∫ f_θ(x) dx = ∫ ∂²f_θ(x)/∂θ∂θ′ dx,   (2.9)

we have E_θ[ ∂ log f_θ(X)/∂θ ] = 0 and

I(θ) = E_θ[ ( ∂ log f_θ(X)/∂θ )( ∂ log f_θ(X)/∂θ )′ ] = -E_θ[ ∂² log f_θ(X)/∂θ∂θ′ ].   (2.10)

(iii) For any θ ∈ Int(Θ_1), the Fisher information matrix I(θ) is non-singular. Hence for any non-zero vector u ∈ R⁴ we have the inequality

u′ I(θ_0)^{-1} u ≤ u′ Ω_m u,   (2.11)

which implies that Ω_m is positive definite.

The inequality (2.11) implies that the asymptotic variance of the MEL estimator in Theorem 2.1 is in general larger than the Cramér-Rao lower bound, so the estimator is asymptotically inefficient when the number of restrictions m is fixed. However, it is possible to consider the situation in which m depends on the sample size n. For the estimation problem, we take some η (0 < η < 1) and m = m_n = [n^η], where [c] is the largest integer not exceeding c. In order to impose the restrictions in the form of (2.2), we set t_l = Kl/m_n (l = 1, 2, ..., m_n) with some large constant K (> 0). Then we have the consistency, the asymptotic normality, and the asymptotic efficiency of the MEL estimator as stated in the next theorem. The proof is lengthy and is given in Section 7.

Theorem 2.2 : We assume that X_1, ..., X_n are i.i.d. random variables with the stable distribution H_θ. We set m = m_n = [n^{1/2-ε}] (0 < ε < 1/2) and define θ̂_n = argmax_θ R_n(θ), where R_n(θ) is given by (2.6). (i) Assume that the true parameter vector θ_0 is in Int(Θ_2), where Θ_2 = { (α, β, γ, δ) : ε ≤ α ≤ 2, |β| ≤ 1, ε ≤ γ ≤ M, -M ≤ δ ≤ M } with ε a sufficiently small positive number and M a sufficiently large positive number. Then as n → ∞,

θ̂_n →_p θ_0.   (2.13)

(ii) We restrict the parameter space so that the vector of true parameter values θ_0 is in Int(Θ_1), where Θ_1 is the same as in Theorem 2.1. Then as n → ∞,

√n ( θ̂_n - θ_0 ) →_d N_4[ 0, J_K(θ_0)^{-1} ],   (2.14)

and

lim_{K→+∞} J_K(θ_0) = I(θ_0),

where J_K(θ_0) = lim_{m→∞} Ω_m(θ_0)^{-1}, with Ω_m = [ B_m(θ_0)′ A_m(θ_0)^{-1} B_m(θ_0) ]^{-1} evaluated at the points t_l = Kl/m_n (l = 1, ..., m_n), and I(θ_0) is positive definite.

There are two important special cases to be mentioned. First, when α = 1 (i.e. the Cauchy distribution) and β ≠ 0, we have the situation that for any finite t

lim_{α→1} ∂²φ_θ(t)/∂α² = +∞,   (2.15)

and the convergence rate of α̂_n (the estimator of α) to 1 could be different from √n. When α = 2 and β ≠ 0, we have the situation that for any finite t

lim_{α→2} ∂φ_θ(t)/∂β = lim_{α→2} ∂²φ_θ(t)/∂β² = 0.   (2.16)

Then the vector of unknown parameters θ is unidentified and the limiting information matrix is degenerate. (See Matsui and Takemura (2004), for instance.) It is still not clear whether the asymptotic normality and the asymptotic efficiency of the MEL estimator hold in these boundary cases.

2.3 Empirical Likelihood Testing

It is also possible to develop empirical likelihood ratio statistics and testing procedures for the parameters of stable distributions which have the desirable asymptotic properties stated in the next theorem. The proof is given in Section 7.

Theorem 2.3 : Suppose the assumptions in (ii) of Theorem 2.2 hold and we set m = m_n = [n^{1/2-ε}] (0 < ε < 1/2). (i) The empirical likelihood ratio statistic for testing the hypothesis H_0 : θ = θ_0 is given by W_1 = 2[ l_n(θ̂_n) - l_n(θ_0) ], where the log-likelihood function l_n(θ) is given by (2.5). Then

W_1 →_d χ²(4)   (2.17)

as n → +∞ when H_0 is true. (ii) For the hypothesis of the restrictions E_{θ_0}[ g(X, θ_0) ] = 0, the likelihood ratio statistic is given by W_2 = -2 l_n(θ̂_n). Then

( W_2 - (2m - 4) ) / √(4m - 8) →_d N(0, 1)   (2.18)

as n → +∞ when the 2m restrictions imposed are true.

The first part of Theorem 2.3 allows us to use the empirical likelihood ratio statistic for testing the standard hypothesis H_0 as well as for constructing confidence sets for the parameters θ. The second part may not be standard in the statistics literature, but it corresponds to the testing problem of overidentifying restrictions in the econometric literature. Since the degrees of freedom L = 2m - 4 in the second case become large as n → +∞, we have the normal distribution as the limit.

3. Estimation of Lévy Processes, Regression and Estimating Equation Problems

3.1 Estimation of Lévy Processes

It is straightforward to apply the approach explained in Section 2 to several problems that present difficulties for the standard procedures. For instance, we consider the estimation of unknown parameters in the class of Lévy processes. Any one-dimensional Lévy process Z_v at a positive finite time v (> 0) can be represented as the sum of i.i.d. random variables X_{v_i} with 0 = v_0 ≤ v_1 ≤ ··· ≤ v_n ≤ v. For notational convenience we take v_i - v_{i-1} = 1 (i = 1, ..., n = v; v_0 = 0, ..., v_n = n) and write Z_n = Σ_{i=1}^n X_i. It is well known that the one-dimensional Lévy processes {Z_v} and the infinitely divisible distributions of the random variables {X_i} are completely determined by the characteristic function

φ_θ(t) = exp{ ibt - (a/2)t² + ∫_R [ e^{itx} - 1 - itx I(|x| < 1) ] ν_c(dx) },   (3.1)

where b and a (≥ 0) are real constants, I(·) is the indicator function, and ν_c(·) is the Lévy measure satisfying

ν_c({0}) = 0,   ∫_{|x|>0} [ 1 ∧ x² ] ν_c(dx) < +∞,   (3.2)

and c denotes a vector of parameters. (See Sato (1999) for the details of Lévy processes and infinitely divisible distributions.) The vector of unknown parameters of the infinitely divisible distribution is then represented as θ = (a, b, c′)′.
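Because the exponent in the Lévy–Khintchine representation above is linear in (b, a, ν_c), halving all of them gives the characteristic function of the half-time increment. A toy numerical check for a finite Lévy measure (a compound Poisson jump part; the function name and the discretization are our own, not from the paper):

```python
import numpy as np

def levy_cf(t, b, a, jump_sizes, jump_rates):
    """Levy-Khintchine characteristic function for a finite Levy measure
    putting mass jump_rates[j] at x = jump_sizes[j]; jumps with |x| < 1
    are compensated by the -itx term, as in the general formula."""
    x = np.asarray(jump_sizes, dtype=float)
    r = np.asarray(jump_rates, dtype=float)
    comp = np.where(np.abs(x) < 1.0, x, 0.0)      # x * I(|x| < 1)
    integral = np.sum(r * (np.exp(1j * t * x) - 1.0 - 1j * t * comp))
    return np.exp(1j * b * t - 0.5 * a * t ** 2 + integral)
```

Infinite divisibility shows up directly: the characteristic function with parameters (b, a, rates) equals, at every t, the square of the one with (b/2, a/2, rates/2).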

For possible applications, we mention three important cases of infinitely divisible distributions used in recent financial economics and mathematical finance. First, the class of stable distributions with 0 < α < 2 can be characterized by the Lévy measure

ν_c(dx) = c_1 |x|^{-(1+α)} dx for x < 0,   ν_c(dx) = c_2 x^{-(1+α)} dx for x > 0,   (3.3)

where c = (c_1, c_2, α)′. We should note that the parameterization by c_1 (> 0) and c_2 (> 0) is different from the one in Section 2; however, there is a one-to-one correspondence between the vectors (α, β, γ, δ) of Section 2 and (α, b, c_1, c_2) (see Chapter 2 of Sato (1999) for the details). The second case is the CGMY process introduced by Carr et al. (2002), which has been applied to describe the stochastic processes of financial prices. The Lévy measure for this process is given by

ν_c(dx) = C_0 { I(x < 0) e^{-G|x|} + I(x > 0) e^{-Mx} } |x|^{-(1+Y)} dx,   (3.4)

where the vector of parameters c = (C_0, G, M, Y)′ satisfies the restrictions C_0 > 0, G ≥ 0, M ≥ 0, and Y < 2. The characteristic function is given by

φ_θ(t) = exp{ i[ b + C_0 Γ(-Y) Y ( M^{Y-1} - G^{Y-1} ) ] t + C_0 Γ(-Y) ( (M - it)^Y - M^Y + (G + it)^Y - G^Y ) },   (3.5)

where the vector of parameters is θ = (b, C_0, G, M, Y)′ and Γ(·) is the Gamma function. When Y = 0, the CGMY process reduces to the Variance Gamma process proposed by Madan and Seneta (1990). Miyahara (2002) has summarized the basic properties of the CGMY process and the Variance Gamma process in a systematic way. Although the characteristic function (3.5) is continuous with respect to θ, for any finite t its derivative with respect to Y diverges as Y → 0 or Y → 1. Hence we should treat the boundary cases carefully, as we discussed for the class of stable distributions in Section 2. The third example is the class of normal inverse Gaussian processes, introduced and discussed by Barndorff-Nielsen (1998). The characteristic function

for this class of distributions is given by

φ_θ(t) = exp{ δ[ √(α² - β²) - √(α² - (β + it)²) ] + iμt },   (3.6)

and the vector of parameters in the present case is θ = (μ, α, β, δ)′. For these infinitely divisible distributions no simple form of the density function is available, and the parametric maximum likelihood estimation method has computational problems except in very special cases. In this respect, the maximum empirical likelihood (MEL) method is directly applicable, and we can establish the next result. The proof is similar to that of Theorem 2.2 and is omitted.

Theorem 3.1 : We assume that X_1, ..., X_n are a sequence of i.i.d. random variables with the characteristic function given by (3.1), which is continuous with respect to θ, and that the Lévy measure ν_c is absolutely continuous with respect to the Lebesgue measure. The true parameter vector θ_0 is in Int(Θ_3), where Θ_3 is a compact subset such that (3.1) is the characteristic function of an infinitely divisible distribution with a non-degenerate density f_θ(·) everywhere positive in R. We impose the 2m restriction functions g(X_k, θ) defined as in (2.3) at t_l = Kl/m_n (l = 1, ..., m_n) with some positive constant K, for the real part φ^R_θ(t) and the imaginary part φ^I_θ(t) of φ_θ(t). We set m = m_n = [n^{1/2-ε}] (0 < ε < 1/2). (i) Define θ̂_n = argmax_θ R_n(θ), where R_n(θ) is given by (2.6). Then

θ̂_n →_p θ_0.   (3.7)

(ii) We restrict the parameter space so that the true parameter value θ_0 is in Int(Θ_4), where Θ_4 is a compact subset such that I(θ_0) (the Fisher information matrix) is positive definite, φ_θ(t) is twice continuously differentiable with respect to θ, and its first and second partial derivatives are bounded by integrable functions. Then

√n ( θ̂_n - θ_0 ) →_d N[ 0, J_K(θ_0)^{-1} ],   (3.8)

where lim_{K→+∞} J_K(θ_0) = I(θ_0) and J_K(θ_0) is defined by the corresponding quantities as in Theorem 2.2.

There could be simpler regularity conditions for the results in Theorem 3.1.
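Both characteristic functions (3.5) and (3.6) are available in closed form, so the moment restrictions of Section 2 carry over with no extra work. A sketch in Python (function names are ours; in the CGMY case we read the linear term as the mean compensator, so that the distribution has mean b, which is an assumption to verify against the original sources, and we restrict to 0 < Y < 1 so that Γ(-Y) is finite):

```python
import numpy as np
from math import gamma as gamma_fn  # finite at negative non-integer arguments

def cgmy_cf(t, b, C0, G, M, Y):
    """CGMY characteristic function, mean-b form; assumes 0 < Y < 1."""
    gY = gamma_fn(-Y)
    drift = b + C0 * gY * Y * (M ** (Y - 1.0) - G ** (Y - 1.0))
    jump = C0 * gY * ((M - 1j * t) ** Y - M ** Y + (G + 1j * t) ** Y - G ** Y)
    return np.exp(1j * drift * t + jump)

def nig_cf(t, mu, alpha, beta, delta):
    """Normal inverse Gaussian characteristic function."""
    s0 = np.sqrt(alpha ** 2 - beta ** 2)
    s1 = np.sqrt((alpha ** 2 - (beta + 1j * t) ** 2) + 0j)  # principal branch
    return np.exp(delta * (s0 - s1) + 1j * mu * t)
```

With these in place, the 2m restriction functions g(X_k, θ) are built exactly as in Section 2, with the real and imaginary parts of the chosen characteristic function replacing those of the stable law.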
Since the Lévy measure is not necessarily a finite measure in the general case, however, a careful analysis would be needed to impose further conditions on ν_c(·).
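For any of these models, the inner step of the MEL computation is the strictly convex minimization of -Σ_k log[ 1 + λ′g(X_k, θ) ] over the Lagrange multiplier λ at each trial value of θ. A minimal damped-Newton sketch of this inner step (our own implementation, not the two-step procedure of Owen (2001)):

```python
import numpy as np

def solve_lambda(G_mat, max_iter=100, tol=1e-10):
    """Given an n x 2m matrix whose rows are g(X_k, theta), return the
    multiplier lambda minimizing -sum_k log(1 + lambda' g_k), keeping
    every 1 + lambda' g_k strictly positive along the way."""
    n, q = G_mat.shape
    lam = np.zeros(q)
    for _ in range(max_iter):
        w = 1.0 + G_mat @ lam                      # n-vector of 1 + lambda'g_k
        grad = -(G_mat / w[:, None]).sum(axis=0)   # gradient of the objective
        if np.linalg.norm(grad) < tol:
            break
        Gw = G_mat / w[:, None]
        hess = Gw.T @ Gw                           # Hessian: sum of g g' / w^2
        step = np.linalg.solve(hess, -grad)        # Newton direction
        s = 1.0
        while np.any(1.0 + G_mat @ (lam + s * step) <= 1e-10):
            s *= 0.5                               # damp to stay feasible
        lam = lam + s * step
    return lam
```

At the solution the implied weights p_k = 1/[ n(1 + λ′g_k) ] sum to one and satisfy Σ_k p_k g(X_k, θ) = 0, so the profile log-likelihood l_n(θ) = -Σ_k log[ 1 + λ(θ)′g(X_k, θ) ] can then be maximized over the low-dimensional θ by a standard optimizer.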

3.2 Regression and Estimating Equation Problems

We turn to the estimation problems of the nonlinear regression and of a single structural equation in an econometric model (or the estimating-equation model in statistics), represented by

y_{1j} = h_1( y_{2j}, z_{1j}, θ_1 ) + u_j   (j = 1, ..., n),   (3.9)

where h_1(·, ·, ·) is a measurable function, y_{1j} and y_{2j} are 1 and G_1 (vectors of) endogenous variables, z_{1j} is a K_1-vector of included exogenous variables, θ_1 = (θ_{1k}) is an r-vector of unknown parameters, and {u_j} are mutually independent disturbance terms with the infinitely divisible distribution H_{θ_2}, where θ_2 = (θ_{2k}) is a vector of unknown parameters. When there is no endogenous variable (i.e. y_{2j}) on the right-hand side of (3.9), we have the nonlinear regression model. Otherwise (3.9) is the first equation in a system of 1 + G_1 structural equations which relate the vector of 1 + G_1 endogenous variables y_j = ( y_{1j}, y_{2j}′ )′ to the vector of K (= K_1 + K_2) instrumental (or exogenous) variables z_j (j = 1, ..., n). The set of instrumental variables includes the vector of explanatory variables z_{1j} appearing in the structural equation of interest (3.9). The restrictions we impose on the real and imaginary parts of the characteristic function are such that for any m different points t_1 < ··· < t_m,

E[ h_2(z_j) ( e^{itu_j} - φ_{θ_2}(t) ) ] = 0   (j = 1, ..., n),   (3.10)

where h_2(·) is a set of l functions of the instrumental variables z_j (l ≤ K) and θ_2 = (a, b, c′)′ is the vector of unknown parameters of the infinitely divisible distribution of the disturbance terms {u_j}. Because we do not specify the structural equations other than (3.9) and we have only limited information on the set of instrumental variables (or instruments), we are actually considering the limited-information estimation method in econometrics. (See Anderson (2003) and Anderson and Rubin (1949) on the classical linear formulation of the related problems, and see Anderson et al.
(2005) for the finite-sample properties of the MEL estimator in the simple linear structural equation.) As an important application of our general procedure, we shall consider the nonlinear regression and estimating-equation problems when the disturbance terms follow the class of infinitely divisible distributions. Let θ = (θ_1′, θ_2′)′ and θ_2 = (α, β, γ, δ)′ be the vectors of unknown parameters in the estimating equations

with the infinitely divisible disturbances. The maximum empirical likelihood (MEL) estimator of the vector of unknown parameters can be defined by maximizing the Lagrangian form

L_n*(λ, θ) = Σ_{j=1}^n log(n p_j) - μ ( Σ_{j=1}^n p_j - 1 ) - n Σ_{j=1}^n p_j h_2(z_j)′ { Σ_{k=1}^m λ_{1k} [ cos( t_k( y_{1j} - h_1(y_{2j}, z_{1j}, θ_1) ) ) - φ^R_{θ_2}(t_k) ] + Σ_{k=1}^m λ_{2k} [ sin( t_k( y_{1j} - h_1(y_{2j}, z_{1j}, θ_1) ) ) - φ^I_{θ_2}(t_k) ] },   (3.11)

where μ is a scalar Lagrange multiplier, λ_{1k} and λ_{2k} (k = 1, ..., m) are l-vectors of Lagrange multipliers, φ^R_{θ_2}(t) and φ^I_{θ_2}(t) are the real and imaginary parts of φ_{θ_2}(t), respectively, and p_j (j = 1, ..., n) are the weighted probabilities to be chosen. The above maximization problem is the same as maximizing

L_n(λ, θ) = - Σ_{j=1}^n log( 1 + h_2(z_j)′ { Σ_{k=1}^m λ_{1k} [ cos( t_k( y_{1j} - h_1(y_{2j}, z_{1j}, θ_1) ) ) - φ^R_{θ_2}(t_k) ] + Σ_{k=1}^m λ_{2k} [ sin( t_k( y_{1j} - h_1(y_{2j}, z_{1j}, θ_1) ) ) - φ^I_{θ_2}(t_k) ] } ),   (3.12)

where we have used the relations μ̂ = n and

[ n p̂_j ]^{-1} = 1 + h_2(z_j)′ { Σ_{k=1}^m λ_{1k} [ cos( t_k( y_{1j} - h_1(y_{2j}, z_{1j}, θ_1) ) ) - φ^R_{θ_2}(t_k) ] + Σ_{k=1}^m λ_{2k} [ sin( t_k( y_{1j} - h_1(y_{2j}, z_{1j}, θ_1) ) ) - φ^I_{θ_2}(t_k) ] }.   (3.13)

By differentiating (3.12) with respect to λ = ( λ_{11}′, λ_{21}′, ..., λ_{1m}′, λ_{2m}′ )′ and combining the resulting equation with (3.13), we obtain the MEL estimator of the vector of parameters θ. Because we have r + 4 parameters and the number of restrictions is 2lm, the degrees of overidentifying restrictions are given by

L = 2lm - r - 4,   (3.14)

where we assume that L > 0. In our formulation of the present problem, the restrictions (2.2) of Section 2 can be interpreted as the simplest case of (3.10) in this section, with r = 0, l = 1,

and h_2(x) = 1. Also, if we set y_{1j} = y_j, x_j = z_{1j} (j = 1, ..., n), with the vector x_j exogenous, then we have the nonlinear regression model with infinitely divisible disturbances or stable disturbances. More generally, the estimation problem of structural equations has been discussed under standard moment conditions on the disturbance terms via the generalized method of moments by Hansen (1982) or the estimating-equation method by Godambe (1960), and semi-parametric statistical estimation methods have usually been applied. (See Hayashi (2000) for the details of the standard results in the recent econometrics literature.) By arguments similar to those in Section 2, it should be possible to establish asymptotic results such as Theorems 2.2, 2.3 and 3.1 for the general estimating-equations problem under a set of regularity conditions.

4. Simulation Results

In order to examine the actual performance of our estimation procedure, we have carried out a set of Monte Carlo simulations. In the first experiment we fixed γ = 1 and δ = 0, and simulated 1,000 random numbers from the stable distribution by the method of Chambers, Mallows and Stuck (1976). After some experiments, we imposed the constraints on the empirical characteristic function at the points t = 0.1, 1.0, 2.0, 3.0, 4.0. Using the restrictions at only these five points, we can get relatively accurate estimation results when the true parameter values are (α, β) ∈ (0, 1.8) × [-1, 1]. When (α, β) ∈ [1.8, 2) × [-1, 1], however, we sometimes had slow convergence when we imposed restrictions near the origin, such as at t = 0.1. From our experiments, when we have the fat tails sometimes encountered in empirical studies of returns in financial economics and the true value of α is near 2, it may be enough to use the restrictions on the empirical characteristic function at t = 0.6, 1.0, 2.0, 3.0, 4.0. When γ_0 ≠ 1, δ_0 ≠ 0, it is computationally efficient to use the following iterative procedure.
1. First we obtain preliminary estimates by an estimation method such as McCulloch (1986), and obtain γ̂^(0), δ̂^(0). 2. Apply the empirical likelihood method to the standardized data (x_1 - δ̂^(0))/γ̂^(0), ..., (x_n - δ̂^(0))/γ̂^(0), and set γ̂^(1), δ̂^(1).

3. We set γ̂ = γ̂^(0) γ̂^(1), δ̂ = δ̂^(0) + δ̂^(1) γ̂^(0) as the final estimates of the parameters γ and δ.

Although we set the sample size n = 1000 in our experiments, we can estimate the key parameters satisfactorily even when n ≥ 100 by imposing the restrictions at only 5 points.

Table 1 : Simulation Results for α
We set γ = 1.0 and δ = 0.0 in our simulations. The values of the average, maximum, minimum, and RMSE are calculated from the estimates of each coefficient.
(α, β): (1.95, 0.0), (1.80, 0.0), (1.65, 0.0), (1.50, 0.0), (1.30, 0.0), (1.25, 0.0), (1.00, 0.0), (0.80, 0.0), (1.50, 0.5), (1.10, 0.5), (1.00, 0.5), (0.60, 0.5), (0.50, 0.5)

We repeated our simulations 500 times in each case and calculated the average, the maximum, the minimum, and the RMSE, as reported in Table 1. We then compared the sample variance with the asymptotic variance of the parametric maximum likelihood estimator, which was obtained numerically by DuMouchel (1971) and Nolan (2001). We define the efficiency of our estimator as the ratio of the asymptotic variance, calculated from the inverse of the Fisher information, to the sample variance of the estimator in our simulations. We summarize our numerical results on efficiency in Table 2; there are few extreme cases of low efficiency, and our estimation method gives reasonable values in most cases. When α = 2 and β ≠ 0, there is an identification problem, and

we confirmed that some instability in the numerical computations occurs without restrictions on the parameter space.

Table 2 : Efficiency
We set γ = 1.0 and δ = 0.0 in our simulations. The values in Table 2 are the efficiencies, i.e. the ratios of the asymptotic variance to the sample variance, for each of the coefficients α, β, γ, δ in the simulations.
(α, β): (1.65, 0.0), (1.50, 0.0), (1.30, 0.0), (1.25, 0.0), (1.00, 0.0), (0.80, 0.0), (1.50, 0.5), (1.10, 0.5), (1.00, 0.5), (0.60, 0.5), (0.50, 0.5)

As the second simulation, we examined the actual performance of the empirical likelihood estimation for the regression model with stable disturbance terms, defined by

Y_j = θ_1 X_j + u_j   (j = 1, ..., n),   (4.1)

where θ_1 is the unknown (scalar) coefficient, Y_j is the dependent variable, X_j is the explanatory variable, and u_j is the disturbance term with a stable distribution. We set the (true) parameter values β = 0, γ = 1, δ = 0 in the class of stable distributions and simulated {X_j} as a sequence of i.i.d. random variables following the log-normal distribution LN(0, 1). We repeated our simulations 500 times for the sample size n (= 3,000) with the true parameter value θ_1 = 1.0, and calculated the average, the maximum, the minimum, and the RMSE in Table 3. When α = 1.5 we also calculated the standard least squares estimator of the coefficient parameter θ_1. Its average and RMSE were 1.005 and 0.056,

respectively, while the maximum and the minimum were 1.4499 and ..., respectively. It seems that the RMSE of the least squares estimator is more than twice the RMSE of the MEL estimator when 1 < α < 2 in our simulations. In addition to this favorable result for our estimation method, the least squares estimation often failed when 0 < α < 1 in our limited experiments. On the other hand, we did not have any convergence problem in the MEL estimation as long as the data size was large enough. The MEL estimation procedure for the regression model with stable disturbances has reasonable performance in all cases of our simulation.

Table 3: Simulation Results for Regression. We set β = δ = 0.0 in our simulations; the estimated parameter vector is (θ, α, γ). The values of the average, RMSE, maximum, and minimum of the MEL estimates are calculated from the estimates of each coefficient, for (α, γ, θ) = (0.6, 1.0, 1.0) and for (α, γ, θ) = (1.5, 1.0, 1.0). [The numerical entries of the table are not recoverable from this transcription.]

5. An empirical example of the stock index returns in Japan

As an empirical example, we discuss an analysis of the stock index returns R_i (i = 1, ..., n) in Japan. We have applied the stable distribution, the CGMY process, and also AR models with stable disturbances and with CGMY disturbances. In our analysis we used the daily data of TOPIX, the major stock index in Tokyo, from March 20, 1990 to August 6, ... In each case we took m = 7 as the number of restrictions on the empirical characteristic function. First, we estimated the stable distribution with four parameters and found a significant departure of the distribution of log-returns from the Gaussian distribution. The estimated parameters and their standard errors are α̂ = 1.6747 (0.0292),

γ̂ = ... (0.02), β̂ = ... (0.0880), δ̂ = ... (0.0223). [The point estimates are not recoverable from this transcription; the figures in parentheses are the standard errors.] This agrees with a widespread observation on the distributions of daily returns in stock markets. Second, we estimated the CGMY process, which includes some stable distributions as special cases. The estimated parameters and their standard errors are Ĉ = ... (0.60), Ĝ = ... (0.84), M̂ = ... (0.843), Ŷ = ... (0.2270). All estimated parameters of the CGMY process are statistically significant, and the formulation of stable distributions for the stock index returns, as well as the Gaussian distribution, may not be appropriate because the estimates of G and M are quite different from 0. Since there are non-negligible autocorrelations in the TOPIX returns data, we fitted the second-order AR model whose disturbances follow the CGMY process with the restriction G = M. (We examined several different AR models, and the AR(2) model was chosen by minimizing the resulting empirical AIC.) The estimated parameters with the restriction and their standard errors are

R_i = .023 R_{i−1} − .229 R_{i−2} + u_i,   (5.1)
      (0.065)        (0.065)

and Ĉ = ... (0.40), Ĝ = ... (0.820), Ŷ = 1.242 (0.258). The resulting estimates are fully consistent with our observations. However, we found that when we use finer intervals for the stock returns, the estimate of α becomes smaller in magnitude, for instance. This contradicts the class of stable distributions for stock returns. Although there have been some empirical findings that the stable distribution may be appropriate for stock returns data, it seems that the CGMY model may be better in some sense. Our empirical investigation suggests that the CGMY process is a good candidate for applications. Our empirical analysis here, however, is still at a preliminary stage, and a fuller investigation of the related problems will be needed.

6. Conclusions

This paper first develops a new parameter estimation method for stable distributions based on the empirical likelihood approach.
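The simulations in Sections 3 and 4 and the stable fit above require draws from stable laws, whose densities have no closed form; the Chambers–Mallows–Stuck transformation of a uniform angle and an exponential variate gives an exact sampler. A minimal numpy sketch (illustrative, not the code used in the paper; the function name `rstable` and the choice of the S1 parametrization with α ≠ 1 are our assumptions):

```python
import numpy as np

def rstable(alpha, beta, gamma, delta, size, rng):
    """Chambers-Mallows-Stuck draws from a stable law S(alpha, beta, gamma, delta),
    S1 parametrization, alpha != 1 assumed."""
    U = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform angle on (-pi/2, pi/2)
    W = rng.exponential(1.0, size)                 # unit exponential variate
    zeta = -beta * np.tan(np.pi * alpha / 2)
    xi = np.arctan(-zeta) / alpha
    Z = ((1 + zeta ** 2) ** (1 / (2 * alpha))
         * np.sin(alpha * (U + xi)) / np.cos(U) ** (1 / alpha)
         * (np.cos(U - alpha * (U + xi)) / W) ** ((1 - alpha) / alpha))
    return gamma * Z + delta                       # scale and shift
```

For α = 2 the construction reduces to 2 sin(U)√W, a N(0, 2γ²) draw, which gives a quick sanity check of the implementation.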
We have shown that the empirical likelihood approach can be applied to the estimation problem of stable distributions, and that the computational burden is not heavy in comparison with the parametric maximum likelihood estimation. The maximum empirical likelihood (MEL) estimator

for the parameters of stable distributions has desirable asymptotic properties: it is consistent, asymptotically normal, and asymptotically efficient when the number of restrictions is large. It is also possible to develop empirical likelihood ratio statistics for the parameters of stable distributions, which have desirable asymptotic properties such as an asymptotic χ² distribution and an asymptotic normal distribution. In addition, we can construct a test procedure for the null hypothesis that the imposed restrictions hold, which is the same as the test of overidentifying conditions in the econometrics literature. The testing procedures based on the empirical likelihood ratio statistics can be used in these situations.

Second, it is rather straightforward to extend our estimation method from the unknown parameters of stable distributions to the estimation of general Lévy processes and infinitely divisible distributions. It is also directly possible to apply our approach to nonlinear regression and estimating equation problems with infinitely divisible disturbance terms. We have shown that it is possible to estimate both the parameters of the equations and the parameters of the distributions of the disturbances at the same time by our method. It does not seem easy to solve this estimation problem by the conventional methods proposed in past studies, and in this sense our estimation method has some advantage over other methods.

Finally, we should mention that our estimation method is simple enough that the results can be extended in several directions. One obvious direction is to extend our method to multivariate infinitely divisible distributions, and it is straightforward to do so for the class of symmetric stable distributions. Although we have assumed that X_k (k = 1, ..., n) are a sequence of i.i.d. random variables in this paper, there are many interesting applications in which they are dependent.

7. Proof of Theorems

In this mathematical appendix we give the proofs of Theorem 2.2 and Theorem 2.3 in Section 2. We first show the consistency of the MEL estimator, and then prove its asymptotic normality and asymptotic efficiency when m is large. We set m = m_n = [n^{1/2−ε}] (0 < ε < 1/2). In our proofs we take a compact set Θ for the parameter space, and we assume that the vector of true parameters θ₀ is in Int(Θ) and that each element of A_m(θ) and B_m(θ) is bounded. We first prepare two lemmas (Lemma 1 and Lemma 2), which are needed for the

21 proof of Theorems. Lemma : For a sufficiently large K (> 0), as n λ min [Σ n (θ 0 )] > 0, (7.) where λ min [ ] is the minimum characteristic root of the 2m 2m matrix Σ n (θ 0 )= K2 m E θ 0 [ g(x, θ 0 )g(x, θ 0 ) ]. (7.2) Proof of Lemma : Consider the L 2 -class of functions c(x) such that 0 < c(x)2 dx < and define the Fourier tansform by c (t) = c(x)e itx dx. 2π Then we take c (t l ) for t l = Kl/m (l = m,,,,,m) such that 0 < ml= m c (t m l ) 2 <. We also define a 2m complex vector by C m =[c (t ),,c (t m ),c ( t ),,c ( t m )] and g(x, θ) =[e itx φ θ (t ),,e itmx φ θ (t m ),e itx φ θ ( t ),,e itmx φ θ ( t m )]. Then K 2 m C me 2 θ0 [ g(x, θ0 ) g(x, θ 0 ) ] C m = K2 m m E 2 θ 0 c (t l )c (t l )(e itlx φ θ0 (t l ))(e it l X φ θ0 (t l )) = K2 m 2 l,l = m,l 0,l 0 m l,l = m,l 0,l 0 K K K K [c (t l )c (t l )(φ θ0 (t l + t l ) φ θ0 (t l )φ θ0 (t l ))] {φ θ0 (s + t) φ θ0 (s)φ θ0 (t)}c (s)c (t)dsdt (7.3) as m = m n (n ). For sufficiently large K, A = = K K t x x K K [ 2π φ θ0 (s + t)c (s)c (t)dsdt s c(x) 2 f(x)dx, ] e i(s+t)x φ θ0 (s + t)ds e itx c(x)c (t)dxdt 2

and

B = [ ∫_{−K}^{K} φ_{θ₀}(s) c*(s) ds ]² → [ ∫_x c(x) f(x) dx ]².

We use the relation

( ∫_t a(t) b(t) dt )² = ∫_s a(s)² ds ∫_t b(t)² dt − (1/2) ∫_s ∫_t [ a(s) b(t) − a(t) b(s) ]² ds dt

and set a(s) = c(s)√f(s) and b(t) = √f(t). Then for sufficiently large K,

A − B ≥ (1/2) ∫_x ∫_y [ c(x) − c(y) ]² [ √( f(x) f(y) ) ]² dx dy = (1/2) ∫_{x,y} [ c(x) − c(y) ]² f(x) f(y) dx dy > 0   (7.4)

because c(x) (∈ L²) cannot be constant. Finally, we use the fact that we can choose the complex vectors C_m and g̃(x, θ₀) so that the corresponding real vectors are given by

g(x, θ₀) = [ (1/2)I_m, (1/2)I_m ; (1/2i)I_m, −(1/2i)I_m ] g̃(x, θ₀),   C̄_m = [ (1/2)I_m, (1/2)I_m ; (1/2i)I_m, −(1/2i)I_m ] C_m.   Q.E.D.

Lemma 2: Under the assumptions of Theorem 2.2, we have

lim_{K→∞} J_K(θ₀) = lim_{K→∞} lim_{n→∞} [ B_m(θ₀)′ A_m(θ₀)⁻¹ B_m(θ₀) ]⁻¹ = I(θ₀)⁻¹   (7.5)

as m = m_n → ∞ (n → +∞), where A_m(θ₀) and B_m(θ₀) are defined in Theorem 2.1 at t_l = Kl/m (l = 1, ..., m).

Proof of Lemma 2: We define a 2m × 1 complex vector Φ_θ = [ φ_θ(t_1), ..., φ_θ(t_m), φ_θ(−t_1), ..., φ_θ(−t_m) ]′. Then we find that

B_m(θ₀)′ A_m(θ₀)⁻¹ B_m(θ₀) = ( ∂Φ_{θ₀}*/∂θ ) { E_{θ₀}[ g̃(X, θ₀) g̃(X, θ₀)* ] }⁻¹ ( ∂Φ_{θ₀}/∂θ′ ).

Let

w_θ(t) = (1/2π) ∫ ( ∂ log f_θ(x)/∂θ ) e^{itx} dx,   (7.6)

where f_θ(x) is the density function with the parameter vector θ, and let W_θ be the 2m × 4 matrix W_θ = ( w_θ(t_1), ..., w_θ(t_m), w_θ(−t_1), ..., w_θ(−t_m) )′. Then we have the

convergence

(K/m) W_{θ₀}* ( ∂Φ_{θ₀}/∂θ′ ) = (K/m) Σ_{l=−m, l≠0}^{m} w_{θ₀}(t_l) ( ∂φ_{θ₀}(t_l)/∂θ′ ) → ∫_{−K}^{K} w_{θ₀}(t) ( ∂φ_{θ₀}(t)/∂θ′ ) dt   (7.7)

as n → ∞. Similarly, we have the convergence

(K²/m²) W_{θ₀}* E_{θ₀}[ g̃(X, θ₀) g̃(X, θ₀)* ] W_{θ₀} → ∫_{−K}^{K} ∫_{−K}^{K} { φ_{θ₀}(s + t) − φ_{θ₀}(s) φ_{θ₀}(t) } w_{θ₀}(s) w_{θ₀}(t)′ ds dt   (7.8)

as n → ∞. Hence

lim_{n→∞} ( ∂Φ_{θ₀}*/∂θ ) W_{θ₀} { W_{θ₀}* E_{θ₀}[ g̃(X, θ₀) g̃(X, θ₀)* ] W_{θ₀} }⁻¹ W_{θ₀}* ( ∂Φ_{θ₀}/∂θ′ )
= [ ∫_{−K}^{K} w_{θ₀}(t) ( ∂φ_{θ₀}(t)/∂θ′ ) dt ]′ { ∫_{−K}^{K} ∫_{−K}^{K} { φ_{θ₀}(s + t) − φ_{θ₀}(s) φ_{θ₀}(t) } w_{θ₀}(s) w_{θ₀}(t)′ ds dt }⁻¹ [ ∫_{−K}^{K} w_{θ₀}(t) ( ∂φ_{θ₀}(t)/∂θ′ ) dt ].   (7.9)

We denote the RHS of (7.9) by Ξ_K(θ₀). Then for any (non-degenerate) 2m × 4 matrix v, the quadratic form

( v′ ∂Φ_{θ₀}/∂θ′ )* { v′ E_{θ₀}[ g̃(X, θ₀) g̃(X, θ₀)* ] v }⁻¹ ( v′ ∂Φ_{θ₀}/∂θ′ )   (7.10)

is maximized at v = { E_{θ₀}[ g̃(X, θ₀) g̃(X, θ₀)* ] }⁻¹ ( ∂Φ_{θ₀}/∂θ′ ), and the maximum value is ( ∂Φ_{θ₀}*/∂θ ) { E_{θ₀}[ g̃(X, θ₀) g̃(X, θ₀)* ] }⁻¹ ( ∂Φ_{θ₀}/∂θ′ ). It is well known that the asymptotic efficiency bound is given by I(θ₀)⁻¹, provided that it is non-singular. Thus for any 4 × 1 non-zero vector u,

u′ I(θ₀) u ≥ u′ ( ∂Φ_{θ₀}*/∂θ ) { E_{θ₀}[ g̃(X, θ₀) g̃(X, θ₀)* ] }⁻¹ ( ∂Φ_{θ₀}/∂θ′ ) u ≥ u′ ( ∂Φ_{θ₀}*/∂θ ) W_{θ₀} { W_{θ₀}* E_{θ₀}[ g̃(X, θ₀) g̃(X, θ₀)* ] W_{θ₀} }⁻¹ W_{θ₀}* ( ∂Φ_{θ₀}/∂θ′ ) u,   (7.11)

and we have lim_{K→∞} Ξ_K(θ₀) = I(θ₀). By using the same arguments as those developed by Feuerverger and McDunnough (1981a) for the information matrix, we obtain the desired result. Q.E.D.

Proof of Theorem 2.2: [i] Consistency: We take a sufficiently large K (> 0) and set m = m_n = [n^{1/2−ε}] (0 < ε < 1/2). We use the fact that

(K/m) ‖ E_{θ₀}[ g(X, θ) ] ‖² = ∫_0^K [ ( φ^R_{θ₀}(t) − φ^R_θ(t) )² + ( φ^I_{θ₀}(t) − φ^I_θ(t) )² ] dt + o(1),   (7.12)

where the first term of the right-hand side is denoted by Γ²(K, θ₀, θ). Define a criterion function

G_n(θ) = (1/n) Σ_{k=1}^{n} log[ 1 + λ(θ)′ g(X_k, θ) ]   (7.13)

and a function

u(θ) = (K/m) E_{θ₀}[ g(X, θ) ] / ( 1 + (K/m) ‖ E_{θ₀}[ g(X, θ) ] ‖² ).   (7.14)

For any δ > 0 we take a neighborhood N(θ₀, δ), and then

sup_{θ ∈ Θ\N(θ₀,δ)} | (K/m) E_{θ₀}[ u(θ)′ g(X, θ) ] − Γ²(K, θ₀, θ) / ( 1 + Γ²(K, θ₀, θ) ) | = o(1).   (7.15)

We set β = 1/2; then, by using Taylor's Theorem, there exists a t ∈ (0, 1) such that

sup_{θ ∈ Θ} | (K/m) n^{β} (1/n) Σ_{k=1}^{n} log[ 1 + n^{−β} u(θ)′ g(X_k, θ) ] − (K/m) (1/n) Σ_{k=1}^{n} u(θ)′ g(X_k, θ) | ≤ sup_{θ ∈ Θ} (K/m) (1/n) Σ_{k=1}^{n} n^{−β} ( u(θ)′ g(X_k, θ) )² / ( 2 [ 1 + t n^{−β} u(θ)′ g(X_k, θ) ]² ) = o(1)

because | n^{−β} u(θ)′ g(X_k, θ) | ≤ 2 n^{−β} √(2m) → 0 and m^{1/2} [ n^{−β} u(θ)′ g(X_k, θ) ]² → 0 as n → ∞. For any ε₁ > 0,

P( sup_{θ ∈ Θ} | (K/m) (1/n) Σ_{k=1}^{n} u(θ)′ g(X_k, θ) − (K/m) E_{θ₀}[ u(θ)′ g(X, θ) ] | > ε₁ ) ≤ (1/ε₁²) E_{θ₀} sup_{θ ∈ Θ} | (K/m) (1/n) Σ_{k=1}^{n} u(θ)′ ( g(X_k, θ) − E_{θ₀}[ g(X, θ) ] ) |² = o(1).
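The moment functions g(X_k, θ) entering the criterion G_n(θ) stack the real and imaginary parts of the empirical characteristic function restrictions at the grid points t_l = Kl/m. A minimal numpy sketch (illustrative only; the S1 form of the stable characteristic function with α ≠ 1 and the function names are our assumptions, not the authors' code):

```python
import numpy as np

def stable_cf(t, alpha, beta, gamma, delta):
    """Stable characteristic function, one common (S1) parametrization, alpha != 1."""
    return np.exp(1j * delta * t
                  - (gamma * np.abs(t)) ** alpha
                  * (1 - 1j * beta * np.sign(t) * np.tan(np.pi * alpha / 2)))

def moment_functions(x, theta, K=1.0, m=5):
    """g(x, theta): real and imaginary ECF restrictions at t_l = K l / m, l = 1..m."""
    alpha, beta, gamma, delta = theta
    t = K * np.arange(1, m + 1) / m
    phi = stable_cf(t, alpha, beta, gamma, delta)
    d = np.exp(1j * np.outer(x, t)) - phi        # e^{i t_l X_k} - phi_theta(t_l)
    return np.concatenate([d.real, d.imag], axis=1)  # shape (n, 2m)
```

At the true parameter value, each column of g has mean zero, which is the defining restriction exploited by the MEL estimator.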

Then, by using (7.12) and (7.15), for any δ > 0 there exist n₀ and H(δ) > 0 such that for all n ≥ n₀,

P( inf_{θ ∈ Θ\N(θ₀,δ)} n^{β} (K/m) (1/n) Σ_{k=1}^{n} log[ 1 + n^{−β} u(θ)′ g(X_k, θ) ] < H(δ) ) < δ/2.   (7.16)

Since

(1/n) Σ_{k=1}^{n} log[ 1 + n^{−β} u(θ)′ g(X_k, θ) ] ≤ G_n(θ)   for each θ ∈ Θ\N(θ₀,δ),

we have

P( (K/m) n^{β} inf_{θ ∈ Θ\N(θ₀,δ)} G_n(θ) < H(δ) ) < δ/2.   (7.17)

Now we investigate the stochastic order of the Lagrange multiplier at the true value, which is the solution of

(1/n) Σ_{k=1}^{n} g(X_k, θ₀) / [ 1 + λ(θ₀)′ g(X_k, θ₀) ] = 0,   (7.18)

and we write λ(θ₀) = ‖λ(θ₀)‖ ξ, where ξ is a 2m × 1 unit vector. Multiplying (7.18) by ξ′ from the left,

0 = (1/n) Σ_{k=1}^{n} ξ′ g(X_k, θ₀) − ‖λ(θ₀)‖ (1/n) Σ_{k=1}^{n} ( ξ′ g(X_k, θ₀) )² / [ 1 + λ(θ₀)′ g(X_k, θ₀) ],

and we have the inequality

‖λ(θ₀)‖ (1/n) Σ_{k=1}^{n} ( ξ′ g(X_k, θ₀) )² / [ 1 + ‖λ(θ₀)‖ max_{1≤k≤n} ‖ g(X_k, θ₀) ‖ ] ≤ (1/n) Σ_{k=1}^{n} ξ′ g(X_k, θ₀).

Since

E_{θ₀}[ ( (1/n) Σ_{k=1}^{n} ξ′ g(X_k, θ₀) )² ] = (1/n²) E_{θ₀}[ ( Σ_{k=1}^{n} ξ′ g(X_k, θ₀) )² ] = O( m/n ),

and by the use of Lemma 1,

‖λ(θ₀)‖ = O_p( √(m/n) )   (7.19)

and

G_n(θ₀) ≤ (1/n) Σ_{k=1}^{n} λ(θ₀)′ g(X_k, θ₀) = ‖λ(θ₀)‖ (1/n) Σ_{k=1}^{n} ξ′ g(X_k, θ₀),   (7.20)

which is of the order O_p( m^{1/2} n^{−1/2} ) O_p( √(m/n) ) = O_p( m n^{−1} ). Hence for any δ > 0 there exist n₁ and H₁(δ) > 0 such that for all n ≥ n₁,

P( (K/m) n^{β} G_n(θ₀) ≥ H₁(δ) ) < δ/2.   (7.21)

Thus, by combining (7.17) and (7.21), we have P( θ̂_n ∉ N(θ₀, δ) ) < δ, and it implies that θ̂_n →_p θ₀ as n → +∞.

[ii] Asymptotic Normality: We consider the first-order condition of the criterion function, ∂G_n(θ̂_n)/∂θ = 0. By expanding ∂G_n(θ₀)/∂θ around θ̂_n, we have

√n ∂G_n(θ₀)/∂θ = − ( ∂²G_n(θ_n*)/∂θ∂θ′ ) √n ( θ̂_n − θ₀ ),   (7.22)

where θ_n* satisfies ‖ θ_n* − θ̂_n ‖ ≤ ‖ θ̂_n − θ₀ ‖. In order to show the asymptotic normality of the random vector √n ∂G_n(θ₀)/∂θ, we write

√n ∂G_n(θ₀)/∂θ = (1/√n) Σ_{k=1}^{n} ( ∂g(X_k, θ₀)′/∂θ ) λ(θ₀) / [ 1 + λ(θ₀)′ g(X_k, θ₀) ] = − (1/√n) Σ_{k=1}^{n} ( ∂Φ_{θ₀}′/∂θ ) λ(θ₀) / [ 1 + λ(θ₀)′ g(X_k, θ₀) ].   (7.23)

By rewriting (7.18) with respect to the Lagrange multipliers, we have

0 = (1/n) Σ_{k=1}^{n} (K²/m) g(X_k, θ₀) − [ (1/n) Σ_{k=1}^{n} (K²/m) g(X_k, θ₀) g(X_k, θ₀)′ ] λ(θ₀) + r_n,   (7.24)

where we set the remainder term

r_n = (1/n) Σ_{k=1}^{n} (K²/m) g(X_k, θ₀) ( λ(θ₀)′ g(X_k, θ₀) )² / [ 1 + λ(θ₀)′ g(X_k, θ₀) ].
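In practice the Lagrange multiplier λ(θ) is obtained numerically at each trial θ from its defining equation (1/n) Σ_k g(X_k, θ)/[1 + λ′g(X_k, θ)] = 0, i.e. by maximizing the concave function λ ↦ (1/n) Σ_k log[1 + λ′g(X_k, θ)]. A damped-Newton sketch in numpy (illustrative; `el_lambda` is our name, not the authors' code):

```python
import numpy as np

def el_lambda(g, iters=50):
    """Solve (1/n) sum_k g_k / (1 + lam'g_k) = 0 by damped Newton steps.
    g: (n, q) array of moment functions evaluated at the data."""
    n, q = g.shape
    lam = np.zeros(q)
    for _ in range(iters):
        w = 1.0 + g @ lam                              # 1 + lam'g_k, must stay > 0
        grad = (g / w[:, None]).mean(axis=0)           # first-order condition
        hess = -(g[:, :, None] * g[:, None, :]
                 / (w ** 2)[:, None, None]).mean(axis=0)  # negative definite
        step = np.linalg.solve(hess, grad)
        t = 1.0
        while np.any(1.0 + g @ (lam - t * step) <= 1e-10):
            t *= 0.5                                   # damp to keep weights positive
        lam = lam - t * step
    return lam
```

At the solution the identity (1/n) Σ_k 1/(1 + λ′g_k) = 1 holds, which gives a useful numerical check on convergence.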

By using Lemma 1 and (7.19), we find that r_n is of the order O_p( n^{−1} m^{1/2} ). Since we have the condition m_n/n → 0, we can approximate the random vector √n ∂G_n(θ₀)/∂θ as

√n ∂G_n(θ₀)/∂θ = − (1/√n) Σ_{k=1}^{n} ( (K/√m) ∂Φ_{θ₀}′/∂θ ) S_n(θ₀)⁻¹ (K/√m) g(X_k, θ₀) + o_p(1),   (7.25)

where S_n(θ) = (1/n) Σ_{k=1}^{n} (K²/m) g(X_k, θ) g(X_k, θ)′. In order to show the asymptotic normality of (7.25), for any constant vector ζ ∈ R⁴ we define a set of random variables X_{nk} (k = 1, ..., n) by

X_{nk} = ζ′ ( (K/√m) ∂Φ_{θ₀}′/∂θ ) [ Σ_n(θ₀) ]⁻¹ (K/√m) g(X_k, θ₀).

Then, by using the Lindeberg central limit theorem,

(1/√n) Σ_{k=1}^{n} X_{nk} →_d N( 0, ζ′ I_K(θ₀) ζ ),   (7.26)

where I_K(θ₀) ( = J_K(θ₀)⁻¹ ) = lim_{n→∞} [ B_m(θ₀)′ A_m(θ₀)⁻¹ B_m(θ₀) ] (see Lemma 2). For an arbitrary vector ζ, we consider the stochastic order of

r_{2n} = (1/√n) Σ_{k=1}^{n} ( (K/√m) ∂Φ_{θ₀}′/∂θ ) ( S_n(θ₀)⁻¹ − Σ_n(θ₀)⁻¹ ) (K/√m) g(X_k, θ₀).   (7.27)

Then we need to evaluate ‖ S_n(θ₀)⁻¹ − Σ_n(θ₀)⁻¹ ‖ ≤ ‖ S_n(θ₀)⁻¹ ‖ ‖ S_n(θ₀) − Σ_n(θ₀) ‖ ‖ Σ_n(θ₀)⁻¹ ‖. For this purpose we take δ₁ = ε/2 (> 0); then ‖ S_n(θ₀) − Σ_n(θ₀) ‖ = o_p( n^{−1/2+δ₁} ), and there exists an η (> 0) such that ‖ Σ_n(θ₀)⁻¹ ‖ ≤ η m^{1/2} by Lemma 1. Hence we have ‖ S_n(θ₀)⁻¹ ‖ = ‖ { Σ_n(θ₀) + [ S_n(θ₀) − Σ_n(θ₀) ] }⁻¹ ‖ = O_p( m^{1/2} ) and ‖ r_{2n} ‖ ≤ O_p( m n^{−(1/2−δ₁)} ). Then r_{2n} is o_p(1) since we have set m_n = [ n^{1/2−ε} ] (0 < ε < 1/2). By gathering these evaluations, √n ∂G_n(θ₀)/∂θ converges to the normal distribution with the variance-covariance matrix I_K(θ₀) ( = J_K(θ₀)⁻¹ ) as n → +∞. Next, for the second derivatives we shall show that

∂²G_n(θ_n*)/∂θ∂θ′ = I_K(θ₀) + o_p(1).   (7.28)
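The matrix I_K(θ₀) = lim_n B_m′A_m⁻¹B_m and its inverse J_K(θ₀), the asymptotic covariance of the MEL estimator, have obvious sample analogues once the moment functions and the Jacobian of the characteristic-function restrictions are evaluated at θ̂_n. A minimal numpy sketch (illustrative; `mel_avar` and the argument layout are our assumptions, not the authors' code):

```python
import numpy as np

def mel_avar(g, dphi):
    """Sample analogue of the asymptotic covariance (B' A^{-1} B)^{-1}.
    g:    (n, q) moment functions evaluated at theta_hat,
    dphi: (q, p) Jacobian of the moment restrictions in theta."""
    A = g.T @ g / g.shape[0]                       # estimate of A_m = E[g g']
    return np.linalg.inv(dphi.T @ np.linalg.solve(A, dphi))
```

With nearly uncorrelated, unit-variance moment functions this reduces to (B′B)⁻¹, which gives a quick sanity check.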

We need to consider each term of the second derivatives and evaluate their stochastic orders. The terms are given by

∂²G_n(θ)/∂θ∂θ′ = − (1/n) Σ_{k=1}^{n} ( ∂Φ_θ′/∂θ ) λ(θ) λ(θ)′ ( ∂Φ_θ/∂θ′ ) / [ 1 + λ(θ)′ g(X_k, θ) ]²
+ (1/n) Σ_{k=1}^{n} ( ∂Φ_θ′/∂θ ) λ(θ) g(X_k, θ)′ ( ∂λ(θ)/∂θ′ ) / [ 1 + λ(θ)′ g(X_k, θ) ]²
− (1/n) Σ_{k=1}^{n} ( ∂Φ_θ′/∂θ ) ( ∂λ(θ)/∂θ′ ) / [ 1 + λ(θ)′ g(X_k, θ) ]
+ (1/n) Σ_{k=1}^{n} Σ_{l=1}^{2m} λ_l(θ) ( ∂²g_l(X_k, θ)/∂θ∂θ′ ) / [ 1 + λ(θ)′ g(X_k, θ) ].   (7.29)

There are some complications, partly because the second and third terms involve the derivatives of the Lagrange multipliers. By differentiating (1/n) Σ_{k=1}^{n} g(X_k, θ)/[ 1 + λ(θ)′ g(X_k, θ) ] = 0 with respect to θ′, we have the relation

[ (1/n) Σ_{k=1}^{n} (K²/m) g(X_k, θ) g(X_k, θ)′ / [ 1 + λ(θ)′ g(X_k, θ) ]² ] ( ∂λ(θ)/∂θ′ )
= − (1/n) Σ_{k=1}^{n} (K²/m) ( ∂Φ_θ/∂θ′ ) / [ 1 + λ(θ)′ g(X_k, θ) ] + (1/n) Σ_{k=1}^{n} (K²/m) g(X_k, θ) λ(θ)′ ( ∂Φ_θ/∂θ′ ) / [ 1 + λ(θ)′ g(X_k, θ) ]².   (7.30)

For the last term of (7.30), we find that

‖ (1/n) Σ_{k=1}^{n} g(X_k, θ) λ(θ)′ ( ∂Φ_θ/∂θ′ ) / [ 1 + λ(θ)′ g(X_k, θ) ]² ‖ ≤ ‖ ∂Φ_θ/∂θ′ ‖ max_{1≤k≤n} [ 1 + λ(θ)′ g(X_k, θ) ]⁻² ‖ λ(θ) ‖ max_{1≤k≤n} ‖ g(X_k, θ) ‖

is of the order o_p(1). Hence the second term of (7.29) is of the order o_p(1). Similarly, the first term and the fourth term of (7.29) are dominated by

max_{1≤k≤n} [ 1 + λ(θ)′ g(X_k, θ) ]⁻² ‖ λ(θ) ‖² ‖ ∂Φ_θ/∂θ′ ‖²   and   max_{1≤k≤n} [ 1 + λ(θ)′ g(X_k, θ) ]⁻¹ ‖ λ(θ) ‖ Σ_{l=1}^{2m} ‖ ∂²g_l(X_k, θ)/∂θ∂θ′ ‖,

respectively. By evaluating the stochastic orders of these terms as before, we find that they are of the order o_p(1).

As the dominant term, we need to evaluate the third term of (7.29). After straightforward (but tedious) evaluations based on (7.30), it is possible to show that

‖ ( ∂Φ_θ′/∂θ ) [ (1/n) Σ_{k=1}^{n} (K²/m) g(X_k, θ) g(X_k, θ)′ / [ 1 + λ(θ)′ g(X_k, θ) ]² ]⁻¹ (1/n) Σ_{k=1}^{n} (K²/m) g(X_k, θ) λ(θ)′ ( ∂Φ_θ/∂θ′ ) / [ 1 + λ(θ)′ g(X_k, θ) ]² ‖ = o_p(1).

Then for the third term of (7.29), we find

‖ (1/n) Σ_{k=1}^{n} ( ∂Φ_θ′/∂θ ) ( ∂λ(θ)/∂θ′ ) / [ 1 + λ(θ)′ g(X_k, θ) ] + I_K(θ₀) ‖ = o_p(1)   (7.31)

and I_K(θ₀) = J_K(θ₀)⁻¹. By using (7.26) and (7.28), we have established the asymptotic normality

√n ( θ̂_n − θ₀ ) →_d N( 0, J_K(θ₀) ).   (7.32)

Finally, by taking a sufficiently large K > 0 and using Lemma 2, we have lim_{K→∞} J_K(θ₀) = I(θ₀)⁻¹, which is positive definite. Q.E.D.

Proof of Theorem 2.3: [i]: The first part of the proof of Theorem 2.3 is similar to the proofs of the testing hypothesis problems given by Owen (1990) and Qin and Lawless (1994), except for the fact that the number of restrictions m = m_n increases as n → ∞. The precise evaluations of the stochastic orders in our derivations are straightforward as in the proof of Theorem 2.2. Let Y_k(θ) = λ(θ)′ g(X_k, θ) (k = 1, ..., n). Then the criterion function G_n(θ̂_n) at the MEL estimate can be rewritten as

G_n(θ̂_n) = (1/n) Σ_{k=1}^{n} [ Y_k(θ̂_n) − (1/2) Y_k(θ̂_n)² + (1/3) Y_k(θ*)³ ],   (7.33)

where θ* satisfies ‖ θ* − θ̂_n ‖ ≤ ‖ θ̂_n − θ₀ ‖. We first approximate Y_k(θ̂_n) by the sum of four terms as

λ(θ₀)′ g(X_k, θ₀) + ( θ̂_n − θ₀ )′ ( ∂λ(θ₀)′/∂θ ) g(X_k, θ₀) − λ(θ₀)′ ( ∂Φ_{θ₀}/∂θ′ ) ( θ̂_n − θ₀ ) − ( θ̂_n − θ₀ )′ ( ∂λ(θ₀)′/∂θ ) ( ∂Φ_{θ₀}/∂θ′ ) ( θ̂_n − θ₀ )


Chapter 4. Theory of Tests. 4.1 Introduction Chapter 4 Theory of Tests 4.1 Introduction Parametric model: (X, B X, P θ ), P θ P = {P θ θ Θ} where Θ = H 0 +H 1 X = K +A : K: critical region = rejection region / A: acceptance region A decision rule

More information

For iid Y i the stronger conclusion holds; for our heuristics ignore differences between these notions.

For iid Y i the stronger conclusion holds; for our heuristics ignore differences between these notions. Large Sample Theory Study approximate behaviour of ˆθ by studying the function U. Notice U is sum of independent random variables. Theorem: If Y 1, Y 2,... are iid with mean µ then Yi n µ Called law of

More information

Exercises and Answers to Chapter 1

Exercises and Answers to Chapter 1 Exercises and Answers to Chapter The continuous type of random variable X has the following density function: a x, if < x < a, f (x), otherwise. Answer the following questions. () Find a. () Obtain mean

More information

Testing Restrictions and Comparing Models

Testing Restrictions and Comparing Models Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by

More information

AN EMPIRICAL LIKELIHOOD RATIO TEST FOR NORMALITY

AN EMPIRICAL LIKELIHOOD RATIO TEST FOR NORMALITY Econometrics Working Paper EWP0401 ISSN 1485-6441 Department of Economics AN EMPIRICAL LIKELIHOOD RATIO TEST FOR NORMALITY Lauren Bin Dong & David E. A. Giles Department of Economics, University of Victoria

More information

Nonlinear GMM. Eric Zivot. Winter, 2013

Nonlinear GMM. Eric Zivot. Winter, 2013 Nonlinear GMM Eric Zivot Winter, 2013 Nonlinear GMM estimation occurs when the GMM moment conditions g(w θ) arenonlinearfunctionsofthe model parameters θ The moment conditions g(w θ) may be nonlinear functions

More information

Model Selection and Geometry

Model Selection and Geometry Model Selection and Geometry Pascal Massart Université Paris-Sud, Orsay Leipzig, February Purpose of the talk! Concentration of measure plays a fundamental role in the theory of model selection! Model

More information

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 18.466 Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 1. MLEs in exponential families Let f(x,θ) for x X and θ Θ be a likelihood function, that is, for present purposes,

More information

The BLP Method of Demand Curve Estimation in Industrial Organization

The BLP Method of Demand Curve Estimation in Industrial Organization The BLP Method of Demand Curve Estimation in Industrial Organization 9 March 2006 Eric Rasmusen 1 IDEAS USED 1. Instrumental variables. We use instruments to correct for the endogeneity of prices, the

More information

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Takeshi Emura and Hisayuki Tsukuma Abstract For testing the regression parameter in multivariate

More information

Introduction to Estimation Methods for Time Series models Lecture 2

Introduction to Estimation Methods for Time Series models Lecture 2 Introduction to Estimation Methods for Time Series models Lecture 2 Fulvio Corsi SNS Pisa Fulvio Corsi Introduction to Estimation () Methods for Time Series models Lecture 2 SNS Pisa 1 / 21 Estimators:

More information

Stable Process. 2. Multivariate Stable Distributions. July, 2006

Stable Process. 2. Multivariate Stable Distributions. July, 2006 Stable Process 2. Multivariate Stable Distributions July, 2006 1. Stable random vectors. 2. Characteristic functions. 3. Strictly stable and symmetric stable random vectors. 4. Sub-Gaussian random vectors.

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)

More information

An Information Criteria for Order-restricted Inference

An Information Criteria for Order-restricted Inference An Information Criteria for Order-restricted Inference Nan Lin a, Tianqing Liu 2,b, and Baoxue Zhang,2,b a Department of Mathematics, Washington University in Saint Louis, Saint Louis, MO 633, U.S.A. b

More information

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Yingying Dong and Arthur Lewbel California State University Fullerton and Boston College July 2010 Abstract

More information

Chapter 3: Maximum Likelihood Theory

Chapter 3: Maximum Likelihood Theory Chapter 3: Maximum Likelihood Theory Florian Pelgrin HEC September-December, 2010 Florian Pelgrin (HEC) Maximum Likelihood Theory September-December, 2010 1 / 40 1 Introduction Example 2 Maximum likelihood

More information

Applications of Distance Correlation to Time Series

Applications of Distance Correlation to Time Series Applications of Distance Correlation to Time Series Richard A. Davis, Columbia University Muneya Matsui, Nanzan University Thomas Mikosch, University of Copenhagen Phyllis Wan, Columbia University May

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 6 Jakub Mućk Econometrics of Panel Data Meeting # 6 1 / 36 Outline 1 The First-Difference (FD) estimator 2 Dynamic panel data models 3 The Anderson and Hsiao

More information

Final Examination Statistics 200C. T. Ferguson June 11, 2009

Final Examination Statistics 200C. T. Ferguson June 11, 2009 Final Examination Statistics 00C T. Ferguson June, 009. (a) Define: X n converges in probability to X. (b) Define: X m converges in quadratic mean to X. (c) Show that if X n converges in quadratic mean

More information

(θ θ ), θ θ = 2 L(θ ) θ θ θ θ θ (θ )= H θθ (θ ) 1 d θ (θ )

(θ θ ), θ θ = 2 L(θ ) θ θ θ θ θ (θ )= H θθ (θ ) 1 d θ (θ ) Setting RHS to be zero, 0= (θ )+ 2 L(θ ) (θ θ ), θ θ = 2 L(θ ) 1 (θ )= H θθ (θ ) 1 d θ (θ ) O =0 θ 1 θ 3 θ 2 θ Figure 1: The Newton-Raphson Algorithm where H is the Hessian matrix, d θ is the derivative

More information

Location Multiplicative Error Model. Asymptotic Inference and Empirical Analysis

Location Multiplicative Error Model. Asymptotic Inference and Empirical Analysis : Asymptotic Inference and Empirical Analysis Qian Li Department of Mathematics and Statistics University of Missouri-Kansas City ql35d@mail.umkc.edu October 29, 2015 Outline of Topics Introduction GARCH

More information

Asymptotics for Nonlinear GMM

Asymptotics for Nonlinear GMM Asymptotics for Nonlinear GMM Eric Zivot February 13, 2013 Asymptotic Properties of Nonlinear GMM Under standard regularity conditions (to be discussed later), it can be shown that where ˆθ(Ŵ) θ 0 ³ˆθ(Ŵ)

More information

Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests

Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests Biometrika (2014),,, pp. 1 13 C 2014 Biometrika Trust Printed in Great Britain Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests BY M. ZHOU Department of Statistics, University

More information

Nonlinear Error Correction Model and Multiple-Threshold Cointegration May 23, / 31

Nonlinear Error Correction Model and Multiple-Threshold Cointegration May 23, / 31 Nonlinear Error Correction Model and Multiple-Threshold Cointegration Man Wang Dong Hua University, China Joint work with N.H.Chan May 23, 2014 Nonlinear Error Correction Model and Multiple-Threshold Cointegration

More information

Multivariate Analysis and Likelihood Inference

Multivariate Analysis and Likelihood Inference Multivariate Analysis and Likelihood Inference Outline 1 Joint Distribution of Random Variables 2 Principal Component Analysis (PCA) 3 Multivariate Normal Distribution 4 Likelihood Inference Joint density

More information

Theoretical Statistics. Lecture 17.

Theoretical Statistics. Lecture 17. Theoretical Statistics. Lecture 17. Peter Bartlett 1. Asymptotic normality of Z-estimators: classical conditions. 2. Asymptotic equicontinuity. 1 Recall: Delta method Theorem: Supposeφ : R k R m is differentiable

More information

Specification Test for Instrumental Variables Regression with Many Instruments

Specification Test for Instrumental Variables Regression with Many Instruments Specification Test for Instrumental Variables Regression with Many Instruments Yoonseok Lee and Ryo Okui April 009 Preliminary; comments are welcome Abstract This paper considers specification testing

More information

Instrumental Variables Estimation and Weak-Identification-Robust. Inference Based on a Conditional Quantile Restriction

Instrumental Variables Estimation and Weak-Identification-Robust. Inference Based on a Conditional Quantile Restriction Instrumental Variables Estimation and Weak-Identification-Robust Inference Based on a Conditional Quantile Restriction Vadim Marmer Department of Economics University of British Columbia vadim.marmer@gmail.com

More information

GARCH Models Estimation and Inference. Eduardo Rossi University of Pavia

GARCH Models Estimation and Inference. Eduardo Rossi University of Pavia GARCH Models Estimation and Inference Eduardo Rossi University of Pavia Likelihood function The procedure most often used in estimating θ 0 in ARCH models involves the maximization of a likelihood function

More information

Mathematics Qualifying Examination January 2015 STAT Mathematical Statistics

Mathematics Qualifying Examination January 2015 STAT Mathematical Statistics Mathematics Qualifying Examination January 2015 STAT 52800 - Mathematical Statistics NOTE: Answer all questions completely and justify your derivations and steps. A calculator and statistical tables (normal,

More information

Generalized Autoregressive Score Models

Generalized Autoregressive Score Models Generalized Autoregressive Score Models by: Drew Creal, Siem Jan Koopman, André Lucas To capture the dynamic behavior of univariate and multivariate time series processes, we can allow parameters to be

More information

Optimal Estimation of a Nonsmooth Functional

Optimal Estimation of a Nonsmooth Functional Optimal Estimation of a Nonsmooth Functional T. Tony Cai Department of Statistics The Wharton School University of Pennsylvania http://stat.wharton.upenn.edu/ tcai Joint work with Mark Low 1 Question Suppose

More information

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models University of Illinois Fall 2016 Department of Economics Roger Koenker Economics 536 Lecture 7 Introduction to Specification Testing in Dynamic Econometric Models In this lecture I want to briefly describe

More information

ECON 3150/4150, Spring term Lecture 6

ECON 3150/4150, Spring term Lecture 6 ECON 3150/4150, Spring term 2013. Lecture 6 Review of theoretical statistics for econometric modelling (II) Ragnar Nymoen University of Oslo 31 January 2013 1 / 25 References to Lecture 3 and 6 Lecture

More information

Approximate interval estimation for EPMC for improved linear discriminant rule under high dimensional frame work

Approximate interval estimation for EPMC for improved linear discriminant rule under high dimensional frame work Hiroshima Statistical Research Group: Technical Report Approximate interval estimation for PMC for improved linear discriminant rule under high dimensional frame work Masashi Hyodo, Tomohiro Mitani, Tetsuto

More information

Density estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas

Density estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas 0 0 5 Motivation: Regression discontinuity (Angrist&Pischke) Outcome.5 1 1.5 A. Linear E[Y 0i X i] 0.2.4.6.8 1 X Outcome.5 1 1.5 B. Nonlinear E[Y 0i X i] i 0.2.4.6.8 1 X utcome.5 1 1.5 C. Nonlinearity

More information

Econometrics I, Estimation

Econometrics I, Estimation Econometrics I, Estimation Department of Economics Stanford University September, 2008 Part I Parameter, Estimator, Estimate A parametric is a feature of the population. An estimator is a function of the

More information

Birkbeck Working Papers in Economics & Finance

Birkbeck Working Papers in Economics & Finance ISSN 1745-8587 Birkbeck Working Papers in Economics & Finance Department of Economics, Mathematics and Statistics BWPEF 1809 A Note on Specification Testing in Some Structural Regression Models Walter

More information

A strong consistency proof for heteroscedasticity and autocorrelation consistent covariance matrix estimators

A strong consistency proof for heteroscedasticity and autocorrelation consistent covariance matrix estimators A strong consistency proof for heteroscedasticity and autocorrelation consistent covariance matrix estimators Robert M. de Jong Department of Economics Michigan State University 215 Marshall Hall East

More information

Kriging models with Gaussian processes - covariance function estimation and impact of spatial sampling

Kriging models with Gaussian processes - covariance function estimation and impact of spatial sampling Kriging models with Gaussian processes - covariance function estimation and impact of spatial sampling François Bachoc former PhD advisor: Josselin Garnier former CEA advisor: Jean-Marc Martinez Department

More information

Introduction to the Mathematical and Statistical Foundations of Econometrics Herman J. Bierens Pennsylvania State University

Introduction to the Mathematical and Statistical Foundations of Econometrics Herman J. Bierens Pennsylvania State University Introduction to the Mathematical and Statistical Foundations of Econometrics 1 Herman J. Bierens Pennsylvania State University November 13, 2003 Revised: March 15, 2004 2 Contents Preface Chapter 1: Probability

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

The Uniform Weak Law of Large Numbers and the Consistency of M-Estimators of Cross-Section and Time Series Models

The Uniform Weak Law of Large Numbers and the Consistency of M-Estimators of Cross-Section and Time Series Models The Uniform Weak Law of Large Numbers and the Consistency of M-Estimators of Cross-Section and Time Series Models Herman J. Bierens Pennsylvania State University September 16, 2005 1. The uniform weak

More information

Realized Volatility, Covariance and Hedging Coefficient of the Nikkei-225 Futures with Micro-Market Noise

Realized Volatility, Covariance and Hedging Coefficient of the Nikkei-225 Futures with Micro-Market Noise CIRJE-F-601 Realized Volatility, Covariance and Hedging Coefficient of the Nikkei-225 Futures with Micro-Market Noise Naoto Kunitomo University of Tokyo Seisho Sato Institute of Statistical Mathematic

More information

Obtaining Critical Values for Test of Markov Regime Switching

Obtaining Critical Values for Test of Markov Regime Switching University of California, Santa Barbara From the SelectedWorks of Douglas G. Steigerwald November 1, 01 Obtaining Critical Values for Test of Markov Regime Switching Douglas G Steigerwald, University of

More information

A CHARACTERIZATION OF STRICT LOCAL MINIMIZERS OF ORDER ONE FOR STATIC MINMAX PROBLEMS IN THE PARAMETRIC CONSTRAINT CASE

A CHARACTERIZATION OF STRICT LOCAL MINIMIZERS OF ORDER ONE FOR STATIC MINMAX PROBLEMS IN THE PARAMETRIC CONSTRAINT CASE Journal of Applied Analysis Vol. 6, No. 1 (2000), pp. 139 148 A CHARACTERIZATION OF STRICT LOCAL MINIMIZERS OF ORDER ONE FOR STATIC MINMAX PROBLEMS IN THE PARAMETRIC CONSTRAINT CASE A. W. A. TAHA Received

More information

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu

More information

Chapter 7. Confidence Sets Lecture 30: Pivotal quantities and confidence sets

Chapter 7. Confidence Sets Lecture 30: Pivotal quantities and confidence sets Chapter 7. Confidence Sets Lecture 30: Pivotal quantities and confidence sets Confidence sets X: a sample from a population P P. θ = θ(p): a functional from P to Θ R k for a fixed integer k. C(X): a confidence

More information

Seminar über Statistik FS2008: Model Selection

Seminar über Statistik FS2008: Model Selection Seminar über Statistik FS2008: Model Selection Alessia Fenaroli, Ghazale Jazayeri Monday, April 2, 2008 Introduction Model Choice deals with the comparison of models and the selection of a model. It can

More information

STAT 512 sp 2018 Summary Sheet

STAT 512 sp 2018 Summary Sheet STAT 5 sp 08 Summary Sheet Karl B. Gregory Spring 08. Transformations of a random variable Let X be a rv with support X and let g be a function mapping X to Y with inverse mapping g (A = {x X : g(x A}

More information

Graduate Econometrics I: Maximum Likelihood II

Graduate Econometrics I: Maximum Likelihood II Graduate Econometrics I: Maximum Likelihood II Yves Dominicy Université libre de Bruxelles Solvay Brussels School of Economics and Management ECARES Yves Dominicy Graduate Econometrics I: Maximum Likelihood

More information

1 Lyapunov theory of stability

1 Lyapunov theory of stability M.Kawski, APM 581 Diff Equns Intro to Lyapunov theory. November 15, 29 1 1 Lyapunov theory of stability Introduction. Lyapunov s second (or direct) method provides tools for studying (asymptotic) stability

More information

An exponential family of distributions is a parametric statistical model having densities with respect to some positive measure λ of the form.

An exponential family of distributions is a parametric statistical model having densities with respect to some positive measure λ of the form. Stat 8112 Lecture Notes Asymptotics of Exponential Families Charles J. Geyer January 23, 2013 1 Exponential Families An exponential family of distributions is a parametric statistical model having densities

More information

Lecture 4 September 15

Lecture 4 September 15 IFT 6269: Probabilistic Graphical Models Fall 2017 Lecture 4 September 15 Lecturer: Simon Lacoste-Julien Scribe: Philippe Brouillard & Tristan Deleu 4.1 Maximum Likelihood principle Given a parametric

More information

Review and continuation from last week Properties of MLEs

Review and continuation from last week Properties of MLEs Review and continuation from last week Properties of MLEs As we have mentioned, MLEs have a nice intuitive property, and as we have seen, they have a certain equivariance property. We will see later that

More information