Exact Maximum Likelihood Estimation for Non-Gaussian Non-invertible Moving Averages
Nan-Jung Hsu, Institute of Statistics, National Tsing-Hua University
F. Jay Breidt, Department of Statistics, Colorado State University

January 3, 2005

Abstract

A procedure for solving the exact maximum likelihood estimation (MLE) problem is proposed for non-invertible non-Gaussian MA processes. By augmenting certain latent variables, the exact likelihood of all relevant innovations can be expressed explicitly through a set of recursions (Breidt and Hsu, 2005). The exact MLE is then solved numerically by the EM algorithm. Two alternative estimators are also proposed, corresponding to different treatments of the latent variables. We show that these three estimators and the approximate MLE proposed by Lii and Rosenblatt (1992) are asymptotically equivalent. In simulations, the MLE solved by EM performs better than the other estimators, in terms of smaller root mean square errors, for small samples.

Keywords: EM algorithm, Monte Carlo, non-invertible, non-minimum phase, non-Gaussian

1 Introduction

Consider a qth-order moving average (MA(q)) process

X_t = θ(B) Z_t,  (1)

where {Z_t} is an independent and identically distributed (iid) sequence of random variables with zero mean and finite variance, B is the backshift operator (B^k Y_t = Y_{t-k} for k = 0, ±1, ±2, ...), θ(z) = 1 + θ_1 z + ... + θ_q z^q, and θ_q ≠ 0. The moving average polynomial θ(z) is said to be invertible if all the roots of θ(z) = 0 are outside the unit circle in the complex plane, and non-invertible (or non-minimum phase) otherwise (Brockwell and Davis, 1991, Theorem 3.1.2).
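As a concrete check of the definition above, the invertibility of an MA(q) polynomial can be classified numerically from the roots of θ(z). The following is a minimal illustrative sketch; the function name and the use of NumPy are ours, not from the paper:

```python
import numpy as np

def classify_ma(theta):
    """Classify the MA polynomial theta(z) = 1 + theta_1 z + ... + theta_q z^q.

    Returns 'invertible' if every root of theta(z) = 0 lies outside the
    unit circle, and 'non-invertible' otherwise.
    """
    # np.roots expects coefficients ordered from the highest power down to
    # the constant term, so reverse theta and append the leading 1.
    roots = np.roots(np.r_[np.asarray(theta, dtype=float)[::-1], 1.0])
    return "invertible" if np.all(np.abs(roots) > 1.0) else "non-invertible"

# theta(z) = 1 + 0.5 z has its single root at z = -2 (outside the unit circle).
print(classify_ma([0.5]))                             # invertible
# theta(z) = (1 + z/0.9)(1 + z/1.1) has a root at z = -0.9 (inside).
print(classify_ma([1/0.9 + 1/1.1, 1/(0.9 * 1.1)]))    # non-invertible
```

A Gaussian likelihood cannot distinguish the two cases above from their mirror-root counterparts, which is exactly the identifiability problem discussed next.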
Invertibility is a standard assumption in the analysis of moving average time series models, because without this assumption the model is not identifiable using estimation methods based on second-order moments of the process. Such methods include Gaussian likelihood, least squares, and various spectral-based methods (see, for example, Brockwell and Davis, 1991). But in the non-Gaussian case, invertible and non-invertible moving averages are distinguishable on the basis of higher-order cumulants or likelihood functions. The invertibility assumption in the non-Gaussian case is entirely artificial, and removing this assumption leads to a broad class of useful models. Indeed, non-invertible moving averages, and the broader class of non-minimum phase autoregressive moving average (ARMA) models, are important tools in a number of applications, including seismic and other deconvolution problems (Wiggins, 1978; Ooe and Ulrych, 1979; Blass and Halsey, 1981; Donoho, 1981; Godfrey and Rocca, 1981; Hsueh and Mendel, 1985; Scargle, 1981), design of communication systems (Benveniste, Goursat, and Ruget, 1980), processing of blurry images (Donoho, 1981; Chien, Yang, and Chi, 1997), and modeling of vocal tract filters (Rabiner and Schafer, 1978; Chien, Yang, and Chi, 1997).

Estimation methods for general moving average processes include cumulant-based estimators using cumulants of order greater than two (Wiggins, 1978; Donoho, 1981; Lii and Rosenblatt, 1982; Giannakis and Swami, 1990; Chi and Kung, 1995; Chien, Yang, and Chi, 1997); quasi-likelihood methods, which lead to least absolute deviation-type estimators (Huang and Pawitan, 2000; Breidt, Davis, and Trindade, 2001); and approximate maximum likelihood estimation (Lii and Rosenblatt, 1992). The approach of Huang and Pawitan (2000) is a semi-parametric method which uses the Laplacian likelihood and does not require a distributional assumption on the innovations. However, their likelihood is actually a conditional likelihood, conditional on certain initial variables
set to zero. This estimator is proved to be consistent only for heavy-tailed errors. The alternative approach of Lii and Rosenblatt (1992) uses an appropriate truncation in the representation of the innovations in terms of the observations, and approximates the likelihood function based on these truncated innovations. This approximate MLE is shown to be asymptotically equivalent to the exact MLE under mild conditions.

In this work, we incorporate q latent variables into the likelihood function and solve the exact MLE of the model parameters by the EM algorithm. In addition, we propose two alternative estimators corresponding to different treatments of the latent variables. One is the conditional MLE, in which all of the latent variables are set to zero. The other is the joint MLE, in which the latent variables and the model parameters are estimated simultaneously by maximizing the joint likelihood. We show that these two alternative estimators have the same asymptotic distribution as the MLE.

The rest of the paper is organized as follows. In Section 2, we first review the recursions given by Breidt and Hsu (2005) for computing residuals from a realization of a general moving average process. The likelihood function is then written in terms of these residuals and the specified distribution of the innovations. In Section 3, the procedure for solving the exact MLE by the EM algorithm and the two alternative estimators are introduced. In Section 4, numerical simulations are conducted to evaluate the finite-sample performance of the different estimators for a non-invertible MA(2) with non-Gaussian noise under various parameter settings; the exact MLE performs better than the other estimators, in terms of smaller root mean square errors, in all cases. A brief discussion follows in Section 5. Proofs
are given in the Appendix.

2 MA processes and the likelihoods

Rewrite (1) as

X_t = θ(B) Z_t = θ♯(B) θ♭(B) Z_t,  (2)

with

θ♯(z) = 1 + θ♯_1 z + ... + θ♯_s z^s ≠ 0 for |z| > 1,
θ♭(z) = 1 + θ♭_1 z + ... + θ♭_r z^r ≠ 0 for |z| ≤ 1,

where r + s = q, θ♯_s ≠ 0, and θ♭_r ≠ 0. We further assume that any unit roots of θ♯(z) = 0 are not repeated. The moving average polynomial θ(z) is invertible if s = 0, and non-invertible (non-minimum phase) otherwise. We refer to the case in which r = 0 and s > 0 as purely non-invertible.

Following Breidt and Hsu (2005), define

W_t = θ♭(B) Z_t,  (3)

so that

Z_t = W_t - (θ♭(B) - 1) Z_t.  (4)

Then

X_t = θ♯(B) W_t = (1 + θ♯_1 B + ... + θ♯_s B^s) W_t = θ♯_s θ̃(B) W_{t-s},  (5)

where

θ̃(z) = 1 + (θ♯_{s-1}/θ♯_s) z + ... + (θ♯_1/θ♯_s) z^{s-1} + (1/θ♯_s) z^s.

Thus,

W_{t-s} = X_t/θ♯_s - (θ̃(B) - 1) W_{t-s}.  (6)

By incorporating the latent variables Z_r = (Z_{-q+1}, ..., Z_{-q+r})' and W_s = (W_{n-s+1}, ..., W_n)' with the observations X_n = (X_1, ..., X_n)', the joint distribution of (X_n, Z_r, W_s) can be represented by the joint distribution of (Z_{-q+1}, ..., Z_0, Z_1, ..., Z_n) via the following two
linear transformations, obtained directly from (4) and (6). The first transformation maps (Z_{-q+1}, ..., Z_n) to (Z_r, W_{-s+1}, ..., W_n) through W_t = Z_t + θ♭_1 Z_{t-1} + ... + θ♭_r Z_{t-r}; its matrix is triangular with unit diagonal. The second maps (Z_r, W_{-s+1}, ..., W_n) to (Z_r, X_1, ..., X_n, W_{n-s+1}, ..., W_n) through X_t = W_t + θ♯_1 W_{t-1} + ... + θ♯_s W_{t-s}; its matrix is triangular, with leading entry θ♯_s in each of the n rows corresponding to X_1, ..., X_n and unit diagonal elsewhere. Consequently, the joint distribution of (X_n, Z_r, W_s) satisfies

p(x_n, z_r, w_s; θ, σ) = |θ♯_s|^{-n} ∏_{t=-q+1}^{n} f_σ(z_t),  (7)

where f_σ(z) = f(z/σ)/σ is the probability density function of Z_t with scale parameter σ, and θ = (θ♯, θ♭) with θ♯ = (θ♯_1, θ♯_2, ..., θ♯_s)' and θ♭ = (θ♭_1, θ♭_2, ..., θ♭_r)'. Note that the residuals {z_t : t = -q+1, ..., n} solved from (4) and (6) are functions of θ, X_n, Z_r, and W_s, and are denoted z_t(θ, x_n, z_r, w_s) for completeness in what follows.

Since the latent variables Z_r and W_s are unobserved in practice, we propose three estimators of the parameters obtained by maximizing the joint distribution (7) under different treatments of the latent variables. The first is the exact MLE solved by the EM algorithm, in which the latent variables are treated as missing data. The second is the conditional MLE, in which all of the latent variables are set to zero. The third is the joint MLE, in which the latent variables and the model parameters are estimated simultaneously. The details are described in the next section.

3 Estimation methods

In the following, three estimators of ψ = (θ, σ) are introduced.
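For the r = s = 1 case, the recursions (4) and (6) and the joint density (7) can be put in code directly. The sketch below is illustrative only (the function names are ours), and it uses a standard Laplace base density f(z) = exp(-|z|)/2 as one concrete choice of f:

```python
import numpy as np

def residuals_ma2(x, th_sharp, th_flat, z_init, w_end):
    """Innovations for the r = s = 1 model
    X_t = (1 + th_sharp*B)(1 + th_flat*B) Z_t,
    recovered from x = (X_1, ..., X_n) and the two latent initial values
    z_init = Z_{-1} and w_end = W_n, via recursions (6) and (4)."""
    n = len(x)
    # Backward recursion (6): W_{t-1} = (X_t - W_t) / th_sharp, from W_n.
    w = np.empty(n + 1)                  # w[t] holds W_t for t = 0..n
    w[n] = w_end
    for t in range(n, 0, -1):
        w[t - 1] = (x[t - 1] - w[t]) / th_sharp
    # Forward recursion (4): Z_t = W_t - th_flat * Z_{t-1}, from Z_{-1}.
    z = np.empty(n + 2)                  # z[k] holds Z_{k-1} for k = 0..n+1
    z[0] = z_init
    for t in range(n + 1):
        z[t + 1] = w[t] - th_flat * z[t]
    return z                             # (Z_{-1}, Z_0, ..., Z_n)

def log_likelihood(x, th_sharp, th_flat, sigma, z_init, w_end):
    """Joint log-density (7): -n*log|th_sharp| + sum_t log f_sigma(z_t),
    with the standard Laplace base density f(z) = exp(-|z|)/2."""
    z = residuals_ma2(x, th_sharp, th_flat, z_init, w_end)
    log_f = -np.abs(z) / sigma - np.log(2.0 * sigma)
    return -len(x) * np.log(abs(th_sharp)) + log_f.sum()
```

A quick correctness check on the recursions: simulating from the model and feeding back the true latent values (Z_{-1}, W_n) recovers the innovations exactly.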
First, the exact MLE is considered. The latent variables (Z_r, W_s) are unobserved and are treated as missing data in the joint distribution (7). The MLE of the parameters is then solved iteratively by the EM algorithm, given by the following two steps:

E-step: Compute the conditional expectation of the log-likelihood, which satisfies

Q(ψ | ψ_old) = -n log|θ♯_s| + Σ_{t=-q+1}^{n} E_old[ log f_σ(z_t(θ, X_n, Z_r, W_s)) | X_n ],

where the expectation E_old is taken over the latent variables (Z_r, W_s) with respect to their conditional distribution given X_n at the current parameter estimates ψ_old.

M-step: Solve ψ_new = argmax_ψ Q(ψ | ψ_old).

These two steps are repeated until convergence is achieved. In general, the expectation in the E-step does not have a closed form. Therefore, we adopt a Monte Carlo method to compute the expectation numerically; the same idea is used in Breidt and Hsu (2005) for obtaining the best mean square predictor for non-Gaussian MA processes. More precisely, each term of the expectation in the E-step can be rewritten as

Q_t(ψ | ψ_old) ≡ E_old[ log f_σ(z_t(θ, x_n, Z_r, W_s)) | X_n = x_n ]
= ∫ log f_σ(z_t(θ, x_n, z_r, w_s)) p(x_n, z_r, w_s; ψ_old) dz_r dw_s / ∫ p(x_n, z_r, w_s; ψ_old) dz_r dw_s
= ∫ log f_σ(z_t(θ, x_n, z_r, w_s)) [ ∏_{k=-q+1}^{n} f_{σ_old}(z_k(θ_old, x_n, z_r, w_s)) ] dz_r dw_s / ∫ ∏_{k=-q+1}^{n} f_{σ_old}(z_k(θ_old, x_n, z_r, w_s)) dz_r dw_s.  (8)

These q-dimensional integrals can be evaluated numerically via importance sampling as follows. Let h(z_r, w_s) be a q-dimensional joint density, called the importance sampler, satisfying supp(h) ⊇ supp( ∫ p(x_n, z_r, w_s; ψ_old) dx_n ), where supp(h) is the support of h. For any (z_r^{(i)}, w_s^{(i)}) ∈ supp(h), define the importance weight

A(x_n, z_r^{(i)}, w_s^{(i)}; ψ_old) = ∏_{k=-q+1}^{n} f_{σ_old}(z_k(θ_old, x_n, z_r^{(i)}, w_s^{(i)})) / h(z_r^{(i)}, w_s^{(i)}).

Then (8) can be written as

Q_t(ψ | ψ_old) = ∫ log f_σ(z_t(θ, x_n, z_r, w_s)) A(x_n, z_r, w_s; ψ_old) h(z_r, w_s) dz_r dw_s / ∫ A(x_n, z_r, w_s; ψ_old) h(z_r, w_s) dz_r dw_s
= E_h[ log f_σ(z_t(θ, x_n, Z_r, W_s)) A(x_n, Z_r, W_s; ψ_old) ] / E_h[ A(x_n, Z_r, W_s; ψ_old) ],  (9)
where the expectation in (9) is taken with respect to h(z_r, w_s). The ratio in (9) can be approximated by the Monte Carlo (MC) method as

Q̂_t(ψ | ψ_old) = Σ_{i=1}^{M} log f_σ(z_t(θ, x_n, z_r^{(i)}, w_s^{(i)})) A(x_n, z_r^{(i)}, w_s^{(i)}; ψ_old) / Σ_{i=1}^{M} A(x_n, z_r^{(i)}, w_s^{(i)}; ψ_old),  (10)

where M is the number of draws in the importance sampling and {(z_r^{(i)}, w_s^{(i)}) : i = 1, 2, ..., M} are q-dimensional random vectors drawn from the importance sampler h(z_r, w_s). Consequently, the MC approximation of Q(ψ | ψ_old) is

Q̂(ψ | ψ_old) = -n log|θ♯_s| + Σ_{t=-q+1}^{n} Q̂_t(ψ | ψ_old),

which is what is actually used in the implementation of the EM algorithm to solve the MLE.

The second estimator is the conditional MLE, in which the latent variables are set to zero; namely,

ψ̂_C = argmax_ψ { -n log|θ♯_s| + Σ_{t=-q+1}^{n} log f_σ(z_t(θ, x_n, 0_r, 0_s)) }.  (11)

The third estimator is the joint MLE, in which the latent variables and the model parameters are estimated simultaneously:

(ψ̂_J, ẑ_r, ŵ_s) = argmax_{ψ, z_r, w_s} { -n log|θ♯_s| + Σ_{t=-q+1}^{n} log f_σ(z_t(θ, x_n, z_r, w_s)) }.  (12)

Lii and Rosenblatt (1992) derived the asymptotic distribution of the quasi-estimator of the parameters that maximizes the pseudo-likelihood defined in (8) of their paper. This quasi-estimator would be the MLE if all of {X_t : t ∈ Z} were observed. However, the pseudo-likelihood depends on unobserved {X_t} and cannot be computed in practice. They therefore further showed that the approximate MLE, which maximizes a truncated version of the pseudo-likelihood, is asymptotically equivalent to the quasi-estimator. In this work, we further show that the conditional MLE ψ̂_C and the joint MLE ψ̂_J defined in (11) and (12) are also asymptotically equivalent to the quasi-estimator. The proof is given in the Appendix.

4 Simulation

To investigate the performance of the proposed estimators in finite samples, the following simulation study is conducted. We consider two kinds of innovation distributions, the Laplace distribution and Student's t distribution with four degrees of freedom, which are the same settings as in
Lii and Rosenblatt (1992). The models considered in our simulation study include an invertible MA(1) (s = 0, r = 1), a non-invertible MA(1) (s = 1, r = 0), and a non-purely non-invertible MA(2) (s = r = 1). For each MA process,
two parameter settings are considered, one closer to the unit-root case and one farther from it. The detailed settings are given in Table 1.

Table 1: Model settings for simulations.

  Model                   Parameter Values                       Innovation Distribution
  invertible MA(1)        θ♭_1 = 0.9 or θ♭_1 = 0.8               Laplace
  non-invertible MA(1)    θ♯_1 = 1/0.9 or θ♯_1 = 1/0.8           Laplace
  non-invertible MA(2)    θ♯_1 = 1/0.9 and θ♭_1 = 1/1.1          Laplace or t(4)
                          θ♯_1 = 1/0.8 and θ♭_1 = 1/2            Laplace or t(4)

For each process, 200 samples of size n = 50 and of size n = 100 are generated, and four estimators are computed for each sample: the MLE by EM, the conditional MLE, the joint MLE, and the approximate MLE proposed by Lii and Rosenblatt (abbreviated L&R MLE). As in the simulations of Lii and Rosenblatt (1992), we use q_trunc = 10 as the truncation number for computing the approximate MLE for both sample sizes, which leads to an effective data length n - 2q_trunc of 30 for n = 50 and of 80 for n = 100. The performance of the four estimators under the various settings is summarized in Tables 2-
5 in terms of biases, standard errors, and root mean square errors. The standard error based on the asymptotic distribution (ASD) for each process is also given at the bottom of each table for reference. Note that, in Tables 4 and 5, the estimation results contain two parts, (a) and (b): in part (a) the parameters are estimated under the true distribution t(4) with σ unknown, while in part (b) the parameters are estimated under the Laplace distribution with σ unknown, which leads to least absolute deviation estimation.

***Add results for MA(1) here***

Generally speaking, for the MA(2) processes, the MLE solved by EM outperforms the other estimators in terms of smaller root mean square errors when n = 50, especially for the scale parameter. The second-best estimator is the joint MLE in most of the cases with n = 50. Of the remaining two estimators, the L&R MLE is better at estimating the MA coefficients, and the conditional MLE is better at estimating the scale parameter. Note that, compared with the other estimators, the L&R MLE produces very poor estimates of the scale parameter for small samples, probably because of the small effective data length resulting from the truncation in the objective function; this improves as the sample size increases. Roughly speaking, the four estimators are fairly competitive when n = 100, which confirms their asymptotic equivalence. Across parameter settings, the efficiency gain of the MLE solved by EM relative to the L&R MLE is larger when the roots of the MA polynomials are closer to the unit circle. To sum up, the MLE solved by EM has the best performance among the four estimators considered for small samples. From the computational point of view, however, the joint MLE is a good alternative for small samples, especially when the roots of the MA polynomials are closer to the unit circle.
Table 2: Biases, standard errors, and root mean square errors of various likelihood-based estimators for a non-invertible MA(2) satisfying X_t = (1 + (1/0.9)B)(1 + (1/1.1)B)Z_t, where {Z_t} are iid Laplace with σ = 1; 200 replicates. Columns: r_1 = 0.9, r_2 = 1.1, and σ = 1, for n = 50 and for n = 100. Rows: bias, sd, and rmse of the L&R MLE, MLE by EM, joint MLE, and conditional MLE, with the asymptotic standard deviation (ASD) in the last row. (Table entries omitted.)

Table 3: Biases, standard errors, and root mean square errors of various likelihood-based estimators for a non-invertible MA(2) satisfying X_t = (1 + (1/0.8)B)(1 + (1/2)B)Z_t, where {Z_t} are iid Laplace with σ = 1; 200 replicates. Same layout as Table 2, with columns r_1 = 0.8, r_2 = 2, and σ = 1. (Table entries omitted.)
Table 4: Biases, standard errors, and root mean square errors of various likelihood-based estimators for a non-invertible MA(2) satisfying X_t = (1 + (1/0.9)B)(1 + (1/1.1)B)Z_t, where {Z_t} are iid t(4) with σ = 1; 200 replicates. Part (a): parameters estimated under f = t(4); part (b): parameters estimated under f = Laplace. Columns: r_1 = 0.9, r_2 = 1.1, and σ = 1, for n = 50 and for n = 100. Rows: bias, sd, and rmse of the L&R MLE, MLE by EM, joint MLE, and conditional MLE, with the ASD in the last row of part (a). (Table entries omitted.)
Table 5: Biases, standard errors, and root mean square errors of various likelihood-based estimators for a non-invertible MA(2) satisfying X_t = (1 + (1/0.8)B)(1 + (1/2)B)Z_t, where {Z_t} are iid t(4) with σ = 1; 200 replicates. Part (a): parameters estimated under f = t(4); part (b): parameters estimated under f = Laplace. Same layout as Table 4, with columns r_1 = 0.8, r_2 = 2, and σ = 1. (Table entries omitted.)
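The summary quantities reported in Tables 2-5 (bias, standard error, and root mean square error over replicated estimates) can be computed as in the following sketch; the helper name is ours:

```python
import numpy as np

def summarize(estimates, true_value):
    """Bias, standard deviation, and root mean square error of replicated
    parameter estimates: the three quantities reported in Tables 2-5."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - true_value
    sd = est.std(ddof=1)                          # sample standard deviation
    rmse = np.sqrt(np.mean((est - true_value) ** 2))
    return bias, sd, rmse

# e.g. three replicated estimates of a parameter whose true value is 1.0:
bias, sd, rmse = summarize([1.0, 1.2, 0.8], 1.0)
```

With 200 replicates per setting, each table cell is one such summary for one estimator, one parameter, and one sample size.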
5 Discussion

We have proposed an algorithm to compute the exact MLE for invertible or non-invertible moving averages with non-Gaussian noise. Instead of truncating the unobserved {X_t} in the residual representation to approximate the likelihood (Lii and Rosenblatt, 1992), we augment suitable latent variables to construct a joint likelihood, and we then solve the exact MLE by the EM algorithm. In addition, two alternative estimators are suggested, corresponding to different treatments of the latent variables. Both alternatives are much easier to implement, and both are asymptotically equivalent to the MLE. Simulation results show that the exact MLE solved by the EM algorithm performs better than the other estimators, in terms of smaller root mean square errors, in small samples of non-invertible non-Gaussian MA processes.

6 Appendix

Lii and Rosenblatt (1992) first derive the asymptotic distribution of the quasi-estimator of the parameters, which maximizes the pseudo-likelihood defined in (8) of Lii and Rosenblatt (1992). Since the pseudo-likelihood depends on unobserved {X_t} and cannot be computed in practice, they further show that the approximate MLE, which maximizes the truncated version of the pseudo-likelihood defined in their (9), is asymptotically equivalent to the quasi-estimator. In the following, we show that the conditional MLE and the joint MLE defined in (11) and (12) are also asymptotically equivalent to the quasi-estimator.

For notational simplicity, we give the proof only for a non-invertible MA(2) with r = s = 1, satisfying θ♯(B) = 1 + θ♯B and θ♭(B) = 1 + θ♭B with |θ♯| > 1 and |θ♭| < 1. The parameters in this particular case are ψ = (θ♯, θ♭, σ).

First, some assumptions and notation are given. The assumptions on the density of the innovations {Z_t}, with f_σ(z) = f(z/σ)/σ, are:

A1: f(x) > 0 for all x.
A2: f ∈ C².
A3: f′ ∈ L¹ with ∫ f′(x) dx = 0.
A4: ∫ x f′(x) dx = -1.
A5: ∫ f″(x) dx = 0.
A6: ∫ x f″(x) dx = 0.
A7: ∫ x² f″(x) dx = 2.
A8: ∫ (1 + x²) [f′(x)]² / f(x) dx < ∞.
A9: |u(z + h) - u(z)| ≤ A (1 + |z|^k)(|h| + |h|^l) for all z, h, with positive constants k, l, A, for u =
f′/f and u = (f′/f)′. Before the proof, we first give two lemmas, stated without proof.
Lemma 1 (Exercise 8.8 in Chapter 1 of Durrett, 2005). Assume {X_t ; t = 1, 2, ...} are iid non-zero random variables. Then Σ_{t≥1} X_t z^t converges almost surely for |z| < 1 if and only if E log⁺|X_t| < ∞, where log⁺x = max{log x, 0}.

Lemma 2 (Proposition 3.1.1 in Brockwell and Davis, 1996). Assume {X_t} is any sequence of random variables such that sup_t E|X_t| < ∞. If Σ_j |ψ_j| < ∞, then Σ_j ψ_j X_{t-j} converges almost surely.

Proof. Define W_t = θ♭(B)Z_t and V_t = θ♯(B)Z_t; consequently, X_t = θ♯(B)W_t = θ♭(B)V_t. Given the initial values (w_n, z_{-1}), the backward and forward recursions (6) and (4) give

w_{n-j} = Σ_{l=1}^{j} (-1)^{l-1} (θ♯)^{-l} X_{n-j+l} + (-1)^j (θ♯)^{-j} w_n,  j = 1, 2, ..., n,

z_k = Σ_{i=0}^{k} (-θ♭)^i w_{k-i} + (-θ♭)^{k+1} z_{-1}
   = Σ_{i=0}^{k} Σ_{l=1}^{n-k+i} (-θ♭)^i (-1)^{l-1} (θ♯)^{-l} X_{k-i+l}
     + (-1)^{n-k} (θ♯)^{-(n-k)} [1 - (θ♭/θ♯)^{k+1}] / (1 - θ♭/θ♯) · w_n + (-θ♭)^{k+1} z_{-1},  (13)

for k = 0, 1, ..., n. Since the residuals {z_t} are functions of (θ, X_n, w_n, z_{-1}), they are written z_t(θ, x_n, z_{-1}, w_n) for completeness, and abbreviated z_t(θ) ≡ z_t(θ, x_n, z_{-1}, w_n) and ẑ_t(θ) ≡ z_t(θ, x_n, ẑ_{-1}, ŵ_n) for simplicity, where (z_{-1}, w_n) are the actual initial variables associated with X_n and (ẑ_{-1}, ŵ_n) is some estimator of (z_{-1}, w_n). For instance, (ẑ_{-1}, ŵ_n) = (0, 0) in the conditional MLE case, while (ẑ_{-1}, ŵ_n) is a function of X_n in the joint MLE case.

Similar to Lii and Rosenblatt (1992), let Q_ε = {ψ ∈ R^{q+1} : ‖ψ - ψ_0‖ ≤ ε} be a neighborhood of ψ_0 = (θ_0, σ_0), where ψ_0 is the true parameter vector and ‖·‖ is the max norm on R^{q+1}. Then, for small ε > 0, there exists d ∈ (0, 1) such that

max{1/|θ♯|, |θ♭|} < d  (14)

on Q_ε, which is obtained directly from (3.2) in Lii and Rosenblatt (1992). Moreover, according to (3.3) in Lii and Rosenblatt (1992), we have sup_{Q_ε} |z_t(θ) - z_t(θ_0)| ≤ C_1 ε^{1/2} Σ_{j≥0} d^j |X_{t-j}| for some C_1 > 0. From (13) and (14), we have

ẑ_t(θ) - z_t(θ) = (-1)^{n-t} (θ♯)^{-(n-t)} [1 - (θ♭/θ♯)^{t+1}] / (1 - θ♭/θ♯) · (ŵ_n - w_n) + (-θ♭)^{t+1} (ẑ_{-1} - z_{-1}),
13 and sup ẑ t θ z t θ 2 d 2 dn t ŵ n w n + d t+ ẑ z 5 Also, the first derivatives of ẑ t θ z t θ satisfy θ ẑ t θ z t θ = n t+ [ t + θ t + t θ t+ ] θ 2 ŵ n w n + t + θ t ẑ z, ẑ t θ z t θ n t+2 = θ [ t ] t+ + t + θ t θ n t+ θ 2 n t θ θ2 t+ θ / ŵ n w n, and therefore, sup θ ẑ tθ z t θ 2n t + dn t+ + 2td n+t+ d 2 2 ŵ n w n + t + d t ẑ z 6 Similarly, the second derivative of ẑ t θ z t θ satisfies sup 2 ẑ t θ z t θ θ 4 ŵ n w n { d 2 4 n t + 2d n t+2 + [2t + n t]d n+t} 7 In the following proof, we also need the derivatives of z t θ which satisfy z t θ θ = θ [θ B] W t = z t θ = θ B BW 2 t = θ B θ B W t = θ B Z t = θz j t j, j=0 [ [θ B] V t = θ θ B Z t = B ] B Z t = B j B j j+ Z t = Z t+j, j=0 j=0 θ B θ B V t = θ B θ B Z t 2 = θb j j k+ B k Z t, j=0 k=0 2 z t θ = θ θ B Z t = 3
in which all of the partial derivatives with respect to θ have the form

Σ_j γ_j Z_{t-j},  (18)

for some {γ_j} that decay exponentially.

Lii and Rosenblatt (1992) derive the asymptotic distribution of the maximizer of the pseudo-likelihood

l_n(ψ) = -n log(σ|θ♯|) + Σ_{t=-q+1}^{n} log f(z_t(θ)/σ),

in which the residuals are functions of (X_n, Z_r, W_s) and can be represented as infinite series in {X_t}. This maximizer is ideal but cannot be computed in practice, since only finitely many of the {X_t} are observable. They further show that the maximizer of a computable truncated version of l_n(ψ), denoted l_q(ψ), is asymptotically equivalent to the quasi-MLE. In this paper, we consider the objective function

l̂_n(ψ) = -n log(σ|θ♯|) + Σ_{t=-q+1}^{n} log f(ẑ_t(θ)/σ),

where ẑ_t depends on the initial values (ẑ_{-1}, ŵ_n). When (ẑ_{-1}, ŵ_n) = (0, 0), the corresponding estimator is the conditional MLE defined in (11). On the other hand, the joint MLE defined in (12) is also a special case, maximizing l̂_n(ψ) with (ẑ_{-1}, ŵ_n) functions of X_n. Our goal is to show that the conditional MLE, the joint MLE, and the quasi-MLE are asymptotically equivalent.

Following the proof given by Lii and Rosenblatt (1992), we first show that

sup_{Q_ε} n^{-1} |l_n(ψ) - l̂_n(ψ)| → 0, a.s., as n → ∞,  (19)

and then show that

n^{-1/2} Σ_{t=-q+1}^{n} [ ∂/∂ψ log f_σ(ẑ_t(θ)) - ∂/∂ψ log f_σ(z_t(θ)) ] |_{ψ=ψ_0} → 0  (20)

in probability, where ψ = (θ♯, θ♭, σ) and ψ_0 = (θ♯_0, θ♭_0, σ_0), and that

n^{-1} | B̂(ψ) - B(ψ_0) | → 0,  (21)

where

B(ψ) = [ B_θθ(ψ)  B_θσ(ψ) ; B_θσ(ψ)′  B_σσ(ψ) ] = Σ_{t=-q+1}^{n} ∂²/∂ψ∂ψ′ log f_σ(z_t(θ)),

and B̂(ψ) is defined similarly, but with ẑ_t(θ) in place of z_t(θ).
To show (19), write

n^{-1} |l_n(ψ) - l̂_n(ψ)| ≤ n^{-1} Σ_{t=-q+1}^{n} | log f(z_t(θ)/σ) - log f(ẑ_t(θ)/σ) |
= n^{-1} Σ_t (1/σ) |z_t(θ) - ẑ_t(θ)| |(f′/f)(z̃_t(θ)/σ)|
≤ n^{-1} Σ_t (1/σ) |z_t(θ) - ẑ_t(θ)| |(f′/f)(z_t(θ_0)/σ_0)|
  + n^{-1} Σ_t (1/σ) |z_t(θ) - ẑ_t(θ)| |(f′/f)(z̃_t(θ)/σ) - (f′/f)(z_t(θ_0)/σ_0)|
≡ A_1 + A_2,

where a first-order Taylor expansion is used with z̃_t(θ) = z_t(θ) + u_t(θ)(ẑ_t(θ) - z_t(θ)) for some u_t(θ) ∈ [0, 1]; i.e., z̃_t(θ) is between z_t(θ) and ẑ_t(θ). According to (15), we have

sup_{Q_ε} A_1 ≤ n^{-1} Σ_t (σ_0 - ε)^{-1} |(f′/f)(z_t(θ_0)/σ_0)| { (2/(1-d²)) d^{n-t} |ŵ_n - w_n| + d^{t+1} |ẑ_{-1} - z_{-1}| }
= ( 2 |ŵ_n - w_n| / ((1-d²)(σ_0-ε)) ) n^{-1} Σ_t d^{n-t} |(f′/f)(z_t(θ_0)/σ_0)|
  + ( d |ẑ_{-1} - z_{-1}| / (σ_0-ε) ) n^{-1} Σ_t d^t |(f′/f)(z_t(θ_0)/σ_0)|.  (22)

Since {(f′/f)(z_t(θ_0)/σ_0) : t = 1, ..., n} are iid random variables with

E [ (f′/f)(z_t(θ_0)/σ_0) ]² = ∫ [f′(z)/f(z)]² f(z) dz = ∫ [f′(z)]²/f(z) dz < ∞

by Assumption A8, Lemma 2 implies that both series in (22) converge almost surely; therefore A_1 → 0 almost surely. On the other hand,

z̃_t(θ) - z_t(θ_0) = z_t(θ) - z_t(θ_0) + u_t(θ)(ẑ_t(θ) - z_t(θ))

implies

|z̃_t(θ) - z_t(θ_0)| ≤ |z_t(θ) - z_t(θ_0)| + |ẑ_t(θ) - z_t(θ)|,
and from Assumption A9 with u = f′/f, A_2 satisfies

sup_{Q_ε} A_2 ≤ (A/(n(σ_0-ε)²)) Σ_t ( 1 + |z_t(θ_0)|^k/(σ_0-ε)^k ) sup |z_t(θ) - ẑ_t(θ)| sup |z_t(θ) - z_t(θ_0)|
+ (A/(n(σ_0-ε)²)) Σ_t ( 1 + |z_t(θ_0)|^k/(σ_0-ε)^k ) sup |z_t(θ) - ẑ_t(θ)|²
+ (A/(n(σ_0-ε)^{l+1})) Σ_t ( 1 + |z_t(θ_0)|^k/(σ_0-ε)^k ) sup |z_t(θ) - ẑ_t(θ)| ( 2^l sup |z_t(θ) - z_t(θ_0)|^l + 2^l sup |ẑ_t(θ) - z_t(θ)|^l )
≡ A_21 + A_22 + A_23,

where A_21, A_22, and A_23 are further treated below, and each sup is taken over Q_ε. According to (3.3) in Lii and Rosenblatt (1992),

A_21 ≤ ( C_1 ε^{1/2} A / (n(σ_0-ε)²) ) Σ_t ( 1 + |z_t(θ_0)|^k/(σ_0-ε)^k ) ( Σ_j d^j |X_{t-j}| ) { (2/(1-d²)) d^{n-t} |ŵ_n - w_n| + d^{t+1} |ẑ_{-1} - z_{-1}| }
= ( 2 C_1 ε^{1/2} A |ŵ_n - w_n| / ((1-d²)(σ_0-ε)²) ) n^{-1} Σ_t d^{n-t} ( 1 + |z_t(θ_0)|^k/(σ_0-ε)^k ) Σ_j d^j |X_{t-j}|
+ ( C_1 ε^{1/2} A d |ẑ_{-1} - z_{-1}| / (σ_0-ε)² ) n^{-1} Σ_t d^t ( 1 + |z_t(θ_0)|^k/(σ_0-ε)^k ) Σ_j d^j |X_{t-j}|,

where the summations in the last expression can be shown to converge almost surely as follows. By the Cauchy-Schwarz inequality,

Σ_t d^t ( 1 + |z_t(θ_0)|^k/(σ_0-ε)^k ) Σ_j d^j |X_{t-j}|
≤ ( Σ_t d^t ( 1 + |z_t(θ_0)|^k/(σ_0-ε)^k )² )^{1/2} ( Σ_t d^t ( Σ_j d^j |X_{t-j}| )² )^{1/2},  (23)
where the first series in (23) converges almost surely by Lemma 1, since

E log⁺ ( 1 + |z_t(θ_0)|^k/(σ_0-ε)^k )² = 2 E log⁺ ( 1 + |z_t(θ_0)|^k/(σ_0-ε)^k )
≤ 2 E [ log⁺( 1 + |z_t(θ_0)|^k/(σ_0-ε)^k ) 1{|z_t(θ_0)| ≤ σ_0-ε} ] + 2 E [ log⁺( 1 + |z_t(θ_0)|^k/(σ_0-ε)^k ) 1{|z_t(θ_0)| > σ_0-ε} ]
≤ 2 log 2 · P( |z_t(θ_0)| ≤ σ_0-ε ) + 2 E [ log( 2 |z_t(θ_0)|^k/(σ_0-ε)^k ) 1{|z_t(θ_0)| > σ_0-ε} ]
≤ 2 log 2 + 2k E | log( |z_t(θ_0)|/(σ_0-ε) ) | < ∞,

and the second series in (23) also converges almost surely by Lemma 2, with

E ( Σ_j d^j |X_{t-j}| )² ≤ E X_t² ( Σ_j d^j )² < ∞.

Therefore, A_21 converges almost surely. Similarly, for A_22 we have

A_22 ≤ (A/(n(σ_0-ε)²)) Σ_t ( 1 + |z_t(θ_0)|^k/(σ_0-ε)^k ) { (2/(1-d²)) d^{n-t} |ŵ_n - w_n| + d^{t+1} |ẑ_{-1} - z_{-1}| }²
≤ ( 4A |ŵ_n - w_n|² / ((1-d²)²(σ_0-ε)²) ) n^{-1} Σ_t d^{2(n-t)} ( 1 + |z_t(θ_0)|^k/(σ_0-ε)^k )
+ ( A d² |ẑ_{-1} - z_{-1}|² / (σ_0-ε)² ) n^{-1} Σ_t d^{2t} ( 1 + |z_t(θ_0)|^k/(σ_0-ε)^k )
+ ( 4A d |ŵ_n - w_n| |ẑ_{-1} - z_{-1}| / ((1-d²)(σ_0-ε)²) ) n^{-1} Σ_t d^{n+1} ( 1 + |z_t(θ_0)|^k/(σ_0-ε)^k ).

Similar to the derivation for A_21, the three series in the last expression all converge almost surely by Lemma 1, and consequently A_22 converges almost surely. For A_23, we have

A_23 ≤ ( 2^l C_1^l ε^{l/2} A / (n(σ_0-ε)^{l+1}) ) Σ_t ( 1 + |z_t(θ_0)|^k/(σ_0-ε)^k ) ( Σ_j d^j |X_{t-j}| )^l { (2/(1-d²)) d^{n-t} |ŵ_n - w_n| + d^{t+1} |ẑ_{-1} - z_{-1}| }
+ ( 2^l A / (n(σ_0-ε)^{l+1}) ) Σ_t ( 1 + |z_t(θ_0)|^k/(σ_0-ε)^k ) { (2/(1-d²)) d^{n-t} |ŵ_n - w_n| + d^{t+1} |ẑ_{-1} - z_{-1}| }^{l+1},

where the second series converges in an obvious manner and, by Lemma 2, the first series converges almost surely if

E ( Σ_j d^j |X_{t-j}| )^{2l} < ∞.  (24)
Consequently, A_23 converges almost surely. The proof of (19) is complete, since

sup_{Q_ε} n^{-1} |l_n(ψ) - l̂_n(ψ)| ≤ sup_{Q_ε} (A_1 + A_2) → 0, as n → ∞.

We next show (20). First, consider the partial derivative with respect to θ:

| ∂/∂θ log f_σ(ẑ_t(θ)) - ∂/∂θ log f_σ(z_t(θ)) |
= (1/σ) | (f′/f)(ẑ_t(θ)/σ) ∂ẑ_t(θ)/∂θ - (f′/f)(z_t(θ)/σ) ∂z_t(θ)/∂θ |
≤ (1/σ) |(f′/f)(z_t(θ)/σ)| ‖ ∂(ẑ_t(θ) - z_t(θ))/∂θ ‖
+ (1/σ) ‖ ∂z_t(θ)/∂θ ‖ |(f′/f)(ẑ_t(θ)/σ) - (f′/f)(z_t(θ)/σ)|
+ (1/σ) ‖ ∂(ẑ_t(θ) - z_t(θ))/∂θ ‖ |(f′/f)(ẑ_t(θ)/σ) - (f′/f)(z_t(θ)/σ)|
≡ B_1t + B_2t + B_3t.  (25)

Accordingly, the θ-derivative part of (20) can be bounded by the sum of three series corresponding to the decomposition (25). By (16), the first term of (20) subject to B_1t satisfies

n^{-1/2} Σ_t B_1t |_{ψ=ψ_0} ≤ n^{-1/2} (1/σ_0) Σ_t |(f′/f)(z_t(θ_0)/σ_0)| { ( 2(n-t+1) d^{n-t+1} + 2t d^{n+t+1} ) / (1-d²)² |ŵ_n - w_n| + (t+1) d^t |ẑ_{-1} - z_{-1}| },

in which, by Lemma 2, all of the series converge almost surely, since Σ_t t d^t < ∞ and E|(f′/f)(z_t(θ_0)/σ_0)| < ∞; therefore n^{-1/2} Σ_t B_1t |_{ψ=ψ_0} converges to zero almost surely. According to (18), the second term of (20) subject to B_2t satisfies

n^{-1/2} Σ_t B_2t |_{ψ=ψ_0} = n^{-1/2} (1/σ_0) Σ_t | (f′/f)(ẑ_t(θ_0)/σ_0) - (f′/f)(z_t(θ_0)/σ_0) | ‖ ∂z_t(θ)/∂θ |_{θ=θ_0} ‖
≤ n^{-1/2} (A/σ_0) Σ_t ( 1 + |z_t(θ_0)/σ_0|^k ) ( |ẑ_t(θ_0) - z_t(θ_0)|/σ_0 + |ẑ_t(θ_0) - z_t(θ_0)|^l/σ_0^l ) | Σ_j γ_j Z_{t-j} |.

Following similar arguments to those used for A_21, n^{-1/2} Σ_t B_2t |_{ψ=ψ_0} and n^{-1/2} Σ_t B_3t |_{ψ=ψ_0} both converge to zero almost surely by Lemma 2, since

E ( Σ_j γ_j Z_{t-j} )² ≤ E Z_t² ( Σ_j |γ_j| )² < ∞,
where the {γ_j} defined in (18) decay exponentially. Finally, the third term of (20) subject to B_3t satisfies

n^{-1/2} Σ_t B_3t |_{ψ=ψ_0}
= n^{-1/2} (1/σ_0) Σ_t | (f′/f)(ẑ_t(θ_0)/σ_0) - (f′/f)(z_t(θ_0)/σ_0) | ‖ ∂(ẑ_t(θ) - z_t(θ))/∂θ |_{θ=θ_0} ‖
≤ n^{-1/2} (A/σ_0) Σ_t ( 1 + |z_t(θ_0)/σ_0|^k ) ( |ẑ_t(θ_0) - z_t(θ_0)|/σ_0 + |ẑ_t(θ_0) - z_t(θ_0)|^l/σ_0^l )
  × { ( 2(n-t+1) d^{n-t+1} + 2t d^{n+t+1} ) / (1-d²)² |ŵ_n - w_n| + (t+1) d^t |ẑ_{-1} - z_{-1}| }
→ 0, as n → ∞,

under arguments similar to those used for A_22.

To complete (20), we then consider the partial derivative with respect to σ:

| ∂/∂σ log f_σ(ẑ_t(θ)) - ∂/∂σ log f_σ(z_t(θ)) |
= (1/σ²) | ẑ_t(θ) (f′/f)(ẑ_t(θ)/σ) - z_t(θ) (f′/f)(z_t(θ)/σ) |
≤ (1/σ²) |(f′/f)(z_t(θ)/σ)| |ẑ_t(θ) - z_t(θ)|
+ (|z_t(θ)|/σ²) |(f′/f)(ẑ_t(θ)/σ) - (f′/f)(z_t(θ)/σ)|
+ (1/σ²) |ẑ_t(θ) - z_t(θ)| |(f′/f)(ẑ_t(θ)/σ) - (f′/f)(z_t(θ)/σ)|
≡ B_4t + B_5t + B_6t.  (26)

Accordingly, the σ-derivative part of (20) can be bounded by the sum of three series corresponding to the decomposition (26). Similar to the convergence derivation for A_1, Σ_t B_4t and Σ_t B_6t at ψ = ψ_0 both converge almost surely. Also, since

z_t(θ_0) = π(B) X_t = Σ_j π_j X_{t-j},

where π(B) = [θ♭(B)]^{-1} [θ♯_s B^s θ̃(B)]^{-1} with Σ_j |π_j| < ∞, the upper bound for Σ_t B_5t at ψ = ψ_0 converges almost surely under arguments similar to those for B_2t. Therefore, (20) holds.

It remains to show (21). First, write

n^{-1} ( B̂(ψ) - B(ψ_0) ) = n^{-1} ( B(ψ) - B(ψ_0) ) + n^{-1} ( B̂(ψ) - B(ψ) ).

Lii and Rosenblatt (1992) have shown that n^{-1} ( B(ψ) - B(ψ_0) ) → 0 in probability as ψ → ψ_0. In the following, we only need to show that n^{-1} ( B̂(ψ) - B(ψ) ) → 0 in probability.
Since

∂²/∂θ∂θ′ log f_σ(z_t(θ)) = (1/σ) (f′/f)(z_t(θ)/σ) ∂²z_t(θ)/∂θ∂θ′ + (1/σ²) (f′/f)′(z_t(θ)/σ) (∂z_t(θ)/∂θ)(∂z_t(θ)/∂θ)′,

we have

‖ B̂_θθ(ψ) - B_θθ(ψ) ‖ ≤ Σ_t ‖ ∂²/∂θ∂θ′ log f_σ(ẑ_t(θ)) - ∂²/∂θ∂θ′ log f_σ(z_t(θ)) ‖
≤ (1/σ) Σ_t |(f′/f)(ẑ_t(θ)/σ)| ‖ ∂²(ẑ_t(θ) - z_t(θ))/∂θ∂θ′ ‖
+ (1/σ) Σ_t |(f′/f)(ẑ_t(θ)/σ) - (f′/f)(z_t(θ)/σ)| ‖ ∂²z_t(θ)/∂θ∂θ′ ‖
+ (1/σ²) Σ_t |(f′/f)′(ẑ_t(θ)/σ)| ‖ (∂ẑ_t/∂θ)(∂ẑ_t/∂θ)′ - (∂z_t/∂θ)(∂z_t/∂θ)′ ‖
+ (1/σ²) Σ_t |(f′/f)′(ẑ_t(θ)/σ) - (f′/f)′(z_t(θ)/σ)| ‖ (∂z_t/∂θ)(∂z_t/∂θ)′ ‖,  (27)

where the difference of outer products in the third term can be further bounded via

‖ (∂ẑ_t/∂θ)(∂ẑ_t/∂θ)′ - (∂z_t/∂θ)(∂z_t/∂θ)′ ‖ ≤ ‖ ∂(ẑ_t - z_t)/∂θ ‖² + 2 ‖ ∂z_t/∂θ ‖ ‖ ∂(ẑ_t - z_t)/∂θ ‖.

Using techniques similar to those in the previous proof, it can be shown that each series in the decomposition (27) converges almost surely. Similarly, since

∂²/∂σ² log f_σ(z_t(θ)) = 1/σ² + (2 z_t(θ)/σ³) (f′/f)(z_t(θ)/σ) + (z_t²(θ)/σ⁴) (f′/f)′(z_t(θ)/σ),

we have

| B̂_σσ(ψ) - B_σσ(ψ) | ≤ (1/σ⁴) Σ_t | ẑ_t²(θ) (f′/f)′(ẑ_t(θ)/σ) - z_t²(θ) (f′/f)′(z_t(θ)/σ) |
+ (2/σ³) Σ_t | ẑ_t(θ) (f′/f)(ẑ_t(θ)/σ) - z_t(θ) (f′/f)(z_t(θ)/σ) |.  (28)

Ignoring the scalar 1/σ⁴, the first term on the right-hand side of (28) can be bounded by

Σ_t | (f′/f)′(ẑ_t(θ)/σ) - (f′/f)′(z_t(θ)/σ) | z_t²(θ) + Σ_t | (f′/f)′(ẑ_t(θ)/σ) | | ẑ_t²(θ) - z_t²(θ) |,

with | ẑ_t²(θ) - z_t²(θ) | ≤ | ẑ_t(θ) - z_t(θ) | ( | ẑ_t(θ) - z_t(θ) | + 2 |z_t(θ)| ).
Analogous to the previous proof, it can be shown that the series associated with each term in the above decomposition converges with probability one. The second term on the right-hand side of (28) is, up to a scalar, identical to the quantity

Σ_t | ∂/∂σ log f_σ(ẑ_t(θ)) - ∂/∂σ log f_σ(z_t(θ)) |,

which also converges with probability one using the same decomposition given in (26). Finally, we consider B_θσ. Since

∂²/∂θ∂σ log f_σ(z_t(θ)) = -(z_t(θ)/σ³) (f′/f)′(z_t(θ)/σ) ∂z_t(θ)/∂θ - (1/σ²) (f′/f)(z_t(θ)/σ) ∂z_t(θ)/∂θ,

we have

‖ B̂_θσ(ψ) - B_θσ(ψ) ‖ ≤ Σ_t ‖ ∂²/∂θ∂σ log f_σ(ẑ_t(θ)) - ∂²/∂θ∂σ log f_σ(z_t(θ)) ‖
≤ (1/σ³) Σ_t | ẑ_t(θ) (f′/f)′(ẑ_t(θ)/σ) | ‖ ∂(ẑ_t - z_t)/∂θ ‖
+ (1/σ³) Σ_t | ẑ_t(θ) (f′/f)′(ẑ_t(θ)/σ) - z_t(θ) (f′/f)′(z_t(θ)/σ) | ‖ ∂z_t/∂θ ‖
+ (1/σ²) Σ_t | (f′/f)(ẑ_t(θ)/σ) | ‖ ∂(ẑ_t - z_t)/∂θ ‖
+ (1/σ²) Σ_t | (f′/f)(ẑ_t(θ)/σ) - (f′/f)(z_t(θ)/σ) | ‖ ∂z_t/∂θ ‖.  (29)

Similarly, it can be shown that the series associated with each term in the decomposition (29) converges with probability one. Consequently, B̂(ψ) - B(ψ) converges with probability one, and n^{-1} ( B̂(ψ) - B(ψ) ) → 0.

From (19), as n → ∞, there exists a consistent sequence of estimators {ψ̂_n ∈ Q_ε} satisfying the proposed likelihood equations. By Taylor expansion, we have

0 = n^{-1/2} [ ∂ l̂_n(ψ)/∂ψ ] |_{ψ=ψ̂_n}
= n^{-1/2} Σ_t [ ∂/∂ψ log f_σ(ẑ_t(θ)) ] |_{ψ=ψ_0} + n^{-1} B̂(ψ*) n^{1/2} ( ψ̂_n - ψ_0 )
= n^{-1/2} Σ_t [ ∂/∂ψ log f_σ(z_t(θ)) ] |_{ψ=ψ_0} + n^{-1/2} Σ_t [ ∂/∂ψ log f_σ(ẑ_t(θ)) - ∂/∂ψ log f_σ(z_t(θ)) ] |_{ψ=ψ_0}
+ n^{-1} B(ψ_0) n^{1/2} ( ψ̂_n - ψ_0 ) + n^{-1} ( B̂(ψ*) - B(ψ_0) ) n^{1/2} ( ψ̂_n - ψ_0 ),

where ψ* is between ψ̂_n and ψ_0. According to (20) and (21),

0 = n^{-1/2} Σ_t [ ∂/∂ψ log f_σ(z_t(θ)) ] |_{ψ=ψ_0} + n^{-1} B(ψ_0) n^{1/2} ( ψ̂_n - ψ_0 ) + o_p(1),

which implies

n^{1/2} ( ψ̂_n - ψ_0 ) = -[ n^{-1} B(ψ_0) ]^{-1} n^{-1/2} Σ_t [ ∂/∂ψ log f_σ(z_t(θ)) ] |_{ψ=ψ_0} + o_p(1) → N(0, Σ),

where Σ is defined in (7) and (8) of Lii and Rosenblatt (1992).

References

Benveniste, A., Goursat, M., and Ruget, G. (1980). Robust identification of a nonminimum phase system: blind adjustment of a linear equalizer in data communications. IEEE Transactions on Automatic Control AC-25.
Blass, W.E. and Halsey, G.W. (1981). Deconvolution of Absorption Spectra. Academic Press, New York.
Breidt, F.J., Davis, R.A., and Trindade, A.A. (2001). Least absolute deviation estimation for all-pass time series models. Annals of Statistics 29.
Breidt, F.J. and Hsu, N.-J. (2005). Best mean square prediction for moving averages. Statistica Sinica 15, 427-446.
Brockwell, P.J. and Davis, R.A. (1991). Time Series: Theory and Methods, 2nd ed. Springer-Verlag, New York.
Brockwell, P.J. and Davis, R.A. (1996). Introduction to Time Series and Forecasting. Springer-Verlag, New York.
Chi, C.-Y. and Kung, J.-Y. (1995). A new identification algorithm for allpass systems by higher-order statistics. Signal Processing 41.
Chien, H.-M., Yang, H.-L., and Chi, C.-Y. (1997). Parametric cumulant-based phase estimation of 1-d and 2-d nonminimum phase systems by allpass filtering. IEEE Transactions on Signal Processing 45.
Donoho, D. (1981). On minimum entropy deconvolution. In Applied Time Series Analysis II (D.F. Findley, ed.), Academic Press, New York.
Durrett, R. (2005). Probability: Theory and Examples, 3rd ed.
Evans, M. and Swartz, T. (2000). Approximating Integrals via Monte Carlo and Deterministic Methods. Oxford University Press, Oxford.
Gelman, A., Carlin, J.B., Stern, H.S., and Rubin, D.B. (1995). Bayesian Data Analysis. Chapman & Hall, London.
Giannakis, G.B. and Swami, A. (1990). On estimating noncausal nonminimum phase ARMA models of non-Gaussian processes. IEEE Transactions on Acoustics, Speech, and Signal Processing 38.
Godfrey, R. and Rocca, F. (1981). Zero memory non-linear deconvolution. Geophysical Prospecting 29.
Hildebrand, F.B. (1968). Finite-Difference Equations and Simulations. Prentice-Hall, Englewood Cliffs, NJ.
Hsueh, A.-C. and Mendel, J.M. (1985). Minimum-variance and maximum-likelihood deconvolution for noncausal channel models. IEEE Transactions on Geoscience and Remote Sensing 23.
Huang, J. and Pawitan, Y. (2000). Quasi-likelihood estimation of non-invertible moving average processes. Scandinavian Journal of Statistics 27.
Lii, K.-S. and Rosenblatt, M. (1982). Deconvolution and estimation of transfer function phase and coefficients for non-Gaussian linear processes. Annals of Statistics 10.
Lii, K.-S. and Rosenblatt, M. (1992). An approximate maximum likelihood estimation for non-Gaussian non-minimum phase moving average processes. Journal of Multivariate Analysis 43.
Lii, K.-S. and Rosenblatt, M. (1996). Maximum likelihood estimation for non-Gaussian nonminimum phase ARMA sequences. Statistica Sinica 6, 1-22.
Ooe, M. and Ulrych, T.J. (1979). Minimum entropy deconvolution with an exponential transformation. Geophysical Prospecting 27.
Rabiner, L.R. and Schafer, R.W. (1978). Digital Processing of Speech Signals. Prentice-Hall, Englewood Cliffs, NJ.
Rosenblatt, M. (2000). Gaussian and Non-Gaussian Linear Time Series and Random Fields. Springer-Verlag, New York.
Scargle, J.D. (1981). Phase-sensitive deconvolution to model random processes, with special reference to astronomical data. In Applied Time Series Analysis II (D.F. Findley, ed.), Academic Press, New York.
Tierney, L. (1994). Markov chains for exploring posterior distributions (with discussion). Annals of Statistics 22.
Wiggins, R.A. (1978). Minimum entropy deconvolution. Geoexploration 16.
Published version: Hsu, N.-J. and Breidt, F. J. (2009). Exact maximum likelihood estimation for non-Gaussian moving averages. Statistica Sinica 19, 545-560.