TESTS FOR HOMOGENEITY IN NORMAL MIXTURES IN THE PRESENCE OF A STRUCTURAL PARAMETER: TECHNICAL DETAILS
By Hanfeng Chen and Jiahua Chen¹

Bowling Green State University and University of Waterloo

Abstract. Often a question arises as to whether the observed data are a sample from a homogeneous population or have come from a heterogeneous population. In particular, one wants to test for a single normal distribution versus a mixture of two normal distributions. Classic asymptotic results fail to apply to this problem since the model does not satisfy the regularity conditions. This paper investigates the large sample behavior of the likelihood ratio statistic for testing homogeneity in the normal mixture in location parameters with an unknown structural parameter. It is proved that the asymptotic null distribution of the likelihood ratio statistic is the maximum of a $\chi^2$-variable and the supremum of the square of a truncated Gaussian process with mean 0 and variance 1. This reveals the unusual large sample behavior of the likelihood function under the null distribution. The correlation structure of the process involved in the limiting distribution is presented explicitly. From the large sample study, it is also found that even though the structural parameter is not part of the mixing distribution, the convergence rate of its maximum likelihood estimate is $n^{-1/4}$ rather than $n^{-1/2}$, while the mixing distribution has convergence rate $n^{-1/8}$ rather than $n^{-1/4}$. This is in sharp contrast to ordinary semi-parametric models and to mixture models without a structural parameter.

Key words and phrases: Asymptotic distribution, Gaussian process, genetic analysis, finite mixture, likelihood ratio, non-regular model, semi-parametric model.

AMS 1980 subject classifications. Primary 62F03; secondary 62F05.

¹The work was supported in part by a grant from NSERC of Canada and an FIL grant from Bowling Green State University.
1 Introduction

Consider the following problem. Let $X_1,\ldots,X_n$ be a random sample from a mixture population $(1-\alpha)N(\theta_1,\sigma^2)+\alpha N(\theta_2,\sigma^2)$ with probability density function (pdf)
$$(1-\alpha)\sigma^{-1}\varphi((x-\theta_1)/\sigma) + \alpha\sigma^{-1}\varphi((x-\theta_2)/\sigma), \quad (1)$$
where $\varphi(\cdot)$ is the pdf of the standard normal $N(0,1)$. We wish to test
$$H_0: \alpha(1-\alpha) = 0 \ \text{ or } \ \theta_1 = \theta_2,$$
versus the full model (1), i.e., to test $N(\theta,\sigma^2)$ versus $(1-\alpha)N(\theta_1,\sigma^2)+\alpha N(\theta_2,\sigma^2)$. The mixture pdf (1) can also be expressed as an integral $\int\sigma^{-1}\varphi((x-u)/\sigma)\,dG(u)$, with the mixing distribution $G(u) = (1-\alpha)I(u\ge\theta_1)+\alpha I(u\ge\theta_2)$.

For parametric hypothesis testing problems it is customary to use the likelihood ratio as a test statistic. Under standard regularity conditions, a classic result of Wilks (1938) states that if the null hypothesis is true, the likelihood ratio test (LRT) statistic has, asymptotically, a $\chi^2$-distribution. However, the regularity conditions are not satisfied for the mixture problem considered here. First, the null hypothesis lies on the boundary of the parameter space, whereas the standard regularity conditions require it to be in the interior. Secondly, the two statements $\alpha = 0$ and $\theta_1 = \theta_2$, which equivalently specify the null hypothesis, are not mutually exclusive. That is, there is a loss of identifiability under the null model. One may think that the unidentifiability can be eliminated by reparameterization. In that scenario, a third problem appears: the Fisher information, which characterizes the behavior of the maximum likelihood estimate (MLE), degenerates. Due to these irregularities, the classic results break down under the mixture model: the maximum likelihood estimators (MLEs) of some model parameters are inconsistent; the usual quadratic approximation to the likelihood function is no longer appropriate; Cramér's (1946) result on the
asymptotic normality of the MLE and Wilks's (1938) asymptotic $\chi^2$-theory of the LRT do not hold. Cheng and Traylor (1995) identified the mixture model as one of four non-regular parametric models. Due to the appealing challenge it poses in theoretical study and its important applications in various scientific disciplines, such as human genetic linkage analysis, actuarial science and statistical ecology, there has been increasing interest in the mixture model in recent years (e.g., Hartigan, 1985; Ghosh and Sen, 1985; Lindsay, 1989; Leroux, 1992; Chernoff and Lander, 1995; Dacunha-Castelle and Gassiat, 1999; Lemdani and Pons, 1999; Chen and Chen, 2001a and b). The large sample behavior of the LRT for homogeneity in the mixture model has indeed been a long-standing mystery. Hartigan (1985) showed that the LRT statistic tends to infinity with probability one if the mean parameters are unbounded. The divergent behavior of the LRT is further detailed by Bickel and Chernoff (1993). One important implication of Hartigan's result is that a boundedness assumption on the mean parameters is necessary for the LRT to have a limiting distribution. Under the boundedness assumption on the mean parameters, Ghosh and Sen (1985) gave the first version of the asymptotic distribution of the LRT for testing homogeneity. However, in addition to boundedness, they had to impose a separation condition, i.e., $|\theta_1-\theta_2| \ge \epsilon$ for some given $\epsilon > 0$. The separation condition is obviously unsatisfactory. There have been many attempts to remove it. Lemdani and Pons (1999) used a reparameterization approach to investigate the testing problem when one of the mean parameters is known, and their study showed that there is no obvious way to remove the separation condition. Dacunha-Castelle and Gassiat (1999) developed a general reparameterization method for the testing problem in locally conic models.
Their results can be applied to some useful mixture models in certain situations, and, especially interestingly, to stationary ARMA models. In the meantime, Chen and Chen (2001a and b) took a different approach, the so-called sandwich method, to attack the problem without the separation condition.
From the discussion above, removing the separation condition has been one of the central issues in the large sample study of the LRT for homogeneity since Ghosh and Sen (1985). Moreover, existing studies have been confined to mixture models without a structural parameter. This paper investigates the general problem: (a) a structural parameter is included in the mixture model to bring the model closer to reality, and the structural parameter is not required to be bounded; (b) the test for homogeneity is considered in full generality, i.e., both mean parameters are assumed unknown; (c) the separation condition is removed from the model. Following Chen and Chen (2001a and b), we use the sandwich method to derive the asymptotic distribution of the LRT. The main challenge in the present problem, however, is to analyze the contribution to variation due to estimating the structural parameter and to separate it from that due to estimating the mixing distribution $G$. Even though the paper deals with normal mixtures, the ideas and technical treatments are applicable to general parametric mixture models. Study of the normal mixtures elucidates the difficulties and clarifies the main issues in mixture models, some of which are often buried in the analytic conditions of a general set-up. We start our study in Section 2 with the case of the single mean parameter mixture model, in which one of the mean parameters $\theta_1$ and $\theta_2$ in model (1) is assumed known. While the study of single mean parameter mixtures has its own virtue, the main purpose of the section is to demonstrate the main ideas behind our approach and to outline the study for the general mixture model (1). The asymptotic distribution of the LRT for homogeneity under model (1) is investigated in Section 3. It is shown that the asymptotic null distribution of the LRT statistic is the maximum of a $\chi^2$-variable and the supremum of the square of a truncated Gaussian process with mean 0 and variance 1.
This reveals the unusual large sample behavior of the likelihood function.
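For concreteness, data from model (1) can be simulated directly. A minimal sketch; the parameter values below are hypothetical and chosen only for illustration:

```python
import numpy as np

def sample_mixture(n, alpha, theta1, theta2, sigma, seed=None):
    """Draw n observations from (1 - alpha) N(theta1, sigma^2) + alpha N(theta2, sigma^2)."""
    rng = np.random.default_rng(seed)
    z = rng.random(n) < alpha                 # latent component indicator
    means = np.where(z, theta2, theta1)
    return means + sigma * rng.standard_normal(n)

# Under H0 (alpha = 0 or theta1 = theta2) the sample is homogeneous N(theta, sigma^2).
x_null = sample_mixture(1000, 0.0, 0.0, 2.0, 1.0, seed=1)   # homogeneous sample
x_alt  = sample_mixture(1000, 0.3, 0.0, 2.0, 1.0, seed=1)   # 30% contamination at theta2 = 2
```

The contaminated sample has inflated mean and variance relative to the homogeneous one, which is the departure the LRT is designed to detect.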
Throughout the paper, without loss of generality, let the null underlying distribution be $N(0,1)$. For convenience of notation, we say $X_n(t) = O_p(a_n)$ or $= o_p(a_n)$ if $\sup_{t\in T}|X_n(t)/a_n| = O_p(1)$ or $= o_p(1)$, where $T$ is a suitably specified index set and $a_n$ is a sequence of constants or random variables.

2 Single Mean Parameter Mixtures

Assume in model (1) that $\theta_1$ is specified, say $\theta_1 = 0$, and the other mean $\theta_2$ is unknown; write $\theta = \theta_2$. In addition, assume that $|\theta| \le M$. Based on the observations $X_i$, we wish to use the LRT to test the null hypothesis $H_0: N(0,\sigma^2)$ versus $H_a: (1-\alpha)N(0,\sigma^2)+\alpha N(\theta,\sigma^2)$. The log-likelihood function of $\alpha$, $\theta$ and $\sigma$ is
$$l_n(\alpha,\theta,\sigma) = \sum\log\bigl[(1-\alpha)\sigma^{-1}\exp\{-X_i^2/(2\sigma^2)\} + \alpha\sigma^{-1}\exp\{-(X_i-\theta)^2/(2\sigma^2)\}\bigr].$$
Let $\hat\sigma_0$ be the MLE of $\sigma$ under the null hypothesis, i.e., $\hat\sigma_0^2 = n^{-1}\sum X_i^2$. Let $r_n(\alpha,\theta,\sigma) = 2\{l_n(\alpha,\theta,\sigma)-l_n(0,0,\hat\sigma_0)\}$, i.e.,
$$r_n(\alpha,\theta,\sigma) = 2\sum\log\Bigl\{1+\alpha\Bigl(\exp\Bigl\{\frac{2X_i\theta-\theta^2}{2\sigma^2}\Bigr\}-1\Bigr)\Bigr\} - n\log\sigma^2 - \sum\frac{X_i^2}{\sigma^2} + n\Bigl(1+\log\frac{\sum X_i^2}{n}\Bigr). \quad (2)$$
Let $\hat\alpha$, $\hat\theta$ and $\hat\sigma$ be the MLEs of $\alpha$, $\theta$ and $\sigma$ under the full model. Then the LRT rejects the null hypothesis when $R_n = r_n(\hat\alpha,\hat\theta,\hat\sigma)$ is large.

2.1 Large sample behavior of the MLEs

We first show that under the null hypothesis $\hat\sigma^2$ is bounded away from zero and infinity with probability approaching one.
Lemma 1 Under the null distribution $N(0,1)$, there exist constants $0 < \epsilon < \Delta < \infty$ such that $\lim_{n\to\infty} P(\epsilon \le \hat\sigma^2 \le \Delta) = 1$.

Proof. Consider $r_n(\alpha,\theta,\sigma)$ defined by (2). Note that when $2x\theta-\theta^2 \ge 0$,
$$1+\alpha\Bigl[\exp\Bigl\{\frac{2x\theta-\theta^2}{2\sigma^2}\Bigr\}-1\Bigr] \le \exp\Bigl\{\frac{2x\theta-\theta^2}{2\sigma^2}\Bigr\}.$$
We thus have the inequality
$$r_n(\alpha,\theta,\sigma) \le \sum\frac{(2X_i\theta-\theta^2)^+-X_i^2}{\sigma^2} - n\log\sigma^2 + n\Bigl(1+\log\frac{\sum X_i^2}{n}\Bigr), \quad (3)$$
where $t^+ = tI(t>0)$ denotes the positive part of $t$. Since $(2X_i\theta-\theta^2)^+-X_i^2$ is equal to either $-(\theta-X_i)^2$ or $-X_i^2$, we see that
$$r_n(\alpha,\theta,\sigma) \le -n\log\sigma^2 + n\{1+\log(\textstyle\sum X_i^2/n)\}.$$
Since $\log(n^{-1}\sum X_i^2) \to 0$ almost surely, the function $r_n(\alpha,\theta,\sigma) < 0$ for all $\sigma^2 > \Delta$ with probability approaching 1, for some large constant $\Delta$. That is, $\lim_{n\to\infty} P(\hat\sigma^2 > \Delta) = 0$.

Next we show that $\hat\sigma^2$ is also bounded away from zero asymptotically. By the uniform strong law of large numbers (see Rubin, 1956),
$$n^{-1}\sum\{X_i^2-(2\theta X_i-\theta^2)^+\} \to S(\theta) = E\{X^2-(2\theta X-\theta^2)^+\},$$
almost surely and uniformly in $|\theta| \le M$. Since $S(\theta)$ is continuous and positive, the minimum value of $S(\theta)$ is positive, say equal to $q$ for some $q > 0$. Then, with probability approaching one, uniformly in $\alpha$, $\theta$ and $\sigma$,
$$r_n(\alpha,\theta,\sigma) \le -\frac{nq}{\sigma^2} - n\log\sigma^2 + n\Bigl(1+\log\frac{\sum X_i^2}{n}\Bigr).$$
Let $\epsilon > 0$ be small enough that $-q/\epsilon-\log\epsilon+1 < 0$. It follows that, with probability approaching 1 uniformly, $r_n(\alpha,\theta,\sigma) < 0$ if $\sigma^2 < \epsilon$, implying that $\lim_{n\to\infty} P(\hat\sigma^2 \ge \epsilon) = 1$.

By Lemma 1, the parameter space of interest can be reduced to a compact one by restricting $\sigma^2$ to the interval $[\epsilon,\Delta]$.
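The statistic $r_n(\alpha,\theta,\sigma)$ of (2), and hence $R_n$, can be evaluated numerically. A crude sketch (grid search over hypothetical ranges rather than exact maximization, so it only approximates the MLE):

```python
import numpy as np

def rn(x, alpha, theta, sigma):
    """r_n = 2{l_n(alpha, theta, sigma) - l_n(0, 0, sigma0_hat)}, theta1 = 0 known."""
    def logphi(v, mu, s):
        return -0.5 * np.log(2 * np.pi * s * s) - (v - mu) ** 2 / (2 * s * s)
    ln = np.sum(np.logaddexp(np.log1p(-alpha) + logphi(x, 0.0, sigma),
                             np.log(alpha) + logphi(x, theta, sigma)))
    s0 = np.sqrt(np.mean(x ** 2))             # MLE of sigma under the null
    ln0 = np.sum(logphi(x, 0.0, s0))
    return 2.0 * (ln - ln0)

def Rn_grid(x, M=3.0):
    """Approximate R_n = sup r_n over a coarse grid; a sketch, not the exact MLE."""
    best = 0.0                                # r_n = 0 at the null fit itself
    for a in np.linspace(0.01, 0.99, 25):
        for t in np.linspace(-M, M, 25):
            for s in np.linspace(0.7, 1.5, 17):
                best = max(best, rn(x, a, t, s))
    return best

x = np.random.default_rng(7).standard_normal(300)   # a sample from the null N(0, 1)
```

At $\theta = 0$ the mixture collapses to the null fit, so $r_n(\alpha, 0, \hat\sigma_0) = 0$ for any $\alpha$, which is a useful sanity check on the implementation.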
Lemma 2 Under the null distribution $N(0,1)$, as $n\to\infty$, $\hat\alpha\hat\theta \to 0$ and $\hat\sigma \to 1$, in probability.

Proof. As remarked, we only need to consider $\epsilon \le \sigma^2 \le \Delta$ for some constants $0 < \epsilon < 1 < \Delta < \infty$. Let $\mathcal{G} = \{G(\cdot) : G(u) = (1-\alpha)I(u\ge 0)+\alpha I(u\ge\theta),\ |\theta| \le M,\ 0 \le \alpha \le 1\}$. Let the space $\mathcal{G}$ be metrized by the Lévy distance between two distribution functions $G_1$ and $G_2$:
$$\lambda(G_1,G_2) = \inf\{\tau > 0 : G_1(u-\tau)-\tau \le G_2(u) \le G_1(u+\tau)+\tau,\ \text{for all } u\}.$$
(It is well known that convergence in the Lévy distance is equivalent to weak convergence of distribution functions; see, e.g., Chow and Teicher, 1978.) For any sequence $G_j(u) = (1-\alpha_j)I(u\ge 0)+\alpha_jI(u\ge\theta_j)$ in $\mathcal{G}$, since $|\theta_j| \le M < \infty$ and $0 \le \alpha_j \le 1$, one can find a subsequence $G_{j'}$ such that both $\alpha_{j'}$ and $\theta_{j'}$ converge, say to $\alpha^*$ and $\theta^*$, implying that the subsequence $G_{j'}$ converges weakly to $G^*(u) = (1-\alpha^*)I(u\ge 0)+\alpha^*I(u\ge\theta^*) \in \mathcal{G}$, i.e., $\lambda(G_{j'},G^*) \to 0$. It is thus shown that $\mathcal{G}$ is compact, and so is the product space $\Omega = \{\omega = (\sigma,G) : \sigma^2 \in [\epsilon,\Delta],\ G \in \mathcal{G}\}$. Moreover, for $\omega = (\sigma,G) \in \Omega$, put
$$f(x;\omega) = \int\sigma^{-1}\varphi\{(x-u)/\sigma\}\,dG(u).$$
Then the parameter $\omega \in \Omega$ is identifiable, i.e., for any $\omega_1,\omega_2 \in \Omega$, $f(x;\omega_1) = f(x;\omega_2)$ for all $x$ implies $\omega_1 = \omega_2$. Given the compactness and identifiability, Wald's (1949) argument leads to consistency of the MLE $\hat\omega$ of $\omega = (\sigma,G)$ under the null model $\omega_0 = (1,G_0)$, where $G_0(u) = I(u\ge 0)$. To see this, take any small $\gamma > 0$ such that $\{\omega = (\sigma,G) : \|\omega-\omega_0\| = |\sigma-1|+\lambda(G,G_0) < \gamma\} \subset \Omega$.
Let $\Omega_\gamma = \{\omega \in \Omega : \|\omega-\omega_0\| \ge \gamma\}$. For any $\omega' \in \Omega_\gamma$, define an open ball in $\Omega$ as $B(\omega',\eta) = \{\omega \in \Omega : \|\omega-\omega'\| < \eta\}$ and define
$$f(x\mid B) = \sup_{\omega\in B}\{f(x;\omega)\}.$$
It is seen that $f(x\mid B(\omega',\eta)) \le (2\pi\epsilon)^{-1/2}$, so that $E\{|\log f(X\mid B(\omega',\eta))|\} < \infty$, and as $\eta \to 0$, $E\{\log f(X\mid B(\omega',\eta))\} \to E\{\log f(X;\omega')\}$. For each $\omega'$, take $\eta(\omega')$ small enough that $E\{\log[f(X\mid B(\omega',\eta(\omega')))/\varphi(X)]\} < 0$. (Note that $f(x;\omega_0) = \varphi(x)$.) Then there are finitely many $\omega'$s, say $\omega_1,\ldots,\omega_m$, such that $\cup_{j=1}^m B_j \supset \Omega_\gamma$, where $B_j = B(\omega_j,\eta(\omega_j))$. By the law of large numbers, for $1 \le j \le m$,
$$P\Bigl\{n^{-1}\sum_{i=1}^n\log[f(X_i\mid B_j)/\varphi(X_i)] \ge 0,\ \text{i.o.}\Bigr\} = 0,$$
where i.o. stands for infinitely often. Consequently, for the MLE $\hat\omega = (\hat\sigma,\hat G)$,
$$P\{\hat\omega \in \Omega_\gamma,\ \text{i.o.}\} \le \sum_{j=1}^m P\Bigl\{n^{-1}\sum_{i=1}^n\log[f(X_i\mid B_j)/\varphi(X_i)] \ge 0,\ \text{i.o.}\Bigr\} = 0,$$
i.e., $\hat\sigma \to 1$ and $\lambda(\hat G,G_0) \to 0$ in probability. Finally, this implies that the MLEs of the moments $\int u^k\,dG(u) = \alpha\theta^k$ are also consistent. Note that $\int u^k\,dG_0(u) = 0$ under the null model $N(0,1)$. The lemma is proved.

Owing to Lemma 2, $\sigma$ can be viewed as restricted to a small neighborhood of $\sigma = 1$, say $[1-\delta,1+\delta]$ for a small positive $\delta$. This restriction will be used to ensure the tightness of some processes later. We point out that Lemma 2 does not imply anything about the rate of convergence. We also remark that Lemma 2 does not say that $\hat\alpha$ or $\hat\theta$ is consistent; in fact, $\hat\alpha$ and $\hat\theta$ are inconsistent under the null model. See Chernoff and Lander's (1995) discussion of the binomial mixture model, which also applies to the normal mixture model.
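The Lévy distance used above to metrize $\mathcal{G}$ can be approximated numerically for two-point mixing distributions. A sketch; the evaluation grid and the search grid over $\tau$ are arbitrary choices:

```python
import numpy as np

def levy_distance(F, G, grid, taus=np.linspace(1e-4, 1.0, 2000)):
    """Smallest tau on a grid with F(u - tau) - tau <= G(u) <= F(u + tau) + tau for all u."""
    for tau in taus:
        if np.all(F(grid - tau) - tau <= G(grid) + 1e-12) and \
           np.all(G(grid) <= F(grid + tau) + tau + 1e-12):
            return float(tau)
    return float("inf")

def two_point_cdf(alpha, theta1, theta2):
    """CDF of the mixing distribution G(u) = (1 - alpha) I(u >= theta1) + alpha I(u >= theta2)."""
    return lambda u: (1.0 - alpha) * (u >= theta1) + alpha * (u >= theta2)

grid = np.linspace(-3.0, 3.0, 1201)
G0 = two_point_cdf(0.0, 0.0, 0.0)     # degenerate at 0: the null mixing distribution
G1 = two_point_cdf(0.2, 0.0, 1.0)     # mass 0.2 moved to theta = 1 (hypothetical)
```

For these two CDFs the vertical gap of 0.2 on $[0,1)$ forces $\lambda(G_0,G_1) = 0.2$, which the search recovers up to grid resolution.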
2.2 Asymptotic distribution of the LRT

We proceed to study the large sample behavior of the LRT. A sandwich idea is employed to derive the asymptotic null distribution of $R_n$. We first establish an asymptotic upper bound for $R_n$. Write
$$r_n(\alpha,\theta,\sigma) = 2\{l_n(\alpha,\theta,\sigma)-l_n(0,0,1)\} + 2\{l_n(0,0,1)-l_n(0,0,\hat\sigma_0)\} = r_{1n}(\alpha,\theta,\sigma) + r_{2n},$$
where $r_{1n}(\alpha,\theta,\sigma) = 2\{l_n(\alpha,\theta,\sigma)-l_n(0,0,1)\}$, $r_{2n} = 2\{l_n(0,0,1)-l_n(0,0,\hat\sigma_0)\}$ and $\hat\sigma_0^2 = n^{-1}\sum X_i^2$, the MLE of $\sigma^2$ under the null model.

We first analyze $r_{1n}(\alpha,\theta,\sigma)$. Express $r_{1n}(\alpha,\theta,\sigma) = 2\sum\log(1+\delta_i)$, where $\delta_i = (\sigma^2-1)U_i(\sigma)+\alpha\theta Y_i(\theta,\sigma)$, with
$$U_i(\sigma) = (\sigma^2-1)^{-1}\Bigl[\frac{1}{\sigma}\exp\Bigl\{\frac{X_i^2}{2}\Bigl(1-\frac{1}{\sigma^2}\Bigr)\Bigr\}-1\Bigr], \quad (4)$$
and
$$Y_i(\theta,\sigma) = \frac{1}{\sigma\theta}\Bigl[\exp\Bigl\{-\frac{(X_i-\theta)^2}{2\sigma^2}+\frac{X_i^2}{2}\Bigr\} - \exp\Bigl\{-\frac{X_i^2}{2\sigma^2}+\frac{X_i^2}{2}\Bigr\}\Bigr]. \quad (5)$$
The functions $U_i(\sigma)$ and $Y_i(\theta,\sigma)$ are continuously differentiable upon defining $U_i(1) = (X_i^2-1)/2$ and $Y_i(0,\sigma) = \sigma^{-3}X_i\exp\{X_i^2(\sigma^2-1)/(2\sigma^2)\}$. Also note that under the null distribution $N(0,1)$, $E\{U_i(\sigma)\} = 0$ and $E\{Y_i(\theta,\sigma)\} = 0$ for any $\sigma$ and $\theta$. By the inequality $\log(1+x) \le x-x^2/2+x^3/3$, we have
$$r_{1n}(\alpha,\theta,\sigma) = 2\sum\log(1+\delta_i) \le 2\sum\delta_i - \sum\delta_i^2 + \frac{2}{3}\sum\delta_i^3.$$
Rewrite $\delta_i$ as
$$\delta_i = (\sigma^2-1)U_i(1) + \alpha\theta Y_i(\theta,1) + \epsilon_{in}, \quad (6)$$
where the remainder $\epsilon_{in} = (\sigma^2-1)\{U_i(\sigma)-U_i(1)\} + \alpha\theta\{Y_i(\theta,\sigma)-Y_i(\theta,1)\}$. The following proposition can be used to estimate the sum of the remainders.

Proposition 1 Let $0 < \delta < 1$. Then under the null distribution $N(0,1)$, the processes
$$U_n^*(\sigma) = n^{-1/2}\sum\{U_i(\sigma)-U_i(1)\}/(\sigma-1) \quad\text{and}\quad Y_n^*(\theta,\sigma) = n^{-1/2}\sum\{Y_i(\theta,\sigma)-Y_i(\theta,1)\}/(\sigma-1),$$
for $\sigma \in [1-\delta,1+\delta]$ and $|\theta| \le M$, are tight.

Proof. We only need to verify the Lipschitz condition in light of Billingsley (1968, p. 95). That is, for $U_n^*(\sigma)$, to prove that
$$E\{U_n^*(\sigma_1)-U_n^*(\sigma_2)\}^2 \le C(\sigma_1-\sigma_2)^2,$$
for some constant $C$. Since $E\{U_i(\sigma)\} = 0$, it is sufficient to prove that the square of the derivative of $\{U_i(\sigma)-U_i(1)\}/(\sigma-1)$ is bounded by an integrable random variable, say $g(X_i)$. Furthermore, since $\{U_i(\sigma)-U_i(1)\}/(\sigma-1)$ is a second-order difference of the function $H_i(\sigma) = \sigma^{-1}\exp\{-X_i^2(\sigma^{-2}-1)/2\}$, it is enough to prove that $|H_i''(\sigma)|^2 \le g(X_i)$ for all $\sigma \in [1-\delta,1+\delta]$. By direct calculation, we see that for some constant $C$,
$$|H_i''(\sigma)|^2 \le C(X_i^6+X_i^4+X_i^2+1)\exp\{X_i^2\delta/(1+\delta)\}.$$
It is clear that the right-hand side is integrable under the null distribution $N(0,1)$, since $0 < \delta < 1$. Similarly, for $Y_n^*(\theta,\sigma)$, it is sufficient to show that
$$\Bigl|\frac{\partial^2 H_i(\theta,\sigma)}{\partial\theta\,\partial\sigma}\Bigr|^2 + \Bigl|\frac{\partial^3 H_i(\theta,\sigma)}{\partial\theta\,\partial\sigma^2}\Bigr|^2 \le g(X_i),$$
for some integrable $g(X_i)$, where $H_i(\theta,\sigma) = \sigma^{-1}\exp\{-(X_i-\theta)^2/(2\sigma^2)+X_i^2/2\}$. Again by direct calculation, we have, for some constant $C$,
$$\Bigl|\frac{\partial^2 H_i(\theta,\sigma)}{\partial\theta\,\partial\sigma}\Bigr|^2 \le C(X_i^2+|X_i|+1)\exp\Bigl\{-\frac{(X_i-\theta)^2}{1+\delta}+X_i^2\Bigr\} \le C(X_i^2+|X_i|+1)\exp\{\delta X_i^2/(1+\delta)+2M|X_i|\}.$$
The rightmost side of the above inequality is again integrable under the null distribution $N(0,1)$, as $0 < \delta < 1$. A similar argument shows that $|\partial^3 H_i(\theta,\sigma)/\partial\theta\,\partial\sigma^2|$ is bounded above by an integrable random variable. The proof is completed.

By Proposition 1, $U_n^*(\sigma) = O_p(1)$ and $Y_n^*(\theta,\sigma) = O_p(1)$, implying that
$$\sum\epsilon_{in} = n^{1/2}(\sigma-1)^2O_p(1) + n^{1/2}\alpha\theta(\sigma-1)O_p(1). \quad (7)$$
(Note that, by the convention, $U_n^*(\sigma) = O_p(1)$ means $\sup_{|\sigma-1|\le\delta}|U_n^*(\sigma)| = O_p(1)$, and $Y_n^*(\theta,\sigma) = O_p(1)$ means $\sup_{|\theta|\le M,\,|\sigma-1|\le\delta}|Y_n^*(\theta,\sigma)| = O_p(1)$.) For convenience of notation, put $E_{n1} = (\sigma-1)^2O_p(1)$, $E_{n2} = \alpha\theta(\sigma-1)O_p(1)$, $U_i = U_i(1)$, and $Y_i(\theta) = Y_i(\theta,1)$. By (6) and (7), we obtain
$$\sum\delta_i = \sum\{(\sigma^2-1)U_i+\alpha\theta Y_i(\theta)\} + n^{1/2}(E_{n1}+E_{n2}). \quad (8)$$
Similarly, we can replace $\sigma$ with 1 in the square and cubic terms of $\delta_i$, and arrive at the following:
$$\sum\delta_i^2 = \sum\{(\sigma^2-1)U_i+\alpha\theta Y_i(\theta)\}^2 + n(E_{n1}^2+E_{n2}^2), \quad (9)$$
and
$$\sum\delta_i^3 - \sum\{(\sigma^2-1)U_i+\alpha\theta Y_i(\theta)\}^3 = n(|E_{n1}|^3+|E_{n2}|^3). \quad (10)$$
It is important to note that in (10) the remainder terms have a factor of $n$ rather than $n^{3/2}$. To see this, e.g.,
$$\sum(\sigma-1)^3\{U_i(\sigma)-U_i\}^3 = n(\sigma-1)^6\cdot\frac{1}{n}\sum\bigl[\{U_i(\sigma)-U_i\}/(\sigma-1)\bigr]^3 = n(\sigma-1)^6O_p(1) = n|E_{n1}|^3.$$
Now by (8), (9) and (10),
$$r_{1n}(\alpha,\theta,\sigma) \le 2\sum\{(\sigma^2-1)U_i+\alpha\theta Y_i(\theta)\} - \sum\{(\sigma^2-1)U_i+\alpha\theta Y_i(\theta)\}^2 + \frac{2}{3}\sum\{(\sigma^2-1)U_i+\alpha\theta Y_i(\theta)\}^3 + n^{1/2}(E_{n1}+E_{n2}) + n\sum_{j=2}^3(|E_{n1}|^j+|E_{n2}|^j). \quad (11)$$
Introduce $Z_i(\theta) = Y_i(\theta)-\theta U_i$. Then $(\sigma^2-1)U_i+\alpha\theta Y_i(\theta) = t_1U_i+t_2Z_i(\theta)$, where $t_1 = \sigma^2-1+\alpha\theta^2$, $t_2 = \alpha\theta$. Since $U_i$ and $Z_i(\theta)$ are orthogonal, i.e., $E\,U_iZ_i(\theta) = 0$, the cubic and remainder terms in (11) are controlled by the square term. In fact, the square sum times $n^{-1}$ converges uniformly to a positive definite quadratic form in $t_1$ and $t_2$, and $n^{-1}\sum\{|Z_i(\theta)|^3+|U_i|^3\} = O_p(1)$ uniformly. Thus,
$$\sum|t_1U_i+t_2Z_i(\theta)|^3 \le \sum\{t_1U_i+t_2Z_i(\theta)\}^2\,(|t_1|+|t_2|)O_p(1).$$
As for the remainder terms in (11), since $|\theta| \le M$,
$$n^{1/2}E_{n1} = n^{1/2}(\sigma-1)^2O_p(1) \le n^{1/2}(t_1^2+t_2^2)O_p(1) = o_p\Bigl\{\sum[t_1U_i+t_2Z_i(\theta)]^2\Bigr\},$$
and similarly
$$n^{1/2}E_{n2} \le n^{1/2}\{t_2^2+(\sigma-1)^2\}O_p(1) \le n^{1/2}(t_1^2+t_2^2)O_p(1) = o_p\Bigl\{\sum[t_1U_i+t_2Z_i(\theta)]^2\Bigr\}.$$
(Note that when $t_1 = t_2 = 0$, i.e., $\sigma = 1$ and $\theta = 0$ in the above inequalities, $r_{1n} = 0 = o_p(1)$. Thus this case can be ignored here and on other similar occasions
in the sequel.) The other remainder terms, resulting from the square or cubic sums, are of the same (or higher) order as that from the linear sum. In fact,
$$n(E_{n1}^2+E_{n2}^2) \le n(t_1^2+t_2^2)^2O_p(1) = (t_1^2+t_2^2)O_p\Bigl\{\sum[t_1U_i+t_2Z_i(\theta)]^2\Bigr\},$$
$$n(|E_{n1}|^3+|E_{n2}|^3) \le (|t_1|+|t_2|)O_p(nE_{n1}^2+nE_{n2}^2).$$
It is then concluded that (11) can be expressed as
$$r_{1n}(\alpha,\theta,\sigma) \le 2\sum\{t_1U_i+t_2Z_i(\theta)\} - \sum\{t_1U_i+t_2Z_i(\theta)\}^2\{1+(|t_1|+|t_2|)O_p(1)+o_p(1)\}. \quad (12)$$
Since $U_i$ and $Z_i(\theta)$ are orthogonal, (12) is further reduced to
$$r_{1n}(\alpha,\theta,\sigma) \le 2\sum\{t_1U_i+t_2Z_i(\theta)\} - \Bigl\{t_1^2\sum U_i^2+t_2^2\sum Z_i^2(\theta)\Bigr\}\{1+(|t_1|+|t_2|)O_p(1)+o_p(1)\}. \quad (13)$$
Let $\hat t_1 = \hat\sigma^2-1+\hat\alpha\hat\theta^2$ and $\hat t_2 = \hat\alpha\hat\theta$ be the MLEs. By Lemma 2, $\hat t_1 = o_p(1)$ and $\hat t_2 = o_p(1)$. Consequently, replacement by the MLEs in (13) gives
$$r_{1n}(\hat\alpha,\hat\theta,\hat\sigma) \le 2\sum\{\hat t_1U_i+\hat t_2Z_i(\hat\theta)\} - \Bigl\{\hat t_1^2\sum U_i^2+\hat t_2^2\sum Z_i^2(\hat\theta)\Bigr\}\{1+o_p(1)\}. \quad (14)$$
To obtain a short form for the upper bound, fix $\theta$ and consider the quadratic function
$$Q(t_1,t_2) = 2\sum\{t_1U_i+t_2Z_i(\theta)\} - t_1^2\sum U_i^2 - t_2^2\sum Z_i^2(\theta).$$
If $\theta \ge 0$, then $t_2 = \alpha\theta \ge 0$, and if $\theta < 0$, then $t_2 \le 0$. By considering the regions $\theta \ge 0$ and $\theta < 0$ separately, we see that for fixed $\theta$, $Q(t_1,t_2)$ is maximized at $t_1 = \tilde t_1$ and $t_2 = \tilde t_2$ with
$$\tilde t_1 = \frac{\sum U_i}{\sum U_i^2}, \qquad \tilde t_2 = \mathrm{sgn}(\theta)\,\frac{[\mathrm{sgn}(\theta)\sum Z_i(\theta)]^+}{\sum Z_i^2(\theta)}, \quad (15)$$
where $\mathrm{sgn}(\theta)$ is the sign function, and
$$Q(\tilde t_1,\tilde t_2) = \frac{\{\sum U_i\}^2}{\sum U_i^2} + \frac{[\{\mathrm{sgn}(\theta)\sum Z_i(\theta)\}^+]^2}{\sum Z_i^2(\theta)}.$$
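The constrained maximization yielding (15) can be checked numerically. A sketch: the quadratic below drops the cross term, as justified by the orthogonality of $U_i$ and $Z_i(\theta)$, and the arrays are synthetic stand-ins for the $U_i$ and $Z_i(\theta)$:

```python
import numpy as np

def Q(t1, t2, U, Z):
    """Q(t1, t2) = 2*sum(t1*U + t2*Z) - t1^2*sum(U^2) - t2^2*sum(Z^2), cross term dropped."""
    return 2.0 * (t1 * U.sum() + t2 * Z.sum()) - t1**2 * (U**2).sum() - t2**2 * (Z**2).sum()

def q_argmax(U, Z, sign_theta):
    """Maximizer of Q subject to t2 carrying the sign of theta, as in (15)."""
    t1 = U.sum() / (U**2).sum()
    t2 = sign_theta * max(sign_theta * Z.sum(), 0.0) / (Z**2).sum()
    return t1, t2

rng = np.random.default_rng(0)
U, Z = rng.standard_normal(50), rng.standard_normal(50)   # synthetic stand-ins
t1_opt, t2_opt = q_argmax(U, Z, 1.0)
```

Because $Q$ decouples in $t_1$ and $t_2$, the unconstrained optimum in $t_1$ and the sign-truncated optimum in $t_2$ can be verified against a brute-force grid.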
Therefore, by (14) it follows that
$$r_{1n}(\hat\alpha,\hat\theta,\hat\sigma) \le \frac{\{\sum U_i\}^2}{\sum U_i^2}\{1+o_p(1)\} + \sup_{|\theta|\le M}\frac{[\{\mathrm{sgn}(\theta)\sum Z_i(\theta)\}^+]^2}{\sum Z_i^2(\theta)}\{1+o_p(1)\} = \frac{\{\sum U_i\}^2}{\sum U_i^2} + \sup_{|\theta|\le M}\frac{[\{\mathrm{sgn}(\theta)\sum Z_i(\theta)\}^+]^2}{\sum Z_i^2(\theta)} + o_p(1). \quad (16)$$
Recall that $R_n = r_n(\hat\alpha,\hat\theta,\hat\sigma) = r_{1n}(\hat\alpha,\hat\theta,\hat\sigma)+r_{2n}$, and note that $r_{2n}$ admits an ordinary quadratic approximation, i.e.,
$$r_{2n} = -\frac{\{\sum U_i\}^2}{\sum U_i^2} + o_p(1).$$
An upper bound for $R_n$ is thus obtained as follows:
$$R_n \le \sup_{|\theta|\le M}\frac{[\{\mathrm{sgn}(\theta)\sum Z_i(\theta)\}^+]^2}{nEZ_1^2(\theta)} + o_p(1). \quad (17)$$
Here $nEZ_1^2(\theta)$ substitutes for $\sum Z_i^2(\theta)$, since they are equivalent asymptotically and uniformly.

To obtain a lower bound for $R_n$, let $\epsilon > 0$ be any small fixed number. Let $R_n(\epsilon)$ be the supremum of $r_n(\alpha,\theta,\sigma)$ under the restriction $\epsilon \le |\theta| \le M$. For fixed $\theta$ with $\epsilon \le |\theta| \le M$, let $\bar\alpha(\theta)$ and $\bar\sigma(\theta)$ assume the values determined by (15). Consider the Taylor expansion
$$r_{1n}(\bar\alpha(\theta),\theta,\bar\sigma(\theta)) = 2\sum\bar\delta_i - \sum\bar\delta_i^2(1+\eta_i)^{-2},$$
where $|\eta_i| < |\bar\delta_i|$ and $\bar\delta_i$ is equal to $\delta_i$ in (6) with $\alpha = \bar\alpha(\theta)$ and $\sigma = \bar\sigma(\theta)$. Owing to $\theta$ being bounded away from 0, the solution $\bar\alpha(\theta)$ is feasible, so that $\bar\sigma^2(\theta)-1 = O_p(n^{-1/2})$ and $\bar\alpha(\theta) = O_p(n^{-1/2})$, uniformly in $\epsilon \le |\theta| \le M$. Since $\bar\delta_i = (\bar\sigma^2-1)U_i(\bar\sigma)+\bar\alpha\theta Y_i(\theta,\bar\sigma)$,
$$|\bar\delta_i| \le |\bar\sigma^2-1||U_i(\bar\sigma)| + \bar\alpha|\theta||Y_i(\theta,\bar\sigma)|.$$
For a generic constant $C$,
$$\sup_{|\theta|\le M}|Y_i(\theta,\bar\sigma)| \le C(X^{*2}+1)e^{CX^*} = o_p(n^{1/2}),$$
where $X^* = \max_i|X_i| = O_p(\sqrt{\log n})$ (see Serfling, 1980, p. 91). Similarly, $|U_i(\bar\sigma)| \le CX^{*2}e^{CX^{*2}|\bar\sigma^2-1|} = o_p(n^{1/2})$. Thus, uniformly in $\theta$,
$$\max_i|\eta_i| \le \max_i|\bar\delta_i| = o_p(1), \quad (18)$$
i.e.,
$$r_{1n}(\bar\alpha(\theta),\theta,\bar\sigma(\theta)) = 2\sum\bar\delta_i - \sum\bar\delta_i^2\{1+o_p(1)\}.$$
Thus, from (15) with $\theta$ fixed, $\bar\alpha$ and $\bar\sigma$ are such that
$$r_{1n}(\bar\alpha(\theta),\theta,\bar\sigma(\theta)) = \frac{\{\sum U_i\}^2}{\sum U_i^2} + \frac{[\{\mathrm{sgn}(\theta)\sum Z_i(\theta)\}^+]^2}{nEZ_1^2(\theta)} + o_p(1).$$
It follows that
$$R_n(\epsilon) \ge \sup_{\epsilon\le|\theta|\le M} r_n(\bar\alpha(\theta),\theta,\bar\sigma(\theta)) = \sup_{\epsilon\le|\theta|\le M}\frac{[\{\mathrm{sgn}(\theta)\sum Z_i(\theta)\}^+]^2}{nEZ_1^2(\theta)} + o_p(1). \quad (19)$$

Theorem 1 Let $X_1,\ldots,X_n$ be a random sample from the mixture distribution $(1-\alpha)N(0,\sigma^2)+\alpha N(\theta,\sigma^2)$, where $0 \le \alpha \le 1$, $|\theta| \le M$ and $\sigma > 0$ are otherwise unknown. Let $R_n$ be (twice) the log-likelihood ratio statistic for testing $H_0: \alpha = 0$ or $\theta = 0$, i.e., $N(0,\sigma^2)$. Then under the null distribution $N(0,1)$, as $n\to\infty$,
$$R_n \to \sup_{|\theta|\le M}\{\zeta^+(\theta)\}^2$$
in distribution, where $\zeta(0) = 0$ and, for $0 < |\theta| \le M$, $\zeta(\theta)$ is a Gaussian process with mean 0, variance 1 and autocorrelation given by
$$\rho(s,t) = \mathrm{sgn}(st)\,\frac{e^{st}-1-(st)^2/2}{\sqrt{(e^{s^2}-1-s^4/2)(e^{t^2}-1-t^4/2)}}, \quad (20)$$
for $s,t \ne 0$.

Proof. The proof starts with (17) and (19). The process
$$n^{-1/2}\sum Z_i(\theta)\big/\{EZ_1^2(\theta)\}^{1/2}, \qquad |\theta| \le M,$$
converges weakly to a Gaussian process $\xi(\theta)$. Direct calculation of the mean and covariance of $Z_i(\theta)$ yields that the Gaussian process $\xi(\theta)$ has mean 0, variance 1 and autocorrelation function, for $s,t \ne 0$,
$$\frac{e^{st}-1-(st)^2/2}{\sqrt{(e^{s^2}-1-s^4/2)(e^{t^2}-1-t^4/2)}}.$$
Therefore, the upper bound of $R_n$ converges in distribution to $\sup_{|\theta|\le M}\{\zeta^+(\theta)\}^2$, where $\zeta(0) = 0$ and, for $0 < |\theta| \le M$, $\zeta(\theta) = \mathrm{sgn}(\theta)\xi(\theta)$ follows the Gaussian process with mean 0, variance 1 and autocorrelation function (20). For given $\epsilon > 0$, the lower bound of $R_n$ converges weakly to $R(\epsilon) = \sup_{\epsilon\le|\theta|\le M}\{\zeta^+(\theta)\}^2$. Now letting $\epsilon \to 0$, $R(\epsilon)$ approaches $\sup_{|\theta|\le M}\{\zeta^+(\theta)\}^2$ in distribution. This completes the proof.

3 Two-Mean-Parameter Mixtures: Tests for Homogeneity

In this section we study the testing problem when both mean parameters $\theta_1$ and $\theta_2$ are unknown. In addition, assume $0 \le \alpha \le 1/2$ so that $\theta_1$ and $\theta_2$ are distinguishable. We wish to test $H_0: \alpha = 0$ or $\theta_1 = \theta_2$, versus the full model (1).
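The autocorrelation function (20) of Theorem 1 is straightforward to evaluate; a small numerical check of its basic properties:

```python
import numpy as np

def rho(s, t):
    """Autocorrelation (20): sgn(st)(e^{st}-1-(st)^2/2) / sqrt(v(s) v(t))."""
    def v(u):
        return np.expm1(u * u) - 0.5 * u**4    # e^{u^2} - 1 - u^4/2, positive for u != 0
    num = np.expm1(s * t) - 0.5 * (s * t)**2
    return np.sign(s * t) * num / np.sqrt(v(s) * v(t))
```

In particular, `rho(t, t) == 1` matches the unit variance of $\zeta(\theta)$, and $|\rho(s,t)| \le 1$ follows from the Cauchy-Schwarz inequality applied to the covariance of $Z_1(s)$ and $Z_1(t)$.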
Let $X_1,\ldots,X_n$ be a random sample of size $n$ from a mixture population $(1-\alpha)N(\theta_1,\sigma^2)+\alpha N(\theta_2,\sigma^2)$. Let $r_n(\alpha,\theta_1,\theta_2,\sigma) = 2\{l_n(\alpha,\theta_1,\theta_2,\sigma)-l_n(0,\hat\theta,\hat\theta,\hat\sigma_0)\}$, where $\hat\theta = \bar X$ and $\hat\sigma_0^2 = n^{-1}\sum(X_i-\bar X)^2$ are the MLEs of $\theta_1 = \theta_2 = \theta$ and $\sigma^2$ under the null hypothesis. Explicitly,
$$r_n(\alpha,\theta_1,\theta_2,\sigma) = 2\sum\log\Bigl[(1-\alpha)\sigma^{-1}\exp\Bigl\{-\frac{(X_i-\theta_1)^2}{2\sigma^2}\Bigr\} + \alpha\sigma^{-1}\exp\Bigl\{-\frac{(X_i-\theta_2)^2}{2\sigma^2}\Bigr\}\Bigr] + n\Bigl\{1+\log\frac{\sum(X_i-\bar X)^2}{n}\Bigr\}.$$
Let $\hat\alpha$, $\hat\theta_1$, $\hat\theta_2$ and $\hat\sigma$ be the MLEs of $\alpha$, $\theta_1$, $\theta_2$ and $\sigma$ under the full model $(1-\alpha)N(\theta_1,\sigma^2)+\alpha N(\theta_2,\sigma^2)$. The LRT rejects $H_0$ if the observed $R_n = r_n(\hat\alpha,\hat\theta_1,\hat\theta_2,\hat\sigma)$ is large.

3.1 Large sample behavior of the MLEs

The statement of Lemma 1 remains true, i.e., under the null distribution $N(0,1)$ there are constants $0 < \epsilon < \Delta < \infty$ such that $\lim_{n\to\infty} P(\epsilon \le \hat\sigma^2 \le \Delta) = 1$. The proof is also similar to that of Lemma 1. Note that $r_n(\alpha,\theta_1,\theta_2,\sigma)$ can be expressed as
$$r_n(\alpha,\theta_1,\theta_2,\sigma) = 2\sum\log\Bigl[1+\alpha\Bigl[\exp\Bigl\{\frac{(X_i-\theta_1)^2-(X_i-\theta_2)^2}{2\sigma^2}\Bigr\}-1\Bigr]\Bigr] - n\log\sigma^2 - \sum\frac{(X_i-\theta_1)^2}{\sigma^2} + n\Bigl\{1+\log\frac{\sum(X_i-\bar X)^2}{n}\Bigr\}.$$
Thus the inequality (3) becomes
$$r_n(\alpha,\theta_1,\theta_2,\sigma) \le -\frac{nS_n(\theta_1,\theta_2)}{\sigma^2} - n\log\sigma^2 + n\Bigl\{1+\log\frac{\sum(X_i-\bar X)^2}{n}\Bigr\},$$
where
$$S_n(\theta_1,\theta_2) = n^{-1}\sum_{i=1}^n\bigl[(X_i-\theta_1)^2 - \{(X_i-\theta_1)^2-(X_i-\theta_2)^2\}^+\bigr].$$
Uniformly in $|\theta_i| \le M$, $i = 1,2$, $S_n(\theta_1,\theta_2)$ approaches, almost surely,
$$S(\theta_1,\theta_2) = E(X-\theta_1)^2 - E[\{(X-\theta_1)^2-(X-\theta_2)^2\}^+].$$
The function $S(\theta_1,\theta_2)$ is continuous and positive over $|\theta_i| \le M$, $i = 1,2$. Thus the minimum of $S(\theta_1,\theta_2)$ is positive, as required by the proof of Lemma 1. Lemma 2 is rewritten as follows.

Lemma 3 Under the null distribution $N(0,1)$, as $n\to\infty$, $\hat\theta_1 \to 0$, $(1-\hat\alpha)\hat\theta_1+\hat\alpha\hat\theta_2 \to 0$, $\hat\alpha\hat\theta_2^2 \to 0$ and $\hat\sigma \to 1$, in probability.

Proof. The proof is similar to that of Lemma 2. Consider $\epsilon \le \sigma^2 \le \Delta$ for some constants $0 < \epsilon < 1 < \Delta < \infty$. Let the space $\mathcal{G} = \{G : G(u) = (1-\alpha)I(u\ge\theta_1)+\alpha I(u\ge\theta_2),\ 0 \le \alpha \le 1/2,\ |\theta_i| \le M,\ i = 1,2\}$ be metrized by the Lévy distance. Then the product space $[\epsilon,\Delta]\times\mathcal{G}$ is compact. Furthermore, the parameters $\sigma^2 \in [\epsilon,\Delta]$ and $G \in \mathcal{G}$ are identifiable. Given the compactness and identifiability, Wald's argument leads to consistency of the MLEs of $\sigma$ and $G$. Therefore the MLEs of the moments $\int u^k\,dG(u) = (1-\alpha)\theta_1^k+\alpha\theta_2^k$ are consistent. Under the null distribution $N(0,1)$, $\int u^k\,dG(u) = 0$. Thus $(1-\hat\alpha)\hat\theta_1+\hat\alpha\hat\theta_2 \to 0$ and $(1-\hat\alpha)\hat\theta_1^2+\hat\alpha\hat\theta_2^2 \to 0$, which imply $\hat\alpha\hat\theta_2^2 \to 0$ and $\hat\theta_1 \to 0$, since $1-\hat\alpha \ge 1/2$. The lemma is proved.

In light of Lemma 3, without loss of generality, $\sigma$ can be restricted to a small neighborhood of $\sigma = 1$, say $[1-\delta,1+\delta]$ for a small positive number $\delta$. We proceed to derive the asymptotic distribution of the LRT. The new challenge in the present case is the loss of positive-definiteness of the quadratic term as in (11). To overcome the difficulty, the parameter space is partitioned into two parts: $|\theta_2| > \epsilon$ and $|\theta_2| \le \epsilon$, for an arbitrarily small $\epsilon > 0$. The LRT will be analyzed within each part by using the sandwich approach. Let $R_n(\epsilon;\mathrm{I})$ denote the supremum of the likelihood ratio within the part $|\theta_2| > \epsilon$, and $R_n(\epsilon;\mathrm{II})$ the supremum within $|\theta_2| \le \epsilon$. Then
19 R n = max{r n (ɛ; I), R n (ɛ; II)}. The number ɛ will remain fixed as n approaches infinity. It is easily seen that Lemma 3 remains true under either restriction θ > ɛ or θ ɛ. Dependence on ɛ will be suppressed notationally for the MLE s of the parameters. Thus ˆα, ˆθ 1, ˆθ and ˆσ will denote the constrained MLE s of α, θ 1, θ and σ with restriction θ ɛ in the analysis of R n (ɛ; I), but stand for the constrained MLE s with restriction θ ɛ in the analysis of R n (ɛ; II). 3. Analysis of R n (ɛ; I) We first establish an asymptotic upper bound for R n (ɛ; I). As in Section., write r n (α, θ 1, θ, σ) = {l n (α, θ 1, θ, σ) l n (0, 0, 0, 1)} + {l n (0, 0, 0, 1) l n (0, ˆθ, ˆθ, ˆσ 0 )} = r 1n (α, θ 1, θ, σ) + r n. To analyze r 1n (α, θ 1, θ, σ), express r 1n (α, θ 1, θ, σ) = log(1 + δ i ), where [ 1 δ i = (1 α) σ exp{x i (X i θ 1 ) ] [ 1 } 1 + α σ σ exp{x i (X i θ ) ] } 1 σ = (1 α)θ 1 Y i (θ 1, σ) + αθ Y i (θ, σ) + (σ 1)U i (σ), (1) with Y i (θ, σ) and U i (σ) are defined in (5) and (4). Re-write δ i = m 1 Y i (0, 1) + (σ 1 + m )U i (1) + m 3 V i (θ ) + ɛ in, where ɛ in is the remainder term of replacement, and m 1 = (1 α)θ 1 + αθ, m = (1 α)θ 1 + αθ, m 3 = αθ 3, V i (θ ) = Y i(θ, 1) Y i (0, 1) θ U i (1). () θ Define V i (0) = (X i /) + (Xi 3 /6) so that the function V i (θ) is continuously differentiable. By a similar analysis to the single mean parameter case, it is seen that the total remainder satisfies ɛ n = ɛ in = O p { n σ 1 [ m 1 + θ1 + αθ + σ 1 ] + n θ1 }. 3 (3) 19
Recall that $U_i = U_i(1) = (X_i^2-1)/2$ and $Y_i(0,1) = X_i$. We have
$$\sum\delta_i = m_1\sum X_i + (\sigma^2-1+m_2)\sum U_i + m_3\sum V_i(\theta_2) + \epsilon_n.$$
Since the remainders resulting from the square and cubic sums are of the same (or higher) order as that from the linear sum (see the similar analysis in the case of single-mean-parameter mixtures), we have
$$r_{1n}(\alpha,\theta_1,\theta_2,\sigma) \le 2\sum\delta_i - \sum\delta_i^2 + \frac{2}{3}\sum\delta_i^3 = 2\sum\{m_1X_i+(\sigma^2-1+m_2)U_i+m_3V_i(\theta_2)\} - \sum\{m_1X_i+(\sigma^2-1+m_2)U_i+m_3V_i(\theta_2)\}^2 + \frac{2}{3}\sum\{m_1X_i+(\sigma^2-1+m_2)U_i+m_3V_i(\theta_2)\}^3 + O_p\bigl\{n^{1/2}|\sigma-1|[|m_1|+\theta_1^2+\alpha\theta_2^2+|\sigma-1|]+n^{1/2}|\theta_1|^3\bigr\}.$$
Furthermore, the cubic sum is negligible when compared to the square sum. This can be justified by using the idea leading to (14). First, the square sum times $n^{-1}$ approaches $E\{m_1X_1+(\sigma^2-1+m_2)U_1+m_3V_1(\theta_2)\}^2$ uniformly; the limit is a positive definite quadratic form in the variables $m_1$, $\sigma^2-1+m_2$ and $m_3$. Next, noting that $X_i$, $U_i$ and $V_i(\theta_2)$ are mutually orthogonal, we see that
$$r_{1n}(\hat\alpha,\hat\theta_1,\hat\theta_2,\hat\sigma) \le 2\Bigl\{\hat m_1\sum X_i+(\hat\sigma^2-1+\hat m_2)\sum U_i+\hat m_3\sum V_i(\hat\theta_2)\Bigr\} - \Bigl\{\hat m_1^2\sum X_i^2+(\hat\sigma^2-1+\hat m_2)^2\sum U_i^2+\hat m_3^2\sum V_i^2(\hat\theta_2)\Bigr\}\{1+o_p(1)\} + \hat\epsilon_n.$$
Here the terms with a hat are the constrained MLEs under the restriction $|\theta_2| > \epsilon$, as remarked at the end of Section 3.1. In particular, from (23),
$$\hat\epsilon_n = O_p\bigl\{n^{1/2}|\hat\sigma-1|[|\hat m_1|+\hat\theta_1^2+\hat\alpha\hat\theta_2^2+|\hat\sigma-1|]+n^{1/2}|\hat\theta_1|^3\bigr\}.$$
By the Cauchy inequality [e.g., $2n^{1/2}|\hat m_1| \le 1+n\hat m_1^2$] and the restriction $|\theta_2| > \epsilon$ (hence $|\hat\theta_2| \ge \epsilon$), we have
$$n^{1/2}|\hat\sigma-1|[|\hat m_1|+\hat\theta_1^2+\hat\alpha\hat\theta_2^2+|\hat\sigma-1|]+n^{1/2}|\hat\theta_1|^3 \le |\hat\sigma-1|\bigl[4+n\{\hat m_1^2+\hat\theta_1^4+(\hat\alpha\hat\theta_2^2)^2+(\hat\sigma-1)^2\}\bigr]+|\hat\theta_1|(1+n\hat\theta_1^4) = o_p(1)+n\,o_p\{\hat m_1^2+\hat\theta_1^4+(\hat\alpha\hat\theta_2^2)^2+(\hat\sigma-1)^2\} = o_p(1)+n\,o_p\{\hat m_1^2+(\hat\sigma^2-1+\hat m_2)^2+\hat m_3^2\}.$$
Hence the remainder term $\hat\epsilon_n$ can also be absorbed into the quadratic sum, i.e.,
$$r_{1n}(\hat\alpha,\hat\theta_1,\hat\theta_2,\hat\sigma) \le 2\Bigl\{\hat m_1\sum X_i+(\hat\sigma^2-1+\hat m_2)\sum U_i+\hat m_3\sum V_i(\hat\theta_2)\Bigr\} - \Bigl\{\hat m_1^2\sum X_i^2+(\hat\sigma^2-1+\hat m_2)^2\sum U_i^2+\hat m_3^2\sum V_i^2(\hat\theta_2)\Bigr\}\{1+o_p(1)\}+o_p(1).$$
Applying the argument leading to (16), the right-hand side of the above inequality becomes even greater when $\hat m_1$, $\hat\sigma^2-1+\hat m_2$ and $\hat m_3$ are replaced with
$$\tilde m_1 = \frac{\sum X_i}{\sum X_i^2}, \qquad \widetilde{\sigma^2-1+m_2} = \frac{\sum U_i}{\sum U_i^2}, \qquad \tilde m_3 = \mathrm{sgn}(\theta)\,\frac{[\mathrm{sgn}(\theta)\sum V_i(\theta)]^+}{\sum V_i^2(\theta)}, \quad (24)$$
for any $\epsilon < |\theta| \le M$, so that
$$r_{1n}(\hat\alpha,\hat\theta_1,\hat\theta_2,\hat\sigma) \le \frac{\{\sum X_i\}^2}{\sum X_i^2} + \frac{\{\sum U_i\}^2}{\sum U_i^2} + \sup_{\epsilon<|\theta|\le M}\frac{[\{\mathrm{sgn}(\theta)\sum V_i(\theta)\}^+]^2}{\sum V_i^2(\theta)} + o_p(1), \quad (25)$$
or
$$r_{1n}(\hat\alpha,\hat\theta_1,\hat\theta_2,\hat\sigma) \le \frac{\{\sum X_i\}^2}{n} + \frac{2\{\sum U_i\}^2}{n} + \sup_{\epsilon<|\theta|\le M}\frac{[\{\mathrm{sgn}(\theta)\sum V_i(\theta)\}^+]^2}{nEV_1^2(\theta)} + o_p(1).$$
On the other hand, the classic analysis gives
$$r_{2n} = 2\{l_n(0,0,0,1)-l_n(0,\hat\theta,\hat\theta,\hat\sigma_0)\} = -n\bar X^2 - \frac{2}{n}\Bigl\{\sum U_i\Bigr\}^2 + o_p(1). \quad (26)$$
Combining (25) and (26) yields
$$R_n(\epsilon;\mathrm{I}) \le \sup_{\epsilon<|\theta|\le M}\frac{[\{\mathrm{sgn}(\theta)\sum V_i(\theta)\}^+]^2}{nEV_1^2(\theta)} + o_p(1).$$
We thus have established an asymptotic upper bound for $R_n(\epsilon;\mathrm{I})$. Again we can see that the upper bound is achievable. For fixed $\theta$ with $|\theta| \ge \epsilon$, let $\bar\alpha$, $\bar\theta_1$ and $\bar\sigma$ be the solutions for $\alpha$, $\theta_1$ and $\sigma$ of (24). Then $\bar\alpha = O_p(n^{-1/2})$, $\bar\theta_1 = O_p(n^{-1/2})$ and $\bar\sigma-1 = O_p(n^{-1/2})$, uniformly in $\theta$; the uniformity is ensured by the restriction $|\theta| > \epsilon$. Use the Taylor expansion
$$r_{1n}(\bar\alpha,\bar\theta_1,\theta,\bar\sigma) = 2\sum\bar\delta_i - \sum\bar\delta_i^2(1+\eta_i)^{-2},$$
where $|\eta_i| < |\bar\delta_i|$. The argument leading to (18) also proves $\max_i|\eta_i| \le \max_i|\bar\delta_i| = o_p(1)$, so that
$$r_{1n}(\bar\alpha,\bar\theta_1,\theta,\bar\sigma) = 2\sum\bar\delta_i - \sum\bar\delta_i^2(1+o_p(1)).$$
By (24), $\bar\alpha$, $\bar\theta_1$ and $\bar\sigma$ are such that
$$R_n(\epsilon;\mathrm{I}) \ge \sup_{\epsilon\le|\theta|\le M} r_n(\bar\alpha,\bar\theta_1,\theta,\bar\sigma) = \sup_{\epsilon\le|\theta|\le M}\frac{[\{\mathrm{sgn}(\theta)\sum V_i(\theta)\}^+]^2}{nEV_1^2(\theta)} + o_p(1).$$
It is thus shown that the asymptotic upper bound is achievable. That is,
$$R_n(\epsilon;\mathrm{I}) = \sup_{\epsilon<|\theta|\le M}\frac{[\{\mathrm{sgn}(\theta)\sum V_i(\theta)\}^+]^2}{nEV_1^2(\theta)} + o_p(1). \quad (27)$$
This concludes the analysis of $R_n(\epsilon;\mathrm{I})$.

3.3 Analysis of $R_n(\epsilon;\mathrm{II})$

Now consider the restriction $|\theta_2| \le \epsilon$. In this case, $\theta_1$ and $\theta_2$ can be treated equally; in fact, since the MLE of $\theta_1$ is consistent, we can restrict $|\theta_1| \le \epsilon$ as well. As before, we know that
$$r_{1n}(\alpha,\theta_1,\theta_2,\sigma) = 2\sum\log(1+\delta_i) \le 2\sum\delta_i - \sum\delta_i^2 + \frac{2}{3}\sum\delta_i^3.$$
Let $\hat m_k = (1-\hat\alpha)\hat\theta_1^k+\hat\alpha\hat\theta_2^k$. Using the Taylor expansions of $Y_i(\hat\theta_j,\hat\sigma)$ and $U_i(\hat\sigma)$ in (21), we have
$$\hat\delta_i = \hat m_1Y_i(0,1) + (\hat\sigma^2-1+\hat m_2)Y_i'(0,1) + \frac{1}{2}\hat m_3Y_i''(0,1) + \frac{1}{6}\{3(\hat\sigma^2-1)^2+\hat m_4+6(\hat\sigma^2-1)\hat m_2\}Y_i'''(0,1) + \hat\epsilon_{in}, \quad (28)$$
where $Y_i'(0,1)$ is the first partial derivative of $Y_i(\theta,\sigma)$ with respect to $\theta$ at $\theta = 0$ and $\sigma = 1$, and similarly for $Y_i''(0,1)$ and $Y_i'''(0,1)$. As before, put $Y_i = Y_i(0,1)$, $Y_i' = Y_i'(0,1)$, $Y_i'' = Y_i''(0,1)$, $Y_i''' = Y_i'''(0,1)$. By calculation, $Y_i' = U_i(1) = (X_i^2-1)/2$, $Y_i'' = (X_i^3-3X_i)/3$, and $Y_i''' = (X_i^4-6X_i^2+3)/4$. The sum of the remainders $\hat\epsilon_n = \sum\hat\epsilon_{in}$ satisfies
$$\hat\epsilon_n = n^{1/2}|\hat\sigma^2-1|^3O_p(1) + n(\hat m_1^2+\hat m_3^2)o_p(1) + n^{1/2}(|\hat m_5|+\hat m_6)O_p(1) + o_p(1). \quad (29)$$
Note that the cross-product terms in the Taylor expansion of (28) have been taken into account in the remainder. For example, $n^{1/2}|\hat\sigma^2-1||\hat m_1| = o_p(n^{1/2}|\hat m_1|) = o_p(1+n\hat m_1^2)$. The coefficient $n^{1/2}$ in the above results from the iid sums of zero-mean random variables, as we have seen in the last section. Also note that
$$3(\hat\sigma^2-1)^2+\hat m_4+6(\hat\sigma^2-1)\hat m_2 = 3(\hat\sigma^2-1+\hat m_2)^2+\hat m_4-3\hat m_2^2.$$
Hence (28) reduces to
$$\hat\delta_i = \hat s_1Y_i + \hat s_2Y_i' + \hat s_3Y_i'' + \hat s_4Y_i''' + \hat\epsilon_{in},$$
where
$$\hat s_1 = \hat m_1, \quad \hat s_2 = \hat\sigma^2-1+\hat m_2, \quad \hat s_3 = \tfrac{1}{2}\hat m_3, \quad \hat s_4 = \tfrac{1}{6}(\hat m_4-3\hat m_2^2), \quad (30)$$
and, combining (29), the sum of the remainders $\hat\epsilon_n = \sum\hat\epsilon_{in}$ becomes
$$\hat\epsilon_n = n^{1/2}\hat s_2^2O_p(1) + n^{1/2}|\hat\sigma^2-1|^3O_p(1) + n(\hat m_1^2+\hat m_3^2)o_p(1) + n^{1/2}(|\hat m_5|+\hat m_6)O_p(1) + o_p(1). \quad (31)$$
Therefore, an upper bound for $r_{1n}(\hat\alpha,\hat\theta_1,\hat\theta_2,\hat\sigma)$ is
$$r_{1n}(\hat\alpha,\hat\theta_1,\hat\theta_2,\hat\sigma) \le 2\sum\{\hat s_1Y_i+\hat s_2Y_i'+\hat s_3Y_i''+\hat s_4Y_i'''\}$$
$$-\sum\{\hat s_1Y_i'+\hat s_2Y_i''+\hat s_3Y_i'''+\hat s_4Y_i''''\}^2+\frac{2}{3}\sum\{\hat s_1Y_i'+\hat s_2Y_i''+\hat s_3Y_i'''+\hat s_4Y_i''''\}^3+\hat\epsilon_n.$$
The argument leading to (14) shows that the cubic sum is controlled by the square sum up to a term of $\epsilon O_p(1)$. Also note that $Y_i'$, $Y_i''$, $Y_i'''$ and $Y_i''''$ are mutually orthogonal and hence the quadratic sum is positive-definite. Therefore,
$$r_{1n}(\hat\alpha,\hat\theta_1,\hat\theta_2,\hat\sigma)\le 2\sum\{\hat s_1Y_i'+\hat s_2Y_i''+\hat s_3Y_i'''+\hat s_4Y_i''''\}-\Big\{\hat s_1^2\sum(Y_i')^2+\hat s_2^2\sum(Y_i'')^2+\hat s_3^2\sum(Y_i''')^2+\hat s_4^2\sum(Y_i'''')^2\Big\}\{1+\epsilon O_p(1)\}+\hat\epsilon_n.\quad(32)$$
Since $\hat\sigma^2-1=o_p(1)$, $\hat m_2\le\epsilon^2$ and $\hat m_2^3\le\hat m_6$, we have
$$n^{1/2}|\hat\sigma^2-1|^3\le 8n^{1/2}\{|\hat s_2|^3+\hat m_2^3\}\le\epsilon n^{1/2}\hat s_2^2O_p(1)+n^{1/2}\hat m_6O_p(1),$$
so that (31) can be expressed as
$$\hat\epsilon_n=\epsilon n^{1/2}\hat s_2^2O_p(1)+n(\hat m_1^2+\hat m_3^2)o_p(1)+n^{1/2}(|\hat m_5|+\hat m_6)O_p(1)+o_p(1).\quad(33)$$
Now the key point is to show that
$$\hat\epsilon_n=o_p(1)+\epsilon n\{\hat s_1^2+\hat s_2^2+\hat s_3^2+\hat s_4^2\}O_p(1).\quad(34)$$
This result implies that the remainder is also negligible when compared to the square sum in (32). Put $\hat\tau=(1-\hat\alpha)|\hat\theta_1|^5+\hat\alpha|\hat\theta_2|^5$. Then $|\hat m_5|+\hat m_6=O_p(\hat\tau)$. Therefore, (34) follows immediately from (33) and the following lemma.

Lemma 4. $\hat\tau=o_p(1)+\epsilon\{|\hat s_1|+|\hat s_2|+|\hat s_3|+|\hat s_4|\}O_p(1)$.
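The mutual orthogonality of $Y_i'$, $Y_i''$, $Y_i'''$ and $Y_i''''$ invoked for (32), and the second moments $E(Y_1''')^2=2/3$ and $E(Y_1'''')^2=3/2$ used later in (36), can be checked with exact standard-normal moments. The following Python sketch is a numerical aside, not part of the argument; it assumes $Y_i'=X_i$ (the first derivative is not displayed explicitly above) and uses $EX^k=(k-1)!!$ for even $k$.

```python
from fractions import Fraction
from math import prod

def normal_moment(k):
    # E[X^k] for X ~ N(0,1): zero for odd k, (k-1)!! for even k
    return Fraction(0) if k % 2 else Fraction(prod(range(1, k, 2)))

def mul(p, q):
    # multiply two polynomials stored as {power: coefficient}
    out = {}
    for a, ca in p.items():
        for b, cb in q.items():
            out[a + b] = out.get(a + b, 0) + ca * cb
    return out

def expect(poly):
    # E[poly(X)] under the standard normal distribution
    return sum(c * normal_moment(p) for p, c in poly.items())

# the derivatives appearing in the expansion of delta_i
Y1 = {1: Fraction(1)}                                            # Y'    = X (assumed)
Y2 = {2: Fraction(1, 2), 0: Fraction(-1, 2)}                     # Y''   = (X^2 - 1)/2
Y3 = {3: Fraction(1, 3), 1: Fraction(-1)}                        # Y'''  = (X^3 - 3X)/3
Y4 = {4: Fraction(1, 4), 2: Fraction(-3, 2), 0: Fraction(3, 4)}  # Y'''' = (X^4 - 6X^2 + 3)/4
```

Since the four functions are scaled Hermite polynomials in $X_i$, all pairwise products have zero expectation, while the squares give the constants quoted in the text.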
Proof. The proof is accomplished by partitioning the sample space into several parts and showing that in each part one of $\hat s_i$, $i=1,2,3,4$, controls the size of $\hat\tau$.

Consider the first part: $(1-\hat\alpha)|\hat\theta_1|\ge\gamma\hat\alpha|\hat\theta_2|$ for a constant $\gamma>1$. In this case,
$$|\hat m_1|\ge(1-\hat\alpha)|\hat\theta_1|-\hat\alpha|\hat\theta_2|\ge(\gamma-1)\hat\alpha|\hat\theta_2|.$$
On the other hand,
$$|\hat m_1|\ge(1-\hat\alpha)|\hat\theta_1|-\hat\alpha|\hat\theta_2|=\Big(1-\frac{\hat\alpha|\hat\theta_2|}{(1-\hat\alpha)|\hat\theta_1|}\Big)(1-\hat\alpha)|\hat\theta_1|\ge(1-\gamma^{-1})(1-\hat\alpha)|\hat\theta_1|.$$
So, $\hat\tau=\epsilon O_p\{(1-\hat\alpha)|\hat\theta_1|+\hat\alpha|\hat\theta_2|\}=\epsilon O_p(|\hat s_1|)$. Similarly, let $(1-\hat\alpha)|\hat\theta_1|^3\ge\gamma\hat\alpha|\hat\theta_2|^3$. We have $\hat\tau=\epsilon O_p(|\hat s_3|)$.

Finally, consider the case that
$$\gamma^{-1}\le\frac{(1-\hat\alpha)|\hat\theta_1|^k}{\hat\alpha|\hat\theta_2|^k}\le\gamma,\qquad\text{for }k=1,3.$$
Solving the inequalities with $k=1$ and $3$, we have $\gamma^{-2}\le\hat\alpha/(1-\hat\alpha)\le\gamma^2$, implying $(1-\hat\alpha)\le\hat\alpha\gamma^2$ and $\hat\alpha\le(1-\hat\alpha)\gamma^2$. Therefore,
$$\hat m_4-\hat m_2^2=\hat\alpha(1-\hat\alpha)(\hat\theta_1^2-\hat\theta_2^2)^2\le\hat\alpha(1-\hat\alpha)(\hat\theta_1^4+\hat\theta_2^4)\le\gamma^2\{(1-\hat\alpha)^2\hat\theta_1^4+\hat\alpha^2\hat\theta_2^4\}\le\gamma^2\hat m_2^2.\quad(35)$$
So, $\hat m_4-3\hat m_2^2\le(\gamma^2-2)\hat m_2^2<0$ when the constant $\gamma$ is chosen a priori to be between $1$ and $\sqrt{2}$. It follows that
$$\hat m_2^2\le|\hat m_4-3\hat m_2^2|/(2-\gamma^2).$$
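As a numerical sanity check on the final case of the proof (not part of the original argument), the sketch below constructs parameter points satisfying the two-sided balance condition for a hypothetical choice $\gamma=1.3\in(1,\sqrt{2})$, and verifies the conclusions drawn from (35), namely $\hat m_4-3\hat m_2^2\le(\gamma^2-2)\hat m_2^2$ and $\hat m_4\le(1+\gamma^2)\hat m_2^2$, along with the moment inequality $\hat m_2^3\le\hat m_6$ used for the remainder bound; the helper names are ours.

```python
def mixture_moment(alpha, t1, t2, k):
    # m_k = (1 - alpha) t1^k + alpha t2^k
    return (1.0 - alpha) * t1**k + alpha * t2**k

def balanced(alpha, t1, t2, gamma):
    # the final case of the partition: for k = 1 and 3,
    # 1/gamma <= (1-alpha)|t1|^k / (alpha |t2|^k) <= gamma
    return all(1.0 / gamma <= (1.0 - alpha) * abs(t1)**k / (alpha * abs(t2)**k) <= gamma
               for k in (1, 3))

gamma = 1.3  # any constant in (1, sqrt(2)) works for the argument

# build balanced points: u = |t1/t2| near 1 and w = {(1-alpha)/alpha} * u near 1
cases = []
for iu in range(11):
    u = 0.95 + 0.01 * iu
    for iw in range(11):
        w = 0.95 + 0.01 * iw
        r = w / u                      # r = (1 - alpha)/alpha
        if r < 1.0:
            continue                   # keep alpha <= 1/2
        alpha = 1.0 / (1.0 + r)
        for t2 in (0.2, -0.5, 0.8):
            for sign in (1.0, -1.0):
                cases.append((alpha, sign * u * t2, t2))
```

With $u$ and $w$ confined to $[0.95,1.05]$, both ratio constraints fall well inside $[\gamma^{-1},\gamma]$, so every constructed point lies in the balanced case.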
Equation (35) also implies $\hat m_4\le(1+\gamma^2)\hat m_2^2$. Consequently, $\hat\tau=\epsilon O_p(\hat m_4)=\epsilon O_p(\hat m_2^2)=\epsilon O_p(|\hat s_4|)$. We have thus exhausted the sample space and the lemma follows.

From (32) and (34), it follows that
$$r_{1n}(\hat\alpha,\hat\theta_1,\hat\theta_2,\hat\sigma)\le 2\sum\{\hat s_1Y_i'+\hat s_2Y_i''+\hat s_3Y_i'''+\hat s_4Y_i''''\}-\Big\{\hat s_1^2\sum(Y_i')^2+\hat s_2^2\sum(Y_i'')^2+\hat s_3^2\sum(Y_i''')^2+\hat s_4^2\sum(Y_i'''')^2\Big\}\{1+\epsilon O_p(1)\}.$$
Applying the quadratic form argument which has been used several times in the previous sections, we obtain that
$$R_n(\epsilon;\mathrm{II})\le\frac{(\sum Y_i''')^2}{nE(Y_1''')^2}+\frac{(\sum Y_i'''')^2}{nE(Y_1'''')^2}+\epsilon O_p(1).$$
The upper bound in the above inequality is attained when the parameters $\alpha$, $\theta_1$, $\theta_2$ and $\sigma$ assume the values determined by the following equations:
$$\bar s_1=\frac{\sum Y_i'}{\sum(Y_i')^2},\quad \bar s_2=\frac{\sum Y_i''}{\sum(Y_i'')^2},\quad \bar s_3=\frac{\sum Y_i'''}{\sum(Y_i''')^2},\quad \bar s_4=\frac{\sum Y_i''''}{\sum(Y_i'''')^2},$$
where $\bar s_1$, $\bar s_2$, $\bar s_3$ and $\bar s_4$ are defined correspondingly by (30). We thus arrive at
$$R_n(\epsilon;\mathrm{II})=\frac{(\sum Y_i''')^2}{nE(Y_1''')^2}+\frac{(\sum Y_i'''')^2}{nE(Y_1'''')^2}+\epsilon O_p(1).\quad(36)$$

Remark. A by-product of the above analysis shows that the MLE of $\sigma$ has a convergence rate at most $n^{-1/4}$. To see this, consider the submodel where $\theta_1=-\theta_2=\theta$, $\alpha=1/2$ and $\sigma^2-1=-\theta^2$. The maximum of the likelihood function is achieved when
$$\hat m_4-3\hat m_2^2=-2\hat\theta^4=6\sum Y_i''''\Big/\sum(Y_i'''')^2=O_p(n^{-1/2}).$$
This implies that $\hat\theta=O_p(n^{-1/8})$ and $\hat\sigma^2-1=O_p(n^{-1/4})$. This is in contrast to the ordinary semi-parametric models, where one may still have the usual rate of $n^{-1/2}$ for
the parametric components. See Van der Vaart (1996). Moreover, the result suggests that the best possible rate for estimating the mixing distribution when a structural parameter is present is $n^{-1/8}$ rather than $n^{-1/4}$ as found by Chen (1995) for the mixture models without a structural parameter.

3.4 Asymptotic distribution of the LRT

Theorem 2. Let $X_1,\ldots,X_n$ be a random sample from the mixture distribution $(1-\alpha)N(\theta_1,\sigma^2)+\alpha N(\theta_2,\sigma^2)$, where $0\le\alpha\le 1/2$, $|\theta_i|\le M$, $i=1,2$, and $\sigma>0$. Let $R_n$ be (twice) the log-likelihood ratio test statistic for testing $H_0$: $\alpha=0$ or $\theta_1=\theta_2$, i.e., $N(\theta_1,\sigma^2)$. Then under the null distribution $N(0,1)$, as $n\to\infty$,
$$R_n\to_d\sup_{|\theta|\le M}\big[\{\varsigma^+(\theta)\}^2I(\theta\ne0)+\{\varsigma^2(0)+Z^2\}I(\theta=0)\big],$$
where the process involved in the limiting distribution is defined as follows: (1) $\{\varsigma(\theta):|\theta|\le M\}$ is a Gaussian process with mean 0, variance 1 and the autocorrelation function: for $s,t\ne0$,
$$\rho(s,t)=\mathrm{sgn}(st)\frac{b(st)}{\sqrt{b(s^2)b(t^2)}},\quad\text{and}\quad\rho(0,t)=\frac{t^3}{\sqrt{6b(t^2)}},\quad(37)$$
where $b(x)=e^x-1-x-x^2/2$; and (2) $\varsigma(0)$ and $Z\sim N(0,1)$ are independent, and for $s\ne0$,
$$\mathrm{Cov}\{\varsigma(s),Z\}=\frac{s^4}{2\sqrt{6b(s^2)}}.\quad(38)$$

Proof. For any fixed $\epsilon>0$,
$$R_n=\max\{R_n(\epsilon;\mathrm{I}),R_n(\epsilon;\mathrm{II})\}.$$
By (27), the asymptotic distribution of $R_n(\epsilon;\mathrm{I})$ is determined by the limit of
$$\frac{[\sum\{\mathrm{sgn}(\theta)V_i(\theta)\}^+]^2}{nEV_1^2(\theta)}.$$
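The covariance structure in (37) and (38), as reconstructed above, can be coded directly. The sketch below is an illustration only, not part of the proof: it checks that $\rho$ behaves like a correlation, that $\rho(s,t)\to\rho(0,t)$ as $s\to0$, and that (38) agrees with the ingredients used in the proof that follows, namely $\mathrm{Cov}\{V_i(s),Y_i''''\}=s/4$, $\mathrm{Var}\{V_i(s)\}=b(s^2)/s^6$ and $\mathrm{Var}(Y_i'''')=3/2$.

```python
import math

def b(x):
    # b(x) = e^x - 1 - x - x^2/2, computed stably near 0 via expm1
    return math.expm1(x) - x - 0.5 * x * x

def rho(s, t):
    # autocorrelation function (37) of the process varsigma
    if s == 0.0 and t == 0.0:
        return 1.0
    if s == 0.0 or t == 0.0:
        u = t if s == 0.0 else s
        return u**3 / math.sqrt(6.0 * b(u * u))
    return math.copysign(1.0, s * t) * b(s * t) / math.sqrt(b(s * s) * b(t * t))

def cov_varsigma_Z(s):
    # the closed form (38): s^4 / {2 sqrt(6 b(s^2))}
    return s**4 / (2.0 * math.sqrt(6.0 * b(s * s)))

def cov_from_ingredients(s):
    # the same quantity assembled from the proof's pieces, with the sign
    # flip varsigma(s) = sgn(s) xi(s) applied
    corr_xi = (s / 4.0) / math.sqrt((b(s * s) / s**6) * 1.5)
    return math.copysign(1.0, s) * corr_xi
```

The agreement of the two covariance computations is an algebraic identity, so the numerical match is exact up to rounding.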
By the definition of $V_i(\theta)$, we see that $n^{-1/2}\sum_{i=1}^nV_i(\theta)/\{EV_1^2(\theta)\}^{1/2}$ converges weakly to a Gaussian process, say $\{\xi(\theta):\epsilon\le|\theta|\le M\}$, with mean 0, variance 1 and autocorrelation function as follows: for $\epsilon\le|s|,|t|\le M$,
$$\mathrm{Cov}\{\xi(s),\xi(t)\}=\frac{b(st)}{\sqrt{b(s^2)b(t^2)}}.$$
Define $\varsigma(\theta)=\mathrm{sgn}(\theta)\xi(\theta)$ for $\theta\ne0$ and $\varsigma(0)=\xi(0)$. Then $\varsigma(\theta)$ is a Gaussian process with the autocorrelation function (37), and $R_n(\epsilon;\mathrm{I})$ converges weakly to
$$\sup_{0<|\theta|\le M}\{\varsigma^+(\theta)\}^2,$$
by first letting $n\to\infty$ and then $\epsilon\to0$.

On the other hand, by (36) we can have that, by first letting $n\to\infty$ and then $\epsilon\to0$, $R_n(\epsilon;\mathrm{II})$ converges weakly to $\varsigma^2(0)+Z^2$. To see this, put $R_n(\epsilon;\mathrm{II})=A_n+\epsilon O_p(1)$ in (36). For any $\eta>0$, there exists $C>0$ such that $P(|O_p(1)|>C)<\eta$ for all large $n$. Thus for any given $x$ and $n$ large,
$$P(A_n\le x-\epsilon C)-\eta\le P(R_n(\epsilon;\mathrm{II})\le x)\le P(A_n\le x+\epsilon C)+\eta,$$
implying that $R_n(\epsilon;\mathrm{II})$ converges weakly to $\varsigma^2(0)+Z^2$ by first letting $n\to\infty$, then $\epsilon\to0$ and finally $\eta\to0$.

The independence of $\varsigma(0)$ and $Z$ is due to the fact $V_i(0)=Y_i'''/2$ and the orthogonality of $Y_i'''$ and $Y_i''''$. The correlation between $\varsigma(\theta)$ and $Z$ is seen from the following calculation:
$$\mathrm{Cov}\{V_i(\theta),Y_i''''\}=(\theta/6)\mathrm{Var}(Y_i'''')=\theta/4,\quad\text{and}\quad\mathrm{Var}\{V_i(\theta)\}=b(\theta^2)/\theta^6.$$
Thus the correlation between $V_i(\theta)$ and $Y_i''''$ is given by (38). The proof is completed.

4 Concluding Remarks

The asymptotic null distribution of the LRT for homogeneity in finite normal mixture models in the presence of a structural parameter has been derived without separation
conditions on the mean parameters. It is proved that the asymptotic null distribution of the LRT is the maximum of a $\chi^2$-variable and the supremum of the square of a truncated Gaussian process. If the structural parameter were removed from the model, the peculiar large sample behavior of the LRT would disappear, and the limiting null distribution would simply be the supremum of the square of the truncated Gaussian process, reducing to the one discovered by Chen and Chen (2001a). If, in addition, $M$ is allowed to approach infinity, the supremum is distributed approximately as that of
$$(2\log M)^{1/2}+\{X-\log(\pi)\}/(2\log M)^{1/2},$$
where $P(X\le x)=\exp\{-e^{-x}\}$, which is the type-I extreme value distribution. See Chernoff and Lander (1995, Appendix D) and Adler (1990). The result in Bickel and Chernoff (1993) can be obtained in a heuristic way by letting $M=(\log n/2)^{1/2}$. It is interesting to see that the results from different model set-ups agree formally. Bickel and Chernoff actually dealt with a modified LRT, replacing a random element in the LRT statistic with its mean in order to simplify the analysis. It seems that their modification might not have changed the asymptotic behavior of the LRT substantially. Computing the quantiles of the supremum of a Gaussian process over a region is a difficult problem. See also the comments by Dacunha-Castelle and Gassiat (1999), and Chen and Chen (2001b). Some approximations in special cases can be found in Adler (1990) and Sun (1993).

Owing to the large sample study, it is found that even though the structural parameter is not part of the mixing distribution, the convergence rate of its MLE is $n^{-1/4}$ rather than $n^{-1/2}$. This is in sharp contrast to the ordinary semi-parametric models. Moreover, the estimated mixing distribution has a convergence rate $n^{-1/8}$ rather than $n^{-1/4}$ as discovered by Chen (1995) for finite mixture models without a structural parameter.
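For reference, the extreme-value approximation quoted above is easy to tabulate. The sketch below is an illustration only: the centering constant $\log(\pi)$ is taken from the formula as reconstructed here from a damaged source, and the helper names are ours, not the paper's.

```python
import math

def gumbel_cdf(x):
    # type-I extreme value distribution: P(X <= x) = exp(-e^{-x})
    return math.exp(-math.exp(-x))

def gumbel_quantile(p):
    # inverse of the type-I extreme value cdf
    return -math.log(-math.log(p))

def approx_sup_quantile(p, M):
    # approximate p-quantile of the supremum for large M, per the quoted
    # expansion: (2 log M)^{1/2} + {x_p - log(pi)} / (2 log M)^{1/2}
    c = math.sqrt(2.0 * math.log(M))
    return c + (gumbel_quantile(p) - math.log(math.pi)) / c
```

As expected of an extreme-value normalization, the approximate quantile grows slowly (like $\sqrt{2\log M}$) as the parameter range $M$ increases.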
REFERENCES

Adler, R.J. (1990). An Introduction to Continuity, Extrema, and Related Topics for General Gaussian Processes. IMS Lecture Notes-Monograph Series, Vol. 12. Institute of Mathematical Statistics, Hayward, CA.

Bickel, P. and Chernoff, H. (1993). Asymptotic distribution of the likelihood ratio statistic in a prototypical non-regular problem. In Statistics and Probability: A Raghu Raj Bahadur Festschrift (J.K. Ghosh, S.K. Mitra, K.R. Parthasarathy and B.L.S. Prakasa Rao, eds.). Wiley Eastern.

Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.

Chen, H. and Chen, J. (2001a). Large sample distribution of the likelihood ratio test for normal mixtures. Statistics & Probability Letters.

Chen, H. and Chen, J. (2001b). The likelihood ratio test for homogeneity in the finite mixture models. Canad. J. Statist.

Chen, J. (1995). Optimal rate of convergence in finite mixture models. Ann. Statist.

Cheng, R.C.H. and Traylor, L. (1995). Non-regular maximum likelihood problems. J. Roy. Statist. Soc. B.

Chernoff, H. and Lander, E. (1995). Asymptotic distribution of the likelihood ratio test that a mixture of two binomials is a single binomial. J. Statist. Plann. Inf.

Chow, Y.S. and Teicher, H. (1978). Probability Theory: Independence, Interchangeability, Martingales. Springer-Verlag, New York.

Cramér, H. (1946). Mathematical Methods of Statistics. Princeton Univ. Press, Princeton.
Dacunha-Castelle, D. and Gassiat, É. (1999). Testing in locally conic models, and application to mixture models. Ann. Statist.

Dean, C.B. (1992). Testing for overdispersion in Poisson and binomial regression models. J. Amer. Statist. Assoc.

Ghosh, J.K. and Sen, P.K. (1985). On the asymptotic performance of the log likelihood ratio statistic for the mixture model and related results. In Proc. Berkeley Conf. in Honor of J. Neyman and J. Kiefer (L. LeCam and R.A. Olshen, eds.).

Hartigan, J.A. (1985). A failure of likelihood asymptotics for normal mixtures. In Proc. Berkeley Conf. in Honor of J. Neyman and J. Kiefer (L. LeCam and R.A. Olshen, eds.).

Lemdani, M. and Pons, O. (1999). Likelihood ratio tests in contamination models. Bernoulli.

Leroux, B. (1992). Consistent estimation of a mixing distribution. Ann. Statist.

Lindsay, B.G. (1989). Moment matrices: applications in mixtures. Ann. Statist.

Rubin, H. (1956). Uniform convergence of random functions with applications to statistics. Ann. Math. Statist.

Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New York.

Sun, J. (1993). Tail probabilities of the maxima of Gaussian random fields. Ann. Probab.
Van der Vaart, A.W. (1996). Efficient maximum likelihood estimation in semiparametric mixture models. Ann. Statist.

Wald, A. (1949). Note on the consistency of the maximum likelihood estimate. Ann. Math. Statist.

Wilks, S.S. (1938). The large sample distribution of the likelihood ratio for testing composite hypotheses. Ann. Math. Statist.

Department of Mathematics and Statistics, Bowling Green State University, Bowling Green, OH 43403, USA. hchen@math.bgsu.edu

Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ont. N2L 3G1, Canada. jhchen@uwaterloo.ca
Math 181B Homework 1 Solution 1. Write down the likelihood: L(λ = n λ X i e λ X i! (a One-sided test: H 0 : λ = 1 vs H 1 : λ = 0.1 The likelihood ratio: where LR = L(1 L(0.1 = 1 X i e n 1 = λ n X i e nλ
More informationPCA with random noise. Van Ha Vu. Department of Mathematics Yale University
PCA with random noise Van Ha Vu Department of Mathematics Yale University An important problem that appears in various areas of applied mathematics (in particular statistics, computer science and numerical
More informationEfficiency of Profile/Partial Likelihood in the Cox Model
Efficiency of Profile/Partial Likelihood in the Cox Model Yuichi Hirose School of Mathematics, Statistics and Operations Research, Victoria University of Wellington, New Zealand Summary. This paper shows
More informationLet us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided
Let us first identify some classes of hypotheses. simple versus simple H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided H 0 : θ θ 0 versus H 1 : θ > θ 0. (2) two-sided; null on extremes H 0 : θ θ 1 or
More informationChapter 3: Maximum Likelihood Theory
Chapter 3: Maximum Likelihood Theory Florian Pelgrin HEC September-December, 2010 Florian Pelgrin (HEC) Maximum Likelihood Theory September-December, 2010 1 / 40 1 Introduction Example 2 Maximum likelihood
More informationOptimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X.
Optimization Background: Problem: given a function f(x) defined on X, find x such that f(x ) f(x) for all x X. The value x is called a maximizer of f and is written argmax X f. In general, argmax X f may
More informationTheoretical Statistics. Lecture 1.
1. Organizational issues. 2. Overview. 3. Stochastic convergence. Theoretical Statistics. Lecture 1. eter Bartlett 1 Organizational Issues Lectures: Tue/Thu 11am 12:30pm, 332 Evans. eter Bartlett. bartlett@stat.
More informationEstimation of parametric functions in Downton s bivariate exponential distribution
Estimation of parametric functions in Downton s bivariate exponential distribution George Iliopoulos Department of Mathematics University of the Aegean 83200 Karlovasi, Samos, Greece e-mail: geh@aegean.gr
More information1 General problem. 2 Terminalogy. Estimation. Estimate θ. (Pick a plausible distribution from family. ) Or estimate τ = τ(θ).
Estimation February 3, 206 Debdeep Pati General problem Model: {P θ : θ Θ}. Observe X P θ, θ Θ unknown. Estimate θ. (Pick a plausible distribution from family. ) Or estimate τ = τ(θ). Examples: θ = (µ,
More informationAsymptotics of minimax stochastic programs
Asymptotics of minimax stochastic programs Alexander Shapiro Abstract. We discuss in this paper asymptotics of the sample average approximation (SAA) of the optimal value of a minimax stochastic programming
More informationMcGill University. Faculty of Science. Department of Mathematics and Statistics. Part A Examination. Statistics: Theory Paper
McGill University Faculty of Science Department of Mathematics and Statistics Part A Examination Statistics: Theory Paper Date: 10th May 2015 Instructions Time: 1pm-5pm Answer only two questions from Section
More informationHigh-dimensional asymptotic expansions for the distributions of canonical correlations
Journal of Multivariate Analysis 100 2009) 231 242 Contents lists available at ScienceDirect Journal of Multivariate Analysis journal homepage: www.elsevier.com/locate/jmva High-dimensional asymptotic
More informationBrownian Motion. 1 Definition Brownian Motion Wiener measure... 3
Brownian Motion Contents 1 Definition 2 1.1 Brownian Motion................................. 2 1.2 Wiener measure.................................. 3 2 Construction 4 2.1 Gaussian process.................................
More informationThe International Journal of Biostatistics
The International Journal of Biostatistics Volume 1, Issue 1 2005 Article 3 Score Statistics for Current Status Data: Comparisons with Likelihood Ratio and Wald Statistics Moulinath Banerjee Jon A. Wellner
More informationMaximum Likelihood Estimation
Chapter 7 Maximum Likelihood Estimation 7. Consistency If X is a random variable (or vector) with density or mass function f θ (x) that depends on a parameter θ, then the function f θ (X) viewed as a function
More informationExercises and Answers to Chapter 1
Exercises and Answers to Chapter The continuous type of random variable X has the following density function: a x, if < x < a, f (x), otherwise. Answer the following questions. () Find a. () Obtain mean
More informationNonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix
Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Yingying Dong and Arthur Lewbel California State University Fullerton and Boston College July 2010 Abstract
More informationCHANGE DETECTION IN TIME SERIES
CHANGE DETECTION IN TIME SERIES Edit Gombay TIES - 2008 University of British Columbia, Kelowna June 8-13, 2008 Outline Introduction Results Examples References Introduction sunspot.year 0 50 100 150 1700
More informationConsistency of Quasi-Maximum Likelihood Estimators for the Regime-Switching GARCH Models
Consistency of Quasi-Maximum Likelihood Estimators for the Regime-Switching GARCH Models Yingfu Xie Research Report Centre of Biostochastics Swedish University of Report 2005:3 Agricultural Sciences ISSN
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued
Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and
More information