TESTS FOR HOMOGENEITY IN NORMAL MIXTURES IN THE PRESENCE OF A STRUCTURAL PARAMETER: TECHNICAL DETAILS


By Hanfeng Chen and Jiahua Chen¹
Bowling Green State University and University of Waterloo

Abstract. Often a question arises as to whether the observed data are a sample from a homogeneous population or have come from a heterogeneous population. In particular, one wants to test for a single normal distribution versus a mixture of two normal distributions. Classic asymptotic results fail to apply to this problem since the model does not satisfy the usual regularity conditions. This paper investigates the large-sample behavior of the likelihood ratio statistic for testing homogeneity in the normal mixture in location parameters with an unknown structural parameter. It is proved that the asymptotic null distribution of the likelihood ratio statistic is the maximum of a χ²-variable and the supremum of the square of a truncated Gaussian process with mean 0 and variance 1. This reveals the unusual large-sample behavior of the likelihood function under the null distribution. The correlation structure of the process involved in the limiting distribution is presented explicitly. From the large-sample study, it is also found that even though the structural parameter is not part of the mixing distribution, the convergence rate of its maximum likelihood estimate is n^(-1/4) rather than n^(-1/2), while the mixing distribution has a convergence rate of n^(-1/8) rather than n^(-1/4). This is in sharp contrast to ordinary semi-parametric models and to mixture models without a structural parameter.

Key words and phrases: Asymptotic distribution, Gaussian process, genetic analysis, finite mixture, likelihood ratio, non-regular model, semi-parametric model.

AMS subject classifications. Primary 62F03; secondary 62F05.

¹ The work was supported in part by a grant from NSERC of Canada, and an FIL grant from Bowling Green State University.

1 Introduction

Consider the following problem. Let X_1, ..., X_n be a random sample from the mixture population (1−α)N(θ₁, σ²) + αN(θ₂, σ²) with the probability density function (pdf)

(1−α)σ⁻¹φ((x−θ₁)/σ) + ασ⁻¹φ((x−θ₂)/σ),   (1)

where φ(·) is the pdf of the standard normal N(0,1). We wish to test H₀: α(1−α) = 0 or θ₁ = θ₂, versus the full model (1), i.e., to test N(θ, σ²) versus (1−α)N(θ₁, σ²) + αN(θ₂, σ²). The mixture pdf (1) can also be expressed as an integral ∫σ⁻¹φ((x−u)/σ)dG(u), with the mixing distribution G(u) = (1−α)I(u−θ₁) + αI(u−θ₂).

For parametric hypothesis testing problems it is customary to use the likelihood ratio as a test statistic. Under standard regularity conditions, a classic result of Wilks (1938) states that if the null hypothesis is true, the likelihood ratio test (LRT) statistic has, asymptotically, a χ²-distribution. However, the regularity conditions are not satisfied for the mixture problem considered here. First, the null hypothesis lies on the boundary of the parameter space, whereas the standard regularity conditions require it to be in the interior. Secondly, the two statements α = 0 and θ₁ = θ₂, which equivalently specify the null hypothesis, are not mutually exclusive; that is, there is a loss of identifiability under the null model. One might think that the unidentifiability can be eliminated by reparameterization. In that scenario, a third problem appears: the Fisher information, which characterizes the behavior of the maximum likelihood estimate (MLE), degenerates. Due to these irregularities, the classic results break down under the mixture model: the maximum likelihood estimators (MLEs) of some model parameters are inconsistent; the usual quadratic approximation to the likelihood function is no longer appropriate; Cramér's (1946) result on the
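As a quick illustrative check (not part of the paper), the mixture density (1) can be evaluated directly; the function below is a minimal sketch with hypothetical parameter values, verifying numerically that the density integrates to 1 and reduces to a single normal under the null.

```python
import math

def mixture_pdf(x, alpha, theta1, theta2, sigma):
    """Density (1): (1-alpha) phi((x-theta1)/sigma)/sigma + alpha phi((x-theta2)/sigma)/sigma."""
    phi = lambda z: math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    return ((1 - alpha) * phi((x - theta1) / sigma)
            + alpha * phi((x - theta2) / sigma)) / sigma

# Under H0 (alpha = 0 or theta1 = theta2) the density is a single normal;
# numerical check that the mixture density integrates to 1 (trapezoid rule).
xs = [i * 0.01 for i in range(-2000, 2001)]
vals = [mixture_pdf(x, 0.3, -1.0, 2.0, 1.5) for x in xs]  # illustrative values
total = sum(0.01 * (a + b) / 2 for a, b in zip(vals, vals[1:]))
print(round(total, 4))  # close to 1.0
```

The parameter values 0.3, −1.0, 2.0, 1.5 are arbitrary illustrations, not values used in the paper.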

asymptotic normality of the MLE and Wilks's (1938) asymptotic χ²-theory of the LRT do not hold. Cheng and Traylor (1995) identified the mixture model as one of four non-regular parametric models. Due to its appealing theoretical challenges and its important applications to various scientific disciplines such as human genetic linkage analysis, actuarial science and statistical ecology, there has been increasing research interest in the mixture model in recent years (e.g., Hartigan, 1985; Ghosh and Sen, 1985; Lindsay, 1989; Leroux, 1992; Chernoff and Lander, 1995; Dacunha-Castelle and Gassiat, 1999; Lemdani and Pons, 1999; Chen and Chen, 2001a and b). The large-sample behavior of the LRT for homogeneity in the mixture model is indeed a long-standing mystery. Hartigan (1985) showed that the LRT statistic tends to infinity with probability one if the mean parameters are unbounded. The divergence behavior of the LRT was further detailed by Bickel and Chernoff (1993). One important implication of Hartigan's result is that a boundedness assumption on the mean parameters is necessary for the LRT to have a limiting distribution. Under the boundedness assumption on the mean parameters, Ghosh and Sen (1985) gave the first version of the asymptotic distribution of the LRT for testing homogeneity. However, in addition to boundedness, they had to impose a separation condition, i.e., |θ₁ − θ₂| ≥ ε for some given ε > 0. The separation condition is obviously unsatisfactory, and many attempts have been made to remove it. Lemdani and Pons (1999) used a reparameterization approach to investigate the testing problem when one of the mean parameters is known, and their study showed that there is no obvious way to remove the separation condition. Dacunha-Castelle and Gassiat (1999) developed a general reparameterization method for the testing problem in locally conic models.
Their results can be applied to some useful mixture models in certain situations, most interestingly to stationary ARMA models. In the meantime, Chen and Chen (2001a and b) took a different approach, the so-called sandwich method, to attack the problem without the separation condition.

From the discussion above, removing the separation condition has been one of the central issues in the large-sample study of the LRT for homogeneity since Ghosh and Sen (1985). Moreover, previous studies have been confined to mixture models without a structural parameter. This paper investigates the general problem: (a) a structural parameter is included in the mixture model to bring the model closer to reality, and the structural parameter is not required to be bounded; (b) the test for homogeneity is considered, i.e., both mean parameters are assumed unknown; (c) the separation condition is removed from the model.

Following Chen and Chen (2001a and b), we will use the sandwich method to derive the asymptotic distribution of the LRT. The main challenge in the present problem, however, is to analyze the contribution to the variation due to estimating the structural parameter and to separate it from that due to estimating the mixing distribution G. Even though the paper deals with normal mixtures, the ideas and technical treatments are applicable to general parametric mixture models. Study of the normal mixtures elucidates the difficulties and clarifies the main issues in mixture models, some of which are often buried in the analytic conditions of a general set-up.

We start our study in Section 2 with the case of the single mean parameter mixture model, in which one of the mean parameters θ₁ and θ₂ in model (1) is assumed known. While the study of single mean parameter mixtures has its own virtue, the main purpose of the section is to demonstrate the main ideas behind our approach and to outline the study for the general mixture model (1). The asymptotic distribution of the LRT for homogeneity under model (1) is investigated in Section 3. It is shown that the asymptotic null distribution of the LRT statistic is the maximum of a χ²-variable and the supremum of the square of a truncated Gaussian process with mean 0 and variance 1.
This reveals the unusual large-sample behavior of the likelihood function.

Throughout the paper, without loss of generality, let the null underlying distribution be N(0,1). For convenience of notation, we say X_n(t) = O_p(a_n) or = o_p(a_n) if sup_{t∈T} |X_n(t)/a_n| = O_p(1) or sup_{t∈T} |X_n(t)/a_n| = o_p(1), where T is a suitably specified index set and a_n is a sequence of constants or random variables.

2 Single Mean Parameter Mixtures

Assume in model (1) that θ₁ is specified, say θ₁ = 0, and that the other mean θ₂ is unknown. Write θ = θ₂. In addition assume that |θ| ≤ M. Based on the observations X_i, we wish to use the LRT to test the null hypothesis H₀: N(0, σ²) versus Hₐ: (1−α)N(0, σ²) + αN(θ, σ²). The log-likelihood function of α, θ and σ is

l_n(α, θ, σ) = Σ log[(1−α)σ⁻¹ exp{−X_i²/(2σ²)} + ασ⁻¹ exp{−(X_i−θ)²/(2σ²)}].

Let σ̂₀² be the MLE of σ² under the null hypothesis, i.e., σ̂₀² = n⁻¹ Σ X_i². Let r_n(α, θ, σ) = 2{l_n(α, θ, σ) − l_n(0, 0, σ̂₀)}, i.e.,

r_n(α, θ, σ) = 2 Σ log{1 + α(exp{(X_iθ − θ²/2)/σ²} − 1)} − n log σ² − Σ X_i²/σ² + n{1 + log(Σ X_i²/n)}.   (2)

Let α̂, θ̂ and σ̂ be the MLEs of α, θ and σ under the full model. Then the LRT rejects the null hypothesis when R_n = r_n(α̂, θ̂, σ̂) is large.

2.1 Large sample behavior of the MLEs

We first show that under the null hypothesis σ̂² is bounded away from zero and infinity with probability approaching one.
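The quantity r_n in (2) can be computed directly from its definition as twice a log-likelihood difference. The following sketch (illustrative only; the function names are not from the paper) evaluates r_n and confirms that it vanishes at the null MLE (α = 0, σ = σ̂₀), which is what makes R_n = r_n(α̂, θ̂, σ̂) ≥ 0.

```python
import math, random

def ell_n(xs, alpha, theta, sigma):
    """Log-likelihood l_n(alpha, theta, sigma) for the single mean parameter model
    (theta1 = 0 known); the constant -n log sqrt(2 pi) cancels in the ratio."""
    s = 0.0
    for x in xs:
        s += math.log((1 - alpha) * math.exp(-x * x / (2 * sigma ** 2)) / sigma
                      + alpha * math.exp(-(x - theta) ** 2 / (2 * sigma ** 2)) / sigma)
    return s

def r_n(xs, alpha, theta, sigma):
    """r_n = 2{l_n(alpha, theta, sigma) - l_n(0, 0, sigma0_hat)}, as in (2)."""
    sigma0 = math.sqrt(sum(x * x for x in xs) / len(xs))  # null MLE of sigma
    return 2 * (ell_n(xs, alpha, theta, sigma) - ell_n(xs, 0.0, 0.0, sigma0))

random.seed(1)
xs = [random.gauss(0, 1) for _ in range(200)]  # sample from the null N(0,1)
sigma0 = math.sqrt(sum(x * x for x in xs) / len(xs))
print(abs(r_n(xs, 0.0, 1.0, sigma0)) < 1e-9)  # True: r_n is zero at the null MLE
```

Maximizing r_n over (α, θ, σ) would give the LRT statistic R_n; the code only checks the baseline value.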

Lemma 1. Under the null distribution N(0,1), there exist constants 0 < ε < Δ < ∞ such that lim_{n→∞} P(ε ≤ σ̂² ≤ Δ) = 1.

Proof. Consider r_n(α, θ, σ) defined by (2). Note that when xθ − θ²/2 ≤ 0, 1 + α[exp{(xθ − θ²/2)/σ²} − 1] ≤ 1, and in general

1 + α[exp{(xθ − θ²/2)/σ²} − 1] ≤ exp{(xθ − θ²/2)₊/σ²}.

We thus have the inequality

r_n(α, θ, σ) ≤ −σ⁻² Σ{X_i² − (2X_iθ − θ²)₊} − n log σ² + n{1 + log(Σ X_i²/n)},   (3)

where t₊ = tI(t > 0) denotes the positive part of t. Since X_i² − (2X_iθ − θ²)₊ is equal to either (θ − X_i)² or X_i², we see that

r_n(α, θ, σ) ≤ −n log σ² + n{1 + log(Σ X_i²/n)}.

Since log(n⁻¹ Σ X_i²) → 0 almost surely, the function r_n(α, θ, σ) < 0 for all σ² > Δ with probability approaching 1, for some large constant Δ. That is, lim P(σ̂² > Δ) = 0.

Next we show that σ̂² is also bounded away from zero asymptotically. By the uniform strong law of large numbers (see Rubin, 1956),

n⁻¹ Σ{X_i² − (2θX_i − θ²)₊} → S(θ) = E{X² − (2θX − θ²)₊},

almost surely and uniformly in |θ| ≤ M. Since S(θ) is continuous and positive, the minimum value of S(θ) is positive, say equal to q for some q > 0. Then with probability approaching one, uniformly in α, θ and σ,

r_n(α, θ, σ) ≤ −nq/σ² − n log σ² + n{1 + log(Σ X_i²/n)}.

Let ε > 0 be small enough that −q/ε − log ε + 1 < 0. It follows that with probability approaching 1, uniformly, the function r_n(α, θ, σ) < 0 if σ² < ε, implying that lim P(σ̂² ≥ ε) = 1. □

By Lemma 1, the parameter space of interest can be reduced to a compact one by restricting σ² to the interval [ε, Δ].

Lemma 2. Under the null distribution N(0,1), as n → ∞, α̂θ̂ → 0, α̂θ̂² → 0 and σ̂² → 1, in probability.

Proof. As remarked, we only need to consider ε ≤ σ² ≤ Δ for some constants 0 < ε < 1 < Δ < ∞. Let G = {G(·): G(u) = (1−α)I(u) + αI(u−θ), |θ| ≤ M, 0 ≤ α ≤ 1}. Let the space G be metrized by the Lévy distance between two distribution functions G₁ and G₂:

λ(G₁, G₂) = inf{τ > 0 : G₁(u−τ) − τ ≤ G₂(u) ≤ G₁(u+τ) + τ, for all u}.

(It is well known that convergence in the Lévy distance is equivalent to weak convergence of distribution functions. See, e.g., Chow and Teicher, 1978.) For any sequence G_j(u) = (1−α_j)I(u) + α_j I(u−θ_j) in G, since |θ_j| ≤ M < ∞ and 0 ≤ α_j ≤ 1, one can find a subsequence G_{j'} such that both α_{j'} and θ_{j'} converge, say to α* and θ*, implying that the subsequence G_{j'} converges weakly to G*(u) = (1−α*)I(u) + α*I(u−θ*) ∈ G, i.e., λ(G_{j'}, G*) → 0. Thus G is compact, and so is the product space Ω = {ω = (σ, G): σ² ∈ [ε, Δ], G ∈ G}. Moreover, for ω = (σ, G) ∈ Ω, put f(x; ω) = ∫σ⁻¹φ{(x−u)/σ}dG(u). Then the parameter ω ∈ Ω is identifiable, i.e., for any ω₁, ω₂ ∈ Ω, f(x; ω₁) = f(x; ω₂) for all x implies ω₁ = ω₂. With the compactness and identifiability, Wald's (1949) argument leads to consistency of the MLE ω̂ of ω = (σ, G) under the null model ω₀ = (1, G₀), where G₀(u) = I(u). To see this, take any small γ > 0 such that {ω = (σ, G): ‖ω − ω₀‖ = |σ² − 1| + λ(G, G₀) < γ} ⊂ Ω.
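The Lévy distance λ defined above can be approximated numerically for the two-point mixing distributions in G. The sketch below (an illustrative grid-search approximation, not part of the paper) checks that identical mixing distributions are at distance 0, and that moving a small mass a short distance gives a small Lévy distance.

```python
def G(u, alpha, theta):
    """Mixing cdf G(u) = (1 - alpha) I(u) + alpha I(u - theta), a two-point distribution."""
    return (1 - alpha) * (u >= 0) + alpha * (u >= theta)

def levy_distance(G1, G2, grid, taus):
    """Approximate Lévy distance: smallest tau (on a trial grid) satisfying
    G1(u - tau) - tau <= G2(u) <= G1(u + tau) + tau for every u in the grid."""
    for tau in taus:
        if all(G1(u - tau) - tau <= G2(u) <= G1(u + tau) + tau for u in grid):
            return tau
    return float('inf')

grid = [i * 0.01 for i in range(-300, 301)]
taus = [i * 0.01 for i in range(0, 201)]
G1 = lambda u: G(u, 0.0, 0.0)   # G0: point mass at 0
G2 = lambda u: G(u, 0.3, 0.5)   # mass 0.3 moved to theta = 0.5 (illustrative values)
print(levy_distance(G1, G1, grid, taus))   # 0.0 for identical cdfs
d = levy_distance(G1, G2, grid, taus)
print(0.0 < d < 0.31)                      # True: small mass, short move, small distance
```

The grid resolution 0.01 limits the accuracy of the approximation; it is only meant to make the definition concrete.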

Let Ω' = {ω ∈ Ω: ‖ω − ω₀‖ ≥ γ}. For any ω' ∈ Ω', define an open ball in Ω as B(ω', η) = {ω ∈ Ω: ‖ω − ω'‖ < η} and define

f(x|B) = sup_{ω∈B} f(x; ω).

It is seen that f(x|B(ω', η)) ≤ (2πε)^{-1/2}, so that E{|log f(X|B(ω', η))|} < ∞, and as η → 0, E{log f(X|B(ω', η))} → E{log f(X; ω')}. For each ω', take η(ω') small enough that E{log[f(X|B(ω', η(ω')))/φ(X)]} < 0. (Note that f(x; ω₀) = φ(x).) Then there are finitely many ω's, say ω₁, ..., ω_m, such that ∪_{j=1}^m B_j ⊇ Ω', where B_j = B(ω_j, η(ω_j)). By the law of large numbers, for 1 ≤ j ≤ m,

P{n⁻¹ Σ log[f(X_i|B_j)/φ(X_i)] ≥ 0, i.o.} = 0,

where i.o. stands for infinitely often. Consequently, for the MLE ω̂ = (σ̂, Ĝ),

P{ω̂ ∈ Ω', i.o.} ≤ Σ_{j=1}^m P{n⁻¹ Σ log[f(X_i|B_j)/φ(X_i)] ≥ 0, i.o.} = 0,

i.e., σ̂² → 1 and λ(Ĝ, G₀) → 0 in probability. Finally, it is implied that the MLEs of the moments ∫u^k dG(u) = αθ^k are also consistent. Note that ∫u^k dG₀(u) = 0 under the null model N(0,1). The lemma is proved. □

Owing to Lemma 2, σ² can be viewed as having been restricted to any small neighborhood of σ² = 1, say [1−δ, 1+δ] for a small positive δ. This restriction will be used to ensure the tightness of some processes later. We point out that Lemma 2 does not imply anything about the rate of convergence. We also remark that Lemma 2 does not say that α̂ or θ̂ is consistent. In fact, α̂ and θ̂ are inconsistent under the null model. See Chernoff and Lander's (1995) discussion of the binomial mixture model, which also applies to the normal mixture model.

2.2 Asymptotic distribution of the LRT

We proceed to study the large-sample behavior of the LRT. A sandwich idea is employed to derive the asymptotic null distribution of R_n. We first establish an asymptotic upper bound for R_n. Write

r_n(α, θ, σ) = 2{l_n(α, θ, σ) − l_n(0, 0, 1)} + 2{l_n(0, 0, 1) − l_n(0, 0, σ̂₀)} = r_{1n}(α, θ, σ) + r_{2n},

where r_{1n}(α, θ, σ) = 2{l_n(α, θ, σ) − l_n(0, 0, 1)}, r_{2n} = 2{l_n(0, 0, 1) − l_n(0, 0, σ̂₀)} and σ̂₀² = n⁻¹ Σ X_i², the MLE of σ² under the null model.

We first analyze r_{1n}(α, θ, σ). Express r_{1n}(α, θ, σ) = 2Σ log(1 + δ_i), where δ_i = (σ² − 1)U_i(σ) + αθY_i(θ, σ), with

U_i(σ) = (σ² − 1)⁻¹[σ⁻¹ exp{(X_i²/2)(1 − σ⁻²)} − 1],   (4)

and

Y_i(θ, σ) = (σθ)⁻¹[exp{−(X_i−θ)²/(2σ²) + X_i²/2} − exp{−X_i²/(2σ²) + X_i²/2}].   (5)

The functions U_i(σ) and Y_i(θ, σ) are continuously differentiable upon defining U_i(1) = (X_i² − 1)/2 and Y_i(0, σ) = σ⁻³X_i exp{−X_i²(σ⁻² − 1)/2}. Also note that under the null distribution N(0,1), E{U_i(σ)} = 0 and E{Y_i(θ, σ)} = 0 for any σ and θ. By the inequality log(1 + x) ≤ x − x²/2 + x³/3, we have

r_{1n}(α, θ, σ) = 2Σ log(1 + δ_i) ≤ 2Σ δ_i − Σ δ_i² + (2/3)Σ δ_i³.

Rewrite δ_i as

δ_i = (σ² − 1)U_i(1) + αθY_i(θ, 1) + ε_{in},   (6)

where the remainder ε_{in} = (σ² − 1){U_i(σ) − U_i(1)} + αθ{Y_i(θ, σ) − Y_i(θ, 1)}. The following proposition can be used to estimate the sum of the remainders.

Proposition 1. Let 0 < δ < 1. Then under the null distribution N(0,1), the processes

U*_n(σ) = n^{-1/2} Σ{U_i(σ) − U_i(1)}/(σ² − 1) and Y_n(θ, σ) = n^{-1/2} Σ{Y_i(θ, σ) − Y_i(θ, 1)}/(σ² − 1),

for σ² ∈ [1−δ, 1+δ] and |θ| ≤ M, are tight.

Proof. We only need to verify the Lipschitz condition in light of Billingsley (1968, p. 95). That is, for U*_n(σ), we must prove that

E{U*_n(σ₁) − U*_n(σ₂)}² ≤ C(σ₁ − σ₂)²,

for some constant C. Since E{U_i(σ)} = 0, it is sufficient to prove that the square of the derivative of {U_i(σ) − U_i(1)}/(σ² − 1) is bounded by an integrable random variable, say g(X_i). Furthermore, since {U_i(σ) − U_i(1)}/(σ² − 1) is a second-order difference of the function H_i(σ) = σ⁻¹ exp{−X_i²(1/σ² − 1)/2}, it is enough to prove that |H_i'''(σ)| ≤ g(X_i) for all σ² ∈ [1−δ, 1+δ]. By direct calculation, we see that for some constant C,

|H_i'''(σ)| ≤ C(X_i⁶ + X_i⁴ + X_i² + 1) exp{X_i²δ/(2(1+δ))}.

The right-hand side is integrable under the null distribution N(0,1), since 0 < δ < 1. Similarly, for Y_n(θ, σ), it is sufficient to show that

|∂H_i(θ, σ)/∂θ| + |∂²H_i(θ, σ)/∂θ∂σ| ≤ g(X_i),

for some integrable g(X_i), where H_i(θ, σ) = σ⁻¹ exp{−(X_i−θ)²/(2σ²) + X_i²/2}. Again by direct calculation, we have, for some constant C,

|∂H_i(θ, σ)/∂θ| ≤ C(X_i² + |X_i| + 1) exp{−(X_i−θ)²/(2(1+δ)) + X_i²/2} ≤ C(X_i² + |X_i| + 1) exp{δX_i²/(2(1+δ)) + M|X_i|}.

The rightmost expression above is again integrable under the null distribution N(0,1), as 0 < δ < 1. A similar argument shows that ∂²H_i(θ, σ)/∂θ∂σ is bounded above by an integrable random variable. The proof is complete. □

By Proposition 1, U*_n(σ) = O_p(1) and Y_n(θ, σ) = O_p(1), implying that

Σ ε_{in} = n^{1/2}(σ² − 1)² O_p(1) + n^{1/2}αθ(σ² − 1) O_p(1).   (7)

(Note that by the convention, U*_n(σ) = O_p(1) means sup_{|σ²−1|≤δ} |U*_n(σ)| = O_p(1), and Y_n(θ, σ) = O_p(1) means sup_{|θ|≤M, |σ²−1|≤δ} |Y_n(θ, σ)| = O_p(1).) For convenience of notation, put E_{n1} = (σ² − 1)² O_p(1), E_{n2} = αθ(σ² − 1) O_p(1), U_i = U_i(1), and Y_i(θ) = Y_i(θ, 1). By (6) and (7), we obtain

Σ δ_i = Σ{(σ² − 1)U_i + αθY_i(θ)} + n^{1/2}(E_{n1} + E_{n2}).   (8)

Similarly, we can replace σ with 1 in the square and cubic terms of δ_i, and arrive at the following:

Σ δ_i² = Σ{(σ² − 1)U_i + αθY_i(θ)}² + n(E_{n1}² + E_{n2}²),   (9)

and

|Σ δ_i³ − Σ{(σ² − 1)U_i + αθY_i(θ)}³| = n(|E_{n1}|³ + |E_{n2}|³).   (10)

It is important to note that in (10) the remainder terms have a factor of n rather than n^{3/2}. To see this, e.g.,

Σ(σ² − 1)³{U_i(σ) − U_i}³ = n(σ² − 1)⁶ (1/n) Σ[{U_i(σ) − U_i}/(σ² − 1)]³ = n(σ² − 1)⁶ O_p(1) = n|E_{n1}|³.

Now by (8), (9) and (10),

r_{1n}(α, θ, σ) ≤ 2Σ{(σ² − 1)U_i + αθY_i(θ)} − Σ{(σ² − 1)U_i + αθY_i(θ)}² + (2/3)Σ{(σ² − 1)U_i + αθY_i(θ)}³ + n^{1/2}(E_{n1} + E_{n2}) + n Σ_{j=2}^{3}(|E_{n1}|^j + |E_{n2}|^j).   (11)

Introduce Z_i(θ) = Y_i(θ) − θU_i. Then

(σ² − 1)U_i + αθY_i(θ) = t₁U_i + t₂Z_i(θ), where t₁ = σ² − 1 + αθ², t₂ = αθ.

Since U_i and Z_i(θ) are orthogonal, i.e., E{U_iZ_i(θ)} = 0, the cubic and remainder terms in (11) are controlled by the square term. In fact, n⁻¹ times the square sum converges uniformly to a positive definite quadratic form in t₁ and t₂, and n⁻¹Σ{|Z_i(θ)|³ + |U_i|³} = O_p(1) uniformly. Thus,

Σ|t₁U_i + t₂Z_i(θ)|³ ≤ Σ{t₁U_i + t₂Z_i(θ)}² (|t₁| + |t₂|)O_p(1).

As for the remainder terms in (11), since |θ| ≤ M,

n^{1/2}E_{n1} = n^{1/2}(σ² − 1)²O_p(1) ≤ n^{1/2}(t₁² + t₂²)O_p(1) = o_p{Σ[t₁U_i + t₂Z_i(θ)]²},

and similarly

n^{1/2}E_{n2} ≤ n^{1/2}{t₂² + (σ² − 1)²}O_p(1) ≤ n^{1/2}(t₁² + t₂²)O_p(1) = o_p{Σ[t₁U_i + t₂Z_i(θ)]²}.

(Note that when t₁ = t₂ = 0, i.e., σ² = 1 and αθ = 0 in the above inequalities, r_{1n} = 0 = o_p(1). Thus this case can be ignored here and on other similar occasions

in the sequel.) The other remainder terms, resulting from the square or cubic sums, are of the same (or higher) order as those from the linear sum. In fact,

n(E_{n1}² + E_{n2}²) ≤ n(t₁² + t₂²)²O_p(1) = (t₁² + t₂²)O_p{Σ[t₁U_i + t₂Z_i(θ)]²},

n(|E_{n1}|³ + |E_{n2}|³) ≤ (|t₁| + |t₂|)O_p(nE_{n1}² + nE_{n2}²).

It is then concluded that (11) can be expressed as

r_{1n}(α, θ, σ) ≤ 2Σ{t₁U_i + t₂Z_i(θ)} − Σ{t₁U_i + t₂Z_i(θ)}²{1 + (|t₁| + |t₂|)O_p(1) + o_p(1)}.   (12)

Since U_i and Z_i(θ) are orthogonal, (12) is further reduced to

r_{1n}(α, θ, σ) ≤ 2Σ{t₁U_i + t₂Z_i(θ)} − {t₁²ΣU_i² + t₂²ΣZ_i²(θ)}{1 + (|t₁| + |t₂|)O_p(1) + o_p(1)}.   (13)

Let t̂₁ = σ̂² − 1 + α̂θ̂² and t̂₂ = α̂θ̂ be the MLEs. By Lemma 2, t̂₁ = o_p(1) and t̂₂ = o_p(1). Consequently, replacement by the MLEs in (13) gives

r_{1n}(α̂, θ̂, σ̂) ≤ 2Σ{t̂₁U_i + t̂₂Z_i(θ̂)} − {t̂₁²ΣU_i² + t̂₂²ΣZ_i²(θ̂)}{1 + o_p(1)}.   (14)

To obtain a short form for the upper bound, fix θ and consider the quadratic function

Q(t₁, t₂) = 2Σ{t₁U_i + t₂Z_i(θ)} − t₁²ΣU_i² − t₂²ΣZ_i²(θ).

If θ > 0, then t₂ = αθ ≥ 0, and if θ < 0, then t₂ ≤ 0. By considering the regions t₂ ≥ 0 and t₂ < 0 separately, we see that for fixed θ, Q(t₁, t₂) is maximized at t₁ = t̃₁ and t₂ = t̃₂ with

t̃₁ = ΣU_i/ΣU_i²,   t̃₂ = sgn(θ)[sgn(θ)ΣZ_i(θ)]₊/ΣZ_i²(θ),   (15)

where sgn(θ) is the sign function, and

Q(t̃₁, t̃₂) = {ΣU_i}²/ΣU_i² + [{sgn(θ)ΣZ_i(θ)}₊]²/ΣZ_i²(θ).

Therefore, by (14) it follows that

r_{1n}(α̂, θ̂, σ̂) ≤ [{ΣU_i}²/ΣU_i²]{1 + o_p(1)} + sup_{|θ|≤M} [[{sgn(θ)ΣZ_i(θ)}₊]²/ΣZ_i²(θ)]{1 + o_p(1)}
= {ΣU_i}²/ΣU_i² + sup_{|θ|≤M} [{sgn(θ)ΣZ_i(θ)}₊]²/ΣZ_i²(θ) + o_p(1).   (16)

Recall that R_n = r_n(α̂, θ̂, σ̂) = r_{1n}(α̂, θ̂, σ̂) + r_{2n}, and note that r_{2n} admits an ordinary quadratic approximation, i.e.,

r_{2n} = −{ΣU_i}²/ΣU_i² + o_p(1).

An upper bound for R_n is thus obtained as follows:

R_n ≤ sup_{|θ|≤M} [{sgn(θ)ΣZ_i(θ)}₊]²/{nEZ₁²(θ)} + o_p(1).   (17)

Here nEZ₁²(θ) substitutes for ΣZ_i²(θ) since they are asymptotically equivalent, uniformly in θ.

To obtain a lower bound for R_n, fix any small ε > 0. Let R_n(ε) be the supremum of r_n(α, θ, σ) under the restriction ε ≤ |θ| ≤ M. For fixed θ with ε ≤ |θ| ≤ M, let ᾱ(θ) and σ̄(θ) assume the values determined by (15). Consider the Taylor expansion

r_{1n}(ᾱ(θ), θ, σ̄(θ)) = 2Σδ̄_i − Σδ̄_i²(1 + η_i)⁻²,

where |η_i| < |δ̄_i| and δ̄_i is equal to δ_i in (6) with α = ᾱ(θ) and σ = σ̄(θ). Owing to the bounding of θ away from 0, the solution ᾱ(θ) is feasible, so that σ̄²(θ) − 1 = O_p(n^{-1/2}) and ᾱ(θ) = O_p(n^{-1/2}), uniformly in |θ| ∈ [ε, M]. Since δ̄_i = (σ̄² − 1)U_i(σ̄) + ᾱθY_i(θ, σ̄), we have

|δ̄_i| ≤ |σ̄² − 1||U_i(σ̄)| + |ᾱθ||Y_i(θ, σ̄)|.

For a generic constant C,

sup_{|θ|≤M} |Y_i(θ, σ̄)| ≤ C(X*² + 1)e^{CX*} = o_p(n^{1/2}),

where X* = max_i |X_i| = O_p(√(log n)) (see Serfling, 1980, page 91). Similarly |U_i(σ̄)| ≤ CX*² = o_p(n^{1/2}). Thus, uniformly in θ,

max_i |η_i| ≤ max_i |δ̄_i| = o_p(1),   (18)

i.e.,

r_{1n}(ᾱ(θ), θ, σ̄(θ)) = 2Σδ̄_i − Σδ̄_i²{1 + o_p(1)}.

Thus, from (15) with fixed θ, ᾱ and σ̄ are such that

r_{1n}(ᾱ(θ), θ, σ̄(θ)) = {ΣU_i}²/ΣU_i² + [{sgn(θ)ΣZ_i(θ)}₊]²/{nEZ₁²(θ)} + o_p(1).

It follows that

R_n(ε) ≥ sup_{ε≤|θ|≤M} r_n(ᾱ(θ), θ, σ̄(θ)) = sup_{ε≤|θ|≤M} [{sgn(θ)ΣZ_i(θ)}₊]²/{nEZ₁²(θ)} + o_p(1).   (19)

Theorem 1. Let X_1, ..., X_n be a random sample from the mixture distribution (1−α)N(0, σ²) + αN(θ, σ²), where 0 ≤ α ≤ 1, |θ| ≤ M and σ > 0 are otherwise unknown. Let R_n be twice the log-likelihood ratio statistic for testing H₀: α = 0 or θ = 0, i.e., N(0, σ²). Then under the null distribution N(0,1), as n → ∞,

R_n → sup_{|θ|≤M} {ζ₊(θ)}² in distribution,

where ζ(0) = 0 and, for 0 < |θ| ≤ M, ζ(θ) is a Gaussian process with mean 0, variance 1 and autocorrelation function

ρ(s, t) = sgn(st) (e^{st} − 1 − (st)²/2) / √{(e^{s²} − 1 − s⁴/2)(e^{t²} − 1 − t⁴/2)},   (20)

for s, t ≠ 0.

Proof. The proof starts with (17) and (19). The process

n^{-1/2} ΣZ_i(θ)/√{EZ₁²(θ)},  |θ| ≤ M,

converges weakly to a Gaussian process ξ(θ). Direct calculation of the mean and covariance of Z_i(θ) yields that the Gaussian process ξ(θ) has mean 0, variance 1 and, for s, t ≠ 0, autocorrelation function

(e^{st} − 1 − (st)²/2) / √{(e^{s²} − 1 − s⁴/2)(e^{t²} − 1 − t⁴/2)}.

Therefore the upper bound of R_n converges in distribution to sup_{|θ|≤M}{ζ₊(θ)}², where ζ(0) = 0 and, for 0 < |θ| ≤ M, ζ(θ) = sgn(θ)ξ(θ) is the Gaussian process with mean 0, variance 1 and autocorrelation function (20). For given ε > 0, the lower bound of R_n converges weakly to

R(ε) = sup_{ε≤|θ|≤M} {ζ₊(θ)}².

Now letting ε → 0, R(ε) approaches sup_{|θ|≤M}{ζ₊(θ)}² in distribution. This completes the proof. □

3 Two-Mean-Parameter Mixtures: Tests for Homogeneity

In this section we study the testing problem when both mean parameters θ₁ and θ₂ are unknown. In addition, assume 0 ≤ α ≤ 1/2 so that θ₁ and θ₂ are distinguishable. We wish to test H₀: α = 0 or θ₁ = θ₂, versus the full model (1).
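The autocorrelation function (20) is easy to evaluate numerically. The sketch below (illustrative only) implements ρ(s, t) as reconstructed above and checks two properties any correlation function must satisfy: ρ(t, t) = 1 and |ρ(s, t)| ≤ 1.

```python
import math

def rho(s, t):
    """Autocorrelation (20) of the limiting process zeta:
    sgn(st) (e^{st} - 1 - (st)^2/2) / sqrt((e^{s^2}-1-s^4/2)(e^{t^2}-1-t^4/2)),
    for s, t != 0."""
    num = math.exp(s * t) - 1 - (s * t) ** 2 / 2
    den = math.sqrt((math.exp(s * s) - 1 - s ** 4 / 2)
                    * (math.exp(t * t) - 1 - t ** 4 / 2))
    return math.copysign(1.0, s * t) * num / den

print(rho(0.7, 0.7))                      # 1.0: unit variance on the diagonal
print(abs(rho(0.5, -1.3)) <= 1.0)         # True: a valid correlation
print(abs(rho(0.5, 1.3) - rho(1.3, 0.5)) < 1e-12)  # True: symmetric
```

Such a routine would be the starting point for simulating the limiting distribution sup_{|θ|≤M}{ζ₊(θ)}² by discretizing the Gaussian process on a grid of θ values.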

Let X_1, ..., X_n be a random sample of size n from the mixture population (1−α)N(θ₁, σ²) + αN(θ₂, σ²). Let r_n(α, θ₁, θ₂, σ) = 2{l_n(α, θ₁, θ₂, σ) − l_n(0, θ̂, θ̂, σ̂₀)}, where θ̂ = X̄ and σ̂₀² = n⁻¹Σ(X_i − X̄)² are the MLEs of θ₁ = θ₂ = θ and σ² under the null hypothesis. Explicitly,

r_n(α, θ₁, θ₂, σ) = 2Σ log[(1−α)σ⁻¹ exp{−(X_i−θ₁)²/(2σ²)} + ασ⁻¹ exp{−(X_i−θ₂)²/(2σ²)}] + n{1 + log(Σ(X_i−X̄)²/n)}.

Let α̂, θ̂₁, θ̂₂ and σ̂ be the MLEs of α, θ₁, θ₂ and σ under the full model (1−α)N(θ₁, σ²) + αN(θ₂, σ²). The LRT rejects H₀ if the observed R_n = r_n(α̂, θ̂₁, θ̂₂, σ̂) is large.

3.1 Large sample behavior of the MLEs

The statement of Lemma 1 remains true, i.e., under the null distribution N(0,1), there are constants 0 < ε < Δ < ∞ such that lim_{n→∞} P(ε ≤ σ̂² ≤ Δ) = 1. The proof is also similar to that of Lemma 1. Note that r_n(α, θ₁, θ₂, σ) can be expressed as

r_n(α, θ₁, θ₂, σ) = 2Σ log[1 + α[exp{((X_i−θ₁)² − (X_i−θ₂)²)/(2σ²)} − 1]] − n log σ² − Σ(X_i−θ₁)²/σ² + n{1 + log(Σ(X_i−X̄)²/n)}.

Thus the inequality (3) becomes

r_n(α, θ₁, θ₂, σ) ≤ −n S_n(θ₁, θ₂)/σ² − n log σ² + n{1 + log(Σ(X_i−X̄)²/n)},

where

S_n(θ₁, θ₂) = n⁻¹ Σ[(X_i−θ₁)² − {(X_i−θ₁)² − (X_i−θ₂)²}₊].

Uniformly in |θ_i| ≤ M, i = 1, 2, S_n(θ₁, θ₂) approaches almost surely

S(θ₁, θ₂) = E(X−θ₁)² − E[{(X−θ₁)² − (X−θ₂)²}₊].

The function S(θ₁, θ₂) is continuous and positive over |θ_i| ≤ M, i = 1, 2. Thus the minimum of S(θ₁, θ₂) is positive, as required by the proof of Lemma 1. Lemma 2 is rewritten as follows.

Lemma 3. Under the null distribution N(0,1), as n → ∞, θ̂₁ → 0, (1−α̂)θ̂₁ + α̂θ̂₂ → 0, α̂θ̂₂² → 0 and σ̂² → 1, in probability.

Proof. The proof is similar to that of Lemma 2. Consider ε ≤ σ² ≤ Δ for some constants 0 < ε < 1 < Δ < ∞. Let the space G = {G: G(u) = (1−α)I(u−θ₁) + αI(u−θ₂), 0 ≤ α ≤ 1/2, |θ_i| ≤ M, i = 1, 2} be metrized by the Lévy distance. Then the product space [ε, Δ] × G is compact. Furthermore, the parameters σ² ∈ [ε, Δ] and G ∈ G are identifiable. With the compactness and identifiability, Wald's argument leads to the consistency of the MLEs of σ² and G. Therefore the MLEs of the moments ∫u^k dG(u) = (1−α)θ₁^k + αθ₂^k are consistent. Under the null distribution N(0,1), ∫u^k dG(u) = 0. Thus (1−α̂)θ̂₁ + α̂θ̂₂ → 0 and (1−α̂)θ̂₁² + α̂θ̂₂² → 0, which implies that α̂θ̂₂² → 0 and θ̂₁ → 0 since 1−α̂ ≥ 1/2. The lemma is proved. □

In light of Lemma 3, without loss of generality, σ² can be restricted to a small neighborhood of σ² = 1, say [1−δ, 1+δ] for a small positive number δ.

We proceed to derive the asymptotic distribution of the LRT. The new challenge in the present case is the loss of positive-definiteness of the quadratic term as in (11). To overcome the difficulty, the parameter space is partitioned into two parts: |θ₂| > ε and |θ₂| ≤ ε, for an arbitrarily small ε > 0. The LRT will be analyzed within each part by using the sandwich approach. Let R_n(ε; I) denote the supremum of the likelihood ratio within the part |θ₂| > ε, and R_n(ε; II) the supremum within |θ₂| ≤ ε. Then

19 R n = max{r n (ɛ; I), R n (ɛ; II)}. The number ɛ will remain fixed as n approaches infinity. It is easily seen that Lemma 3 remains true under either restriction θ > ɛ or θ ɛ. Dependence on ɛ will be suppressed notationally for the MLE s of the parameters. Thus ˆα, ˆθ 1, ˆθ and ˆσ will denote the constrained MLE s of α, θ 1, θ and σ with restriction θ ɛ in the analysis of R n (ɛ; I), but stand for the constrained MLE s with restriction θ ɛ in the analysis of R n (ɛ; II). 3. Analysis of R n (ɛ; I) We first establish an asymptotic upper bound for R n (ɛ; I). As in Section., write r n (α, θ 1, θ, σ) = {l n (α, θ 1, θ, σ) l n (0, 0, 0, 1)} + {l n (0, 0, 0, 1) l n (0, ˆθ, ˆθ, ˆσ 0 )} = r 1n (α, θ 1, θ, σ) + r n. To analyze r 1n (α, θ 1, θ, σ), express r 1n (α, θ 1, θ, σ) = log(1 + δ i ), where [ 1 δ i = (1 α) σ exp{x i (X i θ 1 ) ] [ 1 } 1 + α σ σ exp{x i (X i θ ) ] } 1 σ = (1 α)θ 1 Y i (θ 1, σ) + αθ Y i (θ, σ) + (σ 1)U i (σ), (1) with Y i (θ, σ) and U i (σ) are defined in (5) and (4). Re-write δ i = m 1 Y i (0, 1) + (σ 1 + m )U i (1) + m 3 V i (θ ) + ɛ in, where ɛ in is the remainder term of replacement, and m 1 = (1 α)θ 1 + αθ, m = (1 α)θ 1 + αθ, m 3 = αθ 3, V i (θ ) = Y i(θ, 1) Y i (0, 1) θ U i (1). () θ Define V i (0) = (X i /) + (Xi 3 /6) so that the function V i (θ) is continuously differentiable. By a similar analysis to the single mean parameter case, it is seen that the total remainder satisfies ɛ n = ɛ in = O p { n σ 1 [ m 1 + θ1 + αθ + σ 1 ] + n θ1 }. 3 (3) 19

Recall that U_i = U_i(1) = (X_i² − 1)/2 and Y_i(0, 1) = X_i. We have

Σδ_i = m₁ΣX_i + (σ² − 1 + m₂)ΣU_i + m₃ΣV_i(θ₂) + ε_n.

Since the remainders resulting from the square and cubic sums are of the same (or higher) order as that from the linear sum (see the similar analysis in the case of single mean parameter mixtures), we have

r_{1n}(α, θ₁, θ₂, σ) ≤ 2Σδ_i − Σδ_i² + (2/3)Σδ_i³
= 2Σ{m₁X_i + (σ² − 1 + m₂)U_i + m₃V_i(θ₂)} − Σ{m₁X_i + (σ² − 1 + m₂)U_i + m₃V_i(θ₂)}² + (2/3)Σ{m₁X_i + (σ² − 1 + m₂)U_i + m₃V_i(θ₂)}³ + O_p{n^{1/2}|σ² − 1|[|m₁| + θ₁² + αθ₂² + |σ² − 1|] + n^{1/2}|θ₁|³}.

Furthermore, the cubic sum is negligible compared to the square sum. This can be justified by using the idea leading to (14). First, n⁻¹ times the square sum approaches E{m₁X₁ + (σ² − 1 + m₂)U₁ + m₃V₁(θ₂)}² uniformly. The limit is a positive definite quadratic form in the variables m₁, σ² − 1 + m₂ and m₃. Next, noting that X_i, U_i and V_i(θ₂) are mutually orthogonal, we see that

r_{1n}(α̂, θ̂₁, θ̂₂, σ̂) ≤ 2{m̂₁ΣX_i + (σ̂² − 1 + m̂₂)ΣU_i + m̂₃ΣV_i(θ̂₂)} − {m̂₁²ΣX_i² + (σ̂² − 1 + m̂₂)²ΣU_i² + m̂₃²ΣV_i²(θ̂₂)}{1 + o_p(1)} + ε̂_n.

Here the terms with a hat are the (constrained) MLEs under the restriction |θ₂| > ε, as remarked at the end of Section 3.1. In particular, from (23),

ε̂_n = O_p{n^{1/2}|σ̂² − 1|[|m̂₁| + θ̂₁² + α̂θ̂₂² + |σ̂² − 1|] + n^{1/2}|θ̂₁|³}.

By the Cauchy inequality [e.g., n^{1/2}|m̂₁| ≤ 1 + n m̂₁²] and the restriction |θ₂| > ε (hence |θ̂₂| ≥ ε), we have

n^{1/2}|σ̂² − 1|[|m̂₁| + θ̂₁² + α̂θ̂₂² + |σ̂² − 1|] + n^{1/2}|θ̂₁|³
≤ |σ̂² − 1|[4 + n{m̂₁² + θ̂₁⁴ + (α̂θ̂₂²)² + (σ̂² − 1)²}] + |θ̂₁|(1 + nθ̂₁⁴)
= o_p(1) + n o_p{m̂₁² + θ̂₁⁴ + (α̂θ̂₂²)² + (σ̂² − 1)²}
= o_p(1) + n o_p{m̂₁² + (σ̂² − 1 + m̂₂)² + m̂₃²}.

Hence the remainder term ε̂_n can also be absorbed into the quadratic sum, i.e.,

r_{1n}(α̂, θ̂₁, θ̂₂, σ̂) ≤ 2{m̂₁ΣX_i + (σ̂² − 1 + m̂₂)ΣU_i + m̂₃ΣV_i(θ̂₂)} − {m̂₁²ΣX_i² + (σ̂² − 1 + m̂₂)²ΣU_i² + m̂₃²ΣV_i²(θ̂₂)}{1 + o_p(1)} + o_p(1).

Applying the argument leading to (16), the right-hand side of the above inequality becomes even greater when m̂₁, σ̂² − 1 + m̂₂ and m̂₃ are replaced by

m̃₁ = ΣX_i/ΣX_i²,  σ̃² − 1 + m̃₂ = ΣU_i/ΣU_i²,  m̃₃ = sgn(θ₂)[sgn(θ₂)ΣV_i(θ₂)]₊/ΣV_i²(θ₂),   (24)

for any ε < |θ₂| ≤ M, so that

r_{1n}(α̂, θ̂₁, θ̂₂, σ̂) ≤ {ΣX_i}²/ΣX_i² + {ΣU_i}²/ΣU_i² + sup_{ε<|θ|≤M} [{sgn(θ)ΣV_i(θ)}₊]²/ΣV_i²(θ) + o_p(1),   (25)

or

r_{1n}(α̂, θ̂₁, θ̂₂, σ̂) ≤ {ΣX_i}²/n + 2{ΣU_i}²/n + sup_{ε<|θ|≤M} [{sgn(θ)ΣV_i(θ)}₊]²/{nEV₁²(θ)} + o_p(1).

On the other hand, the classic analysis gives

r_{2n} = 2{l_n(0, 0, 0, 1) − l_n(0, θ̂, θ̂, σ̂₀)} = −nX̄² − 2{ΣU_i}²/n + o_p(1).   (26)

Combining (25) and (26) yields

R_n(ε; I) ≤ sup_{ε<|θ|≤M} [{sgn(θ)ΣV_i(θ)}₊]²/{nEV₁²(θ)} + o_p(1).

We have thus established an asymptotic upper bound for R_n(ε; I). Again the upper bound is achievable. For fixed |θ₂| ≥ ε, let ᾱ, θ̄₁ and σ̄ be the solutions for α, θ₁ and σ of (24). Then ᾱ = O_p(n^{-1/2}), θ̄₁ = O_p(n^{-1/2}) and σ̄² − 1 = O_p(n^{-1/2}), uniformly in θ₂. The uniformity is ensured by the restriction |θ₂| > ε. Use the Taylor expansion

r_{1n}(ᾱ, θ̄₁, θ₂, σ̄) = 2Σδ̄_i − Σδ̄_i²(1 + η_i)⁻²,

where |η_i| < |δ̄_i|. The argument leading to (18) also proves max_i|η_i| ≤ max_i|δ̄_i| = o_p(1), so that

r_{1n}(ᾱ, θ̄₁, θ₂, σ̄) = 2Σδ̄_i − Σδ̄_i²(1 + o_p(1)).

By (24), ᾱ, θ̄₁ and σ̄ are such that

R_n(ε; I) ≥ sup_{ε≤|θ|≤M} r_n(ᾱ, θ̄₁, θ, σ̄) = sup_{ε<|θ|≤M} [{sgn(θ)ΣV_i(θ)}₊]²/{nEV₁²(θ)} + o_p(1).

It is thus shown that the asymptotic upper bound is achievable. That is,

R_n(ε; I) = sup_{ε<|θ|≤M} [{sgn(θ)ΣV_i(θ)}₊]²/{nEV₁²(θ)} + o_p(1).   (27)

This concludes the analysis of R_n(ε; I).

3.3 Analysis of R_n(ε; II)

Now consider the restriction |θ₂| ≤ ε. In this case θ₁ and θ₂ can be treated equally. In fact, since the MLE of θ₁ is consistent, we can restrict |θ₁| ≤ ε as well. As before, we know that

r_{1n}(α, θ₁, θ₂, σ) = 2Σ log(1 + δ_i) ≤ 2Σδ_i − Σδ_i² + (2/3)Σδ_i³.

Let m̂_k = (1−α̂)θ̂₁^k + α̂θ̂₂^k. Using the Taylor expansions of Y_i(θ̂₂, σ̂) and U_i(σ̂) in (21), we have

δ̂_i = m̂₁Y_i(0, 1) + (σ̂² − 1 + m̂₂)Y_i'(0, 1) + (1/2)m̂₃Y_i''(0, 1) + (1/6){3(σ̂² − 1)² + m̂₄ + 6(σ̂² − 1)m̂₂}Y_i'''(0, 1) + ε̂_{in},   (28)

where Y_i'(0, 1) is the first partial derivative of Y_i(θ, σ) with respect to θ at θ = 0 and σ = 1, and similarly for Y_i''(0, 1) and Y_i'''(0, 1). As before, put Y_i = Y_i(0, 1), Y_i' = Y_i'(0, 1), Y_i'' = Y_i''(0, 1), Y_i''' = Y_i'''(0, 1). By calculation, Y_i' = U_i(1) = (X_i² − 1)/2, Y_i'' = (X_i³ − 3X_i)/3, and Y_i''' = U_i'(1) = (X_i⁴ − 6X_i² + 3)/4. The sum of the remainders ε̂_n = Σε̂_{in} satisfies

ε̂_n = n^{1/2}|σ̂² − 1|³O_p(1) + n(m̂₁² + m̂₃²)o_p(1) + n^{1/2}(|m̂₅| + m̂₆)O_p(1) + o_p(1).   (29)

Note that the cross-product terms in the Taylor expansion of (28) have been taken into account in the remainder. For example,

n^{1/2}|σ̂² − 1||m̂₁| = o_p(n^{1/2}|m̂₁|) = o_p(1 + n m̂₁²).

The coefficient n^{1/2} above results from the iid sums of zero-mean random variables, as we saw in the last section. Also note that

3(σ̂² − 1)² + m̂₄ + 6(σ̂² − 1)m̂₂ = 3(σ̂² − 1 + m̂₂)² + m̂₄ − 3m̂₂².

Hence (28) reduces to

δ̂_i = ŝ₁Y_i + ŝ₂Y_i' + ŝ₃Y_i'' + ŝ₄Y_i''' + ε̂_{in},

where

ŝ₁ = m̂₁,  ŝ₂ = σ̂² − 1 + m̂₂,  ŝ₃ = (1/2)m̂₃,  ŝ₄ = (1/6)(m̂₄ − 3m̂₂²),   (30)

and, combining (29), the sum of the remainders, ε̂_n = Σε̂_{in}, becomes

ε̂_n = n^{1/2}ŝ₂²O_p(1) + n^{1/2}|σ̂² − 1|³O_p(1) + n(m̂₁² + m̂₃²)o_p(1) + n^{1/2}(|m̂₅| + m̂₆)O_p(1) + o_p(1).   (31)

Therefore, an upper bound for r_{1n}(α̂, θ̂₁, θ̂₂, σ̂) is

r_{1n}(α̂, θ̂₁, θ̂₂, σ̂) ≤ 2Σ{ŝ₁Y_i + ŝ₂Y_i' + ŝ₃Y_i'' + ŝ₄Y_i'''}
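The mutual orthogonality of Y_i, Y_i', Y_i'', Y_i''' under N(0,1), used just below to make the quadratic sum positive definite, is a Hermite-polynomial fact that can be checked mechanically. The sketch below (illustrative, not from the paper) verifies E{p(X)q(X)} = 0 for all pairs among X, (X²−1)/2, (X³−3X)/3, (X⁴−6X²+3)/4, using the standard normal moment table E{X^k} = (k−1)!! for even k and 0 for odd k.

```python
def moment(k):
    """E[X^k] for X ~ N(0,1): 0 for odd k, (k-1)!! for even k."""
    if k % 2 == 1:
        return 0
    m = 1
    for j in range(1, k, 2):  # 1 * 3 * 5 * ... * (k-1)
        m *= j
    return m

# polynomials as coefficient lists [c0, c1, c2, ...] in powers of X
polys = {
    "Y":    [0, 1],                    # X
    "Y'":   [-0.5, 0, 0.5],            # (X^2 - 1)/2
    "Y''":  [0, -1, 0, 1/3],           # (X^3 - 3X)/3
    "Y'''": [3/4, 0, -3/2, 0, 1/4],    # (X^4 - 6X^2 + 3)/4
}

def expect_product(p, q):
    """E[p(X) q(X)] for X ~ N(0,1), via the moment table."""
    return sum(a * b * moment(i + j)
               for i, a in enumerate(p) for j, b in enumerate(q))

names = list(polys)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(names[i], names[j], round(expect_product(polys[names[i]], polys[names[j]]), 12))
# each pair prints 0.0: the four functions are mutually orthogonal under N(0,1)
```

These are, up to scaling, the Hermite polynomials H₁, H₂/2, H₃/3, H₄/4, which explains the orthogonality.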

− Σ{ŝ_1 Y_i′ + ŝ_2 Y_i″ + ŝ_3 Y_i‴ + ŝ_4 Y_i⁗}² + (2/3)Σ{ŝ_1 Y_i′ + ŝ_2 Y_i″ + ŝ_3 Y_i‴ + ŝ_4 Y_i⁗}³ + ε̂_n.

The argument leading to (14) shows that the cubic sum is controlled by the square sum up to a term of εO_p(1). Also note that Y_i′, Y_i″, Y_i‴ and Y_i⁗ are mutually orthogonal and hence the quadratic sum is positive-definite. Therefore,

r_{1n}(α̂, θ̂_1, θ̂_2, σ̂) ≤ 2Σ{ŝ_1 Y_i′ + ŝ_2 Y_i″ + ŝ_3 Y_i‴ + ŝ_4 Y_i⁗} − Σ{ŝ_1²(Y_i′)² + ŝ_2²(Y_i″)² + ŝ_3²(Y_i‴)² + ŝ_4²(Y_i⁗)²}{1 + εO_p(1)} + ε̂_n.  (32)

Since σ̂² − 1 = o_p(1), m̂_2 ≤ ε² and m̂_2³ ≤ m̂_6, we have

n^{1/2}|σ̂² − 1|³ ≤ 8n^{1/2}{|ŝ_2|³ + m̂_2³} ≤ εn^{1/2} ŝ_2² O_p(1) + n^{1/2} m̂_6 O_p(1),

so that (31) can be expressed as

ε̂_n = εn^{1/2} ŝ_2² O_p(1) + n(m̂_1² + m̂_3²) o_p(1) + n^{1/2}(|m̂_5| + m̂_6) O_p(1) + o_p(1).  (33)

Now the key point is to show that

ε̂_n = o_p(1) + εn{ŝ_1² + ŝ_2² + ŝ_3² + ŝ_4²} O_p(1).  (34)

This result implies that the remainder is also negligible when compared to the square sum in (32). Put τ̂ = (1 − α̂)|θ̂_1|⁵ + α̂|θ̂_2|⁵. Then |m̂_5| + m̂_6 = O_p(τ̂). Therefore, (34) follows immediately from (33) and the following lemma.

Lemma 4. τ̂ = o_p(1) + ε{|ŝ_1| + |ŝ_2| + |ŝ_3| + |ŝ_4|} O_p(1).

Proof. The proof is accomplished by partitioning the sample space into several parts and showing that in each part one of the ŝ_i, i = 1, 2, 3, 4, controls the size of τ̂.

Consider the first part: (1 − α̂)|θ̂_1| ≥ γ α̂|θ̂_2| for a constant γ > 1. In this case,

|m̂_1| ≥ (1 − α̂)|θ̂_1| − α̂|θ̂_2| ≥ (γ − 1) α̂|θ̂_2|.

On the other hand,

|m̂_1| ≥ (1 − α̂)|θ̂_1| − α̂|θ̂_2| = {1 − α̂|θ̂_2| / ((1 − α̂)|θ̂_1|)} (1 − α̂)|θ̂_1| ≥ (1 − γ^{-1})(1 − α̂)|θ̂_1|.

So, τ̂ = εO_p{(1 − α̂)|θ̂_1| + α̂|θ̂_2|} = εO_p(|ŝ_1|).

Similarly, if (1 − α̂)|θ̂_1|³ ≥ γ α̂|θ̂_2|³, we have τ̂ = εO_p(|ŝ_3|).

Finally, consider the case that γ^{-1} ≤ (1 − α̂)|θ̂_1|^k / (α̂|θ̂_2|^k) ≤ γ for k = 1, 3. Solving the inequalities with k = 1 and 3, we have γ^{-2} ≤ α̂/(1 − α̂) ≤ γ², implying (1 − α̂) ≤ α̂γ² and α̂ ≤ (1 − α̂)γ². Therefore,

m̂_4 − m̂_2² = α̂(1 − α̂)(θ̂_1² − θ̂_2²)² ≤ α̂(1 − α̂)(θ̂_1⁴ + θ̂_2⁴) ≤ γ²{(1 − α̂)²θ̂_1⁴ + α̂²θ̂_2⁴} ≤ γ² m̂_2².  (35)

So, m̂_4 − 3m̂_2² ≤ (γ² − 2)m̂_2² < 0 when the constant γ is chosen a priori to be between 1 and √2. It follows that m̂_2² ≤ |m̂_4 − 3m̂_2²|/(2 − γ²).

Equation (35) also implies m̂_4 ≤ (1 + γ²)m̂_2². Consequently,

τ̂ = εO_p(m̂_4) = εO_p(m̂_2²) = εO_p(|ŝ_4|).

We have thus exhausted the sample space and the lemma follows.

From (31) and (34), it follows that

r_{1n}(α̂, θ̂_1, θ̂_2, σ̂) ≤ 2Σ{ŝ_1 Y_i′ + ŝ_2 Y_i″ + ŝ_3 Y_i‴ + ŝ_4 Y_i⁗} − Σ{ŝ_1²(Y_i′)² + ŝ_2²(Y_i″)² + ŝ_3²(Y_i‴)² + ŝ_4²(Y_i⁗)²}{1 + εO_p(1)}.

Applying the quadratic form argument which has been used several times in the previous sections, we obtain

R_n(ε; II) ≤ (Σ Y_i‴)² / {nE(Y_1‴)²} + (Σ Y_i⁗)² / {nE(Y_1⁗)²} + εO_p(1).

The upper bound in the above inequality is attained when the parameters α, θ_1, θ_2 and σ assume the values determined by the following equations:

s̄_1 = Σ Y_i′ / Σ (Y_i′)²,  s̄_2 = Σ Y_i″ / Σ (Y_i″)²,  s̄_3 = Σ Y_i‴ / Σ (Y_i‴)²,  s̄_4 = Σ Y_i⁗ / Σ (Y_i⁗)²,

where s̄_1, s̄_2, s̄_3 and s̄_4 are defined correspondingly by (30). We thus arrive at

R_n(ε; II) = (Σ Y_i‴)² / {nE(Y_1‴)²} + (Σ Y_i⁗)² / {nE(Y_1⁗)²} + εO_p(1).  (36)

Remark. A by-product of the above analysis shows that the MLE of σ has a convergence rate at most n^{-1/4}. To see this, consider the submodel where θ_1 = −θ_2 = θ, α = 1/2 and σ² − 1 = −θ². The maximum of the likelihood function is achieved when

m_4 − 3m_2² = −2θ⁴ = 6 Σ Y_i⁗ / Σ (Y_i⁗)² = O_p(n^{-1/2}).

This implies that θ̂ = O_p(n^{-1/8}) and σ̂² − 1 = O_p(n^{-1/4}). This is in contrast to the ordinary semi-parametric models, where one may still have the usual rate of n^{-1/2} for

the parametric components. See Van der Vaart (1996). Moreover, the result suggests that the best possible rate for estimating the mixing distribution when a structural parameter is present is n^{-1/8}, rather than n^{-1/4} as found by Chen (1995) for the mixture models without a structural parameter.

3.4 Asymptotic distribution of the LRT

Theorem 2. Let X_1, ..., X_n be a random sample from the mixture distribution (1 − α)N(θ_1, σ²) + αN(θ_2, σ²), where 0 ≤ α ≤ 1/2, |θ_i| ≤ M, i = 1, 2, and σ > 0. Let R_n be (twice) the log-likelihood ratio test statistic for testing H_0: α = 0 or θ_1 = θ_2, i.e., N(θ_1, σ²). Then under the null distribution N(0, 1), as n → ∞,

R_n → sup_{|θ|≤M} [{ς⁺(θ)}² I(θ ≠ 0) + {ς²(0) + Z²} I(θ = 0)]

in distribution, where the process involved in the limiting distribution is defined as follows:

(1) ς(θ), |θ| ≤ M, is a Gaussian process with mean 0, variance 1 and the autocorrelation function, for s, t ≠ 0,

ρ(s, t) = sgn(st) b(st) / √{b(s²)b(t²)},  and  ρ(0, t) = t³ / √{6b(t²)},  (37)

where b(x) = e^x − 1 − x − x²/2, and

(2) ς(0) and Z ~ N(0, 1) are independent, and for s ≥ 0,

Cov{ς(s), Z} = s⁴ / [2√{6b(s²)}].  (38)

Proof. For any fixed ε > 0,

R_n = max{R_n(ε; I), R_n(ε; II)}.

By (27), the asymptotic distribution of R_n(ε; I) is determined by the limit of

[{sgn(θ) Σ V_i(θ)}⁺]² / {nEV_1²(θ)}.

By the definition of V_i(θ) given in (22), we see that n^{-1/2} Σ_{i=1}^n V_i(θ)/{EV_1²(θ)}^{1/2} converges weakly to a Gaussian process, say ξ(θ), ε ≤ |θ| ≤ M, with mean 0, variance 1 and autocorrelation function as follows: for ε ≤ |s|, |t| ≤ M,

Cov{ξ(s), ξ(t)} = b(st) / √{b(s²)b(t²)}.

Define ς(θ) = sgn(θ)ξ(θ) for θ ≠ 0 and ς(0) = ξ(0). Then ς(θ) is a Gaussian process with the autocorrelation function (37), and R_n(ε; I) converges weakly to

sup_{0<|θ|≤M} {ς⁺(θ)}²,

by first letting n → ∞ and then ε → 0.

On the other hand, by (36) we have that, by first letting n → ∞ and then ε → 0, R_n(ε; II) converges weakly to ς²(0) + Z². To see this, put R_n(ε; II) = A_n + εO_p(1) in (36). For any η > 0, there exists C > 0 such that P(|O_p(1)| > C) < η for all large n. Thus, for any given x and n large,

P(A_n ≤ x − εC) − η ≤ P(R_n(ε; II) ≤ x) ≤ P(A_n ≤ x + εC) + η,

implying that R_n(ε; II) converges weakly to ς²(0) + Z² by first letting n → ∞, then ε → 0 and finally η → 0.

The independence of ς(0) and Z is due to the fact that V_i(0) = Y_i‴/2 and the orthogonality of Y_i‴ and Y_i⁗. The correlation between ς(θ) and Z is seen from the following calculation:

Cov{V_i(θ), Y_i⁗} = (θ/6)Var(Y_i⁗) = θ/4,  and  Var{V_i(θ)} = b(θ²)/θ⁶.

Thus the correlation between V_i(θ) and Y_i⁗ is given by (38). The proof is completed.

4 Concluding Remarks

The asymptotic null distribution of the LRT for homogeneity in finite normal mixture models in the presence of a structural parameter has been derived without separation

conditions on the mean parameters. It is proved that the asymptotic null distribution of the LRT is the maximum of a χ²-variable and the supremum of the square of a truncated Gaussian process. If the structural parameter were removed from the model, the peculiar large sample behavior of the LRT would disappear, and the limiting null distribution would simply be the supremum of the square of the truncated Gaussian process, reducing to the one discovered by Chen and Chen (2001a). If, in addition, M is allowed to approach infinity, the supremum is distributed approximately as that of

(2 log M)^{1/2} + {X − log(2π)}/(2 log M)^{1/2},

where P(X ≤ x) = exp{−e^{−x}}, which is the type-I extreme value distribution. See Chernoff and Lander (1995, Appendix D) and Adler (1990). The result in Bickel and Chernoff (1993) can be obtained in a heuristic way by letting M = (log n/2)^{1/2}. It is interesting to see that the results from the different model set-ups agree formally. Bickel and Chernoff actually dealt with a modified LRT, replacing a random element in the LRT statistic with its mean in order to simplify the analysis. It seems that their modification might not have changed the asymptotic behavior of the LRT substantially.

Computing the quantiles of the supremum of a Gaussian process over a region is a difficult problem. See also the comments by Dacunha-Castelle and Gassiat (1999) and Chen and Chen (2001b). Some approximations in special cases can be found in Adler (1990) and Sun (1993).

Owing to the large sample study, it is found that even though the structural parameter is not part of the mixing distribution, the convergence rate of its MLE is n^{-1/4} rather than n^{-1/2}. This is in sharp contrast to the ordinary semi-parametric models. Moreover, the estimated mixing distribution has a convergence rate n^{-1/8}, rather than n^{-1/4} as discovered by Chen (1995) for finite mixture models without a structural parameter.
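A numerical footnote to Section 3.3: the analysis rests on the mutual orthogonality of the four derivative directions Y′, Y″, Y‴, Y⁗, which are proportional to the first four Hermite polynomials under N(0, 1). Their exact cross-moments can be checked from the standard normal moments alone. This is an illustrative sketch; the labels Y1 through Y4 and the function name are ours, not the paper's.

```python
# Exact standard-normal moments E[X^k] for k = 0..8: odd moments vanish,
# even moments are (k - 1)!!.
MOM = [1.0, 0.0, 1.0, 0.0, 3.0, 0.0, 15.0, 0.0, 105.0]

# Polynomial coefficients (constant term first) of the four directions
# Y' = X, Y'' = (X^2 - 1)/2, Y''' = (X^3 - 3X)/3, Y'''' = (X^4 - 6X^2 + 3)/4.
DIRS = {
    "Y1": [0.0, 1.0],
    "Y2": [-0.5, 0.0, 0.5],
    "Y3": [0.0, -1.0, 0.0, 1.0 / 3.0],
    "Y4": [0.75, 0.0, -1.5, 0.0, 0.25],
}

def expect_product(p, q):
    """E[p(X) q(X)] for X ~ N(0,1), expanding the product against exact moments."""
    return sum(a * c * MOM[i + j]
               for i, a in enumerate(p) for j, c in enumerate(q))
```

All off-diagonal products vanish, while E(Y‴)² = 2/3 and E(Y⁗)² = 3/2, the constants that enter (36) and the covariance calculations in the proof of Theorem 2.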
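The proof of Lemma 4 uses the two-point mixing-moment identity m_4 − m_2² = α(1 − α)(θ_1² − θ_2²)², which drives the bound (35). Since the identity is purely algebraic, it can be verified numerically over random parameter values; the function name below is illustrative.

```python
import random

def identity_gap(alpha, t1, t2):
    """m4 - m2^2 - alpha*(1 - alpha)*(t1^2 - t2^2)^2 for the two-point mixing
    distribution with m_k = (1 - alpha)*t1**k + alpha*t2**k; identically zero."""
    m2 = (1.0 - alpha) * t1 ** 2 + alpha * t2 ** 2
    m4 = (1.0 - alpha) * t1 ** 4 + alpha * t2 ** 4
    return m4 - m2 ** 2 - alpha * (1.0 - alpha) * (t1 ** 2 - t2 ** 2) ** 2

rng = random.Random(7)
gaps = [identity_gap(rng.random(), rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0))
        for _ in range(1000)]
```

The gaps are zero up to floating-point rounding, confirming the identity behind (35).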
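The limiting process of Theorem 2 is specified through b(x) = e^x − 1 − x − x²/2 and the correlation functions in (37) and (38). Because b(st) is an inner product of the sequences {s^k/√k!} and {t^k/√k!} over k ≥ 3, the Cauchy-Schwarz inequality guarantees |ρ| ≤ 1, and this can be confirmed on a grid. A short sketch (function names are ours; the numerical constants in the tests follow from the displayed formulas):

```python
import math

def b(x):
    """b(x) = e^x - 1 - x - x^2/2, the exponential series from k = 3 on."""
    return math.exp(x) - 1.0 - x - 0.5 * x * x

def rho(s, t):
    """Autocorrelation (37) of the limiting process, for s, t != 0."""
    return math.copysign(1.0, s * t) * b(s * t) / math.sqrt(b(s * s) * b(t * t))

def rho0(t):
    """Correlation between the process at 0 and at t != 0: t^3 / sqrt(6 b(t^2))."""
    return t ** 3 / math.sqrt(6.0 * b(t * t))

def cov_z(s):
    """Covariance (38) between the process at s > 0 and Z: s^4 / {2 sqrt(6 b(s^2))}."""
    return s ** 4 / (2.0 * math.sqrt(6.0 * b(s * s)))

GRID = [-2.0, -1.0, -0.3, 0.3, 1.0, 2.0]
```

Note that cov_z(s) tends to 0 as s → 0, matching the independence of ς(0) and Z, and stays well below 1 for all s, as a valid correlation must.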
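The definition of V_i(θ) is not reproduced legibly in this excerpt, but the moments used in the proof of Theorem 2 (Var V_i(θ) = b(θ²)/θ⁶, Cov{V_i(θ), Y⁗} = θ/4, V_i(0) = Y‴/2) are all consistent with the Hermite expansion V(θ) = Σ_{k≥3} θ^{k−3} H_k(X)/k!, where E{H_j H_k} = k! 1{j = k}. Treating that expansion as a working assumption of this sketch, not as the paper's stated definition, the variance formula and the correlation in (38) can be cross-checked numerically:

```python
import math

def b(x):
    """b(x) = e^x - 1 - x - x^2/2."""
    return math.exp(x) - 1.0 - x - 0.5 * x * x

def var_v(theta, kmax=80):
    """Var V(theta) under the assumed expansion V = sum_{k>=3} theta^(k-3) H_k/k!:
    with E[H_j H_k] = k! 1{j=k}, the variance is sum_{k>=3} theta^(2k-6)/k!."""
    return sum(theta ** (2 * k - 6) / math.factorial(k) for k in range(3, kmax))

def corr_v_y4(theta):
    """Corr{V(theta), Y''''} from Cov{V(theta), Y''''} = theta/4 and Var Y'''' = 3/2."""
    return (theta / 4.0) / math.sqrt(var_v(theta) * 1.5)

def corr_38(theta):
    """The closed form theta^4 / {2 sqrt(6 b(theta^2))} appearing in (38)."""
    return theta ** 4 / (2.0 * math.sqrt(6.0 * b(theta * theta)))
```

The series variance matches b(θ²)/θ⁶, and the series-based correlation agrees with the closed form in (38), which is the internal consistency the proof relies on.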
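The type-I extreme value approximation quoted above (Chernoff and Lander 1995; Adler 1990) can be turned into approximate upper quantiles for the supremum when M is large. The sketch below uses the expansion as reconstructed in this transcription; in particular, the centering constant log(2π) is an assumption of this sketch rather than a verified value, so the output should be read as illustrative only.

```python
import math

def gumbel_quantile(q):
    """Quantile of the type-I extreme value law P(X <= x) = exp(-e^{-x})."""
    return -math.log(-math.log(q))

def approx_sup_quantile(q, big_m):
    """Approximate q-quantile of the supremum for truncation level M = big_m, via
    (2 log M)^{1/2} + {x_q - log(2*pi)}/(2 log M)^{1/2}; the centering constant
    log(2*pi) is reconstructed, not confirmed against the original source."""
    a = math.sqrt(2.0 * math.log(big_m))
    return a + (gumbel_quantile(q) - math.log(2.0 * math.pi)) / a
```

For example, with M = 10 this gives an approximate 95% point of about 2.67 for the supremum itself; squaring puts it on the scale of the LRT statistic.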

REFERENCES

Adler, R.J. (1990). An Introduction to Continuity, Extrema, and Related Topics for General Gaussian Processes. IMS Lecture Notes-Monograph Series, Vol. 12. Institute of Mathematical Statistics, Hayward, CA.

Bickel, P. and Chernoff, H. (1993). Asymptotic distribution of the likelihood ratio statistic in a prototypical non-regular problem. In Statistics and Probability: A Raghu Raj Bahadur Festschrift (J.K. Ghosh, S.K. Mitra, K.R. Parthasarathy and B.L.S. Prakasa Rao, eds.). Wiley Eastern.

Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.

Chen, H. and Chen, J. (2001a). Large sample distribution of the likelihood ratio test for normal mixtures. Statistics & Probability Letters.

Chen, H. and Chen, J. (2001b). The likelihood ratio test for homogeneity in the finite mixture models. Canad. J. Statist.

Chen, J. (1995). Optimal rate of convergence in finite mixture models. Ann. Statist.

Cheng, R.C.H. and Traylor, L. (1995). Non-regular maximum likelihood problems. J. Roy. Statist. Soc. Ser. B.

Chernoff, H. and Lander, E. (1995). Asymptotic distribution of the likelihood ratio test that a mixture of two binomials is a single binomial. J. Statist. Plann. Inference.

Chow, Y.S. and Teicher, H. (1978). Probability Theory: Independence, Interchangeability, Martingales. Springer-Verlag, New York.

Cramér, H. (1946). Mathematical Methods of Statistics. Princeton Univ. Press, Princeton.

Dacunha-Castelle, D. and Gassiat, É. (1999). Testing in locally conic models, and application to mixture models. Ann. Statist.

Dean, C.B. (1992). Testing for overdispersion in Poisson and binomial regression models. J. Amer. Statist. Assoc.

Ghosh, J.K. and Sen, P.K. (1985). On the asymptotic performance of the log likelihood ratio statistic for the mixture model and related results. In Proc. Berkeley Conf. in Honor of J. Neyman and J. Kiefer (L. LeCam and R.A. Olshen, eds.).

Hartigan, J.A. (1985). A failure of likelihood asymptotics for normal mixtures. In Proc. Berkeley Conf. in Honor of J. Neyman and J. Kiefer (L. LeCam and R.A. Olshen, eds.).

Lemdani, M. and Pons, O. (1999). Likelihood ratio tests in contamination models. Bernoulli.

Leroux, B. (1992). Consistent estimation of a mixing distribution. Ann. Statist.

Lindsay, B.G. (1989). Moment matrices: applications in mixtures. Ann. Statist.

Rubin, H. (1956). Uniform convergence of random functions with applications to statistics. Ann. Math. Statist.

Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New York.

Sun, J. (1993). Tail probabilities of the maxima of Gaussian random fields. Ann. Probab.

Van der Vaart, A.W. (1996). Efficient maximum likelihood estimation in semiparametric mixture models. Ann. Statist.

Wald, A. (1949). Note on the consistency of the maximum likelihood estimate. Ann. Math. Statist.

Wilks, S.S. (1938). The large sample distribution of the likelihood ratio for testing composite hypotheses. Ann. Math. Statist.

Department of Mathematics and Statistics, Bowling Green State University, Bowling Green, OH 43403, USA. hchen@math.bgsu.edu

Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada. jhchen@uwaterloo.ca


Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics Jiti Gao Department of Statistics School of Mathematics and Statistics The University of Western Australia Crawley

More information

Empirical Likelihood Tests for High-dimensional Data

Empirical Likelihood Tests for High-dimensional Data Empirical Likelihood Tests for High-dimensional Data Department of Statistics and Actuarial Science University of Waterloo, Canada ICSA - Canada Chapter 2013 Symposium Toronto, August 2-3, 2013 Based on

More information

Testing for homogeneity in mixture models

Testing for homogeneity in mixture models Testing for homogeneity in mixture models Jiaying Gu Roger Koenker Stanislav Volgushev The Institute for Fiscal Studies Department of Economics, UCL cemmap working paper CWP09/13 TESTING FOR HOMOGENEITY

More information

MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30

MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30 MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD Copyright c 2012 (Iowa State University) Statistics 511 1 / 30 INFORMATION CRITERIA Akaike s Information criterion is given by AIC = 2l(ˆθ) + 2k, where l(ˆθ)

More information

f(x θ)dx with respect to θ. Assuming certain smoothness conditions concern differentiating under the integral the integral sign, we first obtain

f(x θ)dx with respect to θ. Assuming certain smoothness conditions concern differentiating under the integral the integral sign, we first obtain 0.1. INTRODUCTION 1 0.1 Introduction R. A. Fisher, a pioneer in the development of mathematical statistics, introduced a measure of the amount of information contained in an observaton from f(x θ). Fisher

More information

BTRY 4090: Spring 2009 Theory of Statistics

BTRY 4090: Spring 2009 Theory of Statistics BTRY 4090: Spring 2009 Theory of Statistics Guozhang Wang September 25, 2010 1 Review of Probability We begin with a real example of using probability to solve computationally intensive (or infeasible)

More information

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Jeremy S. Conner and Dale E. Seborg Department of Chemical Engineering University of California, Santa Barbara, CA

More information

Stability and Sensitivity of the Capacity in Continuous Channels. Malcolm Egan

Stability and Sensitivity of the Capacity in Continuous Channels. Malcolm Egan Stability and Sensitivity of the Capacity in Continuous Channels Malcolm Egan Univ. Lyon, INSA Lyon, INRIA 2019 European School of Information Theory April 18, 2019 1 / 40 Capacity of Additive Noise Models

More information

arxiv:submit/ [math.st] 6 May 2011

arxiv:submit/ [math.st] 6 May 2011 A Continuous Mapping Theorem for the Smallest Argmax Functional arxiv:submit/0243372 [math.st] 6 May 2011 Emilio Seijo and Bodhisattva Sen Columbia University Abstract This paper introduces a version of

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)

More information

Invariant HPD credible sets and MAP estimators

Invariant HPD credible sets and MAP estimators Bayesian Analysis (007), Number 4, pp. 681 69 Invariant HPD credible sets and MAP estimators Pierre Druilhet and Jean-Michel Marin Abstract. MAP estimators and HPD credible sets are often criticized in

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Statement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:.

Statement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:. MATHEMATICAL STATISTICS Homework assignment Instructions Please turn in the homework with this cover page. You do not need to edit the solutions. Just make sure the handwriting is legible. You may discuss

More information

On the Uniform Asymptotic Validity of Subsampling and the Bootstrap

On the Uniform Asymptotic Validity of Subsampling and the Bootstrap On the Uniform Asymptotic Validity of Subsampling and the Bootstrap Joseph P. Romano Departments of Economics and Statistics Stanford University romano@stanford.edu Azeem M. Shaikh Department of Economics

More information

Theory of Maximum Likelihood Estimation. Konstantin Kashin

Theory of Maximum Likelihood Estimation. Konstantin Kashin Gov 2001 Section 5: Theory of Maximum Likelihood Estimation Konstantin Kashin February 28, 2013 Outline Introduction Likelihood Examples of MLE Variance of MLE Asymptotic Properties What is Statistical

More information

Math 181B Homework 1 Solution

Math 181B Homework 1 Solution Math 181B Homework 1 Solution 1. Write down the likelihood: L(λ = n λ X i e λ X i! (a One-sided test: H 0 : λ = 1 vs H 1 : λ = 0.1 The likelihood ratio: where LR = L(1 L(0.1 = 1 X i e n 1 = λ n X i e nλ

More information

PCA with random noise. Van Ha Vu. Department of Mathematics Yale University

PCA with random noise. Van Ha Vu. Department of Mathematics Yale University PCA with random noise Van Ha Vu Department of Mathematics Yale University An important problem that appears in various areas of applied mathematics (in particular statistics, computer science and numerical

More information

Efficiency of Profile/Partial Likelihood in the Cox Model

Efficiency of Profile/Partial Likelihood in the Cox Model Efficiency of Profile/Partial Likelihood in the Cox Model Yuichi Hirose School of Mathematics, Statistics and Operations Research, Victoria University of Wellington, New Zealand Summary. This paper shows

More information

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided Let us first identify some classes of hypotheses. simple versus simple H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided H 0 : θ θ 0 versus H 1 : θ > θ 0. (2) two-sided; null on extremes H 0 : θ θ 1 or

More information

Chapter 3: Maximum Likelihood Theory

Chapter 3: Maximum Likelihood Theory Chapter 3: Maximum Likelihood Theory Florian Pelgrin HEC September-December, 2010 Florian Pelgrin (HEC) Maximum Likelihood Theory September-December, 2010 1 / 40 1 Introduction Example 2 Maximum likelihood

More information

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X.

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X. Optimization Background: Problem: given a function f(x) defined on X, find x such that f(x ) f(x) for all x X. The value x is called a maximizer of f and is written argmax X f. In general, argmax X f may

More information

Theoretical Statistics. Lecture 1.

Theoretical Statistics. Lecture 1. 1. Organizational issues. 2. Overview. 3. Stochastic convergence. Theoretical Statistics. Lecture 1. eter Bartlett 1 Organizational Issues Lectures: Tue/Thu 11am 12:30pm, 332 Evans. eter Bartlett. bartlett@stat.

More information

Estimation of parametric functions in Downton s bivariate exponential distribution

Estimation of parametric functions in Downton s bivariate exponential distribution Estimation of parametric functions in Downton s bivariate exponential distribution George Iliopoulos Department of Mathematics University of the Aegean 83200 Karlovasi, Samos, Greece e-mail: geh@aegean.gr

More information

1 General problem. 2 Terminalogy. Estimation. Estimate θ. (Pick a plausible distribution from family. ) Or estimate τ = τ(θ).

1 General problem. 2 Terminalogy. Estimation. Estimate θ. (Pick a plausible distribution from family. ) Or estimate τ = τ(θ). Estimation February 3, 206 Debdeep Pati General problem Model: {P θ : θ Θ}. Observe X P θ, θ Θ unknown. Estimate θ. (Pick a plausible distribution from family. ) Or estimate τ = τ(θ). Examples: θ = (µ,

More information

Asymptotics of minimax stochastic programs

Asymptotics of minimax stochastic programs Asymptotics of minimax stochastic programs Alexander Shapiro Abstract. We discuss in this paper asymptotics of the sample average approximation (SAA) of the optimal value of a minimax stochastic programming

More information

McGill University. Faculty of Science. Department of Mathematics and Statistics. Part A Examination. Statistics: Theory Paper

McGill University. Faculty of Science. Department of Mathematics and Statistics. Part A Examination. Statistics: Theory Paper McGill University Faculty of Science Department of Mathematics and Statistics Part A Examination Statistics: Theory Paper Date: 10th May 2015 Instructions Time: 1pm-5pm Answer only two questions from Section

More information

High-dimensional asymptotic expansions for the distributions of canonical correlations

High-dimensional asymptotic expansions for the distributions of canonical correlations Journal of Multivariate Analysis 100 2009) 231 242 Contents lists available at ScienceDirect Journal of Multivariate Analysis journal homepage: www.elsevier.com/locate/jmva High-dimensional asymptotic

More information

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3 Brownian Motion Contents 1 Definition 2 1.1 Brownian Motion................................. 2 1.2 Wiener measure.................................. 3 2 Construction 4 2.1 Gaussian process.................................

More information

The International Journal of Biostatistics

The International Journal of Biostatistics The International Journal of Biostatistics Volume 1, Issue 1 2005 Article 3 Score Statistics for Current Status Data: Comparisons with Likelihood Ratio and Wald Statistics Moulinath Banerjee Jon A. Wellner

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Chapter 7 Maximum Likelihood Estimation 7. Consistency If X is a random variable (or vector) with density or mass function f θ (x) that depends on a parameter θ, then the function f θ (X) viewed as a function

More information

Exercises and Answers to Chapter 1

Exercises and Answers to Chapter 1 Exercises and Answers to Chapter The continuous type of random variable X has the following density function: a x, if < x < a, f (x), otherwise. Answer the following questions. () Find a. () Obtain mean

More information

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Yingying Dong and Arthur Lewbel California State University Fullerton and Boston College July 2010 Abstract

More information

CHANGE DETECTION IN TIME SERIES

CHANGE DETECTION IN TIME SERIES CHANGE DETECTION IN TIME SERIES Edit Gombay TIES - 2008 University of British Columbia, Kelowna June 8-13, 2008 Outline Introduction Results Examples References Introduction sunspot.year 0 50 100 150 1700

More information

Consistency of Quasi-Maximum Likelihood Estimators for the Regime-Switching GARCH Models

Consistency of Quasi-Maximum Likelihood Estimators for the Regime-Switching GARCH Models Consistency of Quasi-Maximum Likelihood Estimators for the Regime-Switching GARCH Models Yingfu Xie Research Report Centre of Biostochastics Swedish University of Report 2005:3 Agricultural Sciences ISSN

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued

Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and

More information