TESTS FOR HOMOGENEITY IN NORMAL MIXTURES IN THE PRESENCE OF A STRUCTURAL PARAMETER: TECHNICAL DETAILS
By Hanfeng Chen and Jiahua Chen¹

Bowling Green State University and University of Waterloo

Abstract. Often a question arises as to whether the observed data are a sample from a homogeneous population or have come from a heterogeneous population. In particular, one wants to test for a single normal distribution versus a mixture of two normal distributions. Classic asymptotic results fail to apply to this problem since the model does not satisfy the regularity conditions. This paper investigates the large sample behavior of the likelihood ratio statistic for testing homogeneity in the normal mixture in location parameters with an unknown structural parameter. It is proved that the asymptotic null distribution of the likelihood ratio statistic is the maximum of a $\chi^2$-variable and the supremum of the square of a truncated Gaussian process with mean 0 and variance 1. This reveals the unusual large sample behavior of the likelihood function under the null distribution. The correlation structure of the process involved in the limiting distribution is presented explicitly. From the large sample study, it is also found that even though the structural parameter is not part of the mixing distribution, the convergence rate of its maximum likelihood estimate is $n^{-1/4}$ rather than $n^{-1/2}$, while the mixing distribution has convergence rate $n^{-1/8}$ rather than $n^{-1/4}$. This is in sharp contrast to ordinary semi-parametric models and to mixture models without a structural parameter.

Key words and phrases: Asymptotic distribution, Gaussian process, genetic analysis, finite mixture, likelihood ratio, non-regular model, semi-parametric model.

AMS 1980 subject classifications. Primary 62F03; secondary 62F05.

¹The work was supported in part by a grant from NSERC of Canada and an FIL grant from Bowling Green State University.
1 Introduction

Consider the following problem. Let $X_1,\ldots,X_n$ be a random sample from a mixture population $(1-\alpha)N(\theta_1,\sigma^2)+\alpha N(\theta_2,\sigma^2)$ with probability density function (pdf)
$$(1-\alpha)\sigma^{-1}\varphi((x-\theta_1)/\sigma) + \alpha\sigma^{-1}\varphi((x-\theta_2)/\sigma), \quad (1)$$
where $\varphi(\cdot)$ is the pdf of the standard normal $N(0,1)$. We wish to test
$$H_0: \alpha(1-\alpha) = 0 \ \text{ or } \ \theta_1 = \theta_2,$$
versus the full model (1), i.e., to test $N(\theta,\sigma^2)$ versus $(1-\alpha)N(\theta_1,\sigma^2)+\alpha N(\theta_2,\sigma^2)$. The mixture pdf (1) can also be expressed as an integral $\int\sigma^{-1}\varphi((x-u)/\sigma)\,dG(u)$, with the mixing distribution $G(u) = (1-\alpha)I(u\ge\theta_1)+\alpha I(u\ge\theta_2)$.

For parametric hypothesis testing problems it is customary to use the likelihood ratio as a test statistic. Under standard regularity conditions, a classic result of Wilks (1938) states that if the null hypothesis is true, the likelihood ratio test (LRT) statistic has, asymptotically, a $\chi^2$-distribution. However, the regularity conditions are not satisfied for the mixture problem considered here. First, the null hypothesis lies on the boundary of the parameter space, whereas the standard regularity conditions require it to be in the interior. Secondly, the two statements $\alpha = 0$ and $\theta_1 = \theta_2$, which equivalently specify the null hypothesis, are not mutually exclusive. That is, there is a loss of identifiability under the null model. One may think that the unidentifiability can be eliminated by reparameterization. In that scenario, a third problem appears: the Fisher information, which characterizes the behavior of the maximum likelihood estimate (MLE), degenerates. Due to these irregularities, the classic results break down under the mixture model: the maximum likelihood estimators (MLEs) of some model parameters are inconsistent; the usual quadratic approximation to the likelihood function is no longer appropriate; Cramér's (1946) result on the
asymptotic normality of the MLE and Wilks's (1938) asymptotic $\chi^2$-theory of the LRT do not hold. Cheng and Traylor (1995) identified the mixture model as one of four non-regular parametric models. Due to the appealing challenge it poses in theoretical study and its important applications in various scientific disciplines, such as human genetic linkage analysis, actuarial science and statistical ecology, there has been increasing interest in the mixture model in recent years (e.g., Hartigan, 1985; Ghosh and Sen, 1985; Lindsay, 1989; Leroux, 1992; Chernoff and Lander, 1995; Dacunha-Castelle and Gassiat, 1999; Lemdani and Pons, 1999; Chen and Chen, 2001a and b). The large sample behavior of the LRT for homogeneity in the mixture model has indeed been a long-standing mystery. Hartigan (1985) showed that the LRT statistic tends to infinity with probability one if the mean parameters are unbounded. The divergent behavior of the LRT is further detailed by Bickel and Chernoff (1993). One important implication of Hartigan's result is that a boundedness assumption on the mean parameters is necessary for the LRT to have a limiting distribution. Under the boundedness assumption on the mean parameters, Ghosh and Sen (1985) gave the first version of the asymptotic distribution of the LRT for testing homogeneity. However, in addition to boundedness, they had to impose a separation condition, i.e., $|\theta_1-\theta_2| \ge \epsilon$ for some given $\epsilon > 0$. The separation condition is obviously unsatisfactory. There have been many attempts to remove it. Lemdani and Pons (1999) used a reparameterization approach to investigate the testing problem when one of the mean parameters is known, and their study showed that there is no obvious way to remove the separation condition. Dacunha-Castelle and Gassiat (1999) developed a general reparameterization method for the testing problem in locally conic models.
Their results can be applied to some useful mixture models in certain situations, and, especially interestingly, to stationary ARMA models. In the meantime, Chen and Chen (2001a and b) took a different approach, the so-called sandwich method, to attack the problem without the separation condition.
From the discussion above, removing the separation condition has been one of the central issues in the large sample study of the LRT for homogeneity since Ghosh and Sen (1985). Moreover, existing studies have been confined to mixture models without a structural parameter. This paper investigates the general problem: (a) a structural parameter is included in the mixture model to bring the model closer to reality, and the structural parameter is not required to be bounded; (b) the test for homogeneity is considered in full generality, i.e., both mean parameters are assumed unknown; (c) the separation condition is removed from the model. Following Chen and Chen (2001a and b), we use the sandwich method to derive the asymptotic distribution of the LRT. The main challenge in the present problem, however, is to analyze the contribution to variation due to estimating the structural parameter and to separate it from that due to estimating the mixing distribution $G$. Even though the paper deals with normal mixtures, the ideas and technical treatments are applicable to general parametric mixture models. Study of the normal mixtures elucidates the difficulties and clarifies the main issues in mixture models, some of which are often buried in the analytic conditions of a general set-up. We start our study in Section 2 with the case of the single mean parameter mixture model, in which one of the mean parameters $\theta_1$ and $\theta_2$ in model (1) is assumed known. While the study of single mean parameter mixtures has its own virtue, the main purpose of the section is to demonstrate the main ideas behind our approach and to outline the study for the general mixture model (1). The asymptotic distribution of the LRT for homogeneity under model (1) is investigated in Section 3. It is shown that the asymptotic null distribution of the LRT statistic is the maximum of a $\chi^2$-variable and the supremum of the square of a truncated Gaussian process with mean 0 and variance 1.
This reveals the unusual large sample behavior of the likelihood function.
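For concreteness, data from model (1) can be simulated directly. A minimal sketch; the parameter values below are hypothetical and chosen only for illustration:

```python
import numpy as np

def sample_mixture(n, alpha, theta1, theta2, sigma, seed=None):
    """Draw n observations from (1 - alpha) N(theta1, sigma^2) + alpha N(theta2, sigma^2)."""
    rng = np.random.default_rng(seed)
    z = rng.random(n) < alpha                 # latent component indicator
    means = np.where(z, theta2, theta1)
    return means + sigma * rng.standard_normal(n)

# Under H0 (alpha = 0 or theta1 = theta2) the sample is homogeneous N(theta, sigma^2).
x_null = sample_mixture(1000, 0.0, 0.0, 2.0, 1.0, seed=1)   # homogeneous sample
x_alt  = sample_mixture(1000, 0.3, 0.0, 2.0, 1.0, seed=1)   # 30% contamination at theta2 = 2
```

The contaminated sample has inflated mean and variance relative to the homogeneous one, which is the departure the LRT is designed to detect.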
Throughout the paper, without loss of generality, let the null underlying distribution be $N(0,1)$. For convenience of notation, we say $X_n(t) = O_p(a_n)$ or $= o_p(a_n)$ if $\sup_{t\in T}|X_n(t)/a_n| = O_p(1)$ or $= o_p(1)$, where $T$ is a suitably specified index set and $a_n$ is a sequence of constants or random variables.

2 Single Mean Parameter Mixtures

Assume in model (1) that $\theta_1$ is specified, say $\theta_1 = 0$, and the other mean $\theta_2$ is unknown; write $\theta = \theta_2$. In addition, assume that $|\theta| \le M$. Based on the observations $X_i$, we wish to use the LRT to test the null hypothesis $H_0: N(0,\sigma^2)$ versus $H_a: (1-\alpha)N(0,\sigma^2)+\alpha N(\theta,\sigma^2)$. The log-likelihood function of $\alpha$, $\theta$ and $\sigma$ is
$$l_n(\alpha,\theta,\sigma) = \sum\log\bigl[(1-\alpha)\sigma^{-1}\exp\{-X_i^2/(2\sigma^2)\} + \alpha\sigma^{-1}\exp\{-(X_i-\theta)^2/(2\sigma^2)\}\bigr].$$
Let $\hat\sigma_0$ be the MLE of $\sigma$ under the null hypothesis, i.e., $\hat\sigma_0^2 = n^{-1}\sum X_i^2$. Let $r_n(\alpha,\theta,\sigma) = 2\{l_n(\alpha,\theta,\sigma)-l_n(0,0,\hat\sigma_0)\}$, i.e.,
$$r_n(\alpha,\theta,\sigma) = 2\sum\log\Bigl\{1+\alpha\Bigl(\exp\Bigl\{\frac{2X_i\theta-\theta^2}{2\sigma^2}\Bigr\}-1\Bigr)\Bigr\} - n\log\sigma^2 - \sum\frac{X_i^2}{\sigma^2} + n\Bigl(1+\log\frac{\sum X_i^2}{n}\Bigr). \quad (2)$$
Let $\hat\alpha$, $\hat\theta$ and $\hat\sigma$ be the MLEs of $\alpha$, $\theta$ and $\sigma$ under the full model. Then the LRT rejects the null hypothesis when $R_n = r_n(\hat\alpha,\hat\theta,\hat\sigma)$ is large.

2.1 Large sample behavior of the MLEs

We first show that under the null hypothesis $\hat\sigma^2$ is bounded away from zero and infinity with probability approaching one.
Lemma 1 Under the null distribution $N(0,1)$, there exist constants $0 < \epsilon < \Delta < \infty$ such that $\lim_{n\to\infty} P(\epsilon \le \hat\sigma^2 \le \Delta) = 1$.

Proof. Consider $r_n(\alpha,\theta,\sigma)$ defined by (2). Note that when $2x\theta-\theta^2 \ge 0$,
$$1+\alpha\Bigl[\exp\Bigl\{\frac{2x\theta-\theta^2}{2\sigma^2}\Bigr\}-1\Bigr] \le \exp\Bigl\{\frac{2x\theta-\theta^2}{2\sigma^2}\Bigr\}.$$
We thus have the inequality
$$r_n(\alpha,\theta,\sigma) \le \sum\frac{(2X_i\theta-\theta^2)^+-X_i^2}{\sigma^2} - n\log\sigma^2 + n\Bigl(1+\log\frac{\sum X_i^2}{n}\Bigr), \quad (3)$$
where $t^+ = tI(t>0)$ denotes the positive part of $t$. Since $(2X_i\theta-\theta^2)^+-X_i^2$ is equal to either $-(\theta-X_i)^2$ or $-X_i^2$, we see that
$$r_n(\alpha,\theta,\sigma) \le -n\log\sigma^2 + n\{1+\log(\textstyle\sum X_i^2/n)\}.$$
Since $\log(n^{-1}\sum X_i^2) \to 0$ almost surely, the function $r_n(\alpha,\theta,\sigma) < 0$ for all $\sigma^2 > \Delta$ with probability approaching 1, for some large constant $\Delta$. That is, $\lim_{n\to\infty} P(\hat\sigma^2 > \Delta) = 0$.

Next we show that $\hat\sigma^2$ is also bounded away from zero asymptotically. By the uniform strong law of large numbers (see Rubin, 1956),
$$n^{-1}\sum\{X_i^2-(2\theta X_i-\theta^2)^+\} \to S(\theta) = E\{X^2-(2\theta X-\theta^2)^+\},$$
almost surely and uniformly in $|\theta| \le M$. Since $S(\theta)$ is continuous and positive, the minimum value of $S(\theta)$ is positive, say equal to $q$ for some $q > 0$. Then, with probability approaching one, uniformly in $\alpha$, $\theta$ and $\sigma$,
$$r_n(\alpha,\theta,\sigma) \le -\frac{nq}{\sigma^2} - n\log\sigma^2 + n\Bigl(1+\log\frac{\sum X_i^2}{n}\Bigr).$$
Let $\epsilon > 0$ be small enough that $-q/\epsilon-\log\epsilon+1 < 0$. It follows that, with probability approaching 1 uniformly, $r_n(\alpha,\theta,\sigma) < 0$ if $\sigma^2 < \epsilon$, implying that $\lim_{n\to\infty} P(\hat\sigma^2 \ge \epsilon) = 1$.

By Lemma 1, the parameter space of interest can be reduced to a compact one by restricting $\sigma^2$ to the interval $[\epsilon,\Delta]$.
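The statistic $r_n(\alpha,\theta,\sigma)$ of (2), and hence $R_n$, can be evaluated numerically. A crude sketch (grid search over hypothetical ranges rather than exact maximization, so it only approximates the MLE):

```python
import numpy as np

def rn(x, alpha, theta, sigma):
    """r_n = 2{l_n(alpha, theta, sigma) - l_n(0, 0, sigma0_hat)}, theta1 = 0 known."""
    def logphi(v, mu, s):
        return -0.5 * np.log(2 * np.pi * s * s) - (v - mu) ** 2 / (2 * s * s)
    ln = np.sum(np.logaddexp(np.log1p(-alpha) + logphi(x, 0.0, sigma),
                             np.log(alpha) + logphi(x, theta, sigma)))
    s0 = np.sqrt(np.mean(x ** 2))             # MLE of sigma under the null
    ln0 = np.sum(logphi(x, 0.0, s0))
    return 2.0 * (ln - ln0)

def Rn_grid(x, M=3.0):
    """Approximate R_n = sup r_n over a coarse grid; a sketch, not the exact MLE."""
    best = 0.0                                # r_n = 0 at the null fit itself
    for a in np.linspace(0.01, 0.99, 25):
        for t in np.linspace(-M, M, 25):
            for s in np.linspace(0.7, 1.5, 17):
                best = max(best, rn(x, a, t, s))
    return best

x = np.random.default_rng(7).standard_normal(300)   # a sample from the null N(0, 1)
```

At $\theta = 0$ the mixture collapses to the null fit, so $r_n(\alpha, 0, \hat\sigma_0) = 0$ for any $\alpha$, which is a useful sanity check on the implementation.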
Lemma 2 Under the null distribution $N(0,1)$, as $n\to\infty$, $\hat\alpha\hat\theta \to 0$ and $\hat\sigma \to 1$, in probability.

Proof. As remarked, we only need to consider $\epsilon \le \sigma^2 \le \Delta$ for some constants $0 < \epsilon < 1 < \Delta < \infty$. Let $\mathcal{G} = \{G(\cdot) : G(u) = (1-\alpha)I(u\ge 0)+\alpha I(u\ge\theta),\ |\theta| \le M,\ 0 \le \alpha \le 1\}$. Let the space $\mathcal{G}$ be metrized by the Lévy distance between two distribution functions $G_1$ and $G_2$:
$$\lambda(G_1,G_2) = \inf\{\tau > 0 : G_1(u-\tau)-\tau \le G_2(u) \le G_1(u+\tau)+\tau,\ \text{for all } u\}.$$
(It is well known that convergence in the Lévy distance is equivalent to weak convergence of distribution functions; see, e.g., Chow and Teicher, 1978.) For any sequence $G_j(u) = (1-\alpha_j)I(u\ge 0)+\alpha_jI(u\ge\theta_j)$ in $\mathcal{G}$, since $|\theta_j| \le M < \infty$ and $0 \le \alpha_j \le 1$, one can find a subsequence $G_{j'}$ such that both $\alpha_{j'}$ and $\theta_{j'}$ converge, say to $\alpha^*$ and $\theta^*$, implying that the subsequence $G_{j'}$ converges weakly to $G^*(u) = (1-\alpha^*)I(u\ge 0)+\alpha^*I(u\ge\theta^*) \in \mathcal{G}$, i.e., $\lambda(G_{j'},G^*) \to 0$. It is thus shown that $\mathcal{G}$ is compact, and so is the product space $\Omega = \{\omega = (\sigma,G) : \sigma^2 \in [\epsilon,\Delta],\ G \in \mathcal{G}\}$. Moreover, for $\omega = (\sigma,G) \in \Omega$, put
$$f(x;\omega) = \int\sigma^{-1}\varphi\{(x-u)/\sigma\}\,dG(u).$$
Then the parameter $\omega \in \Omega$ is identifiable, i.e., for any $\omega_1,\omega_2 \in \Omega$, $f(x;\omega_1) = f(x;\omega_2)$ for all $x$ implies $\omega_1 = \omega_2$. Given the compactness and identifiability, Wald's (1949) argument leads to consistency of the MLE $\hat\omega$ of $\omega = (\sigma,G)$ under the null model $\omega_0 = (1,G_0)$, where $G_0(u) = I(u\ge 0)$. To see this, take any small $\gamma > 0$ such that $\{\omega = (\sigma,G) : \|\omega-\omega_0\| = |\sigma-1|+\lambda(G,G_0) < \gamma\} \subset \Omega$.
Let $\Omega_\gamma = \{\omega \in \Omega : \|\omega-\omega_0\| \ge \gamma\}$. For any $\omega' \in \Omega_\gamma$, define an open ball in $\Omega$ as $B(\omega',\eta) = \{\omega \in \Omega : \|\omega-\omega'\| < \eta\}$ and define
$$f(x\mid B) = \sup_{\omega\in B}\{f(x;\omega)\}.$$
It is seen that $f(x\mid B(\omega',\eta)) \le (2\pi\epsilon)^{-1/2}$, so that $E\{|\log f(X\mid B(\omega',\eta))|\} < \infty$, and as $\eta \to 0$, $E\{\log f(X\mid B(\omega',\eta))\} \to E\{\log f(X;\omega')\}$. For each $\omega'$, take $\eta(\omega')$ small enough that $E\{\log[f(X\mid B(\omega',\eta(\omega')))/\varphi(X)]\} < 0$. (Note that $f(x;\omega_0) = \varphi(x)$.) Then there are finitely many $\omega'$s, say $\omega_1,\ldots,\omega_m$, such that $\cup_{j=1}^m B_j \supset \Omega_\gamma$, where $B_j = B(\omega_j,\eta(\omega_j))$. By the law of large numbers, for $1 \le j \le m$,
$$P\Bigl\{n^{-1}\sum_{i=1}^n\log[f(X_i\mid B_j)/\varphi(X_i)] \ge 0,\ \text{i.o.}\Bigr\} = 0,$$
where i.o. stands for infinitely often. Consequently, for the MLE $\hat\omega = (\hat\sigma,\hat G)$,
$$P\{\hat\omega \in \Omega_\gamma,\ \text{i.o.}\} \le \sum_{j=1}^m P\Bigl\{n^{-1}\sum_{i=1}^n\log[f(X_i\mid B_j)/\varphi(X_i)] \ge 0,\ \text{i.o.}\Bigr\} = 0,$$
i.e., $\hat\sigma \to 1$ and $\lambda(\hat G,G_0) \to 0$ in probability. Finally, this implies that the MLEs of the moments $\int u^k\,dG(u) = \alpha\theta^k$ are also consistent. Note that $\int u^k\,dG_0(u) = 0$ under the null model $N(0,1)$. The lemma is proved.

Owing to Lemma 2, $\sigma$ can be viewed as restricted to a small neighborhood of $\sigma = 1$, say $[1-\delta,1+\delta]$ for a small positive $\delta$. This restriction will be used to ensure the tightness of some processes later. We point out that Lemma 2 does not imply anything about the rate of convergence. We also remark that Lemma 2 does not say that $\hat\alpha$ or $\hat\theta$ is consistent; in fact, $\hat\alpha$ and $\hat\theta$ are inconsistent under the null model. See Chernoff and Lander's (1995) discussion of the binomial mixture model, which also applies to the normal mixture model.
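The Lévy distance used above to metrize $\mathcal{G}$ can be approximated numerically for two-point mixing distributions. A sketch; the evaluation grid and the search grid over $\tau$ are arbitrary choices:

```python
import numpy as np

def levy_distance(F, G, grid, taus=np.linspace(1e-4, 1.0, 2000)):
    """Smallest tau on a grid with F(u - tau) - tau <= G(u) <= F(u + tau) + tau for all u."""
    for tau in taus:
        if np.all(F(grid - tau) - tau <= G(grid) + 1e-12) and \
           np.all(G(grid) <= F(grid + tau) + tau + 1e-12):
            return float(tau)
    return float("inf")

def two_point_cdf(alpha, theta1, theta2):
    """CDF of the mixing distribution G(u) = (1 - alpha) I(u >= theta1) + alpha I(u >= theta2)."""
    return lambda u: (1.0 - alpha) * (u >= theta1) + alpha * (u >= theta2)

grid = np.linspace(-3.0, 3.0, 1201)
G0 = two_point_cdf(0.0, 0.0, 0.0)     # degenerate at 0: the null mixing distribution
G1 = two_point_cdf(0.2, 0.0, 1.0)     # mass 0.2 moved to theta = 1 (hypothetical)
```

For these two CDFs the vertical gap of 0.2 on $[0,1)$ forces $\lambda(G_0,G_1) = 0.2$, which the search recovers up to grid resolution.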
2.2 Asymptotic distribution of the LRT

We proceed to study the large sample behavior of the LRT. A sandwich idea is employed to derive the asymptotic null distribution of $R_n$. We first establish an asymptotic upper bound for $R_n$. Write
$$r_n(\alpha,\theta,\sigma) = 2\{l_n(\alpha,\theta,\sigma)-l_n(0,0,1)\} + 2\{l_n(0,0,1)-l_n(0,0,\hat\sigma_0)\} = r_{1n}(\alpha,\theta,\sigma) + r_{2n},$$
where $r_{1n}(\alpha,\theta,\sigma) = 2\{l_n(\alpha,\theta,\sigma)-l_n(0,0,1)\}$, $r_{2n} = 2\{l_n(0,0,1)-l_n(0,0,\hat\sigma_0)\}$ and $\hat\sigma_0^2 = n^{-1}\sum X_i^2$, the MLE of $\sigma^2$ under the null model.

We first analyze $r_{1n}(\alpha,\theta,\sigma)$. Express $r_{1n}(\alpha,\theta,\sigma) = 2\sum\log(1+\delta_i)$, where $\delta_i = (\sigma^2-1)U_i(\sigma)+\alpha\theta Y_i(\theta,\sigma)$, with
$$U_i(\sigma) = (\sigma^2-1)^{-1}\Bigl[\frac{1}{\sigma}\exp\Bigl\{\frac{X_i^2}{2}\Bigl(1-\frac{1}{\sigma^2}\Bigr)\Bigr\}-1\Bigr], \quad (4)$$
and
$$Y_i(\theta,\sigma) = \frac{1}{\sigma\theta}\Bigl[\exp\Bigl\{-\frac{(X_i-\theta)^2}{2\sigma^2}+\frac{X_i^2}{2}\Bigr\} - \exp\Bigl\{-\frac{X_i^2}{2\sigma^2}+\frac{X_i^2}{2}\Bigr\}\Bigr]. \quad (5)$$
The functions $U_i(\sigma)$ and $Y_i(\theta,\sigma)$ are continuously differentiable upon defining $U_i(1) = (X_i^2-1)/2$ and $Y_i(0,\sigma) = \sigma^{-3}X_i\exp\{X_i^2(\sigma^2-1)/(2\sigma^2)\}$. Also note that under the null distribution $N(0,1)$, $E\{U_i(\sigma)\} = 0$ and $E\{Y_i(\theta,\sigma)\} = 0$ for any $\sigma$ and $\theta$. By the inequality $\log(1+x) \le x-x^2/2+x^3/3$, we have
$$r_{1n}(\alpha,\theta,\sigma) = 2\sum\log(1+\delta_i) \le 2\sum\delta_i - \sum\delta_i^2 + \frac{2}{3}\sum\delta_i^3.$$
Rewrite $\delta_i$ as
$$\delta_i = (\sigma^2-1)U_i(1) + \alpha\theta Y_i(\theta,1) + \epsilon_{in}, \quad (6)$$
where the remainder $\epsilon_{in} = (\sigma^2-1)\{U_i(\sigma)-U_i(1)\} + \alpha\theta\{Y_i(\theta,\sigma)-Y_i(\theta,1)\}$. The following proposition can be used to estimate the sum of the remainders.

Proposition 1 Let $0 < \delta < 1$. Then under the null distribution $N(0,1)$, the processes
$$U_n^*(\sigma) = n^{-1/2}\sum\{U_i(\sigma)-U_i(1)\}/(\sigma-1) \quad\text{and}\quad Y_n^*(\theta,\sigma) = n^{-1/2}\sum\{Y_i(\theta,\sigma)-Y_i(\theta,1)\}/(\sigma-1),$$
for $\sigma \in [1-\delta,1+\delta]$ and $|\theta| \le M$, are tight.

Proof. We only need to verify the Lipschitz condition in light of Billingsley (1968, p. 95). That is, for $U_n^*(\sigma)$, to prove that
$$E\{U_n^*(\sigma_1)-U_n^*(\sigma_2)\}^2 \le C(\sigma_1-\sigma_2)^2,$$
for some constant $C$. Since $E\{U_i(\sigma)\} = 0$, it is sufficient to prove that the square of the derivative of $\{U_i(\sigma)-U_i(1)\}/(\sigma-1)$ is bounded by an integrable random variable, say $g(X_i)$. Furthermore, since $\{U_i(\sigma)-U_i(1)\}/(\sigma-1)$ is a second-order difference of the function $H_i(\sigma) = \sigma^{-1}\exp\{-X_i^2(\sigma^{-2}-1)/2\}$, it is enough to prove that $|H_i''(\sigma)|^2 \le g(X_i)$ for all $\sigma \in [1-\delta,1+\delta]$. By direct calculation, we see that for some constant $C$,
$$|H_i''(\sigma)|^2 \le C(X_i^6+X_i^4+X_i^2+1)\exp\{X_i^2\delta/(1+\delta)\}.$$
It is clear that the right-hand side is integrable under the null distribution $N(0,1)$, since $0 < \delta < 1$. Similarly, for $Y_n^*(\theta,\sigma)$, it is sufficient to show that
$$\Bigl|\frac{\partial^2 H_i(\theta,\sigma)}{\partial\theta\,\partial\sigma}\Bigr|^2 + \Bigl|\frac{\partial^3 H_i(\theta,\sigma)}{\partial\theta\,\partial\sigma^2}\Bigr|^2 \le g(X_i),$$
for some integrable $g(X_i)$, where $H_i(\theta,\sigma) = \sigma^{-1}\exp\{-(X_i-\theta)^2/(2\sigma^2)+X_i^2/2\}$. Again by direct calculation, we have, for some constant $C$,
$$\Bigl|\frac{\partial^2 H_i(\theta,\sigma)}{\partial\theta\,\partial\sigma}\Bigr|^2 \le C(X_i^2+|X_i|+1)\exp\Bigl\{-\frac{(X_i-\theta)^2}{1+\delta}+X_i^2\Bigr\} \le C(X_i^2+|X_i|+1)\exp\{\delta X_i^2/(1+\delta)+2M|X_i|\}.$$
The rightmost side of the above inequality is again integrable under the null distribution $N(0,1)$, as $0 < \delta < 1$. A similar argument shows that $|\partial^3 H_i(\theta,\sigma)/\partial\theta\,\partial\sigma^2|$ is bounded above by an integrable random variable. The proof is completed.

By Proposition 1, $U_n^*(\sigma) = O_p(1)$ and $Y_n^*(\theta,\sigma) = O_p(1)$, implying that
$$\sum\epsilon_{in} = n^{1/2}(\sigma-1)^2O_p(1) + n^{1/2}\alpha\theta(\sigma-1)O_p(1). \quad (7)$$
(Note that, by the convention, $U_n^*(\sigma) = O_p(1)$ means $\sup_{|\sigma-1|\le\delta}|U_n^*(\sigma)| = O_p(1)$, and $Y_n^*(\theta,\sigma) = O_p(1)$ means $\sup_{|\theta|\le M,\,|\sigma-1|\le\delta}|Y_n^*(\theta,\sigma)| = O_p(1)$.) For convenience of notation, put $E_{n1} = (\sigma-1)^2O_p(1)$, $E_{n2} = \alpha\theta(\sigma-1)O_p(1)$, $U_i = U_i(1)$, and $Y_i(\theta) = Y_i(\theta,1)$. By (6) and (7), we obtain
$$\sum\delta_i = \sum\{(\sigma^2-1)U_i+\alpha\theta Y_i(\theta)\} + n^{1/2}(E_{n1}+E_{n2}). \quad (8)$$
Similarly, we can replace $\sigma$ with 1 in the square and cubic terms of $\delta_i$, and arrive at the following:
$$\sum\delta_i^2 = \sum\{(\sigma^2-1)U_i+\alpha\theta Y_i(\theta)\}^2 + n(E_{n1}^2+E_{n2}^2), \quad (9)$$
and
$$\sum\delta_i^3 - \sum\{(\sigma^2-1)U_i+\alpha\theta Y_i(\theta)\}^3 = n(|E_{n1}|^3+|E_{n2}|^3). \quad (10)$$
It is important to note that in (10) the remainder terms have a factor of $n$ rather than $n^{3/2}$. To see this, e.g.,
$$\sum(\sigma-1)^3\{U_i(\sigma)-U_i\}^3 = n(\sigma-1)^6\cdot\frac{1}{n}\sum\bigl[\{U_i(\sigma)-U_i\}/(\sigma-1)\bigr]^3 = n(\sigma-1)^6O_p(1) = n|E_{n1}|^3.$$
Now by (8), (9) and (10),
$$r_{1n}(\alpha,\theta,\sigma) \le 2\sum\{(\sigma^2-1)U_i+\alpha\theta Y_i(\theta)\} - \sum\{(\sigma^2-1)U_i+\alpha\theta Y_i(\theta)\}^2 + \frac{2}{3}\sum\{(\sigma^2-1)U_i+\alpha\theta Y_i(\theta)\}^3 + n^{1/2}(E_{n1}+E_{n2}) + n\sum_{j=2}^3(|E_{n1}|^j+|E_{n2}|^j). \quad (11)$$
Introduce $Z_i(\theta) = Y_i(\theta)-\theta U_i$. Then $(\sigma^2-1)U_i+\alpha\theta Y_i(\theta) = t_1U_i+t_2Z_i(\theta)$, where $t_1 = \sigma^2-1+\alpha\theta^2$, $t_2 = \alpha\theta$. Since $U_i$ and $Z_i(\theta)$ are orthogonal, i.e., $E\,U_iZ_i(\theta) = 0$, the cubic and remainder terms in (11) are controlled by the square term. In fact, the square sum times $n^{-1}$ converges uniformly to a positive definite quadratic form in $t_1$ and $t_2$, and $n^{-1}\sum\{|Z_i(\theta)|^3+|U_i|^3\} = O_p(1)$ uniformly. Thus,
$$\sum|t_1U_i+t_2Z_i(\theta)|^3 \le \sum\{t_1U_i+t_2Z_i(\theta)\}^2\,(|t_1|+|t_2|)O_p(1).$$
As for the remainder terms in (11), since $|\theta| \le M$,
$$n^{1/2}E_{n1} = n^{1/2}(\sigma-1)^2O_p(1) \le n^{1/2}(t_1^2+t_2^2)O_p(1) = o_p\Bigl\{\sum[t_1U_i+t_2Z_i(\theta)]^2\Bigr\},$$
and similarly
$$n^{1/2}E_{n2} \le n^{1/2}\{t_2^2+(\sigma-1)^2\}O_p(1) \le n^{1/2}(t_1^2+t_2^2)O_p(1) = o_p\Bigl\{\sum[t_1U_i+t_2Z_i(\theta)]^2\Bigr\}.$$
(Note that when $t_1 = t_2 = 0$, i.e., $\sigma = 1$ and $\theta = 0$ in the above inequalities, $r_{1n} = 0 = o_p(1)$. Thus this case can be ignored here and on other similar occasions
in the sequel.) The other remainder terms, resulting from the square or cubic sums, are of the same (or higher) order as that from the linear sum. In fact,
$$n(E_{n1}^2+E_{n2}^2) \le n(t_1^2+t_2^2)^2O_p(1) = (t_1^2+t_2^2)O_p\Bigl\{\sum[t_1U_i+t_2Z_i(\theta)]^2\Bigr\},$$
$$n(|E_{n1}|^3+|E_{n2}|^3) \le (|t_1|+|t_2|)O_p(nE_{n1}^2+nE_{n2}^2).$$
It is then concluded that (11) can be expressed as
$$r_{1n}(\alpha,\theta,\sigma) \le 2\sum\{t_1U_i+t_2Z_i(\theta)\} - \sum\{t_1U_i+t_2Z_i(\theta)\}^2\{1+(|t_1|+|t_2|)O_p(1)+o_p(1)\}. \quad (12)$$
Since $U_i$ and $Z_i(\theta)$ are orthogonal, (12) is further reduced to
$$r_{1n}(\alpha,\theta,\sigma) \le 2\sum\{t_1U_i+t_2Z_i(\theta)\} - \Bigl\{t_1^2\sum U_i^2+t_2^2\sum Z_i^2(\theta)\Bigr\}\{1+(|t_1|+|t_2|)O_p(1)+o_p(1)\}. \quad (13)$$
Let $\hat t_1 = \hat\sigma^2-1+\hat\alpha\hat\theta^2$ and $\hat t_2 = \hat\alpha\hat\theta$ be the MLEs. By Lemma 2, $\hat t_1 = o_p(1)$ and $\hat t_2 = o_p(1)$. Consequently, replacement by the MLEs in (13) gives
$$r_{1n}(\hat\alpha,\hat\theta,\hat\sigma) \le 2\sum\{\hat t_1U_i+\hat t_2Z_i(\hat\theta)\} - \Bigl\{\hat t_1^2\sum U_i^2+\hat t_2^2\sum Z_i^2(\hat\theta)\Bigr\}\{1+o_p(1)\}. \quad (14)$$
To obtain a short form for the upper bound, fix $\theta$ and consider the quadratic function
$$Q(t_1,t_2) = 2\sum\{t_1U_i+t_2Z_i(\theta)\} - t_1^2\sum U_i^2 - t_2^2\sum Z_i^2(\theta).$$
If $\theta \ge 0$, then $t_2 = \alpha\theta \ge 0$, and if $\theta < 0$, then $t_2 \le 0$. By considering the regions $\theta \ge 0$ and $\theta < 0$ separately, we see that for fixed $\theta$, $Q(t_1,t_2)$ is maximized at $t_1 = \tilde t_1$ and $t_2 = \tilde t_2$ with
$$\tilde t_1 = \frac{\sum U_i}{\sum U_i^2}, \qquad \tilde t_2 = \mathrm{sgn}(\theta)\,\frac{[\mathrm{sgn}(\theta)\sum Z_i(\theta)]^+}{\sum Z_i^2(\theta)}, \quad (15)$$
where $\mathrm{sgn}(\theta)$ is the sign function, and
$$Q(\tilde t_1,\tilde t_2) = \frac{\{\sum U_i\}^2}{\sum U_i^2} + \frac{[\{\mathrm{sgn}(\theta)\sum Z_i(\theta)\}^+]^2}{\sum Z_i^2(\theta)}.$$
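The constrained maximization yielding (15) can be checked numerically. A sketch: the quadratic below drops the cross term, as justified by the orthogonality of $U_i$ and $Z_i(\theta)$, and the arrays are synthetic stand-ins for the $U_i$ and $Z_i(\theta)$:

```python
import numpy as np

def Q(t1, t2, U, Z):
    """Q(t1, t2) = 2*sum(t1*U + t2*Z) - t1^2*sum(U^2) - t2^2*sum(Z^2), cross term dropped."""
    return 2.0 * (t1 * U.sum() + t2 * Z.sum()) - t1**2 * (U**2).sum() - t2**2 * (Z**2).sum()

def q_argmax(U, Z, sign_theta):
    """Maximizer of Q subject to t2 carrying the sign of theta, as in (15)."""
    t1 = U.sum() / (U**2).sum()
    t2 = sign_theta * max(sign_theta * Z.sum(), 0.0) / (Z**2).sum()
    return t1, t2

rng = np.random.default_rng(0)
U, Z = rng.standard_normal(50), rng.standard_normal(50)   # synthetic stand-ins
t1_opt, t2_opt = q_argmax(U, Z, 1.0)
```

Because $Q$ decouples in $t_1$ and $t_2$, the unconstrained optimum in $t_1$ and the sign-truncated optimum in $t_2$ can be verified against a brute-force grid.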
Therefore, by (14) it follows that
$$r_{1n}(\hat\alpha,\hat\theta,\hat\sigma) \le \frac{\{\sum U_i\}^2}{\sum U_i^2}\{1+o_p(1)\} + \sup_{|\theta|\le M}\frac{[\{\mathrm{sgn}(\theta)\sum Z_i(\theta)\}^+]^2}{\sum Z_i^2(\theta)}\{1+o_p(1)\} = \frac{\{\sum U_i\}^2}{\sum U_i^2} + \sup_{|\theta|\le M}\frac{[\{\mathrm{sgn}(\theta)\sum Z_i(\theta)\}^+]^2}{\sum Z_i^2(\theta)} + o_p(1). \quad (16)$$
Recall that $R_n = r_n(\hat\alpha,\hat\theta,\hat\sigma) = r_{1n}(\hat\alpha,\hat\theta,\hat\sigma)+r_{2n}$, and note that $r_{2n}$ admits an ordinary quadratic approximation, i.e.,
$$r_{2n} = -\frac{\{\sum U_i\}^2}{\sum U_i^2} + o_p(1).$$
An upper bound for $R_n$ is thus obtained as follows:
$$R_n \le \sup_{|\theta|\le M}\frac{[\{\mathrm{sgn}(\theta)\sum Z_i(\theta)\}^+]^2}{nEZ_1^2(\theta)} + o_p(1). \quad (17)$$
Here $nEZ_1^2(\theta)$ substitutes for $\sum Z_i^2(\theta)$, since they are equivalent asymptotically and uniformly.

To obtain a lower bound for $R_n$, let $\epsilon > 0$ be any small fixed number. Let $R_n(\epsilon)$ be the supremum of $r_n(\alpha,\theta,\sigma)$ under the restriction $\epsilon \le |\theta| \le M$. For fixed $\theta$ with $\epsilon \le |\theta| \le M$, let $\bar\alpha(\theta)$ and $\bar\sigma(\theta)$ assume the values determined by (15). Consider the Taylor expansion
$$r_{1n}(\bar\alpha(\theta),\theta,\bar\sigma(\theta)) = 2\sum\bar\delta_i - \sum\bar\delta_i^2(1+\eta_i)^{-2},$$
where $|\eta_i| < |\bar\delta_i|$ and $\bar\delta_i$ is equal to $\delta_i$ in (6) with $\alpha = \bar\alpha(\theta)$ and $\sigma = \bar\sigma(\theta)$. Owing to $\theta$ being bounded away from 0, the solution $\bar\alpha(\theta)$ is feasible, so that $\bar\sigma^2(\theta)-1 = O_p(n^{-1/2})$ and $\bar\alpha(\theta) = O_p(n^{-1/2})$, uniformly in $\epsilon \le |\theta| \le M$. Since $\bar\delta_i = (\bar\sigma^2-1)U_i(\bar\sigma)+\bar\alpha\theta Y_i(\theta,\bar\sigma)$,
$$|\bar\delta_i| \le |\bar\sigma^2-1||U_i(\bar\sigma)| + \bar\alpha|\theta||Y_i(\theta,\bar\sigma)|.$$
For a generic constant $C$,
$$\sup_{|\theta|\le M}|Y_i(\theta,\bar\sigma)| \le C(X^{*2}+1)e^{CX^*} = o_p(n^{1/2}),$$
where $X^* = \max_i|X_i| = O_p(\sqrt{\log n})$ (see Serfling, 1980, p. 91). Similarly, $|U_i(\bar\sigma)| \le CX^{*2}e^{CX^{*2}|\bar\sigma^2-1|} = o_p(n^{1/2})$. Thus, uniformly in $\theta$,
$$\max_i|\eta_i| \le \max_i|\bar\delta_i| = o_p(1), \quad (18)$$
i.e.,
$$r_{1n}(\bar\alpha(\theta),\theta,\bar\sigma(\theta)) = 2\sum\bar\delta_i - \sum\bar\delta_i^2\{1+o_p(1)\}.$$
Thus, from (15) with $\theta$ fixed, $\bar\alpha$ and $\bar\sigma$ are such that
$$r_{1n}(\bar\alpha(\theta),\theta,\bar\sigma(\theta)) = \frac{\{\sum U_i\}^2}{\sum U_i^2} + \frac{[\{\mathrm{sgn}(\theta)\sum Z_i(\theta)\}^+]^2}{nEZ_1^2(\theta)} + o_p(1).$$
It follows that
$$R_n(\epsilon) \ge \sup_{\epsilon\le|\theta|\le M} r_n(\bar\alpha(\theta),\theta,\bar\sigma(\theta)) = \sup_{\epsilon\le|\theta|\le M}\frac{[\{\mathrm{sgn}(\theta)\sum Z_i(\theta)\}^+]^2}{nEZ_1^2(\theta)} + o_p(1). \quad (19)$$

Theorem 1 Let $X_1,\ldots,X_n$ be a random sample from the mixture distribution $(1-\alpha)N(0,\sigma^2)+\alpha N(\theta,\sigma^2)$, where $0 \le \alpha \le 1$, $|\theta| \le M$ and $\sigma > 0$ are otherwise unknown. Let $R_n$ be (twice) the log-likelihood ratio statistic for testing $H_0: \alpha = 0$ or $\theta = 0$, i.e., $N(0,\sigma^2)$. Then under the null distribution $N(0,1)$, as $n\to\infty$,
$$R_n \to \sup_{|\theta|\le M}\{\zeta^+(\theta)\}^2$$
in distribution, where $\zeta(0) = 0$ and, for $0 < |\theta| \le M$, $\zeta(\theta)$ is a Gaussian process with mean 0, variance 1 and autocorrelation given by
$$\rho(s,t) = \mathrm{sgn}(st)\,\frac{e^{st}-1-(st)^2/2}{\sqrt{(e^{s^2}-1-s^4/2)(e^{t^2}-1-t^4/2)}}, \quad (20)$$
for $s,t \ne 0$.

Proof. The proof starts with (17) and (19). The process
$$n^{-1/2}\sum Z_i(\theta)\big/\{EZ_1^2(\theta)\}^{1/2}, \qquad |\theta| \le M,$$
converges weakly to a Gaussian process $\xi(\theta)$. Direct calculation of the mean and covariance of $Z_i(\theta)$ yields that the Gaussian process $\xi(\theta)$ has mean 0, variance 1 and autocorrelation function, for $s,t \ne 0$,
$$\frac{e^{st}-1-(st)^2/2}{\sqrt{(e^{s^2}-1-s^4/2)(e^{t^2}-1-t^4/2)}}.$$
Therefore, the upper bound of $R_n$ converges in distribution to $\sup_{|\theta|\le M}\{\zeta^+(\theta)\}^2$, where $\zeta(0) = 0$ and, for $0 < |\theta| \le M$, $\zeta(\theta) = \mathrm{sgn}(\theta)\xi(\theta)$ follows the Gaussian process with mean 0, variance 1 and autocorrelation function (20). For given $\epsilon > 0$, the lower bound of $R_n$ converges weakly to $R(\epsilon) = \sup_{\epsilon\le|\theta|\le M}\{\zeta^+(\theta)\}^2$. Now letting $\epsilon \to 0$, $R(\epsilon)$ approaches $\sup_{|\theta|\le M}\{\zeta^+(\theta)\}^2$ in distribution. This completes the proof.

3 Two-Mean-Parameter Mixtures: Tests for Homogeneity

In this section we study the testing problem when both mean parameters $\theta_1$ and $\theta_2$ are unknown. In addition, assume $0 \le \alpha \le 1/2$ so that $\theta_1$ and $\theta_2$ are distinguishable. We wish to test $H_0: \alpha = 0$ or $\theta_1 = \theta_2$, versus the full model (1).
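The autocorrelation function (20) of Theorem 1 is straightforward to evaluate; a small numerical check of its basic properties:

```python
import numpy as np

def rho(s, t):
    """Autocorrelation (20): sgn(st)(e^{st}-1-(st)^2/2) / sqrt(v(s) v(t))."""
    def v(u):
        return np.expm1(u * u) - 0.5 * u**4    # e^{u^2} - 1 - u^4/2, positive for u != 0
    num = np.expm1(s * t) - 0.5 * (s * t)**2
    return np.sign(s * t) * num / np.sqrt(v(s) * v(t))
```

In particular, `rho(t, t) == 1` matches the unit variance of $\zeta(\theta)$, and $|\rho(s,t)| \le 1$ follows from the Cauchy-Schwarz inequality applied to the covariance of $Z_1(s)$ and $Z_1(t)$.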
Let $X_1,\ldots,X_n$ be a random sample of size $n$ from a mixture population $(1-\alpha)N(\theta_1,\sigma^2)+\alpha N(\theta_2,\sigma^2)$. Let $r_n(\alpha,\theta_1,\theta_2,\sigma) = 2\{l_n(\alpha,\theta_1,\theta_2,\sigma)-l_n(0,\hat\theta,\hat\theta,\hat\sigma_0)\}$, where $\hat\theta = \bar X$ and $\hat\sigma_0^2 = n^{-1}\sum(X_i-\bar X)^2$ are the MLEs of $\theta_1 = \theta_2 = \theta$ and $\sigma^2$ under the null hypothesis. Explicitly,
$$r_n(\alpha,\theta_1,\theta_2,\sigma) = 2\sum\log\Bigl[(1-\alpha)\sigma^{-1}\exp\Bigl\{-\frac{(X_i-\theta_1)^2}{2\sigma^2}\Bigr\} + \alpha\sigma^{-1}\exp\Bigl\{-\frac{(X_i-\theta_2)^2}{2\sigma^2}\Bigr\}\Bigr] + n\Bigl\{1+\log\frac{\sum(X_i-\bar X)^2}{n}\Bigr\}.$$
Let $\hat\alpha$, $\hat\theta_1$, $\hat\theta_2$ and $\hat\sigma$ be the MLEs of $\alpha$, $\theta_1$, $\theta_2$ and $\sigma$ under the full model $(1-\alpha)N(\theta_1,\sigma^2)+\alpha N(\theta_2,\sigma^2)$. The LRT rejects $H_0$ if the observed $R_n = r_n(\hat\alpha,\hat\theta_1,\hat\theta_2,\hat\sigma)$ is large.

3.1 Large sample behavior of the MLEs

The statement of Lemma 1 remains true, i.e., under the null distribution $N(0,1)$ there are constants $0 < \epsilon < \Delta < \infty$ such that $\lim_{n\to\infty} P(\epsilon \le \hat\sigma^2 \le \Delta) = 1$. The proof is also similar to that of Lemma 1. Note that $r_n(\alpha,\theta_1,\theta_2,\sigma)$ can be expressed as
$$r_n(\alpha,\theta_1,\theta_2,\sigma) = 2\sum\log\Bigl[1+\alpha\Bigl[\exp\Bigl\{\frac{(X_i-\theta_1)^2-(X_i-\theta_2)^2}{2\sigma^2}\Bigr\}-1\Bigr]\Bigr] - n\log\sigma^2 - \sum\frac{(X_i-\theta_1)^2}{\sigma^2} + n\Bigl\{1+\log\frac{\sum(X_i-\bar X)^2}{n}\Bigr\}.$$
Thus the inequality (3) becomes
$$r_n(\alpha,\theta_1,\theta_2,\sigma) \le -\frac{nS_n(\theta_1,\theta_2)}{\sigma^2} - n\log\sigma^2 + n\Bigl\{1+\log\frac{\sum(X_i-\bar X)^2}{n}\Bigr\},$$
where
$$S_n(\theta_1,\theta_2) = n^{-1}\sum_{i=1}^n\bigl[(X_i-\theta_1)^2 - \{(X_i-\theta_1)^2-(X_i-\theta_2)^2\}^+\bigr].$$
Uniformly in $|\theta_i| \le M$, $i = 1,2$, $S_n(\theta_1,\theta_2)$ approaches, almost surely,
$$S(\theta_1,\theta_2) = E(X-\theta_1)^2 - E[\{(X-\theta_1)^2-(X-\theta_2)^2\}^+].$$
The function $S(\theta_1,\theta_2)$ is continuous and positive over $|\theta_i| \le M$, $i = 1,2$. Thus the minimum of $S(\theta_1,\theta_2)$ is positive, as required by the proof of Lemma 1. Lemma 2 is rewritten as follows.

Lemma 3 Under the null distribution $N(0,1)$, as $n\to\infty$, $\hat\theta_1 \to 0$, $(1-\hat\alpha)\hat\theta_1+\hat\alpha\hat\theta_2 \to 0$, $\hat\alpha\hat\theta_2^2 \to 0$ and $\hat\sigma \to 1$, in probability.

Proof. The proof is similar to that of Lemma 2. Consider $\epsilon \le \sigma^2 \le \Delta$ for some constants $0 < \epsilon < 1 < \Delta < \infty$. Let the space $\mathcal{G} = \{G : G(u) = (1-\alpha)I(u\ge\theta_1)+\alpha I(u\ge\theta_2),\ 0 \le \alpha \le 1/2,\ |\theta_i| \le M,\ i = 1,2\}$ be metrized by the Lévy distance. Then the product space $[\epsilon,\Delta]\times\mathcal{G}$ is compact. Furthermore, the parameters $\sigma^2 \in [\epsilon,\Delta]$ and $G \in \mathcal{G}$ are identifiable. Given the compactness and identifiability, Wald's argument leads to consistency of the MLEs of $\sigma$ and $G$. Therefore the MLEs of the moments $\int u^k\,dG(u) = (1-\alpha)\theta_1^k+\alpha\theta_2^k$ are consistent. Under the null distribution $N(0,1)$, $\int u^k\,dG(u) = 0$. Thus $(1-\hat\alpha)\hat\theta_1+\hat\alpha\hat\theta_2 \to 0$ and $(1-\hat\alpha)\hat\theta_1^2+\hat\alpha\hat\theta_2^2 \to 0$, which imply $\hat\alpha\hat\theta_2^2 \to 0$ and $\hat\theta_1 \to 0$, since $1-\hat\alpha \ge 1/2$. The lemma is proved.

In light of Lemma 3, without loss of generality, $\sigma$ can be restricted to a small neighborhood of $\sigma = 1$, say $[1-\delta,1+\delta]$ for a small positive number $\delta$. We proceed to derive the asymptotic distribution of the LRT. The new challenge in the present case is the loss of positive-definiteness of the quadratic term as in (11). To overcome the difficulty, the parameter space is partitioned into two parts: $|\theta_2| > \epsilon$ and $|\theta_2| \le \epsilon$, for an arbitrarily small $\epsilon > 0$. The LRT will be analyzed within each part by using the sandwich approach. Let $R_n(\epsilon;\mathrm{I})$ denote the supremum of the likelihood ratio within the part $|\theta_2| > \epsilon$, and $R_n(\epsilon;\mathrm{II})$ the supremum within $|\theta_2| \le \epsilon$. Then
19 R n = max{r n (ɛ; I), R n (ɛ; II)}. The number ɛ will remain fixed as n approaches infinity. It is easily seen that Lemma 3 remains true under either restriction θ > ɛ or θ ɛ. Dependence on ɛ will be suppressed notationally for the MLE s of the parameters. Thus ˆα, ˆθ 1, ˆθ and ˆσ will denote the constrained MLE s of α, θ 1, θ and σ with restriction θ ɛ in the analysis of R n (ɛ; I), but stand for the constrained MLE s with restriction θ ɛ in the analysis of R n (ɛ; II). 3. Analysis of R n (ɛ; I) We first establish an asymptotic upper bound for R n (ɛ; I). As in Section., write r n (α, θ 1, θ, σ) = {l n (α, θ 1, θ, σ) l n (0, 0, 0, 1)} + {l n (0, 0, 0, 1) l n (0, ˆθ, ˆθ, ˆσ 0 )} = r 1n (α, θ 1, θ, σ) + r n. To analyze r 1n (α, θ 1, θ, σ), express r 1n (α, θ 1, θ, σ) = log(1 + δ i ), where [ 1 δ i = (1 α) σ exp{x i (X i θ 1 ) ] [ 1 } 1 + α σ σ exp{x i (X i θ ) ] } 1 σ = (1 α)θ 1 Y i (θ 1, σ) + αθ Y i (θ, σ) + (σ 1)U i (σ), (1) with Y i (θ, σ) and U i (σ) are defined in (5) and (4). Re-write δ i = m 1 Y i (0, 1) + (σ 1 + m )U i (1) + m 3 V i (θ ) + ɛ in, where ɛ in is the remainder term of replacement, and m 1 = (1 α)θ 1 + αθ, m = (1 α)θ 1 + αθ, m 3 = αθ 3, V i (θ ) = Y i(θ, 1) Y i (0, 1) θ U i (1). () θ Define V i (0) = (X i /) + (Xi 3 /6) so that the function V i (θ) is continuously differentiable. By a similar analysis to the single mean parameter case, it is seen that the total remainder satisfies ɛ n = ɛ in = O p { n σ 1 [ m 1 + θ1 + αθ + σ 1 ] + n θ1 }. 3 (3) 19
Recall that $U_i = U_i(1) = (X_i^2-1)/2$ and $Y_i(0,1) = X_i$. We have
$$\sum\delta_i = m_1\sum X_i + (\sigma^2-1+m_2)\sum U_i + m_3\sum V_i(\theta_2) + \epsilon_n.$$
Since the remainders resulting from the square and cubic sums are of the same (or higher) order as that from the linear sum (see the similar analysis in the case of single-mean-parameter mixtures), we have
$$r_{1n}(\alpha,\theta_1,\theta_2,\sigma) \le 2\sum\delta_i - \sum\delta_i^2 + \frac{2}{3}\sum\delta_i^3 = 2\sum\{m_1X_i+(\sigma^2-1+m_2)U_i+m_3V_i(\theta_2)\} - \sum\{m_1X_i+(\sigma^2-1+m_2)U_i+m_3V_i(\theta_2)\}^2 + \frac{2}{3}\sum\{m_1X_i+(\sigma^2-1+m_2)U_i+m_3V_i(\theta_2)\}^3 + O_p\bigl\{n^{1/2}|\sigma-1|[|m_1|+\theta_1^2+\alpha\theta_2^2+|\sigma-1|]+n^{1/2}|\theta_1|^3\bigr\}.$$
Furthermore, the cubic sum is negligible when compared to the square sum. This can be justified by using the idea leading to (14). First, the square sum times $n^{-1}$ approaches $E\{m_1X_1+(\sigma^2-1+m_2)U_1+m_3V_1(\theta_2)\}^2$ uniformly; the limit is a positive definite quadratic form in the variables $m_1$, $\sigma^2-1+m_2$ and $m_3$. Next, noting that $X_i$, $U_i$ and $V_i(\theta_2)$ are mutually orthogonal, we see that
$$r_{1n}(\hat\alpha,\hat\theta_1,\hat\theta_2,\hat\sigma) \le 2\Bigl\{\hat m_1\sum X_i+(\hat\sigma^2-1+\hat m_2)\sum U_i+\hat m_3\sum V_i(\hat\theta_2)\Bigr\} - \Bigl\{\hat m_1^2\sum X_i^2+(\hat\sigma^2-1+\hat m_2)^2\sum U_i^2+\hat m_3^2\sum V_i^2(\hat\theta_2)\Bigr\}\{1+o_p(1)\} + \hat\epsilon_n.$$
Here the terms with a hat are the constrained MLEs under the restriction $|\theta_2| > \epsilon$, as remarked at the end of Section 3.1. In particular, from (23),
$$\hat\epsilon_n = O_p\bigl\{n^{1/2}|\hat\sigma-1|[|\hat m_1|+\hat\theta_1^2+\hat\alpha\hat\theta_2^2+|\hat\sigma-1|]+n^{1/2}|\hat\theta_1|^3\bigr\}.$$
By the Cauchy inequality [e.g., $2n^{1/2}|\hat m_1| \le 1+n\hat m_1^2$] and the restriction $|\theta_2| > \epsilon$ (hence $|\hat\theta_2| \ge \epsilon$), we have
$$n^{1/2}|\hat\sigma-1|[|\hat m_1|+\hat\theta_1^2+\hat\alpha\hat\theta_2^2+|\hat\sigma-1|]+n^{1/2}|\hat\theta_1|^3 \le |\hat\sigma-1|\bigl[4+n\{\hat m_1^2+\hat\theta_1^4+(\hat\alpha\hat\theta_2^2)^2+(\hat\sigma-1)^2\}\bigr]+|\hat\theta_1|(1+n\hat\theta_1^4) = o_p(1)+n\,o_p\{\hat m_1^2+\hat\theta_1^4+(\hat\alpha\hat\theta_2^2)^2+(\hat\sigma-1)^2\} = o_p(1)+n\,o_p\{\hat m_1^2+(\hat\sigma^2-1+\hat m_2)^2+\hat m_3^2\}.$$
Hence the remainder term $\hat\epsilon_n$ can also be absorbed into the quadratic sum, i.e.,
$$r_{1n}(\hat\alpha,\hat\theta_1,\hat\theta_2,\hat\sigma) \le 2\Bigl\{\hat m_1\sum X_i+(\hat\sigma^2-1+\hat m_2)\sum U_i+\hat m_3\sum V_i(\hat\theta_2)\Bigr\} - \Bigl\{\hat m_1^2\sum X_i^2+(\hat\sigma^2-1+\hat m_2)^2\sum U_i^2+\hat m_3^2\sum V_i^2(\hat\theta_2)\Bigr\}\{1+o_p(1)\}+o_p(1).$$
Applying the argument leading to (16), the right-hand side of the above inequality becomes even greater when $\hat m_1$, $\hat\sigma^2-1+\hat m_2$ and $\hat m_3$ are replaced with
$$\tilde m_1 = \frac{\sum X_i}{\sum X_i^2}, \qquad \widetilde{\sigma^2-1+m_2} = \frac{\sum U_i}{\sum U_i^2}, \qquad \tilde m_3 = \mathrm{sgn}(\theta)\,\frac{[\mathrm{sgn}(\theta)\sum V_i(\theta)]^+}{\sum V_i^2(\theta)}, \quad (24)$$
for any $\epsilon < |\theta| \le M$, so that
$$r_{1n}(\hat\alpha,\hat\theta_1,\hat\theta_2,\hat\sigma) \le \frac{\{\sum X_i\}^2}{\sum X_i^2} + \frac{\{\sum U_i\}^2}{\sum U_i^2} + \sup_{\epsilon<|\theta|\le M}\frac{[\{\mathrm{sgn}(\theta)\sum V_i(\theta)\}^+]^2}{\sum V_i^2(\theta)} + o_p(1), \quad (25)$$
or
$$r_{1n}(\hat\alpha,\hat\theta_1,\hat\theta_2,\hat\sigma) \le \frac{\{\sum X_i\}^2}{n} + \frac{2\{\sum U_i\}^2}{n} + \sup_{\epsilon<|\theta|\le M}\frac{[\{\mathrm{sgn}(\theta)\sum V_i(\theta)\}^+]^2}{nEV_1^2(\theta)} + o_p(1).$$
On the other hand, the classic analysis gives
$$r_{2n} = 2\{l_n(0,0,0,1)-l_n(0,\hat\theta,\hat\theta,\hat\sigma_0)\} = -n\bar X^2 - \frac{2}{n}\Bigl\{\sum U_i\Bigr\}^2 + o_p(1). \quad (26)$$
Combining (25) and (26) yields
$$R_n(\epsilon;\mathrm{I}) \le \sup_{\epsilon<|\theta|\le M}\frac{[\{\mathrm{sgn}(\theta)\sum V_i(\theta)\}^+]^2}{nEV_1^2(\theta)} + o_p(1).$$
We thus have established an asymptotic upper bound for $R_n(\epsilon;\mathrm{I})$. Again we can see that the upper bound is achievable. For fixed $\theta$ with $|\theta| \ge \epsilon$, let $\bar\alpha$, $\bar\theta_1$ and $\bar\sigma$ be the solutions for $\alpha$, $\theta_1$ and $\sigma$ of (24). Then $\bar\alpha = O_p(n^{-1/2})$, $\bar\theta_1 = O_p(n^{-1/2})$ and $\bar\sigma-1 = O_p(n^{-1/2})$, uniformly in $\theta$; the uniformity is ensured by the restriction $|\theta| > \epsilon$. Use the Taylor expansion
$$r_{1n}(\bar\alpha,\bar\theta_1,\theta,\bar\sigma) = 2\sum\bar\delta_i - \sum\bar\delta_i^2(1+\eta_i)^{-2},$$
where $|\eta_i| < |\bar\delta_i|$. The argument leading to (18) also proves $\max_i|\eta_i| \le \max_i|\bar\delta_i| = o_p(1)$, so that
$$r_{1n}(\bar\alpha,\bar\theta_1,\theta,\bar\sigma) = 2\sum\bar\delta_i - \sum\bar\delta_i^2(1+o_p(1)).$$
By (24), $\bar\alpha$, $\bar\theta_1$ and $\bar\sigma$ are such that
$$R_n(\epsilon;\mathrm{I}) \ge \sup_{\epsilon\le|\theta|\le M} r_n(\bar\alpha,\bar\theta_1,\theta,\bar\sigma) = \sup_{\epsilon\le|\theta|\le M}\frac{[\{\mathrm{sgn}(\theta)\sum V_i(\theta)\}^+]^2}{nEV_1^2(\theta)} + o_p(1).$$
It is thus shown that the asymptotic upper bound is achievable. That is,
$$R_n(\epsilon;\mathrm{I}) = \sup_{\epsilon<|\theta|\le M}\frac{[\{\mathrm{sgn}(\theta)\sum V_i(\theta)\}^+]^2}{nEV_1^2(\theta)} + o_p(1). \quad (27)$$
This concludes the analysis of $R_n(\epsilon;\mathrm{I})$.

3.3 Analysis of $R_n(\epsilon;\mathrm{II})$

Now consider the restriction $|\theta_2| \le \epsilon$. In this case, $\theta_1$ and $\theta_2$ can be treated equally; in fact, since the MLE of $\theta_1$ is consistent, we can restrict $|\theta_1| \le \epsilon$ as well. As before, we know that
$$r_{1n}(\alpha,\theta_1,\theta_2,\sigma) = 2\sum\log(1+\delta_i) \le 2\sum\delta_i - \sum\delta_i^2 + \frac{2}{3}\sum\delta_i^3.$$
Let $\hat m_k = (1-\hat\alpha)\hat\theta_1^k+\hat\alpha\hat\theta_2^k$. Using the Taylor expansions of $Y_i(\hat\theta_j,\hat\sigma)$ and $U_i(\hat\sigma)$ in (21), we have
$$\hat\delta_i = \hat m_1Y_i(0,1) + (\hat\sigma^2-1+\hat m_2)Y_i'(0,1) + \frac{1}{2}\hat m_3Y_i''(0,1) + \frac{1}{6}\{3(\hat\sigma^2-1)^2+\hat m_4+6(\hat\sigma^2-1)\hat m_2\}Y_i'''(0,1) + \hat\epsilon_{in}, \quad (28)$$
where $Y_i'(0,1)$ is the first partial derivative of $Y_i(\theta,\sigma)$ with respect to $\theta$ at $\theta = 0$ and $\sigma = 1$, and similarly for $Y_i''(0,1)$ and $Y_i'''(0,1)$. As before, put $Y_i = Y_i(0,1)$, $Y_i' = Y_i'(0,1)$, $Y_i'' = Y_i''(0,1)$, $Y_i''' = Y_i'''(0,1)$. By calculation, $Y_i' = U_i(1) = (X_i^2-1)/2$, $Y_i'' = (X_i^3-3X_i)/3$, and $Y_i''' = (X_i^4-6X_i^2+3)/4$. The sum of the remainders $\hat\epsilon_n = \sum\hat\epsilon_{in}$ satisfies
$$\hat\epsilon_n = n^{1/2}|\hat\sigma^2-1|^3O_p(1) + n(\hat m_1^2+\hat m_3^2)o_p(1) + n^{1/2}(|\hat m_5|+\hat m_6)O_p(1) + o_p(1). \quad (29)$$
Note that the cross-product terms in the Taylor expansion of (28) have been taken into account in the remainder. For example, $n^{1/2}|\hat\sigma^2-1||\hat m_1| = o_p(n^{1/2}|\hat m_1|) = o_p(1+n\hat m_1^2)$. The coefficient $n^{1/2}$ in the above results from the iid sums of zero-mean random variables, as we have seen in the last section. Also note that
$$3(\hat\sigma^2-1)^2+\hat m_4+6(\hat\sigma^2-1)\hat m_2 = 3(\hat\sigma^2-1+\hat m_2)^2+\hat m_4-3\hat m_2^2.$$
Hence (28) reduces to
$$\hat\delta_i = \hat s_1Y_i + \hat s_2Y_i' + \hat s_3Y_i'' + \hat s_4Y_i''' + \hat\epsilon_{in},$$
where
$$\hat s_1 = \hat m_1, \quad \hat s_2 = \hat\sigma^2-1+\hat m_2, \quad \hat s_3 = \tfrac{1}{2}\hat m_3, \quad \hat s_4 = \tfrac{1}{6}(\hat m_4-3\hat m_2^2), \quad (30)$$
and, combining (29), the sum of the remainders $\hat\epsilon_n = \sum\hat\epsilon_{in}$ becomes
$$\hat\epsilon_n = n^{1/2}\hat s_2^2O_p(1) + n^{1/2}|\hat\sigma^2-1|^3O_p(1) + n(\hat m_1^2+\hat m_3^2)o_p(1) + n^{1/2}(|\hat m_5|+\hat m_6)O_p(1) + o_p(1). \quad (31)$$
Therefore, an upper bound for $r_{1n}(\hat\alpha,\hat\theta_1,\hat\theta_2,\hat\sigma)$ is
$$r_{1n}(\hat\alpha,\hat\theta_1,\hat\theta_2,\hat\sigma) \le 2\sum\{\hat s_1Y_i+\hat s_2Y_i'+\hat s_3Y_i''+\hat s_4Y_i'''\}$$
$$-\sum\{\hat s_1Y_i'+\hat s_2Y_i''+\hat s_3Y_i'''+\hat s_4Y_i''''\}^2+\frac{2}{3}\sum\{\hat s_1Y_i'+\hat s_2Y_i''+\hat s_3Y_i'''+\hat s_4Y_i''''\}^3+\hat\epsilon_n.$$
The argument leading to (14) shows that the cubic sum is controlled by the square sum up to a term of $\epsilon O_p(1)$. Also note that $Y_i'$, $Y_i''$, $Y_i'''$ and $Y_i''''$ are mutually orthogonal and hence the quadratic sum is positive-definite. Therefore,
$$r_{1n}(\hat\alpha,\hat\theta_1,\hat\theta_2,\hat\sigma)\le 2\sum\{\hat s_1Y_i'+\hat s_2Y_i''+\hat s_3Y_i'''+\hat s_4Y_i''''\}-\Big\{\hat s_1^2\sum(Y_i')^2+\hat s_2^2\sum(Y_i'')^2+\hat s_3^2\sum(Y_i''')^2+\hat s_4^2\sum(Y_i'''')^2\Big\}\{1+\epsilon O_p(1)\}+\hat\epsilon_n.\quad(32)$$
Since $\hat\sigma^2-1=o_p(1)$, $\hat m_2\le\epsilon^2$ and $\hat m_2^3\le\hat m_6$, we have
$$n^{1/2}|\hat\sigma^2-1|^3\le 8n^{1/2}\{|\hat s_2|^3+\hat m_2^3\}\le\epsilon n^{1/2}\hat s_2^2O_p(1)+n^{1/2}\hat m_6O_p(1),$$
so that (31) can be expressed as
$$\hat\epsilon_n=\epsilon n^{1/2}\hat s_2^2O_p(1)+n(\hat m_1^2+\hat m_3^2)o_p(1)+n^{1/2}(|\hat m_5|+\hat m_6)O_p(1)+o_p(1).\quad(33)$$
Now the key point is to show that
$$\hat\epsilon_n=o_p(1)+\epsilon n\{\hat s_1^2+\hat s_2^2+\hat s_3^2+\hat s_4^2\}O_p(1).\quad(34)$$
This result implies that the remainder is also negligible when compared to the square sum in (32). Put $\hat\tau=(1-\hat\alpha)|\hat\theta_1|^5+\hat\alpha|\hat\theta_2|^5$. Then $|\hat m_5|+\hat m_6=O_p(\hat\tau)$. Therefore, (34) follows immediately from (33) and the following lemma.

Lemma 4. $\hat\tau=o_p(1)+\epsilon\{|\hat s_1|+|\hat s_2|+|\hat s_3|+|\hat s_4|\}O_p(1)$.
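The mutual orthogonality of $Y_i'$, $Y_i''$, $Y_i'''$ and $Y_i''''$ invoked for (32), and the second moments $E(Y_1''')^2=2/3$ and $E(Y_1'''')^2=3/2$ used later in (36), can be checked with exact standard-normal moments. The following Python sketch is a numerical aside, not part of the argument; it assumes $Y_i'=X_i$ (the first derivative is not displayed explicitly above) and uses $EX^k=(k-1)!!$ for even $k$.

```python
from fractions import Fraction
from math import prod

def normal_moment(k):
    # E[X^k] for X ~ N(0,1): zero for odd k, (k-1)!! for even k
    return Fraction(0) if k % 2 else Fraction(prod(range(1, k, 2)))

def mul(p, q):
    # multiply two polynomials stored as {power: coefficient}
    out = {}
    for a, ca in p.items():
        for b, cb in q.items():
            out[a + b] = out.get(a + b, 0) + ca * cb
    return out

def expect(poly):
    # E[poly(X)] under the standard normal distribution
    return sum(c * normal_moment(p) for p, c in poly.items())

# the derivatives appearing in the expansion of delta_i
Y1 = {1: Fraction(1)}                                            # Y'    = X (assumed)
Y2 = {2: Fraction(1, 2), 0: Fraction(-1, 2)}                     # Y''   = (X^2 - 1)/2
Y3 = {3: Fraction(1, 3), 1: Fraction(-1)}                        # Y'''  = (X^3 - 3X)/3
Y4 = {4: Fraction(1, 4), 2: Fraction(-3, 2), 0: Fraction(3, 4)}  # Y'''' = (X^4 - 6X^2 + 3)/4
```

Since the four functions are scaled Hermite polynomials in $X_i$, all pairwise products have zero expectation, while the squares give the constants quoted in the text.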
Proof. The proof is accomplished by partitioning the sample space into several parts and showing that in each part one of $\hat s_i$, $i=1,2,3,4$, controls the size of $\hat\tau$.

Consider the first part: $(1-\hat\alpha)|\hat\theta_1|\ge\gamma\hat\alpha|\hat\theta_2|$ for a constant $\gamma>1$. In this case,
$$|\hat m_1|\ge(1-\hat\alpha)|\hat\theta_1|-\hat\alpha|\hat\theta_2|\ge(\gamma-1)\hat\alpha|\hat\theta_2|.$$
On the other hand,
$$|\hat m_1|\ge(1-\hat\alpha)|\hat\theta_1|-\hat\alpha|\hat\theta_2|=\Big(1-\frac{\hat\alpha|\hat\theta_2|}{(1-\hat\alpha)|\hat\theta_1|}\Big)(1-\hat\alpha)|\hat\theta_1|\ge(1-\gamma^{-1})(1-\hat\alpha)|\hat\theta_1|.$$
So, $\hat\tau=\epsilon O_p\{(1-\hat\alpha)|\hat\theta_1|+\hat\alpha|\hat\theta_2|\}=\epsilon O_p(|\hat s_1|)$. Similarly, let $(1-\hat\alpha)|\hat\theta_1|^3\ge\gamma\hat\alpha|\hat\theta_2|^3$. We have $\hat\tau=\epsilon O_p(|\hat s_3|)$.

Finally, consider the case that
$$\gamma^{-1}\le\frac{(1-\hat\alpha)|\hat\theta_1|^k}{\hat\alpha|\hat\theta_2|^k}\le\gamma,\qquad\text{for }k=1,3.$$
Solving the inequalities with $k=1$ and $3$, we have $\gamma^{-2}\le\hat\alpha/(1-\hat\alpha)\le\gamma^2$, implying $(1-\hat\alpha)\le\hat\alpha\gamma^2$ and $\hat\alpha\le(1-\hat\alpha)\gamma^2$. Therefore,
$$\hat m_4-\hat m_2^2=\hat\alpha(1-\hat\alpha)(\hat\theta_1^2-\hat\theta_2^2)^2\le\hat\alpha(1-\hat\alpha)(\hat\theta_1^4+\hat\theta_2^4)\le\gamma^2\{(1-\hat\alpha)^2\hat\theta_1^4+\hat\alpha^2\hat\theta_2^4\}\le\gamma^2\hat m_2^2.\quad(35)$$
So, $\hat m_4-3\hat m_2^2\le(\gamma^2-2)\hat m_2^2<0$ when the constant $\gamma$ is chosen a priori to be between $1$ and $\sqrt{2}$. It follows that
$$\hat m_2^2\le|\hat m_4-3\hat m_2^2|/(2-\gamma^2).$$
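As a numerical sanity check on the final case of the proof (not part of the original argument), the sketch below constructs parameter points satisfying the two-sided balance condition for a hypothetical choice $\gamma=1.3\in(1,\sqrt{2})$, and verifies the conclusions drawn from (35), namely $\hat m_4-3\hat m_2^2\le(\gamma^2-2)\hat m_2^2$ and $\hat m_4\le(1+\gamma^2)\hat m_2^2$, along with the moment inequality $\hat m_2^3\le\hat m_6$ used for the remainder bound; the helper names are ours.

```python
def mixture_moment(alpha, t1, t2, k):
    # m_k = (1 - alpha) t1^k + alpha t2^k
    return (1.0 - alpha) * t1**k + alpha * t2**k

def balanced(alpha, t1, t2, gamma):
    # the final case of the partition: for k = 1 and 3,
    # 1/gamma <= (1-alpha)|t1|^k / (alpha |t2|^k) <= gamma
    return all(1.0 / gamma <= (1.0 - alpha) * abs(t1)**k / (alpha * abs(t2)**k) <= gamma
               for k in (1, 3))

gamma = 1.3  # any constant in (1, sqrt(2)) works for the argument

# build balanced points: u = |t1/t2| near 1 and w = {(1-alpha)/alpha} * u near 1
cases = []
for iu in range(11):
    u = 0.95 + 0.01 * iu
    for iw in range(11):
        w = 0.95 + 0.01 * iw
        r = w / u                      # r = (1 - alpha)/alpha
        if r < 1.0:
            continue                   # keep alpha <= 1/2
        alpha = 1.0 / (1.0 + r)
        for t2 in (0.2, -0.5, 0.8):
            for sign in (1.0, -1.0):
                cases.append((alpha, sign * u * t2, t2))
```

With $u$ and $w$ confined to $[0.95,1.05]$, both ratio constraints fall well inside $[\gamma^{-1},\gamma]$, so every constructed point lies in the balanced case.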
Equation (35) also implies $\hat m_4\le(1+\gamma^2)\hat m_2^2$. Consequently, $\hat\tau=\epsilon O_p(\hat m_4)=\epsilon O_p(\hat m_2^2)=\epsilon O_p(|\hat s_4|)$. We have thus exhausted the sample space and the lemma follows.

From (32) and (34), it follows that
$$r_{1n}(\hat\alpha,\hat\theta_1,\hat\theta_2,\hat\sigma)\le 2\sum\{\hat s_1Y_i'+\hat s_2Y_i''+\hat s_3Y_i'''+\hat s_4Y_i''''\}-\Big\{\hat s_1^2\sum(Y_i')^2+\hat s_2^2\sum(Y_i'')^2+\hat s_3^2\sum(Y_i''')^2+\hat s_4^2\sum(Y_i'''')^2\Big\}\{1+\epsilon O_p(1)\}.$$
Applying the quadratic form argument which has been used several times in the previous sections, we obtain that
$$R_n(\epsilon;\mathrm{II})\le\frac{(\sum Y_i''')^2}{nE(Y_1''')^2}+\frac{(\sum Y_i'''')^2}{nE(Y_1'''')^2}+\epsilon O_p(1).$$
The upper bound in the above inequality is attained when the parameters $\alpha$, $\theta_1$, $\theta_2$ and $\sigma$ assume the values determined by the following equations:
$$\bar s_1=\frac{\sum Y_i'}{\sum(Y_i')^2},\quad \bar s_2=\frac{\sum Y_i''}{\sum(Y_i'')^2},\quad \bar s_3=\frac{\sum Y_i'''}{\sum(Y_i''')^2},\quad \bar s_4=\frac{\sum Y_i''''}{\sum(Y_i'''')^2},$$
where $\bar s_1$, $\bar s_2$, $\bar s_3$ and $\bar s_4$ are defined correspondingly by (30). We thus arrive at
$$R_n(\epsilon;\mathrm{II})=\frac{(\sum Y_i''')^2}{nE(Y_1''')^2}+\frac{(\sum Y_i'''')^2}{nE(Y_1'''')^2}+\epsilon O_p(1).\quad(36)$$

Remark. A by-product of the above analysis shows that the MLE of $\sigma$ has a convergence rate at most $n^{-1/4}$. To see this, consider the submodel where $\theta_1=-\theta_2=\theta$, $\alpha=1/2$ and $\sigma^2-1=-\theta^2$. The maximum of the likelihood function is achieved when
$$\hat m_4-3\hat m_2^2=-2\hat\theta^4=6\sum Y_i''''\Big/\sum(Y_i'''')^2=O_p(n^{-1/2}).$$
This implies that $\hat\theta=O_p(n^{-1/8})$ and $\hat\sigma^2-1=O_p(n^{-1/4})$. This is in contrast to the ordinary semi-parametric models, where one may still have the usual rate of $n^{-1/2}$ for
the parametric components. See Van der Vaart (1996). Moreover, the result suggests that the best possible rate for estimating the mixing distribution when a structural parameter is present is $n^{-1/8}$ rather than $n^{-1/4}$ as found by Chen (1995) for the mixture models without a structural parameter.

3.4 Asymptotic distribution of the LRT

Theorem 2. Let $X_1,\ldots,X_n$ be a random sample from the mixture distribution $(1-\alpha)N(\theta_1,\sigma^2)+\alpha N(\theta_2,\sigma^2)$, where $0\le\alpha\le 1/2$, $|\theta_i|\le M$, $i=1,2$, and $\sigma>0$. Let $R_n$ be (twice) the log-likelihood ratio test statistic for testing $H_0$: $\alpha=0$ or $\theta_1=\theta_2$, i.e., $N(\theta_1,\sigma^2)$. Then under the null distribution $N(0,1)$, as $n\to\infty$,
$$R_n\to_d\sup_{|\theta|\le M}\big[\{\varsigma^+(\theta)\}^2I(\theta\ne0)+\{\varsigma^2(0)+Z^2\}I(\theta=0)\big],$$
where the process involved in the limiting distribution is defined as follows: (1) $\{\varsigma(\theta):|\theta|\le M\}$ is a Gaussian process with mean 0, variance 1 and the autocorrelation function: for $s,t\ne0$,
$$\rho(s,t)=\mathrm{sgn}(st)\frac{b(st)}{\sqrt{b(s^2)b(t^2)}},\quad\text{and}\quad\rho(0,t)=\frac{t^3}{\sqrt{6b(t^2)}},\quad(37)$$
where $b(x)=e^x-1-x-x^2/2$; and (2) $\varsigma(0)$ and $Z\sim N(0,1)$ are independent, and for $s\ne0$,
$$\mathrm{Cov}\{\varsigma(s),Z\}=\frac{s^4}{2\sqrt{6b(s^2)}}.\quad(38)$$

Proof. For any fixed $\epsilon>0$,
$$R_n=\max\{R_n(\epsilon;\mathrm{I}),R_n(\epsilon;\mathrm{II})\}.$$
By (27), the asymptotic distribution of $R_n(\epsilon;\mathrm{I})$ is determined by the limit of
$$\frac{[\sum\{\mathrm{sgn}(\theta)V_i(\theta)\}^+]^2}{nEV_1^2(\theta)}.$$
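The covariance structure in (37) and (38), as reconstructed above, can be coded directly. The sketch below is an illustration only, not part of the proof: it checks that $\rho$ behaves like a correlation, that $\rho(s,t)\to\rho(0,t)$ as $s\to0$, and that (38) agrees with the ingredients used in the proof that follows, namely $\mathrm{Cov}\{V_i(s),Y_i''''\}=s/4$, $\mathrm{Var}\{V_i(s)\}=b(s^2)/s^6$ and $\mathrm{Var}(Y_i'''')=3/2$.

```python
import math

def b(x):
    # b(x) = e^x - 1 - x - x^2/2, computed stably near 0 via expm1
    return math.expm1(x) - x - 0.5 * x * x

def rho(s, t):
    # autocorrelation function (37) of the process varsigma
    if s == 0.0 and t == 0.0:
        return 1.0
    if s == 0.0 or t == 0.0:
        u = t if s == 0.0 else s
        return u**3 / math.sqrt(6.0 * b(u * u))
    return math.copysign(1.0, s * t) * b(s * t) / math.sqrt(b(s * s) * b(t * t))

def cov_varsigma_Z(s):
    # the closed form (38): s^4 / {2 sqrt(6 b(s^2))}
    return s**4 / (2.0 * math.sqrt(6.0 * b(s * s)))

def cov_from_ingredients(s):
    # the same quantity assembled from the proof's pieces, with the sign
    # flip varsigma(s) = sgn(s) xi(s) applied
    corr_xi = (s / 4.0) / math.sqrt((b(s * s) / s**6) * 1.5)
    return math.copysign(1.0, s) * corr_xi
```

The agreement of the two covariance computations is an algebraic identity, so the numerical match is exact up to rounding.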
By the definition of $V_i(\theta)$, we see that $n^{-1/2}\sum_{i=1}^nV_i(\theta)/\{EV_1^2(\theta)\}^{1/2}$ converges weakly to a Gaussian process, say $\{\xi(\theta):\epsilon\le|\theta|\le M\}$, with mean 0, variance 1 and autocorrelation function as follows: for $\epsilon\le|s|,|t|\le M$,
$$\mathrm{Cov}\{\xi(s),\xi(t)\}=\frac{b(st)}{\sqrt{b(s^2)b(t^2)}}.$$
Define $\varsigma(\theta)=\mathrm{sgn}(\theta)\xi(\theta)$ for $\theta\ne0$ and $\varsigma(0)=\xi(0)$. Then $\varsigma(\theta)$ is a Gaussian process with the autocorrelation function (37), and $R_n(\epsilon;\mathrm{I})$ converges weakly to
$$\sup_{0<|\theta|\le M}\{\varsigma^+(\theta)\}^2,$$
by first letting $n\to\infty$ and then $\epsilon\to0$.

On the other hand, by (36) we can have that, by first letting $n\to\infty$ and then $\epsilon\to0$, $R_n(\epsilon;\mathrm{II})$ converges weakly to $\varsigma^2(0)+Z^2$. To see this, put $R_n(\epsilon;\mathrm{II})=A_n+\epsilon O_p(1)$ in (36). For any $\eta>0$, there exists $C>0$ such that $P(|O_p(1)|>C)<\eta$ for all large $n$. Thus for any given $x$ and $n$ large,
$$P(A_n\le x-\epsilon C)-\eta\le P(R_n(\epsilon;\mathrm{II})\le x)\le P(A_n\le x+\epsilon C)+\eta,$$
implying that $R_n(\epsilon;\mathrm{II})$ converges weakly to $\varsigma^2(0)+Z^2$ by first letting $n\to\infty$, then $\epsilon\to0$ and finally $\eta\to0$.

The independence of $\varsigma(0)$ and $Z$ is due to the fact $V_i(0)=Y_i'''/2$ and the orthogonality of $Y_i'''$ and $Y_i''''$. The correlation between $\varsigma(\theta)$ and $Z$ is seen from the following calculation:
$$\mathrm{Cov}\{V_i(\theta),Y_i''''\}=(\theta/6)\mathrm{Var}(Y_i'''')=\theta/4,\quad\text{and}\quad\mathrm{Var}\{V_i(\theta)\}=b(\theta^2)/\theta^6.$$
Thus the correlation between $V_i(\theta)$ and $Y_i''''$ is given by (38). The proof is completed.

4 Concluding Remarks

The asymptotic null distribution of the LRT for homogeneity in finite normal mixture models in the presence of a structural parameter has been derived without separation
conditions on the mean parameters. It is proved that the asymptotic null distribution of the LRT is the maximum of a $\chi^2$-variable and the supremum of the square of a truncated Gaussian process. If the structural parameter were removed from the model, the peculiar large sample behavior of the LRT would disappear, and the limiting null distribution would simply be the supremum of the square of the truncated Gaussian process, reducing to the one discovered by Chen and Chen (2001a). If, in addition, $M$ is allowed to approach infinity, the supremum is distributed approximately as that of
$$(2\log M)^{1/2}+\{X-\log(\pi)\}/(2\log M)^{1/2},$$
where $P(X\le x)=\exp\{-e^{-x}\}$, which is the type-I extreme value distribution. See Chernoff and Lander (1995, Appendix D) and Adler (1990). The result in Bickel and Chernoff (1993) can be obtained in a heuristic way by letting $M=(\log n/2)^{1/2}$. It is interesting to see that the results from different model set-ups agree formally. Bickel and Chernoff actually dealt with a modified LRT, replacing a random element in the LRT statistic with its mean in order to simplify the analysis. It seems that their modification might not have changed the asymptotic behavior of the LRT substantially. Computing the quantiles of the supremum of a Gaussian process over a region is a difficult problem. See also the comments by Dacunha-Castelle and Gassiat (1999), and Chen and Chen (2001b). Some approximations in special cases can be found in Adler (1990) and Sun (1993).

Owing to the large sample study, it is found that even though the structural parameter is not part of the mixing distribution, the convergence rate of its MLE is $n^{-1/4}$ rather than $n^{-1/2}$. This is in sharp contrast to the ordinary semi-parametric models. Moreover, the estimated mixing distribution has a convergence rate $n^{-1/8}$ rather than $n^{-1/4}$ as discovered by Chen (1995) for finite mixture models without a structural parameter.
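For reference, the extreme-value approximation quoted above is easy to tabulate. The sketch below is an illustration only: the centering constant $\log(\pi)$ is taken from the formula as reconstructed here from a damaged source, and the helper names are ours, not the paper's.

```python
import math

def gumbel_cdf(x):
    # type-I extreme value distribution: P(X <= x) = exp(-e^{-x})
    return math.exp(-math.exp(-x))

def gumbel_quantile(p):
    # inverse of the type-I extreme value cdf
    return -math.log(-math.log(p))

def approx_sup_quantile(p, M):
    # approximate p-quantile of the supremum for large M, per the quoted
    # expansion: (2 log M)^{1/2} + {x_p - log(pi)} / (2 log M)^{1/2}
    c = math.sqrt(2.0 * math.log(M))
    return c + (gumbel_quantile(p) - math.log(math.pi)) / c
```

As expected of an extreme-value normalization, the approximate quantile grows slowly (like $\sqrt{2\log M}$) as the parameter range $M$ increases.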
REFERENCES

Adler, R.J. (1990). An Introduction to Continuity, Extrema, and Related Topics for General Gaussian Processes. IMS Lecture Notes-Monograph Series, Vol. 12. Institute of Mathematical Statistics, Hayward, CA.

Bickel, P. and Chernoff, H. (1993). Asymptotic distribution of the likelihood ratio statistic in a prototypical non-regular problem. In Statistics and Probability: A Raghu Raj Bahadur Festschrift (J.K. Ghosh, S.K. Mitra, K.R. Parthasarathy and B.L.S. Prakasa Rao, eds.). Wiley Eastern.

Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.

Chen, H. and Chen, J. (2001a). Large sample distribution of the likelihood ratio test for normal mixtures. Statistics & Probability Letters.

Chen, H. and Chen, J. (2001b). The likelihood ratio test for homogeneity in the finite mixture models. Canad. J. Statist.

Chen, J. (1995). Optimal rate of convergence in finite mixture models. Ann. Statist.

Cheng, R.C.H. and Traylor, L. (1995). Non-regular maximum likelihood problems. J. Roy. Statist. Soc. B.

Chernoff, H. and Lander, E. (1995). Asymptotic distribution of the likelihood ratio test that a mixture of two binomials is a single binomial. J. Statist. Plann. Inf.

Chow, Y.S. and Teicher, H. (1978). Probability Theory: Independence, Interchangeability, Martingales. Springer-Verlag, New York.

Cramér, H. (1946). Mathematical Methods of Statistics. Princeton Univ. Press, Princeton.
Dacunha-Castelle, D. and Gassiat, É. (1999). Testing in locally conic models, and application to mixture models. Ann. Statist.

Dean, C.B. (1992). Testing for overdispersion in Poisson and binomial regression models. J. Amer. Statist. Assoc.

Ghosh, J.K. and Sen, P.K. (1985). On the asymptotic performance of the log likelihood ratio statistic for the mixture model and related results. In Proc. Berkeley Conf. in Honor of J. Neyman and J. Kiefer (L. LeCam and R.A. Olshen, eds.).

Hartigan, J.A. (1985). A failure of likelihood asymptotics for normal mixtures. In Proc. Berkeley Conf. in Honor of J. Neyman and J. Kiefer (L. LeCam and R.A. Olshen, eds.).

Lemdani, M. and Pons, O. (1999). Likelihood ratio tests in contamination models. Bernoulli.

Leroux, B. (1992). Consistent estimation of a mixing distribution. Ann. Statist.

Lindsay, B.G. (1989). Moment matrices: applications in mixtures. Ann. Statist.

Rubin, H. (1956). Uniform convergence of random functions with applications to statistics. Ann. Math. Statist.

Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New York.

Sun, J. (1993). Tail probabilities of the maxima of Gaussian random fields. Ann. Probab.
Van der Vaart, A.W. (1996). Efficient maximum likelihood estimation in semiparametric mixture models. Ann. Statist.

Wald, A. (1949). Note on the consistency of the maximum likelihood estimate. Ann. Math. Statist.

Wilks, S.S. (1938). The large sample distribution of the likelihood ratio for testing composite hypotheses. Ann. Math. Statist.

Department of Mathematics and Statistics, Bowling Green State University, Bowling Green, OH 43403, USA. hchen@math.bgsu.edu

Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ont. N2L 3G1, Canada. jhchen@uwaterloo.ca
Math 181B Homework 1 Solution 1. Write down the likelihood: L(λ = n λ X i e λ X i! (a One-sided test: H 0 : λ = 1 vs H 1 : λ = 0.1 The likelihood ratio: where LR = L(1 L(0.1 = 1 X i e n 1 = λ n X i e nλ
More informationPCA with random noise. Van Ha Vu. Department of Mathematics Yale University
PCA with random noise Van Ha Vu Department of Mathematics Yale University An important problem that appears in various areas of applied mathematics (in particular statistics, computer science and numerical
More informationEfficiency of Profile/Partial Likelihood in the Cox Model
Efficiency of Profile/Partial Likelihood in the Cox Model Yuichi Hirose School of Mathematics, Statistics and Operations Research, Victoria University of Wellington, New Zealand Summary. This paper shows
More informationLet us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided
Let us first identify some classes of hypotheses. simple versus simple H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided H 0 : θ θ 0 versus H 1 : θ > θ 0. (2) two-sided; null on extremes H 0 : θ θ 1 or
More informationChapter 3: Maximum Likelihood Theory
Chapter 3: Maximum Likelihood Theory Florian Pelgrin HEC September-December, 2010 Florian Pelgrin (HEC) Maximum Likelihood Theory September-December, 2010 1 / 40 1 Introduction Example 2 Maximum likelihood
More informationOptimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X.
Optimization Background: Problem: given a function f(x) defined on X, find x such that f(x ) f(x) for all x X. The value x is called a maximizer of f and is written argmax X f. In general, argmax X f may
More informationTheoretical Statistics. Lecture 1.
1. Organizational issues. 2. Overview. 3. Stochastic convergence. Theoretical Statistics. Lecture 1. eter Bartlett 1 Organizational Issues Lectures: Tue/Thu 11am 12:30pm, 332 Evans. eter Bartlett. bartlett@stat.
More informationEstimation of parametric functions in Downton s bivariate exponential distribution
Estimation of parametric functions in Downton s bivariate exponential distribution George Iliopoulos Department of Mathematics University of the Aegean 83200 Karlovasi, Samos, Greece e-mail: geh@aegean.gr
More information1 General problem. 2 Terminalogy. Estimation. Estimate θ. (Pick a plausible distribution from family. ) Or estimate τ = τ(θ).
Estimation February 3, 206 Debdeep Pati General problem Model: {P θ : θ Θ}. Observe X P θ, θ Θ unknown. Estimate θ. (Pick a plausible distribution from family. ) Or estimate τ = τ(θ). Examples: θ = (µ,
More informationAsymptotics of minimax stochastic programs
Asymptotics of minimax stochastic programs Alexander Shapiro Abstract. We discuss in this paper asymptotics of the sample average approximation (SAA) of the optimal value of a minimax stochastic programming
More informationMcGill University. Faculty of Science. Department of Mathematics and Statistics. Part A Examination. Statistics: Theory Paper
McGill University Faculty of Science Department of Mathematics and Statistics Part A Examination Statistics: Theory Paper Date: 10th May 2015 Instructions Time: 1pm-5pm Answer only two questions from Section
More informationHigh-dimensional asymptotic expansions for the distributions of canonical correlations
Journal of Multivariate Analysis 100 2009) 231 242 Contents lists available at ScienceDirect Journal of Multivariate Analysis journal homepage: www.elsevier.com/locate/jmva High-dimensional asymptotic
More informationBrownian Motion. 1 Definition Brownian Motion Wiener measure... 3
Brownian Motion Contents 1 Definition 2 1.1 Brownian Motion................................. 2 1.2 Wiener measure.................................. 3 2 Construction 4 2.1 Gaussian process.................................
More informationThe International Journal of Biostatistics
The International Journal of Biostatistics Volume 1, Issue 1 2005 Article 3 Score Statistics for Current Status Data: Comparisons with Likelihood Ratio and Wald Statistics Moulinath Banerjee Jon A. Wellner
More informationMaximum Likelihood Estimation
Chapter 7 Maximum Likelihood Estimation 7. Consistency If X is a random variable (or vector) with density or mass function f θ (x) that depends on a parameter θ, then the function f θ (X) viewed as a function
More informationExercises and Answers to Chapter 1
Exercises and Answers to Chapter The continuous type of random variable X has the following density function: a x, if < x < a, f (x), otherwise. Answer the following questions. () Find a. () Obtain mean
More informationNonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix
Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Yingying Dong and Arthur Lewbel California State University Fullerton and Boston College July 2010 Abstract
More informationCHANGE DETECTION IN TIME SERIES
CHANGE DETECTION IN TIME SERIES Edit Gombay TIES - 2008 University of British Columbia, Kelowna June 8-13, 2008 Outline Introduction Results Examples References Introduction sunspot.year 0 50 100 150 1700
More informationConsistency of Quasi-Maximum Likelihood Estimators for the Regime-Switching GARCH Models
Consistency of Quasi-Maximum Likelihood Estimators for the Regime-Switching GARCH Models Yingfu Xie Research Report Centre of Biostochastics Swedish University of Report 2005:3 Agricultural Sciences ISSN
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued
Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and
More information