INTERVAL ESTIMATION FOR THE MEAN OF THE SELECTED POPULATIONS


INTERVAL ESTIMATION FOR THE MEAN OF THE SELECTED POPULATIONS

By CLAUDIO FUENTES

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA 2011

© 2011 Claudio Fuentes

To my parents, who have been there in every step

ACKNOWLEDGMENTS

I would like to gratefully and sincerely thank Dr. George Casella for his guidance, understanding and patience during my graduate studies at the University of Florida. Working with him, as a research assistant and as a student, has been one of the most rewarding experiences of my life. His wealth of knowledge and experience has shaped the way I understand statistics today. I would also like to thank my graduate committee members, Dr. Michael Daniels, Dr. Malay Ghosh and Dr. Gary Peter, for their understanding and support throughout the whole process. Their sharp comments and suggestions have greatly improved the quality of this work. I am deeply grateful to all my teachers and professors, in particular those at the University of Florida and the Pontificia Universidad Católica de Chile. It is not an exaggeration to say that almost everything I know today is the product of their dedication and excellence in teaching. Without any doubt, they taught me more than I could learn. Thank you, Dr. Alvaro Cofré; I would not be here writing these lines if it were not for your constant support and inspiration. Finally, I would like to thank my parents, Jorge Fuentes and Edith Meléndez. It is because of their unconditional love and support that I have been able to reach this far.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER
1 INTRODUCTION
    Two Formulations of the Problem
    Inference on the Selected Mean
2 INTERVAL ESTIMATION FOLLOWING THE SELECTION OF ONE POPULATION
    The Known Variance Case
    The Unknown Variance Case
    Numerical Studies
    Tables and Figures
3 CONFIDENCE INTERVALS FOLLOWING THE SELECTION OF k ≥ 1 POPULATIONS
    An Alternative Approach
    Numerical Studies
    Tables and Figures
4 INTERVAL ESTIMATION FOLLOWING THE SELECTION OF A RANDOM NUMBER OF POPULATIONS
    Connection to FDR
    Tables and Figures
APPLICATION EXAMPLE
    Fixed Selection
    Random Selection
    Tables and Figures
CONCLUSIONS

LIST OF REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

2-1 Configuration of the new parameterization for the coverage probability
2-2 Configuration of the new parameterization for the case p = 3
2-3 Representation of the parameters Δ_ij when p = k + 1
2-4 Coverage probability of 95% CI for the selected mean when p = 4
3-1 Structure of the Δ's for the case p = 4, k = 2
Coverage probabilities for the number of population means vs the number of selected populations
Observed confidence coefficient for 95% CI
Cutoff points for 95% CI using the new method
Confidence intervals for fixed top log-score differences
Confidence intervals for random top log-score differences

LIST OF FIGURES

2-1 Coverage probability as a function of Δ_12 and Δ_23 when p = 3
2-2 Plot of ∂h/∂Δ_12 when p = 3
2-3 Plots of the first two terms of ∂h/∂Δ_12
2-4 Confidence coefficient vs the number of populations for the iid case and α = 0.05
2-5 Cutoff point versus number of populations for the iid case and α = 0.05
Coverage probabilities as a function of Δ when p = 4
Individual components of the coverage probability for random K
Lower bound for random K varying the selection probability
Coverage probabilities for random K for different values of p

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

INTERVAL ESTIMATION FOR THE MEAN OF THE SELECTED POPULATIONS

By Claudio Fuentes

August 2011

Chair: Dr. George Casella
Major: Statistics

Consider an experiment in which p independent populations π_1, ..., π_p, with corresponding unknown means θ_1, ..., θ_p, are available, and suppose that for every 1 ≤ i ≤ p we can obtain a sample X_i1, ..., X_in from π_i. In this context, researchers are sometimes interested in selecting the populations that give the largest sample means as a result of the experiment, and in estimating the corresponding population means θ_i. In this dissertation, we present a frequentist approach to the problem, based on the minimization of the coverage probability, and discuss how to construct confidence intervals for the means of k ≥ 1 selected populations, assuming the populations π_i are normal with a common variance σ². Finally, we extend the results to the case where the value of k is chosen at random, and discuss the potential connection of the procedure with false discovery rate analysis. We include numerical studies and a real application example that corroborate that this new approach produces confidence intervals that maintain the nominal coverage probability while taking the selection procedure into account.

CHAPTER 1
INTRODUCTION

Given a set of p available technologies (treatments, machines, etc.), researchers must often determine which one is the best, or simply rank them according to a certain pre-specified criterion. For instance, researchers may be interested in determining which treatment is more efficient in fighting a certain disease, or they could be interested in ranking a class of vehicles according to a safety standard. This type of problem is known as a ranking and selection problem, and specific solutions and procedures have been proposed in the literature since the second half of the 20th century, with a start that is usually traced back to Bechhofer (1954) and Gupta and Sobel (1957). In his paper, Bechhofer presents a single-sample multiple-decision procedure for ranking means of normal populations. Assuming the variances of the populations are known, he is able to obtain closed form expressions for the probabilities of a correct ranking in different scenarios. This approach is more concerned with selection of the population with the largest mean than with estimation of that mean. Gupta and co-authors have pioneered the subset selection approach, in which a subset of populations is selected with a guaranteed minimum probability P of containing the population with the largest mean (see Gupta and Panchapakesan (2002)), while Bechhofer uses an indifference zone: there is a minimum guaranteed probability of selecting the population with the largest mean, as long as that mean is separated from the second largest by a specified distance δ (see Bechhofer et al. (1995)).

1.1 Two Formulations of the Problem

Here we are concerned with estimation, and describe two formulations of this problem, with subtle differences between them. Suppose that we have p populations π_1, ..., π_p, with unknown means θ_i (1 ≤ i ≤ p). Assuming that for every 1 ≤ i ≤ p we can obtain a sample X_i1, ..., X_in_i from the population π_i, we can either:

1. Select the population that has the largest parameter, max{θ_1, ..., θ_p}, and estimate its value.

2. Select the population with the largest sample mean, and estimate the corresponding θ_i.

The first of these problems has been widely discussed in the literature. For example, Blumenthal and Cohen (1968) consider estimating the larger mean from two normal populations and compare different estimators, but they do not discuss how to make the selection. In this direction, Guttman and Tiao (1964) propose a Bayesian procedure consisting of the maximization of the expected posterior utility for a certain utility function U(θ_i). In the same direction, but from a frequentist perspective, Saxena and Tong (1969), Saxena (1976), and Chen and Dudewicz (1976) consider point and interval estimation of the largest mean.

1.2 Inference on the Selected Mean

Surprisingly, the second problem has received less attention. In this context, a common and widely used estimator is

δ(X) = Σ_{i=1}^p X̄_i I(X̄_i = X̄_(p)),

where X̄_(p) denotes the largest sample mean. This estimator has been discussed in the literature and is known to be biased (Putter and Rubinstein (1968)). The issue becomes clear if we consider all the populations to be identically distributed, for then we are estimating the common population mean by an extreme value. Dahiya (1974) addresses this problem for the case of two normal populations and proposes estimators that perform better in terms of the MSE. Progress was made by Cohen and Sackrowitz (1982), Cohen and Sackrowitz (1986) and Gupta and Miescke (1990), where Bayes and generalized Bayes rules were obtained and studied. However, performance theorems are scarce. One exception is Hwang (1993), who proposes an empirical Bayes estimator and shows that it performs better in terms of the Bayes risk with respect to any normal prior. Another exception is Sackrowitz and Samuel-Cahn (1984) who, in the case of the negative exponential distribution, find UMVUE and minimax estimators of the mean of the selected population.

The problem of improving the intuitive estimator is technically difficult. In addition, despite the obvious bias problem, it has been difficult to establish its optimality

properties. Standard investigations of admissibility and minimaxity, following ideas such as Berger (1976), Brown (1979) and Lele (1993), are not straightforward. In this direction, Stein (1964) established the minimaxity and admissibility of the naive estimator for k = 2. Minimaxity for the general case was established later by Sackrowitz and Samuel-Cahn (1986), where they discuss the normal case for k ≥ 3. Admissibility for the general case appears to be still open.

Similarly, interval estimation is equally challenging and, again, little can be found in the literature. Typically, confidence intervals are constructed in the usual way, using the standard normal distribution as a reference to attain the desired coverage probability. However, these intervals do not maintain the nominal coverage probability as the number of populations increases. Qiu and Hwang (2007) propose an empirical Bayes approach to construct simultaneous confidence intervals for K selected means, but we are not aware of any other attempts to solve this problem. In their paper, Qiu and Hwang consider a normal-normal model for the mean of the selected population, which assumes that each population mean θ_i follows a normal distribution. Under these assumptions, they are able to construct simultaneous confidence intervals that maintain the nominal coverage probability and are substantially shorter than the intervals constructed using Bonferroni bounds. However, the confidence intervals they propose are asymptotically optimal, and since their coverage probabilities are obtained averaging over both the sample space and the prior, they do not give a valid frequentist interval.

Recently, a modern variation of this problem has become very popular, a major reason being the explosion of genomic data, which calls for the development of new methodologies. For instance, in genomic studies, looking either for differential expression or for genome-wide association, thousands of genes are screened, but only a smaller number are selected for further study. Consequently, the assessment of

significance, through testing or interval estimation, must take this selection mechanism into account. If the usual confidence intervals are used (not accounting for selection), the actual confidence coefficient is smaller than the nominal level and approaches zero as the number of genes (populations) increases.

In this dissertation, we address the problem of interval estimation and present a frequentist approach to construct confidence intervals for the means of the selected populations, where the selection mechanisms are properly described in the corresponding chapters. In Chapter 2 we focus on the problem of selecting one population. In Chapter 3 we introduce a novel methodology to produce confidence intervals when selecting k > 1 populations, where k is a fixed and known number. Later, in Chapter 4, we extend the results to the case k > 1 when k is a random quantity. Finally, in Chapter 5 we discuss the main conclusions and possible extensions of the results presented in this dissertation.
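The selection effect described above is easy to reproduce numerically. The following small Monte Carlo sketch (an illustration only, using NumPy; the number of populations, sample size and replication count are arbitrary choices, not taken from this dissertation) shows both the upward bias of the naive estimator and the undercoverage of the usual z-interval when all population means are equal.

```python
# Minimal Monte Carlo sketch of the selection effect (illustrative settings only).
import numpy as np

rng = np.random.default_rng(0)
p, n, alpha, n_rep = 20, 5, 0.05, 20000   # p populations, n observations each
theta = np.zeros(p)                       # all true means equal to 0
sigma = 1.0
z = 1.959963984540054                     # standard normal alpha/2 = 0.025 quantile

hits, sel_means = 0, []
for _ in range(n_rep):
    xbar = rng.normal(theta, sigma / np.sqrt(n), size=p)  # the p sample means
    i = np.argmax(xbar)                                   # selected population
    sel_means.append(xbar[i])
    half = z * sigma / np.sqrt(n)
    hits += (xbar[i] - half <= theta[i] <= xbar[i] + half)

print("average of the selected sample mean:", np.mean(sel_means))  # well above 0
print("coverage of the naive 95% interval:", hits / n_rep)         # well below 0.95
```

With twenty populations of equal means, the selected sample mean is centered well above the true value of zero, and the naive 95% interval covers the selected mean far less often than 95% of the time.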

CHAPTER 2
INTERVAL ESTIMATION FOLLOWING THE SELECTION OF ONE POPULATION

For 1 ≤ i ≤ p, let X_i1, ..., X_in be a random sample from a population π_i with unknown mean θ_i and variance σ². Assume the populations π_i are independent and normally distributed, so that the sample mean X̄_i = n⁻¹ Σ_{j=1}^n X_ij ~ N(θ_i, σ²/n) for i = 1, ..., p, and define the order statistics X̄_(1), ..., X̄_(p) as the sample means placed in descending order. In other words, the order statistics satisfy X̄_(1) ≥ ... ≥ X̄_(p).

In this context, we want to construct confidence intervals for the mean of the population that gives the largest sample mean as a result of the experiment. Formally, if we define θ_(1) = Σ_{i=1}^p θ_i I(X̄_i = X̄_(1)), our aim is to produce confidence intervals for θ_(1), based on X̄_(1), such that the confidence coefficient is at least 1 − α, for any 0 < α < 1 specified prior to the experiment.

It is not difficult to realize that the standard confidence intervals do not maintain the nominal coverage probability. For instance, if all the populations π_i are normally distributed with mean θ and variance 1, then, for samples of size n = 1, X_1, ..., X_p are iid N(θ, 1). It follows that P(X_(1) ≤ x) = Φ^p(x − θ), where Φ(·) denotes the cdf of the standard normal distribution. Moreover, the mean of the selected population is θ_(1) = θ, and hence

P(θ_(1) ∈ X_(1) ± c) = Φ^p(c) − Φ^p(−c)

for any value of c > 0. In particular, when p = 3, we obtain

P(θ_(1) ∈ X_(1) ± c) = Φ³(c) − Φ³(−c)
= (Φ(c) − Φ(−c))(Φ²(c) + Φ(c)Φ(−c) + Φ²(−c))
= (2Φ(c) − 1)(1 − Φ(c) + Φ²(c)).
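Before continuing, a quick numerical check of this identity and of the rate at which the iid-case coverage Φ^p(c) − Φ^p(−c) decays with p; this sketch is only an illustration (it assumes SciPy is available and uses the usual 95% cutoff).

```python
# Check the p = 3 factorization and the decay of the iid-case coverage with p.
from scipy.stats import norm

c = norm.ppf(0.975)                       # the usual 95% cutoff, about 1.96
Phi = norm.cdf
for p in (1, 2, 3, 5, 10, 30):
    cover = Phi(c) ** p - Phi(-c) ** p    # P(theta_(1) in X_(1) +/- c), iid case
    print(p, round(cover, 4))

# p = 3 factorization: (2*Phi(c) - 1) * (1 - Phi(c) + Phi(c)**2)
print(round((2 * Phi(c) - 1) * (1 - Phi(c) + Phi(c) ** 2), 6),
      round(Phi(c) ** 3 - Phi(-c) ** 3, 6))
```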

Since 1 − Φ(c) + Φ²(c) < 1, the coverage of the standard confidence interval is smaller than the nominal level 2Φ(c) − 1. In fact, it is easy to show that the coverage probability maintains the nominal level only for p = 1 and 2, and decreases thereafter, approaching zero as p goes to infinity. The problem is that the traditional intervals do not take into account the selection mechanism. Thus, in order to construct confidence intervals that maintain the nominal level, we must take the selection procedure into account. To this end, we first consider the partition of the sample space induced by the order statistics and write

P(θ_(1) ∈ X̄_(1) ± c) = Σ_{i=1}^p P(θ_i ∈ X̄_i ± c, X̄_i = X̄_(1)).   (2-1)

Observe that each term in the sum (2-1) can be explicitly determined using the joint distribution of (X̄_1, ..., X̄_p). For example, when i = 1 (the first term of the sum), we have

P(θ_1 ∈ X̄_1 ± c, X̄_1 = X̄_(1)) = P(θ_1 ∈ X̄_1 ± c, X̄_1 ≥ X̄_2, ..., X̄_1 ≥ X̄_p).   (2-2)

In the next section we derive a closed form expression for the coverage probability in (2-1), assuming the population variance σ² is known, and present a new approach to obtain the desired confidence intervals.

2.1 The Known Variance Case

Suppose the population variance σ² is known and define Z_j = √n(X̄_j − θ_j)/σ for j = 1, ..., p. It follows that Z_1, ..., Z_p are iid N(0, 1) and

X̄_1 ≥ X̄_j ⟺ √n(X̄_1 − θ_1)/σ ≥ √n(X̄_j − θ_j + θ_j − θ_1)/σ ⟺ Z_1 ≥ Z_j + Δ_j1 ⟺ Z_1 − Z_j ≥ Δ_j1,

where Δ_j1 = √n(θ_j − θ_1)/σ for j = 1, ..., p. At this point, to simplify the notation, we take n = σ² = 1. Then, if we consider the transformation

T:  z = z_1,  ω_2 = z_1 − z_2,  ...,  ω_p = z_1 − z_p,

we can rewrite (2-2) in terms of Δ_21, ..., Δ_p1 and obtain

P(θ_1 ∈ X̄_1 ± c, X̄_1 ≥ X̄_2, ..., X̄_1 ≥ X̄_p) = P(|z| ≤ c, ω_2 ≥ Δ_21, ..., ω_p ≥ Δ_p1)
= (1/(2π)^{p/2}) ∫_{−c}^{c} { ∏_{j=2}^{p} ∫_{Δ_j1}^{∞} e^{−(ω_j − z)²/2} dω_j } e^{−z²/2} dz.

Notice that, for fixed z, the integrals within the curly brackets { } are essentially the tail probability of a normal distribution centered at z. Therefore, we can write

P(|z| ≤ c, ω_2 ≥ Δ_21, ..., ω_p ≥ Δ_p1) = ∫_{−c}^{c} { ∏_{j=2}^{p} Φ(z − Δ_j1) } φ(z) dz,

where φ(·) denotes the pdf of the standard normal distribution. Of course, the same argument is valid for the remaining terms of the sum in (2-1). It follows that we can fully describe the probability P(θ_(1) ∈ X̄_(1) ± c) in terms of a new set of parameters Δ_ij, where Δ_ij = θ_i − θ_j for 1 ≤ i, j ≤ p. Under this representation, for every c > 0, the value of the coverage probability P(θ_(1) ∈ X̄_(1) ± c) is determined by the relative distances between the population means θ_i, i = 1, ..., p. In other words, the coverage probability defines a function h_c(Δ) = P(θ_(1) ∈ X̄_(1) ± c), where Δ = (Δ_11, Δ_12, ..., Δ_pp) is the vector of possible configurations of the relative distances Δ_ij.

In this context, we can obtain confidence intervals for θ_(1) that have (at least) the right nominal level by first minimizing the function h_c. Specifically, given 0 < α < 1, we can determine the value of c > 0 that satisfies

P(θ_(1) ∈ X̄_(1) ± c) ≥ min_Δ h_c(Δ) = 1 − α.   (2-3)

In order to minimize the function h_c, we first notice the following properties of the parameters Δ_ij:

1. Δ_jj = 0, for every j.
2. Δ_ij = −Δ_ji, for every i, j.
3. For j > k, Δ_jk = Δ_j,j−1 + Δ_j−1,j−2 + ... + Δ_k+1,k.

These properties reveal a certain underlying symmetry in the structure of the problem. This symmetry is portrayed in Table 2-1, where every entry Δ_ij corresponds to the difference between the values of θ_i and θ_j located in row i and column j, respectively. In addition, Property 3 indicates that we only need to consider p − 1 parameters in order to determine the value of P(θ_(1) ∈ X̄_(1) ± c). In fact, for any given ordering of the parameters θ_i, we can always choose a representation of the probability in (2-1) based on p − 1 parameters Δ_ij. As a result, the true ordering of the population means θ_i is not particularly relevant in this approach, and hence we will assume (without any loss of generality) that θ_1 ≥ θ_2 ≥ ... ≥ θ_p.

Although the introduction of the new parameterization seems to reduce (in a sense) the complexity of the problem, the minimization of h_c is still difficult: first, because of the delicate balance existing between the Δ_ij in the full expression (see Table 2-1) and, second, because the formula for the coverage probability is somewhat involved. To illustrate these problems, let us discuss the case p = 2. We have

P(θ_(1) ∈ X̄_(1) ± c) = ∫_{−c}^{c} Φ(z + Δ_12)φ(z)dz + ∫_{−c}^{c} Φ(z − Δ_12)φ(z)dz
= ∫_{−c}^{c} [Φ(z − Δ_12) + Φ(z + Δ_12)]φ(z)dz,

where Δ_12 ≥ 0. Since only the quantity in brackets [ ] depends on Δ_12 and φ(z) > 0, it seems reasonable to think that h_c(Δ_12) = P(θ_(1) ∈ X̄_(1) ± c) is minimized at the same point where g_z(Δ_12) = Φ(z − Δ_12) + Φ(z + Δ_12) finds its minimum. However, differentiating g_z

with respect to Δ_12, we obtain

dg_z/dΔ_12 = φ(z + Δ_12) − φ(z − Δ_12),  which is ≥ 0 for z ≤ 0 and < 0 for z > 0,

so the sign of the derivative depends on Δ_12 and z, and consequently the minimum of h_c cannot be determined by simple examination of the behavior of g_z. From the analysis of g_z, we conclude that g_z(Δ_12) is minimized at Δ_12 = 0 when z ≤ 0, and (asymptotically) at Δ_12 = +∞ when z > 0. Then, we can establish the inequality

P(θ_(1) ∈ X̄_(1) ± c) ≥ ∫_{−c}^{0} 2Φ(z)φ(z)dz + ∫_{0}^{c} φ(z)dz;

however, this lower bound is not obtained by direct minimization of the coverage probability and is less appealing. The problem is that a strategy based on this type of lower bound may be too conservative and lead to extremely wide intervals when applied in higher dimensions (p > 2).

In order to find a formal solution to the minimization problem, we start with the case p = 3. For this case, we can fully describe the probability of interest in terms of the two parameters Δ_12 and Δ_23, as shown in Table 2-2. We obtain

P(θ_(1) ∈ X̄_(1) ± c) = (1/√(2π)) ∫_{−c}^{c} Φ(z + Δ_12)Φ(z + Δ_13) e^{−z²/2} dz
+ (1/√(2π)) ∫_{−c}^{c} Φ(z − Δ_12)Φ(z + Δ_23) e^{−z²/2} dz   (2-4)
+ (1/√(2π)) ∫_{−c}^{c} Φ(z − Δ_13)Φ(z − Δ_23) e^{−z²/2} dz,

where Δ_12, Δ_23 ≥ 0, Δ_13 = Δ_12 + Δ_23, and Φ(·) denotes the cdf of the standard normal distribution. Preliminary studies suggest that the global minimum of h_c(Δ_12, Δ_23) = P(θ_(1) ∈ X̄_(1) ± c) is located at the origin (see Figure 2-1), but a formal proof is required. To this end, it is sufficient to show that ∂h_c/∂Δ_23 > 0 and ∂h_c/∂Δ_12 > 0.
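The surface h_c(Δ_12, Δ_23) in (2-4) is easy to evaluate by one-dimensional quadrature, which is one way to reproduce the behavior plotted in Figure 2-1. The following sketch is an illustration only (it assumes SciPy; the grid and the cutoff c = 1.96 are arbitrary choices).

```python
# Evaluate the p = 3 coverage h_c(D12, D23) of (2-4) on a small grid.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def coverage_p3(c, d12, d23):
    d13 = d12 + d23
    f = lambda z: (norm.cdf(z + d12) * norm.cdf(z + d13)
                   + norm.cdf(z - d12) * norm.cdf(z + d23)
                   + norm.cdf(z - d13) * norm.cdf(z - d23)) * norm.pdf(z)
    return quad(f, -c, c)[0]

c = 1.96
for d12 in (0.0, 0.5, 1.0, 2.0):
    for d23 in (0.0, 0.5, 1.0, 2.0):
        print(d12, d23, round(coverage_p3(c, d12, d23), 4))
```

On such a grid the smallest value is obtained at Δ_12 = Δ_23 = 0, consistent with the claim proved below.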

Taking partial derivatives with respect to Δ_12, we obtain

∂h_c/∂Δ_12 = (1/2π) ∫_{−c}^{c} Φ(z + Δ_13) e^{−(Δ_12 + z)²/2 − z²/2} dz
− (1/2π) ∫_{−c}^{c} Φ(z + Δ_23) e^{−(Δ_12 − z)²/2 − z²/2} dz   (2-5)
+ (1/2π) ∫_{−c}^{c} Φ(z + Δ_12) e^{−(Δ_13 + z)²/2 − z²/2} dz
− (1/2π) ∫_{−c}^{c} Φ(z − Δ_23) e^{−(Δ_13 − z)²/2 − z²/2} dz.

Since the partial derivative depends on both Δ_12 and Δ_23, the behavior of its sign is not obvious, but different numerical studies support the idea that the derivative is non-negative. Figure 2-2 shows the plot of the integrand of ∂h_c/∂Δ_12 for fixed values of Δ_12 and Δ_23. Notice that if we group the first two terms and the last two terms of (2-5), we can look at the partial derivative as the sum of two differences. In Figure 2-3 we observe (in separate plots) the integrands of the first two terms of the partial derivative ∂h_c/∂Δ_12, for fixed values of Δ_12 and Δ_23. The plots suggest that the integrands differ only by a location shift. In fact, changing variables, we can rewrite the expression in (2-5) as

∂h_c/∂Δ_12 = D_1 + D_2,   (2-6)

where

D_1 = (1/2π) { ∫_{Δ_13−c}^{Δ_13+c} − ∫_{−c}^{c} } Φ(z − Δ_23) e^{−(Δ_13 − z)²/2 − z²/2} dz,
D_2 = (1/2π) { ∫_{Δ_12−c}^{Δ_12+c} − ∫_{−c}^{c} } Φ(z + Δ_23) e^{−(Δ_12 − z)²/2 − z²/2} dz.

Recall that Δ_12 ≥ 0. Then, looking at D_2, we have two possibilities for the intervals of integration:

1. −c < Δ_12 − c < c < Δ_12 + c.
2. −c < c < Δ_12 − c < Δ_12 + c.

In other words, the intervals may overlap or not. Denote by R_1 and R_2 the non-common regions of integration, that is,

R_1 = (−c, Δ_12 − c) and R_2 = (c, Δ_12 + c) for case (1);
R_1 = (−c, c) and R_2 = (Δ_12 − c, Δ_12 + c) for case (2).

We have that D_2 is guaranteed to be positive as long as the integral over R_2 is greater than the integral over R_1, regardless of the case. We first notice that R_1 and R_2 are intervals of the same length; in fact, l(R_1) = l(R_2) = Δ_12 for case (1), and l(R_1) = l(R_2) = 2c for case (2). Then, we only need to show that for any two points z_1 ∈ R_1 and z_2 ∈ R_2, located at the same distance ε > 0 from the corresponding extremes of their intervals, the integrand evaluated at z_2 is greater than the integrand evaluated at z_1. Observe that for any z_1 < z_2,

[Φ(z_2 + Δ_23) e^{−(Δ_12 − z_2)²/2 − z_2²/2}] / [Φ(z_1 + Δ_23) e^{−(Δ_12 − z_1)²/2 − z_1²/2}]
= q exp{(z_2 − z_1)[Δ_12 − (z_2 + z_1)]},   (2-7)

where q = Φ(z_2 + Δ_23)/Φ(z_1 + Δ_23) > 1. Then, for any 0 < ε < min{Δ_12, 2c}, take z_1 = Δ_12 − c − ε and z_2 = c + ε whenever min{Δ_12, 2c} = Δ_12 (i.e., case 1), and z_1 = c − ε and z_2 = Δ_12 − c + ε whenever min{Δ_12, 2c} = 2c (i.e., case 2). Replacing these values in (2-7), we obtain that the ratio is greater than 1 (regardless of the case), which is enough to conclude that D_2 > 0. Notice that the argument still holds if we replace the cdf Φ(·) by any non-decreasing function, or if we change the interval (−c, c) to (−c_1, c_2), where c_1, c_2 > 0. This way, we obtain the following more general result.

Proposition 2.1. Let Δ_1, c_1, c_2 > 0 and let the function f(z, λ) be non-decreasing in z, where λ is an arbitrary set of parameters. Then,

{ ∫_{Δ_1−c_1}^{Δ_1+c_2} − ∫_{−c_1}^{c_2} } f(z, λ) e^{−(Δ_1 − z)²/2 − z²/2} dz ≥ 0,

where the inequality is strict whenever the function f is monotonically increasing in z.

An immediate consequence of Proposition 2.1 is that D_1 > 0. As a result, we obtain that ∂h_c/∂Δ_12 > 0. A similar argument shows that ∂h_c/∂Δ_23 > 0, completing the proof. It follows that the coverage probability P(θ_(1) ∈ X̄_(1) ± c) is minimized at Δ_12 = Δ_23 = 0, that is, whenever θ_1 = θ_2 = θ_3.

Observe that Proposition 2.1 gives a straightforward proof for the case p = 2. In effect, for h_c(Δ_12) = P(θ_(1) ∈ X̄_(1) ± c), we have

dh_c/dΔ_12 = ∫_{Δ_12−c}^{Δ_12+c} φ(z − Δ_12)φ(z)dz − ∫_{−c}^{c} φ(z − Δ_12)φ(z)dz.

Then, applying Proposition 2.1 with f ≡ 1/(2π), we obtain that h_c'(Δ_12) ≥ 0. It immediately follows that the coverage probability is minimized at Δ_12 = 0, or equivalently, when θ_1 = θ_2.

For the general case (p > 3), we observe that when moving from the case p = k to the case p = k + 1, we only need to include the extra parameter Δ_k,k+1 in order to describe the problem (see Table 2-3). Then, using Proposition 2.1 and mathematical induction, we obtain the following result.

Lemma 1. Let c_1, c_2 > 0 and, for p ≥ 2, let X_1, ..., X_p be independent random variables with X_i ~ N(θ_i, 1). Then,

min_{θ_1,...,θ_p} P(θ_(1) ∈ (X_(1) − c_1, X_(1) + c_2)) = p ∫_{−c_2}^{c_1} Φ^{p−1}(z)φ(z)dz = Φ^p(c_1) − Φ^p(−c_2),

where Φ(·) and φ(·) are, respectively, the cdf and pdf of the standard normal distribution.

Using this lemma, we can easily obtain the following theorem, which summarizes the main results of this section. The proof is straightforward.

Theorem 2.1. Let 0 < α < 1 and, for i = 1, ..., p, suppose that X_i1, ..., X_in is a random sample from a N(θ_i, σ²) distribution, where θ_i is unknown but σ² is known. Then, a confidence interval for θ_(1) = Σ_{i=1}^p θ_i I(X̄_i = X̄_(1)) with a confidence coefficient of (at least) 1 − α is

given by X̄_(1) ± (σ/√n) c, where the value of c satisfies Φ^p(c) − Φ^p(−c) = 1 − α.

2.2 The Unknown Variance Case

If the variance σ² is unknown, we need to estimate its value. We assume that we have an independent estimate s² of σ², such that s/σ has pdf ϕ. In a regular experiment, where we observe a sample of size n from each population, s² can be taken as the pooled variance estimate, and then νs²/σ² ~ χ²_ν, a chi-square distribution with ν = p(n − 1) degrees of freedom.

Suppose first that p = 3 and, for simplicity, take n = 1. Then, the coverage probability can be written as

P(θ_(1) ∈ X_(1) ± sc) = P(|Z_1| ≤ cs/σ, Z_1 − Z_2 ≥ Δ_21, Z_1 − Z_3 ≥ Δ_31)
+ P(Z_2 − Z_1 ≥ Δ_12, |Z_2| ≤ cs/σ, Z_2 − Z_3 ≥ Δ_32)   (2-8)
+ P(Z_3 − Z_1 ≥ Δ_13, Z_3 − Z_2 ≥ Δ_23, |Z_3| ≤ cs/σ),

where Z_i = (X_i − θ_i)/σ and Δ_ij = (θ_i − θ_j)/σ for 1 ≤ i, j ≤ 3. Notice that, taking t = s/σ, we can rewrite each term in the sum (2-8) as a mixture. We obtain

P(θ_(1) ∈ X_(1) ± sc) = ∫_0^∞ P(|Z_1| ≤ ct, Z_1 − Z_2 ≥ Δ_21, Z_1 − Z_3 ≥ Δ_31 | t) ϕ(t)dt
+ ∫_0^∞ P(Z_2 − Z_1 ≥ Δ_12, |Z_2| ≤ ct, Z_2 − Z_3 ≥ Δ_32 | t) ϕ(t)dt
+ ∫_0^∞ P(Z_3 − Z_1 ≥ Δ_13, Z_3 − Z_2 ≥ Δ_23, |Z_3| ≤ ct | t) ϕ(t)dt,

where ϕ(·) denotes the pdf of t. It follows that

P(θ_(1) ∈ X_(1) ± sc) = ∫_0^∞ P(θ_(1) ∈ X_(1) ± tc | t) ϕ(t)dt,

where we know (from Section 2.1) that the probability P(θ_(1) ∈ X_(1) ± tc | t) inside the integral is minimized at θ_1 = θ_2 = θ_3. The generalization of this result follows from a direct application of Lemma 1.

Lemma 2. Let c_1, c_2 > 0 and, for p ≥ 2, let X_1, ..., X_p be independent random variables with X_i ~ N(θ_i, σ²), where both θ_i and σ² are unknown. If s² is an estimate of σ² independent of X_1, ..., X_p, then

min_{θ_1,...,θ_p} P(θ_(1) ∈ (X_(1) − sc_1, X_(1) + sc_2)) = ∫_0^∞ (Φ^p(c_1 t) − Φ^p(−c_2 t)) ϕ(t)dt,

where ϕ(·) is the pdf of s/σ and Φ(·) is the cdf of the standard normal distribution.

We end this section with the following theorem. The proof follows directly from Lemma 2.

Theorem 2.2. Let 0 < α < 1 and, for i = 1, ..., p, suppose that X_i1, ..., X_in is a random sample from a N(θ_i, σ²) distribution, where θ_i and σ² are unknown. Then, a confidence interval for θ_(1) = Σ_{i=1}^p θ_i I(X̄_i = X̄_(1)) with a confidence coefficient of (at least) 1 − α is given by X̄_(1) ± (s/√n) c, where s² = p⁻¹ Σ_{i=1}^p s_i², with s_i² = (n − 1)⁻¹ Σ_{j=1}^n (X_ij − X̄_i)² for i = 1, ..., p, and c satisfies

∫_0^∞ (Φ^p(ct) − Φ^p(−ct)) ϕ(t)dt = 1 − α,

with ϕ(·) the pdf of s/σ.

2.3 Numerical Studies

In this chapter, we have proposed a method to construct confidence intervals for the mean of the selected population that takes into account the selection procedure. In this section we present some numerical results that compare the performance of the new and the traditional intervals.
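For reference, the cutoffs prescribed by Theorems 2.1 and 2.2 can be computed with a few lines of numerical root finding. The sketch below is an illustration only (it assumes SciPy; the choices of p, n and α are arbitrary). It solves Φ^p(c) − Φ^p(−c) = 1 − α for the known-variance case and the corresponding integral equation for the unknown-variance case, where t = s/σ follows a chi distribution scaled by 1/√ν with ν = p(n − 1) degrees of freedom.

```python
# Compute the cutoffs of Theorems 2.1 and 2.2 numerically (illustrative sketch).
import numpy as np
from scipy.stats import norm, chi
from scipy.integrate import quad
from scipy.optimize import brentq

def cutoff_known_sigma(p, alpha):
    # Solve Phi(c)^p - Phi(-c)^p = 1 - alpha  (Theorem 2.1).
    g = lambda c: norm.cdf(c) ** p - norm.cdf(-c) ** p - (1 - alpha)
    return brentq(g, 0.0, 20.0)

def cutoff_unknown_sigma(p, n, alpha):
    # Solve  int_0^inf [Phi(ct)^p - Phi(-ct)^p] phi(t) dt = 1 - alpha,
    # where t = s/sigma and nu*s^2/sigma^2 ~ chi^2_nu with nu = p(n-1),
    # so that t follows a chi(nu) distribution scaled by 1/sqrt(nu).
    nu = p * (n - 1)
    t_dist = chi(df=nu, scale=1.0 / np.sqrt(nu))
    def g(c):
        integrand = lambda t: (norm.cdf(c * t) ** p - norm.cdf(-c * t) ** p) * t_dist.pdf(t)
        return quad(integrand, 0.0, np.inf)[0] - (1 - alpha)
    return brentq(g, 0.0, 50.0)

print(cutoff_known_sigma(p=10, alpha=0.05))        # noticeably larger than 1.96
print(cutoff_unknown_sigma(p=10, n=5, alpha=0.05)) # larger still, reflecting the estimated variance
```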

First, we study the behavior of the confidence coefficient as a function of the number of populations. Results show that the confidence coefficient of the traditional intervals decreases rapidly as the number of populations increases. This effect is particularly extreme when all the populations have the same mean. Figure 2-4 shows the result of simulations considering up to 30 populations with the same mean and setting α = 0.05. The solid blue line represents the confidence coefficient obtained using our proposed confidence intervals and the dashed red line depicts the behavior of the confidence coefficient obtained using the standard confidence intervals. Observe that the solid line is constant at the nominal level 95%.

Intuitively, in order to keep the coverage probability constant, the confidence intervals need to get wider. However, this increment is not dramatic and slows down as the number of populations increases: from the inequality in Theorem 2.1 it can be determined that the cutoff value c grows only on the order of √(log p).

An indirect way to obtain confidence intervals for θ_(1) that attain (at least) the nominal level would be to construct simultaneous confidence intervals for the means of all the populations considered in the experiment using, for instance, Bonferroni intervals. The natural question is whether such a procedure produces better intervals in terms of length. The answer is no: the size of the Bonferroni intervals increases at a faster rate compared to the intervals we propose. Figure 2-5 shows the behavior of the cutoff point c as the number of populations increases for the case α = 0.05. The solid line corresponds to the value of the standard cutoff point for a 95% confidence interval (z_{α/2} = 1.96), the dashed/dotted line represents the value of c for the new confidence intervals, and the dashed line corresponds to the cutoff values for the Bonferroni intervals.

In an applied situation, the population means θ_i (1 ≤ i ≤ p) will rarely be identical. Hence, we need to compare the performance of the confidence intervals when the population means are different. Table 2-4 summarizes some results obtained by

simulations for the case p = 4. The first column shows the true values of the population means (all of them with variance equal to 1), while the second and third columns show the observed coverage probability for the traditional and new intervals at a confidence level of 95%. The reported values correspond to the average of the coverage probabilities over ten replications, and the numbers in parentheses are the corresponding standard errors. We observe that our proposed intervals outperform the traditional ones, even when the population means are far apart. It is interesting to notice that even in situations where one of the populations should be somewhat distinguishable (see row four in Table 2-4), the traditional intervals may perform poorly.

2.4 Tables and Figures

Table 2-1. Configuration of the new parameterization for the probability P(θ_(1) ∈ X̄_(1) ± c). In the table, Δ_ij = θ_i − θ_j.

        θ_1      θ_2      ...   θ_p
θ_1     0        Δ_12     ...   Δ_1p
θ_2     −Δ_12    0        ...   Δ_2p
...     ...      ...      ...   ...
θ_p     −Δ_1p    −Δ_2p    ...   0

Table 2-2. Configuration of the new parameterization for the case p = 3, when Δ_12 and Δ_23 are the free parameters. In the table, Δ_ij = θ_i − θ_j.

        θ_1               θ_2      θ_3
θ_1     0                 Δ_12     Δ_12 + Δ_23
θ_2     −Δ_12             0        Δ_23
θ_3     −(Δ_12 + Δ_23)    −Δ_23    0

Table 2-3. Representation of the parameters Δ_ij for the case p = k + 1, where Δ_i,k+1 = Δ_ik + Δ_k,k+1.

        θ_1                 θ_2                 ...   θ_k         θ_k+1
θ_1     0                   Δ_12                ...   Δ_1k        Δ_1k + Δ_k,k+1
θ_2     −Δ_12               0                   ...   Δ_2k        Δ_2k + Δ_k,k+1
...     ...                 ...                 ...   ...         ...
θ_k     −Δ_1k               −Δ_2k               ...   0           Δ_k,k+1
θ_k+1   −(Δ_1k + Δ_k,k+1)   −(Δ_2k + Δ_k,k+1)   ...   −Δ_k,k+1    0

Table 2-4. Observed coverage probability of 95% CI for the mean of the selected population out of four populations, using the traditional and the new method. The reported values correspond to the average over ten replications and the number in parentheses is the corresponding standard error. (The coverage point estimates were not recoverable from this copy; only the standard errors are shown.)

(θ_1, θ_2, θ_3, θ_4)    Trad CI      New CI
(0, 0, 0, 0)            (0.0016)     (0.0012)
(0, 0.25, 0.5, 1)       (0.0020)     (0.0011)
(0, 5, 10, 15)          (0.0014)     (0.0009)
(0, 0, 0, 2)            (0.0042)     (0.0027)
(0, 0, 0, 5)            (0.0031)     (0.0028)

Figure 2-1. Coverage probability as a function of Δ_12 and Δ_23 when p = 3.
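A comparison of the kind reported in Table 2-4 can be sketched with a short Monte Carlo simulation. The snippet below is illustrative only (the replication count and seed are arbitrary, and one observation per population with σ = 1 is assumed).

```python
# Monte Carlo comparison of traditional vs. new intervals for p = 4 (sketch).
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def new_cutoff(p, alpha=0.05):
    return brentq(lambda c: norm.cdf(c) ** p - norm.cdf(-c) ** p - (1 - alpha), 0, 20)

def coverage(theta, c, n_rep=100000, seed=1):
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    x = rng.normal(theta, 1.0, size=(n_rep, theta.size))   # one observation per population
    i = np.argmax(x, axis=1)                               # selected population
    sel_x = x[np.arange(n_rep), i]
    sel_theta = theta[i]
    return np.mean(np.abs(sel_x - sel_theta) <= c)

p, alpha = 4, 0.05
c_trad, c_new = norm.ppf(1 - alpha / 2), new_cutoff(p, alpha)
for theta in [(0, 0, 0, 0), (0, 0.25, 0.5, 1), (0, 5, 10, 15), (0, 0, 0, 2), (0, 0, 0, 5)]:
    print(theta, round(coverage(theta, c_trad), 3), round(coverage(theta, c_new), 3))
```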

Figure 2-2. Plot of ∂h/∂Δ_12 for predetermined values of Δ_12 and Δ_23.

Figure 2-3. Plots of the first two terms of ∂h/∂Δ_12 for predetermined values of Δ_12 and Δ_23.

Figure 2-4. Confidence coefficient versus number of populations for the case of identical population means and α = 0.05. The solid blue line corresponds to the confidence coefficient for the new confidence intervals, and the dashed red line corresponds to the confidence coefficient for the traditional confidence intervals.

Figure 2-5. Cutoff point versus number of populations for the case of identical population means and α = 0.05. The solid line corresponds to the cutoff value for the traditional confidence interval, z_{α/2} = 1.96. The dashed/dotted line corresponds to the cutoff value for the new intervals, and the dashed line corresponds to the cutoff values for the Bonferroni intervals.
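The cutoff comparison of Figure 2-5 can likewise be sketched numerically; the following is an illustration only (assuming SciPy, with an arbitrary set of p values).

```python
# Nominal, new, and Bonferroni cutoffs as the number of populations grows (sketch).
from scipy.stats import norm
from scipy.optimize import brentq

alpha = 0.05
for p in (2, 5, 10, 20, 50, 100):
    c_new = brentq(lambda c: norm.cdf(c) ** p - norm.cdf(-c) ** p - (1 - alpha), 0, 20)
    c_bonf = norm.ppf(1 - alpha / (2 * p))   # Bonferroni cutoff for p simultaneous intervals
    print(p, round(norm.ppf(1 - alpha / 2), 3), round(c_new, 3), round(c_bonf, 3))
```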

CHAPTER 3
CONFIDENCE INTERVALS FOLLOWING THE SELECTION OF k ≥ 1 POPULATIONS

Using the same framework as in Chapter 2, we assume that for j = 1, ..., p we have independent random variables X_j ~ N(θ_j, σ²/n). Also, we define the order statistics X_(1), ..., X_(p) according to the inequalities X_(1) ≥ ... ≥ X_(p) and, for simplicity, we start by considering σ² = n = 1. Then, we observe that the mean of the population from which the j-th biggest observation, X_(j), is sampled can be written as

θ_(j) = Σ_{i=1}^p θ_i I(X_i = X_(j)).

In this context, we want to find the value of c > 0 such that

P(θ_(1) ∈ X_(1) ± c, ..., θ_(k) ∈ X_(k) ± c) ≥ 1 − α   (3-1)

for any 0 < α < 1 and 1 ≤ k ≤ p.

Following the same approach we used in Chapter 2, we can write the probability in (3-1) as

Σ_{j_1 ≠ ... ≠ j_k} P(θ_(1) ∈ X_(1) ± c, ..., θ_(k) ∈ X_(k) ± c, X_(1) = X_{j_1}, ..., X_(k) = X_{j_k}),

where the sum has p!/(p − k)! terms, one for each ordered selection of k populations. Let us consider first the case p = 4 and k = 2. Then, the probability of interest is

P(θ_(1) ∈ X_(1) ± c, θ_(2) ∈ X_(2) ± c) = Σ_{i ≠ j} P(θ_i ∈ X_i ± c, θ_j ∈ X_j ± c, X_(1) = X_i, X_(2) = X_j),   (3-2)

where 1 ≤ i, j ≤ 4. In order to obtain closed form expressions for each term in the sum, observe that for X_(1) = X_1 and X_(2) = X_2 we have

(X_(1) = X_1, X_(2) = X_2) = (X_1 ≥ X_2, X_2 ≥ X_3, X_2 ≥ X_4).

In other words, the relative order between X_3 and X_4 is irrelevant.

It follows that we only need to pay attention to the possible configurations of the random variables that are at the top. In this case the possible configurations are

(X_1 ≥ X_2, X_2 ≥ X_3, X_2 ≥ X_4)    (X_3 ≥ X_1, X_1 ≥ X_2, X_1 ≥ X_4)
(X_1 ≥ X_3, X_3 ≥ X_2, X_3 ≥ X_4)    (X_3 ≥ X_2, X_2 ≥ X_1, X_2 ≥ X_4)
(X_1 ≥ X_4, X_4 ≥ X_2, X_4 ≥ X_3)    (X_3 ≥ X_4, X_4 ≥ X_1, X_4 ≥ X_2)
(X_2 ≥ X_1, X_1 ≥ X_3, X_1 ≥ X_4)    (X_4 ≥ X_1, X_1 ≥ X_2, X_1 ≥ X_3)
(X_2 ≥ X_3, X_3 ≥ X_1, X_3 ≥ X_4)    (X_4 ≥ X_2, X_2 ≥ X_1, X_2 ≥ X_3)
(X_2 ≥ X_4, X_4 ≥ X_1, X_4 ≥ X_3)    (X_4 ≥ X_3, X_3 ≥ X_1, X_3 ≥ X_2)

If we define Z_j = X_j − θ_j (1 ≤ j ≤ 4) and Δ_ij = θ_i − θ_j (1 ≤ i, j ≤ 4), we observe that

X_1 ≥ X_2 ⟺ Z_1 − Z_2 ≥ Δ_21,
X_2 ≥ X_3 ⟺ Z_2 − Z_3 ≥ Δ_32,
X_2 ≥ X_4 ⟺ Z_2 − Z_4 ≥ Δ_42,

where Z_1, ..., Z_4 are iid N(0, 1). Then, the first term of the sum in (3-2) can be written as

P(θ_1 ∈ X_1 ± c, θ_2 ∈ X_2 ± c, X_1 ≥ X_2, X_2 ≥ X_3, X_2 ≥ X_4)
= P(|Z_1| ≤ c, |Z_2| ≤ c, Z_1 − Z_2 ≥ Δ_21, Z_2 − Z_3 ≥ Δ_32, Z_2 − Z_4 ≥ Δ_42)

and, making use of the normality assumptions, we can write explicitly

P(θ_1 ∈ X_1 ± c, θ_2 ∈ X_2 ± c, X_1 ≥ X_2, X_2 ≥ X_3, X_2 ≥ X_4)
= ∫_{−c}^{c} ∫_{−c}^{min(c, z_1 − Δ_21)} Φ(z_2 − Δ_32)Φ(z_2 − Δ_42) φ(z_1)φ(z_2) dz_2 dz_1,

while for the configuration with X_2 the largest and X_1 the second largest,

P(θ_1 ∈ X_1 ± c, θ_2 ∈ X_2 ± c, X_2 ≥ X_1, X_1 ≥ X_3, X_1 ≥ X_4)
= ∫_{−c}^{c} ∫_{−c}^{min(c, z_2 − Δ_12)} Φ(z_1 − Δ_31)Φ(z_1 − Δ_41) φ(z_1)φ(z_2) dz_1 dz_2.

Of course, the same argument is valid for the other terms in the sum. This way, considering all 12 possible configurations for the order of the random variables X_1,

X_2, X_3 and X_4, we can write the sum in (3-2) in closed form:

P(θ_(1) ∈ X_(1) ± c, θ_(2) ∈ X_(2) ± c)
= Σ_{i ≠ j} ∫_{−c}^{c} ∫_{−c}^{min(c, z_i − Δ_ji)} { ∏_{m ∉ {i,j}} Φ(z_j − Δ_mj) } φ(z_i)φ(z_j) dz_j dz_i,

where the sum runs over the 12 ordered pairs (i, j) with 1 ≤ i ≠ j ≤ 4 (X_i the largest and X_j the second largest); the term for (i, j) = (1, 2) is the double integral displayed above, and the remaining eleven terms are obtained by permuting the indices accordingly.

In order to minimize this expression, we need to address two equally challenging difficulties. First, the construction of any lower bound needs to take into account the delicate balance between the Δ_ij in the expression. Second, special attention needs to be paid to the limits of integration: the corners of the form min(c, z − Δ_ij) make nearly impossible any procedure based on differentiation.

To overcome the difficulty due to the corners, we notice that the events (Z_2 ≤ Z_1 − Δ_21, Z_3 ≤ Z_2 − Δ_32, Z_4 ≤ Z_2 − Δ_42) and (Z_2 ≥ Z_1 − Δ_21, Z_3 ≤ Z_1 − Δ_31, Z_4 ≤ Z_1 − Δ_41) are disjoint. Hence, we can express the sum of the probabilities of these two events as the probability of their union. Consequently, instead of writing down 12 terms for the sum (one term per configuration), we can express the probability of interest using only 6 terms, each of them describing the two random variables positioned at the top. Working out the details, we obtain:

X_1 and X_2 at the top.
P(|Z_1| ≤ c, |Z_2| ≤ c, Z_3 ≤ min{Z_1 − Δ_31, Z_2 − Δ_32}, Z_4 ≤ min{Z_1 − Δ_41, Z_2 − Δ_42})
= ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z_1 − Δ_31, z_2 − Δ_32}) Φ(min{z_1 − Δ_41, z_2 − Δ_42}) φ(z_1)φ(z_2) dz_1 dz_2

X_1 and X_3 at the top.
P(|Z_1| ≤ c, |Z_3| ≤ c, Z_2 ≤ min{Z_1 − Δ_21, Z_3 − Δ_23}, Z_4 ≤ min{Z_1 − Δ_41, Z_3 − Δ_43})
= ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z_1 − Δ_21, z_3 − Δ_23}) Φ(min{z_1 − Δ_41, z_3 − Δ_43}) φ(z_1)φ(z_3) dz_1 dz_3

X_1 and X_4 at the top.
P(|Z_1| ≤ c, |Z_4| ≤ c, Z_2 ≤ min{Z_1 − Δ_21, Z_4 − Δ_24}, Z_3 ≤ min{Z_1 − Δ_31, Z_4 − Δ_34})
= ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z_1 − Δ_21, z_4 − Δ_24}) Φ(min{z_1 − Δ_31, z_4 − Δ_34}) φ(z_1)φ(z_4) dz_1 dz_4

X_2 and X_3 at the top.
P(|Z_2| ≤ c, |Z_3| ≤ c, Z_1 ≤ min{Z_2 − Δ_12, Z_3 − Δ_13}, Z_4 ≤ min{Z_2 − Δ_42, Z_3 − Δ_43})
= ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z_2 − Δ_12, z_3 − Δ_13}) Φ(min{z_2 − Δ_42, z_3 − Δ_43}) φ(z_2)φ(z_3) dz_2 dz_3

X_2 and X_4 at the top.
P(|Z_2| ≤ c, |Z_4| ≤ c, Z_1 ≤ min{Z_2 − Δ_12, Z_4 − Δ_14}, Z_3 ≤ min{Z_2 − Δ_32, Z_4 − Δ_34})
= ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z_2 − Δ_12, z_4 − Δ_14}) Φ(min{z_2 − Δ_32, z_4 − Δ_34}) φ(z_2)φ(z_4) dz_2 dz_4

X_3 and X_4 at the top.
P(|Z_3| ≤ c, |Z_4| ≤ c, Z_1 ≤ min{Z_3 − Δ_13, Z_4 − Δ_14}, Z_2 ≤ min{Z_3 − Δ_23, Z_4 − Δ_24})
= ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z_3 − Δ_13, z_4 − Δ_14}) Φ(min{z_3 − Δ_23, z_4 − Δ_24}) φ(z_3)φ(z_4) dz_3 dz_4

This way, an alternative representation for the probability of interest is

P(θ_(1) ∈ X_(1) ± c, θ_(2) ∈ X_(2) ± c)
= Σ_{1 ≤ i < j ≤ 4} ∫_{−c}^{c} ∫_{−c}^{c} { ∏_{m ∉ {i,j}} Φ(min{z_i − Δ_mi, z_j − Δ_mj}) } φ(z_i)φ(z_j) dz_i dz_j,   (3-3)

the sum of the six double integrals just displayed. Observe that this new representation does not completely solve the problem of the corners, but rather removes them from the limits of integration and puts them inside the integrand. Now we find expressions of the form min{z − Δ_ij} in the argument of the normal cdfs Φ(·), which still makes any minimization approach based on differentiation difficult. However, this new representation reveals more clearly the symmetry in the structure of the Δ's, as portrayed in Table 3-1. This pattern is particularly important, since it suggests how to generalize the expression for arbitrary values of p and k.

In order to determine the configuration of Δ's that minimizes the expression in (3-3), we assume (without loss of generality) that θ_1 ≥ θ_2 ≥ θ_3 ≥ θ_4, so that Δ_ij ≥ 0 for any i ≤ j. Also, we consider Δ_12, Δ_23 and Δ_34 as the free parameters. Based on our previous results, it is reasonable to believe that the minimum of (3-3) is reached at the origin. In order to prove this claim, we have studied the behavior of the

coverage probability (CP) for different configurations of the Δ_ij, with special attention to the behavior at the boundary. Among others, we considered the following cases.

Δ_12 = Δ_23 = Δ_34 = 0:

CP = 6 ∫_{−c}^{c} ∫_{−c}^{c} Φ²(min{z_1, z_2}) φ(z_1)φ(z_2) dz_1 dz_2.

Δ_12 > 0, Δ_23 = Δ_34 = 0:

CP = 3 ∫_{−c}^{c} ∫_{−c}^{c} Φ²(min{z_1 + Δ_12, z_2}) φ(z_1)φ(z_2) dz_1 dz_2
+ 3 ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z_2 − Δ_12, z_3 − Δ_12}) Φ(min{z_2, z_3}) φ(z_2)φ(z_3) dz_2 dz_3,

which, as Δ_12 → +∞, converges to 3 ∫_{−c}^{c} ∫_{−c}^{c} Φ²(z_2) φ(z_1)φ(z_2) dz_1 dz_2.

Δ_12, Δ_23 > 0 and Δ_34 = 0:

CP = ∫_{−c}^{c} ∫_{−c}^{c} Φ²(min{z_1 + Δ_13, z_2 + Δ_23}) φ(z_1)φ(z_2) dz_1 dz_2
+ 2 ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z_1 + Δ_12, z_3 − Δ_23}) Φ(min{z_1 + Δ_13, z_3}) φ(z_1)φ(z_3) dz_1 dz_3
+ 2 ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z_2 − Δ_12, z_3 − Δ_13}) Φ(min{z_2 + Δ_23, z_3}) φ(z_2)φ(z_3) dz_2 dz_3
+ ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z_3 − Δ_13, z_4 − Δ_13}) Φ(min{z_3 − Δ_23, z_4 − Δ_23}) φ(z_3)φ(z_4) dz_3 dz_4,

which, as Δ_12, Δ_23 → +∞, converges to ∫_{−c}^{c} ∫_{−c}^{c} φ(z_1)φ(z_2) dz_1 dz_2.

However, none of the cases we considered provided conclusive (analytical) evidence that the minimum is at the origin. On the contrary, various numerical studies have suggested that the minimum is not located at the origin (see Figure 3-1), but the current formulation of the problem makes it difficult even to establish that it is not located in the interior of the region determined by Δ_12, Δ_23 and Δ_34. These difficulties call for a different approach, which we discuss in the following section.

3.1 An Alternative Approach

So far, we have approached the problem considering partitions of the coverage probability based on the possible configurations of the vector (X_(1), X_(2), ..., X_(k)). Notice that such an approach, by construction, takes into account the relative orderings between the variables that are selected (the top k). Instead, we can consider an alternative that does not take explicit consideration of the ordering among the variables that have been selected.

Notice that there are C(p, k) = p!/(k!(p − k)!) different ways to select k out of p populations without considering the order. Suppose that j indexes one such arrangement, and denote by X_j1, ..., X_jk the top k variables and by X_jk+1, ..., X_jp the bottom p − k. Then, we can separate the sample space according to the events min{X_j1, ..., X_jk} ≥ max{X_jk+1, ..., X_jp}, for j = 1, ..., C(p, k). This way, the coverage probability can be written as

P(θ_(1) ∈ X_(1) ± c, ..., θ_(k) ∈ X_(k) ± c)
= Σ_{j=1}^{C(p,k)} P(θ_j1 ∈ X_j1 ± c, ..., θ_jk ∈ X_jk ± c, min{X_j1, ..., X_jk} ≥ max{X_jk+1, ..., X_jp}).

Let us consider first the term where (X_1, X_2, ..., X_k) are at the top. For this case, the corresponding piece of the coverage probability is

P(θ_1 ∈ X_1 ± c, ..., θ_k ∈ X_k ± c, min{X_1, ..., X_k} ≥ max{X_k+1, ..., X_p})
= ∫_{θ_1−c}^{θ_1+c} ... ∫_{θ_k−c}^{θ_k+c} { ∏_{j=k+1}^{p} P_{θ_j}(X_j ≤ min{x_1, ..., x_k}) } f(x_1, ..., x_k) dx_1 ... dx_k,

where f(x_1, ..., x_k) is the joint density of (X_1, ..., X_k). Hence, making use of the normality assumptions, we have

P(θ_1 ∈ X_1 ± c, ..., θ_k ∈ X_k ± c, min{X_1, ..., X_k} ≥ max{X_k+1, ..., X_p})
= ∫_{−c}^{c} ... ∫_{−c}^{c} { ∏_{j=k+1}^{p} Φ(min{z_1 + θ_1, ..., z_k + θ_k} − θ_j) } ∏_{i=1}^{k} φ(z_i) dz_i,

where z_i = x_i − θ_i for i = 1, ..., k.

From here, it is not difficult to obtain the following expression for the coverage probability:

P(θ_(1) ∈ X_(1) ± c, ..., θ_(k) ∈ X_(k) ± c)
= Σ_{j=1}^{C(p,k)} ∫_{−c}^{c} ... ∫_{−c}^{c} { ∏_{m ∈ I_j^c} Φ( min_{l ∈ I_j} {z_l + θ_l} − θ_m ) } ∏_{l ∈ I_j} φ(z_l) dz_l,   (3-4)

where I_j = {j_1, ..., j_k} is the set of indices of the top k variables in the j-th arrangement and I_j^c = {j_k+1, ..., j_p} is the set of indices of the bottom p − k variables in the j-th arrangement. Notice that if k = 1 we are back in the case discussed in Chapter 2, and the case k = p corresponds to simultaneous confidence intervals.

Let us take a closer look at this formula and consider first the case p = 6 and k = 3. In this case, the sum in (3-4) has C(6, 3) = 20 terms, determined by the configurations below, where the numbers to the left of the vertical bar are the indices of the set I_j (the populations being selected) and the numbers to the right are the indices of the set I_j^c (the populations not being selected):

123|456   124|356   125|346   126|345   134|256
135|246   136|245   145|236   146|235   156|234
234|156   235|146   236|145   245|136   246|135
256|134   345|126   346|125   356|124   456|123

Observe that all the indices appear on the left side (and on the right side) the same number of times (10), revealing some symmetry in the problem.
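Because the multiple integrals in (3-4) quickly become unwieldy, a Monte Carlo estimate of the simultaneous coverage probability is often the most convenient numerical check. The sketch below is an illustration only (the settings p = 6, k = 3, the cutoff and the mean configurations are arbitrary choices); it estimates the probability that all k selected populations are simultaneously covered.

```python
# Monte Carlo estimate of the simultaneous coverage in (3-1)/(3-4) (sketch).
import numpy as np

def sim_coverage(theta, k, c, n_rep=100000, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    p = theta.size
    x = rng.normal(theta, 1.0, size=(n_rep, p))         # one observation per population
    top = np.argsort(x, axis=1)[:, -k:]                 # indices of the k largest observations
    sel_x = np.take_along_axis(x, top, axis=1)
    sel_theta = theta[top]
    # all k selected populations covered by +/- c intervals around their observations
    return np.mean(np.all(np.abs(sel_x - sel_theta) <= c, axis=1))

print(sim_coverage(theta=np.zeros(6), k=3, c=2.6))            # all means equal
print(sim_coverage(theta=[0, 0, 0, 0, 3, 6], k=3, c=2.6))     # some well-separated means
```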

Using this symmetry, suppose that θ_1 ≤ θ_2 ≤ θ_3 ≤ θ_4 ≤ θ_5 ≤ θ_6 and let θ_6 → +∞. Then, for the 10 groups for which index 6 is on the right side, the corresponding term goes to zero. For the remaining groups (for which 6 appears on the left), the value of Φ(min_{l ∈ I_j}{z_l + θ_l} − θ_m) is not affected by θ_6, and the coverage probability is determined by the configurations

12|345   13|245   14|235   15|234   23|145
24|135   25|134   34|125   35|124   45|123

which correspond to the possible ways of choosing 2 out of the 5 remaining populations. Repeating the argument, but now letting θ_5 → +∞, we obtain the configurations

1|234   2|134   3|124   4|123

which are the possible ways to choose 1 out of 4 populations. For this case, we know (from Chapter 2) that the minimum is reached at θ_1 = θ_2 = θ_3 = θ_4. This example suggests that the coverage probability is minimized when the biggest p − k − 1 population means are sent to +∞ and the remaining k + 1 are set to be equal. However, a formal argument is required.

For the general case (1 ≤ k < p), the number of possible configurations is

C(p, k) = C(p − 1, k) + C(p − 1, k − 1),

where C(p − 1, k) is the number of times that any given index j appears on the right side (population j is not selected) and C(p − 1, k − 1) is the number of configurations that have index j on the left side (population j is selected).

Suppose (without any loss of generality) that θ_1 ≤ ... ≤ θ_p and define

I_j(θ_p) = I( min_{l ∈ I_j \ {p}} {z_l + θ_l} ≥ z_p + θ_p ),
I_j^c(θ_p) = I( min_{l ∈ I_j \ {p}} {z_l + θ_l} < z_p + θ_p ),

where I(·) is the indicator function. From the definition, it immediately follows that

min_{l ∈ I_j} {z_l + θ_l} = (z_p + θ_p) I_j(θ_p) + min_{l ∈ I_j \ {p}} {z_l + θ_l} I_j^c(θ_p),   (3-5)

and therefore the coverage probability can be written as

P(θ_(1) ∈ X_(1) ± c, ..., θ_(k) ∈ X_(k) ± c)
= Σ_{j=1}^{C(p,k)} ∫_{−c}^{c} ... ∫_{−c}^{c} { ∏_{m ∈ I_j^c} Φ((z_p + θ_p) − θ_m) } I_j(θ_p) ∏_{l ∈ I_j} φ(z_l) dz_l
+ Σ_{j=1}^{C(p,k)} ∫_{−c}^{c} ... ∫_{−c}^{c} { ∏_{m ∈ I_j^c} Φ( min_{l ∈ I_j \ {p}} {z_l + θ_l} − θ_m ) } I_j^c(θ_p) ∏_{l ∈ I_j} φ(z_l) dz_l.

Now, observe that as θ_p → ∞,

min_{l ∈ I_j} {z_l + θ_l} = (z_p + θ_p) I_j(θ_p) + min_{l ∈ I_j \ {p}} {z_l + θ_l} I_j^c(θ_p) → min_{l ∈ I_j \ {p}} {z_l + θ_l},

and hence

∏_{m ∈ I_j^c} Φ( min_{l ∈ I_j} {z_l + θ_l} − θ_m ) → ∏_{m ∈ I_j^c} Φ( min_{l ∈ I_j \ {p}} {z_l + θ_l} − θ_m )

for all the terms for which θ_p is on the left side. At the same time, for the terms where θ_p is on the right side, we have

∏_{m ∈ I_j^c} Φ( min_{l ∈ I_j} {z_l + θ_l} − θ_m ) → 0,

and therefore, as θ_p → ∞, the coverage probability converges to

Σ_{j=1}^{C(p−1,k−1)} ∫_{−c}^{c} ... ∫_{−c}^{c} { ∏_{m ∈ I_j^c} Φ( min_{l ∈ I_j \ {p}} {z_l + θ_l} − θ_m ) } ∏_{l ∈ I_j} φ(z_l) dz_l.

Before we move forward, let us consider the example p = 3, k = 2. Then, the coverage probability is

P(θ_(1) ∈ X_(1) ± c, θ_(2) ∈ X_(2) ± c)
= Σ_{j=1}^{C(3,2)} ∫_{−c}^{c} ∫_{−c}^{c} { ∏_{m ∈ I_j^c} Φ( min_{l ∈ I_j} {z_l + θ_l} − θ_m ) } ∏_{l ∈ I_j} φ(z_l) dz_l
= ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z_1 + θ_1, z_2 + θ_2} − θ_3) φ(z_1)φ(z_2) dz_1 dz_2
+ ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z_1 + θ_1, z_3 + θ_3} − θ_2) φ(z_1)φ(z_3) dz_1 dz_3   (3-6)
+ ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z_2 + θ_2, z_3 + θ_3} − θ_1) φ(z_2)φ(z_3) dz_2 dz_3,

and, as θ_3 → ∞, we obtain

M = ∫_{−c}^{c} ∫_{−c}^{c} Φ(z_1 + θ_1 − θ_2) φ(z_1)φ(z_3) dz_1 dz_3   (3-7)
+ ∫_{−c}^{c} ∫_{−c}^{c} Φ(z_2 + θ_2 − θ_1) φ(z_2)φ(z_3) dz_2 dz_3.

Suppose now that, for a fixed θ_3, min{z_1 + θ_1, z_3 + θ_3} = z_3 + θ_3. Since we are assuming that θ_1 ≤ θ_2 ≤ θ_3, this can only happen for certain values of z_1 and z_3. Let R_1 = {(z_1, z_3) : min{z_1 + θ_1, z_3 + θ_3} = z_1 + θ_1} and R_2 = {(z_1, z_3) : min{z_1 + θ_1, z_3 + θ_3} = z_3 + θ_3}. Then, the second integral in (3-6) can be written as

∫∫_{R_1} Φ(z_1 + θ_1 − θ_2) φ(z_1)φ(z_3) dz_1 dz_3 + ∫∫_{R_2} Φ(z_3 + θ_3 − θ_2) φ(z_1)φ(z_3) dz_1 dz_3.

Similarly, the first integral in (3-7) can be written as

∫∫_{R_1} Φ(z_1 + θ_1 − θ_2) φ(z_1)φ(z_3) dz_1 dz_3 + ∫∫_{R_2} Φ(z_1 + θ_1 − θ_2) φ(z_1)φ(z_3) dz_1 dz_3,

and, since θ_3 − θ_2 ≥ θ_1 − θ_2, we obtain the comparison

∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z_1 + θ_1, z_3 + θ_3} − θ_2) φ(z_1)φ(z_3) dz_1 dz_3  versus  ∫_{−c}^{c} ∫_{−c}^{c} Φ(z_1 + θ_1 − θ_2) φ(z_1)φ(z_3) dz_1 dz_3,

where the two expressions differ only on R_2. Using a similar argument with the third integral in the coverage probability, we conclude that P(θ_(1) ∈ X_(1) ± c, θ_(2) ∈ X_(2) ± c) ≥ M.

For the general case, suppose that θ_p (fixed) is such that I_j(θ_p) = 1 for some j, that is, min_{l ∈ I_j}{z_l + θ_l} = z_p + θ_p. Under the assumption θ_1 ≤ ... ≤ θ_p, we have θ_p − θ_m ≥ θ_l − θ_m for any 1 ≤ m, l ≤ p, and therefore I_j(θ_p) can be equal to 1 only in a certain region of the hyper-cube (−c, c)^k. Then, partitioning the integrals accordingly, we obtain

P(θ_(1) ∈ X_(1) ± c, ..., θ_(k) ∈ X_(k) ± c)
= Σ_{j=1}^{C(p,k)} ∫_{−c}^{c} ... ∫_{−c}^{c} { ∏_{m ∈ I_j^c} Φ( min_{l ∈ I_j}{z_l + θ_l} − θ_m ) } ∏_{l ∈ I_j} φ(z_l) dz_l
≥ Σ_{j=1}^{C(p−1,k−1)} ∫_{−c}^{c} ... ∫_{−c}^{c} { ∏_{m ∈ I_j^c} Φ( min_{l ∈ I_j \ {p}}{z_l + θ_l} − θ_m ) } ∏_{l ∈ I_j} φ(z_l) dz_l,   (3-8)

where the equality is attained asymptotically as θ_p approaches infinity. Integrating (3-8) with respect to z_p, we obtain

(Φ(c) − Φ(−c)) Σ_{j=1}^{C(p−1,k−1)} ∫_{−c}^{c} ... ∫_{−c}^{c} [ ∏_{m ∈ I_j^c} Φ( min_{l ∈ I_j \ {p}}{z_l + θ_l} − θ_m ) ] ∏_{l ∈ I_j \ {p}} φ(z_l) dz_l,

where the quantity in brackets [ ] is exactly the coverage probability term for selecting k − 1 out of p − 1 populations. Repeating the argument, but now letting θ_{p−1} → ∞, we obtain the lower bound

(Φ(c) − Φ(−c))² Σ_{j=1}^{C(p−2,k−2)} ∫_{−c}^{c} ... ∫_{−c}^{c} [ ∏_{m ∈ I_j^c} Φ( min_{l ∈ I_j \ {p, p−1}}{z_l + θ_l} − θ_m ) ] ∏_{l ∈ I_j \ {p, p−1}} φ(z_l) dz_l.


More information

Lecture 4: September Reminder: convergence of sequences

Lecture 4: September Reminder: convergence of sequences 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 4: September 6 In this lecture we discuss the convergence of random variables. At a high-level, our first few lectures focused

More information

A union of Bayesian, frequentist and fiducial inferences by confidence distribution and artificial data sampling

A union of Bayesian, frequentist and fiducial inferences by confidence distribution and artificial data sampling A union of Bayesian, frequentist and fiducial inferences by confidence distribution and artificial data sampling Min-ge Xie Department of Statistics, Rutgers University Workshop on Higher-Order Asymptotics

More information

Applied Multivariate and Longitudinal Data Analysis

Applied Multivariate and Longitudinal Data Analysis Applied Multivariate and Longitudinal Data Analysis Chapter 2: Inference about the mean vector(s) Ana-Maria Staicu SAS Hall 5220; 919-515-0644; astaicu@ncsu.edu 1 In this chapter we will discuss inference

More information

On Bayesian bandit algorithms

On Bayesian bandit algorithms On Bayesian bandit algorithms Emilie Kaufmann joint work with Olivier Cappé, Aurélien Garivier, Nathaniel Korda and Rémi Munos July 1st, 2012 Emilie Kaufmann (Telecom ParisTech) On Bayesian bandit algorithms

More information

Introduction to Probability

Introduction to Probability LECTURE NOTES Course 6.041-6.431 M.I.T. FALL 2000 Introduction to Probability Dimitri P. Bertsekas and John N. Tsitsiklis Professors of Electrical Engineering and Computer Science Massachusetts Institute

More information

Lecture 21: October 19

Lecture 21: October 19 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 21: October 19 21.1 Likelihood Ratio Test (LRT) To test composite versus composite hypotheses the general method is to use

More information

Review and continuation from last week Properties of MLEs

Review and continuation from last week Properties of MLEs Review and continuation from last week Properties of MLEs As we have mentioned, MLEs have a nice intuitive property, and as we have seen, they have a certain equivariance property. We will see later that

More information

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided Let us first identify some classes of hypotheses. simple versus simple H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided H 0 : θ θ 0 versus H 1 : θ > θ 0. (2) two-sided; null on extremes H 0 : θ θ 1 or

More information

Peter Hoff Minimax estimation November 12, Motivation and definition. 2 Least favorable prior 3. 3 Least favorable prior sequence 11

Peter Hoff Minimax estimation November 12, Motivation and definition. 2 Least favorable prior 3. 3 Least favorable prior sequence 11 Contents 1 Motivation and definition 1 2 Least favorable prior 3 3 Least favorable prior sequence 11 4 Nonparametric problems 15 5 Minimax and admissibility 18 6 Superefficiency and sparsity 19 Most of

More information

Lecture 4. f X T, (x t, ) = f X,T (x, t ) f T (t )

Lecture 4. f X T, (x t, ) = f X,T (x, t ) f T (t ) LECURE NOES 21 Lecture 4 7. Sufficient statistics Consider the usual statistical setup: the data is X and the paramter is. o gain information about the parameter we study various functions of the data

More information

Statistical Data Analysis Stat 3: p-values, parameter estimation

Statistical Data Analysis Stat 3: p-values, parameter estimation Statistical Data Analysis Stat 3: p-values, parameter estimation London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway,

More information

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics A short review of the principles of mathematical statistics (or, what you should have learned in EC 151).

More information

Grundlagen der Künstlichen Intelligenz

Grundlagen der Künstlichen Intelligenz Grundlagen der Künstlichen Intelligenz Uncertainty & Probabilities & Bandits Daniel Hennes 16.11.2017 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Uncertainty Probability

More information

Exact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring

Exact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring Exact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring A. Ganguly, S. Mitra, D. Samanta, D. Kundu,2 Abstract Epstein [9] introduced the Type-I hybrid censoring scheme

More information

Analysis of Type-II Progressively Hybrid Censored Data

Analysis of Type-II Progressively Hybrid Censored Data Analysis of Type-II Progressively Hybrid Censored Data Debasis Kundu & Avijit Joarder Abstract The mixture of Type-I and Type-II censoring schemes, called the hybrid censoring scheme is quite common in

More information

Algorithm-Independent Learning Issues

Algorithm-Independent Learning Issues Algorithm-Independent Learning Issues Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2007 c 2007, Selim Aksoy Introduction We have seen many learning

More information

Introduction to Probability and Statistics (Continued)

Introduction to Probability and Statistics (Continued) Introduction to Probability and Statistics (Continued) Prof. icholas Zabaras Center for Informatics and Computational Science https://cics.nd.edu/ University of otre Dame otre Dame, Indiana, USA Email:

More information

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations John R. Michael, Significance, Inc. and William R. Schucany, Southern Methodist University The mixture

More information

1 Lyapunov theory of stability

1 Lyapunov theory of stability M.Kawski, APM 581 Diff Equns Intro to Lyapunov theory. November 15, 29 1 1 Lyapunov theory of stability Introduction. Lyapunov s second (or direct) method provides tools for studying (asymptotic) stability

More information

Spring 2012 Math 541B Exam 1

Spring 2012 Math 541B Exam 1 Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote

More information

CS 540: Machine Learning Lecture 2: Review of Probability & Statistics

CS 540: Machine Learning Lecture 2: Review of Probability & Statistics CS 540: Machine Learning Lecture 2: Review of Probability & Statistics AD January 2008 AD () January 2008 1 / 35 Outline Probability theory (PRML, Section 1.2) Statistics (PRML, Sections 2.1-2.4) AD ()

More information

Uncertain Inference and Artificial Intelligence

Uncertain Inference and Artificial Intelligence March 3, 2011 1 Prepared for a Purdue Machine Learning Seminar Acknowledgement Prof. A. P. Dempster for intensive collaborations on the Dempster-Shafer theory. Jianchun Zhang, Ryan Martin, Duncan Ermini

More information

MULTIPLE TESTING PROCEDURES AND SIMULTANEOUS INTERVAL ESTIMATES WITH THE INTERVAL PROPERTY

MULTIPLE TESTING PROCEDURES AND SIMULTANEOUS INTERVAL ESTIMATES WITH THE INTERVAL PROPERTY MULTIPLE TESTING PROCEDURES AND SIMULTANEOUS INTERVAL ESTIMATES WITH THE INTERVAL PROPERTY BY YINGQIU MA A dissertation submitted to the Graduate School New Brunswick Rutgers, The State University of New

More information

Applying the Benjamini Hochberg procedure to a set of generalized p-values

Applying the Benjamini Hochberg procedure to a set of generalized p-values U.U.D.M. Report 20:22 Applying the Benjamini Hochberg procedure to a set of generalized p-values Fredrik Jonsson Department of Mathematics Uppsala University Applying the Benjamini Hochberg procedure

More information

Unit 5a: Comparisons via Simulation. Kwok Tsui (and Seonghee Kim) School of Industrial and Systems Engineering Georgia Institute of Technology

Unit 5a: Comparisons via Simulation. Kwok Tsui (and Seonghee Kim) School of Industrial and Systems Engineering Georgia Institute of Technology Unit 5a: Comparisons via Simulation Kwok Tsui (and Seonghee Kim) School of Industrial and Systems Engineering Georgia Institute of Technology Motivation Simulations are typically run to compare 2 or more

More information

FDR and ROC: Similarities, Assumptions, and Decisions

FDR and ROC: Similarities, Assumptions, and Decisions EDITORIALS 8 FDR and ROC: Similarities, Assumptions, and Decisions. Why FDR and ROC? It is a privilege to have been asked to introduce this collection of papers appearing in Statistica Sinica. The papers

More information

Econ 2140, spring 2018, Part IIa Statistical Decision Theory

Econ 2140, spring 2018, Part IIa Statistical Decision Theory Econ 2140, spring 2018, Part IIa Maximilian Kasy Department of Economics, Harvard University 1 / 35 Examples of decision problems Decide whether or not the hypothesis of no racial discrimination in job

More information

Module 1. Probability

Module 1. Probability Module 1 Probability 1. Introduction In our daily life we come across many processes whose nature cannot be predicted in advance. Such processes are referred to as random processes. The only way to derive

More information

STRONG FORMS OF ORTHOGONALITY FOR SETS OF HYPERCUBES

STRONG FORMS OF ORTHOGONALITY FOR SETS OF HYPERCUBES The Pennsylvania State University The Graduate School Department of Mathematics STRONG FORMS OF ORTHOGONALITY FOR SETS OF HYPERCUBES A Dissertation in Mathematics by John T. Ethier c 008 John T. Ethier

More information

Chapter Three. Hypothesis Testing

Chapter Three. Hypothesis Testing 3.1 Introduction The final phase of analyzing data is to make a decision concerning a set of choices or options. Should I invest in stocks or bonds? Should a new product be marketed? Are my products being

More information

Chapter 4: Asymptotic Properties of the MLE

Chapter 4: Asymptotic Properties of the MLE Chapter 4: Asymptotic Properties of the MLE Daniel O. Scharfstein 09/19/13 1 / 1 Maximum Likelihood Maximum likelihood is the most powerful tool for estimation. In this part of the course, we will consider

More information

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Ann Inst Stat Math (0) 64:359 37 DOI 0.007/s0463-00-036-3 Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Paul Vos Qiang Wu Received: 3 June 009 / Revised:

More information

Lecture notes on statistical decision theory Econ 2110, fall 2013

Lecture notes on statistical decision theory Econ 2110, fall 2013 Lecture notes on statistical decision theory Econ 2110, fall 2013 Maximilian Kasy March 10, 2014 These lecture notes are roughly based on Robert, C. (2007). The Bayesian choice: from decision-theoretic

More information

A BAYESIAN MATHEMATICAL STATISTICS PRIMER. José M. Bernardo Universitat de València, Spain

A BAYESIAN MATHEMATICAL STATISTICS PRIMER. José M. Bernardo Universitat de València, Spain A BAYESIAN MATHEMATICAL STATISTICS PRIMER José M. Bernardo Universitat de València, Spain jose.m.bernardo@uv.es Bayesian Statistics is typically taught, if at all, after a prior exposure to frequentist

More information

Computational statistics

Computational statistics Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated

More information

Post-Selection Inference

Post-Selection Inference Classical Inference start end start Post-Selection Inference selected end model data inference data selection model data inference Post-Selection Inference Todd Kuffner Washington University in St. Louis

More information

Least squares under convex constraint

Least squares under convex constraint Stanford University Questions Let Z be an n-dimensional standard Gaussian random vector. Let µ be a point in R n and let Y = Z + µ. We are interested in estimating µ from the data vector Y, under the assumption

More information

Supplementary appendix to the paper Hierarchical cheap talk Not for publication

Supplementary appendix to the paper Hierarchical cheap talk Not for publication Supplementary appendix to the paper Hierarchical cheap talk Not for publication Attila Ambrus, Eduardo M. Azevedo, and Yuichiro Kamada December 3, 011 1 Monotonicity of the set of pure-strategy equilibria

More information

Lecture Notes 1: Decisions and Data. In these notes, I describe some basic ideas in decision theory. theory is constructed from

Lecture Notes 1: Decisions and Data. In these notes, I describe some basic ideas in decision theory. theory is constructed from Topics in Data Analysis Steven N. Durlauf University of Wisconsin Lecture Notes : Decisions and Data In these notes, I describe some basic ideas in decision theory. theory is constructed from The Data:

More information

Introduction: MLE, MAP, Bayesian reasoning (28/8/13)

Introduction: MLE, MAP, Bayesian reasoning (28/8/13) STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this

More information

Interval Estimation. Chapter 9

Interval Estimation. Chapter 9 Chapter 9 Interval Estimation 9.1 Introduction Definition 9.1.1 An interval estimate of a real-values parameter θ is any pair of functions, L(x 1,..., x n ) and U(x 1,..., x n ), of a sample that satisfy

More information

Introduction to Machine Learning. Lecture 2

Introduction to Machine Learning. Lecture 2 Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for

More information

Comparing Non-informative Priors for Estimation and. Prediction in Spatial Models

Comparing Non-informative Priors for Estimation and. Prediction in Spatial Models Comparing Non-informative Priors for Estimation and Prediction in Spatial Models Vigre Semester Report by: Regina Wu Advisor: Cari Kaufman January 31, 2010 1 Introduction Gaussian random fields with specified

More information

Comment on The Veil of Public Ignorance

Comment on The Veil of Public Ignorance Comment on The Veil of Public Ignorance Geoffroy de Clippel February 2010 Nehring (2004) proposes an interesting methodology to extend the utilitarian criterion defined under complete information to an

More information

Practice Problems Section Problems

Practice Problems Section Problems Practice Problems Section 4-4-3 4-4 4-5 4-6 4-7 4-8 4-10 Supplemental Problems 4-1 to 4-9 4-13, 14, 15, 17, 19, 0 4-3, 34, 36, 38 4-47, 49, 5, 54, 55 4-59, 60, 63 4-66, 68, 69, 70, 74 4-79, 81, 84 4-85,

More information

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b)

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b) LECTURE 5 NOTES 1. Bayesian point estimators. In the conventional (frequentist) approach to statistical inference, the parameter θ Θ is considered a fixed quantity. In the Bayesian approach, it is considered

More information

Plausible Values for Latent Variables Using Mplus

Plausible Values for Latent Variables Using Mplus Plausible Values for Latent Variables Using Mplus Tihomir Asparouhov and Bengt Muthén August 21, 2010 1 1 Introduction Plausible values are imputed values for latent variables. All latent variables can

More information

Sequential Decisions

Sequential Decisions Sequential Decisions A Basic Theorem of (Bayesian) Expected Utility Theory: If you can postpone a terminal decision in order to observe, cost free, an experiment whose outcome might change your terminal

More information

Naive Bayes classification

Naive Bayes classification Naive Bayes classification Christos Dimitrakakis December 4, 2015 1 Introduction One of the most important methods in machine learning and statistics is that of Bayesian inference. This is the most fundamental

More information

high-dimensional inference robust to the lack of model sparsity

high-dimensional inference robust to the lack of model sparsity high-dimensional inference robust to the lack of model sparsity Jelena Bradic (joint with a PhD student Yinchu Zhu) www.jelenabradic.net Assistant Professor Department of Mathematics University of California,

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and

More information

A proof of the existence of good nested lattices

A proof of the existence of good nested lattices A proof of the existence of good nested lattices Dinesh Krithivasan and S. Sandeep Pradhan July 24, 2007 1 Introduction We show the existence of a sequence of nested lattices (Λ (n) 1, Λ(n) ) with Λ (n)

More information

Marginal Screening and Post-Selection Inference

Marginal Screening and Post-Selection Inference Marginal Screening and Post-Selection Inference Ian McKeague August 13, 2017 Ian McKeague (Columbia University) Marginal Screening August 13, 2017 1 / 29 Outline 1 Background on Marginal Screening 2 2

More information

Statistics 3858 : Maximum Likelihood Estimators

Statistics 3858 : Maximum Likelihood Estimators Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,

More information

11. Learning graphical models

11. Learning graphical models Learning graphical models 11-1 11. Learning graphical models Maximum likelihood Parameter learning Structural learning Learning partially observed graphical models Learning graphical models 11-2 statistical

More information

Model Complexity of Pseudo-independent Models

Model Complexity of Pseudo-independent Models Model Complexity of Pseudo-independent Models Jae-Hyuck Lee and Yang Xiang Department of Computing and Information Science University of Guelph, Guelph, Canada {jaehyuck, yxiang}@cis.uoguelph,ca Abstract

More information

3 Undirected Graphical Models

3 Undirected Graphical Models Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 3 Undirected Graphical Models In this lecture, we discuss undirected

More information

Bayesian Econometrics

Bayesian Econometrics Bayesian Econometrics Christopher A. Sims Princeton University sims@princeton.edu September 20, 2016 Outline I. The difference between Bayesian and non-bayesian inference. II. Confidence sets and confidence

More information

Testing Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata

Testing Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata Maura Department of Economics and Finance Università Tor Vergata Hypothesis Testing Outline It is a mistake to confound strangeness with mystery Sherlock Holmes A Study in Scarlet Outline 1 The Power Function

More information

Central Limit Theorem ( 5.3)

Central Limit Theorem ( 5.3) Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately

More information

1 Hypothesis Testing and Model Selection

1 Hypothesis Testing and Model Selection A Short Course on Bayesian Inference (based on An Introduction to Bayesian Analysis: Theory and Methods by Ghosh, Delampady and Samanta) Module 6: From Chapter 6 of GDS 1 Hypothesis Testing and Model Selection

More information