INTERVAL ESTIMATION FOR THE MEAN OF THE SELECTED POPULATIONS


INTERVAL ESTIMATION FOR THE MEAN OF THE SELECTED POPULATIONS

By CLAUDIO FUENTES

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA 2011

© 2011 Claudio Fuentes

To my parents, who have been there in every step

ACKNOWLEDGMENTS

I would like to gratefully and sincerely thank Dr. George Casella for his guidance, understanding and patience during my graduate studies at the University of Florida. Working with him, as a research assistant and as a student, has been one of the most rewarding experiences of my life. His wealth of knowledge and experience has shaped the way I understand statistics today. I would also like to thank my graduate committee members, Dr. Michael Daniels, Dr. Malay Ghosh and Dr. Gary Peter, for their understanding and support throughout the whole process. Their sharp comments and suggestions have greatly improved the quality of this work. I am deeply grateful to all my teachers and professors, in particular those at the University of Florida and the Pontificia Universidad Católica de Chile. It is not an exaggeration to say that almost everything I know today is the product of their dedication and excellence in teaching. Without any doubt, they taught me more than I could learn. Thank you, Dr. Alvaro Cofré; I would not be here writing these lines if it were not for your constant support and inspiration. Finally, I would like to thank my parents, Jorge Fuentes and Edith Meléndez. It is because of their unconditional love and support that I have been able to reach this far.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER
1 INTRODUCTION
    Two Formulations of the Problem
    Inference on the Selected Mean
2 INTERVAL ESTIMATION FOLLOWING THE SELECTION OF ONE POPULATION
    The Known Variance Case
    The Unknown Variance Case
    Numerical Studies
    Tables and Figures
3 CONFIDENCE INTERVALS FOLLOWING THE SELECTION OF k ≥ 1 POPULATIONS
    An Alternative Approach
    Numerical Studies
    Tables and Figures
4 INTERVAL ESTIMATION FOLLOWING THE SELECTION OF A RANDOM NUMBER OF POPULATIONS
    Connection to FDR
    Tables and Figures
APPLICATION EXAMPLE
    Fixed Selection
    Random Selection
    Tables and Figures
CONCLUSIONS

LIST OF REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

2-1 Configuration of the new parameterization for the coverage probability
2-2 Configuration of the new parameterization for the case p = 3
2-3 Representation of the parameters Δ_ij when p = k + 1
2-4 Coverage probability of 95% CI for the selected mean when p = 4
3-1 Structure of the Δ's for the case p = 4, k = 2
Coverage probabilities for the number of population means vs the number of selected populations
Observed confidence coefficient for 95% CI
Cutoff points for 95% CI using the new method
Confidence intervals for fixed top log-score differences
Confidence intervals for random top log-score differences

LIST OF FIGURES

2-1 Coverage probability as a function of Δ_12 and Δ_23 when p = 3
2-2 Plot of ∂h/∂Δ_12 when p = 3
2-3 Plots of the first two terms of ∂h/∂Δ_12
2-4 Confidence coefficient vs the number of populations for the iid case and α = 0.05
2-5 Cutoff point versus number of populations for the iid case and α = 0.05
Coverage probabilities as a function of Δ when p = 4
Individual components of the coverage probability for random K
Lower bound for random K varying the selection probability
Coverage probabilities for random K for different values of p

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

INTERVAL ESTIMATION FOR THE MEAN OF THE SELECTED POPULATIONS

By Claudio Fuentes

August 2011

Chair: Dr. George Casella
Major: Statistics

Consider an experiment in which p independent populations π_1, ..., π_p, with corresponding unknown means θ_1, ..., θ_p, are available, and suppose that for every 1 ≤ i ≤ p we can obtain a sample X_i1, ..., X_in from π_i. In this context, researchers are sometimes interested in selecting the populations that give the largest sample means as a result of the experiment, and in estimating the corresponding population means θ_i. In this dissertation, we present a frequentist approach to the problem, based on the minimization of the coverage probability, and discuss how to construct confidence intervals for the means of k ≥ 1 selected populations, assuming the populations π_i are normal with a common variance σ². Finally, we extend the results to the case where the value of k is chosen at random, and discuss the potential connection of the procedure with false discovery rate analysis. We include numerical studies and a real application example that corroborate that this new approach produces confidence intervals that maintain the nominal coverage probability while taking the selection procedure into account.

CHAPTER 1
INTRODUCTION

Given a set of p available technologies (treatments, machines, etc.), researchers must often determine which one is the best, or simply rank them according to a certain pre-specified criterion. For instance, researchers may be interested in determining which treatment is more efficient in fighting a certain disease, or they could be interested in ranking a class of vehicles according to a safety standard. This type of problem is known as a ranking and selection problem, and specific solutions and procedures have been proposed in the literature since the second half of the 20th century, with a start that is usually traced back to Bechhofer (1954) and Gupta and Sobel (1957). In his paper, Bechhofer presents a single-sample multiple-decision procedure for ranking means of normal populations. Assuming the variances of the populations are known, he is able to obtain closed form expressions for the probabilities of a correct ranking in different scenarios. This approach is more concerned with selection of the population with the largest mean than with estimation of that mean. Gupta and co-authors have pioneered the subset selection approach, in which a subset of populations is selected with a guaranteed minimum probability P of containing the population with the largest mean (see Gupta and Panchapakesan (2002)), while Bechhofer uses an indifference zone: there is a minimum guaranteed probability of selecting the population with the largest mean, as long as that mean is separated from the second largest by a specified distance δ (see Bechhofer et al. (1995)).

1.1 Two Formulations of the Problem

Here we are concerned with estimation, and describe two formulations of this problem, with subtle differences between them. Suppose that we have p populations π_1, ..., π_p, with unknown means θ_i (1 ≤ i ≤ p). Assuming that for every 1 ≤ i ≤ p we can obtain a sample X_i1, ..., X_in_i from the population π_i, we can either:

1. Select the population that has the largest parameter, max{θ_1, ..., θ_p}, and estimate its value.

2. Select the population with the largest sample mean, and estimate the corresponding θ_i.

The first of these problems has been widely discussed in the literature. For example, Blumenthal and Cohen (1968) consider estimating the larger mean from two normal populations and compare different estimators, but they do not discuss how to make the selection. In this direction, Guttman and Tiao (1964) propose a Bayesian procedure consisting of the maximization of the expected posterior utility for a certain utility function U(θ_i). In the same direction, but from a frequentist perspective, Saxena and Tong (1969), Saxena (1976), and Chen and Dudewicz (1976) consider point and interval estimation of the largest mean.

1.2 Inference on the Selected Mean

Surprisingly, the second problem has received less attention. In this context, a common and widely used estimator is

δ(X) = Σ_{i=1}^p X̄_i I(X̄_i = X̄_(p)),

where X̄_(p) denotes the largest sample mean. This estimator has been discussed in the literature and is known to be biased (Putter and Rubinstein (1968)). The issue becomes clear if we consider all the populations to be identically distributed, for then we are estimating the common population mean by an extreme value. Dahiya (1974) addresses this problem for the case of two normal populations and proposes estimators that perform better in terms of the MSE. Progress was made by Cohen and Sackrowitz (1982), Cohen and Sackrowitz (1986) and Gupta and Miescke (1990), where Bayes and generalized Bayes rules were obtained and studied. However, performance theorems are scarce. One exception is Hwang (1993), who proposes an empirical Bayes estimator and shows that it performs better in terms of the Bayes risk with respect to any normal prior. Another exception is Sackrowitz and Samuel-Cahn (1984) who, in the case of the negative exponential distribution, find UMVUE and minimax estimators of the mean of the selected population.

The problem of improving the intuitive estimator is technically difficult. In addition, despite the obvious bias problem, it has been difficult to establish its optimality

properties. Standard investigations of admissibility and minimaxity, following ideas such as Berger (1976), Brown (1979) and Lele (1993), are not straightforward. In this direction, Stein (1964) established the minimaxity and admissibility of the naive estimator for k = 2. Minimaxity for the general case was established later by Sackrowitz and Samuel-Cahn (1986), where they discuss the normal case for k ≥ 3. Admissibility for the general case appears to be still open.

Similarly, interval estimation is equally challenging and, again, little can be found in the literature. Typically, confidence intervals are constructed in the usual way, using the standard normal distribution as a reference to attain the desired coverage probability. However, these intervals do not maintain the nominal coverage probability as the number of populations increases. Qiu and Hwang (2007) propose an empirical Bayes approach to construct simultaneous confidence intervals for K selected means, but we are not aware of any other attempts to solve this problem. In their paper, Qiu and Hwang consider a normal-normal model for the mean of the selected population, which assumes that each population mean θ_i follows a normal distribution. Under these assumptions, they are able to construct simultaneous confidence intervals that maintain the nominal coverage probability and are substantially shorter than the intervals constructed using Bonferroni bounds. However, the confidence intervals they propose are asymptotically optimal, and since their coverage probabilities are obtained averaging over both the sample space and the prior, they do not give a valid frequentist interval.

Recently, a modern variation of this problem has become very popular, a major reason being the explosion of genomic data, which calls for the development of new methodologies. For instance, in genomic studies, looking either for differential expression or for genome-wide association, thousands of genes are screened, but only a smaller number are selected for further study. Consequently, the assessment of

significance, through testing or interval estimation, must take this selection mechanism into account. If the usual confidence intervals are used (not accounting for selection), the actual confidence coefficient is smaller than the nominal level and approaches zero as the number of genes (populations) increases.

In this dissertation, we address the problem of interval estimation and present a frequentist approach to construct confidence intervals for the means of the selected populations, where the selection mechanisms are properly described in the corresponding chapters. In Chapter 2 we focus on the problem of selecting one population. In Chapter 3 we introduce a novel methodology to produce confidence intervals when selecting k > 1 populations, where k is a fixed and known number. Later, in Chapter 4, we extend the results to the case k > 1 when k is a random quantity. Finally, in Chapter 5 we discuss the main conclusions and possible extensions of the results presented in this dissertation.
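The selection effect described above is easy to reproduce numerically. The following small Monte Carlo sketch (an illustration only, using NumPy; the number of populations, sample size and replication count are arbitrary choices, not taken from this dissertation) shows both the upward bias of the naive estimator and the undercoverage of the usual z-interval when all population means are equal.

```python
# Minimal Monte Carlo sketch of the selection effect (illustrative settings only).
import numpy as np

rng = np.random.default_rng(0)
p, n, alpha, n_rep = 20, 5, 0.05, 20000   # p populations, n observations each
theta = np.zeros(p)                       # all true means equal to 0
sigma = 1.0
z = 1.959963984540054                     # standard normal alpha/2 = 0.025 quantile

hits, sel_means = 0, []
for _ in range(n_rep):
    xbar = rng.normal(theta, sigma / np.sqrt(n), size=p)  # the p sample means
    i = np.argmax(xbar)                                   # selected population
    sel_means.append(xbar[i])
    half = z * sigma / np.sqrt(n)
    hits += (xbar[i] - half <= theta[i] <= xbar[i] + half)

print("average of the selected sample mean:", np.mean(sel_means))  # well above 0
print("coverage of the naive 95% interval:", hits / n_rep)         # well below 0.95
```

With twenty populations of equal means, the selected sample mean is centered well above the true value of zero, and the naive 95% interval covers the selected mean far less often than 95% of the time.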

CHAPTER 2
INTERVAL ESTIMATION FOLLOWING THE SELECTION OF ONE POPULATION

For 1 ≤ i ≤ p, let X_i1, ..., X_in be a random sample from a population π_i with unknown mean θ_i and variance σ². Assume the populations π_i are independent and normally distributed, so that the sample mean X̄_i = n⁻¹ Σ_{j=1}^n X_ij ~ N(θ_i, σ²/n) for i = 1, ..., p, and define the order statistics X̄_(1), ..., X̄_(p) as the sample means placed in descending order. In other words, the order statistics satisfy X̄_(1) ≥ ... ≥ X̄_(p).

In this context, we want to construct confidence intervals for the mean of the population that gives the largest sample mean as a result of the experiment. Formally, if we define θ_(1) = Σ_{i=1}^p θ_i I(X̄_i = X̄_(1)), our aim is to produce confidence intervals for θ_(1), based on X̄_(1), such that the confidence coefficient is at least 1 − α, for any 0 < α < 1 specified prior to the experiment.

It is not difficult to realize that the standard confidence intervals do not maintain the nominal coverage probability. For instance, if all the populations π_i are normally distributed with mean θ and variance 1, then, for samples of size n = 1, X_1, ..., X_p are iid N(θ, 1). It follows that P(X_(1) ≤ x) = Φ^p(x − θ), where Φ(·) denotes the cdf of the standard normal distribution. Moreover, the mean of the selected population is θ_(1) = θ, and hence

P(θ_(1) ∈ X_(1) ± c) = Φ^p(c) − Φ^p(−c)

for any value of c > 0. In particular, when p = 3, we obtain

P(θ_(1) ∈ X_(1) ± c) = Φ³(c) − Φ³(−c)
= (Φ(c) − Φ(−c))(Φ²(c) + Φ(c)Φ(−c) + Φ²(−c))
= (2Φ(c) − 1)(1 − Φ(c) + Φ²(c)).
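Before continuing, a quick numerical check of this identity and of the rate at which the iid-case coverage Φ^p(c) − Φ^p(−c) decays with p; this sketch is only an illustration (it assumes SciPy is available and uses the usual 95% cutoff).

```python
# Check the p = 3 factorization and the decay of the iid-case coverage with p.
from scipy.stats import norm

c = norm.ppf(0.975)                       # the usual 95% cutoff, about 1.96
Phi = norm.cdf
for p in (1, 2, 3, 5, 10, 30):
    cover = Phi(c) ** p - Phi(-c) ** p    # P(theta_(1) in X_(1) +/- c), iid case
    print(p, round(cover, 4))

# p = 3 factorization: (2*Phi(c) - 1) * (1 - Phi(c) + Phi(c)**2)
print(round((2 * Phi(c) - 1) * (1 - Phi(c) + Phi(c) ** 2), 6),
      round(Phi(c) ** 3 - Phi(-c) ** 3, 6))
```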

Since 1 − Φ(c) + Φ²(c) < 1, the coverage of the standard confidence interval is smaller than the nominal level 2Φ(c) − 1. In fact, it is easy to show that the coverage probability maintains the nominal level only for p = 1 and 2, and decreases thereafter, approaching zero as p goes to infinity. The problem is that the traditional intervals do not take into account the selection mechanism. Thus, in order to construct confidence intervals that maintain the nominal level, we must take the selection procedure into account. To this end, we first consider the partition of the sample space induced by the order statistics and write

P(θ_(1) ∈ X̄_(1) ± c) = Σ_{i=1}^p P(θ_i ∈ X̄_i ± c, X̄_i = X̄_(1)).   (2-1)

Observe that each term in the sum (2-1) can be explicitly determined using the joint distribution of (X̄_1, ..., X̄_p). For example, when i = 1 (the first term of the sum), we have

P(θ_1 ∈ X̄_1 ± c, X̄_1 = X̄_(1)) = P(θ_1 ∈ X̄_1 ± c, X̄_1 ≥ X̄_2, ..., X̄_1 ≥ X̄_p).   (2-2)

In the next section we derive a closed form expression for the coverage probability in (2-1), assuming the population variance σ² is known, and present a new approach to obtain the desired confidence intervals.

2.1 The Known Variance Case

Suppose the population variance σ² is known and define Z_j = √n(X̄_j − θ_j)/σ for j = 1, ..., p. It follows that Z_1, ..., Z_p are iid N(0, 1) and

X̄_1 ≥ X̄_j ⟺ √n(X̄_1 − θ_1)/σ ≥ √n(X̄_j − θ_j + θ_j − θ_1)/σ ⟺ Z_1 ≥ Z_j + Δ_j1 ⟺ Z_1 − Z_j ≥ Δ_j1,

where Δ_j1 = √n(θ_j − θ_1)/σ for j = 1, ..., p. At this point, to simplify the notation, we take n = σ² = 1. Then, if we consider the transformation

T:  z = z_1,  ω_2 = z_1 − z_2,  ...,  ω_p = z_1 − z_p,

we can rewrite (2-2) in terms of Δ_21, ..., Δ_p1 and obtain

P(θ_1 ∈ X̄_1 ± c, X̄_1 ≥ X̄_2, ..., X̄_1 ≥ X̄_p) = P(|z| ≤ c, ω_2 ≥ Δ_21, ..., ω_p ≥ Δ_p1)
= (1/(2π)^{p/2}) ∫_{−c}^{c} { ∏_{j=2}^{p} ∫_{Δ_j1}^{∞} e^{−(ω_j − z)²/2} dω_j } e^{−z²/2} dz.

Notice that, for fixed z, the integrals within the curly brackets { } are essentially the tail probability of a normal distribution centered at z. Therefore, we can write

P(|z| ≤ c, ω_2 ≥ Δ_21, ..., ω_p ≥ Δ_p1) = ∫_{−c}^{c} { ∏_{j=2}^{p} Φ(z − Δ_j1) } φ(z) dz,

where φ(·) denotes the pdf of the standard normal distribution. Of course, the same argument is valid for the remaining terms of the sum in (2-1). It follows that we can fully describe the probability P(θ_(1) ∈ X̄_(1) ± c) in terms of a new set of parameters Δ_ij, where Δ_ij = θ_i − θ_j for 1 ≤ i, j ≤ p. Under this representation, for every c > 0, the value of the coverage probability P(θ_(1) ∈ X̄_(1) ± c) is determined by the relative distances between the population means θ_i, i = 1, ..., p. In other words, the coverage probability defines a function h_c(Δ) = P(θ_(1) ∈ X̄_(1) ± c), where Δ = (Δ_11, Δ_12, ..., Δ_pp) is the vector of possible configurations of the relative distances Δ_ij.

In this context, we can obtain confidence intervals for θ_(1) that have (at least) the right nominal level by first minimizing the function h_c. Specifically, given 0 < α < 1, we can determine the value of c > 0 that satisfies

P(θ_(1) ∈ X̄_(1) ± c) ≥ min_Δ h_c(Δ) = 1 − α.   (2-3)

In order to minimize the function h_c, we first notice the following properties of the parameters Δ_ij:

1. Δ_jj = 0, for every j.
2. Δ_ij = −Δ_ji, for every i, j.
3. For j > k, Δ_jk = Δ_j,j−1 + Δ_j−1,j−2 + ... + Δ_k+1,k.

These properties reveal a certain underlying symmetry in the structure of the problem. This symmetry is portrayed in Table 2-1, where every entry Δ_ij corresponds to the difference between the values of θ_i and θ_j located in row i and column j, respectively. In addition, Property 3 indicates that we only need to consider p − 1 parameters in order to determine the value of P(θ_(1) ∈ X̄_(1) ± c). In fact, for any given ordering of the parameters θ_i, we can always choose a representation of the probability in (2-1) based on p − 1 parameters Δ_ij. As a result, the true ordering of the population means θ_i is not particularly relevant in this approach, and hence we will assume (without any loss of generality) that θ_1 ≥ θ_2 ≥ ... ≥ θ_p.

Although the introduction of the new parameterization seems to reduce (in a sense) the complexity of the problem, the minimization of h_c is still difficult: first, because of the delicate balance existing between the Δ_ij in the full expression (see Table 2-1) and, second, because the formula for the coverage probability is somewhat involved. To illustrate these problems, let us discuss the case p = 2. We have

P(θ_(1) ∈ X̄_(1) ± c) = ∫_{−c}^{c} Φ(z + Δ_12)φ(z)dz + ∫_{−c}^{c} Φ(z − Δ_12)φ(z)dz
= ∫_{−c}^{c} [Φ(z − Δ_12) + Φ(z + Δ_12)]φ(z)dz,

where Δ_12 ≥ 0. Since only the quantity in brackets [ ] depends on Δ_12 and φ(z) > 0, it seems reasonable to think that h_c(Δ_12) = P(θ_(1) ∈ X̄_(1) ± c) is minimized at the same point where g_z(Δ_12) = Φ(z − Δ_12) + Φ(z + Δ_12) finds its minimum. However, differentiating g_z

with respect to Δ_12, we obtain

dg_z/dΔ_12 = φ(z + Δ_12) − φ(z − Δ_12),  which is ≥ 0 for z ≤ 0 and < 0 for z > 0,

so the sign of the derivative depends on Δ_12 and z, and consequently the minimum of h_c cannot be determined by simple examination of the behavior of g_z. From the analysis of g_z, we conclude that g_z(Δ_12) is minimized at Δ_12 = 0 when z ≤ 0, and (asymptotically) at Δ_12 = +∞ when z > 0. Then, we can establish the inequality

P(θ_(1) ∈ X̄_(1) ± c) ≥ ∫_{−c}^{0} 2Φ(z)φ(z)dz + ∫_{0}^{c} φ(z)dz;

however, this lower bound is not obtained by direct minimization of the coverage probability and is less appealing. The problem is that a strategy based on this type of lower bound may be too conservative and lead to extremely wide intervals when applied in higher dimensions (p > 2).

In order to find a formal solution to the minimization problem, we start with the case p = 3. For this case, we can fully describe the probability of interest in terms of the two parameters Δ_12 and Δ_23, as shown in Table 2-2. We obtain

P(θ_(1) ∈ X̄_(1) ± c) = (1/√(2π)) ∫_{−c}^{c} Φ(z + Δ_12)Φ(z + Δ_13) e^{−z²/2} dz
+ (1/√(2π)) ∫_{−c}^{c} Φ(z − Δ_12)Φ(z + Δ_23) e^{−z²/2} dz   (2-4)
+ (1/√(2π)) ∫_{−c}^{c} Φ(z − Δ_13)Φ(z − Δ_23) e^{−z²/2} dz,

where Δ_12, Δ_23 ≥ 0, Δ_13 = Δ_12 + Δ_23, and Φ(·) denotes the cdf of the standard normal distribution. Preliminary studies suggest that the global minimum of h_c(Δ_12, Δ_23) = P(θ_(1) ∈ X̄_(1) ± c) is located at the origin (see Figure 2-1), but a formal proof is required. To this end, it is sufficient to show that ∂h_c/∂Δ_23 > 0 and ∂h_c/∂Δ_12 > 0.
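The surface h_c(Δ_12, Δ_23) in (2-4) is easy to evaluate by one-dimensional quadrature, which is one way to reproduce the behavior plotted in Figure 2-1. The following sketch is an illustration only (it assumes SciPy; the grid and the cutoff c = 1.96 are arbitrary choices).

```python
# Evaluate the p = 3 coverage h_c(D12, D23) of (2-4) on a small grid.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def coverage_p3(c, d12, d23):
    d13 = d12 + d23
    f = lambda z: (norm.cdf(z + d12) * norm.cdf(z + d13)
                   + norm.cdf(z - d12) * norm.cdf(z + d23)
                   + norm.cdf(z - d13) * norm.cdf(z - d23)) * norm.pdf(z)
    return quad(f, -c, c)[0]

c = 1.96
for d12 in (0.0, 0.5, 1.0, 2.0):
    for d23 in (0.0, 0.5, 1.0, 2.0):
        print(d12, d23, round(coverage_p3(c, d12, d23), 4))
```

On such a grid the smallest value is obtained at Δ_12 = Δ_23 = 0, consistent with the claim proved below.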

Taking partial derivatives with respect to Δ_12, we obtain

∂h_c/∂Δ_12 = (1/2π) ∫_{−c}^{c} Φ(z + Δ_13) e^{−(Δ_12 + z)²/2 − z²/2} dz
− (1/2π) ∫_{−c}^{c} Φ(z + Δ_23) e^{−(Δ_12 − z)²/2 − z²/2} dz   (2-5)
+ (1/2π) ∫_{−c}^{c} Φ(z + Δ_12) e^{−(Δ_13 + z)²/2 − z²/2} dz
− (1/2π) ∫_{−c}^{c} Φ(z − Δ_23) e^{−(Δ_13 − z)²/2 − z²/2} dz.

Since the partial derivative depends on both Δ_12 and Δ_23, the behavior of its sign is not obvious, but different numerical studies support the idea that the derivative is non-negative. Figure 2-2 shows the plot of the integrand of ∂h_c/∂Δ_12 for fixed values of Δ_12 and Δ_23. Notice that if we group the first two terms and the last two terms of (2-5), we can look at the partial derivative as the sum of two differences. In Figure 2-3 we observe (in separate plots) the integrands of the first two terms of the partial derivative ∂h_c/∂Δ_12, for fixed values of Δ_12 and Δ_23. The plots suggest that the integrands differ only by a location shift. In fact, changing variables, we can rewrite the expression in (2-5) as

∂h_c/∂Δ_12 = D_1 + D_2,   (2-6)

where

D_1 = (1/2π) { ∫_{Δ_13−c}^{Δ_13+c} − ∫_{−c}^{c} } Φ(z − Δ_23) e^{−(Δ_13 − z)²/2 − z²/2} dz,
D_2 = (1/2π) { ∫_{Δ_12−c}^{Δ_12+c} − ∫_{−c}^{c} } Φ(z + Δ_23) e^{−(Δ_12 − z)²/2 − z²/2} dz.

Recall that Δ_12 ≥ 0. Then, looking at D_2, we have two possibilities for the intervals of integration:

1. −c < Δ_12 − c < c < Δ_12 + c.
2. −c < c < Δ_12 − c < Δ_12 + c.

In other words, the intervals may overlap or not. Denote by R_1 and R_2 the non-common regions of integration, that is,

R_1 = (−c, Δ_12 − c) and R_2 = (c, Δ_12 + c) for case (1);
R_1 = (−c, c) and R_2 = (Δ_12 − c, Δ_12 + c) for case (2).

We have that D_2 is guaranteed to be positive as long as the integral over R_2 is greater than the integral over R_1, regardless of the case. We first notice that R_1 and R_2 are intervals of the same length; in fact, l(R_1) = l(R_2) = Δ_12 for case (1), and l(R_1) = l(R_2) = 2c for case (2). Then, we only need to show that for any two points z_1 ∈ R_1 and z_2 ∈ R_2, located at the same distance ε > 0 from the corresponding extremes of their intervals, the integrand evaluated at z_2 is greater than the integrand evaluated at z_1. Observe that for any z_1 < z_2,

[Φ(z_2 + Δ_23) e^{−(Δ_12 − z_2)²/2 − z_2²/2}] / [Φ(z_1 + Δ_23) e^{−(Δ_12 − z_1)²/2 − z_1²/2}]
= q exp{(z_2 − z_1)[Δ_12 − (z_2 + z_1)]},   (2-7)

where q = Φ(z_2 + Δ_23)/Φ(z_1 + Δ_23) > 1. Then, for any 0 < ε < min{Δ_12, 2c}, take z_1 = Δ_12 − c − ε and z_2 = c + ε whenever min{Δ_12, 2c} = Δ_12 (i.e., case 1), and z_1 = c − ε and z_2 = Δ_12 − c + ε whenever min{Δ_12, 2c} = 2c (i.e., case 2). Replacing these values in (2-7), we obtain that the ratio is greater than 1 (regardless of the case), which is enough to conclude that D_2 > 0. Notice that the argument still holds if we replace the cdf Φ(·) by any non-decreasing function, or if we change the interval (−c, c) to (−c_1, c_2), where c_1, c_2 > 0. This way, we obtain the following more general result.

Proposition 2.1. Let Δ_1, c_1, c_2 > 0 and let the function f(z, λ) be non-decreasing in z, where λ is an arbitrary set of parameters. Then,

{ ∫_{Δ_1−c_1}^{Δ_1+c_2} − ∫_{−c_1}^{c_2} } f(z, λ) e^{−(Δ_1 − z)²/2 − z²/2} dz ≥ 0,

where the inequality is strict whenever the function f is monotonically increasing in z.

An immediate consequence of Proposition 2.1 is that D_1 > 0. As a result, we obtain that ∂h_c/∂Δ_12 > 0. A similar argument shows that ∂h_c/∂Δ_23 > 0, completing the proof. It follows that the coverage probability P(θ_(1) ∈ X̄_(1) ± c) is minimized at Δ_12 = Δ_23 = 0, that is, whenever θ_1 = θ_2 = θ_3.

Observe that Proposition 2.1 gives a straightforward proof for the case p = 2. In effect, for h_c(Δ_12) = P(θ_(1) ∈ X̄_(1) ± c), we have

dh_c/dΔ_12 = ∫_{Δ_12−c}^{Δ_12+c} φ(z − Δ_12)φ(z)dz − ∫_{−c}^{c} φ(z − Δ_12)φ(z)dz.

Then, applying Proposition 2.1 with f ≡ 1/(2π), we obtain that h_c'(Δ_12) ≥ 0. It immediately follows that the coverage probability is minimized at Δ_12 = 0, or equivalently, when θ_1 = θ_2.

For the general case (p > 3), we observe that when moving from the case p = k to the case p = k + 1, we only need to include the extra parameter Δ_k,k+1 in order to describe the problem (see Table 2-3). Then, using Proposition 2.1 and mathematical induction, we obtain the following result.

Lemma 1. Let c_1, c_2 > 0 and, for p ≥ 2, let X_1, ..., X_p be independent random variables with X_i ~ N(θ_i, 1). Then,

min_{θ_1,...,θ_p} P(θ_(1) ∈ (X_(1) − c_1, X_(1) + c_2)) = p ∫_{−c_2}^{c_1} Φ^{p−1}(z)φ(z)dz = Φ^p(c_1) − Φ^p(−c_2),

where Φ(·) and φ(·) are, respectively, the cdf and pdf of the standard normal distribution.

Using this lemma, we can easily obtain the following theorem, which summarizes the main results of this section. The proof is straightforward.

Theorem 2.1. Let 0 < α < 1 and, for i = 1, ..., p, suppose that X_i1, ..., X_in is a random sample from a N(θ_i, σ²) distribution, where θ_i is unknown but σ² is known. Then, a confidence interval for θ_(1) = Σ_{i=1}^p θ_i I(X̄_i = X̄_(1)) with a confidence coefficient of (at least) 1 − α is

given by X̄_(1) ± (σ/√n) c, where the value of c satisfies Φ^p(c) − Φ^p(−c) = 1 − α.

2.2 The Unknown Variance Case

If the variance σ² is unknown, we need to estimate its value. We assume that we have an independent estimate s² of σ², such that s/σ has pdf ϕ. In a regular experiment, where we observe a sample of size n from each population, s² can be taken as the pooled variance estimate, and then νs²/σ² ~ χ²_ν, a chi-square distribution with ν = p(n − 1) degrees of freedom.

Suppose first that p = 3 and, for simplicity, take n = 1. Then, the coverage probability can be written as

P(θ_(1) ∈ X_(1) ± sc) = P(|Z_1| ≤ cs/σ, Z_1 − Z_2 ≥ Δ_21, Z_1 − Z_3 ≥ Δ_31)
+ P(Z_2 − Z_1 ≥ Δ_12, |Z_2| ≤ cs/σ, Z_2 − Z_3 ≥ Δ_32)   (2-8)
+ P(Z_3 − Z_1 ≥ Δ_13, Z_3 − Z_2 ≥ Δ_23, |Z_3| ≤ cs/σ),

where Z_i = (X_i − θ_i)/σ and Δ_ij = (θ_i − θ_j)/σ for 1 ≤ i, j ≤ 3. Notice that, taking t = s/σ, we can rewrite each term in the sum (2-8) as a mixture. We obtain

P(θ_(1) ∈ X_(1) ± sc) = ∫_0^∞ P(|Z_1| ≤ ct, Z_1 − Z_2 ≥ Δ_21, Z_1 − Z_3 ≥ Δ_31 | t) ϕ(t)dt
+ ∫_0^∞ P(Z_2 − Z_1 ≥ Δ_12, |Z_2| ≤ ct, Z_2 − Z_3 ≥ Δ_32 | t) ϕ(t)dt
+ ∫_0^∞ P(Z_3 − Z_1 ≥ Δ_13, Z_3 − Z_2 ≥ Δ_23, |Z_3| ≤ ct | t) ϕ(t)dt,

where ϕ(·) denotes the pdf of t. It follows that

P(θ_(1) ∈ X_(1) ± sc) = ∫_0^∞ P(θ_(1) ∈ X_(1) ± tc | t) ϕ(t)dt,

where we know (from Section 2.1) that the probability P(θ_(1) ∈ X_(1) ± tc | t) inside the integral is minimized at θ_1 = θ_2 = θ_3. The generalization of this result follows from a direct application of Lemma 1.

Lemma 2. Let c_1, c_2 > 0 and, for p ≥ 2, let X_1, ..., X_p be independent random variables with X_i ~ N(θ_i, σ²), where both θ_i and σ² are unknown. If s² is an estimate of σ² independent of X_1, ..., X_p, then

min_{θ_1,...,θ_p} P(θ_(1) ∈ (X_(1) − sc_1, X_(1) + sc_2)) = ∫_0^∞ (Φ^p(c_1 t) − Φ^p(−c_2 t)) ϕ(t)dt,

where ϕ(·) is the pdf of s/σ and Φ(·) is the cdf of the standard normal distribution.

We end this section with the following theorem. The proof follows directly from Lemma 2.

Theorem 2.2. Let 0 < α < 1 and, for i = 1, ..., p, suppose that X_i1, ..., X_in is a random sample from a N(θ_i, σ²) distribution, where θ_i and σ² are unknown. Then, a confidence interval for θ_(1) = Σ_{i=1}^p θ_i I(X̄_i = X̄_(1)) with a confidence coefficient of (at least) 1 − α is given by X̄_(1) ± (s/√n) c, where s² = p⁻¹ Σ_{i=1}^p s_i², with s_i² = (n − 1)⁻¹ Σ_{j=1}^n (X_ij − X̄_i)² for i = 1, ..., p, and c satisfies

∫_0^∞ (Φ^p(ct) − Φ^p(−ct)) ϕ(t)dt = 1 − α,

with ϕ(·) the pdf of s/σ.

2.3 Numerical Studies

In this chapter, we have proposed a method to construct confidence intervals for the mean of the selected population that takes into account the selection procedure. In this section we present some numerical results that compare the performance of the new and the traditional intervals.
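For reference, the cutoffs prescribed by Theorems 2.1 and 2.2 can be computed with a few lines of numerical root finding. The sketch below is an illustration only (it assumes SciPy; the choices of p, n and α are arbitrary). It solves Φ^p(c) − Φ^p(−c) = 1 − α for the known-variance case and the corresponding integral equation for the unknown-variance case, where t = s/σ follows a chi distribution scaled by 1/√ν with ν = p(n − 1) degrees of freedom.

```python
# Compute the cutoffs of Theorems 2.1 and 2.2 numerically (illustrative sketch).
import numpy as np
from scipy.stats import norm, chi
from scipy.integrate import quad
from scipy.optimize import brentq

def cutoff_known_sigma(p, alpha):
    # Solve Phi(c)^p - Phi(-c)^p = 1 - alpha  (Theorem 2.1).
    g = lambda c: norm.cdf(c) ** p - norm.cdf(-c) ** p - (1 - alpha)
    return brentq(g, 0.0, 20.0)

def cutoff_unknown_sigma(p, n, alpha):
    # Solve  int_0^inf [Phi(ct)^p - Phi(-ct)^p] phi(t) dt = 1 - alpha,
    # where t = s/sigma and nu*s^2/sigma^2 ~ chi^2_nu with nu = p(n-1),
    # so that t follows a chi(nu) distribution scaled by 1/sqrt(nu).
    nu = p * (n - 1)
    t_dist = chi(df=nu, scale=1.0 / np.sqrt(nu))
    def g(c):
        integrand = lambda t: (norm.cdf(c * t) ** p - norm.cdf(-c * t) ** p) * t_dist.pdf(t)
        return quad(integrand, 0.0, np.inf)[0] - (1 - alpha)
    return brentq(g, 0.0, 50.0)

print(cutoff_known_sigma(p=10, alpha=0.05))        # noticeably larger than 1.96
print(cutoff_unknown_sigma(p=10, n=5, alpha=0.05)) # larger still, reflecting the estimated variance
```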

First, we study the behavior of the confidence coefficient as a function of the number of populations. Results show that the confidence coefficient of the traditional intervals decreases rapidly as the number of populations increases. This effect is particularly extreme when all the populations have the same mean. Figure 2-4 shows the result of simulations considering up to 30 populations with the same mean and setting α = 0.05. The solid blue line represents the confidence coefficient obtained using our proposed confidence intervals and the dashed red line depicts the behavior of the confidence coefficient obtained using the standard confidence intervals. Observe that the solid line is constant at the nominal level 95%.

Intuitively, in order to keep the coverage probability constant, the confidence intervals need to get wider. However, this increment is not dramatic and slows down as the number of populations increases: from the inequality in Theorem 2.1 it can be determined that the cutoff value c grows only on the order of √(log p).

An indirect way to obtain confidence intervals for θ_(1) that attain (at least) the nominal level would be to construct simultaneous confidence intervals for the means of all the populations considered in the experiment using, for instance, Bonferroni intervals. The natural question is whether such a procedure produces better intervals in terms of length. The answer is no: the size of the Bonferroni intervals increases at a faster rate compared to the intervals we propose. Figure 2-5 shows the behavior of the cutoff point c as the number of populations increases for the case α = 0.05. The solid line corresponds to the value of the standard cutoff point for a 95% confidence interval (z_{α/2} = 1.96), the dashed/dotted line represents the value of c for the new confidence intervals, and the dashed line corresponds to the cutoff values for the Bonferroni intervals.

In an applied situation, the population means θ_i (1 ≤ i ≤ p) will rarely be identical. Hence, we need to compare the performance of the confidence intervals when the population means are different. Table 2-4 summarizes some results obtained by

simulations for the case p = 4. The first column shows the true values of the population means (all of them with variance equal to 1), while the second and third columns show the observed coverage probability for the traditional and new intervals at a confidence level of 95%. The reported values correspond to the average of the coverage probabilities over ten replications, and the numbers in parentheses are the corresponding standard errors. We observe that our proposed intervals outperform the traditional ones, even when the population means are far apart. It is interesting to notice that even in situations where one of the populations should be somewhat distinguishable (see row four in Table 2-4), the traditional intervals may perform poorly.

2.4 Tables and Figures

Table 2-1. Configuration of the new parameterization for the probability P(θ_(1) ∈ X̄_(1) ± c). In the table, Δ_ij = θ_i − θ_j.

        θ_1      θ_2      ...   θ_p
θ_1     0        Δ_12     ...   Δ_1p
θ_2     −Δ_12    0        ...   Δ_2p
...     ...      ...      ...   ...
θ_p     −Δ_1p    −Δ_2p    ...   0

Table 2-2. Configuration of the new parameterization for the case p = 3, when Δ_12 and Δ_23 are the free parameters. In the table, Δ_ij = θ_i − θ_j.

        θ_1               θ_2      θ_3
θ_1     0                 Δ_12     Δ_12 + Δ_23
θ_2     −Δ_12             0        Δ_23
θ_3     −(Δ_12 + Δ_23)    −Δ_23    0

Table 2-3. Representation of the parameters Δ_ij for the case p = k + 1, where Δ_i,k+1 = Δ_ik + Δ_k,k+1.

        θ_1                 θ_2                 ...   θ_k         θ_k+1
θ_1     0                   Δ_12                ...   Δ_1k        Δ_1k + Δ_k,k+1
θ_2     −Δ_12               0                   ...   Δ_2k        Δ_2k + Δ_k,k+1
...     ...                 ...                 ...   ...         ...
θ_k     −Δ_1k               −Δ_2k               ...   0           Δ_k,k+1
θ_k+1   −(Δ_1k + Δ_k,k+1)   −(Δ_2k + Δ_k,k+1)   ...   −Δ_k,k+1    0

Table 2-4. Observed coverage probability of 95% CI for the mean of the selected population out of four populations, using the traditional and the new method. The reported values correspond to the average over ten replications and the number in parentheses is the corresponding standard error. (The coverage point estimates were not recoverable from this copy; only the standard errors are shown.)

(θ_1, θ_2, θ_3, θ_4)    Trad CI      New CI
(0, 0, 0, 0)            (0.0016)     (0.0012)
(0, 0.25, 0.5, 1)       (0.0020)     (0.0011)
(0, 5, 10, 15)          (0.0014)     (0.0009)
(0, 0, 0, 2)            (0.0042)     (0.0027)
(0, 0, 0, 5)            (0.0031)     (0.0028)

Figure 2-1. Coverage probability as a function of Δ_12 and Δ_23 when p = 3.
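A comparison of the kind reported in Table 2-4 can be sketched with a short Monte Carlo simulation. The snippet below is illustrative only (the replication count and seed are arbitrary, and one observation per population with σ = 1 is assumed).

```python
# Monte Carlo comparison of traditional vs. new intervals for p = 4 (sketch).
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def new_cutoff(p, alpha=0.05):
    return brentq(lambda c: norm.cdf(c) ** p - norm.cdf(-c) ** p - (1 - alpha), 0, 20)

def coverage(theta, c, n_rep=100000, seed=1):
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    x = rng.normal(theta, 1.0, size=(n_rep, theta.size))   # one observation per population
    i = np.argmax(x, axis=1)                               # selected population
    sel_x = x[np.arange(n_rep), i]
    sel_theta = theta[i]
    return np.mean(np.abs(sel_x - sel_theta) <= c)

p, alpha = 4, 0.05
c_trad, c_new = norm.ppf(1 - alpha / 2), new_cutoff(p, alpha)
for theta in [(0, 0, 0, 0), (0, 0.25, 0.5, 1), (0, 5, 10, 15), (0, 0, 0, 2), (0, 0, 0, 5)]:
    print(theta, round(coverage(theta, c_trad), 3), round(coverage(theta, c_new), 3))
```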

Figure 2-2. Plot of ∂h/∂Δ_12 for predetermined values of Δ_12 and Δ_23.

Figure 2-3. Plots of the first two terms of ∂h/∂Δ_12 for predetermined values of Δ_12 and Δ_23.

Figure 2-4. Confidence coefficient versus number of populations for the case of identical population means and α = 0.05. The solid blue line corresponds to the confidence coefficient for the new confidence intervals, and the dashed red line corresponds to the confidence coefficient for the traditional confidence intervals.

Figure 2-5. Cutoff point versus number of populations for the case of identical population means and α = 0.05. The solid line corresponds to the cutoff value for the traditional confidence interval, z_{α/2} = 1.96. The dashed/dotted line corresponds to the cutoff value for the new intervals, and the dashed line corresponds to the cutoff values for the Bonferroni intervals.
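The cutoff comparison of Figure 2-5 can likewise be sketched numerically; the following is an illustration only (assuming SciPy, with an arbitrary set of p values).

```python
# Nominal, new, and Bonferroni cutoffs as the number of populations grows (sketch).
from scipy.stats import norm
from scipy.optimize import brentq

alpha = 0.05
for p in (2, 5, 10, 20, 50, 100):
    c_new = brentq(lambda c: norm.cdf(c) ** p - norm.cdf(-c) ** p - (1 - alpha), 0, 20)
    c_bonf = norm.ppf(1 - alpha / (2 * p))   # Bonferroni cutoff for p simultaneous intervals
    print(p, round(norm.ppf(1 - alpha / 2), 3), round(c_new, 3), round(c_bonf, 3))
```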

CHAPTER 3
CONFIDENCE INTERVALS FOLLOWING THE SELECTION OF k ≥ 1 POPULATIONS

Using the same framework as in Chapter 2, we assume that for j = 1, ..., p we have independent random variables X_j ~ N(θ_j, σ²/n). Also, we define the order statistics X_(1), ..., X_(p) according to the inequalities X_(1) ≥ ... ≥ X_(p) and, for simplicity, we start by considering σ² = n = 1. Then, we observe that the mean of the population from which the j-th biggest observation, X_(j), is sampled can be written as

θ_(j) = Σ_{i=1}^p θ_i I(X_i = X_(j)).

In this context, we want to find the value of c > 0 such that

P(θ_(1) ∈ X_(1) ± c, ..., θ_(k) ∈ X_(k) ± c) ≥ 1 − α   (3-1)

for any 0 < α < 1 and 1 ≤ k ≤ p.

Following the same approach we used in Chapter 2, we can write the probability in (3-1) as

Σ_{j_1 ≠ ... ≠ j_k} P(θ_(1) ∈ X_(1) ± c, ..., θ_(k) ∈ X_(k) ± c, X_(1) = X_{j_1}, ..., X_(k) = X_{j_k}),

where the sum has p!/(p − k)! terms, one for each ordered selection of k populations. Let us consider first the case p = 4 and k = 2. Then, the probability of interest is

P(θ_(1) ∈ X_(1) ± c, θ_(2) ∈ X_(2) ± c) = Σ_{i ≠ j} P(θ_i ∈ X_i ± c, θ_j ∈ X_j ± c, X_(1) = X_i, X_(2) = X_j),   (3-2)

where 1 ≤ i, j ≤ 4. In order to obtain closed form expressions for each term in the sum, observe that for X_(1) = X_1 and X_(2) = X_2 we have

(X_(1) = X_1, X_(2) = X_2) = (X_1 ≥ X_2, X_2 ≥ X_3, X_2 ≥ X_4).

In other words, the relative order between X_3 and X_4 is irrelevant.

It follows that we only need to pay attention to the possible configurations of the random variables that are at the top. In this case the possible configurations are

(X_1 ≥ X_2, X_2 ≥ X_3, X_2 ≥ X_4)    (X_3 ≥ X_1, X_1 ≥ X_2, X_1 ≥ X_4)
(X_1 ≥ X_3, X_3 ≥ X_2, X_3 ≥ X_4)    (X_3 ≥ X_2, X_2 ≥ X_1, X_2 ≥ X_4)
(X_1 ≥ X_4, X_4 ≥ X_2, X_4 ≥ X_3)    (X_3 ≥ X_4, X_4 ≥ X_1, X_4 ≥ X_2)
(X_2 ≥ X_1, X_1 ≥ X_3, X_1 ≥ X_4)    (X_4 ≥ X_1, X_1 ≥ X_2, X_1 ≥ X_3)
(X_2 ≥ X_3, X_3 ≥ X_1, X_3 ≥ X_4)    (X_4 ≥ X_2, X_2 ≥ X_1, X_2 ≥ X_3)
(X_2 ≥ X_4, X_4 ≥ X_1, X_4 ≥ X_3)    (X_4 ≥ X_3, X_3 ≥ X_1, X_3 ≥ X_2)

If we define Z_j = X_j − θ_j (1 ≤ j ≤ 4) and Δ_ij = θ_i − θ_j (1 ≤ i, j ≤ 4), we observe that

X_1 ≥ X_2 ⟺ Z_1 − Z_2 ≥ Δ_21,
X_2 ≥ X_3 ⟺ Z_2 − Z_3 ≥ Δ_32,
X_2 ≥ X_4 ⟺ Z_2 − Z_4 ≥ Δ_42,

where Z_1, ..., Z_4 are iid N(0, 1). Then, the first term of the sum in (3-2) can be written as

P(θ_1 ∈ X_1 ± c, θ_2 ∈ X_2 ± c, X_1 ≥ X_2, X_2 ≥ X_3, X_2 ≥ X_4)
= P(|Z_1| ≤ c, |Z_2| ≤ c, Z_1 − Z_2 ≥ Δ_21, Z_2 − Z_3 ≥ Δ_32, Z_2 − Z_4 ≥ Δ_42)

and, making use of the normality assumptions, we can write explicitly

P(θ_1 ∈ X_1 ± c, θ_2 ∈ X_2 ± c, X_1 ≥ X_2, X_2 ≥ X_3, X_2 ≥ X_4)
= ∫_{−c}^{c} ∫_{−c}^{min(c, z_1 − Δ_21)} Φ(z_2 − Δ_32)Φ(z_2 − Δ_42) φ(z_1)φ(z_2) dz_2 dz_1,

while for the configuration with X_2 the largest and X_1 the second largest,

P(θ_1 ∈ X_1 ± c, θ_2 ∈ X_2 ± c, X_2 ≥ X_1, X_1 ≥ X_3, X_1 ≥ X_4)
= ∫_{−c}^{c} ∫_{−c}^{min(c, z_2 − Δ_12)} Φ(z_1 − Δ_31)Φ(z_1 − Δ_41) φ(z_1)φ(z_2) dz_1 dz_2.

Of course, the same argument is valid for the other terms in the sum. This way, considering all 12 possible configurations for the order of the random variables X_1,

X_2, X_3 and X_4, we can write the sum in (3-2) in closed form:

P(θ_(1) ∈ X_(1) ± c, θ_(2) ∈ X_(2) ± c)
= Σ_{i ≠ j} ∫_{−c}^{c} ∫_{−c}^{min(c, z_i − Δ_ji)} { ∏_{m ∉ {i,j}} Φ(z_j − Δ_mj) } φ(z_i)φ(z_j) dz_j dz_i,

where the sum runs over the 12 ordered pairs (i, j) with 1 ≤ i ≠ j ≤ 4 (X_i the largest and X_j the second largest); the term for (i, j) = (1, 2) is the double integral displayed above, and the remaining eleven terms are obtained by permuting the indices accordingly.

In order to minimize this expression, we need to address two equally challenging difficulties. First, the construction of any lower bound needs to take into account the delicate balance between the Δ_ij in the expression. Second, special attention needs to be paid to the limits of integration: the corners of the form min(c, z − Δ_ij) make nearly impossible any procedure based on differentiation.

To overcome the difficulty due to the corners, we notice that the events (Z_2 ≤ Z_1 − Δ_21, Z_3 ≤ Z_2 − Δ_32, Z_4 ≤ Z_2 − Δ_42) and (Z_2 ≥ Z_1 − Δ_21, Z_3 ≤ Z_1 − Δ_31, Z_4 ≤ Z_1 − Δ_41) are disjoint. Hence, we can express the sum of the probabilities of these two events as the probability of their union. Consequently, instead of writing down 12 terms for the sum (one term per configuration), we can express the probability of interest using only 6 terms, each of them describing the two random variables positioned at the top. Working out the details, we obtain:

X_1 and X_2 at the top.
P(|Z_1| ≤ c, |Z_2| ≤ c, Z_3 ≤ min{Z_1 − Δ_31, Z_2 − Δ_32}, Z_4 ≤ min{Z_1 − Δ_41, Z_2 − Δ_42})
= ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z_1 − Δ_31, z_2 − Δ_32}) Φ(min{z_1 − Δ_41, z_2 − Δ_42}) φ(z_1)φ(z_2) dz_1 dz_2

X_1 and X_3 at the top.
P(|Z_1| ≤ c, |Z_3| ≤ c, Z_2 ≤ min{Z_1 − Δ_21, Z_3 − Δ_23}, Z_4 ≤ min{Z_1 − Δ_41, Z_3 − Δ_43})
= ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z_1 − Δ_21, z_3 − Δ_23}) Φ(min{z_1 − Δ_41, z_3 − Δ_43}) φ(z_1)φ(z_3) dz_1 dz_3

X_1 and X_4 at the top.
P(|Z_1| ≤ c, |Z_4| ≤ c, Z_2 ≤ min{Z_1 − Δ_21, Z_4 − Δ_24}, Z_3 ≤ min{Z_1 − Δ_31, Z_4 − Δ_34})
= ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z_1 − Δ_21, z_4 − Δ_24}) Φ(min{z_1 − Δ_31, z_4 − Δ_34}) φ(z_1)φ(z_4) dz_1 dz_4

X_2 and X_3 at the top.
P(|Z_2| ≤ c, |Z_3| ≤ c, Z_1 ≤ min{Z_2 − Δ_12, Z_3 − Δ_13}, Z_4 ≤ min{Z_2 − Δ_42, Z_3 − Δ_43})
= ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z_2 − Δ_12, z_3 − Δ_13}) Φ(min{z_2 − Δ_42, z_3 − Δ_43}) φ(z_2)φ(z_3) dz_2 dz_3

X_2 and X_4 at the top.
P(|Z_2| ≤ c, |Z_4| ≤ c, Z_1 ≤ min{Z_2 − Δ_12, Z_4 − Δ_14}, Z_3 ≤ min{Z_2 − Δ_32, Z_4 − Δ_34})
= ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z_2 − Δ_12, z_4 − Δ_14}) Φ(min{z_2 − Δ_32, z_4 − Δ_34}) φ(z_2)φ(z_4) dz_2 dz_4

X_3 and X_4 at the top.
P(|Z_3| ≤ c, |Z_4| ≤ c, Z_1 ≤ min{Z_3 − Δ_13, Z_4 − Δ_14}, Z_2 ≤ min{Z_3 − Δ_23, Z_4 − Δ_24})
= ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z_3 − Δ_13, z_4 − Δ_14}) Φ(min{z_3 − Δ_23, z_4 − Δ_24}) φ(z_3)φ(z_4) dz_3 dz_4

This way, an alternative representation for the probability of interest is

P(θ_(1) ∈ X_(1) ± c, θ_(2) ∈ X_(2) ± c)
= Σ_{1 ≤ i < j ≤ 4} ∫_{−c}^{c} ∫_{−c}^{c} { ∏_{m ∉ {i,j}} Φ(min{z_i − Δ_mi, z_j − Δ_mj}) } φ(z_i)φ(z_j) dz_i dz_j,   (3-3)

the sum of the six double integrals just displayed. Observe that this new representation does not completely solve the problem of the corners, but rather removes them from the limits of integration and puts them inside the integrand. Now we find expressions of the form min{z − Δ_ij} in the argument of the normal cdfs Φ(·), which still makes any minimization approach based on differentiation difficult. However, this new representation reveals more clearly the symmetry in the structure of the Δ's, as portrayed in Table 3-1. This pattern is particularly important, since it suggests how to generalize the expression for arbitrary values of p and k.

In order to determine the configuration of Δ's that minimizes the expression in (3-3), we assume (without loss of generality) that θ_1 ≥ θ_2 ≥ θ_3 ≥ θ_4, so that Δ_ij ≥ 0 for any i ≤ j. Also, we consider Δ_12, Δ_23 and Δ_34 as the free parameters. Based on our previous results, it is reasonable to believe that the minimum of (3-3) is reached at the origin. In order to prove this claim, we have studied the behavior of the

coverage probability (CP) for different configurations of the Δ_ij, with special attention to the behavior at the boundary. Among others, we considered the following cases.

Δ_12 = Δ_23 = Δ_34 = 0:

CP = 6 ∫_{−c}^{c} ∫_{−c}^{c} Φ²(min{z_1, z_2}) φ(z_1)φ(z_2) dz_1 dz_2.

Δ_12 > 0, Δ_23 = Δ_34 = 0:

CP = 3 ∫_{−c}^{c} ∫_{−c}^{c} Φ²(min{z_1 + Δ_12, z_2}) φ(z_1)φ(z_2) dz_1 dz_2
+ 3 ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z_2 − Δ_12, z_3 − Δ_12}) Φ(min{z_2, z_3}) φ(z_2)φ(z_3) dz_2 dz_3,

which, as Δ_12 → +∞, converges to 3 ∫_{−c}^{c} ∫_{−c}^{c} Φ²(z_2) φ(z_1)φ(z_2) dz_1 dz_2.

Δ_12, Δ_23 > 0 and Δ_34 = 0:

CP = ∫_{−c}^{c} ∫_{−c}^{c} Φ²(min{z_1 + Δ_13, z_2 + Δ_23}) φ(z_1)φ(z_2) dz_1 dz_2
+ 2 ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z_1 + Δ_12, z_3 − Δ_23}) Φ(min{z_1 + Δ_13, z_3}) φ(z_1)φ(z_3) dz_1 dz_3
+ 2 ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z_2 − Δ_12, z_3 − Δ_13}) Φ(min{z_2 + Δ_23, z_3}) φ(z_2)φ(z_3) dz_2 dz_3
+ ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z_3 − Δ_13, z_4 − Δ_13}) Φ(min{z_3 − Δ_23, z_4 − Δ_23}) φ(z_3)φ(z_4) dz_3 dz_4,

which, as Δ_12, Δ_23 → +∞, converges to ∫_{−c}^{c} ∫_{−c}^{c} φ(z_1)φ(z_2) dz_1 dz_2.

However, none of the cases we considered provided conclusive (analytical) evidence that the minimum is at the origin. On the contrary, various numerical studies have suggested that the minimum is not located at the origin (see Figure 3-1), but the current formulation of the problem makes it difficult even to establish that it is not located in the interior of the region determined by Δ_12, Δ_23 and Δ_34. These difficulties call for a different approach, which we discuss in the following section.

3.1 An Alternative Approach

So far, we have approached the problem considering partitions of the coverage probability based on the possible configurations of the vector (X_(1), X_(2), ..., X_(k)). Notice that such an approach, by construction, takes into account the relative orderings between the variables that are selected (the top k). Instead, we can consider an alternative that does not take explicit consideration of the ordering among the variables that have been selected.

Notice that there are C(p, k) = p!/(k!(p − k)!) different ways to select k out of p populations without considering the order. Suppose that j indexes one such arrangement, and denote by X_j1, ..., X_jk the top k variables and by X_jk+1, ..., X_jp the bottom p − k. Then, we can separate the sample space according to the events min{X_j1, ..., X_jk} ≥ max{X_jk+1, ..., X_jp}, for j = 1, ..., C(p, k). This way, the coverage probability can be written as

P(θ_(1) ∈ X_(1) ± c, ..., θ_(k) ∈ X_(k) ± c)
= Σ_{j=1}^{C(p,k)} P(θ_j1 ∈ X_j1 ± c, ..., θ_jk ∈ X_jk ± c, min{X_j1, ..., X_jk} ≥ max{X_jk+1, ..., X_jp}).

Let us consider first the term where (X_1, X_2, ..., X_k) are at the top. For this case, the corresponding piece of the coverage probability is

P(θ_1 ∈ X_1 ± c, ..., θ_k ∈ X_k ± c, min{X_1, ..., X_k} ≥ max{X_k+1, ..., X_p})
= ∫_{θ_1−c}^{θ_1+c} ... ∫_{θ_k−c}^{θ_k+c} { ∏_{j=k+1}^{p} P_{θ_j}(X_j ≤ min{x_1, ..., x_k}) } f(x_1, ..., x_k) dx_1 ... dx_k,

where f(x_1, ..., x_k) is the joint density of (X_1, ..., X_k). Hence, making use of the normality assumptions, we have

P(θ_1 ∈ X_1 ± c, ..., θ_k ∈ X_k ± c, min{X_1, ..., X_k} ≥ max{X_k+1, ..., X_p})
= ∫_{−c}^{c} ... ∫_{−c}^{c} { ∏_{j=k+1}^{p} Φ(min{z_1 + θ_1, ..., z_k + θ_k} − θ_j) } ∏_{i=1}^{k} φ(z_i) dz_i,

where z_i = x_i − θ_i for i = 1, ..., k.

From here, it is not difficult to obtain the following expression for the coverage probability:

P(θ_(1) ∈ X_(1) ± c, ..., θ_(k) ∈ X_(k) ± c)
= Σ_{j=1}^{C(p,k)} ∫_{−c}^{c} ... ∫_{−c}^{c} { ∏_{m ∈ I_j^c} Φ( min_{l ∈ I_j} {z_l + θ_l} − θ_m ) } ∏_{l ∈ I_j} φ(z_l) dz_l,   (3-4)

where I_j = {j_1, ..., j_k} is the set of indices of the top k variables in the j-th arrangement and I_j^c = {j_k+1, ..., j_p} is the set of indices of the bottom p − k variables in the j-th arrangement. Notice that if k = 1 we are back in the case discussed in Chapter 2, and the case k = p corresponds to simultaneous confidence intervals.

Let us take a closer look at this formula and consider first the case p = 6 and k = 3. In this case, the sum in (3-4) has C(6, 3) = 20 terms, determined by the configurations below, where the numbers to the left of the vertical bar are the indices of the set I_j (the populations being selected) and the numbers to the right are the indices of the set I_j^c (the populations not being selected):

123|456   124|356   125|346   126|345   134|256
135|246   136|245   145|236   146|235   156|234
234|156   235|146   236|145   245|136   246|135
256|134   345|126   346|125   356|124   456|123

Observe that all the indices appear on the left side (and on the right side) the same number of times (10), revealing some symmetry in the problem.
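Because the multiple integrals in (3-4) quickly become unwieldy, a Monte Carlo estimate of the simultaneous coverage probability is often the most convenient numerical check. The sketch below is an illustration only (the settings p = 6, k = 3, the cutoff and the mean configurations are arbitrary choices); it estimates the probability that all k selected populations are simultaneously covered.

```python
# Monte Carlo estimate of the simultaneous coverage in (3-1)/(3-4) (sketch).
import numpy as np

def sim_coverage(theta, k, c, n_rep=100000, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    p = theta.size
    x = rng.normal(theta, 1.0, size=(n_rep, p))         # one observation per population
    top = np.argsort(x, axis=1)[:, -k:]                 # indices of the k largest observations
    sel_x = np.take_along_axis(x, top, axis=1)
    sel_theta = theta[top]
    # all k selected populations covered by +/- c intervals around their observations
    return np.mean(np.all(np.abs(sel_x - sel_theta) <= c, axis=1))

print(sim_coverage(theta=np.zeros(6), k=3, c=2.6))            # all means equal
print(sim_coverage(theta=[0, 0, 0, 0, 3, 6], k=3, c=2.6))     # some well-separated means
```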

Using this symmetry, suppose that θ_1 ≤ θ_2 ≤ θ_3 ≤ θ_4 ≤ θ_5 ≤ θ_6 and let θ_6 → +∞. Then, for the 10 groups for which index 6 is on the right side, the corresponding term goes to zero. For the remaining groups (for which 6 appears on the left), the value of Φ(min_{l ∈ I_j}{z_l + θ_l} − θ_m) is not affected by θ_6, and the coverage probability is determined by the configurations

12|345   13|245   14|235   15|234   23|145
24|135   25|134   34|125   35|124   45|123

which correspond to the possible ways of choosing 2 out of the 5 remaining populations. Repeating the argument, but now letting θ_5 → +∞, we obtain the configurations

1|234   2|134   3|124   4|123

which are the possible ways to choose 1 out of 4 populations. For this case, we know (from Chapter 2) that the minimum is reached at θ_1 = θ_2 = θ_3 = θ_4. This example suggests that the coverage probability is minimized when the biggest p − k − 1 population means are sent to +∞ and the remaining k + 1 are set to be equal. However, a formal argument is required.

For the general case (1 ≤ k < p), the number of possible configurations is

C(p, k) = C(p − 1, k) + C(p − 1, k − 1),

where C(p − 1, k) is the number of times that any given index j appears on the right side (population j is not selected) and C(p − 1, k − 1) is the number of configurations that have index j on the left side (population j is selected).

Suppose (without any loss of generality) that θ_1 ≤ ... ≤ θ_p and define

I_j(θ_p) = I( min_{l ∈ I_j \ {p}} {z_l + θ_l} ≥ z_p + θ_p ),
I_j^c(θ_p) = I( min_{l ∈ I_j \ {p}} {z_l + θ_l} < z_p + θ_p ),

where I(·) is the indicator function. From the definition, it immediately follows that

min_{l ∈ I_j} {z_l + θ_l} = (z_p + θ_p) I_j(θ_p) + min_{l ∈ I_j \ {p}} {z_l + θ_l} I_j^c(θ_p),   (3-5)

and therefore the coverage probability can be written as

P(θ_(1) ∈ X_(1) ± c, ..., θ_(k) ∈ X_(k) ± c)
= Σ_{j=1}^{C(p,k)} ∫_{−c}^{c} ... ∫_{−c}^{c} { ∏_{m ∈ I_j^c} Φ((z_p + θ_p) − θ_m) } I_j(θ_p) ∏_{l ∈ I_j} φ(z_l) dz_l
+ Σ_{j=1}^{C(p,k)} ∫_{−c}^{c} ... ∫_{−c}^{c} { ∏_{m ∈ I_j^c} Φ( min_{l ∈ I_j \ {p}} {z_l + θ_l} − θ_m ) } I_j^c(θ_p) ∏_{l ∈ I_j} φ(z_l) dz_l.

Now, observe that as θ_p → ∞,

min_{l ∈ I_j} {z_l + θ_l} = (z_p + θ_p) I_j(θ_p) + min_{l ∈ I_j \ {p}} {z_l + θ_l} I_j^c(θ_p) → min_{l ∈ I_j \ {p}} {z_l + θ_l},

and hence

∏_{m ∈ I_j^c} Φ( min_{l ∈ I_j} {z_l + θ_l} − θ_m ) → ∏_{m ∈ I_j^c} Φ( min_{l ∈ I_j \ {p}} {z_l + θ_l} − θ_m )

for all the terms for which θ_p is on the left side. At the same time, for the terms where θ_p is on the right side, we have

∏_{m ∈ I_j^c} Φ( min_{l ∈ I_j} {z_l + θ_l} − θ_m ) → 0,

and therefore, as θ_p → ∞, the coverage probability converges to

Σ_{j=1}^{C(p−1,k−1)} ∫_{−c}^{c} ... ∫_{−c}^{c} { ∏_{m ∈ I_j^c} Φ( min_{l ∈ I_j \ {p}} {z_l + θ_l} − θ_m ) } ∏_{l ∈ I_j} φ(z_l) dz_l.

Before we move forward, let us consider the example p = 3, k = 2. Then, the coverage probability is

P(θ_(1) ∈ X_(1) ± c, θ_(2) ∈ X_(2) ± c)
= Σ_{j=1}^{C(3,2)} ∫_{−c}^{c} ∫_{−c}^{c} { ∏_{m ∈ I_j^c} Φ( min_{l ∈ I_j} {z_l + θ_l} − θ_m ) } ∏_{l ∈ I_j} φ(z_l) dz_l
= ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z_1 + θ_1, z_2 + θ_2} − θ_3) φ(z_1)φ(z_2) dz_1 dz_2
+ ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z_1 + θ_1, z_3 + θ_3} − θ_2) φ(z_1)φ(z_3) dz_1 dz_3   (3-6)
+ ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z_2 + θ_2, z_3 + θ_3} − θ_1) φ(z_2)φ(z_3) dz_2 dz_3,

and, as θ_3 → ∞, we obtain

M = ∫_{−c}^{c} ∫_{−c}^{c} Φ(z_1 + θ_1 − θ_2) φ(z_1)φ(z_3) dz_1 dz_3   (3-7)
+ ∫_{−c}^{c} ∫_{−c}^{c} Φ(z_2 + θ_2 − θ_1) φ(z_2)φ(z_3) dz_2 dz_3.

Suppose now that, for a fixed θ_3, min{z_1 + θ_1, z_3 + θ_3} = z_3 + θ_3. Since we are assuming that θ_1 ≤ θ_2 ≤ θ_3, this can only happen for certain values of z_1 and z_3. Let R_1 = {(z_1, z_3) : min{z_1 + θ_1, z_3 + θ_3} = z_1 + θ_1} and R_2 = {(z_1, z_3) : min{z_1 + θ_1, z_3 + θ_3} = z_3 + θ_3}. Then, the second integral in (3-6) can be written as

∫∫_{R_1} Φ(z_1 + θ_1 − θ_2) φ(z_1)φ(z_3) dz_1 dz_3 + ∫∫_{R_2} Φ(z_3 + θ_3 − θ_2) φ(z_1)φ(z_3) dz_1 dz_3.

Similarly, the first integral in (3-7) can be written as

∫∫_{R_1} Φ(z_1 + θ_1 − θ_2) φ(z_1)φ(z_3) dz_1 dz_3 + ∫∫_{R_2} Φ(z_1 + θ_1 − θ_2) φ(z_1)φ(z_3) dz_1 dz_3,

and, since θ_3 − θ_2 ≥ θ_1 − θ_2, we obtain the comparison

∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z_1 + θ_1, z_3 + θ_3} − θ_2) φ(z_1)φ(z_3) dz_1 dz_3  versus  ∫_{−c}^{c} ∫_{−c}^{c} Φ(z_1 + θ_1 − θ_2) φ(z_1)φ(z_3) dz_1 dz_3,

where the two expressions differ only on R_2. Using a similar argument with the third integral in the coverage probability, we conclude that P(θ_(1) ∈ X_(1) ± c, θ_(2) ∈ X_(2) ± c) ≥ M.

For the general case, suppose that θ_p (fixed) is such that I_j(θ_p) = 1 for some j, that is, min_{l ∈ I_j}{z_l + θ_l} = z_p + θ_p. Under the assumption θ_1 ≤ ... ≤ θ_p, we have θ_p − θ_m ≥ θ_l − θ_m for any 1 ≤ m, l ≤ p, and therefore I_j(θ_p) can be equal to 1 only in a certain region of the hyper-cube (−c, c)^k. Then, partitioning the integrals accordingly, we obtain

P(θ_(1) ∈ X_(1) ± c, ..., θ_(k) ∈ X_(k) ± c)
= Σ_{j=1}^{C(p,k)} ∫_{−c}^{c} ... ∫_{−c}^{c} { ∏_{m ∈ I_j^c} Φ( min_{l ∈ I_j}{z_l + θ_l} − θ_m ) } ∏_{l ∈ I_j} φ(z_l) dz_l
≥ Σ_{j=1}^{C(p−1,k−1)} ∫_{−c}^{c} ... ∫_{−c}^{c} { ∏_{m ∈ I_j^c} Φ( min_{l ∈ I_j \ {p}}{z_l + θ_l} − θ_m ) } ∏_{l ∈ I_j} φ(z_l) dz_l,   (3-8)

where the equality is attained asymptotically as θ_p approaches infinity. Integrating (3-8) with respect to z_p, we obtain

(Φ(c) − Φ(−c)) Σ_{j=1}^{C(p−1,k−1)} ∫_{−c}^{c} ... ∫_{−c}^{c} [ ∏_{m ∈ I_j^c} Φ( min_{l ∈ I_j \ {p}}{z_l + θ_l} − θ_m ) ] ∏_{l ∈ I_j \ {p}} φ(z_l) dz_l,

where the quantity in brackets [ ] is exactly the coverage probability term for selecting k − 1 out of p − 1 populations. Repeating the argument, but now letting θ_{p−1} → ∞, we obtain the lower bound

(Φ(c) − Φ(−c))² Σ_{j=1}^{C(p−2,k−2)} ∫_{−c}^{c} ... ∫_{−c}^{c} [ ∏_{m ∈ I_j^c} Φ( min_{l ∈ I_j \ {p, p−1}}{z_l + θ_l} − θ_m ) ] ∏_{l ∈ I_j \ {p, p−1}} φ(z_l) dz_l.


More information

Lecture 4: September Reminder: convergence of sequences

Lecture 4: September Reminder: convergence of sequences 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 4: September 6 In this lecture we discuss the convergence of random variables. At a high-level, our first few lectures focused

More information

A union of Bayesian, frequentist and fiducial inferences by confidence distribution and artificial data sampling

A union of Bayesian, frequentist and fiducial inferences by confidence distribution and artificial data sampling A union of Bayesian, frequentist and fiducial inferences by confidence distribution and artificial data sampling Min-ge Xie Department of Statistics, Rutgers University Workshop on Higher-Order Asymptotics

More information

Applied Multivariate and Longitudinal Data Analysis

Applied Multivariate and Longitudinal Data Analysis Applied Multivariate and Longitudinal Data Analysis Chapter 2: Inference about the mean vector(s) Ana-Maria Staicu SAS Hall 5220; 919-515-0644; astaicu@ncsu.edu 1 In this chapter we will discuss inference

More information

On Bayesian bandit algorithms

On Bayesian bandit algorithms On Bayesian bandit algorithms Emilie Kaufmann joint work with Olivier Cappé, Aurélien Garivier, Nathaniel Korda and Rémi Munos July 1st, 2012 Emilie Kaufmann (Telecom ParisTech) On Bayesian bandit algorithms

More information

Introduction to Probability

Introduction to Probability LECTURE NOTES Course 6.041-6.431 M.I.T. FALL 2000 Introduction to Probability Dimitri P. Bertsekas and John N. Tsitsiklis Professors of Electrical Engineering and Computer Science Massachusetts Institute

More information

Lecture 21: October 19

Lecture 21: October 19 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 21: October 19 21.1 Likelihood Ratio Test (LRT) To test composite versus composite hypotheses the general method is to use

More information

Review and continuation from last week Properties of MLEs

Review and continuation from last week Properties of MLEs Review and continuation from last week Properties of MLEs As we have mentioned, MLEs have a nice intuitive property, and as we have seen, they have a certain equivariance property. We will see later that

More information

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided Let us first identify some classes of hypotheses. simple versus simple H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided H 0 : θ θ 0 versus H 1 : θ > θ 0. (2) two-sided; null on extremes H 0 : θ θ 1 or

More information

Peter Hoff Minimax estimation November 12, Motivation and definition. 2 Least favorable prior 3. 3 Least favorable prior sequence 11

Peter Hoff Minimax estimation November 12, Motivation and definition. 2 Least favorable prior 3. 3 Least favorable prior sequence 11 Contents 1 Motivation and definition 1 2 Least favorable prior 3 3 Least favorable prior sequence 11 4 Nonparametric problems 15 5 Minimax and admissibility 18 6 Superefficiency and sparsity 19 Most of

More information

Lecture 4. f X T, (x t, ) = f X,T (x, t ) f T (t )

Lecture 4. f X T, (x t, ) = f X,T (x, t ) f T (t ) LECURE NOES 21 Lecture 4 7. Sufficient statistics Consider the usual statistical setup: the data is X and the paramter is. o gain information about the parameter we study various functions of the data

More information

Statistical Data Analysis Stat 3: p-values, parameter estimation

Statistical Data Analysis Stat 3: p-values, parameter estimation Statistical Data Analysis Stat 3: p-values, parameter estimation London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway,

More information

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics A short review of the principles of mathematical statistics (or, what you should have learned in EC 151).

More information

Grundlagen der Künstlichen Intelligenz

Grundlagen der Künstlichen Intelligenz Grundlagen der Künstlichen Intelligenz Uncertainty & Probabilities & Bandits Daniel Hennes 16.11.2017 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Uncertainty Probability

More information

Exact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring

Exact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring Exact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring A. Ganguly, S. Mitra, D. Samanta, D. Kundu,2 Abstract Epstein [9] introduced the Type-I hybrid censoring scheme

More information

Analysis of Type-II Progressively Hybrid Censored Data

Analysis of Type-II Progressively Hybrid Censored Data Analysis of Type-II Progressively Hybrid Censored Data Debasis Kundu & Avijit Joarder Abstract The mixture of Type-I and Type-II censoring schemes, called the hybrid censoring scheme is quite common in

More information

Algorithm-Independent Learning Issues

Algorithm-Independent Learning Issues Algorithm-Independent Learning Issues Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2007 c 2007, Selim Aksoy Introduction We have seen many learning

More information

Introduction to Probability and Statistics (Continued)

Introduction to Probability and Statistics (Continued) Introduction to Probability and Statistics (Continued) Prof. icholas Zabaras Center for Informatics and Computational Science https://cics.nd.edu/ University of otre Dame otre Dame, Indiana, USA Email:

More information

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations John R. Michael, Significance, Inc. and William R. Schucany, Southern Methodist University The mixture

More information

1 Lyapunov theory of stability

1 Lyapunov theory of stability M.Kawski, APM 581 Diff Equns Intro to Lyapunov theory. November 15, 29 1 1 Lyapunov theory of stability Introduction. Lyapunov s second (or direct) method provides tools for studying (asymptotic) stability

More information

Spring 2012 Math 541B Exam 1

Spring 2012 Math 541B Exam 1 Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote

More information

CS 540: Machine Learning Lecture 2: Review of Probability & Statistics

CS 540: Machine Learning Lecture 2: Review of Probability & Statistics CS 540: Machine Learning Lecture 2: Review of Probability & Statistics AD January 2008 AD () January 2008 1 / 35 Outline Probability theory (PRML, Section 1.2) Statistics (PRML, Sections 2.1-2.4) AD ()

More information

Uncertain Inference and Artificial Intelligence

Uncertain Inference and Artificial Intelligence March 3, 2011 1 Prepared for a Purdue Machine Learning Seminar Acknowledgement Prof. A. P. Dempster for intensive collaborations on the Dempster-Shafer theory. Jianchun Zhang, Ryan Martin, Duncan Ermini

More information

MULTIPLE TESTING PROCEDURES AND SIMULTANEOUS INTERVAL ESTIMATES WITH THE INTERVAL PROPERTY

MULTIPLE TESTING PROCEDURES AND SIMULTANEOUS INTERVAL ESTIMATES WITH THE INTERVAL PROPERTY MULTIPLE TESTING PROCEDURES AND SIMULTANEOUS INTERVAL ESTIMATES WITH THE INTERVAL PROPERTY BY YINGQIU MA A dissertation submitted to the Graduate School New Brunswick Rutgers, The State University of New

More information

Applying the Benjamini Hochberg procedure to a set of generalized p-values

Applying the Benjamini Hochberg procedure to a set of generalized p-values U.U.D.M. Report 20:22 Applying the Benjamini Hochberg procedure to a set of generalized p-values Fredrik Jonsson Department of Mathematics Uppsala University Applying the Benjamini Hochberg procedure

More information

Unit 5a: Comparisons via Simulation. Kwok Tsui (and Seonghee Kim) School of Industrial and Systems Engineering Georgia Institute of Technology

Unit 5a: Comparisons via Simulation. Kwok Tsui (and Seonghee Kim) School of Industrial and Systems Engineering Georgia Institute of Technology Unit 5a: Comparisons via Simulation Kwok Tsui (and Seonghee Kim) School of Industrial and Systems Engineering Georgia Institute of Technology Motivation Simulations are typically run to compare 2 or more

More information

FDR and ROC: Similarities, Assumptions, and Decisions

FDR and ROC: Similarities, Assumptions, and Decisions EDITORIALS 8 FDR and ROC: Similarities, Assumptions, and Decisions. Why FDR and ROC? It is a privilege to have been asked to introduce this collection of papers appearing in Statistica Sinica. The papers

More information

Econ 2140, spring 2018, Part IIa Statistical Decision Theory

Econ 2140, spring 2018, Part IIa Statistical Decision Theory Econ 2140, spring 2018, Part IIa Maximilian Kasy Department of Economics, Harvard University 1 / 35 Examples of decision problems Decide whether or not the hypothesis of no racial discrimination in job

More information

Module 1. Probability

Module 1. Probability Module 1 Probability 1. Introduction In our daily life we come across many processes whose nature cannot be predicted in advance. Such processes are referred to as random processes. The only way to derive

More information

STRONG FORMS OF ORTHOGONALITY FOR SETS OF HYPERCUBES

STRONG FORMS OF ORTHOGONALITY FOR SETS OF HYPERCUBES The Pennsylvania State University The Graduate School Department of Mathematics STRONG FORMS OF ORTHOGONALITY FOR SETS OF HYPERCUBES A Dissertation in Mathematics by John T. Ethier c 008 John T. Ethier

More information

Chapter Three. Hypothesis Testing

Chapter Three. Hypothesis Testing 3.1 Introduction The final phase of analyzing data is to make a decision concerning a set of choices or options. Should I invest in stocks or bonds? Should a new product be marketed? Are my products being

More information

Chapter 4: Asymptotic Properties of the MLE

Chapter 4: Asymptotic Properties of the MLE Chapter 4: Asymptotic Properties of the MLE Daniel O. Scharfstein 09/19/13 1 / 1 Maximum Likelihood Maximum likelihood is the most powerful tool for estimation. In this part of the course, we will consider

More information

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Ann Inst Stat Math (0) 64:359 37 DOI 0.007/s0463-00-036-3 Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Paul Vos Qiang Wu Received: 3 June 009 / Revised:

More information

Lecture notes on statistical decision theory Econ 2110, fall 2013

Lecture notes on statistical decision theory Econ 2110, fall 2013 Lecture notes on statistical decision theory Econ 2110, fall 2013 Maximilian Kasy March 10, 2014 These lecture notes are roughly based on Robert, C. (2007). The Bayesian choice: from decision-theoretic

More information

A BAYESIAN MATHEMATICAL STATISTICS PRIMER. José M. Bernardo Universitat de València, Spain

A BAYESIAN MATHEMATICAL STATISTICS PRIMER. José M. Bernardo Universitat de València, Spain A BAYESIAN MATHEMATICAL STATISTICS PRIMER José M. Bernardo Universitat de València, Spain jose.m.bernardo@uv.es Bayesian Statistics is typically taught, if at all, after a prior exposure to frequentist

More information

Computational statistics

Computational statistics Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated

More information

Post-Selection Inference

Post-Selection Inference Classical Inference start end start Post-Selection Inference selected end model data inference data selection model data inference Post-Selection Inference Todd Kuffner Washington University in St. Louis

More information

Least squares under convex constraint

Least squares under convex constraint Stanford University Questions Let Z be an n-dimensional standard Gaussian random vector. Let µ be a point in R n and let Y = Z + µ. We are interested in estimating µ from the data vector Y, under the assumption

More information

Supplementary appendix to the paper Hierarchical cheap talk Not for publication

Supplementary appendix to the paper Hierarchical cheap talk Not for publication Supplementary appendix to the paper Hierarchical cheap talk Not for publication Attila Ambrus, Eduardo M. Azevedo, and Yuichiro Kamada December 3, 011 1 Monotonicity of the set of pure-strategy equilibria

More information

Lecture Notes 1: Decisions and Data. In these notes, I describe some basic ideas in decision theory. theory is constructed from

Lecture Notes 1: Decisions and Data. In these notes, I describe some basic ideas in decision theory. theory is constructed from Topics in Data Analysis Steven N. Durlauf University of Wisconsin Lecture Notes : Decisions and Data In these notes, I describe some basic ideas in decision theory. theory is constructed from The Data:

More information

Introduction: MLE, MAP, Bayesian reasoning (28/8/13)

Introduction: MLE, MAP, Bayesian reasoning (28/8/13) STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this

More information

Interval Estimation. Chapter 9

Interval Estimation. Chapter 9 Chapter 9 Interval Estimation 9.1 Introduction Definition 9.1.1 An interval estimate of a real-values parameter θ is any pair of functions, L(x 1,..., x n ) and U(x 1,..., x n ), of a sample that satisfy

More information

Introduction to Machine Learning. Lecture 2

Introduction to Machine Learning. Lecture 2 Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for

More information

Comparing Non-informative Priors for Estimation and. Prediction in Spatial Models

Comparing Non-informative Priors for Estimation and. Prediction in Spatial Models Comparing Non-informative Priors for Estimation and Prediction in Spatial Models Vigre Semester Report by: Regina Wu Advisor: Cari Kaufman January 31, 2010 1 Introduction Gaussian random fields with specified

More information

Comment on The Veil of Public Ignorance

Comment on The Veil of Public Ignorance Comment on The Veil of Public Ignorance Geoffroy de Clippel February 2010 Nehring (2004) proposes an interesting methodology to extend the utilitarian criterion defined under complete information to an

More information

Practice Problems Section Problems

Practice Problems Section Problems Practice Problems Section 4-4-3 4-4 4-5 4-6 4-7 4-8 4-10 Supplemental Problems 4-1 to 4-9 4-13, 14, 15, 17, 19, 0 4-3, 34, 36, 38 4-47, 49, 5, 54, 55 4-59, 60, 63 4-66, 68, 69, 70, 74 4-79, 81, 84 4-85,

More information

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b)

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b) LECTURE 5 NOTES 1. Bayesian point estimators. In the conventional (frequentist) approach to statistical inference, the parameter θ Θ is considered a fixed quantity. In the Bayesian approach, it is considered

More information

Plausible Values for Latent Variables Using Mplus

Plausible Values for Latent Variables Using Mplus Plausible Values for Latent Variables Using Mplus Tihomir Asparouhov and Bengt Muthén August 21, 2010 1 1 Introduction Plausible values are imputed values for latent variables. All latent variables can

More information

Sequential Decisions

Sequential Decisions Sequential Decisions A Basic Theorem of (Bayesian) Expected Utility Theory: If you can postpone a terminal decision in order to observe, cost free, an experiment whose outcome might change your terminal

More information

Naive Bayes classification

Naive Bayes classification Naive Bayes classification Christos Dimitrakakis December 4, 2015 1 Introduction One of the most important methods in machine learning and statistics is that of Bayesian inference. This is the most fundamental

More information

high-dimensional inference robust to the lack of model sparsity

high-dimensional inference robust to the lack of model sparsity high-dimensional inference robust to the lack of model sparsity Jelena Bradic (joint with a PhD student Yinchu Zhu) www.jelenabradic.net Assistant Professor Department of Mathematics University of California,

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and

More information

A proof of the existence of good nested lattices

A proof of the existence of good nested lattices A proof of the existence of good nested lattices Dinesh Krithivasan and S. Sandeep Pradhan July 24, 2007 1 Introduction We show the existence of a sequence of nested lattices (Λ (n) 1, Λ(n) ) with Λ (n)

More information

Marginal Screening and Post-Selection Inference

Marginal Screening and Post-Selection Inference Marginal Screening and Post-Selection Inference Ian McKeague August 13, 2017 Ian McKeague (Columbia University) Marginal Screening August 13, 2017 1 / 29 Outline 1 Background on Marginal Screening 2 2

More information

Statistics 3858 : Maximum Likelihood Estimators

Statistics 3858 : Maximum Likelihood Estimators Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,

More information

11. Learning graphical models

11. Learning graphical models Learning graphical models 11-1 11. Learning graphical models Maximum likelihood Parameter learning Structural learning Learning partially observed graphical models Learning graphical models 11-2 statistical

More information

Model Complexity of Pseudo-independent Models

Model Complexity of Pseudo-independent Models Model Complexity of Pseudo-independent Models Jae-Hyuck Lee and Yang Xiang Department of Computing and Information Science University of Guelph, Guelph, Canada {jaehyuck, yxiang}@cis.uoguelph,ca Abstract

More information

3 Undirected Graphical Models

3 Undirected Graphical Models Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 3 Undirected Graphical Models In this lecture, we discuss undirected

More information

Bayesian Econometrics

Bayesian Econometrics Bayesian Econometrics Christopher A. Sims Princeton University sims@princeton.edu September 20, 2016 Outline I. The difference between Bayesian and non-bayesian inference. II. Confidence sets and confidence

More information

Testing Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata

Testing Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata Maura Department of Economics and Finance Università Tor Vergata Hypothesis Testing Outline It is a mistake to confound strangeness with mystery Sherlock Holmes A Study in Scarlet Outline 1 The Power Function

More information

Central Limit Theorem ( 5.3)

Central Limit Theorem ( 5.3) Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately

More information

1 Hypothesis Testing and Model Selection

1 Hypothesis Testing and Model Selection A Short Course on Bayesian Inference (based on An Introduction to Bayesian Analysis: Theory and Methods by Ghosh, Delampady and Samanta) Module 6: From Chapter 6 of GDS 1 Hypothesis Testing and Model Selection

More information