INTERVAL ESTIMATION FOR THE MEAN OF THE SELECTED POPULATIONS
INTERVAL ESTIMATION FOR THE MEAN OF THE SELECTED POPULATIONS

By

CLAUDIO FUENTES

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
2011
© 2011 Claudio Fuentes
To my parents, who have been there in every step
ACKNOWLEDGMENTS

I would like to gratefully and sincerely thank Dr. George Casella for his guidance, understanding and patience during my graduate studies at the University of Florida. Working with him, as a research assistant and as a student, has been one of the most rewarding experiences of my life. His wealth of knowledge and experience has shaped the way I understand statistics today. I would also like to thank my graduate committee members, Dr. Michael Daniels, Dr. Malay Ghosh and Dr. Gary Peter, for their understanding and support throughout the whole process. Their sharp comments and suggestions have greatly improved the quality of this work.

I am deeply grateful to all my teachers and professors, in particular those at the University of Florida and the Pontificia Universidad Católica de Chile. It is not an exaggeration to say that almost everything I know today is the product of their dedication and excellence at teaching. Without any doubt, they taught me more than I could learn. Thank you, Dr. Alvaro Cofré: I would not be here writing these lines if it were not for your constant support and inspiration.

Finally, I would like to thank my parents, Jorge Fuentes and Edith Meléndez. It is because of their unconditional love and support that I have been able to come this far.
TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

INTRODUCTION
    Two Formulations of the Problem
    Inference on the Selected Mean
INTERVAL ESTIMATION FOLLOWING THE SELECTION OF ONE POPULATION
    The Known Variance Case
    The Unknown Variance Case
    Numerical Studies
    Tables and Figures
CONFIDENCE INTERVALS FOLLOWING THE SELECTION OF k ≥ 1 POPULATIONS
    An Alternative Approach
    Numerical Studies
    Tables and Figures
INTERVAL ESTIMATION FOLLOWING THE SELECTION OF A RANDOM NUMBER OF POPULATIONS
    Connection to FDR
    Tables and Figures
APPLICATION EXAMPLE
    Fixed Selection
    Random Selection
    Tables and Figures
CONCLUSIONS

LIST OF REFERENCES
BIOGRAPHICAL SKETCH
LIST OF TABLES

2-1 Configuration of the new parameterization for the coverage probability
2-2 Configuration of the new parameterization for the case p = 3
2-3 Representation of the parameters Δ_{i,j} when p = k + 1
2-4 Coverage probability of 95% CI for the selected mean when p = 4
Structure of the Δs for the case p = 4, k = 2
Coverage probabilities for the number of population means vs the number of selected populations
Observed confidence coefficient for 95% CI
Cutoff points for 95% CI using the new method
Confidence intervals for fixed top log-score differences
Confidence intervals for random top log-score differences
LIST OF FIGURES

2-1 Coverage probability as a function of Δ_21 and Δ_32 when p = 3
2-2 Plot of ∂h/∂Δ_21 when p = 3
2-3 Plots of the first two terms of ∂h/∂Δ_21
2-4 Confidence coefficient vs the number of populations for the iid case and α = 0.05
2-5 Cutoff point versus number of populations for the iid case and α = 0.05
Coverage probabilities as a function of Δ
Individual components for the coverage probability for random K
Lower bound for random K varying the selection probability
Coverage probabilities for random K for different values of p
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

INTERVAL ESTIMATION FOR THE MEAN OF THE SELECTED POPULATIONS

By Claudio Fuentes
August 2011

Chair: Dr. George Casella
Major: Statistics

Consider an experiment in which p independent populations π_i, with corresponding unknown means θ_i, are available, and suppose that for every 1 ≤ i ≤ p we can obtain a sample X_{i1},..., X_{in} from π_i. In this context, researchers are sometimes interested in selecting the populations that give the largest sample means as a result of the experiment, and in estimating the corresponding population means θ_i. In this dissertation, we present a frequentist approach to the problem, based on the minimization of the coverage probability, and discuss how to construct confidence intervals for the mean of k ≥ 1 selected populations, assuming the populations π_i are normal and have a common variance σ². Finally, we extend the results to the case where the value of k is randomly chosen, and discuss the potential connection of the procedure with false discovery rate analysis. We include numerical studies and a real application example that corroborate that this new approach produces confidence intervals that maintain the nominal coverage probability while taking into account the selection procedure.
CHAPTER 1
INTRODUCTION

Given a set of p available technologies (treatments, machines, etc.), researchers must often determine which one is the best, or simply rank them according to a certain pre-specified criterion. For instance, researchers may be interested in determining which treatment is more efficient in fighting a certain disease, or they could be interested in ranking a class of vehicles following a safety standard. This type of problem is known as a ranking and selection problem, and specific solutions and procedures have been proposed in the literature since the second half of the 20th century, with a start that is usually traced back to Bechhofer (1954) and Gupta and Sobel (1957).

In his paper, Bechhofer presents a single-sample multiple-decision procedure for ranking means of normal populations. Assuming the variances of the populations are known, he is able to obtain closed form expressions for the probabilities of a correct ranking in different scenarios. This approach is more concerned with selection of the population with the largest mean rather than estimation of that mean. Gupta and co-authors have pioneered the subset selection approach, in which a subset of populations is selected with a guaranteed minimum probability P of containing the population with the largest mean (see Gupta and Panchapakesan (2002)). Bechhofer, instead, uses an indifference zone: there is a minimum guaranteed probability of selecting the population with the largest mean, as long as that mean is separated from the second largest by a specified distance δ (see Bechhofer et al. (1995)).

1.1 Two Formulations of the Problem

Here we are concerned with estimation, and describe two formulations of this problem, with subtle differences between them. Suppose that we have p populations π_i, with unknown means θ_i (1 ≤ i ≤ p). Assuming that for every 1 ≤ i ≤ p we can obtain a sample X_{i1},..., X_{in_i} from the population π_i, we can either:

1. Select the population that has the largest parameter, max{θ_1,..., θ_p}, and estimate its value.
2. Select the population with the largest sample mean, and estimate the corresponding θ_i.

The first of these problems has been widely discussed in the literature. For example, Blumenthal and Cohen (1968) consider estimating the larger mean from two normal populations and compare different estimators, but they do not discuss how to make the selection. In this direction, Guttman and Tiao (1964) propose a Bayesian procedure consisting of the maximization of the expected posterior utility for a certain utility function U(θ_i). In the same direction, but from a frequentist perspective, Saxena and Tong (1969), Saxena (1976), and Chen and Dudewicz (1976) consider point and interval estimation of the largest mean.

1.2 Inference on the Selected Mean

Surprisingly, the second problem has received less attention. In this context, a common and widely used estimator is

δ(X) = Σ_{i=1}^p X_i I(X_i = X_(p)),

where X_(p) = max_i X_i. This estimator has been discussed in the literature and is known to be biased (Putter and Rubinstein (1968)). The issue becomes clear if we consider all the populations to be identically distributed, for then we are estimating the population mean by an extreme value. Dahiya (1974) addresses this problem for the case of two normal populations and proposes estimators that perform better in terms of the MSE. Progress was made by Cohen and Sackrowitz (1982), Cohen and Sackrowitz (1986) and Gupta and Miescke (1990), where Bayes and generalized Bayes rules were obtained and studied. However, performance theorems are scarce. One exception is Hwang (1993), who proposes an empirical Bayes estimator and shows that it performs better in terms of the Bayes risk with respect to any normal prior. Another exception is Sackrowitz and Samuel-Cahn (1984) who, in the case of the negative exponential distribution, find UMVUE and minimax estimators of the mean of the selected population. The problem of improving the intuitive estimator is technically difficult.
In addition, despite the obvious bias problem, it has been difficult to establish its optimality
properties. Standard investigations of admissibility and minimaxity, following ideas such as Berger (1976), Brown (1979) and Lele (1993), are not straightforward. In this direction, Stein (1964) established the minimaxity and admissibility of the naive estimator for k = 2. Minimaxity for the general case was established later by Sackrowitz and Samuel-Cahn (1986), where they discussed the normal case for k ≥ 3. Admissibility for the general case appears to still be open.

Interval estimation is equally challenging and, again, little can be found in the literature. Typically, confidence intervals are constructed in the usual way, using the standard normal distribution as a reference to attain the desired coverage probability. However, these intervals do not maintain the nominal coverage probability as the number of populations increases. Qiu and Hwang (2007) propose an empirical Bayes approach to construct simultaneous confidence intervals for K selected means, but we are not aware of any other attempts to solve this problem. In their paper, Qiu and Hwang consider a normal-normal model for the mean of the selected population, which assumes that each population mean θ_i follows a normal distribution. Under these assumptions they are able to construct simultaneous confidence intervals that maintain the nominal coverage probability and are substantially shorter than the intervals constructed using Bonferroni bounds. However, the confidence intervals they propose are asymptotically optimal, and since their coverage probabilities are obtained by averaging over both the sample space and the prior, they do not give a valid frequentist interval.

Recently, a modern variation of this problem has become very popular, a major reason being the explosion of genomic data, calling for the development of new methodologies.
For instance, in genomic studies, looking either for differential expression or genome-wide association, thousands of genes are screened, but only a smaller number are selected for further study. Consequently, the assessment of
significance, through testing or interval estimation, must take this selection mechanism into account. If the usual confidence intervals are used (not accounting for selection), the actual confidence coefficient is smaller than the nominal level, and approaches zero as the number of genes (populations) increases.

In this dissertation, we address the problem of interval estimation and present a frequentist approach to construct confidence intervals for the means of the selected populations, where the selection mechanisms are properly described in the corresponding chapters. In Chapter 2 we focus on the problem of selecting one population. In Chapter 3 we introduce a novel methodology to produce confidence intervals when selecting k > 1 populations, where k is a fixed and known number. Later, in Chapter 4, we extend the results to the case k > 1 when k is a random quantity. Finally, in Chapter 5 we discuss the main conclusions and possible extensions of the results presented in this dissertation.
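The selection effect just described is easy to reproduce. The following short simulation (our own illustrative sketch, not part of the dissertation; all function names are ours) estimates the coverage of the usual z-interval for the mean of the selected population when all true means are equal:

```python
import numpy as np

rng = np.random.default_rng(0)

def naive_coverage(p, reps=40000, c=1.96):
    """Coverage of the usual interval X_(1) +/- c for theta_(1) when
    theta_1 = ... = theta_p = 0 and sigma = n = 1.  The selected true mean
    is always 0, so we only check whether |X_(1)| <= c."""
    x = rng.standard_normal((reps, p))  # one observation per population
    xmax = x.max(axis=1)                # X_(1), the largest sample mean
    return float(np.mean(np.abs(xmax) <= c))

for p in (1, 2, 5, 10, 30):
    print(p, round(naive_coverage(p), 3))
```

In this iid setting the exact coverage is Φ^p(c) − Φ^p(−c), so the printed values fall from roughly 0.95 toward zero as p grows.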
CHAPTER 2
INTERVAL ESTIMATION FOLLOWING THE SELECTION OF ONE POPULATION

For 1 ≤ i ≤ p, let X_{i1},..., X_{in} be a random sample from a population π_i with unknown mean θ_i and variance σ². Assume the populations π_i are independent and normally distributed, so that the sample mean X_i = n⁻¹ Σ_{j=1}^n X_{ij} ~ N(θ_i, σ²/n) for i = 1,..., p, and define the order statistics X_(1),..., X_(p) as the sample values placed in descending order. In other words, the order statistics satisfy X_(1) ≥ ... ≥ X_(p). In this context, we want to construct confidence intervals for the mean of the population that gives the largest sample mean as a result of the experiment. Formally, if we define θ_(1) = Σ_{i=1}^p θ_i I(X_i = X_(1)), our aim is to produce confidence intervals for θ_(1), based on X_(1), such that the confidence coefficient is at least 1 − α, for any 0 < α < 1 specified prior to the experiment.

It is not difficult to realize that the standard confidence intervals do not maintain the nominal coverage probability. For instance, if all the populations π_i are normally distributed with mean θ and variance 1, then, for samples of size n = 1, X_1,..., X_p are iid N(θ, 1). It follows that P(X_(1) ≤ x) = Φ^p(x − θ), where Φ(·) denotes the cdf of the standard normal distribution. Moreover, the mean of the selected population is θ_(1) = θ, and hence

P(θ_(1) ∈ X_(1) ± c) = Φ^p(c) − Φ^p(−c)

for any value of c > 0. In particular, when p = 3, we obtain

P(θ_(1) ∈ X_(1) ± c) = Φ³(c) − Φ³(−c)
                     = (Φ(c) − Φ(−c))(Φ²(c) + Φ(c)Φ(−c) + Φ²(−c))
                     = (2Φ(c) − 1)(1 − Φ(c) + Φ²(c)).
Since 1 − Φ(c) + Φ²(c) < 1, the coverage of the standard confidence interval is smaller than the nominal level given by 2Φ(c) − 1. In fact, it is easy to show that the coverage probability maintains the nominal level only for p = 1 and 2, and then decreases to zero as p goes to infinity. The problem is that the traditional intervals do not take into account the selection mechanism. Thus, in order to construct confidence intervals that maintain the nominal level, we must take into account the selection procedure. To this end, we first consider the partition of the sample space induced by the order statistics and write

P(θ_(1) ∈ X_(1) ± c) = Σ_{i=1}^p P(θ_i ∈ X_i ± c, X_i = X_(1)).   (2-1)

Observe that each term in the sum (2-1) can be explicitly determined using the joint distribution of (X_1,..., X_p). For example, when i = 1 (the first term of the sum), we have

P(θ_1 ∈ X_1 ± c, X_1 = X_(1)) = P(θ_1 ∈ X_1 ± c, X_1 ≥ X_2,..., X_1 ≥ X_p).   (2-2)

In the next section we derive a closed form expression for the coverage probability in (2-1), assuming the population variance σ² is known, and present a new approach to obtain the desired confidence intervals.

2.1 The Known Variance Case

Suppose the population variance σ² is known and define Z_j = √n(X_j − θ_j)/σ for j = 1,..., p. It follows that Z_1,..., Z_p are iid N(0, 1) and

X_1 ≥ X_j ⟺ √n(X_1 − θ_1)/σ ≥ √n(X_j − θ_j + θ_j − θ_1)/σ ⟺ Z_1 ≥ Z_j + Δ_{j1} ⟺ Z_1 − Z_j ≥ Δ_{j1},

where Δ_{j1} = √n(θ_j − θ_1)/σ for j = 1,..., p. At this point, to simplify the notation, we take n = σ² = 1. Then, if we consider the transformation
     z = z_1
T :  ω_2 = z_1 − z_2
     ...
     ω_p = z_1 − z_p

we can rewrite (2-2) in terms of Δ_{21},..., Δ_{p1}, and obtain

P(θ_1 ∈ X_1 ± c, X_1 ≥ X_2,..., X_1 ≥ X_p) = P(|z| ≤ c, ω_2 ≥ Δ_{21},..., ω_p ≥ Δ_{p1})
= (1/(2π)^{p/2}) ∫_{−c}^{c} { ∏_{j=2}^{p} ∫_{Δ_{j1}}^{∞} e^{−(ω_j − z)²/2} dω_j } e^{−z²/2} dz.

Notice that for fixed z, the integrals within the curly brackets {·} are essentially the tail probability of a normal distribution centered at z. Therefore, we can write

P(|z| ≤ c, ω_2 ≥ Δ_{21},..., ω_p ≥ Δ_{p1}) = ∫_{−c}^{c} { ∏_{j=2}^{p} Φ(z − Δ_{j1}) } φ(z) dz,

where φ(·) denotes the pdf of the standard normal distribution. Of course, the same argument is valid for the remaining terms of the sum in (2-1). It follows that we can fully describe the probability P(θ_(1) ∈ X_(1) ± c) in terms of a new set of parameters Δ_ij, where Δ_ij = θ_i − θ_j for 1 ≤ i, j ≤ p. Under this representation, for every c > 0, the value of the coverage probability P(θ_(1) ∈ X_(1) ± c) is determined by the relative distances between the population means θ_i, i = 1,..., p. In other words, the coverage probability defines a function h_c(Δ) = P(θ_(1) ∈ X_(1) ± c), where Δ = (Δ_11, Δ_12,..., Δ_pp) is the vector of possible configurations of the relative distances Δ_ij. In this context, we can obtain confidence intervals for θ_(1) that have (at least) the right nominal level by first minimizing the function h_c. Specifically, given 0 < α < 1, we can determine the value of c > 0 that satisfies

P(θ_(1) ∈ X_(1) ± c) ≥ min_Δ h_c(Δ) = 1 − α.   (2-3)
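As a numerical sanity check of this representation (our own sketch, not from the dissertation; the particular values of c and of the distances are arbitrary), the p = 3 coverage probability can be evaluated by one-dimensional quadrature and compared against a direct Monte Carlo simulation:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def coverage_p3(c, d21, d32):
    """h_c(d21, d32) = P(theta_(1) in X_(1) +/- c) for p = 3 (sigma = n = 1),
    with d21 = theta_2 - theta_1 >= 0 and d32 = theta_3 - theta_2 >= 0."""
    terms = (
        lambda z: norm.cdf(z - d21) * norm.cdf(z - d21 - d32),  # X_1 selected
        lambda z: norm.cdf(z + d21) * norm.cdf(z - d32),        # X_2 selected
        lambda z: norm.cdf(z + d32) * norm.cdf(z + d21 + d32),  # X_3 selected
    )
    return sum(quad(lambda z, g=g: g(z) * norm.pdf(z), -c, c)[0] for g in terms)

def coverage_mc(c, d21, d32, reps=200000, seed=1):
    """Monte Carlo version of the same probability."""
    rng = np.random.default_rng(seed)
    theta = np.array([0.0, d21, d21 + d32])
    x = theta + rng.standard_normal((reps, 3))
    sel = x.argmax(axis=1)  # index of the population with the largest observation
    return float(np.mean(np.abs(x[np.arange(reps), sel] - theta[sel]) <= c))

print(coverage_p3(1.96, 0.5, 1.0), coverage_mc(1.96, 0.5, 1.0))
```

At d21 = d32 = 0 the quadrature reduces to Φ³(c) − Φ³(−c), the iid value derived earlier.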
In order to minimize the function h_c, we first notice the following properties of the parameters Δ_ij:

1. Δ_jj = 0, for every j.
2. Δ_ij = −Δ_ji, for every i, j.
3. For j > k, Δ_jk = Δ_{j,j−1} + Δ_{j−1,j−2} + ... + Δ_{k+1,k}.

These properties reveal a certain underlying symmetry in the structure of the problem. This symmetry is portrayed in Table 2-1, where every entry Δ_ij corresponds to the difference between the values of θ_i and θ_j located in row i and column j, respectively. In addition, Property 3 indicates that we only need to consider p − 1 parameters in order to determine the value of P(θ_(1) ∈ X_(1) ± c). In fact, for any given ordering of the parameters θ_i, we can always choose a representation of the probability in (2-1) based on p − 1 parameters Δ_ij. As a result, the true ordering of the population means θ_i is not particularly relevant in this approach, and hence we will assume (without any loss of generality) that θ_1 ≤ θ_2 ≤ ... ≤ θ_p.

Although the introduction of the new parameterization seems to reduce (in a sense) the complexity of the problem, the minimization of h_c is still difficult: first, because of the delicate balance existing between the Δ_ij in the full expression (see Table 2-1), and second, because the formula for the coverage probability is somewhat involved. To illustrate these problems, let us discuss the case p = 2. We have

P(θ_(1) ∈ X_(1) ± c) = ∫_{−c}^{c} Φ(z − Δ_21)φ(z)dz + ∫_{−c}^{c} Φ(z + Δ_21)φ(z)dz
                     = ∫_{−c}^{c} [Φ(z − Δ_21) + Φ(z + Δ_21)]φ(z)dz,

where Δ_21 ≥ 0. Since only the quantity in brackets [·] depends on Δ_21 and φ(z) > 0, it seems reasonable to think that h_c(Δ_21) = P(θ_(1) ∈ X_(1) ± c) is minimized at the same point where g_z(Δ_21) = Φ(z − Δ_21) + Φ(z + Δ_21) attains its minimum. However, differentiating g_z
with respect to Δ_21 we obtain

dg_z/dΔ_21 = φ(z + Δ_21) − φ(z − Δ_21)  { ≥ 0, z ≤ 0;  < 0, z > 0 },

where we observe that the sign of the derivative changes with z, and consequently the minimum of h_c cannot be determined by simple examination of the behavior of g_z. From the analysis of g_z, we conclude that g_z(Δ_21) is minimized at Δ_21 = 0 when z ≤ 0, and (asymptotically) at Δ_21 = +∞ when z > 0. Then, we can establish the inequality

P(θ_(1) ∈ X_(1) ± c) ≥ ∫_{−c}^{0} 2Φ(z)φ(z)dz + ∫_{0}^{c} φ(z)dz;

however, this lower bound is not obtained by direct minimization of the coverage probability and is less appealing. The problem is that a strategy based on this type of lower bound may be too conservative and lead to extremely wide intervals when applied to higher dimensions (p > 2).

In order to find a formal solution to the minimization problem, we start with the case p = 3. For this case, we can fully describe the probability of interest in terms of the two parameters Δ_21 and Δ_32, as shown in Table 2-2. We obtain

P(θ_(1) ∈ X_(1) ± c) = (1/√(2π)) ∫_{−c}^{c} Φ(z − Δ_21)Φ(z − Δ_21 − Δ_32) e^{−z²/2} dz   (2-4)
                     + (1/√(2π)) ∫_{−c}^{c} Φ(z + Δ_21)Φ(z − Δ_32) e^{−z²/2} dz
                     + (1/√(2π)) ∫_{−c}^{c} Φ(z + Δ_32)Φ(z + Δ_21 + Δ_32) e^{−z²/2} dz,

where Δ_21, Δ_32 ≥ 0 and Φ(·) denotes the cdf of the standard normal distribution. Preliminary studies suggest that the global minimum of h_c(Δ_21, Δ_32) = P(θ_(1) ∈ X_(1) ± c) is located at the origin (see Figure 2-1), but a formal proof is required. To this end, it is sufficient to show that ∂h_c/∂Δ_21 > 0 and ∂h_c/∂Δ_32 > 0.
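Before proving this, the claimed monotonicity can be probed numerically. The sketch below (ours, not from the dissertation) computes the p = 3 coverage probability by quadrature and checks, by central finite differences on a small grid, that it is non-decreasing in the first distance parameter:

```python
from scipy.integrate import quad
from scipy.stats import norm

def h(c, d21, d32):
    """p = 3 coverage probability h_c(d21, d32), with sigma = n = 1."""
    f = lambda z: (norm.cdf(z - d21) * norm.cdf(z - d21 - d32)
                   + norm.cdf(z + d21) * norm.cdf(z - d32)
                   + norm.cdf(z + d32) * norm.cdf(z + d21 + d32)) * norm.pdf(z)
    return quad(f, -c, c, epsabs=1e-12, epsrel=1e-12)[0]

# Central finite differences of h in d21 on a small grid: none is negative,
# in line with the claim that the derivative is positive.
eps = 1e-4
for d21 in (0.1, 0.5, 2.0):
    for d32 in (0.0, 0.7, 3.0):
        deriv = (h(1.96, d21 + eps, d32) - h(1.96, d21 - eps, d32)) / (2 * eps)
        print(d21, d32, deriv > -1e-6)
```

The grid points here are arbitrary; a formal argument is still required, and is developed next.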
Taking partial derivatives with respect to Δ_21 we obtain

∂h_c/∂Δ_21 = (1/(2π)) ∫_{−c}^{c} Φ(z + Δ_32) e^{−(Δ_21+Δ_32+z)²/2 − z²/2} dz   (2-5)
           − (1/(2π)) ∫_{−c}^{c} Φ(z − Δ_21) e^{−(Δ_21+Δ_32−z)²/2 − z²/2} dz
           + (1/(2π)) ∫_{−c}^{c} Φ(z − Δ_32) e^{−(Δ_21+z)²/2 − z²/2} dz
           − (1/(2π)) ∫_{−c}^{c} Φ(z − Δ_21 − Δ_32) e^{−(Δ_21−z)²/2 − z²/2} dz.

Since the partial derivative depends on both Δ_21 and Δ_32, the behavior of its sign is not obvious, but different numerical studies support the idea that the derivative is non-negative. Figure 2-2 shows the plot of the integrand of ∂h_c/∂Δ_21 for fixed values of Δ_21 and Δ_32. Notice that if we group the first two terms and the last two terms of (2-5), we can look at the partial derivative as the sum of two differences. In Figure 2-3 we observe (in separate plots) the integrands of the first two terms of the partial derivative ∂h_c/∂Δ_21, for fixed values of Δ_21 and Δ_32. The plots suggest that the integrands differ only by a location shift. In fact, changing variables, we can rewrite the expression in (2-5) as

∂h_c/∂Δ_21 = D_1 + D_2,   (2-6)

where

D_1 = (1/(2π)) { ∫_{Δ_21+Δ_32−c}^{Δ_21+Δ_32+c} − ∫_{−c}^{c} } Φ(z − Δ_21) e^{−(Δ_21+Δ_32−z)²/2 − z²/2} dz,
D_2 = (1/(2π)) { ∫_{Δ_21−c}^{Δ_21+c} − ∫_{−c}^{c} } Φ(z − Δ_21 − Δ_32) e^{−(Δ_21−z)²/2 − z²/2} dz.

Recall that Δ_21 > 0. Then, looking at D_2, we have two possibilities for the intervals of integration:

1. −c ≤ Δ_21 − c < c < Δ_21 + c.
2. −c < c ≤ Δ_21 − c < Δ_21 + c.
In other words, the intervals may overlap or not. Denote by R_1 and R_2 the non-common regions of integration, that is,

R_1 = (−c, Δ_21 − c) and R_2 = (c, Δ_21 + c) for case (1);
R_1 = (−c, c) and R_2 = (Δ_21 − c, Δ_21 + c) for case (2).

We have that D_2 is guaranteed to be positive as long as the integral over R_2 is greater than the integral over R_1, regardless of the case. We first notice that R_1 and R_2 are intervals of the same length: in fact, l(R_1) = l(R_2) = Δ_21 for case (1), and l(R_1) = l(R_2) = 2c for case (2). Then, we only need to show that for any two points z_1 ∈ R_1 and z_2 ∈ R_2 located at a certain distance ε > 0 from the extremes of the corresponding intervals, the integrand evaluated at z_2 is greater than the integrand evaluated at z_1. Observe that for any z_1 < z_2,

[Φ(z_2 − Δ_21 − Δ_32) e^{−(Δ_21−z_2)²/2 − z_2²/2}] / [Φ(z_1 − Δ_21 − Δ_32) e^{−(Δ_21−z_1)²/2 − z_1²/2}]
= q · exp{(z_2 − z_1)[Δ_21 − (z_2 + z_1)]},   (2-7)

where q = Φ(z_2 − Δ_21 − Δ_32)/Φ(z_1 − Δ_21 − Δ_32) > 1. Then, for any 0 < ε < min{Δ_21, 2c}, take z_1 = Δ_21 − c − ε and z_2 = c + ε whenever min{Δ_21, 2c} = Δ_21 (i.e., case 1), and z_1 = c − ε and z_2 = Δ_21 − c + ε whenever min{Δ_21, 2c} = 2c (i.e., case 2). Replacing these values in (2-7), we obtain that the ratio is greater than 1 (regardless of the case), which allows us to conclude that D_2 > 0.

Notice that the argument still holds if we replace the cdf Φ(·) by any non-decreasing function, or if we change the interval (−c, c) to (−c_1, c_2), where c_1, c_2 > 0. In this way, we obtain the following more general result:

Proposition 2.1. Let Δ_1, c_1, c_2 > 0 and let the function f(z, λ) be non-decreasing in z, where λ is an arbitrary set of parameters. Then,

{ ∫_{Δ_1−c_1}^{Δ_1+c_2} − ∫_{−c_1}^{c_2} } f(z, λ) e^{−(Δ_1−z)²/2 − z²/2} dz ≥ 0,

where the inequality is strict whenever the function f is monotonically increasing in z.
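A quick numerical check of this inequality (our own sketch, not from the dissertation; the shifted normal cdf plays the role of the non-decreasing function f, just as in D_1 and D_2) evaluates the difference of the two integrals for a few values of Δ_1:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def gap(f, d1, c1, c2):
    """{ int_{d1-c1}^{d1+c2} - int_{-c1}^{c2} } f(z) exp(-(d1-z)^2/2 - z^2/2) dz."""
    g = lambda z: f(z) * np.exp(-0.5 * (d1 - z) ** 2 - 0.5 * z ** 2)
    shifted = quad(g, d1 - c1, d1 + c2, epsabs=1e-12, epsrel=1e-12)[0]
    base = quad(g, -c1, c2, epsabs=1e-12, epsrel=1e-12)[0]
    return shifted - base

# f non-decreasing in z (a shifted normal cdf): the gap is strictly positive.
for d1 in (0.2, 1.0, 4.0):
    print(d1, gap(lambda z: norm.cdf(z - 1.5), d1, 1.96, 1.96) > 0)

# For constant f and c1 = c2 the two integrals coincide (the inequality is
# not strict), which is exactly the situation in the p = 2 case below.
print(abs(gap(lambda z: 1.0, 1.0, 1.96, 1.96)) < 1e-9)
```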
An immediate consequence of Proposition 2.1 is that D_1 > 0. As a result, we obtain that ∂h_c/∂Δ_21 > 0. A similar argument shows that ∂h_c/∂Δ_32 > 0, completing the proof. It follows that the coverage probability P(θ_(1) ∈ X_(1) ± c) is minimized at Δ_21 = Δ_32 = 0, that is, whenever θ_1 = θ_2 = θ_3.

Observe that Proposition 2.1 gives a straightforward proof for the case p = 2. In effect, for h_c(Δ_21) = P(θ_(1) ∈ X_(1) ± c), we have

dh_c/dΔ_21 = { ∫_{Δ_21−c}^{Δ_21+c} − ∫_{−c}^{c} } φ(z − Δ_21)φ(z)dz.

Then, applying Proposition 2.1 with f = 1/(2π), we obtain that h′_c(Δ_21) ≥ 0. It immediately follows that the coverage probability is minimized at Δ_21 = 0, or equivalently, when θ_1 = θ_2.

For the general case (p > 3), we observe that when moving from the case p = k to the case p = k + 1, we only need to include the extra parameter Δ_{k+1,k} in order to describe the problem (see Table 2-3). Then, using Proposition 2.1 and mathematical induction, we obtain the following result:

Lemma 1. Let c_1, c_2 > 0 and for p ≥ 2, let X_1,..., X_p be independent random variables with X_i ~ N(θ_i, 1). Then,

min_{θ_1,...,θ_p} P(θ_(1) ∈ (X_(1) − c_1, X_(1) + c_2)) = p ∫_{−c_1}^{c_2} Φ^{p−1}(z)φ(z)dz = Φ^p(c_2) − Φ^p(−c_1),

where Φ(·) and φ(·) are respectively the cdf and pdf of the standard normal distribution.

Using this lemma, we can easily obtain the following theorem, which summarizes the main results of this section. The proof is straightforward.

Theorem 2.1. Let 0 < α < 1 and for i = 1,..., p, suppose that X_{i1},..., X_{in} is a random sample from a N(θ_i, σ²) population, where θ_i is unknown but σ² is known. Then, a confidence interval for θ_(1) = Σ_{i=1}^p θ_i I(X_i = X_(1)) with a confidence coefficient of (at least) 1 − α is
given by X_(1) ± (σ/√n) c, where the value of c satisfies Φ^p(c) − Φ^p(−c) = 1 − α.

2.2 The Unknown Variance Case

If the variance σ² is unknown, we need to estimate its value. We assume that we have an independent estimate s² of σ², such that s/σ has a pdf ϕ. In a regular experiment, where we observe a sample of size n from each population, s² can be taken as the pooled variance estimate, with νs²/σ² ~ χ²_ν, a chi-square distribution with ν = p(n − 1) degrees of freedom.

Suppose first that p = 3 and, for simplicity, take n = 1. Then, the coverage probability can be written as

P(θ_(1) ∈ X_(1) ± sc) = P(|Z_1| ≤ cs/σ, Z_1 ≥ Z_2 + Δ_21, Z_1 ≥ Z_3 + Δ_31)
                      + P(Z_2 ≥ Z_1 + Δ_12, |Z_2| ≤ cs/σ, Z_2 ≥ Z_3 + Δ_32)   (2-8)
                      + P(Z_3 ≥ Z_1 + Δ_13, Z_3 ≥ Z_2 + Δ_23, |Z_3| ≤ cs/σ),

where Z_i = (X_i − θ_i)/σ and Δ_ij = (θ_i − θ_j)/σ for 1 ≤ i, j ≤ 3. Notice that, taking t = s/σ, we can rewrite each term in the sum (2-8) as a mixture. We obtain

P(θ_(1) ∈ X_(1) ± sc) = ∫_0^∞ P(|Z_1| ≤ ct, Z_1 ≥ Z_2 + Δ_21, Z_1 ≥ Z_3 + Δ_31 | t) ϕ(t)dt
                      + ∫_0^∞ P(Z_2 ≥ Z_1 + Δ_12, |Z_2| ≤ ct, Z_2 ≥ Z_3 + Δ_32 | t) ϕ(t)dt
                      + ∫_0^∞ P(Z_3 ≥ Z_1 + Δ_13, Z_3 ≥ Z_2 + Δ_23, |Z_3| ≤ ct | t) ϕ(t)dt,
where ϕ(·) denotes the pdf of t. It follows that

P(θ_(1) ∈ X_(1) ± sc) = ∫_0^∞ P(θ_(1) ∈ X_(1) ± tc | t) ϕ(t)dt,

where we know (from Section 2.1) that the probability P(θ_(1) ∈ X_(1) ± tc | t) inside the integral is minimized at θ_1 = θ_2 = θ_3. The generalization of this result follows from a direct application of Lemma 1.

Lemma 2. Let c_1, c_2 > 0 and for p ≥ 2, let X_1,..., X_p be independent random variables with X_i ~ N(θ_i, σ²), where both θ_i and σ² are unknown. If s² is an estimate of σ² independent of X_1,..., X_p, then

min_{θ_1,...,θ_p} P(θ_(1) ∈ (X_(1) − sc_1, X_(1) + sc_2)) = ∫_0^∞ (Φ^p(c_2 t) − Φ^p(−c_1 t)) ϕ(t)dt,

where ϕ(·) is the pdf of s/σ and Φ(·) is the cdf of the standard normal distribution.

We end this section with the following theorem. The proof follows directly from Lemma 2.

Theorem 2.2. Let 0 < α < 1 and for i = 1,..., p, suppose that X_{i1},..., X_{in} is a random sample from a N(θ_i, σ²) population, where θ_i and σ² are unknown. Then, a confidence interval for θ_(1) = Σ_{i=1}^p θ_i I(X_i = X_(1)) with a confidence coefficient of (at least) 1 − α is given by X_(1) ± (s/√n) c, where s² = p⁻¹ Σ_{i=1}^p s_i², with s_i² = (n − 1)⁻¹ Σ_{j=1}^n (X_{ij} − X_i)² for i = 1,..., p, and c satisfies

∫_0^∞ (Φ^p(ct) − Φ^p(−ct)) ϕ(t)dt = 1 − α.

2.3 Numerical Studies

In this chapter, we have proposed a method to construct confidence intervals for the mean of the selected population that takes into account the selection procedure. In this section we present some numerical results that compare the performance of the new and the traditional intervals.
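The cutoff in Theorem 2.1 is easy to compute in practice. The sketch below (our own illustration, not part of the dissertation) solves Φ^p(c) − Φ^p(−c) = 1 − α numerically and compares the result with the Bonferroni cutoff z_{α/(2p)}:

```python
from scipy.optimize import brentq
from scipy.stats import norm

def new_cutoff(p, alpha=0.05):
    """Solve Phi^p(c) - Phi^p(-c) = 1 - alpha for c (Theorem 2.1, known variance)."""
    f = lambda c: norm.cdf(c) ** p - norm.cdf(-c) ** p - (1 - alpha)
    return brentq(f, 0.0, 20.0)

for p in (1, 2, 10, 100):
    c_new = new_cutoff(p)                  # selection-adjusted cutoff
    c_bonf = norm.ppf(1 - 0.05 / (2 * p))  # Bonferroni cutoff for p intervals
    print(p, round(c_new, 3), round(c_bonf, 3))
```

For p = 1 and p = 2 the solution coincides with z_{0.025} ≈ 1.96 (for p = 2, note that Φ²(c) − Φ²(−c) = Φ(c) − Φ(−c), since Φ(c) + Φ(−c) = 1), and for larger p the new cutoff stays below the Bonferroni cutoff, in line with Figure 2-5.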
First, we study the behavior of the confidence coefficient as a function of the number of populations. Results show that the confidence coefficient of the traditional intervals decreases rapidly as the number of populations increases. This effect is particularly extreme when all the populations have the same mean. Figure 2-4 shows the result of simulations considering up to 30 populations with the same mean and setting α = 0.05. The solid blue line represents the confidence coefficient obtained using our proposed confidence intervals, and the dashed red line depicts the behavior of the confidence coefficient obtained using the standard confidence intervals. Observe that the solid line is constant at the nominal level 95%.

Intuitively, in order to keep the coverage probability constant, the confidence intervals need to get wider. However, this increment is not dramatic, and it slows down as the number of populations increases. In fact, from the equation in Theorem 2.1 it can be determined that the cutoff value c grows on the order of √(log p).

An indirect way to obtain confidence intervals for θ_(1) that attain (at least) the nominal level would be to construct simultaneous confidence intervals for the means of all the populations considered in the experiment using, for instance, Bonferroni intervals. The natural question is whether such a procedure produces better intervals in terms of length. The answer is no: the size of the Bonferroni intervals increases at a faster rate compared to the intervals we propose. Figure 2-5 shows the behavior of the cutoff point c as the number of populations increases for the case α = 0.05. The solid line corresponds to the value of the standard cutoff point for a 95% confidence interval (z_{α/2} = 1.96).
The dash-dotted line represents the value of c for the new confidence intervals, and the dashed line corresponds to the cutoff values for the Bonferroni intervals.

In an applied situation, the population means θ_i (1 ≤ i ≤ p) will rarely be identical. Hence we need to compare the performance of the confidence intervals when the population means are different. Table 2-4 summarizes some results obtained by
simulations for the case p = 4. The first column shows the true values of the population means (all of them with variance equal to 1), while the second and third columns show the observed coverage probability for the traditional and new intervals at a confidence level of 95%. The reported values correspond to the average of the coverage probabilities after ten replications, and the numbers in parentheses are the corresponding standard errors. We observe that our proposed intervals outperform the traditional ones, even when the population means are far apart. It is interesting to notice that even in situations where one of the populations should be somehow distinguishable (see row four in Table 2-4), the traditional intervals may perform poorly.

2.4 Tables and Figures

Table 2-1. Configuration of the new parameterization for the probability P(θ_(1) ∈ X_(1) ± c). In the table, Δ_ij = θ_i − θ_j.

          θ_1         θ_2         ...   θ_{p-1}      θ_p
θ_1       0           Δ_12        ...   Δ_{1,p-1}    Δ_{1,p}
θ_2       Δ_21        0           ...   Δ_{2,p-1}    Δ_{2,p}
...       ...         ...         ...   ...          ...
θ_{p-1}   Δ_{p-1,1}   Δ_{p-1,2}   ...   0            Δ_{p-1,p}
θ_p       Δ_{p,1}     Δ_{p,2}     ...   Δ_{p,p-1}    0

Table 2-2. Configuration of the new parameterization for the case p = 3, when Δ_21 and Δ_32 are the free parameters. In the table, Δ_ij = θ_i − θ_j.

      θ_1            θ_2     θ_3
θ_1   0              −Δ_21   −(Δ_21 + Δ_32)
θ_2   Δ_21           0       −Δ_32
θ_3   Δ_21 + Δ_32    Δ_32    0

Table 2-3. Representation of the parameters Δ_{i,j} for the case p = k + 1. Moving from p = k to p = k + 1 adds only the extra parameter Δ_{k+1,k}.

          θ_1                   ...   θ_k          θ_{k+1}
θ_1       0                     ...   −Δ_{k,1}     −(Δ_{k,1} + Δ_{k+1,k})
...       ...                   ...   ...          ...
θ_k       Δ_{k,1}               ...   0            −Δ_{k+1,k}
θ_{k+1}   Δ_{k,1} + Δ_{k+1,k}   ...   Δ_{k+1,k}    0
Table 2-4. Observed coverage probability of 95% CI for the mean of the selected population out of four populations, using the traditional and the new method. The reported values correspond to the average after ten replications; the number in parentheses is the corresponding standard error.

(θ_1, θ_2, θ_3, θ_4)   Trad CI    New CI
(0, 0, 0, 0)           (0.0016)   (0.0012)
(0, 0.25, 0.5, 1)      (0.0020)   (0.0011)
(0, 5, 10, 15)         (0.0014)   (0.0009)
(0, 0, 0, 2)           (0.0042)   (0.0027)
(0, 0, 0, 5)           (0.0031)   (0.0028)

Figure 2-1. Coverage probability as a function of Δ_21 and Δ_32 when p = 3.
Figure 2-2. Plot of ∂h/∂Δ_21 for predetermined values of Δ_21 and Δ_32.

Figure 2-3. Plots of the first two terms of ∂h/∂Δ_21 for predetermined values of Δ_21 and Δ_32.
Figure 2-4. Confidence coefficient versus number of populations for the case of identical population means and α = 0.05. The solid blue line corresponds to the confidence coefficient for the new confidence intervals, and the dashed red line corresponds to the confidence coefficient for the traditional confidence intervals.
Figure 2-5. Cutoff point versus number of populations for the case of identical population means and α = 0.05. The solid line corresponds to the cutoff value for the traditional confidence interval, z_{α/2} = 1.96; the dash-dotted line corresponds to the cutoff value for the new intervals; and the dashed line corresponds to the cutoff value for the Bonferroni intervals.
CHAPTER 3
CONFIDENCE INTERVALS FOLLOWING THE SELECTION OF k ≥ 1 POPULATIONS

Using the same framework as in Chapter 2, we assume that for i = 1,..., p, we have independent random variables Xi ∼ N(θi, σ²/n). Also, we define the order statistics X(1),..., X(p) according to the inequalities X(1) ≥ ... ≥ X(p) and, for simplicity, we start by considering σ² = n = 1. Then, we observe that the mean of the population from which the jth biggest observation, X(j), is sampled can be written as

θ(j) = Σ_{i=1}^{p} θi I(Xi = X(j)).

In this context, we want to find the value of c > 0 such that

P(θ(1) ∈ X(1) ± c,..., θ(k) ∈ X(k) ± c) ≥ 1 − α                                (3-1)

for any 0 < α < 1 and 1 ≤ k ≤ p.

Following the same approach we used in Chapter 2, we can write the probability in (3-1) as

Σ_{j1 ≠ ... ≠ jk} P(θ(1) ∈ X(1) ± c,..., θ(k) ∈ X(k) ± c, X(1) = Xj1,..., X(k) = Xjk),

where the sum has p!/(p − k)! terms.

Let us consider first the case p = 4 and k = 2. Then, the probability of interest is

P(θ(1) ∈ X(1) ± c, θ(2) ∈ X(2) ± c)
    = Σ_{i ≠ j} P(θi ∈ Xi ± c, θj ∈ Xj ± c, X(1) = Xi, X(2) = Xj),              (3-2)

where 1 ≤ i, j ≤ 4. In order to obtain closed form expressions for each term in the sum, observe that for X(1) = X1 and X(2) = X2, we have

(X(1) = X1, X(2) = X2) = (X1 ≥ X2, X2 ≥ X3, X2 ≥ X4).

In other words, the relative order between X3 and X4 is irrelevant.
It follows that we only need to pay attention to the possible configurations of the random variables that are at the top. In this case the possible configurations are

(X1 ≥ X2, X2 ≥ X3, X2 ≥ X4)    (X3 ≥ X1, X1 ≥ X2, X1 ≥ X4)
(X1 ≥ X3, X3 ≥ X2, X3 ≥ X4)    (X3 ≥ X2, X2 ≥ X1, X2 ≥ X4)
(X1 ≥ X4, X4 ≥ X2, X4 ≥ X3)    (X3 ≥ X4, X4 ≥ X1, X4 ≥ X2)
(X2 ≥ X1, X1 ≥ X3, X1 ≥ X4)    (X4 ≥ X1, X1 ≥ X2, X1 ≥ X3)
(X2 ≥ X3, X3 ≥ X1, X3 ≥ X4)    (X4 ≥ X2, X2 ≥ X1, X2 ≥ X3)
(X2 ≥ X4, X4 ≥ X1, X4 ≥ X3)    (X4 ≥ X3, X3 ≥ X1, X3 ≥ X2)

If we define Zj = Xj − θj (1 ≤ j ≤ 4) and ∆ij = θi − θj (1 ≤ i, j ≤ 4), we observe that

X1 ≥ X2  ⟺  Z2 ≤ Z1 − ∆21
X2 ≥ X3  ⟺  Z3 ≤ Z2 − ∆32
X2 ≥ X4  ⟺  Z4 ≤ Z2 − ∆42,

where Z1,..., Z4 are iid N(0, 1). Then, the first term of the sum in (3-2) can be written

P(θ1 ∈ X1 ± c, θ2 ∈ X2 ± c, X1 ≥ X2, X2 ≥ X3, X2 ≥ X4)
    = P(|Z1| ≤ c, |Z2| ≤ c, Z2 ≤ Z1 − ∆21, Z3 ≤ Z2 − ∆32, Z4 ≤ Z2 − ∆42)

and, making use of the normality assumptions, we can write this term explicitly, together with its companion term in which X2 is on top of the pair:

P(θ1 ∈ X1 ± c, θ2 ∈ X2 ± c, X1 ≥ X2, X2 ≥ X3, X2 ≥ X4)
    = ∫_{−c}^{c} ∫_{−c}^{min(c, z1−∆21)} Φ(z2 − ∆32)Φ(z2 − ∆42)φ(z1)φ(z2) dz2 dz1

P(θ1 ∈ X1 ± c, θ2 ∈ X2 ± c, X2 ≥ X1, X1 ≥ X3, X1 ≥ X4)
    = ∫_{−c}^{c} ∫_{−c}^{min(c, z2−∆12)} Φ(z1 − ∆31)Φ(z1 − ∆41)φ(z1)φ(z2) dz1 dz2.

Of course, the same argument is valid for the other terms in the sum. This way, considering all 12 possible configurations for the order of the random variables X1,
X2, X3 and X4, we can write the sum in (3-2) in closed form:

P(θ(1) ∈ X(1) ± c, θ(2) ∈ X(2) ± c)
    = ∫_{−c}^{c} ∫_{−c}^{min(c, z1−∆21)} Φ(z2 − ∆32)Φ(z2 − ∆42)φ(z1)φ(z2) dz2 dz1
    + ∫_{−c}^{c} ∫_{−c}^{min(c, z2−∆12)} Φ(z1 − ∆31)Φ(z1 − ∆41)φ(z1)φ(z2) dz1 dz2
    + ∫_{−c}^{c} ∫_{−c}^{min(c, z1−∆31)} Φ(z3 − ∆23)Φ(z3 − ∆43)φ(z1)φ(z3) dz3 dz1
    + ∫_{−c}^{c} ∫_{−c}^{min(c, z3−∆13)} Φ(z1 − ∆21)Φ(z1 − ∆41)φ(z1)φ(z3) dz1 dz3
    + ∫_{−c}^{c} ∫_{−c}^{min(c, z1−∆41)} Φ(z4 − ∆24)Φ(z4 − ∆34)φ(z1)φ(z4) dz4 dz1
    + ∫_{−c}^{c} ∫_{−c}^{min(c, z4−∆14)} Φ(z1 − ∆21)Φ(z1 − ∆31)φ(z1)φ(z4) dz1 dz4
    + ∫_{−c}^{c} ∫_{−c}^{min(c, z2−∆32)} Φ(z3 − ∆13)Φ(z3 − ∆43)φ(z2)φ(z3) dz3 dz2
    + ∫_{−c}^{c} ∫_{−c}^{min(c, z3−∆23)} Φ(z2 − ∆12)Φ(z2 − ∆42)φ(z2)φ(z3) dz2 dz3
    + ∫_{−c}^{c} ∫_{−c}^{min(c, z2−∆42)} Φ(z4 − ∆14)Φ(z4 − ∆34)φ(z2)φ(z4) dz4 dz2
    + ∫_{−c}^{c} ∫_{−c}^{min(c, z4−∆24)} Φ(z2 − ∆12)Φ(z2 − ∆32)φ(z2)φ(z4) dz2 dz4
    + ∫_{−c}^{c} ∫_{−c}^{min(c, z3−∆43)} Φ(z4 − ∆14)Φ(z4 − ∆24)φ(z3)φ(z4) dz4 dz3
    + ∫_{−c}^{c} ∫_{−c}^{min(c, z4−∆34)} Φ(z3 − ∆13)Φ(z3 − ∆23)φ(z3)φ(z4) dz3 dz4.

In order to minimize this expression, we need to address two equally challenging difficulties. First, the construction of any lower bound needs to take into account the delicate balance between the ∆ij's in the expression. Second, special attention needs to be paid to the limits of integration: the corners of the form min(c, z − ∆ij) make any procedure based on differentiation nearly impossible.
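As an illustration (hypothetical code, ours, not the dissertation's): in the equal-means case all ∆ij = 0, the twelve terms coincide, the corner min(c, z1 − ∆21) reduces to z1 on the range of integration, and the closed form can be evaluated by one-dimensional quadrature and checked against a direct Monte Carlo estimate of the joint coverage:

```python
import math
import random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def simpson(f, a, b, n=200):
    # Composite Simpson rule; n must be even
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0

def closed_form_equal_means(c):
    """Sum of the 12 terms when all Delta_ij = 0: each term equals
    int_{-c}^{c} int_{-c}^{z1} Phi(z2)^2 phi(z1) phi(z2) dz2 dz1."""
    inner = lambda z1: simpson(lambda z2: norm_cdf(z2) ** 2 * norm_pdf(z2), -c, z1)
    return 12.0 * simpson(lambda z1: norm_pdf(z1) * inner(z1), -c, c)

def mc_joint_coverage(c, p=4, reps=200_000, seed=2):
    # P(theta_(1) in X_(1) +/- c, theta_(2) in X_(2) +/- c) at equal means:
    # both of the two largest observations must fall inside [-c, c].
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = sorted(rng.gauss(0.0, 1.0) for _ in range(p))
        hits += (abs(xs[-1]) <= c) and (abs(xs[-2]) <= c)
    return hits / reps

c = 1.96
analytic = closed_form_equal_means(c)
simulated = mc_joint_coverage(c)
```

For c = 1.96 both computations agree, which is a useful check that the twelve integrals really do partition the coverage event.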
To overcome the difficulty due to the corners, we notice that, for each pair of variables, the two events

(Z2 ≤ Z1 − ∆21, Z3 ≤ Z2 − ∆32, Z4 ≤ Z2 − ∆42) and (Z2 > Z1 − ∆21, Z3 ≤ Z1 − ∆31, Z4 ≤ Z1 − ∆41)

are disjoint. Hence, we can express the sum of the probabilities of these two events as the probability of their union. Consequently, instead of writing down 12 terms for the sum (one term per configuration), we can express the probability of interest using only 6 terms, each of them describing the two random variables positioned at the top. Working out the details, we obtain:

X1 and X2 at the top.
P(|Z1| ≤ c, |Z2| ≤ c, Z3 ≤ max{Z1 − ∆31, Z2 − ∆32}, Z4 ≤ max{Z1 − ∆41, Z2 − ∆42})
    = ∫_{−c}^{c} ∫_{−c}^{c} Φ(max{z1 − ∆31, z2 − ∆32})Φ(max{z1 − ∆41, z2 − ∆42})φ(z1)φ(z2) dz1 dz2

X1 and X3 at the top.
P(|Z1| ≤ c, |Z3| ≤ c, Z2 ≤ max{Z1 − ∆21, Z3 − ∆23}, Z4 ≤ max{Z1 − ∆41, Z3 − ∆43})
    = ∫_{−c}^{c} ∫_{−c}^{c} Φ(max{z1 − ∆21, z3 − ∆23})Φ(max{z1 − ∆41, z3 − ∆43})φ(z1)φ(z3) dz1 dz3

X1 and X4 at the top.
P(|Z1| ≤ c, |Z4| ≤ c, Z2 ≤ max{Z1 − ∆21, Z4 − ∆24}, Z3 ≤ max{Z1 − ∆31, Z4 − ∆34})
    = ∫_{−c}^{c} ∫_{−c}^{c} Φ(max{z1 − ∆21, z4 − ∆24})Φ(max{z1 − ∆31, z4 − ∆34})φ(z1)φ(z4) dz1 dz4

X2 and X3 at the top.
P(|Z2| ≤ c, |Z3| ≤ c, Z1 ≤ max{Z2 − ∆12, Z3 − ∆13}, Z4 ≤ max{Z2 − ∆42, Z3 − ∆43})
    = ∫_{−c}^{c} ∫_{−c}^{c} Φ(max{z2 − ∆12, z3 − ∆13})Φ(max{z2 − ∆42, z3 − ∆43})φ(z2)φ(z3) dz2 dz3

X2 and X4 at the top.
P(|Z2| ≤ c, |Z4| ≤ c, Z1 ≤ max{Z2 − ∆12, Z4 − ∆14}, Z3 ≤ max{Z2 − ∆32, Z4 − ∆34})
    = ∫_{−c}^{c} ∫_{−c}^{c} Φ(max{z2 − ∆12, z4 − ∆14})Φ(max{z2 − ∆32, z4 − ∆34})φ(z2)φ(z4) dz2 dz4
X3 and X4 at the top.
P(|Z3| ≤ c, |Z4| ≤ c, Z1 ≤ max{Z3 − ∆13, Z4 − ∆14}, Z2 ≤ max{Z3 − ∆23, Z4 − ∆24})
    = ∫_{−c}^{c} ∫_{−c}^{c} Φ(max{z3 − ∆13, z4 − ∆14})Φ(max{z3 − ∆23, z4 − ∆24})φ(z3)φ(z4) dz3 dz4

This way, an alternative representation for the probability of interest is

P(θ(1) ∈ X(1) ± c, θ(2) ∈ X(2) ± c)                                           (3-3)
    = ∫_{−c}^{c} ∫_{−c}^{c} Φ(max{z1 − ∆31, z2 − ∆32})Φ(max{z1 − ∆41, z2 − ∆42})φ(z1)φ(z2) dz1 dz2
    + ∫_{−c}^{c} ∫_{−c}^{c} Φ(max{z1 − ∆21, z3 − ∆23})Φ(max{z1 − ∆41, z3 − ∆43})φ(z1)φ(z3) dz1 dz3
    + ∫_{−c}^{c} ∫_{−c}^{c} Φ(max{z1 − ∆21, z4 − ∆24})Φ(max{z1 − ∆31, z4 − ∆34})φ(z1)φ(z4) dz1 dz4
    + ∫_{−c}^{c} ∫_{−c}^{c} Φ(max{z2 − ∆12, z3 − ∆13})Φ(max{z2 − ∆42, z3 − ∆43})φ(z2)φ(z3) dz2 dz3
    + ∫_{−c}^{c} ∫_{−c}^{c} Φ(max{z2 − ∆12, z4 − ∆14})Φ(max{z2 − ∆32, z4 − ∆34})φ(z2)φ(z4) dz2 dz4
    + ∫_{−c}^{c} ∫_{−c}^{c} Φ(max{z3 − ∆13, z4 − ∆14})Φ(max{z3 − ∆23, z4 − ∆24})φ(z3)φ(z4) dz3 dz4.

Observe that this new representation does not completely solve the problem of the corners, but rather removes them from the limits of integration and puts them inside the integrand. Now we find expressions of the form max{z − ∆ij} in the arguments of the normal cdfs Φ(·), which still makes any minimization approach based on differentiation difficult. However, this new representation reveals more clearly the symmetry in the structure of the ∆'s, as portrayed in Table 3-1. This pattern is particularly important, since it suggests how to generalize the expression for any values of p and k.

In order to determine the configuration of ∆'s that minimizes the expression in (3-3), we assume (without loss of generality) that θ1 ≥ θ2 ≥ θ3 ≥ θ4, so that ∆ij ≥ 0 for any i ≤ j. Also, we consider ∆12, ∆23 and ∆34 as the free parameters.

Based on our previous results, it is reasonable to believe that the minimum of (3-3) is reached at the origin. In order to prove this claim, we have studied the behavior of the
coverage probability (CP) for different configurations of the ∆ij's, with special attention to the behavior at the boundary. Among others, we considered the following cases:

∆12 = ∆23 = ∆34 = 0:
CP = 6 ∫_{−c}^{c} ∫_{−c}^{c} Φ²(max{z1, z2})φ(z1)φ(z2) dz1 dz2

∆12 > 0, ∆23 = ∆34 = 0:
CP = 3 ∫_{−c}^{c} ∫_{−c}^{c} Φ²(max{z1 + ∆12, z2})φ(z1)φ(z2) dz1 dz2
   + 3 ∫_{−c}^{c} ∫_{−c}^{c} Φ(max{z2 − ∆12, z3 − ∆12})Φ(max{z2, z3})φ(z2)φ(z3) dz2 dz3
   → 3 ∫_{−c}^{c} ∫_{−c}^{c} φ(z1)φ(z2) dz1 dz2,  as ∆12 → +∞

∆12, ∆23 > 0 and ∆34 = 0:
CP = ∫_{−c}^{c} ∫_{−c}^{c} Φ²(max{z1 + ∆12 + ∆23, z2 + ∆23})φ(z1)φ(z2) dz1 dz2
   + 2 ∫_{−c}^{c} ∫_{−c}^{c} Φ(max{z1 + ∆12, z3 − ∆23})Φ(max{z1 + ∆12 + ∆23, z3})φ(z1)φ(z3) dz1 dz3
   + 2 ∫_{−c}^{c} ∫_{−c}^{c} Φ(max{z2 − ∆12, z3 − ∆13})Φ(max{z2 + ∆23, z3})φ(z2)φ(z3) dz2 dz3
   + ∫_{−c}^{c} ∫_{−c}^{c} Φ(max{z3 − ∆13, z4 − ∆13})Φ(max{z3 − ∆23, z4 − ∆23})φ(z3)φ(z4) dz3 dz4
   → 3 ∫_{−c}^{c} ∫_{−c}^{c} φ(z1)φ(z2) dz1 dz2,  as ∆12, ∆23 → +∞

However, none of the cases we considered provided conclusive (analytical) evidence that the minimum is at the origin. On the contrary, various numerical studies have suggested that the minimum is not located at the origin (see Figure 3-1), but the current formulation of the problem makes it difficult even to establish that it is not located in the interior of the region determined by ∆12, ∆23 and ∆34.

These difficulties call for a different approach, which we discuss in the following section.
3.1 An Alternative Approach

So far, we have approached the problem considering partitions of the coverage probability based on the possible configurations of the vector (X(1), X(2),..., X(k)). Notice that such an approach, by construction, takes into account the relative orderings between the variables that are selected (the top k). Instead, we can consider an alternative that does not take explicit consideration of the ordering between the variables that have been selected.

Notice there are (p choose k) different ways to select k out of p populations, without considering the order. Suppose that j indexes one such arrangement, and denote by Xj1,..., Xjk the top k variables and by Xjk+1,..., Xjp the bottom p − k. Then, we can separate the sample space according to the events min{Xj1,..., Xjk} ≥ max{Xjk+1,..., Xjp} for j = 1,..., (p choose k). This way, the coverage probability can be written

P(θ(1) ∈ X(1) ± c,..., θ(k) ∈ X(k) ± c)
    = Σ_{j=1}^{(p choose k)} P(θj1 ∈ Xj1 ± c,..., θjk ∈ Xjk ± c, min{Xj1,..., Xjk} ≥ max{Xjk+1,..., Xjp}).

Let us consider first the term where (X1, X2,..., Xk) are at the top. For this case, the corresponding piece of the probability is

P(θ1 ∈ X1 ± c,..., θk ∈ Xk ± c, min{X1,..., Xk} ≥ max{Xk+1,..., Xp})
    = ∫_{θ1−c}^{θ1+c} ··· ∫_{θk−c}^{θk+c} Π_{j=k+1}^{p} P_{θj}(Xj ≤ min{x1,..., xk}) f(x1,..., xk) dx1 ··· dxk,

where f(x1,..., xk) is the joint density of (X1,..., Xk). Hence, making use of the normality assumptions, we have

P(θ1 ∈ X1 ± c,..., θk ∈ Xk ± c, min{X1,..., Xk} ≥ max{Xk+1,..., Xp})
    = ∫_{−c}^{c} ··· ∫_{−c}^{c} Π_{j=k+1}^{p} Φ(min{z1 + θ1,..., zk + θk} − θj) Π_{i=1}^{k} φ(zi) dzi,

where zi = xi − θi for i = 1,..., k.
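The expression above is easy to evaluate numerically for small p and k. The sketch below (our code, with an arbitrary illustrative mean vector, not one from the dissertation) specializes it to p = 3, k = 2, computes the probability by two-dimensional quadrature over [−c, c]², and checks it against simulation:

```python
import math
import random
from itertools import combinations

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def coverage_p3_k2(thetas, c, n=200):
    """Sum over the (3 choose 2) arrangements of
    int int Phi(min_l {z_l + theta_l} - theta_m) phi phi dz on [-c, c]^2."""
    h = 2.0 * c / n
    grid = [-c + i * h for i in range(n + 1)]
    w = [(1 if i in (0, n) else 4 if i % 2 else 2) for i in range(n + 1)]  # Simpson weights
    total = 0.0
    for sel in combinations(range(3), 2):
        (a, b), (m,) = sel, tuple(set(range(3)) - set(sel))
        s = 0.0
        for i, za in enumerate(grid):
            for j, zb in enumerate(grid):
                val = norm_cdf(min(za + thetas[a], zb + thetas[b]) - thetas[m])
                s += w[i] * w[j] * val * norm_pdf(za) * norm_pdf(zb)
        total += s * (h / 3.0) ** 2
    return total

def mc_coverage(thetas, c, k=2, reps=200_000, seed=3):
    # Direct simulation of the joint coverage of the k selected populations
    rng = random.Random(seed)
    p, hits = len(thetas), 0
    for _ in range(reps):
        xs = [rng.gauss(t, 1.0) for t in thetas]
        top = sorted(range(p), key=xs.__getitem__)[-k:]  # indices of the k largest
        hits += all(abs(xs[i] - thetas[i]) <= c for i in top)
    return hits / reps

thetas, c = (0.0, 0.5, 1.0), 1.96
quad = coverage_p3_k2(thetas, c)
sim = mc_coverage(thetas, c)
```

The agreement between the quadrature value and the simulated coverage confirms that the arrangement events really do partition the sample space.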
From here, it is not difficult to obtain the following expression for the coverage probability:

P(θ(1) ∈ X(1) ± c,..., θ(k) ∈ X(k) ± c)
    = Σ_{j=1}^{(p choose k)} ∫_{−c}^{c} ··· ∫_{−c}^{c} Π_{m ∈ I_j^c} Φ(min_{l ∈ I_j} {zl + θl} − θm) Π_{l ∈ I_j} φ(zl) dzl,     (3-4)

where I_j = {j1,..., jk} is the set of indices of the top k variables in the jth arrangement and I_j^c = {jk+1,..., jp} is the set of indices of the bottom p − k variables in the jth arrangement.

Notice that if k = 1 we are back in the case discussed in Chapter 2, and the case k = p corresponds to simultaneous confidence intervals.

Let us take a closer look at this formula and consider first the case p = 6 and k = 3. In this case, the sum in (3-4) will have (6 choose 3) = 20 terms, determined by the configurations

123|456   124|356   125|346   126|345   134|256
135|246   136|245   145|236   146|235   156|234
234|156   235|146   236|145   245|136   246|135
256|134   345|126   346|125   356|124   456|123

where the numbers to the left of the vertical line are the indices of the set I_j (the populations being selected) and the numbers to the right are the indices of the set I_j^c (the populations not being selected). Observe that all the indices appear on the left side (and on the right side) the same number of times (10), revealing some symmetry in the problem.
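The configuration count and the symmetry just noted are easy to verify directly (illustrative code, ours):

```python
from itertools import combinations
from math import comb

p, k = 6, 3
# Each configuration is the "left side" of a bar: the set of selected indices
configs = list(combinations(range(1, p + 1), k))

# (6 choose 3) = 20 arrangements in total
n_configs = len(configs)

# Every index appears on the left side in exactly (p-1 choose k-1) = 10 of them,
# hence on the right side in the remaining (p-1 choose k) = 10.
left_counts = {i: sum(i in s for s in configs) for i in range(1, p + 1)}
```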
Using this symmetry, suppose that θ1 ≤ θ2 ≤ ... ≤ θ6 and let θ6 → ∞. Then, for the 10 groups for which 6 is on the right side, the corresponding term goes to zero. For the remaining groups (for which 6 appears on the left), the value of Φ(min_{l ∈ I_j} {zl + θl} − θm) is not affected by θ6, and the coverage probability is determined by the following configurations

126|345   136|245   146|235   156|234   236|145
246|135   256|134   346|125   356|124   456|123

which correspond to the possible ways of choosing 2 out of 5 populations. Repeating the argument, but letting θ5 → ∞, we obtain the configurations

156|234   256|134   356|124   456|123

which are the possible ways to choose 1 out of 4 populations. For this case, we know (from Chapter 2) that the minimum is reached at θ1 = θ2 = θ3 = θ4.

This example suggests that the coverage probability is minimized when the biggest p − k − 1 population means are sent to +∞ and the remaining k + 1 are set to be equal. However, a formal argument is required.

For the general case (1 ≤ k < p), the number of possible configurations is

(p choose k) = (p−1 choose k) + (p−1 choose p−k) = (p−1 choose k) + (p−1 choose k−1),

where (p−1 choose k) is the number of times that any given index j appears on the right side (population j is not selected) and (p−1 choose k−1) is the number of configurations that have index j on the left side (population j is selected).
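The limiting behavior driving this argument can be checked by simulation (our sketch; the specific values assume c = 1.96 and a "far away" mean of 8, which is effectively infinite here). For p = 3, k = 2, pushing the largest mean far away forces that population into the selected set, and the coverage probability factors into (Φ(c) − Φ(−c)) times the k = 1, p = 2 equal-means coverage Φ(c)² − Φ(−c)²:

```python
import math
import random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mc_coverage(thetas, c, k, reps=200_000, seed=4):
    # Monte Carlo estimate of P(theta_(1) in X_(1) +/- c, ..., theta_(k) in X_(k) +/- c)
    rng = random.Random(seed)
    p, hits = len(thetas), 0
    for _ in range(reps):
        xs = [rng.gauss(t, 1.0) for t in thetas]
        top = sorted(range(p), key=xs.__getitem__)[-k:]
        hits += all(abs(xs[i] - thetas[i]) <= c for i in top)
    return hits / reps

c = 1.96
# theta_3 = 8: population 3 is selected with probability essentially one and
# contributes an independent coverage factor P(|Z_3| <= c) = Phi(c) - Phi(-c).
cov_far = mc_coverage((0.0, 0.0, 8.0), c, k=2)
predicted_limit = (norm_cdf(c) - norm_cdf(-c)) * (norm_cdf(c) ** 2 - norm_cdf(-c) ** 2)
```

Since each such factor is strictly less than one, sending means to infinity can only shrink the joint coverage toward the product form, which is why the extreme configuration matters for the confidence coefficient.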
Suppose (without any loss of generality) that θ1 ≤ ... ≤ θp, and define

I_j(θp) = I( min_{l ∈ I_j \ {p}} {zl + θl} ≥ zp + θp )
I_j^c(θp) = I( min_{l ∈ I_j \ {p}} {zl + θl} < zp + θp ),

where I(·) is the indicator function. From the definition, it immediately follows that

min_{l ∈ I_j} {zl + θl} = (zp + θp) I_j(θp) + min_{l ∈ I_j \ {p}} {zl + θl} I_j^c(θp)          (3-5)

and therefore the coverage probability can be written as

P(θ(1) ∈ X(1) ± c,..., θ(k) ∈ X(k) ± c)
    = Σ_{j=1}^{(p choose k)} ∫_{−c}^{c} ··· ∫_{−c}^{c} Π_{m ∈ I_j^c} Φ((zp + θp) − θm) I_j(θp) Π_{l ∈ I_j} φ(zl) dzl
    + Σ_{j=1}^{(p choose k)} ∫_{−c}^{c} ··· ∫_{−c}^{c} Π_{m ∈ I_j^c} Φ( min_{l ∈ I_j \ {p}} {zl + θl} − θm ) I_j^c(θp) Π_{l ∈ I_j} φ(zl) dzl.

Now, observe that as θp → ∞,

min_{l ∈ I_j} {zl + θl} = (zp + θp) I_j(θp) + min_{l ∈ I_j \ {p}} {zl + θl} I_j^c(θp) → min_{l ∈ I_j \ {p}} {zl + θl}

and hence

Π_{m ∈ I_j^c} Φ( min_{l ∈ I_j} {zl + θl} − θm ) → Π_{m ∈ I_j^c} Φ( min_{l ∈ I_j \ {p}} {zl + θl} − θm )

for all the terms for which θp is on the left side. At the same time, for the terms where θp is on the right side, we have

Π_{m ∈ I_j^c} Φ( min_{l ∈ I_j} {zl + θl} − θm ) → 0,
and therefore, as θp → ∞, the coverage probability converges to

Σ_{j=1}^{(p−1 choose k−1)} ∫_{−c}^{c} ··· ∫_{−c}^{c} Π_{m ∈ I_j^c} Φ( min_{l ∈ I_j \ {p}} {zl + θl} − θm ) Π_{l ∈ I_j} φ(zl) dzl.

Before we move forward, let us consider the example p = 3, k = 2. Then, the coverage probability is

P(θ(1) ∈ X(1) ± c, θ(2) ∈ X(2) ± c)
    = Σ_{j=1}^{(3 choose 2)} ∫_{−c}^{c} ∫_{−c}^{c} Π_{m ∈ I_j^c} Φ( min_{l ∈ I_j} {zl + θl} − θm ) Π_{l ∈ I_j} φ(zl) dzl
    = ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z1 + θ1, z2 + θ2} − θ3)φ(z1)φ(z2) dz1 dz2
    + ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z1 + θ1, z3 + θ3} − θ2)φ(z1)φ(z3) dz1 dz3                (3-6)
    + ∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z2 + θ2, z3 + θ3} − θ1)φ(z2)φ(z3) dz2 dz3,

and, as θ3 → ∞, we obtain

M = ∫_{−c}^{c} ∫_{−c}^{c} Φ(z1 + θ1 − θ2)φ(z1)φ(z3) dz1 dz3                                (3-7)
    + ∫_{−c}^{c} ∫_{−c}^{c} Φ(z2 + θ2 − θ1)φ(z2)φ(z3) dz2 dz3.

Suppose now that, for a fixed θ3, min{z1 + θ1, z3 + θ3} = z3 + θ3. Since we are assuming that θ1 ≤ θ2 ≤ θ3, this can only happen for certain values of z1 and z3. Let

R1 = {(z1, z3) : min{z1 + θ1, z3 + θ3} = z1 + θ1} and R2 = {(z1, z3) : min{z1 + θ1, z3 + θ3} = z3 + θ3}.

Then, the integral in (3-6) can be written as

∫_{R1} Φ(z1 + θ1 − θ2)φ(z1)φ(z3) dz1 dz3 + ∫_{R2} Φ(z3 + θ3 − θ2)φ(z1)φ(z3) dz1 dz3.

Similarly, the integral in (3-7) can be written as

∫_{R1} Φ(z1 + θ1 − θ2)φ(z1)φ(z3) dz1 dz3 + ∫_{R2} Φ(z1 + θ1 − θ2)φ(z1)φ(z3) dz1 dz3
and, since θ3 − θ2 ≥ θ1 − θ2, we obtain

∫_{−c}^{c} ∫_{−c}^{c} Φ(min{z1 + θ1, z3 + θ3} − θ2)φ(z1)φ(z3) dz1 dz3 ≤ ∫_{−c}^{c} ∫_{−c}^{c} Φ(z1 + θ1 − θ2)φ(z1)φ(z3) dz1 dz3.

Using a similar argument with the third integral in the coverage probability, and keeping track of the contribution of the first integral (which vanishes in the limit), we conclude that

P(θ(1) ∈ X(1) ± c, θ(2) ∈ X(2) ± c) ≥ M.

For the general case, suppose that θp (fixed) is such that I_j(θp) = 1 for some j; that is, min_{l ∈ I_j} {zl + θl} = zp + θp. Under the assumption θ1 ≤ ... ≤ θp, we have θp − θm ≥ θl − θm for any 1 ≤ m, l ≤ p, and therefore I_j(θp) can be equal to 1 only in a certain region of the hyper-cube (−c, c)^k. Then, partitioning the integrals accordingly, we obtain

P(θ(1) ∈ X(1) ± c,..., θ(k) ∈ X(k) ± c)
    = Σ_{j=1}^{(p choose k)} ∫_{−c}^{c} ··· ∫_{−c}^{c} Π_{m ∈ I_j^c} Φ( min_{l ∈ I_j} {zl + θl} − θm ) Π_{l ∈ I_j} φ(zl) dzl
    ≥ Σ_{j=1}^{(p−1 choose k−1)} ∫_{−c}^{c} ··· ∫_{−c}^{c} Π_{m ∈ I_j^c} Φ( min_{l ∈ I_j \ {p}} {zl + θl} − θm ) Π_{l ∈ I_j} φ(zl) dzl,     (3-8)

where the equality is attained asymptotically as θp approaches infinity.

Integrating (3-8) with respect to zp, we obtain

(Φ(c) − Φ(−c)) [ Σ_{j=1}^{(p−1 choose k−1)} ∫_{−c}^{c} ··· ∫_{−c}^{c} Π_{m ∈ I_j^c} Φ( min_{l ∈ I_j \ {p}} {zl + θl} − θm ) Π_{l ∈ I_j \ {p}} φ(zl) dzl ],

where the quantity in brackets [·] is exactly the coverage probability for selecting k − 1 out of p − 1 populations. Repeating the argument, but now letting θ_{p−1} → ∞, we obtain the lower bound

(Φ(c) − Φ(−c))² Σ_{j=1}^{(p−2 choose k−2)} ∫_{−c}^{c} ··· ∫_{−c}^{c} Π_{m ∈ I_j^c} Φ( min_{l ∈ I_j \ {p, p−1}} {zl + θl} − θm ) Π_{l ∈ I_j \ {p, p−1}} φ(zl) dzl.
Module 1 Probability 1. Introduction In our daily life we come across many processes whose nature cannot be predicted in advance. Such processes are referred to as random processes. The only way to derive
More informationSTRONG FORMS OF ORTHOGONALITY FOR SETS OF HYPERCUBES
The Pennsylvania State University The Graduate School Department of Mathematics STRONG FORMS OF ORTHOGONALITY FOR SETS OF HYPERCUBES A Dissertation in Mathematics by John T. Ethier c 008 John T. Ethier
More informationChapter Three. Hypothesis Testing
3.1 Introduction The final phase of analyzing data is to make a decision concerning a set of choices or options. Should I invest in stocks or bonds? Should a new product be marketed? Are my products being
More informationChapter 4: Asymptotic Properties of the MLE
Chapter 4: Asymptotic Properties of the MLE Daniel O. Scharfstein 09/19/13 1 / 1 Maximum Likelihood Maximum likelihood is the most powerful tool for estimation. In this part of the course, we will consider
More informationEstimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk
Ann Inst Stat Math (0) 64:359 37 DOI 0.007/s0463-00-036-3 Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Paul Vos Qiang Wu Received: 3 June 009 / Revised:
More informationLecture notes on statistical decision theory Econ 2110, fall 2013
Lecture notes on statistical decision theory Econ 2110, fall 2013 Maximilian Kasy March 10, 2014 These lecture notes are roughly based on Robert, C. (2007). The Bayesian choice: from decision-theoretic
More informationA BAYESIAN MATHEMATICAL STATISTICS PRIMER. José M. Bernardo Universitat de València, Spain
A BAYESIAN MATHEMATICAL STATISTICS PRIMER José M. Bernardo Universitat de València, Spain jose.m.bernardo@uv.es Bayesian Statistics is typically taught, if at all, after a prior exposure to frequentist
More informationComputational statistics
Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated
More informationPost-Selection Inference
Classical Inference start end start Post-Selection Inference selected end model data inference data selection model data inference Post-Selection Inference Todd Kuffner Washington University in St. Louis
More informationLeast squares under convex constraint
Stanford University Questions Let Z be an n-dimensional standard Gaussian random vector. Let µ be a point in R n and let Y = Z + µ. We are interested in estimating µ from the data vector Y, under the assumption
More informationSupplementary appendix to the paper Hierarchical cheap talk Not for publication
Supplementary appendix to the paper Hierarchical cheap talk Not for publication Attila Ambrus, Eduardo M. Azevedo, and Yuichiro Kamada December 3, 011 1 Monotonicity of the set of pure-strategy equilibria
More informationLecture Notes 1: Decisions and Data. In these notes, I describe some basic ideas in decision theory. theory is constructed from
Topics in Data Analysis Steven N. Durlauf University of Wisconsin Lecture Notes : Decisions and Data In these notes, I describe some basic ideas in decision theory. theory is constructed from The Data:
More informationIntroduction: MLE, MAP, Bayesian reasoning (28/8/13)
STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this
More informationInterval Estimation. Chapter 9
Chapter 9 Interval Estimation 9.1 Introduction Definition 9.1.1 An interval estimate of a real-values parameter θ is any pair of functions, L(x 1,..., x n ) and U(x 1,..., x n ), of a sample that satisfy
More informationIntroduction to Machine Learning. Lecture 2
Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for
More informationComparing Non-informative Priors for Estimation and. Prediction in Spatial Models
Comparing Non-informative Priors for Estimation and Prediction in Spatial Models Vigre Semester Report by: Regina Wu Advisor: Cari Kaufman January 31, 2010 1 Introduction Gaussian random fields with specified
More informationComment on The Veil of Public Ignorance
Comment on The Veil of Public Ignorance Geoffroy de Clippel February 2010 Nehring (2004) proposes an interesting methodology to extend the utilitarian criterion defined under complete information to an
More informationPractice Problems Section Problems
Practice Problems Section 4-4-3 4-4 4-5 4-6 4-7 4-8 4-10 Supplemental Problems 4-1 to 4-9 4-13, 14, 15, 17, 19, 0 4-3, 34, 36, 38 4-47, 49, 5, 54, 55 4-59, 60, 63 4-66, 68, 69, 70, 74 4-79, 81, 84 4-85,
More informationLECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b)
LECTURE 5 NOTES 1. Bayesian point estimators. In the conventional (frequentist) approach to statistical inference, the parameter θ Θ is considered a fixed quantity. In the Bayesian approach, it is considered
More informationPlausible Values for Latent Variables Using Mplus
Plausible Values for Latent Variables Using Mplus Tihomir Asparouhov and Bengt Muthén August 21, 2010 1 1 Introduction Plausible values are imputed values for latent variables. All latent variables can
More informationSequential Decisions
Sequential Decisions A Basic Theorem of (Bayesian) Expected Utility Theory: If you can postpone a terminal decision in order to observe, cost free, an experiment whose outcome might change your terminal
More informationNaive Bayes classification
Naive Bayes classification Christos Dimitrakakis December 4, 2015 1 Introduction One of the most important methods in machine learning and statistics is that of Bayesian inference. This is the most fundamental
More informationhigh-dimensional inference robust to the lack of model sparsity
high-dimensional inference robust to the lack of model sparsity Jelena Bradic (joint with a PhD student Yinchu Zhu) www.jelenabradic.net Assistant Professor Department of Mathematics University of California,
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2
MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and
More informationA proof of the existence of good nested lattices
A proof of the existence of good nested lattices Dinesh Krithivasan and S. Sandeep Pradhan July 24, 2007 1 Introduction We show the existence of a sequence of nested lattices (Λ (n) 1, Λ(n) ) with Λ (n)
More informationMarginal Screening and Post-Selection Inference
Marginal Screening and Post-Selection Inference Ian McKeague August 13, 2017 Ian McKeague (Columbia University) Marginal Screening August 13, 2017 1 / 29 Outline 1 Background on Marginal Screening 2 2
More informationStatistics 3858 : Maximum Likelihood Estimators
Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,
More information11. Learning graphical models
Learning graphical models 11-1 11. Learning graphical models Maximum likelihood Parameter learning Structural learning Learning partially observed graphical models Learning graphical models 11-2 statistical
More informationModel Complexity of Pseudo-independent Models
Model Complexity of Pseudo-independent Models Jae-Hyuck Lee and Yang Xiang Department of Computing and Information Science University of Guelph, Guelph, Canada {jaehyuck, yxiang}@cis.uoguelph,ca Abstract
More information3 Undirected Graphical Models
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 3 Undirected Graphical Models In this lecture, we discuss undirected
More informationBayesian Econometrics
Bayesian Econometrics Christopher A. Sims Princeton University sims@princeton.edu September 20, 2016 Outline I. The difference between Bayesian and non-bayesian inference. II. Confidence sets and confidence
More informationTesting Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata
Maura Department of Economics and Finance Università Tor Vergata Hypothesis Testing Outline It is a mistake to confound strangeness with mystery Sherlock Holmes A Study in Scarlet Outline 1 The Power Function
More informationCentral Limit Theorem ( 5.3)
Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately
More information1 Hypothesis Testing and Model Selection
A Short Course on Bayesian Inference (based on An Introduction to Bayesian Analysis: Theory and Methods by Ghosh, Delampady and Samanta) Module 6: From Chapter 6 of GDS 1 Hypothesis Testing and Model Selection
More information