SIMULATION STUDIES AND IMPLEMENTATION OF BOOTSTRAP-BASED MULTIPLE TESTING PROCEDURES


1 SIMULATION STUDIES AND IMPLEMENTATION OF BOOTSTRAP-BASED MULTIPLE TESTING PROCEDURES A thesis submitted to the faculty of San Francisco State University In partial fulfillment of The requirements for The degree Master of Arts in Mathematics by Vera Klimkovsky San Francisco, California May, 2006

2 Copyright by Vera Klimkovsky 2006

3 CERTIFICATION OF APPROVAL I certify that I have read Simulation studies and implementation of bootstrap-based multiple testing procedures by Vera Klimkovsky, and that in my opinion this work meets the criteria for approving a thesis in partial fulfillment of the requirements for the degree: Master of Arts in Mathematics at San Francisco State University. Mohammad R. Kafai Professor of Mathematics Eric Hayashi Professor of Mathematics Sergei Ovchinnikov Professor of Mathematics

4 SIMULATION STUDIES AND IMPLEMENTATION OF BOOTSTRAP-BASED MULTIPLE TESTING PROCEDURES Vera Klimkovsky San Francisco State University 2006 In this project, I use Statistical Analysis System (SAS) software to implement and perform simulation studies of new multiple testing procedures proposed recently. The key feature of these procedures is to use bootstrap re-sampling techniques (parametric or nonparametric) to obtain a consistent estimator of the test statistics null distribution to derive the cut-offs. Theoretically, it has been shown that the test statistics null distribution will be asymptotically multivariate normal for the asymptotically linear estimators of the parameters under consideration, and the bootstrap estimated null distribution will provide asymptotic control of type I error rate. I certify that the Abstract is a correct representation of the content of this thesis. Chair, Thesis Committee Date

5 v ACKNOWLEDGEMENTS First of all, I would like to acknowledge my thesis advisor Dr. Mohammad Kafai whose passion for the Theory of Probability and Statistics inspired my interest in the subject matter. I like to thank Dr. Mohammad Kafai for his continuous support and encouragement. My special thanks go to all my thesis committee, Mohammad Kafai, Eric Hayashi, and Sergei Ovchinnikov, for their time to carefully review earlier versions of this thesis, and valuable suggestions which led to a substantially improved final version. Also, many thanks to Dr. David Meredith and my thesis advisors for being so patient while I took my time to work on this project. Thank you for believing in me. In addition, I would like to thank all my teachers of mathematics at San Francisco State University for inspiration, motivation, encouragement and support in studying mathematics throughout the years of my graduate work.

Contents

List of Tables ix
List of Figures x

1 Introduction
   Why multiple hypothesis testing?
   Example: HLA-disease associations
   Thesis overview

2 Methods of multiple hypothesis testing
   Background: Analysis of Variance
   Multiple Hypothesis Testing
      Basic Concepts Defined
      Classification of methods
   Single-Step Procedures
      LSD Test
      Bonferroni procedure
      Šidák's Method
      Tukey's Method
      Tukey-Kramer method
      Scheffé Method
   Stepwise Procedures
      Holm's procedure
      Shaffer method
   Choosing the right method

3 Bootstrap-based resampling techniques
   Introduction
   Basic definitions
      Nonparametric bootstrap
      Parametric bootstrap
   Bootstrap estimate of standard error
   Bootstrap estimate of bias
   Bootstrap confidence intervals
      The bootstrap-t interval
   Bootstrap-based hypothesis testing
   Robustness and failure of bootstrap
   Jackknife as an approximation to Bootstrap
      Jackknife samples and estimates
      Jackknife or Bootstrap?

4 Proposed bootstrap multiple testing procedures
   Microarrays as a Motivating Factor
      Background
      Microarray Experiment
   Multiple Hypothesis Testing Model
      Null hypotheses
      Hypothesis testing
   Multiple Testing Procedures
      Single-step common-quantile procedure
      Single-step common-cut-off procedure
      Proposed test statistics null distribution
   Bootstrap-based single step procedures
      Bootstrap estimation of the null distribution
   Advantages of the Proposed Procedures

5 Simulation Studies
   Formulation of the Problem
   Objectives of Experiments
   Tests about the Mean
      Normal Distribution Models
      Poisson Distribution Models

6 Implementation
   Software
   Program Design and Implementation Details

7 Conclusion
   Summary of the Proposed Methods
   What has been done in the study
   Areas Left for Future investigation

A SAS code 81
   A.1 Supplydata SAS code
   A.2 Procedure 3 SAS code
   A.3 Procedure 1 SAS code

List of Tables

3.1 Evidence against H_0
4.1 Expression data from two groups of subjects: cancer patients and healthy controls. The data are already normalized [10]
4.2 n realizations of a random g-vector
4.3 Type I and Type II errors
Summary of postulated and simulated models
Summary of test statistic null distribution, theoretical and bootstrap estimated
Summary of test statistic null distribution, theoretical and bootstrap estimated
The results of the twenty simultaneous tests about the mean vector of the multivariate normal distribution
Summary of postulated and simulated models
Summary of test statistic null distribution (from Poisson), theoretical and bootstrap estimated
The results of the twenty simultaneous tests about the mean vector of the multivariate Poisson distribution

List of Figures

5.1 Empirical Normal distribution
Estimated bootstrap distribution
Empirical Poisson Distribution (λ = 2.5)
Estimated Bootstrap Distribution
The main flowchart diagram of the project
The flowchart diagram of supplydata.sas
The flow chart diagram for procedure
The flow chart diagram for procedure

Chapter 1 Introduction

But it is not always so; it may happen that small differences in the initial conditions produce very great ones in the final phenomena. Jules H. Poincaré

1.1 Why multiple hypothesis testing?

Multiple Hypothesis Testing is the testing of more than one hypothesis at the same time. It represents a rich field of scientific research within the branch of inferential statistics that addresses multiple comparisons. For over half a century, beginning with world-famous statisticians such as Fisher, Tukey, Bonferroni, and Duncan, to name just a few, statisticians have worked on the development and improvement of multiple comparison procedures based on the parameters under consideration and various assumptions about the underlying distributions, still leaving room for improvement or alternative implementation. Even the very need for multiple hypothesis testing

remains controversial: in which situations should multiple testing methods be applied, and whether they should be applied at all. What considerations must we have when addressing the question of statistical significance? To keep it simple, let us suppose that we would like to study how vitamins affect people's strength. In our experiment, we randomly divide, say, 100 people into 5 groups of 20 and ask each person to take a daily pill. One group is assigned to be the control group and takes a placebo (a pill that contains no vitamins at all). The remaining four groups are treatment groups and take, respectively, a low dose of vitamin brand A, a high dose of vitamin brand A, a low dose of vitamin brand B, and a high dose of vitamin brand B. The response variable is a certain characteristic of people's strength. Is there a significant difference in responses between the control group and each treatment group? Is there a significant difference in responses between groups taking different dosages, different vitamin brands, or different dosages and different brands? The attempt to answer all these questions leads to (5 choose 2) = 10 pairwise comparisons. If we test each null hypothesis of no significant difference between two groups at the 5% level of significance and all tests are independent, then the probability that we falsely reject at least one true null hypothesis is P(at least one false positive) = 1 − (0.95)^10 ≈ 0.401. In other words, with only 10 hypothesis tests performed simultaneously, there is a slightly over 40% chance that the researchers will report significant findings when in reality no effect exists. This probability increases rapidly with the number of tests required. With as many as 20 tests, the probability of at least one false rejection

13 3 already reaches 64%. The primary concern of the theory of the multiple hypothesis testing is to develop methods that would adjust or account for multiplicity effect. 1.2 Example: HLA-disease associations Questions raised in various research fields such as medicine, economics, engineering sciences, and social sciences often call for Multiple Hypothesis Testing. Human genetics White blood cells are components of blood and are part of the immune system. Another name for white blood cells is leukocytes or immune cells. White blood cells carry a group of genes called the human leukocyte antigen (HLA) system. It consists of several closely linked genetic loci on chromosome number 6. The loci within the HLA system are highly polymorphic showing numerous alleles (i.e. alternative forms of a single gene). It has been demonstrated that some human leukocyte antigens (genetic markers) are linked to particular diseases. For instance, it s been shown that HLA-B5 marker is associated with Hodgkin s disease, and HLA-B27 marker is associated with ankylosing spondylitis. Can there be associations of particular markers with other diseases?... literally scores of studies of HLA-disease associations have been published. Each of these

14 4 studies has attempted to find an HLA association with some particular disease, and a suprising number have succeeded. A few of these associations, for example that between HLA-B27 and ankylosing spondylitis, have been repeatedly confirmed and are very striking and obviously real. Most of the reported associations, however, have not withstood closer examination.... The problem lies in the large number of intercorrelated hypothesis tests, one for each antigen, a procedure that has a high probability of yielding a significant result for at least one of the tests, even when no real association exists. [9] 1.3 Thesis overview This thesis focuses on the implementation and simulation studies of multiple testing procedures that have been recently developed [1]. The objective of experiments performed in these studies is to demonstrate that these procedures based on bootstrapresampling techniques provide asymptotic control for Type I Error rate for a wide range of multiple hypothesis testing problems. Chapter 2, Methods of Multiple Hypothesis Testing, outlines some of the wellknown multiple testing procedures and their features. Chapter 3, Bootstrap-Based Resampling Techinques, introduces the reader to the resampling techniques and discusses their effectiveness and robustness. Chapter 4, Proposed Bootstrap Multiple Testing Procedures, discusses the new procedures. Chapter 5, Simulation Studies, outlines the main objectives of simulation studies and presents the results of a se-

15 5 ries of experiments. Chapter 6, Implementation, is devoted to implementation of these new procedures with the Statistical Analysis System (SAS) programming language. Chapter 7, Conclusion, summarizes the results and discusses the advantages and disadvantages of new procedures in the light of existing procedures and classical methods.

16 Chapter 2 Methods of multiple hypothesis testing Euclid taught me that without assumptions there is no proof. Therefore, in any argument, examine the assumptions. E. T. Bell 2.1 Background: Analysis of Variance Historically, many multiple comparison procedures originate from the test addressing equality of the group means. The framework of the test lies within the Analysis of Variance Procedure. Suppose we want to compare the average effects of k treatments. One question we might ask is, Do all treatments produce the same effect? Then, the null and 6

alternative hypotheses will be stated as follows:

H_0: µ_1 = µ_2 = ... = µ_k
H_1: not all the µ_j's are equal

An extreme value of the F statistic at a given level α only indicates that one or more of the µ_j's differ significantly from the others. The rejection of H_0 gives no information about which µ_j's are different and which are the same. To answer these kinds of questions, many comparisons of the means might be needed. These problems give rise to Multiple Comparison Methods or, more generally, Multiple Hypothesis Testing.

2.2 Multiple Hypothesis Testing

Multiple Hypothesis Testing is a branch of simultaneous inference and refers to methods designed to test two or more hypotheses simultaneously. The desirable property of such methods or procedures is to control the number of falsely rejected hypotheses in a probabilistic sense. Other considerations are also important: the ability of the method (procedure) to correctly recognize the set of true hypotheses (known as the power of the test), and the ability of the procedure to account for the dependence structure and/or logical constraints among the set of hypotheses to be tested. To help build the theoretical foundation for the ideas presented in this and the following chapters, let us introduce important notions and terminology that will be used throughout the entire paper.

Basic Concepts Defined

Let H_0 and H_1 represent the null and alternative hypotheses, respectively. In Multiple Hypothesis Testing problems we have a collection of null and corresponding alternative hypotheses {(H_0j, H_1j) : j = 1, ..., m}, where m ≥ 2 is the number of hypotheses to be tested. A Multiple Hypotheses Test is a procedure or an algorithm that leads to a decision whether or not each null hypothesis should be rejected in favor of its alternative. A family of (k choose 2) hypotheses, for example, can be stated as follows, introducing the problem of pairwise comparisons among k group means:

H_0: µ_i = µ_j versus H_1: µ_i ≠ µ_j, for all i < j, i, j ∈ {1, 2, ..., k}.

To compare the means among k = 5 groups, for instance, (5 choose 2) = 5!/(2! 3!) = 10 pairwise comparisons are required. Other examples of families of hypotheses are comparisons of group means to a control or tests of general contrasts. Generally speaking, the parameters of interest in hypothesis testing are means, variances, covariances, correlations, or parameters of a regression model.
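To make the multiplicity effect of Chapter 1 concrete, the short SAS data step below (a sketch; the data set and variable names are illustrative and not part of the thesis programs) computes the probability of at least one false positive, 1 − (1 − α)^m, when all m = (k choose 2) pairwise tests are independent and each is carried out at level α.

data fwer_inflation;
   alpha = 0.05;
   do k = 2 to 10;                        /* number of group means            */
      m = comb(k, 2);                     /* number of pairwise comparisons   */
      p_false_pos = 1 - (1 - alpha)**m;   /* P(at least one false positive)   */
      output;
   end;
run;

proc print data=fwer_inflation noobs;
   var k m p_false_pos;
run;

With k = 5 groups this gives m = 10 comparisons and a probability of about 0.401, matching the 40% figure quoted in Section 1.1.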

19 9 Overall, the procedures for hypotheses testing can be distinguished by the type of inference or comparisons they make and the strength of the inference they provide. The rejection of a true null hypothesis is classified as Type I error (false positive is another term frequently used in the theory of hypothesis testing). If the procedure fails to reject a false null hypothesis, we say a Type II error (false negative) is committed. In multiple hypothesis testing, therefore, the procedure may result in more than one false positive. In this respect, we can define the rate of false positives or Type I error rate. Commonly-used Type I error rate can be defined as follows [1]: Per-comparison error rate (PCER) is the expected proportion of Type I errors among the m tests. Per-family error rate (PFER) is the expected number of Type I errors. Median-based per-family error rate (mpfer) is the median number of Type I errors. Family-wise error rate (FWER) is the probability of at least one Type I error. Generalized Family-wise error rate (gfwer) is the probability of at least (k+1) Type I errors, where (k + 1) cannot exceed the number of true null hypotheses. Strong and weak control When appropriate, procedures can be compared or classified by the type of control they provide for Type I error rate.

Definition 1. For any given Multiple Testing Procedure, if

Pr(reject at least one H_i, i = j_1, ..., j_t | H_{j_1}, ..., H_{j_t} are true) ≤ α

for any configuration of true nulls H_{j_1}, ..., H_{j_t}, then the MTP controls the FWER in the strong sense.

Definition 2. A Multiple Testing Procedure is said to control the FWER in the weak sense if

Pr(reject at least one H_i | all H_i are true) ≤ α.

Another criterion that can be applied when characterizing a certain procedure or comparing methods is whether the test is conservative.

Definition 3. A hypothesis test is conservative if the actual significance level of the test is smaller than the stated significance level of the test. A conservative test may incorrectly fail to reject the null hypothesis, and thus is less powerful than expected.

Subset pivotality condition

Let m be the number of hypotheses being tested and {P_1, ..., P_m} a vector of unadjusted p-values. (Note that the P_i's denote the random p-values, as opposed to the p_i's, the experimentally observed p-values.) [2]

Definition 4. The distribution of the vector {P_1, ..., P_m} is said to have the subset pivotality property if the joint distribution of the subvector {P_i : i ∈ K} is identical

under the restrictions ∩_{i∈K} H_0i and H_0^C, for all subsets K = {i_1, ..., i_j} of true null hypotheses. Here, H_0^C denotes the complete null specification (H_0^C = ∩_{i=1}^m {H_0i is true}). As stated in [2]: "The subset pivotality condition is important for two reasons. First, resampling is particularly convenient under this condition: resampling is done under the complete null hypothesis H_0^C, rather than under partial hypotheses H_0^K. Second, when subset pivotality holds, it will be shown that such resampling-based methods control the Family Wise Error rate in the strong sense (at least approximately, under asymptotic subset pivotality). Without this condition, resampling under H_0^C can be assumed to control only the FWEC." In [2], FWEC denotes the FWER calculated under the complete null hypothesis.

Classification of methods

Methods of Multiple Hypothesis Testing can be classified as follows: single-step methods, and stepwise methods that can be further subdivided into step-up and step-down methods.

Definition 5. Single-step methods are simultaneous test procedures that perform an equivalent multiplicity adjustment for all tests, regardless of the ordering of the observed p-values p_1, ..., p_k, and without considering any predetermined sequence of hypotheses.

Definition 6. Stepwise simultaneous test procedures allow different adjustment techniques for different hypotheses, depending upon how the hypotheses are ordered. Hypotheses may be ordered according to the size of the p-values, or by experimental or logical constraints. One may achieve an improvement in power and control of the Type I error rate by the use of stepwise procedures.

"In step-down procedures, the hypotheses corresponding to the most significant test statistics (e.g. smallest unadjusted p-values) are considered successively, with further tests depending on the outcome of earlier ones. As soon as one fails to reject a null hypothesis, no further hypotheses are rejected." [1]

"In step-up procedures, the hypotheses corresponding to the least significant test statistics are considered successively, again with further tests depending on the outcome of earlier ones. As soon as one hypothesis is rejected, all remaining more significant hypotheses are rejected." [1]

In the following subsections, we'll examine procedures representing the class of single-step procedures and the class of stepwise procedures. The Bonferroni procedure (Fisher's second procedure), for example, can be considered a typical single-step procedure, while the Holm procedure is an example of a step-down procedure. The newly proposed procedures introduced in Chapter 4 belong to the class of single-step procedures. Therefore, the focus of the discussion will be on single-step procedures.

2.3 Single-Step Procedures

LSD Test

Fisher's first procedure to account for multiplicity is called the protected Least Significant Difference (LSD) test. The procedure is done in two steps:
Step 1. Apply the ANOVA F-test to check whether the means are significantly different.
Step 2. If the result is significant, perform multiple t-tests, each at level α. If the result is not significant, no additional tests are required and the procedure terminates.
Features: The procedure doesn't control the FWER for all configurations of group means. Control of the FWER is only provided under the null hypothesis that there is no difference in means. Thus, the test controls the FWER in the weak sense.

Bonferroni procedure

The Bonferroni procedure performs the multiple t-tests, each at level α′ = α/(k choose 2) in the pairwise-comparison setting. The decision rule can be stated in terms of p-values: if p_j < α/k, where k is the number of tests, H_0j should be rejected. For the Bonferroni procedure, adjusted p-values p̃_j can be defined by p̃_j = min(k p_j, 1) and used equivalently in the decision rule: reject H_0j if p̃_j ≤ α.

Features: The procedure controls the FWER in the strong sense. The method is conservative (it fails to account for dependencies among tests). The Bonferroni procedure can be used to obtain confidence intervals for all pairwise differences among the group means.

Šidák's Method

Šidák's Method rejects H_0j when p_j < 1 − (1 − α)^{1/k}, where p_j is the corresponding p-value. The adjusted p-value is computed as follows: p̃_j = 1 − (1 − p_j)^k.
Features: Šidák's Method is conservative when the p-values are not independently distributed.

Tukey's Method

Tukey's method is designed for the (k choose 2) pairwise comparisons of k individual means, that is, H_0: µ_i = µ_j versus H_1: µ_i ≠ µ_j for all i < j, where i, j ∈ {1, 2, ..., k}. The test is performed using confidence intervals for µ_i − µ_j. The construction of the confidence intervals involves the studentized range, Q_{k,ν} = R/S, where R is the range

25 15 of a set of normally distributed random variables and S is their estimated standard deviation. Features: Tukey s method works for one-way balanced ANOVA. The method accounts for dependencies and thus, achieves more power Tukey-Kramer method The Tukey-Kramer method is a generalization of Tukey s method designed to work in the case of unbalanced design. Features: As the differences between the group size increase, the method becomes more conservative. The method controls FWER for means comparisons in a strong sense. The method is more powerful than the Bonferroni, Sidak, or Scheffe methods for pairwise comparisons Scheffé Method Scheffé Method is built within ANOVA framework: If ANOVA gives insignificant results, there will be no significant contrasts declared by the Scheffé Method. Let k

be the number of means to be compared; then two means, µ_i and µ_j, are considered to be significantly different if t_ij² ≥ (k − 1) F(α; k − 1, ν). Observe that the critical value depends on the number of means and not on the number of tests.
Features: The method is appropriate for all possible comparisons. Thus, the method is appropriate for pairwise comparisons, general contrasts, or orthogonal contrasts. The method controls the FWER for all possible contrasts. The method is known to be conservative in the case of pairwise comparisons. The power of the test increases when the number of comparisons is large compared to the number of means. For pairwise comparisons, Šidák's method gives more power and thus is preferable.

2.4 Stepwise Procedures

Holm's procedure

Holm's method is a step-down procedure based on the Bonferroni inequality. Let p_(1), p_(2), ..., p_(k) be the ordered p-values such that p_(1) ≤ p_(2) ≤ ... ≤ p_(k), corresponding to the null hypotheses H_0(1), H_0(2), ..., H_0(k). The decision rule to reject null hypotheses can be conveniently stated in terms of adjusted p-values: reject H_0(j) if p̃_(j) ≤ α.

Here is how one would calculate the adjusted p-values for the Holm procedure [2]:

p̃_(1) = k p_(1)
p̃_(2) = max(p̃_(1), (k − 1) p_(2))
...
p̃_(j) = max(p̃_(j−1), (k − j + 1) p_(j))
...
p̃_(k) = max(p̃_(k−1), p_(k))

Features: The test can be applied to any family of pairwise comparisons (no assumptions required about the model or distribution). The procedure provides strong control of the FWER. The test is conservative.

Shaffer method

Shaffer's method is an improvement over the Holm method. The method incorporates the logical constraints among the hypotheses.
Features:

The method controls the FWER in the strong sense. The method is more powerful than the Holm method (as stated in Westfall).

2.5 Choosing the right method

No method is universal enough to be applied to all possible situations where multiple hypothesis testing is required. Since methods can generally be classified by the type and strength of inference they provide, the choice of method depends on the particular question under consideration. Other factors also play an important role, such as the sample sizes, the assumptions we can make about models and distributions, known or unknown logical constraints, and the dependence structure among the test hypotheses. In addition, single-step procedures are typically based on, or may lead to, the construction of simultaneous confidence intervals, while stepwise procedures are generally more powerful but in most cases do not produce simultaneous confidence intervals.
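As a brief practical illustration of the classical adjustments discussed in this chapter, the following SAS sketch applies the Bonferroni, Šidák, and Holm (step-down Bonferroni) adjustments to a set of raw p-values with PROC MULTTEST. The data set and the p-values are hypothetical; only the variable name raw_p is required by the procedure.

data rawp;                     /* hypothetical unadjusted p-values */
   input raw_p @@;
   datalines;
0.0002 0.011 0.020 0.041 0.13 0.47 0.62 0.81 0.90 0.95
;

proc multtest pdata=rawp bonferroni sidak stepbon out=adjusted;
run;

proc print data=adjusted;      /* raw and adjusted p-values side by side */
run;

Holm's procedure corresponds to the STEPBON (step-down Bonferroni) option; each adjusted p-value in the OUT= data set can be compared directly with the nominal level α.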

Chapter 3 Bootstrap-based resampling techniques

On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. Charles Babbage

3.1 Introduction

The theory of statistical inference essentially rests on the construction of sampling distributions and the computation of an accuracy measure for a given statistic. Bias and standard error are typically used to assess the accuracy of an estimator. The standard error of the mean can be estimated analytically from its empirical distribution.

30 20 This is an example of a traditional way of dealing with a situation. However, the traditional approach may fail or have certain disadvantages. Theoretical formulas require assumptions about the model. If the postulated model is different from the true model or the model changes because of added or removed conditions, the results may be invalid and thus, theoretical formulas will have to be derived again for each new problem. Resampling is a method or rather a class of methods used to obtain samples from a given data set and thus estimate or approximate the accuracy and the sampling distribution of a statistic under consideration. Most popular methods are Bootstrap, Jackknife and Permutation methods. In this chapter we focus on the Bootstrap method since the proposed multiple testing procedures are based on a bootstrap resampling technique. The bootstrap method was introduced in 1979 by Efron as a computer-based resampling method to estimate the standard error of a parameter estimate. (For instance, the bootstrap method can be used to estimate the standard error of a sample mean.) Nowadays, the bootstrap can be applied to a variety of statistical procedures. Here are some usages of the bootstrap methods [3]: The bootstrap estimate of a standard error of a statistic from a single sample. The bootstrap estimate of bias. The bootstrap construction of confidence intervals.

The bootstrap applied to hypothesis testing problems.

3.2 Basic definitions

Let X_1, X_2, ..., X_n be independent and identically distributed (iid) random variables from an unknown distribution F. Let θ be a parameter of the distribution F and let θ̂ be a statistic that estimates the parameter of interest θ. The empirical distribution F̂, also denoted by F_n, is defined by

F_n(x) = (1/n) Σ_{i=1}^n I{X_i ≤ x},

where I{X_i ≤ x} is the indicator function: I{X_i ≤ x} = 1 if X_i ≤ x, and 0 otherwise. If x_i is a realization of the random variable X_i, then x = (x_1, x_2, ..., x_n) is a random sample drawn from the unknown distribution F. In the construction of the empirical distribution F̂, each x_i has probability 1/n of occurring. We can also think of a parameter θ as a function of the probability distribution F and of a statistic as a function of the sample x. Let θ = τ(F) and θ̂ = t(x).

Definition 7. The estimator θ̂ is called a plug-in estimate of a parameter θ = τ(F) if θ̂ = τ(F̂).

For instance, the sample mean x̄ = n^{-1} Σ_{i=1}^n x_i is a plug-in estimate of the population mean µ, but the sample variance s² = (n − 1)^{-1} Σ_{i=1}^n (x_i − x̄)² is not a plug-in estimate of the population variance σ². Bootstrap resampling methods can be classified as parametric or nonparametric. The definition below refers to the nonparametric bootstrap. Nonparametric bootstrap methods are based on the empirical distribution F̂, while parametric bootstrap methods are based on F̂_θ̂, an estimate of a parametric model F_θ.

Nonparametric bootstrap

Definition 8 (Bootstrap sample). A bootstrap sample x^# = (x^#_1, ..., x^#_n) is a random sample of size n where each x^#_i is obtained with probability 1/n by drawing with replacement from the original sample x = (x_1, ..., x_n). For example, say we have a random sample x of size n = 7 drawn from a distribution F, and x^# is one of the possible bootstrap samples; then

x = (x_1, x_2, x_3, x_4, x_5, x_6, x_7) is the actual data set,
x^# = (x_3, x_7, x_1, x_4, x_3, x_1, x_6) is a bootstrap sample.

And thus x^#_1 = x_3, x^#_2 = x_7, x^#_3 = x_1, x^#_4 = x_4, x^#_5 = x_3, x^#_6 = x_1, x^#_7 = x_6. A statistic θ̂^# obtained from a bootstrap sample is called a bootstrap replication of

33 23 ˆθ. For instance, if ˆθ = x = n i=1 x i/n is a sample mean, then ˆθ # = x # = n i=1 x# i /n is a bootstrap replication of a sample mean Parametric bootstrap Again, let x = (x 1, x 2,..., x n ) be n independent realizations of random variable X F. If θ is a parameter or a vector of parameters of distribution F, then ˆFˆθ is a parametric estimate of the probability distribution F. A bootstrap sample x # = (x # 1, x # 2,..., x # n ) of size n is drawn from a parametric estimate ˆFˆθ of the true unknown distribution F θ. 3.3 Bootstrap estimate of standard error As defined previously, θ is a parameter of interest from a population described with unknown distribution F. Draw a random sample x from F and calculate an estimate of θ. x = (x 1, x 2,..., x n ) ˆθ. We want to assess the accuracy of ˆθ. Here we outline the algorithm that uses bootstrap resampling method to estimate the standard error of the ˆθ. Step 1 Draw B independent bootstrap samples of same size n. Bootstrap technique can be parametric or nonparametric. When parametric bootstrap is used, replace

empirical distribution F̂ with the parametric estimate of the population, F̂_θ̂.

F̂ → x^{#1} = (x^{#1}_1, x^{#1}_2, ..., x^{#1}_n)
F̂ → x^{#2} = (x^{#2}_1, x^{#2}_2, ..., x^{#2}_n)
...
F̂ → x^{#B} = (x^{#B}_1, x^{#B}_2, ..., x^{#B}_n)

Step 2 Obtain a bootstrap replication of θ̂ from each bootstrap sample.

x^{#1} = (x^{#1}_1, ..., x^{#1}_n) → θ̂^{#1} = t(x^{#1})
x^{#2} = (x^{#2}_1, ..., x^{#2}_n) → θ̂^{#2} = t(x^{#2})
...
x^{#B} = (x^{#B}_1, ..., x^{#B}_n) → θ̂^{#B} = t(x^{#B})

Step 3 Calculate the sample standard deviation of the B bootstrap replications. This sample standard deviation is the bootstrap estimate of the standard error of θ̂:

ŝe_B = [ Σ_{b=1}^B (θ̂^{#b} − θ̂^{#}(·))² / (B − 1) ]^{1/2},

where θ̂^{#}(·) = Σ_{b=1}^B θ̂^{#b}/B.

Number of bootstrap replications needed

As stated in [3],
1. Even a small number of bootstrap replications (B = 25) is usually informative.

B = 50 is often enough to give a good estimate of se_F(θ̂).
2. Very seldom are more than B = 200 replications needed for estimating a standard error. Much bigger values of B are required for bootstrap confidence intervals.

3.4 Bootstrap estimate of bias

Let θ̂ be an estimator of the parameter θ. The bias of θ̂, denoted by bias(θ̂), is defined as the difference between the expected value of θ̂ and the parameter θ being estimated,

bias_F(θ̂) = E_F(θ̂) − θ.

The estimator θ̂ is called an unbiased estimator of θ if E_F(θ̂) = θ. Unbiasedness is a desirable property of an estimator. Using bootstrap samples, we'll obtain the bootstrap estimate of bias, defined as follows:

bias_F̂ = E_F̂[t(x^#)] − τ(F̂).

Having utilized the algorithm for approximating the standard error of θ̂, generate B bootstrap samples x^{#1}, ..., x^{#B} and compute the bootstrap replications θ̂^{#b} = t(x^{#b}), b = 1, ..., B. Then,

bias_B = θ̂^{#}(·) − τ(F̂),

where θ̂^{#}(·) = Σ_{b=1}^B θ̂^{#b}/B. Also observe that the formula for the bootstrap estimate of bias uses the plug-in estimate τ(F̂).

3.5 Bootstrap confidence intervals

The bootstrap-t interval

Let θ be a parameter of interest and θ̂ = τ(F̂) be a plug-in estimate of θ. In addition to the point estimate θ̂, we may also be interested in constructing an interval to estimate θ with a desired degree of confidence. If α is a real number between 0 and 1, typically taking small values such as 0.01, 0.05, or 0.10, a (1 − α)·100% confidence interval can be derived as follows:

[θ̂ − q^{(1−α/2)}·ŝe, θ̂ − q^{(α/2)}·ŝe],

where ŝe can be either a bootstrap estimate or any other reasonable estimate of the standard error of θ̂, and q^{(α/2)} and q^{(1−α/2)} are the 100·(α/2) and 100·(1−α/2) percentiles, respectively, of the distribution of the random variable Z = (θ̂ − θ)/ŝe. Note that the random variable Z used here does not necessarily have a standard normal distribution. Whenever normality holds (at least in an asymptotic sense), the q^{(α/2)} and q^{(1−α/2)} values can be replaced by the standard scores from the standard normal table. For instance, q^{(0.025)} = z_{0.025} = −1.96 and q^{(0.975)} = z_{0.975} = 1.96, and thus the 95% confidence interval for θ will be constructed as [θ̂ − 1.96·ŝe, θ̂ + 1.96·ŝe].

When Z cannot be assumed to follow a standard normal or a t-distribution, the bootstrap can be used to obtain an accurate interval. Here is the procedure:

Step 1 Generate B bootstrap samples x^{#1}, x^{#2}, ..., x^{#B}. Typically, B = 1000 is required for quantile estimation.

Step 2 For each bootstrap sample b, compute θ̂^{#b} = t(x^{#b}) and the estimated standard error of θ̂^{#b}, denoted by ŝe^{#b}, and form

Z^{#b} = (θ̂^{#b} − θ̂) / ŝe^{#b}.

Note that when θ̂ is not a sample mean but a more complicated statistic, bootstrap resampling may be used to estimate ŝe^{#b} for each bootstrap sample b. This results in nested bootstrap sampling.

Step 3 Let Q denote the cumulative distribution of the Z^{#b} values. Then the α/2 quantile of Z^{#b} is estimated by the value t̂_{α/2} such that

t̂_{α/2} = inf{z : Q(z) ≥ α/2}.

Step 4 Construct the bootstrap-t (1 − α)·100% confidence interval:

(θ̂ − t̂_{1−α/2}·ŝe, θ̂ − t̂_{α/2}·ŝe).
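The SAS sketch below illustrates the bootstrap-t procedure for the simple case where θ is a population mean, so that each ŝe^{#b} has the closed form s^{#b}/√n and no nested bootstrap is needed. The data set WORK.SAMPLE, the analysis variable X, the seed, and B = 1000 are illustrative assumptions; this is not the thesis code of Appendix A.

/* Original-sample estimate and its standard error s/sqrt(n) */
proc means data=sample noprint;
   var x;
   output out=orig mean=theta_hat stderr=se_hat;
run;

/* Step 1: draw B = 1000 bootstrap samples (with replacement) */
proc surveyselect data=sample out=boot seed=20060501
   method=urs samprate=1 outhits reps=1000;
run;

/* Step 2: bootstrap replications and their standard errors */
proc means data=boot noprint;
   by replicate;
   var x;
   output out=bootstats mean=theta_b stderr=se_b;
run;

/* Studentized statistics Z = (theta_b - theta_hat)/se_b */
data z;
   if _n_ = 1 then set orig(keep=theta_hat);
   set bootstats;
   z = (theta_b - theta_hat) / se_b;
run;

/* Step 3: bootstrap quantiles of Z */
proc univariate data=z noprint;
   var z;
   output out=quant pctlpts=2.5 97.5 pctlpre=t_;
run;

/* Step 4: bootstrap-t 95% confidence limits */
data ci;
   merge orig quant;
   lower = theta_hat - t_97_5 * se_hat;
   upper = theta_hat - t_2_5  * se_hat;
run;

The 2.5% and 97.5% bootstrap quantiles of Z play the roles of t̂_{α/2} and t̂_{1−α/2} in Step 4.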

Two main disadvantages of the bootstrap-t algorithm are (1) costly computation as the result of two nested levels of bootstrap samples, and (2) erratic results in the case of a small-sample, nonparametric setting [3].

3.6 Bootstrap-based hypothesis testing

The bootstrap-based hypothesis test proposed by Efron is similar to the permutation test introduced by R.A. Fisher in the 1930s [3]. The procedure is designed to test the null hypothesis that the two probability distributions from which the samples are drawn are identical. Let X ∼ F and Y ∼ G, and observe independent random samples, of sizes n and m respectively, x = {x_1, x_2, ..., x_n} and y = {y_1, y_2, ..., y_m}. The null hypothesis is

H_0: F = G.

Here is a bootstrap algorithm:

Step 1 Calculate a test statistic (in this case, the difference of means): T(x, y) = x̄ − ȳ, where x̄ = Σ_{i=1}^n x_i/n and ȳ = Σ_{i=1}^m y_i/m.

Step 2 Form a new sample w of size n + m by combining the samples x and y:

w = (x_1, x_2, ..., x_n, y_1, y_2, ..., y_m).

Step 3 Generate B bootstrap samples from w. In each bootstrap sample, let the first n observations form the bootstrap sample x^# and the remaining m observations form the bootstrap sample y^#:

w^{#b} = (w^{#b}_1, ..., w^{#b}_n, w^{#b}_{n+1}, ..., w^{#b}_{n+m}),   b = 1, ..., B,

where the first n components constitute x^# and the last m components constitute y^#.

Step 4 For each bootstrap sample b, evaluate a bootstrap replication of the test statistic (in this case, a difference in means): T(w^{#b}) = x̄^{#b} − ȳ^{#b}, where x̄^{#b} = (1/n) Σ_{i=1}^n w^{#b}_i and ȳ^{#b} = (1/m) Σ_{i=n+1}^{n+m} w^{#b}_i.

Step 5 Approximate the bootstrap P-value of the test by

P̂_boot = (1/B) Σ_{b=1}^B I(T(w^{#b}) ≥ T(x, y)),

where I(·) is the indicator function.

Step 6 Decision Rule: Reject H_0 if P̂_boot ≤ α, where α is some prespecified level of significance.
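A SAS sketch of this algorithm follows. Because the n + m draws in each w^{#b} are independent, splitting them into the first n and the last m components is equivalent to drawing two independent bootstrap samples of sizes n and m from the pooled data, which is what the sketch does. The data set TWOGRP with variables GROUP ('x'/'y') and VALUE, the seeds, and B = 1000 are illustrative assumptions, not the thesis code.

/* Observed test statistic T(x, y) = xbar - ybar */
proc means data=twogrp noprint nway;
   class group;
   var value;
   output out=gm mean=mean_;
run;

data obs;
   merge gm(where=(group='x') rename=(mean_=xbar))
         gm(where=(group='y') rename=(mean_=ybar));
   t_obs = xbar - ybar;
   keep t_obs;
run;

/* Group sizes n and m */
proc sql noprint;
   select count(*) into :n from twogrp where group='x';
   select count(*) into :m from twogrp where group='y';
quit;

/* B bootstrap samples of sizes n and m drawn from the pooled sample w */
proc surveyselect data=twogrp out=bootx method=urs n=&n outhits reps=1000 seed=111;
run;
proc surveyselect data=twogrp out=booty method=urs n=&m outhits reps=1000 seed=222;
run;

proc means data=bootx noprint; by replicate; var value; output out=mx mean=xbar_b; run;
proc means data=booty noprint; by replicate; var value; output out=my mean=ybar_b; run;

/* Bootstrap replications of the test statistic and the bootstrap P-value */
data tstar;
   merge mx my;
   by replicate;
   if _n_ = 1 then set obs;
   exceed = ((xbar_b - ybar_b) >= t_obs);
run;

proc means data=tstar mean;   /* the mean of EXCEED is the bootstrap P-value */
   var exceed;
run;

Drawing the two resamples with separate SURVEYSELECT calls avoids having to split each pooled resample by position within a replicate.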

Statement          Interpretation
P-value < .10      borderline evidence against H_0
P-value < .05      reasonably strong evidence against H_0
P-value < .025     strong evidence against H_0
P-value < .01      very strong evidence against H_0

Table 3.1: Evidence against H_0.

Note that a studentized statistic could also be used:

T(x, y) = (x̄ − ȳ) / (σ̄ √(1/n + 1/m)),

where σ̄ = √{ [Σ_{i=1}^n (x_i − x̄)² + Σ_{i=1}^m (y_i − ȳ)²] / (n + m − 2) }. Generally, one can adopt the convention given in Table 3.1 to interpret the P-value of the test.

3.7 Robustness and failure of bootstrap

Definition 9. A procedure is said to be robust if it is not heavily affected by violations of the assumptions made about the model. In other words, robustness signifies insensitivity to small deviations from the assumptions [5].

Nonparametric bootstrap methods, and the jackknife methods which will be described further on, are considered robust since, to perform well, they do not require theoretical assumptions about the model. Robust methods offer a remedy when theoretical assumptions about the model are violated; at the same time, they do not claim to be efficient when all assumptions hold for the model. For instance, the equality of error variances is one of the classical assumptions of a linear model Y_i = β_0 + β_1 X_1i + ... + β_k X_ki + ε_i; that is, the variance Var(ε_i) = σ² is constant for all i = 1, ..., n. In the event the model assumptions are correct, the bootstrap method will not be the most efficient. However, we can estimate the regression model coefficients by bootstrapping if the equality-of-variances assumption fails. While bootstrap methods are known to be robust and efficient, we should still give consideration to the cases when the bootstrap approach does fail. The major cases of bootstrap failure include: small sample size, distributions with infinite moments, and estimation of extreme values [7].

3.8 Jackknife as an approximation to Bootstrap

Jackknife samples and estimates

The jackknife is another popular resampling method used for estimating the bias and standard error of an estimate. Jackknife samples are obtained by removing one observation at a time.

This technique was introduced by Quenouille in 1949 and precedes the bootstrap, introduced by Efron in 1979.

Definition 10. Let x = (x_1, x_2, ..., x_n) be a random sample of size n. The i-th jackknife sample is the sample with the i-th observation left out of the original sample:

x_(i) = (x_1, x_2, ..., x_{i−1}, x_{i+1}, ..., x_n), where i = 1, 2, ..., n.

Let θ be a parameter of interest and θ̂ its estimator. If θ̂_(i) = t(x_(i)) is a jackknife replication of θ̂, the jackknife estimate of bias is defined as

bias_jack = (n − 1)(θ̂_(·) − θ̂), where θ̂_(·) = (1/n) Σ_{i=1}^n θ̂_(i).

The jackknife estimate of standard error is defined by

ŝe_jack = [ ((n − 1)/n) Σ_{i=1}^n (θ̂_(i) − θ̂_(·))² ]^{1/2}.

Jackknife or Bootstrap?

Having introduced the two resampling techniques, we would like to outline further the advantages and the appropriateness of using one technique over the other. Let us introduce a few more definitions.

Definition 11. A statistic θ̂ is said to be a linear statistic if it can be written in the form

θ̂ = t(x) = µ + (1/n) Σ_{i=1}^n α(x_i),

where µ is a constant and α(·) is a function of the data.

Definition 12. A statistic θ̂ is said to be a quadratic statistic if it can be written in the form

θ̂ = t(x) = µ + (1/n) Σ_{1≤i≤n} α(x_i) + (1/n²) Σ_{1≤i≤j≤n} β(x_i, x_j),

where µ is a constant and α and β are functions of the data.

Examples of linear and nonlinear statistics are the mean and the correlation coefficient, respectively. If θ̂ is a linear statistic, the jackknife and bootstrap estimates of standard error agree, except for a factor of {(n − 1)/n}^{1/2} used by the jackknife. If θ̂ is a nonlinear statistic, the jackknife makes a linear approximation to the bootstrap. This means that if there is a certain linear statistic that approximates θ̂, then the jackknife will agree with the bootstrap (again, up to the factor noted above). Generally speaking, the accuracy of the jackknife estimate of the standard error of θ̂ depends on the degree of linearity of θ̂. If θ̂ is highly nonlinear, the jackknife can be very inefficient [3]. If θ̂ is a quadratic statistic, the jackknife and bootstrap estimates of bias essentially agree.

In the cases where the jackknife provides a good approximation to the bootstrap, there is an advantage to using the jackknife since it is easier to compute. However, if the statistic under consideration is not a smooth (differentiable) function of x, the jackknife estimate of standard error is inconsistent; that is, the estimator does not converge to the true standard error of θ̂.
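As a small illustration of the jackknife formulas of Section 3.8, the SAS/IML sketch below computes the jackknife standard-error estimate for the sample mean. The data set MYDATA and the variable X are hypothetical; SETDIF and SSQ are standard IML functions.

proc iml;
/* read one analysis variable into the column vector x (names are illustrative) */
use mydata;  read all var {x} into x;  close mydata;
n = nrow(x);

thetaJack = j(n, 1, .);               /* jackknife replications theta_(i) */
do i = 1 to n;
   idx = setdif(1:n, i);              /* leave observation i out */
   thetaJack[i] = sum(x[idx]) / (n - 1);
end;

thetaDot = sum(thetaJack) / n;        /* average of the leave-one-out estimates */
seJack = sqrt( (n - 1)/n * ssq(thetaJack - thetaDot) );
print seJack[label="jackknife estimate of the standard error"];
quit;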

Chapter 4 Proposed bootstrap multiple testing procedures

Probability is expectation founded upon partial knowledge. A perfect acquaintance with all the circumstances affecting the occurrence of an event would change expectation into certainty, and leave neither room nor demand for a theory of probabilities. George Boole

4.1 Microarrays as a Motivating Factor

Background

The objective of this subsection is to provide the reader with the most basic terminology used throughout this section. In no way is this intended as a crash course

46 36 in genetics. Much of what will be said in the upcoming example has to do with the expression of genes. Therefore, some background will certainly be helpful. A gene is a segment or region of DNA that encodes instructions, which allow a cell to produce a specific product. This product is typically a protein, such as an enzyme. Proteins are used to support the cell structure, break down chemicals, build new chemicals, transport items, and regulate production. Every human being has about 40, 000 putative genes that produce proteins. Many of these genes are always identical from one person to another, but others show variation in different people. The genes determine hair color, eye color, sex, personality, and many other traits that in combination make everyone a unique entity [10]. Every cell of an individual organism will contain the same DNA, carrying the same information. However, a liver cell will be obviously different from a muscle cell for example. The differentiation occurs because not all the genes are expressed in the same way in all cells. The differentiation between cells is given by different patterns of gene activations which in turn control the production of proteins. [10] A gene is active, or expressed, if the cell produces the protein encoded by the gene. If a lot of protein is produced, the gene is said to be highly expressed. If no protein is produced, the gene is not expressed or unexpressed. The objective of researchers is to detect and quantify gene expression levels under particular circumstances. One can

compare various tissues with each other, or a tumor tissue with healthy tissue. Gene expression can be used to understand phenomena related to aging or fetal development. While there have been methods available to look at the expression levels of genes, the problem with those methods was that only a few genes could be analysed at a time [10]. Microarrays, on the other hand, are a powerful technology that allows the simultaneous measurement of expression levels for up to tens of thousands of genes. The fact that microarrays can interrogate thousands of genes at the same time has led to the wide adoption of this technology, but it also creates a number of challenges associated with its use. The classical techniques (such as the chi-square test) that were designed to test whether there is a significant difference between the groups considered cannot be applied directly, because in microarray experiments the number of variables (usually thousands of genes) is much greater than the number of experiments (say, tens of experiments).

Microarray Experiment

Let us consider an experiment comparing gene expression levels in two different conditions, such as healthy tissue vs. tumor. Suppose in our experiment we would like to compare 20 genes using 5 tumor samples and 5 healthy tissue samples. The data have been pre-processed and normalized and are presented in Table 4.1. The last step in the normalization is division by the global maximum; thus, all values are between zero and one. The maximum value was an internal control, so the value one

does not actually appear in the data. The task here could be to find those genes that are differentially regulated between cancer patients and healthy subjects.

[Table 4.1 lists the normalized expression values of genes g1-g20 for the five tumor samples (T1-T5) and the five control samples (C1-C5); the numeric entries are not reproduced here.]

Table 4.1: Expression data from two groups of subjects: cancer patients and healthy controls. The data are already normalized [10].
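For data laid out as in Table 4.1, a resampling-based multiple testing analysis can be sketched in SAS as follows. The sketch assumes the table has been transposed so that each row is a subject, with a classification variable GROUP (tumor/control) and gene columns g1-g20; the data set name EXPR, the seed, and the number of resamples are illustrative, and this built-in procedure is only a rough stand-in for the procedures developed later in this chapter.

/* Gene-wise two-sample mean comparisons with bootstrap-resampled adjusted p-values */
proc multtest data=expr bootstrap nsample=5000 seed=2006 out=adjp;
   class group;              /* tumor vs. control */
   test mean(g1-g20);        /* one test per gene */
run;

proc print data=adjp;
run;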

4.2 Multiple Hypothesis Testing Model

Let P be a data generating distribution and M be a statistical model (parametric or non-parametric). Let X be a random g-vector such that X ∼ P ∈ M and X = (X(j) : j = 1, ..., g). Thus, X_1, ..., X_n are n iid random variables, each of which is a g-vector. In the light of the DNA microarray data presented in Table 4.1, for a patient i (i = 1, ..., n), let x_i = (x_i(1), x_i(2), ..., x_i(g)) be a realization of the random variable X_i. Then the data frame might be presented as the one given in Table 4.2. With a data set such as the one given in Table 4.1 (or the more general setting in Table 4.2), a researcher might be interested in comparing the mean expression levels of genes from tumor tissue and healthy (control) tissue. This example helps outline the challenges that arise in problems of statistical inference in genomic data analysis: (i) high-dimensional multivariate distributions, with typically unknown and intricate correlation patterns among variables; (ii) large parameter spaces; (iii) a number of variables (hypotheses) that is much larger than the sample size; and (iv) some nonnegligible proportion of false null hypotheses, i.e., true positives [1].

            Tumor (treatment) group                Control group
            1          2        ...   k            k+1         ...   n
X(1)     x_1(1)     x_2(1)    ...  x_k(1)       x_{k+1}(1)   ...  x_n(1)
X(2)     x_1(2)     x_2(2)    ...  x_k(2)       x_{k+1}(2)   ...  x_n(2)
X(3)     x_1(3)     x_2(3)    ...  x_k(3)       x_{k+1}(3)   ...  x_n(3)
...      ...        ...       ...  ...          ...          ...  ...
X(g)     x_1(g)     x_2(g)    ...  x_k(g)       x_{k+1}(g)   ...  x_n(g)

Table 4.2: n realizations of a random g-vector.

Null hypotheses

General definition

Let m be the number of null hypotheses and let {M_j}_{j=1}^m be a collection of submodels, that is, M_j ⊆ M for each j = 1, ..., m. Define m null hypotheses and corresponding alternative hypotheses as follows:

H_0j = I(P ∈ M_j),   H_1j = I(P ∉ M_j).

Here I is the indicator function.

Special case

Typically, corresponding null and alternative hypotheses are defined in terms of single parameters, which are functions of the data generating distribution. Consider an m-vector of parameters µ = (µ(j) : j = 1, ..., m), where each µ(j) = µ_j(P) ∈ R is a function of the unknown data generating distribution. Let µ_0(j) be the hypothesized null-values. There are two types of testing problems:

One-sided tests: H_0j = I(µ(j) ≤ µ_0(j)) versus H_1j = I(µ(j) > µ_0(j)), j = 1, ..., m.

Two-sided tests: H_0j = I(µ(j) = µ_0(j)) versus H_1j = I(µ(j) ≠ µ_0(j)), j = 1, ..., m.

Parameters of interest

Parameters of interest can be classified as follows:
Location parameters (means, differences in means, medians)
Scale parameters (standard deviations, covariances and correlations)
Regression parameters (slopes, main effects, interactions, parameters of the Cox proportional hazards model)
Parameters that refer to time-series models, dose-response models, etc.

Hypothesis testing

Notations and notions

Here we introduce some important conventions, notions and notations used in connection with hypothesis testing.
Fact 1 In hypothesis testing, each hypothesis is either true or false depending on the true (but unknown) data generating distribution P.
Fact 2 A testing procedure results in either rejecting a null hypothesis or failing to do so.

Type I error (false positive): rejecting a true null hypothesis; V_n is the number of Type I errors, V_n = |S_n ∩ S_0|.
Type II error (false negative): failing to reject a false null hypothesis; U_n is the number of Type II errors, U_n = |S_n^c ∩ S_0^c|.

Table 4.3: Type I and Type II errors

Recall that m represents the number of null hypotheses being tested. Now we introduce the following notation:

True null hypotheses. Let S_0 denote the set of true null hypotheses. Then S_0 = S_0(P) = {j : H_0j is true} and m_0 = |S_0| is the number of true null hypotheses.

False null hypotheses. Let S_0^c denote the set of false null hypotheses. Then S_0^c = S_0^c(P) = {j : H_0j is false} and m_1 = |S_0^c| is the number of false null hypotheses. Note that m_0 + m_1 = m.

Rejected null hypotheses. Let S_n denote the set of rejected null hypotheses. Then R_n = |S_n| is the number of rejected hypotheses.

Multiple testing procedure and types of errors that can be committed

The outcome of a multiple testing procedure is the set of rejected null hypotheses, S_n. Since S_n only estimates S_0^c, two types of errors can be committed, as outlined in Table 4.3: rejecting a true null hypothesis and failing to reject a false null hypothesis.

Type I error rates

The set of rejected hypotheses, S_n = S(T_n, Q_0, α), is a function of
1. the test statistics T_n (where T_n = (T_n(j) : j = 1, ..., m) are functions of the data X_1, ..., X_n),
2. the test statistics null distribution Q_0 (which is used to derive cut-offs), and
3. the desired upper bound for the Type I error rate (nominal level α).

Definition 13. Let F_{V_n} be the discrete cumulative distribution function on {0, 1, ..., m} of the number of Type I errors, V_n. A Type I error rate is defined as a parameter θ of the distribution of Type I errors, θ(F_{V_n}).

Let us formally introduce the Type I error rates commonly used in multiple hypothesis testing:

Definition 14. The per-comparison error rate is the expected proportion of Type I errors among the m tests: PCER ≡ E(V_n)/m = ∫ v dF_{V_n}(v)/m.

Definition 15. The per-family error rate is the expected number of Type I errors: PFER ≡ E(V_n) = ∫ v dF_{V_n}(v).

Definition 16. The median-based per-family error rate is the median number of Type I errors: mPFER ≡ Median(F_{V_n}) = F_{V_n}^{-1}(1/2).

Definition 17. The family-wise error rate is the probability of at least one Type I error: FWER ≡ Pr(V_n ≥ 1) = 1 − F_{V_n}(0).

Definition 18. The generalized family-wise error rate is the probability of at least (k + 1) Type I errors, k = 0, ..., m_0 − 1: gFWER ≡ Pr(V_n ≥ k + 1) = 1 − F_{V_n}(k).

Assumptions for the parameter θ

Given a parameter θ such that θ : F ↦ θ(F), we make the following assumptions:

Monotonicity. Given two c.d.f.'s F_1 and F_2 on {0, ..., m},
F_1 ≥ F_2 ⟹ θ(F_1) ≤ θ(F_2).

Uniform continuity. Given two c.d.f.'s F_1 and F_2 on {0, ..., m}, define the distance measure d by d(F_1, F_2) = max_{x ∈ {0,...,m}} |F_1(x) − F_2(x)|. For two sequences of c.d.f.'s, {F_n} and {G_n}, if d(F_n, G_n) → 0 as n → ∞, then |θ(F_n) − θ(G_n)| → 0.

Type I error rate control

Definition 19. We say that a multiple testing procedure S_n = S(T_n, Q_0, α) provides finite sample control of the Type I error rate θ(F_{V_n}) at level α ∈ (0, 1) if θ(F_{V_n}) ≤ α.

Definition 20. A multiple testing procedure S_n = S(T_n, Q_0, α) provides asymptotic control of the Type I error rate θ(F_{V_n}) at level α ∈ (0, 1) if lim sup_{n→∞} θ(F_{V_n}) ≤ α.

Approach to Type I error rate control

In a multiple testing procedure S_n = S(T_n, Q_0, α), we use the assumed null distribution Q_0 of the test statistics to derive the cut-offs for the rejection regions. Note that the unknown true distribution, denoted by Q_n = Q_n(P), of the test statistics T_n determines the number of false positives V_n. The choice of Q_0 is thus crucial, since we want to make sure the multiple testing procedure provides the required control of the Type I error rate under Q_n. For a distribution Q of the test statistics, let

R(S(T_n, Q_0, α) | Q) ≡ |S(T_n, Q_0, α)|, the number of rejected hypotheses, and
V(S(T_n, Q_0, α) | Q) ≡ |S(T_n, Q_0, α) ∩ S_0(P)|, the number of Type I errors,

when T_n ∼ Q.

Then

R_n ≡ R(S(T_n, Q_0, α) | Q_n),   R_0 ≡ R(S(T_n, Q_0, α) | Q_0),
V_n ≡ V(S(T_n, Q_0, α) | Q_n),   V_0 ≡ V(S(T_n, Q_0, α) | Q_0).

Control of a Type I error rate θ(F_{V_n}) is achieved by a three-step approach:

1. Null domination conditions for the Type I error rate. A null distribution should be selected so that
θ(F_{V_n}) ≤ θ(F_{V_0})   [finite sample control], or
lim sup_{n→∞} θ(F_{V_n}) ≤ θ(F_{V_0})   [asymptotic control].

2. Note that V_0 ≤ R_0 and hence F_{V_0} ≥ F_{R_0}. Thus, by the Monotonicity Assumption, θ(F_{V_0}) ≤ θ(F_{R_0}).

3. Control the parameter θ(F_{R_0}), corresponding to the observed number of rejected hypotheses R_0, under the null distribution Q_0; i.e., assuming T_n ∼ Q_0, require θ(F_{R_0}) ≤ α.

Steps 1, 2, and 3 lead to control of the Type I error rate as follows:

θ(F_{V_n}) ≤ θ(F_{V_0}) ≤ θ(F_{R_0}) ≤ α   [finite sample control],
lim sup_{n→∞} θ(F_{V_n}) ≤ θ(F_{V_0}) ≤ θ(F_{R_0}) ≤ α   [asymptotic control].

4.3 Multiple Testing Procedures

Single-step common-quantile procedure

Procedure 1. Single-step common-quantile procedure for control of general Type I error rates θ(F_{V_n}).

Given an m-variate null distribution Q_0 and δ ∈ [0, 1], define an m-vector, d(Q_0, δ) = (d_j(Q_0, δ) : j = 1, ..., m), of δ-quantiles,

d_j(Q_0, δ) ≡ Q_{0j}^{-1}(δ) = inf{z : Q_{0j}(z) ≥ δ},   j = 1, ..., m,

where the Q_{0j} denote the marginal cumulative distribution functions corresponding to Q_0. For a test of level α ∈ (0, 1), choose δ as

δ_0(α) ≡ inf{δ : θ(F_{R(d(Q_0,δ) | Q_0)}) ≤ α},

where R(d(Q_0, δ) | Q_0) denotes the number of rejected hypotheses for the common-quantile cut-offs d(Q_0, δ), under the null distribution Q_0 for the test statistics T_n. The single-step common-quantile multiple testing procedure for controlling the Type I error rate θ(F_{V_n}) at level α is defined in terms of the common-quantile cut-offs,

59 49 c(q 0, α) d(q 0, δ 0 (α)), by the following rule. Reject H 0j if T n (j) > d j (Q 0, δ 0 (α)), j = 1,..., m, that is, S(T n, Q 0, α) {j : T n (j) > d j (Q 0, δ 0 (α))}. Here, F Vn denotes the c.d.f. for the number of Type I errors, V n V (d(q 0, δ 0 (α)) Q n ), under the true distribution Q n = Q n (P ) for the test statistics T n. Theorem 1. [Asymptotic control of Type I error rate for single-step common-quantile Procedure 1] Assume that there exists a random m-vector Z Q 0 = Q 0 (P ), so that, for all c = (c j : j = 1,..., m) R m and x {0,..., m}, the joint distribution Q n = Q n (P ) of the test statistics T n satisfies the following asymptotic null domination property with respect to Q 0 lim inf n P r Q n ( j S 0 I(T n > c j ) x ) P r Q0 ( j S 0 I(Z(j) > c j ) x ) AQ0 In other words, the number of Type I errors, V n, under the true distribution Q n = Q n (P ) for the test statistics T n, is stochastically smaller in the limit than the corresponding number of Type I errors, V 0, under the null distribution Q 0 : lim inf n F Vn (x) F V0 (x) x. In addition, suppose that the mapping θ( ) defining the Type I error rate is such that monotonicity and uniform continuity assumptions hold. Then, single-step Procedure 1, with common-quantile cut-offs c(q 0, α) = d(q 0, δ 0 (α)), provides asymptotic control of the Type I error rate θ(f Vn ) at level α.

That is,

lim sup_{n→∞} θ(F_{V_n}) ≤ α,

where V_n denotes the number of Type I errors for T_n ∼ Q_n(P) (so V_n ≡ V(c(Q_0, α) | Q_n) = Σ_{j∈S_0} I(T_n(j) > c_j(Q_0, α))).

Single-step common-cut-off procedure

Procedure 2. Single-step common-cut-off procedure for control of general Type I error rates θ(F_{V_n}).

Given an m-variate null distribution Q_0 and a test of level α ∈ (0, 1), define a common cut-off e(Q_0, α) such that

e(Q_0, α) ≡ inf{c : θ(F_{R((c,...,c) | Q_0)}) ≤ α},

where we recall that R((c, ..., c) | Q_0) denotes the number of rejected hypotheses for the common cut-off c, under the null distribution Q_0 for the test statistics T_n. The single-step common-cut-off multiple testing procedure for controlling the Type I error rate θ(F_{V_n}) at level α is defined in terms of the common cut-offs c(Q_0, α) = (e(Q_0, α), ..., e(Q_0, α)), by the following rule: Reject H_0j if T_n(j) > e(Q_0, α), j = 1, ..., m.

That is,

S(T_n, Q_0, α) ≡ {j : T_n(j) > e(Q_0, α)}.

Here, F_{V_n} denotes the c.d.f. of the number of Type I errors, V_n ≡ V((e(Q_0, α), ..., e(Q_0, α)) | Q_n), under the true distribution Q_n = Q_n(P) for the test statistics T_n.

Proposed test statistics null distribution

Theorem 2. [General construction for the null distribution Q_0] Suppose there exist known m-vectors λ_0 ∈ R^m and τ_0 ∈ R_+^m of null-values, so that

lim sup_{n→∞} E[T_n(j)] ≤ λ_0(j) and lim sup_{n→∞} Var[T_n(j)] ≤ τ_0(j), for j ∈ S_0.

Let

ν_{0n}(j) ≡ min( 1, τ_0(j) / Var[T_n(j)] )

and define an m-vector Z_n by

Z_n(j) ≡ ν_{0n}(j) (T_n(j) + λ_0(j) − E[T_n(j)]),   j = 1, ..., m.

Suppose that Z_n converges in law to Z ~ Q_0(P). Then, for this choice of null distribution Q_0 = Q_0(P), and for all c = (c_j : j = 1, ..., m) ∈ R^m and x ∈ {0, ..., m},

lim inf_n Pr_{Q_n}( Σ_{j ∈ S_0} I(T_n(j) > c_j) ≤ x ) ≥ Pr_{Q_0}( Σ_{j ∈ S_0} I(Z(j) > c_j) ≤ x ),

so that the asymptotic null domination Assumption AQ0 in Theorem 1 holds.

4.4 Bootstrap-based single step procedures

Bootstrap estimation of the null distribution

The null distribution Q_0 can be estimated by the distribution of the null-value shifted and scaled bootstrap statistics

Z_n^#(j) ≡ min( 1, τ_0(j) / Var_{P_n}[T_n^#(j)] ) ( T_n^#(j) + λ_0(j) - E_{P_n}[T_n^#(j)] ),

where P_n is an estimator of the true data generating distribution P.

Procedure 3. Bootstrap estimation of the null distribution Q_0.

1. Generate B bootstrap samples, (X_1^b, ..., X_n^b), b = 1, ..., B. For the bth sample, the X_i^b, i = 1, ..., n, are n i.i.d. realizations of a random variable X^# ~ P_n.

2. For each bootstrap sample, compute an m-vector of test statistics, T_n^b = (T_n^b(j) : j = 1, ..., m). These can be arranged in an m × B matrix, T = (T_n^b(j)), with rows corresponding to the m hypotheses and columns to the B bootstrap samples.

3. Compute row means and variances of the matrix T to yield estimates of E[T_n(j)] and Var[T_n(j)], j = 1, ..., m.

4. Obtain an m × B matrix Z = (Z_n^b(j)) of null-value shifted and scaled bootstrap statistics Z_n^b(j), as in Theorem 2, by row-shifting and scaling the matrix T using the bootstrap estimates of E[T_n(j)] and Var[T_n(j)] and the user-supplied null-values λ_0(j) and τ_0(j).

5. The bootstrap estimate Q_0n of the null distribution Q_0 from Theorem 2 is the empirical distribution of the columns Z_n^b of the matrix Z.

Procedure 4. Bootstrap estimation of common quantiles for Procedure 1, for gFWER control.

1. Apply Procedure 3 to generate an m × B matrix Z = (Z_n^b(j)) of null-value shifted and scaled bootstrap statistics Z_n^b(j). The bootstrap estimate Q_0n of the null distribution Q_0 from Theorem 2 is the empirical distribution of the columns Z_n^b of the matrix Z.

2. For Procedure 1, the bootstrap common-quantile cut-offs are simply row quantiles of the matrix Z. That is, d_j(Q_0n, δ) is the δ-quantile of the B-vector

(Z_n^b(j) : b = 1, ..., B) of bootstrap statistics for H_0j:

d_j(Q_0n, δ) ≡ inf{ z : (1/B) Σ_{b=1}^B I(Z_n^b(j) ≤ z) ≥ δ }.

3. For a test with nominal level α ∈ (0, 1), δ is chosen as

δ_0n(α) ≡ inf{ δ : θ(F_{R(d(Q_0n,δ) | Q_0n)}) ≤ α }.

That is, δ_0n(α) corresponds to the smallest cut-offs d(Q_0n, δ) such that the value of the mapping θ(·), applied to the distribution of the number of rejections R(d(Q_0n, δ) | Q_0n) under the bootstrap distribution Q_0n, is at most α. In the case of gFWER control, and for a (limit) null distribution Q_0 with continuous and strictly monotone marginal distributions, (1 - δ_0n(α)) is the α-quantile of the bootstrap estimate of the distribution of the (k + 1)st ordered unadjusted p-value. Specifically, δ_0n(α) is obtained as follows.

(a) Compute an m × B matrix, P = (P_n^b(j)), of bootstrap unadjusted p-values by row-ranking the matrix Z, i.e., by replacing each Z_n^b(j) by its rank over the B bootstrap samples, where rank 1 corresponds to the largest value of Z_n^b(j) and rank B to the smallest.

(b) For each column of the matrix P, compute the (k + 1)st smallest p-value, P_n^b(k + 1). For FWER control (k = 0), simply compute the column minima.

(c) The estimate (1 - δ_0n(α)) is the α-quantile of the B-vector (P_n^b(k + 1) : b = 1, ..., B).
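To make the bootstrap steps concrete, here is a short Python sketch of Procedures 3 and 4 for one-sample tests about means with FWER control (k = 0). It is an illustrative re-implementation under simplifying assumptions (nonparametric bootstrap, t-statistics, null-values λ_0(j) = 0 and τ_0(j) = 1, p-values taken as rank/B), with function names of my own choosing; it is not the SAS code used for the experiments in this thesis.

```python
import numpy as np

def bootstrap_null_distribution(X, mu0, B=1000, lam0=0.0, tau0=1.0, seed=0):
    """Procedure 3 (sketch): bootstrap estimate of the null distribution Q0.

    X is an n-by-m data matrix and mu0 the m-vector of null values for the
    means.  Returns an m-by-B matrix Z of null-value shifted and scaled
    bootstrap t-statistics; its columns are draws from Q0n."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    T = np.empty((m, B))
    for b in range(B):                                 # step 1: B bootstrap samples from P_n
        Xb = X[rng.integers(0, n, size=n)]             # resample the n rows with replacement
        T[:, b] = (Xb.mean(axis=0) - mu0) / (Xb.std(axis=0, ddof=1) / np.sqrt(n))
    row_mean = T.mean(axis=1, keepdims=True)           # step 3: estimates of E[T_n(j)]
    row_var = T.var(axis=1, ddof=1, keepdims=True)     # step 3: estimates of Var[T_n(j)]
    nu = np.minimum(1.0, tau0 / row_var)               # step 4: scaling nu_0n(j) from Theorem 2
    return nu * (T + lam0 - row_mean)                  # steps 4-5: shifted and scaled matrix Z

def common_quantile_cutoffs(Z, alpha=0.05):
    """Procedure 4 (sketch) for FWER control (k = 0): common-quantile cut-offs."""
    m, B = Z.shape
    ranks = (-Z).argsort(axis=1).argsort(axis=1) + 1   # step (a): rank 1 = largest Z in its row
    P = ranks / B                                      # bootstrap unadjusted p-values
    p_min = P.min(axis=0)                              # step (b): column minima (k = 0)
    delta = 1.0 - np.quantile(p_min, alpha)            # step (c): alpha-quantile gives 1 - delta_0n(alpha)
    return np.quantile(Z, delta, axis=1)               # step 2: row delta-quantiles d_j(Q0n, delta)

# Hypothetical usage: reject H_0j whenever the observed T_n(j) exceeds cutoffs[j].
# Z = bootstrap_null_distribution(X, mu0=np.zeros(X.shape[1]))
# cutoffs = common_quantile_cutoffs(Z, alpha=0.05)
```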

Advantages of the Proposed Procedures

In this section we summarize the main features and advantages of the proposed procedures.

The key feature of the proposed procedures is the test statistics null distribution (T_n ~ Q_0) that is used to derive the cut-offs and the adjusted p-values. The existing multiple testing procedures, on the other hand, use a data generating null distribution (X ~ P_0) [1].

The single-step common-quantile and common-cut-off procedures control the Type I error rate for an arbitrary data generating distribution, under the asymptotic null domination condition for the null distribution. Therefore, there is no need for the subset pivotality condition [1].

The control of the Type I error rate is carried out under the true data generating distribution P, i.e., under the joint distribution Q_n = Q_n(P) of the test statistics T_n implied by P. Therefore, the notions of weak and strong control are irrelevant to the proposed procedures [1].

Finally, the proposed procedures, based on this construction of the test statistics null distribution, provide the desired asymptotic Type I error rate control in general testing problems, whereas the currently existing procedures can only be applied to a limited set of multiple testing problems.

Chapter 5

Simulation Studies

Anyone who attempts to generate random numbers by deterministic means is, of course, living in a state of sin.
John von Neumann

5.1 Formulation of the Problem

The preceding chapter, Chapter 4, Proposed bootstrap multiple testing procedures, outlines multiple testing procedures for the simultaneous testing of parameters (such as means) of an arbitrary data generating distribution. In particular, in this thesis we focus on the single-step common-quantile multiple testing procedure for the vector of mean values µ = (µ(j) : j = 1, ..., m).

For example, a collection of right-sided tests is stated as follows:

H_0j = I(µ(j) ≤ µ_0(j)) versus H_1j = I(µ(j) > µ_0(j)), j = 1, ..., m.

The procedure is claimed to asymptotically control the Type I error rate; for a rigorous mathematical proof of this theoretical result, please refer to [1]. The data generating distribution can be arbitrary, so that no particular data model assumptions must be made in advance. Also, the subset pivotality condition is not required, as was discussed previously (see Chapters 2 and 4). The procedure is based on a consistent estimator of the null distribution of the test statistics, generated with the bootstrap algorithm.

5.2 Objectives of Experiments

Using the multiple hypothesis testing procedures (MTP) implemented on the basis of the theoretical results outlined in this thesis, we would like to perform a series of experiments. All experiments are run with known theoretical probability models, for which the classical (theoretical) approach can be used to obtain the results. The experiments are performed in both univariate and multivariate settings, where the procedures are used to test a family of hypotheses about population means. Tests about other parameters (such as the median or the parameters of a linear regression model) are not demonstrated here due to time constraints.

The objectives of the experiments are to show that

- the experimental results obtained from the implemented procedures coincide with the known theoretical results,
- the MTPs work with various distributions (discrete or continuous),
- the constructed null distribution is (asymptotically) normal,
- the MTPs provide (asymptotic) control of the Type I error rate (FWER ≤ α).

5.3 Tests about the Mean

Normal Distribution Models

Normal Univariate Case

Let X be a random variable from N(0, 1), the univariate standard normal distribution with µ = 0 and σ^2 = 1. To simulate this model in our experiment we draw n = 400 independent realizations of X. The empirical distribution of X, based on the sample of n = 400 observations, is presented as a histogram in Figure 5.1. We would like to perform the right-sided test of hypothesis about the population mean:

H_0 : µ = 0 versus H_1 : µ > 0.

Normal Distribution          Theoretical Model    Simulated Model
Number of observations       N/A                  400
Mean                         0
Standard Deviation           1

Table 5.1: Summary of postulated and simulated models.

Null Distribution            Theoretical          Bootstrap Estimated
Number of observations       N/A                  1000
Mean                         0
Standard Deviation           1
α                            0.05                 0.05
Critical value Z             1.645                1.63
Test statistic               N/A                  0.83

Table 5.2: Summary of test statistic null distribution, theoretical and bootstrap estimated.

Under the null hypothesis, T = (X̄ - µ_0)/(σ/√n) is normally distributed, T ~ N(0, 1), and we can refer to a table for a critical value at each given level of significance α; for instance, for a level of significance α = 0.05 the table gives 1.645. In order to arrive at a conclusion for the test of hypothesis in our experiment, we do not rely on table values or any prior knowledge about the model, but rather use the proposed procedure to construct a bootstrap estimate of the null distribution of the test statistic. To compare the theoretical results with the experimental results, please refer to the brief summary of parameters and statistics in Tables 5.1 and 5.2. In particular, note that the experiment-based critical value obtained from the estimated null distribution is very close to the table value: 1.63 versus 1.645.
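For readers who wish to reproduce this comparison, the following Python sketch (my own illustrative re-implementation, not the SAS program used for the thesis; the exact numbers will differ slightly from Tables 5.1 and 5.2 because of random seeding) simulates the experiment and reads the bootstrap critical value off the estimated null distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
n, B, alpha, mu0 = 400, 1000, 0.05, 0.0

x = rng.standard_normal(n)                                # n = 400 draws from N(0, 1)
t_obs = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))   # observed test statistic

# Procedure 3 with m = 1: bootstrap the test statistic, then shift and scale
# it using the null-values lambda_0 = 0 and tau_0 = 1.
t_boot = np.empty(B)
for b in range(B):
    xb = x[rng.integers(0, n, size=n)]
    t_boot[b] = (xb.mean() - mu0) / (xb.std(ddof=1) / np.sqrt(n))
z = np.minimum(1.0, 1.0 / t_boot.var(ddof=1)) * (t_boot - t_boot.mean())

crit = np.quantile(z, 1 - alpha)        # bootstrap critical value; should land near 1.645
print(round(t_obs, 2), round(crit, 2), t_obs > crit)
```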

Figure 5.1: Empirical Normal distribution.

Figure 5.2: Estimated bootstrap distribution.

The experiment was performed with B = 1000 bootstrap samples drawn from the n = 400 observations. As a result of the experiment, we can also see that in the case of the univariate normal model the procedure controls the FWER. (In this reduced case, since the family of hypotheses consists of only one hypothesis, the FWER, the probability of at least one false positive, is just the probability of a Type I error.) According to our experiment, the estimated FWER is less than or equal to the nominal level α = 0.05, FWER ≤ α.

The bootstrap estimated null distribution is shown in Figure 5.2. One of the objectives of the experiment is to show that this constructed null distribution of the test statistic is normal:

H_0 : the bootstrap estimated distribution is normal
H_1 : the bootstrap estimated distribution is not normal.

The goodness-of-fit tests presented in Table 5.3, according to the reported p-values, all agree in the conclusion: at the 5% level of significance, we fail to reject the assumption that the bootstrap estimated null distribution is normal.

Test                  Statistic    DF    p Value
Kolmogorov-Smirnov    D =                >
Cramer-von Mises      W-Sq =             >
Anderson-Darling      A-Sq =             >
Chi-Square            Chi-Sq =           >

Table 5.3: Goodness-of-fit tests for normality of the bootstrap estimated null distribution.

Normal Multivariate Case

Let X = (X(j) : j = 1, 2, ..., 20) be a random vector, where each component X(j) is from N(0, 1), j = 1, 2, ..., 20. To simulate this model in our experiment we draw n = 400 independent realizations of X. We would like to perform 20 right-sided tests of hypotheses about the population mean vector, µ = (µ(j) : j = 1, ..., 20):

H_0j : µ(j) = 0 versus H_1j : µ(j) > 0, j = 1, ..., 20.

Under the null hypothesis, the vector of test statistics T = (T(j) : j = 1, 2, ..., 20) is multivariate normal, N(µ, Σ), where T(j) ~ N(0, 1), µ = (0, 0, ..., 0) is the zero mean vector, and Σ = I is the 20 × 20 identity covariance matrix.

If all the X(j)'s are independent (the condition implied by the covariance matrix given) and the nominal level α is set to 0.05, then we can calculate the theoretical critical value. Let q be a critical value such that the probability of at least one false positive is 0.05. Since all the X(j)'s are i.i.d., let F(x) = P(X(j) ≤ x). Then

P(at least one X(j) > q) = 1 - (F(q))^20 = 0.05.

Solving for q results in the theoretical critical value q ≈ 2.80.

The experiment was performed with B = 1000 bootstrap samples drawn from the dataset. Table 5.4 presents the 20 values of the test statistics and the corresponding experimental critical values. Comparing the theoretically obtained critical value q with the experimentally obtained values, they are quite close. Based on the decision rule and the experimental critical values, none of the 20 hypotheses is rejected. The estimated FWER is found to be less than α.

In reality, we do not always have an independent and uncorrelated structure among the test statistics, so finding the theoretical cutoff value is not always feasible. Instead, we have to rely on the bootstrap multiple testing procedures to obtain the cutoff values from the estimated test statistics null distribution.

In Table 5.4, also note that we would have to reject Hypotheses 7 and 10 if we performed 20 single hypothesis tests, each at the α = 0.05 level of significance, ignoring the multiplicity effect. We would have announced a significant difference, whereas the effect (as we know from the initial conditions of our experiment) is merely due to chance.
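As a quick numerical check of the derivation above (an illustrative calculation, not part of the thesis code), the short Python snippet below solves 1 - (F(q))^20 = 0.05 for q and contrasts it with the single-test cutoff 1.645.

```python
from scipy.stats import norm

alpha, m = 0.05, 20
q_family = norm.ppf((1 - alpha) ** (1 / m))   # solves 1 - Phi(q)^m = alpha, giving q of about 2.80
q_single = norm.ppf(1 - alpha)                # per-test cutoff, about 1.645

print(round(q_family, 3), round(q_single, 3))
# Testing each hypothesis separately at level 0.05 with the smaller cutoff is
# what flags Hypotheses 7 and 10 by chance; the family-wise cutoff does not.
```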

Table 5.4: The results of the twenty simultaneous tests about the mean vector of the multivariate normal distribution. For each of the twenty hypotheses, the test statistic falls below its critical value and the decision is Fail to Reject.

Poisson Distribution Models

Poisson Univariate Case

Let X be a random variable from a Poisson distribution, with probability mass function f_X(x) = e^(-λ) λ^x / x!, x = 0, 1, 2, .... Then E(X) = λ and Var(X) = λ. We would like to simulate a Poisson model with λ = 2.5, so we generate n = 400 observations from Poisson(λ = 2.5). The empirical distribution of X is shown in Figure 5.3. To perform the right-sided test of hypothesis about the population mean, we set up the null and alternative hypotheses as follows:

H_0 : µ = 2.5 versus H_1 : µ > 2.5.

Under the null hypothesis, T = (X̄ - µ_0)/(σ/√n) is approximately normally distributed by the Central Limit Theorem, T approximately N(0, 1), and we can again refer to a table for a critical value at each given level of significance α; for α = 0.05 the table gives 1.645. Let us construct a bootstrap estimate of the null distribution of the test statistic and compare the theoretically expected critical value with the experimental one. Please refer to the brief summary of parameters and statistics in Tables 5.5 and 5.6. In particular, the experiment-based critical value obtained from the estimated null distribution is 1.48, versus the theoretical value of 1.645.
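The same bootstrap recipe as in the normal case carries over unchanged; the minimal sketch below (again only an illustration, with the rate λ = 2.5 and the random seed as assumptions) merely swaps the data generating step.

```python
import numpy as np

rng = np.random.default_rng(2)
n, B, alpha, mu0 = 400, 1000, 0.05, 2.5

x = rng.poisson(lam=2.5, size=n)             # discrete data; the procedure itself is unchanged
t_boot = np.empty(B)
for b in range(B):
    xb = x[rng.integers(0, n, size=n)]
    t_boot[b] = (xb.mean() - mu0) / (xb.std(ddof=1) / np.sqrt(n))
z = np.minimum(1.0, 1.0 / t_boot.var(ddof=1)) * (t_boot - t_boot.mean())
print(round(np.quantile(z, 1 - alpha), 2))   # bootstrap critical value, again close to 1.645
```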

Poisson Distribution         Theoretical Model    Simulated Model
Number of observations       N/A                  400
Mean                         2.5
Standard Deviation           1.58

Table 5.5: Summary of postulated and simulated models.

Null Distribution            Theoretical          Bootstrap Estimated
Number of observations       N/A                  1000
Mean                         0
Standard Deviation           1
α                            0.05                 0.05
Critical value Z             1.645                1.48
Test statistic               N/A

Table 5.6: Summary of test statistic null distribution (from Poisson), theoretical and bootstrap estimated.

Figure 5.3: Empirical Poisson distribution (λ = 2.5).

Figure 5.4: Estimated bootstrap distribution.

Poisson Multivariate Case

Let X = (X(j) : j = 1, 2, ..., 20) be a random vector, where each component X(j) is from Poisson(λ = 2.5), j = 1, 2, ..., 20. To simulate this model in our experiment we draw n = 400 independent realizations of X. We would like to perform 20 right-sided tests of hypotheses about the population mean vector, µ = (µ(j) : j = 1, ..., 20):

H_0j : µ(j) = 2.5 versus H_1j : µ(j) > 2.5, j = 1, ..., 20.
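A full run of this multivariate case can be sketched by combining the data generating step above with the hypothetical helper functions bootstrap_null_distribution and common_quantile_cutoffs from the sketch following Procedure 4 in Chapter 4 (again an illustration, not the thesis's SAS implementation).

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, alpha = 400, 20, 0.05

X = rng.poisson(lam=2.5, size=(n, m))                    # 400 realizations of the 20-vector
mu0 = np.full(m, 2.5)
t_obs = (X.mean(axis=0) - mu0) / (X.std(axis=0, ddof=1) / np.sqrt(n))

Z = bootstrap_null_distribution(X, mu0, B=1000)          # helpers from the Chapter 4 sketch
cutoffs = common_quantile_cutoffs(Z, alpha)              # one cut-off per hypothesis
print(np.flatnonzero(t_obs > cutoffs))                   # rejected hypotheses (expected: none)
```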
