
Inferential Statistics: Hypothesis tests, Confidence intervals
Eva Riccomagno, Maria Piera Rogantin
DIMA, Università di Genova
riccomagno@dima.unige.it, rogantin@dima.unige.it

Part G. Multiple tests
Part H. Confidence intervals
 1. Introduction
 2. Confidence interval for the mean
    (a) of a Normal variable, known variance
    (b) of a Normal variable, unknown variance
    (c) of a variable with unknown distribution (approximate)
 3. Different levels 1 − α
 4. Confidence intervals and tests

Part G. Multiple tests

We may need to conduct many hypothesis tests concurrently. Suppose each test is conducted at level α. For any one test, the chance of a false rejection of the null is α, but the chance of at least one false rejection across all the tests is much higher.

Examples:
- Measuring the state of anxiety by questionnaire in two groups of subjects, where various questions help define the level of anxiety. As more questions are compared, it becomes more likely that the two groups will appear to differ on at least one topic by random chance alone.
- Efficacy of a drug in terms of the reduction of any one of a number of disease symptoms. As more symptoms are considered, it becomes more likely that the drug will appear to be an improvement over existing drugs in terms of at least one symptom.

Consider m hypothesis tests: H_0^i against H_1^i, for i = 1, ..., m.

Example. For α = 0.05 and m = 2:
- Probability to retain both H_0^1 and H_0^2 when both are true: (1 − α)^2 = 0.95^2 ≈ 0.90
- Probability to reject at least one true hypothesis: 1 − (1 − α)^2 = 1 − 0.95^2 ≈ 0.10

For α = 0.05 and m = 20:
- Probability to reject at least one true hypothesis: 1 − (1 − α)^20 = 1 − 0.95^20 ≈ 0.64, much larger than α

There are many ways to deal with this problem. Here we discuss two methods.
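The arithmetic above is easy to check; a quick sketch in Python (illustration only, not part of the original slides):

```python
# Probability of at least one false rejection (family-wise error rate)
# when m independent tests are each run at level alpha.
def fwer(alpha, m):
    return 1 - (1 - alpha) ** m

print(round(fwer(0.05, 2), 4))   # about 0.10
print(round(fwer(0.05, 20), 4))  # about 0.64
```

Even at a modest m = 20, a "significant" result somewhere is more likely than not under the null.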

Bonferroni (B) Method
Let p_1, ..., p_m denote the m p-values for these tests. Reject the null hypothesis H_0^i if
  p_i ≤ α/m
The probability of falsely rejecting any true null hypothesis is then less than or equal to α.

Example (continued)
m = 2: 1 − (1 − 0.05/2)^2 ≈ 0.0494
m = 20: 1 − (1 − 0.05/20)^20 ≈ 0.0488

Benjamini-Hochberg (BH) Method
1. Let p_(1) < ... < p_(m) denote the ordered p-values.
2. Let k be the largest i for which p_(i) ≤ (i/m) α; reject the null hypotheses H_0^(1), ..., H_0^(k).
If the tests are not independent, the threshold to which p_(i) is compared must be adjusted appropriately.

Example. Consider the following 10 (sorted) p-values. Fix α = 0.05.

p=c(0.00017,0.00448,0.00671,0.00907,0.01220,0.33626,0.39341,
    0.53882,0.58125,0.98617)
alpha=0.05; m=length(p); i=seq(1,m)
b=i*alpha/m; BH=(p<b); B=(p<alpha/m)
cbind(p,b,BH,B)

             p     b BH B
 [1,] 0.00017 0.005  1 1
 [2,] 0.00448 0.010  1 1
 [3,] 0.00671 0.015  1 0
 [4,] 0.00907 0.020  1 0
 [5,] 0.01220 0.025  1 0
 [6,] 0.33626 0.030  0 0
 [7,] 0.39341 0.035  0 0
 [8,] 0.53882 0.040  0 0
 [9,] 0.58125 0.045  0 0
[10,] 0.98617 0.050  0 0

Reject H_0^i for
- i = 1, 2 with the Bonferroni method
- i = 1, 2, 3, 4, 5 with the Benjamini-Hochberg method
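The two rules are easy to reimplement outside R. A Python sketch on the same p-values (function names are hypothetical, not part of the original slides):

```python
# Bonferroni and Benjamini-Hochberg rejection sets for sorted p-values.
def bonferroni(pvals, alpha=0.05):
    m = len(pvals)
    return [i for i, p in enumerate(pvals, start=1) if p <= alpha / m]

def benjamini_hochberg(pvals, alpha=0.05):
    # pvals assumed sorted ascending; reject H(1)..H(k), where k is the
    # largest index with p_(k) <= (k/m) * alpha.
    m = len(pvals)
    k = max((i for i, p in enumerate(pvals, start=1) if p <= i * alpha / m),
            default=0)
    return list(range(1, k + 1))

p = [0.00017, 0.00448, 0.00671, 0.00907, 0.01220,
     0.33626, 0.39341, 0.53882, 0.58125, 0.98617]
print(bonferroni(p))          # [1, 2]
print(benjamini_hochberg(p))  # [1, 2, 3, 4, 5]
```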

Abuse of tests

Warning! There is a tendency to use hypothesis testing methods even when they are not appropriate. Often, estimation and confidence intervals are better tools. Use hypothesis testing only when you want to test a well-defined hypothesis. (from Wasserman)

A summary of the paper: Regina Nuzzo (2014). "Scientific method: Statistical errors. P values, the 'gold standard' of statistical validity, are not as reliable as many scientists assume." Nature, vol. 506, pp. 150-152.

Ronald Fisher, in the 1920s, intended p-values as an informal way to judge whether evidence was significant enough to merit a second look: one part of a fluid, non-numerical process that blended data and background knowledge to lead to scientific conclusions.

Interpretation: the p-value summarizes the data assuming a specific null hypothesis.

Caveats
- tendency to deflect attention from the actual size of an effect
- P-hacking or significance-chasing, including making assumptions, monitoring data while it is being collected, excluding data points, ...

Measures that can help:
- look for replicability
- do not ignore exploratory studies nor prior knowledge
- report effect sizes and confidence intervals
- take advantage of Bayes' rule (not part of this course, unfortunately)
- try multiple methods on the same data set
- adopt a two-stage analysis, or preregistered replication

Part H. Confidence intervals
 1. Introduction
 2. Confidence interval for the mean
    (a) of a Normal variable, known variance
    (b) of a Normal variable, unknown variance
    (c) of a variable with unknown distribution (approximate)
 3. Different levels 1 − α
 4. Confidence intervals and tests

1. Introduction

Let θ be a real-valued parameter and L and U two real-valued functions of the random sample X = (X_1, ..., X_n) such that L(x) ≤ U(x) for all instances x of the random sample. Then (L, U) is called an interval estimator for θ.

If P(θ ∈ (L, U)) ≥ 1 − α, then (L, U) is a 1 − α confidence interval; 1 − α is called the coverage of the confidence interval. Usually 1 − α = 0.95.

"We have at least a 1 − α chance of covering the unknown parameter with the interval estimator." (from Casella and Berger)

From Wasserman:

Warning! (L, U) is random and θ is fixed.

Warning! There is much confusion about how to interpret a confidence interval. A confidence interval is not a probability statement about θ, since θ is a fixed quantity. [...]

Warning! Some texts interpret confidence intervals as follows: if I repeat the experiment over and over, the interval will contain the parameter a fraction 1 − α of the time, e.g. 95% of the time. This is correct but useless, since we rarely repeat the same experiment over and over. [...] Rather:
- day 1: θ_1; collect data; construct a 95% CI for θ_1
- day 2: θ_2; collect data; construct a 95% CI for θ_2
- day 3: θ_3; collect data; construct a 95% CI for θ_3
- ...
Then 95 percent of your intervals will trap the true parameter value. There is no need to introduce the idea of repeating the same experiment over and over.

2. Confidence interval for the mean of a random variable

X_1, ..., X_n i.i.d. random sample. Parameter of interest: µ.
Point estimator: X̄; point estimate: x̄ (sample value of X̄ at the observed data points).

Confidence interval, or interval estimator, with coverage 1 − α:
  ( X̄ − δ, X̄ + δ )   with δ such that P( X̄ − δ < µ < X̄ + δ ) = 1 − α

The limits of the interval, X̄ − δ and X̄ + δ, are random variables. The sample confidence interval is ( x̄ − δ, x̄ + δ ).

How to compute δ? Using the (exact or approximate) distribution of the point estimator X̄.

2. (a) Confidence interval for the mean of a Normal variable, known variance

For X_1, ..., X_n i.i.d. sample random variables with X_1 ~ N(µ, σ²):
  X̄ ~ N(µ, σ²/n)   or   Z = (X̄ − µ) / (σ/√n) ~ N(0, 1)

1 − α = P( X̄ − δ < µ < X̄ + δ ) = P( µ − δ < X̄ < µ + δ )

Example. X_1 ~ N(µ, 4), n = 9, 1 − α = 0.95, so X̄ ~ N(µ, 4/9).

[Figure: density of X̄ ~ N(µ, 4/9), with the interval (µ − δ, µ + δ) marked on the horizontal axis]

Computation of δ

1 − α = P( µ − δ < X̄ < µ + δ )
      = P( (µ − δ − µ)/(σ/√n) < (X̄ − µ)/(σ/√n) < (µ + δ − µ)/(σ/√n) )
      = P( −δ/(σ/√n) < Z < δ/(σ/√n) )

so that δ/(σ/√n) = z_{1−α/2}, that is δ = z_{1−α/2} σ/√n.

[Figure: density functions of Z ~ N(0, 1) and of X̄ ~ N(µ, 4/9); with z_{1−0.05/2} = 1.96 one gets δ = 1.31]

Confidence interval for µ:
  ( X̄ − z_{1−α/2} σ/√n , X̄ + z_{1−α/2} σ/√n )
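The quantile and the resulting δ for the example can be reproduced with the Python standard library (a sketch, not part of the original slides):

```python
from statistics import NormalDist

# delta = z_{1-alpha/2} * sigma / sqrt(n) for the slides' example:
# X_1 ~ N(mu, 4), so sigma = 2, with n = 9 and 1 - alpha = 0.95.
alpha, sigma, n = 0.05, 2.0, 9
z = NormalDist().inv_cdf(1 - alpha / 2)  # standard normal quantile
delta = z * sigma / n ** 0.5
print(round(z, 2), round(delta, 2))  # 1.96 1.31
```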

Sample confidence interval for µ:
  ( x̄ − z_{1−α/2} σ/√n , x̄ + z_{1−α/2} σ/√n )
- we do not know whether µ belongs to this sample interval, whose limits are computed using the sample value x̄
- with another x̄ the interval would be different

Among all possible confidence intervals constructed as before, 95% contain µ and 5% do not.

Simulation for 100 samples, with n = 80, σ² = 4, 1 − α = 95%:
  ( x̄ − 1.96 × 2/√80 , x̄ + 1.96 × 2/√80 )
6 of the 100 intervals do not contain µ.
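A simulation like the one on the slide can be sketched as follows (Python; the value µ = 0 and the seed are made up for illustration, not part of the original slides):

```python
import random
from statistics import NormalDist, mean

random.seed(1)
mu, sigma, n, alpha = 0.0, 2.0, 80, 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)
delta = z * sigma / n ** 0.5

# Draw many samples; count how often the interval (xbar - delta, xbar + delta)
# contains the true mean mu.
reps = 1000
covered = sum(
    1 for _ in range(reps)
    if abs(mean(random.gauss(mu, sigma) for _ in range(n)) - mu) < delta
)
print(covered / reps)  # close to the nominal coverage 0.95
```

With only 100 repetitions, as on the slide, counts like 94 or 96 covering intervals out of 100 are entirely typical.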

2. (b) Confidence interval for the mean of a Normal variable, unknown variance

For X_1, ..., X_n a random sample with X_1 ~ N(µ, σ²), as point estimators of µ and σ² take X̄ and S² respectively. Consider the random variable
  T = (X̄ − µ) / (S/√n) ~ t_[n−1]
The computation of the confidence interval for µ is similar to the Normal case:
  ( X̄ − t_{1−α/2} S/√n , X̄ + t_{1−α/2} S/√n )
where t_{1−α/2} is the quantile of the t_[n−1] distribution.

2. (c) Confidence interval for the mean of a random variable with unknown distribution

If the sample size is large we can use the approximate distribution of X̄ via the CLT:
  ( X̄ − z_{1−α/2} S/√n , X̄ + z_{1−α/2} S/√n )
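Case (c) can be computed directly from data. A Python sketch (the sample below is simulated and purely illustrative, not part of the original slides):

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(7)
x = [random.gauss(10.0, 3.0) for _ in range(200)]  # large illustrative sample

# Large-sample (CLT) interval: xbar +/- z_{1-alpha/2} * s / sqrt(n)
alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)
xbar, s, n = mean(x), stdev(x), len(x)
half = z * s / n ** 0.5
print((round(xbar - half, 2), round(xbar + half, 2)))  # approximate 95% CI
```

For small samples from a Normal population, the t quantile of case (b) should replace z.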

3. Different coverage levels 1 − α

A 95%-confidence interval or a 99%-confidence interval?

Quantiles z_{1−α/2} of a standard Normal random variable N(0, 1):

  1 − α       0.90   0.95   0.99
  z_{1−α/2}   1.64   1.96   2.58

What is gained in coverage is lost in precision: a higher confidence level gives a wider interval.

Example. X̄ ~ N(µ, 4/80) and assume x̄ = 2.5:
- at 90%: δ = 0.37, sample confidence interval (2.13, 2.87)
- at 95%: δ = 0.44, sample confidence interval (2.06, 2.94)
- at 99%: δ = 0.58, sample confidence interval (1.92, 3.08)
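These δ values and intervals can be checked numerically (Python sketch, not part of the original slides):

```python
from statistics import NormalDist

# X-bar ~ N(mu, 4/80): sigma = 2, n = 80, observed sample mean 2.5.
sigma, n, xbar = 2.0, 80, 2.5
for level in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf((1 + level) / 2)  # z_{1-alpha/2}
    delta = z * sigma / n ** 0.5
    print(level, round(delta, 2),
          (round(xbar - delta, 2), round(xbar + delta, 2)))
```

The printed intervals widen as the level increases, matching the example above.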

4. Confidence intervals and tests

Parameter of interest: µ of a N(µ, σ²) with σ known. Consider the two-sided 1 − α confidence interval for µ and the two-sided test at level α of H_0: µ = µ_0 against H_1: µ ≠ µ_0.

H_0 is retained for x̄ ∈ (µ_0 − δ, µ_0 + δ); the sample confidence interval is (x̄ − δ, x̄ + δ), where δ = z_{1−α/2} σ/√n in both cases.

[Figure: the acceptance interval centered at µ_0, and two sample confidence intervals centered at x̄, cases A and B]

The interval where H_0 is retained is centered at µ_0, while the confidence interval is centered at x̄. If the sample confidence interval contains µ_0, then H_0 is retained, and vice versa.
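The duality can be checked numerically: the test retains H_0 exactly when the confidence interval covers µ_0. A Python sketch with made-up numbers (not part of the original slides):

```python
from statistics import NormalDist

sigma, n, alpha, mu0 = 2.0, 80, 0.05, 2.0
z = NormalDist().inv_cdf(1 - alpha / 2)
delta = z * sigma / n ** 0.5  # the same delta for test and interval

for xbar in (2.1, 2.5):  # one value near mu0, one far from it
    retain = mu0 - delta < xbar < mu0 + delta   # acceptance region of the test
    covers = xbar - delta < mu0 < xbar + delta  # confidence interval covers mu0
    print(xbar, retain, covers)  # the two booleans always agree
```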

Compare tests and confidence intervals in R output

Example. Chicago Tribune (continued)

> np=750*0.2347; prop.test(np,750,0.25)

	1-sample proportions test with continuity correction

data:  np out of 750, null probability 0.25
X-squared = 0.85654, df = 1, p-value = 0.3547
alternative hypothesis: true p is not equal to 0.25
95 percent confidence interval:
 0.2051343 0.2670288
sample estimates:
     p
0.2347

> prop.test(np,750,0.25,"less")

	1-sample proportions test with continuity correction

data:  np out of 750, null probability 0.25
X-squared = 0.85654, df = 1, p-value = 0.1774
alternative hypothesis: true p is less than 0.25
95 percent confidence interval:
 0.0000000 0.2617696
sample estimates:
     p
0.2347

Remark. If the parameter of interest is a proportion p, then δ is different for confidence intervals and for tests, because it depends on the standard deviation: for the confidence interval it is calculated using the sample value p̂, for the test using the null value p_0.
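The remark can be illustrated with the textbook large-sample formulas, computed without R's continuity correction, so the numbers differ slightly from the prop.test output above (Python sketch, not part of the original slides):

```python
from statistics import NormalDist

phat, n, p0, alpha = 0.2347, 750, 0.25, 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)

# Test statistic: standard error computed under H0 (uses p0)
se_test = (p0 * (1 - p0) / n) ** 0.5
z_stat = (phat - p0) / se_test

# Confidence interval: standard error computed from the data (uses phat)
se_ci = (phat * (1 - phat) / n) ** 0.5
ci = (phat - z * se_ci, phat + z * se_ci)

print(round(se_test, 4), round(se_ci, 4))  # the two standard errors differ
print(round(z_stat, 2), tuple(round(v, 3) for v in ci))
```

The two standard errors, and hence the two values of δ, do not coincide, which is exactly why the clean test/interval duality of the Normal-mean case breaks for proportions.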