Two-sample inference: Continuous data


Patrick Breheny, University of Iowa
Introduction to Biostatistics (BIOS 4120)
April 6

Introduction

Our next several lectures will deal with two-sample inference for continuous data. As we will see, some procedures work very well when the continuous variable follows a roughly normal distribution, but work poorly when it doesn't. We will begin with the procedures that work well for data that (at least approximately) follow a normal distribution.

Example: lead exposure and IQ

Our motivating example for today deals with a study that we've looked at a few times, concerning lead exposure and neurological development for a group of children in El Paso, Texas. The study compared the IQ levels of 57 children who lived within 1 mile of a lead smelter with those of a control group of 67 children who lived at least 1 mile away from the smelter. In the study, the average IQ of the children who lived near the smelter was 89.2, while the average IQ of the control group was 92.7.

Looking at the data

[Figure: side-by-side histograms of IQ (percent of total) for the "Far" and "Near" groups, with IQ ranging from roughly 40 to 140.]

Could the results have been due to chance?

Looking at the raw data, it appears that living close to the smelter may hamper neurological development. However, as always, there is sampling variability present: just because, in this sample, the children who lived closer to the smelter had lower IQs does not necessarily imply that the population of children who live near smelters has lower IQs. We need to ask whether or not our finding could have been due to chance, and what other explanations are consistent with the data.

The difference between two means

We could calculate separate confidence intervals for the two groups using one-sample t-distribution methods. However, as we have said several times, the better way to analyze the data is to use all of the data to answer the single question: is there a difference between the two groups? Denoting the two groups with the subscripts 1 and 2, we will test the hypothesis that µ1 = µ2 by looking at the random variable x̄1 − x̄2 and determining whether it is far away from 0 relative to its variability (standard error). First, we will go over an exact, albeit computer/labor-intensive, approach (the "permutation test"); then we will talk about an approximate approach that is much easier to do by hand (the "two-sample t-test").

Viewing our study as balls in an urn

The same concept that we encountered in the two-sample Fisher's exact test can also be used for continuous data. If the smelter had no effect on children's neurological development, then it wouldn't matter whether a child lived near it or not. Under this null hypothesis, then, it would be like writing each child's IQ on a ball, putting all the balls into an urn, randomly picking out 57 balls and calling them the "near" group, and calling the other 67 balls the "far" group. If we were to do this, how often would we get a difference as large as or larger than 3.5, the actual difference observed?

Results of the experiment

[Figure: histogram of the simulated differences in means under the null hypothesis, centered at 0 and ranging from roughly −10 to 10.]

Results of the experiment (cont'd)

In the experiment, I obtained a random difference in means larger than the observed one 1,807 times out of 10,000. My experimental p-value is therefore 1807/10000 = .18. Thus, our suggestive-looking box plots could easily have come about entirely due to chance.
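The urn experiment can be sketched in a few lines of code. Here is a minimal Python version (the lecture's own computations were presumably done in R); the IQ values below are simulated stand-ins with roughly the study's group means and SDs, not the actual data:

```python
import random

def perm_test(x, y, n_perm=10000, seed=1):
    """Permutation test for a difference in means: pool all the values
    ("balls in an urn"), reshuffle, re-split into groups of the original
    sizes, and count how often the shuffled difference in means is at
    least as large as the observed one."""
    rng = random.Random(seed)
    obs = abs(sum(x) / len(x) - sum(y) / len(y))
    pool = list(x) + list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pool)
        new_x, new_y = pool[:len(x)], pool[len(x):]
        d = abs(sum(new_x) / len(new_x) - sum(new_y) / len(new_y))
        if d >= obs:
            count += 1
    return count / n_perm

# Simulated stand-ins for the 57 "near" and 67 "far" children (hypothetical)
rng = random.Random(0)
near = [rng.gauss(89.2, 12.2) for _ in range(57)]
far = [rng.gauss(92.7, 16.0) for _ in range(67)]
p = perm_test(near, far)
```

With the real data, 10,000 such shuffles produced a difference at least as large as 3.5 about 18% of the time.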

Experiment examples

Examples of recreations of the experiment under the null hypothesis in which a difference as large as or larger than the observed one was obtained:

[Figure: two side-by-side box plots of simulated IQ values for the "Far" and "Near" groups.]

Permutation tests

This approach to carrying out a hypothesis test is called a permutation test. The different orders in which a sequence of numbers can be arranged are called permutations; a permutation test essentially calculates the percentage of random permutations under the null hypothesis that produce a result as extreme as or more extreme than the observed value. Unlike Fisher's exact test, exact solutions are very time-consuming to calculate (this problem is much harder than Fisher's exact test). Unless the number of observations is small enough that we can count the permutations by hand, we need a computer to perform a permutation test.

Approximate results based on the normal distribution

As you might guess from the look of the histogram, a much easier way to obtain an answer is to use the normal distribution as an approximation. Letting d = x̄1 − x̄2 (with d0 = 0 under the null hypothesis), consider the test statistic:

t = (d − d0)/SE_d = (x̄1 − x̄2)/SE_d

But what is the standard error of the difference between two means, SE_d?

The standard error of the difference between two means

Suppose we have x̄1 with standard error SE1 and x̄2 with standard error SE2, and that x̄1 and x̄2 are independent. Then the standard error of x̄1 − x̄2 is

SE_d = sqrt(SE1^2 + SE2^2)

Note the connections with both the root-mean-square idea and the square root law from earlier in the course. Note also that sqrt(SE1^2 + SE2^2) < SE1 + SE2: a two-sample test is a more powerful way to look at differences than two separate analyses.
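A quick numeric check of this formula (the standard errors below are hypothetical, chosen only for illustration):

```python
import math

def se_diff(se1, se2):
    """Standard error of the difference between two independent means."""
    return math.sqrt(se1**2 + se2**2)

# Hypothetical standard errors for the two groups
se1, se2 = 1.6, 2.1
sed = se_diff(se1, se2)
print(round(sed, 3))       # combined SE of the difference
print(sed < se1 + se2)     # the RMS combination is less than the plain sum
```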

The split

This equation would be perfect if we knew SE1 and SE2. But of course we don't; all we have are estimates. There are two ways of settling this question, and they have led to two different forms of the two-sample t-test.

Approach #1: Student's t-test

The first approach was invented by W.S. Gosset ("Student"). His approach was to assume that the standard deviations of the two groups were the same. If you do this, then you have only one extra source of uncertainty to worry about: the uncertainty in your estimate of the common standard deviation.

Approach #2: Welch's t-test

The second approach was invented by B.L. Welch. He generalized the two-sample t-test to situations in which the standard deviations differ between the two groups. If you don't make Student's assumption, then you have two extra sources of uncertainty to worry about: uncertainty about SD1 and uncertainty about SD2.

Student's test vs. Welch's test

We'll talk more about the difference between the two tests at the end of class. In most situations, the two tests provide similar answers. However, Student's test is much easier to do by hand.

The pooled standard deviation

In order to estimate the standard error, we will need an estimate of the common standard deviation. This is obtained by pooling the deviations. To calculate a standard deviation, we took the root-mean-square of the deviations (only with n − 1 in the denominator). To calculate a pooled standard deviation, we take the root-mean-square of all the deviations from both groups (only with n1 + n2 − 2 in the denominator). Essentially, the pooled standard deviation is a weighted average of the standard deviations in the two groups, where the weights are the numbers of observations in each group.

The pooled standard deviation and the standard error

Letting SD_p denote our pooled standard deviation, our estimated standard error is

SE_d = sqrt(SD_p^2/n1 + SD_p^2/n2) = SD_p * sqrt(1/n1 + 1/n2)

This equation is similar to our earlier one, only now the amount by which the SE is reduced in comparison to the SD depends on the sample size in each group.
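These two calculations can be sketched in Python using the lead study's summary numbers (sample SDs of 12.2 for the 57 "near" children and 16.0 for the 67 "far" children):

```python
import math

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled SD: root-mean-square of all deviations from both groups,
    with n1 + n2 - 2 in the denominator."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def se_diff_pooled(sd_p, n1, n2):
    """SE of the difference in means under the equal-SD assumption."""
    return sd_p * math.sqrt(1 / n1 + 1 / n2)

# Lead study summary numbers
sd_p = pooled_sd(12.2, 57, 16.0, 67)   # about 14.4
se = se_diff_pooled(sd_p, 57, 67)      # about 2.59
```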

Sample size and standard error

So, let's say that we have 50 subjects in one group and 10 subjects in the other group, and we have enough money to enroll 20 more people in the study. To reduce the SE as much as possible, should we assign them to the group that already has 50, or to the group that only has 10? Let's check:

sqrt(1/70 + 1/10) = 0.34
sqrt(1/50 + 1/30) = 0.23
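The arithmetic above can be checked directly; the SE is proportional to sqrt(1/n1 + 1/n2), so the smaller factor wins:

```python
import math

# Assigning the 20 new subjects to the larger vs. the smaller group
add_to_big = math.sqrt(1/70 + 1/10)    # 70 and 10 subjects
add_to_small = math.sqrt(1/50 + 1/30)  # 50 and 30 subjects
print(round(add_to_big, 2), round(add_to_small, 2))
```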

The advantages of balanced sample sizes

This example illustrates an important general point: the greatest improvement in accuracy (reduction in standard error) comes when the sample sizes of the two groups are balanced. Occasionally, it is much easier (or cheaper) to obtain (or assign) subjects in one group than in the other; in these cases, one often sees unbalanced sample sizes. However, it is rare to see a ratio that exceeds 3:1, as the study runs into diminishing returns: no matter how much you reduce the standard error of x̄1, the standard error of x̄2 will still be there.

The degrees of freedom of the two-sample t-test

We have said that if we make the assumption of equal standard deviations, then we only need to worry about the variability of x̄1 − x̄2 and the variability in our estimate of the common standard deviation. How variable is our estimate of the common standard deviation? It is now based on n1 + n2 − 2 degrees of freedom (we lose one degree of freedom for each mean that we calculate). Thus, to perform inference, we will look up results on Student's curve with n1 + n2 − 2 degrees of freedom.

Student's t-test: procedure

The procedure for Student's two-sample t-test should look quite familiar:

#1 Estimate the standard error: SE_d = SD_p * sqrt(1/n1 + 1/n2)
#2 Calculate the test statistic: t = (x̄1 − x̄2)/SE_d
#3 Calculate the area outside ±t under Student's curve with n1 + n2 − 2 degrees of freedom

Student's t-test: example

For the lead study:

The mean IQ for the children who lived near the smelter was 89.2
The mean IQ for the children who did not live near the smelter was 92.7
The pooled standard deviation was 14.4

The individual standard deviations were 12.2 and 16.0; note that the pooled standard deviation is close to the average of the individual standard deviations, but slightly closer to the SD for the "far" group, since that group had the larger sample size.

Student's t-test: example (cont'd)

#1 Estimate the standard error: SE_d = 14.4 * sqrt(1/57 + 1/67) = 2.59
#2 Calculate the test statistic: t = (92.7 − 89.2)/2.59 = 1.35
#3 For Student's curve with 57 + 67 − 2 = 122 degrees of freedom, 17.9% of the area lies outside ±1.35

Thus, if there were no difference in IQ between the two groups, we would observe a difference as large as or larger than the one we saw about 18% of the time; the study provides very little evidence of a population difference.
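The three steps can be sketched in Python (the lecture itself uses R; scipy is assumed here for the t-distribution tail area):

```python
import math
from scipy import stats

# Summary numbers from the lead study slides
n1, n2 = 57, 67                 # near, far
mean1, mean2 = 89.2, 92.7
sd_p = 14.4                     # pooled SD

se_d = sd_p * math.sqrt(1/n1 + 1/n2)        # step 1: standard error
t_stat = (mean2 - mean1) / se_d             # step 2: test statistic
df = n1 + n2 - 2
p_value = 2 * stats.t.sf(abs(t_stat), df)   # step 3: area outside +/- t
```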

Carrying out t-tests in R

As always, it is useful to know how to carry out these tests using statistical software; here is how to carry out t-tests in R.

Student's t-test:

> t.test(IQ ~ Smelter, lead_iq, var.equal=TRUE)

        Two Sample t-test

t = 1.3505, df = 122, p-value = 0.1793
...

Welch's t-test:

> t.test(IQ ~ Smelter, lead_iq)

        Welch Two Sample t-test

data:  IQ by Smelter
t = 1.38, df = 120.62, p-value = 0.1702
...

Student's t-test, Welch's t-test, and the permutation test

Note that Student's t-test agrees quite well with our permutation test from earlier, and with Welch's test:

Permutation: p = .181
Student's: p = .179
Welch's: p = .170

This is usually the case when the sample sizes are reasonably large: the approximations work well, agreeing both with each other and with exact approaches.

Confidence intervals

Our hypothesis test suggests that there may be no difference in IQ between the two groups. But is it possible that there is a difference, even a large difference? This is the kind of question answered by a confidence interval.

Confidence intervals: procedure

The procedure for calculating confidence intervals is straightforward:

#1 Estimate the standard error: SE_d = SD_p * sqrt(1/n1 + 1/n2)
#2 Determine the values ±t_x% that contain the middle x% of Student's curve with n1 + n2 − 2 degrees of freedom
#3 Calculate the confidence interval: (x̄1 − x̄2 − t_x% * SE_d, x̄1 − x̄2 + t_x% * SE_d)

Confidence intervals: example

For the lead study:

#1 The standard error is SE_d = 14.4 * sqrt(1/57 + 1/67) = 2.59, and the difference between the two means was 92.7 − 89.2 = 3.5
#2 The values ±1.98 contain the middle 95% of Student's curve with 57 + 67 − 2 = 122 degrees of freedom
#3 Thus, the 95% confidence interval is: (3.5 − 1.98(2.59), 3.5 + 1.98(2.59)) = (−1.63, 8.63)

So, although we cannot rule out the idea that lead has no effect on IQ, it's possible that lead reduces IQ by as much as 8 and a half points (or increases it by a point and a half).
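A sketch of the interval calculation in Python, again assuming scipy for the t critical value:

```python
import math
from scipy import stats

# 95% CI for the difference in means, using the lead study summary numbers
n1, n2 = 57, 67
diff = 92.7 - 89.2
se_d = 14.4 * math.sqrt(1/n1 + 1/n2)
t_crit = stats.t.ppf(0.975, n1 + n2 - 2)   # middle 95% of Student's curve
ci = (diff - t_crit * se_d, diff + t_crit * se_d)
```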

Permutation tests vs. t-tests

Should I use a permutation test or a t-test? The difference between a permutation test and a t-test comes down to whether one trusts the normal approximation to the sampling distribution. Thus, if n is large, it doesn't matter: the sampling distribution will always be approximately normal (because of the central limit theorem). If n is small, then this is a tough choice; the difference between the two tests can be sizable, and there is little data with which to assess normality. In practice, however, permutation tests are not very common. The reason is that, if one is concerned about normality, it is usually better to choose one of the methods that we will talk about in the next lecture.

Welch vs. Student

Should I use Student's test or Welch's test? In the earlier example, the two tests gave basically the same answer, even though the standard deviations of the two groups differed by about 30%. This is often the case. However, the two tests will provide different answers when:

The standard deviations of the two groups are very different, and
The number of subjects in each group is also very different

Tradeoff

So which test is better? This is a complicated question and depends on a number of factors, but the basic tradeoff is this: when the standard deviations are fairly close, Student's t-test is (slightly) more powerful for small sample sizes. When they are not, Student's t-test is unreliable: it makes an assumption of common variability, and that assumption is inaccurate.

Power in the equal-SD setting

[Figure: power of Welch's and Student's tests versus sample size per group (2 to 10), with power rising from near 0 toward 1.]

Unequal-SD setting

More samples from the low-variance group means inflated type I error; more samples from the high-variance group means low power:

[Figure: two panels comparing Welch's and Student's tests as the number of subjects in group 1 varies (10 to 18). Left panel: type I error rate. Right panel: power.]

Summary

We discussed three tests that work well when the data is normally distributed: the permutation test, Student's t-test, and Welch's t-test. Student's t-test assumes equal standard deviations; Welch's test does not. Know how to carry out hypothesis tests and construct confidence intervals based on Student's approach.