Hypothesis Testing in Action: t-tests

Mark Muldoon
School of Mathematics, University of Manchester
January 30, 2007

2 Overview

Today we'll examine four data sets and use hypothesis tests to explore them.

- Differences in proportions: the Boston aspirin study
- Differences in means: the t-tests and H.H. Koh's macular pigment data
- Confidence intervals revisited: confidence with t
- Are my data normally distributed? qq-plots

3 The Boston aspirin study

In a famous and very large study during the 1980s, several hospitals in the Boston area worked together to conduct a placebo-controlled, double-blind study of the efficacy of aspirin in preventing heart attacks. The results were:

Group      N attacks   N patients
Aspirin    104         11037
Placebo    189         11034

Is this an important difference?

4 Setup

The first question to ask is: how likely is this difference to have arisen by chance? We begin with a hypothesis test, based on a z-score, that addresses this question.

Null Hypothesis: The two proportions are the same.

Alternative Hypothesis: Either the two proportions differ (two-sided test) or a specific one of the proportions is larger (one-sided test).

5 Differences of proportions, large samples

The ingredients for this test are two experimentally observed proportions: $p_1 = r_1/n_1$ and $p_2 = r_2/n_2$.

(a) As the null hypothesis is that the proportions are the same, combine the data to get a single estimate of the underlying proportion:
$$p = \frac{r_1 + r_2}{n_1 + n_2}$$
(b) Estimate the standard error of the difference between the two measured proportions:
$$SE = \sqrt{p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

6 Differences of proportions, continued

(c) Compute
$$z = \frac{p_1 - p_2}{SE} = \frac{p_1 - p_2}{\sqrt{p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
(d) Consult the table for the standard normal.

7 Application to the aspirin data

(a) Under the null hypothesis our best estimate of $p$ is
$$p = \frac{189 + 104}{11034 + 11037} = \frac{293}{22071} \approx 0.0133$$
(b) The standard error of the difference is then
$$SE = \sqrt{p(1-p)\left(\frac{1}{11034} + \frac{1}{11037}\right)} \approx 0.00154$$
(c) The z-score is
$$z = \frac{(189/11034) - (104/11037)}{SE} \approx 5.00$$

8 Application to the aspirin data, continued

(d) This is a massively implausible z-score: we can reject the null hypothesis in favour of the alternative that the aspirin group has fewer heart attacks with confidence 99.99%.
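
The recipe (a) to (d) is easy to check by machine. The following is a minimal sketch in Python, using only the standard library; the function names `two_proportion_z` and `upper_tail` are illustrative and not part of the lecture, and the counts are the ones that appear in the z-score above (189 of 11034 and 104 of 11037).

```python
import math

def two_proportion_z(r1, n1, r2, n2):
    """Pooled two-proportion z-test: returns (pooled p, SE, z)."""
    p1, p2 = r1 / n1, r2 / n2
    # (a) single estimate of the proportion under the null hypothesis
    p = (r1 + r2) / (n1 + n2)
    # (b) standard error of the difference p1 - p2
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    # (c) z-score
    return p, se, (p1 - p2) / se

def upper_tail(z):
    """P(Z >= z) for a standard normal Z, replacing the table lookup in (d)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Counts taken from the z-score quoted on the slide.
p, se, z = two_proportion_z(189, 11034, 104, 11037)
print(f"pooled p = {p:.5f}, SE = {se:.5f}, z = {z:.2f}")
print(f"one-sided tail probability = {upper_tail(z):.2e}")
```

The tail probability stands in for step (d); for a two-sided test one would double it.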

9 Visual pigments and macular degeneration

The next two examples involve measurements of Macular Pigment Optical Density (MPOD) collected from two groups: patients suffering from macular degeneration and healthy control subjects.

- Raw data are 10 measurements per subject, collected in two sessions of 5 measurements with a break of around 30 minutes in between.
- The MPOD is the difference between a measurement taken at central fixation and another in the periphery (5 degree visual angle).
- All measurements are on healthy eyes, even among the patients, each of whom had only one diseased eye.

These data were collected by Ms. Hui Hiang Koh (now Dr. Koh) and her advisor, Dr. Ian Murray.

10 Are the two groups different?

[Table: MPOD and SEM for each subject, Patients and Controls side by side, with summary rows giving the group means $m_x$, $m_y$ and standard deviations $s_x$, $s_y$; values missing.]

11 Testing for differences

If anything, the patients seem to have more pigment than the controls. Is this apparent difference significant? Test with a new hypothesis test, the Two Sample t-test, designed for differences in the means of small samples.

Null Hypothesis: MPOD for Patients and Controls are drawn from the same normal distribution (same mean, same variance).

Alternative Hypothesis: MPOD for the two groups are drawn from normal distributions with different means, but the same variance.

This will involve a two-sided test based on a new statistic, t.

12 Folklore

The t-test was developed by W.S. Gosset, a statistician who worked for the Guinness brewing company. Employees of the firm were not allowed to publish under their own names, so he wrote under the pseudonym Student.

The t-statistic:
- is similar to a z-score, but is applicable when the sample is too small to assume that $s_x^2$ and $s_y^2$ provide good estimates of the variances;
- pays a small cost for this advantage: the t-distribution (and hence the tables one consults to use it) is less straightforward than the normal distribution;
- depends on the size of the sample: when this grows large, the distribution of t tends to the normal.

13 Distribution of t

[Figure: Student's t-distribution for ν = 2, 4 and 8, plotted against t. The dashed curve at the top is the standard normal distribution (µ = 0, σ = 1).]
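
The behaviour shown in the figure can be reproduced numerically. The sketch below evaluates the standard density formula for Student's t (not stated on the slide, so treat it as an assumption brought in from elsewhere) at the centre and in the tail, for the same values of ν, and compares it with the standard normal. The names `t_pdf` and `normal_pdf` are illustrative.

```python
import math

def t_pdf(t, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    # lgamma avoids overflow of the gamma function for large nu
    log_coef = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
    coef = math.exp(log_coef) / math.sqrt(nu * math.pi)
    return coef * (1 + t * t / nu) ** (-(nu + 1) / 2)

def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

for nu in (2, 4, 8):
    print(f"nu = {nu}: pdf(0) = {t_pdf(0.0, nu):.4f}, pdf(3) = {t_pdf(3.0, nu):.4f}")
print(f"normal: pdf(0) = {normal_pdf(0.0):.4f}, pdf(3) = {normal_pdf(3.0):.4f}")
```

Smaller ν gives a lower peak and heavier tails; as ν grows the values approach those of the normal curve.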

14 Computing t for two samples

The ingredients are a confidence level C and two lists of numbers, say, $\{x_1, x_2, \ldots, x_{N_x}\}$ and $\{y_1, y_2, \ldots, y_{N_y}\}$.

(a) Compute the two sample means, $m_x$ and $m_y$. Recall that, for example,
$$m_x = \frac{\sum_{j=1}^{N_x} x_j}{N_x}$$
(b) Compute the two standard deviations, $s_x$ and $s_y$. Recall that, for example,
$$s_y^2 = \frac{\sum_{j=1}^{N_y} (y_j - m_y)^2}{N_y - 1}$$

15 Two-sample t, continued

(c) Compute the pooled standard deviation, $s$, which satisfies
$$s^2 = \frac{(N_x - 1)s_x^2 + (N_y - 1)s_y^2}{(N_x - 1) + (N_y - 1)}$$
(d) Last, compute
$$t = \frac{m_x - m_y}{s}\sqrt{\frac{N_x N_y}{N_x + N_y}}$$
(e) Consult the t-table for $\nu = N_x + N_y - 2$ degrees of freedom.
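
Steps (a) to (d) translate almost line for line into code. This is a minimal sketch using Python's standard `statistics` module; the function names and the two short lists of numbers are made up for illustration and are not the MPOD data.

```python
import math
from statistics import mean, stdev

def pooled_variance(sx, nx, sy, ny):
    """Step (c): pooled variance s^2 from the two sample standard deviations."""
    return ((nx - 1) * sx**2 + (ny - 1) * sy**2) / ((nx - 1) + (ny - 1))

def two_sample_t(xs, ys):
    """Return (t, nu) for the equal-variance two-sample t-test."""
    nx, ny = len(xs), len(ys)
    mx, my = mean(xs), mean(ys)        # step (a)
    sx, sy = stdev(xs), stdev(ys)      # step (b), N - 1 in the denominator
    s = math.sqrt(pooled_variance(sx, nx, sy, ny))
    t = (mx - my) / s * math.sqrt(nx * ny / (nx + ny))  # step (d)
    return t, nx + ny - 2              # nu for the table lookup in step (e)

# Hypothetical data, chosen only to exercise the function.
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 4, 5, 6]
t, nu = two_sample_t(xs, ys)
print(f"t = {t:.3f} with nu = {nu} degrees of freedom")
```

Step (e) remains a table lookup: compare |t| with the critical value for ν degrees of freedom at the chosen confidence level.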

16 Testing the MPOD data

Working through the recipe, $N_x = N_y = 9$ and:

(a) Patients had $m_x = 0.293$; Controls had $m_y$ = [value missing].
(b) Patients had $s_x = 0.135$; Controls had $s_y = 0.142$.
(c) The pooled variance is thus
$$s^2 = \frac{(N_x - 1)s_x^2 + (N_y - 1)s_y^2}{(N_x - 1) + (N_y - 1)} = \frac{8(0.135)^2 + 8(0.142)^2}{8 + 8} \approx 0.0192$$

17 Testing the MPOD data, continued

(d) Thus
$$t = \frac{m_x - m_y}{s}\sqrt{\frac{N_x N_y}{N_x + N_y}}$$
= [value missing].
(e) This is smaller than the critical value, 2.120, for a two-sided test with ν = 16 degrees of freedom at 95% confidence. We cannot reject the null hypothesis.

18 Paired sample design

The considerable variation within the groups may make it hard to see whether there is much systematic difference between groups.

- Design a new type of study in which Patients and Controls are matched for age, gender, eye (left or right), iris colour and smoking habits.
- Compare with the Paired Sample t-test.

19 Computing t for paired samples

The only ingredients are a confidence level C and a list of N pairs of numbers $\{(x_1, y_1), \ldots, (x_N, y_N)\}$.

The null hypothesis is that the two members of each pair are drawn from normal distributions having the same mean. All the distributions for the x's are assumed to share the same variance, as are all those for the y's, but the variance shared by the x's need not equal that shared by the y's.

20 Computing t for paired samples, continued

(a) Compute the differences $\delta_j = x_j - y_j$;
(b) Compute the mean of the differences
$$m = \frac{\sum_{j=1}^{N} \delta_j}{N};$$
(c) Estimate the variance of the differences
$$s^2 = \frac{\sum_{j=1}^{N} (\delta_j - m)^2}{N - 1};$$

21 Computing t for paired samples, concluded

(d) Compute the paired-sample t-statistic
$$t = \frac{m\sqrt{N}}{s}$$
(e) Check against critical values in the t-table, here using $\nu = N - 1$ degrees of freedom.
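
The paired recipe is shorter still. The sketch below follows steps (a) to (d) using Python's standard library; `paired_t` is an illustrative name and the four pairs are invented numbers, not the matched MPOD measurements.

```python
import math
from statistics import mean, stdev

def paired_t(xs, ys):
    """Return (t, nu) for the paired-sample t-test on matched lists xs, ys."""
    if len(xs) != len(ys):
        raise ValueError("paired samples must have the same length")
    deltas = [x - y for x, y in zip(xs, ys)]   # step (a)
    n = len(deltas)
    m = mean(deltas)                           # step (b)
    s = stdev(deltas)                          # step (c), N - 1 in the denominator
    return m * math.sqrt(n) / s, n - 1         # step (d) and nu for step (e)

# Hypothetical matched pairs, chosen only to exercise the function.
xs = [5, 7, 6, 9]
ys = [4, 5, 5, 6]
t, nu = paired_t(xs, ys)
print(f"t = {t:.3f} with nu = {nu} degrees of freedom")
```

Because only the differences enter the calculation, variation shared by both members of a pair cancels out, which is the point of the matched design.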

22 Paired MPOD data, differences

[Table: MPOD for each matched Control and Patient, with the difference δ for each pair; values missing.]

23 Paired MPOD data, conclusions

The mean difference is $m$ = [value missing] with standard deviation $s$ = [value missing]. This leads to
$$t = \frac{m\sqrt{N}}{s}$$
= [value missing].

This far exceeds the critical value, 2.306, for a two-sided test at the 95% confidence level (α/2 = 0.025, ν = 8). We can reject the null hypothesis and conclude that the difference is nonzero.

24 Confidence intervals for large samples: reprise

Recall: confidence intervals for a population mean based on large samples.

(1) Choose a confidence level C (C = 0.95 for 95%) and define $\alpha = 1 - C$.
(2) Use the standard normal table to find that z-score, $z_{\alpha/2}$, such that
$$P(z \geq z_{\alpha/2}) = \alpha/2.$$
(3) If your sample has mean $m$, the desired confidence interval for the population mean µ is:
$$m - z_{\alpha/2}\,\mathrm{SEM} \leq \mu \leq m + z_{\alpha/2}\,\mathrm{SEM}$$

25 Confidence intervals: small samples

When the sample is small (say, N < 30) one cannot assume that the sample standard deviation $s$ is a good estimate of that for the population, σ: that's why Gosset developed the t-test. His statistic comes into small-sample confidence intervals too:

(1) Choose a confidence level C and define $\alpha = 1 - C$.
(2) Use the t-table with $\nu = N - 1$ degrees of freedom to find that t-score, $t_{\alpha/2,\,\nu}$, such that
$$P(t \geq t_{\alpha/2,\,\nu}) = \alpha/2.$$
(3) The desired confidence interval is:
$$m - t_{\alpha/2,\,\nu}\,\mathrm{SEM} \leq \mu \leq m + t_{\alpha/2,\,\nu}\,\mathrm{SEM}$$
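
Both kinds of interval share the same arithmetic and differ only in the critical value. The sketch below assumes SEM = s/√N (the usual standard error of the mean) and uses Python's standard library, which can invert the normal table but has no t-table; the t critical value is therefore passed in by hand, here the 2.306 quoted earlier for ν = 8 at 95%. The function names and the nine data values are illustrative.

```python
import math
from statistics import NormalDist, mean, stdev

def z_critical(C):
    """z-score z_{alpha/2} with P(z >= z_{alpha/2}) = alpha/2, where alpha = 1 - C."""
    alpha = 1 - C
    return NormalDist().inv_cdf(1 - alpha / 2)

def confidence_interval(xs, critical):
    """Interval m -/+ critical * SEM, assuming SEM = s / sqrt(N)."""
    n = len(xs)
    m = mean(xs)
    sem = stdev(xs) / math.sqrt(n)
    return m - critical * sem, m + critical * sem

# Hypothetical sample of N = 9 values, so nu = 8.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
t_crit = 2.306  # from the t-table: two-sided 95%, nu = 8
print("small-sample (t) interval:", confidence_interval(xs, t_crit))
print("large-sample (z) interval:", confidence_interval(xs, z_critical(0.95)))
```

For the same data the t interval is the wider of the two, reflecting the extra uncertainty in s when N is small.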

26 Are my data normally distributed?

The hypotheses for the t-tests all involve normal distributions: how does one check whether the data are normally distributed?

- Impossible to answer definitively: the exact distribution is a property beyond the reach of measurement.
- Informative standard graphical methods are available.

27 Making a qq-plot

The sole ingredient is a list of, say, N numbers.

(a) Sort the data into ascending order so that
$$x_1 \leq x_2 \leq \cdots \leq x_N$$
(b) Assign a cumulative probability to each $x_j$:
$$p_j = \frac{j - 0.5}{N}$$

28 Making a qq-plot, continued

(c) Work out the z-score that would be associated with each cumulative probability $p_j$:
$$\Phi(z_j) = p_j$$
One has to use the z-score table in reverse for this.
(d) Plot the pairs $(z_j, x_j)$.

If the data really are normally distributed then the points will lie near to the line
$$x_j = \sigma z_j + \mu$$
where µ and σ are the population mean and standard deviation for the x's.
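
Steps (a) to (c) can be sketched in a few lines of Python; `NormalDist().inv_cdf` from the standard library plays the role of the z-score table used in reverse. The function name `qq_points` and the three sample values are illustrative, and the plotting in step (d) is left to whatever graphics package is at hand.

```python
from statistics import NormalDist

def qq_points(data):
    """Return the list of pairs (z_j, x_j) for a normal qq-plot."""
    xs = sorted(data)                       # step (a): ascending order
    n = len(xs)
    std_normal = NormalDist()               # mu = 0, sigma = 1
    points = []
    for j, x in enumerate(xs, start=1):
        p_j = (j - 0.5) / n                 # step (b): cumulative probability
        z_j = std_normal.inv_cdf(p_j)       # step (c): solve Phi(z_j) = p_j
        points.append((z_j, x))
    return points

# Hypothetical data, chosen only to exercise the function.
for z_j, x_j in qq_points([3.0, 1.0, 2.0]):
    print(f"z = {z_j:+.4f}, x = {x_j}")
```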

29 Example: uniformly distributed data

[Figure, left panel (pdf against x): A distribution that is uniform over the interval 0 ≤ x ≤ 1 and the normal distribution with the same mean and variance.]

[Figure, right panel (samples $x_j$ against quantiles from the standard normal): The qq-plot for a sample of 50 values drawn from the uniform distribution at left. The dashed line is $x = s z + m$ where $m$ and $s$ are the sample mean and standard deviation for the x's.]

30 Example: normally distributed data

[Figure, left panel (pdf against x): The standard normal distribution: µ = 0, σ = 1.]

[Figure, right panel (samples $x_j$ against quantiles from the standard normal): The qq-plot for a sample of 50 values drawn from the normal distribution at left: notice that the dots are concentrated along the dashed line.]

31 The Shapiro-Wilk Test

One can take seriously the remark from a few slides back:

    If the data really are normally distributed then the points will lie near to the line $x_j = \sigma z_j + \mu$ where µ and σ are the population mean and standard deviation for the x's.

and, more or less, try to fit a line to the pairs $(z_j, x_j)$. A goodness-of-fit test on the line (about which we'll learn more later in the term) then yields a test statistic and a numerical p-value.

The evaluation of this statistic is somewhat more involved than my sketch suggests and the test requires special tables, so one normally resorts to software, for example the shapiro.test() function in R.
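
To make the line-fitting idea concrete, the sketch below computes the correlation coefficient between the $z_j$ and the sorted $x_j$ of a qq-plot. This is not the Shapiro-Wilk statistic, which uses different weights and its own tables; it is only a rough stand-in for the same intuition, and it yields no p-value. The name `qq_correlation` and both test samples are illustrative.

```python
import math
from statistics import NormalDist, mean

def qq_correlation(data):
    """Correlation between normal quantiles z_j and sorted data x_j.

    Values near 1 mean the qq-plot is close to a straight line.
    This is NOT the Shapiro-Wilk W statistic, only a simpler proxy.
    """
    xs = sorted(data)
    n = len(xs)
    zs = [NormalDist().inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]
    mx, mz = mean(xs), mean(zs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    return sxz / math.sqrt(sxx * szz)

# A sample lying exactly on a line x = 3 z + 10, and a strongly skewed one.
on_line = [10 + 3 * NormalDist().inv_cdf((j - 0.5) / 20) for j in range(1, 21)]
skewed = [2 ** k for k in range(20)]
print(f"on the line: r = {qq_correlation(on_line):.4f}")
print(f"skewed:      r = {qq_correlation(skewed):.4f}")
```

For an actual test with a p-value, one would still turn to a statistics package, as the slide says.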
