COMPARING GROUPS PART 1: CONTINUOUS DATA


Min Chen, Ph.D.
Assistant Professor
Quantitative Biomedical Research Center, Department of Clinical Sciences
Bioinformatics Shared Resource, Simmons Comprehensive Cancer Center
Lecture 4, July 9, 2013

Min Chen (QBRC/CCBSR), Comparing groups: Continuous Data 1, Lec 4

OUTLINE
1. REVIEW
2. INTRODUCTION
3. COMPARISON OF TWO GROUPS: Parametric tests


REVIEW: 100(1 − α)% CONFIDENCE INTERVAL OF THE MEAN

Lower limit: $L = \bar{X} - z_{\alpha/2}\, s/\sqrt{n}$
Upper limit: $U = \bar{X} + z_{\alpha/2}\, s/\sqrt{n}$

Standard normal distribution: µ = 0, σ = 1.

REVIEW OF CONFIDENCE INTERVAL FROM SMALL SAMPLE

As a rule of thumb, if the sample size n < 30, use the formula below.

100(1 − α)% confidence interval: $\bar{X} \pm t_{\alpha/2,\,n-1}\, s/\sqrt{n}$

where $t_{\alpha/2,\,n-1}$ is the upper α/2 critical value, i.e. the (1 − α/2) quantile, of the t-distribution with (n − 1) degrees of freedom.

REVIEW: INTERPRETATION OF CI

The CI: $\Pr(L(X) \le \theta \le U(X)) = 1 - \alpha$.

It is tempting to state that the probability that θ lies between two numbers, L and U, is (1 − α). This is wrong because θ is a fixed number, whereas L(X) and U(X) are random variables, not numbers. The correct reading, for α = 0.05: on average, 95% of the calculated intervals will contain the true population parameter θ.

RELATIONSHIP BETWEEN TYPE I ERROR (α) AND POWER

[Figure: relationship between Type I error (α) and power]
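The repeated-sampling interpretation of the confidence interval can be illustrated with a small simulation. This is only a sketch: the true mean, standard deviation, sample size, number of repetitions, and seed are arbitrary assumptions, not values from the lecture.

```python
import math
import random
import statistics

def z_interval(sample, alpha=0.05):
    """Return (L, U) = xbar -/+ z_{alpha/2} * s / sqrt(n)."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample SD (divisor n - 1)
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

def coverage(theta=10.0, sigma=2.0, n=50, repeats=2000, seed=1):
    """Fraction of simulated intervals that contain the fixed true mean theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        lower, upper = z_interval(sample)
        hits += lower <= theta <= upper  # theta is fixed; the interval is random
    return hits / repeats

observed = coverage()
print(f"Share of intervals containing theta: {observed:.3f}")
```

The share should land close to 0.95. Note that θ never changes inside the loop; only the limits L and U vary from sample to sample.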

PARAMETRIC VS NON-PARAMETRIC

Parametric tests
- Assume data follow some known distribution, e.g., Normal, t-distribution, chi-square, Binomial distribution, etc.
- Compare means, variances

Non-parametric tests
- Don't assume a form of distribution
- Compare other measures of central tendency (e.g., median, or location shift)
- Useful for skewed data, small samples, ordinal data

NOTATION

Quantity             Population parameter   Sample value
Mean                 $\mu$                  $\bar{X}$
Standard deviation   $\sigma$               $s$
Variance             $\sigma^2$             $s^2$
Sample size                                 $n$
Sample value                                $x_i$

ONE SAMPLE t TEST

Recall the one-sample t-test:

$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$

This is the test statistic for comparing the mean of one group against a fixed value. The general form of a t-statistic is

$$t = \frac{\text{difference of means}}{\text{standard error}}$$

The t-statistic follows a t-distribution!

STUDENT'S t-DISTRIBUTION

Here is how to generate a Student's t random variable:

$$T_\nu = \frac{Z}{\sqrt{V/\nu}},$$

where
- Z has a standard normal distribution;
- V has a chi-squared distribution with ν degrees of freedom (df), i.e., $V = \sum_{i=1}^{\nu} Z_i^2$, where the $Z_i$ are iid standard normal r.v.s. (Recall $E[Z_i^2] = 1$. So $E[V] = \nu$.)
- Z and V are independent.
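The construction of a Student's t random variable can be sketched directly from its definition. The degrees of freedom, number of draws, and seed below are arbitrary assumptions; the check against ν/(ν − 2) uses the known variance of a t-distribution with ν > 2, which is not stated on the slide.

```python
import math
import random
import statistics

def draw_t(nu, rng):
    """One draw of T = Z / sqrt(V / nu), with V a sum of nu squared normals."""
    z = rng.gauss(0.0, 1.0)
    v = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))  # chi-squared, nu df
    return z / math.sqrt(v / nu)

rng = random.Random(42)
nu = 10
draws = [draw_t(nu, rng) for _ in range(20000)]

sample_var = statistics.variance(draws)
theory_var = nu / (nu - 2)  # variance of a t-distribution with nu > 2 df
print(f"sample variance {sample_var:.3f} vs theoretical {theory_var:.3f}")
```

The sample variance exceeds 1, reflecting the heavier tails of the t-distribution relative to the standard normal.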

t: A FAMILY OF DISTRIBUTIONS IDENTIFIED BY df

Recall

$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{(\bar{X} - \mu_0)/(\sigma/\sqrt{n})}{\sqrt{s^2/\sigma^2}}.$$

The t-distribution approaches the Normal distribution as df increases.

SMALL SAMPLE VS. LARGE SAMPLE

Recall that for confidence intervals, as a rule of thumb, if the sample size n < 30, use the t statistic for the 100(1 − α)% confidence interval:

$$\bar{X} \pm t_{\alpha/2,\,n-1}\, \frac{s}{\sqrt{n}}$$

while for large samples we have

$$\bar{X} \pm z_{\alpha/2}\, \frac{s}{\sqrt{n}}.$$

The reason is that when the sample size is large, $t_{\alpha/2,\,n-1} \approx z_{\alpha/2}$.
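The claim that the t critical value approaches the normal one can be checked empirically. The sketch below estimates the 0.975 quantile of the t-distribution by simulation for a few df values; the df values, number of draws, and seed are assumptions, and the chi-squared variable is drawn as a Gamma(df/2, scale 2) variable, which is an equivalent shortcut rather than the slide's sum-of-squares construction.

```python
import math
import random
import statistics

def empirical_t_quantile(df, p=0.975, draws=40000, seed=7):
    """Monte Carlo estimate of the p-quantile of a t-distribution with df degrees of freedom."""
    rng = random.Random(seed)
    values = []
    for _ in range(draws):
        z = rng.gauss(0.0, 1.0)
        v = rng.gammavariate(df / 2, 2.0)  # chi-squared with df degrees of freedom
        values.append(z / math.sqrt(v / df))
    values.sort()
    return values[int(p * draws)]

z_crit = statistics.NormalDist().inv_cdf(0.975)
quantiles = {df: empirical_t_quantile(df) for df in (5, 30, 200)}
for df, q in quantiles.items():
    print(f"df={df:>3}: simulated t quantile {q:.2f}, normal quantile {z_crit:.2f}")
```

The simulated quantile is clearly larger than the normal one for small df and shrinks toward it as df grows, which is why the z-based interval is acceptable for large samples.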

COMPARING MEANS OF PAIRED SAMPLES

In paired samples, each data point in one sample is matched to a data point in the second sample.

- Same subject
  - Measured at 2 time points (before and after intervention)
  - Two eyes (Left, Right)
  - Two organs (Heart, Liver)
- Matched subjects
  - Experimental animal, pair-fed
  - Match Male, Female

COMPARING MEANS OF TWO INDEPENDENT SAMPLES

Two independent samples
- Subjects are unrelated, in two separate groups;
- Sample sizes may be different in each group, $(n_1, n_2)$;
- Variances in each group may be
  - Equal, $\sigma_1^2 = \sigma_2^2$
  - Unequal, $\sigma_1^2 \ne \sigma_2^2$

EXAMPLE 1

In a hypertension research study, subjects are given dietary counseling to restrict their sodium intake. Data on urinary sodium from 8 subjects at Baseline (Week 0) and Week 1 are shown.

[Table: columns Subject, Week 0, Week 1, Change; summary rows $\bar{X}$ and $s$; values not recovered]

EXAMPLE 1 (CONTD.)

Q1: Paired samples or two independent samples?
Q2: Is there a change in mean levels of urinary sodium after 1 week?

PAIRED t-TEST

Example 1 has paired sample data (since the same subject was measured at two time points).

Compute the mean and standard deviation of the differences, $\bar{X}_d$ and $s_d$.

$H_0: \mu_1 - \mu_2 = c$ vs. $H_a: \mu_1 - \mu_2 \ne c$

$$t = \frac{\bar{X}_d - c}{s_d/\sqrt{n}},$$

which follows a t-distribution with (n − 1) degrees of freedom.

If $|t| > t_{n-1}(1 - \alpha/2)$, reject $H_0$. Here $t_{n-1}(1 - \alpha/2)$ is the (1 − α/2) quantile of $T_{n-1}$.

P value = $\Pr(|T_{n-1}| > |t|)$.
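The paired t-test steps can be sketched by hand as follows. The numbers are hypothetical values invented for illustration (they are not the Example 1 data, which did not survive in this transcript), and the critical value 2.365 is the standard table value for the 0.975 quantile of a t-distribution with 7 df.

```python
import math
import statistics

# Hypothetical paired measurements for 8 subjects (NOT the lecture's data).
week0 = [10, 12, 22, 14, 17, 40, 15, 12]
week1 = [8, 13, 15, 11, 12, 30, 13, 10]

def paired_t(before, after, c=0.0):
    """Return (t, df) for H0: mu_1 - mu_2 = c, using the within-pair differences."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_d = statistics.fmean(diffs)
    sd_d = statistics.stdev(diffs)  # SD of the differences (divisor n - 1)
    t = (mean_d - c) / (sd_d / math.sqrt(n))
    return t, n - 1

T_CRIT_DF7 = 2.365  # 0.975 quantile of t with 7 df, taken from a standard t table

t_stat, df = paired_t(week0, week1)
reject = abs(t_stat) > T_CRIT_DF7
print(f"t = {t_stat:.3f}, df = {df}, reject H0 at alpha = 0.05: {reject}")
```

Only the differences enter the calculation, so the test is a one-sample t-test applied to the Change column.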


REJECTION REGIONS

[Figure: rejection regions]

PAIRED t-TEST USING EXCEL: EXAMPLE 1

[Figure: paired t-test in Excel for Example 1] Values shown in bold red have been modified from original data.

EXAMPLE 2

A study was performed to compare the mean ERG (electroretinogram) amplitude of patients with different genetic types of retinitis pigmentosa (RP), a genetic eye disease that often results in blindness. Data were collected in patients of age [age range not recovered] years with different genetic types.

Genetic type   Mean ± SD   N
Dominant       0.85 ± …    …
Recessive      0.38 ± …    …

Table shows values for natural log of ERG.

EXAMPLE 2 (CONTD.)

Q1: Paired samples or two independent samples?
Q2: Is there a difference in mean log(ERG) amplitude between patients with dominant RP versus those with the recessive form?

TWO-SAMPLE t-TEST WITH EQUAL VARIANCES

Example 2 has two independent samples.

$H_0: \mu_1 = \mu_2$ vs. $H_a: \mu_1 \ne \mu_2$

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},$$

which follows a t-distribution with $(n_1 + n_2 - 2)$ degrees of freedom, where

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

is the pooled variance.

If $|t| > t_{n_1+n_2-2}(1 - \alpha/2)$, reject $H_0$.

P value = $\Pr(|T_{n_1+n_2-2}| > |t|)$.

TWO-SAMPLE t-TEST FOR EQUAL VARIANCES USING EXCEL: EXAMPLE 2

[Figure: two-sample t-test for equal variances in Excel for Example 2]
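The pooled-variance t statistic can be computed from summary statistics alone. The sketch below uses hypothetical means, SDs, and sample sizes chosen for illustration; they are not the Example 2 values, whose SDs and sample sizes are missing from this transcript.

```python
import math

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Return (t, df, pooled variance) for the equal-variance two-sample t-test."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df  # pooled variance
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)       # standard error of the difference
    return (mean1 - mean2) / se, df, sp2

# Hypothetical summary statistics (NOT from the lecture).
t_stat, df, sp2 = pooled_t(mean1=5.0, s1=2.0, n1=10, mean2=3.5, s2=1.5, n2=12)
print(f"pooled variance = {sp2:.4f}, t = {t_stat:.3f}, df = {df}")
```

The resulting t is then compared with the (1 − α/2) quantile of the t-distribution with n1 + n2 − 2 degrees of freedom, taken from a table or statistical software.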

COMPARING VARIANCES

In Example 2, the two-sample t-test for independent samples assumed that the variances were equal:

Variance of Group 1 = Variance of Group 2

Note that $\sigma_1^2 = \dots$ and $\sigma_2^2 = \dots$ (values not recovered).

Is the equal variance assumption true?

COMPARING VARIANCES

To compare variances, we conduct a hypothesis test to examine whether the ratio of variances is equal to 1.

$$H_0: \frac{\sigma_1^2}{\sigma_2^2} = 1 \quad \text{vs.} \quad H_a: \frac{\sigma_1^2}{\sigma_2^2} \ne 1$$

Test statistic:

$$f = \frac{s_1^2}{s_2^2},$$

which follows an F-distribution.

F-DISTRIBUTION

Here is how to generate an F random variable:

$$F_{\nu_1,\nu_2} = \frac{V_1/\nu_1}{V_2/\nu_2},$$

where
- $V_1$ and $V_2$ have chi-squared distributions with $\nu_1$ and $\nu_2$ degrees of freedom (df), respectively;
- $V_1$ and $V_2$ are independent.

Recall $E[V] = \nu$.

F-DISTRIBUTION

The F-distribution is a family of distributions identified by numerator and denominator degrees of freedom (df).
- F-distributions are always right-skewed;
- They have numerator and denominator df.

Recall that, under $H_0$ (where $\sigma_1^2 = \sigma_2^2$),

$$f = \frac{s_1^2}{s_2^2} = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}.$$

REJECTION REGIONS FOR THE F-TEST

[Figure: rejection regions for the F-test]

F-TEST FOR COMPARING VARIANCES

$$H_0: \frac{\sigma_1^2}{\sigma_2^2} = 1 \quad \text{vs.} \quad H_a: \frac{\sigma_1^2}{\sigma_2^2} \ne 1$$

Test statistic:

$$f = \frac{s_1^2}{s_2^2},$$

which follows an F-distribution with $(n_1 - 1, n_2 - 1)$ degrees of freedom.

If $f > F_{n_1-1,\,n_2-1}(1 - \alpha/2)$ or $f < F_{n_1-1,\,n_2-1}(\alpha/2)$, reject $H_0$.

If $f \ge 1$, then P value = $2 \Pr(F_{n_1-1,\,n_2-1} > f)$;
if $f < 1$, then P value = $2 \Pr(F_{n_1-1,\,n_2-1} < f)$.
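The two-sided p-value rule for the F-test can be approximated by simulating F random variables from their definition as a ratio of scaled chi-squared variables. This is a Monte Carlo sketch, not the exact F-distribution calculation a statistics package would perform; the SDs, sample sizes, number of draws, and seed are hypothetical, and each chi-squared variable is drawn as a Gamma(df/2, scale 2) variable.

```python
import random

def simulate_f(df1, df2, rng):
    """One draw of F = (V1/df1) / (V2/df2) with independent chi-squared V1, V2."""
    v1 = rng.gammavariate(df1 / 2, 2.0)  # chi-squared with df1 degrees of freedom
    v2 = rng.gammavariate(df2 / 2, 2.0)  # chi-squared with df2 degrees of freedom
    return (v1 / df1) / (v2 / df2)

def f_test(s1, n1, s2, n2, draws=50000, seed=3):
    """Return (f, approximate two-sided p-value) for H0: equal variances."""
    f = s1 ** 2 / s2 ** 2
    rng = random.Random(seed)
    sims = [simulate_f(n1 - 1, n2 - 1, rng) for _ in range(draws)]
    if f >= 1:
        tail = sum(x > f for x in sims) / draws  # Pr(F > f)
    else:
        tail = sum(x < f for x in sims) / draws  # Pr(F < f)
    return f, min(1.0, 2 * tail)

# Hypothetical summary statistics (NOT from the lecture).
f_stat, p_value = f_test(s1=2.0, n1=10, s2=1.5, n2=12)
print(f"f = {f_stat:.3f}, approximate two-sided p-value = {p_value:.2f}")
```

Doubling the tail on the side where f falls mirrors the rule above; the cap at 1.0 only guards against simulation noise when f is near the centre of the distribution.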

F-TEST FOR EQUALITY OF VARIANCES USING EXCEL: EXAMPLE 2

[Figure: F-test for equality of variances in Excel for Example 2]

TWO-SAMPLE t-TEST WITH UNEQUAL VARIANCES

$H_0: \mu_1 = \mu_2$ vs. $H_a: \mu_1 \ne \mu_2$

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}},$$

which approximately follows a t-distribution with d degrees of freedom, where

$$d' = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}.$$

Round $d'$ down to the nearest integer and call it d.

If $|t| > t_d(1 - \alpha/2)$, reject $H_0$.

P value = $\Pr(|T_d| > |t|)$.
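The unequal-variance statistic and its approximate degrees of freedom can be worked through numerically. The summary statistics below are hypothetical values chosen so that the two variances differ clearly; they are not the lecture's data.

```python
import math

def unequal_variance_t(mean1, s1, n1, mean2, s2, n2):
    """Return (t, d_prime, d): the statistic, the raw df, and the df rounded down."""
    a = s1 ** 2 / n1
    b = s2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(a + b)
    d_prime = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, d_prime, math.floor(d_prime)

# Hypothetical summary statistics (NOT from the lecture).
t_stat, d_prime, d = unequal_variance_t(mean1=5.0, s1=3.0, n1=10,
                                        mean2=3.5, s2=1.0, n2=20)
print(f"t = {t_stat:.3f}, d' = {d_prime:.3f}, d = {d}")
```

With these inputs the rounded df is far below n1 + n2 − 2 = 28: the group with the larger variance and smaller sample dominates the standard error, so the reference t-distribution has heavier tails than the pooled test would use.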

EXAMPLE 3

A research study aimed to assess the familial aggregation of cholesterol levels by collecting data on children aged 2 to 14 years. Cholesterol levels (mg/dl) were collected in one group of children (say, "cases") whose fathers died from heart disease. Data were also collected in a historical control group of children of the same age.

Group                Mean ± SD   N
Cases                … ± …       …
Historical Control   … ± …       …

EXAMPLE 3 (CONTD.)

- Paired samples or two independent samples?
- Is there a difference in mean cholesterol levels between the Cases and the Historical Control group?
- Which statistical test should we use?

F-TEST FOR EQUALITY OF VARIANCES USING EXCEL: EXAMPLE 3

[Figure: F-test for equality of variances in Excel for Example 3]

TWO-SAMPLE t-TEST FOR UNEQUAL VARIANCES USING EXCEL: EXAMPLE 3

[Figure: two-sample t-test for unequal variances in Excel for Example 3]

ADVANTAGES OF PAIRED SAMPLES

Suppose we want to test $H_0: \mu_1 = \mu_2$ vs. $H_a: \mu_1 \ne \mu_2$.

The test statistic is related to $\bar{X}_1 - \bar{X}_2$. Its variance is:

$$\mathrm{Var}(\bar{X}_1 - \bar{X}_2) = \mathrm{Var}(\bar{X}_1) + \mathrm{Var}(\bar{X}_2) - 2\rho_{12}\sqrt{\mathrm{Var}(\bar{X}_1)\,\mathrm{Var}(\bar{X}_2)}$$

The positive correlation $\rho_{12}$ in paired samples reduces the variance of the difference, yielding a more powerful test than the independent-sample design.
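The variance reduction from pairing can be checked with a short simulation. In this sketch the pair correlation (0.8), the unit variances, the sample size (8 pairs), the number of repetitions, and the seed are all assumptions made for illustration.

```python
import math
import random
import statistics

def var_of_mean_difference(var1, var2, rho):
    """Var(X1bar - X2bar) for sample means with variances var1, var2 and correlation rho."""
    return var1 + var2 - 2 * rho * math.sqrt(var1 * var2)

def simulate_mean_difference(n, rho, rng):
    """Difference of sample means from n correlated pairs with unit variances."""
    x1, x2 = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x1.append(z1)
        x2.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)  # corr(x1, x2) = rho
    return statistics.fmean(x1) - statistics.fmean(x2)

rng = random.Random(11)
n, rho = 8, 0.8
diffs = [simulate_mean_difference(n, rho, rng) for _ in range(20000)]

empirical = statistics.variance(diffs)
paired = var_of_mean_difference(1 / n, 1 / n, rho)       # each mean has variance 1/n
independent = var_of_mean_difference(1 / n, 1 / n, 0.0)  # same formula with rho = 0
print(f"simulated {empirical:.4f}, paired formula {paired:.4f}, independent {independent:.4f}")
```

With these settings the formula gives 0.05 for the paired design against 0.25 for independent samples, so the same mean difference yields a much larger test statistic when the pairing is used.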
