Statistics - Lecture 05 Nicodème Paul Faculté de médecine, Université de Strasbourg http://statnipa.appspot.com/cours/05/index.html#47 1/47
Descriptive statistics and probability

- Data description and graphical representation
- Mean, median, quartiles, standard deviation, interquartile range (IQR)
- Barplot, histogram and boxplot
- Decisions are based on probability calculation
- Notion of random variables and distributions
- Binomial and normal distributions

Estimation

- Notion of parameters ($\mu$, $\sigma^2$)
- Difference between estimate and estimator
- The sample mean $\bar{X}$ and the sample variance $S^2$ are estimators
- The Central Limit Theorem and sampling distribution
- Interval estimate or confidence interval

Hypothesis testing

- Parametric tests and test procedure
- Notion of null and alternative hypotheses
- Test statistic and its sampling distribution
- Critical values and critical regions
- P-values

Relation between variables and goodness of fit

- Notion of correlation
- Non parametric tests and goodness of fit
- Notion of association between categorical variables
- The $\chi^2$ test
- The Fisher exact test

Examples

- You have been assigned 12 consenting experimental subjects, each of whom has a brain tumour of the same size and type. Four are allocated at random to an untreated control group, four are treated with the drug Tumostat and four more with the drug Inhibin 4. After two months of treatment, the diameter of each tumour is remeasured.
- Survival times in five human cancers (stomach, bronchus, colon, ovary, breast). Cameron, E. and Pauling, L. (1978) Supplemental ascorbate in the supportive treatment of cancer: re-evaluation of prolongation of survival times in terminal human cancer. Proceedings of the National Academy of Sciences USA, 75, 4538-4542.
- Comparison of 5 pretreated patches to reduce mosquito-human contact. Bhatnagar, A. and Mehta, V.K. (2007) Efficacy of Deltamethrin and Cyfluthrin Impregnated Cloth over Uniform against Mosquito Bites. Medical Journal Armed Forces India, 63, 120-122.

Question

You have been assigned 12 consenting experimental subjects, each of whom has a brain tumour of the same size and type. Four are allocated at random to an untreated control group, four are treated with the drug Tumostat and four more with the drug Inhibin 4. After two months of treatment, the diameter of each tumour is remeasured.

What would be the appropriate test here?

- Parametric test
- Non parametric test

ANOVA: Analysis Of Variance

- Population 1: $(\mu_1, \sigma_1^2)$. Sample 1: $(X_{11}, X_{12}, \ldots, X_{1,n_1})$, summarised by $(\bar{X}_1, S_1^2)$
- Population 2: $(\mu_2, \sigma_2^2)$. Sample 2: $(X_{21}, X_{22}, \ldots, X_{2,n_2})$, summarised by $(\bar{X}_2, S_2^2)$
- ...
- Population k: $(\mu_k, \sigma_k^2)$. Sample k: $(X_{k1}, X_{k2}, \ldots, X_{k,n_k})$, summarised by $(\bar{X}_k, S_k^2)$

Objective: comparing the means of multiple populations.

ANOVA assumptions and objectives

Assumptions:

- Each of the k population or treatment response distributions is normal.
- $\sigma_1 = \sigma_2 = \ldots = \sigma_k$ (the k normal distributions have identical standard deviations).
- The observations in the sample from any particular one of the k populations or treatments are independent of one another.
- When comparing population means, the k random samples are selected independently of one another.

Objectives:

- Test $H_0: \mu_1 = \mu_2 = \ldots = \mu_k$ against $H_1$: at least two of the $\mu$'s are different.
- Estimation of simultaneous confidence intervals for the mean differences $\mu_i - \mu_j$ for $i, j = 1, \ldots, k$ and $i \neq j$.

Example

You have been assigned 12 consenting experimental subjects, each of whom has a brain tumour of the same size and type. Four are allocated at random to an untreated control group, four are treated with the drug Tumostat and four more with the drug Inhibin 4. After two months of treatment, the diameter of each tumour is remeasured.

Population parameters:

- $\mu_1$: population mean of the control group
- $\mu_2$: population mean of the Neurohib group
- $\mu_3$: population mean of the Tumostop group

Hypotheses:

- $H_0$: There is no difference in mean tumour diameter among the treatments: $\mu_1 = \mu_2 = \mu_3$.
- $H_1$: There is a difference in mean tumour diameter among the treatments: at least two of the $\mu$'s are different.

Data

- $x_{ij}$ is the $i$-th observation resulting from the $j$-th treatment.
- $T_{.j} = \sum_{i=1}^{n_j} x_{ij}$: total of the $j$-th treatment.
- $\bar{x}_{.j} = \frac{T_{.j}}{n_j}$: mean of the $j$-th treatment.
- $T_{..} = \sum_{j=1}^{k} T_{.j} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} x_{ij}$: total of all observations.
- $\bar{x}_{..} = \frac{T_{..}}{N}$: the grand mean, where $N = \sum_{j=1}^{k} n_j$.

Example

$$\bar{x}_{.1} = \frac{1}{4}(7 + 8 + 10 + 11) = 9$$
$$\bar{x}_{.2} = \frac{1}{4}(4 + 5 + 7 + 8) = 6$$
$$\bar{x}_{.3} = \frac{1}{4}(4 + 5 + 1 + 2) = 3$$
$$\bar{x}_{..} = \frac{1}{12}(7 + 8 + 10 + 11 + 4 + 5 + 7 + 8 + 4 + 5 + 1 + 2) = 6$$

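The group totals and means above can be reproduced in a few lines of R. This is a minimal sketch: the group labels `control`, `drug1` and `drug2` are placeholder names chosen here (the slides do not fix which drug is group 2 and which is group 3), and the values are the twelve diameters of the worked example.

```r
# Tumour diameters from the worked example, one vector per treatment group.
# The labels are placeholders, not names taken from the lecture.
diam <- list(
  control = c(7, 8, 10, 11),
  drug1   = c(4, 5, 7, 8),
  drug2   = c(4, 5, 1, 2)
)

totals      <- sapply(diam, sum)       # T_.j : total of each treatment
n_j         <- lengths(diam)           # n_j  : group sizes
group_means <- totals / n_j            # xbar_.j = T_.j / n_j
grand_total <- sum(totals)             # T_.. : total of all observations
grand_mean  <- grand_total / sum(n_j)  # xbar_.. = T_.. / N

print(group_means)
print(grand_mean)
```
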
Within groups sum of squares

Variation within groups:

$$ssw = \sum_{j=1}^{3} \sum_{i=1}^{4} (x_{ij} - \bar{x}_{.j})^2$$
$$ssw = (4 + 1 + 1 + 4) + (4 + 1 + 1 + 4) + (1 + 4 + 4 + 1) = 30$$

Between groups sum of squares

Variation between groups:

$$ssb = \sum_{j=1}^{3} 4\,(\bar{x}_{.j} - \bar{x}_{..})^2$$
$$ssb = 4 \times 9 + 0 + 4 \times 9 = 72$$

Total sum of squares

Total variation:

$$sst = \sum_{j=1}^{3} \sum_{i=1}^{4} (x_{ij} - \bar{x}_{..})^2$$
$$sst = (1 + 4 + 16 + 25) + (4 + 1 + 1 + 4) + (4 + 1 + 25 + 16) = 102$$
$$sst = ssw + ssb = 30 + 72 = 102$$

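The decomposition $sst = ssw + ssb$ can be checked numerically. The sketch below assumes the same twelve diameters as the worked example and uses placeholder group labels; `ave()` replaces each observation by the mean of its own group.

```r
# Worked-example data in "long" form: one value and one group label per subject.
# Group labels are placeholders chosen for this sketch.
x <- c(7, 8, 10, 11,  4, 5, 7, 8,  4, 5, 1, 2)
g <- factor(rep(c("control", "drug1", "drug2"), each = 4))

group_mean <- ave(x, g)     # each observation replaced by its group mean
grand_mean <- mean(x)

ssw <- sum((x - group_mean)^2)            # within-group variation
ssb <- sum((group_mean - grand_mean)^2)   # equals sum_j n_j (xbar_.j - xbar_..)^2
sst <- sum((x - grand_mean)^2)            # total variation

print(c(ssw = ssw, ssb = ssb, sst = sst))
```

Summing $(\bar{x}_{.j} - \bar{x}_{..})^2$ over all observations, rather than over groups, automatically applies the weight $n_j$ to each group.
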
Check yourself

Under the null hypothesis, the ratio $\frac{ssb}{ssw}$ should be close to 1.

- True
- False

Test for equal means

The hypotheses are:

- $H_0: \mu_1 = \mu_2 = \ldots = \mu_k$
- $H_1$: some means are different

If the null hypothesis is true, we combine the k samples to estimate the overall mean; the sample mean for group $j$ is computed from that group alone:

$$\bar{X}_{..} = \frac{1}{N} \sum_{j=1}^{k} \sum_{i=1}^{n_j} X_{ij}, \qquad \bar{X}_{.j} = \frac{1}{n_j} \sum_{i=1}^{n_j} X_{ij}$$

- Total sum of squares: $SST = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_{..})^2$
- Sum of squares within groups: $SSW = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_{.j})^2$
- Sum of squares between groups: $SSB = \sum_{j=1}^{k} n_j (\bar{X}_{.j} - \bar{X}_{..})^2$

We can show that $SST = SSW + SSB$.

Check yourself

Let $X_1, X_2, \ldots, X_n \sim N(\mu, \sigma^2)$ with $\mu$ and $\sigma$ unknown. Let $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$. What is the distribution of $\frac{(n-1) S^2}{\sigma^2}$?

- $N(\mu; \frac{\sigma^2}{n})$
- $t_{n-1}$
- $\chi^2_1$
- $\chi^2_{n-1}$
- $\chi^2_n$

Check yourself

In the ANOVA framework, we defined $SSW = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_{.j})^2$ as the sum of squares within groups. Let $N = n_1 + n_2 + \ldots + n_k$. What is the distribution of $\frac{SSW}{\sigma^2}$?

- $\chi^2_1$
- $\chi^2_k$
- $\chi^2_{k-1}$
- $\chi^2_N$
- $\chi^2_{N-1}$
- $\chi^2_{N-k}$
- $\chi^2_{N-k-1}$

Parameter estimation

Comparing means from multiple populations assuming the variances are the same and equal to $\sigma^2$.

Pooled variance estimator:

$$S_{pool}^2 = \frac{\sum_{j=1}^{k} (n_j - 1) S_j^2}{\sum_{j=1}^{k} (n_j - 1)} = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_{.j})^2}{N - k}$$

where $N = n_1 + n_2 + \ldots + n_k$, $\bar{X}_{.j} = \frac{1}{n_j} \sum_{i=1}^{n_j} X_{ij}$ and $S_j^2 = \frac{1}{n_j - 1} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_{.j})^2$.

Notice that:

$$\frac{(N - k)\, S_{pool}^2}{\sigma^2} = \frac{(n_1 - 1) S_1^2}{\sigma^2} + \frac{(n_2 - 1) S_2^2}{\sigma^2} + \ldots + \frac{(n_k - 1) S_k^2}{\sigma^2} \sim \chi^2_{N-k}$$

as, for $j = 1, 2, \ldots, k$, $\frac{(n_j - 1) S_j^2}{\sigma^2} \sim \chi^2_{n_j - 1}$ and the k samples are independent.

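As an illustration of the two equivalent forms of the pooled variance, the sketch below computes both on the tumour data of the worked example. The group labels are placeholders chosen here; `var()` uses the divisor $n_j - 1$, matching $S_j^2$.

```r
# Worked-example tumour diameters; group labels are placeholders.
x <- c(7, 8, 10, 11,  4, 5, 7, 8,  4, 5, 1, 2)
g <- factor(rep(c("control", "drug1", "drug2"), each = 4))

n_j  <- tapply(x, g, length)   # group sizes n_j
s2_j <- tapply(x, g, var)      # sample variances S_j^2 (divisor n_j - 1)

# Form 1: weighted average of the group variances, with weights n_j - 1
s2_pool <- sum((n_j - 1) * s2_j) / sum(n_j - 1)

# Form 2: within-group sum of squares divided by N - k
ssw      <- sum((x - ave(x, g))^2)
s2_check <- ssw / (length(x) - nlevels(g))

print(c(s2_pool = s2_pool, s2_check = s2_check))
```
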
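Parameter estimation

$\bar{X}_{.j} = \frac{1}{n_j} \sum_{i=1}^{n_j} X_{ij}$ is an estimator of $\mu_j$.

As $X_{1j}, X_{2j}, \ldots, X_{n_j j} \sim N(\mu_j, \sigma^2)$, then

$$\bar{X}_{.j} \sim N\left(\mu_j, \frac{\sigma^2}{n_j}\right) \qquad \text{and} \qquad \frac{\bar{X}_{.j} - \mu_j}{S_{pool} \sqrt{\frac{1}{n_j}}} \sim t_{N-k}$$

The $(1 - \alpha)$ confidence intervals for the population means are:

$$I_{\alpha} = \left[\bar{x}_{.j} - t^{N-k}_{1-\frac{\alpha}{2}} \frac{s_{pool}}{\sqrt{n_j}},\; \bar{x}_{.j} + t^{N-k}_{1-\frac{\alpha}{2}} \frac{s_{pool}}{\sqrt{n_j}}\right]$$
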
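Example

A 95% confidence interval for $\mu_{control}$, $\mu_{neurohib}$ and $\mu_{mitostop}$ respectively is:

- $[9 - 2.262 \times \sqrt{30/9} \times \frac{1}{2};\; 9 + 2.262 \times \sqrt{30/9} \times \frac{1}{2}] = [6.935; 11.065]$
- $[6 - 2.262 \times \sqrt{30/9} \times \frac{1}{2};\; 6 + 2.262 \times \sqrt{30/9} \times \frac{1}{2}] = [3.935; 8.065]$
- $[3 - 2.262 \times \sqrt{30/9} \times \frac{1}{2};\; 3 + 2.262 \times \sqrt{30/9} \times \frac{1}{2}] = [0.935; 5.065]$

Here $2.262 = t^{9}_{0.975}$, $\sqrt{30/9} = s_{pool}$ and $\frac{1}{2} = \frac{1}{\sqrt{n_j}}$ with $n_j = 4$.
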
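These intervals can be recomputed with `qt()`. A minimal sketch, assuming the twelve diameters of the worked example and placeholder group labels:

```r
# Worked-example tumour diameters; group labels are placeholders.
x <- c(7, 8, 10, 11,  4, 5, 7, 8,  4, 5, 1, 2)
g <- factor(rep(c("control", "drug1", "drug2"), each = 4))

N <- length(x)
k <- nlevels(g)
xbar <- sapply(split(x, g), mean)     # group means xbar_.j
n_j  <- sapply(split(x, g), length)   # group sizes n_j

# Pooled standard deviation: sqrt(SSW / (N - k))
s_pool <- sqrt(sum((x - ave(x, g))^2) / (N - k))

alpha  <- 0.05
t_crit <- qt(1 - alpha / 2, df = N - k)   # quantile of t with N - k df

half_width <- t_crit * s_pool / sqrt(n_j)
ci <- cbind(lower = xbar - half_width, upper = xbar + half_width)
print(round(ci, 3))
```

All three intervals share the same half-width because the groups have equal size and the standard deviation is pooled.
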
Check yourself

In the ANOVA framework, we defined $SST = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_{..})^2$ as the total sum of squares. What is the distribution of $\frac{SST}{\sigma^2}$?

- $\chi^2_1$
- $\chi^2_k$
- $\chi^2_{N-1}$
- $\chi^2_N$
- $\chi^2_{N-k}$

Check yourself

In the ANOVA framework, we defined $SSB = \sum_{j=1}^{k} n_j (\bar{X}_{.j} - \bar{X}_{..})^2$ as the sum of squares between groups. What is the distribution of $\frac{SSB}{\sigma^2}$?

- $\chi^2_1$
- $\chi^2_k$
- $\chi^2_{N-1}$
- $\chi^2_{N-k}$
- $\chi^2_{k-1}$

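Test for equal means

$$\frac{SST}{\sigma^2} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left(\frac{X_{ij} - \bar{X}_{..}}{\sigma}\right)^2 \sim \chi^2_{N-1}$$
$$\frac{SSW}{\sigma^2} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left(\frac{X_{ij} - \bar{X}_{.j}}{\sigma}\right)^2 \sim \chi^2_{N-k}$$
$$\frac{SSB}{\sigma^2} = \sum_{j=1}^{k} n_j \left(\frac{\bar{X}_{.j} - \bar{X}_{..}}{\sigma}\right)^2 \sim \chi^2_{k-1}$$

The results for $SST$ and $SSB$ hold under the null hypothesis of equal means; the result for $SSW$ holds whether or not the means are equal.
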
Test for equal means

Definition: Let $Y$ and $W$ be independent random variables such that $Y$ has the $\chi^2_m$ distribution with $m$ degrees of freedom and $W$ has the $\chi^2_n$ distribution with $n$ degrees of freedom, where $m$ and $n$ are given positive integers. Define the random variable $T$ as follows:

$$T = \frac{Y/m}{W/n}$$

Then the distribution of $T$ is $F_{m,n}$, the F distribution with $m$ and $n$ degrees of freedom.

Under the null hypothesis, the random variable

$$T = \frac{SSB/(k-1)}{SSW/(N-k)} \sim F_{k-1,N-k}$$

has an F distribution with $k-1$ and $N-k$ degrees of freedom.

Test for equal means

To test $H_0$ with a significance level $\alpha$, we calculate the value $f$ of the test statistic from the samples.

Reject the null hypothesis if $f > f^{k-1,N-k}_{1-\alpha}$, where $f^{k-1,N-k}_{1-\alpha}$ is the critical value, that is, the $(1-\alpha)$ quantile of the $F_{k-1,N-k}$ distribution.

The p-value for the test is:

$$p\text{-value} = P(T > f) \quad \text{where } T \sim F_{k-1,N-k}$$

If the null hypothesis is rejected, what next?

- Tests for contrasts
- Pairwise comparison

Example

We have: $k - 1 = 2$, $N - k = 9$, $f = \frac{72/2}{30/9} = 10.8$, $f^{2,9}_{0.95} = 4.256$ and p-value $= 0.004$.

Since $f = 10.8 > 4.256$, we reject the null hypothesis of equal mean tumour diameters.

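The test statistic, critical value and p-value of this example can be recomputed with `qf()` and `pf()`, then cross-checked against `aov()`. This is a sketch on the worked-example data; the data frame `d` and its column names `diam` and `group` are names introduced here for illustration.

```r
# Worked-example tumour diameters; names d, diam, group are illustrative.
d <- data.frame(
  diam  = c(7, 8, 10, 11,  4, 5, 7, 8,  4, 5, 1, 2),
  group = factor(rep(c("control", "drug1", "drug2"), each = 4))
)
N <- nrow(d)
k <- nlevels(d$group)

group_mean <- ave(d$diam, d$group)
ssw <- sum((d$diam - group_mean)^2)        # 30
ssb <- sum((group_mean - mean(d$diam))^2)  # 72

f_obs   <- (ssb / (k - 1)) / (ssw / (N - k))               # observed F statistic
f_crit  <- qf(0.95, df1 = k - 1, df2 = N - k)              # critical value at alpha = 0.05
p_value <- pf(f_obs, df1 = k - 1, df2 = N - k, lower.tail = FALSE)

print(c(f = f_obs, f_crit = f_crit, p_value = p_value))

# Cross-check: the same F statistic from the built-in one-way ANOVA
fit <- aov(diam ~ group, data = d)
print(summary(fit))
```
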
Example: Comparison of 5 pretreated patches to reduce mosquito-human contact

Reference: Bhatnagar, A. and Mehta, V.K. (2007) Efficacy of Deltamethrin and Cyfluthrin Impregnated Cloth over Uniform against Mosquito Bites. Medical Journal Armed Forces India, 63, 120-122.

Example: parameter estimation

    model01 = aov(Measure ~ Treatment, data = bites)
    model.tables(model01, type = "means")

    Tables of means
    Grand mean

    7.153333

     Treatment
    Treatment
             C+O   Cyfluthrin Deltamethrin          D+O       Odomos
           5.367        8.033        8.133        6.333        7.901

Estimated means:

- $\bar{x} = 7.153$ (grand mean)
- $\bar{x}_{C} = 8.033$ (Cyfluthrin)
- $\bar{x}_{D} = 8.133$ (Deltamethrin)
- $\bar{x}_{O} = 7.901$ (Odomos)
- $\bar{x}_{C+O} = 5.367$
- $\bar{x}_{D+O} = 6.333$

Example: test

    summary(model01)

                 Df Sum Sq Mean Sq F value  Pr(>F)
    Treatment     4  184.6   46.16    4.48 0.00192 **
    Residuals   145 1494.1   10.30
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We have: $k - 1 = 4$, $SSB = 184.6$, $N - k = 145$, $SSW = 1494.1$, $f = 4.48$ and p-value $= 0.00192$.

If $\alpha = 0.05$ then $f^{4,145}_{1-\alpha} = 2.434$; since $f = 4.48 > 2.434$, we reject the null hypothesis of equal mean mosquito bite rates.

Example: Survival times in terminal human cancer

Reference: Cameron, E. and Pauling, L. (1978) Supplemental ascorbate in the supportive treatment of cancer: re-evaluation of prolongation of survival times in terminal human cancer. Proceedings of the National Academy of Sciences USA, 75, 4538-4542.

Survival times by cancer type:

    STOMACH  BRONCHUS  COLON  OVARY  BREAST
        124        81    248   1234    1235
         42       461    377     89      24
         25        20    189    201    1581
         45       450   1843    356    1166
        412       246    180   2970      40
         51       166    537    456     727
       1112        63    519           3808
         46        64    455            791
        103       155    406           1804
        876       859    365           3460
        146       151    942            719
        340       166    776
        396        37    372
                  223    163
                  138    101
                   72     20
                  245    283

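For readers who want to reproduce the analysis, the table above can be entered as a data frame. The names `cdat`, `Survival` and `Type` follow the `aov()` call shown in the Tukey output of this lecture; storing the natural logarithm of the survival time in `Survival` is an assumption made here, chosen because it reproduces the group means reported on the log scale (for example 4.953 for bronchus).

```r
# Survival times by cancer type, read column by column from the table.
days <- list(
  stomach  = c(124, 42, 25, 45, 412, 51, 1112, 46, 103, 876, 146, 340, 396),
  bronchus = c(81, 461, 20, 450, 246, 166, 63, 64, 155, 859, 151, 166, 37,
               223, 138, 72, 245),
  colon    = c(248, 377, 189, 1843, 180, 537, 519, 455, 406, 365, 942, 776,
               372, 163, 101, 20, 283),
  ovary    = c(1234, 89, 201, 356, 2970, 456),
  breast   = c(1235, 24, 1581, 1166, 40, 727, 3808, 791, 1804, 3460, 719)
)

# Long format: one row per patient. Survival is the natural log of the time
# (an assumption of this sketch, see the lead-in).
cdat <- data.frame(
  Type     = factor(rep(names(days), lengths(days)), levels = names(days)),
  Survival = log(unlist(days, use.names = FALSE))
)

print(table(cdat$Type))                        # group sizes
print(tapply(cdat$Survival, cdat$Type, mean))  # group means on the log scale
print(mean(cdat$Survival))                     # grand mean on the log scale
```
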
Example: parameter estimation

The response is analysed on the log scale: the means below are means of the logarithm of the survival time.

    model02 = aov(Survival ~ Type, data = cdat)
    model.tables(model02, type = "means")

    Tables of means
    Grand mean

    5.555785

     Type
        stomach bronchus  colon ovary breast
          4.968    4.953  5.749 6.151  6.559
    rep  13.000   17.000 17.000 6.000 11.000

Estimated means:

- $\bar{x} = 5.555785$ (grand mean)
- $\bar{x}_{stomach} = 4.968$
- $\bar{x}_{bronchus} = 4.953$
- $\bar{x}_{colon} = 5.749$
- $\bar{x}_{ovary} = 6.151$
- $\bar{x}_{breast} = 6.559$

Example: test

    summary(model02)

                Df Sum Sq Mean Sq F value  Pr(>F)
    Type         4  24.49   6.122   4.286 0.00412 **
    Residuals   59  84.27   1.428
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We have: $k - 1 = 4$, $SSB = 24.49$, $N - k = 59$, $SSW = 84.27$, $f = 4.286$ and p-value $= 0.00412$.

If $\alpha = 0.05$ then $f^{4,59}_{1-\alpha} \approx 2.528$; since $f = 4.286 > 2.528$, we reject the null hypothesis of equal mean (log) survival times.

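The F statistic, critical value and p-value of an ANOVA table can be recomputed from its sums of squares and degrees of freedom alone. The helper below, `f_test_from_table()`, is a hypothetical function written for this sketch; it is applied to the rounded values printed in the two `summary()` tables of the mosquito and cancer examples, so results agree with the tables only up to rounding.

```r
# Hypothetical helper: one-way ANOVA F test from the entries of the table.
f_test_from_table <- function(ssb, ssw, df_b, df_w, alpha = 0.05) {
  f_obs <- (ssb / df_b) / (ssw / df_w)   # ratio of the two mean squares
  c(f       = f_obs,
    f_crit  = qf(1 - alpha, df1 = df_b, df2 = df_w),
    p_value = pf(f_obs, df1 = df_b, df2 = df_w, lower.tail = FALSE))
}

# Mosquito patches: Treatment (4 df) and Residuals (145 df)
bites_test <- f_test_from_table(ssb = 184.6, ssw = 1494.1, df_b = 4, df_w = 145)

# Cancer survival: Type (4 df) and Residuals (59 df)
cancer_test <- f_test_from_table(ssb = 24.49, ssw = 84.27, df_b = 4, df_w = 59)

print(round(rbind(bites = bites_test, cancer = cancer_test), 5))
```

The critical value depends on the residual degrees of freedom, so it is not the same for the two examples even though both compare five groups.
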
Contrasts

A contrast is any linear combination of the population means

$$C = c_1 \mu_1 + c_2 \mu_2 + \ldots + c_k \mu_k$$

such that $\sum_{i=1}^{k} c_i = 0$ and $c_i$ integer.

If $\mu_1, \mu_2, \ldots, \mu_5$ are the means of $k = 5$ populations, some examples of contrasts are:

- $\mu_1 - \mu_2$
- $2\mu_1 - \mu_3 - \mu_4$
- $\mu_1 + \mu_2 + \mu_3 - \mu_4 - 2\mu_5$

Check yourself

You want to test:

$$H_0: \mu_1 - \frac{1}{4}(\mu_2 + \mu_3 + \mu_4 + \mu_5) = 0 \quad \text{versus} \quad H_1: \mu_1 - \frac{1}{4}(\mu_2 + \mu_3 + \mu_4 + \mu_5) \neq 0$$

What would be the contrast?

- $\mu_1 - \frac{1}{4}(\mu_2 + \mu_3 + \mu_4 + \mu_5)$
- $4\mu_1 - \mu_2 - \mu_3 - \mu_4 - \mu_5$
- $5\mu_1 - \mu_2 - \mu_3 - \mu_4 - \mu_5$

Test for a contrast

The hypotheses:

$$H_0: \sum_{j=1}^{k} c_j \mu_j = 0 \quad \text{versus} \quad H_1: \sum_{j=1}^{k} c_j \mu_j \neq 0$$

Note $S_w = \sqrt{SSW/(N-k)}$. Under $H_0$, the test statistic

$$T = \frac{\sum_{j=1}^{k} c_j \bar{X}_{.j}}{S_w \sqrt{\sum_{j=1}^{k} \frac{c_j^2}{n_j}}}$$

has a t-distribution with $N - k$ degrees of freedom.

The $(1 - \alpha)100\%$ confidence interval for the contrast is:

$$\left[\sum_{j=1}^{k} c_j \bar{x}_{.j} - t^{N-k}_{1-\frac{\alpha}{2}}\, s_w \sqrt{\sum_{j=1}^{k} \frac{c_j^2}{n_j}}\;;\; \sum_{j=1}^{k} c_j \bar{x}_{.j} + t^{N-k}_{1-\frac{\alpha}{2}}\, s_w \sqrt{\sum_{j=1}^{k} \frac{c_j^2}{n_j}}\right]$$

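The contrast test can be written as a short function. The sketch below is illustrative only: `contrast_test()` is a hypothetical helper, the group labels are placeholders, and the contrast $2\mu_1 - \mu_2 - \mu_3$ (control against the two treated groups) is chosen here as an example on the tumour data, not taken from the lecture.

```r
# Hypothetical helper: t test and confidence interval for a contrast.
# x: observations, g: group factor, cc: contrast coefficients (one per level).
contrast_test <- function(x, g, cc, alpha = 0.05) {
  g <- factor(g)
  stopifnot(length(cc) == nlevels(g), abs(sum(cc)) < 1e-12)
  N    <- length(x)
  k    <- nlevels(g)
  n_j  <- tapply(x, g, length)
  xbar <- tapply(x, g, mean)
  s_w  <- sqrt(sum((x - ave(x, g))^2) / (N - k))  # sqrt(SSW / (N - k))
  est  <- sum(cc * xbar)                          # estimated contrast
  se   <- s_w * sqrt(sum(cc^2 / n_j))             # its standard error
  t_obs  <- est / se
  t_crit <- qt(1 - alpha / 2, df = N - k)
  list(estimate = est,
       t        = t_obs,
       df       = N - k,
       p_value  = 2 * pt(abs(t_obs), df = N - k, lower.tail = FALSE),
       conf_int = c(est - t_crit * se, est + t_crit * se))
}

# Worked-example tumour diameters; levels sort as control, drug1, drug2.
x <- c(7, 8, 10, 11,  4, 5, 7, 8,  4, 5, 1, 2)
g <- factor(rep(c("control", "drug1", "drug2"), each = 4))

res <- contrast_test(x, g, cc = c(2, -1, -1))
print(res)
```

The coefficients must be given in the order of the factor levels, which R sorts alphabetically by default.
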
Check yourself

In the ANOVA framework with k conditions or treatments, when you reject the null hypothesis, how many comparisons would you do to compare the means?

- $k - 1$
- $k$
- $k^2$
- $\frac{k(k+1)}{2}$
- $\frac{k(k-1)}{2}$

Pairwise comparisons

There are $\frac{k(k-1)}{2}$ pairwise comparisons or tests.

Recall that when testing a single hypothesis $H_0$, a type I error is made if $H_0$ is rejected although it is actually true. The probability of making a type I error in a test is usually controlled to be smaller than a certain level $\alpha$, typically equal to 0.05.

When there are several null hypotheses, $H_{01}, H_{02}, \ldots, H_{0m}$, and all of them are tested simultaneously, one may want to control the type I error at some level $\alpha$. A type I error is then made if at least one true hypothesis in the family of hypotheses being tested is rejected. The probability of this event is called the familywise error rate (FWER).

If the tests in the family are independent and all the null hypotheses are true, then:

$$FWER = 1 - \prod_{i=1}^{m} (1 - \alpha_i)$$

where $\alpha_i$ for $i = 1, 2, \ldots, m$ are the individual significance levels. With a common level $\alpha_i = \alpha$ this is $FWER = 1 - (1 - \alpha)^m$.

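A quick calculation shows how fast the familywise error rate grows. The sketch below takes $k = 5$ groups (as in the two data examples), a common level of 0.05, and assumes independent tests, which is only an approximation for pairwise comparisons that share the same groups.

```r
k     <- 5
m     <- choose(k, 2)   # number of pairwise comparisons, k(k-1)/2
alpha <- 0.05

# FWER when each of the m independent tests is run at level alpha
fwer <- 1 - (1 - alpha)^m

# FWER when each test is run at the Bonferroni level alpha / m
alpha_bonf <- alpha / m
fwer_bonf  <- 1 - (1 - alpha_bonf)^m

print(c(m = m, fwer = fwer, alpha_bonf = alpha_bonf, fwer_bonf = fwer_bonf))
```

With ten comparisons, the uncorrected rate is roughly 40%, while testing each hypothesis at $\alpha/m$ keeps it just below 5%.
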
Pairwise comparisons

Bonferroni: to control the FWER at level $\alpha$, reject all $H_{0i}$ among $H_{01}, H_{02}, \ldots, H_{0m}$ for which the p-value is less than $\alpha/m$.

Studentized range distribution (Tukey) procedure:

- Rank the k sample means.
- Two population means $\mu_i$ and $\mu_j$ are declared significantly different if the $(1 - \alpha)100\%$ confidence interval of $\mu_i - \mu_j$,

$$\bar{x}_i - \bar{x}_j \pm q_{N-k,k,1-\alpha}\; s_w \sqrt{\frac{1}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)},$$

  does not contain 0.

$q_{N-k,k,1-\alpha}$ is the upper-tail critical value of the Studentized range for comparing k different populations.

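Both procedures are available in base R. The sketch below applies them to the tumour data of the worked example (placeholder group labels; the names `d`, `diam` and `group` are introduced here) and rebuilds the Tukey half-width by hand with `qtukey()` so that it can be compared with the output of `TukeyHSD()`.

```r
# Worked-example tumour diameters; names d, diam, group are illustrative.
d <- data.frame(
  diam  = c(7, 8, 10, 11,  4, 5, 7, 8,  4, 5, 1, 2),
  group = factor(rep(c("control", "drug1", "drug2"), each = 4))
)
fit <- aov(diam ~ group, data = d)

# Bonferroni: pairwise t tests with pooled SD, p-values multiplied by m
bonf <- pairwise.t.test(d$diam, d$group, p.adjust.method = "bonferroni")
print(bonf$p.value)

# Tukey: simultaneous 95% confidence intervals for all mean differences
tuk <- TukeyHSD(fit, conf.level = 0.95)$group
print(tuk)

# Manual half-width: q * s_w * sqrt((1/2) * (1/n_i + 1/n_j)), with n_i = n_j = 4
k    <- nlevels(d$group)
df_w <- df.residual(fit)                  # N - k
s_w  <- sqrt(deviance(fit) / df_w)        # sqrt(SSW / (N - k))
q    <- qtukey(0.95, nmeans = k, df = df_w)
half_width <- q * s_w * sqrt(0.5 * (1 / 4 + 1 / 4))
print(half_width)
```

A pair of means is declared different when its interval in `tuk` excludes 0, or equivalently when its adjusted p-value is below 0.05.
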
Comparison of 5 pretreated patches

      Tukey multiple comparisons of means
        95% family-wise confidence level
        factor levels have been ordered

    Fit: aov(formula = Measure ~ Treatment, data = bites)

    $Treatment
                                 diff        lwr      upr     p adj
    D+O-C+O                 0.9663333 -1.3232391 3.255906 0.7707275
    Odomos-C+O              2.5336667  0.2440942 4.823239 0.0220410
    Cyfluthrin-C+O          2.6656667  0.3760942 4.955239 0.0136686
    Deltamethrin-C+O        2.7660000  0.4764276 5.055572 0.0093589
    Odomos-D+O              1.5673333 -0.7222391 3.856906 0.3268078
    Cyfluthrin-D+O          1.6993333 -0.5902391 3.988906 0.2476696
    Deltamethrin-D+O        1.7996667 -0.4899058 4.089239 0.1965293
    Cyfluthrin-Odomos       0.1320000 -2.1575724 2.421572 0.9998540
    Deltamethrin-Odomos     0.2323333 -2.0572391 2.521906 0.9986342
    Deltamethrin-Cyfluthrin 0.1003333 -2.1892391 2.389906 0.9999510

Comparison of 5 pretreated patches

Cyfluthrin patches, when applied in the presence of Odomos, were found to have much more repellent action as compared to Odomos only. The difference in the repellent action was very highly significant (p < 0.01). Thus it can be inferred that a significant benefit is achieved in reducing man-mosquito contact when cyfluthrin patches are applied over the uniform by the troops in addition to using Odomos, as compared to those using Odomos only.

Survival times in terminal human cancer

      Tukey multiple comparisons of means
        95% family-wise confidence level
        factor levels have been ordered

    Fit: aov(formula = Survival ~ Type, data = cdat)

    $Type
                           diff        lwr      upr     p adj
    stomach-bronchus 0.01474955 -1.2242933 1.253792 0.9999997
    colon-bronchus   0.79595210 -0.3575340 1.949438 0.3072938
    ovary-bronchus   1.19744617 -0.3994830 2.794375 0.2296079
    breast-bronchus  1.60543320  0.3041254 2.906741 0.0083352
    colon-stomach    0.78120255 -0.4578403 2.020245 0.3981146
    ovary-stomach    1.18269662 -0.4770864 2.842480 0.2763506
    breast-stomach   1.59068365  0.2129685 2.968399 0.0158132
    ovary-colon      0.40149407 -1.1954351 1.998423 0.9540004
    breast-colon     0.80948110 -0.4918267 2.110789 0.4119156
    breast-ovary     0.40798703 -1.2987803 2.114754 0.9615409

Survival times in terminal human cancer

$\log(\mu_{breast}) - \log(\mu_{bronchus})$ is significantly different from 0. In fact, ascorbate, when used with the treatment, seems to improve survival times more in breast cancer than in bronchus cancer.

$\log(\mu_{breast}) - \log(\mu_{stomach})$ is also significantly different from 0, showing a significant improvement of survival in breast cancer compared to stomach cancer when the ascorbate supplement is used in the treatment.

See you next time