Simultaneous Confidence Intervals and Multiple Contrast Tests

Edgar Brunner
Abteilung Medizinische Statistik, Universität Göttingen

Contents
Parametric Methods
  - Motivating Example
  - SCI Method
  - Analysis of the Example
Nonparametric Methods
  - Motivating Example
  - SCI Method
  - Analysis of the Example
  - Particular Difficulties
References

I Parametric Methods

Motivating Example: O2-Consumption of Leucocytes
[Figure: O2-consumption of leucocytes [µl] (axis from 3.0 to 4.0) for PL (n1 = 8), D1 (n2 = 8) and D2 (n3 = 7); bars show min and max.]
Question: Which dose is different from control?

Motivating Example: Classical Analysis
Strategy
  (1) ANOVA for the global hypothesis $H_0: \mu_P = \mu_1 = \mu_2$
  (2) if $H_0$ is rejected: multiple comparisons (FWE_s = 0.05)
  (3) confidence intervals for $\mu_1 - \mu_P$ and $\mu_2 - \mu_P$
      - must be compatible with the decisions of the MCP
      - i.e. the confidence interval (CI) for $\mu_i - \mu_P$ does not contain 0 if and only if $H_0: \mu_i - \mu_P = 0$ is rejected, $i = 1, 2$
Statistical Methods / Procedures
  - ANOVA (F-test)
  - multiple comparisons using the closure principle (CTP)
  - Bonferroni confidence intervals ($1 - \alpha = 0.975$)
Results
  - global hypothesis: F = 2.53, p-value 0.1056 (n.s.)
  - MCP: PL - D1: p = 0.1424 (n.s.) / PL - D2: p = 0.0488 (n.s.)

Motivating Example: Shift of the D1 Data
[Figure: O2-consumption of leucocytes [µl] for PL (n1 = 8), D1 (n2 = 8) and D2 (n3 = 7) after shifting the D1 data.]
Results
  - global hypothesis: F = 4.06, p-value 0.0355 (*)
  - MCP (CTP): PL - D1: p = 0.0256 (*) / PL - D2: p = 0.0488 (*)
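The dependence of the PL - D2 decision on the D1 data can be traced with a few lines of code. The following is a minimal sketch of the closure principle for two comparisons against control, assuming the only intersection hypothesis is the global one tested by the ANOVA F-test. The function name and the dictionary layout are illustrative; the p-values are the ones quoted on the slides.

```python
def closed_test_many_to_one(p_global, p_pairwise, alpha=0.05):
    """Closure principle for the family {H1, H2} with intersection H1 and H2.

    An elementary hypothesis is rejected only if the global hypothesis
    and the elementary hypothesis itself are both rejected at level alpha.
    """
    global_rejected = p_global <= alpha
    return {name: global_rejected and p <= alpha
            for name, p in p_pairwise.items()}


# p-values quoted on the slides (original data and shifted D1 data)
original = closed_test_many_to_one(0.1056, {"PL-D1": 0.1424, "PL-D2": 0.0488})
shifted = closed_test_many_to_one(0.0355, {"PL-D1": 0.0256, "PL-D2": 0.0488})

print("original:", original)
print("shifted: ", shifted)
```

The PL - D2 p-value is 0.0488 in both runs, yet its decision changes because the global test changes with the D1 data.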

Conclusions from the Motivating Example
Confidence Intervals (Bonferroni)
  - PL - D1: [-0.024, 0.557], contains 0 / not compatible with the CTP
  - PL - D2: [-0.063, 0.538], contains 0 / not compatible with the CTP
Conclusions (undesirable properties)
  - the decision on the effect PL - D2 depends on the effect PL - D1
  - the confidence intervals are not compatible with the test decisions
  - the dependency of the statistics $\bar X_1 - \bar X_P$ and $\bar X_2 - \bar X_P$ is not used (wasting information)
  - a different method is needed

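Different Method: Idea
  - the statistical model is adapted and reduced to the particular questions of the experimenter
  - take the dependence of the statistics into account
      - statistics completely dependent: no α-adjustment necessary
      - independence is the worst case
  - example of O2-consumption:
      $C = \begin{pmatrix} -1 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix} = (-\mathbf{1}_2 \,\vdots\, I_2)$ and $\bar X = (\bar X_P, \bar X_1, \bar X_2)'$
  - desired contrasts:
      $C\bar X = \begin{pmatrix} \bar X_1 - \bar X_P \\ \bar X_2 - \bar X_P \end{pmatrix}$, $\mu_\delta = \begin{pmatrix} \mu_1 - \mu_P \\ \mu_2 - \mu_P \end{pmatrix}$
  - consider the distribution of $C\bar X \sim N(\mu_\delta, \Sigma)$ with
      $\Sigma = \sigma^2 \left[ \begin{pmatrix} 1/n_1 & 0 \\ 0 & 1/n_2 \end{pmatrix} + \frac{1}{n_P} J_2 \right] = (s_{ij})_{i,j=1,2}$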
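The covariance matrix of the two contrasts is small enough to write out by hand. The sketch below builds $\Sigma$ for the many-to-one design and derives the correlation between the two differences. It assumes that the sample sizes from the figure map to $n_P = 8$, $n_1 = 8$ (D1) and $n_2 = 7$ (D2); the function names are illustrative.

```python
from math import sqrt


def contrast_covariance(n_p, n_1, n_2, sigma2=1.0):
    """Covariance matrix of (mean_1 - mean_P, mean_2 - mean_P) for a common variance sigma2."""
    s11 = sigma2 * (1.0 / n_1 + 1.0 / n_p)  # variance of the first difference
    s22 = sigma2 * (1.0 / n_2 + 1.0 / n_p)  # variance of the second difference
    s12 = sigma2 / n_p                      # both differences share the control mean
    return [[s11, s12], [s12, s22]]


def correlation(cov):
    """Off-diagonal element of the correlation matrix belonging to cov."""
    return cov[0][1] / sqrt(cov[0][0] * cov[1][1])


cov = contrast_covariance(n_p=8, n_1=8, n_2=7)
rho = correlation(cov)
print("Sigma / sigma^2 =", cov)
print("correlation     =", round(rho, 4))
```

With equal sample sizes in all three groups the correlation would be exactly 0.5; the slightly smaller D2 group moves it a little below that.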

Different Method: Derivation of the Statistic
  - $s_{ii} = \sigma^2 (n_i + n_P)/(n_i n_P)$, $i = 1, 2$: diagonal elements of $\Sigma$
  - $\hat s_{ii}$: LS-estimator of $s_{ii}$, replacing $\sigma^2$ with the pooled estimator
      $\hat\sigma_N^2 = \frac{1}{N-3} \sum_{i=P,1,2} \sum_{k=1}^{n_i} (X_{ik} - \bar X_i)^2$, $N = n_1 + n_2 + n_P$
  - studentize each component of $C\bar X = (\bar X_1 - \bar X_P, \bar X_2 - \bar X_P)'$
  - under $H_0: \mu_\delta = 0$, consider the statistics ($i = 1, 2$)
      $T_i = \frac{\bar X_i - \bar X_P}{\hat\sigma_N} \sqrt{\frac{n_i n_P}{n_i + n_P}} \;\dot\sim\; N(0,1)$, $N \to \infty$, $N/n_i < N_0 < \infty$
  - multivariate statistic $T = (T_1, T_2)' \;\dot\sim\; N(0, R)$, $R$: correlation matrix (with $\hat s_{ii}$)
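The studentized statistics can be computed directly from the raw observations. The sketch below follows the formulas on this slide; the observations are hypothetical, since the original measurements are not listed on the slides. The pooled variance divides by N minus the number of groups, which is N - 3 for the three-group design.

```python
from math import sqrt
from statistics import mean


def pooled_variance(groups):
    """Pooled variance estimator: within-group sum of squares / (N - number of groups)."""
    n_total = sum(len(g) for g in groups)
    within_ss = 0.0
    for g in groups:
        m = mean(g)
        within_ss += sum((x - m) ** 2 for x in g)
    return within_ss / (n_total - len(groups))


def many_to_one_statistics(control, treatments):
    """T_i = (mean_i - mean_P) / sigma_hat * sqrt(n_i * n_P / (n_i + n_P))."""
    sigma_hat = sqrt(pooled_variance([control, *treatments]))
    n_p, mean_p = len(control), mean(control)
    return [
        (mean(g) - mean_p) / sigma_hat * sqrt(len(g) * n_p / (len(g) + n_p))
        for g in treatments
    ]


# Hypothetical observations, for illustration only
placebo = [3.1, 3.3, 3.4, 3.2]
dose_1 = [3.4, 3.6, 3.5, 3.7]
dose_2 = [3.6, 3.5, 3.8]
print(many_to_one_statistics(placebo, [dose_1, dose_2]))
```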

Different Method: Derivation of the (1 − α)-Quantiles
  - same quantile $z_{1-\alpha,2,R}$ for all components, such that
      $\int_{-z_{1-\alpha,2,R}}^{z_{1-\alpha,2,R}} \int_{-z_{1-\alpha,2,R}}^{z_{1-\alpha,2,R}} dN(0,R) = 1 - \alpha$
  - better approximation: multivariate t-distribution, $t_{1-\alpha,2,\nu,\hat R_N}$
      - $\hat R_N$: LS-estimator of $R$, replacing $\sigma^2$ with $\hat\sigma_N^2$
      - diagonal elements = 1
      - off-diagonal elements depend only on sample sizes and $\hat\sigma_N^2$
      - $T \;\dot\sim\;$ multivariate t-distribution
  - references
      - original paper: Bretz, Genz and Hothorn (2001)
      - multivariate integration: Genz and Bretz (2009)
      - heteroscedastic case: Hasler and Hothorn (2008)
  - in general: $C$ may be any appropriate contrast matrix

SCI-Method / Quantiles
[Figure: equi-coordinate quantiles of different bivariate normal distributions, shown as squares containing mass 1 − α. Panels: correlation = 0.99, quantile = 2.0133; correlation = 0.5, quantile = 2.2121; correlation = 0, quantile = 2.2365.]
  - computation by means of the R-package mvtnorm
  - SAS-macro: to be developed, or input of R-code in SAS/IML Studio 3.2
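For two comparisons, the equi-coordinate normal quantile can be approximated without special software. The sketch below is not the algorithm used by mvtnorm; it swaps in a plain Simpson integration over one coordinate, combined with bisection, and only covers the bivariate normal case with α = 0.05 as default. All function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def square_probability(z, rho, steps=400):
    """P(|Z1| <= z and |Z2| <= z) for a standard bivariate normal with correlation rho."""
    if abs(rho) >= 1.0:
        # complete dependence: both events coincide
        return 2.0 * _STD.cdf(z) - 1.0
    scale = sqrt(1.0 - rho * rho)
    h = 2.0 * z / steps  # Simpson's rule needs an even number of steps
    total = 0.0
    for k in range(steps + 1):
        x = -z + k * h
        # given Z1 = x, Z2 is normal with mean rho * x and variance 1 - rho^2
        inner = _STD.cdf((z - rho * x) / scale) - _STD.cdf((-z - rho * x) / scale)
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * _STD.pdf(x) * inner
    return total * h / 3.0


def equicoordinate_quantile(rho, alpha=0.05, tol=1e-8):
    """Smallest z with square_probability(z, rho) >= 1 - alpha, found by bisection."""
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if square_probability(mid, rho) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


for rho in (1.0, 0.99, 0.5, 0.0):
    print(rho, round(equicoordinate_quantile(rho), 4))
```

The printed values can be compared with the quantiles quoted in the figure panels. The case rho = 1.0 gives the unadjusted two-sided normal quantile, which illustrates why complete dependence needs no α-adjustment and independence is the worst case.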

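SCI-Method / Procedure
Multiple Comparisons
  - reject $H_0^{(i)}: \delta_i = \mu_i - \mu_P = 0$ if $|T_i| \ge z_{1-\alpha,2,R}$, or $|T_i| \ge t_{1-\alpha,2,\nu,\hat R_N}$
Global Hypothesis
  - reject $H_0: C\mu = \mu_\delta = 0$ if $\max\{|T_1|, |T_2|\} \ge z_{1-\alpha,2,R}$, or $\max\{|T_1|, |T_2|\} \ge t_{1-\alpha,2,\nu,\hat R_N}$
Simultaneous Confidence Intervals
  $P\left( \bigcap_{i \in I} \left\{ \delta_i \in \left[ \bar X_i - \bar X_P \pm z_{1-\alpha,2,\hat R_N}\, \hat\sigma_N \sqrt{\frac{n_i + n_P}{n_i n_P}} \right] \right\} \right) \doteq 1 - \alpha$
Error Control?
  - FWE_s (by Gabriel's Theorem, 1969)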
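Once the common quantile is known, the simultaneous intervals follow from summary statistics alone. The sketch below implements the interval formula from this slide; the means, the pooled standard deviation and the quantile passed in are hypothetical placeholders, not values from the leucocyte data.

```python
from math import sqrt


def simultaneous_intervals(mean_control, n_control, treatments, sigma_hat, quantile):
    """Intervals mean_i - mean_P +/- quantile * sigma_hat * sqrt((n_i + n_P) / (n_i * n_P)).

    treatments maps a label to a (mean, sample size) pair.
    """
    intervals = {}
    for name, (mean_i, n_i) in treatments.items():
        half_width = quantile * sigma_hat * sqrt((n_i + n_control) / (n_i * n_control))
        diff = mean_i - mean_control
        intervals[name] = (diff - half_width, diff + half_width)
    return intervals


def excludes_zero(interval):
    """True if the interval does not contain 0, i.e. the matching hypothesis is rejected."""
    lower, upper = interval
    return lower > 0.0 or upper < 0.0


# Hypothetical summary statistics and quantile, for illustration only
result = simultaneous_intervals(
    mean_control=3.30, n_control=8,
    treatments={"D1": (3.55, 8), "D2": (3.54, 7)},
    sigma_hat=0.21, quantile=2.4,
)
for name, interval in result.items():
    print(name, interval, "reject" if excludes_zero(interval) else "n.s.")
```

Because test and interval use the same quantile, an interval excludes 0 exactly when the corresponding statistic exceeds that quantile in absolute value, which is the compatibility property the method is built for.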

Example: Analysis by SCI-Method
Original Data Set (O2-Consumption of Leucocytes)
[Figure: O2-consumption of leucocytes [µl] for PL (n1 = 8), D1 (n2 = 8) and D2 (n3 = 7).]

Comparison   t      p-value   SCI    Classical
PL - D1      2.10   0.0965    n.s.   n.s.
PL - D2      2.18   0.0864    n.s.   n.s.

Example: Analysis by SCI-Method
Shift of the D1 Data
[Figure: O2-consumption of leucocytes [µl] for PL (n1 = 8), D1 (n2 = 8) and D2 (n3 = 7) after shifting the D1 data.]

Comparison   t      p-value   SCI    Classical
PL - D1      2.53   0.0460    (*)    (*)
PL - D2      2.18   0.0864    n.s.   (*)

Conclusions from the Analysis
Confidence Intervals (D1 Shifted)
  - PL - D1: [0.0049, 0.5276], does not contain 0 / compatible
  - PL - D2: [-0.0324, 0.5074], contains 0 / compatible
Conclusions
  - the decision on the effect PL - D2 does not depend on the effect PL - D1
  - the confidence intervals are compatible with the test decisions
  - the dependency of the statistics $\bar X_1 - \bar X_P$ and $\bar X_2 - \bar X_P$ is used

Extensions / Generalizations
Factorial Designs
  - Biesheuvel and Hothorn (2002) / stratified samples
  - general case under research: diploma thesis
Large Number of Dimensions
  - $\Sigma_N$ may become singular (breakdown?)
Repeated Measures
  - n vs. d, in particular n < d (breakdown?)
  - high-dimensional data / Froemke, Hothorn and Kropf (2008)
  - is there a limit distribution?
Binomial Data
  - Schaarschmidt, Sill and Hothorn (2008)
Nonparametric Effects
  - non-normal data (Konietschke, 2009)
  - ordinal data: ordinal effect size measure (Ryu and Agresti, 2008)

II Nonparametric Methods

Motivating Example: Toxicity Trial (60 Wistar Rats)
  - damage by an inhalable substance on the mucosa of the nose
  - 3 concentrations (2 [ppm], 5 [ppm], 10 [ppm])
  - score (0 = no damage, ..., 3 = severe damage): ordinal data

Concentration   Number of Rats with Score
                0    1    2    3
2 [ppm]         18   2    0    0
5 [ppm]         12   6    2    0
10 [ppm]        3    7    6    4

Motivating Example: Classical Analysis Strategy
  - statistical model: $X_{ik} \sim F_i(x)$, $i = 1, 2, 3$; $k = 1, \ldots, 20$
  - hypotheses
      $H_0^{(1)}: F_1 = F_2 = F_3$
      $H_0^{(2)}: F_1 = F_2$, relative effect: $p_{12} = \int F_1\, dF_2$
      $H_0^{(3)}: F_1 = F_3$, relative effect: $p_{13} = \int F_1\, dF_3$
      $H_0^{(4)}: F_2 = F_3$, relative effect: $p_{23} = \int F_2\, dF_3$
  - relative effect $p_{ij}$, interpretation
      $p_{ij} = \int F_i\, dF_j = P(X_{i1} < X_{j1}) + \frac{1}{2} P(X_{i1} = X_{j1})$
      - probability that the observations in group i tend to smaller values than those in group j
      - ordinal data: effect size measure (Ryu and Agresti, 2008)
  - needed: confidence intervals for $p_{ij} = \int F_i\, dF_j$, $i \ne j = 1, 2, 3$
  - error control: FWE_s
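The relative effect can be estimated by plugging the observed score frequencies into its definition. The sketch below counts, over all pairs of rats from two groups, how often the first score is smaller and how often the two are tied. The counts are taken from the toxicity table; the function name is illustrative.

```python
def relative_effect(counts_i, counts_j):
    """Plug-in estimate of P(X_i < X_j) + 1/2 * P(X_i = X_j) from score frequencies.

    counts_i[s] is the number of observations in group i with score s.
    """
    n_i, n_j = sum(counts_i), sum(counts_j)
    less = 0
    ties = 0
    for score_a, c_a in enumerate(counts_i):
        for score_b, c_b in enumerate(counts_j):
            if score_a < score_b:
                less += c_a * c_b
            elif score_a == score_b:
                ties += c_a * c_b
    return (less + 0.5 * ties) / (n_i * n_j)


# Score frequencies (scores 0, 1, 2, 3) from the toxicity trial
ppm_2 = [18, 2, 0, 0]
ppm_5 = [12, 6, 2, 0]
ppm_10 = [3, 7, 6, 4]

print("p12 =", relative_effect(ppm_2, ppm_5))
print("p13 =", relative_effect(ppm_2, ppm_10))
print("p23 =", relative_effect(ppm_5, ppm_10))
```

A value of 1/2 means no tendency in either direction; comparing a group with itself always gives exactly 1/2.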

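SCI-Method
Hypotheses of Interest
  $H_0^{(1)}: p_{12} = \frac{1}{2}$, $H_0^{(2)}: p_{13} = \frac{1}{2}$
Estimators of the Relative Effects $p_{ij}$
  - $\hat p_{ij} = \int \hat F_i\, d\hat F_j = \frac{1}{n_i} \left( \bar R_{j\cdot}^{(ij)} - \frac{n_j + 1}{2} \right)$, $\hat p = (\hat p_{12}, \hat p_{13})'$
    ($\bar R_{j\cdot}^{(ij)}$: mean of the ranks of group j in the pooled sample of groups i and j)
  - asymptotic distribution of $\sqrt{N}(\hat p - p) \;\dot\sim\; N(0, V_N)$
      - depends on unknown parameters (elements of $V_N$)
      - no pivotal quantity
Statistics
  - studentize each component $(i, j)$ of $\hat p$ by $\hat v^{(ij)}$
  - $\hat v^{(ij)}$: estimated variance of $\hat p_{ij}$ (diagonal elements of $V_N$)
  - estimation by means of the ranks $R_{ik}^{(ij)}$, $R_{jk}^{(ij)}$, $R_{ik}^{(i)}$ and $R_{jk}^{(j)}$
  - Reference: Brunner, Munzel and Puri (2002)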
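The rank form of the estimator can be checked on the toxicity data. The sketch below assigns midranks in the pooled sample of two groups and applies the estimator formula. The helper names `midranks` and `expand` are illustrative, and only the point estimate is covered, not the variance estimator.

```python
def midranks(values):
    """Ranks 1..n of the values, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = average_rank
        start = end + 1
    return ranks


def relative_effect_from_ranks(sample_i, sample_j):
    """(mean pooled rank of group j - (n_j + 1) / 2) / n_i."""
    n_i, n_j = len(sample_i), len(sample_j)
    ranks = midranks(list(sample_i) + list(sample_j))
    mean_rank_j = sum(ranks[n_i:]) / n_j
    return (mean_rank_j - (n_j + 1) / 2) / n_i


def expand(counts):
    """Turn score frequencies into a list of individual scores."""
    return [score for score, count in enumerate(counts) for _ in range(count)]


ppm_2 = expand([18, 2, 0, 0])
ppm_5 = expand([12, 6, 2, 0])
print(relative_effect_from_ranks(ppm_2, ppm_5))
```

Using midranks for ties is what makes the rank form agree with the definition that counts ties with weight 1/2.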

SCI-Method: Asymptotic Distribution of the Statistics
  - asymptotic distribution under $H_0: p_{ij} = \frac{1}{2}$ of
      $T_{ij} = \sqrt{N} \left( \hat p_{ij} - \frac{1}{2} \right) / \sqrt{\hat v^{(ij)}} \;\dot\sim\; N(0,1)$
  - $T = (T_{12}, T_{13})' \;\dot\sim\; N(0, R)$, $R$: correlation matrix
  - use the same procedure as in the parametric case
  - error control: FWE_s
  - problem: confidence intervals may exceed the [0,1]-interval

SCI-Method / Properties
Problem
  - intervals are not range preserving
  [Figure: lower and upper bound of a 95% confidence interval (n = 10).]
Solution
  - multivariate δ-method

Range Preserving Intervals
Procedure
  - continuous transformation $G(\hat p_{ij}) \in (-\infty, \infty)$
      - $G = (G_1, \ldots, G_q): (0,1)^q \to \mathbb{R}^q$
      - strictly monotone, i.e. $G_l'(p_{ij}) \ne 0$
      - differentiable, bijective, $G_l(\frac{1}{2}) = 0$, $l = 1, \ldots, q$
      - in the example: $q = 2$
  - asymptotic distribution of $G$: Cramér's δ-theorem
      - transformed estimators are also multivariate normal
      - elements $\tilde v_{ij}$ of the covariance matrix of $G$
      - multivariate δ-theorem: $\tilde v_{ij} = [G'(p_{ij})]^2 v_{ij}$
  - back transformation of the limits into [0,1]: range preserving
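The effect of the transformation is easiest to see for a single relative effect. The sketch below uses the probit transformation $G = \Phi^{-1}$, for which $G(1/2) = 0$ and $G'(p) = 1/\varphi(\Phi^{-1}(p))$, applies the δ-method to the standard error, and maps the limits back. The estimate, its standard error and the quantile are hypothetical placeholders, and the function names are illustrative.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def plain_interval(p_hat, se, quantile):
    """Untransformed interval; it may leave the range [0, 1]."""
    return p_hat - quantile * se, p_hat + quantile * se


def probit_interval(p_hat, se, quantile):
    """Interval built on the probit scale and transformed back to (0, 1).

    Requires 0 < p_hat < 1, otherwise the probit transformation is undefined.
    """
    g = _STD.inv_cdf(p_hat)       # G(p_hat)
    g_prime = 1.0 / _STD.pdf(g)   # derivative of the probit at p_hat
    half_width = quantile * g_prime * se  # delta method: se on the probit scale
    return _STD.cdf(g - half_width), _STD.cdf(g + half_width)


# Hypothetical estimate, standard error and quantile, for illustration only
p_hat, se, quantile = 0.90, 0.06, 2.2
print("plain: ", plain_interval(p_hat, se, quantile))
print("probit:", probit_interval(p_hat, se, quantile))
```

The back-transformed limits are no longer symmetric around the estimate, which is the price for staying inside the admissible range.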

Example: Analysis by SCI-Method
Toxicity Trial (60 Wistar Rats)

Concentration   Number of Rats with Score
                0    1    2    3
2 [ppm]         18   2    0    0
5 [ppm]         12   6    2    0
10 [ppm]        3    7    6    4

Results (Probit)

Comparison   Effect   Interval                 p-value
2 vs. 5      0.66     0.5 ∉ [0.501; 0.787]     0.049
2 vs. 10     0.90     0.5 ∉ [0.753; 0.970]     < 0.0001

Nonparametric Methods / Difficulties
Non-Transitivity
  - pairwise relative effects are not transitive, e.g. $p_1 < p_2 < p_3 < p_1$
  - counter-example: Efron's paradox dice (Rump, 2001)
  - Brown and Hettmansperger (2002): one-way layout
  - Thangavelu and Brunner (2007): stratified Wilcoxon tests
New Definition of Relative Effects for a > 2
  - e.g. $p_i = \int H\, dF_i$, $H$ = mean of the $F_i$
  - all distributions are compared to $H$, or
  - all distributions are compared to the same reference
  - to be worked out
      - the covariance matrix of $\sqrt{N}(\hat p_1, \ldots, \hat p_d)$ is quite involved
      - first results: Konietschke (2009)
Factorial Designs
  - consider each factor separately?
  - or combine all comparisons in one vector?
  - to be worked out
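The non-transitivity can be reproduced exactly with the classic set of four Efron dice. The sketch below enumerates all face pairs of two fair dice and computes the pairwise relative effect as an exact fraction. The face values are the commonly cited Efron set; the labels A to D and the function name are illustrative.

```python
from fractions import Fraction
from itertools import product


def relative_effect(die_i, die_j):
    """Exact P(X_i < X_j) + 1/2 * P(X_i = X_j) for two fair dice given by their faces."""
    score = Fraction(0)
    for a, b in product(die_i, die_j):
        if a < b:
            score += 1
        elif a == b:
            score += Fraction(1, 2)
    return score / (len(die_i) * len(die_j))


# Commonly cited faces of the four Efron dice
dice = {
    "A": [0, 0, 4, 4, 4, 4],
    "B": [3, 3, 3, 3, 3, 3],
    "C": [2, 2, 2, 2, 6, 6],
    "D": [1, 1, 1, 5, 5, 5],
}

# Each pair (i, j) asks how strongly die j tends to larger values than die i
cycle = [("B", "A"), ("C", "B"), ("D", "C"), ("A", "D")]
for i, j in cycle:
    print(f"p({i},{j}) =", relative_effect(dice[i], dice[j]))
```

Every effect along the cycle is larger than 1/2, so A tends to larger values than B, B than C, C than D, and D again than A. No ordering of the four distributions is consistent with all pairwise effects.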

Discussion and Outlook
SCI-Method
  - unifies the 3 steps of the classical analysis strategy in one procedure:
      - ANOVA
      - multiple comparisons (controlling FWE_s)
      - confidence intervals for the effects, compatible with the multiple comparisons
  - further research
      - detailed results regarding power
      - extension to factorial designs
      - extension to repeated measures designs
      - for parametric as well as nonparametric models
Software
  - so far only for independent samples (one-factorial design)
  - parametric models: R-package SimComp on CRAN
  - nonparametric models: R-package nparcomp on CRAN

Cooperation / Credits
  - Ludwig Hothorn and assistants (Biostatistik, LU Hannover)
  - Frank Konietschke (Medizinische Statistik, University of Göttingen)

References

BIESHEUVEL, E. and HOTHORN, L.A. (2002). Many-to-one comparisons in stratified designs. Biometrical Journal 44, 101-116.

BRETZ, F., GENZ, A. and HOTHORN, L.A. (2001). On the numerical availability of multiple comparison procedures. Biometrical Journal 43, 645-656.

BROWN, B.M. and HETTMANSPERGER, T.P. (2002). Kruskal-Wallis, Multiple Comparisons and Efron Dice. Australian and New Zealand Journal of Statistics 44, 427-438.

BRUNNER, E., MUNZEL, U. and PURI, M. (2002). The multivariate nonparametric Behrens-Fisher problem. Journal of Statistical Planning and Inference 108, 37-53.

FROEMKE, C., HOTHORN, L.A. and KROPF, S. (2008). Nonparametric relevance-shifted multiple testing procedures for the analysis of high-dimensional multivariate data with small sample sizes. BMC Bioinformatics 9:54, doi: 10.1186/1471-2105-9-54.

GABRIEL, K.R. (1969). Simultaneous Test Procedures - Some Theory of Multiple Comparisons. The Annals of Mathematical Statistics 40, 224-250.

GENZ, A. and BRETZ, F. (2009). Computation of Multivariate Normal and t Probabilities. Lecture Notes in Statistics 195. Springer, Heidelberg, New York.

HASLER, M. and HOTHORN, L.A. (2008). Multiple Contrast Tests in the Presence of Heteroscedasticity. Biometrical Journal 50, 793-800.

KONIETSCHKE, F. (2009). Simultane Konfidenzintervalle für nichtparametrische relative Kontrasteffekte. Dissertation, Georg-August-Universität Göttingen.

RUMP, C.M. (2001). Strategies for Rolling the Efron Dice. Mathematics Magazine 74, 212-216.

RYU, E. and AGRESTI, A. (2008). Modeling and inference for an ordinal effect size measure. Statistics in Medicine 27, 1703-1717.

SCHAARSCHMIDT, F., SILL, M. and HOTHORN, L.A. (2008). Approximate Simultaneous Confidence Intervals for Multiple Contrasts of Binomial Proportions. Biometrical Journal 50, 782-792.

THANGAVELU, K. and BRUNNER, E. (2007). Wilcoxon Mann-Whitney Test for Stratified Samples and Efron's Paradox Dice. Journal of Statistical Planning and Inference 137, 720-737.