Statistical analysis of short-term studies in regulatory toxicology using R

1 Statistical analysis of short-term studies in regulatory toxicology using R
Ludwig A. Hothorn, hothorn@biostat.uni-hannover.de
Institute of Biostatistics, Leibniz University Hannover, Germany
May 17

2 Aims for the next two hours I
- Evaluation strategies for toxicological studies
- Using confidence intervals
- Using two-sample confidence intervals for a proof of hazard without FWER control
- Using Dunnett/Williams procedures with FWER control according to NTP recommendations: parametric (incl. variance heterogeneity), non-parametric, proportions
- Using proof of hazard or proof of safety
- Using R (based on real data examples). Sorry: not interactive with R, but frequent R users can run the programs in parallel to the slides

3 Motivating examples I
- What does "short-term study" mean? Repeated administration of a drug or compound to rats, mice or dogs (dogs are not really covered today because their sample sizes are too small), e.g. 4-week, 13-week or 6-month studies, e.g. according to OECD guideline 408: repeated dose 90-day oral toxicity (in contrast to long-term carcinogenicity studies, where time-to-death/tumor relationships are of interest)

4 Motivating examples II
- Example I: Continuous endpoints. The data of a 13-week feeding study of sodium dichromate dihydrate in F344 rats were downloaded from the NTP. For each sex, 10 rats were randomized to each of control, 62.5, 125, 250, 500 and 1000 mg/kg. Several hematological and clinical chemistry endpoints were measured after 5, 23 and 93 days of administration. We use the clinical chemistry endpoints after 93 days as an example. Organ weight data and weekly measured body weight data are available as well.
- These data are fairly representative of short-term studies: both sexes, 10 animals/group, many endpoints. However, three rather than five dose groups are common, usually with only one final measurement, although sometimes a baseline measurement is available.

5 Motivating examples III
- Example II: Histopathological findings. Incidence data: incidences of tubular epithelial hyaline droplet degeneration in male rats of 0/10, 0/10, 3/10, 8/10 were reported for a 28-day oral dose toxicity study of nonylphenol [WSI+07].
- Example III: Graded findings. Non-neoplastic lesions in the p-cresidine carcinogenicity study with 30 male mice per group: 1) hyperplasia in the parotid gland (salivary glands) and 2) kidney hydronephrosis (where the single finding graded "minimal" was categorized as no finding, like the unlisted animals). The second example shows no finding at all in the control.
- Data files are available in .csv format

6 Motivating examples IV
- Data characteristics in toxicology:
  - small sample sizes, particularly in in-vivo studies, i.e. 5 to 12 animals/group
  - the randomized unit (the animal) is the relevant sample size unit, not the pup in reprotox studies
  - comparisons versus control of treatment groups or, more often, dose groups, i.e. dose-response analysis
  - multiple endpoints, e.g. chronic toxicity studies with more than 100 endpoints
- Specific: endpoints are approximately normally distributed, continuous but not normally distributed, proportions, or counts
- Specific: variance heterogeneity occurs commonly in continuous data

7 Required R packages I
- multcomp: parametric simultaneous inference using multiple contrasts [BHW02]
- pairwiseCI: two-sample confidence intervals
- MCPAN: simultaneous confidence intervals for proportions (and poly3 estimates)
- nparcomp: non-parametric simultaneous inference using multiple contrasts
- mratios: ratio-to-control inference [DSH07]
- ETC: Bofinger approach for equivalence of several treatments with respect to control, and the related Bonferroni-TOST
Please download these packages from CRAN to your R installation!

8 Guidelines I
- Regulatory toxicology follows guidelines. However, the statistical design and analysis are described rather noncommittally, e.g. in OECD guideline 408: "when applicable, numerical results should be evaluated by an appropriate and generally acceptable statistical method".
- U.S. National Toxicology Program: Body and organ weights are assumed to be normally distributed, and hence the Dunnett/Williams approach [Dun55], [Wil71], which controls the FWER (familywise error rate), is recommended. All other continuous endpoints should be analyzed non-parametrically by the Dunn/Shirley procedure [DUN64], [SHI77]; for proportions the arcsine transformation to normality is used; severity data (ordered categorical data) should be analyzed by the WMW test. (More details in the paper version.)

9 Guidelines II
- The 2001 FDA Guidance on Statistical Aspects of the Design, Analysis, and Interpretation of Chronic Rodent Carcinogenicity Studies of Pharmaceuticals describes in detail only the evaluation of neoplastic lesions (not covered today).
- However, in the separate recommendation for the evaluation of immunotoxicological studies, the Bartlett test on homogeneity of variances in a one-way k-sample design is used as a pre-test. When it is significant, i.e. the variances are heterogeneous, non-parametric approaches are recommended, otherwise parametric ones.
- Some textbooks on statistics in toxicology exist, e.g. [Gad05], [Mor96], [PB97], [Kir00], [Cho98], [KP03], but none provides data examples and their appropriate evaluation with related software.
- Conclusion: Dunnett/Williams and Dunn/Shirley procedures using R, plus alternatives, in this ASA webinar

10 Presentation style of significances I
For tests, the following presentation styles of significances are common:
1) yes/no decision $H_0$ vs. $H_1$
2) yes/no decision $H_0$ vs. $H_1$, but for three α levels (0.05, 0.01, ...), marked with symbols
3) p-values, e.g. [MZL+08]

11 Presentation style of significances II
4) confidence intervals

12 Presentation style of significances III
- The p-value is motivated by Popper's falsification principle: we can never prove an effect directly, only by the small probability of its opposite. I.e. the p-value is the smallest possible false positive ($f^+$) rate (Explain it!)
- Advantage: simple, reduces any complex question to one value, commonly used
- Disadvantages: i) it is only a probability between 0 and 1, but we need a measure of the effect; ii) it depends on the sample size n, i.e. in un-designed experiments any small p-value can be achieved by increasing the sample size, independent of the effect size $\mu_T - \mu_C$; iii) it (commonly) refers to a point-zero null hypothesis $H_0: \mu_T - \mu_C = 0$, but in biomedicine we are never interested in tiny differences

13 Presentation style of significances IV
- A better alternative is the use of effect sizes and their confidence intervals
- For continuous data: $\mu_T - \mu_C$ or $\mu_T/\mu_C$
- Confidence intervals (CI) for these measures are obtained by re-formulating the t-test
$$\frac{\bar{x}_T - \bar{x}_C}{SD\sqrt{2/n}} = t_{df,\,1-p=\min(\alpha)}$$
into
$$\left[(\bar{x}_T - \bar{x}_C) - SD\sqrt{2/n}\;t_{df,1-\alpha},\;\; (\bar{x}_T - \bar{x}_C) + SD\sqrt{2/n}\;t_{df,1-\alpha}\right]$$
- ICH E9 Guidance for RCTs: "Estimates of treatment effects should be accompanied by confidence intervals, whenever possible..."
- Sometimes interpretation is easier as a percentage change, e.g. the k-fold rule in mutagenicity assays, and a confidence interval for $\mu_T/\mu_C$ is recommended (switch from an additive to a multiplicative model). This is a bit more complicated (no formula here) and follows Fieller [Fie54]
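The re-formulation of the t-test into a confidence interval can be checked numerically. The following is a minimal sketch in base R with made-up toy data (the vectors `x_c` and `x_t` are hypothetical, not taken from the NTP study). It assumes a balanced design with a pooled SD and a two-sided 95% interval, which uses the $1-\alpha/2$ quantile.

```r
# Hypothetical toy data: 10 control and 10 treated animals
x_c <- c(48, 52, 50, 47, 53, 51, 49, 50, 52, 48)
x_t <- c(55, 58, 54, 60, 57, 56, 59, 55, 58, 61)

n     <- length(x_c)          # balanced design: n animals per group
alpha <- 0.05
df    <- 2 * n - 2

# Pooled SD (for equal n this is the root of the mean of the two variances)
sd_pooled <- sqrt((var(x_c) + var(x_t)) / 2)

# Two-sided (1 - alpha) interval: estimate +/- SD * sqrt(2/n) * t quantile
est        <- mean(x_t) - mean(x_c)
half_width <- sd_pooled * sqrt(2 / n) * qt(1 - alpha / 2, df)
ci_manual  <- c(est - half_width, est + half_width)

# The same interval from the built-in t-test (variance homogeneity assumed)
ci_ttest <- as.numeric(
  t.test(x_t, x_c, var.equal = TRUE, conf.level = 1 - alpha)$conf.int
)

print(rbind(manual = ci_manual, t.test = ci_ttest))
```

Both rows of the printed table should agree, which shows that the interval is just the inverted t-test.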

14 Presentation style of significances V
- Properties of confidence intervals for the ratio to control: i) asymmetric, because they are multiplicative [0.5, 1.5]; ii) problems with control values near zero; iii) useful for a direct comparison of multiple endpoints; iv) useful for endpoints with different scales (continuous, proportions, counts); v) one confidence interval approach for both superiority and non-inferiority; vi) others
- Notice: one-sided intervals are available as well

15 Presentation style of significances VI
- Problems: i) the width of the confidence interval, i.e. $SD\sqrt{2/n}\;t_{df,1-\alpha}$, is a function of the sample size: the larger the sample size, the narrower (more significant) the interval and, analogously, the smaller the p-value, regardless of effect size and variance. The sample size must therefore be defined a priori, by guideline or by a power approach (see later); ii) variance heterogeneity; iii) violation of the normal distribution assumption
- The common misunderstanding between statistical significance and biological relevance results from the inappropriate use of p-values, testing of point-zero $H_0$, and un-designed experiments. Therefore, toxicological studies should be characterized by effect measures and their confidence intervals.
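The dependence of the interval width on the sample size can be made concrete with a few lines of base R. This is only an illustrative sketch: the function name `half_width`, the SD of 1 and the grid of sample sizes are assumptions chosen for the illustration.

```r
# Half-width SD * sqrt(2/n) * t_{df, 1 - alpha} for a balanced two-sample design
half_width <- function(n, sd, alpha = 0.05) {
  sd * sqrt(2 / n) * qt(1 - alpha, df = 2 * n - 2)
}

n_grid <- c(5, 10, 20, 50)                 # animals per group (hypothetical)
hw     <- sapply(n_grid, half_width, sd = 1)

# The half-width shrinks monotonically with n, for a fixed SD
print(data.frame(n = n_grid, half_width = round(hw, 3)))
```

With the SD held fixed, any given difference eventually falls outside the interval once n is large enough, which is why the sample size has to be fixed in advance.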

16 Effect sizes and their confidence intervals I
- Recent critique for studies with individuals, i.e. volunteers in RCTs [Bro10]: "I once asked a well-published medical researcher what p < 0.05 meant to him. He said: It means that everyone on X did better than everyone on Y."
- "What is needed is ... translating a significant t-test p value and sample size into a form that can more clearly express the magnitude of the effect." "While traditional significance testing usually deals with differences between population means, there is an increasing focus in fields such as medicine on the probability of one treatment being more successful than another on a per-individual basis."

17 Effect sizes and their confidence intervals II
- "We need ... a measure of how often a random subject receiving treatment X will outperform a random subject receiving treatment Y, typically expressed as P(X > Y)."
- Relative effect size: a measure for stochastic order, $p_{01} = P(X_{01} < X_{11})$ (for continuous data), i.e. the probability that a randomly selected patient in the control group shows a smaller response value than a randomly selected patient in the treatment group. This measure $p_{01}$ was termed the relative effect [BM00].

18 Effect sizes and their confidence intervals III
- Important generalization for non-continuous data (ties, score data, ...): $p_{01}$ as an ordinal effect size measure [RA08], defined as
$$p_{01} = \int F_0\,dF_1 = P(X_{01} < X_{11}) + 0.5\,P(X_{01} = X_{11}). \quad (1)$$
- This is an effect size in the sense of Browne [Bro10], and nowadays both confidence intervals and software (R package nparcomp) exist
- Interpretation: compared to $X_{0j}$, $X_{1j}$ tends stochastically to larger values if $p_{01} > 0.5$, to smaller values if $p_{01} < 0.5$, and there is no decision against $H_0$ if $p_{01} = 0.5$
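A plug-in estimate of the relative effect in equation (1) needs only base R. The sketch below is an illustration, not the estimator code of nparcomp; the helper name `rel_effect` and the toy vectors are hypothetical.

```r
# Plug-in estimate of p01 = P(X0 < X1) + 0.5 * P(X0 = X1):
# compare every control value with every treatment value
rel_effect <- function(x0, x1) {
  mean(outer(x0, x1, "<")) + 0.5 * mean(outer(x0, x1, "=="))
}

x0 <- c(1, 2, 3)   # hypothetical control scores
x1 <- c(2, 3, 4)   # hypothetical treatment scores (ties with control)

p01 <- rel_effect(x0, x1)
print(p01)
```

Of the 9 pairs, 6 have a smaller control value and 2 are tied, so the estimate is (6 + 0.5 * 2) / 9 = 7/9, i.e. a tendency to larger values under treatment.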

19 Proof of hazard vs. proof of safety I
- Common decision in toxicology: harmless if the p-value of an appropriate test for $D_j$ vs. C is non-significant (p > 0.05), otherwise harmful (based on the sample sizes in the guidelines, for the design $C, D_1, D_2, D_3$, independently for both sexes, each endpoint and each time point), with a subsequent discussion of biological relevance if p < 0.05
- Issue 1: Point-zero null hypotheses $H_0: \mu_C - \mu_D = \delta = 0$ are not appropriate. Better: a-priori definition of relevance thresholds in toxicology. But a consensus, particularly for the many endpoints, seems hopeless. Therefore: estimation of confidence intervals and their post-hoc interpretation in terms of tolerable thresholds

20 Proof of hazard vs. proof of safety II
- Issue 2: Kirkland (1999): "be confident in negative results" [LYH+00]. But the commonly used hypotheses are
$H_0: \mu_C = \mu_D$ (harmless) vs. $H_1: \mu_C < \mu_D$ (harmful)
- Remember the falsification principle. Crux: Neyman-Pearson tests are asymmetric, i.e. only one error rate can be controlled directly, namely the rate of falsely rejecting $H_0$ (false positive rate $f^+$, type I error rate, α). I.e. the commonly used t-test at level α = 0.05 allows a 5% $f^+$ rate in rejecting $H_0$.
- Alternative: proof of safety, i.e. we formulate the hypotheses so that we control the more important error in toxicology, the false negative rate $f^-$ (Explain it! $\pi = 1 - f^-$):
$H_0: \mu_C - \mu_D \ge \delta$ (harmful, toxic) vs. $H_1: \mu_C - \mu_D < \delta$ (harmless, non-toxic)
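A proof of safety can be run as a shifted one-sided t-test or, equivalently, by comparing a one-sided confidence limit with the threshold. The sketch below uses base R and makes several assumptions that are not from the slides: the toy data, the threshold `delta = 5`, and the sign convention that harm shows as an increase, so that "safe" means $\mu_D - \mu_C < \delta$ for a tolerable $\delta > 0$.

```r
# Hypothetical toy data: control and one dose group
x_c <- c(48, 52, 50, 47, 53, 51, 49, 50, 52, 48)
x_d <- c(50, 53, 49, 52, 51, 50, 54, 49, 52, 51)

delta <- 5      # assumed tolerable increase (must be fixed a priori)
alpha <- 0.05

# Upper one-sided (1 - alpha) confidence limit for mu_D - mu_C
upper <- t.test(x_d, x_c, alternative = "less", var.equal = TRUE,
                conf.level = 1 - alpha)$conf.int[2]

# Safety is claimed if the upper limit stays below the threshold
safe <- upper < delta

# Equivalent shifted test of H0: mu_D - mu_C >= delta vs. H1: mu_D - mu_C < delta
p_shift <- t.test(x_d, x_c, mu = delta, alternative = "less",
                  var.equal = TRUE)$p.value

print(c(upper_limit = upper, p_value = p_shift))
```

The two decisions coincide by construction: the shifted test rejects at level α exactly when the upper limit lies below `delta`.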

21 Proof of hazard vs. proof of safety III
- Issue 3: In chron-tox studies the variances are endpoint-specific (and in long-term carcinogenicity studies the spontaneous tumor rates are tumor-specific). Therefore the $f^-$ rate in the commonly used proof of hazard is endpoint-specific: crazy!
- Issue 4: Although multiple endpoints occur in a chron-tox study, we will not perform a multiplicity adjustment. This differs from multiple efficacy endpoints in an RCT, where efficacy is claimed for $Y_1$ OR $Y_2$ OR ...
- Issue 5: Although multiple comparisons occur in the design $C, D_1, \dots, D_k$, we will not adjust for multiplicity in toxicology, because we want to keep the more important $f^-$ error rate low rather than strictly control the less important $f^+$ error rate

22 Proof of hazard vs. proof of safety IV
- Curious: in toxicology with the design $[C, D_1, \dots, D_k]$ the most widely used approach is the Dunnett test (it is the 14th most cited statistical paper [RW05], with a majority of citations in toxicology). An example, Van Vleet et al. [VVWS+07]: "Statistical Analyses: Dunnett's test was used to confirm/rule out apparent dose-related trends"
- Why is multiplicity adjustment according to Dunnett less appropriate in the proof of hazard? i) A dose-related trend test is better, e.g. the Williams procedure [Wil71]; ii) Problems with downturn effects at higher doses? A protected Williams approach is available [BH03]; iii) Why multiplicity adjustment in toxicology at all? For an efficacy endpoint in an RCT we must pay a price because of the claim for $D_1$ OR $D_2$ OR $D_3$
- What happens in toxicology when using multiplicity adjustment?

23 Proof of hazard vs. proof of safety V
i) The power π is reduced; this is particularly critical because most sample sizes are (too) small
ii) There is no claim of a toxic effect at $D_1$ OR $D_2$ OR ...; any toxic dose is an outcome
- Either proof of safety, i.e. $D_1$ OR $D_2$ OR $D_3$ are equivalent (or non-inferior) to control, OR proof of hazard, but without multiplicity adjustment. I.e. we tolerate an increasing $f^+$ rate rather than an increasing $f^-$ error rate. More precisely: control of the comparisonwise error rate may be sufficient in toxicology

24 Proof of hazard vs. proof of safety VI
- But the NTP recommendation is Dunnett/Dunn and Williams/Shirley, which do control the FWER. Therefore, all three approaches are covered today
- Notice: Why are simple confidence intervals from two-sample tests sometimes not the best? Answer: their small degrees of freedom, $\nu_{0i} = n_0 + n_i - 2$, are fewer than the common df of a one-way layout (assuming variance homogeneity), $\nu = n_0 + n_1 + \dots + n_k - k - 1$, which is particularly important in toxicology because of the small number of animals.
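The gain in degrees of freedom is easy to quantify in base R. The sketch below assumes the layout of Example I (a control and five dose groups with 10 animals each) and a two-sided 95% quantile; variable names are chosen for the illustration only.

```r
# Group sizes: control + 5 dose groups, 10 animals each
n <- rep(10, 6)
k <- length(n) - 1                 # number of dose groups

df_two_sample <- n[1] + n[2] - 2   # nu_0i = n_0 + n_i - 2
df_one_way    <- sum(n) - k - 1    # nu = n_0 + ... + n_k - k - 1

# Unadjusted two-sided 95% t quantiles for both df
q_two_sample <- qt(0.975, df_two_sample)
q_one_way    <- qt(0.975, df_one_way)

print(c(df_two_sample = df_two_sample, df_one_way = df_one_way))
print(c(q_two_sample = q_two_sample, q_one_way = q_one_way))
```

The pooled one-way layout has 54 instead of 18 degrees of freedom, so its t quantile is smaller and the intervals are correspondingly narrower, provided the variances really are homogeneous.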

25 One- or two-sided formulation of hypotheses? I
- There are controversial arguments for and against one- and two-sided hypotheses. In RCTs, efficacy is commonly tested by a two-sided hypothesis, or by a one-sided one at level α/2
- Most endpoints in toxicology are directed, e.g. increasing tumor rate, ASAT, finding rates. But two-sided problems exist as well (rarely), e.g. body weight changes.
- A simple way: two-sided throughout, but: i) the $f^-$ error rate increases unnecessarily in the proof of hazard; ii) in the proof of safety there is a clear distinction between testing equivalence (two-sided) and non-inferiority (one-sided)
- Therefore: most hypotheses in toxicology are one-sided, and testing non-inferiority is the main approach

26 Dunnett procedure or Williams procedure or ...? I
- The NTP recommends the parametric Dunnett/Williams or the non-parametric Dunn/Shirley procedure. Which one is appropriate?
- The common dose-response design $C = D_0, D_1, \dots, D_k$ should be analyzed by the Williams procedure, assuming a one-sided, monotonic trend $H_1: \mu_C \le \mu_1 \le \dots \le \mu_k$ (two-sided trend hypotheses are possible, but hard to imagine)
- Why should the Dunnett procedure for $H_1: \mu_0 < \mu_i$ (for at least one i) be used? i) changes are of interest, i.e. two-sided alternatives, for which the Dunnett procedure was constructed; ii) still one-sided, but with doubts about monotonicity
- An alternative for downturn effects at high doses: the modified Williams test [BH02]
- For high-throughput analysis: two-sided Dunnett procedure. For specific analysis: one-sided Williams procedure, sometimes modified against non-monotonicity

27 Proof of hazard using unadjusted comparisons I
- A consequence of the primary importance of $f^-$ in the proof of hazard is not to control a familywise $f^+$ rate, neither across several doses/treatments nor across multiple endpoints, i.e. to use unadjusted two-sample comparisons throughout. This holds even though the $f^+$ rate, which cannot really be estimated, increases seriously and sentences such as "although statistically significant, this increase in ... is biologically not relevant" are used frequently
- To achieve comparability between differently scaled multiple endpoints, unadjusted two-sided (1 - α) confidence intervals for ratios to control can be recommended. A parametric Fieller-type version [Fie54] for heterogeneous variances [TL04], [HVH08] is available in the R package pairwiseCI. Hodges-Lehmann-type intervals have been proposed [HM02], but a Behrens-Fisher modification is not available.

28 Proof of hazard using unadjusted comparisons II
- Alternatively, confidence intervals for relative effects, which provide a Behrens-Fisher solution, can be used by means of the R package nparcomp. Notice two serious limitations of the non-parametric approach: control values near zero and small sample sizes (e.g. $n_i < 10$)
- Example 1: evaluation of relative organ weights analogously to [WJD+09]. Re-analysis using pairwiseCI. Analyze using R!: parametric approach, Hodges-Lehmann approach, relative effect size approach
- Questions so far?

29 Evaluation of Example 1 I
    # The capitalization of identifiers was lost in the transcription; the column
    # name Dose is an assumption and must match the header of organ.csv.
    setwd("e:\\aktuell_e\\ PUB\\_PAPER\\_StatTox2010\\Datenbeispiele")
    organ <- read.csv("organ.csv")
    organ$Dose <- as.factor(organ$Dose)
    library(pairwiseCI)
    library(nparcomp)
    # Hodges-Lehmann-type intervals for ratios to control, separately per organ
    exa1 <- pairwiseCI(weight ~ Dose, by="organ", data=organ, alternative="two.sided", method="HL.ratio", control="   # call truncated in the source
    plot(exa1, CIvert=FALSE, H0line=c(0.8,1, 1.25), H0lty=c(2,1,2), main="relative organ weights", xlab="non-parame   # call truncated in the source
    # Parametric (Fieller-type) intervals for ratios to control
    exa1a <- pairwiseCI(weight ~ Dose, by="organ", data=organ, alternative="two.sided", method="Param.ratio", var.e   # call truncated in the source
    plot(exa1a, CIvert=FALSE, H0line=c(0.8,1, 1.25), H0lty=c(2,1,2), main="relative organ weights", xlab="parametri   # call truncated in the source
    # Thymus only: control vs. 500 mg/kg
    tym <- organ[organ$organ=="thymus", ]
    tym500 <- tym[tym$Dose==0 | tym$Dose==500, ]
    tym500$Dose <- factor(tym500$Dose, levels=c(0,500))
    tym500 <- tym500[,c(4,6)]
    npar.t.test(weight ~ Dose, data=tym500, alternative="two.sided", p.permu = TRUE, plot.simci = TRUE, info = TRUE   # call truncated in the source
    pairwiseTest(weight ~ Dose, data=tym500, alternative="two.sided", method="t.test.ratio")
- Interpretation: i) Compare parametric vs. non-parametric! ii) Use directional decisions! iii) Interpret: significant, but biologically not relevant

30 Evaluation of Example 1 II
- Tests instead: compare $p_{\text{rel.effect}}$ = ... with $p_{\text{ratio.to.control.Sasabuchi}}$ = ...
- Notice the high $f^+$ for this approach

31 Simultaneous confidence intervals in toxicological studies I
- Toxicological studies use similar designs: $[C, T_1, \dots, T_k]$ or $[C, D_1, \dots, D_k]$, i.e. comparison of treatments or doses versus C
- An even better design includes a further positive control: $[C, D_1, \dots, D_k, C^+]$. Two options: i) proof of assay sensitivity in advance (to limit the $f^-$ error rate); ii) characterization of a dose effect relative to $C^-$ and relative to $C^+$
- Typical point-zero hypothesis for $T_i$ vs. C for a difference: $H_0: \mu_0 = \dots = \mu_k$ vs. $H_1: \mu_0 < \mu_i$ (for at least one i; 0 is the index of the control)
- OR for non-inferiority (toxic): $H_0: \mu_i - \mu_0 \le \delta_i$ vs. $H_1: \mu_i - \mu_0 > \delta_i$

32 Simultaneous confidence intervals in toxicological studies II
- Ordered alternative: $H_1: \mu_0 \le \mu_1 \le \dots \le \mu_k$, with at least $\mu_0 < \mu_k$
- Therefore only two methods, assuming $N(\mu_i, \sigma^2)$: i) Dunnett (1955) [Dun55], two- or one-sided; ii) Williams (1971) [Wil71], one-sided for a monotone increase (or decrease)

33 Multiple comparison procedures for differences of $\mu_i$ - demonstrated as multiple contrast test I
- Aim: simultaneous confidence intervals for $(\mu_i - \mu_{i'})$, using linear test statistics
- Special case: comparisons vs. control $(\mu_i - \mu_0)$
- Simultaneous lower confidence limits according to Dunnett (1955) [Dun55]:
$$\left[\bar{x}_i - \bar{x}_0 - S\sqrt{n_i^{-1} + n_0^{-1}}\;t_{k,df,R,1-\alpha};\ \infty\right)$$
- A contrast is a suitable linear combination of means: $\sum_{i=0}^{k} c_i \bar{x}_i$. A contrast test is the standardized version
$$t_{Contrast} = \sum_{i=0}^{k} c_i \bar{x}_i \Big/ \left(S\sqrt{\sum_{i=0}^{k} c_i^2/n_i}\right),$$
where $\sum_{i=0}^{k} c_i = 0$ guarantees a level-α test with critical value $t_{df,1-\alpha}$.
- A multiple contrast test is defined as a maximum test, $t_{MCT} = \max(t_1, \dots, t_q)$, where $(t_1, \dots, t_q)$ jointly follow a q-variate t-distribution with df degrees of freedom and correlation matrix R, with
$$\rho_{ab} = \frac{\sum_{i=0}^{k} a_i b_i/n_i}{\sqrt{\left(\sum_{i=0}^{k} a_i^2/n_i\right)\left(\sum_{i=0}^{k} b_i^2/n_i\right)}}$$
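The contrast statistics and their correlation matrix can be computed directly from these formulas. The following base R sketch uses hypothetical group means, a hypothetical pooled SD and a balanced design with one control and two treatments; the helper name `contrast_cor` is made up for this illustration. It stops short of the multivariate t quantile, which multcomp computes internally.

```r
# Correlation matrix of contrasts: rho_ab = sum(a_i b_i / n_i) / sqrt(...)
contrast_cor <- function(C, n) {
  V <- C %*% diag(1 / n) %*% t(C)   # entries: sum_i a_i * b_i / n_i
  cov2cor(V)                        # standardize to correlations
}

n     <- c(10, 10, 10)              # control, T1, T2 (hypothetical, balanced)
means <- c(50, 52, 55)              # hypothetical group means
s     <- 4                          # hypothetical pooled SD

# Dunnett contrasts: each treatment minus control
C_dunnett <- rbind(c(-1, 1, 0),
                   c(-1, 0, 1))

R <- contrast_cor(C_dunnett, n)

# Standardized contrast statistics and the maximum test statistic
t_contrast <- as.vector(C_dunnett %*% means) /
  (s * sqrt(as.vector(C_dunnett^2 %*% (1 / n))))
t_mct <- max(t_contrast)

print(R)
print(t_contrast)
```

In the balanced case the two Dunnett contrasts share the control group and are therefore correlated with $\rho = 0.5$, which is what makes the Dunnett adjustment milder than a Bonferroni correction.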

34 Multiple comparison procedures for differences of $\mu_i$ - demonstrated as multiple contrast test II
- Notice: with increasing average correlation and a decreasing number of contrasts q, the q-variate t-distribution tends to the univariate t-distribution, i.e. the degree of adjustment decreases
- Question: which contrasts, and how many? Aim: few, correlated contrasts that are relevant to the toxicological questions
- Simple examples (balanced design k=3)
- Dunnett one-sided:
  c_i    C    T_1   T_2
  c_a   -1     1     0
  c_b   -1     0     1
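How much adjustment is at stake can be bracketed without any multivariate computation. The sketch below, in base R, compares the unadjusted one-sided t quantile (the limit for perfectly correlated contrasts) with the Bonferroni quantile (a conservative upper bound that ignores the correlation) for several numbers of contrasts; the df of 54 and the grid of q values are assumptions for the illustration. The exact equi-coordinate quantile of the q-variate t-distribution lies between the two.

```r
alpha <- 0.05
df    <- 54                      # e.g. 6 groups of 10 animals, pooled variance
q_contrasts <- c(1, 2, 5, 10)    # number of contrasts

# Lower bound: no adjustment (all contrasts perfectly correlated)
crit_unadjusted <- qt(1 - alpha, df)

# Upper bound: Bonferroni adjustment (correlation ignored)
crit_bonferroni <- qt(1 - alpha / q_contrasts, df)

print(data.frame(q = q_contrasts,
                 unadjusted = round(crit_unadjusted, 3),
                 bonferroni = round(crit_bonferroni, 3)))
```

The gap between the two columns widens with q, which is the price of adding contrasts that are not needed for the toxicological question.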

35 Multiple comparison procedures for differences of $\mu_i$ - demonstrated as multiple contrast test III
- Tukey all-pairs comparisons (two-sided): six contrasts $c_a, \dots, c_f$ for the columns C, $T_1$, $T_2$, i.e. each pairwise difference in both directions
- Williams procedure as multiple contrast [Bre06]:
  c_i    C    D_1   D_2
  c_a   -1     0     1
  c_b   -1    1/2   1/2

36 Multiple comparison procedures for differences of $\mu_i$ - demonstrated as multiple contrast test IV
- Two-sided confidence intervals:
$$\left[\sum_{i=0}^{k} c_i \bar{x}_i \pm S\;t_{q,df,R,\text{2-sided},1-\alpha}\sqrt{\sum_{i=0}^{k} c_i^2/n_i}\right]$$
- Notice: multiplicity-adjusted p-values are available as an alternative to simultaneous confidence intervals. They are compatible, i.e. they yield the same decisions
- Notice: although simultaneous confidence intervals for stepwise MCPs were recently made available [SB08], they are non-informative and cannot be recommended, regardless of their (small) power advantage

37 Multiple comparison procedures for differences of $\mu_i$ - demonstrated as multiple contrast test V
- Example 2: Clinical chemistry data of the 13-week study of sodium dichromate dihydrate in female F344 rats, final data at day 93 only, endpoint ALT. Analyze using R!: jittered box-plots, two-sided Dunnett procedure assuming variance homogeneity (for heterogeneity see below), two-sided simultaneous confidence intervals

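38 Evaluation of Example 2 I
    # The capitalization of identifiers was lost in the transcription; the column
    # name Dose is an assumption and must match the header of clin.csv.
    clin <- read.csv("clin.csv")
    clin$Dose <- as.factor(clin$Dose)
    # Box-plots with the jittered raw data on top
    boxplot(alt ~ Dose, data=clin, outline=FALSE)
    points(jitter(as.integer(clin$Dose)), clin$alt, cex=1, pch=17)
    library(multcomp)
    library(sandwich)
    # One-way model and two-sided Dunnett comparisons vs. control
    myalt <- lm(alt ~ Dose, data=clin)
    exa2 <- glht(myalt, linfct = mcp(Dose = "Dunnett"), alternative="two.sided")
    plot(exa2, xlim=c(0,450), main="clinical chemistry: ALT- variance homogeneity")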

39 Evaluation of Example 2 II
- Interpretation: i) Although the box-plots indicate variance heterogeneity, the standard Dunnett procedure is used; ii) Use directional decisions! iii) Interpret: significant, but biologically not relevant
- Notice the high $f^-$ for this approach, since the FWER is controlled

40 Multiple comparisons for ratios of $\mu_i$ I
- Aim: simultaneous confidence intervals for $\mu_i/\mu_0$
- Trick: re-formulate the ratios in a linear form, $Z_{i0} = \bar{x}_i - \theta\bar{x}_0$ (Fieller, 1954) [Fie54] (assumption: θ = const.)
- Therefore $Z_{i0} \sim N(0, \sigma^2_{Z_{i0}})$, where $\sigma^2_{Z_{i0}} = \left[\frac{1}{n_i} + \frac{\theta^2}{n_0}\right]\sigma^2$
- $t_{i0}(\theta) = (\bar{x}_i - \theta\bar{x}_0)/S_{Z_{i0}}$ is univariate t-distributed
- Simultaneous confidence intervals for the ratios $\gamma_{i0} = \mu_i/\mu_0$:
$$\left(\hat\gamma_i \pm \left[\hat\gamma_i^2 - (1-G)\left(\hat\gamma_i^2 - \frac{N G}{n_i}\right)\right]^{1/2}\right)\Big/(1-G), \quad i = 1, \dots, q,$$
where $G = S^2 q^2_{\alpha,m,\nu,R}/(N\bar{x}_0^2)$
- Notice: the equi-coordinate percentage point $t_{q,\nu,R,1-\alpha}$ depends on the unknown ratios $\gamma_{i0}$ through the correlation matrix
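The Fieller idea is easiest to see for a single ratio to control. The base R sketch below solves $t_{i0}(\theta)^2 = t^2$ as a quadratic in θ. It is a two-sample illustration only: it uses an unadjusted univariate t quantile with pooled variance instead of the multivariate percentage point, and the function names `fieller_ci` and `t_theta` as well as the toy data are hypothetical.

```r
# Two-sample Fieller interval for mu_1 / mu_0 (pooled variance, unadjusted)
fieller_ci <- function(x1, x0, conf.level = 0.95) {
  n1 <- length(x1); n0 <- length(x0)
  m1 <- mean(x1);   m0 <- mean(x0)
  df <- n1 + n0 - 2
  s2 <- ((n1 - 1) * var(x1) + (n0 - 1) * var(x0)) / df
  tq <- qt(1 - (1 - conf.level) / 2, df)
  # (m1 - theta*m0)^2 <= tq^2 * s2 * (1/n1 + theta^2/n0), quadratic in theta
  A <- m0^2 - tq^2 * s2 / n0
  B <- -2 * m1 * m0
  C <- m1^2 - tq^2 * s2 / n1
  if (A <= 0) stop("control mean too close to zero: interval is unbounded")
  disc <- B^2 - 4 * A * C
  c(lower = (-B - sqrt(disc)) / (2 * A),
    upper = (-B + sqrt(disc)) / (2 * A))
}

# t statistic of the linearized ratio hypothesis at a given theta
t_theta <- function(theta, x1, x0) {
  n1 <- length(x1); n0 <- length(x0)
  s2 <- ((n1 - 1) * var(x1) + (n0 - 1) * var(x0)) / (n1 + n0 - 2)
  (mean(x1) - theta * mean(x0)) / sqrt(s2 * (1 / n1 + theta^2 / n0))
}

x0 <- c(48, 52, 50, 47, 53, 51, 49, 50, 52, 48)   # hypothetical control
x1 <- c(55, 58, 54, 60, 57, 56, 59, 55, 58, 61)   # hypothetical dose group

ci <- fieller_ci(x1, x0)
print(ci)
```

The stop for `A <= 0` reflects the slide's warning about control values near zero: if the control mean is not clearly different from zero, the ratio has no bounded interval.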

41 Multiple comparisons for ratios of $\mu_i$ II
- Solutions: Bonferroni, Sidak, plug-in [DBGH04]
- Software: R package mratios [DSH07]
- Example 3: Clinical chemistry data of the 13-week study of sodium dichromate dihydrate in female F344 rats, final data at day 93 only, endpoints cholesterol and triglyceride. Analyze using R!: jittered box-plots, two-sided Dunnett-type procedure for ratios to control assuming variance homogeneity

42 Evaluation of Example 3 I
    # Column names as transcribed (capitalization may have been lost); Dose is assumed
    library(mratios)
    # Dunnett-type simultaneous intervals for ratios to control
    plot(sci.ratio(cholesterol ~ Dose, data=clin, type="Dunnett"), main="cholesterol")
    plot(sci.ratio(triglyceride ~ Dose, data=clin, type="Dunnett"), main="triglyceride")
- Interpretation: i) Use the dimensionlessness of ratio-to-control comparisons and interpret both endpoints in terms of significance and relevance; ii) Use directional decisions!

43 Modifications for variance heterogeneity I
- Variance heterogeneity is more likely than variance homogeneity in real toxicological data, because the variance is possibly proportional to the mean
- Particularly in unbalanced designs ($n_i$ inversely related to $s_i$), neither two-sample tests nor multiple contrast tests control α
- Therefore, modifications for variance heterogeneity are highly recommended in toxicology. They can be used as the default approach (accepting some conservativeness for homogeneous variances) or conditionally on pre-tests, e.g. according to Levene [PF09]
- Three approaches: i) using a sandwich estimator for the variance-covariance matrix in the linear model [HSH10]; ii) Welch-type df adjustment for multiple contrast tests [Has09]; iii) Behrens-Fisher modification of non-parametric tests [FK09]. R programs are available.
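The Welch-type df adjustment is easiest to see in the two-sample case, which base R already implements. The sketch below is only the two-sample analogue, not the multiple-contrast method of [Has09]; the unbalanced toy data (large control group with small variance, small dose group with large variance) are hypothetical.

```r
# Hypothetical unbalanced data: n_i inversely related to s_i
x0 <- c(50, 51, 49, 50, 52, 48, 50, 51, 49, 50, 51, 49)   # control, small variance
x1 <- c(46, 60, 52, 66, 43, 58)                            # dose group, large variance

n0 <- length(x0); n1 <- length(x1)
v0 <- var(x0) / n0
v1 <- var(x1) / n1

# Welch-Satterthwaite degrees of freedom
df_welch <- (v0 + v1)^2 / (v0^2 / (n0 - 1) + v1^2 / (n1 - 1))

# Built-in tests: pooled variance vs. Welch modification
fit_pooled <- t.test(x1, x0, var.equal = TRUE)
fit_welch  <- t.test(x1, x0, var.equal = FALSE)

print(c(df_pooled = unname(fit_pooled$parameter), df_welch = df_welch))
print(c(p_pooled = fit_pooled$p.value, p_welch = fit_welch$p.value))
```

Here the effective df collapse from 16 to roughly those of the small, noisy group, and the pooled test understates the uncertainty because the large control group dominates the pooled variance.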

44 Modifications for variance heterogeneity II
- Example 4: Clinical chemistry data of the 13-week study of sodium dichromate dihydrate in female F344 rats, final data at day 93 only, endpoint ALT. Analyze using R!: jittered box-plots, two- and one-sided Dunnett-type procedure assuming variance heterogeneity

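45 Evaluation of Example 4 I
    # Requires multcomp, sandwich and the model myalt from Example 2;
    # the column name Dose is assumed (capitalization lost in the transcription)
    boxplot(alt ~ Dose, data=clin, outline=FALSE)
    points(jitter(as.integer(clin$Dose)), clin$alt, cex=1, pch=17)
    # Heteroscedasticity-consistent covariance estimates
    sandwich(myalt)
    myvcov <- vcovHC(myalt, type = "HC")
    # Two-sided Dunnett-type comparisons with the sandwich estimator
    exa4 <- glht(myalt, linfct=mcp(Dose = "Dunnett"), vcov=sandwich, alternative="two.sided")
    plot(exa4, xlim=c(0,450), main="clinical chemistry: ALT- variance heterogeneity")
    # One-sided version for an increase
    exa4a <- glht(myalt, linfct=mcp(Dose = "Dunnett"), vcov=sandwich, alternative="greater")
    plot(exa4a, xlim=c(0,450), main="clinical chemistry: ALT- variance heterogeneity for an increase")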

46 Evaluation of Example 4 II
- Interpretation: i) Notice the variance heterogeneity in the box-plots; ii) Use the sandwich estimator
- Compare one- and two-sided intervals

47 Evaluation of Example 5 I
- Example 5: Clinical chemistry data of the 13-week study of sodium dichromate dihydrate in female F344 rats, final data at day 93 only, endpoint cholesterol. Analyze using R!: jittered box-plots, two-sided Dunnett-type procedure assuming variance heterogeneity
    # Column name Dose is assumed (capitalization lost in the transcription)
    boxplot(cholesterol ~ Dose, data=clin, outline=FALSE, main="cholesterol")
    points(jitter(as.integer(clin$Dose)), clin$cholesterol, cex=0.5, pch=16)
    library(mratios)
    # Ratios to control under variance heterogeneity
    exa5 <- sci.ratioVH(cholesterol ~ Dose, data=clin, type="Dunnett")
    plot(exa5, main="cholesterol: variance heterogeneity")

48 Evaluation of Example 5 II
- Interpretation: i) Notice the variance heterogeneity in the box-plots; ii) Use the new Welch modification [Has09]; iii) Compare one- and two-sided intervals
- Questions so far?

49 Trend tests and related simultaneous confidence intervals I
- An important criterion of relevance in the proof of hazard: a significant trend. Question: what does trend mean? Two criteria: i) one-sided, ii) monotone, i.e. $H_1: \mu_C \le \mu_1 \le \dots \le \mu_k$, which covers all possible elementary hypotheses, not just a linear trend. The alternative $H_1: \mu_C < \mu_1 = \dots = \mu_k$ is hard to accept as a trend for some toxicologists, but it is a trend alternative
- Therefore, a trend test must be sensitive against all possible elementary alternatives, not just one, e.g. the linear one, as in the widely used Cochran-Armitage trend test [Arm55] for proportions or the Jonckheere trend test for pairwise ranks.
- At least two approaches: the MLE test according to [Bar59] (quadratic test statistics) and the MCT (linear test statistics)
- A trend test that compares vs. control: the Williams trend test [Wil71].

50 Trend tests and related simultaneous confidence intervals II
- For studies with 2 to 4 doses (typical in toxicology), model-based approaches are difficult (see the R package MCPMod [BPB05])
- But the Williams (1971) procedure [Wil71] is the first choice, because it tests a monotone alternative vs. control
- The contrast structure:
  c_i    C    D_1   D_2
  c_a   -1     0     1
  c_b   -1    1/2   1/2
- Example 6: Clinical chemistry data of the 13-week study of sodium dichromate dihydrate in female F344 rats, final data at day 93 only, endpoint cholesterol. Analyze using R!: Williams procedure for a monotone decrease assuming variance heterogeneity

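51 Trend tests and related simultaneous confidence intervals III
    # Column name Dose is assumed (capitalization lost in the transcription)
    library(multcomp)
    library(sandwich)
    mychol <- lm(cholesterol ~ Dose, data=clin)
    # Heteroscedasticity-consistent covariance estimates
    sandwich(mychol)
    myvcov <- vcovHC(mychol, type = "HC")
    # One-sided Williams-type contrasts for a decrease, with the sandwich estimator
    exa6 <- glht(mychol, linfct=mcp(Dose = "Williams"), vcov=sandwich, alternative="less")
    plot(exa6, main="williams trend approach: Cholesterol- variance heterogeneity")
- Interpretation: i) global monotonic decrease; ii) minimal effective dose: ...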

52 Trend tests and related simultaneous confidence intervals IV
- Example 7: Clinical chemistry data of the 13-week study of sodium dichromate dihydrate in female F344 rats, final data at day 93 only, endpoint cholesterol. Analyze using R!: Williams-type procedure for monotonically decreasing ratios to control assuming variance heterogeneity
    # Column name Dose is assumed (capitalization lost in the transcription)
    library(mratios)
    exa7 <- sci.ratioVH(cholesterol ~ Dose, data=clin, type="Williams", alternative="less")
    plot(exa7, main="cholesterol: Williams-type ratios-to-control")

53 Non-parametric approaches and related simultaneous confidence intervals I
- For non-normal data, the trend test according to Shirley [SHI77] is widely used in toxicology. I.e. the observations are jointly ranked and the Williams test [WIL72] is applied.
- $H_0^F: F_0 = \dots = F_k$, formulated in terms of the distribution functions, is tested against the ordered alternative $H_1^F: F_0 \le \dots \le F_k$ with at least one strict inequality $F_i < F_s$, $i < s$. The test controls the FWER strongly.
- The distribution of the rank means under the alternative is unknown, so for a general unbalanced design neither simultaneous confidence intervals are numerically available nor can the power be estimated.
- Tied or ordered categorical data, such as severity counts, should be analyzed as well. Therefore, a non-parametric approach is required that covers continuous, discrete, and even dichotomous data in a unified way.

54 Non-parametric approaches and related simultaneous confidence intervals II
- Since variance heterogeneity occurs (particularly variances increasing with increasing effects in unbalanced designs with $n_{Control} = k\,n_{Doses}$), the control of the FWER may be problematic. Therefore a related robust procedure is needed, the so-called Behrens-Fisher (BF) modification, as an analogue of the parametric approach for multiple contrast tests under variance heterogeneity [HH08].
- Using the relative effect size [BM00], [RA08]:
$$p_{01} = \int F_0\,dF_1 = P(X_{01} < X_{11}) + 0.5\,P(X_{01} = X_{11}). \quad (2)$$
Here the term $0.5\,P(X_{01} = X_{11})$ ensures that data with ties are taken into account.

55 Non-parametric approaches and related simultaneous confidence intervals III
- Note that the numerator of the Wilcoxon statistic estimates the relative effect $p_{01}$. The Wilcoxon test, however, can only be used for testing the hypothesis $H_0^F: F_0 = F_1$ formulated in terms of the distribution functions. Moreover, the Wilcoxon test procedure is not robust against variance heterogeneity. Therefore, test procedures that test the hypothesis in terms of the relative effect, e.g. the Brunner-Munzel test [BM00], are more appropriate.
- Relative Shirley-type effects: Let
$$C_{q\times(k+1)} = \begin{pmatrix} -1 & 0 & \cdots & 0 & 0 & 1 \\ -1 & 0 & \cdots & 0 & \frac{n_{k-1}}{n_{k-1}+n_k} & \frac{n_k}{n_{k-1}+n_k} \\ \vdots & \vdots & & \vdots & \vdots & \vdots \\ -1 & \frac{n_1}{n_1+\dots+n_k} & \cdots & \cdots & \frac{n_{k-1}}{n_1+\dots+n_k} & \frac{n_k}{n_1+\dots+n_k} \end{pmatrix}$$
denote the Williams contrast matrix [Bre06].

56 Non-parametric approaches and related simultaneous confidence intervals IV
- E.g., for the common balanced design with three dose groups and one control, the three contrasts are:
$$C_{3 \times 4} = \begin{pmatrix} -1 & 0 & 0 & 1 \\ -1 & 0 & 1/2 & 1/2 \\ -1 & 1/3 & 1/3 & 1/3 \end{pmatrix}$$
That is, the first contrast indicates a strictly global trend, the second contrast a plateau for the two higher doses, and the third contrast a plateau of all doses, which just differ from control.
56 / 96
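The Williams contrast matrix can be sketched in base R for arbitrary group sizes. The helper `williams_contrasts` is a hypothetical name introduced here for illustration; the multcomp package offers `contrMat(n, type = "Williams")` for the same purpose.

```r
# Build the Williams contrast matrix for group sizes n = (n_0, n_1, ..., n_k),
# where n[1] is the control group and n[2], ..., n[k + 1] are the dose groups.
williams_contrasts <- function(n) {
  k <- length(n) - 1                      # number of dose groups
  cm <- matrix(0, nrow = k, ncol = k + 1)
  cm[, 1] <- -1                           # the control enters every contrast with -1
  for (r in seq_len(k)) {
    pooled <- (k - r + 1):k               # the r highest doses are pooled in contrast r
    cm[r, pooled + 1] <- n[pooled + 1] / sum(n[pooled + 1])
  }
  cm
}

# Balanced design: one control and three dose groups with 10 animals each
williams_contrasts(c(10, 10, 10, 10))
```

Each row sums to zero, and in an unbalanced design the pooled dose groups are weighted by their sample sizes.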

57 Non-parametric approaches and related simultaneous confidence intervals V
- Therefore, treatment effects can be defined by using the relative effect between the distribution of the negative control group $F_0$ and the distribution of the samples $M_l$, $l = 1, \dots, q$:
$$p_1 = p_{0k}$$
$$p_2 = \frac{n_{k-1}}{n_{k-1}+n_k}\, p_{0(k-1)} + \frac{n_k}{n_{k-1}+n_k}\, p_{0k}$$
$$\vdots$$
$$p_q = \frac{n_1}{n_1+\dots+n_k}\, p_{01} + \dots + \frac{n_k}{n_1+\dots+n_k}\, p_{0k}$$
- The effects $p_1, \dots, p_q$ are called relative Shirley-type effects. They are linear combinations of the two-sample relative effects between the negative control group and the active treatments. Therefore, in case of a monotonically increasing order of location, the relative Shirley-type effects decrease, i.e., $p_1 \ge p_2 \ge \dots \ge p_q$.
57 / 96
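The relative Shirley-type effects are sample-size-weighted averages of the two-sample relative effects against control. The following base-R sketch illustrates this; the function name `shirley_effects` and the numerical values are hypothetical.

```r
# Combine two-sample relative effects p0 = (p_01, ..., p_0k) into
# relative Shirley-type effects, weighting each dose group by its sample size.
shirley_effects <- function(p0, n_dose) {
  k <- length(p0)
  sapply(seq_len(k), function(r) {
    pooled <- (k - r + 1):k               # the r highest doses
    sum(n_dose[pooled] * p0[pooled]) / sum(n_dose[pooled])
  })
}

# Hypothetical relative effects that increase with dose, balanced design
shirley_effects(p0 = c(0.55, 0.70, 0.85), n_dose = c(10, 10, 10))
```

With these increasing two-sample effects the resulting Shirley-type effects decrease from the first to the last contrast, as stated on the slide.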

58 Non-parametric approaches and related simultaneous confidence intervals VI
- Example 8: Shirley-type test for potassium (clinical chemistry data of the 13-week study on sodium dichromate dihydrate in female F344 rats, final data at day 93 only)
- The box plots indicate a right-skewed distribution with variance heterogeneity. Therefore, a Behrens-Fisher modification of the Shirley trend test for relative effects is used, by means of the R library nparcomp.
- The relative effect size allows a scale-independent comparison of multiple endpoints, such as in clinical chemistry.
- Analyse using R!
- Interpretation: i) normal distribution questionable, ii) NTP requires a non-parametric trend test, iii) variance heterogeneity, iv) relative effect sizes and their most extreme confidence limits: not harmful
58 / 96

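59 Non-parametric approaches and related simultaneous confidence intervals VII
# Box plot of potassium by dose group, overlaid with the jittered raw data
boxplot(potassium ~ dose, data = clin, outline = FALSE, main = "potassium")
points(jitter(as.integer(clin$dose)), clin$potassium, cex = 0.5, pch = 15)
# Williams-type contrasts for relative effects with simultaneous confidence intervals
library(nparcomp)
nparcomp(potassium ~ dose, data = clin, type = "Williams", conf.level = 0.95, alternative = "two.sided", rounds ...  # remainder of the call is cut off in the source
59 / 96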

60 Non-parametric approaches and related simultaneous confidence intervals VIII
- Example 9: Shirley-type test for graded histopathological findings
- Score data are particularly suitable for statistics of relative effects [RA08].
- The graded findings [None, Mild, Moderate, Marked] are transformed into the equidistant scores [0, 1, 2, 3].
- The relative effect sizes and their Shirley-type simultaneous confidence intervals are estimated by means of the R library nparcomp.
- Analyse using R!
parotid <- read.csv("parotid.csv")
# Box plot of the scores by group, overlaid with the jittered raw data
boxplot(score ~ Group, data = parotid, outline = FALSE, main = "graded histopathological findings")
points(jitter(as.integer(as.factor(parotid$Group))), parotid$score, cex = 0.5, pch = 16)
library(nparcomp)
nparcomp(score ~ Group, data = parotid, asy.method = "mult.t", type = "Williams", alternative = "grea ...  # remainder of the call is cut off in the source
60 / 96

61 Non-parametric approaches and related simultaneous confidence intervals IX
Interpretation: i) the analysis of ordered categorical data, particularly on trend, is not trivial, ii) ordered categorical data are interpreted by means of an ordinal effect size measure [RA08], iii)
- Questions so far?
61 / 96

62 Simultaneous confidence intervals for proportions I
- Rates are typical endpoints in toxicological studies, e.g. histopathological findings, mortality, tumor rates.
- General contradiction in toxicological risk assessment: the evaluation of continuous endpoints, such as body weight or hematology, is powerful and related statistical approaches are widely available, but their predictive value is limited. On the other hand, the predictive value of proportions, such as selected clinical findings, is larger, but the power is much lower and appropriate statistical approaches are rarely available for such small sample sizes.
- Moreover, for sample sizes of $n_i = \ldots$ there is no hope for valid $(1-\alpha)$ Wald intervals. Therefore, we need confidence intervals whose coverage probability is approximately 95% also for smaller samples (not really small samples).
62 / 96
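The poor small-sample behaviour of Wald intervals can be checked by computing the exact coverage probability of the two-sided Wald interval for a single binomial proportion. This is a sketch under assumed settings: $n = 10$ (the group size of the example study) and a true proportion of 0.1 are illustrative choices, and `wald_coverage` is a hypothetical helper.

```r
# Exact coverage probability of the two-sided Wald interval for one binomial
# proportion: sum the binomial probabilities of all outcomes y whose interval
# contains the true proportion p.
wald_coverage <- function(n, p, conf.level = 0.95) {
  y <- 0:n
  p_hat <- y / n
  z <- qnorm(1 - (1 - conf.level) / 2)
  half_width <- z * sqrt(p_hat * (1 - p_hat) / n)
  covered <- (p_hat - half_width <= p) & (p <= p_hat + half_width)
  sum(dbinom(y, n, p)[covered])
}

wald_coverage(n = 10, p = 0.1)    # far below the nominal 0.95
wald_coverage(n = 1000, p = 0.1)  # for comparison: a large sample
```

For $n = 10$ the interval degenerates to a single point when no event is observed, which is one reason why the coverage falls well below the nominal level.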

63 Simultaneous confidence intervals for proportions II
- For all proportions, a one-sided alternative for an increase is appropriate, never a two-sided alternative.
- The difference of proportions is the common effect size. Alternatively, the relative risk or the odds ratio could be used.
63 / 96

64 Two-sample comparisons I
- There is an ongoing discussion on appropriate small-sample confidence intervals; here we focus on one-sided lower limits.
- Based on the Wilson score interval, Newcombe [New98] introduced an interval for the difference of proportions, referred to as Newcombe's Hybrid Score (NHS) interval. Its variance term is constructed from the Wilson score confidence limits for the single proportions. The lower $(1-\alpha/2)$ limit is:
$$\hat\pi_2 - \hat\pi_1 - z_{1-\alpha/2} \sqrt{\frac{l_2 (1 - l_2)}{n_2} + \frac{u_1 (1 - u_1)}{n_1}}, \qquad (3)$$
where $l_i$, $u_i$ are the lower and upper limits of the $(1-\alpha)$ Wilson score interval for the single proportion $\pi_i$:
64 / 96

65 Two-sample comparisons II
$$[l_i, u_i] = \frac{y_i + z_{1-\alpha/2}^2 / 2}{n_i + z_{1-\alpha/2}^2} \pm \frac{z_{1-\alpha/2} \sqrt{n_i \hat\pi_i (1 - \hat\pi_i) + z_{1-\alpha/2}^2 / 4}}{n_i + z_{1-\alpha/2}^2} \qquad (4)$$
- In the R library pairwiseCI, related confidence intervals are available with the option "NHS".
- Example 10: Unadjusted two-sample lower NHS confidence limits for tubular epithelia hyaline droplet degeneration in male rats. Notice that the data are not in a two-by-two table but, more realistically, in an animal-specific flat file.
- Analyse using R!
65 / 96
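Equations (3) and (4) can be written out directly in base R. The following is a sketch for illustration only; the function names `wilson_limits` and `nhs_lower` and the counts are hypothetical, and the pairwiseCI package should be preferred in practice.

```r
# Wilson score limits for a single proportion, Equation (4)
wilson_limits <- function(y, n, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)
  p_hat <- y / n
  centre <- (y + z^2 / 2) / (n + z^2)
  half_width <- z * sqrt(n * p_hat * (1 - p_hat) + z^2 / 4) / (n + z^2)
  c(lower = centre - half_width, upper = centre + half_width)
}

# Lower Newcombe hybrid score limit for pi_2 - pi_1, Equation (3);
# group 1 is the control, group 2 the dose group
nhs_lower <- function(y1, n1, y2, n2, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)
  u1 <- wilson_limits(y1, n1, conf.level)[["upper"]]
  l2 <- wilson_limits(y2, n2, conf.level)[["lower"]]
  (y2 / n2 - y1 / n1) - z * sqrt(l2 * (1 - l2) / n2 + u1 * (1 - u1) / n1)
}

# Hypothetical counts: 1 of 10 findings in control, 6 of 10 in the dose group
nhs_lower(y1 = 1, n1 = 10, y2 = 6, n2 = 10)
```

A lower limit above zero would indicate an increased rate in the dose group relative to control.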

66 Two-sample comparisons III
tubepi <- read.csv("tubepi.csv")  # NTP P-Cresidine carcinogenicity study histop ...
library(pairwiseCI)
exa10 <- pairwiseCI(cbind(tubularepithelia, without) ~ Group, data = tubepi, alter ...  # remainder of the call is cut off in the source
plot(exa10, main = "proportions of tubular epithelia")
- Interpretation:
66 / 96

67 A Dunnett-type for proportions I
- One-sided, lower $(1-\alpha)$ Wald-type confidence limits for the difference between the proportion of a treatment and that of the control are
$$\sum_{i=1}^{I} c_i p_i - z_{q,R,1-\alpha} \sqrt{\sum_{i=1}^{I} c_i^2 \hat V(p_i)} \qquad (5)$$
with $\hat V(p_i) = p_i (1 - p_i) / n_i$ and $z_{q,R,1-\alpha}$ denoting the $(1-\alpha)$ quantile of the $q$-variate normal distribution. Its correlation matrix $R$ depends not only on the known contrast coefficients $c_{im}$ and sample sizes $n_i$ but also on the unknown $\pi_i$ and $V(p_i)$, where the plug-in of the ML estimators $\hat\pi_i$ and $\hat V(p_i)$ works well.
- However, Wald limits for binomial proportions are known to keep the $(1-\alpha)$ coverage probability only for asymptotically large sample sizes [AC00], [PB04].
67 / 96

68 A Dunnett-type for proportions II
- [AC98] showed that adding a total of four pseudo-observations to the observed successes and failures yields approximate confidence intervals for one binomial proportion with good small-sample performance.
- One-sided limits were investigated only by [Cai05] for the case of a single binomial proportion, and recently by [SV09]:
$$\sum_{i=1}^{I} c_i \tilde p_i - z_{q,R,1-\alpha} \sqrt{\sum_{i=1}^{I} c_i^2 \tilde V(\tilde p_i)} \qquad (6)$$
Table: Choices for $\tilde p_i$ and $\tilde V(\tilde p_i)$
Notation   $\tilde p_i$                  $\tilde V(\tilde p_i)$
Wald       $Y_i / n_i$                   $\tilde p_i (1 - \tilde p_i) / n_i$
add-1      $(Y_i + 0.5) / (n_i + 1)$     $\tilde p_i (1 - \tilde p_i) / (n_i + 1)$
add-2      $(Y_i + 1) / (n_i + 2)$       $\tilde p_i (1 - \tilde p_i) / (n_i + 2)$
68 / 96
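The three choices of point and variance estimator can be expressed as one small base-R helper. This is an illustrative sketch; the function name `adjusted_proportion` and the counts are hypothetical.

```r
# Point and variance estimators for the Wald, add-1 and add-2 choices:
# a total of 'a' pseudo-observations is added, half of them as events.
adjusted_proportion <- function(y, n, type = c("Wald", "add-1", "add-2")) {
  type <- match.arg(type)
  a <- switch(type, "Wald" = 0, "add-1" = 1, "add-2" = 2)
  p <- (y + a / 2) / (n + a)
  list(p = p, var = p * (1 - p) / (n + a))
}

# Hypothetical group with 1 finding among 20 animals
adjusted_proportion(1, 20, "Wald")
adjusted_proportion(1, 20, "add-1")
adjusted_proportion(1, 20, "add-2")
```

Unlike the Wald estimator, the add-1 and add-2 variance estimates stay positive when no event or only events are observed.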

69 A Dunnett-type for proportions III
- A simulation study [SSH08] supports the use of the add-1 approximation for one-sided lower limits when sample sizes are not too small.
- Example 11: Simultaneous confidence limits for tubular epithelia hyaline droplet degeneration in male rats. Notice that the data are not in a two-by-two table but, more realistically, in an animal-specific flat file.
- Analyse using R!
69 / 96

70 A Dunnett-type for proportions IV
library(MCPAN)
exa11 <- binomRDci(tubularepithelia ~ Group, data = tubepi, type = "Dunnett", cmat=n ...  # remainder of the call is cut off in the source
plot(exa11, main = "dunnett-type procedure for proportions of tubular epithelia")
- Interpretation
70 / 96

71 A Williams-type for proportions I
- Lower simultaneous confidence limits for a Williams-type approach with small-to-medium sample sizes are constructed analogously to the Dunnett-type approach. The contrast coefficients $c_i$ reflect the specific order restriction, see the parametric approach above (a recent publication: Hothorn and Schaarschmidt 2010).
- Example 12: Simultaneous confidence limits for tubular epithelia hyaline droplet degeneration in male rats. Notice that the data are not in a two-by-two table but, more realistically, in an animal-specific flat file.
- Analyse using R!
71 / 96

72 A Williams-type for proportions II
library(MCPAN)
exa12 <- binomRDci(tubularepithelia ~ Group, data = tubepi, type = "Williams", cmat= ...  # remainder of the call is cut off in the source
plot(exa12, main = "williams-type procedure for proportions of tubular epithelia")
- Interpretation
- Questions so far?
72 / 96

73 Proof of safety by means of confidence intervals I
- Proof of hazard is not adequate: absence of evidence is not evidence of absence [AB04].
- Advantage: direct control of the more important false-negative error rate, i.e. the consumer's risk.
- Therefore, hypotheses are formulated on equivalence for endpoints where an increase OR a decrease is a possible toxic effect, e.g. body weight change, or on non-inferiority for endpoints where exactly one direction is a toxic effect, e.g. increasing tumor rates.
- Both hypotheses need an a priori definition of a relevance threshold $\delta$ (for the difference to control) or $\theta$ (for the ratio to control), e.g. the 2-fold rule of the Ames assay.
- However, such endpoint-specific thresholds (k-fold rules) are rarely found in the guidelines. A more realistic strategy is to estimate confidence intervals and to interpret their limits post hoc as thresholds.
73 / 96

74 Proof of safety by means of confidence intervals II
- Question: differences or ratios? i) This is a choice between an additive and a multiplicative model, ii) a ratio is dimensionless, i.e. a % change, and is appropriate for multiple endpoints.
- Definition of local or global safety in the common design $[C, D_1, D_2, D_3]$ related to the dose groups: i) local: $D_1$ OR $D_2$ OR $D_3$ are safe (UIT), ii) global: $D_1$ AND $D_2$ AND $D_3$ are safe (IUT).
- Marginal or global safety for multiple endpoints: i) marginal: $Y_i$ is safe, ii) global: $Y_1$ AND $Y_2$ AND ... AND $Y_p$ are safe (IUT).
- Notice that IUTs are rather conservative, and a solution taking the correlations into account is not available yet.
74 / 96

75 Proof of safety for non-inferiority: normal distributed endpoints I
- Assumptions: i) directional toxic decision, e.g. an increase is toxic, ii) $N(\mu_i, \sigma^2)$, iii) a randomized one-way layout
- Hypotheses: $H_0: \mu_D - \mu_C \ge \delta$ with $\delta > 0$ (harmful) versus $H_1: \mu_D - \mu_C < \delta$ (harmless)
- Translation into a confidence limit: harmless if the upper limit for $\mu_i - \mu_0$ is below $\delta$, respectively if the upper limit for $\mu_i / \mu_0$ is below $\theta$; harmful otherwise
- One-sided simultaneous $(1-\alpha)$ confidence intervals according to Dunnett [Dun55] using library(multcomp)
75 / 96
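The decision rule can be sketched in base R for a single dose-versus-control comparison. Note the simplification: this uses an unadjusted pooled-variance t interval from `t.test` rather than the simultaneous Dunnett interval named on the slide, and the data, the threshold `delta` and the function name `noninferiority_upper` are hypothetical.

```r
# Upper one-sided confidence limit for mu_D - mu_C (pooled-variance t interval).
# The dose is declared harmless if this limit stays below the threshold delta.
noninferiority_upper <- function(dose, control, conf.level = 0.95) {
  ci <- t.test(dose, control, alternative = "less",
               var.equal = TRUE, conf.level = conf.level)$conf.int
  ci[2]   # ci[1] is -Inf for alternative = "less"
}

# Hypothetical measurements where an increase is the toxic direction
control <- c(10, 11, 12, 13, 14)
dose    <- c(10.5, 11.5, 12.5, 13.5, 14.5)
delta   <- 3                      # assumed relevance threshold

upper <- noninferiority_upper(dose, control)
upper < delta                     # TRUE means "harmless" under this rule
```

With several dose groups, the quantile of the t distribution would have to be replaced by the multiplicity-adjusted Dunnett quantile, which is what multcomp provides.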

76 Proof of safety for non-inferiority: normal distributed endpoints II
- Example 13: Hematology parameter hemoglobin of the 13-week study on sodium dichromate dihydrate in female F344 rats, final data at day 93 only
- A priori we assume that only decreasing hemoglobin values are hazardous.
- The box plots indicate an approximately symmetric distribution, but variance heterogeneity occurs.
- The one-sided lower limits are therefore of interest for claiming non-inferiority, but we do not know any safety threshold.
- Analyse using R!
76 / 96

77 Proof of safety for non-inferiority: normal distributed endpoints III
hema <- read.csv("hema.csv")
hema$Dose <- as.factor(hema$Dose)
hemaf <- hema[hema$sex == "female", ]   # females only
# Box plot of hemoglobin by dose group, overlaid with the jittered raw data
boxplot(hb ~ Dose, data = hemaf, outline = FALSE, main = "hemoglobin")
points(jitter(as.integer(hemaf$Dose)), hemaf$hb, cex = 1, pch = 12)
# One-way model and one-sided Dunnett comparisons against control
myhb <- lm(hb ~ Dose, data = hemaf)
library(multcomp)
exa13 <- glht(myhb, linfct = mcp(Dose = "Dunnett"), alternative = "greater")
plot(exa13, main = "hb- proof of non-inferiority")
77 / 96

78 Proof of safety for non-inferiority: normal distributed endpoints IV
Interpretation: using $\delta = 1.0$, the lower doses 62.5, ..., 500 mg/kg are harmless (non-inferior with respect to control), but the highest dose is not harmless. The problem is the post-hoc choice of $\delta$ on the scale of hemoglobin.
78 / 96

79 Proof of safety for non-inferiority: ratios to control I
- Relative changes are easier to interpret, particularly for multiple endpoints, e.g. in chronic studies.
- One-sided simultaneous $(1-\alpha)$ confidence intervals for ratios to control according to [DBGH04] using library(mratios)
- Example 14: Hemoglobin; again the lower limits are of interest. Interpretation using a relative threshold $\theta = 0.8$ may be more appropriate.
79 / 96

80 Proof of safety for non-inferiority: ratios to control II
library(mratios)
exa14 <- sci.ratioVH(hb ~ Dose, data = hemaf, type = "Dunnett", alternative = "greater")
plot(exa14, rho0 = c(0.9, 1), rho0lty = c(2, 1), rho0col = c("blue", "black"), main = "hb- ...  # remainder of the call is cut off in the source
80 / 96

81 Proof of safety for non-inferiority: proportions I
- Rates are typical endpoints in toxicological studies, e.g. histopathological findings, mortality, tumor rates.
- For sample sizes of $n_i = \ldots$ there is no hope for valid $(1-\alpha)$ Wald intervals.
- For all proportions, a one-sided alternative for an increase is appropriate, never a two-sided alternative.
- Alternative: [RM99], but for two-sample comparisons only.
- Alternative for one-sided confidence intervals: add-1 intervals according to Agresti [AC00] can be used for moderate sample sizes [SV09], i.e. instead of $p_i = r_i / n_i$ we use $\tilde p_i = (r_i + 0.5) / (n_i + 1)$, using the R library MCPAN.
81 / 96

82 Proof of safety for non-inferiority: proportions II
- Example 15: In a chronic study with the design (0, 10, 50, 100 mg/kg), 4, 1, 6, 8 animals died out of 40, 20, 20, 20 randomized animals.
- Analyse using R!
library(MCPAN)
died <- c(4, 1, 6, 8)             # number of deaths per group
animals <- c(40, 20, 20, 20)      # number of randomized animals per group
dosesn <- c("0", "10", "50", "100")
exa15 <- binomRDci(n = animals, x = died, names = dosesn, alternative = "less", method=" ...  # remainder of the call is cut off in the source
plot(exa15, main = "mortality rates- proof of safety")
- Interpretation
82 / 96
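For the mortality data of Example 15, upper add-1 limits for the differences to control can be sketched in base R. Note the simplification: a Bonferroni-adjusted normal quantile is used here as a conservative stand-in for the multivariate normal quantile of the Dunnett-type procedure, so the limits will not reproduce the MCPAN output exactly.

```r
# Mortality data of Example 15
died    <- c(4, 1, 6, 8)
animals <- c(40, 20, 20, 20)
alpha   <- 0.05

# add-1 point and variance estimators
p_tilde <- (died + 0.5) / (animals + 1)
v_tilde <- p_tilde * (1 - p_tilde) / (animals + 1)

# Dunnett-type contrasts (-1, 1): each dose group against the control
k <- length(died) - 1
z_bonf <- qnorm(1 - alpha / k)    # Bonferroni stand-in, an assumption of this sketch
estimate <- p_tilde[-1] - p_tilde[1]
upper    <- estimate + z_bonf * sqrt(v_tilde[-1] + v_tilde[1])

data.frame(dose = c("10", "50", "100"), estimate = estimate, upper = upper)
```

A dose would be declared safe with respect to mortality if its upper limit stays below an a priori tolerable increase of the mortality rate.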

83 Two-sided hypotheses: claiming equivalence I
- Bofinger's procedure [BB95] for claiming equivalence of several treatments with respect to a control group
- Hypotheses: $H_{0i}: |\mu_i - \mu_0| \ge \delta$ (harmful) vs. $H_{1i}: |\mu_i - \mu_0| < \delta$ (harmless), $1 \le i \le k$, with a relevant threshold $\delta > 0$.
- The null hypotheses can be written as $H_{0i}: \mu_i - \mu_0 \le -\delta$ or $\mu_i - \mu_0 \ge \delta$ ($1 \le i \le k$).
83 / 96

84 Two-sided hypotheses: claiming equivalence II
The limits of the two-sided $(1-\alpha)100\%$ simultaneous confidence intervals are given as
$$\hat\delta_i^{(l)} = \min\left( \bar X_i - \bar X_0 - t_{k,1-\alpha}(\nu, R)\, S \sqrt{\frac{1}{n_i} + \frac{1}{n_0}},\; 0 \right), \quad \hat\delta_i^{(u)} = \max\left( \bar X_i - \bar X_0 + t_{k,1-\alpha}(\nu, R)\, S \sqrt{\frac{1}{n_i} + \frac{1}{n_0}},\; 0 \right) \quad (1 \le i \le k) \qquad (7)$$
with the lower $(1-\alpha)$ quantile $t_{k,1-\alpha}(\nu, R)$ of an underlying $k$-variate t-distribution with $\nu = \sum_{i=0}^{k} (n_i - 1)$ degrees of freedom and correlation matrix $R = (r_{im})_{i,m}$ according to Tong [Ton69] and Bofinger and Bofinger [BB95],
84 / 96

85 Two-sided hypotheses: claiming equivalence III
where
$$r_{im} = \begin{cases} 1, & i = m, \\ \rho, & i \ne m,\ i, m \in \{1, 2, \dots, t\} \text{ or } i, m \in \{t+1, t+2, \dots, k\}, \\ -\rho, & i \ne m,\ i \in \{1, 2, \dots, t\} \text{ and } m \in \{t+1, t+2, \dots, k\}, \\ -\rho, & i \ne m,\ m \in \{1, 2, \dots, t\} \text{ and } i \in \{t+1, t+2, \dots, k\} \end{cases} \qquad (8)$$
with
$$\rho = \frac{1}{\sqrt{(1 + n_0/n_1)(1 + n_0/n_1)}} \qquad (9)$$
and $t = \lfloor k/2 \rfloor$ (the integral part of $k/2$). Note that the approach of Bofinger and Bofinger [BB95] is only correct when the non-control dose groups are balanced. In the unbalanced case, one cannot derive a single $\rho$ for all $i$ and $m$, and a Bonferroni-adjusted TOST approach is an alternative.
85 / 96

86 Two-sided hypotheses: claiming equivalence IV
- Example 16: Evaluation of relative thymus weights (×1000) using ETC
- Analyse using R!: parametric Bofinger approach, assuming that hazardous changes in thymus weights are possible in both directions (i.e. increase or decrease), relevance threshold $\delta = 0.15$
library(ETC)
organ <- read.csv("organ.csv")
organ$Dose <- as.factor(organ$Dose)
tym <- organ[organ$organ == "thymus", ]
tym$tym_r_weight <- tym$weight*1 ...  # remainder of the line is cut off in the source
boxplot(tym_r_weight ~ Dose, data = tym, outline = FALSE, main = "relative thym ...  # cut off in the source
points(jitter(as.integer(tym$Dose)), tym$tym_r_weight, cex = 1, pch = 21)
summary(etc.diff(tym_r_weight ~ Dose, data = tym, margin.up = 0.15, method = "bof ...  # cut off in the source
86 / 96

87 Two-sided hypotheses: claiming equivalence V
[Output table with the columns: estimate, statistic, lower, upper, p.value]
- Interpretation
87 / 96

88 Two-sided hypotheses: claiming equivalence VI
- A Bonferroni-TOST approach [HVH08]
- Bofinger's approach is limited to Gaussian distributed endpoints, variance homogeneity and balanced sample sizes in the dose groups. These are rather restrictive assumptions for real toxicological studies.
- Taking the special structure of the correlation matrix in Equation (8) into account, it becomes clear that a Bonferroni-type alternative [HKH99] does not lose much power even when all assumptions are fulfilled, is much simpler and can be generalized to several situations.
- We denote it here as the Bonferroni-TOST approach because it is based on the two one-sided t-tests (TOST) and the multiplicity adjustment according to Bonferroni.
- The loss in power at the margin of the null hypothesis (where the power equals the type I error) was about 7%, and for settings under the alternative hypothesis about 3% or less [HVH08].
88 / 96

89 Two-sided hypotheses: claiming equivalence VII
- The Bonferroni-TOST approach for ratios to control: Because of the specific structure of the correlation matrix in the Bofinger approach, there is no hope for a ratio-based version, although the ratio to control is appropriate for multiple endpoints. Fieller-type confidence intervals can be used accordingly.
- Bonferroni-TOST approach when variance heterogeneity occurs: Variance heterogeneity occurs sometimes in toxicological studies, e.g. increasing variance with increasing effects in the dose groups, where the control is a zero-dose group. Two-sided $(1 - 2\alpha^{*})100\%$ Welch-type confidence intervals are estimated for the individual comparisons $D_i - C$, each at the Bonferroni level $\alpha^{*} = \alpha/k$.
- Non-parametric Bonferroni-TOST: Non-parametric exact Hodges-Lehmann intervals [HL63] or intervals for relative effects can be used.
89 / 96
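The Bonferroni-TOST approach under variance heterogeneity can be sketched in base R with Welch-type intervals from `t.test`. This is an illustration under assumptions: the function name `bonferroni_tost`, the simulated data and the threshold are hypothetical, and the ETC package should be preferred for real analyses.

```r
# Bonferroni-TOST with Welch-type intervals: each of the k dose groups is
# compared with the control by a two-sided (1 - 2 * alpha / k) interval;
# equivalence is claimed if the interval lies entirely within (-delta, delta).
bonferroni_tost <- function(y, group, control, delta, alpha = 0.05) {
  group <- factor(group)
  doses <- setdiff(levels(group), control)
  level <- 1 - 2 * alpha / length(doses)
  res <- lapply(doses, function(d) {
    ci <- t.test(y[group == d], y[group == control],
                 var.equal = FALSE, conf.level = level)$conf.int
    data.frame(dose = d, lower = ci[1], upper = ci[2],
               equivalent = ci[1] > -delta && ci[2] < delta)
  })
  do.call(rbind, res)
}

# Simulated data for illustration only (not the thymus weights of Example 16)
set.seed(1)
sim <- data.frame(dose = rep(c("0", "62.5", "125", "250"), each = 10),
                  weight = rnorm(40, mean = 1, sd = 0.1))
bonferroni_tost(sim$weight, sim$dose, control = "0", delta = 0.15)
```

Because each comparison is handled separately, the same scheme carries over to unbalanced designs, where a single correlation $\rho$ cannot be derived.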


Good Confidence Intervals for Categorical Data Analyses. Alan Agresti

Good Confidence Intervals for Categorical Data Analyses. Alan Agresti Good Confidence Intervals for Categorical Data Analyses Alan Agresti Department of Statistics, University of Florida visiting Statistics Department, Harvard University LSHTM, July 22, 2011 p. 1/36 Outline

More information

Two Correlated Proportions Non- Inferiority, Superiority, and Equivalence Tests

Two Correlated Proportions Non- Inferiority, Superiority, and Equivalence Tests Chapter 59 Two Correlated Proportions on- Inferiority, Superiority, and Equivalence Tests Introduction This chapter documents three closely related procedures: non-inferiority tests, superiority (by a

More information

Tutorial 2: Power and Sample Size for the Paired Sample t-test

Tutorial 2: Power and Sample Size for the Paired Sample t-test Tutorial 2: Power and Sample Size for the Paired Sample t-test Preface Power is the probability that a study will reject the null hypothesis. The estimated probability is a function of sample size, variability,

More information

Type I error rate control in adaptive designs for confirmatory clinical trials with treatment selection at interim

Type I error rate control in adaptive designs for confirmatory clinical trials with treatment selection at interim Type I error rate control in adaptive designs for confirmatory clinical trials with treatment selection at interim Frank Bretz Statistical Methodology, Novartis Joint work with Martin Posch (Medical University

More information

Relax and good luck! STP 231 Example EXAM #2. Instructor: Ela Jackiewicz

Relax and good luck! STP 231 Example EXAM #2. Instructor: Ela Jackiewicz STP 31 Example EXAM # Instructor: Ela Jackiewicz Honor Statement: I have neither given nor received information regarding this exam, and I will not do so until all exams have been graded and returned.

More information

Welcome! Webinar Biostatistics: sample size & power. Thursday, April 26, 12:30 1:30 pm (NDT)

Welcome! Webinar Biostatistics: sample size & power. Thursday, April 26, 12:30 1:30 pm (NDT) . Welcome! Webinar Biostatistics: sample size & power Thursday, April 26, 12:30 1:30 pm (NDT) Get started now: Please check if your speakers are working and mute your audio. Please use the chat box to

More information

Multiple Endpoints: A Review and New. Developments. Ajit C. Tamhane. (Joint work with Brent R. Logan) Department of IE/MS and Statistics

Multiple Endpoints: A Review and New. Developments. Ajit C. Tamhane. (Joint work with Brent R. Logan) Department of IE/MS and Statistics 1 Multiple Endpoints: A Review and New Developments Ajit C. Tamhane (Joint work with Brent R. Logan) Department of IE/MS and Statistics Northwestern University Evanston, IL 60208 ajit@iems.northwestern.edu

More information

The t-test Pivots Summary. Pivots and t-tests. Patrick Breheny. October 15. Patrick Breheny Biostatistical Methods I (BIOS 5710) 1/18

The t-test Pivots Summary. Pivots and t-tests. Patrick Breheny. October 15. Patrick Breheny Biostatistical Methods I (BIOS 5710) 1/18 and t-tests Patrick Breheny October 15 Patrick Breheny Biostatistical Methods I (BIOS 5710) 1/18 Introduction The t-test As we discussed previously, W.S. Gossett derived the t-distribution as a way of

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

Didacticiel Études de cas. Parametric hypothesis testing for comparison of two or more populations. Independent and dependent samples.

Didacticiel Études de cas. Parametric hypothesis testing for comparison of two or more populations. Independent and dependent samples. 1 Subject Parametric hypothesis testing for comparison of two or more populations. Independent and dependent samples. The tests for comparison of population try to determine if K (K 2) samples come from

More information

One-sample categorical data: approximate inference

One-sample categorical data: approximate inference One-sample categorical data: approximate inference Patrick Breheny October 6 Patrick Breheny Biostatistical Methods I (BIOS 5710) 1/25 Introduction It is relatively easy to think about the distribution

More information

A Brief Introduction to Intersection-Union Tests. Jimmy Akira Doi. North Carolina State University Department of Statistics

A Brief Introduction to Intersection-Union Tests. Jimmy Akira Doi. North Carolina State University Department of Statistics Introduction A Brief Introduction to Intersection-Union Tests Often, the quality of a product is determined by several parameters. The product is determined to be acceptable if each of the parameters meets

More information

Nonparametric Statistics. Leah Wright, Tyler Ross, Taylor Brown

Nonparametric Statistics. Leah Wright, Tyler Ross, Taylor Brown Nonparametric Statistics Leah Wright, Tyler Ross, Taylor Brown Before we get to nonparametric statistics, what are parametric statistics? These statistics estimate and test population means, while holding

More information

The Design of a Survival Study

The Design of a Survival Study The Design of a Survival Study The design of survival studies are usually based on the logrank test, and sometimes assumes the exponential distribution. As in standard designs, the power depends on The

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

An introduction to biostatistics: part 1

An introduction to biostatistics: part 1 An introduction to biostatistics: part 1 Cavan Reilly September 6, 2017 Table of contents Introduction to data analysis Uncertainty Probability Conditional probability Random variables Discrete random

More information

Multiple Comparison Procedures Cohen Chapter 13. For EDUC/PSY 6600

Multiple Comparison Procedures Cohen Chapter 13. For EDUC/PSY 6600 Multiple Comparison Procedures Cohen Chapter 13 For EDUC/PSY 6600 1 We have to go to the deductions and the inferences, said Lestrade, winking at me. I find it hard enough to tackle facts, Holmes, without

More information

Sample Size Estimation for Studies of High-Dimensional Data

Sample Size Estimation for Studies of High-Dimensional Data Sample Size Estimation for Studies of High-Dimensional Data James J. Chen, Ph.D. National Center for Toxicological Research Food and Drug Administration June 3, 2009 China Medical University Taichung,

More information

In ANOVA the response variable is numerical and the explanatory variables are categorical.

In ANOVA the response variable is numerical and the explanatory variables are categorical. 1 ANOVA ANOVA means ANalysis Of VAriance. The ANOVA is a tool for studying the influence of one or more qualitative variables on the mean of a numerical variable in a population. In ANOVA the response

More information

Categorical Data Analysis Chapter 3

Categorical Data Analysis Chapter 3 Categorical Data Analysis Chapter 3 The actual coverage probability is usually a bit higher than the nominal level. Confidence intervals for association parameteres Consider the odds ratio in the 2x2 table,

More information

Semiparametric Regression

Semiparametric Regression Semiparametric Regression Patrick Breheny October 22 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/23 Introduction Over the past few weeks, we ve introduced a variety of regression models under

More information

Inference for Binomial Parameters

Inference for Binomial Parameters Inference for Binomial Parameters Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University D. Bandyopadhyay (VCU) BIOS 625: Categorical Data & GLM 1 / 58 Inference for

More information

8.1-4 Test of Hypotheses Based on a Single Sample

8.1-4 Test of Hypotheses Based on a Single Sample 8.1-4 Test of Hypotheses Based on a Single Sample Example 1 (Example 8.6, p. 312) A manufacturer of sprinkler systems used for fire protection in office buildings claims that the true average system-activation

More information

Mathematical Statistics

Mathematical Statistics Mathematical Statistics MAS 713 Chapter 8 Previous lecture: 1 Bayesian Inference 2 Decision theory 3 Bayesian Vs. Frequentist 4 Loss functions 5 Conjugate priors Any questions? Mathematical Statistics

More information

Specific Differences. Lukas Meier, Seminar für Statistik

Specific Differences. Lukas Meier, Seminar für Statistik Specific Differences Lukas Meier, Seminar für Statistik Problem with Global F-test Problem: Global F-test (aka omnibus F-test) is very unspecific. Typically: Want a more precise answer (or have a more

More information

Simultaneous Confidence Intervals for Risk Ratios in the Many-to-One Comparisons of Proportions

Simultaneous Confidence Intervals for Risk Ratios in the Many-to-One Comparisons of Proportions Western University Scholarship@Western Electronic Thesis and Dissertation Repository August 2012 Simultaneous Confidence Intervals for Risk Ratios in the Many-to-One Comparisons of Proportions Jungwon

More information

STAT 4385 Topic 01: Introduction & Review

STAT 4385 Topic 01: Introduction & Review STAT 4385 Topic 01: Introduction & Review Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 Outline Welcome What is Regression Analysis? Basics

More information

Week 14 Comparing k(> 2) Populations

Week 14 Comparing k(> 2) Populations Week 14 Comparing k(> 2) Populations Week 14 Objectives Methods associated with testing for the equality of k(> 2) means or proportions are presented. Post-testing concepts and analysis are introduced.

More information

Bios 6649: Clinical Trials - Statistical Design and Monitoring

Bios 6649: Clinical Trials - Statistical Design and Monitoring Bios 6649: Clinical Trials - Statistical Design and Monitoring Spring Semester 2015 John M. Kittelson Department of Biostatistics & Informatics Colorado School of Public Health University of Colorado Denver

More information

Two sample Hypothesis tests in R.

Two sample Hypothesis tests in R. Example. (Dependent samples) Two sample Hypothesis tests in R. A Calculus professor gives their students a 10 question algebra pretest on the first day of class, and a similar test towards the end of the

More information

Contrasts (in general)

Contrasts (in general) 10/1/015 6-09/749 Experimental Design for Behavioral and Social Sciences Contrasts (in general) Context: An ANOVA rejects the overall null hypothesis that all k means of some factor are not equal, i.e.,

More information

Introduction to the Analysis of Variance (ANOVA) Computing One-Way Independent Measures (Between Subjects) ANOVAs

Introduction to the Analysis of Variance (ANOVA) Computing One-Way Independent Measures (Between Subjects) ANOVAs Introduction to the Analysis of Variance (ANOVA) Computing One-Way Independent Measures (Between Subjects) ANOVAs The Analysis of Variance (ANOVA) The analysis of variance (ANOVA) is a statistical technique

More information

Psychology 282 Lecture #4 Outline Inferences in SLR

Psychology 282 Lecture #4 Outline Inferences in SLR Psychology 282 Lecture #4 Outline Inferences in SLR Assumptions To this point we have not had to make any distributional assumptions. Principle of least squares requires no assumptions. Can use correlations

More information

A simulation study for comparing testing statistics in response-adaptive randomization

A simulation study for comparing testing statistics in response-adaptive randomization RESEARCH ARTICLE Open Access A simulation study for comparing testing statistics in response-adaptive randomization Xuemin Gu 1, J Jack Lee 2* Abstract Background: Response-adaptive randomizations are

More information

n y π y (1 π) n y +ylogπ +(n y)log(1 π).

n y π y (1 π) n y +ylogπ +(n y)log(1 π). Tests for a binomial probability π Let Y bin(n,π). The likelihood is L(π) = n y π y (1 π) n y and the log-likelihood is L(π) = log n y +ylogπ +(n y)log(1 π). So L (π) = y π n y 1 π. 1 Solving for π gives

More information

DETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics

DETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics DETAILED CONTENTS About the Author Preface to the Instructor To the Student How to Use SPSS With This Book PART I INTRODUCTION AND DESCRIPTIVE STATISTICS 1. Introduction to Statistics 1.1 Descriptive and

More information

Basics on t-tests Independent Sample t-tests Single-Sample t-tests Summary of t-tests Multiple Tests, Effect Size Proportions. Statistiek I.

Basics on t-tests Independent Sample t-tests Single-Sample t-tests Summary of t-tests Multiple Tests, Effect Size Proportions. Statistiek I. Statistiek I t-tests John Nerbonne CLCG, Rijksuniversiteit Groningen http://www.let.rug.nl/nerbonne/teach/statistiek-i/ John Nerbonne 1/46 Overview 1 Basics on t-tests 2 Independent Sample t-tests 3 Single-Sample

More information

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model EPSY 905: Multivariate Analysis Lecture 1 20 January 2016 EPSY 905: Lecture 1 -

More information

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8 CIVL - 7904/8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8 Chi-square Test How to determine the interval from a continuous distribution I = Range 1 + 3.322(logN) I-> Range of the class interval

More information

Session 3 The proportional odds model and the Mann-Whitney test

Session 3 The proportional odds model and the Mann-Whitney test Session 3 The proportional odds model and the Mann-Whitney test 3.1 A unified approach to inference 3.2 Analysis via dichotomisation 3.3 Proportional odds 3.4 Relationship with the Mann-Whitney test Session

More information

Statistics and Probability Letters. Using randomization tests to preserve type I error with response adaptive and covariate adaptive randomization

Statistics and Probability Letters. Using randomization tests to preserve type I error with response adaptive and covariate adaptive randomization Statistics and Probability Letters ( ) Contents lists available at ScienceDirect Statistics and Probability Letters journal homepage: wwwelseviercom/locate/stapro Using randomization tests to preserve

More information

H0: Tested by k-grp ANOVA

H0: Tested by k-grp ANOVA Analyses of K-Group Designs : Omnibus F, Pairwise Comparisons & Trend Analyses ANOVA for multiple condition designs Pairwise comparisons and RH Testing Alpha inflation & Correction LSD & HSD procedures

More information

Testing Independence

Testing Independence Testing Independence Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/50 Testing Independence Previously, we looked at RR = OR = 1

More information

Small n, σ known or unknown, underlying nongaussian

Small n, σ known or unknown, underlying nongaussian READY GUIDE Summary Tables SUMMARY-1: Methods to compute some confidence intervals Parameter of Interest Conditions 95% CI Proportion (π) Large n, p 0 and p 1 Equation 12.11 Small n, any p Figure 12-4

More information

Categorical data analysis Chapter 5

Categorical data analysis Chapter 5 Categorical data analysis Chapter 5 Interpreting parameters in logistic regression The sign of β determines whether π(x) is increasing or decreasing as x increases. The rate of climb or descent increases

More information

A Simple Approximate Procedure for Constructing Binomial and Poisson Tolerance Intervals

A Simple Approximate Procedure for Constructing Binomial and Poisson Tolerance Intervals This article was downloaded by: [Kalimuthu Krishnamoorthy] On: 11 February 01, At: 08:40 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 107954 Registered office:

More information

Intro to Parametric & Nonparametric Statistics

Intro to Parametric & Nonparametric Statistics Kinds of variable The classics & some others Intro to Parametric & Nonparametric Statistics Kinds of variables & why we care Kinds & definitions of nonparametric statistics Where parametric stats come

More information

Generalized Linear Modeling - Logistic Regression

Generalized Linear Modeling - Logistic Regression 1 Generalized Linear Modeling - Logistic Regression Binary outcomes The logit and inverse logit interpreting coefficients and odds ratios Maximum likelihood estimation Problem of separation Evaluating

More information

Introduction to Statistical Analysis. Cancer Research UK 12 th of February 2018 D.-L. Couturier / M. Eldridge / M. Fernandes [Bioinformatics core]

Introduction to Statistical Analysis. Cancer Research UK 12 th of February 2018 D.-L. Couturier / M. Eldridge / M. Fernandes [Bioinformatics core] Introduction to Statistical Analysis Cancer Research UK 12 th of February 2018 D.-L. Couturier / M. Eldridge / M. Fernandes [Bioinformatics core] 2 Timeline 9:30 Morning I I 45mn Lecture: data type, summary

More information

Lecture 21: October 19

Lecture 21: October 19 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 21: October 19 21.1 Likelihood Ratio Test (LRT) To test composite versus composite hypotheses the general method is to use

More information

Step-down FDR Procedures for Large Numbers of Hypotheses

Step-down FDR Procedures for Large Numbers of Hypotheses Step-down FDR Procedures for Large Numbers of Hypotheses Paul N. Somerville University of Central Florida Abstract. Somerville (2004b) developed FDR step-down procedures which were particularly appropriate

More information

Tutorial 3: Power and Sample Size for the Two-sample t-test with Equal Variances. Acknowledgements:

Tutorial 3: Power and Sample Size for the Two-sample t-test with Equal Variances. Acknowledgements: Tutorial 3: Power and Sample Size for the Two-sample t-test with Equal Variances Anna E. Barón, Keith E. Muller, Sarah M. Kreidler, and Deborah H. Glueck Acknowledgements: The project was supported in

More information

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke BIOL 51A - Biostatistics 1 1 Lecture 1: Intro to Biostatistics Smoking: hazardous? FEV (l) 1 2 3 4 5 No Yes Smoke BIOL 51A - Biostatistics 1 2 Box Plot a.k.a box-and-whisker diagram or candlestick chart

More information

http://www.statsoft.it/out.php?loc=http://www.statsoft.com/textbook/ Group comparison test for independent samples The purpose of the Analysis of Variance (ANOVA) is to test for significant differences

More information