E509A: Principle of Biostatistics. GY Zou
1 E509A: Principle of Biostatistics (Effect measures) GY Zou gzou@robarts.ca
2 We have discussed inference procedures for 2 × 2 tables in the context of comparing two groups.

             Yes    No
  Group 1    a      b      n_1
  Group 2    c      d      n_2
             m_1    m_2    n

For hypothesis testing, we use the Pearson chi-square test. For interval estimation, we use methods for $p_1 - p_2$ (and, of course, NNT).
3 However, the Pearson chi-square test works only if the expected count in every cell is greater than 5. For data with small cells, we can use Fisher's exact test. The idea of this test is to fix the row and column totals at their observed values, and to compute the probability of observing tables that depart from the null hypothesis as much as, or more than, the observed table (recall the definition of the P-value).
Fisher (1935, The logic of inductive inference, JRSS A 98: 39-54) presented his test at the annual Christmas meeting of the Royal Statistical Society. The title is very good, because I've heard that the most important contribution of statistics to science is not the formula, but the logic. Still, right after his talk, a speaker compared Fisher's talk to "the braying of the Golden Ass".
4 The probability of observing the table, given the marginals, is
$$\Pr(a, b, c, d \mid n_1, n_2, m_1, m_2) = \frac{n_1!\, n_2!\, m_1!\, m_2!}{n!\, a!\, b!\, c!\, d!} \qquad (1)$$
Fisher's procedure requires the probability of every more extreme table to be computed, using Eq. (1) repeatedly. The p-value of the test is obtained by definition: it is the sum of all those probabilities. Thus, Fisher's exact test is essentially one-sided. If a two-sided test is called for, the simplest way to obtain it is to double the p-value. This is exactly what SAS proc freq gives you when $n_1 = n_2$.
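5 Example of Fisher's exact test (p. 375).

             Yes    No
  Group 1    8      16     24
  Group 2    1      23     24
             9      39     48

$$\Pr(a, b, c, d \mid 48, 9, 24) = \frac{24!\, 24!\, 9!\, 39!}{48!\, 8!\, 16!\, 1!\, 23!} \approx 0.0105$$
A more extreme table is:

             Yes    No
  Group 1    9      15     24
  Group 2    0      24     24
             9      39     48

$$\Pr(a, b, c, d \mid 48, 9, 24) = \frac{24!\, 24!\, 9!\, 39!}{48!\, 9!\, 15!\, 0!\, 24!} \approx 0.0008$$
The p-value is then approximately 0.0105 + 0.0008 = 0.0113.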
6 SAS function for hypergeometric probability

  * pdf('HYPER', a, n, n1, m1);
  data;
    bb = pdf('HYPER', 8, 48, 24, 9);
    cc = pdf('HYPER', 9, 48, 24, 9);
    dd = bb + cc;
  proc print;
  run;

proc print lists bb, cc and dd, where dd is the one-sided p-value. The two-sided p-value is then 2 × dd.
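7 For a second table (Yes/No by Group 1/Group 2) with n = 44, n_1 = 9 and m_1 = 24, the probabilities of all possible tables are listed by:

  data fisher;
    do i = 0 to 9;
      bb = pdf('HYPER', i, 44, 9, 24);
      output;
    end;
  proc print;
  run;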
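8 The SAS output lists i and bb, the probability of the table with a = i.
The one-sided p-value is the sum of the probabilities of the observed table and of all more extreme tables.
Another way is the mid-p-value (Lancaster 1961, JASA 56): the one-sided mid-p value counts only 1/2 of the probability of the observed table, plus the probabilities of all more extreme tables. The two-sided mid-p value is 2 × (one-sided mid-p).
Alternatively, the two-sided p-value is given by the sum of the probabilities of all tables that are no more likely than the observed table. This is how SAS obtains the two-sided p-value.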
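The four p-values discussed on the Fisher slides (one-sided, doubled, mid-p, and the probability-ordered two-sided version) can be sketched in Python. This is an illustrative sketch, not the SAS implementation; the function names are made up for this example, and the data are the 8, 16, 1, 23 table from the factorial formula on slide 5.

```python
from math import comb

def table_prob(a, n1, n2, m1):
    """Eq. (1): hypergeometric probability of cell a with all margins fixed."""
    return comb(n1, a) * comb(n2, m1 - a) / comb(n1 + n2, m1)

def fisher_p_values(a, b, c, d):
    """Return several Fisher-type p-values for the table [[a, b], [c, d]]."""
    n1, n2, m1 = a + b, c + d, a + c
    lo, hi = max(0, m1 - n2), min(n1, m1)          # possible values of cell a
    probs = {i: table_prob(i, n1, n2, m1) for i in range(lo, hi + 1)}
    p_obs = probs[a]
    upper = sum(p for i, p in probs.items() if i >= a)
    lower = sum(p for i, p in probs.items() if i <= a)
    one_sided = min(upper, lower)
    return {
        "one_sided": one_sided,
        "doubled": min(1.0, 2 * one_sided),
        # mid-p: only half of the observed table's probability is counted
        "mid_p_one_sided": one_sided - 0.5 * p_obs,
        # sum over all tables that are no more likely than the observed one
        "two_sided_prob": sum(p for p in probs.values()
                              if p <= p_obs * (1 + 1e-7)),
    }

result = fisher_p_values(8, 16, 1, 23)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Because n_1 = n_2 in this table, the doubled and probability-ordered two-sided p-values coincide, which is the situation the slides describe for proc freq.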
9 The way to present the results is: Rate in Group I was ?, in Group II was ?; difference ? (95% confidence interval ? to ?), P = ? (Fisher's exact test, two-sided mid-P).
10 McKinney et al (1989, The inexact use of Fisher's exact test in six major medical journals, JAMA 261). Half of the 70 articles reviewed either had used a one-tailed test when a two-tailed test was called for, or the authors simply had not bothered to state which test they had used.
11 If hypothesis testing were all there was, an epidemiologist's life would be too easy. Effect estimation makes it hard, but also interesting.
12 Besides randomized studies, there are more ways of generating a 2 × 2 table:
  cross-sectional (naturalistic, multinomial) sampling: select a total of N subjects, then determine for each subject the presence or absence of characteristics A and B;
  retrospective sampling: predetermine n_1 subjects who possess A and n_2 who do not possess A, then determine B in each group, where A is usually a disease of interest and B is a risk factor. This is the case-control study;
  prospective sampling: similar to case-control, except that A and B are switched. This is the cohort study.
13 Cross-sectional sample to estimate risk ratio (relative risk, RR)

                    Outcome (D)
  Exposure (E)    Yes (+)   No (-)
  1 (Yes, +)      a         b        n_1
  2 (No, -)       c         d        n_2
                  m_1       m_2      n

Risk ratio is defined by
$$RR = \frac{\Pr(D^+ \mid E^+)}{\Pr(D^+ \mid E^-)}$$
The estimated RR is
$$\widehat{RR} = \frac{a/n_1}{c/n_2}$$
14 The variance of $\ln \widehat{RR}$ is estimated by
$$\widehat{\mathrm{var}}[\ln(\widehat{RR})] = \frac{1}{a} - \frac{1}{n_1} + \frac{1}{c} - \frac{1}{n_2}.$$
The 95% CI for RR is obtained from the CI for ln(RR), because the sampling distribution of $\ln \widehat{RR}$ is closer to Normal than that of $\widehat{RR}$:
$$l, u = \ln(\widehat{RR}) \pm 1.96 \sqrt{\widehat{\mathrm{var}}(\ln \widehat{RR})}$$
The CI for RR is then given by $(\exp(l), \exp(u))$.
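15 Example. Data for 200 mothers and the birthweight of their babies are as follows.

                      Outcome (birthweight, g)
  Maternal age      ≤ 2500      > 2500
  < 20              a = 10      b = 40      n_1 = 50
  ≥ 20              c = 15      d = 135     n_2 = 150
                    m_1 = 25    m_2 = 175   n = 200

$$\widehat{RR} = \frac{10/50}{15/150} = 2$$
with the variance estimate for $\ln \widehat{RR}$ given by
$$\widehat{\mathrm{var}}[\ln(\widehat{RR})] = \frac{1}{10} - \frac{1}{50} + \frac{1}{15} - \frac{1}{150} = 0.14$$
The 95% CI for RR is $\exp[\ln(2) \pm 1.96\sqrt{0.14}] = (0.96, 4.16)$.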
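As a check on the arithmetic, the log-scale interval for the risk ratio can be sketched in a few lines of Python. The function name and return layout are illustrative assumptions; the formulas are the ones on slides 13 and 14, applied to the maternal age example.

```python
from math import exp, log, sqrt

def risk_ratio_ci(a, n1, c, n2, z=1.96):
    """Risk ratio, var[ln RR], and the CI obtained on the log scale."""
    rr = (a / n1) / (c / n2)
    var_log = 1 / a - 1 / n1 + 1 / c - 1 / n2
    half_width = z * sqrt(var_log)
    ci = (exp(log(rr) - half_width), exp(log(rr) + half_width))
    return rr, var_log, ci

# Maternal age example: a = 10 of n1 = 50, c = 15 of n2 = 150
rr, var_log_rr, (rr_low, rr_high) = risk_ratio_ci(10, 50, 15, 150)
print(f"RR = {rr:.2f}, var = {var_log_rr:.3f}, "
      f"95% CI = ({rr_low:.2f}, {rr_high:.2f})")
```

The interval includes 1 even though the point estimate is 2, which reflects the small number of low-birthweight babies in each group.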
16 Levin's attributable risk fraction: How much would risk be reduced if the exposure were eliminated? E.g., force all the smokers in London to leave town.
Since people with disease are of two mutually exclusive types, those who were exposed and those who were not exposed, we have
$$\Pr(D^+) = \Pr(D^+ \cap E^+) + \Pr(D^+ \cap E^-) = \Pr(D^+ \mid E^+)\Pr(E^+) + \Pr(D^+ \mid E^-)\Pr(E^-)$$
If $E^+$ could not cause disease, we would expect people with exposure ($E^+$) to have the same disease rate as those who were not exposed, i.e., $\Pr(D^+ \mid E^-)$. Thus, the proportion of people who would be both exposed and diseased, if the exposure could not cause disease, is given by
$$\Pr(D^+ \mid E^-)\Pr(E^+)$$
17 Levin (1953, Acta Unio Int contra Cancrum 19) defined the Attributable Fraction as
$$R_A = \frac{\text{actual} - \text{counterfactual}}{\text{actual}} = \frac{\Pr(D^+ \mid E^+)\Pr(E^+) - \Pr(D^+ \mid E^-)\Pr(E^+)}{\Pr(D^+)}$$
$$= \frac{\Pr(E^+)[\Pr(D^+ \mid E^+) - \Pr(D^+ \mid E^-)]}{\Pr(D^+ \mid E^+)\Pr(E^+) + \Pr(D^+ \mid E^-)\Pr(E^-)}$$
$$= \frac{\Pr(E^+)[\Pr(D^+ \mid E^+)/\Pr(D^+ \mid E^-) - 1]}{\Pr(E^+)\Pr(D^+ \mid E^+)/\Pr(D^+ \mid E^-) + \Pr(E^-)}$$
$$= \frac{\Pr(E^+)[RR - 1]}{\Pr(E^+)RR + 1 - \Pr(E^+)}$$
so that
$$R_A = \frac{\Pr(E^+)(RR - 1)}{1 + \Pr(E^+)(RR - 1)}$$
18
$$R_A = \frac{\Pr(E^+)(RR - 1)}{1 + \Pr(E^+)(RR - 1)}$$
is estimated by
$$\hat{R}_A = \frac{\widehat{\Pr}(E^+)(\widehat{RR} - 1)}{1 + \widehat{\Pr}(E^+)(\widehat{RR} - 1)}$$
with the estimated variance for $\ln(1 - \hat{R}_A)$ given by
$$\widehat{\mathrm{var}}[\ln(1 - \hat{R}_A)] = \frac{b + \hat{R}_A(a + d)}{nc}$$
see Fleiss (1979, Am J Epidemiol 110).
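19 Example: Infant mortality by birthweight for n live births in New York City in 1974. What percentage of deaths could have been prevented if low birthweight had been eliminated?

                    Status at 1 yr
  Birthweight     Dead     Alive    Total
  ≤ 2500 g        a/n      b/n      (a + b)/n
  > 2500 g        c/n      d/n      (c + d)/n
  Total                             n/n = 1

$$\widehat{RR} = \frac{a/(a + b)}{c/(c + d)}, \qquad \widehat{\Pr}(E^+) = 0.0717$$
20
$$\hat{R}_A = \frac{\widehat{\Pr}(E^+)(\widehat{RR} - 1)}{1 + \widehat{\Pr}(E^+)(\widehat{RR} - 1)}, \qquad \widehat{\Pr}(E^+) = 0.0717$$
The variance for $\ln(1 - \hat{R}_A)$ is
$$\widehat{\mathrm{var}}[\ln(1 - \hat{R}_A)] = \frac{b + \hat{R}_A(a + d)}{nc}, \quad \text{i.e.,} \quad \widehat{\mathrm{s.e.}}[\ln(1 - \hat{R}_A)] = \sqrt{\widehat{\mathrm{var}}[\ln(1 - \hat{R}_A)]}$$
The 95% CI for $\ln(1 - R_A)$ is
$$\ln(1 - \hat{R}_A) \pm 1.96\, \widehat{\mathrm{s.e.}}[\ln(1 - \hat{R}_A)] = (-0.900, -0.755)$$
21 The CI for $R_A$ is then given by
$$[1 - \exp(-0.755),\ 1 - \exp(-0.900)] = (0.530, 0.593)$$
With 95% confidence, between 53% and 59% of all infant deaths in New York City in 1974 could have been prevented if low birthweight had been eliminated.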
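The attributable-fraction calculation can be sketched as follows. The function name is an assumption for illustration, and because the New York City cell counts are not available here, the sketch re-uses the maternal age table (a = 10, b = 40, c = 15, d = 135) purely as stand-in data.

```python
from math import exp, log, sqrt

def attributable_fraction(a, b, c, d, z=1.96):
    """Levin's attributable fraction with a CI built on ln(1 - R_A)."""
    n = a + b + c + d
    p_exposed = (a + b) / n                      # estimate of Pr(E+)
    rr = (a / (a + b)) / (c / (c + d))
    ra = p_exposed * (rr - 1) / (1 + p_exposed * (rr - 1))
    var_log = (b + ra * (a + d)) / (n * c)       # var[ln(1 - R_A)]
    centre, half_width = log(1 - ra), z * sqrt(var_log)
    # Larger ln(1 - R_A) means smaller R_A, so the limits swap on back-transformation
    ci = (1 - exp(centre + half_width), 1 - exp(centre - half_width))
    return ra, var_log, ci

ra, var_log_ra, (ra_low, ra_high) = attributable_fraction(10, 40, 15, 135)
print(f"R_A = {ra:.3f}, 95% CI = ({ra_low:.3f}, {ra_high:.3f})")
```

With these stand-in counts the lower limit is negative, which is expected whenever the interval for RR includes 1.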
22 Attributable risk among the exposed
$$R_E = 1 - \frac{1}{RR}$$
which is widely used in the law to describe the excess risk as a fraction of the risk among those exposed to the antecedent factor. The estimator is
$$\hat{R}_E = 1 - \frac{1}{\widehat{RR}}$$
Estimation can be conducted through $\widehat{RR}$.
24 A cohort study may be used to estimate all of the above effect measures.
25 Case-control study and odds ratio
Rothman (Modern Epidemiology, 1986, p. 62): "The sophisticated use and understanding of case-control studies is the most outstanding methodological development of modern epidemiology."
My understanding of the case-control study is from Breslow (Statistics in epidemiology: the case-control study. J Am Stat Assoc 91: 14-28).
26 Recall that the case-control design involves selecting n_1 subjects who have the disease ($D^+$) and n_2 who do not ($D^-$), followed by the determination of exposure ($X^+$ or $X^-$) in each group. In general, the case-control design can provide only the ratio of the exposure odds of the case group to that of the control group, i.e.
$$OR_e = \frac{\text{exposure odds}_{\text{case}}}{\text{exposure odds}_{\text{control}}}$$
Since odds is defined as $P/(1 - P)$,
$$OR_e = \frac{\Pr(X^+ \mid D^+)\,/\,[1 - \Pr(X^+ \mid D^+)]}{\Pr(X^+ \mid D^-)\,/\,[1 - \Pr(X^+ \mid D^-)]} = \frac{\Pr(X^+ \mid D^+)\Pr(X^- \mid D^-)}{\Pr(X^+ \mid D^-)\Pr(X^- \mid D^+)}$$
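27 For $OR_e$ to be useful, it must have some relationship with the risk ratio $\Pr(D^+ \mid X^+)/\Pr(D^+ \mid X^-)$. Enter Cornfield (1951, J Natl Cancer Inst 11), who showed that
1)
$$OR_e = \frac{\Pr(X^+ \mid D^+)\Pr(X^- \mid D^-)}{\Pr(X^+ \mid D^-)\Pr(X^- \mid D^+)} = \frac{\dfrac{\Pr(D^+ \mid X^+)\Pr(X^+)}{\Pr(D^+)} \cdot \dfrac{\Pr(D^- \mid X^-)\Pr(X^-)}{\Pr(D^-)}}{\dfrac{\Pr(D^- \mid X^+)\Pr(X^+)}{\Pr(D^-)} \cdot \dfrac{\Pr(D^+ \mid X^-)\Pr(X^-)}{\Pr(D^+)}} = \frac{\Pr(D^+ \mid X^+)\Pr(D^- \mid X^-)}{\Pr(D^- \mid X^+)\Pr(D^+ \mid X^-)} = OR_d$$
2)
$$\frac{\Pr(D^- \mid X^-)}{\Pr(D^- \mid X^+)} \to 1 \quad \text{when } \Pr(D^+) \to 0,$$
which implies that $OR_d \approx RR_d$.
To see 2), observe
$$\Pr(D^- \mid X^-) = \frac{\Pr(X^- \mid D^-)\Pr(D^-)}{\Pr(X^-)} = \frac{\Pr(X^- \mid D^-)\Pr(D^-)}{\Pr(X^- \mid D^-)\Pr(D^-) + \Pr(X^- \mid D^+)\Pr(D^+)} \to 1 \quad \text{when } \Pr(D^+) \to 0.$$
Similarly, $\Pr(D^- \mid X^+) \to 1$ when $\Pr(D^+) \to 0$.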
28 Thus, case-control studies are indeed useful. In fact, Mantel & Haenszel (1959, J Natl Cancer Inst 22) stated: "Among the desirable attributes of the retrospective study is the ability to yield results from presently collectible data... The retrospective approach is also adapted to the limited resources of an individual investigator... For especially rare disease a retrospective study may be the only feasible approach... In the absence of important biases in the study setting, the retrospective method could be regarded, according to sound statistical theory, as the study method of choice" (p. 720).
This was almost 50 years ago. The recent epidemiologic literature has seen more and more prospective studies, except for the genetic epidemiologic literature, in which the case-control design is almost universal. A more detailed discussion of statistical issues may be found in Zou (2006, Annals of Human Genetics 70).
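29
                 Status
  Exposure     Case    Control
  Yes          a       b         n_1
  No           c       d         n_2

The odds ratio in a case-control study is estimated by
$$\widehat{OR} = \frac{ad}{bc}$$
with the variance of $\ln(\widehat{OR})$ estimated by
$$\widehat{\mathrm{var}}[\ln(\widehat{OR})] = \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}$$
30 Thus, the $(1 - \alpha)100\%$ CI for OR is given by
$$\exp\left[\ln(\widehat{OR}) \pm Z_{1-\alpha/2}\sqrt{\widehat{\mathrm{var}}[\ln(\widehat{OR})]}\right]$$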
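31 Example. Sun protection during childhood by case-control status for cutaneous melanoma in Belgium, France and Germany. The 2 × 2 table cross-classifies sun protection (Yes/No) by status (Case/Control).
$$\widehat{OR} = \frac{ad}{bc} = 0.72, \qquad \widehat{\mathrm{var}}[\ln(\widehat{OR})] = \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}$$
The 95% CI for OR is
$$\exp\left[\ln(0.72) \pm 1.96\sqrt{\widehat{\mathrm{var}}[\ln(\widehat{OR})]}\right] = (0.53, 0.98)$$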
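The odds ratio and its log-scale interval can be sketched in Python as below. The function name is illustrative. Since the melanoma cell counts are not available here, the sketch uses the Study 1 counts from the power-line example later in these notes (a = 18, b = 25, c = 162, d = 252), for which the slides report an odds ratio of 1.12.

```python
from math import exp, log, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/(bc), var[ln OR], and the CI built on the log scale."""
    odds_ratio = (a * d) / (b * c)
    var_log = 1 / a + 1 / b + 1 / c + 1 / d
    half_width = z * sqrt(var_log)
    ci = (exp(log(odds_ratio) - half_width), exp(log(odds_ratio) + half_width))
    return odds_ratio, var_log, ci

or_hat, var_log_or, (or_low, or_high) = odds_ratio_ci(18, 25, 162, 252)
print(f"OR = {or_hat:.2f}, 95% CI = ({or_low:.2f}, {or_high:.2f})")

# For any interval built this way, lower x upper equals the squared estimate
print(f"lower x upper = {or_low * or_high:.4f}, OR^2 = {or_hat ** 2:.4f}")
```

The function assumes all four cells are positive; a zero cell makes the variance formula undefined.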
32 Recall that the odds ratio is estimated by $\widehat{OR} = ad/(bc)$ from the table

                 Status
  Exposure     Case    Control
  Yes          a       b         n_1
  No           c       d         n_2

Now consider

             Status
  trt      Yes       No
  1        a = 0     b = 14    n_1 = 14
  2        c = 0     d = 11    n_2 = 11

Here $ad = bc = 0$, so $\widehat{OR}$ does not exist. This is a data set discussed by Parzen M, Lipsitz S, Ibrahim J, Klar N. An estimate of the odds ratio that always exists. J Comput Graph Stat 11.
33 Exact confidence interval for OR
It is available in SAS proc freq, which computes exact confidence limits for the odds ratio with an algorithm by Thomas (1971, Applied Statistics 20), i.e., the limits L and U are iterative solutions of the following two equations:
$$\frac{\sum_{i=a}^{m_1} \binom{n_1}{i}\binom{n_2}{m_1 - i} L^i}{\sum_{i=0}^{m_1} \binom{n_1}{i}\binom{n_2}{m_1 - i} L^i} = \alpha/2$$
$$\frac{\sum_{i=0}^{a} \binom{n_1}{i}\binom{n_2}{m_1 - i} U^i}{\sum_{i=0}^{m_1} \binom{n_1}{i}\binom{n_2}{m_1 - i} U^i} = \alpha/2$$
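The two equations can be solved numerically. The sketch below uses plain bisection on the log odds ratio, which is an assumption made for simplicity and is not claimed to be the algorithm of Thomas (1971) or the one in proc freq; all function names are illustrative. The data are the 8, 16, 1, 23 table from the Fisher example.

```python
from math import comb, exp, inf, log

def conditional_probs(n1, n2, m1, log_psi):
    """Probabilities of cell a given the margins when the odds ratio is exp(log_psi)."""
    lo, hi = max(0, m1 - n2), min(n1, m1)
    logs = [log(comb(n1, i)) + log(comb(n2, m1 - i)) + i * log_psi
            for i in range(lo, hi + 1)]
    top = max(logs)                                # guard against overflow
    weights = [exp(v - top) for v in logs]
    total = sum(weights)
    return lo, [w / total for w in weights]

def upper_tail(a, n1, n2, m1, log_psi):
    lo, probs = conditional_probs(n1, n2, m1, log_psi)
    return sum(probs[a - lo:])                     # Pr(A >= a)

def lower_tail(a, n1, n2, m1, log_psi):
    lo, probs = conditional_probs(n1, n2, m1, log_psi)
    return sum(probs[:a - lo + 1])                 # Pr(A <= a)

def bisect_increasing(f, target, lo=-30.0, hi=30.0, steps=200):
    """Solve f(x) = target for an increasing function f on [lo, hi]."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_or_ci(a, b, c, d, alpha=0.05):
    n1, n2, m1 = a + b, c + d, a + c
    a_min, a_max = max(0, m1 - n2), min(n1, m1)
    lower = 0.0 if a == a_min else exp(bisect_increasing(
        lambda x: upper_tail(a, n1, n2, m1, x), alpha / 2))
    upper = inf if a == a_max else exp(bisect_increasing(
        lambda x: -lower_tail(a, n1, n2, m1, x), -alpha / 2))
    return lower, upper

low, high = exact_or_ci(8, 16, 1, 23)
print(f"exact 95% CI for OR: ({low:.2f}, {high:.2f})")
```

The sums run over the possible values of the cell only, which matches the sums from 0 to m_1 in the equations because the binomial coefficients vanish elsewhere.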
34 Exact may not be the best in categorical data analysis, because the results may be too conservative.
Agresti A. Dealing with discreteness: making exact confidence intervals for proportions, differences of proportions, and odds ratios more exact. Stat Meth Med Res 12 (1):
"the inversion of the asymptotic score test seems to be a good choice. This tends to have actual level fluctuating around the nominal level. If one prefers that level to be a bit more conservative, mid-p adaptations of exact methods work well. For situations that require a bound on the error, it appears that basing conservative intervals on inverting the exact score test has reasonable performance. For teaching, the Wald-type interval of point estimate plus and minus a normal-score multiple of a standard error is simplest. Unfortunately, this can perform poorly, but simple adjustments sometimes provide much improved performance."
35 Odds ratio versus risk ratio. Most traditional statistical methods in epidemiology were developed in the context of the case-control design; specifically, the OR was the effect measure of choice. Unfortunately, in prospective or cross-sectional designs the rare disease assumption may not be satisfied. In such cases, the OR becomes very difficult to interpret and is sometimes misleading (see NEJM 1999;341:279-83). For RR = 2, the OR varies with $p_1$ and $p_2$:
$$OR = \frac{p_1/(1 - p_1)}{p_2/(1 - p_2)}$$
36 My view is that we must always remember why Cornfield (1951) proposed the OR. An excellent discussion of the choice of effect measures may be found in Greenland (Interpretation and choice of effect measures in epidemiologic analyses. Am J Epidemiol 1987;125:761-8).
37 Converting OR to RR (Zhang & Yu, 1998, JAMA 280):
$$RR = \frac{OR}{1 - p_2 + p_2\, OR}, \qquad \widehat{RR} = \frac{\widehat{OR}}{1 - p_2 + p_2\, \widehat{OR}} \qquad (2)$$
Eq. (2) results in a correct point estimate only if there are no confounders. Substituting the confidence limits for OR to obtain a CI for RR yields an invalid interval for RR. More discussion can be found in McNutt et al (2003, Am J Epidemiol 157:940-3) and Zou (2004, Am J Epidemiol 159:702-6).
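To see how far the OR can drift from RR = 2 as the baseline risk $p_2$ grows, and how Eq. (2) maps it back, a short Python sketch follows. The function names and the chosen values of $p_2$ are illustrative assumptions.

```python
def odds_ratio_from_risks(p1, p2):
    """OR implied by risk p1 in the exposed and p2 in the unexposed."""
    return (p1 / (1 - p1)) / (p2 / (1 - p2))

def rr_from_or(odds_ratio, p2):
    """Eq. (2): convert an OR to an RR given the baseline risk p2."""
    return odds_ratio / (1 - p2 + p2 * odds_ratio)

print("  p2      OR   RR from Eq. (2)")
for p2 in (0.01, 0.10, 0.20, 0.40):
    p1 = 2 * p2                                    # RR = 2 throughout
    or_value = odds_ratio_from_risks(p1, p2)
    print(f"{p2:5.2f}  {or_value:6.2f}  {rr_from_or(or_value, p2):6.2f}")
```

The conversion is exact algebra for a single unadjusted table, so it recovers RR = 2 in every row; the caveats about confounding and about converting confidence limits still apply.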
38 A little trick to check calculations when a confidence interval is constructed through log-transformation (Lee PN, Stat Med 18): such an interval should satisfy the rule that the square of the point estimate equals the product of the lower and upper limits.
$$l = \exp\left[\ln(\text{point}) - Z\sqrt{\mathrm{var}(\ln \text{point})}\right], \qquad u = \exp\left[\ln(\text{point}) + Z\sqrt{\mathrm{var}(\ln \text{point})}\right]$$
thus
$$l \times u = \exp(2 \ln \text{point}) = (\text{point})^2$$
39 Sample size estimation with SAS proc power

  proc power;
    twosamplefreq test=pchi
      relativerisk = 1.5
      refproportion = 0.2
      power = 0.8
      ntotal = .;
  run;

40 The POWER Procedure
Pearson Chi-square Test for Two Proportions

  Fixed Scenario Elements
  Distribution                      Asymptotic normal
  Method                            Normal approximation
  Reference (Group 1) Proportion    0.2
  Relative Risk                     1.5
  Nominal Power                     0.8
  Number of Sides                   2
  Null Relative Risk                1
  Alpha                             0.05
  Group 1 Weight                    1
  Group 2 Weight                    1

  Computed N Total
  Actual Power    N Total
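For intuition about where such a sample size comes from, the classical normal-approximation formula for comparing two proportions with equal group sizes can be sketched in Python. This is a textbook formula, shown under the assumption that it is a reasonable stand-in; it is not claimed to reproduce the proc power output, and the function name is made up for this example.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided test of p1 = p2 (equal groups)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return ceil((numerator / (p1 - p2)) ** 2)

# Reference proportion 0.2 and relative risk 1.5, as in the proc power call
n_group = n_per_group(0.2, 0.2 * 1.5)
print(f"n per group = {n_group}, n total = {2 * n_group}")
```

Rounding up is applied per group, so the total is always an even number under equal allocation.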
41
  proc power;
    twosamplefreq test=pchi
      oddsratio = 2.5
      refproportion = 0.3
      groupweights = (1 2)
      ntotal = .
      power = 0.8;
  run;

42 The POWER Procedure
Pearson Chi-square Test for Two Proportions

  Fixed Scenario Elements
  Distribution                      Asymptotic normal
  Method                            Normal approximation
  Reference (Group 1) Proportion    0.3
  Odds Ratio                        2.5
  Group 1 Weight                    1
  Group 2 Weight                    2
  Nominal Power                     0.8
  Number of Sides                   2
  Null Odds Ratio                   1
  Alpha                             0.05

  Computed N Total
  Actual Power    N Total

43
  proc power;
    twosamplefreq test=fisher
      groupproportions = (.35 .15)
      power = 0.80
      npergroup = .;
  run;

44 The POWER Procedure
Fisher's Exact Conditional Test for Two Proportions

  Fixed Scenario Elements
  Distribution          Exact conditional
  Method                Walters normal approximation
  Group 1 Proportion    0.35
  Group 2 Proportion    0.15
  Nominal Power         0.8
  Number of Sides       2
  Alpha                 0.05

  Computed N Per Group
  Actual Power    N Per Group
46 Combine information from multiple 2 × 2 tables (Mantel-Haenszel methods)
47
                  Outcome
  Exposure      Yes       No
  Yes           a_k       b_k       n_1k
  No            c_k       d_k       n_2k
                m_1k      m_2k      n_k

MH test for no association between exposure and outcome (Mantel & Haenszel, 1959, J Natl Cancer Inst 22):
$$\chi^2_{MH} = \frac{\left[\sum_k \left(a_k - \dfrac{m_{1k} n_{1k}}{n_k}\right)\right]^2}{\sum_k \dfrac{m_{1k} m_{2k} n_{1k} n_{2k}}{n_k^2 (n_k - 1)}}$$
which is distributed as chi-square with one degree of freedom under $H_0$. Five years earlier, Cochran (1954, Biometrics 10) proposed a test that is virtually identical to $\chi^2_{MH}$ (difference?). When there is a single stratum, $\chi^2_{MH}$ reduces to the Pearson chi-square test for a 2 × 2 table.
48 Mantel-Haenszel odds ratio estimator (1959)
$$\widehat{OR}_{MH} = \frac{\sum_k a_k d_k / n_k}{\sum_k b_k c_k / n_k}$$
For 20 years, nobody knew the standard error of $\widehat{OR}_{MH}$. Hauck (1979, Biometrics 35) provided a formula that is valid when each table is large.
49 The popular variance formula for $\widehat{OR}_{MH}$ is the one derived by Robins, Breslow, & Greenland (1986, Biometrics 42). Recall
$$\widehat{OR}_{MH} = \frac{\sum_k a_k d_k / n_k}{\sum_k b_k c_k / n_k} = \frac{\sum_k R_k}{\sum_k S_k}$$
Define two more terms, $P_k = (a_k + d_k)/n_k$ and $Q_k = (b_k + c_k)/n_k$. Then
$$\widehat{\mathrm{var}}[\ln(\widehat{OR}_{MH})] = \frac{\sum_k P_k R_k}{2\left(\sum_k R_k\right)^2} + \frac{\sum_k (P_k S_k + Q_k R_k)}{2\left(\sum_k R_k\right)\left(\sum_k S_k\right)} + \frac{\sum_k Q_k S_k}{2\left(\sum_k S_k\right)^2}$$
50 $(1 - \alpha)100\%$ CI for OR:
$$\exp\left[\ln\left(\widehat{OR}_{MH}\right) \pm Z_{1-\alpha/2}\sqrt{\widehat{\mathrm{var}}[\ln(\widehat{OR}_{MH})]}\right]$$
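51 Example (case-control study): $\widehat{OR}_{MH}$. Case-control studies on the role of high voltage power lines in the etiology of leukemia in children (Hanley & Thériault, 2000, Epidemiology 11(5): 613). The Study 2 counts are obtained by subtracting Study 1 from the pooled table on the next slide.

  Study 1       Case          Control
  < 100 m       a_k = 18      b_k = 25      n_1k = 43
  > 100 m       c_k = 162     d_k = 252     n_2k = 414
                m_1k = 180    m_2k = 277    n_k = 457

  Study 2       Case          Control
  < 100 m       a_k = 12      b_k = 123     n_1k = 135
  > 100 m       c_k = 26      d_k = 431     n_2k = 457
                m_1k = 38     m_2k = 554    n_k = 592

$\widehat{OR}_1 = 1.12$, $\widehat{OR}_2 = 1.62$
52 If we do not stratify, we have

                Status
                Case        Control
  < 100 m       a = 30      b = 148
  > 100 m       c = 188     d = 683

$$\widehat{OR} = \frac{30 \times 683}{148 \times 188} = 0.74,$$
i.e., living closer to power lines appears to protect children from leukemia. However,
$$\widehat{OR}_{MH} = \frac{18 \times 252/457 + 12 \times 431/592}{25 \times 162/457 + 123 \times 26/592} = \frac{18.66}{14.26} = 1.31$$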
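The contrast between the pooled and the Mantel-Haenszel odds ratio can be reproduced with a short Python sketch. The function name is illustrative, and the Study 2 counts (12, 123, 26, 431) are an assumption obtained by subtracting Study 1 from the pooled table; they agree with the reported odds ratio of 1.62 for Study 2.

```python
from math import exp, log, sqrt

def mantel_haenszel_or(tables, z=1.96):
    """MH odds ratio with the Robins-Breslow-Greenland variance of its log."""
    sum_r = sum_s = sum_pr = sum_ps_qr = sum_qs = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        r, s = a * d / n, b * c / n
        p, q = (a + d) / n, (b + c) / n
        sum_r += r
        sum_s += s
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
    or_mh = sum_r / sum_s
    var_log = (sum_pr / (2 * sum_r ** 2)
               + sum_ps_qr / (2 * sum_r * sum_s)
               + sum_qs / (2 * sum_s ** 2))
    half_width = z * sqrt(var_log)
    ci = (exp(log(or_mh) - half_width), exp(log(or_mh) + half_width))
    return or_mh, var_log, ci

strata = [(18, 25, 162, 252),    # Study 1
          (12, 123, 26, 431)]    # Study 2, derived by subtraction (assumption)
or_mh, var_log_mh, ci_mh = mantel_haenszel_or(strata)

# Pooled (unstratified) odds ratio for comparison
a, b, c, d = (sum(cells) for cells in zip(*strata))
or_pooled = (a * d) / (b * c)
print(f"pooled OR = {or_pooled:.2f}, MH OR = {or_mh:.2f}, "
      f"95% CI = ({ci_mh[0]:.2f}, {ci_mh[1]:.2f})")
```

Both study-specific odds ratios exceed 1 while the pooled one falls below 1, which is the Simpson's paradox pattern that stratification guards against.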
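53 The Mantel-Haenszel technique has also been used to derive an RR estimator (Tarone, 1981, J Chronic Dis 34):
$$\widehat{RR}_{MH} = \frac{\sum_k a_k n_{2k} / n_k}{\sum_k c_k n_{1k} / n_k}$$
with variance given by
$$\widehat{\mathrm{var}}[\ln(\widehat{RR}_{MH})] = \frac{\sum_k \left[n_{1k} n_{2k} m_{1k} - a_k c_k n_k\right] / n_k^2}{\left(\sum_k a_k n_{2k} / n_k\right)\left(\sum_k c_k n_{1k} / n_k\right)}$$

                  Outcome
  Exposure      Yes       No
  Yes           a_k       b_k       n_1k
  No            c_k       d_k       n_2k
                m_1k      m_2k      n_k

54 $(1 - \alpha)100\%$ CI for RR:
$$\exp\left[\ln\left(\widehat{RR}_{MH}\right) \pm Z_{1-\alpha/2}\sqrt{\widehat{\mathrm{var}}[\ln(\widehat{RR}_{MH})]}\right]$$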
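55 Ex 8.5. Example (clinical trial): $\widehat{RR}_{MH}$, with two age strata (Age 65+ and Age 65-). The Age 65+ stratum is:

  Drug      Yes          No
  B         a_k = 32     b_k = 8      n_1k = 40
  A         c_k = 24     d_k = 36     n_2k = 60
            m_1k = 56    m_2k = 44    n_k = 100

$$\chi^2_{MH} = \frac{\left[\sum_k \left(a_k - \dfrac{n_{1k} m_{1k}}{n_k}\right)\right]^2}{\sum_k \dfrac{n_{1k} n_{2k} m_{1k} m_{2k}}{n_k^2 (n_k - 1)}} = 18.435$$
56
$$\widehat{RR}_{MH} = \frac{\sum_k a_k n_{2k} / n_k}{\sum_k c_k n_{1k} / n_k} = 2.0$$
$$\widehat{\mathrm{var}}[\ln(\widehat{RR}_{MH})] = \frac{\sum_k (n_{1k} n_{2k} m_{1k} - a_k c_k n_k) / n_k^2}{\left(\sum_k a_k n_{2k} / n_k\right)\left(\sum_k c_k n_{1k} / n_k\right)}$$
The 95% CI is then
$$\exp\left(\ln 2 \pm 1.96\sqrt{\widehat{\mathrm{var}}[\ln(\widehat{RR}_{MH})]}\right) = (1.44, 2.77)$$
and $1.44 \times 2.77 \approx 2^2$ (checked).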
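The Mantel-Haenszel test statistic and risk ratio can be sketched in Python as follows. Function names are illustrative. Only the Age 65+ stratum is used as input, because the counts for the other stratum are not available here, so the printed values are for that single stratum and are not the two-stratum results quoted on the slides.

```python
from math import exp, log, sqrt

def mh_chi_square(tables):
    """Mantel-Haenszel chi-square over strata given as (a, b, c, d)."""
    diff = variance = 0.0
    for a, b, c, d in tables:
        n1, n2, m1, m2 = a + b, c + d, a + c, b + d
        n = n1 + n2
        diff += a - m1 * n1 / n
        variance += m1 * m2 * n1 * n2 / (n ** 2 * (n - 1))
    return diff ** 2 / variance

def mantel_haenszel_rr(tables, z=1.96):
    """MH risk ratio, var[ln RR_MH], and the log-scale CI."""
    num = den = var_num = 0.0
    for a, b, c, d in tables:
        n1, n2, m1 = a + b, c + d, a + c
        n = n1 + n2
        num += a * n2 / n
        den += c * n1 / n
        var_num += (n1 * n2 * m1 - a * c * n) / n ** 2
    rr_mh = num / den
    var_log = var_num / (num * den)
    half_width = z * sqrt(var_log)
    ci = (exp(log(rr_mh) - half_width), exp(log(rr_mh) + half_width))
    return rr_mh, var_log, ci

age_65_plus = [(32, 8, 24, 36)]          # Drug B row, then Drug A row
chi2 = mh_chi_square(age_65_plus)
rr_mh, var_log_rr_mh, ci_rr = mantel_haenszel_rr(age_65_plus)
print(f"chi-square = {chi2:.3f}, RR = {rr_mh:.2f}, "
      f"95% CI = ({ci_rr[0]:.2f}, {ci_rr[1]:.2f})")
```

With one stratum the variance reduces to 1/a - 1/n_1 + 1/c - 1/n_2, the single-table formula given earlier, which is a convenient sanity check before passing several strata.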
57 Summary: Application of Mantel-Haenszel methods
  Adjust for confounding (the original purpose): combat Simpson's paradox;
  Meta-analysis: consider each study as a stratum. This method of meta-analysis is commonly referred to as the fixed-effect model, with the intention of summarizing the available evidence, but not of predicting future study results. For that, a random-effects model must be adopted.
58 A note: "Like fire, the chi-square test is an excellent servant and a bad master" (Hill, 1965, Proc R Soc Med 58).

                Study 1                  Study 2
  Exposure      D+    D-    Risk         D+    D-    Risk
  E+
  E-
  RR
  OR
  p-value
More informationSTAC51: Categorical data Analysis
STAC51: Categorical data Analysis Mahinda Samarakoon January 26, 2016 Mahinda Samarakoon STAC51: Categorical data Analysis 1 / 32 Table of contents Contingency Tables 1 Contingency Tables Mahinda Samarakoon
More informationAsymptotic equivalence of paired Hotelling test and conditional logistic regression
Asymptotic equivalence of paired Hotelling test and conditional logistic regression Félix Balazard 1,2 arxiv:1610.06774v1 [math.st] 21 Oct 2016 Abstract 1 Sorbonne Universités, UPMC Univ Paris 06, CNRS
More informationEstimation of the Relative Excess Risk Due to Interaction and Associated Confidence Bounds
American Journal of Epidemiology ª The Author 2009. Published by the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.
More informationComputational Systems Biology: Biology X
Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA L#7:(Mar-23-2010) Genome Wide Association Studies 1 The law of causality... is a relic of a bygone age, surviving, like the monarchy,
More informationSimple logistic regression
Simple logistic regression Biometry 755 Spring 2009 Simple logistic regression p. 1/47 Model assumptions 1. The observed data are independent realizations of a binary response variable Y that follows a
More informationReports of the Institute of Biostatistics
Reports of the Institute of Biostatistics No 02 / 2008 Leibniz University of Hannover Natural Sciences Faculty Title: Properties of confidence intervals for the comparison of small binomial proportions
More informationReview of Statistics 101
Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods
More informationCDA Chapter 3 part II
CDA Chapter 3 part II Two-way tables with ordered classfications Let u 1 u 2... u I denote scores for the row variable X, and let ν 1 ν 2... ν J denote column Y scores. Consider the hypothesis H 0 : X
More informationTests for the Odds Ratio in a Matched Case-Control Design with a Quantitative X
Chapter 157 Tests for the Odds Ratio in a Matched Case-Control Design with a Quantitative X Introduction This procedure calculates the power and sample size necessary in a matched case-control study designed
More informationAn introduction to biostatistics: part 1
An introduction to biostatistics: part 1 Cavan Reilly September 6, 2017 Table of contents Introduction to data analysis Uncertainty Probability Conditional probability Random variables Discrete random
More informationStatistics in medicine
Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu
More informationLecture 25: Models for Matched Pairs
Lecture 25: Models for Matched Pairs Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture
More informationLing 289 Contingency Table Statistics
Ling 289 Contingency Table Statistics Roger Levy and Christopher Manning This is a summary of the material that we ve covered on contingency tables. Contingency tables: introduction Odds ratios Counting,
More informationPower and Sample Size Calculations with the Additive Hazards Model
Journal of Data Science 10(2012), 143-155 Power and Sample Size Calculations with the Additive Hazards Model Ling Chen, Chengjie Xiong, J. Philip Miller and Feng Gao Washington University School of Medicine
More informationMarginal Screening and Post-Selection Inference
Marginal Screening and Post-Selection Inference Ian McKeague August 13, 2017 Ian McKeague (Columbia University) Marginal Screening August 13, 2017 1 / 29 Outline 1 Background on Marginal Screening 2 2
More information" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2
Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the
More informationCategorical Data Analysis Chapter 3
Categorical Data Analysis Chapter 3 The actual coverage probability is usually a bit higher than the nominal level. Confidence intervals for association parameteres Consider the odds ratio in the 2x2 table,
More informationGeneralized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence
Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence Sunil Kumar Dhar Center for Applied Mathematics and Statistics, Department of Mathematical Sciences, New Jersey
More informationConfidence Intervals of the Simple Difference between the Proportions of a Primary Infection and a Secondary Infection, Given the Primary Infection
Biometrical Journal 42 (2000) 1, 59±69 Confidence Intervals of the Simple Difference between the Proportions of a Primary Infection and a Secondary Infection, Given the Primary Infection Kung-Jong Lui
More informationSTAT 5500/6500 Conditional Logistic Regression for Matched Pairs
STAT 5500/6500 Conditional Logistic Regression for Matched Pairs The data for the tutorial came from support.sas.com, The LOGISTIC Procedure: Conditional Logistic Regression for Matched Pairs Data :: SAS/STAT(R)
More informationContingency Tables Part One 1
Contingency Tables Part One 1 STA 312: Fall 2012 1 See last slide for copyright information. 1 / 32 Suggested Reading: Chapter 2 Read Sections 2.1-2.4 You are not responsible for Section 2.5 2 / 32 Overview
More informationTests for Two Correlated Proportions in a Matched Case- Control Design
Chapter 155 Tests for Two Correlated Proportions in a Matched Case- Control Design Introduction A 2-by-M case-control study investigates a risk factor relevant to the development of a disease. A population
More information10: Crosstabs & Independent Proportions
10: Crosstabs & Independent Proportions p. 10.1 P Background < Two independent groups < Binary outcome < Compare binomial proportions P Illustrative example ( oswege.sav ) < Food poisoning following church
More informationSections 2.3, 2.4. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis 1 / 21
Sections 2.3, 2.4 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 21 2.3 Partial association in stratified 2 2 tables In describing a relationship
More informationChapter Six: Two Independent Samples Methods 1/51
Chapter Six: Two Independent Samples Methods 1/51 6.3 Methods Related To Differences Between Proportions 2/51 Test For A Difference Between Proportions:Introduction Suppose a sampling distribution were
More informationCollated responses from R-help on confidence intervals for risk ratios
Collated responses from R-help on confidence intervals for risk ratios Michael E Dewey November, 2006 Introduction This document arose out of a problem assessing a confidence interval for the risk ratio
More informationChapter 11. Correlation and Regression
Chapter 11. Correlation and Regression The word correlation is used in everyday life to denote some form of association. We might say that we have noticed a correlation between foggy days and attacks of
More informationSurvival Analysis for Case-Cohort Studies
Survival Analysis for ase-ohort Studies Petr Klášterecký Dept. of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, harles University, Prague, zech Republic e-mail: petr.klasterecky@matfyz.cz
More informationChapter 2: Describing Contingency Tables - II
: Describing Contingency Tables - II Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu]
More informationStatistical Methods in Epidemiologic Research. EP 521 Spring 2006 Course Notes Vol I (Part 1 of 5)
EP 521, Spring 2006, Vol I, Part 1 Statistical Methods in Epidemiologic Research 1 EP 521 Spring 2006 Course Notes Vol I (Part 1 of 5) A. Russell Localio*, and Jesse A Berlin ( The Great Master ) *Department
More information11 November 2011 Department of Biostatistics, University of Copengen. 9:15 10:00 Recap of case-control studies. Frequency-matched studies.
Matched and nested case-control studies Bendix Carstensen Steno Diabetes Center, Gentofte, Denmark http://staff.pubhealth.ku.dk/~bxc/ Department of Biostatistics, University of Copengen 11 November 2011
More informationInferences for Proportions and Count Data
Inferences for Proportions and Count Data Corresponds to Chapter 9 of Tamhane and Dunlop Slides prepared by Elizabeth Newton (MIT), with some slides by Ramón V. León (University of Tennessee) 1 Inference
More informationLogistic regression analysis. Birthe Lykke Thomsen H. Lundbeck A/S
Logistic regression analysis Birthe Lykke Thomsen H. Lundbeck A/S 1 Response with only two categories Example Odds ratio and risk ratio Quantitative explanatory variable More than one variable Logistic
More informationAnalysis of categorical data S4. Michael Hauptmann Netherlands Cancer Institute Amsterdam, The Netherlands
Analysis of categorical data S4 Michael Hauptmann Netherlands Cancer Institute Amsterdam, The Netherlands m.hauptmann@nki.nl 1 Categorical data One-way contingency table = frequency table Frequency (%)
More informationPROD. TYPE: COM. Simple improved condence intervals for comparing matched proportions. Alan Agresti ; and Yongyi Min UNCORRECTED PROOF
pp: --2 (col.fig.: Nil) STATISTICS IN MEDICINE Statist. Med. 2004; 2:000 000 (DOI: 0.002/sim.8) PROD. TYPE: COM ED: Chandra PAGN: Vidya -- SCAN: Nil Simple improved condence intervals for comparing matched
More information7.2 One-Sample Correlation ( = a) Introduction. Correlation analysis measures the strength and direction of association between
7.2 One-Sample Correlation ( = a) Introduction Correlation analysis measures the strength and direction of association between variables. In this chapter we will test whether the population correlation
More informationADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox s regression analysis Time dependent explanatory variables
ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES Cox s regression analysis Time dependent explanatory variables Henrik Ravn Bandim Health Project, Statens Serum Institut 4 November 2011 1 / 53
More informationRelative Effect Sizes for Measures of Risk. Jake Olivier, Melanie Bell, Warren May
Relative Effect Sizes for Measures of Risk Jake Olivier, Melanie Bell, Warren May MATHEMATICS & THE UNIVERSITY OF NEW STATISTICS SOUTH WALES November 2015 1 / 27 Motivating Examples Effect Size Phi and
More informationCase-control studies
Matched and nested case-control studies Bendix Carstensen Steno Diabetes Center, Gentofte, Denmark b@bxc.dk http://bendixcarstensen.com Department of Biostatistics, University of Copenhagen, 8 November
More informationSTAT 5500/6500 Conditional Logistic Regression for Matched Pairs
STAT 5500/6500 Conditional Logistic Regression for Matched Pairs Motivating Example: The data we will be using comes from a subset of data taken from the Los Angeles Study of the Endometrial Cancer Data
More informationIntroduction to the Analysis of Tabular Data
Introduction to the Analysis of Tabular Data Anthropological Sciences 192/292 Data Analysis in the Anthropological Sciences James Holland Jones & Ian G. Robertson March 15, 2006 1 Tabular Data Is there
More informationOne-stage dose-response meta-analysis
One-stage dose-response meta-analysis Nicola Orsini, Alessio Crippa Biostatistics Team Department of Public Health Sciences Karolinska Institutet http://ki.se/en/phs/biostatistics-team 2017 Nordic and
More informationMeans or "expected" counts: j = 1 j = 2 i = 1 m11 m12 i = 2 m21 m22 True proportions: The odds that a sampled unit is in category 1 for variable 1 giv
Measures of Association References: ffl ffl ffl Summarize strength of associations Quantify relative risk Types of measures odds ratio correlation Pearson statistic ediction concordance/discordance Goodman,
More information