Matched-Pair Case-Control Studies when Risk Factors are Correlated within the Pairs
International Journal of Epidemiology © International Epidemiological Association 1996, Vol. 25, No. 2. Printed in Great Britain.

BETH C GLADEN

Gladen B C (Statistics and Biomathematics Branch, Mail Drop A3-03, National Institute of Environmental Health Sciences, PO Box 12233, Research Triangle Park, NC 27709, USA). Matched-pair case-control studies when risk factors are correlated within the pairs. International Journal of Epidemiology 1996; 25:

Background. If pair members are independent, simple matched-pair case-control studies are known to yield consistent estimates of the population odds ratio. If pair members are not independent, this is not necessarily true. It has been shown previously that the usual matched-pair estimate remains consistent if the exposure of interest is correlated within the pairs. However, the effect of within-pair correlation of unmeasured risk factors has not been studied.

Methods. We examine the effect of within-pair correlation of unmeasured risk factors that are independent of the measured exposure. This is done within the context of a simple matched-pair case-control study. We compare the large-sample expectation of the usual matched-pair estimate with the population odds ratio.

Results. We show that the usual estimate may be inconsistent in the presence of this correlation. However, if the disease is rare, the magnitude of the bias will be negligible.

Conclusions. Correlation of unmeasured risk factors independent of the measured exposure is not a practical problem in this setting.

Keywords: bias (epidemiology), odds ratio, selection bias, epidemiological methods

Matched-pair case-control studies can be used to study the relationship between a disease and an exposure of interest. In the simple version of such a study, we choose a random sample of cases and a matched control for each case. We determine whether each pair member is exposed.
We calculate the ratio of the number of pairs with an exposed case and an unexposed control to the number of pairs with an exposed control and an unexposed case. Under the usual assumptions, this ratio will be a consistent (that is, unbiased in large samples) estimate of the population odds ratio. One of the usual assumptions is that everyone in the study is independent. If controls are chosen as, for example, random people from the same city and of the same age and sex as the case, this may be a reasonable assumption. If, however, the controls are siblings or spouses of the cases, the assumption of independence within pairs becomes less tenable. Such controls may be used because they are considered more appropriate; they may also be used for the practical reason that they are readily identified and likely to be willing to participate in a study. Since using these types of controls violates the usual assumptions, we need to check the behaviour of the estimate under these conditions.

Goldstein, Hodge, and Haile looked at simple matched-pair case-control studies where the exposure of interest is correlated within the pairs.¹ In retrospective studies, what we are examining is the distribution of exposure. When exposure of a control is related to exposure of a case, it is reasonable to think this correlation may distort our inferences. However, Goldstein et al. demonstrated that the usual matched-pair estimate remains a consistent estimate of the population odds ratio despite the correlation of exposure (assuming no other assumptions are violated). A similar result appears in Pike and Robins,² in a modification of the results of Flanders and Austin.³ However, this is not completely reassuring, since the correlation within pairs may well extend further.
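As a minimal sketch, assuming each study pair is recorded as a (case exposed, control exposed) pair of booleans, the usual matched-pair estimate described above can be computed as follows. The function name `matched_pair_odds_ratio` and the example counts are hypothetical illustrations, not taken from the paper.

```python
from typing import Iterable, Tuple


def matched_pair_odds_ratio(pairs: Iterable[Tuple[bool, bool]]) -> float:
    """Usual matched-pair estimate of the odds ratio.

    Each element of `pairs` is (case_exposed, control_exposed). Only pairs
    discordant for exposure contribute; concordant pairs are ignored.
    """
    exposed_case = 0     # exposed case, unexposed control
    exposed_control = 0  # unexposed case, exposed control
    for case_exposed, control_exposed in pairs:
        if case_exposed and not control_exposed:
            exposed_case += 1
        elif control_exposed and not case_exposed:
            exposed_control += 1
    if exposed_control == 0:
        raise ValueError("no pairs with an exposed control and an unexposed case")
    return exposed_case / exposed_control


# Hypothetical study: 30 + 10 discordant pairs and 60 concordant pairs.
pairs = ([(True, False)] * 30 + [(False, True)] * 10
         + [(True, True)] * 25 + [(False, False)] * 35)
print(matched_pair_odds_ratio(pairs))  # 3.0
```

The concordant pairs change nothing in the result; this is why, later in the paper, only pairs discordant for both disease and exposure enter the expected value of the estimate.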
Although we are only interested in, and only measure, a single exposure, there are always other risk factors for the disease. The pair members may well be correlated on these other risk factors as well. These other risk factors may not be recognized, let alone measured. For example, suppose we are studying the relationship between a disease and some exposure; if the disease is thought to have a genetic component, but the genes responsible are unknown, sibling controls may be used.
The genetic risk factor cannot be measured, since the gene is unknown. The siblings may have correlated values of a variety of other unmeasured risk factors as well; these might include diet or socioeconomic status. Similarly, if a disease is known to vary by socioeconomic status or geographical location, neighbourhood controls may be used. The underlying risk factors may be unknown and thus unmeasured, and may be correlated within neighbourhoods. Neighbourhood is not a risk factor in itself, but a surrogate for these other risk factors.

In this paper, we examine whether the usual estimate in a simple matched-pair case-control study remains consistent if correlation of risk factors (both the single measured one and the unmeasured ones) is present within pairs. Throughout, we will ignore precision; we are only concerned with bias. We will also assume that the unmeasured risk factors are independent of the measured exposure. Dependence would create a standard confounding situation in which bias would be expected; under independence, one might expect to avoid problems. We explore whether this expectation is accurate.

ASSUMPTIONS AND NOTATION

Validity of a matched study depends on the rules that specify which non-cases are potential matched controls for each case. Certain schemes, such as the use of friend controls, can cause bias.²⁻⁵ This bias is avoided if the population from which cases arise can be divided into non-overlapping groups, and controls are chosen from the same group as the case; this has been called 'reciprocal design'.²⁻⁵ We assume throughout that controls are chosen in this fashion. These non-overlapping groups might consist, for example, of sibship members or of residents of the same city block. For concreteness, we will assume that the groups in question are pairs, and we will call the pair members the wife and the husband.

Assume a single dichotomous exposure of interest, denoted by E.
This exposure will be the focus of the matched-pair case-control study. Let p and q denote the prevalences of the exposure for wives and husbands, respectively; we need not assume that they are equal. Assume that exposures of wife and husband are correlated, and let r denote the covariance. Writing $\bar{E}$ for unexposed, the joint probabilities of E are:

$$
\begin{aligned}
\Pr(\text{wife is } E,\ \text{husband is } E) &= pq + r\\
\Pr(\text{wife is } E,\ \text{husband is } \bar{E}) &= p(1-q) - r\\
\Pr(\text{wife is } \bar{E},\ \text{husband is } E) &= (1-p)q - r\\
\Pr(\text{wife is } \bar{E},\ \text{husband is } \bar{E}) &= (1-p)(1-q) + r
\end{aligned}
$$

If r = 0, the exposures of wife and husband are independent.

Assume one other discrete risk factor (denoted F) with f categories, $F_1, \dots, F_f$. F can be thought of as subsuming all other risk factors, since it could actually be a composite of multiple, possibly dependent, risk factors; for example, level 1 is young white professionals, level 2 is old white professionals, level 3 is young white labourers, and so on. F will not be measured in the matched-pair study; it nevertheless plays a role in determining the distribution of disease in the pairs. Assume that F is correlated within pairs, but that F and E are independent. Assume that prevalences of F are the same for wives and husbands. Denote the marginal and joint probabilities for F by:

$$
\begin{aligned}
\Pr(\text{wife is } F_i) &= \Pr(\text{husband is } F_i) = x_i\\
\Pr(\text{wife is } F_i \text{ and husband is } F_j) &= \Pr(\text{wife is } F_j \text{ and husband is } F_i) = x_i x_j + z_{ij}
\end{aligned}
$$

If $z_{ij} = 0$ for all values of i and j, then the risk factors of husband and wife are independent.

Finally, denote disease by D. Assume that disease risk depends only on E and F. In particular, assume that variations in disease risk from one pair to another are attributable solely to variations in E and F. Assume that, conditional on the risk factors, occurrence of disease in one individual is independent of occurrence in all others.
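The pair distributions defined above are only valid for some combinations of parameters. The following Python sketch, with hypothetical helper names `exposure_pair_probs` and `factor_pair_probs` and hypothetical parameter values, builds the joint probabilities from (p, q, r) and from the vector x and matrix z, and rejects inputs that do not form a valid distribution. The row-sum check encodes the constraint shown in the Appendix (each row of z sums to zero, so the marginal probabilities stay equal to x).

```python
def exposure_pair_probs(p, q, r):
    """Joint exposure probabilities keyed by (wife exposed, husband exposed)."""
    probs = {
        (True, True): p * q + r,
        (True, False): p * (1 - q) - r,
        (False, True): (1 - p) * q - r,
        (False, False): (1 - p) * (1 - q) + r,
    }
    if min(probs.values()) < 0:
        raise ValueError("covariance r is too large in magnitude for p and q")
    return probs


def factor_pair_probs(x, z, tol=1e-9):
    """Joint probabilities Pr(wife is F_i, husband is F_j) = x_i x_j + z_ij."""
    f = len(x)
    if abs(sum(x) - 1) > tol:
        raise ValueError("the x_i must sum to 1")
    for i in range(f):
        if abs(sum(z[i])) > tol:
            raise ValueError("each row of z must sum to 0 so the marginals equal x")
        for j in range(f):
            if abs(z[i][j] - z[j][i]) > tol:
                raise ValueError("z must be symmetric")
    joint = [[x[i] * x[j] + z[i][j] for j in range(f)] for i in range(f)]
    if min(min(row) for row in joint) < -tol:
        raise ValueError("z implies a negative joint probability")
    return joint


# Hypothetical values: p = 0.4, q = 0.3, r = 0.02; dichotomous F with x_1 = 0.3, z_11 = 0.05.
pe = exposure_pair_probs(0.4, 0.3, 0.02)
pf = factor_pair_probs([0.3, 0.7], [[0.05, -0.05], [-0.05, 0.05]])
print(round(pe[(True, True)], 10), [[round(v, 10) for v in row] for row in pf])
# 0.14 [[0.14, 0.16], [0.16, 0.54]]
```

Rows index the wife's level and columns the husband's; with a dichotomous F, fixing $x_1$ and $z_{11}$ determines the whole matrix, as in the Example later in the paper.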
Denote the disease probabilities by:

$$\Pr(D \mid \bar{E}, F_i) = a_i, \qquad \Pr(D \mid E, F_i) = b_i$$

Note that we do not assume that relative risks for E (that is, $b_i/a_i$) are constant across the levels of F; this means effect modification is permitted. Thus, for example, we allow for the possibility that one factor is environmental, the other is genetic, and no elevation in risk occurs unless both are present.

RESULTS

Population Parameters

We may derive the population values for relative risks and odds ratios for exposure through straightforward algebra; details are in the Appendix. First, we may show that the risk of disease conditional on exposure is $\Pr(D \mid E) = \sum_{i=1}^{f} x_i b_i$. This is, of course, just a weighted average of the risks ($b_i$) in the various levels of F, weighted by the frequencies ($x_i$). Similarly, we may show that $\Pr(D \mid \bar{E}) = \sum_{i=1}^{f} x_i a_i$. Then the relative risk
is $\sum_{i=1}^{f} x_i b_i \big/ \sum_{i=1}^{f} x_i a_i$. A similar expression for the case f = 2 is given by Khoury and James.⁶ Similarly, the odds ratio is:

$$
\mathrm{OR} = \frac{\left[\sum_{i=1}^{f} x_i b_i\right]\left[\sum_{i=1}^{f} x_i (1-a_i)\right]}{\left[\sum_{i=1}^{f} x_i a_i\right]\left[\sum_{i=1}^{f} x_i (1-b_i)\right]} \tag{1}
$$

Note that the correlation parameters, r and $z_{ij}$, do not enter into these expressions; the relative risk and odds ratio are the same whether or not exposures are correlated within the pairs.

Matched Pair Estimate

Suppose we do a matched-pair case-control study looking at the effect of E on D. F is not measured in such a study, but it affects the distribution of D nonetheless. By design, only those pairs discordant for D (that is, pairs with one case and one control) appear in the study. Of those, only the pairs discordant for E contribute to the usual estimate of the odds ratio. The expected number of pairs with an exposed case and an unexposed control will be proportional to:

$$
\Pr(\text{wife is } E, D \text{ and husband is } \bar{E}, \bar{D}) + \Pr(\text{wife is } \bar{E}, \bar{D} \text{ and husband is } E, D)
$$

which can be shown to be:

$$
[p + q - 2pq - 2r]\left\{\left[\sum_{i=1}^{f} x_i (1-a_i)\right]\left[\sum_{i=1}^{f} x_i b_i\right] - \sum_{i=1}^{f}\sum_{j=1}^{f} z_{ij}\, b_i a_j\right\}
$$

Similarly, the expected number of pairs with an unexposed case and an exposed control will be proportional to:

$$
\begin{aligned}
&\Pr(\text{wife is } \bar{E}, D \text{ and husband is } E, \bar{D}) + \Pr(\text{wife is } E, \bar{D} \text{ and husband is } \bar{E}, D)\\
&\quad = [p + q - 2pq - 2r]\left\{\left[\sum_{i=1}^{f} x_i a_i\right]\left[\sum_{i=1}^{f} x_i (1-b_i)\right] - \sum_{i=1}^{f}\sum_{j=1}^{f} z_{ij}\, b_i a_j\right\}
\end{aligned}
$$

The ratio of these two terms gives the expression for the large-sample expectation of the estimated odds ratio:

$$
\frac{\left[\sum_{i=1}^{f} x_i (1-a_i)\right]\left[\sum_{i=1}^{f} x_i b_i\right] - \sum_{i=1}^{f}\sum_{j=1}^{f} z_{ij}\, b_i a_j}{\left[\sum_{i=1}^{f} x_i a_i\right]\left[\sum_{i=1}^{f} x_i (1-b_i)\right] - \sum_{i=1}^{f}\sum_{j=1}^{f} z_{ij}\, b_i a_j} \tag{2}
$$

Clearly, this expression differs from the population odds ratio (1); specifically, it has an extra term subtracted from both numerator and denominator. Unlike the population odds ratio, the estimate is affected by the correlation parameters $z_{ij}$. Thus, the usual matched-pairs estimate will, in general, be biased. We now examine the nature of the bias.

Behaviour of Estimate Under Various Conditions

First note that the distribution of the exposure E is irrelevant to the behaviour of the estimate; p, q, and r do not appear in expression (2). Expression (2) will be equal to the population odds ratio (1) in several circumstances. First, if the exposure of interest E is not actually a risk factor, there is no bias. This condition is equivalent to $a_i = b_i$ for all i. In this situation, both the population odds ratio and the large-sample expectation of the estimate will be 1. Second, if there is no correlation on F within pairs, there is no bias. This condition is equivalent to $z_{ij} = 0$ for all i and j. Third, if F fails to be a risk factor in at least one of the two exposure groups, there is no bias. If F does not affect disease risk among the unexposed, then $a_i = a$ for all i; it can be shown that there is no bias. In similar fashion, if F does not affect disease risk among the exposed, then $b_i = b$ for all i, and there will be no bias. Note that the case studied by Goldstein et al.¹ had no risk factor F, which is equivalent to having both $a_i = a$ and $b_i = b$; thus their results are a special case of the results obtained here.

The behaviour of expression (2) in the rare disease case can be seen by letting disease rates go to zero with relative rates fixed. Simple calculus shows that the limit is the relative risk. Thus as the disease becomes rarer, both the large-sample expectation (2) and the population odds ratio (1) approach the population relative risk, and the bias disappears.

Example

Suppose that F is dichotomous. The distribution of F can then be described by only two parameters, due to constraints mentioned in the Appendix. Thus, we have:

$$
\begin{aligned}
\Pr(\text{wife is } F_1) &= \Pr(\text{husband is } F_1) = x_1\\
\Pr(\text{wife is } F_2) &= \Pr(\text{husband is } F_2) = 1 - x_1\\
\Pr(\text{wife is } F_1 \text{ and husband is } F_1) &= x_1^2 + z_{11}\\
\Pr(\text{wife is } F_1 \text{ and husband is } F_2) &= \Pr(\text{wife is } F_2 \text{ and husband is } F_1) = x_1(1 - x_1) - z_{11}\\
\Pr(\text{wife is } F_2 \text{ and husband is } F_2) &= (1 - x_1)^2 + z_{11}
\end{aligned}
$$

There will be four disease parameters ($a_1, a_2, b_1, b_2$). Assume that $F_2$ is the higher-risk category for both exposed and unexposed, so that $a_2 \ge a_1$ and $b_2 \ge b_1$.
Assume also that exposure is detrimental in both categories of F, so that $b_1 \ge a_1$ and $b_2 \ge a_2$. Under these assumptions, we conducted a numerical search through the region where the disease risks ($a_1, a_2, b_1, b_2$) are small ($10^{-6}$ to 0.1) and the relative risks between them are moderate (1 to 5). The parameters $x_1$ and $z_{11}$ were allowed to range through all possible values. The search yielded no example where expression (2)
differed by more than 3% from the population odds ratio. In this particular case, expression (2) is an increasing function of $z_{11}$. Thus positive correlation within the pair ($z_{11} > 0$) will produce a value for expression (2) greater than the population odds ratio. Conversely, negative correlation will produce a value that is smaller.

DISCUSSION

We have shown that correlation within matched case-control pairs on unmeasured risk factors independent of the measured exposure can cause the usual estimate to be inconsistent for the population odds ratio. The bias vanishes as disease becomes rare; thus the bias is unlikely to be of practical importance. There is no bias if the exposure of interest is not a risk factor. There is also no bias if the unmeasured risk factor is not truly a risk factor or if it is not correlated within pairs.

We assume throughout that the quantity of interest is the population odds ratio. This will not always be the case. For example, if the unmeasured risk factor is genotype, only the risk among the susceptibles may be of interest.⁷,⁸ We assume that disease is independent within pairs, conditional on the risk factors; for non-infectious diseases, this is likely to be true, since any correlation of disease is probably induced by correlation of risk factors. We assumed that the marginal distribution of unmeasured risk factors was the same for the two pair members; situations where this is not true (for example, spouses of breast cancer cases) are likely to represent problematic choices of controls.

Related but different problems have been discussed by other authors. Khoury and James⁶ assume a measured environmental factor and an unmeasured genetic factor, but examine a different study design. They identify affected individuals and determine the disease status of the pair member.
They calculate risk of disease in one pair member conditional on the other pair member being diseased and conditional on exposure status. In contrast, the matched-pair case-control study examined here looks at risk of exposure in a pair conditional on the disease status of the pair. They show that the relative risks they obtain will equal the population relative risk if risks are multiplicative. Robins and Pike⁵ discuss the situation of two risk factors in matched-pair case-control studies. However, they assume that both risk factors E and F are measured and that the effects of both are estimated simultaneously. This is a different estimator from the one discussed here. They assume that E and F are correlated with each other. They show that if risks are multiplicative, the estimates for both risk factors will be unbiased.

ACKNOWLEDGEMENTS

I thank Dale Sandler for bringing this problem to my attention and Glinda Cooper, Dale Sandler, David Umbach, and Clarice Weinberg for helpful comments.

REFERENCES

1. Goldstein A M, Hodge S E, Haile R W C. Selection bias in case-control studies using relatives as the controls. Int J Epidemiol 1989; 18:
2. Pike M C, Robins J. Re: 'Possibility of selection bias in matched case-control studies using friend controls'. Am J Epidemiol 1989; 130:
3. Flanders W D, Austin H. Possibility of selection bias in matched case-control studies using friend controls. Am J Epidemiol 1986; 124:
4. Austin H, Flanders W D, Rothman K J. Bias arising in case-control studies from selection of controls from overlapping groups. Int J Epidemiol 1989; 18:
5. Robins J, Pike M. The validity of case-control studies with nonrandom selection of controls. Epidemiology 1990; 1:
6. Khoury M J, James L M. Population and familial relative risks of disease associated with environmental factors in the presence of gene-environment interaction. Am J Epidemiol 1993; 137:
7. Khoury M J, Stewart W, Beaty T H. The effect of genetic susceptibility on causal inference in epidemiologic studies. Am J Epidemiol 1987; 126:
8. Breitner J C S, Murphy E A, Woodbury M A. Case-control studies of environmental influences in diseases with genetic determinants, with an application to Alzheimer's disease. Am J Epidemiol 1991; 133:

(Revised version received August 1995)
APPENDIX

We give here details of some of the calculations. All sums run from 1 to f. Note first, for future reference, that symmetry in the definitions of the probabilities of F implies that $z_{ij} = z_{ji}$. The fact that probabilities add to 1 implies that $\sum_{i} x_i = 1$. The definition of $x_i$ implies (since summing $x_i x_j + z_{ij}$ over j must return $x_i$) that

$$\sum_{i} z_{ij} = \sum_{j} z_{ij} = 0.$$

First, derive the risk of disease conditional on exposure, using the independence of E and F:

$$\Pr(D \mid E) = \sum_{i} \Pr(E)\Pr(F_i)\Pr(D \mid E, F_i) \big/ \Pr(E) = \sum_{i} x_i b_i$$

The derivation of $\Pr(D \mid \bar{E})$ is exactly analogous; population values of relative risks and odds ratios follow immediately.

The expected number of pairs with an exposed case and an unexposed control will be proportional to:

$$
\begin{aligned}
&\Pr(\text{wife is } E, D \text{ and husband is } \bar{E}, \bar{D}) + \Pr(\text{wife is } \bar{E}, \bar{D} \text{ and husband is } E, D)\\
&= \sum_{i}\sum_{j}\big[\Pr(\text{wife is } E, D, F_i \text{ and husband is } \bar{E}, \bar{D}, F_j) + \Pr(\text{wife is } \bar{E}, \bar{D}, F_i \text{ and husband is } E, D, F_j)\big]\\
&= \sum_{i}\sum_{j}\big[\Pr(\text{wife is } E \text{ and husband is } \bar{E})\Pr(\text{wife is } F_i \text{ and husband is } F_j)\Pr(\text{wife is } D \mid \text{wife is } E, F_i)\Pr(\text{husband is } \bar{D} \mid \text{husband is } \bar{E}, F_j)\\
&\qquad + \Pr(\text{wife is } \bar{E} \text{ and husband is } E)\Pr(\text{wife is } F_i \text{ and husband is } F_j)\Pr(\text{wife is } \bar{D} \mid \text{wife is } \bar{E}, F_i)\Pr(\text{husband is } D \mid \text{husband is } E, F_j)\big]\\
&= \sum_{i}\sum_{j}\big\{[p(1-q) - r](x_i x_j + z_{ij})\, b_i (1-a_j) + [(1-p)q - r](x_i x_j + z_{ij})(1-a_i)\, b_j\big\}\\
&= [p(1-q) - r]\sum_{i} x_i b_i \sum_{j} x_j (1-a_j) + [(1-p)q - r]\sum_{i} x_i (1-a_i)\sum_{j} x_j b_j\\
&\qquad + [p(1-q) - r]\sum_{i}\sum_{j} z_{ij}\, b_i (1-a_j) + [(1-p)q - r]\sum_{i}\sum_{j} z_{ij}(1-a_i)\, b_j\\
&= [p + q - 2pq - 2r]\Big[\sum_{i} x_i (1-a_i)\Big]\Big[\sum_{i} x_i b_i\Big] + [p(1-q) - r]\sum_{i}\sum_{j} z_{ij}\, b_i (1-a_j) + [(1-p)q - r]\sum_{i}\sum_{j} z_{ji}\, b_j (1-a_i)\\
&= [p + q - 2pq - 2r]\Big\{\Big[\sum_{i} x_i (1-a_i)\Big]\Big[\sum_{i} x_i b_i\Big] + \sum_{i}\sum_{j} z_{ij}\, b_i (1-a_j)\Big\}\\
&= [p + q - 2pq - 2r]\Big\{\Big[\sum_{i} x_i (1-a_i)\Big]\Big[\sum_{i} x_i b_i\Big] + \sum_{i} b_i \sum_{j} z_{ij} - \sum_{i}\sum_{j} z_{ij}\, b_i a_j\Big\}\\
&= [p + q - 2pq - 2r]\Big\{\Big[\sum_{i} x_i (1-a_i)\Big]\Big[\sum_{i} x_i b_i\Big] - \sum_{i}\sum_{j} z_{ij}\, b_i a_j\Big\}
\end{aligned}
$$

The expected number of pairs with an unexposed case and an exposed control can be derived similarly, and expression (2) follows immediately.

Expression (2) will be equal to the population odds ratio (1) in several circumstances. First, there is no bias if $a_i = b_i$ for all i. Under this assumption, the numerator of (2) equals the denominator of (2), so expression (2) equals 1. Since the population odds ratio is also 1, there is no bias. Second, there is no bias if $z_{ij} = 0$ for all i and j. Under this condition, the extra term in the numerator and denominator of (2) is zero; this makes expressions (1) and (2) exactly equal. Third, there is no bias if $a_i = a$ for all i. Under these circumstances, the extra term is again zero:

$$\sum_{i}\sum_{j} z_{ij}\, b_i a_j = a \sum_{i} b_i \sum_{j} z_{ij} = 0.$$

Thus there is no bias. In similar fashion, if $b_i = b$ for all i, the extra term is again zero.
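The derivation and the no-bias conditions can be checked numerically. The sketch below is written in Python with hypothetical function names and hypothetical parameter values; it codes expression (1) and expression (2) directly, and also computes the ratio of discordant-pair probabilities by brute-force enumeration over every (wife, husband) combination of exposure, level of F, and disease status. Agreement between the enumeration and the closed form is a check on the algebra above, not a substitute for it.

```python
from itertools import product


def population_odds_ratio(x, a, b):
    """Expression (1): population odds ratio for E."""
    risk_exposed = sum(xi * bi for xi, bi in zip(x, b))    # Pr(D | E)
    risk_unexposed = sum(xi * ai for xi, ai in zip(x, a))  # Pr(D | not E)
    return (risk_exposed * (1 - risk_unexposed)) / (risk_unexposed * (1 - risk_exposed))


def expected_matched_estimate(x, z, a, b):
    """Expression (2): large-sample expectation of the usual matched-pair estimate."""
    f = len(x)
    extra = sum(z[i][j] * b[i] * a[j] for i in range(f) for j in range(f))
    risk_exposed = sum(xi * bi for xi, bi in zip(x, b))
    risk_unexposed = sum(xi * ai for xi, ai in zip(x, a))
    # sum of x_i (1 - a_i) equals 1 - Pr(D | not E) because the x_i sum to 1.
    numerator = (1 - risk_unexposed) * risk_exposed - extra
    denominator = risk_unexposed * (1 - risk_exposed) - extra
    return numerator / denominator


def brute_force_estimate(p, q, r, x, z, a, b):
    """Ratio of discordant-pair probabilities, summed over every pair configuration."""
    exposure = {(True, True): p * q + r, (True, False): p * (1 - q) - r,
                (False, True): (1 - p) * q - r, (False, False): (1 - p) * (1 - q) + r}
    f = len(x)
    exposed_case = 0.0     # exposed case with unexposed control
    exposed_control = 0.0  # unexposed case with exposed control
    for (e_wife, e_husband), i, j in product(exposure, range(f), range(f)):
        base = exposure[(e_wife, e_husband)] * (x[i] * x[j] + z[i][j])
        risk_wife = b[i] if e_wife else a[i]
        risk_husband = b[j] if e_husband else a[j]
        # Only pairs with exactly one case enter the study.
        for case_exp, control_exp, prob in (
            (e_wife, e_husband, base * risk_wife * (1 - risk_husband)),
            (e_husband, e_wife, base * (1 - risk_wife) * risk_husband),
        ):
            if case_exp and not control_exp:
                exposed_case += prob
            elif control_exp and not case_exp:
                exposed_control += prob
    return exposed_case / exposed_control


# Hypothetical parameters: dichotomous F with x_1 = 0.4, z_11 = 0.1, and effect modification.
x, z = [0.4, 0.6], [[0.1, -0.1], [-0.1, 0.1]]
a, b = [0.1, 0.3], [0.2, 0.9]
print(population_odds_ratio(x, a, b), expected_matched_estimate(x, z, a, b),
      brute_force_estimate(0.3, 0.5, 0.05, x, z, a, b))
```

The risks in this illustration are deliberately large so that the gap between (1) and (2) is visible; the common factor $p + q - 2pq - 2r$ cancels in the brute-force ratio, which is why the exposure parameters passed to `brute_force_estimate` do not change its result.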
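The behaviour of expression (2) in the rare disease case can be seen by letting disease rates go to zero with relative rates fixed. Let $b_i = t_i a_i$ and $a_i = s_i a_0$. After dividing numerator and denominator by $a_0$ (all sums run from 1 to f), expression (2) becomes

$$
\frac{\left[\sum_{i} x_i\right]\left[\sum_{i} x_i t_i s_i\right] - a_0\left[\sum_{i} x_i s_i\right]\left[\sum_{i} x_i t_i s_i\right] - a_0 \sum_{i}\sum_{j} z_{ij}\, t_i s_i s_j}{\left[\sum_{i} x_i s_i\right]\left[\sum_{i} x_i\right] - a_0\left[\sum_{i} x_i s_i\right]\left[\sum_{i} x_i t_i s_i\right] - a_0 \sum_{i}\sum_{j} z_{ij}\, t_i s_i s_j}
$$

We need the limit of this expression as $a_0$ goes to zero with all other terms fixed. Simple calculus (using $\sum_i x_i = 1$) shows that the limit is:

$$\frac{\sum_{i} x_i t_i s_i}{\sum_{i} x_i s_i} = \frac{\sum_{i} x_i b_i}{\sum_{i} x_i a_i} = \text{relative risk}$$

Thus as the disease becomes rare, the large-sample expectation approaches the population relative risk.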
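The limiting argument can be illustrated numerically. In this Python sketch (function names and parameter values are hypothetical), the disease risks are parameterized as $a_i = s_i a_0$ and $b_i = t_i a_i$, as in the derivation, and expression (2) is evaluated for a decreasing sequence of $a_0$ with $x$, $z$, $s$, and $t$ held fixed.

```python
def expected_matched_estimate(x, z, a, b):
    """Expression (2): large-sample expectation of the usual matched-pair estimate."""
    f = len(x)
    extra = sum(z[i][j] * b[i] * a[j] for i in range(f) for j in range(f))
    risk_exposed = sum(xi * bi for xi, bi in zip(x, b))
    risk_unexposed = sum(xi * ai for xi, ai in zip(x, a))
    return (((1 - risk_unexposed) * risk_exposed - extra)
            / (risk_unexposed * (1 - risk_exposed) - extra))


def rare_disease_path(x, z, s, t, a0_values):
    """Evaluate (2) with a_i = s_i * a0 and b_i = t_i * a_i as a0 shrinks."""
    relative_risk = (sum(xi * ti * si for xi, ti, si in zip(x, t, s))
                     / sum(xi * si for xi, si in zip(x, s)))
    estimates = []
    for a0 in a0_values:
        a = [si * a0 for si in s]
        b = [ti * ai for ti, ai in zip(t, a)]
        estimates.append(expected_matched_estimate(x, z, a, b))
    return relative_risk, estimates


# Hypothetical parameters; a0 = 0.1 gives a = [0.1, 0.3] and b = [0.2, 0.9].
a0_values = [0.1, 0.01, 0.001, 1e-6]
x, z = [0.4, 0.6], [[0.1, -0.1], [-0.1, 0.1]]
rr, estimates = rare_disease_path(x, z, s=[1.0, 3.0], t=[2.0, 3.0], a0_values=a0_values)
for a0, est in zip(a0_values, estimates):
    print(f"a0={a0:g}: expectation (2) = {est:.4f}, relative risk = {rr:.4f}")
```

With these values the gap between expression (2) and the relative risk shrinks steadily as $a_0$ decreases, in line with the limit derived above.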
More informationProbability: Why do we care? Lecture 2: Probability and Distributions. Classical Definition. What is Probability?
Probability: Why do we care? Lecture 2: Probability and Distributions Sandy Eckel seckel@jhsph.edu 22 April 2008 Probability helps us by: Allowing us to translate scientific questions into mathematical
More informationSurvival Analysis I (CHL5209H)
Survival Analysis Dalla Lana School of Public Health University of Toronto olli.saarela@utoronto.ca January 7, 2015 31-1 Literature Clayton D & Hills M (1993): Statistical Models in Epidemiology. Not really
More informationCausal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification. Todd MacKenzie, PhD
Causal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification Todd MacKenzie, PhD Collaborators A. James O Malley Tor Tosteson Therese Stukel 2 Overview 1. Instrumental variable
More informationMarginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal
Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal Overview In observational and experimental studies, the goal may be to estimate the effect
More informationAsymptotic equivalence of paired Hotelling test and conditional logistic regression
Asymptotic equivalence of paired Hotelling test and conditional logistic regression Félix Balazard 1,2 arxiv:1610.06774v1 [math.st] 21 Oct 2016 Abstract 1 Sorbonne Universités, UPMC Univ Paris 06, CNRS
More informationEstimating the Marginal Odds Ratio in Observational Studies
Estimating the Marginal Odds Ratio in Observational Studies Travis Loux Christiana Drake Department of Statistics University of California, Davis June 20, 2011 Outline The Counterfactual Model Odds Ratios
More informationLecture 2: Probability and Distributions
Lecture 2: Probability and Distributions Ani Manichaikul amanicha@jhsph.edu 17 April 2007 1 / 65 Probability: Why do we care? Probability helps us by: Allowing us to translate scientific questions info
More informationHarvard University. Harvard University Biostatistics Working Paper Series
Harvard University Harvard University Biostatistics Working Paper Series Year 2015 Paper 192 Negative Outcome Control for Unobserved Confounding Under a Cox Proportional Hazards Model Eric J. Tchetgen
More informationDescribing Contingency tables
Today s topics: Describing Contingency tables 1. Probability structure for contingency tables (distributions, sensitivity/specificity, sampling schemes). 2. Comparing two proportions (relative risk, odds
More informationBIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY
BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY Ingo Langner 1, Ralf Bender 2, Rebecca Lenz-Tönjes 1, Helmut Küchenhoff 2, Maria Blettner 2 1
More informationCausal Modeling in Environmental Epidemiology. Joel Schwartz Harvard University
Causal Modeling in Environmental Epidemiology Joel Schwartz Harvard University When I was Young What do I mean by Causal Modeling? What would have happened if the population had been exposed to a instead
More informationCorrelation and regression
1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,
More informationDATA-ADAPTIVE VARIABLE SELECTION FOR
DATA-ADAPTIVE VARIABLE SELECTION FOR CAUSAL INFERENCE Group Health Research Institute Department of Biostatistics, University of Washington shortreed.s@ghc.org joint work with Ashkan Ertefaie Department
More informationOn the Use of the Bross Formula for Prioritizing Covariates in the High-Dimensional Propensity Score Algorithm
On the Use of the Bross Formula for Prioritizing Covariates in the High-Dimensional Propensity Score Algorithm Richard Wyss 1, Bruce Fireman 2, Jeremy A. Rassen 3, Sebastian Schneeweiss 1 Author Affiliations:
More informationSampling. Module II Chapter 3
Sampling Module II Chapter 3 Topics Introduction Terms in Sampling Techniques of Sampling Essentials of Good Sampling Introduction In research terms a sample is a group of people, objects, or items that
More informationstatistical sense, from the distributions of the xs. The model may now be generalized to the case of k regressors:
Wooldridge, Introductory Econometrics, d ed. Chapter 3: Multiple regression analysis: Estimation In multiple regression analysis, we extend the simple (two-variable) regression model to consider the possibility
More informationA unified framework for studying parameter identifiability and estimation in biased sampling designs
Biometrika Advance Access published January 31, 2011 Biometrika (2011), pp. 1 13 C 2011 Biometrika Trust Printed in Great Britain doi: 10.1093/biomet/asq059 A unified framework for studying parameter identifiability
More informationUsing Geographic Information Systems for Exposure Assessment
Using Geographic Information Systems for Exposure Assessment Ravi K. Sharma, PhD Department of Behavioral & Community Health Sciences, Graduate School of Public Health, University of Pittsburgh, Pittsburgh,
More informationThis paper revisits certain issues concerning differences
ORIGINAL ARTICLE On the Distinction Between Interaction and Effect Modification Tyler J. VanderWeele Abstract: This paper contrasts the concepts of interaction and effect modification using a series of
More informationLecture 7: Interaction Analysis. Summer Institute in Statistical Genetics 2017
Lecture 7: Interaction Analysis Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 39 Lecture Outline Beyond main SNP effects Introduction to Concept of Statistical Interaction
More informationProblems for 3505 (2011)
Problems for 505 (2011) 1. In the simplex of genotype distributions x + y + z = 1, for two alleles, the Hardy- Weinberg distributions x = p 2, y = 2pq, z = q 2 (p + q = 1) are characterized by y 2 = 4xz.
More informationCasual Mediation Analysis
Casual Mediation Analysis Tyler J. VanderWeele, Ph.D. Upcoming Seminar: April 21-22, 2017, Philadelphia, Pennsylvania OXFORD UNIVERSITY PRESS Explanation in Causal Inference Methods for Mediation and Interaction
More informationTESTS FOR EQUIVALENCE BASED ON ODDS RATIO FOR MATCHED-PAIR DESIGN
Journal of Biopharmaceutical Statistics, 15: 889 901, 2005 Copyright Taylor & Francis, Inc. ISSN: 1054-3406 print/1520-5711 online DOI: 10.1080/10543400500265561 TESTS FOR EQUIVALENCE BASED ON ODDS RATIO
More informationExact McNemar s Test and Matching Confidence Intervals Michael P. Fay April 25,
Exact McNemar s Test and Matching Confidence Intervals Michael P. Fay April 25, 2016 1 McNemar s Original Test Consider paired binary response data. For example, suppose you have twins randomized to two
More informationDependent Nondifferential Misclassification of Exposure
Dependent Nondifferential Misclassification of Exposure DISCLAIMER: I am REALLY not an expert in data simulations or misclassification Outline Relevant definitions Review of implications of dependent nondifferential
More informationProbability of Detecting Disease-Associated SNPs in Case-Control Genome-Wide Association Studies
Probability of Detecting Disease-Associated SNPs in Case-Control Genome-Wide Association Studies Ruth Pfeiffer, Ph.D. Mitchell Gail Biostatistics Branch Division of Cancer Epidemiology&Genetics National
More informationBayesian Hierarchical Models
Bayesian Hierarchical Models Gavin Shaddick, Millie Green, Matthew Thomas University of Bath 6 th - 9 th December 2016 1/ 34 APPLICATIONS OF BAYESIAN HIERARCHICAL MODELS 2/ 34 OUTLINE Spatial epidemiology
More informationData, Design, and Background Knowledge in Etiologic Inference
Data, Design, and Background Knowledge in Etiologic Inference James M. Robins I use two examples to demonstrate that an appropriate etiologic analysis of an epidemiologic study depends as much on study
More informationCausal Inference for Case-Control Studies. Sherri Rose. A dissertation submitted in partial satisfaction of the. requirements for the degree of
Causal Inference for Case-Control Studies By Sherri Rose A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Biostatistics in the Graduate Division
More informationSemiparametric maximum likelihood estimation exploiting gene-environment independence in case-control studies
Biometrika (2005), 92, 2, pp. 399 418 2005 Biometrika Trust Printed in Great Britain Semiparametric maximum likelihood estimation exploiting gene-environment independence in case-control studies BY NILANJAN
More informationStatistics 3858 : Contingency Tables
Statistics 3858 : Contingency Tables 1 Introduction Before proceeding with this topic the student should review generalized likelihood ratios ΛX) for multinomial distributions, its relation to Pearson
More information8 Nominal and Ordinal Logistic Regression
8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on
More informationAssess Assumptions and Sensitivity Analysis. Fan Li March 26, 2014
Assess Assumptions and Sensitivity Analysis Fan Li March 26, 2014 Two Key Assumptions 1. Overlap: 0
More informationSTAT331. Cox s Proportional Hazards Model
STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations
More informationAppendix: Modeling Approach
AFFECTIVE PRIMACY IN INTRAORGANIZATIONAL TASK NETWORKS Appendix: Modeling Approach There is now a significant and developing literature on Bayesian methods in social network analysis. See, for instance,
More informationSocial Epidemiology and Spatial Epidemiology: An Empirical Comparison of Perspectives
Social Epidemiology and Spatial Epidemiology: An Empirical Comparison of Perspectives A DISSERTATION SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY Kelsey Nathel McDonald
More informationHERITABILITY ESTIMATION USING A REGULARIZED REGRESSION APPROACH (HERRA)
BIRS 016 1 HERITABILITY ESTIMATION USING A REGULARIZED REGRESSION APPROACH (HERRA) Malka Gorfine, Tel Aviv University, Israel Joint work with Li Hsu, FHCRC, Seattle, USA BIRS 016 The concept of heritability
More informationCAUSAL INFERENCE IN THE EMPIRICAL SCIENCES. Judea Pearl University of California Los Angeles (www.cs.ucla.edu/~judea)
CAUSAL INFERENCE IN THE EMPIRICAL SCIENCES Judea Pearl University of California Los Angeles (www.cs.ucla.edu/~judea) OUTLINE Inference: Statistical vs. Causal distinctions and mental barriers Formal semantics
More informationJoint, Conditional, & Marginal Probabilities
Joint, Conditional, & Marginal Probabilities The three axioms for probability don t discuss how to create probabilities for combined events such as P [A B] or for the likelihood of an event A given that
More informationInvestigating mediation when counterfactuals are not metaphysical: Does sunlight exposure mediate the effect of eye-glasses on cataracts?
Investigating mediation when counterfactuals are not metaphysical: Does sunlight exposure mediate the effect of eye-glasses on cataracts? Brian Egleston Fox Chase Cancer Center Collaborators: Daniel Scharfstein,
More informationContingency Tables Part One 1
Contingency Tables Part One 1 STA 312: Fall 2012 1 See last slide for copyright information. 1 / 32 Suggested Reading: Chapter 2 Read Sections 2.1-2.4 You are not responsible for Section 2.5 2 / 32 Overview
More informationwhere Female = 0 for males, = 1 for females Age is measured in years (22, 23, ) GPA is measured in units on a four-point scale (0, 1.22, 3.45, etc.
Notes on regression analysis 1. Basics in regression analysis key concepts (actual implementation is more complicated) A. Collect data B. Plot data on graph, draw a line through the middle of the scatter
More information11 November 2011 Department of Biostatistics, University of Copengen. 9:15 10:00 Recap of case-control studies. Frequency-matched studies.
Matched and nested case-control studies Bendix Carstensen Steno Diabetes Center, Gentofte, Denmark http://staff.pubhealth.ku.dk/~bxc/ Department of Biostatistics, University of Copengen 11 November 2011
More informationExpression QTLs and Mapping of Complex Trait Loci. Paul Schliekelman Statistics Department University of Georgia
Expression QTLs and Mapping of Complex Trait Loci Paul Schliekelman Statistics Department University of Georgia Definitions: Genes, Loci and Alleles A gene codes for a protein. Proteins due everything.
More informationProbability and Probability Distributions. Dr. Mohammed Alahmed
Probability and Probability Distributions 1 Probability and Probability Distributions Usually we want to do more with data than just describing them! We might want to test certain specific inferences about
More informationApplications of GIS in Health Research. West Nile virus
Applications of GIS in Health Research West Nile virus Outline Part 1. Applications of GIS in Health research or spatial epidemiology Disease Mapping Cluster Detection Spatial Exposure Assessment Assessment
More informationAnalysis of Longitudinal Data. Patrick J. Heagerty PhD Department of Biostatistics University of Washington
Analysis of Longitudinal Data Patrick J Heagerty PhD Department of Biostatistics University of Washington Auckland 8 Session One Outline Examples of longitudinal data Scientific motivation Opportunities
More informationWhat Causality Is (stats for mathematicians)
What Causality Is (stats for mathematicians) Andrew Critch UC Berkeley August 31, 2011 Introduction Foreword: The value of examples With any hard question, it helps to start with simple, concrete versions
More informationEffects of Exposure Measurement Error When an Exposure Variable Is Constrained by a Lower Limit
American Journal of Epidemiology Copyright 003 by the Johns Hopkins Bloomberg School of Public Health All rights reserved Vol. 157, No. 4 Printed in U.S.A. DOI: 10.1093/aje/kwf17 Effects of Exposure Measurement
More informationStatistics in medicine
Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu
More informationJoint, Conditional, & Marginal Probabilities
Joint, Conditional, & Marginal Probabilities Statistics 110 Summer 2006 Copyright c 2006 by Mark E. Irwin Joint, Conditional, & Marginal Probabilities The three axioms for probability don t discuss how
More informationIV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors
IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors Laura Mayoral IAE, Barcelona GSE and University of Gothenburg Gothenburg, May 2015 Roadmap Deviations from the standard
More informationMark Scheme (Results) January 2009
Mark (Results) January 009 GCE GCE Mathematics (666/0) Edexcel Limited. Registered in England and Wales No. 4496750 Registered Office: One90 High Holborn, London WCV 7BH January 009 666 Core Mathematics
More informationUnbiased estimation of exposure odds ratios in complete records logistic regression
Unbiased estimation of exposure odds ratios in complete records logistic regression Jonathan Bartlett London School of Hygiene and Tropical Medicine www.missingdata.org.uk Centre for Statistical Methodology
More informationCausal inference in epidemiological practice
Causal inference in epidemiological practice Willem van der Wal Biostatistics, Julius Center UMC Utrecht June 5, 2 Overview Introduction to causal inference Marginal causal effects Estimating marginal
More information