Lecture Discussion. Confounding, Non-Collapsibility, Precision, and Power Statistics Statistical Methods II. Presented February 27, 2018

1 Confounding, Non-Collapsibility, Precision, and Power
Statistics Statistical Methods II
Presented February 27, 2018
Dan Gillen, Department of Statistics, University of California, Irvine

2 Various definitions of confounding
1. A type of bias in estimating causal effects, resulting from a mixing of effects of extraneous factors with the effect of interest. In this setting, a confounder is usually defined as a third variable that causally affects both the predictor of interest and the outcome.
2. The phenomenon that occurs when stratum-specific and crude measurements differ. The stratification variable would then be termed a confounder.
3. Inseparability of main effects and interactions under a particular controlled design.

3 Counterfactual approach to causation (Neyman, 1923)
Suppose that N units are to be assigned one of K treatments $x_0, x_1, \ldots, x_{K-1}$, with $x_0$ the referent treatment. The outcome of interest for the ith unit is the value of the response variable $Y_i$. Further, suppose that $Y_i$ will equal $y_{ik}$ if unit i is assigned treatment $x_k$.
Then the causal effect of $x_k$ on $Y_i$ relative to $x_0$ is defined to be a specified contrast of $y_{ik}$ and $y_{i0}$, say $h(y_{ik}, y_{i0})$. For example, we may take h to be the difference $y_{ik} - y_{i0}$.
Of course, because only one of the potential outcomes $y_{ik}$ ($k \ge 0$) can be observed in any one unit, an individual effect $y_{ik} - y_{i0}$ cannot be observed.

4 Counterfactual approach to causation (Neyman, 1923)
Probabilistic extension: Consider the joint distribution $F(y_0, \ldots, y_{K-1})$ of $y_{i0}, \ldots, y_{i,K-1}$ in a population of units.
Then consider population effects defined by differences among the marginal distributions $F(y_0), \ldots, F(y_{K-1})$, or by a summary measure of these marginal distributions, eg. $\mu_k - \mu_0$, where $\mu_k$ represents the mean of the distribution $F(y_k)$.
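The counterfactual setup on the two slides above can be sketched in Python. This is only an illustration under stated assumptions: the potential outcomes are simulated, and the sample size, the effect size of 2.0, the seed, and all variable names are hypothetical rather than taken from the lecture. Each unit carries a full row of potential outcomes, but only the entry for its assigned treatment is observed, so individual effects are not observable while the contrast of marginal means can still be estimated under random assignment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical potential outcomes for N units under K = 2 treatments.
N = 10_000
y0 = rng.normal(loc=0.0, scale=1.0, size=N)       # outcome if assigned x_0
y1 = y0 + 2.0 + rng.normal(scale=0.5, size=N)     # outcome if assigned x_1
potential = np.column_stack([y0, y1])             # potential[i, k] = y_ik

# Population causal effect: contrast of marginal means, mu_1 - mu_0.
true_effect = potential[:, 1].mean() - potential[:, 0].mean()

# Randomly assign one treatment per unit; only that potential outcome is seen.
assigned = rng.integers(0, 2, size=N)
observed = potential[np.arange(N), assigned]

# Under randomization the difference in observed group means estimates mu_1 - mu_0.
estimate = observed[assigned == 1].mean() - observed[assigned == 0].mean()
print(f"true mu_1 - mu_0: {true_effect:.2f}, estimate: {estimate:.2f}")
```

The individual differences `y1 - y0` exist only inside the simulation; an analyst would see `observed` and `assigned` and nothing else.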

5 Confounding based on the counterfactual model
Suppose we wish to determine the effect of applying a treatment $x_1$ on a parameter $\mu$ in population A, relative to applying treatment $x_0$.
Suppose that $\mu$ will equal $\mu_{A1}$ if $x_1$ is administered to population A, and will equal $\mu_{A0}$ if $x_0$ is administered to population A.
Of course, if treatment $x_1$ is administered to the target population, A, then we will be able to observe $\mu_{A1}$, but $\mu_{A0}$ will be unobserved.
To obtain a comparison measure, we will instead administer treatment $x_0$ to a control population, B, allowing us to observe $\mu_{B0}$.

6 Confounding based on the counterfactual model
The causal effect of $x_1$ relative to $x_0$ based on the counterfactual model is defined as the change from $\mu_{A0}$ to $\mu_{A1}$, based on some specified contrast of the two measures.
Since we cannot observe $\mu_{A0}$, we must instead base our inference on the contrast between $\mu_{B0}$ and $\mu_{A1}$, eg. $\mu_{A1} - \mu_{B0}$.
Based on the counterfactual model, we say that confounding exists if
$\mu_{A1} - \mu_{A0} \ne \mu_{A1} - \mu_{B0}$
or equivalently if
$\mu_{A0} \ne \mu_{B0}$

7 Confounding based on the counterfactual model
Notice however that the counterfactual definition of confounding states no explicit differences between populations A and B with respect to covariates that might affect $\mu$.
Clearly, if $\mu_{A0}$ and $\mu_{B0}$ differ, then A and B must differ with respect to covariates that affect $\mu$, these covariates being termed confounders in the counterfactual context.
This definition differs from definitions (1) and (2) above in the sense that, although drastic differences in covariate distributions may occur between the comparison populations, $\mu_{A0}$ and $\mu_{B0}$ may still be equal, resulting in no confounding based on the counterfactual definition.

8 (Unrealistic but possible) Example
The effect of Statin use on mean total cholesterol.
Potential confounders, each with an effect on total cholesterol: age, obesity.
What if younger obese patients were more likely to be randomized to Statins?
Possible scenario: The adverse effect of the large proportion of obese patients in the Statins group may offset the beneficial effect of the large proportion of younger patients, leaving
$\mu_{\text{Control, Young Obese}} = \mu_{\text{Control, Older Non-obese}}$
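A small numerical sketch of this offsetting scenario follows. All numbers are hypothetical (a baseline of 200, a shift of -20 for being young and +20 for being obese, and the group compositions); they were chosen only so that the two covariate effects cancel. The point is that the covariate distributions of the two groups differ sharply, yet the mean under control is the same in both, so there is no confounding in the counterfactual sense.

```python
def control_mean(young: bool, obese: bool) -> float:
    """Hypothetical mean total cholesterol under control for one covariate profile."""
    base = 200.0
    age_shift = -20.0 if young else 0.0      # assumed beneficial effect of youth
    obesity_shift = 20.0 if obese else 0.0   # assumed adverse effect of obesity
    return base + age_shift + obesity_shift


def population_mean(mix: dict) -> float:
    """Average control mean over a mix {(young, obese): proportion}."""
    return sum(p * control_mean(young, obese) for (young, obese), p in mix.items())


# Hypothetical compositions: the Statin group is mostly young and obese.
statin_group = {(True, True): 0.8, (False, False): 0.2}
control_group = {(True, True): 0.2, (False, False): 0.8}

mu_A0 = population_mean(statin_group)   # what the Statin group would show untreated
mu_B0 = population_mean(control_group)  # what the control group shows
print(mu_A0, mu_B0)
```

Changing either shift so that they no longer cancel makes `mu_A0` and `mu_B0` differ, which is confounding under the counterfactual definition.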

9 Collapsibility (Greenland, et al, 1999)
Consider an $I \times J \times K$ contingency table representing the joint distribution of three discrete variables X, Y, and Z, with the $I \times J$ marginal table representing the joint distribution of X and Y, and the set of K $I \times J$ subtables representing the joint distribution of X and Y within levels of Z.
Then a measure of association of X and Y is said to be collapsible across Z if it is constant across the strata of Z and this constant value equals the value obtained from the marginal table.

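10 Collapsibility (Greenland, et al, 1999)
Example:
[Table: joint distribution of X and Y within strata Z = 1 and Z = 0 and in the marginal table, with columns X = 1 and X = 0 for each; rows Y = 1, Y = 0, risks (Pr[Y = 1]), risk differences, risk ratios, and odds ratios. The cell values are not preserved in this transcription.]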

11 Collapsibility (Greenland, et al, 1999)
In this case:
1. The risk difference is strictly collapsible (stratum-specific measures equal to the marginal measure).
2. The risk ratio is not collapsible (summary measure varies across the strata of Z).
3. The odds ratio is not collapsible (stratum-specific measures not equal to the marginal measure).
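The three conclusions above can be checked mechanically. The sketch below computes the risk difference, risk ratio, and odds ratio within each stratum of Z and in the marginal table. The joint probabilities are illustrative values (an assumption, since the example table's entries are not available here), chosen so that they reproduce the pattern described: a constant risk difference, a risk ratio that varies across strata, and a constant stratum-specific odds ratio that differs from the marginal one.

```python
# joint[z][x] = (Pr[Y=1, X=x, Z=z], Pr[Y=0, X=x, Z=z]); illustrative values only.
joint = {
    1: {1: (0.20, 0.05), 0: (0.15, 0.10)},
    0: {1: (0.10, 0.15), 0: (0.05, 0.20)},
}


def measures(cells):
    """Return (risk difference, risk ratio, odds ratio) comparing X=1 with X=0."""
    r1 = cells[1][0] / sum(cells[1])   # Pr[Y=1 | X=1]
    r0 = cells[0][0] / sum(cells[0])   # Pr[Y=1 | X=0]
    rd = r1 - r0
    rr = r1 / r0
    odds_ratio = (r1 / (1 - r1)) / (r0 / (1 - r0))
    return rd, rr, odds_ratio


# Marginal table: add the two Z strata cell by cell.
marginal = {
    x: tuple(joint[1][x][j] + joint[0][x][j] for j in range(2)) for x in (0, 1)
}

results = {
    "Z=1": measures(joint[1]),
    "Z=0": measures(joint[0]),
    "marginal": measures(marginal),
}
for label, (rd, rr, odds_ratio) in results.items():
    print(f"{label}: RD={rd:.2f} RR={rr:.2f} OR={odds_ratio:.2f}")
```

With these inputs the risk difference is the same in all three tables, the risk ratio differs between the strata, and the two stratum-specific odds ratios agree with each other but not with the marginal odds ratio.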

12 Example: Noncollapsibility without confounding
Objective: To investigate the effect of an experimental treatment (Tx) on the response probability for the outcome Y in a population A.
To investigate the effect of Tx, a control sample B is enlisted.
Sample B is chosen so that the distribution of the potential confounder Z is the same as that in the sample from population A.

13 Example: Noncollapsibility without confounding
[Table: for the index sample (A) and the control sample (B), the response probability if Tx = 1 and if Tx = 0, with stratum size, by stratum of Z and unconditional on Z. Most cell values are not preserved in this transcription; the surviving entry is 0.4 in the "Unconditional on Z" row of the control sample (B).]

14 Example: Noncollapsibility without confounding
First note that no confounding exists (with respect to Z) in the covariate imbalance definition, since this was fixed by design, nor in the counterfactual definition, since:
$\text{True crude OR} = \frac{\mu_{A1}/(1-\mu_{A1})}{\mu_{A0}/(1-\mu_{A0})} = \frac{0.6/(1-0.6)}{0.4/(1-0.4)} = 2.25 = \frac{\mu_{A1}/(1-\mu_{A1})}{\mu_{B0}/(1-\mu_{B0})} = \text{Observable crude OR}$

15 Example: Noncollapsibility without confounding
But within the levels of Z, we have
$\mathrm{OR}_{Z=1} = \frac{0.9/(1-0.9)}{0.7/(1-0.7)} = 3.86$
$\mathrm{OR}_{Z=0} = \frac{0.3/(1-0.3)}{0.1/(1-0.1)} = 3.86$
Thus the stratum-specific estimates of the OR are equal, yet different from the crude (marginal) OR, ie. the OR is noncollapsible.
Note that this phenomenon is not bias, but requires careful interpretation of marginal and stratum-specific effects.
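The arithmetic on this slide and the previous one can be reproduced in a few lines. The stratum-specific response probabilities (0.9 and 0.7 for Z = 1, 0.3 and 0.1 for Z = 0) are read off the odds-ratio calculations above. Equal stratum weights are an assumption; they are the weights that reproduce the marginal probabilities 0.6 and 0.4 used in the crude odds ratio. Function and variable names are illustrative.

```python
def odds(p: float) -> float:
    return p / (1.0 - p)


def odds_ratio(p1: float, p0: float) -> float:
    return odds(p1) / odds(p0)


# (response probability if Tx=1, response probability if Tx=0) by stratum of Z.
prob = {"Z=1": (0.9, 0.7), "Z=0": (0.3, 0.1)}
weights = {"Z=1": 0.5, "Z=0": 0.5}   # assumed equal stratum sizes

# Stratum-specific (conditional) odds ratios.
stratum_or = {z: odds_ratio(p1, p0) for z, (p1, p0) in prob.items()}

# Marginal response probabilities: average over the distribution of Z.
marg_tx1 = sum(weights[z] * prob[z][0] for z in prob)
marg_tx0 = sum(weights[z] * prob[z][1] for z in prob)
crude_or = odds_ratio(marg_tx1, marg_tx0)

print({z: round(v, 2) for z, v in stratum_or.items()}, round(crude_or, 2))
```

Because Z has the same distribution under both treatments, the gap between the stratum-specific and crude odds ratios here is produced entirely by averaging on the probability scale before taking odds, not by any imbalance in Z.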

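16 Example: Confounding with collapsibility
[Table: for the index sample (A) and the control sample (B), the response probability if Tx = 1 and if Tx = 0, with stratum size, by stratum of Z and unconditional on Z. The stratum sizes of the control sample (B) now differ from those of the index sample (A). Most cell values are not preserved in this transcription; the surviving entry is 0.28 in the "Unconditional on Z" row of the control sample (B).]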

17 Example: Confounding with collapsibility
By changing the number of subjects with Z = 0, we have introduced confounding. To see this, note that
$\text{Observable crude OR} = \frac{\mu_{A1}/(1-\mu_{A1})}{\mu_{B0}/(1-\mu_{B0})} = \frac{0.6/(1-0.6)}{0.28/(1-0.28)} = 3.86 \ne \text{True crude OR } (2.25)$
On the other hand, the crude OR of 3.86 does now equal the stratum-specific odds ratios computed previously.
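A sketch of the same calculation with an imbalanced control sample. The stratum-specific probabilities are those used in the preceding example. The control-sample weights of 0.3 for Z = 1 and 0.7 for Z = 0 are an assumption: they are the proportions that reproduce the unconditional control probability of 0.28 quoted above, since the actual stratum sizes are not available here.

```python
def odds_ratio(p1: float, p0: float) -> float:
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))


# (response probability if Tx=1, response probability if Tx=0) by stratum of Z.
prob = {"Z=1": (0.9, 0.7), "Z=0": (0.3, 0.1)}
weights_A = {"Z=1": 0.5, "Z=0": 0.5}   # index sample: assumed equal strata
weights_B = {"Z=1": 0.3, "Z=0": 0.7}   # control sample: assumed, gives 0.28

mu_A1 = sum(weights_A[z] * prob[z][0] for z in prob)   # observable
mu_A0 = sum(weights_A[z] * prob[z][1] for z in prob)   # counterfactual, unobserved
mu_B0 = sum(weights_B[z] * prob[z][1] for z in prob)   # observable substitute

true_crude_or = odds_ratio(mu_A1, mu_A0)
observable_crude_or = odds_ratio(mu_A1, mu_B0)
confounded = abs(mu_A0 - mu_B0) > 1e-12   # counterfactual definition

print(round(mu_B0, 2), round(true_crude_or, 2), round(observable_crude_or, 2), confounded)
```

The observable crude odds ratio now matches the stratum-specific value of about 3.86, so the measure looks collapsible even though the comparison is confounded.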

18 Extension to regression
Consider a generalized linear model for the regression of Y on two covariates X and Z:
$g[E(Y \mid X = x, Z = z)] = \beta_0 + \beta_1 x + \beta_2 z$.
Then the regression is said to be noncollapsible for $\beta_1$ over Z if $\beta_1 \ne \beta_1^*$ in the regression omitting Z,
$g[E(Y \mid X = x)] = \beta_0^* + \beta_1^* x$.

19 Extension to regression
Suppose that the full model is correct. Then $\beta_1$ is guaranteed to be collapsible over Z in the following situations:
1. $\beta_2 = 0$ (ie. no association between Y and Z)
2. Neither $\beta_1$ nor $\beta_2$ is zero, X and Z are independent, AND g is the identity or log link (Gail, Wieand and Piantadosi, 1984; Gail 1986).
Also note that collapsibility for $\beta_1$ over Z can occur even if X and Z are associated. Thus we cannot equate collapsibility over Z with independence of X and Z.

20 Extension to regression
Write $\beta_1^*$ for the coefficient of X in the regression omitting Z. When X and Z are independent, as in situation (2), but the regression is nonetheless noncollapsible over Z, the difference between $\beta_1$ and $\beta_1^*$ is often interpreted as bias due to confounding.
However, this is not generally true unless g is the identity or log link. Instead, we must take extra precaution in interpreting stratum-specific and population-averaged (marginal) effects.
That is, if X and Z are independent, it is possible for $\beta_1$ to unbiasedly represent the effect of manipulating X within levels of Z, and at the same time, for $\beta_1^*$ to unbiasedly represent the unconditional effect of manipulating X, even though $\beta_1 \ne \beta_1^*$.

21 Graphical representation of noncollapsibility in logistic regression
$X \sim N(0, 1)$
Z a 3-level categorical predictor ($z_i$ representing an indicator for groups i = 2, 3), Z independent of X
Full: $\mathrm{logit}[E(Y \mid X = x, Z = z)] = \beta_0 + \beta_1 x + \beta_2 z_2 + \beta_3 z_3$
Reduced: $\mathrm{logit}[E(Y \mid X = x)] = \beta_0^* + \beta_1^* x$
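The setup above can be explored numerically without simulation. The sketch below averages the three stratum-specific logistic curves over Z to obtain the marginal probability, then computes the slope of the marginal curve on the logit scale. The coefficient values (0 and 1 for the intercept and the X effect, -3 and 3 for the group effects) and the equal group proportions are hypothetical choices, not values from the lecture. Note that the marginal curve is in general not exactly logistic in x, so the reduced-model coefficient only summarizes this local slope.

```python
import numpy as np


def expit(u):
    return 1.0 / (1.0 + np.exp(-u))


beta0, beta1 = 0.0, 1.0
z_effects = np.array([0.0, -3.0, 3.0])   # hypothetical: reference group, beta_2, beta_3
z_probs = np.array([1 / 3, 1 / 3, 1 / 3])  # hypothetical group proportions

x = np.linspace(-3.0, 3.0, 61)

# Stratum-specific probabilities, one column per level of Z.
stratum_p = expit(beta0 + beta1 * x[:, None] + z_effects[None, :])

# Marginal probability: average the stratum curves over the distribution of Z.
marginal_p = stratum_p @ z_probs

# Slope of logit(marginal_p) in x, by the chain rule:
# d/dx logit(p) = p' / (p (1 - p)), with p' = beta1 * sum_k w_k p_k (1 - p_k).
dp_dx = beta1 * (stratum_p * (1.0 - stratum_p)) @ z_probs
slope_marginal = dp_dx / (marginal_p * (1.0 - marginal_p))

slope_at_zero = slope_marginal[len(x) // 2]
print(f"conditional slope: {beta1:.2f}, marginal slope at x=0: {slope_at_zero:.2f}")
```

On the logit scale every stratum curve has slope `beta1`, while the marginal curve is flatter at every x in the grid, even though Z is independent of X by construction.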

22 Graphical representation of noncollapsibility in logistic regression
[Figure: Prob[Y = 1] plotted against X, showing the stratum-specific probabilities and the marginal probability.]

23 Effect of adjustment for precision variables on power
Linear regression
Consider the linear regression model:
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + \epsilon_i$, where $\epsilon_i \sim N(0, \sigma_F^2)$.
Further consider the reduced model:
$Y_i = \beta_0^* + \beta_1^* X_i + \epsilon_i^*$, where $\epsilon_i^* \sim N(0, \sigma_R^2)$.

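24 Effect of adjustment for precision variables on power
Linear regression (cont'd)
Suppose X and Z are independent and $\beta_2 \ne 0$. Then, with $\beta_1$ the coefficient of X in the full (adjusted) model and $\beta_1^*$ the coefficient in the reduced (unadjusted) model:
1. $\beta_1 = \beta_1^*$ (collapsible due to identity link)
2. $\mathrm{Var}(\hat\beta_1) < \mathrm{Var}(\hat\beta_1^*)$
Thus if Z is a predictor of Y and Z is independent of X, we can gain power by adjusting for Z.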
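A small simulation sketch of the two linear-regression claims above, comparing the estimated coefficient of X with and without adjustment for an independent predictor Z. The sample size, number of replicates, coefficient values, unit error variance, and seed are all hypothetical choices. Ordinary least squares is computed with `numpy.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 200, 2000
beta0, beta1, beta2 = 0.0, 1.0, 2.0   # hypothetical true values

full_est = np.empty(reps)      # estimates of the X coefficient, adjusting for Z
reduced_est = np.empty(reps)   # estimates of the X coefficient, omitting Z

for r in range(reps):
    x = rng.normal(size=n)
    z = rng.normal(size=n)                       # generated independently of x
    y = beta0 + beta1 * x + beta2 * z + rng.normal(size=n)

    design_full = np.column_stack([np.ones(n), x, z])
    design_reduced = np.column_stack([np.ones(n), x])

    full_est[r] = np.linalg.lstsq(design_full, y, rcond=None)[0][1]
    reduced_est[r] = np.linalg.lstsq(design_reduced, y, rcond=None)[0][1]

print(f"mean estimate  full: {full_est.mean():.3f}  reduced: {reduced_est.mean():.3f}")
print(f"variance       full: {full_est.var():.4f}  reduced: {reduced_est.var():.4f}")
```

Both estimators centre on the same value, but the unadjusted estimator is noisier because the omitted term is absorbed into the error: when X and Z are independent, the reduced-model error variance is the full-model error variance plus $\beta_2^2 \mathrm{Var}(Z)$.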

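25 Effect of adjustment for precision variables on power
[Figure: Linear regression. Two panels, the reduced (unadjusted) model and the full (adjusted) model, each marking 0 and the estimate of the coefficient of X ($\hat\beta_1^*$ in the reduced panel, $\hat\beta_1$ in the full panel).]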

26 Effect of adjustment for precision variables on power
Logistic regression
Let $Y_i \sim B(1, \pi_i)$ be a response such that
$\log\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1 X_i + \beta_2 Z_i$
Further consider the reduced model:
$\log\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_0^* + \beta_1^* X_i$.

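27 Effect of adjustment for precision variables on power
[Figure: Linear regression. Two panels, the reduced (unadjusted) model and the full (adjusted) model, each marking 0 and the estimate of the coefficient of X ($\hat\beta_1^*$ in the reduced panel, $\hat\beta_1$ in the full panel).]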

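28 Effect of adjustment for precision variables on power
Logistic regression (cont'd)
Suppose X and Z are independent and $\beta_2 \ne 0$. Then, with $\beta_1$ the coefficient of X in the full (adjusted) model and $\beta_1^*$ the coefficient in the reduced (unadjusted) model:
1. $\beta_1^* < \beta_1$
2. $\mathrm{Var}(\hat\beta_1^*) < \mathrm{Var}(\hat\beta_1)$
Thus although adjustment for Z may increase variability of the estimate of $\beta_1$, this can be offset by $\beta_1$'s position in relation to the null hypothesis (further from the null than $\beta_1^*$), resulting in relatively little effect on power.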
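A simulation sketch of the logistic-regression case, with a positive true coefficient for X. The logistic fits use a short Newton-Raphson routine written out here so the block depends only on NumPy; `fit_logistic` is an illustrative helper, not a library function, and it has no safeguards such as step halving. The sample size, number of replicates, coefficient values, and seed are hypothetical choices.

```python
import numpy as np


def fit_logistic(design, y, iters=25):
    """Maximum-likelihood logistic regression via plain Newton-Raphson."""
    beta = np.zeros(design.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        w = p * (1.0 - p)
        score = design.T @ (y - p)                   # gradient of the log-likelihood
        info = design.T @ (design * w[:, None])      # observed information matrix
        beta = beta + np.linalg.solve(info, score)
    return beta


rng = np.random.default_rng(2)
n, reps = 500, 300
beta0, beta1, beta2 = 0.0, 1.0, 2.0   # hypothetical true values in the full model

full_est = np.empty(reps)      # X coefficient, adjusting for Z
reduced_est = np.empty(reps)   # X coefficient, omitting Z

for r in range(reps):
    x = rng.normal(size=n)
    z = rng.normal(size=n)                           # generated independently of x
    pi = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x + beta2 * z)))
    y = rng.binomial(1, pi).astype(float)

    full_est[r] = fit_logistic(np.column_stack([np.ones(n), x, z]), y)[1]
    reduced_est[r] = fit_logistic(np.column_stack([np.ones(n), x]), y)[1]

print(f"mean estimate  full: {full_est.mean():.3f}  reduced: {reduced_est.mean():.3f}")
print(f"variance       full: {full_est.var():.4f}  reduced: {reduced_est.var():.4f}")
```

In contrast to the linear case, the two estimators target different quantities here: the unadjusted estimate is pulled toward zero, while the adjusted estimate is more variable, so the net effect on power depends on how these two changes trade off.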

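29 Effect of adjustment for precision variables on power
[Figure: Logistic regression. Two panels, the reduced (unadjusted) model and the full (adjusted) model, each marking 0 and the estimate of the coefficient of X ($\hat\beta_1^*$ in the reduced panel, $\hat\beta_1$ in the full panel).]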

30 Linear and logistic regression
[Table: mean $\hat\beta_1$ and power under the reduced and full models, for linear regression and for logistic regression, each with $\beta_1 = 1$ and three values of $\beta_2$. The values of $\beta_2$ and the table entries are not preserved in this transcription.]

31 Effect of adjustment for precision variables on power
Proportional hazards regression
The proportional hazards model specifies that the hazard function, $\lambda(t \mid X, Z)$, is given by
$\lambda(t \mid X, Z) = \lambda_0(t) e^{\beta_1 X + \beta_2 Z}$
Further consider the reduced model:
$\lambda(t \mid X) = \lambda_0^*(t) e^{\beta_1^* X}$

32 Effect of adjustment for precision variables on power
Proportional hazards regression (cont'd)
Conjecture 1: Suppose X and Z are independent, $\beta_2 \ne 0$, and proportional hazards holds in the adjusted model. Then, with starred quantities referring to the reduced (unadjusted) model:
1. $\beta_1^* < \beta_1$
2. $\mathrm{Var}(\hat\beta_1^*) < \mathrm{Var}(\hat\beta_1)$

33 Effect of adjustment for precision variables on power
Proportional hazards regression (cont'd)
Conjecture 2: Suppose X and Z are independent, $\beta_2 \ne 0$, and proportional hazards holds in the unadjusted model. Then for local deviations from proportional hazards in the adjusted model, with starred quantities referring to the reduced (unadjusted) model:
1. $\beta_1^* < \beta_1$
2. $\mathrm{Var}(\hat\beta_1^*) < \mathrm{Var}(\hat\beta_1)$

34 Poisson and Cox regression
[Table: mean $\hat\beta_1$ and power under the reduced and full models, for Poisson regression and for Cox regression (case 1), each with $\beta_1 = 1$ and three values of $\beta_2$. The values of $\beta_2$ and the table entries are not preserved in this transcription.]

35 Take-home messages
Various definitions of confounding exist, and they are often not distinguished from one another.
Noncollapsibility can occur in the absence of confounding, and confounding can occur in the absence of noncollapsibility.
When noncollapsibility occurs in the absence of confounding, the phenomenon is not bias, as long as one is careful to interpret estimates as either stratum-specific or population-averaged.
In the context of generalized linear models, confounding (in the counterfactual sense) and noncollapsibility are not equivalent unless using an identity or log link.

36 Take-home messages
We conjecture that, in the setting of Cox regression, substantial gains in power can be obtained by modeling important predictors of the outcome that are independent of the predictor of interest (precision variables), and that the amount of gain depends on the strength of the effect of the precision variable.

Estimating the Marginal Odds Ratio in Observational Studies

Estimating the Marginal Odds Ratio in Observational Studies Estimating the Marginal Odds Ratio in Observational Studies Travis Loux Christiana Drake Department of Statistics University of California, Davis June 20, 2011 Outline The Counterfactual Model Odds Ratios

More information

Marginal, crude and conditional odds ratios

Marginal, crude and conditional odds ratios Marginal, crude and conditional odds ratios Denitions and estimation Travis Loux Gradute student, UC Davis Department of Statistics March 31, 2010 Parameter Denitions When measuring the eect of a binary

More information

Estimating and contextualizing the attenuation of odds ratios due to non-collapsibility

Estimating and contextualizing the attenuation of odds ratios due to non-collapsibility Estimating and contextualizing the attenuation of odds ratios due to non-collapsibility Stephen Burgess Department of Public Health & Primary Care, University of Cambridge September 6, 014 Short title:

More information

Journal of Biostatistics and Epidemiology

Journal of Biostatistics and Epidemiology Journal of Biostatistics and Epidemiology Methodology Marginal versus conditional causal effects Kazem Mohammad 1, Seyed Saeed Hashemi-Nazari 2, Nasrin Mansournia 3, Mohammad Ali Mansournia 1* 1 Department

More information

Ignoring the matching variables in cohort studies - when is it valid, and why?

Ignoring the matching variables in cohort studies - when is it valid, and why? Ignoring the matching variables in cohort studies - when is it valid, and why? Arvid Sjölander Abstract In observational studies of the effect of an exposure on an outcome, the exposure-outcome association

More information

On the Use of Linear Fixed Effects Regression Models for Causal Inference

On the Use of Linear Fixed Effects Regression Models for Causal Inference On the Use of Linear Fixed Effects Regression Models for ausal Inference Kosuke Imai Department of Politics Princeton University Joint work with In Song Kim Atlantic ausal Inference onference Johns Hopkins

More information

Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal

Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal Overview In observational and experimental studies, the goal may be to estimate the effect

More information

Logistic regression: Why we often can do what we think we can do. Maarten Buis 19 th UK Stata Users Group meeting, 10 Sept. 2015

Logistic regression: Why we often can do what we think we can do. Maarten Buis 19 th UK Stata Users Group meeting, 10 Sept. 2015 Logistic regression: Why we often can do what we think we can do Maarten Buis 19 th UK Stata Users Group meeting, 10 Sept. 2015 1 Introduction Introduction - In 2010 Carina Mood published an overview article

More information

Lecture 8 Stat D. Gillen

Lecture 8 Stat D. Gillen Statistics 255 - Survival Analysis Presented February 23, 2016 Dan Gillen Department of Statistics University of California, Irvine 8.1 Example of two ways to stratify Suppose a confounder C has 3 levels

More information

OUTLINE CAUSAL INFERENCE: LOGICAL FOUNDATION AND NEW RESULTS. Judea Pearl University of California Los Angeles (www.cs.ucla.

OUTLINE CAUSAL INFERENCE: LOGICAL FOUNDATION AND NEW RESULTS. Judea Pearl University of California Los Angeles (www.cs.ucla. OUTLINE CAUSAL INFERENCE: LOGICAL FOUNDATION AND NEW RESULTS Judea Pearl University of California Los Angeles (www.cs.ucla.edu/~judea/) Statistical vs. Causal vs. Counterfactual inference: syntax and semantics

More information

Causal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification. Todd MacKenzie, PhD

Causal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification. Todd MacKenzie, PhD Causal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification Todd MacKenzie, PhD Collaborators A. James O Malley Tor Tosteson Therese Stukel 2 Overview 1. Instrumental variable

More information

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data?

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data? When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data? Kosuke Imai Department of Politics Center for Statistics and Machine Learning Princeton University Joint

More information

Lab 8. Matched Case Control Studies

Lab 8. Matched Case Control Studies Lab 8 Matched Case Control Studies Control of Confounding Technique for the control of confounding: At the design stage: Matching During the analysis of the results: Post-stratification analysis Advantage

More information

Behavioral Data Mining. Lecture 19 Regression and Causal Effects

Behavioral Data Mining. Lecture 19 Regression and Causal Effects Behavioral Data Mining Lecture 19 Regression and Causal Effects Outline Counterfactuals and Potential Outcomes Regression Models Causal Effects from Matching and Regression Weighted regression Counterfactuals

More information

Propensity Score Methods for Causal Inference

Propensity Score Methods for Causal Inference John Pura BIOS790 October 2, 2015 Causal inference Philosophical problem, statistical solution Important in various disciplines (e.g. Koch s postulates, Bradford Hill criteria, Granger causality) Good

More information

Describing Contingency tables

Describing Contingency tables Today s topics: Describing Contingency tables 1. Probability structure for contingency tables (distributions, sensitivity/specificity, sampling schemes). 2. Comparing two proportions (relative risk, odds

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu

More information

Propensity Score Analysis with Hierarchical Data

Propensity Score Analysis with Hierarchical Data Propensity Score Analysis with Hierarchical Data Fan Li Alan Zaslavsky Mary Beth Landrum Department of Health Care Policy Harvard Medical School May 19, 2008 Introduction Population-based observational

More information

Propensity Score Methods, Models and Adjustment

Propensity Score Methods, Models and Adjustment Propensity Score Methods, Models and Adjustment Dr David A. Stephens Department of Mathematics & Statistics McGill University Montreal, QC, Canada. d.stephens@math.mcgill.ca www.math.mcgill.ca/dstephens/siscr2016/

More information

Missing Covariate Data in Matched Case-Control Studies

Missing Covariate Data in Matched Case-Control Studies Missing Covariate Data in Matched Case-Control Studies Department of Statistics North Carolina State University Paul Rathouz Dept. of Health Studies U. of Chicago prathouz@health.bsd.uchicago.edu with

More information

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics Faculty of Health Sciences Regression models Counts, Poisson regression, 27-5-2013 Lene Theil Skovgaard Dept. of Biostatistics 1 / 36 Count outcome PKA & LTS, Sect. 7.2 Poisson regression The Binomial

More information

Lecture 1. Introduction Statistics Statistical Methods II. Presented January 8, 2018

Lecture 1. Introduction Statistics Statistical Methods II. Presented January 8, 2018 Introduction Statistics 211 - Statistical Methods II Presented January 8, 2018 linear models Dan Gillen Department of Statistics University of California, Irvine 1.1 Logistics and Contact Information Lectures:

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2015 Paper 334 Targeted Estimation and Inference for the Sample Average Treatment Effect Laura B. Balzer

More information

Specification Errors, Measurement Errors, Confounding

Specification Errors, Measurement Errors, Confounding Specification Errors, Measurement Errors, Confounding Kerby Shedden Department of Statistics, University of Michigan October 10, 2018 1 / 32 An unobserved covariate Suppose we have a data generating model

More information

A Decision Theoretic Approach to Causality

A Decision Theoretic Approach to Causality A Decision Theoretic Approach to Causality Vanessa Didelez School of Mathematics University of Bristol (based on joint work with Philip Dawid) Bordeaux, June 2011 Based on: Dawid & Didelez (2010). Identifying

More information

Computational Systems Biology: Biology X

Computational Systems Biology: Biology X Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA L#7:(Mar-23-2010) Genome Wide Association Studies 1 The law of causality... is a relic of a bygone age, surviving, like the monarchy,

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

Research Design: Causal inference and counterfactuals

Research Design: Causal inference and counterfactuals Research Design: Causal inference and counterfactuals University College Dublin 8 March 2013 1 2 3 4 Outline 1 2 3 4 Inference In regression analysis we look at the relationship between (a set of) independent

More information

OUTCOME REGRESSION AND PROPENSITY SCORES (CHAPTER 15) BIOS Outcome regressions and propensity scores

OUTCOME REGRESSION AND PROPENSITY SCORES (CHAPTER 15) BIOS Outcome regressions and propensity scores OUTCOME REGRESSION AND PROPENSITY SCORES (CHAPTER 15) BIOS 776 1 15 Outcome regressions and propensity scores Outcome Regression and Propensity Scores ( 15) Outline 15.1 Outcome regression 15.2 Propensity

More information

Lecture 15 (Part 2): Logistic Regression & Common Odds Ratio, (With Simulations)

Lecture 15 (Part 2): Logistic Regression & Common Odds Ratio, (With Simulations) Lecture 15 (Part 2): Logistic Regression & Common Odds Ratio, (With Simulations) Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology

More information

Graphical Representation of Causal Effects. November 10, 2016

Graphical Representation of Causal Effects. November 10, 2016 Graphical Representation of Causal Effects November 10, 2016 Lord s Paradox: Observed Data Units: Students; Covariates: Sex, September Weight; Potential Outcomes: June Weight under Treatment and Control;

More information

Impact of covariate misclassification on the power and type I error in clinical trials using covariate-adaptive randomization

Impact of covariate misclassification on the power and type I error in clinical trials using covariate-adaptive randomization Impact of covariate misclassification on the power and type I error in clinical trials using covariate-adaptive randomization L I Q I O N G F A N S H A R O N D. Y E A T T S W E N L E Z H A O M E D I C

More information

Confounding, mediation and colliding

Confounding, mediation and colliding Confounding, mediation and colliding What types of shared covariates does the sibling comparison design control for? Arvid Sjölander and Johan Zetterqvist Causal effects and confounding A common aim of

More information

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs STAT 5500/6500 Conditional Logistic Regression for Matched Pairs Motivating Example: The data we will be using comes from a subset of data taken from the Los Angeles Study of the Endometrial Cancer Data

More information

Mediation analyses. Advanced Psychometrics Methods in Cognitive Aging Research Workshop. June 6, 2016

Mediation analyses. Advanced Psychometrics Methods in Cognitive Aging Research Workshop. June 6, 2016 Mediation analyses Advanced Psychometrics Methods in Cognitive Aging Research Workshop June 6, 2016 1 / 40 1 2 3 4 5 2 / 40 Goals for today Motivate mediation analysis Survey rapidly developing field in

More information

multilevel modeling: concepts, applications and interpretations

multilevel modeling: concepts, applications and interpretations multilevel modeling: concepts, applications and interpretations lynne c. messer 27 october 2010 warning social and reproductive / perinatal epidemiologist concepts why context matters multilevel models

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

Poisson regression: Further topics

Poisson regression: Further topics Poisson regression: Further topics April 21 Overdispersion One of the defining characteristics of Poisson regression is its lack of a scale parameter: E(Y ) = Var(Y ), and no parameter is available to

More information

Ph.D. course: Regression models. Introduction. 19 April 2012

Ph.D. course: Regression models. Introduction. 19 April 2012 Ph.D. course: Regression models Introduction PKA & LTS Sect. 1.1, 1.2, 1.4 19 April 2012 www.biostat.ku.dk/~pka/regrmodels12 Per Kragh Andersen 1 Regression models The distribution of one outcome variable

More information

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Anastasios (Butch) Tsiatis Department of Statistics North Carolina State University http://www.stat.ncsu.edu/

More information

Causality II: How does causal inference fit into public health and what it is the role of statistics?

Causality II: How does causal inference fit into public health and what it is the role of statistics? Causality II: How does causal inference fit into public health and what it is the role of statistics? Statistics for Psychosocial Research II November 13, 2006 1 Outline Potential Outcomes / Counterfactual

More information

Ph.D. course: Regression models. Regression models. Explanatory variables. Example 1.1: Body mass index and vitamin D status

Ph.D. course: Regression models. Regression models. Explanatory variables. Example 1.1: Body mass index and vitamin D status Ph.D. course: Regression models Introduction PKA & LTS Sect. 1.1, 1.2, 1.4 25 April 2013 www.biostat.ku.dk/~pka/regrmodels13 Per Kragh Andersen Regression models The distribution of one outcome variable

More information

High-Throughput Sequencing Course

High-Throughput Sequencing Course High-Throughput Sequencing Course DESeq Model for RNA-Seq Biostatistics and Bioinformatics Summer 2017 Outline Review: Standard linear regression model (e.g., to model gene expression as function of an

More information

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data? When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data? Kosuke Imai Department of Politics Center for Statistics and Machine Learning Princeton University

More information

PQL Estimation Biases in Generalized Linear Mixed Models

PQL Estimation Biases in Generalized Linear Mixed Models PQL Estimation Biases in Generalized Linear Mixed Models Woncheol Jang Johan Lim March 18, 2006 Abstract The penalized quasi-likelihood (PQL) approach is the most common estimation procedure for the generalized

More information

Weighted Least Squares
