Missing Covariate Data in Matched Case-Control Studies

1 Missing Covariate Data in Matched Case-Control Studies
Department of Statistics, North Carolina State University
Paul Rathouz, Dept. of Health Studies, U. of Chicago
with Glen A. Satten, Centers for Disease Control and Prevention
and Raymond J. Carroll, Texas A&M University
October 15,

2 General Framework: Highly-Stratified or Clustered Binary Data
- Observations: $j = 1, \ldots, n$ within stratum $i$
- Strata (many!):
  - matched set $i$ of case-control data
  - multiple subjects in cluster $i$
  - (prospective) longitudinal observations on subject $i$
- Response: $D_{ij}$ (binary disease status)
- Covariates: $Z_{ij}$ (vector)

3 Logistic Regression Model for Stratified Data
For observation $j$ in stratum $i$, define the odds that $D_{ij} = 1$:
$$\theta(Z_{ij}) = \frac{\Pr(D_{ij} = 1 \mid Z_{ij})}{\Pr(D_{ij} = 0 \mid Z_{ij})}$$
and let
$$\theta(Z_{ij}) = \exp(q_i + \beta^T Z_{ij})$$
- This is the conditional logistic regression (CLR) model
- The stratum-level intercept $q_i$ is a nuisance fixed effect

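4 Conditional Likelihood for Nuisance Intercept Model
Model: $\theta = \exp(q_i + \beta^T Z_{ij})$
Data for stratum $i$: $D_i = (D_{i1}, \ldots, D_{in})$, $Z_i = (Z_{i1}, \ldots, Z_{in})$
Stratum-level likelihood $L_i(\beta, q_i) = \Pr(D_i \mid Z_i)$ can be written
$$L_i = \underbrace{\Pr\Big(D_i \,\Big|\, \sum_j D_{ij}, Z_i; \beta\Big)}_{(1)} \; \underbrace{\Pr\Big(\sum_j D_{ij} \,\Big|\, Z_i; \beta, q_i\Big)}_{(2)}$$
Important: $\sum_j D_{ij}$ is a CSS (complete sufficient statistic) for $q_i$, so there is no $q_i$ in (1).
Define the conditional likelihood for $\beta$:
$$L_i^c(\beta) = \Pr\Big(D_i \,\Big|\, \sum_j D_{ij}, Z_i; \beta\Big)$$
This is the conditional logistic regression likelihood.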
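As a hedged illustration of why the intercept drops out, the following minimal sketch evaluates the stratum conditional likelihood by brute-force enumeration for a scalar covariate. The function names `stratum_likelihood` and `conditional_likelihood` are hypothetical and not from the slides; the enumeration is only practical for small strata.

```python
import math
from itertools import combinations


def stratum_likelihood(d, z, beta, q):
    """Unconditional Pr(D_i | Z_i) when logit Pr(D_ij = 1 | Z_ij) = q + beta * z_ij."""
    lik = 1.0
    for dj, zj in zip(d, z):
        p = 1.0 / (1.0 + math.exp(-(q + beta * zj)))
        lik *= p if dj == 1 else 1.0 - p
    return lik


def conditional_likelihood(d, z, beta, q=0.0):
    """Pr(D_i | sum_j D_ij, Z_i): the observed allocation of cases divided by
    the sum over all allocations with the same number of cases."""
    n, m = len(d), sum(d)
    total = 0.0
    for cases in combinations(range(n), m):
        alloc = [1 if j in cases else 0 for j in range(n)]
        total += stratum_likelihood(alloc, z, beta, q)
    return stratum_likelihood(d, z, beta, q) / total


if __name__ == "__main__":
    d, z = [1, 0, 0], [0.2, -1.0, 1.5]
    # The value is the same for any intercept q, illustrating term (1) above.
    print(conditional_likelihood(d, z, beta=0.5, q=-2.0))
    print(conditional_likelihood(d, z, beta=0.5, q=3.0))
```

Both calls return the same value because every allocation with the same number of cases carries the same factor involving $q_i$, which cancels in the ratio.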

5 What happens when some covariates may be missing?
- Covariates: $X_{ij}$ (some may be missing), $Z_{ij}$ (always observed)
- Missing covariate indicator: $R_{ij} = I(X_{ij} \text{ observed})$
- Odds that $D_{ij} = 1$:
$$\theta(X_{ij}, Z_{ij}) = \exp(q_i + \beta_z^T Z_{ij} + \beta_x^T X_{ij})$$
- Interest is in the effects of covariates given by $\beta = (\beta_z, \beta_x)$
- The missing at random (MAR) assumption allows identification of $\beta$:
$$R_{ij} \perp X_{ij} \mid D_{ij}, Z_{ij}$$

6 Missing Data Example
Matched case-control study of hip fracture (Huo, Lauderdale and Li)
- 118 female hip fracture patients (cases) in Beijing, China
- 2 controls per case, matched on neighborhood and age (within 5 years)
- Of interest: whether reproductive factors were related to risk of hip fractures in Chinese women aged 50 years and older
- Focus on effects ($Z_{ij}$'s) of:
  - parity (per child)
  - breastfeeding (average months per child)
- Important adjustors ($X_{ij}$'s) include:
  - height (surrogate for hip axis length)
  - BMI (a well-established risk factor)
- Height and weight are self-reported and hence may be missing

7 Missing X in CLR Model: Three Tricky Features
1. Three sets of nuisance parameters to manage
   - Nuisance intercept $q_i$ in the model
   - Distribution of missing $X$: $\Pr(X_{ij} \mid Z_{ij}; \alpha)$
   - Missingness model: $\Pr(R_{ij} \mid D_{ij}, Z_{ij}; \gamma)$
2. Loss of strata with complete record analysis
   - Hip Fracture Example: $X_{ij}$'s (height and BMI) missing for 52 (15%) subjects (52/354)
   - Results in 24 (20%) matched sets dropped and 85 (24%) observations dropped (worse if only one control per case)

8 3. MAR is not "ignorable"
Likelihood (conditioning on $Z$ implicit):
$$L = \{\Pr(X \mid D)\,\Pr(R = 1 \mid D)\,\Pr(D)\}^{R}\,\{\Pr(R = 0 \mid D)\,\Pr(D)\}^{1-R}$$
Unconditional inference about $\beta$:
$$L = \{\Pr(D \mid X; \beta)\,\Pr(X; \alpha)\,\Pr(R = 1 \mid D; \gamma)\}^{R}\,\{\Pr(D; \beta, \alpha)\,\Pr(R = 0 \mid D; \gamma)\}^{1-R}$$
$$\propto \{\Pr(D \mid X; \beta)\,\Pr(X; \alpha)\}^{R}\,\{\Pr(D; \beta, \alpha)\}^{1-R}$$
Note: a valid likelihood but no longer a valid pmf!

9 3. (cont.) MAR is not "ignorable"
Conditional inference about $\beta$ (given stratum):
- Temptation: begin with
$$L = \prod_j \{\Pr(D_j \mid X_j; \beta)\,\Pr(X_j; \alpha)\}^{R_j}\,\{\Pr(D_j; \beta, \alpha)\}^{1-R_j}$$
  and condition on $\sum_j D_j$ for that stratum
- Problem: $L$ is not a valid probability mass function, so
  - conditioning does not make sense
  - $R_j$ is neither a conditioning statistic nor a random variable in $L$
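The loss of strata described in feature 2 above can be made concrete with a small counting sketch. It assumes, as a labeled simplification, that a matched set contributes to a complete-record conditional likelihood only when its complete records contain at least one case and at least one control; otherwise only one allocation of cases is possible and the set's conditional likelihood is identically 1. The function name `informative_sets` and the toy data are hypothetical.

```python
def informative_sets(strata):
    """Count matched sets that still contribute after deleting incomplete records.

    strata: list of matched sets; each set is a list of (d, r) pairs where
    d = 1 for a case, 0 for a control, and r = 1 if X was observed.
    """
    kept = 0
    for records in strata:
        complete_d = [d for d, r in records if r == 1]
        # Need at least one complete case and one complete control.
        if 0 < sum(complete_d) < len(complete_d):
            kept += 1
    return kept


if __name__ == "__main__":
    toy = [
        [(1, 1), (0, 1), (0, 1)],  # fully observed: kept
        [(1, 0), (0, 1), (0, 1)],  # case missing X: dropped
        [(1, 1), (0, 0), (0, 0)],  # both controls missing X: dropped
        [(1, 1), (0, 0), (0, 1)],  # one control left: kept
    ]
    print(informative_sets(toy))
```

With one control per case, any single missing record removes the whole pair, which is why the slides note that the loss is worse under 1:1 matching.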

10 Why use CLR anyway?
The odds model (with all these problems)
$$\theta(X_{ij}, Z_{ij}) = \exp(q_i + \beta_z^T Z_{ij} + \beta_x^T X_{ij})$$
looks like a random ($q_i$) effects model. We are treating $q_i$ as a fixed effect. Why?
- Nuisance intercept
- Retrospective data: the CLR likelihood reflects the matched case-control sampling design
- Prospective data (clustered or longitudinal):
  - no distributional assumption on $q_i$
  - distribution $Q_Z$ of $q_i$ may depend on $Z_i = (Z_{i1}, \ldots, Z_{in})$
  - controls for stratum-level confounders
  - each cluster acts as its own control

11 Advantage of CLR over Random Effects Models: Simulated Example
$n = 10{,}000$ matched pairs ($j = 1, 2$) with model
$$\mathrm{logit}\{\Pr(D_{ij} = 1 \mid Z_{ij})\} = q_i + \beta Z_{ij}$$
where marginal $\Pr(D_{ij} = 1) \approx 0.5$ and $\beta = 0.5$
- Case 1: $\mathrm{corr}(q_i, Z_{ij}) = 0$
  - CLR: $\hat\beta = 0.52$
  - RE: $\hat\beta = 0.52$
- Case 2: $\mathrm{corr}(q_i, Z_{ij}) = 0.54$ ($q_i$ a confounder)
  - CLR: $\hat\beta = 0.48$
  - RE: $\hat\beta = 0.85$
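A rough sketch of the CLR half of this simulated example follows. It is not the authors' simulation code: the data-generating choices (standard normal $q_i$, unit-variance $Z_{ij}$ built to have correlation `rho` with $q_i$) and the function names are assumptions made for illustration. For pairs, only discordant pairs contribute to the conditional likelihood, each with probability $\mathrm{expit}\{\beta(z_{\text{case}} - z_{\text{control}})\}$, so a one-dimensional Newton-Raphson iteration suffices. The random effects fit is not reproduced here.

```python
import math
import random


def simulate_pairs(n_pairs, beta, rho, seed=0):
    """Simulate pairs with logit Pr(D_ij = 1) = q_i + beta * Z_ij, corr(q_i, Z_ij) = rho.
    (Assumed design: q_i ~ N(0, 1), Z_ij = rho * q_i + sqrt(1 - rho^2) * noise.)"""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        q = rng.gauss(0.0, 1.0)
        pair = []
        for _ in range(2):
            z = rho * q + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
            p = 1.0 / (1.0 + math.exp(-(q + beta * z)))
            pair.append((1 if rng.random() < p else 0, z))
        pairs.append(pair)
    return pairs


def clr_pairs(pairs, n_iter=25):
    """Maximize the pairwise conditional likelihood by Newton-Raphson."""
    # Covariate difference (case minus control) for each discordant pair.
    diffs = [(a[1] - b[1]) if a[0] == 1 else (b[1] - a[1])
             for a, b in pairs if a[0] != b[0]]
    beta = 0.0
    for _ in range(n_iter):
        score, info = 0.0, 0.0
        for x in diffs:
            p = 1.0 / (1.0 + math.exp(-beta * x))
            score += x * (1.0 - p)
            info += x * x * p * (1.0 - p)
        beta += score / info
    return beta


if __name__ == "__main__":
    pairs = simulate_pairs(10000, beta=0.5, rho=0.54, seed=1)
    print(clr_pairs(pairs))  # should land near the true value 0.5
```

Because $q_i$ cancels within each discordant pair, the estimate stays close to the true $\beta$ even when $q_i$ is correlated with $Z_{ij}$.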

12 Semiparametric Efficiency of Maximum Conditional Likelihood Estimator
- With data across strata $i$, obtain $\hat\beta$ by maximizing $\prod_i L_i^c(\beta)$
- Let $q_i$ be random instead of fixed
- Let $Q_Z$ be the non-parametric distribution function of $q_i$, which may depend on $Z$
- Semiparametric model in $(\beta, Q_Z)$: $\beta$ parametric, $Q_Z$ nonparametric
- Then $\hat\beta$ achieves the Cramér-Rao lower bound in the presence of unknown $Q_Z$
  - Lindsay (1983) for fixed $Z_i = z$ across $i$
  - Extends to $Q_Z$ varying with $Z_i$ across $i$
- Key assumption: $\sum_j D_{ij}$ is a CSS for $q_i$

13 Missing X in CLR Model: Outline
- Complete record estimator
  - Bias correction by conditioning on observed missingness pattern
- Efficiency improvement
  - Elimination of ancillary information in missingness process via projection
  - Approximate projection avoids high-dimensional integral and need for exact distribution of $X$
- Variation to problem of attrition in longitudinal analyses

14 Notation
- Data for stratum $i$: $D_i = (D_{i1}, \ldots, D_{in})$, $Z_i = (Z_{i1}, \ldots, Z_{in})$
- For missing data, write $R_i = (R_{i1}, \ldots, R_{in})$, $X_i = (X_{i1}, \ldots, X_{in})$
- Define $X_{i,obs}$, $Z_{i,obs}$, $D_{i,obs}$, etc. to be the observed rows of $X_i$, $Z_i$, $D_i$, etc.

15 Complete Record Estimation: Exploiting a Missingness Model
- Delete records with missing $X_j$ and model $(D_{obs} \mid X_{obs}, Z_{obs})$ as if this were the original data
- We do not need to model $X_j$, but there is
  - (selection) bias in $\hat\beta$
  - inefficiency in $\hat\beta$
- Bias correction by conditioning on the missingness process, modelling
$$\Pr(D_{obs} \mid X_{obs}, Z_{obs}, R) = \prod_{j=1}^{n} \Pr(D_j \mid R_j = 1, X_j, Z_j)^{R_j}$$
- Requires a missingness model, which depends on the response and other covariates:
$$\Pr(R_j = 1 \mid D_j = d, X_j, Z_j) = \pi(d, Z_j; \gamma)$$

16 Odds that $D_j = 1$ when $X_j$ is observed,
$$\theta^*(X_j, Z_j) = \frac{\Pr(D_j = 1 \mid R_j = 1, X_j, Z_j)}{\Pr(D_j = 0 \mid R_j = 1, X_j, Z_j)},$$
are just
$$\theta^*(X_j, Z_j) = \exp(q_i + \beta_z^T Z_j + \beta_x^T X_j + B_j)$$
where
$$B_j = \log\{\pi(1, Z_j; \gamma) / \pi(0, Z_j; \gamma)\}$$
(numerator: case; denominator: control). $B_j$
- does not contain $\beta$ or $q_i$
- is just an offset term
- depends on missingness parameter $\gamma$
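The offset form follows from Bayes' rule applied to the missingness indicator. The short derivation below is a sketch of that step, writing $\theta^*$ for the odds among records with $X_j$ observed (the superscript is a notational assumption) and using the MAR missingness model $\Pr(R_j = 1 \mid D_j = d, X_j, Z_j) = \pi(d, Z_j; \gamma)$:

```latex
\begin{align*}
\theta^*(X_j, Z_j)
  &= \frac{\Pr(D_j = 1 \mid R_j = 1, X_j, Z_j)}{\Pr(D_j = 0 \mid R_j = 1, X_j, Z_j)} \\
  &= \frac{\Pr(R_j = 1 \mid D_j = 1, X_j, Z_j)\,\Pr(D_j = 1 \mid X_j, Z_j)}
          {\Pr(R_j = 1 \mid D_j = 0, X_j, Z_j)\,\Pr(D_j = 0 \mid X_j, Z_j)} \\
  &= \frac{\pi(1, Z_j; \gamma)}{\pi(0, Z_j; \gamma)}\,\theta(X_j, Z_j)
   = \exp(q_i + \beta_z^T Z_j + \beta_x^T X_j + B_j).
\end{align*}
```

The common factor $\Pr(R_j = 1 \mid X_j, Z_j)$ cancels between numerator and denominator in the second line.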

17 Implications
- In the complete record likelihood $\Pr(D_{obs} \mid X_{obs}, Z_{obs}, R; \beta, \gamma, q_i)$, $\sum_j R_j D_j$ is a CSS for $q_i$
- The complete-record conditional likelihood
$$L^c_{complete}(\beta, \gamma) = \Pr\Big(D_{obs} \,\Big|\, \sum_j R_j D_j, X_{obs}, Z_{obs}, R\Big)$$
  is free of $q_i$; here $\sum_j R_j D_j$ is the number of cases among complete records
- Maximizing $\prod_i L^c_{i,complete}(\beta, \gamma)$ will yield the SPE (semiparametric efficient) estimator for $\beta$ among estimators relying only on complete records

18 $L^c_{complete}(\beta, \gamma)$ written in terms of the complete-record odds $\theta^*$ is:
$$L^c_{complete} = \frac{\prod_j \theta^*(X_j, Z_j)^{R_j D_j}}{\sum_{d \in \mathcal{D}} \prod_j \theta^*(X_j, Z_j)^{R_j d_j}}$$
where $\mathcal{D} = \{d : \sum_j R_j d_j = \sum_j R_j D_j\}$ is the set of all possible allocations of complete-data cases among complete data records.
Notes:
- $L^c_{complete}$ requires that the missingness model $\pi(\cdot, \cdot; \gamma)$ be known or estimated
- can use standard software for estimation via offset specification
- standard errors from standard software are conservative if $\gamma$ is estimated
- if $\pi(d, Z_j; \gamma)$ depends only on either $d$ or $Z_j$, the naive complete case estimator is consistent (Lipsitz, Parzen & Ewell, 1998)
Example (revisited)
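A minimal sketch of this likelihood for a single stratum is given below, under the assumption that the linear predictor $\beta_z^T Z_j + \beta_x^T X_j$ and the offset $B_j$ have already been computed for each record. The function name `complete_record_cond_lik` and its argument layout are hypothetical; standard conditional logistic regression software with an offset would be used in practice, as the notes above say.

```python
import math
from itertools import combinations


def complete_record_cond_lik(d, r, lin_pred, offset):
    """Complete-record conditional likelihood for one stratum.

    d        : 0/1 disease indicators D_j
    r        : 0/1 indicators R_j that X_j is observed
    lin_pred : beta_z'Z_j + beta_x'X_j (ignored where r[j] == 0)
    offset   : B_j = log(pi(1, Z_j) / pi(0, Z_j))
    The intercept q_i is omitted because it cancels in the ratio.
    """
    idx = [j for j in range(len(d)) if r[j] == 1]
    n_cases = sum(d[j] for j in idx)
    eta = {j: lin_pred[j] + offset[j] for j in idx}
    num = math.exp(sum(eta[j] for j in idx if d[j] == 1))
    # Sum over all allocations of n_cases cases among the complete records.
    den = sum(math.exp(sum(eta[j] for j in cases))
              for cases in combinations(idx, n_cases))
    return num / den


if __name__ == "__main__":
    d = [1, 0, 0, 1]
    r = [1, 1, 1, 0]              # last record has X missing and is dropped
    lp = [0.3, -0.2, 0.5, None]
    print(complete_record_cond_lik(d, r, lp, [0.0] * 4))
    print(complete_record_cond_lik(d, r, lp, [0.7] * 4))  # constant offset cancels
```

A constant offset within the stratum cancels from numerator and denominator, which is one way to see the note that the naive complete case estimator is consistent when $\pi$ depends only on $d$.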

19 Hip Fracture Example
Naive complete-record analysis (using 94 of 118 matched sets), standard software
Coef.                 Est.   SE   Z
(others)
parity
br feed (sd unit)
bmi (sd unit)         (possibly missing)
height (sd unit)      (possibly missing)

20 Non-missingness model ($\pi(\cdot)$) (data from all 354 observations), standard software logistic regression
Coef.                 Est.   SE   Z
(others)
case
elem school
middle school
post 2nd sch
parity
br feed (sd unit)
- Pr(BMI and Height missing) depends on some covariates (but not parity or breast feeding)
- Non-missingness log(OR; case vs. control) $= B_j = \log\{\pi_j(1)/\pi_j(0)\}$ has mean 0.10 ± .08, which is not very severe, so complete case analysis is (approximately) consistent

21 Bias-corrected complete-record analysis (standard software with offset)
Coef.                 Est.   SE   Z
(others)
parity
br feed (sd unit)
bmi (sd unit)
height (sd unit)
Bias-corrected complete-record analysis with correct standard errors
Coef.                 Est.   SE   Z
(others)
parity
br feed (sd unit)
bmi (sd unit)
height (sd unit)

22 Efficiency Improvement with $L^c_{complete}$
- Suppose all records are available
- Then $\pi(D_j, Z_j; \gamma)$ can be estimated with likelihood $\prod_i \Pr(R_i \mid D_i, Z_i; \gamma)$
- $L^c_{complete}(\beta, \hat\gamma)$ with estimated $\gamma$ is more efficient than $L^c_{complete}(\beta, \gamma)$ with known $\gamma$
- Why? Examine the full likelihood for stratum $i$:
$$p(X_{obs} \mid D, Z; \beta)\,\Pr(R \mid D, Z; \gamma)\,\Pr(D \mid Z; \beta)$$
  and note that the complete data likelihood is
$$\Pr(D_{obs} \mid X_{obs}, Z_{obs}, R) = \prod_{j=1}^{n} \Pr(D_j \mid R_j = 1, X_j, Z_j)^{R_j}$$
  wherein $R_j$ is a random variable

23 Heuristically, $L^c_{complete}$ is inefficient because it contains ancillary information in $(R \mid D, Z)$
- Estimation of $\gamma$ removes (some) ancillary information and exploits information on records with missing $X$
- Projection can further improve efficiency...
- Define the score
$$U^c_{complete} = \frac{\partial \log L^c_{complete}}{\partial \beta}$$
- Idea: remove from $U^c_{complete}$ the projection onto the space of functions of $(R, D, Z)$ which are unbiased over $R$ conditional on $(D, Z)$
  - requires integration over $X$

24 Define the projection of a given score $g$:
$$\mathrm{Proj}(g) = E_X(g \mid R, D, Z) - E_{R,X}(g \mid D, Z)$$
and an improved score for $\beta$:
$$U^c = U^c_{complete} - \mathrm{Proj}(U^c_{complete})$$
Notes:
- $U^c$ is doubly robust: $\alpha$ incorrect OR $\gamma$ incorrect
- $\mathrm{Proj}(U^c_{complete})$ is (very) difficult to compute
- We employ an approximate projection
$$U^c_{improved} = U^c_{complete} - \mathrm{Proj}_{approx}(U^c_{complete})$$
  exploiting a working model and working integral for $(X_j \mid D_j, Z_j; \alpha)$
- Consequently, $\hat\beta$ solving $\sum_i U^c_{i,improved} = 0$ is consistent even if $p$ is wrong

25 Hip Fracture Example: Efficiency Improvement
Working model for BMI and height:
- dichotomize BMI and height
- 4-cell multinomial model for BMI × height
- mean BMI and height in each category
Bias-corrected complete-record analysis with efficiency improvement
Coef.                 Est.   SE   Z
(others)
parity
br feed (sd unit)
bmi (sd unit)
height (sd unit)
Notes:
- small differences for missing covariates
- greater for non-missing covariates (80% improvement for br feed coefficient)

26 (from earlier) Bias-corrected complete-record analysis with correct standard errors
Coef.                 Est.   SE   Z
(others)
parity
br feed (sd unit)
bmi (sd unit)
height (sd unit)

27 Simulation Study
- Binary response $D_{ij}$, logistic regression
- $n = 4$ observations per stratum, 200 strata
- Continuous $(X_{ij}, Z_{ij})$ with $\mathrm{corr}(X_{ij}, Z_{ij}) \approx 0.5$ and $\mathrm{var}(X_{ij}) = \mathrm{var}(Z_{ij}) = 1$
- Average $E(D_{ij}) \approx 0.3$
- Missingness probabilities depend on $(D_{ij}, Z_{ij})$:
  - 18% missing when $D_{ij} = 0$
  - 45% missing when $D_{ij} = 1$
- replicates
- Similar results for binary $X_{ij}$
- Similar results for matched case-control study

28 Simulation Results: Continuous $X_{ij}$
True values are $\beta_z$ = and $\beta_x$ =
Columns: Method; X-model; R-model; % Bias ($\beta_z$, $\beta_x$); % Rel. MSE Eff. ($\beta_z$, $\beta_x$)
Method              X-model   R-model
$L^c$               X / Z
$L^c$               X Z
Naive                         $\pi(Z)$
$L^c_{complete}$              $\pi(D, Z)$
$U^c_{improved}$    X Z       $\pi(D, Z)$
$U^c_{improved}$    X / Z     $\pi(D, Z)$
= wrong model

29 Key Results
- Complete data likelihood $L^c_{complete}$
  - uses data $(D_{i,obs} \mid \sum_j R_{ij} D_{ij}, X_{i,obs}, Z_{i,obs}, R_i)$
  - relies on model $\pi$ for $R_{ij}$
  - no model for $X_{ij}$ required
  - loss of efficiency
- Efficiency improvement in $L^c_{complete}$: projection to increase efficiency of $L^c_{complete}$
  - estimating function $U^c_{improved}$ exploits a working model for $X_{ij}$
  - consistent even if this working model is wrong
  - moderate efficiency gained for $\beta_z$
  - less gain for $\beta_x$
- Better for one (or a few) pattern of missingness
- Better for missing confounder variables

30 Fixed effects models for binary data with drop-outs
- Longitudinal observations: $t = 1, \ldots, J$ within subject $i$
- Response: $D_{it}$ (binary disease status)
- Covariates: $Z_{it}$ (vector)
- Drop-out: subject $i$ observed at times $t = 1, \ldots, T_i \le J$, where $T_i$ is the drop-out time
- Response vector up to time $t$: $\bar{D}_{it} = (D_{i1}, \ldots, D_{it})^T$
- Observed response vector: $\bar{D}_{iT_i} = (D_{i1}, \ldots, D_{iT_i})^T$

31 Bias-corrected complete-record model
Model of interest:
$$\theta_{it} = \theta(Z_{it}) = \frac{\Pr(D_{it} = 1 \mid Z_{it})}{\Pr(D_{it} = 0 \mid Z_{it})}, \qquad \theta_{it} = \exp(q_i + \beta^T Z_{it})$$
Suppressing $i$...
- Drop-out hazard model ($T$ is drop-out time):
$$\lambda(t, \bar{d}_t; \gamma) = \Pr(T = t \mid T \ge t, \bar{D}_t = \bar{d}_t, Z)$$
- Marginal drop-out probability (drop-out is MAR):
$$\pi(t, \bar{d}_t; \gamma) = \Pr(T = t \mid D = d, Z)$$

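32 Condition on drop-out:
$$L_{complete} = \Pr(\bar{D}_T \mid Z, T)$$
Now condition on the number of positive responses:
$$L^c_{complete} = \Pr\Big(\bar{D}_T \,\Big|\, \sum_{t=1}^{T} D_t, Z, T\Big)$$
which yields
$$L^c_{complete} = \frac{\big\{\prod_{t=1}^{T} \theta_t^{D_t}\big\}\,\pi(T, \bar{D}_T)}{\sum_{\bar{d}_T \in \mathcal{D}_T} \big\{\prod_{t=1}^{T} \theta_t^{d_t}\big\}\,\pi(T, \bar{d}_T)}$$
where $\mathcal{D}_T$ is the set of all possible allocations of complete-data positive responses among complete data records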

33 Efficiency Improvement with $L^c_{complete}$
- $L^c_{complete}$ contains drop-out time $T$ as a random variable
- But the drop-out process $T \mid D, Z$ is ancillary for the parameter of interest $\beta$
- Remove from $U^c_{complete}$ the projection onto the space spanned by all scores
  - that are functions of $(R_{it}, \bar{D}_{i,t-1}, Z_i)$, where $R_{it}$ is the non-drop-out indicator at $t$
  - which are unbiased over $R_{it}$ conditional on $(R_{i,t-1} = 1, \bar{D}_{i,t-1}, Z_i)$
- Projection requires integration over $(D_t, D_{t+1}, \ldots, D_J)^T$

34 Projection requires integration over $(D_t, D_{t+1}, \ldots, D_J)^T$ given $(D_{i1}, \ldots, D_{i,t-1})^T$:
- ... requires model for joint distribution of $D \mid Z$
- ... which depends on the non-parametric distribution $Q_Z$ of intercepts $q_i$
Approximate projection via a working transition model for the vector of responses $D$
Simulation results relative to $L_{complete} = \Pr(\bar{D}_T \mid Z, T)$ with correct drop-out model:
- 5-10% efficiency improvement for using a rich drop-out model
- 15-20% improvement using approximate projection
- bias and efficiency very robust to working transition model for $D$
- rich drop-out model irrelevant under approximate projection

35 EXTRA SLIDES

36 Construction of Projected Score

37 Construction of Projected Score
- Vector $R$ specifies the missingness pattern: $r_k$ = $k$th missingness pattern, $k = 1, \ldots, 2^n$
- Pattern indicator: $\Delta_k = I(R = r_k) = I(k\text{th pattern observed})$
- Data under the $k$th pattern:
  - $X_{(k,obs)}$ = observed components of $X$
  - $X_{(k,miss)}$ = missing components of $X$
  - Similarly $D_{(k,obs)}$, $D_{(k,miss)}$, $Z_{(k,obs)}$, $Z_{(k,miss)}$
- Rewrite $L^c_{complete}$ as
$$L^c_{complete} = \prod_{k=1}^{2^n} L_k^{\Delta_k},$$
  where

38 $$L_k = \Pr\Big(D_{(k,obs)} \,\Big|\, \sum_j D_j r_{kj}, X_{(k,obs)}, Z_{(k,obs)}, R = r_k\Big)$$
is $L^c_{complete}$ under the $k$th missingness pattern.
Similarly,
$$U^c_{complete} = \sum_{k=1}^{2^n} \Delta_k U_k, \qquad U_k = \frac{\partial \log L_k}{\partial \beta}$$
(this sums over all possible missingness patterns).
Now note: $\Delta_k(R)$ contains no $X$, and $U_k(D_{(k,obs)}, X_{(k,obs)}, Z_{(k,obs)})$ contains no $R$.
Because of this, we can write
$$\mathrm{Proj}(\Delta_k U_k) = \mathrm{Proj}(\Delta_k)\,U_{*,k}$$
where
$$U_{*,k} = E_{X_{(k,obs)}}(U_k \mid D_{(k,obs)}, Z_{(k,obs)})$$

39 And
$$\mathrm{Proj}(\Delta_k) = \Delta_k - E_R(\Delta_k \mid D, Z) = \Delta_k - \epsilon_k$$
so that
$$\mathrm{Proj}(U^c_{complete}) = \sum_{k=1}^{2^n} (\Delta_k - \epsilon_k)\,U_{*,k}$$
and
$$U^c = U^c_{complete} - \mathrm{Proj}(U^c_{complete})$$
Important notes on $\mathrm{Proj}(U^c_{complete})$:
- does not contain $X$
- unbiasedness only depends on correct model $\pi$ for $R$
Q: Can we replace $U_{*,k}$ by any function of $(D_{(k,obs)}, Z_{(k,obs)})$?
A: Yes! Such approximate functions can be derived from a working model for $X_j$
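The answer "Yes" can be checked in one line. The sketch below writes $\Delta_k = I(R = r_k)$ for the pattern indicator and $\epsilon_k = E_R(\Delta_k \mid D, Z)$ for its conditional mean (the symbol $\Delta_k$ is a notational assumption), and takes $h_k$ to be an arbitrary function of $(D_{(k,obs)}, Z_{(k,obs)})$:

```latex
\begin{align*}
E_R\Big\{\sum_{k=1}^{2^n} (\Delta_k - \epsilon_k)\, h_k(D_{(k,obs)}, Z_{(k,obs)}) \,\Big|\, D, Z\Big\}
  &= \sum_{k=1}^{2^n} h_k(D_{(k,obs)}, Z_{(k,obs)})\,
     \big\{E_R(\Delta_k \mid D, Z) - \epsilon_k\big\} \\
  &= 0 .
\end{align*}
```

The subtracted term therefore has mean zero for any choice of $h_k$, provided $\epsilon_k$ is computed from a correct missingness model $\pi$; the choice of $h_k$ affects efficiency, not unbiasedness.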

40 Modelling X among the controls

41 Modelling $X_{ij}$ among the controls: Joint Model for $D_{ij}$ and $X_{ij}$
The model of interest can be written in terms of the odds that $D_j = 1$:
$$\theta(X_j, Z_j) = \frac{\Pr(D_j = 1 \mid X_j, Z_j)}{\Pr(D_j = 0 \mid X_j, Z_j)}$$
where
$$\theta(X_j, Z_j) = \exp(q_i + \beta_z^T Z_j + \beta_x^T X_j)$$
Define the marginal (over $X_j$) odds as
$$\theta(Z_j) = \frac{\Pr(D_j = 1 \mid Z_j)}{\Pr(D_j = 0 \mid Z_j)}$$
Define a model for $X_j$ via
$$p_0(X_j \mid Z_j; \alpha) = p(X_j \mid D_j = 0, Z_j)$$
where $\alpha$ is a new parameter and $p_0$ is the density or pmf of $X_j$ among controls

42 Two important facts
The marginal (over $X_j$) odds $\theta(Z_j)$ are in general
$$\theta(Z_j) = \int_x \theta(x, Z_j)\,p_0(x \mid Z_j)\,dx$$
(the odds of $D_j = 1$ versus $D_j = 0$, averaged over the density of $X_j$ given $D_j = 0$), and more specifically
$$\theta(Z_j) = \exp(q_i + \beta_z^T Z_j) \int_x \exp(\beta_x^T x)\,p_0(x \mid Z_j; \alpha)\,dx.$$
The density $p(X_j \mid D_j = 1, Z_j)$ can be expressed generally as
$$p(X_j \mid D_j = 1, Z_j) = p_0(X_j \mid Z_j)\,\frac{\theta(X_j, Z_j)}{\theta(Z_j)}$$
(odds given $X_j$ divided by marginal odds)

43 and specifically as
$$p(X_j \mid D_j = 1, Z_j) = p_0(X_j \mid Z_j; \alpha)\,\exp(\beta_x^T X_j) \left\{ \int_x \exp(\beta_x^T x)\,p_0(x \mid Z_j; \alpha)\,dx \right\}^{-1}$$
Important notes:
- the role of $q_i$ is the same in $\theta(X_j, Z_j)$ and $\theta(Z_j)$
- $\sum_j D_j$ is a CSS for $q_i$ in both the model for $(D_j \mid X_j, Z_j)$ and the marginal model for $(D_j \mid Z_j)$
- $p(X_j \mid D_j = 1, Z_j)$ does not depend on $q_i$
(Satten & Kupper, 1993; Satten & Carroll, 2000)
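The two facts can be verified numerically in a toy case. The sketch below assumes a single binary $X$ with $Z$ suppressed; the parameter values `q`, `beta_x`, and `p_x1` are arbitrary illustrative choices, not values from the slides. It builds the joint distribution of $(D, X)$ directly and compares it with the expressions in terms of $p_0$.

```python
import math


def expit(u):
    return 1.0 / (1.0 + math.exp(-u))


# Hypothetical values for illustration only (Z suppressed, X binary).
q, beta_x, p_x1 = -0.4, 0.8, 0.3
xs = (0, 1)
p_x = {0: 1.0 - p_x1, 1: p_x1}

# Direct computation from the joint distribution of (D, X).
pr_d1_x = {x: expit(q + beta_x * x) for x in xs}
pr_d0 = sum((1.0 - pr_d1_x[x]) * p_x[x] for x in xs)
pr_d1 = 1.0 - pr_d0
p0 = {x: (1.0 - pr_d1_x[x]) * p_x[x] / pr_d0 for x in xs}    # X among controls
p1_direct = {x: pr_d1_x[x] * p_x[x] / pr_d1 for x in xs}     # X among cases

# Fact 1: marginal odds equal the conditional odds averaged over p0.
theta = {x: math.exp(q + beta_x * x) for x in xs}
theta_marg = sum(theta[x] * p0[x] for x in xs)

# Fact 2: density among cases, general form and the q-free specific form.
p1_general = {x: p0[x] * theta[x] / theta_marg for x in xs}
norm = sum(math.exp(beta_x * x) * p0[x] for x in xs)
p1_qfree = {x: p0[x] * math.exp(beta_x * x) / norm for x in xs}

if __name__ == "__main__":
    print(theta_marg, pr_d1 / pr_d0)
    print(p1_direct, p1_general, p1_qfree)
```

The last expression never uses `q` once `p0` is given, mirroring the note that $p(X_j \mid D_j = 1, Z_j)$ does not depend on $q_i$.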

44 Implications
The full likelihood for stratum $i$ is
$$p(X_{obs} \mid R, D, Z; \beta, \alpha)\,\Pr(R \mid D, Z; \gamma)\,\Pr(D \mid Z; \beta, \alpha, q_i)$$
- $\beta$: regression parameter
- $\alpha$: parameter in $(X_j \mid D_j = 0, Z_j)$
- $\gamma$: parameter in $\Pr(R_j = 1 \mid D_j, Z_j)$
Important facts:
- Again, $\sum_j D_j$ is a CSS for $q_i$
- $\Pr(R \mid D, Z)$ is free of $(\beta, \alpha)$
- MAR: $p(X_{obs} \mid R, D, Z) = p(X_{obs} \mid D, Z)$
By conditioning on $\sum_j D_j$, we obtain the joint conditional likelihood for $(\beta, \alpha)$
$$L^c(\beta, \alpha) \propto p(X_{obs} \mid D, Z)\,\Pr\Big(D \,\Big|\, \sum_j D_j, Z\Big)$$
which is free of $q_i$ and $\gamma$

45 Maximizing the conditional likelihood $\prod_i L^c_i(\beta, \alpha)$ will be SPE for $(\beta, \alpha)$ even when $X_i$ may be missing
$L^c(\beta, \alpha)$ is written:
$$L^c = \Big\{\prod_j p(X_j \mid D_j, Z_j)^{R_j}\Big\}\,\frac{\prod_j \theta(Z_j)^{D_j}}{\sum_{d \in \mathcal{D}} \prod_j \theta(Z_j)^{d_j}}$$
where $\mathcal{D} = \{d : \sum_j d_j = \sum_j D_j\}$ (Satten & Kupper, 1993; Satten & Carroll, 2000)
Pitfall of $L^c$:
- heavily dependent on model $p_0(X_i \mid Z_i; \alpha)$
- does not reduce to standard conditional likelihood when $X_i$ is never missing
Simulation results...

46 Suboptimal Estimation

47 Suboptimal Estimation
- In $L^c$, the random variables are $(D, X_{obs}, R)$, and $\sum_j D_j$, $Z$ are the only conditioning statistics
- Suggests conditioning on $(X_{obs}, R)$ and using the likelihood $\Pr(D \mid X_{obs}, R, Z)$:
$$\prod_j \Pr(D_j \mid R_j = 1, X_j, Z_j)^{R_j}\,\Pr(D_j \mid R_j = 0, Z_j)^{1-R_j}$$
- Again, $\sum_j D_j$ is sufficient for $q_i$, so the conditional likelihood
$$L^c_{subopt} = \Pr\Big(D \,\Big|\, \sum_j D_j, X_{obs}, R, Z\Big)$$
  is free of $q_i$
- Because $\sum_j D_j$ is not CSS, maximizing $\prod_i L^c_{i,subopt}$ will not be SPE for $(\beta, \alpha)$

48 $L^c_{subopt}(\beta, \alpha, \gamma)$ is written
$$L^c_{subopt} = \frac{\prod_j \theta^*(X_j, Z_j)^{R_j D_j}\,\theta^*(Z_j)^{(1-R_j) D_j}}{\sum_{d \in \mathcal{D}} \prod_j \theta^*(X_j, Z_j)^{R_j d_j}\,\theta^*(Z_j)^{(1-R_j) d_j}}$$
Odds that $D_j = 1$ when $X_j$ is observed:
$$\theta^*(X_j, Z_j) = \frac{\Pr(D_j = 1 \mid R_j = 1, X_j, Z_j)}{\Pr(D_j = 0 \mid R_j = 1, X_j, Z_j)} = \theta(X_j, Z_j)\,\frac{\pi(1, Z_j)}{\pi(0, Z_j)}$$
and when $X_j$ is not observed:
$$\theta^*(Z_j) = \frac{\Pr(D_j = 1 \mid R_j = 0, Z_j)}{\Pr(D_j = 0 \mid R_j = 0, Z_j)} = \theta(Z_j)\,\frac{1 - \pi(1, Z_j)}{1 - \pi(0, Z_j)}$$
Missingness model for $R_j$: $\Pr(R_j = 1 \mid D_j = d, Z_j) = \pi(d, Z_j; \gamma)$

49 Important notes about $L^c_{subopt}$:
- reduces to standard conditional likelihood when $X_j$ is never missing
- less dependent on model for $X_j$
- but: requires a model $\pi$ for missingness
- related to work by Paik & Sacco, 2000
Implementation: we need to pre-estimate
- $\alpha$ in $p_0(X_j \mid Z_j; \alpha)$ and
- $\gamma$ in $\pi(D_j, Z_j; \gamma)$
and plug them into $L^c_{subopt}$ before maximizing over $\beta$
Simulation results...

50 Full Simulation Results

51 Simulation Study
- Binary response $D_{ij}$, logistic regression
- $n = 4$ observations per stratum, 200 strata
- Continuous $(X_{ij}, Z_{ij})$ with $\mathrm{corr}(X_{ij}, Z_{ij}) \approx 0.5$ and $\mathrm{var}(X_{ij}) = \mathrm{var}(Z_{ij}) = 1$
- Average $E(D_{ij}) \approx 0.3$
- Missingness probabilities depend on $(D_{ij}, Z_{ij})$:
  - 18% missing when $D_{ij} = 0$
  - 45% missing when $D_{ij} = 1$
- replicates
- Similar results for binary $X_{ij}$
- Similar results for matched case-control study

52 Simulation Results: Continuous $X_{ij}$
True values are $\beta_z$ = and $\beta_x$ =
Columns: Method; X-model; R-model; % Bias ($\beta_z$, $\beta_x$); % Rel. MSE Eff. ($\beta_z$, $\beta_x$)
Method              X-model   R-model
$L^c$               X / Z
$L^c_{subopt}$      X / Z     $\pi(D, Z)$
$L^c_{subopt}$      X / Z     $\pi(Z)$
$L^c$               X Z
$L^c_{subopt}$      X Z       $\pi(D, Z)$
$L^c_{subopt}$      X Z       $\pi(Z)$
Naive                         $\pi(Z)$
$L^c_{complete}$              $\pi(D, Z)$
$U^c_{improved}$    X Z       $\pi(D, Z)$
$U^c_{improved}$    X / Z     $\pi(D, Z)$

53 Simulation Results: Binary $X_{ij}$
True values are $\beta_z$ = and $\beta_x$ =
Columns: Method; X-model; R-model; % Bias ($\beta_z$, $\beta_x$); % Rel. MSE Eff. ($\beta_z$, $\beta_x$)
Method              X-model   R-model
$L^c$               X / Z
$L^c_{subopt}$      X / Z     $\pi(Y, Z)$
$L^c_{subopt}$      X / Z     $\pi(Z)$
$L^c$               X Z
$L^c_{subopt}$      X Z       $\pi(Y, Z)$
$L^c_{subopt}$      X Z       $\pi(Z)$
$L^c_{complete}$              $\pi(Y, Z)$
$U^c_{improved}$    X Z       $\pi(Y, Z)$
$U^c_{improved}$    X / Z     $\pi(Y, Z)$

54 Estimation via $L^c_{subopt}$
Recall:
- $L^c_{subopt}$ conditions on $(\sum_j D_j, X_{obs}, R, Z)$
- $L^c_{subopt}$ contains parameters $\alpha$ in $p_0(X_j \mid Z_j; \alpha)$ and $\gamma$ in $\pi(d, Z_j; \gamma)$
Missingness parameter $\gamma$ estimated via likelihood
$$\prod_i \Pr(R_i \mid D_i, Z_i; \gamma)$$
$X_j$-model parameter $\alpha$ estimated via likelihood
$$\prod_i \Pr(X_{i,obs} \mid R_i, D_i, Z_i; \alpha, \beta_x)$$
- which results in an extra estimate of $\beta_x$
- this extra estimate is discarded

55 What is $\pi(\cdot)$ doing in $L^c_{subopt}$? A heuristic explanation
The full likelihood can be written
$$\prod_j \Pr(D_j \mid Z_j)\,\{p(X_j \mid D_j, Z_j)\,\pi(D_j, Z_j)\}^{R_j}\,\{1 - \pi(D_j, Z_j)\}^{1-R_j}$$
- no $q_i$: factors $\pi(D_j, Z_j)$ and $[1 - \pi(D_j, Z_j)]$ generally drop out
- with $q_i$: conditioning on $\sum_j D_j$ replaces $\prod_j \Pr(D_j \mid Z_j)$ with $\Pr(D \mid \sum_j D_j, Z)$, and so $\pi(D_j, Z_j)$ and $[1 - \pi(D_j, Z_j)]$ still drop out of the final conditional likelihood
This likelihood is only conditional on $Z_j$; $(D_j, R_j, X_j)$ are random variables

56 Conditioning on $X_{obs}$ requires also conditioning on $R$
The starting-point likelihood is
$$\prod_j \Pr(D_j \mid R_j = 1, X_j, Z_j)^{R_j}\,\Pr(D_j \mid R_j = 0, Z_j)^{1-R_j}$$
- each factor contains $\pi(\cdot)$
- after conditioning on $\sum_j D_j$: the terms containing $\pi(\cdot)$ are no longer separable


Analysis of Matched Case Control Data in Presence of Nonignorable Missing Exposure

Analysis of Matched Case Control Data in Presence of Nonignorable Missing Exposure Biometrics DOI: 101111/j1541-0420200700828x Analysis of Matched Case Control Data in Presence of Nonignorable Missing Exposure Samiran Sinha 1, and Tapabrata Maiti 2, 1 Department of Statistics, Texas

More information

TECHNICAL REPORT Fixed effects models for longitudinal binary data with drop-outs missing at random

TECHNICAL REPORT Fixed effects models for longitudinal binary data with drop-outs missing at random TECHNICAL REPORT Fixed effects models for longitudinal binary data with drop-outs missing at random Paul J. Rathouz University of Chicago Abstract. We consider the problem of attrition under a logistic

More information

1 Introduction A common problem in categorical data analysis is to determine the effect of explanatory variables V on a binary outcome D of interest.

1 Introduction A common problem in categorical data analysis is to determine the effect of explanatory variables V on a binary outcome D of interest. Conditional and Unconditional Categorical Regression Models with Missing Covariates Glen A. Satten and Raymond J. Carroll Λ December 4, 1999 Abstract We consider methods for analyzing categorical regression

More information

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall 1 Structural Nested Mean Models for Assessing Time-Varying Effect Moderation Daniel Almirall Center for Health Services Research, Durham VAMC & Dept. of Biostatistics, Duke University Medical Joint work

More information

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall 1 Structural Nested Mean Models for Assessing Time-Varying Effect Moderation Daniel Almirall Center for Health Services Research, Durham VAMC & Duke University Medical, Dept. of Biostatistics Joint work

More information

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs STAT 5500/6500 Conditional Logistic Regression for Matched Pairs Motivating Example: The data we will be using comes from a subset of data taken from the Los Angeles Study of the Endometrial Cancer Data

More information

Missing covariate data in matched case-control studies: Do the usual paradigms apply?

Missing covariate data in matched case-control studies: Do the usual paradigms apply? Missing covariate data in matched case-control studies: Do the usual paradigms apply? Bryan Langholz USC Department of Preventive Medicine Joint work with Mulugeta Gebregziabher Larry Goldstein Mark Huberman

More information

Previous lecture. P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing.

Previous lecture. P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing. Previous lecture P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing. Interaction Outline: Definition of interaction Additive versus multiplicative

More information

High-Throughput Sequencing Course

High-Throughput Sequencing Course High-Throughput Sequencing Course DESeq Model for RNA-Seq Biostatistics and Bioinformatics Summer 2017 Outline Review: Standard linear regression model (e.g., to model gene expression as function of an

More information

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score

More information

STAT331. Cox s Proportional Hazards Model

STAT331. Cox s Proportional Hazards Model STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations

More information

Modelling Association Among Bivariate Exposures In Matched Case-Control Studies

Modelling Association Among Bivariate Exposures In Matched Case-Control Studies Sankhyā : The Indian Journal of Statistics Special Issue on Statistics in Biology and Health Sciences 2007, Volume 69, Part 3, pp. 379-404 c 2007, Indian Statistical Institute Modelling Association Among

More information

Fractional Imputation in Survey Sampling: A Comparative Review

Fractional Imputation in Survey Sampling: A Comparative Review Fractional Imputation in Survey Sampling: A Comparative Review Shu Yang Jae-Kwang Kim Iowa State University Joint Statistical Meetings, August 2015 Outline Introduction Fractional imputation Features Numerical

More information

Cox s proportional hazards model and Cox s partial likelihood

Cox s proportional hazards model and Cox s partial likelihood Cox s proportional hazards model and Cox s partial likelihood Rasmus Waagepetersen October 12, 2018 1 / 27 Non-parametric vs. parametric Suppose we want to estimate unknown function, e.g. survival function.

More information

Survival Analysis Math 434 Fall 2011

Survival Analysis Math 434 Fall 2011 Survival Analysis Math 434 Fall 2011 Part IV: Chap. 8,9.2,9.3,11: Semiparametric Proportional Hazards Regression Jimin Ding Math Dept. www.math.wustl.edu/ jmding/math434/fall09/index.html Basic Model Setup

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

Lecture 12: Effect modification, and confounding in logistic regression

Lecture 12: Effect modification, and confounding in logistic regression Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression

More information

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective Anastasios (Butch) Tsiatis and Xiaofei Bai Department of Statistics North Carolina State University 1/35 Optimal Treatment

More information

Describing Contingency tables

Describing Contingency tables Today s topics: Describing Contingency tables 1. Probability structure for contingency tables (distributions, sensitivity/specificity, sampling schemes). 2. Comparing two proportions (relative risk, odds

More information

Ignoring the matching variables in cohort studies - when is it valid, and why?

Ignoring the matching variables in cohort studies - when is it valid, and why? Ignoring the matching variables in cohort studies - when is it valid, and why? Arvid Sjölander Abstract In observational studies of the effect of an exposure on an outcome, the exposure-outcome association

More information

Lecture Discussion. Confounding, Non-Collapsibility, Precision, and Power Statistics Statistical Methods II. Presented February 27, 2018

Lecture Discussion. Confounding, Non-Collapsibility, Precision, and Power Statistics Statistical Methods II. Presented February 27, 2018 , Non-, Precision, and Power Statistics 211 - Statistical Methods II Presented February 27, 2018 Dan Gillen Department of Statistics University of California, Irvine Discussion.1 Various definitions of

More information

Lab 8. Matched Case Control Studies

Lab 8. Matched Case Control Studies Lab 8 Matched Case Control Studies Control of Confounding Technique for the control of confounding: At the design stage: Matching During the analysis of the results: Post-stratification analysis Advantage

More information

Covariate Balancing Propensity Score for General Treatment Regimes

Covariate Balancing Propensity Score for General Treatment Regimes Covariate Balancing Propensity Score for General Treatment Regimes Kosuke Imai Princeton University October 14, 2014 Talk at the Department of Psychiatry, Columbia University Joint work with Christian

More information

In Praise of the Listwise-Deletion Method (Perhaps with Reweighting)

In Praise of the Listwise-Deletion Method (Perhaps with Reweighting) In Praise of the Listwise-Deletion Method (Perhaps with Reweighting) Phillip S. Kott RTI International NISS Worshop on the Analysis of Complex Survey Data With Missing Item Values October 17, 2014 1 RTI

More information

Constrained Maximum Likelihood Estimation for Model Calibration Using Summary-level Information from External Big Data Sources

Constrained Maximum Likelihood Estimation for Model Calibration Using Summary-level Information from External Big Data Sources Constrained Maximum Likelihood Estimation for Model Calibration Using Summary-level Information from External Big Data Sources Yi-Hau Chen Institute of Statistical Science, Academia Sinica Joint with Nilanjan

More information

Combining Non-probability and Probability Survey Samples Through Mass Imputation

Combining Non-probability and Probability Survey Samples Through Mass Imputation Combining Non-probability and Probability Survey Samples Through Mass Imputation Jae-Kwang Kim 1 Iowa State University & KAIST October 27, 2018 1 Joint work with Seho Park, Yilin Chen, and Changbao Wu

More information

You know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What?

You know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What? You know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What?) I m not goin stop (What?) I m goin work harder (What?) Sir David

More information
