Missing Covariate Data in Matched Case-Control Studies

1 Missing Covariate Data in Matched Case-Control Studies
Department of Statistics, North Carolina State University
Paul Rathouz, Dept. of Health Studies, U. of Chicago
with Glen A. Satten, Centers for Disease Control and Prevention
and Raymond J. Carroll, Texas A&M University
October 15,

2 General Framework
Highly-stratified or Clustered Binary Data
- Observations: $j = 1, \ldots, n$ within stratum $i$
- Strata (many!):
  - matched set $i$ of case-control data
  - multiple subjects in cluster $i$
  - (prospective) longitudinal observations on subject $i$
- Response: $D_{ij}$ (binary disease status)
- Covariates: $Z_{ij}$ (vector)

3 Logistic Regression Model for Stratified Data
For observation $j$, stratum $i$, define the odds that $D_{ij} = 1$:
$$\theta(Z_{ij}) = \frac{\Pr(D_{ij} = 1 \mid Z_{ij})}{\Pr(D_{ij} = 0 \mid Z_{ij})}$$
and let
$$\theta(Z_{ij}) = \exp(q_i + \beta^T Z_{ij})$$
This is the conditional logistic regression (CLR) model.
The stratum-level intercept $q_i$ is a nuisance fixed effect.

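4 Conditional Likelihood for Nuisance Intercept Model
Model: $\theta = \exp(q_i + \beta^T Z_{ij})$
Data for stratum $i$: $D_i = (D_{i1}, \ldots, D_{in})$, $Z_i = (Z_{i1}, \ldots, Z_{in})$
The stratum-level likelihood $L_i(\beta, q_i) = \Pr(D_i \mid Z_i)$ can be written
$$L_i = \underbrace{\Pr\Big(D_i \,\Big|\, \sum_j D_{ij}, Z_i; \beta\Big)}_{(1)} \; \underbrace{\Pr\Big(\sum_j D_{ij} \,\Big|\, Z_i; \beta, q_i\Big)}_{(2)}$$
Important: $\sum_j D_{ij}$ is a complete sufficient statistic (CSS) for $q_i$, so there is no $q_i$ in (1).
Define the conditional likelihood for $\beta$:
$$L_i^c(\beta) = \Pr\Big(D_i \,\Big|\, \sum_j D_{ij}, Z_i; \beta\Big)$$
This is the conditional logistic regression likelihood.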
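The claim that term (1) carries no $q_i$ can be checked numerically on a tiny stratum. The sketch below is an illustration only, not code from the talk: the function names `joint_prob` and `conditional_prob`, the scalar covariate, and all numeric values are assumptions made for the example.

```python
from itertools import product
import math

def joint_prob(d, z, beta, q):
    """Pr(D_i = d | Z_i = z) with odds exp(q + beta * z_j), responses independent given q."""
    prob = 1.0
    for dj, zj in zip(d, z):
        odds = math.exp(q + beta * zj)
        prob *= odds / (1.0 + odds) if dj == 1 else 1.0 / (1.0 + odds)
    return prob

def conditional_prob(d, z, beta, q):
    """Pr(D_i = d | sum_j D_ij, Z_i): the joint probability divided by the total
    probability of all response vectors with the same number of cases."""
    n_cases = sum(d)
    denom = sum(joint_prob(dd, z, beta, q)
                for dd in product([0, 1], repeat=len(d)) if sum(dd) == n_cases)
    return joint_prob(d, z, beta, q) / denom

z = [0.3, -1.2, 0.8]   # hypothetical covariates for a 1:2 matched set
d = (1, 0, 0)          # the first subject is the case
for q in (-2.0, 0.0, 3.0):
    # the printed value is the same for every q: the intercept cancels
    print(q, conditional_prob(d, z, beta=0.5, q=q))
```

For a set with one case, the conditional probability reduces to $\exp(\beta z_1) / \sum_j \exp(\beta z_j)$, which is the familiar CLR contribution.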

5 What happens when some covariates may be missing?
Covariates:
- $X_{ij}$: some may be missing
- $Z_{ij}$: always observed
Missing covariate indicator: $R_{ij} = I(X_{ij} \text{ observed})$
Odds that $D_{ij} = 1$:
$$\theta(X_{ij}, Z_{ij}) = \exp(q_i + \beta_z^T Z_{ij} + \beta_x^T X_{ij})$$
Interest is in the covariate effects $\beta = (\beta_z, \beta_x)$.
The missing at random (MAR) assumption allows identification of $\beta$:
$$R_{ij} \perp X_{ij} \mid D_{ij}, Z_{ij}$$

6 Missing Data Example
Matched case-control study of hip fracture
- 118 female hip fracture patients (cases) in Beijing, China (Huo, Lauderdale and Li)
- 2 controls per case, matched on neighborhood and age (within 5 years)
- Of interest: whether reproductive factors were related to risk of hip fracture in Chinese women aged 50 years and older
- Focus on the effects (the $Z_{ij}$) of:
  - parity (per child)
  - breastfeeding (average months per child)
- Important adjustors (the $X_{ij}$) include:
  - height (surrogate for hip axis length)
  - BMI (a well-established risk factor)
- Height and weight were self-reported and hence may be missing

7 Missing X in CLR Model
Three Tricky Features
1. Three sets of nuisance parameters to manage
   - Nuisance intercept $q_i$ in the model
   - Distribution of missing $X$: $\Pr(X_{ij} \mid Z_{ij}; \alpha)$
   - Missingness model: $\Pr(R_{ij} \mid D_{ij}, Z_{ij}; \gamma)$
2. Loss of strata with complete-record analysis
   - Hip fracture example: the $X_{ij}$ are height and BMI, missing for 52 of 354 subjects (15%)
   - Results in 24 (20%) matched sets dropped and 85 (24%) observations dropped (worse if only one control per case)

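8 3. MAR is not "ignorable"
Likelihood (conditioning on $Z$ implicit):
$$L = \{\Pr(X \mid D)\,\Pr(R = 1 \mid D)\,\Pr(D)\}^{R}\,\{\Pr(R = 0 \mid D)\,\Pr(D)\}^{1-R}$$
Unconditional inference about $\beta$:
$$L = \{\Pr(D \mid X; \beta)\,\Pr(X; \alpha)\,\Pr(R = 1 \mid D; \gamma)\}^{R}\,\{\Pr(D; \beta, \alpha)\,\Pr(R = 0 \mid D; \gamma)\}^{1-R}$$
$$\propto \{\Pr(D \mid X; \beta)\,\Pr(X; \alpha)\}^{R}\,\{\Pr(D; \beta, \alpha)\}^{1-R}$$
Note: the last expression is a valid likelihood but no longer a valid pmf!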

9 3. (cont.) MAR is not "ignorable"
Conditional inference about $\beta$ (given stratum):
- Temptation: begin with
$$L = \prod_j \{\Pr(D_j \mid X_j; \beta)\,\Pr(X_j; \alpha)\}^{R_j}\,\{\Pr(D_j; \beta, \alpha)\}^{1-R_j}$$
  and condition on $\sum_j D_j$ for that stratum
- Problem:
  - $L$ is not a valid probability mass function, so conditioning does not make sense
  - $R_j$ is neither a conditioning statistic nor a random variable in $L$

10 Why use CLR anyway? (with all these problems)
The odds model
$$\theta(X_{ij}, Z_{ij}) = \exp(q_i + \beta_z^T Z_{ij} + \beta_x^T X_{ij})$$
looks like a random-effects model in $q_i$.
We are treating $q_i$ as a fixed effect (a nuisance intercept). Why?
- Retrospective data: the CLR likelihood reflects the matched case-control sampling design
- Prospective data (clustered or longitudinal):
  - no distributional assumption on $q_i$
  - the distribution $Q_Z$ of $q_i$ may depend on $Z_i = (Z_{i1}, \ldots, Z_{in})$
  - controls for stratum-level confounders
  - each cluster acts as its own control

11 Advantage of CLR over Random Effects Models
Simulated Example
$n = 10{,}000$ matched pairs ($j = 1, 2$) with model
$$\mathrm{logit}\{\Pr(D_{ij} = 1 \mid Z_{ij})\} = q_i + \beta Z_{ij}$$
where marginal $\Pr(D_{ij} = 1) \approx 0.5$ and $\beta = 0.5$
- Case 1: $\mathrm{corr}(q_i, Z_{ij}) = 0$
  - CLR: $\hat\beta = 0.52$
  - RE: $\hat\beta = 0.52$
- Case 2: $\mathrm{corr}(q_i, Z_{ij}) = 0.54$ ($q_i$ a confounder)
  - CLR: $\hat\beta = 0.48$
  - RE: $\hat\beta = 0.85$
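A rough re-creation of the CLR half of this comparison is sketched below. It is not the authors' simulation: the normal distributions for $q_i$ and $Z_{ij}$, the seed, and the function names `simulate_pairs` and `clr_pairs` are assumptions. The random-effects fit is omitted. For pairs, only discordant pairs contribute, and each contributes $\mathrm{expit}\{\beta (z_{\text{case}} - z_{\text{non-case}})\}$ to the conditional likelihood.

```python
import numpy as np

def simulate_pairs(n_pairs, beta, rho, rng):
    """Pairs sharing an intercept q_i, with corr(q_i, Z_ij) = rho (assumed normal)."""
    q = rng.normal(size=n_pairs)
    noise = rng.normal(size=(n_pairs, 2))
    z = rho * q[:, None] + np.sqrt(1.0 - rho**2) * noise
    p = 1.0 / (1.0 + np.exp(-(q[:, None] + beta * z)))
    d = rng.binomial(1, p)
    return d, z

def clr_pairs(d, z, iters=25):
    """Maximize the conditional likelihood over discordant pairs by Newton's method."""
    disc = d.sum(axis=1) == 1
    # covariate of the member with D = 1 minus covariate of the member with D = 0
    diff = np.where(d[disc, 0] == 1, z[disc, 0] - z[disc, 1], z[disc, 1] - z[disc, 0])
    beta = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-beta * diff))
        score = np.sum(diff * (1.0 - p))
        info = np.sum(diff**2 * p * (1.0 - p))
        beta += score / info
    return beta

rng = np.random.default_rng(2024)
estimates = {rho: clr_pairs(*simulate_pairs(10_000, 0.5, rho, rng)) for rho in (0.0, 0.54)}
for rho, est in estimates.items():
    print(f"corr(q, Z) = {rho}: CLR beta-hat = {est:.3f}")
```

Because $q_i$ cancels from every discordant-pair contribution, the estimate should stay near 0.5 whether or not $q_i$ is correlated with $Z_{ij}$.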

12 Semiparametric Efficiency of Maximum Conditional Likelihood Estimator
- With data across strata $i$, obtain $\hat\beta$ by maximizing $\prod_i L_i^c(\beta)$
- Let $q_i$ be random instead of fixed
- Let $Q_Z$ be the non-parametric distribution function of $q_i$, which may depend on $Z$
- Semiparametric model in $(\beta, Q_Z)$: $\beta$ is the parametric part, $Q_Z$ the nonparametric part
- Then $\hat\beta$ achieves the Cramér-Rao lower bound in the presence of unknown $Q_Z$
  - Lindsay (1983) for fixed $Z_i = z$ across $i$
  - Extends to $Q_Z$ varying with $Z_i$ across $i$
- Key assumption: $\sum_j D_{ij}$ is a CSS for $q_i$

13 Missing X in CLR Model
Outline
- Complete record estimator
  - Bias correction by conditioning on observed missingness pattern
- Efficiency improvement
  - Elimination of ancillary information in missingness process via projection
  - Approximate projection avoids high-dimensional integral and need for exact distribution of $X$
- Variation to problem of attrition in longitudinal analyses

14 Notation
Data for stratum $i$: $D_i = (D_{i1}, \ldots, D_{in})$, $Z_i = (Z_{i1}, \ldots, Z_{in})$
For missing data, write $R_i = (R_{i1}, \ldots, R_{in})$, $X_i = (X_{i1}, \ldots, X_{in})$
Define $X_{i,obs}$, $Z_{i,obs}$, $D_{i,obs}$, etc. to be the observed rows of $X_i$, $Z_i$, $D_i$, etc.

15 Complete Record Estimation
Exploiting a Missingness Model
- Delete records with missing $X_j$ and model $(D_{obs} \mid X_{obs}, Z_{obs})$ as if this were the original data
- We do not need to model $X_j$, but:
  - (selection) bias in $\hat\beta$
  - inefficiency in $\hat\beta$
- Bias correction by conditioning on the missingness process, modelling
$$\Pr(D_{obs} \mid X_{obs}, Z_{obs}, R) = \prod_{j=1}^{n} \Pr(D_j \mid R_j = 1, X_j, Z_j)^{R_j}$$
- Requires a missingness model that depends on the response and the other covariates:
$$\Pr(R_j = 1 \mid D_j = d, X_j, Z_j) = \pi(d, Z_j; \gamma)$$

16 Odds that $D_j = 1$ when $X_j$ is observed,
$$\theta^{*}(X_j, Z_j) = \frac{\Pr(D_j = 1 \mid R_j = 1, X_j, Z_j)}{\Pr(D_j = 0 \mid R_j = 1, X_j, Z_j)},$$
are just
$$\theta^{*}(X_j, Z_j) = \exp(q_i + \beta_z^T Z_j + \beta_x^T X_j + B_j)$$
where
$$B_j = \log\{\pi(1, Z_j; \gamma) / \pi(0, Z_j; \gamma)\}$$
(numerator: cases; denominator: controls)
$B_j$:
- does not contain $\beta$ or $q_i$
- is just an offset term
- depends on the missingness parameter $\gamma$
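The offset form follows from Bayes' rule under MAR, since $\Pr(D = d \mid R = 1, X, Z) \propto \pi(d, Z)\,\Pr(D = d \mid X, Z)$. A minimal numeric check is sketched below; every number in it is a hypothetical value chosen for illustration, and scalar covariates are assumed.

```python
import math

def expit(v):
    return 1.0 / (1.0 + math.exp(-v))

# Hypothetical values for one record
q, beta_z, beta_x = -0.7, 0.4, 0.9
z, x = 1.0, -0.5
pi = {1: 0.55, 0: 0.82}            # Pr(R = 1 | D = d, Z = z), free of X under MAR

lin = q + beta_z * z + beta_x * x
p1 = expit(lin)                     # Pr(D = 1 | X, Z)

# Odds among records with X observed, computed directly from Bayes' rule
odds_observed = (pi[1] * p1) / (pi[0] * (1.0 - p1))

# Same odds from the offset representation exp(q + beta_z z + beta_x x + B)
B = math.log(pi[1] / pi[0])
odds_offset = math.exp(lin + B)

print(odds_observed, odds_offset)   # the two values agree
```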

17 Implications
In the complete-record likelihood $\Pr(D_{obs} \mid X_{obs}, Z_{obs}, R; \beta, \gamma, q_i)$, $\sum_j R_j D_j$ is a CSS for $q_i$.
The complete-record conditional likelihood
$$L^c_{complete}(\beta, \gamma) = \Pr\Big(D_{obs} \,\Big|\, \underbrace{\sum_j R_j D_j}_{\text{number of cases among complete records}}, X_{obs}, Z_{obs}, R\Big)$$
is free of $q_i$.
Maximizing
$$\prod_i L^c_{i,complete}(\beta, \gamma)$$
will yield the semiparametric efficient (SPE) estimator for $\beta$ among estimators relying only on complete records.

18 $L^c_{complete}(\beta, \gamma)$ written in terms of the odds $\theta^{*}$ is:
$$L^c_{complete} = \frac{\prod_j \theta^{*}(X_j, Z_j)^{R_j D_j}}{\sum_{d \in \mathcal{D}} \prod_j \theta^{*}(X_j, Z_j)^{R_j d_j}}$$
where
$$\mathcal{D} = \Big\{d : \sum_j R_j d_j = \sum_j R_j D_j\Big\}$$
is the set of all possible allocations of complete-data cases among complete-data records.
Notes:
- $L^c_{complete}$ requires that the missingness model $\pi(\cdot, \cdot; \gamma)$ be known or estimated
- can use standard software for estimation via offset specification
- standard errors from standard software are conservative if $\gamma$ is estimated
- if $\pi(d, Z_j; \gamma)$ depends on only one of $d$ or $Z_j$, the naive complete-case estimator is consistent (Lipsitz, Parzen & Ewell, 1998)
Example (revisited)
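The stratum contribution can be evaluated by brute-force enumeration of the allocations. The sketch below assumes scalar covariates and a user-supplied function `pi(d, z)` for the non-missingness probability; the function name `clr_complete_loglik` and the data are hypothetical, and standard conditional logistic software with an offset would be the practical route.

```python
from itertools import combinations
import math

def clr_complete_loglik(d, x, z, r, beta_z, beta_x, pi):
    """log L^c_complete for one stratum; pi(d, z) returns Pr(R = 1 | D = d, Z = z)."""
    keep = [j for j in range(len(d)) if r[j] == 1]        # complete records only
    # linear predictor plus offset B_j; the intercept q_i cancels and is omitted
    eta = [beta_z * z[j] + beta_x * x[j] + math.log(pi(1, z[j]) / pi(0, z[j]))
           for j in keep]
    n_cases = sum(d[j] for j in keep)
    numer = sum(e for e, j in zip(eta, keep) if d[j] == 1)
    # sum over all ways to allocate n_cases cases among the complete records
    denom = sum(math.exp(sum(eta[k] for k in idx))
                for idx in combinations(range(len(keep)), n_cases))
    return numer - math.log(denom)

# Hypothetical stratum of four records; the second has X missing
d = [1, 0, 0, 1]
x = [0.2, None, -1.0, 0.5]
z = [0.1, 0.4, -0.3, 1.2]
r = [1, 0, 1, 1]

pi_d_only = lambda dj, zj: 0.6 if dj == 1 else 0.9        # depends on d only
pi_const = lambda dj, zj: 0.8                             # no dependence at all
print(clr_complete_loglik(d, x, z, r, 0.5, 1.0, pi_d_only))
print(clr_complete_loglik(d, x, z, r, 0.5, 1.0, pi_const))
```

The two printed values coincide: when $\pi$ depends on $d$ alone, $B_j$ is constant within the stratum and cancels just as $q_i$ does, which is one way to see the consistency of the naive estimator in that case.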

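19 Hip Fracture Example
Naive complete-record analysis (using 94 of 118 matched sets; standard software)
Columns: Coef., Est., SE, Z
Rows: (others), parity, br feed (sd unit), bmi (sd unit), height (sd unit)
bmi and height are the possibly missing covariates
(numeric entries not preserved in this transcript)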

20 Non-missingness model ($\pi(\cdot)$)
(data from all 354 observations; standard software logistic regression)
Columns: Coef., Est., SE, Z
Rows: (others), case, elem school, middle school, post 2nd sch, parity, br feed (sd unit)
(numeric entries not preserved in this transcript)
- Pr(BMI and height missing) depends on some covariates (but not parity or breastfeeding)
- The non-missingness log odds ratio (case vs. control), $B_j = \log\{\pi_j(1)/\pi_j(0)\}$, has mean 0.10 ± .08, which is not very severe
- So the complete-case analysis is (approximately) consistent

21 Bias-corrected complete-record analysis (standard software with offset)
Columns: Coef., Est., SE, Z
Rows: (others), parity, br feed (sd unit), bmi (sd unit), height (sd unit)
Bias-corrected complete-record analysis with correct standard errors
Columns: Coef., Est., SE, Z
Rows: (others), parity, br feed (sd unit), bmi (sd unit), height (sd unit)
(numeric entries not preserved in this transcript)

22 Efficiency Improvement with $L^c_{complete}$
- Suppose all records are available
- Then $\pi(D_j, Z_j; \gamma)$ can be estimated with likelihood $\prod_i \Pr(R_i \mid D_i, Z_i; \gamma)$
- $L^c_{complete}(\beta, \hat\gamma)$ with estimated $\gamma$ is more efficient than $L^c_{complete}(\beta, \gamma)$ with known $\gamma$
- Why? Examine the full likelihood for stratum $i$:
$$p(X_{obs} \mid D, Z; \beta)\,\Pr(R \mid D, Z; \gamma)\,\Pr(D \mid Z; \beta)$$
  and note that the complete data likelihood is
$$\Pr(D_{obs} \mid X_{obs}, Z_{obs}, R) = \prod_{j=1}^{n} \Pr(D_j \mid R_j = 1, X_j, Z_j)^{R_j}$$
  wherein $R_j$ is a random variable
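Estimating $\gamma$ from all records is an ordinary logistic regression of $R$ on $(D, Z)$. The sketch below is one possible illustration: the linear form $\mathrm{logit}\,\pi(d, z) = \gamma_0 + \gamma_1 d + \gamma_2 z$, the simulated data, and the names `fit_logistic` and `offset` are all assumptions, not details taken from the talk.

```python
import numpy as np

def fit_logistic(design, y, iters=30):
    """Plain Newton-Raphson for logistic regression (no regularization)."""
    gamma = np.zeros(design.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-design @ gamma))
        grad = design.T @ (y - p)
        hess = design.T @ (design * (p * (1.0 - p))[:, None])
        gamma = gamma + np.linalg.solve(hess, grad)
    return gamma

def offset(zj, gamma):
    """B_j = log{pi(1, z_j) / pi(0, z_j)} under the assumed linear-logit model."""
    expit = lambda v: 1.0 / (1.0 + np.exp(-v))
    return (np.log(expit(gamma[0] + gamma[1] + gamma[2] * zj))
            - np.log(expit(gamma[0] + gamma[2] * zj)))

# Simulated records with hypothetical parameter values
rng = np.random.default_rng(1)
n = 20_000
d = rng.binomial(1, 0.3, size=n)
z = rng.normal(size=n)
true_gamma = np.array([1.5, -1.3, 0.3])
design = np.column_stack([np.ones(n), d, z])
r = rng.binomial(1, 1.0 / (1.0 + np.exp(-design @ true_gamma)))

gamma_hat = fit_logistic(design, r)
print(gamma_hat)
print(offset(np.array([-1.0, 0.0, 1.0]), gamma_hat))   # offsets to pass to the CLR fit
```

Records with missing $X$ enter this fit, which is how the estimated-$\gamma$ version of $L^c_{complete}$ draws on them.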

23 Heuristically, $L^c_{complete}$ is inefficient because it contains ancillary information in $(R \mid D, Z)$
- Estimation of $\gamma$ removes (some) ancillary information and exploits information on records with missing $X$
- Projection can further improve efficiency...
- Define the score
$$U^c_{complete} = \frac{\partial \log L^c_{complete}}{\partial \beta}$$
- Idea: remove from $U^c_{complete}$ the projection onto the space of functions of $(R, D, Z)$ which are unbiased over $R$ conditional on $(D, Z)$
  - requires integration over $X$

24 Define the projection of a given score $g$:
$$\mathrm{Proj}(g) = E_X(g \mid R, D, Z) - E_{R,X}(g \mid D, Z)$$
and an improved score for $\beta$:
$$U^c = U^c_{complete} - \mathrm{Proj}(U^c_{complete})$$
Notes:
- $U^c$ is doubly robust ($\alpha$ incorrect OR $\gamma$ incorrect)
- $\mathrm{Proj}(U^c_{complete})$ is (very) difficult to compute
We employ an approximate projection
$$U^c_{improved} = U^c_{complete} - \mathrm{Proj}_{approx}(U^c_{complete})$$
exploiting a working model $p(X_j \mid D_j, Z_j; \alpha)$ and a working integral.
Then $\hat\beta$ solving $\sum_i U^c_{i,improved} = 0$ is consistent even if $p$ is wrong.

25 Hip Fracture Example: Efficiency Improvement
Working model for BMI and height:
- dichotomize BMI and height
- 4-cell multinomial model for BMI × height
- mean BMI and height in each category
Bias-corrected complete-record analysis with efficiency improvement
Columns: Coef., Est., SE, Z
Rows: (others), parity, br feed (sd unit), bmi (sd unit), height (sd unit)
(numeric entries not preserved in this transcript)
Notes:
- small differences for missing covariates
- greater for non-missing covariates (80% improvement for br feed coefficient)

26 (from earlier) Bias-corrected complete-record analysis with correct standard errors
Columns: Coef., Est., SE, Z
Rows: (others), parity, br feed (sd unit), bmi (sd unit), height (sd unit)
(numeric entries not preserved in this transcript)

27 Simulation Study
- Binary response $D_{ij}$, logistic regression
- $n = 4$ observations per stratum, 200 strata
- Continuous $(X_{ij}, Z_{ij})$ with $\mathrm{corr}(X_{ij}, Z_{ij}) \approx 0.5$ and $\mathrm{var}(X_{ij}) = \mathrm{var}(Z_{ij}) = 1$
- Average $E(D_{ij}) \approx 0.3$
- Missingness probabilities depend on $(D_{ij}, Z_{ij})$:
  - 18% missing when $D_{ij} = 0$
  - 45% missing when $D_{ij} = 1$
- Replicates (number not preserved in this transcript)
- Similar results for binary $X_{ij}$
- Similar results for matched case-control study

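28 Simulation Results: Continuous $X_{ij}$
True values are $\beta_z =$ and $\beta_x =$
Columns: Method, X-model, R-model, % Bias ($\beta_z$, $\beta_x$), % Rel. MSE Eff. ($\beta_z$, $\beta_x$)
Rows:
- $L^c$; X-model: X / Z
- $L^c$; X-model: X | Z
- Naive; R-model: $\pi(Z)$
- $L^c_{complete}$; R-model: $\pi(d, Z)$
- $U^c_{improved}$; X-model: X | Z; R-model: $\pi(d, Z)$
- $U^c_{improved}$; X-model: X / Z; R-model: $\pi(d, Z)$
Legend: a marker (lost in this transcript) = wrong model
(numeric entries not preserved in this transcript)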

29 Key Results
Complete data likelihood $L^c_{complete}$:
- uses data $(D_{i,obs} \mid \sum_j R_{ij} D_{ij}, X_{i,obs}, Z_{i,obs}, R_i)$
- relies on model $\pi$ for $R_{ij}$
- no model for $X_{ij}$ required
- loss of efficiency
Efficiency improvement in $L^c_{complete}$:
- projection to increase efficiency of the $L^c_{complete}$ estimating function
- $U^c_{improved}$ exploits a working model for $X_{ij}$
- consistent even if this working model is wrong
- moderate efficiency gained for $\beta_z$
- less gain for $\beta_x$
Better for one (or a few) patterns of missingness
Better for missing confounder variables

30 Fixed effects models for binary data with drop-outs
- Longitudinal observations: $t = 1, \ldots, J$ within subject $i$
- Response: $D_{it}$ (binary disease status)
- Covariates: $Z_{it}$ (vector)
- Drop-out: subject $i$ is observed at times $t = 1, \ldots, T_i \le J$, where $T_i$ is the drop-out time
- Response vector up to time $t$: $\bar D_{it} = (D_{i1}, \ldots, D_{it})^T$
- Observed response vector: $\bar D_{iT_i} = (D_{i1}, \ldots, D_{iT_i})^T$

31 Bias-corrected complete-record model
Model of interest:
$$\theta_{it} = \theta(Z_{it}) = \frac{\Pr(D_{it} = 1 \mid Z_{it})}{\Pr(D_{it} = 0 \mid Z_{it})} \quad \text{with} \quad \theta_{it} = \exp(q_i + \beta^T Z_{it})$$
Suppressing $i$ from here on.
Drop-out hazard model ($T$ is the drop-out time):
$$\lambda(t, \bar d_t; \gamma) = \Pr(T = t \mid T \ge t, \bar D_t = \bar d_t, Z)$$
Marginal drop-out probability:
$$\pi(t, \bar d_t; \gamma) = \Pr(T = t \mid D = d, Z)$$
(drop-out is MAR)

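32 Condition on drop-out:
$$L_{complete} = \Pr(\bar D_T \mid Z, T)$$
Now condition on the number of positive responses:
$$L^c_{complete} = \Pr\Big(\bar D_T \,\Big|\, \sum_{t=1}^{T} D_t, Z, T\Big)$$
which yields
$$L^c_{complete} = \frac{\big\{\prod_{t=1}^{T} \theta_t^{D_t}\big\}\,\pi(T, \bar D_T)}{\sum_{\bar d_T \in \mathcal{D}_T} \big\{\prod_{t=1}^{T} \theta_t^{d_t}\big\}\,\pi(T, \bar d_T)}$$
where $\mathcal{D}_T$ is the set of all possible allocations of complete-data positive responses among complete-data records.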
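For a single subject, this ratio can be evaluated by enumerating every response vector with the same number of positive responses. The sketch below is illustrative only: it assumes a scalar covariate, takes the marginal drop-out probability as a user-supplied function `pi(T, d_vec)`, and uses hypothetical names and data throughout.

```python
from itertools import combinations
import math

def dropout_clr_loglik(d, z, beta, pi):
    """log L^c_complete for one subject observed at t = 1..T = len(d).
    pi(T, d_vec) is the marginal probability of dropping out at T given responses d_vec.
    The subject intercept q_i cancels from the ratio and is omitted."""
    T = len(d)
    n_pos = sum(d)

    def weight(d_vec):
        return math.exp(sum(beta * zt * dt for zt, dt in zip(z, d_vec))) * pi(T, d_vec)

    denom = 0.0
    for idx in combinations(range(T), n_pos):      # positions of the positive responses
        d_vec = tuple(1 if t in idx else 0 for t in range(T))
        denom += weight(d_vec)
    return math.log(weight(tuple(d))) - math.log(denom)

d = (0, 1, 0)                  # hypothetical responses before drop-out at T = 3
z = [0.2, -0.5, 1.0]           # hypothetical covariate path

pi_constant = lambda T, d_vec: 0.3                    # drop-out unrelated to responses
pi_last = lambda T, d_vec: 0.2 + 0.5 * d_vec[-1]      # drop-out depends on last response
print(dropout_clr_loglik(d, z, 0.7, pi_constant))
print(dropout_clr_loglik(d, z, 0.7, pi_last))
```

With a constant $\pi$ the factor cancels and the ordinary conditional logistic contribution is recovered. When $\pi$ depends on the responses it reweights the allocations, which is the bias correction.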

33 Efficiency Improvement with $L^c_{complete}$
- $L^c_{complete}$ contains the drop-out time $T$ as a random variable
- But the drop-out process $T \mid D, Z$ is ancillary for the parameter of interest $\beta$
- Remove from $U^c_{complete}$ the projection onto the space spanned by all scores that
  - are functions of $(R_{it}, \bar D_{i,t-1}, Z_i)$, where $R_{it}$ is the non-drop-out indicator at $t$
  - are unbiased over $R_{it}$ conditional on $(R_{i,t-1} = 1, \bar D_{i,t-1}, Z_i)$
- The projection requires integration over $(D_t, D_{t+1}, \ldots, D_J)^T$

34 Projection requires integration over $(D_t, D_{t+1}, \ldots, D_J)^T$ given $(D_{i1}, \ldots, D_{i,t-1})^T$:
- ... requires a model for the joint distribution of $D \mid Z$
- ... which depends on the non-parametric distribution $Q_Z$ of the intercepts $q_i$
Approximate projection via a working transition model for the vector of responses $D$
Simulation results relative to $L_{complete} = \Pr(\bar D_T \mid Z, T)$ with correct drop-out model:
- 5–10% efficiency improvement from using a rich drop-out model
- 15–20% improvement using approximate projection
- bias and efficiency very robust to the working transition model for $D$
- rich drop-out model irrelevant under approximate projection

35 EXTRA SLIDES

36 Construction of Projected Score

37 Construction of Projected Score
Vector $R$ specifies the missingness pattern:
- $r_k$ = $k$th missingness pattern, $k = 1, \ldots, 2^n$
- Pattern indicator: $\Delta_k = I(R = r_k) = I(k\text{th pattern observed})$
Data under the $k$th pattern:
- $X_{(k,obs)}$ = observed components of $X$
- $X_{(k,miss)}$ = missing components of $X$
- Similarly $D_{(k,obs)}$, $D_{(k,miss)}$, $Z_{(k,obs)}$, $Z_{(k,miss)}$
Rewrite $L^c_{complete}$ as
$$L^c_{complete} = \prod_{k=1}^{2^n} L_k^{\Delta_k},$$
where

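38 $$L_k = \Pr\Big(D_{(k,obs)} \,\Big|\, \sum_j D_j r_{kj}, X_{(k,obs)}, Z_{(k,obs)}, R = r_k\Big)$$
is $L^c_{complete}$ under the $k$th missingness pattern.
Similarly
$$U^c_{complete} = \sum_{k=1}^{2^n} \Delta_k U_k, \quad \text{where } U_k = \partial \log L_k / \partial \beta$$
(this sums over all possible missingness patterns)
Now note: $\Delta_k(R)$ contains no $X$, and $U_k(D_{(k,obs)}, X_{(k,obs)}, Z_{(k,obs)})$ contains no $R$.
Because of this, we can write
$$\mathrm{Proj}(\Delta_k U_k) = \mathrm{Proj}(\Delta_k)\, U_{\cdot,k}$$
where
$$U_{\cdot,k} = E_{X_{(k,obs)}}(U_k \mid D_{(k,obs)}, Z_{(k,obs)})$$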

39 And
$$\mathrm{Proj}(\Delta_k) = \Delta_k - E_R(\Delta_k \mid D, Z) = \Delta_k - \varepsilon_k$$
So that
$$\mathrm{Proj}(U^c_{complete}) = \sum_{k=1}^{2^n} (\Delta_k - \varepsilon_k)\, U_{\cdot,k}$$
and
$$U^c = U^c_{complete} - \mathrm{Proj}(U^c_{complete})$$
Important notes on $\mathrm{Proj}(U^c_{complete})$:
- does not contain $X$
- unbiasedness only depends on a correct model $\pi$ for $R$
Q: Can we replace $U_{\cdot,k}$ by any function of $(D_{(k,obs)}, Z_{(k,obs)})$?
A: Yes! Such approximate functions can be derived from a working model for $X_j$.
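The "Yes!" can be verified in one line. In the derivation below, $\Delta_k = I(R = r_k)$ denotes the pattern indicator, $\varepsilon_k = E_R(\Delta_k \mid D, Z)$, and $h_k(D, Z)$ stands for any function free of $R$ and $X$. These symbols are notation assumed for this check, and the closing comment about $\varepsilon_k$ relies on an added independence assumption.

```latex
% Unbiasedness of the projection term for an arbitrary h_k(D, Z)
\begin{align*}
E_R\Big[\sum_{k=1}^{2^n} (\Delta_k - \varepsilon_k)\, h_k(D, Z) \,\Big|\, D, Z\Big]
  &= \sum_{k=1}^{2^n} \big\{E_R(\Delta_k \mid D, Z) - \varepsilon_k\big\}\, h_k(D, Z) \\
  &= \sum_{k=1}^{2^n} (\varepsilon_k - \varepsilon_k)\, h_k(D, Z) = 0 .
\end{align*}
% If, as an added assumption, the R_j are independent given (D, Z), then
% \varepsilon_k = \prod_{j=1}^{n} \pi(D_j, Z_j; \gamma)^{r_{kj}} \{1 - \pi(D_j, Z_j; \gamma)\}^{1 - r_{kj}} .
```

Only $\varepsilon_k$, and hence only the missingness model $\pi$, has to be right for the subtracted term to have mean zero. The choice of $h_k$ affects efficiency, not consistency.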

40 Modelling X among the controls

41 Modelling $X_{ij}$ among the controls
Joint Model for $D_{ij}$ and $X_{ij}$
The model of interest can be written in terms of the odds that $D_j = 1$:
$$\theta(X_j, Z_j) = \frac{\Pr(D_j = 1 \mid X_j, Z_j)}{\Pr(D_j = 0 \mid X_j, Z_j)}$$
where
$$\theta(X_j, Z_j) = \exp(q_i + \beta_z^T Z_j + \beta_x^T X_j)$$
Define the marginal (over $X_j$) odds as
$$\bar\theta(Z_j) = \frac{\Pr(D_j = 1 \mid Z_j)}{\Pr(D_j = 0 \mid Z_j)}$$
Define a model for $X_j$ via
$$p_0(X_j \mid Z_j; \alpha) = p(X_j \mid D_j = 0, Z_j)$$
- $\alpha$ is a new parameter
- $p_0$ is the density or pmf of $X_j$ among controls

42 Two important facts
The marginal (over $X_j$) odds $\bar\theta(Z_j)$ are in general
$$\bar\theta(Z_j) = \int \theta(x, Z_j)\, p_0(x \mid Z_j)\, dx$$
(integrand: odds of $D_j = 1$ versus $D_j = 0$ given $x$, times the density of $X_j$ given $D_j = 0$)
and more specifically
$$\bar\theta = \exp(q_i + \beta_z^T Z_j) \int_x \exp(\beta_x^T x)\, p_0(x \mid Z_j; \alpha)\, dx.$$
The density $p(X_j \mid D_j = 1, Z_j)$ can be expressed generally as
$$p(X_j \mid D_j = 1, Z_j) = p_0(X_j \mid Z_j)\, \frac{\theta(X_j, Z_j)}{\bar\theta(Z_j)}$$
(numerator: odds given $X_j$; denominator: marginal odds)

43 and specifically as
$$p(X_j \mid D_j = 1, Z_j) = p_0(X_j \mid Z_j; \alpha)\, \exp(\beta_x^T X_j) \Big\{\int_x \exp(\beta_x^T x)\, p_0(x \mid Z_j; \alpha)\, dx\Big\}^{-1}$$
Important notes:
- the role of $q_i$ is the same in $\theta$ and $\bar\theta$
- so $\sum_j D_j$ is a CSS for $q_i$ in both:
  - the model for $(D_j \mid X_j, Z_j)$ and
  - the marginal model for $(D_j \mid Z_j)$
- $p(X_j \mid D_j = 1, Z_j)$ does not depend on $q_i$
(Satten & Kupper, 1993; Satten & Carroll, 2000)
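With a binary $X_j$ the integral becomes a two-term sum, which makes both facts easy to check numerically. The sketch below uses hypothetical parameter values and the made-up function name `case_density_binary_x`. It computes the marginal odds and the case density of $X$, and shows that the latter does not change with $q_i$.

```python
import math

def case_density_binary_x(q, beta_z, beta_x, z, p0_x1):
    """Binary X with Pr(X = 1 | D = 0, Z = z) = p0_x1.
    Returns the marginal odds and the pmf of X among cases."""
    theta = {x: math.exp(q + beta_z * z + beta_x * x) for x in (0, 1)}  # odds given x
    p0 = {1: p0_x1, 0: 1.0 - p0_x1}                                      # pmf among controls
    theta_bar = sum(theta[x] * p0[x] for x in (0, 1))                    # marginal odds
    p1 = {x: p0[x] * theta[x] / theta_bar for x in (0, 1)}               # pmf among cases
    return theta_bar, p1

for q in (-3.0, 0.0, 2.0):
    theta_bar, p1 = case_density_binary_x(q, 0.4, 0.9, 1.0, 0.3)
    # theta_bar scales with exp(q); Pr(X = 1 | D = 1, Z) stays fixed
    print(q, theta_bar, p1[1])
```

The intercept multiplies $\theta$ and $\bar\theta$ by the same factor $\exp(q_i)$, so it cancels from their ratio.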

44 Implications
The full likelihood for stratum $i$ is
$$p(X_{obs} \mid R, D, Z; \beta, \alpha)\,\Pr(R \mid D, Z; \gamma)\,\Pr(D \mid Z; \beta, \alpha, q_i)$$
- $\beta$: regression parameter
- $\alpha$: parameter in $(X_j \mid D_j = 0, Z_j)$
- $\gamma$: parameter in $\Pr(R_j = 1 \mid D_j, Z_j)$
Important facts:
- Again, $\sum_j D_j$ is a CSS for $q_i$
- $\Pr(R \mid D, Z)$ is free of $(\beta, \alpha)$
- MAR: $p(X_{obs} \mid R, D, Z) = p(X_{obs} \mid D, Z)$
By conditioning on $\sum_j D_j$, we obtain the joint conditional likelihood for $(\beta, \alpha)$
$$L^c(\beta, \alpha) \propto p(X_{obs} \mid D, Z)\,\Pr\Big(D \,\Big|\, \sum_j D_j, Z\Big)$$
which is free of $q_i$ and $\gamma$.

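45 Maximizing the conditional likelihood
$$\prod_i L_i^c(\beta, \alpha)$$
will be SPE for $(\beta, \alpha)$ even when $X_i$ may be missing.
$L^c(\beta, \alpha)$ is written:
$$L^c = \Big\{\prod_j p(X_j \mid D_j, Z_j)^{R_j}\Big\}\, \frac{\prod_j \bar\theta(Z_j)^{D_j}}{\sum_{d \in \mathcal{D}} \prod_j \bar\theta(Z_j)^{d_j}}$$
where $\mathcal{D} = \{d : \sum_j d_j = \sum_j D_j\}$ and $\bar\theta(Z_j)$ denotes the marginal (over $X_j$) odds
(Satten & Kupper, 1993; Satten & Carroll, 2000)
Pitfall of $L^c$:
- heavily dependent on the model $p_0(X_i \mid Z_i; \alpha)$
- does not reduce to the standard conditional likelihood when $X_i$ is never missing
Simulation results...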

46 Suboptimal Estimation

47 Suboptimal Estimation
In $L^c$, the random variables are $(D, X_{obs}, R)$, and $\sum_j D_j$ and $Z$ are the only conditioning statistics.
This suggests conditioning on $(X_{obs}, R)$ and using the likelihood $\Pr(D \mid X_{obs}, R, Z)$:
$$\prod_j \Pr(D_j \mid R_j = 1, X_j, Z_j)^{R_j}\, \Pr(D_j \mid R_j = 0, Z_j)^{1 - R_j}$$
Again, $\sum_j D_j$ is sufficient for $q_i$, so the conditional likelihood
$$L^c_{subopt} = \Pr\Big(D \,\Big|\, \sum_j D_j, X_{obs}, R, Z\Big)$$
is free of $q_i$.
Because $\sum_j D_j$ is not a CSS, maximizing $\prod_i L^c_{i,subopt}$ will not be SPE for $(\beta, \alpha)$.

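48 $L^c_{subopt}(\beta, \alpha, \gamma)$ is written
$$L^c_{subopt} = \frac{\prod_j \theta^{*}(X_j, Z_j)^{R_j D_j}\, \bar\theta^{*}(Z_j)^{(1 - R_j) D_j}}{\sum_{d \in \mathcal{D}} \prod_j \theta^{*}(X_j, Z_j)^{R_j d_j}\, \bar\theta^{*}(Z_j)^{(1 - R_j) d_j}}$$
Odds that $D_j = 1$ when $X_j$ is observed:
$$\theta^{*}(X_j, Z_j) = \frac{\Pr(D_j = 1 \mid R_j = 1, X_j, Z_j)}{\Pr(D_j = 0 \mid R_j = 1, X_j, Z_j)} = \theta(X_j, Z_j)\, \frac{\pi(1, Z_j)}{\pi(0, Z_j)}$$
and when $X_j$ is not observed:
$$\bar\theta^{*}(Z_j) = \frac{\Pr(D_j = 1 \mid R_j = 0, Z_j)}{\Pr(D_j = 0 \mid R_j = 0, Z_j)} = \bar\theta(Z_j)\, \frac{1 - \pi(1, Z_j)}{1 - \pi(0, Z_j)}$$
Here $\theta$ is the odds given $(X_j, Z_j)$ and $\bar\theta$ is the marginal (over $X_j$) odds.
Missingness model for $R_j$:
$$\Pr(R_j = 1 \mid D_j = d, Z_j) = \pi(d, Z_j; \gamma)$$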

49 Important notes about $L^c_{subopt}$:
- reduces to the standard conditional likelihood when $X_j$ is never missing
- less dependent on the model for $X_j$
- but: requires a model $\pi$ for missingness
- related to work by Paik & Sacco, 2000
Implementation:
- we need to pre-estimate $\alpha$ in $p_0(X_j \mid Z_j; \alpha)$ and $\gamma$ in $\pi(D_j, Z_j; \gamma)$
- plug into $L^c_{subopt}$ before maximizing over $\beta$
Simulation results...

50 Full Simulation Results

51 Simulation Study
- Binary response $D_{ij}$, logistic regression
- $n = 4$ observations per stratum, 200 strata
- Continuous $(X_{ij}, Z_{ij})$ with $\mathrm{corr}(X_{ij}, Z_{ij}) \approx 0.5$ and $\mathrm{var}(X_{ij}) = \mathrm{var}(Z_{ij}) = 1$
- Average $E(D_{ij}) \approx 0.3$
- Missingness probabilities depend on $(D_{ij}, Z_{ij})$:
  - 18% missing when $D_{ij} = 0$
  - 45% missing when $D_{ij} = 1$
- Replicates (number not preserved in this transcript)
- Similar results for binary $X_{ij}$
- Similar results for matched case-control study

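52 Simulation Results: Continuous $X_{ij}$
True values are $\beta_z =$ and $\beta_x =$
Columns: Method, X-model, R-model, % Bias ($\beta_z$, $\beta_x$), % Rel. MSE Eff. ($\beta_z$, $\beta_x$)
Rows:
- $L^c$; X-model: X / Z
- $L^c_{subopt}$; X-model: X / Z; R-model: $\pi(d, Z)$
- $L^c_{subopt}$; X-model: X / Z; R-model: $\pi(Z)$
- $L^c$; X-model: X | Z
- $L^c_{subopt}$; X-model: X | Z; R-model: $\pi(d, Z)$
- $L^c_{subopt}$; X-model: X | Z; R-model: $\pi(Z)$
- Naive; R-model: $\pi(Z)$
- $L^c_{complete}$; R-model: $\pi(d, Z)$
- $U^c_{improved}$; X-model: X | Z; R-model: $\pi(d, Z)$
- $U^c_{improved}$; X-model: X / Z; R-model: $\pi(d, Z)$
(numeric entries not preserved in this transcript)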

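53 Simulation Results: Binary $X_{ij}$
True values are $\beta_z =$ and $\beta_x =$
Columns: Method, X-model, R-model, % Bias ($\beta_z$, $\beta_x$), % Rel. MSE Eff. ($\beta_z$, $\beta_x$)
Rows:
- $L^c$; X-model: X / Z
- $L^c_{subopt}$; X-model: X / Z; R-model: $\pi(d, Z)$
- $L^c_{subopt}$; X-model: X / Z; R-model: $\pi(Z)$
- $L^c$; X-model: X | Z
- $L^c_{subopt}$; X-model: X | Z; R-model: $\pi(d, Z)$
- $L^c_{subopt}$; X-model: X | Z; R-model: $\pi(Z)$
- $L^c_{complete}$; R-model: $\pi(d, Z)$
- $U^c_{improved}$; X-model: X | Z; R-model: $\pi(d, Z)$
- $U^c_{improved}$; X-model: X / Z; R-model: $\pi(d, Z)$
(numeric entries not preserved in this transcript)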

54 Estimation via $L^c_{subopt}$
Recall:
- $L^c_{subopt}$ conditions on $(\sum_j D_j, X_{obs}, R, Z)$
- $L^c_{subopt}$ contains parameters $\alpha$ in $p_0(X_j \mid Z_j; \alpha)$ and $\gamma$ in $\pi(d, Z_j; \gamma)$
Missingness parameter $\gamma$ estimated via the likelihood
$$\prod_i \Pr(R_i \mid D_i, Z_i; \gamma)$$
$X_j$-model parameter $\alpha$ estimated via the likelihood
$$\prod_i \Pr(X_{i,obs} \mid R_i, D_i, Z_i; \alpha, \beta_x)$$
which results in an extra estimate of $\beta_x$; this extra estimate is discarded.

55 What is $\pi(\cdot)$ doing in $L^c_{subopt}$?
A heuristic explanation
The full likelihood can be written
$$\prod_j \Pr(D_j \mid Z_j)\, \{p(X_j \mid D_j, Z_j)\, \pi(D_j, Z_j)\}^{R_j}\, \{1 - \pi(D_j, Z_j)\}^{1 - R_j}$$
- no $q_i$: the factors $\pi(D_j, Z_j)$ and $[1 - \pi(D_j, Z_j)]$ generally drop out
- with $q_i$: conditioning on $\sum_j D_j$ replaces $\prod_j \Pr(D_j \mid Z_j)$ with $\Pr(D \mid \sum_j D_j, Z)$, and so $\pi(D_j, Z_j)$ and $[1 - \pi(D_j, Z_j)]$ still drop out of the final conditional likelihood
This likelihood is only conditional on $Z_j$; $(D_j, R_j, X_j)$ are random variables.

56 Conditioning on $X_{obs}$ requires also conditioning on $R$
The starting-point likelihood is
$$\prod_j \Pr(D_j \mid R_j = 1, X_j, Z_j)^{R_j}\, \Pr(D_j \mid R_j = 0, Z_j)^{1 - R_j}$$
- each factor contains $\pi(\cdot)$
- after conditioning on $\sum_j D_j$, the terms containing $\pi(\cdot)$ are no longer separable

