Modeling Longitudinal Count Data with Excess Zeros and Time-Dependent Covariates: Application to Drug Use

Size: px

Start display at page:

Download "Modeling Longitudinal Count Data with Excess Zeros and Time-Dependent Covariates: Application to Drug Use"

Preston George
5 years ago
Views:

1 Modeling Longitudinal Count Data with Excess Zeros and : Application to Drug Use University of Northern Colorado November 17, 2014

2 Presentation Outline I and Data Issues II Correlated Count Regression Models III Time-Dependent Covariate Estimation IV Models for with Excess Zeros V Hurdle Generalized Method of Moments VI : EMA

3 and Data Issues

4 Motivating Example EMA: Ecological Momentary Assessment Interest: Honest Reporting of Marijuana Usage Connection to Craving, Motivation, Social Context

5 Motivating Example EMA: Ecological Momentary Assessment Interest: Honest Reporting of Marijuana Usage Connection to Craving, Motivation, Social Context Subjects recruited based on usage history, physiological testing Respond to text messages 3 times per day for 14 consecutive days.

6 Properties of the Data EMA Data: Count Response Variable Longitudinal Responses Excess Zeros Expected (And Observed) Predictors Change Over Time

7 Correlated Count Regression Models

8 Ordinary Count Regression Models Poisson regression: Random Component: Poisson Distribution Y i Poi(λ(x i )) Systematic and Link Components: Log Link ln(λ(x i )) = x T i β

9 Ordinary Count Regression Models Parameter Estimation typically proceeds using Maximum Likelihood (implemented using Iterative Re-Weighted Least Squares) l(β; y, x) = N [y i ln(λ(β; x i )) λ(β; x i ) ln(y i!)] i=1 Hypothesis Testing performed using Wald Statistics Implicitly assumes Var(Y i ) = E[Y i ]

10 Overdispersion When Var(Y i ) > E[Y i ] the data are overdispersed Positive autocorrelation in the data leads to overdispersion Consequence: Inflation of Type I Error Rate

11 Correlated Count Regression Models Accounting for autocorrelation in count responses: Conditional Models: Include random subjects effects Subject-Specific Interpretations

12 Correlated Count Regression Models Accounting for autocorrelation in count responses: Conditional Models: Include random subjects effects Subject-Specific Interpretations Marginal Models: Determine marginal moments directly Population-Averaged Interpretations

13 Conditional Correlated Count Regression Mixed Poisson Count Regression: Random Component: Poisson Distribution / Gamma Random Effect Y it u i Poi(λ(x it, z it )) u i Gamma(α, β) Systematic and Link Components: Log Link, Random Effects Design ln(λ(x it, z it )) = x T it β + z T it v(u)

14 Conditional Correlated Count Regression Parameter Estimation: Maximum Likelihood, h-likelihood, Markov Chain Monte Carlo, EM Algorithm The above model is often referred to as a Random Intercept model. Sometimes Random Slopes models are applied, adding columns from X to Z.

15 Marginal Correlated Count Regression Marginal Correlated Poisson Count Regression: Random Component: Mean and Variance Specified Y it D (λ(x it ), φv (λ(x it ))) The marginal model is specified through the mean and variance structure, as defining a quasi-likelihood.

16 Marginal Correlated Count Regression The mean is specified through the link and systematic components: ln(λ(x it )) = x T it β The variance-covariance structure is specified directly: V (λ(x it )) = A 1/2 i R i (α)a 1/2 i

17 Marginal Correlated Count Regression Estimate parameters by solving estimating equations: N ( ) λ(β; xi ) T [φv i (λ(β; x i ))] 1 (Y i λ(β; x i )) = 0 i=1 β Dispersion parameters estimated similarly, using GEE2 Test hypotheses using Sandwich Wald Tests / Generalized Score Tests

18 Time-Dependent Covariate Estimation

19 Predictors that include variation both between and within subjects Exogenous versus Endogenous External versus Internal Also Time-Varying Covariates or Within-Subjects Covariates

20 Time-Dependent Covariate Models Consider three approaches: 1 Conditional Models: Mixed Correlated Count Regression 2 Marginal Models: GEE for Count Regression 3 Marginal Models: GMM for Count Regression

21 Conditional Time-Dependent Covariate Models Directly split coefficients into within and between components (Neuhaus and Kalbfleisch (1998)). Traditional Mixed Poisson: ln(λ(x it, z it )) = β 0 + β 1 x it + z T it v(u)

22 Conditional Time-Dependent Covariate Models Directly split coefficients into within and between components (Neuhaus and Kalbfleisch (1998)). Traditional Mixed Poisson: ln(λ(x it, z it )) = β 0 + β 1 x it + z T it v(u) Mixed Poisson with TDC Decomposition: ln(λ(x it, z it )) = β 0 + β 1B x i. + β 1W (x it x i. ) + z T it v(u)

23 Conditional Time-Dependent Covariate Models Coefficient interpretations: β B represents the expected effect on the (transformed) response mean from changes across individuals β W represents the expected effect on the (transformed) response mean from changes within individuals

24 Marginal Time-Dependent Covariate Models: GEE GEE Approach: For longitudinal data, Pepe and Anderson (1994) argued Use a diagonal working correlation structure or Verify the sufficient condition: E[Y it X it ] = E[Y it X ij, j = 1,..., T ] Either will guarantee the expectation of the GEE is the zero vector

25 Marginal Time-Dependent Covariate Models: GEE GEE Approach: Use of Independent working correlation structure recommended Fitzmaurice (1995) noted losses in efficiency with this approach Efficiency depends on strength of autocorrelation

26 Marginal Time-Dependent Covariate Models: GMM Generalized Method of Moments Approach: Lai and Small (2007) proposed a method of avoiding diagonal working correlation structures Idea: Select combinations of derivative and residual terms without a working correlation structure Ensure that expectation is zero, depending on nature of time-dependent covariate

27 Marginal Time-Dependent Covariate Models GMM Process: Minimum Quadratic Form estimation Minimize Q(β) = G T (β; Y, X)W 1 G(β; Y, X) Where G(β; Y, X) is an average vector of valid moment conditions constructed according to the type of TDC such that E[G(β; Y, X)] = 0

28 Models for with Excess Zeros

29 Excess-Zero Count Model Options Hurdle Poisson: Certain Zero comes from one process Once hurdle is cleared, responses are positive Zero-Inflated Poisson: Zero comes from two processes Either Certain Zero or part of Poisson process

30 Excess-Zero Correlated Count Model Options Correlated Count Model Options: 1 Conditional Models: Mixed Hurdle 2 Conditional Models: Mixed ZIP 3 Marginal Models: Hurdle GEE

31 Conditional Model: Mixed Hurdle Poisson Model Mixed Hurdle Poisson Distributional Component Y ij u i HurP(π(x it, z it ; u i ), λ(x it, z it ; u i )) u i N (0, σ 2 u) π it y ij = 0 f ij (y ij u i ; π it, λ it ) = (1 π it ) f (y ij u i ;λ it ) 1 f (0;λ it ) y ij > 0

32 Conditional Model: Mixed ZIP Model Mixed Zero-Inflated Poisson Distributional Component Y ij u i ZIP(π(x it, z it ; u i ), λ(x it, z it ; u i )) u i N (0, σ 2 u) π it + (1 π it )f (0; λ it ) y ij = 0 f ij (y ij u i ; π it, λ it ) = (1 π it )f (y ij u i ; λ it ) y ij > 0

33 Conditional Model: Mixed ZIP Model Mixed Hurdle / ZIP Systematic Components logit(π it ) = x l,it α + z it u ln(λ it ) = x c,it β + z it u

34 Conditional Modeling Mixed Hurdle and ZIP Models: Estimation proceeds using likelihood methods (MCMC) Time-dependent covariates can again be split into within and between effects Models show high Type I Error rates in the presence of time-dependent covariates

35 Marginal Modeling: Hurdle GEE GEE for zero-inflation presented by Dobbie and Welsch (2001) Construct two response vectors: Binary: Certain Zero Indicator: Y bin,it D(π it, π it (1 π it )) Count: Positive counts with Positive Poisson Moments: Y it (y it > 0) D (µ(λ it ), V (λ it ))

36 Marginal Modeling: Hurdle GEE Hurdle GEE: Construct two models: Binary Response: logit (π(z it )) = z T it α Positive Count Response: ln (λ(x it )) = x T it β

37 Marginal Modeling: Hurdle GEE Hurdle GEE: Solve two estimating equations: N i=1 N i=1 ( ) πi V 1 α l,i (y bin π i ) = 0 ( ) µi V 1 ( ) c,i I(yi >0) (yi µ i ) = 0 β (Use Independent Working Correlation Structure for )

38 Hurdle Generalized Method of Moments

39 Why not use existing methods? Need to account for autocorrelation, excess zeros, time-dependent covariates Other methods have a single treatment for all Marginal method (Independent GEE) imposes independence assumption, with consequences of lost efficiency

40 : Model Joint Quasi-Generalized Linear Model: Random Components Certain Zero: Y bin,it = I (Y it = 0) D(π it, π it (1 π it )) Positive Count: Y it (y it > 0) D (µ(λ it ), V (λ it ))

41 : Model Joint Quasi-Generalized Linear Model: Random Components Positive Count: µ(λ it ) = λ it 1 e λ it V (λ it ) = µ(λ it ) [1 λ it + µ(λ it )]

42 : Model Joint Quasi-Generalized Linear Model: Systematic Components Certain Zero: logit (π(z it )) = z T it α Positive Count: ln (λ(x it )) = x T it β

43 : General Process Process: Independently Minimize Quadratic Forms: Q l (α) = (G l (α; Y, Z)) T W 1 l (G l (α; Y, Z)) Q c (β) = (G c (β; Y, X)) T W 1 c (G c (β; Y, X))

44 : General Process Process: Select Valid Moment Conditions: G l (α; Y, Z) = 1 N N g l,i (α; Y i, Z i ) i=1 G c (β; Y, X) = 1 N N g c,i (β; Y i, X i ) i=1 E[g l,ij ] = E[g c,ij ] = 0

45 : General Process Process: Structure of Valid Moment Conditions: g l,ij (α; Y i, Z i ) = π(z is) α k ( I(Yit =0) π(z it ) ) g c,ij (β; Y i, X i ) = µ(λ(x is)) β k ( I(Yit >0)[Y it µ(λ(x is ))] ) It remains to determine how to construct these Valid Moment Conditions

46 : Valid Moment Conditions Determining Valid Moment Conditions: 1 Types as selected by researcher 2 Extended Classification using data

47 : Types Validity of Moment Conditions depends on the expectation: [ ] µis E β (Y it µ it ) = 0 β j Lai and Small (2007) proposed using expected characteristics of individual to make decisions on combinations of s and t that would lead to independent components.

48 : Types Type I TDC: Expectation holds for all s and t Type II TDC: Expectation holds for s t Type III TDC: Expectation holds for s = t Type IV TDC: Expectation holds for s t

49 : Types Type I TDC: The response and time-dependent covariate are associated only at the same time. Type II TDC: The response is associated with prior values of the time-dependent covariate. Type III TDC: A feedback loop exists between the time-dependent covariate and the response. Type IV TDC: The time-dependent covariate is associated with prior values of the response.

50 : Types Construct subject vectors of Valid Moment Conditions using values of s and t that satisfy the chosen type of TDC: g l,ij (α; Y i, Z i ) = π(z is) α k ( I(Yit =0) π(z it ) ) g c,ij (β; Y i, X i ) = µ(λ(x is)) β k ( I(Yit >0)[Y it µ(λ(x it ))] )

51 : Extended Classification Extended Classification Process: 1 Estimate derivative, residual terms of expectation using initial estimates (Independent GEE) 2 Select Valid Moment Conditions individually based on empirical independence of standardized derivative, residual terms (using all subjects) 3 Construct vectors of Valid Moment Conditions using empirically supported combinations of s and t

52 : Extended Classification Using initial parameter estimates, calculate component-wise independent vectors: ˆd sj = ˆµ s β j Standardized Values: rˆt = y t ˆµ t d sji and r ti

53 : Extended Classification Calculate Correlation: ˆρ sjt = ( d sji d sj )( r ti r t ) ( d sji d sj ) 2 ( r ti r t ) 2 Assuming all fourth moments exist and are finite, ( ρ sjt = ˆρ sjt ˆµ22 /N ˆµ 22 = (1/N) i N (0, 1) ( d sji ) 2 ( r ti ) 2 ) Omit potential moment conditions with significant association

54 : Estimation Process 0 (Based on Hurdle GEE (Independent), evaluate associations in potential moment conditions) 1 Construct separate vectors of Valid Moment Conditions for two components of Joint Quasi-GLM 2 Using initial parameter estimates, estimate optimal weight matrices for each component 3 Separately minimize two Quadratic Forms for two components of Joint Quasi-GLM

55 : Estimation Options Implementation of GMM: Two-Step GMM: Estimate weight matrix Ŵ using initial parameter estimates, minimize Q(β) Iterated GMM: Iterate between estimation of Ŵ and minimization of Q(β) Continuously Updating GMM: Minimize Q(β), where W(β) is a function of unknown parameters

56 : Two-Step Estimation Implementation of GMM: Two-Step GMM: Estimate weight matrix Ŵ using initial parameter estimates, minimize Q(β) Iterated GMM: Iterate between estimation of Ŵ and minimization of Q(β) Continuously Updating GMM: Minimize Q(β), where W(β) is a function of unknown parameters

57 : Two-Step Estimation ˆα = arg min [Q l (α)], ˆβ = arg min [Q c (β)] ( ) T 1 N g Cov( ˆα) = l ˆV 1 N α i=1 ( 1 N Cov( ˆβ) = N i=1 g c β ) T ˆV 1 g l ( g c ( 1 N 1 N N i=1 N i=1 g l α g c β ) )

58 : EMA

59 EMA Data Analysis Predict next usage using craving, controls (day of the week, academics)

60 Usage over Time Usage by Study Day Next Usage Day of Study

61 Usage Reports Times Used Frequency % Zeros

62 Usage and Craving Next Usage versus Craving Next Usage Craving

63 Usage and Craving Usage and Craving by Study Day Day of Study Next Usage / Craving

64 Models Fit Three models fit: 1 Mixed Hurdle Poisson, with Between / Within Decomposition of craving 2 GEE Hurdle, with Independent Working Correlation Structure 3 GMM Hurdle, with craving as Type II TDC, Day as Type I TDC

65 Model Results Mixed Hurdle Hurdle IGEE Logistic craving (W) (B) controls Not Significant. ** Count craving (W) (B) cum GPA controls Not Significant. **

66 Model Results Populations with higher craving: Lower probability of certain zero Higher expected positive count, once hurdle is cleared Higher within-subject variation: Higher probability of certain zero Lower expected positive count, once hurdle is cleared

67 Concluding Remarks GMM: Initial Values, Estimation Method Simulating Types of TDC s GMM Fit Statistics ARE versus IGEE

68 Modeling Longitudinal Count Data with Excess Zeros and : Application to Drug Use University of Northern Colorado November 17, 2014

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction