On Measurement Error Problems with Predictors Derived from Stationary Stochastic Processes and Application to Cocaine Dependence Treatment Data

Size: px

Start display at page:

Download "On Measurement Error Problems with Predictors Derived from Stationary Stochastic Processes and Application to Cocaine Dependence Treatment Data"

Lindsay Walker
5 years ago
Views:

1 On Measurement Error Problems with Predictors Derived from Stationary Stochastic Processes and Application to Cocaine Dependence Treatment Data Yehua Li Department of Statistics University of Georgia Yongtao Guan, Rajita Sinha Yale University

2 Method of moment estimator Subsampling extrapolation (SUBEX) Interval censored failure time

3 Cocaine dependence Cocaine dependence remains a serious public health problem in the US with more than one million individuals meeting criteria for current dependence (SAMSHA, 2004).

4 Cocaine dependence Cocaine dependence remains a serious public health problem in the US with more than one million individuals meeting criteria for current dependence (SAMSHA, 2004). Cocaine abuse causes serious concerns to the society because of its association with criminal activities as well as the high cost to treat health problems related to it.

5 Cocaine dependence Cocaine dependence remains a serious public health problem in the US with more than one million individuals meeting criteria for current dependence (SAMSHA, 2004). Cocaine abuse causes serious concerns to the society because of its association with criminal activities as well as the high cost to treat health problems related to it. Despite the availability of efficacious behavioral treatments, relapse rates remain high (Sinha, 2001, 2007).

6 Cocaine relapse and baseline usage behavior Previous studies have revealed that one s baseline cocaine use behavior is predictive of cocaine relapse, along with many other risk factors such as age and gender, cocaine withdrawal severity, and stress and negative mood.

7 Cocaine relapse and baseline usage behavior Previous studies have revealed that one s baseline cocaine use behavior is predictive of cocaine relapse, along with many other risk factors such as age and gender, cocaine withdrawal severity, and stress and negative mood. To describe the baseline cocaine use behavior, daily cocaine use trajectory data are collected in a short period prior to treatment.

8 Cocaine relapse and baseline usage behavior Previous studies have revealed that one s baseline cocaine use behavior is predictive of cocaine relapse, along with many other risk factors such as age and gender, cocaine withdrawal severity, and stress and negative mood. To describe the baseline cocaine use behavior, daily cocaine use trajectory data are collected in a short period prior to treatment. statistics derived from these trajectories are then used as predictors in a subsequent analysis to explain cocaine craving and cocaine relapse.

9 The data The research was conducted at the Clinical Neuroscience Research Unit (CNRU) of the Connecticut Mental Health Center.

10 The data The research was conducted at the Clinical Neuroscience Research Unit (CNRU) of the Connecticut Mental Health Center. The study was conducted in two separate periods with 59 enrolling in the first period and 83 enrolling in the second.

11 The data The research was conducted at the Clinical Neuroscience Research Unit (CNRU) of the Connecticut Mental Health Center. The study was conducted in two separate periods with 59 enrolling in the first period and 83 enrolling in the second. At the baseline, demographic variables (e.g. age, gender, race, years of cocaine use, anxiety level) and daily cocaine use history in the 90 days prior to admission were collected.

12 The data The research was conducted at the Clinical Neuroscience Research Unit (CNRU) of the Connecticut Mental Health Center. The study was conducted in two separate periods with 59 enrolling in the first period and 83 enrolling in the second. At the baseline, demographic variables (e.g. age, gender, race, years of cocaine use, anxiety level) and daily cocaine use history in the 90 days prior to admission were collected. The 59 subjects in the first period were interviewed 90 days after treatment.

13 The data The research was conducted at the Clinical Neuroscience Research Unit (CNRU) of the Connecticut Mental Health Center. The study was conducted in two separate periods with 59 enrolling in the first period and 83 enrolling in the second. At the baseline, demographic variables (e.g. age, gender, race, years of cocaine use, anxiety level) and daily cocaine use history in the 90 days prior to admission were collected. The 59 subjects in the first period were interviewed 90 days after treatment. The 83 subjects in the second study were interviewed 14, 30, 90 and 180 days after treatment. Urine screening were conducted on these interviews to ensure the accuracy of the first relapse time.

14 Response variables of interest Post-treatment cocaine craving score: desire for using cocaine at this moment, measured by the Tiffany Cocaine Craving Questionnaire-Brief (CCQ-Brief) (Tiffany et al., 1993).

15 Response variables of interest Post-treatment cocaine craving score: desire for using cocaine at this moment, measured by the Tiffany Cocaine Craving Questionnaire-Brief (CCQ-Brief) (Tiffany et al., 1993). First relapse time: the first time that the subject use cocaine after the treatment.

16 Response variables of interest Post-treatment cocaine craving score: desire for using cocaine at this moment, measured by the Tiffany Cocaine Craving Questionnaire-Brief (CCQ-Brief) (Tiffany et al., 1993). First relapse time: the first time that the subject use cocaine after the treatment. At each post treatment interview, the subject will report whether he/she used cocaine, and when.

17 Response variables of interest Post-treatment cocaine craving score: desire for using cocaine at this moment, measured by the Tiffany Cocaine Craving Questionnaire-Brief (CCQ-Brief) (Tiffany et al., 1993). First relapse time: the first time that the subject use cocaine after the treatment. At each post treatment interview, the subject will report whether he/she used cocaine, and when. In the second period, urine test was used to check the validity of the self-reported relapse time.

18 Response variables of interest Post-treatment cocaine craving score: desire for using cocaine at this moment, measured by the Tiffany Cocaine Craving Questionnaire-Brief (CCQ-Brief) (Tiffany et al., 1993). First relapse time: the first time that the subject use cocaine after the treatment. At each post treatment interview, the subject will report whether he/she used cocaine, and when. In the second period, urine test was used to check the validity of the self-reported relapse time. The relapse time are interval censored.

19 Assessing baseline cocaine use pattern Upon treatment entry, all subjects were interviewed by well-trained psychologists to collect baseline daily cocaine use history in the 90 days prior to admission, which is documented using a 90-day time-line follow-back (TLFB) Substance Use Calendar.

20 Assessing baseline cocaine use pattern Upon treatment entry, all subjects were interviewed by well-trained psychologists to collect baseline daily cocaine use history in the 90 days prior to admission, which is documented using a 90-day time-line follow-back (TLFB) Substance Use Calendar. It has been shown that the TLFB can provide reliable daily cocaine use data that have high 1) retest reliability, 2) correlation with other cocaine use measures and 3) agreement with collateral informants reports of patients cocaine use as well as results obtained from urine assays

21 Assessing baseline cocaine use pattern Upon treatment entry, all subjects were interviewed by well-trained psychologists to collect baseline daily cocaine use history in the 90 days prior to admission, which is documented using a 90-day time-line follow-back (TLFB) Substance Use Calendar. It has been shown that the TLFB can provide reliable daily cocaine use data that have high 1) retest reliability, 2) correlation with other cocaine use measures and 3) agreement with collateral informants reports of patients cocaine use as well as results obtained from urine assays All research assistants had been trained by PhD level psychologists and had over three-year experience in administration of similar assessments and they were closely supervised when conducting these interviews.

22 Assessing baseline cocaine use pattern All subjects had been informed upfront that all data are coded and confidential that they can not be summoned in court. They were also informed that they would be removed from the study if they were found out not being truthful about their drug use.

23 Baseline usage pattern

24 statistics and measurement error The true covariates are characteristics of the patients long term cocaine use pattern. The surrogates are summary statistics from the 90 day baseline usage trajectory, e.g. average daily use amount, use frequency. These trajectories are assumed to be stationary, and we consider two kinds of trajectories marked point pattern: use time and amount; point pattern: use time only. The point pattern is more reliable because a) people tend to under report the use amount, but there is less incentive to deny a use; b) the subjects use different ways to use cocaine, it is hard to convert use amount into equivalent grams.

25 Difference with the classical measurement error problems The baseline trajectories are subject specific stochastic processes. The estimation error in the summary statistics are heteroscedastic, non-gaussian, and dependent on the true covariate. We only have one realization of the random process. In the joint modeling literature, there is work on using the random slop and intercept of longitudinal processes in a second level regression model. But a strong parametric assumption on the measurement error need to be made. In functional data analysis, characteristics of random curves are used in second level regression (Crainiceanu et al. 2009).

26 Notation and setup Outline Method of moment estimator Subsampling extrapolation (SUBEX) Interval censored failure time Let N i be a stationary stochastic process that has generated the ith cocaine use trajectory over the baseline period (0, τ]. W i = 1 τ τ 0 N i (dt), (1) where Ni is either N i or a different stationary process that is derived from N i. The distribution of N i depends on a set of parameters Λ i, X i = E(W i Λ i ) is the true predictor. W i = X i + U i, σi 2 = var(u i ) might depends on X i and is different across the subjects.

27 Notation and setup (Cont.) Method of moment estimator Subsampling extrapolation (SUBEX) Interval censored failure time Let S(t, p) = (t, t + pτ] be a subinterval, where 0 < p 1 and 0 t (1 p)τ, W i (t, p) is the summary statistics defined on S(t, p), and U i (t, p) = W i (t, p) X i.

28 Notation and setup (Cont.) Method of moment estimator Subsampling extrapolation (SUBEX) Interval censored failure time Let S(t, p) = (t, t + pτ] be a subinterval, where 0 < p 1 and 0 t (1 p)τ, W i (t, p) is the summary statistics defined on S(t, p), and U i (t, p) = W i (t, p) X i. Define σ i 2 = lim τ (τσu 2 i ). Weak Dependency Assumption: We assume that (pτ)var[u i (t, p) Λ i ] = σ 2 i 1 pτ α i + o ( 1 pτ where α i is a constant that is typically nonnegative. ), (2)

29 Notation and setup (Cont.) Method of moment estimator Subsampling extrapolation (SUBEX) Interval censored failure time Let S(t, p) = (t, t + pτ] be a subinterval, where 0 < p 1 and 0 t (1 p)τ, W i (t, p) is the summary statistics defined on S(t, p), and U i (t, p) = W i (t, p) X i. Define σ i 2 = lim τ (τσu 2 i ). Weak Dependency Assumption: We assume that (pτ)var[u i (t, p) Λ i ] = σ 2 i 1 pτ α i + o ( 1 pτ where α i is a constant that is typically nonnegative. By the weak dependence assumption, ), (2) Var[U i (t, p) X i ] σ2 u i p 1 p p 2 τ 2 α i, (3) if pτ is sufficiently large.

30 Method of moment estimator Subsampling extrapolation (SUBEX) Interval censored failure time Method of moment bias correction in linear model We consider Y i = X i β + Z T i η + ɛ i, θ = (β, η). The naive estimator using the surrogate W, [ W ˆθ naive = T W W T Z Z T W Z T Z ] 1 [( W T Z T ) ] Y A 1 B.

31 Method of moment estimator Subsampling extrapolation (SUBEX) Interval censored failure time Method of moment bias correction in linear model We consider Y i = X i β + Z T i η + ɛ i, θ = (β, η). The naive estimator using the surrogate W, [ W ˆθ naive = T W W T Z Z T W Z T Z ] 1 [( W T Z T ) ] Y A 1 B. ( X E(A X, Z) = T X + σ 2 X T Z Z T X Z T Z where σ 2 = E(U T U) = n i=1 σ2 u i. ) ( X, E(B X, Z) = T X X T Z Z T X Z T Z ) θ,

32 Method of moment estimator Subsampling extrapolation (SUBEX) Interval censored failure time Method of moment bias correction in linear model We consider Y i = X i β + Z T i η + ɛ i, θ = (β, η). The naive estimator using the surrogate W, [ W ˆθ naive = T W W T Z Z T W Z T Z ] 1 [( W T Z T ) ] Y A 1 B. ( X E(A X, Z) = T X + σ 2 X T Z Z T X Z T Z ) ( X, E(B X, Z) = T X X T Z Z T X Z T Z where σ 2 = E(U T U) = n i=1 σ2 u i. The method of moment estimator θ mom is obtained by subtracting σ 2 from the first entry of A ) θ,

33 Subsampling estimation of the variance Method of moment estimator Subsampling extrapolation (SUBEX) Interval censored failure time Let t be an arbitrary time point with 0 t < τ kpτ, where k = [1/p]. We then partition the interval (t, t + kpτ] into k nonoverlapping subintervals each of length pτ. We require p 1/2 so that k 2. Let W i (t + jpτ, p) be the summary statistic defined on the (j + 1)th subinterval for the ith trajectory, where j = 0, 1,, k 1. Define σ u 2 i (t, p) = 1 k 1 [W i (t + jpτ, p) W i (t, kp)] 2. k j=0

34 Method of moment estimator Subsampling extrapolation (SUBEX) Interval censored failure time Subsampling estimation of the variance Let t be an arbitrary time point with 0 t < τ kpτ, where k = [1/p]. We then partition the interval (t, t + kpτ] into k nonoverlapping subintervals each of length pτ. We require p 1/2 so that k 2. Let W i (t + jpτ, p) be the summary statistic defined on the (j + 1)th subinterval for the ith trajectory, where j = 0, 1,, k 1. Define σ u 2 i (t, p) = 1 k 1 [W i (t + jpτ, p) W i (t, kp)] 2. k σ 2 u i (p) = j=0 1 τ kpτ τ kpτ 0 σ 2 u i (t, p)dt,

35 Method of moment estimator Subsampling extrapolation (SUBEX) Interval censored failure time Subsampling estimation of the variance (cont.) By condition (1), and define α = n i=1 α i, then [ n ] E σ u 2 i (p) 1 ( 1 1 ) σ 2 1 p k p 2 τ 2 i=1 ( 1 p 1 kp k 2 ) α. (4)

36 Method of moment estimator Subsampling extrapolation (SUBEX) Interval censored failure time Subsampling estimation of the variance (cont.) By condition (1), and define α = n i=1 α i, then [ n ] E σ u 2 i (p) 1 ( 1 1 ) σ 2 1 p k p 2 τ 2 i=1 ( 1 p 1 kp k 2 ) α. (4) Scheme 1: Ignore correlation, and σ 2 = p/(1 1 k ) n i=1 σ2 u i (p).

37 Method of moment estimator Subsampling extrapolation (SUBEX) Interval censored failure time Subsampling estimation of the variance (cont.) By condition (1), and define α = n i=1 α i, then [ n ] E σ u 2 i (p) 1 ( 1 1 ) σ 2 1 p k p 2 τ 2 i=1 ( 1 p 1 kp k 2 ) α. (4) Scheme 1: Ignore correlation, and σ 2 = p/(1 1 k ) n i=1 σ2 u i (p). Scheme 2: Take into account of correlation, let Ỹ (p) = p n i=1 σ2 u i (p) 1 1/k n Ỹ i (p) and X (p) = i=1 1 p (1 kp)/k2 pτ 2. (1 1/k) (5) Regress Ỹ (p) on X (p) at some preselected values (p 1,, p J ) where J 2. The resulting intercept and (minus) slope estimators, are ˆσ 2 and ˆα.

38 Within trajectory dependence Method of moment estimator Subsampling extrapolation (SUBEX) Interval censored failure time Figure: Scatter plot of X (p) and Ỹ (p) for p = k/τ where k = 30, 31,, 40 and τ = 80.

39 Method of moment estimator Subsampling extrapolation (SUBEX) Interval censored failure time Subsampling extrapolation Like SIMEX, we inflate the error in the estimator, find out how the estimator change as the level of error increases. Not knowing the true distribution of measurement error, we use subsampling to create surrogate with inflated error variance, instead of adding simulated Gaussian error.

40 Subsampling extrapolation Method of moment estimator Subsampling extrapolation (SUBEX) Interval censored failure time Like SIMEX, we inflate the error in the estimator, find out how the estimator change as the level of error increases. Not knowing the true distribution of measurement error, we use subsampling to create surrogate with inflated error variance, instead of adding simulated Gaussian error. Let W i (t, p) be the summary statistic defined over the subintervals S(t, p) = (t, t + pτ], Var[W i (t, p) Λ i ] σ2 u i p 1 p ( 1 p 2 τ 2 α i = p 1 p p 2 τ 2 α i σ 2 u i ) σ 2 u i x i (p)σ 2 u i. (6)

41 SUBEX (cont.) Outline Method of moment estimator Subsampling extrapolation (SUBEX) Interval censored failure time Let ˆθ(p; t 1,, t n ) be the estimated regression coefficient based on {Y i, W i (t i, p), Z i }, where t i [0, (1 p)τ], i = 1,, n, without accounting for measurement error. ˆθ(p) = 1 [(1 p)τ] n (1 p)τ 0 (1 p)τ The integral is evaluated by Monte-Carlo. 0 ˆθ(p; t 1,, t n )dt 1 dt n. The procedure is repeated for a set of preselected (p 1,..., p J ). We fit a parametric model for (ˆx(p), ˆθ(p)), e.g. G Q (p, Γ) = γ 1 + ˆx(p)γ 2 + ˆx(p) 2 γ 3, where Γ = (γ 1, γ 2, γ 3 ). Normally, a quadratic or cubic extrapolant function is used. ˆθ subex = G Q (p =, ˆΓ). (7)

42 SUBEX (cont.) Outline Method of moment estimator Subsampling extrapolation (SUBEX) Interval censored failure time The variance inflation factor is x(p) = 1 ( 1 1 p ) α p pτ 2 σ 2. Scheme 1: Ignore correlation, x i (p) 1/p. This is valid if α i /(τ 2 σ 2 u i ) is small and/or pτ is large. Scheme 2: Plug in ˆσ 2 and ˆα from the regression between Ỹ (p) and X (p).

43 Method of moment estimator Subsampling extrapolation (SUBEX) Interval censored failure time Two naive alternatives that ignore within-subject correlation Let W i (0, 1/2) and W i (τ/2, 1/2) be the counterparts of W i based on the data in (0, τ/2] and (τ/2, τ]. Naive MOM: estimate σ 2 by ˆσ 2 naive = n σ u 2 i (1/2) = 1 4 i=1 n [W i (0, 1/2) W i (τ/2, 1/2)] 2. i=1

44 Method of moment estimator Subsampling extrapolation (SUBEX) Interval censored failure time Two naive alternatives that ignore within-subject correlation Let W i (0, 1/2) and W i (τ/2, 1/2) be the counterparts of W i based on the data in (0, τ/2] and (τ/2, τ]. Naive MOM: estimate σ 2 by n ˆσ naive 2 = σ u 2 i (1/2) = 1 n [W i (0, 1/2) W i (τ/2, 1/2)] 2. 4 i=1 i=1 Empirical SIMEX (Devanarayan and Stefanski, 2002): use the pseudo remeasurements ζ W i (ζ) = W i + 2 [W i(0, 1/2) W i (τ/2, 1/2)], for each ζ > 0.

45 Interval censored relapse time Method of moment estimator Subsampling extrapolation (SUBEX) Interval censored failure time We model the first relapse time T i through a Cox proportional hazard model (Cox, 1972), λ(t X i, Z i ) = λ 0 (t) exp(x i β + Z i η). (8)

46 Interval censored relapse time Method of moment estimator Subsampling extrapolation (SUBEX) Interval censored failure time We model the first relapse time T i through a Cox proportional hazard model (Cox, 1972), λ(t X i, Z i ) = λ 0 (t) exp(x i β + Z i η). (8) In our data, about 50.6% of the subjects have observed relapse time, 31.6% are interval censored and 17.8% are right censored.

47 Interval censored relapse time Method of moment estimator Subsampling extrapolation (SUBEX) Interval censored failure time We model the first relapse time T i through a Cox proportional hazard model (Cox, 1972), λ(t X i, Z i ) = λ 0 (t) exp(x i β + Z i η). (8) In our data, about 50.6% of the subjects have observed relapse time, 31.6% are interval censored and 17.8% are right censored. Following Ruppert et al. (2003), Cai and Betensky (2003), ψ(t) = log λ 0 (t) = a 0 + a 1 t + K b k (t κ k ) +, (9) k=1 where x + max(x, 0), and κ k s are the knots.

48 Method of moment estimator Subsampling extrapolation (SUBEX) Interval censored failure time Penalized spline approach of Cai and Betensky (2003) We observe n independent tuples, (Ti l, T i r, δ i), where [Ti l, T i r ] gives the censoring interval, δ i is the indicator for right censoring. When δ i = 0 and T l i when δ i = 1 and T l i when δ i = 1 and T l i = T r i, the event time T i is right censored at T r < Ti r, T i is interval censored within [Ti l, T i r ]; = Ti r, T i is observed at Ti r. In addition, let δ 0i = Ti r ). be the indicator for observed data, i.e. δ 0i = I (δ i = 1, T l i i ;

49 Method of moment estimator Subsampling extrapolation (SUBEX) Interval censored failure time Penalized spline approach of Cai and Betensky (2003) We observe n independent tuples, (Ti l, T i r, δ i), where [Ti l, T i r ] gives the censoring interval, δ i is the indicator for right censoring. When δ i = 0 and T l i when δ i = 1 and T l i when δ i = 1 and T l i = T r i, the event time T i is right censored at T r < Ti r, T i is interval censored within [Ti l, T i r ]; = Ti r, T i is observed at Ti r. In addition, let δ 0i = Ti r ). be the indicator for observed data, i.e. δ 0i = I (δ i = 1, T l i Denote X = (X T, Z T ) T and Λ 0 (t) = t 0 λ 0(u)du. l(θ) = n i=1 δ 0i {log λ 0 (T r i ) + XT i θ} exp(x T i θ)λ 0 (T r i ) +δ i (1 δ 0i ) log where Θ is the collection of all parameters. ( ) exp[{λ 0 (Ti r ) Λ 0(Ti l )} exp(xt i θ)] 1,(10) i ;

50 Method of moment estimator Subsampling extrapolation (SUBEX) Interval censored failure time Penalized spline approach of Cai and Betensky (Cont.) The spline coefficients b are modeled as random effects with the joint distribution Normal(0, σb 2 I ). The joint likelihood is proportional to l p (Θ) = l(θ) K 2 log(σ2 b) 1 2σb 2 b T b. (11) Θ = (θ T, a T, b T ) T is estimated by maximizing the penalized likelihood (11). The tuning parameter σb 2 is chosen by maximizing the marginal likelihood using Laplace approximation (Breslow and Clayton, 1993). When X is measured with error, SUBEX method is applicable.

51 Simulation 1: Linear model Let N i ( ) be a stationary point process with intensity X i, where X i 1+ Log Normal (0, 0.5); Z i is an error-free Bernoulli random variable independent of X i, with P(Z i = 1) = 0.5. Y i = θ 0 + θ 1 X i + θ 2 Z i + ɛ i, i = 1, 2,, n, where θ = (θ 0, θ 1, θ 2 ) T = (1, 1, 1) T and ɛ i Normal(0, ).

52 Simulation 1: Linear model Let N i ( ) be a stationary point process with intensity X i, where X i 1+ Log Normal (0, 0.5); Z i is an error-free Bernoulli random variable independent of X i, with P(Z i = 1) = 0.5. Y i = θ 0 + θ 1 X i + θ 2 Z i + ɛ i, i = 1, 2,, n, where θ = (θ 0, θ 1, θ 2 ) T = (1, 1, 1) T and ɛ i Normal(0, ). We assume that X i is unknown but can be estimated by W i = N i ((0, τ])/τ, where N i ((0, τ]) denotes the number of events of N i contained in the time interval (0, τ].

53 Simulation 1: Linear model (Cont.) We consider two scenarios for N i. Scenario 1: N i is a homogeneous Poisson process with intensity X i. Scenario 2: N i is a homogeneous Poisson cluster process: we generate a homogeneous Poisson process with intensity ρ i = X i /3 as the parent process; each parent generates m = 3 children on average according to a Poisson distribution, and disperse the children locations independently following a normal distribution centered at the parent location and with standard deviation ω = 2. The final process contains only the children event times.

54 Simulation 1 (Cont.) We set n = 200 and repeat the simulation 200 times in each case. We apply the naive estimator, the empirical SIMEX method, two versions of Method of Moment estimator and two versions of SUBEX (MOM 1 and SUBEX 1 ignore within-trajectory correlation, MOM 2 and SUBEX 2 take into account of the correlation). For SUBEX, we set the p values to be from 0.6 to 0.95 with an increment of 0.05, and use a quadratic function for the extrapolation.

55 Simulation results for linear model Scenario 1: Poisson process, τ = 10 θ 1 θ 2 Bias SE Rel. Eff. Bias SE Rel. Eff. Naive MOM MOM SUBEX (.1334) (.1308).5084 SUBEX (.1317) (.1297).5158 SIMEX (.0896) (.0913).9237 Scenario 2: Poisson cluster process θ 1, τ = 10 θ 1, τ = 20 Bias SE Rel. Eff. Bias SE Rel. Eff. Naive MOM MOM SUBEX (.1227) (.1463) SUBEX (.1556) (.1692) SIMEX (.0802) (.0913)

56 Simulation 2: interval-censored failure time data We simulate X i, N i and Z i as in Simulation 1, and simulate the failure time T i through a Cox model λ(t X i, Z i ) = λ 0 (t) exp(x i β + Z i η), (12) where λ 0 (t) = t and (β, η) T = (1, 1) T.

57 Simulation 2: interval-censored failure time data We simulate X i, N i and Z i as in Simulation 1, and simulate the failure time T i through a Cox model λ(t X i, Z i ) = λ 0 (t) exp(x i β + Z i η), (12) where λ 0 (t) = t and (β, η) T = (1, 1) T. We assume censoring at random and set the censoring times to be (0.2, 0.5, 1). Let the censoring indicator δ i be a binary variable independent of X i and Z i, with P(δ i = 1) = 0.5. When δ i = 1, the event time T i is censored in the interval between the two censoring times closest to T i ; if T i is less than 0.2, it is censored in [Ti l = 0, Ti r = 0.2]; if an event time is over 1, it is automatically right censored at 1.

58 Simulation 2: interval-censored failure time data We simulate X i, N i and Z i as in Simulation 1, and simulate the failure time T i through a Cox model λ(t X i, Z i ) = λ 0 (t) exp(x i β + Z i η), (12) where λ 0 (t) = t and (β, η) T = (1, 1) T. We assume censoring at random and set the censoring times to be (0.2, 0.5, 1). Let the censoring indicator δ i be a binary variable independent of X i and Z i, with P(δ i = 1) = 0.5. When δ i = 1, the event time T i is censored in the interval between the two censoring times closest to T i ; if T i is less than 0.2, it is censored in [Ti l = 0, Ti r = 0.2]; if an event time is over 1, it is automatically right censored at 1. Overall, about 12% of the observations are right censored, 43% are interval censored, and the rest 45% are observed.

59 Simulation results in the interval-censored failure time data Poisson, τ = 10, n = 200 Poisson cluster, τ = 10, n = 200 Bias SE Rel. Eff. Bias SE Rel. Eff. No Error (.1469) (.1462) Naive (.1129) (.0800) SUBEX (.2587) (.2016) SUBEX (.2570) (.2590) SIMEX (.1732) (.1237) Poisson cluster, τ = 20, n = 200 Poisson cluster, τ = 20, n = 400 Bias SE Rel. Eff. Bias SE Rel. Eff. No Error (.1465) (.1015) Naive (.0971) (.0695) SUBEX (.2602) (.1843) SUBEX (.3052) (.2178) SIMEX (.1558) (.1107)

60 Results for the analysis of the CCQ-Brief score data NAIVE MOM 1 SIMEX MOM 2 SUBEX 1 SUBEX 2 frequency.476 (.176).529 (.195).537 (.223).565 (.211).577 (.204).662 (.257) indicator (.062) (.062) (.064) (.062) (.064) (.065) gender (.093) (.093) (.091) (.093) (.092) (.095) race.305 (.099).306 (.100).311 (.101).307 (.100).289 (.100).271 (.103) age (.081) (.082) (.081) (.083) (.081) (.083) cocyrs (.087) (.087) (.085) (.088) (.083) (.086) curanxs.139 (.085).142 (.085).134 (.079).145 (.086).163 (.084).178 (.086)

61 Results for the analysis of the time to first relapse data NAIVE SIMEX SUBEX 1 SUBEX 2 cocuse (.081) (.096) (.095).017 (.206) gender (.299) (.301) (.319) (.347) race (.274) (.277) (.272) (.276) age (.024) (.024) (.024) (.024) cocyrs.110 (.028).111 (.028).111 (.028).109 (.029) curanxs.279 (.221).275 (.224).308 (.218).352 (.232)

62 In many scientific problems in particular substance use research, it is common to have summary statistics derived from some stochastic processes as covariates. The estimation error in these summary statistics causes estimation bias in the regression coefficients like in classical measurement error problems. We propose a new method-of-moment approach for linear models and a subsampling extrapolation method that is generally applicable to both linear and nonlinear models. The proposed methods are based on novel subsampling techniques that take into account of the correlation within individual processes, and have shown good performance in both simulation and real data analysis.

arxiv: v1 [stat.ap] 21 Aug 2015

arxiv: v1 [stat.ap] 21 Aug 2015 Submitted to the Annals of Applied Statistics arxiv: arxiv:. JOINT MODELING OF LONGITUDINAL DRUG USING PATTERN AND TIME TO FIRST RELAPSE IN COCAINE DEPENDENCE TREATMENT DATA arxiv:158.5412v1 [stat.ap]