On Measurement Error Problems with Predictors Derived from Stationary Stochastic Processes and Application to Cocaine Dependence Treatment Data


Yehua Li, Department of Statistics, University of Georgia
Yongtao Guan, Rajita Sinha, Yale University

Outline

Method of moment estimator
Subsampling extrapolation (SUBEX)
Interval censored failure time

Cocaine dependence

Cocaine dependence remains a serious public health problem in the US, with more than one million individuals meeting criteria for current dependence (SAMHSA, 2004).

Cocaine abuse is a serious concern to society because of its association with criminal activity and the high cost of treating the health problems related to it.

Despite the availability of efficacious behavioral treatments, relapse rates remain high (Sinha, 2001, 2007).

Cocaine relapse and baseline usage behavior

Previous studies have shown that a person's baseline cocaine use behavior is predictive of cocaine relapse, along with many other risk factors such as age, gender, cocaine withdrawal severity, stress and negative mood.

To describe the baseline cocaine use behavior, daily cocaine use trajectory data are collected over a short period prior to treatment.

Summary statistics derived from these trajectories are then used as predictors in a subsequent analysis to explain cocaine craving and cocaine relapse.

The data

The research was conducted at the Clinical Neuroscience Research Unit (CNRU) of the Connecticut Mental Health Center.

The study was conducted in two separate periods, with 59 subjects enrolling in the first period and 83 in the second.

At baseline, demographic variables (e.g. age, gender, race, years of cocaine use, anxiety level) and the daily cocaine use history in the 90 days prior to admission were collected.

The 59 subjects in the first period were interviewed 90 days after treatment.

The 83 subjects in the second period were interviewed 14, 30, 90 and 180 days after treatment. Urine screening was conducted at these interviews to ensure the accuracy of the first relapse time.

Response variables of interest

Post-treatment cocaine craving score: the desire to use cocaine at this moment, measured by the Tiffany Cocaine Craving Questionnaire-Brief (CCQ-Brief) (Tiffany et al., 1993).

First relapse time: the first time that the subject uses cocaine after the treatment.

At each post-treatment interview, the subject reports whether he/she used cocaine, and when.

In the second period, a urine test was used to check the validity of the self-reported relapse time.

The relapse times are interval censored.

Assessing baseline cocaine use pattern

Upon treatment entry, all subjects were interviewed by well-trained psychologists to collect the baseline daily cocaine use history in the 90 days prior to admission, which was documented using a 90-day time-line follow-back (TLFB) Substance Use Calendar.

It has been shown that the TLFB can provide reliable daily cocaine use data that have high 1) retest reliability, 2) correlation with other cocaine use measures and 3) agreement with collateral informants' reports of patients' cocaine use as well as with results obtained from urine assays.

All research assistants had been trained by PhD-level psychologists, had over three years of experience in administering similar assessments, and were closely supervised when conducting these interviews.

All subjects had been informed up front that all data are coded and confidential and cannot be subpoenaed in court. They were also informed that they would be removed from the study if they were found to be untruthful about their drug use.

Figure: Baseline usage pattern.

Summary statistics and measurement error

The true covariates are characteristics of the patients' long-term cocaine use pattern.

The surrogates are summary statistics from the 90-day baseline usage trajectory, e.g. average daily use amount and use frequency.

These trajectories are assumed to be stationary, and we consider two kinds of trajectories:
marked point pattern: use time and amount;
point pattern: use time only.

The point pattern is more reliable because a) people tend to under-report the use amount, but there is less incentive to deny a use; b) subjects take cocaine in different ways, so it is hard to convert use amounts into equivalent grams.

Difference with the classical measurement error problems

The baseline trajectories are subject-specific stochastic processes.

The estimation errors in the summary statistics are heteroscedastic, non-Gaussian, and dependent on the true covariate.

We only have one realization of the random process.

In the joint modeling literature, there is work on using the random slope and intercept of longitudinal processes in a second-level regression model, but a strong parametric assumption on the measurement error needs to be made.

In functional data analysis, characteristics of random curves are used in second-level regression (Crainiceanu et al. 2009).

Notation and setup

Let $N_i$ be a stationary stochastic process that has generated the $i$th cocaine use trajectory over the baseline period $(0, \tau]$. Let

$$W_i = \frac{1}{\tau} \int_0^{\tau} N_i^{*}(dt), \qquad (1)$$

where $N_i^{*}$ is either $N_i$ or a different stationary process that is derived from $N_i$.

The distribution of $N_i$ depends on a set of parameters $\Lambda_i$, and $X_i = E(W_i \mid \Lambda_i)$ is the true predictor.

$W_i = X_i + U_i$, where $\sigma_{u_i}^2 = \mathrm{var}(U_i)$ might depend on $X_i$ and differs across subjects.

Notation and setup (Cont.)

Let $S(t, p) = (t, t + p\tau]$ be a subinterval, where $0 < p \le 1$ and $0 \le t \le (1 - p)\tau$; let $W_i(t, p)$ be the summary statistic defined on $S(t, p)$, and $U_i(t, p) = W_i(t, p) - X_i$.

Define $\sigma_i^2 = \lim_{\tau \to \infty} (\tau \sigma_{u_i}^2)$.

Weak Dependency Assumption: We assume that

$$(p\tau)\, \mathrm{var}[U_i(t, p) \mid \Lambda_i] = \sigma_i^2 - \frac{1}{p\tau} \alpha_i + o\!\left(\frac{1}{p\tau}\right), \qquad (2)$$

where $\alpha_i$ is a constant that is typically nonnegative.

By the weak dependence assumption,

$$\mathrm{Var}[U_i(t, p) \mid X_i] \approx \frac{\sigma_{u_i}^2}{p} - \frac{1 - p}{p^2 \tau^2} \alpha_i, \qquad (3)$$

if $p\tau$ is sufficiently large. (Setting $p = 1$ in (2) gives $\sigma_{u_i}^2 \approx \sigma_i^2/\tau - \alpha_i/\tau^2$; substituting this back into (2) to eliminate $\sigma_i^2$ yields (3).)
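
To make the notation concrete, here is a minimal Python sketch of the subinterval statistic $W_i(t, p)$ for the point-pattern case, where the statistic is taken to be the number of use events in $S(t, p)$ divided by its length $p\tau$. The function name `window_rate` and the example use days are hypothetical and serve only as an illustration.

```python
import numpy as np

def window_rate(event_times, t, p, tau):
    """W_i(t, p): events per unit time in S(t, p) = (t, t + p*tau]."""
    if not (0 < p <= 1 and 0 <= t <= (1 - p) * tau + 1e-12):
        raise ValueError("need 0 < p <= 1 and 0 <= t <= (1 - p) * tau")
    events = np.sort(np.asarray(event_times, dtype=float))
    # (# events <= right end) - (# events <= left end) counts the events
    # that fall in the half-open interval (t, t + p*tau].
    right = np.searchsorted(events, t + p * tau, side="right")
    left = np.searchsorted(events, t, side="right")
    return (right - left) / (p * tau)

# Hypothetical use days within a 90-day baseline period.
use_days = [2, 3, 10, 11, 12, 40, 41, 75]
W_full = window_rate(use_days, t=0, p=1.0, tau=90)    # whole period (0, 90]
W_first = window_rate(use_days, t=0, p=0.5, tau=90)   # first half (0, 45]
W_second = window_rate(use_days, t=45, p=0.5, tau=90) # second half (45, 90]
```

With $p = 1$ the function returns the full-period surrogate $W_i$; smaller $p$ gives the noisier subinterval versions used in the subsampling steps.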

Method of moment bias correction in linear model

We consider

$$Y_i = X_i \beta + Z_i^T \eta + \epsilon_i, \qquad \theta = (\beta, \eta^T)^T.$$

The naive estimator using the surrogate $W$ is

$$\hat\theta_{naive} = \begin{pmatrix} W^T W & W^T Z \\ Z^T W & Z^T Z \end{pmatrix}^{-1} \begin{pmatrix} W^T \\ Z^T \end{pmatrix} Y \equiv A^{-1} B.$$

$$E(A \mid X, Z) = \begin{pmatrix} X^T X + \sigma^2 & X^T Z \\ Z^T X & Z^T Z \end{pmatrix}, \qquad E(B \mid X, Z) = \begin{pmatrix} X^T X & X^T Z \\ Z^T X & Z^T Z \end{pmatrix} \theta,$$

where $\sigma^2 = E(U^T U) = \sum_{i=1}^n \sigma_{u_i}^2$.

The method of moment estimator $\hat\theta_{mom}$ is obtained by subtracting $\sigma^2$ from the first entry of $A$.
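
The correction above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: the function name `mom_estimator` is hypothetical, and the simulated data use a homoscedastic Gaussian error with known variance purely so that the effect of the correction is visible.

```python
import numpy as np

def mom_estimator(W, Z, Y, sigma2_hat):
    """Method-of-moment corrected least squares for Y = X*beta + Z'eta + eps.

    W          : (n,) surrogate for the scalar predictor X
    Z          : (n, q) error-free covariates (add a column of ones for an intercept)
    sigma2_hat : estimate of sigma^2 = sum_i var(U_i); 0 gives the naive estimator
    """
    W = np.asarray(W, dtype=float).reshape(-1, 1)
    D = np.hstack([W, np.asarray(Z, dtype=float)])  # design matrix (W, Z)
    A = D.T @ D                                     # [[W'W, W'Z], [Z'W, Z'Z]]
    B = D.T @ np.asarray(Y, dtype=float)            # (W'Y, Z'Y)
    A[0, 0] -= sigma2_hat                           # remove the error part of W'W
    return np.linalg.solve(A, B)                    # (beta, eta)

# Illustration on simulated data (all settings below are assumptions).
rng = np.random.default_rng(0)
n = 5000
X = 1 + rng.lognormal(0.0, 0.5, n)
Z = np.column_stack([np.ones(n), rng.integers(0, 2, n)])
U = rng.normal(0.0, 0.5, n)                         # error with variance 0.25
Y = X + Z @ np.array([1.0, 1.0]) + rng.normal(0.0, 0.5, n)
theta_naive = mom_estimator(X + U, Z, Y, 0.0)       # attenuated slope
theta_mom = mom_estimator(X + U, Z, Y, n * 0.25)    # sigma^2 known only in simulation
```

In practice $\sigma^2$ is unknown and has to be replaced by an estimate, which is what the subsampling schemes provide.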

Subsampling estimation of the variance

Let $t$ be an arbitrary time point with $0 \le t < \tau - kp\tau$, where $k = \lfloor 1/p \rfloor$. We then partition the interval $(t, t + kp\tau]$ into $k$ nonoverlapping subintervals, each of length $p\tau$. We require $p \le 1/2$ so that $k \ge 2$.

Let $W_i(t + jp\tau, p)$ be the summary statistic defined on the $(j + 1)$th subinterval for the $i$th trajectory, where $j = 0, 1, \ldots, k - 1$. Define

$$\tilde\sigma_{u_i}^2(t, p) = \frac{1}{k} \sum_{j=0}^{k-1} [W_i(t + jp\tau, p) - W_i(t, kp)]^2,$$

$$\tilde\sigma_{u_i}^2(p) = \frac{1}{\tau - kp\tau} \int_0^{\tau - kp\tau} \tilde\sigma_{u_i}^2(t, p)\, dt.$$
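
A possible implementation of the subsampling variance for a single trajectory is sketched below for the event-rate statistic. The names `count_in` and `subsample_variance` are hypothetical, and the integral over the start point $t$ is approximated by an average over an equally spaced grid, which is an assumption of this sketch rather than something specified in the slides.

```python
import numpy as np

def count_in(events, a, b):
    """Number of (sorted) events in the half-open interval (a, b]."""
    return (np.searchsorted(events, b, side="right")
            - np.searchsorted(events, a, side="right"))

def subsample_variance(event_times, p, tau, n_grid=50):
    """Approximate the subsampling variance of the rate statistic at fraction p."""
    if not 0 < p <= 0.5:
        raise ValueError("need 0 < p <= 1/2 so that k >= 2")
    events = np.sort(np.asarray(event_times, dtype=float))
    k = int(np.floor(1.0 / p))
    width = p * tau
    slack = max(tau - k * width, 0.0)  # range available for the start point t
    # Grid average stands in for the integral over t; one point if slack is 0.
    starts = np.linspace(0.0, slack, n_grid) if slack > 0 else np.array([0.0])
    values = []
    for t in starts:
        edges = t + width * np.arange(k + 1)
        rates = np.array([count_in(events, edges[j], edges[j + 1])
                          for j in range(k)]) / width
        # rates.mean() equals W_i(t, kp) because the k blocks have equal length.
        values.append(np.mean((rates - rates.mean()) ** 2))
    return float(np.mean(values))

# Three events in (0, 5], none in (5, 10]: the two half-period rates are 0.6 and 0.
example = subsample_variance([1, 2, 3], p=0.5, tau=10)
```

For $p = 1/2$ the result reduces to one quarter of the squared difference between the two half-period statistics.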

Subsampling estimation of the variance (cont.)

By the weak dependence assumption (2), and defining $\alpha = \sum_{i=1}^n \alpha_i$,

$$E\left[\sum_{i=1}^n \tilde\sigma_{u_i}^2(p)\right] \approx \frac{1}{p}\left(1 - \frac{1}{k}\right)\sigma^2 - \frac{1}{p^2\tau^2}\left(1 - p - \frac{1 - kp}{k^2}\right)\alpha. \qquad (4)$$

Scheme 1: Ignore the correlation, and set $\hat\sigma^2 = \frac{p}{1 - 1/k} \sum_{i=1}^n \tilde\sigma_{u_i}^2(p)$.

Scheme 2: Take the correlation into account. Let

$$\tilde Y(p) = \frac{p \sum_{i=1}^n \tilde\sigma_{u_i}^2(p)}{1 - 1/k} \equiv \sum_{i=1}^n \tilde Y_i(p) \quad \text{and} \quad \tilde X(p) = \frac{1 - p - (1 - kp)/k^2}{p\tau^2 (1 - 1/k)}. \qquad (5)$$

Regress $\tilde Y(p)$ on $\tilde X(p)$ at some preselected values $(p_1, \ldots, p_J)$, where $J \ge 2$. The resulting intercept and (minus) slope estimators are $\hat\sigma^2$ and $\hat\alpha$.
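
The Scheme 2 regression can be sketched as follows, taking (5) to mean that $E[\tilde Y(p)] \approx \sigma^2 - \alpha \tilde X(p)$. The helper names `x_tilde`, `y_tilde` and `fit_sigma2_alpha` are hypothetical, and ordinary least squares via `np.polyfit` is an assumed choice of fitting routine.

```python
import numpy as np

def x_tilde(p, tau):
    """Regressor from (5), with k = floor(1/p)."""
    k = np.floor(1.0 / p)
    return (1 - p - (1 - k * p) / k**2) / (p * tau**2 * (1 - 1 / k))

def y_tilde(sum_sub_var, p):
    """Response from (5); sum_sub_var is the sum over subjects of the
    subsampling variances at fraction p."""
    k = np.floor(1.0 / p)
    return p * sum_sub_var / (1 - 1 / k)

def fit_sigma2_alpha(p_values, y_values, tau):
    """Fit the line Y = sigma^2 - alpha * X; return (sigma2_hat, alpha_hat)."""
    x = np.array([x_tilde(p, tau) for p in p_values])
    slope, intercept = np.polyfit(x, np.asarray(y_values, dtype=float), 1)
    return intercept, -slope

# Synthetic check: responses lying exactly on a line with sigma^2 = 5, alpha = 40.
p_values = [0.30, 0.35, 0.40, 0.45]
y_values = [5.0 - 40.0 * x_tilde(p, 80.0) for p in p_values]
sigma2_hat, alpha_hat = fit_sigma2_alpha(p_values, y_values, 80.0)
```

Scheme 1 corresponds to evaluating `y_tilde` at a single $p$ and using it directly as $\hat\sigma^2$.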

Within trajectory dependence

Figure: Scatter plot of $\tilde X(p)$ and $\tilde Y(p)$ for $p = k/\tau$, where $k = 30, 31, \ldots, 40$ and $\tau = 80$.

Subsampling extrapolation

Like SIMEX, we inflate the error and find out how the estimator changes as the level of error increases.

Not knowing the true distribution of the measurement error, we use subsampling to create surrogates with inflated error variance, instead of adding simulated Gaussian error.

Let $W_i(t, p)$ be the summary statistic defined over the subinterval $S(t, p) = (t, t + p\tau]$. Then

$$\mathrm{Var}[W_i(t, p) \mid \Lambda_i] \approx \frac{\sigma_{u_i}^2}{p} - \frac{1 - p}{p^2\tau^2}\alpha_i = \left(\frac{1}{p} - \frac{1 - p}{p^2\tau^2} \frac{\alpha_i}{\sigma_{u_i}^2}\right)\sigma_{u_i}^2 \equiv x_i(p)\,\sigma_{u_i}^2. \qquad (6)$$

SUBEX (cont.)

Let $\hat\theta(p; t_1, \ldots, t_n)$ be the estimated regression coefficient based on $\{Y_i, W_i(t_i, p), Z_i\}$, where $t_i \in [0, (1 - p)\tau]$, $i = 1, \ldots, n$, without accounting for measurement error. Define

$$\hat\theta(p) = \frac{1}{[(1 - p)\tau]^n} \int_0^{(1-p)\tau} \cdots \int_0^{(1-p)\tau} \hat\theta(p; t_1, \ldots, t_n)\, dt_1 \cdots dt_n.$$

The integral is evaluated by Monte Carlo.

The procedure is repeated for a set of preselected $(p_1, \ldots, p_J)$.

We fit a parametric model for $(\hat x(p), \hat\theta(p))$, e.g.

$$G_Q(p, \Gamma) = \gamma_1 + \hat x(p)\gamma_2 + \hat x(p)^2\gamma_3,$$

where $\Gamma = (\gamma_1, \gamma_2, \gamma_3)$. Normally, a quadratic or cubic extrapolant function is used.

$$\hat\theta_{subex} = G_Q(p = \infty, \hat\Gamma). \qquad (7)$$

SUBEX (cont.)

The variance inflation factor is

$$x(p) = \frac{1}{p}\left(1 - \frac{1 - p}{p\tau^2} \frac{\alpha}{\sigma^2}\right).$$

Scheme 1: Ignore the correlation, $x_i(p) \approx 1/p$. This is valid if $\alpha_i / (\tau^2 \sigma_{u_i}^2)$ is small and/or $p\tau$ is large.

Scheme 2: Plug in $\hat\sigma^2$ and $\hat\alpha$ from the regression of $\tilde Y(p)$ on $\tilde X(p)$.
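
The extrapolation step can be illustrated with a short sketch. It assumes that the SUBEX estimate is the fitted extrapolant evaluated at zero variance inflation, $x = 0$, which is the limit of $x(p)$ as $p$ grows without bound. The function names `inflation_factor` and `subex_extrapolate` are hypothetical, and the curve being extrapolated is synthetic rather than the output of an actual subsampling run.

```python
import numpy as np

def inflation_factor(p, tau=None, alpha_hat=0.0, sigma2_hat=1.0):
    """x(p) = (1/p) * (1 - (1 - p) / (p * tau^2) * alpha / sigma^2).

    alpha_hat = 0 gives Scheme 1, x(p) = 1/p.
    """
    p = np.asarray(p, dtype=float)
    if alpha_hat == 0.0:
        return 1.0 / p
    return (1.0 / p) * (1.0 - (1.0 - p) / (p * tau**2) * alpha_hat / sigma2_hat)

def subex_extrapolate(x_hat, theta_hat, degree=2):
    """Fit a polynomial in x_hat to theta_hat and evaluate it at x = 0."""
    coefs = np.polyfit(np.asarray(x_hat, dtype=float),
                       np.asarray(theta_hat, dtype=float), degree)
    return float(np.polyval(coefs, 0.0))

# Synthetic illustration: naive estimates that follow an exact quadratic in x.
p_grid = np.linspace(0.6, 0.95, 8)          # 0.60, 0.65, ..., 0.95
x_hat = inflation_factor(p_grid)            # Scheme 1
theta_curve = 1.0 - 0.3 * x_hat + 0.02 * x_hat**2
theta_subex = subex_extrapolate(x_hat, theta_curve)
```

With real data, `theta_curve` would be replaced by the Monte Carlo averages $\hat\theta(p_j)$, one coefficient at a time.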

Two naive alternatives that ignore within-subject correlation

Let $W_i(0, 1/2)$ and $W_i(\tau/2, 1/2)$ be the counterparts of $W_i$ based on the data in $(0, \tau/2]$ and $(\tau/2, \tau]$.

Naive MOM: estimate $\sigma^2$ by

$$\hat\sigma_{naive}^2 = \sum_{i=1}^n \tilde\sigma_{u_i}^2(1/2) = \frac{1}{4} \sum_{i=1}^n [W_i(0, 1/2) - W_i(\tau/2, 1/2)]^2.$$

Empirical SIMEX (Devanarayan and Stefanski, 2002): use the pseudo remeasurements

$$W_i(\zeta) = W_i + \frac{\sqrt{\zeta}}{2} [W_i(0, 1/2) - W_i(\tau/2, 1/2)], \quad \text{for each } \zeta > 0.$$
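
Both naive alternatives only need the two half-period statistics, as the following sketch shows. The function names are hypothetical, and the pseudo remeasurement assumes the scaling factor $\sqrt{\zeta}/2$, under which the added error variance is $\zeta \sigma_{u_i}^2$ when the two halves are uncorrelated.

```python
import numpy as np

def naive_sigma2(W_first, W_second):
    """Naive MOM estimate of sigma^2 from the two half-period statistics."""
    diff = np.asarray(W_first, dtype=float) - np.asarray(W_second, dtype=float)
    return 0.25 * float(np.sum(diff**2))

def pseudo_remeasure(W, W_first, W_second, zeta):
    """Empirical-SIMEX style pseudo remeasurement with added error level zeta."""
    diff = np.asarray(W_first, dtype=float) - np.asarray(W_second, dtype=float)
    return np.asarray(W, dtype=float) + np.sqrt(zeta) / 2.0 * diff

# Two hypothetical subjects.
W_a = np.array([0.6, 0.2])   # first-half rates
W_b = np.array([0.0, 0.4])   # second-half rates
W_all = (W_a + W_b) / 2      # full-period rates (equal-length halves)
s2_naive = naive_sigma2(W_a, W_b)
W_zeta = pseudo_remeasure(W_all, W_a, W_b, zeta=4.0)
```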

Interval censored relapse time

We model the first relapse time $T_i$ through a Cox proportional hazard model (Cox, 1972),

$$\lambda(t \mid X_i, Z_i) = \lambda_0(t) \exp(X_i\beta + Z_i\eta). \qquad (8)$$

In our data, about 50.6% of the subjects have an observed relapse time, 31.6% are interval censored and 17.8% are right censored.

Following Ruppert et al. (2003) and Cai and Betensky (2003),

$$\psi(t) = \log \lambda_0(t) = a_0 + a_1 t + \sum_{k=1}^K b_k (t - \kappa_k)_+, \qquad (9)$$

where $x_+ \equiv \max(x, 0)$ and the $\kappa_k$'s are the knots.
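
The linear spline in (9) is straightforward to evaluate. The sketch below uses a hypothetical function name and made-up coefficients and knots, purely to show how the truncated-line basis $(t - \kappa_k)_+$ enters.

```python
import numpy as np

def log_baseline_hazard(t, a0, a1, b, knots):
    """psi(t) = a0 + a1*t + sum_k b_k * max(t - kappa_k, 0)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    knots = np.asarray(knots, dtype=float)
    # Truncated-line basis: one column per knot, zero to the left of the knot.
    basis = np.maximum(t[:, None] - knots[None, :], 0.0)
    return a0 + a1 * t + basis @ np.asarray(b, dtype=float)

# Hypothetical coefficients and knots at days 30 and 60.
psi = log_baseline_hazard([0.0, 45.0, 90.0], a0=-3.0, a1=0.01,
                          b=[0.1, -0.2], knots=[30.0, 60.0])
```

Each coefficient $b_k$ changes the slope of $\psi$ at the knot $\kappa_k$, so the baseline log hazard is piecewise linear and continuous.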

Penalized spline approach of Cai and Betensky (2003)

We observe $n$ independent tuples $(T_i^l, T_i^r, \delta_i)$, where $[T_i^l, T_i^r]$ gives the censoring interval and $\delta_i$ is the indicator for right censoring.

When $\delta_i = 0$ and $T_i^l = T_i^r$, the event time $T_i$ is right censored at $T_i^r$;
when $\delta_i = 1$ and $T_i^l < T_i^r$, $T_i$ is interval censored within $[T_i^l, T_i^r]$;
when $\delta_i = 1$ and $T_i^l = T_i^r$, $T_i$ is observed at $T_i^r$.

In addition, let $\delta_{0i}$ be the indicator for observed data, i.e. $\delta_{0i} = I(\delta_i = 1, T_i^l = T_i^r)$.

Denote $\tilde X = (X^T, Z^T)^T$ and $\Lambda_0(t) = \int_0^t \lambda_0(u)\, du$. Then

$$l(\Theta) = \sum_{i=1}^n \Big[ \delta_{0i}\{\log \lambda_0(T_i^r) + \tilde X_i^T\theta\} - \exp(\tilde X_i^T\theta)\Lambda_0(T_i^r) + \delta_i(1 - \delta_{0i}) \log\big( \exp[\{\Lambda_0(T_i^r) - \Lambda_0(T_i^l)\}\exp(\tilde X_i^T\theta)] - 1 \big) \Big], \qquad (10)$$

where $\Theta$ is the collection of all parameters.
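
A direct transcription of the log-likelihood (10) is sketched below. The function name and argument layout are hypothetical; the baseline hazard is passed in as two callables so that the sketch does not depend on the spline representation, and the check uses a constant baseline hazard $\lambda_0 \equiv 1$ as an assumed toy case.

```python
import numpy as np

def log_likelihood(T_left, T_right, delta, lin_pred, log_lam0, Lam0):
    """Log-likelihood (10) for exact, right-censored and interval-censored times.

    delta    : 0 if right censored at T_right, 1 otherwise
    lin_pred : linear predictor for each subject
    log_lam0 : callable, log baseline hazard
    Lam0     : callable, cumulative baseline hazard
    """
    Tl = np.asarray(T_left, dtype=float)
    Tr = np.asarray(T_right, dtype=float)
    delta = np.asarray(delta, dtype=int)
    eta = np.asarray(lin_pred, dtype=float)
    exact = (delta == 1) & (Tl == Tr)        # delta_0i
    interval = (delta == 1) & (Tl < Tr)      # delta_i * (1 - delta_0i)
    risk = np.exp(eta)
    ll = -risk * Lam0(Tr)                    # term shared by all three cases
    ll = ll + np.where(exact, log_lam0(Tr) + eta, 0.0)
    # Placeholder value 1.0 avoids log(0) for rows that are not interval censored.
    gap = np.where(interval, (Lam0(Tr) - Lam0(Tl)) * risk, 1.0)
    ll = ll + np.where(interval, np.log(np.expm1(gap)), 0.0)
    return float(np.sum(ll))

# Toy check with lambda_0(t) = 1: one exact, one right-censored, one interval-censored subject.
ll_toy = log_likelihood(
    T_left=[1.0, 2.0, 1.0], T_right=[1.0, 2.0, 2.0], delta=[1, 0, 1],
    lin_pred=[0.0, 0.0, 0.0],
    log_lam0=lambda t: np.zeros_like(t), Lam0=lambda t: t,
)
```

In the toy case the three contributions are $\log e^{-1}$, $\log e^{-2}$ and $\log(e^{-1} - e^{-2})$, which matches the usual density, survival and interval-probability terms.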

Penalized spline approach of Cai and Betensky (Cont.)

The spline coefficients $b$ are modeled as random effects with the joint distribution $\mathrm{Normal}(0, \sigma_b^2 I)$.

Up to an additive constant, the joint log-likelihood is

$$l_p(\Theta) = l(\Theta) - \frac{K}{2}\log(\sigma_b^2) - \frac{1}{2\sigma_b^2} b^T b. \qquad (11)$$

$\Theta = (\theta^T, a^T, b^T)^T$ is estimated by maximizing the penalized likelihood (11).

The tuning parameter $\sigma_b^2$ is chosen by maximizing the marginal likelihood using the Laplace approximation (Breslow and Clayton, 1993).

When $X$ is measured with error, the SUBEX method is applicable.

Simulation 1: Linear model

Let $N_i(\cdot)$ be a stationary point process with intensity $X_i$, where $X_i \sim 1 + \mathrm{LogNormal}(0, 0.5)$; $Z_i$ is an error-free Bernoulli random variable independent of $X_i$, with $P(Z_i = 1) = 0.5$.

$$Y_i = \theta_0 + \theta_1 X_i + \theta_2 Z_i + \epsilon_i, \quad i = 1, 2, \ldots, n,$$

where $\theta = (\theta_0, \theta_1, \theta_2)^T = (1, 1, 1)^T$ and $\epsilon_i \sim \mathrm{Normal}(0, \ )$.

We assume that $X_i$ is unknown but can be estimated by $W_i = N_i((0, \tau])/\tau$, where $N_i((0, \tau])$ denotes the number of events of $N_i$ contained in the time interval $(0, \tau]$.

Simulation 1: Linear model (Cont.)

We consider two scenarios for $N_i$.

Scenario 1: $N_i$ is a homogeneous Poisson process with intensity $X_i$.

Scenario 2: $N_i$ is a homogeneous Poisson cluster process. We generate a homogeneous Poisson process with intensity $\rho_i = X_i/3$ as the parent process; each parent generates $m = 3$ children on average according to a Poisson distribution, and the children's locations are dispersed independently following a normal distribution centered at the parent location with standard deviation $\omega = 2$. The final process contains only the children's event times.
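
The two scenarios can be simulated along the following lines. This is a sketch under stated assumptions, not the authors' simulation code: the function names are hypothetical, and the cluster process generates parents on a window padded by four standard deviations on each side (the `buffer_sd` argument) to limit edge effects, a detail the slides do not specify.

```python
import numpy as np

def poisson_process(rate, tau, rng):
    """Scenario 1: homogeneous Poisson process on (0, tau) with the given rate."""
    n_events = rng.poisson(rate * tau)
    return np.sort(rng.uniform(0.0, tau, n_events))

def poisson_cluster_process(x, tau, rng, mean_children=3.0, omega=2.0, buffer_sd=4.0):
    """Scenario 2: Poisson cluster process with overall intensity x."""
    pad = buffer_sd * omega                      # assumed padding against edge effects
    parent_rate = x / mean_children              # rho_i = X_i / 3
    n_parents = rng.poisson(parent_rate * (tau + 2 * pad))
    parents = rng.uniform(-pad, tau + pad, n_parents)
    n_children = rng.poisson(mean_children, n_parents)
    # Each child is displaced from its parent by a Normal(0, omega^2) offset.
    children = np.repeat(parents, n_children) + rng.normal(0.0, omega, n_children.sum())
    # Keep only children that fall in the observation window (0, tau].
    return np.sort(children[(children > 0.0) & (children <= tau)])

rng = np.random.default_rng(2024)
tau = 10.0
x_i = 2.5                                        # hypothetical true intensity
events = poisson_cluster_process(x_i, tau, rng)
W_i = len(events) / tau                          # surrogate for x_i
```

Both processes have mean rate $X_i$, but the clustering in Scenario 2 induces positive dependence between nearby counts, which is the situation Scheme 2 is designed for.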

Simulation 1 (Cont.)

We set $n = 200$ and repeat the simulation 200 times in each case.

We apply the naive estimator, the empirical SIMEX method, two versions of the method of moment estimator and two versions of SUBEX (MOM 1 and SUBEX 1 ignore within-trajectory correlation; MOM 2 and SUBEX 2 take the correlation into account).

For SUBEX, we set the $p$ values to range from 0.6 to 0.95 in increments of 0.05, and use a quadratic function for the extrapolation.

Simulation results for linear model

Scenario 1: Poisson process, $\tau = 10$. Columns: Bias, SE, Rel. Eff. for $\theta_1$, then Bias, SE, Rel. Eff. for $\theta_2$.
Naive
MOM 1
MOM 2
SUBEX 1   (.1334)   (.1308)   .5084
SUBEX 2   (.1317)   (.1297)   .5158
SIMEX     (.0896)   (.0913)   .9237

Scenario 2: Poisson cluster process. Columns: Bias, SE, Rel. Eff. for $\theta_1$ with $\tau = 10$, then Bias, SE, Rel. Eff. for $\theta_1$ with $\tau = 20$.
Naive
MOM 1
MOM 2
SUBEX 1   (.1227)   (.1463)
SUBEX 2   (.1556)   (.1692)
SIMEX     (.0802)   (.0913)

Simulation 2: interval-censored failure time data

We simulate $X_i$, $N_i$ and $Z_i$ as in Simulation 1, and simulate the failure time $T_i$ through a Cox model

$$\lambda(t \mid X_i, Z_i) = \lambda_0(t)\exp(X_i\beta + Z_i\eta), \qquad (12)$$

where $\lambda_0(t) = t$ and $(\beta, \eta)^T = (1, 1)^T$.

We assume censoring at random and set the censoring times to be (0.2, 0.5, 1). Let the censoring indicator $\delta_i$ be a binary variable independent of $X_i$ and $Z_i$, with $P(\delta_i = 1) = 0.5$. When $\delta_i = 1$, the event time $T_i$ is censored in the interval between the two censoring times closest to $T_i$; if $T_i$ is less than 0.2, it is censored in $[T_i^l = 0, T_i^r = 0.2]$; if an event time is over 1, it is automatically right censored at 1.

Overall, about 12% of the observations are right censored, 43% are interval censored, and the remaining 45% are observed.
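
The data-generating steps above can be sketched as follows. With $\lambda_0(t) = t$ the cumulative baseline hazard is $\Lambda_0(t) = t^2/2$, so failure times can be drawn by inverting the survival function. The function names and the status labels are hypothetical, and note that `delta` here is the simulation's censoring indicator (1 means the time is interval censored), which is coded differently from the $\delta_i$ used in the likelihood.

```python
import numpy as np

def simulate_failure_times(lin_pred, rng):
    """Draw T from the Cox model with lambda_0(t) = t, i.e. Lambda_0(t) = t^2 / 2."""
    e = rng.exponential(1.0, size=np.shape(lin_pred))
    # Solve Lambda_0(T) * exp(lin_pred) = e for T.
    return np.sqrt(2.0 * e / np.exp(lin_pred))

def censor(T, delta, visit_times=(0.2, 0.5, 1.0)):
    """Return (T_left, T_right, status) for one subject."""
    last = visit_times[-1]
    if T > last:                       # beyond the last censoring time
        return last, last, "right"
    if delta == 0:                     # not selected for interval censoring
        return T, T, "observed"
    edges = (0.0,) + tuple(visit_times)
    for lo, hi in zip(edges[:-1], edges[1:]):
        if T <= hi:                    # first interval whose right end covers T
            return lo, hi, "interval"

rng = np.random.default_rng(7)
lin_pred = np.array([2.0, 3.0, 2.5])   # hypothetical values of X*beta + Z*eta
T = simulate_failure_times(lin_pred, rng)
delta = rng.integers(0, 2, size=T.shape)
records = [censor(t, d) for t, d in zip(T, delta)]
```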

Simulation results in the interval-censored failure time data

Each panel has columns Bias, SE, Rel. Eff.

            Poisson, $\tau = 10$, n = 200    Poisson cluster, $\tau = 10$, n = 200
No Error    (.1469)                          (.1462)
Naive       (.1129)                          (.0800)
SUBEX 1     (.2587)                          (.2016)
SUBEX 2     (.2570)                          (.2590)
SIMEX       (.1732)                          (.1237)

            Poisson cluster, $\tau = 20$, n = 200    Poisson cluster, $\tau = 20$, n = 400
No Error    (.1465)                                  (.1015)
Naive       (.0971)                                  (.0695)
SUBEX 1     (.2602)                                  (.1843)
SUBEX 2     (.3052)                                  (.2178)
SIMEX       (.1558)                                  (.1107)

Results for the analysis of the CCQ-Brief score data

            NAIVE         MOM 1         SIMEX         MOM 2         SUBEX 1       SUBEX 2
frequency   .476 (.176)   .529 (.195)   .537 (.223)   .565 (.211)   .577 (.204)   .662 (.257)
indicator        (.062)        (.062)        (.064)        (.062)        (.064)        (.065)
gender           (.093)        (.093)        (.091)        (.093)        (.092)        (.095)
race        .305 (.099)   .306 (.100)   .311 (.101)   .307 (.100)   .289 (.100)   .271 (.103)
age              (.081)        (.082)        (.081)        (.083)        (.081)        (.083)
cocyrs           (.087)        (.087)        (.085)        (.088)        (.083)        (.086)
curanxs     .139 (.085)   .142 (.085)   .134 (.079)   .145 (.086)   .163 (.084)   .178 (.086)

Results for the analysis of the time to first relapse data

            NAIVE         SIMEX         SUBEX 1       SUBEX 2
cocuse           (.081)        (.096)        (.095)   .017 (.206)
gender           (.299)        (.301)        (.319)        (.347)
race             (.274)        (.277)        (.272)        (.276)
age              (.024)        (.024)        (.024)        (.024)
cocyrs      .110 (.028)   .111 (.028)   .111 (.028)   .109 (.029)
curanxs     .279 (.221)   .275 (.224)   .308 (.218)   .352 (.232)

Summary

In many scientific problems, in particular in substance use research, it is common to have summary statistics derived from stochastic processes as covariates.

The estimation error in these summary statistics causes estimation bias in the regression coefficients, as in classical measurement error problems.

We propose a new method-of-moment approach for linear models and a subsampling extrapolation method that is generally applicable to both linear and nonlinear models.

The proposed methods are based on novel subsampling techniques that take into account the correlation within individual processes, and have shown good performance in both simulation and real data analysis.


[Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements [Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements Aasthaa Bansal PhD Pharmaceutical Outcomes Research & Policy Program University of Washington 69 Biomarkers

More information

Estimation in Generalized Linear Models with Heterogeneous Random Effects. Woncheol Jang Johan Lim. May 19, 2004

Estimation in Generalized Linear Models with Heterogeneous Random Effects. Woncheol Jang Johan Lim. May 19, 2004 Estimation in Generalized Linear Models with Heterogeneous Random Effects Woncheol Jang Johan Lim May 19, 2004 Abstract The penalized quasi-likelihood (PQL) approach is the most common estimation procedure

More information

Lecture 9: Learning Optimal Dynamic Treatment Regimes. Donglin Zeng, Department of Biostatistics, University of North Carolina

Lecture 9: Learning Optimal Dynamic Treatment Regimes. Donglin Zeng, Department of Biostatistics, University of North Carolina Lecture 9: Learning Optimal Dynamic Treatment Regimes Introduction Refresh: Dynamic Treatment Regimes (DTRs) DTRs: sequential decision rules, tailored at each stage by patients time-varying features and

More information

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score

More information

Lecture 5 Models and methods for recurrent event data

Lecture 5 Models and methods for recurrent event data Lecture 5 Models and methods for recurrent event data Recurrent and multiple events are commonly encountered in longitudinal studies. In this chapter we consider ordered recurrent and multiple events.

More information

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data Malaysian Journal of Mathematical Sciences 11(3): 33 315 (217) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES Journal homepage: http://einspem.upm.edu.my/journal Approximation of Survival Function by Taylor

More information

Chapter 2 Inference on Mean Residual Life-Overview

Chapter 2 Inference on Mean Residual Life-Overview Chapter 2 Inference on Mean Residual Life-Overview Statistical inference based on the remaining lifetimes would be intuitively more appealing than the popular hazard function defined as the risk of immediate

More information

Building a Prognostic Biomarker

Building a Prognostic Biomarker Building a Prognostic Biomarker Noah Simon and Richard Simon July 2016 1 / 44 Prognostic Biomarker for a Continuous Measure On each of n patients measure y i - single continuous outcome (eg. blood pressure,

More information

Continuous Time Survival in Latent Variable Models

Continuous Time Survival in Latent Variable Models Continuous Time Survival in Latent Variable Models Tihomir Asparouhov 1, Katherine Masyn 2, Bengt Muthen 3 Muthen & Muthen 1 University of California, Davis 2 University of California, Los Angeles 3 Abstract

More information

Econometrics I, Estimation

Econometrics I, Estimation Econometrics I, Estimation Department of Economics Stanford University September, 2008 Part I Parameter, Estimator, Estimate A parametric is a feature of the population. An estimator is a function of the

More information

Frailty Probit model for multivariate and clustered interval-censor

Frailty Probit model for multivariate and clustered interval-censor Frailty Probit model for multivariate and clustered interval-censored failure time data University of South Carolina Department of Statistics June 4, 2013 Outline Introduction Proposed models Simulation

More information

Univariate shrinkage in the Cox model for high dimensional data

Univariate shrinkage in the Cox model for high dimensional data Univariate shrinkage in the Cox model for high dimensional data Robert Tibshirani January 6, 2009 Abstract We propose a method for prediction in Cox s proportional model, when the number of features (regressors)

More information

Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data

Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data Xuelin Huang Department of Biostatistics M. D. Anderson Cancer Center The University of Texas Joint Work with Jing Ning, Sangbum

More information

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Econometrics Workshop UNC

More information

A general mixed model approach for spatio-temporal regression data

A general mixed model approach for spatio-temporal regression data A general mixed model approach for spatio-temporal regression data Thomas Kneib, Ludwig Fahrmeir & Stefan Lang Department of Statistics, Ludwig-Maximilians-University Munich 1. Spatio-temporal regression

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models

Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations

More information

Survival Analysis for Case-Cohort Studies

Survival Analysis for Case-Cohort Studies Survival Analysis for ase-ohort Studies Petr Klášterecký Dept. of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, harles University, Prague, zech Republic e-mail: petr.klasterecky@matfyz.cz

More information

Analysing data: regression and correlation S6 and S7

Analysing data: regression and correlation S6 and S7 Basic medical statistics for clinical and experimental research Analysing data: regression and correlation S6 and S7 K. Jozwiak k.jozwiak@nki.nl 2 / 49 Correlation So far we have looked at the association

More information

Survival Analysis. Stat 526. April 13, 2018

Survival Analysis. Stat 526. April 13, 2018 Survival Analysis Stat 526 April 13, 2018 1 Functions of Survival Time Let T be the survival time for a subject Then P [T < 0] = 0 and T is a continuous random variable The Survival function is defined

More information

A simple bivariate count data regression model. Abstract

A simple bivariate count data regression model. Abstract A simple bivariate count data regression model Shiferaw Gurmu Georgia State University John Elder North Dakota State University Abstract This paper develops a simple bivariate count data regression model

More information

Topic 12 Overview of Estimation

Topic 12 Overview of Estimation Topic 12 Overview of Estimation Classical Statistics 1 / 9 Outline Introduction Parameter Estimation Classical Statistics Densities and Likelihoods 2 / 9 Introduction In the simplest possible terms, the

More information

Propensity Score Weighting with Multilevel Data

Propensity Score Weighting with Multilevel Data Propensity Score Weighting with Multilevel Data Fan Li Department of Statistical Science Duke University October 25, 2012 Joint work with Alan Zaslavsky and Mary Beth Landrum Introduction In comparative

More information

Power and Sample Size Calculations with the Additive Hazards Model

Power and Sample Size Calculations with the Additive Hazards Model Journal of Data Science 10(2012), 143-155 Power and Sample Size Calculations with the Additive Hazards Model Ling Chen, Chengjie Xiong, J. Philip Miller and Feng Gao Washington University School of Medicine

More information

Quasi-likelihood Scan Statistics for Detection of

Quasi-likelihood Scan Statistics for Detection of for Quasi-likelihood for Division of Biostatistics and Bioinformatics, National Health Research Institutes & Department of Mathematics, National Chung Cheng University 17 December 2011 1 / 25 Outline for

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

Bias Study of the Naive Estimator in a Longitudinal Binary Mixed-effects Model with Measurement Error and Misclassification in Covariates

Bias Study of the Naive Estimator in a Longitudinal Binary Mixed-effects Model with Measurement Error and Misclassification in Covariates Bias Study of the Naive Estimator in a Longitudinal Binary Mixed-effects Model with Measurement Error and Misclassification in Covariates by c Ernest Dankwa A thesis submitted to the School of Graduate

More information

Forecasting Data Streams: Next Generation Flow Field Forecasting

Forecasting Data Streams: Next Generation Flow Field Forecasting Forecasting Data Streams: Next Generation Flow Field Forecasting Kyle Caudle South Dakota School of Mines & Technology (SDSMT) kyle.caudle@sdsmt.edu Joint work with Michael Frey (Bucknell University) and

More information

Statistical Analysis of Spatio-temporal Point Process Data. Peter J Diggle

Statistical Analysis of Spatio-temporal Point Process Data. Peter J Diggle Statistical Analysis of Spatio-temporal Point Process Data Peter J Diggle Department of Medicine, Lancaster University and Department of Biostatistics, Johns Hopkins University School of Public Health

More information

Some New Methods for Latent Variable Models and Survival Analysis. Latent-Model Robustness in Structural Measurement Error Models.

Some New Methods for Latent Variable Models and Survival Analysis. Latent-Model Robustness in Structural Measurement Error Models. Some New Methods for Latent Variable Models and Survival Analysis Marie Davidian Department of Statistics North Carolina State University 1. Introduction Outline 3. Empirically checking latent-model robustness

More information

Lecture 7 Time-dependent Covariates in Cox Regression

Lecture 7 Time-dependent Covariates in Cox Regression Lecture 7 Time-dependent Covariates in Cox Regression So far, we ve been considering the following Cox PH model: λ(t Z) = λ 0 (t) exp(β Z) = λ 0 (t) exp( β j Z j ) where β j is the parameter for the the

More information

GENERALIZED LINEAR MIXED MODELS AND MEASUREMENT ERROR. Raymond J. Carroll: Texas A&M University

GENERALIZED LINEAR MIXED MODELS AND MEASUREMENT ERROR. Raymond J. Carroll: Texas A&M University GENERALIZED LINEAR MIXED MODELS AND MEASUREMENT ERROR Raymond J. Carroll: Texas A&M University Naisyin Wang: Xihong Lin: Roberto Gutierrez: Texas A&M University University of Michigan Southern Methodist

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu

More information

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January

More information

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data Jae-Kwang Kim 1 Iowa State University June 28, 2012 1 Joint work with Dr. Ming Zhou (when he was a PhD student at ISU)

More information

ELEMENTS OF PROBABILITY THEORY

ELEMENTS OF PROBABILITY THEORY ELEMENTS OF PROBABILITY THEORY Elements of Probability Theory A collection of subsets of a set Ω is called a σ algebra if it contains Ω and is closed under the operations of taking complements and countable

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

STAT331. Cox s Proportional Hazards Model

STAT331. Cox s Proportional Hazards Model STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations

More information

Beyond GLM and likelihood

Beyond GLM and likelihood Stat 6620: Applied Linear Models Department of Statistics Western Michigan University Statistics curriculum Core knowledge (modeling and estimation) Math stat 1 (probability, distributions, convergence

More information

Modeling Real Estate Data using Quantile Regression

Modeling Real Estate Data using Quantile Regression Modeling Real Estate Data using Semiparametric Quantile Regression Department of Statistics University of Innsbruck September 9th, 2011 Overview 1 Application: 2 3 4 Hedonic regression data for house prices

More information

A Joint Model with Marginal Interpretation for Longitudinal Continuous and Time-to-event Outcomes

A Joint Model with Marginal Interpretation for Longitudinal Continuous and Time-to-event Outcomes A Joint Model with Marginal Interpretation for Longitudinal Continuous and Time-to-event Outcomes Achmad Efendi 1, Geert Molenberghs 2,1, Edmund Njagi 1, Paul Dendale 3 1 I-BioStat, Katholieke Universiteit

More information

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Takeshi Emura and Hisayuki Tsukuma Abstract For testing the regression parameter in multivariate

More information

Time-varying proportional odds model for mega-analysis of clustered event times

Time-varying proportional odds model for mega-analysis of clustered event times Biostatistics (2017) 00, 00, pp. 1 18 doi:10.1093/biostatistics/kxx065 Time-varying proportional odds model for mega-analysis of clustered event times TANYA P. GARCIA Texas A&M University, Department of

More information

Modelling Survival Events with Longitudinal Data Measured with Error

Modelling Survival Events with Longitudinal Data Measured with Error Modelling Survival Events with Longitudinal Data Measured with Error Hongsheng Dai, Jianxin Pan & Yanchun Bao First version: 14 December 29 Research Report No. 16, 29, Probability and Statistics Group

More information

Linear Regression. Junhui Qian. October 27, 2014

Linear Regression. Junhui Qian. October 27, 2014 Linear Regression Junhui Qian October 27, 2014 Outline The Model Estimation Ordinary Least Square Method of Moments Maximum Likelihood Estimation Properties of OLS Estimator Unbiasedness Consistency Efficiency

More information

A measurement error model approach to small area estimation

A measurement error model approach to small area estimation A measurement error model approach to small area estimation Jae-kwang Kim 1 Spring, 2015 1 Joint work with Seunghwan Park and Seoyoung Kim Ouline Introduction Basic Theory Application to Korean LFS Discussion

More information

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Anastasios (Butch) Tsiatis Department of Statistics North Carolina State University http://www.stat.ncsu.edu/

More information

Modeling Longitudinal Count Data with Excess Zeros and Time-Dependent Covariates: Application to Drug Use

Modeling Longitudinal Count Data with Excess Zeros and Time-Dependent Covariates: Application to Drug Use Modeling Longitudinal Count Data with Excess Zeros and : Application to Drug Use University of Northern Colorado November 17, 2014 Presentation Outline I and Data Issues II Correlated Count Regression

More information

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P.

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P. Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Melanie M. Wall, Bradley P. Carlin November 24, 2014 Outlines of the talk

More information

Statistical Inference and Methods

Statistical Inference and Methods Department of Mathematics Imperial College London d.stephens@imperial.ac.uk http://stats.ma.ic.ac.uk/ das01/ 31st January 2006 Part VI Session 6: Filtering and Time to Event Data Session 6: Filtering and

More information

ABHELSINKI UNIVERSITY OF TECHNOLOGY

ABHELSINKI UNIVERSITY OF TECHNOLOGY Cross-Validation, Information Criteria, Expected Utilities and the Effective Number of Parameters Aki Vehtari and Jouko Lampinen Laboratory of Computational Engineering Introduction Expected utility -

More information

Cox s proportional hazards model and Cox s partial likelihood

Cox s proportional hazards model and Cox s partial likelihood Cox s proportional hazards model and Cox s partial likelihood Rasmus Waagepetersen October 12, 2018 1 / 27 Non-parametric vs. parametric Suppose we want to estimate unknown function, e.g. survival function.

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY Ingo Langner 1, Ralf Bender 2, Rebecca Lenz-Tönjes 1, Helmut Küchenhoff 2, Maria Blettner 2 1

More information

A Bivariate Point Process Model with Application to Social Media User Content Generation

A Bivariate Point Process Model with Application to Social Media User Content Generation 1 / 33 A Bivariate Point Process Model with Application to Social Media User Content Generation Emma Jingfei Zhang ezhang@bus.miami.edu Yongtao Guan yguan@bus.miami.edu Department of Management Science

More information

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University Integrated Likelihood Estimation in Semiparametric Regression Models Thomas A. Severini Department of Statistics Northwestern University Joint work with Heping He, University of York Introduction Let Y

More information

Model Selection in Bayesian Survival Analysis for a Multi-country Cluster Randomized Trial

Model Selection in Bayesian Survival Analysis for a Multi-country Cluster Randomized Trial Model Selection in Bayesian Survival Analysis for a Multi-country Cluster Randomized Trial Jin Kyung Park International Vaccine Institute Min Woo Chae Seoul National University R. Leon Ochiai International

More information

Lab 8. Matched Case Control Studies

Lab 8. Matched Case Control Studies Lab 8 Matched Case Control Studies Control of Confounding Technique for the control of confounding: At the design stage: Matching During the analysis of the results: Post-stratification analysis Advantage

More information

Multistate models and recurrent event models

Multistate models and recurrent event models Multistate models Multistate models and recurrent event models Patrick Breheny December 10 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/22 Introduction Multistate models In this final lecture,

More information

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective Anastasios (Butch) Tsiatis and Xiaofei Bai Department of Statistics North Carolina State University 1/35 Optimal Treatment

More information

Advanced Statistics II: Non Parametric Tests

Advanced Statistics II: Non Parametric Tests Advanced Statistics II: Non Parametric Tests Aurélien Garivier ParisTech February 27, 2011 Outline Fitting a distribution Rank Tests for the comparison of two samples Two unrelated samples: Mann-Whitney

More information

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the

More information

Regression Pitfalls. Pitfall Noun: A hidden or unsuspected danger or difficulty. A covered pit used as a trap.

Regression Pitfalls. Pitfall Noun: A hidden or unsuspected danger or difficulty. A covered pit used as a trap. Regression Pitfalls Pitfall Noun: A hidden or unsuspected danger or difficulty. A covered pit used as a trap. Multiple regression is a widely used and powerful tool. It is also one of the most abused statistical

More information

,..., θ(2),..., θ(n)

,..., θ(2),..., θ(n) Likelihoods for Multivariate Binary Data Log-Linear Model We have 2 n 1 distinct probabilities, but we wish to consider formulations that allow more parsimonious descriptions as a function of covariates.

More information

Analysis of competing risks data and simulation of data following predened subdistribution hazards

Analysis of competing risks data and simulation of data following predened subdistribution hazards Analysis of competing risks data and simulation of data following predened subdistribution hazards Bernhard Haller Institut für Medizinische Statistik und Epidemiologie Technische Universität München 27.05.2013

More information

For more information about how to cite these materials visit

For more information about how to cite these materials visit Author(s): Kerby Shedden, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/

More information

Chapter 7: Model Assessment and Selection

Chapter 7: Model Assessment and Selection Chapter 7: Model Assessment and Selection DD3364 April 20, 2012 Introduction Regression: Review of our problem Have target variable Y to estimate from a vector of inputs X. A prediction model ˆf(X) has

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

Missing Data Issues in the Studies of Neurodegenerative Disorders: the Methodology

Missing Data Issues in the Studies of Neurodegenerative Disorders: the Methodology Missing Data Issues in the Studies of Neurodegenerative Disorders: the Methodology Sheng Luo, PhD Associate Professor Department of Biostatistics & Bioinformatics Duke University Medical Center sheng.luo@duke.edu

More information

Statistical Data Mining and Machine Learning Hilary Term 2016

Statistical Data Mining and Machine Learning Hilary Term 2016 Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes

More information

Survival Regression Models

Survival Regression Models Survival Regression Models David M. Rocke May 18, 2017 David M. Rocke Survival Regression Models May 18, 2017 1 / 32 Background on the Proportional Hazards Model The exponential distribution has constant

More information

Longitudinal + Reliability = Joint Modeling

Longitudinal + Reliability = Joint Modeling Longitudinal + Reliability = Joint Modeling Carles Serrat Institute of Statistics and Mathematics Applied to Building CYTED-HAROSA International Workshop November 21-22, 2013 Barcelona Mainly from Rizopoulos,

More information

Chapter 17. Failure-Time Regression Analysis. William Q. Meeker and Luis A. Escobar Iowa State University and Louisiana State University

Chapter 17. Failure-Time Regression Analysis. William Q. Meeker and Luis A. Escobar Iowa State University and Louisiana State University Chapter 17 Failure-Time Regression Analysis William Q. Meeker and Luis A. Escobar Iowa State University and Louisiana State University Copyright 1998-2008 W. Q. Meeker and L. A. Escobar. Based on the authors

More information

Simple techniques for comparing survival functions with interval-censored data

Simple techniques for comparing survival functions with interval-censored data Simple techniques for comparing survival functions with interval-censored data Jinheum Kim, joint with Chung Mo Nam jinhkim@suwon.ac.kr Department of Applied Statistics University of Suwon Comparing survival

More information

arxiv: v2 [stat.me] 4 Jun 2016

arxiv: v2 [stat.me] 4 Jun 2016 Variable Selection for Additive Partial Linear Quantile Regression with Missing Covariates 1 Variable Selection for Additive Partial Linear Quantile Regression with Missing Covariates Ben Sherwood arxiv:1510.00094v2

More information

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random

Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Institute of Statistics

More information

Measurement error modeling. Department of Statistical Sciences Università degli Studi Padova

Measurement error modeling. Department of Statistical Sciences Università degli Studi Padova Measurement error modeling Statistisches Beratungslabor Institut für Statistik Ludwig Maximilians Department of Statistical Sciences Università degli Studi Padova 29.4.2010 Overview 1 and Misclassification

More information

Marginal Screening and Post-Selection Inference

Marginal Screening and Post-Selection Inference Marginal Screening and Post-Selection Inference Ian McKeague August 13, 2017 Ian McKeague (Columbia University) Marginal Screening August 13, 2017 1 / 29 Outline 1 Background on Marginal Screening 2 2

More information

Web-based Supplementary Material for A Two-Part Joint. Model for the Analysis of Survival and Longitudinal Binary. Data with excess Zeros

Web-based Supplementary Material for A Two-Part Joint. Model for the Analysis of Survival and Longitudinal Binary. Data with excess Zeros Web-based Supplementary Material for A Two-Part Joint Model for the Analysis of Survival and Longitudinal Binary Data with excess Zeros Dimitris Rizopoulos, 1 Geert Verbeke, 1 Emmanuel Lesaffre 1 and Yves

More information

Digital Southern. Georgia Southern University. Varadan Sevilimedu Georgia Southern University. Fall 2017

Digital Southern. Georgia Southern University. Varadan Sevilimedu Georgia Southern University. Fall 2017 Georgia Southern University Digital Commons@Georgia Southern Electronic Theses & Dissertations Graduate Studies, Jack N. Averitt College of Fall 2017 Application of the Misclassification Simulation Extrapolation

More information

Exercises. (a) Prove that m(t) =

Exercises. (a) Prove that m(t) = Exercises 1. Lack of memory. Verify that the exponential distribution has the lack of memory property, that is, if T is exponentially distributed with parameter λ > then so is T t given that T > t for

More information

Multiple imputation to account for measurement error in marginal structural models

Multiple imputation to account for measurement error in marginal structural models Multiple imputation to account for measurement error in marginal structural models Supplementary material A. Standard marginal structural model We estimate the parameters of the marginal structural model

More information

Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models

Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models Nicholas C. Henderson Thomas A. Louis Gary Rosner Ravi Varadhan Johns Hopkins University July 31, 2018

More information

Ensemble estimation and variable selection with semiparametric regression models

Ensemble estimation and variable selection with semiparametric regression models Ensemble estimation and variable selection with semiparametric regression models Sunyoung Shin Department of Mathematical Sciences University of Texas at Dallas Joint work with Jason Fine, Yufeng Liu,

More information