Pattern Mixture Models for the Analysis of Repeated Attempt Designs


Biometrics 71, December 2015
DOI: /biom

Michael J. Daniels,1,* Dan Jackson,2,** Wei Feng,3,*** and Ian R. White2,****

1 Department of Integrative Biology, Department of Statistics & Data Sciences, University of Texas, Austin, TX
2 Medical Research Council Biostatistics Unit, Cambridge Institute of Public Health, Cambridge, U.K.
3 Department of Statistics, University of Florida, Gainesville, Florida

*mjdaniels@austin.utexas.edu, **daniel.jackson@mrc-bsu.cam.ac.uk, ***fengwei@ufl.edu, ****ian.white@mrc-bsu.cam.ac.uk

Summary. It is not uncommon in follow-up studies to make multiple attempts to collect a measurement after baseline. Recording whether these attempts are successful or not provides useful information for the purposes of assessing the missing at random (MAR) assumption and facilitating missing not at random (MNAR) modeling. This is because measurements from subjects who provide these data after multiple failed attempts may differ from those who provide the measurement after fewer attempts. This type of continuum of resistance to providing a measurement has hitherto been modeled in a selection model framework, where the outcome data are modeled jointly with the success or failure of the attempts given these outcomes. Here, we present a pattern mixture approach to model this type of data. We re-analyze the repeated attempt data from a trial that was previously analyzed using a selection model approach. Our pattern mixture model is more flexible and is more transparent in terms of parameter identifiability than the models that have previously been used to model repeated attempt data and allows for sensitivity analysis. We conclude that our approach to modeling this type of data provides a fully viable alternative to the more established selection model.

Key words: Nonignorable missingness; Repeated attempt model; Selection model; Sensitivity analysis.

1. Introduction

It is not uncommon in follow-up studies to make multiple attempts to collect a measurement after baseline (e.g., Wood et al., 2006; Jackson et al., 2012). Here, we refer to this type of design as a repeated attempt design (RAD) and the corresponding statistical models as repeated attempt models (RAM). Information about the multiple attempts made to obtain outcome data has the potential to provide some information about the unobserved responses. This has been exploited in several papers, in the context of selection models (Alho, 1990; Wood et al., 2006; Jackson et al., 2010, 2012). In these selection models the information about the repeated attempts is thought to describe a continuum of resistance to providing data (Lin and Schaeffer, 1995). Evidence for this type of resistance can be informally assessed by tabulating numerical summaries of outcome data, such as the mean outcome, by the number of attempts made to obtain these data. Assuming that a large value for the response variable is a favorable outcome, a negative association between the mean outcome and the number of attempts made to obtain outcome data provides evidence that those with less favorable outcomes are resistant. This would suggest that those who do not provide data after many failed attempts, and therefore are highly resistant, may have very unfavorable outcomes, and this would almost certainly invalidate a statistical analysis which assumes data are missing at random (MAR).

The advantage of the existing selection models is that they exploit the RAD to identify all the parameters in the full data model. Selection models that describe the marginal probability that outcome data are observed, rather than the marginal probability that each attempt to obtain outcome data is successful, are very sensitive to outliers and the distributional assumptions made, because these models are very weakly identified (Kenward, 1998).
However, the selection model becomes much more strongly identifiable when it is used to describe each attempt (Jackson et al., 2012) in situations where multiple attempts to obtain data are made. This identifiability can also be considered a disadvantage of the RAM because this approach does not allow the type of sensitivity parameter defined by Daniels and Hogan (2008); such parameters are recommended for the analysis of missing data (National Research Council, 2010) and importantly, when varied, do not impact the fit of the model to the observed data. As such, these parameters allow examination of the sensitivity of inferences to unverifiable assumptions about the missingness.

To our knowledge, there has been no previous work using a pattern mixture model for repeated attempt data. Pattern mixture models (Little, 1993, 1995) have been advocated as an approach to handle missing data that easily allows for sensitivity parameters due to their direct connection to the extrapolation factorization (Daniels and Hogan, 2000, 2008). Here we describe how this type of model can be adapted to incorporate the repeated attempt information; as such, we propose a pattern mixture model RAM as a competitor to the selection model RAM.

The Authors. Biometrics published by Wiley Periodicals, Inc. on behalf of International Biometric Society. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

We motivate this work based on the QUATRO trial (Gray et al., 2006). The QUATRO trial was a single-blind, multi-center randomized controlled trial of the effectiveness of adherence therapy for participants with schizophrenia. The trial included 409 participants in four centers: Amsterdam (the Netherlands), Leipzig (Germany), London (United Kingdom) and Verona (Italy). Participants were recruited from June 2002 to October 2003 from people under the care of mental health services and were individually randomized to receive either adherence therapy (intervention) or health education (control). The inclusion and exclusion criteria are described in detail by Gray et al. (2006). Assessments were undertaken at baseline and at a follow-up of 52 weeks. The objective of this multicenter trial was to assess the impact of adherence therapy on self-reported quality-of-life of people with severe mental illness. The investigators made multiple attempts to collect the quality-of-life outcome (as many as nine attempts) but there were still individuals whose response could not be collected. In the treatment and control arms, 29 out of 204 (14%) and 13 out of 205 (6%) subjects failed to provide outcome data at the end of the trial, respectively. This imbalance in the amount of missing data by treatment arm, in conjunction with the concern that those with less favorable outcomes may be less likely to provide data, motivated both the previous MNAR modeling of Jackson et al. (2010) and the methods described here.

The intuition that underlies our modeling is that we could produce treatment arm-specific plots which show the mean quality of life scores against the number of attempts.
To impute the outcomes for those who do not provide outcome data, we could fit regression lines to each of these plots and then extrapolate, where the extent of the extrapolation reflects how resistant the non-responders are thought to be. We develop a statistically principled and more general version of this idea, where we fully take into account all the various sources of uncertainty in a Bayesian pattern mixture model.

We first briefly review the use of selection models for the RAD in Section 2. We then introduce a pattern mixture model for this design that allows intuitive sensitivity parameters in Section 3. We show connections between the selection model and pattern mixture model formulations in Section 4. We reanalyze the QUATRO data using these models in Section 5. Section 6 contains a discussion and open issues.

2. Notation, Target of Inference, and Review of Repeated Attempt Models

2.1. Notation

The outcome of interest (here, self-reported QoL at 52 weeks) will be denoted as $Y$, and the set of baseline covariates (here, baseline QoL and center) as $X$. $R$ will denote the number of attempts until the outcome is successfully collected; we assume up to $K$ attempts, where $R = K + 1$ corresponds to the outcome not being collected after the maximum number of attempts. We assume a randomized trial where $Z$ denotes randomization to the intervention of interest (here, adherence therapy); an extension to an observational study is mentioned in the discussion.

2.2. Inferential Goal

The quantity of interest here, $\theta$, is the treatment effect on the means, unconditional on $X$ and $R$,

$$\theta = E(Y \mid Z = 1) - E(Y \mid Z = 0). \qquad (1)$$

In the existing RAM-SM (described in Section 2.3), this parameter is specified directly in the model; see equation (3) below. For the RAM-PMM (introduced in Section 3), this parameter is not directly specified in the model; computation of this parameter requires evaluation of the double integral in (5) below.
In the QUATRO data, $\theta$ is the effect of adherence therapy on self-reported QoL at 52 weeks.

2.3. Repeated Attempt Selection Model (RAM-SM)

The RAM-SM was originally proposed by Alho (1990). A logistic regression is used to model the probability that each attempt to obtain outcome data is successful, where an MNAR model is obtained by using the (possibly missing) outcome data as a covariate in this logistic regression. The joint likelihood of the response and the repeated attempts data is obtained by modeling the marginal distribution of the outcome data $Y$ and the missing data mechanism of the RAM (Wood et al., 2006); the latter describes the probability that the attempts are successful given $Y$. The key identifying assumption is that the covariate effect associated with the outcome $Y$ is common across all attempts. To date, the RAM has only been developed for an incomplete univariate outcome.

An example of the missing data mechanism in a RAM is given by the model

$$\operatorname{logit} P(R = k \mid R \ge k, Z = z, X = x, Y = y) = \lambda_{0k} + \gamma_k z + \lambda_k x + \delta y, \qquad (2)$$

where $R = k$ if the $k$th attempt is successful. We assume that all attempts are independent, so $\lambda_{0k}$ is the log odds, for participants with $X = 0$, $Y = 0$, and $Z = 0$, that the $k$th attempt is successful given no success on the previous $k - 1$ attempts. Model (2) can be thought of as a discrete time survival model, or a stratified logistic regression, where we also handle the unobserved outcomes (Jackson et al., 2012). The term $\lambda_k x$ allows the probability that attempts are successful to depend on covariates, where the covariate effects may or may not depend on $k$. The term $\delta y$ permits MNAR models. The MAR assumption is equivalent to assuming that $\delta = 0$. The key identifying assumption is that the covariate effect of $Y$ in (2) is constant across all attempts, that is, $\delta$ does not depend on $k$. By jointly modeling $Y$, and then the attempts given $Y$ using (2), a selection model approach has been adopted.
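To make the discrete time survival reading of model (2) concrete, the sketch below turns attempt-specific hazards into the probabilities of each value of R for one participant. It is only an illustration: the function name `attempt_probabilities` and all parameter values are hypothetical, and for brevity the treatment and covariate effects are taken to be constant across attempts, which is a simplification of (2).

```python
import math

def attempt_probabilities(lambda0, gamma, lam_x, delta, z, x, y):
    """Return [P(R=1), ..., P(R=K), P(R=K+1)] given z, x and y.

    lambda0 holds the K attempt-specific intercepts. gamma and lam_x are
    assumed constant over attempts (a simplification made for this sketch).
    """
    probs = []
    still_missing = 1.0  # probability that all earlier attempts failed
    for lam0k in lambda0:
        linear = lam0k + gamma * z + lam_x * x + delta * y
        hazard = 1.0 / (1.0 + math.exp(-linear))  # P(R = k | R >= k, ...)
        probs.append(still_missing * hazard)
        still_missing *= 1.0 - hazard
    probs.append(still_missing)  # R = K + 1: outcome never collected
    return probs

# Hypothetical values: with delta > 0, a larger y makes early success more likely.
low_y = attempt_probabilities([0.0, 0.0, 0.0], 0.0, 0.0, 0.5, z=0, x=0.0, y=-1.0)
high_y = attempt_probabilities([0.0, 0.0, 0.0], 0.0, 0.0, 0.5, z=0, x=0.0, y=1.0)
```

Setting `delta` to zero makes the probabilities free of `y`, which is the MAR special case described above.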
If only one attempt is made for all subjects then the RAM reduces to a standard selection model. The model specified for the response $Y$ is

$$Y \mid Z = z, X = x \sim N\{\mu(z, x), \sigma^2(z, x)\}, \qquad (3)$$

where in the example $\mu(z, x) = \beta_0 + \theta z + \beta x$ and $\sigma^2(z, x) = \sigma^2$.

Alho (1990) originally proposed using a modified likelihood to fit the RAM but two new estimation methods have subsequently been developed by Wood et al. (2006). The first of these methods uses the EM algorithm to fit the model using the full likelihood and the second uses a Bayesian approach and the software WinBUGS (Lunn et al., 2002). The full likelihood was also used by Jackson et al. (2012) to fit the RAM, but without using the EM algorithm. It can be difficult to assess the fit of this selection model RAM to the observed data because the observed data likelihood is not available in closed form (though it can be evaluated numerically using one-dimensional Gaussian quadrature to integrate out the missing data).

Previous work involving fitting a RAM to the QUATRO data was performed using WinBUGS (see Jackson et al. (2010) for full details). Briefly, no strong evidence that data are MNAR was found. As such, the MAR assumption made by Gray et al. (2006) does not appear unreasonable, though the estimated treatment effect was slightly reduced (and remained not statistically significant).

3. Repeated Attempt Pattern Mixture Model (RAM-PMM)

3.1. Model

In our pattern mixture formulation of the RAM, denoted as RAM-PMM, the patterns are defined by the values of $R$. For pattern $R = k$ ($k = 1, \ldots, K + 1$) and arm $z = 0, 1$, we consider the following model for the conditional distribution of $Y$,

$$Y \mid Z = z, X = x, R = k \sim N\{\mu(z, x, k), \sigma^2(z, x, k)\},$$

so that we allow the mean and variance of the outcome data to depend on covariates and patterns. In all our modeling, we assume the particular function $\mu(z, x, k)$ for $k \le K$,

$$\mu(z, x, k) = \alpha_z^{(k)} + \beta_1 x, \qquad (4)$$

and a constant variance, $\sigma^2(z, x, k) = \sigma^2$. More complex forms of the mean and variance are possible with richer data; we return to this issue in the discussion.
So far, we have not discussed anything about the identification of the distribution of the missing data, specifically, $\mu(z, x, K + 1)$. Different assumptions about $\mu(z, x, K + 1)$ in Section 3.2 will allow us to explore a range of possibilities in a sensitivity analysis.

We then specify a model for the conditional distribution of the pattern indicator (similar to the RAM-SM specification in equation (2) but not conditioning on $Y$). An example of this type of model is

$$\operatorname{logit}\{\pi_{zk}(x)\} = \lambda_{0k} + \lambda_z z + \lambda_x x,$$

where $\pi_{zk}(x) = P(R = k \mid R \ge k, Z = z, X = x)$.

Finally, we specify a model for $[X \mid Z] = [X]$ (by randomization). This factorization respects the fact, which is sometimes overlooked in pattern mixture models, that the distribution of the baseline outcome (which is included in $X$) does not depend on $Z$. Note that this model can be specified parametrically or using Bayesian nonparametrics (more in the discussion). Here, we assume a parametric model for this distribution; note that, in our example, $X$ is composed of the baseline outcome, $Y_0$, and indicators of center. We assume that, for each center, the distribution of the baseline outcome is normal with mean and variance depending on center. It is easy to assess the fit of this model to the observed data as its distribution is modeled directly in the pattern mixture framework.

The quantity of interest here, $\theta$, given in (1) can be written as follows,

$$\theta = E[Y \mid Z = 1] - E[Y \mid Z = 0] = \iint \mu(1, x, k)\, dF(k \mid x, z = 1)\, dF(x) - \iint \mu(0, x, k)\, dF(k \mid x, z = 0)\, dF(x), \qquad (5)$$

where $F(k \mid x, z)$ and $F(x)$ were specified above. All integrals are double integrals over $x$ and $k$ (the integral over the discrete $k$ is a sum). The parameter $\theta$ can be computed using Monte Carlo (MC) integration in WinBUGS. Details can be found in Section 3.4.

3.2. Priors

In the proposed mixture model, not all parameters are identified by the observed data. One of the main contributions of our approach is the form of the priors for the unidentified parameters, which is described below.
First we outline priors for the identified parameters.

Identified parameters. The parameters indexing (and identified by) the observed data comprise $(\{\alpha_z^{(k)} : k = 1, \ldots, K\}, \beta_1, \sigma, \lambda)$. For the regression parameters, we use diffuse normal priors. For the variance component, $\sigma^2$, we use a vague inverse gamma prior.

Unidentified parameters. The parameters $\alpha_z^{(K+1)}$ are not identified by the observed data (or modeling assumptions) and would be classified as sensitivity parameters (Daniels and Hogan, 2008). To identify these parameters, we exploit the repeated attempt design and assume a functional relationship between the intercept parameters for the observed outcomes, $\{\alpha_z^{(k)} : k = 1, \ldots, K\}$, and the number of attempts ($k$). In particular, we specify a prior for $\alpha_z^{(K+1)}$ conditional on $\bar{\alpha}_z^{(K)} = (\alpha_z^{(1)}, \ldots, \alpha_z^{(K)})^T$, i.e., $p(\alpha_z^{(K+1)} \mid \bar{\alpha}_z^{(K)})$. We center this prior at its prediction based on implicitly fitting the regression

$$\alpha_z^{(k)} = h_z(k; \zeta_z) + \epsilon_{zk}, \quad k = 1, \ldots, K.$$

In what follows, we set $h_z(k; \zeta_z) = \zeta_{0z} + \zeta_{1z} k$, $k = 1, \ldots, K$ (a linear regression with $\alpha_z^{(k)}$ as the dependent variable and pattern $k$ as the independent variable) and compute the least squares estimate of $(\zeta_{0z}, \zeta_{1z})$ to obtain

$$\alpha_z^{(K+1)} \mid \bar{\alpha}_z^{(K)} \sim N\{\hat{\zeta}_{0z} + \hat{\zeta}_{1z}(K + C), \tau^2\},$$

where the $\hat{\zeta}_{jz}$ are functions of $\{\alpha_z^{(1)}, \ldots, \alpha_z^{(K)}\}$. The linear relationship of $\alpha_z^{(k)}$, for $k = 1, \ldots, K$, in $k$ is the key assumption. We assume that the intercepts in (4) follow a linear trend over patterns that provide outcome data and that we can extrapolate from this, following the intuition that we described in the introduction. Here the sensitivity parameters are $C$ and $\tau$, where $C$ represents how far we should extrapolate the linear trend to describe the missing outcome data (i.e., how resistant are those that have not provided outcome data by the $K$th attempt) and $\tau$ represents our uncertainty in the precision of the extrapolation. In what follows, the sensitivity parameter $\tau$ is fixed at zero and we focus on $C$.

This approach does not put any modeling restrictions on the observed data, but still attempts to use information in an intuitive manner from the repeated attempt design. Also note that it is not necessary to assume a linear form of $h_z(k; \zeta_z)$; any functional relationship (subject to having enough patterns/attempts) is possible. We return to this in the discussion.

3.3. Connections to Priors and Sensitivity Parameters in a Two-Pattern Mixture Model

It is not uncommon for information on the number of attempts to be collected but not used in the modeling. The pattern mixture model approach in that situation would only have two patterns, corresponding to $Y$ being observed or missing. Implicitly, the first pattern would be formed by combining the first $K$ successful patterns (where the outcome was observed) from our RAM-PMM into a single pattern. We can define the corresponding intercept for this combined pattern as a function of the parameters in our repeated attempt model to be $\alpha_z$, where

$$\alpha_z = \sum_{k=1}^{K} p_{zk}\, \alpha_z^{(k)}$$

and $p_{zk} = E_x\{P(R = k \mid Z = z, X = x)\}$; the quantity $P(R = k \mid z, x)$ can be computed recursively from the $\pi_{zk}$'s.
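The two quantities defined so far, the extrapolated centre of the prior for the unobserved pattern (Section 3.2) and the pooled intercept of the combined observed pattern, can be sketched for a single arm as below. The function names and the numerical intercepts are hypothetical and are not estimates from the paper. Rescaling the pattern weights so that they sum to one over the observed patterns is an assumption of this sketch.

```python
def extrapolated_intercept(alphas, C):
    """Least squares line through (k, alpha^(k)), k = 1..K, evaluated at K + C.

    alphas: intercepts of the K observed patterns for one arm (K >= 2).
    C: sensitivity parameter controlling how far the trend is extrapolated.
    """
    K = len(alphas)
    ks = list(range(1, K + 1))
    k_bar = sum(ks) / K
    a_bar = sum(alphas) / K
    sxx = sum((k - k_bar) ** 2 for k in ks)
    sxy = sum((k - k_bar) * (a - a_bar) for k, a in zip(ks, alphas))
    slope = sxy / sxx                    # plays the role of zeta_1 hat
    intercept = a_bar - slope * k_bar    # plays the role of zeta_0 hat
    return intercept + slope * (K + C)

def pooled_intercept(alphas, weights):
    """Weighted average of the observed-pattern intercepts.

    The weights are rescaled to sum to one (an assumption of this sketch).
    """
    total = sum(weights)
    return sum(w * a for w, a in zip(weights, alphas)) / total

# Hypothetical intercepts that decline with the number of attempts.
alphas = [42.0, 41.0, 38.0]
centres = [extrapolated_intercept(alphas, C) for C in (0, 1, 2, 3)]
pooled = pooled_intercept(alphas, [0.40, 0.45, 0.10])
```

With a declining trend, larger values of `C` push the centre for the unobserved pattern further below the pooled intercept of the observed patterns.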
The typical approach in a two-pattern model would be to specify the conditional mean of $\alpha_z^{(K+1)}$ as $E(\alpha_z^{(K+1)} \mid \alpha_z) = \alpha_z + \eta_z$, where $\eta_z$ is a sensitivity parameter. The value of $\eta_z$ implied by the model and priors for our RAM-PMM is

$$\eta_z = \hat{\zeta}_{0z} + \hat{\zeta}_{1z}(K + C) - \alpha_z. \qquad (6)$$

If the investigator wanted to do an analysis with a two-pattern model, (6) could help calibrate $\eta_z$ based on the repeated attempt model. We do this in the analysis of the QUATRO data (Section 5).

3.4. Computations in WinBUGS

Models can be specified, and the posterior sampled using MCMC, in WinBUGS (see the Supplementary Materials for code). For continuous components in $X$, the integral in (1) can be computed in WinBUGS by using the following trick:
(1) Create $L$ units with a missing outcome and missing covariates.
(2) Compute the mean of these $L$ outcomes for each $Z$ at each iteration.
This trick implicitly does a Monte Carlo (MC) integration over the distribution of $X$ at each iteration. $L$ is chosen such that the error in the MC integration for the quantities of interest is negligible. Corresponding code can be found in the Supplementary Materials.

4. Some Connections Between the Parameters of RAM-PMM and RAM-SM

4.1. SM Corresponding to the RAM-PMM and Direct Theoretical Connections

In the following, we describe the selection model derived from the RAM-PMM. For simplicity, we only consider one covariate and suppress the dependence of the intercept on $z$. In addition, we assume the coefficient for this covariate is constant across patterns, as is the residual variance,

$$Y \mid x, R = k \sim N(\alpha^{(k)} + \beta_1 x, \sigma^2), \quad k = 1, \ldots, K + 1.$$

The implied selection model has the link function

$$\log \frac{P(D = k + 1 \mid x, y)}{P(D = k \mid x, y)}.$$

The main observation here is that the implied selection model corresponds to a different link function than that used in the RAM-SM and obviously a different distribution for $Y$ (i.e., a mixture of normals). There are also some direct theoretical connections between the RAM-SM and RAM-PMM.
Setting the parameter $\zeta_1$ equal to zero (due to $\alpha^{(k)}$ not depending on $k$) implies MAR for the RAM-PMM (since then we have the same (normal) distribution in each pattern and thus the marginal distribution of $y$ is normal, not a mixture of normals). This model is equivalent to RAM-SM with $\delta = 0$.

4.2. Empirical Connections

We also assess connections empirically between the RAM-SM and RAM-PMM specified in this paper. We simulated data under the RAM-SM (using parameter values based on the QUATRO data) to assess how the value of $\delta$ impacts the derived pattern specific conditional means, $\alpha^{(k)}$, as a function of $k$ (see Figure 1); as such, we do not actually need to fit the RAM-PMM and the sensitivity parameter, $C$, does not need to be specified. We see that there is a monotone non-decreasing pattern in the pattern specific means for odds ratios larger than one and monotone non-increasing pattern for odds ratios less than one. We also simulated data under many different true values for $\lambda_{0j}$ (not shown); for all scenarios examined, the odds ratios were monotone non-increasing (non-decreasing) based on the sign of $\delta$.

[Figure 1. Pattern specific means under RAM-SM with different values of OR = exp($\delta$) and other parameters based on the QUATRO data. The ORs in the plots are for a one standard deviation change in $Y$. OR = 1.36 is the estimate from the QUATRO data. Panels: OR = 1.36, OR = 0.5, OR = 0.67, OR = 1, OR = 1.5, OR = 2.]

To summarize, the RAM-SM and RAM-PMM try to exploit the repeated attempts in similar ways. However, the RAM-PMM allows for sensitivity analysis and more transparency in the model specification for the observed data and parameter identifiability; sensitivity analysis is an essential component of inference for missing data in randomized trials.

5. Analysis of QUATRO

The main analysis of QUATRO (Gray et al., 2006) was a complete-case analysis using a linear regression model, where individuals with missing data at baseline or follow-up were excluded. Specifically, the final quality-of-life score was regressed on randomized group, adjusted for the baseline score and center. This gave an estimated intervention effect of -0.4 (intervention minus control) with a 95% confidence interval of (-2.6, 1.8); negative values correspond to a harmful effect of intervention. These results do not allow for the missing data (although the sensitivity analyses in Gray et al. (2006) did do so).

There were more missing final quality-of-life scores in the treatment group (Table 1). Up to 9 attempts were made to collect the 52 week outcome for participants. However, given the sparsity of subjects with 3 to 9 attempts on each arm for our analysis, we merged those subjects into one pattern (see Table 2). We see an overall decreasing outcome mean with the number of attempts, which both the RAM-SM and the RAM-PMM try to exploit. For our analysis, the number of attempts, $R$, takes values in $\{1, 2, 3, 4\}$ (i.e., $K = 3$). $R = 4$ corresponds to the pattern that $Y$ is not observed even after all attempts. Individuals with $Y$ missing, but fewer than three attempts, have $R$ censored; there are 8 and 20 subjects censored respectively on the two arms (Table 2). Covariates $X$ are indicators of the four centers and the baseline response.

Table 1. QUATRO data: counts (outcome means) by number of attempts (k) and randomized group (Z)

Y observed after k attempts   Control (n = 205)   Treatment (n = 204)
k = 1                         77 (42.4)           73 (40.7)
k = 2                         94 (41.3)           90 (40.2)
k = 3                         7 (38.7)            7 (38.6)
k = 4                         7 (34.7)            1 (45.7)
k = 5                         3 (34.2)            3 (35.0)
k = 6                         2 (32.9)            0 (NA)
k = 7                         1 (40.7)            0 (NA)
k = 8                         1 (62.98)           1 (30.3)
k = 9                         0 (NA)              0 (NA)
Y not observed                13                  29

The results from the analysis performed by Jackson et al. (2010) using a RAM-SM are not directly comparable to those obtained in Section 5 because Jackson et al. modeled all nine attempts. Undertaking the modeling of all nine attempts in our pattern mixture framework is not feasible because only a small proportion of participants (28/409) receive more than three attempts.

5.1. Results

Ignorable analysis. We start with an ignorable analysis which assumes the missingness is MAR and does not explicitly model the number of attempts. For the MCMC algorithm, we ran iterations with a burn-in of 1000 iterations. We define $m(z, x) = E(Y \mid Z = z, X = x)$. We compute the treatment effect (marginalized over $X$) as

$$\theta = E(Y \mid Z = 1) - E(Y \mid Z = 0) = \int m(1, x)\, dF(x) - \int m(0, x)\, dF(x).$$

The marginal treatment effect $\theta$ has a posterior mean of -0.4 with 95% credible interval of (-2.5, 1.8); thus, the effect of the adherence therapy on 52 week QoL is minimal with a credible interval that overlaps zero, providing little (if any) evidence of a beneficial effect of the intervention. The treatment effect is smaller in magnitude than suggested by Table 2 because of the covariate adjustment (results not shown). These results are in excellent agreement with previous ignorable analyses that adjust for the same covariates used here by Jackson et al. (2010) and Gray et al. (2006).

RAM-PMM nonignorable analyses. For the MCMC algorithm, we again ran iterations with a burn-in of 1000 iterations. For the MC integration, we set $L = 2000$, which made the MC error negligible; increasing $L$ to 3000 made no substantive difference in the posterior mean of $\theta$. We vary $C$ between 0 (meaning that missing subjects are comparable to the last responders) and 3 (meaning that missing subjects differ from the last responders as much as the last responders differ from the first responders). However, other choices can be made, including negative values of $C$; we discuss this further in Section 6.
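The Monte Carlo integration behind $\theta$ in (5) can be imitated outside WinBUGS with L pseudo-units per arm. The sketch below is a simplified stand-in, not the authors' code: it uses a single normally distributed covariate, attempt hazards that do not depend on x, and averages the pattern-specific means rather than simulated outcomes. All names and parameter values are hypothetical, and the last entry of each `alphas` vector is taken to be the intercept already extrapolated for a chosen C.

```python
import numpy as np

def pattern_probabilities(hazards):
    """P(R = 1), ..., P(R = K + 1) from the K conditional success probabilities."""
    hazards = np.asarray(hazards, dtype=float)
    not_yet = np.concatenate(([1.0], np.cumprod(1.0 - hazards)))
    return np.append(not_yet[:-1] * hazards, not_yet[-1])

def mc_mean_outcome(alphas, beta1, hazards, x_mean, x_sd, L=2000, seed=0):
    """Monte Carlo estimate of E(Y | Z = z) for one arm.

    alphas: K + 1 intercepts, the last one being the extrapolated value.
    hazards: K success probabilities, assumed free of x in this sketch.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(x_mean, x_sd, size=L)          # pseudo-units' covariate draws
    probs = pattern_probabilities(hazards)
    k = rng.choice(len(probs), size=L, p=probs)   # 0-based pattern index
    mu = np.asarray(alphas, dtype=float)[k] + beta1 * x
    return float(mu.mean())

# Hypothetical parameter values for the two arms at one MCMC iteration.
control = mc_mean_outcome([42.0, 41.0, 38.0, 36.0], 0.5, [0.4, 0.7, 0.6], 0.0, 1.0)
treated = mc_mean_outcome([41.0, 40.0, 38.0, 36.5], 0.5, [0.35, 0.7, 0.5], 0.0, 1.0, seed=1)
theta_draw = treated - control
```

In a full analysis this calculation would be repeated at every posterior draw, so that the spread of `theta_draw` across draws reflects posterior uncertainty, while L is taken large enough that the Monte Carlo error within a draw is negligible.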
For all values of $C$ considered (see Table 3), representing different degrees of resistance, we observed a larger negative effect of adherence therapy on self-reported 52 week QoL (with $\theta$ ranging from -0.6 to -0.8) than in the ignorable analysis and wider credible intervals (that still cover zero). The estimated effect of adherence therapy increases in magnitude with $C$, which corresponds to those without QoL observed after the maximum number of attempts having poorer QoL than those observed after 3 or fewer attempts. This can also be seen by the slope of the priors, $\zeta_{10}$ and $\zeta_{11}$, with posterior means (95% credible intervals) of -1.9 (-4.8, 0.97) and -1.7 (-5.2, 1.7), respectively; note that both are negative with the slope for those on adherence therapy more extreme (doing worse as the number of attempts increases). We point out these slopes do not depend on the values of the sensitivity parameter ($C$) considered. Posterior means of all the parameters are given in the Supplementary Materials.

Table 2. QUATRO data: counts (outcome means) by number of attempts (k) and randomized group (Z) after merging 3 to 9 attempts into one pattern

Pattern                          Control (n = 205)   Treatment (n = 204)
Y observed, k = 1                77 (42.4)           73 (40.7)
Y observed, k = 2                94 (41.3)           90 (40.2)
Y observed, k = 3                21 (37.4)           12 (37.6)
Y missing, R below 4 (censored)  8                   20
Y missing, R = 4                 5                   9

A (standard) two-pattern model. We fit a standard two-pattern (outcome observed or not) mixture model under (nonignorable) MAR, which gave essentially the same results as the ignorable analysis (not shown). Finally, we fit an MNAR two-pattern model with sensitivity parameters, $\eta$, specified as in Section 3.3. The values of the sensitivity parameters $\eta$ for $C = 0, 1, 2, 3$ were $(\eta_0 = -2.96, \eta_1 = -2.3)$, $(\eta_0 = -4.8, \eta_1 = -3.9)$, $(\eta_0 = -6.8, \eta_1 = -5.6)$, and $(\eta_0 = -8.6, \eta_1 = -7.3)$, respectively. As expected, the results closely match the RAM-PMM analysis (in terms of posterior means), but with less uncertainty. For example, for $C = 1$, the posterior mean and credible interval for $\theta$ was -0.6 (-2.8, 1.6); for $C = 3$, -0.9 (-3.1, 1.4).
The decrease in uncertainty is expected since there are fewer patterns (and thus parameters). Overall, there was not strong evidence of a beneficial effect of the adherence therapy intervention under any of the PMM formulations.

Table 3. Posterior summaries for the RAM-PMM for $\theta$ and the treatment specific means. C is the sensitivity parameter

C   Parameter      Mean   95% CI
0   θ              -0.6   (-2.9, 1.7)
    E(Y | Z = 0)   40.9   (39.2, 42.5)
    E(Y | Z = 1)   40.2   (38.4, 42.1)
1   θ              -0.7   (-3.1, 1.8)
    E(Y | Z = 0)   40.7   (39.1, 42.4)
    E(Y | Z = 1)   40.0   (38.0, 42.1)
2   θ              -0.7   (-3.5, 2.0)
    E(Y | Z = 0)   40.5   (38.8, 42.3)
    E(Y | Z = 1)   39.8   (37.4, 42.1)
3   θ              -0.8   (-3.8, 2.2)
    E(Y | Z = 0)   40.4   (38.5, 42.2)
    E(Y | Z = 1)   39.6   (36.9, 42.2)

5.2. Comparing the Results to Those Obtained Using RAM-SMs

The Stata module alho, available at the website of the last author, was used to fit a RAM-SM that is conceptually similar to the RAM-PMM used here. This module requires complete covariates and so we used mean imputation to impute missing baseline quality of life scores (imputing missing baseline values in this way in randomized trials is not a source of bias (White and Thompson, 2005)). All participants who provided outcome data after more than 3 attempts were placed in the $R = 3$ pattern. Those who did not provide outcome data were placed in the $R = 4$ pattern, regardless of the number of attempts made to obtain their outcome data. A standard linear regression model was assumed for the final quality of life scores, where the covariates were the treatment group, the baseline quality of life scores and center indicators. The following model was assumed for the RAM-SM missing data mechanism,

$$\operatorname{logit}\{P(R = k \mid R \ge k, Z = z, X = x, Y = y)\} = \lambda_{0k} + \gamma z + \lambda_k x + \delta_1 y + \delta_2 z y, \qquad (7)$$

where the covariates $X$ are the baseline quality of life score and center effects. The parameter $\delta_2$ allows the relationship between the outcome and the number of attempts to differ by treatment arm (related to $\zeta_{1z}$ in the PMM); $\delta_2 = 0$ would roughly correspond to $\zeta_{1z}$ being the same for both treatments. This parameter is important because the treatment contrast is very sensitive to the missing data mechanism differing by randomized arm (White et al., 2007).

The alho module gave maximum likelihood estimates of $\hat{\delta}_1 = 0.041$ (0.008, 0.074) and $\hat{\delta}_2 = 0.026$ (-0.011, 0.063), and the estimated treatment effect, $\theta$, was -1.5 (-3.9, 0.8). From this RAM-SM we obtain some evidence that the final quality-of-life scores had an effect in model (7), where the estimates of $\delta_1$ and $\delta_2$ suggest that participants with better final quality-of-life scores are more likely to report them, particularly in the treatment group.
This is consistent with the slopes, $\zeta_{1j}$, of the RAM-PMM. The estimated treatment effect is notably larger in magnitude than that from the RAM-PMM analysis but still does not achieve statistical significance. In fact, the treatment effect for the RAM-SM would correspond to an implausibly extreme value of $C > 10$ in the RAM-PMM, which would likely lead us to question the validity of the RAM-SM results here. However, there are alternative explanations here, including whether a better fitting RAM-PMM (or RAM-SM) would necessitate such an extreme $C$ for roughly compatible results. Estimates of all the parameters can be found in the Supplementary Materials.

5.3. A Comparison of the RAM-SM and RAM-PMM Model Fits

We compare the relative fit of the RAM-SM and RAM-PMM using the BIC based on the observed data likelihood, BIC $= -2\,\text{loglik} + p \log n$, where $p$ is the number of parameters in each model (18 for the RAM-PMM and 16 for the RAM-SM); note that to compute the BIC for the RAM-PMM, we fit a simple frequentist (equivalent) model to the observed data using linear and logistic regressions with missing baseline data filled in using mean imputation (as was done in the RAM-SM analysis) and censoring treated as in the RAM-SM. The BIC for RAM-SM is and for the RAM-PMM is , indicating better fit for the RAM-SM here. However, we accept the better fit of the RAM-SM with caution here given the implicit extreme value of $C$ implied for the models considered and the ad hoc adjustments (described above) needed to make the BIC comparison here. We discuss this further in Section 6.

6. Conclusions/Discussion

We have proposed a pattern mixture model for a repeated attempt design that includes sensitivity parameters; we have also made comparisons with the existing selection models for repeated attempt designs. In the QUATRO study, we found minimal evidence of a significant effect of the intervention (adherence therapy) on 52 week self-reported QoL with the models considered.
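As a small illustration of the BIC used in Section 5.3, the sketch below computes BIC $= -2\,\text{loglik} + p \log n$ and the penalty difference between a model with 18 parameters and one with 16. The function name is hypothetical, and the sample size of 409 (the number of trial participants) is an assumption, since the value of $n$ entering the comparison is not stated in the text. No log likelihood values are supplied because none are available here.

```python
import math

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2 * loglik + p * log(n)."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# Penalty difference at equal log likelihood, assuming n = 409 (an assumption).
penalty_gap = bic(0.0, 18, 409) - bic(0.0, 16, 409)
```

Under these assumptions the two extra parameters of the larger model add `2 * log(409)` to its BIC, so it needs a correspondingly higher log likelihood to be preferred.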
For our analysis of the QUATRO data, the RAM-SM provided a better fit to the data than the RAM-PMM as measured by the BIC. In fact, the log likelihood for the SM was larger than for the PMM; this is likely due to subtle differences between the models, including the pattern-specific distributions implied by the selection model, the different forms of the missing data mechanism for the two approaches (cf. Section 4.1), and the ad hoc adjustments needed for the RAM-SM comparison (cf. Section 5.3), which used the Stata alho module. This is despite the fact, as seen in the simulations in Section 4, that the pattern-specific behavior of the RAM-SM is similar to the linear model used to extrapolate the missing pattern in the RAM-PMM. However, we recommend consideration of the RAM-PMM in general, as it allows for sensitivity analysis (unlike the RAM-SM), handles the missing data similarly to the RAM-SM in terms of a continuum of resistance, and does not suffer from a potentially large impact of modeling choices in the missing data mechanism on inferences (though this impact does not seem to be as large as in ordinary selection models (Ng, 2013)). We saw this undesirable sensitivity in that the RAM-SM results here corresponded to an unreasonably extreme value of C and were quite different from the related RAM-SMs fit to the QUATRO data in Jackson et al. (2012). In fact, the extreme value of C suggests that very extreme values of the QoL were needed to make the full data response look normal.

There are a variety of extensions to the current models. The choice of linearity for the functional form of the conditional means for extrapolation was made because there are only three patterns in the QUATRO data example. More complex forms for $h(k; \zeta)$ can easily be accommodated (though this choice is restricted by the number of attempts/patterns).
In the QUATRO data, we only consider positive values for C, based on the concept of a continuum of resistance, and we recommend the maximum number of attempts as a default upper bound for C. However, negative values can be accommodated as appropriate for other datasets (where, for example, it is thought that those unobserved after the maximum number of attempts are more similar to those observed after few attempts). We could also consider more complex forms for the mean and variance functions, $\mu(z, x, k)$ and $\sigma^2(z, x, k)$. We assumed a parametric form for the distribution of the covariates, X; more flexible specifications could easily be developed using Bayesian nonparametric models for the distribution of X (as well as for the model of the response, Y). For repeated attempts with sparse patterns, we can adapt the ideas from the work of Roy (2003) and Roy and Daniels (2008) to combine patterns in a data-dependent way. We are also working on proving the monotonicity of the pattern-specific means observed in Section 4.2. Finally, the approach here was developed for a randomized trial. Extension to observational studies would require some minor adjustments, including a definition of X as all required confounders as opposed to covariates that potentially impact missingness.

7. Supplementary Materials

The supplementary materials contain WinBUGS code for the models fit in Section 5.1 and parameter estimates for models fit in Sections and 5.2, and are available with this paper at the Biometrics website on Wiley Online Library.

Acknowledgements

MJD was partially supported by US NIH grants CA85295 and CA. DJ and IRW are employed by the UK Medical Research Council [Unit Programme number U ].

References

Alho, J. M. (1990). Adjusting for nonresponse bias using logistic regression. Biometrika 77.
Daniels, M. J. and Hogan, J. W. (2000). Reparameterizing the pattern mixture model for sensitivity analyses under informative dropout. Biometrics 56.
Daniels, M. J. and Hogan, J. W. (2008). Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis, volume 109 of Monographs on Statistics and Applied Probability. Boca Raton, FL: Chapman & Hall/CRC.
Gray, R., Leese, M., Bindman, J., Becker, T., Burti, L., David, A., et al. (2006). Adherence therapy for people with schizophrenia: European multicentre randomised controlled trial. The British Journal of Psychiatry 189.
Jackson, D., Mason, D., White, I. R., and Sutton, S. (2012). An exploration of the missing data mechanism in an internet based smoking cessation trial. BMC Medical Research Methodology 12, 157.
Jackson, D., White, I. R., and Leese, M. (2010). How much can we learn about missing data?: An exploration of a clinical trial in psychiatry. Journal of the Royal Statistical Society: Series A (Statistics in Society) 173.
Kenward, M. G. (1998). Selection models for repeated measurements with non-random dropout: An illustration of sensitivity. Statistics in Medicine 17.
Lin, I.-F. and Schaeffer, N. C. (1995). Using survey participants to estimate the impact of nonparticipation. Public Opinion Quarterly 59.
Little, R. J. (1993). Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association 88.
Little, R. J. (1995). Modeling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association 90.
Lunn, D., Best, N., Thomas, A., Wakefield, J., and Spiegelhalter, D. (2002). Bayesian analysis of population PK/PD models: General concepts and software. Journal of Pharmacokinetics and Pharmacodynamics 29.
National Research Council (2010). The Prevention and Treatment of Missing Data in Clinical Trials. Washington, D.C.: The National Academies Press.
Ng, Y. L. (2013). Using repeated contact attempts to move beyond the missing at random assumption. PhD thesis, University of Cambridge.
Roy, J. (2003). Modeling longitudinal data with nonignorable dropouts using a latent dropout class model. Biometrics 59.
Roy, J. and Daniels, M. J. (2008). A general class of pattern mixture models for nonignorable dropout with many possible dropout times. Biometrics 64.
White, I. R., Carpenter, J., Evans, S., and Schroter, S. (2007). Eliciting and using expert opinions about dropout bias in randomised controlled trials. Clinical Trials 4.
White, I. R. and Thompson, S. G. (2005). Adjusting for partially missing baseline measurements in randomized trials. Statistics in Medicine 24.
Wood, A. M., White, I. R., and Hotopf, M. (2006). Using number of failed contact attempts to adjust for non-ignorable nonresponse. Journal of the Royal Statistical Society: Series A (Statistics in Society) 169.

Received December Revised May Accepted May 2015.


More information

Bayesian Inference for Regression Parameters

Bayesian Inference for Regression Parameters Bayesian Inference for Regression Parameters 1 Bayesian inference for simple linear regression parameters follows the usual pattern for all Bayesian analyses: 1. Form a prior distribution over all unknown

More information

An Empirical Comparison of Multiple Imputation Approaches for Treating Missing Data in Observational Studies

An Empirical Comparison of Multiple Imputation Approaches for Treating Missing Data in Observational Studies Paper 177-2015 An Empirical Comparison of Multiple Imputation Approaches for Treating Missing Data in Observational Studies Yan Wang, Seang-Hwane Joo, Patricia Rodríguez de Gil, Jeffrey D. Kromrey, Rheta

More information

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary

More information

Advanced Quantitative Research Methodology, Lecture Notes: Research Designs for Causal Inference 1

Advanced Quantitative Research Methodology, Lecture Notes: Research Designs for Causal Inference 1 Advanced Quantitative Research Methodology, Lecture Notes: Research Designs for Causal Inference 1 Gary King GaryKing.org April 13, 2014 1 c Copyright 2014 Gary King, All Rights Reserved. Gary King ()

More information

Approximate Bayesian Computation

Approximate Bayesian Computation Approximate Bayesian Computation Michael Gutmann https://sites.google.com/site/michaelgutmann University of Helsinki and Aalto University 1st December 2015 Content Two parts: 1. The basics of approximate

More information

Should all Machine Learning be Bayesian? Should all Bayesian models be non-parametric?

Should all Machine Learning be Bayesian? Should all Bayesian models be non-parametric? Should all Machine Learning be Bayesian? Should all Bayesian models be non-parametric? Zoubin Ghahramani Department of Engineering University of Cambridge, UK zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/

More information

Joint Modeling of Longitudinal Item Response Data and Survival

Joint Modeling of Longitudinal Item Response Data and Survival Joint Modeling of Longitudinal Item Response Data and Survival Jean-Paul Fox University of Twente Department of Research Methodology, Measurement and Data Analysis Faculty of Behavioural Sciences Enschede,

More information

Physics 509: Bootstrap and Robust Parameter Estimation

Physics 509: Bootstrap and Robust Parameter Estimation Physics 509: Bootstrap and Robust Parameter Estimation Scott Oser Lecture #20 Physics 509 1 Nonparametric parameter estimation Question: what error estimate should you assign to the slope and intercept

More information

Inference for correlated effect sizes using multiple univariate meta-analyses

Inference for correlated effect sizes using multiple univariate meta-analyses Europe PMC Funders Group Author Manuscript Published in final edited form as: Stat Med. 2016 April 30; 35(9): 1405 1422. doi:10.1002/sim.6789. Inference for correlated effect sizes using multiple univariate

More information

Estimating and Using Propensity Score in Presence of Missing Background Data. An Application to Assess the Impact of Childbearing on Wellbeing

Estimating and Using Propensity Score in Presence of Missing Background Data. An Application to Assess the Impact of Childbearing on Wellbeing Estimating and Using Propensity Score in Presence of Missing Background Data. An Application to Assess the Impact of Childbearing on Wellbeing Alessandra Mattei Dipartimento di Statistica G. Parenti Università

More information

WU Weiterbildung. Linear Mixed Models

WU Weiterbildung. Linear Mixed Models Linear Mixed Effects Models WU Weiterbildung SLIDE 1 Outline 1 Estimation: ML vs. REML 2 Special Models On Two Levels Mixed ANOVA Or Random ANOVA Random Intercept Model Random Coefficients Model Intercept-and-Slopes-as-Outcomes

More information

The STS Surgeon Composite Technical Appendix

The STS Surgeon Composite Technical Appendix The STS Surgeon Composite Technical Appendix Overview Surgeon-specific risk-adjusted operative operative mortality and major complication rates were estimated using a bivariate random-effects logistic

More information

Comparison of Three Calculation Methods for a Bayesian Inference of Two Poisson Parameters

Comparison of Three Calculation Methods for a Bayesian Inference of Two Poisson Parameters Journal of Modern Applied Statistical Methods Volume 13 Issue 1 Article 26 5-1-2014 Comparison of Three Calculation Methods for a Bayesian Inference of Two Poisson Parameters Yohei Kawasaki Tokyo University

More information

Estimating complex causal effects from incomplete observational data

Estimating complex causal effects from incomplete observational data Estimating complex causal effects from incomplete observational data arxiv:1403.1124v2 [stat.me] 2 Jul 2014 Abstract Juha Karvanen Department of Mathematics and Statistics, University of Jyväskylä, Jyväskylä,

More information

A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL

A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL Discussiones Mathematicae Probability and Statistics 36 206 43 5 doi:0.75/dmps.80 A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL Tadeusz Bednarski Wroclaw University e-mail: t.bednarski@prawo.uni.wroc.pl

More information

Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design

Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design 1 / 32 Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design Changbao Wu Department of Statistics and Actuarial Science University of Waterloo (Joint work with Min Chen and Mary

More information

A class of latent marginal models for capture-recapture data with continuous covariates

A class of latent marginal models for capture-recapture data with continuous covariates A class of latent marginal models for capture-recapture data with continuous covariates F Bartolucci A Forcina Università di Urbino Università di Perugia FrancescoBartolucci@uniurbit forcina@statunipgit

More information

ANALYSING BINARY DATA IN A REPEATED MEASUREMENTS SETTING USING SAS

ANALYSING BINARY DATA IN A REPEATED MEASUREMENTS SETTING USING SAS Libraries 1997-9th Annual Conference Proceedings ANALYSING BINARY DATA IN A REPEATED MEASUREMENTS SETTING USING SAS Eleanor F. Allan Follow this and additional works at: http://newprairiepress.org/agstatconference

More information

Bayesian Networks in Educational Assessment

Bayesian Networks in Educational Assessment Bayesian Networks in Educational Assessment Estimating Parameters with MCMC Bayesian Inference: Expanding Our Context Roy Levy Arizona State University Roy.Levy@asu.edu 2017 Roy Levy MCMC 1 MCMC 2 Posterior

More information

Anders Skrondal. Norwegian Institute of Public Health London School of Hygiene and Tropical Medicine. Based on joint work with Sophia Rabe-Hesketh

Anders Skrondal. Norwegian Institute of Public Health London School of Hygiene and Tropical Medicine. Based on joint work with Sophia Rabe-Hesketh Constructing Latent Variable Models using Composite Links Anders Skrondal Norwegian Institute of Public Health London School of Hygiene and Tropical Medicine Based on joint work with Sophia Rabe-Hesketh

More information

[Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements

[Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements [Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements Aasthaa Bansal PhD Pharmaceutical Outcomes Research & Policy Program University of Washington 69 Biomarkers

More information

Problem Set 3: Bootstrap, Quantile Regression and MCMC Methods. MIT , Fall Due: Wednesday, 07 November 2007, 5:00 PM

Problem Set 3: Bootstrap, Quantile Regression and MCMC Methods. MIT , Fall Due: Wednesday, 07 November 2007, 5:00 PM Problem Set 3: Bootstrap, Quantile Regression and MCMC Methods MIT 14.385, Fall 2007 Due: Wednesday, 07 November 2007, 5:00 PM 1 Applied Problems Instructions: The page indications given below give you

More information

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Jonathan Gruhl March 18, 2010 1 Introduction Researchers commonly apply item response theory (IRT) models to binary and ordinal

More information

PACKAGE LMest FOR LATENT MARKOV ANALYSIS

PACKAGE LMest FOR LATENT MARKOV ANALYSIS PACKAGE LMest FOR LATENT MARKOV ANALYSIS OF LONGITUDINAL CATEGORICAL DATA Francesco Bartolucci 1, Silvia Pandofi 1, and Fulvia Pennoni 2 1 Department of Economics, University of Perugia (e-mail: francesco.bartolucci@unipg.it,

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Bayesian model selection: methodology, computation and applications

Bayesian model selection: methodology, computation and applications Bayesian model selection: methodology, computation and applications David Nott Department of Statistics and Applied Probability National University of Singapore Statistical Genomics Summer School Program

More information