Pattern Mixture Models for the Analysis of Repeated Attempt Designs
Biometrics 71, December 2015. DOI: /biom

Michael J. Daniels,1,* Dan Jackson,2,** Wei Feng,3,*** and Ian R. White2,****

1 Department of Integrative Biology, Department of Statistics & Data Sciences, University of Texas, Austin, TX
2 Medical Research Council Biostatistics Unit, Cambridge Institute of Public Health, Cambridge, U.K.
3 Department of Statistics, University of Florida, Gainesville, Florida
* mjdaniels@austin.utexas.edu; ** daniel.jackson@mrc-bsu.cam.ac.uk; *** fengwei@ufl.edu; **** ian.white@mrc-bsu.cam.ac.uk

Summary. In follow-up studies, it is not uncommon to make multiple attempts to collect a measurement after baseline. Recording whether or not these attempts succeed provides useful information for assessing the missing at random (MAR) assumption and for missing not at random (MNAR) modeling, because measurements from subjects who provide these data only after multiple failed attempts may differ from those of subjects who provide them after fewer attempts. This continuum of resistance to providing a measurement has hitherto been modeled in a selection model framework, where the outcome data are modeled jointly with the success or failure of the attempts given these outcomes. Here, we present a pattern mixture approach to modeling this type of data. We reanalyze the repeated attempt data from a trial that was previously analyzed using a selection model approach. Our pattern mixture model is more flexible and more transparent in terms of parameter identifiability than the models previously used for repeated attempt data, and it allows for sensitivity analysis. We conclude that our approach provides a fully viable alternative to the more established selection model.

Key words: Nonignorable missingness; Repeated attempt model; Selection model; Sensitivity analysis.

1. Introduction

It is not uncommon in follow-up studies to make multiple attempts to collect a measurement after baseline (e.g., Wood et al., 2006; Jackson et al., 2012). Here, we refer to this type of design as a repeated attempt design (RAD) and to the corresponding statistical models as repeated attempt models (RAM). Information about the multiple attempts made to obtain outcome data has the potential to provide information about the unobserved responses. This has been exploited in several papers in the context of selection models (Alho, 1990; Wood et al., 2006; Jackson et al., 2010, 2012). In these selection models, the information about the repeated attempts is thought to describe a continuum of resistance to providing data (Lin and Schaeffer, 1995). Evidence for this type of resistance can be informally assessed by tabulating numerical summaries of the outcome data, such as the mean outcome, by the number of attempts made to obtain these data. Assuming that a large value of the response is a favorable outcome, a negative association between the mean outcome and the number of attempts provides evidence that those with less favorable outcomes are more resistant to providing data. This would suggest that those who do not provide data after many failed attempts, and are therefore highly resistant, may have very unfavorable outcomes, which would almost certainly invalidate a statistical analysis that assumes the data are missing at random (MAR). The advantage of the existing selection models is that they exploit the RAD to identify all the parameters in the full data model. Selection models that describe the marginal probability that outcome data are observed, rather than the probability that each attempt to obtain outcome data is successful, are very sensitive to outliers and to the distributional assumptions made, because these models are very weakly identified (Kenward, 1998).
However, the selection model becomes much more strongly identified when it is used to describe each attempt, in situations where multiple attempts to obtain data are made (Jackson et al., 2012). This identifiability can also be considered a disadvantage of the RAM, because this approach does not allow the type of sensitivity parameter defined by Daniels and Hogan (2008). Such parameters are recommended for the analysis of missing data (National Research Council, 2010) and, importantly, do not affect the fit of the model to the observed data when varied. As such, they allow examination of the sensitivity of inferences to unverifiable assumptions about the missingness. To our knowledge, there has been no previous work using a pattern mixture model for repeated attempt data. Pattern mixture models (Little, 1993, 1995) have been advocated as an approach to missing data that easily allows for sensitivity parameters, owing to their direct connection to the extrapolation factorization (Daniels and Hogan, 2000, 2008). Here we describe how this type of model can be adapted to incorporate the repeated attempt information; as such, we propose a pattern mixture model RAM as a competitor to the selection model RAM.

© The Authors. Biometrics published by Wiley Periodicals, Inc. on behalf of International Biometric Society. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

We motivate this work with the QUATRO trial (Gray et al., 2006), a single-blind, multicenter randomized controlled trial of the effectiveness of adherence therapy for participants with schizophrenia. The trial included 409 participants in four centers: Amsterdam (the Netherlands), Leipzig (Germany), London (United Kingdom), and Verona (Italy). Participants were recruited from June 2002 to October 2003 from people under the care of mental health services and were individually randomized to receive either adherence therapy (intervention) or health education (control). The inclusion and exclusion criteria are described in detail by Gray et al. (2006). Assessments were undertaken at baseline and at a follow-up of 52 weeks. The objective of this multicenter trial was to assess the impact of adherence therapy on the self-reported quality of life of people with severe mental illness. The investigators made as many as nine attempts to collect the quality-of-life outcome, but some individuals' responses still could not be collected. In the treatment and control arms, 29 of 204 (14%) and 13 of 205 (6%) subjects, respectively, failed to provide outcome data at the end of the trial. This imbalance in the amount of missing data by treatment arm, together with the concern that those with less favorable outcomes may be less likely to provide data, motivated both the previous MNAR modeling of Jackson et al. (2010) and the methods described here. The intuition underlying our modeling is that we could produce treatment arm-specific plots of the mean quality-of-life score against the number of attempts.
To impute the outcomes for those who do not provide outcome data, we could fit regression lines to each of these plots and then extrapolate, where the extent of the extrapolation reflects how resistant the nonresponders are thought to be. We develop a statistically principled and more general version of this idea that accounts for all sources of uncertainty in a Bayesian pattern mixture model.

We first briefly review the use of selection models for the RAD in Section 2. In Section 3, we introduce a pattern mixture model for this design that allows intuitive sensitivity parameters. We show connections between the selection model and pattern mixture model formulations in Section 4, and we reanalyze the QUATRO data using these models in Section 5. Section 6 contains a discussion and open issues.

2. Notation, Target of Inference, and Review of Repeated Attempt Models

2.1. Notation

The outcome of interest (here, self-reported quality of life (QoL) at 52 weeks) is denoted Y, and the set of baseline covariates (here, baseline QoL and center) is denoted X. R denotes the number of attempts until the outcome is successfully collected; we assume up to K attempts, where R = K + 1 corresponds to the outcome not being collected after the maximum number of attempts. We assume a randomized trial where Z denotes randomization to the intervention of interest (here, adherence therapy); an extension to an observational study is mentioned in Section 6.

2.2. Inferential Goal

The quantity of interest, θ, is the treatment effect on the means, unconditional on X and R:
$$\theta = E(Y \mid Z = 1) - E(Y \mid Z = 0). \tag{1}$$
In the existing RAM-SM (described in Section 2.3), this parameter is specified directly in the model; see equation (3) below. In the RAM-PMM (introduced in Section 3), it is not specified directly; computing it requires evaluating the double integral in (5) below.
In the QUATRO data, θ is the effect of adherence therapy on self-reported QoL at 52 weeks.

2.3. Repeated Attempt Selection Model (RAM-SM)

The RAM-SM was originally proposed by Alho (1990). A logistic regression is used to model the probability that each attempt to obtain outcome data is successful, and an MNAR model is obtained by including the (possibly missing) outcome as a covariate in this logistic regression. The joint likelihood of the response and the repeated attempt data is obtained by modeling the marginal distribution of the outcome Y together with the missing data mechanism of the RAM (Wood et al., 2006); the latter describes the probability that the attempts are successful given Y. To date, the RAM has only been developed for an incomplete univariate outcome. An example of the missing data mechanism in a RAM is the model
$$\operatorname{logit} P(R = k \mid R \ge k, Z = z, X = x, Y = y) = \lambda_{0k} + \gamma_k z + \lambda_k^{\top} x + \delta y, \tag{2}$$
where R = k if the kth attempt is successful. We assume that all attempts are independent, so $\lambda_{0k}$ is the log odds, for participants with X = 0, Y = 0, and Z = 0, that the kth attempt is successful given no success on the previous k − 1 attempts. Model (2) can be thought of as a discrete-time survival model, or a stratified logistic regression, in which we also handle the unobserved outcomes (Jackson et al., 2012). The term $\lambda_k^{\top} x$ allows the probability that attempts are successful to depend on covariates, where the covariate effects may or may not depend on k. The term δy permits MNAR models; the MAR assumption is equivalent to δ = 0. The key identifying assumption is that the effect of Y in (2) is constant across all attempts, that is, δ does not depend on k. Jointly modeling Y, and then the attempts given Y using (2), is a selection model approach.
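The mechanism (2) can be made concrete with a short Python sketch. It assumes a single scalar covariate x and made-up parameter values (none are QUATRO estimates), and the helper names are our own. The attempt-level hazards are chained into pattern probabilities P(R = k | z, x, y), k = 1, ..., K + 1. Y is then integrated out of N{μ(z, x), σ²} by Gauss-Hermite quadrature, which is the kind of one-dimensional numerical integration this section mentions for the observed data likelihood. The sketch also returns the pattern-specific means E(Y | R = k, z, x) that Section 4.2 examines empirically.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

# Illustrative helpers (names are our own, not from the paper or any package).


def expit(u):
    return 1.0 / (1.0 + np.exp(-u))


def pattern_probs(y, z, x, lam0, gam, lam, delta):
    """P(R = k | Z=z, X=x, Y=y) for k = 1..K+1 under hazard model (2).

    lam0, gam, lam: length-K arrays of attempt-specific coefficients
    (scalar covariate x assumed); delta: effect of y, common to all attempts.
    """
    hazard = expit(lam0 + gam * z + lam * x + delta * y)        # P(R=k | R>=k, ...)
    at_risk = np.concatenate(([1.0], np.cumprod(1.0 - hazard)))  # P(R>=k), k=1..K+1
    return np.append(at_risk[:-1] * hazard, at_risk[-1])        # last entry: never observed


def quadrature(mu, sigma, n_nodes=40):
    """Gauss-Hermite nodes and normalized weights for integrating against N(mu, sigma^2)."""
    t, w = hermgauss(n_nodes)
    return mu + np.sqrt(2.0) * sigma * t, w / np.sqrt(np.pi)


def marginal_probs_and_means(z, x, mu, sigma, params, n_nodes=40):
    """Integrate y out: returns P(R=k | z, x) and E(Y | R=k, z, x) for k = 1..K+1."""
    ys, ws = quadrature(mu, sigma, n_nodes)
    probs = np.array([pattern_probs(y, z, x, *params) for y in ys])  # (nodes, K+1)
    p_marg = ws @ probs
    means = (ws * ys) @ probs / p_marg
    return p_marg, means


# Hypothetical values for K = 3 attempts (not estimates from QUATRO).
lam0 = np.array([-0.4, 0.6, 0.3])
gam = np.zeros(3)
lam = np.full(3, 0.1)
for delta in (0.0, 0.04):          # delta = 0 is MAR; delta > 0 is MNAR
    p, m = marginal_probs_and_means(z=1, x=0.0, mu=40.0, sigma=8.0,
                                    params=(lam0, gam, lam, delta))
    print(delta, np.round(p, 3), np.round(m, 2))
```

With δ = 0, every pattern mean equals μ. With δ > 0, the means fall as k increases, because better outcomes make every attempt more likely to succeed.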
If only one attempt is made for all subjects, the RAM reduces to a standard selection model. The model specified for the response Y is
$$Y \mid Z = z, X = x \sim N\{\mu(z, x), \sigma^2(z, x)\}, \tag{3}$$
where, in the example, $\mu(z, x) = \beta_0 + \theta z + \beta^{\top} x$ and $\sigma^2(z, x) = \sigma^2$. Alho (1990) originally proposed using a modified likelihood to fit the RAM, but Wood et al. (2006) subsequently developed two new estimation methods. The first uses the EM algorithm to fit the model by full likelihood, and the second uses a Bayesian approach and the software WinBUGS (Lunn et al., 2002). Jackson et al. (2012) also used the full likelihood to fit the RAM, but without the EM algorithm. It can be difficult to assess the fit of this selection model RAM to the observed data, because the observed data likelihood is not available in closed form (though it can be evaluated numerically, using one-dimensional Gaussian quadrature to integrate out the missing data). Previous work fitting a RAM to the QUATRO data used WinBUGS (see Jackson et al. (2010) for full details). Briefly, no strong evidence that the data are MNAR was found. As such, the MAR assumption made by Gray et al. (2006) does not appear unreasonable, though the estimated treatment effect was slightly reduced (and it remained statistically nonsignificant).

3. Repeated Attempt Pattern Mixture Model (RAM-PMM)

3.1. Model

In our pattern mixture formulation of the RAM, denoted RAM-PMM, the patterns are defined by the values of R. For pattern R = k (k = 1, ..., K + 1) and arm z = 0, 1, we consider the following model for the conditional distribution of Y:
$$Y \mid Z = z, X = x, R = k \sim N\{\mu(z, x, k), \sigma^2(z, x, k)\},$$
so that the mean and variance of the outcome may depend on covariates and patterns. In all our modeling, we assume the particular function, for k ≤ K,
$$\mu(z, x, k) = \alpha_z^{(k)} + \beta_1^{\top} x, \tag{4}$$
and a constant variance, $\sigma^2(z, x, k) = \sigma^2$. More complex forms of the mean and variance are possible with richer data; we return to this issue in the discussion.
So far, we have said nothing about the identification of the distribution of the missing data, specifically, μ(z, x, K + 1). Different assumptions about μ(z, x, K + 1), described in Section 3.2, will allow us to explore a range of possibilities in a sensitivity analysis. We then specify a model for the conditional distribution of the pattern indicator (similar to the RAM-SM specification in equation (2), but not conditioning on Y). An example of this type of model is
$$\operatorname{logit}\{\pi_k(x)\} = \lambda_{0k} + \lambda_z + \lambda_x^{\top} x,$$
where $\pi_k(x) = P(R = k \mid R \ge k, Z = z, X = x)$. Finally, we specify a model for [X | Z] = [X] (by randomization). This factorization respects the fact, sometimes overlooked in pattern mixture models, that the distribution of the baseline outcome (which is included in X) does not depend on Z. This model can be specified parametrically or using Bayesian nonparametrics (more in the discussion). Here, we assume a parametric model; in our example, X is composed of the baseline outcome, $Y_0$, and indicators of center, and we assume that within each center the baseline outcome is normally distributed with a center-specific mean and variance. It is easy to assess the fit of this model to the observed data, because the observed data distribution is modeled directly in the pattern mixture framework. The quantity of interest, θ, given in (1), can be written as
$$\theta = E[Y \mid Z = 1] - E[Y \mid Z = 0] = \iint \mu(1, x, k)\, dF(k \mid x, z = 1)\, dF(x) - \iint \mu(0, x, k)\, dF(k \mid x, z = 0)\, dF(x), \tag{5}$$
where F(k | x, z) and F(x) were specified above. All integrals are double integrals over x and k, where the integral over k is a sum over its discrete values. The parameter θ can be computed using Monte Carlo (MC) integration in WinBUGS; details are given in Section 3.4.

3.2. Priors

In the proposed pattern mixture model, not all parameters are identified by the observed data. One of the main contributions of our approach is the form of the priors for the unidentified parameters, described below.
First we outline priors for the identified parameters.

Identified parameters. The parameters indexing (and identified by) the observed data are $(\{\alpha_z^{(k)} : k = 1, \ldots, K\}, \beta_1, \sigma, \lambda)$. For the regression parameters, we use diffuse normal priors. For the variance component, σ², we use a vague inverse gamma prior.

Unidentified parameters. The parameters $\alpha_z^{(K+1)}$ are not identified by the observed data (or by modeling assumptions) and would be classified as sensitivity parameters (Daniels and Hogan, 2008). To identify these parameters, we exploit the repeated attempt design and assume a functional relationship between the intercepts for the observed outcomes, $\{\alpha_z^{(k)} : k = 1, \ldots, K\}$, and the number of attempts, k. In particular, we specify a prior for $\alpha_z^{(K+1)}$ conditional on $\bar\alpha_z^{(K)} = (\alpha_z^{(1)}, \ldots, \alpha_z^{(K)})^{\top}$, that is, $p(\alpha_z^{(K+1)} \mid \bar\alpha_z^{(K)})$. We center this prior at its prediction based on implicitly fitting the regression
$$\alpha_z^{(k)} = h(k; \zeta_z) + \epsilon_k, \quad k = 1, \ldots, K.$$
In what follows, we set $h(k; \zeta_z) = \zeta_{0z} + \zeta_{1z} k$, k = 1, ..., K (a linear regression with $\alpha_z^{(k)}$ as the dependent variable and the pattern k as the independent variable), and compute the least squares estimate of $(\zeta_{0z}, \zeta_{1z})$ to obtain
$$\alpha_z^{(K+1)} \mid \bar\alpha_z^{(K)} \sim N\{\hat\zeta_{0z} + \hat\zeta_{1z}(K + C), \tau^2\},$$
where the $\hat\zeta_{jz}$ are functions of $\{\alpha_z^{(1)}, \ldots, \alpha_z^{(K)}\}$. The linear relationship of $\alpha_z^{(k)}$ in k, for k = 1, ..., K, is the key assumption. We assume that the intercepts in (4) follow a linear trend over the patterns that provide outcome data, and that we can extrapolate from this trend, following the intuition described in the introduction. Here the sensitivity parameters are C and τ. C represents how far we should extrapolate the linear trend to describe the missing outcome data (i.e., how resistant those who have not provided outcome data by the Kth attempt are). τ represents our uncertainty about the precision of the extrapolation. In what follows, τ is fixed at zero and we focus on C. This approach puts no modeling restrictions on the observed data, but still uses the information from the repeated attempt design in an intuitive manner. It is also not necessary to assume a linear form for $h(k; \zeta_z)$; any functional relationship is possible, subject to having enough patterns/attempts. We return to this in the discussion.

3.3. Connections to Priors and Sensitivity Parameters in a Two-Pattern Mixture Model

It is not uncommon for information on the number of attempts to be collected but not used in the modeling. The pattern mixture approach in that situation would have only two patterns, corresponding to Y being observed or missing. Implicitly, the first pattern would be formed by combining the first K patterns of our RAM-PMM (those in which the outcome was observed) into a single pattern. We can define the corresponding intercept for this combined pattern, as a function of the parameters in our repeated attempt model, to be $\alpha_z^{*}$, where
$$\alpha_z^{*} = \sum_{k=1}^{K} p_{zk}\, \alpha_z^{(k)}$$
and $p_{zk} = E_x\{P(R = k \mid Z = z, X = x)\}$; the quantity P(R = k | z, x) can be computed recursively from the $\pi_k$'s.
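The prior centering above is a few lines of arithmetic, as are the two-pattern quantities $\alpha_z^{*}$ and $\eta_z$ (equation (6) in the next paragraph). The Python sketch below is illustrative only. It plugs in the raw control-arm pattern means and counts from Table 2 (Section 5) as stand-ins for the covariate-adjusted intercepts $\alpha_0^{(k)}$ and the weights $p_{0k}$, so its numbers will not reproduce the η values reported in Section 5. It rescales the weights to sum to one over the observed patterns, which is our own assumption for forming a combined-pattern mean. It takes τ = 0, as in the text, so the prior for $\alpha_z^{(K+1)}$ collapses to its center. The function names are hypothetical.

```python
import numpy as np

# Hypothetical helper names; inputs below are illustrative, not posterior draws.


def prior_center(alpha_obs, C):
    """zeta0_hat + zeta1_hat * (K + C): least-squares line through (k, alpha^(k)), k = 1..K."""
    alpha_obs = np.asarray(alpha_obs, dtype=float)
    K = alpha_obs.size
    zeta1, zeta0 = np.polyfit(np.arange(1, K + 1), alpha_obs, deg=1)  # slope comes first
    return zeta0 + zeta1 * (K + C)


def combined_intercept(alpha_obs, weights):
    """alpha*: weighted mean of the observed-pattern intercepts (weights rescaled to sum to 1)."""
    w = np.asarray(weights, dtype=float)
    return float(np.dot(w / w.sum(), alpha_obs))


def implied_eta(alpha_obs, weights, C):
    """Equation (6): eta = zeta0_hat + zeta1_hat * (K + C) - alpha*."""
    return prior_center(alpha_obs, C) - combined_intercept(alpha_obs, weights)


# Raw control-arm means and counts for k = 1, 2, 3 (3-9 merged) from Table 2.
alpha_ctrl = [42.4, 41.3, 37.4]
n_ctrl = [77, 94, 21]
for C in range(4):
    print(C, round(prior_center(alpha_ctrl, C), 2), round(implied_eta(alpha_ctrl, n_ctrl, C), 2))
```

With these inputs, the fitted slope is exactly (37.4 − 42.4)/2 = −2.5, so each unit increase in C lowers both the prior center and η by 2.5.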
The typical approach in a two-pattern model would be to specify the conditional mean of $\alpha_z^{(K+1)}$ as $E(\alpha_z^{(K+1)} \mid \alpha_z^{*}) = \alpha_z^{*} + \eta_z$, where $\eta_z$ is a sensitivity parameter. The value of $\eta_z$ implied by the model and priors of our RAM-PMM is
$$\eta_z = \hat\zeta_{0z} + \hat\zeta_{1z}(K + C) - \alpha_z^{*}. \tag{6}$$
If the investigator wanted to carry out an analysis with a two-pattern model, (6) could help calibrate $\eta_z$ based on the repeated attempt model. We do this in Section 5.

3.4. Computations in WinBUGS

Models can be specified, and the posterior sampled using MCMC, in WinBUGS. For continuous components in X, the integral in (5) can be computed in WinBUGS using the following trick: (i) create L units with a missing outcome and missing covariates; (ii) compute the mean of these L outcomes for each z at each iteration. This trick implicitly performs a Monte Carlo (MC) integration over the distribution of X at each iteration. L is chosen so that the MC error for the quantities of interest is negligible. Corresponding code can be found in the Supplementary Materials.

4. Some Connections Between the Parameters of RAM-PMM and RAM-SM

4.1. SM Corresponding to the RAM-PMM and Direct Theoretical Connections

In the following, we describe the selection model derived from the RAM-PMM. For simplicity, we consider only one covariate and suppress the dependence of the intercept on z. In addition, we assume that the coefficient of this covariate is constant across patterns, as is the residual variance:
$$Y \mid x, R = k \sim N(\alpha^{(k)} + \beta_1 x, \sigma^2), \quad k = 1, \ldots, K + 1.$$
The implied selection model has the link function
$$\log \frac{P(R = k + 1 \mid x, y)}{P(R = k \mid x, y)}.$$
The main observation here is that the implied selection model uses a different link function from the RAM-SM: an adjacent-category logit rather than the continuation-ratio logit in (2). It also implies, obviously, a different distribution for Y (i.e., a mixture of normals). There are also some direct theoretical connections between the RAM-SM and RAM-PMM.
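The link function above can be written out explicitly under the equal-variance normal model of this subsection. Write $m_k = \alpha^{(k)} + \beta_1 x$ and $p_k(x) = P(R = k \mid x)$. Bayes' rule then gives the following. This is our own short derivation, offered as a check rather than taken from the original analysis:
$$
\log \frac{P(R = k+1 \mid x, y)}{P(R = k \mid x, y)}
= \log \frac{p_{k+1}(x)}{p_k(x)} + \frac{(y - m_k)^2 - (y - m_{k+1})^2}{2\sigma^2}
= \log \frac{p_{k+1}(x)}{p_k(x)} + \frac{\alpha^{(k+1)} - \alpha^{(k)}}{\sigma^2}\left(y - \beta_1 x - \frac{\alpha^{(k)} + \alpha^{(k+1)}}{2}\right).
$$
The implied adjacent-category logit is therefore linear in y, with slope $(\alpha^{(k+1)} - \alpha^{(k)})/\sigma^2$. If the observed intercepts lie exactly on the line $\zeta_0 + \zeta_1 k$ and τ = 0, this slope is $\zeta_1/\sigma^2$ between adjacent observed patterns and $\zeta_1 C/\sigma^2$ between patterns K and K + 1. Every slope vanishes when $\zeta_1 = 0$.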
Setting the parameter $\zeta_1$ equal to zero (so that $\alpha^{(k)}$ does not depend on k) implies MAR for the RAM-PMM. In that case, each pattern has the same normal distribution, so the marginal distribution of y is normal rather than a mixture of normals. This model is equivalent to the RAM-SM with δ = 0.

4.2. Empirical Connections

We also assess the connections between the RAM-SM and the RAM-PMM empirically. We simulated data under the RAM-SM (using parameter values based on the QUATRO data) to assess how the value of δ affects the derived pattern-specific conditional means, $\alpha^{(k)}$, as a function of k (see Figure 1). As such, we do not actually need to fit the RAM-PMM, and the sensitivity parameter C does not need to be specified. The pattern-specific means are monotone non-increasing in k for odds ratios larger than one and monotone non-decreasing for odds ratios less than one. We also simulated data under many different true values of $\lambda_{0j}$ (not shown); in all scenarios examined, the pattern-specific means were monotone non-increasing for δ > 0 and non-decreasing for δ < 0. To summarize, the RAM-SM and RAM-PMM try to exploit the repeated attempts in similar ways. However, the RAM-PMM allows for sensitivity analysis and offers more transparency in the model specification for the observed data and in parameter identifiability; sensitivity analysis is an essential component of inference with missing data in randomized trials.

5. Analysis of QUATRO

The main analysis of QUATRO (Gray et al., 2006) was a complete-case analysis using a linear regression model, in which individuals with missing data at baseline or follow-up were excluded. Specifically, the final quality-of-life score was regressed on randomized group, adjusting for the baseline score and center. This gave an estimated intervention effect of −0.4
(intervention minus control) with a 95% confidence interval of (−2.6, 1.8); negative values correspond to a harmful effect of the intervention. These results do not allow for the missing data (although the sensitivity analyses in Gray et al. (2006) did so).

Figure 1. Pattern-specific means under the RAM-SM for different values of OR = exp(δ) (panels: OR = 0.5, 0.67, 1, 1.36, 1.5, 2), with the other parameters based on the QUATRO data. The ORs are for a one standard deviation change in Y; OR = 1.36 is the estimate from the QUATRO data.

There were more missing final quality-of-life scores in the treatment group (Table 1). Up to 9 attempts were made to collect the 52-week outcome. However, given the sparsity of subjects with 3 to 9 attempts in each arm, we merged those subjects into one pattern for our analysis (see Table 2). The outcome mean decreases overall with the number of attempts, a pattern that both the RAM-SM and the RAM-PMM try to exploit. For our analysis, the number of attempts, R, takes values in {1, 2, 3, 4} (i.e., K = 3); R = 4 corresponds to the pattern in which Y is not observed even after all attempts. Individuals with Y missing but fewer than three attempts have R censored; there are 8 such subjects in the control arm and 20 in the treatment arm (Table 2). The covariates X are indicators of the four centers and the baseline response. The results of the RAM-SM analysis by Jackson et al. (2010) are not directly comparable to those obtained here, because Jackson et al. modeled all nine attempts. Modeling all nine attempts in our pattern mixture framework is not feasible, because only a small proportion of participants (28/409) received more than three attempts.

Table 1
QUATRO data: counts (outcome means) by number of attempts (k) and randomized group (Z)

Number of attempts (k)          Control (n = 205)    Treatment (n = 204)
Y observed after k attempts:
  1                             77 (42.4)            73 (40.7)
  2                             94 (41.3)            90 (40.2)
  3                              7 (38.7)             7 (38.6)
  4                              7 (34.7)             1 (45.7)
  5                              3 (34.2)             3 (35.0)
  6                              2 (32.9)             0 (NA)
  7                              1 (40.7)             0 (NA)
  8                              1 (62.98)            1 (30.3)
  9                              0 (NA)               0 (NA)
Y not observed                  13                   29

5.1. Results

Ignorable analysis. We start with an ignorable analysis, which assumes the missingness is MAR and does not explicitly model the number of attempts. For the MCMC algorithm, we ran … iterations with a burn-in of 1000 iterations. We define m(z, x) = E(Y | Z = z, X = x) and compute the treatment effect (marginalized over X) as
$$\theta = E(Y \mid Z = 1) - E(Y \mid Z = 0) = \int m(1, x)\, dF(x) - \int m(0, x)\, dF(x).$$
The marginal treatment effect θ has a posterior mean of −0.4 with a 95% credible interval of (−2.5, 1.8). The effect of adherence therapy on 52-week QoL is thus minimal, with a credible interval that overlaps zero, providing little (if any) evidence of a beneficial effect of the intervention. The treatment effect is smaller in magnitude than suggested by Table 2 because of the covariate adjustment (results not shown). These results agree closely with previous ignorable analyses that adjust for the same covariates (Jackson et al., 2010; Gray et al., 2006).

RAM-PMM nonignorable analyses. For the MCMC algorithm, we again ran … iterations with a burn-in of 1000 iterations. For the MC integration, we set L = 2000, which made the MC error negligible; increasing L to 3000 made no substantive difference to the posterior mean of θ. We vary C between 0 (meaning that missing subjects are comparable to the last responders) and 3 (meaning that missing subjects differ from the last responders as much as the last responders differ from the first responders). Other choices can be made, including negative values of C; we discuss this further in Section 6.
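Pooling attempts 3 to 9 into a single pattern can be checked directly from the counts and means in Table 1, since the pooled mean is the count-weighted average of the cell means. The following is a minimal Python sketch; the function name is our own, and empty cells (NA in Table 1) are skipped.

```python
def pool_attempts(cells, first, last):
    """Pool (count, mean) cells for attempts first..last (1-based, inclusive).

    Returns the pooled count and the count-weighted mean; cells with a zero
    count (mean NA) contribute nothing. Hypothetical helper for illustration.
    """
    kept = [(n, m) for n, m in cells[first - 1:last] if n > 0]
    total = sum(n for n, _ in kept)
    return total, sum(n * m for n, m in kept) / total


# (count, mean) for k = 1..9 observed attempts, from Table 1; None marks NA.
control = [(77, 42.4), (94, 41.3), (7, 38.7), (7, 34.7), (3, 34.2),
           (2, 32.9), (1, 40.7), (1, 62.98), (0, None)]
treatment = [(73, 40.7), (90, 40.2), (7, 38.6), (1, 45.7), (3, 35.0),
             (0, None), (0, None), (1, 30.3), (0, None)]

for arm, cells in (("control", control), ("treatment", treatment)):
    n, mean = pool_attempts(cells, 3, 9)
    print(arm, n, round(mean, 1))
```

The pooled values, 21 (37.4) for control and 12 (37.6) for treatment, match the merged column of Table 2.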
Table 2
QUATRO data: counts (outcome means) by number of attempts (k) and randomized group (Z) after merging attempts 3 to 9 into one pattern

                       Y observed                                     Y missing
                       k = 1        k = 2        k = 3 (3 to 9)       R < 4 (censored)    R = 4
Control (n = 205)      77 (42.4)    94 (41.3)    21 (37.4)            8                   5
Treatment (n = 204)    73 (40.7)    90 (40.2)    12 (37.6)            20                  9

For all values of C considered (see Table 3), representing different degrees of resistance, we observed a larger negative effect of adherence therapy on self-reported 52-week QoL than in the ignorable analysis (θ ranging from −0.6 to −0.8), with wider credible intervals that still cover zero. The estimated negative effect of adherence therapy grows in magnitude with C, which corresponds to those whose QoL was not observed after the maximum number of attempts having poorer QoL than those observed after 3 or fewer attempts. This can also be seen in the slopes of the priors, $\zeta_{10}$ and $\zeta_{11}$, with posterior means (95% credible intervals) of −1.9 (−4.8, 0.97) and −1.7 (−5.2, 1.7), respectively. Both are negative, with the slope for those on adherence therapy more extreme (doing worse as the number of attempts increases). These slopes do not depend on the values of the sensitivity parameter C considered. Posterior means of all the parameters are given in the Supplementary Materials.

A (standard) two-pattern model. We fit a standard two-pattern (outcome observed or not) mixture model under MAR, which gave essentially the same results as the ignorable analysis (not shown). Finally, we fit an MNAR two-pattern model with sensitivity parameters $\eta_z$ specified as in Section 3.3. The values of the sensitivity parameters for C = 0, 1, 2, 3 were ($\eta_0$ = −2.96, $\eta_1$ = −2.3), ($\eta_0$ = −4.8, $\eta_1$ = −3.9), ($\eta_0$ = −6.8, $\eta_1$ = −5.6), and ($\eta_0$ = −8.6, $\eta_1$ = −7.3), respectively. As expected, the posterior means closely match those of the RAM-PMM analysis, but with less uncertainty. For example, for C = 1, the posterior mean and credible interval for θ were −0.6 (−2.8, 1.6); for C = 3, they were −0.9 (−3.1, 1.4).
The decrease in uncertainty is expected, since there are fewer patterns (and thus fewer parameters). Overall, there was no strong evidence of a beneficial effect of the adherence therapy intervention under any of the PMM formulations.

Table 3
Posterior summaries for the RAM-PMM for θ and the treatment-specific means (C is the sensitivity parameter)

C    Parameter        Mean    95% CI
0    θ                −0.6    (−2.9, 1.7)
     E(Y | Z = 0)     40.9    (39.2, 42.5)
     E(Y | Z = 1)     40.2    (38.4, 42.1)
1    θ                −0.7    (−3.1, 1.8)
     E(Y | Z = 0)     40.7    (39.1, 42.4)
     E(Y | Z = 1)     40.0    (38.0, 42.1)
2    θ                −0.7    (−3.5, 2.0)
     E(Y | Z = 0)     40.5    (38.8, 42.3)
     E(Y | Z = 1)     39.8    (37.4, 42.1)
3    θ                −0.8    (−3.8, 2.2)
     E(Y | Z = 0)     40.4    (38.5, 42.2)
     E(Y | Z = 1)     39.6    (36.9, 42.2)
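Each entry of Table 3 summarizes, over posterior draws, an MC approximation to (5). The Python sketch below shows that approximation for a single parameter draw, under several assumptions of our own. It uses one standardized continuous covariate drawn from N(0, 1) in place of the center-specific baseline model. It applies the hazard model of Section 3.1 with the arm effect folded into arm-specific intercepts, sets τ = 0, and uses made-up parameter values. The WinBUGS trick of Section 3.4 simulates outcomes for L pseudo-units. This version instead averages the exact conditional mean Σ_k P(R = k | x, z) μ(z, x, k) over L covariate draws, which targets the same quantity. All names are hypothetical.

```python
import numpy as np

# Hypothetical helpers for one posterior draw; not the authors' WinBUGS code.


def expit(u):
    return 1.0 / (1.0 + np.exp(-u))


def pattern_probs_given_x(x, lam0, lam_x):
    """P(R = k | X = x) for k = 1..K+1; x has shape (L,), result has shape (L, K+1)."""
    haz = expit(np.asarray(lam0)[None, :] + lam_x * x[:, None])   # pi_k(x), shape (L, K)
    surv = np.cumprod(1.0 - haz, axis=1)                           # P(R > k | x)
    at_risk = np.hstack([np.ones((x.size, 1)), surv[:, :-1]])      # P(R >= k | x)
    return np.hstack([at_risk * haz, surv[:, -1:]])


def arm_mean(alpha_obs, beta1, lam0, lam_x, C, x):
    """MC estimate of E(Y | Z = z) for one arm, with alpha^(K+1) at its prior center (tau = 0)."""
    alpha_obs = np.asarray(alpha_obs, dtype=float)
    K = alpha_obs.size
    zeta1, zeta0 = np.polyfit(np.arange(1, K + 1), alpha_obs, deg=1)
    alpha_all = np.append(alpha_obs, zeta0 + zeta1 * (K + C))      # intercepts, k = 1..K+1
    probs = pattern_probs_given_x(x, lam0, lam_x)                  # (L, K+1)
    cond_mean = alpha_all[None, :] + beta1 * x[:, None]            # mu(z, x, k), (L, K+1)
    return float(np.mean(np.sum(probs * cond_mean, axis=1)))


rng = np.random.default_rng(1)
L = 2000
x = rng.standard_normal(L)   # same draws for both arms, since [X | Z] = [X] by randomization

# Hypothetical parameter values (not QUATRO estimates).
control = dict(alpha_obs=[42.0, 41.0, 38.0], beta1=3.0, lam0=[-0.5, 0.8, 0.5], lam_x=0.1)
treated = dict(alpha_obs=[41.0, 40.0, 37.5], beta1=3.0, lam0=[-0.6, 0.7, 0.3], lam_x=0.1)

for C in range(4):
    m0 = arm_mean(C=C, x=x, **control)
    m1 = arm_mean(C=C, x=x, **treated)
    print(C, round(m0, 2), round(m1, 2), round(m1 - m0, 2))
```

In this example, the missing pattern's intercept is extrapolated downward, so both arm means fall as C grows. How θ moves depends on each arm's slope and on how much probability each arm puts on R = K + 1.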
5.2. Comparing the Results to Those Obtained Using RAM-SMs

The Stata module alho, available at the website of the last author, was used to fit a RAM-SM that is conceptually similar to the RAM-PMM used here. This module requires complete covariates, so we used mean imputation for missing baseline quality-of-life scores; in randomized trials, imputing missing baseline values in this way does not introduce bias (White and Thompson, 2005). All participants who provided outcome data after more than 3 attempts were placed in the R = 3 pattern. Those who did not provide outcome data were placed in the R = 4 pattern, regardless of the number of attempts made to obtain their outcome data. A standard linear regression model was assumed for the final quality-of-life scores, with treatment group, baseline quality-of-life score, and center indicators as covariates. The following model was assumed for the RAM-SM missing data mechanism:
$$\operatorname{logit}\{P(R = k \mid R \ge k, Z = z, X = x, Y = y)\} = \lambda_{0k} + \gamma z + \lambda_k^{\top} x + \delta_1 y + \delta_2 z y, \tag{7}$$
where the covariates X are the baseline quality-of-life score and center effects. The parameter $\delta_2$ allows the relationship between the outcome and the number of attempts to differ by treatment arm (related to $\zeta_{1z}$ in the PMM); $\delta_2 = 0$ would roughly correspond to $\zeta_{1z}$ being the same in both arms. This parameter is important because the treatment contrast is very sensitive to the missing data mechanism differing by randomized arm (White et al., 2007). The alho module gave maximum likelihood estimates of $\hat\delta_1$ = 0.041 (0.008, 0.074) and $\hat\delta_2$ = 0.026 (−0.011, 0.063), and an estimated treatment effect θ of −1.5 (−3.9, 0.8). This RAM-SM gives some evidence that the final quality-of-life scores had an effect in model (7). The estimates of $\delta_1$ and $\delta_2$ suggest that participants with better final quality-of-life scores are more likely to report them, particularly in the treatment group.
This is consistent with the slopes, $\zeta_{1z}$, of the RAM-PMM. The estimated treatment effect is notably larger in magnitude than that from the RAM-PMM analysis, but it is still not statistically significant. In fact, the RAM-SM treatment effect would correspond to an implausibly extreme value of C > 10 in the RAM-PMM, which would likely lead us to question the validity of the RAM-SM results here. However, there are alternative explanations; for example, a better-fitting RAM-PMM (or RAM-SM) might not require such an extreme C to give roughly compatible results. Estimates of all the parameters can be found in the Supplementary Materials.

5.3. A Comparison of the RAM-SM and RAM-PMM Model Fits

We compare the relative fit of the RAM-SM and RAM-PMM using the BIC based on the observed data likelihood, $\mathrm{BIC} = -2 \log L + p \log n$, where p is the number of parameters in each model (18 for the RAM-PMM and 16 for the RAM-SM). To compute the BIC for the RAM-PMM, we fit a simple, equivalent frequentist model to the observed data using linear and logistic regressions. Missing baseline data were filled in by mean imputation (as in the RAM-SM analysis), and censoring was treated as in the RAM-SM. The BIC is … for the RAM-SM and … for the RAM-PMM, indicating a better fit for the RAM-SM here. However, we accept this better fit with caution, given the extreme value of C implied for the models considered and the ad hoc adjustments (described above) needed to make the BIC comparison. We discuss this further in Section 6.

6. Conclusions/Discussion

We have proposed a pattern mixture model with sensitivity parameters for a repeated attempt design, and we have compared it with the existing selection models for repeated attempt designs. In the QUATRO study, none of the models considered gave more than minimal evidence of an effect of the intervention (adherence therapy) on 52-week self-reported QoL.
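The BIC comparison in Section 5.3 can be sketched in a few lines of Python. The observed data log-likelihoods are not reproduced here, so the sketch only shows how the penalty term enters. Take p = 18 and p = 16 parameters and, as our own assumption, n = 409 participants. The RAM-PMM's two extra parameters then cost 2 log 409 ≈ 12.0 BIC units, so its observed data log-likelihood would need to exceed the RAM-SM's by about 6.0 just to tie. The discussion notes that it was in fact lower.

```python
import math


def bic(loglik, n_params, n_obs):
    """BIC = -2 log L + p log n; smaller values indicate better fit."""
    return -2.0 * loglik + n_params * math.log(n_obs)


N_OBS = 409            # assumption: n = number of randomized participants
P_PMM, P_SM = 18, 16   # parameter counts reported in Section 5.3

# Extra penalty paid by the RAM-PMM when both models have the same log-likelihood.
extra_penalty = bic(0.0, P_PMM, N_OBS) - bic(0.0, P_SM, N_OBS)
print(round(extra_penalty, 2))        # equals 2 * log(409)
print(round(extra_penalty / 2.0, 2))  # log-likelihood advantage needed to tie
```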
For our analysis of the QUATRO data, the RAM-SM provided a better fit than the RAM-PMM as measured by the BIC. In fact, the log-likelihood for the SM was larger than for the PMM. This is likely due to subtle differences between the models, including the pattern-specific distributions implied by the selection model, the different forms of the missing data mechanism in the two approaches (cf. Section 4.1), and the ad hoc adjustments needed for the RAM-SM comparison (cf. Section 5.3) using the Stata alho module. This is despite the fact, seen in the simulations in Section 4, that the pattern-specific behavior of the RAM-SM is similar to the linear model used to extrapolate the missing pattern in the RAM-PMM. Nevertheless, we recommend the RAM-PMM in general: unlike the RAM-SM, it allows for sensitivity analysis; it handles the missing data in a similar way to the RAM-SM, in terms of a continuum of resistance; and it avoids the potentially large impact of modeling choices for the missing data mechanism on inferences (though for the RAM-SM this impact does not seem to be as large as in ordinary selection models (Ng, 2013)). We saw this undesirable sensitivity here: the RAM-SM results corresponded to an unreasonably extreme value of C and were quite different from those of related RAM-SMs fit to the QUATRO data in Jackson et al. (2010). Indeed, the extreme value of C suggests that very extreme values of QoL were needed to make the full-data response look normal.

There are a variety of extensions to the current models. We chose a linear functional form for the conditional means used for extrapolation because there are only three patterns in the QUATRO data. More complex forms for h(k; ζ) can easily be accommodated, though this choice is restricted by the number of attempts (patterns).
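The linear choice for h(k; ζ) can be illustrated with a minimal Python sketch: fit h(k; ζ) = ζ_0 + ζ_1 k by least squares to pattern-specific means for the observed patterns k = 1, 2, 3, then evaluate the line at a new k. This is a deliberate simplification, assuming unweighted pattern means and no covariates; in the RAM-PMM the conditional means also depend on z and x, the fit is Bayesian, and the sensitivity parameter C governs the extrapolation to the missing pattern, none of which is reproduced here. The function names and pattern means are hypothetical.

```python
from typing import Sequence, Tuple


def fit_linear_trend(ks: Sequence[float], means: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of mean_k = zeta0 + zeta1 * k across observed patterns."""
    n = len(ks)
    if n < 2 or n != len(means):
        raise ValueError("need at least two patterns and one mean per pattern")
    k_bar = sum(ks) / n
    m_bar = sum(means) / n
    sxx = sum((k - k_bar) ** 2 for k in ks)
    if sxx == 0:
        raise ValueError("patterns must have distinct values of k")
    sxy = sum((k - k_bar) * (m - m_bar) for k, m in zip(ks, means))
    zeta1 = sxy / sxx
    zeta0 = m_bar - zeta1 * k_bar
    return zeta0, zeta1


def h(k: float, zeta: Tuple[float, float]) -> float:
    """Linear extrapolation function h(k; zeta) = zeta0 + zeta1 * k."""
    zeta0, zeta1 = zeta
    return zeta0 + zeta1 * k


# Hypothetical observed-pattern means: later responders report lower QoL.
zeta = fit_linear_trend([1, 2, 3], [5.0, 4.6, 4.1])
print(zeta, h(4, zeta))  # k = 4 simply taken as the next point on the line
```

With only three patterns, two coefficients leave a single residual degree of freedom, which is why richer forms of h quickly run out of support in the data.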
In the QUATRO data, we only considered positive values of C, based on the concept of a continuum of resistance, and we recommend the maximum number of attempts as a default upper bound for C. However, negative values can be accommodated as appropriate for other datasets (for example, where those unobserved after the maximum number of attempts are thought to be more similar to those observed after few attempts). We could also consider more complex forms for the mean and variance functions, $\mu(z, x, k)$ and $\sigma^2(z, x, k)$. We assumed a parametric form for the distribution of the covariates X; more flexible specifications could easily be developed using Bayesian nonparametric models for the distribution of X (as well as for the model for the response Y). For repeated attempts with sparse patterns, we can adapt the ideas of Roy (2003) and Roy and Daniels (2008) to combine patterns in a data-dependent way. We are also working on proving the monotonicity of the pattern-specific means observed in Section 4.2. Finally, the approach here was developed for a randomized trial. Extension to observational studies would require some minor adjustments, including defining X as all required confounders, as opposed to covariates that potentially affect missingness.

7. Supplementary Materials

The supplementary materials contain WinBUGS code for the models fit in Section 5.1 and parameter estimates for models fit in Sections and 5.2, and are available with this paper at the Biometrics website on Wiley Online Library.

Acknowledgements

MJD was partially supported by US NIH grants CA85295 and CA. DJ and IRW are employed by the UK Medical Research Council [Unit Programme number U ].

References

Alho, J. M. (1990). Adjusting for nonresponse bias using logistic regression. Biometrika 77.
Daniels, M. J. and Hogan, J. W. (2000). Reparameterizing the pattern mixture model for sensitivity analyses under informative dropout. Biometrics 56.
Daniels, M. J. and Hogan, J. W. (2008). Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis, volume 109 of Monographs on Statistics and Applied Probability. Boca Raton, FL: Chapman & Hall/CRC.
Gray, R., Leese, M., Bindman, J., Becker, T., Burti, L., David, A., et al. (2006). Adherence therapy for people with schizophrenia: European multicentre randomised controlled trial. The British Journal of Psychiatry 189.
Jackson, D., Mason, D., White, I. R., and Sutton, S. (2012). An exploration of the missing data mechanism in an internet based smoking cessation trial. BMC Medical Research Methodology 12, 157.
Jackson, D., White, I. R., and Leese, M. (2010). How much can we learn about missing data? An exploration of a clinical trial in psychiatry. Journal of the Royal Statistical Society: Series A (Statistics in Society) 173.
Kenward, M. G. (1998). Selection models for repeated measurements with non-random dropout: An illustration of sensitivity. Statistics in Medicine 17.
Lin, I.-F. and Schaeffer, N. C. (1995). Using survey participants to estimate the impact of nonparticipation. Public Opinion Quarterly 59.
Little, R. J. (1993). Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association 88.
Little, R. J. (1995). Modeling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association 90.
Lunn, D., Best, N., Thomas, A., Wakefield, J., and Spiegelhalter, D. (2002). Bayesian analysis of population PK/PD models: General concepts and software. Journal of Pharmacokinetics and Pharmacodynamics 29.
National Research Council (2010). The Prevention and Treatment of Missing Data in Clinical Trials. Washington, D.C.: The National Academies Press.
Ng, Y. L. (2013). Using repeated contact attempts to move beyond the missing at random assumption. PhD thesis, University of Cambridge.
Roy, J. (2003). Modeling longitudinal data with nonignorable dropouts using a latent dropout class model. Biometrics 59.
Roy, J. and Daniels, M. J. (2008). A general class of pattern mixture models for nonignorable dropout with many possible dropout times. Biometrics 64.
White, I. R., Carpenter, J., Evans, S., and Schroter, S. (2007). Eliciting and using expert opinions about dropout bias in randomised controlled trials. Clinical Trials 4.
White, I. R. and Thompson, S. G. (2005). Adjusting for partially missing baseline measurements in randomized trials. Statistics in Medicine 24.
Wood, A. M., White, I. R., and Hotopf, M. (2006). Using number of failed contact attempts to adjust for non-ignorable nonresponse. Journal of the Royal Statistical Society: Series A (Statistics in Society) 169.

Received December Revised May Accepted May 2015.
More informationAnders Skrondal. Norwegian Institute of Public Health London School of Hygiene and Tropical Medicine. Based on joint work with Sophia Rabe-Hesketh
Constructing Latent Variable Models using Composite Links Anders Skrondal Norwegian Institute of Public Health London School of Hygiene and Tropical Medicine Based on joint work with Sophia Rabe-Hesketh
More information[Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements
[Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements Aasthaa Bansal PhD Pharmaceutical Outcomes Research & Policy Program University of Washington 69 Biomarkers
More informationProblem Set 3: Bootstrap, Quantile Regression and MCMC Methods. MIT , Fall Due: Wednesday, 07 November 2007, 5:00 PM
Problem Set 3: Bootstrap, Quantile Regression and MCMC Methods MIT 14.385, Fall 2007 Due: Wednesday, 07 November 2007, 5:00 PM 1 Applied Problems Instructions: The page indications given below give you
More informationStat 542: Item Response Theory Modeling Using The Extended Rank Likelihood
Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Jonathan Gruhl March 18, 2010 1 Introduction Researchers commonly apply item response theory (IRT) models to binary and ordinal
More informationPACKAGE LMest FOR LATENT MARKOV ANALYSIS
PACKAGE LMest FOR LATENT MARKOV ANALYSIS OF LONGITUDINAL CATEGORICAL DATA Francesco Bartolucci 1, Silvia Pandofi 1, and Fulvia Pennoni 2 1 Department of Economics, University of Perugia (e-mail: francesco.bartolucci@unipg.it,
More informationPart 6: Multivariate Normal and Linear Models
Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of
More informationBayesian model selection: methodology, computation and applications
Bayesian model selection: methodology, computation and applications David Nott Department of Statistics and Applied Probability National University of Singapore Statistical Genomics Summer School Program
More information