The STS Surgeon Composite Technical Appendix

Overview

Surgeon-specific risk-adjusted operative mortality and major complication rates were estimated using a bivariate random-effects logistic regression model. The term "bivariate" refers to the fact that both operative mortality and major complications were analyzed together in a single model, not estimated one at a time in separate models. "Random-effects" refers to the assumption that the provider-specific parameters of interest arise from a specified distribution defined by parameters that are also estimated in the modelling process.

To adjust for case mix, we first calculated each patient's risk score for operative mortality and each patient's risk score for major complications using an existing set of STS risk models. The goal of calculating a patient risk score was to reduce the number of covariates in the hierarchical model by summarizing the predictive information from a large number of baseline covariates into a single number. Adjustment for each covariate individually in the hierarchical model would be theoretically preferable but computationally impractical due to the large number of records and covariates and the computationally intensive nature of Bayesian hierarchical model estimation.

Calculation of Risk Scores

For each patient, risk scores for operative mortality and major complications were calculated from existing models that were specific to the individual patient's type of operation.

For patients undergoing isolated CABG, isolated AVR, or isolated AVR + CABG, risk scores were calculated according to the published STS 2008 mortality and major complications models for isolated CABG, isolated valve, or isolated valve + CABG, respectively. To ensure good calibration in the current study cohort, coefficients of each model were re-estimated using the current 3-year study sample and current endpoint definitions.

For patients undergoing a mitral operation without CABG, risk scores were calculated using a modified version of the published STS 2008 mortality and major complications models for isolated valve procedures. The STS 2008 models were modified to account for the inclusion of patients undergoing tricuspid repair and to provide a more detailed adjustment for endocarditis and degree of tricuspid regurgitation. Briefly, the modified models included a new variable for tricuspid repair (yes/no), a new variable for treated endocarditis (yes/no), and a more detailed adjustment for degree of tricuspid regurgitation (less than moderate, moderate, severe). Coefficients of the modified models were estimated using the current 3-year study cohort and current endpoint definitions.

For patients undergoing a mitral operation with concomitant CABG, risk scores were calculated using a modified version of the published STS 2008 mortality and major complications models for isolated valve + CABG operations. Modifications to these isolated valve + CABG models were identical to the modifications of the isolated valve models described above. Coefficients of the modified models were estimated using the current 3-year study cohort and current endpoint definitions.

A patient's mortality risk score was defined as the patient's predicted risk of operative mortality, and a patient's major complication risk score was defined as the patient's predicted risk of major complication.

Hierarchical Model

For the i-th of $n_j$ patients treated by the j-th surgeon ($j = 1, 2, \ldots, N$), let $p_{1ji}$ be the patient's predicted risk of operative mortality (i.e., the mortality risk score), let $p_{2ji}$ be the patient's predicted risk of major complications (i.e., the complications risk score), let $\bar{p}_{1j} = \sum_{i=1}^{n_j} p_{1ji}/n_j$ be the average mortality risk score for surgeon j, and let $\bar{p}_{2j} = \sum_{i=1}^{n_j} p_{2ji}/n_j$ be the average complications risk score for surgeon j.
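The notation above can be made concrete with a short sketch. The following Python snippet is only an illustration, not the code used in the study: the function names, the integer coding of surgeons, and the toy risk values are assumptions introduced here. It computes the per-surgeon average risk score for one endpoint, together with the logit transforms of the patient-level and surgeon-level scores that the model uses as covariates.

```python
import numpy as np

def logit(p):
    """Log-odds transform, log(p / (1 - p))."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))

def surgeon_case_mix(risk, surgeon):
    """Summarize case mix for one endpoint.

    risk    : predicted risks p_ji, each strictly between 0 and 1
    surgeon : integer surgeon labels 0..N-1, one per patient
              (hypothetical coding chosen for this sketch)
    Returns (x, p_bar, x_bar): patient-level logit risk scores, the mean
    risk score per surgeon, and the logit of that mean.
    """
    risk = np.asarray(risk, dtype=float)
    surgeon = np.asarray(surgeon)
    n_j = np.bincount(surgeon)                         # patients per surgeon
    p_bar = np.bincount(surgeon, weights=risk) / n_j   # average risk score
    return logit(risk), p_bar, logit(p_bar)

# Toy data: 5 patients, 2 surgeons (made-up numbers, not study data)
risk = [0.02, 0.04, 0.06, 0.10, 0.30]
surgeon = [0, 0, 0, 1, 1]
x, p_bar, x_bar = surgeon_case_mix(risk, surgeon)
```

Note that the surgeon-level covariate is the logit of the average risk score, which in general differs from the average of the patient-level logits.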
Let $Y_{1ji}$ be a binary indicator of operative mortality status (0 = alive, 1 = dead), let $Y_{2ji}$ be an indicator of major complications (0 = none, 1 = at least one), and let $\pi_{kji} = \Pr(Y_{kji} = 1 \mid p_{kji})$ be the probability of the occurrence of the k-th endpoint, where k = 1 refers to mortality and k = 2 refers to complications. The associations of $p_{1ji}$ and $p_{2ji}$ with $Y_{1ji}$ and $Y_{2ji}$ were assumed to be described by a bivariate random-effects logistic regression model with normally distributed surgeon-specific random intercept parameters.

At the first level, we specified the following within-surgeon regression model:

$$\text{(operative mortality)} \quad \log\left(\frac{\pi_{1ji}}{1 - \pi_{1ji}}\right) = \alpha_{1j} + x_{1ji}\beta_1$$

$$\text{(major complication)} \quad \log\left(\frac{\pi_{2ji}}{1 - \pi_{2ji}}\right) = \alpha_{2j} + x_{2ji}\beta_2$$

where $x_{kji} = \log(p_{kji}/(1 - p_{kji}))$, $\beta_1$ is an unknown parameter relating risk scores to mortality, $\beta_2$ is an unknown parameter relating risk scores to major complications, and $\alpha_{1j}$ and $\alpha_{2j}$ are surgeon-specific intercept parameters (random effects). Conditional on $\pi_{1ji}$ and $\pi_{2ji}$, the variables $Y_{1ji}$ and $Y_{2ji}$ were assumed to be distributed as two independent Bernoulli variables with parameters $\pi_{1ji}$ and $\pi_{2ji}$, respectively. That is:

$$\Pr(Y_{1ji} = y_{1ji}, Y_{2ji} = y_{2ji} \mid \pi_{1ji}, \pi_{2ji}) = \prod_{k=1}^{2} \pi_{kji}^{y_{kji}} (1 - \pi_{kji})^{1 - y_{kji}}.$$

Outcomes of patients of different surgeons were assumed to be statistically independent, and outcomes of patients treated by the same surgeon were assumed to be conditionally independent given $(\alpha_{1j}, \alpha_{2j})$. The assumption that $Y_{1ji}$ and $Y_{2ji}$ are conditionally independent given $\pi_{1ji}$ and $\pi_{2ji}$ is likely to be violated in practice but was made in order to facilitate computation. Although the model assumes conditional independence between $Y_{1ji}$ and $Y_{2ji}$, it does not assume marginal independence between these two variables, because the underlying probabilities $\pi_{1ji}$ and $\pi_{2ji}$ depend on the random effects $\alpha_{1j}$ and $\alpha_{2j}$, which are assumed to be correlated, as described below.

At the second level, we specified the following between-surgeon model:

$$\begin{bmatrix} \alpha_{1j} \\ \alpha_{2j} \end{bmatrix} \overset{\text{ind}}{\sim} N\left( \begin{bmatrix} \gamma_{10} + \gamma_{11}\bar{x}_{1j} \\ \gamma_{20} + \gamma_{21}\bar{x}_{2j} \end{bmatrix}, \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix} \right)$$

where $\bar{x}_{kj} = \log(\bar{p}_{kj}/(1 - \bar{p}_{kj}))$, and where $\gamma_{10}, \gamma_{11}, \gamma_{20}, \gamma_{21}, \sigma_{11}, \sigma_{12}$, and $\sigma_{22}$ are unknown parameters to be estimated from the data. In other words, $\alpha_{1j}$ and $\alpha_{2j}$ were assumed to be independent draws from a bivariate normal distribution with a mean depending on the surgeon's average case mix, as reflected in $\bar{x}_{1j}$ and $\bar{x}_{2j}$. The assumption that the pairs $(\alpha_{1j}, \alpha_{2j})$ are independent across surgeons is somewhat artificial, as we believe that outcomes of surgeons operating at the same hospital may be correlated. Implicitly, we assume that a hospital's effect on outcomes is captured in the surgeon-level parameters $(\alpha_{1j}, \alpha_{2j})$.

Accounting for $\bar{x}_{1j}$ and $\bar{x}_{2j}$ in the surgeon-level model is intended to reduce confounding in the estimation of $\beta_1$ and $\beta_2$ in the patient-level model. As noted by various authors (e.g., Neuhaus and Kalbfleisch, Biometrics, 1998; Localio et al., Ann Intern Med, 2001; Neuhaus and McCulloch, JRSS-B, 2006), inclusion of only individual-level covariates in a hierarchical model can produce biased estimates when covariates and random effects are correlated. Inclusion of both individual-level covariates such as $x_{1ji}$ and $x_{2ji}$ and surgeon-level covariates such as $\bar{x}_{1j}$ and $\bar{x}_{2j}$ allows covariate effects to be partitioned into within-surgeon and between-surgeon components in order to produce consistent estimates of the within-surgeon model parameters $\beta_1$ and $\beta_2$. In addition, inclusion of cluster-level covariates in hierarchical models can potentially enhance the precision of the estimates of the cluster-specific parameters, which in our case are $\alpha_{1j}$ and $\alpha_{2j}$.

Definition of Risk-Adjusted Rates

Based on this model, the j-th surgeon's risk-adjusted rates of operative mortality and major complications were defined as

$$\text{(operative mortality)} \quad \theta_{1j} = \frac{\sum_{i=1}^{n_j} \operatorname{expit}(\alpha_{1j} + x_{1ji}\beta_1)}{\sum_{i=1}^{n_j} \operatorname{expit}(\text{constant}_1 + x_{1ji}\beta_1)} \times \bar{Y}_1$$

$$\text{(major complication)} \quad \theta_{2j} = \frac{\sum_{i=1}^{n_j} \operatorname{expit}(\alpha_{2j} + x_{2ji}\beta_2)}{\sum_{i=1}^{n_j} \operatorname{expit}(\text{constant}_2 + x_{2ji}\beta_2)} \times \bar{Y}_2$$

where $\operatorname{expit}(u) = 1/(1 + e^{-u})$ is the inverse of the logit transformation, $\bar{Y}_1$ denotes the overall aggregate observed rate of operative mortality in the study sample, $\bar{Y}_2$ denotes the overall aggregate observed rate of major complication in the study sample, and $\text{constant}_1$ and $\text{constant}_2$ are chosen to represent typical values of $\alpha_{1j}$ and $\alpha_{2j}$, respectively.
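To illustrate the definition of a risk-adjusted rate, the sketch below evaluates it for a single surgeon and a single endpoint, given one set of parameter values. It is an assumption-laden illustration rather than the study's implementation: the function name, its arguments, and the toy numbers are hypothetical, and in practice the calculation would be repeated for every MCMC draw of the parameters.

```python
import numpy as np

def expit(u):
    """Inverse logit, 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + np.exp(-np.asarray(u, dtype=float)))

def risk_adjusted_rate(alpha_j, beta, x_j, constant, overall_rate):
    """Risk-adjusted rate for one surgeon and one endpoint.

    alpha_j      : surgeon-specific intercept
    beta         : slope on the logit risk score
    x_j          : logit risk scores of the surgeon's own patients
    constant     : intercept representing a typical surgeon
    overall_rate : aggregate observed event rate in the sample
    """
    x_j = np.asarray(x_j, dtype=float)
    expected_own = expit(alpha_j + beta * x_j).sum()       # this surgeon
    expected_typical = expit(constant + beta * x_j).sum()  # typical surgeon
    return expected_own / expected_typical * overall_rate

# Toy example (made-up numbers): three patients of one surgeon
p = np.array([0.02, 0.05, 0.10])
x_j = np.log(p / (1.0 - p))
same = risk_adjusted_rate(-0.1, 1.0, x_j, constant=-0.1, overall_rate=0.03)
worse = risk_adjusted_rate(0.2, 1.0, x_j, constant=-0.1, overall_rate=0.03)
```

A surgeon whose intercept equals the typical value receives exactly the overall rate, and a larger intercept yields a higher risk-adjusted rate.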
Definition of Composite Score

The overall composite score of the j-th surgeon was defined as

$$\theta_j = w(1 - \theta_{1j}) + (1 - w)(1 - \theta_{2j})$$

where $w = (1/\sigma_1)/(1/\sigma_1 + 1/\sigma_2)$ and $\sigma_k$ denotes the standard deviation of the $\theta_{kj}$'s across surgeons, k = 1, 2.

Estimation

Model parameters were estimated in a Bayesian framework by specifying a prior probability distribution for the unknown model parameters $\beta_1, \beta_2, \gamma_{10}, \gamma_{11}, \gamma_{20}, \gamma_{21}, \sigma_{11}, \sigma_{12}, \sigma_{22}$. Because our prior knowledge was limited, we specified a vague proper prior distribution that consisted of independent normal distributions for the regression coefficients $(\beta_1, \beta_2, \gamma_{10}, \gamma_{11}, \gamma_{20}, \gamma_{21})$ and an inverse Wishart distribution for the variance parameters, $\Sigma = (\sigma_{11}, \sigma_{12}, \sigma_{22})$.

Posterior means and credible intervals were calculated using Markov Chain Monte Carlo (MCMC) simulations as implemented in OpenBUGS version 3.2.2 software. Posterior summaries were calculated by generating 20,000 sets of simulated parameter values after a burn-in period of 1,000 MCMC iterations to ensure convergence. Adequacy of the number of MCMC iterations was assessed by the methods of Raftery and Lewis (1992) and Geweke (1991) as implemented in the CODA add-on package for R statistical software.

The parameter $\theta_j$ was estimated as $\hat{\theta}_j = \sum_{l=1}^{20000} \theta_j^{(l)} / 20000$, where $\theta_j^{(l)}$ denotes the simulated value of $\theta_j$ at the l-th iteration of the MCMC procedure. A 98% Bayesian credible interval was obtained by calculating the 200th lowest and 200th highest values of $\theta_j$ across the 20,000 simulated values.

Estimation of Reliability

Reliability is conventionally defined as the proportion of variation in a measure that is due to true between-unit differences (i.e., signal) as opposed to random statistical fluctuations (i.e., noise). Equivalently, it is the squared correlation between a measurement and the true value.
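The equivalence between the two descriptions of reliability can be checked numerically. The simulation below is a sketch under a simplifying assumption that is not part of the model above: each measurement equals the true value plus independent normal noise. The standard deviations and sample size are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units = 200_000
signal_sd, noise_sd = 1.0, 0.5   # illustrative values only

# True values and noisy measurements under the assumed additive-noise model
truth = rng.normal(0.0, signal_sd, n_units)
measure = truth + rng.normal(0.0, noise_sd, n_units)

# Proportion of measurement variance due to true between-unit differences
variance_ratio = signal_sd**2 / (signal_sd**2 + noise_sd**2)

# Squared Pearson correlation between measurement and truth
squared_corr = np.corrcoef(measure, truth)[0, 1] ** 2
```

With these settings the signal-to-total variance ratio is 0.8, and the simulated squared correlation agrees with it up to Monte Carlo error.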
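Accordingly, reliability was defined as the square of the Pearson correlation coefficient ($\rho^2$) between the set of surgeon-specific estimates $\hat{\theta}_1, \ldots, \hat{\theta}_N$ and the corresponding unknown true values $\theta_1, \ldots, \theta_N$, that is:

$$\rho^2 = \frac{\left[ \sum_{j=1}^{N} \left( \hat{\theta}_j - \frac{1}{N} \sum_{h=1}^{N} \hat{\theta}_h \right) \left( \theta_j - \frac{1}{N} \sum_{h=1}^{N} \theta_h \right) \right]^2}{\sum_{j=1}^{N} \left( \hat{\theta}_j - \frac{1}{N} \sum_{h=1}^{N} \hat{\theta}_h \right)^2 \sum_{j=1}^{N} \left( \theta_j - \frac{1}{N} \sum_{h=1}^{N} \theta_h \right)^2}$$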
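Because the true values are unknown, one way to evaluate this quantity is to substitute each simulated MCMC draw for the true values and compare it with the posterior means. The sketch below assumes the draws of the composite scores are available as an array with one row per MCMC iteration and one column per surgeon. The array layout, function names, and toy data are assumptions made for illustration.

```python
import numpy as np

def squared_correlation(estimates, values):
    """Squared Pearson correlation between two length-N vectors."""
    e = np.asarray(estimates, dtype=float)
    v = np.asarray(values, dtype=float)
    e = e - e.mean()
    v = v - v.mean()
    return (e @ v) ** 2 / ((e @ e) * (v @ v))

def reliability_draws(theta_draws):
    """Squared correlation between posterior means and each MCMC draw.

    theta_draws : array of shape (L, N); row l holds the simulated
                  composite scores of all N surgeons at iteration l
    Returns an array of L values, one per draw.
    """
    theta_draws = np.asarray(theta_draws, dtype=float)
    theta_hat = theta_draws.mean(axis=0)   # posterior mean per surgeon
    return np.array([squared_correlation(theta_hat, d) for d in theta_draws])

# Toy draws (made-up numbers): 2,000 iterations for 50 surgeons
rng = np.random.default_rng(1)
centers = rng.normal(0.9, 0.03, size=50)
draws = centers + rng.normal(0.0, 0.01, size=(2000, 50))
rho2 = reliability_draws(draws)
rho2_hat = rho2.mean()   # average over draws
```

Averaging the per-draw values gives a single summary, and their spread across draws reflects posterior uncertainty about the reliability.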

The quantity $\rho^2$ was estimated by its posterior mean, namely,

$$\hat{\rho}^2 = \frac{1}{20000} \sum_{l=1}^{20000} \rho^2_{(l)}$$

where

$$\rho^2_{(l)} = \frac{\left[ \sum_{j=1}^{N} \left( \hat{\theta}_j - \frac{1}{N} \sum_{h=1}^{N} \hat{\theta}_h \right) \left( \theta_j^{(l)} - \frac{1}{N} \sum_{h=1}^{N} \theta_h^{(l)} \right) \right]^2}{\sum_{j=1}^{N} \left( \hat{\theta}_j - \frac{1}{N} \sum_{h=1}^{N} \hat{\theta}_h \right)^2 \sum_{j=1}^{N} \left( \theta_j^{(l)} - \frac{1}{N} \sum_{h=1}^{N} \theta_h^{(l)} \right)^2}$$

with $\theta_j^{(l)}$ denoting the value of $\theta_j$ on the l-th MCMC sample and $\hat{\theta}_j = \sum_{l=1}^{20000} \theta_j^{(l)} / 20000$ denoting the posterior mean of $\theta_j$. A 95% credible interval for $\rho^2$ was obtained by calculating the 500th smallest and 500th largest values of $\rho^2_{(l)}$ across the 20,000 MCMC samples.

References

Geweke, John (1991). Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. Vol. 196. Federal Reserve Bank of Minneapolis, Research Department.

Localio, A. Russell, Berlin, Jesse A., Ten Have, Thomas R., and Kimmel, Stephen E. (2001). Adjustments for center in multicenter studies: an overview. Annals of Internal Medicine, 135, 112-23.

Neuhaus, John M., and Kalbfleisch, Jack D. (1998). Between- and within-cluster covariate effects in the analysis of clustered data. Biometrics, 54, 638-645.

Neuhaus, John M., and McCulloch, Charles E. (2006). Separating between- and within-cluster covariate effects by using conditional and partitioning methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68, 859-872.

Raftery, Adrian E., and Lewis, Steven (1992). How many iterations in the Gibbs sampler? In Bayesian Statistics 4 (J.M. Bernardo et al., editors), Oxford University Press, pp. 763-773.