A MODERN STATISTICAL APPROACH TO QUALITY IMPROVEMENT IN HEALTH CARE USING QUANTILE REGRESSION
JARROD E. DALTON


A MODERN STATISTICAL APPROACH TO QUALITY IMPROVEMENT IN HEALTH CARE USING QUANTILE REGRESSION

by

JARROD E. DALTON

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Dissertation Advisor: Dr. Ralph O'Brien

Department of Epidemiology and Biostatistics

CASE WESTERN RESERVE UNIVERSITY

January, 2013

We hereby approve the thesis/dissertation of Jarrod E. Dalton, candidate for the Doctor of Philosophy degree*.

Ralph O'Brien, PhD (chair, PhD thesis advisor)
Mark Schluchter, PhD
Tomas Radivoyevitch, PhD
John Barnard, PhD
Abdus Sattar, PhD
Robert Elston, PhD

July 24, 2012

*We also certify that written approval has been obtained for any proprietary material contained therein.

To my parents William and Connie Dalton, whose gifts to me over the years are too many to list, and to my wife Jamie, whose encouragement and love kept this project on track. Without these people in my life, this work would not be possible.

Contents

1 Introduction
  1.1 Baseline Risk Adjustment
  1.2 Linking Processes and Outcomes
  1.3 Quantile Regression
2 Flexible recalibration of binary clinical prediction models
  2.1 Introduction
  2.2 Recalibration Methodology
  2.3 Comparing Models on Calibration
  2.4 An Example
  2.5 Discussion
3 Baseline Risk Adjustment for In-Hospital Mortality using ICD-9-CM Codes
  3.1 Introduction
  3.2 Methods
    3.2.1 Model Development
    3.2.2 Calibration
    3.2.3 Comparator Model
    3.2.4 Model Performance and Reliability
  3.3 Results
  3.4 Discussion
4 Quantile Regression Fundamentals
  4.1 Quantiles
  4.2 Quantile Estimation as an Optimization Problem
    4.2.1 Median as a Minimizer of Absolute Deviations
    4.2.2 Estimating a Median via Linear Programming
    4.2.3 Extension to Other Quantiles
  4.3 Quantile Regression
    4.3.1 Model Formulation and Interpretation
    4.3.2 Estimation of Conditional Quantiles
  4.4 Variability Estimates
    4.4.1 Covariance Matrix Approximations
    4.4.2 Wald Tests and Confidence Intervals
    4.4.3 Testing the Location-Shift Hypothesis
    4.4.4 Example
  4.5 R Software
5 Sparse Multiquantile Regression via Second-Order Fused Lasso
  5.1 Introduction
  5.2 Methodology
    5.2.1 Model Fit Criterion
    5.2.2 Optimization
    5.2.3 Selection of Regularization Parameters
    5.2.4 Model Degrees of Freedom
  5.3 Simulation Analysis
  5.4 Application to Real Dataset
  5.5 Conclusion
6 Conclusion
  6.1 Future Work
Bibliography

List of Figures

2.1 Calibration curves as a function of the estimated risk score, among the 20% validation cohort. A histogram of the risk score is overlaid below; the curves are plotted over the middle 99% of the distribution.
2.2 Harrell calibration plot of the raw risk scores in the 20% validation cohort. A histogram of the raw risk scores is overlaid below the plot.
2.3 Calibration of two sets of updated risk scores in independent test data, based respectively on a cubic spline recalibration model and a traditional linear-logistic recalibration model. Histograms of the two sets of updated risk scores are overlaid below; calibration curves are shown for the middle 99% of the distributions of the risk scores (respectively). The displayed calibration curves were fit to the sets of updated risk scores using natural cubic splines.
3.1 Study flow diagram.
3.2 Calibration curves displaying the relationship between observed outcomes and model predictions. On the x-axis is the risk score (given as a predicted probability), and on the y-axis is an observed-to-expected (O/E) odds ratio. Perfect calibration implies an O/E odds ratio of 1 across the risk spectrum. An O/E odds ratio of 0.1 among patients with predicted mortality risk of 10^-3, for example, implies that observed mortality was 10 times less likely for these patients than predicted by the model (thus the risk score was too high). Histograms of the risk scores underlie each panel. Calibration curves are truncated to the middle 99% of the data. Panel (a) displays the calibration of raw scores from the logistic model, within the random 20% calibration cohort. Correcting predictions based on these curves was insufficient to ensure complete calibration in the prospective 2009 data (b). However, re-calibration within the 2009 data based on the curve in (b) yielded favorable calibration for both models (c).
3.3 Scatterplot of 50,000 randomly sampled discharges from the 2009 data, displaying the ratio of predicted odds of mortality (calculated as the predicted odds under the AllCodeRisk model divided by the predicted odds under the POARisk model) as a function of the POARisk score. Each of the risk scores had been re-calibrated to the 2009 data. Fidelity of the AllCodeRisk model to the POARisk model in terms of characterizing individual patient risk is represented by the horizontal line at an RPO of 1.0 (dashed line). Quantile regression curves displaying the median, first and third quartiles, and middle 95% of the data as a function of POARisk (fit using the entire 2009 sample) are overlaid. The plot shows that the AllCodeRisk model produces risk estimates that are too low for the majority of the patients, though risk estimates among high-risk patients were more consistent.
3.4 Kernel density plot of the percent difference in hospital performance under the two risk adjustment models (defined as the percent change in hospital observed-to-expected mortality ratio, AllCodeRisk vs. POARisk). Hospitals with fewer than 30 mortalities were excluded from the analysis. Performance under the AllCodeRisk model was within ±20% of that under the POARisk model for 89.6% of the hospitals, which was less than the pre-specified criterion of at least 95% of hospitals.
4.1 ECDF of randomly-generated Poisson(3) data.
4.2 (top panel) Functions r_i(u) defining the absolute deviations of each unique observed value of the random Poisson(3) data, y_i, i = 1, 2, ..., 200; (bottom panel) the objective function \psi(u) = (1/n) \sum_{i=1}^n r_i(u), minimized at \hat{M}.
4.3 Check functions \rho_\tau(y - u) for \tau \in {0.10, 0.50, 0.75}.
4.4 Density plot of total charges associated with heart valve replacement surgery.
4.5 (top panel) Predictions \hat{Q}_{Y|X_1}(\tau) and quantile treatment effects \hat{\beta}_{\tau,1} from univariable quantile regression models for an outcome Y based on a dichotomous predictor X_1, for \tau \in {0.1, 0.5, 0.9}. (bottom panel) Quantile treatment effects \hat{\beta}_{\tau,1} as a function of \tau. The OLS estimate of the treatment effect \hat{\beta}_1 (dashed line) is overlaid. The fact that the quantile treatment effects vary implies a violation of the location-shift assumption; thus the OLS model is inadequate.
4.6 Deciles of the distribution of hospital charges associated with valve replacement surgery, by ischemic heart disease status.
4.7 Quantile regression coefficient profiles as a function of \tau.
5.1 Search procedure for c_1 and c_2. Starting at the unconstrained or full model (triangle), find the test-data fit criterion for a series of c_2 values (blue arrow), saving the combination of constraint values that minimizes the criterion (green dots). Decrement c_1 and repeat the search along the series of c_2 values. If the minimum test-data fit criterion is less than the prior minimum, proceed; otherwise stop (green X) and perform a fine grid search around the neighborhood of the initial minimum (blue diamond).
5.2 Histograms of \hat{\nu} for simulated analyses under varying sample sizes.
5.3 Histograms of estimated degrees of freedom under the SMR model, from simulated analyses under varying sample sizes.
5.4 Quantile regression coefficient estimates for the Hospital Compare data modeling the risk-adjusted hospital 30-day mortality rate associated with patients treated for pneumonia. Estimates and pointwise 95% confidence intervals (standard errors estimated using non-parametric bootstrap resampling) are presented as white diamonds and red shaded areas, respectively, while estimates from the SMR model are given by the connected black points.
5.5 Adjusted quantiles of post-diagnosis survival among Non-Hodgkin's Lymphoma patients versus age at diagnosis.

A Modern Statistical Approach to Quality Improvement in Health Care using Quantile Regression

Abstract

by JARROD E. DALTON

Quality is difficult to measure and compare among medical providers. First, an appropriate metric must be chosen among many potential process-based and patient outcome measures. Further, when comparing providers, one must take into account the fact that patients are not homogeneous across providers. Accounting for the latter with clinical risk adjustment models is a complicated and controversial topic, as providers are increasingly paid based on their performance with respect to risk-adjusted quality measures. A modern approach to hospital quality improvement is developed by: expanding on D. R. Cox's 1958 methodology for calibrating binary outcomes; using this calibration methodology and recent efficient generalized linear modeling algorithms to develop a risk-adjustment model for in-hospital mortality with data on 20 million inpatient discharges from the State of California through 2008; applying this model to obtain 2009 observed-to-expected mortality ratios for the hospitals across California; and evaluating whether or not performance-dependent relationships between hospital process changes and observed-to-expected mortality ratios exist, based on a novel sparse multiquantile regression technique that incorporates a fused-lasso-type penalty.

Chapter 1

Introduction

Fee-for-service systems in many current health care reimbursement programs (such as Medicare) may actually produce disincentives for ensuring the quality of services rendered [1]. The Patient Protection and Affordable Care Act, passed by Congress and signed into law by President Obama in 2010, stipulated that, effective January 1, 2015, payment to physicians is to be based on the quality, and not on the volume, of care provided. Through this mandate, supporters of the bill argued, greater economic focus on quality might result in better health outcomes while at the same time reducing the costs associated with care.

1 H.R. 3590, 111th Congress: Patient Protection and Affordable Care Act. (2009). In GovTrack.us (database of federal legislation). Retrieved March 25, 2012, from GovTrack.us.

However, the health care industry is relatively unique in that those responsible for selecting providers are largely independent of those responsible for paying for services rendered. Therefore, changing incentives with respect to only the reimbursement processes might have a lesser economic impact than additionally affecting demand for services. Along these lines, performance comparison tools such as the U.S. Department of Health and Human Services' Hospital Compare website have enabled patients to make informed decisions among potential providers based on various process- and outcomes-based performance measures. Quality reporting efforts such as this have been facilitated by a rise in the availability of various large and high-quality clinical and administrative data registries. While controversy exists among various health care stakeholders regarding which processes to measure and how to measure them, the statistical issues involved with measuring process adherence among various providers are relatively benign. In contrast, the choice of health outcomes to monitor for quality of care is relatively straightforward from a clinical standpoint, but objectively comparing outcomes is fraught with statistical difficulties. The primary challenge involved with comparing outcomes is that providers vary with respect to their patients' severity of disease and the severity of the procedures they use to treat their patients. Thus, some form of baseline

2 Accessed March 24, 2012.

risk adjustment is necessary in order to better associate differences among providers in outcome rates with differences in quality of care [2].

1.1 Baseline Risk Adjustment

A multitude of techniques for making risk-adjusted comparisons among providers exists. For binary outcomes such as in-hospital mortality, a common technique is to estimate, for each provider, an observed-to-expected outcome ratio. The expected number of events used in the denominator of this calculation is assigned by the risk adjustment model being employed; specifically, the expected number of events is the sum of the individual patients' predicted event probabilities. Given this framework, hospital performance is thus closely tied to the underlying risk adjustment model. Correspondingly, precise and unbiased estimation of baseline risk is a crucial component of any useful risk adjustment model.

Often, the relative merits of risk adjustment models are established by their ability to discriminate between patients who experience the outcome(s) in question and patients who do not, for instance, by using the concordance index (or C-statistic) [3]. While it is indeed important for useful risk adjustment models to maximize the predictive ability of baseline patient characteristics and planned procedures, these models are often developed for

patient populations that exhibit a broad spectrum of risk (such as the US inpatient population or the population of Medicare patients) and as a consequence can achieve consistently high C-statistics. In addition to discrimination, estimates from optimal risk adjustment models should calibrate well with observed outcomes. In the context of binary outcomes, this means that patients with a predicted probability of p should exhibit an outcome incidence of p, for all p between 0 and 1. Compared to discrimination, unfortunately, model calibration is less extensively documented in the literature. For example, there is at least a perception that currently available outcomes-based measures might not adequately adjust for risk, especially in sicker patients [4]. Providers may tend to avoid treating patients who have risk-adjusted estimates of adverse event rates that providers perceive are too low. Likewise, they may cherry-pick those patients who have risk-adjusted outcome rates that seem to be too high [5]. While calibration among high-risk patients is clearly important due to the fact that these patients are most likely to influence observed and expected outcomes, calibration among low-risk patients is also important since hospital outcome risk is typically low for the vast majority of patients. Due to these large numbers, miscalibrations among low-risk patients therefore also have the potential to influence aggregate measures of risk-adjusted outcomes. In circumstances where calibration is actually considered, standardly

implemented calibration techniques are inadequate for identifying departures from model calibration among low-risk patients. For example, among the 1.3 million cases participating in the National Surgical Quality Improvement Program between 2005 and 2011 (representing over 400 hospitals), the overall incidence of 30-day mortality was 1.6%. With this low overall incidence, it is reasonable to expect that a highly predictive risk adjustment model would yield predicted probabilities of less than 2% for a large proportion of patients. A smooth plot of the observed incidence on the y-axis versus the expected incidence on the x-axis [6] would limit the assessment of model calibration among these patients to the extreme lower left corner of the figure. The first objective of this dissertation was therefore to develop a flexible method for calibrating binary outcomes on the log-odds scale. This method extends the work of D. R. Cox in 1958 [7], which considered a linear-logistic calibration equation and tested whether or not the intercept and slope of this equation are 0 and 1, respectively.

Another issue with existing risk adjustment models is the loose distinction between which patient characteristics are baseline, or present-on-admission (POA), versus which are hospital-acquired. Several highly discriminative risk adjustment models [8, 9, 10, 11] incorporate administrative data on diagnoses and procedures, such as the Current Procedural Terminology [12] and

the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) [13]. While administrative datasets are rich (due in large part to the fact that they underlie reimbursement procedures), there has not traditionally been a mechanism in place for documenting the timing of diagnoses, which is a crucial aspect of determining whether hospitals are treating or causing patient morbidity. The Deficit Reduction Act of 2005 mandated that by 2008 all diagnoses be coded with POA indicators. Unfortunately, health care systems have been slow to comply; consequently, large and representative data registries that can be used to construct risk adjustment models incorporating POA indicators have only recently become available.

A third issue arising with many existing clinical risk adjustment models is the lack of utilization of modern predictive modeling techniques. It is well known in the statistics community that purposely biasing regression parameter estimates toward zero, a technique known as shrinkage, often leads to gains in predictive accuracy among external observations (i.e., observations that were not used for model development). Hoerl in 1970 [14] introduced one of the earliest shrinkage methods. His ridge regression technique penalized the model fit criterion by an amount

3 S. 1932, 109th Congress: Deficit Reduction Act of 2005. (2005). In GovTrack.us (database of federal legislation). Retrieved March 25, 2012, from GovTrack.us.

proportional to the L2-norm of the coefficient vector \beta. Ridge regression tends to average correlated predictors in order to produce increased prediction accuracy. A related shrinkage technique, named the lasso and proposed by Tibshirani in 1996 [15], penalized the L1-norm of \beta. Lasso regression tends to shrink coefficients for predictors unrelated to the response all the way to zero, effectively performing a form of smooth variable selection while increasing predictive accuracy for external observations. Park and Hastie [16, 17] extended the lasso penalty to the general exponential family and to the proportional hazards regression modeling frameworks. In 2005, Zou and Hastie [18] combined the ridge and lasso shrinkage estimators by incorporating a penalty of \alpha times the ridge penalty plus (1 - \alpha) times the lasso penalty, for some \alpha between 0 and 1. They showed that this elastic net estimator often resulted in superior prediction performance relative to the ridge and lasso estimators. Finally, Friedman et al. in 2010 [19] developed an extremely efficient algorithm for obtaining the entire solution path for generalized linear models incorporating elastic net penalties, allowing practical implementation on a large scale.

The second objective of this dissertation was to develop a baseline risk index for in-hospital mortality, using administrative data from the State of California. These data are relatively unique, since the State of California has been collecting POA indicators associated

with patients' diagnoses longer than most other states; for this risk index, 20.0 million discharge records from the California State Inpatient Database (CA-SID) are used to develop the model, and an additional 4.0 million discharge records from the 2009 CA-SID are used to prospectively validate the model. Elastic net shrinkage is used to maximize prediction accuracy for independent observations. The newly developed calibration technique (Objective #1) is employed to ensure that risk estimates are unbiased across all predicted probabilities.

1.2 Linking Processes and Outcomes

Measuring, reporting, and comparing outcomes may be the most important step toward unlocking rapid outcome improvement and making good choices about reducing costs [20]. Without objective (risk-adjusted) outcome measurement, it is impossible to know if the care clinicians deliver is good or bad [21]. Despite ongoing efforts to improve outcome measurement, the assessment of quality of care has largely focused on defining and measuring adherence to process-based performance criteria, with the presumption that these processes are associated with improved clinical outcomes. Along these lines, there already has been skepticism regarding the effectiveness of the quality measurement industry [22, 23]. Without proof

that improvement in certain processes of care actually helps patients have better outcomes, what motivation do caregivers have to undertake costly and complex efforts to improve processes, other than to score highly on public report cards? It may be that improvement in outcomes depends on both localized and system-wide process initiatives [21]. Localized quality improvement is affected by the decisions of health care professionals during the course of care as well as by the policy decisions of hospital leadership. The responsibility of defining best practices rests on providers at all levels of care; effective quality improvement research can support these efforts by revealing differences in processes which separate the providers with the best outcomes from the providers with the worst outcomes. This type of research is a crucial link in the ongoing plan-do-study-act cycle of quality improvement, first conceived by Walter A. Shewhart and popularized in industrial settings by W. Edwards Deming [24, 25]. Though a practice-measure-improve analogue has been proposed for the medical setting [26, 27], the main idea of research completing the cycle of quality improvement remains.

Unfortunately, scientific evidence linking process improvement to outcomes is rare [28], despite an increase in the incentives and/or requirements for providers to monitor adherence to various types of process indicators (e.g., the U.S. Centers for Medicare and Medicaid Services maintains nearly 300 quality indicators as part of its voluntary Physician Quality Reporting System). An example of such evidence is the work of Peterson et al. in 2006, who found that a 10% increase in adherence to a composite of nine practice guidelines established by the American College of Cardiology Foundation and the American Heart Association was associated with a risk-adjusted odds ratio [95% confidence interval] for in-hospital mortality of 0.90 [0.84, 0.97] among patients with acute coronary syndromes [29].

The relationship between processes and outcomes may be complex. It may not suffice to assume, for instance, that a strategic initiative aimed at increasing the quality and quantity of hand-washing would have the same effect on postoperative infectious outcomes across all providers. One might hypothesize that this initiative would have the greatest impact among hospitals with the worst risk-adjusted infection rates and little impact among hospitals with already low rates, since the latter group may already be practicing appropriate hand-washing protocols.

4 Accessed March 26, 2012.

1.3 Quantile Regression

Quantile regression [30] is a technique for estimating selected quantiles of a response variable's distribution, conditional on one or more covariates. For instance, regressing the 90th percentile of the distribution of hospital observed-to-expected infection ratios on a set of process measurements might reveal which processes are most important for avoiding unnecessarily high infection rates (compared to other hospitals). Likewise, performing the same analysis with respect to the 10th percentile of this distribution might reveal which processes are most important for maintaining a relatively low infection rate. These sets of processes, as alluded to above, may not be similar.

Though quantile regression has existed for several decades, its application has been limited to roughly the last 10 years. One reason for this delay is that the method relies on linear programming techniques in order to find optimal estimates for model coefficients; linear programming and general convex optimization are not standard components of graduate curricula in statistics and biostatistics. Another reason is that quantile regression modules for standard statistical analysis software packages had not been developed until the last 10 years. The third objective of this dissertation was to provide a review of quantile regression: starting from fundamental theory which restates the problem of univariate quantile estimation as a linear programming problem, utilizing basic linear programming modules in R to arrive at optimal quantile estimates, and extending these concepts to the general regression setting. The R package quantreg [31], which incorporates modern and more efficient linear programming algorithms and allows for easier implementation of the methodology, is introduced.

Traditional quantile regression treats each desired conditional quantile as its own optimization problem. However, the fitted quantile regression parameter estimates for the 0.50 quantile are likely similar to the fitted parameter estimates for the 0.51 quantile of the response distribution (for example). Restructuring the optimization problem so that sets of regression coefficients for all quantiles of interest are estimated simultaneously, and restricting these regression coefficients in such a way that penalizes excess quantile-to-quantile fluctuations in fitted coefficients for a given predictor, might yield more consistent results in small samples. The fourth objective of this dissertation was to develop a sparse multiquantile regression technique which restricts parameter estimates according to a second-order fused lasso penalty applied to the joint model fitting criterion. The fused lasso penalty shrinks parameters all the way to zero for unrelated predictors, while at the same time shrinking the quantile-to-quantile differences in parameter estimates towards a constant for predictors that are related to the outcome (specifically, for regions of quantiles in which

there is a relationship between a given predictor and the outcome). The new multiquantile regression modeling technique is then applied in a real-world setting relating hospital process improvement metrics to risk-adjusted outcomes. Data from the U.S. Department of Health and Human Services' Hospital Compare registry and from the 2010 U.S. Census are analyzed to compare risk-adjusted rates of 30-day mortality associated with pneumonia care among various hospitals. Predictors include various process indicators defining best practices for treating pneumonia, population density in the area of the hospital, minority population percentage, and a relative measure of Medicare spending per patient.
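The optimization view of quantiles that underlies these objectives (developed in detail in Chapter 4) can be illustrated with a small numerical sketch. The following is an illustrative Python rendering, not the dissertation's R code; the data and function names are invented for the example. For a sample y and a target quantile level tau, minimizing the total check-function loss rho_tau(y - u) over candidate values u recovers a sample tau-quantile, and the minimizer can always be found among the observed data points:

```python
import numpy as np

def check_loss(u, y, tau):
    """Total check-function (pinball) loss of candidate quantile value u."""
    r = y - u
    return np.sum(np.where(r >= 0, tau * r, (tau - 1) * r))

def quantile_by_optimization(y, tau):
    """Minimize the check loss by exhaustive search over the unique observed
    values of y (the minimizer of the piecewise-linear objective is always
    attained at a data point)."""
    candidates = np.unique(y)
    losses = [check_loss(u, y, tau) for u in candidates]
    return candidates[int(np.argmin(losses))]

# Random Poisson(3) data, echoing the example pictured in Chapter 4's figures.
rng = np.random.default_rng(0)
y = rng.poisson(lam=3, size=200).astype(float)

for tau in (0.10, 0.50, 0.75):
    print(tau, quantile_by_optimization(y, tau))
```

A grid search is used here only for transparency; Chapter 4 instead recasts the same minimization as a linear program, which is how quantreg solves it at scale.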

Chapter 2

Flexible recalibration of binary clinical prediction models

2.1 Introduction

In addition to the relative risks of outcome imposed by various types of exposures, physicians and patients seek measures of absolute risk of outcome to inform care decisions [32, 33]. Likewise, researchers and policy makers are interested in classifying patients with respect to risk of outcome and in evaluating the diagnostic utility of novel biomarkers [34]. It is therefore not a surprise that, with advances in statistical methodology and computational capacity, and with the establishment of many large and often disease-specific

observational data registries in medicine, we have seen clinical prediction modeling emerge as a prominent area of research over recent decades, with applications to various types of outcomes [35]. Binary outcomes are perhaps the most prevalent type of outcome studied in medicine, and methods specifically designed for predicting binary outcomes and evaluating model performance in external data abound [33, 6, 36, 37, 38, 39]. Beyond evaluation, adjustments of prediction rules to better characterize future data should be considered [40].

In judging the quality of binary clinical prediction models for independent data, two important considerations are discrimination (the model's ability to separate events from non-events) and calibration (the agreement between model predictions and observed outcome incidences) [6]. Diamond illustrated how these two aspects do not necessarily go hand in hand [41]. Discrimination is important in both prognostic and diagnostic models, since those applying the models are typically interested in predicting who among a population is most likely to experience the outcome in question or to have a disease in question. Likewise, calibration is important in both settings. When the goal of the model is to diagnose an existing condition, well-calibrated probabilities can better inform treatment decisions (for example, an invasive procedure to treat a certain disease might only be warranted if there is reasonable certainty that the patient truly has the disease in question).
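Discrimination is typically quantified with the concordance index (C-statistic) mentioned in Chapter 1: the probability that a randomly chosen event receives a higher prediction than a randomly chosen non-event. A minimal Python sketch (illustrative only; the predictions and outcomes below are invented):

```python
import numpy as np

def c_statistic(pred, y):
    """Concordance index: among all event/non-event pairs, the fraction in
    which the event has the higher prediction (ties count one half)."""
    pred, y = np.asarray(pred, float), np.asarray(y, int)
    events, nonevents = pred[y == 1], pred[y == 0]
    # Compare every event prediction against every non-event prediction.
    diff = events[:, None] - nonevents[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

# Toy example: four patients, two of whom experienced the outcome.
print(c_statistic([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1]))  # -> 0.75 (3 of 4 pairs concordant)
```

Note that the C-statistic depends only on the ordering of the predictions, which is precisely why a model can discriminate well while calibrating poorly, as the surrounding text emphasizes.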

Ensuring well-calibrated probabilities in prognostic models enables practitioners to give patients accurate assessments of their risk of future outcomes, for example. Curiously, however, calibration in clinical prediction models is repeatedly not given proper attention. This fact has been a primary source of concern, for example, in the interpretation and implementation of pay-for-performance programs in health care. Despite the existence of several highly discriminative risk adjustment models for monitoring quality of care [8, 9, 42, 11, 10, 43], providers tend to avoid treating high-risk patients for fear of the underlying risk adjustment models' inability to adequately describe risk among high-risk patients [4]. In other words, they suspect that the models do not calibrate well with actual outcomes. It is crucial, therefore, that models are evaluated for calibration and that any departures from calibration are corrected, in order to accurately represent risk in external populations. Only when prognostic models are well-calibrated should their discriminative ability be of concern.

Hosmer and Lemeshow [44] developed a goodness-of-fit test for overall model calibration. The test is based on grouping patients into a certain number of categories (typically ten) with respect to their predicted probability and comparing the observed number of events within each group to the expected number of events (defined as the sum of the individual predicted

probabilities within a given group). While this test can indicate the presence of miscalibration, it is dependent on both the sample size and the choice of cutpoints; further, the test alone does not prescribe a solution for the miscalibration.

Cox [7] proposed a method for testing calibration and simultaneously estimating a recalibration equation, using external data. Briefly, a new logistic model is developed to relate the expected log-odds of the outcome (based on the prognostic model) to the observed log-odds using a linear equation. Details are given in Section 2.2 below. The model is deemed to calibrate well if the intercept and slope of the linear equation are 0 and 1, respectively. If the model systematically over- or under-estimates the probability of outcome (known as calibration-in-the-large), the intercept will be different from zero. Likewise, there should be a one-to-one relationship between expected and observed probabilities; departures from this requirement are reflected as a slope different from one [45]. In practice, however, such a linear-logistic assumption may be inadequate to appropriately describe the nature of miscalibration. In this chapter, Cox's method is extended to allow for a more flexible family of recalibration functions which, optionally, are covariate-dependent.
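The Hosmer-Lemeshow construction described above is straightforward to sketch. The following is an illustrative Python rendering with simulated (invented) data and function names; in the usual formulation the resulting statistic is referred to a chi-squared distribution:

```python
import numpy as np

def hosmer_lemeshow_stat(p, y, groups=10):
    """Hosmer-Lemeshow statistic: partition subjects into `groups` bins by
    predicted probability, then compare observed vs. expected event counts."""
    order = np.argsort(p)
    p, y = np.asarray(p)[order], np.asarray(y)[order]
    stat = 0.0
    for bin_idx in np.array_split(np.arange(len(p)), groups):
        obs = y[bin_idx].sum()        # observed events in the bin
        exp = p[bin_idx].sum()        # expected events = sum of predictions
        n, pbar = len(bin_idx), p[bin_idx].mean()
        stat += (obs - exp) ** 2 / (n * pbar * (1 - pbar))
    return stat

# Simulated data in which the predictions are calibrated by construction.
rng = np.random.default_rng(1)
p = rng.uniform(0.01, 0.30, size=2000)
y = rng.binomial(1, p)               # outcomes generated from p itself
print(hosmer_lemeshow_stat(p, y))    # should be modest when calibration holds
```

The sketch also makes the text's criticisms concrete: the statistic changes with the number of bins (`groups`) and with the sample size, and nothing in it indicates how to repair a detected miscalibration.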

2.2 Recalibration Methodology

We partition the clinical prediction modeling process into three phases, each with its own dataset (the datasets being independent of one another, e.g., as given by random partitioning of the overall dataset prior to analysis). The first dataset, called the training dataset, is used to fit the prediction model. The second, called the validation dataset, is used to recalibrate the model. The third, called the test dataset, is used to externally evaluate model performance among data that had no influence on the finalized model itself. This type of partitioning is consistent with the framework described by Hastie et al. for building and evaluating prediction models [46] (though, technically speaking, one may wish to further partition the second dataset in order to select model tuning parameters such as penalization factors).

Now let us assume that, in our validation dataset, we have obtained a collection of n expected probabilities \tilde{p}_i \in (0, 1) for a binary event Y_i \in \{0, 1\}, where i = 1, 2, ..., n. We note that there are no other restrictions on the \tilde{p}_i; regardless of how they were derived from the training cohort (e.g., using logistic regression, random forests, or any of the many other methods for binary prediction), we only require that they are positive numbers less than one. We are concerned with the agreement between the expected probabilities \tilde{p}_i and the true probabilities p_i = Pr(Y_i = 1).

Define a risk score on the log-odds scale as a monotone transformation of the predicted probabilities \hat{p}_i:

RS_i = \log\left[ \frac{\hat{p}_i}{1 - \hat{p}_i} \right]   (2.1)

Cox's linear-logistic calibration model is given by:

\log\left[ \frac{p_i}{1 - p_i} \right] = \beta_0 + \beta_1^* RS_i + \mathrm{error}_i.   (2.2)

In this model, standard Wald chi-squared tests [47] of model coefficients can be used to test whether or not \beta_0 = 0 and \beta_1^* = 1. Steyerberg (2009) suggested restating Cox's model by introducing an offset term for the risk score [35]:

\log\left[ \frac{p_i}{1 - p_i} \right] = \beta_0 + (\beta_1 + 1) RS_i + \mathrm{error}_i = RS_i + \beta_0 + \beta_1 RS_i + \mathrm{error}_i.   (2.3)

That is, we introduce a term for RS_i whose coefficient is fixed at 1 (thus, \beta_1^* = \beta_1 + 1). Rearranging terms, then, we obtain

\log(\gamma_i) = \beta_0 + \beta_1 RS_i + \mathrm{error}_i,   (2.4)
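The reparameterization in (2.3) can be verified numerically. The sketch below uses invented coefficient values (it is not from the dissertation) to confirm that adding log(γ_i) = β_0 + (β_1* − 1)RS_i to the risk score recovers the calibrated log-odds of Cox's model:

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

# hypothetical predicted probabilities from some prognostic model
p_hat = [0.02, 0.10, 0.30, 0.70]
rs = [logit(p) for p in p_hat]

# illustrative Cox coefficients for a miscalibrated model: intercept 0.25, slope 0.80
b0, b1_star = 0.25, 0.80
calibrated = [b0 + b1_star * r for r in rs]       # right-hand side of (2.2)

# offset form: log(gamma_i) = beta_0 + beta_1 * RS_i, with beta_1 = beta_1* - 1
log_gamma = [b0 + (b1_star - 1.0) * r for r in rs]

# RS_i + log(gamma_i) equals the calibrated log-odds, as in (2.3)-(2.4)
assert all(abs(r + g - c) < 1e-12 for r, g, c in zip(rs, log_gamma, calibrated))
```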

where \gamma_i is defined as the observed-to-expected odds ratio, i.e.,

\gamma_i = \frac{p_i / (1 - p_i)}{\hat{p}_i / (1 - \hat{p}_i)}.   (2.5)

This framework enables generalizing the calibration equation to more complex functions of the risk score, for instance, by including in the calibration model a factor representing sequential risk strata, polynomial terms, or spline smoothers. In general, let H be an n \times k matrix representing a k-dimensional basis expansion of the risk score. That is,

H_i = \left[ h_1(RS_i) \; h_2(RS_i) \; \cdots \; h_k(RS_i) \right].   (2.6)

In Cox's linear-logistic model, H_i = [h_1(RS_i)] = [RS_i]. If we are considering discrete risk strata represented by cutpoints \{\xi_1, \xi_2, \ldots, \xi_k\}, then we might have h_1(RS_i) = I(\xi_1 \le RS_i < \xi_2), h_2(RS_i) = I(\xi_2 \le RS_i < \xi_3), and so forth. Various smoothers can similarly be represented under this framework; see Hastie et al. [46] for details. Using H, we estimate the observed-to-expected odds ratio \gamma_i for a given risk score by the offset logistic regression model

\log(\hat{\gamma}_i) = \hat{\alpha} + H_i \hat{\beta}.   (2.7)
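A minimal numerical sketch of fitting the offset model (2.7) in the special case H_i = [RS_i] (Cox's model): this is a hypothetical Newton-Raphson illustration on simulated data, not the dissertation's code, which used standard statistical software.

```python
import math
import random

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(1)
n = 5000

# simulate a miscalibrated model: true log-odds = -0.5 + 1.4 * RS
rs = [random.gauss(-1.0, 1.0) for _ in range(n)]
y = [1 if random.random() < inv_logit(-0.5 + 1.4 * r) else 0 for r in rs]

# fit log(gamma_i) = alpha + beta * RS_i by offset logistic regression,
# i.e. logit Pr(Y = 1) = RS_i + alpha + beta * RS_i, via Newton-Raphson
a = b = 0.0
for _ in range(30):
    mu = [inv_logit(r + a + b * r) for r in rs]              # RS enters as an offset
    u0 = sum(yi - mi for yi, mi in zip(y, mu))               # score for alpha
    u1 = sum((yi - mi) * r for yi, mi, r in zip(y, mu, rs))  # score for beta
    w = [mi * (1.0 - mi) for mi in mu]
    h00 = sum(w)
    h01 = sum(wi * r for wi, r in zip(w, rs))
    h11 = sum(wi * r * r for wi, r in zip(w, rs))
    det = h00 * h11 - h01 * h01
    a += (h11 * u0 - h01 * u1) / det                         # 2x2 Newton update
    b += (h00 * u1 - h01 * u0) / det

# alpha should recover roughly -0.5 and beta roughly 0.4 (= 1.4 - 1)
```

In practice one would fit this with a logistic regression routine that accepts an offset term; the hand-rolled update above only makes the estimating equations explicit.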

A global likelihood ratio test [44] of model (2.7) against a null model with only the offset term (and no intercept) can be used to test for overall miscalibration. In the presence of general miscalibration, we can test whether or not the form of this miscalibration is more complex than a simple location shift. This is equivalent to testing H_0: \beta = 0. Let \hat{\Sigma}_\beta be the estimated variance-covariance matrix of \hat{\beta}. The Wald statistic

W = \hat{\beta}^T \hat{\Sigma}_\beta^{-1} \hat{\beta}   (2.8)

has an asymptotic \chi^2_k distribution under the null hypothesis. Graphical assessments of calibration can be made based on model (2.7) by plotting \log(\hat{\gamma}) against the risk score; perfect calibration in this plot would be represented by a horizontal line at \log(\hat{\gamma}_i) = 0.

To recalibrate the model, we obtain the updated risk scores RS_i^*, defined as

RS_i^* = \log\left[ \frac{\hat{p}_i^*}{1 - \hat{p}_i^*} \right] = RS_i + \log(\hat{\gamma}_i),   (2.9)

and thus the updated predicted probability \hat{p}_i^* is the inverse logit

\hat{p}_i^* = \frac{\exp\{RS_i + \log(\hat{\gamma}_i)\}}{1 + \exp\{RS_i + \log(\hat{\gamma}_i)\}}.   (2.10)

Finally, note that model (2.7) can be re-used in order to assess the adequacy

of recalibration in the test dataset, and to further recalibrate predictions if departures from overall calibration remain.

2.3 Comparing Models on Calibration

A natural measure of overall miscalibration for a model, which will be called the miscalibration index hereafter, is the total size of the \log(\hat{\gamma}_i):

Q = \|\log(\hat{\gamma})\|^2 = \sum_{i=1}^{n} (\log(\hat{\gamma}_i))^2.   (2.11)

The asymptotic behavior of Q is dependent on the structural form of the calibration model; however, given that this structural form is pre-specified and fixed, Q can be useful as a relative quantity for comparing two models. Let RS_{1i} and RS_{2i} be any two risk scores which estimate the log-odds of Y_i. These could be from models fit to the training data, from externally-developed risk scoring algorithms, or from applying various recalibration methods. Fit respective calibration curves based on Equation 2.7 in order to estimate \log(\hat{\gamma}_{1i}) and \log(\hat{\gamma}_{2i}). Based on these quantities, we obtain risk-score-specific miscalibration indexes using Equation 2.11. Thus, we have two miscalibration indexes, Q_1 and Q_2. To compare the models, we define a

miscalibration ratio as

R = \frac{Q_1}{Q_2}.   (2.12)

A bootstrap resampling routine [48] can be used to approximate the sampling distribution of R and obtain an empirical confidence interval.

2.4 An Example

To demonstrate the methodology, data from one million hospital inpatient stays were obtained from the National Inpatient Sample¹ (250,000 per year). The overall dataset was randomly divided into a training dataset (60%), a validation dataset (20%), and a test dataset (20%). Using the training data, a logistic regression model for in-hospital mortality was estimated based on the predictors age, gender, and a factor defined by combining each patient's principal diagnosis with their principal procedure. The U.S. Agency for Healthcare Research and Quality's Clinical Classifications Software² was used to characterize the diagnoses and procedures. Procedures for a given diagnosis were combined if there were fewer than 1,000 observations in the overall dataset represented by that combination of diagnosis and procedure. Diagnoses were then combined using the same minimum cell size. R software for 64-bit Microsoft Windows (The R Foundation for Statistical Computing, Vienna, Austria) was used to perform the analysis. The size of all tests was fixed at 0.05.

¹ HCUP Databases. Healthcare Cost and Utilization Project (HCUP). Agency for Healthcare Research and Quality, Rockville, MD.
² HCUP CCS. Healthcare Cost and Utilization Project (HCUP). Agency for Healthcare Research and Quality, Rockville, MD.

Figure 2.1: Calibration curves as a function of the estimated risk score, among the 20% validation cohort. A histogram of the risk score is overlaid below; the curves are plotted over the middle 99% of the distribution.

Figure 2.1 displays estimated calibration curves, based on both a traditional linear-logistic fit and a six-degree-of-freedom natural cubic regression spline fit of the form (2.7). Predictions were in general bimodal,

with a relatively large group of inpatient stays clustered around a log-odds of −20, i.e., a predicted probability on the order of one in a billion; these stays were mainly associated with obstetric care, psychiatric evaluation, and other minor procedures. However, the spline fit indicates that the model significantly underestimated risk among these stays, with \log(\hat{\gamma}) as high as 10 to 12. If the true log-odds is closer to −9, as the calibration model indicates, then predicted mortality is more on the order of one in ten thousand. In comparison, a Harrell calibration plot [6] (i.e., a smooth plot of the predicted probabilities against the observed incidences) indicates generally good calibration of the raw risk scores (Figure 2.2). Overall, the likelihood ratio test revealed significant miscalibration (P < 0.0001), and the multiparameter Wald test for miscalibration more complex than a simple location shift (Equation 2.8) was also significant (P < 0.0001). The linear-logistic fit, on the other hand, failed to indicate departures from calibration; the 95% confidence interval for the estimated intercept extended to 0.020 (P = 0.17, Wald test) and the 95% confidence interval for the estimated slope was [−0.041, 0.005] (P = 0.12).

Using the independent test dataset, recalibrated risk scores RS_i^* were estimated based on the cubic spline and the linear-logistic calibration models, respectively. Another calibration curve was fit to each set of updated risk scores to graphically evaluate whether or not the recalibration models

Figure 2.2: Harrell calibration plot of the raw risk scores in the 20% validation cohort. A histogram of the raw risk scores is overlaid below the plot.

successfully achieved their goal. These new calibration curves were fit using six-degree-of-freedom natural cubic regression splines. Results are shown in Figure 2.3. The recalibration based on smoothing resulted in updated risk scores that more closely represented risk than the scores did prior to recalibration, as evidenced by the fact that the test-data calibration curve was generally closer to the horizontal line given by \log(\hat{\gamma}) = 0. The global likelihood ratio test did not identify significant miscalibration (P = 0.09). In contrast, both the global test and the multiparameter Wald test evaluating the shape of miscalibration were significant when applied to the updated risk scores obtained from the linear-logistic recalibration model.

To compare the two models on calibration in the test dataset, we pre-specified cutpoints ξ = {.00001, .00003, .0001, .0003, .001, .003, .01, .03, .1, .3} on the probability scale and re-estimated \log(\hat{\gamma}_{1i}) (cubic spline recalibration model) and \log(\hat{\gamma}_{2i}) (linear-logistic recalibration model) for sequential risk strata as characterized by ξ. These cutpoints were chosen because they are roughly equidistant on the logit scale. The miscalibration index for the risk scores that were recalibrated using cubic splines (Q_1) was smaller than that for the risk scores that were recalibrated using the linear-logistic model (Q_2), corresponding to an estimated miscalibration ratio of R = 0.654. In

other words, the risk scores that were recalibrated using cubic splines were 34.6% better calibrated in the test dataset than the risk scores that were recalibrated using the linear-logistic model. An empirical estimate of the 95% confidence interval for R based on bootstrap resampling was [0.651, 0.657].

2.5 Discussion

A simple but flexible recalibration method for binary prediction models has been presented, which extends the results of Cox [7] by allowing for a broader family of functions to characterize calibration. This method works by introducing an offset representing the predicted log-odds of the outcome into the model, thus establishing a regression model for the logarithm of the observed-to-expected odds ratio for the event as a function of the risk score. This representation is also appealing because it facilitates more straightforward graphical assessments of calibration; perfect calibration is given by a horizontal line at zero, instead of a diagonal line through zero. When perfect calibration is represented by a diagonal line, it can be more difficult to identify potentially important deviations [49]. Analogously, when the predicted probabilities are used to assess calibration, as in the method described by Harrell (2001) [6], potentially important deviations for low or high predicted probabilities can be difficult to ascertain graphically.

Figure 2.3: Calibration of two sets of updated risk scores in independent test data, based respectively on a cubic spline recalibration model and a traditional linear-logistic recalibration model. Histograms of the two sets of updated risk scores are overlaid below; calibration curves are shown for the middle 99% of the distributions of the respective risk scores. The displayed calibration curves were fit to the sets of updated risk scores using natural cubic splines.
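The natural cubic regression splines used for these calibration curves are themselves a basis expansion of the risk score, as in (2.6). A minimal sketch of such a basis in the truncated-power form given by Hastie et al. [46] follows; the function name and knot values are illustrative, not from the dissertation:

```python
def natural_cubic_basis(x, knots):
    """Natural cubic spline basis (truncated-power form): one linear column
    plus K - 2 curvature columns, each linear beyond the boundary knots."""
    K = len(knots)

    def d(k, t):
        num = max(t - knots[k], 0.0) ** 3 - max(t - knots[K - 1], 0.0) ** 3
        return num / (knots[K - 1] - knots[k])

    return [[t] + [d(k, t) - d(K - 2, t) for k in range(K - 2)] for t in x]

# with 4 knots this yields 3 columns; more interior knots raise the
# degrees of freedom of the spline fit
basis = natural_cubic_basis([0.5, 1.5, 2.5], knots=[0.0, 1.0, 2.0, 3.0])
```

The defining property of the "natural" construction is linearity beyond the boundary knots, which stabilizes the calibration curve in the sparse tails of the risk-score distribution.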

Steyerberg et al. (2004) [50] describe an "unreliability index" for the linear-logistic calibration model as the difference in −2 log-likelihood between a linear-logistic model with the intercept and slope estimated as free parameters and a model with the intercept and slope fixed at 0 and 1, respectively. This concept was extended to the general calibration framework presented in this article by comparing the likelihood for the fitted calibration model against the likelihood for a model which included only an offset term for the risk score (and no intercept). This difference in likelihoods is useful for testing overall calibration, but a better measure of general calibration is perhaps based on the total size of the estimated observed-to-expected log odds ratios. While the latter measure is dependent on the choice of parameterization for the calibration model, it is nonetheless useful for comparing two risk scores on calibration performance, given a pre-specified parameterization that is common between the two risk scores (such as discrete risk strata).

A test for assessing whether or not observed miscalibration is more complex than overall calibration-in-the-large was also proposed. This amounts to a standard Wald test for a collection of model coefficients. If this test is not significant, then the miscalibration might be easily correctable by estimating an intercept-only calibration model (which includes an offset for the risk score). However, the possibility of calibration model misspecification cannot be excluded from consideration.
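The comparison of two risk scores via the miscalibration index and ratio can be sketched as follows. The stratum-level log(γ) values are invented for illustration, and the bootstrap here resamples strata for brevity; in the dissertation's application one would resample patients and refit the calibration models on each replicate:

```python
import random

def miscal_index(log_gammas):
    """Q: squared norm of the estimated log O/E odds ratios (Equation 2.11)."""
    return sum(g * g for g in log_gammas)

random.seed(2)
# hypothetical stratum-level log(gamma) estimates for two competing risk scores
lg1 = [random.gauss(0.0, 0.1) for _ in range(10)]   # better-calibrated score
lg2 = [random.gauss(0.0, 0.3) for _ in range(10)]   # worse-calibrated score

r_hat = miscal_index(lg1) / miscal_index(lg2)       # Equation 2.12

# percentile bootstrap for an empirical 95% confidence interval
boots = []
for _ in range(2000):
    idx = [random.randrange(10) for _ in range(10)]
    boots.append(miscal_index([lg1[i] for i in idx]) /
                 miscal_index([lg2[i] for i in idx]))
boots.sort()
lo, hi = boots[49], boots[1949]                     # empirical 95% interval
```

A ratio below 1 favors the first risk score, mirroring the R = Q_1/Q_2 comparison in the example.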

After the calibration model is fit to the validation dataset, recalibrating the risk score amounts to adding the calibration model prediction to the original risk score. Calibration of the corrected scores can then be evaluated by re-fitting the calibration model in an independent set of data.

In summary, a more flexible method for assessing calibration of binary prediction models and recalibrating model predictions, as well as a measure of relative calibration performance between competing models, were presented.
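The update step just described (adding the calibration model's prediction to the original risk score, per Equations 2.9 and 2.10) can be sketched in a few lines. The values below are illustrative, echoing the chapter's example of a prediction near one in a billion shifted upward by a log O/E odds ratio near 11:

```python
import math

def recalibrate(p_hat, log_gamma):
    """Apply Equations 2.9-2.10: shift the risk score by the estimated
    log observed-to-expected odds ratio, then invert the logit."""
    rs = math.log(p_hat / (1.0 - p_hat))
    rs_star = rs + log_gamma
    return 1.0 / (1.0 + math.exp(-rs_star))

# a raw prediction of about one in a billion, with log(gamma) estimated
# near 11, moves to roughly the order of one in ten thousand
p_updated = recalibrate(1e-9, 11.0)
```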

Chapter 3

Baseline Risk Adjustment for In-Hospital Mortality using ICD-9-CM Codes

3.1 Introduction

All stakeholders in health care (patients, insurers, governments, and providers) are now demanding that we rigorously assess quality. Further, these quality measurements have come to influence both patients' selection of providers and payment for services [51, 52]. As such, ensuring that comparisons among physicians and among institutions are fair is critical. Consequently, risk adjustment models abound [11, 9, 8, 53, 54, 55]. Alarmingly, there is evidence that many of these models are inconsistent in terms of their characterizations of risk [56, 57]. Which model is best for risk adjustment is a difficult question to answer.

Often, the relative merits of risk adjustment models are established by their ability to discriminate between patients who experience the outcome(s) in question and patients who do not, for instance, by using the concordance index (or C-statistic) [3]. Generally speaking, models which are developed for patient populations that exhibit a broad spectrum of risk (such as the US inpatient population or the population of Medicare patients) can discriminate outcomes very well; it is thus rather unsurprising that these models achieve very high concordance indices. Sessler and colleagues developed a risk adjustment methodology which incorporated all diagnoses and procedures associated with each stay, and showed that their models discriminated outcomes comparably to or better than both patient demographic characteristics and the Charlson Comorbidity Index [53]. Their risk stratification indices (RSIs) are based on International Classification of Disease, 9th Revision, Clinical Modification (ICD-9-CM) codes and were estimated and validated using the Medicare Provider Analysis and Review (MEDPAR) database. Since elderly patients are generally sicker than younger patients, probability estimates obtained from these models might not accurately represent actual risk in external populations. (Though, concern regarding this type of model calibration is hardly unique to these models.)

Furthermore, the ICD-9-CM diagnosis and procedure codes in the MEDPAR database did not incorporate present-on-admission (POA) indicators. To the extent that new and problematic diagnoses (such as hospital-acquired infection) occur after admission, these risk adjustment models may over-estimate the baseline risk associated with each stay. This results in artificially lower (better) quality of care metrics such as risk-adjusted mortality rate. Furthermore, the degree by which quality-of-care metrics are under-estimated may differ if certain hospitals have a greater proportion of in-hospital complications (i.e., are of poorer quality) than others [58, 59, 60]. On the other hand, hospital-acquired diagnoses could also be beneficial in terms of risk of outcomes. Ensuring that expected risk is characterized using only pre-existing conditions may therefore be important. Along these lines, POA indicators could inform risk adjustment of hospital outcomes by eliminating diagnoses representing hospital-acquired complications from risk-adjustment algorithms [61] (as well as protective diagnoses whose absence prior to admission would indicate higher baseline risk than that predicted by ignoring their POA status). However, with POA coding yet to be implemented in some locations (despite the fact that the practice was mandated by Congress in 2005), risk adjustment models that do not rely on POA indicators might

still be useful, provided they do not yield biased or inaccurate estimates. The primary goal of this research is thus to develop and prospectively validate a baseline risk index for in-hospital mortality (which we denote "POARisk") using only POA diagnoses, principal procedures, and secondary procedures occurring exclusively prior to the date of the principal procedure; secondarily, we will assess the reliability of a similarly-derived risk index which ignores the POA status of diagnoses and the timing of procedures (thus including all diagnoses and procedures), in terms of accurately representing the primary baseline risk index.

3.2 Methods

Under authorization by the US Agency for Healthcare Research and Quality, we obtained data on 24 million inpatient discharges from the California State Inpatient Database (CA-SID). This registry represents a census of discharges occurring within the state. POA indicators are captured for all ICD-9-CM diagnosis codes; likewise, date of procedure (relative to the admission date) is captured for all ICD-9-CM procedure codes. The earlier years of data were used to derive our models, while the data from 2009 were used to prospectively test them (see Model Performance and Reliability below). The data were further split randomly to facilitate a

two-step modeling procedure. Eighty percent of those discharges were used for initial model development, and the remaining 20% were used to perform an initial calibration, or bias-correction, of risk estimates produced by the initial model (we say an "initial" calibration because we feel that calibration should constantly be addressed whenever the model is applied in external populations; see the Model Development and Calibration subsections below for details). The only exclusion we made was for discharges in which the patient did not undergo a procedure; thus our models sought to characterize in-hospital mortality risk for all inpatients undergoing at least one procedure. A summary of discharges included and excluded, as well as how the included discharges were partitioned for the purposes of our study, is provided in Figure 3.1.

3.2.1 Model Development

To develop the initial POARisk model, we used logistic regression with in-hospital mortality as the response and a collection of predictors derived from the ICD-9-CM diagnosis and procedure codes. Considered as inputs to our model were the POA diagnosis codes, the principal procedure code, and any secondary procedure codes for which the date of procedure was prior (but not equal) to the date of the principal procedure. We also used patients' age

Figure 3.1: Study flow diagram.

and gender (age was represented by two predictors: one which estimated risk for infants less than 1 year old, and a linear term for the rest of the patients). The ICD-9-CM codes are hierarchical in nature. For example, acute myocardial infarctions are coded with diagnosis code 410.XX; the fourth digit further classifies these diagnoses based on the location (e.g., 410.2X refers to the inferolateral wall), and the fifth digit specifies the episode of care. As such, many of the five-digit codes lacked sufficient representation for inclusion in our logistic model: an aggregation routine was needed in order to have predictors with adequate cell sizes.

In similar fashion to the original RSI [11], we aggregated these sparsely-represented diagnoses by truncating the fifth digit from the corresponding ICD-9-CM diagnosis code. Codes with fewer than 1,000 discharges per year on average in the 80% model development cohort were truncated to four digits (for this average calculation we excluded the year 2004, as there were a number of new codes introduced the next year). The process was repeated, truncating sparsely-represented four-digit codes to three digits. Three-digit codes represented by fewer than 1,000 discharges per year were not included in the model. A comparable aggregation algorithm was implemented for the procedure codes, though we note that procedure codes are represented by a maximum of four digits and the base codes are only two digits; thus we

aggregated procedure codes from four to three to two digits based on the 1,000-discharges-per-year criterion. We used an elastic net approach to fit logistic models based on the aggregated predictors [18]. The elastic net is a shrinkage methodology devised to protect against over-fitting a model to the development cohort. The term "shrinkage" comes from the fact that regression coefficients are purposely biased toward zero; this has been shown to improve prediction accuracy in external cohorts (specifically, the elastic net encourages highly-correlated predictors to be averaged while at the same time encouraging irrelevant predictors to be removed from the model altogether) [46, 15]. Removing variables in this manner has been shown to have favorable statistical properties over traditional methods such as stepwise variable selection [15]. To fit these models, we used the R statistical software package glmnet, developed by Friedman et al. (2010) [19] (on R for 64-bit Linux, The R Project for Statistical Computing, Vienna, Austria).

3.2.2 Calibration

With pay-for-performance pressures, providers tend to avoid high-risk patients for fear of the underlying risk adjustment model's inability to adequately describe their risk of outcome [4]. In other words, there is either a

perceived or real lack of agreement between the predicted probability of an outcome produced by the model in question and the actual probability of the outcome in a new set of patients, i.e., a lack of model calibration. Alas, calibration is often overlooked in risk adjustment modeling [62]; even when calibration is considered, it is often as a model diagnostic [63] instead of as a prescription for adjusting the model estimates to remove any biases introduced by the lack of calibration. Furthermore, calibration in the patient population for which a model is developed is no guarantee of calibration in prospective and/or external patient populations.

Therefore, we initially calibrated our model using the randomly-reserved 20% calibration cohort, with the intention that calibration be again assessed and corrected whenever the model is used on new data. This calibration was done by fitting a logistic regression model with in-hospital mortality as the response and the risk score (i.e., the model prediction on the log-odds scale) as the only predictor [7, 64]. An offset term (that is, a predictor for which the coefficient is fixed at a value of one) was used in the model in order to represent miscalibrations as deviations from a horizontal line at zero (see Figure 3.2). Restricted cubic splines were used to allow for nonlinearities, resulting in a robust calibration curve which corrected initial mortality risk estimates based on the actual incidences observed in the calibration cohort. Predicted log-odds of mortality were then adjusted based on

this calibration curve to yield the final POARisk score.

3.2.3 Comparator Model

Our research objective was to evaluate whether or not the absence of POA indicators precludes accurate and unbiased estimation of patients' baseline risk of mortality. To study this hypothesis, we developed a second model (which we denote "AllCodeRisk"). For this model, we employed the same strategy as in the primary POARisk model (including the initial calibration step). The only difference between the two models was the inclusion in the AllCodeRisk model of all diagnosis codes, regardless of whether or not they were POA, and all procedure codes, regardless of when they were performed during the stay.

3.2.4 Model Performance and Reliability

We used the 2009 CA-SID to prospectively evaluate the performance of the POARisk and AllCodeRisk models. Each risk score underwent a second calibration step whereby the risk scores were modified to calibrate specifically with the 2009 data, using the same calibration methodology as described above. (Hereafter, reference to the POARisk and AllCodeRisk scores in the context of the 2009 data refers to these recalibrated scores.) Discriminative ability of the corrected scores was evaluated by estimating respective concordance indices.

Figure 3.2: Calibration curves displaying the relationship between observed outcomes and model predictions. On the x-axis is the risk score (given as a predicted probability), and on the y-axis is an observed-to-expected (O/E) odds ratio. Perfect calibration implies an O/E odds ratio of 1 across the risk spectrum. An O/E odds ratio of 0.1 among patients with predicted mortality risk of 10⁻³, for example, implies that observed mortality was 10 times less likely for these patients than predicted by the model (thus the risk score was too high). Histograms of the risk scores underlie each panel. Calibration curves are truncated to the middle 99% of the data. Panel (a) displays the calibration of raw scores from the logistic model, within the random 20% calibration cohort. Correcting predictions based on these curves was insufficient to ensure complete calibration in the prospective 2009 data (b). However, re-calibration within the 2009 data based on the curve in (b) yielded favorable calibration for both models (c).

To evaluate the ability of the AllCodeRisk model to accurately approximate the POARisk model in terms of individual patient risk, we calculated a ratio of predicted odds (RPO) for each patient as the odds of mortality under the AllCodeRisk model divided by the odds under the POARisk model. If the AllCodeRisk model accurately approximates the POARisk model, then the RPOs would be tightly clustered around a value of 1.0. We thus pre-supposed that adequate approximation would be demonstrated if risk under the AllCodeRisk model was within ±50% of risk under the POARisk model for at least 95% of patients (i.e., an RPO between 0.5 and 1.5). We also evaluated this accuracy at different levels of patient risk (as defined by the POARisk model) by making a Bland-Altman-type plot (i.e., a scatterplot of the RPO vs. POARisk) [49]. On this plot we overlaid quantile regression curves approximating the median, inter-quartile range, and middle 95% of the data as a function of POARisk.

Finally, we performed an analysis whereby hospital performance was compared under each model. Hospital performance was defined as an observed-to-expected mortality ratio, with the expected number of mortalities differentially defined for the two models. For a given model, the expected number of mortalities was calculated as the sum of individual patients' predicted probabilities of mortality. The percent difference in O/E ratio was then calculated for each hospital and analyzed using a histogram (for this analysis, we excluded hospitals for which there were fewer than 30 inpatient mortalities over the year 2009). Adequate approximation of hospital performance via the AllCodeRisk model was pre-defined as having at least 95% of hospitals with an O/E ratio under the AllCodeRisk model that was within ±20% of that defined by the POARisk model.

3.3 Results

Of the 20 million discharges in the CA-SID, 7.3 million were associated with inpatient stays for which no procedures were performed. Removing these discharges and randomly partitioning the data, we used 10.1 million discharges (80%) for fitting the logistic models and 2.5 million (20%) for estimating the calibration curves. Aggregation of the ICD-9-CM codes based on the cell size criterion of 1,000 patients per year on average resulted in 2,476 predictors for the POARisk model (1,807 diagnosis-related predictors, 666 procedure-related predictors, and 3 demographic-related predictors) and 2,584 predictors for the AllCodeRisk model (1,870, 711, and 3 predictors, respectively). The elastic net logistic regression modeling algorithm removed 501/2,476 (20.2%) and 494/2,584 (19.1%)

irrelevant predictors, respectively. Calibration of the raw risk scores among the randomly-reserved 20% initial calibration cohort was poor (Figure 3.2a). Both risk scores overestimated risk for patients with predicted probabilities roughly between 10⁻⁴ and 10⁻² and for patients with predicted probabilities roughly greater than 0.6. Using these calibration curves to correct the model estimates and applying the corrected models to the 2009 discharges, we found that calibration generally improved for both models but was still less than ideal (Figure 3.2b). However, re-calibration based on these curves resulted in risk estimates that unbiasedly represented true risk among the 2009 data (Figure 3.2c). C-statistics for the POARisk and AllCodeRisk models (as re-calibrated to the 2009 data) were both high (0.981 for the latter), indicating a high degree of discriminative ability.

In the 2009 data, individual patient risk estimates based on the AllCodeRisk model tended to depart from those obtained from the POARisk model. Adequate approximation of baseline risk by the AllCodeRisk model, defined as a risk estimate within 50% of that obtained from the POARisk model (i.e., an RPO between 0.5 and 1.5), occurred in only 15.8% of patients, far below the pre-specified criterion of at least 95% of patients. Risk estimates were lower under the AllCodeRisk model than under the POARisk model for 92.5% of patients. The median RPO was 0.25,

i.e., the predicted odds of mortality under the AllCodeRisk model was 0.25 times the predicted odds under the POARisk model or lower for 50% of the 2009 discharges. This inconsistency in risk estimates generally held across the entire risk spectrum (Figure 3.3).

In the analysis of hospital performance under the two models, we excluded 125/424 hospitals (29.5%) because they had fewer than 30 deaths in 2009; thus we analyzed 299 hospitals. The median percent change in observed-to-expected mortality ratio (using the AllCodeRisk model vs. using the POARisk model) was -0.1% (Figure 3.4). Ninety-five percent of the hospitals had a percent change between -20.8% and +30.6%, implying that variability in performance was not adequately low (based on our pre-defined criterion of within ±20% for at least 95% of hospitals); 89.6% of hospitals had percent differences smaller than ±20%.

3.4 Discussion

Risk adjustment is fraught with difficulties. Choosing the right risk adjustment model to gauge quality has financial implications. To a large extent, statistical performance should guide this decision. We developed two highly-accurate models for in-hospital mortality, based on differing sets of ICD-9-CM codes. The POARisk model used only codes that would resemble our best

Figure 3.3: Scatterplot of 50,000 randomly sampled discharges from the 2009 data, displaying the ratio of predicted odds of mortality (calculated as the predicted odds under the AllCodeRisk model divided by the predicted odds under the POARisk model) as a function of the POARisk score. Each of the risk scores had been re-calibrated to the 2009 data. Fidelity of the AllCodeRisk model to the POARisk model in terms of characterizing individual patient risk is represented by the horizontal line at an RPO of 1.0 (dashed line). Quantile regression curves displaying the median, first and third quartiles, and middle 95% of the data as a function of POARisk (fit using the entire 2009 sample) are overlaid. The plot shows that the AllCodeRisk model produces risk estimates that are too low for the majority of the patients, though risk estimates among high-risk patients were more consistent.
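The two agreement measures used in this chapter, the patient-level ratio of predicted odds and the hospital-level percent change in O/E mortality ratio, reduce to a few lines of arithmetic. The probabilities and outcomes below are invented for illustration:

```python
def odds(p):
    return p / (1.0 - p)

def rpo(p_allcode, p_poa):
    """Ratio of predicted odds: AllCodeRisk odds over POARisk odds."""
    return odds(p_allcode) / odds(p_poa)

# hypothetical (AllCodeRisk, POARisk) predicted probabilities for 3 patients
pairs = [(0.010, 0.030), (0.050, 0.060), (0.200, 0.210)]
rpos = [rpo(a, p) for a, p in pairs]
frac_adequate = sum(0.5 <= r <= 1.5 for r in rpos) / len(rpos)

def oe_ratio(deaths, expected_probs):
    """Observed-to-expected mortality ratio for one hospital."""
    return sum(deaths) / sum(expected_probs)

# hypothetical hospital: 3 deaths among 5 stays, with risk under each model
died = [1, 0, 1, 0, 1]
p_poa = [0.40, 0.10, 0.50, 0.20, 0.80]
p_all = [0.30, 0.05, 0.45, 0.15, 0.75]
pct_change = 100.0 * (oe_ratio(died, p_all) / oe_ratio(died, p_poa) - 1.0)
```

Because the AllCodeRisk probabilities here are uniformly lower, the expected death count shrinks and the O/E ratio rises, which is the direction of bias the chapter attributes to ignoring POA status.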


More information

TECHNICAL APPENDIX WITH ADDITIONAL INFORMATION ON METHODS AND APPENDIX EXHIBITS. Ten health risks in this and the previous study were

TECHNICAL APPENDIX WITH ADDITIONAL INFORMATION ON METHODS AND APPENDIX EXHIBITS. Ten health risks in this and the previous study were Goetzel RZ, Pei X, Tabrizi MJ, Henke RM, Kowlessar N, Nelson CF, Metz RD. Ten modifiable health risk factors are linked to more than one-fifth of employer-employee health care spending. Health Aff (Millwood).

More information

Using Instrumental Variables to Find Causal Effects in Public Health

Using Instrumental Variables to Find Causal Effects in Public Health 1 Using Instrumental Variables to Find Causal Effects in Public Health Antonio Trujillo, PhD John Hopkins Bloomberg School of Public Health Department of International Health Health Systems Program October

More information

A Modern Look at Classical Multivariate Techniques

A Modern Look at Classical Multivariate Techniques A Modern Look at Classical Multivariate Techniques Yoonkyung Lee Department of Statistics The Ohio State University March 16-20, 2015 The 13th School of Probability and Statistics CIMAT, Guanajuato, Mexico

More information

Chapter 4. Parametric Approach. 4.1 Introduction

Chapter 4. Parametric Approach. 4.1 Introduction Chapter 4 Parametric Approach 4.1 Introduction The missing data problem is already a classical problem that has not been yet solved satisfactorily. This problem includes those situations where the dependent

More information

Generalized Elastic Net Regression

Generalized Elastic Net Regression Abstract Generalized Elastic Net Regression Geoffroy MOURET Jean-Jules BRAULT Vahid PARTOVINIA This work presents a variation of the elastic net penalization method. We propose applying a combined l 1

More information

A Study of Statistical Power and Type I Errors in Testing a Factor Analytic. Model for Group Differences in Regression Intercepts

A Study of Statistical Power and Type I Errors in Testing a Factor Analytic. Model for Group Differences in Regression Intercepts A Study of Statistical Power and Type I Errors in Testing a Factor Analytic Model for Group Differences in Regression Intercepts by Margarita Olivera Aguilar A Thesis Presented in Partial Fulfillment of

More information