On the Use of the Bross Formula for Prioritizing Covariates in the High-Dimensional Propensity Score Algorithm

Richard Wyss¹, Bruce Fireman², Jeremy A. Rassen³, Sebastian Schneeweiss¹

Author Affiliations:
¹ Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School
² Kaiser Permanente, Northern California
³ Aetion, Inc., New York, New York

In the article titled "High-dimensional Propensity Score Adjustment in Studies of Treatment Effects Using Health Care Claims Data," the authors introduce a semi-automated variable selection algorithm for high-dimensional proxy adjustment within healthcare insurance claims databases [1]. The high-dimensional propensity score (HDPS) algorithm evaluates thousands of diagnostic, procedural, and medication claims codes and, for each code, generates binary variables based on how frequently the code occurs during a defined pre-exposure covariate assessment period. The HDPS then prioritizes, or ranks, each variable by its potential for bias, assessing the variable's prevalence and its univariate associations with the treatment and the outcome according to the Bross formula [1, 2]. From this ordered list, investigators specify the number of variables to include in the HDPS model along with prespecified variables such as age and sex [1]. A full description of the HDPS algorithm is provided elsewhere [1].

In the original article by Schneeweiss et al. [1], the Bross bias multiplier for prioritizing covariates was defined as

\[
\begin{cases}
\dfrac{P_{C1}(RR_{CD} - 1) + 1}{P_{C0}(RR_{CD} - 1) + 1}, & \text{if } RR_{CD} \geq 1 \\[2ex]
\dfrac{P_{C1}\left(\frac{1}{RR_{CD}} - 1\right) + 1}{P_{C0}\left(\frac{1}{RR_{CD}} - 1\right) + 1}, & \text{if } RR_{CD} < 1,
\end{cases}
\]

where $P_{C1}$ represents the prevalence of the binary covariate within the exposed group, $P_{C0}$ the prevalence of the binary covariate within the unexposed group, and $RR_{CD}$ the relative risk for the univariate association between the binary covariate and the study outcome.
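The piecewise definition above can be sketched in a few lines of Python. This is an illustrative sketch only, not the distributed HDPS software: the function name, the three covariates, and their prevalences and relative risks are all hypothetical, and ranking by the absolute log of the multiplier is an assumption taken from the usual description of HDPS prioritization rather than from this text.

```python
import math


def bross_multiplier_original(p_c1: float, p_c0: float, rr_cd: float) -> float:
    """Bias multiplier as defined in the 2009 article (piecewise in RR_CD).

    p_c1: prevalence of the binary covariate among the exposed
    p_c0: prevalence of the binary covariate among the unexposed
    rr_cd: univariate covariate-outcome relative risk (must be > 0)
    """
    if rr_cd <= 0:
        raise ValueError("rr_cd must be positive")
    # When RR_CD < 1, the 2009 definition substitutes 1 / RR_CD.
    rr = rr_cd if rr_cd >= 1 else 1.0 / rr_cd
    return (p_c1 * (rr - 1) + 1) / (p_c0 * (rr - 1) + 1)


# Hypothetical covariates: (name, P_C1, P_C0, RR_CD)
covariates = [
    ("dx_A", 0.30, 0.10, 2.0),
    ("rx_B", 0.05, 0.20, 0.5),
    ("px_C", 0.15, 0.15, 3.0),
]

# Assumed prioritization rule: largest |log(multiplier)| first.
ranked = sorted(
    covariates,
    key=lambda c: abs(math.log(bross_multiplier_original(c[1], c[2], c[3]))),
    reverse=True,
)

for name, p_c1, p_c0, rr_cd in ranked:
    print(name, round(bross_multiplier_original(p_c1, p_c0, rr_cd), 4))
```

A covariate with equal prevalence in both exposure groups (px_C here) has a multiplier of exactly 1 and so falls to the bottom of the list, whatever its association with the outcome.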

One of us (BF) noted that for correct assessment of a binary covariate's confounding impact, the Bross bias multiplier should be defined simply as

\[
\dfrac{P_{C1}(RR_{CD} - 1) + 1}{P_{C0}(RR_{CD} - 1) + 1}, \quad \text{for all values of } RR_{CD}.
\]

We repeated a subset of the analyses from the original manuscript using a revised HDPS that included the correct implementation of the Bross formula. A full description of the data sources used for the empirical analyses is provided in the original manuscript [1, 3].

Table 1 shows that the results were almost unchanged from those reported in the original manuscript after using the above Bross formula for covariate prioritization. For the NSAID data example (Table 1), 199 of the top 200 ranked variables and 476 of the top 500 ranked variables were common to both the ordering from the revised HDPS and the ordering from the original manuscript. For the Statin data example (Table 1), 193 of the top 200 ranked variables and 486 of the top 500 ranked variables were common to both orderings.

The HDPS software that is distributed online has been updated to include the modified implementation of the Bross formula [4]. Results from analyses conducted with older versions of the HDPS algorithm are unlikely to change meaningfully after this correction.
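To make the difference between the two definitions concrete, the following sketch compares them on a handful of made-up covariates and counts how many variables the two top-k lists share, in the spirit of the "199 out of 200" comparison. Everything here is hypothetical (function names, covariate names, prevalences, relative risks), and ranking by the absolute log of the multiplier is an assumption not stated in this text.

```python
import math


def bross_original(p_c1: float, p_c0: float, rr_cd: float) -> float:
    """2009 definition: RR_CD is replaced by 1 / RR_CD when RR_CD < 1."""
    rr = rr_cd if rr_cd >= 1 else 1.0 / rr_cd
    return (p_c1 * (rr - 1) + 1) / (p_c0 * (rr - 1) + 1)


def bross_revised(p_c1: float, p_c0: float, rr_cd: float) -> float:
    """Revised definition: the same expression for all values of RR_CD."""
    return (p_c1 * (rr_cd - 1) + 1) / (p_c0 * (rr_cd - 1) + 1)


def top_k(covs: dict, multiplier, k: int) -> set:
    """Names of the k covariates with the largest |log(multiplier)| (assumed rule)."""
    ranked = sorted(
        covs,
        key=lambda name: abs(math.log(multiplier(*covs[name]))),
        reverse=True,
    )
    return set(ranked[:k])


# Hypothetical covariates: name -> (P_C1, P_C0, RR_CD)
covs = {
    "dx_A": (0.30, 0.10, 2.0),
    "rx_B": (0.05, 0.20, 0.5),
    "px_C": (0.15, 0.15, 3.0),
    "dx_D": (0.40, 0.25, 0.25),
}

for name, args in covs.items():
    print(name, round(bross_original(*args), 4), round(bross_revised(*args), 4))

k = 2
overlap = len(top_k(covs, bross_original, k) & top_k(covs, bross_revised, k))
print(f"{overlap} out of {k} top-ranked variables are common to both orderings")
```

In this toy example the two definitions agree whenever RR_CD is at least 1, while for protective covariates (RR_CD < 1) both the size and the direction of the multiplier can differ. The two top-2 sets still coincide even though the variable ranked first does not, which mirrors the small changes in ordering reported above.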

REFERENCES

1. Schneeweiss S, Rassen JA, Glynn RJ, Avorn J, Mogun H, Brookhart MA. High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology. 2009;20:512-522.
2. Bross ID. Spurious effects from an extraneous variable. Journal of Chronic Diseases. 1966;19:637-647.
3. Rassen JA, Glynn RJ, Brookhart MA, Schneeweiss S. Covariate selection in high-dimensional propensity score analyses of treatment effects in small samples. American Journal of Epidemiology. 2011;173:1404-1413.
4. Rassen JA, Doherty M, Huang W, Schneeweiss S. Pharmacoepidemiology toolbox. Boston, MA. http://www.hdpharmacoepi.org

Table 1. Comparison of results for a subset of analyses reported in Schneeweiss et al. (2009) with results using the revised HDPS

                                                                                    HDPS results reported in 2009 [1]     Results with revised HDPS
Dataset (a)  Model (b)  No. covariates selected (c)  Common selected variables (d)  C-statistic  Odds ratio  95% CI       C-statistic  Odds ratio  95% CI
NSAID        1          Unadjusted                   -                              -            1.09        0.91, 1.30   -            1.09        0.91, 1.30
NSAID        2          d = 4                        -                              0.61         1.01        0.84, 1.21   0.61         1.01        0.84, 1.21
NSAID        3          d = 4, l = 14                -                              0.66         0.94        0.78, 1.12   0.66         0.94        0.78, 1.12
NSAID        4          d = 4, l = 14, k = 200       199 out of 200                 0.69         0.86        0.72, 1.04   0.69         0.86        0.72, 1.04
NSAID        5          d = 4, l = 14, k = 500       476 out of 500                 0.71         0.88        0.73, 1.06   0.71         0.87        0.72, 1.06
NSAID        5b         d = 4, k = 500               476 out of 500                 0.71         0.87        0.72, 1.05   0.70         0.88        0.73, 1.06
Statin       1          Unadjusted                   -                              -            0.56        0.51, 0.62   -            0.56        0.51, 0.62
Statin       2          d = 4                        -                              0.70         0.77        0.69, 0.85   0.70         0.77        0.69, 0.85
Statin       3          d = 4, l = 42                -                              0.82         0.80        0.70, 0.90   0.82         0.80        0.70, 0.90
Statin       4          d = 4, l = 42, k = 200       193 out of 200                 0.86         0.86        0.76, 0.98   0.85         0.86        0.76, 0.98
Statin       5          d = 4, l = 42, k = 500       486 out of 500                 0.87         0.86        0.76, 0.98   0.87         0.87        0.76, 0.99
Statin       5b         d = 4, k = 500               486 out of 500                 0.86         0.89        0.78, 1.02   0.86         0.90        0.79, 1.02

(a) NSAID: comparison of nonsteroidal anti-inflammatory drugs (NSAID) versus selective COX-2 inhibitors on GI complications; Statin: comparison of statin use versus glaucoma initiation on risk of death (see Schneeweiss et al. (2009) for further details).
(b) Models 1 through 5b in the above table correspond to Models 1 through 5b in Tables 3 and 4 of the original manuscript by Schneeweiss et al. (2009).
(c) d = the number of demographic variables, l = the number of predefined covariates, and k = the number of empirically selected variables (see Schneeweiss et al. (2009) for further details).
(d) Number of HDPS-generated variables selected by the revised HDPS (i.e., HDPS with correct implementation of the Bross formula for covariate prioritization) that were also selected by the version of HDPS used in Schneeweiss et al. (2009).