Distributed analysis in multi-center studies


Sharing of individual-level data across health plans or healthcare delivery systems continues to be challenging due to concerns about loss of patient privacy, unauthorized uses of transferred data, inaccurate analysis or interpretation of data, or contractual or legal restrictions. Although these challenges can be addressed in part by proper governance and appropriate updates to existing regulations, newer privacy-protecting analytic and data-sharing methods offer another potential solution. This presentation will describe the use of privacy-protecting analytic methods that allow robust and flexible statistical analysis using aggregate-level information, without centralized pooling of individual-level datasets across data sources. We will present several comparative safety and effectiveness studies of medical treatments that employ these methods to generate actionable real-world evidence.

1. Toh S, Gagne JJ, Rassen JA, Fireman BH, Kulldorff M, Brown JS. Confounding adjustment in comparative effectiveness research conducted within distributed research networks. Med Care 2013;51(8 Suppl 3):S4-S10.
2. Toh S, Hampp C, Reichman ME, Graham DJ, Balakrishnan S, Pucino F, Hamilton J, Lendle S, Iyer A, Rucker M, Pimentel M, Nathwani N, Griffin MR, Brown NJ, Fireman BH. Risk of hospitalized heart failure among new users of saxagliptin, sitagliptin, and other antihyperglycemic drugs: A retrospective cohort study. Ann Intern Med 2016;164(11):705-714 (PMC5178978).
3. Toh S, Reichman ME, Graham DJ, Hampp C, Zhang R, Butler MG, Iyer A, Rucker M, Pimentel M, Hamilton J, Lendle S, Fireman BH; for the Mini-Sentinel AMI-Saxagliptin Surveillance Writing Group. Prospective post-marketing surveillance of acute myocardial infarction in new users of saxagliptin: A population-based study. Diabetes Care 2018;41(1):39-48.

Distributed analysis in multi-center studies
Darren Toh, ScD
Department of Population Medicine, Harvard Medical School & Harvard Pilgrim Health Care Institute, Boston, MA
November 18, 2018

Disclosures
Research support:
- Patient Centered Outcomes Research Institute (ME 1403 11305)
- Office of the Assistant Secretary for Planning and Evaluation & Food and Drug Administration (HHSF223200910006I)
- National Institutes of Health (U01EB023683)
- Agency for Healthcare Research and Quality (R01HS026214)
Board of Directors, International Society for Pharmacoepidemiology
My spouse is an employee of Biogen

Overview
- Evolution of multi-center studies
- Analytic methods in multi-center studies
- Select examples
- Discussion

Multi-center studies
Many studies are now done in multi-center settings

Why do multi-center studies?
- Larger sample sizes
  - Allow studies of rare treatments or rare outcomes
  - Allow studies in specific subpopulations
  - Allow studies to be done more quickly
- More diverse populations
  - Allow more generalizable findings
  - Allow assessment of treatment effect heterogeneity

Multi-center studies v1.0
[Figure: data flow into a single analysis center]
Pooling study-specific individual-level datasets

Typical datasets shared in multi-center studies v1.0

PatID  Exposure  Outcome  Time  Age  Sex  DM  HTN  CVD
001    1         0        312   33   1    0   1    1
002    0         0         40   45   1    0   1    0
003    0         0        365   76   0    0   0    0
004    0         0        200   56   0    1   0    0
005    0         1          2   21   0    0   1    0
006    1         1         15   80   1    0   0    1
007    1         0          4   65   1    1   0    1
008    1         0        145   77   0    1   0    0
009    0         1         33   48   1    0   0    0
010    0         0         98   52   1    0   0    0
011    0         0         34   32   0    0   0    0

In this dataset, each row represents an individual and each column represents a covariate.

Typical datasets shared in multi-center studies v1.0

Data Partner 1
PatID  Exposure  Outcome  Time  Age  Sex  DM  HTN  CVD
001    1         0        312   33   1    0   1    1
002    1         0         40   45   1    0   1    0
003    1         0        365   76   0    0   0    0
004    1         0        200   56   0    1   0    0
005    0         1          2   21   0    0   1    0
006    0         1         15   80   1    0   0    1
007    0         0          4   65   1    1   0    1
008    0         0        145   77   0    1   0    0

Data Partner 2
PatID  Exposure  Outcome  Time  Age  Sex  DM  HTN  CVD
001    0         1         35   44   0    1   3    0
002    0         1        213   54   0    1   1    1
003    0         1        453   78   0    0   4    1
004    0         0         58   87   1    0   3    1
005    1         0         31   22   1    0   3    0
006    1         0         56   46   0    1   2    0
007    1         0        123   53   0    1   1    1
008    1         0        546   35   0    0   3    0

Pooled dataset
Site  PatID  Exposure  Outcome  Time  Age  Sex  DM  HTN  CVD
1     001    1         0        312   33   1    0   1    1
1     002    1         0         40   45   1    0   1    0
1     003    1         0        365   76   0    0   0    0
1     004    1         0        200   56   0    1   0    0
1     005    0         1          2   21   0    0   1    0
1     006    0         1         15   80   1    0   0    1
1     007    0         0          4   65   1    1   0    1
1     008    0         0        145   77   0    1   0    0
2     001    0         1         35   44   0    1   3    0
2     002    0         1        213   54   0    1   1    1
2     003    0         1        453   78   0    0   4    1
2     004    0         0         58   87   1    0   3    1
2     005    1         0         31   22   1    0   3    0
2     006    1         0         56   46   0    1   2    0
2     007    1         0        123   53   0    1   1    1
2     008    1         0        546   35   0    0   3    0

Multi-center studies v2.0
[Figure: individual data partners (Sites 1 to 4); data standardization (common data model); data accessible to research projects; research projects with programs written against the common data model; data quality improvement feedback loop]
Adapted from: http://www.hcsrn.org/asset/b9efb268 eb86 400e 8c74 2d42ac57fa4F/VDW.Infographic031511.jpg

Data standardization
Common data model

Distributed analysis in networks with common data model
[Figure: an Analysis Center connected through a Secure Network Portal to Data Partner 1 and Data Partner 2. Each data partner holds its own tables (Enrollment, Demographics, Utilization, Pharmacy, etc.) and has two checkpoints: "Review & Run Query" and "Review & Return Results".]
1. User creates and submits query
2. Data partners retrieve query
3. Data partners review and run query against their local data
4. Data partners review results
5. Data partners return results via secure network
6. Results are aggregated and reported
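The six steps can be mimicked in a few lines of code. The following is only a toy sketch, not the network's actual software: the `DataPartner` class, the `count_query` function, and the field names are hypothetical, and the human review steps (3 and 4) are not modeled. The (Exposure, Outcome) pairs are taken from the two example data partner tables in this deck.

```python
from dataclasses import dataclass, field


@dataclass
class DataPartner:
    """A site whose individual-level records stay inside this object (hypothetical)."""
    name: str
    records: list = field(default_factory=list)  # list of (exposure, outcome) pairs

    def run_query(self, query):
        # Steps 2-5: retrieve the query, run it locally, return aggregates only.
        return {"site": self.name, **query(self.records)}


def count_query(records):
    """Hypothetical query: persons and events by exposure group."""
    out = {"n_exposed": 0, "n_unexposed": 0,
           "events_exposed": 0, "events_unexposed": 0}
    for exposure, outcome in records:
        group = "exposed" if exposure == 1 else "unexposed"
        out[f"n_{group}"] += 1
        out[f"events_{group}"] += outcome
    return out


def analysis_center(partners, query):
    # Step 1: submit the query; step 6: aggregate the returned results.
    site_results = [p.run_query(query) for p in partners]
    keys = [k for k in site_results[0] if k != "site"]
    totals = {k: sum(r[k] for r in site_results) for k in keys}
    return site_results, totals


partner1 = DataPartner("Data Partner 1",
                       [(1, 0), (1, 0), (1, 0), (1, 0), (0, 1), (0, 1), (0, 0), (0, 0)])
partner2 = DataPartner("Data Partner 2",
                       [(0, 1), (0, 1), (0, 1), (0, 0), (1, 0), (1, 0), (1, 0), (1, 0)])

site_results, totals = analysis_center([partner1, partner2], count_query)
print(totals)
```

Only the dictionaries of counts cross the boundary between a data partner and the analysis center; the record lists are never passed around.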

Typical datasets shared in multi-center studies v2.0
The same individual-level datasets as in v1.0: the Data Partner 1 and Data Partner 2 tables (PatID, Exposure, Outcome, Time, Age, Sex, DM, HTN, CVD) are stacked into one pooled table with an added Site column.

Concerns about data sharing in multi-center studies v1 & v2
- Loss of patient privacy
- Unauthorized uses of data
- Inaccurate analysis or interpretation of data
- Disclosures of sensitive institutional or corporate information
- Contractual restrictions

Data sharing: a balancing act
Analytic flexibility vs. granularity or identifiability of information

Multi-center studies v3.0
[Figure: data flow into a single Analysis Center]
Pooling study-specific summary-level datasets

Privacy-protecting methods for multi-center studies v3.0
- Summary score-based methods
- Meta-analysis of database-specific effect estimates
- Distributed regression

Summary scores
[Figure: diagram linking Confounders, summarized as PS or DRS, to Treatment and Outcome]
PS: Propensity scores
DRS: Disease risk scores

Individual-level dataset with individual covariates

PatID  Exposure  Outcome  Time  Age  Sex  DM  HTN  CVD
001    1         0        312   33   1    0   1    1
002    0         0         40   45   1    0   1    0
003    0         0        365   76   0    0   0    0
004    0         0        200   56   0    1   0    0
005    0         1          2   21   0    0   1    0
006    1         1         15   80   1    0   0    1
007    1         0          4   65   1    1   0    1
008    1         0        145   77   0    1   0    0
009    0         1         33   48   1    0   0    0
010    0         0         98   52   1    0   0    0
011    0         0         34   32   0    0   0    0

Individual-level dataset with summary scores

PatID  Exposure  Outcome  Time  PS
001    1         0        312   0.33
002    0         0         40   0.21
003    0         0        365   0.56
004    0         0        200   0.11
005    0         1          2   0.97
006    1         1         15   0.56
007    1         0          4   0.40
008    1         0        145   0.22
009    0         1         33   0.43
010    0         0         98   0.78
011    0         0         34   0.38

Summary score-based method #1: Matching
The individual-level dataset with summary scores (PatID, Exposure, Outcome, Time, PS) is reduced to one row of counts:

Persons in exposed  Persons in unexposed  Events in exposed  Events in unexposed
500                 500                   80                 75

Summary score-based method #1: Matching

Data Partner 1
Persons in exposed  Persons in unexposed  Events in exposed  Events in unexposed
500                 500                   87                 85

Data Partner 2
Persons in exposed  Persons in unexposed  Events in exposed  Events in unexposed
400                 400                   68                 65

Combined dataset
Site  Persons in exposed  Persons in unexposed  Events in exposed  Events in unexposed
1     500                 500                   87                 85
2     400                 400                   68                 65
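To illustrate what an analysis center could do with such count tables, here is a minimal sketch that sums the two site rows and computes a crude risk ratio with a Wald confidence interval on the log scale. The slides do not say which estimator is used after matching, so the choice of a risk ratio, the dictionary keys, and the function names are all assumptions made for illustration; the sketch also ignores follow-up time and the matched structure.

```python
import math

# Site-level counts as shown in the combined dataset.
site_counts = [
    {"site": 1, "n_exp": 500, "n_unexp": 500, "ev_exp": 87, "ev_unexp": 85},
    {"site": 2, "n_exp": 400, "n_unexp": 400, "ev_exp": 68, "ev_unexp": 65},
]


def pool_counts(rows):
    """Sum persons and events across sites."""
    keys = ("n_exp", "n_unexp", "ev_exp", "ev_unexp")
    return {k: sum(r[k] for r in rows) for k in keys}


def risk_ratio(c, z=1.96):
    """Crude risk ratio with a Wald interval computed on the log scale."""
    rr = (c["ev_exp"] / c["n_exp"]) / (c["ev_unexp"] / c["n_unexp"])
    se = math.sqrt(1 / c["ev_exp"] - 1 / c["n_exp"]
                   + 1 / c["ev_unexp"] - 1 / c["n_unexp"])
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)


pooled = pool_counts(site_counts)
rr, lower, upper = risk_ratio(pooled)
print(pooled, round(rr, 3), round(lower, 3), round(upper, 3))
```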

Summary score-based method #2: Stratification
The individual-level dataset with summary scores (PatID, Exposure, Outcome, Time, PS) is reduced to one row of counts per PS or DRS stratum:

PS or DRS stratum  Persons in exposed  Persons in unexposed  Events in exposed  Events in unexposed
1                  200                 150                   30                 35
2                  150                 100                   20                 40
3                  200                 180                   21                 21
4                  150                 200                   26                 18
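One standard way to analyze a stratified count table like this is a Mantel-Haenszel estimator. The sketch below computes a Mantel-Haenszel risk ratio from the four example strata; the slides do not name the estimator, so this is an assumed illustration and the function name is hypothetical.

```python
# Each tuple: (persons exposed, persons unexposed, events exposed, events unexposed),
# copied from the four example strata.
strata = [
    (200, 150, 30, 35),
    (150, 100, 20, 40),
    (200, 180, 21, 21),
    (150, 200, 26, 18),
]


def mantel_haenszel_rr(strata):
    """Mantel-Haenszel risk ratio across strata of a summary score."""
    num = 0.0
    den = 0.0
    for n_exp, n_unexp, ev_exp, ev_unexp in strata:
        total = n_exp + n_unexp
        num += ev_exp * n_unexp / total
        den += ev_unexp * n_exp / total
    return num / den


mh_rr = mantel_haenszel_rr(strata)
print(round(mh_rr, 3))
```

With a single stratum the estimator reduces to the crude risk ratio, which is a quick sanity check on the formula.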

Summary score-based method #3: Risk set analysis
The individual-level dataset with summary scores (PatID, Exposure, Outcome, Time, PS) is reduced to one row per event:

Event  Event time  Event exposed  Risk set exposed  Risk set unexposed
1       8          0              300               299
2      12          1              296               295
3      20          1              290               288
4      21          0              286               283
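Risk-set rows of this form carry enough information to maximize a Cox partial likelihood for a single binary exposure: each event contributes the exposure status of the case and the numbers of exposed and unexposed persons at risk. The sketch below fits the log hazard ratio by Newton-Raphson, assuming one event per risk set (no ties) and no further stratification; the function names are hypothetical and this is not presented as the exact procedure used in the studies.

```python
import math


def score_and_information(beta, risk_sets):
    """Score and information of the partial likelihood for one binary exposure."""
    score = 0.0
    info = 0.0
    for case_exposed, n_exp, n_unexp in risk_sets:
        # Probability that the case is exposed, given the risk set composition.
        p = n_exp * math.exp(beta) / (n_exp * math.exp(beta) + n_unexp)
        score += case_exposed - p
        info += p * (1 - p)
    return score, info


def fit_log_hazard_ratio(risk_sets, tol=1e-10, max_iter=50):
    """Newton-Raphson on the log hazard ratio, starting from zero."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta, risk_sets)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = score_and_information(beta, risk_sets)
    return beta, 1 / math.sqrt(info)


# (event exposed, risk set exposed, risk set unexposed) from the four example rows.
risk_sets = [(0, 300, 299), (1, 296, 295), (1, 290, 288), (0, 286, 283)]
log_hr, se = fit_log_hazard_ratio(risk_sets)
hazard_ratio = math.exp(log_hr)
print(round(hazard_ratio, 3), round(se, 3))
```

With only four example events the standard error is large; the point is that the fit needs nothing beyond the risk-set table.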

Meta-analysis of database-specific effect estimates
The individual-level dataset with individual covariates (PatID, Exposure, Outcome, Time, Age, Sex, DM, HTN, CVD) is reduced to a single effect estimate:

Hazard ratio  Lower 95% CI  Upper 95% CI
2.97          1.95          4.52

Distributed regression

Analyst inputs individual-level dataset into statistical software (regular regression shares this):

ID    E  X1     X2    Y
A001  0  13.89  3.42  28.70
A002  1  18.10  1.29  27.90
A003  0   6.41  4.86  33.10
A004  1  16.30  1.45  17.20
A005  1  17.57  2.51  21.70
...
A100  0   5.78  2.53  23.76

Statistical software produces intermediate statistics as part of computing process (distributed regression shares this):

Type  Name       Intercept  E       X1       X2      Y
SSCP  Intercept  100.0      52.0    1157.1   405.9   2235.5
SSCP  E          52.0       52.0    813.2    138.1   1060.9
SSCP  X1         1157.1     813.2   17751.3  3458.7  23815.8
SSCP  X2         405.9      138.1   3458.7   2240.8  9572.3
SSCP  Y          2235.5     1060.9  23815.8  9572.3  56911.9
MEAN             1.0        0.5     11.6     4.1     22.4
STD              0.0        0.5     6.6      2.5     8.4
N                100        100     100      100     100

Statistical software produces final results:

Variable   Parameter estimate  Standard error
Intercept  25.4540             3.7959
E          0.4323              1.7865
X1         0.5643              0.1432
X2         0.6564              0.4532
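For linear regression, the idea works because the sums of squares and cross products (SSCP) are additive across sites: each site can compute its own SSCP matrix, and the analysis center can add the matrices and solve the normal equations as if the data had been pooled. The sketch below is a minimal pure-Python illustration under that assumption, with made-up data for two hypothetical sites (the values are not from the slides) and hypothetical function names; it omits standard errors.

```python
def site_sscp(rows):
    """SSCP matrix of (1, E, X1, Y) computed locally from rows of (e, x1, y)."""
    k = 4
    sscp = [[0.0] * k for _ in range(k)]
    for e, x1, y in rows:
        v = (1.0, float(e), float(x1), float(y))
        for i in range(k):
            for j in range(k):
                sscp[i][j] += v[i] * v[j]
    return sscp


def add_matrices(a, b):
    return [[a[i][j] + b[i][j] for j in range(len(a))] for i in range(len(a))]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_from_sscp(sscp):
    """Least-squares coefficients from an SSCP matrix whose last row/column is Y."""
    p = len(sscp) - 1
    xtx = [row[:p] for row in sscp[:p]]
    xty = [row[p] for row in sscp[:p]]
    return solve(xtx, xty)


# Made-up data generated exactly from y = 2 + 3*e + 0.5*x1 (illustration only).
site1 = [(0, 1.0, 2.5), (1, 2.0, 6.0), (0, 3.0, 3.5), (1, 5.0, 7.5)]
site2 = [(1, 1.0, 5.5), (0, 4.0, 4.0), (1, 6.0, 8.0)]

combined = add_matrices(site_sscp(site1), site_sscp(site2))
distributed_fit = fit_from_sscp(combined)
pooled_fit = fit_from_sscp(site_sscp(site1 + site2))
print(distributed_fit, pooled_fit)
```

The distributed and pooled fits agree up to floating-point error, which is the same pattern as the near-zero differences one expects when comparing the two approaches on real data.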

Example 1
http://www.hopkinsmedicine.org/healthlibrary/test_procedures/gastroenterology/laparoscopic_adjustable_gastric_banding_135,63/
http://www.hopkinsmedicine.org/healthlibrary/test_procedures/gastroenterology/roux en y_gastric_bypass_weight loss_surgery_135,65/

Study design
[Figure: timeline of the study period, 1/1/2005 to 12/31/2010, showing the 365 days before the index bariatric hospitalization, the start of follow-up (discharge date), 730 days of follow-up, and contributing person-times]
Criteria shown at the index bariatric hospitalization:
- 21 years at time of bariatric surgery
- 1 BMI of 35 kg/m² or greater
- Continuous enrollment w/ benefits
- No prior bariatric surgery
- No prior diagnosis of study outcome
Events shown at the end of follow-up:
- Re-hospitalization
- Death
- Health plan disenrollment
- 12/31/2010
- 730 days of follow-up
Toh et al, Med Care, 2014;52:664-668

Confounders
- Age
- Sex
- Race/ethnicity
- Diabetes*
- Baseline BMI*
- Year of procedure
- Charlson comorbidity score*
- Atrial fibrillation*
- GERD*
- Hypertension*
- Sleep apnea*
- Asthma*
- Deep vein thrombosis*
- Pulmonary embolism*
- Congestive heart failure*
- Hyperlipidemia*
- Coronary artery disease*
- Oxygen use*
- Assistive walking device*
- Smoking status*
- Blood pressure*
- Length of stay assoc. with procedure
*Identified during the 365-day baseline period prior to the index bariatric hospitalization
Toh et al, Med Care, 2014;52:664-668

Statistical analysis
Propensity score stratification
Analysis:
- Pooled patient-level data analysis (benchmark)
- Risk set-based analysis
- PS-stratified analysis (by quintile)
- Meta-analysis of site-specific effect estimates
Toh et al, Med Care, 2014;52:664-668

Select baseline patient characteristics

                             Adjustable gastric band   Roux-en-Y gastric bypass
                             (n=1,550)                 (n=5,792)
Characteristics              N        %*               N        %*
Mean age (SD)                46.7     11.2             45.7     10.7
Age > 65 years               76       4.9              141      2.4
Female sex                   1,266    81.7             4,823    83.3
Race/ethnicity
  Black or African American  137      8.8              522      9.0
  White                      1,130    72.9             3,840    66.3
  Hispanic                   142      9.2              769      13.3
  Other                      62       4.0              280      4.8
  Unknown                    79       5.1              381      6.6
Baseline BMI
  30-34.9                    96       6.2              174      3.0
  35-39.9                    480      31.0             1,410    24.3
  40-49.9                    813      52.4             3,126    54.0
  ≥50                        161      10.4             1,082    18.7
Toh et al, Med Care, 2014;52:664-668

Individual-level data analysis, by site

Site    Adjusted HR  95% CI
Site 1  0.68         0.45, 1.02
Site 2  0.65         0.37, 1.15
Site 3  0.52         0.26, 1.04
Site 4  0.72         0.35, 1.50
Site 5  0.82         0.46, 1.48
Site 6  0.32         0.13, 0.75
Site 7  0.79         0.62, 1.01
Toh et al, Med Care, 2014;52:664-668
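Site-specific estimates like these can be combined without any individual-level data. The sketch below applies a fixed-effect, inverse-variance meta-analysis on the log hazard ratio scale, recovering each standard error from the published 95% CI. The slides do not state whether a fixed-effect or random-effects model was used, so the fixed-effect choice and the function name are assumptions; because the inputs are rounded to two decimals, the output is approximate, though it should land close to the meta-analysis row of the results-by-method slide.

```python
import math

# (adjusted HR, lower 95% CI, upper 95% CI) for Sites 1 to 7, as reported.
site_estimates = [
    (0.68, 0.45, 1.02),
    (0.65, 0.37, 1.15),
    (0.52, 0.26, 1.04),
    (0.72, 0.35, 1.50),
    (0.82, 0.46, 1.48),
    (0.32, 0.13, 0.75),
    (0.79, 0.62, 1.01),
]


def fixed_effect_pool(estimates, z=1.96):
    """Inverse-variance pooled hazard ratio with a 95% CI."""
    num = 0.0
    den = 0.0
    for hr, lo, hi in estimates:
        # Standard error of log(HR), backed out from the width of the CI.
        se = (math.log(hi) - math.log(lo)) / (2 * z)
        weight = 1 / se ** 2
        num += weight * math.log(hr)
        den += weight
    log_hr = num / den
    se_pooled = math.sqrt(1 / den)
    return (math.exp(log_hr),
            math.exp(log_hr - z * se_pooled),
            math.exp(log_hr + z * se_pooled))


pooled_hr, pooled_lo, pooled_hi = fixed_effect_pool(site_estimates)
print(round(pooled_hr, 2), round(pooled_lo, 2), round(pooled_hi, 2))
```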

Results, by method

Method             Adjusted HR  95% CI
Individual-level   0.71         0.59, 0.84
Risk set           0.71         0.59, 0.84
PS stratification  0.70         0.59, 0.83
Meta-analysis      0.71         0.60, 0.84
Toh et al, Med Care, 2014;52:664-668

Example 2: Distributed regression
(Est. = parameter estimate; SE = standard error)

Distributed regression vs. pooled patient-level regression: LINEAR
            Distributed regression  Pooled patient-level  Differences in
Covariates  Est.       SE           Est.       SE         Est.      SE
Intercept   35.50548   1.57690      35.50548   1.57690    8.38E-13  2.26E-14
Variable 1  0.27283    0.04401      0.27283    0.04401    4.44E-16  9.92E-16
Variable 2  1.01582    0.23259      1.01582    0.23259    1.09E-13  3.22E-15
Variable 3  0.73017    0.07229      0.73017    0.07229    3.54E-14  1.32E-15

Distributed regression vs. pooled patient-level regression: LOGISTIC
            Distributed regression  Pooled patient-level  Differences in
Covariates  Est.       SE           Est.       SE         Est.      SE
Intercept   2.49660    0.49057      2.49660    0.49060    1.33E-15  9.99E-16
Variable 1  0.14465    0.03686      0.14460    0.03690    2.04E-13  2.97E-14
Variable 2  0.14105    0.06976      0.14100    0.06980    1.38E-14  2.22E-16
Variable 3  0.13889    0.02376      0.13890    0.02380    2.42E-14  2.19E-16

Distributed regression vs. pooled patient-level regression: COX
            Distributed regression  Pooled patient-level  Differences in
Covariates  Est.       SE           Est.       SE         Est.      SE
Variable 1  0.06692    0.02084      0.06692    0.02084    1.39E-16  2.78E-17
Variable 2  0.34644    0.19024      0.34644    0.19024    2.22E-16  2.78E-17
Variable 3  0.09653    0.02724      0.09653    0.02724    1.80E-16  1.73E-17
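For logistic regression there is no single closed-form summary, but the same principle can be applied iteratively: at each Newton-Raphson step every site returns only its contribution to the gradient and the information matrix at the current coefficients, and the analysis center sums them and updates. The sketch below shows this for a model with an intercept and one binary exposure, using made-up counts for two hypothetical sites; the function names are hypothetical, and this illustrates the general idea rather than the specific software used for the comparisons above.

```python
import math


def make_records(n_exp, ev_exp, n_unexp, ev_unexp):
    """Build (exposure, outcome) pairs from counts (made-up illustration data)."""
    return ([(1, 1)] * ev_exp + [(1, 0)] * (n_exp - ev_exp)
            + [(0, 1)] * ev_unexp + [(0, 0)] * (n_unexp - ev_unexp))


def site_contributions(records, beta):
    """Gradient and information of the log-likelihood, computed locally."""
    g = [0.0, 0.0]
    h = [[0.0, 0.0], [0.0, 0.0]]
    for e, y in records:
        x = (1.0, float(e))
        p = 1 / (1 + math.exp(-(beta[0] + beta[1] * e)))
        for i in range(2):
            g[i] += (y - p) * x[i]
            for j in range(2):
                h[i][j] += p * (1 - p) * x[i] * x[j]
    return g, h


def distributed_logistic(sites, iterations=25):
    """Newton-Raphson in which sites share only summed gradients and information."""
    beta = [0.0, 0.0]
    for _ in range(iterations):
        g = [0.0, 0.0]
        h = [[0.0, 0.0], [0.0, 0.0]]
        for records in sites:
            gs, hs = site_contributions(records, beta)
            for i in range(2):
                g[i] += gs[i]
                for j in range(2):
                    h[i][j] += hs[i][j]
        det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
        # Closed-form 2x2 solve of (information) * step = gradient.
        step0 = (h[1][1] * g[0] - h[0][1] * g[1]) / det
        step1 = (h[0][0] * g[1] - h[1][0] * g[0]) / det
        beta = [beta[0] + step0, beta[1] + step1]
    return beta


site_a = make_records(10, 3, 10, 2)
site_b = make_records(10, 4, 10, 1)
beta = distributed_logistic([site_a, site_b])
print(beta)
```

Because this model is saturated, the pooled answer is known in closed form (the intercept is the log odds in the unexposed, the slope is the log odds ratio), which makes it easy to check that the distributed fit matches a pooled analysis.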

Example 3: PCORnet Bariatric Study
- Use of bariatric surgery has expanded considerably
- Evidence on the comparative effectiveness and safety of these procedures is limited

Study design

             Main analysis                             Aggregate analysis
Comparisons  RYGB vs. SG                               RYGB vs. SG
             RYGB vs. AGB
             AGB vs. SG
Outcomes     Weight change 1, 3, and 5 yrs             Weight change 1 yr post-surgery
             post-surgery
             Diabetes remission and relapse
             Major adverse events
Analysis     One model that combines all data          Site-specific PS model
             Additional data-driven approaches         Fixed set of covariates
             to select covariates

Combining propensity scores with distributed regression

               Parameter estimate                   Standard error
Variable       Pooled individual-  Distributed      Pooled individual-  Distributed
               level analysis      regression       level analysis      regression
RYGB vs. SG    0.05470             0.05470          0.00113             0.00113
PS stratum 1   Reference           Reference        Reference           Reference
PS stratum 2   0.00754             0.00754          0.00209             0.00209
PS stratum 3   0.00671             0.00671          0.00210             0.00210
PS stratum 4   0.00717             0.00717          0.00211             0.00211
PS stratum 5   0.00034218          0.00034218       0.00212             0.00212
PS stratum 6   0.00583             0.00583          0.00213             0.00213
PS stratum 7   0.00135             0.00135          0.00214             0.00214
PS stratum 8   0.00435             0.00435          0.00216             0.00216
PS stratum 9   0.00523             0.00523          0.00218             0.00218
PS stratum 10  0.00812             0.00812          0.00222             0.00222

Example 4: Prospective surveillance of saxagliptin
www.sentinelinitiative.org/sites/default/files/drugs/assessments/mini Sentinel_AMI and Anti Diabetic Agents_Protocol_0.pdf

http://www.fda.gov/downloads/drugs/guidancecomplianceregulatoryinformation/guidances/ucm071627.pdf

SAVOR-TIMI 53 Trial

Saxagliptin vs. sitagliptin

Saxagliptin vs. pioglitazone

Saxagliptin vs. sulfonylureas

Saxagliptin vs. long-acting insulin

Comparisons with SAVOR-TIMI 53 trial

Characteristics         SAVOR-TIMI 53 Trial        Mini-Sentinel surveillance*
Comparator              Placebo                    Select anti-hyperglycemics
No. saxagliptin users   8,280                      82,264
No. comparator users    8,212                      146,045 to 452,969
No. AMI in saxagliptin  265                        94 to 171
No. AMI in comparator   278                        75 to 1,085
Length of follow-up     2.1 years (median)         4 to 8 months (mean)
Statistical analysis    Intention to treat         As treated
Hazard ratio for AMI    0.95 (95% CI: 0.80, 1.12)  0.54 to 1.17
* From end-of-surveillance analysis that included all patients

Interim results from the first 5 sequential analyses were made available to FDA prior to the publication of SAVOR-TIMI 53 findings.

Analytical flexibility vs. granularity of information
[Figure: data types placed along two opposing axes, analytic flexibility and privacy protection]
- Individual-level data with individual covariates
- Individual-level data with summary scores
- Risk-set data
- Summary-table data
- Intermediate statistics
- Effect-estimate data

Analytic methods in multi-center studies

Covariate summarization    Data sharing             Covariate adjustment  Outcome type
technique                  approach                 technique
(What to share?)           (How to share?)          (What can we do?)     (What outcome?)
Individual covariates*     Individual-level data    Matching              Continuous
Propensity scores          Summary-table data       Stratification        Binary
Disease risk scores        Risk-set data            Restriction           Count
Summary scores +           Effect-estimate data     Weighting             Survival
  individual covariates
A hybrid of above          Intermediate statistics  Modeling

Conclusion
- A suite of analytic methods is available for multi-center studies
- There are often trade-offs between analytic flexibility and identifiability of information shared
- Some newer methods offer excellent analytic flexibility and good privacy protection

Darren_Toh@harvardpilgrim.org
@darrentoh_epi
https://www.distributedanalysis.org