Subgroup analysis using regression modeling multiple regression. Aeilko H Zwinderman



Who has an unusually large response? Is such an occurrence associated with subgroups of patients?
* such a question is hypothesis-generating: to refine patient or dose selection
* subgroup analyses are, by nature, almost surely underpowered => use a regression model

Regression modeling may
* increase efficiency
* correct for confounding
* investigate interaction / synergism
* be used for prediction

Regression models: many possibilities
* quantitative data: linear/nonlinear regression models
* discrete data: logistic (or probit) regression
* censored data: Cox regression

General form:
$E[Y_i \mid X_i] = g^{-1}(\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki})$
$\mathrm{Var}[Y_i \mid X_i] = \sigma_e^2$
* Y is the dependent variable (primary efficacy variable)
* X is a covariate, predictor or independent variable
* g is the link function
* $\beta$ is a regression parameter (which must be estimated from the data)

Linear model:
Y = quantitative variable
X = quantitative or discrete variable
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + e_i$
$\beta$ is a direct effect: the difference in the mean of Y if X changes by 1 unit
Assumptions:
a. linearity of the relation between Y and X
b. normality: Y is normally distributed for any given value of X
c. homogeneity: Y has the same variance for any given value of X

Logistic model:
Y = binary variable (i.e. 1 or 0)
X = quantitative or discrete variable
$P(Y_i = 1 \mid X_i) = \dfrac{\exp(\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki})}{1 + \exp(\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki})}$
$\beta$ is a log odds-ratio: the change in the log(odds) that Y=1 if X changes by 1 unit
Assumptions:
a. linearity of the relation between the log odds (of Y=1) and X
b. the inverse link function $g^{-1}$ has the logistic form
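The log odds-ratio interpretation can be checked numerically. The sketch below uses made-up coefficients (`beta0` and `beta1` are illustrative assumptions, not values from the slides) and shows that a one-unit change in X multiplies the odds by exp(β).

```python
import math

def logistic_probability(beta0, betas, xs):
    """P(Y=1 | X): inverse logit of the linear predictor."""
    linear_predictor = beta0 + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-linear_predictor))

def odds(p):
    return p / (1.0 - p)

# Hypothetical coefficients, chosen only for illustration.
beta0, beta1 = -1.0, 0.5

p_at_2 = logistic_probability(beta0, [beta1], [2.0])
p_at_3 = logistic_probability(beta0, [beta1], [3.0])

# A one-unit increase in X multiplies the odds by exp(beta1).
odds_ratio = odds(p_at_3) / odds(p_at_2)
print(odds_ratio, math.exp(beta1))
```

The two printed numbers agree up to floating-point error, whatever starting value of X is used.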

Cox proportional hazards regression model:
Y(t) = binary status variable (i.e. 1 or 0) occurring at time t
X = quantitative or discrete variable
$S(t_i \mid X_i) = S_0(t_i)^{\exp(\beta_1 X_{1i} + \dots + \beta_k X_{ki})}$
$\beta$ is a log-relative risk: the change in the log(hazard(t)) if X changes by 1 unit
Assumptions:
a. linearity of the relation between the log hazard(t) and X
b. the relative risk is constant with time
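To make the proportional hazards assumption concrete, here is a minimal sketch of the survival formula. The exponential baseline `baseline_survival` and the coefficient 0.02 are assumptions for illustration only; the Cox model itself leaves the baseline unspecified.

```python
import math

def baseline_survival(t, rate=0.05):
    """Hypothetical baseline survival S0(t); an exponential curve is assumed here."""
    return math.exp(-rate * t)

def survival(t, betas, xs):
    """S(t | X) = S0(t) ** exp(beta1*X1 + ... + betak*Xk)."""
    relative_risk = math.exp(sum(b * x for b, x in zip(betas, xs)))
    return baseline_survival(t) ** relative_risk

def cumulative_hazard(t, betas, xs):
    return -math.log(survival(t, betas, xs))

beta = [0.02]  # hypothetical log-relative risk per unit of X
for t in (6, 12, 18, 24):
    ratio = cumulative_hazard(t, beta, [20]) / cumulative_hazard(t, beta, [0])
    print(t, round(survival(t, beta, [0]), 3), round(survival(t, beta, [20]), 3), ratio)
```

The ratio of cumulative hazards between X=20 and X=0 equals exp(0.02 × 20) at every time point, which is what "the relative risk is constant with time" means.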

[Figure: four panels showing cumulative hazard, survival, log(hazard) and hazard against time (0 to 24), each with curves for X=0, X=20 and X=80]

[Figure: survival and hazard against time, with curves for X=0, X=40 and X=80]

Regression modeling to increase precision
* placebo (n=434) or pravastatin (n=438)
* two years of treatment
* average LDL decrease:
  # pravastatin: 1.23 (SD 0.68, SE = $0.68/\sqrt{438}$)
  # placebo: -0.04 (SD 0.59, SE = $0.59/\sqrt{434}$)
* efficacy: 1.23 - (-0.04) = 1.27, standard error = 0.043
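The standard error of 0.043 can be reproduced from the summary statistics. This sketch assumes each group's standard error is SD divided by the square root of n and that the two groups are independent, so their variances add.

```python
import math

def standard_error(sd, n):
    """Standard error of a group mean."""
    return sd / math.sqrt(n)

se_pravastatin = standard_error(0.68, 438)
se_placebo = standard_error(0.59, 434)

# Difference in mean LDL decrease between the two independent groups.
efficacy = 1.23 - (-0.04)
se_efficacy = math.sqrt(se_pravastatin ** 2 + se_placebo ** 2)

print(round(efficacy, 2), round(se_efficacy, 3))  # 1.27 0.043
```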

LDL reduction: $Y_i = \beta_0 + \beta_1 X_{1i} + e_i$
$X_1$ = 1 if a patient receives pravastatin and 0 if he/she receives placebo
=> $\beta_1$ is the efficacy: 1.27 (SE = 0.043, which is a function of $\sigma_e^2$)
Suppose there is a covariate $X_2$ which is related to Y, but not to $X_1$:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + e_i$
$\beta_1$ remains the same but $\sigma_e^2$ will be (much) smaller => SE($\beta_1$) will be smaller => increased precision

An example of a variable that might be related to Y but not to treatment is baseline LDL
* it is not related to treatment (randomized trial):
  placebo: 4.32 (SD 0.78), pravastatin: 4.29 (SD 0.78), p=0.60
* it is (almost surely) related to LDL decrease: $\beta_2$ = 0.41 (SE 0.024, p<0.0001)
=> efficacy: $\beta_1$ = 1.27 (SE 0.037, was 0.043: 15% gain in efficiency)
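The size of the gain can be worked out from the two reported standard errors. The sketch below assumes that the standard error scales with the residual standard deviation and with one over the square root of n. The variable names are illustrative, and the slide's rounded "15%" corresponds roughly to the relative reduction in the standard error.

```python
se_unadjusted = 0.043  # SE of the efficacy estimate without baseline LDL
se_adjusted = 0.037    # SE after adding baseline LDL as a covariate

# Relative reduction in the standard error.
se_reduction = 1 - se_adjusted / se_unadjusted

# Relative efficiency as a variance ratio: how many times more patients the
# unadjusted analysis would need for the same precision (assumes SE ~ 1/sqrt(n)).
variance_ratio = (se_unadjusted / se_adjusted) ** 2

# Implied share of the residual variance explained by the covariate
# (assumes the covariate is unrelated to treatment, as in a randomized trial).
implied_explained_share = 1 - (se_adjusted / se_unadjusted) ** 2

print(round(se_reduction, 2), round(variance_ratio, 2), round(implied_explained_share, 2))
# 0.14 1.35 0.26
```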

[Figure: LDL decrease (vertical axis, -3 to 4) plotted against baseline LDL (horizontal axis, 2 to 7)]

* usually there are many, many candidate covariates to consider: specify in the protocol which ones will be used
* in non-linear regression models $\beta_1$ always changes when covariates are included, so its interpretation changes (often not much, but it can be greatly inflated)

Regression modeling to correct for confounding
* a confounder is a covariate Z that is associated with both Y and $X_1$
* it distorts the interpretation of the efficacy estimate $\beta_1$
* what is thought to be efficacy may just reflect the imbalance of Z between treatment groups

* will not happen often in randomized trials
* will happen almost always in non-randomized research
* when it happens, adjustment of $\beta_1$ is required:
$Y_i = \beta_0 + \beta_1^* X_{1i} + \beta_2 Z_i + e_i$
if $r_{xz} > 0$ and $r_{yz} > 0$ then $\beta_1^* < \beta_1$
if $r_{xz} > 0$ and $r_{yz} < 0$ then $\beta_1^* > \beta_1$
if $r_{xz} < 0$ and $r_{yz} > 0$ then $\beta_1^* > \beta_1$
if $r_{xz} < 0$ and $r_{yz} < 0$ then $\beta_1^* < \beta_1$
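The first of these sign rules can be illustrated with simulated data. Everything in the sketch below is hypothetical: the allocation mechanism, the true treatment effect of 1.0 and the confounder effect of 0.8 are assumptions made up for the example. The adjusted slope uses the closed-form least-squares solution for two predictors.

```python
import random

def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def unadjusted_slope(x, y):
    """Coefficient of x in the least-squares fit y ~ x."""
    return cov(x, y) / cov(x, x)

def adjusted_slope(x, z, y):
    """Coefficient of x in the least-squares fit y ~ x + z."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    return (szz * cov(x, y) - sxz * cov(z, y)) / (sxx * szz - sxz ** 2)

rng = random.Random(1)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
# Non-randomized allocation: treatment is more likely when z is high (r_xz > 0).
x = [1.0 if zi + rng.gauss(0, 1) > 0 else 0.0 for zi in z]
# Assumed true treatment effect 1.0; z also raises y (r_yz > 0).
y = [1.0 * xi + 0.8 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

beta1 = unadjusted_slope(x, y)         # distorted by the imbalance in z
beta1_star = adjusted_slope(x, z, y)   # close to the assumed true effect
print(round(beta1, 2), round(beta1_star, 2))
```

With both correlations positive, the unadjusted estimate overstates the effect and the adjusted estimate is smaller, matching the first rule.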

X -> Y
Direct effect of treatment X on outcome Y: no need for regression modeling.

X <- Z -> Y
The effect of treatment X on outcome Y is confounded by Z: a regression model may correct for this.

X -> Z -> Y
The effect of treatment X on outcome Y is partly through Z: Z is an intermediate, not a confounder. Do not use regression modeling: in the regression model the effect of X is split between a direct and an indirect effect.

* check only the necessary (known) confounders
* beware of multiple testing

Interaction / synergism: looking for subgroups with different efficacy
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + e_i$
Suppose $X_2$ = 0 or 1:
$X_2 = 1$: $Y_i = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) X_{1i} + e_i$
$X_2 = 0$: $Y_i = \beta_0 + \beta_1 X_{1i} + e_i$

Primary question: $H_0: \beta_3 = 0$
Example: is there interaction between statins and CCBs?
Y = change of diameter of coronary vessels during statin/placebo treatment

           no CCB          CCB
placebo    0.097 (0.20)    0.130 (0.22)
statin     0.088 (0.19)    0.035 (0.19)

[Figure: diameter decrease (0 to 0.2) for placebo and statin, in patients without and with CCB]

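Efficacy:
no CCB: $\beta_1$ = 0.097 - 0.088 = 0.011
CCB: $\beta_1 + \beta_3$ = 0.130 - 0.035 = 0.095
$\beta_3$ = 0.095 - 0.011 = 0.084, p=0.011
(The rounded means shown give 0.097 - 0.088 = 0.009; the reported 0.011 and 0.084 presumably rest on unrounded values.)
Thus, statins are significantly more effective in patients who were also prescribed CCBs.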
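The interaction term is a difference of differences between the four cell means. The sketch below recomputes it from the rounded means in the table, taking efficacy as the placebo decrease minus the statin decrease. Because the inputs are rounded, the results differ slightly from the slide's 0.011 and 0.084. The p-value cannot be reproduced from cell means alone.

```python
# Mean diameter decrease per cell, as shown (rounded) in the table.
means = {
    ("placebo", "no CCB"): 0.097,
    ("placebo", "CCB"): 0.130,
    ("statin", "no CCB"): 0.088,
    ("statin", "CCB"): 0.035,
}

def efficacy(ccb_group):
    """Placebo decrease minus statin decrease within one CCB subgroup."""
    return means[("placebo", ccb_group)] - means[("statin", ccb_group)]

efficacy_no_ccb = efficacy("no CCB")           # plays the role of beta_1
efficacy_ccb = efficacy("CCB")                 # plays the role of beta_1 + beta_3
interaction = efficacy_ccb - efficacy_no_ccb   # plays the role of beta_3

print(round(efficacy_no_ccb, 3), round(efficacy_ccb, 3), round(interaction, 3))
# 0.009 0.095 0.086
```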

Fellstrom et al. Rosuvastatin and cardiovascular events in patients undergoing hemodialysis. NEJM, 2009.

* be careful investigating interactions: multiple testing problem
* do not enter too many covariates in a regression model (k < n/10)
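Both warnings can be put into numbers. The sketch below assumes independent tests at a significance level of 0.05. The function names are illustrative, and the Bonferroni correction is shown as one common remedy, not one the slides prescribe.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false-positive result among independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test significance level that keeps the familywise rate at or below alpha."""
    return alpha / n_tests

def max_covariates(n_patients):
    """Largest whole number k satisfying the rule of thumb k < n/10."""
    return (n_patients - 1) // 10

# Testing 10 candidate interactions at 0.05 each:
print(round(familywise_error_rate(10), 3))  # 0.401
print(bonferroni_alpha(10))

# A trial with 434 + 438 = 872 patients:
print(max_covariates(872))  # 87
```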

Good models
* check assumptions
* use selection algorithms sparingly (instead use penalized methods that shrink the regression weights)
* caution against optimistic results: (cross-)validation
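As a rough illustration of why validation guards against optimism, the sketch below compares the apparent error of a one-covariate linear fit with its 5-fold cross-validated error. The data are simulated and hypothetical, and the helper names are made up for this example.

```python
import random

def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def squared_errors(model, xs, ys):
    intercept, slope = model
    return [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]

def cross_validated_mse(xs, ys, k=5, seed=0):
    """Mean squared error where each point is predicted by a model that never saw it."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    total = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        total += sum(squared_errors(model, [xs[i] for i in held_out],
                                    [ys[i] for i in held_out]))
    return total / len(xs)

# Hypothetical data: 30 patients, one covariate, noisy linear response.
rng = random.Random(42)
xs = [rng.uniform(2, 7) for _ in range(30)]
ys = [0.4 * x + rng.gauss(0, 0.6) for x in xs]

apparent_mse = sum(squared_errors(fit_line(xs, ys), xs, ys)) / len(xs)
cv_mse = cross_validated_mse(xs, ys)
print(round(apparent_mse, 3), round(cv_mse, 3))
```

The apparent error is computed on the same data used for fitting and is therefore the more optimistic of the two. For least-squares fits the cross-validated error is never smaller.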