How to Present Results of Regression Models to Clinicians

Similar documents
9 Generalized Linear Models

Logistic Regression Models to Integrate Actuarial and Psychological Risk Factors For predicting 5- and 10-Year Sexual and Violent Recidivism Rates

Single-level Models for Binary Responses

An Analysis. Jane Doe Department of Biostatistics Vanderbilt University School of Medicine. March 19, Descriptive Statistics 1

Simple logistic regression

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Lecture 12: Effect modification, and confounding in logistic regression

STAT 7030: Categorical Data Analysis

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam

Selection of Variables and Functional Forms in Multivariable Analysis: Current Issues and Future Directions

Introduction to lnmle: An R Package for Marginally Specified Logistic-Normal Models for Longitudinal Binary Data

Lecture 10: Introduction to Logistic Regression

BIOS 312: MODERN REGRESSION ANALYSIS

Logistic Regression. Some slides from Craig Burkett. STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy

Multinomial Logistic Regression Models

Lecture 2: Poisson and logistic regression

Introduction to mtm: An R Package for Marginalized Transition Models

Sociology 362 Data Exercise 6 Logistic Regression 2

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression

Binary Logistic Regression

Lecture 14: Introduction to Poisson Regression

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response)

Lecture 5: Poisson and logistic regression

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam

8 Nominal and Ordinal Logistic Regression

Linear Regression With Special Variables

Semiparametric Generalized Linear Models

Correlation and Simple Linear Regression

Binomial Model. Lecture 10: Introduction to Logistic Regression. Logistic Regression. Binomial Distribution. n independent trials

Categorical data analysis Chapter 5

Relative Risk Calculations in R

Machine Learning Linear Classification. Prof. Matteo Matteucci

Generalized Linear Modeling - Logistic Regression

General Linear Model (Chapter 4)

2/26/2017. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T.

Logistic Regression Analyses in the Water Level Study

Homework Solutions Applied Logistic Regression

Logistic Regression. 1 Analysis of the budworm moth data 1. 2 Estimates and confidence intervals for the parameters 2

MAS3301 / MAS8311 Biostatistics Part II: Survival

Lecture 4 Multiple linear regression

Lecture 7 Time-dependent Covariates in Cox Regression

Problem #1 #2 #3 #4 #5 #6 Total Points /6 /8 /14 /10 /8 /10 /56

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

Lecture 1 Introduction to Multi-level Models

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016

Interpretation of the Fitted Logistic Regression Model

Regression so far... Lecture 21 - Logistic Regression. Odds. Recap of what you should know how to do... At this point we have covered: Sta102 / BME102

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression

Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal

Classification. Chapter Introduction. 6.2 The Bayes classifier

STA6938-Logistic Regression Model

Generalized Linear Models

Statistics in medicine

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses

Review of Statistics 101

Unit 11: Multiple Linear Regression

β j = coefficient of x j in the model; β = ( β1, β2,

TAMS38 Experimental Design and Biostatistics, 4 p / 6 hp Examination on 19 April 2017, 8 12

Chapter 9 Regression with a Binary Dependent Variable. Multiple Choice. 1) The binary dependent variable model is an example of a

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours

ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox s regression analysis Time dependent explanatory variables

Models for Count and Binary Data. Poisson and Logistic GWR Models. 24/07/2008 GWR Workshop 1

Modelling Survival Data using Generalized Additive Models with Flexible Link

Generalized logit models for nominal multinomial responses. Local odds ratios

Lab 11. Multilevel Models. Description of Data

Basic Medical Statistics Course

STAT 525 Fall Final exam. Tuesday December 14, 2010

Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2)

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

Varieties of Count Data

Introduction to logistic regression

Cohen s s Kappa and Log-linear Models

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20

Logistic regression analysis. Birthe Lykke Thomsen H. Lundbeck A/S

Truck prices - linear model? Truck prices - log transform of the response variable. Interpreting models with log transformation

STAT 526 Spring Final Exam. Thursday May 5, 2011

Probabilistic Index Models

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Thursday, August 30, 2018

Testing Independence

Homework 1 Solutions

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

Binary Dependent Variables

Name: Biostatistics 1 st year Comprehensive Examination: Applied in-class exam. June 8 th, 2016: 9am to 1pm

The nltm Package. July 24, 2006

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key

Linear Regression Models P8111

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018

A Re-Introduction to General Linear Models (GLM)

Regression Methods for Survey Data

Big Data Analytics Linear models for Classification (Often called Logistic Regression) April 16, 2018 Prof. R Bohn

Logistic Regression. Continued Psy 524 Ainsworth

Power and Sample Size Calculations with the Additive Hazards Model

(c) Interpret the estimated effect of temperature on the odds of thermal distress.

Assessing the Calibration of Dichotomous Outcome Models with the Calibration Belt

Transcription:

How to Present Results of Regression Models to Clinicians Frank E Harrell Jr Department of Biostatistics Vanderbilt University School of Medicine f.harrell@vanderbilt.edu biostat.mc.vanderbilt.edu/fhhandouts

Outline 1. Regression models: Uses and problems 2. Example of interpreting coefficients 3. Goals of software to assist analyst 4. Software 5. Example binary logistic regression fit 6. Typesetting model fits 7. Summarizing effects of predictors 8. Displaying shape of effect of a predictor 9. Displaying how predicted values are formed 10. Tree approximations 11. Interactively obtaining predictions 1

Uses of Models for Individual Patients Probability of having a specific disease Probability of getting a disease by t years Lifetime risk of disease Expected time until disease development Life expectency given risk factors, disease damage Risk reduction due to a new drug 2

Problems in Obtaining Predictions Many predictor variables Predictor variables act nonlinearly Predictor variables act non additively (e.g., multiply) Often make a nonlinear transformation of the linear predictor (Xβ = β 0 + β 1 X 1 +... + β p X p ) to obtain predicted response Want patterns and predictions to be understandable to clinicians Regression models are not black boxes 3

Example: Interpreting Coefficients E(Y age,sex) = β 0 + β 1 age + β 2 (sex = f) + β 3 age(sex = f), where sex = f is a dummy indicator variable for sex=female, i.e., reference cell is sex=male. Parameter Meaning β 0 E(Y age = 0, sex = m) β 1 β 2 β 3 E(Y age = x + 1, sex = m) E(Y age = x, sex = m) E(Y age = 0, sex = f) E(Y age = 0, sex = m) E(Y age = x + 1, sex = f) E(Y age = x, sex = f) [E(Y age = x + 1, sex = m) E(Y age = x, sex = m)] 4

Goals of Software 1. Document and represent fitted model algebraically 2. Estimating odds ratios for meaningful changes in predictors, in presence of nonlinearity, interaction 3. Displaying multiple level confidence bars 4. Understanding predictor transformation/strength 5. Easily obtaining predicted values 5

Software R (www.r-project.org), S-PLUS (Insightful Corp.) Object oriented, user extendible Design add on package (Harrell) Design freely available (as is R) 6

Example Logistic Regression Model Re analysis of bacterial vs. viral meningitis (Spanos et al. JAMA 1989) n = 422 (217 bacterial) time.summer function(x) pmin(abs(x-7.5), abs(x+12-7.5)) f lrm(abm lsp(age, c(1, 2, 22)) + time.summer(month) + rcs(glratio, 5) + tpolysˆ0.33333) 7

Coef S.E. Wald Z P Intercept 6.4887 7.73326 0.84 0.4014 age 2.1031 0.96654 2.18 0.0296 age -4.7292 1.58966-2.97 0.0029 age 2.5289 0.76804 3.29 0.0010 age 0.1754 0.06120 2.87 0.0042 month 0.4928 0.11878 4.15 0.0000 glratio -26.5169 33.19172-0.80 0.4243 glratio 37.0256 114.38492 0.32 0.7462 glratio -44.7037 275.91374-0.16 0.8713 glratio -2.2335 264.53421-0.01 0.9933 tpolys 0.2543 0.05976 4.26 0.0000 8

anova(f) χ 2 d.f. P age 33.1 4 < 0.0001 Nonlinear 32.9 3 < 0.0001 month 17.2 1 < 0.0001 glratio 22.8 4 < 0.0001 Nonlinear 22.4 3 0.0001 tpolys 18.1 1 < 0.0001 TOTAL NONLINEAR 49.3 6 < 0.0001 TOTAL 81.9 10 < 0.0001 9

Typesetting Model Fits latex(f) Prob{abm = 1} = 1 1 + exp( Xβ), where X ˆβ = 6.49 +2.103 age 4.729(age 1) + +2.529(age 2) + + 0.175(age 22) + +0.493 time.summer(month) 10

26.52 glratio +42.61(glratio 0.0395) 3 + 51.45(glratio 0.2712) 3 + 2.57(glratio 0.4954) 3 + 7.24(glratio 0.6317) 3 + +18.65(glratio 0.9716) 3 + +0.254 tpolys 0.33333 and (x) + = x if x > 0, 0 otherwise. 11

Summarizing Effects of Predictors Even if model is simple, must choose increment carefully Odds ratio for 1 cell increase in leukocyte count? If predictor is nonlinear and is represented as multiple model terms, can t look at individual coefficients Odds ratio for a 1 unit increase in age 2 holding age constant? Default: inter quartile range ORs Get predicted log odds at lower and upper quartiles, take difference, anti log Set levels of interacting factors 12

s summary(f, month=c(1.5, 7.5)) print(s) Effects Response : abm Factor Low High Effect S.E. age 0.600 20.000-3.53 0.68 Odds Ratio 0.600 20.000 0.03 month 1.500 7.500-2.96 0.71 Odds Ratio 1.500 7.500 0.05 glratio 0.321 0.629-2.74 0.68 Odds Ratio 0.321 0.629 0.06 tpolys 10.950 1087.500 2.05 0.48 Odds Ratio 10.950 1087.500 7.77 13

plot(s) Odds Ratio 0.0001 0.1000 4.0000 age - 20:0.6 month - 9:5 glratio - 0.63:0.32 tpolys - 1088:11 0.99 0.9 0.7 0.8 0.95 14

Displaying Shape of Effect Fix levels of interacting factors Vary X, show how X is related to logit or prob. Just a problem in getting predictions, remembering all variable transformations plot(f) 15

log odds -2 2 6 0 20 40 60 Age log odds -2 2 6 0 2 4 6 8 10 Month log odds -2 2 6 0.0 0.4 0.8 Glucose Ratio log odds -2 2 6 0 5000 15000 Total Polymorphs 16

plot(f, age=na, month=na) -6-4-2 0 2 4 log odds 1210 8 6 4 2 Month 0 10 20 30 40 50 60 Age 17

Displaying How Predictions are Formed Nomogram One axis per predictor And per level of interacting factors Use point scores Show X ˆβ and nomogram(f, Prob fun=function(x) 1/(1+exp(-x)), funlabel= Probability of ABM ) 18

Points Age Month Glucose Ratio 0 10 20 30 40 50 60 70 80 90 100 20 10 2 30 40 50 0 60 70 1 7.5 5.5 3.5 8.0 10.0 1.0 0.8 1.0 1.2 1.4 0.7 0.4 0.3 0.2 0.1 0.0 Total Polymorphs 0 1000 3000 7000 20000 40000 Total Points Linear Predictor 0 20 40 60 80 100 140 180 220 260-6 -2 2 6 10 14 18 22 Probability of ABM 0.01 0.20 0.90 19

Examples of Manually Drawn Nomograms 20

21

Use a Regression Tree to approximate desired degree of accuracy Tree Approximations Prob to a Can compute estimation error as a function of # nodes pred predict(f, type= fitted ) f.approx tree(pred age + month + glratio + tpolys) 22

glratio<0.445363 glratio<0.377322 tpolys<357.254 glratio<0.328948age<13.5 age<1.5 tpolys<1462.15 0.992 0.875 0.823 0.404 tpolys<33.15 age<62 age<1.5 0.914 month<5.5 0.493 month<11.5 0.500 0.718 0.336 0.472 0.176 month<3.5 0.222 0.177 0.045 23

Interactively Obtaining Predictions drep datarep( roundn(age,10) + sex + pclass + roundn(sibsp, clip=0:1)) Dialog(fitPar( f.mi, lp=f, fun=list( Prob[Survival] =plogis)), limits= data, basename= Titanic, vary=list(sex=c( female, male )), datarep=drep) runmenu.titanic() 24

Summary Automatic typesetting useful for simplifying nonlinear terms and interactions Estimates of effects should take into account scale, nonlinearities, interactions Variety of graphical methods useful for displaying strength, shape, predicted values Special interactive programs allow one to obtain predictions and confidence limits 25

Abstract Regression models can flexibly summarize relationships of multiple variables to clinical outcomes. With increased flexibility, such as incorportation of nonlinear effects or interactions, comes greater difficulty of interpreting model coefficients. In this talk I will discuss graphical and semi-graphical methods for describing a fitted regression model, using the binary logistic model as an example. The methods demonstrated include typesetting the model equation, summarizing the effects of predictors with odds ratio charts, displaying the shape of the effect of a predictor, drawing nomograms, and using trees to approximate the model. These methods will be demonstrated using a re analysis of a model to predict the probability that a patient will have bacterial meningitis (Spanos et al., JAMA 1989). 26