Selection of Variables and Functional Forms in Multivariable Analysis: Current Issues and Future Directions


Frank E Harrell Jr, Department of Biostatistics, Vanderbilt University School of Medicine. STRATOS, Banff, Alberta, 2016-07-04.

Fractional polynomials, regression splines, penalized splines, smoothing splines, nonparametric smoothers, etc.

Advantages of regression splines:
- Shape flexibility; allows sharp changes and long flat segments
- Can pre-set d.f.
- Simple progression of form as knots are added
- Origin-free when knots are selected from the marginal distribution of X, i.e., f(X + c) is a horizontal shift of f(X)
- Allows X ≤ 0
- Extends to tensor splines for interaction modeling
- Easy to penalize some types of complexity
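"Pre-set d.f." is concrete in software. Here is a minimal Python sketch of the idea (our own illustration, not from the talk), assuming patsy's cr() natural cubic spline basis used through statsmodels, with knots placed from the observed distribution of x:

```python
# Minimal sketch: a natural cubic regression spline with pre-set d.f.
# Assumes numpy, pandas, patsy, and statsmodels are installed.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 300)
y = np.sin(x) + rng.normal(0, 0.5, 300)      # any smooth true signal
df = pd.DataFrame({"x": x, "y": y})

# cr(x, df=4) builds a natural cubic spline basis with 4 d.f.;
# patsy chooses the knots from the observed x values.
fit = smf.ols("y ~ cr(x, df=4)", data=df).fit()
print(fit.params)
```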

Setting
- Exploratory vs. formal/predictive analysis
- Too many variables, too few subjects
- Many investigators and some statisticians feel that variable selection helps

But variable selection:
- Is misleading to the researcher
- Is often arbitrary and unstable
- Attempts to obtain a clear result in the presence of competition among predictors; the result is clear only because it is misleading
- May reduce the number of variables to collect in the future, but some of the information in the data is spent on variable selection instead of all of it being used for estimation

Maxwell's Demon (James Clerk Maxwell)
Maxwell imagines one container divided into two parts, A and B. Both parts are filled with the same gas at equal temperatures and placed next to each other. Observing the molecules on both sides, an imaginary demon guards a trapdoor between the two parts. When a faster-than-average molecule from A flies towards the trapdoor, the demon opens it, and the molecule flies from A to B. Likewise, when a slower-than-average molecule from B flies towards the trapdoor, the demon lets it pass from B to A. The average speed of the molecules in B will have increased while in A they will have slowed down on average. Since average molecular speed corresponds to temperature, the temperature decreases in A and increases in B, contrary to the second law of thermodynamics. Szilárd pointed out that a real-life Maxwell's demon would need some means of measuring molecular speed, and that the act of acquiring information would require an expenditure of energy. Since the demon and the gas are interacting, we must consider the total entropy of the gas and the demon combined. The expenditure of energy by the demon will cause an increase in the entropy of the demon, which will be larger than the lowering of the entropy of the gas. The parallel to variable selection: information acquired by selecting variables is not free; it is paid for with information in the data.
commons.wikimedia.org/wiki/file:youngjamesclerkmaxwell.jpg
en.wikipedia.org/wiki/Maxwell's_demon

Variable selection, continued. Example:

Method           Apparent Rank Correlation    Over-Optimism   Bias-Corrected
                 of Predicted vs. Observed                    Correlation
Full Model       0.50                         0.06            0.44
Stepwise Model   0.47                         0.05            0.42

- Model specification is preferred to model selection
- The information content of the data is usually insufficient for reliable variable selection
- Bootstrapping, or repeating the selection on a new sample, exposes the difficulty of the task
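The bias-corrected column comes from the bootstrap optimism correction. A simplified sketch of that procedure (our own Python on simulated data; fit_predict is a hypothetical helper, and only the full model is validated here):

```python
# Efron-style optimism correction for the rank correlation of predicted
# vs. observed, the quantity tabulated above. Harrell (2015) describes
# the complete procedure.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n, p = 100, 8
X = rng.normal(size=(n, p))
beta = np.r_[1.0, 0.5, 0.25, 0.125, np.zeros(4)]
y = X @ beta + rng.normal(size=n)

def fit_predict(Xtr, ytr, Xte):
    """OLS fit on (Xtr, ytr), predictions for Xte."""
    coef = np.linalg.lstsq(np.c_[np.ones(len(Xtr)), Xtr], ytr, rcond=None)[0]
    return np.c_[np.ones(len(Xte)), Xte] @ coef

apparent = spearmanr(fit_predict(X, y, X), y).correlation
optimism = []
for _ in range(200):
    i = rng.integers(0, n, n)                       # bootstrap resample
    boot = spearmanr(fit_predict(X[i], y[i], X[i]), y[i]).correlation
    orig = spearmanr(fit_predict(X[i], y[i], X), y).correlation
    optimism.append(boot - orig)
print(f"apparent {apparent:.2f}, optimism {np.mean(optimism):.2f}, "
      f"bias-corrected {apparent - np.mean(optimism):.2f}")
```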

Stepwise Methods Can't Select the Right Variables
8 candidate predictors, 4 with β ≠ 0; Gaussian errors; stepwise selection using AIC. Generalized d.f. (gdf):

n      Correct Model   gdf   σ̂²/σ²
20     0               8.1   0.70
40     0               7.3   0.87
150    0.13            9.1   0.97
300    0.38            9.3   0.98
2000   0.52            6.3   1.00

SSE / (n − gdf − 1) is unbiased for σ².
Harrell 2015; Ye 1998; Freedman, Pee, and Midthune 1992
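The σ̂²/σ² column can be reproduced in spirit with a small simulation. A sketch (our own, simplified to all-noise predictors rather than the 4 true effects above):

```python
# After forward selection by AIC, the naive residual variance estimate
# (using apparent d.f.) understates sigma^2.
import numpy as np

rng = np.random.default_rng(2)

def aic_of(cols, X, y):
    """Gaussian OLS AIC for the model using predictor columns `cols`."""
    n = len(y)
    A = np.c_[np.ones(n), X[:, cols]]
    res = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return n * np.log(res @ res / n) + 2 * A.shape[1]

def forward_aic(X, y):
    """Greedy forward selection minimizing AIC, starting from intercept."""
    selected, best = [], aic_of([], X, y)
    while len(selected) < X.shape[1]:
        cands = [j for j in range(X.shape[1]) if j not in selected]
        aic, j = min((aic_of(selected + [j], X, y), j) for j in cands)
        if aic >= best:
            break
        best, selected = aic, selected + [j]
    return selected

n, p, sigma = 20, 8, 1.0
ratios = []
for _ in range(200):
    X = rng.normal(size=(n, p))
    y = rng.normal(0, sigma, n)                  # pure-noise response
    sel = forward_aic(X, y)
    A = np.c_[np.ones(n), X[:, sel]]
    res = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    ratios.append(res @ res / (n - len(sel) - 1) / sigma**2)
print(f"mean naive sigma^2-hat / sigma^2: {np.mean(ratios):.2f} (< 1)")
```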

Bootstrapping Importance Ranks of Predictors
n = 300, 12 predictors, β_i = i, σ = 9; predictors ranked by partial χ².
[Figure: bootstrap distributions of the importance rank (1-12) of each predictor x1-x12.]
Harrell 2015
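A sketch of the bootstrap-the-ranks idea (our own Python; it ranks predictors by the partial Wald statistic |t| as a stand-in for the partial χ²):

```python
# Bootstrap the importance ranks of predictors in the slide's setup.
import numpy as np

rng = np.random.default_rng(3)
n, p, sigma = 300, 12, 9.0
X = rng.normal(size=(n, p))
beta = np.arange(1.0, p + 1)                 # beta_i = i, as on the slide
y = X @ beta + rng.normal(0, sigma, n)

def abs_t(X, y):
    """|t| statistics for the slope coefficients of an OLS fit."""
    A = np.c_[np.ones(len(y)), X]
    coef = np.linalg.lstsq(A, y, rcond=None)[0]
    res = y - A @ coef
    s2 = res @ res / (len(y) - A.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(A.T @ A)))
    return np.abs(coef / se)[1:]

ranks = []
for _ in range(500):
    i = rng.integers(0, n, n)
    # double argsort turns statistics into ranks (1 = least important)
    ranks.append(np.argsort(np.argsort(abs_t(X[i], y[i]))) + 1)
ranks = np.array(ranks)
for j in range(p):
    lo, hi = np.percentile(ranks[:, j], [2.5, 97.5])
    print(f"x{j+1}: 0.95 bootstrap interval for rank [{lo:.0f}, {hi:.0f}]")
```

Even with β_i = i by construction, the rank intervals are wide, which is the slide's point: the data rarely support a confident importance ordering.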

Pooled Tests Instead of Variable Selection
Example: semiparametric ordinal model for predicting glycohemoglobin in the NHANES cohort. There are better body size measures for predicting preclinical diabetes than height & weight. Body size measures compete, so compute a chunk test for them jointly.
[Figure: chunk-test χ² minus d.f. (x-axis 0-700) for competing predictors: leg, sub, tri, waist, re, size, age.]
Harrell 2015
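A chunk test pools the d.f. of all competing measures into one joint test. A minimal sketch (our own; variable names are invented and no NHANES data are loaded):

```python
# One pooled F-test for all body-size terms, rather than selecting
# among the competing size measures.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 500
df = pd.DataFrame({
    "age":    rng.uniform(20, 80, n),
    "waist":  rng.normal(95, 12, n),
    "height": rng.normal(170, 9, n),
    "weight": rng.normal(80, 15, n),
})
df["gh"] = 5.2 + 0.01 * df.age + 0.02 * df.waist + rng.normal(0, 0.5, n)

full    = smf.ols("gh ~ age + waist + height + weight", df).fit()
reduced = smf.ols("gh ~ age", df).fit()
# One 3-d.f. test for the whole chunk of competing size measures:
f, pval, ddf = full.compare_f_test(reduced)
print(f"chunk F = {f:.2f} on {ddf:.0f} d.f., p = {pval:.2g}")
```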

Many good approaches exist for developing accurate predictive models, e.g.:
- Data reduction, e.g., variable clustering, summarizing each cluster with its PC1
- Penalized MLE with large numbers of predictors and nonlinear terms
- Machine learning algorithms

Predictive discrimination is best when the model is not decoded/simplified. Prediction Uncertainty Principle: a prediction tool can be optimally predictive or parsimonious, but not both. Related to L. J. Savage's anti-parsimony principle.
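A sketch of the first bullet, variable clustering plus PC1 summaries (our own Python stand-in for varclus-style tools such as Hmisc::varclus, not the talk's code):

```python
# Cluster correlated predictors, then summarize each cluster by its
# first principal component score.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(5)
n = 200
z1, z2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([z1, z1 + 0.3 * rng.normal(size=n),    # cluster 1
                     z2, z2 + 0.3 * rng.normal(size=n)])   # cluster 2

# Cluster variables on 1 - |correlation|, then cut into 2 clusters
corr = np.corrcoef(X, rowvar=False)
dist = (1 - np.abs(corr))[np.triu_indices(X.shape[1], 1)]  # condensed form
labels = fcluster(linkage(dist, "average"), t=2, criterion="maxclust")

# Summarize each cluster by its PC1 score
scores = []
for c in np.unique(labels):
    Xc = X[:, labels == c]
    Xc = (Xc - Xc.mean(0)) / Xc.std(0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    scores.append(Xc @ vt[0])
reduced = np.column_stack(scores)          # 4 predictors -> 2 PC1 columns
print(reduced.shape)
```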

Aside: the lasso and Related Methods
- Trade one problem (p large relative to n) for another
- Assume all effects are linear
- Scaling problems make it difficult to know how to include nonlinear terms
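The scaling point is easy to demonstrate: the lasso penalty is not scale-invariant, so the units chosen for (say) spline basis columns change what gets selected. A small sketch (our own):

```python
# Rescaling one column (same information, different units) changes the
# lasso fit, because the L1 penalty is applied on the coefficient scale.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(6)
n = 200
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, 1.0, 0.0]) + rng.normal(0, 1, n)

for scale in (1.0, 100.0):
    Xs = X.copy()
    Xs[:, 1] *= scale                    # same column, different units
    coef = Lasso(alpha=0.5).fit(Xs, y).coef_
    print(f"scale {scale:>5}: coefficients {np.round(coef, 3)}")
```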

Advantages of Model Specification
- Apparent d.f. = actual d.f.
- Accurate type I error
- Accurate confidence intervals
- Regression splines: pre-specify the number & location of knots
- Most "found" interactions turn out to be false

Perspective of a Bayesian Modeler
- Everything is pre-specified, but general enough to allow flexibility
- Prior distributions for regression coefficients are generally centered at zero to bring skepticism/shrinkage
- Gaussian prior ↔ quadratic penalty (ridge regression)
- Laplace (double exponential) prior ↔ lasso/variable selection
- Terms are not just in or out of the model; they can be half in
- Natural for interaction terms
- The Bayesian posterior provides formal inference
- With frequentist penalized MLE, formal inference is not fully developed
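The two prior-penalty equivalences can be written out as MAP estimation (standard identities, not shown in the talk):

```latex
% Gaussian prior on each coefficient <=> ridge penalty:
\hat\beta_{\text{ridge}}
  = \arg\max_{\beta}\Bigl[\ell(\beta; y) - \lambda \sum_j \beta_j^2\Bigr],
  \qquad \beta_j \sim \mathcal{N}(0, \tau^2),\; \lambda = \frac{1}{2\tau^2}

% Laplace prior on each coefficient <=> lasso penalty:
\hat\beta_{\text{lasso}}
  = \arg\max_{\beta}\Bigl[\ell(\beta; y) - \lambda \sum_j \lvert\beta_j\rvert\Bigr],
  \qquad \beta_j \sim \text{Laplace}(0, b),\; \lambda = \frac{1}{b}
```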

One Future Direction: Interaction Modeling
- Precision medicine, aka personalized medicine, aka heterogeneity of treatment effect (new name scheduled for 2017)
- Most clinicians attempt to use subgroup analysis; this is doomed
- Needs to be a well-planned, formal interaction analysis
- Interaction effects have lower precision than main effects, and more collinearity problems
- Main effects need to be liberally and nonlinearly accounted for before considering interaction effects
- Dimensionality and identifiability problem: does the drug work more on older patients, or on patients with more disease?

Reduced Rank Modeling
- Main effects fully expanded as usual
- PC1, PC2, ..., PCk used in place of interaction terms of dimension K > k
- k chosen as the maximum the effective sample size will fully support
- Attempt to interpret the linear combinations of PCs in terms of their constituent variables
- Need to develop more sensible restrictions than PCs, e.g., penalty functions:
  - Penalize interaction effects for poorly distributed predictors
  - Penalize collinear predictors
  - Bayesian approaches
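A minimal sketch of the PC-for-interactions construction (our own Python; k and the simulated design are arbitrary choices):

```python
# Keep all main effects; replace the K pairwise interaction columns
# by their first k principal components.
import numpy as np

rng = np.random.default_rng(7)
n, p, k = 300, 8, 3
X = rng.normal(size=(n, p))

# K = p*(p-1)/2 = 28 pairwise interaction columns
inter = np.column_stack([X[:, i] * X[:, j]
                         for i in range(p) for j in range(i + 1, p)])
inter = (inter - inter.mean(0)) / inter.std(0)

# First k principal components of the interaction block; k is set by
# what the effective sample size will support, as the slide suggests
_, _, vt = np.linalg.svd(inter, full_matrices=False)
pcs = inter @ vt[:k].T

design = np.c_[np.ones(n), X, pcs]         # intercept + mains + k PCs
print(design.shape)                        # (300, 12)
```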

References

Freedman, L. S., D. Pee, and D. N. Midthune (1992). "The Problem of Underestimating the Residual Error Variance in Forward Stepwise Regression." Journal of the Royal Statistical Society, Series D (The Statistician) 41(4), 405-412.

Harrell, F. E. (2015). Regression Modeling Strategies, with Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. Second edition. New York: Springer. ISBN 978-3-319-19424-0.

Ye, J. (1998). "On Measuring and Correcting the Effects of Data Mining and Model Selection." Journal of the American Statistical Association 93, 120-131.