Sufficient Dimension Reduction for Longitudinally Measured Predictors


Sufficient Dimension Reduction for Longitudinally Measured Predictors
Ruth Pfeiffer, National Cancer Institute, NIH, HHS
Joint work with Efstathia Bura and Wei Wang (TU Wien and George Washington University)
JSM Vancouver 2018

Motivation
Biomarkers measured over time are used to model disease onset/progression.
Examples: longitudinal PSA for prostate cancer onset/progression; longitudinal CA125 for ovarian cancer diagnosis.
Ideal: a single marker with high specificity and sensitivity. Such high-performance markers are mostly not available.
Possible strategy: combine information from multiple longitudinal marker measurements.

Statistical Problem
Combine correlated markers into a composite marker score for regression modeling and classification.
Account for the longitudinal nature of the marker measurements.
Identify markers truly associated with the outcome and remove irrelevant and redundant markers from the marker score, to make results more interpretable, facilitate replication and translation of findings to clinical settings, and improve prediction.

Longitudinal Set-up
$Y \in \mathbb{R}$: response
$X_t = (x_{1t}, \dots, x_{pt})^T \in \mathbb{R}^p$: marker vector measured at $t = 1, \dots, T$
$p \times T$ matrix $X \in \mathbb{R}^{p \times T}$:
$$X = (X_1, \dots, X_T) = \begin{pmatrix} X_{11} & \cdots & X_{1T} \\ \vdots & & \vdots \\ X_{p1} & \cdots & X_{pT} \end{pmatrix} \in \mathbb{R}^{p \times T}$$

Longitudinal Set-up, cont.
Possible approach: ignore the time/matrix structure of $X$ and reshape the $p \times T$ matrix $X \in \mathbb{R}^{p \times T}$ as a $pT \times 1$ vector, $\mathrm{vec}(X)$.
Drawback: ignoring the structure can lead to a loss of accuracy in estimation that is reflected in a loss of discriminatory ability.

Sufficient Dimension Reduction (SDR) in Regression
$Y \in \mathbb{R}$: response; $X \in \mathbb{R}^p$: marker (predictor) vector
Goal: model $F(Y \mid X)$. Find $R : \mathbb{R}^p \to \mathbb{R}^d$ with $d \ll p = \dim(X)$ such that
$$F(Y \mid X) = F(Y \mid R(X)),$$
i.e. replace $X$ by $R(X)$ without loss of information on $Y \mid X$. $R(X)$ is a sufficient reduction.

Estimating R: SDR using Inverse Regression
Find $R(X)$ such that $X$ and $R(X)$ have the same information about $Y$.
If $R(X)$ is a sufficient reduction for the forward regression $Y \mid X$, then it is also sufficient for the inverse regression (IR) $X \mid Y$ (Cook, 2007).
Advantage: the $p$-dimensional multiple regression of $Y$ on $X$ is replaced by $p$ univariate regressions of $X_i$ on $Y$.
Most SDR methods assume a linear reduction $R(X) = \eta^T X$ and estimate $\eta$ based on moments of $X \mid Y$.

Estimating $S(\eta)$: First Moment Based Linear SDR
General idea: find a kernel matrix $M$ such that $S(M) \subseteq S(\eta)$.
First moment SDR methods: if $E(X \mid \eta^T X)$ is linear in $\eta^T X$, then
$$S_{FMSDR} = \Sigma_x^{-1} S(\mu_Y - \mu) \subseteq S(\eta),$$
where $\mu_Y = E(X \mid Y)$, $\mu = E(X)$, $\Sigma_x = \mathrm{cov}(X)$.
Sliced Inverse Regression (SIR, Li, 1991): $S(\Sigma_x^{-1}\,\mathrm{cov}(E(X \mid Y))) \subseteq S(\eta)$.
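To make the SIR estimate concrete, here is a minimal numpy sketch (not code from the talk): it slices the response, forms the between-slice covariance of the slice means of $X$, and takes leading eigenvectors of $\hat\Sigma_x^{-1}\widehat{\mathrm{cov}}(E(X \mid Y))$. The slice count and the synthetic data are illustrative assumptions.

```python
import numpy as np

def sir_directions(X, Y, n_slices=5, d=1):
    """Sliced Inverse Regression: leading eigenvectors of Sigma_x^{-1} cov(E(X|Y))."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Sigma = Xc.T @ Xc / n                        # sample cov(X)
    order = np.argsort(Y)                        # slice the response
    M = np.zeros((p, p))                         # between-slice covariance of slice means
    for idx in np.array_split(order, n_slices):
        mu_s = Xc[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(mu_s, mu_s)
    evals, evecs = np.linalg.eig(np.linalg.solve(Sigma, M))
    return evecs.real[:, np.argsort(-evals.real)[:d]]   # p x d basis of the reduction

# illustrative use on synthetic data (not from the talk)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
Y = X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=500)
print(sir_directions(X, Y, n_slices=10, d=1).ravel())
```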

FMSDR Estimation: Parametric Inverse Regression (PIR) (Bura & Cook, 2001)
Assume a linear IR model with $\mu_Y - \mu = B f_y$:
$$X_y := X \mid (Y = y) = \mu + B f_y + \epsilon,$$
where
$f_y$: $r \times 1$ vector of functions in $y$ with $E(f_Y) = 0$
$B$: $p \times r$ unconstrained parameter matrix
$E(\epsilon) = 0$ and $\mathrm{var}(\epsilon \mid Y) = \mathrm{var}(X \mid Y) = \Delta_Y$
Thus $S_{FMSDR} = \Sigma_x^{-1} S(B)$.

Estimation of the Sufficient Reduction in PIR
Random sample $(Y_i, X_i)$, $i = 1, \dots, n$.
$\mathbb{X}$: $n \times p$ matrix with rows $(X_{y_i} - \bar X)^T$, $\bar X = \sum_{i=1}^n X_i / n$
$\mathbb{F}$: $n \times r$ matrix with rows $(f_{y_i} - \bar f)^T$, $\bar f = \sum_{i=1}^n f_{y_i} / n$
Ordinary least squares (OLS) estimate $\hat B = (\mathbb{F}^T\mathbb{F})^{-1}\mathbb{F}^T\mathbb{X}$
$\hat S_{FMSDR} = \hat\Sigma_X^{-1}\,\mathrm{span}(\hat B)$, with $\dim(\hat S_{FMSDR}) = \mathrm{rank}(\hat B) \le p$
Estimate the dimension of $B$ using rank tests.
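A minimal numpy sketch of the PIR least squares fit described above (not code from the talk); the cubic polynomial basis for $f_y$ and the synthetic data are arbitrary illustrative choices.

```python
import numpy as np

def pir_fit(X, Y, fy):
    """Parametric inverse regression: OLS of the centered X on the centered f_y."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                      # rows (X_i - Xbar)^T
    F = fy(Y)
    F = F - F.mean(axis=0)                       # rows (f_{y_i} - fbar)^T
    B_hat = np.linalg.solve(F.T @ F, F.T @ Xc)   # (F'F)^{-1} F'X, an r x p matrix
    Sigma_hat = Xc.T @ Xc / n
    basis = np.linalg.solve(Sigma_hat, B_hat.T)  # Sigma_x^{-1} span(B) estimates S_FMSDR
    return B_hat, basis

# illustrative f_y: polynomial terms in y (an arbitrary choice, not from the talk)
fy_cubic = lambda y: np.column_stack([y, y ** 2, y ** 3])

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
Y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + 0.1 * rng.normal(size=300)
B_hat, basis = pir_fit(X, Y, fy_cubic)
```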

Likelihood-based SDR: Principal Fitted Components (PFC) (Cook & Forzani, 2008)
Assume a normal linear IR model with $\mu_Y - \mu = \Gamma\gamma f_y$:
$$X_y = \mu + \Gamma\gamma f_y + \varepsilon, \qquad \varepsilon \sim N_p(0, \Delta)$$
Fix $\dim(\hat S_{FMSDR}) = d$ and parameterize $B = \Gamma\gamma$:
$\Gamma \in \mathbb{R}^{p \times d}$ denotes a basis of $S_{FMSDR}$, with $\Gamma^T\Gamma = I_d$
$\gamma \in \mathbb{R}^{d \times r}$, $d \le r$, is an unrestricted rank-$d$ parameter matrix
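As an illustration, here is a sketch of PFC estimation in the simplest special case $\Delta = \sigma^2 I_p$ (an assumption made here only for brevity, not on the slide): $\hat\Gamma$ then spans the top-$d$ eigenvectors of the sample covariance of the fitted values from regressing $X$ on $f_y$.

```python
import numpy as np

def pfc_isotropic(X, F, d):
    """PFC with Delta = sigma^2 I: Gamma_hat spans the top-d eigenvectors of the
    covariance of the fitted values from the regression of X on f_y."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Fc = F - F.mean(axis=0)
    P_F = Fc @ np.linalg.solve(Fc.T @ Fc, Fc.T)   # projection onto the column span of f_y
    Sigma_fit = Xc.T @ P_F @ Xc / n               # covariance of the fitted X values
    evals, evecs = np.linalg.eigh(Sigma_fit)
    return evecs[:, np.argsort(-evals)[:d]]       # p x d orthonormal basis Gamma_hat
```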

Recall: Longitudinal Set-up
$Y \in \mathbb{R}$: response; $X_t = (x_{1t}, \dots, x_{pt})^T \in \mathbb{R}^p$: marker vector measured at $t = 1, \dots, T$; $X = (X_1, \dots, X_T) \in \mathbb{R}^{p \times T}$.

Inverse Regression Model for Longitudinal Predictors
To accommodate the time structure of $X \mid Y$, assume the centered first moment of $X$ decomposes into a time component and a marker component,
$$X_y := X \mid (Y = y) = \mu + \beta f_y \alpha^T + \varepsilon,$$
or in vector notation
$$\mathrm{vec}(X_y) = \mathrm{vec}(\mu) + (\alpha \otimes \beta)\,\mathrm{vec}(f_y) + \mathrm{vec}(\varepsilon),$$
where
$f_y$: $k \times r$ known function of $y$
$\alpha \in \mathbb{R}^{p \times r}$ captures the mean structure of $X$ regardless of time
$\beta \in \mathbb{R}^{T \times k}$ captures the mean structure over time

Example: Binary outcome, $Y = 0, 1$
When does the moment assumption hold? If the means of the markers change over time only by a multiplicative factor that affects all markers equally and is the same for $Y = 0$ and $Y = 1$, then
$$\mathrm{vec}(E(X \mid Y)) = \alpha_y \otimes \beta, \qquad \text{i.e. } \mathrm{vec}(E(X_t \mid Y)) = \beta_t\,\alpha_y.$$
Using $p_y = P(Y = y)$ and $E(X) = p_0\,\alpha_0 \otimes \beta + p_1\,\alpha_1 \otimes \beta$,
$$\mathrm{vec}(E(X \mid Y = y) - E(X)) = (1 - p_y)(\alpha_0 - \alpha_1) \otimes \beta = f_y\,(\alpha_0 - \alpha_1) \otimes \beta,$$
so the first moment condition is satisfied with $f_y = (1 - p_y)$.
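For instance, for $y = 0$ the displayed difference follows from a one-line calculation, using $p_1 = 1 - p_0$:
$$\mathrm{vec}(E(X \mid Y = 0) - E(X)) = \alpha_0 \otimes \beta - (p_0\alpha_0 + p_1\alpha_1) \otimes \beta = (1 - p_0)(\alpha_0 - \alpha_1) \otimes \beta.$$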

First Moment Subspace for Longitudinal Predictors
$\Sigma_x = \mathrm{cov}(\mathrm{vec}(X)) \in \mathbb{R}^{pT \times pT}$ and $\Delta = E(\Delta_y)$, so
$$S_{FMSDR} = \Sigma_x^{-1} S(\alpha \otimes \beta) = \Delta^{-1} S(\alpha \otimes \beta)$$
Pfeiffer, Forzani and Bura (2012) extended SIR to estimate $S_{FMSDR}$.
Ding and Cook (2014) developed model-based dimension folding PCA and dimension folding PFC, obtaining MLEs when $X \mid Y$ is normal and $\mathrm{var}(\varepsilon)$ is the identity or separable ($\Delta = \Delta_R \otimes \Delta_C$) and $\Sigma_x = \Sigma_R \otimes \Sigma_C$ is also separable.

General PIR and PFC for Longitudinal Predictors
Write the model $\mathrm{vec}(X_{y_i}) = \mathrm{vec}(\mu) + (\alpha \otimes \beta)\,\mathrm{vec}(f_{y_i}) + \mathrm{vec}(\epsilon_i)$ for a random sample $(Y_i, X_i)$, $i = 1, \dots, n$, as
$$\mathbb{X}_y = \mathbb{F}_y (\alpha \otimes \beta)^T + \epsilon,$$
where
$\mathbb{X}_y$: $n \times pT$ (centered)
$\mathbb{F}_y$: $n \times kr$ (centered functions of $Y$)
$\alpha \in \mathbb{R}^{p \times r}$ and $\beta \in \mathbb{R}^{T \times k}$
$\epsilon$: $n \times pT$ with $E(\epsilon) = 0$, $\mathrm{var}(\mathrm{vec}(\epsilon)) = I_n \otimes \Delta_Y$

Least Squares Estimation for the Kronecker Product Mean Model (K-PIR)
Model: $\mathbb{X}_y = \mathbb{F}_y (\alpha \otimes \beta)^T + \epsilon$
Estimate $\alpha$ and $\beta$ using a two-step approach:
1. Find $\hat\alpha$ and $\hat\beta$ that minimize $\|(\mathbb{F}^T\mathbb{F})^{-1}\mathbb{F}^T\mathbb{X} - (\alpha \otimes \beta)^T\|^2$, using the algorithm of Van Loan & Pitsianis, 1993 (VLP).
2. Compute the least squares estimate
$$\hat\Delta = \frac{1}{n - \mathrm{rank}(\mathbb{F}_y)} \sum_{i=1}^n \big(\mathbb{X}_{y_i} - \mathbb{F}_{y_i}(\hat\alpha \otimes \hat\beta)^T\big)^T \big(\mathbb{X}_{y_i} - \mathbb{F}_{y_i}(\hat\alpha \otimes \hat\beta)^T\big)$$

Least Squares Estimation (K-PIR), cont.
Theorem: If $\hat\alpha$ and $\hat\beta$ minimize $\|(\mathbb{F}^T\mathbb{F})^{-1}\mathbb{F}^T\mathbb{X} - (\alpha \otimes \beta)^T\|^2$, then $\hat\alpha \otimes \hat\beta \xrightarrow{p} \alpha \otimes \beta$.
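Step 1 of K-PIR is a nearest Kronecker product problem. Below is a minimal numpy sketch of the Van Loan and Pitsianis rank-one SVD approach (illustrative dimensions and variable names; in K-PIR the input C would be the OLS coefficient matrix arranged as a $pT \times rk$ matrix, which is an assumption about bookkeeping rather than something spelled out on the slide).

```python
import numpy as np

def nearest_kronecker(C, p, r, T, k):
    """Van Loan-Pitsianis: alpha (p x r) and beta (T x k) minimizing
    ||C - alpha kron beta||_F for a given C of size pT x rk."""
    # rearrange C so that each row holds one (T x k) block, flattened
    R = np.empty((p * r, T * k))
    for i in range(p):
        for j in range(r):
            R[i * r + j] = C[i * T:(i + 1) * T, j * k:(j + 1) * k].ravel()
    # the best Kronecker factors come from the best rank-1 approximation of R
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    alpha = np.sqrt(s[0]) * U[:, 0].reshape(p, r)
    beta = np.sqrt(s[0]) * Vt[0].reshape(T, k)
    return alpha, beta

# sanity check: recover a known Kronecker product (illustrative dimensions)
rng = np.random.default_rng(1)
a, b = rng.normal(size=(4, 3)), rng.normal(size=(5, 2))
al, be = nearest_kronecker(np.kron(a, b), 4, 3, 5, 2)
print(np.allclose(np.kron(al, be), np.kron(a, b)))
```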

Kronecker Product PFC (K-PFC)
Assume $\mathrm{vec}(X_y) = \mathrm{vec}(\mu) + (\alpha \otimes \beta)\,\mathrm{vec}(f_y) + \mathrm{vec}(\epsilon)$ with $\varepsilon \sim N_{pT}(0, \Delta)$, and set
$\alpha = \Gamma_1\gamma_1$ ($p \times r$), where $\Gamma_1$ is a basis of the $d_1$-dimensional subspace $\mathrm{span}(\alpha)$ with $\Gamma_1^T\Gamma_1 = I_{d_1}$ and $\gamma_1$ is an unconstrained $d_1 \times r$ matrix;
$\beta = \Gamma_2\gamma_2$ ($T \times k$), where $\Gamma_2$ is a basis of the $d_2$-dimensional subspace $\mathrm{span}(\beta)$ with $\Gamma_2^T\Gamma_2 = I_{d_2}$ and $\gamma_2$ is an unconstrained $d_2 \times k$ matrix.
Then
$$\mathrm{vec}(X_y - \mu) = (\Gamma_1 \otimes \Gamma_2)(\gamma_1 \otimes \gamma_2)\,\mathrm{vec}(f_y) + \mathrm{vec}(\epsilon)$$

Kronecker Product PFC, cont.
Under the model $\mathrm{vec}(X_y - \mu) = (\Gamma_1 \otimes \Gamma_2)(\gamma_1 \otimes \gamma_2)\,\mathrm{vec}(f_y) + \mathrm{vec}(\epsilon)$ we obtain
$$S(\mu_y - \mu) = S(\Gamma) = S(\Gamma_1 \otimes \Gamma_2), \qquad \dim(S(\mu_y - \mu)) = \mathrm{rank}(\Gamma_1 \otimes \Gamma_2) = d_1 d_2,$$
and $S_{FMSDR} = \Sigma_x^{-1} S(\Gamma_1 \otimes \Gamma_2)$.
When in addition $\Sigma_x = \Sigma_R \otimes \Sigma_C$,
$$S_{FMSDR} = \mathrm{span}\big(\Sigma_R^{-1}\Gamma_1 \otimes \Sigma_C^{-1}\Gamma_2\big)$$

Maximum Likelihood Estimates under the Kronecker Product Mean Structure (K-MLEs)
Assume $\mathrm{vec}(X_y) = \mathrm{vec}(\mu) + (\alpha \otimes \beta)\,\mathrm{vec}(f_y) + \mathrm{vec}(\epsilon)$ with $\varepsilon \sim N_{pT}(0, \Delta)$.
The sample mean $\bar X$ is the MLE of $\mu$.
Obtain the MLEs $\hat\alpha$ and $\hat\beta$ by iteratively maximizing the log-likelihood.
For given $\hat\alpha$ and $\hat\beta$, the MLE of $\Delta$ is
$$\hat\Delta = \frac{1}{n}\sum_{i=1}^n \big(\mathrm{vec}(X_{y_i}) - (\hat\alpha \otimes \hat\beta)\,\mathrm{vec}(f_{y_i})\big)\big(\mathrm{vec}(X_{y_i}) - (\hat\alpha \otimes \hat\beta)\,\mathrm{vec}(f_{y_i})\big)^T$$
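For fixed $\hat\alpha$ and $\hat\beta$ the $\Delta$ update above has a closed form; a minimal numpy sketch is given below. The iterative updates of $\hat\alpha$ and $\hat\beta$ themselves are not spelled out on the slide and are omitted here.

```python
import numpy as np

def delta_mle(X_vec, F_vec, alpha, beta):
    """MLE of Delta for fixed alpha, beta: average of the outer products of the
    residuals vec(X_i) - (alpha kron beta) vec(f_i).
    X_vec: n x pT and F_vec: n x rk, both centered (an assumption on bookkeeping)."""
    K = np.kron(alpha, beta)          # pT x rk
    resid = X_vec - F_vec @ K.T       # n x pT matrix of residuals
    return resid.T @ resid / X_vec.shape[0]
```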

Variable Selection
1. Apply CISE (coordinate-independent sparse sufficient dimension reduction estimator; Chen, Zou and Cook, 2010) to obtain a sparse solution $\hat\Gamma$ of the general penalized LS problem.
2. Minimize $\|\hat\Gamma - \Gamma_1 \otimes \Gamma_2\|^2$ to obtain $\hat\Gamma_1$ and $\hat\Gamma_2$ (see the note below).
3. The sparse estimate of the sufficient reduction is $\hat\Sigma_x^{-1}(\hat\Gamma_1 \otimes \hat\Gamma_2)$.
4. This approach excludes combinations of markers and time points that are irrelevant to the response, but not predictors or time points separately.
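Note: step 2 is again a nearest Kronecker product approximation of the same form as in K-PIR, so it can be computed with the same Van Loan and Pitsianis rank-one SVD sketched after the K-PIR theorem above.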

Simulation 1: Continuous Y, Full Rank
p = 10, T = 8, r = k = 6, rank(α) = rank(β) = 6

  n       Method   ‖α̂⊗β̂ − α⊗β‖/‖α⊗β‖   ‖Δ̂ − Δ‖   Angle(Ŝ, S)
  500     K-PIR    0.046                  0.314      0.186
          K-PFC    0.046                  0.284      0.052
          MLE      0.009                  0.285      0.017
  5000    K-PIR    0.014                  0.091      0.031
          K-PFC    0.014                  0.090      0.016
          MLE      0.003                  0.090      0.001
  10000   K-PIR    0.010                  0.064      0.018
          K-PFC    0.010                  0.064      0.011
          MLE      0.002                  0.064      0.011

Angle: smallest principal angle between the subspaces (Zhu and Knyazev, 2012).
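The reported angle can be computed from the singular values of the product of orthonormal bases of the two subspaces; here is a minimal numpy sketch (reporting the result in degrees, which matches the magnitudes in the tables, is an assumption).

```python
import numpy as np

def smallest_principal_angle(A, B, degrees=True):
    """Smallest principal angle between span(A) and span(B); columns are basis vectors."""
    Qa, _ = np.linalg.qr(A)                      # orthonormal bases of the two subspaces
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    theta = np.arccos(np.clip(s.max(), -1.0, 1.0))   # largest singular value -> smallest angle
    return np.degrees(theta) if degrees else theta
```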

Simulation 2: Continuous Y, α, β not full rank
p = 10, T = 8, r = k = 6, rank(α) = rank(β) = 2

  n       Method   ‖α̂⊗β̂ − α⊗β‖/‖α⊗β‖   ‖Δ̂ − Δ‖   Angle(Ŝ, S)
  500     K-PIR    0.113                  0.314      5.268
          K-PFC    0.063                  0.284      0.583
          MLE      0.173                  0.554      0.723
  5000    K-PIR    0.034                  0.091      0.746
          K-PFC    0.020                  0.090      0.163
          MLE      0.031                  0.117      0.182
  10000   K-PIR    0.024                  0.064      0.427
          K-PFC    0.014                  0.064      0.116
          MLE      0.006                  0.064      0.120

Simulation 3: Binary Y
p = 5, T = 5, r = k = 1, rank(α) = rank(β) = 1

  n           Method   ‖α̂⊗β̂ − α⊗β‖/‖α⊗β‖   ‖Δ̂ − Δ‖   Angle(Ŝ, S)
  500/500     K-PIR    0.132                  0.122      16.430
              K-PFC    0.132                  0.122      16.430
              MLE      0.123                  0.121      15.622
              SIR      —                      —          22.911
  1000/1000   K-PIR    0.090                  0.086      11.553
              K-PFC    0.090                  0.086      11.553
              MLE      0.087                  0.085      11.044
              SIR      —                      —          16.360

Summary
We provide fast and efficient algorithms for computing sufficient reductions in regressions/classifications with longitudinally measured / matrix-valued predictors.
They are simple to implement, with no convergence issues even for large dimensions.
The PFC-based estimates are efficient.
Variable selection can be carried out simultaneously.

References
Bura, E. and Cook, R. D. (2001). Estimating the structural dimension of regressions via parametric inverse regression. J. R. Statist. Soc. B, 63, 393-410.
Chen, X., Zou, C. and Cook, R. D. (2010). Coordinate-independent sparse sufficient dimension reduction and variable selection. The Annals of Statistics, 38, 3696-3723.
Cook, R. D. and Forzani, L. (2008). Principal fitted components for dimension reduction in regression. Statistical Science, 23, 485-501.
Ding, S. and Cook, R. D. (2014). Dimension folding PCA and PFC for matrix-valued predictors. Statistica Sinica, 24, 463-492.
Li, B., Kim, M. K. and Altman, N. (2010). On dimension folding of matrix- or array-valued statistical objects. The Annals of Statistics, 38, 1094-1121.
Li, K. C. (1991). Sliced inverse regression for dimension reduction (with discussion). J. Am. Statist. Assoc., 86, 316-342.
Pfeiffer, R., Forzani, L. and Bura, E. (2012). Sufficient dimension reduction for longitudinally measured predictors. Statistics in Medicine, Special Issue: Biomarker Working Group: Issues in the Design and Analysis of Epidemiological Studies with Biomarkers, 31(22), 2414-2427.
Van Loan, C. F. and Pitsianis, N. (1993). Approximation with Kronecker products. In Linear Algebra for Large Scale and Real-Time Applications, 293-314.