Sufficient Dimension Reduction for Longitudinally Measured Predictors
Ruth Pfeiffer, National Cancer Institute, NIH, HHS
Joint work with Efstathia Bura and Wei Wang, TU Wien and George Washington University (GWU)
JSM Vancouver 2018
Motivation
Biomarkers measured over time are used to model disease onset/progression.
Examples: longitudinal PSA for prostate cancer onset/progression; longitudinal CA125 for ovarian cancer diagnosis.
Ideal: a single marker with high specificity and sensitivity.
Such high-performance markers are mostly not available.
Possible strategy: combine information from multiple longitudinal marker measurements.
Statistical Problem
Combine correlated markers into a composite marker score for regression modeling and classification.
Account for the longitudinal nature of the marker measurements.
Identify markers truly associated with the outcome and remove irrelevant and redundant markers from the marker score, to
- make results more interpretable,
- facilitate replication and translation of findings to clinical settings,
- improve prediction.
Longitudinal Set-up
Y ∈ R: response
X_t = (x_{1t}, ..., x_{pt})^T ∈ R^p: marker vector measured at t = 1, ..., T
p×T matrix X ∈ R^{p×T}:
X = (X_1, ..., X_T) =
    [ x_{11}  ...  x_{1T} ]
    [ x_{21}  ...  x_{2T} ]
    [   ...          ...  ]
    [ x_{p1}  ...  x_{pT} ]  ∈ R^{p×T}
Possible approach: ignore the time/matrix structure of X and reshape the p×T matrix X ∈ R^{p×T} into the pT×1 vector vec(X).
Drawback: ignoring the structure can lead to a loss of accuracy in estimation that is reflected in a loss of discriminatory ability.
Sufficient Dimension Reduction (SDR) in Regression
Y ∈ R: response
X ∈ R^p: marker (predictor) vector
Goal: model F(Y | X). Find R : R^p -> R^d with d ≤ p = dim(X), such that
F(Y | X) = F(Y | R(X)),
i.e. replace X by R(X) without loss of information on Y | X.
R(X) is a sufficient reduction.
Estimating R: SDR using Inverse Regression
Find R(X) such that X and R(X) have the same information about Y.
If R(X) is a sufficient reduction for the forward regression Y | X, then it is also sufficient for the inverse regression (IR) X | Y (Cook, 2007).
Advantage: the p-dimensional multiple regression of Y on X is replaced by p univariate regressions of X_i on Y.
Most SDR methods assume a linear reduction R(X) = η^T X and estimate η based on moments of X | Y.
Estimating S(η): First-Moment-Based Linear SDR
General idea: find a kernel matrix M so that S(M) ⊆ S(η).
First-moment SDR methods: if E(X | η^T X) is linear in η^T X, then
S_FMSDR = Σ_x^{-1} S(µ_Y - µ) ⊆ S(η),
where µ_Y = E(X | Y), µ = E(X), Σ_x = cov(X).
Sliced Inverse Regression (SIR, Li, 1991): S(Σ_x^{-1} cov(E(X | Y))) ⊆ S(η).
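For intuition, a minimal SIR sketch in Python/numpy (not from the talk): slice a continuous response, form the weighted covariance of the slice means of X as the kernel matrix M, and take leading eigenvectors of Σ̂_x^{-1} M as an estimated basis of S(η). The function name, slicing scheme, and choice of d are illustrative assumptions.

```python
import numpy as np

def sir_basis(X, y, n_slices=10, d=2):
    """Sliced Inverse Regression (Li, 1991): eigenvectors of
    Sigma_x^{-1} cov(E(X | Y)), estimated by slicing the response."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Sigma = np.cov(Xc, rowvar=False)
    # Partition observations into slices of roughly equal size by sorted y.
    slices = np.array_split(np.argsort(y), n_slices)
    # Kernel matrix M: weighted covariance of the slice means of X.
    M = np.zeros((p, p))
    for idx in slices:
        m = Xc[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors of Sigma^{-1} M span the estimated subspace.
    eigval, eigvec = np.linalg.eig(np.linalg.solve(Sigma, M))
    order = np.argsort(eigval.real)[::-1]
    return eigvec[:, order[:d]].real        # p x d estimated basis of S(eta)
```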
FMSDR Estimation: Parametric Inverse Regression (PIR) (Bura & Cook, 2001)
Assume a linear IR model with µ_Y - µ = B f_Y:
X_y := X | (Y = y) = µ + B f_y + ε,
where
f_y: r×1 vector of functions in y with E(f_Y) = 0,
B: p×r unconstrained parameter matrix,
E(ε) = 0 and var(ε | Y) = var(X | Y) = Δ_Y.
Thus S_FMSDR = Σ_x^{-1} S(B).
Estimation of the Sufficient Reduction in PIR
Random sample (Y_i, X_i), i = 1, ..., n
X: n×p matrix with rows (X_{y_i} - X̄)^T, X̄ = Σ_{i=1}^n X_i / n
F: n×r matrix with rows (f_{y_i} - f̄)^T, f̄ = Σ_{i=1}^n f_{y_i} / n
Ordinary least squares (OLS) estimate: B̂^T = (F^T F)^{-1} F^T X
Ŝ_FMSDR = Σ̂_X^{-1} span(B̂)
dim(Ŝ_FMSDR) = rank(B̂) ≤ p; estimate the dimension of B using rank tests.
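A minimal PIR sketch in numpy along these lines, assuming a polynomial basis for f_y (one common choice); the function name and basis choice are illustrative, not prescribed by the slides.

```python
import numpy as np

def pir_basis(X, y, degree=3):
    """Parametric inverse regression (Bura & Cook, 2001), sketched:
    OLS fit of centered X on centered f_y, then map through Sigma_x^{-1}."""
    n, p = X.shape
    F = np.column_stack([y ** j for j in range(1, degree + 1)])  # f_y, r = degree
    F = F - F.mean(axis=0)                       # centered functions of y
    Xc = X - X.mean(axis=0)                      # centered predictors
    Bt_hat = np.linalg.solve(F.T @ F, F.T @ Xc)  # OLS: B_hat^T, r x p
    Sigma_hat = np.cov(Xc, rowvar=False)
    # Columns span the estimated reduction Sigma_x^{-1} span(B_hat);
    # its dimension can then be chosen by rank tests on B_hat.
    return np.linalg.solve(Sigma_hat, Bt_hat.T)  # p x r
```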
Likelihood-based SDR: Principal Fitted Components (PFC) (Cook & Forzani, 2008)
Assume a normal linear IR model with µ_Y - µ = Γγ f_Y:
X_y = µ + Γγ f_y + ε,   ε ~ N_p(0, Δ)
Fix dim(S_FMSDR) = d and parameterize B = Γγ:
Γ ∈ R^{p×d} denotes a basis of S_FMSDR, with Γ^T Γ = I_d
γ ∈ R^{d×r}, d ≤ r, is an unrestricted rank-d parameter matrix
Recall: Longitudinal Set-up
Y ∈ R: response
X_t = (x_{1t}, ..., x_{pt})^T ∈ R^p: marker vector measured at t = 1, ..., T
p×T matrix X = (X_1, ..., X_T) ∈ R^{p×T}, with (i, t) entry x_{it}
Inverse Regression Model for Longitudinal Predictors
To accommodate the time structure of X | Y, assume the centered first moment of X can be decomposed into a time and a marker component:
X_y := X | (Y = y) = µ + β f_y α^T + ε,
or in vector notation
vec(X_y) = vec(µ) + (α ⊗ β) vec(f_y) + vec(ε),
where
f_y: k×r known function of y,
α ∈ R^{p×r} captures the mean structure of X regardless of time,
β ∈ R^{T×k} captures the mean structure over time.
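The vector and matrix forms are linked by the Kronecker-vec identity (α ⊗ β) vec(f_y) = vec(β f_y α^T), which holds under the column-stacking vec convention. A quick numerical check in numpy, with purely illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
p, T, k, r = 4, 3, 2, 2                     # illustrative dimensions
alpha = rng.standard_normal((p, r))         # marker component, p x r
beta = rng.standard_normal((T, k))          # time component, T x k
f_y = rng.standard_normal((k, r))           # known functions of y, k x r

vec = lambda M: M.reshape(-1, order="F")    # column-stacking vec operator

lhs = np.kron(alpha, beta) @ vec(f_y)       # (alpha kron beta) vec(f_y)
rhs = vec(beta @ f_y @ alpha.T)             # vec(beta f_y alpha^T)
print(np.allclose(lhs, rhs))                # True
```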
Example: Binary Outcome, Y = 0, 1
When does the moment assumption hold?
If the means of the markers change over time only by a multiplicative factor that affects all markers equally and is the same for Y = 0 and Y = 1:
vec(E(X | Y = y)) = α_y ⊗ β, and thus E(X_t | Y = y) = β_t α_y.
Using p_y = P(Y = y) and vec(E(X)) = p_0 α_0 ⊗ β + p_1 α_1 ⊗ β,
vec(E(X | Y = y) - E(X)) = (1 - p_y)(α_0 - α_1) ⊗ β = f_y (α_0 - α_1) ⊗ β.
The first-order moment condition is satisfied with f_y = (1 - p_y).
First-Moment Subspace for Longitudinal Predictors
Σ_x = cov(vec(X)) ∈ R^{pT×pT}, and Δ = E(Δ_Y),
S_FMSDR = Σ_x^{-1} S(α ⊗ β) = Δ^{-1} S(α ⊗ β)
Pfeiffer, Forzani and Bura (2012) extended SIR to estimate S_FMSDR.
Ding and Cook (2014) developed model-based dimension folding PCA and dimension folding PFC and obtain MLEs when X | Y is normal, var(ε) is the identity or separable (Δ = Δ_R ⊗ Δ_C), and Σ_x = Σ_R ⊗ Σ_C is also separable.
General PIR and PFC for Longitudinal Predictors
Model
vec(X_{y_i}) = vec(µ) + (α ⊗ β) vec(f_{y_i}) + vec(ε_i)
for a random sample (Y_i, X_i), i = 1, ..., n, written as
X_y = F_y (α ⊗ β)^T + ε,
where
X_y: n×pT (centered)
F_y: n×kr (centered functions of Y)
α ∈ R^{p×r}, β ∈ R^{T×k}
ε: n×pT with E(ε) = 0, var(vec(ε)) = I_n ⊗ Δ
Least Squares Estimation for the Kronecker Product Mean Model (K-PIR)
Model: X_y = F_y (α ⊗ β)^T + ε
Estimate α and β using a two-step approach:
1. Find α̂ and β̂ that minimize ‖(F^T F)^{-1} F^T X - (α ⊗ β)^T‖^2, using the algorithm of Van Loan & Pitsianis, 1993 (VLP).
2. Compute the least squares estimate
Δ̂ = 1/(n - rank(F_y)) Σ_{i=1}^n (X_{y_i} - F_{y_i}(α̂ ⊗ β̂)^T)^T (X_{y_i} - F_{y_i}(α̂ ⊗ β̂)^T).
Theorem: If α̂ and β̂ minimize ‖(F^T F)^{-1} F^T X - (α ⊗ β)^T‖^2, then α̂ ⊗ β̂ -> α ⊗ β in probability.
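Step 1 is a nearest-Kronecker-product problem: VLP rearranges the target matrix so that a Kronecker product becomes a rank-one matrix and then takes its best rank-one approximation via the SVD. A minimal numpy sketch (not the authors' implementation; the function name and the even scale split between the two factors are assumptions, since α ⊗ β determines α and β only up to a scalar):

```python
import numpy as np

def nearest_kronecker(C, shape_a, shape_b):
    """Van Loan & Pitsianis (1993): find A (m1 x n1), B (m2 x n2)
    minimizing ||C - kron(A, B)||_F for C of size (m1*m2) x (n1*n2)."""
    m1, n1 = shape_a
    m2, n2 = shape_b
    # Rearrangement: each row of R is one (i, j) block of C flattened,
    # so that kron(A, B) corresponds to the rank-one matrix vec(A) vec(B)^T.
    R = np.empty((m1 * n1, m2 * n2))
    for i in range(m1):
        for j in range(n1):
            block = C[i * m2:(i + 1) * m2, j * n2:(j + 1) * n2]
            R[i * n1 + j, :] = block.reshape(-1)
    # Best rank-one approximation of R via the SVD.
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    A = np.sqrt(s[0]) * U[:, 0].reshape(m1, n1)   # scale split is arbitrary
    B = np.sqrt(s[0]) * Vt[0, :].reshape(m2, n2)
    return A, B

# Sanity check with a noisy Kronecker product (illustrative sizes).
rng = np.random.default_rng(1)
alpha0, beta0 = rng.standard_normal((10, 6)), rng.standard_normal((8, 6))
C = np.kron(alpha0, beta0) + 0.01 * rng.standard_normal((80, 36))
a_hat, b_hat = nearest_kronecker(C, alpha0.shape, beta0.shape)
print(np.linalg.norm(np.kron(a_hat, b_hat) - np.kron(alpha0, beta0)))
```

In the K-PIR setting, C would be the pT×rk OLS coefficient matrix ((F^T F)^{-1} F^T X)^T, factored with shape_a = (p, r) and shape_b = (T, k).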
Kronecker Product PFC (K-PFC)
Assume
vec(X_y) = vec(µ) + (α ⊗ β) vec(f_y) + vec(ε),   ε ~ N_{pT}(0, Δ),
and set
α = Γ_1 γ_1: p×r
  Γ_1: basis for the d_1-dimensional subspace span(α), with Γ_1^T Γ_1 = I_{d_1}
  γ_1: unconstrained d_1×r matrix
β = Γ_2 γ_2: T×k
  Γ_2: basis for the d_2-dimensional subspace span(β), with Γ_2^T Γ_2 = I_{d_2}
  γ_2: unconstrained d_2×k matrix
Then
vec(X_y - µ) = (Γ_1 ⊗ Γ_2)(γ_1 ⊗ γ_2) vec(f_y) + vec(ε).
Kronecker Product PFC, cont.
Under the model
vec(X_y - µ) = (Γ_1 ⊗ Γ_2)(γ_1 ⊗ γ_2) vec(f_y) + vec(ε)
we obtain
S(µ_y - µ) = S(Γ) = S(Γ_1 ⊗ Γ_2),
dim(S(µ_y - µ)) = rank(Γ_1 ⊗ Γ_2) = d_1 d_2,
and
S_FMSDR = Σ_x^{-1} S(Γ_1 ⊗ Γ_2).
When in addition Σ_x = Σ_R ⊗ Σ_C,
S_FMSDR = span(Σ_R^{-1} Γ_1 ⊗ Σ_C^{-1} Γ_2).
Maximum Likelihood Estimates under the Kronecker Product Mean Structure (K-MLE)
Assume
vec(X_y) = vec(µ) + (α ⊗ β) vec(f_y) + vec(ε),   ε ~ N_{pT}(0, Δ).
The sample mean X̄ is the MLE of µ.
Obtain the MLEs α̂ and β̂ by iteratively maximizing the log-likelihood.
For given α̂ and β̂, the MLE of Δ is
Δ̂ = (1/n) Σ_{i=1}^n (vec(X_{y_i} - X̄) - (α̂ ⊗ β̂) vec(f_{y_i})) (vec(X_{y_i} - X̄) - (α̂ ⊗ β̂) vec(f_{y_i}))^T.
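To illustrate only the alternating structure of the iteration: for fixed β the mean β f_y α^T is linear in α, and for fixed α it is linear in β, so each step is a multivariate least squares fit. The sketch below alternates these two fits for the unweighted (Δ = I) criterion; the K-MLE of the talk instead iterates the full normal log-likelihood and updates Δ̂ as above. The data layout (each centered X_i stored time-by-marker as a T×p matrix), function name, and starting values are illustrative assumptions.

```python
import numpy as np

def kron_mean_als(X_list, F_list, r, k, n_iter=100, seed=0):
    """Alternating least squares for the Kronecker mean E(X_i) = beta f_i alpha^T,
    with Delta = I.  X_list: centered T x p matrices; F_list: k x r matrices f_{y_i}."""
    T, p = X_list[0].shape
    rng = np.random.default_rng(seed)
    alpha = rng.standard_normal((p, r))
    beta = rng.standard_normal((T, k))
    for _ in range(n_iter):
        # Update alpha: with Z_i = beta f_i, the model is X_i ~ Z_i alpha^T.
        Z = [beta @ f for f in F_list]
        G = sum(z.T @ z for z in Z)
        H = sum(z.T @ x for z, x in zip(Z, X_list))
        alpha = np.linalg.solve(G, H).T                # p x r
        # Update beta: with W_i = f_i alpha^T, the model is X_i ~ beta W_i.
        W = [f @ alpha.T for f in F_list]
        G = sum(w @ w.T for w in W)
        H = sum(w @ x.T for w, x in zip(W, X_list))
        beta = np.linalg.solve(G, H).T                 # T x k
    return alpha, beta
```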
Variable Selection
1. Apply CISE (coordinate-independent sparse sufficient dimension reduction estimator; Chen, Zou and Cook, 2010) to obtain a sparse solution Γ̂ of a general penalized LS problem.
2. Minimize ‖Γ̂ - Γ_1 ⊗ Γ_2‖^2 to obtain Γ̂_1 and Γ̂_2 (see the sketch below).
3. The sparse estimate of the sufficient reduction is Σ̂_x^{-1}(Γ̂_1 ⊗ Γ̂_2).
4. This approach excludes combinations of markers and time points that are irrelevant to the response, but not predictors or time points separately.
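Step 2 is the same nearest-Kronecker-product problem that VLP solves in K-PIR. Assuming the hypothetical nearest_kronecker helper sketched earlier and illustrative names for the CISE output and sample covariance, it could look like:

```python
# Gamma_hat:   p*T x (d1*d2) sparse basis returned by CISE (illustrative name)
# Sigma_x_hat: pT x pT sample covariance of vec(X)        (illustrative name)
Gamma1_hat, Gamma2_hat = nearest_kronecker(Gamma_hat, (p, d1), (T, d2))
sparse_reduction = np.linalg.solve(Sigma_x_hat, np.kron(Gamma1_hat, Gamma2_hat))
```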
Simulation 1: Continuous Y, Full Rank
p = 10, T = 8, r = k = 6, rank(α) = rank(β) = 6

  n       Method   ‖α̂⊗β̂ - α⊗β‖   ‖Δ̂ - Δ‖   Angle(Ŝ, S)
  500     K-PIR    0.046           0.314      0.186
          K-PFC    0.046           0.284      0.052
          MLE      0.009           0.285      0.017
  5000    K-PIR    0.014           0.091      0.031
          K-PFC    0.014           0.090      0.016
          MLE      0.003           0.090      0.001
  10000   K-PIR    0.010           0.064      0.018
          K-PFC    0.010           0.064      0.011
          MLE      0.002           0.064      0.011

Angle: smallest principal angle between the subspaces (Zhu and Knyazev, 2012)
Simulation 2: Continuous Y, α, β Not Full Rank
p = 10, T = 8, r = k = 6, rank(α) = rank(β) = 2

  n       Method   ‖α̂⊗β̂ - α⊗β‖   ‖Δ̂ - Δ‖   Angle(Ŝ, S)
  500     K-PIR    0.113           0.314      5.268
          K-PFC    0.063           0.284      0.583
          MLE      0.173           0.554      0.723
  5000    K-PIR    0.034           0.091      0.746
          K-PFC    0.020           0.090      0.163
          MLE      0.031           0.117      0.182
  10000   K-PIR    0.024           0.064      0.427
          K-PFC    0.014           0.064      0.116
          MLE      0.006           0.064      0.120
Simulation 3: Binary Y
p = 5, T = 5, r = k = 1, rank(α) = rank(β) = 1

  n          Method   ‖α̂⊗β̂ - α⊗β‖   ‖Δ̂ - Δ‖   Angle(Ŝ, S)
  500/500    K-PIR    0.132           0.122      16.430
             K-PFC    0.132           0.122      16.430
             MLE      0.123           0.121      15.622
             SIR                                 22.911
  1000/1000  K-PIR    0.090           0.086      11.553
             K-PFC    0.090           0.086      11.553
             MLE      0.087           0.085      11.044
             SIR                                 16.360
Summary
We provide fast and efficient algorithms for computing sufficient reductions in regressions/classifications with longitudinally measured / matrix-valued predictors.
- Simple to implement
- No convergence issues, even for large dimensions
- PFC-based estimates are efficient
- Simultaneous variable selection
References
Bura, E. and Cook, R. D. (2001). Estimating the structural dimension of regressions via parametric inverse regression. J. R. Statist. Soc. B, 63, 393–410.
Chen, X., Zou, C. and Cook, R. D. (2010). Coordinate-independent sparse sufficient dimension reduction and variable selection. The Annals of Statistics, 38, 3696–3723.
Cook, R. D. and Forzani, L. (2008). Principal fitted components for dimension reduction in regression. Statistical Science, 23, 485–501.
Ding, S. and Cook, R. D. (2014). Dimension folding PCA and PFC for matrix-valued predictors. Statistica Sinica, 24, 463–492.
Li, B., Kim, K. M. and Altman, N. (2010). On dimension folding of matrix or array-valued statistical objects. The Annals of Statistics, 38, 1094–1121.
Li, K. C. (1991). Sliced inverse regression for dimension reduction (with discussion). J. Am. Statist. Assoc., 86, 316–342.
Pfeiffer, R., Forzani, L. and Bura, E. (2012). Sufficient dimension reduction for longitudinally measured predictors. Statistics in Medicine, Special Issue: Biomarker Working Group: Issues in the Design and Analysis of Epidemiological Studies with Biomarkers, 31(22), 2414–2427.
Van Loan, C. F. and Pitsianis, N. (1993). Approximation with Kronecker products. In Linear Algebra for Large Scale and Real-Time Applications, 293–314.