Sufficient Dimension Reduction for Longitudinally Measured Predictors


1 Sufficient Dimension Reduction for Longitudinally Measured Predictors
Ruth Pfeiffer, National Cancer Institute, NIH, HHS
Joint work with Efstathia Bura and Wei Wang, TU Wien and GWU
JSM Vancouver 2018

2 Motivation
- Biomarkers measured over time are used to model disease onset and progression.
- Examples: longitudinal PSA for prostate cancer onset/progression; longitudinal CA125 for ovarian cancer diagnosis.
- Ideal: a single marker with high specificity and sensitivity.
- Such high-performance markers are mostly not available.
- Possible strategy: combine information from multiple longitudinal marker measurements.

3 Statistical Problem
- Combine correlated markers into a composite marker score for regression modeling and classification.
- Account for the longitudinal nature of the marker measurements.
- Identify markers truly associated with the outcome, and remove irrelevant and redundant markers from the marker score, in order to
  - make results more interpretable,
  - facilitate replication and translation of findings to clinical settings,
  - improve prediction.

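4 Longitudinal Set-up
- $Y \in \mathbb{R}$: response
- $X_t = (x_{1t}, \ldots, x_{pt})^T \in \mathbb{R}^p$: marker vector measured at $t = 1, \ldots, T$
- $p \times T$ matrix $X \in \mathbb{R}^{p \times T}$:
$$X = (X_1, \ldots, X_T) = \begin{pmatrix} X_{11} & \cdots & X_{1T} \\ X_{21} & \cdots & X_{2T} \\ \vdots & & \vdots \\ X_{p1} & \cdots & X_{pT} \end{pmatrix} \in \mathbb{R}^{p \times T}$$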

5 Longitudinal Set-up, cont.
- Possible approach: ignore the time/matrix structure of $X$ and reshape the $p \times T$ matrix $X \in \mathbb{R}^{p \times T}$ as a $pT \times 1$ vector, $\mathrm{vec}(X)$.
- Drawback: ignoring the structure can lead to a loss of accuracy in estimation, which is reflected in a loss of discriminatory ability.
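The reshaping step can be made concrete with a small sketch. It assumes the usual convention that $\mathrm{vec}$ stacks columns (column-major order); the helper names `vec` and `unvec` are illustrative and not from the slides.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one long vector (column-major order)."""
    return np.asarray(M).reshape(-1, order="F")

def unvec(v, shape):
    """Inverse of vec: rebuild a matrix of the given shape from its stacked columns."""
    return np.asarray(v).reshape(shape, order="F")

p, T = 3, 2                                  # 3 markers, 2 time points
X = np.arange(1, p * T + 1).reshape(p, T)    # rows = markers, columns = time points
v = vec(X)                                   # the pT x 1 representation
X_back = unvec(v, (p, T))                    # the matrix structure is recoverable
```

The vector `v` carries the same numbers as `X`, but a method that only sees `v` no longer knows which entries share a marker or a time point; that is the structure the later slides exploit.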

6 Sufficient Dimension Reduction (SDR) in Regression
- $Y \in \mathbb{R}$: response; $X \in \mathbb{R}^p$: marker (predictor) vector
- Goal: model $F(Y \mid X)$.
- Find $R: \mathbb{R}^p \to \mathbb{R}^d$ with $d \le p = \dim(X)$ such that $F(Y \mid X) = F(Y \mid R(X))$, i.e. replace $X$ by $R(X)$ without loss of information on $Y \mid X$.
- $R(X)$ is a sufficient reduction.

7 Estimate $R$: SDR using Inverse Regression
- Find $R(X)$ such that $X$ and $R(X)$ carry the same information about $Y$.
- If $R(X)$ is a sufficient reduction for the forward regression $Y \mid X$, then it is also sufficient for the inverse regression (IR) $X \mid Y$ (Cook, 2007).
- Advantage: the $p$-dimensional multiple regression of $Y$ on $X$ is replaced by $p$ univariate regressions of $X_i$ on $Y$.
- Most SDR methods assume a linear reduction $R(X) = \eta^T X$ and estimate $\eta$ based on moments of $X \mid Y$.

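8 Estimate $S(\eta)$: First Moment Based Linear SDR
- General idea: find a kernel matrix $M$ so that $S(M) \subseteq S(\eta)$.
- First moment SDR methods: if $E(X \mid \eta^T X)$ is linear in $\eta^T X$, then
  $$S_{FMSDR} = \Sigma_x^{-1} S(\mu_Y - \mu) \subseteq S(\eta),$$
  where $\mu_Y = E(X \mid Y)$, $\mu = E(X)$ and $\Sigma_x = \mathrm{cov}(X)$.
- Sliced Inverse Regression (SIR; Li, 1991): $S(\Sigma_x^{-1} \mathrm{cov}(E(X \mid Y))) \subseteq S(\eta)$.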
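A minimal sketch of how the SIR kernel $\Sigma_x^{-1} \mathrm{cov}(E(X \mid Y))$ might be estimated for a vector predictor. The function name `sir_directions`, the use of equal-size slices of the sorted response, and the default of 10 slices are assumptions made for illustration; they are not taken from the slides.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, d=1):
    """Estimate d SIR directions spanning Sigma_x^{-1} S(cov(E(X | Y)))."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Sigma = Xc.T @ Xc / n

    # Standardize: Z = Sigma^{-1/2} (X - mean), via the eigen-decomposition of Sigma.
    w, V = np.linalg.eigh(Sigma)
    Sigma_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    Z = Xc @ Sigma_inv_sqrt

    # Slice the sorted response and accumulate the weighted covariance of slice means.
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)

    # Leading eigenvectors of M, mapped back to the original predictor scale.
    _, evecs = np.linalg.eigh(M)              # eigenvalues in ascending order
    top = evecs[:, ::-1][:, :d]
    return Sigma_inv_sqrt @ top

# Toy check: Y depends on X only through its first coordinate.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = X[:, 0] + 0.1 * rng.normal(size=2000)
direction = sir_directions(X, y, d=1)[:, 0]
alignment = abs(direction[0]) / np.linalg.norm(direction)
```

In this toy setting `alignment` should be close to 1, meaning the estimated direction is nearly parallel to the first coordinate axis.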

9 FMSDR Estimation: Parametric Inverse Regression (PIR) (Bura & Cook, 2001)
- Assume a linear IR model with $\mu_Y - \mu = B f_y$:
  $$X_y := X \mid (Y = y) = \mu + B f_y + \varepsilon,$$
  where
  - $f_y$: $r \times 1$ vector of functions of $y$ with $E(f_y) = 0$;
  - $B$: $p \times r$ unconstrained parameter matrix;
  - $E(\varepsilon) = 0$ and $\mathrm{var}(\varepsilon \mid Y) = \mathrm{var}(X \mid Y) = \Delta_Y$.
- Thus $S_{FMSDR} = \Sigma_x^{-1} S(B)$.

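10 Estimation of Sufficient Reduction in PIR
- Random sample $(Y_i, X_i)$, $i = 1, \ldots, n$
- $\mathbb{X}$: $n \times p$ matrix with rows $(X_{y_i} - \bar{X})^T$, $\bar{X} = \sum_{i=1}^n X_i / n$
- $F$: $n \times r$ matrix with rows $(f_{y_i} - \bar{f})^T$, $\bar{f} = \sum_{i=1}^n f_{y_i} / n$
- Ordinary least squares (OLS) estimate: $\hat{B}^T = (F^T F)^{-1} F^T \mathbb{X}$, so that $\hat{B}$ is $p \times r$
- $\hat{S}_{FMSDR} = \hat{\Sigma}_x^{-1} \mathrm{span}(\hat{B})$
- $\dim(\hat{S}_{FMSDR}) = \mathrm{rank}(\hat{B}) \le p$; estimate the dimension (the rank of $B$) using rank tests.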
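The PIR estimate can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the dimension `d` is supplied by the user instead of being chosen by a rank test, the basis is taken from the leading left singular vectors of $\hat{\Sigma}_x^{-1}\hat{B}$, and the function name `pir_reduction` and the choice $f_y = (y, y^2)$ are hypothetical.

```python
import numpy as np

def pir_reduction(X, F, d):
    """OLS fit of the inverse regression, then a basis for Sigma_x^{-1} span(B_hat)."""
    X = np.asarray(X, dtype=float)
    F = np.asarray(F, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                          # rows (X_i - X_bar)^T
    Fc = F - F.mean(axis=0)                          # rows (f_i - f_bar)^T
    Bt, *_ = np.linalg.lstsq(Fc, Xc, rcond=None)     # (F^T F)^{-1} F^T X, r x p
    B_hat = Bt.T                                     # p x r
    Sigma_hat = Xc.T @ Xc / n
    M = np.linalg.solve(Sigma_hat, B_hat)            # Sigma_hat^{-1} B_hat
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return B_hat, U[:, :d]                           # d is assumed known here

# Toy inverse-regression data: X | Y = B f_y + noise, with rank(B) = 1.
rng = np.random.default_rng(0)
n, p = 5000, 5
y = rng.normal(size=n)
F = np.column_stack([y, y ** 2])                     # hypothetical choice of f_y
b = np.zeros(p)
b[0] = 2.0
B_true = np.outer(b, [1.0, 0.5])                     # p x r
X = F @ B_true.T + rng.normal(size=(n, p))
B_hat, basis = pir_reduction(X, F, d=1)
```

Here the true reduction is the first coordinate of $X$, so the first entry of `basis` should dominate.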

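11 Likelihood-based SDR: Principal Fitted Components (PFC) (Cook & Forzani, 2008)
- Assume a normal linear IR model with $\mu_Y - \mu = \Gamma\gamma f_y$:
  $$X_y = \mu + \Gamma\gamma f_y + \varepsilon, \quad \varepsilon \sim N_p(0, \Delta)$$
- Fix $\dim(\hat{S}_{FMSDR}) = d$.
- Parameterize $B = \Gamma\gamma$:
  - $\Gamma \in \mathbb{R}^{p \times d}$ denotes a basis of $S_{FMSDR}$, with $\Gamma^T \Gamma = I_d$;
  - $\gamma \in \mathbb{R}^{d \times r}$, $d \le r$, is an unrestricted rank-$d$ parameter matrix.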

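12 Recall: Longitudinal Set-up
- $Y \in \mathbb{R}$: response
- $X_t = (x_{1t}, \ldots, x_{pt})^T \in \mathbb{R}^p$: marker vector measured at $t = 1, \ldots, T$
- $p \times T$ matrix $X \in \mathbb{R}^{p \times T}$:
$$X = (X_1, \ldots, X_T) = \begin{pmatrix} X_{11} & \cdots & X_{1T} \\ X_{21} & \cdots & X_{2T} \\ \vdots & & \vdots \\ X_{p1} & \cdots & X_{pT} \end{pmatrix} \in \mathbb{R}^{p \times T}$$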

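13 Inverse Regression Model for Longitudinal Predictors
- To accommodate the time structure of $X \mid Y$, assume the centered first moment of $X$ can be decomposed into a time component and a marker component:
  $$X_y := X \mid (Y = y) = \mu + \beta f_y \alpha^T + \varepsilon,$$
  or, in vector notation,
  $$\mathrm{vec}(X_y) = \mathrm{vec}(\mu) + (\alpha \otimes \beta)\,\mathrm{vec}(f_y) + \mathrm{vec}(\varepsilon),$$
  where
  - $f_y$: $k \times r$ known function of $y$;
  - $\alpha \in \mathbb{R}^{p \times r}$ captures the mean structure of $X$ regardless of time;
  - $\beta \in \mathbb{R}^{T \times k}$ captures the mean structure over time.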
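The step from the matrix form to the vector form rests on the identity $\mathrm{vec}(\beta f_y \alpha^T) = (\alpha \otimes \beta)\,\mathrm{vec}(f_y)$. A quick numerical check, with arbitrary random matrices standing in for $\alpha$, $\beta$ and $f_y$ (the dimensions are illustrative):

```python
import numpy as np

def vec(M):
    """Stack the columns of M (column-major order)."""
    return np.asarray(M).reshape(-1, order="F")

rng = np.random.default_rng(0)
p, T, r, k = 4, 3, 2, 2
alpha = rng.normal(size=(p, r))     # marker component
beta = rng.normal(size=(T, k))      # time component
f_y = rng.normal(size=(k, r))       # stand-in for the known function of y

mean_matrix = beta @ f_y @ alpha.T  # note the shape: T x p
lhs = vec(mean_matrix)
rhs = np.kron(alpha, beta) @ vec(f_y)
identity_holds = np.allclose(lhs, rhs)
```

Note that the product $\beta f_y \alpha^T$ is $T \times p$, so in this form each observation has time points as rows and markers as columns.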

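14 Example: Binary outcome, $Y = 0, 1$
- When does the moment assumption hold?
- If the means of the markers change over time only by a multiplicative factor that affects all markers equally and is the same for $Y = 0$ and $Y = 1$, then $\mathrm{vec}(E(X \mid Y = y)) = \alpha_y \otimes \beta$.
- Then $E(X_t \mid Y = y) = \beta_t \alpha_y$.
- Using $p_y = P(Y = y)$ and $\mathrm{vec}(E(X)) = p_0\, \alpha_0 \otimes \beta + p_1\, \alpha_1 \otimes \beta$,
  $$\mathrm{vec}(E(X \mid Y = y) - E(X)) = (-1)^y (1 - p_y)(\alpha_0 - \alpha_1) \otimes \beta = f_y\, (\alpha_0 - \alpha_1) \otimes \beta.$$
- The first order moment condition is satisfied with $f_y = (-1)^y (1 - p_y)$, i.e. $|f_y| = 1 - p_y$ with opposite signs for the two classes, so that $E(f_Y) = 0$.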
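The identity on this slide can be verified class by class. The short derivation below uses only $p_0 + p_1 = 1$ and the slide's notation; it makes explicit that $f_y$ has magnitude $1 - p_y$ and changes sign between the two classes, which is what gives $E(f_Y) = 0$.

```latex
\begin{align*}
\mathrm{vec}(E(X)) &= p_0\,\alpha_0 \otimes \beta + p_1\,\alpha_1 \otimes \beta, \\
y = 0:\quad \alpha_0 \otimes \beta - \mathrm{vec}(E(X))
  &= (1 - p_0)\,\alpha_0 \otimes \beta - p_1\,\alpha_1 \otimes \beta
   = (1 - p_0)(\alpha_0 - \alpha_1) \otimes \beta, \\
y = 1:\quad \alpha_1 \otimes \beta - \mathrm{vec}(E(X))
  &= (1 - p_1)\,\alpha_1 \otimes \beta - p_0\,\alpha_0 \otimes \beta
   = -(1 - p_1)(\alpha_0 - \alpha_1) \otimes \beta, \\
E(f_Y) &= p_0 (1 - p_0) - p_1 (1 - p_1) = p_0 p_1 - p_1 p_0 = 0.
\end{align*}
```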

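15 First Moment Subspace for Longitudinal Predictors
- With $\Sigma_x = \mathrm{cov}(\mathrm{vec}(X)) \in \mathbb{R}^{pT \times pT}$ and $\Delta = E(\Delta_y)$,
  $$S_{FMSDR} = \Sigma_x^{-1} S(\alpha \otimes \beta) = \Delta^{-1} S(\alpha \otimes \beta).$$
- Pfeiffer, Forzani and Bura (2012) extended SIR to estimate $S_{FMSDR}$.
- Ding and Cook (2014) developed model-based dimension folding PCA and dimension folding PFC; they obtain MLEs when $X \mid Y$ is normal, $\mathrm{var}(\varepsilon)$ is the identity or separable ($\Delta = \Delta_R \otimes \Delta_C$), and $\Sigma_x = \Sigma_R \otimes \Sigma_C$ is also separable.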

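16 General PIR and PFC for Longitudinal Predictors
- Model: $\mathrm{vec}(X_{y_i}) = \mathrm{vec}(\mu) + (\alpha \otimes \beta)\,\mathrm{vec}(f_{y_i}) + \mathrm{vec}(\varepsilon_i)$ for a random sample $(Y_i, X_i)$, $i = 1, \ldots, n$.
- Written for the whole sample, with one observation per row:
  $$\mathbb{X}_y = F_y (\alpha \otimes \beta)^T + \varepsilon,$$
  where
  - $\mathbb{X}_y$: $n \times pT$ (centered);
  - $F_y$: $n \times kr$ (centered functions of $Y$);
  - $\alpha \in \mathbb{R}^{p \times r}$ and $\beta \in \mathbb{R}^{T \times k}$;
  - $\varepsilon$: $n \times pT$ with $E(\varepsilon) = 0$ and $\mathrm{var}(\mathrm{vec}(\varepsilon)) = I_n \otimes \Delta_Y$.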
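A small sketch of how the sample matrices $\mathbb{X}_y$ and $F_y$ might be assembled, one vectorized observation per row. The helper `build_design`, the $T \times p$ storage of each observation, and the particular entries of $f_y$ are assumptions chosen for illustration only.

```python
import numpy as np

def build_design(X_list, f_list):
    """Stack vec(X_i) and vec(f_i) as rows, then center each column.

    X_list: n arrays of shape (T, p); f_list: n arrays of shape (k, r).
    Returns the n x pT matrix X_y and the n x kr matrix F_y.
    """
    X_rows = np.stack([np.asarray(X).reshape(-1, order="F") for X in X_list])
    F_rows = np.stack([np.asarray(f).reshape(-1, order="F") for f in f_list])
    return X_rows - X_rows.mean(axis=0), F_rows - F_rows.mean(axis=0)

rng = np.random.default_rng(1)
n, p, T, k, r = 50, 3, 4, 2, 2
y = rng.normal(size=n)
# Hypothetical f_y: a k x r matrix of simple functions of y.
f_list = [np.array([[yi, yi ** 2], [yi ** 3, np.sin(yi)]]) for yi in y]
X_list = [rng.normal(size=(T, p)) for _ in range(n)]   # placeholder marker data
X_mat, F_mat = build_design(X_list, f_list)
```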

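17 Least Squares Estimation for Kronecker Product Mean Model (K-PIR)
- Model: $\mathbb{X}_y = F_y (\alpha \otimes \beta)^T + \varepsilon$
- Estimate $\alpha$ and $\beta$ using a two-step approach:
  1. Find $\hat\alpha$ and $\hat\beta$ that minimize $\|(F_y^T F_y)^{-1} F_y^T \mathbb{X}_y - (\alpha \otimes \beta)^T\|^2$ using the algorithm of Van Loan & Pitsianis, 1993 (VLP).
  2. Compute the least squares estimate
     $$\hat\Delta = \frac{1}{n - \mathrm{rank}(F_y)} \sum_{i=1}^n e_i^T e_i, \qquad e_i = \mathbb{X}_{y_i} - F_{y_i} (\hat\alpha \otimes \hat\beta)^T,$$
     where $\mathbb{X}_{y_i}$ and $F_{y_i}$ denote the $i$-th rows of $\mathbb{X}_y$ and $F_y$.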

18 Least Squares Estimation for Kronecker Product Mean Model (K-PIR), cont.
Theorem: If $\hat\alpha$ and $\hat\beta$ minimize $\|(F_y^T F_y)^{-1} F_y^T \mathbb{X}_y - (\alpha \otimes \beta)^T\|^2$, then $\hat\alpha \otimes \hat\beta \xrightarrow{p} \alpha \otimes \beta$.
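The two K-PIR steps can be sketched as follows. The nearest-Kronecker-product step uses the rearrangement idea of Van Loan & Pitsianis: every block of the target matrix becomes one row of a rearranged matrix, whose best rank-1 approximation yields the two factors. Everything else here is an assumption for illustration: the function name `nearest_kronecker`, the simulated stand-ins for $F_y$ and $\mathbb{X}_y$, and errors with $\Delta = I$.

```python
import numpy as np

def nearest_kronecker(A, shape_b, shape_c):
    """Return (B, C) minimizing ||A - kron(B, C)||_F via a rank-1 SVD."""
    (m1, n1), (m2, n2) = shape_b, shape_c
    # Entry A[i*m2 + k, j*n2 + l] belongs to block (i, j); put each block in one row.
    R = A.reshape(m1, m2, n1, n2).transpose(0, 2, 1, 3).reshape(m1 * n1, m2 * n2)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    B = np.sqrt(s[0]) * U[:, 0].reshape(m1, n1)
    C = np.sqrt(s[0]) * Vt[0].reshape(m2, n2)
    return B, C

rng = np.random.default_rng(0)
p, T, r, k, n = 4, 3, 2, 2, 5000
alpha = rng.normal(size=(p, r))
beta = rng.normal(size=(T, k))
target = np.kron(alpha, beta)                        # pT x rk

# Simulated stand-ins for the centered matrices F_y (n x kr) and X_y (n x pT).
F = rng.normal(size=(n, k * r))
F -= F.mean(axis=0)
X_mat = F @ target.T + rng.normal(size=(n, p * T))   # errors with Delta = I
X_mat -= X_mat.mean(axis=0)

# Step 1: OLS, then the nearest Kronecker product to the transposed coefficients.
ols, *_ = np.linalg.lstsq(F, X_mat, rcond=None)      # kr x pT
alpha_hat, beta_hat = nearest_kronecker(ols.T, (p, r), (T, k))
fit = np.kron(alpha_hat, beta_hat)
rel_error = np.linalg.norm(fit - target) / np.linalg.norm(target)

# Step 2: residual covariance with the n - rank(F_y) denominator.
resid = X_mat - F @ fit.T
Delta_hat = resid.T @ resid / (n - np.linalg.matrix_rank(F))
```

Only the product $\hat\alpha \otimes \hat\beta$ is determined by this step: rescaling $\hat\alpha$ by $c$ and $\hat\beta$ by $1/c$ leaves it unchanged, which is why the theorem is stated for the Kronecker product and not for the individual factors.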

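19 Kronecker Product PFC (K-PFC)
- Assume
  $$\mathrm{vec}(X_y) = \mathrm{vec}(\mu) + (\alpha \otimes \beta)\,\mathrm{vec}(f_y) + \mathrm{vec}(\varepsilon), \quad \varepsilon \sim N_{pT}(0, \Delta),$$
  and set
  - $\alpha = \Gamma_1 \gamma_1$: $p \times r$, where $\Gamma_1$ is a basis for the $d_1$-dimensional subspace $\mathrm{span}(\alpha)$ with $\Gamma_1^T \Gamma_1 = I_{d_1}$, and $\gamma_1$ is an unconstrained $d_1 \times r$ matrix;
  - $\beta = \Gamma_2 \gamma_2$: $T \times k$, where $\Gamma_2$ is a basis for the $d_2$-dimensional subspace $\mathrm{span}(\beta)$ with $\Gamma_2^T \Gamma_2 = I_{d_2}$, and $\gamma_2$ is an unconstrained $d_2 \times k$ matrix.
- Then
  $$\mathrm{vec}(X_y - \mu) = (\Gamma_1 \otimes \Gamma_2)(\gamma_1 \otimes \gamma_2)\,\mathrm{vec}(f_y) + \mathrm{vec}(\varepsilon).$$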

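20 Kronecker Product PFC, cont.
- Under the model
  $$\mathrm{vec}(X_y - \mu) = (\Gamma_1 \otimes \Gamma_2)(\gamma_1 \otimes \gamma_2)\,\mathrm{vec}(f_y) + \mathrm{vec}(\varepsilon)$$
  we obtain
  $$S(\mu_y - \mu) = S(\Gamma) = S(\Gamma_1 \otimes \Gamma_2)$$
  and
  $$\dim(S(\mu_y - \mu)) = \mathrm{rank}(\Gamma_1 \otimes \Gamma_2) = d_1 d_2.$$
- $S_{FMSDR} = \Sigma_x^{-1} S(\Gamma_1 \otimes \Gamma_2)$
- When in addition $\Sigma_x = \Sigma_R \otimes \Sigma_C$,
  $$S_{FMSDR} = \mathrm{span}\left(\Sigma_R^{-1} \Gamma_1 \otimes \Sigma_C^{-1} \Gamma_2\right).$$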
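The separable case follows from two Kronecker product rules, $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$ and $(A \otimes B)(C \otimes D) = AC \otimes BD$. A numerical check with randomly generated matrices (the dimensions and helper names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
p, T, d1, d2 = 5, 4, 2, 3

def random_spd(m):
    """Random symmetric positive definite matrix of size m x m."""
    A = rng.normal(size=(m, m))
    return A @ A.T + m * np.eye(m)

def orthonormal_basis(m, d):
    """Random m x d matrix with orthonormal columns."""
    Q, _ = np.linalg.qr(rng.normal(size=(m, d)))
    return Q

Sigma_R, Sigma_C = random_spd(p), random_spd(T)
Gamma_1, Gamma_2 = orthonormal_basis(p, d1), orthonormal_basis(T, d2)

# Left-hand side: invert the full pT x pT covariance.
full = np.linalg.solve(np.kron(Sigma_R, Sigma_C), np.kron(Gamma_1, Gamma_2))
# Right-hand side: invert the two small covariances separately.
folded = np.kron(np.linalg.solve(Sigma_R, Gamma_1), np.linalg.solve(Sigma_C, Gamma_2))

same_matrix = np.allclose(full, folded)
rank = np.linalg.matrix_rank(np.kron(Gamma_1, Gamma_2))
```

The practical point is that the folded form only requires inverting a $p \times p$ and a $T \times T$ matrix instead of a $pT \times pT$ one.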

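21 Maximum Likelihood Estimates under Kronecker Product Mean Structure (K-MLEs)
- Assume
  $$\mathrm{vec}(X_y) = \mathrm{vec}(\mu) + (\alpha \otimes \beta)\,\mathrm{vec}(f_y) + \mathrm{vec}(\varepsilon), \quad \varepsilon \sim N_{pT}(0, \Delta).$$
- The sample mean $\bar{X}$ is the MLE of $\mu$.
- Obtain the MLEs $\hat\alpha$ and $\hat\beta$ by iteratively maximizing the log-likelihood.
- For given $\hat\alpha$ and $\hat\beta$, the MLE of $\Delta$ is
  $$\hat\Delta = \frac{1}{n} \sum_{i=1}^n r_i r_i^T, \qquad r_i = \mathrm{vec}(X_{y_i} - \bar{X}) - (\hat\alpha \otimes \hat\beta)\,\mathrm{vec}(f_{y_i}).$$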

22 Variable Selection
1. Apply CISE (coordinate-independent sparse sufficient dimension reduction estimator; Chen, Zou and Cook, 2010) to obtain a sparse solution $\hat\Gamma$ for the general penalized LS problem.
2. Minimize $\|\hat\Gamma - \Gamma_1 \otimes \Gamma_2\|^2$ to obtain $\hat\Gamma_1$ and $\hat\Gamma_2$.
3. The sparse estimate of the sufficient reduction is $\hat\Sigma_x^{-1} (\hat\Gamma_1 \otimes \hat\Gamma_2)$.
4. This approach excludes combinations of markers and time points that are irrelevant to the response, but not predictors or time points separately.

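23 Simulation 1: Continuous Y, Full Rank
- Setting: $p = 10$, $T = 8$, $r = k = 6$, $\mathrm{rank}(\alpha) = \mathrm{rank}(\beta) = 6$
- Table columns: $n$; Method; columns for the estimates of $\alpha$, $\beta$ and $\alpha \otimes \beta$; Angle$(\hat{S}, S)$
- Table rows: K-PIR, K-PFC and MLE at each sample size, starting at $n = 500$
- Angle: smallest principal angle between subspaces (Zhu and Knyazev, 2012)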
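For readers who want to reproduce the Angle column, principal angles between two subspaces can be computed from orthonormal bases of each. The sketch below uses the textbook cosine formula (singular values of $Q_A^T Q_B$); it is not necessarily the algorithm of the cited reference, and it loses accuracy for very small angles. The function name `principal_angles` is illustrative.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (radians, ascending) between span(A) and span(B)."""
    Qa, _ = np.linalg.qr(A)                      # orthonormal basis of span(A)
    Qb, _ = np.linalg.qr(B)                      # orthonormal basis of span(B)
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

# span{e1, e2} and span{e1, e3} share e1 and are orthogonal otherwise.
E = np.eye(3)
angles = principal_angles(E[:, [0, 1]], E[:, [0, 2]])
smallest = angles.min()
largest = angles.max()
```

In this example the two planes share one direction, so the smallest angle is zero and the largest is a right angle.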

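24 Simulation 2: Continuous Y, $\alpha$, $\beta$ Not Full Rank
- Setting: $p = 10$, $T = 8$, $r = k = 6$, $\mathrm{rank}(\alpha) = \mathrm{rank}(\beta) = 2$
- Table columns: $n$; Method; columns for the estimates of $\alpha$, $\beta$ and $\alpha \otimes \beta$; Angle$(\hat{S}, S)$
- Table rows: K-PIR, K-PFC and MLE at each sample size, starting at $n = 500$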

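25 Simulation 3: Binary Y
- Setting: $p = 5$, $T = 5$, $r = k = 1$, $\mathrm{rank}(\alpha) = \mathrm{rank}(\beta) = 1$
- Table columns: $n$; Method; columns for the estimates of $\alpha$, $\beta$ and $\alpha \otimes \beta$; Angle$(\hat{S}, S)$
- Table rows: K-PIR, K-PFC, MLE and SIR at $n = 500/500$ and at /1000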

26 Summary
- We provide fast and efficient algorithms for computing sufficient reductions in regression and classification with longitudinally measured (matrix-valued) predictors.
- Simple to implement
- No convergence issues, even for large dimensions
- PFC-based estimates are efficient
- Simultaneous variable selection

27 References
- Bura, E. and Cook, R. D. (2001). Estimating the structural dimension of regressions via parametric inverse regression. J. R. Statist. Soc. B, 63.
- Chen, X., Zou, C. and Cook, R. D. (2010). Coordinate-independent sparse sufficient dimension reduction and variable selection. The Annals of Statistics, 38.
- Cook, R. D. and Forzani, L. (2008). Principal fitted components for dimension reduction in regression. Statistical Science, 23.
- Ding, S. and Cook, R. D. (2014). Dimension folding PCA and PFC for matrix-valued predictors. Statistica Sinica, 24.
- Li, B., Kim, K. M. and Altman, N. (2010). On dimension folding of matrix or array-valued statistical objects. Ann. Statist., 38.
- Li, K. C. (1991). Sliced inverse regression for dimension reduction (with discussion). J. Am. Statist. Assoc., 86.
- Pfeiffer, R., Forzani, L. and Bura, E. (2012). Sufficient dimension reduction for longitudinally measured predictors. Statistics in Medicine, Special Issue: Biomarker Working Group: Issues in the Design and Analysis of Epidemiological Studies with Biomarkers, 31(22).
- Van Loan, C. F. and Pitsianis, N. (1993). Approximation with Kronecker products. Linear Algebra for Large Scale and Real-Time Applications.
