Regularized PCA to denoise and visualise data
1 Regularized PCA to denoise and visualise data. Marie Verbanck, Julie Josse, François Husson. Laboratoire de statistique, Agrocampus Ouest, Rennes, France. CNAM, Paris, 16 January.
2 Outline: 1. PCA; 2. Regularized PCA (MSE point of view, Bayesian point of view); 3. Results.
3 Geometrical point of view. Maximize the variance of the projected points = minimize the reconstruction error. Low-rank approximation of $X_{n\times p}$ (rank $S < p$): $X_{n\times p} \approx \hat{X}_{n\times p}$. SVD: $\hat{X}^{PCA} = U_{n\times S}\,\Lambda_S^{1/2}\,V'_{p\times S}$, with $F = U\Lambda^{1/2}$ the principal components and $V$ the principal axes. Alternating least squares algorithm on $\|X - FV'\|^2$: $F = XV(V'V)^{-1}$, $V = X'F(F'F)^{-1}$.
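Not part of the slides: a minimal numpy sketch of this slide's objects (truncated SVD, principal components $F$, principal axes $V$); the toy matrix and the choice of `S` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 5))
X -= X.mean(axis=0)              # centre the columns, as PCA assumes

S = 2                            # number of retained dimensions (S < p)
U, sv, Vt = np.linalg.svd(X, full_matrices=False)

# Rank-S least-squares approximation; sv[s] plays the role of sqrt(lambda_s)
X_hat = U[:, :S] @ np.diag(sv[:S]) @ Vt[:S, :]

F = U[:, :S] * sv[:S]            # principal components F = U Lambda^{1/2}
V = Vt[:S, :].T                  # principal axes
assert np.allclose(X_hat, F @ V.T)
```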
4 Model in PCA: $x_{ij}$ = signal + noise. Random effect model: Factor Analysis, Probabilistic PCA (Lawley, 1953; Roweis, 1998; Tipping & Bishop, 1999): individuals are exchangeable; the focus is on the relationships between variables. Fixed effect model (Lawley, 1941; Caussinus, 1986): individuals are not i.i.d.; both the individuals and the variables are studied. Examples: sensory analysis (products × sensory descriptors), plant breeding (genotypes × environments).
5 Fixed effect model. $X_{n\times p} = \tilde{X}_{n\times p} + \varepsilon_{n\times p}$, i.e. $x_{ij} = \sum_{s=1}^{S} \sqrt{d_s}\, q_{is} r_{js} + \varepsilon_{ij}$, $\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)$. Randomness is due to the noise only; individuals have different expectations (individual parameters). Inferential framework: when $n \to \infty$ the number of parameters $\to \infty$; the relevant asymptotics is $\sigma \to 0$. Maximum likelihood estimates $\hat{\theta}$ = least squares estimates; EM algorithm = ALS algorithm. Not necessarily the best in MSE; shrinkage estimates minimize $E\|\phi\hat{\theta} - \theta\|^2$.
6 Outline: 1. PCA; 2. Regularized PCA (MSE point of view, Bayesian point of view); 3. Results.
7 Minimizing the MSE. $x_{ij} = \tilde{x}_{ij} + \varepsilon_{ij} = \sum_{s=1}^{S} \tilde{x}_{ij}^{(s)} + \varepsilon_{ij}$. $\mathrm{MSE} = E\big[\sum_{i,j}\big(\sum_{s=1}^{\min(n-1,p)} \phi_s \hat{x}_{ij}^{(s)} - \tilde{x}_{ij}\big)^2\big]$ with $\hat{x}_{ij}^{(s)} = \sqrt{\lambda_s}\, u_{is} v_{js}$. Treating the dimensions separately, $\mathrm{MSE} = E\big[\sum_{i,j}\sum_{s=1}^{S}\big(\phi_s \hat{x}_{ij}^{(s)} - \tilde{x}_{ij}^{(s)}\big)^2 + \sum_{i,j}\sum_{s=S+1}^{\min(n-1,p)}\big(\phi_s \hat{x}_{ij}^{(s)} - \tilde{x}_{ij}^{(s)}\big)^2\big]$. For all $s \ge S+1$, $\tilde{x}_{ij}^{(s)} = 0$, therefore $\phi_{S+1} = \dots = \phi_{\min(n-1,p)} = 0$.
8 Minimizing the MSE. $\mathrm{MSE} = E\big[\sum_{i,j}\big(\sum_{s=1}^{S} \phi_s \hat{x}_{ij}^{(s)} - \tilde{x}_{ij}\big)^2\big]$ with $\hat{x}_{ij}^{(s)} = \sqrt{\lambda_s}\, u_{is} v_{js}$. $\phi_s = \dfrac{\sum_{i,j} E(\hat{x}_{ij}^{(s)})\, \tilde{x}_{ij}^{(s)}}{\sum_{i,j} E(\hat{x}_{ij}^{(s)2})} = \dfrac{\sum_{i,j} E(\hat{x}_{ij}^{(s)})\, \tilde{x}_{ij}^{(s)}}{\sum_{i,j}\big[V(\hat{x}_{ij}^{(s)}) + \big(E(\hat{x}_{ij}^{(s)})\big)^2\big]}$. When $\sigma \to 0$: $E(\hat{x}_{ij}^{(s)}) = \tilde{x}_{ij}^{(s)}$ (unbiased); $V(\hat{x}_{ij}^{(s)}) = \frac{\sigma^2}{\min(n-1,\,p)}$ (Pazman & Denis, 1996; Denis & Gower, 1999). Hence $\phi_s = \dfrac{\sum_{i,j} \tilde{x}_{ij}^{(s)2}}{\frac{np\,\sigma^2}{\min(n-1,\,p)} + \sum_{i,j} \tilde{x}_{ij}^{(s)2}}$.
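Not on the slides: the normal-equation step behind the $\phi_s$ formula above, written out (treating each dimension separately, as the slide does).

```latex
% Setting the derivative of the per-dimension MSE to zero gives the optimal shrinkage:
\frac{\partial}{\partial \phi_s}\,
  E\Big[\sum_{i,j}\big(\phi_s \hat{x}_{ij}^{(s)} - \tilde{x}_{ij}^{(s)}\big)^2\Big]
  = 2\sum_{i,j}\Big(\phi_s\, E\big[\hat{x}_{ij}^{(s)2}\big]
      - E\big[\hat{x}_{ij}^{(s)}\big]\,\tilde{x}_{ij}^{(s)}\Big) = 0
\;\Longrightarrow\;
\phi_s = \frac{\sum_{i,j} E\big[\hat{x}_{ij}^{(s)}\big]\,\tilde{x}_{ij}^{(s)}}
              {\sum_{i,j} E\big[\hat{x}_{ij}^{(s)2}\big]}.
```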
9 Minimizing the MSE. $x_{ij} = \tilde{x}_{ij} + \varepsilon_{ij} = \sum_{s=1}^{S} \sqrt{d_s}\, q_{is} r_{js} + \varepsilon_{ij}$, $\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)$. $\mathrm{MSE} = E\big[\sum_{i,j}\big(\sum_{s=1}^{S} \phi_s \hat{x}_{ij}^{(s)} - \tilde{x}_{ij}\big)^2\big]$ with $\hat{x}_{ij}^{(s)} = \sqrt{\lambda_s}\, u_{is} v_{js}$. $\phi_s = \dfrac{\sum_{i,j} \tilde{x}_{ij}^{(s)2}}{\frac{np\,\sigma^2}{\min(n-1,\,p)} + \sum_{i,j} \tilde{x}_{ij}^{(s)2}} = \dfrac{d_s}{\frac{np\,\sigma^2}{\min(n-1,\,p)} + d_s} = \dfrac{\text{signal variance}}{\text{total variance}}$. Plugging in estimates: $\hat{\phi}_s = \dfrac{\lambda_s - \hat{\sigma}^2}{\lambda_s}$ for $s = 1, \dots, S$.
10 Regularized PCA. $\hat{x}_{ij}^{\,rPCA} = \sum_{s=1}^{S} \Big(\dfrac{\lambda_s - \hat{\sigma}^2}{\lambda_s}\Big) \sqrt{\lambda_s}\, u_{is} v_{js} = \sum_{s=1}^{S} \Big(\sqrt{\lambda_s} - \dfrac{\hat{\sigma}^2}{\sqrt{\lambda_s}}\Big) u_{is} v_{js}$, versus $\hat{x}_{ij}^{\,PCA} = \sum_{s=1}^{S} \sqrt{\lambda_s}\, u_{is} v_{js}$. A compromise between hard and soft thresholding. $\sigma^2$ small: regularized PCA $\approx$ PCA; $\sigma^2$ large: the individuals' coordinates are pulled toward the centre of gravity. $\hat{\sigma}^2 = \dfrac{\mathrm{RSS}}{\mathrm{ddl}} = \dfrac{\sum_{s=S+1}^{\min(n-1,p)} \lambda_s}{np - p - nS - pS + S^2 + S}$, with the degrees of freedom counted from the dimensions of $X_{n\times p}$, $U_{n\times S}$ and $V_{p\times S}$ (only asymptotically unbiased).
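A hedged numpy sketch of the rPCA reconstruction above, assuming $X$ is already column-centred and taking $\lambda_s$ as the squared singular values; `rpca_fit` is a hypothetical helper, and the positive-part guard is an added safety not written on the slide.

```python
import numpy as np

def rpca_fit(X, S):
    """Hypothetical helper: regularized PCA reconstruction per the slide.

    X is assumed column-centred; lambda_s is taken as the squared singular value.
    """
    n, p = X.shape
    U, sv, Vt = np.linalg.svd(X, full_matrices=False)
    lam = sv ** 2

    # sigma^2_hat = RSS / ddl (asymptotically unbiased)
    rss = lam[S:min(n - 1, p)].sum()
    ddl = n * p - p - n * S - p * S + S ** 2 + S
    sigma2 = rss / ddl

    # Shrunken singular values sqrt(lam_s) - sigma2/sqrt(lam_s); the positive
    # part guards against sigma2 > lam_s (a guard the slide does not write)
    shrunk = np.maximum(sv[:S] - sigma2 / sv[:S], 0.0)
    return U[:, :S] @ np.diag(shrunk) @ Vt[:S, :]
```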
11 Illustration of the regularization. $X^{sim}_{4\times 10} = \tilde{X} + \varepsilon^{sim}_{4\times 10}$, $sim = 1, \dots, 1000$. [Figure: PCA vs. rPCA configurations over the 1000 simulations.] rPCA is more biased but less variable.
12 Outline: 1. PCA; 2. Regularized PCA (MSE point of view, Bayesian point of view); 3. Results.
13 Outline: 2. Regularized PCA, Bayesian point of view. Ridge reminder: $Y = X\beta + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, \sigma^2)$; prior $\beta \sim \mathcal{N}(0, \tau^2)$; $\hat{\beta}^{MAP} = (X'X + \kappa I)^{-1} X'Y$, with $\kappa = \sigma^2/\tau^2$.
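A one-function illustration of the ridge/MAP reminder above (a sketch, not the authors' code):

```python
import numpy as np

def ridge_map(X, y, sigma2, tau2):
    """MAP estimate under beta ~ N(0, tau2 I): (X'X + kappa I)^{-1} X'y, kappa = sigma2/tau2."""
    p = X.shape[1]
    kappa = sigma2 / tau2
    return np.linalg.solve(X.T @ X + kappa * np.eye(p), X.T @ y)
```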
14 Probabilistic PCA (Roweis, 1998; Tipping & Bishop, 1999). A specific case of an exploratory factor analysis model: $x_{ij} = (Z_{n\times S} B'_{p\times S})_{ij} + \varepsilon_{ij}$, $z_i \sim \mathcal{N}(0, I_S)$, $\varepsilon_i \sim \mathcal{N}(0, \sigma^2 I_p)$. Conditional independence: $x_i \mid z_i \sim \mathcal{N}(B z_i, \sigma^2 I_p)$. Distribution of the observations: $x_i \sim \mathcal{N}(0, \Sigma)$ with $\Sigma = BB' + \sigma^2 I_p$. Explicit solution: $\hat{\sigma}^2 = \frac{1}{p-S}\sum_{s=S+1}^{p} \lambda_s$, $\hat{B} = V(\Lambda - \hat{\sigma}^2 I_S)^{1/2}$.
15 Probabilistic PCA. $x_{ij} = (Z_{n\times S} B'_{p\times S})_{ij} + \varepsilon_{ij}$, $z_i \sim \mathcal{N}(0, I_S)$, $\varepsilon_i \sim \mathcal{N}(0, \sigma^2 I_p)$. Random effect model: no individual parameters. BLUP $E(z_i \mid x_i)$: $\hat{Z} = X\hat{B}(\hat{B}'\hat{B} + \hat{\sigma}^2 I_S)^{-1}$, $\hat{X}^{pPCA} = \hat{Z}\hat{B}'$. Using $\hat{B} = V(\Lambda - \hat{\sigma}^2 I_S)^{1/2}$ and the SVD $X = U\Lambda^{1/2}V'$ (so $XV = U\Lambda^{1/2}$): $\hat{X}^{pPCA} = U(\Lambda - \hat{\sigma}^2 I_S)\Lambda^{-1/2}V'$, i.e. $\hat{x}_{ij}^{\,pPCA} = \sum_{s=1}^{S}\big(\sqrt{\lambda_s} - \frac{\hat{\sigma}^2}{\sqrt{\lambda_s}}\big) u_{is} v_{js}$. Bayesian treatment of the fixed effect model with a prior on $z_i$: Regularized PCA.
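A numerical check (not from the slides) that the BLUP route and the closed form above coincide; the conventions here, e.g. $\hat{\sigma}^2$ taken as the mean of the discarded eigenvalues of the centred $X$, are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 10))
X -= X.mean(axis=0)
S = 3

U, sv, Vt = np.linalg.svd(X, full_matrices=False)
lam = sv ** 2
sigma2 = lam[S:].mean()                    # mean of the discarded eigenvalues

# Closed form: X_hat = U (Lambda - sigma^2 I) Lambda^{-1/2} V'
X_ppca = U[:, :S] @ np.diag((lam[:S] - sigma2) / sv[:S]) @ Vt[:S, :]

# Same reconstruction via B_hat and the BLUP of z_i
B = Vt[:S, :].T @ np.diag(np.sqrt(lam[:S] - sigma2))
Z = X @ B @ np.linalg.inv(B.T @ B + sigma2 * np.eye(S))
assert np.allclose(X_ppca, Z @ B.T)
```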
16 Probabilistic PCA. EM algorithm in pPCA: two ridge regressions. E step: $\hat{Z} = X\hat{B}(\hat{B}'\hat{B} + \hat{\sigma}^2 I_S)^{-1}$. M step: $\hat{B} = X'\hat{Z}(\hat{Z}'\hat{Z} + \hat{\sigma}^2 \Lambda^{-1})^{-1}$. EM algorithm in PCA: two ordinary regressions on $\|X - FV'\|^2$: $F = XV(V'V)^{-1}$, $V = X'F(F'F)^{-1}$.
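A sketch of one EM iteration taking the slide's two ridge regressions at face value; `ppca_em_step` and its `lam` argument (current eigenvalue estimates feeding the $\hat{\sigma}^2\Lambda^{-1}$ penalty) are this sketch's assumptions.

```python
import numpy as np

def ppca_em_step(X, B, sigma2, lam):
    """One pPCA EM iteration written as the slide's two ridge regressions.

    lam holds current eigenvalue estimates for the sigma2 * Lambda^{-1} penalty.
    """
    S = B.shape[1]
    Z = X @ B @ np.linalg.inv(B.T @ B + sigma2 * np.eye(S))                 # E step
    B_new = X.T @ Z @ np.linalg.inv(Z.T @ Z + sigma2 * np.diag(1.0 / lam))  # M step
    return Z, B_new
```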
17 An empirical Bayesian approach. Model: $x_{ij} = \sum_{s=1}^{S} \tilde{x}_{ij}^{(s)} + \varepsilon_{ij}$, $\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)$. Prior: $\tilde{x}_{ij}^{(s)} \sim \mathcal{N}(0, \tau_s^2)$. Posterior: $E\big(\tilde{x}_{ij}^{(s)} \mid x_{ij}^{(s)}\big) = \Phi_s\, x_{ij}^{(s)}$ with $\Phi_s = \dfrac{\tau_s^2}{\tau_s^2 + \frac{\sigma^2}{\min(n-1,\,p)}}$. Maximizing the likelihood of $x_{ij}^{(s)} \sim \mathcal{N}\big(0, \tau_s^2 + \frac{\sigma^2}{\min(n-1,\,p)}\big)$ as a function of $\tau_s^2$ gives $\hat{\tau}_s^2 = \frac{\lambda_s}{np} - \frac{\hat{\sigma}^2}{\min(n-1,\,p)}$, hence $\hat{\Phi}_s = 1 - \dfrac{np\,\hat{\sigma}^2}{\min(n-1,\,p)\,\lambda_s}$.
18 Outline: 1. PCA; 2. Regularized PCA; 3. Results.
19 Simulation design. The simulated data: $X = \tilde{X} + \varepsilon$. Several parameters vary in the construction of $\tilde{X}$:
- $n/p$: $100/20 = 5$, $50/50 = 1$ and $20/100 = 0.2$;
- $S$ underlying dimensions: 2, 4;
- the ratio $d_1/d_2$: 4, 1;
- $\varepsilon$ is drawn from a Gaussian distribution with a standard deviation chosen to reach a signal-to-noise ratio (SNR) of 4, 1 or 0.8.
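Not in the slides: a small generator matching this design; the exact SNR definition (signal norm over expected noise norm) and the linear spacing of the singular values are assumptions of the sketch.

```python
import numpy as np

def simulate(n=100, p=20, S=2, d_ratio=4.0, snr=1.0, seed=0):
    """Sketch of the slides' design: X = X_tilde + eps with a target SNR."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, S)))   # orthonormal scores
    R, _ = np.linalg.qr(rng.standard_normal((p, S)))   # orthonormal loadings
    d = np.linspace(d_ratio, 1.0, S)                   # singular values, d1/dS = d_ratio
    X_tilde = Q @ np.diag(d) @ R.T
    # noise sd chosen so that ||X_tilde|| / (sd * sqrt(n*p)) = snr
    sd = np.linalg.norm(X_tilde) / (snr * np.sqrt(n * p))
    return X_tilde, X_tilde + sd * rng.standard_normal((n, p))

X_tilde, X = simulate(n=100, p=20, S=2, d_ratio=4.0, snr=0.8)
```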
20 Soft thresholding and SURE (Candès et al., 2012). $\hat{x}_{ij}^{\,Soft} = \sum_{s=1}^{\min(n,p)} \big(\sqrt{\lambda_s} - \lambda\big)_+\, u_{is} v_{js}$, the solution of $\min_{\hat{X}} \frac{1}{2}\|X - \hat{X}\|^2 + \lambda \|\hat{X}\|_*$. How to choose the tuning parameter $\lambda$? $\mathrm{MSE}(\lambda) = E\,\|\tilde{X} - \mathrm{SVT}_\lambda(X)\|^2$. Stein's Unbiased Risk Estimate (SURE): $\mathrm{SURE}(\mathrm{SVT}_\lambda)(X) = -np\sigma^2 + \sum_{i=1}^{\min(n,p)} \min(\lambda^2, \lambda_i) + 2\sigma^2\, \mathrm{div}(\mathrm{SVT}_\lambda)(X)$. $\lambda$ is obtained by minimizing SURE. Remark: $\hat{\sigma}^2$ is required.
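Not from the slides: a minimal sketch of the soft-thresholded SVD. The $\lambda$ selection below is an oracle grid search for illustration only (it uses the true $\tilde{X}$, available in simulation); implementing SURE itself would additionally require $\hat{\sigma}^2$ and the closed-form divergence of $\mathrm{SVT}_\lambda$ from Candès et al.

```python
import numpy as np

def svt(X, lam):
    """Singular value soft-thresholding: each singular value becomes (sv - lam)_+."""
    U, sv, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(sv - lam, 0.0)) @ Vt

# Oracle check on simulated data (true X_tilde known)
rng = np.random.default_rng(2)
X_tilde = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 40))
X = X_tilde + rng.standard_normal((50, 40))
grid = np.linspace(0.0, 30.0, 61)
mse = [np.mean((svt(X, l) - X_tilde) ** 2) for l in grid]
best_lam = grid[int(np.argmin(mse))]
print(best_lam)
```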
21 Simulation study. [Table: MSE$(\hat{X}^{PCA}, \tilde{X})$, MSE$(\hat{X}^{rPCA}, \tilde{X})$ and MSE$(\hat{X}^{SURE}, \tilde{X})$ for every combination of $n/p$, $S$, SNR and $d_1/d_2$; the numeric entries were lost in transcription.] The MSE is in favor of rPCA.
22 Simulation study. [Same MSE table as above.] rPCA clearly improves on PCA when the SNR is small; rPCA $\approx$ PCA when the SNR is high.
23 Simulation study. [Same MSE table as above.] The gain of rPCA also depends on the ratio $d_1/d_2$ (the comparison symbols were lost in transcription).
24 Simulation study. [Same MSE table as above.] Errors are of the same order for $n/p = 0.2$ or $5$, and smaller for $n/p = 1$!
25 Simulation study. [Same MSE table as above.] SURE does well when the data are noisy.
26 Simulation from Candès et al. (2012). Same simulation model: $n = 500$, $p = 200$, SNR $\in \{4, 2, 1, 0.5\}$, $S \in \{10, 100\}$. [Table: MSE$(\hat{X}^{PCA}, \tilde{X})$, MSE$(\hat{X}^{rPCA}, \tilde{X})$ and MSE$(\hat{X}^{SURE}, \tilde{X})$ for each $(S, \mathrm{SNR})$; numeric entries lost in transcription.]
27 Individuals representation. 100 individuals, 20 variables, 2 dimensions, SNR = 0.8, $d_1/d_2 = 4$. [Figure: true configuration vs. PCA.]
28 Individuals representation. Same setting. [Figure: PCA vs. rPCA.]
29 Individuals representation. Same setting. [Figure: PCA vs. SURE.]
30 Individuals representation. Same setting. [Figure: true configuration vs. rPCA.] rPCA improves the recovery of the inner-product matrix $\tilde{X}\tilde{X}'$.
31 Variables representation. [Figure: true configuration vs. PCA; the displayed cor$(X_7, X_9)$ values were lost in transcription.]
32 Variables representation. [Figure: PCA vs. rPCA.]
33 Variables representation. [Figure: PCA vs. SURE.]
34 Variables representation. [Figure: true configuration vs. rPCA.] rPCA improves the recovery of the covariance matrix $\tilde{X}'\tilde{X}$.
35 Real data set. 27 chickens submitted to 4 nutritional statuses; gene expressions. [Figure: PCA vs. rPCA.]
36 Real data set. 27 chickens submitted to 4 nutritional statuses; gene expressions. [Figure: SURE vs. rPCA.]
37 Real data set. 27 chickens submitted to 4 nutritional statuses; gene expressions. [Figure: SPCA vs. rPCA.]
38 Real data set. [Figure: bi-clustering heatmaps, PCA vs. rPCA.] Bi-clustering from rPCA better finds the 4 nutritional statuses!
39 Real data set. [Figure: bi-clustering heatmaps, SURE vs. rPCA.] Bi-clustering from rPCA better finds the 4 nutritional statuses!
40 Real data set. [Figure: bi-clustering heatmaps, SPCA vs. rPCA.] Bi-clustering from rPCA better finds the 4 nutritional statuses!
41 Conclusion. Recovering a low-rank matrix from noisy observations: singular value thresholding is a popular estimation strategy. A regularization term derived a priori, $\big(\sqrt{\lambda_s} - \frac{\hat{\sigma}^2}{\sqrt{\lambda_s}}\big)$: it minimizes the MSE and corresponds to a Bayesian treatment of the fixed effect model. Good results in terms of recovery of the structure, of the distances between individuals, of the covariance matrix, and in terms of visualization. Asymptotic results. Tuning parameter: the number of dimensions (CV?). Regularized MCA (Takane, 2006).