Online Kernel PCA with Entropic Matrix Updates

1 Online Kernel PCA with Entropic Matrix Updates
Dima Kuzmin, Manfred K. Warmuth
University of California - Santa Cruz
ICML 2007, Corvallis, Oregon
April 23, 2008
D. Kuzmin, M. Warmuth (UCSC) Online Kernel PCA with Entropic Matrix Updates ICML / 26

2 Outline
1 Batch PCA
2 Online PCA alg
3 Kernelization

3 Outline
1 Batch PCA (current section)
2 Online PCA alg
3 Kernelization

4 Batch PCA
PCA: dimensionality reduction
Project onto a subspace
Preserve most of the information

5 Objective of batch PCA
$$\inf_{c}\ \inf_{P:\ k\text{-dim. proj. matrix}}\ \sum_{t} \Big\| \underbrace{P(x_t - c)}_{\text{compressed}} - \underbrace{(x_t - c)}_{\text{uncompressed}} \Big\|_2^2$$
Solution: $c^*$ = the average point; $P^*$ = the projection onto the subspace spanned by the k longest axes of the covariance matrix $\sum_t (x_t - c^*)(x_t - c^*)^\top$
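A minimal NumPy sketch of the batch solution, assuming the T instances are stored as the rows of an array. The function name `batch_pca` and the synthetic data are illustrative, not from the slides. It centers at the average point, projects onto the top-k eigenvectors of the (unnormalized) covariance matrix, and reports the total compression loss, which should equal the sum of the n - k smallest eigenvalues of that matrix.

```python
import numpy as np

def batch_pca(X, k):
    """Batch PCA on the rows of X (shape T x n).

    Returns the optimal center c, the k-dim. projection matrix P, and the
    total compression loss sum_t ||P(x_t - c) - (x_t - c)||_2^2.
    """
    c = X.mean(axis=0)                  # optimal center: the average point
    Xc = X - c
    S = Xc.T @ Xc                       # sum_t (x_t - c)(x_t - c)^T
    _, evecs = np.linalg.eigh(S)        # eigenvalues in ascending order
    V = evecs[:, -k:]                   # the k longest axes
    P = V @ V.T                         # projection onto their span
    R = Xc @ P - Xc                     # rows are P(x_t - c) - (x_t - c), since P is symmetric
    return c, P, float(np.sum(R ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1]) + 1.0
c, P, loss = batch_pca(X, k=2)
S = (X - c).T @ (X - c)
print(loss, np.linalg.eigvalsh(S)[:3].sum())  # equal up to floating-point error
```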

6 What we do
From batch PCA to online PCA: update the subspace after each point
Goals:
Total compression loss close to that of batch PCA
Regret bounds logarithmic in the dimension
Kernelize the online algorithm

7 Outline
1 Batch PCA
2 Online PCA alg (current section)
3 Kernelization

8 Why online?
Data points are produced online
Data changes over time; our algorithms can be adapted to that case
We want to exploit the sequential nature of the data

9 Protocol
For t = 1 to T
  Algorithm picks a k-dimensional projection $P_t$
  Nature picks data point $x_t$
  Algorithm suffers loss $\|P_t x_t - x_t\|_2^2$
End For
$$\text{Regret} = \underbrace{\sum_{t=1}^T \|P_t x_t - x_t\|_2^2}_{\text{online loss}} - \underbrace{\inf_{P} \sum_{t=1}^T \|P x_t - x_t\|_2^2}_{\text{batch loss}}$$
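To make the protocol and the regret concrete, here is a small NumPy simulation. The learner is a hypothetical stand-in, not the algorithm of these slides: it plays "follow the leader", i.e. batch PCA (with c = 0) on the points seen so far. The batch loss uses the best fixed k-dim. projection in hindsight. Function names and data are illustrative.

```python
import numpy as np

def top_k_projection(S, k):
    """Projection onto the span of the k top eigenvectors of the symmetric matrix S."""
    _, evecs = np.linalg.eigh(S)
    V = evecs[:, -k:]
    return V @ V.T

def compression_loss(P, x):
    """||P x - x||_2^2"""
    r = P @ x - x
    return float(r @ r)

def run_protocol(X, k):
    """Plays the online PCA protocol with a follow-the-leader learner (illustrative only)."""
    n = X.shape[1]
    S = np.zeros((n, n))                          # sum of x_s x_s^T over past rounds
    online_loss = 0.0
    for x in X:                                   # t = 1, ..., T
        P_t = top_k_projection(S, k)              # algorithm picks P_t before seeing x_t
        online_loss += compression_loss(P_t, x)   # nature reveals x_t, loss is suffered
        S += np.outer(x, x)
    P_best = top_k_projection(S, k)               # best fixed projection in hindsight
    batch_loss = sum(compression_loss(P_best, x) for x in X)
    return online_loss, batch_loss, online_loss - batch_loss

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6)) * np.array([2.0, 1.5, 1.0, 0.3, 0.2, 0.1])
online_loss, batch_loss, regret = run_protocol(X, k=2)
```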

10 How do we do it?
Lift methods from the expert setting of online learning to the matrix setting
Use density matrices to express uncertainty over the best subspace

11 Trick 1: density matrices
Natural parameter for expressing uncertainty over directions
Symmetric positive definite matrix of trace 1
Eigenvalues $\lambda_i$ form a probability vector (mixture): $W = \sum_{i=1}^n \lambda_i w_i w_i^\top$
Many mixtures give the same matrix
Decomposition into n eigendirections
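A quick numerical illustration of "many mixtures give the same matrix" in NumPy: an equal mixture of the two coordinate axes and an equal mixture of the two diagonal directions in R^2 both give the density matrix I/2. The helper name is hypothetical.

```python
import numpy as np

def density_from_mixture(probs, dirs):
    """W = sum_i p_i w_i w_i^T, with unit vectors w_i as the columns of dirs."""
    return sum(p * np.outer(w, w) for p, w in zip(probs, dirs.T))

# Mixture 1: probability 1/2 on each coordinate axis e1, e2.
W1 = density_from_mixture([0.5, 0.5], np.eye(2))

# Mixture 2: probability 1/2 on each diagonal direction (e1 + e2)/sqrt(2), (e1 - e2)/sqrt(2).
s = 1.0 / np.sqrt(2.0)
W2 = density_from_mixture([0.5, 0.5], np.array([[s, s], [s, -s]]))

# Both equal I/2: symmetric, trace 1, eigenvalues (1/2, 1/2) form a probability vector.
eigenvalues = np.linalg.eigvalsh(W1)
```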

12 Trick 2: capping the eigenvalues of the density matrix
Probability vector: uncertainty over eigendirections
Capping each component at $\frac{1}{m}$ prevents concentration on a single corner
Capped probability vector: uncertainty over m-sets of directions
Such sets are represented as m-corners, e.g. $(0, \frac{1}{m}, 0, 0, \frac{1}{m}, 0, \frac{1}{m})$ for $m = 3$
The convex hull of the m-corners = the capped probability simplex
Any distribution in the hull is decomposable into a mixture of at most n of the $\binom{n}{m}$ corners
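The decomposition claim can be made constructive. The NumPy sketch below is one greedy way to do it, assuming w sums to 1 with every entry at most 1/m: repeatedly take the m-corner on the m largest remaining entries and remove as much of it as possible while the remainder stays inside the (rescaled) capped simplex. Each step zeroes an entry or pushes a new entry to the cap, so at most n corners are used. The function name is hypothetical.

```python
import numpy as np

def decompose_capped(w, m, tol=1e-12):
    """Write a capped probability vector w (sum 1, entries <= 1/m) as a mixture of m-corners.

    Returns a list of (weight, indices); the corner for `indices` has 1/m on those
    indices and 0 elsewhere.
    """
    w = np.array(w, dtype=float)
    n = len(w)
    if not (np.isclose(w.sum(), 1.0) and np.all(w <= 1.0 / m + 1e-9)):
        raise ValueError("w must sum to 1 with entries at most 1/m")
    parts = []
    for _ in range(2 * n):                    # n steps suffice in exact arithmetic
        total = w.sum()                       # remaining mass
        if total <= tol:
            break
        order = np.argsort(-w)                # indices by decreasing weight
        R = order[:m]                         # support of the next corner
        s = w[R].min()                        # smallest chosen entry
        l = w[order[m]] if m < n else 0.0     # largest entry left out
        # Largest weight that keeps w >= 0 and every entry <= (remaining mass) / m.
        p = max(0.0, min(m * s, total - m * l))
        if p <= tol:
            break
        w[R] -= p / m
        w[w < tol] = 0.0                      # clean up rounding noise
        parts.append((p, sorted(int(i) for i in R)))
    return parts

parts = decompose_capped([0.4, 0.3, 0.2, 0.1], m=2)
for weight, corner in parts:
    print(round(weight, 3), corner)
```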

13 Trick 3: rewrite the quadratic loss as a linear loss
Assume c = 0 for now.
$$\|Px - x\|_2^2 = \|(P - I)x\|_2^2 = x^\top (I - P)^2 x = \operatorname{tr}\big((I - P)\, x x^\top\big)$$
Here P is a k-dim. projection matrix and $I - P$ is an (n-k)-dim. projection matrix.
We want to choose the (n-k)-dimensional subspace of minimum variance.
Projection matrices are symmetric positive semidefinite matrices with {0, 1} eigenvalues: $P^2 = P$, $(I - P)^2 = I - P$
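A numerical spot-check of the identity, assuming a random k-dim. subspace of R^n (synthetic data, NumPy): the quadratic loss and the trace form agree, and I - P is again a projection, of rank n - k.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 2
V, _ = np.linalg.qr(rng.normal(size=(n, k)))   # orthonormal basis of a random k-dim. subspace
P = V @ V.T                                    # k-dim. projection matrix
x = rng.normal(size=n)

quadratic_loss = float(np.sum((P @ x - x) ** 2))                  # ||P x - x||_2^2
linear_loss = float(np.trace((np.eye(n) - P) @ np.outer(x, x)))   # tr((I - P) x x^T)
```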

14 Online PCA alg
Initialize $W_0 = \frac{1}{n} I$
Pick an (n-k)-dimensional subspace based on the capped density matrix $W_t$
Choose the complementary k-dimensional subspace $P_t$
Receive instance $x_t$
Incur loss $\|P_t x_t - x_t\|_2^2 = \operatorname{tr}\big((I - P_t)\, x_t x_t^\top\big)$, where $I - P_t$ projects onto the (n-k)-dim. subspace, and expected loss $(n-k)\operatorname{tr}(W_t x_t x_t^\top)$
Update $W_{t+1} = \exp(\log W_t - \eta x_t x_t^\top)/Z$, where exp and log are matrix operations and Z normalizes the trace to 1
Cap the eigenvalues of $W_{t+1}$ at $\frac{1}{n-k}$
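The steps above can be simulated in NumPy. This is a sketch under assumptions that are not on the slide: instances are normalized to unit length, eta = 0.5 is arbitrary, W_t is stored through its eigendecomposition so the matrix log stays well conditioned, and capping is done as a relative-entropy projection of the eigenvalues (set the largest ones to 1/(n-k) and rescale the rest). Sampling the (n-k)-dim. subspace is omitted; only the expected loss is tracked. All function names are hypothetical.

```python
import numpy as np

def cap_probabilities(p, cap):
    """Relative-entropy projection of a probability vector p onto {q: sum(q) = 1, q_i <= cap}.
    Sets the j largest entries to cap and rescales the rest, for the smallest j that works."""
    order = np.argsort(-p)
    ps = p[order]
    for j in range(len(p)):
        rest = ps[j:] * (1.0 - j * cap) / ps[j:].sum()
        if rest[0] <= cap * (1 + 1e-12):
            q_sorted = np.concatenate([np.full(j, cap), rest])
            break
    q = np.empty_like(p)
    q[order] = q_sorted
    return q

def online_pca_update(vals, vecs, x, eta, k):
    """W_t = vecs diag(vals) vecs^T  ->  cap(exp(log W_t - eta x x^T) / Z), in factored form."""
    n = len(vals)
    log_w = (vecs * np.log(vals)) @ vecs.T
    mu, new_vecs = np.linalg.eigh(log_w - eta * np.outer(x, x))
    new_vals = np.exp(mu - mu.max())      # shift for stability; it cancels in the normalization
    new_vals /= new_vals.sum()            # divide by Z = tr(exp(...))
    return cap_probabilities(new_vals, 1.0 / (n - k)), new_vecs

rng = np.random.default_rng(3)
n, k, eta, T = 5, 2, 0.5, 200
X = rng.normal(size=(T, n)) * np.array([3.0, 2.0, 0.3, 0.2, 0.1])
X /= np.linalg.norm(X, axis=1, keepdims=True)    # unit-length instances (assumption)

vals, vecs = np.full(n, 1.0 / n), np.eye(n)      # W_0 = I / n
expected_loss = 0.0
for x in X:
    W = (vecs * vals) @ vecs.T
    expected_loss += (n - k) * float(x @ W @ x)  # (n - k) tr(W_t x_t x_t^T)
    vals, vecs = online_pca_update(vals, vecs, x, eta, k)

best_loss = np.linalg.eigvalsh(X.T @ X)[: n - k].sum()   # loss of the best k-subspace (c = 0)
print(expected_loss, best_loss)
```

Keeping W_t as (eigenvalues, eigenvectors) avoids re-diagonalizing a matrix whose small eigenvalues may already be lost to rounding.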

15 Update and Winnow-like bound
$$\hat W_t = \frac{\exp(\log W_t - \eta x_t x_t^\top)}{\operatorname{tr}\big(\exp(\log W_t - \eta x_t x_t^\top)\big)}$$
$$W_{t+1} = \operatorname*{argmin}_{W \text{ dens. matrix w. eigenvals} \le \frac{1}{n-k}} \Delta(W, \hat W_t)$$
$$\text{regret} \le \sqrt{2 \cdot \text{loss of best } k \text{ subspace} \cdot k \log\frac{n}{k}} + k \log\frac{n}{k}$$
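Plugging numbers into the bound shows how mild the dependence on the dimension is. A small sketch, assuming natural logarithms and reading "loss of best k subspace" as the total batch loss L; the function name is hypothetical.

```python
import math

def regret_bound(best_loss, k, dim):
    """sqrt(2 * best_loss * k ln(dim / k)) + k ln(dim / k), as read off the slide (natural log assumed)."""
    r = k * math.log(dim / k)
    return math.sqrt(2.0 * best_loss * r) + r

# Same best-subspace loss, dimension grown by a factor of 1000:
small = regret_bound(best_loss=100.0, k=2, dim=100)
large = regret_bound(best_loss=100.0, k=2, dim=100_000)
```

Replacing dim by the feature-space dimension N gives the kernel bound of slide 22 in the same way.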

16 Outline
1 Batch PCA
2 Online PCA alg
3 Kernelization (current section)

17 Feature maps
Expand instance vectors: $\phi: \mathbb{R}^n \to \mathbb{R}^N$, $N \gg n$
Example: $\phi((x_1, \dots, x_n)) = (x_1^2, x_1 x_2, \dots, x_1 x_n, x_2 x_1, x_2^2, \dots, x_n x_{n-1}, x_n^2)$, i.e. all ordered products $x_i x_j$, so $N = n^2$
Dot products can often be computed efficiently: $\phi(x)^\top \phi(y) = \underbrace{(x^\top y)^2}_{k(x,y)}$
Kernel PCA computes ordinary PCA for the expanded data instances $\phi(x_1), \dots, \phi(x_m)$
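A NumPy check of the kernel identity for the quadratic feature map, using all n^2 ordered products so that no scaling of cross terms is needed. The vectors are arbitrary examples and the function names are hypothetical.

```python
import numpy as np

def phi(x):
    """Explicit quadratic feature map: all n^2 ordered products x_i x_j."""
    return np.outer(x, x).ravel()

def quad_kernel(x, y):
    """k(x, y) = (x^T y)^2, computed in O(n) instead of O(n^2)."""
    return float(x @ y) ** 2

x = np.array([1.0, 2.0, -1.0])
y = np.array([0.5, -1.0, 3.0])
explicit = float(phi(x) @ phi(y))   # dot product in R^9
implicit = quad_kernel(x, y)        # (x^T y)^2 = (-4.5)^2 = 20.25
```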

18 Covariance and kernel matrices
Data matrix, with the expanded instances as columns: $X = \big(\phi(x_1) \ \cdots \ \phi(x_m)\big)$
Covariance matrix, too big: $C = X X^\top \in \mathbb{R}^{N \times N}$, $C = \sum_{i=1}^m \phi(x_i)\phi(x_i)^\top$
Kernel matrix, small: $K = X^\top X \in \mathbb{R}^{m \times m}$, $K_{ij} = \phi(x_i)^\top \phi(x_j) = k(x_i, x_j)$
The eigensystems of K and C are related!
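A NumPy sketch of the two matrices for the quadratic feature map, assuming uncentered data (c = 0) and random instances. It shows that K can be filled in from kernel evaluations alone, while C needs the explicit N-dimensional features. Names are illustrative.

```python
import numpy as np

def phi(x):
    return np.outer(x, x).ravel()          # quadratic feature map, N = n^2

def kernel(x, y):
    return float(x @ y) ** 2               # k(x, y) = phi(x)^T phi(y)

rng = np.random.default_rng(4)
points = rng.normal(size=(4, 3))                 # m = 4 instances in R^3
X = np.column_stack([phi(p) for p in points])    # N x m, expanded instances as columns
C = X @ X.T                                      # N x N covariance matrix (uncentered)
K = X.T @ X                                      # m x m kernel matrix
K_from_kernel = np.array([[kernel(a, b) for b in points] for a in points])
```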

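19 Eigendecomposition of K and C
Theorem. If K has eigensystem $(\lambda_i, u_i)$ with $\lambda_i > 0$, then C has eigensystem $\left(\lambda_i, \frac{X u_i}{\sqrt{\lambda_i}}\right)$.
Proof.
Eigenvalue: $C \frac{X u_i}{\sqrt{\lambda_i}} = X X^\top \frac{X u_i}{\sqrt{\lambda_i}} = \frac{X K u_i}{\sqrt{\lambda_i}} = \lambda_i \frac{X u_i}{\sqrt{\lambda_i}}$
Orthogonality ($i \neq j$): $\left(\frac{X u_i}{\sqrt{\lambda_i}}\right)^\top \frac{X u_j}{\sqrt{\lambda_j}} = \frac{u_i^\top X^\top X u_j}{\sqrt{\lambda_i \lambda_j}} = \frac{u_i^\top K u_j}{\sqrt{\lambda_i \lambda_j}} = \frac{\lambda_j\, u_i^\top u_j}{\sqrt{\lambda_i \lambda_j}} = 0$
Normalization: $\left\|\frac{X u_i}{\sqrt{\lambda_i}}\right\|_2^2 = \frac{1}{\lambda_i}(X u_i)^\top (X u_i) = \frac{1}{\lambda_i} u_i^\top K u_i = \frac{1}{\lambda_i}\lambda_i\, u_i^\top u_i = 1$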
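A numerical check of the theorem in NumPy, with a random N x m matrix standing in for the expanded data (an assumption; any matrix with full column rank works). The columns $X u_i / \sqrt{\lambda_i}$ come out as orthonormal eigenvectors of C with the same eigenvalues as K.

```python
import numpy as np

rng = np.random.default_rng(5)
N, m = 9, 4
X = rng.normal(size=(N, m))        # stand-in for m expanded instances (columns) in R^N
K = X.T @ X                        # m x m
C = X @ X.T                        # N x N

lam, U = np.linalg.eigh(K)         # eigensystem (lambda_i, u_i) of K; here all lambda_i > 0
V = (X @ U) / np.sqrt(lam)         # column i is X u_i / sqrt(lambda_i)
```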

20 Batch Kernel PCA
The top k eigenvectors of C are given implicitly by the top k eigenvectors of K
Projections can be computed from the small matrix K
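A NumPy sketch of computing projections from K only, assuming the quadratic kernel and no centering. The coordinate of $\phi(x)$ on $v_i = X u_i/\sqrt{\lambda_i}$ is $u_i^\top X^\top \phi(x)/\sqrt{\lambda_i}$, and $X^\top \phi(x)$ is just the vector of kernel values $k(x_j, x)$. The function name and data are hypothetical.

```python
import numpy as np

def kernel(a, b):
    return float(a @ b) ** 2                      # quadratic kernel (assumed)

def kpca_coordinates(train, x, k):
    """Coordinates of phi(x) on the top-k eigenvectors X u_i / sqrt(lambda_i) of C,
    computed from kernel values only (no centering)."""
    K = np.array([[kernel(a, b) for b in train] for a in train])
    lam, U = np.linalg.eigh(K)
    lam, U = lam[::-1][:k], U[:, ::-1][:, :k]     # top k eigenpairs of K
    kx = np.array([kernel(a, x) for a in train])  # X^T phi(x)
    return (U.T @ kx) / np.sqrt(lam)              # u_i^T X^T phi(x) / sqrt(lambda_i)

rng = np.random.default_rng(6)
train = rng.normal(size=(6, 3))
x_new = rng.normal(size=3)
coords = kpca_coordinates(train, x_new, k=2)
```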

21 Online Kernel PCA
Batch PCA: max. Pick the k eigenvectors of C with the largest eigenvalues.
Online PCA: soft max. Pick a subset of k eigenvectors of C probabilistically, based on the eigenvalues of $\exp(-\eta C)$.
Online KPCA: everything about $\exp(-\eta C)$ can be computed in terms of K.
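As an illustration of computing with $\exp(-\eta C)$ in terms of K, the NumPy sketch below evaluates $\phi(x)^\top W \phi(x)$ for $W = \exp(-\eta C)/\operatorname{tr}(\exp(-\eta C))$ from kernel values only. It relies on $\exp(-\eta C) = I + \sum_i (e^{-\eta\lambda_i} - 1) v_i v_i^\top$ with $v_i = X u_i/\sqrt{\lambda_i}$ as in slide 19, and on knowing N (here $N = n^2$ for the quadratic map). Capping is left out; the function name and eta = 0.1 are assumptions.

```python
import numpy as np

def kernel(a, b):
    return float(a @ b) ** 2                   # quadratic kernel, feature dimension N = n^2

def exp_weighted_form(train, x, eta, N):
    """phi(x)^T W phi(x) for W = exp(-eta C) / tr(exp(-eta C)), C = sum_t phi(x_t) phi(x_t)^T,
    using kernel evaluations only."""
    K = np.array([[kernel(a, b) for b in train] for a in train])
    lam, U = np.linalg.eigh(K)
    keep = lam > 1e-10 * lam.max()             # nonzero eigenvalues of K (= those of C)
    lam, U = lam[keep], U[:, keep]
    kx = np.array([kernel(a, x) for a in train])
    proj = (U.T @ kx) / np.sqrt(lam)           # v_i^T phi(x) with v_i = X u_i / sqrt(lambda_i)
    # exp(-eta C) = I + sum_i (exp(-eta lambda_i) - 1) v_i v_i^T
    quad = kernel(x, x) + np.sum((np.exp(-eta * lam) - 1.0) * proj ** 2)
    trace = (N - len(lam)) + np.sum(np.exp(-eta * lam))
    return quad / trace

rng = np.random.default_rng(7)
train = rng.normal(size=(4, 3))
x_new = rng.normal(size=3)
value = exp_weighted_form(train, x_new, eta=0.1, N=9)
```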

22 Bound
$$\text{regret} \le \sqrt{2 \cdot \text{loss of best } k \text{ subspace} \cdot k \log\frac{N}{k}} + k \log\frac{N}{k}$$
N: dimension of the feature space

23 Kernelizable?
Vector case:
Additive updates: w is a linear combination of the expanded instances $\phi(x_t)$
Multiplicative updates: $w_i \propto e^{-\eta \sum_t \phi(x_t)_i}$
Our results:
Matrix multiplicative updates: $W \propto \exp\big(-\eta \sum_t \phi(x_t)\phi(x_t)^\top\big)$
Essentially any matrix update based on a spectral function f: $W \propto f\big(-\eta \sum_t \phi(x_t)\phi(x_t)^\top\big)$
Works only for rank-one instances $\phi(x)\phi(x)^\top$, the analogue of standard basis vectors in the vector case
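One way to see the connection to the vector case, sketched in NumPy under an illustrative assumption: if every instance is a standard basis vector $e_{i_t}$, then $\sum_t e_{i_t} e_{i_t}^\top$ is diagonal, and the normalized matrix exponential update is exactly the diagonal matrix of the vector multiplicative update $w_i \propto e^{-\eta \cdot \text{count}_i}$. The index sequence and eta are made up.

```python
import numpy as np

n, eta = 4, 0.7
seq = [0, 2, 2, 3, 0, 2]                     # indices i_t: instance t is e_{i_t}
E = np.eye(n)
A = sum(np.outer(E[i], E[i]) for i in seq)   # sum_t e_{i_t} e_{i_t}^T (diagonal)

vals, vecs = np.linalg.eigh(-eta * A)        # matrix update: W proportional to exp(-eta A)
W = (vecs * np.exp(vals)) @ vecs.T
W /= np.trace(W)

counts = np.bincount(seq, minlength=n)       # how often each e_i occurred
w = np.exp(-eta * counts)                    # vector update: w_i proportional to exp(-eta count_i)
w /= w.sum()
```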

24 Derivation and analysis
Online PCA:
$$W_{t+1} = \operatorname*{argmin}_{\operatorname{tr}(W) = 1,\ W \preceq \frac{1}{N-r} I}\ \underbrace{\Delta(W, W_t)}_{\text{quantum relative entropy}} + \eta \operatorname{tr}(W x_t x_t^\top)$$
Kernel online PCA:
$$W_{t+1} = \operatorname*{argmin}_{\operatorname{tr}(W) = 1,\ W \preceq \frac{1}{N-r} I} \Big( \Delta\big(W, \tfrac{1}{N} I\big) + \eta \sum_t \operatorname{tr}\big(W \phi(x_t)\phi(x_t)^\top\big) \Big)$$
Analysis: Bregman projection methods or duality
$$\Delta(W, U) = \operatorname{tr}\big(W(\log W - \log U)\big)$$
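A numerical check of the online PCA derivation in NumPy, ignoring the cap constraint $W \preceq \frac{1}{N-r} I$ (handled by the separate capping step of slide 15): without it, the minimizer of $\Delta(W, W_t) + \eta \operatorname{tr}(W x_t x_t^\top)$ over trace-one positive definite W is $\exp(\log W_t - \eta x_t x_t^\top)/Z$, and the minimum value is $-\log Z$. The data are random and the names hypothetical.

```python
import numpy as np

def mat_fn(A, f):
    """Apply the scalar function f to the eigenvalues of the symmetric matrix A."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * f(vals)) @ vecs.T

def quantum_relative_entropy(W, U):
    """Delta(W, U) = tr(W (log W - log U)) for trace-one positive definite W, U."""
    return float(np.trace(W @ (mat_fn(W, np.log) - mat_fn(U, np.log))))

rng = np.random.default_rng(8)
n, eta = 4, 0.8
B = rng.normal(size=(n, n))
W_t = B @ B.T + 0.1 * np.eye(n)
W_t /= np.trace(W_t)                       # current density matrix
x = rng.normal(size=n)

def objective(W):
    # Delta(W, W_t) + eta tr(W x x^T), using tr(W x x^T) = x^T W x
    return quantum_relative_entropy(W, W_t) + eta * float(x @ W @ x)

E = mat_fn(mat_fn(W_t, np.log) - eta * np.outer(x, x), np.exp)
Z = np.trace(E)
W_next = E / Z                             # closed-form (uncapped) update
```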

25 What's next
Bounds for estimating the center online
Shifting with kernels, in particular long-term memory
Approximation algorithms for efficiency
Other applications of the matrix soft min and soft smallest k

26 [Figure: additional loss of the online algorithm]
