On-line Variance Minimization

1 On-line Variance Minimization Manfred Warmuth Dima Kuzmin University of California - Santa Cruz 19th Annual Conference on Learning Theory M. Warmuth, D. Kuzmin (UCSC) On-line Variance Minimization COLT06 1 / 28

2 Outline
1 Variance
2 Variance minimization on simplex
3 Variance minimization on unit sphere
4 On-line PCA
5 What's next?

4 Variance: Variance
A symmetric positive definite matrix $C$ is the covariance matrix of some random vector $p \in \mathbb{R}^n$:
$C = E\big((p - E(p))(p - E(p))^\top\big)$
The variance along any vector $w$ is
$V(p^\top w) = E\big((p^\top w - E(p^\top w))^2\big) = E\big(((p - E(p))^\top w)^2\big) = w^\top E\big((p - E(p))(p - E(p))^\top\big)\, w = w^\top C w$
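The identity $V(p^\top w) = w^\top C w$ also holds exactly for empirical moments, which makes it easy to check numerically. The sketch below is illustrative only: the sample data, the vector `w`, and all function names are assumptions, not part of the slides.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def covariance(samples):
    """Empirical covariance matrix E((p - E p)(p - E p)^T) of a list of vectors."""
    n = len(samples[0])
    mu = [mean([p[i] for p in samples]) for i in range(n)]
    return [[mean([(p[i] - mu[i]) * (p[j] - mu[j]) for p in samples])
             for j in range(n)] for i in range(n)]

def quad_form(w, C):
    """w^T C w."""
    n = len(w)
    return sum(w[i] * C[i][j] * w[j] for i in range(n) for j in range(n))

def variance_along(samples, w):
    """Empirical variance of the scalar p^T w."""
    proj = [sum(pi * wi for pi, wi in zip(p, w)) for p in samples]
    m = mean(proj)
    return mean([(x - m) ** 2 for x in proj])

# Hypothetical data: 500 samples of a 3-dimensional random vector.
rng = random.Random(0)
samples = [[rng.gauss(0, 1), rng.gauss(0, 2), rng.gauss(0, 0.5)] for _ in range(500)]
w = [0.2, 0.5, 0.3]
C = covariance(samples)

direct = variance_along(samples, w)   # variance of the projections
via_C = quad_form(w, C)               # w^T C w
```

The two quantities `direct` and `via_C` agree up to floating-point rounding.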

5 Variance: Variance minimization problem
Setup: on-line learning problem. In each trial $t$:
Pick a vector $w_t$
Receive a covariance matrix $C_t$
Loss is the variance along vector $w_t$: $L_t(w_t) = w_t^\top C_t w_t$
Goal: achieve loss close to the variance along the best vector picked in hindsight:
$L_{best} = \inf_u u^\top \big(\sum_t C_t\big) u$

7 Variance minimization on simplex
$w_t$ is a probability vector
Variance minimization in portfolio selection:
$p$ is a random vector of relative stock price changes
$w$ is the vector of investment proportions into $n$ stocks
$w^\top p$ is our capital gain
$w^\top C w$ is the variance of the gain

8 Variance minimization on simplex: Exponentiated Gradient Algorithm
Maintains a probability vector:
$w_{t+1,i} = \frac{w_{t,i}\, e^{-\eta (C_t w_t)_i}}{Z}$
Motivation:
$w_{t+1} = \arg\inf_{w \in \text{simplex}} \sum_i w_i \ln \frac{w_i}{w_{t,i}} + \eta\, w^\top C_t w$
Bound:
$\mathrm{Var}_{alg} \le \mathrm{Var}_{best} + 2\sqrt{r\, \mathrm{Var}_{best} \log n} + r \log n$
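A minimal sketch of the Exponentiated Gradient update $w_{t+1,i} = w_{t,i}\, e^{-\eta (C_t w_t)_i} / Z$ in plain Python. The covariance matrix, learning rate, and number of trials are made-up values for illustration, and the same matrix is presented in every trial, which is a simplifying assumption.

```python
import math

def quad_form(w, C):
    """Variance along w: w^T C w."""
    n = len(w)
    return sum(w[i] * C[i][j] * w[j] for i in range(n) for j in range(n))

def eg_update(w, C, eta):
    """One EG step: multiply each weight by exp(-eta * (C w)_i), then renormalize."""
    n = len(w)
    Cw = [sum(C[i][j] * w[j] for j in range(n)) for i in range(n)]
    unnorm = [wi * math.exp(-eta * g) for wi, g in zip(w, Cw)]
    Z = sum(unnorm)
    return [u / Z for u in unnorm]

# Hypothetical example: coordinate 0 has low variance, coordinate 1 has high variance.
C = [[0.1, 0.0], [0.0, 1.0]]
w = [0.5, 0.5]
total_variance = 0.0
for _ in range(200):
    total_variance += quad_form(w, C)   # loss charged in this trial
    w = eg_update(w, C, eta=0.5)
```

On this toy input the update keeps `w` on the simplex and shifts most of the weight to the low-variance coordinate.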

9 Variance minimization on simplex: To do
Minimize the tradeoff between gain and variance:
Pick a vector $w_t$
Receive a covariance matrix $C_t$ and a gain vector $p_t$
Charge $-\underbrace{w_t^\top p_t}_{\text{gain}} + \delta\, \underbrace{w_t^\top C_t w_t}_{\text{variance}}$

11 Variance minimization on unit sphere: Variance of unit vectors
The ellipse is the plot of the vector $Cw$, where $w$ is a unit vector
The outer figure eight is the direction $w$ times the variance $w^\top C w$
For an eigenvector, the variance equals the eigenvalue and touches the ellipse

12 Variance minimization on unit sphere: Mixtures of Directions
Our algorithm will pick a direction $w_i$ with probability $\omega_i$
Expected variance:
$\sum_i \omega_i \underbrace{w_i^\top C w_i}_{\text{variance in direction } w_i} = \sum_i \omega_i\, \mathrm{tr}(C w_i w_i^\top) = \mathrm{tr}\Big(C \underbrace{\sum_i \omega_i w_i w_i^\top}_{\text{density matrix } W}\Big)$
Definition: $ww^\top$ for a unit vector $w$ is called a dyad
Symmetric positive semidefinite matrix of rank one
Trace one: $\mathrm{tr}(ww^\top) = w^\top w = \|w\|_2^2 = 1$
Projection matrix onto direction $w$

13 Variance minimization on unit sphere: Density Matrices
Convex combinations of dyads
Symmetric positive semidefinite matrices of trace one
Eigenvalues form a probability vector
Many mixtures lead to the same matrix
Can always be written as a convex combination of $n$ dyads corresponding to eigenvectors
Diagonal case: $\sum_i \omega_i e_i e_i^\top$

14 Variance minimization on unit sphere: Variance Minimization with Density Matrices
Setup:
Pick a density matrix $W_t = \sum_i \omega_{t,i} w_{t,i} w_{t,i}^\top$
Pick direction $w_{t,i}$ with probability $\omega_{t,i}$
Covariance matrix $C_t$ is obtained
Loss is the expected variance: $L_t(W_t) = \sum_i \omega_{t,i}\, w_{t,i}^\top C_t w_{t,i} = \mathrm{tr}(W_t C_t)$
Goal: do as well as the best density matrix, which is a single dyad corresponding to the smallest eigenvalue of $\sum_t C_t$
Expert setting retained as the diagonal case:
$\omega_t \cdot l_t = \mathrm{tr}\big(\mathrm{diag}(\omega_{t,1}, \omega_{t,2}, \omega_{t,3}, \omega_{t,4})\, \mathrm{diag}(l_{t,1}, l_{t,2}, l_{t,3}, l_{t,4})\big)$
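The following sketch checks two claims numerically: the expected variance of a mixture of directions equals $\mathrm{tr}(WC)$, and different mixtures can produce the same density matrix. The matrix `C` and both mixtures are hypothetical values chosen for illustration.

```python
import math

def dyad(w):
    """Outer product w w^T of a unit vector w."""
    return [[wi * wj for wj in w] for wi in w]

def density_matrix(probs, dirs):
    """W = sum_i omega_i w_i w_i^T for probabilities omega_i and unit vectors w_i."""
    n = len(dirs[0])
    W = [[0.0] * n for _ in range(n)]
    for p, w in zip(probs, dirs):
        D = dyad(w)
        for i in range(n):
            for j in range(n):
                W[i][j] += p * D[i][j]
    return W

def trace_product(A, B):
    """tr(A B)."""
    n = len(A)
    return sum(A[i][j] * B[j][i] for i in range(n) for j in range(n))

def variance_along(w, C):
    """w^T C w."""
    n = len(w)
    return sum(w[i] * C[i][j] * w[j] for i in range(n) for j in range(n))

r = 1.0 / math.sqrt(2.0)
C = [[0.6, 0.3], [0.3, 0.3]]                        # hypothetical covariance matrix
mix_a = ([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])      # standard basis directions
mix_b = ([0.5, 0.5], [[r, r], [r, -r]])             # basis rotated by 45 degrees

W_a = density_matrix(*mix_a)
W_b = density_matrix(*mix_b)
exp_var_a = sum(p * variance_along(w, C) for p, w in zip(*mix_a))
exp_var_b = sum(p * variance_along(w, C) for p, w in zip(*mix_b))
```

Both mixtures give the same matrix (half the identity), so both have the same expected variance, and that value equals `trace_product(W_a, C)`.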

15 Variance minimization on unit sphere: Deriving the Algorithm
$W_{t+1} = \arg\inf_{U \text{ dens. mat.}} \underbrace{\mathrm{tr}\big(U(\log U - \log W_t)\big)}_{\text{quantum relative entropy}} + \eta\, \underbrace{\mathrm{tr}(U C_t)}_{\text{expected variance}}$
$W_{t+1} = \frac{\exp(\log W_t - \eta C_t)}{\mathrm{tr}\big(\exp(\log W_t - \eta C_t)\big)}$
Here $\log W_t$ and $\eta C_t$ are symmetric, and $\exp(\log W_t - \eta C_t)$ is symmetric positive definite
$\log$, $\exp$ are the matrix versions of logarithm and exponential
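A small sketch of the update $W_{t+1} = \exp(\log W_t - \eta C_t) / \mathrm{tr}(\exp(\log W_t - \eta C_t))$, restricted to $2 \times 2$ symmetric matrices so that the matrix logarithm and exponential can be computed from a closed-form eigendecomposition. The restriction to two dimensions, the helper names, and the example matrices are all assumptions made for illustration; a practical implementation would use a general symmetric eigensolver.

```python
import math

def sym2_eig(A):
    """Eigenvalues l1, l2 and rotation angle theta of symmetric 2x2 A,
    with eigenvectors (cos t, sin t) and (-sin t, cos t)."""
    a, b, d = A[0][0], A[0][1], A[1][1]
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    l1 = a * c * c + 2.0 * b * c * s + d * s * s
    l2 = a * s * s - 2.0 * b * c * s + d * c * c
    return l1, l2, theta

def sym2_apply(A, f):
    """Matrix function: apply scalar f to the eigenvalues of symmetric 2x2 A."""
    l1, l2, theta = sym2_eig(A)
    c, s = math.cos(theta), math.sin(theta)
    f1, f2 = f(l1), f(l2)
    off = (f1 - f2) * c * s
    return [[f1 * c * c + f2 * s * s, off],
            [off, f1 * s * s + f2 * c * c]]

def matrix_eg_update(W, C, eta):
    """W must be strictly positive definite so that log W exists."""
    logW = sym2_apply(W, math.log)
    M = [[logW[i][j] - eta * C[i][j] for j in range(2)] for i in range(2)]
    E = sym2_apply(M, math.exp)
    tr = E[0][0] + E[1][1]
    return [[E[i][j] / tr for j in range(2)] for i in range(2)]

def expected_variance(W, C):
    """tr(W C)."""
    return sum(W[i][j] * C[j][i] for i in range(2) for j in range(2))

# Hypothetical run: same covariance matrix in every trial, uniform start.
C = [[0.6, 0.3], [0.3, 0.3]]
W = [[0.5, 0.0], [0.0, 0.5]]
initial_loss = expected_variance(W, C)
for _ in range(20):
    W = matrix_eg_update(W, C, eta=0.5)

# Diagonal case: the update reduces to multiplying weights by exp(-eta * loss).
D = matrix_eg_update([[0.5, 0.0], [0.0, 0.5]], [[0.2, 0.0], [0.0, 0.8]], 1.0)
```

The iterate stays a density matrix (symmetric, trace one) and its expected variance drops toward the smallest eigenvalue of `C`. The number of iterations is kept small on purpose: recomputing $\log W_t$ from a nearly singular $W_t$ is numerically fragile, so a more careful implementation would likely maintain $\log W_t$ directly.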

16 Variance minimization on unit sphere: Bounds Generalize
$\underbrace{\sum_{t=1}^T \mathrm{tr}(W_t C_t)}_{\text{loss of algorithm}} \le \frac{\eta \overbrace{\sum_{t=1}^T \mathrm{tr}(U C_t)}^{\text{loss of comparator}} + \log n}{1 - e^{-\eta}}$
loss of alg. $\le$ loss of best dens. $+ \sqrt{2\,(\text{loss of best dens.}) \log n} + \log n$
Assumption: max. eigenvalue of $C_t \le 1$

18 On-line PCA: Best m experts
Pick a set of $m$ experts $\{i_1, \dots, i_m\}$ based on probability vector $w_t$
Receive loss vector $l_t$
Loss is the total loss of the $m$ experts, $l_{i_1} + \dots + l_{i_m}$, and the expected loss is $m\, w_t \cdot l_t$
Update $w_t$
Minimizing loss $l$ on $m$ experts is equivalent to maximizing gain $l$ on the remaining $n - m$ experts

19 On-line PCA: New trick: cap weights
Super predator algorithm
Preserves variety

20 On-line PCA
Weights $\le \frac{1}{m}$
$\hat{w}_{t,i} = \frac{w_{t,i}\, e^{-\eta l_{t,i}}}{Z}$
$w_{t+1} = \arg\inf_{w \text{ in capped simplex}} \Delta(w, \hat{w}_t)$
expected loss of alg. $\le$ loss of best $m$-set $+ \sqrt{2\,(\text{loss of best } m\text{-set})\, m \log n} + m \log n$

21 On-line PCA: Why capping?
$m$-sets are encoded as probability vectors such as $(0, \frac{1}{m}, 0, 0, \frac{1}{m}, 0, \frac{1}{m})$, called $m$-corners
The convex hull of the $m$-corners = capped probability simplex
We can effectively decompose any capped probability vector $w$ as a convex combination of $n$ $m$-corners: $w = \sum_j \alpha_j r_j$
Alternatives to capping:
Dynamic programming: too expensive
Follow the perturbed leader: cheap but inferior bounds
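A hedged sketch of the capped update: a multiplicative step followed by a projection that forces every weight to be at most $1/m$. The slides do not spell out how the projection is computed; the code assumes it can be done by setting the largest weights to $1/m$ and rescaling the remaining weights proportionally, repeating until no weight exceeds the cap. Function names and the example numbers are illustrative.

```python
import math

def cap_weights(w, m):
    """Cap a probability vector at 1/m (assumed projection procedure):
    fix the k largest weights at 1/m and rescale the rest, for the smallest k that works.
    Assumes strictly positive weights and m <= len(w)."""
    cap = 1.0 / m
    n = len(w)
    order = sorted(range(n), key=lambda i: w[i], reverse=True)
    for k in range(n):
        capped, rest = order[:k], order[k:]
        rest_mass = sum(w[i] for i in rest)
        scale = (1.0 - k * cap) / rest_mass
        # rest[0] is the largest uncapped weight; if it fits under the cap, all do.
        if w[rest[0]] * scale <= cap + 1e-12:
            out = list(w)
            for i in capped:
                out[i] = cap
            for i in rest:
                out[i] = w[i] * scale
            return out
    raise ValueError("no feasible capping found; check that m <= len(w)")

def capped_update(w, losses, eta, m):
    """Multiplicative update w_i * exp(-eta * l_i) / Z, followed by capping."""
    unnorm = [wi * math.exp(-eta * li) for wi, li in zip(w, losses)]
    Z = sum(unnorm)
    return cap_weights([u / Z for u in unnorm], m)

# Hypothetical example with n = 4 experts and sets of size m = 2.
capped = cap_weights([0.7, 0.1, 0.1, 0.1], 2)
updated = capped_update([0.25, 0.25, 0.25, 0.25], [0.0, 1.0, 1.0, 1.0], eta=2.0, m=2)
```

In the first example only the weight 0.7 exceeds the cap 0.5, so it is set to 0.5 and the other three weights are scaled up to share the remaining mass equally.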

22 On-line PCA: PCA
On-line projection of data into a low-dimensional subspace
Best subspace in hindsight: $k$ top eigenvectors of the data covariance matrix

23 On-line PCA: Rewrite quadratic loss into linear loss
Want a rank $k$ projection matrix $P$ that minimizes the total square loss:
$\|\underbrace{P}_{\text{rank } k} x - x\|_2^2 = \|(I - P)x\|_2^2 = \mathrm{tr}\big(\underbrace{(I - P)}_{\text{rank } n-k}\, x x^\top\big)$
Want to choose an $(n-k)$-dimensional subspace of minimum variance
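The identity $\|Px - x\|_2^2 = \mathrm{tr}((I - P)\, x x^\top)$ can be checked on a small example. The sketch below uses a rank-one projection $P = uu^\top$ in three dimensions; the vectors `u` and `x` are arbitrary illustrative choices.

```python
def outer(a, b):
    """Outer product a b^T."""
    return [[ai * bj for bj in b] for ai in a]

def matvec(A, x):
    """Matrix-vector product A x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def trace_product(A, B):
    """tr(A B)."""
    n = len(A)
    return sum(A[i][j] * B[j][i] for i in range(n) for j in range(n))

u = [0.6, 0.8, 0.0]          # unit vector spanning the kept subspace (k = 1)
x = [1.0, 2.0, 3.0]          # instance to be compressed
n = len(x)

P = outer(u, u)              # rank-1 projection onto u
I_minus_P = [[(1.0 if i == j else 0.0) - P[i][j] for j in range(n)] for i in range(n)]

Px = matvec(P, x)
quadratic_loss = sum((pi - xi) ** 2 for pi, xi in zip(Px, x))   # ||Px - x||^2
linear_loss = trace_product(I_minus_P, outer(x, x))             # tr((I - P) x x^T)
```

Both expressions evaluate to the squared length of the component of `x` orthogonal to `u`, which is why the compression loss is linear in the matrix $I - P$.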

24 On-line PCA: Lift sets of expert alg. to matrices
Pick an $(n-k)$-dimensional subspace based on density matrix $W_t$
Choose the complementary subspace $P_t$
Receive instance $x_t$
Incur loss $\|P_t x_t - x_t\|_2^2$ and expected loss $(n-k)\, \mathrm{tr}(W_t x_t x_t^\top)$
Update $W_t$

25 On-line PCA: Update and Winnow-like bound
expected loss of alg. $\le$ loss of best $k$-subspace $+ \sqrt{2\,(\text{loss of best } k\text{-subspace})\, k \log n} + k \log n$
$\hat{W}_t = \frac{\exp(\log W_t - \eta\, x_t x_t^\top)}{\mathrm{tr}\big(\exp(\log W_t - \eta\, x_t x_t^\top)\big)}$
$W_{t+1} = \arg\inf_{W \text{ dens. matrix with eigenvalues} \le \frac{1}{n-k}} \Delta(W, \hat{W}_t)$

26 On-line PCA: Two families again
Regularize with W W [Crammer 06]:
$W$ = lin. comb. of $x_t x_t^\top$
Fast and kernelizable
Regularize with quantum relative entropy:
$W = \exp(\text{lin. comb. of } x_t x_t^\top) / Z$
Predict with a random projection matrix
Regret bounds instead of filtering loss
Key insight: mixtures of experts generalize to density matrices

28 What's next?
Shifting methodology from the expert setting carries over
Experiments
Survey on "The Blessing and Curse of the Multiplicative Updates":
Adapt quickly
Loss of variety
Connections to Biology
Work out probability calculus for density matrices [Impromptu by Dima]

Online Kernel PCA with Entropic Matrix Updates

Online Kernel PCA with Entropic Matrix Updates Dima Kuzmin Manfred K. Warmuth University of California - Santa Cruz ICML 2007, Corvallis, Oregon April 23, 2008 D. Kuzmin, M. Warmuth (UCSC) Online Kernel