On the Eigenspectrum of the Gram Matrix and the Generalisation Error of Kernel PCA (Shawe-Taylor, et al. 2005) Ameet Talwalkar 02/13/07


1 On the Eigenspectrum of the Gram Matrix and the Generalisation Error of Kernel PCA (Shawe-Taylor, et al. 2005). Ameet Talwalkar, 02/13/07.

2 Outline: Background and Motivation; PCA, MDS; Isomap; Kernel PCA; Generalisation Error of Kernel PCA.

3 Lossy Dimensional Reduction: Motivation. Computational efficiency. Visualization of data requires 2D or 3D representations. Curse of Dimensionality: learning algorithms require reasonably good sampling, so dimensional reduction can turn an intractable learning problem into a tractable learning problem. Lossless: manifold learning assumes the existence of an intrinsic dimension, or a reduced representation containing all independent variables.

4 Linear Dimensional Reduction. Assumes the input data is a linear function of the independent variables. Common methods: Principal Component Analysis (PCA), Multidimensional Scaling (MDS).

5 PCA Big Picture. Linearly transform the input data in a way that: maximizes signal (variance); minimizes redundancy of signal (covariance).

6 PCA Simple Example. Original data points, e.g. shoe size measured in ft and cm; a single direction y provides a good approximation of the data.

7 PCA Simple Example (cont.). Original data restored using only the first principal component.

8 PCA Covariance. Covariance is a measure of how much two variables vary together: cov(x, y) = E[(x − x̄)(y − ȳ)], and cov(x, x) = var(x). If x and y are independent, then cov(x, y) = 0.

9 PCA Covariance Matrix. Stores the pairwise covariances of the variables; the diagonals are the variances. Symmetric, positive semi-definite. Start with m column vector observations of n variables; the covariance is an n × n matrix: C_X = E[(X − E[X])(X − E[X])^T] = (1/m) X X^T for centered data.
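The covariance matrix on this slide is easy to verify numerically. A minimal numpy sketch (my own illustration, not from the slides), with X holding m column-vector observations of n variables:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 500                                    # n variables, m observations
X = rng.normal(size=(n, m))                      # columns are observations
X = X - X.mean(axis=1, keepdims=True)            # center each variable

C = (X @ X.T) / m                                # n x n covariance matrix C_X
print(np.allclose(C, C.T))                       # symmetric
print(np.all(np.linalg.eigvalsh(C) >= -1e-12))   # positive semi-definite
```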

10 Eigendecomposition. Eigenvectors v and eigenvalues λ for an n × n matrix A are pairs (v, λ) such that Av = λv. If A is a real symmetric matrix, it can be diagonalized into A = E D E^T, where E holds A's orthonormal eigenvectors and D is a diagonal matrix of A's eigenvalues. A positive semi-definite ⇒ eigenvalues are non-negative.
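A quick numerical check of the diagonalization claim (an illustrative sketch, not part of the slides): numpy's eigh handles real symmetric matrices and returns orthonormal eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(4, 4))
A = B @ B.T                                # real, symmetric, positive semi-definite

evals, E = np.linalg.eigh(A)               # columns of E are orthonormal eigenvectors
D = np.diag(evals)

print(np.allclose(A, E @ D @ E.T))         # A = E D E^T
print(np.allclose(E.T @ E, np.eye(4)))     # E is orthonormal
print(np.all(evals >= -1e-12))             # PSD => non-negative eigenvalues
```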

11 PCA Goal. Linearly transform the input data in a way that: maximizes signal (variance); minimizes redundancy of signal (covariance). Algorithm: select the variance-maximizing direction in input space; find the next variance-maximizing direction that is orthogonal to all previously selected directions; repeat m − 1 times. Equivalently, find a transformation P such that Y = PX and C_Y is diagonalized. Solution: project the data onto the eigenvectors of C_X.

12 PCA Algorithm. Goal: find P where Y = PX s.t. C_Y is diagonalized. C_Y = (1/m) Y Y^T = (1/m)(PX)(PX)^T = P ((1/m) X X^T) P^T = P A P^T, where A = (1/m) X X^T = E D E^T (note: the eigenvectors in E are orthonormal). Select P = E^T, i.e. a matrix where each row is an eigenvector of C_X. Then C_Y = P A P^T = P (P^T D P) P^T = (P P^T) D (P P^T) = D (inverse = transpose for an orthonormal matrix), so C_Y is diagonalized. The PCs are the eigenvectors of C_X; the i-th diagonal value of C_Y is the variance of X along p_i.
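A minimal PCA implementation following this recipe (a sketch under the slide's conventions, with observations in columns and P built from eigenvectors of C_X; the function and variable names are my own):

```python
import numpy as np

def pca(X, k):
    """PCA on X (n variables x m observations); returns k x m projections Y = P X."""
    m = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)
    C = (Xc @ Xc.T) / m                       # covariance matrix C_X
    evals, E = np.linalg.eigh(C)              # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]           # sort descending
    P = E[:, order[:k]].T                     # rows of P are the top-k eigenvectors
    return P @ Xc, evals[order[:k]]           # projections and their variances

rng = np.random.default_rng(2)
X = rng.normal(size=(2, 1)) @ rng.normal(size=(1, 200))   # roughly 1-D data in 2-D
X += 0.05 * rng.normal(size=X.shape)                      # small noise
Y, var = pca(X, 1)
print(var)    # variance of X along the first principal component
```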

13 Gram Matrix (Kernel Matrix). Given X, a collection of m column vector observations of n variables. Gram matrix: the m × m matrix of dot products of the inputs; a real, symmetric, positive semi-definite similarity matrix, K = X^T X.

14 Classical Multidimensional Scaling. Given m objects and a dissimilarity δ_ij for each pair, find a space in which δ_ij ≈ Euclidean distance. If δ_ij is Euclidean distance: we can convert the dissimilarity matrix to a Gram matrix (or we can just start with the Gram matrix), and MDS yields the same answer as PCA.

15 Classical Multidimensional Scaling. Convert the dissimilarity matrix to a Gram matrix K. Eigendecomposition of K: K = E D E^T = (E D^{1/2})(D^{1/2} E^T); since K = X^T X, take X = D^{1/2} E^T. Reduce dimension: construct X from a subset of the eigenvectors/eigenvalues. Identical to PCA.
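A sketch of classical MDS as described on this slide, assuming Euclidean dissimilarities; the dissimilarity-to-Gram conversion uses the standard double-centering step, which the slide does not spell out:

```python
import numpy as np

def classical_mds(D, k):
    """Embed m objects in k dimensions from an m x m distance matrix D."""
    m = D.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m       # centering matrix
    K = -0.5 * J @ (D ** 2) @ J               # Gram matrix from squared distances
    evals, E = np.linalg.eigh(K)
    order = np.argsort(evals)[::-1][:k]       # keep the top-k eigenpairs
    L = np.sqrt(np.maximum(evals[order], 0))  # D^{1/2}
    return (E[:, order] * L).T                # X = D^{1/2} E^T  (k x m)

rng = np.random.default_rng(3)
pts = rng.normal(size=(50, 2))                                    # true 2-D points
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)    # pairwise distances
X = classical_mds(D, 2)                                           # recovers pts up to rotation/reflection
print(X.shape)
```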

16 Limitations of Linear Methods. Cannot account for a non-linear relationship of the data in input space (small Euclidean distance but large geodesic distance). The data may still have a linear relationship in some feature space. Isomap: use geodesic distance, the length of the shortest curve on a manifold connecting two points on the manifold, to recover the manifold.

17 Local Estimation of Manifolds. Small patches on a non-linear manifold look linear. Locally linear neighborhoods are defined in two ways: k-nearest neighbors (find the k nearest points to a given point); ε-ball (find all points that lie within ε of a given point).
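Both neighborhood definitions translate directly into code; a small sketch (the helper names and parameters are my own):

```python
import numpy as np

def knn_neighbors(D, k):
    """Indices of the k nearest points to each point, from an m x m distance matrix D."""
    return np.argsort(D, axis=1)[:, 1:k + 1]          # column 0 is the point itself

def eps_ball_neighbors(D, eps):
    """For each point, indices of all other points lying within distance eps."""
    m = D.shape[0]
    return [np.flatnonzero((D[i] <= eps) & (np.arange(m) != i)) for i in range(m)]
```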

18 Isomap Idea. Create a weighted graph: vertices = datapoints, edges between neighbors weighted by Euclidean distance. Distance matrix = pairwise shortest paths through the graph. Construct a d-dimensional embedding: perform MDS and eyeball the residual variance.
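A minimal Isomap sketch following these three steps (neighborhood graph, shortest paths, MDS). It reuses the hypothetical classical_mds and knn_neighbors helpers sketched above and assumes scipy is available for the shortest-path computation:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def isomap(X, k, d):
    """X: m x n data points (one per row); returns a d x m embedding.

    Assumes the k-NN graph is connected (otherwise some graph distances are infinite).
    """
    m = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # Euclidean distances
    W = np.zeros((m, m))                                         # weighted neighbor graph (0 = no edge)
    nn = knn_neighbors(D, k)
    for i in range(m):
        W[i, nn[i]] = D[i, nn[i]]
        W[nn[i], i] = D[i, nn[i]]                                # symmetrize
    G = shortest_path(W, method='D', directed=False)             # geodesic distance estimates
    return classical_mds(G, d)                                   # MDS on the graph distances
```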

19 Eyeballing Intrinsic Dimension.

20 Isomap Convergence. Guaranteed to asymptotically recover convex Euclidean manifolds. For a sufficiently high density of data points, given arbitrarily small values λ1, λ2 and µ, with probability at least 1 − µ: (1 − λ1) · geodesic distance ≤ graph distance ≤ (1 + λ2) · geodesic distance. The rate of convergence depends on the density of points and on properties of the underlying manifold (radius of curvature, branch separation).

21 Kernel Functions. A kernel function is a similarity measure between two vectors. Define a non-linear mapping from input space to a high-dimensional feature space, Φ : X → F. Define κ such that κ(x, y) = ⟨Φ(x), Φ(y)⟩. Efficiency: κ may be much more efficient to compute than the mapping and dot product in the high-dimensional space. Flexibility: κ can be chosen arbitrarily so long as it is positive definite symmetric.

22 Positive Definite Symmetric (PDS) Kernels. Given m column vector observations of n variables. Kernel matrix: the m × m matrix in which K_ij = κ(x_i, x_j). A kernel κ is PDS if K is symmetric and positive semi-definite. If K is positive semi-definite, then κ is the dot product in some dot product space (feature space).
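A small sketch building a kernel matrix and checking the PDS property numerically, using the Gaussian (RBF) kernel as one example of a valid kernel (the kernel choice and bandwidth are my own, not from the slides):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """K[i, j] = exp(-gamma * ||x_i - y_j||^2) for the rows of X and Y."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 5))
K = rbf_kernel(X, X)

print(np.allclose(K, K.T))                     # symmetric
print(np.linalg.eigvalsh(K).min() >= -1e-8)    # positive semi-definite (up to round-off)
```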

23 Kernel Trick. For any algorithm relying solely on dot products, we can replace the dot product with a positive-definite kernel. Allows for non-linearity. Example: PCA.

24 Kernel PCA. PCA: the eigenvectors of the covariance matrix are the principal components, and the problem can be rewritten solely with dot products. Kernel PCA: in feature space the covariance matrix is C = (1/m) Σ_j Φ(x_j) Φ(x_j)^T, and we seek eigenpairs C v = λ v with v = Σ_i α_i Φ(x_i); multiplying by Φ(x_k) gives λ ⟨Φ(x_k), v⟩ = ⟨Φ(x_k), C v⟩ for all k.

25 Kernel PCA. Substituting v = Σ_i α_i Φ(x_i) and using the kernel matrix K_ij = κ(x_i, x_j) = ⟨Φ(x_i), Φ(x_j)⟩ turns the eigenproblem into mλ K α = K² α, which is satisfied by the solutions of the kernel matrix eigenproblem mλ α = K α.

26 Kernel PCA. K is the kernel (Gram) matrix. Use eigendecomposition on K to find the eigenvectors α. Project test points in F onto a subset of the eigenvectors (dimension reduction): ⟨v, Φ(x)⟩ = Σ_i α_i κ(x_i, x).
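A minimal kernel PCA sketch along the lines of slides 24-26 (my own illustration): eigendecompose the kernel matrix, rescale the coefficient vectors so the feature-space eigenvectors have unit norm, and read off the projections via kernel evaluations. Centering in feature space is implicit on the slides and is included here for completeness; rbf_kernel is the hypothetical helper sketched above.

```python
import numpy as np

def kernel_pca(K, d):
    """Kernel PCA from an m x m kernel matrix K; returns the m x d training projections."""
    m = K.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m
    Kc = J @ K @ J                                # center the data in feature space
    evals, A = np.linalg.eigh(Kc)                 # solves K alpha = (m*lambda) alpha
    order = np.argsort(evals)[::-1][:d]
    evals, A = evals[order], A[:, order]
    A = A / np.sqrt(np.maximum(evals, 1e-12))     # so that <v_i, v_i> = 1 in F
    return Kc @ A                                 # <v_i, Phi(x_j)> for the training points

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 3))
K = rbf_kernel(X, X)
Z = kernel_pca(K, 2)
print(Z.shape)                                    # (200, 2)
```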

27 Theory behind dimensional reduction? Dimensional reduction has gained popularity since Isomap and LLE were published, but there is not much theory behind it (Isomap is an exception). Assuming the existence of an underlying manifold: do the various dim. red. algorithms converge to the correct manifold? What is the rate of convergence, i.e. given an input X of m points, how close is d_red(X) to the underlying manifold?

28 Why focus on KPCA? It is a generalization of dimensional reduction: LLE and Isomap are forms of KPCA. Residual variance is an intuitive measurement of accuracy. The limit is clear and provable: given an underlying manifold with dimension k, as m approaches infinity the residual variance approaches 0. The paper also uses residual variance to measure dim. red. accuracy in the finite case.

29 What we're interested in. Residual variance = Σ_{i>k} λ_i; captured variance = Σ_{i≤k} λ_i (together they sum to Σ_i λ_i). This paper provides bounds for the sums of these process eigenvalues as a function of the empirical eigenvalues.
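In code these are just partial sums of the (descending) eigenvalue spectrum; a tiny sketch:

```python
import numpy as np

def captured_and_residual(eigenvalues, k):
    """Split a spectrum into captured and residual variance at dimension k."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    return lam[:k].sum(), lam[k:].sum()

cap, res = captured_and_residual([3.0, 1.5, 0.3, 0.1, 0.05], k=2)
print(cap, res)    # 4.5 0.45
```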

30 Empirical eigenvalues. Perform PCA on a sample S of m points in feature space: the empirical covariance operator is C_S = (1/m) Σ_j Φ(x_j) Φ(x_j)^T, with eigenpairs C_S v = μ̂ v. Writing v = Σ_i α_i Φ(x_i) and multiplying by Φ(x_k) as before gives (1/m) K α = μ̂ α. Note: the μ̂_i are the eigenvalues of C_S, and μ̂_i = λ̂_i / m, where the λ̂_i are the eigenvalues of the kernel matrix K.

31 Process eigenvalues. Empirical eigenproblem: (1/m) Σ_j κ(x_i, x_j) y_j = μ̂ y_i. As m approaches infinity, this becomes ∫_X κ(x, y) f(y) p(y) dy = λ f(x), for a given kernel function κ and density p on a space X. μ̂_i = λ̂_i / m is an estimate for the process eigenvalue λ_i.
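The estimate μ̂_i = λ̂_i / m can be illustrated by simulation: as m grows, the scaled kernel-matrix eigenvalues settle toward fixed values, the process eigenvalues of the kernel under the sampling density. The kernel, bandwidth, and density below are my own choices (reusing the hypothetical rbf_kernel above):

```python
import numpy as np

rng = np.random.default_rng(6)
for m in (50, 200, 800):
    X = rng.uniform(-1, 1, size=(m, 1))               # density p: uniform on [-1, 1]
    K = rbf_kernel(X, X, gamma=2.0)
    lam_hat = np.sort(np.linalg.eigvalsh(K))[::-1]    # empirical kernel-matrix eigenvalues
    print(m, (lam_hat[:4] / m).round(4))              # lambda_hat_i / m: estimates of the top process eigenvalues
```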

32 Projections onto Subspaces. P_V(Φ(x)): projection of Φ(x) onto a subspace V. P⊥_V(Φ(x)): projection onto the orthogonal complement of V, i.e. the residual of the projection onto V, the distance between the original point and its projection.

33 Eigenvalues and Projections. λ1(K_q) = max_{v∈F} E_q[‖P_v(Φ(x))‖²] = E_q[‖Φ(x)‖²] − min_{v∈F} E_q[‖P⊥_v(Φ(x))‖²]. The equations are optimized when v is the first eigenvector of K_q: the first eigenvalue of the operator K_q equals the expected squared norm of the projection onto its first eigenvector. Intuition: the first eigenvector is the direction for which the expected square of the residual is minimal. q defines the distribution underlying K_q, so the general formula is applicable to both the empirical and the process cases.

34 Empirical/Process Expectations of Empirical/Process Subspaces. The first two equations follow from the last slide: E[‖P_{V_k}(Φ(x))‖²] = Σ_{i≤k} λ_i and Ê[‖P_{V̂_k}(Φ(x))‖²] = Σ_{i≤k} μ̂_i, where E is the expectation over the underlying distribution and Ê is the empirical average over S. E[‖P_{V̂_k}(Φ(x))‖²]: the average over the entire distribution of the squared norm of the projection onto the first k empirical eigenvectors (agreed?). Ê[‖P_{V_k}(Φ(x))‖²]: the empirical average of the squared norm for points in S projected onto the first k process eigenvectors.

35 Two simple inequalities. V̂_k is the best solution for the empirical data S: Ê[‖P_{V̂_k}(Φ(x))‖²] = Σ_{i≤k} μ̂_i ≥ Ê[‖P_{V_k}(Φ(x))‖²]. V_k is the best solution for the underlying process: E[‖P_{V_k}(Φ(x))‖²] = Σ_{i≤k} λ_i ≥ E[‖P_{V̂_k}(Φ(x))‖²]. Goal of the paper: show that the chain of inequalities below is accurate and bound the difference between the first and last terms: Ê[‖P_{V̂_k}(Φ(x))‖²] ≥ Ê[‖P_{V_k}(Φ(x))‖²] ≈ E[‖P_{V_k}(Φ(x))‖²] ≥ E[‖P_{V̂_k}(Φ(x))‖²].

36 What we're interested in. Captured variance = Σ_{i≤k} λ_i = E[‖P_{V_k}(Φ(x))‖²]; residual variance = Σ_{i>k} λ_i = E[‖P⊥_{V_k}(Φ(x))‖²]; and E[‖Φ(x)‖²] = E[‖P_{V_k}(Φ(x))‖²] + E[‖P⊥_{V_k}(Φ(x))‖²]. This paper provides bounds for the sums of these process eigenvalues as a function of the empirical eigenvalues.

37 And now a first Bound. If we perform PCA in the feature space defined by κ(x, y), then with probability greater than 1 − δ over random m-samples S, if new data is projected onto V̂_k, the sum of the k largest process eigenvalues (the captured variance) is bounded by: Σ_{i≤k} λ_i ≥ E[‖P_{V̂_k}(Φ(x))‖²] ≥ max_{1≤l≤k} [ (1/m) Σ_{i≤l} λ̂_i(S) − ((1+√l)/√m) √((2/m) Σ_i κ(x_i, x_i)²) ] − R² √((19/m) ln(2(m+1)/δ)), where the support of the distribution is in a ball of radius R in feature space.

38 And now a first Bound. First term: max_{1≤l≤k} [ (1/m) Σ_{i≤l} λ̂_i(S) − ((1+√l)/√m) √((2/m) Σ_i κ(x_i, x_i)²) ]. Trade-off between the terms within this term: as l increases, the captured empirical variance increases, but so does the ratio (1+√l)/√m. For well-behaved kernels (those for which the dot product is bounded), the square-root term should be a constant. Second term: R² √((19/m) ln(2(m+1)/δ)). Includes the dependences on the confidence parameter δ and the distribution radius R.

39 The second bound. If we perform PCA in the feature space defined by κ(x, y), then with probability greater than 1 − δ over random m-samples S, if new data is projected onto V̂_k, the expected squared residual is bounded by: Σ_{i>k} λ_i ≤ E[‖P⊥_{V̂_k}(Φ(x))‖²] ≤ min_{1≤l≤k} [ (1/m) Σ_{i>l} λ̂_i(S) + ((1+√l)/√m) √((2/m) Σ_i κ(x_i, x_i)²) ] + R² √((18/m) ln(2m/δ)), where the support of the distribution is in a ball of radius R in feature space.
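Every quantity in these bounds is computable from the sample and the kernel. Purely as an illustration of the bound's structure (the constants below follow the formula as transcribed on slide 39 and should be checked against the paper before use), a hypothetical sketch of the residual bound:

```python
import numpy as np

def residual_bound(K, k, R, delta):
    """Right-hand side of the slide-39 residual bound (sketch only; verify constants against the paper)."""
    m = K.shape[0]
    lam_hat = np.sort(np.linalg.eigvalsh(K))[::-1]             # empirical eigenvalues of K
    kappa_term = np.sqrt((2.0 / m) * np.sum(np.diag(K) ** 2))  # sqrt((2/m) sum_i kappa(x_i, x_i)^2)
    ls = np.arange(1, k + 1)
    tails = np.array([lam_hat[l:].sum() / m for l in ls])      # (1/m) sum_{i>l} lambda_hat_i
    best = (tails + (1 + np.sqrt(ls)) / np.sqrt(m) * kappa_term).min()
    conc = R ** 2 * np.sqrt((18.0 / m) * np.log(2 * m / delta))
    return best + conc

# example (reusing K from the kernel PCA sketch above):
# residual_bound(K, k=2, R=np.sqrt(np.diag(K).max()), delta=0.05)
```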

40 Next steps. How tight are these bounds? Can we do better? Can we use these bounds to compare existing dimensional reduction algorithms? Can we construct a kernel that maximizes the tightness of this bound?
