Dominant feature extraction


Dominant feature extraction. Francqui Lecture, 7-5-2010. Paul Van Dooren, Université catholique de Louvain, CESAME, Louvain-la-Neuve, Belgium

Goal of this lecture: develop basic ideas for large scale dense matrices, namely recursive procedures for
- Dominant singular subspace
- Multipass iteration
- Subset selection
- Dominant eigenspace of a positive definite matrix
- Possible extensions
which are all based on solving cheap subproblems. Show accuracy and complexity results.

Dominant singular subspaces. Given A ∈ R^{m×n}, approximate it by a rank k factorization B C with B ∈ R^{m×k}, C ∈ R^{k×n}, by solving min ‖A − BC‖_2 with k ≪ m, n. This has several applications in image compression, information retrieval and model reduction (POD).
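For reference, the minimizer of this problem is given by the truncated SVD; a small NumPy sketch of that baseline (the recursive procedures below aim to approximate it without ever forming the full SVD of A):

```python
import numpy as np

def best_rank_k(A, k):
    """Optimal rank-k factors B (m x k) and C (k x n) of A, via the full SVD.
    This is the O(mn*min(m,n)) baseline that the incremental schemes avoid."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    B = U[:, :k] * s[:k]          # m x k
    C = Vt[:k, :]                 # k x n
    return B, C

# small usage example with random data
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 50))
B, C = best_rank_k(A, 5)
print(np.linalg.norm(A - B @ C, 2))  # equals the 6th singular value of A
```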

Information retrieval
- Low memory requirement: O(k(m + n))
- Fast queries: Ax ≈ B(Cx) in O(k(m + n)) time
- Easy to obtain: O(kmn) flops

Proper Orthogonal Decomposition (POD). Compute a state trajectory for one "typical" input. Collect the principal directions to project on.

Recursivity. We pass once over the data with a window of length k and perform along the way a set of windowed SVDs of dimension m × (k + l).
Step 1: expand by appending l columns (Gram-Schmidt)
Step 2: contract by deleting the l least important columns (SVD)

Expansion (G-S). Append column a_+ to the current approximation U R V^T to get
\[ \begin{bmatrix} URV^T & a_+ \end{bmatrix} = \begin{bmatrix} U & a_+ \end{bmatrix} \begin{bmatrix} R & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} V^T & 0 \\ 0 & 1 \end{bmatrix} \]
Update with Gram-Schmidt to recover a new decomposition \( \hat U \hat R \hat V^T \), using
\[ \hat r = U^T a_+, \quad \hat a = a_+ - U\hat r, \quad \hat a = \hat u \hat\rho \quad (\text{since } a_+ = U\hat r + \hat u \hat\rho). \]

Contraction (SVD). Now remove the l smallest singular values of this new \( \hat U \hat R \hat V^T \) via
\[ \hat U \hat R \hat V^T = (\hat U G_u)(G_u^T \hat R G_v)(G_v^T \hat V^T), \]
keeping U_+ R_+ V_+^T as best approximation of \( \hat U \hat R \hat V^T \) (just delete the l smallest singular values).

Complexity of one pair of steps. The Gram-Schmidt update (expansion) requires 4mk flops per column (essentially for the products \( \hat r = U^T a_+ \) and \( \hat a = a_+ - U\hat r \)). For
\[ G_u^T \hat R G_v = \begin{bmatrix} R_+ & 0 \\ 0 & \mu_i \end{bmatrix} \]
one requires the left and right singular vectors of \( \hat R \), which can be obtained in O(k^2) flops per singular value (using inverse iteration). Multiplying \( \hat U G_u \) and \( \hat V G_v \) requires 4mk flops per deflated column. The overall procedure requires 8mk flops per processed column and hence 8mnk flops for a rank k approximation of an m × n matrix A. One shows that
\[ A = U \begin{bmatrix} R & A_{12} \\ 0 & A_{22} \end{bmatrix} V^T \]
where the norm of the discarded blocks A_{12}, A_{22} is known.
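A minimal NumPy sketch of one pass of this expand/contract recursion for the case l = 1; it uses a full SVD of the small (k+1) × (k+1) factor instead of the inverse-iteration refinement described above, so it illustrates the idea rather than the exact implementation of the slides:

```python
import numpy as np

def incremental_svd(A, k):
    """One-pass rank-k approximation of the processed columns, A[:, :j] ~ U @ R @ Vt.
    U: m x k orthonormal, R: k x k, Vt: k x (number of columns seen so far)."""
    m, n = A.shape
    U, R = np.linalg.qr(A[:, :k])            # initialize with the first k columns
    Vt = np.eye(k)
    for j in range(k, n):
        a = A[:, j]
        # expand (Gram-Schmidt): append one column to the window
        r = U.T @ a
        a_perp = a - U @ r
        rho = np.linalg.norm(a_perp)
        u_hat = a_perp / rho if rho > 0 else a_perp
        U_hat = np.column_stack([U, u_hat])
        R_hat = np.block([[R, r[:, None]],
                          [np.zeros((1, k)), np.array([[rho]])]])
        V_hat_t = np.block([[Vt, np.zeros((k, 1))],
                            [np.zeros((1, Vt.shape[1])), np.array([[1.0]])]])
        # contract: SVD of the small (k+1) x (k+1) factor, drop the smallest singular value
        Gu, sig, Gvt = np.linalg.svd(R_hat)
        U = U_hat @ Gu[:, :k]
        R = np.diag(sig[:k])
        Vt = Gvt[:k, :] @ V_hat_t
    return U, R, Vt
```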

Error estimates. Let E := A − Â = UΣV^T − \( \hat U \hat\Sigma \hat V^T \) and µ := ‖E‖_2. Let \( \hat\mu := \max_i \mu_i \) where µ_i is the neglected singular value at step i. One shows the error norm bounds
\[ \hat\mu \le \sigma_{k+1} \le \mu \le \sqrt{n-k}\,\hat\mu \le c\,\hat\mu, \qquad \hat\sigma_i \le \sigma_i \le \hat\sigma_i + \hat\mu^2/2\hat\sigma_i, \]
\[ \tan\theta_k \le \tan\hat\theta_k := \hat\mu^2/(\hat\sigma_k^2 - \hat\mu^2), \qquad \tan\phi_k \le \tan\hat\phi_k := \hat\mu\,\hat\sigma_1/(\hat\sigma_k^2 - \hat\mu^2), \]
where θ_k, φ_k are the canonical angles of dimension k: cos θ_k := ‖U(:, 1:k)^T \( \hat U \)‖_2, cos φ_k := ‖V(:, 1:k)^T \( \hat V \)‖_2.

Examples. The bounds get much better when the gap σ_k − σ_{k+1} is large.

Convergence. How quickly do we track the subspaces? How cos θ_k^{(i)} evolves with the time step i.

Example. Find the dominant behavior in an image sequence. Images can have up to 10^6 pixels.

Multipass iteration. Low Rank Incremental SVD can be applied in several passes, say to
\[ \begin{bmatrix} A & A & \cdots & A \end{bmatrix}. \]
After the first block (or "pass") a good approximation of the dominant space \( \hat U \) has already been constructed. Going over to the next block (second "pass") will improve it, etc.
Theorem. Convergence of the multipass method is linear, with approximate ratio of convergence ψ/κ² < 1, where ψ measures the orthogonality of the residual columns of A and κ is the ratio σ_k/σ_{k+1} of A.

Figure: Convergence behavior for increasing gap between "signal" and "noise" (horizontal axis: number of INCSVD steps).

Figure: Convergence behavior for increasing orthogonality between "residual vectors" (horizontal axis: number of INCSVD steps).

Eigenfaces analysis. Ten dominant left singular vectors of the ORL Database of Faces (400 images, 40 subjects, 92 × 112 = 10304 pixels, i.e. a 10304 × 400 matrix), computed using the MATLAB SVD function and using one pass of incremental SVD. Maximal angle: 6.3, maximum relative error in singular values: 4.8%.

Conclusions: Incremental SVD
- A useful and economical SVD approximation of A ∈ R^{m×n}
- For matrices with columns that are very large or "arrive" with time
- Complexity is proportional to mnk and the number of "passes"
- Algorithms due to [1] Manjunath-Chandrasekaran-Wang (95), [2] Levy-Lindenbaum (00), [3] Chahlaoui-Gallivan-Van Dooren (01), [4] Brand (03), [5] Baker-Gallivan-Van Dooren (09)
- Convergence analysis and accuracy in refs [3], [4], [5]

Subset selection. We want a good "approximation" of A ∈ R^{m×n} by a product B P^T, where B ∈ R^{m×k} and P ∈ R^{n×k} is a "selection matrix", i.e. a submatrix of the identity I_n. This seems connected to min ‖A − BP^T‖_2 and maybe similar techniques can be used as for incremental SVD. Clearly, if B = AP, we just select a subset of the columns of A. Rather than minimizing ‖A − BP^T‖_2 we maximize vol(B), where
\[ \mathrm{vol}(B) = \det(B^TB)^{1/2} = \prod_{i=1}^{k} \sigma_i(B), \qquad m \ge k. \]
There are \( \binom{n}{k} \) possible choices, the problem is NP hard, and there is no polynomial time approximation algorithm.

Heuristics. Gu-Eisenstat show that the Strong Rank Revealing QR factorization (SRRQR) solves the following simpler problem: B is sub-optimal if there is no swapping of a single column of A (yielding \( \hat B \)) that has a larger volume (constrained minimum). Here, we propose a simpler recursive "updating" algorithm that has complexity O(mnk) rather than O(mn²) for Gu-Eisenstat. The idea is again based on a sliding window of size k + 1 (or k + l):
- Sweep through the columns of A while maintaining a "best" subset B
- Append a column of A to B, yielding B_+
- Contract B_+ to \( \hat B \) by deleting the "weakest" column of B_+

Deleting the weakest column. Let B = A(:, 1:k) to start with and let B = QR where R is k × k. Append the next column a_+ of A to form B_+ and update its decomposition using Gram-Schmidt:
\[ B_+ := \begin{bmatrix} QR & a_+ \end{bmatrix} = \begin{bmatrix} Q & a_+ \end{bmatrix} \begin{bmatrix} R & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} Q & \hat q \end{bmatrix} \begin{bmatrix} R & \hat r \\ 0 & \hat\rho \end{bmatrix} = Q_+ R_+ \]
with \( \hat r = Q^T a_+ \), \( \hat a = a_+ - Q\hat r \), \( \hat a = \hat q \hat\rho \) (since \( a_+ = Q\hat r + \hat q\hat\rho \)). Contract B_+ to \( \hat B \) by deleting the "weakest" column of R_+. This can be done in O(mk²) using Gu-Eisenstat's SRRQR method, but an even simpler heuristic uses only O((m + k)k) flops.

Golub-Klema-Stewart heuristic. Let R_+ v = σ_{k+1} u be the singular vector pair corresponding to the smallest singular value σ_{k+1} of R_+ and let v_i be the components of v. Let R_i be the submatrix obtained by deleting column i from R_+; then
\[ \frac{\sigma_{k+1}^2}{\sigma_1^2} + \Big(1 - \frac{\sigma_{k+1}^2}{\sigma_1^2}\Big) v_i^2 \;\le\; \frac{\mathrm{vol}^2(R_i)}{\prod_{j=1}^{k}\sigma_j^2} \;\le\; \frac{\sigma_{k+1}^2}{\sigma_k^2} + \Big(1 - \frac{\sigma_{k+1}^2}{\sigma_k^2}\Big) v_i^2. \]
Maximizing |v_i| thus maximizes a lower bound on vol²(R_i). In practice this is almost always optimal, and guaranteed to be so if
\[ \frac{\sigma_{k+1}^2}{\sigma_1^2} + \Big(1 - \frac{\sigma_{k+1}^2}{\sigma_1^2}\Big) v_i^2 \;\ge\; \frac{\sigma_{k+1}^2}{\sigma_k^2} + \Big(1 - \frac{\sigma_{k+1}^2}{\sigma_k^2}\Big) v_j^2 \qquad \forall j \ne i. \]

GKS method. Start with B = A(:, 1:k) = QR where R is k × k. For j = k + 1 : n,
- append column a_+ := A(:, j) to get B_+
- update its QR decomposition to B_+ = Q_+ R_+
- contract B_+ to yield a new \( \hat B \) using the GKS heuristic
- update its QR decomposition to \( \hat B = \hat Q \hat R \)
One can verify the optimality by performing a second pass. Notice that GKS is optimal when σ_{k+1} = 0 since then vol(R_i) = |v_i| ∏_{j=1}^{k} σ_j.
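A compact NumPy sketch of this sweep, assuming the weakest column is found from a full SVD of the small factor R_+ (rather than inverse iteration) and the kept columns are simply refactored; function and variable names are illustrative:

```python
import numpy as np

def gks_subset_selection(A, k):
    """One sweep of the sliding-window subset selection.
    Returns the indices of k columns of A whose volume the GKS heuristic keeps large."""
    m, n = A.shape
    idx = list(range(k))                     # current subset B = A[:, idx]
    Q, R = np.linalg.qr(A[:, :k])            # B = Q R, R is k x k
    for j in range(k, n):
        a = A[:, j]
        # expand: Gram-Schmidt update of the QR factorization
        r = Q.T @ a
        a_perp = a - Q @ r
        rho = np.linalg.norm(a_perp)
        if rho == 0.0:                       # a lies in span(B): it cannot improve the volume
            continue
        Q_plus = np.column_stack([Q, a_perp / rho])
        R_plus = np.block([[R, r[:, None]],
                           [np.zeros((1, k)), np.array([[rho]])]])
        idx_plus = idx + [j]
        # contract: GKS heuristic -- drop the column with the largest weight in the
        # right singular vector belonging to the smallest singular value of R_plus
        _, _, Vt = np.linalg.svd(R_plus)
        weakest = int(np.argmax(np.abs(Vt[-1, :])))
        keep = [i for i in range(k + 1) if i != weakest]
        idx = [idx_plus[i] for i in keep]
        # refactor the kept columns (simple but O(mk^2) per column; cheaper updates exist)
        Q, R = np.linalg.qr(Q_plus @ R_plus[:, keep])
    return idx
```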

Dominant eigenspace of a PSD matrix. Typical applications:
- Kernel matrices (machine learning)
- Spectral methods (image analysis)
- Correlation matrices (statistics and signal processing)
- Principal component analysis / Karhunen-Loève
We use K_N to denote the full N × N positive definite matrix.

Sweeping through K. Suppose a rank m approximation of the dominant eigenspace of K_n ∈ R^{n×n}, n ≫ m, is known:
\[ K_n \approx A_n := U_n M_m U_n^T, \]
with M_m ∈ R^{m×m} an SPD matrix and U_n ∈ R^{n×m} with U_n^T U_n = I_m.
- Obtain the (n + 1) × m eigenspace U_{n+1} of the (n + 1) × (n + 1) kernel matrix
\[ K_{n+1} = \begin{bmatrix} K_n & a \\ a^T & b \end{bmatrix} \approx \tilde U_{n+1} \tilde M_{m+2} \tilde U_{n+1}^T \]
- Downdate \( \tilde M_{m+2} \) to get back to rank m
- Downsize \( \tilde U_{n+1} \) to get back to size n

Sweeping through K. We show only the columns and rows of U and K that are involved. Window size n = 5, rank k = 4.

Steps 1-16 (figures omitted): step 1 starts with the leading n × n subproblem; the sweep then alternately expands (steps 2, 5, 8, 11, 14) and downdates and downsizes (steps 3, 6, 9, 12, 15) as the window slides over K.

Downdating K to fixed rank m. After the expansion step, \( K_{n+1} \approx \tilde U_{n+1} \tilde M_{m+2} \tilde U_{n+1}^T \); downdate \( \tilde M_{m+2} \) to delete the "smallest" two eigenvalues and get back to rank m.

Updating: Proposed algorithm. Since a = U_n U_n^T a + (I − U_n U_n^T) a = U_n r + ρ u, with r = U_n^T a, q = (I − U_n U_n^T) a, ρ = ‖q‖_2 and u = q/ρ, we can write
\[ A_{n+1} = \begin{bmatrix} A_n & a \\ a^T & b \end{bmatrix} = \begin{bmatrix} U_n & u & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} M_m & 0 & r \\ 0 & 0 & \rho \\ r^T & \rho & b \end{bmatrix} \begin{bmatrix} U_n & u & 0 \\ 0 & 0 & 1 \end{bmatrix}^T = \tilde U_{n+1} \tilde M_{m+2} \tilde U_{n+1}^T. \]

Updating: Proposed algorithm (continued). Let M_m = Q_m Λ_m Q_m^T where Λ_m = diag(µ_1, µ_2, ..., µ_m), µ_1 ≥ ... ≥ µ_m > 0, Q_m^T Q_m = I_m. Let \( \tilde M_{m+2} = \tilde Q_{m+2} \tilde\Lambda_{m+2} \tilde Q_{m+2}^T \) where \( \tilde\Lambda_{m+2} = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_{m+1}, \lambda_{m+2}) \), \( \tilde Q_{m+2}^T \tilde Q_{m+2} = I_{m+2} \). By the interlacing property, we have
\[ \lambda_1 \ge \mu_1 \ge \lambda_2 \ge \mu_2 \ge \cdots \ge \mu_m \ge \lambda_{m+1} \ge 0 \ge \lambda_{m+2}. \]

Updating: Proposed algorithm (continued). We want a simple orthogonal transformation H such that
\[ H \begin{bmatrix} v_{m+1} & v_{m+2} \end{bmatrix} = \begin{bmatrix} 0 \\ I_2 \end{bmatrix}, \qquad H \tilde M_{m+2} H^T = \begin{bmatrix} \hat M_m & & \\ & \lambda_{m+1} & \\ & & \lambda_{m+2} \end{bmatrix}, \]
with \( \hat M_m \in R^{m \times m} \). Therefore
\[ A_{n+1} = \tilde U_{n+1} H^T \begin{bmatrix} \hat M_m & & \\ & \lambda_{m+1} & \\ & & \lambda_{m+2} \end{bmatrix} H \tilde U_{n+1}^T \]
and the new updated decomposition is given by \( \hat A_{n+1} = U_{n+1} \hat M_m U_{n+1}^T \), with U_{n+1} given by the first m columns of \( \tilde U_{n+1} H^T \).
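A NumPy sketch of one updating step in the notation above; it forms the small matrix \( \tilde M_{m+2} \) explicitly and takes its full eigendecomposition instead of the cheaper Cholesky/inverse-iteration scheme of the next slides, so it shows the structure of the update rather than its most economical implementation:

```python
import numpy as np

def update_eigenspace(U, M, a, b, m):
    """Rank-m update: given K_n ~ U M U^T and the new column [a; b] of K_{n+1},
    return U_new ((n+1) x m) and M_new (m x m) with K_{n+1} ~ U_new M_new U_new^T.
    Assumes a has a component outside span(U) (rho > 0)."""
    n = U.shape[0]
    r = U.T @ a
    q = a - U @ r
    rho = np.linalg.norm(q)
    u = q / rho if rho > 0 else q
    # (n+1) x (m+2) extended basis and (m+2) x (m+2) middle matrix
    U_tilde = np.block([[U, u[:, None], np.zeros((n, 1))],
                        [np.zeros((1, m)), np.array([[0.0, 1.0]])]])
    M_tilde = np.block([[M, np.zeros((m, 1)), r[:, None]],
                        [np.zeros((1, m)), np.array([[0.0, rho]])],
                        [r[None, :], np.array([[rho, b]])]])
    # downdate: keep only the m largest eigenvalues of the small (m+2) x (m+2) matrix
    lam, Q = np.linalg.eigh(M_tilde)          # ascending eigenvalues
    keep = np.argsort(lam)[-m:][::-1]
    U_new = U_tilde @ Q[:, keep]
    M_new = np.diag(lam[keep])
    return U_new, M_new
```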

Cholesky factorization. It is not needed to compute the whole spectral decomposition of the matrix \( \tilde M_{m+2} \):
- To compute H, only the eigenvectors v_{m+1}, v_{m+2} corresponding to λ_{m+1}, λ_{m+2} are needed
- To compute these vectors cheaply, one needs to maintain (and update) the Cholesky factorization M_m = L_m L_m^T
- The eigenvectors are then obtained via inverse iteration, and H can then be computed as a product of Householder or Givens transformations

Updating Cholesky.
\[ \tilde M_{m+2} = \begin{bmatrix} M_m & 0 & r \\ 0 & 0 & \rho \\ r^T & \rho & b \end{bmatrix} = \begin{bmatrix} L_m L_m^T & 0 & r \\ 0 & 0 & \rho \\ r^T & \rho & b \end{bmatrix} = \begin{bmatrix} L_m & 0 \\ \begin{bmatrix} 0 & t \end{bmatrix}^T & I_2 \end{bmatrix} \begin{bmatrix} I_m & 0 \\ 0 & S_c \end{bmatrix} \begin{bmatrix} L_m^T & \begin{bmatrix} 0 & t \end{bmatrix} \\ 0 & I_2 \end{bmatrix} = \tilde L_{m+2} \tilde D_{m+2} \tilde L_{m+2}^T, \]
where
\[ t = L_m^{-1} r \quad \text{and} \quad S_c = \begin{bmatrix} 0 & \rho \\ \rho & b - t^T t \end{bmatrix}. \]
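A small numerical check of this block factorization (dense and purely illustrative; the point of the slide is that only L_m, t and S_c have to be formed):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 4
L = np.tril(rng.standard_normal((m, m))) + 5 * np.eye(m)   # Cholesky factor of M_m
M = L @ L.T
r = rng.standard_normal(m)
rho, b = 1.3, 2.7

t = np.linalg.solve(L, r)                                   # t = L_m^{-1} r
Sc = np.array([[0.0, rho], [rho, b - t @ t]])

M_tilde = np.block([[M, np.zeros((m, 1)), r[:, None]],
                    [np.zeros((1, m)), np.array([[0.0, rho]])],
                    [r[None, :], np.array([[rho, b]])]])
W = np.column_stack([np.zeros(m), t])                       # W = [0  t], m x 2
L_tilde = np.block([[L, np.zeros((m, 2))],
                    [W.T, np.eye(2)]])
D_tilde = np.block([[np.eye(m), np.zeros((m, 2))],
                    [np.zeros((2, m)), Sc]])
print(np.allclose(M_tilde, L_tilde @ D_tilde @ L_tilde.T))  # True
```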

step,,

step 2,,

step 3,,

step 4,,

step 5,,

step 6,,

step 7,,

step 8,,

step 9,,

step 9,,

step 0,,

step,,

step 2,,

step 3,,

step 4,,

step 5,,

step 6,,

step 7,,

Downsizing algorithm. Let H_v be orthogonal such that
\[ H_v v = \upsilon\, e_1, \qquad \upsilon = \|v\|_2, \qquad U_{n+1} = \begin{bmatrix} v^T \\ V \end{bmatrix}. \]
Then
\[ U_{n+1} H_v = \begin{bmatrix} \upsilon & 0 \cdots 0 \\ \multicolumn{2}{c}{V H_v} \end{bmatrix}. \]
- To retrieve the orthonormality of V H_v, it is sufficient to divide its first column by \( \sqrt{1 - \upsilon^2} \) and therefore to multiply the first column and row of \( \hat M_m \) by the same quantity
- If the matrix \( \hat M_m \) is factored as L_m L_m^T, this reduces to multiplying the first entry of L_m by \( \sqrt{1 - \upsilon^2} \)
- Any row of U_{n+1} can be chosen to be removed this way
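A NumPy sketch of removing the first row of U_{n+1} along these lines, using a Householder reflector for H_v and working on explicit dense factors U and M (the Cholesky variant would scale the first entry of L_m instead); it assumes υ < 1:

```python
import numpy as np

def downsize_first_row(U, M):
    """Remove the first row of U (orthonormal columns) and restore the factorization:
    returns V_new, M_new with V_new orthonormal and
    V_new @ M_new @ V_new.T == (U @ M @ U.T) restricted to rows/columns 2..end."""
    v = U[0, :].copy()                       # row to be removed
    upsilon = np.linalg.norm(v)
    # Householder reflector H with H v = upsilon * e_1 (H symmetric and orthogonal)
    e1 = np.zeros_like(v); e1[0] = 1.0
    w = v - upsilon * e1
    H = np.eye(len(v)) - 2.0 * np.outer(w, w) / (w @ w) if w @ w > 0 else np.eye(len(v))
    VH = (U @ H)[1:, :]                      # drop the first row
    s = np.sqrt(1.0 - upsilon**2)            # assumes upsilon < 1
    VH[:, 0] /= s                            # restore orthonormal columns
    M_new = H @ M @ H
    M_new[0, :] *= s                         # scale first row and column of M
    M_new[:, 0] *= s
    return VH, M_new
```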

Accuracy bounds. When there is no downsizing,
\[ \|K_n - A_n\|_F^2 \le \eta_n := \sum_{i=m+1}^{\tilde n} \big(\lambda_i^{(\tilde n)}\big)^2 + \sum_{i=\tilde n+1}^{n} \max\big\{\delta_i^{(+)}, \delta_i^{(-)}\big\}^2, \qquad \|K_n - A_n\|_2 \le \zeta_n := \lambda_{m+1}^{(\tilde n)} + \sum_{i=\tilde n+1}^{n} \max\big\{\delta_i^{(+)}, \delta_i^{(-)}\big\}, \]
where A_n := U_n M_m U_n^T, \( \delta_i^{(+)} = \lambda_{m+1}^{(i)} \), \( \delta_i^{(-)} = \lambda_{m+2}^{(i)} \), and
\[ \|K_{\tilde n} - A_{\tilde n}\|_F^2 = \sum_{i=m+1}^{\tilde n} \big(\lambda_i^{(\tilde n)}\big)^2, \qquad \|K_{\tilde n} - A_{\tilde n}\|_2 = \lambda_{m+1}^{(\tilde n)}. \]

Accuracy bounds (continued). In relation to the original spectrum of K_N one obtains the approximate bounds
\[ \sum_{i=m+1}^{N} \lambda_i^2 \;\le\; \|K_N - A_N\|_F^2 \;\le\; (N - m)\,\lambda_{m+1}^2, \qquad \lambda_{m+1} \;\le\; \|K_N - A_N\|_2 \;\le\; c\,\lambda_{m+1}. \]
When downsizing the matrix as well, there are no guaranteed bounds.

Example 1 (no downsizing). The matrix considered in this example is a kernel matrix constructed from the Abalone benchmark data set (http://archive.ics.uci.edu/ml/support/abalone), with radial basis kernel function
\[ k(x, y) = \exp\Big(-\frac{\|x - y\|_2^2}{100}\Big). \]
This data set has 4177 training instances.
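A short sketch of how such a kernel matrix can be formed with NumPy, assuming the numerically encoded Abalone features are already loaded in an array X and taking the bandwidth 100 from the formula above:

```python
import numpy as np

def rbf_kernel(X, sigma2=100.0):
    """Kernel matrix K[i, j] = exp(-||x_i - x_j||^2 / sigma2) for the rows x_i of X."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / sigma2)

# X would hold the 4177 Abalone instances, one (numerically encoded) instance per row
```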

Figure: Distribution of the largest 100 eigenvalues of the Abalone kernel matrix, in logarithmic scale.

Table: Largest 9 eigenvalues of the Abalone matrix K_N (first column), of A_N obtained with updating with ñ = 500, m = 9 (second column) and with ñ = 500, m = 20 (third column), respectively.

λ_i                   µ_i (ñ = 500, m = 9)   µ_i (ñ = 500, m = 20)
4.483808255808e+3     4.48380825582e+3       4.483808255805e+3
2.774246723926e+1     2.774246723935e+1      2.774246723908e+1
3.96946486354603e-1   3.9694648574339e-1     3.96946486354575e-1
2.82827838600384e-1   2.82827838240747e-1    2.8282783860794e-1
8.7635493872957e-2    8.7635489366474e-2     8.76354938730078e-2
4.48976653877e-2      4.489002296202e-2      4.489766537462e-2
3.950058249249e-2     3.95005033082028e-2    3.950058245827e-2
3.4496594206443e-2    3.4495746496473e-2     3.4496594206963e-2
2.27595023456e-2      2.2750932394003e-2     2.2759506852e-2

Figure: Plot of the sequences δ_n^{(+)} (blue line), δ_n^{(−)} (green line), η_n (red solid line), λ_{m+1} (cyan solid line) and ‖K_N − A_N‖_F (magenta solid line).

Table: Angles between the eigenvectors corresponding to the largest 9 eigenvalues of K_N, computed by the MATLAB function eigs, and those computed by the proposed algorithm for ñ = 500, m = 9 (second column) and ñ = 500, m = 20 (third column).

i    ∠(x_i, x̃_i)   ∠(x_i, x̂_i)
1    3.6500e-08    8.4294e-08
2    3.9425e-08    2.9802e-08
3    2.3774e-06    5.69e-08
4    2.5086e-06    2.9802e-08
5    3.0084e-05    5e-07
6    2.0446e-04    4.247e-08
7    2.023e-04     4.90e-08
8    3.4670e-04    8.67e-08
9    5.9886e-04    2.073e-08

Example 2 (downsizing). The following matrix has rank 3:
\[ F(i, j) = \sum_{k=1}^{3} \exp\Big(-\frac{(i - \mu_k)^2 + (j - \mu_k)^2}{2\sigma_k}\Big), \qquad i, j = 1, \ldots, 100, \]
with µ = [4 8 76] and σ = [10 20 5]. Let F = QΛQ^T be its spectral decomposition, let R ∈ R^{100×100} be a matrix of random numbers generated by the MATLAB function randn, and define T = (R + R^T)/2. For this example, the considered SPD matrix is K_N = F + ε T, with ε = 1.0e−5.
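A sketch of this construction in NumPy; the entries of µ and σ below are placeholders (their digits are not fully readable in the transcription), but the rank-3 structure and the small symmetric perturbation are as described above:

```python
import numpy as np

# placeholder parameter values for the three Gaussian bumps
mu = np.array([4.0, 8.0, 76.0])
sigma = np.array([10.0, 20.0, 5.0])
i = np.arange(1, 101)[:, None]           # i, j = 1, ..., 100
j = np.arange(1, 101)[None, :]

# F is a sum of three separable (hence rank-one) terms, so rank(F) = 3
F = sum(np.exp(-((i - mu[k])**2 + (j - mu[k])**2) / (2 * sigma[k])) for k in range(3))
print(np.linalg.matrix_rank(F))          # 3 (up to rounding)

rng = np.random.default_rng(0)
R = rng.standard_normal((100, 100))
T = (R + R.T) / 2                        # symmetrized random perturbation
K = F + 1.0e-5 * T
```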

Figure: Graph of the size of the entries of the matrix K_N.

Figure: Distribution of the eigenvalues of the matrix K_N in logarithmic scale.

Figure: Plot of the three dominant eigenvectors of K_N.

Figure: Plot of the sequences δ_n^{(+)} (blue dash-dotted line), δ_n^{(−)} (green dotted line), η_n (red solid line), λ_{m+1} (cyan solid line) and ‖K_N − A_N‖_F (black solid line).

Table: Largest three eigenvalues of the matrix K_N (first column); largest three eigenvalues computed with the downdating procedure with minimal norm, with m = 3 and ñ = 30, 40, 50 (second, third and fourth column, respectively); largest three eigenvalues of K_N computed with the former downdating procedure (fifth column).

λ_i           µ_i (ñ = 30)   µ_i (ñ = 40)   µ_i (ñ = 50)   µ_i (former, ñ = 50)
7.949478e0    7.3753e0       7.820407e0     7.94727e0      3.963329e0
5.26405e0     5.25563e0      5.260243e0     5.26384e0      5.47202e-6
3.963329e0    3.948244e0     3.96323e0      3.963329e0     4.824060e-6

Conclusions
- A fast algorithm to compute incrementally the dominant eigenspace of a positive definite matrix
- Improvement on Hoegaerts, L., De Lathauwer, L., Goethals, I., Suykens, J.A.K., Vandewalle, J., & De Moor, B., Efficiently updating and tracking the dominant kernel principal components, Neural Networks, 20, 220-229, 2007
- The overall complexity of the incremental updating technique to compute an N × m basis matrix U_N for the dominant eigenspace of K_N is reduced from (m + 4)N²m + O(Nm³) to 6N²m + O(Nm²)
- When using both incremental updating and downsizing to compute the dominant eigenspace of \( \hat K_{\tilde n} \) (an ñ × ñ principal submatrix of K_N), the complexity is reduced from (2m + 4)Nñm + O(Nm³) to 6Nñm + O(Nm²)
- This is in both cases essentially a reduction by a factor m

References
- Gu, Eisenstat, An efficient algorithm for computing a strong rank revealing QR factorization, SIAM J. Sci. Comput., 1996
- Chahlaoui, Gallivan, Van Dooren, An incremental method for computing dominant singular subspaces, SIMAX, 2001
- Hoegaerts, De Lathauwer, Goethals, Suykens, Vandewalle, De Moor, Efficiently updating and tracking the dominant kernel principal components, Neural Networks, 2007
- Mastronardi, Tyrtyshnikov, Van Dooren, A fast algorithm for updating and downsizing the dominant kernel principal components, SIMAX, 2010
- Baker, Gallivan, Van Dooren, Low-rank incremental methods for computing dominant singular subspaces, submitted, 2010
- Ipsen, Van Dooren, Polynomial Time Subset Selection Via Updating, in preparation, 2010