Low Rank Approximation
Lecture 7
Daniel Kressner
Chair for Numerical Algorithms and HPC
Institute of Mathematics, EPFL
daniel.kressner@epfl.ch
Alternating least-squares / linear scheme

General setting: Solve the optimization problem
  $\min_X f(X)$,
where X is a (large) matrix or tensor and f is simple (e.g., convex).
Constrain X to $\mathcal{M}_r$, the set of rank-$r$ matrices or tensors, and aim at solving
  $\min_{X \in \mathcal{M}_r} f(X)$.
Set $X = i(U_1, U_2, \dots, U_d)$ (e.g., $X = U_1 U_2^T$).
Low-rank formats are multilinear, so there is hope that optimizing for each component individually is simple:
  $\min_{U_\mu} f(i(U_1, U_2, \dots, U_d))$.
Alternating least-squares / linear scheme

Set $\bar{f}(U_1, \dots, U_d) := f(i(U_1, \dots, U_d))$.

ALS:
1: while not converged do
2:   $U_1 \gets \arg\min_{U_1} \bar{f}(U_1, U_2, \dots, U_d)$
3:   $U_2 \gets \arg\min_{U_2} \bar{f}(U_1, U_2, \dots, U_d)$
4:   ...
5:   $U_d \gets \arg\min_{U_d} \bar{f}(U_1, U_2, \dots, U_d)$
6: end while

Examples: ALS for fitting a CP decomposition (see the sketch below), subspace iteration.
Closely related: block Gauss-Seidel, block coordinate descent.

Difficulties:
- Representation $(U_1, U_2, \dots, U_d)$ is often non-unique; parameters may become unbounded.
- $\mathcal{M}_r$ is not closed.
- Convergence (analysis).
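To make the scheme concrete, here is a minimal NumPy sketch of ALS for fitting a rank-$r$ CP decomposition of a 3-way tensor (the first example above). The helper names `khatri_rao` and `cp_als` are our own, not from the slides; the point is that each half-step is an ordinary linear least-squares problem.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product of U (m x r) and V (n x r): (m*n) x r."""
    r = U.shape[1]
    return np.einsum('ir,jr->ijr', U, V).reshape(-1, r)

def cp_als(T, r, n_sweeps=50, seed=0):
    """ALS sketch for the CP model T[i,j,k] ~ sum_s A[i,s] B[j,s] C[k,s];
    each factor update is a linear least-squares problem."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    B = rng.standard_normal((J, r))
    C = rng.standard_normal((K, r))
    for _ in range(n_sweeps):
        # Solve the least-squares problem for one factor, others fixed.
        A = np.linalg.lstsq(khatri_rao(B, C), T.reshape(I, -1).T, rcond=None)[0].T
        B = np.linalg.lstsq(khatri_rao(A, C), T.transpose(1, 0, 2).reshape(J, -1).T, rcond=None)[0].T
        C = np.linalg.lstsq(khatri_rao(A, B), T.transpose(2, 0, 1).reshape(K, -1).T, rcond=None)[0].T
    return A, B, C
```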
Subspace iteration and ALS

Given $A \in \mathbb{R}^{m \times n}$, consider the computation of a best rank-$r$ approximation:
  $\min_{U \in \mathbb{R}^{m \times r},\, V \in \mathbb{R}^{n \times r}} f(U, V)$,  $f(U, V) := \|A - UV^T\|_F^2$.
While the representation $UV^T$ is non-unique, the minimizer with respect to U or V individually is unique if V or U, respectively, has rank r; f is convex with respect to U and V individually. Hence,
  $f(U + H, V) - f(U, V) = \langle \nabla_U f(U, V), H \rangle + O(\|H\|_F^2) = -2\langle AV - UV^TV, H \rangle + O(\|H\|_F^2)$,
and
  $0 = \nabla_U f(U, V) = -2(AV - UV^TV) \;\Rightarrow\; U = AV(V^TV)^{-1}$.
For stability it is advisable to choose V such that it has orthonormal columns.
Subspace iteration and ALS

ALS for low-rank matrix approximation:
1: while not converged do
2:   Compute economy-size QR factorization $V = QR$ and update $V \gets Q$.
3:   $U \gets AV$
4:   Compute economy-size QR factorization $U = QR$ and update $U \gets Q$.
5:   $V \gets A^T U$
6: end while

Returns an approximation $A \approx UV^T$. This is the subspace iteration from Lecture 1!

EFY. Develop an ALS method for solving the weighted low-rank approximation problem
  $\min_{U,V} \|W_L (A - UV^T) W_R\|_F$
with square and invertible matrices $W_L$, $W_R$.
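A minimal NumPy sketch of this loop (function name our own). With the QR steps in place, $V^TV = I$ and the update $U = AV(V^TV)^{-1}$ reduces to $U \gets AV$; the symmetry of the problem gives $V \gets A^TU$:

```python
import numpy as np

def als_low_rank(A, r, n_sweeps=20, seed=0):
    """ALS for min ||A - U V^T||_F; with the QR steps below this is
    exactly subspace iteration on A (cf. Lecture 1)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    V = rng.standard_normal((n, r))
    for _ in range(n_sweeps):
        V, _ = np.linalg.qr(V)   # orthonormalize V, so V^T V = I
        U = A @ V                # U-update: U = A V (V^T V)^{-1} = A V
        U, _ = np.linalg.qr(U)   # orthonormalize U
        V = A.T @ U              # V-update, by symmetry
    return U, V                  # A ~ U V^T
```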
Linear matrix equations

For a linear operator $\mathcal{L}: \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}$, consider the linear system¹
  $\mathcal{L}(X) = C$,  $C, X \in \mathbb{R}^{m \times n}$.

Examples:
- Sylvester matrix equation:
    $AX + XB = C$,  $A \in \mathbb{R}^{m \times m}$, $B \in \mathbb{R}^{n \times n}$, $C, X \in \mathbb{R}^{m \times n}$.
  Applications: discretized 2D Laplace on a rectangle, stability analysis, optimal control, model reduction of linear control systems.
  Special case, Lyapunov equations: $m = n$, $A = B^T$, $C$ symmetric (and often negative semi-definite).
- Stochastic Galerkin methods in uncertainty quantification.
- Stochastic control.

¹ See [V. Simoncini, Computational methods for linear matrix equations, SIAM Rev., 58 (2016), pp. 377-441] for details and references.
Linear matrix equations

Using the matrix $M_{\mathcal{L}}$ representing $\mathcal{L}$ in canonical bases, we can rewrite $\mathcal{L}(X) = C$ as the linear system $M_{\mathcal{L}}\, \mathrm{vec}(X) = \mathrm{vec}(C)$.

Assumption: $M_{\mathcal{L}}$ has low Kronecker rank:
  $M_{\mathcal{L}} = B_1 \otimes A_1 + \cdots + B_R \otimes A_R$,  $R \ll m, n$.
Equivalently,
  $\mathcal{L}(X) = A_1 X B_1^T + \cdots + A_R X B_R^T$.

EFY. Develop a variant of ACA (from Lecture 3) that aims at approximating a given sparse matrix A by a matrix of low Kronecker rank for given m, n.

EFY. Show that if $m = n$, $M_{\mathcal{L}}$ is symmetric and has Kronecker rank R, one can find symmetric matrices $A_1, \dots, A_R$, $B_1, \dots, B_R$ such that $\mathcal{L}(X) = A_1 X B_1 + \cdots + A_R X B_R$. Is it always possible to choose all $A_k$, $B_k$ positive semi-definite if $M_{\mathcal{L}}$ is positive definite?
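As a sanity check of the correspondence between $\mathcal{L}$ and $M_{\mathcal{L}}$, the following snippet (our own, not from the slides) verifies $M_{\mathcal{L}}\,\mathrm{vec}(X) = \mathrm{vec}(\mathcal{L}(X))$ for random data, using column-major vectorization, for which $\mathrm{vec}(A X B^T) = (B \otimes A)\,\mathrm{vec}(X)$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, R = 4, 3, 2
A_list = [rng.standard_normal((m, m)) for _ in range(R)]
B_list = [rng.standard_normal((n, n)) for _ in range(R)]
X = rng.standard_normal((m, n))

# L(X) = sum_k A_k X B_k^T  <->  M_L = sum_k B_k kron A_k (column-major vec)
LX = sum(A @ X @ B.T for A, B in zip(A_list, B_list))
M_L = sum(np.kron(B, A) for A, B in zip(A_list, B_list))
vec = lambda Z: Z.reshape(-1, order='F')   # column-major vectorization
assert np.allclose(M_L @ vec(X), vec(LX))
```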
Linear matrix equations

Two ways of turning $\mathcal{L}(X) = C$ into an optimization problem:

1. If $M_{\mathcal{L}}$ is symmetric positive definite:
     $\min_X \tfrac{1}{2}\langle \mathcal{L}(X), X \rangle - \langle X, C \rangle$.
2. General $\mathcal{L}$:
     $\min_X \|\mathcal{L}(X) - C\|_F^2$.

Will focus on spd $M_{\mathcal{L}}$ in the following.
Linear matrix equations

Low-rank approximation of $\mathcal{L}(X) = C$ obtained by solving $\min_{U,V} f(U, V)$ for
  $f(U, V) = \tfrac{1}{2}\langle \mathcal{L}(UV^T), UV^T \rangle - \langle UV^T, C \rangle$.
Let $\mathcal{L}$ have Kronecker rank R (with symmetric $A_k$, $B_k$ as above). Then
  $\langle \mathcal{L}(UV^T), UV^T \rangle = \sum_{k=1}^R \langle A_k UV^T B_k, UV^T \rangle = \sum_{k=1}^R \langle A_k U V^T B_k V, U \rangle$.
This shows that $\arg\min_U f(U, V)$ is the solution of the linear matrix equation
  $A_1 U (V^T B_1 V) + \cdots + A_R U (V^T B_R V) = CV$.

EFY. Show that this linear matrix equation always has a unique solution under the assumption that $\mathcal{L}$ is symmetric positive definite.

For R = 2, this can be reduced to r linear systems of size n × n. For R > 2, one needs to solve an rn × rn system (see the sketch below).
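For illustration, the equation for U can always be solved by brute-force vectorization: with $M_k = V^T B_k V$, it reads $\sum_k (M_k^T \otimes A_k)\,\mathrm{vec}(U) = \mathrm{vec}(CV)$. A hypothetical NumPy helper (names our own):

```python
import numpy as np

def solve_for_U(A_list, B_list, C, V):
    """Solve sum_k A_k U (V^T B_k V) = C V for U via column-major
    vectorization: sum_k (M_k^T kron A_k) vec(U) = vec(C V)."""
    m, r = C.shape[0], V.shape[1]
    K = sum(np.kron((V.T @ B @ V).T, A) for A, B in zip(A_list, B_list))
    u = np.linalg.solve(K, (C @ V).reshape(-1, order='F'))
    return u.reshape(m, r, order='F')
```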
Linear matrix equations

ALS for linear matrix equations:
1: while not converged do
2:   Compute economy-size QR factorization $V = QR$ and update $V \gets Q$.
3:   Solve $A_1 U (V^T B_1 V) + \cdots + A_R U (V^T B_R V) = CV$ for U.
4:   Compute economy-size QR factorization $U = QR$ and update $U \gets Q$.
5:   Solve $(U^T A_1 U) V^T B_1 + \cdots + (U^T A_R U) V^T B_R = U^T C$ for $V^T$.
6: end while

Returns an approximation $X \approx UV^T$.
For R = 2, there are better alternatives: ADI, Krylov subspace methods, ... [Simoncini 2016].
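A runnable sketch for the special case of a Lyapunov-type equation $AX + XA = C$ with symmetric A (so R = 2 with $A_1 = A$, $B_1 = I$, $A_2 = I$, $B_2 = A$). After orthonormalization, each half-step becomes a small Sylvester equation, which we hand to SciPy; `als_lyap` is a hypothetical name, not a library routine.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def als_lyap(A, C, r, n_sweeps=10, seed=0):
    """ALS sketch for A X + X A = C (A symmetric), seeking X ~ U V^T.
    With V orthonormal, step 3 reads A U + U (V^T A V) = C V: an n x r
    Sylvester equation; step 5 is the mirror image."""
    n = A.shape[0]
    V = np.random.default_rng(seed).standard_normal((n, r))
    for _ in range(n_sweeps):
        V = np.linalg.qr(V)[0]                          # orthonormalize V
        U = solve_sylvester(A, V.T @ A @ V, C @ V)      # solve for U
        U = np.linalg.qr(U)[0]                          # orthonormalize U
        V = solve_sylvester(U.T @ A @ U, A, U.T @ C).T  # solve for V^T
    return U, V                                         # X ~ U V^T
```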
2D eigenvalue problem

  $-\Delta u(x) + V(x)\,u(x) = \lambda u(x)$ in $\Omega = [0, 1] \times [0, 1]$

with Dirichlet b.c. and Henon-Heiles potential V. Regular discretization; the ground state is reshaped into a matrix.

[Figure: ground state and singular values of the reshaped ground state; the singular values decay rapidly from $10^0$ to below $10^{-15}$.]

Excellent rank-10 approximation possible.
Rayleigh quotients wrt low-rank matrices

Consider a symmetric $n^2 \times n^2$ matrix A. Then
  $\lambda_{\min}(A) = \min_{x \neq 0} \frac{\langle x, Ax \rangle}{\langle x, x \rangle}$.
We now...
- reshape the vector x into an $n \times n$ matrix X;
- reinterpret $x \mapsto Ax$ as a linear operator $\mathcal{A}: X \mapsto \mathcal{A}(X)$.
Rayleigh quotients wrt low-rank matrices

Consider a symmetric $n^2 \times n^2$ matrix A. Then
  $\lambda_{\min}(A) = \min_{X \neq 0} \frac{\langle X, \mathcal{A}(X) \rangle}{\langle X, X \rangle}$
with the matrix inner product $\langle \cdot, \cdot \rangle$. We now...
- restrict X to low-rank matrices.
Rayleigh quotients wrt low-rank matrices

Consider a symmetric $n^2 \times n^2$ matrix A. Then
  $\lambda_{\min}(A) \approx \min_{X = UV^T \neq 0} \frac{\langle X, \mathcal{A}(X) \rangle}{\langle X, X \rangle}$.
- Approximation error governed by low-rank approximability of X.
- Solved by Riemannian optimization techniques or ALS.
ALS for eigenvalue problem

ALS for solving
  $\lambda_{\min}(A) \approx \min_{X = UV^T \neq 0} \frac{\langle X, \mathcal{A}(X) \rangle}{\langle X, X \rangle}$.
Initially: fix target rank r; choose $U \in \mathbb{R}^{n \times r}$, $V \in \mathbb{R}^{n \times r}$ randomly, such that V has orthonormal columns.

[Figure: initial iterate.] $\lambda - \tilde\lambda = 6 \cdot 10^{-3}$, residual $= 3 \cdot 10^{-3}$.
ALS for eigenvalue problem

ALS for solving
  $\lambda_{\min}(A) \approx \min_{X = UV^T \neq 0} \frac{\langle X, \mathcal{A}(X) \rangle}{\langle X, X \rangle}$.
Fix V, optimize for U:
  $\langle X, \mathcal{A}(X) \rangle = \mathrm{vec}(UV^T)^T A\, \mathrm{vec}(UV^T) = \mathrm{vec}(U)^T (V \otimes I)^T A (V \otimes I)\, \mathrm{vec}(U)$.
Compute the smallest eigenvalue of the reduced $(rn \times rn)$ matrix $(V \otimes I)^T A (V \otimes I)$ (see the sketch below).
Note: Computation of the reduced matrix benefits from the Kronecker structure of A.
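A small NumPy illustration of this half-step (helper name our own). With column-major vectorization, $\mathrm{vec}(UV^T) = (V \otimes I_n)\,\mathrm{vec}(U)$, so the optimal U comes from the smallest eigenpair of the projected matrix; for simplicity this sketch forms A and the projection densely, ignoring the Kronecker structure:

```python
import numpy as np

def optimize_U(A, V):
    """Fix V (orthonormal columns) and minimize the Rayleigh quotient over U:
    smallest eigenpair of the rn x rn matrix (V kron I)^T A (V kron I)."""
    n, r = V.shape
    W = np.kron(V, np.eye(n))             # n^2 x rn, orthonormal columns
    lam, Y = np.linalg.eigh(W.T @ A @ W)  # reduced eigenvalue problem
    return Y[:, 0].reshape(n, r, order='F'), lam[0]   # vec(U) -> U
```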
ALS for eigenvalue problem

ALS for solving
  $\lambda_{\min}(A) \approx \min_{X = UV^T \neq 0} \frac{\langle X, \mathcal{A}(X) \rangle}{\langle X, X \rangle}$.
Fix V, optimize for U.

[Figure: current iterate.] $\lambda - \tilde\lambda = 2 \cdot 10^{-3}$, residual $= 2 \cdot 10^{-3}$.
ALS for eigenvalue problem

ALS for solving
  $\lambda_{\min}(A) \approx \min_{X = UV^T \neq 0} \frac{\langle X, \mathcal{A}(X) \rangle}{\langle X, X \rangle}$.
Orthonormalize U; fix U, optimize for V:
  $\langle X, \mathcal{A}(X) \rangle = \mathrm{vec}(UV^T)^T A\, \mathrm{vec}(UV^T) = \mathrm{vec}(V^T)^T (I \otimes U)^T A (I \otimes U)\, \mathrm{vec}(V^T)$.
Compute the smallest eigenvalue of the reduced $(rn \times rn)$ matrix $(I \otimes U)^T A (I \otimes U)$.
Note: Computation of the reduced matrix benefits from the Kronecker structure of A.
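Putting the two half-steps together gives a complete ALS loop for the low-rank eigenvalue problem. This is a dense sketch for illustration only (`als_eig` is our own name): A is an explicit symmetric $n^2 \times n^2$ array, so the Kronecker structure a practical implementation would exploit is ignored.

```python
import numpy as np

def als_eig(A, n, r, n_sweeps=5, seed=0):
    """ALS sketch for lambda_min(A) over X = U V^T; A is an explicit
    symmetric n^2 x n^2 matrix. Alternates the two reduced rn x rn
    eigenproblems from the previous slides (column-major vec)."""
    rng = np.random.default_rng(seed)
    In = np.eye(n)
    V = np.linalg.qr(rng.standard_normal((n, r)))[0]
    for _ in range(n_sweeps):
        W = np.kron(V, In)                   # vec(U V^T) = (V kron I) vec(U)
        lam, Y = np.linalg.eigh(W.T @ A @ W)
        U = np.linalg.qr(Y[:, 0].reshape(n, r, order='F'))[0]
        W = np.kron(In, U)                   # vec(U V^T) = (I kron U) vec(V^T)
        lam, Y = np.linalg.eigh(W.T @ A @ W)
        V = np.linalg.qr(Y[:, 0].reshape(r, n, order='F').T)[0]
    return lam[0], U, V                      # Rayleigh quotient and factors
```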
ALS for eigenvalue problem

ALS for solving
  $\lambda_{\min}(A) \approx \min_{X = UV^T \neq 0} \frac{\langle X, \mathcal{A}(X) \rangle}{\langle X, X \rangle}$.
Orthonormalize U; fix U, optimize for V.

[Figure: current iterate.] $\lambda - \tilde\lambda = 1.5 \cdot 10^{-7}$, residual $= 7.7 \cdot 10^{-3}$.
ALS for eigenvalue problem

ALS for solving
  $\lambda_{\min}(A) \approx \min_{X = UV^T \neq 0} \frac{\langle X, \mathcal{A}(X) \rangle}{\langle X, X \rangle}$.
Orthonormalize V; fix V, optimize for U.

[Figure: current iterate.] $\lambda - \tilde\lambda = 1 \cdot 10^{-12}$, residual $= 6 \cdot 10^{-7}$.
ALS for eigenvalue problem

ALS for solving
  $\lambda_{\min}(A) \approx \min_{X = UV^T \neq 0} \frac{\langle X, \mathcal{A}(X) \rangle}{\langle X, X \rangle}$.
Orthonormalize U; fix U, optimize for V.

[Figure: current iterate.] $\lambda - \tilde\lambda = 7.6 \cdot 10^{-13}$, residual $= 7.2 \cdot 10^{-8}$.
Extension of ALS to TT

Recall the interface matrices
  $X_{\leq\mu} \in \mathbb{R}^{n_1 n_2 \cdots n_\mu \times r_\mu}$,  $X_{\geq\mu+1} \in \mathbb{R}^{n_{\mu+1} n_{\mu+2} \cdots n_d \times r_\mu}$,
yielding the factorization
  $X^{<\mu>} = X_{\leq\mu} X_{\geq\mu+1}^T$,  $\mu = 1, \dots, d-1$.
Combined with the recursion $X_{\geq\mu}^T = U_\mu^R (X_{\geq\mu+1}^T \otimes I_{n_\mu})$, this yields
  $X^{<\mu-1>} = X_{\leq\mu-1}\, U_\mu^R\, (X_{\geq\mu+1}^T \otimes I_{n_\mu})$,  $\mu = 1, \dots, d-1$.
Hence,
  $\mathrm{vec}(X) = (X_{\geq\mu+1} \otimes I_{n_\mu} \otimes X_{\leq\mu-1})\, \mathrm{vec}(U_\mu)$.
This formula allows us to pull out the µth core!
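The pull-out formula is easy to verify numerically. The following self-contained snippet (our own, assuming $d = 3$, $\mu = 2$, cores stored as $r_{\mu-1} \times n_\mu \times r_\mu$ arrays, and column-major vectorization) builds a random TT tensor and checks $\mathrm{vec}(X) = (X_{\geq 3} \otimes I_{n_2} \otimes X_{\leq 1})\,\mathrm{vec}(U_2)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, n3, r1, r2 = 3, 4, 5, 2, 3
U1 = rng.standard_normal((1, n1, r1))
U2 = rng.standard_normal((r1, n2, r2))
U3 = rng.standard_normal((r2, n3, 1))

X = np.einsum('aib,bjc,ckd->ijk', U1, U2, U3)   # full tensor from the TT cores
X_le1 = U1.reshape(n1, r1)                       # interface X_{<=1}: n1 x r1
X_ge3 = U3.reshape(r2, n3).T                     # interface X_{>=3}: n3 x r2
W = np.kron(X_ge3, np.kron(np.eye(n2), X_le1))   # (n1 n2 n3) x (r1 n2 r2)

assert np.allclose(X.reshape(-1, order='F'), W @ U2.reshape(-1, order='F'))
```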
Extension of ALS to TT

A TT decomposition is called µ-orthogonal if
  $(U_\nu^L)^T U_\nu^L = I_{r_\nu}$,  $X_{\leq\nu}^T X_{\leq\nu} = I_{r_\nu}$  for $\nu = 1, \dots, \mu-1$,
and
  $U_\nu^R (U_\nu^R)^T = I_{r_{\nu-1}}$,  $X_{\geq\nu}^T X_{\geq\nu} = I_{r_{\nu-1}}$  for $\nu = \mu+1, \dots, d$.
This implies that $X_{\geq\mu+1} \otimes I_{n_\mu} \otimes X_{\leq\mu-1}$ has orthonormal columns!

Consider the eigenvalue problem
  $\lambda_{\min}(A) = \min_{X \neq 0} \frac{\langle X, \mathcal{A}(X) \rangle}{\langle X, X \rangle}$.
Optimizing for the µth core:
  $\min_{U_\mu \neq 0} \frac{\langle X, \mathcal{A}(X) \rangle}{\langle X, X \rangle} = \min_{U_\mu \neq 0} \frac{\langle \mathrm{vec}(U_\mu), A_\mu \mathrm{vec}(U_\mu) \rangle}{\langle \mathrm{vec}(U_\mu), \mathrm{vec}(U_\mu) \rangle}$
with the $r_{\mu-1} n_\mu r_\mu \times r_{\mu-1} n_\mu r_\mu$ matrix
  $A_\mu = (X_{\geq\mu+1} \otimes I_{n_\mu} \otimes X_{\leq\mu-1})^T A\, (X_{\geq\mu+1} \otimes I_{n_\mu} \otimes X_{\leq\mu-1})$.
Extension of ALS to TT

- $U_\mu$ is obtained as an eigenvector belonging to the smallest eigenvalue of $A_\mu$.
- Computation of $A_\mu$ for large d is only feasible if A has low operator TT ranks (and is given in operator TT decomposition).
- One microstep of ALS optimizes $U_\mu$ and prepares for the next core by adjusting the orthogonalization.
- One sweep of ALS consists of processing all cores twice: once from left to right and once from right to left.
Extension of ALS to TT

Input: X in right-orthogonal TT decomposition.
1: for µ = 1, 2, ..., d−1 do
2:   Compute $A_\mu$ and replace core $U_\mu$ by an eigenvector belonging to the smallest eigenvalue of $A_\mu$.
3:   Compute QR decomposition $U_\mu^L = QR$.
4:   Set $U_\mu^L \gets Q$.
5:   Update $U_{\mu+1} \gets R \circ_1 U_{\mu+1}$ (multiply R into the first mode of $U_{\mu+1}$).
6: end for
7: for µ = d, d−1, ..., 2 do
8:   Compute $A_\mu$ and replace core $U_\mu$ by an eigenvector belonging to the smallest eigenvalue of $A_\mu$.
9:   Compute QR decomposition $(U_\mu^R)^T = QR$.
10:  Set $U_\mu^R \gets Q^T$.
11:  Update $U_{\mu-1} \gets R \circ_3 U_{\mu-1}$ (multiply $R^T$ into the third mode of $U_{\mu-1}$).
12: end for
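The orthogonalization bookkeeping in steps 3-5 and 9-11 can be sketched as follows (NumPy, helper names and storage convention our own: a list of $r_{\mu-1} \times n_\mu \times r_\mu$ arrays). Each shift leaves the represented tensor unchanged while moving the orthogonality center by one core:

```python
import numpy as np

def shift_right(cores, mu):
    """Steps 3-5: left-orthogonalize core mu via QR of its left unfolding
    U_mu^L and absorb the triangular factor into core mu+1 (in place)."""
    r0, n, r1 = cores[mu].shape
    Q, R = np.linalg.qr(cores[mu].reshape(r0 * n, r1))
    cores[mu] = Q.reshape(r0, n, -1)
    cores[mu + 1] = np.einsum('ab,bjc->ajc', R, cores[mu + 1])

def shift_left(cores, mu):
    """Steps 9-11: right-orthogonalize core mu via QR of (U_mu^R)^T and
    absorb the triangular factor into core mu-1 (in place)."""
    r0, n, r1 = cores[mu].shape
    Q, R = np.linalg.qr(cores[mu].reshape(r0, n * r1).T)
    cores[mu] = Q.T.reshape(-1, n, r1)
    cores[mu - 1] = np.einsum('ajc,cb->ajb', cores[mu - 1], R.T)
```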
Extension of ALS to TT

Remarks:
- The "small" matrix $A_\mu$ quickly gets large as the TT ranks increase. Need to use iterative methods (e.g., Lanczos, LOBPCG), possibly combined with preconditioning [Kressner/Tobler 2011], for solving the reduced eigenvalue problems.
- In ALS, the TT ranks of X need to be chosen a priori. Adaptive choice of ranks by merging neighbouring cores, optimizing for the merged core, and splitting the optimized merged core again: DMRG, modified ALS. Cheaper: AMEn [White 2005, Dolgov/Savostyanov 2013].
- The principles of ALS easily extend to other optimization problems, e.g., linear systems [Holtz/Rohwedder/Schneider 2012].
Numerical Experiments - Sine potential, d = 10

[Figure: ALS convergence; error in λ (err_lambda), residual norm (res), and iteration count (nr_iter) versus execution time in seconds.]

Size $= 128^{10} \approx 10^{21}$. Maximal TT rank 40. See [Kressner/Steinlechner/Uschmajew 2014] for details.
Numerical Experiments - Henon-Heiles potential, d = 20

[Figure: ALS convergence; error in λ (err_lambda), residual norm (res), and iteration count (nr_iter) versus execution time in seconds.]

Size $= 128^{20} \approx 10^{42}$. Maximal TT rank 40.
Numerical Experiments - $1/\|\xi\|^2$ potential, d = 20

[Figure: ALS convergence; error in λ (err_lambda), residual norm (res), and iteration count (nr_iter) versus execution time in seconds.]

Size $= 128^{20} \approx 10^{42}$. Maximal TT rank 30.