Low Rank Approximation
Lecture 7
Daniel Kressner
Chair for Numerical Algorithms and HPC
Institute of Mathematics, EPFL
daniel.kressner@epfl.ch

Alternating least-squares / linear scheme

General setting: Solve the optimization problem
$$\min_X f(X),$$
where $X$ is a (large) matrix or tensor and $f$ is simple (e.g., convex). Constrain $X$ to $\mathcal{M}_r$, the set of rank-$r$ matrices or tensors, and aim at solving
$$\min_{X \in \mathcal{M}_r} f(X).$$
Set $X = \iota(U_1, U_2, \ldots, U_d)$ (e.g., $X = U_1 U_2^T$). Low-rank formats are multilinear, so there is hope that optimizing for each component separately is simple:
$$\min_{U_\mu} f(\iota(U_1, U_2, \ldots, U_d)).$$

Alternating least-squares / linear scheme

Set $\bar f(U_1, \ldots, U_d) := f(\iota(U_1, \ldots, U_d))$. ALS:

1: while not converged do
2:   $U_1 \leftarrow \arg\min_{U_1} \bar f(U_1, U_2, \ldots, U_d)$
3:   $U_2 \leftarrow \arg\min_{U_2} \bar f(U_1, U_2, \ldots, U_d)$
4:   ...
5:   $U_d \leftarrow \arg\min_{U_d} \bar f(U_1, U_2, \ldots, U_d)$
6: end while

Examples: ALS for fitting the CP decomposition; subspace iteration. Closely related: block Gauss-Seidel, block coordinate descent.
Difficulties:
- Representation $(U_1, U_2, \ldots, U_d)$ often non-unique; parameters may become unbounded.
- $\mathcal{M}_r$ not closed.
- Convergence (analysis).

Subspace iteration and ALS

Given $A \in \mathbb{R}^{m \times n}$, consider the computation of a best rank-$r$ approximation:
$$\min_{U \in \mathbb{R}^{m \times r},\, V \in \mathbb{R}^{n \times r}} f(U, V), \qquad f(U, V) := \|A - UV^T\|_F^2.$$
The representation $UV^T$ is non-unique, but for fixed $V$ of rank $r$ the minimizer with respect to $U$ is unique (and vice versa); $f$ is convex with respect to $U$ and $V$ individually. Hence,
$$f(U + H, V) - f(U, V) = \langle \nabla_U f(U, V), H \rangle + O(\|H\|_2^2) = -2\langle AV - UV^T V, H \rangle + O(\|H\|_2^2).$$
Setting
$$0 = \nabla_U f(U, V) = -2(AV - UV^T V)$$
yields $U = AV (V^T V)^{-1}$. For stability it is advisable to choose $V$ such that it has orthonormal columns.
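
As a quick sanity check of the gradient formula above, here is a minimal finite-difference test in NumPy (all sizes and the random seed are illustrative):

```python
import numpy as np

# Finite-difference check of grad_U f(U, V) = -2 (A V - U V^T V)
# for f(U, V) = ||A - U V^T||_F^2.
rng = np.random.default_rng(1)
m, n, r = 6, 5, 2
A = rng.standard_normal((m, n))
U = rng.standard_normal((m, r))
V = rng.standard_normal((n, r))
H = rng.standard_normal((m, r))      # direction of perturbation

f = lambda U: np.linalg.norm(A - U @ V.T, "fro") ** 2
grad = -2 * (A @ V - U @ (V.T @ V))

eps = 1e-6
fd = (f(U + eps * H) - f(U - eps * H)) / (2 * eps)  # directional derivative
print(np.isclose(fd, np.sum(grad * H)))             # True up to FD error
```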

Subspace iteration and ALS

ALS for low-rank matrix approximation:

1: while not converged do
2:   Compute the economy-size QR factorization $V = QR$ and update $V \leftarrow Q$.
3:   $U \leftarrow AV$
4:   Compute the economy-size QR factorization $U = QR$ and update $U \leftarrow Q$.
5:   $V \leftarrow A^T U$
6: end while

Returns an approximation $A \approx UV^T$. This is the subspace iteration from Lecture 1!
EFY. Develop an ALS method for solving the weighted low-rank approximation problem
$$\min_{U,V} \|W_L (A - UV^T) W_R\|_F$$
with square and invertible matrices $W_L$, $W_R$.
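
A minimal NumPy sketch of this iteration (the random start, fixed iteration count, and function name are assumptions; a real implementation would add a convergence test):

```python
import numpy as np

def als_low_rank(A, r, n_iter=50, rng=None):
    """ALS / subspace iteration for a rank-r approximation A ≈ U @ V.T."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    V = rng.standard_normal((n, r))
    for _ in range(n_iter):
        V, _ = np.linalg.qr(V)   # orthonormalize V (economy-size QR)
        U = A @ V                # U-step: arg min_U ||A - U V^T||_F
        U, _ = np.linalg.qr(U)   # orthonormalize U
        V = A.T @ U              # V-step: arg min_V ||A - U V^T||_F
    return U, V                  # A ≈ U @ V.T
```

Note that with orthonormal $V$, the update $U = AV(V^TV)^{-1}$ from the previous slide reduces to $U = AV$.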

Linear matrix equations

For a linear operator $\mathcal{L}: \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}$, consider the linear system
$$\mathcal{L}(X) = C, \qquad C, X \in \mathbb{R}^{m \times n}.$$
Examples:¹
- Sylvester matrix equation:
  $$AX + XB = C, \qquad A \in \mathbb{R}^{m \times m},\ B \in \mathbb{R}^{n \times n},\ C, X \in \mathbb{R}^{m \times n}.$$
  Applications: discretized 2D Laplace equation on a rectangle, stability analysis, optimal control, model reduction of linear control systems. Special case of Lyapunov equations: $m = n$, $A = B^T$, $C$ symmetric (and often negative semi-definite).
- Stochastic Galerkin methods in uncertainty quantification.
- Stochastic control.

¹ See [V. Simoncini, Computational methods for linear matrix equations, SIAM Rev., 58 (2016), pp. 377-441] for details and references.

Linear matrix equations

Using the matrix $M_\mathcal{L}$ representing $\mathcal{L}$ in canonical bases, we can rewrite $\mathcal{L}(X) = C$ as the linear system $M_\mathcal{L}\,\mathrm{vec}(X) = \mathrm{vec}(C)$.
Assumption: $M_\mathcal{L}$ has low Kronecker rank:
$$M_\mathcal{L} = B_1 \otimes A_1 + \cdots + B_R \otimes A_R, \qquad R \ll m, n.$$
Equivalently, $\mathcal{L}(X) = A_1 X B_1^T + \cdots + A_R X B_R^T$.
EFY. Develop a variant of ACA (from Lecture 3) that aims at approximating a given sparse matrix $A$ by a matrix of low Kronecker rank for given $m$, $n$.
EFY. Show that if $m = n$ and $M_\mathcal{L}$ is symmetric with Kronecker rank $R$, one can find symmetric matrices $A_1, \ldots, A_R$, $B_1, \ldots, B_R$ such that $\mathcal{L}(X) = A_1 X B_1 + \cdots + A_R X B_R$. Is it always possible to choose all $A_k$, $B_k$ positive semi-definite if $M_\mathcal{L}$ is positive definite?
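
For the Sylvester equation $AX + XB = C$, this representation is $M_\mathcal{L} = I_n \otimes A + B^T \otimes I_m$, i.e., Kronecker rank 2. A short NumPy check of the identity $(B_k \otimes A_k)\,\mathrm{vec}(X) = \mathrm{vec}(A_k X B_k^T)$ in this case (sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n))
X = rng.standard_normal((m, n))

# Sylvester operator L(X) = A X + X B has Kronecker rank 2.
M_L = np.kron(np.eye(n), A) + np.kron(B.T, np.eye(m))

lhs = M_L @ X.flatten(order="F")          # vec = column-major stacking
rhs = (A @ X + X @ B).flatten(order="F")  # vec(L(X))
print(np.allclose(lhs, rhs))              # True
```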

Linear matrix equations

Two ways of turning $\mathcal{L}(X) = C$ into an optimization problem:
1. If $M_\mathcal{L}$ is symmetric positive definite:
$$\min_X \tfrac{1}{2}\langle \mathcal{L}(X), X \rangle - \langle X, C \rangle.$$
2. General $\mathcal{L}$:
$$\min_X \|\mathcal{L}(X) - C\|_F^2.$$
Will focus on spd $M_\mathcal{L}$ in the following.

Linear matrix equations

A low-rank approximation of the solution of $\mathcal{L}(X) = C$ is obtained by solving $\min_{U,V} f(U, V)$ for
$$f(U, V) = \tfrac{1}{2}\langle \mathcal{L}(UV^T), UV^T \rangle - \langle UV^T, C \rangle.$$
Let $\mathcal{L}$ have Kronecker rank $R$. Then
$$\langle \mathcal{L}(UV^T), UV^T \rangle = \sum_{k=1}^R \langle A_k U V^T B_k, UV^T \rangle = \sum_{k=1}^R \langle A_k U (V^T B_k V), U \rangle.$$
This shows that $\arg\min_U f(U, V)$ is the solution of the linear matrix equation
$$A_1 U (V^T B_1 V) + \cdots + A_R U (V^T B_R V) = CV.$$
EFY. Show that this linear matrix equation always has a unique solution under the assumption that $\mathcal{L}$ is symmetric positive definite.
For $R = 2$, this can be reduced to $r$ linear systems of size $n \times n$. For $R > 2$, one needs to solve an $rn \times rn$ system.

Linear matrix equations

ALS for linear matrix equations:

1: while not converged do
2:   Compute the economy-size QR factorization $V = QR$ and update $V \leftarrow Q$.
3:   Solve $A_1 U (V^T B_1 V) + \cdots + A_R U (V^T B_R V) = CV$ for $U$.
4:   Compute the economy-size QR factorization $U = QR$ and update $U \leftarrow Q$.
5:   Solve $(U^T A_1 U) V^T B_1 + \cdots + (U^T A_R U) V^T B_R = U^T C$ for $V$.
6: end while

Returns an approximation $X \approx UV^T$. For $R = 2$, there are better alternatives: ADI, Krylov subspace methods, ... [Simoncini 2016].
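
A minimal sketch of this loop in NumPy, under the slide's spd setting with symmetric $A_k$, $B_k$ and $m = n$. It solves each subproblem by forming the small Kronecker system explicitly, which is only sensible for modest sizes; the function name and fixed iteration count are assumptions:

```python
import numpy as np

def als_matrix_equation(As, Bs, C, r, n_iter=20, rng=None):
    """ALS sketch for L(X) = sum_k A_k X B_k = C with X ≈ U @ V.T."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = C.shape
    V = rng.standard_normal((n, r))
    for _ in range(n_iter):
        V = np.linalg.qr(V)[0]   # orthonormalize V (economy-size QR)
        # U-step: solve sum_k A_k U (V^T B_k V) = C V  (mr x mr system;
        # the Kronecker factors below assume symmetric B_k)
        M = sum(np.kron(V.T @ Bk @ V, Ak) for Ak, Bk in zip(As, Bs))
        u = np.linalg.solve(M, (C @ V).flatten(order="F"))
        U = np.linalg.qr(u.reshape((m, r), order="F"))[0]  # orthonormalize U
        # V-step: solve sum_k (U^T A_k U) V^T B_k = U^T C  (rn x rn system)
        M = sum(np.kron(Bk.T, U.T @ Ak @ U) for Ak, Bk in zip(As, Bs))
        w = np.linalg.solve(M, (U.T @ C).flatten(order="F"))
        V = w.reshape((r, n), order="F").T
    return U, V  # X ≈ U @ V.T
```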

2D eigenvalue problem

$$-\Delta u(x) + V(x)u(x) = \lambda u(x) \quad \text{in } \Omega = [0,1] \times [0,1]$$
with Dirichlet boundary conditions and Hénon-Heiles potential $V$. Regular discretization.
[Figures: ground state; ground state reshaped into a matrix; singular values of the reshaped ground state, decaying from about $10^0$ to $10^{-20}$ within the first 300 indices.]
Excellent rank-10 approximation possible.
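
To make the setup concrete, here is a sketch of such a regular finite-difference discretization; the grid size and the potential below are placeholders, not the lecture's actual Hénon-Heiles data:

```python
import numpy as np

n = 32
h = 1.0 / (n + 1)
# 1D Laplacian with Dirichlet b.c. on a regular grid
D = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2

x = np.linspace(h, 1.0 - h, n)
X1, X2 = np.meshgrid(x, x, indexing="ij")
V = 0.5 * (X1**2 + X2**2)          # placeholder potential (not Henon-Heiles)

# Discretized -Δ + V: Kronecker sum plus a diagonal, n^2 x n^2, symmetric
A = (np.kron(np.eye(n), D) + np.kron(D, np.eye(n))
     + np.diag(V.flatten(order="F")))
```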

Rayleigh quotients wrt low-rank matrices

Consider a symmetric $n^2 \times n^2$ matrix $A$. Then
$$\lambda_{\min}(A) = \min_{x \neq 0} \frac{\langle x, Ax \rangle}{\langle x, x \rangle}.$$
We now...
- reshape the vector $x$ into an $n \times n$ matrix $X$;
- reinterpret $Ax$ as a linear operator $\mathcal{A}: X \mapsto \mathcal{A}(X)$.

Rayleigh quotients wrt low-rank matrices

Consider a symmetric $n^2 \times n^2$ matrix $A$. Then
$$\lambda_{\min}(A) = \min_{X \neq 0} \frac{\langle X, \mathcal{A}(X) \rangle}{\langle X, X \rangle}$$
with the matrix inner product $\langle \cdot, \cdot \rangle$. We now...
- restrict $X$ to low-rank matrices.

Rayleigh quotients wrt low-rank matrices

Consider a symmetric $n^2 \times n^2$ matrix $A$. Then
$$\lambda_{\min}(A) \approx \min_{X = UV^T \neq 0} \frac{\langle X, \mathcal{A}(X) \rangle}{\langle X, X \rangle}.$$
The approximation error is governed by the low-rank approximability of $X$. Solved by Riemannian optimization techniques or ALS.

ALS for eigenvalue problem

ALS for solving
$$\lambda_{\min}(A) \approx \min_{X = UV^T \neq 0} \frac{\langle X, \mathcal{A}(X) \rangle}{\langle X, X \rangle}.$$
Initially: fix the target rank $r$ and choose $U \in \mathbb{R}^{m \times r}$, $V \in \mathbb{R}^{n \times r}$ randomly, such that $V$ has orthonormal columns.
$\lambda - \tilde\lambda = 6 \cdot 10^{-3}$, residual $= 3 \cdot 10^{-3}$.

ALS for eigenvalue problem

ALS for solving
$$\lambda_{\min}(A) \approx \min_{X = UV^T \neq 0} \frac{\langle X, \mathcal{A}(X) \rangle}{\langle X, X \rangle}.$$
Fix $V$, optimize for $U$:
$$\langle X, \mathcal{A}(X) \rangle = \mathrm{vec}(UV^T)^T A\, \mathrm{vec}(UV^T) = \mathrm{vec}(U)^T (V \otimes I)^T A (V \otimes I)\, \mathrm{vec}(U).$$
Compute the smallest eigenvalue of the reduced $rn \times rn$ matrix $(V \otimes I)^T A (V \otimes I)$.
Note: The computation of the reduced matrix benefits from the Kronecker structure of $A$.
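
A sketch of this microstep in NumPy. It forms $V \otimes I$ densely for clarity, whereas the slide's note points out that one should exploit the Kronecker structure of $A$ instead; the function name is illustrative:

```python
import numpy as np

def als_eig_step_U(A, V):
    """U-step: reduce A by W = V ⊗ I and take the smallest eigenpair."""
    r = V.shape[1]
    m = A.shape[0] // V.shape[0]          # here m = n for an n^2 x n^2 A
    W = np.kron(V, np.eye(m))             # dense, for demo purposes only
    A_red = W.T @ A @ W                   # reduced rm x rm matrix
    vals, vecs = np.linalg.eigh(A_red)    # symmetric eigenproblem
    U = vecs[:, 0].reshape((m, r), order="F")  # eigvec of smallest eigenvalue
    return U, vals[0]
```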

ALS for eigenvalue problem

ALS for solving
$$\lambda_{\min}(A) \approx \min_{X = UV^T \neq 0} \frac{\langle X, \mathcal{A}(X) \rangle}{\langle X, X \rangle}.$$
Fix $V$, optimize for $U$.
$\lambda - \tilde\lambda = 2 \cdot 10^{-3}$, residual $= 2 \cdot 10^{-3}$.

ALS for eigenvalue problem

ALS for solving
$$\lambda_{\min}(A) \approx \min_{X = UV^T \neq 0} \frac{\langle X, \mathcal{A}(X) \rangle}{\langle X, X \rangle}.$$
Orthonormalize $U$, fix $U$, optimize for $V$:
$$\langle X, \mathcal{A}(X) \rangle = \mathrm{vec}(UV^T)^T A\, \mathrm{vec}(UV^T) = \mathrm{vec}(V^T)^T (I \otimes U)^T A (I \otimes U)\, \mathrm{vec}(V^T).$$
Compute the smallest eigenvalue of the reduced $rn \times rn$ matrix $(I \otimes U)^T A (I \otimes U)$.
Note: The computation of the reduced matrix benefits from the Kronecker structure of $A$.
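
The corresponding $V$-step mirrors the $U$-step sketch above, now reducing with $I \otimes U$ (again dense only for demonstration):

```python
import numpy as np

def als_eig_step_V(A, U):
    """V-step: reduce A by W = I ⊗ U and take the smallest eigenpair."""
    r = U.shape[1]
    n = A.shape[0] // U.shape[0]
    W = np.kron(np.eye(n), U)             # dense, for demo purposes only
    A_red = W.T @ A @ W                   # reduced rn x rn matrix
    vals, vecs = np.linalg.eigh(A_red)
    Vt = vecs[:, 0].reshape((r, n), order="F")  # eigenvector is vec(V^T)
    return Vt.T, vals[0]
```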

ALS for eigenvalue problem

ALS for solving
$$\lambda_{\min}(A) \approx \min_{X = UV^T \neq 0} \frac{\langle X, \mathcal{A}(X) \rangle}{\langle X, X \rangle}.$$
Orthonormalize $U$, fix $U$, optimize for $V$.
$\lambda - \tilde\lambda = 1.5 \cdot 10^{-7}$, residual $= 7.7 \cdot 10^{-3}$.

ALS for eigenvalue problem

ALS for solving
$$\lambda_{\min}(A) \approx \min_{X = UV^T \neq 0} \frac{\langle X, \mathcal{A}(X) \rangle}{\langle X, X \rangle}.$$
Orthonormalize $V$, fix $V$, optimize for $U$.
$\lambda - \tilde\lambda = 1 \cdot 10^{-12}$, residual $= 6 \cdot 10^{-7}$.

ALS for eigenvalue problem

ALS for solving
$$\lambda_{\min}(A) \approx \min_{X = UV^T \neq 0} \frac{\langle X, \mathcal{A}(X) \rangle}{\langle X, X \rangle}.$$
Orthonormalize $U$, fix $U$, optimize for $V$.
$\lambda - \tilde\lambda = 7.6 \cdot 10^{-13}$, residual $= 7.2 \cdot 10^{-8}$.

Extension of ALS to TT

Recall the interface matrices
$$X_{\le\mu} \in \mathbb{R}^{n_1 n_2 \cdots n_\mu \times r_\mu}, \qquad X_{\ge\mu+1} \in \mathbb{R}^{n_{\mu+1} n_{\mu+2} \cdots n_d \times r_\mu},$$
yielding the factorization
$$X^{\langle\mu\rangle} = X_{\le\mu} X_{\ge\mu+1}^T, \qquad \mu = 1, \ldots, d-1.$$
Combined with the recursion $X_{\ge\mu}^T = U_\mu^R\,(X_{\ge\mu+1}^T \otimes I_{n_\mu})$, this yields
$$X^{\langle\mu-1\rangle} = X_{\le\mu-1}\, U_\mu^R\, (X_{\ge\mu+1}^T \otimes I_{n_\mu}).$$
Hence,
$$\mathrm{vec}(\mathcal{X}) = (X_{\ge\mu+1} \otimes I_{n_\mu} \otimes X_{\le\mu-1})\, \mathrm{vec}(U_\mu).$$
This formula allows us to pull out the $\mu$th core!

Extension of ALS to TT

A TT decomposition is called $\mu$-orthogonal if
$$(U_\nu^L)^T U_\nu^L = I_{r_\nu}, \quad X_{\le\nu}^T X_{\le\nu} = I_{r_\nu} \quad \text{for } \nu = 1, \ldots, \mu-1,$$
and
$$U_\nu^R (U_\nu^R)^T = I_{r_{\nu-1}}, \quad X_{\ge\nu}^T X_{\ge\nu} = I_{r_{\nu-1}} \quad \text{for } \nu = \mu+1, \ldots, d.$$
This implies that $X_{\ge\mu+1} \otimes I_{n_\mu} \otimes X_{\le\mu-1}$ has orthonormal columns!
Consider the eigenvalue problem
$$\lambda_{\min}(\mathcal{A}) = \min_{\mathcal{X} \neq 0} \frac{\langle \mathcal{X}, \mathcal{A}(\mathcal{X}) \rangle}{\langle \mathcal{X}, \mathcal{X} \rangle}.$$
Optimizing for the $\mu$th core:
$$\min_{U_\mu \neq 0} \frac{\langle \mathcal{X}, \mathcal{A}(\mathcal{X}) \rangle}{\langle \mathcal{X}, \mathcal{X} \rangle} = \min_{U_\mu \neq 0} \frac{\langle \mathrm{vec}\,U_\mu, A_\mu\, \mathrm{vec}\,U_\mu \rangle}{\langle \mathrm{vec}\,U_\mu, \mathrm{vec}\,U_\mu \rangle}$$
with the $r_{\mu-1} n_\mu r_\mu \times r_{\mu-1} n_\mu r_\mu$ matrix
$$A_\mu = (X_{\ge\mu+1} \otimes I_{n_\mu} \otimes X_{\le\mu-1})^T A\, (X_{\ge\mu+1} \otimes I_{n_\mu} \otimes X_{\le\mu-1}).$$
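
A small sketch of how $A_\mu$ could be formed from given interface matrices; it builds the Kronecker product densely, which the next slide rules out for large $d$ (there one exploits a low operator TT rank of $A$ instead). The function name is illustrative:

```python
import numpy as np

def reduced_matrix_tt(A, X_left, X_right, n_mu):
    """Form A_mu = W^T A W with W = X_right ⊗ I_{n_mu} ⊗ X_left (dense demo)."""
    W = np.kron(X_right, np.kron(np.eye(n_mu), X_left))
    return W.T @ A @ W   # square of size r_{mu-1} * n_mu * r_mu
```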

Extension of ALS to TT

$U_\mu$ is obtained as an eigenvector belonging to the smallest eigenvalue of $A_\mu$. The computation of $A_\mu$ for large $d$ is only feasible if $A$ has low operator TT ranks (and is given in operator TT decomposition). One microstep of ALS optimizes $U_\mu$ and prepares for the next core by adjusting the orthogonalization. One sweep of ALS consists of processing the cores twice: once from left to right and once from right to left.

Extension of ALS to TT

Input: $\mathcal{X}$ in right-orthogonal TT decomposition.

1: for $\mu = 1, 2, \ldots, d-1$ do
2:   Compute $A_\mu$ and replace the core $U_\mu$ by an eigenvector belonging to the smallest eigenvalue of $A_\mu$.
3:   Compute the QR decomposition $U_\mu^L = QR$.
4:   Set $U_\mu^L \leftarrow Q$.
5:   Update $U_{\mu+1} \leftarrow R \circ_1 U_{\mu+1}$ (absorb $R$ into the first mode of the next core).
6: end for
7: for $\mu = d, d-1, \ldots, 2$ do
8:   Compute $A_\mu$ and replace the core $U_\mu$ by an eigenvector belonging to the smallest eigenvalue of $A_\mu$.
9:   Compute the QR decomposition $(U_\mu^R)^T = QR$.
10:  Set $U_\mu^R \leftarrow Q^T$.
11:  Update $U_{\mu-1} \leftarrow R \circ_3 U_{\mu-1}$ (absorb the triangular factor into the third mode of the previous core).
12: end for

Extension of ALS to TT

Remarks:
- The matrix $A_\mu$ quickly gets large as the TT ranks increase. One then needs iterative methods (e.g., Lanczos, LOBPCG), possibly combined with preconditioning [Kressner/Tobler 2011], for solving the eigenvalue problems.
- In ALS, the TT ranks of $\mathcal{X}$ need to be chosen a priori. Adaptive rank choice by merging neighbouring cores, optimizing for the merged core, and splitting the optimized merged core: DMRG, modified ALS. Cheaper: AMEn [White 2005, Dolgov/Savostyanov 2013].
- The principles of ALS easily extend to other optimization problems, e.g., linear systems [Holtz/Rohwedder/Schneider 2012].

Numerical Experiments - Sine potential, d = 10

[Figure: ALS convergence; eigenvalue error (err_lambda), residual (res), and iteration count (nr_iter) plotted against execution time [s].]
Size $= 128^{10} \approx 10^{21}$. Maximal TT rank 40. See [Kressner/Steinlechner/Uschmajew 2014] for details.

Numerical Experiments - Hénon-Heiles potential, d = 20

[Figure: ALS convergence; err_lambda, res, and nr_iter vs. execution time [s].]
Size $= 128^{20} \approx 10^{42}$. Maximal TT rank 40.

Numerical Experiments - $1/\|\xi\|^2$ potential, d = 20

[Figure: ALS convergence; err_lambda, res, and nr_iter vs. execution time [s].]
Size $= 128^{20} \approx 10^{42}$. Maximal TT rank 30.