Block Bidiagonal Decomposition and Least Squares Problems

Block Bidiagonal Decomposition and Least Squares Problems
Åke Björck, Department of Mathematics, Linköping University
Perspectives in Numerical Analysis, Helsinki, May 27–29, 2008

Outline Bidiagonal Decomposition and LSQR Block Bidiagonalization Methods Multiple Right-Hand Sides in LS and TLS Rank Deficient Blocks and Deflation Summary

The Bidiagonal Decomposition

In 1965 Golub and Kahan gave two algorithms for computing a bidiagonal decomposition (BD)

$$A = U \begin{pmatrix} B \\ 0 \end{pmatrix} V^T \in \mathbb{R}^{m\times n},$$

where $B$ is lower bidiagonal,

$$B = \begin{pmatrix} \alpha_1 & & & \\ \beta_2 & \alpha_2 & & \\ & \beta_3 & \ddots & \\ & & \ddots & \alpha_n \\ & & & \beta_{n+1} \end{pmatrix} \in \mathbb{R}^{(n+1)\times n},$$

and $U$ and $V$ are square orthogonal matrices. Given $u_1 = Q_1 e_1$, the decomposition is uniquely determined if $\alpha_k \beta_{k+1} \ne 0$, $k = 1:n$.

The Bidiagonal Decomposition: The Householder Algorithm

For a given Householder matrix $Q_1$, set $A_0 = Q_1 A$, and apply Householder transformations alternately from the right and the left:

$$A_k = Q_{k+1}(A_{k-1} P_k), \qquad k = 1, 2, \ldots.$$

$P_k$ zeros $n-k$ elements in the $k$th row; $Q_{k+1}$ zeros $m-k-1$ elements in the $k$th column. If $Q_1$ is chosen so that $Q_1 b = \beta_1 e_1$, then

$$U^T\,(b \;\; AV) = \begin{pmatrix} \beta_1 e_1 & B \\ 0 & 0 \end{pmatrix},$$

where $B$ is lower bidiagonal and

$$U = Q_1 Q_2 \cdots Q_{n+1}, \qquad V = P_1 P_2 \cdots P_{n-1}.$$

The Bidiagonal Decomposition: Lanczos Algorithm

In the Lanczos process, successive columns of

$$U = (u_1, u_2, \ldots, u_{n+1}), \qquad V = (v_1, v_2, \ldots, v_n),$$

with $\beta_1 u_1 = b$, $\beta_1 = \|b\|_2$, are generated recursively. Equating columns in $A^T U = V B^T$ and $AV = UB$ gives the coupled two-term recurrence relations (with $v_0 \equiv 0$):

$$\alpha_k v_k = A^T u_k - \beta_k v_{k-1}, \qquad \beta_{k+1} u_{k+1} = A v_k - \alpha_k u_k, \qquad k = 1:n.$$

Here $\alpha_k$ and $\beta_{k+1}$ are determined by the conditions $\|v_k\|_2 = \|u_{k+1}\|_2 = 1$.
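
Below is a minimal NumPy sketch of this coupled two-term recurrence (the function name and interface are illustrative; reorthogonalization and the termination tests $\alpha_k = 0$ or $\beta_{k+1} = 0$ are omitted).

```python
import numpy as np

def golub_kahan_lanczos(A, b, steps):
    """Lanczos (Golub-Kahan) lower bidiagonalization started from b:
        beta_1 u_1 = b,
        alpha_k v_k       = A^T u_k - beta_k v_{k-1},
        beta_{k+1} u_{k+1} = A v_k  - alpha_k u_k.
    Returns U, V with (in exact arithmetic) orthonormal columns and the
    lower bidiagonal B such that A @ V = U @ B."""
    m, n = A.shape
    U = np.zeros((m, steps + 1))
    V = np.zeros((n, steps))
    B = np.zeros((steps + 1, steps))

    beta = np.linalg.norm(b)          # beta_1
    U[:, 0] = b / beta
    v_old = np.zeros(n)               # v_0 = 0

    for k in range(steps):
        w = A.T @ U[:, k] - beta * v_old      # alpha_k v_k
        alpha = np.linalg.norm(w)
        V[:, k] = w / alpha
        B[k, k] = alpha

        w = A @ V[:, k] - alpha * U[:, k]     # beta_{k+1} u_{k+1}
        beta = np.linalg.norm(w)
        U[:, k + 1] = w / beta
        B[k + 1, k] = beta
        v_old = V[:, k]

    return U, V, B

# quick check on made-up data: A @ V - U @ B should be ~ machine precision
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20)); b = rng.standard_normal(50)
U, V, B = golub_kahan_lanczos(A, b, 10)
print(np.linalg.norm(A @ V - U @ B))
```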

The Bidiagonal Decomposition

Since the BD is unique, both algorithms generate the same decomposition in exact arithmetic. From the Lanczos recurrences it follows that $U_k = (u_1, \ldots, u_k)$ and $V_k = (v_1, \ldots, v_k)$ form orthonormal bases for the left and right Krylov subspaces

$$\mathcal{K}_k(AA^T, u_1) = \mathrm{span}\{b,\; AA^Tb,\; \ldots,\; (AA^T)^{k-1}b\},$$
$$\mathcal{K}_k(A^TA, v_1) = \mathrm{span}\{A^Tb,\; (A^TA)A^Tb,\; \ldots,\; (A^TA)^{k-1}A^Tb\}.$$

If either $\alpha_k = 0$ or $\beta_{k+1} = 0$, the process terminates; the Krylov subspace of maximal dimension has then been reached.

Least Squares and LSQR

The LSQR algorithm (Paige and Saunders 1982) is a Krylov subspace method for the LS problem $\min_x \|b - Ax\|_2$. After $k$ steps, one seeks an approximate solution

$$x_k = V_k y_k \in \mathcal{K}_k(A^TA, A^Tb).$$

From the recurrence formulas it follows that $AV_k = U_{k+1}B_k$. Thus

$$b - AV_k y_k = U_{k+1}(\beta_1 e_1 - B_k y_k),$$

where $B_k$ is lower bidiagonal. By the orthogonality of $U_{k+1}$, the optimal $y_k$ is obtained by solving the bidiagonal subproblem

$$\min_y \|\beta_1 e_1 - B_k y\|_2.$$

Least Squares and LSQR

This subproblem is solved using Givens rotations to transform $B_k$ into upper bidiagonal form, giving

$$G(\beta_1 e_1 \;\; B_k) = \begin{pmatrix} f_k & R_k \\ \phi_{k+1} & 0 \end{pmatrix}, \qquad x_k = V_k R_k^{-1} f_k, \qquad \|b - Ax_k\|_2 = \phi_{k+1}.$$

Then $x_k$, $k = 1, 2, 3, \ldots$, gives approximations on a nested sequence of Krylov subspaces, often superior to truncated SVD. LSQR uses the Lanczos recursion and interleaves it with the solution of the LS subproblems. A Householder implementation could be used for dense $A$.
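
In practice one rarely codes the Givens sweep by hand; SciPy ships an implementation of the Paige–Saunders algorithm as `scipy.sparse.linalg.lsqr`. A small usage sketch on made-up data:

```python
import numpy as np
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 30))
b = rng.standard_normal(100)

# lsqr builds the Golub-Kahan bidiagonalization of (b, A) and solves the
# bidiagonal subproblems min ||beta_1 e_1 - B_k y||_2 as it goes.
result = lsqr(A, b, atol=1e-12, btol=1e-12)
x, istop, itn, r1norm = result[0], result[1], result[2], result[3]

print(itn, r1norm)                         # iterations and ||b - A x||_2
print(np.linalg.norm(A.T @ (b - A @ x)))   # normal-equations residual
```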

Least Squares and LSQR

Bidiagonalization can also be used for total least squares (TLS) problems. Here the error function to be minimized is

$$\frac{\|b - Ax_k\|_2^2}{\|x_k\|_2^2 + 1} = \frac{\|\beta_1 e_1 - B_k y_k\|_2^2}{\|y_k\|_2^2 + 1}.$$

The solutions of the lower bidiagonal TLS subproblems are obtained from the SVD of the matrix

$$(\beta_1 e_1 \;\; B_k), \qquad k = 1, 2, 3, \ldots.$$

Partial least squares (PLS) is widely used in data analysis and statistics, where the predictive variables tend to be many and are often highly correlated. PLS is mathematically equivalent to LSQR but implemented differently.
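
As a hedged illustration of the SVD step (not code from the talk): in the generic case, $y_k$ can be read off the right singular vector of $(\beta_1 e_1 \;\; B_k)$ belonging to its smallest singular value, provided the first component of that vector is nonzero.

```python
import numpy as np

def tls_from_bidiagonal(beta1, Bk):
    """Minimize ||beta1*e1 - Bk y||^2 / (||y||^2 + 1) via the SVD of
    C = (beta1*e1, Bk).  Standard TLS recipe: take v, the right singular
    vector for the smallest singular value; then (-1, y) is proportional
    to v, so y = -v[1:] / v[0] (assumes the generic case v[0] != 0)."""
    k1, _ = Bk.shape
    C = np.hstack([beta1 * np.eye(k1, 1), Bk])   # first column = beta1*e1
    _, _, Vt = np.linalg.svd(C)
    v = Vt[-1]                                   # smallest singular value
    if abs(v[0]) < 1e-14:
        raise ValueError("nongeneric TLS subproblem: v[0] ~ 0")
    return -v[1:] / v[0]
```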

Outline Bidiagonal Decomposition and LSQR Block Bidiagonalization Methods Multiple Right-Hand Sides in LS and TLS Rank Deficient Blocks and Deflation Summary

Block Bidiagonalization Methods

Let $Q_1$ be a given $m\times m$ orthogonal matrix and set

$$U_1 = Q_1 \begin{pmatrix} I_p \\ 0 \end{pmatrix} \in \mathbb{R}^{m\times p}.$$

Next form $Q_1^T A$ and compute the LQ factorization of its first $p$ rows,

$$(I_p \;\; 0)(Q_1^T A)P_1 = (L_1 \;\; 0),$$

where $L_1$ is lower triangular. Proceed by alternately performing QR factorizations of blocks of $p$ columns and LQ factorizations of blocks of $p$ rows.

Block Bidiagonalization Methods

After $k$ steps we have computed a block bidiagonal matrix

$$T_k = \begin{pmatrix} L_1 & & & \\ R_2 & L_2 & & \\ & \ddots & \ddots & \\ & & R_k & L_k \\ & & & R_{k+1} \end{pmatrix} \in \mathbb{R}^{(k+1)p\times kp},$$

and two block matrices with orthogonal columns

$$(U_1, U_2, \ldots, U_{k+1}) = Q_{k+1}\cdots Q_2 Q_1 \begin{pmatrix} I_{(k+1)p} \\ 0 \end{pmatrix}, \qquad (V_1, V_2, \ldots, V_k) = P_k\cdots P_2 P_1 \begin{pmatrix} I_{kp} \\ 0 \end{pmatrix}.$$

The matrix $T_k$ is a banded lower triangular matrix with $p+1$ nonzero diagonals.

Block Bidiagonalization Methods

The Householder algorithm gives a constructive proof of the existence of such a block bidiagonal decomposition. If the LQ and QR factorizations have full rank, then the decomposition is uniquely determined by $U_1$.

To derive a block Lanczos algorithm we use the identities

$$A(V_1\,V_2\cdots V_n) = (U_1\,U_2\cdots U_{n+1})T_n, \qquad A^T(U_1\,U_2\cdots U_{n+1}) = (V_1\,V_2\cdots V_n)T_n^T.$$

Start by forming $Z_1 = A^T U_1$ and compute its thin QR factorization $Z_1 = V_1 L_1^T$.

Block Bidiagonalization Methods

For $k = 1:n$ we then do:

Compute the residual $W_k = AV_k - U_k L_k \in \mathbb{R}^{m\times p}$ and its QR factorization $W_k = U_{k+1}R_{k+1}$.

Compute the residual $Z_{k+1} = A^T U_{k+1} - V_k R_{k+1}^T \in \mathbb{R}^{n\times p}$ and its QR factorization $Z_{k+1} = V_{k+1}L_{k+1}^T$.

Householder QR factorizations can be used to guarantee orthogonality within each block $U_k$ and $V_k$.
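
A compact NumPy sketch of these two steps (thin QR via `np.linalg.qr`; names are illustrative, and the full-rank assumption of the next slide is taken for granted, i.e. no deflation).

```python
import numpy as np

def block_lanczos_bidiag(A, B, steps):
    """Block Golub-Kahan-Lanczos bidiagonalization started from the block B.
    Returns lists Us, Vs and the blocks Ls (lower triangular), Rs (upper
    triangular) such that, in exact arithmetic,
        A   @ V_k     = U_k @ L_k + U_{k+1} @ R_{k+1},
        A.T @ U_{k+1} = V_{k+1} @ L_{k+1}.T + V_k @ R_{k+1}.T."""
    U1, _ = np.linalg.qr(B)                 # U_1: orthonormal basis of range(B)
    Us, Vs, Ls, Rs = [U1], [], [], []

    Z = A.T @ U1                            # Z_1 = A^T U_1
    V, Lt = np.linalg.qr(Z)                 # Z_1 = V_1 L_1^T
    Vs.append(V); Ls.append(Lt.T)

    for k in range(steps):
        W = A @ Vs[k] - Us[k] @ Ls[k]       # residual W_k
        Unew, Rnew = np.linalg.qr(W)        # W_k = U_{k+1} R_{k+1}
        Us.append(Unew); Rs.append(Rnew)

        Z = A.T @ Unew - Vs[k] @ Rnew.T     # residual Z_{k+1}
        Vnew, Lt = np.linalg.qr(Z)          # Z_{k+1} = V_{k+1} L_{k+1}^T
        Vs.append(Vnew); Ls.append(Lt.T)

    return Us, Vs, Ls, Rs
```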

Block Bidiagonalization Methods

The block Lanczos bidiagonalization was given by Golub, Luk, and Overton in 1981. From the Lanczos recurrence relations it follows by induction that for $k = 1, 2, 3, \ldots$ the block algorithm generates orthogonal bases for the Krylov spaces

$$\mathrm{span}(U_1, \ldots, U_k) = \mathcal{K}_k(AA^T, U_1), \qquad \mathrm{span}(V_1, \ldots, V_k) = \mathcal{K}_k(A^TA, A^TU_1).$$

The process can be continued as long as the residual matrices have full column rank. Rank deficiency will be considered later.

Outline Bidiagonal Decomposition and LSQR Block Bidiagonalization Methods Multiple Right-Hand Sides in LS and TLS Rank Deficient Blocks and Deflation Summary

Multiple Right-Hand Sides in LS and TLS

In many applications one needs to solve least squares problems with multiple right-hand sides

$$B = (b_1, \ldots, b_p), \qquad p \ge 2.$$

There are two possible approaches:

Select a seed right-hand side and use the Krylov subspace it generates to start up the solution of the second, and so on.

Use a block Krylov solver in which all right-hand sides are treated simultaneously.

Block Krylov methods use a much larger search space from the start and introduce matrix-matrix multiplies into the algorithm.

Multiple Right-Hand Sides in LS and TLS

A natural generalization of LSQR is to compute a sequence of approximate solutions of the form $X_k = V_k Y_k$, $k = 1, 2, 3, \ldots$. That is, $X_k$ is restricted to lie in the block Krylov subspace $\mathcal{K}_k(A^TA, A^TB)$. It follows that

$$B - AX_k = B - AV_kY_k = U_{k+1}(R_1E_1 - T_kY_k),$$

where $B = U_1R_1$ is the QR factorization of $B$ and $E_1$ denotes the first block column of the identity. Using the orthogonality of the columns of $U_{k+1}$ gives

$$\|B - AX_k\|_F = \|R_1E_1 - T_kY_k\|_F.$$

Block LSQR

Hence $\|B - AX_k\|_F$ is minimized for $X_k \in \mathcal{R}(V_k)$ by taking $X_k = V_kY_k$, where $Y_k$ solves

$$\min_{Y_k} \|R_1E_1 - T_kY_k\|_F.$$

The approximations $X_k$ are the optimal solutions on the nested sequence of block Krylov subspaces

$$\mathcal{K}_k(A^TA, A^TB), \qquad k = 1, 2, 3, \ldots.$$

The block bidiagonal LS subproblems are solved by using orthogonal transformations to bring the lower triangular banded matrix $T_k$ into upper triangular banded form.
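
The banded orthogonal reduction above is what a careful implementation would use; as a plain dense stand-in, the subproblem can be solved with an ordinary QR factorization. A sketch with made-up names, assuming $T_k$ has full column rank:

```python
import numpy as np
from scipy.linalg import solve_triangular

def block_lsqr_subproblem(Tk, R1):
    """Solve min_Y || R1*E1 - Tk @ Y ||_F for the block bidiagonal Tk of
    size (k+1)p x kp.  Here E1 = (I_p, 0, ..., 0)^T, so R1*E1 stacks R1
    on top of zeros.  Dense QR is used in place of the banded Givens sweep."""
    p = R1.shape[0]
    m = Tk.shape[0]
    C = np.zeros((m, p))
    C[:p, :] = R1                            # right-hand side R1 E1
    Q, R = np.linalg.qr(Tk)                  # thin QR of the banded matrix
    Y = solve_triangular(R, Q.T @ C)         # optimal block coefficients
    resid = np.linalg.norm(C - Tk @ Y)       # = ||B - A X_k||_F in exact arithmetic
    return Y, resid
```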

Multiple Right-Hand Sides in LS and TLS

As in LSQR, the solution can be interleaved with the block bidiagonalization. For example ($p = 2$), after $k$ steps the orthogonal reduction of $(R_1E_1 \;\; T_k)$ has the form

$$\begin{pmatrix} C_1 & R \\ C_2 & 0 \end{pmatrix},$$

with $R$ upper triangular and banded. At this step the approximate solution to $\min_X \|AX - B\|_F$ is

$$X_k = V_k(R^{-1}C_1), \qquad \|B - AX_k\|_F = \|C_2\|_F.$$

Multiple Right-Hand Sides in LS and TLS

The block algorithm can also be used for TLS problems with multiple right-hand sides,

$$\min_{E,F}\|(E \;\; F)\|_F, \qquad (A + E)X = B + F.$$

Unlike the LS problem, this cannot be reduced to $p$ separate problems. Using the orthogonal invariance, the error function to be minimized is

$$\frac{\|B - AX\|_F^2}{\|X\|_F^2 + 1} = \frac{\|R_1E_1 - TY\|_F^2}{\|Y\|_F^2 + 1}.$$

Multiple Right-Hand Sides in LS and TLS

The TLS solution can be expressed in terms of the SVD

$$(B \;\; A) = (U_1 \;\; U_2)\begin{pmatrix} \Sigma_1 & \\ & \Sigma_2 \end{pmatrix}(V_1 \;\; V_2)^T,$$

where $\Sigma_1 = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$. The solution of the lower triangular block bidiagonal TLS problem can be constructed from the SVD of the block bidiagonal matrix.
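
For reference, a sketch of the classical SVD formula for the multiple right-hand-side TLS solution. It is written for the ordering $(A \;\; B)$ rather than the slide's $(B \;\; A)$, assumes the generic case where the trailing block of $V$ is nonsingular, and is not the talk's code.

```python
import numpy as np

def tls_multiple_rhs(A, B):
    """Total least squares solution X of (A + E) X = B + F from the SVD of
    the compound matrix (A B).  Partition the right singular vectors as
        V = [[V11, V12],
             [V21, V22]],   V12: n x d,   V22: d x d,
    where the last d columns belong to the d smallest singular values.
    In the generic case V22 is nonsingular and X = -V12 @ inv(V22)."""
    n = A.shape[1]
    d = B.shape[1]
    _, _, Vt = np.linalg.svd(np.hstack([A, B]), full_matrices=True)
    V = Vt.T
    V12 = V[:n, n:n + d]
    V22 = V[n:, n:n + d]
    return -V12 @ np.linalg.inv(V22)
```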

Outline Bidiagonal Decomposition and LSQR Block Bidiagonalization Methods Multiple Right-Hand Sides in LS and TLS Rank Deficient Blocks and Deflation Summary

Rank Deficient Blocks and Deflation

When rank deficient blocks occur the bidiagonalization must be modified. For example ($p = 1$), at a particular step

$$Q_3Q_2Q_1\,(b,\; AP_1P_2) = \begin{pmatrix} \beta_1 & \alpha_1 & & \\ & \beta_2 & \alpha_2 & \\ & & \beta_3 & \times \\ & & & \times \end{pmatrix}.$$

If $\alpha_3 = 0$ or $\beta_4 = 0$, the reduced matrix splits into block diagonal form. Then the LS problem decomposes and the bidiagonalization can be terminated.

Rank Deficient Blocks and Deflation

If $\alpha_k = 0$, then

$$U_k^T\,(b,\; AV_k) = \begin{pmatrix} \beta_1e_1 & T_k & 0 \\ 0 & 0 & A_k \end{pmatrix}$$

and the problem is reduced to

$$\min_y \|\beta_1e_1 - T_ky\|_2.$$

Then the right Krylov vectors

$$A^Tb,\; (A^TA)A^Tb,\; \ldots,\; (A^TA)^{k-1}A^Tb$$

are linearly dependent.

Rank Deficient Blocks and Deflation

If $\beta_{k+1} = 0$, then

$$U_{k+1}^T\,(b,\; AV_k) = \begin{pmatrix} \beta_1e_1 & \bar T_k & 0 \\ 0 & 0 & A_k \end{pmatrix},$$

where $\bar T_k = (T_k \;\; \alpha_ke_k)$ is square. Then the original system is consistent and the solution satisfies

$$\bar T_k y = \beta_1e_1.$$

Then the left Krylov vectors

$$b,\; (AA^T)b,\; \ldots,\; (AA^T)^kb$$

are linearly dependent.

Rank Deficient Blocks and Deflation

From well known properties of tridiagonal matrices it follows:

The matrix $T_k$ has full column rank and its singular values are simple.

The right-hand side $\beta_1e_1$ has nonzero components along each left singular vector of $T_k$.

Paige and Strakoš call this a core subproblem and show that it is minimally dimensioned. If we scale $b := \gamma b$, then only $\beta_1$ changes. Thus we have a core subproblem for any weighted TLS problem

$$\min \|(\gamma r,\; E)\|_F \quad \text{s.t.} \quad (A + E)x = b + r.$$

Rank Deficient Blocks and Deflation

How do we proceed in the block algorithm if a rank deficient block occurs in an LQ or QR factorization? The triangular factor then typically has the form ($p = 4$ and $r = \mathrm{rank}(R_k) = 3$)

$$R_k = \begin{pmatrix} \times & \times & \times & \times \\ & \times & \times & \times \\ & & \times & \times \\ & & & 0 \end{pmatrix}.$$

(Recall that the factorizations are performed without pivoting.) All following elements in this diagonal of $T_k$ will be zero. The bandwidth and block size are reduced by one.

Rank Deficient Blocks and Deflation

Example ($p = 2$): assume that the first diagonal element of the block $R_2$ becomes zero. If a diagonal element (say in $L_3$) becomes zero, the problem decomposes. In general, the reduction can be terminated when the bandwidth has been reduced $p$ times.

Rank Deficient Blocks and Deflation

Deflation is related to linear dependencies in the associated Krylov subspaces:

If $(AA^T)^kb_j$ is linearly dependent on the previous vectors, then all left and right Krylov vectors of the form

$$(AA^T)^pb_j,\; (A^TA)^pA^Tb_j, \qquad p \ge k,$$

should be removed.

If $(A^TA)^kA^Tb_j$ is linearly dependent on the previous vectors, then all vectors of the form

$$(A^TA)^pA^Tb_j,\; (AA^T)^{p+1}b_j, \qquad p \ge k,$$

should be removed.

The reduction terminates when both the left and right Krylov subspaces have reached their maximal dimensions.

Rank Deficient Blocks and Deflation

The block Lanczos algorithm is modified similarly when a rank deficient block occurs. Recall the block Lanczos recurrence: for $k = 1:n$,

$$V_kL_k^T = A^TU_k - V_{k-1}R_k^T, \qquad U_{k+1}R_{k+1} = AV_k - U_kL_k.$$

Suppose the first rank deficient block is $L_k \in \mathbb{R}^{p\times r}$, $r < p$. Then $V_k$ is $n\times r$, i.e. only $r < p$ vectors are determined. In the next step $AV_k - U_kL_k \in \mathbb{R}^{m\times r}$. The recurrence still works! The block size has been reduced to $r$.
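
A hedged sketch of the deflation step just described: use a rank-revealing (column-pivoted) QR of the residual block and keep only its numerical rank $r$ columns, so the block size shrinks from $p$ to $r$. The tolerance and names are illustrative, not from the talk.

```python
import numpy as np
from scipy.linalg import qr

def deflated_qr(W, tol=1e-12):
    """Column-pivoted QR of a residual block W (assumed nonzero).  If
    rank(W) = r < p, return an m x r orthonormal basis and the r x p
    triangular factor, i.e. the block size is reduced from p to r."""
    Q, R, piv = qr(W, mode='economic', pivoting=True)
    diag = np.abs(np.diag(R))
    r = int(np.sum(diag > tol * diag[0]))    # numerical rank from |R_ii|
    # undo the column permutation so that W ~= Q[:, :r] @ R_full[:r, :]
    R_full = np.zeros_like(R)
    R_full[:, piv] = R
    return Q[:, :r], R_full[:r, :]
```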

Outline Bidiagonal Decomposition and LSQR Block Bidiagonalization Methods Multiple Right-Hand Sides in LS and TLS Rank Deficient Blocks and Deflation Summary

Summary

We have emphasized the similarity and (mathematical) equivalence of the Householder and Lanczos algorithms for block bidiagonalization. For dense problems in data analysis and statistics, the Householder algorithm should be used because of its backward stability. Today quite large dense problems can be handled in seconds on a desktop computer! The relationship of block LSQR to multivariate PLS needs further investigation.

References

G. H. Golub and W. Kahan. Calculating the singular values and pseudo-inverse of a matrix. SIAM J. Numer. Anal. Ser. B, 2:205–224, 1965.

G. H. Golub, F. T. Luk, and M. L. Overton. A block Lanczos method for computing the singular values and corresponding singular vectors of a matrix. ACM Trans. Math. Software, 7:149–169, 1981.

S. Karimi and F. Toutounian. The block least squares method for solving nonsymmetric linear systems with multiple right-hand sides. Appl. Math. Comput., 177:852–862, 2006.

C. C. Paige and M. A. Saunders. LSQR: An algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Software, 8:43–71, 1982.