Gram-Schmidt Orthogonalization: 100 Years and More


September 12, 2008

Outline of Talk
Early History (1795-1907)
Middle History
1. The work of Åke Björck: least squares, stability, loss of orthogonality
2. The work of Heinz Rutishauser: selective reorthogonalization and superorthogonalization
Modern Research
1. Recent results on the Gram-Schmidt algorithms
2. Column pivoting and rank revealing factorizations
3. Applications to Krylov subspace methods

Earlier History
Least Squares: Gauss and Legendre
Laplace 1812, Analytic Theory of Probabilities (1814, 1820)
Cauchy 1837, 1847 (Interpolation) and Bienaymé 1853
J. P. Gram, 1879 (Danish), 1883 (German)
Erhard Schmidt 1907

Gauss and least squares
Priority dispute: A. M. Legendre made the first publication in 1805; Gauss claimed discovery in 1795.
G. Piazzi discovered the asteroid Ceres in January 1801 and tracked it for six weeks before it was lost; Gauss used least squares to predict where it would reappear.

The Gauss derivation of the normal equations
Let $\hat{x}$ be the solution of $A^T A x = A^T b$. For any $x$ we have
$$r = (b - A\hat{x}) + A(\hat{x} - x) \equiv \hat{r} + Ae,$$
and since $A^T\hat{r} = A^T b - A^T A\hat{x} = 0$,
$$r^T r = (\hat{r} + Ae)^T(\hat{r} + Ae) = \hat{r}^T\hat{r} + (Ae)^T(Ae).$$
Hence $r^T r$ is minimized when $x = \hat{x}$.
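As a quick illustration (a sketch of my own, not part of the talk; the random test problem is an assumption), the following MATLAB lines check numerically that the normal-equations solution never has a larger residual than a competing x:

    % Verify that x_hat solving A'*A*x = A'*b minimizes r'*r
    rng(0);                           % assumed seed, for reproducibility
    A = randn(20, 5);  b = randn(20, 1);
    xhat = (A'*A) \ (A'*b);           % normal-equations solution
    rhat = b - A*xhat;
    for trial = 1:5
        x = xhat + randn(5, 1);       % arbitrary competitor
        r = b - A*x;
        fprintf('r''*r = %.6e  >=  rhat''*rhat = %.6e\n', r'*r, rhat'*rhat);
    end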

Pierre-Simon Laplace (1749-1827)
Mathematical Astronomy and Celestial Mechanics
Laplace's Equation
Laplace Transforms
Probability Theory and Least Squares

Théorie Analytique des Probabilités - Laplace 1812
Problem: Compute the masses of Saturn and Jupiter from systems of normal equations (Bouvard) and compute the distribution of the error in the solutions.
Method: Laplace successively projects the system of equations orthogonally to a column of the observation matrix to eliminate all variables but the one of interest. In essence Laplace uses MGS to prove that, given an overdetermined system with a normal perturbation on the right-hand side, its solution has a centered normal distribution with variance independent of the parameters of the noise.

Laplace 1812 - Linear Algebra
Laplace uses MGS to derive the Cholesky form of the normal equations, $R^T R x = A^T b$.
Laplace does not seem to realize that the vectors generated are mutually orthogonal. He does observe that the generated vectors are each orthogonal to the residual vector.

Cauchy and Bienaymé
[Figures: portraits of A. Cauchy and I. J. Bienaymé]

Cauchy and Bienaymé
Cauchy (1837) and (1847): interpolation method leading to systems of the form $Z^T A x = Z^T b$, where $Z = (z_1, \ldots, z_n)$ and $z_{ij} = \pm 1$.
Bienaymé (1853): new derivation of Cauchy's algorithm based on Gaussian elimination. Bienaymé noted that Cauchy's choice of $Z$ was not optimal in the least squares sense. One obtains the least squares solution if $Z = A$ (normal equations) or, more generally, if $\mathcal{R}(Z) = \mathcal{R}(A)$. The matrix $Z$ can be determined a column at a time as the elimination steps are carried out.

3 x 3 example
$$\begin{pmatrix} z_1^T a_1 & z_1^T a_2 & z_1^T a_3 \\ z_2^T a_1 & z_2^T a_2 & z_2^T a_3 \\ z_3^T a_1 & z_3^T a_2 & z_3^T a_3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} z_1^T b \\ z_2^T b \\ z_3^T b \end{pmatrix}$$
Transform the (i, j)th element ($2 \le i, j \le 3$):
$$z_i^T a_j - \frac{z_i^T a_1}{z_1^T a_1}\, z_1^T a_j = z_i^T\Big(a_j - \frac{z_1^T a_j}{z_1^T a_1}\, a_1\Big) \equiv z_i^T a_j^{(2)}.$$

Bienaymé Reduction
The reduced system has the form
$$\begin{pmatrix} z_2^T a_2^{(2)} & z_2^T a_3^{(2)} \\ z_3^T a_2^{(2)} & z_3^T a_3^{(2)} \end{pmatrix} \begin{pmatrix} x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} z_2^T b^{(2)} \\ z_3^T b^{(2)} \end{pmatrix}$$
where we have defined
$$a_j^{(2)} = a_j - \frac{z_1^T a_j}{z_1^T a_1}\, a_1, \qquad b^{(2)} = b - \frac{z_1^T b}{z_1^T a_1}\, a_1.$$
Finally $z_3$ is chosen and used to form the single equation $z_3^T a_3^{(3)} x_3 = z_3^T b^{(3)}$. Taking the first equation from each step gives a triangular system defining the solution.

An interesting choice of Z
Åke Björck has noted that since we want $\mathcal{R}(Z) = \mathcal{R}(A)$, we may choose $Z = Q$ where
$$q_1 = a_1, \quad q_2 = a_2^{(2)}, \quad q_3 = a_3^{(3)}, \ldots$$
Then we have
$$q_2 = a_2 - \frac{q_1^T a_2}{q_1^T q_1}\, q_1, \qquad q_3 = a_3^{(2)} - \frac{q_2^T a_3^{(2)}}{q_2^T q_2}\, q_2, \ldots,$$
which is exactly the modified Gram-Schmidt procedure!
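A small MATLAB sketch (mine, with an assumed random test problem) of this observation: taking Z to be the MGS basis Q, the Cauchy/Bienaymé system $Z^T A x = Z^T b$ reproduces the least squares solution.

    % Bienayme elimination with Z = Q (MGS basis) gives the least squares solution
    rng(1);
    A = randn(30, 4);  b = randn(30, 1);
    n = size(A, 2);
    Q = A;                            % columnwise MGS (normalized version)
    for k = 1:n
        for i = 1:k-1
            Q(:,k) = Q(:,k) - (Q(:,i)'*Q(:,k))*Q(:,i);
        end
        Q(:,k) = Q(:,k)/norm(Q(:,k));
    end
    x_bien = (Q'*A) \ (Q'*b);         % Z'*A*x = Z'*b with Z = Q
    x_ls   = A \ b;                   % reference least squares solution
    disp(norm(x_bien - x_ls))         % ~ machine precision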

Jørgen Pedersen Gram (1850-1916)
Dual career: Mathematics (1873) and Insurance (1875, 1884)
Research in modern algebra, number theory, models for forest management, integral equations, probability, numerical analysis
Active in the Danish Mathematical Society, edited the Tidsskrift journal (1883-89)
Best known for his orthogonalization process

J. P. Gram: 1879 Thesis and 1883 paper
Series expansions of real functions using least squares
MGS orthogonalization applied to orthogonal polynomials
Determinantal representation for the resulting orthogonal functions
Application to integral equations

Erhard Schmidt (1876-1959)
Ph.D. on integral equations, Göttingen 1905; student of David Hilbert
1917: University of Berlin, set up the Institute for Applied Mathematics
Director of the Mathematics Research Institute of the German Academy of Sciences
Known for his work in Hilbert spaces
Played an important role in the development of modern functional analysis

Schmidt 1907 paper
p. 442: CGS for a sequence of functions $\varphi_1, \ldots, \varphi_n$ with respect to the inner product $\langle f, g \rangle = \int_a^b f(x) g(x)\, dx$
p. 473: CGS for an infinite sequence of functions
In a footnote (p. 442) Schmidt claims that in essence the formulas are due to J. P. Gram. Actually Gram does MGS and Schmidt does CGS.
A 1935 paper of Y. K. Wong refers to the "Gram-Schmidt Orthogonalization Process" (the first such linkage?)

Original algorithm of Schmidt for CGS
$$\psi_1(x) = \frac{\varphi_1(x)}{\sqrt{\int_a^b \varphi_1(y)^2\, dy}}$$
$$\psi_2(x) = \frac{\varphi_2(x) - \psi_1(x)\int_a^b \varphi_2(z)\psi_1(z)\, dz}{\sqrt{\int_a^b \big(\varphi_2(y) - \psi_1(y)\int_a^b \varphi_2(z)\psi_1(z)\, dz\big)^2\, dy}}$$
$$\vdots$$
$$\psi_n(x) = \frac{\varphi_n(x) - \sum_{\rho=1}^{n-1} \psi_\rho(x)\int_a^b \varphi_n(z)\psi_\rho(z)\, dz}{\sqrt{\int_a^b \big(\varphi_n(y) - \sum_{\rho=1}^{n-1} \psi_\rho(y)\int_a^b \varphi_n(z)\psi_\rho(z)\, dz\big)^2\, dy}}$$

Åke Björck
Specialist in numerical least squares
1967: Backward stability of MGS for least squares
1992: Björck and Paige, loss and recapture of orthogonality in MGS
1994: Numerics of Gram-Schmidt Orthogonalization
1996: SIAM book, Numerical Methods for Least Squares Problems

Björck 1967: Solving Least Squares Problems by Gram-Schmidt Orthogonalization
Suppose A has the computed MGS factorization $\bar{Q}\bar{R}$.
Loss of orthogonality in MGS:
$$\|I - \bar{Q}^T\bar{Q}\|_2 \le \frac{c_1(m,n)\,\kappa u}{1 - c_2(m,n)\,\kappa u}$$
Backward stability:
$$A + E = \hat{Q}\bar{R}, \quad \text{where } \|E\| \le c(m,n)\, u\, \|A\|_2 \text{ and } \hat{Q}^T\hat{Q} = I$$
for some exactly orthonormal $\hat{Q}$.
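A minimal experiment (my own sketch; the synthetic test matrices are assumptions) illustrating the first bound: the measured loss of orthogonality of MGS grows roughly in proportion to $\kappa(A)\,u$.

    % Loss of orthogonality of MGS versus condition number
    rng(2);
    n = 10;
    for p = 2:2:10
        [U,~] = qr(randn(50, n), 0);  [V,~] = qr(randn(n));
        A = U*diag(logspace(0, -p, n))*V';       % kappa(A) = 10^p
        Q = A;  R = zeros(n);
        for k = 1:n                              % columnwise MGS
            for i = 1:k-1
                R(i,k) = Q(:,i)'*Q(:,k);
                Q(:,k) = Q(:,k) - R(i,k)*Q(:,i);
            end
            R(k,k) = norm(Q(:,k));  Q(:,k) = Q(:,k)/R(k,k);
        end
        fprintf('kappa = 1e%-2d  ||I - Q''*Q|| = %8.2e  kappa*eps = %8.2e\n', ...
                p, norm(eye(n) - Q'*Q), 10^p*eps);
    end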

Björck and Paige 1992
$$\tilde{A} = \begin{pmatrix} O \\ A \end{pmatrix} = \tilde{Q}\tilde{R} = \begin{pmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{pmatrix} \begin{pmatrix} R_1 \\ O \end{pmatrix}$$
where $\tilde{Q} = H_1 H_2 \cdots H_n$ (a product of Householder matrices).
$Q_{11} = O$ and $A = Q_{21} R_1$ (C. Sheffield).
$Q_{21} R_1$ (Householder) and $A = QR$ (MGS) are numerically equivalent.
In fact, if $Q = (q_1, \ldots, q_n)$ is the MGS factor, then
$$H_k = I - v_k v_k^T, \qquad v_k = e_k - \begin{pmatrix} 0 \\ q_k \end{pmatrix}, \qquad k = 1, \ldots, n.$$

Loss of Orthogonality in CGS
With CGS you can have catastrophic cancellation.
Gander 1980: better to compute the Cholesky factorization of $A^T A$ and then set $Q = A R^{-1}$.
Smoktunowicz, Barlow, and Langou, 2006: if $A^T A$ is numerically nonsingular and the Pythagorean version of CGS is used, then the loss of orthogonality is proportional to $\kappa^2$.
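A sketch of Gander's Cholesky-based alternative (the test matrix is an assumption): factor $A^T A = R^T R$ and take $Q = A R^{-1}$; the loss of orthogonality then behaves like $\kappa(A)^2 u$.

    % Cholesky QR: Q = A*inv(R) with R the Cholesky factor of A'*A
    rng(3);
    A = randn(100, 6);
    R = chol(A'*A);                   % A'*A = R'*R (A'*A must be numerically SPD)
    Q = A / R;                        % Q = A*R^{-1}
    disp(norm(eye(6) - Q'*Q))         % loss of orthogonality ~ kappa(A)^2 * eps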

Pythagorean version of CGS
Computation of the diagonal entry $r_{kk}$ at step k.
CGS: $r_{kk} = \|w_k\|_2$ where $w_k = a_k - \sum_{i=1}^{k-1} r_{ik} q_i$.
CGSP: If $s_k = \|a_k\|$ and $p_k = \big(\sum_{i=1}^{k-1} r_{ik}^2\big)^{1/2}$, then $r_{kk}^2 + p_k^2 = s_k^2$ and hence
$$r_{kk} = (s_k - p_k)^{1/2}(s_k + p_k)^{1/2}.$$
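The variant is easy to state in MATLAB (a sketch of mine, with an assumed random test matrix); note that $r_{kk}$ is formed from $s_k$ and $p_k$ rather than from $\|w_k\|$:

    % Pythagorean variant of classical Gram-Schmidt (CGSP)
    rng(4);
    A = randn(50, 5);  [m, n] = size(A);
    Q = zeros(m, n);  R = zeros(n);
    for k = 1:n
        R(1:k-1, k) = Q(:,1:k-1)'*A(:,k);        % CGS projection coefficients
        w = A(:,k) - Q(:,1:k-1)*R(1:k-1, k);
        s = norm(A(:,k));                        % s_k = ||a_k||
        p = norm(R(1:k-1, k));                   % p_k = (sum_i r_ik^2)^(1/2)
        R(k,k) = sqrt(s - p)*sqrt(s + p);        % instead of R(k,k) = norm(w)
        Q(:,k) = w / R(k,k);
    end
    disp(norm(A - Q*R))                          % factorization residual
    disp(norm(eye(n) - Q'*Q))                    % loss of orthogonality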

Heinz Rutishauser (1918-1970)
Pioneer in computer science and computational mathematics
Long history with ETH as both student and distinguished faculty member
Selective reorthogonalization for MGS
Superorthogonalization

Classical Gram-Schmidt (CGS) Algorithm

    Q = A;                                  % A is m-by-n with n = size(A,2)
    for k = 1:n
        for i = 1:k-1
            R(i,k) = Q(:,i)'*Q(:,k);
        end                                 % (omit this "end" for CMGS)
        for i = 1:k-1                       % (omit this "for" for CMGS)
            Q(:,k) = Q(:,k) - R(i,k)*Q(:,i);
        end
        R(k,k) = norm(Q(:,k));
        Q(:,k) = Q(:,k)/R(k,k);
    end

Columnwise MGS (CMGS) Algorithm

    Q = A;
    for k = 1:n
        for i = 1:k-1
            R(i,k) = Q(:,i)'*Q(:,k);
            Q(:,k) = Q(:,k) - R(i,k)*Q(:,i);
        end
        R(k,k) = norm(Q(:,k));
        Q(:,k) = Q(:,k)/R(k,k);
    end

Reorthogonalization
Iterative orthogonalization: as each $q_k$ is generated, reorthogonalize it with respect to $q_1, \ldots, q_{k-1}$.
Twice is enough (W. Kahan; Giraud, Langou, and Rozložník, 2002).
Selective reorthogonalization.
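A sketch of the "twice is enough" scheme, CGS with one full reorthogonalization pass (CGS2); the ill-conditioned test matrix below is an assumption:

    % CGS2: classical Gram-Schmidt with one reorthogonalization pass
    rng(5);
    [U,~] = qr(randn(60, 8), 0);  [V,~] = qr(randn(8));
    A = U*diag(logspace(0, -8, 8))*V';           % ill-conditioned test matrix
    [m, n] = size(A);
    Q = zeros(m, n);  R = zeros(n);
    for k = 1:n
        w = A(:,k);
        for pass = 1:2                           % orthogonalize, then once more
            c = Q(:,1:k-1)'*w;
            w = w - Q(:,1:k-1)*c;
            R(1:k-1, k) = R(1:k-1, k) + c;       % accumulate coefficients
        end
        R(k,k) = norm(w);
        Q(:,k) = w / R(k,k);
    end
    disp(norm(eye(n) - Q'*Q))                    % O(eps), independent of kappa(A)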

To reorthogonalize or not to reorthogonalize
For most applications it is not necessary to reorthogonalize if MGS is used.
Both MGS least squares and MGS GMRES are backward stable.
In many applications it is important that the residual vector r be orthogonal to the column vectors of Q. In these cases, if MGS is used, r should be reorthogonalized with respect to $q_n, q_{n-1}, \ldots, q_1$ (note the reverse order).

Selective reorthogonalization
Reorthogonalize when loss of orthogonality is detected. Let
$$b = a_k - \sum_{i=1}^{k-1} r_{ik} q_i.$$
Cancellation is indicated if $\|b\| \ll \|a_k\|$.
(1967) Rutishauser test: reorthogonalize if $\|b\| \le \frac{1}{10}\|a_k\|$.

Selective reorthogonalization
The Rutishauser reorthogonalization condition can be rewritten as
$$\frac{\|a_k\|}{\|b\|} \ge 10, \qquad \text{or more generally} \qquad \frac{\|a_k\|}{\|b\|} \ge K.$$
There are many papers using different values of K. Generally $1 \le K \le \kappa_2(A)$; a popular choice is $K = 2$.
Hoffmann, 1989, investigates a range of K values.
Giraud and Langou, 2003, use the alternate condition
$$\frac{\sum_{i=1}^{k-1} |r_{ik}|}{r_{kk}} > L.$$
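A sketch of MGS with selective reorthogonalization driven by the Rutishauser test (K = 10); the test matrix is an assumption:

    % MGS with selective reorthogonalization (Rutishauser test)
    rng(6);
    [U,~] = qr(randn(60, 8), 0);  [V,~] = qr(randn(8));
    A = U*diag(logspace(0, -10, 8))*V';
    [m, n] = size(A);  K = 10;
    Q = A;  R = zeros(n);
    for k = 1:n
        for i = 1:k-1
            R(i,k) = Q(:,i)'*Q(:,k);
            Q(:,k) = Q(:,k) - R(i,k)*Q(:,i);
        end
        if norm(A(:,k)) >= K*norm(Q(:,k))        % Rutishauser: ||a_k||/||b|| >= K
            for i = 1:k-1                        % reorthogonalize column k
                c = Q(:,i)'*Q(:,k);
                Q(:,k) = Q(:,k) - c*Q(:,i);
                R(i,k) = R(i,k) + c;
            end
        end
        R(k,k) = norm(Q(:,k));
        Q(:,k) = Q(:,k)/R(k,k);
    end
    disp(norm(eye(n) - Q'*Q))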

Superorthogonalization
$$|x^T y - fl(x^T y)| \le \gamma_n\, |x|^T |y|, \qquad \gamma_n = n\varepsilon/(1 - n\varepsilon),$$
where $|x|$ is the vector with elements $|x_i|$. If x is orthogonal to y, then
$$|fl(x^T y)| \le \gamma_n\, |x|^T |y|. \qquad (1)$$
Rutishauser superorthogonalization: given x and y, keep orthogonalizing until (1) is satisfied.

Superorthogonalization Example
$$x = \begin{pmatrix} 1 \\ 10^{-40} \\ 10^{-20} \\ 10^{-10} \\ 10^{-15} \end{pmatrix}, \qquad y = \begin{pmatrix} 10^{-20} \\ 1 \\ 10^{-10} \\ 10^{-20} \\ 10^{-10} \end{pmatrix}$$
are numerically orthogonal; however,
$$x^T y \approx 10^{-20} \quad \text{and} \quad |x|^T |y| \approx 10^{-20}.$$
Perform two reorthogonalizations: $y \to y^{(2)} \to y^{(3)}$.

Superorthogonalization Calculations

    x        y        y^(2)                     y^(3)
    1e+00    1e-20    1.000019999996365e-25     1.000020000000000e-25
    1e-40    1e+00    1.000000000000000e+00     1.000000000000000e+00
    1e-20    1e-10    1.000000000000000e-10     1.000000000000000e-10
    1e-10    1e-20    9.999999998999989e-21     9.999999998999989e-21
    1e-15    1e-10    1.000000000000000e-10     1.000000000000000e-10

Decrease in scalar products:
x^T y = 1e-20,   x^T y^(2) = 3.6351e-37,   x^T y^(3) = 0
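The example is easy to reproduce (a sketch; the printed digits may differ slightly from the table above):

    % Rutishauser superorthogonalization example
    x = [1; 1e-40; 1e-20; 1e-10; 1e-15];
    y = [1e-20; 1; 1e-10; 1e-20; 1e-10];
    fprintf('x''*y     = %.4e\n', x'*y);
    y2 = y  - (x'*y /(x'*x))*x;                  % first reorthogonalization
    fprintf('x''*y^(2) = %.4e\n', x'*y2);
    y3 = y2 - (x'*y2/(x'*x))*x;                  % second reorthogonalization
    fprintf('x''*y^(3) = %.4e\n', x'*y3);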

Column pivoting
Golub, 1965; Golub and Businger, 1965. After k steps
$$Q_k^T A \Pi_k = H_k \cdots H_1 A P_1 \cdots P_k = R^{(k)} = \begin{pmatrix} R_{11}^{(k)} & R_{12}^{(k)} \\ O & R_{22}^{(k)} \end{pmatrix}$$
Choose as pivot the column of $R_{22}^{(k)}$ with the largest 2-norm (taking the smallest index in case of ties). This strategy minimizes $\|R_{22}^{(k)}\|_{(1,2)}$ at each step.
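A sketch of the effect of this pivoting, using MATLAB's built-in pivoted QR on a nearly rank-deficient example (the test matrix is an assumption):

    % QR with column pivoting on a matrix of numerical rank 3
    rng(7);
    A = randn(20, 3)*randn(3, 6) + 1e-10*randn(20, 6);
    [Q, R, P] = qr(A);                           % A*P = Q*R, |diag(R)| nonincreasing
    disp(abs(diag(R))')                          % last three entries are ~1e-10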

The (1,2) Matrix Norm
The (1,2) norm of an $m \times n$ matrix A is defined by
$$\|A\|_{(1,2)} = \max\big(\|a_1\|_2, \ldots, \|a_n\|_2\big).$$
Note: since $\|Ax\|_2 \le \|A\|_{(1,2)} \|x\|_1$, it follows that
$$\|A\|_{(1,2)} = \max_{x \ne 0} \frac{\|Ax\|_2}{\|x\|_1}.$$
Furthermore, if B is any $n \times k$ matrix, then
$$\|AB\|_{(1,2)} \le \|A\|_2 \max\big(\|b_1\|_2, \ldots, \|b_k\|_2\big) = \|A\|_2 \|B\|_{(1,2)}.$$

Failure of column pivoting to detect rank deficiencies
Column pivoting may fail to detect rank deficiencies. Example of W. Kahan:
$$K_n = \mathrm{diag}(1, s, s^2, \ldots, s^{n-1}) \begin{pmatrix} 1 & -c & -c & \cdots & -c \\ 0 & 1 & -c & \cdots & -c \\ 0 & 0 & 1 & \cdots & -c \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}, \qquad c^2 + s^2 = 1, \quad c, s > 0.$$
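A sketch of Kahan's example (the construction below is mine): in exact arithmetic standard column pivoting performs no interchanges on $K_n$, so the unpivoted R factor shows what pivoted QR produces, and its last diagonal entry is far larger than the smallest singular value.

    % Kahan matrix: column pivoting fails to reveal the near rank deficiency
    n = 30;  c = 0.6;  s = sqrt(1 - c^2);
    U = eye(n) - c*triu(ones(n), 1);             % unit upper triangular, -c above the diagonal
    K = diag(s.^(0:n-1))*U;                      % the Kahan matrix K_n
    [~, R] = qr(K);
    fprintf('|r_nn|    = %.3e\n', abs(R(n,n)));  % ~ s^(n-1), not small
    fprintf('sigma_min = %.3e\n', min(svd(K)));  % orders of magnitude smaller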

Pivoting strategies based on 2-norm optimizations
Singular value interlacing:
$$\sigma_k\big(R_{11}^{(k)}\big) \le \sigma_k(A) \quad \text{and} \quad \sigma_1\big(R_{22}^{(k)}\big) \ge \sigma_{k+1}(A).$$
To reveal rank, choose a pivoting strategy to minimize $\|R_{22}^{(k)}\|_2$ (rather than the (1,2) norm).
Alternative approach: choose a pivoting strategy to maximize
$$\frac{1}{\|(R_{11}^{(k)})^{-1}\|_2} = \sigma_k\big(R_{11}^{(k)}\big).$$

Rank Revealing QR (RRQR)
The algorithms of L. V. Foster, 1986, and Tony Chan, 1987, use the LINPACK condition estimator. Chan was the first to refer to such factorizations as rank revealing.
Existence of RRQR factorizations shown by Hong and Pan, 1992.
Chandrasekaran and Ipsen, 1994: hybrid methods combining both optimization approaches.

More hybrid RRQR algorithms
Pan and Tang, 1999: hybrid algorithms based on pivoted blocks and cyclic pivoting.
Bischof and Quintana-Ortí, 1998: block versions of hybrid algorithms with ICE, Incremental Condition Estimation (1990).

Nullspace Computations
Suppose A has numerical rank k and an RRQR factorization
$$AP = (Q_1, Q_2) \begin{pmatrix} R_{11} & R_{12} \\ O & R_{22} \end{pmatrix}$$
where $R_{11}$ is $k \times k$ and $R_{22} \approx O$. Approximate nullspace basis:
$$Z = P \begin{pmatrix} -R_{11}^{-1} R_{12} \\ I_{n-k} \end{pmatrix}$$
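A sketch (dimensions and the test matrix are assumptions) of forming this approximate nullspace basis from MATLAB's pivoted QR:

    % Approximate nullspace basis from a rank revealing (pivoted) QR factorization
    rng(8);
    n = 8;  k = 5;
    A = randn(20, k)*randn(k, n);                % numerical rank k
    [Q, R, P] = qr(A);                           % A*P = Q*R with column pivoting
    R11 = R(1:k, 1:k);  R12 = R(1:k, k+1:n);
    Z = P*[-(R11\R12); eye(n-k)];                % Z = P*[-inv(R11)*R12; I]
    disp(norm(A*Z))                              % ~ ||R22||, essentially zero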

Strong RRQR factorizations
There is a problem in computing the nullspace basis if $R_{11}$ is ill-conditioned.
Strong RRQR: Gu and Eisenstat, 1996. A factorization is a strong RRQR factorization if the entries of $R_{11}^{-1} R_{12}$ are not too large.
In many applications the nullspaces need to be updated as A is updated.

UTV decompositions
G. W. Stewart: URV (1992) and ULV (1993). Here $AV = UT$ where $V = (V_1, V_2)$ is orthogonal and T is triangular. The columns of $V_2$ form an approximate nullspace basis.
Fierro and Hansen (1997): low-rank UTV factorizations.
MATLAB routines: UTV Tools (1995), UTV Expansion Pack (2005).

Krylov Subspace Methods for Ax = b
Find an orthonormal basis for the Krylov subspace
$$\mathcal{K}_k(A, r_0) = \mathrm{span}\{r_0, Ar_0, A^2 r_0, \ldots, A^{k-1} r_0\}.$$
If MGS is used to compute the QR decomposition
$$K = [r_0, Ar_0, \ldots, A^{k-1} r_0] = Q_k R,$$
then in the process we compute
$$q_j = c_0 r_0 + c_1 A r_0 + \cdots + c_{j-1} A^{j-1} r_0, \qquad \text{with } c_{j-1} = 1/r_{jj} \ne 0.$$
Arnoldi process: use $A q_j$ instead of $A^j r_0$.
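A sketch of the Arnoldi process with MGS orthogonalization (the matrix, starting vector, and dimension k below are assumptions):

    % Arnoldi: MGS-orthonormal basis of the Krylov subspace K_k(A, r0)
    rng(9);
    n = 100;  k = 10;
    A = randn(n);  r0 = randn(n, 1);
    Q = zeros(n, k+1);  H = zeros(k+1, k);
    Q(:,1) = r0/norm(r0);
    for j = 1:k
        w = A*Q(:,j);                            % use A*q_j rather than A^j*r0
        for i = 1:j                              % MGS orthogonalization
            H(i,j) = Q(:,i)'*w;
            w = w - H(i,j)*Q(:,i);
        end
        H(j+1,j) = norm(w);
        Q(:,j+1) = w/H(j+1,j);
    end
    disp(norm(A*Q(:,1:k) - Q*H))                 % Arnoldi relation A*Q_k = Q_{k+1}*H_k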

MGS GMRES
MGS GMRES is the most common way of applying the Arnoldi process to large sparse unsymmetric linear systems. It succeeds despite the loss of orthogonality.

MGS GMRES
Greenbaum, Rozložník, and Strakoš (1997): the Arnoldi basis vectors begin to lose their linear independence only after the GMRES residual norm has been reduced to almost its final level of accuracy.
C. Paige, M. Rozložník, and Z. Strakoš (2006): MGS-GMRES without reorthogonalization is backward stable. What matters is not the loss of orthogonality but the loss of linear independence.

Conclusions
Orthogonality plays a fundamental role in applied mathematics.
The Gram-Schmidt algorithms are at the core of much of what we do in computational mathematics.
The stability of GS is now well understood.
The GS process is central to solving least squares problems and to Krylov subspace methods.
The QR factorization paved the way for modern rank revealing factorizations.
What will the future bring?