Vector and Matrix Norms I


Vector and Matrix Norms I

Scalar, vector, matrix: how to calculate errors?
Scalar: absolute error |α̂ − α|; relative error |α̂ − α| / |α|
Vectors: vector norm. Norm is the distance!!

Vector and Matrix Norms II

l-norm:  ||x||_l = (|x_1|^l + ... + |x_n|^l)^(1/l)
1-norm:  ||x||_1 = Σ_{i=1}^n |x_i|
∞-norm:  ||x||_∞ = lim_{l→∞} ||x||_l = max_i |x_i|,
because
  (max_i |x_i|)^l ≤ |x_1|^l + ... + |x_n|^l ≤ n (max_i |x_i|)^l

Vector and Matrix Norms III

Example: x = (1, 100, 9)^T and x̂ = (1.1, 99, 11)^T
  ||x̂ − x||_∞ = 2,      ||x̂ − x||_∞ / ||x||_∞ = 0.02,    ||x̂ − x||_∞ / ||x̂||_∞ = 0.0202
  ||x̂ − x||_2 = 2.238,  ||x̂ − x||_2 / ||x||_2 = 0.0223,  ||x̂ − x||_2 / ||x̂||_2 = 0.0225
All norms are equivalent

Vector and Matrix Norms IV

For l_1- and l_2-norms, there exist c_1 and c_2 such that for all x ∈ R^n
  c_1 ||x||_{l_1} ≤ ||x||_{l_2} ≤ c_2 ||x||_{l_1}
Example:
  ||x||_2 ≤ ||x||_1 ≤ √n ||x||_2     (1)
  ||x||_∞ ≤ ||x||_2 ≤ √n ||x||_∞     (2)
  ||x||_∞ ≤ ||x||_1 ≤ n ||x||_∞      (3)
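
As a quick numerical check of (1) (not part of the original slides), the small C program below computes the 1-, 2-, and ∞-norms of the difference x̂ − x from the previous example; the variable names are illustrative only.

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* x_hat - x for the example x = (1, 100, 9), x_hat = (1.1, 99, 11) */
    double d[3] = {0.1, -1.0, 2.0};
    int n = 3, i;
    double n1 = 0.0, n2 = 0.0, ninf = 0.0;

    for (i = 0; i < n; i++) {
        n1 += fabs(d[i]);                     /* 1-norm: sum of |d_i| */
        n2 += d[i] * d[i];                    /* 2-norm: accumulate squares */
        if (fabs(d[i]) > ninf) ninf = fabs(d[i]);   /* inf-norm: max |d_i| */
    }
    n2 = sqrt(n2);

    printf("1-norm %g, 2-norm %g, inf-norm %g\n", n1, n2, ninf);
    /* inequality (1): ||d||_2 <= ||d||_1 <= sqrt(n) * ||d||_2 */
    printf("(1): %g <= %g <= %g\n", n2, n1, sqrt((double)n) * n2);
    return 0;
}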

Vector and Matrix Norms V

Therefore, you can just choose a norm for your convenience
HW5-1: prove (1)-(3)
Matrix norm: how to define the distance between two matrices?
Usually a norm should satisfy
  ||A|| ≥ 0
  ||A + B|| ≤ ||A|| + ||B||     (4)
  ||αA|| = |α| ||A||,
where α is a scalar

Vector and Matrix Norms VI

Definition:
  ||A||_l ≡ max_{x ≠ 0} ||Ax||_l / ||x||_l = max_{||x||=1} ||Ax||_l
Proof of (4):
  ||A + B|| = max_{||x||=1} ||(A + B)x||
            ≤ max_{||x||=1} (||Ax|| + ||Bx||)
            ≤ max_{||x||=1} ||Ax|| + max_{||x||=1} ||Bx||

Relative error I

Usually calculating |α̂ − α| / |α| is not practical
The reason is that α is unknown
|α̂ − α| / |α̂| is a more reasonable estimate

Relative error II

If
  |α̂ − α| / |α̂| ≤ 0.1,
then
  (1/1.1) |α̂ − α| / |α̂| ≤ |α̂ − α| / |α| ≤ (1/0.9) |α̂ − α| / |α̂|
Proof:
  |α| ≤ |α̂| + |α̂ − α| ≤ 1.1 |α̂|

Relative error III

Similarly,
  |α̂ − α| ≤ 0.1 |α̂|  ⟹  |α| ≥ 0.9 |α̂|

Condition of a Linear System I

Solving a linear system
  [10 7  8  7 ] [x_1]   [32]
  [ 7 5  6  5 ] [x_2] = [23],   solution = (1, 1, 1, 1)^T
  [ 8 6 10  9 ] [x_3]   [33]
  [ 7 5  9 10 ] [x_4]   [31]
Right-hand side slightly modified
  [10 7  8  7 ] [x_1]   [32.1]
  [ 7 5  6  5 ] [x_2] = [22.9],   solution = (9.2, -12.6, 4.5, -1.1)^T
  [ 8 6 10  9 ] [x_3]   [33.1]
  [ 7 5  9 10 ] [x_4]   [30.9]

Condition of a Linear System II

A small modification causes a huge error
Matrix slightly modified
  [10    7    8.1  7.2 ] [x_1]   [32]
  [ 7.08 5.04 6    5   ] [x_2] = [23],   solution = (-81, 137, -34, 22)^T
  [ 8    5.98 9.89 9   ] [x_3]   [33]
  [ 6.99 4.99 9    9.98] [x_4]   [31]

Condition of a Linear System III

Right-hand side slightly modified:
  Ax = b,  A(x + δx) = b + δb
  δx = A^{-1} δb
  ||δx|| ≤ ||A^{-1}|| ||δb||,   ||b|| ≤ ||A|| ||x||
  ⟹  ||δx|| / ||x|| ≤ ||A|| ||A^{-1}|| ||δb|| / ||b||

Condition of a Linear System IV

Matrix modified:
  Ax = b,  (A + δA)(x + δx) = b
  Ax + Aδx + δA(x + δx) = b
  δx = -A^{-1} δA (x + δx)
  ||δx|| ≤ ||A^{-1}|| ||δA|| ||x + δx||
  ⟹  ||δx|| / ||x + δx|| ≤ ||A|| ||A^{-1}|| ||δA|| / ||A||
||A|| ||A^{-1}|| is defined as the condition number of A
A smaller condition number is better
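
Plugging the numbers of the first example into this bound gives a feel for how ill-conditioned that matrix is. The small check program below (not from the slides; it only reuses the data shown two slides earlier) computes the relative changes in b and x in the ∞-norm, and their ratio, which by the bound above is a lower bound on ||A|| ||A^{-1}||.

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* data from the example: b, its perturbation, and the two solutions */
    double b[4]  = {32, 23, 33, 31},  db[4] = {0.1, -0.1, 0.1, -0.1};
    double x[4]  = {1, 1, 1, 1},      xp[4] = {9.2, -12.6, 4.5, -1.1};
    double nb = 0, ndb = 0, nx = 0, ndx = 0;
    int i;

    for (i = 0; i < 4; i++) {                       /* infinity norms */
        if (fabs(b[i])  > nb)  nb  = fabs(b[i]);
        if (fabs(db[i]) > ndb) ndb = fabs(db[i]);
        if (fabs(x[i])  > nx)  nx  = fabs(x[i]);
        if (fabs(xp[i] - x[i]) > ndx) ndx = fabs(xp[i] - x[i]);
    }
    printf("relative change in b: %g\n", ndb / nb);   /* about 0.003 */
    printf("relative change in x: %g\n", ndx / nx);   /* 13.6 */
    /* this ratio is a lower bound on ||A|| ||A^{-1}|| in the inf-norm */
    printf("amplification >= %g\n", (ndx / nx) / (ndb / nb));
    return 0;
}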

Sparse matrices: Storage Schemes I

Most elements are zero
Very common in engineering applications
Without saving zeros, we can handle very large matrices
An example:
      [1 0  0  2]
  A = [3 4  0  5]
      [6 0  7  8]
      [0 0 10 11]
Storage schemes: there are different ways to store sparse matrices

Sparse matrices: Storage Schemes II

Coordinate format:
  a        1 3 6 4 7 10 2 5 8 11
  arow_ind 1 2 3 2 3  4 1 2 3  4
  acol_ind 1 1 1 2 3  3 4 4 4  4
Indices may not be well ordered
Is it easy to do operations such as A + B and Ax?
A + B: if (i, j) are not ordered, difficult
y = Ax:
  for l = 1:nnz
    i = arow_ind(l)
    j = acol_ind(l)
    y(i) = y(i) + a(l)*x(j)
  end
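
The same loop written in C (a minimal sketch, not from the slides; it assumes 0-based indices and that y has been zero-initialized):

#include <stdio.h>

void coo_spmv(int nnz, const double *a, const int *arow_ind,
              const int *acol_ind, const double *x, double *y)
{
    int l;
    for (l = 0; l < nnz; l++)                   /* y(i) = y(i) + a(l)*x(j) */
        y[arow_ind[l]] += a[l] * x[acol_ind[l]];
}

int main(void)
{
    /* the 4x4 example matrix above in coordinate format, 0-based */
    double a[]     = {1, 3, 6, 4, 7, 10, 2, 5, 8, 11};
    int arow_ind[] = {0, 1, 2, 1, 2, 3, 0, 1, 2, 3};
    int acol_ind[] = {0, 0, 0, 1, 2, 2, 3, 3, 3, 3};
    double x[4] = {1, 1, 1, 1}, y[4] = {0, 0, 0, 0};
    int i;

    coo_spmv(10, a, arow_ind, acol_ind, x, y);
    for (i = 0; i < 4; i++)
        printf("y[%d] = %g\n", i, y[i]);        /* row sums: 3 12 21 21 */
    return 0;
}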

Sparse matrices: Storage Schemes III

nnz: usually used to represent the number of nonzeros
x: vector in dense format
In general we directly store a vector without using a sparse format
Access one column (column i):
  for l = 1:nnz
    if acol_ind(l) == i
      x(arow_ind(l)) = a(l)
    end
  end
Cost: O(nnz)

Sparse matrices: Storage Schemes IV

When is this used? Solving Lx = b with a lower triangular L:
  [l_11                ] [x_1]   [b_1]
  [l_21 l_22           ] [x_2] = [b_2]
  [ ...          ...   ] [...]   [...]
  [l_n1 l_n2 ...  l_nn ] [x_n]   [b_n]
After computing x_1 = b_1 / l_11, the first column of L is accessed to update
  [b_2; ...; b_n] ← [b_2; ...; b_n] - x_1 [l_21; ...; l_n1]
Compressed column format

Sparse matrices: Storage Schemes V

  a        1 3 6 4 7 10 2 5 8 11
  arow_ind 1 2 3 2 3  4 1 2 3  4
  acol_ptr 1 4 5 7 11
jth column: from a(acol_ptr(j)) to a(acol_ptr(j+1)-1)
Example: 3rd column
  acol_ptr(3) = 5, acol_ptr(4) = 7
  a(5) = 7, a(6) = 10
nnz = acol_ptr(n+1) - 1
acol_ptr contains n + 1 elements

Sparse matrices: Storage Schemes VI

C = A + B:
  for j = 1:n
    get A's jth column
    get B's jth column
    do a vector addition
  end
C is still in column format
y = Ax = A_{:,1} x_1 + ... + A_{:,n} x_n:
  for j = 1:n
    for l = acol_ptr(j):acol_ptr(j+1)-1
      y(arow_ind(l)) = y(arow_ind(l)) + a(l)*x(j)
    end
  end
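
A C version of this column-oriented product (a sketch under the same assumptions as the coordinate-format example: 0-based indices as in the arrays shown on slide VIII, y zero-initialized):

/* y = A*x with A in compressed column (CSC) format, 0-based indices */
void csc_spmv(int n, const double *a, const int *arow_ind,
              const int *acol_ptr, const double *x, double *y)
{
    int j, l;
    for (j = 0; j < n; j++)                     /* y += A(:,j) * x(j) */
        for (l = acol_ptr[j]; l < acol_ptr[j + 1]; l++)
            y[arow_ind[l]] += a[l] * x[j];
}

Called with the 0-based arrays of the example matrix (acol_ptr = 0 3 4 6 10), it produces the same y as the coordinate-format routine above.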

Sparse matrices: Storage Schemes VII

Row indices of the same column may not be sorted:
  a        6 3 1 4 7 10 2 5 8 11
  arow_ind 3 2 1 2 3  4 1 2 3  4
  acol_ptr 1 4 5 7 11
C = AB is similar
Accessing one column is easy
Accessing one row is very difficult
Compressed row format:
      [1 0  0  2]
  A = [3 4  0  5]
      [6 0  7  8]
      [0 0 10 11]

Sparse matrices: Storage Schemes VIII

  a        1 2 3 4 5 6 7 8 10 11
  acol_ind 1 4 1 2 4 1 3 4  3  4
  arow_ptr 1 3 6 9 11
In C (0-based indices), the column format becomes
  a        1 3 6 4 7 10 2 5 8 11
  arow_ind 0 1 2 1 2  3 0 1 2  3
  acol_ptr 0 3 4 6 10
There are so many variations of sparse structures that it is very difficult to have standard sparse libraries
Different formats are suitable for different matrices
A C implementation: row format
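
Before the linked-list row structure shown on the next slides, here is the simpler array-based row-format product, analogous to the column version above (a minimal sketch, 0-based indices): each entry of y is the dot product of one row of A with x.

/* y = A*x with A in compressed row (CSR) format, 0-based indices */
void csr_spmv(int m, const double *a, const int *acol_ind,
              const int *arow_ptr, const double *x, double *y)
{
    int i, l;
    for (i = 0; i < m; i++) {
        double s = 0.0;
        for (l = arow_ptr[i]; l < arow_ptr[i + 1]; l++)
            s += a[l] * x[acol_ind[l]];
        y[i] = s;
    }
}

Here accessing a row is the cheap operation; accessing a column would require scanning all of a, mirroring the remark above.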

Sparse matrices: Storage Schemes IX

typedef struct row_elt {
  int col, nxt_row, nxt_idx;
  Real val;
} row_elt;

typedef struct SPROW {
  int len, maxlen, diag;
  row_elt *elt;
} SPROW;

typedef struct SPMAT {
  int m, n, max_m, max_n;
  char flag_col, flag_diag;
  SPROW *row;
  int *start_row;
  int *start_idx;

Sparse matrices: Storage Schemes X

} SPMAT;

To scan a row:
  len = A->row[i].len;
  for (j_idx = 0; j_idx < len; j_idx++)
    printf("%d %d %g", i,
           A->row[i].elt[j_idx].col,
           A->row[i].elt[j_idx].val);
Object-oriented design is a big challenge for sparse matrices
Homework 5-2:

Sparse matrices: Storage Schemes XI

Write your own sparse matrix-matrix multiplication code and compare it with Intel MKL (which now supports sparse operations)
Generate random matrices by yourself; any reasonable size is ok.

Sparse Matrix and Factorization I

A more advanced topic
Factorization generates fill-ins
Fill-ins: new nonzero positions
A Matlab program:
  A = sprandsym(200, 0.05, 0.01, 1);
  L = chol(A);
  spy(A); print -deps A
  spy(L); print -deps L
0.05: density
0.01: 1/(condition number)

Sparse Matrix and Factorization II

1: type of matrix; type 1 gives a matrix with 1/(condition number) exactly 0.01
spy: draws the sparsity pattern
[Figure: sparsity patterns of (a) A with nz = 1984 and (b) L with nz = 2919]

Sparse Matrix and Factorization III

Clearly L is denser

Permutation and Reordering I

      [3 2 1 2]
  A = [2 4 0 0]
      [1 0 5 0]
      [2 0 0 6]

            [1.7321  0       0       0     ]
  chol(A) = [1.1547  1.6330  0       0     ]
            [0.5774 -0.4082  2.1213  0     ]
            [1.1547 -0.8165 -0.4714  1.9437]

Permutation and Reordering II

      [0 0 0 1]         [2 1 2 3]
  P = [0 0 1 0],   AP = [0 0 4 2]
      [0 1 0 0]         [0 5 0 1]
      [1 0 0 0]         [6 0 0 2]

Permutation and Reordering III = = PAP 0 0 0 1 2 1 2 3 0 0 1 0 0 0 4 2 0 1 0 0 0 5 0 1 1 0 0 0 6 0 0 2 6 0 0 2 0 5 0 1 0 0 4 2 2 1 2 3 Chih-Jen Lin (National Taiwan Univ.) 30 / 93

Permutation and Reordering IV

                [2.4495  0       0       0     ]
  chol(PAP^T) = [0       2.2361  0       0     ]
                [0       0       2.0000  0     ]
                [0.8165  0.4472  1.0000  1.0646]
chol(PAP^T) is sparser
Ax = b  ⟺  (PAP^T)(Px) = Pb: solve for Px and recover x
There are different ways of choosing permutations

Permutation and Reordering V

Reordering algorithms (in Matlab):
  colmmd   - Column minimum degree permutation
  symmmd   - Symmetric minimum degree permutation
  symrcm   - Symmetric reverse Cuthill-McKee permutation
  colperm  - Column permutation
  randperm - Random permutation
  dmperm   - Dulmage-Mendelsohn permutation
Finding the ordering with the fewest entries in the factorization: the minimum fill-in problem
Minimum fill-in may not be the best: one also needs to consider numerical stability, implementation effort, etc.

Iterative Methods I

Chapter 10 of the book Matrix Computations
An iterative process: x_1, x_2, ...
We hope that the iterates approach a solution of Ax = b
Gaussian elimination is O(n^3)
If each update x_k → x_{k+1} takes O(n^r) and l iterations are needed, with n^r · l < n^3, iterative methods can be faster
Accuracy and sparsity are other considerations

Jacobi and Gauss-Seidel Method I

A three-by-three system Ax = b:
  x_1 = (b_1 - a_12 x_2 - a_13 x_3) / a_11
  x_2 = (b_2 - a_21 x_1 - a_23 x_3) / a_22
  x_3 = (b_3 - a_31 x_1 - a_32 x_2) / a_33
x_k: an approximation to x = A^{-1} b
  (x_{k+1})_1 = (b_1 - a_12 (x_k)_2 - a_13 (x_k)_3) / a_11
  (x_{k+1})_2 = (b_2 - a_21 (x_k)_1 - a_23 (x_k)_3) / a_22
  (x_{k+1})_3 = (b_3 - a_31 (x_k)_1 - a_32 (x_k)_2) / a_33

Jacobi and Gauss-Seidel Method II

The general case:
  for i = 1:n
    (x_{k+1})_i = (b_i - Σ_{j=1}^{i-1} a_ij (x_k)_j - Σ_{j=i+1}^{n} a_ij (x_k)_j) / a_ii
  end
Gauss-Seidel iteration:
  (x_{k+1})_1 = (b_1 - a_12 (x_k)_2 - a_13 (x_k)_3) / a_11
  (x_{k+1})_2 = (b_2 - a_21 (x_{k+1})_1 - a_23 (x_k)_3) / a_22
  (x_{k+1})_3 = (b_3 - a_31 (x_{k+1})_1 - a_32 (x_{k+1})_2) / a_33
The general case:

Jacobi and Gauss-Seidel Method III

  for i = 1:n
    (x_{k+1})_i = (b_i - Σ_{j=1}^{i-1} a_ij (x_{k+1})_j - Σ_{j=i+1}^{n} a_ij (x_k)_j) / a_ii
  end
The iterates may diverge:
      [1 2 2]       [5]         [1]
  A = [2 1 2],  b = [5],  sol = [1]
      [2 2 1]       [5]         [1]

Jacobi and Gauss-Seidel Method IV

  det(A) = 1 + 8 + 8 - 4 - 4 - 4 ≠ 0
Jacobi iterates:
        [0]         [5]         [5 - 10 - 10]   [-15]
  x_0 = [0],  x_1 = [5],  x_2 = [5 - 10 - 10] = [-15],
        [0]         [5]         [5 - 10 - 10]   [-15]

        [5 + 30 + 30]   [65]
  x_3 = [5 + 30 + 30] = [65]
        [5 + 30 + 30]   [65]
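
These sweeps are easy to write down in C. The sketch below (not from the slides; dense row-major storage is an assumption) implements one Jacobi sweep and one Gauss-Seidel sweep, and running the Jacobi sweep on the 3 × 3 example reproduces the diverging iterates above.

#include <stdio.h>

void jacobi_sweep(int n, const double *A, const double *b,
                  const double *xk, double *xk1)
{
    int i, j;
    for (i = 0; i < n; i++) {
        double s = b[i];
        for (j = 0; j < n; j++)
            if (j != i) s -= A[i * n + j] * xk[j];   /* uses only old values */
        xk1[i] = s / A[i * n + i];
    }
}

void gauss_seidel_sweep(int n, const double *A, const double *b, double *x)
{
    int i, j;
    for (i = 0; i < n; i++) {                        /* uses updated x[j], j < i */
        double s = b[i];
        for (j = 0; j < n; j++)
            if (j != i) s -= A[i * n + j] * x[j];
        x[i] = s / A[i * n + i];
    }
}

int main(void)
{
    /* the divergent 3x3 example from the slides */
    double A[9] = {1, 2, 2, 2, 1, 2, 2, 2, 1}, b[3] = {5, 5, 5};
    double x[3] = {0, 0, 0}, xnew[3];
    int k, i;
    for (k = 1; k <= 3; k++) {
        jacobi_sweep(3, A, b, x, xnew);
        for (i = 0; i < 3; i++) x[i] = xnew[i];
        printf("x_%d = (%g, %g, %g)\n", k, x[0], x[1], x[2]);  /* 5, -15, 65 */
    }
    return 0;
}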

Jacobi and Gauss-Seidel Method V

Convergence: does the method eventually go to a solution?
This is like
  a_11 x_1 = b_1 - a_12 x_2 - a_13 x_3
  a_22 x_2 = b_2 - a_21 x_1 - a_23 x_3
  a_33 x_3 = b_3 - a_31 x_1 - a_32 x_2
that is,
  [a_11          ] [x_1]     [ 0    a_12  a_13] [x_1]
  [     a_22     ] [x_2] = - [a_21   0    a_23] [x_2] + b
  [          a_33] [x_3]     [a_31  a_32   0  ] [x_3]

Jacobi and Gauss-Seidel Method VI

If
      [a_11          ]             [ 0    a_12  a_13]
  M = [     a_22     ]  and  N = - [a_21   0    a_23],
      [          a_33]             [a_31  a_32   0  ]
then
  A = M - N  and  M x_{k+1} = N x_k + b

Jacobi and Gauss-Seidel Method VII

Spectral radius:
  ρ(A) = max{ |λ| : λ ∈ λ(A) }
λ(A) contains all eigenvalues of A

Theorem 1
  If A = M - N with A, M non-singular and ρ(M^{-1}N) < 1, then
  M x_{k+1} = N x_k + b leads to the convergence of {x_k} to A^{-1} b
  for any starting vector x_0

Jacobi and Gauss-Seidel Method VIII

Proof:
  Ax = b  ⟹  Mx = Nx + b
  M(x_{k+1} - x) = N(x_k - x)
  x_{k+1} - x = M^{-1}N (x_k - x)
  x_{k+1} - x = (M^{-1}N)^k (x_1 - x)
  ρ(M^{-1}N) < 1  ⟹  (M^{-1}N)^k → 0  ⟹  x_{k+1} - x → 0

Jacobi and Gauss-Seidel Method IX

Why does ρ(M^{-1}N) < 1 imply (M^{-1}N)^k → 0?
This is quite complicated, so we omit the derivation here.

Reasons of Gauss-Seidel Methods I

The optimization problem
  min_x (1/2) x^T A x - b^T x
is the same as solving
  Ax - b = 0
if A is symmetric positive definite

Reasons of Gauss-Seidel Methods II

If
  f(x) = (1/2) x^T A x - b^T x,
then
  ∇f(x) = A x - b,
where
  ∇f(x) ≡ [∂f(x)/∂x_1; ...; ∂f(x)/∂x_n]

Reasons of Gauss-Seidel Methods III

Remember
  x^T A x = Σ_{i=1}^n x_i (Ax)_i = Σ_{i=1}^n Σ_{j=1}^n x_i A_ij x_j
          = x_1 A_11 x_1 + ... + x_1 A_1n x_n
          + x_2 A_21 x_1 + ...
          + x_n A_n1 x_1 + ...

Reasons of Gauss-Seidel Methods IV

Therefore
  ∂(x^T A x)/∂x_1 = 2 A_11 x_1 + A_12 x_2 + ... + A_1n x_n + x_2 A_21 + ... + x_n A_n1
                  = 2 (A_11 x_1 + ... + A_1n x_n),
since A is symmetric

Reasons of Gauss-Seidel Methods V

Sequentially update one variable:
  min_{x_1} f(x_1, x_2^k, ..., x_n^k)
  min_{x_2} f(x_1^{k+1}, x_2, x_3^k, ..., x_n^k)
  min_{x_3} f(x_1^{k+1}, x_2^{k+1}, x_3, ..., x_n^k)
  ...

Reasons of Gauss-Seidel Methods VI

  min_d (1/2)(x + d e_i)^T A (x + d e_i) - b^T (x + d e_i)
  min_d (1/2) d^2 A_ii + d ((Ax)_i - b_i)
  A_ii d + (Ax)_i - b_i = 0
  d = (b_i - (Ax)_i) / A_ii
  x_i + d = (b_i - Σ_{j : j ≠ i} A_ij x_j) / A_ii

Reasons of Gauss-Seidel Methods VII

Note that
  e_i = [0, ..., 0, 1, 0, ..., 0]^T,
with the 1 in position i (preceded by i - 1 zeros)

Conjugate Gradient Method I

For symmetric positive definite matrices only
One of the most frequently used iterative methods
Before introducing CG, we discuss a related method called the steepest descent method
We still consider solving
  min_x (1/2) x^T A x - b^T x

Steepest Descent Method I

Gradient direction. For one variable,
  f(x) = f(x_k) + f'(x_k)(x - x_k) + (1/2) f''(x_k)(x - x_k)^2 + ...
In general,
  f(x) = f(x_k) + ∇f(x_k)^T (x - x_k) + (1/2)(x - x_k)^T ∇^2 f(x_k)(x - x_k) + ...
Omitting (1/2)(x - x_k)^T ∇^2 f(x_k)(x - x_k) + ...,
  f(x) ≈ f(x_k) + ∇f(x_k)^T (x - x_k)

Steepest Descent Method II

Minimize ∇f(x_k)^T (x - x_k):
  min_{||x - x_k|| = 1} ∇f(x_k)^T (x - x_k)
If
  x - x_k = -∇f(x_k) / ||∇f(x_k)||,
then it is the minimizing direction.

Steepest Descent Method III

Note
  a · b = ||a|| ||b|| cos θ  and  cos π = -1 (minimum),
so this is clear in 2D. But how do you prove it for general cases?
Now ∇f(x_k) = A x_k - b and we take
  x - x_k = -α ∇f(x_k)

Steepest Descent Method IV

Let r = -∇f(x_k) = b - A x_k and x = x_k
Minimize along the direction:
  min_α (1/2)(x + αr)^T A (x + αr) - (x + αr)^T b
  min_α (1/2) α^2 r^T A r + α r^T A x - α r^T b
  min_α (1/2) α^2 r^T A r + α r^T (A x - b)

Steepest Descent Method V

A problem of one variable:
  α r^T A r + r^T (A x - b) = 0
  α = r^T (b - A x) / (r^T A r)
Note r^T A r ≠ 0 if A is positive definite
Now r = b - A x_k, so
  α = (b - A x_k)^T (b - A x_k) / ((b - A x_k)^T A (b - A x_k))

Steepest Descent Method VI

The algorithm:
  k = 0; x_0 = 0; r_0 = b
  while r_k ≠ 0
    k = k + 1
    α_k = r_{k-1}^T r_{k-1} / (r_{k-1}^T A r_{k-1})
    x_k = x_{k-1} + α_k r_{k-1}
    r_k = b - A x_k
  end
It converges but may be very slow
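
A C sketch of this loop for a dense SPD matrix (not from the slides; the tolerance test and the residual update r ← r - αAr, which equals b - Ax_k in exact arithmetic, are assumptions made here for brevity):

#include <math.h>
#include <stdlib.h>

/* y = A*v for a dense row-major A */
static void matvec(int n, const double *A, const double *v, double *y)
{
    int i, j;
    for (i = 0; i < n; i++) {
        y[i] = 0.0;
        for (j = 0; j < n; j++) y[i] += A[i * n + j] * v[j];
    }
}

/* steepest descent for SPD A; returns the number of iterations used */
int steepest_descent(int n, const double *A, const double *b,
                     double *x, double tol, int maxit)
{
    double *r = malloc(n * sizeof *r), *Ar = malloc(n * sizeof *Ar);
    double rr, alpha;
    int i, k;

    for (i = 0; i < n; i++) { x[i] = 0.0; r[i] = b[i]; }   /* x_0 = 0, r_0 = b */
    for (k = 0; k < maxit; k++) {
        rr = 0.0;
        for (i = 0; i < n; i++) rr += r[i] * r[i];
        if (sqrt(rr) <= tol) break;            /* in place of the test r_k != 0 */
        matvec(n, A, r, Ar);
        alpha = 0.0;
        for (i = 0; i < n; i++) alpha += r[i] * Ar[i];
        alpha = rr / alpha;                    /* alpha_k = r'r / r'Ar */
        for (i = 0; i < n; i++) {
            x[i] += alpha * r[i];              /* x_k = x_{k-1} + alpha*r_{k-1} */
            r[i] -= alpha * Ar[i];             /* r_k = b - A*x_k */
        }
    }
    free(r); free(Ar);
    return k;
}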

General Search Directions I

Suppose x_k is obtained by minimizing f(x_{k-1} + α p_k) over α:
  f(x_{k-1} + α p_k)
    = (1/2)(x_{k-1} + α p_k)^T A (x_{k-1} + α p_k) - b^T (x_{k-1} + α p_k)
    = ... + α p_k^T (A x_{k-1} - b) + (1/2) α^2 p_k^T A p_k
  α = p_k^T r_{k-1} / (p_k^T A p_k)

General Search Directions II

A more general algorithm:
  k = 0; x_0 = 0; r_0 = b
  while r_k ≠ 0
    k = k + 1
    Choose a direction p_k such that p_k^T r_{k-1} ≠ 0
    α_k = p_k^T r_{k-1} / (p_k^T A p_k)
    x_k = x_{k-1} + α_k p_k
    r_k = b - A x_k
  end
By this setting,
  x_k ∈ x_0 + span{p_1, ..., p_k}

General Search Directions III

The question is then how to choose suitable directions?

Conjugate Gradient Method I

We hope that p_1, ..., p_n are linearly independent and
  x_k = arg min_{x ∈ x_0 + span{p_1,...,p_k}} f(x)     (5)
With (5),
  span{p_1, ..., p_n} = R^n,
so
  x_n = arg min_{x ∈ R^n} f(x)
Then A x_n = b

Conjugate Gradient Method II

and the procedure stops in at most n iterations
But how to maintain (5)?
Let
  x_k = x_0 + P_{k-1} y + α p_k,
where
  P_{k-1} = [p_1, ..., p_{k-1}],  y ∈ R^{k-1},  α ∈ R

Conjugate Gradient Method III

  f(x_k) = (1/2)(x_0 + P_{k-1}y + αp_k)^T A (x_0 + P_{k-1}y + αp_k)
           - b^T (x_0 + P_{k-1}y + αp_k)
         = (1/2)(x_0 + P_{k-1}y)^T A (x_0 + P_{k-1}y) - b^T (x_0 + P_{k-1}y)
           + α p_k^T A (x_0 + P_{k-1}y) - b^T (α p_k) + (α^2/2) p_k^T A p_k
         = f(x_0 + P_{k-1}y) + α p_k^T A P_{k-1} y - α p_k^T r_0 + (α^2/2) p_k^T A p_k

Conjugate Gradient Method IV

  min_{x ∈ x_0 + span{p_1,...,p_k}} f(x) = min_{y,α} f(x_0 + P_{k-1}y + αp_k)
is difficult because the term α p_k^T A P_{k-1} y involves both α and y

Conjugate Gradient Method V

If
  p_k ⊥ span{Ap_1, ..., Ap_{k-1}},
then
  α p_k^T A P_{k-1} y = 0
and
  min_{x ∈ x_0 + span{p_1,...,p_k}} f(x)
    = min_y f(x_0 + P_{k-1}y) + min_α (-α p_k^T r_0 + (α^2/2) p_k^T A p_k),
two independent optimization problems

Conjugate Gradient Method VI

Therefore, we require p_k to be A-conjugate to p_1, ..., p_{k-1}. That is,
  p_i^T A p_k = 0,  i = 1, ..., k-1
By induction,
  x_{k-1} = arg min_y f(x_0 + P_{k-1} y)
The solution of the second problem is
  α_k = p_k^T r_0 / (p_k^T A p_k)

Conjugate Gradient Method VII

Because of A-conjugacy,
  p_k^T r_{k-1} = p_k^T (b - A x_{k-1}) = p_k^T (b - A(x_0 + P_{k-1} y_{k-1})) = p_k^T r_0
We have
  x_k = x_{k-1} + α_k p_k
New algorithm:

Conjugate Gradient Method VIII

  k = 0; x_0 = 0; r_0 = b
  while r_k ≠ 0
    k = k + 1
    Choose any p_k ⊥ span{Ap_1, ..., Ap_{k-1}} such that p_k^T r_{k-1} ≠ 0
    α_k = p_k^T r_{k-1} / (p_k^T A p_k)
    x_k = x_{k-1} + α_k p_k
    r_k = b - A x_k
  end
Next, how to choose p_k? One way is to minimize the distance to r_{k-1}:

Conjugate Gradient Method IX

Reason: r_{k-1} is now the negative gradient direction
The algorithm becomes
  k = 0; x_0 = 0; r_0 = b
  while r_k ≠ 0
    k = k + 1
    if k = 1
      p_1 = r_0
    else
      Let p_k minimize ||p - r_{k-1}||_2 over all vectors
      p ⊥ span{Ap_1, ..., Ap_{k-1}}
    end

Conjugate Gradient Method X

    α_k = p_k^T r_{k-1} / (p_k^T A p_k)
    x_k = x_{k-1} + α_k p_k
    r_k = b - A x_k
  end

Conjugate Gradient Method XI

Lemma 2
  If p_k minimizes ||p - r_{k-1}||_2 over all vectors p ⊥ span{Ap_1, ..., Ap_{k-1}}, then
    p_k = r_{k-1} - A P_{k-1} z_{k-1},
  where z_{k-1} solves
    min_z ||r_{k-1} - A P_{k-1} z||_2
  and P_k = [p_1, ..., p_k] is an n × k matrix

Conjugate Gradient Method XII

Proof:
  A P_{k-1} z: the space spanned by Ap_1, ..., Ap_{k-1}
  r_{k-1} is not in the above space
  z: coefficients of the linear combination of Ap_1, ..., Ap_{k-1}
  p_k ⊥ span{Ap_1, ..., Ap_{k-1}}  ⟹  p_k = r_{k-1} - A P_{k-1} z_{k-1}

Conjugate Gradient Method XIII

Theorem 3
  After j iterations, we have
    r_j = r_{j-1} - α_j A p_j
    P_j^T r_j = 0                                                    (6)
    span{p_1, ..., p_j} = span{r_0, ..., r_{j-1}} = span{b, Ab, ..., A^{j-1} b}
    r_i^T r_j = 0 for all i ≠ j

Conjugate Gradient Method XIV

Proof:
  r_j = b - A x_j = b - A x_{j-1} + A(x_{j-1} - x_j) = r_{j-1} - α_j A p_j
  r_i, r_j mutually orthogonal; proofs of the other properties omitted
Now we want to find z_{k-1}

Conjugate Gradient Method XV

z_{k-1} is a vector of length k - 1:
  z_{k-1} = [w; μ],   w: (k-2) × 1,   μ: 1 × 1
  p_k = r_{k-1} - A P_{k-1} z_{k-1}                          (7)
      = r_{k-1} - A P_{k-2} w - μ A p_{k-1}
      = (1 + μ/α_{k-1}) r_{k-1} + s_{k-1}                    (8)
using
  r_{k-1} = r_{k-2} - α_{k-1} A p_{k-1}

Conjugate Gradient Method XVI

where
  s_{k-1} ≡ -(μ/α_{k-1}) r_{k-2} - A P_{k-2} w               (9)
We have
  r_i^T r_j = 0,  ∀ i ≠ j
and
  A P_{k-2} w ∈ span{Ap_1, ..., Ap_{k-2}} = span{Ab, ..., A^{k-2} b}
             ⊆ span{r_0, ..., r_{k-2}}

Conjugate Gradient Method XVII

Hence
  r_{k-1}^T (A P_{k-2} w) = 0   and   s_{k-1}^T r_{k-1} = 0   (10)
Recall from Lemma 2 that our job now is to find z_{k-1} such that
  ||r_{k-1} - A P_{k-1} z||
is minimized

Conjugate Gradient Method XVIII

The reason for minimizing ||r_{k-1} - A P_{k-1} z|| instead of
  ||p - r_{k-1}||_2,   p ⊥ span{Ap_1, ..., Ap_{k-1}}          (11)
is that (11) is a constrained problem.

Conjugate Gradient Method XIX

From (8) and (10), select μ and w such that
  ||(1 + μ/α_{k-1}) r_{k-1} + s_{k-1}||^2
    = ||(1 + μ/α_{k-1}) r_{k-1}||^2 + ||-(μ/α_{k-1}) r_{k-2} - A P_{k-2} w||^2
is minimized

Conjugate Gradient Method XX

If an optimal solution is (μ*, w*), then
  -(μ*/α_{k-1}) r_{k-2} - A P_{k-2} w*
    = -(μ*/α_{k-1}) ( r_{k-2} - A P_{k-2} ( -w*/(μ*/α_{k-1}) ) )
and -w*/(μ*/α_{k-1}) must be the solution of
  min_z ||r_{k-2} - A P_{k-2} z||

Conjugate Gradient Method XXI

From Lemma 2, the solution of
  min_z ||r_{k-2} - A P_{k-2} z||
is
  p_{k-1} = r_{k-2} - A P_{k-2} z_{k-2}
Therefore, s_{k-1} is a multiple of p_{k-1}
From (8),
  p_k ∈ span{r_{k-1}, p_{k-1}}

Conjugate Gradient Method XXII

Assume
  p_k = r_{k-1} + β_k p_{k-1}                                 (12)
This assumption is fine as we will adjust α later. That is, a direction
parallel to the real solution of min ||p - r_{k-1}|| is enough
From (6), p_{k-1}^T r_{k-1} = 0; we also require p_{k-1}^T A p_k = 0

Conjugate Gradient Method XXIII

With (12),
  A p_k = A r_{k-1} + β_k A p_{k-1}
  p_{k-1}^T A p_k = p_{k-1}^T A r_{k-1} + β_k p_{k-1}^T A p_{k-1} = 0
  β_k = -p_{k-1}^T A r_{k-1} / (p_{k-1}^T A p_{k-1})
  α_k = (r_{k-1} + β_k p_{k-1})^T r_{k-1} / (p_k^T A p_k) = r_{k-1}^T r_{k-1} / (p_k^T A p_k)
The conjugate gradient method:

Conjugate Gradient Method XXIV

  k = 0; x_0 = 0; r_0 = b
  while r_k ≠ 0
    k = k + 1
    if k = 1
      p_1 = r_0
    else
      β_k = -p_{k-1}^T A r_{k-1} / (p_{k-1}^T A p_{k-1})
      p_k = r_{k-1} + β_k p_{k-1}
    end
    α_k = r_{k-1}^T r_{k-1} / (p_k^T A p_k)
    x_k = x_{k-1} + α_k p_k
    r_k = b - A x_k

Conjugate Gradient Method XXV

  end
The computational effort: 3 matrix-vector products each iteration,
  A r_{k-1},  A p_{k-1},  A x_k

Conjugate Gradient Method XXVI

Further simplification:
  r_k = r_{k-1} - α_k A p_k
  r_{k-1} = r_{k-2} - α_{k-1} A p_{k-1}
  r_{k-1}^T r_{k-1} = r_{k-1}^T r_{k-2} - α_{k-1} r_{k-1}^T A p_{k-1}
                    = -α_{k-1} r_{k-1}^T A p_{k-1}
  r_{k-2}^T r_{k-1} = r_{k-2}^T r_{k-2} - α_{k-1} r_{k-2}^T A p_{k-1}
                    = r_{k-2}^T r_{k-2} - α_{k-1} (p_{k-1} - β_{k-1} p_{k-2})^T A p_{k-1}
                    = r_{k-2}^T r_{k-2} - α_{k-1} p_{k-1}^T A p_{k-1} = 0
  ⟹  r_{k-2}^T r_{k-2} = α_{k-1} p_{k-1}^T A p_{k-1}

Conjugate Gradient Method XXVII

  β_k = -p_{k-1}^T A r_{k-1} / (p_{k-1}^T A p_{k-1})
      = (r_{k-1}^T r_{k-1} / α_{k-1}) / (r_{k-2}^T r_{k-2} / α_{k-1})
      = r_{k-1}^T r_{k-1} / (r_{k-2}^T r_{k-2})
A simplified version:
  k = 0; x_0 = 0; r_0 = b
  while r_k ≠ 0
    k = k + 1
    if k = 1
      p_1 = r_0
    else
      β_k = r_{k-1}^T r_{k-1} / (r_{k-2}^T r_{k-2})
      p_k = r_{k-1} + β_k p_{k-1}
    end

Conjugate Gradient Method XXVIII

    α_k = r_{k-1}^T r_{k-1} / (p_k^T A p_k)
    x_k = x_{k-1} + α_k p_k
    r_k = r_{k-1} - α_k A p_k
  end
One matrix-vector product per iteration
r_k ≠ 0 is not a practical termination criterion
Too many inner products
The final version:

Conjugate Gradient Method XXIX

  k = 0; x = 0; r = b; ρ_0 = ||r||_2^2
  while √ρ_k > ε ||b||_2 and k < k_max
    k = k + 1
    if k = 1
      p = r
    else
      β = ρ_{k-1} / ρ_{k-2}
      p = r + β p
    end
    w = A p
    α = ρ_{k-1} / (p^T w)
    x = x + α p

Conjugate Gradient Method XXX

    r = r - α w
    ρ_k = ||r||_2^2
  end
Numerical error may cause the number of iterations to exceed n
Slow convergence
Convergence properties:

Theorem 4
  If A = I + B is an n × n symmetric positive definite matrix and rank(B) = r,
  then the conjugate gradient method converges in at most r + 1 steps

Conjugate Gradient Method XXXI

The case of A = I:
  x_0 = 0, r_0 = b
  p_1 = r_0 = b
  α = r_0^T r_0 / (p_1^T A p_1) = b^T b / (b^T A b) = 1
  x_1 = p_1 = b
  r_1 = r_0 - α_1 A p_1 = b - b = 0
The conjugate gradient method stops in one iteration

Conjugate Gradient Method XXXII

An error bound in terms of the norm √(x^T A x):

Theorem 5
  If Ax = b, then
    ||x - x_k||_A ≤ 2 ||x - x_0||_A ( (√κ - 1) / (√κ + 1) )^k,
  where ||x||_A = √(x^T A x) and κ is the condition number of A

As κ → 1, ((√κ - 1)/(√κ + 1))^k becomes smaller

Conjugate Gradient Method XXXIII

In general, if the condition of A is better, fewer CG iterations are needed
Where is positive definiteness used? For α:
  (1/2) α^2 p^T A p + ...
We need p^T A p > 0 for the minimization. That is, we need positive definiteness to ensure that CG solves
  min_x (1/2) x^T A x - b^T x
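
Before the homework, the final version of the algorithm can be written out in C as the following sketch (not from the slides: it assumes a dense row-major matrix and uses illustrative names such as cg and dense_matvec; a sparse implementation would only replace the matrix-vector product).

#include <math.h>
#include <stdlib.h>

/* y = A*v for a dense row-major A */
static void dense_matvec(int n, const double *A, const double *v, double *y)
{
    int i, j;
    for (i = 0; i < n; i++) {
        y[i] = 0.0;
        for (j = 0; j < n; j++) y[i] += A[i * n + j] * v[j];
    }
}

/* conjugate gradient for SPD A; returns the number of iterations performed */
int cg(int n, const double *A, const double *b, double *x,
       double eps, int kmax)
{
    double *r = malloc(n * sizeof *r);
    double *p = malloc(n * sizeof *p);
    double *w = malloc(n * sizeof *w);
    double rho, rho_old = 0.0, bnorm = 0.0, alpha, beta, pw;
    int i, k = 0;

    for (i = 0; i < n; i++) { x[i] = 0.0; r[i] = b[i]; bnorm += b[i] * b[i]; }
    bnorm = sqrt(bnorm);
    rho = 0.0;
    for (i = 0; i < n; i++) rho += r[i] * r[i];          /* rho_0 = ||r||_2^2 */

    while (sqrt(rho) > eps * bnorm && k < kmax) {
        k++;
        if (k == 1) {
            for (i = 0; i < n; i++) p[i] = r[i];
        } else {
            beta = rho / rho_old;                        /* rho_{k-1}/rho_{k-2} */
            for (i = 0; i < n; i++) p[i] = r[i] + beta * p[i];
        }
        dense_matvec(n, A, p, w);                        /* w = A*p */
        pw = 0.0;
        for (i = 0; i < n; i++) pw += p[i] * w[i];
        alpha = rho / pw;                                /* alpha = rho_{k-1}/p'w */
        for (i = 0; i < n; i++) { x[i] += alpha * p[i]; r[i] -= alpha * w[i]; }
        rho_old = rho;
        rho = 0.0;
        for (i = 0; i < n; i++) rho += r[i] * r[i];      /* rho_k = ||r||_2^2 */
    }
    free(r); free(p); free(w);
    return k;
}

The only operation that touches A is the product w = Ap, which is why CG combines so well with the sparse storage schemes described earlier.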

Homework 6 I

Solve a linear system with the largest symmetric positive definite matrix in Matrix Market
  http://math.nist.gov/matrixmarket
Implement three methods in C or C++: Jacobi, Gauss-Seidel, and CG
Solve Ax = e, where e is the vector of all ones
You may need to set a maximal number of iterations if a method takes too many iterations
If none converges, try diagonal scaling first: let C = diag(A)^{-1/2}

Homework 6 II

Solve CACy = Ce, where x = Cy. After finding the solutions, analyze the error Ax - b
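
A small C sketch of the scaling step only (the dense storage format and the helper name diagonal_scaling are assumptions for illustration; the solvers themselves are the exercise):

#include <math.h>

/* Form B = C*A*C and c = C*e with C = diag(A)^{-1/2} and e = (1,...,1)^T.
   A and B are dense, row-major, n x n; Cdiag holds the diagonal of C.
   After solving B*y = c, recover x via x[i] = Cdiag[i]*y[i]. */
void diagonal_scaling(int n, const double *A, double *B,
                      double *c, double *Cdiag)
{
    int i, j;
    for (i = 0; i < n; i++)
        Cdiag[i] = 1.0 / sqrt(A[i * n + i]);
    for (i = 0; i < n; i++) {
        c[i] = Cdiag[i];                       /* (C*e)_i = C_ii */
        for (j = 0; j < n; j++)
            B[i * n + j] = Cdiag[i] * A[i * n + j] * Cdiag[j];
    }
}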