Matrix Computation
Tsung-Ming Huang, NTNU, 2016
Plan
- Gradient method
- Conjugate gradient method
- Preconditioner
Gradient method
Definition. $A$ is symmetric positive definite (s.p.d.) if $A^\top = A$ and $x^\top A x > 0$ for all $x \neq 0$.

Inner product: $\langle x, y \rangle = x^\top y$ for any $x, y \in \mathbb{R}^n$.

Define
$$g(x) = \langle x, Ax \rangle - 2\langle x, b \rangle = x^\top A x - 2 x^\top b.$$

Theorem. Let $A$ be s.p.d. Then $x^*$ is the solution of $Ax = b$ if and only if
$$g(x^*) = \min_{x \in \mathbb{R}^n} g(x).$$
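As a small illustration (an example added here for concreteness, not from the original slides): take
$$A = \begin{pmatrix} 2 & 0 \\ 0 & 4 \end{pmatrix}, \qquad b = \begin{pmatrix} 2 \\ 4 \end{pmatrix},$$
so that $g(x) = 2x_1^2 + 4x_2^2 - 4x_1 - 8x_2 = 2(x_1 - 1)^2 + 4(x_2 - 1)^2 - 6$. This is minimized exactly at $x^* = (1, 1)^\top$, which is indeed the solution of $Ax = b$.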
Proof ($\Rightarrow$). Assume $x^*$ is the solution of $Ax^* = b$. For any $x$,
$$
\begin{aligned}
g(x) &= \langle x, Ax\rangle - 2\langle x, b\rangle \\
&= \langle x - x^*, A(x - x^*)\rangle + 2\langle x, Ax^*\rangle - \langle x^*, Ax^*\rangle - 2\langle x, b\rangle \\
&= \langle x - x^*, A(x - x^*)\rangle - \langle x^*, Ax^*\rangle + 2\langle x, Ax^* - b\rangle \\
&= \langle x - x^*, A(x - x^*)\rangle - \langle x^*, Ax^*\rangle,
\end{aligned}
$$
using the symmetry of $A$ and $Ax^* = b$. Since $A$ is s.p.d., $\langle x - x^*, A(x - x^*)\rangle \geq 0$, so
$$g(x) \geq -\langle x^*, Ax^*\rangle = g(x^*), \qquad\text{i.e.}\qquad g(x^*) = \min_{x \in \mathbb{R}^n} g(x).$$
Proof ($\Leftarrow$). Assume $g(x^*) = \min_{x \in \mathbb{R}^n} g(x)$. Fix vectors $x$ and $v$; for any $\alpha \in \mathbb{R}$ define
$$
\begin{aligned}
f(\alpha) \equiv g(x + \alpha v) &= \langle x + \alpha v, Ax + \alpha Av\rangle - 2\langle x + \alpha v, b\rangle \\
&= \langle x, Ax\rangle + 2\alpha\langle v, Ax\rangle + \alpha^2\langle v, Av\rangle - 2\langle x, b\rangle - 2\alpha\langle v, b\rangle \\
&= g(x) + 2\alpha\langle v, Ax - b\rangle + \alpha^2\langle v, Av\rangle.
\end{aligned}
$$
Thus $f$ is a quadratic function of $\alpha$, and since $A$ is s.p.d. ($\langle v, Av\rangle > 0$ for $v \neq 0$), $f$ attains its minimum where $f'(\alpha) = 0$:
$$f'(\hat\alpha) = 2\langle v, Ax - b\rangle + 2\hat\alpha\langle v, Av\rangle = 0 \quad\Longrightarrow\quad \hat\alpha = \frac{\langle v, b - Ax\rangle}{\langle v, Av\rangle}.$$
Substituting back,
$$g(x + \hat\alpha v) = f(\hat\alpha) = g(x) - 2\frac{\langle v, b - Ax\rangle^2}{\langle v, Av\rangle} + \frac{\langle v, b - Ax\rangle^2}{\langle v, Av\rangle} = g(x) - \frac{\langle v, b - Ax\rangle^2}{\langle v, Av\rangle}.$$
For $v \neq 0$ this gives
$$g(x + \hat\alpha v) < g(x) \ \text{ if } \ \langle v, b - Ax\rangle \neq 0, \qquad g(x + \hat\alpha v) = g(x) \ \text{ if } \ \langle v, b - Ax\rangle = 0.$$
In particular, $g(x^* + \hat\alpha v) \geq g(x^*)$ for every $v$, which forces $\langle v, b - Ax^*\rangle = 0$ for all $v$, and hence $Ax^* = b$. $\blacksquare$
Write $r = b - Ax$. Then
$$\hat\alpha = \frac{\langle v, b - Ax\rangle}{\langle v, Av\rangle} = \frac{\langle v, r\rangle}{\langle v, Av\rangle}.$$
If $r \neq 0$ and $\langle v, r\rangle \neq 0$, then
$$g(x + \hat\alpha v) = g(x) - \frac{\langle v, r\rangle^2}{\langle v, Av\rangle} < g(x),$$
so $x + \hat\alpha v$ is closer to $x^*$ than $x$ is. This suggests the iteration: given $x^{(0)}$ and $v^{(1)} \neq 0$, for $k = 1, 2, 3, \ldots$
$$\alpha_k = \frac{\langle v^{(k)}, b - Ax^{(k-1)}\rangle}{\langle v^{(k)}, Av^{(k)}\rangle}, \qquad x^{(k)} = x^{(k-1)} + \alpha_k v^{(k)},$$
then choose a new search direction $v^{(k+1)}$.
Steepest descent

Question: how should the $v^{(k)}$ be chosen so that $\{x^{(k)}\} \to x^*$ rapidly?

Let $\Phi: \mathbb{R}^n \to \mathbb{R}$ be differentiable at $x$. Then
$$\frac{\Phi(x + \varepsilon p) - \Phi(x)}{\varepsilon} = \nabla\Phi(x)^\top p + \mathcal{O}(\varepsilon).$$
Neglecting the $\mathcal{O}(\varepsilon)$ term, the right-hand side attains its minimum over all $p$ with $\|p\| = 1$ at
$$p = -\frac{\nabla\Phi(x)}{\|\nabla\Phi(x)\|},$$
i.e., the direction of largest descent.
Steepest descent direction of $g$. Write $x = [x_1, x_2, \ldots, x_n]^\top$. Then
$$g(x) = \langle x, Ax\rangle - 2\langle x, b\rangle = \sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j - 2\sum_{i=1}^n x_i b_i,$$
so
$$\frac{\partial g}{\partial x_k}(x) = 2\sum_{i=1}^n a_{ki} x_i - 2 b_k = 2\bigl(A(k,:)\,x - b_k\bigr),$$
and therefore
$$\nabla g(x) = \left[\frac{\partial g}{\partial x_1}(x), \ldots, \frac{\partial g}{\partial x_n}(x)\right]^\top = 2(Ax - b) = -2r.$$
Steepest descent method (gradient method)

Given $x^{(0)}$
For $k = 1, 2, 3, \ldots$
    $r_{k-1} = b - Ax^{(k-1)}$
    If $r_{k-1} = 0$, then stop;
    else
        $\alpha_k = \dfrac{\langle r_{k-1}, r_{k-1}\rangle}{\langle r_{k-1}, Ar_{k-1}\rangle}$, $\quad x^{(k)} = x^{(k-1)} + \alpha_k r_{k-1}$
End

Convergence Theorem. Let $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n > 0$ be the eigenvalues of $A$, let $x^{(k)}, x^{(k-1)}$ be the approximate solutions and $x^*$ the exact solution. Then
$$\|x^{(k)} - x^*\|_A \leq \frac{\lambda_1 - \lambda_n}{\lambda_1 + \lambda_n}\,\|x^{(k-1)} - x^*\|_A, \qquad\text{where } \|x\|_A = \sqrt{x^\top A x}.$$
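The loop above translates directly into code. The following is a minimal Python/NumPy sketch (the function name, tolerance, and iteration cap are illustrative additions; in floating point one stops when $\|r\|$ is small rather than testing $r = 0$ exactly):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, maxit=10_000):
    """Gradient method for A x = b, assuming A is symmetric positive definite."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(maxit):
        r = b - A @ x                        # r_{k-1} = b - A x^{(k-1)}
        if np.linalg.norm(r) <= tol:         # floating-point stand-in for r = 0
            break
        alpha = (r @ r) / (r @ (A @ r))      # alpha_k = <r, r> / <r, A r>
        x += alpha * r                       # step along the steepest descent direction
    return x
```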
Conjugate gradient method
A-orthogonality

If $\kappa(A) = \lambda_1/\lambda_n$ is large, then
$$\frac{\lambda_1 - \lambda_n}{\lambda_1 + \lambda_n} \approx 1,$$
so convergence is very slow and the gradient method is not recommended.

Improvement: choose A-orthogonal search directions.

Definition. $p, q \in \mathbb{R}^n$ are called A-orthogonal (A-conjugate) if $p^\top A q = 0$.
Lemma. If $v_1, \ldots, v_n \neq 0$ are pairwise A-conjugate, then $v_1, \ldots, v_n$ are linearly independent.

Proof. Suppose $0 = \sum_{j=1}^n c_j v_j$. Then for each $k$,
$$0 = v_k^\top A \left(\sum_{j=1}^n c_j v_j\right) = \sum_{j=1}^n c_j\, v_k^\top A v_j = c_k\, v_k^\top A v_k,$$
so $c_k = 0$ for $k = 1, \ldots, n$. Hence $v_1, \ldots, v_n$ are linearly independent. $\blacksquare$
Theorem. Let $A$ be symmetric positive definite, let $v_1, \ldots, v_n \in \mathbb{R}^n \setminus \{0\}$ be pairwise A-conjugate, and let $x_0$ be given. For $k = 1, \ldots, n$, let
$$\alpha_k = \frac{\langle v_k, b - Ax_{k-1}\rangle}{\langle v_k, Av_k\rangle}, \qquad x_k = x_{k-1} + \alpha_k v_k.$$
Then $Ax_n = b$ and $\langle b - Ax_k, v_j\rangle = 0$ for $j = 1, 2, \ldots, k$.
Proof. From $x_k = x_{k-1} + \alpha_k v_k$,
$$Ax_n = Ax_{n-1} + \alpha_n Av_n = (Ax_{n-2} + \alpha_{n-1}Av_{n-1}) + \alpha_n Av_n = \cdots = Ax_0 + \alpha_1 Av_1 + \alpha_2 Av_2 + \cdots + \alpha_n Av_n.$$
Hence, for each $k$, by A-conjugacy,
$$
\begin{aligned}
\langle Ax_n - b, v_k\rangle &= \langle Ax_0 - b, v_k\rangle + \alpha_1\langle Av_1, v_k\rangle + \cdots + \alpha_n\langle Av_n, v_k\rangle \\
&= \langle Ax_0 - b, v_k\rangle + \alpha_k\langle v_k, Av_k\rangle \\
&= \langle Ax_0 - b, v_k\rangle + \frac{\langle v_k, b - Ax_{k-1}\rangle}{\langle v_k, Av_k\rangle}\langle v_k, Av_k\rangle \\
&= \langle Ax_0 - b, v_k\rangle + \langle v_k, b - Ax_{k-1}\rangle.
\end{aligned}
$$
Writing $b - Ax_{k-1} = (b - Ax_0) + (Ax_0 - Ax_1) + \cdots + (Ax_{k-2} - Ax_{k-1})$ gives
$$\langle Ax_n - b, v_k\rangle = \langle v_k, Ax_0 - Ax_1\rangle + \cdots + \langle v_k, Ax_{k-2} - Ax_{k-1}\rangle.$$
Since $x_i = x_{i-1} + \alpha_i v_i$ implies $Ax_{i-1} - Ax_i = -\alpha_i Av_i$, A-conjugacy yields
$$\langle Ax_n - b, v_k\rangle = -\alpha_1\langle v_k, Av_1\rangle - \cdots - \alpha_{k-1}\langle v_k, Av_{k-1}\rangle = 0,$$
so $Ax_n = b$.

For the second claim, use induction on $k$, with $r_k \equiv b - Ax_k$. Assume $\langle r_{k-1}, v_j\rangle = 0$ for $j = 1, \ldots, k-1$. From
$$r_k = b - A(x_{k-1} + \alpha_k v_k) = r_{k-1} - \alpha_k Av_k,$$
we get
$$\langle r_k, v_k\rangle = \langle r_{k-1}, v_k\rangle - \alpha_k\langle Av_k, v_k\rangle = \langle r_{k-1}, v_k\rangle - \frac{\langle v_k, b - Ax_{k-1}\rangle}{\langle v_k, Av_k\rangle}\langle Av_k, v_k\rangle = 0,$$
and for $j = 1, \ldots, k-1$, by the induction hypothesis and A-conjugacy,
$$\langle r_k, v_j\rangle = \langle r_{k-1}, v_j\rangle - \alpha_k\langle Av_k, v_j\rangle = 0,$$
which completes the proof by mathematical induction. $\blacksquare$
Method of conjugate directions

Given $x^{(0)}$ and pairwise A-orthogonal $v_1, \ldots, v_n \in \mathbb{R}^n \setminus \{0\}$
$r_0 = b - Ax^{(0)}$
For $k = 1, \ldots, n$
    $\alpha_k = \dfrac{\langle v_k, r_{k-1}\rangle}{\langle v_k, Av_k\rangle}$, $\quad x^{(k)} = x^{(k-1)} + \alpha_k v_k$
    $r_k = r_{k-1} - \alpha_k Av_k = b - Ax^{(k)}$
End

Question: how can A-orthogonal search directions be found?
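Before turning to that question, here is a minimal Python/NumPy sketch of the conjugate-direction loop above, assuming the pairwise A-orthogonal directions are already available as the columns of `V` (the names are illustrative); by the theorem, it reaches the exact solution after $n$ steps in exact arithmetic:

```python
import numpy as np

def conjugate_directions(A, b, x0, V):
    """One pass over given pairwise A-orthogonal directions v_1, ..., v_n (columns of V)."""
    x = np.asarray(x0, dtype=float).copy()
    r = b - A @ x                            # r_0
    for k in range(V.shape[1]):
        v = V[:, k]
        Av = A @ v
        alpha = (v @ r) / (v @ Av)           # alpha_k = <v_k, r_{k-1}> / <v_k, A v_k>
        x += alpha * v                       # x^{(k)} = x^{(k-1)} + alpha_k v_k
        r -= alpha * Av                      # r_k = r_{k-1} - alpha_k A v_k
    return x
```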
A-orthogonalization

Ordinary orthogonalization: $\tilde v_2 = v_2 - \alpha v_1$ with
$$0 = v_1^\top \tilde v_2 = v_1^\top v_2 - \alpha\, v_1^\top v_1 \quad\Longrightarrow\quad \alpha = \frac{v_1^\top v_2}{v_1^\top v_1}.$$
A-orthogonalization: $\tilde v_2 = v_2 - \alpha v_1$ with
$$0 = v_1^\top A \tilde v_2 = v_1^\top A v_2 - \alpha\, v_1^\top A v_1 \quad\Longrightarrow\quad \alpha = \frac{v_1^\top A v_2}{v_1^\top A v_1}.$$
Thus
$$\tilde v_2 = v_2 - \frac{v_1^\top A v_2}{v_1^\top A v_1}\, v_1, \qquad \{v_1, v_2\} \longrightarrow \{v_1, \tilde v_2\}: \text{A-orthogonal}.$$
For three vectors, $\{v_1, v_2, v_3\} \longrightarrow \{v_1, \tilde v_2, \tilde v_3\}$ A-orthogonal: with $\{v_1, \tilde v_2\}$ already A-orthogonal, set $\tilde v_3 = v_3 - \alpha_1 v_1 - \alpha_2 \tilde v_2$, where
$$0 = v_1^\top A \tilde v_3 = v_1^\top A v_3 - \alpha_1\, v_1^\top A v_1 \quad\Longrightarrow\quad \alpha_1 = v_1^\top A v_3 / v_1^\top A v_1,$$
$$0 = \tilde v_2^\top A \tilde v_3 = \tilde v_2^\top A v_3 - \alpha_2\, \tilde v_2^\top A \tilde v_2 \quad\Longrightarrow\quad \alpha_2 = \tilde v_2^\top A v_3 / \tilde v_2^\top A \tilde v_2.$$
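The same projections extend to any number of vectors. Below is a minimal Python/NumPy sketch of this Gram-Schmidt process in the A-inner product (the function name is illustrative; it assumes $A$ is s.p.d. and the input columns are linearly independent):

```python
import numpy as np

def a_orthogonalize(A, V):
    """Return W whose columns are pairwise A-orthogonal and span the columns of V."""
    W = np.asarray(V, dtype=float).copy()
    for j in range(W.shape[1]):
        for i in range(j):
            w = W[:, i]
            # subtract the A-projection of column j onto the already processed w_i
            W[:, j] -= (w @ (A @ W[:, j])) / (w @ (A @ w)) * w
    return W
```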
Practical implementation

Given $x^{(0)}$, compute $r_0 = b - Ax^{(0)}$ and take $v_1 = r_0$ (the steepest descent direction). Then
$$\alpha_1 = \frac{\langle v_1, r_0\rangle}{\langle v_1, Av_1\rangle}, \qquad x^{(1)} = x^{(0)} + \alpha_1 v_1, \qquad r_1 = r_0 - \alpha_1 Av_1.$$
The set $\{v_1, r_1\}$ is in general not A-orthogonal, so construct an A-orthogonal vector from it:
$$v_2 = r_1 + \beta_1 v_1, \qquad \beta_1 = -\frac{\langle v_1, Ar_1\rangle}{\langle v_1, Av_1\rangle},$$
and then
$$\alpha_2 = \frac{\langle v_2, r_1\rangle}{\langle v_2, Av_2\rangle}, \qquad x^{(2)} = x^{(1)} + \alpha_2 v_2, \qquad r_2 = r_1 - \alpha_2 Av_2.$$

Next, construct an A-orthogonal vector from $\{v_1, v_2, r_2\}$:
$$v_3 = r_2 + \beta_{21} v_1 + \beta_{22} v_2, \qquad \beta_{21} = -\frac{v_1^\top A r_2}{v_1^\top A v_1}, \qquad \beta_{22} = -\frac{v_2^\top A r_2}{v_2^\top A v_2}.$$
In fact $\beta_{21} = 0$. From $r_1 = r_0 - \alpha_1 Av_1$,
$$v_1^\top A r_2 = r_2^\top A v_1 = \alpha_1^{-1}\bigl(r_2^\top r_0 - r_2^\top r_1\bigr).$$
Moreover $v_2^\top r_2 = v_2^\top r_1 - \alpha_2\, v_2^\top A v_2 = 0$ by the definition of $\alpha_2$, so
$$0 = v_2^\top r_2 = (r_1 + \beta_1 v_1)^\top r_2 = r_1^\top r_2 + \beta_1\, v_1^\top r_2 = r_1^\top r_2 + \beta_1\, v_1^\top (r_1 - \alpha_2 Av_2) = r_1^\top r_2 + \beta_1\, v_1^\top r_1,$$
using $v_1^\top A v_2 = 0$. Since
$$\langle v_1, r_1\rangle = \langle v_1, r_0\rangle - \alpha_1\langle v_1, Av_1\rangle = \langle v_1, r_0\rangle - \frac{\langle v_1, r_0\rangle}{\langle v_1, Av_1\rangle}\langle v_1, Av_1\rangle = 0,$$
this gives $r_1^\top r_2 = 0$. Similarly,
$$\langle r_2, r_0\rangle = \langle r_2, v_1\rangle = \langle r_1, v_1\rangle - \alpha_2\langle Av_2, v_1\rangle = 0.$$
Therefore
$$v_1^\top A r_2 = \alpha_1^{-1}\bigl(r_2^\top r_0 - r_2^\top r_1\bigr) = 0 \quad\Longrightarrow\quad \beta_{21} = -\frac{v_1^\top A r_2}{v_1^\top A v_1} = 0,$$
and the new direction simplifies to
$$v_3 = r_2 + \beta_2 v_2, \qquad \beta_2 = -\frac{v_2^\top A r_2}{v_2^\top A v_2}.$$
In the general case,
$$v_k = r_{k-1} + \beta_{k-1} v_{k-1} \qquad \text{if } r_{k-1} \neq 0,$$
where $\beta_{k-1}$ is determined by A-orthogonality:
$$0 = \langle v_{k-1}, Av_k\rangle = \langle v_{k-1}, Ar_{k-1}\rangle + \beta_{k-1}\langle v_{k-1}, Av_{k-1}\rangle \quad\Longrightarrow\quad \beta_{k-1} = -\frac{\langle v_{k-1}, Ar_{k-1}\rangle}{\langle v_{k-1}, Av_{k-1}\rangle}.$$

Theorem.
(i) $\{r_0, r_1, \ldots, r_{k-1}\}$ is an orthogonal set;
(ii) $\{v_1, \ldots, v_k\}$ is an A-orthogonal set.
Reformulation of $\alpha_k$ and $\beta_k$. From $v_k = r_{k-1} + \beta_{k-1} v_{k-1}$ and $\langle v_{k-1}, r_{k-1}\rangle = 0$,
$$\alpha_k = \frac{\langle v_k, r_{k-1}\rangle}{\langle v_k, Av_k\rangle} = \frac{\langle r_{k-1}, r_{k-1}\rangle + \beta_{k-1}\langle v_{k-1}, r_{k-1}\rangle}{\langle v_k, Av_k\rangle} = \frac{\langle r_{k-1}, r_{k-1}\rangle}{\langle v_k, Av_k\rangle},$$
so $\langle r_{k-1}, r_{k-1}\rangle = \alpha_k\langle v_k, Av_k\rangle$. From $r_k = r_{k-1} - \alpha_k Av_k$ and the orthogonality $\langle r_{k-1}, r_k\rangle = 0$,
$$\langle r_k, r_k\rangle = \langle r_{k-1}, r_k\rangle - \alpha_k\langle Av_k, r_k\rangle = -\alpha_k\langle r_k, Av_k\rangle.$$
Therefore
$$\beta_k = -\frac{\langle v_k, Ar_k\rangle}{\langle v_k, Av_k\rangle} = -\frac{\langle r_k, Av_k\rangle}{\langle v_k, Av_k\rangle} = \frac{\langle r_k, r_k\rangle}{\alpha_k\langle v_k, Av_k\rangle} = \frac{\langle r_k, r_k\rangle}{\langle r_{k-1}, r_{k-1}\rangle}.$$
Algorithm (Conjugate Gradient Method)

Given $x^{(0)}$, compute $r_0 = b - Ax^{(0)} = v_0$
For $k = 0, 1, \ldots$
    $\alpha_k = \dfrac{\langle r_k, r_k\rangle}{\langle v_k, Av_k\rangle}$, $\quad x^{(k+1)} = x^{(k)} + \alpha_k v_k$
    $r_{k+1} = r_k - \alpha_k Av_k$
    If $r_{k+1} = 0$, then stop;
    else
        $\beta_k = \dfrac{\langle r_{k+1}, r_{k+1}\rangle}{\langle r_k, r_k\rangle}$, $\quad v_{k+1} = r_{k+1} + \beta_k v_k$
End

Theorem. In exact arithmetic, $Ax_n = b$. In practice, for a well-conditioned $A$ one observes $\|r_n\| < \text{tol}$, while for an ill-conditioned $A$ the residual may only satisfy $\|r_k\| < \text{tol}$ for some $k > n$.
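A minimal Python/NumPy sketch of this algorithm (the function name and the tolerance-based stopping test are illustrative additions; the exact test $r_{k+1} = 0$ is replaced by a norm check):

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, maxit=None):
    """Conjugate gradient method for A x = b, assuming A is symmetric positive definite."""
    x = np.asarray(x0, dtype=float).copy()
    maxit = b.size if maxit is None else maxit
    r = b - A @ x                            # r_0
    v = r.copy()                             # v_0 = r_0
    rr = r @ r
    for _ in range(maxit):
        if np.sqrt(rr) <= tol:               # floating-point stand-in for r_k = 0
            break
        Av = A @ v
        alpha = rr / (v @ Av)                # alpha_k = <r_k, r_k> / <v_k, A v_k>
        x += alpha * v                       # x^{(k+1)} = x^{(k)} + alpha_k v_k
        r -= alpha * Av                      # r_{k+1} = r_k - alpha_k A v_k
        rr_new = r @ r
        beta = rr_new / rr                   # beta_k = <r_{k+1}, r_{k+1}> / <r_k, r_k>
        v = r + beta * v                     # v_{k+1} = r_{k+1} + beta_k v_k
        rr = rr_new
    return x
```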
Convergence of the Conjugate Gradient Method

Theorem. Let $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n > 0$ be the eigenvalues of $A$, let $\{x^{(k)}\}$ be produced by the CG method, and let $x^*$ be the exact solution. Then
$$\|x^{(k)} - x^*\|_A \leq 2\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^k \|x^{(0)} - x^*\|_A, \qquad \kappa = \frac{\lambda_1}{\lambda_n}.$$
CG is much better than the gradient method: the iterates $\{x_G^{(k)}\}$ of the gradient method only satisfy
$$\|x_G^{(k)} - x^*\|_A \leq \left(\frac{\lambda_1 - \lambda_n}{\lambda_1 + \lambda_n}\right)^k \|x_G^{(0)} - x^*\|_A = \left(\frac{\kappa - 1}{\kappa + 1}\right)^k \|x_G^{(0)} - x^*\|_A,$$
and $\dfrac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \leq \dfrac{\kappa - 1}{\kappa + 1}$.
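To see how much the square root helps, consider $\kappa = 100$ (a worked example added here for illustration): the gradient factor is $(\kappa - 1)/(\kappa + 1) = 99/101 \approx 0.980$ per step, while the CG factor is $(\sqrt{\kappa} - 1)/(\sqrt{\kappa} + 1) = 9/11 \approx 0.818$. Reducing the error bound by a factor $10^6$ then takes roughly $\ln(10^{-6})/\ln(0.980) \approx 690$ gradient steps, but only about $\ln(10^{-6})/\ln(0.818) \approx 69$ CG steps.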
Preconditioner
$$Ax = b \quad\Longleftrightarrow\quad \underbrace{\bigl(C^{-1} A C^{-\top}\bigr)}_{\tilde A}\,\underbrace{\bigl(C^\top x\bigr)}_{\tilde x} = \underbrace{C^{-1} b}_{\tilde b}$$

Goal: choose $C$ such that $\kappa(C^{-1}AC^{-\top}) < \kappa(A)$.

Procedure: apply the CG method to $\tilde A \tilde x = \tilde b$, get $\tilde x$, then solve $C^\top x = \tilde x$.

Question: at first sight this is nothing new; applying the CG method to $\tilde A \tilde x = \tilde b$ simply gets $x$ indirectly. The next step is to rewrite the iteration directly in terms of $A$, $x$, and $r$, so that $C$ enters only through solves with $C$ and $C^\top$.
Algorithm (Conjugate Gradient Method on the transformed system)

Given $\tilde x^{(0)}$, compute $\tilde r_0 = \tilde b - \tilde A\tilde x^{(0)} = \tilde v_0$
For $k = 0, 1, \ldots$
    $\tilde\alpha_k = \dfrac{\langle \tilde r_k, \tilde r_k\rangle}{\langle \tilde v_k, \tilde A\tilde v_k\rangle}$, $\quad \tilde x^{(k+1)} = \tilde x^{(k)} + \tilde\alpha_k \tilde v_k$
    $\tilde r_{k+1} = \tilde r_k - \tilde\alpha_k \tilde A\tilde v_k$
    If $\tilde r_{k+1} = 0$, then stop;
    else $\tilde\beta_k = \dfrac{\langle \tilde r_{k+1}, \tilde r_{k+1}\rangle}{\langle \tilde r_k, \tilde r_k\rangle}$, $\quad \tilde v_{k+1} = \tilde r_{k+1} + \tilde\beta_k \tilde v_k$
End

Translate back to the original variables. First,
$$\tilde r_{k+1} = C^{-1}b - \bigl(C^{-1}AC^{-\top}\bigr)C^\top x^{(k+1)} = C^{-1}\bigl(b - Ax^{(k+1)}\bigr) = C^{-1} r_{k+1}.$$
Let $\tilde v_k = C^\top v_k$ and $w_k = C^{-1} r_k$. Then
$$\tilde\beta_k = \frac{\langle C^{-1}r_{k+1}, C^{-1}r_{k+1}\rangle}{\langle C^{-1}r_k, C^{-1}r_k\rangle} = \frac{\langle w_{k+1}, w_{k+1}\rangle}{\langle w_k, w_k\rangle}.$$
For $\tilde\alpha_k$,
$$\tilde\alpha_k = \frac{\langle C^{-1}r_k, C^{-1}r_k\rangle}{\langle C^\top v_k, C^{-1}AC^{-\top}C^\top v_k\rangle} = \frac{\langle w_k, w_k\rangle}{\langle C^\top v_k, C^{-1}Av_k\rangle} = \frac{\langle w_k, w_k\rangle}{\langle v_k, Av_k\rangle},$$
since $\langle C^\top v_k, C^{-1}Av_k\rangle = v_k^\top C C^{-1} A v_k = v_k^\top A v_k$.
The updates transform likewise:
$$\tilde x^{(k+1)} = \tilde x^{(k)} + \tilde\alpha_k \tilde v_k \;\Longrightarrow\; C^\top x^{(k+1)} = C^\top x^{(k)} + \tilde\alpha_k C^\top v_k \;\Longrightarrow\; x^{(k+1)} = x^{(k)} + \tilde\alpha_k v_k,$$
$$\tilde r_{k+1} = \tilde r_k - \tilde\alpha_k \tilde A\tilde v_k \;\Longrightarrow\; C^{-1}r_{k+1} = C^{-1}r_k - \tilde\alpha_k C^{-1}AC^{-\top}C^\top v_k \;\Longrightarrow\; r_{k+1} = r_k - \tilde\alpha_k Av_k,$$
$$\tilde v_{k+1} = \tilde r_{k+1} + \tilde\beta_k \tilde v_k \;\Longrightarrow\; C^\top v_{k+1} = C^{-1}r_{k+1} + \tilde\beta_k C^\top v_k \;\Longrightarrow\; v_{k+1} = C^{-\top} w_{k+1} + \tilde\beta_k v_k.$$
In the original variables the iteration needs $w_0 = C^{-1}r_0 = C^{-1}(b - Ax^{(0)})$ and $v_0 = C^{-\top}w_0$, and each step requires solving $Cw_{k+1} = r_{k+1}$:

For $k = 0, 1, \ldots$
    $\tilde\alpha_k = \dfrac{\langle w_k, w_k\rangle}{\langle v_k, Av_k\rangle}$, $\quad x^{(k+1)} = x^{(k)} + \tilde\alpha_k v_k$
    $r_{k+1} = r_k - \tilde\alpha_k Av_k$
    If $r_{k+1} = 0$, then stop
    Solve $Cw_{k+1} = r_{k+1}$
    $\tilde\beta_k = \dfrac{\langle w_{k+1}, w_{k+1}\rangle}{\langle w_k, w_k\rangle}$, $\quad v_{k+1} = C^{-\top} w_{k+1} + \tilde\beta_k v_k$
End
Algorithm (CG Method with preconditioner $C$)

Given $C$ and $x^{(0)}$, compute $r_0 = b - Ax^{(0)}$
Solve $Cw_0 = r_0$ and $C^\top v_0 = w_0$
For $k = 0, 1, \ldots$
    $\alpha_k = \langle w_k, w_k\rangle / \langle v_k, Av_k\rangle$
    $x^{(k+1)} = x^{(k)} + \alpha_k v_k$
    $r_{k+1} = r_k - \alpha_k Av_k$
    If $r_{k+1} = 0$, then stop
    Solve $Cw_{k+1} = r_{k+1}$ and $C^\top z_{k+1} = w_{k+1}$
    $\beta_k = \langle w_{k+1}, w_{k+1}\rangle / \langle w_k, w_k\rangle$
    $v_{k+1} = z_{k+1} + \beta_k v_k$
End

Setting $M = CC^\top$, the two solves with $C$ and $C^\top$ combine into $Mz_{k+1} = r_{k+1}$, and the inner products simplify:
$$\alpha_k = \frac{\langle C^{-1}r_k, C^{-1}r_k\rangle}{\langle C^\top v_k, C^{-1}Av_k\rangle} = \frac{\langle z_k, r_k\rangle}{\langle v_k, Av_k\rangle}, \qquad \beta_k = \frac{\langle C^{-1}r_{k+1}, C^{-1}r_{k+1}\rangle}{\langle C^{-1}r_k, C^{-1}r_k\rangle} = \frac{\langle z_{k+1}, r_{k+1}\rangle}{\langle z_k, r_k\rangle},$$
since $\langle C^{-1}r, C^{-1}r\rangle = r^\top C^{-\top}C^{-1}r = r^\top M^{-1}r = \langle z, r\rangle$ with $Mz = r$.
Algorithm (CG Method with preconditioner $M$)

Given $M$ and $x^{(0)}$, compute $r_0 = b - Ax^{(0)}$
Solve $Mz_0 = r_0$ and set $v_0 = z_0$
For $k = 0, 1, \ldots$
    Compute $\alpha_k = \langle z_k, r_k\rangle / \langle v_k, Av_k\rangle$
    Compute $x^{(k+1)} = x^{(k)} + \alpha_k v_k$
    Compute $r_{k+1} = r_k - \alpha_k Av_k$
    If $r_{k+1} = 0$, then stop
    Solve $Mz_{k+1} = r_{k+1}$
    Compute $\beta_k = \langle z_{k+1}, r_{k+1}\rangle / \langle z_k, r_k\rangle$
    Compute $v_{k+1} = z_{k+1} + \beta_k v_k$
End
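A minimal Python/NumPy sketch of this preconditioned CG iteration; here $M$ is passed as a callable `Msolve` that returns the solution of $Mz = r$ (the names and the norm-based stopping test are illustrative additions):

```python
import numpy as np

def pcg(A, b, Msolve, x0, tol=1e-10, maxit=None):
    """Preconditioned CG for A x = b; Msolve(r) must solve M z = r with M s.p.d."""
    x = np.asarray(x0, dtype=float).copy()
    maxit = b.size if maxit is None else maxit
    r = b - A @ x                            # r_0
    z = Msolve(r)                            # solve M z_0 = r_0
    v = z.copy()                             # v_0 = z_0
    zr = z @ r
    for _ in range(maxit):
        if np.linalg.norm(r) <= tol:         # floating-point stand-in for r_k = 0
            break
        Av = A @ v
        alpha = zr / (v @ Av)                # alpha_k = <z_k, r_k> / <v_k, A v_k>
        x += alpha * v
        r -= alpha * Av                      # r_{k+1} = r_k - alpha_k A v_k
        z = Msolve(r)                        # solve M z_{k+1} = r_{k+1}
        zr_new = z @ r
        beta = zr_new / zr                   # beta_k = <z_{k+1}, r_{k+1}> / <z_k, r_k>
        v = z + beta * v                     # v_{k+1} = z_{k+1} + beta_k v_k
        zr = zr_new
    return x
```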
Choices of $M$ (criteria)
- $\kappa(M^{-1/2}AM^{-1/2})$ is close to $1$, i.e., $M^{-1/2}AM^{-1/2} \approx I$, i.e., $M \approx A$.
- The linear system $Mz = r$ must be easy to solve, e.g., $M = LL^\top$.
- $M$ is symmetric positive definite.
Preconditioners $M$ from classical splittings

Jacobi method: $A = D + (L + U)$, $M = D$:
$$x_{k+1} = -D^{-1}(L+U)x_k + D^{-1}b = -D^{-1}(A - D)x_k + D^{-1}b = x_k + D^{-1}r_k.$$

Gauss-Seidel: $A = (D + L) + U$, $M = D + L$:
$$x_{k+1} = -(D+L)^{-1}Ux_k + (D+L)^{-1}b = (D+L)^{-1}(D + L - A)x_k + (D+L)^{-1}b = x_k + (D+L)^{-1}r_k.$$
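With $M = D$ the solve $Mz = r$ is just a division by the diagonal. A hedged usage sketch of the Jacobi preconditioner with the `pcg` routine sketched earlier (the test matrix is an illustrative construction, not from the slides):

```python
import numpy as np

# illustrative s.p.d. test problem: dominant diagonal plus a small rank-one term
n = 100
A = np.diag(np.linspace(1.0, 100.0, n)) + 0.01 * np.ones((n, n))
b = np.ones(n)

d = np.diag(A).copy()                        # M = D = diag(A)
x = pcg(A, b, lambda r: r / d, np.zeros(n))  # M z = r reduces to z = D^{-1} r
```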
SOR:
$$\omega A = (D + \omega L) - \bigl((1-\omega)D - \omega U\bigr) \equiv M - N,$$
$$
\begin{aligned}
x_{k+1} &= (D + \omega L)^{-1}\bigl[(1-\omega)D - \omega U\bigr]x_k + \omega(D + \omega L)^{-1}b \\
&= (D + \omega L)^{-1}\bigl[(D + \omega L) - \omega A\bigr]x_k + \omega(D + \omega L)^{-1}b \\
&= \bigl[I - \omega(D + \omega L)^{-1}A\bigr]x_k + \omega(D + \omega L)^{-1}b \\
&= x_k + \omega(D + \omega L)^{-1}r_k.
\end{aligned}
$$

SSOR:
$$M(\omega) = \frac{1}{\omega(2-\omega)}\,(D + \omega L)\,D^{-1}\,(D + \omega L)^\top.$$
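A hedged sketch of assembling the SSOR matrix $M(\omega)$ above for a symmetric $A = D + L + L^\top$ with $L$ strictly lower triangular (forming $M$ densely is for clarity only; in practice $Mz = r$ is applied via two triangular solves and a diagonal scaling):

```python
import numpy as np

def ssor_preconditioner(A, omega):
    """M(omega) = (D + omega L) D^{-1} (D + omega L)^T / (omega (2 - omega))."""
    D = np.diag(np.diag(A))                  # diagonal part of A
    L = np.tril(A, k=-1)                     # strictly lower triangular part of A
    DL = D + omega * L
    Dinv = np.diag(1.0 / np.diag(A))         # D^{-1}, cheap since D is diagonal
    return (DL @ Dinv @ DL.T) / (omega * (2.0 - omega))
```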