Conjugate Gradient Method
1 Conjugate Gradient Method
Tsung-Ming Huang
Department of Mathematics, National Taiwan Normal University
October 10, 2011
T.M. Huang (NTNU), Conjugate Gradient Method, October 10, 2011 (36 slides)
2 Outline
1 Steepest Descent Method
2 Conjugate Gradient Method
3 Convergence of CG-method
3 A Variational Problem, Steepest Descent Method (Gradient Method)
Let A ∈ R^{n×n} be a large and sparse symmetric positive definite (s.p.d.) matrix. Consider the linear system
    Ax = b
and the functional F : R^n → R with
    F(x) = (1/2) x^T A x − b^T x.    (1)
Then it holds:
4 Theorem 1
For a vector x* the following statements are equivalent:
(i) F(x*) < F(x) for all x ≠ x*,
(ii) Ax* = b.
Proof: By assumption there exists z_0 = A^{-1} b, and F(x) can be rewritten as
    F(x) = (1/2) (x − z_0)^T A (x − z_0) − (1/2) z_0^T A z_0.    (2)
Since A is positive definite, F(x) has a minimum at x = z_0 and only at x = z_0, which proves the assertion.
Hence solving the linear system Ax = b is equivalent to the minimization problem
    min F(x) ≡ min ( (1/2) x^T A x − b^T x ).
5 Steepest Descent Method
Let x_k be an approximation of the exact solution x* and p_k a search direction. We want to find α_k such that
    F(x_k + α_k p_k) < F(x_k).
Set x_{k+1} := x_k + α_k p_k. This leads to the basic problem: given x and p ≠ 0, find α* such that
    Φ(α*) = F(x + α* p) = min_{α ∈ R} F(x + α p).
6 Solution: Since
    F(x + α p) = (1/2) (x + α p)^T A (x + α p) − b^T (x + α p)
               = (1/2) α² p^T A p + α (p^T A x − p^T b) + F(x),
it follows that if we take
    α* = (b − A x)^T p / (p^T A p) = r^T p / (p^T A p),    (3)
where r = b − A x = −grad F(x) is the residual, then x + α* p is the minimizer. Moreover,
    F(x + α* p) = F(x) − (1/2) (r^T p)² / (p^T A p).
7 Lemma 2
Let
    x_{k+1} = x_k + (r_k^T p_k)/(p_k^T A p_k) p_k,    r_k = b − A x_k,    (4)
    F(x_{k+1}) = F(x_k) − (1/2) (r_k^T p_k)² / (p_k^T A p_k),    k = 0, 1, 2, ….    (5)
Then it holds that
    r_{k+1}^T p_k = 0.    (6)
Proof: Since
    (d/dα) F(x_k + α p_k) = grad F(x_k + α p_k)^T p_k,
it follows that grad F(x_k + α_k p_k)^T p_k = 0, where α_k = (r_k^T p_k)/(p_k^T A p_k). Thus
    (b − A x_{k+1})^T p_k = r_{k+1}^T p_k = 0,
hence (6) holds.
8 How to choose the search direction p_k?
Let Φ : R^n → R be differentiable at x. Then it holds that
    (Φ(x + ε p) − Φ(x))/ε = Φ′(x)^T p + O(ε).
Over all p with ‖p‖ = 1, the right-hand side (neglecting O(ε)) is minimized at p = −Φ′(x)/‖Φ′(x)‖, the direction of steepest descent. Hence it suggests choosing
    p_k = −grad F(x_k) = b − A x_k = r_k.
9 Algorithm: Gradient Method
1: Given x_0, set k = 0.
2: Compute r_k = b − A x_k and α_k = (r_k^T r_k)/(r_k^T A r_k);
3: repeat
4:   Compute x_{k+1} = x_k + α_k r_k and set k := k + 1;
5:   Compute r_k = b − A x_k and α_k = (r_k^T r_k)/(r_k^T A r_k);
6: until r_k = 0
Cost per step: one matrix-vector product suffices, since the residual can be updated with the already computed A r_k.
To prove convergence of the gradient method, we need the Kantorovich inequality: let λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_n > 0, α_i ≥ 0, Σ_{i=1}^n α_i = 1. Then it holds that
    (Σ_{i=1}^n α_i λ_i) (Σ_{j=1}^n α_j λ_j^{-1}) ≤ (λ_1 + λ_n)² / (4 λ_1 λ_n) = (1/4) (√(λ_1/λ_n) + √(λ_n/λ_1))².
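As an illustration, the gradient method above can be sketched in NumPy. This is a minimal sketch, not the lecturer's code; the test matrix, tolerance, and iteration cap are invented for the demo:

```python
import numpy as np

def gradient_method(A, b, x0, tol=1e-10, max_iter=10_000):
    """Steepest descent: p_k = r_k, alpha_k = (r_k^T r_k)/(r_k^T A r_k)."""
    x = x0.astype(float)
    r = b - A @ x                      # r_0 = b - A x_0
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        Ar = A @ r                     # the one matrix-vector product per step
        alpha = (r @ r) / (r @ Ar)
        x = x + alpha * r              # x_{k+1} = x_k + alpha_k r_k
        r = r - alpha * Ar             # r_{k+1} = b - A x_{k+1}, updated cheaply
    return x

# small invented s.p.d. example
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = gradient_method(A, b, np.zeros(2))
```

Note that the residual update r ← r − α A r reuses the product A r already needed for α, so the iteration really costs one matrix-vector product per step.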
10 Theorem 3
If x_k, x_{k−1} are two successive iterates of the Gradient Method Algorithm for solving Ax = b and λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_n > 0 are the eigenvalues of A, then it holds that
    F(x_k) + (1/2) b^T A^{-1} b ≤ ((λ_1 − λ_n)/(λ_1 + λ_n))² [F(x_{k−1}) + (1/2) b^T A^{-1} b],    (7a)
i.e.,
    ‖x_k − x*‖_A ≤ ((λ_1 − λ_n)/(λ_1 + λ_n)) ‖x_{k−1} − x*‖_A,    (7b)
where ‖x‖_A = √(x^T A x). Thus the gradient method is convergent.
11 Conjugate gradient method
It is favorable to choose the search directions {p_i} mutually A-conjugate, where A is symmetric positive definite.
Definition 4
Two vectors p and q are called A-conjugate (A-orthogonal) if p^T A q = 0.
Remark 1
Let A be symmetric positive definite. Then there exists a unique s.p.d. B such that B² = A. Denote B = A^{1/2}. Then p^T A q = (A^{1/2} p)^T (A^{1/2} q).
12 Lemma 5
Let p_0, …, p_r ≠ 0 be pairwise A-conjugate. Then they are linearly independent.
Proof: From 0 = Σ_{j=0}^r c_j p_j it follows that
    0 = p_k^T A Σ_{j=0}^r c_j p_j = Σ_{j=0}^r c_j p_k^T A p_j = c_k p_k^T A p_k,
so c_k = 0 for k = 0, …, r.
13 Theorem 6
Let A be s.p.d. and p_0, …, p_{n−1} be nonzero pairwise A-conjugate vectors. Then
    A^{-1} = Σ_{j=0}^{n−1} (p_j p_j^T)/(p_j^T A p_j).    (8)
Remark 2
For A = I, let U = (p_0, …, p_{n−1}) with p_i^T p_i = 1 and p_i^T p_j = 0 for i ≠ j. Then U U^T = I, and
    I = U U^T = p_0 p_0^T + ⋯ + p_{n−1} p_{n−1}^T.
14 Proof of Theorem 6: Since the vectors
    A^{1/2} p_j / √(p_j^T A p_j),    j = 0, 1, …, n−1,
are orthonormal, we have
    I = Σ_{j=0}^{n−1} (A^{1/2} p_j p_j^T A^{1/2})/(p_j^T A p_j) = A^{1/2} ( Σ_{j=0}^{n−1} (p_j p_j^T)/(p_j^T A p_j) ) A^{1/2}.
Thus
    A^{-1} = A^{-1/2} I A^{-1/2} = Σ_{j=0}^{n−1} (p_j p_j^T)/(p_j^T A p_j).
Remark 3
Let Ax = b and let x_0 be an arbitrary vector. Then from x − x_0 = A^{-1} (b − A x_0) and (8) it follows that
    x = x_0 + Σ_{j=0}^{n−1} (p_j^T (b − A x_0))/(p_j^T A p_j) p_j.    (9)
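Identity (8) is easy to verify numerically. In the sketch below (an illustration on invented data), pairwise A-conjugate directions are produced by Gram-Schmidt in the A-inner product applied to the standard basis:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)            # a random s.p.d. matrix

# Gram-Schmidt in the A-inner product: makes e_0, ..., e_{n-1} pairwise A-conjugate
P = []
for j in range(n):
    p = np.eye(n)[:, j].copy()
    for q in P:
        p -= (q @ A @ p) / (q @ A @ q) * q
    P.append(p)

# Theorem 6: A^{-1} = sum_j p_j p_j^T / (p_j^T A p_j)
Ainv = sum(np.outer(p, p) / (p @ A @ p) for p in P)
```

Comparing Ainv against a direct inverse of A confirms (8) to machine precision.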
15 Theorem 7
Let A be s.p.d. and p_0, …, p_{n−1} ∈ R^n \ {0} be pairwise A-orthogonal. Given x_0, let r_0 = b − A x_0. For k = 0, …, n−1, let
    α_k = (p_k^T r_k)/(p_k^T A p_k),    (10)
    x_{k+1} = x_k + α_k p_k,    (11)
    r_{k+1} = r_k − α_k A p_k.    (12)
Then the following statements hold:
(i) r_k = b − A x_k (by induction).
(ii) x_{k+1} minimizes F(x) on the line x = x_k + α p_k, α ∈ R.
(iii) x_n = A^{-1} b = x*.
(iv) x_k minimizes F(x) on the affine subspace x_0 + S_k, where S_k = span{p_0, …, p_{k−1}}.
16 Proof:
(i): By induction, using (11) and (12).
(ii): From (3) and (i).
(iii): It is enough to show that x_k coincides with the partial sum in (9),
    x_k = x_0 + Σ_{j=0}^{k−1} (p_j^T (b − A x_0))/(p_j^T A p_j) p_j;
then x_n = x* follows from (9). From (10) and (11) we have
    x_k = x_0 + Σ_{j=0}^{k−1} α_j p_j = x_0 + Σ_{j=0}^{k−1} (p_j^T (b − A x_j))/(p_j^T A p_j) p_j.
It remains to show that
    p_j^T (b − A x_j) = p_j^T (b − A x_0).    (13)
From x_k − x_0 = Σ_{j=0}^{k−1} α_j p_j we obtain
    p_k^T A x_k − p_k^T A x_0 = Σ_{j=0}^{k−1} α_j p_k^T A p_j = 0.
So (13) holds.
17 (iv): From (12) and (10) it follows that
    p_k^T r_{k+1} = p_k^T r_k − α_k p_k^T A p_k = 0.
From (11), (12) and the fact that r_{k+s} − r_{k+s+1} = α_{k+s} A p_{k+s} and p_k are A-orthogonal (for s ≥ 1), it follows that
    p_k^T r_{k+1} = p_k^T r_{k+2} = ⋯ = p_k^T r_n = 0.
Hence we have
    p_i^T r_k = 0,    i = 0, …, k−1,  k = 1, 2, …, n  (i.e., i < k).    (14)
We now consider F(x) on x_0 + S_k:
    F(x_0 + Σ_{j=0}^{k−1} ξ_j p_j) = ϕ(ξ_0, …, ξ_{k−1}).
18 F(x) is minimal on x_0 + S_k if and only if all partial derivatives ∂ϕ/∂ξ_s vanish at x. But
    ∂ϕ/∂ξ_s = [grad F(x_0 + Σ_{j=0}^{k−1} ξ_j p_j)]^T p_s,    s = 0, 1, …, k−1.
If x = x_k, then grad F(x) = −r_k, and from (14) it follows that
    ∂ϕ/∂ξ_s (x_k) = 0  for s = 0, 1, …, k−1.
19 Remark 4
The following conditions are equivalent:
(i) p_i^T A p_j = 0, i ≠ j,
(ii) p_i^T r_k = 0, i < k,
(iii) r_i^T r_j = 0, i ≠ j.
Proof of (iii): for i < k,
    p_i^T r_k = 0 ⇒ (r_i^T + β_{i−1} p_{i−1}^T) r_k = 0 ⇒ r_i^T r_k = 0 ⇒ r_i^T r_j = 0, i ≠ j.
20 Remark 5
It holds that
    span{p_0, p_1, …, p_k} = span{r_0, r_1, …, r_k} = span{r_0, A r_0, …, A^k r_0}.
Since p_1 = r_1 + β_0 p_0 = r_1 + β_0 r_0 and r_1 = r_0 − α_0 A r_0, by induction we have
    r_2 = r_1 − α_1 A p_1 = r_1 − α_1 A (r_1 + β_0 r_0) = r_0 − α_0 A r_0 − α_1 A (r_0 − α_0 A r_0 + β_0 r_0).
21 Method of conjugate directions
Input: A s.p.d., b and x_0 ∈ R^n; p_0, …, p_{n−1} ∈ R^n \ {0} pairwise A-orthogonal.
1: Compute r_0 = b − A x_0;
2: for k = 0, …, n−1 do
3:   Compute α_k = (p_k^T r_k)/(p_k^T A p_k) and x_{k+1} = x_k + α_k p_k;
4:   Compute r_{k+1} = r_k − α_k A p_k = b − A x_{k+1};
5: end for
From Theorem 7 we get x_n = A^{-1} b.
22 Practical Implementation
Algorithm: Conjugate Gradient method (CG-method)
Input: s.p.d. A, b ∈ R^n, x_0 ∈ R^n, and r_0 = b − A x_0 = p_0.
1: Set k = 0.
2: repeat
3:   Compute α_k = (p_k^T r_k)/(p_k^T A p_k);
4:   Compute x_{k+1} = x_k + α_k p_k;
5:   Compute r_{k+1} = r_k − α_k A p_k = b − A x_{k+1};
6:   Compute β_k = −(r_{k+1}^T A p_k)/(p_k^T A p_k);
7:   Compute p_{k+1} = r_{k+1} + β_k p_k;
8:   Set k = k + 1;
9: until r_k = 0
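The algorithm transcribes directly into NumPy. This is a sketch for illustration: the stopping test uses a small tolerance instead of the exact condition r_k = 0, the iteration cap is a safety margin for floating-point arithmetic, and the example data is invented:

```python
import numpy as np

def cg(A, b, x0, tol=1e-12):
    """CG-method following the algorithm above, one product A p_k per step."""
    x = x0.astype(float)
    r = b - A @ x
    p = r.copy()                       # p_0 = r_0
    for _ in range(5 * len(b)):        # n steps in exact arithmetic; margin for rounding
        if np.linalg.norm(r) < tol:
            break
        Ap = A @ p
        alpha = (p @ r) / (p @ Ap)     # Line 3
        x = x + alpha * p              # Line 4
        r = r - alpha * Ap             # Line 5
        beta = -(r @ Ap) / (p @ Ap)    # Line 6
        p = r + beta * p               # Line 7
    return x

# small invented s.p.d. example
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x = cg(A, b, np.zeros(3))
```

Only one matrix-vector product A p_k is needed per iteration, since it is reused in Lines 3, 5, and 6.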
23 Theorem 8
For the CG-method the following hold:
(i) If k steps of the CG-method are executable, i.e., r_i ≠ 0 for i = 0, …, k, then p_i ≠ 0 for i ≤ k and p_i^T A p_j = 0 for i, j ≤ k, i ≠ j.
(ii) The CG-method terminates after N steps with r_N = 0 and N ≤ n.
(iii) x_N = A^{-1} b.
24 Proof:
(i): By induction on k; the case k = 0 is trivial. Suppose that (i) is true up to k and r_{k+1} ≠ 0. Then p_{k+1} is well-defined. We want to verify that p_{k+1}^T A p_j = 0 for j = 0, 1, …, k. From Lines 6 and 7 of the CG Algorithm we have
    p_{k+1}^T A p_k = r_{k+1}^T A p_k + β_k p_k^T A p_k = 0.
For j < k, from Line 7 we have
    p_{k+1}^T A p_j = r_{k+1}^T A p_j + β_k p_k^T A p_j = r_{k+1}^T A p_j.
It is enough to show that
    A p_j ∈ span{p_0, …, p_{j+1}},    j < k.    (15)
Then the assertion follows from the relation p_i^T r_j = 0, i < j ≤ k+1, which was proved in (14).
25 Claim (15): For r_j ≠ 0 it holds that α_j ≠ 0. Line 5 shows that
    A p_j = (1/α_j)(r_j − r_{j+1}) ∈ span{r_0, …, r_{j+1}}.
Line 7 shows that span{r_0, …, r_{j+1}} = span{p_0, …, p_{j+1}} with r_0 = p_0, so (15) holds.
(ii): Since the nonzero vectors p_0, …, p_{k+1} are mutually A-orthogonal, they are linearly independent. Hence there exists an N ≤ n with r_N = 0, and it follows that x_N = A^{-1} b.
Advantages:
1 Termination in finitely many steps.
2 Low cost per step: one matrix-vector product.
26 Convergence of CG-method
Consider the A-norm, with A s.p.d.,
    ‖x‖_A = (x^T A x)^{1/2}.
Let x* = A^{-1} b. Then from (2) we have
    F(x) − F(x*) = (1/2) (x − x*)^T A (x − x*) = (1/2) ‖x − x*‖²_A.
Let x_k be the k-th iterate of the CG-method. From Theorem 7, x_k minimizes the functional F on x_0 + span{p_0, …, p_{k−1}}. Hence it holds that
    ‖x_k − x*‖_A ≤ ‖y − x*‖_A,    for all y ∈ x_0 + span{p_0, …, p_{k−1}}.    (16)
27 From Lines 5 and 7 of the CG Algorithm it is easily seen that both p_k and r_k can be written as linear combinations of r_0, A r_0, …, A^{k−1} r_0. If y ∈ x_0 + span{p_0, …, p_{k−1}}, then
    y = x_0 + c_1 r_0 + c_2 A r_0 + ⋯ + c_k A^{k−1} r_0 = x_0 + P̃_{k−1}(A) r_0,
where P̃_{k−1} is a polynomial of degree k−1. But r_0 = b − A x_0 = −A(x_0 − x*), thus
    y − x* = (x_0 − x*) − A P̃_{k−1}(A)(x_0 − x*) = [I − A P̃_{k−1}(A)](x_0 − x*) = P_k(A)(x_0 − x*),    (17)
where
    degree(P_k) ≤ k and P_k(0) = 1.    (18)
28 Conversely, if P_k is a polynomial of degree ≤ k satisfying (18), then
    x* + P_k(A)(x_0 − x*) ∈ x_0 + S_k.
Hence (16) means: if P_k is a polynomial of degree ≤ k with P_k(0) = 1, then
    ‖x_k − x*‖_A ≤ ‖P_k(A)(x_0 − x*)‖_A.
Lemma 9
Let A be s.p.d. For every polynomial Q_k of degree k it holds that
    max_{x ≠ 0} ‖Q_k(A) x‖_A / ‖x‖_A = ρ(Q_k(A)) = max{ |Q_k(λ)| : λ eigenvalue of A }.    (19)
Proof: see Appendix.
29 From (19) we have
    ‖x_k − x*‖_A ≤ ρ(P_k(A)) ‖x_0 − x*‖_A,    (20)
where degree(P_k) ≤ k and P_k(0) = 1.
Replacement problem for (20): for 0 < a < b,
    min{ max{ |P_k(λ)| : a ≤ λ ≤ b } : deg(P_k) ≤ k, P_k(0) = 1 }.    (21)
We use the Chebyshev polynomials of the first kind for the solution. They are defined by
    T_0(t) = 1,  T_1(t) = t,  T_{k+1}(t) = 2 t T_k(t) − T_{k−1}(t).
It holds that T_k(cos φ) = cos(kφ), using cos((k+1)φ) + cos((k−1)φ) = 2 cos φ cos kφ. Especially,
    T_k(cos(jπ/k)) = cos(jπ) = (−1)^j,    for j = 0, …, k,
30 i.e., T_k attains the maximal absolute value one at k+1 points of [−1, 1] with alternating signs. In addition (exercise!), we have
    T_k(t) = (1/2) [ (t + √(t² − 1))^k + (t − √(t² − 1))^k ].    (22)
Lemma 10
The solution of problem (21) is given by
    Q_k(t) = T_k((2t − a − b)/(b − a)) / T_k((a + b)/(a − b)),
i.e., for all P_k of degree ≤ k with P_k(0) = 1 it holds that
    max_{λ ∈ [a,b]} |Q_k(λ)| ≤ max_{λ ∈ [a,b]} |P_k(λ)|.
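Both characterizations of T_k used here can be cross-checked numerically; a small sketch (the degree and sample points are arbitrary choices for the demo):

```python
import numpy as np

def cheb_recurrence(k, t):
    """T_k via the three-term recurrence T_{k+1} = 2 t T_k - T_{k-1}."""
    T_prev, T = np.ones_like(t), t
    if k == 0:
        return T_prev
    for _ in range(k - 1):
        T_prev, T = T, 2 * t * T - T_prev
    return T

k = 7
phi = np.linspace(0.1, 3.0, 50)
# trigonometric form on [-1, 1]: T_k(cos phi) = cos(k phi)
check_trig = np.allclose(cheb_recurrence(k, np.cos(phi)), np.cos(k * phi))

# closed form (22) for t >= 1
t = np.linspace(1.0, 3.0, 50)
closed = 0.5 * ((t + np.sqrt(t**2 - 1))**k + (t - np.sqrt(t**2 - 1))**k)
check_closed = np.allclose(cheb_recurrence(k, t), closed)
```

Both checks agree with the recurrence, which is the form used in the proofs that follow.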
31 Proof: Q_k(0) = 1. As t runs through the interval [a, b], (2t − a − b)/(b − a) runs through the interval [−1, 1]. Hence, on [a, b], Q_k(t) has k+1 extrema with alternating signs and absolute value
    δ = |T_k((a + b)/(a − b))|^{-1}.
If there were a P_k with max{ |P_k(λ)| : λ ∈ [a, b] } < δ, then Q_k − P_k would have the same sign as Q_k at the extremal points, so Q_k − P_k changes sign at k+1 points. Hence Q_k − P_k has k roots in [a, b] and, in addition, the root zero. This contradicts degree(Q_k − P_k) ≤ k.
32 Lemma 11
It holds that
    δ = |T_k((b + a)/(a − b))|^{-1} = 1 / T_k((b + a)/(b − a)) = 2 c^k / (1 + c^{2k}) ≤ 2 c^k,
where c = (√κ − 1)/(√κ + 1) and κ = b/a.
Proof: see Appendix.
Theorem 12
The CG-method satisfies the error estimate
    ‖x_k − x*‖_A ≤ 2 c^k ‖x_0 − x*‖_A,
where c = (√κ − 1)/(√κ + 1), κ = λ_1/λ_n, and λ_1 ≥ ⋯ ≥ λ_n > 0 are the eigenvalues of A.
Proof: see Appendix.
33 Remark 6
Comparison with the gradient method (see (7b)): let x_k^G be the k-th iterate of the gradient method. Then
    ‖x_k^G − x*‖_A ≤ ((λ_1 − λ_n)/(λ_1 + λ_n))^k ‖x_0 − x*‖_A.
But
    (λ_1 − λ_n)/(λ_1 + λ_n) = (κ − 1)/(κ + 1) > (√κ − 1)/(√κ + 1) = c,
because in general √κ ≪ κ. Therefore the CG-method is much better than the gradient method.
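The gap between the two contraction factors is easy to quantify; a short sketch with an invented condition number κ = 100:

```python
import math

kappa = 100.0                                 # invented condition number
grad_factor = (kappa - 1) / (kappa + 1)       # gradient method: (kappa-1)/(kappa+1)
sq = math.sqrt(kappa)
cg_factor = (sq - 1) / (sq + 1)               # CG: c = (sqrt(kappa)-1)/(sqrt(kappa)+1)

# iterations needed to reduce the A-norm error by a factor 1e-6
# (for CG, neglecting the constant factor 2 in Theorem 12)
n_grad = math.log(1e-6) / math.log(grad_factor)
n_cg = math.log(1e-6) / math.log(cg_factor)
```

For κ = 100 the gradient factor is close to 1 while c is noticeably smaller, so CG needs roughly an order of magnitude fewer iterations for the same error reduction.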
34 Appendix
Proof (Lemma 9):
    ‖Q_k(A)x‖²_A / ‖x‖²_A = (x^T Q_k(A) A Q_k(A) x)/(x^T A x)
                          = ((A^{1/2}x)^T Q_k(A) Q_k(A) (A^{1/2}x))/((A^{1/2}x)^T (A^{1/2}x))
                          = (z^T Q_k(A)² z)/(z^T z) ≤ ρ(Q_k(A)²) = ρ²(Q_k(A)),    (with z := A^{1/2} x)
Equality holds for suitable x, hence the first equality is shown. The second equality holds by the fact that Q_k(λ) is an eigenvalue of Q_k(A) whenever λ is an eigenvalue of A.
35 Proof (Lemma 11): For t = (b + a)/(b − a) = (κ + 1)/(κ − 1), we compute
    t + √(t² − 1) = (√κ + 1)/(√κ − 1) = c^{-1}  and  t − √(t² − 1) = (√κ − 1)/(√κ + 1) = c.
Hence from (22) it follows that
    δ = 2 / (c^{-k} + c^k) = 2 c^k / (1 + c^{2k}) ≤ 2 c^k.
36 Proof (Theorem 12): From (20) we have
    ‖x_k − x*‖_A ≤ ρ(P_k(A)) ‖x_0 − x*‖_A ≤ max{ |P_k(λ)| : λ_n ≤ λ ≤ λ_1 } ‖x_0 − x*‖_A,
for all P_k of degree ≤ k with P_k(0) = 1. From Lemma 10 and Lemma 11 it follows that
    ‖x_k − x*‖_A ≤ max{ |Q_k(λ)| : λ_n ≤ λ ≤ λ_1 } ‖x_0 − x*‖_A ≤ 2 c^k ‖x_0 − x*‖_A.
More informationChapter 4 Euclid Space
Chapter 4 Euclid Space Inner Product Spaces Definition.. Let V be a real vector space over IR. A real inner product on V is a real valued function on V V, denoted by (, ), which satisfies () (x, y) = (y,
More informationTMA 4180 Optimeringsteori THE CONJUGATE GRADIENT METHOD
INTRODUCTION TMA 48 Optimeringsteori THE CONJUGATE GRADIENT METHOD H. E. Krogstad, IMF, Spring 28 This note summarizes main points in the numerical analysis of the Conjugate Gradient (CG) method. Most
More informationProve that this gives a bounded linear operator T : X l 1. (6p) Prove that T is a bounded linear operator T : l l and compute (5p)
Uppsala Universitet Matematiska Institutionen Andreas Strömbergsson Prov i matematik Funktionalanalys Kurs: F3B, F4Sy, NVP 2006-03-17 Skrivtid: 9 14 Tillåtna hjälpmedel: Manuella skrivdon, Kreyszigs bok
More informationElementary linear algebra
Chapter 1 Elementary linear algebra 1.1 Vector spaces Vector spaces owe their importance to the fact that so many models arising in the solutions of specific problems turn out to be vector spaces. The
More informationThe conjugate gradient method
The conjugate gradient method Michael S. Floater November 1, 2011 These notes try to provide motivation and an explanation of the CG method. 1 The method of conjugate directions We want to solve the linear
More informationThe Conjugate Gradient Method
The Conjugate Gradient Method Classical Iterations We have a problem, We assume that the matrix comes from a discretization of a PDE. The best and most popular model problem is, The matrix will be as large
More informationthe method of steepest descent
MATH 3511 Spring 2018 the method of steepest descent http://www.phys.uconn.edu/ rozman/courses/m3511_18s/ Last modified: February 6, 2018 Abstract The Steepest Descent is an iterative method for solving
More informationMathematical optimization
Optimization Mathematical optimization Determine the best solutions to certain mathematically defined problems that are under constrained determine optimality criteria determine the convergence of the
More informationMath 5630: Conjugate Gradient Method Hung M. Phan, UMass Lowell March 29, 2019
Math 563: Conjugate Gradient Method Hung M. Phan, UMass Lowell March 29, 219 hroughout, A R n n is symmetric and positive definite, and b R n. 1 Steepest Descent Method We present the steepest descent
More information1.5 The Nil and Jacobson Radicals
1.5 The Nil and Jacobson Radicals The idea of a radical of a ring A is an ideal I comprising some nasty piece of A such that A/I is well-behaved or tractable. Two types considered here are the nil and
More informationj=1 u 1jv 1j. 1/ 2 Lemma 1. An orthogonal set of vectors must be linearly independent.
Lecture Notes: Orthogonal and Symmetric Matrices Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong taoyf@cse.cuhk.edu.hk Orthogonal Matrix Definition. Let u = [u
More informationLecture 11. Andrei Antonenko. February 26, Last time we studied bases of vector spaces. Today we re going to give some examples of bases.
Lecture 11 Andrei Antonenko February 6, 003 1 Examples of bases Last time we studied bases of vector spaces. Today we re going to give some examples of bases. Example 1.1. Consider the vector space P the
More informationSupplementary Materials for Riemannian Pursuit for Big Matrix Recovery
Supplementary Materials for Riemannian Pursuit for Big Matrix Recovery Mingkui Tan, School of omputer Science, The University of Adelaide, Australia Ivor W. Tsang, IS, University of Technology Sydney,
More informationIntroduction to Iterative Solvers of Linear Systems
Introduction to Iterative Solvers of Linear Systems SFB Training Event January 2012 Prof. Dr. Andreas Frommer Typeset by Lukas Krämer, Simon-Wolfgang Mages and Rudolf Rödl 1 Classes of Matrices and their
More informationInner Product Spaces An inner product on a complex linear space X is a function x y from X X C such that. (1) (2) (3) x x > 0 for x 0.
Inner Product Spaces An inner product on a complex linear space X is a function x y from X X C such that (1) () () (4) x 1 + x y = x 1 y + x y y x = x y x αy = α x y x x > 0 for x 0 Consequently, (5) (6)
More informationPutzer s Algorithm. Norman Lebovitz. September 8, 2016
Putzer s Algorithm Norman Lebovitz September 8, 2016 1 Putzer s algorithm The differential equation dx = Ax, (1) dt where A is an n n matrix of constants, possesses the fundamental matrix solution exp(at),
More informationMA/OR/ST 706: Nonlinear Programming Midterm Exam Instructor: Dr. Kartik Sivaramakrishnan INSTRUCTIONS
MA/OR/ST 706: Nonlinear Programming Midterm Exam Instructor: Dr. Kartik Sivaramakrishnan INSTRUCTIONS 1. Please write your name and student number clearly on the front page of the exam. 2. The exam is
More informationMath 489AB Exercises for Chapter 2 Fall Section 2.3
Math 489AB Exercises for Chapter 2 Fall 2008 Section 2.3 2.3.3. Let A M n (R). Then the eigenvalues of A are the roots of the characteristic polynomial p A (t). Since A is real, p A (t) is a polynomial
More informationMatsumura: Commutative Algebra Part 2
Matsumura: Commutative Algebra Part 2 Daniel Murfet October 5, 2006 This note closely follows Matsumura s book [Mat80] on commutative algebra. Proofs are the ones given there, sometimes with slightly more
More informationOptimization and Optimal Control in Banach Spaces
Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,
More informationThe Conjugate Gradient Method for Solving Linear Systems of Equations
The Conjugate Gradient Method for Solving Linear Systems of Equations Mike Rambo Mentor: Hans de Moor May 2016 Department of Mathematics, Saint Mary s College of California Contents 1 Introduction 2 2
More informationR-Linear Convergence of Limited Memory Steepest Descent
R-Linear Convergence of Limited Memory Steepest Descent Frank E. Curtis, Lehigh University joint work with Wei Guo, Lehigh University OP17 Vancouver, British Columbia, Canada 24 May 2017 R-Linear Convergence
More information1. What is the determinant of the following matrix? a 1 a 2 4a 3 2a 2 b 1 b 2 4b 3 2b c 1. = 4, then det
What is the determinant of the following matrix? 3 4 3 4 3 4 4 3 A 0 B 8 C 55 D 0 E 60 If det a a a 3 b b b 3 c c c 3 = 4, then det a a 4a 3 a b b 4b 3 b c c c 3 c = A 8 B 6 C 4 D E 3 Let A be an n n matrix
More informationA short course on: Preconditioned Krylov subspace methods. Yousef Saad University of Minnesota Dept. of Computer Science and Engineering
A short course on: Preconditioned Krylov subspace methods Yousef Saad University of Minnesota Dept. of Computer Science and Engineering Universite du Littoral, Jan 19-3, 25 Outline Part 1 Introd., discretization
More informationECE580 Fall 2015 Solution to Midterm Exam 1 October 23, Please leave fractions as fractions, but simplify them, etc.
ECE580 Fall 2015 Solution to Midterm Exam 1 October 23, 2015 1 Name: Solution Score: /100 This exam is closed-book. You must show ALL of your work for full credit. Please read the questions carefully.
More informationPart 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)
Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective
More informationRecovery of Sparse Signals from Noisy Measurements Using an l p -Regularized Least-Squares Algorithm
Recovery of Sparse Signals from Noisy Measurements Using an l p -Regularized Least-Squares Algorithm J. K. Pant, W.-S. Lu, and A. Antoniou University of Victoria August 25, 2011 Compressive Sensing 1 University
More informationNumerical Optimization of Partial Differential Equations
Numerical Optimization of Partial Differential Equations Part I: basic optimization concepts in R n Bartosz Protas Department of Mathematics & Statistics McMaster University, Hamilton, Ontario, Canada
More informationFinite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product
Chapter 4 Hilbert Spaces 4.1 Inner Product Spaces Inner Product Space. A complex vector space E is called an inner product space (or a pre-hilbert space, or a unitary space) if there is a mapping (, )
More informationThe Steepest Descent Algorithm for Unconstrained Optimization
The Steepest Descent Algorithm for Unconstrained Optimization Robert M. Freund February, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 1 Steepest Descent Algorithm The problem
More information