Math 5630: Conjugate Gradient Method Hung M. Phan, UMass Lowell March 29, 2019


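Throughout, $A \in \mathbb{R}^{n \times n}$ is symmetric and positive definite, and $b \in \mathbb{R}^n$.

1 Steepest Descent Method

We present the steepest descent method for solving the minimization problem
\[
\min_{x \in \mathbb{R}^n} g(x) = \tfrac{1}{2} x^T A x - b^T x.
\]
First, we prove the following result.

Theorem 1. The vector $x^*$ is a solution to the linear system $Ax = b$ if and only if $x^*$ minimizes $g(x) = \tfrac{1}{2} x^T A x - b^T x$.

Proof. Let $x$ and $v \neq 0$ be fixed vectors, and $t$ a real number. We have
\[
\begin{aligned}
h(t) = g(x + tv) &= \tfrac{1}{2}(x + tv)^T A (x + tv) - b^T (x + tv) \\
&= \tfrac{1}{2} x^T A x + t\, v^T A x + \tfrac{1}{2} t^2\, (v^T A v) - b^T x - t\, b^T v \\
&= g(x) - t\, v^T (b - Ax) + \tfrac{1}{2} t^2\, v^T A v.
\end{aligned}
\tag{1}
\]
Since $v^T A v > 0$, the function $h(t)$ attains its minimum where
\[
h'(t) = t\,(v^T A v) - v^T (b - Ax) = 0
\quad\Longleftrightarrow\quad
t = \hat t := \frac{v^T (b - Ax)}{v^T A v}.
\]
And the function value is
\[
h(\hat t) = g(x) - \frac{\bigl(v^T (b - Ax)\bigr)^2}{2\, v^T A v}.
\]
If $Ax = b$, then $h(t) = g(x) + \tfrac{1}{2} t^2\, v^T A v \ge g(x)$ for every $v$ and $t$, so $x$ minimizes $g$. Conversely, if $Ax \neq b$, then choosing $v = b - Ax$ gives $h(\hat t) < g(x)$, so $x$ is not a minimizer. From here we can conclude that $x$ is the solution of $Ax = b$ if and only if $x$ minimizes $g(x)$.

Definition 2 (residual vector). We call $r = b - Ax$ the residual vector associated with $x$.

The residual is the negative gradient of $g$ at $x$, since $\nabla g(x) = Ax - b = -r$; it is therefore also the steepest descent direction at $x$.

Given a guess $x$, the steepest descent method seeks a new iterate $x^+$ along the steepest descent direction $r = b - Ax$ such that
\[
x^+ = x + \hat t\, r, \quad\text{where}\quad \hat t := \operatorname*{argmin}_{t}\; g(x + tr).
\]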
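The line-search formula from the proof of Theorem 1 can be checked numerically. The following is a minimal sketch in plain Python; the helper names (`dot`, `matvec`, `g`, `residual`, `exact_step`) and the 2-by-2 test problem are illustrative assumptions, not part of the notes. It takes one exact step along the residual and compares the resulting value of $g$ with the value $g(x) - (v^T r)^2 / (2\, v^T A v)$ predicted by the proof.

```python
# Sketch of the exact line search from the proof of Theorem 1.
# All names and the test data below are illustrative assumptions.

def dot(u, v):
    """Euclidean inner product u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(A, x):
    """Matrix-vector product A x, with A stored as a list of rows."""
    return [dot(row, x) for row in A]

def g(A, b, x):
    """Quadratic objective g(x) = 1/2 x^T A x - b^T x."""
    return 0.5 * dot(x, matvec(A, x)) - dot(b, x)

def residual(A, b, x):
    """Residual r = b - A x, the negative gradient of g at x."""
    return [bi - axi for bi, axi in zip(b, matvec(A, x))]

def exact_step(A, b, x, v):
    """Minimizer of h(t) = g(x + t v): t_hat = v^T (b - A x) / (v^T A v)."""
    return dot(v, residual(A, b, x)) / dot(v, matvec(A, v))

# Small symmetric positive definite test problem (arbitrary choice).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = [2.0, 1.0]

r = residual(A, b, x)
v = r                                  # steepest descent direction at x
t_hat = exact_step(A, b, x, v)
x_plus = [xi + t_hat * vi for xi, vi in zip(x, v)]

# Value predicted by the proof: g(x) - (v^T r)^2 / (2 v^T A v).
predicted = g(A, b, x) - dot(v, r) ** 2 / (2.0 * dot(v, matvec(A, v)))

print("t_hat     =", t_hat)
print("g(x)      =", g(A, b, x))
print("g(x_plus) =", g(A, b, x_plus))
print("predicted =", predicted)
```

The two printed values `g(x_plus)` and `predicted` should agree up to rounding, and any other step length along the same direction should give a larger value of $g$.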

To find $\hat t$, we use (1) with $v = r = b - Ax$ and obtain
\[
g(x + tr) = g(x) - t\, r^T r + \tfrac{1}{2} t^2\, r^T A r,
\]
whose minimizer is
\[
\hat t = \frac{r^T r}{r^T A r}.
\]
In summary, we have the

Steepest Descent Method
Set $\varepsilon > 0$, $k = 0$, $x_0 = 0$, and $r_0 = b - A x_0$.
while $\|r_k\| > \varepsilon$,
    $t_k = \dfrac{r_k^T r_k}{r_k^T A r_k}$,
    $x_{k+1} = x_k + t_k r_k$,
    $r_{k+1} = r_k - t_k A r_k$,
    $k := k + 1$,
end

2 Conjugate Gradient Method

Suppose at iterate $x_m$, instead of seeking the new iterate $x_{m+1}$ in the steepest descent direction $r_m = b - A x_m$, we seek in multiple directions, say,
\[
x_{m+1} = x_m + c_0 v_0 + c_1 v_1 + \cdots + c_m v_m,
\]
where $v_0, \ldots, v_m$ are direction vectors. This can be written as
\[
x_{m+1} = x_m + Rc, \quad\text{where}\quad R = \begin{bmatrix} v_0 & v_1 & \cdots & v_m \end{bmatrix}, \quad c = (c_0, c_1, \ldots, c_m)^T \in \mathbb{R}^{m+1}.
\]
So our goal is to determine $c \in \mathbb{R}^{m+1}$ such that $g(x_{m+1})$ is minimized. We use (1) and obtain
\[
\begin{aligned}
g(x_{m+1}) = g(x_m + Rc) &= g(x_m) - c^T R^T (b - A x_m) + \tfrac{1}{2} c^T (R^T A R) c \\
&= g(x_m) - c^T R^T r_m + \tfrac{1}{2} c^T (R^T A R) c.
\end{aligned}
\]
Now choose $c$ such that $-c^T R^T r_m + \tfrac{1}{2} c^T (R^T A R) c$ is minimized. If $R^T A R$ is symmetric positive definite, then by Theorem 1, $c$ is the solution of the linear system
\[
(R^T A R)\, c = R^T r_m.
\]
Since $A$ is symmetric positive definite, $R^T A R$ will be symmetric positive definite if the columns of $R$ are linearly independent. And the matrix $R^T A R$ would be easy to invert if it were a diagonal matrix, which requires
\[
v_i^T A v_j = 0 \quad\text{for all } i \neq j.
\]
We can achieve this by a Gram-Schmidt process. Suppose $v_0 = r_0 = b - A x_0$; we will find $x_1 = x_0 + \alpha_0 v_0$ and $v_1$ such that $v_0^T A v_1 = 0$ and $v_0$ is orthogonal to $r_1 = b - A x_1$. First,
\[
0 = v_0^T r_1 = v_0^T \bigl(b - A(x_0 + \alpha_0 v_0)\bigr) = v_0^T r_0 - \alpha_0\, v_0^T A v_0
\quad\Longrightarrow\quad
\alpha_0 = \frac{v_0^T r_0}{v_0^T A v_0} = \frac{v_0^T v_0}{v_0^T A v_0}.
\]
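Returning briefly to Section 1, the Steepest Descent Method summarized there can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function name `steepest_descent`, the default tolerance, the `max_iter` safeguard, and the 2-by-2 test system are not part of the notes.

```python
import math

def dot(u, v):
    """Euclidean inner product u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(A, x):
    """Matrix-vector product A x, with A stored as a list of rows."""
    return [dot(row, x) for row in A]

def steepest_descent(A, b, eps=1e-10, max_iter=10_000):
    """Steepest descent for A x = b with A symmetric positive definite.

    Starts from x_0 = 0 and iterates until ||r_k|| <= eps.
    max_iter is an extra safeguard that is NOT in the notes' algorithm.
    Returns the approximate solution and the number of iterations used.
    """
    x = [0.0] * len(b)
    r = list(b)                        # r_0 = b - A x_0 with x_0 = 0
    k = 0
    while math.sqrt(dot(r, r)) > eps and k < max_iter:
        Ar = matvec(A, r)
        t = dot(r, r) / dot(r, Ar)     # t_k = r_k^T r_k / (r_k^T A r_k)
        x = [xi + t * ri for xi, ri in zip(x, r)]        # x_{k+1} = x_k + t_k r_k
        r = [ri - t * ari for ri, ari in zip(r, Ar)]     # r_{k+1} = r_k - t_k A r_k
        k += 1
    return x, k

# Small symmetric positive definite test system (arbitrary choice);
# its exact solution is (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x_sd, iterations = steepest_descent(A, b)
print("x =", x_sd, "after", iterations, "iterations")
```

The residual is updated by the recurrence $r_{k+1} = r_k - t_k A r_k$ rather than recomputed as $b - A x_{k+1}$, which reuses the product $A r_k$ already needed for $t_k$.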

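So
\[
x_1 = x_0 + \frac{v_0^T v_0}{v_0^T A v_0}\, v_0.
\]
Next, we find $v_1$ in the form $v_1 = r_1 + \beta_0 v_0$. So
\[
0 = v_1^T A v_0 = (r_1 + \beta_0 v_0)^T A v_0
\quad\Longrightarrow\quad
\beta_0 = -\frac{r_1^T A v_0}{v_0^T A v_0}.
\]
Then the system $(R^T A R)\, c = R^T r_1$ becomes
\[
\begin{bmatrix} v_0 & v_1 \end{bmatrix}^T A \begin{bmatrix} v_0 & v_1 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \end{bmatrix} = \begin{bmatrix} v_0^T r_1 \\ v_1^T r_1 \end{bmatrix},
\quad\text{that is,}\quad
\begin{bmatrix} v_0^T A v_0 & 0 \\ 0 & v_1^T A v_1 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \end{bmatrix} = \begin{bmatrix} 0 \\ v_1^T r_1 \end{bmatrix}.
\]
Thus, $c_0 = 0$ and $c_1 = \dfrac{v_1^T r_1}{v_1^T A v_1}$. We also notice that
\[
r_1, v_1 \in \operatorname{span}\{v_0, v_1\} = \operatorname{span}\{v_0, r_1\} = \operatorname{span}\{r_0, r_1\}.
\]
Now suppose we have found $v_0, v_1, \ldots, v_m$ such that
\[
\begin{aligned}
& v_i^T A v_j = 0, \quad\text{for } i, j = 0, 1, \ldots, m,\ i \neq j, \\
& v_i^T r_m = 0, \quad\text{for } i = 0, 1, \ldots, m-1, \\
& r_{i+1} = b - A x_{i+1} = b - A(x_i + \alpha_i v_i) = r_i - \alpha_i A v_i, \\
& \operatorname{span}\{v_0, v_1, \ldots, v_m\} = \operatorname{span}\{r_0, r_1, \ldots, r_m\} =: L_m.
\end{aligned}
\]
We find $x_{m+1} = x_m + \alpha_m v_m$ such that $r_{m+1} \perp L_m$. Note that
\[
r_{m+1} = b - A x_{m+1} = b - A x_m - \alpha_m A v_m = r_m - \alpha_m A v_m.
\]
Since $r_m$ and $A v_m$ are already orthogonal to $\{v_0, v_1, \ldots, v_{m-1}\}$ by assumption, we only need to find $\alpha_m$ such that $r_{m+1}$ is orthogonal to $v_m$. So,
\[
0 = v_m^T r_{m+1} = v_m^T r_m - \alpha_m\, v_m^T A v_m
\quad\Longrightarrow\quad
\alpha_m = \frac{v_m^T r_m}{v_m^T A v_m}.
\]
Now we find $v_{m+1} = r_{m+1} + \beta_m v_m$ such that
\[
A v_{m+1} \perp L_m
\quad\Longleftrightarrow\quad
v_{m+1} \perp A L_m = \operatorname{span}\{A v_0, A v_1, \ldots, A v_m\}.
\]
Notice that
\[
\begin{aligned}
A v_0 &\in \operatorname{span}\{r_1, r_0\} \subset \operatorname{span}\{r_0, r_1, \ldots, r_m\} = \operatorname{span}\{v_0, v_1, \ldots, v_m\}, \\
A v_1 &\in \operatorname{span}\{r_2, r_1\} \subset \operatorname{span}\{r_0, r_1, \ldots, r_m\} = \operatorname{span}\{v_0, v_1, \ldots, v_m\}, \\
&\ \ \vdots \\
A v_{m-1} &\in \operatorname{span}\{r_m, r_{m-1}\} \subset \operatorname{span}\{r_0, r_1, \ldots, r_m\} = \operatorname{span}\{v_0, v_1, \ldots, v_m\},
\end{aligned}
\]
so $v_{m+1} = r_{m+1} + \beta_m v_m \perp \{A v_0, \ldots, A v_{m-1}\}$. Hence, we only need to determine $\beta_m$ so that
\[
0 = v_{m+1}^T A v_m = r_{m+1}^T A v_m + \beta_m\, v_m^T A v_m.
\]
Thus,
\[
\beta_m = -\frac{r_{m+1}^T A v_m}{v_m^T A v_m}.
\]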
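The recurrences for $\alpha_m$, $r_{m+1}$, $\beta_m$ and $v_{m+1}$ derived above can be sketched in plain Python as follows. This is a minimal illustration, not a definitive implementation: the function name `conjugate_gradient`, the early-exit tolerance `tol`, and the returned tuple are assumptions. The test system is taken to be the one solved by hand in Example 6, read as $3x_1 - x_2 + x_3 = 1$, $-x_1 + 6x_2 + 2x_3 = 0$, $x_1 + 2x_2 + 7x_3 = 4$.

```python
import math

def dot(u, v):
    """Euclidean inner product u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(A, x):
    """Matrix-vector product A x, with A stored as a list of rows."""
    return [dot(row, x) for row in A]

def conjugate_gradient(A, b, x0=None, tol=1e-12):
    """Conjugate gradient for A x = b with A symmetric positive definite.

    Runs at most n = len(b) steps; tol is an assumed early-exit threshold
    on ||r|| that is not part of the derivation above.
    Returns (x, r, steps): final iterate, final residual, steps taken.
    """
    n = len(b)
    x = [0.0] * n if x0 is None else list(x0)
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]   # r_0 = b - A x_0
    v = list(r)                                          # v_0 = r_0
    steps = 0
    for _ in range(n):
        if math.sqrt(dot(r, r)) <= tol:
            break
        Av = matvec(A, v)
        vAv = dot(v, Av)
        alpha = dot(r, v) / vAv                          # alpha_m = v_m^T r_m / (v_m^T A v_m)
        x = [xi + alpha * vi for xi, vi in zip(x, v)]    # x_{m+1} = x_m + alpha_m v_m
        r = [ri - alpha * avi for ri, avi in zip(r, Av)] # r_{m+1} = r_m - alpha_m A v_m
        beta = -dot(r, Av) / vAv                         # beta_m = -r_{m+1}^T A v_m / (v_m^T A v_m)
        v = [ri + beta * vi for ri, vi in zip(r, v)]     # v_{m+1} = r_{m+1} + beta_m v_m
        steps += 1
    return x, r, steps

# Test system (symmetric positive definite), as read from Example 6.
A = [[3.0, -1.0, 1.0],
     [-1.0, 6.0, 2.0],
     [1.0, 2.0, 7.0]]
b = [1.0, 0.0, 4.0]

x_cg, r_cg, steps = conjugate_gradient(A, b)
print("x =", x_cg)
print("residual =", r_cg)
print("steps =", steps)
```

Each step needs only one matrix-vector product $A v_m$, which is reused for $\alpha_m$, the residual update, and $\beta_m$.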

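In summary, we have constructed the so-called conjugate gradient method:

Set $v_0 = r_0 = b - A x_0$. For each $k = 0, 1, 2, \ldots$,
    $\alpha_k = \dfrac{r_k^T v_k}{v_k^T A v_k}$,
    $x_{k+1} = x_k + \alpha_k v_k$,
    $r_{k+1} = b - A x_{k+1} = r_k - \alpha_k A v_k$,
    $\beta_k = -\dfrac{r_{k+1}^T A v_k}{v_k^T A v_k}$,
    $v_{k+1} = r_{k+1} + \beta_k v_k$.

The above process leads us to a new concept.

Definition 3 (A-orthogonality). The set of nonzero vectors $\{v_1, \ldots, v_k\}$ is said to be A-orthogonal if
\[
\forall\, i \neq j : \quad v_i^T A v_j = 0.
\]
The conjugate gradient method has created a set of A-orthogonal search directions $v_0, \ldots, v_m$.

Theorem 4. Every A-orthogonal system is linearly independent.

Proof. Let $\{v_1, \ldots, v_k\}$ be an A-orthogonal system. Suppose $\lambda_1 v_1 + \cdots + \lambda_k v_k = 0$. Then for every $i = 1, \ldots, k$,
\[
0 = \lambda_1 \langle v_i, A v_1 \rangle + \cdots + \lambda_k \langle v_i, A v_k \rangle = \lambda_i \langle v_i, A v_i \rangle.
\]
Thus, $\lambda_i = 0$ since $\langle v_i, A v_i \rangle > 0$. So $\{v_i\}$ is linearly independent.

A different presentation for the conjugate gradient method is given in [1, p.27].

Theorem 5 (finite convergence). The conjugate gradient method converges after at most $n$ steps.

Proof. If $r_k = 0$ for some $k < n$, then $x_k$ is already the solution. Otherwise, the residual $r_k$ is orthogonal to $\operatorname{span}\{v_0, v_1, \ldots, v_{k-1}\}$ for each $k$, and the directions $v_0, \ldots, v_{n-1}$ are nonzero and A-orthogonal, hence linearly independent by Theorem 4, so they span $\mathbb{R}^n$. Thus $r_n$ is orthogonal to all of $\mathbb{R}^n$, so $r_n = 0$; that means $b - A x_n = 0$, i.e., $x_n$ is the solution.

Example 6. Use the conjugate gradient method to solve
\[
\begin{aligned}
3x_1 - x_2 + x_3 &= 1, \\
-x_1 + 6x_2 + 2x_3 &= 0, \\
x_1 + 2x_2 + 7x_3 &= 4.
\end{aligned}
\]
Solution. Take $x_0 = (0, 0, 0)$, so $v_0 = r_0 = b - A x_0 = (1, 0, 4)$. Values are rounded to five significant digits.
\[
\begin{aligned}
\alpha_0 &= \frac{r_0^T v_0}{v_0^T A v_0} = \frac{17}{123} = 0.13821, & x_1 &= x_0 + \alpha_0 v_0 = (0.13821,\ 0,\ 0.55285), \\
& & r_1 &= b - A x_1 = (0.032520,\ -0.96748,\ -0.0081301), \\
\beta_0 &= -\frac{r_1^T A v_0}{v_0^T A v_0} = 0.055126, & v_1 &= r_1 + \beta_0 v_0 = (0.087646,\ -0.96748,\ 0.21237), \\
\alpha_1 &= \frac{r_1^T v_1}{v_1^T A v_1} = 0.17550, & x_2 &= x_1 + \alpha_1 v_1 = (0.15359,\ -0.16979,\ 0.59012), \\
& & r_2 &= b - A x_2 = (-0.22069,\ -0.0078818,\ 0.055173), \\
\beta_1 &= -\frac{r_2^T A v_1}{v_1^T A v_1} = 0.055286, & v_2 &= r_2 + \beta_1 v_1 = (-0.21584,\ -0.061370,\ 0.066914),
\end{aligned}
\]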

\[
\alpha_2 = \frac{r_2^T v_2}{v_2^T A v_2} = 0.42502, \qquad x_3 = x_2 + \alpha_2 v_2 = (0.061856,\ -0.19588,\ 0.61856), \qquad r_3 = b - A x_3 = (0, 0, 0).
\]

References

[1] D. Luenberger and Y. Ye, Linear and Nonlinear Programming, 3rd edition, Springer (2008).
[2] R. Burden and D. Faires, Numerical Analysis, 9th edition, Brooks/Cole Publishing Co. (2011).
[3] R.E. White, Computational Mathematics: Models, Methods, and Analysis with Matlab, CRC Press (2004).
