Chapter 10 Conjugate Direction Methods

Size: px

Start display at page:

Download "Chapter 10 Conjugate Direction Methods"

Lydia Thompson
5 years ago
Views:

1 Chapter 10 Conjugate Direction Methods An Introduction to Optimization Spring, Wei-Ta Chu 2012/4/13

2 Introduction Conjugate direction methods can be viewed as being intermediate between the method of steepest descent and Newton s method. Solve quadratics of variables in steps. The usual implementation, the conjugate gradient algorithm, requires no Hessian matrix evaluations. No matrix inversion and no storage of an matrix are required. The conjugate direction methods typically perform better than the method of steepest descent, but not as well as Newton s method. 2

3 Introduction For a quadratic function of variables,,, the best direction of search is in the -conjugate direction. Basically, two directions and in are said to be - conjugate if. Definition 10.1: Let be a real symmetric matrix. The directions are -conjugate if for all, we have 3

4 Introduction Lemma 10.1: Let be a symmetric positive definite matrix. If the directions,, are nonzero and -conjugate, then they are linearly independent. Proof: Let be scalars such that Premultiplying this equality by,, yields because all other terms,, by -conjugacy. But and ; hence,,. Therefore,,, are linearly independent. 4

5 Example Let Note that. The matrix is positive definite because all its leading principal minors are positive: Our goal is to construct a set of -conjugate vectors Let,,. We require that. We have 5

6 Example Let,,. Then,, and thus To find the third vector, which would be -conjugate with and, we require that and. We have If we take, then the resulting set of vectors is mutually conjugate. 6

7 The Conjugate Direction Algorithm This method of finding -conjugate vectors is inefficient. A systematic procedure for finding -conjugate vectors can be devised using the idea underlying the Gram-Schmidt process of transforming a given basis of into an orthonormal basis of. 7

8 The Conjugate Direction Algorithm Minimizing the quadratic function of variables where,. Note that because, the function has a global minimizer that can be found by solving Basic Conjugate Direction Algorithm. Given a starting point and -conjugate directions ; for, 8

9 The Conjugate Direction Algorithm Theorem 10.1: For any starting point, the basic conjugate direction algorithm converges to the unique (that solves ) in steps; that is,. Proof: Consider. Because the are linearly independent, there exist constants, such that Now premultiply both sides of this equation by to obtain where the terms, by the - conjugate property. Hence, 9

10 The Conjugate Direction Algorithm Now, we can write Therefore, So writing and premultiplying the above by, we obtain because and. Thus, and, which completes the proof. 10

11 Example Find the minimizer of using the conjugate direction method with the initial point, and -conjugate direction and. We have and hence Thus, 11

12 Example To find, we compute and Therefore, Because is a quadratic function in two variables, 12

13 The Conjugate Direction Algorithm For a quadratic function of variables, the conjugate direction method reaches the solution after steps. Suppose that we start at and search in the direction to obtain we claim that 13

14 The Conjugate Direction Algorithm The equation implies that has the property that, where. To see this, apply the chain rule to get Evaluating the above at, we get Because is a quadratic function of, and the coefficient of the term in is, the above implies that 14

15 The Conjugate Direction Algorithm Using a similar argument, we can show that for all, and hence, Lemma 10.2: In the conjugate direction algorithm, for all,, and Proof: Note that because. Thus, 15

16 The Conjugate Direction Algorithm Prove by induction. The result is true for because We now show that if the result is true for (i.e. ), then it is true for, i.e. Fix and. By the induction hypothesis, Because and by the -conjugacy, we have It remains to be shown that 16

17 The Conjugate Direction Algorithm Indeed, because. Therefore, by induction, for all and 17

18 The Conjugate Direction Algorithm By Lemma 10.2 we see that is orthogonal to any vector from the subspace spanned by We now show that not only does satisfy, but also In other words, if we write then we can express. As increases, the subspace expands, and will eventually fill the whole of R n (provided that are linearly independent). 18

19 The Conjugate Direction Algorithm Therefore, for some sufficiently large, will lie in. For this reason, the above result is sometimes called the expanding subspace theorem. To prove the expanding subspace theorem, define the matrix Note that. Also Hence, 19

20 The Conjugate Direction Algorithm Now, consider any vector. There exists a vector such that. Let. Note that is a quadratic function and has a unique minimizer that satisfies the FONC. By the chain rule, Therefore, By Lemma 10.2,. Therefore, satisfies the FONC for the quadratic function, and hence is the minimizer of ; that is, 20

21 The Conjugate Gradient Algorithm The conjugate direction algorithm is very effective. However, to use the algorithm, we need to specify the - conjugate directions. Fortunately, there is a way to generate -conjugate directions as we perform iterations. The conjugate gradient algorithm does not use prespecified conjugate directions, but instead computes the directions as the algorithm proceeds. At each stage of the algorithm, the direction is calculated as a linear combination of the previous direction and the current gradient, in such as way that all the directions are mutually - conjugate. 21

22 The Conjugate Gradient Algorithm For a quadratic function of variables, we can locate the function minimizer by performing searches along mutually conjugate directions. We consider the quadratic function where. Our first search direction from an initial point is in the direction of steepest descent; that is, Thus,, where 22

23 The Conjugate Gradient Algorithm We search in a direction that is -conjugate to. We choose as a linear combination of and. In general, at the th step, we choose as a linear combination of and. Specifically, we choose The coefficients, are chosen in such a way that is -conjugate to. This is accomplished by choosing to be 23

24 The Conjugate Gradient Algorithm The algorithm 1. Set ; select the initial point 2.. If, stop; else, set If, stop Set ; go to step 3. 24

25 Example Proposition 10.1: In the conjugate gradient algorithm, the directions are -conjugate. Consider the quadratic function We find the minimizer using the conjugate gradient algorithm, using the starting point We can represent as where 25

26 Example We have Hence, 26

27 Example Hence, 27

28 Proof of Proposition 10.1 We use induction. We first show that. Substituting for we see that Assume that, are -conjugate directions. From Lemma 10.2 we have Thus, is orthogonal to each of the directions We now show that 28

29 Proof of Proposition 10.1 Fix. We have Substituting this equation into the previous one yields Because, it follows that We are now ready to show that We have If, then, by virtue of the induction hypothesis. Hence, we have 29

30 Proof of Proposition 10.1 But. Because Thus, It remains to show. We have Using the expression for, we get, which completes the proof. 30

31 The Conjugate Gradient Algorithm for Nonquadratic Problems The algorithm can be extended to general nonlinear functions by interpreting as a second-order Taylor series approximation of the objective function. For a quadratic, the matrix, the Hessian of the quadratic, is constant. However, for a general nonlinear function the Hessian is a matrix that has to be reevaluated at each iteration of the algorithm. Observe that appears only in the computation of the scalars and. Because can be replaced by a numerical line search procedure, we need only concern ourselves with the formula for. 31

32 The Conjugate Gradient Algorithm for Nonquadratic Problems Hestenes-Stiefel Formula. Replacing the by the term. The two terms are equal in the quadratic case.. Premultiplying both sides by, subtracting from both sides, and recognizing that we get, which we can rewrite as. Therefore, the Hestenes-Stiefel Formula 32

33 The Conjugate Gradient Algorithm for Nonquadratic Problems Polak-Ribiere Formula. Starting from the Hestenes-Stiefel formula, we multiply out the denominator to get By Lemma 10.2,. Also, since and premultiplying this by, we get where once again we used Lemma Hence, we get the Polak-Ribiere formula 33

34 The Conjugate Gradient Algorithm for Nonquadratic Problems Flether-Reeves Formula. Starting with the Polak-Ribiere formula, we multiply out the numerator to get We now use, which we get by using the equation and applying Lemma This leads to the Fletcher-Reeves formula 34

35 The Conjugate Gradient Algorithm for Nonquadratic Problems Without the Hessian matrix, all we need are the objective function and gradient values at each iteration. For the quadratic case the three expressions for are exactly equal. A very important issue in minimization problems of nonquadratic functions is the line search. If the line search is known to be inaccurate, the Hestenes-Stiefel formula for is recommended. 35

36 Homework 2 Exercises 8.3, 8.15 Exercises 9.1 Exercises 10.9 Hand over your homework at the class of Apr

Chapter 8 Gradient Methods

Chapter 8 Gradient Methods An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Introduction Recall that a level set of a function is the set of points satisfying for some constant. Thus, a point