January 29, Non-linear conjugate gradient method(s): Fletcher Reeves Polak Ribière January 29, 2014 Hestenes Stiefel 1 / 13

Size: px

Start display at page:

Download "January 29, Non-linear conjugate gradient method(s): Fletcher Reeves Polak Ribière January 29, 2014 Hestenes Stiefel 1 / 13"

Brianna Parks
5 years ago
Views:

1 Non-linear conjugate gradient method(s): Fletcher Reeves Polak Ribière Hestenes Stiefel January 29, 2014 Non-linear conjugate gradient method(s): Fletcher Reeves Polak Ribière January 29, 2014 Hestenes Stiefel 1 / 13

2 Problem Extend the linear CG method to non-quadratic functions f : R n R: minimize x R n f (x). Reminder: CG for quadratic functions 1: r 0 = Ax 0 b, p 0 = r 0, k = 0 2: while r k > ɛ do 3: α k = (r T k p k)/(p T k Ap k) 4: x k+1 = x k + α k p k 5: r k+1 = Ax k+1 b = r k + α k Ap k 6: β k+1 = (rk+1 T Ap k)/(pk T Ap k) 7: p k+1 = r k+1 + β k+1 p k 8: k = k + 1 9: end while Non-linear conjugate gradient method(s): Fletcher Reeves Polak Ribière January 29, 2014 Hestenes Stiefel 2 / 13

3 Problem Extend the linear CG method to non-quadratic functions f : R n R: minimize x R n f (x). Almost done: 1: r 0 = f (x 0 ), p 0 = r 0, k = 0 2: while r k > ɛ do 3: α k = linesearch for f along p k 4: x k+1 = x k + α k p k 5: r k+1 = f (x k+1 ) 6: β k+1 =??? 7: p k+1 = r k+1 + β k+1 p k 8: k = k + 1 9: end while Non-linear conjugate gradient method(s): Fletcher Reeves Polak Ribière January 29, 2014 Hestenes Stiefel 3 / 13

4 Algebraic manipulations with linear CG: p k = r k + β k p k 1 rk T p k = rk T [ r k + β k p k 1 ] = rk T r k β k rk T p k 1 = rk T }{{} r k =0 α k = r T k p k p T k Ap k = r T k r k p T k Ap k Non-linear conjugate gradient method(s): Fletcher Reeves Polak Ribière January 29, 2014 Hestenes Stiefel 4 / 13

5 Algebraic manipulations with linear CG: r k+1 = r k + α k Ap k Ap k = α 1 k [r k+1 r k ] β k+1 = r T k+1 Ap k p T k Ap k = pt k Ap k r T k r k = α 1 rk+1 T [r k+1 r k ] k pk T Ap k r T k+1 [r k+1 r k ] p T k Ap k = r k+1 T [r k+1 r k ] rk+1 T rk T r = r k+1 k rk T }{{} r k }{{} leads to Polak Ribière leads to Fletcher Reeves Non-linear conjugate gradient method(s): Fletcher Reeves Polak Ribière January 29, 2014 Hestenes Stiefel 5 / 13

6 Fletcher Reeves algorithm: 1: p 0 = f (x 0 ), k = 0 2: while f (x k ) > ɛ do 3: α k = linesearch for f along p k 4: x k+1 = x k + α k p k 5: β FR k+1 = f (x k+1) 2 / f (x k ) 2 6: p k+1 = f (x k+1 ) + β FR k+1 p k 7: k = k + 1 8: end while Big question: Under which conditions on f, α k 1 is p k a descent direction at x k for f? Non-linear conjugate gradient method(s): Fletcher Reeves Polak Ribière January 29, 2014 Hestenes Stiefel 6 / 13

7 Fletcher Reeves algorithm: Big question: Under which conditions on f, α k 1 is p k a descent direction at x k for f? f (x k ) T p k = f (x k ) 2 + β FR k f (x k ) T p k 1 }{{} =0 for exact linesearch Conclusion: linesearch should be accurate enough to guarantee descent. Non-linear conjugate gradient method(s): Fletcher Reeves Polak Ribière January 29, 2014 Hestenes Stiefel 7 / 13

8 Analysis of Fletcher Reeves: Lemma 5.6 FR CG algorithm with strong Wolfe linesearch; 0 < c 1 < c 2 < 1/2. Then Proof: Induction in k. 1 1 c 2 pt k f (x k) f (x k ) 2 2c c 2 < 0. Non-linear conjugate gradient method(s): Fletcher Reeves Polak Ribière January 29, 2014 Hestenes Stiefel 8 / 13

9 Global convergence of Fletcher Reeves: Theorem 5.7 Assume: 1 f is bounded from below and is Lipschitz continuously differentiable (prerequisites for Zoutendijk s); 2 α k satisfies strong Wolfe s, 0 < c 1 < c 2 < 1/2. Then lim inf k f (x k ) = 0. Non-linear conjugate gradient method(s): Fletcher Reeves Polak Ribière January 29, 2014 Hestenes Stiefel 9 / 13

10 Main problem of Fletcher Reeves: Suppose Then cos θ k = pt k f (x k) p k f (x k ) 0 cos θ k+1 0 If we (for any reason) end up with a bad direction p k then FR continues to generate poor directions Non-linear conjugate gradient method(s): Fletcher ReevesJanuary Polak Ribière 29, 2014 Hestenes Stiefel 10 / 13

11 Polak Ribière algorithm: β PR k+1 = f (x k+1) T [ f (x k+1 ) f (x k )] f (x k ) 2 Warning: Strong Wolfe conditions do not guarantee that p k - descent direction [but the algorithm often works better than FR in practice]! cos θ k 0 = cos θ k+1 1 Even better: β PR+ k+1 = max{0, βpr k+1 } Non-linear conjugate gradient method(s): Fletcher ReevesJanuary Polak Ribière 29, 2014 Hestenes Stiefel 11 / 13

12 Polak Ribière algorithm: β PR k+1 = f (x k+1) T [ f (x k+1 ) f (x k )] f (x k ) 2 Theorem Even with exact line-search in the Polak Ribière CG algorithm, there are examples f : R 3 R, f C 2 : for some x 0 R 3 it holds that lim inf k nablaf (x k ) > 0. Non-linear conjugate gradient method(s): Fletcher ReevesJanuary Polak Ribière 29, 2014 Hestenes Stiefel 12 / 13

13 Other choices of β: ŷ k = f (x k+1 ) f (x k ) Hestenes Stiefel: β k+1 = ŷ T k f (x k+1) ŷ T k p k Dai Yuan, 1999 [works with strong Wolfe]: β k+1 = f (x k+1) 2 ŷ T k p k Hager Zhang, 2005 [works with strong Wolfe]: β k+1 = ŷk f (x k+1 ) ŷ T k p k 2 ŷ k 2 p T k f (x k+1) (ŷ T k p k) 2 Non-linear conjugate gradient method(s): Fletcher ReevesJanuary Polak Ribière 29, 2014 Hestenes Stiefel 13 / 13

14 Convergence rate: Linear! Though one can show faster convergence rate with restarts after every N iterations in practice we want to stop the algorithm long before N iterations! Non-linear conjugate gradient method(s): Fletcher ReevesJanuary Polak Ribière 29, 2014 Hestenes Stiefel 14 / 13

Chapter 4. Unconstrained optimization

Chapter 4. Unconstrained optimization Version: 28-10-2012 Material: (for details see) Chapter 11 in [FKS] (pp.251-276) A reference e.g. L.11.2 refers to the corresponding Lemma in the book [FKS] PDF-file