OPER 627: Nonlinear Optimization Lecture 9: Trust-region methods

Size: px

Start display at page:

Download "OPER 627: Nonlinear Optimization Lecture 9: Trust-region methods"

Monica Malone
5 years ago
Views:

1 OPER 627: Nonlinear Optimization Lecture 9: Trust-region methods Department of Statistical Sciences and Operations Research Virginia Commonwealth University Sept 25, 2013 (Lecture 9) Nonlinear Optimization Sept 25, / 12

Quiz True or False: From any arbitrary initial point, BFGS algorithm guarantees convergence to a stationary point For a general nonlinear smooth function, BFGS algorithm has superlinear local

2 Quiz True or False: From any arbitrary initial point, BFGS algorithm guarantees convergence to a stationary point For a general nonlinear smooth function, BFGS algorithm has superlinear local convergence under some mild conditions Quasi-Newton methods are good at sparse large-scale optimization problems What is the idea of LBFGS method? (Lecture 9) Nonlinear Optimization Sept 25, / 12

3 Today s Outline Introduction to trust region methods Trust region algorithm Trust region subproblems (Lecture 9) Nonlinear Optimization Sept 25, / 12

An alternative perspective Model function to approximate f (x k + p): m k (p) = f (x k ) + f (x k ) p + 1 2 p B k p This model is only locally reliable: p cannot be too large Let s go local 1 Step 1:

4 An alternative perspective Model function to approximate f (x k + p): m k (p) = f (x k ) + f (x k ) p p B k p This model is only locally reliable: p cannot be too large Let s go local 1 Step 1: Choose a trust region (a compact convex neighborhood around the current point) 2 Step 2: Find a minimizer of some model function in the trust region Q: Why do we need the trust region? (Lecture 9) Nonlinear Optimization Sept 25, / 12

5 Trust region subproblem 1 Model function: m k (p) = f (x k ) + g k p p B k p 2 Trust region radius:, p (2-norm by default) B k could be the true Hessian B k could be a quasi-newton approximate matrix Trust region subproblem (TRP): min p m(p) s.t. p Make p k as large as possible (Why?), as long as m(p) agrees closely with f (x + p) How close? A-Red k : actual reduction in f, f k := f (x k ) f (x k + p k ) P-Red k : predicted reduction in f, m k := f (x k ) m k (p k ) Ratio: ρ k = f k m k, if close to 1, then we have a good agreement (Lecture 9) Nonlinear Optimization Sept 25, / 12

6 Trust region: simple idea If ρ is far from 1: If ρ is close to 1, and p hit the boundary: p = δ: (Lecture 9) Nonlinear Optimization Sept 25, / 12

7 Trust region algorithm Algorithm 1 Trust region ( > 0, 0 (0, ), η [0, 1 4 )) 1: Given x k, k > 0, calculate g k = f (x k ) and B k 2: (Key) Solve TRP k for p k 3: Evaluate f (x k + p k ), and calculate f k, ρ k 4: if ρ k < 1 4 (not a good model, reduce the trust region radius) then 5: k+1 = k 4 6: else if p k = k (hit the boundary) then 7: k+1 = min(2 k, ) 8: else 9: k+1 = k 10: end if 11: if ρ k > η (so there are some reduction in the model) then 12: x k+1 = x k + p k 13: else 14: x k+1 = x k (null step) 15: end if (Lecture 9) Nonlinear Optimization Sept 25, / 12

8 Trust region and line search Example: min x (x 2 1) 2 1 Line search at 0: cannot get over saddle point 2 Trust region: Escape the saddle point! min 2d 2 s.t. d d (Lecture 9) Nonlinear Optimization Sept 25, / 12

9 Solution of TRP Trust region subproblem (TRP): min p m(p) s.t. p Theorem (More and Sorensen 1983) Vector p is a global minimizer of TRP if and only if p and λ 0 that: 1 (B + λi)p = g 2 λ( p ) = 0 3 B + λi is PSD (Lecture 9) Nonlinear Optimization Sept 25, / 12

10 Two cases 1 λ = 0, this happens only if B is PSD, and p = B 1 g satisfies p 2 λ > 0, then we look for a number λ that is big enough so that B + λi is PSD, and p = (B + λi) 1 g satisfies p = Let p(λ) = (B + λi) 1 g, we just need to solve p(λ) =, which is a single-variable root finding problem! (Lecture 9) Nonlinear Optimization Sept 25, / 12

11 How to solve? No explicit form p(λ)! Let s do some derivation! (Lecture 9) Nonlinear Optimization Sept 25, / 12

12 Algorithm for solving TRP Algorithm 2 Trust region subproblem 1: Given λ 0, > 0 2: for l = 0, 1, do 3: Factorize B + λ l I = R R (safeguard here to ensure Cholesky factorization) 4: Solve R Rp l = g 5: Solve R q l = p l 6: Set λ l+1 = λ l + p l 2 7: end for p l q l 2 Good news: typically just 2 or 3 steps of Newton iteration! (Lecture 9) Nonlinear Optimization Sept 25, / 12

OPER 627: Nonlinear Optimization Lecture 14: Mid-term Review

OPER 627: Nonlinear Optimization Lecture 14: Mid-term Review Department of Statistical Sciences and Operations Research Virginia Commonwealth University Oct 16, 2013 (Lecture 14) Nonlinear Optimization