Optimization and Root Finding. Kurt Hornik

1 Optimization and Root Finding Kurt Hornik

3 Basics Root finding and unconstrained smooth optimization are closely related: solving f(x) = 0 can be accomplished via minimizing ‖f(x)‖², and unconstrained optima of f must be critical points, i.e., solve ∇f(x) = 0. (Note: f is scalar for optimization and typically vector-valued for root finding.) Linear equations and linear least squares problems can be solved exactly using techniques from numerical linear algebra. Otherwise: solve approximately, as limits of iterations x_{k+1} = g(x_k). Slide 2

4 Fixed Point Iteration Consider the iteration x_{k+1} = g(x_k) with limit x* and g smooth. Then x* = g(x*) and
x_{k+1} = g(x_k) ≈ g(x*) + J_g(x*)(x_k − x*) = x* + J_g(x*)(x_k − x*),
where J_g(x) = [∂g_i(x)/∂x_j] is the Jacobian of g at x (sometimes also written as (∇g)(x)). Thus:
x_{k+1} − x* ≈ J_g(x*)(x_k − x*). Slide 3

5 Fixed Point Iteration In general, for local convergence it is necessary and sufficient that ρ(J_g(x*)) < 1, where ρ(A) = max{|λ| : λ is an eigenvalue of A} is the spectral radius of A. In this case, we get (at least) linear (local) convergence, i.e., ‖x_{k+1} − x*‖ ≤ C‖x_k − x*‖ for some 0 < C < 1 and k sufficiently large. Slide 4
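
As a concrete illustration (not from the slides), a minimal R sketch of such a fixed point iteration, applied to g(x) = cos(x), whose fixed point x* ≈ 0.739 satisfies |g'(x*)| < 1:

## Minimal fixed-point iteration sketch: iterate x_{k+1} = g(x_k) until the
## iterates barely change.
fixed_point <- function(g, x0, tol = 1e-8, maxit = 100) {
    x <- x0
    for (k in seq_len(maxit)) {
        x_new <- g(x)
        if (max(abs(x_new - x)) < tol * (1 + max(abs(x))))
            return(list(x = x_new, iterations = k))
        x <- x_new
    }
    warning("maximum number of iterations reached")
    list(x = x, iterations = maxit)
}
fixed_point(cos, 1)   # converges (linearly) to x* ~ 0.739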

6 Outline Root Finding Unconstrained Optimization Constrained Optimization Optimization with R Slide 5

7 Newton's Method Suppose we have an approximation x_k to x* and that in a neighborhood of x_k, the linearization
L_k(x) = f(x_k) + J_f(x_k)(x − x_k)
is a good approximation to f. An obvious candidate for the next approximation is obtained by solving L_k(x) = 0, i.e.,
x_{k+1} = x_k − J_f(x_k)^{-1} f(x_k) = g(x_k), g(x) = x − J_f(x)^{-1} f(x).
This is the mathematical form; the computational one is: solve J_f(x_k) s_k = −f(x_k) for s_k; set x_{k+1} = x_k + s_k. Slide 6
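
A minimal R sketch of this computational form (illustrative only; jac is assumed to return the Jacobian J_f):

## Newton's method for root finding: solve J_f(x_k) s_k = -f(x_k), then x_{k+1} = x_k + s_k.
newton <- function(f, jac, x0, tol = 1e-10, maxit = 50) {
    x <- x0
    for (k in seq_len(maxit)) {
        fx <- f(x)
        if (max(abs(fx)) < tol) break
        x <- x + solve(jac(x), -fx)   # Newton step
    }
    x
}
newton(function(x) x^2 - 2, function(x) matrix(2 * x), 1)   # converges to sqrt(2)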

8 Newton's Method The iteration function g has components and partials
g_i(x) = x_i − Σ_l [J_f(x)^{-1}]_{il} f_l(x),
∂g_i(x)/∂x_j = δ_{ij} − Σ_l (∂[J_f(x)^{-1}]_{il}/∂x_j) f_l(x) − Σ_l [J_f(x)^{-1}]_{il} ∂f_l(x)/∂x_j.
As f(x*) = 0,
∂g_i(x*)/∂x_j = δ_{ij} − Σ_l [J_f(x*)^{-1}]_{il} ∂f_l(x*)/∂x_j = δ_{ij} − [J_f(x*)^{-1} J_f(x*)]_{ij} = 0,
so that J_g(x*) vanishes! Thus, local convergence is super-linear (and can in fact be shown to be at least quadratic) in the sense that
‖x_{k+1} − x*‖ ≤ α_k ‖x_k − x*‖, lim_k α_k = 0. Slide 7

11 Newton's Method: Discussion Due to super-linear convergence, this works very nicely once we get close enough to a root where the above approximations apply. However (with n the dimension of the problem): in each step, O(n²) derivatives need to be computed (exactly or numerically), which can be costly (in particular if function and/or derivative evaluations are costly); in each step, an n × n linear system needs to be solved, which takes O(n³) operations. For super-linear convergence the exact Jacobian is not needed. Slide 8

12 Super-Linear Convergence Attractive because faster than linear. Also,
‖x_{k+1} − x_k‖ ≥ ‖x_k − x*‖ − ‖x_{k+1} − x*‖,
so that with super-linear convergence,
‖x_{k+1} − x_k‖ / ‖x_k − x*‖ ≥ 1 − ‖x_{k+1} − x*‖ / ‖x_k − x*‖ = 1 − α_k → 1.
I.e., the relative change in the k-th update is a good approximation for the relative error in the approximation of x* by x_k (which is commonly used for stopping criteria). Slide 9

13 Super-Linear Convergence One can show: for sequences x_{k+1} = x_k − B_k^{-1} f(x_k) with suitable non-singular matrices B_k, one has super-linear (local) convergence to x* with f(x*) = 0 if and only if
lim_{k→∞} ‖(B_k − J_f(x*))(x_{k+1} − x_k)‖ / ‖x_{k+1} − x_k‖ = 0,
i.e., if B_k converges to J_f(x*) along the directions s_k = x_{k+1} − x_k of the iterative method. This suggests investigating iterative quasi-Newton methods with such B_k which are less costly to compute and/or invert. Slide 10

14 Broyden's Method Note that Newton's method is based on the approximation f(x_{k+1}) ≈ f(x_k) + J_f(x_k)(x_{k+1} − x_k). This suggests considering approximations B_{k+1} which exactly satisfy the secant equation
f(x_{k+1}) = f(x_k) + B_{k+1}(x_{k+1} − x_k).
Of all such B_{k+1}, the one with the least change to B_k seems particularly attractive, i.e., solutions to
‖B − B_k‖_F → min, y_k = f(x_{k+1}) − f(x_k) = B s_k
(where ‖·‖_F is the Frobenius norm). Slide 11

15 Broyden's Method We have
‖B − B_k‖_F² = ‖vec(B) − vec(B_k)‖², vec(B s_k − y_k) = (s_k' ⊗ I) vec(B) − y_k.
Thus, writing b = vec(B), b_k = vec(B_k), A_k = s_k' ⊗ I, the optimization problem is
‖b − b_k‖² → min, A_k b = y_k.
This is a convex quadratic optimization problem with linear constraints: it can be solved using geometric ideas (orthogonal projections onto affine subspaces) or using Lagrange's method. Slide 12

16 Broyden's Method Lagrangian: L(b, λ) = (1/2)‖b − b_k‖² + λ'(A_k b − y_k). For a critical point:
∇_b L = b − b_k + A_k'λ = 0, ∇_λ L = A_k b − y_k = 0.
Thus b = b_k − A_k'λ with y_k = A_k b = A_k(b_k − A_k'λ), so
λ = (A_k A_k')^{-1}(A_k b_k − y_k) and b = b_k − A_k'(A_k A_k')^{-1}(A_k b_k − y_k). Slide 13

17 Broyden's Method As A_k A_k' = (s_k' ⊗ I)(s_k ⊗ I) = (s_k's_k) I and A_k b_k = (s_k' ⊗ I) vec(B_k) = B_k s_k,
A_k'(A_k A_k')^{-1}(A_k b_k − y_k) = (1/(s_k's_k)) (s_k ⊗ I)(A_k b_k − y_k) = (1/(s_k's_k)) vec((A_k b_k − y_k) s_k')
and hence
vec(B) = b = vec(B_k) − vec((B_k s_k − y_k) s_k' / (s_k's_k))
and thus
B = B_k + (y_k − B_k s_k) s_k' / (s_k's_k). Slide 14

18 Broyden's Method Based on a suitable guess x_0 and a suitable full rank approximation B_0 to the Jacobian (e.g., the identity matrix), iterate using
solve B_k s_k = −f(x_k) for s_k;
x_{k+1} = x_k + s_k;
y_k = f(x_{k+1}) − f(x_k);
B_{k+1} = B_k + (y_k − B_k s_k) s_k' / (s_k's_k).
Again, this is the mathematical form. Computationally, we can improve! Note that the iteration for B_k performs rank-one updates of the form B_{k+1} = B_k + u_k v_k', and all we need is the inverse H_k = B_k^{-1}! Slide 15

19 Broyden's Method Remember the Sherman-Morrison formula for the inverse of a rank-one update:
(A + uv')^{-1} = A^{-1} − (1/σ) A^{-1} u v' A^{-1}, σ = 1 + v'A^{-1}u ≠ 0.
Thus,
H_{k+1} = B_{k+1}^{-1} = (B_k + u_k v_k')^{-1} = H_k − (H_k u_k v_k' H_k)/(1 + v_k' H_k u_k)
with u_k = y_k − B_k s_k and v_k = s_k/(s_k's_k), so that
H_k u_k = B_k^{-1} u_k = B_k^{-1} y_k − s_k = H_k y_k − s_k
and
1 + v_k' H_k u_k = 1 + s_k'(H_k y_k − s_k)/(s_k's_k) = s_k' H_k y_k / (s_k's_k). Slide 16

20 Broyden's Method Computational form: based on a suitable guess x_0 and a suitable full rank approximation H_0 to the inverse of the Jacobian (e.g., the identity matrix), iterate using
s_k = −H_k f(x_k);
x_{k+1} = x_k + s_k;
y_k = f(x_{k+1}) − f(x_k);
H_{k+1} = H_k − (H_k y_k − s_k) s_k' H_k / (s_k' H_k y_k)
(of course this needs s_k' H_k y_k ≠ 0). Slide 17
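
A minimal R sketch of this computational form (illustrative; H_0 defaults to the identity and no global strategy is included):

## Broyden's method, inverse-update form: rank-one updates of H_k ~ J_f(x_k)^{-1}.
broyden <- function(f, x0, H0 = diag(length(x0)), tol = 1e-10, maxit = 100) {
    x <- x0; H <- H0; fx <- f(x)
    for (k in seq_len(maxit)) {
        if (max(abs(fx)) < tol) break
        s <- -drop(H %*% fx)                                 # quasi-Newton step
        x_new <- x + s
        fx_new <- f(x_new)
        y <- fx_new - fx
        Hy <- drop(H %*% y)
        H <- H - ((Hy - s) %*% (t(s) %*% H)) / sum(s * Hy)   # rank-one update of H
        x <- x_new; fx <- fx_new
    }
    x
}
broyden(function(x) x^2 - 2, 1)   # 1-d case: essentially the secant method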

21 Spectral Methods Used for very large n. E.g., Barzilai-Borwein methods. One variant (DF-SANE) is based on the idea of using very simple B_k or H_k proportional to the identity matrix. However: when B_k = β_k I, the secant condition B_{k+1} s_k = y_k typically cannot be achieved exactly. Instead, determine β_{k+1} to solve
‖y_k − B_{k+1} s_k‖² = ‖y_k − β_{k+1} s_k‖² → min
with solution β_{k+1} = y_k's_k / s_k's_k. This gives the iteration
x_{k+1} = x_k − f(x_k)/β_k, β_{k+1} = y_k's_k / s_k's_k,
or equivalently (with s_k and y_k as before),
x_{k+1} = x_k − α_k f(x_k), α_{k+1} = s_k's_k / y_k's_k. Slide 18

22 Spectral Methods Alternatively, when using H_k = α_k I, the exact secant condition would be H_{k+1} y_k = s_k, and one determines α_{k+1} to solve
‖s_k − H_{k+1} y_k‖² = ‖s_k − α_{k+1} y_k‖² → min
with solution α_{k+1} = s_k'y_k / y_k'y_k. This gives the iteration
x_{k+1} = x_k − α_k f(x_k), α_{k+1} = s_k'y_k / y_k'y_k. Slide 19
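
A naive R sketch of this last iteration (illustrative; real implementations such as DF-SANE in package BB add line searches and other safeguards to make it globally reliable):

spectral_solve <- function(f, x0, alpha0 = 1, tol = 1e-8, maxit = 500) {
    x <- x0; alpha <- alpha0; fx <- f(x)
    for (k in seq_len(maxit)) {
        if (max(abs(fx)) < tol) break
        x_new <- x - alpha * fx
        fx_new <- f(x_new)
        s <- x_new - x; y <- fx_new - fx
        alpha <- sum(s * y) / sum(y * y)   # spectral step length s'y / y'y
        x <- x_new; fx <- fx_new
    }
    x
}
spectral_solve(function(x) 2 * x - 4, 0)   # root at x = 2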

25 Root Finding with R Methods are only available in add-on packages. Package nleqslv has function nleqslv() providing the Newton and Broyden methods with a variety of global strategies. Package BB has function BBsolve() providing Barzilai-Borwein solvers, and multistart() for starting solvers from multiple starting values. Slide 20

26 Examples System: x_1² + x_2² = 2, exp(x_1 − 1) + x_2³ = 2. Has one solution at x_1 = x_2 = 1. Slide 21
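
A short sketch of solving this system with nleqslv() (the system as reconstructed above; the starting value c(2, 0.5) is an assumption):

library(nleqslv)
fn <- function(x) c(x[1]^2 + x[2]^2 - 2,
                    exp(x[1] - 1) + x[2]^3 - 2)
nleqslv(c(2, 0.5), fn)                       # default: Broyden with a global strategy
nleqslv(c(2, 0.5), fn, method = "Newton")    # Newton variant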

27 Outline Root Finding Unconstrained Optimization Constrained Optimization Optimization with R Slide 22

29 Theory Consider the unconstrained minimization problem with objective function f. If f is smooth:
f(x* + s) = f(x*) + ∇f(x*)'s + (1/2) s' H_f(x* + αs) s
for some 0 < α < 1, with H_f(x) = ∇²f(x) the Hessian of f at x. Hence:
x* local minimum ⇒ ∇f(x*) = 0 (x* is a critical point of f) and H_f(x*) non-negative definite (necessary first and second order conditions);
x* a critical point of f and H_f(x*) positive definite (sufficient second order condition) ⇒ x* local minimum. Slide 23

30 Steepest Descent The linear approximation f(x + αs) ≈ f(x) + α∇f(x)'s suggests looking for a good α along the steepest descent direction s = −∇f(x). Typically performed as
s_k = −∇f(x_k);
choose α_k to minimize f(x_k + αs_k);
x_{k+1} = x_k + α_k s_k
(method of steepest descent [with line search]). Slide 24
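
A minimal R sketch of steepest descent with a crude line search via optimize() (illustrative only; gr is assumed to return the gradient of f):

steepest_descent <- function(f, gr, x0, tol = 1e-6, maxit = 1000) {
    x <- x0
    for (k in seq_len(maxit)) {
        s <- -gr(x)                                       # steepest descent direction
        if (max(abs(s)) < tol) break
        alpha <- optimize(function(a) f(x + a * s), c(0, 1))$minimum
        x <- x + alpha * s
    }
    x
}
f  <- function(x) (x[1] - 1)^2 + 10 * (x[2] - 2)^2        # simple convex quadratic
gr <- function(x) c(2 * (x[1] - 1), 20 * (x[2] - 2))
steepest_descent(f, gr, c(0, 0))                          # minimum at (1, 2)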

31 Newton's Method Use the local quadratic approximation
f(x + s) ≈ f(x) + ∇f(x)'s + (1/2) s' H_f(x) s
(equivalently, a linear approximation for ∇f) and minimize the RHS over s to obtain Newton's method
x_{k+1} = x_k − H_f(x_k)^{-1} ∇f(x_k)
with computational form: solve H_f(x_k) s_k = −∇f(x_k) for s_k; set x_{k+1} = x_k + s_k. Slide 25

32 Quasi-Newton Methods Similarly to the root-finding case, there are many modifications of the basic scheme, with the general form
x_{k+1} = x_k − α_k B_k^{-1} ∇f(x_k)
referred to as quasi-Newton methods: adding the α_k allows improving the global performance by controlling the step size. (Note however that away from a critical point, the Newton direction s does not necessarily result in a local decrease of the objective function.) Slide 26

33 Quasi-Newton Methods Similar to the root-finding case, one can look for simple updating rules for Hessian approximations B_k subject to the secant constraint B_{k+1} s_k = y_k = ∇f(x_{k+1}) − ∇f(x_k), additionally taking into account that the B_k should be symmetric (as approximations to Hessians). This motivates solving
‖B − B_k‖_F → min, B symmetric, y_k = B s_k,
which can easily be shown to have the unique solution
B_k + (y_k − B_k s_k)(y_k − B_k s_k)' / ((y_k − B_k s_k)'s_k).
This is the symmetric rank-one (SR1) update formula (Davidon, 1959). Slide 27

34 Quasi-Newton Methods SR1 has numerical difficulties: one thus considers similar problems with different norms, such as the weighted Frobenius norms ‖B − B_k‖_{F,W} = ‖W^{1/2}(B − B_k)W^{1/2}‖_F. One can show: for any (symmetric) W for which W y_k = s_k (such as the inverse of the average Hessian ∫₀¹ H_f(x_k + τs_k) dτ featuring in Taylor's theorem), which makes the approximation problem non-dimensional,
‖B − B_k‖_{F,W} → min, B symmetric, y_k = B s_k
has the unique solution
(I − ρ_k y_k s_k') B_k (I − ρ_k s_k y_k') + ρ_k y_k y_k', ρ_k = 1/(y_k's_k).
This is the DFP (Davidon-Fletcher-Powell) update formula. Slide 28

35 Quasi-Newton Methods Average Hessian: we have
∇f(x + s) − ∇f(x) = [∇f(x + τs)]₀¹ = ∫₀¹ (d/dτ)∇f(x + τs) dτ = (∫₀¹ H_f(x + τs) dτ) s
and hence (ignoring α_k for simplicity)
y_k = ∇f(x_k + s_k) − ∇f(x_k) = (∫₀¹ H_f(x_k + τs_k) dτ) s_k. Slide 29

36 Quasi-Newton Methods The most effective quasi-Newton formula is obtained by applying the corresponding approximation argument to H_k = B_k^{-1}, i.e., by using the unique solution to
‖H − H_k‖_{F,W} → min, H symmetric, s_k = H y_k,
with W any matrix satisfying W s_k = y_k (such as the average Hessian matrix), which is given by
(I − ρ_k s_k y_k') H_k (I − ρ_k y_k s_k') + ρ_k s_k s_k', ρ_k = 1/(y_k's_k).
This is the BFGS (Broyden-Fletcher-Goldfarb-Shanno) update formula. Slide 30

38 Quasi-Newton Methods Using the Sherman-Morrison-Woodbury formula
(A + UV')^{-1} = A^{-1} − A^{-1} U (I + V'A^{-1}U)^{-1} V' A^{-1}
and lengthy computations, one can show:
the BFGS update for H_k corresponds to
B_{k+1} = B_k + y_k y_k'/(y_k's_k) − B_k s_k s_k' B_k/(s_k' B_k s_k);
the DFP update for B_k corresponds to
H_{k+1} = H_k + s_k s_k'/(s_k'y_k) − H_k y_k y_k' H_k/(y_k' H_k y_k).
Rank-two updates. (Note the symmetry.) Slide 31
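
A small R sketch of the BFGS inverse-Hessian update from above, together with a check of the secant equation H_{k+1} y_k = s_k (the random s and y are for illustration only):

bfgs_update <- function(H, s, y) {
    rho <- 1 / sum(y * s)
    I <- diag(length(s))
    (I - rho * s %*% t(y)) %*% H %*% (I - rho * y %*% t(s)) + rho * s %*% t(s)
}
set.seed(1)
s <- rnorm(3); y <- rnorm(3)
all.equal(drop(bfgs_update(diag(3), s, y) %*% y), s)   # TRUE: secant equation holds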

39 Conjugate Gradient Methods An alternative to quasi-Newton methods. They have the form
x_{k+1} = x_k + α_k s_k, s_{k+1} = −∇f(x_{k+1}) + β_{k+1} s_k,
where the β are chosen such that s_{k+1} and s_k are suitably conjugate. E.g., the Fletcher-Reeves variant uses
β_{k+1} = g_{k+1}'g_{k+1} / (g_k'g_k), g_{k+1} = ∇f(x_{k+1}).
Advantage: they require no matrix storage (using only gradients) and are faster than steepest descent, hence suitable for large-scale optimization problems. Slide 32

40 Gauss-Newton Method Minimizing objective functions of the form
φ(x) = (1/2) Σ_i r_i(x)² = ‖r(x)‖²/2
is known (roughly) as a non-linear least squares problem, corresponding to minimizing the sum of squares of residuals r_i(x) = t_i − f(e_i, x) (in statistics, we usually write x and y for the inputs and targets, and θ instead of x for the unknown parameter). Clearly,
∇φ(x) = J_r(x)'r(x), H_φ(x) = J_r(x)'J_r(x) + Σ_i r_i(x) H_{r_i}(x). Slide 33

41 Gauss-Newton Method The straightforward Newton method is typically replaced by the Gauss-Newton method
(J_r(x_k)'J_r(x_k)) s_k = −J_r(x_k)'r(x_k),
which drops the terms involving the H_{r_i} from the Hessian, based on the idea that at the minimum the residuals r_i should be small. This replaces a non-linear least squares problem by a sequence of linear least squares problems with the right limit. However, the simplification may lead to ill-conditioned or rank-deficient linear problems: if so, the Levenberg-Marquardt method
(J_r(x_k)'J_r(x_k) + μ_k I) s_k = −J_r(x_k)'r(x_k)
with suitably chosen μ_k > 0 can be employed. Slide 34
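
A minimal R sketch of a Gauss-Newton iteration for a toy model t ≈ x_1 exp(−x_2 e) (an assumed example; in practice one would rather use nls() or a Levenberg-Marquardt implementation such as package minpack.lm):

gauss_newton <- function(r, jac, x0, tol = 1e-8, maxit = 50) {
    x <- x0
    for (k in seq_len(maxit)) {
        s <- qr.solve(jac(x), -r(x))   # solves the linear least squares subproblem
        x <- x + s
        if (max(abs(s)) < tol) break
    }
    x
}
set.seed(42)
e <- seq(0, 5, by = 0.25)
y <- 2 * exp(-0.5 * e) + rnorm(length(e), sd = 0.01)     # simulated targets
r   <- function(x) y - x[1] * exp(-x[2] * e)             # residuals
jac <- function(x) cbind(-exp(-x[2] * e),                # Jacobian of r
                          x[1] * e * exp(-x[2] * e))
gauss_newton(r, jac, c(1, 1))                            # approx. (2, 0.5)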

42 Outline Root Finding Unconstrained Optimization Constrained Optimization Optimization with R Slide 35

43 Theory Local optima are relative to neighborhoods in the feasible set X. A suitably interior point x* in X is a local minimum iff t ↦ f(x(t)) has a local minimum at t = 0 for all smooth curves t ↦ x(t) in X defined on some open interval containing 0 with x(0) = x*. (Alternatively, if J_g(x*) has full rank, use the implicit function theorem to obtain a local parametrization x = x(s) of the feasible set with, say, x(0) = x*, and consider the conditions for f(x(s)) to have an unconstrained local minimum at s = 0.) Slide 36

44 Theory We have
d f(x(t))/dt = Σ_j (∂f/∂x_j)(x(t)) ẋ_j(t),
d² f(x(t))/dt² = Σ_{j,k} (∂²f/∂x_j∂x_k)(x(t)) ẋ_j(t) ẋ_k(t) + Σ_j (∂f/∂x_j)(x(t)) ẍ_j(t).
From the first: for a local minimum, necessarily ⟨∇f(x*), s⟩ = 0 for all feasible directions s at x*. Slide 37

45 Theory Suppose that X is defined by equality constraints g(x) = [g_1(x), ..., g_m(x)]' = 0. Then g_i(x(t)) ≡ 0 and hence
Σ_j (∂g_i/∂x_j)(x(t)) ẋ_j(t) = 0.
Hence: the set of feasible directions at x* is given by Null(J_g(x*)). The necessary first order condition translates into: ∇f(x*) ∈ (Null(J_g(x*)))^⊥ = Range(J_g(x*)'), i.e., there exist Lagrange multipliers λ such that ∇f(x*) = −J_g(x*)'λ. Slide 38

46 Theory Differentiating once more,
Σ_{j,k} (∂²g_i/∂x_j∂x_k)(x(t)) ẋ_j(t) ẋ_k(t) + Σ_j (∂g_i/∂x_j)(x(t)) ẍ_j(t) = 0.
Multiply by λ_i from above and add to the expression for d²f(x(t))/dt² at t = 0:
d²f(x(t))/dt² |_{t=0} = Σ_{j,k} [(∂²f/∂x_j∂x_k)(x*) + Σ_i λ_i (∂²g_i/∂x_j∂x_k)(x*)] s_j s_k + Σ_j [(∂f/∂x_j)(x*) + Σ_i λ_i (∂g_i/∂x_j)(x*)] ẍ_j(0)
= s'(H_f(x*) + Σ_i λ_i H_{g_i}(x*)) s = s'B(x*, λ)s. Slide 39

48 Theory For a local minimum, we must thus also have s'B(x*, λ)s ≥ 0 for all feasible directions s at x*, or equivalently, for all s ∈ Null(J_g(x*)). Let Z be a basis for Null(J_g(x*)). Then:
x* constrained local minimum ⇒ ∇f(x*) = −J_g(x*)'λ and the projected (reduced) Hessian Z'B(x*, λ)Z non-negative definite;
∇f(x*) = −J_g(x*)'λ and Z'B(x*, λ)Z positive definite ⇒ x* constrained local minimum. Slide 40

49 Theory The Lagrangian L(x, λ) = f(x) + λ'g(x) has
∇L(x, λ) = [∇f(x) + J_g(x)'λ; g(x)], H_L(x, λ) = [B(x, λ), J_g(x)'; J_g(x), O].
The above first-order conditions conveniently translate into (x*, λ*) being a critical point of L. However, this is a saddle point: H_L cannot be positive definite. Need to check Z'B(x*, λ)Z as discussed above. If there are also inequality constraints h_k(x) ≤ 0: the corresponding λ_k ≥ 0 and h_k(x*)λ_k = 0 (the k with λ_k > 0 give the active set, where h_k(x*) = 0). Slide 41

50 Sequential Quadratic Programming Consider a minimization problem with equality constraints. Suppose we use Newton's method to determine the critical points of the Lagrange function, i.e., to solve
∇L(x, λ) = [∇f(x) + J_g(x)'λ; g(x)] = 0.
This gives the system
[B(x_k, λ_k), J_g(x_k)'; J_g(x_k), O] [s_k; δ_k] = −[∇f(x_k) + J_g(x_k)'λ_k; g(x_k)],
which gives the first-order conditions for the constrained optimization problem
min_s (1/2) s'B(x_k, λ_k)s + s'(∇f(x_k) + J_g(x_k)'λ_k), J_g(x_k)s + g(x_k) = 0. Slide 42

51 Sequential Quadratic Programming Why? Dropping arguments and writing c = ∇f(x_k) + J_g(x_k)'λ_k, J = J_g(x_k), g = g(x_k), we want to solve
(1/2) s'Bs + s'c → min, Js + g = 0.
The Lagrangian is L(s, δ) = (1/2) s'Bs + s'c + δ'(Js + g), for which the first-order conditions are
Bs + c + J'δ = 0, Js + g = 0.
Equivalently,
[B, J'; J, O] [s; δ] = −[c; g]. Slide 43

52 Sequential Quadratic Programming I.e., when using Newton's method to determine the critical points of the Lagrange function, each Newton step amounts to solving a quadratic optimization problem under linear constraints: hence, the method is known as Sequential Quadratic Programming. One does not necessarily solve the linear system directly (remember that the Hessian is symmetric but indefinite, so a Choleski factorization is not possible). Slide 44

53 Penalty Methods Idea: convert the constrained problem into a sequence of unconstrained problems featuring a penalty function to achieve (approximate) feasibility. E.g., for the equality-constrained problem
f(x) → min, g(x) = 0,
one considers
φ_ρ(x) = f(x) + ρ‖g(x)‖²/2 → min.
Under appropriate conditions it can be shown that the solutions x*(ρ) converge to the solution of the original problem as ρ → ∞. Slide 45
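
A minimal R sketch of this quadratic penalty approach, for the toy problem f(x) = x_1 + x_2 → min subject to x_1² + x_2² = 1 (an assumed example; the solution is (−1/√2, −1/√2)):

f <- function(x) x[1] + x[2]
g <- function(x) x[1]^2 + x[2]^2 - 1
penalty_solve <- function(x0, rhos = 10^(0:6)) {
    x <- x0
    for (rho in rhos) {
        phi <- function(x) f(x) + rho * g(x)^2 / 2   # penalized objective
        x <- optim(x, phi)$par                       # warm start the next solve
    }
    x
}
penalty_solve(c(0, 0))   # approaches (-0.707, -0.707) as rho grows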

54 Augmented Lagrangian Methods For the penalty method, letting ρ increase without bound often leads to problems, which motivates instead using an augmented Lagrangian function
L_ρ(x, λ) = f(x) + λ'g(x) + ρ‖g(x)‖²/2,
for which the Lagrange multiplier estimates can be manipulated in ways that keep them bounded while converging to the constrained optimum: augmented Lagrangian methods. Illustration where a fixed ρ > 0 suffices: convex minimization with linear constraints. Slide 46

55 Augmented Lagrangian Methods Primal problem:
f(x) → min, Ax = b,
with f convex. Duality theory: with Lagrangian L(x, λ) = f(x) + λ'(Ax − b), the dual function is
inf_x L(x, λ) = −f*(−A'λ) − b'λ
(where f* is the convex conjugate), and the optimality conditions are
Ax − b = 0, ∇f(x) + A'λ = 0
(primal and dual feasibility). Slide 47

56 Augmented Lagrangian Methods Consider the augmented Lagrangian
L_ρ(x, λ) = f(x) + λ'(Ax − b) + (ρ/2)‖Ax − b‖².
It can be viewed as the unaugmented Lagrangian for
f(x) + (ρ/2)‖Ax − b‖² → min, Ax = b.
Consider the iteration
x_{k+1} = argmin_x L_ρ(x, λ_k), λ_{k+1} = λ_k + ρ(Ax_{k+1} − b). Slide 48

57 Augmented Lagrangian Methods By definition, x_{k+1} minimizes L_ρ(x, λ_k), hence
0 = ∇_x L_ρ(x_{k+1}, λ_k) = ∇f(x_{k+1}) + A'(λ_k + ρ(Ax_{k+1} − b)) = ∇f(x_{k+1}) + A'λ_{k+1}.
Thus, the iteration ensures dual feasibility, convergence implies primal feasibility, and hence altogether optimality. The general case uses the same idea, but may need increasing ρ = ρ_k. (For additional inequality constraints, add slack variables and use bound constraints.) Slide 49
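
A minimal R sketch of this iteration for a convex toy problem (an assumed example; the inner minimization simply uses optim()):

auglag_solve <- function(f, A, b, x0, rho = 10, iters = 25) {
    x <- x0
    lambda <- rep(0, nrow(A))
    for (k in seq_len(iters)) {
        L_rho <- function(x) {
            res <- drop(A %*% x) - b
            f(x) + sum(lambda * res) + rho * sum(res^2) / 2
        }
        x <- optim(x, L_rho)$par                          # x-update
        lambda <- lambda + rho * (drop(A %*% x) - b)      # multiplier update
    }
    list(par = x, lambda = lambda)
}
## Minimize ||x||^2 subject to x1 + x2 = 1 (solution (0.5, 0.5), lambda = -1).
auglag_solve(function(x) sum(x^2), matrix(c(1, 1), 1, 2), 1, c(0, 0))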

58 Barrier Methods Also known as interior point methods. Use barrier functions which ensure feasibility and increasingly penalize feasible points as they approach the boundary of the feasible set. E.g., with constraints h(x) ≤ 0,
φ_μ(x) = f(x) − μ Σ_i 1/h_i(x), φ_μ(x) = f(x) − μ Σ_i log(−h_i(x))
are the inverse and logarithmic barrier functions, respectively. Idea: for μ > 0, the unconstrained minimum x_μ of φ_μ is necessarily feasible; as μ → 0, x_μ → x* (the constrained minimum). Need feasible starting values and keep all approximations in the interior of the feasible set, but allow convergence to points on the boundary. Slide 50

59 Outline Root Finding Unconstrained Optimization Constrained Optimization Optimization with R Slide 51

62 Optimization with R: Base Packages nlm() does a Newton-type algorithm for unconstrained minimization. optim() provides several methods, including the Nelder-Mead simplex algorithm (default: derivative-free but slow, not discussed above), quasi-Newton (BFGS), conjugate gradient, as well as simulated annealing (a stochastic global optimization method, not discussed above). The quasi-Newton method can also handle box constraints of the form ℓ ≤ x ≤ r (where ℓ and r may be −∞ and +∞, respectively); this variant is known as L-BFGS-B. nlminb(), now documented to be for historical compatibility, provides code from the PORT library by Gay for unconstrained or box-constrained optimization using trust region methods. Slide 52

63 Optimization with R: Base Packages constrOptim() performs minimization under linear inequality constraints using an adaptive (logarithmic) barrier algorithm in combination with the Nelder-Mead or BFGS minimizers. Note that this is an interior point method, so there must be interior points! I.e., linear equality constraints cannot be handled. Slide 53
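
A short constrOptim() sketch (an assumed toy problem; the feasible region x_1 + x_2 ≤ 1, x ≥ 0 is encoded via ui %*% x − ci ≥ 0):

fquad <- function(x) (x[1] - 2)^2 + (x[2] - 1)^2
ui <- rbind(c(-1, -1),   # -x1 - x2 >= -1, i.e. x1 + x2 <= 1
            c( 1,  0),   #  x1 >= 0
            c( 0,  1))   #  x2 >= 0
ci <- c(-1, 0, 0)
constrOptim(c(0.2, 0.2), fquad, grad = NULL, ui = ui, ci = ci)   # optimum near (1, 0)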

67 Optimization with R: Contributed Packages Several add-on packages perform linear programming. Package quadprog performs quadratic programming (the strictly convex case with linear equality and/or inequality constraints). Package BB provides BBoptim(), which itself wraps around spg(), which implements the spectral projected gradient method for large-scale optimization with convex constraints; it needs a function for projecting onto the feasible set. Package nloptr provides an interface to NLopt, a general purpose open source library for non-linear unconstrained and constrained optimization. One needs to consult the NLopt web page to learn about the available methods. Slide 54

68 Optimization with R: Contributed Packages Package alabama provides auglag(), an Augmented Lagrangian routine for smooth constrained optimization, which conveniently does not require feasible starting values. Slide 55

70 Example Rosenbrock banana function
f(x) = 100(x_2 − x_1²)² + (1 − x_1)².
Clearly has a global minimum at x* = (1, 1). Multi-dimensional generalization: e.g., the extended Rosenbrock function
Σ_{i=1}^{n} (100(x_{2i} − x_{2i−1}²)² + (1 − x_{2i−1})²)
(a sum of n uncoupled 2-d Rosenbrock functions). Slide 56
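
A sketch of minimizing the 2-d Rosenbrock function with the base tools listed above (along the lines of the example in ?optim):

fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
grr <- function(x) c(-400 * x[1] * (x[2] - x[1]^2) - 2 * (1 - x[1]),
                      200 * (x[2] - x[1]^2))
optim(c(-1.2, 1), fr)                          # Nelder-Mead (default)
optim(c(-1.2, 1), fr, grr, method = "BFGS")    # quasi-Newton
optim(c(-1.2, 1), fr, grr, method = "CG")      # conjugate gradient
nlm(fr, c(-1.2, 1))                            # Newton-type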
