Interior-Point and Augmented Lagrangian Algorithms for Optimization and Control


1 Interior-Point and Augmented Lagrangian Algorithms for Optimization and Control Stephen Wright, University of Wisconsin-Madison, May 2014. Wright (UW-Madison) Constrained Optimization May 2014 / 46

2 In This Section... My first talk was about optimization formulations, optimality conditions, and duality for LP, QP, LCP, and nonlinear optimization. This section will review some algorithms, in particular: Primal-dual interior-point (PDIP) methods; Augmented Lagrangian (AL) methods. Both are useful in control applications. We'll say something about PDIP methods for model-predictive control, and how they exploit the structure in that problem.

3 Recapping Gradient Methods Consider unconstrained minimization min f(x), where f is smooth and convex, or the constrained version in which x is restricted to the set Ω, usually closed and convex. First-order or gradient methods take steps of the form x_{k+1} = x_k - α_k g_k, where α_k ∈ R_+ is a steplength and g_k is a search direction. g_k is constructed from knowledge of the gradient ∇f(x) at the current iterate x = x_k and possibly previous iterates x_{k-1}, x_{k-2}, .... Can extend to nonsmooth f by using the subgradient ∂f(x). Extend to constrained minimization by projecting the search line onto the convex set Ω, or (similarly) minimizing a linear approximation to f over Ω.
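The gradient step with projection described on this slide can be sketched in a few lines of Python. The objective 0.5*||x - b||², the set Ω = {x ≥ 0}, and the data below are illustrative assumptions, not from the slides.

```python
import numpy as np

# A minimal sketch of a first-order method with projection:
# x_{k+1} = P_Omega(x_k - alpha * g_k), with g_k the gradient at x_k.

def projected_gradient(grad, project, x0, alpha=0.1, iters=500):
    x = x0
    for _ in range(iters):
        x = project(x - alpha * grad(x))    # gradient step, then projection
    return x

b = np.array([1.0, -2.0, 3.0])
grad = lambda x: x - b                      # gradient of 0.5*||x - b||^2
project = lambda x: np.maximum(x, 0.0)      # projection onto the nonnegative orthant
x_star = projected_gradient(grad, project, np.zeros(3))
# Minimizer over x >= 0 is max(b, 0) componentwise: (1, 0, 3).
```

With a fixed steplength this converges linearly on the strongly convex example above; in general α_k would be chosen by a line-search or a standard steplength rule.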

4 Prox Interpretation of Line Search Can view the gradient method step x_{k+1} = x_k - α_k g_k as the minimization of a first-order model of f plus a prox-term which prevents the step from being too long: x_{k+1} = arg min_x f(x_k) + g_k^T (x - x_k) + (1/(2α_k)) ||x - x_k||²_2. Taking the gradient of the quadratic and setting it to zero, we obtain g_k + (1/α_k)(x_{k+1} - x_k) = 0, which gives the formula for x_{k+1}. This viewpoint is the key to several extensions.

5 Extensions: Constraints When a constraint set Ω is present we can simply minimize the quadratic model function over Ω: x_{k+1} = arg min_{x ∈ Ω} f(x_k) + g_k^T (x - x_k) + (1/(2α_k)) ||x - x_k||²_2. Gradient Projection has this form. We can replace the ℓ_2-norm measure of distance with some other measure φ(x; x_k): x_{k+1} = arg min_{x ∈ Ω} f(x_k) + g_k^T (x - x_k) + (1/(2α_k)) φ(x; x_k). Could choose φ to match Ω. For example, a measure derived from the entropy function is a good match for the simplex Ω := {x | x ≥ 0, e^T x = 1}.

6 Extensions: Regularizers In many modern applications of optimization, f has the form f(x) = l(x) + τψ(x), where l is a smooth function and ψ is a simple nonsmooth function. Can extend the prox approach above by choosing g_k to contain gradient information from l(x) only, and including τψ(x) explicitly in the subproblem. Subproblems are thus: x_{k+1} = arg min_x l(x_k) + g_k^T (x - x_k) + (1/(2α_k)) ||x - x_k||² + τψ(x).
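For ψ = ||·||_1 the subproblem above has a closed-form solution (soft-thresholding), giving the familiar proximal-gradient iteration. A sketch in Python; the smooth part l(x) = 0.5*||x - b||² and the data b, τ are illustrative assumptions.

```python
import numpy as np

# Prox-gradient sketch for f = l + tau*psi with psi = ||.||_1:
# the subproblem arg-min is x_{k+1} = shrink(x_k - alpha*g_k, alpha*tau).

def shrink(v, t):
    """Soft-thresholding: the prox operator of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_gradient(grad_l, x0, tau, alpha=0.5, iters=200):
    x = x0
    for _ in range(iters):
        x = shrink(x - alpha * grad_l(x), alpha * tau)
    return x

b = np.array([3.0, 0.2, -1.5])
tau = 1.0
grad_l = lambda x: x - b                    # l(x) = 0.5*||x - b||^2
x_star = prox_gradient(grad_l, np.zeros(3), tau)
# Minimizer of 0.5*||x - b||^2 + tau*||x||_1 is shrink(b, tau) = (2, 0, -0.5).
```

Each iteration only needs the gradient of the smooth part plus a cheap componentwise shrink, which is the point of keeping τψ explicit in the subproblem.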

7 Extensions: Explicit Trust Regions Rather than penalizing distance moved from current x_k, we can enforce an explicit constraint: a trust region. x_{k+1} = arg min_x f(x_k) + g_k^T (x - x_k) + I_{||x - x_k|| ≤ Δ_k}(x), where I_Λ(x) denotes an indicator function with I_Λ(x) = 0 if x ∈ Λ, ∞ otherwise. Adjust the trust-region radius Δ_k to ensure progress, e.g. descent in f.

8 Extension: Proximal Point Could use the original f in the subproblem rather than a simpler model function: x_{k+1} = arg min_x f(x) + (1/(2α_k)) ||x - x_k||²_2. Although the subproblem seems just as hard to solve as the original, the prox-term may make it easier by introducing strong convexity, and may stabilize progress. Can extend to constrained and regularized cases also.

9 Quadratic Models: Newton's Method We can extend the iterative strategy further by adding a quadratic term to the model, instead of (or in addition to) the simple prox-term above. Taylor's Theorem suggests basing this term on the Hessian (second-derivative) matrix. That is, obtain the step from x_{k+1} := arg min_x f(x_k) + ∇f(x_k)^T (x - x_k) + (1/2)(x - x_k)^T ∇²f(x_k)(x - x_k). Can reformulate to solve for the step d_k: d_k := arg min_d f(x_k) + ∇f(x_k)^T d + (1/2) d^T ∇²f(x_k) d. Then we have x_{k+1} = x_k + d_k. See immediately that this model won't have a bounded solution if ∇²f(x_k) is not positive definite. It usually is positive definite near a strict local solution x*, but we need something that works more globally.

10 Practical Newton Method One obvious strategy is to add the prox-term to the quadratic model: d_k := arg min_d f(x_k) + ∇f(x_k)^T d + (1/2) d^T (∇²f(x_k) + (1/α_k) I) d, choosing α_k so that: the quadratic term is positive definite; some other desirable property holds, e.g. descent f(x_k + d_k) < f(x_k). We can also impose the trust region explicitly: d_k := arg min_d f(x_k) + ∇f(x_k)^T d + (1/2) d^T ∇²f(x_k) d + I_{||d|| ≤ Δ_k}(d), or alternatively: d_k := arg min_{d : ||d|| ≤ Δ_k} f(x_k) + ∇f(x_k)^T d + (1/2) d^T ∇²f(x_k) d. But this is equivalent: for any Δ_k, there exists α_k such that the solutions of the prox form and the trust-region form are identical.
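A sketch of this prox-modified Newton strategy in Python: increase the damping (playing the role of 1/α_k) until a Cholesky factorization certifies positive definiteness, and accept the step only if it gives descent. The nonconvex test function is an illustrative assumption.

```python
import numpy as np

# Prox-modified Newton sketch: add lam*I to the Hessian until positive definite,
# then require descent. f(x) = 0.25*sum(x^4) - sum(x^2) is nonconvex near 0.

def f(x):
    return 0.25 * np.sum(x**4) - np.sum(x**2)

def grad(x):
    return x**3 - 2.0 * x

def hess(x):
    return np.diag(3.0 * x**2 - 2.0)

def damped_newton(x0, iters=50):
    x = x0
    n = len(x)
    for _ in range(iters):
        H, g = hess(x), grad(x)
        lam = 0.0
        while True:
            try:
                np.linalg.cholesky(H + lam * np.eye(n))   # succeeds iff PD
                break
            except np.linalg.LinAlgError:
                lam = max(2.0 * lam, 1e-3)                # increase damping
        d = -np.linalg.solve(H + lam * np.eye(n), g)
        x = x + d if f(x + d) < f(x) else x + 0.1 * d     # crude descent guard
    return x

x_star = damped_newton(np.array([0.1, -0.1]))
# Each component converges to a local minimizer, +/- sqrt(2).
```

Near a strict local minimizer the damping becomes zero and the iteration reduces to pure Newton, recovering fast local convergence.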

11 Quasi-Newton Methods Another disadvantage of Newton is that the Hessian may be difficult to evaluate or otherwise work with. The quadratic model is still useful when we use first-order information to learn about the Hessian. Key observation (from Taylor's theorem): the secant condition: ∇²f(x_k)(x_{k+1} - x_k) ≈ ∇f(x_{k+1}) - ∇f(x_k). The difference of gradients tells us how the Hessian behaves along the direction x_{k+1} - x_k. By aggregating such information over multiple steps, we can build up an approximation to the Hessian that is valid along multiple directions. Quasi-Newton methods maintain an approximation B_k to ∇²f(x_k) that respects the secant condition. The approximation may be implicit rather than explicit, and we may store an approximation to the inverse Hessian instead.

12 L-BFGS A particularly popular quasi-Newton method, suitable for large-scale problems, is the limited-memory BFGS method (L-BFGS), which stores the Hessian or inverse Hessian approximation implicitly. L-BFGS stores the last 5-10 update pairs: s_j := x_{j+1} - x_j, y_j := ∇f(x_{j+1}) - ∇f(x_j), for j = k, k-1, k-2, ..., k-m. Can implicitly construct H_{k+1} that satisfies H_{k+1} y_j = s_j. In fact, an efficient recursive formula is available for evaluating the next search direction d_{k+1} := -H_{k+1} ∇f(x_{k+1}) directly from the (s_j, y_j) pairs and from some initial estimate of the form (1/α_{k+1}) I.
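The recursive formula mentioned here is the standard two-loop recursion. A sketch in Python, applied to a small convex quadratic so the result can be checked; the test problem (A, b), the unit steplength, and the tolerances are illustrative assumptions.

```python
import numpy as np

# L-BFGS two-loop recursion: applies the implicit inverse Hessian approximation
# H_k to a gradient using only the stored (s_j, y_j) pairs.

def two_loop(g, pairs, gamma):
    """Return d = -H_k g, with H_0 = gamma*I and pairs ordered oldest first."""
    q = g.copy()
    alphas = []
    for s, y in reversed(pairs):             # first loop: newest pair first
        a = (s @ q) / (y @ s)
        q = q - a * y
        alphas.append(a)
    r = gamma * q                            # apply the initial estimate H_0
    for (s, y), a in zip(pairs, reversed(alphas)):   # second loop: oldest first
        beta = (y @ r) / (y @ s)
        r = r + (a - beta) * s
    return -r

def lbfgs(grad, x0, m=5, iters=100):
    x, pairs, g = x0, [], grad(x0)
    for _ in range(iters):
        if np.linalg.norm(g) < 1e-10:
            break
        gamma = 1.0
        if pairs:
            s, y = pairs[-1]
            gamma = (s @ y) / (y @ y)        # standard scaling of H_0
        x_new = x + two_loop(g, pairs, gamma)   # unit step; real codes line-search
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        if s @ y > 1e-12:                    # keep pair only if curvature positive
            pairs = (pairs + [(s, y)])[-m:]  # retain the last m pairs
        x, g = x_new, g_new
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])      # SPD: minimize 0.5 x'Ax - b'x
b = np.array([1.0, 1.0])
x_star = lbfgs(lambda x: A @ x - b, np.zeros(2))
# x_star approximately solves A x = b.
```

The cost per iteration is O(mn) vector operations, with no matrix ever formed, which is what makes the method attractive at large scale.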

13 Newton for Nonlinear Equations There is also a variant of Newton's method for nonlinear equations: Find x such that F(x) = 0, where F : R^n → R^n (n equations in n unknowns). Newton's method forms a linear approximation to this system, based on another variant of Taylor's Theorem, which says F(x + d) = F(x) + J(x)d + ∫₀¹ [J(x + td) - J(x)] d dt, where J(x) is the Jacobian matrix of first partial derivatives: the n × n matrix whose (i, j) entry is ∂F_i/∂x_j (usually not symmetric).

14 When F is continuously differentiable, we have F(x_k + d) ≈ F(x_k) + J(x_k)d, so the Newton step is the one that makes the right-hand side zero: d_k := -J(x_k)^{-1} F(x_k). The basic Newton method takes steps x_{k+1} := x_k + d_k. Its effectiveness can be improved by: Doing a line search: x_{k+1} := x_k + α_k d_k, for some α_k > 0; Levenberg strategy: add λI to J and set d_k := -(J(x_k) + λI)^{-1} F(x_k); Guiding progress via a merit function, usually φ(x) := (1/2) ||F(x)||²_2. Achtung! Can get stuck in a local min of φ that's not a solution of F(x) = 0.
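The line-search enhancement above can be sketched as Newton with backtracking on the merit function φ(x) = 0.5*||F(x)||². The 2x2 test system (a circle intersected with a line) is an illustrative assumption.

```python
import numpy as np

# Newton's method for F(x) = 0 with backtracking on phi(x) = 0.5*||F(x)||^2.

def newton_solve(F, J, x0, iters=50, tol=1e-10):
    x = x0
    for _ in range(iters):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            break
        d = np.linalg.solve(J(x), -Fx)          # Newton step: J d = -F
        alpha, phi0 = 1.0, 0.5 * Fx @ Fx
        while 0.5 * np.sum(F(x + alpha * d)**2) > phi0 and alpha > 1e-8:
            alpha *= 0.5                         # backtrack until merit decreases
        x = x + alpha * d
    return x

# Example: intersect the circle x1^2 + x2^2 = 4 with the line x1 = x2.
F = lambda x: np.array([x[0]**2 + x[1]**2 - 4.0, x[0] - x[1]])
J = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])
x_star = newton_solve(F, J, np.array([1.0, 0.5]))
# Converges to the solution (sqrt(2), sqrt(2)).
```

The merit function guides the steplength here, but as the slide warns, on harder problems the same merit function can trap the iterates in a local minimum of φ where F is nonzero.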

15 Homotopy Tries to avoid the local-min issue with the merit function. Start with an easier set of nonlinear equations, and gradually deform it to the system F(x) = 0, tracking changes to the solution as you go. F(x, λ) := λF(x) + (1 - λ)F_0(x), λ ∈ [0, 1]. Assume that F(x, 0) = F_0(x) = 0 has solution x_0. Homotopy methods trace the curve of solutions (x, λ) until λ = 1 is reached. The corresponding value of x then solves the original problem. Many variants. Some supporting theory. Typically more expensive than enhanced Newton methods, but better at finding solutions to F(x) = 0. We mention homotopy mostly because of its connection to interior-point methods.
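A minimal sketch of the deformation idea: step λ from 0 to 1, correcting with a few Newton iterations on F(·, λ) at each value, starting each correction from the previous solution. The scalar example and the trivial F_0(x) = x - 1 are illustrative assumptions.

```python
import numpy as np

# Homotopy continuation sketch: deform F_0(x) = x - 1 into F(x) = 0 via
# H(x, lam) = lam*F(x) + (1 - lam)*F_0(x), tracking the solution in lam.

def F(x):
    return x**3 - 2.0 * x - 5.0              # target system (scalar)

def dF(x):
    return 3.0 * x**2 - 2.0

x = 1.0                                       # solves F_0(x) = x - 1 = 0
for lam in np.linspace(0.0, 1.0, 101)[1:]:
    for _ in range(5):                        # Newton corrector on H(., lam)
        H = lam * F(x) + (1.0 - lam) * (x - 1.0)
        dH = lam * dF(x) + (1.0 - lam)
        x -= H / dH
# At lam = 1, x solves F(x) = 0 (the root is near 2.0946).
```

Because each corrector starts very close to the solution curve, plain Newton suffices at every λ; this predictor-corrector pattern is the same one interior-point methods exploit along the central path.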

16 Interior-Point Methods Recall the monotone LCP: Find z ∈ R^n such that 0 ≤ z ⊥ Mz + q ≥ 0, where M ∈ R^{n×n} is positive semidefinite, and q ∈ R^n. Recall too that monotone LCP is a generalization of LP and convex QP. Rewrite LCP as w = Mz + q, (w, z) ≥ 0, w_i z_i = 0, i = 1, 2, ..., n, which is a constrained system of nonlinear equations: F(z, w) = 0, (z, w) ≥ 0, where F : R^{2n} → R^{2n} is defined as F(z, w) := [Mz + q - w; WZe], with W = diag(w_1, w_2, ..., w_n), Z = diag(z_1, z_2, ..., z_n), e = (1, 1, ..., 1)^T.

17 Interior-Point Methods Interior-point methods generate iterates that satisfy the nonnegativity constraints strictly, that is, (z^k, w^k) > 0 for all k. An obvious strategy is to search along Newton directions for F, with a line search to maintain positivity of (z^{k+1}, w^{k+1}). The Newton equations are: [M  -I; W  Z] [Δz; Δw] = -[Mz + q - w; WZe]. This affine-scaling approach can actually work, and convergence can be proved, but it needs very precise conditions on the steplength to avoid getting jammed at the boundary. Much better performance is obtained from methods that use homotopy, in particular, the idea of following a central path.

18 The Central Path The central path is a critical object in primal-dual interior-point methods. It's defined by the parametric equations: F(z, w; τ) := [Mz + q - w; WZe - τe] = [0; 0], where τ ≥ 0 is the central path parameter, along with the conditions (z, w) > 0. The second block of conditions is saying that w_i z_i = τ for all i = 1, 2, ..., n, so if τ > 0, we must have all components of w and z strictly positive, hence the term interior point. (Footnote: Actually, interior-point methods don't force the first condition Mz + q - w = 0 to hold except in the limit, so the iterates are not really interior in the sense of being strictly feasible.)

19 Primal-Dual Interior-Point Methods The central path guides iterates to the solution. The basic primal-dual interior-point method works as follows: Compute Newton steps for F(z, w; τ_k) for some value of τ_k. Take a step (z^{k+1}, w^{k+1}) = (z^k, w^k) + α_k (Δz^k, Δw^k), choosing α_k so that (z^{k+1}, w^{k+1}) remains sufficiently positive. Possibly enhance with some auxiliary corrector steps. Choose a smaller value τ_{k+1} < τ_k and repeat. The effect is that it chases a moving target along the central path, staying reasonably close to the central path to avoid getting jammed near the boundary of the region (w, z) ≥ 0.

20 A Simple PDIP Method Usually works, simple to code (one page!): Choose τ_{k+1} = 0.9 τ_k; Choose α_k to be the largest α for which (z_i^k + α Δz_i^k)(w_i^k + α Δw_i^k) ≥ 0.01 (z^k + α Δz^k)^T (w^k + α Δw^k)/n, for all i. The second condition ensures that iterates stay in a loose neighborhood of the central path: none of the variables go to zero prematurely.
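A sketch of this simple PDIP method in Python, applied to the monotone LCP of the earlier slides; the backtracking choice of α_k (shrinking by 0.9 until the neighborhood condition holds) and the 2x2 test data (M, q) are illustrative assumptions.

```python
import numpy as np

# Simple PDIP sketch for the monotone LCP: 0 <= z, w = Mz + q >= 0, z_i w_i = 0.
# Uses tau_{k+1} = 0.9*tau_k and keeps iterates in the loose neighborhood
# z_i*w_i >= 0.01*(z^T w)/n.

def pdip_lcp(M, q, iters=200):
    n = len(q)
    z, w = np.ones(n), np.ones(n)
    tau = (z @ w) / n
    for _ in range(iters):
        tau *= 0.9
        # Newton system for F(z, w; tau) = (Mz + q - w, ZWe - tau e) = 0.
        rhs = -np.concatenate([M @ z + q - w, z * w - tau])
        A = np.block([[M, -np.eye(n)], [np.diag(w), np.diag(z)]])
        step = np.linalg.solve(A, rhs)
        dz, dw = step[:n], step[n:]
        alpha = 1.0
        while alpha > 1e-12:                 # backtrack into the neighborhood
            zt, wt = z + alpha * dz, w + alpha * dw
            if np.all(zt > 0) and np.all(wt > 0) and \
               np.all(zt * wt >= 0.01 * (zt @ wt) / n):
                break
            alpha *= 0.9
        z, w = z + alpha * dz, w + alpha * dw
    return z, w

M = np.array([[2.0, 1.0], [1.0, 2.0]])       # positive definite, hence monotone
q = np.array([-1.0, 1.0])
z, w = pdip_lcp(M, q)
# Solution of this LCP: z = (0.5, 0), w = Mz + q = (0, 1.5).
```

The slide's exact rule picks the largest feasible α in closed form; the backtracking loop above is a simpler stand-in that enforces the same neighborhood condition.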

21 A More Elaborate PDIP Method (Mehrotra) Chooses τ_{k+1} by a heuristic based on the performance of the affine-scaling step (Δz_aff^k, Δw_aff^k), which is computed by solving the Newton equations with τ = 0. If we can take a long step along the affine-scaling direction before reaching the boundary, choose τ_{k+1} much smaller than τ_k: an aggressive choice. Otherwise, if the affine-scaling step quickly jams at the boundary, make a more conservative choice, τ_{k+1} only a little smaller than τ_k. Mehrotra's method also includes a corrector step that corrects for the nonlinearity revealed by the affine-scaling step. The combined centering-corrector step is obtained from [M  -I; W  Z] [Δz_cc; Δw_cc] = [0; τ_{k+1} e - ΔZ_aff^k ΔW_aff^k e]. This is added to the affine-scaling step to obtain the search direction.

22 Solving the Linear Equations At each iteration we need to solve two or more linear systems of the form: [M  -I; W  Z] [Δz; Δw] = [r_f; r_zw], where W and Z are the diagonals of w^k and z^k, which contain all positive numbers. r_f and r_zw are different right-hand sides (affine-scaling, centering-corrector). Note that there is a lot of structure in this system: three of the blocks are n × n nonsingular diagonal matrices, and the structure is the same at every interior-point iteration.

23 Reduced Form Can do block elimination (of Δw) to obtain (M + Z^{-1}W) Δz = r_f + Z^{-1} r_zw. Note that when M is positive semidefinite (as it is for monotone LCP), then M + Z^{-1}W is positive definite. If we can identify a good strategy for reordering rows and columns of M to solve this efficiently, we can re-use this ordering at every iteration, because M + Z^{-1}W has the same nonzero pattern (only the diagonals change). Since some elements of Z^{-1}W are going to ∞ as the iterates near a solution while others are going to zero, this system can become highly ill-conditioned, but it doesn't seem to matter much.

24 Application to LP When we apply this technique to LCPs arising from LP, M has a particular form, and the linear system that we solve has the form: [0  A^T  I; A  0  0; S  0  X] [Δx; Δλ; Δs] = [r_p; r_d; r_xs], where X = diag(x_1, x_2, ..., x_n) and S = diag(s_1, s_2, ..., s_n). By eliminating Δs we obtain a reduced form: [-X^{-1}S  A^T; A  0] [Δx; Δλ] = [r_p - X^{-1} r_xs; r_d]. Since X^{-1}S is positive diagonal we can go a step further and eliminate Δx: A(XS^{-1})A^T Δλ = r_d + A XS^{-1} (r_p - X^{-1} r_xs).

25 Thus we have a coefficient matrix of the form ADA^T, where D = XS^{-1} is positive diagonal. Some elements of D go to zero as the iterates converge (those for which x_i = 0 at the solution) while others go to ∞ (those for which s_i = 0). Hence the system may be ill-conditioned. In practical codes, this matrix product is actually formed and factored at every iteration, using a variant of the Cholesky factorization. This factorization is stable regardless of the ordering of rows/columns, so we are free to use orderings that create the least fill-in during Cholesky. Good codes: MOSEK, Gurobi, CPLEX. Freebie: PCx.

26 Optimal Control (Open Loop) Given current state x_0, choose a time horizon N (long), and solve the optimization problem for x = {x_k}_{k=0}^N, u = {u_k}_{k=0}^{N-1}: min L(x, u), subject to x_0 given, x_{k+1} = F_k(x_k, u_k), k = 0, 1, ..., N-1, other constraints on x, u. Then apply controls u_0, u_1, u_2, ... blindly, without monitoring the state. Flexible with respect to nonlinearity and constraints; if model F_k is inaccurate, solution may be bad; doesn't account for system disturbances during time horizon; never used in industrial practice!

27 Linear-Quadratic Regulator (Closed Loop) A canonical control problem. For given x_0, solve: min_{x,u} Φ(x, u) := (1/2) Σ_{k=0}^∞ (x_k^T Q x_k + u_k^T R u_k) s.t. x_{k+1} = A x_k + B u_k. From KKT conditions, dependence of optimal values of x_1, x_2, ... and u_0, u_1, ... on initial x_0 is linear. We can substitute these variables out to obtain min_{x,u} Φ(x, u) = (1/2) x_0^T Π x_0, for some s.p.d. matrix Π. By using this dynamic programming principle, isolating the first stage, we can write the problem as: min_{x_1, u_0} (1/2)(x_0^T Q x_0 + u_0^T R u_0) + (1/2) x_1^T Π x_1 s.t. x_1 = A x_0 + B u_0.

28 By substituting for x_1, we get an unconstrained quadratic problem in u_0. The minimizer is u_0 = Kx_0, where K = -(R + B^T Π B)^{-1} B^T Π A, so that x_1 = A x_0 + B u_0 = (A + BK) x_0. By substituting for u_0 and x_1 in (1/2) x_0^T Π x_0 = (1/2)(x_0^T Q x_0 + u_0^T R u_0) + (1/2) x_1^T Π x_1, we obtain the Riccati equation: Π = Q + A^T Π A - A^T Π B (R + B^T Π B)^{-1} B^T Π A. There are well-known techniques to solve this equation for Π, hence K. Hence, we have a feedback control law u = Kx that is optimal for the LQR problem. This is a closed-loop control strategy that can respond to changes in state.
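One of the well-known techniques for solving the Riccati equation is simple fixed-point iteration, after which the gain K follows directly. A sketch; the system matrices (a discretized double integrator) and weights are illustrative assumptions.

```python
import numpy as np

# Solve Pi = Q + A'Pi A - A'Pi B (R + B'Pi B)^{-1} B'Pi A by fixed-point
# iteration, then form the LQR feedback gain K with u = K x.

A = np.array([[1.0, 0.1],
              [0.0, 1.0]])                  # double-integrator-like dynamics
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

Pi = Q.copy()
for _ in range(2000):
    gain_term = np.linalg.solve(R + B.T @ Pi @ B, B.T @ Pi @ A)
    Pi_new = Q + A.T @ Pi @ A - A.T @ Pi @ B @ gain_term
    if np.max(np.abs(Pi_new - Pi)) < 1e-12:
        Pi = Pi_new
        break
    Pi = Pi_new

K = -np.linalg.solve(R + B.T @ Pi @ B, B.T @ Pi @ A)   # feedback law u = K x
rho = np.max(np.abs(np.linalg.eigvals(A + B @ K)))     # closed-loop spectral radius
# rho < 1: the closed loop x_{k+1} = (A + BK) x_k is stable.
```

This iteration is just finite-horizon dynamic programming run to convergence; production codes would instead use a structured eigenvalue method (e.g. a discrete algebraic Riccati equation solver).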

29 Model Predictive Control (MPC) Closed-loop has the highly desirable property of being able to respond to system disturbances and modeling errors. But closed-form solutions can be found only for simple models such as LQR. Open-loop allows for a rich variety of models and constraints, and yields a highly structured optimization problem that can often be solved efficiently. But it can't fix disturbances and errors. Set and forget at your peril! Model-Predictive Control (MPC) is a way of doing closed-loop control using open-loop / optimal-control techniques. Set up an optimal control problem, initialized at the current state, with a finite planning horizon. Solve this open-loop problem (quickly!) and implement the control u_0. At the next decision point, repeat the process. Can use the previous solution (shifted forward one period) as a warm start.

30 Linear MPC A generalization of the linear-quadratic regulator (LQR) problem includes constraints: min_{x,u} (1/2) Σ_{k=0}^∞ (x_k^T Q x_k + u_k^T R u_k), subject to x_{k+1} = A x_k + B u_k, k = 0, 1, 2, ..., x_k ∈ X, u_k ∈ U, possibly also mixed constraints, and constraints on u_{k+1} - u_k. Assuming that 0 ∈ int(X), 0 ∈ int(U), and that the system is stabilizable, we expect that u_k → 0 and x_k → 0 as k → ∞. Therefore, for large enough k, the non-model constraints become inactive.

31 Hence, for N large enough, the problem is equivalent to the following (finite) problem: min_{x,u} (1/2) Σ_{k=0}^{N-1} (x_k^T Q x_k + u_k^T R u_k) + (1/2) x_N^T Π x_N, subject to x_{k+1} = A x_k + B u_k, k = 0, 1, 2, ..., N-1, x_k ∈ X, u_k ∈ U, k = 0, 1, 2, ..., N-1, where Π is the solution of the Riccati equation. In the tail of the sequence (k > N) simply apply the unconstrained LQR feedback law, derived above. When constraints are linear, we need to solve a finite, structured, convex quadratic program.

32 Interior-Point Method for MPC min_{u,x,ε} Σ_{k=0}^{N-1} [(1/2)(x_k^T Q x_k + u_k^T R u_k + 2 x_k^T M u_k + ε_k^T Z ε_k) + z^T ε_k] + (1/2) x_N^T Π x_N, subject to x_0 = x̂_j (fixed), x_{k+1} = A x_k + B u_k, k = 0, 1, ..., N-1, D u_k + G x_k ≤ d, k = 0, 1, ..., N-1, H x_k - ε_k ≤ h, k = 1, 2, ..., N, ε_k ≥ 0, k = 1, 2, ..., N, F x_N = 0. Soft constraints on the state (involving ε_k); also general hard constraints on (x_k, u_k). Can use these to implement constraints on control changes u_{k+1} - u_k.

33 Introduce dual variables, use stagewise ordering. The primal-dual interior-point method yields a block-banded linear system at each iteration. Stage k contributes a block row coupling the unknowns (Δx_k, Δu_k, Δλ_k, Δp_{k+1}, Δξ_{k+1}, Δη_{k+1}, Δε_{k+1}) through the problem data Q, M, R, G, D, A, B, H, Z and diagonal matrices Σ_k^D, Σ_{k+1}^ε, Σ_{k+1}^H (which arise from the interior-point treatment of the inequality constraints), with right-hand sides r_k^x, r_k^u, r_k^λ, r_{k+1}^p, r_{k+1}^ξ, r_{k+1}^η, r_{k+1}^ε. Only the dynamics blocks A, B and identity blocks couple one stage to the next, so the overall matrix is block-banded.

34 By performing block elimination, we get a reduced system whose block rows are: R_0 Δu_0 + B^T Δp_0 = r_0^u; B Δu_0 - Δx_1 = r_0^p; -Δp_0 + Q_1 Δx_1 + M_1 Δu_1 + A^T Δp_1 = r_1^x; M_1^T Δx_1 + R_1 Δu_1 + B^T Δp_1 = r_1^u; A Δx_1 + B Δu_1 - Δx_2 = r_1^p; and so on for stages 2, ..., N-1, ending with Q_N Δx_N + F^T Δβ = r_N^x and F Δx_N = r^β. This has the same structure as the KKT system of a problem without side constraints (soft or hard).

35 Can solve by applying a banded linear solver: O(N) operations. Alternatively, seek matrices Π_k and vectors π_k such that the following relationship is satisfied between p_{k-1} and x_k: p_{k-1} + Π_k x_k = π_k, k = N, N-1, ..., 1. By substituting in the linear system, we find a recurrence relation: Π_N = Q_N, π_N = r_N^x, Π_{k-1} = Q_{k-1} + A^T Π_k A - (A^T Π_k B + M_{k-1})(R_{k-1} + B^T Π_k B)^{-1} (B^T Π_k A + M_{k-1}^T), π_{k-1} = r_{k-1}^x + A^T Π_k r_{k-1}^p + A^T π_k - (A^T Π_k B + M_{k-1})(R_{k-1} + B^T Π_k B)^{-1} (r_{k-1}^u + B^T Π_k r_{k-1}^p + B^T π_k). The recurrence for Π_k is the discrete time-varying Riccati equation!

36 Duality for Nonlinear Programming (Refresher) Recall yesterday's conversation about duality for general constrained problems: min f(x) subject to c_i(x) ≥ 0, i = 1, 2, ..., m, with the Lagrangian defined by: L(x, λ) := f(x) - λ^T c(x) = f(x) - Σ_{i=1}^m λ_i c_i(x). Remember that we defined primal and dual objectives: r(x) = sup_{λ ≥ 0} L(x, λ); q(λ) = inf_x L(x, λ), so the primal and dual problems are: min_x r(x), max_{λ ≥ 0} q(λ).

37 Augmented Lagrangian Can motivate it crudely as proximal point on the dual. Consider the dual: max_{λ ≥ 0} q(λ) = max_{λ ≥ 0} inf_x f(x) - λ^T c(x). Add a proximal point term, a quadratic penalty on the distance moved from the last dual iterate λ^k: max_{λ ≥ 0} inf_x f(x) - λ^T c(x) - (1/(2α_k)) ||λ - λ^k||²_2. Note that the objective here is simple in λ. Now switch the max and inf: inf_x max_{λ ≥ 0} { f(x) - λ^T c(x) - (1/(2α_k)) ||λ - λ^k||² }. We can solve the max problem, to get an explicit value for λ! This is easy because the components of λ are separated in the objective; we can solve for each one individually.

38 The explicit solution is λ_i = 0 if λ_i^k - α_k c_i(x) ≤ 0; λ_i = λ_i^k - α_k c_i(x) otherwise. Substitute this result in the previous expression to get min_x f(x) + Σ_{i : c_i(x) < λ_i^k/α_k} [(α_k/2) c_i(x)² - λ_i^k c_i(x)] - Σ_{i : c_i(x) ≥ λ_i^k/α_k} (1/(2α_k)) (λ_i^k)². Thus the basic augmented Lagrangian process is (iteration k): Minimize the augmented Lagrangian function above for x (approximately); call the result x^{k+1}; Plug this x into the explicit-max formula for λ to get λ^{k+1}; Choose α_{k+1} for the next iteration.

39 Equality Constraints For the equality constrained case, the formulae simplify a lot. We have min_x f(x) subject to d_j(x) = 0, j = 1, 2, ..., p. The dual objective is inf_x L(x, µ) := f(x) - µ^T d(x). The augmented Lagrangian subproblems are: x^{k+1} = arg min_x f(x) - (µ^k)^T d(x) + (α_k/2) ||d(x)||²_2, µ^{k+1} = µ^k - α_k d(x^{k+1}).
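The two-step iteration above can be sketched on a tiny equality-constrained problem whose subproblem is an unconstrained quadratic solvable exactly. The test problem (min 0.5*||x||² subject to x_1 + x_2 = 1) is an illustrative assumption.

```python
import numpy as np

# Augmented Lagrangian sketch for min 0.5*||x||^2 s.t. d(x) = a@x - 1 = 0.
# True solution: x = (0.5, 0.5) with multiplier mu = 0.5.

def aug_lag(alpha=1.0, iters=50):
    a = np.array([1.0, 1.0])                 # constraint gradient
    mu = 0.0                                 # multiplier estimate
    x = np.zeros(2)
    for _ in range(iters):
        # Subproblem: min_x 0.5||x||^2 - mu*(a@x - 1) + (alpha/2)*(a@x - 1)^2.
        # Setting its gradient to zero gives (I + alpha*a a^T) x = (mu + alpha)*a.
        H = np.eye(2) + alpha * np.outer(a, a)
        x = np.linalg.solve(H, (mu + alpha) * a)
        mu = mu - alpha * (a @ x - 1.0)      # multiplier update from the slide
    return x, mu

x, mu = aug_lag()
# Converges linearly (rate 1/3 here) to x = (0.5, 0.5), mu = 0.5.
```

Note that x converges to the constrained minimizer even though each subproblem is unconstrained; larger α speeds the multiplier convergence at the cost of worse subproblem conditioning.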

40 Augmented Lagrangian: History and Practice How accurately to solve the subproblem for x, which is unconstrained but nonlinear? How to adjust α_k? Can use a different α_k value for each constraint; increase it when that constraint is not becoming feasible rapidly enough. Historical sketch: Dates from 1969: Hestenes, Powell. Developments in 1970s to early 1980s by Rockafellar, Bertsekas, others. Lancelot code for nonlinear programming: Conn, Gould, Toint. Largely lost favor as an approach for general nonlinear programming during the next 15 years. Recent revival in the context of sparse optimization and its many applications, in conjunction with splitting / coordinate descent.

41 Separable Objectives: ADMM The Alternating Direction Method of Multipliers (ADMM) arises when the objective in the basic linearly constrained problem is separable: min_{(x,z)} f(x) + h(z) subject to Ax + Bz = c, for which L(x, z, λ; α) := f(x) + h(z) - λ^T (Ax + Bz - c) + (α/2) ||Ax + Bz - c||²_2. The standard augmented Lagrangian would minimize L(x, z, λ; α) over (x, z) jointly, but these are coupled through the quadratic term, so the advantage of separability is lost. Instead, minimize over x and z separately and sequentially: x^k = arg min_x L(x, z^{k-1}, λ^{k-1}; α_k); z^k = arg min_z L(x^k, z, λ^{k-1}; α_k); λ^k = λ^{k-1} - α_k (A x^k + B z^k - c).
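The three ADMM updates above can be sketched with f(x) = 0.5*||x - b||², h(z) = τ||z||₁, A = I, B = -I, c = 0 (so the constraint is x = z), where both subproblems have closed forms. The data b, τ and step α are illustrative assumptions.

```python
import numpy as np

# ADMM sketch on min 0.5||x - b||^2 + tau*||z||_1  s.t.  x = z.

def shrink(v, t):
    """Soft-thresholding, the prox of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

b = np.array([3.0, 0.2, -1.5])
tau, alpha = 1.0, 1.0
x, z, lam = np.zeros(3), np.zeros(3), np.zeros(3)
for _ in range(200):
    # x-step: min_x 0.5||x - b||^2 - lam^T(x - z) + (alpha/2)||x - z||^2
    x = (b + lam + alpha * z) / (1.0 + alpha)
    # z-step: min_z tau||z||_1 + lam^T z + (alpha/2)||x - z||^2 (soft-threshold)
    z = shrink(x - lam / alpha, tau / alpha)
    # multiplier step: lam <- lam - alpha*(Ax + Bz - c)
    lam = lam - alpha * (x - z)
# x and z reach consensus at shrink(b, tau) = (2, 0, -0.5), the minimizer of
# 0.5||x - b||^2 + tau*||x||_1.
```

Each pass touches f and h only through cheap separate subproblems, which is exactly the separability advantage the joint augmented-Lagrangian minimization would destroy.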

42 ADMM Basically, does a round of block-coordinate descent in (x, z). The minimizations over x and z add only a quadratic term to f and h, respectively. This does not alter the cost much. Can perform these minimizations inexactly. Convergence is often slow, but sufficient for many applications. Many recent applications to compressed sensing, image processing, matrix completion, sparse principal components analysis. ADMM has a rich collection of antecedents. A nice recent survey, including a diverse collection of machine learning applications, is Boyd et al. (2011).

43 ADMM for Consensus Optimization Given min_x Σ_{i=1}^m f_i(x), form m copies of x, with the original x as a master variable: min_{x, x^1, x^2, ..., x^m} Σ_{i=1}^m f_i(x^i) subject to x^i = x, i = 1, 2, ..., m. Applying ADMM, with z = (x^1, x^2, ..., x^m), we get x_k^i = arg min_{x^i} f_i(x^i) - (λ_{k-1}^i)^T (x^i - x_{k-1}) + (α_k/2) ||x^i - x_{k-1}||²_2, for each i; x_k = (1/m) Σ_{i=1}^m ( x_k^i - (1/α_k) λ_{k-1}^i ); λ_k^i = λ_{k-1}^i - α_k (x_k^i - x_k), for each i. Obvious parallel possibilities in the x^i updates. Synchronize for the x update.
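The consensus updates above can be sketched with f_i(x) = 0.5*||x - b_i||², so each local subproblem has a closed form and the overall solution is the average of the b_i. The data b_i and step α are illustrative assumptions.

```python
import numpy as np

# Consensus-ADMM sketch for min sum_i 0.5*||x - b_i||^2.

b = [np.array([1.0, 0.0]), np.array([3.0, 2.0]), np.array([2.0, 4.0])]
m, alpha = len(b), 1.0
xs = [np.zeros(2) for _ in range(m)]        # local copies x^i
lams = [np.zeros(2) for _ in range(m)]      # multipliers lambda^i
xbar = np.zeros(2)                          # master variable x

for _ in range(100):
    # Local updates (parallelizable across i):
    # min f_i(x^i) - lam_i^T(x^i - xbar) + (alpha/2)||x^i - xbar||^2
    xs = [(b[i] + lams[i] + alpha * xbar) / (1.0 + alpha) for i in range(m)]
    # Master update (the synchronization point):
    xbar = sum(xs[i] - lams[i] / alpha for i in range(m)) / m
    # Multiplier updates:
    lams = [lams[i] - alpha * (xs[i] - xbar) for i in range(m)]
# xbar converges to the average of the b_i, here (2, 2), with x^i -> xbar.
```

Only the master update needs all the local results, so in a distributed setting each worker owns one f_i and exchanges just its copy and multiplier per round.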

44 ADMM for Awkward Intersections The feasible set is sometimes an intersection of two or more convex sets that are easy to handle separately (e.g. projections are easily computable), but whose intersection is more difficult to work with. Example: Optimization over the cone of doubly nonnegative matrices: min_X f(X) s.t. X ⪰ 0, X ≥ 0. General form: min f(x) s.t. x ∈ Ω_i, i = 1, 2, ..., m. This is just consensus optimization, with indicator functions for the sets: x_k = arg min_x f(x) - Σ_{i=1}^m (λ_{k-1}^i)^T (x - x_{k-1}^i) + (α_k/2) ||x - x_{k-1}^i||²_2; x_k^i = arg min_{x^i ∈ Ω_i} -(λ_{k-1}^i)^T (x_k - x^i) + (α_k/2) ||x_k - x^i||²_2, for each i; λ_k^i = λ_{k-1}^i - α_k (x_k - x_k^i), for each i.

45 ADMM and Prox-Linear Given min_x f(x) + τψ(x), reformulate as the equality constrained problem: min_{x,z} f(x) + τψ(z) subject to x = z. ADMM form: x^k := arg min_x f(x) + τψ(z^{k-1}) + (λ^k)^T (x - z^{k-1}) + (α_k/2) ||z^{k-1} - x||²_2; z^k := arg min_z f(x^k) + τψ(z) + (λ^k)^T (x^k - z) + (α_k/2) ||z - x^k||²_2; λ^{k+1} := λ^k + α_k (x^k - z^k). Minimization over z is the shrink operator: often inexpensive. Minimization over x can be performed approximately using an algorithm suited to the form of f.

46 The subproblems are not too different from those obtained in prox-linear algorithms: λ^k is asymptotically similar to the gradient term in prox-linear, that is, λ^k ≈ -∇f(x^k); thus, the minimization over z is quite similar to the prox-linear step.

47 References I
Bertsekas, D. P. (1999). Nonlinear Programming. Athena Scientific, second edition.
Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1).
Nocedal, J. and Wright, S. J. (2006). Numerical Optimization. Springer, New York, second edition.
Rao, C. V., Wright, S. J., and Rawlings, J. B. (1998). Application of interior-point methods to model predictive control. Journal of Optimization Theory and Applications, 99.
Wright, S. J. (1997a). Applying new optimization algorithms to model predictive control. In Kantor, J. C., editor, Chemical Process Control-V, volume 93 of AIChE Symposium Series. CACHE Publications.
Wright, S. J. (1997b). Primal-Dual Interior-Point Methods. SIAM, Philadelphia, PA.


More information

Convex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014

Convex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014 Convex Optimization Dani Yogatama School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA February 12, 2014 Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12,

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen

Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen Numerisches Rechnen (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2011/12 IGPM, RWTH Aachen Numerisches Rechnen

More information

SF2822 Applied Nonlinear Optimization. Preparatory question. Lecture 9: Sequential quadratic programming. Anders Forsgren

SF2822 Applied Nonlinear Optimization. Preparatory question. Lecture 9: Sequential quadratic programming. Anders Forsgren SF2822 Applied Nonlinear Optimization Lecture 9: Sequential quadratic programming Anders Forsgren SF2822 Applied Nonlinear Optimization, KTH / 24 Lecture 9, 207/208 Preparatory question. Try to solve theory

More information

ALADIN An Algorithm for Distributed Non-Convex Optimization and Control

ALADIN An Algorithm for Distributed Non-Convex Optimization and Control ALADIN An Algorithm for Distributed Non-Convex Optimization and Control Boris Houska, Yuning Jiang, Janick Frasch, Rien Quirynen, Dimitris Kouzoupis, Moritz Diehl ShanghaiTech University, University of

More information

minimize x subject to (x 2)(x 4) u,

minimize x subject to (x 2)(x 4) u, Math 6366/6367: Optimization and Variational Methods Sample Preliminary Exam Questions 1. Suppose that f : [, L] R is a C 2 -function with f () on (, L) and that you have explicit formulae for

More information

Interior-Point Methods

Interior-Point Methods Interior-Point Methods Stephen Wright University of Wisconsin-Madison Simons, Berkeley, August, 2017 Wright (UW-Madison) Interior-Point Methods August 2017 1 / 48 Outline Introduction: Problems and Fundamentals

More information

Outline. Scientific Computing: An Introductory Survey. Optimization. Optimization Problems. Examples: Optimization Problems

Outline. Scientific Computing: An Introductory Survey. Optimization. Optimization Problems. Examples: Optimization Problems Outline Scientific Computing: An Introductory Survey Chapter 6 Optimization 1 Prof. Michael. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction

More information

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods AM 205: lecture 19 Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods Optimality Conditions: Equality Constrained Case As another example of equality

More information

Lecture 3. Optimization Problems and Iterative Algorithms

Lecture 3. Optimization Problems and Iterative Algorithms Lecture 3 Optimization Problems and Iterative Algorithms January 13, 2016 This material was jointly developed with Angelia Nedić at UIUC for IE 598ns Outline Special Functions: Linear, Quadratic, Convex

More information

Penalty and Barrier Methods General classical constrained minimization problem minimize f(x) subject to g(x) 0 h(x) =0 Penalty methods are motivated by the desire to use unconstrained optimization techniques

More information

CS-E4830 Kernel Methods in Machine Learning

CS-E4830 Kernel Methods in Machine Learning CS-E4830 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 27. September, 2017 Juho Rousu 27. September, 2017 1 / 45 Convex optimization Convex optimisation This

More information

Nonlinear Optimization: What s important?

Nonlinear Optimization: What s important? Nonlinear Optimization: What s important? Julian Hall 10th May 2012 Convexity: convex problems A local minimizer is a global minimizer A solution of f (x) = 0 (stationary point) is a minimizer A global

More information

Lecture 6: Conic Optimization September 8

Lecture 6: Conic Optimization September 8 IE 598: Big Data Optimization Fall 2016 Lecture 6: Conic Optimization September 8 Lecturer: Niao He Scriber: Juan Xu Overview In this lecture, we finish up our previous discussion on optimality conditions

More information

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods AM 205: lecture 19 Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods Quasi-Newton Methods General form of quasi-newton methods: x k+1 = x k α

More information

Constrained Optimization Theory

Constrained Optimization Theory Constrained Optimization Theory Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. IMA, August 2016 Stephen Wright (UW-Madison) Constrained Optimization Theory IMA, August

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Support vector machines (SVMs) are one of the central concepts in all of machine learning. They are simply a combination of two ideas: linear classification via maximum (or optimal

More information

An Introduction to Algebraic Multigrid (AMG) Algorithms Derrick Cerwinsky and Craig C. Douglas 1/84

An Introduction to Algebraic Multigrid (AMG) Algorithms Derrick Cerwinsky and Craig C. Douglas 1/84 An Introduction to Algebraic Multigrid (AMG) Algorithms Derrick Cerwinsky and Craig C. Douglas 1/84 Introduction Almost all numerical methods for solving PDEs will at some point be reduced to solving A

More information

Dual and primal-dual methods

Dual and primal-dual methods ELE 538B: Large-Scale Optimization for Data Science Dual and primal-dual methods Yuxin Chen Princeton University, Spring 2018 Outline Dual proximal gradient method Primal-dual proximal gradient method

More information

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)

More information

Optimality Conditions for Constrained Optimization

Optimality Conditions for Constrained Optimization 72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)

More information

Dual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725

Dual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725 Dual methods and ADMM Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given f : R n R, the function is called its conjugate Recall conjugate functions f (y) = max x R n yt x f(x)

More information

CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares

CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares Robert Bridson October 29, 2008 1 Hessian Problems in Newton Last time we fixed one of plain Newton s problems by introducing line search

More information

Quiz Discussion. IE417: Nonlinear Programming: Lecture 12. Motivation. Why do we care? Jeff Linderoth. 16th March 2006

Quiz Discussion. IE417: Nonlinear Programming: Lecture 12. Motivation. Why do we care? Jeff Linderoth. 16th March 2006 Quiz Discussion IE417: Nonlinear Programming: Lecture 12 Jeff Linderoth Department of Industrial and Systems Engineering Lehigh University 16th March 2006 Motivation Why do we care? We are interested in

More information

4TE3/6TE3. Algorithms for. Continuous Optimization

4TE3/6TE3. Algorithms for. Continuous Optimization 4TE3/6TE3 Algorithms for Continuous Optimization (Algorithms for Constrained Nonlinear Optimization Problems) Tamás TERLAKY Computing and Software McMaster University Hamilton, November 2005 terlaky@mcmaster.ca

More information

Trust Regions. Charles J. Geyer. March 27, 2013

Trust Regions. Charles J. Geyer. March 27, 2013 Trust Regions Charles J. Geyer March 27, 2013 1 Trust Region Theory We follow Nocedal and Wright (1999, Chapter 4), using their notation. Fletcher (1987, Section 5.1) discusses the same algorithm, but

More information

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization E5295/5B5749 Convex optimization with engineering applications Lecture 8 Smooth convex unconstrained and equality-constrained minimization A. Forsgren, KTH 1 Lecture 8 Convex optimization 2006/2007 Unconstrained

More information

Introduction to Alternating Direction Method of Multipliers

Introduction to Alternating Direction Method of Multipliers Introduction to Alternating Direction Method of Multipliers Yale Chang Machine Learning Group Meeting September 29, 2016 Yale Chang (Machine Learning Group Meeting) Introduction to Alternating Direction

More information

On Generalized Primal-Dual Interior-Point Methods with Non-uniform Complementarity Perturbations for Quadratic Programming

On Generalized Primal-Dual Interior-Point Methods with Non-uniform Complementarity Perturbations for Quadratic Programming On Generalized Primal-Dual Interior-Point Methods with Non-uniform Complementarity Perturbations for Quadratic Programming Altuğ Bitlislioğlu and Colin N. Jones Abstract This technical note discusses convergence

More information

Applications of Linear Programming

Applications of Linear Programming Applications of Linear Programming lecturer: András London University of Szeged Institute of Informatics Department of Computational Optimization Lecture 9 Non-linear programming In case of LP, the goal

More information

ORIE 6326: Convex Optimization. Quasi-Newton Methods

ORIE 6326: Convex Optimization. Quasi-Newton Methods ORIE 6326: Convex Optimization Quasi-Newton Methods Professor Udell Operations Research and Information Engineering Cornell April 10, 2017 Slides on steepest descent and analysis of Newton s method adapted

More information

Constrained Optimization

Constrained Optimization 1 / 22 Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University March 30, 2015 2 / 22 1. Equality constraints only 1.1 Reduced gradient 1.2 Lagrange

More information

arxiv: v1 [math.na] 8 Jun 2018

arxiv: v1 [math.na] 8 Jun 2018 arxiv:1806.03347v1 [math.na] 8 Jun 2018 Interior Point Method with Modified Augmented Lagrangian for Penalty-Barrier Nonlinear Programming Martin Neuenhofen June 12, 2018 Abstract We present a numerical

More information

Algorithms for nonlinear programming problems II

Algorithms for nonlinear programming problems II Algorithms for nonlinear programming problems II Martin Branda Charles University Faculty of Mathematics and Physics Department of Probability and Mathematical Statistics Computational Aspects of Optimization

More information

Newton s Method. Ryan Tibshirani Convex Optimization /36-725

Newton s Method. Ryan Tibshirani Convex Optimization /36-725 Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x

More information

CONSTRAINED NONLINEAR PROGRAMMING

CONSTRAINED NONLINEAR PROGRAMMING 149 CONSTRAINED NONLINEAR PROGRAMMING We now turn to methods for general constrained nonlinear programming. These may be broadly classified into two categories: 1. TRANSFORMATION METHODS: In this approach

More information

Conditional Gradient (Frank-Wolfe) Method

Conditional Gradient (Frank-Wolfe) Method Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties

More information

An Active Set Strategy for Solving Optimization Problems with up to 200,000,000 Nonlinear Constraints

An Active Set Strategy for Solving Optimization Problems with up to 200,000,000 Nonlinear Constraints An Active Set Strategy for Solving Optimization Problems with up to 200,000,000 Nonlinear Constraints Klaus Schittkowski Department of Computer Science, University of Bayreuth 95440 Bayreuth, Germany e-mail:

More information

Linear Programming: Simplex

Linear Programming: Simplex Linear Programming: Simplex Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. IMA, August 2016 Stephen Wright (UW-Madison) Linear Programming: Simplex IMA, August 2016

More information

ICS-E4030 Kernel Methods in Machine Learning

ICS-E4030 Kernel Methods in Machine Learning ICS-E4030 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 28. September, 2016 Juho Rousu 28. September, 2016 1 / 38 Convex optimization Convex optimisation This

More information

Lecture 11 and 12: Penalty methods and augmented Lagrangian methods for nonlinear programming

Lecture 11 and 12: Penalty methods and augmented Lagrangian methods for nonlinear programming Lecture 11 and 12: Penalty methods and augmented Lagrangian methods for nonlinear programming Coralia Cartis, Mathematical Institute, University of Oxford C6.2/B2: Continuous Optimization Lecture 11 and

More information

Primal-dual Subgradient Method for Convex Problems with Functional Constraints

Primal-dual Subgradient Method for Convex Problems with Functional Constraints Primal-dual Subgradient Method for Convex Problems with Functional Constraints Yurii Nesterov, CORE/INMA (UCL) Workshop on embedded optimization EMBOPT2014 September 9, 2014 (Lucca) Yu. Nesterov Primal-dual

More information

2. Quasi-Newton methods

2. Quasi-Newton methods L. Vandenberghe EE236C (Spring 2016) 2. Quasi-Newton methods variable metric methods quasi-newton methods BFGS update limited-memory quasi-newton methods 2-1 Newton method for unconstrained minimization

More information

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0 Numerical Analysis 1 1. Nonlinear Equations This lecture note excerpted parts from Michael Heath and Max Gunzburger. Given function f, we seek value x for which where f : D R n R n is nonlinear. f(x) =

More information

Math 273a: Optimization Overview of First-Order Optimization Algorithms

Math 273a: Optimization Overview of First-Order Optimization Algorithms Math 273a: Optimization Overview of First-Order Optimization Algorithms Wotao Yin Department of Mathematics, UCLA online discussions on piazza.com 1 / 9 Typical flow of numerical optimization Optimization

More information

Proximal Newton Method. Ryan Tibshirani Convex Optimization /36-725

Proximal Newton Method. Ryan Tibshirani Convex Optimization /36-725 Proximal Newton Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: primal-dual interior-point method Given the problem min x subject to f(x) h i (x) 0, i = 1,... m Ax = b where f, h

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

2.3 Linear Programming

2.3 Linear Programming 2.3 Linear Programming Linear Programming (LP) is the term used to define a wide range of optimization problems in which the objective function is linear in the unknown variables and the constraints are

More information

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization

More information

Optimization Algorithms for Compressed Sensing

Optimization Algorithms for Compressed Sensing Optimization Algorithms for Compressed Sensing Stephen Wright University of Wisconsin-Madison SIAM Gator Student Conference, Gainesville, March 2009 Stephen Wright (UW-Madison) Optimization and Compressed

More information

Introduction to Nonlinear Stochastic Programming

Introduction to Nonlinear Stochastic Programming School of Mathematics T H E U N I V E R S I T Y O H F R G E D I N B U Introduction to Nonlinear Stochastic Programming Jacek Gondzio Email: J.Gondzio@ed.ac.uk URL: http://www.maths.ed.ac.uk/~gondzio SPS

More information

Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization

Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization Primal-dual relationship between Levenberg-Marquardt and central trajectories for linearly constrained convex optimization Roger Behling a, Clovis Gonzaga b and Gabriel Haeser c March 21, 2013 a Department

More information

Accelerated primal-dual methods for linearly constrained convex problems

Accelerated primal-dual methods for linearly constrained convex problems Accelerated primal-dual methods for linearly constrained convex problems Yangyang Xu SIAM Conference on Optimization May 24, 2017 1 / 23 Accelerated proximal gradient For convex composite problem: minimize

More information

Optimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30

Optimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Optimization Escuela de Ingeniería Informática de Oviedo (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Unconstrained optimization Outline 1 Unconstrained optimization 2 Constrained

More information

Course on Model Predictive Control Part II Linear MPC design

Course on Model Predictive Control Part II Linear MPC design Course on Model Predictive Control Part II Linear MPC design Gabriele Pannocchia Department of Chemical Engineering, University of Pisa, Italy Email: g.pannocchia@diccism.unipi.it Facoltà di Ingegneria,

More information

Motivation. Lecture 2 Topics from Optimization and Duality. network utility maximization (NUM) problem:

Motivation. Lecture 2 Topics from Optimization and Duality. network utility maximization (NUM) problem: CDS270 Maryam Fazel Lecture 2 Topics from Optimization and Duality Motivation network utility maximization (NUM) problem: consider a network with S sources (users), each sending one flow at rate x s, through

More information

Numerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09

Numerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09 Numerical Optimization 1 Working Horse in Computer Vision Variational Methods Shape Analysis Machine Learning Markov Random Fields Geometry Common denominator: optimization problems 2 Overview of Methods

More information

Lecture 17: October 27

Lecture 17: October 27 0-725/36-725: Convex Optimiation Fall 205 Lecturer: Ryan Tibshirani Lecture 7: October 27 Scribes: Brandon Amos, Gines Hidalgo Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These

More information

Efficient Convex Optimization for Linear MPC

Efficient Convex Optimization for Linear MPC Efficient Convex Optimization for Linear MPC Stephen J Wright Abstract MPC formulations with linear dynamics and quadratic objectives can be solved efficiently by using a primal-dual interior-point framework,

More information

Nonlinear optimization

Nonlinear optimization Nonlinear optimization Anders Forsgren Optimization and Systems Theory Department of Mathematics Royal Institute of Technology (KTH) Stockholm, Sweden evita Winter School 2009 Geilo, Norway January 11

More information

Primal-Dual Interior-Point Methods

Primal-Dual Interior-Point Methods Primal-Dual Interior-Point Methods Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 Outline Today: Primal-dual interior-point method Special case: linear programming

More information

On nonlinear optimization since M.J.D. Powell

On nonlinear optimization since M.J.D. Powell On nonlinear optimization since 1959 1 M.J.D. Powell Abstract: This view of the development of algorithms for nonlinear optimization is based on the research that has been of particular interest to the

More information

Sparse Optimization Lecture: Dual Methods, Part I

Sparse Optimization Lecture: Dual Methods, Part I Sparse Optimization Lecture: Dual Methods, Part I Instructor: Wotao Yin July 2013 online discussions on piazza.com Those who complete this lecture will know dual (sub)gradient iteration augmented l 1 iteration

More information

Coordinate Update Algorithm Short Course Proximal Operators and Algorithms

Coordinate Update Algorithm Short Course Proximal Operators and Algorithms Coordinate Update Algorithm Short Course Proximal Operators and Algorithms Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 36 Why proximal? Newton s method: for C 2 -smooth, unconstrained problems allow

More information

Convex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013

Convex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013 Convex Optimization (EE227A: UC Berkeley) Lecture 15 (Gradient methods III) 12 March, 2013 Suvrit Sra Optimal gradient methods 2 / 27 Optimal gradient methods We saw following efficiency estimates for

More information

Optimization and Root Finding. Kurt Hornik

Optimization and Root Finding. Kurt Hornik Optimization and Root Finding Kurt Hornik Basics Root finding and unconstrained smooth optimization are closely related: Solving ƒ () = 0 can be accomplished via minimizing ƒ () 2 Slide 2 Basics Root finding

More information

AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING

AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING XIAO WANG AND HONGCHAO ZHANG Abstract. In this paper, we propose an Augmented Lagrangian Affine Scaling (ALAS) algorithm for general

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Proximal-Gradient Mark Schmidt University of British Columbia Winter 2018 Admin Auditting/registration forms: Pick up after class today. Assignment 1: 2 late days to hand in

More information

SMO vs PDCO for SVM: Sequential Minimal Optimization vs Primal-Dual interior method for Convex Objectives for Support Vector Machines

SMO vs PDCO for SVM: Sequential Minimal Optimization vs Primal-Dual interior method for Convex Objectives for Support Vector Machines vs for SVM: Sequential Minimal Optimization vs Primal-Dual interior method for Convex Objectives for Support Vector Machines Ding Ma Michael Saunders Working paper, January 5 Introduction In machine learning,

More information

Newton s Method. Javier Peña Convex Optimization /36-725

Newton s Method. Javier Peña Convex Optimization /36-725 Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and

More information

Alternating Direction Method of Multipliers. Ryan Tibshirani Convex Optimization

Alternating Direction Method of Multipliers. Ryan Tibshirani Convex Optimization Alternating Direction Method of Multipliers Ryan Tibshirani Convex Optimization 10-725 Consider the problem Last time: dual ascent min x f(x) subject to Ax = b where f is strictly convex and closed. Denote

More information

Algorithms for nonlinear programming problems II

Algorithms for nonlinear programming problems II Algorithms for nonlinear programming problems II Martin Branda Charles University in Prague Faculty of Mathematics and Physics Department of Probability and Mathematical Statistics Computational Aspects

More information

Interior-Point Methods for Linear Optimization

Interior-Point Methods for Linear Optimization Interior-Point Methods for Linear Optimization Robert M. Freund and Jorge Vera March, 204 c 204 Robert M. Freund and Jorge Vera. All rights reserved. Linear Optimization with a Logarithmic Barrier Function

More information

Homework 5. Convex Optimization /36-725

Homework 5. Convex Optimization /36-725 Homework 5 Convex Optimization 10-725/36-725 Due Tuesday November 22 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)

More information

Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization. Nick Gould (RAL)

Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization. Nick Gould (RAL) Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization Nick Gould (RAL) x IR n f(x) subject to c(x) = Part C course on continuoue optimization CONSTRAINED MINIMIZATION x

More information

Primal-Dual Interior-Point Methods. Javier Peña Convex Optimization /36-725

Primal-Dual Interior-Point Methods. Javier Peña Convex Optimization /36-725 Primal-Dual Interior-Point Methods Javier Peña Convex Optimization 10-725/36-725 Last time: duality revisited Consider the problem min x subject to f(x) Ax = b h(x) 0 Lagrangian L(x, u, v) = f(x) + u T

More information

On the interior of the simplex, we have the Hessian of d(x), Hd(x) is diagonal with ith. µd(w) + w T c. minimize. subject to w T 1 = 1,

On the interior of the simplex, we have the Hessian of d(x), Hd(x) is diagonal with ith. µd(w) + w T c. minimize. subject to w T 1 = 1, Math 30 Winter 05 Solution to Homework 3. Recognizing the convexity of g(x) := x log x, from Jensen s inequality we get d(x) n x + + x n n log x + + x n n where the equality is attained only at x = (/n,...,

More information

An ADMM algorithm for optimal sensor and actuator selection

An ADMM algorithm for optimal sensor and actuator selection An ADMM algorithm for optimal sensor and actuator selection Neil K. Dhingra, Mihailo R. Jovanović, and Zhi-Quan Luo 53rd Conference on Decision and Control, Los Angeles, California, 2014 1 / 25 2 / 25

More information

Algorithms for Nonsmooth Optimization

Algorithms for Nonsmooth Optimization Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization

More information

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Coralia Cartis, University of Oxford INFOMM CDT: Modelling, Analysis and Computation of Continuous Real-World Problems Methods

More information

Proximal Gradient Descent and Acceleration. Ryan Tibshirani Convex Optimization /36-725

Proximal Gradient Descent and Acceleration. Ryan Tibshirani Convex Optimization /36-725 Proximal Gradient Descent and Acceleration Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: subgradient method Consider the problem min f(x) with f convex, and dom(f) = R n. Subgradient method:

More information

Part 4: Active-set methods for linearly constrained optimization. Nick Gould (RAL)

Part 4: Active-set methods for linearly constrained optimization. Nick Gould (RAL) Part 4: Active-set methods for linearly constrained optimization Nick Gould RAL fx subject to Ax b Part C course on continuoue optimization LINEARLY CONSTRAINED MINIMIZATION fx subject to Ax { } b where

More information