Interior-Point and Augmented Lagrangian Algorithms for Optimization and Control
1 Interior-Point and Augmented Lagrangian Algorithms for Optimization and Control Stephen Wright University of Wisconsin-Madison May 2014 Wright (UW-Madison) Constrained Optimization May / 46
2 In This Section...

My first talk was about optimization formulations, optimality conditions, and duality for LP, QP, LCP, and nonlinear optimization. This section will review some algorithms, in particular:

- Primal-dual interior-point (PDIP) methods
- Augmented Lagrangian (AL) methods.

Both are useful in control applications. We'll say something about PDIP methods for model-predictive control, and how they exploit the structure in that problem.
3 Recapping Gradient Methods

Consider the unconstrained minimization problem $\min f(x)$, where $f$ is smooth and convex, or the constrained version in which $x$ is restricted to a set $\Omega$, usually closed and convex.

First-order or gradient methods take steps of the form
$$x_{k+1} = x_k - \alpha_k g_k,$$
where $\alpha_k \in \mathbb{R}_+$ is a steplength and $g_k$ is a search direction. $g_k$ is constructed from knowledge of the gradient $\nabla f(x)$ at the current iterate $x = x_k$ and possibly at previous iterates $x_{k-1}, x_{k-2}, \dots$.

- Can extend to nonsmooth $f$ by using a subgradient from $\partial f(x)$.
- Can extend to constrained minimization by projecting the search line onto the convex set $\Omega$, or (similarly) by minimizing a linear approximation to $f$ over $\Omega$.
4 Prox Interpretation of Line Search

Can view the gradient method step $x_{k+1} = x_k - \alpha_k g_k$ as the minimization of a first-order model of $f$ plus a prox-term which prevents the step from being too long:
$$x_{k+1} = \arg\min_x \; f(x_k) + g_k^T (x - x_k) + \frac{1}{2\alpha_k} \|x - x_k\|_2^2.$$
Taking the gradient of the quadratic and setting it to zero, we obtain
$$g_k + \frac{1}{\alpha_k}(x_{k+1} - x_k) = 0,$$
which gives the formula for $x_{k+1}$.

This viewpoint is the key to several extensions.
5 Extensions: Constraints

When a constraint set $\Omega$ is present we can simply minimize the quadratic model function over $\Omega$:
$$x_{k+1} = \arg\min_{x \in \Omega} \; f(x_k) + g_k^T (x - x_k) + \frac{1}{2\alpha_k} \|x - x_k\|_2^2.$$
Gradient projection has this form.

We can replace the $\ell_2$-norm measure of distance with some other measure $\phi(x; x_k)$:
$$x_{k+1} = \arg\min_{x \in \Omega} \; f(x_k) + g_k^T (x - x_k) + \frac{1}{2\alpha_k} \phi(x; x_k).$$
Could choose $\phi$ to match $\Omega$. For example, a measure derived from the entropy function is a good match for the simplex $\Omega := \{x \mid x \ge 0, \; e^T x = 1\}$.
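As an illustration of matching $\phi$ to the simplex, here is a minimal sketch that assumes $\phi$ is the Kullback-Leibler divergence from $x_k$ (one common entropy-derived choice; the slides do not fix a specific one). The subproblem then has a closed-form multiplicative solution. The names `entropic_step` and `eta` are hypothetical; `eta` is the effective steplength and absorbs the constant in front of $\phi$.

```python
import numpy as np

def entropic_step(x, g, eta):
    """One step on the simplex with an entropy-based distance (assumed: KL divergence).

    The minimizer of g'x + (1/eta) * KL(x, x_k) over the simplex is
    x_i proportional to x_k[i] * exp(-eta * g[i]).
    """
    w = x * np.exp(-eta * (g - g.min()))  # shifting by g.min() leaves the normalized result unchanged
    return w / w.sum()

# Toy problem: minimize c'x over the simplex (the gradient is the constant vector c).
c = np.array([0.3, 0.1, 0.5])
x_simplex = np.full(3, 1.0 / 3.0)
for _ in range(200):
    x_simplex = entropic_step(x_simplex, c, eta=1.0)
```

The iterates stay strictly positive and sum to one without any explicit projection, and the mass concentrates on the coordinate with the smallest cost.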
6 Extensions: Regularizers

In many modern applications of optimization, $f$ has the form
$$f(x) = \underbrace{l(x)}_{\text{smooth function}} + \tau \underbrace{\psi(x)}_{\text{simple nonsmooth function}}.$$
Can extend the prox approach above by choosing $g_k$ to contain gradient information from $l(x)$ only, and including $\tau\psi(x)$ explicitly in the subproblem. Subproblems are thus:
$$x_{k+1} = \arg\min_x \; l(x_k) + g_k^T (x - x_k) + \frac{1}{2\alpha_k} \|x - x_k\|^2 + \tau\psi(x).$$
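A minimal sketch of this subproblem in action, under the assumption that $l(x) = \frac12\|Ax - b\|_2^2$ and $\psi(x) = \|x\|_1$ (an illustrative choice, not one made in the slides). For this $\psi$ the subproblem is solved exactly by soft-thresholding the plain gradient step. The names `soft_threshold` and `prox_gradient` are hypothetical, and the fixed steplength is assumed to be small enough for convergence.

```python
import numpy as np

def soft_threshold(v, t):
    """Solution of min_x 0.5*||x - v||^2 + t*||x||_1 (componentwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_gradient(A, b, tau, alpha, iters=500):
    """Minimize 0.5*||Ax - b||^2 + tau*||x||_1 with a fixed steplength alpha."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = A.T @ (A @ x - b)                        # gradient of the smooth part l only
        x = soft_threshold(x - alpha * g, alpha * tau)  # exact subproblem solution
    return x

# With A = I the minimizer is soft_threshold(b, tau).
A = np.eye(3)
b = np.array([3.0, 0.5, -2.0])
x_hat = prox_gradient(A, b, tau=1.0, alpha=1.0)
```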
7 Extensions: Explicit Trust Regions

Rather than penalizing distance moved from the current $x_k$, we can enforce an explicit constraint: a trust region.
$$x_{k+1} = \arg\min_x \; f(x_k) + g_k^T (x - x_k) + I_{\{\|x - x_k\| \le \Delta_k\}}(x),$$
where $I_\Lambda(x)$ denotes an indicator function with
$$I_\Lambda(x) = \begin{cases} 0 & \text{if } x \in \Lambda, \\ \infty & \text{otherwise.} \end{cases}$$
Adjust the trust-region radius $\Delta_k$ to ensure progress, e.g. descent in $f$.
8 Extension: Proximal Point

Could use the original $f$ in the subproblem rather than a simpler model function:
$$x_{k+1} = \arg\min_x \; f(x) + \frac{1}{2\alpha_k} \|x - x_k\|_2^2.$$
Although the subproblem seems just as hard to solve as the original, the prox-term may make it easier by introducing strong convexity, and may stabilize progress.

Can extend to constrained and regularized cases also.
9 Quadratic Models: Newton's Method

We can extend the iterative strategy further by adding a quadratic term to the model, instead of (or in addition to) the simple prox-term above. Taylor's Theorem suggests basing this term on the Hessian (second-derivative) matrix. That is, obtain the step from
$$x_{k+1} := \arg\min_x \; f(x_k) + \nabla f(x_k)^T (x - x_k) + \tfrac12 (x - x_k)^T \nabla^2 f(x_k) (x - x_k).$$
Can reformulate to solve for the step $d_k$. Then have $x_{k+1} = x_k + d_k$, where
$$d_k := \arg\min_d \; f(x_k) + \nabla f(x_k)^T d + \tfrac12 d^T \nabla^2 f(x_k) d.$$
See immediately that this model won't have a bounded solution if $\nabla^2 f(x_k)$ is not positive definite. It usually is positive definite near a strict local solution $x^*$, but we need something that works more globally.
10 Practical Newton Method

One obvious strategy is to add the prox-term to the quadratic model:
$$d_k := \arg\min_d \; f(x_k) + \nabla f(x_k)^T d + \tfrac12 d^T \left( \nabla^2 f(x_k) + \frac{1}{\alpha_k} I \right) d,$$
choosing $\alpha_k$ so that

- the quadratic term is positive definite;
- some other desirable property holds, e.g. descent: $f(x_k + d_k) < f(x_k)$.

We can also impose the trust region explicitly:
$$d_k := \arg\min_d \; f(x_k) + \nabla f(x_k)^T d + \tfrac12 d^T \nabla^2 f(x_k) d + I_{\{\|d\| \le \Delta_k\}}(d),$$
or alternatively:
$$d_k := \arg\min_{d : \|d\| \le \Delta_k} \; f(x_k) + \nabla f(x_k)^T d + \tfrac12 d^T \nabla^2 f(x_k) d.$$
But this is equivalent: for any $\Delta_k$, there exists $\alpha_k$ such that the solutions of the prox form and the trust-region form are identical.
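The prox-term strategy can be sketched as follows. This is one simple way to pick $1/\alpha_k$, assumed here for illustration: start from zero and double a shift until a Cholesky factorization succeeds, which certifies positive definiteness. The function name `damped_newton_step` and the starting shift `lam0` are hypothetical; the descent test from the second bullet is left to the caller.

```python
import numpy as np

def damped_newton_step(grad, hess, lam0=1e-3):
    """Solve (hess + lam*I) d = -grad, where lam = 1/alpha_k.

    lam starts at 0 and is increased until hess + lam*I is positive definite.
    Returns the step d and the shift lam that was used.
    """
    n = grad.size
    lam = 0.0
    while True:
        try:
            L = np.linalg.cholesky(hess + lam * np.eye(n))  # succeeds only if positive definite
            break
        except np.linalg.LinAlgError:
            lam = max(2.0 * lam, lam0)
    y = np.linalg.solve(L, -grad)        # forward solve with the Cholesky factor
    d = np.linalg.solve(L.T, y)          # backward solve
    return d, lam

g = np.array([1.0, 1.0])
H_indef = np.diag([1.0, -2.0])           # indefinite Hessian: pure Newton model is unbounded
d_damped, lam_used = damped_newton_step(g, H_indef)
```

Because the shifted matrix is positive definite, the resulting step satisfies $\nabla f(x_k)^T d_k < 0$ whenever the gradient is nonzero.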
11 Quasi-Newton Methods

Another disadvantage of Newton is that the Hessian may be difficult to evaluate or otherwise work with. The quadratic model is still useful when we use first-order information to learn about the Hessian.

Key observation (from Taylor's theorem): the secant condition
$$\nabla^2 f(x_k)(x_{k+1} - x_k) \approx \nabla f(x_{k+1}) - \nabla f(x_k).$$
The difference of gradients tells us how the Hessian behaves along the direction $x_{k+1} - x_k$. By aggregating such information over multiple steps, we can build up an approximation to the Hessian that is valid along multiple directions.

Quasi-Newton methods maintain an approximation $B_k$ to $\nabla^2 f(x_k)$ that respects the secant condition. The approximation may be implicit rather than explicit, and we may store an approximation to the inverse Hessian instead.
12 L-BFGS

A particularly popular quasi-Newton method, suitable for large-scale problems, is the limited-memory BFGS method (L-BFGS), which stores the Hessian or inverse Hessian approximation implicitly.

L-BFGS stores the last 5 to 10 update pairs:
$$s_j := x_{j+1} - x_j, \qquad y_j := \nabla f(x_{j+1}) - \nabla f(x_j),$$
for $j = k, k-1, k-2, \dots, k-m$. Can implicitly construct $H_{k+1}$ that satisfies $H_{k+1} y_j = s_j$.

In fact, an efficient recursive formula is available for evaluating the next search direction
$$d_{k+1} := -H_{k+1} \nabla f(x_{k+1})$$
directly from the $(s_j, y_j)$ pairs and from some initial estimate of the form $(1/\alpha_{k+1}) I$.
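The recursive formula is usually implemented as the "two-loop recursion". The sketch below is a minimal version under two assumptions that are not from the slides: the pairs are passed oldest first in plain Python lists, and the initial inverse-Hessian estimate is the scaled identity $\gamma I$ with $\gamma = s^T y / y^T y$ for the newest pair (a common heuristic). The name `lbfgs_direction` is hypothetical.

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    """Two-loop recursion: returns d = -H * grad without ever forming H.

    s_list, y_list hold the stored update pairs, oldest first; every pair
    is assumed to satisfy y's > 0.
    """
    q = grad.copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest pair to oldest pair.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    # Initial inverse-Hessian estimate gamma * I (assumed scaling heuristic).
    gamma = (s_list[-1] @ y_list[-1]) / (y_list[-1] @ y_list[-1])
    r = gamma * q
    # Second loop: oldest pair to newest pair.
    for s, y, rho, a in zip(s_list, y_list, rhos, reversed(alphas)):
        b = rho * (y @ r)
        r += (a - b) * s
    return -r

# Pairs generated by a quadratic with Hessian diag(1, 4, 9), so y_j = Q s_j.
Q = np.diag([1.0, 4.0, 9.0])
s1, s2 = np.array([1.0, 1.0, 0.0]), np.array([0.0, 1.0, 1.0])
y1, y2 = Q @ s1, Q @ s2
d_check = lbfgs_direction(y2, [s1, s2], [y1, y2])
```

Applying the recursion to the newest $y_j$ returns $-s_j$, which is the secant condition for the most recent pair.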
13 Newton for Nonlinear Equations

There is also a variant of Newton's method for nonlinear equations: find $x$ such that $F(x) = 0$, where $F : \mathbb{R}^n \to \mathbb{R}^n$ ($n$ equations in $n$ unknowns).

Newton's method forms a linear approximation to this system, based on another variant of Taylor's Theorem, which says
$$F(x + d) = F(x) + J(x) d + \int_0^1 [J(x + td) - J(x)] d \, dt,$$
where $J(x)$ is the Jacobian matrix of first partial derivatives:
$$J(x) = \begin{bmatrix} \dfrac{\partial F_1}{\partial x_1} & \cdots & \dfrac{\partial F_1}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial F_n}{\partial x_1} & \cdots & \dfrac{\partial F_n}{\partial x_n} \end{bmatrix}$$
(usually not symmetric).
14 When $F$ is continuously differentiable, we have
$$F(x_k + d) \approx F(x_k) + J(x_k) d,$$
so the Newton step is the one that makes the right-hand side zero:
$$d_k := -J(x_k)^{-1} F(x_k).$$
The basic Newton method takes steps $x_{k+1} := x_k + d_k$. Its effectiveness can be improved by

- doing a line search: $x_{k+1} := x_k + \alpha_k d_k$, for some $\alpha_k > 0$;
- Levenberg strategy: add $\lambda I$ to $J$ and set $d_k := -(J(x_k) + \lambda I)^{-1} F(x_k)$;
- guiding progress via a merit function, usually $\phi(x) := \tfrac12 \|F(x)\|_2^2$.

Caution! Can get stuck in a local min of $\phi$ that is not a solution of $F(x) = 0$.
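A minimal sketch of Newton's method with a backtracking line search on the merit function $\phi$. The sufficient-decrease test uses the fact that the Newton direction satisfies $\nabla\phi(x_k)^T d_k = -\|F(x_k)\|^2 = -2\phi(x_k)$. The function name `newton_system`, the constant `1e-4`, and the test problem are illustrative assumptions.

```python
import numpy as np

def newton_system(F, J, x0, tol=1e-10, max_iter=50):
    """Newton's method for F(x) = 0 with backtracking on phi(x) = 0.5*||F(x)||^2."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            break
        d = np.linalg.solve(J(x), -Fx)          # Newton step
        phi = 0.5 * (Fx @ Fx)
        alpha = 1.0
        # Armijo condition: phi(x + alpha*d) <= phi + 1e-4*alpha*(-2*phi)
        while alpha > 1e-8:
            F_trial = F(x + alpha * d)
            if 0.5 * (F_trial @ F_trial) <= (1.0 - 2e-4 * alpha) * phi:
                break
            alpha *= 0.5
        x = x + alpha * d
    return x

# Intersection of the circle x0^2 + x1^2 = 4 with the line x0 = x1.
F = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1]])
J = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])
x_root = newton_system(F, J, [1.0, 2.0])
```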
15 Homotopy

Tries to avoid the local-min issue with the merit function. Start with an easier set of nonlinear equations, and gradually deform it to the system $F(x) = 0$, tracking changes to the solution as you go:
$$F(x, \lambda) := \lambda F(x) + (1 - \lambda) F_0(x), \quad \lambda \in [0, 1].$$
Assume that $F(x, 0) = F_0(x) = 0$ has solution $x_0$. Homotopy methods trace the curve of solutions $(x, \lambda)$ until $\lambda = 1$ is reached. The corresponding value of $x$ then solves the original problem.

Many variants. Some supporting theory. Typically more expensive than enhanced Newton methods, but better at finding solutions to $F(x) = 0$.

We mention homotopy mostly because of its connection to interior-point methods.
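The simplest way to trace the solution curve is to increase $\lambda$ in small fixed increments and re-solve with Newton's method, warm-started from the previous solution. The sketch below does this for a scalar equation; it is a naive continuation scheme under assumed names (`homotopy_solve`, `steps`) and would need a more careful path-following strategy if the curve had turning points.

```python
def homotopy_solve(F, dF, F0, dF0, x0, steps=20, newton_iters=50, tol=1e-12):
    """Trace solutions of lam*F(x) + (1-lam)*F0(x) = 0 from lam=0 to lam=1 (scalar x)."""
    x = x0                                   # x0 is assumed to solve F0(x) = 0
    for j in range(1, steps + 1):
        lam = j / steps
        for _ in range(newton_iters):        # Newton corrector, warm-started
            h = lam * F(x) + (1.0 - lam) * F0(x)
            if abs(h) < tol:
                break
            dh = lam * dF(x) + (1.0 - lam) * dF0(x)
            x -= h / dh
    return x

# Deform the trivial equation x = 0 into x^3 + x - 10 = 0 (whose root is x = 2).
root = homotopy_solve(lambda x: x ** 3 + x - 10.0, lambda x: 3.0 * x ** 2 + 1.0,
                      lambda x: x, lambda x: 1.0, x0=0.0)
```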
16 Interior-Point Methods

Recall the monotone LCP: find $z \in \mathbb{R}^n$ such that
$$0 \le z \perp Mz + q \ge 0,$$
where $M \in \mathbb{R}^{n \times n}$ is positive semidefinite, and $q \in \mathbb{R}^n$. Recall too that monotone LCP is a generalization of LP and convex QP.

Rewrite LCP as
$$w = Mz + q, \quad (w, z) \ge 0, \quad w_i z_i = 0, \; i = 1, 2, \dots, n,$$
which is a constrained system of nonlinear equations: $F(z, w) = 0$, $(z, w) \ge 0$, where $F : \mathbb{R}^{2n} \to \mathbb{R}^{2n}$ is defined as
$$F(z, w) = \begin{bmatrix} Mz + q - w \\ WZe \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},$$
where $W = \mathrm{diag}(w_1, w_2, \dots, w_n)$, $Z = \mathrm{diag}(z_1, z_2, \dots, z_n)$, $e = (1, 1, \dots, 1)^T$.
17 Interior-Point Methods

Interior-point methods generate iterates that satisfy the nonnegativity constraints strictly, that is, $(z^k, w^k) > 0$ for all $k$.

An obvious strategy is to search along Newton directions for $F$, with a line search to maintain positivity of $(z^{k+1}, w^{k+1})$. The Newton equations are:
$$\begin{bmatrix} M & -I \\ W & Z \end{bmatrix} \begin{bmatrix} \Delta z \\ \Delta w \end{bmatrix} = -\begin{bmatrix} Mz + q - w \\ WZe \end{bmatrix}.$$
This affine scaling approach can actually work, and convergence can be proved, but it needs very precise conditions on the steplength to avoid getting jammed at the boundary.

Much better performance is obtained from methods that use homotopy, in particular, the idea of following a central path.
18 The Central Path

The central path is a critical object in primal-dual interior-point methods. It is defined by the parametric equations
$$F(z, w; \tau) := \begin{bmatrix} Mz + q - w \\ WZe - \tau e \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},$$
where $\tau \ge 0$ is the central path parameter, along with the conditions $(z, w) > 0$.

The second block of conditions says that $w_i z_i = \tau$ for all $i = 1, 2, \dots, n$, so if $\tau > 0$, we must have all components of $w$ and $z$ strictly positive, hence the term "interior point" [1].

[1] Actually, interior-point methods don't force the first condition $Mz + q - w = 0$ to hold except in the limit, so the iterates are not really interior in the sense of being strictly feasible.
19 Primal-Dual Interior-Point Methods

The central path guides iterates to the solution. The basic primal-dual interior-point method works as follows:

- Compute Newton steps for $F(z, w; \tau_k)$ for some value of $\tau_k$.
- Take a step $(z^{k+1}, w^{k+1}) = (z^k, w^k) + \alpha_k (\Delta z^k, \Delta w^k)$, choosing $\alpha_k$ so that $(z^{k+1}, w^{k+1})$ remains sufficiently positive.
- Possibly enhance with some auxiliary corrector steps.
- Choose a smaller value $\tau_{k+1} < \tau_k$ and repeat.

The effect is that it chases a moving target along the central path, staying reasonably close to the central path to avoid getting jammed near the boundary of the region $(w, z) \ge 0$.
20 A Simple PDIP Method

Usually works, simple to code (one page!)

- Choose $\tau_{k+1} = 0.9\,\tau_k$;
- Choose $\alpha_k$ to be the largest $\alpha$ for which
$$(z_i^k + \alpha \Delta z_i^k)(w_i^k + \alpha \Delta w_i^k) \ge 0.01\,(z^k + \alpha \Delta z^k)^T (w^k + \alpha \Delta w^k)/n, \quad i = 1, 2, \dots, n.$$

The second condition ensures that iterates stay in a loose neighborhood of the central path: none of the variables go to zero prematurely.
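A minimal sketch of this simple method for a small dense monotone LCP. Several details are assumptions made for the sketch rather than part of the slides: the starting point $z = w = e$, the initial $\tau$ equal to $z^T w / n$, a backtracking search (factor 0.9) in place of the exact "largest $\alpha$", an explicit positivity check on the trial point, and a dense solve of the full $2n \times 2n$ Newton system. The name `simple_pdip_lcp` is hypothetical.

```python
import numpy as np

def simple_pdip_lcp(M, q, tol=1e-8, max_iter=500):
    """Simple primal-dual interior-point method for 0 <= z, Mz + q >= 0, z'(Mz + q) = 0."""
    n = q.size
    z, w = np.ones(n), np.ones(n)              # strictly positive start (assumed)
    tau = (z @ w) / n
    for _ in range(max_iter):
        r_f = M @ z + q - w                    # residual of the linear equations
        if max(np.linalg.norm(r_f), (z @ w) / n) < tol:
            break
        tau *= 0.9                             # shrink the central-path target
        # Newton equations for F(z, w; tau) = 0
        K = np.block([[M, -np.eye(n)], [np.diag(w), np.diag(z)]])
        step = np.linalg.solve(K, np.concatenate([-r_f, tau - w * z]))
        dz, dw = step[:n], step[n:]
        alpha = 1.0
        while alpha > 1e-12:                   # backtrack into the loose neighborhood
            zn, wn = z + alpha * dz, w + alpha * dw
            if (zn.min() > 0 and wn.min() > 0
                    and (zn * wn).min() >= 0.01 * (zn @ wn) / n):
                break
            alpha *= 0.9
        z, w = zn, wn
    return z, w

M_lcp = np.array([[2.0, 1.0], [1.0, 2.0]])
q_lcp = np.array([-5.0, -6.0])
z_sol, w_sol = simple_pdip_lcp(M_lcp, q_lcp)
```

For this example the solution has $z > 0$ and $w = 0$, so $z$ solves $Mz = -q$, giving $z = (4/3, 7/3)$. Because $\tau$ shrinks by a fixed factor, convergence is linear and takes on the order of a couple of hundred iterations at this tolerance.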
21 A More Elaborate PDIP Method (Mehrotra)

Chooses $\tau_{k+1}$ by a heuristic based on the performance of the affine scaling step $(\Delta z^k_{\rm aff}, \Delta w^k_{\rm aff})$, which is computed by solving the Newton equations with $\tau = 0$.

- If we can take a long step along the affine scaling direction before reaching the boundary, choose $\tau_{k+1}$ much smaller than $\tau_k$: an aggressive choice.
- Otherwise, if the affine scaling step quickly jams at the boundary, make a more conservative choice: $\tau_{k+1}$ only a little smaller than $\tau_k$.

Mehrotra's method also includes a corrector step, which corrects for the nonlinearity revealed by the affine-scaling step. The combined centering-corrector step is obtained from
$$\begin{bmatrix} M & -I \\ W & Z \end{bmatrix} \begin{bmatrix} \Delta z_{cc} \\ \Delta w_{cc} \end{bmatrix} = \begin{bmatrix} 0 \\ \tau_{k+1} e - \Delta Z^k_{\rm aff} \Delta W^k_{\rm aff} e \end{bmatrix},$$
where $\Delta Z^k_{\rm aff}$ and $\Delta W^k_{\rm aff}$ are the diagonal matrices formed from $\Delta z^k_{\rm aff}$ and $\Delta w^k_{\rm aff}$. This is added to the affine-scaling step to obtain the search direction.
22 Solving the Linear Equations

At each iteration we need to solve two or more linear systems of the form
$$\begin{bmatrix} M & -I \\ W & Z \end{bmatrix} \begin{bmatrix} \Delta z \\ \Delta w \end{bmatrix} = \begin{bmatrix} r_f \\ r_{zw} \end{bmatrix},$$
where $W$ and $Z$ are the diagonal matrices formed from $w^k$ and $z^k$, which contain all positive numbers. $r_f$ and $r_{zw}$ are different right-hand sides (affine-scaling, centering-corrector).

Note that there is a lot of structure in this system (three of the blocks are $n \times n$ nonsingular diagonal matrices) and that the structure is the same at every interior-point iteration.
23 Reduced Form

Can do block elimination (of $\Delta w$) to obtain
$$(M + Z^{-1} W) \Delta z = r_f + Z^{-1} r_{zw}.$$
Note that when $M$ is positive semidefinite (as it is for monotone LCP), $M + Z^{-1} W$ is positive definite.

If we can identify a good strategy for reordering rows and columns of $M$ to solve this efficiently, we can re-use this ordering at every iteration, because $M + Z^{-1} W$ has the same nonzero pattern (only the diagonals change).

Since some elements of $Z^{-1} W$ are going to $\infty$ as the iterates near a solution while others are going to zero, this system can become highly ill-conditioned, but it doesn't seem to matter much.
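The block elimination can be checked numerically on a small example. The sketch below solves the reduced system and compares against a direct solve of the full $2n \times 2n$ system; the data are arbitrary illustrative values and the name `solve_reduced` is hypothetical.

```python
import numpy as np

def solve_reduced(M, z, w, r_f, r_zw):
    """Solve [M -I; W Z][dz; dw] = [r_f; r_zw] by eliminating dw."""
    # Reduced system: (M + Z^{-1} W) dz = r_f + Z^{-1} r_zw
    dz = np.linalg.solve(M + np.diag(w / z), r_f + r_zw / z)
    dw = M @ dz - r_f                      # recover dw from the first block row
    return dz, dw

M_red = np.array([[2.0, 1.0], [1.0, 2.0]])
z_cur, w_cur = np.array([1.5, 0.2]), np.array([0.1, 3.0])
r_f, r_zw = np.array([0.3, -0.7]), np.array([0.05, 0.4])
dz, dw = solve_reduced(M_red, z_cur, w_cur, r_f, r_zw)

# Reference: solve the full block system directly.
K_full = np.block([[M_red, -np.eye(2)], [np.diag(w_cur), np.diag(z_cur)]])
ref = np.linalg.solve(K_full, np.concatenate([r_f, r_zw]))
```

Only the diagonal `w / z` changes between iterations, which is why a fill-reducing ordering for a sparse $M$ can be computed once and reused.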
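24 Application to LP

When we apply this technique to LCPs arising from LP, $M$ has a particular form, and the linear system that we solve has the form
$$\begin{bmatrix} 0 & A^T & I \\ A & 0 & 0 \\ S & 0 & X \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta\lambda \\ \Delta s \end{bmatrix} = \begin{bmatrix} r_p \\ r_d \\ r_{xs} \end{bmatrix},$$
where $X = \mathrm{diag}(x_1, x_2, \dots, x_n)$ and $S = \mathrm{diag}(s_1, s_2, \dots, s_n)$.

By eliminating $\Delta s$ (using the third block row, $\Delta s = X^{-1}(r_{xs} - S\Delta x)$) we obtain a reduced form:
$$\begin{bmatrix} -X^{-1} S & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta\lambda \end{bmatrix} = \begin{bmatrix} r_p - X^{-1} r_{xs} \\ r_d \end{bmatrix}.$$
Since $X^{-1} S$ is positive diagonal we can go a step further and eliminate $\Delta x$:
$$A (S^{-1} X) A^T \Delta\lambda = r_d + A S^{-1} X (r_p - X^{-1} r_{xs}).$$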
25 Thus we have a coefficient matrix of the form $A D A^T$, where $D = S^{-1} X$ is positive diagonal.

Some elements of $D$ go to zero as the iterates converge (those for which $x_i = 0$ at the solution) while others go to $\infty$ (those for which $s_i = 0$). Hence the system may be ill-conditioned.

In practical codes, this matrix product is actually formed and factored at every iteration, using a variant of the Cholesky factorization. This factorization is stable regardless of the ordering of rows/columns, so we are free to use orderings that create the least fill-in during Cholesky.

Good codes: MOSEK, Gurobi, CPLEX. Freebie: PCx.
26 Optimal Control (Open Loop)

Given current state $x_0$, choose a time horizon $N$ (long), and solve the optimization problem for $x = \{x_k\}_{k=0}^{N}$, $u = \{u_k\}_{k=0}^{N-1}$:
$$\min \; L(x, u), \quad \text{subject to } x_0 \text{ given}, \; x_{k+1} = F_k(x_k, u_k), \; k = 0, 1, \dots, N-1, \text{ other constraints on } x, u.$$
Then apply controls $u_0, u_1, u_2, \dots$ blindly, without monitoring the state.

- flexible with respect to nonlinearity and constraints;
- if model $F_k$ is inaccurate, solution may be bad;
- doesn't account for system disturbances during time horizon;
- never used in industrial practice!
27 Linear-Quadratic Regulator (Closed Loop)

A canonical control problem. For given $x_0$, solve:
$$\min_{x,u} \; \Phi(x, u) := \frac12 \sum_{k=0}^{\infty} x_k^T Q x_k + u_k^T R u_k \quad \text{s.t. } x_{k+1} = A x_k + B u_k.$$
From the KKT conditions, the dependence of the optimal values of $x_1, x_2, \dots$ and $u_0, u_1, \dots$ on the initial $x_0$ is linear. We can substitute these variables out to obtain
$$\min_{x,u} \Phi(x, u) = \tfrac12 x_0^T \Pi x_0,$$
for some s.p.d. matrix $\Pi$. By using this dynamic programming principle, isolating the first stage, we can write the problem as:
$$\min_{x_1, u_0} \; \tfrac12 (x_0^T Q x_0 + u_0^T R u_0) + \tfrac12 x_1^T \Pi x_1 \quad \text{s.t. } x_1 = A x_0 + B u_0.$$
28 By substituting for $x_1$, we get an unconstrained quadratic problem in $u_0$. The minimizer is
$$u_0 = K x_0, \quad \text{where } K = -(R + B^T \Pi B)^{-1} B^T \Pi A,$$
so that $x_1 = A x_0 + B u_0 = (A + BK) x_0$.

By substituting for $u_0$ and $x_1$ in
$$\tfrac12 x_0^T \Pi x_0 = \tfrac12 (x_0^T Q x_0 + u_0^T R u_0) + \tfrac12 x_1^T \Pi x_1,$$
we obtain the Riccati equation:
$$\Pi = Q + A^T \Pi A - A^T \Pi B (R + B^T \Pi B)^{-1} B^T \Pi A.$$
There are well-known techniques to solve this equation for $\Pi$, hence $K$.

Hence, we have a feedback control law $u = Kx$ that is optimal for the LQR problem. This is a closed-loop control strategy that can respond to changes in state.
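One of the simplest techniques is fixed-point iteration on the Riccati equation itself, sketched below. This is an illustrative choice rather than the method the slides have in mind (production codes typically use more robust solvers), and it assumes the usual stabilizability and detectability conditions so that the iteration converges. The name `lqr_gain` is hypothetical.

```python
import numpy as np

def lqr_gain(A, B, Q, R, iters=1000, tol=1e-12):
    """Iterate Pi <- Q + A'Pi A - A'Pi B (R + B'Pi B)^{-1} B'Pi A to a fixed point.

    Returns (Pi, K) with the feedback law u = K x.
    """
    Pi = Q.copy()
    for _ in range(iters):
        K = -np.linalg.solve(R + B.T @ Pi @ B, B.T @ Pi @ A)
        Pi_new = Q + A.T @ Pi @ A + A.T @ Pi @ B @ K   # same as the Riccati right-hand side
        done = np.max(np.abs(Pi_new - Pi)) < tol
        Pi = Pi_new
        if done:
            break
    K = -np.linalg.solve(R + B.T @ Pi @ B, B.T @ Pi @ A)
    return Pi, K

# Scalar example: A = B = Q = R = 1 gives Pi^2 - Pi - 1 = 0, so Pi is the golden ratio.
one = np.array([[1.0]])
Pi_lqr, K_lqr = lqr_gain(one, one, one, one)
```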
29 Model Predictive Control (MPC)

Closed-loop has the highly desirable property of being able to respond to system disturbances and modeling errors. But closed-form solutions can be found only for simple models such as LQR.

Open-loop allows for a rich variety of models and constraints, and yields a highly structured optimization problem that can often be solved efficiently. But it can't fix disturbances and errors. Set and forget at your peril!

Model-Predictive Control (MPC) is a way of doing closed-loop control using open-loop / optimal-control techniques.

- Set up an optimal control problem, initialized at the current state, with a finite planning horizon.
- Solve this open-loop problem (quickly!) and implement the control $u_0$.
- At the next decision point, repeat the process. Can use the previous solution (shifted forward one period) as a warm start.
30 Linear MPC

A generalization of the linear-quadratic regulator (LQR) problem includes constraints:
$$\min_{x,u} \; \frac12 \sum_{k=0}^{\infty} x_k^T Q x_k + u_k^T R u_k,$$
subject to
$$x_{k+1} = A x_k + B u_k, \; k = 0, 1, 2, \dots, \qquad x_k \in X, \; u_k \in U,$$
possibly also mixed constraints, and constraints on $u_{k+1} - u_k$.

Assuming that $0 \in \mathrm{int}(X)$, $0 \in \mathrm{int}(U)$ and that the system is stabilizable, we expect that $u_k \to 0$ and $x_k \to 0$ as $k \to \infty$. Therefore, for large enough $k$, the non-model constraints become inactive.
31 Hence, for $N$ large enough, the problem is equivalent to the following (finite) problem:
$$\min_{x,u} \; \frac12 \sum_{k=0}^{N-1} \left( x_k^T Q x_k + u_k^T R u_k \right) + \frac12 x_N^T \Pi x_N,$$
subject to
$$x_{k+1} = A x_k + B u_k, \; k = 0, 1, 2, \dots, N-1, \qquad x_k \in X, \; u_k \in U, \; k = 0, 1, 2, \dots, N-1,$$
where $\Pi$ is the solution of the Riccati equation.

In the tail of the sequence ($k > N$) simply apply the unconstrained LQR feedback law, derived above.

When constraints are linear, need to solve a finite, structured, convex quadratic program.
32 Interior-Point Method for MPC

$$\min_{u,x,\epsilon} \; \sum_{k=0}^{N-1} \frac12 \left( x_k^T Q x_k + u_k^T R u_k + 2 x_k^T M u_k + \epsilon_k^T Z \epsilon_k \right) + z^T \epsilon_k + x_N^T \Pi x_N,$$
subject to
$$\begin{aligned}
x_0 &= \hat{x}_j \quad \text{(fixed)}, \\
x_{k+1} &= A x_k + B u_k, & k &= 0, 1, \dots, N-1, \\
D u_k - G x_k &\le d, & k &= 0, 1, \dots, N-1, \\
H x_k - \epsilon_k &\le h, & k &= 1, 2, \dots, N, \\
\epsilon_k &\ge 0, & k &= 1, 2, \dots, N, \\
F x_N &= 0.
\end{aligned}$$

- Soft constraints on the state (involving $\epsilon_k$);
- Also general hard constraints on $(x_k, u_k)$. Can use these to implement constraints on control changes $u_{k+1} - u_k$.
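33 Introduce dual variables, use stagewise ordering. The primal-dual interior-point method yields a block-banded linear system at each iteration.

[Block-banded coefficient matrix, not recoverable from the transcription. Its blocks are built from $Q$, $M$, $R$, $A$, $B$, $G$, $D$, $H$, $Z$, identity blocks, and the matrices $\Sigma^D_k$, $\Sigma^\epsilon_{k+1}$, $\Sigma^H_{k+1}$. The unknowns are ordered stagewise as $(\dots, \Delta x_k, \Delta u_k, \Delta\lambda_k, \Delta p_{k+1}, \Delta\xi_{k+1}, \Delta\eta_{k+1}, \Delta\epsilon_{k+1}, \Delta x_{k+1}, \dots)$, with right-hand sides $(\dots, r^x_k, r^u_k, r^\lambda_k, r^p_{k+1}, r^\xi_{k+1}, r^\eta_{k+1}, r^\epsilon_{k+1}, r^x_{k+1}, \dots)$.]

Here $\Sigma^D_k$, $\Sigma^\epsilon_{k+1}$, etc. are diagonal.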
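34 By performing block elimination, get the reduced system
$$\begin{bmatrix}
R_0 & B^T \\
B & 0 & -I \\
 & -I & Q_1 & M_1 & A^T \\
 & & M_1^T & R_1 & B^T \\
 & & A & B & 0 & -I \\
 & & & & -I & Q_2 & M_2 & A^T \\
 & & & & & M_2^T & R_2 & B^T \\
 & & & & & A & B & \ddots \\
 & & & & & & & \ddots & Q_N & F^T \\
 & & & & & & & & F & 0
\end{bmatrix}
\begin{bmatrix} \Delta u_0 \\ \Delta p_0 \\ \Delta x_1 \\ \Delta u_1 \\ \Delta p_1 \\ \Delta x_2 \\ \Delta u_2 \\ \vdots \\ \Delta x_N \\ \Delta\beta \end{bmatrix}
=
\begin{bmatrix} r^u_0 \\ r^p_0 \\ r^x_1 \\ r^u_1 \\ r^p_1 \\ r^x_2 \\ r^u_2 \\ \vdots \\ r^x_N \\ r^\beta \end{bmatrix},$$
which has the same structure as the KKT system of a problem without side constraints (soft or hard).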
35 Can solve by applying a banded linear solver: $O(N)$ operations.

Alternatively, seek matrices $\Pi_k$ and vectors $\pi_k$ such that the following relationship is satisfied between $\Delta p_{k-1}$ and $\Delta x_k$:
$$-\Delta p_{k-1} + \Pi_k \Delta x_k = \pi_k, \quad k = N, N-1, \dots, 1.$$
By substituting in the linear system, find a recurrence relation:
$$\Pi_N = Q_N, \qquad \pi_N = r^x_N,$$
$$\Pi_{k-1} = Q_{k-1} + A^T \Pi_k A - (A^T \Pi_k B + M_{k-1})(R_{k-1} + B^T \Pi_k B)^{-1}(B^T \Pi_k A + M_{k-1}^T),$$
$$\pi_{k-1} = r^x_{k-1} + A^T \Pi_k r^p_{k-1} + A^T \pi_k - (A^T \Pi_k B + M_{k-1})(R_{k-1} + B^T \Pi_k B)^{-1}(r^u_{k-1} + B^T \Pi_k r^p_{k-1} + B^T \pi_k).$$
The recurrence for $\Pi_k$ is the discrete time-varying Riccati equation!
36 Duality for Nonlinear Programming (Refresher)

Recall yesterday's conversation about duality for general constrained problems:
$$\min f(x) \quad \text{subject to } c_i(x) \ge 0, \; i = 1, 2, \dots, m,$$
with the Lagrangian defined by:
$$L(x, \lambda) := f(x) - \lambda^T c(x) = f(x) - \sum_{i=1}^m \lambda_i c_i(x).$$
Remember that we defined primal and dual objectives:
$$r(x) = \sup_{\lambda \ge 0} L(x, \lambda); \qquad q(\lambda) = \inf_x L(x, \lambda),$$
so the primal and dual problems are:
$$\min_x r(x), \qquad \max_{\lambda \ge 0} q(\lambda).$$
37 Augmented Lagrangian

Can motivate it crudely as proximal point on the dual. Consider the dual:
$$\max_{\lambda \ge 0} q(\lambda) = \max_{\lambda \ge 0} \inf_x \; f(x) - \lambda^T c(x).$$
Add a proximal point term, a quadratic penalty on the distance moved from the last dual iterate $\lambda^k$:
$$\max_{\lambda \ge 0} \inf_x \; f(x) - \lambda^T c(x) - \frac{1}{2\alpha_k} \|\lambda - \lambda^k\|_2^2.$$
Note that the objective here is simple in $\lambda$. Now switch the max and inf:
$$\inf_x \left\{ \max_{\lambda \ge 0} \; f(x) - \lambda^T c(x) - \frac{1}{2\alpha_k} \|\lambda - \lambda^k\|_2^2 \right\}.$$
We can solve the max problem, to get an explicit value for $\lambda$! This is easy because the components of $\lambda$ are separated in the objective; we can solve for each one individually.
38 Explicit solution is
$$\lambda_i = \begin{cases} 0 & \text{if } -\alpha_k c_i(x) + \lambda_i^k \le 0; \\ -\alpha_k c_i(x) + \lambda_i^k & \text{otherwise.} \end{cases}$$
Substitute this result in the previous expression to get
$$\min_x \; f(x) + \sum_{i \,:\, c_i(x) \ge \lambda_i^k/\alpha_k} \left( -\frac{1}{2\alpha_k} (\lambda_i^k)^2 \right) + \sum_{i \,:\, c_i(x) < \lambda_i^k/\alpha_k} \left( \frac{\alpha_k}{2} c_i(x)^2 - \lambda_i^k c_i(x) \right).$$
Thus the basic augmented Lagrangian process is (iteration $k$):

- Minimize the augmented Lagrangian function above for $x$ (approximately); call it $x_{k+1}$;
- Plug this $x$ into the explicit-max formula for $\lambda$ to get $\lambda^{k+1}$;
- Choose $\alpha_{k+1}$ for the next iteration.
39 Equality Constraints

For the equality constrained case, the formulae simplify a lot. We have
$$\min_x f(x) \quad \text{subject to } d_j(x) = 0, \; j = 1, 2, \dots, p.$$
Dual objective is $\inf_x L(x, \mu)$, where $L(x, \mu) := f(x) - \mu^T d(x)$.

Augmented Lagrangian subproblems are:
$$x_{k+1} = \arg\min_x \; f(x) - (\mu^k)^T d(x) + \frac{\alpha_k}{2} \|d(x)\|_2^2,$$
$$\mu^{k+1} = \mu^k - \alpha_k d(x_{k+1}).$$
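A minimal sketch of these two updates, specialized (as an assumption for illustration) to a convex quadratic objective $f(x) = \frac12 x^T H x + c^T x$ with linear constraints $d(x) = Ax - b$, so that the $x$-subproblem is a single linear solve. The name `augmented_lagrangian_qp` and the fixed penalty `alpha` are hypothetical choices; a general code would solve the subproblem approximately and adapt $\alpha_k$.

```python
import numpy as np

def augmented_lagrangian_qp(H, c, A, b, alpha=10.0, iters=30):
    """Augmented Lagrangian iteration for min 0.5*x'Hx + c'x  s.t.  Ax - b = 0."""
    mu = np.zeros(A.shape[0])
    x = np.zeros(H.shape[0])
    for _ in range(iters):
        # x-subproblem: gradient Hx + c - A'mu + alpha*A'(Ax - b) = 0
        x = np.linalg.solve(H + alpha * A.T @ A, A.T @ (mu + alpha * b) - c)
        # multiplier update: mu <- mu - alpha * d(x)
        mu = mu - alpha * (A @ x - b)
    return x, mu

# Closest point to the origin on the line x0 + x1 = 1.
H_al, c_al = np.eye(2), np.zeros(2)
A_al, b_al = np.array([[1.0, 1.0]]), np.array([1.0])
x_al, mu_al = augmented_lagrangian_qp(H_al, c_al, A_al, b_al)
```

The constraint is satisfied only in the limit, but the penalty parameter stays fixed and finite, which is the practical advantage over a pure quadratic penalty method.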
40 Augmented Lagrangian: History and Practice

- How accurately to solve the subproblem for $x$, which is unconstrained but nonlinear?
- How to adjust $\alpha_k$? Use a different $\alpha_k$ value for each constraint; increase it when this constraint is not becoming feasible rapidly enough.

Historical sketch:

- Dates from 1969: Hestenes, Powell.
- Developments in the 1970s and early 1980s by Rockafellar, Bertsekas, others.
- Lancelot code for nonlinear programming: Conn, Gould, Toint.
- Largely lost favor as an approach for general nonlinear programming during the next 15 years.
- Recent revival in the context of sparse optimization and its many applications, in conjunction with splitting / coordinate descent.
41 Separable Objectives: ADMM

Alternating Directions Method of Multipliers (ADMM) arises when the objective in the basic linearly constrained problem is separable:
$$\min_{(x,z)} \; f(x) + h(z) \quad \text{subject to } Ax + Bz = c,$$
for which
$$L(x, z, \lambda; \alpha) := f(x) + h(z) - \lambda^T (Ax + Bz - c) + \frac{\alpha}{2} \|Ax + Bz - c\|_2^2.$$
Standard augmented Lagrangian would minimize $L(x, z, \lambda; \alpha)$ over $(x, z)$ jointly, but these are coupled through the quadratic term, so the advantage of separability is lost.

Instead, minimize over $x$ and $z$ separately and sequentially:
$$x_k = \arg\min_x L(x, z_{k-1}, \lambda_{k-1}; \alpha_k);$$
$$z_k = \arg\min_z L(x_k, z, \lambda_{k-1}; \alpha_k);$$
$$\lambda_k = \lambda_{k-1} - \alpha_k (A x_k + B z_k - c).$$
42 ADMM

- Basically, does a round of block-coordinate descent in $(x, z)$.
- The minimizations over $x$ and $z$ add only a quadratic term to $f$ and $h$, respectively. This does not alter the cost much.
- Can perform these minimizations inexactly.
- Convergence is often slow, but sufficient for many applications.
- Many recent applications to compressed sensing, image processing, matrix completion, sparse principal components analysis.

ADMM has a rich collection of antecedents. A nice recent survey, including a diverse collection of machine learning applications, is Boyd et al. (2011).
43 ADMM for Consensus Optimization

Given $\min \sum_{i=1}^m f_i(x)$, form $m$ copies of $x$, with the original $x$ as a master variable:
$$\min_{x, x^1, x^2, \dots, x^m} \; \sum_{i=1}^m f_i(x^i) \quad \text{subject to } x^i = x, \; i = 1, 2, \dots, m.$$
Apply ADMM, with $z = (x^1, x^2, \dots, x^m)$, to get
$$x^i_k = \arg\min_{x^i} \; f_i(x^i) - (\lambda^i_{k-1})^T (x^i - x_{k-1}) + \frac{\alpha_k}{2} \|x^i - x_{k-1}\|_2^2, \quad \forall i,$$
$$x_k = \frac{1}{m} \sum_{i=1}^m \left( x^i_k - \frac{1}{\alpha_k} \lambda^i_{k-1} \right),$$
$$\lambda^i_k = \lambda^i_{k-1} - \alpha_k (x^i_k - x_k), \quad \forall i.$$
Obvious parallel possibilities in the $x^i$ updates. Synchronize for the $x$ update.
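A minimal sketch of the three updates, assuming the simple local objectives $f_i(x) = \frac12\|x - a_i\|_2^2$ (chosen for illustration so that each local subproblem has a closed form; the consensus solution is then the mean of the $a_i$). The name `consensus_admm` and the fixed penalty `alpha` are hypothetical.

```python
import numpy as np

def consensus_admm(a, alpha=1.0, iters=100):
    """Consensus ADMM for min_x sum_i 0.5*||x - a[i]||^2; a has shape (m, n)."""
    m, n = a.shape
    x = np.zeros(n)                  # master variable
    lam = np.zeros((m, n))           # one multiplier vector per local copy
    for _ in range(iters):
        # Local updates: independent across i, so they could run in parallel.
        # Optimality: (x_i - a_i) - lam_i + alpha*(x_i - x) = 0.
        xi = (a + lam + alpha * x) / (1.0 + alpha)
        # Master update: average of the shifted local copies (synchronization point).
        x = np.mean(xi - lam / alpha, axis=0)
        # Multiplier updates.
        lam = lam - alpha * (xi - x)
    return x

a_data = np.array([[1.0, 2.0], [3.0, 4.0], [8.0, 0.0]])
x_consensus = consensus_admm(a_data)
```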
44 ADMM for Awkward Intersections

The feasible set is sometimes an intersection of two or more convex sets that are easy to handle separately (e.g. projections are easily computable), but whose intersection is more difficult to work with.

Example: optimization over the cone of doubly nonnegative matrices:
$$\min_X f(X) \quad \text{s.t. } X \succeq 0, \; X \ge 0.$$
General form:
$$\min f(x) \quad \text{s.t. } x \in \Omega_i, \; i = 1, 2, \dots, m.$$
Just consensus optimization, with indicator functions for the sets.
$$x_k = \arg\min_x \; f(x) + \sum_{i=1}^m \left[ -(\lambda^i_{k-1})^T (x - x^i_{k-1}) + \frac{\alpha_k}{2} \|x - x^i_{k-1}\|_2^2 \right],$$
$$x^i_k = \arg\min_{x^i \in \Omega_i} \; -(\lambda^i_{k-1})^T (x_k - x^i) + \frac{\alpha_k}{2} \|x_k - x^i\|_2^2, \quad \forall i,$$
$$\lambda^i_k = \lambda^i_{k-1} - \alpha_k (x_k - x^i_k), \quad \forall i.$$
45 ADMM and Prox-Linear

Given
$$\min_x \; f(x) + \tau\psi(x),$$
reformulate as the equality constrained problem:
$$\min_{x,z} \; f(x) + \tau\psi(z) \quad \text{subject to } x = z.$$
ADMM form:
$$x_k := \arg\min_x \; f(x) + \tau\psi(z_{k-1}) + (\lambda_k)^T (x - z_{k-1}) + \frac{\alpha_k}{2} \|z_{k-1} - x\|_2^2,$$
$$z_k := \arg\min_z \; f(x_k) + \tau\psi(z) + (\lambda_k)^T (x_k - z) + \frac{\alpha_k}{2} \|z - x_k\|_2^2,$$
$$\lambda_{k+1} := \lambda_k + \alpha_k (x_k - z_k).$$
Minimization over $z$ is the shrink operator, which is often inexpensive. Minimization over $x$ can be performed approximately using an algorithm suited to the form of $f$.
46 The subproblems are not too different from those obtained in prox-linear algorithms:

- $\lambda_k$ is asymptotically similar to the gradient term in prox-linear, that is, $\lambda_k \approx -\nabla f(x_k)$ (from the optimality condition of the $x$ update, once $x_k - z_{k-1} \to 0$);
- thus, the minimization over $z$ is quite similar to the prox-linear step.
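A minimal sketch of this ADMM form, assuming $f(x) = \frac12\|Ax - b\|_2^2$ and $\psi(z) = \|z\|_1$ (illustrative choices), so that the $x$ update is a linear solve and the $z$ update is the shrink operator applied to $x_k + \lambda_k/\alpha_k$ with threshold $\tau/\alpha_k$. The names `shrink` and `admm_l1` and the fixed penalty `alpha` are hypothetical.

```python
import numpy as np

def shrink(v, t):
    """Shrink operator: argmin_z t*||z||_1 + 0.5*||z - v||^2."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_l1(A, b, tau, alpha=1.0, iters=200):
    """ADMM for min 0.5*||Ax - b||^2 + tau*||z||_1  subject to  x = z."""
    n = A.shape[1]
    z, lam = np.zeros(n), np.zeros(n)
    lhs = A.T @ A + alpha * np.eye(n)       # fixed matrix of the x-subproblem
    Atb = A.T @ b
    for _ in range(iters):
        # x update: A'(Ax - b) + lam + alpha*(x - z) = 0
        x = np.linalg.solve(lhs, Atb - lam + alpha * z)
        # z update: shrink operator
        z = shrink(x + lam / alpha, tau / alpha)
        # multiplier update with the sign convention used above
        lam = lam + alpha * (x - z)
    return x, z, lam

A_l1 = np.eye(3)
b_l1 = np.array([3.0, 0.5, -2.0])
x_admm, z_admm, lam_admm = admm_l1(A_l1, b_l1, tau=1.0)
```

At convergence the multiplier equals minus the gradient of the smooth term, which is the connection to the prox-linear step.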
47 References I

Bertsekas, D. P. (1999). Nonlinear Programming. Athena Scientific, second edition.

Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1).

Nocedal, J. and Wright, S. J. (2006). Numerical Optimization. Springer, New York, second edition.

Rao, C. V., Wright, S. J., and Rawlings, J. B. (1998). Application of interior-point methods to model predictive control. Journal of Optimization Theory and Applications, 99.

Wright, S. J. (1997a). Applying new optimization algorithms to model predictive control. In Kantor, J. C., editor, Chemical Process Control-V, volume 93 of AIChE Symposium Series. CACHE Publications.

Wright, S. J. (1997b). Primal-Dual Interior-Point Methods. SIAM, Philadelphia, PA.
Optimality, Duality, Complementarity for Constrained Optimization
Optimality, Duality, Complementarity for Constrained Optimization Stephen Wright University of Wisconsin-Madison May 2014 Wright (UW-Madison) Optimality, Duality, Complementarity May 2014 1 / 41 Linear
More informationOptimization Problems in Model Predictive Control
Optimization Problems in Model Predictive Control Stephen Wright Jim Rawlings, Matt Tenny, Gabriele Pannocchia University of Wisconsin-Madison FoCM 02 Minneapolis August 6, 2002 1 reminder! Send your new
More informationSparse and Regularized Optimization
Sparse and Regularized Optimization In many applications, we seek not an exact minimizer of the underlying objective, but rather an approximate minimizer that satisfies certain desirable properties: sparsity
More informationMS&E 318 (CME 338) Large-Scale Numerical Optimization
Stanford University, Management Science & Engineering (and ICME) MS&E 318 (CME 338) Large-Scale Numerical Optimization 1 Origins Instructor: Michael Saunders Spring 2015 Notes 9: Augmented Lagrangian Methods
More information1 Computing with constraints
Notes for 2017-04-26 1 Computing with constraints Recall that our basic problem is minimize φ(x) s.t. x Ω where the feasible set Ω is defined by equality and inequality conditions Ω = {x R n : c i (x)
More informationAlgorithms for Constrained Optimization
1 / 42 Algorithms for Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University April 19, 2015 2 / 42 Outline 1. Convergence 2. Sequential quadratic
More informationHigher-Order Methods
Higher-Order Methods Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. PCMI, July 2016 Stephen Wright (UW-Madison) Higher-Order Methods PCMI, July 2016 1 / 25 Smooth
More informationConstrained Optimization and Lagrangian Duality
CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may
More informationPenalty and Barrier Methods. So we again build on our unconstrained algorithms, but in a different way.
AMSC 607 / CMSC 878o Advanced Numerical Optimization Fall 2008 UNIT 3: Constrained Optimization PART 3: Penalty and Barrier Methods Dianne P. O Leary c 2008 Reference: N&S Chapter 16 Penalty and Barrier
More information5 Handling Constraints
5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest
More informationAlgorithms for constrained local optimization
Algorithms for constrained local optimization Fabio Schoen 2008 http://gol.dsi.unifi.it/users/schoen Algorithms for constrained local optimization p. Feasible direction methods Algorithms for constrained
More informationInterior Point Methods. We ll discuss linear programming first, followed by three nonlinear problems. Algorithms for Linear Programming Problems
AMSC 607 / CMSC 764 Advanced Numerical Optimization Fall 2008 UNIT 3: Constrained Optimization PART 4: Introduction to Interior Point Methods Dianne P. O Leary c 2008 Interior Point Methods We ll discuss
More informationOn the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method
Optimization Methods and Software Vol. 00, No. 00, Month 200x, 1 11 On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method ROMAN A. POLYAK Department of SEOR and Mathematical
More information10 Numerical methods for constrained problems
10 Numerical methods for constrained problems min s.t. f(x) h(x) = 0 (l), g(x) 0 (m), x X The algorithms can be roughly divided the following way: ˆ primal methods: find descent direction keeping inside
More informationScientific Computing: An Introductory Survey
Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted