Fall, 2016
Outline
1. Unconstrained Optimization
2. Equality-constrained, Inequality-constrained, and Mixture-constrained Optimization
3. Quadratic Programming (QP) and Linear Programming (LP)
Unconstrained Optimization

Consider the minimization problem min_x f(x), where x ∈ R^d and f is continuously differentiable.

Necessary condition for a local minimum: if x* is a local minimum of f(x), then ∇f(x*) = 0. A point satisfying this condition is called a stationary point, which can be a local minimum, a local maximum, or a saddle point.

Sufficient conditions for x* to be a local minimum: ∇f(x*) = 0 and ∇²f(x*) is positive definite.
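As a small numerical sketch (the test function f(x) = x₁² + 2x₂² and the finite-difference helpers are illustrative assumptions, not from the slides), one can check both the stationarity condition and the positive-definiteness of the Hessian at a candidate point:

```python
import numpy as np

def f(x):
    # Hypothetical smooth test function with its minimum at the origin.
    return x[0]**2 + 2.0 * x[1]**2

def grad(f, x, eps=1e-6):
    # Central finite-difference gradient of f at x.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def hessian(f, x, eps=1e-4):
    # Finite-difference Hessian of f at x (columns are difference quotients of the gradient).
    d = len(x)
    H = np.zeros((d, d))
    for i in range(d):
        e = np.zeros_like(x); e[i] = eps
        H[:, i] = (grad(f, x + e) - grad(f, x - e)) / (2 * eps)
    return H

x_star = np.array([0.0, 0.0])
print(np.allclose(grad(f, x_star), 0, atol=1e-6))        # stationarity: ∇f(x*) = 0
print(np.all(np.linalg.eigvalsh(hessian(f, x_star)) > 0))  # ∇²f(x*) positive definite
```

Both checks print `True` here, confirming that the origin satisfies the sufficient conditions for a local minimum of this particular f.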
Equality-, Inequality-, and Mixture-Constrained Optimization

Consider the minimization problem
min_x f(x)
subject to g_i(x) ≥ 0, i = 1, …, m,
h_j(x) = 0, j = 1, …, k.

Assume x ∈ R^d. f : R^d → R is called the objective function; g_i : R^d → R, i = 1, …, m are the inequality constraints; h_j : R^d → R, j = 1, …, k are the equality constraints. Assume f, the g_i's, and the h_j's are all continuously differentiable.
Single Equality Constraint

Consider the minimization problem
min_x f(x)    (1)
subject to h(x) = 0.    (2)

Introduce the Lagrange function:
L(x, λ) = f(x) + λ h(x).

Lagrange extreme value theory: if x* is a local minimum under the constraint, then there exists some λ* such that
∇_x L(x*, λ*) = 0,  ∇_λ L(x*, λ*) = 0.
Interpretations

The above conditions imply that (x*, λ*) is a stationary point of L. The equations are equivalent to
∇_x f(x*) + λ* ∇_x h(x*) = 0,
h(x*) = 0.
Note that the first condition gives d equations.
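As an illustration (the instance f(x) = xᵀx with constraint aᵀx = 1 is a hypothetical choice, not from the slides), the d + 1 stationarity equations form a linear system that can be solved directly:

```python
import numpy as np

# Minimize f(x) = x^T x subject to h(x) = a^T x - 1 = 0.
# Stationarity of L(x, λ) = f(x) + λ h(x) gives
#   2x + λ a = 0  (d equations),   a^T x = 1  (1 equation),
# with closed-form solution x* = a / (a^T a), λ* = -2 / (a^T a).
a = np.array([1.0, 2.0])
d = len(a)

# Assemble the (d+1) x (d+1) linear system in the unknowns (x, λ).
K = np.zeros((d + 1, d + 1))
K[:d, :d] = 2 * np.eye(d)   # gradient of x^T x is 2x
K[:d, d] = a                # λ-coefficient in the stationarity equations
K[d, :d] = a                # the constraint a^T x = 1
rhs = np.zeros(d + 1); rhs[d] = 1.0

sol = np.linalg.solve(K, rhs)
x_star, lam_star = sol[:d], sol[d]
print(np.allclose(x_star, a / (a @ a)))     # x* = a / ||a||^2
print(np.isclose(lam_star, -2 / (a @ a)))   # λ* = -2 / ||a||^2
```

Both checks print `True`; the numerical solution matches the closed form obtained from the stationarity conditions.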
Multiple Equality Constraints

Consider the minimization problem
min_x f(x)    (3)
subject to h_j(x) = 0, j = 1, …, k.    (4)

Introduce multiple Lagrange multipliers λ = (λ_1, …, λ_k):
L(x, λ) = f(x) + Σ_{j=1}^k λ_j h_j(x).

Lagrange extreme value theory: if x* is a local minimum under the constraints, then there exists some λ* = (λ_1*, …, λ_k*) such that
∇_x L(x*, λ*) = 0,  ∇_λ L(x*, λ*) = 0.
In other words, (x*, λ*) is a stationary point of L.
Example: Entropy Maximization Problem

Consider the minimization problem
min_{p_1, …, p_n} Σ_{i=1}^n p_i log(p_i), subject to Σ_{i=1}^n p_i = 1.

The Lagrange function is
L(p, λ) = Σ_{i=1}^n p_i log(p_i) + λ(Σ_{i=1}^n p_i − 1).

Setting ∂L/∂p_i = log(p_i) + 1 + λ = 0 gives log(p_i) = −1 − λ for i = 1, …, n, so all the p_i are equal. Together with Σ_{i=1}^n p_i = 1, we have p_1 = … = p_n = 1/n. This problem is known as maximizing information entropy, since minimizing Σ_i p_i log(p_i) is the same as maximizing the entropy −Σ_i p_i log(p_i).
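A quick numerical sanity check of this result (the sample size and random-distribution comparison are illustrative choices, not part of the slides): no randomly drawn probability vector should achieve a smaller value of Σ p_i log p_i than the uniform distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
uniform = np.full(n, 1.0 / n)

def objective(p):
    # The quantity being minimized: sum_i p_i log(p_i) = -entropy(p).
    return float(np.sum(p * np.log(p)))

best = objective(uniform)   # equals -log(n) at the uniform distribution

# Compare against random distributions drawn from the probability simplex.
beaten = sum(1 for _ in range(1000)
             if objective(rng.dirichlet(np.ones(n))) < best - 1e-12)

print(np.isclose(best, -np.log(n)))  # objective at uniform is -log(n)
print(beaten)                        # 0: nothing beats the uniform distribution
```

This is consistent with the Lagrange-multiplier derivation: the uniform distribution is the unique minimizer, so the objective at any other distribution is strictly larger.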
Generalized Lagrange Methods

Consider the minimization problem
min_x f(x) subject to g_i(x) ≥ 0, i = 1, …, m.

Introduce multiple non-negative Lagrange multipliers µ_1 ≥ 0, …, µ_m ≥ 0:
L(x, µ) = f(x) − Σ_{i=1}^m µ_i g_i(x).

The µ_i's are also called dual variables or slack variables. Correspondingly, we call x the primal variables. We would like to minimize L with respect to x and maximize L with respect to µ.
KKT Necessary Optimality Conditions (KKT = Karush–Kuhn–Tucker)

If x* is a local minimum under the above inequality constraints, then there exists some µ* = (µ_1*, …, µ_m*) such that
Stationarity: ∇_x L(x*, µ*) = 0, i.e., ∇_x f(x*) − Σ_{i=1}^m µ_i* ∇_x g_i(x*) = 0.
Primal feasibility: g_i(x*) ≥ 0 for i = 1, …, m.
Dual feasibility: µ_i* ≥ 0 for i = 1, …, m.
Complementary slackness: µ_i* g_i(x*) = 0, i = 1, …, m.
Sufficient Optimality Conditions

The above necessary conditions are also sufficient for optimality if
f and the g_i's are continuously differentiable;
f is convex and the g_i's are concave (so the feasible region {x : g_i(x) ≥ 0} is convex).
In this case, there exists a solution x* satisfying the KKT conditions, and x* is actually a global optimum. Examples: SVM problems.
Alternative Constraints

Consider the minimization problem
min_x f(x) subject to g_i(x) ≤ 0, i = 1, …, m.

Introduce multiple non-negative Lagrange multipliers µ_1 ≥ 0, …, µ_m ≥ 0:
L(x, µ) = f(x) + Σ_{i=1}^m µ_i g_i(x).

The µ_i's are also called dual variables or slack variables. Correspondingly, we call x the primal variables. We would like to minimize L with respect to x and maximize L with respect to µ.
KKT Necessary Optimality Conditions

If x* is a local minimum under the above inequality constraints, then there exists some µ* = (µ_1*, …, µ_m*) such that
Stationarity: ∇_x L(x*, µ*) = 0, i.e., ∇_x f(x*) + Σ_{i=1}^m µ_i* ∇_x g_i(x*) = 0.
Primal feasibility: g_i(x*) ≤ 0 for i = 1, …, m.
Dual feasibility: µ_i* ≥ 0 for i = 1, …, m.
Complementary slackness: µ_i* g_i(x*) = 0, i = 1, …, m.
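As a minimal sketch of the four KKT conditions under the g(x) ≤ 0 convention (the one-dimensional instance below is a hypothetical example, not from the slides):

```python
import numpy as np

# Minimize f(x) = (x - 2)^2 subject to g(x) = x - 1 <= 0.
# The constraint is active at the optimum x* = 1, and stationarity
#   f'(x*) + mu * g'(x*) = 0  gives  2(x* - 2) + mu = 0, so mu* = 2.
x_star, mu_star = 1.0, 2.0

f_grad = 2 * (x_star - 2)   # f'(x*) = -2
g_grad = 1.0                # g'(x*) = 1
g_val = x_star - 1          # g(x*) = 0 (constraint active)

print(np.isclose(f_grad + mu_star * g_grad, 0))  # stationarity
print(g_val <= 0)                                # primal feasibility
print(mu_star >= 0)                              # dual feasibility
print(np.isclose(mu_star * g_val, 0))            # complementary slackness
```

All four checks print `True`: the unconstrained minimum at x = 2 is infeasible, so the constraint binds, the multiplier is strictly positive, and complementary slackness holds because g(x*) = 0.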
Generalized Lagrange Methods

Consider the minimization problem
min_x f(x)
subject to g_i(x) ≥ 0, i = 1, …, m,
h_j(x) = 0, j = 1, …, k.

Introduce two sets of Lagrange multipliers: µ_1 ≥ 0, …, µ_m ≥ 0 and λ_1, …, λ_k. Define the Lagrange function:
L(x, µ, λ) = f(x) − Σ_{i=1}^m µ_i g_i(x) + Σ_{j=1}^k λ_j h_j(x).

The µ_i's and λ_j's are dual variables or slack variables.
KKT Necessary Optimality Conditions

If x* is a local minimum under the above mixed constraints, then there exist some (µ*, λ*) such that
Stationarity: ∇_x L(x*, µ*, λ*) = 0, i.e., ∇_x f(x*) − Σ_{i=1}^m µ_i* ∇_x g_i(x*) + Σ_{j=1}^k λ_j* ∇_x h_j(x*) = 0.
Primal feasibility: g_i(x*) ≥ 0, i = 1, …, m; h_j(x*) = 0, j = 1, …, k.
Dual feasibility: µ_i* ≥ 0 for i = 1, …, m.
Complementary slackness: µ_i* g_i(x*) = 0, i = 1, …, m.
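A small mixed-constraint illustration (the specific instance and the closed-form multipliers below are assumptions chosen for the example, not from the slides):

```python
import numpy as np

# Minimize f(x) = x1^2 + x2^2
# subject to h(x) = x1 + x2 - 1 = 0 (equality) and g(x) = x1 >= 0 (inequality).
# The minimizer is x* = (1/2, 1/2); the inequality is inactive there, so
# complementary slackness forces mu* = 0, and stationarity of
# L = f - mu*g + lambda*h then gives 2*x2 + lambda = 0, i.e. lambda* = -1.
x_star = np.array([0.5, 0.5])
mu_star, lam_star = 0.0, -1.0

grad_f = 2 * x_star              # ∇f(x*) = (1, 1)
grad_g = np.array([1.0, 0.0])    # ∇g(x*)
grad_h = np.array([1.0, 1.0])    # ∇h(x*)

stationarity = grad_f - mu_star * grad_g + lam_star * grad_h
print(np.allclose(stationarity, 0))                    # stationarity
print(x_star[0] >= 0 and np.isclose(x_star.sum(), 1))  # primal feasibility
print(mu_star >= 0)                                    # dual feasibility
print(np.isclose(mu_star * x_star[0], 0))              # complementary slackness
```

All four checks print `True`. Note that λ* is negative, which is allowed: only the multipliers of inequality constraints are sign-restricted.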
Sufficient Optimality Conditions

The above necessary conditions are also sufficient for optimality if
f, the g_i's, and the h_j's are continuously differentiable;
f is convex, the g_i's are concave, and the h_j's are linear (affine).
In this case, there exists a solution x* satisfying the KKT conditions, and x* is actually a global optimum.
Quadratic Programming (QP)

In a QP, the objective function is quadratic in x and the constraints are linear in x. Consider the minimization problem
min_x f(x) = (1/2) x^T Q x + c^T x,
subject to Ax ≤ b, Ex = d,
where Q is a symmetric d × d matrix, A is an m × d matrix, E is a k × d matrix, c ∈ R^d, b ∈ R^m, and d ∈ R^k. Example: SVM.
Properties of QP

If Q is a non-negative definite (positive semidefinite) matrix, then f(x) is convex.
If Q is non-negative definite, the above QP has a global minimizer whenever there exists at least one x satisfying the constraints and f(x) is bounded below on the feasible region.
If Q is positive definite, the above QP has a unique global minimizer.
If Q = 0, the problem becomes a linear program (LP).
From optimization theory: if Q is non-negative definite, the necessary and sufficient condition for a point x to be a global minimizer is that it satisfies the KKT conditions.
Solving QP

If there are only equality constraints, then the QP can be solved via a linear system. In general, a variety of methods for solving QPs are commonly used:
Methods: ellipsoid, interior point, active set, conjugate gradient, etc.
Software packages: Matlab, CPLEX, AMPL, GAMS, etc.
For positive definite Q, the ellipsoid method solves the problem in polynomial time; in other words, the running time is bounded above by a polynomial in the size of the input to the algorithm, i.e., T = O(d^k) for some constant k.
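The equality-constrained case can be sketched concretely (the small instance below is a hypothetical example): the KKT conditions of min (1/2)xᵀQx + cᵀx subject to Ex = d reduce to the single linear system Qx + c + Eᵀλ = 0, Ex = d.

```python
import numpy as np

# Hypothetical instance: minimize x1^2 + x2^2 subject to x1 + x2 = 1.
Q = np.array([[2.0, 0.0], [0.0, 2.0]])   # positive definite
c = np.array([0.0, 0.0])
E = np.array([[1.0, 1.0]])               # one equality constraint
d = np.array([1.0])

# KKT system:  [ Q  E^T ] [x]   [-c]
#              [ E   0  ] [l] = [ d ]
k = E.shape[0]
K = np.block([[Q, E.T], [E, np.zeros((k, k))]])
rhs = np.concatenate([-c, d])
sol = np.linalg.solve(K, rhs)
x_star, lam_star = sol[:2], sol[2:]

print(np.allclose(x_star, [0.5, 0.5]))   # the constrained minimizer
print(np.allclose(E @ x_star, d))        # equality constraint satisfied
```

Both checks print `True`: one call to a dense linear solver recovers both the primal solution and the multiplier, with no iterative optimization needed.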
Dual Problem for QP

Consider the minimization problem
min_x f(x) = (1/2) x^T Q x, subject to Ax ≤ b.

The Lagrange function is
L(x, λ) = f(x) + λ^T (Ax − b).

We need to minimize L with respect to x and maximize L with respect to λ. The dual function is defined as a function of the dual variables only, by solving for x first with any fixed λ:
G(λ) = inf_x L(x, λ).
Dual Problem Derivation for QP

With λ fixed, we solve the equation ∂L/∂x (x, λ) = 0 and obtain x = −Q^{−1} A^T λ (assuming Q is positive definite, so Q^{−1} exists). Plugging this into L, we get the dual function
G(λ) = −(1/2) λ^T A Q^{−1} A^T λ − b^T λ.

The dual problem of the QP is
max_λ G(λ) = −(1/2) λ^T A Q^{−1} A^T λ − b^T λ, subject to λ ≥ 0.

We first find λ* = arg max_λ G(λ), and then compute x* = −Q^{−1} A^T λ*. The idea is similar to the profile method.
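This derivation can be verified on a tiny hypothetical QP (the matrices below are illustrative assumptions): the dual maximizer recovers the constrained primal minimizer via x* = −Q⁻¹Aᵀλ*.

```python
import numpy as np

# Minimize (1/2) x^T Q x subject to Ax <= b, with one active constraint x1 <= -1.
Q = np.eye(2)
A = np.array([[1.0, 0.0]])
b = np.array([-1.0])

M = A @ np.linalg.inv(Q) @ A.T   # A Q^{-1} A^T, here the 1x1 matrix [[1.0]]

def G(lam):
    # Dual function: G(lambda) = -(1/2) lambda^T A Q^{-1} A^T lambda - b^T lambda.
    return float(-0.5 * lam @ M @ lam - b @ lam)

# For this instance G(lambda) = -lambda^2/2 + lambda, maximized at lambda* = 1 >= 0.
lam_star = np.array([1.0])
x_star = -np.linalg.inv(Q) @ A.T @ lam_star   # recover the primal solution

print(np.allclose(x_star, [-1.0, 0.0]))   # the constrained minimizer
print(bool(np.all(A @ x_star <= b + 1e-12)))  # primal feasibility: Ax* <= b
print(G(lam_star) > G(np.array([0.5])) and G(lam_star) > G(np.array([2.0])))
```

All three checks print `True`: the unconstrained minimum (the origin) violates x₁ ≤ −1, and maximizing the concave dual function pushes the solution onto the constraint boundary at x* = (−1, 0).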