Linear programming II


Review: LP problem 1/33 The standard form of an LP problem is (primal problem): max z = cx s.t. Ax ≤ b, x ≥ 0. The corresponding dual problem is: min b^T y s.t. A^T y ≥ c^T, y ≥ 0. Strong Duality Theorem: if the primal problem has an optimal solution, then the dual also has an optimal solution and there is no duality gap.

Complementary Slackness Theorem 2/33 A byproduct of the proof of the Strong Duality Theorem is a theorem in its own right, the Complementary Slackness Theorem, which states: if x* and y* are feasible solutions of the primal and dual problems, then x* and y* are both optimal if and only if 1. y*^T (b - Ax*) = 0 2. (y*^T A - c) x* = 0. This implies that if a primal constraint is not binding (has slack), its corresponding dual variable must be 0, and vice versa. This theorem is useful for solving LP problems, and it is the foundation of another class of LP solvers called interior point methods.

Complementary Slackness Theorem example 3/33 Consider our simple LP problem: max z = x_1 + x_2, s.t. x_1 + 2x_2 ≤ 100, 2x_1 + x_2 ≤ 100, x_1, x_2 ≥ 0. Its dual problem is: min z = 100y_1 + 100y_2, s.t. y_1 + 2y_2 ≥ 1, 2y_1 + y_2 ≥ 1, y_1, y_2 ≥ 0.

Then the Complementary Slackness Theorem states: y_1(100 - x_1 - 2x_2) = 0, y_2(100 - 2x_1 - x_2) = 0, x_1(y_1 + 2y_2 - 1) = 0, x_2(2y_1 + y_2 - 1) = 0. At the optimum we have x* = [100/3, 100/3] and y* = [1/3, 1/3]. Complementary slackness holds, and all the constraints are binding in both the primal and the dual problems.
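As a quick check, the example can be reproduced in R. This is an illustrative sketch, not part of the original slides; it assumes the lpSolve package and that lp() called with compute.sens = 1 returns the constraint duals in its $duals component.

library(lpSolve)
obj <- c(1, 1)                                   # max z = x1 + x2
A   <- matrix(c(1, 2,
                2, 1), nrow = 2, byrow = TRUE)
rhs <- c(100, 100)
fit <- lp(direction = "max", objective.in = obj,
          const.mat = A, const.dir = c("<=", "<="), const.rhs = rhs,
          compute.sens = 1)
x_star <- fit$solution                           # expected c(100/3, 100/3)
y_star <- fit$duals[1:2]                         # constraint duals: c(1/3, 1/3)
y_star * (rhs - as.numeric(A %*% x_star))        # y* (b - A x*) = 0
(as.numeric(t(A) %*% y_star) - obj) * x_star     # (y*^T A - c) x* = 0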

Complementary Slackness Theorem example (cont.) 5/33 Modify the simple LP problem a bit: max z = 3x_1 + x_2, s.t. x_1 + 2x_2 ≤ 100, 2x_1 + x_2 ≤ 100, x_1, x_2 ≥ 0, and its dual problem is: min z = 100y_1 + 100y_2, s.t. y_1 + 2y_2 ≥ 3, 2y_1 + y_2 ≥ 1, y_1, y_2 ≥ 0.

Now the Complementary Slackness Theorem states: y_1(100 - x_1 - 2x_2) = 0, y_2(100 - 2x_1 - x_2) = 0, x_1(y_1 + 2y_2 - 3) = 0, x_2(2y_1 + y_2 - 1) = 0. For this LP problem, the optimal solutions are x* = [50, 0], y* = [0, 1.5]. Complementary slackness still holds. We observe that: In the primal problem: the first constraint is not binding (has slack), so its corresponding variable in the dual problem (y_1) has to be 0; the second constraint is binding, so its corresponding variable in the dual problem (y_2) can be non-zero. In the dual problem: the first constraint is binding, so its corresponding variable in the primal problem (x_1) can be non-zero; the second constraint is not binding, so its corresponding variable in the primal problem (x_2) has to be 0.

Primal/Dual optimality conditions 7/33 Given the primal and dual problems with slack/surplus variables added: Primal: max cx s.t. Ax + w = b, x, w ≥ 0. Dual: min b^T y s.t. A^T y - z = c^T, y, z ≥ 0. The Complementary Slackness Theorem states that at an optimal solution we should have x_j z_j = 0 for all j, and w_i y_i = 0 for all i. To put this in matrix notation, define X = diag(x), meaning X is a diagonal matrix with the x_j as diagonal elements (and similarly Z, W, Y). Define e as a vector of 1's. Now the complementarity conditions can be written as: XZe = 0, WYe = 0.

We will have the optimality conditions for the primal/dual problems as: Ax + w - b = 0, A^T y - z - c^T = 0, XZe = 0, WYe = 0, x, y, w, z ≥ 0. The first two conditions are simply the constraints of the primal/dual problems. The next two are complementary slackness. The last one is the non-negativity constraint. Ignoring the non-negativity constraints, this is a set of 2n + 2m equations in 2n + 2m unknowns (n and m are the numbers of unknowns and constraints in the primal problem), which can be solved using Newton's method. Such an approach is called a primal-dual interior point method.

Primal-dual interior point method 9/33 The primal-dual interior point method finds the primal-dual optimal solution (x*, y*, w*, z*) by applying Newton's method to the primal-dual optimality conditions. The direction and length of the steps are modified in each step so that the non-negativity condition is strictly satisfied in each iteration. To be specific, define the following function F: R^{2n+2m} -> R^{2n+2m}:
F(x, y, w, z) = [ Ax + w - b;  A^T y - z - c^T;  XZe;  WYe ].
The goal is to find a solution of F = 0.

Applying Newton's method, if at iteration k the variables are (x^k, y^k, w^k, z^k), we obtain a search direction (δx, δy, δw, δz) by solving the linear equations:
∇F(x^k, y^k, w^k, z^k) (δx; δy; δw; δz) = -F(x^k, y^k, w^k, z^k).
Here ∇F is the Jacobian. At iteration k, the equations are:
[ A     0     I     0   ] [δx]   [ -A x^k - w^k + b       ]
[ 0     A^T   0    -I   ] [δy] = [ -A^T y^k + z^k + c^T   ]
[ Z^k   0     0     X^k ] [δw]   [ -X^k Z^k e             ]
[ 0     W^k   Y^k   0   ] [δz]   [ -W^k Y^k e             ]
Then the update is obtained as (x^k, y^k, w^k, z^k) + α(δx, δy, δw, δz) with α ∈ (0, 1]. α is chosen so that the result of the next iteration is feasible. Given that at the current iteration both the primal and the dual are strictly feasible, the first two blocks on the right-hand side are 0.

An improved algorithm 11/33 The algorithm in its current setup is not ideal because often only a small step can be taken before the positivity constraints are violated. A more flexible version is proposed as follows. The sum of the entries of XZe and WYe (that is, x^T z + w^T y) is the duality gap. Instead of trying to eliminate the duality gap at once, we reduce it by some factor in each step. In other words, we replace the complementary slackness conditions by: XZe = µ_x e, WYe = µ_y e. When µ_x, µ_y -> 0 as k -> ∞, the solution of this system converges to the optimal solution of the original LP problem. Easy choices of the µ's are µ_x^k = (x^k)^T z^k / n and µ_y^k = (w^k)^T y^k / m. Here n and m are the dimensions of x and y respectively.

Under the new algorithm, at the kth iteration the Newton equations become:
[ A     0     I     0   ] [δx]   [ 0                       ]
[ 0     A^T   0    -I   ] [δy] = [ 0                       ]    (1)
[ Z^k   0     0     X^k ] [δw]   [ -X^k Z^k e + µ_x^k e    ]
[ 0     W^k   Y^k   0   ] [δz]   [ -W^k Y^k e + µ_y^k e    ]
This provides the general primal-dual interior point method as follows: 1. Choose a strictly feasible initial solution (x^0, y^0, w^0, z^0), set k = 0, and repeat the following two steps until convergence. 2. Solve system (1) to obtain the updates (δx, δy, δw, δz). 3. Update the solution: (x^{k+1}, y^{k+1}, w^{k+1}, z^{k+1}) = (x^k, y^k, w^k, z^k) + α_k (δx, δy, δw, δz), where α_k is chosen so that all variables remain non-negative (strictly positive in practice).
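The following R sketch is my own illustration, not from the slides: it implements this iteration literally for max cx s.t. Ax ≤ b, x ≥ 0. The function name primal_dual_ipm, the shrink factor applied to µ (a standard centering heuristic), and the dense solve of system (1) are all assumptions made for clarity rather than efficiency.

primal_dual_ipm <- function(A, b, cvec, x, y, w, z,
                            shrink = 0.1, tol = 1e-8, maxit = 100) {
  m <- nrow(A); n <- ncol(A)
  for (k in 1:maxit) {
    # Centering parameters: shrink the current average complementarity products.
    mu_x <- shrink * sum(x * z) / n
    mu_y <- shrink * sum(w * y) / m
    # Residuals of the relaxed optimality conditions.
    r1 <- as.numeric(A %*% x + w - b)        # primal feasibility
    r2 <- as.numeric(t(A) %*% y - z - cvec)  # dual feasibility
    r3 <- x * z - mu_x                       # relaxed XZe = mu_x e
    r4 <- w * y - mu_y                       # relaxed WYe = mu_y e
    if (max(abs(c(r1, r2, r3, r4))) < tol) break
    # Jacobian of system (1); unknowns ordered (dx, dy, dw, dz).
    J <- rbind(
      cbind(A,               matrix(0, m, m), diag(1, m),      matrix(0, m, n)),
      cbind(matrix(0, n, n), t(A),            matrix(0, n, m), -diag(1, n)),
      cbind(diag(z, n),      matrix(0, n, m), matrix(0, n, m), diag(x, n)),
      cbind(matrix(0, m, n), diag(w, m),      diag(y, m),      matrix(0, m, n))
    )
    d  <- solve(J, -c(r1, r2, r3, r4))
    dx <- d[1:n]
    dy <- d[(n + 1):(n + m)]
    dw <- d[(n + m + 1):(n + 2 * m)]
    dz <- d[(n + 2 * m + 1):(2 * n + 2 * m)]
    # Step length: stay strictly inside the positive orthant.
    step  <- c(dx, dy, dw, dz)
    vars  <- c(x, y, w, z)
    limit <- if (any(step < 0)) min(-vars[step < 0] / step[step < 0]) else 1
    alpha <- min(1, 0.99 * limit)
    x <- x + alpha * dx; y <- y + alpha * dy
    w <- w + alpha * dw; z <- z + alpha * dz
  }
  list(x = x, y = y, objective = sum(cvec * x), iterations = k)
}

# Toy problem from the earlier example: max x1 + x2 s.t. x1 + 2*x2 <= 100, 2*x1 + x2 <= 100.
A <- matrix(c(1, 2, 2, 1), nrow = 2, byrow = TRUE)
fit <- primal_dual_ipm(A, b = c(100, 100), cvec = c(1, 1),
                       x = c(1, 1), y = c(1, 1),      # strictly feasible primal/dual start
                       w = c(97, 97), z = c(2, 2))
fit$x   # approaches c(100/3, 100/3)
fit$y   # approaches c(1/3, 1/3)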

The Barrier Problem 13/33 The interior point algorithm is closely related to the barrier problem. Go back to the primal problem: max z = cx s.t. Ax + w = b, x, w ≥ 0. The non-negativity constraints can be replaced by adding two barrier terms to the objective function. The barrier term is defined as B(x) = Σ_j log x_j, which is finite as long as every x_j is positive. Then the primal problem becomes: max z = cx + µ_x B(x) + µ_y B(w) s.t. Ax + w = b. The barrier terms make sure x and w won't become negative. Before trying to solve this problem, we need some knowledge about Lagrange multipliers.

Lagrange multiplier 14/33 The method of Lagrange multipliers is a general technique for optimization problems with equality constraints. For example, consider the problem: max f(x, y) s.t. g(x, y) = c. We introduce a new variable λ, called a Lagrange multiplier, and form the following new objective function: L(x, y, λ) = f(x, y) + λ[g(x, y) - c]. We then optimize L with respect to x, y and λ using standard unconstrained methods. Note that the condition ∂L/∂λ = 0 at the optimal solution guarantees that the constraint is satisfied.
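A tiny worked instance (my own illustration, not from the slides): maximize f(x, y) = xy subject to x + y = c.

% Illustrative Lagrange multiplier example: max xy s.t. x + y = c.
\[
  L(x, y, \lambda) = xy + \lambda (x + y - c), \qquad
  \frac{\partial L}{\partial x} = y + \lambda = 0, \quad
  \frac{\partial L}{\partial y} = x + \lambda = 0, \quad
  \frac{\partial L}{\partial \lambda} = x + y - c = 0,
\]
\[
  \text{so } x = y = -\lambda = \frac{c}{2}, \qquad
  f\!\left(\tfrac{c}{2}, \tfrac{c}{2}\right) = \frac{c^{2}}{4}.
\]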

The Barrier Problem (cont.) 15/33 Going back to the barrier problem, the Lagrangian for this problem is (using y as the multiplier): L(x, y, w) = cx + µ_x B(x) + µ_y B(w) + y^T (b - w - Ax). The optimal solution for the problem satisfies (check this!): c^T + µ_x X^{-1} e - A^T y = 0, µ_y W^{-1} e - y = 0, b - w - Ax = 0. Defining the new variable z = µ_x X^{-1} e and rewriting these conditions, we obtain exactly the same set of equations as the relaxed optimality conditions for the primal-dual problem.
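One way to carry out the check (a sketch in my own notation, using ∂B/∂x_j = 1/x_j, i.e. ∇_x B(x) = X^{-1} e):

\[
  \nabla_x L = c^{T} + \mu_x X^{-1} e - A^{T} y = 0, \qquad
  \nabla_w L = \mu_y W^{-1} e - y = 0, \qquad
  \nabla_y L = b - w - Ax = 0 .
\]
% Setting z := \mu_x X^{-1} e turns the first condition into A^T y - z = c^T together with XZe = \mu_x e,
% and the second condition into WYe = \mu_y e: exactly the relaxed optimality conditions above.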

Introduction to quadratic programming 16/33 We have discussed linear programming, where both the objective function and the constraints are linear functions of the unknowns. The quadratic programming (QP) problem has a quadratic objective function and linear constraints: max f(x) = (1/2) x^T Bx + cx s.t. Ax ≤ b, x ≥ 0. The algorithm for solving a QP problem is very similar to that for LP, but first we need to introduce the KKT conditions.

KKT conditions 17/33 The Karush-Kuhn-Tucker (KKT) conditions are a set of necessary conditions for a solution to be optimal in a general non-linear programming problem. Consider the following problem: max f(x) s.t. g_i(x) ≤ 0, i = 1,..., I, h_j(x) = 0, j = 1,..., J. The Lagrangian is: L(x, y, z) = f(x) - Σ_i y_i g_i(x) - Σ_j z_j h_j(x). Then at the optimal solution the following KKT conditions must be satisfied: Primal feasibility: g_i(x*) ≤ 0, h_j(x*) = 0. Dual feasibility: y_i ≥ 0. (What about z_j?) Complementary slackness: y_i g_i(x*) = 0. Stationarity: ∇f(x*) - Σ_i y_i ∇g_i(x*) - Σ_j z_j ∇h_j(x*) = 0.

Optimal solution for QP 18/33 Following the same procedure, the Lagrangian for the QP problem can be expressed as: L(x, y, z) = (1/2) x^T Bx + cx - y^T (Ax - b) + z^T x. Then the KKT conditions for the QP problem are: Primal feasibility: Ax ≤ b, x ≥ 0. Dual feasibility: y ≥ 0, z ≥ 0 (pay attention to the sign of z). Complementary slackness: Y(Ax - b) = 0, Zx = 0. Stationarity: Bx + c^T - A^T y + z = 0. Y and Z are diagonal matrices with y and z on the diagonal. This system can be solved using the interior point method.

To be specific, adding the slack variable w (= b - Ax), the optimality conditions become: Ax + w - b = 0, Bx + c^T - A^T y + z = 0, Zx = 0, Yw = 0, x, y, z, w ≥ 0. The unknowns are x, y, z, w. We can then obtain the Jacobian, form the Newton equations, and solve for the optimal solution iteratively.

QP in R 20/33 The quadprog package provides functions (solve.QP, solve.QP.compact) to solve quadratic programming problems. Pay attention to the definition of the function parameters; they are slightly different from what I have used in the standard form!
solve.QP                package:quadprog                R Documentation
Solve a Quadratic Programming Problem
Description: This routine implements the dual method of Goldfarb and Idnani (1982, 1983) for solving quadratic programming problems of the form min(-d^T b + 1/2 b^T D b) with the constraints A^T b >= b_0.
Usage: solve.QP(Dmat, dvec, Amat, bvec, meq=0, factorized=FALSE)
Arguments:
Dmat: matrix appearing in the quadratic function to be minimized.

dvec: vector appearing in the quadratic function to be minimized.
Amat: matrix defining the constraints under which we want to minimize the quadratic function.
bvec: vector holding the values of b_0 (defaults to zero).
meq: the first meq constraints are treated as equality constraints, all further as inequality constraints (defaults to 0).
factorized: logical flag: if TRUE, then we are passing R^(-1) (where D = R^T R) instead of the matrix D in the argument Dmat.

QP example 22/33 To solve min (1/2)(x_1^2 + x_2^2), s.t. 2x_1 + x_2 ≥ 1:
> library(quadprog)
> Dmat = diag(rep(1,2))
> dvec = rep(0,2)
> Amat = matrix(c(2,1))
> b = 1
> solve.QP(Dmat=Dmat, dvec=dvec, Amat=Amat, bvec=b)
$solution
[1] 0.4 0.2
$value
[1] 0.1
$unconstrained.solution
[1] 0 0
$iterations
[1] 2 0
$Lagrangian
[1] 0.2
$iact
[1] 1
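A quick check against the KKT conditions from the previous slides (my own arithmetic, not in the original): the constraint 2x_1 + x_2 ≥ 1 is binding at the solution, and stationarity x - λ(2, 1) = 0 gives λ = 0.2, matching the reported $Lagrangian.

x_hat  <- c(0.4, 0.2)
lambda <- 0.2
x_hat - lambda * c(2, 1)        # stationarity residual: 0 0
2 * x_hat[1] + x_hat[2] - 1     # slack of the constraint: 0 (binding)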

Review 23/33 We have covered in the previous two classes: LP problem setup. Simplex method. Duality. Interior point algorithm. Quadratic programming. Now you should be able to formulate an LP/QP problem and solve it. But how are these useful in statistics? Remember that LP is essentially an optimization framework. There are plenty of optimization problems in statistics, e.g., MLE. It's just a matter of formulating the objective function and the constraints.

LP in statistics I: quantile regression 24/33 Motivation: Goal of regression: to tease out the relationship between outcome and covariates. Traditional regression: the mean of the outcome depends on covariates. Problem: data are not always well-behaved. Are mean regression methods sufficient in all circumstances? (Figure omitted.) Quantile regression: provides a much more exhaustive description of the data. The collection of regressions at all quantiles would give a complete picture of outcome-covariate relationships.

Quantile regression model 25/33 Regress conditional quantiles of the response on the covariates. Assume the outcome Y is continuous and that X is the vector of covariates. Classical model: Q_τ(Y | X) = Xβ_τ, where Q_τ(Y | X) is the τ-th conditional quantile of Y given X, and β_τ is the parameter of interest. The above model is equivalent to specifying Y = Xβ_τ + ε, Q_τ(ε | X) = 0. In comparison, the mean regression is: Y = Xβ + ε, E[ε | X] = 0.

Pros and cons 26/33 Advantages: Regression at a sequence of quantiles provides a more complete view of the data. Inference is robust to outliers. Estimation is more efficient when residual normality is severely violated. Allows interpretation in the outcome's original scale of measurement. Disadvantages: To be useful, one needs to regress on a set of quantiles: computational burden. The solution has no closed form. Adaptation to non-continuous outcomes is difficult.

The loss function 27/33 Link between estimands and loss functions: To obtain the sample mean of {y_1, y_2,..., y_n}, minimize Σ_i (y_i - b)^2 over b. To obtain the sample median of {y_1, y_2,..., y_n}, minimize Σ_i |y_i - b| over b. It can be shown that to obtain the sample τ-th quantile, one needs to minimize an asymmetric absolute loss, that is, compute
Q̂_τ(Y) = argmin_b [ τ Σ_{i: y_i ≥ b} |y_i - b| + (1 - τ) Σ_{i: y_i < b} |y_i - b| ].
For convenience, define ρ_τ(x) = x[τ - 1(x < 0)].
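A small numerical check in R (my own illustration, base R only): minimizing the check loss over b recovers the sample quantile.

set.seed(1)
y   <- rnorm(200)
tau <- 0.25
rho <- function(u, tau) u * (tau - (u < 0))   # check (asymmetric absolute) loss
obj <- function(b) sum(rho(y - b, tau))
optimize(obj, interval = range(y))$minimum    # approximately the 0.25 sample quantile
quantile(y, tau, type = 1)                    # sample quantile, for comparison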

Estimator 28/33 The classical linear quantile regression model is fitted by determining β̂_τ = argmin_b Σ_{i=1}^n ρ_τ(y_i - x_i b). The estimator has all the expected properties: Scale equivariance: β̂_τ(ay, X) = a β̂_τ(y, X) and β̂_τ(-ay, X) = -a β̂_{1-τ}(y, X) for a ≥ 0. Shift (or regression) equivariance: β̂_τ(y + Xγ, X) = β̂_τ(y, X) + γ. Equivariance to reparametrization of design: β̂_τ(y, XA) = A^{-1} β̂_τ(y, X) for nonsingular A.

Asymmetric Double Exponential (ADE) distribution 29/33 The least-squares estimator is the MLE if the residuals are normal. The QR estimator is the MLE if the residuals are ADE. Density function for the ADE: f(y; µ, σ, τ) = [τ(1 - τ)/σ] exp{ -ρ_τ((y - µ)/σ) }. (Figure omitted: density of ADE(0, 1, 0.25).) If the residuals are iid ADE(0, 1, τ), then the log-likelihood for β_τ is l(β_τ; Y, X, τ) = -Σ_{i=1}^n ρ_τ(y_i - x_i β_τ) + c_0.
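To make the connection concrete, here is a small R check (my own sketch; dade() is an ad hoc helper written from the density above, not a standard function): the ADE(µ, 1, τ) negative log-likelihood equals the check-loss objective up to a constant.

rho  <- function(u, tau) u * (tau - (u < 0))
dade <- function(y, mu = 0, sigma = 1, tau = 0.5)
  tau * (1 - tau) / sigma * exp(-rho((y - mu) / sigma, tau))
y <- rnorm(5); tau <- 0.25; mu <- 0.3
-sum(log(dade(y, mu, 1, tau)))                            # negative log-likelihood
sum(rho(y - mu, tau)) - length(y) * log(tau * (1 - tau))  # check loss + constant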

Model fitting 30/33 The QR problem β̂_τ = argmin_b Σ_{i=1}^n ρ_τ(y_i - x_i b) can be framed as an LP problem. First define a set of new variables: u_i ≡ [y_i - x_i b]_+, v_i ≡ [y_i - x_i b]_-, b^+ ≡ [b]_+, b^- ≡ [b]_-. Then the problem can be formulated as: min Σ_{i=1}^n [τ u_i + (1 - τ) v_i] s.t. y_i = x_i b^+ - x_i b^- + u_i - v_i, u_i, v_i ≥ 0, i = 1,..., n, b^+, b^- ≥ 0. This is a standard LP problem that can be solved by the Simplex or interior point methods.
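The sketch below is my own illustration, not from the slides: it assumes the lpSolve and quantreg packages, uses simulated data, and builds exactly this LP, comparing the result with quantreg::rq (whose method = "fn" option uses an interior point, Frisch-Newton, solver).

library(lpSolve)
library(quantreg)
set.seed(2)
n <- 50; tau <- 0.5
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rnorm(n)
X <- cbind(1, x); p <- ncol(X)
# Unknowns, in order: b_plus (p), b_minus (p), u (n), v (n); all >= 0.
obj  <- c(rep(0, 2 * p), rep(tau, n), rep(1 - tau, n))
Amat <- cbind(X, -X, diag(n), -diag(n))
fit_lp <- lp(direction = "min", objective.in = obj,
             const.mat = Amat, const.dir = rep("=", n), const.rhs = y)
beta_lp <- fit_lp$solution[1:p] - fit_lp$solution[(p + 1):(2 * p)]
beta_rq <- coef(rq(y ~ x, tau = tau))   # default simplex ("br"); method = "fn" is interior point
rbind(lp = beta_lp, rq = beta_rq)       # the two fits should agree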

A little more detail 31/33 Written in matrix notation, with b^+, b^-, u, v as the unknowns, we get:
min [0, 0, τ e^T, (1 - τ) e^T] (b^+; b^-; u; v)
s.t. [X, -X, I, -I] (b^+; b^-; u; v) = y, b^+, b^-, u, v ≥ 0.
The dual problem is:
max y^T d
s.t. [X^T; -X^T; I; -I] d ≤ (0; 0; τ e; (1 - τ) e), with d unrestricted.

Manipulating the constraints, we get X^T d = 0 and (τ - 1) e ≤ d ≤ τ e. Defining a new variable a = d + (1 - τ) e, the LP can be reformulated as: max y^T a s.t. X^T a = (1 - τ) X^T e, 0 ≤ a ≤ 1. Adding slack variables s for the upper-bound constraints, the problem can be written in standard form and solved using either the Simplex or interior point methods: max y^T a s.t. X^T a = (1 - τ) X^T e, a + s = e, a, s ≥ 0.

In this form, y (an n-vector) holds the outcomes, X (an n × p matrix) the predictors, and a and s (n-vectors) are the unknowns. There are 2n unknowns and p + n constraints. Once we have the optimal a, d can be obtained given τ (the quantile). Then, depending on which constraints are binding in the dual problem, one can determine the set of basic variables in the primal problem, and then solve for β.