Integer programming (part 2)
Lecturer: Javier Peña
Convex Optimization 10-725/36-725
Last time: integer programming

Consider the problem

    min_x   f(x)
    s.t.    x ∈ C
            x_j ∈ Z,  j ∈ J

where f : R^n → R and C ⊆ R^n are convex, and J ⊆ {1, ..., n}.

Branch and bound: an algorithm to solve integer programs. A divide-and-conquer scheme combined with upper and lower bounds for efficiency.
Branch and bound algorithm

Consider the problem

    min_x   f(x)
    s.t.    x ∈ C            (IP)
            x_j ∈ Z,  j ∈ J

where f : R^n → R and C ⊆ R^n are convex and J ⊆ {1, ..., n}.

1. Solve the convex relaxation

       min_x   f(x)
       s.t.    x ∈ C         (CR)

2. If (CR) is infeasible, then (IP) is infeasible. Stop.
3. If the solution x* of (CR) is (IP)-feasible, then x* solves (IP). Stop.
4. If the solution x* of (CR) is not (IP)-feasible, its value gives a lower bound for (IP). Branch and recursively solve subproblems.
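As a concrete illustration of the scheme above (not part of the original slides), here is a minimal pure-Python branch and bound for a 0/1 knapsack integer program. The convex relaxation at each node is the LP relaxation, which for knapsack can be solved greedily by value/weight ratio; all function names are ours. Since knapsack is a maximization problem, the relaxation gives an upper bound at each node and integral incumbents give lower bounds.

```python
def knapsack_lp_relaxation(values, weights, cap, fixed):
    """LP relaxation of 0/1 knapsack with some variables fixed to 0/1.
    Solved greedily by value/weight ratio (valid for the knapsack LP).
    Returns (bound, x), or (None, None) if the node is infeasible."""
    n = len(values)
    x = [0.0] * n
    remaining = cap
    bound = 0.0
    for j, v in fixed.items():
        if v == 1:
            remaining -= weights[j]
            bound += values[j]
            x[j] = 1.0
    if remaining < 0:
        return None, None                  # fixed items exceed capacity
    free = sorted((j for j in range(n) if j not in fixed),
                  key=lambda j: values[j] / weights[j], reverse=True)
    for j in free:
        take = min(1.0, remaining / weights[j])
        x[j] = take
        bound += take * values[j]
        remaining -= take * weights[j]
        if remaining <= 0:
            break
    return bound, x

def branch_and_bound(values, weights, cap):
    """Steps 1-4 of the slide: relax, prune, accept integral solutions,
    otherwise branch on a fractional variable."""
    best_val, best_x = 0.0, None
    stack = [{}]                           # nodes = partial 0/1 assignments
    while stack:
        fixed = stack.pop()
        bound, x = knapsack_lp_relaxation(values, weights, cap, fixed)
        if bound is None or bound <= best_val:
            continue                       # prune: infeasible or bound no better
        frac = [j for j, v in enumerate(x) if 0.0 < v < 1.0]
        if not frac:                       # relaxation solution is integral
            best_val, best_x = bound, [round(v) for v in x]
            continue
        j = frac[0]                        # branch on a fractional variable
        stack.append({**fixed, j: 0})
        stack.append({**fixed, j: 1})
    return best_val, best_x
```

For instance, `branch_and_bound([10, 13, 7], [4, 6, 3], 9)` returns value 20 with selection `[0, 1, 1]`; the node fixing items 1 and 2 is accepted as integral, and the remaining node is pruned because its relaxation bound (17) is below the incumbent.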
Key component of B&B

After branching, compute a lower bound for each subproblem. If a subproblem's lower bound is larger than the current upper bound, that subproblem need not be considered further.

The most straightforward way to compute lower bounds is via a convex relaxation, but other methods (e.g., Lagrangian relaxations) can also be used.
Tightness of relaxations

Suppose we have two equivalent formulations (e.g., facility location)

    min_x   f(x)                 min_x   f(x)
    s.t.    x ∈ C                s.t.    x ∈ C'
            x_j ∈ Z,  j ∈ J              x_j ∈ Z,  j ∈ J

with C ⊆ C'. Which one should we prefer?
Outline

Today: more about solution techniques
• Cutting planes
• Branch and cut

Two extended examples
• Best subset selection
• Least median of squares regression
Convexification

Consider the special case of an integer program with linear objective

    min_x   c^T x
    s.t.    x ∈ C
            x_j ∈ Z,  j ∈ J

This problem is equivalent to

    min_x   c^T x
    s.t.    x ∈ S

where S := conv{x ∈ C : x_j ∈ Z, j ∈ J}.
Special case: integer linear programs

Consider the problem

    min_x   c^T x
    s.t.    Ax ≤ b
            x_j ∈ Z,  j ∈ J

Theorem. If A, b are rational, then the set

    S := conv{x : Ax ≤ b, x_j ∈ Z, j ∈ J}

is a polyhedron.

Thus the above integer linear program is equivalent to a linear program. How hard could that be?
Cutting plane algorithm

We say that the inequality π^T x ≤ π_0 is valid for a set S if

    π^T x ≤ π_0   for all x ∈ S.

Consider the problem

    min_x   c^T x
    s.t.    x ∈ C
            x_j ∈ Z,  j ∈ J

and let S := conv{x ∈ C : x_j ∈ Z, j ∈ J}.
Cutting plane algorithm

Recall S = conv{x ∈ C : x_j ∈ Z, j ∈ J}, and we want to solve

    min_x   c^T x
    s.t.    x ∈ C            (IP)
            x_j ∈ Z,  j ∈ J

Cutting plane algorithm
1. Let C_0 := C and compute x^(0) := argmin_x {c^T x : x ∈ C_0}.
2. For k = 0, 1, ...
       if x^(k) is (IP)-feasible then
           x^(k) is an optimal solution. Stop.
       else
           find a valid inequality (π, π_0) for S that cuts off x^(k)
           let C_{k+1} := C_k ∩ {x : π^T x ≤ π_0}
           compute x^(k+1) := argmin_x {c^T x : x ∈ C_{k+1}}
       end if
   end for

A valid inequality is also called a cutting plane or a cut.
A bit of history on cutting planes

In 1954 Dantzig, Fulkerson, and Johnson pioneered the cutting plane approach for the traveling salesman problem. In 1958 Gomory proposed a general-purpose cutting plane method to solve any integer linear program. For more than three decades Gomory cuts were deemed impractical for solving actual problems. In the early 1990s Sebastian Ceria (at CMU) successfully incorporated Gomory cuts in a branch and cut framework. By the late 1990s cutting planes had become a key component of commercial optimization solvers.

This subject has a long tradition at Carnegie Mellon and is associated with some of our biggest names: Balas, Cornuejols, Jeroslow, Kilinc-Karzan, and many of their collaborators.
Gomory cuts (1958)

This class of cuts is based on the following observation: if a ≤ b and a is an integer, then a ≤ ⌊b⌋.

Suppose

    S ⊆ {x ∈ Z^n_+ : Σ_{j=1}^n a_j x_j = a_0}

where a_0 ∉ Z. The Gomory fractional cut is

    Σ_{j=1}^n (a_j − ⌊a_j⌋) x_j ≥ a_0 − ⌊a_0⌋.

There is a rich variety of extensions of the above idea: Chvátal cuts, split cuts, lift-and-project cuts, etc.
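The fractional cut can be read off an equation of the form above by taking fractional parts on both sides. A minimal helper (illustrative, not from the lecture):

```python
import math

def gomory_fractional_cut(a, a0):
    """Given a row sum_j a[j] * x_j = a0, valid for nonnegative integer x
    with a0 not integral, return (coeffs, rhs) of the Gomory fractional cut
        sum_j (a[j] - floor(a[j])) * x_j >= a0 - floor(a0)."""
    frac = lambda t: t - math.floor(t)   # fractional part, always in [0, 1)
    return [frac(aj) for aj in a], frac(a0)
```

For the row x_1 + 2.5 x_2 − 0.3 x_3 = 3.7, the cut is 0.5 x_2 + 0.7 x_3 ≥ 0.7; note that it cuts off the fractional basic solution x = (3.7, 0, 0), whose left-hand side is 0.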
Lift-and-project cuts

Theorem (Balas 1974). Assume C = {x : Ax ≤ b} ⊆ {x : 0 ≤ x_j ≤ 1}. Then

    C_j := conv{x ∈ C : x_j ∈ {0, 1}}

is the projection onto the x-space of the polyhedron L_j(C) defined by the following inequalities:

    Ay ≤ λ b
    Az ≤ (1 − λ) b
    y_j = 0,  z_j = 1 − λ
    y + z = x
    0 ≤ λ ≤ 1.

Suppose x̄ ∈ C but x̄ ∉ C_j. To find a cut separating x̄ from C_j, solve

    min_{x, y, z, λ}   ‖x̄ − x‖
    s.t.               (x, y, z, λ) ∈ L_j(C)
Branch and cut algorithm

Combines the strengths of both branch and bound and cutting planes.

Consider the problem

    min_x   f(x)
    s.t.    x ∈ C            (IP)
            x_j ∈ Z,  j ∈ J

where f : R^n → R and C ⊆ R^n are convex and J ⊆ {1, ..., n}.
Branch and cut algorithm

1. Solve the convex relaxation

       min_x   f(x)
       s.t.    x ∈ C         (CR)

2. If (CR) is infeasible, then (IP) is infeasible.
3. If the solution x* of (CR) is (IP)-feasible, then x* solves (IP).
4. If the solution x* of (CR) is not (IP)-feasible, choose between two alternatives:
   4.1 Add cuts and go back to step 1.
   4.2 Branch and recursively solve subproblems.
Integer programming technology

State-of-the-art solvers (Gurobi, CPLEX, FICO) rely on extremely efficient implementations of numerical linear algebra, simplex, interior-point, and other algorithms for convex optimization.

For mixed-integer optimization, most solvers use a clever type of branch and cut algorithm. They rely extensively on convex relaxations and also take advantage of warm starts as much as possible.

Some interesting numbers:
• Speedup in algorithms, 1990–2016: over 500,000×
• Speedup in hardware, 1990–2016: over 500,000×
• Total speedup: over 250 billion = 2.5 × 10^11
Best subset selection

Assume X = [x_1 ··· x_p] ∈ R^{n×p} and y ∈ R^n.

Best subset selection problem:

    min_β   (1/2) ‖y − Xβ‖_2^2
    s.t.    ‖β‖_0 ≤ k

Here ‖β‖_0 := number of nonzero entries of β.
Best subset selection

Integer programming formulation:

    min_{β, z}   (1/2) ‖y − Xβ‖_2^2
    s.t.         |β_i| ≤ M_i z_i,    i = 1, ..., p
                 Σ_{i=1}^p z_i ≤ k
                 z_i ∈ {0, 1},       i = 1, ..., p.

Here M_i is some a priori known bound on |β_i| for i = 1, ..., p. These bounds can be computed via suitable preprocessing of X, y.
A clever way to get good feasible solutions (Bertsimas, King, and Mazumder)

Consider the problem

    min_β   g(β)
    s.t.    ‖β‖_0 ≤ k

where g : R^p → R is smooth convex and ∇g is L-Lipschitz. Best subset selection corresponds to g(β) = (1/2) ‖Xβ − y‖_2^2.

Observation. For u ∈ R^p the vector

    H_k(u) := argmin_{β : ‖β‖_0 ≤ k} ‖β − u‖_2^2

is obtained by retaining the k largest-magnitude entries of u and setting the rest to zero.
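In code, the projection H_k is a simple hard-thresholding operation (a pure-Python sketch; the function name is ours):

```python
def hard_threshold(u, k):
    """H_k(u): keep the k largest entries of u in absolute value, zero the
    rest. This solves min ||beta - u||_2^2 subject to ||beta||_0 <= k."""
    keep = set(sorted(range(len(u)), key=lambda i: abs(u[i]), reverse=True)[:k])
    return [u[i] if i in keep else 0.0 for i in range(len(u))]
```

For example, `hard_threshold([3.0, -5.0, 1.0, 4.0], 2)` keeps the entries −5 and 4, since they are largest in magnitude, and zeroes the rest.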
Discrete first-order algorithm (Bertsimas et al.)

Consider the problem

    min_β   g(β)
    s.t.    ‖β‖_0 ≤ k

where g : R^p → R is smooth convex and ∇g is L-Lipschitz.

1. Start with some β^(0).
2. For i = 0, 1, ...

       β^(i+1) = H_k( β^(i) − (1/L) ∇g(β^(i)) )

   end for

The above iterates satisfy β^(i) → β̃, where

    β̃ = H_k( β̃ − (1/L) ∇g(β̃) ).

This is a kind of local solution to the above minimization problem.
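For g(β) = (1/2)‖Xβ − y‖_2^2, the scheme above is iterative hard thresholding. A self-contained pure-Python sketch (illustrative; L is assumed to be a known Lipschitz constant of ∇g, e.g. the largest eigenvalue of X^T X):

```python
def iht(X, y, k, L, iters=100):
    """Discrete first-order method: beta <- H_k(beta - (1/L) * grad g(beta)),
    where g(beta) = 0.5 * ||X beta - y||^2 and grad g(beta) = X^T (X beta - y).
    X is a list of rows; L is an assumed-known Lipschitz constant."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        # residual r = X beta - y and gradient X^T r
        r = [sum(X[i][j] * beta[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * r[i] for i in range(n)) for j in range(p)]
        # gradient step followed by hard thresholding H_k
        u = [beta[j] - grad[j] / L for j in range(p)]
        keep = set(sorted(range(p), key=lambda j: abs(u[j]), reverse=True)[:k])
        beta = [u[j] if j in keep else 0.0 for j in range(p)]
    return beta
```

On a toy orthogonal design (X the identity, so L = 1), the fixed point is simply H_k(y), matching the characterization of β̃ above.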
Computational results (Bertsimas et al.)

[Figure: best subset regression on the Diabetes dataset, n = 350, p = 64, k = 6. Typical behavior of the overall algorithm: the upper and lower bounds on the objective converge to the global minimum, and the MIO gap shrinks to zero over the course of the run.]
Computational results (Bertsimas et al.)

[Figure: cold starts vs. warm starts. Evolution of the MIO optimality gap (in log scale); runs warm-started with the discrete first-order solution and additional bounds close the gap faster than their cold-started counterparts.]
Computational results (Bertsimas et al.)

[Figure: sparsity detection on synthetic datasets with n = 500, p = 100. Number of nonzeros selected by MIO, Lasso, Step, and Sparsenet at signal-to-noise ratios 1.742, 3.484, and 6.967.]
Least median of squares regression

Assume X = [x_1 ··· x_p] ∈ R^{n×p} and y ∈ R^n. Given β ∈ R^p, let r := y − Xβ. Observe:

    Least squares (LS):              β_LS  := argmin_β Σ_i r_i^2
    Least absolute deviation (LAD):  β_LAD := argmin_β Σ_i |r_i|
    Least median of squares (LMS):   β_LMS := argmin_β (median_i |r_i|)
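The three criteria are easy to compare on a residual vector with a single gross outlier (a small illustrative sketch, not from the slides):

```python
import statistics

def ls_obj(r):
    """Least squares objective: sum of squared residuals."""
    return sum(ri ** 2 for ri in r)

def lad_obj(r):
    """Least absolute deviation objective: sum of absolute residuals."""
    return sum(abs(ri) for ri in r)

def lms_obj(r):
    """Least median of squares objective: median absolute residual."""
    return statistics.median(abs(ri) for ri in r)
```

On r = (1, −2, 100) the single outlier dominates both the LS objective (10005) and the LAD objective (103), but leaves the median absolute residual at 2, which is the robustness motivation behind LMS.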
Least quantile regression

Least quantile of squares (LQS):

    β_LQS := argmin_β |r|_(q)

where |r|_(q) is the qth ordered absolute residual:

    |r|_(1) ≤ |r|_(2) ≤ ··· ≤ |r|_(n).

Key step in the formulation: use binary and auxiliary variables to encode, for each entry i of r, the relevant condition

    |r_i| ≤ |r|_(q)   or   |r_i| ≥ |r|_(q).
Least quantile regression

Integer programming formulation:

    min_{β, γ, µ, µ̄, z}   γ
    s.t.   γ ≤ |r_i| + µ_i,      i = 1, ..., n
           γ ≥ |r_i| − µ̄_i,      i = 1, ..., n
           µ_i ≤ M z_i,          i = 1, ..., n
           µ̄_i ≤ M (1 − z_i),    i = 1, ..., n
           Σ_{i=1}^n z_i = q
           µ_i, µ̄_i ≥ 0,         i = 1, ..., n
           z_i ∈ {0, 1},         i = 1, ..., n.
First-order algorithm (Bertsimas & Mazumder)

Observe

    |r|_(q) = |y_(q) − x_(q)^T β| = H_q(β) − H_{q+1}(β)

where

    H_q(β) := Σ_{i=q}^n |y − x^T β|_(i)
            = max_w  Σ_{i=1}^n w_i |y_i − x_i^T β|
              s.t.   Σ_{i=1}^n w_i = n − q + 1
                     0 ≤ w_i ≤ 1,  i = 1, ..., n.

The function H_q(β) is convex. A subgradient algorithm yields a local min of the difference of convex functions H_q(β) − H_{q+1}(β).
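The identity |r|_(q) = H_q(β) − H_{q+1}(β) can be sanity-checked numerically once the residuals are in hand (an illustrative sketch assuming the ascending ordering of the previous slide; function names are ours):

```python
def H(r, q):
    """H_q evaluated on a residual vector r: sum of the absolute residuals
    of ranks q through n in ascending order, i.e. the n - q + 1 largest."""
    s = sorted(abs(ri) for ri in r)   # |r|_(1) <= ... <= |r|_(n)
    return sum(s[q - 1:])             # 1-based ranks q, ..., n

def r_ordered(r, q):
    """The qth smallest absolute residual |r|_(q)."""
    return sorted(abs(ri) for ri in r)[q - 1]
```

For r = (3, −1, 4, −2) and q = 3: the sorted absolute residuals are (1, 2, 3, 4), so H_3 = 3 + 4 = 7, H_4 = 4, and their difference recovers |r|_(3) = 3.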
Computational results (Bertsimas & Mazumder)

[Figure: typical evolution of the MIO on the Alcohol data for (n, p, q) = (44, 5, 31) and (44, 7, 31). In both panels the upper and lower bounds on the objective value converge to the optimal solution; the larger instance takes roughly ten times as long.]
Computational results (Bertsimas & Mazumder)

[Figure: impact of warm starts. Evolution of the MIO with a cold start (top) vs. a warm start (bottom); the warm-started run begins with far tighter objective bounds and closes the gap sooner.]
References and further reading

• Belotti, Kirches, Leyffer, Linderoth, Luedtke, and Mahajan (2012), Mixed-integer nonlinear optimization
• Bertsimas, King, and Mazumder (2016), Best subset selection via a modern optimization lens
• Bertsimas and Mazumder (2014), Least quantile regression via modern optimization
• Conforti, Cornuéjols, and Zambelli (2014), Integer Programming
• Wolsey (1998), Integer Programming