Advanced Mixed Integer Programming Formulations for Non-Convex Optimization Problems in Statistical Learning

1 Advanced Mixed Integer Programming Formulations for Non-Convex Optimization Problems in Statistical Learning Juan Pablo Vielma Massachusetts Institute of Technology 2016 IISA International Conference on Statistics. Corvallis, Oregon, August, 2016.

2 (Custom) Product Recommendations via CBCA. [Example adaptive questionnaire: pairs of cameras compared on features such as zoom, price, weight, waterproofing and viewfinder type (SX530 vs. RX100; TG-4 vs. G9), ending with a recommendation screen comparing the TG-4 and the Galaxy 2.] 1 / 22

3 Towards Optimal Product Recommendation. Find enough information about preferences to recommend. [Same example questionnaire and recommendation screen as the previous slide.] How do I pick the next (1st) question to obtain the largest reduction of uncertainty or variance on preferences? 2 / 22

4 Choice-based Conjoint Analysis. Example question: toy Chewbacca vs. toy BB-8, compared on the features Wookiee (Yes / No), Droid (No / Yes) and Blaster (Yes / No); the respondent marks which toy they would buy. The product profiles are the binary feature vectors $x^1 = (1, 0, 1)$ and $x^2 = (0, 1, 0)$. 3 / 22

5 MNL Preference Model. Utilities for 2 products with n features (e.g. n = 12): $U_1 = \beta \cdot x^1 + \epsilon_1 = \sum_{i=1}^{n} \beta_i x^1_i + \epsilon_1$ and $U_2 = \beta \cdot x^2 + \epsilon_2 = \sum_{i=1}^{n} \beta_i x^2_i + \epsilon_2$, where $\beta$ are the part-worths, $x^1, x^2$ the product profiles, and $\epsilon_1, \epsilon_2$ Gumbel noise. A utility-maximizing customer prefers $x^1$ to $x^2$ ($x^1 \succ x^2$) if and only if $U_1 \ge U_2$. Noise can result in response error: $L(x^1 \succ x^2 \mid \beta) = P(x^1 \succ x^2) = \frac{e^{\beta \cdot x^1}}{e^{\beta \cdot x^1} + e^{\beta \cdot x^2}}$. 4 / 22
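As a quick numeric illustration of this choice probability, a short Julia snippet (the part-worths and profiles below are made-up values, not from the talk):

```julia
# Logit choice probability P(x1 ≻ x2 | β) for illustrative values of β, x1, x2.
using LinearAlgebra

β  = [0.8, -0.3, 0.5]            # hypothetical part-worths
x1 = [1, 0, 1]                   # product profile 1 (binary features)
x2 = [0, 1, 0]                   # product profile 2

p = exp(dot(β, x1)) / (exp(dot(β, x1)) + exp(dot(β, x2)))
println("P(x1 ≻ x2 | β) = ", p)  # approaches 1 as β⋅x1 grows relative to β⋅x2
```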

6 Next Question To Reduce Variance. Bayesian workflow: start from a prior distribution of the part-worths $\beta$; after each answered question, perform a Bayesian update (via MCMC) to obtain the posterior distribution, and repeat. [Figure: example questions alternating with prior and posterior distributions of $\beta$.] With a black-box objective, question selection requires enumeration; here, question selection is done by Mixed Integer Programming (MIP). 5 / 22

7 Avoiding Enumeration with MIP

8 Traveling Salesman Problem (TSP): Visit Cities Fast 7 / 22

9 How about 49 cities? Number of tours: 48!/2. Assuming one floating point operation per tour, the fastest supercomputer would need many times the age of the universe! How long does it take on an iPhone? Less than a second, using 4 iterations of a cutting plane method! Dantzig, Fulkerson and Johnson (1954) did it by hand! For more info see the tutorial in the ConcordeTSP app. Cutting planes are the key for effectively solving (even NP-hard) MIP problems in practice. 8 / 22
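The tour count is easy to sanity-check; a one-liner, assuming the standard $(n-1)!/2$ count of distinct symmetric tours:

```julia
# Number of distinct tours through 49 cities in a symmetric TSP: (49 - 1)!/2.
tours = factorial(big(48)) ÷ 2
println(tours)   # a 61-digit number, far beyond brute-force enumeration
```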

10 50+ Years of MIP = Significant Solver Speedups. Algorithmic improvements (machine independent): CPLEX v1.2 (1991) to v11 (2007): 29,000x speedup; Gurobi v1 (2009) to v6.5 (2015): 48.7x speedup; both commercial, but free for academic use. (Reasonably) effective free / open-source solvers: GLPK, CBC and SCIP (free only for non-commercial use). Easy to use, fast and versatile modeling languages, e.g. the Julia-based JuMP modeling language; a minimal example is sketched below. Linear MIP solvers are very mature and effective; convex nonlinear MIP is getting there (quadratic nearly there). 9 / 22
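To illustrate the modeling-language point, a minimal JuMP model solved with the free GLPK solver; the tiny knapsack instance is an invented example, not from the talk:

```julia
# A minimal mixed integer program in JuMP (Julia), solved with GLPK.
using JuMP, GLPK

profit   = [6, 5, 4, 3]
weight   = [4, 3, 2, 1]
capacity = 6

model = Model(GLPK.Optimizer)
@variable(model, x[1:4], Bin)                                  # select item i or not
@constraint(model, sum(weight[i] * x[i] for i in 1:4) <= capacity)
@objective(model, Max, sum(profit[i] * x[i] for i in 1:4))
optimize!(model)

println("objective = ", objective_value(model))
println("selection = ", value.(x))
```

The same model runs unchanged with a commercial solver by swapping `GLPK.Optimizer` for, e.g., the Gurobi or CPLEX optimizer object.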

11 Question Selection with MIP

12 Bayesian Update and Geometric Updates. Prior distribution: $\beta \sim N(\mu, \Sigma)$ with density $\varphi(\beta; \mu, \Sigma)$. Answer likelihood: $L(x^1 \succ x^2 \mid \beta)$. Posterior density: $f_{x^1 \succ x^2}(\beta) = \frac{\varphi(\beta; \mu, \Sigma)\, L(x^1 \succ x^2 \mid \beta)}{\int \varphi(\beta; \mu, \Sigma)\, L(x^1 \succ x^2 \mid \beta)\, d\beta}$. This requires multidimensional integration and is non-convex in $x^1, x^2 \in \{0,1\}^n$. 11 / 22
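The MCMC update can be sketched with a plain random-walk Metropolis sampler; everything below (names, step size, number of draws) is an illustrative assumption rather than the talk's actual implementation:

```julia
# Random-walk Metropolis sketch for the posterior over part-worths β,
# given a N(μ, Σ) prior and logit likelihoods of the answered questions.
using LinearAlgebra

logit_like(β, x1, x2) = 1 / (1 + exp(-dot(β, x1 - x2)))   # P(x1 ≻ x2 | β)

function log_posterior(β, μ, Σ, answers)
    lp = -0.5 * dot(β - μ, Σ \ (β - μ))                    # log prior, up to a constant
    for (x1, x2) in answers                                # each answer: "x1 preferred to x2"
        lp += log(logit_like(β, x1, x2))
    end
    return lp
end

function metropolis(μ, Σ, answers; n_draws = 5_000, step = 0.1)
    β  = float.(μ)
    lp = log_posterior(β, μ, Σ, answers)
    draws = Vector{Vector{Float64}}()
    for _ in 1:n_draws
        cand = β .+ step .* randn(length(β))
        lp_cand = log_posterior(cand, μ, Σ, answers)
        if log(rand()) < lp_cand - lp                      # accept / reject
            β, lp = cand, lp_cand
        end
        push!(draws, copy(β))
    end
    return draws      # estimate the posterior mean and covariance from these draws
end
```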

13 D-Efficiency and Posterior Covariance Matrix. Starting from $\beta \sim N(\mu, \Sigma)$, the answer $x^1 \succ x^2$ leads to posterior covariance $\mathrm{cov}(\beta) = \Sigma_1$, while $x^2 \succ x^1$ leads to $\mathrm{cov}(\beta) = \Sigma_2$. Variance criterion = D-efficiency, the non-convex function $f(x^1, x^2) := \mathbb{E}\left[\det(\Sigma_i)^{1/p}\right]$, where the expectation is over $\beta$ and the answer ($x^1 \succ x^2$ or $x^2 \succ x^1$). Even evaluating the expected D-efficiency of a single question requires multidimensional integration. 12 / 22
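To make the integration burden concrete, a Monte Carlo sketch of the expected D-efficiency of one candidate question, reusing the illustrative `metropolis` and `logit_like` helpers above (this is not the talk's evaluation procedure):

```julia
# Monte Carlo sketch of f(x1, x2) = E[ det(Σ_i)^(1/p) ] for one candidate question.
using LinearAlgebra, Statistics

function expected_defficiency(x1, x2, μ, Σ, answers)
    p = length(μ)
    current = metropolis(μ, Σ, answers)                         # draws from the current belief
    prob1 = mean(logit_like(β, x1, x2) for β in current)        # P(answer is x1 ≻ x2)
    val = 0.0
    for (ans, w) in (((x1, x2), prob1), ((x2, x1), 1 - prob1))  # both possible answers
        post = metropolis(μ, Σ, vcat(answers, [ans]))           # posterior after that answer
        Σi = cov(reduce(hcat, post)')                           # posterior covariance estimate
        val += w * det(Σi)^(1 / p)
    end
    return val
end
```

Enumerating this quantity over every feasible question pair is exactly the cost the MIP approach is meant to avoid.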

14 Standard Question Selection Criteria. Credibility ellipsoid for the part-worths: $(\beta - \mu)' \Sigma^{-1} (\beta - \mu) \le r$. Choice balance: minimize the distance of the question to the center $\mu$, i.e. keep $\mu'(x^1 - x^2)$ small. Postchoice symmetry: maximize the variance of the question, $(x^1 - x^2)' \Sigma (x^1 - x^2)$. 13 / 22

15 D-efficiency: Balance / Question-Variance Trade-off. D-efficiency is a non-convex function $f(d, v)$ of the distance $d := \mu'(x^1 - x^2)$ and the variance $v := (x^1 - x^2)' \Sigma (x^1 - x^2)$, and $f(d, v)$ can be evaluated with a one-dimensional integral. 14 / 22

16 Optimization Model: $\min\ f(d, v)$ subject to $\mu'(x^1 - x^2) = d$, $(x^1 - x^2)' \Sigma (x^1 - x^2) = v$, $A^1 x^1 + A^2 x^2 \le b$, $x^1 \ne x^2$, $x^1, x^2 \in \{0,1\}^n$. 15 / 22

17 Technique 1: Binary Quadratic. For $x^1, x^2 \in \{0,1\}^n$, linearize the quadratic constraint $(x^1 - x^2)' \Sigma (x^1 - x^2) = v$ by introducing product variables $X^l_{i,j} = x^l_i x^l_j$ for $l \in \{1, 2\}$ and $i, j \in \{1, \dots, n\}$, enforced by $X^l_{i,j} \le x^l_i$, $X^l_{i,j} \le x^l_j$, $X^l_{i,j} \ge x^l_i + x^l_j - 1$, $X^l_{i,j} \ge 0$, and $W_{i,j} = x^1_i x^2_j$, enforced by $W_{i,j} \le x^1_i$, $W_{i,j} \le x^2_j$, $W_{i,j} \ge x^1_i + x^2_j - 1$, $W_{i,j} \ge 0$. The constraint then becomes $\sum_{i,j=1}^{n} \left(X^1_{i,j} + X^2_{i,j} - W_{i,j} - W_{j,i}\right) \Sigma_{i,j} = v$. 16 / 22
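A sketch of this linearization in JuMP; the dimension and covariance matrix below are placeholders, and the talk does not prescribe this exact code:

```julia
# Technique 1 sketch: linearize (x1 - x2)' Σ (x1 - x2) = v with product variables.
using JuMP, GLPK, LinearAlgebra

n = 4
Σ = Matrix(1.0I, n, n)                     # stand-in for the prior covariance
model = Model(GLPK.Optimizer)

@variable(model, x1[1:n], Bin)
@variable(model, x2[1:n], Bin)
@variable(model, 0 <= X1[1:n, 1:n] <= 1)   # X1[i,j] models x1[i] * x1[j]
@variable(model, 0 <= X2[1:n, 1:n] <= 1)   # X2[i,j] models x2[i] * x2[j]
@variable(model, 0 <= W[1:n, 1:n] <= 1)    # W[i,j]  models x1[i] * x2[j]
@variable(model, v >= 0)

for i in 1:n, j in 1:n
    @constraint(model, X1[i, j] <= x1[i]);  @constraint(model, X1[i, j] <= x1[j])
    @constraint(model, X1[i, j] >= x1[i] + x1[j] - 1)
    @constraint(model, X2[i, j] <= x2[i]);  @constraint(model, X2[i, j] <= x2[j])
    @constraint(model, X2[i, j] >= x2[i] + x2[j] - 1)
    @constraint(model, W[i, j] <= x1[i]);   @constraint(model, W[i, j] <= x2[j])
    @constraint(model, W[i, j] >= x1[i] + x2[j] - 1)
end

# Linearized quadratic form, plus the x1 ≠ x2 condition from the next slide
@constraint(model, sum(Σ[i, j] * (X1[i, j] + X2[i, j] - W[i, j] - W[j, i])
                       for i in 1:n, j in 1:n) == v)
@constraint(model, sum(X1[i, j] + X2[i, j] - W[i, j] - W[j, i]
                       for i in 1:n, j in 1:n) >= 1)
```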

18 Technique 1: Binary Quadratic (continued). The requirement $x^1 \ne x^2$ is imposed as $\|x^1 - x^2\| \ge 1$. With the same product variables $X^l_{i,j} = x^l_i x^l_j$ and $W_{i,j} = x^1_i x^2_j$ (and the same linearization constraints as above), this becomes $\sum_{i,j=1}^{n} \left(X^1_{i,j} + X^2_{i,j} - W_{i,j} - W_{j,i}\right) \ge 1$. 17 / 22

19 Technique 2: Piecewise Linear Functions. Recall that D-efficiency is a non-convex function $f(d, v)$ of the distance $d := \mu'(x^1 - x^2)$ and the variance $v := (x^1 - x^2)' \Sigma (x^1 - x^2)$, and that $f(d, v)$ can be evaluated with a one-dimensional integral. Approximate it by piecewise linear interpolation, which admits a MIP formulation. 18 / 22

20 Simple Formulation for Univariate Functions. [Figure: piecewise linear interpolation of $z = f(x)$ through the breakpoints $d_1, \dots, d_5$ with values $f(d_1), \dots, f(d_5)$.] Write $x = \sum_{j=1}^{5} d_j \lambda_j$, $z = \sum_{j=1}^{5} f(d_j) \lambda_j$, $\sum_{j=1}^{5} \lambda_j = 1$, $\lambda_j \ge 0$, with binary variables forcing the nonzero $\lambda_j$ onto a single segment. Size = O(# of segments); a sketch follows. 19 / 22
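A minimal sketch of this simple ("convex combination") formulation in JuMP, assuming breakpoints d[1] < … < d[K] and sampled function values fvals[j] = f(d[j]); the function name is illustrative:

```julia
# Simple piecewise linear formulation: one binary per segment, O(# segments) size.
using JuMP, GLPK

function build_pwl_simple!(model, x, z, d, fvals)
    K = length(d)
    λ = @variable(model, [1:K], lower_bound = 0)
    u = @variable(model, [1:K-1], Bin)            # u[s] = 1  ⇔  segment s is active
    @constraint(model, sum(λ) == 1)
    @constraint(model, sum(u) == 1)
    @constraint(model, x == sum(d[j] * λ[j] for j in 1:K))
    @constraint(model, z == sum(fvals[j] * λ[j] for j in 1:K))
    # λ[j] may be positive only if a segment adjacent to breakpoint j is selected
    @constraint(model, λ[1] <= u[1])
    @constraint(model, λ[K] <= u[K-1])
    @constraint(model, [j = 2:K-1], λ[j] <= u[j-1] + u[j])
    return λ, u
end
```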

21 Advanced Formulation for Univariate Functions. [Same piecewise linear interpolation of $z = f(x)$ through $d_1, \dots, d_5$.] Keep $x = \sum_{j=1}^{5} d_j \lambda_j$, $z = \sum_{j=1}^{5} f(d_j) \lambda_j$, $\sum_{j=1}^{5} \lambda_j = 1$, $\lambda_j \ge 0$, but now only $y \in \{0,1\}^2$ binary variables are needed. Size = O(log_2 # of segments); a sketch of this logarithmic formulation follows. 20 / 22
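A sketch of the logarithmic formulation, which replaces the per-segment binaries with ⌈log2(# segments)⌉ binaries via a reflected Gray code of the segments; again, the function and variable names are illustrative:

```julia
# Logarithmic piecewise linear formulation: O(log2 # segments) binary variables.
using JuMP, GLPK

function build_pwl_log!(model, x, z, d, fvals)
    K = length(d)                              # K breakpoints, K - 1 segments
    L = max(1, ceil(Int, log2(K - 1)))         # number of binary variables
    gray(s) = [(xor(s, s >> 1) >> (b - 1)) & 1 for b in 1:L]
    code = [gray(s - 1) for s in 1:K-1]        # Gray code of each segment

    λ = @variable(model, [1:K], lower_bound = 0)
    y = @variable(model, [1:L], Bin)
    @constraint(model, sum(λ) == 1)
    @constraint(model, x == sum(d[j] * λ[j] for j in 1:K))
    @constraint(model, z == sum(fvals[j] * λ[j] for j in 1:K))

    segs(j) = [s for s in (j - 1, j) if 1 <= s <= K - 1]   # segments touching breakpoint j
    for b in 1:L
        J1 = [j for j in 1:K if all(code[s][b] == 1 for s in segs(j))]
        J0 = [j for j in 1:K if all(code[s][b] == 0 for s in segs(j))]
        isempty(J1) || @constraint(model, sum(λ[j] for j in J1) <= y[b])
        isempty(J0) || @constraint(model, sum(λ[j] for j in J0) <= 1 - y[b])
    end
    return λ, y
end

# Example: 5 breakpoints (4 segments) need only y ∈ {0,1}^2, as on the slide.
model = Model(GLPK.Optimizer)
@variable(model, x); @variable(model, z)
d = [0.0, 1.0, 2.0, 3.0, 4.0]
build_pwl_log!(model, x, z, d, sin.(d))        # any sampled values f(d_j)
@objective(model, Min, z)
optimize!(model)
```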

22 Computational Performance. Advanced formulations provide a computational advantage, and the advantage is significantly more important for free solvers. State-of-the-art commercial solvers can be significantly better than free solvers; still, free is free! [Figure: solve times (in seconds) of the simple vs. advanced formulation for CPLEX and GLPK.] 21 / 22

23 Summary and Main Messages. Always choose Chewbacca! MIP can solve very challenging problems in practice. Commercial solvers are best, but free solvers are reasonable. MIP is easily accessible and can be integrated into complex systems through the JuMP modeling language (github.com/juliaopt/jump.jl). Formulations = speed-ups, and they are (relatively) easy to learn: Mixed integer linear programming formulation techniques, J. P. Vielma, SIAM Review 57, pp. CBC application: 22 / 22
