Mixed Integer Programming Approaches for Experimental Design


1 Mixed Integer Programming Approaches for Experimental Design. Juan Pablo Vielma, Massachusetts Institute of Technology. Joint work with Denis Saure. DRO brown bag lunch seminars, Columbia Business School, New York, NY, October 2016.

2 Motivation: (Custom) Product Recommendations. [Slide shows camera comparison tables: SX530 vs RX100 (Zoom: 50x vs 3.6x; Weight: … vs 7.5 ounces), TG-4 vs G9 (Waterproof: Yes vs No; Weight: 7.36 lb vs 7.5 lb), and the recommended pair TG-4 vs Galaxy 2 (Waterproof: Yes vs No; Viewfinder: Electronic vs Optical).]

3 Motivation: (Custom) Product Recommendations. [Same camera comparison tables and recommendation as the previous slide.] Toubia, Hauser and Dahan (2003); Toubia, Hauser, and Simester (2004).

4 Towards Optimal Product Recommendation. Find enough information about preferences to recommend. [Same camera comparison tables and recommendation.] How do I pick the next (1st) question to obtain the largest reduction of uncertainty or variance on preferences? Compensatory model estimation (part-worths), not just assortment.

5 Next Question To Reduce Variance: Bayesian. Prior distribution of β → Bayesian update (via MCMC) → posterior distribution; each further question triggers another Bayesian update and a new posterior. With a black-box objective, question selection means enumeration; here, question selection is done by Mixed Integer Programming (MIP).

6 Traveling Salesman Problem (TSP): Visit Cities Fast. [Slide quotes Noson S. Yanofsky, "Paradoxes, Contradictions, and the Limits of Science": "Many research results define boundaries of what cannot be known, predicted, or described. Classifying these limitations shows us the structure of science and reason. … A computer would have to check all these possible routes to find the shortest one."]

7 MIP = Avoid Enumeration. Number of tours for 49 cities: 48!/2. Assuming one floating point operation per tour, even the fastest supercomputer would need many times the age of the universe! How long does it take on an iPhone? Less than a second: 4 iterations of a cutting plane method. Dantzig, Fulkerson and Johnson (1954) did it by hand! For more info see the tutorial in the Concorde TSP app. Cutting planes are the key to effectively solving (even NP-hard) MIP problems in practice.

8 50+ Years of MIP = Significant Solver Speedups. Algorithmic improvements (machine independent): CPLEX v1.2 (1991) to v11 (2007): 29,000x speedup; Gurobi v1 (2009) to v6.5 (2015): 48.7x speedup. Commercial, but free for academic use. (Reasonably) effective free / open source solvers: GLPK, COIN-OR (CBC) and SCIP (the latter only for non-commercial use). Easy to use, fast and versatile modeling languages: the Julia-based JuMP modeling language (a minimal sketch follows). Linear MIP solvers are very mature and effective; convex nonlinear MIP is getting there (even MI-SDP!), and quadratic is nearly there.
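To give a sense of how light the modeling layer is, here is a minimal JuMP sketch of a toy knapsack MIP; the data is hypothetical, and GLPK is used as one of the free solvers named above.

# Toy knapsack MIP in JuMP (hypothetical data; GLPK as a free solver).
using JuMP, GLPK

v, w, cap = [6, 5, 4], [2, 3, 4], 5        # values, weights, capacity
model = Model(GLPK.Optimizer)
@variable(model, x[1:3], Bin)              # pick item i or not
@objective(model, Max, v' * x)             # maximize total value
@constraint(model, w' * x <= cap)          # stay within capacity
optimize!(model)
value.(x)                                  # selected items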

9 Choice-based Conjoint Analysis (CBCA). Example question: Chewbacca vs BB-8 — Wookiee: Yes / No; Droid: No / Yes; Blaster: Yes / No. The product profiles encode these answers as binary vectors, e.g. $x^1 = (1, 0, 1)^T$ and $x^2 = (0, 1, 0)^T$, and the respondent states "I would buy toy 1" or "I would buy toy 2".

10 MNL Preference Model. Utilities for 2 products, n features (e.g. n = 12): $U^1 = \beta \cdot x^1 = \sum_{i=1}^n \beta_i x^1_i + \epsilon^1$ and $U^2 = \beta \cdot x^2 = \sum_{i=1}^n \beta_i x^2_i + \epsilon^2$, with part-worths $\beta$, product profiles $x^1, x^2$, and Gumbel noise $\epsilon^1, \epsilon^2$. Utility-maximizing customer: $x^1 \succeq x^2 \iff U^1 \ge U^2$. Noise can result in response error: $\mathcal{L}(x^1 \succeq x^2) = P(x^1 \succeq x^2) = e^{\beta \cdot x^1} / (e^{\beta \cdot x^1} + e^{\beta \cdot x^2})$.
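As a quick numeric check of the choice probability above, this sketch evaluates the logit formula for the toy profiles of the previous slide; the part-worths are made up.

# MNL choice probability for two binary profiles (made-up part-worths).
β = [0.8, -0.3, 0.5]               # part-worths
x1 = [1, 0, 1]; x2 = [0, 1, 0]     # product profiles from the CBCA slide
u1, u2 = β' * x1, β' * x2          # deterministic utilities β·x
p1 = exp(u1) / (exp(u1) + exp(u2)) # P(x1 preferred to x2) ≈ 0.83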

11 Next Question To Reduce Variance: Bayesian. [Same prior → Bayesian update (MCMC) → posterior diagram as slide 5.] The MNL preference model is a logistic regression: $x^1 \succeq x^2 \iff \beta \cdot z \ge 0$, where $z = x^1 - x^2$.

12 Linear Experimental Design. Unknown $\beta \in \mathbb{R}^n$. Questions: $Z = (z^1, \ldots, z^q)^T \in \mathbb{R}^{q \times n}$. Answers: $Y = (y^1, \ldots, y^q)^T$. Model: $P(Y \mid \beta, Z) = \mathcal{L}(Y \mid \beta, Z) = \prod_{i=1}^q h(y_i, \beta \cdot z^i)$. Objective: choose Z to learn $\beta$ fast.

13 Bayesian Framework. Prior distribution: $\beta \sim N(\mu, \Sigma)$. Answer likelihood: $\mathcal{L}(Y \mid \beta, Z)$. Posterior distribution: $g(\beta \mid Y, Z) = \phi(\beta; \mu, \Sigma)\,\mathcal{L}(Y \mid \beta, Z) \big/ \int \phi(\beta; \mu, \Sigma)\,\mathcal{L}(Y \mid \beta, Z)\, d\beta$. "Fast" = minimize posterior variance.

14 Goal = Minimize Expected Posterior Variance. $f(Z) = E_Y\{(\det \operatorname{cov}(\beta \mid Y, Z))^{1/m}\}$. Possible solution approaches: work with $g(\beta \mid Y, Z) \propto \phi(\beta; \mu, \Sigma)\,\mathcal{L}(Y \mid \beta, Z)$ and the information matrix $I(\beta \mid Y, Z) = -\nabla^2_\beta \ln g(\beta \mid Y, Z)$; approximate $\operatorname{cov}(\beta \mid Y, Z) \approx I(\hat\beta \mid Y, Z)^{-1}$ or $E_{\beta \sim N(\mu, \Sigma)}\{I(\beta \mid Y, Z)^{-1}\}$; then solve $\max_Z E_Y\{\det I(\hat\beta \mid Y, Z)\}^{1/m}$?

15 A Really Good Case = Linear Regression. $f(Z) = E_Y\{(\det \operatorname{cov}(\beta \mid Y, Z))^{1/m}\}$ with $y_i = \beta \cdot z^i + \epsilon_i$, $\epsilon_i \sim N(0, 1)$. Then $g(\beta \mid Y, Z) = \phi(\beta; \mu', \Sigma')$ with $\Sigma' = \operatorname{cov}(\beta \mid Y, Z) = (\Sigma^{-1} + Z^T Z)^{-1}$, so $\min_Z f(Z) = \max_Z \det(Z^T Z + \Sigma^{-1})^{1/m}$. For discrete Z this is a MISDP, or a MISOCP for m = n.
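For completeness, the covariance formula above is the standard Gaussian completion of squares (unit noise variance, as on the slide):

g(\beta \mid Y, Z) \propto \exp\!\Big(-\tfrac{1}{2}(\beta - \mu)^\top \Sigma^{-1} (\beta - \mu)\Big)\, \exp\!\Big(-\tfrac{1}{2}\lVert Y - Z\beta \rVert_2^2\Big) \propto \exp\!\Big(-\tfrac{1}{2}(\beta - \mu')^\top (\Sigma')^{-1} (\beta - \mu')\Big), \qquad (\Sigma')^{-1} = \Sigma^{-1} + Z^\top Z.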

16 A Relatively Good Case = Few Questions. $f(Z) = E_Y\{(\det \operatorname{cov}(\beta \mid Y, Z))^{1/m}\}$ with $Z = \{z^i\}_{i=1}^q \subseteq \mathbb{R}^n$ and $q \ll n$. The posterior moments then depend on the questions only through low-dimensional statistics: $E(\beta \mid Y, Z) = m\big(Y, \{\mu \cdot z^i\}_{i=1}^q, \{z^{i\top} \Sigma z^j\}_{i,j=1}^q\big)$ and $\operatorname{cov}(\beta \mid Y, Z) = M\big(Y, \{\mu \cdot z^i\}_{i=1}^q, \{z^{i\top} \Sigma z^j\}_{i,j=1}^q\big)$, so that $f(Z) = f\big(\{\mu \cdot z^i\}_{i=1}^q, \{z^{i\top} \Sigma z^j\}_{i,j=1}^q\big)$.

17 A Relatively Good Case = Few Questions (continued). The same reduction as on the previous slide holds under mild requirements only: $\beta \sim N(\mu, \Sigma)$ and $\mathcal{L}(Y \mid \beta, Z) = \prod_{i=1}^q h(y_i, \beta \cdot z^i)$ — e.g., logistic regression with small $q \ll n$.

18 Question Selection for CBCA. Ask $x^1$ vs. $x^2$ under the prior $\beta \sim N(\mu, \Sigma)$: the answer $x^1 \succeq x^2$ yields posterior covariance $\operatorname{cov}(\beta \mid \cdot) = \Sigma_1$, and $x^2 \succeq x^1$ yields $\Sigma_2$. Variance = D-efficiency: $f(x^1, x^2) := E\{\det(\Sigma_i)^{1/p}\}$, the expectation taken over which answer is given — a non-convex function. Without the previous slide, even evaluating it requires $\dim(\beta)$-dimensional integration.

19 D-efficiency Simplification for CBCA. D-efficiency is a non-convex function $f(d, v)$ of the distance $d := \mu \cdot (x^1 - x^2)$ and the variance $v := (x^1 - x^2)^T \Sigma (x^1 - x^2)$ of the question. $f(d, v)$ can be evaluated with a 1-dimensional integral.

20 Simplification = Trade-off for Known Criteria. On the ellipsoid $(\beta - \mu)^T \Sigma^{-1} (\beta - \mu) \le r$: choice balance = minimize the distance to the center, $|\mu \cdot (x^1 - x^2)|$; postchoice symmetry = maximize the variance of the question, $(x^1 - x^2)^T \Sigma (x^1 - x^2)$.

21 Optimization Model. $\min f(d, v)$ s.t. $\mu \cdot (x^1 - x^2) = d$, $(x^1 - x^2)^T \Sigma (x^1 - x^2) = v$, $A^1 x^1 + A^2 x^2 \le b$, $x^1 \ne x^2$, $x^1, x^2 \in \{0, 1\}^n$; linearize the products $x^k_i x^l_j$.
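The variance constraint introduces products of binary variables; below is a minimal JuMP sketch of the standard linearization of such a product (variable names and sizes are illustrative, not the talk's).

# Standard linearization of binary products x1[i]*x2[j], as in the
# "linearize x_i^k x_j^l" step; names and sizes are illustrative only.
using JuMP, GLPK

n = 3
model = Model(GLPK.Optimizer)
@variable(model, x1[1:n], Bin)             # profile x^1
@variable(model, x2[1:n], Bin)             # profile x^2
@variable(model, 0 <= y[1:n, 1:n] <= 1)    # y[i,j] stands in for x1[i]*x2[j]
@constraint(model, [i = 1:n, j = 1:n], y[i, j] <= x1[i])
@constraint(model, [i = 1:n, j = 1:n], y[i, j] <= x2[j])
@constraint(model, [i = 1:n, j = 1:n], y[i, j] >= x1[i] + x2[j] - 1)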

22 Technique 2: Piecewise Linear Functions. As before, D-efficiency is a non-convex function $f(d, v)$ of the distance $d := \mu \cdot (x^1 - x^2)$ and the variance $v := (x^1 - x^2)^T \Sigma (x^1 - x^2)$, and $f(d, v)$ can be evaluated with a 1-dimensional integral. Piecewise linear interpolation then yields a MIP formulation; a sketch follows.
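Here is a minimal sketch of one classic MIP formulation (the convex-combination model) of a 1-dimensional piecewise-linear interpolant; the breakpoints and function values are made up, and the talk's actual formulation may differ.

# Convex-combination MIP formulation of a piecewise-linear interpolant
# of f(d); breakpoints and values are hypothetical.
using JuMP, GLPK

d_pts = [0.0, 0.5, 1.0, 2.0]      # breakpoints of d
f_pts = [1.0, 0.7, 0.55, 0.5]     # f at the breakpoints (made up)
n = length(d_pts)
model = Model(GLPK.Optimizer)
@variable(model, λ[1:n] >= 0)     # interpolation weights
@variable(model, y[1:n-1], Bin)   # y[s] = 1 if d lies in segment s
@constraint(model, sum(λ) == 1)
@constraint(model, sum(y) == 1)
@constraint(model, λ[1] <= y[1])                         # only the weights
@constraint(model, [k = 2:n-1], λ[k] <= y[k-1] + y[k])   # adjacent to the chosen
@constraint(model, λ[n] <= y[n-1])                       # segment may be positive
@expression(model, d, sum(d_pts[k] * λ[k] for k in 1:n))
@expression(model, fapx, sum(f_pts[k] * λ[k] for k in 1:n))
@objective(model, Min, fapx)
optimize!(model)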

23 MIP-based Adaptive Questionnaires. [Same camera comparison tables and recommendation as slide 2.] Objective: $E_{Y, X^1, X^2}\{\operatorname{cov}(\beta \mid Y, X^1, X^2)\}$. An optimal one-step look-ahead, moment-matching, approximate Bayesian approach.

24 Optimal One-Step Look-Ahead. Given the prior distribution $\beta \sim N(\mu_i, \Sigma_i)$, choose the next question by solving $\min_{x^1, x^2} f(x^1, x^2)$ with the MIP formulation. [Camera comparison tables shown as the candidate question.]

25 Moment-Matching Approximate Bayesian Update. The prior distribution $N(\mu_i, \Sigma_i)$ and the answer likelihood give a posterior that is approximated by $N(\mu_{i+1}, \Sigma_{i+1})$, where $\mu_{i+1} = E(\beta \mid y, x^1, x^2)$ and $\Sigma_{i+1} = \operatorname{cov}(\beta \mid y, x^1, x^2)$ — each computable via a 1-dimensional integral.
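To make the 1-dimensional integrals concrete, here is a sketch of the moment computations for the projected utility difference $w = \beta \cdot (x^1 - x^2)$, whose prior is $N(d, v)$; this is my illustration of the idea, not necessarily the paper's exact update.

# 1-dim moment integrals for w = β·(x¹ − x²) with prior N(d, v) and a
# logistic answer likelihood; illustrative only.
using QuadGK, Distributions

logistic(t) = 1 / (1 + exp(-t))
function projected_posterior_moments(d, v; y = 1)
    prior = Normal(d, sqrt(v))
    lik(w) = y == 1 ? logistic(w) : 1 - logistic(w)                  # P(answer y | w)
    f0 = quadgk(w -> pdf(prior, w) * lik(w), -Inf, Inf)[1]           # normalizer
    f1 = quadgk(w -> w * pdf(prior, w) * lik(w), -Inf, Inf)[1] / f0  # E[w | y]
    f2 = quadgk(w -> w^2 * pdf(prior, w) * lik(w), -Inf, Inf)[1] / f0
    return f1, f2 - f1^2   # posterior mean and variance of w
end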

26 Computational Experiments. 16 questions, 2 options, 12 and 24 features; simulate MNL responses with known $\beta$. Question selection: MIP-based using CPLEX and the open-source COIN-OR solver, vs. the knapsack-based geometric heuristic of Toubia et al.; time limits of 1 s and 10 s. Metrics: estimator variance $= \det(\operatorname{cov}(\beta \mid Y, X^1, X^2))^{1/2}$ and estimator distance (expected $\ell_2$ distance between the estimate and the known $\beta$), both computed for the true posterior with MCMC.

27 Results for 12 Features, 1 s Time Limit. [Plots of estimator variance and estimator distance vs. number of questions.] Heuristic (avg. 0.04 s, max 0.61 s); COIN-OR (avg. 0.93 s, max 1 s); CPLEX (avg. 0.21 s, max 0.48 s).

28 Does it Scale? Results for 24 Features. [Plots of estimator variance and estimator distance vs. number of questions.] Heuristic (avg. 0.19 s, max 3 s); CPLEX 1 s (avg. 1 s, max 1 s); CPLEX 10 s (avg. 7.7 s, max 10 s).

29 Some Improvements for 24 Features. [Plots of estimator variance and estimator distance, as on the previous slide.] Heuristic (avg. 0.19 s, max 3 s); CPLEX 1 s (avg. 1 s, max 1 s); CPLEX 10 s (avg. 7.7 s, max 10 s).

30 Summary and Main Messages. Always choose Chewbacca! MIP can now solve challenging problems in practice, even in near-real time. Appropriate domain expertise can be crucial for "MIP-ing" a problem. Commercial solvers are best, but free solvers are reasonable; integration into complex systems is easy with JuMP. Some scalability: get the most out of small data. Adaptive choice-based conjoint analysis improves on existing geometric methods.
