Learning to Branch. Ellen Vitercik. Joint work with Nina Balcan, Travis Dick, and Tuomas Sandholm. Published in ICML 2018

Integer Programs (IPs): maximize c · x subject to Ax ≤ b, x ∈ {0,1}^n.

Facility location problems can be formulated as IPs.

Clustering problems can be formulated as IPs.

Binary classification problems can be formulated as IPs.

Integer Programs (IPs): maximize c · x subject to Ax ≤ b, x ∈ {0,1}^n. Solving IPs is NP-hard.

Branch and Bound (B&B) is the most widely used algorithm for solving IPs (it underlies CPLEX and Gurobi). It recursively partitions the search space to find an optimal solution and organizes the partition as a tree. It has many parameters: CPLEX alone has a 221-page manual describing 135 parameters ("You may need to experiment").

Why is tuning B&B parameters important? Save time. Solve more problems. Find better solutions.

B&B in the real world: a delivery company routes trucks daily and uses integer programming to select routes. Demand changes every day, so the company solves hundreds of similar optimization problems. Using this set of typical problems, can we learn the best parameters?

Model: instances (A_1, b_1, c_1), …, (A_m, b_m, c_m) are drawn from an application-specific distribution and given to the algorithm designer, who chooses B&B parameters. How do we use samples to find the best B&B parameters for my domain?

This model has been studied in applied communities [Hutter et al., 2009].

It has also been studied from a theoretical perspective [Gupta and Roughgarden, 2016; Balcan et al., 2017].

Model: 1. Fix a set of B&B parameters to optimize. 2. Receive sample problems (A_1, b_1, c_1), (A_2, b_2, c_2), … from an unknown distribution. 3. Find the parameters with the best performance on the samples ("best" could mean smallest search tree, for example).

Questions to address: How do we find parameters that are best on average over the samples (A_1, b_1, c_1), (A_2, b_2, c_2), …? And will those parameters have high performance in expectation on a fresh instance (A, b, c)?

Outline: 1. Introduction 2. Branch-and-Bound 3. Learning algorithms 4. Experiments 5. Conclusion and Future Directions

Running example: max (40, 60, 10, 10, 3, 20, 60) · x s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100, x ∈ {0,1}^7.

Running example: max (40, 60, 10, 10, 3, 20, 60) · x s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100, x ∈ {0,1}^7. Its LP relaxation has solution (1/2, 1, 0, 0, 0, 0, 1) with objective value 140.
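
The root LP value above is easy to check in code. Below is a minimal sketch (ours, not the paper's): with a single knapsack-style constraint, the LP relaxation is solved exactly by the greedy fractional rule, and the helper knapsack_lp, including its fixed argument (used for branching later), is our own construction.

values  = [40, 60, 10, 10, 3, 20, 60]
weights = [40, 50, 30, 10, 10, 40, 30]
capacity = 100

def knapsack_lp(values, weights, capacity, fixed=None):
    """LP relaxation with some variables fixed to 0/1.
    fixed maps variable index -> 0 or 1.
    Returns (solution, objective), or None if the fixing is infeasible."""
    fixed = fixed or {}
    cap = capacity - sum(weights[i] for i, v in fixed.items() if v == 1)
    if cap < 0:
        return None  # the variables fixed to 1 already exceed the capacity
    x = [0.0] * len(values)
    for i, v in fixed.items():
        x[i] = float(v)
    free = [i for i in range(len(values)) if i not in fixed]
    # Fill greedily by value/weight ratio; at most one item ends up fractional.
    for i in sorted(free, key=lambda j: values[j] / weights[j], reverse=True):
        x[i] = min(1.0, cap / weights[i])
        cap -= x[i] * weights[i]
        if cap <= 0:
            break
    return x, sum(v * xi for v, xi in zip(values, x))

print(knapsack_lp(values, weights, capacity))
# ([0.5, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0], 140.0), matching the slide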

B&B step 1: choose a leaf of the tree. Initially the tree is just the root, whose LP relaxation solution is (1/2, 1, 0, 0, 0, 0, 1) with value 140.

B&B steps 1-2: choose a leaf of the tree, then branch on a variable. Branching on x_1 at the root (value 140) creates two children: x_1 = 0, with LP solution (0, 1, 0, 1, 0, 1/4, 1) and value 135, and x_1 = 1, with LP solution (1, 3/5, 0, 0, 0, 0, 1) and value 136.

Next, branching on x_2 at the x_1 = 1 leaf (value 136) gives: x_2 = 0, with LP solution (1, 0, 0, 1, 0, 1/2, 1) and value 120, and x_2 = 1, with LP solution (1, 1, 0, 0, 0, 0, 1/3) and value 120.

Branching on x_6 at the x_1 = 0 leaf (value 135) gives: x_6 = 0, with LP solution (0, 1, 1/3, 1, 0, 0, 1) and value 133.3, and x_6 = 1, with LP solution (0, 3/5, 0, 0, 0, 1, 1) and value 116.

Branching on x_3 at the x_1 = 0, x_6 = 0 leaf (value 133.3) gives: x_3 = 0, with LP solution (0, 1, 0, 1, 1, 0, 1) and value 133, and x_3 = 1, with LP solution (0, 4/5, 1, 0, 0, 0, 1) and value 118.

B&B step 3: fathom a leaf if (i) its LP relaxation solution is integral, (ii) its LP relaxation is infeasible, or (iii) its LP relaxation solution isn't better than the best-known integral solution.

In the running example, the LP solution (0, 1, 0, 1, 1, 0, 1) at the x_3 = 0 leaf is integral, so that leaf is fathomed by rule (i) and its value, 133, becomes the best-known integral objective.

Every other leaf (values 116, 118, 120, and 120) has an LP value no better than 133, so all are fathomed by rule (iii), and B&B terminates with the optimal solution (0, 1, 0, 1, 1, 0, 1).

B&B, in summary: 1. Choose a leaf of the tree. 2. Branch on a variable. 3. Fathom the leaf if (i) its LP relaxation solution is integral, (ii) its LP relaxation is infeasible, or (iii) its LP relaxation solution isn't better than the best-known integral solution. This talk: how should we choose which variable to branch on? (Assume every other aspect of B&B is fixed.)
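
To make the three steps concrete, here is a minimal depth-first B&B sketch for the running example, reusing knapsack_lp from the earlier snippet. The choose_var interface is our own scaffolding, standing in for the variable selection policies discussed next; a real solver also manages node selection, cutting planes, and much more.

import math

def is_integral(x, tol=1e-9):
    return all(min(xi, 1.0 - xi) < tol for xi in x)

def branch_and_bound(values, weights, capacity, choose_var):
    """choose_var(fixed, x) -> index of the variable to branch on."""
    best_x, best_obj = None, -math.inf
    node_count = 0
    leaves = [dict()]                     # a node = the variables fixed so far
    while leaves:
        fixed = leaves.pop()              # step 1: choose a leaf
        node_count += 1
        lp = knapsack_lp(values, weights, capacity, fixed)
        if lp is None:
            continue                      # fathom (ii): LP infeasible
        x, obj = lp
        if obj <= best_obj:
            continue                      # fathom (iii): no better than incumbent
        if is_integral(x):
            best_x, best_obj = x, obj     # fathom (i): integral LP solution
            continue
        i = choose_var(fixed, x)          # step 2: branch on a variable
        leaves.append({**fixed, i: 0})
        leaves.append({**fixed, i: 1})
    return best_x, best_obj, node_count

def most_fractional(fixed, x):
    """A naive baseline policy: branch on the most fractional variable."""
    return max((i for i in range(len(x)) if i not in fixed),
               key=lambda i: min(x[i], 1.0 - x[i]))

print(branch_and_bound(values, weights, capacity, most_fractional))
# Ends with the optimal solution (0, 1, 0, 1, 1, 0, 1), value 133.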

Variable selection policies can have a huge effect on tree size.

Outline: 1. Introduction 2. Branch-and-Bound a. Algorithm Overview b. Variable Selection Policies 3. Learning algorithms 4. Experiments 5. Conclusion and Future Directions

Variable selection policies (VSPs). A score-based VSP branches, at leaf Q, on the variable x_i maximizing score(Q, i). For example, at the leaf with LP solution (1, 3/5, 0, 0, 0, 0, 1) and value 136, branching on x_2 yields children with values 120 (x_2 = 0) and 120 (x_2 = 1). There are many scoring options, and little is known about which to use when.

Variable selection policies. For an IP instance Q, let c_Q be the objective value of its LP relaxation, let Q_i^- be Q with x_i set to 0, and let Q_i^+ be Q with x_i set to 1. Example: for the running-example IP Q, the LP relaxation solution is (1/2, 1, 0, 0, 0, 0, 1), so c_Q = 140.

Continuing the example: branching on x_1 gives Q_1^-, whose LP solution (0, 1, 0, 1, 0, 1/4, 1) has c_{Q_1^-} = 135, and Q_1^+, whose LP solution (1, 3/5, 0, 0, 0, 0, 1) has c_{Q_1^+} = 136.

The linear rule (parameterized by μ) [Linderoth & Savelsbergh, 1999]: branch on the variable x_i maximizing score(Q, i) = μ · min(c_Q − c_{Q_i^-}, c_Q − c_{Q_i^+}) + (1 − μ) · max(c_Q − c_{Q_i^-}, c_Q − c_{Q_i^+}). In the example above, branching on x_1 gives objective changes c_Q − c_{Q_1^-} = 5 and c_Q − c_{Q_1^+} = 4.

The (simplified) product rule [Achterberg, 2009]: branch on the variable x_i maximizing score(Q, i) = (c_Q − c_{Q_i^-}) · (c_Q − c_{Q_i^+}). And there are many more rules.

Given d scoring rules score_1, …, score_d, the goal is to learn the best convex combination μ_1 score_1 + … + μ_d score_d. Our parameterized rule: branch on the variable x_i maximizing score(Q, i) = μ_1 score_1(Q, i) + … + μ_d score_d(Q, i).
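
In code, these rules reduce to a few lines of arithmetic over the LP objective values of a node and its two children. This is a sketch in our own notation; how to score a child whose LP is infeasible is a convention the slides don't pin down, so it is left to the caller here.

def linear_score(c_Q, c_minus, c_plus, mu):
    """Linear rule [Linderoth & Savelsbergh, 1999], parameterized by mu."""
    changes = (c_Q - c_minus, c_Q - c_plus)
    return mu * min(changes) + (1 - mu) * max(changes)

def product_score(c_Q, c_minus, c_plus):
    """(Simplified) product rule [Achterberg, 2009]."""
    return (c_Q - c_minus) * (c_Q - c_plus)

def combined_score(rule_values, mus):
    """Our parameterized rule: a convex combination of d scoring rules."""
    return sum(m * s for m, s in zip(mus, rule_values))

# Branching on x_1 at the root of the running example:
# c_Q = 140, c_{Q_1^-} = 135, c_{Q_1^+} = 136.
print(linear_score(140, 135, 136, mu=0.5))   # 0.5 * 4 + 0.5 * 5 = 4.5
print(product_score(140, 135, 136))          # 5 * 4 = 20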

Model, revisited: instances (A_1, b_1, c_1), …, (A_m, b_m, c_m) are drawn from an application-specific distribution, and the algorithm designer now chooses the B&B parameters μ_1, …, μ_d. How do we use samples to find the best parameters μ_1, …, μ_d for my domain?

Outline: 1. Introduction 2. Branch-and-Bound 3. Learning algorithms a. First try: discretization b. Our approach 4. Experiments 5. Conclusion and Future Directions

First try: discretization. 1. Discretize the parameter space. 2. Receive sample problems from the unknown distribution. 3. Find the parameters in the discretization with the best average performance. (Figure: average tree size as a function of μ.)

This has been prior work's approach [e.g., Achterberg, 2009]. (Figure: average tree size as a function of μ.)

Discretization gone wrong. (Figure: average tree size as a function of μ, in a case where every discretized parameter value performs far worse than the best μ.)

This can actually happen! (Figure: average tree size as a function of μ.)

Theorem [informal]: for any discretization, there exists a problem-instance distribution D inducing this behavior. (Figure: expected tree size as a function of μ.) Proof ideas: D's support consists of infeasible IPs with "easy out" variables; B&B takes exponential time unless it branches on the easy-out variables; and B&B only finds the easy outs if it uses parameters from a specific range.

Outline: 1. Introduction 2. Branch-and-Bound 3. Learning algorithms a. First try: discretization b. Our approach i. Single-parameter settings ii. Multi-parameter settings 4. Experiments 5. Conclusion and Future Directions

Simple assumption: there exists a κ upper bounding the size of the largest tree we are willing to build. This is a common assumption, e.g., in Hutter, Hoos, Leyton-Brown, and Stützle (JAIR 2009) and Kleinberg, Leyton-Brown, and Lucier (IJCAI 2017).

Useful lemma. Lemma: for any two scoring rules and any IP Q, O((# variables)^(κ+2)) intervals partition [0,1] such that for any interval [a, b], B&B builds the same tree across all μ ∈ [a, b]. (The number of intervals is much smaller in our experiments!)

Illustration: plotting μ · score_1(Q, i) + (1 − μ) · score_2(Q, i) as a function of μ for each of x_1, x_2, x_3 gives one line per variable; whichever line is highest determines whether B&B branches on x_1, x_2, or x_3, partitioning [0,1] into intervals.

For any μ in the first (yellow) interval, where x_2's line is highest, B&B branches on x_2 at the root Q, creating children Q_2^- (x_2 = 0) and Q_2^+ (x_2 = 1).

Repeating the comparison at the children (e.g., the lines μ · score_1(Q_2^-, 1) + (1 − μ) · score_2(Q_2^-, 1) and μ · score_1(Q_2^-, 3) + (1 − μ) · score_2(Q_2^-, 3)) subdivides that interval into regions where B&B branches on x_2 then x_3, and where it branches on x_2 then x_1.

For any μ in the resulting (blue-yellow) subinterval, B&B branches on x_2 at the root and then on x_3 at the next leaf it selects.

Proof idea: continue dividing [0,1] into intervals such that within each interval, the variable selection order is fixed. Each step can subdivide only a finite number of times, and the proof follows by induction on tree depth.
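
The base case of that induction can be written out. At a fixed node Q, the combined scores of two variables i and j are both affine in μ, so their comparison flips at most once; in the notation below (a sketch, writing s_1, s_2 for the two scoring rules), each pair of variables contributes at most one breakpoint per node:

\mu\, s_1(Q, i) + (1-\mu)\, s_2(Q, i) \;=\; \mu\, s_1(Q, j) + (1-\mu)\, s_2(Q, j)
\quad\Longleftrightarrow\quad
\mu \;=\; \frac{s_2(Q, j) - s_2(Q, i)}{\bigl(s_1(Q, i) - s_2(Q, i)\bigr) - \bigl(s_1(Q, j) - s_2(Q, j)\bigr)},

provided the denominator is nonzero (otherwise the comparison is the same for every μ). Collecting these one-breakpoint constraints over all pairs of variables at all nodes of a tree of size at most κ yields the polynomial interval count in the lemma.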

Learning algorithm. Input: a set of IPs sampled from a distribution D. For each IP, set μ = 0 and, while μ < 1: 1. Run B&B using the rule μ · score_1 + (1 − μ) · score_2, resulting in a tree T. 2. Find the interval [μ, μ̄] such that running B&B with the rule μ′ · score_1 + (1 − μ′) · score_2 for any μ′ ∈ [μ, μ̄] builds the same tree T (this takes a little bookkeeping). 3. Set μ = μ̄. Return: any μ from the interval minimizing average tree size over the samples.
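
A sketch of this sweep appears below. It assumes a wrapper run_bnb(Q, mu), our own scaffolding rather than the paper's code, that runs B&B with the rule μ · score_1 + (1 − μ) · score_2 and returns the tree size together with, for each branching decision, the (score_1, score_2) pair of the chosen variable and of every competitor. The "bookkeeping" is then just intersecting, over all decisions, the μ-interval on which the chosen variable keeps the highest combined score.

EPS = 1e-9

def winning_interval(decisions):
    """Intersect the mu-intervals on which every chosen variable stays the winner.
    decisions: list of (chosen, others); chosen and each other are (s1, s2) pairs."""
    lo, hi = 0.0, 1.0
    for chosen, others in decisions:
        for other in others:
            # f(mu) = combined(chosen) - combined(other) is affine in mu.
            slope = (chosen[0] - chosen[1]) - (other[0] - other[1])
            intercept = chosen[1] - other[1]
            if abs(slope) < EPS:
                continue                  # comparison independent of mu
            root = -intercept / slope
            if slope > 0:
                lo = max(lo, root)        # f >= 0 to the right of the root
            else:
                hi = min(hi, root)        # f >= 0 to the left of the root
    return lo, hi

def learn_mu(instances, run_bnb):
    """Return a mu (approximately) minimizing average tree size on the sample."""
    per_instance, endpoints = [], {0.0, 1.0}
    for Q in instances:
        pieces, mu = [], 0.0
        while mu < 1.0:
            size, decisions = run_bnb(Q, mu)
            lo, hi = winning_interval(decisions)
            hi = max(hi, mu + EPS)        # guarantee progress
            pieces.append((lo, hi, size))
            endpoints.update((lo, hi))
            mu = hi + EPS
        per_instance.append(pieces)
    # Average tree size is piecewise constant in mu, so test one mu per cell.
    def avg_size(mu):
        total = 0.0
        for pieces in per_instance:
            total += next((s for lo, hi, s in pieces if lo - EPS <= mu <= hi + EPS),
                          pieces[-1][2])  # default guards against round-off
        return total / len(per_instance)
    grid = sorted(e for e in endpoints if 0.0 <= e <= 1.0)
    cells = [(a + b) / 2 for a, b in zip(grid, grid[1:]) if b - a > 2 * EPS]
    return min(cells, key=avg_size)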

Learning algorithm guarantees. Let μ̂ be the algorithm's output given O((κ³/ε²) ln(# variables)) samples. W.h.p., E_{Q∼D}[tree-size(Q, μ̂)] − min_{μ∈[0,1]} E_{Q∼D}[tree-size(Q, μ)] < ε. Proof intuition: we bound the algorithm class's intrinsic complexity (IC): the lemma bounds the number of truly different parameters, and parameters that behave the same come from a simple set. Learning theory then translates IC into sample complexity.

Outline: 1. Introduction 2. Branch-and-Bound 3. Learning algorithms a. First try: discretization b. Our approach i. Single-parameter settings ii. Multi-parameter settings 4. Experiments 5. Conclusion and Future Directions

Useful lemma, in higher dimensions. Lemma: for any d scoring rules and any IP, a set H of O((# variables)^(κ+2)) hyperplanes partitions [0,1]^d such that for any connected component R of [0,1]^d \ H, B&B builds the same tree across all μ ∈ R.

Learning-theoretic guarantees. Fix d scoring rules and draw samples Q_1, …, Q_N ∼ D. If N = O((κ³/ε²) ln(d · # variables)), then w.h.p., for all μ ∈ [0,1]^d, |(1/N) Σ_{i=1}^N tree-size(Q_i, μ) − E_{Q∼D}[tree-size(Q, μ)]| < ε. That is, average tree size generalizes to expected tree size.

Outline: 1. Introduction 2. Branch-and-Bound 3. Learning algorithms 4. Experiments 5. Conclusion and Future Directions

Experiments: tuning the linear rule. Let score_1(Q, i) = min(c_Q − c_{Q_i^-}, c_Q − c_{Q_i^+}) and score_2(Q, i) = max(c_Q − c_{Q_i^-}, c_Q − c_{Q_i^+}); combining them recovers the linear rule [Linderoth & Savelsbergh, 1999]. Our parameterized rule: branch on the variable x_i maximizing score(Q, i) = μ · score_1(Q, i) + (1 − μ) · score_2(Q, i).
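
The pieces sketched earlier can be wired together for this experiment. Below, linear_rule_policy (our own glue, not the paper's code) turns the linear rule into a choose_var policy for the branch_and_bound sketch; an infeasible child is scored with a large penalty objective, a convention we chose, not one from the slides. One caveat: on a pure knapsack each node's LP has a single fractional variable, so every μ picks the same variable there; the plumbing only becomes interesting on IPs with several constraints.

INF = 1e18   # stand-in objective for an infeasible child (our convention)

def linear_rule_policy(mu, values, weights, capacity):
    """Branch on the fractional variable maximizing the linear rule."""
    def choose_var(fixed, x):
        c_Q = sum(v * xi for v, xi in zip(values, x))
        def score(i):
            objs = []
            for v in (0, 1):
                child = knapsack_lp(values, weights, capacity, {**fixed, i: v})
                objs.append(child[1] if child else -INF)
            return linear_score(c_Q, objs[0], objs[1], mu)
        candidates = [i for i, xi in enumerate(x)
                      if i not in fixed and min(xi, 1.0 - xi) > 1e-9]
        return max(candidates, key=score)
    return choose_var

policy = linear_rule_policy(0.5, values, weights, capacity)
print(branch_and_bound(values, weights, capacity, policy))
# Recovers the optimal solution (0, 1, 0, 1, 1, 0, 1) with value 133.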

Experiments: combinatorial auctions. "Regions" generator: 400 bids, 200 goods, 100 instances. "Arbitrary" generator: 200 bids, 100 goods, 100 instances. (Leyton-Brown, Pearson, and Shoham. Towards a universal test suite for combinatorial auction algorithms. In Proceedings of the Conference on Electronic Commerce (EC), 2000.)

Additional experiments. Clustering: 5 clusters, 35 nodes, 500 instances. Facility location: 70 facilities, 70 customers, 500 instances. Agnostically learning linear separators: 50 points in R^2, 500 instances.

Outline: 1. Introduction 2. Branch-and-Bound 3. Learning algorithms 4. Experiments 5. Conclusion and Future Directions

Conclusion. We study B&B, a widely used algorithm for combinatorial problems, and show how to use machine learning to weight variable selection rules. We give the first sample complexity bounds for tree-search algorithm configuration, unlike prior work [Khalil et al., 2016; Alvarez et al., 2017], which is purely empirical. We show empirically that our approach can dramatically shrink tree size, and we prove the improvement can even be exponential. The theory applies to other tree-search algorithms, e.g., for solving CSPs.

Future directions. How can we train faster? We don't want to build every tree B&B would make for every training instance. Can we train on small IPs and then apply the learned policies to large IPs? What other tree-building applications can our techniques address, e.g., building decision trees and taxonomies? And how can we attack other learning problems in B&B, e.g., node-selection policies? Thank you! Questions?