Highly-scalable branch and bound for maximum monomial agreement


Highly-scalable branch and bound for maximum monomial agreement
Jonathan Eckstein (Rutgers), William Hart, Cynthia A. Phillips (Sandia National Laboratories)

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

Classification: Distinguish 2 Classes
M vectors v_k, each with N binary features/attributes x_i, i = 1..N. Each vector has a weight w_i and is a positive or negative example:

Feature  x1  x2  x3  x4  class  w_i
v1        0   0   1   1    +    1.0
v2        1   0   0   1    -    2.0
v3        1   0   1   1    -    3.0
v4        0   1   1   0    +    4.0
v5        1   0   0   0    +    5.0

Observation Matrix A
October 23, 2013 Simons Workshop

Binary Monomial
A binary monomial is a conjunction of binary features and complemented binary features. Let J be the set of features that appear uncomplemented and C the set that appear complemented. It is equivalent to a binary function:
    m_{J,C}(x) = 1  iff  x_j = 1 for all j in J and x_c = 0 for all c in C.
A binary monomial covers a vector x if m_{J,C}(x) = 1: the vector agrees with the monomial on each selected feature.

Example: Coverage
Uncomplemented variables J = {1}, complemented variables C = {2}: the monomial is "x_1 = 1 and x_2 = 0".
Cover(J,C) = {2,3,5}, i.e. vectors v2, v3, v5 in the observation matrix above.

Maximum Monomial Agreement
Maximize over (J,C):
    f(J,C) = w(Cover(J,C) ∩ Ω+) − w(Cover(J,C) ∩ Ω−),
the weighted difference between covered positive and negative examples.
For "x_1 = 1 and x_2 = 0" on the observation matrix above: f({1},{2}) = 5 − 3 − 2 = 0.
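The coverage and agreement definitions above can be sketched in a few lines of Python on the slide's five-vector example. This is an illustrative encoding, not the authors' code; in particular, storing the class sign inside the weight (positive class gets +w, negative gets −w) is a convenience choice here, so that f(J,C) is just the sum of signed weights of covered vectors.

```python
# Sketch of monomial coverage and agreement on the slide's example.
# Feature indices are 1-based; class sign is folded into the weight.

data = {  # vector id -> (features x1..x4, signed weight)
    1: ((0, 0, 1, 1), +1.0),
    2: ((1, 0, 0, 1), -2.0),
    3: ((1, 0, 1, 1), -3.0),
    4: ((0, 1, 1, 0), +4.0),
    5: ((1, 0, 0, 0), +5.0),
}

def covers(x, J, C):
    """m_{J,C}(x) = 1: x_j = 1 for all j in J, x_c = 0 for all c in C."""
    return all(x[j - 1] == 1 for j in J) and all(x[c - 1] == 0 for c in C)

def cover(J, C):
    """Ids of vectors covered by the monomial (J, C)."""
    return {k for k, (x, _) in data.items() if covers(x, J, C)}

def agreement(J, C):
    """f(J,C) = w(covered positives) - w(covered negatives)."""
    cov = cover(J, C)
    return sum(w for k, (x, w) in data.items() if k in cov)
```

On the running example, `cover({1}, {2})` gives `{2, 3, 5}` and `agreement({1}, {2})` gives `0.0`, matching f({1},{2}) = 5 − 3 − 2 = 0 on the slide.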

LPBoost
Use MMA as a weak learner. A linear program finds the optimal linear combination of the weak learners (column generation), optimizing for gap/separation. The dual of the LP gives weights for the next MMA round: more weight on the harder parts. We want to solve MMA exactly (Goldberg, Shan 2007).

    max  ρ − D Σ_{i=1}^{M} ξ_i
    s.t. y_i Σ_{u=1}^{U} H_{iu} λ_u + ξ_i ≥ ρ,   i = 1,…,M
         Σ_{u=1}^{U} λ_u = 1
         ξ_i ≥ 0, λ_u ≥ 0

Branch and Bound
Branch and bound is an intelligent (enumerative) search procedure for discrete optimization problems: max_{x ∈ X} f(x).
It requires a subproblem representation and 3 problem-specific procedures:
- Compute an upper bound b(X') for a subregion X' ⊆ X, with b(X') ≥ f(x) for all x ∈ X'
- Find a candidate solution; this can fail, but it must recognize feasibility if the subregion has only one point
- Split a feasible region (e.g. over parameter/decision space), for example by adding a constraint

Branch and Bound
Root problem = original. Recursively divide the feasible region, pruning the search when no optimal solution can be in a region: a subproblem is fathomed when U_k < L (its upper bound cannot beat the incumbent) or when it is infeasible; a solved subproblem may yield a new best solution, L = U_k. Important: need good bounds.
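The recursion above can be written as a short generic best-first loop. This is a minimal sketch of the scheme on the slide, not PEBBL's engine; the tiny 0-1 knapsack at the bottom (values, costs, and capacity invented for the demo) just exercises the three problem-specific procedures.

```python
import heapq

def branch_and_bound(root, upper_bound, candidate, split):
    """Generic best-first B&B sketch (maximization).

    upper_bound(sub) -> valid upper bound on f over the subregion
    candidate(sub)   -> (solution, value) or None (may fail)
    split(sub)       -> list of child subproblems
    """
    L, best = float("-inf"), None           # incumbent value / solution
    heap, tie = [(-upper_bound(root), 0, root)], 1
    while heap:
        neg_b, _, sub = heapq.heappop(heap)
        if -neg_b <= L:                     # fathomed: cannot beat incumbent
            continue
        cand = candidate(sub)
        if cand is not None and cand[1] > L:
            best, L = cand                  # new best solution, L = U_k
        for child in split(sub):
            b = upper_bound(child)
            if b > L:                       # only keep promising children
                heapq.heappush(heap, (-b, tie, child))
                tie += 1
    return best, L

# Tiny demo problem: 0-1 knapsack, capacity 5 (illustrative data).
VALS, COSTS, CAP = [4, 3, 5], [2, 3, 4], 5

def ub(sub):                                # sub = (next item, value, cap left)
    i, val, cap = sub
    return val + sum(VALS[i:])              # loose but valid upper bound

def cand(sub):
    i, val, cap = sub
    return (sub, val) if i == len(VALS) else None

def split(sub):
    i, val, cap = sub
    if i == len(VALS):
        return []
    kids = [(i + 1, val, cap)]              # skip item i
    if cap >= COSTS[i]:
        kids.append((i + 1, val + VALS[i], cap - COSTS[i]))  # take item i
    return kids

best, L = branch_and_bound((0, 0, CAP), ub, cand, split)   # L == 7
```

The optimum takes items 0 and 1 (cost 2 + 3 = 5, value 4 + 3 = 7); the third item never fits alongside either of them.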

Solution Quality
Global upper bound (maximum over all active subproblems): U = max_k U_k.
The approximation ratio for the current incumbent L is L/U. We can stop when L/U is good enough (e.g. 95%); running to completion proves optimality.

B&B Representation for MMA
Subproblem (partial solution) = (J, C, E, F):
- J: features forced into the monomial
- C: features forced in as complemented
- E: eliminated features (cannot appear)
- F: free features
Any partition of {1,…,N} is possible. A feasible solution that respects (J,C,E,F) is just (J,C); when F is empty, the subproblem contains only one element (a leaf).
Serial MMA branch-and-bound elements are from Eckstein and Goldberg, "An Improved Branch-and-Bound Method for Maximum Monomial Agreement," INFORMS J. Computing 24(2), 2012.

Upper Bound
A valid upper bound: max{ w(Cover(J,C) ∩ Ω+), w(Cover(J,C) ∩ Ω−) }.
Strengthen it by considering the excluded features E: two vectors are inseparable if they agree on all features i ∉ E. This creates q(E) equivalence classes. Example on the observation matrix above with E = {2,4}, columns reordered so the non-excluded features come first:

Feature  x1  x3  x2  x4
v1        0   1   0   1
v4        0   1   1   0
v2        1   0   0   1
v5        1   0   0   0
v3        1   1   0   1

Equivalence classes: {v1,v4}, {v2,v5}, {v3}.

Upper Bound
Let V^E_η be the vectors in the ηth equivalence class; within a class, all vectors are covered or all are not covered. Define
    w+_η(J,C,E) = w( V^E_η ∩ Cover(J,C) ∩ Ω+ )
    w−_η(J,C,E) = w( V^E_η ∩ Cover(J,C) ∩ Ω− )
Stronger upper bound, with (z)+ denoting max(z, 0):
    b(J,C,E) = max{ Σ_{η=1}^{q(E)} ( w+_η(J,C,E) − w−_η(J,C,E) )+ ,
                    Σ_{η=1}^{q(E)} ( w−_η(J,C,E) − w+_η(J,C,E) )+ }

Upper Bound
A more convenient form: with b(J,C) = max{ w(Cover(J,C) ∩ Ω+), w(Cover(J,C) ∩ Ω−) },
    b(J,C,E) = b(J,C) − Σ_{η=1}^{q(E)} min{ w+_η(J,C,E), w−_η(J,C,E) }.
Compute b(J,C) first (|J| + |C| set intersections); only if that does not fathom the node, compute the second part. Compute the equivalence classes with a radix sort on the non-E features.
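The convenient form of the bound is easy to check on the slide's example data. The sketch below is my reading of the slides, not the authors' code (it groups with a dictionary rather than the radix sort the slide describes, and reuses the signed-weight encoding from earlier): for each class it accumulates w+_η and w−_η over covered vectors, then subtracts the min terms from b(J,C).

```python
from collections import defaultdict

data = {  # vector id -> (features x1..x4, signed weight: + for positives)
    1: ((0, 0, 1, 1), +1.0),
    2: ((1, 0, 0, 1), -2.0),
    3: ((1, 0, 1, 1), -3.0),
    4: ((0, 1, 1, 0), +4.0),
    5: ((1, 0, 0, 0), +5.0),
}
N = 4

def bound(J, C, E):
    """b(J,C,E) = b(J,C) - sum_eta min(w+_eta, w-_eta). 1-based features."""
    covered = [(x, w) for x, w in data.values()
               if all(x[j - 1] == 1 for j in J)
               and all(x[c - 1] == 0 for c in C)]
    w_plus = sum(w for _, w in covered if w > 0)
    w_minus = sum(-w for _, w in covered if w < 0)
    b_JC = max(w_plus, w_minus)                 # basic bound b(J,C)
    # Group covered vectors by their values on the non-excluded features.
    keep = [i for i in range(1, N + 1) if i not in E]
    classes = defaultdict(lambda: [0.0, 0.0])   # key -> [w+_eta, w-_eta]
    for x, w in covered:
        key = tuple(x[i - 1] for i in keep)
        classes[key][0 if w > 0 else 1] += abs(w)
    return b_JC - sum(min(wp, wm) for wp, wm in classes.values())
```

With J = C = ∅ and E = {2,4}, the classes are {v1,v4} (w+ = 5, w− = 0), {v2,v5} (w+ = 5, w− = 2), and {v3} (w+ = 0, w− = 3), so the bound drops from b(∅,∅) = 10 to 10 − 2 = 8.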

Branching
From (J,C,E,F) and f ∈ F, create three children:
    (J ∪ {f}, C, E, F \ {f})
    (J, C ∪ {f}, E, F \ {f})
    (J, C, E ∪ {f}, F \ {f})
Eckstein and Goldberg considered higher branching factors; branching on one feature is faster per node but explores more nodes.

Choose Branch Variable
Strong branching: for each f ∈ F, compute all 3 child upper bounds and sort them descending as (b1, b2, b3). Sort the triples lexicographically and pick the smallest. This also gives a lookahead bound:
    b(J,C,E,F) = min_{f∈F} max{ b(J∪{f}, C, E, F\{f}),
                                b(J, C∪{f}, E, F\{f}),
                                b(J, C, E∪{f}, F\{f}) }
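The selection rule is compact enough to sketch directly. Assuming the three child bounds per free feature have already been computed (the `child_bounds` mapping here is a hypothetical input, not part of the slides), Python's tuple ordering gives the lexicographic comparison for free:

```python
def choose_branch_var(child_bounds):
    """Strong-branching selection as described on the slide.

    child_bounds: dict mapping free feature f -> its 3 child upper bounds.
    Returns (branch variable, lookahead bound).
    """
    # Sort each triple descending: (b1, b2, b3) with b1 >= b2 >= b3.
    triples = {f: tuple(sorted(bs, reverse=True))
               for f, bs in child_bounds.items()}
    f_star = min(triples, key=lambda f: triples[f])   # lexicographic min
    lookahead = min(t[0] for t in triples.values())   # min_f of max child bound
    return f_star, lookahead
```

For example, with triples {1: (5, 9, 7), 2: (9, 6, 6), 3: (8, 8, 8)}, feature 3's sorted triple (8, 8, 8) is lexicographically smallest, and the lookahead bound is min(9, 9, 8) = 8.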

PEBBL
Parallel Enumeration and Branch-and-Bound Library: distributed memory (MPI), C++.
Goals:
- Massively parallel (scalable)
- General parallel branch-and-bound environment, with the parallel search engine cleanly separated from application and platform
- Portable
- Flexible
- Integrate approximation techniques
There are other parallel B&B frameworks: PUBB, Bob, PPBB-Lib, SYMPHONY, BCP, CHiPPS/ALPS, FTH-B&B, and codes for MIP.

PEBBL's Parallelism: (Almost) Free
[Diagram: a serial application built on the PEBBL serial core becomes a parallel application on the PEBBL parallel core.]
The user must:
- Define the serial application (debug in serial)
- Describe how to pack/unpack data (using a generic packing tool)
C++ inheritance gives parallel management. The user may add threads to:
- Share global data
- Exploit problem-specific parallelism
- Add parallel heuristics

PEBBL Features for Efficient Parallel B&B
- Efficient processor use during ramp-up (beginning)
- Integration of heuristics to generate good solutions early
- Worker/hub hierarchy
- Efficient work storage/distribution
- Control of task granularity
- Load balancing
- Non-preemptive proportional-share thread scheduler
- Correct termination
- Early output
- Checkpointing

PEBBL Ramp-up
The tree starts with one node; what to do with 10,000 processors? Serialize tree growth: all processors work in parallel on a single node. Parallelize:
- Preprocessing
- Tough root bounds
- Incumbent heuristics
- Splitting decisions (for MMA: strong branching for variable selection)

PEBBL Ramp-up
Strong branching for variable selection: divide the free variables evenly among processors; each processor computes bound triples for its free variables; an all-reduce on the best triples determines the branch variable; another all-reduce computes the lookahead bound
    b(J,C,E,F) = min_{f∈F} max{ b(J∪{f}, C, E, F\{f}),
                                b(J, C∪{f}, E, F\{f}),
                                b(J, C, E∪{f}, F\{f}) }
Note: the last child requires the most computation, since it must recompute equivalence classes.

Crossing Over
Switch from parallel operations on one node to processing independent subproblems serially. Work division by processor ID/rank generally gives a crossover to parallel search with perfect load balance. Cross over when there are enough subproblems to keep the processors busy and single subproblems can no longer use parallelism effectively. For MMA: cross over when #open problems = N, the number of features.

Hubs and Workers
A tree of hubs, each serving a group of workers, controls communication, processor utilization, and approximation of serial order. Subproblem pools live at both hubs and workers; hubs keep only tokens:
- Subproblem identifier
- Bound
- Location (processor, address)

Load Balancing
Three mechanisms: hub pullback, random scattering, and rendezvous. Hubs determine load (a function of work quantity and quality), use a binary tree of hubs to determine which processors need more work or better work, and exchange work.

Experiments
UC Irvine machine learning repository:
- Hungarian heart disease dataset (M = 294, N = 72)
- Spam dataset (M = 4601, N = 75)
Multiple MMA instances based on boost iteration; later iterations are harder. Dropped observations with missing features. Binarization of real features (Boros, Hammer, Ibaraki, Kogan): feature (i,j) is 1 iff x_i ≥ t_j, with cut points chosen so that no element of Ω+ and no element of Ω− map to the same binary vector.
[Figure: thresholds t1 < t2 < t3 on the real line map a value to monotone codes 000, 001, 011, 111.]
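The threshold binarization can be sketched as follows. The midpoint cut-point rule below is an illustrative assumption for the example, not necessarily the Boros et al. rule (which constrains cut points so positives and negatives stay separable):

```python
def midpoint_thresholds(values):
    """Candidate cut points halfway between consecutive distinct values
    (an illustrative choice of thresholds, not the paper's rule)."""
    vs = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(vs, vs[1:])]

def binarize(v, thresholds):
    """Feature (i,j) = 1 iff x_i >= t_j: one real value becomes a
    monotone bit tuple, e.g. 00, 10, 11 for increasing values."""
    return tuple(int(v >= t) for t in thresholds)
```

For values 1.0, 2.0, 4.0 the midpoints are 1.5 and 3.0, and the three values binarize to (0,0), (1,0), (1,1): the staircase pattern shown in the figure.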

Red Sky
- Node: two quad-core Intel Xeon X5570 processors, 48 GB shared RAM
- Full system: 22,528 cores, 132 TB RAM
- General partition: 17,152 cores, 100.5 TB RAM
- Queue wait times OK for 1000s of processors
- Network: InfiniBand, 3D toroidal (one dimension small), 10 GB/s
- Red Hat Linux 5, Intel 11.1 C++ compiler (O2), Open MPI 1.4.3
Because subproblem bounding is slow: 128 workers/core

Value of ramp up (no enumeration) [figure]

Number of tree nodes [figure]

Spam, value of ramp up [figure]

Spam, tree nodes [figure]

Comments: Ramp-up
- Using initial synchronous ramp-up improves scalability (e.g. 2x processors) and reduces tree inflation.
- The point where speedup departs from linear depends on problem difficulty and tree size; tree inflation is the main contributor to sub-linear speedup.
- Solution times down to 1-3 minutes. Spam26: 3 minutes on 6144 cores versus 27 hours on 8 cores.
- For MMA, no significant efficiency drop from 1 processor, nor in going to multiple hubs.

Parallel Enumeration
Fundamental in PEBBL: best k, absolute tolerance, relative tolerance, objective threshold. Requires branch-through on leaves and duplicate detection: hash each solution to find its owning processor. For all modes but best-k, independent solution repositories suffice, with a parallel merge sort at the end; for k-best, periodically compute the cut-off objective value.
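The duplicate-detection idea (hash a solution to a single owning processor, so all copies of the same solution meet in one repository) can be sketched briefly. The hashing scheme below is illustrative, not PEBBL's actual one; the only property that matters is that equivalent solutions map to the same rank:

```python
import hashlib

def owner(solution, nprocs):
    """Map a solution to its owning processor rank.

    Canonicalize first (here: sort the feature indices) so that any
    representation of the same monomial hashes identically; sha256 keeps
    the mapping deterministic across processes, unlike Python's hash().
    """
    key = repr(sorted(solution)).encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % nprocs
```

Each worker then sends a discovered solution only to `owner(sol, nprocs)`, which discards it if an identical solution is already stored.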

Enumeration Experiments
Why enumeration for MMA? MMA is the weak learner for LPBoost, and enumeration lets us add multiple columns per round of column generation: in this case, the best 25 MMA solutions.
- Hungarian heart: tree size about the same; more communication.
- Spam: larger tree with enumeration; harder subproblems than Hungarian heart (more observations).

Results: Enumeration [figure]

Results: Enumeration [figure]

Open-Source Code Available
Software freely available (BSD license): PEBBL plus knapsack and MMA examples.
http://software.sandia.gov/acro
ACRO = A Common Repository for Optimizers

Thank you!