Integer Programming for Bayesian Network Structure Learning


Integer Programming for Bayesian Network Structure Learning
James Cussens
Helsinki, 2013-04-09

Linear programming: The Belgian diet problem

            Fat  Sugar  Salt  Cost
Chocolate     4      6     1     5
Chips         6      1     8     4
Needs        12      8     4

Minimise 5x + 4y (x = chocolate, y = chips), subject to:
x, y ≥ 0
4x + 6y ≥ 12
6x + y ≥ 8
x + 8y ≥ 4
x, y ∈ R

(Plot: the feasible region in the (x, y) plane; figure not transcribed.)

Linear programming: Solving an LP using SCIP

presolved problem has 2 variables (0 bin, 0 int, 0 impl, 2 cont) and 3 constraints

LP iter  cols  rows  dualbound    primalbound   gap
      0     0     0  --           5.2000e+01    Inf
      2     2     3  1.0625e+01   5.2000e+01    389.41%
      2     2     3  1.0625e+01   1.0625e+01    0.00%

chocolate  1.125
chips      1.25
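To make the example reproducible without SCIP, here is a minimal sketch of the same Belgian diet LP using SciPy's linprog (an assumption of convenience: any LP solver would do; the expected output follows the slide's numbers).

# Minimal sketch: the Belgian diet LP solved with SciPy instead of SCIP.
# linprog handles "A_ub @ x <= b_ub", so the ">=" constraints are negated.
from scipy.optimize import linprog

c = [5, 4]                       # costs of chocolate (x) and chips (y)
A_ub = [[-4, -6],                # fat:    4x + 6y >= 12
        [-6, -1],                # sugar:  6x +  y >=  8
        [-1, -8]]                # salt:    x + 8y >=  4
b_ub = [-12, -8, -4]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, res.fun)            # expect roughly [1.125, 1.25] and 10.625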

Integer linear programming: Discrete dieting

            Fat  Sugar  Salt  Cost
Chocolate     4      6     1     5
Chips         6      1     8     4
Needs        12      8     4

Minimise 5x + 4y, subject to:
x, y ≥ 0
4x + 6y ≥ 12
6x + y ≥ 8
x + 8y ≥ 4
x, y ∈ Z

(The two following slide builds, "Finding a solution" and "Finding the solution", illustrate this graphically; figures not transcribed.)

Integer linear programming: Solving an IP using SCIP

presolved problem has 2 variables (0 bin, 2 int, 0 impl, 0 cont) and 3 constraints

LP iters  cols  rows  cuts  dualbound    p bnd     gap
       0     0     0     0  --           5.2e+01   Inf
       0     2     3     0  --           2.0e+01   Inf
       2     2     3     0  1.0625e+01   2.0e+01   88.24%
       2     2     3     0  1.0625e+01   1.8e+01   69.41%
       2     2     3     0  1.0625e+01   1.3e+01   22.35%
       3     2     4     1  1.3000e+01   1.3e+01   0.00%
       3     2     4     1  1.3000e+01   1.3e+01   0.00%
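A corresponding sketch of the discrete dieting IP, here using SciPy's milp (requires SciPy >= 1.9; again just a stand-in for SCIP). It should reproduce the integer optimum of 13 reported in the output above.

# Minimal sketch: the discrete dieting IP with both variables integer.
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

c = np.array([5, 4])                     # minimise 5x + 4y
A = np.array([[4, 6],                    # fat
              [6, 1],                    # sugar
              [1, 8]])                   # salt
needs = np.array([12, 8, 4])

res = milp(c=c,
           constraints=LinearConstraint(A, lb=needs, ub=np.inf),
           integrality=np.ones(2),       # both variables integer
           bounds=Bounds(0, np.inf))
print(res.x, res.fun)                    # expect [1., 2.] and 13.0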

Integer linear programming: Cutting planes

(Two figure-only slides illustrating cutting planes; figures not transcribed.)

Integer linear programming: Branch-and-bound

(Two figure-only slides illustrating branch-and-bound; figures not transcribed.)

Integer linear programming: (Mixed) integer (linear) programming

All variables x_i take numeric values: binary (0/1), integer or real.
ILP: linear objective function ∑_i c_i x_i and linear constraints.
Pure ILP: all variables are integer; a mixed ILP (MIP) also has continuous variables.

BN learning with IP: Encoding graphs with binary family variables

Suppose there are p fully-observed discrete variables V in some dataset. We want to find a MAP BN (with p vertices) conditional on this data.
We can encode any graph by creating a binary variable I(u ← W) for each BN variable u ∈ V and each candidate parent set W.
Assume known parameters (pedigrees) or Dirichlet parameter priors (general BNs), and a uniform (or at least "decomposable") structural prior. Then each I(u ← W) has a local score c(u, W).
Instantiate the I(u ← W) to maximise ∑_{u,W} c(u, W) I(u ← W), subject to the I(u ← W) representing a DAG.
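As an illustration of this encoding (a toy sketch, not GOBNILP's actual implementation), the following builds one binary family variable per (variable, candidate parent set) pair, the linear objective, and the "exactly one parent set per variable" constraints, using PuLP as a generic IP modelling layer; the local scores are invented. The acyclicity constraints come on the next slide.

# Sketch: encoding candidate families as binary IP variables with local scores.
# The scores below are invented toy numbers; in practice c(u, W) would be a
# decomposable score such as BDeu.
from pulp import LpProblem, LpVariable, LpMaximize, lpSum

scores = {
    ("A", frozenset()): -10.0, ("A", frozenset({"B"})): -8.0,
    ("B", frozenset()): -12.0, ("B", frozenset({"A", "C"})): -7.5,
    ("C", frozenset()): -11.0, ("C", frozenset({"A"})): -9.0,
}

model = LpProblem("bn_structure_learning", LpMaximize)

# one binary family variable I(u <- W) per (variable, parent set) pair
I = {(u, W): LpVariable("I_%s_%s" % (u, "_".join(sorted(W)) or "empty"),
                        cat="Binary")
     for (u, W) in scores}

# objective: total local score of the selected families
model += lpSum(scores[f] * I[f] for f in I)

# convexity: each BN variable gets exactly one parent set
for u in {u for (u, _) in scores}:
    model += lpSum(I[(v, W)] for (v, W) in I if v == u) == 1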

BN learning with IP: Ruling out non-DAGs with linear constraints

∀ u ∈ V:  ∑_W I(u ← W) = 1

∀ C ⊆ V:  ∑_{u ∈ C} ∑_{W : W ∩ C = ∅} I(u ← W) ≥ 1     (1)

Let x* be the solution to the LP relaxation. We search for a cluster C such that x* violates (1) and then add (1) to get a new LP. Repeat as long as a cutting plane can be found.
These cluster constraints were introduced by Jaakkola et al. at AISTATS 2010 [Jaakkola et al., 2010].
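A hedged sketch of the separation test for the cluster constraints (1): given LP relaxation values for the family variables, check whether the constraint for a particular cluster C is violated. A real separation routine (as in GOBNILP) searches for such a C, e.g. with a sub-IP or heuristics; this only tests a given cluster.

# Sketch: check constraint (1) for one cluster C against the LP solution.
# x_star maps (u, W) -> LP value of I(u <- W); C is a set of BN variables.
def cluster_cut_violated(x_star, C, eps=1e-6):
    """True if sum_{u in C} sum_{W : W ∩ C = ∅} x*(u, W) < 1,
    i.e. the cluster constraint for C is violated and can be added as a cut."""
    lhs = sum(val for (u, W), val in x_star.items()
              if u in C and not (set(W) & set(C)))
    return lhs < 1 - eps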

BN learning with IP: Time for a demo

Do: Pigs 100 1 2; alarm

Improving efficiency: Primal heuristics in IP

A good (typically suboptimal) solution helps prune the search tree.
It can also help at the root search node due to reduced-cost strengthening.
If we fail to solve to optimality, it is even more important to have a reasonable solution.

Improving efficiency: Sink finding

As we learn from [Silander and Myllymäki, 2006]:
Every DAG has at least one sink node (a node with no children).
For this node we can choose the best parents without fear of creating a cycle.
Once a sink v_p has been selected from V, we need only worry about learning the best BN on the nodes V \ {v_p}.
This is basically the same as finding the best total order (in reverse).

Improving efficiency: Using the LP solution to find sinks

I(1 ← W_{1,1})  I(1 ← W_{1,2})  ...  I(1 ← W_{1,k_1})
I(2 ← W_{2,1})  I(2 ← W_{2,2})  ...  I(2 ← W_{2,k_2})
I(3 ← W_{3,1})  I(3 ← W_{3,2})  ...  I(3 ← W_{3,k_3})
...
I(p ← W_{p,1})  I(p ← W_{p,2})  ...  I(p ← W_{p,k_p})

For each variable, order its parent set choices from best to worst.
(With only the acyclicity constraint) an optimal BN has at least one BN variable with its best parent set selected.
Let x* be the LP solution and suppose x*_{I(2 ← W_{2,1})} is closer to 1 than the best parent set choice for any other variable. Select it: variable 2 becomes the current sink.
Suppose 2 is a member of W_{1,1}, W_{3,2}, W_{p,1} and W_{p,2}; since 2 may now have no children, these parent sets are ruled out and the process repeats on the remaining variables. (See the sketch below.)
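The procedure just described can be sketched as follows (a simplification of what GOBNILP actually does: branching-induced fixings are ignored, and every variable is assumed to keep at least its empty parent set available).

# Greedy sink-finding primal heuristic (sketch).
# x_star:  dict (u, frozenset W) -> LP relaxation value of I(u <- W)
# scores:  dict (u, frozenset W) -> local score c(u, W)
def sink_finding(x_star, scores):
    available = dict(scores)                 # families still usable
    remaining = {u for (u, _) in scores}     # variables without a chosen family
    chosen = {}                              # u -> selected parent set W
    while remaining:
        # best still-available family for each remaining variable
        best = {u: max((W for (v, W) in available if v == u),
                       key=lambda W, u=u: scores[(u, W)])
                for u in remaining}
        # the next sink: variable whose best family has LP value closest to 1
        sink = max(remaining, key=lambda u: x_star.get((u, best[u]), 0.0))
        chosen[sink] = best[sink]
        remaining.remove(sink)
        # the sink may have no children, so drop families of other variables
        # that would use it as a parent
        available = {(v, W): s for (v, W), s in available.items()
                     if v != sink and sink not in W}
    return chosen      # defines a DAG consistent with the reverse sink order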

Improving efficiency: Sink-finding primal heuristic

The BN returned is always the best one for some total ordering.
It is basically a greedy search for such a BN near the LP solution (in L1 distance).
There are complications if some IP variables have already been fixed (due to branching).

Improving efficiency: Tightening the LP relaxation with implied constraints

The cluster constraints of [Jaakkola et al., 2010] ensure that any integer solution is a DAG :-)
But they do not define the convex hull of DAGs :-(
Adding at least some of the facets of the convex hull (at least those which pass through an optimal DAG) can provide dramatic improvements.
SCIP also adds some extra cuts automatically (e.g. Gomory cuts).

Improving efficiency: Towards the convex hull

Adding valid inequalities like this one is a big win:

I(2←{1,3,9}) + I(2←{3,9}) + I(3←{2,9,33}) + I(3←{2,9,29}) + I(3←{2,9,31}) + I(3←{2,9,15}) + I(3←{2,8,9}) + I(3←{2,9,12}) + I(3←{2,9}) + I(9←{2,3,13}) + I(9←{2,3}) ≤ 1

On the Pigs 1000 1 2 problem: with such cuts, 3 hours; without them, more than 30 hours.
After re-implementing our propagator this is now down to 38 minutes. (Coming soon in the next GOBNILP release.)

Improving efficiency: Propagation

If, say, I(1 ← {2,3}) and I(4 ← {1}) are set to 1, then we can immediately fix e.g. I(2 ← {4}) to 0.
Compute the transitive closure efficiently!
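A sketch of the propagation logic (GOBNILP implements this as a SCIP propagator; this just shows the idea): collect the arcs implied by family variables fixed to 1, take the transitive closure, and fix to 0 any candidate family that would close a directed cycle.

# Propagation sketch: derive arcs from families fixed to 1, compute the
# transitive closure, and report families that must be fixed to 0.
from itertools import product

def propagate_to_zero(fixed_to_one, candidates):
    """fixed_to_one: iterable of (child, parent_set) pairs fixed to 1.
    candidates:   iterable of (child, parent_set) pairs not yet fixed.
    Returns the set of candidate families that must be fixed to 0."""
    nodes, reach = set(), set()          # (a, b) in reach: directed path a -> b
    for child, parents in fixed_to_one:
        nodes.add(child)
        nodes.update(parents)
        reach.update((p, child) for p in parents)   # arc parent -> child
    # Floyd-Warshall-style transitive closure (k is the outermost loop)
    for k, i, j in product(nodes, repeat=3):
        if (i, k) in reach and (k, j) in reach:
            reach.add((i, j))
    # I(child <- parents) is impossible if some parent is reachable from child
    return {(child, parents) for child, parents in candidates
            if any((child, p) in reach for p in parents)}

# Example from the slide: with I(1 <- {2,3}) and I(4 <- {1}) fixed to 1,
# I(2 <- {4}) is fixed to 0 because of the path 2 -> 1 -> 4.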

Conditional independence constraints

Solutions are only checked against conditional independence constraints after they get through the integrality and acyclicity constraints.
If a DAG does not meet a conditional independence constraint, a cutting plane ruling out just that DAG is added.
There is also some simple presolving, and some obvious simple valid inequalities are added at the start.
Also in the next release. (Demo with citest.constraints.)

Scaling up: Column generation

Just as one can create constraints on the fly (cutting planes), one can also create variables dynamically (column generation).
Think of the not-yet-created variables as being initially fixed to zero.
After solving the LP we look for a variable with negative reduced cost. If we can't find one, we already have all the variables we need for an optimal solution.
For the reduced cost we need the objective coefficient of the new variable and the dual values of all the constraints in which it will appear.
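A small sketch of the pricing computation described above. The generic reduced cost of a candidate column is its objective coefficient minus the dual-weighted sum of its constraint coefficients; for a family variable I(u ← W), under the formulation on the earlier slides, those constraints would be the convexity constraint for u and every cluster constraint (1) whose cluster contains u and is disjoint from W. This is an assumption about how one would wire it up, not GOBNILP's actual pricer.

# Pricing sketch: reduced cost of a candidate family variable I(u <- W).
def reduced_cost(score, u, W, convexity_duals, cluster_duals):
    """score: objective coefficient c(u, W) of the candidate variable.
    convexity_duals: dict  BN variable -> dual of its '= 1' constraint.
    cluster_duals:   dict  frozenset C -> dual of its cluster constraint (1)."""
    rc = score - convexity_duals[u]
    for C, dual in cluster_duals.items():
        if u in C and not (set(W) & C):      # candidate appears in this cut
            rc -= dual
    return rc

# In the maximisation setting used for BN learning, the column is worth
# generating when rc > 0 (equivalently, "negative reduced cost" in the usual
# minimisation convention mentioned on the slide).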

References

Jaakkola, T., Sontag, D., Globerson, A., and Meila, M. (2010). Learning Bayesian network structure using LP relaxations. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS 2010), volume 9 of JMLR Workshop and Conference Proceedings, pages 358-365.

Silander, T. and Myllymäki, P. (2006). A simple approach for finding the globally optimal Bayesian network structure. In UAI.