Symbolic Variable Elimination in Discrete and Continuous Graphical Models. Scott Sanner, Ehsan Abbasnejad

Inference for Dynamic Tracking: no one previously did this inference exactly in closed-form!

Exact Closed-form Continuous Inference. Is exact (conditional) inference possible in closed-form?
Fully Gaussian: most inference, including conditional.
Fully Uniform: 1D and n-d hyperrectangular cases.
General Uniform (piecewise, asymmetrical): yes, but not a solution you can write on one sheet of paper.

Already Solved? How is inference done in piecewise models?
(Adaptively) discretize the model: approximate, O(N^D), and adaptivity is an art form.
Projective approximation (variational, EP): choose an approximating class, often Gaussian, and good luck!
Sampling (Monte Carlo, Gibbs): may not converge for (near-)deterministic distributions, and cannot handle unobserved evidence E[x | o = ?].
No expressive, exact, closed-form solutions! Yet.

What has everyone been missing? Symbolic representations and operations on piecewise functions: a continuous variant of work started by Boutilier, Reiter, and Price (IJCAI-01).

Piecewise Functions (Cases). A case statement pairs each partition (a constraint) with a value:

$$z = f(x, y) = \begin{cases} (x > 3) \wedge (y \le x) : & x + y \\ (x \le 3) \vee (y > x) : & x^2 + xy - 3 \end{cases}$$

Examples include linear constraints and value, linear constraints with a constant value, and quadratic constraints and value.

Formal Problem Statement. General continuous graphical models are represented by piecewise functions (cases):

$$f = \begin{cases} \phi_1 : & f_1 \\ \;\vdots & \\ \phi_k : & f_k \end{cases}$$

For variable elimination, we need exact closed-form solutions inferred via the following piecewise calculus:

$$f_1 \oplus f_2, \quad f_1 \otimes f_2, \quad \max(f_1, f_2), \quad \min(f_1, f_2), \quad \int_x f(x)\,dx$$

Question: how do we perform these operations in closed form?

Polynomial Case Operations: $\oplus$, $\otimes$

$$\begin{cases} \phi_1 : & f_1 \\ \phi_2 : & f_2 \end{cases} \;\oplus\; \begin{cases} \psi_1 : & g_1 \\ \psi_2 : & g_2 \end{cases} \;=\; ?$$

Polynomial Case Operations: $\oplus$, $\otimes$

$$\begin{cases} \phi_1 : & f_1 \\ \phi_2 : & f_2 \end{cases} \;\oplus\; \begin{cases} \psi_1 : & g_1 \\ \psi_2 : & g_2 \end{cases} \;=\; \begin{cases} \phi_1 \wedge \psi_1 : & f_1 + g_1 \\ \phi_1 \wedge \psi_2 : & f_1 + g_2 \\ \phi_2 \wedge \psi_1 : & f_2 + g_1 \\ \phi_2 \wedge \psi_2 : & f_2 + g_2 \end{cases}$$

Similarly for $\otimes$. Polynomials are closed under + and *. What about max? The max of polynomials is not a polynomial.
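As a minimal sketch of the cross-product construction above (my own illustrative Python with sympy, not the authors' implementation), a case statement can be held as a list of (constraint, value) pairs and $\oplus$ computed by pairing every partition of one operand with every partition of the other:

```python
import sympy as sp

x, y = sp.symbols('x y')

# A case statement as a list of (constraint, polynomial value) pairs.
f = [(x > 3, x + y), (x <= 3, x**2)]
g = [(y > 0, 2*y), (y <= 0, sp.Integer(1))]

def case_add(f, g):
    """Cross-product sum of two case statements: one partition per
    conjunction of operand partitions, dropping any conjunction sympy
    can already recognize as contradictory."""
    result = []
    for phi, fv in f:
        for psi, gv in g:
            constraint = sp.And(phi, psi)
            if constraint is not sp.false:
                result.append((constraint, sp.expand(fv + gv)))
    return result

for constraint, value in case_add(f, g):
    print(constraint, ':', value)
```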

Polynomial Case Operations: max

$$\max\!\left(\begin{cases} \phi_1 : & f_1 \\ \phi_2 : & f_2 \end{cases},\; \begin{cases} \psi_1 : & g_1 \\ \psi_2 : & g_2 \end{cases}\right) \;=\; ?$$

Polynomial Case Operations: max

$$\max\!\left(\begin{cases} \phi_1 : & f_1 \\ \phi_2 : & f_2 \end{cases},\; \begin{cases} \psi_1 : & g_1 \\ \psi_2 : & g_2 \end{cases}\right) \;=\; \begin{cases} \phi_1 \wedge \psi_1 \wedge f_1 > g_1 : & f_1 \\ \phi_1 \wedge \psi_1 \wedge f_1 \le g_1 : & g_1 \\ \phi_1 \wedge \psi_2 \wedge f_1 > g_2 : & f_1 \\ \phi_1 \wedge \psi_2 \wedge f_1 \le g_2 : & g_2 \\ \phi_2 \wedge \psi_1 \wedge f_2 > g_1 : & f_2 \\ \phi_2 \wedge \psi_1 \wedge f_2 \le g_1 : & g_1 \\ \phi_2 \wedge \psi_2 \wedge f_2 > g_2 : & f_2 \\ \phi_2 \wedge \psi_2 \wedge f_2 \le g_2 : & g_2 \end{cases}$$

Still a piecewise polynomial! Size blowup? We'll get to that.
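Continuing the same illustrative sketch (the Python/sympy case representation assumed above, not the authors' code), max differs from $\oplus$ only in that each joint partition is split on the newly introduced comparison test $f_i > g_j$:

```python
import sympy as sp

def case_max(f, g):
    """Case maximum: each joint partition splits on the new
    decision test f_i > g_j introduced by the max."""
    result = []
    for phi, fv in f:
        for psi, gv in g:
            result.append((sp.And(phi, psi, fv > gv), fv))   # f_i wins
            result.append((sp.And(phi, psi, fv <= gv), gv))  # g_j wins
    return result

x, y = sp.symbols('x y')
f = [(x > 3, x + y), (x <= 3, x**2)]
g = [(y > 0, 2*y), (y <= 0, sp.Integer(1))]
for constraint, value in case_max(f, g):
    print(constraint, ':', value)
```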

Integration: $\int_x$ is closed for polynomials. But how do we compute it for a case statement?

$$\int_x \begin{cases} \phi_1 : & f_1 \\ \;\vdots & \\ \phi_k : & f_k \end{cases} dx \;=\; \sum_{i=1}^{k} \int_x [\phi_i]\, f_i \, dx$$

Just integrate the case partitions and $\oplus$ the results!

Partition Integral. Step 1: determine the integration bounds of $\int_x [\phi_1]\, f_1 \, dx$, where

$$\phi_1 := [x > 1] \wedge [x > y - 1] \wedge [x \le z] \wedge [x \le y + 1] \wedge [y > 0], \qquad f_1 := x^2 - xy.$$

The symbolic lower and upper bounds are themselves case statements:

$$LB := \begin{cases} y - 1 > 1 : & y - 1 \\ y - 1 \le 1 : & 1 \end{cases} \qquad UB := \begin{cases} z < y + 1 : & z \\ z \ge y + 1 : & y + 1 \end{cases}$$

What constraints remain here? Those independent of x, plus the pairwise UB > LB constraints; collecting them as $\phi_{cons}$, the partition integral becomes $[\phi_{cons}] \int_{x=LB}^{UB} f_1 \, dx$. UB and LB are symbolic! How do we evaluate this?

Definite Integral Evaluation. How do we evaluate the integral with symbolic bounds?

$$\int_{x=LB}^{UB} x^2 - xy \; dx \;=\; \left[ \tfrac{1}{3}x^3 - \tfrac{1}{2}x^2 y \right]_{x=LB}^{UB}$$

where

$$LB := \begin{cases} y - 1 > 1 : & y - 1 \\ y - 1 \le 1 : & 1 \end{cases} \qquad UB := \begin{cases} z < y + 1 : & z \\ z \ge y + 1 : & y + 1 \end{cases}$$

We can do polynomial operations on cases! Substituting the case-valued bounds into the antiderivative:

$$\left( \tfrac{1}{3}\, UB \otimes UB \otimes UB \;\ominus\; \tfrac{1}{2}\, UB \otimes UB \otimes y \right) \;\ominus\; \left( \tfrac{1}{3}\, LB \otimes LB \otimes LB \;\ominus\; \tfrac{1}{2}\, LB \otimes LB \otimes y \right)$$

Symbolically, exactly evaluated!
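For intuition, the antiderivative step can be reproduced directly with sympy; this is a sketch under the assumption that the slide's partition value is $f_1 = x^2 - xy$, writing the case-valued bounds via Max/Min rather than explicit cases:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

f1 = x**2 - x*y                 # partition value from the slide
LB = sp.Max(1, y - 1)           # lower bound: tightest of x > 1 and x > y - 1
UB = sp.Min(z, y + 1)           # upper bound: tightest of x <= z and x <= y + 1

F = sp.integrate(f1, x)         # antiderivative: x**3/3 - x**2*y/2
definite = F.subs(x, UB) - F.subs(x, LB)   # F(UB) - F(LB), valid when UB > LB (and y > 0)

print(sp.expand(definite))
```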

Exact Graphical Model Inference! (directed and undirected) We can do general probabilistic inference:

$$p(x_2 \mid x_1) \;=\; \frac{\int_{x_3} \cdots \int_{x_n} \bigotimes_{i=1}^{k} \mathrm{case}_i \; dx_n \cdots dx_3}{\int_{x_2} \cdots \int_{x_n} \bigotimes_{i=1}^{k} \mathrm{case}_i \; dx_n \cdots dx_2}$$

or an exact expectation of any polynomial:

$$E_{x \sim p(x \mid o)}[\mathrm{poly}(x) \mid o] \;=\; \int_x p(x \mid o)\, \mathrm{poly}(x)\, dx$$

where poly gives the mean, variance, skew, kurtosis, ..., or any polynomial such as $x^2 + y^2 + xy$. All computed by Symbolic Variable Elimination (SVE).
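As a self-contained illustration of the exact-expectation idea (my own toy example in sympy, not taken from the talk): for a piecewise-polynomial density, moments come out exactly as rational numbers rather than sample estimates.

```python
import sympy as sp

x = sp.symbols('x')

# Hypothetical piecewise-polynomial density: triangular on [0, 2] with mode 1.
p = sp.Piecewise((x, (x >= 0) & (x <= 1)),
                 (2 - x, (x > 1) & (x <= 2)),
                 (0, True))

total = sp.integrate(p, (x, -sp.oo, sp.oo))               # should be exactly 1
mean = sp.integrate(x * p, (x, -sp.oo, sp.oo))            # exactly 1
second = sp.integrate(x**2 * p, (x, -sp.oo, sp.oo))       # exactly 7/6
variance = sp.simplify(second - mean**2)                  # exactly 1/6
print(total, mean, variance)
```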

Voila: Closed-form Exact Inference via SVE!

Bayesian Robotics. Model: D with measurements x_1, ..., x_n, where D is the true distance to the wall and x_1, ..., x_n are the measurements. We want: $E[D \mid x_1, \ldots, x_{n-1}, x_n = ?]$.

Bayesian Robotics: Inference Examples. The posterior over the distance is shown in the slide's graph, and the expected-distance calculation is shown below it.

Extra: A New Conjugate Prior for Bayesian Inference. General Bayesian inference:

$$p(\vec{\theta} \mid d_{n+1}) \;\propto\; p(d_{n+1} \mid \vec{\theta})\, p(\vec{\theta} \mid d_n)$$

If the prior and likelihood are case statements, then the posterior is a case statement.

Example: Bayesian Preference Learning for Linear Utilities. If the prior is uniform over the weights and each preference likelihood linearly constrains the weights, then the posterior is uniform over a convex polytope (all case statements!). Inference is equivalent to finding the volume of a convex polytope!

Computational Complexity? In theory, for SVE on graphical models: best-case complexity is on the order of the number of operations, while worst-case complexity is O(exp(#operations)). It is not explicitly tree-width dependent! But worse: a single integral may invoke hundreds of operations! Fortunately, data structures mitigate the worst-case complexity.

BDDs / ADDs: Quick Introduction

Function Representation (Tables). How do we represent functions $B^n \to \mathbb{R}$? How about a fully enumerated table?

a b c | F(a,b,c)
0 0 0 | 0.00
0 0 1 | 0.00
0 1 0 | 0.00
0 1 1 | 1.00
1 0 0 | 0.00
1 0 1 | 1.00
1 1 0 | 0.00
1 1 1 | 1.00

OK, but can we be more compact?
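In code, the fully enumerated table is just a map from every assignment to a value; the table above happens to encode F(a, b, c) = c AND (a OR b), and the enumeration grows as 2^n. A small illustrative sketch (not from the talk):

```python
# Fully enumerated table for F(a, b, c): one row per assignment, 2**n rows total.
table = {(a, b, c): float(c and (a or b))
         for a in (0, 1) for b in (0, 1) for c in (0, 1)}

print(table[(0, 1, 1)])   # 1.0
print(len(table))         # 8 = 2**3
```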

Function Representation (Trees). How about a tree? Sure, and it can simplify the table by exploiting context-specific independence. [Slide shows the same table next to a decision tree over a, b, c with leaves 1.00 and 0.00; diagram omitted in transcription.]

Function Representation (ADDs). Why not a directed acyclic graph (DAG)? This gives the Algebraic Decision Diagram (ADD). [Slide shows the same table next to an ADD over a, b, c with shared nodes and leaves 1 and 0; diagram omitted in transcription.] Think of BDDs as the {0,1} subset of the ADD range.

Binary Operations (ADDs). Why do we order variable tests? It enables efficient binary operations: corresponding nodes of two ordered ADDs can be combined top-down. [Diagram of two ADDs over a, b, c being combined omitted in transcription.] Result: ADD operations can avoid state enumeration.
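The standard way this combination is done is the memoized "apply" recursion over two ordered diagrams; below is a minimal sketch in Python (the node layout, ORDER, and the example diagram F are my own illustrative choices, not the talk's implementation):

```python
# Toy ordered ADD: a terminal is a float, an internal node is (var, high, low).
ORDER = {'a': 0, 'b': 1, 'c': 2}          # fixed variable test ordering

def make_node(var, high, low):
    """Skip redundant tests so the diagram stays reduced."""
    return high if high == low else (var, high, low)

def apply_op(f, g, op, cache=None):
    """Combine two ordered ADDs with a binary operation op, reusing results."""
    if cache is None:
        cache = {}
    key = (f, g)
    if key in cache:
        return cache[key]
    if not isinstance(f, tuple) and not isinstance(g, tuple):
        res = op(f, g)                     # both terminals: compute directly
    else:
        fv = ORDER[f[0]] if isinstance(f, tuple) else float('inf')
        gv = ORDER[g[0]] if isinstance(g, tuple) else float('inf')
        top = min(fv, gv)                  # branch on the earliest variable
        var = f[0] if fv == top else g[0]
        f_hi, f_lo = (f[1], f[2]) if fv == top else (f, f)
        g_hi, g_lo = (g[1], g[2]) if gv == top else (g, g)
        res = make_node(var,
                        apply_op(f_hi, g_hi, op, cache),
                        apply_op(f_lo, g_lo, op, cache))
    cache[key] = res
    return res

# F(a,b,c) = c AND (a OR b) from the earlier table, as an ordered ADD.
F = ('a', ('c', 1.0, 0.0), ('b', ('c', 1.0, 0.0), 0.0))
print(apply_op(F, F, lambda u, v: u + v))   # doubles every leaf without enumerating states
```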

Case to XADD. The XADD is a continuous-variable extension of the algebraic decision diagram: an efficient data structure for cases, with a strict ordering of atomic inequality tests, a compact, minimal case representation, and efficient case operations.

Case to XADD. For example:

$$V = \begin{cases}
x_1 + k > 100 \wedge x_2 + k > 100 : & 0 \\
x_1 + k > 100 \wedge x_2 + k \le 100 : & x_2 \\
x_1 + k \le 100 \wedge x_2 + k > 100 : & x_1 \\
x_1 + x_2 + k > 100 \wedge x_1 + k \le 100 \wedge x_2 + k \le 100 \wedge x_2 > x_1 : & x_2 \\
x_1 + x_2 + k > 100 \wedge x_1 + k \le 100 \wedge x_2 + k \le 100 \wedge x_2 \le x_1 : & x_1 \\
x_1 + x_2 + k \le 100 : & x_1 + x_2 \\
\;\vdots &
\end{cases}$$

*With non-trivial extensions over the ADD, this can be reduced to a minimal canonical form!

Compactness of (X)ADDs. The (X)ADD is linear in the number of decisions, while the equivalent case version has an exponential number of partitions! [Slide shows a ladder-structured diagram over decisions 1 through 5 with terminals 1 and 0; omitted in transcription.]

XADD Maximization. Taking the max of two XADDs (here, one branching on y > 0 with leaves x and y, the other branching on x > 0 with leaves y and x) may introduce new decision tests such as x > y. Operations exploit structure: O(|f| |g|). [Diagram omitted in transcription.]

Maintaining XADD Orderings. Max may get decisions out of order. With the decision ordering (root to leaf) x > y, then y > 0, then x > 0, the result of max(y > 0 branch, x > 0 branch) contains a newly introduced x > y node below the y > 0 and x > 0 nodes, i.e., out of order. [Diagram omitted in transcription.]

Correcting XADD Ordering. We can obtain an ordered XADD from an unordered one. Key idea: binary operations maintain orderings. If decision z is out of order at a node with children ID_1 and ID_0, rewrite the node as (indicator XADD for z, with leaves 1/0) $\otimes$ ID_1 $\oplus$ (indicator XADD for $\neg z$, with leaves 0/1) $\otimes$ ID_0; the result will have z in order! Inductively assume ID_1 and ID_0 are ordered; then all operands are ordered, so applying $\otimes$ and $\oplus$ produces an ordered result!

Maintaining Minimality. [Slide shows an XADD with decisions y > 0, x > 0, and x + y < 0 and leaves y, x, and x + y.] The node testing x + y < 0 is unreachable: x + y < 0 is always false if x > 0 and y > 0. If the constraints are linear, this can be detected with the feasibility checker of an LP solver and pruned. There are more subtle prunings as well.
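A feasibility check of this kind can be sketched with any LP solver; below is an illustrative version using scipy's linprog (the epsilon slack for strict inequalities is my own assumption, not a detail from the talk):

```python
from scipy.optimize import linprog

# Is the path constraint  x > 0, y > 0, x + y < 0  satisfiable?
# Encode strict inequalities with a small slack and solve a trivial LP
# (objective 0): if no feasible point exists, the XADD node is unreachable.
EPS = 1e-9
A_ub = [[-1.0,  0.0],    # -x <= -EPS        (x > 0)
        [ 0.0, -1.0],    # -y <= -EPS        (y > 0)
        [ 1.0,  1.0]]    #  x + y <= -EPS    (x + y < 0)
b_ub = [-EPS, -EPS, -EPS]

res = linprog(c=[0.0, 0.0], A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 2)
if not res.success:
    print("Infeasible path: prune the unreachable node.")
```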

The XADD makes all of the previous inference possible: we could not even do a single integral or maximization without it!

Recap. We defined a calculus for piecewise functions: $f_1 \oplus f_2$, $f_1 \otimes f_2$, $\max(f_1, f_2)$, $\min(f_1, f_2)$, and $\int_x f(x)\,dx$; see Zamani & Sanner (AAAI-12) for $\max_x f(x)$ and $\min_x f(x)$. We defined the XADD to compute efficiently with cases. Together these make possible closed-form inference in continuous graphical models, exact (conditional) expectations of any polynomial, and new paradigms for Bayesian inference.

Piecewise Calculus + XADD = Expressive, Exact, Closed-form Inference in Discrete & Continuous Variable Graphical Models Thank you! Questions?