STAT 598L Probabilistic Graphical Models Instructor: Sergey Kirshner Bayesian Networks
Representing Joint Probability Distributions: a full joint table over n binary variables has 2^n − 1 free parameters.
Reducing Number of Parameters: Conditional Independence. Conditional independence can reduce the number of parameters: in a three-variable network over Z, X, Y (all binary), the root needs 1 parameter and each of the two conditionals needs 2 parameters, for 1 + 2 + 2 = 5 free parameters instead of 2^3 − 1 = 7.
Example: Naïve Bayes. Class variable L (like/dislike) with conditionally independent features A (ambience), P (price), F (food), E (ethnic).
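The Naïve Bayes factorization above can be sketched numerically. The CPT values below are made up for illustration; only the structure (class L with conditionally independent features A, P, F, E) comes from the slide.

```python
# Restaurant Naive Bayes sketch: P(L,A,P,F,E) = P(L) P(A|L) P(P|L) P(F|L) P(E|L)
# All CPT numbers are illustrative, not from the lecture.

p_L = {1: 0.6, 0: 0.4}                              # P(L)
p_A = {1: {1: 0.7, 0: 0.3}, 0: {1: 0.4, 0: 0.6}}    # P(A|L), indexed p_A[L][A]
p_P = {1: {1: 0.5, 0: 0.5}, 0: {1: 0.8, 0: 0.2}}    # P(P|L)
p_F = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.3, 0: 0.7}}    # P(F|L)
p_E = {1: {1: 0.4, 0: 0.6}, 0: {1: 0.4, 0: 0.6}}    # P(E|L)

def joint(l, a, p, f, e):
    # chain-rule factorization with the Naive Bayes independence assumptions
    return p_L[l] * p_A[l][a] * p_P[l][p] * p_F[l][f] * p_E[l][e]

# The factorization needs 1 + 4*2 = 9 free parameters instead of 2^5 - 1 = 31.
total = sum(joint(l, a, p, f, e)
            for l in (0, 1) for a in (0, 1) for p in (0, 1)
            for f in (0, 1) for e in (0, 1))
print(round(total, 10))  # a valid joint distribution sums to 1
```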
Structure. The graph is both a skeleton for the factorization of a joint distribution and a representation of a set of conditional independence relations. The two views are the same.
Causal Interpretation
With Probability Tables
Marginals
Causal Reasoning
Evidential Reasoning
Explaining Away
Bayesian Network Model. A graphical way to describe a particular chain-rule decomposition of the joint distribution: parents in the graph are the conditioning variables, and the factors are conditional probability distributions.
Conditional Independence Assumptions What conditional independence assumptions are made?
Representation Theorem. Local Markov assumption: each variable is independent of its non-descendants given its parents. The theorem: this holds if and only if P factorizes according to G.
Representation Theorem. G is an I-map for P: every independence encoded by G holds in P. P could potentially exhibit more CI relations.
Representation Theorem (example graph over X_1, ..., X_5). G is an I-map for P; P could potentially exhibit more CI relations. Is there an I-map graph for every P?
Representation Theorem. If each variable is independent of its non-descendants given its parents, then P factorizes according to G. Proof: 1. Assume a topological order (or reorder). 2. Expand P by the chain rule in that order. 3. In a topological order each variable's predecessors are non-descendants, so the local Markov assumption reduces each factor to a conditional on the parents only. 4. The resulting product of CPDs is the factorization according to G.
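The chain-rule step of the proof, written out (using Pa_{X_i} for the parents of X_i):

```latex
P(X_1,\dots,X_n)
  = \prod_{i=1}^{n} P(X_i \mid X_1,\dots,X_{i-1})  % chain rule in topological order
  = \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}_{X_i})  % X_1,\dots,X_{i-1} are non-descendants
                                                   % of X_i, so the local Markov assumption
                                                   % drops all of them except the parents
```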
Representation Theorem (converse). If P factorizes according to G, then each variable is independent of its non-descendants given its parents. Homework!
Bayesian Network = Bayesian network structure (a DAG; conditioning variables = parents) + Bayesian network parameters (conditional probability tables). Joint distribution as a chain rule of conditional probabilities.
Dimensionality Reduction. Full probability table: O(2^n) free parameters. Bayesian network (BN): O(n·2^k) free parameters (assuming at most k parents per node).
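The counts above are easy to check for binary variables (the 2^k factor assumes each of the up-to-k parents is also binary):

```python
# Free-parameter counts for n binary variables.

def full_table_params(n):
    # one probability per joint configuration, minus the sum-to-one constraint
    return 2 ** n - 1

def bn_params(n, k):
    # each node: at most 2^k parent configurations, 1 free parameter for each
    return n * 2 ** k

print(full_table_params(20))  # 1048575
print(bn_params(20, 3))       # 160: linear in n for bounded fan-in
```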
Representation Theorem. Bayesian network structure (DAG: conditioning variables = parents; joint distribution as a chain rule of conditional probabilities) corresponds to a set of independencies (local Markov assumption); there can be more independencies in P.
Finding Conditional Independences (example graph over A, B, C, D, E, F, G, H, I, J).
Is A ⊥ E?
Is A ⊥ E | B?
Is A ⊥ E | B, G?
How do we convert local Markov properties into conditional independencies?
Simple Case: X and Y with a direct connection. Is it possible to find Z so that X ⊥ Y | Z? Not always, e.g., when Y is a deterministic function of X. Verdict: dependent; the edge is active in the flow of influence (active = dependence).
Independencies for Three Variables (Z unobserved):
indirect causal effect X → Z → Y: active
indirect evidential effect X ← Z ← Y: active
common cause X ← Z → Y: active
common effect X → Z ← Y: blocked
Independencies for Three Variables (Z observed):
indirect causal effect X → Z → Y: blocked
indirect evidential effect X ← Z ← Y: blocked
common cause X ← Z → Y: blocked
common effect X → Z ← Y: active
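The common-effect case (the one that behaves differently from the others) can be verified numerically. The CPT values below are illustrative, not from the slides:

```python
# Common effect (v-structure) X -> Z <- Y: X and Y are marginally
# independent, but conditioning on Z activates the trail.
from itertools import product

pX = {0: 0.5, 1: 0.5}
pY = {0: 0.5, 1: 0.5}
# P(Z=1 | X, Y): Z tends to be 1 when X and Y agree (illustrative numbers)
pZ = {(x, y): {1: 0.9 if x == y else 0.1} for x, y in product((0, 1), repeat=2)}
for xy in pZ:
    pZ[xy][0] = 1 - pZ[xy][1]

def joint(x, y, z):
    return pX[x] * pY[y] * pZ[(x, y)][z]

# Marginally: P(X=1, Y=1) equals P(X=1) * P(Y=1)
pxy = sum(joint(1, 1, z) for z in (0, 1))
print(abs(pxy - 0.25) < 1e-12)       # True: X and Y independent

# Given Z=1: P(X=1 | Z=1) differs from P(X=1 | Y=1, Z=1)
pz1 = sum(joint(x, y, 1) for x in (0, 1) for y in (0, 1))
px_given_z = sum(joint(1, y, 1) for y in (0, 1)) / pz1
px_given_yz = joint(1, 1, 1) / sum(joint(x, 1, 1) for x in (0, 1))
print(px_given_z != px_given_yz)     # True: dependent given Z
```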
Trails A B E F H I C G J D Trail = undirected path
Active Trails. Active trail = all consecutive triples are active. A v-structure triple is active when its middle node or one of its descendants is among the conditioning nodes; any other triple is active when its middle node is not a conditioning node.
Finding Active Trails. Algorithm 3.1 in the textbook: find the ancestors of the evidence nodes (to test v-structures), then run breadth-first search (a bit tricky, as both up and down directions have to be considered). Alternatively, play Bayes-Ball ("The Rational Pastime").
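A sketch of the ancestors-plus-BFS idea from Algorithm 3.1. The dict-of-parents graph representation and function names are my own; the algorithm is the standard reachability pass that tracks the direction of travel through each node (assumes the query nodes x and y are not in the evidence set z):

```python
# d-separation by searching for active trails: ancestor pass, then a BFS
# over (node, direction) pairs.
from collections import deque

def d_separated(parents, x, y, z):
    """True iff x and y are d-separated given evidence set z.
    parents: dict mapping each node to the set of its parents."""
    children = {n: set() for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].add(n)

    # z plus all ancestors of z: a collider is opened by evidence at the
    # collider itself or at any of its descendants
    opened = set(z)
    stack = list(z)
    while stack:
        n = stack.pop()
        for p in parents[n]:
            if p not in opened:
                opened.add(p)
                stack.append(p)

    visited = set()
    queue = deque([(x, 'up')])   # start as if arriving from a child
    while queue:
        n, direction = queue.popleft()
        if (n, direction) in visited:
            continue
        visited.add((n, direction))
        if n == y:
            return False         # found an active trail to y
        if direction == 'up' and n not in z:
            for p in parents[n]:     # chain continuing upward
                queue.append((p, 'up'))
            for c in children[n]:    # common cause: turn downward
                queue.append((c, 'down'))
        elif direction == 'down':
            if n not in z:
                for c in children[n]:    # chain continuing downward
                    queue.append((c, 'down'))
            if n in opened:
                for p in parents[n]:     # collider opened by evidence
                    queue.append((p, 'up'))
    return True

# Chain A -> B -> C: observing B blocks the trail.
chain = {'A': set(), 'B': {'A'}, 'C': {'B'}}
print(d_separated(chain, 'A', 'C', set()))   # False
print(d_separated(chain, 'A', 'C', {'B'}))   # True
```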
http://ai.stanford.edu/~paskin/gm-short-course/lec2.pdf
Direction-dependent Separation. X and Y are d-separated given Z = there are no active trails between nodes in X and nodes in Y given the nodes in Z. Denoted d-sep_G(X; Y | Z). We will examine the set of independencies induced by d-separation. How can we formally tie d-separation and conditional independence?
Soundness of d-separation. For all P that factorize according to G (G is a BN structure for P): d-separation in G implies conditional independence in P, i.e., G is an I-map for P. Will prove later in the course.
Completeness of d-separation. What would be a good converse? Faithfulness: conditional independence in P implies d-separation in G. Does not hold! We relax the statement.
Completeness of d-separation. Relaxing the statement: instead of requiring the converse for every P, require it only for some P that factorizes according to G; by contraposition, the relaxed statement says that a failure of d-separation implies dependence in some such P.
Completeness of d-separation. Interpreting the statement: an active trail between X and Y given Z implies that X and Y are dependent given Z in some P that factorizes according to G. Sketch of proof (by construction): make all CPDs not on the trail uniform (this effectively removes the nodes and arrows not on the trail), then make the remaining dependencies deterministic (e.g., XOR).
More General Result: soundness, and completeness (almost). Intuition: for two binary variables X and Y, the space of conditional distributions (P(Y=1|X=0), P(Y=1|X=1)) is 2-dimensional, and independence (X ⊥ Y) holds only on a 1-dimensional curve inside it; dependence (X → Y) fills the rest.
Dimensionality Reduction (same picture: the 1-d independence curve inside the 2-d square of conditional distributions). Key for dimensionality reduction: finding a parametrization on such a low-dimensional manifold.
Equivalent Structures. Is the DAG structure unique for a set of conditional independencies? No: X → Z → Y, X ← Z ← Y, and X ← Z → Y all encode the same independence (X ⊥ Y | Z). What makes DAGs equivalent for CI modeling?
More Formally: I-Equivalence. I-equivalence: I(G_1) = I(G_2), where I(G) is the set of conditional independencies implied by G. What makes graphs I-equivalent?
Skeleton: the undirected graph obtained from a DAG by dropping edge directions (example graph over A, ..., J shown directed and as its skeleton).
Structure Equivalence. Is the skeleton enough? No, because of v-structures: X → Z ← Y has the same skeleton as the chains but different independencies. What if we add v-structures? skeleton(G_1) = skeleton(G_2) and v-structures(G_1) = v-structures(G_2) imply I(G_1) = I(G_2). Converse? No, e.g., two different complete DAGs have the same (empty) set of independencies but different v-structures.
Structure Equivalence. Still not a full characterization; refine the last piece. Immorality: X → Z ← Y with no edge between X and Y. Theorem: skeleton(G_1) = skeleton(G_2) and immoralities(G_1) = immoralities(G_2) if and only if I(G_1) = I(G_2).
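The characterization above translates directly into a test. A sketch (graphs as dicts of parent sets, my own representation):

```python
# I-equivalence test: same skeleton and same immoralities
# (v-structures X -> Z <- Y with no edge between X and Y).
from itertools import combinations

def skeleton(parents):
    # undirected edges as frozensets
    return {frozenset((p, n)) for n, ps in parents.items() for p in ps}

def immoralities(parents):
    skel = skeleton(parents)
    return {(frozenset((a, b)), z)
            for z, ps in parents.items()
            for a, b in combinations(sorted(ps), 2)
            if frozenset((a, b)) not in skel}   # unmarried parent pair

def i_equivalent(g1, g2):
    return skeleton(g1) == skeleton(g2) and immoralities(g1) == immoralities(g2)

# X -> Z -> Y and X <- Z <- Y encode the same independencies,
# but the v-structure X -> Z <- Y does not.
chain1 = {'X': set(), 'Z': {'X'}, 'Y': {'Z'}}
chain2 = {'Y': set(), 'Z': {'Y'}, 'X': {'Z'}}
vstruct = {'X': set(), 'Y': set(), 'Z': {'X', 'Y'}}
print(i_equivalent(chain1, chain2))   # True
print(i_equivalent(chain1, vstruct))  # False: same skeleton, extra immorality
```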
How To Construct G? (Example over variables A, B, C, D, E, G.)
What Can We Reconstruct? Ideally, want G such that I(G) = I. Suppose such a graph exists: it may not be unique, and there are n! orderings to search over. We may not be able to find such a G at all. Will settle for G an I-map for I, i.e., I(G) ⊆ I, but with no extra edges (dependencies): removing edges can only add independencies. G is a minimal I-map = removing any edge in G adds independencies not in I.
How To Construct a Minimal I-Map? Assume an ordering (X_1, ..., X_n) is given. For each node X_i, find the smallest subset U ⊆ {X_1, ..., X_{i-1}} so that X_i ⊥ ({X_1, ..., X_{i-1}} \ U) | U, then add edges from the nodes in U to X_i.
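A sketch of this construction, assuming access to an independence oracle for P. The oracle interface and the toy Markov-chain ground truth below are my own illustration, not from the lecture:

```python
# Minimal I-map construction: for each node in order, pick the smallest
# predecessor subset U that renders it independent of the remaining predecessors.
from itertools import combinations

def minimal_i_map(order, indep):
    """indep(x, others, given) -> True iff x is independent of `others`
    given `given` in the target distribution P (an oracle)."""
    parents = {}
    for i, x in enumerate(order):
        preds = order[:i]
        for size in range(len(preds) + 1):   # smallest subsets first
            found = None
            for u in combinations(preds, size):
                rest = [p for p in preds if p not in u]
                if indep(x, rest, set(u)):
                    found = set(u)
                    break
            if found is not None:
                parents[x] = found
                break
    return parents

# Toy oracle: ground-truth CIs of the chain A -> B -> C -> D, where x and r
# are independent given Z iff some node strictly between them is observed.
chain = ['A', 'B', 'C', 'D']

def chain_indep(x, rest, given):
    xi = chain.index(x)
    for r in rest:
        ri = chain.index(r)
        lo, hi = min(xi, ri), max(xi, ri)
        if not any(chain[j] in given for j in range(lo + 1, hi)):
            return False
    return True

print(minimal_i_map(chain, chain_indep))
# {'A': set(), 'B': {'A'}, 'C': {'B'}, 'D': {'C'}}: the chain is recovered
```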
Example of Finding I-Maps. Variables A, B, C, D, E; order = (A, B, E, C, D). What if a different order is chosen?
Summary. Bayesian Network = DAG + CPDs. The distribution factorizes according to the graph (chain-rule decomposition) iff it satisfies the local independencies of the graph iff it satisfies the global independencies of the graph (d-separation). D-separation precisely characterizes the independencies in the distribution (almost). Converts a high-dimensional real-valued space (distributions) to a discrete space (graphs). However, the graph may not be unique, and may not be able to capture the exact set of independencies.