STAT 598L Probabilistic Graphical Models
Instructor: Sergey Kirshner
Probability Review
Some slides are taken (or modified) from Carlos Guestrin's 10-708 Probabilistic Graphical Models, Fall 2008, at CMU.
Graphical Models
How to specify the model? What are the variables of interest? What are their values? How likely are their combinations?
Need to specify a joint probability distribution, but in a compact way: exploit local structure in the domain.
STAT 598L: Probabilistic Graphical Models (Probability Review)
Sample Space and Events
Ω: sample space, the set of possible results of an experiment. If you toss a coin twice, Ω = {HH, HT, TH, TT}.
Event: a subset of Ω. "First toss is head" = {HH, HT}.
S: event space, a set of events (σ-algebra):
Closed under finite (and countably infinite) unions and under complements.
Entails other binary operations: intersection, set difference, etc.
Contains the empty event and Ω.
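The two-toss example can be written out directly with Python sets. This is only an illustrative sketch; the variable names are not from the slides.

```python
from itertools import product

# Sample space for two coin tosses: every ordered pair of H/T.
omega = {"".join(t) for t in product("HT", repeat=2)}

# An event is a subset of the sample space.
first_toss_head = {w for w in omega if w[0] == "H"}
second_toss_head = {w for w in omega if w[1] == "H"}

# Closure under complement, union, and intersection is ordinary set algebra.
first_toss_tail = omega - first_toss_head
at_least_one_head = first_toss_head | second_toss_head
both_heads = first_toss_head & second_toss_head

print(sorted(omega))
print(sorted(first_toss_head))
```

The empty event is `set()` and the certain event is `omega` itself.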
Probability Measure
Defined over (Ω, S) such that:
P(α) ≥ 0 for all α in S
P(Ω) = 1
If α, β are disjoint, then P(α ∪ β) = P(α) + P(β)
We can deduce other properties from the above axioms:
P(∅) = 0
P(α ∪ β) = P(α) + P(β) - P(α ∩ β) (inclusion-exclusion principle)
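A small numerical check of these properties on the two-toss sample space. The uniform measure (a fair coin) is an assumption made here for illustration; the slides do not fix a particular P.

```python
from fractions import Fraction
from itertools import product

omega = frozenset("".join(t) for t in product("HT", repeat=2))

def prob(event):
    """Uniform measure (assumed fair coin): |event| / |omega|."""
    return Fraction(len(set(event) & omega), len(omega))

a = {"HH", "HT"}  # first toss is head
b = {"HH", "TH"}  # second toss is head

# Inclusion-exclusion: P(a ∪ b) = P(a) + P(b) - P(a ∩ b)
lhs = prob(a | b)
rhs = prob(a) + prob(b) - prob(a & b)
print(lhs, rhs)
```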
Visualization
We can go on to define conditional probability using the above visualization.
Conditional Probability
P(F | H) = fraction of worlds in which H is true that also have F true (provided P(H) > 0):
$P(F \mid H) = \frac{P(F \cap H)}{P(H)}$
[Figure: overlapping events F and H]
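The "fraction of worlds" reading can be sketched by counting outcomes. The example reuses the two-toss sample space and assumes equally likely outcomes, which is an assumption of this sketch rather than something the slide states.

```python
from fractions import Fraction

omega = {"HH", "HT", "TH", "TT"}

def prob(event):
    """Uniform measure over omega (assumed for illustration)."""
    return Fraction(len(event), len(omega))

def cond_prob(f, h):
    """P(F | H) = P(F ∩ H) / P(H), defined only when P(H) > 0."""
    if prob(h) == 0:
        raise ValueError("P(H) must be positive")
    return prob(f & h) / prob(h)

h = {"HH", "HT", "TH"}  # at least one head
f = {"HH"}              # both tosses are heads

print(cond_prob(f, h))
```

Among the three worlds where H holds, exactly one also has F, so the ratio is one third.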
Law of Total Probability
[Figure: Ω partitioned into regions $B_1, \dots, B_7$, with event A overlapping the partition]
Given a partition $\{B_1, \dots, B_n\}$ of Ω:
$P(A) = \sum_i P(B_i)\, P(A \mid B_i)$
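A quick check of the law on the two-toss example, partitioning Ω by the outcome of the first toss. The uniform measure and the choice of event A are assumptions made for this sketch.

```python
from fractions import Fraction

omega = {"HH", "HT", "TH", "TT"}

def prob(event):
    """Uniform measure over omega (assumed for illustration)."""
    return Fraction(len(event), len(omega))

def cond_prob(a, b):
    return prob(a & b) / prob(b)

# Partition of omega by the first toss.
partition = [{"HH", "HT"}, {"TH", "TT"}]
a = {"HH", "HT", "TH"}  # at least one head

# P(A) = sum_i P(B_i) P(A | B_i)
total = sum(prob(b) * cond_prob(a, b) for b in partition)
print(total, prob(a))
```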
Chain Rule
An equivalent chain exists for every ordering of the events.
Foundation of a framework for Bayesian networks.
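For reference, the standard statement of the chain rule for events, written in the α, β notation of the earlier slides. It is given here as the textbook form and may differ in layout from the slide's own equations.

```latex
P(\alpha \cap \beta) = P(\alpha)\, P(\beta \mid \alpha)

P(\alpha_1 \cap \dots \cap \alpha_k)
  = P(\alpha_1)\, P(\alpha_2 \mid \alpha_1) \cdots
    P(\alpha_k \mid \alpha_1 \cap \dots \cap \alpha_{k-1})
```

Reordering the events gives a different but equally valid factorization, for example $P(\alpha \cap \beta) = P(\beta)\, P(\alpha \mid \beta)$.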
Bayes Rule
Ingredients: the chain rule and the rule of total probability.
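In standard form, Bayes rule follows from applying the chain rule in both orders, with the denominator expanded by the rule of total probability. The partition $\{\alpha_i\}$ is notation introduced here for illustration.

```latex
% Chain rule, both orders:
P(\alpha \cap \beta) = P(\alpha)\, P(\beta \mid \alpha) = P(\beta)\, P(\alpha \mid \beta)

% Hence, for P(\beta) > 0:
P(\alpha \mid \beta) = \frac{P(\beta \mid \alpha)\, P(\alpha)}{P(\beta)}

% Rule of total probability for the denominator, with \{\alpha_i\} a partition of \Omega:
P(\beta) = \sum_i P(\beta \mid \alpha_i)\, P(\alpha_i)
```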
From Events to Random Variables
Almost all semester we will be dealing with random variables (r.v.s): a concise way of specifying attributes of outcomes.
Modeling students (Grade and Intelligence):
Ω = all possible students
What are the events?
Grade_A = all students with grade A
Grade_B = all students with grade B
Intelligence_High = all students with high intelligence
We need functions that map from Ω to an attribute space.
Random Variables
[Figure: Ω mapped by I (Intelligence) to {high, low} and by G (Grade) to {A, B, C}]
Val(I) = {high, low}
Val(G) = {A, B, C}
Random Variables
[Figure: Ω mapped by I (Intelligence) to {high, low} and by G (Grade) to {A, B, C}]
P(I = high) = P({all students whose intelligence is high})
$\{I^{-1}(a) : a \in \mathrm{Val}(I)\}$ is a partition of Ω
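A random variable is just a function on Ω, and its preimages split Ω into disjoint blocks. The sketch below uses a made-up population of five students with equally likely outcomes; the student ids and attributes are hypothetical, not data from the course.

```python
# Hypothetical toy population: ids and attributes are made up for illustration.
students = {
    "s1": ("high", "A"),
    "s2": ("high", "B"),
    "s3": ("low", "B"),
    "s4": ("low", "C"),
    "s5": ("low", "C"),
}
omega = set(students)

def intelligence(w):
    """Random variable I: maps an outcome (a student) to a value in Val(I)."""
    return students[w][0]

def preimage(rv, value):
    """All outcomes that the random variable maps to `value`."""
    return {w for w in omega if rv(w) == value}

val_i = {"high", "low"}
blocks = {a: preimage(intelligence, a) for a in val_i}

# The preimages form a partition: they cover omega and do not overlap.
covers_omega = set().union(*blocks.values()) == omega
disjoint = sum(len(b) for b in blocks.values()) == len(omega)

# P(I = high) is the probability of the event I^{-1}(high); uniform measure assumed.
p_high = len(blocks["high"]) / len(omega)
print(blocks, p_high)
```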
Joint Probability Distribution
Random variables encode attributes.
Not all possible combinations of attributes are equally likely.
Joint probability distributions quantify this:
P(X = x, Y = y) = P(x, y)
How probable is it to observe these two attributes together?
Joint Probability Distribution
P(X = x, Y = y) = P(x, y)

P(I, G)    G = A   G = B   G = C
I = low    0.1     0.2     0.4
I = high   0.15    0.1     0.05

How probable is it to observe these two attributes together?
Generalizes to n random variables.
How can we manipulate joint probability distributions?
Conditional Probability
As with events:
$P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)}$, or in shorthand $P(x \mid y) = \frac{P(x, y)}{P(y)}$
[Figure: overlapping events F and H]
$P(y)$ can be computed by marginalization.
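As an illustration, the conditional distribution of I given a grade can be read off the P(I, G) table from the joint-distribution slide. The dictionary layout and function names are choices made for this sketch.

```python
from fractions import Fraction

# Joint table P(I, G) from the slides, stored exactly as fractions.
joint = {
    ("low", "A"): Fraction("0.1"),
    ("low", "B"): Fraction("0.2"),
    ("low", "C"): Fraction("0.4"),
    ("high", "A"): Fraction("0.15"),
    ("high", "B"): Fraction("0.1"),
    ("high", "C"): Fraction("0.05"),
}

def p_grade(g):
    """P(G = g), obtained by summing the joint over I."""
    return sum(p for (_, g2), p in joint.items() if g2 == g)

def p_intel_given_grade(i, g):
    """P(I = i | G = g) = P(i, g) / P(g)."""
    denom = p_grade(g)
    if denom == 0:
        raise ValueError("conditioning event has probability zero")
    return joint[(i, g)] / denom

cond_given_c = {i: p_intel_given_grade(i, "C") for i in ("low", "high")}
print(cond_given_c)
```

Each conditional distribution sums to one, since dividing by P(g) renormalizes the corresponding column of the table.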
Marginalization
We know P(X, Y); what is P(X = x)?
We can use the Law of Total Probability:
$P(x) = \sum_y P(x, y) = \sum_y P(y)\, P(x \mid y)$
[Figure: Ω partitioned into regions $B_1, \dots, B_7$, with event A overlapping the partition]
Marginalization
We know P(X, Y); what is P(X = x)?
$P(x) = \sum_y P(x, y)$

P(I, G)    G = A   G = B   G = C   P(I)
I = low    0.1     0.2     0.4     0.7
I = high   0.15    0.1     0.05    0.3
P(G)       0.25    0.3     0.45

marginals = written in the margins
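The row and column sums of the table can be reproduced in a few lines. This is a minimal sketch; the dictionary representation is an illustrative choice.

```python
from collections import defaultdict
from fractions import Fraction

# Joint table P(I, G) from the slide, stored exactly as fractions.
joint = {
    ("low", "A"): Fraction("0.1"),
    ("low", "B"): Fraction("0.2"),
    ("low", "C"): Fraction("0.4"),
    ("high", "A"): Fraction("0.15"),
    ("high", "B"): Fraction("0.1"),
    ("high", "C"): Fraction("0.05"),
}

marg_i = defaultdict(Fraction)  # P(I): sum over grades (row sums)
marg_g = defaultdict(Fraction)  # P(G): sum over intelligence (column sums)
for (i, g), p in joint.items():
    marg_i[i] += p
    marg_g[g] += p

print({k: float(v) for k, v in marg_i.items()})
print({k: float(v) for k, v in marg_g.items()})
```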
Marginalization
More variables?
$P(x) = \sum_{y,z} P(x, y, z) = \sum_{y,z} P(y, z)\, P(x \mid y, z)$
Marginalizing (naively) over n - 1 variables: O(exp(n - 1))
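The exponential cost shows up as soon as the sum is written as a loop. The sketch below assumes binary variables and uses a hypothetical uniform joint purely so that the result is easy to verify; `naive_marginal` and `uniform_joint` are illustrative names.

```python
from itertools import product

def naive_marginal(joint_fn, n, x):
    """P(X1 = x) by summing the joint over every assignment
    to the other n - 1 binary variables."""
    total = 0.0
    terms = 0
    for rest in product((0, 1), repeat=n - 1):
        total += joint_fn((x,) + rest)
        terms += 1
    return total, terms

def uniform_joint(assignment):
    """Hypothetical joint: n independent fair bits, so every assignment has mass 2^-n."""
    return 0.5 ** len(assignment)

p, terms = naive_marginal(uniform_joint, 10, 1)
print(p, terms)
```

With n = 10 the loop already visits 2^9 = 512 terms, and each additional variable doubles that count.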
Chain Rule
Same as in the event case.
Conditional probabilities can be used to specify the joint distribution (instead of the joint probability tables).
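Written for random variables, the standard form of the chain rule is as follows. The variable names $X_1, \dots, X_n$ are generic notation assumed here, and the second line applies the rule to the I, G example in both orders.

```latex
P(X_1, \dots, X_n) = P(X_1)\, P(X_2 \mid X_1) \cdots P(X_n \mid X_1, \dots, X_{n-1})

P(I, G) = P(I)\, P(G \mid I) = P(G)\, P(I \mid G)
```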
Bayes Rule
We know that P(I = high) = 0.3.
If we also know that the student's math grade is A, how does this affect our belief about the student's intelligence?
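One way to work this question through, assuming the math grade is the variable G in the P(I, G) table from the joint-distribution slide. That link is an assumption of this sketch; the slide itself gives only the prior.

```python
from fractions import Fraction

# Joint table P(I, G) from the earlier slide (assumed to describe the math grade).
joint = {
    ("low", "A"): Fraction("0.1"),
    ("low", "B"): Fraction("0.2"),
    ("low", "C"): Fraction("0.4"),
    ("high", "A"): Fraction("0.15"),
    ("high", "B"): Fraction("0.1"),
    ("high", "C"): Fraction("0.05"),
}

def p_i(i):
    """Prior P(I = i)."""
    return sum(p for (i2, _), p in joint.items() if i2 == i)

def p_g_given_i(g, i):
    """Likelihood P(G = g | I = i)."""
    return joint[(i, g)] / p_i(i)

prior = p_i("high")
likelihood = p_g_given_i("A", "high")
# Evidence P(G = A) via the rule of total probability.
evidence = sum(p_g_given_i("A", i) * p_i(i) for i in ("low", "high"))
posterior = likelihood * prior / evidence
print(float(prior), float(posterior))
```

Under this table, observing an A raises the belief in high intelligence from 0.3 to 0.6.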
Bayes Rule
What if another grade for the same student (in English) becomes available?
The updates can be done in any order.
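The order-independence can be seen by conditioning on the two grades one at a time. The symbols $G_m$ (math grade) and $G_e$ (English grade) are introduced here for illustration and are not the slide's notation.

```latex
% Update on the math grade first, then on the English grade:
P(I \mid G_m, G_e) = \frac{P(G_e \mid I, G_m)\, P(I \mid G_m)}{P(G_e \mid G_m)}

% Update on the English grade first, then on the math grade:
P(I \mid G_m, G_e) = \frac{P(G_m \mid I, G_e)\, P(I \mid G_e)}{P(G_m \mid G_e)}
```

Both expressions equal $P(I, G_m, G_e) / P(G_m, G_e)$, so the final posterior does not depend on which grade arrives first.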
Joint Probability
Assume each variable is binary.
Need $2^n - 1$ free parameters for n variables.
How to interpret, store, and estimate these for even moderate n?
Need to reduce the number of free parameters.
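A short calculation shows how quickly the table grows. For contrast, the sketch also counts parameters when all n binary variables are mutually independent (one number per variable); that comparison is added here for illustration.

```python
def free_params_full_joint(n):
    """A full joint over n binary variables has 2^n entries that must sum to 1."""
    return 2 ** n - 1

def free_params_fully_independent(n):
    """If all n binary variables are independent, one parameter per variable suffices."""
    return n

for n in (3, 10, 20, 30):
    print(n, free_params_full_joint(n), free_params_fully_independent(n))
```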
Independence
X is independent of Y means that knowing Y does not change our belief about X:
$P(x \mid y) = P(x)$ (for $P(y) > 0$)
Alternatively, $P(x, y) = P(x)\, P(y)$
The above should hold for all x, y.
It is symmetric and written as $X \perp Y$.
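The product form gives a direct test. As a sketch, the check below is applied to the P(I, G) table from the joint-distribution slide and, for comparison, to a table built as the product of its marginals; `is_independent` is an illustrative helper name.

```python
from fractions import Fraction

def is_independent(joint):
    """True if P(x, y) = P(x) P(y) for every pair of values."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0) + p
        p_y[y] = p_y.get(y, 0) + p
    return all(joint.get((x, y), 0) == p_x[x] * p_y[y] for x in p_x for y in p_y)

# Joint table P(I, G) from the slides.
joint_ig = {
    ("low", "A"): Fraction("0.1"),
    ("low", "B"): Fraction("0.2"),
    ("low", "C"): Fraction("0.4"),
    ("high", "A"): Fraction("0.15"),
    ("high", "B"): Fraction("0.1"),
    ("high", "C"): Fraction("0.05"),
}

# A table with the same marginals, built as their product (independent by construction).
p_i = {"low": Fraction("0.7"), "high": Fraction("0.3")}
p_g = {"A": Fraction("0.25"), "B": Fraction("0.3"), "C": Fraction("0.45")}
product_table = {(i, g): p_i[i] * p_g[g] for i in p_i for g in p_g}

print(is_independent(joint_ig), is_independent(product_table))
```

In the slide's table, P(low, A) = 0.1 while P(low) P(A) = 0.175, so I and G are not independent.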
CI: Conditional Independence
R.v.s are rarely independent, but we can still leverage local structural properties like CI.
$X \perp Y \mid Z$ if, once Z is observed, knowing the value of Y does not change our belief about X.
The following should hold for all values of X, Y, and Z:
$P(x \mid z, y) = P(x \mid z)$
$P(Y = y \mid Z = z, X = x) = P(Y = y \mid Z = z)$
$P(X = x, Y = y \mid Z = z) = P(X = x \mid Z = z)\, P(Y = y \mid Z = z)$ (factors)
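A small constructed example may help: three binary variables where X and Y are conditionally independent given Z but not marginally independent. All numbers below are hypothetical and chosen only for illustration.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical parameters: P(Z), P(X = 1 | Z), P(Y = 1 | Z).
p_z = {0: F(1, 2), 1: F(1, 2)}
p_x1_given_z = {0: F(1, 5), 1: F(4, 5)}
p_y1_given_z = {0: F(1, 4), 1: F(3, 4)}

def bern(p1, v):
    """Probability of value v for a binary variable with P(1) = p1."""
    return p1 if v == 1 else 1 - p1

# Joint built as P(z) P(x | z) P(y | z), keyed by (x, y, z).
joint = {
    (x, y, z): p_z[z] * bern(p_x1_given_z[z], x) * bern(p_y1_given_z[z], y)
    for x, y, z in product((0, 1), repeat=3)
}

def marginal(keep):
    """Sum the joint down to the variable positions listed in `keep`."""
    out = {}
    for assignment, p in joint.items():
        key = tuple(assignment[k] for k in keep)
        out[key] = out.get(key, 0) + p
    return out

p_xz, p_yz, p_zz = marginal((0, 2)), marginal((1, 2)), marginal((2,))
p_xy, p_x, p_y = marginal((0, 1)), marginal((0,)), marginal((1,))

# P(x, y | z) = P(x | z) P(y | z)  is equivalent to  P(x, y, z) P(z) = P(x, z) P(y, z).
cond_indep = all(
    joint[(x, y, z)] * p_zz[(z,)] == p_xz[(x, z)] * p_yz[(y, z)]
    for x, y, z in joint
)
marg_indep = all(p_xy[(x, y)] == p_x[(x,)] * p_y[(y,)] for x, y in p_xy)
print(cond_indep, marg_indep)
```

Here Z acts as a common cause: once it is observed, Y carries no further information about X, even though X and Y are correlated when Z is unknown.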
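Characterization of Conditional Independence
Symmetry: $(X \perp Y \mid Z) \Rightarrow (Y \perp X \mid Z)$
Decomposition: $(X \perp Y, W \mid Z) \Rightarrow (X \perp Y \mid Z) \,\&\, (X \perp W \mid Z)$
Weak union: $(X \perp Y, W \mid Z) \Rightarrow (X \perp Y \mid Z, W)$
Contraction: $(X \perp W \mid Z, Y) \,\&\, (X \perp Y \mid Z) \Rightarrow (X \perp Y, W \mid Z)$
Intersection: $(X \perp Y \mid Z, W) \,\&\, (X \perp W \mid Z, Y) \Rightarrow (X \perp Y, W \mid Z)$
Adapted from [Pearl 88]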