Structure Learning in Bayesian Networks (mostly Chow-Liu)

Size: px

Start display at page:

Download "Structure Learning in Bayesian Networks (mostly Chow-Liu)"

Marcus Glenn
6 years ago
Views:

1 Structure Learning in Bayesian Networks (mostly Chow-Liu) Sue Ann Hong /5/007 Chow-Liu Give directions to edges in MST

2 Chow-Liu: how-to e.g. () & () I(X, X ) = x =0 val = 0, I(X i, X j ) = x i,x j ˆP (xi, x j ) log ˆP (x i,x j ) ˆP (x i ) ˆP (x j ) ˆP (x i, x j ) = Count(x i,x j ) m empirical distribution # examples.5 val = 0,, x =0 ˆP (X = x, X = x ) log ˆP (X =x,x =x ) ˆP (X =x ) ˆP (X =x ) e.g. ˆP (X = 0, X = ) = Count(X =0,X =) m Chow-Liu: how-to must reach all nodes tree with the greatest total weight.5 greedily add edges, just make sure it s a tree at every step e.g. Kruskal, Prim

3 Chow-Liu: how-to tree with the greatest total weight greedily add edges, just make sure it s a tree at every step e.g. Kruskal, Prim Chow-Liu: how-to tree with the greatest total weight greedily add edges, just make sure it s a tree at every step e.g. Kruskal, Prim

4 Chow-Liu: how-to what if there were negative edges? the algorithm still works but do you want them? *hint hint* tree with the greatest total weight greedily add edges, just make sure it s a tree at every step e.g. Kruskal, Prim Chow-Liu: how-to Give directions to edges in MST pick your favorite node (e.g. sinus??) draw arrows going away from it (e.g. BFS, DFS)

5 Chow-Liu: why it works Give directions to edges in MST Just two questions:. why can we represent data likelihood as sum of I(Xi,Xj) over edges?. why can we pick any direction for edges in the tree?* *as long as it s a tree. why can we represent data likelihood as sum of I(Xi,Xj) over edges?. why can we pick any direction for edges in the tree? data likelihood given (directed) edges logp (D G, θ G ) = m n j= i= logp (x i pa Xi ) information theoretic quantity logp (D G, θ G ) = m n i= (I(X i, P a Xi ) H(X i )) max only part that matters n i= I(X i, P a Xi ) tree! (Pa_Xi = just one other node) => I(Xi,Pa_Xi) = I(Xi,Xj) directed edges? nah. I(Xi,Xj) = I(Xj,Xi) I(X i, X j ) = x i,x j ˆP (xi, x j ) log ˆP (x i,x j ) ˆP (x i ) ˆP (x j )

6 . why can we represent data likelihood as sum of I(Xi,Xj) over edges?. why can we pick any direction for edges in the tree? data likelihood given (directed) edges logp (D G, θ G ) = m n j= i= logp (x i pa Xi ) information theoretic quantity logp (D G, θ G ) = m n i= (I(X i, P a Xi ) H(X i )) max only part that matters n i= I(X i, P a Xi ) tree! (Pa_Xi = just one other node) => I(Xi,Pa_Xi) = I(Xi,Xj) directed edges? nah. I(Xi,Xj) = I(Xj,Xi) so directions don t matter I(X i, X j ) = x i,x j ˆP (xi, x j ) log ˆP (x i,x j ) ˆP (x i ) ˆP (x j ) as long as no v-structures break TAN::Tree-Augmented Naive Bayes NB + Chow-Liu Same old Chow-Liu on features, but with I(Xi,Xj c) instead of I(Xi,Xj) Then learn P(Xi Pa(Xi), c) as before Remember this algorithm for the future c X X X X

7 the usual difficulties In general, NP-hard to learn structure with #parents > try e.g. BIC score: approximation of Bayesian score regularization maximizing still NP-hard Trees - easy to learn: one parent - no v-structures can do this greedy search with completely uncoupled scores Announcing no recitation next week - happy thanksgiving! better sleep...!

Structure Learning: the good, the bad, the ugly

Readings: K&F: 15.1, 15.2, 15.3, 15.4, 15.5 Structure Learning: the good, the bad, the ugly Graphical Models 10708 Carlos Guestrin Carnegie Mellon University September 29 th, 2006 1 Understanding the uniform