Statistical Learning


1 Statistical Learning
Lecture 5: Bayesian Networks and Graphical Models
Mário A. T. Figueiredo
Instituto Superior Técnico & Instituto de Telecomunicações, University of Lisbon, Portugal
May 2018

2 Bayesian Networks and Graphical Models
Bayes nets in a nutshell:
- Structured probability (density/mass) functions $f_X(x; \theta)$
- Provide a graph-based language/grammar to express conditional independence
- Allow formalizing the problem of inferring a subset of the components of $X$ from another subset thereof
- Allow formalizing the problem of learning $\theta$ from observed i.i.d. realizations of $X$: $x_1, \ldots, x_n$
Bayes nets are one type of graphical model, based on directed graphs. Other types (more on them later):
- Markov random fields (MRF), based on undirected graphs
- Factor graphs

3 Bayesian Networks: Introduction
Notation: we use the more compact $p(x)$ notation instead of the more correct $f_X(x)$.
For $X \in \mathbb{R}^n$, the pdf/pmf $p(x)$ can be factored by Bayes law:
  $p(x) = p(x_1 \mid x_2, \ldots, x_n)\, p(x_2, \ldots, x_n)$
  $\phantom{p(x)} = p(x_1 \mid x_2, \ldots, x_n)\, p(x_2 \mid x_3, \ldots, x_n)\, p(x_3, \ldots, x_n)$
  $\phantom{p(x)} = \cdots = p(x_1 \mid x_2, \ldots, x_n) \cdots p(x_{n-1} \mid x_n)\, p(x_n)$
Of course, this can be done in $n!$ different ways; e.g.,
  $p(x) = p(x_n \mid x_{n-1}, \ldots, x_1)\, p(x_{n-1}, \ldots, x_1)$
  $\phantom{p(x)} = p(x_n \mid x_{n-1}, \ldots, x_1)\, p(x_{n-1} \mid x_{n-2}, \ldots, x_1)\, p(x_{n-2}, \ldots, x_1)$
  $\phantom{p(x)} = \cdots = p(x_n \mid x_{n-1}, \ldots, x_1) \cdots p(x_2 \mid x_1)\, p(x_1)$

4 Bayesian Networks: Introduction
For $X \in \mathbb{R}^n$, the pdf/pmf $p(x)$ can be factored by Bayes law:
  $p(x) = p(x_n \mid x_{n-1}, \ldots, x_1) \cdots p(x_2 \mid x_1)\, p(x_1)$
In general, this is not more compact than $p(x)$.
Example: if $x_i \in \{1, \ldots, K\}$, a general $p(x)$ has $K^n - 1 \approx K^n$ parameters.
But $p(x_n \mid x_{n-1}, \ldots, x_1)$ alone has $(K-1)K^{n-1} \approx K^n$ parameters!
...unless there are some conditional independencies.
Example: $X$ is a Markov chain: $p(x_i \mid x_{i-1}, \ldots, x_1) = p(x_i \mid x_{i-1})$; in this case,
  $p(x) = p(x_n \mid x_{n-1})\, p(x_{n-1} \mid x_{n-2}) \cdots p(x_2 \mid x_1)\, p(x_1)$
has $(n-1)\,K(K-1) + (K-1) \approx nK^2$ parameters: linear in $n$, rather than exponential!
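To make the counting concrete, here is a minimal sketch (the values of $n$ and $K$ are arbitrary illustrative choices) comparing the two parameter counts:

```python
# Illustrative sketch: number of free parameters for a full joint table
# versus a first-order Markov chain, following the counting argument above.
def full_table_params(n, K):
    # joint table over K^n configurations, minus 1 for normalization
    return K**n - 1

def markov_chain_params(n, K):
    # p(x_1): K-1 free values; each p(x_i | x_{i-1}), i = 2..n: K*(K-1) free values
    return (K - 1) + (n - 1) * K * (K - 1)

if __name__ == "__main__":
    n, K = 10, 4
    print(full_table_params(n, K))    # 1048575
    print(markov_chain_params(n, K))  # 111
```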

5 Conditional Independence
Bayes nets are built on conditional independence.
Random variables $X$ and $Y$ are conditionally independent, given $Z$, if
  $f_{X,Y \mid Z}(x, y \mid z) = f_{X \mid Z}(x \mid z)\, f_{Y \mid Z}(y \mid z)$
Naturally, $X$, $Y$, and $Z$ can be groups of random variables.
Notation: $X \perp Y \mid Z$ or $X \perp\!\!\!\perp Y \mid Z$
Equivalent relationship (in short notation):
  $p(x \mid y, z) = \dfrac{p(x, y \mid z)}{p(y \mid z)} = \dfrac{p(x \mid z)\, p(y \mid z)}{p(y \mid z)} = p(x \mid z)$
Factorization: if $X = (X_1, X_2, X_3)$ and $X_1 \perp X_3 \mid X_2$, then
  $p(x) = p(x_3 \mid x_2, x_1)\, p(x_2 \mid x_1)\, p(x_1) = p(x_3 \mid x_2)\, p(x_2 \mid x_1)\, p(x_1)$

6 Graphical Models
Graph-based representations of the joint pdf/pmf $p(x)$.
Each node $i$ represents a random variable $X_i$.
The conditional independence properties are encoded by the presence/absence of edges in the graph.
Example: $X_1 \perp X_3 \mid X_2$ is represented by one of the following factorizations (each corresponding to a different DAG over the same three nodes):
  $p(x) = p(x_3 \mid x_2)\, p(x_2 \mid x_1)\, p(x_1)$
  $p(x) = p(x_3 \mid x_2)\, p(x_1 \mid x_2)\, p(x_2) = p(x_3 \mid x_2)\, p(x_2 \mid x_1)\, p(x_1)$
  $p(x) = p(x_1 \mid x_2)\, p(x_2 \mid x_3)\, p(x_3) = p(x_1 \mid x_2)\, p(x_3 \mid x_2)\, p(x_2) = p(x_3 \mid x_2)\, p(x_2 \mid x_1)\, p(x_1)$

7 (Directed) Graph Concepts
Directed graph $G = (V, E)$, where
- set of nodes or vertices $V = \{1, \ldots, V\}$
- set of edges $E \subseteq V \times V$; i.e., each element of $E$ has the form $(s, t)$, with $s, t \in V$
- in this context, we assume $(v, v) \notin E$, $\forall v \in V$
Note: in an undirected graph, each edge is a 2-element multiset; i.e., it has the form $\{u, v\}$, where $u, v \in V$.
Running example (from the figure): $V = \{1, \ldots, 5\}$, $E = \{(1,2), (1,3), (2,4), (3,4), (3,5)\}$.
Parents of a node: $\mathrm{pa}(v) = \{s \in V : (s, v) \in E\}$. Example: $\mathrm{pa}(4) = \{2, 3\}$.
Children of a node: $\mathrm{ch}(v) = \{t \in V : (v, t) \in E\}$. Example: $\mathrm{ch}(3) = \{4, 5\}$.
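As a quick illustration, a minimal Python sketch of these definitions on the running example graph (the edge set below is the one reconstructed above from the figure):

```python
# Parents and children of a node, given a directed edge set.
edges = {(1, 2), (1, 3), (2, 4), (3, 4), (3, 5)}

def pa(v, E=edges):
    """Parents of v: all s with an edge s -> v."""
    return {s for (s, t) in E if t == v}

def ch(v, E=edges):
    """Children of v: all t with an edge v -> t."""
    return {t for (s, t) in E if s == v}

print(pa(4))  # {2, 3}
print(ch(3))  # {4, 5}
```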

8 (Directed) Graph Concepts (cont.)
Root: a node $v$ s.t. $\mathrm{pa}(v) = \emptyset$. Example: 1 is a root.
Leaf: a node $v$ s.t. $\mathrm{ch}(v) = \emptyset$. Example: 4 and 5 are leaves.
Reachability: node $t$ is reachable from $s$ if there is a sequence of edges $\big((v_{s_1}, v_{t_1}), \ldots, (v_{s_n}, v_{t_n})\big)$ s.t. $v_{s_1} = s$, $v_{s_i} = v_{t_{i-1}}$ for $i = 2, \ldots, n$, and $v_{t_n} = t$.
Ex: 5 is reachable from 1; 2 is not reachable from 3, 4, or 5.
Ancestors of a node: $\mathrm{anc}(v) = \{s : v \text{ is reachable from } s\}$. Examples: $\mathrm{anc}(5) = \{1, 3\}$; $\mathrm{anc}(4) = \{1, 2, 3\}$; $\mathrm{anc}(1) = \emptyset$.
Descendants of a node: $\mathrm{desc}(v) = \{t : t \text{ is reachable from } v\}$. Examples: $\mathrm{desc}(1) = \{2, 3, 4, 5\}$; $\mathrm{desc}(2) = \{4\}$; $\mathrm{desc}(5) = \emptyset$.

9 (Directed) Graph Concepts (cont.)
Neighborhood of a node: $\mathrm{nbr}(v) = \{u : (u, v) \in E \vee (v, u) \in E\}$. Example: $\mathrm{nbr}(3) = \{1, 4, 5\}$.
In-degree of node $v$ is the cardinality of $\mathrm{pa}(v)$. Example: the in-degree of 4 is 2.
Out-degree of node $v$ is the cardinality of $\mathrm{ch}(v)$. Example: the out-degree of 3 is 2.
Cycle (or loop): $(v_1, v_2, \ldots, v_n)$ such that $v_1 = v_n$ and $(v_i, v_{i+1}) \in E$. Example: the graph shown above has no loops/cycles.
Directed acyclic graph (DAG): a directed graph with no loops/cycles.
Directed tree: a DAG where each node has 1 or 0 parents.
Subgraph of $G = (V, E)$ induced by a subset of nodes $S \subseteq V$: $G_S = (S, E_S)$, where $E_S = \{(u, v) \in E : u, v \in S\}$. Example: $G_{\{1,3,5\}} = (\{1, 3, 5\}, \{(1, 3), (3, 5)\})$.

10 Directed Graphical Models (DGM)
DGM, a.k.a. Bayesian networks, belief networks, causal networks.
Consider $X = (X_1, \ldots, X_V)$ with pdf/pmf $p(x) = p(x_1, \ldots, x_V)$.
Consider a graph $G = (V, E)$, where $V = \{1, \ldots, V\}$.
$G$ is a DGM for $X$ if (with $x_S = \{x_v,\ v \in S\}$)
  $p(x) = \prod_{v=1}^{V} p(x_v \mid x_{\mathrm{pa}(v)})$,
where each factor $p(x_v \mid x_{\mathrm{pa}(v)})$ is a conditional probability distribution (CPD).
Example (for the graph of slide 7): $p(x) = p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_1)\, p(x_4 \mid x_2, x_3)\, p(x_5 \mid x_3)$
The DGM is not unique (example in slide 6).
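To make the factorization concrete, a minimal sketch that evaluates this joint for the five-node example with binary variables; the CPD tables are arbitrary illustrative numbers, not from the slides:

```python
# Evaluate p(x) = p(x1) p(x2|x1) p(x3|x1) p(x4|x2,x3) p(x5|x3) for binary variables.
import numpy as np

p1 = np.array([0.6, 0.4])                              # p(x1)
p2_1 = np.array([[0.7, 0.3], [0.2, 0.8]])              # p(x2 | x1), rows indexed by x1
p3_1 = np.array([[0.5, 0.5], [0.1, 0.9]])              # p(x3 | x1)
p4_23 = np.random.default_rng(0).dirichlet(np.ones(2), size=(2, 2))  # p(x4 | x2, x3)
p5_3 = np.array([[0.9, 0.1], [0.4, 0.6]])              # p(x5 | x3)

def joint(x1, x2, x3, x4, x5):
    """Product of the CPDs: one factor per node, conditioned on its parents."""
    return (p1[x1] * p2_1[x1, x2] * p3_1[x1, x3]
            * p4_23[x2, x3, x4] * p5_3[x3, x5])

# sanity check: the factorization defines a valid joint (sums to 1)
total = sum(joint(a, b, c, d, e)
            for a in (0, 1) for b in (0, 1) for c in (0, 1)
            for d in (0, 1) for e in (0, 1))
print(round(total, 10))  # 1.0
```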

11 Directed Graphical Models: Examples
Naïve Bayes classification (generative model): class variable $Y \in \{1, \ldots, K\}$, with prior $p(y)$; class-conditional pdf/pmf $p(x \mid y)$:
  $p(y, x) = p(x \mid y)\, p(y) = p(y) \prod_{v=1}^{V} p(x_v \mid y)$
(graph: $Y$ is the single parent of $X_1, X_2, X_3, X_4$)
Tree-augmented naïve Bayes (TAN) classification: class variable $Y \in \{1, \ldots, K\}$, with prior $p(y)$; class-conditional pdf/pmf $p(x \mid y)$:
  $p(y, x) = p(y) \prod_{v=1}^{V} p(x_v \mid x_{\mathrm{pa}(v)}, y)$,
where the DAG over $X_1, \ldots, X_V$ is a tree.

12 Directed Graphical Models: More Examples
First-order Markov chain:
  $p(x) = p(x_1)\, p(x_2 \mid x_1) \cdots p(x_V \mid x_{V-1})$
Second-order Markov chain:
  $p(x) = p(x_1, x_2)\, p(x_3 \mid x_2, x_1) \cdots p(x_V \mid x_{V-1}, x_{V-2})$
Hidden Markov model (HMM): $(Z, X) = (Z_1, \ldots, Z_T, X_1, \ldots, X_T)$
  $p(z, x) = p(z_1)\, p(x_1 \mid z_1) \prod_{v=2}^{T} p(x_v \mid z_v)\, p(z_v \mid z_{v-1})$
(graphs: chains $x_1 \to x_2 \to x_3 \to \cdots$; hidden chain $z_1 \to z_2 \to \cdots \to z_T$ with emissions $z_t \to x_t$)

13 Inference in Directed Graphical Models
Visible and hidden variables: $x = (x_v, x_h)$; joint pdf/pmf $p(x_v, x_h \mid \theta)$.
Inferring the hidden variables from the visible ones:
  $p(x_h \mid x_v, \theta) = \dfrac{p(x_h, x_v \mid \theta)}{p(x_v \mid \theta)} = \dfrac{p(x_h, x_v \mid \theta)}{\sum_{x_h} p(x_h, x_v \mid \theta)}$
...with an integral instead of a sum, if $x_h$ has real components.
Sometimes, only a subset $x_q$ of $x_h$ is of interest, $x_h = (x_q, x_n)$:
  $p(x_q \mid x_v, \theta) = \sum_{x_n} p(x_q, x_n \mid x_v, \theta)$
...$x_n$ are sometimes called nuisance variables.

14 Learning in Directed Graphical Models
Observe samples $x_1, \ldots, x_N$ of $N$ i.i.d. copies of $X \sim p(x \mid \theta)$.
MAP estimate of $\theta$:
  $\hat{\theta} = \arg\max_{\theta} \Big( \log p(\theta) + \sum_{i=1}^{N} \log p(x_i \mid \theta) \Big)$
Plate notation: a compact graphical notation for a collection of i.i.d. copies of some variable (a plate containing $X_i$, $i = 1, \ldots, N$, all with the common parent $\theta$).
If there are hidden variables, $x$ should be understood as denoting only the visible ones.

15 Learning in Directed Graphical Models (cont.)
For $N$ i.i.d. observations $\mathcal{D} = (x_1, \ldots, x_N)$,
  $p(\mathcal{D} \mid \theta) = \prod_{i=1}^{N} p(x_i \mid \theta) = \prod_{i=1}^{N} \prod_{v=1}^{V} p(x_{iv} \mid x_{i,\mathrm{pa}(v)}, \theta) = \prod_{v=1}^{V} p(\mathcal{D}_v \mid \theta_v)$,
where $\mathcal{D}_v$ is the data associated with node $v$ and $\theta_v$ the corresponding parameters.
If the prior factorizes, $p(\theta) = \prod_v p(\theta_v)$, the posterior also factorizes:
  $p(\theta \mid \mathcal{D}) \propto p(\theta)\, p(\mathcal{D} \mid \theta) = \prod_{v=1}^{V} p(\theta_v)\, p(\mathcal{D}_v \mid \theta_v) \propto \prod_{v=1}^{V} p(\theta_v \mid \mathcal{D}_v)$

16 Learning in Directed Graphical Models: Categorical CPD
Each $x_v \in \{1, \ldots, K_v\}$.
The number of configurations of $x_{\mathrm{pa}(v)}$ is $C_v = \prod_{s \in \mathrm{pa}(v)} K_s$.
Abusing notation, write $x_{\mathrm{pa}(v)} = c$, for $c \in \{1, \ldots, C_v\}$, to denote that $x_{\mathrm{pa}(v)}$ takes the $c$-th configuration.
Parameters: $\theta_{vck} = P(x_v = k \mid x_{\mathrm{pa}(v)} = c)$ (of course $\sum_{k=1}^{K_v} \theta_{vck} = 1$). Denote $\theta_{vc} = (\theta_{vc1}, \ldots, \theta_{vcK_v})$.
Counts: $N_{vck} = \sum_{i=1}^{N} 1(x_{iv} = k,\ x_{i,\mathrm{pa}(v)} = c)$
Maximum likelihood estimates: $\hat{\theta}_{vck} = \dfrac{N_{vck}}{\sum_{j=1}^{K_v} N_{vcj}}$
MAP estimate w/ Dirichlet($\alpha_{vc}$) prior: $\hat{\theta}_{vck} = \dfrac{N_{vck} + \alpha_{vck}}{\sum_{j=1}^{K_v} (N_{vcj} + \alpha_{vcj})}$
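A minimal sketch of these two estimators for a single node $v$ and a single parent configuration $c$; the counts and the Dirichlet parameters are illustrative values:

```python
# ML and Dirichlet-smoothed estimates from the count vector N_vc = (N_vc1, ..., N_vcKv).
import numpy as np

N_vc = np.array([12, 3, 0, 5])            # counts N_vck, k = 1..K_v
alpha_vc = np.ones_like(N_vc)             # e.g., a uniform Dirichlet(1,...,1) prior

theta_ml = N_vc / N_vc.sum()                              # N_vck / sum_j N_vcj
theta_map = (N_vc + alpha_vc) / (N_vc + alpha_vc).sum()   # (N_vck + alpha_vck) / sum_j (N_vcj + alpha_vcj)

print(theta_ml)   # [0.6  0.15 0.   0.25]
print(theta_map)  # avoids the zero estimate for the unseen value k = 3
```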

17 Conditional Independence Properties
Consider $X = (X_1, \ldots, X_V) \sim p(x)$, and $V = \{1, \ldots, V\}$.
Let $x_A \perp x_B \mid x_C$ be a true conditional independence (CI) statement about $p(x)$, where $A, B, C$ are disjoint subsets of $V$.
Let $I(p)$ be the set of all (true) CI statements about $p(x)$.
Graph $G = (V, E)$: $G$ expresses a collection $I(G)$ of CI statements $x_A \perp_G x_B \mid x_C$ (explained below).
$G$ is an I-map (independence map) of $p(x)$ if $I(G) \subseteq I(p)$.
Example: if $G$ is fully connected, $I(G) = \emptyset$, thus $I(G) \subseteq I(p)$, for any $p$.

18 Conditional Independence Properties (cont.)
What type of conditional independence (CI) statements are expressed by some graph $G$?
In a path through some node $m$, there are three possible orientation structures:
- tail-to-tail
- head-to-tail
- head-to-head
Before proceeding to the general statement, we next exemplify which type of CI corresponds to each of these structures.

19 Conditional Independence Structures
Tail-to-tail ($X \leftarrow Z \rightarrow Y$):
  $p(x, y \mid z) = \dfrac{p(x, y, z)}{p(z)} = \dfrac{p(z)\, p(x \mid z)\, p(y \mid z)}{p(z)} = p(x \mid z)\, p(y \mid z)$
Head-to-tail ($X \rightarrow Z \rightarrow Y$):
  $p(x, y \mid z) = p(y \mid z)\, \dfrac{p(z \mid x)\, p(x)}{p(z)} = p(y \mid z)\, p(x \mid z)$

20 Conditional Independence Structures
Head-to-head ($X \rightarrow Z \leftarrow Y$): in general, $p(x, y \mid z) \neq p(x \mid z)\, p(y \mid z)$, although $X \perp Y$ marginally.
Classical example: the noisy fuel gauge.
Binary variables: $X$ = battery OK, $Y$ = full tank, $Z$ = fuel gauge on.
$P(X = 1) = P(Y = 1) = 0.9$; the CPD $P(Z = 1 \mid x, y)$ is given by a table (the gauge is most likely on when both the battery is OK and the tank is full).
Gauge is off ($Z = 0$); is the tank empty? $P(Y = 0 \mid Z = 0) \approx 0.26$
...but, if the battery is also dead, $P(Y = 0 \mid Z = 0, X = 0) = 0.11$.
Dead battery explains away the empty tank.
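The explaining-away numbers can be checked directly; the sketch below assumes the gauge CPT from Bishop's PRML fuel-gauge example (which this lecture follows), since the table's values did not survive in this transcription:

```python
# Explaining away in the head-to-head structure X -> Z <- Y.
# CPT values taken from Bishop's PRML fuel-gauge example (assumed, see lead-in).
from itertools import product

P_X1, P_Y1 = 0.9, 0.9                     # P(battery OK), P(full tank)
P_Z1 = {(1, 1): 0.8, (1, 0): 0.2,         # P(Z = 1 | x, y): gauge on
        (0, 1): 0.2, (0, 0): 0.1}

def p_joint(x, y, z):
    px = P_X1 if x == 1 else 1 - P_X1
    py = P_Y1 if y == 1 else 1 - P_Y1
    pz = P_Z1[(x, y)] if z == 1 else 1 - P_Z1[(x, y)]
    return px * py * pz

p_z0 = sum(p_joint(x, y, 0) for x, y in product((0, 1), repeat=2))
p_y0_z0 = sum(p_joint(x, 0, 0) for x in (0, 1)) / p_z0
p_y0_z0_x0 = p_joint(0, 0, 0) / sum(p_joint(0, y, 0) for y in (0, 1))

print(round(p_y0_z0, 3))      # 0.257: gauge off raises P(empty tank) above the prior 0.1
print(round(p_y0_z0_x0, 3))   # 0.111: a dead battery explains away the empty tank
```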

21 D-Separation
Graph $G = (V, E)$ and three disjoint subsets of $V$: $A$, $B$, and $C$.
An (undirected) path from a node in $A$ to a node in $B$ is blocked by $C$ if it includes a node $m$ such that
- $m \in C$ and the arrows meet head-to-tail or tail-to-tail at $m$, or
- the arrows meet head-to-head at $m$, $m \notin C$, and $\mathrm{desc}(m) \cap C = \emptyset$.
$C$ D-separates $A$ from $B$, and $x_A \perp_G x_B \mid x_C$, if every path from a node in $A$ to a node in $B$ is blocked by $C$.
Examples (with respect to the graph in the figure):
- $x_4 \perp_G x_5 \mid x_1$ (tail-to-tail)
- $x_1 \perp_G x_2 \mid x_4$ (head-to-head in 5, with $5 \notin \{4\}$)
- $x_5 \perp_G x_6 \mid x_{\{2,3\}}$ (tail-to-tail in 2 and head-to-tail in 3)
- $x_1 \perp_G x_2 \mid x_7$ is false (head-to-head in 5, but $7 \in \mathrm{desc}(5)$)
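For completeness, a hedged sketch of a d-separation test using the standard equivalent criterion: $A$ and $B$ are d-separated by $C$ iff they are disconnected in the moralized ancestral graph of $A \cup B \cup C$ with the nodes of $C$ removed. The toy DAG below is a hypothetical example, not the slide's figure:

```python
from collections import deque

def ancestors(dag, nodes):
    """All ancestors of `nodes` (including the nodes themselves). dag: {child: set_of_parents}."""
    result, stack = set(nodes), list(nodes)
    while stack:
        v = stack.pop()
        for p in dag.get(v, set()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

def d_separated(dag, A, B, C):
    keep = ancestors(dag, set(A) | set(B) | set(C))
    # moralize: undirected parent-child edges, plus edges between co-parents
    adj = {v: set() for v in keep}
    for v in keep:
        parents = [p for p in dag.get(v, set()) if p in keep]
        for p in parents:
            adj[v].add(p); adj[p].add(v)
        for i in range(len(parents)):
            for j in range(i + 1, len(parents)):
                adj[parents[i]].add(parents[j]); adj[parents[j]].add(parents[i])
    # remove conditioning nodes, then test connectivity from A to B by BFS
    blocked = set(C)
    frontier, seen = deque(a for a in A if a not in blocked), set(A)
    while frontier:
        v = frontier.popleft()
        if v in B:
            return False
        for u in adj[v] - blocked:
            if u not in seen:
                seen.add(u); frontier.append(u)
    return True

# toy DAG: 1 -> 3 <- 2, 3 -> 4  (represented as child: parents)
dag = {1: set(), 2: set(), 3: {1, 2}, 4: {3}}
print(d_separated(dag, {1}, {2}, set()))   # True: head-to-head at 3 blocks the path
print(d_separated(dag, {1}, {2}, {4}))     # False: conditioning on a descendant of 3 unblocks it
```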

22 Markov Blanket
The Markov blanket of node $m$, $\mathrm{mb}(m)$, is the set of nodes, conditioned on which $x_m$ is independent of all other nodes.
Which nodes belong to $\mathrm{mb}(m)$?
  $p(x_m \mid \{x_j : j \neq m\}) = \dfrac{\prod_{i=1}^{V} p(x_i \mid x_{\mathrm{pa}(i)})}{\sum_{x_m} \prod_{i=1}^{V} p(x_i \mid x_{\mathrm{pa}(i)})} = \dfrac{p(x_m \mid x_{\mathrm{pa}(m)}) \prod_{j:\, m \in \mathrm{pa}(j)} p(x_j \mid x_{\mathrm{pa}(j)})}{\sum_{x_m} p(x_m \mid x_{\mathrm{pa}(m)}) \prod_{j:\, m \in \mathrm{pa}(j)} p(x_j \mid x_{\mathrm{pa}(j)})}$
(all factors $p(x_i \mid x_{\mathrm{pa}(i)})$ with $i \notin \{m\} \cup \mathrm{ch}(m)$ do not involve $x_m$ and cancel)
...thus $\mathrm{mb}(m) = \mathrm{pa}(m) \cup \mathrm{ch}(m) \cup \mathrm{pa}(\mathrm{ch}(m))$, the last set being the coparents.

23 Undirected Graphical Models (Markov Random Fields)
MRFs are based on undirected graphs $G = (V, E)$, where each edge $\{u, v\}$ is a sub-multiset of $V$ of cardinality 2.
Conditional independence statements result from a simple separation (simpler than D-separation) definition.
Let $A, B, C$ be disjoint subsets of $V$. If every path from a node in $A$ to a node in $B$ goes through $C$, it is said that $C$ separates $A$ from $B$.
Graph $G$ is an I-map for $(X_1, \ldots, X_V) = X \sim p(x)$ if
  $C$ separates $A$ from $B$ $\;\Rightarrow\;$ $x_A \perp x_B \mid x_C$
...a perfect I-map if $\Rightarrow$ can be replaced by $\Leftrightarrow$.
A complete graph is an I-map for any $p(x)$.

24 Markov Random Fields (cont.)
Neighborhood: $N(i) = \{j : \{i, j\} \in E\}$
In an MRF, the Markov blanket is simply the neighborhood, $\mathrm{mb}(i) = N(i)$:
  $p(x_i \mid x_{V \setminus \{i\}}) = p(x_i \mid x_{N(i)})$, i.e., $x_i \perp x_{V \setminus (\{i\} \cup N(i))} \mid x_{N(i)}$
Clique: a set of mutually neighboring nodes.
Maximal clique: a clique that is not contained in any other clique; $\mathcal{C}(G)$ denotes the set of maximal cliques of $G$.
Examples (for the graph in the figure): $\{1, 2\}$ is a (non-maximal) clique; $\{1, 2, 3, 5\}$ is not a clique (1 and 5 are not neighbors); $\{1, 2, 3\}$ and $\{4, 5, 6, 7\}$ are maximal cliques.

25 Hammersley-Clifford Theorem and Gibbs Distributions
Let $p(x) = p(x_1, \ldots, x_V)$, such that $p(x) > 0$, $\forall x$. Then, $G$ is an I-map for $p(x)$ if and only if
  $p(x) = \dfrac{1}{Z} \prod_{C \in \mathcal{C}(G)} \psi_C(x_C) = \dfrac{1}{Z} \exp\Big( -\sum_{C \in \mathcal{C}(G)} E_C(x_C) \Big)$,
where $Z = \sum_x \prod_{C \in \mathcal{C}(G)} \psi_C(x_C)$ is the partition function (with integration, rather than summation, in the case of continuous variables).
$\psi_C$ is called a clique potential; $E_C$ is called a clique energy.
This is known in statistical physics as a Gibbs distribution.

26 Local Conditionals in Gibbs Distributions
Consider a Gibbs distribution over a graph $G$:
  $p(x) = \dfrac{1}{Z} \prod_{C \in \mathcal{C}(G)} \psi_C(x_C) = \dfrac{1}{Z} \exp\Big( -\sum_{C \in \mathcal{C}(G)} E_C(x_C) \Big)$
Local conditional distribution:
  $p(x_i \mid x_{N(i)}) = \dfrac{1}{Z(x_{N(i)})} \prod_{C:\ i \in C} \psi_C(x_C) = \dfrac{1}{Z(x_{N(i)})} \exp\Big( -\sum_{C:\ i \in C} E_C(x_C) \Big)$

27 Auto Models
Auto models: based on 2-node cliques:
  $E_C(x_C) = \sum_{D:\ |D| = 2,\ D \subseteq C} E_D(x_D)$;
equivalently,
  $\psi_C(x_C) = \prod_{D:\ |D| = 2,\ D \subseteq C} \psi_D(x_D)$
The joint distribution has the form
  $p(x) = \dfrac{1}{Z} \prod_{D:\ |D| = 2,\ D \subseteq C \in \mathcal{C}(G)} \psi_D(x_D) = \dfrac{1}{Z} \exp\Big( -\sum_{D:\ |D| = 2,\ D \subseteq C \in \mathcal{C}(G)} E_D(x_D) \Big)$

28 Auto Models: Gaussian Markov Random Fields (GMRF)
$X \in \mathbb{R}^V$, with Gaussian
  $p(x) \propto \exp\Big( -\tfrac{1}{2} (x - \mu)^T A (x - \mu) \Big) \propto \exp\Big( -\tfrac{1}{2} \sum_{i=1}^{V} \sum_{j=1}^{V} A_{ij} (x_i - \mu_i)(x_j - \mu_j) \Big)$,
where $A$, the inverse of the covariance matrix, is symmetric.
Each pairwise clique $\{i, j\}$ has energy
  $E_{\{i,j\}}(\{x_i, x_j\}) = \begin{cases} A_{ij}\, (x_i - \mu_i)(x_j - \mu_j) & \text{if } i \neq j \\[2pt] \tfrac{A_{ii}}{2}\, (x_i - \mu_i)^2 & \text{if } i = j \end{cases}$
Neighborhood system: $N(i) = \{j : A_{ij} \neq 0\}$

29 Auto Models: Ising and Potts Fields
$X \in \{-1, +1\}^V$, with energy function
  $E_{\{i,j\}}(x_i, x_j) = \begin{cases} -\beta & x_i = x_j \\ \;\;\beta & x_i \neq x_j \end{cases} \;=\; -\beta\, x_i x_j$,
where $\beta > 0$ (ferromagnetic interaction) or $\beta < 0$ (anti-ferromagnetic interaction).
Computing $Z$ is NP-hard in general.
Generalization to $K$ states: the Potts model, $X \in \{1, \ldots, K\}^V$, with energy function
  $E_{\{i,j\}}(x_i, x_j) = \begin{cases} -\beta & x_i = x_j \\ \;\;0 & x_i \neq x_j \end{cases}$

30 Illustration of Potts Fields
Samples of the Potts model, with $K = 10$. The graph is a 2D grid; the neighborhood of each node is the set of its 4 nearest neighbors.
(Figure: three sample images, for $\beta = 1.42$, $\beta = 1.44$, and $\beta = 1.46$.)
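Samples like the ones in the figure can be drawn with a Gibbs sampler that repeatedly resamples each site from its local conditional (slide 26). The sketch below is an illustrative implementation, not the code used for the figure; grid size, number of sweeps, and the seed are arbitrary choices:

```python
# Gibbs sampling from a Potts model on a 2D grid with 4-nearest-neighbor cliques
# and pairwise energy -beta when two neighboring labels are equal.
import numpy as np

def gibbs_potts(height=64, width=64, K=10, beta=1.44, n_sweeps=200, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.integers(0, K, size=(height, width))
    for _ in range(n_sweeps):
        for i in range(height):
            for j in range(width):
                # count, for each label k, how many of the 4 neighbors equal k
                counts = np.zeros(K)
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < height and 0 <= nj < width:
                        counts[x[ni, nj]] += 1
                # local conditional: p(x_ij = k | neighbors) ∝ exp(beta * counts[k])
                logits = beta * counts
                probs = np.exp(logits - logits.max())
                probs /= probs.sum()
                x[i, j] = rng.choice(K, p=probs)
    return x

sample = gibbs_potts(height=32, width=32, n_sweeps=50)
print(sample.shape)  # (32, 32)
```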

31 DGM and MRF: The Easy Case
Problem: how to write a DGM as an MRF?
In some cases, there is a trivial relationship: a Markov chain
  $p(x) = p(x_1)\, p(x_2 \mid x_1) \cdots p(x_N \mid x_{N-1})$
can obviously be written as an MRF (with $Z = 1$):
  $p(x) = \dfrac{1}{Z}\, \underbrace{\psi_{\{1,2\}}(x_1, x_2)}_{p(x_1)\, p(x_2 \mid x_1)}\, \underbrace{\psi_{\{2,3\}}(x_2, x_3)}_{p(x_3 \mid x_2)} \cdots \underbrace{\psi_{\{N-1,N\}}(x_{N-1}, x_N)}_{p(x_N \mid x_{N-1})}$

32 DGM and MRF: The General Case
Procedure:
1. Insert undirected edges between all pairs of parents of each node ("moralization");
2. Make all edges undirected;
3. Initialize all clique potentials to 1;
4. Take each factor in the DGM and multiply it into the potential of a clique that contains all the involved nodes.
Example:
  $p(x) = \underbrace{p(x_1)\, p(x_2)\, p(x_3)\, p(x_4 \mid x_1, x_2, x_3)}_{\psi_{\{1,2,3,4\}}(x_1, x_2, x_3, x_4)}$

33 From DGM to MRF: Another Example
DGM: $p(x) = p(x_1)\, p(x_2)\, p(x_3 \mid x_1, x_2)\, p(x_4 \mid x_3)\, p(x_5 \mid x_3)\, p(x_6 \mid x_4, x_5)$
Cliques (after moralization): $\mathcal{C} = \{ \{1, 2, 3\}, \{3, 4, 5\}, \{4, 5, 6\} \}$
MRF: $p(x) = \underbrace{\psi(x_1, x_2, x_3)}_{p(x_1)\, p(x_2)\, p(x_3 \mid x_1, x_2)}\; \underbrace{\psi(x_3, x_4, x_5)}_{p(x_4 \mid x_3)\, p(x_5 \mid x_3)}\; \underbrace{\psi(x_4, x_5, x_6)}_{p(x_6 \mid x_4, x_5)}$

34 Efficient Inference
Motivating example: compute a marginal $p(x_n)$ in a chain graph
  $p(x) = p(x_1, \ldots, x_N) = \psi(x_1, x_2)\, \psi(x_2, x_3) \cdots \psi(x_{N-1}, x_N)$
Naïve solution (suppose $x_1, x_2, \ldots, x_N \in \{1, \ldots, K\}$ and $Z = 1$):
  $p(x_n) = \sum_{x_1 = 1}^{K} \cdots \sum_{x_{n-1} = 1}^{K}\ \sum_{x_{n+1} = 1}^{K} \cdots \sum_{x_N = 1}^{K} p(x_1, \ldots, x_N)$
has cost $O(K^N)$ (just tabulating $p(x_1, \ldots, x_N)$ has cost $O(K^N)$)
...the structure of $p(x_1, \ldots, x_N)$ is not being exploited.

35 Efficient Inference (cont.)
Reorder the summations and use the structure:
  $p(x_n) = \sum_{x_{n+1}} \cdots \sum_{x_N}\ \sum_{x_{n-1}} \cdots \sum_{x_1} \psi(x_1, x_2) \cdots \psi(x_{N-1}, x_N)$
  $\phantom{p(x_n)} = \underbrace{\sum_{x_{n+1}} \psi(x_n, x_{n+1}) \cdots \sum_{x_{N-1}} \psi(x_{N-2}, x_{N-1}) \sum_{x_N} \psi(x_{N-1}, x_N)}_{\mu_\beta(x_n)}\; \underbrace{\sum_{x_{n-1}} \psi(x_{n-1}, x_n) \cdots \sum_{x_2} \psi(x_2, x_3) \sum_{x_1} \psi(x_1, x_2)}_{\mu_\alpha(x_n)}$
  $\phantom{p(x_n)} = \mu_\beta(x_n)\, \mu_\alpha(x_n)$
Cost $O(N K^2)$ (versus $O(K^N)$; e.g., $2000$ operations instead of an exponentially large number).
The key is the distributive property: $ab + ac = a(b + c)$

36 Efficient Inference (cont.)
Can be seen as message passing:
  $\mu_\beta(x_n) = \sum_{x_{n+1} = 1}^{K} \psi(x_n, x_{n+1}) \cdots \underbrace{\sum_{x_{N-1} = 1}^{K} \psi(x_{N-2}, x_{N-1}) \underbrace{\sum_{x_N = 1}^{K} \psi(x_{N-1}, x_N)}_{\mu_\beta(x_{N-1})}}_{\mu_\beta(x_{N-2})}$
Formally, right-to-left messages:
  $\mu_\beta(x_N) = 1$, for $x_N \in \{1, \ldots, K\}$
  $\mu_\beta(x_j) = \sum_{x_{j+1} = 1}^{K} \psi(x_j, x_{j+1})\, \mu_\beta(x_{j+1})$, for $x_j \in \{1, \ldots, K\}$

37 Efficient Inference (cont.)
Can be seen as message passing:
  $\mu_\alpha(x_n) = \sum_{x_{n-1} = 1}^{K} \psi(x_{n-1}, x_n) \cdots \underbrace{\sum_{x_2 = 1}^{K} \psi(x_2, x_3) \underbrace{\sum_{x_1 = 1}^{K} \psi(x_1, x_2)}_{\mu_\alpha(x_2)}}_{\mu_\alpha(x_3)}$
Formally, left-to-right messages:
  $\mu_\alpha(x_1) = 1$, for $x_1 \in \{1, \ldots, K\}$
  $\mu_\alpha(x_j) = \sum_{x_{j-1} = 1}^{K} \psi(x_{j-1}, x_j)\, \mu_\alpha(x_{j-1})$, for $x_j \in \{1, \ldots, K\}$
Known as the sum-product algorithm, a.k.a. belief propagation.
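A minimal sketch of these two recursions on a chain, with 0-indexed variables and arbitrary pairwise potentials; the product of the two messages is normalized at the end, which absorbs the $1/Z$ factor:

```python
import numpy as np

def chain_marginal(psi, n):
    """Marginal p(x_n) of p(x) ∝ prod_j psi[j][x_j, x_{j+1}], j = 0..N-2 (0-indexed)."""
    N = len(psi) + 1
    K = psi[0].shape[0]
    # left-to-right messages
    mu_alpha = np.ones(K)
    for j in range(n):                      # accumulate over x_0, ..., x_{n-1}
        mu_alpha = psi[j].T @ mu_alpha      # sum_{x_j} psi(x_j, x_{j+1}) mu_alpha(x_j)
    # right-to-left messages
    mu_beta = np.ones(K)
    for j in range(N - 2, n - 1, -1):       # accumulate over x_{N-1}, ..., x_{n+1}
        mu_beta = psi[j] @ mu_beta          # sum_{x_{j+1}} psi(x_j, x_{j+1}) mu_beta(x_{j+1})
    p = mu_alpha * mu_beta
    return p / p.sum()                      # normalize (absorbs 1/Z)

rng = np.random.default_rng(0)
K, N = 3, 6
psi = [rng.random((K, K)) for _ in range(N - 1)]
print(chain_marginal(psi, n=2))
```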

38 Efficient Inference (cont.)
Another example: compute the MAP configuration, $\max_x p(x)$:
  $p(x) = p(x_1, \ldots, x_N) = \psi(x_1, x_2)\, \psi(x_2, x_3) \cdots \psi(x_{N-1}, x_N)$
Naïve solution (suppose $x_1, x_2, \ldots, x_N \in \{1, \ldots, K\}$ and $Z = 1$):
  $\max_x p(x) = \max_{x_1} \max_{x_2} \cdots \max_{x_N} p(x_1, \ldots, x_N)$
has cost $O(K^N)$ (just tabulating $p(x_1, \ldots, x_N)$ has cost $O(K^N)$)
...the structure of $p(x_1, \ldots, x_N)$ is not being exploited.

39 Efficient Inference (cont.)
Exploit the structure:
  $\max_x p(x) = \max_{x_1} \max_{x_2} \cdots \max_{x_N} \psi(x_1, x_2) \cdots \psi(x_{N-1}, x_N)$
  $\phantom{\max_x p(x)} = \max_{x_1} \max_{x_2} \psi(x_1, x_2) \cdots \underbrace{\max_{x_{N-1}} \psi(x_{N-2}, x_{N-1}) \underbrace{\max_{x_N} \psi(x_{N-1}, x_N)}_{\mu(x_{N-1})}}_{\mu(x_{N-2})}$
Formally, right-to-left messages:
  $\mu(x_N) = 1$, for $x_N \in \{1, \ldots, K\}$
  $\mu(x_j) = \max_{x_{j+1}} \psi(x_j, x_{j+1})\, \mu(x_{j+1})$, for $x_j \in \{1, \ldots, K\}$
Can also be done from left to right, using the reverse ordering.

40 Efficient Inference (cont.)
Exploit the structure:
  $\max_x p(x) = \max_{x_1} \max_{x_2} \psi(x_1, x_2) \cdots \underbrace{\max_{x_{N-1}} \psi(x_{N-2}, x_{N-1}) \underbrace{\max_{x_N} \psi(x_{N-1}, x_N)}_{\mu(x_{N-1})}}_{\mu(x_{N-2})}$
Cost $O(N K^2)$ (versus $O(K^N)$; e.g., $2000$ operations instead of an exponentially large number).
The key is the distributive property: $\max\{ab, ac\} = a \max\{b, c\}$ (for $a \geq 0$).
Equivalently, for $\max \log p(x)$, use $\max\{a+b, a+c\} = a + \max\{b, c\}$.
To compute $\arg\max_x p(x)$ we need a backward and a forward pass; why and how?

41 Efficient Inference (cont.)
Inference on a general DGM via message passing; on the chain, the MAP configuration is recovered by backtracking:
  $\hat{x}_1 = \arg\max_{x_1} \Big( \max_{x_2, \ldots, x_N} p(x_1, x_2, \ldots, x_N) \Big) = \arg\max_{x_1} \mu(x_1)$
  $\hat{x}_2 = \arg\max_{x_2} \psi(\hat{x}_1, x_2)\, \mu(x_2)$
  $\vdots$
  $\hat{x}_N = \arg\max_{x_N} \psi(\hat{x}_{N-1}, x_N)\, \underbrace{\mu(x_N)}_{1} = \arg\max_{x_N} \psi(\hat{x}_{N-1}, x_N)$
Cost $O(N K^2)$ (versus $O(K^N)$; e.g., $2000$ operations instead of an exponentially large number).
This is similar to dynamic programming and the Viterbi algorithm.
This is known as the max-product (or max-sum, with logs) algorithm.
How to extend this to more general graphical structures? A general algorithm is more conveniently written for factor graphs.
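A minimal sketch of the max-product recursion and the backtracking pass on a chain (0-indexed, arbitrary pairwise potentials), checked against brute force on a tiny example:

```python
import numpy as np
from itertools import product

def chain_map(psi):
    """argmax_x prod_j psi[j][x_j, x_{j+1}] via right-to-left max-messages + forward backtracking."""
    N = len(psi) + 1
    K = psi[0].shape[0]
    # mu[j][k] = max over x_{j+1}, ..., x_{N-1} of the product of potentials from j on, given x_j = k
    mu = [np.ones(K) for _ in range(N)]
    for j in range(N - 2, -1, -1):
        mu[j] = (psi[j] * mu[j + 1][None, :]).max(axis=1)
    # forward backtracking
    x_hat = np.empty(N, dtype=int)
    x_hat[0] = int(np.argmax(mu[0]))
    for j in range(1, N):
        x_hat[j] = int(np.argmax(psi[j - 1][x_hat[j - 1], :] * mu[j]))
    return x_hat

rng = np.random.default_rng(1)
K, N = 3, 5
psi = [rng.random((K, K)) for _ in range(N - 1)]
x_hat = chain_map(psi)
# brute-force check on this small example
best = max(product(range(K), repeat=N),
           key=lambda x: np.prod([psi[j][x[j], x[j + 1]] for j in range(N - 1)]))
print(x_hat, best)  # should agree
```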

42 Factor Graphs
The joint pdf/pmf of $X = (X_1, \ldots, X_V)$ (or hybrid) is a product
  $p(x) = \prod_{s \in S} f_s(x_s)$, where $S \subseteq 2^{\{1, \ldots, V\}}$, i.e., each $s \subseteq \{1, \ldots, V\}$
Each factor $f_s$ only depends on a subset $x_s$ of the components of $x$.
Example: $p(x_1, x_2, x_3) = f_a(x_1, x_2)\, f_b(x_1, x_2)\, f_c(x_2, x_3)\, f_d(x_3)$
Seen as an MRF, $\mathcal{C} = \{ \{1, 2\}, \{2, 3\} \}$, thus $p(x) \propto \psi_{\{1,2\}}\, \psi_{\{2,3\}}$, with $\psi_{\{1,2\}} = f_a(x_1, x_2)\, f_b(x_1, x_2)$ and $\psi_{\{2,3\}} = f_c(x_2, x_3)\, f_d(x_3)$.

43 Factor Graphs
Factor graphs are bipartite: two disjoint subsets of nodes (factors and variables); no edges between nodes in the same subset.
Mapping an MRF to a factor graph is not unique (figure: the same MRF expressed with different factorizations).
Mapping a Bayesian network to a factor graph is not unique (figure: the same DGM expressed with different factorizations).
Neighborhood: $\mathrm{ne}(x) = \{s \in S : x \in s\}$

44 Sum-Product Algorithm on Factor Graphs
Working example: compute a marginal $p(x) = \sum_{\mathbf{x} \setminus x} p(\mathbf{x})$.
Assume the graph is a tree.
Group the factors in the following way:
  $p(\mathbf{x}) = \prod_{s \in S} f_s(x_s) = \prod_{s \in \mathrm{ne}(x)} F_s(x, X_s)$,
where $X_s$ is the subset of variables in the subtree connected to $x$ via factor $s$, and $F_s(x, X_s)$ is the product of all the factors in that subtree.

45 Sum-Product Algorithm on Factor Graphs
Rewrite the marginalization:
  $p(x) = \sum_{\mathbf{x} \setminus x}\ \prod_{s \in \mathrm{ne}(x)} F_s(x, X_s) = \prod_{s \in \mathrm{ne}(x)} \underbrace{\sum_{X_s} F_s(x, X_s)}_{\mu_{f_s \to x}(x)}$
Each subtree factor has the form
  $F_s(x, X_s) = f_s(x, x_1, \ldots, x_M)\, G_1(x_1, X_{s1}) \cdots G_M(x_M, X_{sM})$,
thus
  $\mu_{f_s \to x}(x) = \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \underbrace{\sum_{X_{sm}} G_m(x_m, X_{sm})}_{\mu_{x_m \to f_s}(x_m)}$

46 Sum-Product Algorithm on Factor Graphs
Factor-to-variable (FtV) messages:
  $\mu_{f_s \to x}(x) = \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \underbrace{\mu_{x_m \to f_s}(x_m)}_{\text{variable-to-factor (VtF) messages}}$
Computing the FtV message from $f_s$ to $x$:
- compute the product of the VtF messages coming from all variables except $x$;
- multiply by the local factor;
- marginalize w.r.t. all variables except $x$.

47 Sum-Product Algorithm on Factor Graphs
Variable-to-factor (VtF) messages:
  $\mu_{x_m \to f_s}(x_m) = \sum_{X_{sm}} G_m(x_m, X_{sm})$
But $G_m(x_m, X_{sm}) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} F_l(x_m, X_{ml})$, thus
  $\mu_{x_m \to f_s}(x_m) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \sum_{X_{ml}} F_l(x_m, X_{ml}) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \mu_{f_l \to x_m}(x_m)$
Each VtF message is the product of the FtV messages that the variable receives from the other factors.

48 Sum-Product Algorithm on Factor Graphs
Variable-to-factor (VtF) messages from leaf variables (no other neighboring factors):
  $\mu_{x_m \to f_s}(x_m) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \mu_{f_l \to x_m}(x_m) = 1$
Factor-to-variable (FtV) messages from leaf factors (no other neighboring variables):
  $\mu_{f_s \to x}(x) = \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \mu_{x_m \to f_s}(x_m) = f_s(x)$

49 Sum-Product on Factor Graphs: Detailed Example
  $p(x) = f_a(x_1, x_2)\, f_b(x_2, x_3)\, f_c(x_2, x_4)$
  $p(x_2) = \mu_{f_a \to x_2}(x_2)\, \mu_{f_b \to x_2}(x_2)\, \mu_{f_c \to x_2}(x_2)$
  $\phantom{p(x_2)} = \Big[ \sum_{x_1} f_a(x_1, x_2) \Big] \Big[ \sum_{x_3} f_b(x_2, x_3) \Big] \Big[ \sum_{x_4} f_c(x_2, x_4) \Big]$
  $\phantom{p(x_2)} = \sum_{x_1} \sum_{x_3} \sum_{x_4} f_a(x_1, x_2)\, f_b(x_2, x_3)\, f_c(x_2, x_4) = p(x_2)$
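A small numerical check of this example: the product of the three factor-to-variable messages arriving at $x_2$ equals the brute-force marginal (the factor tables below are arbitrary illustrative values):

```python
import numpy as np

K = 3
rng = np.random.default_rng(2)
f_a = rng.random((K, K))   # f_a(x_1, x_2)
f_b = rng.random((K, K))   # f_b(x_2, x_3)
f_c = rng.random((K, K))   # f_c(x_2, x_4)

# factor-to-variable messages into x_2 (leaf variables send constant-1 messages)
mu_a = f_a.sum(axis=0)     # sum over x_1
mu_b = f_b.sum(axis=1)     # sum over x_3
mu_c = f_c.sum(axis=1)     # sum over x_4
p_x2_messages = mu_a * mu_b * mu_c

# brute-force marginal of the (unnormalized) product of factors
joint = np.einsum('ij,jk,jl->ijkl', f_a, f_b, f_c)   # indices: x_1, x_2, x_3, x_4
p_x2_brute = joint.sum(axis=(0, 2, 3))

print(np.allclose(p_x2_messages, p_x2_brute))  # True
```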

50 Max-Sum Algorithm on Factor Graphs
Message passing for MAP:
  $\max_x \log p(x) = \max_x \sum_{s \in S} \log f_s(x_s)$
Distributive property: $\max\{a + b, a + c\} = a + \max\{b, c\}$
Max-sum messages:
  $\mu_{f \to x}(x) = \max_{x_1, \ldots, x_M} \Big[ \log f(x, x_1, \ldots, x_M) + \sum_{m \in \mathrm{ne}(f) \setminus x} \mu_{x_m \to f}(x_m) \Big]$
  $\mu_{x \to f}(x) = \sum_{l \in \mathrm{ne}(x) \setminus f} \mu_{f_l \to x}(x)$
At leaf variables and factors:
  $\mu_{x \to f}(x) = 0$, $\qquad \mu_{f \to x}(x) = \log f(x)$

51 Recommended Reading
C. Bishop, Pattern Recognition and Machine Learning, Springer, 2006 (this lecture was very much based on chapter 8 of this book). Chapter 8 is freely available at um/people/cmbishop/prml/pdf/bishop-prml-sample.pdf
K. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012 (chapters 10 and 19).
