Statistical Learning
1 Statistical Learning, Lecture 5: Bayesian Networks and Graphical Models. Mário A. T. Figueiredo, Instituto Superior Técnico & Instituto de Telecomunicações, University of Lisbon, Portugal, May 2018.

2 Bayesian Networks and Graphical Models

Bayes nets in a nutshell:
- structured probability (density/mass) functions $f_X(x;\theta)$;
- provide a graph-based language/grammar to express conditional independence;
- allow formalizing the problem of inferring a subset of components of $X$ from another subset thereof;
- allow formalizing the problem of learning $\theta$ from observed i.i.d. realizations of $X$: $x_1, \ldots, x_n$.

Bayes nets are one type of graphical model, based on directed graphs. Other types (more on them later): Markov random fields (MRF), based on undirected graphs, and factor graphs.

3 Bayesian Networks: Introduction

Notation: we use the more compact $p(x)$ instead of the more correct $f_X(x)$.

For $X \in \mathbb{R}^n$, the pdf/pmf $p(x)$ can be factored by repeated use of Bayes' law:
$$p(x) = p(x_1 \mid x_2,\ldots,x_n)\, p(x_2,\ldots,x_n) = p(x_1 \mid x_2,\ldots,x_n)\, p(x_2 \mid x_3,\ldots,x_n)\, p(x_3,\ldots,x_n) = \cdots = p(x_1 \mid x_2,\ldots,x_n) \cdots p(x_{n-1} \mid x_n)\, p(x_n).$$

Of course, this can be done in $n!$ different ways; e.g.,
$$p(x) = p(x_n \mid x_{n-1},\ldots,x_1)\, p(x_{n-1} \mid x_{n-2},\ldots,x_1) \cdots p(x_2 \mid x_1)\, p(x_1).$$
4 Bayesian Networks: Introduction

In general, the factorization $p(x) = p(x_n \mid x_{n-1},\ldots,x_1) \cdots p(x_2 \mid x_1)\, p(x_1)$ is not more compact than $p(x)$ itself. Example: if $x_i \in \{1,\ldots,K\}$, a general $p(x)$ has $K^n - 1 \approx K^n$ parameters; but $p(x_n \mid x_{n-1},\ldots,x_1)$ alone has $(K-1)K^{n-1} \approx K^n$ parameters!

...unless there are some conditional independencies. Example: if $X$ is a Markov chain, $p(x_i \mid x_{i-1},\ldots,x_1) = p(x_i \mid x_{i-1})$; in this case,
$$p(x) = p(x_n \mid x_{n-1})\, p(x_{n-1} \mid x_{n-2}) \cdots p(x_2 \mid x_1)\, p(x_1)$$
has $(n-1)K(K-1) + (K-1) \approx nK^2$ parameters: linear in $n$, rather than exponential!
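The contrast between the full table and the Markov-chain factorization can be checked numerically. A minimal sketch (the function names are illustrative, not from the lecture):

```python
# Parameter counts for a joint pmf over n K-ary variables: full table
# versus a first-order Markov chain factorization.

def full_table_params(n, K):
    # A general p(x1,...,xn): K^n probabilities, minus 1 for normalization.
    return K**n - 1

def markov_chain_params(n, K):
    # p(x1) needs K-1 parameters; each p(x_i | x_{i-1}) needs K*(K-1).
    return (K - 1) + (n - 1) * K * (K - 1)

print(full_table_params(10, 4))    # 1048575
print(markov_chain_params(10, 4))  # 3 + 9*4*3 = 111
```

Even for a modest $n = 10$ and $K = 4$, the chain needs 111 parameters instead of over a million.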
5 Conditional Independence

Bayes nets are built on conditional independence. Random variables $X$ and $Y$ are conditionally independent, given $Z$, if
$$f_{X,Y \mid Z}(x, y \mid z) = f_{X \mid Z}(x \mid z)\, f_{Y \mid Z}(y \mid z).$$
Naturally, $X$, $Y$, and $Z$ can be groups of random variables. Notation: $X \perp Y \mid Z$.

Equivalent relationship (in short notation):
$$p(x \mid y, z) = \frac{p(x, y \mid z)}{p(y \mid z)} = \frac{p(x \mid z)\, p(y \mid z)}{p(y \mid z)} = p(x \mid z).$$

Factorization: if $X = (X_1, X_2, X_3)$ and $X_1 \perp X_3 \mid X_2$, then
$$p(x) = p(x_3 \mid x_2, x_1)\, p(x_2 \mid x_1)\, p(x_1) = p(x_3 \mid x_2)\, p(x_2 \mid x_1)\, p(x_1).$$

6 Graphical Models

Graph-based representations of the joint pdf/pmf $p(x)$: each node $i$ represents a random variable $X_i$, and the conditional independence properties are encoded by the presence/absence of edges in the graph.

Example: $X_1 \perp X_3 \mid X_2$ is represented by any one of the following (all equal) factorizations:
$$p(x) = p(x_3 \mid x_2)\, p(x_2 \mid x_1)\, p(x_1)$$
$$p(x) = p(x_3 \mid x_2)\, p(x_1 \mid x_2)\, p(x_2) = p(x_3 \mid x_2)\, p(x_2 \mid x_1)\, p(x_1)$$
$$p(x) = p(x_1 \mid x_2)\, p(x_2 \mid x_3)\, p(x_3) = p(x_1 \mid x_2)\, p(x_3 \mid x_2)\, p(x_2) = p(x_3 \mid x_2)\, p(x_2 \mid x_1)\, p(x_1)$$
7 (Directed) Graph Concepts

Directed graph $G = (V, E)$, where:
- $V = \{1, \ldots, V\}$ is the set of nodes or vertices;
- $E \subseteq V \times V$ is the set of edges; i.e., each element of $E$ has the form $(s, t)$, with $s, t \in V$;
- in this context, we assume $(v, v) \notin E$, for all $v \in V$.

Note: in an undirected graph, each edge is a 2-element multiset; i.e., it has the form $\{u, v\}$, where $u, v \in V$.

(Figure: example graph on nodes 1 to 5, with edges $(1,2)$, $(1,3)$, $(2,4)$, $(3,4)$, $(3,5)$.)

Parents of a node: $\mathrm{pa}(v) = \{s \in V : (s, v) \in E\}$. Example: $\mathrm{pa}(4) = \{2, 3\}$.
Children of a node: $\mathrm{ch}(v) = \{t \in V : (v, t) \in E\}$. Example: $\mathrm{ch}(3) = \{4, 5\}$.

8 (Directed) Graph Concepts (cont.)

Root: a node $v$ such that $\mathrm{pa}(v) = \emptyset$. Example: 1 is a root.
Leaf: a node $v$ such that $\mathrm{ch}(v) = \emptyset$. Example: 4 and 5 are leaves.
Reachability: node $t$ is reachable from $s$ if there is a sequence of edges $\big((v_{s_1}, v_{t_1}), \ldots, (v_{s_n}, v_{t_n})\big)$ such that $v_{s_1} = s$, $v_{s_i} = v_{t_{i-1}}$ for $i = 2, \ldots, n$, and $v_{t_n} = t$. Example: 5 is reachable from 1; 2 is not reachable from 3.
Ancestors of a node: $\mathrm{anc}(v) = \{s : v \text{ is reachable from } s\}$. Examples: $\mathrm{anc}(5) = \{1, 3\}$; $\mathrm{anc}(4) = \{1, 2, 3\}$; $\mathrm{anc}(1) = \emptyset$.
Descendants of a node: $\mathrm{desc}(v) = \{t : t \text{ is reachable from } v\}$. Examples: $\mathrm{desc}(1) = \{2, 3, 4, 5\}$; $\mathrm{desc}(2) = \{4\}$; $\mathrm{desc}(5) = \emptyset$.

9 (Directed) Graph Concepts (cont.)

Neighborhood of a node: $\mathrm{nbr}(v) = \{u : (u, v) \in E \vee (v, u) \in E\}$. Example: $\mathrm{nbr}(3) = \{1, 4, 5\}$.
In-degree of node $v$: the cardinality of $\mathrm{pa}(v)$. Example: the in-degree of 4 is 2.
Out-degree of node $v$: the cardinality of $\mathrm{ch}(v)$. Example: the out-degree of 3 is 2.
Cycle (or loop): a sequence $(v_1, v_2, \ldots, v_n)$ with $v_1 = v_n$ and $(v_i, v_{i+1}) \in E$. Example: the graph shown above has no loops/cycles.
Directed acyclic graph (DAG): a directed graph with no loops/cycles.
Directed tree: a DAG where each node has 1 or 0 parents.
Subgraph of $G = (V, E)$ induced by a subset of nodes $S \subseteq V$: $G_S = (S, E_S)$, where $E_S = \{(u, v) \in E : u, v \in S\}$. Example: $G_{\{1,3,5\}} = (\{1, 3, 5\}, \{(1, 3), (3, 5)\})$.
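The graph concepts above are easy to encode directly. A minimal sketch, using the 5-node example graph whose edge set is implied by the stated parent/child/ancestor sets:

```python
# Parents, children, descendants, and ancestors for a small directed graph.

edges = {(1, 2), (1, 3), (2, 4), (3, 4), (3, 5)}
V = {1, 2, 3, 4, 5}

def pa(v): return {s for (s, t) in edges if t == v}
def ch(v): return {t for (s, t) in edges if s == v}

def desc(v):
    # Descendants: every node reachable from v by following edges.
    out, frontier = set(), ch(v)
    while frontier:
        u = frontier.pop()
        if u not in out:
            out.add(u)
            frontier |= ch(u)
    return out

def anc(v): return {s for s in V if v in desc(s)}

print(pa(4), ch(3))     # {2, 3} {4, 5}
print(anc(5), desc(1))  # {1, 3} {2, 3, 4, 5}
```

These definitions reproduce all the examples on slides 7 and 8.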
10 Directed Graphical Models (DGM)

DGM, a.k.a. Bayesian networks, belief networks, causal networks. Consider $X = (X_1, \ldots, X_V)$ with pdf/pmf $p(x) = p(x_1, \ldots, x_V)$, and a graph $G = (V, E)$ with $V = \{1, \ldots, V\}$. $G$ is a DGM for $X$ if (with $x_S = \{x_v : v \in S\}$)
$$p(x) = \prod_{v=1}^{V} p(x_v \mid x_{\mathrm{pa}(v)}),$$
where each factor is a conditional probability distribution (CPD).

Example (for the 5-node graph above):
$$p(x) = p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_1)\, p(x_4 \mid x_2, x_3)\, p(x_5 \mid x_3).$$
The DGM is not unique (example in slide 6).

11 Directed Graphical Models: Examples

Naïve Bayes classification (generative model): class variable $Y \in \{1, \ldots, K\}$, with prior $p(y)$; class-conditional pdf/pmf $p(x \mid y)$:
$$p(y, x) = p(x \mid y)\, p(y) = p(y) \prod_{v=1}^{V} p(x_v \mid y).$$
(Graph: $Y$ is the single parent of each of $X_1, X_2, X_3, X_4$.)

Tree-augmented naïve Bayes (TAN) classification: class variable $Y \in \{1, \ldots, K\}$, with prior $p(y)$; class-conditional pdf/pmf $p(x \mid y)$:
$$p(y, x) = p(y) \prod_{v=1}^{V} p(x_v \mid x_{\mathrm{pa}(v)}, y),$$
where the DAG over the features is a tree.
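The naïve Bayes factorization above translates directly into a classifier. A minimal sketch with binary features; all the numbers are made up for illustration:

```python
# Naive Bayes as a directed model: p(y, x) = p(y) * prod_v p(x_v | y),
# with p(y | x) obtained by Bayes' rule. Toy parameters, not real data.
import numpy as np

prior = np.array([0.6, 0.4])           # p(y), K = 2 classes
# p(x_v = 1 | y) for V = 3 binary features, shape (K, V)
theta = np.array([[0.9, 0.2, 0.5],
                  [0.1, 0.7, 0.5]])

def joint(y, x):
    # p(y) * prod_v p(x_v | y) for a binary feature vector x
    lik = np.prod(theta[y]**x * (1 - theta[y])**(1 - x))
    return prior[y] * lik

x = np.array([1, 0, 1])
post = np.array([joint(0, x), joint(1, x)])
post /= post.sum()                     # normalize to get p(y | x)
print(post.argmax())                   # 0 (most probable class)
```

Note how the joint over $(Y, X_1, X_2, X_3)$ needs only $1 + 2 \cdot 3$ free parameters here, exactly the compactness argument of slide 4.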
12 Directed Graphical Models: More Examples

First-order Markov chain: $p(x) = p(x_1)\, p(x_2 \mid x_1) \cdots p(x_V \mid x_{V-1})$.

Second-order Markov chain: $p(x) = p(x_1, x_2)\, p(x_3 \mid x_2, x_1) \cdots p(x_V \mid x_{V-1}, x_{V-2})$.

Hidden Markov model (HMM): $(Z, X) = (Z_1, \ldots, Z_T, X_1, \ldots, X_T)$,
$$p(z, x) = p(z_1)\, p(x_1 \mid z_1) \prod_{v=2}^{T} p(x_v \mid z_v)\, p(z_v \mid z_{v-1}).$$

13 Inference in Directed Graphical Models

Visible and hidden variables: $x = (x_v, x_h)$, with joint pdf/pmf $p(x_v, x_h \mid \theta)$. Inferring the hidden variables from the visible ones:
$$p(x_h \mid x_v, \theta) = \frac{p(x_h, x_v \mid \theta)}{p(x_v \mid \theta)} = \frac{p(x_h, x_v \mid \theta)}{\sum_{x_h} p(x_h, x_v \mid \theta)}$$
...with an integral instead of a sum if $x_h$ has real components.

Sometimes, only a subset $x_q$ of $x_h$ is of interest, $x_h = (x_q, x_n)$:
$$p(x_q \mid x_v, \theta) = \sum_{x_n} p(x_q, x_n \mid x_v, \theta)$$
...the $x_n$ are sometimes called nuisance variables.

14 Learning in Directed Graphical Models

Observe samples $x_1, \ldots, x_N$ of $N$ i.i.d. copies of $X \sim p(x \mid \theta)$. MAP estimate of $\theta$:
$$\hat{\theta} = \arg\max_{\theta} \Big( \log p(\theta) + \sum_{i=1}^{N} \log p(x_i \mid \theta) \Big).$$
Plate notation: a collection of i.i.d. copies $X_1, \ldots, X_N$ is drawn as a single node $X_i$ inside a plate labeled $N$, with $\theta$ outside the plate. If there are hidden variables, $x$ should be understood as denoting only the visible ones.

15 Learning in Directed Graphical Models (cont.)

For $N$ i.i.d. observations $D = (x_1, \ldots, x_N)$,
$$p(D \mid \theta) = \prod_{i=1}^{N} p(x_i \mid \theta) = \prod_{i=1}^{N} \prod_{v=1}^{V} p(x_{iv} \mid x_{i,\mathrm{pa}(v)}, \theta_v) = \prod_{v=1}^{V} p(D_v \mid \theta_v),$$
where $D_v$ is the data associated with node $v$ and $\theta_v$ the corresponding parameters. If the prior factorizes, $p(\theta) = \prod_v p(\theta_v)$, the posterior also factorizes:
$$p(\theta \mid D) \propto p(\theta)\, p(D \mid \theta) = \prod_{v=1}^{V} p(\theta_v)\, p(D_v \mid \theta_v) \propto \prod_{v=1}^{V} p(\theta_v \mid D_v).$$
16 Learning in Directed Graphical Models: Categorical CPD

Each $x_v \in \{1, \ldots, K_v\}$. The number of configurations of $x_{\mathrm{pa}(v)}$ is $C_v = \prod_{s \in \mathrm{pa}(v)} K_s$. Abusing notation, write $x_{\mathrm{pa}(v)} = c$, for $c \in \{1, \ldots, C_v\}$, to denote that $x_{\mathrm{pa}(v)}$ takes the $c$-th configuration.

Parameters: $\theta_{vck} = P(x_v = k \mid x_{\mathrm{pa}(v)} = c)$ (of course, $\sum_{k=1}^{K_v} \theta_{vck} = 1$). Denote $\theta_{vc} = (\theta_{vc1}, \ldots, \theta_{vcK_v})$.

Counts: $N_{vck} = \sum_{i=1}^{N} \mathbb{1}(x_{iv} = k,\; x_{i,\mathrm{pa}(v)} = c)$.

Maximum likelihood estimates: $\hat{\theta}_{vck} = \dfrac{N_{vck}}{\sum_{j=1}^{K_v} N_{vcj}}$.

MAP estimate with a $\mathrm{Dirichlet}(\alpha_{vc})$ prior: $\hat{\theta}_{vck} = \dfrac{N_{vck} + \alpha_{vck}}{\sum_{j=1}^{K_v} (N_{vcj} + \alpha_{vcj})}$.
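The count-based estimates above are a few lines of code. A minimal sketch for a single node with one binary parent (so the configurations $c$ are just the parent's values); the data are synthetic:

```python
# ML and Dirichlet-smoothed estimates of a categorical CPD theta_{ck}
# for one node with a single parent.
import numpy as np

K, C = 2, 2                       # child and parent both binary
parent = np.array([0, 0, 1, 1, 1, 0])
child  = np.array([1, 0, 1, 1, 0, 1])

N = np.zeros((C, K))              # counts N_{ck}
for c, k in zip(parent, child):
    N[c, k] += 1

theta_ml = N / N.sum(axis=1, keepdims=True)           # maximum likelihood

alpha = np.ones((C, K))           # Dirichlet(1,...,1) pseudo-counts
theta_sm = (N + alpha) / (N + alpha).sum(axis=1, keepdims=True)

print(theta_ml)                   # each row sums to 1
print(theta_sm)                   # smoothed toward uniform
```

The smoothed estimator never assigns probability zero to an unseen (parent, child) combination, which matters when $C_v$ is large and many configurations have few or no observations.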
17 Conditional Independence Properties

Consider $X = (X_1, \ldots, X_V) \sim p(x)$, and $V = \{1, \ldots, V\}$. Let $x_A \perp x_B \mid x_C$ be a true conditional independence (CI) statement about $p(x)$, where $A$, $B$, $C$ are disjoint subsets of $V$. Let $I(p)$ be the set of all (true) CI statements about $p(x)$.

A graph $G = (V, E)$ expresses a collection $I(G)$ of CI statements $x_A \perp_G x_B \mid x_C$ (explained below). $G$ is an I-map (independence map) of $p(x)$ if $I(G) \subseteq I(p)$. Example: if $G$ is fully connected, $I(G) = \emptyset$, thus $I(G) \subseteq I(p)$ for any $p$.

18 Conditional Independence Properties (cont.)

What type of conditional independence (CI) statements are expressed by some graph $G$? In a path through some node $m$, there are three possible orientation structures: tail-to-tail, head-to-tail, and head-to-head. Before proceeding to the general statement, we next exemplify which type of CI corresponds to each of these structures.

19 Conditional Independence Structures

Tail-to-tail ($X \leftarrow Z \rightarrow Y$):
$$p(x, y \mid z) = \frac{p(x, y, z)}{p(z)} = \frac{p(x \mid z)\, p(y \mid z)\, p(z)}{p(z)} = p(x \mid z)\, p(y \mid z).$$

Head-to-tail ($X \rightarrow Z \rightarrow Y$):
$$p(x, y \mid z) = \frac{p(y \mid z)\, p(z \mid x)\, p(x)}{p(z)} = p(y \mid z)\, p(x \mid z).$$
20 Conditional Independence Structures (cont.)

Head-to-head ($X \rightarrow Z \leftarrow Y$): in general,
$$p(x, y \mid z) \neq p(x \mid z)\, p(y \mid z).$$

Classical example: the noisy fuel gauge. Binary variables: $X$ = battery OK, $Y$ = full tank, $Z$ = fuel gauge on; $P(X{=}1) = P(Y{=}1) = 0.9$. (The table of $P(Z{=}1 \mid x, y)$ is not recoverable from this transcription.) The gauge is off ($Z = 0$); is the tank empty? $P(Y{=}0 \mid Z{=}0)$ is higher than the prior $P(Y{=}0) = 0.1$; but if the battery is also dead, $P(Y{=}0 \mid Z{=}0, X{=}0) = 0.11$, close to the prior again. The dead battery explains away the empty tank.

21 D-Separation

Graph $G = (V, E)$ and three disjoint subsets of $V$: $A$, $B$, and $C$. An (undirected) path from a node in $A$ to a node in $B$ is blocked by $C$ if it includes a node $m$ such that either:
- the arrows meet head-to-tail or tail-to-tail at $m$, and $m \in C$; or
- the arrows meet head-to-head at $m$, and $m \notin C$ and $\mathrm{desc}(m) \cap C = \emptyset$.

$C$ D-separates $A$ from $B$, and $x_A \perp_G x_B \mid x_C$, if every path from a node in $A$ to a node in $B$ is blocked by $C$.

Examples (for an example graph not recoverable from this transcription):
- $x_4 \perp_G x_5 \mid x_1$ (tail-to-tail);
- $x_1 \perp_G x_2 \mid x_4$ (head-to-head at 5, with $5 \notin \{4\}$);
- $x_5 \perp_G x_6 \mid x_{\{2,3\}}$ (tail-to-tail at 2 and head-to-tail at 3);
- $x_1 \perp_G x_2 \mid x_7$ is false (head-to-head at 5, but $7 \in \mathrm{desc}(5)$).

22 Markov Blanket

The Markov blanket of node $m$, $\mathrm{mb}(m)$, is the set of nodes, conditioned on which $x_m$ is independent of all other nodes. Which nodes belong to $\mathrm{mb}(m)$?
$$p(x_m \mid \{x_j : j \neq m\}) = \frac{\prod_{i=1}^{V} p(x_i \mid x_{\mathrm{pa}(i)})}{\sum_{x_m} \prod_{i=1}^{V} p(x_i \mid x_{\mathrm{pa}(i)})}.$$
Every factor $p(x_j \mid x_{\mathrm{pa}(j)})$ in which $x_m$ does not appear (i.e., $j \neq m$ and $m \notin \mathrm{pa}(j)$) cancels between numerator and denominator, leaving only $p(x_m \mid x_{\mathrm{pa}(m)})$ and the factors of the children of $m$. Thus
$$\mathrm{mb}(m) = \mathrm{pa}(m) \cup \mathrm{ch}(m) \cup \underbrace{\mathrm{pa}(\mathrm{ch}(m))}_{\text{coparents}}.$$
23 Undirected Graphical Models (Markov Random Fields)

MRFs are based on undirected graphs $G = (V, E)$, where each edge $\{u, v\}$ is a sub-multiset of $V$ of cardinality 2. Conditional independence statements result from a simple separation definition (simpler than D-separation). Let $A$, $B$, $C$ be disjoint subsets of $V$. If every path from a node in $A$ to a node in $B$ goes through $C$, it is said that $C$ separates $A$ from $B$.

Graph $G$ is an I-map for $X = (X_1, \ldots, X_V) \sim p(x)$ if
$$C \text{ separates } A \text{ from } B \;\Rightarrow\; x_A \perp x_B \mid x_C$$
...and a perfect I-map if $\Rightarrow$ can be replaced by $\Leftrightarrow$. A complete graph is an I-map for any $p(x)$.

24 Markov Random Fields (cont.)

Neighborhood: $N(i) = \{j : \{i, j\} \in E\}$. In an MRF, the Markov blanket is simply the neighborhood, $\mathrm{mb}(i) = N(i)$:
$$p(x_i \mid x_{V \setminus \{i\}}) = p(x_i \mid x_{N(i)}), \qquad x_i \perp x_{V \setminus (\{i\} \cup N(i))} \mid x_{N(i)}.$$

Clique: a set of mutually neighboring nodes. Maximal clique: a clique that is not contained in any other clique; $\mathcal{C}(G)$ denotes the set of maximal cliques of $G$. Examples (for an example graph not recoverable from this transcription): $\{1, 2\}$ is a (non-maximal) clique; $\{1, 2, 3, 5\}$ is not a clique (1 and 5 are not neighbors); $\{1, 2, 3\}$ and $\{4, 5, 6, 7\}$ are maximal cliques.

25 Hammersley-Clifford Theorem and Gibbs Distributions

Let $p(x) = p(x_1, \ldots, x_V)$ be such that $p(x) > 0$ for all $x$. Then $G$ is an I-map for $p(x)$ if and only if
$$p(x) = \frac{1}{Z} \prod_{C \in \mathcal{C}(G)} \psi_C(x_C) = \frac{1}{Z} \exp\Big( -\sum_{C \in \mathcal{C}(G)} E_C(x_C) \Big),$$
where
$$Z = \sum_{x} \prod_{C \in \mathcal{C}(G)} \psi_C(x_C)$$
is the partition function (with integration, rather than summation, in the case of continuous variables). $\psi_C$ is called a clique potential; $E_C$ is called a clique energy. This is known in statistical physics as a Gibbs distribution.
26 Local Conditionals in Gibbs Distributions

Consider a Gibbs distribution over a graph $G$:
$$p(x) = \frac{1}{Z} \prod_{C \in \mathcal{C}(G)} \psi_C(x_C) = \frac{1}{Z} \exp\Big( -\sum_{C \in \mathcal{C}(G)} E_C(x_C) \Big).$$
The local conditional distribution is
$$p(x_i \mid x_{N(i)}) = \frac{1}{Z(x_{N(i)})} \prod_{C :\, i \in C} \psi_C(x_C) = \frac{1}{Z(x_{N(i)})} \exp\Big( -\sum_{C :\, i \in C} E_C(x_C) \Big).$$

27 Auto Models

Auto models are based on 2-node cliques:
$$E_C(x_C) = \sum_{D :\, |D| = 2,\, D \subseteq C} E_D(x_D), \qquad \text{equivalently} \qquad \psi_C(x_C) = \prod_{D :\, |D| = 2,\, D \subseteq C} \psi_D(x_D).$$
The joint distribution then has the form
$$p(x) = \frac{1}{Z} \prod_{D :\, |D| = 2,\, D \subseteq C \in \mathcal{C}(G)} \psi_D(x_D) = \frac{1}{Z} \exp\Big( -\sum_{D :\, |D| = 2,\, D \subseteq C \in \mathcal{C}(G)} E_D(x_D) \Big).$$

28 Auto Models: Gaussian Markov Random Fields (GMRF)

$X \in \mathbb{R}^V$, Gaussian:
$$p(x) \propto \exp\Big( -\tfrac{1}{2} (x - \mu)^T A (x - \mu) \Big) = \exp\Big( -\tfrac{1}{2} \sum_{i=1}^{V} \sum_{j=1}^{V} A_{ij} (x_i - \mu_i)(x_j - \mu_j) \Big),$$
where $A$, the inverse of the covariance matrix, is symmetric. Each pairwise clique $\{i, j\}$ has energy
$$E_{\{i,j\}}(\{x_i, x_j\}) = \begin{cases} A_{ij}(x_i - \mu_i)(x_j - \mu_j), & i \neq j \\ \frac{A_{ii}}{2}(x_i - \mu_i)^2, & i = j. \end{cases}$$
Neighborhood system: $N(i) = \{j : A_{ij} \neq 0\}$.
29 Auto Models: Ising and Potts Fields

$X \in \{-1, +1\}^V$, with energy function
$$E_{\{i,j\}}(x_i, x_j) = \begin{cases} -\beta, & x_i = x_j \\ \beta, & x_i \neq x_j \end{cases} \;=\; -\beta\, x_i x_j,$$
where $\beta > 0$ (ferromagnetic interaction) or $\beta < 0$ (anti-ferromagnetic interaction). Computing $Z$ is NP-hard in general.

Generalization to $K$ states, the Potts model: $X \in \{1, \ldots, K\}^V$, with energy function
$$E_{\{i,j\}}(x_i, x_j) = \begin{cases} -\beta, & x_i = x_j \\ 0, & x_i \neq x_j. \end{cases}$$

30 Illustration of Potts Fields

Samples of a Potts model with $K = 10$; the graph is a 2-D grid, and the neighborhood of each node is the set of its 4 nearest neighbors. (Figure: one sample each for $\beta = 1.42$, $\beta = 1.44$, $\beta = 1.46$.)
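Samples like the ones in the figure are typically drawn with Gibbs sampling, using the local conditionals of slide 26. A minimal sketch for the Ising case on a small grid (grid size, $\beta$, and sweep count are arbitrary choices, not the lecture's):

```python
# Gibbs sampling from an Ising model on a 2-D grid with 4-nearest-neighbor
# cliques and pairwise energy E = -beta * x_i * x_j (free boundary).
import numpy as np

rng = np.random.default_rng(0)
n, beta, sweeps = 16, 0.9, 30
x = rng.choice([-1, 1], size=(n, n))

for _ in range(sweeps):
    for i in range(n):
        for j in range(n):
            # Sum of the (up to 4) nearest neighbors.
            s = sum(x[a, b]
                    for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                    if 0 <= a < n and 0 <= b < n)
            # Local conditional: p(x_ij = +1 | nbrs) = sigmoid(2 * beta * s)
            p = 1.0 / (1.0 + np.exp(-2.0 * beta * s))
            x[i, j] = 1 if rng.random() < p else -1

print(abs(x.mean()))  # magnetization: near 1 when spins align
```

The same loop, with a $K$-way categorical draw instead of a Bernoulli one, samples the Potts model.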
31 DGM and MRF: The Easy Case

Problem: how to write a DGM as an MRF? In some cases, there is a trivial relationship. A Markov chain
$$p(x) = p(x_1)\, p(x_2 \mid x_1) \cdots p(x_N \mid x_{N-1})$$
can obviously be written as an MRF (with $Z = 1$):
$$p(x) = \frac{1}{Z}\, \underbrace{\psi_{\{1,2\}}(x_1, x_2)}_{p(x_1)\, p(x_2 \mid x_1)}\; \underbrace{\psi_{\{2,3\}}(x_2, x_3)}_{p(x_3 \mid x_2)} \cdots \underbrace{\psi_{\{N-1,N\}}(x_{N-1}, x_N)}_{p(x_N \mid x_{N-1})}.$$

32 DGM and MRF: The General Case

Procedure:
1. Insert undirected edges between all pairs of parents of each node ("moralization");
2. Make all edges undirected;
3. Initialize all clique potentials to 1;
4. Take each factor in the DGM and multiply it into the potential of a clique that contains all the involved nodes.

Example:
$$p(x) = \underbrace{p(x_1)\, p(x_2)\, p(x_3)\, p(x_4 \mid x_1, x_2, x_3)}_{\psi_{\{1,2,3,4\}}(x_1, x_2, x_3, x_4)}.$$
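Steps 1 and 2 of the procedure (moralization) can be sketched in a few lines. This is an illustrative helper, not code from the lecture:

```python
# Moralization: from a DAG to the undirected graph whose cliques can
# absorb every CPD p(x_v | x_pa(v)).
from itertools import combinations

def moralize(parents):
    """parents: dict node -> set of parents. Returns undirected edge set."""
    und = set()
    for v, ps in parents.items():
        for p in ps:
            und.add(frozenset({p, v}))       # keep the original edges
        for a, b in combinations(ps, 2):
            und.add(frozenset({a, b}))       # "marry" the parents
    return und

# The example above: x4 has parents {1, 2, 3}, which are mutually unlinked.
parents = {1: set(), 2: set(), 3: set(), 4: {1, 2, 3}}
print(sorted(tuple(sorted(e)) for e in moralize(parents)))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

Marrying the three parents makes $\{1, 2, 3, 4\}$ a clique, so the single factor $p(x_4 \mid x_1, x_2, x_3)$ (together with the three priors) fits inside one potential, exactly as in the example.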
33 From DGM to MRF: Another Example

DGM: $p(x) = p(x_1)\, p(x_2)\, p(x_3 \mid x_1, x_2)\, p(x_4 \mid x_3)\, p(x_5 \mid x_3)\, p(x_6 \mid x_4, x_5)$.
Cliques (after moralization): $\mathcal{C} = \{\{1, 2, 3\}, \{3, 4, 5\}, \{4, 5, 6\}\}$.
MRF:
$$p(x) = \underbrace{\psi(x_1, x_2, x_3)}_{p(x_1)\, p(x_2)\, p(x_3 \mid x_1, x_2)}\; \underbrace{\psi(x_3, x_4, x_5)}_{p(x_4 \mid x_3)\, p(x_5 \mid x_3)}\; \underbrace{\psi(x_4, x_5, x_6)}_{p(x_6 \mid x_4, x_5)}.$$

34 Efficient Inference

Motivating example: compute a marginal $p(x_n)$ in a chain graph,
$$p(x) = p(x_1, \ldots, x_N) = \psi(x_1, x_2)\, \psi(x_2, x_3) \cdots \psi(x_{N-1}, x_N).$$
Naïve solution (suppose $x_1, x_2, \ldots, x_N \in \{1, \ldots, K\}$ and $Z = 1$):
$$p(x_n) = \sum_{x_1=1}^{K} \cdots \sum_{x_{n-1}=1}^{K}\, \sum_{x_{n+1}=1}^{K} \cdots \sum_{x_N=1}^{K} p(x_1, \ldots, x_N)$$
has cost $O(K^N)$ (just computing $p(x_1, \ldots, x_N)$ over all configurations has cost $O(K^N)$)... the structure of $p(x_1, \ldots, x_N)$ is not being exploited.
35 Efficient Inference (cont.)

Reorder the summations and use the structure:
$$p(x_n) = \underbrace{\Big[ \sum_{x_{n+1}=1}^{K} \psi(x_n, x_{n+1}) \cdots \Big[ \sum_{x_N=1}^{K} \psi(x_{N-1}, x_N) \Big] \cdots \Big]}_{\mu_\beta(x_n)} \cdot \underbrace{\Big[ \sum_{x_{n-1}=1}^{K} \psi(x_{n-1}, x_n) \cdots \Big[ \sum_{x_1=1}^{K} \psi(x_1, x_2) \Big] \cdots \Big]}_{\mu_\alpha(x_n)}.$$
Cost $O(N K^2)$, linear in $N$, versus $O(K^N)$, exponential in $N$. The key is the distributive property: $ab + ac = a(b + c)$.

36 Efficient Inference (cont.)

This can be seen as message passing:
$$\mu_\beta(x_n) = \sum_{x_{n+1}=1}^{K} \psi(x_n, x_{n+1}) \cdots \sum_{x_{N-1}=1}^{K} \psi(x_{N-2}, x_{N-1}) \underbrace{\sum_{x_N=1}^{K} \psi(x_{N-1}, x_N)}_{\mu_\beta(x_{N-1})}.$$
Formally, right-to-left messages:
$$\mu_\beta(x_N) = 1, \quad \text{for } x_N \in \{1, \ldots, K\};$$
$$\mu_\beta(x_j) = \sum_{x_{j+1}=1}^{K} \psi(x_j, x_{j+1})\, \mu_\beta(x_{j+1}), \quad \text{for } x_j \in \{1, \ldots, K\}.$$

37 Efficient Inference (cont.)

Similarly,
$$\mu_\alpha(x_n) = \sum_{x_{n-1}=1}^{K} \psi(x_{n-1}, x_n) \cdots \sum_{x_2=1}^{K} \psi(x_2, x_3) \underbrace{\sum_{x_1=1}^{K} \psi(x_1, x_2)}_{\mu_\alpha(x_2)}.$$
Formally, left-to-right messages:
$$\mu_\alpha(x_1) = 1, \quad \text{for } x_1 \in \{1, \ldots, K\};$$
$$\mu_\alpha(x_j) = \sum_{x_{j-1}=1}^{K} \psi(x_{j-1}, x_j)\, \mu_\alpha(x_{j-1}), \quad \text{for } x_j \in \{1, \ldots, K\}.$$
This is known as the sum-product algorithm, a.k.a. belief propagation.
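The two message recursions can be sketched and checked against brute-force marginalization. A minimal implementation with random positive potentials ($Z = 1$ is not assumed; the product of messages is simply normalized at the end):

```python
# Sum-product on a chain: backward messages mu_beta and forward messages
# mu_alpha, verified against an O(K^N) brute-force sum.
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
N, K = 5, 3
psi = [rng.random((K, K)) for _ in range(N - 1)]  # psi[j] couples x_j, x_{j+1}

def marginal(n):
    mu_b = np.ones(K)                     # mu_beta(x_{N-1}) = 1
    for j in range(N - 2, n - 1, -1):
        mu_b = psi[j] @ mu_b              # sum over x_{j+1}
    mu_a = np.ones(K)                     # mu_alpha(x_0) = 1
    for j in range(0, n):
        mu_a = psi[j].T @ mu_a            # sum over x_j
    p = mu_a * mu_b
    return p / p.sum()

def brute(n):                             # exponential-cost reference
    p = np.zeros(K)
    for x in product(range(K), repeat=N):
        p[x[n]] += np.prod([psi[j][x[j], x[j + 1]] for j in range(N - 1)])
    return p / p.sum()

print(np.allclose(marginal(2), brute(2)))   # True
```

Each matrix-vector product costs $O(K^2)$ and there are $O(N)$ of them, which is exactly the $O(NK^2)$ cost claimed above.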
38 Efficient Inference (cont.)

Another example: compute the MAP configuration, $\max_x p(x)$, with
$$p(x) = p(x_1, \ldots, x_N) = \psi(x_1, x_2)\, \psi(x_2, x_3) \cdots \psi(x_{N-1}, x_N).$$
Naïve solution (suppose $x_1, x_2, \ldots, x_N \in \{1, \ldots, K\}$ and $Z = 1$):
$$\max_x p(x) = \max_{x_1} \max_{x_2} \cdots \max_{x_N} p(x_1, \ldots, x_N)$$
has cost $O(K^N)$... the structure of $p(x_1, \ldots, x_N)$ is not being exploited.

39 Efficient Inference (cont.)

Exploit the structure:
$$\max_x p(x) = \max_{x_1} \max_{x_2} \psi(x_1, x_2) \cdots \max_{x_{N-1}} \psi(x_{N-2}, x_{N-1}) \underbrace{\max_{x_N} \psi(x_{N-1}, x_N)}_{\mu(x_{N-1})}.$$
Formally, right-to-left messages:
$$\mu(x_N) = 1, \quad \text{for } x_N \in \{1, \ldots, K\};$$
$$\mu(x_j) = \max_{x_{j+1}} \psi(x_j, x_{j+1})\, \mu(x_{j+1}), \quad \text{for } x_j \in \{1, \ldots, K\}.$$
This can also be done from left to right, using the reverse ordering.

40 Efficient Inference (cont.)

Cost $O(N K^2)$, linear in $N$, versus $O(K^N)$, exponential in $N$. The key is the distributive property: $\max\{ab, ac\} = a \max\{b, c\}$, for $a \geq 0$. Equivalently, for $\max \log p(x)$, use $\max\{a + b, a + c\} = a + \max\{b, c\}$. To compute $\arg\max_x p(x)$ we need a backward and a forward pass; why and how?

41 Efficient Inference (cont.)

MAP inference via message passing: after the backward pass, the forward pass recovers the maximizer,
$$\hat{x}_1 = \arg\max_{x_1} \Big( \max_{x_2, \ldots, x_N} p(x_1, x_2, \ldots, x_N) \Big) = \arg\max_{x_1} \mu(x_1),$$
$$\hat{x}_2 = \arg\max_{x_2} \psi(\hat{x}_1, x_2)\, \mu(x_2), \quad \ldots, \quad \hat{x}_N = \arg\max_{x_N} \psi(\hat{x}_{N-1}, x_N)\, \underbrace{\mu(x_N)}_{1} = \arg\max_{x_N} \psi(\hat{x}_{N-1}, x_N).$$
Cost $O(N K^2)$, versus $O(K^N)$. This is similar to dynamic programming and the Viterbi algorithm, and is known as the max-product (or, with logs, max-sum) algorithm. How to extend this to more general graphical structures? A general algorithm is more conveniently written for factor graphs.
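The backward max-messages plus forward backtracking can be sketched directly on the same chain setup and verified by exhaustive search:

```python
# Max-product on a chain: backward max messages, forward argmax
# backtracking, verified against brute-force enumeration.
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
N, K = 5, 3
psi = [rng.random((K, K)) for _ in range(N - 1)]

# Backward pass: mu[j][x_j] = max over x_{j+1}, ..., x_{N-1}.
mu = [None] * N
mu[N - 1] = np.ones(K)
for j in range(N - 2, -1, -1):
    mu[j] = (psi[j] * mu[j + 1]).max(axis=1)

# Forward pass: pick x_1, then each next variable given the previous choice.
xhat = [int(np.argmax(mu[0]))]
for j in range(N - 1):
    xhat.append(int(np.argmax(psi[j][xhat[-1]] * mu[j + 1])))

# Brute-force reference (cost K^N).
best = max(product(range(K), repeat=N),
           key=lambda x: np.prod([psi[j][x[j], x[j+1]] for j in range(N-1)]))
print(tuple(xhat) == best)   # True
```

This answers the "why and how" above: the backward pass computes the max-messages, and the forward pass resolves the argmax one variable at a time, conditioning each choice on the previous one.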
42 Factor Graphs

The joint pdf/pmf of $X = (X_1, \ldots, X_V)$ (or a hybrid collection of variables) is a product
$$p(x) = \prod_{s \in S} f_s(x_s), \quad \text{where } S \subseteq 2^{\{1,\ldots,V\}}, \text{ i.e., each } s \subseteq \{1, \ldots, V\}.$$
Each factor $f_s$ only depends on a subset $x_s$ of the components of $x$.

Example: $p(x_1, x_2, x_3) = f_a(x_1, x_2)\, f_b(x_1, x_2)\, f_c(x_2, x_3)\, f_d(x_3)$. Seen as an MRF, $\mathcal{C} = \{\{1, 2\}, \{2, 3\}\}$, thus $p(x) \propto \psi_{\{1,2\}}\, \psi_{\{2,3\}}$, with $\psi_{\{1,2\}} \propto f_a(x_1, x_2)\, f_b(x_1, x_2)$ and $\psi_{\{2,3\}} \propto f_c(x_2, x_3)\, f_d(x_3)$.

43 Factor Graphs (cont.)

Factor graphs are bipartite: two disjoint subsets of nodes (factors and variables), with no edges between nodes in the same subset. The mapping from an MRF to a factor graph is not unique; the mapping from a Bayesian network to a factor graph is also not unique. Neighborhood of a variable: $\mathrm{ne}(x) = \{s \in S : x \in s\}$.
44 Sum-Product Algorithm on Factor Graphs

Working example: compute a marginal $p(x) = \sum_{\mathbf{x} \setminus x} p(\mathbf{x})$. Assume the graph is a tree. Group the factors in the following way:
$$p(\mathbf{x}) = \prod_{s \in S} f_s(x_s) = \prod_{s \in \mathrm{ne}(x)} F_s(x, X_s),$$
where $X_s$ is the subset of variables in the subtree connected to $x$ via factor $s$, and $F_s(x, X_s)$ is the product of all the factors in the subtree connected to $s$.

45 Sum-Product Algorithm on Factor Graphs (cont.)

Rewrite the marginalization:
$$p(x) = \sum_{\mathbf{x} \setminus x} \prod_{s \in \mathrm{ne}(x)} F_s(x, X_s) = \prod_{s \in \mathrm{ne}(x)} \underbrace{\sum_{X_s} F_s(x, X_s)}_{\mu_{f_s \to x}(x)}.$$
With $F_s(x, X_s) = f_s(x, x_1, \ldots, x_M)\, G_1(x_1, X_{s_1}) \cdots G_M(x_M, X_{s_M})$,
$$\mu_{f_s \to x}(x) = \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \underbrace{\sum_{X_{s_m}} G_m(x_m, X_{s_m})}_{\mu_{x_m \to f_s}(x_m)}.$$

46 Sum-Product Algorithm on Factor Graphs (cont.)

Factor-to-variable (FtV) messages:
$$\mu_{f_s \to x}(x) = \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \mu_{x_m \to f_s}(x_m).$$
Computing the FtV message from $f_s$ to $x$: compute the product of the VtF messages coming from all variables except $x$; multiply by the local factor; marginalize with respect to all variables except $x$.

47 Sum-Product Algorithm on Factor Graphs (cont.)

Variable-to-factor (VtF) messages:
$$\mu_{x_m \to f_s}(x_m) = \sum_{X_{s_m}} G_m(x_m, X_{s_m}).$$
But $G_m(x_m, X_{s_m}) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} F_l(x_m, X_{m_l})$, so
$$\mu_{x_m \to f_s}(x_m) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \sum_{X_{m_l}} F_l(x_m, X_{m_l}) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \mu_{f_l \to x_m}(x_m).$$
Each VtF message is the product of the FtV messages that the variable receives from the other factors.

48 Sum-Product Algorithm on Factor Graphs (cont.)

VtF messages from leaf variables (the product is empty):
$$\mu_{x_m \to f_s}(x_m) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \mu_{f_l \to x_m}(x_m) = 1.$$
FtV messages from leaf factors:
$$\mu_{f_s \to x}(x) = \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \mu_{x_m \to f_s}(x_m) = f_s(x).$$

49 Sum-Product on Factor Graphs: Detailed Example

$$p(\mathbf{x}) = f_a(x_1, x_2)\, f_b(x_2, x_3)\, f_c(x_2, x_4)$$
$$p(x_2) = \mu_{f_a \to x_2}(x_2)\, \mu_{f_b \to x_2}(x_2)\, \mu_{f_c \to x_2}(x_2) = \Big[\sum_{x_1} f_a(x_1, x_2)\Big] \Big[\sum_{x_3} f_b(x_2, x_3)\Big] \Big[\sum_{x_4} f_c(x_2, x_4)\Big] = \sum_{x_1} \sum_{x_3} \sum_{x_4} f_a(x_1, x_2)\, f_b(x_2, x_3)\, f_c(x_2, x_4) = p(x_2).$$
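The detailed example is small enough to check in code: the three factor-to-variable messages multiply into the marginal of $x_2$, which matches brute-force summation. A sketch with random positive factors (normalized at the end, so $Z = 1$ is not assumed):

```python
# Sum-product on the 3-factor star graph: p(x) = fa(x1,x2) fb(x2,x3) fc(x2,x4).
import numpy as np
from itertools import product

rng = np.random.default_rng(3)
K = 3
fa = rng.random((K, K))   # fa[x1, x2]
fb = rng.random((K, K))   # fb[x2, x3]
fc = rng.random((K, K))   # fc[x2, x4]

# Leaf variables x1, x3, x4 send the all-ones message, so each
# factor-to-x2 message is just a marginalization of the local factor.
m_a = fa.sum(axis=0)      # sum over x1
m_b = fb.sum(axis=1)      # sum over x3
m_c = fc.sum(axis=1)      # sum over x4

p2 = m_a * m_b * m_c
p2 /= p2.sum()

brute = np.zeros(K)       # exhaustive O(K^4) reference
for x1, x2, x3, x4 in product(range(K), repeat=4):
    brute[x2] += fa[x1, x2] * fb[x2, x3] * fc[x2, x4]
brute /= brute.sum()

print(np.allclose(p2, brute))   # True
```

Because the graph is a tree with $x_2$ at the center, a single round of messages suffices, exactly as the derivation on this slide shows.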
50 Max-Sum Algorithm on Factor Graphs

Message passing for MAP:
$$\max_x \log p(x) = \max_x \sum_{s \in S} \log f_s(x).$$
Distributive property: $\max\{a + b, a + c\} = a + \max\{b, c\}$.

Max-sum messages:
$$\mu_{f \to x}(x) = \max_{x_1, \ldots, x_M} \Big[ \log f(x, x_1, \ldots, x_M) + \sum_{m \in \mathrm{ne}(f) \setminus x} \mu_{x_m \to f}(x_m) \Big]$$
$$\mu_{x \to f}(x) = \sum_{l \in \mathrm{ne}(x) \setminus f} \mu_{f_l \to x}(x).$$
At leaf variables and factors: $\mu_{f \to x}(x) = \log f(x)$ and $\mu_{x \to f}(x) = 0$.

51 Recommended Reading

C. Bishop, Pattern Recognition and Machine Learning, Springer, 2006 (this lecture was very much based on chapter 8 of this book). Chapter 8 is freely available at um/people/cmbishop/prml/pdf/bishop-prml-sample.pdf

K. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012 (chapters 10 and 19).
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More information6.867 Machine learning, lecture 23 (Jaakkola)
Lecture topics: Markov Random Fields Probabilistic inference Markov Random Fields We will briefly go over undirected graphical models or Markov Random Fields (MRFs) as they will be needed in the context
More informationStatistical Approaches to Learning and Discovery
Statistical Approaches to Learning and Discovery Graphical Models Zoubin Ghahramani & Teddy Seidenfeld zoubin@cs.cmu.edu & teddy@stat.cmu.edu CALD / CS / Statistics / Philosophy Carnegie Mellon University
More informationUndirected Graphical Models
Undirected Graphical Models 1 Conditional Independence Graphs Let G = (V, E) be an undirected graph with vertex set V and edge set E, and let A, B, and C be subsets of vertices. We say that C separates
More information4 : Exact Inference: Variable Elimination
10-708: Probabilistic Graphical Models 10-708, Spring 2014 4 : Exact Inference: Variable Elimination Lecturer: Eric P. ing Scribes: Soumya Batra, Pradeep Dasigi, Manzil Zaheer 1 Probabilistic Inference
More information2 : Directed GMs: Bayesian Networks
10-708: Probabilistic Graphical Models 10-708, Spring 2017 2 : Directed GMs: Bayesian Networks Lecturer: Eric P. Xing Scribes: Jayanth Koushik, Hiroaki Hayashi, Christian Perez Topic: Directed GMs 1 Types
More informationLinear Dynamical Systems
Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations
More informationRapid Introduction to Machine Learning/ Deep Learning
Rapid Introduction to Machine Learning/ Deep Learning Hyeong In Choi Seoul National University 1/32 Lecture 5a Bayesian network April 14, 2016 2/32 Table of contents 1 1. Objectives of Lecture 5a 2 2.Bayesian
More informationBayesian Machine Learning - Lecture 7
Bayesian Machine Learning - Lecture 7 Guido Sanguinetti Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh gsanguin@inf.ed.ac.uk March 4, 2015 Today s lecture 1
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationProbabilistic Graphical Models
School of Computer Science Probabilistic Graphical Models Variational Inference II: Mean Field Method and Variational Principle Junming Yin Lecture 15, March 7, 2012 X 1 X 1 X 1 X 1 X 2 X 3 X 2 X 2 X 3
More informationGenerative and Discriminative Approaches to Graphical Models CMSC Topics in AI
Generative and Discriminative Approaches to Graphical Models CMSC 35900 Topics in AI Lecture 2 Yasemin Altun January 26, 2007 Review of Inference on Graphical Models Elimination algorithm finds single
More informationProbabilistic Graphical Models: Representation and Inference
Probabilistic Graphical Models: Representation and Inference Aaron C. Courville Université de Montréal Note: Material for the slides is taken directly from a presentation prepared by Andrew Moore 1 Overview
More informationDynamic Approaches: The Hidden Markov Model
Dynamic Approaches: The Hidden Markov Model Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Inference as Message
More informationIntroduction to Bayesian Learning
Course Information Introduction Introduction to Bayesian Learning Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Apprendimento Automatico: Fondamenti - A.A. 2016/2017 Outline
More informationJunction Tree, BP and Variational Methods
Junction Tree, BP and Variational Methods Adrian Weller MLSALT4 Lecture Feb 21, 2018 With thanks to David Sontag (MIT) and Tony Jebara (Columbia) for use of many slides and illustrations For more information,
More information9 Forward-backward algorithm, sum-product on factor graphs
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 9 Forward-backward algorithm, sum-product on factor graphs The previous
More informationMassachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Problem Set 3 Issued: Thursday, September 25, 2014 Due: Thursday,
More informationVariational Inference (11/04/13)
STA561: Probabilistic machine learning Variational Inference (11/04/13) Lecturer: Barbara Engelhardt Scribes: Matt Dickenson, Alireza Samany, Tracy Schifeling 1 Introduction In this lecture we will further
More informationConditional Random Field
Introduction Linear-Chain General Specific Implementations Conclusions Corso di Elaborazione del Linguaggio Naturale Pisa, May, 2011 Introduction Linear-Chain General Specific Implementations Conclusions
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Undirected Graphical Models Mark Schmidt University of British Columbia Winter 2016 Admin Assignment 3: 2 late days to hand it in today, Thursday is final day. Assignment 4:
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 4 Learning Bayesian Networks CS/CNS/EE 155 Andreas Krause Announcements Another TA: Hongchao Zhou Please fill out the questionnaire about recitations Homework 1 out.
More informationIntroduction to Probabilistic Graphical Models
Introduction to Probabilistic Graphical Models Franz Pernkopf, Robert Peharz, Sebastian Tschiatschek Graz University of Technology, Laboratory of Signal Processing and Speech Communication Inffeldgasse
More informationInference in Graphical Models Variable Elimination and Message Passing Algorithm
Inference in Graphical Models Variable Elimination and Message Passing lgorithm Le Song Machine Learning II: dvanced Topics SE 8803ML, Spring 2012 onditional Independence ssumptions Local Markov ssumption
More informationLecture 8: Bayesian Networks
Lecture 8: Bayesian Networks Bayesian Networks Inference in Bayesian Networks COMP-652 and ECSE 608, Lecture 8 - January 31, 2017 1 Bayes nets P(E) E=1 E=0 0.005 0.995 E B P(B) B=1 B=0 0.01 0.99 E=0 E=1
More informationBased on slides by Richard Zemel
CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we
More informationUndirected graphical models
Undirected graphical models Semantics of probabilistic models over undirected graphs Parameters of undirected models Example applications COMP-652 and ECSE-608, February 16, 2017 1 Undirected graphical
More information13 : Variational Inference: Loopy Belief Propagation and Mean Field
10-708: Probabilistic Graphical Models 10-708, Spring 2012 13 : Variational Inference: Loopy Belief Propagation and Mean Field Lecturer: Eric P. Xing Scribes: Peter Schulam and William Wang 1 Introduction
More informationGraphical Models - Part II
Graphical Models - Part II Bishop PRML Ch. 8 Alireza Ghane Outline Probabilistic Models Bayesian Networks Markov Random Fields Inference Graphical Models Alireza Ghane / Greg Mori 1 Outline Probabilistic
More informationRepresentation of undirected GM. Kayhan Batmanghelich
Representation of undirected GM Kayhan Batmanghelich Review Review: Directed Graphical Model Represent distribution of the form ny p(x 1,,X n = p(x i (X i i=1 Factorizes in terms of local conditional probabilities
More informationBasic math for biology
Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood
More informationRapid Introduction to Machine Learning/ Deep Learning
Rapid Introduction to Machine Learning/ Deep Learning Hyeong In Choi Seoul National University 1/24 Lecture 5b Markov random field (MRF) November 13, 2015 2/24 Table of contents 1 1. Objectives of Lecture
More informationp(x) p(x Z) = y p(y X, Z) = αp(x Y, Z)p(Y Z)
Graphical Models Foundations of Data Analysis Torsten Möller Möller/Mori 1 Reading Chapter 8 Pattern Recognition and Machine Learning by Bishop some slides from Russell and Norvig AIMA2e Möller/Mori 2
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture Notes Fall 2009 November, 2009 Byoung-Ta Zhang School of Computer Science and Engineering & Cognitive Science, Brain Science, and Bioinformatics Seoul National University
More informationGraphical Models - Part I
Graphical Models - Part I Oliver Schulte - CMPT 726 Bishop PRML Ch. 8, some slides from Russell and Norvig AIMA2e Outline Probabilistic Models Bayesian Networks Markov Random Fields Inference Outline Probabilistic
More informationDirected Graphical Models or Bayesian Networks
Directed Graphical Models or Bayesian Networks Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Bayesian Networks One of the most exciting recent advancements in statistical AI Compact
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 9: Variational Inference Relaxations Volkan Cevher, Matthias Seeger Ecole Polytechnique Fédérale de Lausanne 24/10/2011 (EPFL) Graphical Models 24/10/2011 1 / 15
More informationA brief introduction to Conditional Random Fields
A brief introduction to Conditional Random Fields Mark Johnson Macquarie University April, 2005, updated October 2010 1 Talk outline Graphical models Maximum likelihood and maximum conditional likelihood
More informationSTA 414/2104: Machine Learning
STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far
More informationGraphical models and causality: Directed acyclic graphs (DAGs) and conditional (in)dependence
Graphical models and causality: Directed acyclic graphs (DAGs) and conditional (in)dependence General overview Introduction Directed acyclic graphs (DAGs) and conditional independence DAGs and causal effects
More informationBayesian Networks Inference with Probabilistic Graphical Models
4190.408 2016-Spring Bayesian Networks Inference with Probabilistic Graphical Models Byoung-Tak Zhang intelligence Lab Seoul National University 4190.408 Artificial (2016-Spring) 1 Machine Learning? Learning
More informationExample: multivariate Gaussian Distribution
School of omputer Science Probabilistic Graphical Models Representation of undirected GM (continued) Eric Xing Lecture 3, September 16, 2009 Reading: KF-chap4 Eric Xing @ MU, 2005-2009 1 Example: multivariate
More informationBayesian Machine Learning
Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 4 Occam s Razor, Model Construction, and Directed Graphical Models https://people.orie.cornell.edu/andrew/orie6741 Cornell University September
More informationCOMS 4771 Probabilistic Reasoning via Graphical Models. Nakul Verma
COMS 4771 Probabilistic Reasoning via Graphical Models Nakul Verma Last time Dimensionality Reduction Linear vs non-linear Dimensionality Reduction Principal Component Analysis (PCA) Non-linear methods
More informationProbabilistic Machine Learning
Probabilistic Machine Learning Bayesian Nets, MCMC, and more Marek Petrik 4/18/2017 Based on: P. Murphy, K. (2012). Machine Learning: A Probabilistic Perspective. Chapter 10. Conditional Independence Independent
More informationBayesian Networks Introduction to Machine Learning. Matt Gormley Lecture 24 April 9, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Bayesian Networks Matt Gormley Lecture 24 April 9, 2018 1 Homework 7: HMMs Reminders
More informationECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning
ECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning Topics Markov Random Fields: Representation Conditional Random Fields Log-Linear Models Readings: KF
More informationLearning P-maps Param. Learning
Readings: K&F: 3.3, 3.4, 16.1, 16.2, 16.3, 16.4 Learning P-maps Param. Learning Graphical Models 10708 Carlos Guestrin Carnegie Mellon University September 24 th, 2008 10-708 Carlos Guestrin 2006-2008
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate
More informationMACHINE LEARNING 2 UGM,HMMS Lecture 7
LOREM I P S U M Royal Institute of Technology MACHINE LEARNING 2 UGM,HMMS Lecture 7 THIS LECTURE DGM semantics UGM De-noising HMMs Applications (interesting probabilities) DP for generation probability
More informationProbabilistic Models Bayesian Networks Markov Random Fields Inference. Graphical Models. Foundations of Data Analysis
Graphical Models Foundations of Data Analysis Torsten Möller and Thomas Torsney-Weir Möller/Mori 1 Reading Chapter 8 Pattern Recognition and Machine Learning by Bishop some slides from Russell and Norvig
More informationConditional Independence
Conditional Independence Sargur Srihari srihari@cedar.buffalo.edu 1 Conditional Independence Topics 1. What is Conditional Independence? Factorization of probability distribution into marginals 2. Why
More informationBayesian Networks (Part II)
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Bayesian Networks (Part II) Graphical Model Readings: Murphy 10 10.2.1 Bishop 8.1,
More informationBayesian Networks. Alan Ri2er
Bayesian Networks Alan Ri2er Problem: Non- IID Data Most real- world data is not IID (like coin flips) MulBple correlated variables Examples: Pixels in an image Words in a document Genes in a microarray
More informationLecture 17: May 29, 2002
EE596 Pat. Recog. II: Introduction to Graphical Models University of Washington Spring 2000 Dept. of Electrical Engineering Lecture 17: May 29, 2002 Lecturer: Jeff ilmes Scribe: Kurt Partridge, Salvador
More informationGibbs Fields & Markov Random Fields
Statistical Techniques in Robotics (16-831, F10) Lecture#7 (Tuesday September 21) Gibbs Fields & Markov Random Fields Lecturer: Drew Bagnell Scribe: Bradford Neuman 1 1 Gibbs Fields Like a Bayes Net, a
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationLecture 15. Probabilistic Models on Graph
Lecture 15. Probabilistic Models on Graph Prof. Alan Yuille Spring 2014 1 Introduction We discuss how to define probabilistic models that use richly structured probability distributions and describe how
More informationVariable Elimination: Algorithm
Variable Elimination: Algorithm Sargur srihari@cedar.buffalo.edu 1 Topics 1. Types of Inference Algorithms 2. Variable Elimination: the Basic ideas 3. Variable Elimination Sum-Product VE Algorithm Sum-Product
More informationAlternative Parameterizations of Markov Networks. Sargur Srihari
Alternative Parameterizations of Markov Networks Sargur srihari@cedar.buffalo.edu 1 Topics Three types of parameterization 1. Gibbs Parameterization 2. Factor Graphs 3. Log-linear Models with Energy functions
More informationCOS402- Artificial Intelligence Fall Lecture 10: Bayesian Networks & Exact Inference
COS402- Artificial Intelligence Fall 2015 Lecture 10: Bayesian Networks & Exact Inference Outline Logical inference and probabilistic inference Independence and conditional independence Bayes Nets Semantics
More informationCours 7 12th November 2014
Sum Product Algorithm and Hidden Markov Model 2014/2015 Cours 7 12th November 2014 Enseignant: Francis Bach Scribe: Pauline Luc, Mathieu Andreux 7.1 Sum Product Algorithm 7.1.1 Motivations Inference, along
More information6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008
MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
More information