Graphical models. Sunita Sarawagi, IIT Bombay

Probabilistic modeling
Given: several variables x_1, ..., x_n, with n large.
Task: build a joint distribution function Pr(x_1, ..., x_n).
Goal: answer several kinds of projection queries on the distribution.
Basic premise: the explicit joint distribution is dauntingly large, while the queries are simple marginals (sum or max) over the joint distribution.

Example
Variables are attributes of people: Age (10 ranges), Income (7 scales), Experience (7 scales), Degree (3 scales), Location (30 places).
An explicit joint distribution over all columns is not tractable: the number of combinations is 10 x 7 x 7 x 3 x 30 = 44,100.
Queries: estimate the fraction of people with Income > 200K and Degree = Bachelors; or with Income < 200K, Degree = PhD, and Experience > 10 years. Many, many more.

Alternatives to an explicit joint distribution
- Assume all columns are independent of each other: a bad assumption.
- Use data to detect pairs of highly correlated columns and estimate their pairwise frequencies.
  - Many highly correlated pairs: income and age, income and experience, age and experience.
  - Ad hoc methods of combining these into a single estimate.
- Go beyond pairwise correlations to conditional independencies:
  - income and age are dependent, but income is independent of age given experience;
  - experience and degree are dependent, but experience is independent of degree given income.
Graphical models make these independencies explicit and use them to build an efficient joint distribution.

Graphical models
Model the joint distribution over several variables as a product of smaller factors that is:
1. Intuitive to represent and visualize.
   - Graph: represents the structure of dependencies.
   - Potentials over subsets: quantify the dependencies.
2. Efficient to query: given values of any variable subset, reason about the probability distribution of the others.
   - Many efficient exact and approximate inference algorithms.
Graphical models = graph theory + probability theory.

Graphical models in use
- Roots in statistical physics for modeling interacting atoms in gases and solids (circa 1900).
- Early usage in genetics for modeling properties of species (circa 1920).
- AI: expert systems (1970s-80s).
- Now many new applications:
  - Error-correcting codes: turbo codes, an impressive success story (1990s).
  - Robotics and vision: image denoising, robot navigation.
  - Text mining: information extraction, duplicate elimination, hypertext classification, help systems.
  - Bioinformatics: secondary structure prediction, gene discovery.
  - Data mining: probabilistic classification and clustering.

Part I: Outline
1. Representation: directed graphical models (Bayesian networks); undirected graphical models.
2. Inference: queries; exact inference on chains; variable elimination on general graphs; junction trees.
3. Approximate inference: generalized belief propagation; sampling (Gibbs, particle filters).
4. Constructing a graphical model: graph structure; parameters in potentials.
5. General framework for parameter learning in graphical models.
6. References.

Representation
Structure of a graphical model: Graph + Potentials.
Graph:
- Nodes: variables x = x_1, ..., x_n.
  - Continuous: sensor temperatures, income.
  - Discrete: Degree (one of Bachelors, Masters, PhD), levels of age, labels of words.
- Edges: direct interaction.
  - Directed edges: Bayesian networks.
  - Undirected edges: Markov random fields.
(Figure: a directed and an undirected example graph over Location, Age, Degree, Experience, Income.)

Representation
Potentials ψ_c(x_c): scores for assignments of values to subsets c of directly interacting variables.
- Which subsets? What do the potentials mean? Different for directed and undirected graphs.
Probability factorizes as a product of potentials:
$$\Pr(x = x_1, \dots, x_n) \propto \prod_S \psi_S(x_S)$$

Directed graphical models: Bayesian networks
- Graph G: directed acyclic.
- Parents of a node: Pa(x_i) = set of nodes in G pointing to x_i.
- Potentials: defined at each node in terms of its parents, $\psi_i(x_i, \mathrm{Pa}(x_i)) = \Pr(x_i \mid \mathrm{Pa}(x_i))$.
Probability distribution:
$$\Pr(x_1, \dots, x_n) = \prod_{i=1}^{n} \Pr(x_i \mid \mathrm{Pa}(x_i))$$

Example of a directed graph
Variables: Location (L), Age (A), Degree (D), Experience (E), Income (I).
- ψ_1(L) = Pr(L): a table over {NY, CA, London, Other}.
- ψ_2(A) = Pr(A): a table over age ranges, or a Gaussian distribution with (µ, σ) = (35, 10).
- ψ_3(E, A) = Pr(E | A): a table of experience ranges conditioned on age.
- ψ_4(I, E, D) = Pr(I | D, E): a 3-dimensional table, or a histogram approximation.
Probability distribution:
$$\Pr(x = (L, D, I, A, E)) = \Pr(L)\,\Pr(D)\,\Pr(A)\,\Pr(E \mid A)\,\Pr(I \mid D, E)$$
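The factored form above can be evaluated directly once each conditional table is specified. Below is a minimal sketch, with made-up illustrative numbers (the slide's actual table entries are not reproduced) and with Age, Experience, and Income coarsened to two levels each.

```python
# Hypothetical CPTs for Pr(L), Pr(D), Pr(A), Pr(E|A), Pr(I|D,E); values are illustrative only.
pr_L = {"NY": 0.3, "CA": 0.3, "London": 0.2, "Other": 0.2}
pr_D = {"Bachelors": 0.6, "Masters": 0.3, "PhD": 0.1}
pr_A = {"young": 0.5, "old": 0.5}
pr_E_given_A = {"young": {"low": 0.8, "high": 0.2},
                "old":   {"low": 0.3, "high": 0.7}}
pr_I_given_DE = {("Bachelors", "low"): {"low": 0.8, "high": 0.2},
                 ("Bachelors", "high"): {"low": 0.5, "high": 0.5},
                 ("Masters", "low"): {"low": 0.7, "high": 0.3},
                 ("Masters", "high"): {"low": 0.4, "high": 0.6},
                 ("PhD", "low"): {"low": 0.6, "high": 0.4},
                 ("PhD", "high"): {"low": 0.2, "high": 0.8}}

def joint(L, D, A, E, I):
    """Pr(L, D, A, E, I) = Pr(L) Pr(D) Pr(A) Pr(E|A) Pr(I|D,E)."""
    return pr_L[L] * pr_D[D] * pr_A[A] * pr_E_given_A[A][E] * pr_I_given_DE[(D, E)][I]

print(joint("NY", "PhD", "old", "high", "high"))  # probability of one full assignment
```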

Conditional independencies
Given three sets of variables X, Y, Z: X is conditionally independent of Y given Z, written $X \perp Y \mid Z$, iff $\Pr(X \mid Y, Z) = \Pr(X \mid Z)$.
Local conditional independencies in a BN: for each $x_i$, $x_i \perp \mathrm{ND}(x_i) \mid \mathrm{Pa}(x_i)$, where ND(x_i) denotes the non-descendants of x_i.
In the example graph over Location, Age, Degree, Experience, Income:
- $L \perp E, D, A, I$
- $A \perp L, D$
- $E \perp L, D \mid A$
- $I \perp A \mid E, D$

CIs and factorization
Theorem. Local CI implies factorization.
Proof. Let x_1, x_2, ..., x_n be topologically ordered (parents before children).
Local CI: $\Pr(x_i \mid x_1, \dots, x_{i-1}) = \Pr(x_i \mid \mathrm{Pa}(x_i))$.
Chain rule: $\Pr(x_1, \dots, x_n) = \prod_i \Pr(x_i \mid x_1, \dots, x_{i-1}) = \prod_i \Pr(x_i \mid \mathrm{Pa}(x_i))$.

Global CIs in a BN
Given three sets of variables X, Y, Z: if Z d-separates X from Y in the BN, then $X \perp Y \mid Z$.
In a directed graph H, Z d-separates X from Y if every path P from any node of X to any node of Y is blocked by Z. A path P is blocked by Z when it contains a node $x_i$ such that one of the following holds:
1. $\dots \rightarrow x_i \rightarrow \dots$ (head-to-tail) and $x_i \in Z$
2. $\dots \leftarrow x_i \leftarrow \dots$ (head-to-tail) and $x_i \in Z$
3. $\dots \leftarrow x_i \rightarrow \dots$ (tail-to-tail) and $x_i \in Z$
4. $\dots \rightarrow x_i \leftarrow \dots$ (head-to-head) and $x_i \notin Z$ and $\mathrm{Desc}(x_i) \cap Z = \emptyset$
Theorem. The d-separation test identifies the complete set of conditional independencies that hold in all distributions that conform to a given Bayesian network.

Popular Bayesian networks
- Hidden Markov Models: speech recognition, information extraction.
  (Figure: a chain of state variables y_1, ..., y_7 with observation variables x_1, ..., x_7.)
  - State variables: discrete (phoneme, entity tag).
  - Observation variables: continuous (speech waveform) or discrete (word).
- Kalman filters: state variables are continuous. Discussed later.
- Topic models for text data:
  1. A principled mechanism to categorize multi-labeled text documents while incorporating priors in a flexible generative framework.
  2. Application: news tracking.
- QMR (Quick Medical Reference) system.
- PRMs: probabilistic relational networks.

Undirected graphical models
- Graph G: arbitrary undirected graph.
- Useful when variables interact symmetrically, with no natural parent-child relationship. Example: labeling the pixels of an image.
- Potentials ψ_C(y_C) defined on arbitrary cliques C of G.
- ψ_C(y_C): any arbitrary non-negative value; it cannot be interpreted as a probability.
Probability distribution:
$$\Pr(y_1, \dots, y_n) = \frac{1}{Z} \prod_{C \in G} \psi_C(y_C), \qquad Z = \sum_y \prod_{C \in G} \psi_C(y_C) \;\text{(the partition function)}$$

Example
Pixels y_1, ..., y_9 on a 3 x 3 grid; y_i = 1 if pixel i is part of the foreground, 0 otherwise.
- Node potentials: ψ_1(0) = 4, ψ_1(1) = 1; ψ_2(0) = 2, ψ_2(1) = 3; ...; ψ_9(0) = 1, ψ_9(1) = 1.
- Edge potentials, the same for all edges: ψ(0,0) = 5, ψ(1,1) = 5, ψ(1,0) = 1, ψ(0,1) = 1.
Probability:
$$\Pr(y_1, \dots, y_9) \propto \prod_{k=1}^{9} \psi_k(y_k) \prod_{(i,j) \in E(G)} \psi(y_i, y_j)$$
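A small sketch of this grid model, assuming a 3 x 3 lattice of binary pixels; the node potentials the slide elides with "..." are filled in with placeholder values, and the partition function is computed by brute force, which is feasible only at this toy size.

```python
import itertools

# Node potentials psi_k(0), psi_k(1); entries after psi_2 are placeholders for the elided values.
node_pot = [(4, 1), (2, 3)] + [(1, 1)] * 7
# Shared edge potential psi(y_i, y_j).
edge_pot = {(0, 0): 5, (1, 1): 5, (0, 1): 1, (1, 0): 1}
# Horizontal and vertical neighbor edges of the 3x3 grid (pixels indexed 0..8).
edges = [(r * 3 + c, r * 3 + c + 1) for r in range(3) for c in range(2)] + \
        [(r * 3 + c, (r + 1) * 3 + c) for r in range(2) for c in range(3)]

def unnormalized(y):
    """Product of all node and edge potentials for the assignment y."""
    score = 1.0
    for k, yk in enumerate(y):
        score *= node_pot[k][yk]
    for i, j in edges:
        score *= edge_pot[(y[i], y[j])]
    return score

# Brute-force partition function Z over all 2^9 assignments, then one normalized probability.
Z = sum(unnormalized(y) for y in itertools.product((0, 1), repeat=9))
print(unnormalized((0,) * 9) / Z)   # Pr(all pixels labeled background)
```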

Conditional independencies (CIs) in an undirected graphical model
Let V = {y_1, ..., y_n}.
1. Local CI: $y_i \perp V \setminus (\mathrm{ne}(y_i) \cup \{y_i\}) \mid \mathrm{ne}(y_i)$.
2. Pairwise CI: $y_i \perp y_j \mid V \setminus \{y_i, y_j\}$ if the edge (y_i, y_j) does not exist.
3. Global CI: $X \perp Y \mid Z$ if Z separates X and Y in the graph.
The three are equivalent when the distribution is positive. On the 3 x 3 grid:
1. $y_1 \perp \{y_3, y_5, y_6, y_7, y_8, y_9\} \mid \{y_2, y_4\}$
2. $y_1 \perp y_3 \mid \{y_2, y_4, y_5, y_6, y_7, y_8, y_9\}$
3. $\{y_1, y_2, y_3\} \perp \{y_7, y_8, y_9\} \mid \{y_4, y_5, y_6\}$

Relationship between Local-CI and Global-CI
Let G be an undirected graph over nodes V = {x_1, ..., x_n} and let P(x_1, ..., x_n) be a distribution. If P satisfies the global CIs of G, then P also satisfies the local CIs of G, but the reverse is not always true. We show this with an example.
Consider a distribution over 5 binary variables P(x_1, ..., x_5) where x_1 = x_2, x_4 = x_5, and x_3 = x_2 AND x_4. Let G be the chain x_1 - x_2 - x_3 - x_4 - x_5. All 5 local CIs of the graph, e.g. $x_1 \perp \{x_3, x_4, x_5\} \mid x_2$, hold in P. However, the global CI $x_2 \perp x_4 \mid x_3$ does not hold.

Relationship between Local-CI and Pairwise-CI
Let G be an undirected graph over nodes V = {x_1, ..., x_n} and let P(x_1, ..., x_n) be a distribution. If P satisfies the local CIs of G, then P also satisfies the pairwise CIs of G, but the reverse is not always true. We show this with an example.
Consider a distribution over 3 binary variables P(x_1, x_2, x_3) where x_1 = x_2 = x_3; that is, P(x_1, x_2, x_3) = 1/2 when all three are equal and 0 otherwise. Let G be the graph with the single edge x_1 - x_2 and x_3 disconnected. Both pairwise CIs of the graph, $x_1 \perp x_3 \mid x_2$ and $x_2 \perp x_3 \mid x_1$, hold in P. However, the local CI $x_3 \perp \{x_1, x_2\}$ (x_3 has no neighbors in G) does not hold.

Factorization implies Global-CI
Theorem. Let G be an undirected graph over nodes V = {x_1, ..., x_n} and let P(x_1, ..., x_n) be a distribution. If P can be factorized as per the cliques of G, then P also satisfies the global CIs of G.
Proof. Discussed in class.

Global-CI does not imply factorization
Consider a distribution over 4 binary variables P(x_1, x_2, x_3, x_4). Let G be the 4-cycle x_1 - x_2 - x_3 - x_4 - x_1. Let P(x_1, x_2, x_3, x_4) = 1/8 when (x_1, x_2, x_3, x_4) takes a value from the set {0000, 1000, 1100, 1110, 1111, 0111, 0011, 0001}, and 0 in all other cases. One can painfully check that all the global CIs of the graph, e.g. $x_1 \perp x_3 \mid \{x_2, x_4\}$, hold in P. Now look at factorization. The factors correspond to the edges of G, e.g. ψ(x_1, x_2). For a factorization to reproduce the eight assignments above, each of the four possible assignments of every edge factor must get a positive value, but then the product cannot represent the zero probability of cases like (x_1, x_2, x_3, x_4) = 0101.

Factorization and CIs
Theorem (Hammersley-Clifford). If a positive distribution P(x_1, ..., x_n) conforms to the pairwise CIs of a UDGM G, then it can be factorized over the cliques C of G as
$$P(x_1, \dots, x_n) \propto \prod_{C \in G} \psi_C(x_C)$$
Proof. Skipped.

Popular undirected graphical models
- Interacting atoms in gases and solids (circa 1900).
- Markov random fields in vision, for image segmentation.
- Conditional random fields, for information extraction.
- Social networks.
- Bioinformatics: annotating active sites in protein molecules.

Comparing directed and undirected graphs
- Expressiveness: some distributions can only be expressed in one and not the other. (Figure: example graphs.)
- Potentials. Directed: conditional probabilities, more intuitive. Undirected: arbitrary scores, easy to set.
- Dependence structure. Directed: the more complicated d-separation test. Undirected: graph separation, $A \perp B \mid C$ iff C separates A and B in G.
- Often the application makes the choice clear. Directed: causality. Undirected: symmetric interactions.

Inference queries
1. Marginal probability queries over a small subset of variables, e.g. find Pr(Income = High & Degree = PhD), or find Pr(pixel y_9 = 1):
$$\Pr(x_1) = \sum_{x_2, \dots, x_n} \Pr(x_1, \dots, x_n)$$
2. Most likely labels of the remaining variables (MAP queries), e.g. find the most likely entity labels of all words in a sentence, or the likely temperatures at sensors in a room:
$$x^* = \arg\max_{x_1, \dots, x_n} \Pr(x_1, \dots, x_n)$$

Exact inference on chains
Given: a chain graph with potentials ψ_i(y_i, y_{i+1}), so that $\Pr(y_1, \dots, y_n) = \prod_i \psi_i(y_i, y_{i+1})$.
Find: Pr(y_i) for any i, say Pr(y_5 = 1).
Naive exact method: $\Pr(y_5 = 1) = \sum_{y_1, \dots, y_4} \Pr(y_1, \dots, y_4, 1)$ requires an exponential number of summations. A more efficient alternative follows.

Exact inference on chains
$$\begin{aligned}
\Pr(y_5 = 1) &= \sum_{y_1, \dots, y_4} \Pr(y_1, \dots, y_4, 1) \\
&= \sum_{y_1} \sum_{y_2} \sum_{y_3} \sum_{y_4} \psi_1(y_1, y_2)\,\psi_2(y_2, y_3)\,\psi_3(y_3, y_4)\,\psi_4(y_4, 1) \\
&= \sum_{y_1} \sum_{y_2} \psi_1(y_1, y_2) \sum_{y_3} \psi_2(y_2, y_3) \sum_{y_4} \psi_3(y_3, y_4)\,\psi_4(y_4, 1) \\
&= \sum_{y_1} \sum_{y_2} \psi_1(y_1, y_2) \sum_{y_3} \psi_2(y_2, y_3)\,B_3(y_3) \\
&= \sum_{y_1} \sum_{y_2} \psi_1(y_1, y_2)\,B_2(y_2) \\
&= \sum_{y_1} B_1(y_1)
\end{aligned}$$
An alternative view: a flow of beliefs B_i(.) from node i+1 to node i.
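The same right-to-left summation can be written as a few matrix-vector products. A minimal sketch, assuming binary labels and made-up potential tables (the slide does not specify numeric values):

```python
import numpy as np

# Pairwise potentials psi_1..psi_4 as 2x2 tables; the values here are illustrative only.
psi = [np.array([[5.0, 1.0], [1.0, 5.0]]) for _ in range(4)]

def unnormalized_marginal_y5(v):
    """Sum out y_4, y_3, y_2, y_1 right to left, passing beliefs B_i(.) backwards."""
    belief = psi[3][:, v]            # start from the last factor: psi_4(y_4, v)
    for i in (2, 1, 0):              # B_i(y_i) = sum_{y_{i+1}} psi_i(y_i, y_{i+1}) B_{i+1}(y_{i+1})
        belief = psi[i] @ belief
    return belief.sum()              # finally sum over y_1

u0, u1 = unnormalized_marginal_y5(0), unnormalized_marginal_y5(1)
print(u1 / (u0 + u1))                # Pr(y_5 = 1)
```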

Adding evidence
Given fixed values of a subset of variables x_e (the evidence), find:
1. Marginal probability queries over a small subset of variables, e.g. find Pr(Income = High | Degree = PhD):
$$\Pr(x_1) = \sum_{x_2, \dots, x_m} \Pr(x_1, \dots, x_n \mid x_e)$$
2. Most likely labels of the remaining variables (MAP queries), e.g. find the likely temperatures at sensors in a room given readings from a subset of them:
$$x^* = \arg\max_{x_1, \dots, x_m} \Pr(x_1, \dots, x_n \mid x_e)$$
Evidence is easy to add: just change the potentials.

Inference in HMMs
Given:
- the HMM graph with state variables y_1, ..., y_n and observation variables x_1, ..., x_n;
- potentials Pr(y_i | y_{i-1}) and Pr(x_i | y_i);
- evidence variables x = x_1, ..., x_n = o_1, ..., o_n.
Find the most likely values of the hidden state variables:
$$y^* = y_1^*, \dots, y_n^* = \arg\max_y \Pr(y \mid x = o)$$
Define $\psi_i(y_{i-1}, y_i) = \Pr(y_i \mid y_{i-1}) \Pr(x_i = o_i \mid y_i)$. The reduced graph is a single chain over the y nodes. The algorithm is the same as before, just replace Sum with Max. This is the well-known Viterbi algorithm.
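A compact Viterbi sketch for this reduced chain, using max-product in log space with backpointers; the transition and emission tables and the observation sequence are made up for illustration.

```python
import numpy as np

trans = np.array([[0.7, 0.3], [0.4, 0.6]])   # Pr(y_i | y_{i-1}), two hidden states
emit  = np.array([[0.9, 0.1], [0.2, 0.8]])   # Pr(x_i | y_i), two observation symbols
prior = np.array([0.5, 0.5])                 # Pr(y_1)
obs = [0, 1, 1, 0, 1]                        # o_1 ... o_n

def viterbi(obs):
    n, k = len(obs), len(prior)
    score = np.log(prior) + np.log(emit[:, obs[0]])   # best log score of each state at position 1
    back = np.zeros((n, k), dtype=int)                # backpointers
    for i in range(1, n):
        cand = score[:, None] + np.log(trans) + np.log(emit[:, obs[i]])[None, :]
        back[i] = cand.argmax(axis=0)                 # best previous state for each current state
        score = cand.max(axis=0)
    path = [int(score.argmax())]                      # best final state, then trace back
    for i in range(n - 1, 0, -1):
        path.append(int(back[i][path[-1]]))
    return path[::-1]

print(viterbi(obs))   # most likely hidden state sequence
```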

Variable elimination on general graphs
Given: an arbitrary set of potentials ψ_C(x_C), C = cliques of a graph G.
Find: $Z = \sum_{x_1, \dots, x_n} \prod_C \psi_C(x_C)$.
Algorithm:
  x_1, ..., x_n = a good ordering of the variables
  F = {ψ_C(x_C) : C a clique of G}
  for i = 1 ... n do
    F_i = factors in F that contain x_i
    M_i = product of the factors in F_i
    m_i = Σ_{x_i} M_i
    F = (F \ F_i) ∪ {m_i}
  end for

Example: variable elimination
Given: ψ_12(x_1, x_2), ψ_24(x_2, x_4), ψ_23(x_2, x_3), ψ_45(x_4, x_5), ψ_35(x_3, x_5).
Find: $Z = \sum_{x_1, \dots, x_5} \psi_{12}(x_1, x_2)\,\psi_{24}(x_2, x_4)\,\psi_{23}(x_2, x_3)\,\psi_{45}(x_4, x_5)\,\psi_{35}(x_3, x_5)$.
1. Eliminate x_1: {ψ_12(x_1, x_2)} gives M_1(x_1, x_2); m_1(x_2) = Σ_{x_1} M_1.
2. Eliminate x_2: {ψ_24(x_2, x_4), ψ_23(x_2, x_3), m_1(x_2)} gives M_2(x_2, x_3, x_4); m_2(x_3, x_4) = Σ_{x_2} M_2.
3. Eliminate x_3: {ψ_35(x_3, x_5), m_2(x_3, x_4)} gives M_3(x_3, x_4, x_5); m_3(x_4, x_5) = Σ_{x_3} M_3.
4. Eliminate x_4: {ψ_45(x_4, x_5), m_3(x_4, x_5)} gives M_4(x_4, x_5); m_4(x_5) = Σ_{x_4} M_4.
5. Eliminate x_5: {m_4(x_5)} gives M_5(x_5); Z = Σ_{x_5} M_5.
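A small sketch of this elimination run, assuming binary variables and random positive tables for the five pairwise potentials (their values are not given on the slide):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
K = 2  # each variable takes K values
factors = {frozenset(s): rng.random((K, K)) + 0.1
           for s in [(1, 2), (2, 4), (2, 3), (4, 5), (3, 5)]}

def eliminate(factors, order):
    """Sum out the variables one by one; each factor is (sorted scope, assignment -> value dict)."""
    work = [(tuple(sorted(s)), {a: t[a] for a in itertools.product(range(K), repeat=2)})
            for s, t in factors.items()]
    for v in order:
        touching = [f for f in work if v in f[0]]
        rest = [f for f in work if v not in f[0]]
        new_scope = tuple(sorted(set().union(*[f[0] for f in touching]) - {v}))
        table = {}
        for a in itertools.product(range(K), repeat=len(new_scope)):
            asg, total = dict(zip(new_scope, a)), 0.0
            for xv in range(K):            # sum over the eliminated variable
                asg[v] = xv
                prod = 1.0
                for scope, t in touching:  # multiply all factors that touch v
                    prod *= t[tuple(asg[u] for u in scope)]
                total += prod
            table[a] = total
        work = rest + [(new_scope, table)]
    return work[0][1][()]                  # the final empty-scope factor is Z

print(eliminate(factors, order=[1, 2, 3, 4, 5]))
```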

Choosing a variable elimination order
- Complexity of VE is O(n m^w), where w is the maximum number of variables in any intermediate factor.
- A wrong elimination order can give rise to very large intermediate factors. Example: in the previous graph, eliminating x_2 first gives a factor over 4 variables. Can you give an example where the penalty is really severe?
- Choosing the optimal elimination order is NP-hard for general graphs.
- A polynomial-time algorithm exists for chordal graphs. A graph is chordal, or triangulated, if every cycle of length greater than three has a chord (shortcut).
- Optimal triangulation of graphs is NP-hard (many heuristics exist).

Junction tree algorithm
- An optimal general-purpose algorithm for exact marginal/MAP queries.
- Simultaneous computation of many queries; efficient data structures.
- Complexity: O(m^w N), where w = size of the largest clique in the (triangulated) graph and m = number of values of each discrete variable in the clique; linear for trees.
- The basis for many approximate algorithms.
- Many popular inference algorithms are special cases of junction trees: the Viterbi algorithm of HMMs, the forward-backward algorithm of Kalman filters.

Junction tree
A junction tree JT of a triangulated graph G with nodes x_1, ..., x_n is a tree where:
- Nodes = cliques of G.
- Edges ensure that if any two nodes contain a variable x_i, then x_i is present in every node on the unique path between them (the running intersection property).
Constructing a junction tree: efficient polynomial-time algorithms exist for creating a JT from a triangulated graph.
1. Enumerate a covering set of cliques.
2. Connect the cliques to get a tree that satisfies the running intersection property.
If the graph is not triangulated, triangulate it first using heuristics; optimal triangulation is NP-hard.

Creating a junction tree from a graphical model
1. Start with the graph.
2. Triangulate the graph.
3. Create clique nodes.
4. Create tree edges so that shared variables stay connected (running intersection property).
5. Assign each potential to exactly one clique node that subsumes it.
(Figure: the five steps illustrated on the graph over x_1, ..., x_5, yielding cliques {x_1, x_2}, {x_2, x_3, x_4}, {x_3, x_4, x_5} with potentials Ψ_1, Ψ_23, Ψ_24, Ψ_45, Ψ_35 assigned to them.)

Finding cliques of a triangulated graph
Theorem. Every triangulated graph has a simplicial vertex, that is, a vertex whose neighbors form a complete set.
Input: graph G; n = number of vertices of G.
for i = 1, ..., n do
  π_i = pick any simplicial vertex in G
  C_i = {π_i} ∪ Ne(π_i)
  remove π_i from G
end for
Return the maximal cliques among C_1, ..., C_n.
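A direct sketch of this elimination procedure, assuming the input graph is already triangulated and represented as an adjacency-set dictionary; the example graph is the triangulated 5-node graph used earlier.

```python
import itertools

def cliques_of_triangulated(adj):
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    candidates = []
    while adj:
        # a simplicial vertex: its remaining neighbors are pairwise connected
        v = next(v for v, ns in adj.items()
                 if all(b in adj[a] for a, b in itertools.combinations(ns, 2)))
        candidates.append({v} | adj[v])
        for u in adj[v]:                           # remove v from the graph
            adj[u].discard(v)
        del adj[v]
    # keep only the maximal cliques among the candidates
    return [c for c in candidates if not any(c < other for other in candidates)]

# The triangulated 5-node example used earlier: cliques {1,2}, {2,3,4}, {3,4,5}.
adj = {1: {2}, 2: {1, 3, 4}, 3: {2, 4, 5}, 4: {2, 3, 5}, 5: {3, 4}}
print(cliques_of_triangulated(adj))
```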

Connecting cliques to form a junction tree
Separator variables = the intersection of the variables in the two cliques joined by an edge.
Theorem. A clique tree that satisfies the running intersection property maximizes the total number of separator variables.
Input: cliques C_1, ..., C_k.
- Form a complete weighted graph H with the cliques as nodes and edge weights = size of the intersection of the two cliques an edge connects.
- T = maximum-weight spanning tree of H.
- Return T as the junction tree.
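A minimal sketch of this construction using Kruskal's algorithm with a union-find structure; the clique list is the one from the running example.

```python
import itertools

def junction_tree_edges(cliques):
    parent = list(range(len(cliques)))           # union-find for Kruskal
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    # candidate edges sorted by decreasing separator size
    edges = sorted(((len(cliques[i] & cliques[j]), i, j)
                    for i, j in itertools.combinations(range(len(cliques)), 2)),
                   reverse=True)
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj and w > 0:                   # join two different components
            parent[ri] = rj
            tree.append((i, j, w))
    return tree

cliques = [frozenset({1, 2}), frozenset({2, 3, 4}), frozenset({3, 4, 5})]
print(junction_tree_edges(cliques))   # [(1, 2, 2), (0, 1, 1)]
```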

Belief propagation on junction trees
Each node c sends a belief B_{c->c'}(.) to a neighbor c' once it has received beliefs from every other neighbor in N(c) \ {c'}. B_{c->c'}(.) is the belief that clique c has about the distribution of the shared variables s = c ∩ c':
$$B_{c \to c'}(x_s) = \sum_{x_{c \setminus s}} \psi_c(x_c) \prod_{d \in N(c) \setminus \{c'\}} B_{d \to c}(x_{d \cap c})$$
Replace sum with max for MAP queries.
Compute the marginal probability of any variable x_i as:
1. c = a clique in the JT containing x_i.
2. $\Pr(x_i) \propto \sum_{x_c \setminus x_i} \psi_c(x_c) \prod_{d \in N(c)} B_{d \to c}(x_{d \cap c})$

Example
On the junction tree with cliques {1,2}, {2,3,4}, {3,4,5}: the clique potentials are $\psi_{234}(y_{234}) = \psi_{23}(y_{23})\,\psi_{24}(y_{24})$ and $\psi_{345}(y_{345}) = \psi_{35}(y_{35})\,\psi_{45}(y_{45})$, while the clique {1,2} keeps $\psi_{12}(y_{12})$.
1. Clique 12 sends belief $B_{12 \to 234}(y_2) = \sum_{y_1} \psi_{12}(y_{12})$ to its only neighbor.
2. Clique 345 sends belief $B_{345 \to 234}(y_{34}) = \sum_{y_5} \psi_{345}(y_{345})$ to 234.
3. Clique 234 sends belief $B_{234 \to 345}(y_{34}) = \sum_{y_2} \psi_{234}(y_{234})\,B_{12 \to 234}(y_2)$ to 345.
4. Clique 234 sends belief $B_{234 \to 12}(y_2) = \sum_{y_3, y_4} \psi_{234}(y_{234})\,B_{345 \to 234}(y_{34})$ to 12.
Finally, $\Pr(y_1) \propto \sum_{y_2} \psi_{12}(y_{12})\,B_{234 \to 12}(y_2)$.

Why approximate inference
Exact inference is NP-hard. Complexity: O(m^w), where w = tree width = (size of the largest clique in the triangulated graph) - 1, and m = number of values of each discrete variable in the clique.
Many real-life graphs produce large cliques on triangulation:
- An n x n grid has a tree width of n.
- A Kalman filter with K parallel state variables influencing a common observation variable has a tree width of K + 1.

Generalized belief propagation
Approximate the junction tree with a cluster graph where:
1. Nodes = arbitrary clusters, not cliques of the triangulated graph; only ensure that all potentials are subsumed.
2. Separator nodes on edges = a subset of the intersecting variables.
Special case: factor graphs.
(Figure: the starting graph over x_1, ..., x_5, its junction tree, and an example cluster graph built from pairwise clusters.)

Belief propagation in cluster graphs
- The graph can have loops, so the tree-based two-phase method is not applicable.
- Many variants on the scheduling order of propagating beliefs: simple loopy belief propagation, tree-reweighted message passing, residual belief propagation.
- Many have no guarantee of convergence; specific tree-based orders do.
- Works well in practice, the default method of choice.

MCMC (Gibbs) sampling
- Useful when all else fails; guaranteed to converge to the correct answer in the limit of an infinite number of samples.
- Basic premise: it is easy to compute the conditional probability Pr(x_i | fixed values of the remaining variables).
Algorithm:
- Start with some initial assignment, say $x^1 = [x_1, \dots, x_n] = [0, \dots, 0]$.
- For several iterations:
  - For each variable x_i: obtain the next sample $x^{t+1}$ by replacing the value of x_i with a new value sampled according to $\Pr(x_i \mid x_1^t, \dots, x_{i-1}^t, x_{i+1}^t, \dots, x_n^t)$.
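A small Gibbs-sampling sketch for the 3 x 3 grid MRF from the earlier example (node and edge potentials repeated here, with the elided node potentials again filled in as placeholders); it estimates Pr(y_9 = 1) from the samples.

```python
import random

random.seed(0)
node_pot = [(4, 1), (2, 3)] + [(1, 1)] * 7               # psi_k(0), psi_k(1); placeholders after psi_2
edge_pot = {(0, 0): 5, (1, 1): 5, (0, 1): 1, (1, 0): 1}
edges = [(r * 3 + c, r * 3 + c + 1) for r in range(3) for c in range(2)] + \
        [(r * 3 + c, (r + 1) * 3 + c) for r in range(2) for c in range(3)]
neighbors = {k: [j for i, j in edges if i == k] + [i for i, j in edges if j == k] for k in range(9)}

def prob_one(y, k):
    """Pr(y_k = 1 | all other pixels), computed from the local potentials only."""
    score = []
    for v in (0, 1):
        s = node_pot[k][v]
        for j in neighbors[k]:
            s *= edge_pot[(v, y[j])]
        score.append(s)
    return score[1] / (score[0] + score[1])

y, count, iters = [0] * 9, 0, 5000
for t in range(iters):
    for k in range(9):                       # one Gibbs sweep: resample every pixel
        y[k] = 1 if random.random() < prob_one(y, k) else 0
    count += y[8]                            # pixel y_9 (0-indexed as 8)
print(count / iters)                         # Monte Carlo estimate of Pr(y_9 = 1)
```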

Other approximate inference methods
- Combinatorial algorithms for MAP.
- Greedy algorithms: relaxation labeling.
- Variational methods such as mean-field and structured mean-field.
- LP- and QP-based approaches.

Graph structure
1. Manual: designed by a domain expert.
   - Used in applications where the dependency structure is well understood.
   - Examples: QMR systems, Kalman filters, vision (grids), HMMs for speech recognition and IE.
2. Learned from examples.
   - NP-hard to find the optimal structure.
   - Widely researched, mostly posed as a branch-and-bound search problem.
   - Useful in dynamic situations.

Parameters in potentials
1. Manual: provided by a domain expert.
   - Used for infrequently constructed graphs, e.g. QMR systems.
   - Also where the potentials are an easy function of the attributes of the connected nodes, e.g. vision networks.
2. Learned from examples.
   - More popular, since it is difficult for humans to assign numeric values.
   - Many variants of parameterizing the potentials:
     1. Table potentials: each entry is a parameter, e.g. HMMs.
     2. Potentials as a combination of shared parameters and data attributes, e.g. CRFs.

Learning potentials
Given a sample D = {x^1, ..., x^N} of data generated from a distribution P(x) represented by a graphical model with known structure G, learn the potentials ψ_C(x_C). Two dimensions:
1. All variables observed or not.
   1. Fully observed: each training sample x^i has all n variables observed.
   2. Partially observed: only a subset of the variables is observed.
2. Potentials coupled with a log-partition function or not.
   1. No: closed-form solutions.
   2. Yes: potentials attached to arbitrary overlapping subsets of variables in a UDGM, e.g. the edge potentials in a grid graph; iterative solution, as in the case of learning with shared parameters. Discussed later.

Easy case: fully observed, decoupled table potentials
1. Potentials in a Bayesian network: $P(x) = \prod_i \Pr(x_i \mid \mathrm{Pa}(x_i))$.
2. Potentials attached to the maximal cliques of a UDGM: $\Pr(x) = \frac{\prod_{C \in \mathrm{Cliques}} \Pr(x_C)}{\prod_{S \in \mathrm{Separators}} \Pr(x_S)}$.
Maximum likelihood estimation, with constraints on the potentials to make them behave like probabilities:
$$\Pr(x_C) = \frac{\sum_{i=1}^{N} [[x^i_C = x_C]]}{N}, \qquad \Pr(x_j \mid \mathrm{pa}(x_j)) = \frac{\sum_{i=1}^{N} [[x^i_j = x_j,\ x^i_{\mathrm{Pa}(j)} = \mathrm{pa}(x_j)]]}{\sum_{i=1}^{N} [[x^i_{\mathrm{Pa}(j)} = \mathrm{pa}(x_j)]]}$$
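These estimates are just normalized counts. A small sketch for one CPT, Pr(Income | Degree, Experience), on a made-up fully observed dataset:

```python
from collections import Counter

# Tiny, fully observed, made-up samples: (degree, experience, income) per person.
samples = [
    ("PhD", "high", "high"), ("PhD", "high", "high"), ("PhD", "low", "low"),
    ("Bachelors", "low", "low"), ("Bachelors", "high", "high"),
    ("Bachelors", "low", "low"), ("Masters", "high", "high"),
]

joint_counts = Counter(((d, e), inc) for d, e, inc in samples)   # counts of (pa(x_j), x_j)
parent_counts = Counter((d, e) for d, e, _ in samples)           # counts of pa(x_j)

def cpt(income, degree, experience):
    """Empirical Pr(Income = income | Degree = degree, Experience = experience)."""
    pa = (degree, experience)
    return joint_counts[(pa, income)] / parent_counts[pa]

print(cpt("high", "PhD", "high"))   # 2/2 = 1.0 on this toy data
```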

The framework
Conditional distribution Pr(y | x, θ); the potentials are functions of x and of the parameters θ to be learned. y = y_1, ..., y_n forms a graphical model, directed or undirected.
Undirected:
$$\Pr(y_1, \dots, y_n \mid x, \theta) = \frac{\prod_c \psi_c(y_c, x, \theta)}{Z_\theta(x)} = \frac{1}{Z_\theta(x)} \exp\Big(\sum_c F_\theta(y_c, c, x)\Big)$$
where $Z_\theta(x) = \sum_y \exp\big(\sum_c F_\theta(y_c, c, x)\big)$ and the clique potential is $\psi_c(y_c, x) = \exp(F_\theta(y_c, c, x))$.

Local conditional probability for a BN
$$\Pr(y_1, \dots, y_n \mid x, \theta) = \prod_j \Pr(y_j \mid y_{\mathrm{Pa}(j)}, x, \theta) = \prod_j \frac{\exp\big(F_\theta(y_{\mathrm{Pa}(j)}, y_j, j, x)\big)}{\sum_{y'=1}^{m} \exp\big(F_\theta(y_{\mathrm{Pa}(j)}, y', j, x)\big)}$$

Forms of F_θ(y_c, c, x)
- A log-linear model over user-defined features, e.g. CRFs, Maxent models. Let K be the number of features and denote a feature by f_k(y_c, c, x). Then
$$F_\theta(y_c, c, x) = \sum_{k=1}^{K} \theta_k f_k(y_c, c, x)$$
- An arbitrary function, e.g. a neural network that takes y_c, c, x as input and transforms them, possibly non-linearly, into a real value; θ are the parameters of the network.

Example: Named Entity Recognition (figure).

Named Entity Recognition: features (figure).

Training
Given N input-output pairs D = {(x^1, y^1), (x^2, y^2), ..., (x^N, y^N)} and the form of F_θ, learn the parameters θ by maximum likelihood:
$$\max_\theta LL(\theta, D) = \max_\theta \sum_{i=1}^{N} \log \Pr(y^i \mid x^i, \theta)$$

Training for a BN
$$\begin{aligned}
LL(\theta, D) &= \sum_{i=1}^{N} \log \Pr(y^i \mid x^i, \theta) = \sum_{i=1}^{N} \log \prod_j \Pr(y^i_j \mid y^i_{\mathrm{Pa}(j)}, x^i, \theta) \\
&= \sum_i \sum_j \log \Pr(y^i_j \mid y^i_{\mathrm{Pa}(j)}, x^i, \theta) \\
&= \sum_i \sum_j \Big[ F_\theta(y^i_{\mathrm{Pa}(j)}, y^i_j, j, x^i) - \log \sum_{y'=1}^{m} \exp\big(F_\theta(y^i_{\mathrm{Pa}(j)}, y', j, x^i)\big) \Big]
\end{aligned}$$
This is like a normal classification task: the graphical model poses no extra challenge during training, and the normalizer is easy to compute.

Training an undirected graphical model
$$LL(\theta, D) = \sum_{i=1}^{N} \log \Pr(y^i \mid x^i, \theta) = \sum_{i=1}^{N} \log \frac{1}{Z_\theta(x^i)} \exp\Big(\sum_c F_\theta(y^i_c, c, x^i)\Big) = \sum_i \Big[ \sum_c F_\theta(y^i_c, c, x^i) - \log Z_\theta(x^i) \Big]$$
The first part is easy to compute, but the second term requires invoking an inference algorithm to compute Z_θ(x^i) for each i. Computing the gradient of this objective with respect to θ also requires inference.

Training via gradient descent
Assume a log-linear model as in CRFs, where $F_\theta(y^i_c, c, x^i) = \theta \cdot f(x^i, y^i_c, c)$. For brevity write $f(x^i, y^i) = \sum_c f(x^i, y^i_c, c)$. Then
$$LL(\theta) = \sum_i \log \Pr(y^i \mid x^i, \theta) = \sum_i \big(\theta \cdot f(x^i, y^i) - \log Z_\theta(x^i)\big)$$
Add a regularizer to prevent over-fitting:
$$\max_\theta \sum_i \big(\theta \cdot f(x^i, y^i) - \log Z_\theta(x^i)\big) - \|\theta\|^2 / C$$
The objective is concave in θ, so gradient-based methods will work.

Gradient of the training objective
$$\begin{aligned}
\nabla L(\theta) &= \sum_i \Big[ f(x^i, y^i) - \sum_{y'} f(x^i, y') \frac{\exp\big(\theta \cdot f(x^i, y')\big)}{Z_\theta(x^i)} \Big] - 2\theta/C \\
&= \sum_i \Big[ f(x^i, y^i) - \sum_{y'} f(x^i, y') \Pr(y' \mid \theta, x^i) \Big] - 2\theta/C \\
&= \sum_i \Big[ f(x^i, y^i) - E_{\Pr(y' \mid \theta, x^i)} f(x^i, y') \Big] - 2\theta/C
\end{aligned}$$
where
$$E_{\Pr(y' \mid \theta, x^i)} f_k(x^i, y') = \sum_{y'} f_k(x^i, y') \Pr(y' \mid \theta, x^i) = \sum_{y'} \sum_c f_k(x^i, y'_c, c) \Pr(y' \mid \theta, x^i) = \sum_c \sum_{y'_c} f_k(x^i, y'_c, c) \Pr(y'_c \mid \theta, x^i)$$

Computing $E_{\Pr(y \mid \theta^t, x^i)} f_k(x^i, y)$
Three steps:
1. $\Pr(y \mid \theta^t, x^i)$ is represented as an undirected model whose nodes are the components of y, that is y_1, ..., y_n. The potential $\psi_c(y_c, x, \theta)$ on clique c is $\exp(\theta^t \cdot f(x^i, y_c, c))$.
2. Run a sum-product inference algorithm on this UGM and compute, for each c and y_c, the marginal probability $\mu(y_c, c, x^i)$.
3. Using these µ's, compute $E_{\Pr(y \mid \theta^t, x^i)} f_k(x^i, y) = \sum_c \sum_{y_c} \mu(y_c, c, x^i) f_k(x^i, c, y_c)$.

Training algorithm
1: Initialize θ^0 = 0
2: for t = 1 ... T do
3:   for i = 1 ... N do
4:     g_{k,i} = f_k(x^i, y^i) - E_{Pr(y | θ^t, x^i)} f_k(x^i, y)   for k = 1 ... K
5:   end for
6:   g_k = Σ_i g_{k,i}   for k = 1 ... K
7:   θ_k^t = θ_k^{t-1} + γ_t (g_k - 2 θ_k^{t-1} / C)
8:   Exit if g is zero
9: end for
The running time of the algorithm is O(I N n (m^2 + K)), where I is the total number of iterations.
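A compact sketch of this loop for a tiny chain CRF with indicator features; all data and dimensions are made up, and the expectations are computed by brute-force enumeration over label sequences instead of sum-product, which is feasible only at this toy size.

```python
import itertools
import numpy as np

L, A = 2, 2                                   # labels per position, attributes per position

def features(x, y):
    """f(x, y): node (label x attribute) and transition indicator features summed over the chain."""
    f = np.zeros(L * A + L * L)
    for i, yi in enumerate(y):
        for a in range(A):
            f[yi * A + a] += x[i][a]
    for i in range(len(y) - 1):
        f[L * A + y[i] * L + y[i + 1]] += 1
    return f

def expected_features(x, theta):
    """E_{Pr(y | theta, x)} f(x, y), by enumerating all label sequences."""
    ys = list(itertools.product(range(L), repeat=len(x)))
    scores = np.array([theta @ features(x, y) for y in ys])
    p = np.exp(scores - scores.max())
    p /= p.sum()
    return sum(pi * features(x, y) for pi, y in zip(p, ys))

# made-up training data: x is a list of per-position attribute vectors, y the label sequence
data = [([np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 0.0])], (0, 1, 0)),
        ([np.array([0.0, 1.0]), np.array([0.0, 1.0]), np.array([1.0, 0.0])], (1, 1, 0))]

theta, C, gamma = np.zeros(L * A + L * L), 10.0, 0.5
for t in range(200):                          # gradient ascent on the regularized log-likelihood
    g = sum(features(x, y) - expected_features(x, theta) for x, y in data)
    theta += gamma * (g - 2 * theta / C)
print(theta.round(2))
```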

Partially observed, decoupled potentials: the EM algorithm
Input: graph G; data D with an observed subset of variables x and hidden variables z.
Initially (t = 0): assign random values to the parameters Pr(x_j | pa(x_j)).
for t = 1, ..., T do
  E-step:
    for i = 1, ..., N do
      Use inference in G to estimate the conditionals $\Pr_i(z_c \mid x^i)^t$ for all variable subsets (j, Pa(j)) involving any hidden variable.
    end for
  M-step:
    $$\Pr(x_j \mid \mathrm{pa}(x_j) = z_c)^t = \frac{\sum_{i=1}^{N} \Pr_i(z_c \mid x^i)\, [[x^i_j = x_j]]}{\sum_{i=1}^{N} \Pr_i(z_c \mid x^i)}$$
end for

More on graphical models
- D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press.
- Wainwright's article in Foundations and Trends in Machine Learning.
- Kevin Murphy's brief online introduction to graphical models.
- M. I. Jordan. Graphical models. Statistical Science (Special Issue on Bayesian Statistics), 19.
Other textbooks:
- R. G. Cowell, A. P. Dawid, S. L. Lauritzen and D. J. Spiegelhalter. Probabilistic Networks and Expert Systems. Springer-Verlag.
- J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.
- S. L. Lauritzen. Graphical Models. Oxford Science Publications.
- F. V. Jensen. Bayesian Networks and Decision Graphs. Springer.


More information

Statistical Approaches to Learning and Discovery

Statistical Approaches to Learning and Discovery Statistical Approaches to Learning and Discovery Graphical Models Zoubin Ghahramani & Teddy Seidenfeld zoubin@cs.cmu.edu & teddy@stat.cmu.edu CALD / CS / Statistics / Philosophy Carnegie Mellon University

More information

Bayesian Learning in Undirected Graphical Models

Bayesian Learning in Undirected Graphical Models Bayesian Learning in Undirected Graphical Models Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London, UK http://www.gatsby.ucl.ac.uk/ Work with: Iain Murray and Hyun-Chul

More information

Linear Dynamical Systems

Linear Dynamical Systems Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations

More information

Graphical Models and Independence Models

Graphical Models and Independence Models Graphical Models and Independence Models Yunshu Liu ASPITRG Research Group 2014-03-04 References: [1]. Steffen Lauritzen, Graphical Models, Oxford University Press, 1996 [2]. Christopher M. Bishop, Pattern

More information

Notes on Markov Networks

Notes on Markov Networks Notes on Markov Networks Lili Mou moull12@sei.pku.edu.cn December, 2014 This note covers basic topics in Markov networks. We mainly talk about the formal definition, Gibbs sampling for inference, and maximum

More information

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014 Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Problem Set 3 Issued: Thursday, September 25, 2014 Due: Thursday,

More information

Probabilistic Graphical Models for Image Analysis - Lecture 1

Probabilistic Graphical Models for Image Analysis - Lecture 1 Probabilistic Graphical Models for Image Analysis - Lecture 1 Alexey Gronskiy, Stefan Bauer 21 September 2018 Max Planck ETH Center for Learning Systems Overview 1. Motivation - Why Graphical Models 2.

More information

Alternative Parameterizations of Markov Networks. Sargur Srihari

Alternative Parameterizations of Markov Networks. Sargur Srihari Alternative Parameterizations of Markov Networks Sargur srihari@cedar.buffalo.edu 1 Topics Three types of parameterization 1. Gibbs Parameterization 2. Factor Graphs 3. Log-linear Models with Energy functions

More information

Random Field Models for Applications in Computer Vision

Random Field Models for Applications in Computer Vision Random Field Models for Applications in Computer Vision Nazre Batool Post-doctorate Fellow, Team AYIN, INRIA Sophia Antipolis Outline Graphical Models Generative vs. Discriminative Classifiers Markov Random

More information

Message Passing and Junction Tree Algorithms. Kayhan Batmanghelich

Message Passing and Junction Tree Algorithms. Kayhan Batmanghelich Message Passing and Junction Tree Algorithms Kayhan Batmanghelich 1 Review 2 Review 3 Great Ideas in ML: Message Passing Each soldier receives reports from all branches of tree 3 here 7 here 1 of me 11

More information

Brief Introduction of Machine Learning Techniques for Content Analysis

Brief Introduction of Machine Learning Techniques for Content Analysis 1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview

More information

Machine Learning for Data Science (CS4786) Lecture 19

Machine Learning for Data Science (CS4786) Lecture 19 Machine Learning for Data Science (CS4786) Lecture 19 Hidden Markov Models Course Webpage : http://www.cs.cornell.edu/courses/cs4786/2017fa/ Quiz Quiz Two variables can be marginally independent but not

More information

Bayesian Networks: Representation, Variable Elimination

Bayesian Networks: Representation, Variable Elimination Bayesian Networks: Representation, Variable Elimination CS 6375: Machine Learning Class Notes Instructor: Vibhav Gogate The University of Texas at Dallas We can view a Bayesian network as a compact representation

More information

12 : Variational Inference I

12 : Variational Inference I 10-708: Probabilistic Graphical Models, Spring 2015 12 : Variational Inference I Lecturer: Eric P. Xing Scribes: Fattaneh Jabbari, Eric Lei, Evan Shapiro 1 Introduction Probabilistic inference is one of

More information

Part I. C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS

Part I. C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS Part I C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS Probabilistic Graphical Models Graphical representation of a probabilistic model Each variable corresponds to a

More information

Probabilistic Graphical Networks: Definitions and Basic Results

Probabilistic Graphical Networks: Definitions and Basic Results This document gives a cursory overview of Probabilistic Graphical Networks. The material has been gleaned from different sources. I make no claim to original authorship of this material. Bayesian Graphical

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Undirected Graphical Models Mark Schmidt University of British Columbia Winter 2016 Admin Assignment 3: 2 late days to hand it in today, Thursday is final day. Assignment 4:

More information

University of Washington Department of Electrical Engineering EE512 Spring, 2006 Graphical Models

University of Washington Department of Electrical Engineering EE512 Spring, 2006 Graphical Models University of Washington Department of Electrical Engineering EE512 Spring, 2006 Graphical Models Jeff A. Bilmes Lecture 1 Slides March 28 th, 2006 Lec 1: March 28th, 2006 EE512

More information

The Ising model and Markov chain Monte Carlo

The Ising model and Markov chain Monte Carlo The Ising model and Markov chain Monte Carlo Ramesh Sridharan These notes give a short description of the Ising model for images and an introduction to Metropolis-Hastings and Gibbs Markov Chain Monte

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

Outline. Spring It Introduction Representation. Markov Random Field. Conclusion. Conditional Independence Inference: Variable elimination

Outline. Spring It Introduction Representation. Markov Random Field. Conclusion. Conditional Independence Inference: Variable elimination Probabilistic Graphical Models COMP 790-90 Seminar Spring 2011 The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Outline It Introduction ti Representation Bayesian network Conditional Independence Inference:

More information

Intelligent Systems (AI-2)

Intelligent Systems (AI-2) Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 18 Oct, 21, 2015 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models CPSC

More information

Alternative Parameterizations of Markov Networks. Sargur Srihari

Alternative Parameterizations of Markov Networks. Sargur Srihari Alternative Parameterizations of Markov Networks Sargur srihari@cedar.buffalo.edu 1 Topics Three types of parameterization 1. Gibbs Parameterization 2. Factor Graphs 3. Log-linear Models Features (Ising,

More information

Clique trees & Belief Propagation. Siamak Ravanbakhsh Winter 2018

Clique trees & Belief Propagation. Siamak Ravanbakhsh Winter 2018 Graphical Models Clique trees & Belief Propagation Siamak Ravanbakhsh Winter 2018 Learning objectives message passing on clique trees its relation to variable elimination two different forms of belief

More information

Variational Inference. Sargur Srihari

Variational Inference. Sargur Srihari Variational Inference Sargur srihari@cedar.buffalo.edu 1 Plan of discussion We first describe inference with PGMs and the intractability of exact inference Then give a taxonomy of inference algorithms

More information

Graphical Models. Outline. HMM in short. HMMs. What about continuous HMMs? p(o t q t ) ML 701. Anna Goldenberg ... t=1. !

Graphical Models. Outline. HMM in short. HMMs. What about continuous HMMs? p(o t q t ) ML 701. Anna Goldenberg ... t=1. ! Outline Graphical Models ML 701 nna Goldenberg! ynamic Models! Gaussian Linear Models! Kalman Filter! N! Undirected Models! Unification! Summary HMMs HMM in short! is a ayes Net hidden states! satisfies

More information

Probabilistic and Bayesian Machine Learning

Probabilistic and Bayesian Machine Learning Probabilistic and Bayesian Machine Learning Day 4: Expectation and Belief Propagation Yee Whye Teh ywteh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London http://www.gatsby.ucl.ac.uk/

More information

13: Variational inference II

13: Variational inference II 10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational

More information

Chapter 16. Structured Probabilistic Models for Deep Learning

Chapter 16. Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 1 Chapter 16 Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 2 Structured Probabilistic Models way of using graphs to describe

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Lecture 9: Variational Inference Relaxations Volkan Cevher, Matthias Seeger Ecole Polytechnique Fédérale de Lausanne 24/10/2011 (EPFL) Graphical Models 24/10/2011 1 / 15

More information

Probabilistic Graphical Models

Probabilistic Graphical Models 2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector

More information

Lecture 8: PGM Inference

Lecture 8: PGM Inference 15 September 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I 1 Variable elimination Max-product Sum-product 2 LP Relaxations QP Relaxations 3 Marginal and MAP X1 X2 X3 X4

More information

Introduction to Graphical Models. Srikumar Ramalingam School of Computing University of Utah

Introduction to Graphical Models. Srikumar Ramalingam School of Computing University of Utah Introduction to Graphical Models Srikumar Ramalingam School of Computing University of Utah Reference Christopher M. Bishop, Pattern Recognition and Machine Learning, Jonathan S. Yedidia, William T. Freeman,

More information

Lecture 6: Graphical Models: Learning

Lecture 6: Graphical Models: Learning Lecture 6: Graphical Models: Learning 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering, University of Cambridge February 3rd, 2010 Ghahramani & Rasmussen (CUED)

More information

Chapter 8 Cluster Graph & Belief Propagation. Probabilistic Graphical Models 2016 Fall

Chapter 8 Cluster Graph & Belief Propagation. Probabilistic Graphical Models 2016 Fall Chapter 8 Cluster Graph & elief ropagation robabilistic Graphical Models 2016 Fall Outlines Variable Elimination 消元法 imple case: linear chain ayesian networks VE in complex graphs Inferences in HMMs and

More information

The Origin of Deep Learning. Lili Mou Jan, 2015

The Origin of Deep Learning. Lili Mou Jan, 2015 The Origin of Deep Learning Lili Mou Jan, 2015 Acknowledgment Most of the materials come from G. E. Hinton s online course. Outline Introduction Preliminary Boltzmann Machines and RBMs Deep Belief Nets

More information

Applying Bayesian networks in the game of Minesweeper

Applying Bayesian networks in the game of Minesweeper Applying Bayesian networks in the game of Minesweeper Marta Vomlelová Faculty of Mathematics and Physics Charles University in Prague http://kti.mff.cuni.cz/~marta/ Jiří Vomlel Institute of Information

More information

Logistic Regression: Online, Lazy, Kernelized, Sequential, etc.

Logistic Regression: Online, Lazy, Kernelized, Sequential, etc. Logistic Regression: Online, Lazy, Kernelized, Sequential, etc. Harsha Veeramachaneni Thomson Reuter Research and Development April 1, 2010 Harsha Veeramachaneni (TR R&D) Logistic Regression April 1, 2010

More information

Variable Elimination (VE) Barak Sternberg

Variable Elimination (VE) Barak Sternberg Variable Elimination (VE) Barak Sternberg Basic Ideas in VE Example 1: Let G be a Chain Bayesian Graph: X 1 X 2 X n 1 X n How would one compute P X n = k? Using the CPDs: P X 2 = x = x Val X1 P X 1 = x

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

More information

Chris Bishop s PRML Ch. 8: Graphical Models

Chris Bishop s PRML Ch. 8: Graphical Models Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular

More information