Lecture 8: PGM Inference

1 15 September 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401

2 Table of Contents I
1. Variable elimination: Max-product, Sum-product
2. LP Relaxations, QP Relaxations
3. Sampling

3 Marginal and MAP
[Figure: a tree-structured MRF over X1, X2, X3, X4]
Marginal inference: $P(x_i) = \sum_{x_j : j \neq i} P(x_1, x_2, x_3, x_4)$
MAP inference: $(x_1^*, x_2^*, x_3^*, x_4^*) = \operatorname{argmax}_{x_1, x_2, x_3, x_4} P(x_1, x_2, x_3, x_4)$
In general, $x_i^* \neq \operatorname{argmax}_{x_i} P(x_i)$.

4 Marginals
When do we need marginals?
Marginals are used to compute the normalisation constant: $Z = \sum_{x_i} q(x_i) = \sum_{x_j} q(x_j)$ for any $i, j$, where $q(x_i)$ denotes the unnormalised marginal.
The log loss in CRFs, $-\log P(x_1, \ldots, x_n) = \log Z - \ldots$, contains $\log Z$.
Expectations such as $E_{P(x_i)}[\phi(x_i)]$ and $E_{P(x_i, x_j)}[\phi(x_i, x_j)]$, where $\psi(x_i) = \langle \phi(x_i), w \rangle$ and $\psi(x_i, x_j) = \langle \phi(x_i, x_j), w \rangle$.
The gradient of the CRF risk contains the above expectations.

5 MAP
When do we need MAP?
To find the most likely configuration of $(x_i)_{i \in V}$ in testing.
To find the most violated constraint generated by $(x_i)_{i \in V}$ in training (i.e. learning), e.g. by the cutting plane method (used in SVM-Struct) or by the Bundle Method for Risk Minimisation (Teo et al., JMLR 2010).

6 Variable elimination
$\max_{x_1, x_2, x_3, x_4} P(x_1, x_2, x_3, x_4) = \max_{x_1, x_2, x_3, x_4} \psi(x_1, x_2)\psi(x_2, x_3)\psi(x_2, x_4)\psi(x_1)\psi(x_2)\psi(x_3)\psi(x_4)$
$= \max_{x_1, x_2} \Big[ \psi(x_1)\psi(x_2)\psi(x_1, x_2) \big(\max_{x_3} \psi(x_2, x_3)\psi(x_3)\big) \big(\max_{x_4} \psi(x_2, x_4)\psi(x_4)\big) \Big]$
$= \max_{x_1} \Big[ \psi(x_1) \max_{x_2} \psi(x_2)\psi(x_1, x_2)\, m_{3 \to 2}(x_2)\, m_{4 \to 2}(x_2) \Big]$
$= \max_{x_1} \psi(x_1)\, m_{2 \to 1}(x_1)$
$x_1^* = \operatorname{argmax}_{x_1} \psi(x_1)\, m_{2 \to 1}(x_1)$

7 Max-product
Variable elimination for MAP. Max-product:
$x_i^* = \operatorname{argmax}_{x_i} \psi(x_i) \prod_{j \in \mathrm{Ne}(i)} m_{j \to i}(x_i)$
$m_{j \to i}(x_i) = \max_{x_j} \psi(x_j)\psi(x_i, x_j) \prod_{k \in \mathrm{Ne}(j) \setminus \{i\}} m_{k \to j}(x_j)$
$\mathrm{Ne}(i)$: neighbouring nodes of $i$ (i.e. nodes that connect with $i$).
$\mathrm{Ne}(j) \setminus \{i\} = \emptyset$ if $j$ has only one edge connecting it, e.g. $x_1, x_3, x_4$. For such a node $j$, $m_{j \to i}(x_i) = \max_{x_j} \psi(x_j)\psi(x_i, x_j)$. Easier computation!

8 Max-product
Order matters: message $m_{2 \to 3}(x_3)$ requires $m_{1 \to 2}(x_2)$ and $m_{4 \to 2}(x_2)$.
[Figure: the tree X1-X2, X2-X3, X2-X4 with messages passed in sequence: m_{1→2}, m_{3→2}, m_{4→2}, then m_{2→1}, m_{2→3}, m_{2→4}]
Alternatively, pass messages from the leaves to the root, and then from the root to the leaves.

9 Sum-product
Variable elimination for marginals. Sum-product:
$P(x_i) = \frac{1}{Z} \psi(x_i) \prod_{j \in \mathrm{Ne}(i)} m_{j \to i}(x_i)$
$m_{j \to i}(x_i) = \sum_{x_j} \psi(x_j)\psi(x_i, x_j) \prod_{k \in \mathrm{Ne}(j) \setminus \{i\}} m_{k \to j}(x_j)$
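
To make the message recursions concrete, here is a minimal NumPy sketch (not from the lecture) that runs sum-product and max-product on the tree of slide 6 (edges X1-X2, X2-X3, X2-X4) with binary variables; the potential tables psi_node and psi_edge are made-up values for illustration.

```python
import numpy as np

# Hypothetical binary potentials for the tree X1--X2, X2--X3, X2--X4 (slide 6).
psi_node = {i: np.array([1.0, 2.0]) for i in [1, 2, 3, 4]}                             # psi(x_i)
psi_edge = {e: np.array([[2.0, 1.0], [1.0, 3.0]]) for e in [(1, 2), (2, 3), (2, 4)]}   # psi(x_i, x_j)

def leaf_message(j, i, op):
    """m_{j->i}(x_i) for a leaf j: op over x_j of psi(x_j) psi(x_i, x_j)."""
    pe = psi_edge[(min(i, j), max(i, j))]
    # orient the table so rows index x_i and columns index x_j
    table = pe if (i, j) == (min(i, j), max(i, j)) else pe.T
    return op(table * psi_node[j][None, :], axis=1)

# Sum-product: messages from the leaves 3 and 4 into 2, then from 2 into 1 (slide 9).
m3_2 = leaf_message(3, 2, np.sum)
m4_2 = leaf_message(4, 2, np.sum)
table12 = psi_edge[(1, 2)]                      # rows index x_1, columns index x_2
m2_1 = np.sum(table12 * (psi_node[2] * m3_2 * m4_2)[None, :], axis=1)

Z = np.sum(psi_node[1] * m2_1)                  # normalisation constant
P_x1 = psi_node[1] * m2_1 / Z                   # marginal P(x_1)
print("P(x1):", P_x1)

# Max-product: the same recursion with max in place of sum (slide 7).
m3_2m = leaf_message(3, 2, np.max)
m4_2m = leaf_message(4, 2, np.max)
m2_1m = np.max(table12 * (psi_node[2] * m3_2m * m4_2m)[None, :], axis=1)
x1_star = np.argmax(psi_node[1] * m2_1m)
print("MAP value of x1:", x1_star)
```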

10 Extension
To avoid over/underflow, one often operates in log space.
Max/sum-product is also known as Belief Propagation (BP).
In graphs with loops, running BP for several iterations is known as Loopy BP (with neither convergence nor optimality guarantees in general).
Extensions include the Junction Tree Algorithm (exact, but expensive) and cluster-based BP.

11 LP Relaxations
Assume a pairwise MRF with graph $G(V, E)$:
$P(X \mid Y) = \frac{1}{Z} \prod_{(i,j) \in E} \psi_{i,j}(x_i, x_j) \prod_{i \in V} \psi_i(x_i) = \frac{1}{Z} \exp\Big( \sum_{(i,j) \in E} \theta_{i,j}(x_i, x_j) + \sum_{i \in V} \theta_i(x_i) \Big)$
MAP:
$X^* = \operatorname{argmax}_X \prod_{(i,j) \in E} \psi_{i,j}(x_i, x_j) \prod_{i \in V} \psi_i(x_i) = \operatorname{argmax}_X \sum_{(i,j) \in E} \theta_{i,j}(x_i, x_j) + \sum_{i \in V} \theta_i(x_i)$

12 LP Relaxations
$\operatorname{argmax}_X \sum_{(i,j) \in E} \theta_{i,j}(x_i, x_j) + \sum_{i \in V} \theta_i(x_i)$ is equivalent to the following Integer Program:
$\operatorname{argmax}_{\{q\}} \sum_{(i,j) \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j)\theta_{i,j}(x_i, x_j) + \sum_{i \in V} \sum_{x_i} q_i(x_i)\theta_i(x_i)$
s.t. $q_{i,j}(x_i, x_j) \in \{0, 1\}$, $\sum_{x_i, x_j} q_{i,j}(x_i, x_j) = 1$, $\sum_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j)$.
Relax to a Linear Program:
$\operatorname{argmax}_{\{q\}} \sum_{(i,j) \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j)\theta_{i,j}(x_i, x_j) + \sum_{i \in V} \sum_{x_i} q_i(x_i)\theta_i(x_i)$
s.t. $q_{i,j}(x_i, x_j) \in [0, 1]$, $\sum_{x_i, x_j} q_{i,j}(x_i, x_j) = 1$, $\sum_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j)$.
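
As an illustration of the relaxed program (my own sketch, not part of the slides), the LP for a single edge with two binary variables can be handed to an off-the-shelf solver; the potentials theta1, theta2, theta12 below are arbitrary made-up numbers.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical potentials for two binary variables joined by one edge.
theta1 = np.array([0.2, 1.0])                 # theta_1(x_1)
theta2 = np.array([0.5, 0.1])                 # theta_2(x_2)
theta12 = np.array([[1.5, 0.0],               # theta_{12}(x_1, x_2)
                    [0.0, 1.2]])

# Decision vector: [q1(0), q1(1), q2(0), q2(1), q12(00), q12(01), q12(10), q12(11)]
c = -np.concatenate([theta1, theta2, theta12.ravel()])   # linprog minimises, so negate

A_eq = np.array([
    [0, 0, 0, 0, 1, 1, 1, 1],      # sum_{x1,x2} q12(x1,x2) = 1
    [-1, 0, 0, 0, 1, 1, 0, 0],     # sum_{x2} q12(0,x2) = q1(0)
    [0, -1, 0, 0, 0, 0, 1, 1],     # sum_{x2} q12(1,x2) = q1(1)
    [0, 0, -1, 0, 1, 0, 1, 0],     # sum_{x1} q12(x1,0) = q2(0)
    [0, 0, 0, -1, 0, 1, 0, 1],     # sum_{x1} q12(x1,1) = q2(1)
], dtype=float)
b_eq = np.array([1.0, 0, 0, 0, 0])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 8, method="highs")
q1, q2 = res.x[:2], res.x[2:4]
print("LP value:", -res.fun, " q1:", q1, " q2:", q2)
# For tree-structured graphs the relaxation is tight, so q is integral here;
# on loopy graphs the LP optimum can be fractional.
```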

13 QP Relaxations
Imposing $q_{i,j}(x_i, x_j) = q_i(x_i) q_j(x_j)$ yields a QP:
$\operatorname{argmax}_{\{q\}} \sum_{(i,j) \in E} \sum_{x_i, x_j} q_i(x_i) q_j(x_j) \theta_{i,j}(x_i, x_j) + \sum_{i \in V} \sum_{x_i} q_i(x_i)\theta_i(x_i)$
s.t. $q_i(x_i) \in [0, 1]$, $\sum_{x_i} q_i(x_i) = 1$.
Rewrite the objective function as $y^T \Theta\, y + y^T \theta$, where $y = [q_i(x_i)]_{i \in V,\, x_i \in \mathrm{Val}(X)} \in \mathbb{R}^N$, $\theta = [\theta_i(x_i)]_{i, x_i} \in \mathbb{R}^N$, and $\Theta = [\theta_{i,j}(x_i, x_j)]_{i, x_i, j, x_j} \in \mathbb{R}^{N \times N}$.

14 QP Relaxations
Problem: $y^T \Theta\, y + y^T \theta$ may not be concave, i.e. $\Theta$ may not be negative semidefinite (NSD).

15 QP Relaxations
Problem: $y^T \Theta\, y + y^T \theta$ may not be concave, i.e. $\Theta$ may not be negative semidefinite (NSD).
Solution: find a $d \in \mathbb{R}^N$ such that $(\Theta - \operatorname{diag}(d))$ is NSD. Given such a $d$, we have
$y^T \Theta\, y + y^T \theta = y^T (\Theta - \operatorname{diag}(d))\, y + (\theta + d)^T y - g(d, y)$,
where the gap $g(d, y) = d^T y - y^T \operatorname{diag}(d)\, y = \sum_{i=1}^N d_i (y_i - y_i^2)$.
Note that for $y_i \in \{0, 1\}$, $y_i - y_i^2 = 0$, thus $g(d, y) = 0$.¹ So we know
$y^T \Theta\, y + y^T \theta \le y^T (\Theta - \operatorname{diag}(d))\, y + (\theta + d)^T y$, where the right-hand side is concave, and we solve
$\operatorname{argmax}_y\; y^T (\Theta - \operatorname{diag}(d))\, y + (\theta + d)^T y$.
¹ In general, for $y_i \in [0, 1]$ and $d_i \ge 0$, $y_i - y_i^2 \ge 0$, thus $g(d, y) \ge 0$.
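
A small sketch (mine, with a random stand-in for $\Theta$) of one simple way to pick the diagonal shift: set every $d_i$ to the largest eigenvalue of the symmetrised $\Theta$, which makes $\Theta - \operatorname{diag}(d)$ negative semidefinite.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6
Theta = rng.normal(size=(N, N))          # stand-in for [theta_{ij}(x_i, x_j)]
Theta = (Theta + Theta.T) / 2            # work with the symmetric part

# Any d >= 0 with diag(d) - Theta PSD works; d_i = lambda_max(Theta) is a simple choice.
lam_max = np.linalg.eigvalsh(Theta).max()
d = np.full(N, max(lam_max, 0.0))

shifted = Theta - np.diag(d)
print("largest eigenvalue after shift:", np.linalg.eigvalsh(shifted).max())  # <= 0, so concave
# The gap g(d, y) = sum_i d_i (y_i - y_i^2) vanishes on integral y, so the concave
# surrogate upper-bounds the original QP and matches it at {0, 1} points.
```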

16 Break
Take a break...

17 Understanding samples
In fact, there is no way to check whether a single sample comes from a given distribution or not: two totally different distributions can generate the same sample. For example, Uniform[0, 1] and the Gaussian N(0, 1) can both generate a sample with value 0. Looking at a sample with value 0 alone, how do you know its distribution for sure? What we really check (and know for sure) is the way the samples were generated. When we say a procedure generates a sample from a distribution P, what we really mean is that if we keep sampling this way (by the procedure), the normalised histogram $H_n$ built from n samples converges to the distribution P, i.e. $H_n \to P$ as $n \to \infty$. If we don't know how the samples were generated, we can never know the distribution for sure; we can only guess (e.g. using statistical tests) based on a number of available samples.

18 Overview: Monte Carlo; Importance sampling; Acceptance-rejection sampling

19 Monte Carlo
Monte Carlo methods are a class of computational algorithms that rely on repeated random sampling to compute their results.
repeat
  draw sample(s)
  compute the result according to the samples
until enough samples have been drawn (or the result is stable)

20 Monte Carlo
To estimate π (the area of a circle with radius r is $S_c = \pi r^2$).
Idea: draw a circle (r = 1) and a rectangle (2r × 2r) enclosing the circle. We know the area of the rectangle is $S_{rec} = (2r)^2$. If we can estimate the area of the circle, then we can estimate π by $\pi = S_c / r^2$.
Draw a sample point uniformly from the rectangle. The chance of it being within the circle is $S_c / S_{rec}$. So if we throw enough points, we have $N_{within} / N_{total} \approx S_c / S_{rec}$, thus $S_c \approx S_{rec}\, N_{within} / N_{total}$. Therefore,
$\pi \approx \frac{S_{rec}\, N_{within} / N_{total}}{r^2} = 4 N_{within} / N_{total}$.
See a matlab demo.
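
The MATLAB demo itself is not reproduced in these notes; the following Python sketch (my own) implements the same estimator.

```python
import numpy as np

def estimate_pi(n_samples, seed=0):
    """Monte Carlo estimate of pi: fraction of uniform points inside the unit circle, times 4."""
    rng = np.random.default_rng(seed)
    xy = rng.uniform(-1.0, 1.0, size=(n_samples, 2))     # points in the 2x2 square
    n_within = np.sum(np.sum(xy**2, axis=1) <= 1.0)      # inside the circle of radius 1
    return 4.0 * n_within / n_samples

for n in [100, 10_000, 1_000_000]:
    print(n, estimate_pi(n))
```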

21 Monte Carlo
[Figure: the estimated π plotted against the number of samples, converging to the true π]

22 Monte Carlo
To estimate an expectation: generate samples $x_i \sim q(X)$, $i = 1, \ldots, N$, then
$E_{X \sim q(X)}[f(X)] \approx \hat{E}_{X \sim q(X)}[f(X)] = \frac{1}{N} \sum_{i=1}^N f(x_i)$.

23 Importance sampling
To compute $E_{X \sim p(X)}[f(X)]$. Assume $p(x)$ (the target distribution) is hard to sample from directly, and $q(x)$ (the proposal distribution) is easy to sample from, with $q(x) > 0$ whenever $p(x) > 0$.
$E_{X \sim p(X)}[f(X)] = \int_x p(x) f(x)\, dx = \int_x q(x) \frac{p(x)}{q(x)} f(x)\, dx = E_{X \sim q(X)}\Big[\frac{p(X)}{q(X)} f(X)\Big]$.
Hence $\hat{E}_{X \sim p(X)}[f(X)] = \hat{E}_{X \sim q(X)}\Big[\frac{p(X)}{q(X)} f(X)\Big]$, where $\hat{E}_{X \sim q(X)}[f(X)] = \frac{1}{N} \sum_{i=1}^N f(x_i)$, $x_i \sim q(X)$, $i = 1, \ldots, N$.
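
A short sketch (mine) of the estimator above, with an arbitrary choice of target p = N(2, 0.5²), proposal q = N(0, 2²) and f(x) = x, so the true value $E_{p}[f(X)] = 2$ is known.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical choice: target p = N(2, 0.5^2), proposal q = N(0, 2^2), f(x) = x.
p = lambda x: norm.pdf(x, loc=2.0, scale=0.5)
q = lambda x: norm.pdf(x, loc=0.0, scale=2.0)
f = lambda x: x

N = 100_000
x = rng.normal(loc=0.0, scale=2.0, size=N)           # x_i ~ q
w = p(x) / q(x)                                      # importance weights p(x_i) / q(x_i)
print("estimate of E_p[f(X)]:", np.mean(w * f(x)))   # should be close to 2
```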

24 Acceptance-rejection sampling
Target: to sample X from p(x). Given: q(x), easy to sample from. Find a constant M such that $M q(x) \ge p(x)$ for all x.
repeat
  step 1: sample $Y \sim q(y)$
  step 2: sample $U \sim \mathrm{Uniform}[0, 1]$
  if $U \le \frac{p(Y)}{M q(Y)}$ then X = Y; else reject and go to step 1
until sampled enough
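
A minimal sketch (mine) of this procedure, assuming the target is the Beta(2, 5) density on [0, 1], the proposal q is Uniform[0, 1], and M = 2.5 (which satisfies M q(x) ≥ p(x) everywhere).

```python
import numpy as np

rng = np.random.default_rng(0)

def p(x):
    """Target density: Beta(2, 5), i.e. 30 x (1 - x)^4 on [0, 1]."""
    return 30.0 * x * (1.0 - x) ** 4

M = 2.5   # p(x) <= 2.46 on [0, 1], so M q(x) = M >= p(x) for q = Uniform[0, 1]

def rejection_sample(n):
    samples = []
    while len(samples) < n:
        y = rng.uniform(0.0, 1.0)          # step 1: Y ~ q
        u = rng.uniform(0.0, 1.0)          # step 2: U ~ Uniform[0, 1]
        if u <= p(y) / M:                  # accept with prob p(y) / (M q(y)), here q(y) = 1
            samples.append(y)              # else reject and go back to step 1
    return np.array(samples)

x = rejection_sample(50_000)
print("sample mean:", x.mean())            # Beta(2, 5) has mean 2/7 ≈ 0.286
```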

25 Acceptance-rejection sampling
Proof: $\Pr(\text{accept} \mid X = x) = \frac{p(x)}{M q(x)}$ and $\Pr(X = x) = q(x)$, so
$\Pr(\text{accept}) = \int_x \Pr(\text{accept} \mid X = x) \Pr(X = x)\, dx = \int_x \frac{p(x)}{M q(x)} q(x)\, dx = \frac{1}{M}$ (thus we don't want M to be big).
$\Pr(X \mid \text{accept}) = \frac{\Pr(\text{accept} \mid X)\, P(X)}{\Pr(\text{accept})} = \frac{p(x)}{M q(x)} q(x) \Big/ \frac{1}{M} = p(x)$.

26 Understanding AR sampling (1)
I guess the most confusing part is why M comes in, so let's look at the case without M first. Denote the histogram formed by n samples from q(x) as $H_q^n$, the histogram formed by n samples from p(x) as $H_p^n$, and the histogram formed by n accepted samples from the AR sampling procedure as $H^n$. For a sample $x \sim q(x)$, if p(x) < q(x), accepting every such x and continuing to sample this way would give the histogram $H_q^n$. But what you really want is a procedure whose resulting histogram H approaches $H_p^n$. Rejecting some portion of the x's makes the histogram H have the same shape as $H_p$ at the point x. In other words, the histogram $H_q$ has more counts at point x than $H_p$, so we remove some counts to make $H(x) = H_p(x)$. (Take a moment to think this through.)

27 Understanding AR sampling (2)
What if, for a sample $x \sim q(x)$, p(x) > q(x)? The histogram $H_q^n$ already has fewer counts than $H_p^n$ at x. What do we do? Well, we can draw Mn points from q(x) to build $H_q^{Mn}$ first. Now $H_q^{Mn}$ has more counts than $H_p^n$ at x (because we chose an M such that p(x) < M q(x) for all x; if not, choose a larger M). Visually, $H_q^{Mn}$ encloses $H_p^n$. At point x, we only want to keep $H_p^n(x)$ many samples out of the $H_q^{Mn}(x)$ available. This is where the uniform sampling and M come in: we sample $u \sim \mathrm{Uniform}[0, M q(x)]$ and accept x when u < p(x) (equivalent to sampling $u \sim \mathrm{Uniform}[0, 1]$ and accepting x when $u < p(x) / (M q(x))$). As a result, after Mn samples we will get an H close to $H_p^n$. Moreover, $\lim_{n \to \infty} H^n = \lim_{n \to \infty} H_p^n = p$.

28 Understanding AR sampling (3)
Here we can choose any M such that p(x) < M q(x) for all x. The bigger M is, the more samples (Mn of them) you need to approximate $H_p^n$. That's why, in practice, people want to use the smallest such M (with p(x) < M q(x) for all x), to reduce the number of rejected samples.

29 Sampling in PGM inference Overview:

30 Given an ordering of the random variables $\{X_i\}_{i=1}^n$ (so that parents are known before their children are generated):
for i = 1 to n do
  $u_i \leftarrow$ the sampled values of $\mathrm{Pa}(X_i)$
  sample $x_i$ from $P(X_i \mid u_i)$
end for
[Figure: the student Bayesian network — Difficulty (D), Intelligence (I), Grade (G), SAT (S), Letter (L), Job (J), Happy (H) — with CPDs P(D), P(I), P(G|D,I), P(S|I), P(L|G), P(J|L,S), P(H|G,J)]
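
A sketch (mine) of this forward (ancestral) sampling procedure on a simplified, binary version of the student network; all CPT numbers below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical CPTs for a simplified binary student network: D, I are roots;
# G depends on (D, I); S depends on I; L depends on G. All numbers are made up.
P_D = np.array([0.6, 0.4])
P_I = np.array([0.7, 0.3])
P_G_given_DI = {(d, i): np.array([[0.8, 0.2], [0.3, 0.7], [0.5, 0.5], [0.1, 0.9]][2 * d + i])
                for d in (0, 1) for i in (0, 1)}
P_S_given_I = {i: np.array([[0.95, 0.05], [0.2, 0.8]][i]) for i in (0, 1)}
P_L_given_G = {g: np.array([[0.9, 0.1], [0.4, 0.6]][g]) for g in (0, 1)}

def forward_sample():
    """Sample (D, I, G, S, L) in topological order: parents before children."""
    d = rng.choice(2, p=P_D)
    i = rng.choice(2, p=P_I)
    g = rng.choice(2, p=P_G_given_DI[(d, i)])
    s = rng.choice(2, p=P_S_given_I[i])
    l = rng.choice(2, p=P_L_given_G[g])
    return d, i, g, s, l

samples = [forward_sample() for _ in range(10_000)]
print("estimated P(L = 1):", np.mean([s[4] for s in samples]))
```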

31 Assume $\{x_i\}_{i=1}^M$ are M samples from P(X). We can approximately compute:
expectation: $E_{X \sim P(X)}[f(X)] \approx \frac{1}{M} \sum_{i=1}^M f(x_i)$
MAP solution: $\operatorname{argmax}_x P(x) \approx \operatorname{argmax}_{x \in \{x_i\}_{i=1}^M} P(x)$
marginal: $P(x) \approx N_{X = x} / N_{total}$
sampling from $P(X \mid e)$ given evidence e: sample from P(X) first, and reject x when it does not agree with e.

32 Problems?

33 Problem: the rejection step in estimating $P(X \mid e)$ wastes too many samples when P(e) is small. In real applications, P(e) is almost always very small. Question: how do we avoid rejecting samples?

34 How about setting the observed random variables to the observed values, and then doing forward sampling on the rest?

35 Let's see if it works. To sample from $P(D, I, G, L \mid S = 0)$ in a simplified PGM.
[Figure: simplified student network — Difficulty (D), Intelligence (I), Grade (G), SAT (S), Letter (L) — with CPDs P(D), P(I), P(G|D,I), P(S|I), P(L|G)]
Fix S = 0, and then sample D, I, G, L. Does this give the same result as forward sampling with rejection?

36 No! It doesn't. The samples are not from $P(D, I, G, L \mid S = 0)$ at all! Fixing this leads to likelihood weighting.

37 Input: observations $\{Z_i = z_i\}_i$.
Step 1: set $\{Z_i\}_i$ to the observed values.
Step 2: forward sample the unobserved variables.
Step 3: weight the sample by $\prod_i P(z_i \mid \mathrm{Pa}(z_i))$.
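
A sketch (mine) of these three steps on a tiny made-up network with a single evidence variable S = 0; the estimate uses self-normalised weights, and the exact answer is computed for comparison. All CPT numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny hypothetical network: I is a root, S and G both depend on I (numbers made up).
P_I = np.array([0.7, 0.3])
P_S_given_I = {0: np.array([0.95, 0.05]), 1: np.array([0.2, 0.8])}
P_G_given_I = {0: np.array([0.8, 0.2]),  1: np.array([0.3, 0.7])}

def likelihood_weighted_sample(s_obs=0):
    """Clamp the evidence S = s_obs, forward-sample the rest, return (sample, weight)."""
    i = rng.choice(2, p=P_I)                       # step 2: sample the unobserved variables
    g = rng.choice(2, p=P_G_given_I[i])
    w = P_S_given_I[i][s_obs]                      # step 3: weight = prod_i P(z_i | Pa(z_i))
    return (i, g), w

N = 100_000
pairs = [likelihood_weighted_sample() for _ in range(N)]
weights = np.array([w for _, w in pairs])
g_vals = np.array([g for (_, g), _ in pairs])
est = np.sum(weights * (g_vals == 1)) / np.sum(weights)    # self-normalised weighted estimate
print("estimated P(G = 1 | S = 0):", est)

# Exact answer for comparison: sum_i P(i) P(S=0|i) P(G=1|i) / sum_i P(i) P(S=0|i)
num = sum(P_I[i] * P_S_given_I[i][0] * P_G_given_I[i][1] for i in (0, 1))
den = sum(P_I[i] * P_S_given_I[i][0] for i in (0, 1))
print("exact:", num / den)
```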

38 Inference
To sample from $P(D, I, G, L \mid S = 0)$ in the following PGM.
[Figure: simplified student network — D, I, G, S, L — with CPDs P(D), P(I), P(G|D,I), P(S|I), P(L|G)]
Fix S = 0, and forward sample D, I, G, L. Then weight the sample by $P(D, I, G, L \mid S = 0)$. Does this give the same result as forward sampling with rejection?

39 $E_{X \sim P(D,I,G,L \mid S=0)}[f(D, I, G, L, 0)] \approx \frac{1}{N} \sum_{j=1}^N f(d_j, i_j, g_j, l_j, 0)\, P(d_j, i_j, g_j, l_j \mid S = 0)$

40 [Figure: (a) the network $G_1$ with distribution p(X); (b) the mutilated network $G_2$ with distribution q(X), in which S is fixed to its observed value]
Sample $\{x_i\}_{i=1}^N$ from q(X). Then $\hat{E}_{X \sim p(X)}[f(X)] = \frac{1}{N} \sum_{i=1}^N \frac{p(x_i)}{q(x_i)} f(x_i)$.

41 MAP Inference Revisited
Primal: Variable Elimination; Max-product and (Loopy) BP; Junction Tree Algorithm and cluster-based BP; Linear Programming (LP) relaxations; Quadratic Programming (QP), Semidefinite Programming (SDP), Second-Order Cone Programming (SOCP), ...; special potentials (Graph Cut), ...
Dual: GMPLP, dual decomposition, ...

42 Marginal Inference Revisited
Many MAP inference methods can be converted to marginal ones (and vice versa): max-product ↔ sum-product; LP-based dual methods ↔ marginal inference (via adding an entropy term, e.g. the Kikuchi approximation).

43 That's all. Thanks!
