Lecture 8: PGM Inference
- Madison Jenkins
15 September 2014, Intro. to Stats. Machine Learning, COMP SCI 4401/7401
Table of Contents

1. Variable elimination: max-product, sum-product
2. LP relaxations, QP relaxations
3. Sampling
Marginal and MAP

[Figure: a tree-structured model over $X_1, X_2, X_3, X_4$.]

Marginal inference: $P(x_i) = \sum_{x_j : j \neq i} P(x_1, x_2, x_3, x_4)$

MAP inference: $(x_1^*, x_2^*, x_3^*, x_4^*) = \operatorname{argmax}_{x_1, x_2, x_3, x_4} P(x_1, x_2, x_3, x_4)$

In general, $x_i^* \neq \operatorname{argmax}_{x_i} P(x_i)$.
Marginals

When do we need marginals?
- To compute the normalisation constant $Z = \sum_{x_i} q(x_i) = \sum_{x_j} q(x_j)$ for any $i, j = 1, \dots, n$, where $q(x_i)$ is an unnormalised marginal.
- The log loss in CRFs is $-\log P(x_1, \dots, x_n) = \log Z - \dots$
- Expectations like $E_{P(x_i)}[\phi(x_i)]$ and $E_{P(x_i, x_j)}[\phi(x_i, x_j)]$, where $\psi(x_i) = \langle \phi(x_i), w \rangle$ and $\psi(x_i, x_j) = \langle \phi(x_i, x_j), w \rangle$.
- The gradient of the CRF risk contains the above expectations.
MAP

When do we need MAP?
- To find the most likely configuration of $(x_i)_{i \in V}$ at test time.
- To find the most violated constraint generated by $(x_i)_{i \in V}$ during training (i.e. learning), e.g. by the cutting-plane method (used in SVM-Struct) or by the Bundle Method for Risk Minimisation (Teo et al., JMLR 2010).
Variable elimination

$\max_{x_1, x_2, x_3, x_4} P(x_1, x_2, x_3, x_4)$
$\quad = \max_{x_1, x_2, x_3, x_4} \psi(x_1, x_2)\,\psi(x_2, x_3)\,\psi(x_2, x_4)\,\psi(x_1)\,\psi(x_2)\,\psi(x_3)\,\psi(x_4)$
$\quad = \max_{x_1, x_2} \psi(x_1)\,\psi(x_2)\,\psi(x_1, x_2) \Big( \max_{x_3} \psi(x_2, x_3)\,\psi(x_3) \Big) \Big( \max_{x_4} \psi(x_2, x_4)\,\psi(x_4) \Big)$
$\quad = \max_{x_1} \psi(x_1) \Big( \max_{x_2} \psi(x_2)\,\psi(x_1, x_2)\, m_{3 \to 2}(x_2)\, m_{4 \to 2}(x_2) \Big)$
$\quad = \max_{x_1} \psi(x_1)\, m_{2 \to 1}(x_1)$

$x_1^* = \operatorname{argmax}_{x_1} \psi(x_1)\, m_{2 \to 1}(x_1)$
Max-product

Variable elimination for MAP (max-product):
$x_i^* = \operatorname{argmax}_{x_i} \psi(x_i) \prod_{j \in \mathrm{Ne}(i)} m_{j \to i}(x_i)$
$m_{j \to i}(x_i) = \max_{x_j} \psi(x_j)\,\psi(x_i, x_j) \prod_{k \in \mathrm{Ne}(j) \setminus \{i\}} m_{k \to j}(x_j)$

$\mathrm{Ne}(i)$: the neighbouring nodes of $i$ (i.e. nodes connected to $i$). $\mathrm{Ne}(j) \setminus \{i\} = \emptyset$ if $j$ has only one edge, e.g. $x_1, x_3, x_4$. For such a node $j$, $m_{j \to i}(x_i) = \max_{x_j} \psi(x_j)\,\psi(x_i, x_j)$. Easier computation!
Max-product

Order matters: the message $m_{2 \to 3}(x_3)$ requires $m_{1 \to 2}(x_2)$ and $m_{4 \to 2}(x_2)$.

[Figure: messages $m_{1 \to 2}, m_{3 \to 2}, m_{4 \to 2}$ passed into $X_2$, then $m_{2 \to 1}, m_{2 \to 3}, m_{2 \to 4}$ passed out.]

Alternatively, pass messages from the leaves to the root, then from the root back to the leaves.
Sum-product

Variable elimination for marginals (sum-product):
$P(x_i) = \frac{1}{Z}\, \psi(x_i) \prod_{j \in \mathrm{Ne}(i)} m_{j \to i}(x_i)$
$m_{j \to i}(x_i) = \sum_{x_j} \psi(x_j)\,\psi(x_i, x_j) \prod_{k \in \mathrm{Ne}(j) \setminus \{i\}} m_{k \to j}(x_j)$
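The sum-product recursion above can be checked on the four-node tree from the worked example. The node and edge potentials below are made up for illustration; the message-passed marginal of $X_2$ is compared against brute-force summation of the joint. A minimal Python sketch:

```python
import itertools

# Hypothetical potentials for the 4-node tree from the slides:
# edges (1,2), (2,3), (2,4); each variable is binary {0, 1}.
psi_node = {1: [1.0, 2.0], 2: [0.5, 1.5], 3: [2.0, 1.0], 4: [1.0, 1.0]}
psi_edge = {(1, 2): [[1.0, 0.5], [0.5, 2.0]],
            (2, 3): [[1.5, 1.0], [1.0, 1.5]],
            (2, 4): [[2.0, 1.0], [1.0, 2.0]]}

def edge_pot(i, j, xi, xj):
    """Look up psi(x_i, x_j) regardless of edge orientation."""
    if (i, j) in psi_edge:
        return psi_edge[(i, j)][xi][xj]
    return psi_edge[(j, i)][xj][xi]

def leaf_message(j, i):
    """m_{j->i}(x_i) = max over nothing here: j is a leaf, so
    Ne(j)\\{i} is empty and the message is sum_{x_j} psi(x_j) psi(x_i, x_j)."""
    return [sum(psi_node[j][xj] * edge_pot(i, j, xi, xj) for xj in (0, 1))
            for xi in (0, 1)]

# Leaf-to-root messages into node 2 (nodes 1, 3, 4 are leaves).
m = {(j, 2): leaf_message(j, 2) for j in (1, 3, 4)}

# Unnormalised marginal of x2, then normalise by Z.
q2 = [psi_node[2][x2] * m[(1, 2)][x2] * m[(3, 2)][x2] * m[(4, 2)][x2]
     for x2 in (0, 1)]
Z = sum(q2)
p2 = [q / Z for q in q2]

# Brute-force check against direct summation of the joint.
brute = [0.0, 0.0]
for x1, x2, x3, x4 in itertools.product((0, 1), repeat=4):
    joint = (psi_node[1][x1] * psi_node[2][x2] * psi_node[3][x3] *
             psi_node[4][x4] * edge_pot(1, 2, x1, x2) *
             edge_pot(2, 3, x2, x3) * edge_pot(2, 4, x2, x4))
    brute[x2] += joint
brute = [b / sum(brute) for b in brute]
print(p2, brute)
```

Because the graph is a tree, the two computations agree exactly (up to floating point); on a loopy graph this equivalence would break down, which is where Loopy BP and the Junction Tree Algorithm come in.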
Extension

- To avoid over/underflow, one often operates in log space.
- Max/sum-product is also known as Belief Propagation (BP).
- In graphs with loops, running BP for several iterations is known as Loopy BP (in general, neither convergence nor optimality is guaranteed).
- Extends to the Junction Tree Algorithm (exact, but expensive) and cluster-based BP.
LP Relaxations

Assume a pairwise MRF with graph $G(V, E)$:
$P(X \mid Y) = \frac{1}{Z} \prod_{(i,j) \in E} \psi_{i,j}(x_i, x_j) \prod_{i \in V} \psi_i(x_i) = \frac{1}{Z} \exp\Big( \sum_{(i,j) \in E} \theta_{i,j}(x_i, x_j) + \sum_{i \in V} \theta_i(x_i) \Big)$

MAP:
$X^* = \operatorname{argmax}_X \prod_{(i,j) \in E} \psi_{i,j}(x_i, x_j) \prod_{i \in V} \psi_i(x_i) = \operatorname{argmax}_X \sum_{(i,j) \in E} \theta_{i,j}(x_i, x_j) + \sum_{i \in V} \theta_i(x_i)$
LP Relaxations

$\operatorname{argmax}_X \sum_{(i,j) \in E} \theta_{i,j}(x_i, x_j) + \sum_{i \in V} \theta_i(x_i)$ equals the following Integer Program:
$\operatorname{argmax}_{\{q\}} \sum_{(i,j) \in E} \sum_{x_i, x_j} q_{i,j}(x_i, x_j)\,\theta_{i,j}(x_i, x_j) + \sum_{i \in V} \sum_{x_i} q_i(x_i)\,\theta_i(x_i)$
s.t. $q_{i,j}(x_i, x_j) \in \{0, 1\}$, $\sum_{x_i, x_j} q_{i,j}(x_i, x_j) = 1$, $\sum_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j)$.

Relax to a Linear Program (same objective):
s.t. $q_{i,j}(x_i, x_j) \in [0, 1]$, $\sum_{x_i, x_j} q_{i,j}(x_i, x_j) = 1$, $\sum_{x_i} q_{i,j}(x_i, x_j) = q_j(x_j)$.
QP Relaxations

Imposing $q_{i,j}(x_i, x_j) = q_i(x_i)\, q_j(x_j)$ yields the QP:
$\operatorname{argmax}_{\{q\}} \sum_{(i,j) \in E} \sum_{x_i, x_j} q_i(x_i)\, q_j(x_j)\,\theta_{i,j}(x_i, x_j) + \sum_{i \in V} \sum_{x_i} q_i(x_i)\,\theta_i(x_i)$
s.t. $q_i(x_i) \in [0, 1]$, $\sum_{x_i} q_i(x_i) = 1$.

Rewrite the objective as $y^\top \Theta\, y + y^\top \theta$, where $y = [q_i(x_i)]_{i \in V,\, x_i \in \mathrm{Val}(X)} \in \mathbb{R}^N$, $\theta = [\theta_i(x_i)]_{i, x_i} \in \mathbb{R}^N$, and $\Theta = [\theta_{i,j}(x_i, x_j)]_{i, x_i, j, x_j} \in \mathbb{R}^{N \times N}$.
QP Relaxations

Problem: $y^\top \Theta\, y + y^\top \theta$ may not be concave, i.e. $\Theta$ may not be negative semidefinite (NSD).

Solution: find a $d \in \mathbb{R}^N$ such that $(\Theta - \operatorname{diag}(d))$ is NSD. Given such a $d$, we have
$y^\top \Theta\, y + y^\top \theta = y^\top (\Theta - \operatorname{diag}(d))\, y + (\theta + d)^\top y - g(d, y)$,
where the gap $g(d, y) = d^\top y - y^\top \operatorname{diag}(d)\, y = \sum_{i=1}^N d_i (y_i - y_i^2)$.
Note that for $y_i \in \{0, 1\}$, $y_i - y_i^2 = 0$, thus $g(d, y) = 0$; in general, for $y_i \in [0, 1]$ and $d \geq 0$, $y_i - y_i^2 \geq 0$, thus $g(d, y) \geq 0$. So we know
$y^\top \Theta\, y + y^\top \theta \leq \underbrace{y^\top (\Theta - \operatorname{diag}(d))\, y + (\theta + d)^\top y}_{\text{concave}}$
and solve $\operatorname{argmax}_y\; y^\top (\Theta - \operatorname{diag}(d))\, y + (\theta + d)^\top y$.
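One simple (if loose) way to find such a $d$, not necessarily the one intended in the lecture, is via the Gershgorin circle theorem: taking $d_i = \max\big(0,\ \Theta_{ii} + \sum_{j \neq i} |\Theta_{ij}|\big)$ pushes every Gershgorin disc of $\Theta - \operatorname{diag}(d)$ into the nonpositive half-line, so the shifted matrix is NSD, while $d \geq 0$ keeps the gap nonnegative. The matrix and vectors below are made up for illustration:

```python
# Hypothetical symmetric Theta that is NOT negative semidefinite.
Theta = [[1.0, 2.0, -1.0],
         [2.0, 0.5, 0.0],
         [-1.0, 0.0, 3.0]]
N = len(Theta)

# Gershgorin shift: with d_i >= Theta_ii + sum_{j != i} |Theta_ij|,
# every Gershgorin disc of (Theta - diag(d)) has centre + radius <= 0,
# so all eigenvalues are <= 0 (NSD). Clamp at 0 so that d >= 0.
d = [max(0.0, Theta[i][i] + sum(abs(Theta[i][j]) for j in range(N) if j != i))
     for i in range(N)]

# Check the disc condition for every row.
shifted_diag = [Theta[i][i] - d[i] for i in range(N)]
radius = [sum(abs(Theta[i][j]) for j in range(N) if j != i) for i in range(N)]
assert all(shifted_diag[i] + radius[i] <= 0 for i in range(N))

# Sanity check of the decomposition: for a binary y the gap vanishes, so
# y^T Theta y + theta^T y == y^T (Theta - diag(d)) y + (theta + d)^T y.
theta = [0.3, -0.2, 0.1]          # made-up linear term
y = [1.0, 0.0, 1.0]               # a binary assignment

def quad(A, v):
    """Quadratic form v^T A v."""
    return sum(v[i] * A[i][j] * v[j] for i in range(N) for j in range(N))

lhs = quad(Theta, y) + sum(theta[i] * y[i] for i in range(N))
A = [[Theta[i][j] - (d[i] if i == j else 0.0) for j in range(N)]
     for i in range(N)]
rhs = quad(A, y) + sum((theta[i] + d[i]) * y[i] for i in range(N))
print(lhs, rhs)
```

The Gershgorin choice is conservative: in practice a smaller $d$ (e.g. from the eigenvalues of $\Theta$) gives a tighter concave upper bound.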
Break

Take a break...
Understanding samples

In fact, there is no way to check whether a sample is from a given distribution or not: two totally different distributions can generate the same sample. For example, Uniform[0, 1] and the Gaussian N(0, 1) can both generate a sample with value 0. Looking at a sample with value 0 alone, how do you know its distribution for sure? What we really check (and know for sure) is the way the samples were generated. When we say a procedure generates a sample from a distribution P, what we really mean is that, if we keep sampling this way (by the procedure), the normalised histogram $H_n$ of $n$ samples converges to the distribution P. That is, $H_n \to P$ as $n \to \infty$. If we don't know how the samples were generated, we never know their distribution for sure; we can only guess (e.g. using statistical tests) based on a number of available samples.
Overview

- Monte Carlo
- Importance sampling
- Acceptance-rejection sampling
Monte Carlo

Monte Carlo methods are a class of computational algorithms that rely on repeated random sampling to compute their results.

repeat
  draw sample(s)
  compute the result according to the samples
until sampled enough (or the result is stable)
Monte Carlo

To estimate $\pi$ (the area of a circle with radius $r$ is $S_c = \pi r^2$).

Idea: draw a circle ($r = 1$) and a rectangle ($2r \times 2r$) enclosing the circle. The area of the rectangle is $S_{rec} = (2r)^2$. If we can estimate the area of the circle, we can estimate $\pi = S_c / r^2$.

Draw a sample point uniformly from the rectangle. The chance of it falling within the circle is $S_c / S_{rec}$. So if we throw enough points, $N_{within} / N_{total} \approx S_c / S_{rec}$, thus $S_c \approx S_{rec}\, N_{within} / N_{total}$. Therefore,
$\pi \approx \frac{S_{rec}\, N_{within} / N_{total}}{r^2} = 4\, N_{within} / N_{total}$.

See a Matlab demo.
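The dart-throwing estimate of $\pi$ can be sketched in a few lines (a Python stand-in for the Matlab demo; the sample size is arbitrary):

```python
import random

random.seed(0)

def estimate_pi(n_total):
    """Throw n_total uniform points into the square [-1, 1]^2 and count
    how many land inside the unit circle: pi ~ 4 * N_within / N_total."""
    n_within = 0
    for _ in range(n_total):
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x * x + y * y <= 1.0:
            n_within += 1
    return 4.0 * n_within / n_total

print(estimate_pi(100_000))
```

The error of the estimate shrinks like $1/\sqrt{N_{total}}$, which matches the slow convergence visible in the demo plot.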
Monte Carlo

[Figure: estimated $\pi$ converging to the true $\pi$ as the number of samples grows (up to $4 \times 10^4$).]
Monte Carlo

To estimate an expectation: generate samples $x_i \sim q(X)$, $i = 1, \dots, N$. Then
$E_{X \sim q(X)}[f(X)] \approx \hat{E}_{X \sim q(X)}[f(X)] = \frac{1}{N} \sum_{i=1}^N f(x_i)$.
Importance sampling

To compute $E_{X \sim p(X)}[f(X)]$. Assume $p(x)$ (the target distribution) is hard to sample from directly, and $q(x)$ (the proposal distribution) is easy to sample from, with $q(x) > 0$ whenever $p(x) > 0$.
$E_{X \sim p(X)}[f(X)] = \int_x p(x) f(x)\, dx = \int_x q(x) \frac{p(x)}{q(x)} f(x)\, dx = E_{X \sim q(X)}\Big[\frac{p(X)}{q(X)} f(X)\Big]$

$\hat{E}_{X \sim p(X)}[f(X)] = \hat{E}_{X \sim q(X)}\Big[\frac{p(X)}{q(X)} f(X)\Big]$,
where $\hat{E}_{X \sim q(X)}[f(X)] = \frac{1}{N} \sum_{i=1}^N f(x_i)$, $x_i \sim q(X)$, $i = 1, \dots, N$.
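A small illustration of the identity above: treat the standard normal as the "hard" target $p$ (only its density is used) and $\mathrm{Uniform}[-5, 5]$ as the proposal $q$ (density $1/10$, which covers essentially all of $p$'s mass), then estimate $E_p[X^2] = 1$. The choice of target and proposal here is made up purely for demonstration:

```python
import math, random

random.seed(0)

def p_density(x):
    """Target: standard normal density (pretend it is hard to sample from)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def importance_estimate(f, n):
    """E_{X~p}[f(X)] ~ (1/N) sum_i (p(x_i)/q(x_i)) f(x_i), with x_i ~ q.
    Proposal q = Uniform[-5, 5], whose density is 1/10 on that interval."""
    total = 0.0
    for _ in range(n):
        x = random.uniform(-5.0, 5.0)
        w = p_density(x) / 0.1        # importance weight p(x)/q(x)
        total += w * f(x)
    return total / n

# E[X^2] under N(0, 1) is exactly 1.
print(importance_estimate(lambda x: x * x, 200_000))
```

The variance of this estimator depends on how well $q$ matches $p \cdot f$; a badly matched proposal gives huge weights and a noisy estimate, which is why the support condition $q > 0$ wherever $p > 0$ is only the minimal requirement.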
Acceptance-rejection sampling

Target: to sample $X$ from $p(x)$. Given: $q(x)$, easy to sample from. Find a constant $M$ such that $M q(x) \geq p(x)$ for all $x$.

repeat
  step 1: sample $Y \sim q(y)$
  step 2: sample $U \sim \mathrm{Uniform}[0, 1]$
  if $U \leq \frac{p(Y)}{M q(Y)}$ then accept: $X = Y$; else reject and go to step 1.
until sampled enough
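A sketch of the procedure for a toy target $p(x) = 2x$ on $[0, 1]$ with proposal $q = \mathrm{Uniform}[0, 1]$, where the smallest valid constant is $M = 2$ (these densities are made up so the result is easy to check, $E_p[X] = 2/3$):

```python
import random

random.seed(0)

def p(x):
    """Target density on [0, 1]: p(x) = 2x (pretend it is hard to sample)."""
    return 2.0 * x

M = 2.0  # smallest M with M * q(x) >= p(x) for q = Uniform[0, 1]

def ar_sample():
    """Acceptance-rejection: propose Y ~ q, accept with prob p(Y)/(M q(Y))."""
    while True:
        y = random.uniform(0.0, 1.0)   # step 1: Y ~ q
        u = random.uniform(0.0, 1.0)   # step 2: U ~ Uniform[0, 1]
        if u <= p(y) / (M * 1.0):      # q(y) = 1 on [0, 1]
            return y                   # accept; otherwise reject and retry

samples = [ar_sample() for _ in range(50_000)]
print(sum(samples) / len(samples))     # should be close to E_p[X] = 2/3
```

Since $\Pr(\text{accept}) = 1/M = 1/2$ here, roughly half the proposals are rejected, which previews the point of the next slides: the larger $M$ is, the more wasted proposals.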
Acceptance-rejection sampling

Proof: $\Pr(\text{accept} \mid X = x) = \frac{p(x)}{M q(x)}$ and $\Pr(X = x) = q(x)$.

$\Pr(\text{accept}) = \int_x \Pr(\text{accept} \mid X = x) \Pr(X = x)\, dx = \int_x \frac{p(x)}{M q(x)}\, q(x)\, dx = \frac{1}{M}$ (thus we don't want $M$ big).

$\Pr(X \mid \text{accept}) = \frac{\Pr(\text{accept} \mid X)\, \Pr(X)}{\Pr(\text{accept})} = \frac{p(x)}{M q(x)}\, q(x)\, M = p(x)$.
Understanding AR sampling (1)

I guess the most confusing part is why $M$ comes in, so let's look at the case without $M$ first. Denote the histogram formed by $n$ samples from $q(x)$ as $H_q^n$, the histogram formed by $n$ samples from $p(x)$ as $H_p^n$, and the histogram formed by $n$ accepted samples from the AR sampling procedure as $H^n$. For a sample $x \sim q(x)$, if $p(x) < q(x)$, then accepting every $x$ and sampling this way yields the histogram $H_q^n$. But what you really want is a procedure whose resulting histogram $H$ becomes $H_p^n$. Rejecting some portion of the $x$'s can give $H$ the same shape as $H_p$ at point $x$. In other words, $H_q$ has more counts at point $x$ than $H_p$, so we remove some counts to make $H(x) = H_p(x)$. (Take a moment to think this through.)
Understanding AR sampling (2)

What if, for a sample $x \sim q(x)$, $p(x) > q(x)$? The histogram $H_q^n$ already has fewer counts than $H_p^n$ at $x$. What do we do? Well, we can sample $Mn$ points from $q(x)$ to build $H_q^{Mn}$ first. Now $H_q^{Mn}$ has more counts than $H_p^n$ at $x$ (because we choose an $M$ such that $p(x) < M q(x)$ for all $x$; if not, choose a larger $M$). Visually, $H_q^{Mn}$ encloses $H_p^n$. At point $x$, we only want to keep $H_p^n(x)$ many samples out of the $H_q^{Mn}(x)$ many in total. This is how uniform sampling and $M$ come in: we sample $u \sim \mathrm{Uniform}[0, M q(x)]$ and accept $x$ when $u < p(x)$ (equivalent to sampling $u \sim \mathrm{Uniform}[0, 1]$ and accepting $x$ when $u < p(x) / (M q(x))$). As a result, after $Mn$ samples, we get an $H$ close to $H_p^n$. Moreover, $\lim_{n \to \infty} H^n = \lim_{n \to \infty} H_p^n = p$.
Understanding AR sampling (3)

Here we can choose any $M$ such that $p(x) < M q(x)$ for all $x$. The bigger $M$ is, the more samples ($Mn$ samples) you need to approximate $H_p^n$. That's why, in practice, people want to use the smallest such $M$ to reduce the number of rejected samples.
Sampling in PGM inference

Overview:
Forward sampling

Given an ordering of the random variables $\{X_i\}_{i=1}^n$ in which parents come before children (so parents are known when generating children):

for $i = 1$ to $n$ do
  $u_i \leftarrow$ the sampled values of $\mathrm{Pa}(X_i)$
  sample $x_i$ from $P(X_i \mid u_i)$
end for

[Figure: the student network with nodes Difficulty (D), Intelligence (I), Grade (G), SAT (S), Letter (L), Job (J), Happy (H) and CPDs $P(D)$, $P(I)$, $P(G \mid D, I)$, $P(S \mid I)$, $P(L \mid G)$, $P(J \mid L, S)$, $P(H \mid G, J)$.]
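A sketch of forward (ancestral) sampling on a cut-down version of the student network, with made-up binary CPDs; the sampled marginal of $G$ is checked against the exact sum over the joint:

```python
import random

random.seed(0)

# Hypothetical CPDs for a simplified student network (all variables binary):
# D ~ P(D), I ~ P(I), S depends on I, G depends on D and I.
P_D1 = 0.4
P_I1 = 0.3
P_S1_given_I = {0: 0.2, 1: 0.8}
P_G1_given_DI = {(0, 0): 0.3, (0, 1): 0.9, (1, 0): 0.1, (1, 1): 0.5}

def forward_sample():
    """Ancestral sampling: visit nodes in a topological order, so each
    node's parents are already sampled when the node is reached."""
    d = int(random.random() < P_D1)
    i = int(random.random() < P_I1)
    s = int(random.random() < P_S1_given_I[i])
    g = int(random.random() < P_G1_given_DI[(d, i)])
    return d, i, g, s

n = 200_000
count_g1 = sum(forward_sample()[2] for _ in range(n))

# Exact marginal P(G = 1) = sum_{d, i} P(d) P(i) P(G = 1 | d, i)
exact = sum((P_D1 if d else 1 - P_D1) * (P_I1 if i else 1 - P_I1)
            * P_G1_given_DI[(d, i)]
            for d in (0, 1) for i in (0, 1))
print(count_g1 / n, exact)
```

Each sample costs one pass over the network, and any downstream quantity (expectations, MAP over the drawn configurations, marginals as normalised counts) can be read off the samples, as the next slide summarises.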
Assume $\{x_i\}_{i=1}^M$ are $M$ samples from $P(X)$. We can approximately compute:
- expectation: $E_{X \sim P(X)}[f(X)] \approx \frac{1}{M} \sum_{i=1}^M f(x_i)$
- MAP solution: $\operatorname{argmax}_x P(x) \approx \operatorname{argmax}_{x \in \{x_i\}_{i=1}^M} P(x)$
- marginal: $P(x) \approx N_{X = x} / N_{total}$
- sampling from $P(X \mid e)$ given evidence $e$: sample from $P(X)$ first, and reject $x$ when it does not agree with $e$.
Problems?
Problem: the rejection step in estimating $P(X \mid e)$ wastes too many samples when $P(e)$ is small, and in real applications $P(e)$ is almost always very small.

Question: how do we avoid rejecting samples?
How about setting the observed random variables to their observed values, and then doing forward sampling on the rest?
Let's see if it works. To sample from $P(D, I, G, L \mid S = 0)$ in a simplified PGM:

[Figure: the simplified student network with nodes D, I, G, S, L and CPDs $P(D)$, $P(I)$, $P(G \mid D, I)$, $P(S \mid I)$, $P(L \mid G)$.]

Fix $S = 0$, and then sample $D, I, G, L$. Does this give the same result as forward sampling with rejection?
No, it doesn't! The samples are not from $P(D, I, G, L \mid S = 0)$ at all: the unobserved parent $I$ is still sampled from its prior $P(I)$, ignoring the evidence $S = 0$. Fixing this leads to likelihood weighting.
Likelihood weighting

Input: observed variables $\{Z_i = z_i\}_i$.
Step 1: set the $\{Z_i\}_i$ to their observed values.
Step 2: forward-sample the unobserved variables.
Step 3: weight the sample by $\prod_i P(z_i \mid \mathrm{Pa}(Z_i))$.
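The three steps can be sketched on a tiny made-up example: estimating $P(I = 1 \mid S = 1)$ in a two-node network $I \to S$, where each sample's weight is $P(S = 1 \mid i)$ and the weighted average is normalised by the total weight. The CPD numbers are invented, so the exact answer is available from Bayes' rule for checking:

```python
import random

random.seed(0)

# Made-up CPDs for a two-node network I -> S (both binary).
P_I1 = 0.3
P_S1_given_I = {0: 0.2, 1: 0.8}

def lw_estimate_I1_given_S1(n):
    """Likelihood weighting for P(I = 1 | S = 1): clamp S to 1,
    forward-sample the unobserved variables (here just I), and weight
    each sample by P(evidence | sampled parents) = P(S = 1 | i)."""
    num = den = 0.0
    for _ in range(n):
        i = int(random.random() < P_I1)   # step 2: forward-sample I
        w = P_S1_given_I[i]               # step 3: weight of this sample
        num += w * i
        den += w
    return num / den

# Exact posterior by Bayes' rule, for comparison.
exact = (P_S1_given_I[1] * P_I1) / (P_S1_given_I[1] * P_I1
                                    + P_S1_given_I[0] * (1 - P_I1))
print(lw_estimate_I1_given_S1(200_000), exact)
```

No sample is ever rejected: instead of throwing away draws that disagree with the evidence, each draw is kept and down-weighted by how well it explains the evidence.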
Likelihood weighting inference

To sample from $P(D, I, G, L \mid S = 0)$ in the following PGM:

[Figure: the simplified student network with CPDs $P(D)$, $P(I)$, $P(G \mid D, I)$, $P(S \mid I)$, $P(L \mid G)$.]

Fix $S = 0$, and forward-sample $D, I, G, L$. Then weight the sample by $P(S = 0 \mid I)$. Does this give the same result as forward sampling with rejection?
$E_{X \sim P(D, I, G, L \mid S = 0)}[f(D, I, G, L, 0)] \approx \frac{\sum_{j=1}^N w_j\, f(d_j, i_j, g_j, l_j, 0)}{\sum_{j=1}^N w_j}$, where $w_j = P(S = 0 \mid i_j)$ is the weight of the $j$-th sample.
[Figure: (a) $G_1$ with $p(X)$, the original network; (b) $G_2$ with $q(X)$, the mutilated network in which the observed node $S$ is cut off from its parent.]

Sample $\{x_i\}_{i=1}^N$ from $q(X)$. Then
$\hat{E}_{X \sim p(X)}[f(X)] = \frac{1}{N} \sum_{i=1}^N \frac{p(x_i)}{q(x_i)} f(x_i)$.
MAP Inference Revisit

Primal:
- Variable Elimination
- Max-product, (Loopy) BP
- Junction Tree Algorithm and cluster-based BP
- Linear Programming (LP) relaxations
- Quadratic Programming (QP), Semidefinite Programming (SDP), Second-Order Cone Programming (SOCP)...
- Special potentials (Graph Cut)...

Dual:
- GMPLP, dual decomposition...
Marginal Inference Revisit

Many MAP inference methods can be converted to marginal ones (and vice versa):
- max-product → sum-product
- LP-based dual methods → marginal (via adding an entropy term, e.g. the Kikuchi approximation)
That's all. Thanks!
More information12 : Variational Inference I
10-708: Probabilistic Graphical Models, Spring 2015 12 : Variational Inference I Lecturer: Eric P. Xing Scribes: Fattaneh Jabbari, Eric Lei, Evan Shapiro 1 Introduction Probabilistic inference is one of
More informationCOMP 652: Machine Learning. Lecture 12. COMP Lecture 12 1 / 37
COMP 652: Machine Learning Lecture 12 COMP 652 Lecture 12 1 / 37 Today Perceptrons Definition Perceptron learning rule Convergence (Linear) support vector machines Margin & max margin classifier Formulation
More informationLecture 8: Bayesian Networks
Lecture 8: Bayesian Networks Bayesian Networks Inference in Bayesian Networks COMP-652 and ECSE 608, Lecture 8 - January 31, 2017 1 Bayes nets P(E) E=1 E=0 0.005 0.995 E B P(B) B=1 B=0 0.01 0.99 E=0 E=1
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is
More informationExpectation Maximization
Expectation Maximization Machine Learning CSE546 Carlos Guestrin University of Washington November 13, 2014 1 E.M.: The General Case E.M. widely used beyond mixtures of Gaussians The recipe is the same
More informationProbabilistic Graphical Models (1): representation
1 / 28 Probabilistic Graphical Models (1): representation Qinfeng (Javen) Shi The ustralian entre for Visual Technologies, The University of delaide, ustralia 15 pril 2011 ourse Outline Probabilistic Graphical
More informationFINAL: CS 6375 (Machine Learning) Fall 2014
FINAL: CS 6375 (Machine Learning) Fall 2014 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for
More informationSequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them
HMM, MEMM and CRF 40-957 Special opics in Artificial Intelligence: Probabilistic Graphical Models Sharif University of echnology Soleymani Spring 2014 Sequence labeling aking collective a set of interrelated
More informationFrom Bayesian Networks to Markov Networks. Sargur Srihari
From Bayesian Networks to Markov Networks Sargur srihari@cedar.buffalo.edu 1 Topics Bayesian Networks and Markov Networks From BN to MN: Moralized graphs From MN to BN: Chordal graphs 2 Bayesian Networks
More informationExpectation Propagation Algorithm
Expectation Propagation Algorithm 1 Shuang Wang School of Electrical and Computer Engineering University of Oklahoma, Tulsa, OK, 74135 Email: {shuangwang}@ou.edu This note contains three parts. First,
More informationBut if z is conditioned on, we need to model it:
Partially Unobserved Variables Lecture 8: Unsupervised Learning & EM Algorithm Sam Roweis October 28, 2003 Certain variables q in our models may be unobserved, either at training time or at test time or
More information13 : Variational Inference: Loopy Belief Propagation
10-708: Probabilistic Graphical Models 10-708, Spring 2014 13 : Variational Inference: Loopy Belief Propagation Lecturer: Eric P. Xing Scribes: Rajarshi Das, Zhengzhong Liu, Dishan Gupta 1 Introduction
More informationInference and Representation
Inference and Representation David Sontag New York University Lecture 5, Sept. 30, 2014 David Sontag (NYU) Inference and Representation Lecture 5, Sept. 30, 2014 1 / 16 Today s lecture 1 Running-time of
More informationCutting Plane Training of Structural SVM
Cutting Plane Training of Structural SVM Seth Neel University of Pennsylvania sethneel@wharton.upenn.edu September 28, 2017 Seth Neel (Penn) Short title September 28, 2017 1 / 33 Overview Structural SVMs
More informationExpectation Maximization Algorithm
Expectation Maximization Algorithm Vibhav Gogate The University of Texas at Dallas Slides adapted from Carlos Guestrin, Dan Klein, Luke Zettlemoyer and Dan Weld The Evils of Hard Assignments? Clusters
More informationMinimizing D(Q,P) def = Q(h)
Inference Lecture 20: Variational Metods Kevin Murpy 29 November 2004 Inference means computing P( i v), were are te idden variables v are te visible variables. For discrete (eg binary) idden nodes, exact
More informationApproximate Message Passing
Approximate Message Passing Mohammad Emtiyaz Khan CS, UBC February 8, 2012 Abstract In this note, I summarize Sections 5.1 and 5.2 of Arian Maleki s PhD thesis. 1 Notation We denote scalars by small letters
More information5. Sum-product algorithm
Sum-product algorithm 5-1 5. Sum-product algorithm Elimination algorithm Sum-product algorithm on a line Sum-product algorithm on a tree Sum-product algorithm 5-2 Inference tasks on graphical models consider
More informationScribe to lecture Tuesday March
Scribe to lecture Tuesday March 16 2004 Scribe outlines: Message Confidence intervals Central limit theorem Em-algorithm Bayesian versus classical statistic Note: There is no scribe for the beginning of
More informationApproximate Inference Part 1 of 2
Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ Bayesian paradigm Consistent use of probability theory
More informationMessage Passing Algorithms and Junction Tree Algorithms
Message Passing lgorithms and Junction Tree lgorithms Le Song Machine Learning II: dvanced Topics S 8803ML, Spring 2012 Inference in raphical Models eneral form of the inference problem P X 1,, X n Ψ(
More information