Variational Message Passing. By John Winn, Christopher M. Bishop Presented by Andy Miller
|
|
- Alexis West
- 6 years ago
- Views:
Transcription
1 Variational Message Passing By John Winn, Christopher M. Bishop Presented by Andy Miller
2 Overview Background Variational Inference Conjugate-Exponential Models Variational Message Passing Messages Univariate Gaussian Example Allowable Models and Constraints VIBES and Extensions
3 Variational Inference Our model (directed graphical model) X = (V,H ) V are visible variables H are latent variables includes parameters Our dilemma Exact inference algorithms are computationally intractable for all but the simplest models. Our goal Find tractable approximate: Q(H) P(H V)
4 Variational Inference Note the natural decomposition of log likelihood: where and lnp(v) = (Q) + KL(Q P) (Q) = H Q(H)ln P(H,V) Q(H) H KL(Q P) = Q(H)ln P(H V) Q(H)
5 Variational Inference For arbitrary Q lnp(v) = (Q) + KL(Q P) fixed maximize minimize Minimize KL(Q P) w.r.t unrestricted Q? We get: Q(H) = P(H V) but this is what we re trying to avoid
6 Variational Inference Family of distributions explored by Winn: where {H i } are disjoint groups of latent variables Vastly reduces space e.g. assuming a fully disjoint set of discrete variables: Q reduces P space: i Q(H) = Q i (H i ) H = N, H i {1,..,K} K N KN
7 Variational Inference Plug in factorized Q to lower bound equation: (Q) = Q i (H i ) lnp(h,v) Q i (H i )lnq i (H i ) H i i H i Separate out all terms in one factor Q j (Q) = KL(Q j Q j * ) + terms not in Q j Introduce some new distribution Q* j Minimize this KL divergence
8 Variational Inference Maximizing the lower bound w.r.t. some factor Q j : lnq j * (H j ) = ln P(H,V) ~Q(H j ) + const. Q * j (H j ) = 1 Z exp ( P(H,V) ) ~Q(H j ) Can see that solutions are coupled, each Q j depends on expectations w.r.t. factors Q i j Variational optimization proceeds by initializing each Q i and then cycling through each factor
9 Variational Inference (recap) 1. Choose a family, Q(H) of variational distributions: 2. Use Kullback-Leibler divergence, KL(Q P ), as a measure of distance between P (H V ) and Q(H). 3. Find Q that minimizes divergence (or equivalently, maximizes the lower bound).
10 KL Divergence minkl( p q) minkl(q p) Figures from Pattern Recognition and Machine Learning. Bishop, 2006.
11 KL Divergence minkl( p q) minkl(q p) Figures from Pattern Recognition and Machine Learning. Bishop, 2006.
12 Variational Inference in Directed Model Assuming a directed graphical model, full distribution: i P(X) = P(X i pa i ) Winn assumes fully factorized Q i Q(H) = Q i (H i )
13 Variational Inference in Directed Model Plugging factorized joint into optimized form of factor j: lnq j * = lnp(x i pa i ) i ~Q( H j ) + const. Terms not depending on H j will be constant, yielding: lnq * j (H j ) = ln P(H j pa j ) + ~Q( H j lnp(x ) k pa k ) + ~Q( H const. k ch j j ) Distribution only relies on parents, children and co-parents
14 Variational Inference in Directed Model For a factorized Q, each update equation relies only on variables in the Markov blanket Can decompose the overall optimization into a set of local computations
15 Conjugate-Exponential Models Simplify update equations conditional distributions from the exponential family conjugate w.r.t. distributions over parent variables y x A parent distribution p(x y) is said to be conjugate to child distribution p(w x) if p(x y) has the same functional form, with respect to x, as p(w x). p(x w, y) p(w x) p(x y) w same family same functional form
16 Conjugate-Exponential Models Conditional distributions expressed in exponential family form: T ln P ( X θ ) = θ u ( X ) + g ( θ ) + f ( X ) natural parameter vector sufficient statistics vector E.g. univariate Gausian: ln P ( X µ, γ ) T µγ X = ln / 2 + γ γµ γ X 2 π
17 Conjugate-Exponential Models Parents and children are chosen to be conjugate, i.e. have the same functional form ln T ln P ( X θ ) = θ u ( X ) + g ( θ ) + same f ( X ) T P ( Z X, Y ) = φ ( Y, Z ) u ( X ) + g '( X ) + E.g. f '( Y, Z ) Gaussian for the mean of a Gaussian Gamma for the precision of a Gaussian Dirichlet for the parameters of a discrete distribution X Z Y
18 Conjugate-Exponential Models P(Y X, pa Y ) = P(X Y,cp Y )P(Y pa Y ) same family same form (wrt Y)
19 Conjugate-Exponential Models lnp(y pa Y ) = φ Y ( pa Y ) Τ u Y (Y) + f Y (Y) +g Y (pa Y ) lnp(x Y,cp Y ) = φ X (Y,cp Y ) Τ u X (X) + f X (X) +g X (Y,cp Y ) = φ XY (X,cp Y ) Τ u Y (Y) + λ(x,cp Y )
20 Conjugate-Exponential Models lnq Y * (Y) = φ Y ( pa Y ) Τ u Y (Y) +f Y (Y) + g Y (pa Y ) ~Q(Y ) + φ XY (X k,cp k ) Τ u Y (Y) +λ(x k,cp k ) ~Q(Y ) + const. k ch j = φ Y ( pa Y ) ~Q(Y ) + φ XY (X k,cp k ) ~Q(Y ) k ch Y Τ u Y (Y)+f Y (Y) + const.
21 Conjugate-Exponential Models φ * Y = φ Y ( pa Y ) ~Q(Y ) + φ XY (X k,cp k ) ~Q(Y ) k ch Y φ Y { u i } i pay φ XY u k, u j = φ Y (pa Y ) ~Q(Y ) { } j cpk = φ XY (X k,cp k ) ~Q(Y )
22 Variational Message Passing Conditional distributions: T ln P ( X θ ) = θ u ( X ) + g ( θ ) + f ( X ) T ln P ( Z X, Y ) = φ ( Y, Z ) u ( X ) + g '( X ) + f '( Y, Z ) Messages: Parent to child (X Z) X Y Child to parent (Z X) Z
23 Varational Message Passing Optimal Q(X) has the same form as P(X θ), but with updated parameter vector θ* Computed from messages from parents X Y Z
24 VMP Example: defining the model Learning parameters of a Gaussian from N data points. mean µ γ precision (inverse variance) data x N N n =1 P(x µ,γ 1 ) = N(x n µ,γ 1 )
25 VMP Example: defining the model Learning parameters of a Gaussian from N data points. mean µ γ precision (inverse variance) data x N u x (x n ) = [x n, x n 2 ] Τ lnp(x µ,γ 1 ) = γµ γ /2 Τ x n 2 x n (lnγ γµ2 ln2π)
26 VMP Example: defining the model Learning parameters of a Gaussian from N data points. mean µ γ precision (inverse variance) data x N u µ (µ) = [µ,µ 2 ] Τ lnp(x µ,γ 1 ) = γx n γ /2 Τ µ µ (lnγ γx 2 n ln2π)
27 VMP Example: defining the model Learning parameters of a Gaussian from N data points. mean µ γ precision (inverse variance) data x N lnp(x µ,γ 1 ) = 1 (x 2 n µ)2 1 2 Τ u γ (γ) = [γ,lnγ] Τ γ ln2π lnγ
28 VMP Example: defining the model Learning parameters of a Gaussian from N data points. Gaussian distribution with hyper params (m, β ) Gamma distribution with hyper params (a, b ) lnp(µ m,β 1 ) = βm β/2 Τ µ µ (lnβ βm2 ln2π) lnp(γ a,b) = b a 1 mean µ γ Τ precision (inverse variance) γ + alnb lnγ(a) lnγ data x N
29 VMP Example: passing messages Variational Distribution: Q(µ,γ ) = Q(µ)Q(γ) µ γ x N Find initial values: u µ (µ) and u γ (γ )
30 VMP Example: passing messages Message from γ to all x. µ γ m γ xn = γ lnγ x N
31 VMP Example: passing messages Messages from each x n to µ. µ γ m xn µ = γ x n γ /2 x N lnp(x µ,γ 1 ) = γx n γ /2 Τ µ µ (lnγ γx 2 n ln2π)
32 VMP Example: passing messages Update Q(µ) parameter vector N φ * µ = φ µ + m xn µ n =1 = βm N + m β/2 xn µ n =1 µ x N γ
33 VMP Example: passing messages Message from updated µ to all x. µ γ m µ xn = µ µ 2 x N
34 VMP Example: passing messages µ x γ N m xn γ = 1 2 x n µ ( ) Messages from each x n to γ. lnp(x µ,γ 1 ) = 1 2 (x n µ)2 1 2 Τ γ lnγ ln2π
35 VMP Example: passing messages Update Q(γ) parameter vector µ x N γ φ γ * = φ γ + N m xn γ n =1 = b + a 1 N n =1 m xn γ
36 VMP Example: converged distribution Q(µ,γ ) = Q µ (µ)q γ (γ) P(µ,γ V)
37 Features of VMP Guaranteed to converge to a local minimum of KL(Q P) Flexible message passing schedule factors can be updated in any order (thought it may alter convergence) Graph does not need to be a tree (needs to be acyclic)
38 Allowable Models and Constraints Parent-child edges must satisfy conjugacy Gaussian variable: Gaussian parent for its mean Gamma parent for its precision Gamma variable: Gamma scale parameter b Discrete Variable Dirichlet prior
39 Allowable Models and Constraints Table 1: Distributions for each parameter of a number of exponential family distributions if the model is to satisfy conjugacy constraints. Conjugacy also holds if the distributions are replaced by their multivariate counterparts e.g. the distribution conjugate to the precision matrix of a multivariate Gaussian is a Wishart distribution. Where None is specified, no standard distribution satisfies conjugacy.
40 Allowable Models and Constraints Truncated Distributions Incorporates deterministic variables Mixture distributions Multivariate distributions
41 Allowable Models: Mixture Models Not in the exponential family: K k =1 P(X {π k },{θ k }) = π k P k (X θ k ) Introduce latent variable,, which indicates component P(X λ,{θ k }) = K k =1 P k (X θ k ) δ λk lnp(x λ,{θ k }) = δ(λ,k) [ φ k (θ k ) Τ u k (X) + f k (X) + g k (θ k )] k
42 Allowable Models: Mixture Models Require that all component distributions have the same natural statistic vector: u X (X) = Δ Can rewrite log conditional: u 1 (X) =... = u K (X) lnp(x λ,{θ k }) = φ X (λ,{θ k }) Τ u X (X) + f X (X) + g (φ X (λ,{θ k })) where φ X = δ( λ,k)φ k (θ k ) k
43 Allowable Models: Mixture Models Now the messages are defined as: m X θ k = Q(λ = k) φ Xθ k m X Z chx = u X (X) λ m X λ = lnp k (X θ k ) θ K x N Z
44 Allowable Models General architecture: arbitrary directed acyclic subgraphs of multinomial discrete variables (with Dirichlet priors) Arbitrary subgraphs of univariate and multivariate linear Gaussian nodes (having gamma and Wishart priors) Arbitrary mixture nodes providing connections from discrete to continuous subgraphs Can include deterministic nodes Any continuous distribution can be truncated to restrict range of allowable values Includes: HMMs, Kalman Filters, Factor Analysis, PCA, ICA
45 VIBES VIBES inspired by WinBUGS Graphically specify models, and run inference
46 Extensions Can introduce additional variational parameters to use non-conjugate distributions Logistic Sigmoid function can be estimated by Gaussian-like bound Next step would be achieve a posterior estimate with some dependency structure (i.e. structured variational inference)
Variational Message Passing
Journal of Machine Learning Research 5 (2004)?-? Submitted 2/04; Published?/04 Variational Message Passing John Winn and Christopher M. Bishop Microsoft Research Cambridge Roger Needham Building 7 J. J.
More informationVariational Message Passing
Journal of Machine Learning Research 5 (2004)?-? Submitted 2/04; Published?/04 Variational Message Passing John Winn and Christopher M. Bishop Microsoft Research Cambridge Roger Needham Building 7 J. J.
More informationVIBES: A Variational Inference Engine for Bayesian Networks
VIBES: A Variational Inference Engine for Bayesian Networks Christopher M. Bishop Microsoft Research Cambridge, CB3 0FB, U.K. research.microsoft.com/ cmbishop David Spiegelhalter MRC Biostatistics Unit
More informationVariational Bayes and Variational Message Passing
Variational Bayes and Variational Message Passing Mohammad Emtiyaz Khan CS,UBC Variational Bayes and Variational Message Passing p.1/16 Variational Inference Find a tractable distribution Q(H) that closely
More informationp L yi z n m x N n xi
y i z n x n N x i Overview Directed and undirected graphs Conditional independence Exact inference Latent variables and EM Variational inference Books statistical perspective Graphical Models, S. Lauritzen
More informationVariational Learning : From exponential families to multilinear systems
Variational Learning : From exponential families to multilinear systems Ananth Ranganathan th February 005 Abstract This note aims to give a general overview of variational inference on graphical models.
More informationMachine Learning Techniques for Computer Vision
Machine Learning Techniques for Computer Vision Part 2: Unsupervised Learning Microsoft Research Cambridge x 3 1 0.5 0.2 0 0.5 0.3 0 0.5 1 ECCV 2004, Prague x 2 x 1 Overview of Part 2 Mixture models EM
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate
More informationRecent Advances in Bayesian Inference Techniques
Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian
More informationVariational Scoring of Graphical Model Structures
Variational Scoring of Graphical Model Structures Matthew J. Beal Work with Zoubin Ghahramani & Carl Rasmussen, Toronto. 15th September 2003 Overview Bayesian model selection Approximations using Variational
More informationUnsupervised Learning
Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College
More informationExpectation Propagation Algorithm
Expectation Propagation Algorithm 1 Shuang Wang School of Electrical and Computer Engineering University of Oklahoma, Tulsa, OK, 74135 Email: {shuangwang}@ou.edu This note contains three parts. First,
More information13: Variational inference II
10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational
More informationChris Bishop s PRML Ch. 8: Graphical Models
Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular
More informationLearning with Noisy Labels. Kate Niehaus Reading group 11-Feb-2014
Learning with Noisy Labels Kate Niehaus Reading group 11-Feb-2014 Outline Motivations Generative model approach: Lawrence, N. & Scho lkopf, B. Estimating a Kernel Fisher Discriminant in the Presence of
More informationLecture 13 : Variational Inference: Mean Field Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1
More informationVariational Principal Components
Variational Principal Components Christopher M. Bishop Microsoft Research 7 J. J. Thomson Avenue, Cambridge, CB3 0FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop In Proceedings
More informationPMR Learning as Inference
Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning
More informationLinear Dynamical Systems
Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations
More informationVariational Inference (11/04/13)
STA561: Probabilistic machine learning Variational Inference (11/04/13) Lecturer: Barbara Engelhardt Scribes: Matt Dickenson, Alireza Samany, Tracy Schifeling 1 Introduction In this lecture we will further
More informationCSC2535: Computation in Neural Networks Lecture 7: Variational Bayesian Learning & Model Selection
CSC2535: Computation in Neural Networks Lecture 7: Variational Bayesian Learning & Model Selection (non-examinable material) Matthew J. Beal February 27, 2004 www.variational-bayes.org Bayesian Model Selection
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationA graph contains a set of nodes (vertices) connected by links (edges or arcs)
BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,
More informationBayesian Machine Learning - Lecture 7
Bayesian Machine Learning - Lecture 7 Guido Sanguinetti Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh gsanguin@inf.ed.ac.uk March 4, 2015 Today s lecture 1
More informationIEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm
IEOR E4570: Machine Learning for OR&FE Spring 205 c 205 by Martin Haugh The EM Algorithm The EM algorithm is used for obtaining maximum likelihood estimates of parameters when some of the data is missing.
More informationTwo Useful Bounds for Variational Inference
Two Useful Bounds for Variational Inference John Paisley Department of Computer Science Princeton University, Princeton, NJ jpaisley@princeton.edu Abstract We review and derive two lower bounds on the
More informationCourse 495: Advanced Statistical Machine Learning/Pattern Recognition
Course 495: Advanced Statistical Machine Learning/Pattern Recognition Goal (Lecture): To present Probabilistic Principal Component Analysis (PPCA) using both Maximum Likelihood (ML) and Expectation Maximization
More informationIntroduction to Probabilistic Graphical Models
Introduction to Probabilistic Graphical Models Sargur Srihari srihari@cedar.buffalo.edu 1 Topics 1. What are probabilistic graphical models (PGMs) 2. Use of PGMs Engineering and AI 3. Directionality in
More informationCheng Soon Ong & Christian Walder. Canberra February June 2017
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2017 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 679 Part XIX
More informationProbabilistic Graphical Models (I)
Probabilistic Graphical Models (I) Hongxin Zhang zhx@cad.zju.edu.cn State Key Lab of CAD&CG, ZJU 2015-03-31 Probabilistic Graphical Models Modeling many real-world problems => a large number of random
More informationVariational Inference. Sargur Srihari
Variational Inference Sargur srihari@cedar.buffalo.edu 1 Plan of Discussion Functionals Calculus of Variations Maximizing a Functional Finding Approximation to a Posterior Minimizing K-L divergence Factorized
More informationA minimalist s exposition of EM
A minimalist s exposition of EM Karl Stratos 1 What EM optimizes Let O, H be a random variables representing the space of samples. Let be the parameter of a generative model with an associated probability
More information13 : Variational Inference: Loopy Belief Propagation and Mean Field
10-708: Probabilistic Graphical Models 10-708, Spring 2012 13 : Variational Inference: Loopy Belief Propagation and Mean Field Lecturer: Eric P. Xing Scribes: Peter Schulam and William Wang 1 Introduction
More informationIntroduction to Probabilistic Graphical Models: Exercises
Introduction to Probabilistic Graphical Models: Exercises Cédric Archambeau Xerox Research Centre Europe cedric.archambeau@xrce.xerox.com Pascal Bootcamp Marseille, France, July 2010 Exercise 1: basics
More informationProbabilistic Graphical Models: Representation and Inference
Probabilistic Graphical Models: Representation and Inference Aaron C. Courville Université de Montréal Note: Material for the slides is taken directly from a presentation prepared by Andrew Moore 1 Overview
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More informationStochastic Variational Inference
Stochastic Variational Inference David M. Blei Princeton University (DRAFT: DO NOT CITE) December 8, 2011 We derive a stochastic optimization algorithm for mean field variational inference, which we call
More informationBayesian Machine Learning
Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 4 Occam s Razor, Model Construction, and Directed Graphical Models https://people.orie.cornell.edu/andrew/orie6741 Cornell University September
More informationVariational Inference. Sargur Srihari
Variational Inference Sargur srihari@cedar.buffalo.edu 1 Plan of discussion We first describe inference with PGMs and the intractability of exact inference Then give a taxonomy of inference algorithms
More informationPart I. C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS
Part I C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS Probabilistic Graphical Models Graphical representation of a probabilistic model Each variable corresponds to a
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate
More informationPATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 13: SEQUENTIAL DATA
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 13: SEQUENTIAL DATA Contents in latter part Linear Dynamical Systems What is different from HMM? Kalman filter Its strength and limitation Particle Filter
More informationMachine Learning Summer School
Machine Learning Summer School Lecture 3: Learning parameters and structure Zoubin Ghahramani zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/ Department of Engineering University of Cambridge,
More informationAnother Walkthrough of Variational Bayes. Bevan Jones Machine Learning Reading Group Macquarie University
Another Walkthrough of Variational Bayes Bevan Jones Machine Learning Reading Group Macquarie University 2 Variational Bayes? Bayes Bayes Theorem But the integral is intractable! Sampling Gibbs, Metropolis
More informationCollapsed Variational Inference for Sum-Product Networks
for Sum-Product Networks Han Zhao 1, Tameem Adel 2, Geoff Gordon 1, Brandon Amos 1 Presented by: Han Zhao Carnegie Mellon University 1, University of Amsterdam 2 June. 20th, 2016 1 / 26 Outline Background
More informationNPFL108 Bayesian inference. Introduction. Filip Jurčíček. Institute of Formal and Applied Linguistics Charles University in Prague Czech Republic
NPFL108 Bayesian inference Introduction Filip Jurčíček Institute of Formal and Applied Linguistics Charles University in Prague Czech Republic Home page: http://ufal.mff.cuni.cz/~jurcicek Version: 21/02/2014
More informationStudy Notes on the Latent Dirichlet Allocation
Study Notes on the Latent Dirichlet Allocation Xugang Ye 1. Model Framework A word is an element of dictionary {1,,}. A document is represented by a sequence of words: =(,, ), {1,,}. A corpus is a collection
More informationLearning Bayesian network : Given structure and completely observed data
Learning Bayesian network : Given structure and completely observed data Probabilistic Graphical Models Sharif University of Technology Spring 2017 Soleymani Learning problem Target: true distribution
More informationExpectation Propagation for Approximate Bayesian Inference
Expectation Propagation for Approximate Bayesian Inference José Miguel Hernández Lobato Universidad Autónoma de Madrid, Computer Science Department February 5, 2007 1/ 24 Bayesian Inference Inference Given
More informationStatistical Approaches to Learning and Discovery
Statistical Approaches to Learning and Discovery Graphical Models Zoubin Ghahramani & Teddy Seidenfeld zoubin@cs.cmu.edu & teddy@stat.cmu.edu CALD / CS / Statistics / Philosophy Carnegie Mellon University
More informationVariational Mixture of Gaussians. Sargur Srihari
Variational Mixture of Gaussians Sargur srihari@cedar.buffalo.edu 1 Objective Apply variational inference machinery to Gaussian Mixture Models Demonstrates how Bayesian treatment elegantly resolves difficulties
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationECE521 Tutorial 11. Topic Review. ECE521 Winter Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides. ECE521 Tutorial 11 / 4
ECE52 Tutorial Topic Review ECE52 Winter 206 Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides ECE52 Tutorial ECE52 Winter 206 Credits to Alireza / 4 Outline K-means, PCA 2 Bayesian
More informationThe Expectation-Maximization Algorithm
1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable
More informationOn the Relationship between Sum-Product Networks and Bayesian Networks
On the Relationship between Sum-Product Networks and Bayesian Networks International Conference on Machine Learning, 2015 Han Zhao Mazen Melibari Pascal Poupart University of Waterloo, Waterloo, ON, Canada
More informationAn Introduction to Bayesian Machine Learning
1 An Introduction to Bayesian Machine Learning José Miguel Hernández-Lobato Department of Engineering, Cambridge University April 8, 2013 2 What is Machine Learning? The design of computational systems
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 9: Expectation Maximiation (EM) Algorithm, Learning in Undirected Graphical Models Some figures courtesy
More informationLecture 1a: Basic Concepts and Recaps
Lecture 1a: Basic Concepts and Recaps Cédric Archambeau Centre for Computational Statistics and Machine Learning Department of Computer Science University College London c.archambeau@cs.ucl.ac.uk Advanced
More informationInstructor: Dr. Volkan Cevher. 1. Background
Instructor: Dr. Volkan Cevher Variational Bayes Approximation ice University STAT 631 / ELEC 639: Graphical Models Scribe: David Kahle eviewers: Konstantinos Tsianos and Tahira Saleem 1. Background These
More informationExpectation Maximization
Expectation Maximization Aaron C. Courville Université de Montréal Note: Material for the slides is taken directly from a presentation prepared by Christopher M. Bishop Learning in DAGs Two things could
More informationCS839: Probabilistic Graphical Models. Lecture 7: Learning Fully Observed BNs. Theo Rekatsinas
CS839: Probabilistic Graphical Models Lecture 7: Learning Fully Observed BNs Theo Rekatsinas 1 Exponential family: a basic building block For a numeric random variable X p(x ) =h(x)exp T T (x) A( ) = 1
More informationLecture 6: Graphical Models: Learning
Lecture 6: Graphical Models: Learning 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering, University of Cambridge February 3rd, 2010 Ghahramani & Rasmussen (CUED)
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationMachine Learning for Data Science (CS4786) Lecture 24
Machine Learning for Data Science (CS4786) Lecture 24 Graphical Models: Approximate Inference Course Webpage : http://www.cs.cornell.edu/courses/cs4786/2016sp/ BELIEF PROPAGATION OR MESSAGE PASSING Each
More informationPATTERN RECOGNITION AND MACHINE LEARNING
PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 11 CRFs, Exponential Family CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due today Project milestones due next Monday (Nov 9) About half the work should
More informationLatent Variable View of EM. Sargur Srihari
Latent Variable View of EM Sargur srihari@cedar.buffalo.edu 1 Examples of latent variables 1. Mixture Model Joint distribution is p(x,z) We don t have values for z 2. Hidden Markov Model A single time
More informationFactor Analysis and Kalman Filtering (11/2/04)
CS281A/Stat241A: Statistical Learning Theory Factor Analysis and Kalman Filtering (11/2/04) Lecturer: Michael I. Jordan Scribes: Byung-Gon Chun and Sunghoon Kim 1 Factor Analysis Factor analysis is used
More informationThe Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision
The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that
More informationWeek 3: The EM algorithm
Week 3: The EM algorithm Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London Term 1, Autumn 2005 Mixtures of Gaussians Data: Y = {y 1... y N } Latent
More informationDeterministic Approximation Methods in Bayesian Inference
Deterministic Approximation Methods in Bayesian Inference Tobias Plötz Department of Computer Science Technical University of Darmstadt 64289 Darmstadt t_ploetz@rbg.informatik.tu-darmstadt.de Abstract
More informationGraphical Models and Kernel Methods
Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.
More informationChapter 16. Structured Probabilistic Models for Deep Learning
Peng et al.: Deep Learning and Practice 1 Chapter 16 Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 2 Structured Probabilistic Models way of using graphs to describe
More informationCurve Fitting Re-visited, Bishop1.2.5
Curve Fitting Re-visited, Bishop1.2.5 Maximum Likelihood Bishop 1.2.5 Model Likelihood differentiation p(t x, w, β) = Maximum Likelihood N N ( t n y(x n, w), β 1). (1.61) n=1 As we did in the case of the
More informationProbabilistic Graphical Models for Image Analysis - Lecture 4
Probabilistic Graphical Models for Image Analysis - Lecture 4 Stefan Bauer 12 October 2018 Max Planck ETH Center for Learning Systems Overview 1. Repetition 2. α-divergence 3. Variational Inference 4.
More informationGraphical Models 359
8 Graphical Models Probabilities play a central role in modern pattern recognition. We have seen in Chapter 1 that probability theory can be expressed in terms of two simple equations corresponding to
More informationGraphical Models for Collaborative Filtering
Graphical Models for Collaborative Filtering Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Sequence modeling HMM, Kalman Filter, etc.: Similarity: the same graphical model topology,
More informationMachine Learning using Bayesian Approaches
Machine Learning using Bayesian Approaches Sargur N. Srihari University at Buffalo, State University of New York 1 Outline 1. Progress in ML and PR 2. Fully Bayesian Approach 1. Probability theory Bayes
More informationAn introduction to Variational calculus in Machine Learning
n introduction to Variational calculus in Machine Learning nders Meng February 2004 1 Introduction The intention of this note is not to give a full understanding of calculus of variations since this area
More informationIntroduction to Probabilistic Machine Learning
Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning
More informationStatistical Machine Learning Lectures 4: Variational Bayes
1 / 29 Statistical Machine Learning Lectures 4: Variational Bayes Melih Kandemir Özyeğin University, İstanbul, Turkey 2 / 29 Synonyms Variational Bayes Variational Inference Variational Bayesian Inference
More informationQuantitative Biology II Lecture 4: Variational Methods
10 th March 2015 Quantitative Biology II Lecture 4: Variational Methods Gurinder Singh Mickey Atwal Center for Quantitative Biology Cold Spring Harbor Laboratory Image credit: Mike West Summary Approximate
More informationThe Variational Gaussian Approximation Revisited
The Variational Gaussian Approximation Revisited Manfred Opper Cédric Archambeau March 16, 2009 Abstract The variational approximation of posterior distributions by multivariate Gaussians has been much
More informationDeep Variational Inference. FLARE Reading Group Presentation Wesley Tansey 9/28/2016
Deep Variational Inference FLARE Reading Group Presentation Wesley Tansey 9/28/2016 What is Variational Inference? What is Variational Inference? Want to estimate some distribution, p*(x) p*(x) What is
More informationA Derivation of the EM Updates for Finding the Maximum Likelihood Parameter Estimates of the Student s t Distribution
A Derivation of the EM Updates for Finding the Maximum Likelihood Parameter Estimates of the Student s t Distribution Carl Scheffler First draft: September 008 Contents The Student s t Distribution The
More informationPosterior Regularization
Posterior Regularization 1 Introduction One of the key challenges in probabilistic structured learning, is the intractability of the posterior distribution, for fast inference. There are numerous methods
More informationPILCO: A Model-Based and Data-Efficient Approach to Policy Search
PILCO: A Model-Based and Data-Efficient Approach to Policy Search (M.P. Deisenroth and C.E. Rasmussen) CSC2541 November 4, 2016 PILCO Graphical Model PILCO Probabilistic Inference for Learning COntrol
More informationProbabilistic and Bayesian Machine Learning
Probabilistic and Bayesian Machine Learning Day 4: Expectation and Belief Propagation Yee Whye Teh ywteh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London http://www.gatsby.ucl.ac.uk/
More informationBayesian Inference Course, WTCN, UCL, March 2013
Bayesian Course, WTCN, UCL, March 2013 Shannon (1948) asked how much information is received when we observe a specific value of the variable x? If an unlikely event occurs then one would expect the information
More informationPattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions
Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite
More informationVariational inference
Simon Leglaive Télécom ParisTech, CNRS LTCI, Université Paris Saclay November 18, 2016, Télécom ParisTech, Paris, France. Outline Introduction Probabilistic model Problem Log-likelihood decomposition EM
More informationJunction Tree, BP and Variational Methods
Junction Tree, BP and Variational Methods Adrian Weller MLSALT4 Lecture Feb 21, 2018 With thanks to David Sontag (MIT) and Tony Jebara (Columbia) for use of many slides and illustrations For more information,
More informationArtificial Intelligence
Artificial Intelligence Probabilities Marc Toussaint University of Stuttgart Winter 2018/19 Motivation: AI systems need to reason about what they know, or not know. Uncertainty may have so many sources:
More informationGaussian Mixture Models
Gaussian Mixture Models David Rosenberg, Brett Bernstein New York University April 26, 2017 David Rosenberg, Brett Bernstein (New York University) DS-GA 1003 April 26, 2017 1 / 42 Intro Question Intro
More informationParametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a
Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Some slides are due to Christopher Bishop Limitations of K-means Hard assignments of data points to clusters small shift of a
More informationOnline Algorithms for Sum-Product
Online Algorithms for Sum-Product Networks with Continuous Variables Priyank Jaini Ph.D. Seminar Consistent/Robust Tensor Decomposition And Spectral Learning Offline Bayesian Learning ADF, EP, SGD, oem
More informationLecture 7 and 8: Markov Chain Monte Carlo
Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani
More informationVariational Autoencoder
Variational Autoencoder Göker Erdo gan August 8, 2017 The variational autoencoder (VA) [1] is a nonlinear latent variable model with an efficient gradient-based training procedure based on variational
More information