Variational Scoring of Graphical Model Structures


1 Variational Scoring of Graphical Model Structures Matthew J. Beal Work with Zoubin Ghahramani & Carl Rasmussen, Toronto. 15th September 2003

2 Overview
- Bayesian model selection
- Approximations using Variational Bayesian EM
- Annealed Importance Sampling
- Structure scoring in discrete Directed Acyclic Graphs
- Cheeseman-Stutz vs. Variational Bayes (if time)

3 Bayesian Model Selection
Select the model class m_j with the highest probability given the data y:

    p(m_j | y) = p(m_j) p(y | m_j) / p(y),    p(y | m_j) = \int d\theta_j \, p(\theta_j | m_j) p(y | \theta_j, m_j)

Interpretation: the marginal likelihood is the probability that randomly selected parameter values from the model class would generate the data set y.

[Figure: P(Data) plotted against possible data sets Y for "too simple", "just right", and "too complex" model classes.]

Model classes that are too simple are unlikely to generate the data set. Model classes that are too complex can generate many possible data sets, so again, they are unlikely to generate that particular data set at random.
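The "random parameters generating the data" reading suggests a brute-force check: draw parameters from the prior and average the likelihood of the data under each draw. Below is a minimal Python sketch for a Beta-Bernoulli coin model (not one of the models in the talk; the data, prior and sample sizes are made up), where the exact marginal likelihood is available for comparison.

```python
import numpy as np
from scipy.special import betaln

rng = np.random.default_rng(0)
y = rng.binomial(1, 0.7, size=20)            # toy binary data
a, b = 1.0, 1.0                               # Beta(a, b) prior on the coin bias

# Exact log marginal likelihood of the Beta-Bernoulli model
n1, n0 = y.sum(), len(y) - y.sum()
exact = betaln(a + n1, b + n0) - betaln(a, b)

# "Prior sampling" estimate: average the likelihood over parameters drawn from the prior
thetas = rng.beta(a, b, size=100_000)
loglik = n1 * np.log(thetas) + n0 * np.log(1 - thetas)
estimate = np.log(np.mean(np.exp(loglik)))

print(exact, estimate)    # the two numbers should agree closely
```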

4 Intractable Likelihoods
The marginal likelihood is often a difficult integral

    p(y | m) = \int d\theta \, p(\theta | m) p(y | \theta)

because of the high dimensionality of the parameter space (analytical intractability), and also due to the presence of hidden variables:

    p(y | m) = \int d\theta \, p(\theta | m) p(y | \theta) = \int d\theta \, p(\theta | m) \int dx \, p(y, x | \theta, m)

[Figure: graphical model with parameter node \theta, hidden variables x, and observations y.]

5 Practical Bayesian methods
Laplace approximations: appeal to the Central Limit Theorem, making a Gaussian approximation about the maximum a posteriori parameter estimate \hat{\theta}:

    \ln p(y | m) \approx \ln p(\hat{\theta} | m) + \ln p(y | \hat{\theta}) + (d/2) \ln 2\pi - (1/2) \ln |H|

Large sample approximations, e.g. BIC: as n \to \infty,

    \ln p(y | m) \approx \ln p(y | \hat{\theta}) - (d/2) \ln n

Markov chain Monte Carlo (MCMC): guaranteed to converge in the limit. Many samples required for accurate results. Hard to assess convergence.

Variational approximations: ... this changes the cost function.

Other deterministic approximations are also available now, e.g. Bethe approximations (Yedidia, Freeman & Weiss, 2000) and Expectation Propagation (Minka, 2001).
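To make the Laplace and BIC formulas concrete, the following sketch evaluates both for the same toy Beta-Bernoulli model, where the exact log evidence is known in closed form; the data, prior, and model choice are illustrative, not from the talk.

```python
import numpy as np
from scipy.special import betaln

y = np.array([1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1])
n1, n0, n = y.sum(), (y == 0).sum(), len(y)
a, b = 2.0, 2.0                       # Beta(2, 2) prior so the MAP is interior

# Exact log evidence for reference
exact = betaln(a + n1, b + n0) - betaln(a, b)

# Laplace: Gaussian around the MAP of the unnormalised log posterior
theta_map = (n1 + a - 1) / (n + a + b - 2)
log_joint = ((a - 1 + n1) * np.log(theta_map)
             + (b - 1 + n0) * np.log(1 - theta_map) - betaln(a, b))
hessian = -(a - 1 + n1) / theta_map**2 - (b - 1 + n0) / (1 - theta_map)**2
d = 1
laplace = log_joint + 0.5 * d * np.log(2 * np.pi) - 0.5 * np.log(-hessian)

# BIC: maximum likelihood plus a dimensionality penalty
theta_ml = n1 / n
bic = n1 * np.log(theta_ml) + n0 * np.log(1 - theta_ml) - 0.5 * d * np.log(n)

print(exact, laplace, bic)
```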

6 Lower Bounding the Marginal Likelihood: Variational Bayesian Learning
Let the hidden states be x, the data y, and the parameters \theta. We can lower bound the marginal likelihood (Jensen's inequality):

    \ln p(y | m) = \ln \int dx \, d\theta \, p(y, x, \theta | m)
                 = \ln \int dx \, d\theta \, q(x, \theta) \frac{p(y, x, \theta | m)}{q(x, \theta)}
                 \geq \int dx \, d\theta \, q(x, \theta) \ln \frac{p(y, x, \theta | m)}{q(x, \theta)}.

Use a simpler, factorised approximation q(x, \theta) \approx q_x(x) q_\theta(\theta):

    \ln p(y | m) \geq \int dx \, d\theta \, q_x(x) q_\theta(\theta) \ln \frac{p(y, x, \theta | m)}{q_x(x) q_\theta(\theta)} = F_m(q_x(x), q_\theta(\theta), y).

7 Updating the variational approximation...
Maximizing this lower bound, F_m, leads to EM-like updates:

    q_x^*(x) \propto \exp[ \int d\theta \, q_\theta(\theta) \ln p(x, y | \theta) ]          (E-like step)
    q_\theta^*(\theta) \propto p(\theta) \exp[ \int dx \, q_x(x) \ln p(x, y | \theta) ]     (M-like step)

Maximizing F_m is equivalent to minimizing the KL divergence between the approximate posterior, q_\theta(\theta) q_x(x), and the true posterior, p(\theta, x | y, m):

    \ln p(y | m) - F_m(q_x(x), q_\theta(\theta), y)
        = \int dx \, d\theta \, q_x(x) q_\theta(\theta) \ln \frac{q_x(x) q_\theta(\theta)}{p(x, \theta | y, m)} = KL(q || p),

where \ln p(y | m) is the desired quantity, F_m is computable, and KL(q || p) measures the inaccuracy of the approximation.

In the limit as n \to \infty, for identifiable models, the variational lower bound approaches Schwarz's (1978) BIC criterion.

8 Conjugate-Exponential models
Let's focus on conjugate-exponential (CE) models, which satisfy conditions (1) and (2):

Condition (1). The joint probability over variables is in the exponential family:

    p(x, y | \theta) = f(x, y) \, g(\theta) \exp\{ \phi(\theta)^T u(x, y) \}

where \phi(\theta) is the vector of natural parameters and u(x, y) are the sufficient statistics.

Condition (2). The prior over parameters is conjugate to this joint probability:

    p(\theta | \eta, \nu) = h(\eta, \nu) \, g(\theta)^\eta \exp\{ \phi(\theta)^T \nu \}

where \eta and \nu are hyperparameters of the prior.

Conjugate priors are computationally convenient and have an intuitive interpretation:
- \eta: number of pseudo-observations
- \nu: values of pseudo-observations
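The pseudo-observation reading of (\eta, \nu) is easiest to see for the Dirichlet-multinomial pair, the CE model used for the discrete DAGs later on. A minimal sketch (the function name and the counts are illustrative):

```python
import numpy as np

def dirichlet_posterior(prior_counts, data):
    """Conjugate update: posterior hyperparameters are the prior pseudo-counts
    plus the observed sufficient statistics (category counts)."""
    counts = np.bincount(data, minlength=len(prior_counts))
    return prior_counts + counts

prior = np.array([1.0, 1.0, 1.0])            # Dirichlet(1,1,1): three pseudo-observations
data = np.array([0, 2, 2, 1, 2, 2, 0])        # observed categories
posterior = dirichlet_posterior(prior, data)
print(posterior)                              # [3. 2. 5.] -- prior counts plus data counts
```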

9 Conjugate-Exponential examples
In the CE family:
- Gaussian mixtures
- factor analysis, probabilistic PCA
- hidden Markov models and factorial HMMs
- linear dynamical systems and switching models
- discrete-variable belief networks
Other as yet undreamt-of models can combine Gaussian, Gamma, Poisson, Dirichlet, Wishart, Multinomial and others.

Not in the CE family:
- Boltzmann machines, MRFs (no conjugacy)
- logistic regression (no conjugacy)
- sigmoid belief networks (not exponential)
- independent components analysis (not exponential)
One can often approximate these models with models in the CE family, e.g. IFA (Attias, 1998).

10 A very useful result
Theorem. Given an iid data set y = (y_1, ..., y_n), if the model is CE then:

(a) q_\theta(\theta) is also conjugate, i.e.

    q_\theta(\theta) = h(\tilde{\eta}, \tilde{\nu}) \, g(\theta)^{\tilde{\eta}} \exp\{ \phi(\theta)^T \tilde{\nu} \}

(b) q_x(x) = \prod_{i=1}^n q_{x_i}(x_i) is of the same form as in the E step of regular EM, but using pseudo-parameters computed by averaging over q_\theta(\theta):

    q_{x_i}(x_i) \propto f(x_i, y_i) \exp\{ \bar{\phi}^T u(x_i, y_i) \} = p(x_i | y_i, \tilde{\theta}),
    where \bar{\phi} = \langle \phi(\theta) \rangle_{q_\theta(\theta)} = \phi(\tilde{\theta}).

KEY points:
(a) the approximate parameter posterior is of the same form as the prior;
(b) the approximate hidden-variable posterior, averaging over all parameters, is of the same form as the exact hidden-variable posterior under \tilde{\theta}.

11 The Variational Bayesian EM algorithm

EM for MAP estimation
  Goal: maximize p(\theta | y, m) w.r.t. \theta
  E step:  q_x^{(t+1)}(x) = p(x | y, \theta^{(t)})
  M step:  \theta^{(t+1)} = \arg\max_\theta \int dx \, q_x^{(t+1)}(x) \ln p(x, y, \theta)

Variational Bayesian EM
  Goal: lower bound p(y | m)
  VB-E step:  q_x^{(t+1)}(x) = p(x | y, \bar{\phi}^{(t)})
  VB-M step:  q_\theta^{(t+1)}(\theta) \propto \exp[ \int dx \, q_x^{(t+1)}(x) \ln p(x, y, \theta) ]

Properties:
- Reduces to the EM algorithm if q_\theta(\theta) = \delta(\theta - \theta^*).
- F_m increases monotonically, and incorporates the model complexity penalty.
- Analytical parameter distributions (but not constrained to be Gaussian).
- The VB-E step has the same complexity as the corresponding E step.
- We can use the junction tree, belief propagation, Kalman filter, etc. algorithms in the VB-E step of VB-EM, but using expected natural parameters, \bar{\phi}.
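As an illustration of the two steps, here is a minimal VB-EM loop for a mixture of categorical distributions with Dirichlet priors, a CE model in which the VB-E step only needs the expected natural parameters (digammas of the Dirichlet counts). This is a sketch under assumed names and toy data, not the talk's implementation.

```python
import numpy as np
from scipy.special import digamma

def vbem_mixture(x, K, V, alpha=1.0, beta=1.0, iters=100):
    """Minimal VB-EM for a mixture of categorical distributions.
    x: (n,) integer data in {0..V-1}; K mixture components;
    Dirichlet(alpha) prior on mixing weights, Dirichlet(beta) on emissions."""
    n = len(x)
    rng = np.random.default_rng(0)
    r = rng.dirichlet(np.ones(K), size=n)           # responsibilities q_x(x_i)
    X = np.eye(V)[x]                                 # one-hot data, (n, V)
    for _ in range(iters):
        # VB-M step: Dirichlet posteriors = prior pseudo-counts + expected counts
        a = alpha + r.sum(axis=0)                    # (K,)
        b = beta + r.T @ X                           # (K, V)
        # VB-E step: use expected natural parameters E[ln pi], E[ln theta]
        ln_pi = digamma(a) - digamma(a.sum())
        ln_th = digamma(b) - digamma(b.sum(axis=1, keepdims=True))
        log_r = ln_pi[None, :] + X @ ln_th.T         # (n, K)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
    return a, b, r

# Toy data from two clearly different categorical sources
x = np.array([0, 0, 1, 0, 0, 1, 3, 4, 4, 3, 4, 4])
a, b, r = vbem_mixture(x, K=2, V=5)
print(np.round(r, 2))    # responsibilities should separate the two halves
```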

12 Variational Bayesian EM
The Variational Bayesian EM algorithm has been used to approximate Bayesian learning in a wide range of models, such as:
- probabilistic PCA and factor analysis (Bishop, 1999)
- mixtures of Gaussians (Attias, 1999)
- mixtures of factor analysers (Ghahramani & Beal, 1999)
- state-space models (Ghahramani & Beal, 2000; Beal, 2003)
- ICA, IFA (Attias, 1999; Miskin & MacKay, 2000; Valpola, 2000)
- mixtures of experts (Ueda & Ghahramani, 2000)
- hidden Markov models (MacKay, 1995; Beal, 2003)

The main advantage is that it can be used to automatically do model selection and does not suffer from overfitting to the same extent as ML methods do. Also, it is about as computationally demanding as the usual EM algorithm.

See:

13 Graphical Models (Bayesian Networks / Directed Acyclic Graphical Models)

[Figure: a five-node Bayesian network over A, B, C, D, E.]

A Bayesian network corresponds to a factorization of the joint distribution:

    p(A, B, C, D, E) = p(A) p(B) p(C | A, B) p(D | B, C) p(E | C, D)

In general:

    p(X_1, ..., X_n) = \prod_{i=1}^n p(X_i | X_{pa(i)})

where pa(i) are the parents of node i.

Semantics: X \perp Y | V if V d-separates X from Y.

Two advantages: interpretability and efficiency.
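The efficiency advantage is that the joint is specified by a handful of small conditional tables rather than one table over all 2^5 configurations. A sketch assembling the joint for binary A-E from made-up CPTs:

```python
import numpy as np

# Illustrative CPTs for binary variables; indices are [parent values..., value]
pA = np.array([0.6, 0.4])                                      # p(A)
pB = np.array([0.7, 0.3])                                      # p(B)
pC = np.random.default_rng(0).dirichlet([1, 1], size=(2, 2))   # p(C | A, B)
pD = np.random.default_rng(1).dirichlet([1, 1], size=(2, 2))   # p(D | B, C)
pE = np.random.default_rng(2).dirichlet([1, 1], size=(2, 2))   # p(E | C, D)

# Joint p(A,B,C,D,E) assembled from the factorization
joint = np.einsum('a,b,abc,bcd,cde->abcde', pA, pB, pC, pD, pE)
print(joint.sum())    # 1.0: a valid distribution built from 2+2+8+8+8 numbers
```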

14 Model Selection Task
Which of the following graphical models is the data generating process?

Discrete directed acyclic graphical models, with data y = (A, B, C, D, E)^n, all observed:
[Figure: three candidate structures over A, B, C, D, E: the true structure, a more complex one, and a simpler one.]
With all variables observed, the marginal likelihood is tractable.

If the data are just y = (C, D, E)^n, and s = (A, B)^n are hidden variables...?
[Figure: the same three candidate structures, now with A and B hidden.]
With hidden variables, the marginal likelihood is intractable.

15 Case study: Discrete DAGs
Let the hidden and observed variables be denoted z = {z_1, ..., z_n} = {s_1, y_1, ..., s_n, y_n}, where for each data point i the variables j \in H are hidden and j \in V are observed, i.e. s_i = {z_{ij}}_{j \in H} and y_i = {z_{ij}}_{j \in V}.

Complete-data likelihood:

    p(z | \theta) = \prod_{i=1}^n \prod_{j=1}^{|z_i|} p(z_{ij} | z_{i,pa(j)}, \theta)

Complete-data marginal likelihood:

    p(z | m) = \int d\theta \, p(\theta | m) \prod_{i=1}^n \prod_{j=1}^{|z_i|} p(z_{ij} | z_{i,pa(j)}, \theta)

Incomplete-data likelihood:

    p(y | \theta) = \prod_{i=1}^n p(y_i | \theta) = \prod_{i=1}^n \sum_{\{z_{ij}\}_{j \in H}} \prod_{j=1}^{|z_i|} p(z_{ij} | z_{i,pa(j)}, \theta)

Incomplete-data marginal likelihood:

    p(y | m) = \int d\theta \, p(\theta | m) \prod_{i=1}^n \sum_{\{z_{ij}\}_{j \in H}} \prod_{j=1}^{|z_i|} p(z_{ij} | z_{i,pa(j)}, \theta)     (intractable!)
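For the bipartite structures used in the experiments below, the sum over hidden configurations is tiny (four joint states of two binary hidden variables), so the incomplete-data likelihood for a fixed \theta can be computed by direct enumeration. A sketch, assuming for simplicity that both hidden variables are parents of every observed variable; the shapes and data are made up:

```python
import numpy as np

def incomplete_loglik(Y, p_s, p_y):
    """Incomplete-data log likelihood for a bipartite discrete DAG.
    Y: (n, 4) observed values; p_s: (4,) prior over the 4 joint states of
    (s1, s2); p_y: (4, 4, V) with p_y[j, k] = p(y_j = . | hidden state k)."""
    total = 0.0
    for i in range(Y.shape[0]):
        # sum over the 4 joint hidden configurations
        per_state = p_s.copy()
        for j in range(4):
            per_state *= p_y[j, :, Y[i, j]]
        total += np.log(per_state.sum())
    return total

rng = np.random.default_rng(0)
p_s = rng.dirichlet(np.ones(4))                   # joint prior over (s1, s2)
p_y = rng.dirichlet(np.ones(5), size=(4, 4))      # 4 observed vars, 5 values each
Y = rng.integers(0, 5, size=(100, 4))             # toy observations
print(incomplete_loglik(Y, p_s, p_y))
```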

16 The BIC and CS approximations
BIC - Bayesian Information Criterion:

    p(y | m) = \int d\theta \, p(\theta | m) p(y | \theta),
    \ln p(y | m) \approx \ln p(y | m)_{BIC} = \ln p(y | \hat{\theta}) - (d/2) \ln n,

where EM is used to find \hat{\theta}.

CS - Cheeseman-Stutz criterion:

    p(y | m) = \frac{p(y | m)}{p(z | m)} p(z | m) = p(s, y | m) \frac{\int d\theta \, p(\theta | m) p(y | \theta)}{\int d\theta \, p(\theta | m) p(s, y | \theta)},     (*)

    \ln p(y | m) \approx \ln p(y | m)_{CS} = \ln p(\hat{s}, y | m) + \ln p(y | \hat{\theta}) - \ln p(\hat{s}, y | \hat{\theta}).

(*) is correct for any completion s of the hidden variables, so what completion \hat{s} should we use? [CS uses the result of the E step of the EM algorithm.]
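Both scores are cheap once EM has been run. The sketch below just spells out the two formulas; the helper names in the usage comment (run_em, log_likelihood, complete_data_log_evidence, ...) are placeholders for what a discrete-DAG implementation would provide, not functions from the talk.

```python
import numpy as np

def bic_score(loglik_at_mle, d, n):
    # ln p(y | m) ~ ln p(y | theta_hat) - (d/2) ln n
    return loglik_at_mle - 0.5 * d * np.log(n)

def cs_score(complete_log_evidence, loglik_at_mle, complete_loglik_at_mle):
    # ln p(y | m) ~ ln p(s_hat, y | m) + ln p(y | theta_hat) - ln p(s_hat, y | theta_hat),
    # where s_hat is the soft completion from the final E step of EM.
    return complete_log_evidence + loglik_at_mle - complete_loglik_at_mle

# Hypothetical usage, given an EM fit of a discrete DAG (all helper names are placeholders):
#   theta_hat, q_s = run_em(Y, structure)
#   bic = bic_score(log_likelihood(Y, theta_hat), num_free_params(structure), len(Y))
#   cs  = cs_score(complete_data_log_evidence(Y, q_s, structure),
#                  log_likelihood(Y, theta_hat),
#                  complete_data_log_likelihood(Y, q_s, theta_hat))
```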

17 Experiments
Bipartite structure: only hidden variables can be parents of observed variables. Two binary hidden variables, and four five-valued discrete observed variables.

[Figure: hidden variables s_{i1}, s_{i2} with arrows to observed variables y_{i1}, ..., y_{i4}, for i = 1, ..., n.]

The conjugate prior is Dirichlet; this is a conjugate-exponential model, so the VB-EM algorithm is a straightforward modification of EM.

Experiment: there are 136 distinct structures (out of 256) with 2 latent variables as potential parents of 4 conditionally independent observed variables. Score each structure for twenty data sets of varying size,

    n \in {10, 20, 40, 80, 110, 160, 230, 320, 400, 430, 480, 560, 640, 800, 960, 1120, 1280, 2560, 5120, 10240},

using 4 methods: BIC, CS, VB, and a gold standard, AIS. This gives 2720 graphs to score; approximate time per score: BIC (1.5s), CS (1.5s), VB (4s), AIS (400s).
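The count of 136 (out of 4^4 = 256 choices of parent set for the four observed variables) is consistent with identifying structures that differ only by swapping the labels of the two hidden variables; a quick enumeration confirming the arithmetic (a sketch, not the authors' code):

```python
from itertools import product

# Each of the 4 observed variables has a parent set drawn from
# {}, {s1}, {s2}, {s1, s2}; encode each as a sorted tuple of hidden-variable labels.
parent_choices = [(), (1,), (2,), (1, 2)]

def swap_labels(parents):
    """Relabel s1 <-> s2 in one parent set."""
    return tuple(sorted(3 - p for p in parents))

def canonical(structure):
    """Canonical form under swapping the labels of the two hidden variables."""
    swapped = tuple(swap_labels(p) for p in structure)
    return min(structure, swapped)

all_structures = list(product(parent_choices, repeat=4))
distinct = {canonical(s) for s in all_structures}
print(len(all_structures), len(distinct))    # 256 136
```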

18 Annealed Importance Sampling (AIS)
AIS is a state-of-the-art method for estimating marginal likelihoods, by breaking a difficult integral into a series of easier ones. It combines ideas from importance sampling, Markov chain Monte Carlo, and annealing.

Define

    Z_k = \int d\theta \, p(\theta | m) p(y | \theta)^{\tau(k)} = \int d\theta \, f_k(\theta)

with \tau(0) = 0, so Z_0 = \int d\theta \, p(\theta | m) = 1 (the normalisation of the prior), and \tau(K) = 1, so Z_K = p(y | m) (the marginal likelihood).

Schedule {\tau(k)}_{k=1}^K:

    Z_K / Z_0 = (Z_1 / Z_0)(Z_2 / Z_1) \cdots (Z_K / Z_{K-1}).

Importance sample from f_{k-1}(\theta) as follows: with \theta^{(r)} \sim f_{k-1}(\theta) / Z_{k-1},

    Z_k / Z_{k-1} = \int d\theta \, \frac{f_k(\theta)}{f_{k-1}(\theta)} \frac{f_{k-1}(\theta)}{Z_{k-1}}
                  \approx \frac{1}{R} \sum_{r=1}^R \frac{f_k(\theta^{(r)})}{f_{k-1}(\theta^{(r)})}
                  = \frac{1}{R} \sum_{r=1}^R p(y | \theta^{(r)}, m)^{\tau(k) - \tau(k-1)}

How reliable is AIS? How tight are the variational bounds?
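A minimal AIS sketch for the toy Beta-Bernoulli model used earlier, where the exact log evidence is available as a check. The talk's experiments anneal over Dirichlet parameters with a tuned Metropolis-Hastings kernel; here the transition is a simple random-walk M-H on one parameter, and all settings are illustrative.

```python
import numpy as np
from scipy.special import betaln

rng = np.random.default_rng(0)
y = rng.binomial(1, 0.7, size=30)
n1, n0 = y.sum(), len(y) - y.sum()
a, b = 1.0, 1.0
exact = betaln(a + n1, b + n0) - betaln(a, b)

def loglik(theta):
    return n1 * np.log(theta) + n0 * np.log(1 - theta)

def log_prior(theta):
    return (a - 1) * np.log(theta) + (b - 1) * np.log(1 - theta) - betaln(a, b)

K, R = 200, 20                           # annealing steps, independent runs
taus = np.linspace(0, 1, K + 1)
log_weights = np.zeros(R)
for r in range(R):
    theta = rng.beta(a, b)               # start from a prior sample (tau = 0)
    for k in range(1, K + 1):
        # accumulate the importance weight for moving from f_{k-1} to f_k
        log_weights[r] += (taus[k] - taus[k - 1]) * loglik(theta)
        # M-H transition leaving f_k invariant (random-walk proposal)
        prop = theta + 0.1 * rng.normal()
        if 0 < prop < 1:
            log_acc = (log_prior(prop) + taus[k] * loglik(prop)
                       - log_prior(theta) - taus[k] * loglik(theta))
            if np.log(rng.uniform()) < log_acc:
                theta = prop
# each run yields one importance weight; average the weights across runs
ais_estimate = np.log(np.mean(np.exp(log_weights)))
print(exact, ais_estimate)               # should be close for this easy problem
```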

19 How reliable is the AIS for this problem?
Varying the annealing schedule with random initialisation, at n = 480.

[Figure: AIS marginal likelihood estimates plotted against the duration of annealing (number of samples), compared with the VB and BIC scores.]

20 Scoring all structures by every method

[Figure (data set of size 400): log marginal likelihood estimate of every structure under MAP, AIS, VB, and BIC.]

21 Every method scoring every structure

[Figure: scores of every structure under MAP, BIC, BICp, CS, VB, and AIS (5 runs).]

22 Ranking of the true structure by each method

[Table: rank of the true structure under MAP, BIC, BICp, CS, VB, and AIS (5 runs), for each data set size n.]

23 Ranking the true structure
The VB score finds the correct structure earlier, and more reliably.

[Figure: rank of the true structure against n, for AIS, VB, CS, BICp, and BIC.]

24 Acceptance rate of the AIS sampler
Metropolis-Hastings acceptance fraction, measured over each of four quarters of the annealing schedule.

[Figure: acceptance fraction against n for the 1st, 2nd, 3rd, and 4th quarters of the schedule.]

25 How true are the various scores?

[Figure: difference in scores between the true and top-ranked structures, against n, for AIS, VB, CS, BICp, and BIC.]

26 Average Rank of the true structure
Averaged over true structure parameters drawn from the prior (106 instances).

[Figure: median rank of the true structure against n, for VB, CS, BICp, and BIC.]

27 True vs. top-ranked score differences
Averaged over true structure parameters drawn from the prior (106 instances).

[Figure: median score difference against n, for VB, CS, BICp, and BIC.]

28 Overall Success Rate of each method
Averaged over true structure parameters drawn from the prior (106 instances).

[Figure: success rate at selecting the true structure against n, for VB, CS, BICp, and BIC.]

29 Results summary
- VB outperforms BIC on ranking, over the 20 data sets and 95 instances.
  [Table: percentage of times that VB ranks worse than, the same as, or better than BIC*, BICp*, CS*, BIC, BICp, and CS.]
- The AIS gold standard can break down at high n, violating VB's strict lower bound when scoring the 136 structures.
  [Table: for each n, the percentage of M-H rejections, the number of structures where a single AIS run (1) scores below VB, and the number where the average of 5 AIS runs scores below VB.]
- AIS has many parameters to tune: Metropolis-Hastings proposal widths/shapes, (non-linear) annealing schedules, number of samples, reaching equilibrium... VB has none!

30 Cheeseman-Stutz is a lower bound

    p(y | m)_{CS} = p(\hat{s}, y | m) \frac{p(y | \hat{\theta})}{p(\hat{s}, y | \hat{\theta})} \leq p(y | m).

Take \hat{q}_{s_i}(s_i) \equiv p(s_i | y_i, \hat{\theta}) (the E step at \hat{\theta}), and define the completion \hat{s}_i by

    \ln p(\hat{s}_i, y_i | \theta) = \sum_{s_i} \hat{q}_{s_i}(s_i) \ln p(s_i, y_i | \theta).

By Jensen's inequality, for any choice of the \hat{q}_{s_i},

    p(y | m) = \int d\theta \, p(\theta) \prod_{i=1}^n p(y_i | \theta)
             \geq \int d\theta \, p(\theta) \prod_{i=1}^n \exp\Big\{ \sum_{s_i} \hat{q}_{s_i}(s_i) \ln \frac{p(s_i, y_i | \theta)}{\hat{q}_{s_i}(s_i)} \Big\},

while at \hat{\theta} the corresponding bound is tight:

    p(y | \hat{\theta}) = \prod_{i=1}^n p(y_i | \hat{\theta}) = \prod_{i=1}^n \exp\Big\{ \sum_{s_i} \hat{q}_{s_i}(s_i) \ln \frac{p(s_i, y_i | \hat{\theta})}{\hat{q}_{s_i}(s_i)} \Big\}.

Therefore

    p(y | m) \geq \prod_{i=1}^n \exp\Big\{ -\sum_{s_i} \hat{q}_{s_i}(s_i) \ln \hat{q}_{s_i}(s_i) \Big\} \int d\theta \, p(\theta) \prod_{i=1}^n \exp\Big\{ \sum_{s_i} \hat{q}_{s_i}(s_i) \ln p(s_i, y_i | \theta) \Big\}
             = \frac{p(y | \hat{\theta})}{\prod_{i=1}^n \exp\{ \sum_{s_i} \hat{q}_{s_i}(s_i) \ln p(s_i, y_i | \hat{\theta}) \}} \int d\theta \, p(\theta) \prod_{i=1}^n p(\hat{s}_i, y_i | \theta)
             = \frac{p(y | \hat{\theta})}{\prod_{i=1}^n p(\hat{s}_i, y_i | \hat{\theta})} \int d\theta \, p(\theta) \prod_{i=1}^n p(\hat{s}_i, y_i | \theta)
             = p(y | m)_{CS}.

31 VB can be made universally tighter than CS

    \ln p(y | m)_{CS} \leq F_m(q_s(s), q_\theta(\theta)) \leq \ln p(y | m).

Let's approach this result by considering the following forms for q_s(s) and q_\theta(\theta):

    q_s(s) = \prod_{i=1}^n q_{s_i}(s_i), with q_{s_i}(s_i) = p(s_i | y_i, \hat{\theta}),
    q_\theta(\theta) \propto \exp \langle \ln p(\theta) p(s, y | \theta) \rangle_{q_s(s)}.

We write the form for q_\theta(\theta) explicitly:

    q_\theta(\theta) = \frac{ p(\theta) \prod_{i=1}^n \exp\{ \sum_{s_i} q_{s_i}(s_i) \ln p(s_i, y_i | \theta) \} }{ \int d\theta' \, p(\theta') \prod_{i=1}^n \exp\{ \sum_{s_i} q_{s_i}(s_i) \ln p(s_i, y_i | \theta') \} },

and then substitute q_s(s) and q_\theta(\theta) into the variational lower bound F_m.

32 Substituting these choices into the lower bound:

    F_m(q_s(s), q_\theta(\theta))
      = \int d\theta \, q_\theta(\theta) \sum_{i=1}^n \sum_{s_i} q_{s_i}(s_i) \ln \frac{p(s_i, y_i | \theta)}{q_{s_i}(s_i)} + \int d\theta \, q_\theta(\theta) \ln \frac{p(\theta)}{q_\theta(\theta)}
      = \sum_{i=1}^n \sum_{s_i} q_{s_i}(s_i) \ln \frac{1}{q_{s_i}(s_i)} + \int d\theta \, q_\theta(\theta) \sum_{i=1}^n \sum_{s_i} q_{s_i}(s_i) \ln p(s_i, y_i | \theta) + \int d\theta \, q_\theta(\theta) \ln \frac{p(\theta)}{q_\theta(\theta)}
      = \sum_{i=1}^n \sum_{s_i} q_{s_i}(s_i) \ln \frac{1}{q_{s_i}(s_i)} + \ln \int d\theta \, p(\theta) \prod_{i=1}^n \exp\Big\{ \sum_{s_i} q_{s_i}(s_i) \ln p(s_i, y_i | \theta) \Big\},

where the last line follows because the final two terms of the previous line combine into the log normaliser of the explicit form of q_\theta(\theta).

With this choice of q_\theta(\theta) and q_s(s) we achieve equality between the CS and VB approximations. We complete the proof by noting that the very next step of VBEM (the VB-E step) is guaranteed to increase or leave unchanged F_m, and hence surpass the CS bound.

33 Summary & Conclusions
- Bayesian learning avoids overfitting and can be used to do model selection.
- Variational Bayesian EM for CE models and propagation algorithms.
- These methods have advantages over MCMC in that they can provide fast approximate Bayesian inference. Especially important in machine learning applications with large data sets.

Results:
- VB outperforms BIC and CS in scoring discrete DAGs.
- VB approaches the capability of AIS sampling, at little computational cost.
- Finds the true structure consistently, whereas AIS needs tuning (e.g. large n).

Compute time:
                                        BIC     CS      VB      AIS
  time to compute each graph score      1.5s    1.5s    4s      400s
  total time for all 2720 graphs        1hr     1hr     3hrs    13days

No need to use CS because VB is provably better!

34 Future Work
Comparison to other methods:
- Laplace method
- Other more sophisticated MCMC variants? (e.g. slice sampling)
- Bethe/Kikuchi approximations to the marginal likelihood for discrete models (Heskes)

- Incorporate into a local search algorithm over structures (the exhaustive enumeration done here is only of academic interest!).
- Extend to Gaussian and other non-discrete graphical models.
- Apply to real-world data sets.

VB tools development:
- Overlay an AIS module into a general variational inference engine, such as VIBES (Winn et al.)
- Automated algorithm derivation, such as AutoBayes (Gray et al.)

35 no more slides... coffee

36-55 Appendix: Scoring all structures by every method, one slide per data set size

[Figures: log marginal likelihood estimate of every structure under MAP, AIS, VB, and BIC, for each data set size n (10, 20, 40, 80, 110, 160, 230, 320, 400, 430, 480, 560, 640, 800, 960, 1120, 1280, 2560, 5120, ...).]
