Bayesian Inference. Sudhir Shankar Raman, Translational Neuromodeling Unit, UZH & ETH.


1 The Reverend Thomas Bayes (c. 1701-1761). Bayesian Inference. Sudhir Shankar Raman, Translational Neuromodeling Unit, UZH & ETH. With many thanks for some slides to: Klaas Enno Stephan & Kay H. Brodersen

2 Why do I need to learn about Bayesian stats? Because SPM is getting more and more Bayesian: segmentation & spatial normalisation, posterior probability maps (PPMs), Dynamic Causal Modelling (DCM), Bayesian Model Selection (BMS), EEG source reconstruction.

3 Bayesian Inference, Model Selection, Approximate Inference, Variational Bayes, MCMC Sampling

4 Bayesian Inference, Model Selection, Approximate Inference, Variational Bayes, MCMC Sampling

5 Classical and Bayesian statistics. Classical: the p-value is the probability of getting the observed data in the effect's absence, i.e. the probability of observing the data y given no effect (H0: θ = 0), p(y | θ = 0). If it is small, reject the null hypothesis that there is no effect. One can never accept the null hypothesis; given enough data, one can always demonstrate a significant effect; correction for multiple comparisons is necessary. Bayesian inference: flexibility in modelling, incorporating prior information, posterior probability of the effect p(θ | y), options for model comparison. "Statistical analysis and the illusion of objectivity." James O. Berger, Donald A. Berry

6 Bayes' Theorem (Reverend Thomas Bayes). Prior beliefs are combined with observed data to yield posterior beliefs. "Bayes' theorem describes how an ideally rational person processes information."

7 Bayes' Theorem. Given data y and parameters θ, the conditional probabilities are: p(θ | y) = p(y, θ) / p(y) and p(y | θ) = p(y, θ) / p(θ). Eliminating p(y, θ) gives Bayes' rule: posterior = likelihood × prior / evidence, i.e. p(θ | y) = p(y | θ) p(θ) / p(y).
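To make the rule concrete, here is a minimal Python sketch with hypothetical numbers (a made-up diagnostic-test example, not part of the original slides):

```python
# Minimal sketch of Bayes' rule with hypothetical numbers: a disease test.
# prior p(theta = 1) = 0.01; likelihoods p(y = positive | theta) given below.
prior = {1: 0.01, 0: 0.99}                  # p(theta)
likelihood = {1: 0.95, 0: 0.05}             # p(y = positive | theta)

evidence = sum(likelihood[t] * prior[t] for t in prior)               # p(y)
posterior = {t: likelihood[t] * prior[t] / evidence for t in prior}   # p(theta | y)

print(posterior)  # posterior probability of disease given a positive test (~0.16)
```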

8 Bayesian statistics: posterior ∝ likelihood × prior, i.e. p(θ | y) ∝ p(y | θ) p(θ), combining new data (the likelihood p(y | θ)) with prior knowledge (the prior p(θ)). Bayes' theorem allows one to formally incorporate prior knowledge into computing statistical probabilities. Priors can be of different sorts: empirical, principled or shrinkage priors, uninformative. The posterior probability of the parameters given the data is an optimal combination of prior knowledge and new data, weighted by their relative precision.

9 Bayes in motion - an animation: likelihood, prior, posterior.

10 Principles of Bayesian inference. Formulation of a generative model: a likelihood function p(y | θ) and a prior distribution p(θ). Observation of data y. Model inversion - updating one's beliefs based upon the observations, given a prior state of knowledge: p(θ | y) ∝ p(y | θ) p(θ). Point estimates: maximum a posteriori (MAP), maximum likelihood (ML).

11 Conjugate priors. A prior is conjugate to a likelihood when prior and posterior have the same functional form in p(θ | y) = p(y | θ) p(θ) / p(y). Same form! This yields an analytical expression for the posterior. Conjugate priors exist for all exponential-family likelihoods. Example: a Gaussian likelihood with a Gaussian prior over the mean.

12 Likelihood & prior: the Gaussian model. Model: y = θ + ε. Likelihood: p(y | θ) = N(y; θ, λ_e⁻¹). Prior: p(θ) = N(θ; μ_p, λ_p⁻¹). Posterior: p(θ | y) = N(θ; μ, λ⁻¹) with λ = λ_e + λ_p and μ = (λ_e / λ) y + (λ_p / λ) μ_p, i.e. a relative precision weighting of data and prior.
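A minimal Python sketch of this precision weighting; the values of y, λ_e, μ_p, λ_p are assumptions chosen purely for illustration:

```python
# Sketch of the conjugate Gaussian case from this slide (illustrative numbers):
# likelihood y ~ N(theta, 1/lambda_e), prior theta ~ N(mu_p, 1/lambda_p).
y, lambda_e = 2.0, 4.0        # observation and its (noise) precision
mu_p, lambda_p = 0.0, 1.0     # prior mean and precision

lambda_post = lambda_e + lambda_p                          # precisions add
mu_post = (lambda_e * y + lambda_p * mu_p) / lambda_post   # precision-weighted mean

print(mu_post, 1.0 / lambda_post)  # posterior mean 1.6, posterior variance 0.2
```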

13 Bayesian regression: univariate case (normal densities). Univariate linear model: y = xθ + ε. Likelihood: p(y | θ) = N(y; xθ, σ_e²). Prior: p(θ) = N(θ; η_p, σ_p²). Posterior: p(θ | y) = N(θ; η_θ|y, σ_θ|y²) with 1/σ_θ|y² = x²/σ_e² + 1/σ_p² and η_θ|y = σ_θ|y² (x y / σ_e² + η_p / σ_p²) - again a relative precision weighting.

14 Bayesian GLM: multivariate case (normal densities). General linear model: y = Xθ + e. Likelihood: p(y | θ) = N(y; Xθ, C_e). Prior: p(θ) = N(θ; η_p, C_p). Posterior: p(θ | y) = N(θ; η_θ|y, C_θ|y) with C_θ|y = (Xᵀ C_e⁻¹ X + C_p⁻¹)⁻¹ and η_θ|y = C_θ|y (Xᵀ C_e⁻¹ y + C_p⁻¹ η_p). One step if C_e is known; otherwise define a conjugate prior or perform iterative estimation with EM.
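A short numpy sketch of these update equations on simulated data; the design matrix, noise level and prior values are assumptions chosen for the example, and C_e is treated as known:

```python
# Sketch of the Bayesian GLM posterior update (known noise covariance C_e assumed).
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # design matrix
theta_true = np.array([1.0, 2.0])
y = X @ theta_true + rng.normal(scale=0.5, size=n)       # simulated data

C_e = 0.25 * np.eye(n)          # noise covariance (assumed known here)
eta_p = np.zeros(p)             # prior mean
C_p = 10.0 * np.eye(p)          # prior covariance

C_post = np.linalg.inv(X.T @ np.linalg.inv(C_e) @ X + np.linalg.inv(C_p))
eta_post = C_post @ (X.T @ np.linalg.inv(C_e) @ y + np.linalg.inv(C_p) @ eta_p)
print(eta_post)                 # posterior mean, close to theta_true
```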

15 Bayesian Inference, Model Selection, Approximate Inference, Variational Bayes, MCMC Sampling

16 Bayesian model selection (BMS). Given competing hypotheses on the structure & functional mechanisms of a system, which model is the best? Which model represents the best balance between model fit and model complexity? For which model m does p(y | m) become maximal? Pitt & Myung (2002), TICS

17 Bayesian model selection (BMS). Bayes' rule for model m: p(θ | y, m) = p(y | θ, m) p(θ | m) / p(y | m). Model evidence: p(y | m) = ∫ p(y | θ, m) p(θ | m) dθ; it accounts for both accuracy and complexity of the model and allows for inference about the structure (generalizability) of the model. Model comparison via the Bayes factor: p(m_1 | y) / p(m_2 | y) = [p(y | m_1) p(m_1)] / [p(y | m_2) p(m_2)], with BF = p(y | m_1) / p(y | m_2). Model averaging: p(θ | y) = Σ_m p(θ | y, m) p(m | y). Interpretation of BF_10 (evidence against H_0): 1 to 3, not worth more than a bare mention; 3 to 20, positive; 20 to 150, strong; > 150, decisive. Kass and Raftery (1995), Penny et al. (2004) NeuroImage
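A minimal sketch of such a comparison, assuming the log evidences of two models have already been obtained (the numbers here are hypothetical):

```python
# Sketch: comparing two models via the Bayes factor, given (hypothetical) log evidences.
import math

log_evidence = {"m1": -120.3, "m2": -123.1}            # ln p(y | m), e.g. from the free energy
bf_12 = math.exp(log_evidence["m1"] - log_evidence["m2"])
print(bf_12)   # ~16.4: "positive" evidence for m1 on the Kass & Raftery scale
```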

18 Model Evidence. MacKay 1992, Neural Computation; Bishop, PRML, 2007

19 Bayesian model selection (BMS): various approximations to the log evidence ln p(y). Akaike Information Criterion (AIC), Akaike 1974: ln p(y) ≈ ln p(y | θ_ML) − M. Bayesian Information Criterion (BIC), Schwarz 1978: ln p(y) ≈ ln p(y | θ_ML) − (M/2) ln N (M: number of parameters, N: number of data points). Negative free energy (F): a by-product of Variational Bayes. Path sampling (thermodynamic integration): MCMC.
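A small sketch of the AIC- and BIC-style evidence approximations as stated above; the maximized log-likelihood, M and N in the example call are hypothetical:

```python
# Sketch: AIC- and BIC-style approximations to the log model evidence,
# given a maximized log-likelihood, M parameters and N data points.
import math

def log_evidence_aic(max_loglik, M):
    return max_loglik - M

def log_evidence_bic(max_loglik, M, N):
    return max_loglik - 0.5 * M * math.log(N)

print(log_evidence_aic(-100.0, 3), log_evidence_bic(-100.0, 3, 50))
```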

20 Bayesian Inference, Model Selection, Approximate Inference, Variational Bayes, MCMC Sampling

21 Approximate Bayesian inference. Bayesian inference formalizes model inversion, the process of passing from a prior to a posterior in light of data: posterior p(θ | y) = likelihood × prior / marginal likelihood = p(y | θ) p(θ) / ∫ p(y, θ) dθ, where p(y) is the marginal likelihood (model evidence). In practice, evaluating the posterior is usually difficult because we cannot easily evaluate p(y), especially when high dimensionality or a complex form means analytical solutions are not available and numerical integration is too expensive. Source: Kay H. Brodersen, 2013

22 Approximate Bayesian inference. There are two approaches to approximate inference; they have complementary strengths and weaknesses. Deterministic approximate inference, in particular variational Bayes: find an analytical proxy q(θ) that is maximally similar to p(θ | y) and inspect the distribution statistics of q(θ) (e.g., mean, quantiles, intervals); often insightful and fast, but often hard work to derive, and it converges to local minima. Stochastic approximate inference, in particular sampling: design an algorithm that draws samples θ^(1), …, θ^(m) from p(θ | y) and inspect sample statistics (e.g., histogram, sample quantiles); asymptotically exact, but computationally expensive, with tricky engineering concerns. Source: Kay H. Brodersen, 2013

23 Bayesian Inference, Model Selection, Approximate Inference, Variational Bayes, MCMC Sampling

24 The Laplace approximation. The Laplace approximation provides a way of approximating a density whose normalization constant we cannot evaluate, by fitting a Normal distribution to its mode: p(z) = (1/Z) f(z), where Z is the normalization constant (unknown) and f(z) is the main part of the density (easy to evaluate). This is exactly the situation we face in Bayesian inference: p(θ | y) = (1/p(y)) p(y, θ), where the model evidence p(y) is unknown and the joint density p(y, θ) is easy to evaluate. Pierre-Simon Laplace (1749-1827), French mathematician and astronomer. Source: Kay H. Brodersen, 2013

25 Applying the Laplace approximation. Given a model with parameters θ = (θ_1, …, θ_p), the Laplace approximation reduces to a simple three-step procedure: (1) find the mode of the log-joint, θ* = arg max_θ ln p(y, θ); (2) evaluate the curvature of the log-joint at the mode, ∇∇ ln p(y, θ)|_θ*; (3) obtain a Gaussian approximation N(θ; μ, Λ⁻¹) with μ = θ* and Λ = −∇∇ ln p(y, θ)|_θ*. Source: Kay H. Brodersen, 2013
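A sketch of these three steps in Python, using a toy log-joint and a finite-difference Hessian; the function log_joint and all numbers are assumptions for illustration, and scipy is used only for the mode search:

```python
# Sketch of the three-step Laplace approximation (toy log-joint assumed).
import numpy as np
from scipy.optimize import minimize

def log_joint(theta):
    # hypothetical un-normalized log-joint ln p(y, theta); replace with your model
    return -0.5 * (theta[0] - 1.0) ** 2 - 0.5 * (theta[1] + 2.0) ** 2 \
           - 0.1 * theta[0] ** 2 * theta[1] ** 2

# 1) find the mode of the log-joint
res = minimize(lambda th: -log_joint(th), x0=np.zeros(2))
mu = res.x

# 2) evaluate the curvature at the mode (central finite differences)
eps = 1e-3
H = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        e_i, e_j = np.eye(2)[i] * eps, np.eye(2)[j] * eps
        H[i, j] = (log_joint(mu + e_i + e_j) - log_joint(mu + e_i - e_j)
                   - log_joint(mu - e_i + e_j) + log_joint(mu - e_i - e_j)) / (4 * eps**2)
Lambda = -H   # precision of the Gaussian approximation

# 3) Gaussian approximation N(theta; mu, inv(Lambda))
print(mu, np.linalg.inv(Lambda))
```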

26 Limitations of the Laplace approximation. The Laplace approximation is often too strong a simplification: it ignores global properties of the posterior, becomes brittle when the posterior is multimodal, and is only directly applicable to real-valued parameters. [Figure: log-joint versus its Gaussian approximation.] Source: Kay H. Brodersen, 2013

27 Variational Bayes. Goal: choose q(θ) from a hypothesis class (q_1, q_2, …, q_6 in the figure) such that KL(q || p) ≈ 0, i.e. q is as close as possible to the true posterior p(θ | y). The log model evidence log p(y | m) splits into KL(q || p) plus the free energy F, so maximizing F shrinks the divergence. [Figure: hypothesis class of candidate densities and the posterior p(θ | y).]

28 Variational calculus. Variational Bayesian inference is based on variational calculus. Standard calculus (Newton, Leibniz, and others): functions f: x ↦ f(x), derivatives df/dx; example: maximize the likelihood expression p(y | θ) w.r.t. θ. Variational calculus (Euler, Lagrange, and others): functionals F: f ↦ F[f], derivatives dF/df; example: maximize the entropy H[p] w.r.t. a probability distribution p(x). Leonhard Euler (1707-1783), Swiss mathematician, "Elementa Calculi Variationum". Source: Kay H. Brodersen, 2013

29 Variational calculus and the free energy. Variational calculus lends itself nicely to approximate Bayesian inference:
ln p(y) = ln p(y) ∫ q(θ) dθ = ∫ q(θ) ln p(y) dθ = ∫ q(θ) ln [p(y, θ) / p(θ | y)] dθ
= ∫ q(θ) ln [ (q(θ) / p(θ | y)) (p(y, θ) / q(θ)) ] dθ
= ∫ q(θ) ln [q(θ) / p(θ | y)] dθ + ∫ q(θ) ln [p(y, θ) / q(θ)] dθ
= KL[q || p] (the divergence between q(θ) and p(θ | y)) + F(q, y) (the free energy).
Source: Kay H. Brodersen, 2013

30 Variational calculus and the free energy. In summary, the log model evidence can be expressed as ln p(y) = KL[q || p] + F(q, y), where the divergence KL[q || p] ≥ 0 is unknown and the free energy F(q, y) is easy to evaluate for a given q, so that ln p(y) ≥ F(q, y). Maximizing F(q, y) is therefore equivalent to minimizing KL[q || p] and to tightening F(q, y) as a lower bound to the log model evidence. [Figure: F(q, y) increasing from initialization to convergence towards ln p(y).] Source: Kay H. Brodersen, 2013

31 Computing the free energy. We can decompose the free energy F(q, y) as follows: F(q, y) = ∫ q(θ) ln [p(y, θ) / q(θ)] dθ = ∫ q(θ) ln p(y, θ) dθ − ∫ q(θ) ln q(θ) dθ = ⟨ln p(y, θ)⟩_q + H[q], the expected log-joint plus the Shannon entropy of q. Source: Kay H. Brodersen, 2013

32 The mean-field assumption. When inverting models with several parameters, a common way of restricting the class of approximate posteriors q(θ) is to consider those posteriors that factorize into independent partitions, q(θ) = Π_i q_i(θ_i), where q_i(θ_i) is the approximate posterior for the i-th subset of parameters. [Figure: a joint density over θ_1, θ_2 and its factorized approximation q(θ_1) q(θ_2).] Jean Daunizeau, ~jdaunize/presentations/baes2.pdf; Source: Kay H. Brodersen, 2013

33 Variational inference under the mean-field assumption.
F(q, y) = ∫ q(θ) ln [p(y, θ) / q(θ)] dθ
= ∫ Π_i q_i [ ln p(y, θ) − Σ_i ln q_i ] dθ   (mean-field assumption: q(θ) = Π_i q_i(θ_i))
= ∫ q_j ∫ Π_{i≠j} q_i ln p(y, θ) dθ_{\j} dθ_j − ∫ q_j ln q_j dθ_j − ∫ Π_{i≠j} q_i Σ_{i≠j} ln q_i dθ_{\j}
= ∫ q_j ⟨ln p(y, θ)⟩_{q_{\j}} dθ_j − ∫ q_j ln q_j dθ_j + c
= ∫ q_j ln [ exp(⟨ln p(y, θ)⟩_{q_{\j}}) / q_j ] dθ_j + c
= −KL[ q_j || exp(⟨ln p(y, θ)⟩_{q_{\j}}) ] + c

34 Variational algorithm under the mean-field assumption. In summary: F(q, y) = −KL[ q_j || exp(⟨ln p(y, θ)⟩_{q_{\j}}) ] + c. Suppose the densities q_{\j} ≡ q(θ_{\j}) are kept fixed. Then the approximate posterior q(θ_j) that maximizes F(q, y) is given by q_j* = arg max_{q_j} F(q, y) = (1/Z) exp(⟨ln p(y, θ)⟩_{q_{\j}}), and therefore ln q_j* = ⟨ln p(y, θ)⟩_{q_{\j}} − ln Z =: I(θ_j) − ln Z. This implies a straightforward algorithm for variational inference: initialize all approximate posteriors q(θ_i), e.g., by setting them to their priors; cycle over the parameters, revising each given the current estimates of the others; loop until convergence. Source: Kay H. Brodersen, 2013

35 Typical strategies in variational inference. With neither the mean-field assumption nor parametric assumptions, variational inference reduces to exact inference. With the mean-field assumption q(θ) = Π_i q(θ_i) but no parametric assumptions: iterative free-form variational optimization. With parametric assumptions q(θ) = F(θ; δ) but no mean-field assumption: fixed-form optimization of moments. With both: iterative fixed-form variational optimization. Source: Kay H. Brodersen, 2013

36 Example: variational density estimation. We are given a univariate dataset y_1, …, y_n which we model by a simple univariate Gaussian distribution; we wish to infer on its mean μ and precision τ, p(μ, τ | y). Generative model: p(μ | τ) = N(μ; μ_0, (λ_0 τ)⁻¹), p(τ) = Ga(τ; a_0, b_0), p(y_i | μ, τ) = N(y_i; μ, τ⁻¹) for i = 1, …, n. Although in this case a closed-form solution exists, we shall pretend it does not. Instead, we consider approximations that satisfy the mean-field assumption: q(μ, τ) = q_μ(μ) q_τ(τ). Source: Kay H. Brodersen, 2013; Bishop (2006) PRML

37 Recurring expressions in Bayesian inference. Univariate normal distribution: ln N(x; μ, λ⁻¹) = ½ ln λ − ½ ln 2π − (λ/2)(x − μ)² = −(λ/2) x² + λμx + c. Multivariate normal distribution: ln N_d(x; μ, Λ⁻¹) = ½ ln|Λ| − (d/2) ln 2π − ½ (x − μ)ᵀ Λ (x − μ) = −½ xᵀΛx + xᵀΛμ + c. Gamma distribution: ln Ga(x; a, b) = a ln b − ln Γ(a) + (a − 1) ln x − bx = (a − 1) ln x − bx + c. Source: Kay H. Brodersen, 2013

38 Variational density estimation: mean μ.
ln q_μ(μ) = ⟨ln p(y, μ, τ)⟩_{q_τ} + c
= Σ_i ⟨ln N(y_i; μ, τ⁻¹)⟩_{q_τ} + ⟨ln N(μ; μ_0, (λ_0 τ)⁻¹)⟩_{q_τ} + ⟨ln Ga(τ; a_0, b_0)⟩_{q_τ} + c
= −(⟨τ⟩_{q_τ}/2) Σ_i (y_i − μ)² − (λ_0 ⟨τ⟩_{q_τ}/2)(μ − μ_0)² + c
= −½ (n ⟨τ⟩_{q_τ} + λ_0 ⟨τ⟩_{q_τ}) μ² + (n ȳ ⟨τ⟩_{q_τ} + λ_0 μ_0 ⟨τ⟩_{q_τ}) μ + c.
Matching the normal form by inspection: q_μ(μ) = N(μ; μ_n, λ_n⁻¹) with λ_n = (λ_0 + n) ⟨τ⟩_{q_τ} and μ_n = (n ȳ + λ_0 μ_0) ⟨τ⟩_{q_τ} / λ_n = (λ_0 μ_0 + n ȳ) / (λ_0 + n).
Source: Kay H. Brodersen, 2013

39 Variational density estimation: precision τ.
ln q_τ(τ) = ⟨ln p(y, μ, τ)⟩_{q_μ} + c
= Σ_i ⟨ln N(y_i; μ, τ⁻¹)⟩_{q_μ} + ⟨ln N(μ; μ_0, (λ_0 τ)⁻¹)⟩_{q_μ} + ⟨ln Ga(τ; a_0, b_0)⟩_{q_μ} + c
= (n/2) ln τ − (τ/2) Σ_i ⟨(y_i − μ)²⟩_{q_μ} + ½ ln(λ_0 τ) − (λ_0 τ/2) ⟨(μ − μ_0)²⟩_{q_μ} + (a_0 − 1) ln τ − b_0 τ + c
= (n/2 + ½ + a_0 − 1) ln τ − [ ½ Σ_i ⟨(y_i − μ)²⟩_{q_μ} + (λ_0/2) ⟨(μ − μ_0)²⟩_{q_μ} + b_0 ] τ + c.
Hence q_τ(τ) = Ga(τ; a_n, b_n) with a_n = a_0 + (n + 1)/2 and b_n = b_0 + (λ_0/2) ⟨(μ − μ_0)²⟩_{q_μ} + ½ Σ_i ⟨(y_i − μ)²⟩_{q_μ}.
Source: Kay H. Brodersen, 2013
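Putting slides 36-39 together, here is a sketch of the resulting coordinate-ascent scheme for the univariate Gaussian example; the hyperparameter values and simulated data are assumptions, and the update equations follow the standard treatment in Bishop, PRML, Sec. 10.1.3:

```python
# Sketch of the VB updates for the univariate Gaussian from slides 36-39
# (mean-field q(mu, tau) = q(mu) q(tau)).
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=3.0, scale=2.0, size=100)          # simulated data
n, ybar = y.size, y.mean()

mu0, lam0, a0, b0 = 0.0, 1.0, 1e-3, 1e-3              # prior hyperparameters (assumed)
E_tau = 1.0                                           # initialize <tau>_q

for _ in range(50):                                   # cycle until convergence
    # update q(mu) = N(mu_n, 1/lam_n)
    mu_n = (lam0 * mu0 + n * ybar) / (lam0 + n)
    lam_n = (lam0 + n) * E_tau
    E_mu, E_mu2 = mu_n, mu_n**2 + 1.0 / lam_n         # <mu>, <mu^2> under q(mu)

    # update q(tau) = Ga(a_n, b_n)
    a_n = a0 + (n + 1) / 2.0
    b_n = b0 + 0.5 * (np.sum(y**2 - 2 * y * E_mu + E_mu2)
                      + lam0 * (E_mu2 - 2 * mu0 * E_mu + mu0**2))
    E_tau = a_n / b_n

print(mu_n, 1.0 / E_tau)   # approximate posterior mean of mu and of the variance 1/tau
```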

40 Variational density estimation: illustration. [Figure: the true posterior p(θ | y) and the factorized approximations q(θ) over successive updates.] Source: Kay H. Brodersen, 2013; Bishop (2006) PRML, p. 472

41 TAPAS: Variational Bayes Linear Regression

42

43 Demo VB Linear Regression

44 Bayesian Inference, Model Selection, Approximate Inference, Variational Bayes, MCMC Sampling

45 Markov Chain Monte Carlo (MCMC) sampling. A general framework for sampling from a large class of distributions; scales well with the dimensionality of the sample space; asymptotically convergent. A proposal distribution q(θ) drives a chain θ_0 → θ_1 → θ_2 → … → θ_N whose samples are drawn from the posterior distribution p(θ | y). Andrei Markov (1856-1922), Russian mathematician.

46 Markov chain properties. Homogeneous: the transition probabilities do not depend on the step, p(θ^{t+1} | θ^1, …, θ^t) = p(θ^{t+1} | θ^t) = T(θ^{t+1}, θ^t). Invariance: p*(θ) = Σ_{θ'} T(θ, θ') p*(θ'). Detailed balance (reversibility): T(θ', θ) p*(θ) = T(θ, θ') p*(θ'), which implies invariance. Ergodicity: p(θ^n) → p*(θ) as n → ∞, for any initial distribution p(θ^0). MCMC constructs a homogeneous, ergodic chain whose invariant distribution is the target posterior.

47 Metropolis-Hastings Algorithm. Initialize θ^0 at step 1, for example by sampling from the prior. At step t, sample a candidate from the proposal distribution, θ* ~ q(θ* | θ^t), and accept it with probability A(θ*, θ^t) = min{ 1, [p(θ* | y) q(θ^t | θ*)] / [p(θ^t | y) q(θ* | θ^t)] }. Metropolis variant (symmetric proposal distribution): A(θ*, θ^t) = min{ 1, p(θ* | y) / p(θ^t | y) }. Bishop (2006) PRML, p. 539
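A minimal random-walk Metropolis sketch (symmetric Gaussian proposal); the target log_p and the step size are assumptions chosen for illustration:

```python
# Sketch of random-walk Metropolis targeting a hypothetical un-normalized
# log density log_p; replace log_p with your model's log-joint.
import numpy as np

def log_p(theta):
    return -0.5 * theta**2 - 0.25 * theta**4      # toy target, known only up to a constant

rng = np.random.default_rng(0)
theta, samples, step = 0.0, [], 0.5
for t in range(20000):
    proposal = theta + step * rng.normal()        # symmetric proposal q(theta* | theta_t)
    log_accept = log_p(proposal) - log_p(theta)   # Metropolis acceptance ratio (log scale)
    if np.log(rng.uniform()) < log_accept:
        theta = proposal                          # accept
    samples.append(theta)                         # otherwise keep the current state

print(np.mean(samples), np.quantile(samples, [0.025, 0.975]))
```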

48 Gibbs Sampling Algorithm. A special case of Metropolis-Hastings in which each parameter is drawn from its full conditional distribution and the acceptance probability equals 1. At step t, sample θ_1^{t+1} ~ p(θ_1 | θ_2^t, …, θ_n^t), then θ_2^{t+1} ~ p(θ_2 | θ_1^{t+1}, θ_3^t, …, θ_n^t), and so on up to θ_n. Parameters can also be updated in groups (blocked sampling).
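A sketch of Gibbs sampling for a toy target whose full conditionals are known in closed form (a standard bivariate Gaussian with an assumed correlation ρ = 0.8):

```python
# Sketch of Gibbs sampling for a bivariate Gaussian with correlation rho:
# each full conditional is Gaussian, so every draw is accepted.
import numpy as np

rho = 0.8
rng = np.random.default_rng(0)
th1, th2, samples = 0.0, 0.0, []
for t in range(10000):
    # sample each parameter from its full conditional given the other
    th1 = rng.normal(loc=rho * th2, scale=np.sqrt(1 - rho**2))   # p(th1 | th2)
    th2 = rng.normal(loc=rho * th1, scale=np.sqrt(1 - rho**2))   # p(th2 | th1)
    samples.append((th1, th2))

print(np.corrcoef(np.array(samples).T))   # empirical correlation close to rho
```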

49 Posterior analysis from MCMC. From the chain θ^0 → θ^1 → θ^2 → … → θ^N, obtain (approximately) independent samples: generate samples based on MCMC sampling; discard the initial burn-in period samples to remove dependence on the initialization; thin the chain (select every m-th sample) to reduce correlation; inspect sample statistics (e.g., histogram, sample quantiles).
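A sketch of this post-processing, applied to a stand-in autocorrelated trace (an AR(1) series is used here purely as a placeholder for a real MCMC trace):

```python
# Sketch of post-processing an MCMC trace: discard burn-in, thin, summarize.
import numpy as np

rng = np.random.default_rng(0)
trace = np.zeros(20000)                    # stand-in for an MCMC trace
for t in range(1, trace.size):
    trace[t] = 0.9 * trace[t - 1] + rng.normal()   # autocorrelated, like chain output

burn_in, thin = 1000, 10
kept = trace[burn_in::thin]                # drop burn-in, keep every 10th sample
print(kept.mean(), np.quantile(kept, [0.025, 0.5, 0.975]))
```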

50 MAP estimate via Simulated Annealing. Add a temperature parameter T and a schedule to update it. Algorithm: set T = 1; until convergence, for every K iterations sample from the tempered posterior p^{1/T}(θ | y), then reduce T (e.g., towards T = 0.1). As T decreases, the tempered distribution concentrates around its mode, so the chain settles at the MAP estimate.

51 Convergence Analysis. Single-chain methods: Geweke (1992), Raftery-Lewis (1992). Multi-chain methods: Gelman-Rubin (1992) potential scale reduction factor.
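A sketch of the Gelman-Rubin potential scale reduction factor for several chains; the split-chain refinements used in modern software are omitted, and the toy chains are assumptions:

```python
# Sketch of the Gelman-Rubin potential scale reduction factor (R-hat) for m chains.
import numpy as np

def gelman_rubin(chains):
    """chains: array of shape (m, n) with m chains of n post-burn-in samples."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)           # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()     # mean within-chain variance
    var_plus = (n - 1) / n * W + B / n        # pooled posterior variance estimate
    return np.sqrt(var_plus / W)              # values near 1 indicate convergence

rng = np.random.default_rng(0)
chains = rng.normal(size=(4, 5000))           # toy chains that have converged
print(gelman_rubin(chains))                   # ~1.0
```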

52 Model evidence using MCMC. Importance-sampling estimators: the prior arithmetic mean (average the likelihood over samples drawn from the prior) and the posterior harmonic mean (the harmonic mean of the likelihood over samples drawn from the posterior).

53 Thermodynamic Integration (Path Sampling). Run MCMC on the tempered posteriors at a ladder of temperatures between T = 0.0 (the prior) and T = 1.0 (the posterior), e.g. T = 0.0, 0.5, 0.8, 1.0, estimate the expected log-likelihood E_T(loglk) at each temperature, and integrate these expectations over T to obtain the log model evidence. Ogata, Y., Num. Math. 55; Gelman, A., Stat. Sci. 13.
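A sketch of the final integration step, assuming the expected log-likelihoods at each temperature have already been estimated by MCMC (the temperature ladder and values are hypothetical):

```python
# Sketch of thermodynamic integration: the log evidence is the integral over T of the
# expected log-likelihood under the power posterior p_T(theta | y) ∝ p(y|theta)^T p(theta).
import numpy as np

temperatures = np.array([0.0, 0.1, 0.3, 0.5, 0.8, 1.0])
E_loglik = np.array([-240.0, -180.0, -150.0, -140.0, -133.0, -130.0])  # hypothetical estimates

# trapezoidal rule over the temperature path
log_evidence = np.sum(0.5 * (E_loglik[1:] + E_loglik[:-1]) * np.diff(temperatures))
print(log_evidence)
```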

54 Derivation - Extra

55 Other MCMC variants: Slice Sampling, Adaptive MH, Hamiltonian Monte Carlo, Population MCMC.

56 TAPAS: mpdcm (GPU-based MCMC)

57 Demo MCMC Linear Regression

58 Summary: Bayesian Inference, Model Selection, Approximate Inference, Variational Bayes, MCMC Sampling

59 References.
Bishop, C. M. (2006). Pattern Recognition and Machine Learning.
Barber, D. Bayesian Reasoning and Machine Learning.
MacKay, D. J. C. Information Theory, Inference and Learning Algorithms.
Murphy, K. P. Conjugate Bayesian analysis of the Gaussian distribution.
Videolectures.net: Bayesian inference, MCMC, Variational Bayes.
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6.
Kass, R. E. and Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90.
Gamerman, D. and Lopes, H. F. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, Second Edition (Chapman & Hall/CRC Texts in Statistical Science).
Statistical Parametric Mapping: The Analysis of Functional Brain Images.
Lartillot, N. and Philippe, H. Computing Bayes factors using thermodynamic integration. Systematic Biology, 55(2).
Gelman, A. and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4).
Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to calculating posterior moments. Bayesian Statistics 4.
Raftery, A. E. and Lewis, S. (1992). How many iterations in the Gibbs sampler? Bayesian Statistics 4, Oxford University Press.

60 Thank You
