Bayesian Inference. Sudhir Shankar Raman, Translational Neuromodeling Unit, UZH & ETH.
1 The Reverend Thomas Bayes ( ). Bayesian Inference. Sudhir Shankar Raman, Translational Neuromodeling Unit, UZH & ETH. With many thanks for some slides to: Klaas Enno Stephan & Kay H. Brodersen
2 Why do I need to learn about Bayesian stats? Because SPM is getting more and more Bayesian: segmentation & spatial normalisation, posterior probability maps (PPMs), Dynamic Causal Modelling (DCM), Bayesian Model Selection (BMS), EEG source reconstruction.
3 Bayesian Inference - Model Selection - Approximate Inference - Variational Bayes - MCMC Sampling
4 Bayesian Inference - Model Selection - Approximate Inference - Variational Bayes - MCMC Sampling
5 Classical and Bayesian statistics. Classical: the p-value is the probability of getting the observed data y in the effect's absence, i.e. p(y|H0) with H0: θ = 0. If small, reject the null hypothesis that there is no effect. One can never accept the null hypothesis; given enough data, one can always demonstrate a significant effect; correction for multiple comparisons is necessary. Bayesian inference: flexibility in modelling; incorporating prior information p(θ); posterior probability of the effect p(θ|y); options for model comparison. "Statistical analysis and the illusion of objectivity." James O. Berger, Donald A. Berry
6 Bayes' Theorem. Prior beliefs + observed data → posterior beliefs. "Bayes' theorem describes how an ideally rational person processes information." Reverend Thomas Bayes
7 ) ( ) ( ) ( ) ( p p p P θ θ θ = Eliminating p(,θ) gives Baes rule: Likelihood Prior Evidence Posterior Baes Theorem Given data and parameters θ, the conditional probabilities are: ) ( ), ( ) ( θ θ θ p p p = ) ( ), ( ) ( p p p θ θ =
8 Bayesian statistics. Posterior ∝ likelihood × prior: p(θ|y) ∝ p(y|θ) p(θ), combining new data (the likelihood p(y|θ)) with prior knowledge p(θ). Bayes' theorem allows one to formally incorporate prior knowledge into computing statistical probabilities. Priors can be of different sorts: empirical, principled, shrinkage, or uninformative. The posterior probability of the parameters given the data is an optimal combination of prior knowledge and new data, weighted by their relative precision.
9 Bayes in motion - an animation. Likelihood, Prior, Posterior.
10 Principles of Bayesian inference. Formulation of a generative model: likelihood function p(y|θ) and prior distribution p(θ). Observation of data y. Model inversion - update of beliefs based upon observations, given a prior state of knowledge: p(θ|y) ∝ p(y|θ) p(θ). Point estimates: maximum a posteriori (MAP), maximum likelihood (ML).
11 Conjugate Priors. Prior and posterior have the same form: in p(θ|y) = p(y|θ) p(θ) / p(y), the posterior belongs to the same family as the prior - same form!! This yields an analytical expression for the posterior. Conjugate priors exist for all exponential family members. Example: Gaussian likelihood with a Gaussian prior over the mean.
12 Likelihood & prior for the Gaussian model y = θ + ε:
Likelihood: p(y|θ) = N(y; θ, λ_e⁻¹)
Prior: p(θ) = N(θ; μ_p, λ_p⁻¹)
Posterior: p(θ|y) = N(θ; μ, λ⁻¹) with
λ = λ_e + λ_p
μ = (λ_e/λ) y + (λ_p/λ) μ_p
Relative precision weighting.
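The precision-weighted update above can be sketched in a few lines of Python (a minimal illustration; the function and variable names are ours, not from the slides):

```python
# Hedged sketch: conjugate Gaussian update for y = theta + eps.
def gaussian_posterior(y, mu_p, lambda_p, lambda_e):
    """Posterior over theta, with prior N(mu_p, 1/lambda_p) and
    observation noise precision lambda_e."""
    lam = lambda_e + lambda_p                    # precisions add
    mu = (lambda_e * y + lambda_p * mu_p) / lam  # precision-weighted mean
    return mu, lam

mu, lam = gaussian_posterior(y=2.0, mu_p=0.0, lambda_p=1.0, lambda_e=3.0)
# The posterior mean lies between the prior mean (0) and the observation (2),
# pulled toward the more precise source: mu = 1.5, lam = 4.0
```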
13 Bayesian regression: univariate case. Univariate linear model y = xθ + ε.
Prior: p(θ) = N(θ; η_p, σ_p²)
Likelihood: p(y|θ) = N(y; xθ, σ_e²)
Posterior: p(θ|y) = N(θ; η_θ, σ_θ²) with
1/σ_θ² = x²/σ_e² + 1/σ_p²
η_θ = σ_θ² (x y / σ_e² + η_p / σ_p²)
Relative precision weighting of normal densities.
14 Bayesian GLM: multivariate case. General Linear Model y = Xθ + e, with normal densities:
Prior: p(θ) = N(θ; η_p, C_p)
Likelihood: p(y|θ) = N(y; Xθ, C_e)
Posterior: p(θ|y) = N(θ; η_θ, C_θ) with
C_θ = (Xᵀ C_e⁻¹ X + C_p⁻¹)⁻¹
η_θ = C_θ (Xᵀ C_e⁻¹ y + C_p⁻¹ η_p)
One step if C_e is known. Otherwise define a conjugate prior or perform iterative estimation with EM.
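The multivariate posterior above can be sketched with NumPy (toy data of our own making; this is not SPM code):

```python
import numpy as np

# Hedged sketch of the Bayesian GLM posterior for known error covariance C_e.
def bayesian_glm(y, X, C_e, eta_p, C_p):
    """Posterior N(eta, C) for y = X theta + e."""
    Ce_inv = np.linalg.inv(C_e)
    Cp_inv = np.linalg.inv(C_p)
    C = np.linalg.inv(X.T @ Ce_inv @ X + Cp_inv)   # posterior covariance
    eta = C @ (X.T @ Ce_inv @ y + Cp_inv @ eta_p)  # posterior mean
    return eta, C

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), np.linspace(0.0, 1.0, 50)])
theta_true = np.array([1.0, 2.0])
y = X @ theta_true + 0.1 * rng.standard_normal(50)
eta, C = bayesian_glm(y, X, C_e=0.01 * np.eye(50),
                      eta_p=np.zeros(2), C_p=10.0 * np.eye(2))
# With a weak prior, eta is close to the generating parameters.
```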
15 Bayesian Inference - Model Selection - Approximate Inference - Variational Bayes - MCMC Sampling
16 Bayesian model selection (BMS). Given competing hypotheses on structure & functional mechanisms of a system, which model is the best? Which model represents the best balance between model fit and model complexity? For which model m does p(y|m) become maximal? Pitt & Myung (2002), TICS
17 Bayesian model selection (BMS). Bayes' rule: p(θ|y, m) = p(y|θ, m) p(θ|m) / p(y|m). Model evidence: p(y|m) = ∫ p(y|θ, m) p(θ|m) dθ - accounts for both accuracy and complexity of the model, and allows for inference about structure (generalizability) of the model. Model comparison via the Bayes factor: BF = p(y|m1) / p(y|m2), with posterior odds p(m1|y)/p(m2|y) = [p(y|m1) p(m1)] / [p(y|m2) p(m2)]. Model averaging: p(θ|y) = Σ_m p(θ|y, m) p(m|y). Kass and Raftery (1995), Penny et al. (2004) NeuroImage.
BF_10 - evidence against H0: 1 to 3 - not worth more than a bare mention; 3 to 20 - positive; 20 to 150 - strong; > 150 - decisive.
18 Model Evidence. MacKay 1992, Neural Computation; Bishop, PRML 2006
19 Bayesian model selection (BMS). Various approximations to the log model evidence ln p(y|m):
Akaike Information Criterion (AIC), Akaike 1974: ln p(y|m) ≈ ln p(y|θ_ML) − M
Bayesian Information Criterion (BIC), Schwarz 1978: ln p(y|m) ≈ ln p(y|θ_ML) − (M/2) ln N
(M = number of parameters, N = number of data points)
Negative free energy (F) - a by-product of Variational Bayes
Path sampling (thermodynamic integration) - MCMC
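As a concrete reading of the two criteria (a toy sketch; the function names and numbers are ours):

```python
import numpy as np

# Hedged sketch of the AIC/BIC approximations to ln p(y|m).
def aic(loglik_ml, M):
    # ln p(y|m) ≈ ln p(y|theta_ML) − M
    return loglik_ml - M

def bic(loglik_ml, M, N):
    # ln p(y|m) ≈ ln p(y|theta_ML) − (M/2) ln N
    return loglik_ml - 0.5 * M * np.log(N)

# With the same maximized log-likelihood, BIC penalizes extra parameters
# more heavily than AIC once N > e^2 ≈ 7.4 data points.
print(aic(-100.0, M=3))          # -103.0
print(bic(-100.0, M=3, N=1000))  # ≈ -110.36
```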
20 Bayesian Inference - Model Selection - Approximate Inference - Variational Bayes - MCMC Sampling
21 Approximate Bayesian inference. Bayesian inference formalizes model inversion, the process of passing from a prior to a posterior in light of data:
posterior p(θ|y) = likelihood p(y|θ) × prior p(θ) / marginal likelihood p(y), where p(y) = ∫ p(y, θ) dθ is the model evidence.
In practice, evaluating the posterior is usually difficult because we cannot easily evaluate p(y), especially when: the problem has high dimensionality or complex form; analytical solutions are not available; numerical integration is too expensive. Source: Kay H. Brodersen, 2013
22 Approximate Bayesian inference. There are two approaches to approximate inference; they have complementary strengths and weaknesses.
Deterministic approximate inference, in particular variational Bayes: find an analytical proxy q(θ) that is maximally similar to p(θ|y); inspect distribution statistics of q(θ) (e.g., mean, quantiles, intervals); often insightful and fast; often hard work to derive; converges to local minima.
Stochastic approximate inference, in particular sampling: design an algorithm that draws samples θ(1), ..., θ(m) from p(θ|y); inspect sample statistics (e.g., histogram, sample quantiles); asymptotically exact; computationally expensive; tricky engineering concerns. Source: Kay H. Brodersen, 2013
23 Bayesian Inference - Model Selection - Approximate Inference - Variational Bayes - MCMC Sampling
24 The Laplace approximation. The Laplace approximation provides a way of approximating a density whose normalization constant we cannot evaluate, by fitting a Normal distribution to its mode:
p(z) = (1/Z) f(z), where Z is the normalization constant (unknown) and f(z) is the main part of the density (easy to evaluate).
This is exactly the situation we face in Bayesian inference: p(θ|y) = p(y, θ) / p(y), with the model evidence p(y) unknown and the joint density p(y, θ) easy to evaluate.
Pierre-Simon Laplace ( ), French mathematician and astronomer. Source: Kay H. Brodersen, 2013
25 Applying the Laplace approximation. Given a model with parameters θ = (θ1, ..., θp), the Laplace approximation reduces to a simple three-step procedure:
1. Find the mode of the log-joint: θ* = argmax_θ ln p(y, θ)
2. Evaluate the (negative) curvature of the log-joint at the mode: Λ = −∇∇ ln p(y, θ) |_{θ*}
3. We obtain a Gaussian approximation N(θ; μ, Λ⁻¹) with μ = θ*.
Source: Kay H. Brodersen, 2013
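The three steps can be sketched numerically for a one-dimensional example (a toy unnormalized density with a Gamma shape; grid search and finite differences stand in for a proper optimizer, and all names are ours):

```python
import numpy as np

# Toy log-joint, known only up to a constant: Gamma(a, b) shape in theta.
a, b = 3.0, 2.0
def log_joint(th):
    return (a - 1) * np.log(th) - b * th

# Step 1: find the mode (analytically (a-1)/b = 1.0; here by grid search).
thetas = np.linspace(0.01, 5.0, 5000)
mode = thetas[np.argmax(log_joint(thetas))]

# Step 2: negative curvature of the log-joint at the mode (finite differences).
h = 1e-4
curv = -(log_joint(mode + h) - 2 * log_joint(mode) + log_joint(mode - h)) / h**2

# Step 3: Gaussian approximation N(mode, 1/curv).
sigma2 = 1.0 / curv   # analytically mode^2 / (a-1) = 0.5
```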
26 Limitations of the Laplace approximation. The Laplace approximation is often too strong a simplification: it ignores global properties of the posterior (matching the log-joint only locally at the mode); it becomes brittle when the posterior is multimodal; it is only directly applicable to real-valued parameters. Source: Kay H. Brodersen, 2013
27 Variational Bayes. Goal: choose q from a hypothesis class such that KL(q‖p) is as close to 0 as possible. The log model evidence splits as ln p(y|m) = KL(q‖p) + F, where F is the free energy; candidate distributions q1, q2, ... in the hypothesis class are compared by how closely they approximate p(θ|y).
28 Variational calculus. Variational Bayesian inference is based on variational calculus.
Standard calculus (Newton, Leibniz, and others): functions f: x ↦ f(x); derivatives df/dx. Example: maximize the likelihood p(y|θ) w.r.t. θ.
Variational calculus (Euler, Lagrange, and others): functionals F: f ↦ F[f]; derivatives dF/df. Example: maximize the entropy H[p] w.r.t. a probability distribution p(x).
Leonhard Euler ( ), Swiss mathematician, "Elementa Calculi Variationum". Source: Kay H. Brodersen, 2013
29 Variational calculus and the free energy. Variational calculus lends itself nicely to approximate Bayesian inference:
ln p(y) = ln [ p(y, θ) / p(θ|y) ]
 = ∫ q(θ) ln [ p(y, θ) / p(θ|y) ] dθ   (since ∫ q(θ) dθ = 1)
 = ∫ q(θ) ln [ q(θ) / p(θ|y) ] dθ + ∫ q(θ) ln [ p(y, θ) / q(θ) ] dθ
 = KL[q‖p] + F(q, y)
where KL[q‖p] is the divergence between q(θ) and p(θ|y), and F(q, y) is the free energy. Source: Kay H. Brodersen, 2013
30 Variational calculus and the free energy. In summary, the log model evidence can be expressed as:
ln p(y) = KL[q‖p] + F(q, y)
with KL[q‖p] ≥ 0 (unknown) and F(q, y) the free energy (easy to evaluate for a given q). Since ln p(y) is fixed, maximizing F(q, y) is equivalent to: minimizing KL[q‖p]; tightening F(q, y) as a lower bound to the log model evidence. From initialization to convergence, F increases and the KL divergence shrinks. Source: Kay H. Brodersen, 2013
31 Computing the free energy. We can decompose the free energy F(q, y) as follows:
F(q, y) = ∫ q(θ) ln [ p(y, θ) / q(θ) ] dθ
 = ∫ q(θ) ln p(y, θ) dθ − ∫ q(θ) ln q(θ) dθ
 = ⟨ln p(y, θ)⟩_q + H[q]
i.e. the expected log-joint plus the Shannon entropy of q. Source: Kay H. Brodersen, 2013
32 The mean-field assumption. When inverting models with several parameters, a common way of restricting the class of approximate posteriors q(θ) is to consider those posteriors that factorize into independent partitions:
q(θ) = Π_i q_i(θ_i)
where q_i(θ_i) is the approximate posterior for the i-th subset of parameters, e.g. q(θ) = q(θ1) q(θ2). Jean Daunizeau, ~jdaunize/presentations/baes2.pdf; Source: Kay H. Brodersen, 2013
33 Variational inference under the mean-field assumption.
F(q, y) = ∫ q(θ) ln [ p(y, θ) / q(θ) ] dθ
Under the mean-field assumption q(θ) = Π_i q_i(θ_i), isolating one factor q_j:
F(q, y) = ∫ q_j ⟨ln p(y, θ)⟩_{q_{\j}} dθ_j − ∫ q_j ln q_j dθ_j − Σ_{i≠j} ∫ q_i ln q_i dθ_i
 = −KL[ q_j ‖ (1/Z) exp(⟨ln p(y, θ)⟩_{q_{\j}}) ] + c
where ⟨·⟩_{q_{\j}} denotes the expectation under all factors except q_j, and c collects terms that do not depend on q_j.
34 Variational algorithm under the mean-field assumption. In summary:
F(q, y) = −KL[ q_j ‖ (1/Z) exp(⟨ln p(y, θ)⟩_{q_{\j}}) ] + c
Suppose the densities q_{\j} ≡ q(θ_{\j}) are kept fixed. Then the approximate posterior q(θ_j) that maximizes F(q, y) is given by:
q_j* = argmax_{q_j} F(q, y) = (1/Z) exp(⟨ln p(y, θ)⟩_{q_{\j}})
Therefore: ln q_j* = ⟨ln p(y, θ)⟩_{q_{\j}} − ln Z =: I(θ_j) − ln Z.
This implies a straightforward algorithm for variational inference: initialize all approximate posteriors q(θ_i), e.g., by setting them to their priors; cycle over the parameters, revising each given the current estimates of the others; loop until convergence. Source: Kay H. Brodersen, 2013
35 Typical strategies in variational inference. Two orthogonal choices: whether to make the mean-field assumption q(θ) = Π_i q(θ_i), and whether to make parametric assumptions q(θ) = F(θ; δ).
No mean-field, no parametric assumptions: variational inference = exact inference.
Mean-field, no parametric assumptions: iterative free-form variational optimization.
No mean-field, parametric assumptions: fixed-form optimization of moments.
Mean-field, parametric assumptions: iterative fixed-form variational optimization.
Source: Kay H. Brodersen, 2013
36 Example: variational density estimation. We are given a univariate dataset y1, ..., yn which we model by a simple univariate Gaussian distribution. We wish to infer on its mean μ and precision τ: p(μ, τ|y). Generative model:
p(μ|τ) = N(μ; μ0, (λ0 τ)⁻¹)
p(τ) = Ga(τ; a0, b0)
p(y_i|μ, τ) = N(y_i; μ, τ⁻¹), i = 1, ..., n
Although in this case a closed-form solution exists, we shall pretend it does not. Instead, we consider approximations that satisfy the mean-field assumption: q(μ, τ) = q_μ(μ) q_τ(τ). Source: Kay H. Brodersen, 2013; Bishop (2006) PRML
37 Recurring expressions in Bayesian inference.
Univariate normal distribution: ln N(x; μ, λ⁻¹) = ½ ln λ − ½ ln 2π − (λ/2)(x − μ)² = −(λ/2) x² + λμx + c
Multivariate normal distribution: ln N_d(x; μ, Λ⁻¹) = ½ ln |Λ| − (d/2) ln 2π − ½ (x − μ)ᵀ Λ (x − μ) = −½ xᵀΛx + xᵀΛμ + c
Gamma distribution: ln Ga(x; a, b) = a ln b − ln Γ(a) + (a − 1) ln x − bx = (a − 1) ln x − bx + c
Source: Kay H. Brodersen, 2013
38 Variational density estimation: mean μ.
ln q_μ(μ) = ⟨ln p(y, μ, τ)⟩_{q_τ} + c
 = Σ_i ⟨ln N(y_i; μ, τ⁻¹)⟩_{q_τ} + ⟨ln N(μ; μ0, (λ0τ)⁻¹)⟩_{q_τ} + ⟨ln Ga(τ; a0, b0)⟩_{q_τ} + c
 = −(⟨τ⟩_{q_τ}/2) Σ_i (y_i − μ)² − (λ0 ⟨τ⟩_{q_τ}/2)(μ − μ0)² + c
 = −½ (n + λ0) ⟨τ⟩_{q_τ} μ² + (n ȳ + λ0 μ0) ⟨τ⟩_{q_τ} μ + c
Reinstatement by inspection (completing the square):
q_μ(μ) = N(μ; μ_n, λ_n⁻¹) with
λ_n = (λ0 + n) ⟨τ⟩_{q_τ}
μ_n = (λ0 μ0 + n ȳ) / (λ0 + n)
Source: Kay H. Brodersen, 2013
39 Variational density estimation: precision τ.
ln q_τ(τ) = ⟨ln p(y, μ, τ)⟩_{q_μ} + c
 = Σ_i ⟨ln N(y_i; μ, τ⁻¹)⟩_{q_μ} + ⟨ln N(μ; μ0, (λ0τ)⁻¹)⟩_{q_μ} + ⟨ln Ga(τ; a0, b0)⟩_{q_μ} + c
 = (n/2) ln τ − (τ/2) Σ_i ⟨(y_i − μ)²⟩_{q_μ} + ½ ln τ − (λ0 τ/2) ⟨(μ − μ0)²⟩_{q_μ} + (a0 − 1) ln τ − b0 τ + c
 = (a0 + (n + 1)/2 − 1) ln τ − [ b0 + ½ Σ_i ⟨(y_i − μ)²⟩_{q_μ} + (λ0/2) ⟨(μ − μ0)²⟩_{q_μ} ] τ + c
q_τ(τ) = Ga(τ; a_n, b_n) with
a_n = a0 + (n + 1)/2
b_n = b0 + ½ Σ_i ⟨(y_i − μ)²⟩_{q_μ} + (λ0/2) ⟨(μ − μ0)²⟩_{q_μ}
Source: Kay H. Brodersen, 2013
40 Variational density estimation: illustration. [Figure: contours of the true posterior p(μ, τ|y) and of the factorized approximation q_μ(μ) q_τ(τ) over successive updates.] Source: Kay H. Brodersen, 2013; Bishop (2006) PRML, p. 472
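The two coordinate updates of this example can be cycled in a few lines (a sketch following Bishop's treatment; the data and hyperparameter values are toy choices of ours, and the expectations ⟨(y_i − μ)²⟩ and ⟨(μ − μ0)²⟩ are expanded using E[μ] and E[μ²]):

```python
import numpy as np

# Hedged sketch: mean-field VB for a univariate Gaussian (mean mu, precision tau).
rng = np.random.default_rng(1)
y = rng.normal(loc=2.0, scale=1.0, size=500)
n, ybar = len(y), y.mean()
mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0   # toy hyperparameters

E_tau = a0 / b0                           # initialize <tau>
for _ in range(20):                       # cycle updates until convergence
    # q(mu) = N(mu_n, 1/lam_n)
    mu_n = (lam0 * mu0 + n * ybar) / (lam0 + n)
    lam_n = (lam0 + n) * E_tau
    E_mu, E_mu2 = mu_n, mu_n**2 + 1.0 / lam_n
    # q(tau) = Ga(a_n, b_n)
    a_n = a0 + (n + 1) / 2.0
    b_n = b0 + 0.5 * (np.sum(y**2 - 2 * y * E_mu + E_mu2)
                      + lam0 * (E_mu2 - 2 * mu0 * E_mu + mu0**2))
    E_tau = a_n / b_n

# E[mu] approaches the sample mean; E[tau] approaches the inverse sample variance.
```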
41 TAPAS - Variational Bayes Linear Regression
42
43 Demo VB Linear Regression
44 Bayesian Inference - Model Selection - Approximate Inference - Variational Bayes - MCMC Sampling
45 Markov Chain Monte Carlo (MCMC) sampling. A general framework for sampling from a large class of distributions; scales well with the dimensionality of the sample space; asymptotically convergent. A proposal distribution q(θ) generates a chain θ0 → θ1 → θ2 → ... → θN whose stationary distribution is the posterior p(θ|y). Andrei Markov ( ), Russian mathematician.
46 Markov chain properties.
Transition probabilities: p(θ^{t+1} | θ^1, ..., θ^t) = p(θ^{t+1} | θ^t) = T_t(θ^t, θ^{t+1}); the chain is homogeneous if T_t is the same for all t.
Invariance: p*(θ) = Σ_{θ'} T(θ', θ) p*(θ').
Detailed balance: T(θ, θ') p*(θ) = T(θ', θ) p*(θ'); a chain satisfying detailed balance is reversible, and p* is invariant.
Ergodicity: p*(θ) = lim_{n→∞} p(θ^n) irrespective of the initial distribution p(θ^0).
MCMC constructs a homogeneous, ergodic chain whose invariant distribution is the posterior.
47 Metropolis-Hastings Algorithm. Initialize θ^0 - for example, sample from the prior. At step t, sample a candidate from the proposal distribution: θ* ~ q(θ | θ^t). Accept with probability:
A(θ*, θ^t) = min( 1, [ p(θ*|y) q(θ^t|θ*) ] / [ p(θ^t|y) q(θ*|θ^t) ] )
Metropolis (symmetric proposal distribution): A(θ*, θ^t) = min( 1, p(θ*|y) / p(θ^t|y) ).
Bishop (2006) PRML, p. 539
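A minimal random-walk Metropolis sampler (toy target: a standard normal "posterior"; the step size 0.5 and the iteration counts are arbitrary choices of ours):

```python
import numpy as np

# Hedged sketch of random-walk Metropolis with a symmetric Gaussian proposal.
rng = np.random.default_rng(0)
def log_p(th):
    return -0.5 * th**2          # log target, up to a constant

theta = 0.0
samples = []
for t in range(20000):
    prop = theta + 0.5 * rng.standard_normal()   # theta* ~ q(.|theta^t)
    # Symmetric proposal -> A = min(1, p(theta*|y) / p(theta^t|y)).
    if np.log(rng.uniform()) < log_p(prop) - log_p(theta):
        theta = prop
    samples.append(theta)

samples = np.array(samples[5000:])               # discard burn-in
# Sample mean should be near 0 and sample std near 1.
```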
48 Gibbs Sampling Algorithm. A special case of Metropolis-Hastings. At step t, sample each component from its full conditional distribution:
θ1^{t+1} ~ p(θ1 | θ2^t, ..., θn^t)
θ2^{t+1} ~ p(θ2 | θ1^{t+1}, θ3^t, ..., θn^t)
 ...
θn^{t+1} ~ p(θn | θ1^{t+1}, ..., θ_{n−1}^{t+1})
The acceptance probability equals 1. Blocked sampling: update groups of variables jointly.
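As a concrete instance, Gibbs sampling for a bivariate normal with correlation ρ, where both full conditionals are univariate normals (a standard textbook toy example; the values are ours):

```python
import numpy as np

# Hedged sketch: Gibbs sampling for a standard bivariate normal, corr = rho.
rng = np.random.default_rng(0)
rho = 0.8
s = np.sqrt(1 - rho**2)          # conditional std: x_i | x_j ~ N(rho * x_j, 1 - rho^2)

x1, x2 = 0.0, 0.0
draws = []
for t in range(20000):
    x1 = rng.normal(rho * x2, s) # sample x1 | x2 (acceptance probability = 1)
    x2 = rng.normal(rho * x1, s) # sample x2 | x1
    draws.append((x1, x2))

draws = np.array(draws[2000:])   # discard burn-in
# The empirical correlation of the draws approaches rho.
```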
49 Posterior analysis from MCMC. Obtain approximately independent samples from the chain θ0 → θ1 → ... → θN: generate samples based on MCMC sampling; discard initial burn-in period samples to remove dependence on the initialization; thinning - select every m-th sample to reduce correlation. Then inspect sample statistics (e.g., histogram, sample quantiles).
50 MAP estimate via Simulated Annealing. Add a temperature parameter T and a schedule to update it. Algorithm: set T = 1; until convergence: for every K iterations, sample from p^{1/T}(θ|y), then reduce T (e.g. towards T = 0.1).
51 Convergence Analysis. Single-chain methods: Geweke (1992); Raftery-Lewis (1992). Multi-chain methods: Gelman-Rubin (1992), potential scale reduction factor.
52 Model evidence using MCMC. Importance sampling estimators: prior arithmetic mean - p(y) ≈ (1/N) Σ_i p(y|θ^(i)) with θ^(i) ~ p(θ); posterior harmonic mean - p(y) ≈ [ (1/N) Σ_i 1/p(y|θ^(i)) ]⁻¹ with θ^(i) ~ p(θ|y).
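The prior arithmetic-mean estimator can be checked on a toy model where the exact evidence is known: y ~ N(θ, 1) with θ ~ N(0, 1) gives p(y) = N(y; 0, 2). This construction is ours, not from the slides:

```python
import numpy as np

# Hedged sketch: Monte Carlo evidence via the prior arithmetic mean.
rng = np.random.default_rng(0)
y = 1.0

theta = rng.standard_normal(200000)                       # theta_i ~ p(theta)
lik = np.exp(-0.5 * (y - theta)**2) / np.sqrt(2 * np.pi)  # p(y|theta_i)
evidence_mc = lik.mean()                                  # (1/N) sum_i p(y|theta_i)

evidence_exact = np.exp(-y**2 / 4.0) / np.sqrt(4 * np.pi) # N(y; 0, 2)
# evidence_mc should match evidence_exact up to Monte Carlo error.
```

The posterior harmonic-mean estimator from the slide has the same form but averages reciprocal likelihoods over posterior samples; it is known to have high variance in practice.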
53 Thermodynamic Integration. Path sampling (thermodynamic integration): run chains at a sequence of temperatures T between 0 and 1 (e.g. T = 0.0, 0.5, 0.8, 1.0), each targeting the power posterior p_T(θ|y) ∝ p(y|θ)^T p(θ), and record the expected log-likelihood E_T(loglik) at each temperature. The log model evidence is the integral of E_T(loglik) over T from 0 to 1. Ogata, Y., Num. Math. 55; Gelman, A., Stat. Sci. 13.
54 Derivation - Extra
55 Other MCMC variants Slice Sampling Adaptive MH Hamiltonian Monte Carlo Population MCMC
56 TAPAS mpdcm - GPU-based MCMC
57 Demo MCMC Linear Regression
58 Summary. Bayesian Inference - Model Selection - Approximate Inference - Variational Bayes - MCMC Sampling
59 References.
Bishop, C.M. (2006). Pattern Recognition and Machine Learning.
Barber, D. Bayesian Reasoning and Machine Learning.
MacKay, D. Information Theory, Inference and Learning Algorithms.
Murphy, K.P. Conjugate Bayesian analysis of the Gaussian distribution.
Videolectures.net: Bayesian inference, MCMC, Variational Bayes.
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control 19.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics 6.
Kass, R.E. and Raftery, A.E. (1995). Bayes factors. Journal of the American Statistical Association, 90.
Gamerman, D. and Lopes, H.F. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, Second Edition (Chapman & Hall/CRC Texts in Statistical Science).
Statistical Parametric Mapping: The Analysis of Functional Brain Images.
Lartillot, N. and Philippe, H. (2006). Computing Bayes factors using thermodynamic integration. Systematic Biology, vol. 55, no. 2.
Gelman, A. and Rubin, D.B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4).
Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to calculating posterior moments. Bayesian Statistics 4.
Raftery, A.E. and Lewis, S. (1992). How many iterations in the Gibbs sampler? Bayesian Statistics 4, Oxford University Press.
60 Thank You
More informationINF Introduction to classifiction Anne Solberg
INF 4300 8.09.17 Introduction to classifiction Anne Solberg anne@ifi.uio.no Introduction to classification Based on handout from Pattern Recognition b Theodoridis, available after the lecture INF 4300
More informationBayesian Regression Linear and Logistic Regression
When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we
More informationA Review of Pseudo-Marginal Markov Chain Monte Carlo
A Review of Pseudo-Marginal Markov Chain Monte Carlo Discussed by: Yizhe Zhang October 21, 2016 Outline 1 Overview 2 Paper review 3 experiment 4 conclusion Motivation & overview Notation: θ denotes the
More informationTutorial on Probabilistic Programming with PyMC3
185.A83 Machine Learning for Health Informatics 2017S, VU, 2.0 h, 3.0 ECTS Tutorial 02-04.04.2017 Tutorial on Probabilistic Programming with PyMC3 florian.endel@tuwien.ac.at http://hci-kdd.org/machine-learning-for-health-informatics-course
More informationHmms with variable dimension structures and extensions
Hmm days/enst/january 21, 2002 1 Hmms with variable dimension structures and extensions Christian P. Robert Université Paris Dauphine www.ceremade.dauphine.fr/ xian Hmm days/enst/january 21, 2002 2 1 Estimating
More informationDoing Bayesian Integrals
ASTR509-13 Doing Bayesian Integrals The Reverend Thomas Bayes (c.1702 1761) Philosopher, theologian, mathematician Presbyterian (non-conformist) minister Tunbridge Wells, UK Elected FRS, perhaps due to
More informationLabor-Supply Shifts and Economic Fluctuations. Technical Appendix
Labor-Supply Shifts and Economic Fluctuations Technical Appendix Yongsung Chang Department of Economics University of Pennsylvania Frank Schorfheide Department of Economics University of Pennsylvania January
More informationST 740: Markov Chain Monte Carlo
ST 740: Markov Chain Monte Carlo Alyson Wilson Department of Statistics North Carolina State University October 14, 2012 A. Wilson (NCSU Stsatistics) MCMC October 14, 2012 1 / 20 Convergence Diagnostics:
More informationExpectation Propagation for Approximate Bayesian Inference
Expectation Propagation for Approximate Bayesian Inference José Miguel Hernández Lobato Universidad Autónoma de Madrid, Computer Science Department February 5, 2007 1/ 24 Bayesian Inference Inference Given
More informationIntroduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos & Aarti Singh Contents Markov Chain Monte Carlo Methods Goal & Motivation Sampling Rejection Importance Markov
More informationVariational Bayes. A key quantity in Bayesian inference is the marginal likelihood of a set of data D given a model M
A key quantity in Bayesian inference is the marginal likelihood of a set of data D given a model M PD M = PD θ, MPθ Mdθ Lecture 14 : Variational Bayes where θ are the parameters of the model and Pθ M is
More informationBayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework
HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for
More informationLikelihood-free MCMC
Bayesian inference for stable distributions with applications in finance Department of Mathematics University of Leicester September 2, 2011 MSc project final presentation Outline 1 2 3 4 Classical Monte
More informationVariational Inference. Sargur Srihari
Variational Inference Sargur srihari@cedar.buffalo.edu 1 Plan of discussion We first describe inference with PGMs and the intractability of exact inference Then give a taxonomy of inference algorithms
More informationLecture 6: Model Checking and Selection
Lecture 6: Model Checking and Selection Melih Kandemir melih.kandemir@iwr.uni-heidelberg.de May 27, 2014 Model selection We often have multiple modeling choices that are equally sensible: M 1,, M T. Which
More informationCSC 2541: Bayesian Methods for Machine Learning
CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 3 More Markov Chain Monte Carlo Methods The Metropolis algorithm isn t the only way to do MCMC. We ll
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationCSC2535: Computation in Neural Networks Lecture 7: Variational Bayesian Learning & Model Selection
CSC2535: Computation in Neural Networks Lecture 7: Variational Bayesian Learning & Model Selection (non-examinable material) Matthew J. Beal February 27, 2004 www.variational-bayes.org Bayesian Model Selection
More informationMarkov Chain Monte Carlo (MCMC) and Model Evaluation. August 15, 2017
Markov Chain Monte Carlo (MCMC) and Model Evaluation August 15, 2017 Frequentist Linking Frequentist and Bayesian Statistics How can we estimate model parameters and what does it imply? Want to find the
More informationBAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA
BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA Intro: Course Outline and Brief Intro to Marina Vannucci Rice University, USA PASI-CIMAT 04/28-30/2010 Marina Vannucci
More informationKernel adaptive Sequential Monte Carlo
Kernel adaptive Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) December 7, 2015 1 / 36 Section 1 Outline
More informationA Bayesian Approach to Phylogenetics
A Bayesian Approach to Phylogenetics Niklas Wahlberg Based largely on slides by Paul Lewis (www.eeb.uconn.edu) An Introduction to Bayesian Phylogenetics Bayesian inference in general Markov chain Monte
More informationParameter Estimation. William H. Jefferys University of Texas at Austin Parameter Estimation 7/26/05 1
Parameter Estimation William H. Jefferys University of Texas at Austin bill@bayesrules.net Parameter Estimation 7/26/05 1 Elements of Inference Inference problems contain two indispensable elements: Data
More informationBayesian Models in Machine Learning
Bayesian Models in Machine Learning Lukáš Burget Escuela de Ciencias Informáticas 2017 Buenos Aires, July 24-29 2017 Frequentist vs. Bayesian Frequentist point of view: Probability is the frequency of
More informationIntroduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak
Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak 1 Introduction. Random variables During the course we are interested in reasoning about considered phenomenon. In other words,
More informationMarkov Chain Monte Carlo
Markov Chain Monte Carlo Recall: To compute the expectation E ( h(y ) ) we use the approximation E(h(Y )) 1 n n h(y ) t=1 with Y (1),..., Y (n) h(y). Thus our aim is to sample Y (1),..., Y (n) from f(y).
More informationChapter 11. Stochastic Methods Rooted in Statistical Mechanics
Chapter 11. Stochastic Methods Rooted in Statistical Mechanics Neural Networks and Learning Machines (Haykin) Lecture Notes on Self-learning Neural Algorithms Byoung-Tak Zhang School of Computer Science
More informationCS281A/Stat241A Lecture 22
CS281A/Stat241A Lecture 22 p. 1/4 CS281A/Stat241A Lecture 22 Monte Carlo Methods Peter Bartlett CS281A/Stat241A Lecture 22 p. 2/4 Key ideas of this lecture Sampling in Bayesian methods: Predictive distribution
More information13: Variational inference II
10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational
More informationWill Penny. SPM short course for M/EEG, London 2013
SPM short course for M/EEG, London 2013 Ten Simple Rules Stephan et al. Neuroimage, 2010 Model Structure Bayes rule for models A prior distribution over model space p(m) (or hypothesis space ) can be updated
More informationMonte Carlo methods for sampling-based Stochastic Optimization
Monte Carlo methods for sampling-based Stochastic Optimization Gersende FORT LTCI CNRS & Telecom ParisTech Paris, France Joint works with B. Jourdain, T. Lelièvre, G. Stoltz from ENPC and E. Kuhn from
More informationLecture : Probabilistic Machine Learning
Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning
More informationThe Expectation-Maximization Algorithm
1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable
More informationComputer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo
Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain
More informationVariational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures
17th Europ. Conf. on Machine Learning, Berlin, Germany, 2006. Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures Shipeng Yu 1,2, Kai Yu 2, Volker Tresp 2, and Hans-Peter
More informationLecture 10. Announcement. Mixture Models II. Topics of This Lecture. This Lecture: Advanced Machine Learning. Recap: GMMs as Latent Variable Models
Advanced Machine Learning Lecture 10 Mixture Models II 30.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ Announcement Exercise sheet 2 online Sampling Rejection Sampling Importance
More informationLecture 6: Graphical Models: Learning
Lecture 6: Graphical Models: Learning 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering, University of Cambridge February 3rd, 2010 Ghahramani & Rasmussen (CUED)
More information