Bayesian Quadrature: Model-based Approximate Integration. David Duvenaud University of Cambridge

Size: px
Start display at page:

Download "Bayesian Quadrature: Model-based Approximate Integration. David Duvenaud University of Cambridge"

Transcription

1 Bayesian Quadrature: Model-based Approimate Integration David Duvenaud University of Cambridge

2 The Quadrature Problem ˆ We want to estimate an integral Z = f ()p()d ˆ Most computational problems in inference correspond to integrals: ˆ Epectations ˆ Marginal distributions ˆ Integrating out nuisance parameters ˆ Normalization constants ˆ Model comparison function f() input density p()

3 Sampling Methods ˆ Monte Carlo methods: Sample from p(), take empirical mean: Ẑ = 1 N N f ( i ) i=1 function f() input density p() samples

4 Sampling Methods ˆ Monte Carlo methods: Sample from p(), take empirical mean: Ẑ = 1 N N f ( i ) i=1 ˆ Possibly sub-optimal for two reasons: function f() input density p() samples

5 Sampling Methods ˆ Monte Carlo methods: Sample from p(), take empirical mean: Ẑ = 1 N N f ( i ) i=1 ˆ Possibly sub-optimal for two reasons: ˆ Random bunching up function f() input density p() samples

6 Sampling Methods ˆ Monte Carlo methods: Sample from p(), take empirical mean: Ẑ = 1 N N f ( i ) i=1 ˆ Possibly sub-optimal for two reasons: ˆ Random bunching up ˆ Often, nearby function values will be similar function f() input density p() samples

7 Sampling Methods ˆ Monte Carlo methods: Sample from p(), take empirical mean: Ẑ = 1 N N f ( i ) i=1 ˆ Possibly sub-optimal for two reasons: ˆ Random bunching up ˆ Often, nearby function values will be similar ˆ Model-based and quasi-monte Carlo methods spread out samples to achieve faster convergence. function f() input density p() samples

8 Model-based Integration

9 Model-based Integration ˆ Place a prior on f, for eample, a GP ˆ Posterior over f implies a posterior over Z. Z f() p() GP mean GP mean ± SD p(z)

10 Model-based Integration ˆ Place a prior on f, for eample, a GP ˆ Posterior over f implies a posterior over Z. Z f() p() GP mean GP mean ±2SD p(z) samples

11 Model-based Integration ˆ Place a prior on f, for eample, a GP ˆ Posterior over f implies a posterior over Z. Z f() p() GP mean GP mean ±2SD p(z) samples

12 Model-based Integration ˆ Place a prior on f, for eample, a GP ˆ Posterior over f implies a posterior over Z. Z f() p() GP mean GP mean ±2SD p(z) samples

13 Model-based Integration ˆ Place a prior on f, for eample, a GP ˆ Posterior over f implies a posterior over Z. Z f() p() GP mean GP mean ±2SD p(z) samples

14 Model-based Integration ˆ Place a prior on f, for eample, a GP ˆ Posterior over f implies a posterior over Z. Z f() p() GP mean GP mean ±2SD p(z) samples

15 Model-based Integration ˆ Place a prior on f, for eample, a GP ˆ Posterior over f implies a posterior over Z. Z f() p() GP mean GP mean ±2SD p(z) samples ˆ We ll call using a GP prior Bayesian Quadrature

16 Bayesian Quadrature Estimator ˆ Posterior over Z has mean linear in f ( s ): E gp [Z f ( s )] = N z T K 1 f ( i ) i=1 where z n = k(, n )p()d

17 Bayesian Quadrature Estimator ˆ Posterior over Z has mean linear in f ( s ): E gp [Z f ( s )] = N z T K 1 f ( i ) i=1 where z n = k(, n )p()d ˆ Natural to minimize posterior variance of Z: V [Z f ( s )] = k(, )p()p( )dd z T K 1 z

18 Bayesian Quadrature Estimator ˆ Posterior over Z has mean linear in f ( s ): E gp [Z f ( s )] = N z T K 1 f ( i ) i=1 where z n = k(, n )p()d ˆ Natural to minimize posterior variance of Z: V [Z f ( s )] = k(, )p()p( )dd z T K 1 z ˆ Doesn t depend on function values at all!

19 Bayesian Quadrature Estimator ˆ Posterior over Z has mean linear in f ( s ): E gp [Z f ( s )] = N z T K 1 f ( i ) i=1 where z n = k(, n )p()d ˆ Natural to minimize posterior variance of Z: V [Z f ( s )] = k(, )p()p( )dd z T K 1 z ˆ Doesn t depend on function values at all! ˆ Choosing samples sequentially to minimize variance: Sequential Bayesian Quadrature.

20 Things you can do with Bayesian Quadrature ˆ Can incorporate knowledge of function (symmetries) f (, y) = f (y, ) k s (, y,, y ) = k(, y,, y ) + k(, y,, y) + k(, y,, y ) + k(, y,, y)

21 Things you can do with Bayesian Quadrature ˆ Can incorporate knowledge of function (symmetries) f (, y) = f (y, ) k s (, y,, y ) = k(, y,, y ) + k(, y,, y) + k(, y,, y ) + k(, y,, y) ˆ Can condition on gradients

22 Things you can do with Bayesian Quadrature ˆ Can incorporate knowledge of function (symmetries) f (, y) = f (y, ) k s (, y,, y ) = k(, y,, y ) + k(, y,, y) + k(, y,, y ) + k(, y,, y) ˆ Can condition on gradients ˆ Posterior variance is a natural convergence diagnostic Z # of samples

23 More things you can do with Bayesian Quadrature ˆ Can compute likelihood of GP, learn kernel ˆ Can compute marginals with error bars, in two ways:

24 More things you can do with Bayesian Quadrature ˆ Can compute likelihood of GP, learn kernel ˆ Can compute marginals with error bars, in two ways: ˆ Simply from the GP posterior:

25 More things you can do with Bayesian Quadrature ˆ Can compute likelihood of GP, learn kernel ˆ Can compute marginals with error bars, in two ways: ˆ Or by recomputing f θ () for different θ with same Z True σ Marg. Like σ

26 More things you can do with Bayesian Quadrature ˆ Can compute likelihood of GP, learn kernel ˆ Can compute marginals with error bars, in two ways: ˆ Or by recomputing f θ () for different θ with same Z True σ Marg. Like. ˆ Much nicer than histograms! σ

27 Rates of Convergence What is rate of convergence of SBQ when its assumptions are true? Epected Variance / MMD 10 1 MMD O(1/ N ) i.i.d. sampling Herding with 1/N weights number of samples

28 Rates of Convergence What is rate of convergence of SBQ when its assumptions are true? Epected Variance / MMD 10 1 MMD O(1/ N ) i.i.d. sampling Herding with 1/N weights Herding with BQ weights number of samples

29 Rates of Convergence What is rate of convergence of SBQ when its assumptions are true? Epected Variance / MMD 10 1 MMD O(1/N ) i.i.d. sampling Herding with 1/N weights Herding with BQ weights SBQ with BQ weights number of samples

30 Rates of Convergence What is rate of convergence of SBQ when its assumptions are true? Epected Variance / MMD Empirical Rates in RKHS 10 1 MMD O(1/N ) i.i.d. sampling Herding with 1/N weights Herding with BQ weights SBQ with BQ weights number of samples

31 Rates of Convergence What is rate of convergence of SBQ when its assumptions are true? Epected Variance / MMD Empirical Rates out of RKHS 10 1 MMD O(1/N ) i.i.d. sampling Herding with 1/N weights Herding with BQ weights SBQ with BQ weights number of samples

32 Rates of Convergence What is rate of convergence of SBQ when its assumptions are true? Epected Variance / MMD Bound on Bayesian Error 10 1 MMD O(1/N ) i.i.d. sampling Herding with 1/N weights Herding with BQ weights SBQ with BQ weights number of samples

33 GPs vs Log-GPs for Inference l() True Function Evaluations

34 GPs vs Log-GPs for Inference l() True Function Evaluations GP Posterior

35 GPs vs Log-GPs for Inference l() True Function Evaluations GP Posterior log l() True Log-func Evaluations 0 1

36 GPs vs Log-GPs for Inference l() True Function Evaluations GP Posterior log l() True Log-func Evaluations GP Posterior 1

37 GPs vs Log-GPs for Inference l() True Function Evaluations GP Posterior Log-GP Posterior log l() True Log-func Evaluations GP Posterior 1

38 GPs vs Log-GPs 200 l() True Function Evaluations 50 0

39 GPs vs Log-GPs 200 l() True Function Evaluations GP Posterior 0

40 GPs vs Log-GPs 200 l() True Function Evaluations GP Posterior 0 0 log l() True Log-func Evaluations

41 GPs vs Log-GPs 200 l() True Function Evaluations GP Posterior 0 0 log l() True Log-func Evaluations GP Posterior 200

42 GPs vs Log-GPs 200 l() True Function Evaluations GP Posterior Log-GP Posterior 0 0 log l() True Log-func Evaluations GP Posterior 200

43 Integrating under Log-GPs 80 l() True Function Evaluations GP Posterior Log-GP Posterior 20 log l() True Log-func Evaluations GP Posterior 1

44 Integrating under Log-GPs 80 l() True Function Evaluations Mean of GP Appro Log-GP Inducing Points 20 log l() True Log-func Evaluations GP Posterior Inducing Points

45 Conclusions ˆ Model-based integration allows active learning about integrals, can require fewer samples than MCMC, and allows us to check our assumptions.

46 Conclusions ˆ Model-based integration allows active learning about integrals, can require fewer samples than MCMC, and allows us to check our assumptions. ˆ BQ has nice convergence properties if its assumptions are correct.

47 Conclusions ˆ Model-based integration allows active learning about integrals, can require fewer samples than MCMC, and allows us to check our assumptions. ˆ BQ has nice convergence properties if its assumptions are correct. ˆ For inference, GP is not especially appropriate, but other models are intractable.

48 Limitations and Future Directions ˆ Right now, BQ really only works in low dimensions ( < 10 ), when the function is fairly smooth, and is only worth using when computing f () is epensive.

49 Limitations and Future Directions ˆ Right now, BQ really only works in low dimensions ( < 10 ), when the function is fairly smooth, and is only worth using when computing f () is epensive. ˆ How to etend to high dimensions? Gradient observations are helpful, but a D-dimensional gradient is D separate observations.

50 Limitations and Future Directions ˆ Right now, BQ really only works in low dimensions ( < 10 ), when the function is fairly smooth, and is only worth using when computing f () is epensive. ˆ How to etend to high dimensions? Gradient observations are helpful, but a D-dimensional gradient is D separate observations. ˆ It seems unlikely that we ll find another tractable nonparametric distribution like GPs - should we accept that we ll need a second round of approimate integration on a surrogate model?

51 Limitations and Future Directions ˆ Right now, BQ really only works in low dimensions ( < 10 ), when the function is fairly smooth, and is only worth using when computing f () is epensive. ˆ How to etend to high dimensions? Gradient observations are helpful, but a D-dimensional gradient is D separate observations. ˆ It seems unlikely that we ll find another tractable nonparametric distribution like GPs - should we accept that we ll need a second round of approimate integration on a surrogate model? ˆ How much overhead is worthwhile? Bounded rationality work seems relevant.

52 Limitations and Future Directions ˆ Right now, BQ really only works in low dimensions ( < 10 ), when the function is fairly smooth, and is only worth using when computing f () is epensive. ˆ How to etend to high dimensions? Gradient observations are helpful, but a D-dimensional gradient is D separate observations. ˆ It seems unlikely that we ll find another tractable nonparametric distribution like GPs - should we accept that we ll need a second round of approimate integration on a surrogate model? ˆ How much overhead is worthwhile? Bounded rationality work seems relevant. Thanks!

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

ICML Scalable Bayesian Inference on Point processes. with Gaussian Processes. Yves-Laurent Kom Samo & Stephen Roberts

ICML Scalable Bayesian Inference on Point processes. with Gaussian Processes. Yves-Laurent Kom Samo & Stephen Roberts ICML 2015 Scalable Nonparametric Bayesian Inference on Point Processes with Gaussian Processes Machine Learning Research Group and Oxford-Man Institute University of Oxford July 8, 2015 Point Processes

More information

STA 4273H: Sta-s-cal Machine Learning

STA 4273H: Sta-s-cal Machine Learning STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our

More information

Business Statistics: A Decision-Making Approach 6 th Edition. Chapter Goals

Business Statistics: A Decision-Making Approach 6 th Edition. Chapter Goals Chapter 6 Student Lecture Notes 6-1 Business Statistics: A Decision-Making Approach 6 th Edition Chapter 6 Introduction to Sampling Distributions Chap 6-1 Chapter Goals To use information from the sample

More information

Density Estimation. Seungjin Choi

Density Estimation. Seungjin Choi Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

More information

Bayesian Inference and MCMC

Bayesian Inference and MCMC Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the

More information

Global Optimisation with Gaussian Processes. Michael A. Osborne Machine Learning Research Group Department o Engineering Science University o Oxford

Global Optimisation with Gaussian Processes. Michael A. Osborne Machine Learning Research Group Department o Engineering Science University o Oxford Global Optimisation with Gaussian Processes Michael A. Osborne Machine Learning Research Group Department o Engineering Science University o Oxford Global optimisation considers objective functions that

More information

Gaussian Process Vine Copulas for Multivariate Dependence

Gaussian Process Vine Copulas for Multivariate Dependence Gaussian Process Vine Copulas for Multivariate Dependence José Miguel Hernández-Lobato 1,2 joint work with David López-Paz 2,3 and Zoubin Ghahramani 1 1 Department of Engineering, Cambridge University,

More information

Where now? Machine Learning and Bayesian Inference

Where now? Machine Learning and Bayesian Inference Machine Learning and Bayesian Inference Dr Sean Holden Computer Laboratory, Room FC6 Telephone etension 67 Email: sbh@clcamacuk wwwclcamacuk/ sbh/ Where now? There are some simple take-home messages from

More information

Bayesian Regression Linear and Logistic Regression

Bayesian Regression Linear and Logistic Regression When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

Sandwiching the marginal likelihood using bidirectional Monte Carlo. Roger Grosse

Sandwiching the marginal likelihood using bidirectional Monte Carlo. Roger Grosse Sandwiching the marginal likelihood using bidirectional Monte Carlo Roger Grosse Ryan Adams Zoubin Ghahramani Introduction When comparing different statistical models, we d like a quantitative criterion

More information

Particle-Based Approximate Inference on Graphical Model

Particle-Based Approximate Inference on Graphical Model article-based Approimate Inference on Graphical Model Reference: robabilistic Graphical Model Ch. 2 Koller & Friedman CMU, 0-708, Fall 2009 robabilistic Graphical Models Lectures 8,9 Eric ing attern Recognition

More information

One-parameter models

One-parameter models One-parameter models Patrick Breheny January 22 Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/17 Introduction Binomial data is not the only example in which Bayesian solutions can be worked

More information

Curve Fitting Re-visited, Bishop1.2.5

Curve Fitting Re-visited, Bishop1.2.5 Curve Fitting Re-visited, Bishop1.2.5 Maximum Likelihood Bishop 1.2.5 Model Likelihood differentiation p(t x, w, β) = Maximum Likelihood N N ( t n y(x n, w), β 1). (1.61) n=1 As we did in the case of the

More information

A Comparison of Particle Filters for Personal Positioning

A Comparison of Particle Filters for Personal Positioning VI Hotine-Marussi Symposium of Theoretical and Computational Geodesy May 9-June 6. A Comparison of Particle Filters for Personal Positioning D. Petrovich and R. Piché Institute of Mathematics Tampere University

More information

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for

More information

Introduction to Gaussian Processes

Introduction to Gaussian Processes Introduction to Gaussian Processes Iain Murray murray@cs.toronto.edu CSC255, Introduction to Machine Learning, Fall 28 Dept. Computer Science, University of Toronto The problem Learn scalar function of

More information

Bayesian Inference: Principles and Practice 3. Sparse Bayesian Models and the Relevance Vector Machine

Bayesian Inference: Principles and Practice 3. Sparse Bayesian Models and the Relevance Vector Machine Bayesian Inference: Principles and Practice 3. Sparse Bayesian Models and the Relevance Vector Machine Mike Tipping Gaussian prior Marginal prior: single α Independent α Cambridge, UK Lecture 3: Overview

More information

Nonparametric Regression With Gaussian Processes

Nonparametric Regression With Gaussian Processes Nonparametric Regression With Gaussian Processes From Chap. 45, Information Theory, Inference and Learning Algorithms, D. J. C. McKay Presented by Micha Elsner Nonparametric Regression With Gaussian Processes

More information

Kernel Sequential Monte Carlo

Kernel Sequential Monte Carlo Kernel Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) * equal contribution April 25, 2016 1 / 37 Section

More information

DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling

DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling Due: Tuesday, May 10, 2016, at 6pm (Submit via NYU Classes) Instructions: Your answers to the questions below, including

More information

Bayes Nets: Sampling

Bayes Nets: Sampling Bayes Nets: Sampling [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.] Approximate Inference:

More information

ML estimation: Random-intercepts logistic model. and z

ML estimation: Random-intercepts logistic model. and z ML estimation: Random-intercepts logistic model log p ij 1 p = x ijβ + υ i with υ i N(0, συ) 2 ij Standardizing the random effect, θ i = υ i /σ υ, yields log p ij 1 p = x ij β + σ υθ i with θ i N(0, 1)

More information

Power EP. Thomas Minka Microsoft Research Ltd., Cambridge, UK MSR-TR , October 4, Abstract

Power EP. Thomas Minka Microsoft Research Ltd., Cambridge, UK MSR-TR , October 4, Abstract Power EP Thomas Minka Microsoft Research Ltd., Cambridge, UK MSR-TR-2004-149, October 4, 2004 Abstract This note describes power EP, an etension of Epectation Propagation (EP) that makes the computations

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

CSC 2541: Bayesian Methods for Machine Learning

CSC 2541: Bayesian Methods for Machine Learning CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 10 Alternatives to Monte Carlo Computation Since about 1990, Markov chain Monte Carlo has been the dominant

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

Bayesian Inference in Astronomy & Astrophysics A Short Course

Bayesian Inference in Astronomy & Astrophysics A Short Course Bayesian Inference in Astronomy & Astrophysics A Short Course Tom Loredo Dept. of Astronomy, Cornell University p.1/37 Five Lectures Overview of Bayesian Inference From Gaussians to Periodograms Learning

More information

GAUSSIAN PROCESS REGRESSION

GAUSSIAN PROCESS REGRESSION GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

An example to illustrate frequentist and Bayesian approches

An example to illustrate frequentist and Bayesian approches Frequentist_Bayesian_Eample An eample to illustrate frequentist and Bayesian approches This is a trivial eample that illustrates the fundamentally different points of view of the frequentist and Bayesian

More information

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature

More information

Lecture: Gaussian Process Regression. STAT 6474 Instructor: Hongxiao Zhu

Lecture: Gaussian Process Regression. STAT 6474 Instructor: Hongxiao Zhu Lecture: Gaussian Process Regression STAT 6474 Instructor: Hongxiao Zhu Motivation Reference: Marc Deisenroth s tutorial on Robot Learning. 2 Fast Learning for Autonomous Robots with Gaussian Processes

More information

LECTURE 15 Markov chain Monte Carlo

LECTURE 15 Markov chain Monte Carlo LECTURE 15 Markov chain Monte Carlo There are many settings when posterior computation is a challenge in that one does not have a closed form expression for the posterior distribution. Markov chain Monte

More information

Lecture : Probabilistic Machine Learning

Lecture : Probabilistic Machine Learning Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning

More information

CS 343: Artificial Intelligence

CS 343: Artificial Intelligence CS 343: Artificial Intelligence Bayes Nets: Sampling Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.

More information

Why do we care? Examples. Bayes Rule. What room am I in? Handling uncertainty over time: predicting, estimating, recognizing, learning

Why do we care? Examples. Bayes Rule. What room am I in? Handling uncertainty over time: predicting, estimating, recognizing, learning Handling uncertainty over time: predicting, estimating, recognizing, learning Chris Atkeson 004 Why do we care? Speech recognition makes use of dependence of words and phonemes across time. Knowing where

More information

Part 1: Expectation Propagation

Part 1: Expectation Propagation Chalmers Machine Learning Summer School Approximate message passing and biomedicine Part 1: Expectation Propagation Tom Heskes Machine Learning Group, Institute for Computing and Information Sciences Radboud

More information

Exercises Tutorial at ICASSP 2016 Learning Nonlinear Dynamical Models Using Particle Filters

Exercises Tutorial at ICASSP 2016 Learning Nonlinear Dynamical Models Using Particle Filters Exercises Tutorial at ICASSP 216 Learning Nonlinear Dynamical Models Using Particle Filters Andreas Svensson, Johan Dahlin and Thomas B. Schön March 18, 216 Good luck! 1 [Bootstrap particle filter for

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin

More information

17 : Markov Chain Monte Carlo

17 : Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo

More information

Bayesian Support Vector Machines for Feature Ranking and Selection

Bayesian Support Vector Machines for Feature Ranking and Selection Bayesian Support Vector Machines for Feature Ranking and Selection written by Chu, Keerthi, Ong, Ghahramani Patrick Pletscher pat@student.ethz.ch ETH Zurich, Switzerland 12th January 2006 Overview 1 Introduction

More information

Sampling from complex probability distributions

Sampling from complex probability distributions Sampling from complex probability distributions Louis J. M. Aslett (louis.aslett@durham.ac.uk) Department of Mathematical Sciences Durham University UTOPIAE Training School II 4 July 2017 1/37 Motivation

More information

Bayesian Estimation with Sparse Grids

Bayesian Estimation with Sparse Grids Bayesian Estimation with Sparse Grids Kenneth L. Judd and Thomas M. Mertens Institute on Computational Economics August 7, 27 / 48 Outline Introduction 2 Sparse grids Construction Integration with sparse

More information

Nonparmeteric Bayes & Gaussian Processes. Baback Moghaddam Machine Learning Group

Nonparmeteric Bayes & Gaussian Processes. Baback Moghaddam Machine Learning Group Nonparmeteric Bayes & Gaussian Processes Baback Moghaddam baback@jpl.nasa.gov Machine Learning Group Outline Bayesian Inference Hierarchical Models Model Selection Parametric vs. Nonparametric Gaussian

More information

An introduction to Sequential Monte Carlo

An introduction to Sequential Monte Carlo An introduction to Sequential Monte Carlo Thang Bui Jes Frellsen Department of Engineering University of Cambridge Research and Communication Club 6 February 2014 1 Sequential Monte Carlo (SMC) methods

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning

More information

Log Gaussian Cox Processes. Chi Group Meeting February 23, 2016

Log Gaussian Cox Processes. Chi Group Meeting February 23, 2016 Log Gaussian Cox Processes Chi Group Meeting February 23, 2016 Outline Typical motivating application Introduction to LGCP model Brief overview of inference Applications in my work just getting started

More information

Kernel adaptive Sequential Monte Carlo

Kernel adaptive Sequential Monte Carlo Kernel adaptive Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) December 7, 2015 1 / 36 Section 1 Outline

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS Parametric Distributions Basic building blocks: Need to determine given Representation: or? Recall Curve Fitting Binary Variables

More information

Rejection sampling - Acceptance probability. Review: How to sample from a multivariate normal in R. Review: Rejection sampling. Weighted resampling

Rejection sampling - Acceptance probability. Review: How to sample from a multivariate normal in R. Review: Rejection sampling. Weighted resampling Rejection sampling - Acceptance probability Review: How to sample from a multivariate normal in R Goal: Simulate from N d (µ,σ)? Note: For c to be small, g() must be similar to f(). The art of rejection

More information

Non-Parametric Bayes

Non-Parametric Bayes Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian

More information

Clustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014.

Clustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014. Clustering K-means Machine Learning CSE546 Carlos Guestrin University of Washington November 4, 2014 1 Clustering images Set of Images [Goldberger et al.] 2 1 K-means Randomly initialize k centers µ (0)

More information

CSC321 Lecture 18: Learning Probabilistic Models

CSC321 Lecture 18: Learning Probabilistic Models CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling

More information

Learning Energy-Based Models of High-Dimensional Data

Learning Energy-Based Models of High-Dimensional Data Learning Energy-Based Models of High-Dimensional Data Geoffrey Hinton Max Welling Yee-Whye Teh Simon Osindero www.cs.toronto.edu/~hinton/energybasedmodelsweb.htm Discovering causal structure as a goal

More information

Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm

Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm Qiang Liu and Dilin Wang NIPS 2016 Discussion by Yunchen Pu March 17, 2017 March 17, 2017 1 / 8 Introduction Let x R d

More information

Gaussian Process Optimization with Mutual Information

Gaussian Process Optimization with Mutual Information Gaussian Process Optimization with Mutual Information Emile Contal 1 Vianney Perchet 2 Nicolas Vayatis 1 1 CMLA Ecole Normale Suprieure de Cachan & CNRS, France 2 LPMA Université Paris Diderot & CNRS,

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 13: SEQUENTIAL DATA

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 13: SEQUENTIAL DATA PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 13: SEQUENTIAL DATA Contents in latter part Linear Dynamical Systems What is different from HMM? Kalman filter Its strength and limitation Particle Filter

More information

MCMC Sampling for Bayesian Inference using L1-type Priors

MCMC Sampling for Bayesian Inference using L1-type Priors MÜNSTER MCMC Sampling for Bayesian Inference using L1-type Priors (what I do whenever the ill-posedness of EEG/MEG is just not frustrating enough!) AG Imaging Seminar Felix Lucka 26.06.2012 , MÜNSTER Sampling

More information

Recent Advances in Bayesian Inference Techniques

Recent Advances in Bayesian Inference Techniques Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian

More information

Gaussian Process priors with Uncertain Inputs: Multiple-Step-Ahead Prediction

Gaussian Process priors with Uncertain Inputs: Multiple-Step-Ahead Prediction Gaussian Process priors with Uncertain Inputs: Multiple-Step-Ahead Prediction Agathe Girard Dept. of Computing Science University of Glasgow Glasgow, UK agathe@dcs.gla.ac.uk Carl Edward Rasmussen Gatsby

More information

Introduction to Probabilistic Machine Learning

Introduction to Probabilistic Machine Learning Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning

More information

STAT 518 Intro Student Presentation

STAT 518 Intro Student Presentation STAT 518 Intro Student Presentation Wen Wei Loh April 11, 2013 Title of paper Radford M. Neal [1999] Bayesian Statistics, 6: 475-501, 1999 What the paper is about Regression and Classification Flexible

More information

Big model configuration with Bayesian quadrature. David Duvenaud, Roman Garnett, Tom Gunter, Philipp Hennig, Michael A Osborne and Stephen Roberts.

Big model configuration with Bayesian quadrature. David Duvenaud, Roman Garnett, Tom Gunter, Philipp Hennig, Michael A Osborne and Stephen Roberts. Big model configuration with Bayesian quadrature David Duvenaud, Roman Garnett, Tom Gunter, Philipp Hennig, Michael A Osborne and Stephen Roberts. This talk will develop Bayesian quadrature approaches

More information

Machine Learning. Probabilistic KNN.

Machine Learning. Probabilistic KNN. Machine Learning. Mark Girolami girolami@dcs.gla.ac.uk Department of Computing Science University of Glasgow June 21, 2007 p. 1/3 KNN is a remarkably simple algorithm with proven error-rates June 21, 2007

More information

Advanced Introduction to Machine Learning CMU-10715

Advanced Introduction to Machine Learning CMU-10715 Advanced Introduction to Machine Learning CMU-10715 Gaussian Processes Barnabás Póczos http://www.gaussianprocess.org/ 2 Some of these slides in the intro are taken from D. Lizotte, R. Parr, C. Guesterin

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft

More information

Sampling Algorithms for Probabilistic Graphical models

Sampling Algorithms for Probabilistic Graphical models Sampling Algorithms for Probabilistic Graphical models Vibhav Gogate University of Washington References: Chapter 12 of Probabilistic Graphical models: Principles and Techniques by Daphne Koller and Nir

More information

Paul Karapanagiotidis ECO4060

Paul Karapanagiotidis ECO4060 Paul Karapanagiotidis ECO4060 The way forward 1) Motivate why Markov-Chain Monte Carlo (MCMC) is useful for econometric modeling 2) Introduce Markov-Chain Monte Carlo (MCMC) - Metropolis-Hastings (MH)

More information

Markov chain Monte Carlo

Markov chain Monte Carlo 1 / 26 Markov chain Monte Carlo Timothy Hanson 1 and Alejandro Jara 2 1 Division of Biostatistics, University of Minnesota, USA 2 Department of Statistics, Universidad de Concepción, Chile IAP-Workshop

More information

Notes on pseudo-marginal methods, variational Bayes and ABC

Notes on pseudo-marginal methods, variational Bayes and ABC Notes on pseudo-marginal methods, variational Bayes and ABC Christian Andersson Naesseth October 3, 2016 The Pseudo-Marginal Framework Assume we are interested in sampling from the posterior distribution

More information

19 : Slice Sampling and HMC

19 : Slice Sampling and HMC 10-708: Probabilistic Graphical Models 10-708, Spring 2018 19 : Slice Sampling and HMC Lecturer: Kayhan Batmanghelich Scribes: Boxiang Lyu 1 MCMC (Auxiliary Variables Methods) In inference, we are often

More information

Sparse Stochastic Inference for Latent Dirichlet Allocation

Sparse Stochastic Inference for Latent Dirichlet Allocation Sparse Stochastic Inference for Latent Dirichlet Allocation David Mimno 1, Matthew D. Hoffman 2, David M. Blei 1 1 Dept. of Computer Science, Princeton U. 2 Dept. of Statistics, Columbia U. Presentation

More information

Lecture Note 2: Estimation and Information Theory

Lecture Note 2: Estimation and Information Theory Univ. of Michigan - NAME 568/EECS 568/ROB 530 Winter 2018 Lecture Note 2: Estimation and Information Theory Lecturer: Maani Ghaffari Jadidi Date: April 6, 2018 2.1 Estimation A static estimation problem

More information

Fundamental Issues in Bayesian Functional Data Analysis. Dennis D. Cox Rice University

Fundamental Issues in Bayesian Functional Data Analysis. Dennis D. Cox Rice University Fundamental Issues in Bayesian Functional Data Analysis Dennis D. Cox Rice University 1 Introduction Question: What are functional data? Answer: Data that are functions of a continuous variable.... say

More information

Markov chain Monte Carlo methods for visual tracking

Markov chain Monte Carlo methods for visual tracking Markov chain Monte Carlo methods for visual tracking Ray Luo rluo@cory.eecs.berkeley.edu Department of Electrical Engineering and Computer Sciences University of California, Berkeley Berkeley, CA 94720

More information

Expectation Propagation for Approximate Bayesian Inference

Expectation Propagation for Approximate Bayesian Inference Expectation Propagation for Approximate Bayesian Inference José Miguel Hernández Lobato Universidad Autónoma de Madrid, Computer Science Department February 5, 2007 1/ 24 Bayesian Inference Inference Given

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo Department of Statistics The University of Auckland https://www.stat.auckland.ac.nz/~brewer/ Emphasis I will try to emphasise the underlying ideas of the methods. I will not be teaching specific software

More information

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning for for Advanced Topics in California Institute of Technology April 20th, 2017 1 / 50 Table of Contents for 1 2 3 4 2 / 50 History of methods for Enrico Fermi used to calculate incredibly accurate predictions

More information

Sequential Bayesian Prediction in the Presence of Changepoints

Sequential Bayesian Prediction in the Presence of Changepoints Roman Garnett, Michael A. Osborne, Stephen J. Roberts {rgarnett, mosb, sjrob}@robots.o.ac.uk Department of Engineering Science, University of Oford, Oford, UK OX 3PJ Keywords: Gaussian processes, time-series

More information

Gaussian Processes for Computer Experiments

Gaussian Processes for Computer Experiments Gaussian Processes for Computer Experiments Jeremy Oakley School of Mathematics and Statistics, University of Sheffield www.jeremy-oakley.staff.shef.ac.uk 1 / 43 Computer models Computer model represented

More information

Bayesian model selection: methodology, computation and applications

Bayesian model selection: methodology, computation and applications Bayesian model selection: methodology, computation and applications David Nott Department of Statistics and Applied Probability National University of Singapore Statistical Genomics Summer School Program

More information

Bayesian Methods in Multilevel Regression

Bayesian Methods in Multilevel Regression Bayesian Methods in Multilevel Regression Joop Hox MuLOG, 15 september 2000 mcmc What is Statistics?! Statistics is about uncertainty To err is human, to forgive divine, but to include errors in your design

More information

FREQUENTIST BEHAVIOR OF FORMAL BAYESIAN INFERENCE

FREQUENTIST BEHAVIOR OF FORMAL BAYESIAN INFERENCE FREQUENTIST BEHAVIOR OF FORMAL BAYESIAN INFERENCE Donald A. Pierce Oregon State Univ (Emeritus), RERF Hiroshima (Retired), Oregon Health Sciences Univ (Adjunct) Ruggero Bellio Univ of Udine For Perugia

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department

More information

Bayesian Networks in Educational Assessment

Bayesian Networks in Educational Assessment Bayesian Networks in Educational Assessment Estimating Parameters with MCMC Bayesian Inference: Expanding Our Context Roy Levy Arizona State University Roy.Levy@asu.edu 2017 Roy Levy MCMC 1 MCMC 2 Posterior

More information

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling 10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel

More information

Introduction to Bayesian inference

Introduction to Bayesian inference Introduction to Bayesian inference Thomas Alexander Brouwer University of Cambridge tab43@cam.ac.uk 17 November 2015 Probabilistic models Describe how data was generated using probability distributions

More information

Machine Learning CSE546 Carlos Guestrin University of Washington. September 30, What about continuous variables?

Machine Learning CSE546 Carlos Guestrin University of Washington. September 30, What about continuous variables? Linear Regression Machine Learning CSE546 Carlos Guestrin University of Washington September 30, 2014 1 What about continuous variables? n Billionaire says: If I am measuring a continuous variable, what

More information

Machine Learning CSE546 Carlos Guestrin University of Washington. September 30, 2013

Machine Learning CSE546 Carlos Guestrin University of Washington. September 30, 2013 Bayesian Methods Machine Learning CSE546 Carlos Guestrin University of Washington September 30, 2013 1 What about prior n Billionaire says: Wait, I know that the thumbtack is close to 50-50. What can you

More information

David Giles Bayesian Econometrics

David Giles Bayesian Econometrics David Giles Bayesian Econometrics 5. Bayesian Computation Historically, the computational "cost" of Bayesian methods greatly limited their application. For instance, by Bayes' Theorem: p(θ y) = p(θ)p(y

More information

Lecture 7 and 8: Markov Chain Monte Carlo

Lecture 7 and 8: Markov Chain Monte Carlo Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani

More information

Computational statistics

Computational statistics Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated

More information

Gaussian Process Approximations of Stochastic Differential Equations

Gaussian Process Approximations of Stochastic Differential Equations Gaussian Process Approximations of Stochastic Differential Equations Cédric Archambeau Centre for Computational Statistics and Machine Learning University College London c.archambeau@cs.ucl.ac.uk CSML

More information

STA Module 10 Comparing Two Proportions

STA Module 10 Comparing Two Proportions STA 2023 Module 10 Comparing Two Proportions Learning Objectives Upon completing this module, you should be able to: 1. Perform large-sample inferences (hypothesis test and confidence intervals) to compare

More information

Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method

Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method Madeleine B. Thompson Radford M. Neal Abstract The shrinking rank method is a variation of slice sampling that is efficient at

More information

Monte Carlo integration

Monte Carlo integration Monte Carlo integration Eample of a Monte Carlo sampler in D: imagine a circle radius L/ within a square of LL. If points are randoml generated over the square, what s the probabilit to hit within circle?

More information