Bayesian Quadrature: Model-based Approximate Integration
David Duvenaud, University of Cambridge
The Quadrature Problem

- We want to estimate an integral
  Z = ∫ f(x) p(x) dx
- Most computational problems in inference correspond to integrals:
  - Expectations
  - Marginal distributions
  - Integrating out nuisance parameters
  - Normalization constants
  - Model comparison

[Figure: the function f(x) and the input density p(x)]
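As a concrete instance of the problem, the sketch below computes such a Z by brute-force adaptive quadrature. The integrand f and density p are my own toy choices, not examples from the talk; later slides ask how to approximate this from only a few evaluations of f.

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

# Hypothetical integrand and input density (not from the talk):
# Z = E_p[f(x)] with f(x) = exp(-x^2) * sin(3x)^2 and p(x) = N(0, 1).
f = lambda x: np.exp(-x**2) * np.sin(3 * x)**2
p = lambda x: norm.pdf(x)

# Brute-force reference answer via adaptive quadrature.
Z, abs_err = integrate.quad(lambda x: f(x) * p(x), -np.inf, np.inf)
print(f"Z = {Z:.6f} (quadrature error bound {abs_err:.1e})")
```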
Sampling Methods

- Monte Carlo methods: sample x_i from p(x), then take the empirical mean:
  Ẑ = (1/N) Σ_{i=1}^N f(x_i)
- This is possibly sub-optimal for two reasons:
  - Samples bunch up at random.
  - Often, nearby function values will be similar.
- Model-based and quasi-Monte Carlo methods spread out samples to achieve faster convergence.

[Figure: f(x), the input density p(x), and the samples]
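A minimal Monte Carlo sketch of the estimator above, reusing the same toy f and taking p(x) = N(0, 1); the O(1/√N) standard error is what model-based methods try to beat.

```python
import numpy as np

rng = np.random.default_rng(0)

f = lambda x: np.exp(-x**2) * np.sin(3 * x)**2  # same hypothetical integrand

# Sample x_i ~ p(x) = N(0, 1), then take the empirical mean of f(x_i).
N = 10_000
xs = rng.standard_normal(N)
Z_hat = np.mean(f(xs))

# The Monte Carlo standard error shrinks at the O(1/sqrt(N)) rate.
std_err = np.std(f(xs), ddof=1) / np.sqrt(N)
print(f"Z_hat = {Z_hat:.6f} +/- {std_err:.6f}")
```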
Model-based Integration

- Place a prior on f, for example a GP.
- The posterior over f then implies a posterior over Z.
- We'll call quadrature using a GP prior "Bayesian Quadrature".

[Figure: f(x), p(x), the GP mean ± 2 SD, samples, and the implied posterior p(Z)]
Bayesian Quadrature Estimator

- The posterior over Z has a mean that is linear in the observed function values f(x_s):
  E_GP[Z | f(x_s)] = z^T K^{-1} f(x_s) = Σ_{n=1}^N [z^T K^{-1}]_n f(x_n),
  where z_n = ∫ k(x, x_n) p(x) dx.
- It is natural to choose samples to minimize the posterior variance of Z:
  V[Z | f(x_s)] = ∫∫ k(x, x') p(x) p(x') dx dx' − z^T K^{-1} z
- This variance doesn't depend on the function values at all!
- Choosing samples sequentially to minimize the variance gives Sequential Bayesian Quadrature (SBQ).
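For a squared-exponential kernel and a Gaussian p(x), the kernel integrals above have well-known closed forms. The one-dimensional sketch below is my own toy implementation of those expressions, not code from the talk; the sample locations are hypothetical.

```python
import numpy as np

def bq_posterior(x_s, f_s, ell=0.5, mu=0.0, sigma=1.0, jitter=1e-10):
    """Posterior mean and variance of Z = int f(x) p(x) dx under a GP prior,
    with k(x, x') = exp(-(x - x')^2 / (2 ell^2)) and p(x) = N(mu, sigma^2)."""
    X = np.asarray(x_s, dtype=float)
    K = np.exp(-0.5 * (X[:, None] - X[None, :])**2 / ell**2)
    K += jitter * np.eye(len(X))
    # z_n = int k(x, x_n) p(x) dx  (Gaussian convolution, closed form)
    s2 = ell**2 + sigma**2
    z = np.sqrt(ell**2 / s2) * np.exp(-0.5 * (X - mu)**2 / s2)
    # int int k(x, x') p(x) p(x') dx dx'  (closed form)
    kk = np.sqrt(ell**2 / (ell**2 + 2 * sigma**2))
    Kinv_f = np.linalg.solve(K, np.asarray(f_s, dtype=float))
    mean = z @ Kinv_f                      # E[Z | f(x_s)] = z^T K^{-1} f(x_s)
    var = kk - z @ np.linalg.solve(K, z)   # V[Z | f(x_s)]
    return mean, var

# Hypothetical usage with a few evaluations of the toy integrand:
f = lambda x: np.exp(-x**2) * np.sin(3 * x)**2
x_s = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
mean, var = bq_posterior(x_s, f(x_s))
print(f"E[Z] = {mean:.4f}, SD = {np.sqrt(var):.4f}")
```

Note that `var` depends only on the sample locations x_s, never on f(x_s), exactly as the slide observes; this is what makes active sample selection possible.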
Things you can do with Bayesian Quadrature

- Incorporate knowledge of the function, such as symmetries (a sketch follows below): if f(x, y) = f(y, x), use the symmetrized kernel
  k_s(x, y, x', y') = k(x, y, x', y') + k(x, y, y', x') + k(y, x, x', y') + k(y, x, y', x')
- Condition on gradient observations.
- The posterior variance is a natural convergence diagnostic.

[Figure: the posterior over Z versus the number of samples]
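A minimal sketch of the symmetrized kernel above for f(x, y) = f(y, x), built on a squared-exponential base kernel; this is a toy construction of mine, not code from the talk.

```python
import numpy as np

def k_base(a, b, ell=1.0):
    """Squared-exponential kernel on (x, y) pairs."""
    return np.exp(-0.5 * np.sum((np.asarray(a) - np.asarray(b))**2) / ell**2)

def k_sym(xy, xy2, ell=1.0):
    """Symmetrized kernel: any GP with this covariance has
    f(x, y) = f(y, x) for every sample path."""
    x, y = xy
    x2, y2 = xy2
    return (k_base((x, y), (x2, y2), ell) + k_base((x, y), (y2, x2), ell)
            + k_base((y, x), (x2, y2), ell) + k_base((y, x), (y2, x2), ell))

# Swapping the arguments of either input leaves the covariance unchanged:
print(k_sym((0.3, 1.2), (0.7, -0.5)))
print(k_sym((1.2, 0.3), (0.7, -0.5)))  # same value
```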
More things you can do with Bayesian Quadrature

- Compute the likelihood of the GP, and learn the kernel.
- Compute marginals with error bars, in two ways:
  - simply from the GP posterior, or
  - by recomputing f_θ(x) for different θ with the same Z.
- Much nicer than histograms!

[Figure: the true σ versus the marginal likelihood over σ]
Rates of Convergence

What is the rate of convergence of SBQ when its assumptions are true?

[Figure: expected variance / MMD versus the number of samples, comparing i.i.d. sampling, herding with 1/N weights, herding with BQ weights, and SBQ with BQ weights against an O(1/N) reference; panels show empirical rates in the RKHS, empirical rates out of the RKHS, and a bound on the Bayesian error]
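Because the posterior variance is independent of the observed function values, the next sample can be chosen before evaluating f there. Below is a greedy grid-search sketch of SBQ under the same squared-exponential kernel and Gaussian p(x) assumptions as the earlier block; it is my own illustration, not the talk's implementation.

```python
import numpy as np

def bq_variance(X, ell=0.5, sigma=1.0, jitter=1e-10):
    """Posterior variance of Z for sample locations X, SE kernel, p = N(0, sigma^2)."""
    X = np.asarray(X, dtype=float)
    K = np.exp(-0.5 * (X[:, None] - X[None, :])**2 / ell**2) + jitter * np.eye(len(X))
    s2 = ell**2 + sigma**2
    z = np.sqrt(ell**2 / s2) * np.exp(-0.5 * X**2 / s2)
    kk = np.sqrt(ell**2 / (ell**2 + 2 * sigma**2))
    return kk - z @ np.linalg.solve(K, z)

# Greedy SBQ: repeatedly add the candidate point that most reduces the variance.
candidates = np.linspace(-4, 4, 201)
X = [0.0]  # hypothetical starting point
for _ in range(9):
    best = min(candidates, key=lambda c: bq_variance(X + [c]))
    X.append(best)
    print(f"n = {len(X):2d}  next x = {best:+.2f}  var = {bq_variance(X):.2e}")
```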
GPs vs Log-GPs for Inference

[Figure: top, the true likelihood l(x) with evaluations, the GP posterior, and the Log-GP posterior; bottom, the true log-likelihood log l(x) with evaluations and the GP posterior]
GPs vs Log-GPs

[Figure: the same comparison on a likelihood with a large dynamic range: the true l(x) with evaluations, the GP posterior, and the Log-GP posterior (top); the true log l(x) with evaluations and the GP posterior (bottom)]
Integrating under Log-GPs

[Figure: the true l(x) with evaluations, the GP posterior, the Log-GP posterior, and the mean of the GP approximation to the Log-GP with inducing points (top); the true log l(x) with evaluations, the GP posterior, and the inducing points (bottom)]
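A likelihood l(x) is non-negative, so placing the GP on log l(x) respects that constraint, but Z = ∫ exp(g(x)) p(x) dx is then no longer a Gaussian integral; the slides approximate the log-GP with a second GP through inducing points. As a much cruder stand-in, the sketch below (my own, with a hypothetical likelihood) fits a GP to log l(x) and integrates the exponentiated posterior mean on a dense grid.

```python
import numpy as np
from scipy.stats import norm

def gp_post_mean(Xq, X, y, ell=0.8, jitter=1e-8):
    """Posterior mean of a zero-mean GP with SE kernel, evaluated at Xq."""
    d = lambda A, B: (A[:, None] - B[None, :])**2
    K = np.exp(-0.5 * d(X, X) / ell**2) + jitter * np.eye(len(X))
    Kq = np.exp(-0.5 * d(Xq, X) / ell**2)
    return Kq @ np.linalg.solve(K, y)

# Hypothetical positive likelihood with a large dynamic range, as in the figures.
l = lambda x: 200.0 * np.exp(-2.0 * (x - 0.5)**2) * norm.cdf(3 * x)

X = np.linspace(-2.0, 3.0, 8)          # a handful of evaluations
log_l = np.log(l(X))
mu0 = log_l.mean()                     # center log-values for the zero-mean GP

grid = np.linspace(-4.0, 5.0, 2001)
l_approx = np.exp(gp_post_mean(grid, X, log_l - mu0) + mu0)
Z = np.sum(l_approx * norm.pdf(grid)) * (grid[1] - grid[0])  # grid sum of l(x) p(x)
print(f"Z (exp-of-posterior-mean approximation) = {Z:.4f}")
```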
Conclusions

- Model-based integration allows active learning about integrals, can require fewer samples than MCMC, and allows us to check our assumptions.
- BQ has nice convergence properties if its assumptions are correct.
- For inference, a GP is not an especially appropriate model, but other models are intractable.
Limitations and Future Directions

- Right now, BQ really only works in low dimensions (D < 10) when the function is fairly smooth, and it is only worth using when computing f(x) is expensive.
- How do we extend it to high dimensions? Gradient observations are helpful, but a D-dimensional gradient is D separate observations.
- It seems unlikely that we'll find another tractable nonparametric distribution like GPs. Should we accept that we'll need a second round of approximate integration on a surrogate model?
- How much overhead is worthwhile? Work on bounded rationality seems relevant.

Thanks!
More information