Computer Emulation With Density Estimation


1 Computer Emulation With Density Estimation
Jake Coleman, Robert Wolpert
May 8, 2017

2 Computer Emulation Motivation: Expensive Experiments

3-5 Motivation & Literature Review: Physics Data [G. Aad et al., 2010]
Outputs are frequency histograms rather than just multivariate vectors with unknown correlation structure.
We want to predict the underlying density given physics input parameters, which suggests Bayesian density estimation and regression.

6-7 Motivation & Literature Review: Density Estimation Literature
We aim to measure smoothness of the density with a Gaussian process. Some prior work in this area:
Logistic GP prior ([Lenk, 1991], [Lenk, 2003], [Tokdar, 2007], [Tokdar et al., 2010], [Riihimäki and Vehtari, 2010])
Latent factor models ([Kundu and Dunson, 2014])
Exact sampling of a transformed GP ([Adams et al., 2009])
Complication: we don't have access to draws, only counts within bins.

8 Single-Histogram Model: Likelihood
Let $Y_j$ be the count in bin $j$, which has edges $[\alpha_{j-1}, \alpha_j)$. The counts are marginally Poisson and jointly multinomial (conditional on the total):

$p(\vec{Y}) \propto \prod_{j=1}^{J} p_j^{Y_j}, \qquad p_j \equiv \int_{\alpha_{j-1}}^{\alpha_j} f(t)\,dt$

We aim to model the unknown density $f(t)$ nonparametrically with a smooth, continuous function over $[0, 1]$.
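
To make the likelihood concrete, here is a minimal sketch (not from the talk): it computes the bin probabilities $p_j$ by numerical integration and evaluates the multinomial log-likelihood. The toy density, bin edges, and counts are placeholders.

```python
# Minimal sketch (not from the talk): bin probabilities and the
# multinomial likelihood for a single histogram.  The density f and
# the bin edges alpha are illustrative placeholders.
import numpy as np
from scipy.integrate import quad
from scipy.stats import multinomial

def bin_probs(f, alpha):
    """p_j = integral of f over [alpha_{j-1}, alpha_j)."""
    return np.array([quad(f, a, b)[0] for a, b in zip(alpha[:-1], alpha[1:])])

f = lambda t: 2 * t           # a toy density on [0, 1]
alpha = np.linspace(0, 1, 7)  # 6 equal bins
p = bin_probs(f, alpha)

y = np.array([2, 5, 9, 14, 17, 23])             # placeholder bin counts
loglik = multinomial.logpmf(y, n=y.sum(), p=p)  # conditional on the total
```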

9-11 Single-Histogram Model: Main Idea

$f(t) = \sum_{n=1}^{\infty} \sqrt{2}\,[a_n Z_n \cos(2\pi n t) + b_n W_n \sin(2\pi n t)] + 1$

where $\sum_n a_n^2 + b_n^2 < \infty$ and $\{Z_n\}, \{W_n\} \overset{iid}{\sim} N(0, 1)$. Then $f$ is a GP with covariance function $c(t, t') = \sum_n 2 a_n^2 \cos(2\pi n [t - t'])$ if $a_n = b_n$.

$\int_0^1 f(t)\,dt = 1$, and $\int_{\alpha_{j-1}}^{\alpha_j} f(t)\,dt$ can be easily found and pre-computed.

Downside: $f$ is not positive a.s.
Hope: $P(f(t) < 0)$ is very small in the region of interest. Positive quantities are often modeled with normal RVs when they are far enough from zero (heights, rainfall, etc.).
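
A minimal sketch of drawing from this prior, with one loud assumption: the slides parameterize the coefficients by $(c, r)$, which suggests geometric decay $a_n = c\,r^n$, but the exact form is not stated, so the decay below is hypothetical.

```python
# Minimal sketch (assumption flagged): draw a random density from the
# truncated Fourier-series prior.  The decay a_n = c * r**n is a guess
# suggested by the slides' (c, r) parameters, not a confirmed choice.
import numpy as np

def sample_f(t, N=50, c=0.3, r=0.7, rng=np.random.default_rng(0)):
    n = np.arange(1, N + 1)
    a = c * r**n                      # hypothetical coefficient decay
    Z = rng.standard_normal(N)        # {Z_n} iid N(0, 1)
    W = rng.standard_normal(N)        # {W_n} iid N(0, 1)
    cos = np.cos(2 * np.pi * np.outer(t, n))
    sin = np.sin(2 * np.pi * np.outer(t, n))
    # f(t) = sum_n sqrt(2)[a_n Z_n cos + a_n W_n sin] + 1, so the
    # random function integrates to 1 over [0, 1] by construction.
    return np.sqrt(2) * (cos @ (a * Z) + sin @ (a * W)) + 1.0

t = np.linspace(0, 1, 200)
f = sample_f(t)                       # one prior draw; may dip below 0
```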

12-13 Single-Histogram Model: Toy Example
We let c = 0.3 and r = 0.7, while looking to estimate 10,000 draws from a Beta(3, 7) distribution in 6 evenly-spaced bins in [0, 0.6].

[Figure: GP density estimate (Bins = 6, N_x = 5), showing the posterior mean, posterior 95% credible interval, and the truth.]

Decent enough!
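
For reference, a minimal sketch that generates the toy data as described; treating draws outside [0, 0.6] as simply uncounted is our reading of the setup, not something the slides spell out.

```python
# Minimal sketch: the toy data from the slides -- 10,000 Beta(3, 7)
# draws binned into 6 evenly-spaced bins on [0, 0.6].  np.histogram
# drops values outside the edges, which is one reading of the setup.
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(1)
draws = beta(3, 7).rvs(10_000, random_state=rng)
edges = np.linspace(0.0, 0.6, 7)     # 6 equal bins
counts, _ = np.histogram(draws, bins=edges)

# True bin probabilities, for comparison with the fitted density.
true_p = np.diff(beta(3, 7).cdf(edges))
```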

14 Multiple-Histogram Model: Extending the Model
Now we assume that we have an input $d$ upon which we condition our estimate:

$f(t \mid d) = \sum_{n=1}^{N} \sqrt{2}\,a_n [Z_n(d) \cos(2\pi n t / T) + W_n(d) \sin(2\pi n t / T)] + \gamma / T$

where $\{Z_n(\cdot)\}, \{W_n(\cdot)\} \sim GP(0, c_M(\cdot, \cdot))$.
Thus, each component of the Karhunen-Loève representation is itself a GP.
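
A minimal sketch of this hierarchical step: each coefficient process $Z_n(d)$, $W_n(d)$ gets a GP prior over the input. The squared-exponential kernel, length-scale, and jitter below are placeholder choices, not the talk's $c_M$.

```python
# Minimal sketch: each Fourier coefficient process Z_n(d) is a GP over
# the input d.  Kernel and hyperparameters are placeholders.
import numpy as np

def sq_exp(d1, d2, ell=1.0):
    return np.exp(-0.5 * (d1[:, None] - d2[None, :])**2 / ell**2)

rng = np.random.default_rng(2)
d = np.linspace(0, 1, 8)                  # training inputs
K = sq_exp(d, d) + 1e-8 * np.eye(len(d))  # jitter for numerical stability
L = np.linalg.cholesky(K)

N = 20                                    # number of Fourier components
Z = L @ rng.standard_normal((len(d), N))  # Z[:, n] = one draw of Z_n(d)
W = L @ rng.standard_normal((len(d), N))  # W_n(d) likewise
```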

15-16 Multiple-Histogram Model: Initial Results
I chose c = 1, r = 0.5, and a squared-exponential kernel.

[Figures: predicted bin probabilities against the truth, and the GP density estimate showing the posterior mean, 95% interval, and the truth.]

Bin probability prediction is good; density prediction less so.

17-19 Multiple-Histogram Model: Strawman
The naïve emulation strategy treats the histogram counts as multivariate normal and rotates them via PCA in order to apply independent GPs.
It adjusts for within-histogram correlation through PCA.
It does no density estimation.

[Figure: strawman predicted bin probabilities against the truth.]
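
A minimal sketch of this strawman under stated assumptions (fake counts, placeholder kernel and hyperparameters): PCA-rotate the training histograms, emulate each principal-component score with an independent GP posterior mean, and rotate back to bin space.

```python
# Minimal sketch of the strawman: treat histogram counts as multivariate
# normal, rotate with PCA, fit an independent GP to each score across
# the input d, then rotate predictions back.
import numpy as np

def sq_exp(d1, d2, ell=0.3):
    return np.exp(-0.5 * (d1[:, None] - d2[None, :])**2 / ell**2)

rng = np.random.default_rng(3)
d = np.linspace(0, 1, 10)                        # training inputs
Y = rng.poisson(50, size=(10, 6)).astype(float)  # fake 6-bin histograms

mu = Y.mean(0)
U, S, Vt = np.linalg.svd(Y - mu, full_matrices=False)
scores = U * S                                   # one PCA score per column

d_star = np.array([0.55])                        # new input to emulate
K = sq_exp(d, d) + 1e-6 * np.eye(len(d))
k_star = sq_exp(d_star, d)

# Independent GP posterior mean for each score column, then rotate back.
pred_scores = k_star @ np.linalg.solve(K, scores)
Y_star = pred_scores @ Vt + mu                   # emulated bin counts
```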

20-21 Thoughts and Future Directions
Improvement over the strawman will have to come in full density estimation.
Increasing N (with higher r) could provide more flexibility to avoid strange tail behavior.
A different (or learned) $a_n$ could lead to other processes.

Future directions:
Incorporate calibration.
Improve density estimation.
Show some form of posterior consistency as the counts and the number of bins go to infinity.

22 Thank you! Questions?

23-24 Works Cited

Adams, R., Murray, I., and MacKay, D. (2009). Nonparametric Bayesian density modeling with Gaussian processes.
G. Aad et al. (2010). Observation of a centrality-dependent dijet asymmetry in lead-lead collisions at $\sqrt{s_{NN}} = 2.76$ TeV with the ATLAS detector at the LHC. Physical Review Letters, 105(25).
Higdon, D., Gattiker, J., Williams, B., and Rightley, M. (2008). Computer model calibration using high dimensional output. Journal of the American Statistical Association, 103(482).
Higdon, D., Kennedy, M., Cavendish, J. C., Cafeo, J. A., and Ryne, R. D. (2004). Combining field data and computer simulations for calibration and prediction. SIAM Journal on Scientific Computing, 26(2).
Kundu, S. and Dunson, D. B. (2014). Latent factor models for density estimation. Biometrika, 101(3).
Lenk, P. (1991). Towards a practicable Bayesian nonparametric density estimator. Biometrika, 78(3).
Lenk, P. (2003). Bayesian semiparametric density estimation and model verification using a logistic-Gaussian process. Journal of Computational and Graphical Statistics, 12(3).
Riihimäki, J. and Vehtari, A. (2010). Laplace approximation for logistic Gaussian process density estimation and regression. Bayesian Analysis, 9(2).
Tokdar, S. T. (2007). Towards a faster implementation of density estimation with logistic Gaussian process priors. Journal of Computational and Graphical Statistics, 16(3).
Tokdar, S. T., Zhu, Y. M., and Ghosh, J. K. (2010). Bayesian density regression with logistic Gaussian process and subspace projection. Bayesian Analysis, 5(2).

25-30 Appendix: Computer Emulation Covariance Functions
The covariance function $c(\cdot, \cdot)$ is often of the form $c(x, x') = \lambda^{-1} r(x - x' \mid \theta)$. Examples of $r(\cdot \mid \theta)$:

Power exponential: $r(h \mid \alpha, \ell) = e^{-|h/\ell|^{\alpha}}$, where $\alpha \in (0, 2]$.
Usually we learn $\ell$ and fix $\alpha$. Setting $\alpha = 2$ makes the function infinitely differentiable, which may be undesirable; sometimes $\alpha = 1.9$ is used for computational stability.

Matérn: $r(h \mid \nu, \ell) = \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{|h|}{\ell}\right)^{\nu} K_{\nu}\!\left(\frac{|h|}{\ell}\right)$, where $K_{\nu}$ is the modified Bessel function of the second kind.
For half-integer $\nu$, this has a closed form. Most common are $\nu = 3/2$ and $\nu = 5/2$:
$\nu = 3/2$: $r(h \mid \ell) = e^{-|h|/\ell}\left(1 + \frac{|h|}{\ell}\right)$
$\nu = 5/2$: $r(h \mid \ell) = e^{-|h|/\ell}\left(1 + \frac{|h|}{\ell} + \frac{h^2}{3\ell^2}\right)$

Usually we assume a separable covariance function. That is, if $x$ has $J$ dimensions, then $r(x - x' \mid \theta) = \prod_{j=1}^{J} r_j(x_j - x_j' \mid \theta)$.
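
A minimal sketch implementing these kernels as written above (note this parameterization omits the $\sqrt{2\nu}$ scaling some references fold into $h/\ell$):

```python
# Minimal sketch: the appendix kernels.  h is the distance x - x';
# ell and alpha/nu are the hyperparameters from the slides.
import numpy as np
from scipy.special import gamma, kv  # kv = modified Bessel, 2nd kind

def power_exponential(h, ell=1.0, alpha=1.9):
    return np.exp(-np.abs(h / ell)**alpha)

def matern(h, ell=1.0, nu=1.5):
    s = np.abs(h) / ell
    s = np.where(s == 0, 1e-12, s)   # K_nu diverges at 0; nudge instead
    return (2**(1 - nu) / gamma(nu)) * s**nu * kv(nu, s)

def matern_32(h, ell=1.0):
    s = np.abs(h) / ell
    return np.exp(-s) * (1 + s)      # closed form matching the slides

def matern_52(h, ell=1.0):
    s = np.abs(h) / ell
    return np.exp(-s) * (1 + s + s**2 / 3)
```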

31-33 Appendix: Density Estimation Predictive Posterior
The change to a GP prior on the components allows us to predict bin probabilities given a new input $d^*$. Let $Y(\vec{d})$ and $Y^*(d^*)$ be the histogram counts for in-sample and out-of-sample inputs, respectively (similarly for $X$ and $P$).
We want $[Y^*(d^*) \mid d^*, Y(\vec{d})]$. Note $P$ is a linear transformation of $X$.

$[Y^*(d^*) \mid d^*, Y(\vec{d})] = \int_{\mathcal{X}} [Y^*(d^*) \mid X^*(d^*), d^*, Y(\vec{d})]\,[X^*(d^*) \mid d^*, Y(\vec{d})]\,dX$

$[X^*(d^*) \mid d^*, Y(\vec{d})] = \int_{\Theta} \int_{\mathcal{X}} [X^*(d^*) \mid d^*, Y(\vec{d}), X(\vec{d}), \theta]\,[X(\vec{d}), \theta \mid d^*, Y(\vec{d})]\,dX\,d\theta$

We evaluate these by Monte Carlo integration. Note that $[X^*(d^*) \mid d^*, Y(\vec{d}), X(\vec{d}), \theta]$ is simply a conditional normal, from the GP.
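
A minimal sketch of the inner conditional-normal step, under placeholder choices (squared-exponential kernel, a single stand-in posterior draw of $X(\vec{d})$); looping this over posterior samples of $(X, \theta)$ is the Monte Carlo integration.

```python
# Minimal sketch: the "conditional normal from the GP" step.  Given
# latent coefficients X(d) at training inputs and a kernel, X*(d*) has
# the usual Gaussian conditional; averaging draws over posterior
# samples of (X, theta) performs the Monte Carlo integration.
import numpy as np

def sq_exp(d1, d2, ell=0.5):
    return np.exp(-0.5 * (d1[:, None] - d2[None, :])**2 / ell**2)

def conditional_normal(d, X, d_star, ell):
    """Mean and covariance of X*(d*) | X(d), theta under a GP(0, c_M)."""
    K = sq_exp(d, d, ell) + 1e-8 * np.eye(len(d))
    k = sq_exp(d_star, d, ell)
    mean = k @ np.linalg.solve(K, X)
    cov = sq_exp(d_star, d_star, ell) - k @ np.linalg.solve(K, k.T)
    return mean, cov

rng = np.random.default_rng(4)
d = np.linspace(0, 1, 8)
X = rng.standard_normal(8)            # stand-in posterior draw of X(d)
m, v = conditional_normal(d, X, np.array([0.42]), ell=0.5)
x_star = m + np.sqrt(np.maximum(np.diag(v), 0)) * rng.standard_normal(1)
```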

34 Appendix: Emulation Predictions

[Figure: two panels, "Data Across Input" and "Emulated Values Across Input". The left plot depicts the bin probability data points, denoting the holdout set, while the right plot depicts emulator predictions for the out-of-sample histogram.]
