Computer Emulation With Density Estimation

Jake Coleman and Robert Wolpert
May 8, 2017

Computer Emulation: Motivation

Expensive experiments. [Image: jet quenching observed in CMS heavy-ion collisions; http://cms.web.cern.ch/news/jet-quenching-observed-cms-heavy-ion-collisions]

Motivation & Literature Review: Physics Data [G. Aad et al., 2010]

- Outputs are frequency histograms rather than just multivariate vectors with unknown correlation structure.
- We want to predict the underlying density given physics input parameters, which suggests Bayesian density estimation and regression.

Motivation & Literature Review: Density Estimation Literature

We aim to capture the smoothness of the density with a Gaussian process. Some prior work in this area:
- Logistic GP prior ([Lenk, 1991], [Lenk, 2003], [Tokdar, 2007], [Tokdar et al., 2010], [Riihimäki and Vehtari, 2010])
- Latent factor models ([Kundu and Dunson, 2014])
- Exact sampling of a transformed GP ([Adams et al., 2009])

Complication: we don't have access to draws, only counts within bins.

Single-Histogram Model: Likelihood

Let $Y_j$ be the count in bin $j$, which has edges $[\alpha_{j-1}, \alpha_j)$. The counts are marginally Poisson and jointly multinomial (conditional on the total):
$$p(\vec{Y}) \propto \prod_{j=1}^{J} p_j^{Y_j}, \qquad p_j \propto \int_{\alpha_{j-1}}^{\alpha_j} f(t)\,dt.$$
We aim to model the unknown density $f(t)$ nonparametrically with a smooth, continuous function over $[0, 1]$.
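As a concrete illustration, here is a minimal sketch of this likelihood in Python, assuming a Beta(3, 7) stand-in for the unknown $f$ (the toy target used later in the talk); the bin edges and sample size mirror the toy example, and the code is illustrative, not the authors' implementation.

```python
# Sketch: bin probabilities p_j as integrals of a density over bin edges,
# with the counts modeled as multinomial given the total.
import numpy as np
from scipy.integrate import quad
from scipy.stats import beta, multinomial

rng = np.random.default_rng(0)
edges = np.linspace(0.0, 0.6, 7)        # six evenly spaced bins on [0, 0.6]
f = beta(3, 7).pdf                      # stand-in for the unknown density f(t)

# p_j proportional to the integral of f over [alpha_{j-1}, alpha_j); renormalize
p = np.array([quad(f, lo, hi)[0] for lo, hi in zip(edges[:-1], edges[1:])])
p /= p.sum()

counts = rng.multinomial(10_000, p)     # simulated histogram counts
loglik = multinomial.logpmf(counts, n=counts.sum(), p=p)
print(counts, loglik)
```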

Single-Histogram Model: Main Idea

$$f(t) = \sum_{n=1}^{\infty} \sqrt{2}\,\bigl[a_n Z_n \cos(2\pi n t) + b_n W_n \sin(2\pi n t)\bigr] + 1,$$
where $\sum_n a_n^2 + b_n^2 < \infty$ and $\{Z_n\}, \{W_n\} \stackrel{iid}{\sim} N(0, 1)$. Then $f$ is a GP with covariance function $c(t, t') = \sum_n 2 a_n^2 \cos(2\pi n [t - t'])$ if $a_n = b_n$.

- $\int_0^1 f(t)\,dt = 1$.
- $\int_{\alpha_{j-1}}^{\alpha_j} f(t)\,dt$ can be easily found and pre-computed.

Downside: $f$ is not positive a.s. The hope is that $P(f(t) < 0)$ is very small in the region of interest; positive quantities are often modeled with normal RVs when they are far enough from zero (heights, rainfall, etc.).
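To make the construction concrete, here is a minimal sketch of one draw from the truncated series, assuming the decay $a_n = b_n = c\,r^n$ (one plausible reading of the $c$ and $r$ appearing in the toy example; the exact form is not stated on the slide):

```python
# Sketch: one draw of f(t) from the truncated Fourier series with iid
# N(0, 1) coefficients. The draw integrates to ~1 on [0, 1] (each sine and
# cosine term integrates to zero) but is not guaranteed to be positive.
import numpy as np

rng = np.random.default_rng(0)
N, c, r = 50, 0.3, 0.7
n = np.arange(1, N + 1)
a = c * r ** n                          # assumed decay a_n = b_n = c * r**n
Z, W = rng.standard_normal(N), rng.standard_normal(N)

t = np.linspace(0.0, 1.0, 2001)
f = np.sqrt(2) * ((a * Z) @ np.cos(2 * np.pi * np.outer(n, t))
                  + (a * W) @ np.sin(2 * np.pi * np.outer(n, t))) + 1.0

print(f.mean())   # Riemann approximation of the integral over [0, 1]: ~1
print(f.min())    # can dip below zero: the process is not positive a.s.
```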

Single-Histogram Model: Toy Example

We let $c = 0.3$ and $r = 0.7$, and estimate the density of 10,000 draws from a Beta(3, 7) distribution binned into 6 evenly spaced bins on $[0, 0.6]$.

[Figure: GP density estimate with 6 bins, showing the posterior mean and 95% credible band against the Beta(3, 7) truth.]

Decent enough!

Multiple-Histogram Model: Extending the Model

Now assume we have an input $d$ upon which we condition our estimate:
$$f(t \mid d) = \sum_{n=1}^{N} \sqrt{2}\,a_n \bigl[Z_n(d) \cos(2\pi n t / T) + W_n(d) \sin(2\pi n t / T)\bigr] + \gamma / T,$$
where $\{Z_n(\cdot)\}, \{W_n(\cdot)\} \sim GP(0, c_M(\cdot, \cdot))$. Thus each component of the Karhunen-Loève representation is itself a GP.
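A minimal sketch of this extension, under assumptions the slide does not fix: scalar inputs $d$, a squared-exponential $c_M$, $T = 1$, $\gamma = 1$, and the same assumed decay $a_n = c\,r^n$.

```python
# Sketch: each coefficient process Z_n(.), W_n(.) is a zero-mean GP over the
# input d, so nearby inputs share similar Fourier coefficients and hence
# similar densities f(t | d).
import numpy as np

rng = np.random.default_rng(1)

def sq_exp(a, b, ell=0.25):
    """Assumed squared-exponential kernel c_M on scalar inputs."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

d = np.linspace(0.0, 1.0, 20)                    # design inputs
L = np.linalg.cholesky(sq_exp(d, d) + 1e-8 * np.eye(len(d)))

N, c, r = 20, 1.0, 0.5
n = np.arange(1, N + 1)
a = c * r ** n                                   # assumed decay a_n = c * r**n

# one correlated-in-d draw per coefficient: Z[k, i] = Z_{n_k}(d_i)
Z = (L @ rng.standard_normal((len(d), N))).T
W = (L @ rng.standard_normal((len(d), N))).T

t = np.linspace(0.0, 1.0, 501)
cos_b = np.cos(2 * np.pi * np.outer(n, t))       # T = 1 assumed
sin_b = np.sin(2 * np.pi * np.outer(n, t))
F = np.sqrt(2) * ((a[:, None] * Z).T @ cos_b + (a[:, None] * W).T @ sin_b) + 1.0
# F[i, :] is the density f(t | d_i) on the grid t     (gamma = 1 assumed)
```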

Multiple-Histogram Model: Initial Results

I chose $c = 1$, $r = 0.5$, and a squared-exponential kernel.

[Figure: predicted bin probabilities versus truth (left) and the GP density estimate with posterior mean and 95% interval versus truth (right).]

Bin probability prediction is good; density prediction less so.

Multiple-Histogram Model: Strawman

The naïve emulation strategy treats the histogram counts as multivariate normal and rotates them via PCA so that independent GPs can be applied (see the sketch below).
- Adjusts for within-histogram correlation through PCA.
- No density estimation.

[Figure: strawman predicted bin probabilities versus truth.]
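For concreteness, here is a minimal sketch of the strawman under assumptions of my own: scalar inputs, Dirichlet-generated stand-in histograms, three principal components, and scikit-learn's GaussianProcessRegressor with an RBF kernel. The slide does not pin down any of these choices.

```python
# Sketch of the strawman: rotate histogram rows with PCA, fit one
# independent GP per principal-component score, then rotate predictions back.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(2)
d = np.linspace(0.0, 1.0, 30)[:, None]         # design inputs, shape (30, 1)
H = rng.dirichlet(np.full(6, 5.0), size=30)    # stand-in bin-probability rows

pca = PCA(n_components=3)
scores = pca.fit_transform(H)                  # decorrelate the bins

gps = [GaussianProcessRegressor(kernel=RBF(length_scale=0.3)).fit(d, scores[:, k])
       for k in range(scores.shape[1])]

d_star = np.array([[0.42]])                    # out-of-sample input
s_star = np.array([[gp.predict(d_star)[0] for gp in gps]])
print(pca.inverse_transform(s_star))           # predicted bin probabilities
```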

Thoughts and Future Directions

- Improvement over the strawman will have to come from full density estimation.
- Increasing $N$ (with higher $r$) could provide more flexibility to avoid strange tail behavior.
- A different (or learned) $a_n$ could lead to other processes.

Future directions:
- Incorporate calibration.
- Improve density estimation.
- Show some form of posterior consistency as the counts and bins go to infinity.

Thank you! Questions?

Works Cited

Adams, R., Murray, I., and MacKay, D. (2009). Nonparametric Bayesian density modeling with Gaussian processes.

G. Aad et al. (2010). Observation of a centrality-dependent dijet asymmetry in lead-lead collisions at √s_NN = 2.76 TeV with the ATLAS detector at the LHC. Physical Review Letters, 105(25).

Higdon, D., Gattiker, J., Williams, B., and Rightley, M. (2008). Computer model calibration using high-dimensional output. Journal of the American Statistical Association, 103(482):570-583.

Higdon, D., Kennedy, M., Cavendish, J. C., Cafeo, J. A., and Ryne, R. D. (2004). Combining field data and computer simulations for calibration and prediction. SIAM Journal on Scientific Computing, 26(2):448-466.

Kundu, S. and Dunson, D. B. (2014). Latent factor models for density estimation. Biometrika, 101(3):641-654.

Lenk, P. (1991). Towards a practicable Bayesian nonparametric density estimator. Biometrika, 78(3):531-543.

Lenk, P. (2003). Bayesian semiparametric density estimation and model verification using a logistic-Gaussian process. Journal of Computational and Graphical Statistics, 12(3):548-565.

Riihimäki, J. and Vehtari, A. (2010). Laplace approximation for logistic Gaussian process density estimation and regression. Bayesian Analysis, 9(2):425-448.

Tokdar, S. T. (2007). Towards a faster implementation of density estimation with logistic Gaussian process priors. Journal of Computational and Graphical Statistics, 16(3):633-655.

Tokdar, Zhu, and Ghosh (2010). Bayesian density regression with logistic Gaussian process and subspace projection. Bayesian Analysis, 5(2):319-344.

Appendix: Computer Emulation Covariance Functions

The covariance function $c(\cdot, \cdot)$ is often of the form $c(x, x') = \lambda^{-1} r(x - x' \mid \theta)$. Examples of $r(\cdot \mid \theta)$:

- Power exponential: $r(h \mid \alpha, l) = e^{-(|h|/l)^{\alpha}}$, where $\alpha \in (0, 2]$. Usually we learn $l$ and fix $\alpha$. Setting $\alpha = 2$ makes the function infinitely differentiable, which may be undesirable; $\alpha = 1.9$ is sometimes used for computational stability.

- Matérn: $r(h \mid \nu, l) = \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{|h|}{l}\right)^{\nu} K_{\nu}\!\left(\frac{|h|}{l}\right)$, where $K_{\nu}$ is the modified Bessel function of the second kind. For half-integer $\nu$ (i.e., $\nu = n/2$ with odd $n \in \mathbb{N}$), this has a closed form; most common are $\nu = 3/2$ and $\nu = 5/2$:
$$\nu = 3/2: \quad r(h \mid l) = e^{-|h|/l}\left(1 + \frac{|h|}{l}\right), \qquad \nu = 5/2: \quad r(h \mid l) = e^{-|h|/l}\left(1 + \frac{|h|}{l} + \frac{h^2}{3 l^2}\right).$$

- We usually assume a separable covariance function: if $x$ has $J$ dimensions, then $r(x - x' \mid \theta) = \prod_{j=1}^{J} r_j(x_j - x_j' \mid \theta)$.
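A minimal sketch of these correlation functions, following the parameterization above (note this form omits the $\sqrt{2\nu}$ rescaling that some references fold into the Bessel argument):

```python
# Sketch: the correlation functions r(h | theta) listed above.
import numpy as np

def power_exponential(h, alpha=1.9, ell=1.0):
    """r(h | alpha, l) = exp(-(|h|/l)^alpha), with alpha in (0, 2]."""
    return np.exp(-((np.abs(h) / ell) ** alpha))

def matern_32(h, ell=1.0):
    """Closed-form Matern, nu = 3/2: exp(-|h|/l) (1 + |h|/l)."""
    u = np.abs(h) / ell
    return np.exp(-u) * (1.0 + u)

def matern_52(h, ell=1.0):
    """Closed-form Matern, nu = 5/2: exp(-|h|/l) (1 + |h|/l + h^2/(3 l^2))."""
    u = np.abs(h) / ell
    return np.exp(-u) * (1.0 + u + u ** 2 / 3.0)

def separable(x, x_prime, r_1d=matern_52, ell=1.0):
    """Separable correlation: product of one-dimensional correlations."""
    return np.prod([r_1d(xj - xpj, ell) for xj, xpj in zip(x, x_prime)])

h = np.linspace(0.0, 3.0, 4)
print(power_exponential(h), matern_52(h), separable([0.1, 0.2], [0.3, 0.0]))
```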

Appendix: Density Estimation Predictive Posterior

The change to a GP prior on the components allows us to predict bin probabilities given a new input $d^*$. Let $Y(\vec{d})$ and $Y^*(d^*)$ be the histogram counts for the in-sample and out-of-sample inputs, respectively (similarly for $X$ and $P$).

We want $[Y^*(d^*) \mid d^*, Y(\vec{d})]$. Note $P$ is a linear transformation of $X$.

$$[Y^*(d^*) \mid d^*, Y(\vec{d})] = \int_X [Y^*(d^*) \mid X^*(d^*), d^*, Y(\vec{d})]\,[X^*(d^*) \mid d^*, Y(\vec{d})]\,dX$$
$$[X^*(d^*) \mid d^*, Y(\vec{d})] = \int_{\Theta} \int_X [X^*(d^*) \mid d^*, Y(\vec{d}), X(\vec{d}), \theta]\,[X(\vec{d}), \theta \mid d^*, Y(\vec{d})]\,dX\,d\theta$$

These integrals are handled by Monte Carlo integration. Note that $[X^*(d^*) \mid d^*, Y(\vec{d}), X(\vec{d}), \theta]$ are simply conditional normals from the GP.
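A minimal sketch of the conditional-normal building block, assuming noiseless coefficient observations and a squared-exponential kernel standing in for $c_M$ with fixed $\theta$; in the full algorithm one such conditional is evaluated per posterior sample of $(X(\vec{d}), \theta)$.

```python
# Sketch: X*(d*) | X(d), theta is Gaussian with the standard GP conditional
# mean and covariance; the bin probabilities P* then follow by the linear map.
import numpy as np

def sq_exp(a, b, ell=0.5):
    """Assumed stationary kernel with fixed hyperparameters theta = (ell,)."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def gp_conditional(d, x, d_star, kernel, jitter=1e-8):
    """Mean and covariance of X*(d*) given noiseless values x = X(d)."""
    K = kernel(d, d) + jitter * np.eye(len(d))
    K_s = kernel(d_star, d)
    mean = K_s @ np.linalg.solve(K, x)
    cov = kernel(d_star, d_star) - K_s @ np.linalg.solve(K, K_s.T)
    return mean, cov

d = np.linspace(0.0, 1.0, 10)
x = np.sin(2 * np.pi * d)                   # stand-in coefficient values X(d)
mean, cov = gp_conditional(d, x, np.array([0.42]), sq_exp)
print(mean, cov)
```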

Appendix: Density Estimation Emulation Predictions

[Figure: "Data Across Input" (left) and "Emulated Values Across Input" (right). The left plot depicts the bin probability data points, with the out-of-sample histogram denoted as the holdout set; the right plot depicts emulator predictions, with the out-of-sample prediction highlighted. Both plots show the fraction/probability of each bin against bin location $A_j$.]