Computer Emulation With Density Estimation


1 Computer Emulation With Density Estimation
Jake Coleman, Robert Wolpert
May 8, 2017

2 Computer Emulation Motivation: Expensive Experiments

3-5 Motivation & Literature Review: Physics Data [G. Aad et al., 2010]
Outputs are frequency histograms rather than just multivariate vectors with unknown correlation structure.
We want to predict the underlying density given physics input parameters, which suggests Bayesian density estimation and regression.

6-7 Motivation & Literature Review: Density Estimation Literature
We aim to measure smoothness of the density with a Gaussian process. Some prior work in this area:
Logistic GP prior ([Lenk, 1991], [Lenk, 2003], [Tokdar, 2007], [Tokdar et al., 2010], [Riihimäki and Vehtari, 2010])
Latent factor models ([Kundu and Dunson, 2014])
Exact sampling of a transformed GP ([Adams et al., 2009])
Complication: we don't have access to draws, only counts within bins.

8 Single-Histogram Model: Likelihood
Let $Y_j$ be the count in bin $j$, which has edges $[\alpha_{j-1}, \alpha_j)$. The counts are marginally Poisson and jointly multinomial (conditional on the total):

$p(\vec{Y}) \propto \prod_{j=1}^{J} p_j^{Y_j}, \qquad p_j \equiv \int_{\alpha_{j-1}}^{\alpha_j} f(t)\,dt$

We aim to model the unknown density $f(t)$ nonparametrically with a smooth, continuous function over $[0, 1]$.
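
To make the likelihood concrete, here is a minimal sketch (not from the talk): it computes the bin probabilities $p_j$ by numerical integration and evaluates the multinomial log-likelihood. The toy density, bin edges, and counts are placeholders.

```python
# Minimal sketch (not from the talk): bin probabilities and the
# multinomial likelihood for a single histogram.  The density f and
# the bin edges alpha are illustrative placeholders.
import numpy as np
from scipy.integrate import quad
from scipy.stats import multinomial

def bin_probs(f, alpha):
    """p_j = integral of f over [alpha_{j-1}, alpha_j)."""
    return np.array([quad(f, a, b)[0] for a, b in zip(alpha[:-1], alpha[1:])])

f = lambda t: 2 * t           # a toy density on [0, 1]
alpha = np.linspace(0, 1, 7)  # 6 equal bins
p = bin_probs(f, alpha)

y = np.array([2, 5, 9, 14, 17, 23])             # placeholder bin counts
loglik = multinomial.logpmf(y, n=y.sum(), p=p)  # conditional on the total
```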

9-11 Single-Histogram Model: Main Idea

$f(t) = \sum_{n=1}^{\infty} \sqrt{2}\,[a_n Z_n \cos(2\pi n t) + b_n W_n \sin(2\pi n t)] + 1$

where $\sum_n a_n^2 + b_n^2 < \infty$ and $\{Z_n\}, \{W_n\} \overset{iid}{\sim} N(0, 1)$. Then $f$ is a GP with covariance function $c(t, t') = \sum_n 2 a_n^2 \cos(2\pi n [t - t'])$ if $a_n = b_n$.

$\int_0^1 f(t)\,dt = 1$, and $\int_{\alpha_{j-1}}^{\alpha_j} f(t)\,dt$ can be easily found and pre-computed.

Downside: $f$ is not positive a.s.
Hope: $P(f(t) < 0)$ is very small in the region of interest. Positive quantities are often modeled with normal RVs when they are far enough from zero (heights, rainfall, etc.).
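
A minimal sketch of drawing from this prior, with one loud assumption: the slides parameterize the coefficients by $(c, r)$, which suggests geometric decay $a_n = c\,r^n$, but the exact form is not stated, so the decay below is hypothetical.

```python
# Minimal sketch (assumption flagged): draw a random density from the
# truncated Fourier-series prior.  The decay a_n = c * r**n is a guess
# suggested by the slides' (c, r) parameters, not a confirmed choice.
import numpy as np

def sample_f(t, N=50, c=0.3, r=0.7, rng=np.random.default_rng(0)):
    n = np.arange(1, N + 1)
    a = c * r**n                      # hypothetical coefficient decay
    Z = rng.standard_normal(N)        # {Z_n} iid N(0, 1)
    W = rng.standard_normal(N)        # {W_n} iid N(0, 1)
    cos = np.cos(2 * np.pi * np.outer(t, n))
    sin = np.sin(2 * np.pi * np.outer(t, n))
    # f(t) = sum_n sqrt(2)[a_n Z_n cos + a_n W_n sin] + 1, so the
    # random function integrates to 1 over [0, 1] by construction.
    return np.sqrt(2) * (cos @ (a * Z) + sin @ (a * W)) + 1.0

t = np.linspace(0, 1, 200)
f = sample_f(t)                       # one prior draw; may dip below 0
```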

12-13 Single-Histogram Model: Toy Example
We let c = 0.3 and r = 0.7, while looking to estimate 10,000 draws from a Beta(3, 7) distribution in 6 evenly-spaced bins in [0, 0.6].

[Figure: GP density estimate (Bins = 6, N_x = 5), showing the posterior mean, posterior 95% credible interval, and the truth.]

Decent enough!
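
For reference, a minimal sketch that generates the toy data as described; treating draws outside [0, 0.6] as simply uncounted is our reading of the setup, not something the slides spell out.

```python
# Minimal sketch: the toy data from the slides -- 10,000 Beta(3, 7)
# draws binned into 6 evenly-spaced bins on [0, 0.6].  np.histogram
# drops values outside the edges, which is one reading of the setup.
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(1)
draws = beta(3, 7).rvs(10_000, random_state=rng)
edges = np.linspace(0.0, 0.6, 7)     # 6 equal bins
counts, _ = np.histogram(draws, bins=edges)

# True bin probabilities, for comparison with the fitted density.
true_p = np.diff(beta(3, 7).cdf(edges))
```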

14 Multiple-Histogram Model: Extending the Model
Now we assume that we have an input $d$ upon which we condition our estimate:

$f(t \mid d) = \sum_{n=1}^{N} \sqrt{2}\,a_n [Z_n(d) \cos(2\pi n t / T) + W_n(d) \sin(2\pi n t / T)] + \gamma / T$

where $\{Z_n(\cdot)\}, \{W_n(\cdot)\} \sim GP(0, c_M(\cdot, \cdot))$.
Thus, each component of the Karhunen-Loève representation is itself a GP.
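
A minimal sketch of this hierarchical step: each coefficient process $Z_n(d)$, $W_n(d)$ gets a GP prior over the input. The squared-exponential kernel, length-scale, and jitter below are placeholder choices, not the talk's $c_M$.

```python
# Minimal sketch: each Fourier coefficient process Z_n(d) is a GP over
# the input d.  Kernel and hyperparameters are placeholders.
import numpy as np

def sq_exp(d1, d2, ell=1.0):
    return np.exp(-0.5 * (d1[:, None] - d2[None, :])**2 / ell**2)

rng = np.random.default_rng(2)
d = np.linspace(0, 1, 8)                  # training inputs
K = sq_exp(d, d) + 1e-8 * np.eye(len(d))  # jitter for numerical stability
L = np.linalg.cholesky(K)

N = 20                                    # number of Fourier components
Z = L @ rng.standard_normal((len(d), N))  # Z[:, n] = one draw of Z_n(d)
W = L @ rng.standard_normal((len(d), N))  # W_n(d) likewise
```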

15-16 Multiple-Histogram Model: Initial Results
I chose c = 1, r = 0.5, and a squared-exponential kernel.

[Figures: predicted bin probabilities against the truth, and the GP density estimate showing the posterior mean, 95% interval, and the truth.]

Bin probability prediction is good; density prediction less so.

17-19 Multiple-Histogram Model: Strawman
The naïve emulation strategy treats the histogram counts as multivariate normal and rotates them via PCA in order to apply independent GPs.
It adjusts for within-histogram correlation through PCA.
It does no density estimation.

[Figure: strawman predicted bin probabilities against the truth.]
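
A minimal sketch of this strawman under stated assumptions (fake counts, placeholder kernel and hyperparameters): PCA-rotate the training histograms, emulate each principal-component score with an independent GP posterior mean, and rotate back to bin space.

```python
# Minimal sketch of the strawman: treat histogram counts as multivariate
# normal, rotate with PCA, fit an independent GP to each score across
# the input d, then rotate predictions back.
import numpy as np

def sq_exp(d1, d2, ell=0.3):
    return np.exp(-0.5 * (d1[:, None] - d2[None, :])**2 / ell**2)

rng = np.random.default_rng(3)
d = np.linspace(0, 1, 10)                        # training inputs
Y = rng.poisson(50, size=(10, 6)).astype(float)  # fake 6-bin histograms

mu = Y.mean(0)
U, S, Vt = np.linalg.svd(Y - mu, full_matrices=False)
scores = U * S                                   # one PCA score per column

d_star = np.array([0.55])                        # new input to emulate
K = sq_exp(d, d) + 1e-6 * np.eye(len(d))
k_star = sq_exp(d_star, d)

# Independent GP posterior mean for each score column, then rotate back.
pred_scores = k_star @ np.linalg.solve(K, scores)
Y_star = pred_scores @ Vt + mu                   # emulated bin counts
```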

20-21 Thoughts and Future Directions
Improvement over the strawman will have to come in full density estimation.
Increasing N (with higher r) could provide more flexibility to avoid strange tail behavior.
A different (or learned) $a_n$ could lead to other processes.

Future directions:
Incorporate calibration.
Improve density estimation.
Show some form of posterior consistency as the counts and the number of bins go to infinity.

22 Thank you! Questions?

23-24 Works Cited

Adams, R., Murray, I., and MacKay, D. (2009). Nonparametric Bayesian density modeling with Gaussian processes.
G. Aad et al. (2010). Observation of a centrality-dependent dijet asymmetry in lead-lead collisions at $\sqrt{s_{NN}} = 2.76$ TeV with the ATLAS detector at the LHC. Physical Review Letters, 105(25).
Higdon, D., Gattiker, J., Williams, B., and Rightley, M. (2008). Computer model calibration using high dimensional output. Journal of the American Statistical Association, 103(482).
Higdon, D., Kennedy, M., Cavendish, J. C., Cafeo, J. A., and Ryne, R. D. (2004). Combining field data and computer simulations for calibration and prediction. SIAM Journal on Scientific Computing, 26(2).
Kundu, S. and Dunson, D. B. (2014). Latent factor models for density estimation. Biometrika, 101(3).
Lenk, P. (1991). Towards a practicable Bayesian nonparametric density estimator. Biometrika, 78(3).
Lenk, P. (2003). Bayesian semiparametric density estimation and model verification using a logistic-Gaussian process. Journal of Computational and Graphical Statistics, 12(3).
Riihimäki, J. and Vehtari, A. (2010). Laplace approximation for logistic Gaussian process density estimation and regression. Bayesian Analysis, 9(2).
Tokdar, S. T. (2007). Towards a faster implementation of density estimation with logistic Gaussian process priors. Journal of Computational and Graphical Statistics, 16(3).
Tokdar, S. T., Zhu, Y. M., and Ghosh, J. K. (2010). Bayesian density regression with logistic Gaussian process and subspace projection. Bayesian Analysis, 5(2).

25-30 Appendix: Computer Emulation Covariance Functions
The covariance function $c(\cdot, \cdot)$ is often of the form $c(x, x') = \lambda^{-1} r(x - x' \mid \theta)$. Examples of $r(\cdot \mid \theta)$:

Power exponential: $r(h \mid \alpha, \ell) = e^{-|h/\ell|^{\alpha}}$, where $\alpha \in (0, 2]$.
Usually we learn $\ell$ and fix $\alpha$. Setting $\alpha = 2$ makes the function infinitely differentiable, which may be undesirable; sometimes $\alpha = 1.9$ is used for computational stability.

Matérn: $r(h \mid \nu, \ell) = \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{|h|}{\ell}\right)^{\nu} K_{\nu}\!\left(\frac{|h|}{\ell}\right)$, where $K_{\nu}$ is the modified Bessel function of the second kind.
For half-integer $\nu$, this has a closed form. Most common are $\nu = 3/2$ and $\nu = 5/2$:
$\nu = 3/2$: $r(h \mid \ell) = e^{-|h|/\ell}\left(1 + \frac{|h|}{\ell}\right)$
$\nu = 5/2$: $r(h \mid \ell) = e^{-|h|/\ell}\left(1 + \frac{|h|}{\ell} + \frac{h^2}{3\ell^2}\right)$

Usually we assume a separable covariance function. That is, if $x$ has $J$ dimensions, then $r(x - x' \mid \theta) = \prod_{j=1}^{J} r_j(x_j - x_j' \mid \theta)$.
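
A minimal sketch implementing these kernels as written above (note this parameterization omits the $\sqrt{2\nu}$ scaling some references fold into $h/\ell$):

```python
# Minimal sketch: the appendix kernels.  h is the distance x - x';
# ell and alpha/nu are the hyperparameters from the slides.
import numpy as np
from scipy.special import gamma, kv  # kv = modified Bessel, 2nd kind

def power_exponential(h, ell=1.0, alpha=1.9):
    return np.exp(-np.abs(h / ell)**alpha)

def matern(h, ell=1.0, nu=1.5):
    s = np.abs(h) / ell
    s = np.where(s == 0, 1e-12, s)   # K_nu diverges at 0; nudge instead
    return (2**(1 - nu) / gamma(nu)) * s**nu * kv(nu, s)

def matern_32(h, ell=1.0):
    s = np.abs(h) / ell
    return np.exp(-s) * (1 + s)      # closed form matching the slides

def matern_52(h, ell=1.0):
    s = np.abs(h) / ell
    return np.exp(-s) * (1 + s + s**2 / 3)
```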

31-33 Appendix: Density Estimation Predictive Posterior
The change to a GP prior on the components allows us to predict bin probabilities given a new input $d^*$. Let $Y(\vec{d})$ and $Y^*(d^*)$ be the histogram counts for in-sample and out-of-sample inputs, respectively (similarly for $X$ and $P$).
We want $[Y^*(d^*) \mid d^*, Y(\vec{d})]$. Note $P$ is a linear transformation of $X$.

$[Y^*(d^*) \mid d^*, Y(\vec{d})] = \int_{\mathcal{X}} [Y^*(d^*) \mid X^*(d^*), d^*, Y(\vec{d})]\,[X^*(d^*) \mid d^*, Y(\vec{d})]\,dX$

$[X^*(d^*) \mid d^*, Y(\vec{d})] = \int_{\Theta} \int_{\mathcal{X}} [X^*(d^*) \mid d^*, Y(\vec{d}), X(\vec{d}), \theta]\,[X(\vec{d}), \theta \mid d^*, Y(\vec{d})]\,dX\,d\theta$

We evaluate these by Monte Carlo integration. Note that $[X^*(d^*) \mid d^*, Y(\vec{d}), X(\vec{d}), \theta]$ is simply a conditional normal, from the GP.
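
A minimal sketch of the inner conditional-normal step, under placeholder choices (squared-exponential kernel, a single stand-in posterior draw of $X(\vec{d})$); looping this over posterior samples of $(X, \theta)$ is the Monte Carlo integration.

```python
# Minimal sketch: the "conditional normal from the GP" step.  Given
# latent coefficients X(d) at training inputs and a kernel, X*(d*) has
# the usual Gaussian conditional; averaging draws over posterior
# samples of (X, theta) performs the Monte Carlo integration.
import numpy as np

def sq_exp(d1, d2, ell=0.5):
    return np.exp(-0.5 * (d1[:, None] - d2[None, :])**2 / ell**2)

def conditional_normal(d, X, d_star, ell):
    """Mean and covariance of X*(d*) | X(d), theta under a GP(0, c_M)."""
    K = sq_exp(d, d, ell) + 1e-8 * np.eye(len(d))
    k = sq_exp(d_star, d, ell)
    mean = k @ np.linalg.solve(K, X)
    cov = sq_exp(d_star, d_star, ell) - k @ np.linalg.solve(K, k.T)
    return mean, cov

rng = np.random.default_rng(4)
d = np.linspace(0, 1, 8)
X = rng.standard_normal(8)            # stand-in posterior draw of X(d)
m, v = conditional_normal(d, X, np.array([0.42]), ell=0.5)
x_star = m + np.sqrt(np.maximum(np.diag(v), 0)) * rng.standard_normal(1)
```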

34 Appendix: Emulation Predictions

[Figure: two panels, "Data Across Input" and "Emulated Values Across Input". The left plot depicts the bin probability data points, denoting the holdout set, while the right plot depicts emulator predictions for the out-of-sample histogram.]
