Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands


Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands
Elizabeth C. Mannshardt-Shamseldin, Duke University, Department of Statistical Science
Advisor: Richard L. Smith, University of North Carolina at Chapel Hill, Department of Statistics and Operations Research
January 14th, 2009

Outline
- Introduction and Motivation
- Statistical Methods Background
- Univariate Linear Methods Developed (Smith and Zhu)
- Research Questions
- Multivariate Non-Linear Development
- Simulation Results
- Conclusions and Future Work

Introduction: Motivation
The need often arises in spatial settings for data transformation. The transformation may be non-linear, and the desired predictand may require interpolation of predictions at multiple sites.
- In traditional kriging, the standard formula for the MSPE does not take into account the estimation of the covariance parameters; this generally leads to underestimated prediction errors.
- Bayesian methods offer a solution, but iterative methods can be computationally time intensive.

Introduction: A Possible Solution
Smith and Zhu (2004) establish a second-order expansion for predictive distributions in Gaussian processes with estimated covariances. Here, we establish a similar expansion for multivariate kriging with non-linear predictands.
Main results: explicit formulas, for a general non-linear predictand, for
- the expected length of a Bayesian prediction interval,
- the coverage probability bias (CPB).
A matching prior (CPB = 0) and an alternative estimator are also explored.

Background: Spatial Statistics
We have a stochastic process $Z(s)$, generally assumed to be Gaussian with known mean $\mu$ and covariance structure $V(\theta)$:
$$Z(s) = \mu(s) + e(s)$$
where $e(s)$ is a zero-mean error process.
Basic model: $Z \sim N(\mu, V(\theta))$.
The model with the mean as a linear function of covariates: $Z \sim N(X\beta, V(\theta))$, with
- $X$ a matrix of covariates,
- $\beta$ a vector of unknown regression coefficients.

Background: Covariance Structures
Example: covariance function for the exponential model:
$$\mathrm{cov}\{Z(s_i), Z(s_j)\} = \sigma^2 \exp\left(-\frac{d_{ij}}{\phi}\right) \qquad (1)$$
The underlying covariance structure is introduced through $V(\theta)$, the matrix of standardized covariances determined by $\theta = (\sigma^2, \phi)$, with entries $v_{ij} = \exp(-d_{ij}/\phi)$.
- $\phi$ = range parameter
- $\sigma$ = scale parameter
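As an illustrative sketch (not from the talk), equation (1) translates directly into numpy; the helper name exp_cov is my own:

```python
import numpy as np
from scipy.spatial.distance import cdist

def exp_cov(sites, sigma2=1.0, phi=1.0):
    """Exponential covariance matrix: v_ij = sigma2 * exp(-d_ij / phi)."""
    d = cdist(sites, sites)          # pairwise Euclidean distances d_ij
    return sigma2 * np.exp(-d / phi)
```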

Background: Kriging
Kriging: a technique for predicting values at unobserved locations within a random field through linear combinations of the observed variables. It refers to the construction of a spatial predictor in terms of known model parameters. Universal kriging: the mean process is a linear combination of covariates.
$Y$ is the vector of known observations; $Y_0$ is the (scalar) value to be predicted:
$$\begin{pmatrix} Y \\ Y_0 \end{pmatrix} \sim N\left(\begin{pmatrix} X\beta \\ x_0^T\beta \end{pmatrix}, \begin{pmatrix} V(\theta) & w(\theta) \\ w^T(\theta) & v_0(\theta) \end{pmatrix}\right) \qquad (2)$$
- $X$ is the $n \times p$ matrix of covariates for the observations $Y$,
- $x_0$ is the $p \times 1$ vector of covariates for the predicted scalar $Y_0$,
- $\beta$ is the vector of regression coefficients,
- $\theta$ is the vector of covariance parameters.

Background: Kriging
Universal kriging aims to find the linear predictor $\hat Y_0 = \lambda^T Y$ that minimizes the MSPE $E\{(Y_0 - \hat Y_0)^2\}$ subject to the condition $X^T\lambda = x_0$. Using Lagrange multipliers, the optimal $\lambda$ is
$$\lambda(\theta) = V^{-1}(\theta)w(\theta) + V^{-1}(\theta)X(X^TV^{-1}(\theta)X)^{-1}(x_0 - X^TV^{-1}(\theta)w(\theta)),$$
with corresponding MSPE
$$\sigma_0^2(\theta) = v_0(\theta) - w(\theta)^TV^{-1}(\theta)w(\theta) + (x_0 - X^TV^{-1}(\theta)w(\theta))^T(X^TV^{-1}(\theta)X)^{-1}(x_0 - X^TV^{-1}(\theta)w(\theta)).$$
Thus the predictive distribution function is
$$\Pr\{Y_0 \le z \mid Y = y, \theta\} = \psi(z; y, \theta) = \Phi\left(\frac{z - \lambda(\theta)^Ty}{\sigma_0(\theta)}\right)$$
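An illustrative sketch (not from the talk) of these formulas in numpy; the inputs $V$, $w$, $v_0$ are assumed to have been built from $\theta$ beforehand:

```python
import numpy as np

def universal_kriging(Y, X, x0, V, w, v0):
    """Universal kriging weights, predictor, and MSPE for a scalar Y0,
    implementing the slide's formulas for lambda(theta) and sigma_0^2(theta)."""
    Vi = np.linalg.inv(V)                     # V^{-1}(theta)
    A = X.T @ Vi @ X                          # X' V^{-1} X  (p x p)
    r = x0 - X.T @ Vi @ w
    lam = Vi @ w + Vi @ X @ np.linalg.solve(A, r)
    pred = lam @ Y                            # kriging predictor lambda' Y
    mspe = v0 - w @ Vi @ w + r @ np.linalg.solve(A, r)
    # predictive cdf: psi(z; y, theta) = Phi((z - pred) / sqrt(mspe))
    return lam, pred, mspe
```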

Background: REML Estimation
Restricted Maximum Likelihood (REML) estimation is based on the joint density of a vector of contrasts. This distribution is independent of the population mean, and the resulting estimator is approximately unbiased, as opposed to the MLE.
The REML estimator (Smith, 2001; Stein, 1999) maximizes $l_n(\theta)$ over $\Theta$, where
$$l_n(\theta) = -\frac{n-q}{2}\log(2\pi) + \frac{1}{2}\log|X^TX| - \frac{1}{2}\log|X^TV(\theta)^{-1}X| - \frac{1}{2}\log|V(\theta)| - \frac{1}{2}G^2(\theta)$$
and $G^2(\theta)$ is the generalized residual sum of squares
$$G^2(\theta) = Y^T\{V^{-1}(\theta) - V^{-1}(\theta)X(X^TV^{-1}(\theta)X)^{-1}X^TV^{-1}(\theta)\}Y$$
Use the REML estimator $\hat\theta$ to obtain the estimated predictive distribution function $\hat\psi(z; y) = \psi(z; y, \hat\theta)$.
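An illustrative sketch (not from the talk) of $l_n(\theta)$ as a numerical objective for the exponential model, reusing the exp_cov helper sketched earlier:

```python
import numpy as np

def neg_restricted_loglik(theta, Y, X, sites):
    """Negative REML log-likelihood -l_n(theta) for theta = (sigma2, phi);
    minimize it, e.g. with scipy.optimize.minimize, to get the REML estimate."""
    sigma2, phi = theta
    n, q = X.shape
    V = exp_cov(sites, sigma2, phi)
    Vi = np.linalg.inv(V)
    A = X.T @ Vi @ X
    P = Vi - Vi @ X @ np.linalg.solve(A, X.T @ Vi)   # projection in G^2(theta)
    G2 = Y @ P @ Y                                   # generalized residual SS
    _, logdetV = np.linalg.slogdet(V)
    _, logdetA = np.linalg.slogdet(A)
    _, logdetXX = np.linalg.slogdet(X.T @ X)
    ll = (-(n - q) / 2 * np.log(2 * np.pi) + 0.5 * logdetXX
          - 0.5 * logdetA - 0.5 * logdetV - 0.5 * G2)
    return -ll
```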

Background: Smith and Zhu (2004)
Smith and Zhu provide the original development, for the univariate normal predictive distribution, of the methods extended here to the non-linear multivariate case. This includes:
- Establishing a second-order expansion for predictive distributions in Gaussian processes.
- Using covariance parameter estimates (REML) in the plug-in approach as well as in Bayesian methods.
- The main focus is the estimation of quantiles of the predictive distribution and their application to prediction intervals.
- This leads to the calculation of a second-order coverage probability bias, which suggests the possible existence of a matching prior for which CPB = 0.
- Also: a frequentist correction, $\tilde z_P$, leads to a coverage probability bias of zero, analogous to the existence of a matching prior.

Introduce Notation
Recall the restricted log-likelihood function $l_n(\theta)$. Let
$$U_i = \frac{\partial l_n(\theta)}{\partial\theta_i}, \qquad U_{ij} = \frac{\partial^2 l_n(\theta)}{\partial\theta_i\,\partial\theta_j}, \quad \text{etc.},$$
let $U^{ij}$ be the $(i,j)$ entry of the inverse of the matrix whose $(i,j)$ entry is $U_{ij}$, and let $Q(\theta)$ be the log of the prior $\pi(\theta)$. Superscripts denote components of vectors, subscripts indicate differentiation with respect to components of $\theta$, and the summation convention applies.
The function of interest is the predictive distribution function $\psi(z; Y, \theta)$. Let $\psi^*$ denote either the plug-in estimator $\hat\psi$ or the Bayesian estimator $\tilde\psi$:
$$\tilde\psi = \hat\psi + \hat D + O_p(n^{-2}) \qquad (3)$$
where
$$D = -\frac{1}{2}U_{ijk}\,\psi_l\,U^{ij}U^{kl} - \frac{1}{2}\left(\psi_{ij} + 2\psi_i Q_j\right)U^{ij} \qquad (4)$$
and $\hat D$ indicates the evaluation of $D$ at $\hat\theta$.

Introduce Notation
Further, introduce random variables $Z_i$, $Z_{ij}$, $Z_{ijk}$ with mean 0 such that
$$U_i = n^{1/2}Z_i, \qquad U_{ij} = -n\kappa_{ij} + n^{1/2}Z_{ij}, \qquad U_{ijk} = -n\kappa_{ijk} + n^{1/2}Z_{ijk},$$
where $\kappa_{i,j} = E\{Z_iZ_j\}$ and $\kappa_{ij,k} = E\{Z_{ij}Z_k\}$. Note that $\kappa_{i,j} = \kappa_{ij}$, the $(i,j)$ entry of the normalized Fisher information matrix. This matrix is assumed invertible, with inverse entries $\kappa^{i,j}$.

Univariate Normal Case
Assume that $\psi^*$ has the expansion
$$\psi^*(z; Y) = \psi(z; Y, \theta) + n^{-1/2}R(z, Y) + n^{-1}S(z, Y) + o_p(n^{-1}) \qquad (5)$$
For both the plug-in and Bayesian methods, the components of $R$ and $S$ can be calculated explicitly, using a Taylor expansion for the plug-in approach and a combination of Taylor and Laplace expansions for the Bayesian approach.
For $\hat z_P$ (the plug-in estimator),
$$R = \kappa^{i,j}Z_i\psi_j \qquad (6)$$
$$S = \kappa^{i,j}\kappa^{k,l}Z_{ik}Z_j\psi_l - \frac{1}{2}\kappa^{i,r}\kappa^{j,s}\kappa^{k,t}\kappa_{ijk}Z_rZ_s\psi_t + \frac{1}{2}\kappa^{i,j}\kappa^{k,l}Z_iZ_k\psi_{jl} \qquad (7)$$
where $S$ for $\hat z_P$ is further denoted $S_1$. For $\tilde z_P$ (the Bayesian estimator), the corresponding expression is
$$S_2 = S_1 + \frac{1}{2}\kappa_{ijk}\kappa^{i,j}\kappa^{k,l}\psi_l + \left(\frac{1}{2}\psi_{ij} + \psi_iQ_j\right)\kappa^{i,j}$$
where $Q(\theta)$ is the log of the prior $\pi(\theta)$.

Coverage Probability Bias
Hence the coverage probability bias, the expected value of $\psi(z^*_P; Y, \theta) - \psi(z_P; Y, \theta)$, is expressed as
$$\mathrm{CPB} = E\left[-n^{-1/2}R(z_P, Y) + n^{-1}\left\{\frac{R(z_P, Y)\,R'(z_P, Y)}{\psi'(z_P; Y, \theta)} - S(z_P, Y)\right\}\right] + o(n^{-1}) \qquad (8)$$
where $'$ denotes differentiation with respect to $z$. The coverage probability bias represents the difference between $P\{Y_0 \le z^*_P \mid Y, \theta\}$ and the target probability $P$, where $z^*_P$ is the plug-in estimate $\hat z_P$ or the Bayesian estimate $\tilde z_P$ of the $P$-quantile of the target distribution.

Key Findings
The development of these expansions allows comparison with standard frequentist correction procedures. It also allows the selection of a design criterion based on the expected length of a prediction interval and the coverage probability bias.
Matching Prior
An interesting development: the coverage probability bias can be reduced to a form (Smith, 2004) that suggests the existence of a matching prior. It may be possible to choose the prior $\pi$ so that the expectations of the $O(n^{-1/2})$ and $O(n^{-1})$ terms in the second-order CPB are zero.
This is an important result because, while it may be difficult or impractical to compute the matching prior, it assists in prior selection based on how closely different forms of standard priors (Jeffreys, reference prior, etc.) come to the matching prior.

Key Findings
The estimator $\tilde z_P$ is a function of the asymptotic bias and includes a frequentist correction term developed by Harville and Jeske (1992) and Zimmerman and Cressie (1992):
$$\tilde z_P = \hat z_P - n^{-1}\,\frac{\text{asymptotic bias}}{\phi(\Phi^{-1}(P))} \qquad (9)$$
To calculate the CPB, moments of various expressions involving $R$, $S$, and their derivatives are needed. By the asymptotic formulae, these can be expressed in terms of derivatives of $\psi$ and other quantities that are explicit functions of the Gaussian process.

Preliminary Simulation for a Univariate Linear Predictand
We first looked at a preliminary simulation for the univariate predictand; a sketch of the setup appears below.
- A random plane of 16 site locations is generated.
- Corresponding observations are simulated as $Y(s) = X^T(s)\beta + S(s)$, where $X(s)$ is a column vector with entries $1, s_1, s_2$ ($s_1$ and $s_2$ the coordinates of site $s$), $\beta = (1, 2, 3)^T$, and $S(s)$ is a stationary Gaussian process with mean 0.
- An exponential covariance structure with $\sigma = 1$ and $\phi = 1$ is used.
- Parameter estimates are obtained at each site using the other 15 sites.
- Theoretical 95% prediction intervals are constructed, and the empirical coverage probabilities are computed over 100 simulations at each site.
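An illustrative sketch of this setup (not the talk's actual code); exp_cov is the helper sketched earlier, and the uniform distribution of sites on the unit square is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16
sites = rng.uniform(0.0, 1.0, size=(n, 2))      # random plane of 16 sites
X = np.column_stack([np.ones(n), sites])        # covariates 1, s1, s2
beta = np.array([1.0, 2.0, 3.0])
V = exp_cov(sites, sigma2=1.0, phi=1.0)         # sigma = 1, phi = 1
L = np.linalg.cholesky(V)
Y = X @ beta + L @ rng.standard_normal(n)       # Y(s) = X'(s) beta + S(s)
```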

Preliminary Simulation Results
- The empirical coverage probabilities obtained by kriging with REML estimates of the covariance parameters are much lower than 95%, with values ranging from 64% to 89% and an Average Empirical Coverage (AEC) of 81.4%. The discrepancy can be attributed to the error introduced into the model through the estimated covariance parameters.
- The Bayesian method used Gibbs sampling in WinBUGS with the REML estimates as starting values. It showed empirical coverage around 95%, with a range of 86% to 99% and an AEC of 94.8%.
- The Smith-Zhu Laplace approximation method shows a definite improvement, with an AEC of 92.9% and a range of 87% to 98% coverage.

Comparison of Laplace, Bayesian, and Plug-In Methods
[Figure: empirical coverage probabilities for the two-parameter exponential predictions at the 16 sites, plotted by site. Legend: True, AEC = 94%; Bayes, AEC = 95%; REML, AEC = 81%; Laplace, AEC = 93%.]

Conclusion
Key results of Smith and Zhu's 2004 paper are expressions for the coverage probability bias and the expected length of a prediction interval for both plug-in and Bayesian predictors.
- These are established for a Gaussian process whose mean is a linear combination of regressors, with a parametrically specified covariance.
- The possible existence of a matching prior is introduced.
- A frequentist correction allows for a second-order CPB of zero.
This work extends these methods to the analogous non-linear multivariate predictands, such as those motivated by the methods established in Smith, Kolenikov, and Cox (2003).

Multivariate Non-Linear Development
An important difference between the univariate linear and multivariate non-linear cases is that the multivariate predictive distribution $G^*$ is not necessarily available in closed form. Thus, a method is needed to determine the derivatives of the predictive distribution function.
In the multivariate case, the predictand can be written as
$$H = \sum_{j=1}^m h(Y_{0,j}) \quad \text{or} \quad H = H(Y_0) \qquad (10)$$
where $h(\cdot)$ is a function applied to each kriged value, such as $h(y) = y^2$. Example: the variance-stabilizing square-root transform, where the desired predictand is a spatial average (Smith, Kolenikov and Cox, 2003). $H(\cdot)$ is a more general transformation function.

Multivariate Non-Linear Development
Assume $G^*$ has the expansion
$$G^*(z; Y) = G(z; Y, \theta) + n^{-1/2}R(z, Y) + n^{-1}S(z, Y) + o_p(n^{-1})$$
Consider the multivariate non-linear analogs of Smith and Zhu's expansions and of the form of the CPB:
$$R = \kappa^{i,j}Z_iG_j$$
$$S = \kappa^{i,j}\kappa^{k,l}Z_{ik}Z_jG_l - \frac{1}{2}\kappa^{i,r}\kappa^{j,s}\kappa^{k,t}\kappa_{ijk}Z_rZ_sG_t + \frac{1}{2}\kappa^{i,j}\kappa^{k,l}Z_iZ_kG_{jl}$$
$$\mathrm{CPB} = E\left[-n^{-1/2}R(z_P, Y) + n^{-1}\left\{\frac{R(z_P, Y)\,R'(z_P, Y)}{G'(z_P; Y, \theta)} - S(z_P, Y)\right\}\right] + o(n^{-1}) \qquad (11)$$
The extension to multivariate kriging is a generalization of the univariate case.

Multivariate Non-Linear Development
The objective is to evaluate $G^* = P(H(Y_0) \le z \mid y; \theta)$ and its partial derivatives. For the multivariate, non-linear predictand, the exact form of $G^*$ may not be easily manipulated.
- Develop methodology to derive the derivatives of the predictive distribution $G^*$ with respect to $z$, $\theta$, and both $z$ and $\theta$.
- Employ kernel density estimation to evaluate each term.
- Develop a parametric bootstrap to estimate the empirical cdf.
The predictive distribution can be estimated as
$$G^*_B(z \mid Y, \theta) = \frac{1}{B}\sum_b I\{H(Y_0^{(b)}) \le z\},$$
where $B$ denotes the number of iterations and $G^*_B$ represents the bootstrapped estimate; see the sketch below.
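An illustrative sketch (not from the talk) of the bootstrapped estimate $G^*_B$; the inputs lam (kriging weights $\Lambda$) and V0_chol (a Cholesky factor of the conditional covariance $V_0$) are hypothetical helpers:

```python
import numpy as np

def bootstrap_cdf(z, Y, lam, V0_chol, H, B=2000, rng=None):
    """Parametric-bootstrap estimate of G*(z | Y, theta) = P(H(Y0) <= z),
    drawing Y0^(b) from the conditional normal N[Lambda' Y, V0]."""
    rng = rng or np.random.default_rng()
    mean = lam.T @ Y                  # conditional mean Lambda' Y
    m = mean.shape[0]
    hits = 0
    for _ in range(B):
        Y0_b = mean + V0_chol @ rng.standard_normal(m)   # draw Y0^(b)
        hits += H(Y0_b) <= z          # indicator I{H(Y0^(b)) <= z}
    return hits / B

# Example predictand: sum of squares, H = lambda y0: np.sum(y0**2)
```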

Derivative Development
For $G^*(z \mid Y, \theta) = E_{f(Y_0 \mid Y, \theta)}[I\{H(Y_0) \le z\}]$, partial derivatives up to order 2 are necessary; these can be expressed as expectations with respect to the predictive distribution:
$$\frac{\partial}{\partial\theta_i}\int I\{H(Y_0) \le z\}\,f(Y_0 \mid Y, \theta)\,dY_0 = E_{f(Y_0 \mid Y, \theta)}\left[I\{H(Y_0) \le z\}\,\frac{\partial}{\partial\theta_i}\ln f(Y_0 \mid Y, \theta)\right]$$
where $f(Y_0 \mid Y, \theta)$ is the restricted (conditional) likelihood, and $\frac{\partial}{\partial\theta_i}\ln f(Y_0 \mid Y, \theta)$ can be evaluated analytically.
In practice, $I\{H(Y_0) \le z\}\,\frac{\partial}{\partial\theta_i}\ln f(Y_0 \mid Y, \theta)$ is estimated empirically and averaged over many iterations, using a numerical approximation for the derivatives of the restricted log-likelihood. Simulated values are used in place of theoretical expected values; a sketch follows.
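An illustrative sketch (not from the talk) of this score-identity estimate, with a central difference standing in for $\partial\ln f/\partial\theta_i$; logf is a hypothetical function returning $\ln f(Y_0 \mid Y, \theta)$ and theta is a numpy array:

```python
import numpy as np

def dG_dtheta_i(z, Y0_draws, H, logf, theta, i, eps=1e-5):
    """Estimate dG*/dtheta_i by averaging the indicator times a numerical
    approximation of the score d ln f / d theta_i over bootstrap draws."""
    tp, tm = theta.copy(), theta.copy()
    tp[i] += eps
    tm[i] -= eps
    total = 0.0
    for y0 in Y0_draws:
        score_i = (logf(y0, tp) - logf(y0, tm)) / (2 * eps)
        total += (H(y0) <= z) * score_i
    return total / len(Y0_draws)
```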

Kernel Density
The cumulative distribution of the kernel is used to approximate the predictive distribution $G^*(z \mid Y, \theta)$:
$$G^*(z \mid Y, \theta) = \frac{1}{B}\sum_b I\{H(Y_0^{(b)}) \le z\} \approx \frac{1}{B}\sum_{b=1}^B K_1\!\left(\frac{z - H(Y_0^{(b)})}{h}\right) \qquad (12)$$
The kernel density is written $K$ and its distribution function $K_1$, so that $K_1' = K$. The density $K$ is used to estimate the predictive density, while $K_1$ estimates the predictive distribution $G^*(z \mid Y, \theta)$.

Kernel Density
Here we consider the Epanechnikov kernel outlined in Silverman (1986). The Epanechnikov kernel has an efficiency of 1, being the kernel that minimizes the mean integrated squared error. The density estimate is
$$\hat f(z) = \frac{1}{Bh}\sum_{b=1}^B K\!\left(\frac{z - H(Y_0^{(b)})}{h}\right), \qquad K(t) = \begin{cases} \dfrac{3}{4\sqrt{5}}\left(1 - \dfrac{t^2}{5}\right) & -\sqrt{5} \le t \le \sqrt{5} \\ 0 & \text{otherwise,} \end{cases}$$
where $t = \frac{z - H(Y_0)}{h}$, with $z$ the predicted value, $H(Y_0)$ the true value, and $h$ the smoothing parameter (bandwidth).
[Figure: value of the Epanechnikov kernel plotted against $t$ over $(-2, 2)$.]
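An illustrative sketch (not from the talk) of the Epanechnikov kernel $K$ and its integrated form $K_1$, which smooths the bootstrap cdf in equation (12):

```python
import numpy as np

def epanechnikov(t):
    """K(t) = 3/(4*sqrt(5)) * (1 - t^2/5) on |t| <= sqrt(5), else 0."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= np.sqrt(5),
                    3 / (4 * np.sqrt(5)) * (1 - t**2 / 5), 0.0)

def epanechnikov_cdf(t):
    """Integrated kernel K_1(t) = int_{-sqrt(5)}^t K(u) du, so K_1' = K."""
    s5 = np.sqrt(5)
    t = np.clip(np.asarray(t, dtype=float), -s5, s5)
    return 3 / (4 * s5) * (t - t**3 / 15) + 0.5  # antiderivative shifted to [0, 1]

# Smoothed cdf estimate, equation (12):
# G_hat = epanechnikov_cdf((z - H_draws) / h).mean()
```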

Multivariate Non-Linear Development
In summary, estimation is achieved through a parametric bootstrap:
1. For $b = 1, \dots, B$ replications, generate $Y_0^{(b)} \sim N[\Lambda^TY, V_0]$.
2. Calculate $G^*(z \mid Y, \theta) = P[H(Y_0) \le z] \approx \frac{1}{B}\sum_b I\{H(Y_0^{(b)}) \le z\}$.
3. Use a kernel density, differentiable with respect to $z$ and the components of $\theta$, to approximate $\hat G$ and its derivatives with respect to $z$.
4. Express the derivatives with respect to $\theta$ as expectations with respect to the restricted log-likelihood $f(Y_0 \mid Y, \theta)$.
5. Evaluate the derivatives with respect to $\theta$ using a numerical approximation.

Expansion of the Coverage Probability Bias
The coverage probability bias can be expressed as
$$E[G(z^*_P) - G(z_P)] = -n^{-1/2}\left(n^{-1/2}\kappa^{i,j}\,E\left[U_i\,\frac{1}{B}\sum_{b=1}^B K_1\!\left(\frac{z - H(Y_0^{(b)})}{h}\right)\frac{\partial}{\partial\theta_j}\ln f(Y_0^{(b)} \mid Y, \theta)\right]\right) + n^{-1}\left(n^{-1}\kappa^{i,j}\kappa^{k,l}\,E\left[\frac{A_{RR'G}}{B_{RR'G}}\right] - E[S_G]\right) \qquad (13)$$
where $K_1$ is the kernel distribution function with kernel density $K$, $A_{RR'G}$ and $B_{RR'G}$ are as expressed in equations (14) and (15), and $E[S_G]$ is the appropriate $S$ for the desired plug-in or Bayesian prediction:
$$A_{RR'G} = U_i\,\frac{1}{B}\sum_{b=1}^B K_1\!\left(\frac{z - H(Y_0^{(b)})}{h}\right)\frac{\partial}{\partial\theta_j}\ln f(Y_0^{(b)} \mid Y, \theta)\;\; U_k\sum_{b=1}^B\frac{1}{Bh}K\!\left(\frac{z - H(Y_0^{(b)})}{h}\right)\frac{\partial}{\partial\theta_l}\ln f(Y_0^{(b)} \mid Y, \theta) \qquad (14)$$
$$B_{RR'G} = \frac{1}{Bh}\sum_{b=1}^B K\!\left(\frac{z - H(Y_0^{(b)})}{h}\right) \qquad (15)$$

For the Bayesian approach, where $S = S_{2G}$:
$$E[S_{2G}] = E[S_{1G}] + \frac{1}{2}\kappa_{ijk}\kappa^{i,j}\kappa^{k,l}\,E\left[\frac{1}{B}\sum_{b=1}^B K_1\!\left(\frac{z - H(Y_0^{(b)})}{h}\right)\frac{\partial}{\partial\theta_l}\ln f(Y_0^{(b)} \mid Y, \theta)\right] + \frac{1}{2}\kappa^{i,j}\,E\left[\frac{1}{B}\sum_{b=1}^B K_1\!\left(\frac{z - H(Y_0^{(b)})}{h}\right)A_{S_{2G}}\right] + \kappa^{i,j}\,E\left[\frac{1}{B}\sum_{b=1}^B K_1\!\left(\frac{z - H(Y_0^{(b)})}{h}\right)\frac{\partial}{\partial\theta_i}\ln f(Y_0^{(b)} \mid Y, \theta)\right]Q_j \qquad (16)$$
where $Q(\theta) = \log\pi(\theta)$ from the Bayesian framework and
$$A_{S_{2G}} = \frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\ln f(Y_0^{(b)} \mid Y, \theta) + \frac{\partial}{\partial\theta_i}\ln f(Y_0^{(b)} \mid Y, \theta)\,\frac{\partial}{\partial\theta_j}\ln f(Y_0^{(b)} \mid Y, \theta)$$

Asymptotic Frequentist Correction: An Alternative to the Matching Prior
It is not necessary to find the exact form of the matching prior. Instead, construct an artificial predictor $\tilde z_P$, equivalent to the Bayesian predictor, as an alternative to solving equation (13) by obtaining the matching prior. For percentile $P$, define $\tilde z_P$ via the Laplace approximation:
$$\tilde z_P = \hat z_P - n^{-1}\,\frac{\text{equation (13)}}{G'(G^{-1}(P))} \qquad (17)$$
where equation (13) is the expression for the CPB. This is a function of the asymptotic bias, as in equation (9) from Smith and Zhu, and is the analog of the univariate normal case.
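A minimal sketch (not from the talk) of applying equation (17) as stated; cpb_hat stands for the evaluated equation-(13) expression and g_hat for the estimated predictive density $G'$ at the plug-in quantile, both hypothetical inputs:

```python
def corrected_quantile(z_hat, cpb_hat, g_hat, n):
    """Frequentist-corrected quantile: shift the plug-in quantile z_hat by
    n^{-1} times the estimated CPB over the estimated predictive density."""
    return z_hat - cpb_hat / (n * g_hat)
```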

Simulation
To compare the Laplace approximation technique with the standard plug-in approach using REML estimates, a simulation was constructed. Here we look specifically at the sum of squares of predictions over multiple sites.
The simulation is run over $N = 100$ iterations and can be thought of as a double loop:
- The outer loop generates predictions by kriging with REML estimates.
- The inner loop uses kernel density estimation to obtain an empirical predictive distribution across the prediction sites and calculates the estimate using the Laplace approximation technique.

Simulation
$n_1$ sites with $(s_1, s_2)$ coordinates are randomly generated, and a random field $Y(s)$ is generated of the form $Y(s) = X^T(s)\beta + S(s)$.
- $X(s)$ is a column vector with entries $s_1, s_2$; $\beta = (1, 2)^T$; and $S(s)$ is a stationary Gaussian process with mean 0 and variance $\sigma^2 = 1$.
- The correlation function is parametrized by an exponential covariance structure $\mathrm{cov}\{Y(s_i), Y(s_j)\} = \sigma^2\exp(-d_{ij}/\phi)$, with $\sigma = 1.0$ and range parameter $\phi = 0.2$.

Simulation
The $n_1$ $Y$ values ($n_1 = 30$) are treated as observed across the original sites. An additional $n_2 = 5$ sites are generated, and the corresponding simulated field values $Y_0$ are treated as the true values.
The objective of the simulation is to interpolate a non-linear, multivariate predictand $H(Y_0)$ across the $n_2$ sites. The predictand here is the sum of squares across the $n_2$ sites,
$$H(Y_0) = \sum_{i=1}^{n_2} Y_{0,i}^2,$$
as sketched below.
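An illustrative sketch (not from the talk) of a bootstrap 95% interval for this sum-of-squares predictand, reusing the hypothetical lam and V0_chol inputs from the earlier bootstrap sketch:

```python
import numpy as np

def H(y0):
    """Sum-of-squares predictand over the n2 prediction sites."""
    return np.sum(y0**2)

def bootstrap_interval(Y, lam, V0_chol, B=2000, level=0.95, rng=None):
    """Draw Y0^(b) from the plug-in conditional normal N[Lambda' Y, V0] and
    take empirical quantiles of H(Y0^(b)) as the prediction interval."""
    rng = rng or np.random.default_rng()
    mean = lam.T @ Y
    m = mean.shape[0]
    draws = np.array([H(mean + V0_chol @ rng.standard_normal(m))
                      for _ in range(B)])
    a = (1 - level) / 2
    return np.quantile(draws, [a, 1 - a])
```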

Simulation
Of interest is the empirical coverage of 95% theoretical prediction intervals generated using the Laplace method versus the plug-in method.
- The $n_1$ observed sites are used to obtain REML estimates of the range parameter $\phi$ and the scale parameter $\sigma$.
- The REML estimates are plugged into the universal kriging methods to interpolate the sum-of-squares prediction over the $n_2$ sites.
- The Laplace approximation is calculated using the developed methodology.

Simulation Results: Non-Linear Plug-In vs. Laplace Prediction Intervals, 1-Parameter Case
- Empirical 95% prediction intervals for the plug-in method result in severe undercoverage, with an Average Empirical Coverage (AEC) of 75.8%.
- The Laplace approximation technique resulted in an improvement: the AEC for the Laplace approximation prediction intervals was 78.7%.
- Note that the Laplace approximation sometimes exhibits erratic behavior: the correction occasionally produced extremely large adjustments, possibly because the REML estimates hit the bounds set in the optimization algorithm and because asymptotic arguments are less reliable for small samples. It is therefore reasonable to expect that the Laplace approximation could improve even further over the plug-in method if this were corrected, which provides an interesting area for future study.

The empirical coverage probabilities from the Laplace technique are plotted against the plug-in coverages. As expected, there is a strong positive correlation. The line $y = x$ shows that the majority of the Laplace coverages are larger than the plug-in coverages.

Simulation Results: Empirical Coverage Probabilities, 2-Parameter Case
- Empirical 95% prediction intervals for the plug-in method result in undercoverage, with an Average Empirical Coverage (AEC) of 91.9%.
- The Laplace approximation technique resulted in a slight improvement: the AEC for the Laplace approximation prediction intervals was 92.2%.
- The empirical coverage probabilities from the Laplace technique are plotted against the plug-in coverages. The line $y = x$ shows that the empirical coverages of the Laplace approximation are larger than the plug-in coverages in about half of the simulated intervals.
- The plot shows evidence of greater improvement of the Laplace approximation over the plug-in method when the empirical coverage is low.

Conclusions
- Developed a practical method for analytical evaluation, including a bootstrap method to obtain predictions for general non-linear predictands; it incorporates kernel density estimation for predictive distributions that are unknown or computationally difficult.
- The Laplace approximation technique showed improvement in the linear univariate case, with empirical coverage probabilities for prediction intervals analogous to the Bayesian intervals, both showing very close agreement with the theoretical prediction coverage of 95%.
- The simulation for the multivariate non-linear predictand showed promising results for the Laplace approximation technique.
- The results also suggest the existence of a matching prior for non-linear predictands, with a form analogous to that derived for the univariate normal case in Smith and Zhu (2004).

Future Work
- Investigation of different prior specifications for the Bayesian and Laplace methods, specifically the Jeffreys prior and the reference prior.
- Explicit computation of the matching prior to achieve a second-order coverage probability bias of zero.
- Application to data analysis, such as the square-root transformation of the PM$_{2.5}$ data considered in Smith, Kolenikov, and Cox (2003).

Thank You!