Bayesian inference & process convolution models Dave Higdon, Statistical Sciences Group, LANL


1 Bayesian inference & process convolution models Dave Higdon, Statistical Sciences Group, LANL

2 MOVING AVERAGE SPATIAL MODELS

Kernel basis representation for spatial processes z(s)

Define m basis functions k_1(s), ..., k_m(s). Here k_j(s) is a normal density centered at spatial location ω_j:

k_j(s) = (1/√(2π)) exp{ -(1/2)(s - ω_j)^2 }

[Figure: the basis functions plotted over s]

Set z(s) = Σ_{j=1}^m k_j(s) x_j, where x ~ N(0, I_m).

Can represent z = (z(s_1), ..., z(s_n))^T as z = Kx, where K_ij = k_j(s_i).
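To make the kernel-basis construction concrete, here is a small illustrative sketch (not from the slides) that builds K from Gaussian kernels at assumed knot locations and draws one realization of z = Kx; the number of knots, their spacing, and the kernel width are arbitrary choices.

```python
import numpy as np

# Assumed illustrative settings (not from the slides): m knots, n evaluation sites.
m, n = 6, 200
omega = np.linspace(-2, 12, m)           # knot locations omega_j
s = np.linspace(-2, 12, n)               # locations s_i where z(s) is evaluated
sd = 1.0                                  # kernel standard deviation (a modeling choice)

# K_ij = k_j(s_i): normal density centered at omega_j, evaluated at s_i
K = np.exp(-0.5 * ((s[:, None] - omega[None, :]) / sd) ** 2) / (np.sqrt(2 * np.pi) * sd)

# One realization of the process: x ~ N(0, I_m), z = K x
rng = np.random.default_rng(0)
x = rng.standard_normal(m)
z = K @ x                                 # z(s_i) = sum_j k_j(s_i) x_j
```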

x and k(s) determine the spatial process z(s)

[Figure: left panel shows the weighted basis functions k_j(s) x_j; right panel shows the resulting z(s), both over s]

Continuous representation: z(s) = Σ_{j=1}^m k_j(s) x_j, where x ~ N(0, I_m).

Discrete representation: for z = (z(s_1), ..., z(s_n))^T, z = Kx, where K_ij = k_j(s_i).

Estimate z(s) by specifying k_j(s) and estimating x

[Figure: fitted basis functions k_j(s) x_j and the data y(s)]

Discrete representation: for z = (z(s_1), ..., z(s_n))^T, z = Kx, where K_ij = k_j(s_i).

Standard regression model: y = Kx + ɛ.

Formulation for the 1-d example

Data y = (y(s_1), ..., y(s_n))^T observed at locations s_1, ..., s_n. Once the knot locations ω_j, j = 1, ..., m, and the kernel choice k(s) are specified, the remaining model formulation is trivial:

Likelihood:
L(y | x, λ_y) ∝ λ_y^{n/2} exp{ -(1/2) λ_y (y - Kx)^T (y - Kx) },  where K_ij = k(ω_j - s_i).

Priors:
π(x | λ_x) ∝ λ_x^{m/2} exp{ -(1/2) λ_x x^T x }
π(λ_x) ∝ λ_x^{a_x - 1} exp{ -b_x λ_x }
π(λ_y) ∝ λ_y^{a_y - 1} exp{ -b_y λ_y }

Posterior:
π(x, λ_x, λ_y | y) ∝ λ_y^{a_y + n/2 - 1} exp{ -λ_y [b_y + .5(y - Kx)^T (y - Kx)] } × λ_x^{a_x + m/2 - 1} exp{ -λ_x [b_x + .5 x^T x] }

Posterior and full conditionals

Posterior:
π(x, λ_x, λ_y | y) ∝ λ_y^{a_y + n/2 - 1} exp{ -λ_y [b_y + .5(y - Kx)^T (y - Kx)] } × λ_x^{a_x + m/2 - 1} exp{ -λ_x [b_x + .5 x^T x] }

Full conditionals:
π(x | ...) ∝ exp{ -(1/2)[ λ_y x^T K^T K x - 2 λ_y x^T K^T y + λ_x x^T x ] }
π(λ_x | ...) ∝ λ_x^{a_x + m/2 - 1} exp{ -λ_x [b_x + .5 x^T x] }
π(λ_y | ...) ∝ λ_y^{a_y + n/2 - 1} exp{ -λ_y [b_y + .5(y - Kx)^T (y - Kx)] }

Gibbs sampler implementation:
x | ... ~ N( (λ_y K^T K + λ_x I_m)^{-1} λ_y K^T y,  (λ_y K^T K + λ_x I_m)^{-1} )
λ_x | ... ~ Γ(a_x + m/2,  b_x + .5 x^T x)
λ_y | ... ~ Γ(a_y + n/2,  b_y + .5(y - Kx)^T (y - Kx))
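These full conditionals translate directly into a Gibbs sampler. The sketch below is an illustrative implementation, not code from the slides: it assumes the data vector y and kernel matrix K have already been built, and it uses the rate parameterization of the gamma draws so that Γ(a, b) above corresponds to shape a and rate b.

```python
import numpy as np

def gibbs_convolution(y, K, a_x, b_x, a_y, b_y, n_iter=1000, seed=0):
    """Gibbs sampler for y = K x + eps, x ~ N(0, I/lam_x), eps ~ N(0, I/lam_y)."""
    rng = np.random.default_rng(seed)
    n, m = K.shape
    x, lam_x, lam_y = np.zeros(m), 1.0, 1.0
    KtK, Kty = K.T @ K, K.T @ y
    samples = []
    for _ in range(n_iter):
        # x | ... ~ N(V lam_y K'y, V),  V = (lam_y K'K + lam_x I)^(-1)
        prec = lam_y * KtK + lam_x * np.eye(m)
        chol = np.linalg.cholesky(prec)
        mean = np.linalg.solve(prec, lam_y * Kty)
        x = mean + np.linalg.solve(chol.T, rng.standard_normal(m))
        # lam_x | ... ~ Gamma(a_x + m/2, rate = b_x + .5 x'x)
        lam_x = rng.gamma(a_x + m / 2, 1.0 / (b_x + 0.5 * x @ x))
        # lam_y | ... ~ Gamma(a_y + n/2, rate = b_y + .5 (y - Kx)'(y - Kx))
        r = y - K @ x
        lam_y = rng.gamma(a_y + n / 2, 1.0 / (b_y + 0.5 * r @ r))
        samples.append((x.copy(), lam_x, lam_y))
    return samples
```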

Gibbs sampler: intuition

Gibbs sampler for a bivariate normal density:

π(z) = π(z_1, z_2) ∝ |Σ|^{-1/2} exp{ -(1/2) z^T Σ^{-1} z },  with Σ = [ 1  ρ ; ρ  1 ]

Full conditionals of π(z):
z_1 | z_2 ~ N(ρ z_2, 1 - ρ^2)
z_2 | z_1 ~ N(ρ z_1, 1 - ρ^2)

Initialize the chain with z^0 ~ N(0, Σ). Draw z_1^1 ~ N(ρ z_2^0, 1 - ρ^2); now (z_1^1, z_2^0)^T ~ π(z). Drawing z_2^1 ~ N(ρ z_1^1, 1 - ρ^2) completes z^1 = (z_1^1, z_2^1)^T.

[Figure: one Gibbs scan shown as the path z^0 = (z_1^0, z_2^0) → (z_1^1, z_2^0) → z^1 in the (z_1, z_2) plane]

Gibbs sampler: intuition

The Gibbs sampler gives z^0, z^1, ..., z^T, which can be treated as dependent draws from π(z). If z^0 is not a draw from π(z), then the initial realizations will not have the correct distribution; in practice the first 100? 1000? realizations are discarded as burn-in.

The draws can be used to make inference about π(z):

Posterior mean of z estimated by (µ̂_1, µ̂_2)^T = (1/T) Σ_{k=1}^T (z_1^k, z_2^k)^T

Posterior probabilities:
P(z_1 > 1) ≈ (1/T) Σ_{k=1}^T I[z_1^k > 1]
P(z_1 > z_2) ≈ (1/T) Σ_{k=1}^T I[z_1^k > z_2^k]

90% interval: (z_1^[5%], z_1^[95%]), the empirical 5th and 95th percentiles of the draws.
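As an illustration of the bivariate-normal example, here is a minimal sketch (not from the slides) of the two-step Gibbs scan and the Monte Carlo summaries it supports; the values of ρ, T, and the burn-in length are arbitrary.

```python
import numpy as np

rho, T, burn = 0.9, 5000, 500           # assumed values for illustration
rng = np.random.default_rng(1)
z = np.zeros((T, 2))
z1, z2 = 0.0, 0.0                        # starting point of the chain
for k in range(T):
    z1 = rng.normal(rho * z2, np.sqrt(1 - rho**2))   # z1 | z2 ~ N(rho z2, 1 - rho^2)
    z2 = rng.normal(rho * z1, np.sqrt(1 - rho**2))   # z2 | z1 ~ N(rho z1, 1 - rho^2)
    z[k] = (z1, z2)

draws = z[burn:]                         # discard burn-in realizations
post_mean = draws.mean(axis=0)           # estimate of E[z]
p_z1_gt_1 = (draws[:, 0] > 1).mean()     # estimate of P(z1 > 1)
p_z1_gt_z2 = (draws[:, 0] > draws[:, 1]).mean()
ci_90 = np.percentile(draws[:, 0], [5, 95])          # 90% interval for z1
```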

1-d example

m = 6 knots evenly spaced between −.3 and 1.2. n = 5 data points at s = .05, .25, .52, .65, .91. k(s) is N(0, sd = .3).

a_y = 10, b_y = 10(.25^2): a strong prior at λ_y = 1/.25^2; a_x = 1, b_x = .001.

[Figure: left, the basis functions k_j(s) and data y(s); right, MCMC traces of the x_j's and of λ_x, λ_y over 1000 iterations]

1-d example

From posterior realizations of the knot weights x, one can construct posterior realizations of the smooth fitted function z(s) = Σ_{j=1}^m k_j(s) x_j. Note that the strong prior on λ_y is required since n is small.

[Figure: left, posterior realizations of z(s) with the data y(s); right, posterior mean and pointwise 90% bounds]

1-d example

m = 20 knots evenly spaced between −2 and 12. n = 18 data points evenly spaced between 0 and 10. k(s) is N(0, sd = 2).

[Figure: the data; posterior mean and 80% CI for z(s); posterior distributions for λ_y and λ_x]

1-d example

m = 20 knots evenly spaced between −2 and 12. n = 18 data points evenly spaced between 0 and 10. k(s) is N(0, sd = 1).

[Figure: the data; posterior mean and 80% CI for z(s); posterior distributions for λ_y and λ_x]

Basis representations for spatial processes z(s)

Represent z(s) at spatial locations s_1, ..., s_n: z = (z(s_1), ..., z(s_n))^T ~ N(0, Σ_z). Recall z = Kx, where K K^T = Σ_z and x ~ N(0, I_n). This gives a discrete representation of z(s) at the locations s_1, ..., s_n.

Discrete representation: z(s_i) = Σ_{j=1}^n K_ij x_j
Continuous representation: z(s) = Σ_{j=1}^n k_j(s) x_j

[Figure: discrete and continuous basis functions over s]

Columns of K give basis functions. A subset of these basis functions can be used to reduce dimensionality.

[Figure: basis functions and implied covariances for three decompositions of Σ_z (Cholesky with pivoting, SVD, and kernels), plotted against location s]

How many basis kernels?

Define m basis functions k_1(s), ..., k_m(s).

[Figure: basis functions and implied covariances for m = 20, m = 10, and m = 6 kernels over location s]

Moving average specifications for spatial models z(s)

White noise process x = (x_1, ..., x_m) at spatial locations ω_1, ..., ω_m, with x ~ N(0, (1/λ_x) I_m).

The spatial process z(s) is constructed by convolving x with a smoothing kernel k(s):

z(s) = Σ_{j=1}^m x_j k(ω_j - s)

z(s) is a Gaussian process with mean 0 and covariance

Cov(z(s), z(s')) = (1/λ_x) Σ_{j=1}^m k(ω_j - s) k(ω_j - s')
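A small sketch (illustrative only) of the induced covariance: it evaluates Cov(z(s), z(s')) from the kernel sum above on a 1-d grid, with assumed knot locations and kernel width.

```python
import numpy as np

def convolution_covariance(s_grid, omega, sd=1.0, lam_x=1.0):
    """Cov(z(s), z(s')) = (1/lam_x) * sum_j k(omega_j - s) k(omega_j - s')."""
    def k(d):
        # normal-density smoothing kernel with standard deviation sd (a modeling choice)
        return np.exp(-0.5 * (d / sd) ** 2) / (np.sqrt(2 * np.pi) * sd)
    K = k(omega[None, :] - s_grid[:, None])    # K[i, j] = k(omega_j - s_i)
    return (K @ K.T) / lam_x                   # covariance matrix over s_grid

# Assumed illustrative settings (not from the slides)
omega = np.linspace(-2, 12, 20)                # knot locations
s_grid = np.linspace(0, 10, 50)
C = convolution_covariance(s_grid, omega, sd=1.0)
```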


Example: constructing 1-d models for z(s)

m = 20 knot locations ω_1, ..., ω_m equally spaced between −2 and 12. x = (x_1, ..., x_m)^T ~ N(0, I_m).

z(s) = Σ_{k=1}^m k(ω_k - s) x_k

k(s) is a normal density with sd = 1/4, 1/2, and 1; the 4th frame uses k(s) = 1.4 exp{ −1.4 |s| }.

[Figure: four frames showing z(s) and the knot values x(s) for the different kernels]

General points:
- smooth kernels required
- spacing depends on kernel width
- knot spacing ≈ 1 sd for a normal k(s)
- kernel width is equivalent to the scale parameter in GP models


MRF formulation for the 1-d example

Data y = (y(s_1), ..., y(s_n))^T observed at locations s_1, ..., s_n. Once the knot locations ω_j, j = 1, ..., m, and the kernel choice k(s) are specified, the remaining model formulation is trivial:

Likelihood:
L(y | x, λ_y) ∝ λ_y^{n/2} exp{ -(1/2) λ_y (y - Kx)^T (y - Kx) },  where K_ij = k(ω_j - s_i).

Priors:
π(x | λ_x) ∝ λ_x^{m/2} exp{ -(1/2) λ_x x^T W x }
π(λ_x) ∝ λ_x^{a_x - 1} exp{ -b_x λ_x }
π(λ_y) ∝ λ_y^{a_y - 1} exp{ -b_y λ_y }

Posterior:
π(x, λ_x, λ_y | y) ∝ λ_y^{a_y + n/2 - 1} exp{ -λ_y [b_y + .5(y - Kx)^T (y - Kx)] } × λ_x^{a_x + m/2 - 1} exp{ -λ_x [b_x + .5 x^T W x] }

Posterior and full conditionals (MRF formulation)

Posterior:
π(x, λ_x, λ_y | y) ∝ λ_y^{a_y + n/2 - 1} exp{ -λ_y [b_y + .5(y - Kx)^T (y - Kx)] } × λ_x^{a_x + m/2 - 1} exp{ -λ_x [b_x + .5 x^T W x] }

Full conditionals:
π(x | ...) ∝ exp{ -(1/2)[ λ_y x^T K^T K x - 2 λ_y x^T K^T y + λ_x x^T W x ] }
π(λ_x | ...) ∝ λ_x^{a_x + m/2 - 1} exp{ -λ_x [b_x + .5 x^T W x] }
π(λ_y | ...) ∝ λ_y^{a_y + n/2 - 1} exp{ -λ_y [b_y + .5(y - Kx)^T (y - Kx)] }

Gibbs sampler implementation:
x | ... ~ N( (λ_y K^T K + λ_x W)^{-1} λ_y K^T y,  (λ_y K^T K + λ_x W)^{-1} )
λ_x | ... ~ Γ(a_x + m/2,  b_x + .5 x^T W x)
λ_y | ... ~ Γ(a_y + n/2,  b_y + .5(y - Kx)^T (y - Kx))
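The only change from the earlier sampler is that the prior precision λ_x I_m is replaced by λ_x W. As an illustration, assuming the first-difference W that appears later in these slides for the temporal prior, W might be built like this:

```python
import numpy as np

def rw1_precision(m):
    """First-difference MRF precision W: -1 for neighbors, diagonal = number of neighbors."""
    W = np.zeros((m, m))
    for i in range(m):
        if i > 0:
            W[i, i - 1] = -1.0
        if i < m - 1:
            W[i, i + 1] = -1.0
        W[i, i] = (i > 0) + (i < m - 1)
    return W

# In the Gibbs update for x, the prior precision lam_x * I_m becomes lam_x * W:
#   prec = lam_y * K.T @ K + lam_x * rw1_precision(m)
```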

1-d example using an MRF prior for x

[Figure: posterior mean & 80% CI for the estimated process z(s), and posterior mean for x and Kx, under four formulations: (a) iid x's, skinny k(s); (b) random-walk x's, skinny k(s); (c) iid x's, fat k(s); (d) GP model]

8-hour max ozone on a summer day in the Eastern US

n = 510 ozone measurements. The ω_k's are laid out on a hexagonal lattice as shown. k(s) is circular normal, with sd = lattice spacing.

Choice of width for k(s):
- could look at the empirical variogram
- could estimate using ML or cross-validation
- could treat it as an additional parameter in the posterior

[Figure: measurement locations and lattice over the Eastern US; color scale 20 to 100]

A multiresolution spatial model formulation

z(s) = z_coarse(s) + z_fine(s)

Coarse process: m_c = 27 locations ω_1^c, ..., ω_{m_c}^c on a hexagonal grid; x_c = (x_c1, ..., x_c m_c)^T ~ N(0, (1/λ_c) I_{m_c}); the coarse smoothing kernel k_c(s) is normal with sd = coarse grid spacing.

Fine process: m_f = 87 locations ω_1^f, ..., ω_{m_f}^f on a hexagonal grid; x_f = (x_f1, ..., x_f m_f)^T ~ N(0, (1/λ_f) I_{m_f}); the fine smoothing kernel k_f(s) is normal with sd = fine grid spacing.

Note: the coarse kernel width is twice the fine kernel width.

23 Multiresolution formulation and full conditionals

Model:
y = K_c x_c + K_f x_f + ɛ = Kx + ɛ,  where K = (K_c  K_f) and x = (x_c^T, x_f^T)^T.

Define
W_x = [ λ_c I_{m_c}   0 ;  0   λ_f I_{m_f} ]

The Gibbs sampler implementation then becomes:
x | ... ~ N( (λ_y K^T K + W_x)^{-1} λ_y K^T y,  (λ_y K^T K + W_x)^{-1} )
λ_c | ... ~ Γ(a_x + m_c/2,  b_x + .5 x_c^T x_c)
λ_f | ... ~ Γ(a_x + m_f/2,  b_x + .5 x_f^T x_f)
λ_y | ... ~ Γ(a_y + n/2,  b_y + .5(y - Kx)^T (y - Kx))
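A minimal sketch of the stacked design matrix and block prior precision, assuming the coarse and fine kernel matrices K_c and K_f have already been constructed (for example with the kernel code sketched earlier):

```python
import numpy as np

def multires_pieces(K_c, K_f, lam_c, lam_f):
    """Stack K = (K_c  K_f) and build the block prior precision W_x."""
    m_c, m_f = K_c.shape[1], K_f.shape[1]
    K = np.hstack([K_c, K_f])                                    # n x (m_c + m_f)
    W_x = np.block([[lam_c * np.eye(m_c), np.zeros((m_c, m_f))],
                    [np.zeros((m_f, m_c)), lam_f * np.eye(m_f)]])
    return K, W_x

# The joint update for x then uses the precision  lam_y * K.T @ K + W_x,
# exactly as in the single-resolution Gibbs sampler.
```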

Multiresolution model for 8-hour max ozone

[Figure: fitted ozone surface under the coarse formulation and under the coarse + fine formulation; color scale 20 to 100]

Basic binary classification example

[Figure: the data and the true field over the 20 × 20 spatial region]

Binary spatial process z*(s): the spatial area is partitioned into two regions, z*(s) = 1 and z*(s) = 0.

n = 10 measurements y = (y_1, ..., y_n)^T taken at spatial locations s_1, ..., s_n:
y_i = z*(s_i) + ɛ_i,  ɛ_i iid N(0, 1),  i = 1, ..., n.

Constructing a binary spatial process z*(s)

[Figure: knot values x, smoothing kernel k(s), the resulting surface z(s), and the clipped binary field z*(s) over the 20 × 20 region]

x convolved with k(s) gives z(s): x ~ N(0, I_m), where each x_j is located at ω_j, and

z(s) = Σ_{j=1}^m x_j k(s - ω_j)

Now define the binary field z*(s) by clipping z(s): z*(s) = I[z(s) > 0].

Model formulation

Define the n = 10 observations y = (y(s_1), ..., y(s_n))^T. Define z* = (z*(s_1), ..., z*(s_n))^T. Define x = (x(ω_1), ..., x(ω_m))^T to be the m = 25-vector of white-noise knot values at spatial grid sites ω_1, ..., ω_m. Recall that z*(s) and the vector z* are determined by the knot values x.

Likelihood:
L(y | z*) ∝ exp{ -(1/2)(y - z*)^T (y - z*) }

Independent normal prior for x:
π(x) ∝ exp{ -(1/2) x^T x }

Posterior distribution:
π(x | y) ∝ exp{ -(1/2)(y - z*)^T (y - z*) - (1/2) x^T x }

Sampling the posterior via Metropolis

Full conditional distributions:
π(x_j | x_{-j}, y) ∝ exp{ -(1/2)(y - z*)^T (y - z*) - (1/2) x_j^2 },  j = 1, ..., m

Metropolis implementation for sampling from π(x | y): initialize x at x = 0, then cycle through the full conditionals, updating each x_j according to the Metropolis rules:
- generate a proposal x_j* ~ U[x_j - r, x_j + r]
- compute the acceptance probability α = min{ 1, π(x_j* | x_{-j}, y) / π(x_j | x_{-j}, y) }
- update x_j to the new value: x_j^new = x_j* with probability α, x_j with probability 1 - α

Here we ran T = 1000 scans, giving realizations x^1, ..., x^T from the posterior, and discarded the first 100 for burn-in. Note: the proposal width r is tuned so that each x_j is accepted about half the time.
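A sketch of this Metropolis-within-Gibbs scheme, written for illustration rather than taken from the slides; it assumes a kernel matrix K mapping the knot values x to z(s) at the data sites, and it accepts on the log scale, which is equivalent to the ratio α above.

```python
import numpy as np

def metropolis_binary_field(y, K, n_scans=1000, r=1.0, seed=0):
    """Single-site Metropolis for the clipped-process model:
       z*(s_i) = I[(Kx)_i > 0],  y_i = z*(s_i) + eps_i,  x_j ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    n, m = K.shape
    x = np.zeros(m)                                  # initialize x at 0

    def log_post(x_vec):
        z_star = (K @ x_vec > 0).astype(float)       # clipped field at the data sites
        return -0.5 * np.sum((y - z_star) ** 2) - 0.5 * np.sum(x_vec ** 2)

    draws = np.empty((n_scans, m))
    for t in range(n_scans):
        for j in range(m):                           # cycle through the full conditionals
            prop = x.copy()
            prop[j] = x[j] + rng.uniform(-r, r)      # proposal x_j* ~ U[x_j - r, x_j + r]
            if np.log(rng.uniform()) < log_post(prop) - log_post(x):
                x = prop                             # accept with probability alpha
        draws[t] = x
    return draws                                     # discard an initial stretch as burn-in
```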

Sampling from non-standard multivariate distributions

Nick Metropolis: computing pioneer at Los Alamos National Laboratory; inventor of the Monte Carlo method; inventor of Markov chain Monte Carlo:

"Equation of State Calculations by Fast Computing Machines" (1953) by N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller and E. Teller, Journal of Chemical Physics. Originally implemented on the MANIAC I computer at LANL.

The algorithm constructs a Markov chain whose realizations are draws from the target (posterior) distribution, using steps that maintain detailed balance.

Gibbs sampling and Metropolis for a bivariate normal density

π(z_1, z_2) ∝ |Σ|^{-1/2} exp{ -(1/2) z^T Σ^{-1} z },  Σ = [ 1  ρ ; ρ  1 ]

Gibbs sampling (drawing from the full conditionals, also called the heat bath):
z_1 | z_2 ~ N(ρ z_2, 1 - ρ^2)
z_2 | z_1 ~ N(ρ z_1, 1 - ρ^2)

Metropolis updating:
- generate z_1* ~ U[z_1 - r, z_1 + r]
- calculate α = min{ 1, π(z_1*, z_2)/π(z_1, z_2) } = min{ 1, π(z_1* | z_2)/π(z_1 | z_2) }
- set z_1^new = z_1* with probability α, z_1 with probability 1 - α

31 Posterior realizations of z*(s) = I[z(s) > 0]

Posterior mean of z*(s) = I[z(s) > 0]

[Figure: the data, the true field, and the posterior mean over the 20 × 20 region]

Application: locating archeological sites

[Data: measurements on a 16 × 16 grid (9 sites unobserved, shown as ?) over a 10 m² area; values roughly in the range −1 to 4]

Assume y(s_i) | z*(s_i) ~ N(µ(z*(s_i)), 1), with µ(0) = 1 and µ(1) = 2.

Recall z(s) = Σ_{j=1}^m x_j k(s - ω_j) and z*(s) = I[z(s) > 0].

m = 8 × 8 lattice of knot locations ω_1, ..., ω_m; the sd of k(s) is equal to the knot spacing.

Posterior mean and realizations for z*(s) = I[z(s) > 0]

Bayesian formulation

[Figure: observed well data (Well.NW, Well.N, Well.NE, Well.W, Well.E, Well.SW, Well.S, Well.SE) plotted against time, 0 to 60]

L(y | η(z)) ∝ |Σ|^{-1/2} exp{ -(1/2)(y - η(z))^T Σ^{-1} (y - η(z)) }

π(z | λ_z) ∝ λ_z^{m/2} exp{ -(1/2) λ_z z^T W_z z }

π(λ_z) ∝ λ_z^{a_z - 1} exp{ -b_z λ_z }

π(z, λ_z | y) ∝ L(y | η(z)) π(z | λ_z) π(λ_z)

Posterior realizations of z under MRF and moving average priors

[Figure: well data; two MRF realizations and the MRF posterior mean (top row); two GP (moving average) realizations and the GP posterior mean (bottom row)]

44 References

D. Higdon (1998) A process-convolution approach to modeling temperatures in the North Atlantic Ocean (with discussion), Environmental and Ecological Statistics, 5:173–190.

D. Higdon (2002) Space and space-time modeling using process convolutions, in Quantitative Methods for Current Environmental Issues (C. Anderson, V. Barnett, P. C. Chatwin and A. H. El-Shaarawi, eds), 37–56.

D. Higdon, H. Lee and C. Holloman (2003) Markov chain Monte Carlo approaches for inference in computationally intensive inverse problems, in Bayesian Statistics 7, Proceedings of the Seventh Valencia International Meeting (Bernardo, Bayarri, Berger, Dawid, Heckerman, Smith and West, eds).

45 NON-STATIONARY SPATIAL CONVOLUTION MODELS

A convolution-based approach for building non-stationary spatial models

[Figure: white noise * smoothing kernel = spatial field]

z(s) = Σ_{j=1}^m x_j k(ω_j - s) = Σ_{j=1}^m x_j k_s(ω_j)

x ~ N(0, I_m), where each x_j is located at ω_j on a regular 2-d grid.

Convolutions for constructing non-stationary spatial models

z(s) = Σ_{j=1}^m x_j k_s(ω_j),  with x ~ N(0, λ_x^{-1} I_m) at regular 2-d lattice locations ω_1, ..., ω_m.

z(s) ~ GP(0, C(s_1, s_2)),  where C(s_1, s_2) = λ_x^{-1} Σ_{j=1}^m k_{s_1}(ω_j) k_{s_2}(ω_j)

The smoothing kernel k_s(·) changes smoothly over spatial location s.

Defining smoothly varying kernels via basis kernels

[Figure: three basis kernels k_1(·), k_2(·), k_3(·)]

Define k_s(·) = w_1(s) k_1(· - s) + w_2(s) k_2(· - s) + w_3(s) k_3(· - s)

k_s(·) is a weighted combination of kernels centered at s. Define weights that change smoothly over space.

Defining smoothly varying kernels via basis kernels

Define iid, mean-0 Gaussian processes φ_1(s), φ_2(s) and φ_3(s). Set

w_i(s) = exp{φ_i(s)} / [ exp{φ_1(s)} + exp{φ_2(s)} + exp{φ_3(s)} ]

Estimate the φ_i(s)'s like any other parameter in the analysis.
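A sketch of the weight construction, assuming the φ_i(s) surfaces are simply given as arrays (here random values, for illustration); the softmax form above guarantees that the weights are positive and sum to one at every s.

```python
import numpy as np

def kernel_weights(phi):
    """Softmax weights w_i(s) = exp(phi_i(s)) / sum_k exp(phi_k(s)).

    phi: array of shape (n_locations, n_basis_kernels) holding phi_i(s).
    """
    e = np.exp(phi - phi.max(axis=1, keepdims=True))   # numerically stabilized softmax
    return e / e.sum(axis=1, keepdims=True)

# Illustrative phi surfaces at 100 locations for 3 basis kernels (assumed values)
rng = np.random.default_rng(0)
phi = rng.normal(size=(100, 3))
w = kernel_weights(phi)                                # each row sums to 1
```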

50 An application to Piazza Road Superfund site

51 MOVING AVERAGE/BASIS SPACE-TIME MODELS

A convolution-based approach for building space-time models

[Figure: marked point process * smoothing kernel = space-time field, plotted over space and time]

Define the space-time domain S × T.
Define a discrete knot process x(s, t) on {(ω_1, τ_1), ..., (ω_m, τ_m)} within S × T.
Define a smoothing kernel k(s, t).
Construct the space-time process z(s, t):

z(s, t) = Σ_{j=1}^m k((s, t) - (ω_j, τ_j)) x(ω_j, τ_j),  or, with a varying kernel,  z(s, t) = Σ_{j=1}^m k_{st}(ω_j, τ_j) x_j

A space-time model for ocean temperatures

[Figure: ocean temperature (degrees C) at constant potential density over the North Atlantic, latitude 20N to 50N, longitude 30W to 5W]

Data: y = (y_1, ..., y_n)^T at space-time locations (s_1, t_1), ..., (s_n, t_n). Times: 1910–1988. Assume the data are centered (ȳ = 0).

Knot locations and kernels

[Figure: smoothing kernels and the latent grid over the North Atlantic]

m knot locations over a space-time grid (ω_1, τ_1), ..., (ω_m, τ_m). Spatial knot locations shown; temporal knot spacing 7 years, covering 1900–1995. Kernels vary with spatial location, k_{st}(ω, τ) = k_s(ω, τ), using a spatial mixture of normal basis kernels.

Formulation for the ocean example

Likelihood:
L(y | x, λ_y) ∝ λ_y^{n/2} exp{ -(1/2) λ_y (y - Kx)^T (y - Kx) },  where K_ij = k_{s_i t_i}(ω_j, τ_j).

Priors:
π(x | λ_x) ∝ λ_x^{m/2} exp{ -(1/2) λ_x x^T x }
π(λ_x) ∝ λ_x^{a_x - 1} exp{ -b_x λ_x }
π(λ_y) ∝ λ_y^{a_y - 1} exp{ -b_y λ_y }

Posterior:
π(x, λ_x, λ_y | y) ∝ λ_y^{a_y + n/2 - 1} exp{ -λ_y [b_y + .5(y - Kx)^T (y - Kx)] } × λ_x^{a_x + m/2 - 1} exp{ -λ_x [b_x + .5 x^T x] }

Full conditionals for the ocean formulation

Full conditionals:
π(x | ...) ∝ exp{ -(1/2)[ λ_y x^T K^T K x - 2 λ_y x^T K^T y + λ_x x^T x ] }
π(λ_x | ...) ∝ λ_x^{a_x + m/2 - 1} exp{ -λ_x [b_x + .5 x^T x] }
π(λ_y | ...) ∝ λ_y^{a_y + n/2 - 1} exp{ -λ_y [b_y + .5(y - Kx)^T (y - Kx)] }

Gibbs sampler implementation:
x | ... ~ N( (λ_y K^T K + λ_x I_m)^{-1} λ_y K^T y,  (λ_y K^T K + λ_x I_m)^{-1} )

or, one component at a time,
x_j | ... ~ N( λ_y r_j^T k_j / (λ_y k_j^T k_j + λ_x),  1 / (λ_y k_j^T k_j + λ_x) )

λ_x | ... ~ Γ(a_x + m/2,  b_x + .5 x^T x)
λ_y | ... ~ Γ(a_y + n/2,  b_y + .5(y - Kx)^T (y - Kx))

where k_j is the jth column of K and r_j = y - Σ_{j' ≠ j} k_{j'} x_{j'}.

Posterior mean of the space-time temperature field

[Figure: posterior mean temperature fields (degrees C, scale 6 to 14) for 1960, 1965, 1970, 1975, 1980 and 1985 over the North Atlantic]

Deviations from the time-averaged mean temperature field

[Figure: deviation fields (scale −1 to 1) for 1960, 1965, 1970, 1975, 1980 and 1985]

Posterior probabilities of differing from the time-averaged mean field

[Figure: posterior probability maps for 1960, 1965, 1970, 1975, 1980 and 1985]

Alternative approaches for building space-time models

[Figure: latent process * smoothing kernel = space-time field]

Define the space-time domain S × T with T = {1, ..., n_t}.
Discrete knot process x(s, t) on {ω_1, ..., ω_{n_s}} × {1, ..., n_t}, with n_t n_s = m.
Define a spatial smoothing kernel k(s).
Construct the space-time process z(s, t):

z(s, t) = Σ_{j=1}^{n_s} k(s - ω_j) x(ω_j, t),  or, with a varying kernel,  z(s, t) = Σ_{j=1}^{n_s} k_{st}(ω_j) x_{jt}

8-hour max ozone over summer days in the Eastern US

n = 510 ozone measurements on each of 30 days; the ω_k's are laid out on the same hexagonal lattice.

[Figure: daily ozone maps; color scale 0 to 100]

62 Temporally evolving latent x(s, t) process

Times: t ∈ T = {1, ..., n_t}, n_t = 30. Spatial knot locations: W = {ω_1, ..., ω_{n_s}}, n_s = 27. x(s, t) is defined on W × T.

The space-time process z(s, t) is obtained by convolving x(s, t) with k(s):

z(s, t) = Σ_{j=1}^{n_s} k(ω_j - s) x(ω_j, t) = Σ_{j=1}^{n_s} k_s(ω_j) x_{jt}

Specify locally linear MRF priors for each x_j = (x_j1, ..., x_j n_t)^T:

π(x_j | λ_x) ∝ λ_x^{n_t/2} exp{ -(1/2) λ_x x_j^T W x_j }

where
W_ij = −1 if |i − j| = 1;  1 if i = j = 1 or i = j = n_t;  2 if 1 < i = j < n_t;  0 otherwise.

So for x = (x_11, x_21, ..., x_{n_s}1, x_12, ..., x_{n_s}2, ..., x_1 n_t, ..., x_{n_s} n_t)^T,

π(x | λ_x) ∝ λ_x^{n_t n_s / 2} exp{ -(1/2) λ_x x^T (W ⊗ I_{n_s}) x }
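A short sketch of the stacked prior precision implied by this ordering of x; the matrix W below follows the definition on this slide, and the Kronecker product is the only new ingredient.

```python
import numpy as np

n_t, n_s = 30, 27                                     # values from the slide
# Temporal MRF precision W: -1 for neighbors, diagonal = number of neighbors
W = 2 * np.eye(n_t) - np.eye(n_t, k=1) - np.eye(n_t, k=-1)
W[0, 0] = W[-1, -1] = 1.0
# Prior precision of the stacked x (up to the factor lam_x); the Kronecker order
# matches the stacking x = (x_{1,1}, ..., x_{n_s,1}, x_{1,2}, ..., x_{n_s,n_t})^T
Q_prior = np.kron(W, np.eye(n_s))
```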

Formulation for temporally evolving z(s, t)

Data: at each time t, observe the n-vector y_t = (y_1t, ..., y_nt)^T at sites s_1, ..., s_n.

Likelihood for the data observed at time t:
L(y_t | x_t, λ_y) ∝ λ_y^{n/2} exp{ -(1/2) λ_y (y_t - K_t x_t)^T (y_t - K_t x_t) },  where K_t,ij = k(ω_j - s_i).

Define the n·n_t-vector y = (y_1^T, ..., y_{n_t}^T)^T. Likelihood for the entire data y:
L(y | x, λ_y) ∝ λ_y^{n n_t / 2} exp{ -(1/2) λ_y (y - Kx)^T (y - Kx) },  where K = diag(K_1, ..., K_{n_t}).

Priors:
π(x | λ_x) ∝ λ_x^{m/2} exp{ -(1/2) λ_x x^T W x }
π(λ_x) ∝ λ_x^{a_x - 1} exp{ -b_x λ_x }
π(λ_y) ∝ λ_y^{a_y - 1} exp{ -b_y λ_y }

Posterior and full conditionals

π(x, λ_x, λ_y | y) ∝ λ_y^{a_y + n n_t/2 - 1} exp{ -λ_y [b_y + .5(y - Kx)^T (y - Kx)] } × λ_x^{a_x + n_s n_t/2 - 1} exp{ -λ_x [b_x + .5 x^T W x] }

Full conditionals:
π(x | ...) ∝ exp{ -(1/2)[ λ_y x^T K^T K x - 2 λ_y x^T K^T y + λ_x x^T W x ] }
π(λ_x | ...) ∝ λ_x^{a_x + m/2 - 1} exp{ -λ_x [b_x + .5 x^T W x] }
π(λ_y | ...) ∝ λ_y^{a_y + n n_t/2 - 1} exp{ -λ_y [b_y + .5(y - Kx)^T (y - Kx)] }

Gibbs sampler implementation:
x | ... ~ N( (λ_y K^T K + λ_x W)^{-1} λ_y K^T y,  (λ_y K^T K + λ_x W)^{-1} )

or, one component at a time,
x_jt | ... ~ N( (λ_y r_tj^T k_tj + n_j λ_x x̄_jt) / (λ_y k_tj^T k_tj + n_j λ_x),  1 / (λ_y k_tj^T k_tj + n_j λ_x) )

λ_x | ... ~ Γ(a_x + m/2,  b_x + .5 x^T W x)
λ_y | ... ~ Γ(a_y + n n_t/2,  b_y + .5(y - Kx)^T (y - Kx))

where k_tj is the jth column of K_t, r_tj = y_t - Σ_{j' ≠ j} k_tj' x_j't, n_j is the number of temporal neighbors of x_jt, and x̄_jt is the mean of the neighbors of x_jt.

DLM setup for the ozone example

Given the latent process x_t = (x_1,t, ..., x_27,t)^T, t = 1, ..., 30, and y_t = (y_1t, ..., y_{n_y}t)^T at sites s_1t, ..., s_{n_y}t:

y_t = K_t x_t + ɛ_t
x_t = x_{t-1} + ν_t

where K_t is the n_y × 27 matrix given by K_t,ij = k(s_it - ω_j), t = 1, ..., 30,

ɛ_t i.i.d. N(0, σ_ɛ^2 I),  ν_t i.i.d. N(0, σ_ν^2 I),  t = 1, ..., 30,  and x_1 ~ N(0, σ_x^2 I_27).

- can use dynamic linear model (DLM)/Kalman filter machinery
- single-site MCMC works too

See Stroud et al. (2001) for an alternative model.
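A minimal forward simulation of this state-space setup, with assumed values throughout (a 1-d stand-in kernel matrix and arbitrary variances), just to make the recursion y_t = K_t x_t + ɛ_t, x_t = x_{t-1} + ν_t concrete:

```python
import numpy as np

rng = np.random.default_rng(0)
n_t, n_knots, n_y = 30, 27, 40                 # 30 days, 27 knots; n_y sites is an assumed value
sigma_eps, sigma_nu, sigma_x = 0.5, 0.3, 1.0   # assumed standard deviations

# Assumed 1-d stand-in for the spatial kernel matrix K_t (held constant over t here)
omega = np.linspace(0, 10, n_knots)
sites = np.linspace(0, 10, n_y)
K = np.exp(-0.5 * ((sites[:, None] - omega[None, :]) / 1.0) ** 2)

x = rng.normal(0, sigma_x, n_knots)            # x_1 ~ N(0, sigma_x^2 I)
xs, ys = [], []
for t in range(n_t):
    if t > 0:
        x = x + rng.normal(0, sigma_nu, n_knots)   # x_t = x_{t-1} + nu_t
    y = K @ x + rng.normal(0, sigma_eps, n_y)      # y_t = K_t x_t + eps_t
    xs.append(x.copy()); ys.append(y)
```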

Posterior mean for the first 9 days

[Figure: daily posterior mean ozone maps; color scale 0 to 100]

Posterior mean of selected x_j's

[Figure: posterior means of three selected latent process values x(s, t) over t = 1, ..., 30 days, under the independent-over-time prior (left) and the random-walk-over-time prior (right)]

68 References

D. Higdon (1998) A process-convolution approach to modeling temperatures in the North Atlantic Ocean (with discussion), Environmental and Ecological Statistics, 5:173–190.

J. Stroud, P. Müller and B. Sansó (2001) Dynamic Models for Spatio-Temporal Data, Journal of the Royal Statistical Society, Series B, 63, 673–689.

D. Higdon (2002) Space and space-time modeling using process convolutions, in Quantitative Methods for Current Environmental Issues (C. Anderson, V. Barnett, P. C. Chatwin and A. H. El-Shaarawi, eds), 37–56.

M. West and J. Harrison (1997) Bayesian Forecasting and Dynamic Models (Second Edition), Springer-Verlag.