Bayesian inference & process convolution models Dave Higdon, Statistical Sciences Group, LANL


1 Bayesian inference & process convolution models Dave Higdon, Statistical Sciences Group, LANL

2 MOVING AVERAGE SPATIAL MODELS

Kernel basis representation for spatial processes z(s)

Define m basis functions k_1(s), ..., k_m(s). Here k_j(s) is a normal density centered at spatial location ω_j:

k_j(s) = (1/√(2π)) exp{ -(1/2)(s - ω_j)^2 }

[Figure: the basis functions plotted over s]

Set z(s) = Σ_{j=1}^m k_j(s) x_j, where x ~ N(0, I_m).

Can represent z = (z(s_1), ..., z(s_n))^T as z = Kx, where K_ij = k_j(s_i).
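To make the kernel-basis construction concrete, here is a small illustrative sketch (not from the slides) that builds K from Gaussian kernels at assumed knot locations and draws one realization of z = Kx; the number of knots, their spacing, and the kernel width are arbitrary choices.

```python
import numpy as np

# Assumed illustrative settings (not from the slides): m knots, n evaluation sites.
m, n = 6, 200
omega = np.linspace(-2, 12, m)           # knot locations omega_j
s = np.linspace(-2, 12, n)               # locations s_i where z(s) is evaluated
sd = 1.0                                  # kernel standard deviation (a modeling choice)

# K_ij = k_j(s_i): normal density centered at omega_j, evaluated at s_i
K = np.exp(-0.5 * ((s[:, None] - omega[None, :]) / sd) ** 2) / (np.sqrt(2 * np.pi) * sd)

# One realization of the process: x ~ N(0, I_m), z = K x
rng = np.random.default_rng(0)
x = rng.standard_normal(m)
z = K @ x                                 # z(s_i) = sum_j k_j(s_i) x_j
```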

x and k(s) determine the spatial process z(s)

[Figure: left panel shows the weighted basis functions k_j(s) x_j; right panel shows the resulting z(s), both over s]

Continuous representation: z(s) = Σ_{j=1}^m k_j(s) x_j, where x ~ N(0, I_m).

Discrete representation: for z = (z(s_1), ..., z(s_n))^T, z = Kx, where K_ij = k_j(s_i).

Estimate z(s) by specifying k_j(s) and estimating x

[Figure: fitted basis functions k_j(s) x_j and the data y(s)]

Discrete representation: for z = (z(s_1), ..., z(s_n))^T, z = Kx, where K_ij = k_j(s_i).

Standard regression model: y = Kx + ɛ.

Formulation for the 1-d example

Data y = (y(s_1), ..., y(s_n))^T observed at locations s_1, ..., s_n. Once the knot locations ω_j, j = 1, ..., m, and the kernel choice k(s) are specified, the remaining model formulation is trivial:

Likelihood:
L(y | x, λ_y) ∝ λ_y^{n/2} exp{ -(1/2) λ_y (y - Kx)^T (y - Kx) },  where K_ij = k(ω_j - s_i).

Priors:
π(x | λ_x) ∝ λ_x^{m/2} exp{ -(1/2) λ_x x^T x }
π(λ_x) ∝ λ_x^{a_x - 1} exp{ -b_x λ_x }
π(λ_y) ∝ λ_y^{a_y - 1} exp{ -b_y λ_y }

Posterior:
π(x, λ_x, λ_y | y) ∝ λ_y^{a_y + n/2 - 1} exp{ -λ_y [b_y + .5(y - Kx)^T (y - Kx)] } × λ_x^{a_x + m/2 - 1} exp{ -λ_x [b_x + .5 x^T x] }

Posterior and full conditionals

Posterior:
π(x, λ_x, λ_y | y) ∝ λ_y^{a_y + n/2 - 1} exp{ -λ_y [b_y + .5(y - Kx)^T (y - Kx)] } × λ_x^{a_x + m/2 - 1} exp{ -λ_x [b_x + .5 x^T x] }

Full conditionals:
π(x | ...) ∝ exp{ -(1/2)[ λ_y x^T K^T K x - 2 λ_y x^T K^T y + λ_x x^T x ] }
π(λ_x | ...) ∝ λ_x^{a_x + m/2 - 1} exp{ -λ_x [b_x + .5 x^T x] }
π(λ_y | ...) ∝ λ_y^{a_y + n/2 - 1} exp{ -λ_y [b_y + .5(y - Kx)^T (y - Kx)] }

Gibbs sampler implementation:
x | ... ~ N( (λ_y K^T K + λ_x I_m)^{-1} λ_y K^T y,  (λ_y K^T K + λ_x I_m)^{-1} )
λ_x | ... ~ Γ(a_x + m/2,  b_x + .5 x^T x)
λ_y | ... ~ Γ(a_y + n/2,  b_y + .5(y - Kx)^T (y - Kx))
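These full conditionals translate directly into a Gibbs sampler. The sketch below is an illustrative implementation, not code from the slides: it assumes the data vector y and kernel matrix K have already been built, and it uses the rate parameterization of the gamma draws so that Γ(a, b) above corresponds to shape a and rate b.

```python
import numpy as np

def gibbs_convolution(y, K, a_x, b_x, a_y, b_y, n_iter=1000, seed=0):
    """Gibbs sampler for y = K x + eps, x ~ N(0, I/lam_x), eps ~ N(0, I/lam_y)."""
    rng = np.random.default_rng(seed)
    n, m = K.shape
    x, lam_x, lam_y = np.zeros(m), 1.0, 1.0
    KtK, Kty = K.T @ K, K.T @ y
    samples = []
    for _ in range(n_iter):
        # x | ... ~ N(V lam_y K'y, V),  V = (lam_y K'K + lam_x I)^(-1)
        prec = lam_y * KtK + lam_x * np.eye(m)
        chol = np.linalg.cholesky(prec)
        mean = np.linalg.solve(prec, lam_y * Kty)
        x = mean + np.linalg.solve(chol.T, rng.standard_normal(m))
        # lam_x | ... ~ Gamma(a_x + m/2, rate = b_x + .5 x'x)
        lam_x = rng.gamma(a_x + m / 2, 1.0 / (b_x + 0.5 * x @ x))
        # lam_y | ... ~ Gamma(a_y + n/2, rate = b_y + .5 (y - Kx)'(y - Kx))
        r = y - K @ x
        lam_y = rng.gamma(a_y + n / 2, 1.0 / (b_y + 0.5 * r @ r))
        samples.append((x.copy(), lam_x, lam_y))
    return samples
```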

Gibbs sampler: intuition

Gibbs sampler for a bivariate normal density:

π(z) = π(z_1, z_2) ∝ |Σ|^{-1/2} exp{ -(1/2) z^T Σ^{-1} z },  with Σ = [ 1  ρ ; ρ  1 ]

Full conditionals of π(z):
z_1 | z_2 ~ N(ρ z_2, 1 - ρ^2)
z_2 | z_1 ~ N(ρ z_1, 1 - ρ^2)

Initialize the chain with z^0 ~ N(0, Σ). Draw z_1^1 ~ N(ρ z_2^0, 1 - ρ^2); now (z_1^1, z_2^0)^T ~ π(z). Drawing z_2^1 ~ N(ρ z_1^1, 1 - ρ^2) completes z^1 = (z_1^1, z_2^1)^T.

[Figure: one Gibbs scan shown as the path z^0 = (z_1^0, z_2^0) → (z_1^1, z_2^0) → z^1 in the (z_1, z_2) plane]

Gibbs sampler: intuition

The Gibbs sampler gives z^0, z^1, ..., z^T, which can be treated as dependent draws from π(z). If z^0 is not a draw from π(z), then the initial realizations will not have the correct distribution; in practice the first 100? 1000? realizations are discarded as burn-in.

The draws can be used to make inference about π(z):

Posterior mean of z estimated by (µ̂_1, µ̂_2)^T = (1/T) Σ_{k=1}^T (z_1^k, z_2^k)^T

Posterior probabilities:
P(z_1 > 1) ≈ (1/T) Σ_{k=1}^T I[z_1^k > 1]
P(z_1 > z_2) ≈ (1/T) Σ_{k=1}^T I[z_1^k > z_2^k]

90% interval: (z_1^[5%], z_1^[95%]), the empirical 5th and 95th percentiles of the draws.
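As an illustration of the bivariate-normal example, here is a minimal sketch (not from the slides) of the two-step Gibbs scan and the Monte Carlo summaries it supports; the values of ρ, T, and the burn-in length are arbitrary.

```python
import numpy as np

rho, T, burn = 0.9, 5000, 500           # assumed values for illustration
rng = np.random.default_rng(1)
z = np.zeros((T, 2))
z1, z2 = 0.0, 0.0                        # starting point of the chain
for k in range(T):
    z1 = rng.normal(rho * z2, np.sqrt(1 - rho**2))   # z1 | z2 ~ N(rho z2, 1 - rho^2)
    z2 = rng.normal(rho * z1, np.sqrt(1 - rho**2))   # z2 | z1 ~ N(rho z1, 1 - rho^2)
    z[k] = (z1, z2)

draws = z[burn:]                         # discard burn-in realizations
post_mean = draws.mean(axis=0)           # estimate of E[z]
p_z1_gt_1 = (draws[:, 0] > 1).mean()     # estimate of P(z1 > 1)
p_z1_gt_z2 = (draws[:, 0] > draws[:, 1]).mean()
ci_90 = np.percentile(draws[:, 0], [5, 95])          # 90% interval for z1
```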

1-d example

m = 6 knots evenly spaced between −.3 and 1.2. n = 5 data points at s = .05, .25, .52, .65, .91. k(s) is N(0, sd = .3).

a_y = 10, b_y = 10(.25^2): a strong prior at λ_y = 1/.25^2; a_x = 1, b_x = .001.

[Figure: left, the basis functions k_j(s) and data y(s); right, MCMC traces of the x_j's and of λ_x, λ_y over 1000 iterations]

1-d example

From posterior realizations of the knot weights x, one can construct posterior realizations of the smooth fitted function z(s) = Σ_{j=1}^m k_j(s) x_j. Note that the strong prior on λ_y is required since n is small.

[Figure: left, posterior realizations of z(s) with the data y(s); right, posterior mean and pointwise 90% bounds]

1-d example

m = 20 knots evenly spaced between −2 and 12. n = 18 data points evenly spaced between 0 and 10. k(s) is N(0, sd = 2).

[Figure: the data; posterior mean and 80% CI for z(s); posterior distributions for λ_y and λ_x]

1-d example

m = 20 knots evenly spaced between −2 and 12. n = 18 data points evenly spaced between 0 and 10. k(s) is N(0, sd = 1).

[Figure: the data; posterior mean and 80% CI for z(s); posterior distributions for λ_y and λ_x]

Basis representations for spatial processes z(s)

Represent z(s) at spatial locations s_1, ..., s_n: z = (z(s_1), ..., z(s_n))^T ~ N(0, Σ_z). Recall z = Kx, where K K^T = Σ_z and x ~ N(0, I_n). This gives a discrete representation of z(s) at the locations s_1, ..., s_n.

Discrete representation: z(s_i) = Σ_{j=1}^n K_ij x_j
Continuous representation: z(s) = Σ_{j=1}^n k_j(s) x_j

[Figure: discrete and continuous basis functions over s]

Columns of K give basis functions. A subset of these basis functions can be used to reduce dimensionality.

[Figure: basis functions and implied covariances for three decompositions of Σ_z (Cholesky with pivoting, SVD, and kernels), plotted against location s]

How many basis kernels?

Define m basis functions k_1(s), ..., k_m(s).

[Figure: basis functions and implied covariances for m = 20, m = 10, and m = 6 kernels over location s]

Moving average specifications for spatial models z(s)

White noise process x = (x_1, ..., x_m) at spatial locations ω_1, ..., ω_m, with x ~ N(0, (1/λ_x) I_m).

The spatial process z(s) is constructed by convolving x with a smoothing kernel k(s):

z(s) = Σ_{j=1}^m x_j k(ω_j - s)

z(s) is a Gaussian process with mean 0 and covariance

Cov(z(s), z(s')) = (1/λ_x) Σ_{j=1}^m k(ω_j - s) k(ω_j - s')
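A small sketch (illustrative only) of the induced covariance: it evaluates Cov(z(s), z(s')) from the kernel sum above on a 1-d grid, with assumed knot locations and kernel width.

```python
import numpy as np

def convolution_covariance(s_grid, omega, sd=1.0, lam_x=1.0):
    """Cov(z(s), z(s')) = (1/lam_x) * sum_j k(omega_j - s) k(omega_j - s')."""
    def k(d):
        # normal-density smoothing kernel with standard deviation sd (a modeling choice)
        return np.exp(-0.5 * (d / sd) ** 2) / (np.sqrt(2 * np.pi) * sd)
    K = k(omega[None, :] - s_grid[:, None])    # K[i, j] = k(omega_j - s_i)
    return (K @ K.T) / lam_x                   # covariance matrix over s_grid

# Assumed illustrative settings (not from the slides)
omega = np.linspace(-2, 12, 20)                # knot locations
s_grid = np.linspace(0, 10, 50)
C = convolution_covariance(s_grid, omega, sd=1.0)
```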


Example: constructing 1-d models for z(s)

m = 20 knot locations ω_1, ..., ω_m equally spaced between −2 and 12. x = (x_1, ..., x_m)^T ~ N(0, I_m).

z(s) = Σ_{k=1}^m k(ω_k - s) x_k

k(s) is a normal density with sd = 1/4, 1/2, and 1; the 4th frame uses k(s) = 1.4 exp{ −1.4 |s| }.

[Figure: four frames showing z(s) and the knot values x(s) for the different kernels]

General points:
- smooth kernels required
- spacing depends on kernel width
- knot spacing ≈ 1 sd for a normal k(s)
- kernel width is equivalent to the scale parameter in GP models


MRF formulation for the 1-d example

Data y = (y(s_1), ..., y(s_n))^T observed at locations s_1, ..., s_n. Once the knot locations ω_j, j = 1, ..., m, and the kernel choice k(s) are specified, the remaining model formulation is trivial:

Likelihood:
L(y | x, λ_y) ∝ λ_y^{n/2} exp{ -(1/2) λ_y (y - Kx)^T (y - Kx) },  where K_ij = k(ω_j - s_i).

Priors:
π(x | λ_x) ∝ λ_x^{m/2} exp{ -(1/2) λ_x x^T W x }
π(λ_x) ∝ λ_x^{a_x - 1} exp{ -b_x λ_x }
π(λ_y) ∝ λ_y^{a_y - 1} exp{ -b_y λ_y }

Posterior:
π(x, λ_x, λ_y | y) ∝ λ_y^{a_y + n/2 - 1} exp{ -λ_y [b_y + .5(y - Kx)^T (y - Kx)] } × λ_x^{a_x + m/2 - 1} exp{ -λ_x [b_x + .5 x^T W x] }

Posterior and full conditionals (MRF formulation)

Posterior:
π(x, λ_x, λ_y | y) ∝ λ_y^{a_y + n/2 - 1} exp{ -λ_y [b_y + .5(y - Kx)^T (y - Kx)] } × λ_x^{a_x + m/2 - 1} exp{ -λ_x [b_x + .5 x^T W x] }

Full conditionals:
π(x | ...) ∝ exp{ -(1/2)[ λ_y x^T K^T K x - 2 λ_y x^T K^T y + λ_x x^T W x ] }
π(λ_x | ...) ∝ λ_x^{a_x + m/2 - 1} exp{ -λ_x [b_x + .5 x^T W x] }
π(λ_y | ...) ∝ λ_y^{a_y + n/2 - 1} exp{ -λ_y [b_y + .5(y - Kx)^T (y - Kx)] }

Gibbs sampler implementation:
x | ... ~ N( (λ_y K^T K + λ_x W)^{-1} λ_y K^T y,  (λ_y K^T K + λ_x W)^{-1} )
λ_x | ... ~ Γ(a_x + m/2,  b_x + .5 x^T W x)
λ_y | ... ~ Γ(a_y + n/2,  b_y + .5(y - Kx)^T (y - Kx))
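The only change from the earlier sampler is that the prior precision λ_x I_m is replaced by λ_x W. As an illustration, assuming the first-difference W that appears later in these slides for the temporal prior, W might be built like this:

```python
import numpy as np

def rw1_precision(m):
    """First-difference MRF precision W: -1 for neighbors, diagonal = number of neighbors."""
    W = np.zeros((m, m))
    for i in range(m):
        if i > 0:
            W[i, i - 1] = -1.0
        if i < m - 1:
            W[i, i + 1] = -1.0
        W[i, i] = (i > 0) + (i < m - 1)
    return W

# In the Gibbs update for x, the prior precision lam_x * I_m becomes lam_x * W:
#   prec = lam_y * K.T @ K + lam_x * rw1_precision(m)
```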

1-d example using an MRF prior for x

[Figure: posterior mean & 80% CI for the estimated process z(s), and posterior mean for x and Kx, under four formulations: (a) iid x's, skinny k(s); (b) random-walk x's, skinny k(s); (c) iid x's, fat k(s); (d) GP model]

8-hour max ozone on a summer day in the Eastern US

n = 510 ozone measurements. The ω_k's are laid out on a hexagonal lattice as shown. k(s) is circular normal, with sd = lattice spacing.

Choice of width for k(s):
- could look at the empirical variogram
- could estimate using ML or cross-validation
- could treat it as an additional parameter in the posterior

[Figure: measurement locations and lattice over the Eastern US; color scale 20 to 100]

A multiresolution spatial model formulation

z(s) = z_coarse(s) + z_fine(s)

Coarse process: m_c = 27 locations ω_1^c, ..., ω_{m_c}^c on a hexagonal grid; x_c = (x_c1, ..., x_c m_c)^T ~ N(0, (1/λ_c) I_{m_c}); the coarse smoothing kernel k_c(s) is normal with sd = coarse grid spacing.

Fine process: m_f = 87 locations ω_1^f, ..., ω_{m_f}^f on a hexagonal grid; x_f = (x_f1, ..., x_f m_f)^T ~ N(0, (1/λ_f) I_{m_f}); the fine smoothing kernel k_f(s) is normal with sd = fine grid spacing.

Note: the coarse kernel width is twice the fine kernel width.

23 Multiresolution formulation and full conditionals

Model:
y = K_c x_c + K_f x_f + ɛ = Kx + ɛ,  where K = (K_c  K_f) and x = (x_c^T, x_f^T)^T.

Define
W_x = [ λ_c I_{m_c}   0 ;  0   λ_f I_{m_f} ]

The Gibbs sampler implementation then becomes:
x | ... ~ N( (λ_y K^T K + W_x)^{-1} λ_y K^T y,  (λ_y K^T K + W_x)^{-1} )
λ_c | ... ~ Γ(a_x + m_c/2,  b_x + .5 x_c^T x_c)
λ_f | ... ~ Γ(a_x + m_f/2,  b_x + .5 x_f^T x_f)
λ_y | ... ~ Γ(a_y + n/2,  b_y + .5(y - Kx)^T (y - Kx))
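A minimal sketch of the stacked design matrix and block prior precision, assuming the coarse and fine kernel matrices K_c and K_f have already been constructed (for example with the kernel code sketched earlier):

```python
import numpy as np

def multires_pieces(K_c, K_f, lam_c, lam_f):
    """Stack K = (K_c  K_f) and build the block prior precision W_x."""
    m_c, m_f = K_c.shape[1], K_f.shape[1]
    K = np.hstack([K_c, K_f])                                    # n x (m_c + m_f)
    W_x = np.block([[lam_c * np.eye(m_c), np.zeros((m_c, m_f))],
                    [np.zeros((m_f, m_c)), lam_f * np.eye(m_f)]])
    return K, W_x

# The joint update for x then uses the precision  lam_y * K.T @ K + W_x,
# exactly as in the single-resolution Gibbs sampler.
```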

Multiresolution model for 8-hour max ozone

[Figure: fitted ozone surface under the coarse formulation and under the coarse + fine formulation; color scale 20 to 100]

Basic binary classification example

[Figure: the data and the true field over the 20 × 20 spatial region]

Binary spatial process z*(s): the spatial area is partitioned into two regions, z*(s) = 1 and z*(s) = 0.

n = 10 measurements y = (y_1, ..., y_n)^T taken at spatial locations s_1, ..., s_n:
y_i = z*(s_i) + ɛ_i,  ɛ_i iid N(0, 1),  i = 1, ..., n.

Constructing a binary spatial process z*(s)

[Figure: knot values x, smoothing kernel k(s), the resulting surface z(s), and the clipped binary field z*(s) over the 20 × 20 region]

x convolved with k(s) gives z(s): x ~ N(0, I_m), where each x_j is located at ω_j, and

z(s) = Σ_{j=1}^m x_j k(s - ω_j)

Now define the binary field z*(s) by clipping z(s): z*(s) = I[z(s) > 0].

Model formulation

Define the n = 10 observations y = (y(s_1), ..., y(s_n))^T. Define z* = (z*(s_1), ..., z*(s_n))^T. Define x = (x(ω_1), ..., x(ω_m))^T to be the m = 25-vector of white-noise knot values at spatial grid sites ω_1, ..., ω_m. Recall that z*(s) and the vector z* are determined by the knot values x.

Likelihood:
L(y | z*) ∝ exp{ -(1/2)(y - z*)^T (y - z*) }

Independent normal prior for x:
π(x) ∝ exp{ -(1/2) x^T x }

Posterior distribution:
π(x | y) ∝ exp{ -(1/2)(y - z*)^T (y - z*) - (1/2) x^T x }

Sampling the posterior via Metropolis

Full conditional distributions:
π(x_j | x_{-j}, y) ∝ exp{ -(1/2)(y - z*)^T (y - z*) - (1/2) x_j^2 },  j = 1, ..., m

Metropolis implementation for sampling from π(x | y): initialize x at x = 0, then cycle through the full conditionals, updating each x_j according to the Metropolis rules:
- generate a proposal x_j* ~ U[x_j - r, x_j + r]
- compute the acceptance probability α = min{ 1, π(x_j* | x_{-j}, y) / π(x_j | x_{-j}, y) }
- update x_j to the new value: x_j^new = x_j* with probability α, x_j with probability 1 - α

Here we ran T = 1000 scans, giving realizations x^1, ..., x^T from the posterior, and discarded the first 100 for burn-in. Note: the proposal width r is tuned so that each x_j is accepted about half the time.
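A sketch of this Metropolis-within-Gibbs scheme, written for illustration rather than taken from the slides; it assumes a kernel matrix K mapping the knot values x to z(s) at the data sites, and it accepts on the log scale, which is equivalent to the ratio α above.

```python
import numpy as np

def metropolis_binary_field(y, K, n_scans=1000, r=1.0, seed=0):
    """Single-site Metropolis for the clipped-process model:
       z*(s_i) = I[(Kx)_i > 0],  y_i = z*(s_i) + eps_i,  x_j ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    n, m = K.shape
    x = np.zeros(m)                                  # initialize x at 0

    def log_post(x_vec):
        z_star = (K @ x_vec > 0).astype(float)       # clipped field at the data sites
        return -0.5 * np.sum((y - z_star) ** 2) - 0.5 * np.sum(x_vec ** 2)

    draws = np.empty((n_scans, m))
    for t in range(n_scans):
        for j in range(m):                           # cycle through the full conditionals
            prop = x.copy()
            prop[j] = x[j] + rng.uniform(-r, r)      # proposal x_j* ~ U[x_j - r, x_j + r]
            if np.log(rng.uniform()) < log_post(prop) - log_post(x):
                x = prop                             # accept with probability alpha
        draws[t] = x
    return draws                                     # discard an initial stretch as burn-in
```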

Sampling from non-standard multivariate distributions

Nick Metropolis: computing pioneer at Los Alamos National Laboratory; inventor of the Monte Carlo method; inventor of Markov chain Monte Carlo:

"Equation of State Calculations by Fast Computing Machines" (1953) by N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller and E. Teller, Journal of Chemical Physics. Originally implemented on the MANIAC I computer at LANL.

The algorithm constructs a Markov chain whose realizations are draws from the target (posterior) distribution, using steps that maintain detailed balance.

Gibbs sampling and Metropolis for a bivariate normal density

π(z_1, z_2) ∝ |Σ|^{-1/2} exp{ -(1/2) z^T Σ^{-1} z },  Σ = [ 1  ρ ; ρ  1 ]

Gibbs sampling (drawing from the full conditionals, also called the heat bath):
z_1 | z_2 ~ N(ρ z_2, 1 - ρ^2)
z_2 | z_1 ~ N(ρ z_1, 1 - ρ^2)

Metropolis updating:
- generate z_1* ~ U[z_1 - r, z_1 + r]
- calculate α = min{ 1, π(z_1*, z_2)/π(z_1, z_2) } = min{ 1, π(z_1* | z_2)/π(z_1 | z_2) }
- set z_1^new = z_1* with probability α, z_1 with probability 1 - α

31 Posterior realizations of z*(s) = I[z(s) > 0]

Posterior mean of z*(s) = I[z(s) > 0]

[Figure: the data, the true field, and the posterior mean over the 20 × 20 region]

Application: locating archeological sites

[Data: measurements on a 16 × 16 grid (9 sites unobserved, shown as ?) over a 10 m² area; values roughly in the range −1 to 4]

Assume y(s_i) | z*(s_i) ~ N(µ(z*(s_i)), 1), with µ(0) = 1 and µ(1) = 2.

Recall z(s) = Σ_{j=1}^m x_j k(s - ω_j) and z*(s) = I[z(s) > 0].

m = 8 × 8 lattice of knot locations ω_1, ..., ω_m; the sd of k(s) is equal to the knot spacing.

Posterior mean and realizations for z*(s) = I[z(s) > 0]

Bayesian formulation

[Figure: observed well data (Well.NW, Well.N, Well.NE, Well.W, Well.E, Well.SW, Well.S, Well.SE) plotted against time, 0 to 60]

L(y | η(z)) ∝ |Σ|^{-1/2} exp{ -(1/2)(y - η(z))^T Σ^{-1} (y - η(z)) }

π(z | λ_z) ∝ λ_z^{m/2} exp{ -(1/2) λ_z z^T W_z z }

π(λ_z) ∝ λ_z^{a_z - 1} exp{ -b_z λ_z }

π(z, λ_z | y) ∝ L(y | η(z)) π(z | λ_z) π(λ_z)

Posterior realizations of z under MRF and moving average priors

[Figure: well data; two MRF realizations and the MRF posterior mean (top row); two GP (moving average) realizations and the GP posterior mean (bottom row)]

44 References

D. Higdon (1998) A process-convolution approach to modeling temperatures in the North Atlantic Ocean (with discussion), Environmental and Ecological Statistics, 5:173–190.

D. Higdon (2002) Space and space-time modeling using process convolutions, in Quantitative Methods for Current Environmental Issues (C. Anderson, V. Barnett, P. C. Chatwin and A. H. El-Shaarawi, eds), 37–56.

D. Higdon, H. Lee and C. Holloman (2003) Markov chain Monte Carlo approaches for inference in computationally intensive inverse problems, in Bayesian Statistics 7, Proceedings of the Seventh Valencia International Meeting (Bernardo, Bayarri, Berger, Dawid, Heckerman, Smith and West, eds).

45 NON-STATIONARY SPATIAL CONVOLUTION MODELS

A convolution-based approach for building non-stationary spatial models

[Figure: white noise * smoothing kernel = spatial field]

z(s) = Σ_{j=1}^m x_j k(ω_j - s) = Σ_{j=1}^m x_j k_s(ω_j)

x ~ N(0, I_m), where each x_j is located at ω_j on a regular 2-d grid.

Convolutions for constructing non-stationary spatial models

z(s) = Σ_{j=1}^m x_j k_s(ω_j),  with x ~ N(0, λ_x^{-1} I_m) at regular 2-d lattice locations ω_1, ..., ω_m.

z(s) ~ GP(0, C(s_1, s_2)),  where C(s_1, s_2) = λ_x^{-1} Σ_{j=1}^m k_{s_1}(ω_j) k_{s_2}(ω_j)

The smoothing kernel k_s(·) changes smoothly over spatial location s.

Defining smoothly varying kernels via basis kernels

[Figure: three basis kernels k_1(·), k_2(·), k_3(·)]

Define k_s(·) = w_1(s) k_1(· - s) + w_2(s) k_2(· - s) + w_3(s) k_3(· - s)

k_s(·) is a weighted combination of kernels centered at s. Define weights that change smoothly over space.

Defining smoothly varying kernels via basis kernels

Define iid, mean-0 Gaussian processes φ_1(s), φ_2(s) and φ_3(s). Set

w_i(s) = exp{φ_i(s)} / [ exp{φ_1(s)} + exp{φ_2(s)} + exp{φ_3(s)} ]

Estimate the φ_i(s)'s like any other parameter in the analysis.
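A sketch of the weight construction, assuming the φ_i(s) surfaces are simply given as arrays (here random values, for illustration); the softmax form above guarantees that the weights are positive and sum to one at every s.

```python
import numpy as np

def kernel_weights(phi):
    """Softmax weights w_i(s) = exp(phi_i(s)) / sum_k exp(phi_k(s)).

    phi: array of shape (n_locations, n_basis_kernels) holding phi_i(s).
    """
    e = np.exp(phi - phi.max(axis=1, keepdims=True))   # numerically stabilized softmax
    return e / e.sum(axis=1, keepdims=True)

# Illustrative phi surfaces at 100 locations for 3 basis kernels (assumed values)
rng = np.random.default_rng(0)
phi = rng.normal(size=(100, 3))
w = kernel_weights(phi)                                # each row sums to 1
```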

50 An application to Piazza Road Superfund site

51 MOVING AVERAGE/BASIS SPACE-TIME MODELS

A convolution-based approach for building space-time models

[Figure: marked point process * smoothing kernel = space-time field, plotted over space and time]

Define the space-time domain S × T.
Define a discrete knot process x(s, t) on {(ω_1, τ_1), ..., (ω_m, τ_m)} within S × T.
Define a smoothing kernel k(s, t).
Construct the space-time process z(s, t):

z(s, t) = Σ_{j=1}^m k((s, t) - (ω_j, τ_j)) x(ω_j, τ_j),  or, with a varying kernel,  z(s, t) = Σ_{j=1}^m k_{st}(ω_j, τ_j) x_j

A space-time model for ocean temperatures

[Figure: ocean temperature (degrees C) at constant potential density over the North Atlantic, latitude 20N to 50N, longitude 30W to 5W]

Data: y = (y_1, ..., y_n)^T at space-time locations (s_1, t_1), ..., (s_n, t_n). Times: 1910–1988. Assume the data are centered (ȳ = 0).

Knot locations and kernels

[Figure: smoothing kernels and the latent grid over the North Atlantic]

m knot locations over a space-time grid (ω_1, τ_1), ..., (ω_m, τ_m). Spatial knot locations shown; temporal knot spacing 7 years, covering 1900–1995. Kernels vary with spatial location, k_{st}(ω, τ) = k_s(ω, τ), using a spatial mixture of normal basis kernels.

Formulation for the ocean example

Likelihood:
L(y | x, λ_y) ∝ λ_y^{n/2} exp{ -(1/2) λ_y (y - Kx)^T (y - Kx) },  where K_ij = k_{s_i t_i}(ω_j, τ_j).

Priors:
π(x | λ_x) ∝ λ_x^{m/2} exp{ -(1/2) λ_x x^T x }
π(λ_x) ∝ λ_x^{a_x - 1} exp{ -b_x λ_x }
π(λ_y) ∝ λ_y^{a_y - 1} exp{ -b_y λ_y }

Posterior:
π(x, λ_x, λ_y | y) ∝ λ_y^{a_y + n/2 - 1} exp{ -λ_y [b_y + .5(y - Kx)^T (y - Kx)] } × λ_x^{a_x + m/2 - 1} exp{ -λ_x [b_x + .5 x^T x] }

Full conditionals for the ocean formulation

Full conditionals:
π(x | ...) ∝ exp{ -(1/2)[ λ_y x^T K^T K x - 2 λ_y x^T K^T y + λ_x x^T x ] }
π(λ_x | ...) ∝ λ_x^{a_x + m/2 - 1} exp{ -λ_x [b_x + .5 x^T x] }
π(λ_y | ...) ∝ λ_y^{a_y + n/2 - 1} exp{ -λ_y [b_y + .5(y - Kx)^T (y - Kx)] }

Gibbs sampler implementation:
x | ... ~ N( (λ_y K^T K + λ_x I_m)^{-1} λ_y K^T y,  (λ_y K^T K + λ_x I_m)^{-1} )

or, one component at a time,
x_j | ... ~ N( λ_y r_j^T k_j / (λ_y k_j^T k_j + λ_x),  1 / (λ_y k_j^T k_j + λ_x) )

λ_x | ... ~ Γ(a_x + m/2,  b_x + .5 x^T x)
λ_y | ... ~ Γ(a_y + n/2,  b_y + .5(y - Kx)^T (y - Kx))

where k_j is the jth column of K and r_j = y - Σ_{j' ≠ j} k_{j'} x_{j'}.

Posterior mean of the space-time temperature field

[Figure: posterior mean temperature fields (degrees C, scale 6 to 14) for 1960, 1965, 1970, 1975, 1980 and 1985 over the North Atlantic]

Deviations from the time-averaged mean temperature field

[Figure: deviation fields (scale −1 to 1) for 1960, 1965, 1970, 1975, 1980 and 1985]

Posterior probabilities of differing from the time-averaged mean field

[Figure: posterior probability maps for 1960, 1965, 1970, 1975, 1980 and 1985]

Alternative approaches for building space-time models

[Figure: latent process * smoothing kernel = space-time field]

Define the space-time domain S × T with T = {1, ..., n_t}.
Discrete knot process x(s, t) on {ω_1, ..., ω_{n_s}} × {1, ..., n_t}, with n_t n_s = m.
Define a spatial smoothing kernel k(s).
Construct the space-time process z(s, t):

z(s, t) = Σ_{j=1}^{n_s} k(s - ω_j) x(ω_j, t),  or, with a varying kernel,  z(s, t) = Σ_{j=1}^{n_s} k_{st}(ω_j) x_{jt}

8-hour max ozone over summer days in the Eastern US

n = 510 ozone measurements on each of 30 days; the ω_k's are laid out on the same hexagonal lattice.

[Figure: daily ozone maps; color scale 0 to 100]

62 Temporally evolving latent x(s, t) process

Times: t ∈ T = {1, ..., n_t}, n_t = 30. Spatial knot locations: W = {ω_1, ..., ω_{n_s}}, n_s = 27. x(s, t) is defined on W × T.

The space-time process z(s, t) is obtained by convolving x(s, t) with k(s):

z(s, t) = Σ_{j=1}^{n_s} k(ω_j - s) x(ω_j, t) = Σ_{j=1}^{n_s} k_s(ω_j) x_{jt}

Specify locally linear MRF priors for each x_j = (x_j1, ..., x_j n_t)^T:

π(x_j | λ_x) ∝ λ_x^{n_t/2} exp{ -(1/2) λ_x x_j^T W x_j }

where
W_ij = −1 if |i − j| = 1;  1 if i = j = 1 or i = j = n_t;  2 if 1 < i = j < n_t;  0 otherwise.

So for x = (x_11, x_21, ..., x_{n_s}1, x_12, ..., x_{n_s}2, ..., x_1 n_t, ..., x_{n_s} n_t)^T,

π(x | λ_x) ∝ λ_x^{n_t n_s / 2} exp{ -(1/2) λ_x x^T (W ⊗ I_{n_s}) x }
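A short sketch of the stacked prior precision implied by this ordering of x; the matrix W below follows the definition on this slide, and the Kronecker product is the only new ingredient.

```python
import numpy as np

n_t, n_s = 30, 27                                     # values from the slide
# Temporal MRF precision W: -1 for neighbors, diagonal = number of neighbors
W = 2 * np.eye(n_t) - np.eye(n_t, k=1) - np.eye(n_t, k=-1)
W[0, 0] = W[-1, -1] = 1.0
# Prior precision of the stacked x (up to the factor lam_x); the Kronecker order
# matches the stacking x = (x_{1,1}, ..., x_{n_s,1}, x_{1,2}, ..., x_{n_s,n_t})^T
Q_prior = np.kron(W, np.eye(n_s))
```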

Formulation for temporally evolving z(s, t)

Data: at each time t, observe the n-vector y_t = (y_1t, ..., y_nt)^T at sites s_1, ..., s_n.

Likelihood for the data observed at time t:
L(y_t | x_t, λ_y) ∝ λ_y^{n/2} exp{ -(1/2) λ_y (y_t - K_t x_t)^T (y_t - K_t x_t) },  where K_t,ij = k(ω_j - s_i).

Define the n·n_t-vector y = (y_1^T, ..., y_{n_t}^T)^T. Likelihood for the entire data y:
L(y | x, λ_y) ∝ λ_y^{n n_t / 2} exp{ -(1/2) λ_y (y - Kx)^T (y - Kx) },  where K = diag(K_1, ..., K_{n_t}).

Priors:
π(x | λ_x) ∝ λ_x^{m/2} exp{ -(1/2) λ_x x^T W x }
π(λ_x) ∝ λ_x^{a_x - 1} exp{ -b_x λ_x }
π(λ_y) ∝ λ_y^{a_y - 1} exp{ -b_y λ_y }

Posterior and full conditionals

π(x, λ_x, λ_y | y) ∝ λ_y^{a_y + n n_t/2 - 1} exp{ -λ_y [b_y + .5(y - Kx)^T (y - Kx)] } × λ_x^{a_x + n_s n_t/2 - 1} exp{ -λ_x [b_x + .5 x^T W x] }

Full conditionals:
π(x | ...) ∝ exp{ -(1/2)[ λ_y x^T K^T K x - 2 λ_y x^T K^T y + λ_x x^T W x ] }
π(λ_x | ...) ∝ λ_x^{a_x + m/2 - 1} exp{ -λ_x [b_x + .5 x^T W x] }
π(λ_y | ...) ∝ λ_y^{a_y + n n_t/2 - 1} exp{ -λ_y [b_y + .5(y - Kx)^T (y - Kx)] }

Gibbs sampler implementation:
x | ... ~ N( (λ_y K^T K + λ_x W)^{-1} λ_y K^T y,  (λ_y K^T K + λ_x W)^{-1} )

or, one component at a time,
x_jt | ... ~ N( (λ_y r_tj^T k_tj + n_j λ_x x̄_jt) / (λ_y k_tj^T k_tj + n_j λ_x),  1 / (λ_y k_tj^T k_tj + n_j λ_x) )

λ_x | ... ~ Γ(a_x + m/2,  b_x + .5 x^T W x)
λ_y | ... ~ Γ(a_y + n n_t/2,  b_y + .5(y - Kx)^T (y - Kx))

where k_tj is the jth column of K_t, r_tj = y_t - Σ_{j' ≠ j} k_tj' x_j't, n_j is the number of temporal neighbors of x_jt, and x̄_jt is the mean of the neighbors of x_jt.

DLM setup for the ozone example

Given the latent process x_t = (x_1,t, ..., x_27,t)^T, t = 1, ..., 30, and y_t = (y_1t, ..., y_{n_y}t)^T at sites s_1t, ..., s_{n_y}t:

y_t = K_t x_t + ɛ_t
x_t = x_{t-1} + ν_t

where K_t is the n_y × 27 matrix given by K_t,ij = k(s_it - ω_j), t = 1, ..., 30,

ɛ_t i.i.d. N(0, σ_ɛ^2 I),  ν_t i.i.d. N(0, σ_ν^2 I),  t = 1, ..., 30,  and x_1 ~ N(0, σ_x^2 I_27).

- can use dynamic linear model (DLM)/Kalman filter machinery
- single-site MCMC works too

See Stroud et al. (2001) for an alternative model.
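A minimal forward simulation of this state-space setup, with assumed values throughout (a 1-d stand-in kernel matrix and arbitrary variances), just to make the recursion y_t = K_t x_t + ɛ_t, x_t = x_{t-1} + ν_t concrete:

```python
import numpy as np

rng = np.random.default_rng(0)
n_t, n_knots, n_y = 30, 27, 40                 # 30 days, 27 knots; n_y sites is an assumed value
sigma_eps, sigma_nu, sigma_x = 0.5, 0.3, 1.0   # assumed standard deviations

# Assumed 1-d stand-in for the spatial kernel matrix K_t (held constant over t here)
omega = np.linspace(0, 10, n_knots)
sites = np.linspace(0, 10, n_y)
K = np.exp(-0.5 * ((sites[:, None] - omega[None, :]) / 1.0) ** 2)

x = rng.normal(0, sigma_x, n_knots)            # x_1 ~ N(0, sigma_x^2 I)
xs, ys = [], []
for t in range(n_t):
    if t > 0:
        x = x + rng.normal(0, sigma_nu, n_knots)   # x_t = x_{t-1} + nu_t
    y = K @ x + rng.normal(0, sigma_eps, n_y)      # y_t = K_t x_t + eps_t
    xs.append(x.copy()); ys.append(y)
```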

Posterior mean for the first 9 days

[Figure: daily posterior mean ozone maps; color scale 0 to 100]

Posterior mean of selected x_j's

[Figure: posterior means of three selected latent process values x(s, t) over t = 1, ..., 30 days, under the independent-over-time prior (left) and the random-walk-over-time prior (right)]

68 References

D. Higdon (1998) A process-convolution approach to modeling temperatures in the North Atlantic Ocean (with discussion), Environmental and Ecological Statistics, 5:173–190.

J. Stroud, P. Müller and B. Sansó (2001) Dynamic Models for Spatio-Temporal Data, Journal of the Royal Statistical Society, Series B, 63, 673–689.

D. Higdon (2002) Space and space-time modeling using process convolutions, in Quantitative Methods for Current Environmental Issues (C. Anderson, V. Barnett, P. C. Chatwin and A. H. El-Shaarawi, eds), 37–56.

M. West and J. Harrison (1997) Bayesian Forecasting and Dynamic Models (Second Edition), Springer-Verlag.