Multi-resolution models for large data sets

Douglas Nychka, National Center for Atmospheric Research
National Science Foundation
NORDSTAT, Umeå, June 2012

Credits
Steve Sain, NCAR
Tia LeRud, UC Davis
Dorit Hammerling, U Michigan
Soutir Bandyopadhyay, Lehigh
Finn Lindgren, NTNU, Norway
D. Nychka LatticeKrig 2

Outline
A climate data set
Kriging...
Compact basis functions (Φ), Markov random fields (H)
The multi-resolution model
Properties of the spatial process
A climate example
Key idea: Introduce sparse basis and precision matrices without compromising the spatial model.

Observed mean summer precipitation
1720 stations reporting, mean for 1950-2010
[Figure: Observed JJA precipitation (0.1 mm)]

Kriging or Gaussian spatial process estimates

Estimating a curve or surface
An additive statistical model: given n pairs of observations $(x_i, y_i)$, $i = 1, \ldots, n$,
$y_i = g(x_i) + \epsilon_i$
The $\epsilon_i$ are random errors and g is an unknown, smooth realization of a Gaussian process.
The goals:
Estimate g(x) based on the observations.
Quantify the uncertainty in the estimate.

Random effects / linear model for g
$\{\Phi_j\}$: m basis functions
$g(x) = \sum_j \Phi_j(x) c_j$
A linear model: $y = \Phi c + \epsilon$
Random effects: $c \sim MN(0, \rho P)$ and $\epsilon \sim MN(0, \sigma^2 I)$
Implied covariance: $E[g(x) g(x')] = \sum_{j,k} \Phi_j(x) \, \rho P_{j,k} \, \Phi_k(x')$
$\lambda = \sigma^2/\rho$ plays an important role as a parameter.
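
The implied covariance is a quadratic form in the basis matrix, which a few lines of NumPy can make concrete. This is only a sketch: the triangular "hat" basis, the three centers, and the names `hat_basis` and `implied_covariance` are hypothetical stand-ins chosen for brevity, not anything from the talk.

```python
import numpy as np

def implied_covariance(Phi_x, Phi_xp, P, rho):
    """Cov[g(x), g(x')] = rho * Phi(x) P Phi(x')^T when c ~ MN(0, rho P)."""
    return rho * Phi_x @ P @ Phi_xp.T

# Hypothetical toy basis: three triangular bumps of half-width 0.5 on [0, 1].
centers = np.array([0.0, 0.5, 1.0])

def hat_basis(x):
    # Rows index locations, columns index basis functions.
    return np.clip(1.0 - np.abs(x[:, None] - centers[None, :]) / 0.5, 0.0, None)

x = np.linspace(0.0, 1.0, 5)
Phi = hat_basis(x)
P = np.eye(3)  # independent coefficients, purely for illustration
K = implied_covariance(Phi, Phi, P, rho=2.0)
```

Locations that share no basis function (here x = 0 and x = 1) get zero covariance, which is where the sparsity of the later slides comes from.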

Key ideas for large data sets
The inverse of P is chosen to be sparse.
Basis functions have compact support.
Still have a useful spatial model!
Why this works
Find c by ridge regression / conditional expectation / BLUE:
$\hat{g}(x) = E[g(x) \mid y, P] = \sum_{k=1}^{m} \hat{c}_k \Phi_k(x)$
$\hat{c} = (\Phi^T \Phi + \lambda P^{-1})^{-1} \Phi^T y$, $\lambda = \sigma^2/\rho$
$\Phi^T$, $\Phi^T \Phi$, and $P^{-1}$ are sparse.
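
A small numerical check that the ridge-regression form and the conditional-expectation (kriging) form give the same coefficients. The inputs are random stand-ins rather than real basis functions or data, the function names are hypothetical, and dense matrices are used only to keep the sketch short; the point of the slide is that in practice these matrices are sparse.

```python
import numpy as np

def ridge_coefficients(Phi, P_inv, y, lam):
    """Solve (Phi^T Phi + lam * P^{-1}) c = Phi^T y for c."""
    return np.linalg.solve(Phi.T @ Phi + lam * P_inv, Phi.T @ y)

def kriging_coefficients(Phi, P, y, lam):
    """Conditional-expectation form: c = P Phi^T (Phi P Phi^T + lam I)^{-1} y."""
    n = Phi.shape[0]
    return P @ Phi.T @ np.linalg.solve(Phi @ P @ Phi.T + lam * np.eye(n), y)

# Hypothetical toy inputs (random stand-ins, not real basis functions or data).
rng = np.random.default_rng(0)
n, m, lam = 50, 8, 0.3
Phi = rng.normal(size=(n, m))
B = rng.normal(size=(m, m))
P_inv = B.T @ B + np.eye(m)  # any symmetric positive definite precision matrix
P = np.linalg.inv(P_inv)
y = rng.normal(size=n)

c_ridge = ridge_coefficients(Phi, P_inv, y, lam)
c_krig = kriging_coefficients(Phi, P, y, lam)
```

The ridge form solves an m × m system involving only $P^{-1}$, while the kriging form solves an n × n system involving P itself, so only the first one benefits from a sparse precision matrix.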

Choosing the basis and P

A recipe for radial basis functions
Basis function: $\Phi_j(x) = \phi(\|x - u_j\|/\theta)$
$\phi$ is a positive definite, compactly supported function: a nice bump.
$\{u_j\}$: basis centers on a regular grid
$\theta$: scale set to provide some overlap
[Figure: 2-d Wendland bump function; standard Wendland (order = 2) radial 2-d function]
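
A sketch of this recipe on the unit square. It assumes the bump is the usual order-2 Wendland polynomial valid in up to three dimensions, $(1-d)^6(35d^2+18d+3)/3$ on $0 \le d < 1$; the slide does not spell out the formula, so treat that choice, and the names `wendland2` and `basis_matrix`, as assumptions.

```python
import numpy as np

def wendland2(d):
    """Order-2 Wendland bump (assumed form): 1 at d = 0, exactly 0 for d >= 1."""
    d = np.asarray(d, dtype=float)
    return np.where(d < 1.0, (1.0 - d) ** 6 * (35.0 * d**2 + 18.0 * d + 3.0) / 3.0, 0.0)

def basis_matrix(x, centers, theta):
    """Phi[i, j] = phi(||x_i - u_j|| / theta) for locations x and centers u."""
    dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    return wendland2(dist / theta)

# Centers on an 11 x 11 grid over the unit square, overlap of 2.5 grid spacings.
grid = np.linspace(0.0, 1.0, 11)
centers = np.array([(u, v) for u in grid for v in grid])
theta = 2.5 * (grid[1] - grid[0])

x = np.array([[0.5, 0.5]])          # one evaluation point, at a grid center
Phi = basis_matrix(x, centers, theta)
n_active = np.count_nonzero(Phi[0])  # basis functions that touch this point
```

With this overlap only a small fraction of the 121 basis functions are nonzero at any location, so the basis matrix is mostly zeros.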

A recipe for $P^{-1}$
Markov random field among coefficients: c is a spatial autoregression,
$(4 + \kappa^2) c_j - \sum_{l \in N} c_l = e_j$
$\{e_j\}$ are uncorrelated N(0,1) and N is the set of 4 nearest neighbors.
$Hc \sim MN(0, I)$ or $c \sim MN(0, (H^T H)^{-1})$, i.e. $P = (H^T H)^{-1}$
Weights in lattice format:
   .       -1        .
  -1   (4 + κ^2)    -1
   .       -1        .
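
One way to assemble H for a small rectangular lattice, as a rough sketch. The function name `lattice_sar_matrix` is hypothetical, and the boundary handling (edge nodes simply have fewer neighbors) is an assumption made here for simplicity. A dense array is used for readability; the sparsity pattern is what matters.

```python
import numpy as np

def lattice_sar_matrix(nx, ny, kappa):
    """H with (4 + kappa^2) on the diagonal and -1 for each lattice neighbor."""
    m = nx * ny
    idx = np.arange(m).reshape(nx, ny)   # node number at each lattice position
    H = np.zeros((m, m))
    H[np.arange(m), np.arange(m)] = 4.0 + kappa**2
    # Pair every node with its neighbor along each lattice direction.
    for a, b in [(idx[:-1, :], idx[1:, :]), (idx[:, :-1], idx[:, 1:])]:
        H[a.ravel(), b.ravel()] = -1.0
        H[b.ravel(), a.ravel()] = -1.0
    return H

H = lattice_sar_matrix(5, 5, kappa=0.5)
Q = H.T @ H            # precision matrix P^{-1}
center = 12            # interior node (2, 2) of the 5 x 5 lattice
```

Each interior row of H has 5 nonzero entries and the matching row of Q has 13, no matter how large the lattice is.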

What about SPDEs?
An alternative way to define $Q = P^{-1}$ is
$Q_{j,k} = \int L\Phi_j(x) \, L\Phi_k(x) \, dx$
where L is the differential operator associated with the SPDE.

Realizations of c
Simulated fields on a 101 × 101 lattice
[Figure: three simulated coefficient fields, for κ = 0.5, κ = 0.1, and κ = 0.01]
Note: κ acts as a range parameter.
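
Since $Hc \sim MN(0, I)$, a realization of c can be drawn by solving $Hc = e$ with e standard normal. The sketch below does this on a 20 × 20 lattice rather than 101 × 101, because it uses dense matrices for brevity. The helper names are hypothetical, and edge nodes are assumed to simply have fewer neighbors.

```python
import numpy as np

def path_adjacency(n):
    """Adjacency matrix of n nodes in a line (each linked to its neighbors)."""
    A = np.zeros((n, n))
    i = np.arange(n - 1)
    A[i, i + 1] = 1.0
    A[i + 1, i] = 1.0
    return A

def sar_matrix(nx, ny, kappa):
    """H = (4 + kappa^2) I - (4-nearest-neighbor adjacency of the lattice)."""
    adj = np.kron(path_adjacency(nx), np.eye(ny)) + np.kron(np.eye(nx), path_adjacency(ny))
    return (4.0 + kappa**2) * np.eye(nx * ny) - adj

def simulate_coefficients(nx, ny, kappa, rng):
    """Draw c with H c = e, e ~ N(0, I), returned on the lattice."""
    e = rng.standard_normal(nx * ny)
    return np.linalg.solve(sar_matrix(nx, ny, kappa), e).reshape(nx, ny)

rng = np.random.default_rng(1)
field = simulate_coefficients(20, 20, kappa=0.1, rng=rng)
```

Repeating the draw with κ = 0.5 and κ = 0.01 gives fields comparable in spirit to the three panels, with smaller κ producing longer-range structure and larger variance.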

Combining basis and coefficients
$g(x) = \sum_j \Phi_j(x) c_j$
[Figure: coefficient field and evaluated surface]
Normalized to give a constant marginal variance:
$E[g(x)^2] = \sum_{j,k} \Phi_j(x) P_{j,k} \Phi_k(x) = 1$
Overlap set to 2.5 units of lattice.
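
A sketch of the normalization step: compute the marginal variance $\Phi(x) P \Phi(x)^T$ at each location and rescale the basis functions there so that it equals 1. The Wendland formula, the 6 × 6 lattice, κ = 0.5, and all names are assumptions made for this illustration, and dense inverses are used only because the example is tiny.

```python
import numpy as np

def wendland2(d):
    # Assumed order-2 Wendland bump, zero for d >= 1.
    return np.where(d < 1.0, (1.0 - d) ** 6 * (35.0 * d**2 + 18.0 * d + 3.0) / 3.0, 0.0)

def path_adjacency(n):
    A = np.zeros((n, n))
    i = np.arange(n - 1)
    A[i, i + 1] = 1.0
    A[i + 1, i] = 1.0
    return A

n_side, kappa = 6, 0.5
grid = np.linspace(0.0, 1.0, n_side)
centers = np.array([(u, v) for u in grid for v in grid])
theta = 2.5 * (grid[1] - grid[0])  # overlap of 2.5 lattice units

# Lattice precision Q = H^T H and its (dense, toy-sized) inverse P.
A, I = path_adjacency(n_side), np.eye(n_side)
H = (4.0 + kappa**2) * np.eye(n_side**2) - (np.kron(A, I) + np.kron(I, A))
P = np.linalg.inv(H.T @ H)

rng = np.random.default_rng(2)
x = rng.uniform(size=(40, 2))  # arbitrary locations in the unit square
dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
Phi = wendland2(dist / theta)

var_raw = np.einsum("ij,jk,ik->i", Phi, P, Phi)   # marginal variance before scaling
Phi_norm = Phi / np.sqrt(var_raw)[:, None]        # rescale each location's row
var_norm = np.einsum("ij,jk,ik->i", Phi_norm, P, Phi_norm)
```

Before rescaling, the variance changes with the position of x relative to the lattice and its edges; after rescaling it is 1 everywhere.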

A convolution interpretation
$g(x) = \sum_j \Phi_j(x) c_j = \sum_j \phi((x - u_j)/\theta) \, c(u_j) = (1/m) \sum_j \phi((x - u_j)/\theta) \, (m) c(u_j) \approx \int \phi((x - u)/\theta) \, c(u) \, du$
Note: this only makes sense if centers are on a uniform grid.

Generalizing to a multi-resolution basis

A 1-d Multi-Res cartoon...
First level: 8 basis functions
Second level: 16 basis functions
More levels: Increasing by a factor of 2...

Example of multi-resolution in 2d
An example on the unit square starting with an 11 × 11 grid:
First level: centers on 11 × 11 grid, scale of 2.5/10 = 0.25, $11^2 = 121$ basis functions
Second level: centers on 21 × 21 grid, scale of 2.5/20 = 0.125, $21^2 = 441$ basis functions
Four level multi-resolution for this case has 8804 basis functions.
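
The total of 8804 follows from halving the grid spacing at each level. A quick check, where the third and fourth grid sizes (41 and 81) are inferred from that doubling rule rather than stated on the slide, and the function name is hypothetical:

```python
def multires_levels(n_first, n_levels, overlap=2.5):
    """Grid size per side, scale, and basis count for each level on the unit square."""
    levels = []
    n_side = n_first
    for _ in range(n_levels):
        levels.append((n_side, overlap / (n_side - 1), n_side**2))
        n_side = 2 * (n_side - 1) + 1   # halve the spacing: 11 -> 21 -> 41 -> 81
    return levels

levels = multires_levels(11, 4)
counts = [count for _, _, count in levels]
total = sum(counts)   # 121 + 441 + 1681 + 6561 = 8804
```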

Some assumptions
The multi-resolution:
$g(x) = g_1(x) + g_2(x) + \ldots + g_L(x) = \sum_j \Phi^1_j(x) c^1_j + \sum_j \Phi^2_j(x) c^2_j + \ldots + \sum_j \Phi^L_j(x) c^L_j$   (1)
Coefficients at each level follow a Markov random field.
Coefficients between levels are independent.
At least two parameters, $(\kappa_l, \rho_l)$, at each level.
[Figure: correlation functions for each level; left panel, correlation versus distance; right panel, log(1 - correlation) versus distance]
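
Because the levels are independent, the covariance of g is the $\rho_l$-weighted sum of the per-level covariances. The sketch below builds that sum for three small levels (4 × 4, 7 × 7, 13 × 13). The grid sizes, κ values, and weights are hypothetical, the Wendland formula is an assumed form, and each level is normalized to unit marginal variance before weighting.

```python
import numpy as np

def wendland2(d):
    # Assumed order-2 Wendland bump, zero for d >= 1.
    return np.where(d < 1.0, (1.0 - d) ** 6 * (35.0 * d**2 + 18.0 * d + 3.0) / 3.0, 0.0)

def path_adjacency(n):
    A = np.zeros((n, n))
    i = np.arange(n - 1)
    A[i, i + 1] = 1.0
    A[i + 1, i] = 1.0
    return A

def level_correlation(x, n_side, kappa, overlap=2.5):
    """Correlation matrix of one level g_l at locations x in the unit square."""
    grid = np.linspace(0.0, 1.0, n_side)
    centers = np.array([(u, v) for u in grid for v in grid])
    dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    Phi = wendland2(dist * (n_side - 1) / overlap)   # scale theta = overlap / (n_side - 1)
    A, I = path_adjacency(n_side), np.eye(n_side)
    H = (4.0 + kappa**2) * np.eye(n_side**2) - (np.kron(A, I) + np.kron(I, A))
    K = Phi @ np.linalg.inv(H.T @ H) @ Phi.T
    s = np.sqrt(np.diag(K))
    return K / np.outer(s, s)                        # unit marginal variance

def multires_covariance(x, n_first, kappas, rhos):
    """Sum over independent levels: K = sum_l rho_l * K_l."""
    K = np.zeros((len(x), len(x)))
    n_side = n_first
    for kappa, rho in zip(kappas, rhos):
        K += rho * level_correlation(x, n_side, kappa)
        n_side = 2 * (n_side - 1) + 1
    return K

# Locations along a short transect through the middle of the domain.
x = np.column_stack([np.linspace(0.3, 0.7, 9), np.full(9, 0.5)])
K = multires_covariance(x, n_first=4, kappas=[0.5, 0.5, 0.5], rhos=[0.5, 0.3, 0.2])
```

With weights summing to 1 the total marginal variance is 1, and changing the $\rho_l$ shifts the balance between long-range and short-range correlation.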

Properties of spatial process

Flexibility of LatticeKrig model
Fitting an exponential (minimizing mean squared error)
First level resolution of 10 × 10
[Figure: left panel, correlation versus distance for 3 levels, 4 levels, and the target exponential; right panel, error versus distance for 3 and 4 levels]
Also works well for approximating smoother covariances.

More flexibility of LatticeKrig model
Fitting a mixture of exponentials
First level resolution of 10 × 10
Target: 0.4 Exp(0.1) + 0.6 Exp(3)
[Figure: left panel, correlation versus distance for 3 levels, 4 levels, and the target; right panel, error versus distance for 3 and 4 levels]

Some theory
Switch to an infinite sum of independent convolution processes:
$g(x) = g_1(x) + g_2(x) + g_3(x) + \ldots$
$g_l(x)$ has marginal variance $\rho_l$ and spatial correlation range $\theta_l$.
What class of covariances can be approximated by letting $\{\rho_l\}$ and $\{\theta_l\}$ vary?
$\theta_l = 2^{-l}$ gives the power of 2 scaling of the multi-resolution.
$\rho_l = 2^{-l/2}$ gives a process with smoothness similar to an exponential (matches tail behavior of the spectral density).

The details: a theorem
$\theta_l = 2^{-l}$ and $\rho_l = \theta_l^{\beta_1}$
$c_l(u)$ follows a Matérn process with smoothness $\nu = 1$ and range $\theta_l$.
$\phi$ is a K-th order Wendland function.
$\beta_1 > 0$, $(\beta_1 + 1) < 5 + 2K$
If S(r) is the spectral density for the process g, then
$C_1 r^{-2(\beta_1 + 1)} < S(r) < C_2 r^{-2(\beta_1 + 1)}$ as $r \to \infty$.
Comments:
$\theta_l = 2^{-l}$ gives the power of 2 scaling of the multi-resolution.
$\beta_1 = 1/2$ gives an exponential covariance-like spectral density.
The tail behavior is not directly related to the smoothness of the basis functions or of the lattice process! They just need to be smooth enough.
Approximation is accurate after about 4-6 components.

Benefits of the multi-resolution
A mixture of differently scaled covariance functions can approximate standard covariance families.
A mixture has the flexibility to approximate more complex covariance functions.
For irregularly spaced observational data the distances among stations may vary widely, and a multi-scale covariance model will adjust to these differences.

Back to climate data

Some details
Used log transformation and weighted by number of observations.
Used stereographic projection for locations.
Elevation included as linear fixed effect.
Covariance parameters found by maximum likelihood using space filling designs and partial maximization over ρ and σ.

Estimated correlation functions
[Figure: correlation versus distance]
Matern MLE
Solid line: 4 level model beginning with 10 × 10 and a single κ
Dotted line: 1 level with 73 × 73 basis functions
In projected coordinates the size of the spatial domain is roughly 1.0 units.
Why these are so different is still uncertain.

Predicted field and uncertainty
For 4 level covariance model
[Figure: left panel, mean summer total rainfall (cm); right panel, prediction standard error / mean]
Standard errors found by conditional simulation of 100 fields.

Summary
Multi-resolution can approximate standard covariance families (e.g. Matern).
Computational efficiency gained by compact basis functions and sparse precision matrix.
Flexibility in model to account for nonstationary spatial dependence.
Transform to preserve nonnegative rainfall and add orographic covariates.
See LatticeKrig package in R.

Thank you!