Spatial modelling using Gaussian Markov Random Fields:
Applied to seabed reconstruction

Johan Lindström, Finn Lindgren, David Bolin, Håvard Rue, Peter Jonsson

Centre for Mathematical Sciences, Lund University
Engineering Geology, Lund University
Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim
Department of Statistics, University of Washington

Seattle, May 4, 2009

Overview
- Seabed reconstruction: background; what is done today?
- Spatial statistics: interpolation; computational costs
- Gaussian Markov random fields: basics; approximating Matérn covariances; fast estimation of the field; results
- A non-stationary extension: the model; estimation
- Results

Background
- A data set consists of geometry (x, y, z).
- Plane coordinates (x, y) are typically provided by GPS.
- Depth z is measured using a Single Beam EchoSounder (SBES).
- Data are collected along transect lines, with dense data along track but no coverage between tracks.
Given point measurements of depth:
1. How do we reconstruct the seabed?
2. Where should the next measurement be?
What's done today?
1. Triangulate/grid the intermediate points.
2. Interpolate the values using:
   - linear interpolation
   - splines
- This gives no information about the uncertainties.
- Statistical methods can do better than this.

Spatial statistics: interpolation
- Spatial statistics in general deals with data observed over an area, and with dependence between observations taken at different locations.
- A basic problem in spatial statistics is to use observations Y_i, taken at a number of locations \{s_i\}_{i=1}^n, to make inference about the value Y_0 at an unobserved location s_0.
- Assume the observations come from a Gaussian field,

    [Y_0; Y] \in N( [\mu_0; \mu], [\Sigma_{00}, \Sigma_{0n}; \Sigma_{n0}, \Sigma_{nn}] ).

- Given known mean and covariance matrix, the optimal predictor is the conditional mean,

    E(Y_0 | Y_1, ..., Y_n) = \mu_0 + \Sigma_{0n} \Sigma_{nn}^{-1} (Y - \mu).

- However, the mean and covariance are generally not known.
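The conditional-mean predictor can be sketched as follows; the exponential covariance, locations, and parameters here are made up for illustration, and the mean is assumed known (zero):

```python
import numpy as np

def exp_cov(s, t, sigma2=1.0, kappa=1.0):
    """Exponential covariance between 1D location arrays s and t."""
    d = np.abs(s[:, None] - t[None, :])
    return sigma2 * np.exp(-kappa * d)

# Observation locations along a line, and one unobserved location s0.
s = np.array([0.0, 1.0, 2.5, 4.0])
s0 = np.array([1.7])

Sigma_nn = exp_cov(s, s)
Sigma_0n = exp_cov(s0, s).ravel()
Sigma_00 = exp_cov(s0, s0)[0, 0]

# One draw from the (zero-mean) field at the observation locations.
rng = np.random.default_rng(0)
Y = np.linalg.cholesky(Sigma_nn) @ rng.standard_normal(len(s))

# Conditional mean and variance at s0 given the observations.
w = np.linalg.solve(Sigma_nn, Sigma_0n)
cond_mean = w @ Y
cond_var = Sigma_00 - Sigma_0n @ w
```

Note that, unlike the deterministic interpolators above, the conditional variance quantifies the uncertainty at the unobserved location.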
Gaussian fields
- Assume a parametric form for the covariance and mean, Y \in N(\mu_\theta, \Sigma_\theta).
- Up to an additive constant, the log-likelihood is

    \log L(\theta | Y) = -\frac{1}{2} \log|\Sigma_\theta| - \frac{1}{2} (Y - \mu_\theta)^T \Sigma_\theta^{-1} (Y - \mu_\theta).

- Estimate the parameters by maximising the log-likelihood.
- The reconstruction becomes E(Y_0 | Y_1, ..., Y_n, \hat{\theta}).
- Ideally we'd like to integrate out the parameter uncertainty, often done using MCMC.

Computational costs
- Both optimisation and MCMC will require repeated evaluations of the log-likelihood above.
- Evaluating the log-likelihood has two difficulties:
1. The covariance matrix is an n-by-n matrix, which contains (n^2 + n)/2 unique elements.
2. Calculating the determinant, inverse, or Cholesky factor of the covariance matrix requires O(n^3) operations.
- For the depth data we have 11705 observations. Storing the covariance matrix requires roughly 1 GB, and evaluating the log-likelihood takes 5 minutes on a Core 2 Duo T8100 laptop.
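The standard way to evaluate this log-likelihood is via a Cholesky factor of the covariance matrix, which is exactly the O(n^3) step; a minimal sketch (with a made-up exponential covariance), checked against scipy's reference implementation:

```python
import numpy as np
from scipy import stats

def gauss_loglik(Y, mu, Sigma):
    """Gaussian log-likelihood, computed via a Cholesky factor."""
    L = np.linalg.cholesky(Sigma)            # Sigma = L L^T, the O(n^3) step
    r = np.linalg.solve(L, Y - mu)           # r = L^{-1}(Y - mu)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (len(Y) * np.log(2 * np.pi) + logdet + r @ r)

rng = np.random.default_rng(1)
n = 200
s = np.sort(rng.uniform(0, 10, n))
Sigma = np.exp(-np.abs(s[:, None] - s[None, :]))   # exponential covariance
mu = np.zeros(n)
Y = np.linalg.cholesky(Sigma) @ rng.standard_normal(n)

ll = gauss_loglik(Y, mu, Sigma)
ll_ref = stats.multivariate_normal(mean=mu, cov=Sigma).logpdf(Y)
```

For n = 11705 this dense factorisation is what costs minutes per evaluation, which motivates the sparse-precision (GMRF) approach below.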
Gaussian Markov Random Fields (GMRFs)
- A Gaussian Markov random field (GMRF) is a Gaussian random field with a Markov property.
- The neighbours N_i of a point s_i are the points that in some sense are close to s_i.
- The Gaussian random field x \in N(\mu, Q^{-1}) has a joint distribution that satisfies

    p(x_i | \{x_j : j \ne i\}) = p(x_i | \{x_j : j \in N_i\}),

  and for j \notin N_i,

    x_i \perp x_j | \{x_k : k \notin \{i, j\}\}  \iff  Q_{ij} = 0.

- The density is

    p(x) = (2\pi)^{-n/2} |Q|^{1/2} \exp( -\frac{1}{2} (x - \mu)^T Q (x - \mu) ).

- Fast algorithms that utilise the sparsity of Q exist (the C package GMRFLib).
- See Rue and Held (2005) for extensive details on GMRFs.

How do we choose Q?
- A GMRF may be computationally effective, but it has been difficult to construct precision matrices that result in reasonable Gaussian fields.
- Various ad-hoc methods exist. A common solution is to use a small neighbourhood and let the precision between two points depend on the distance between the points.
- Rue and Tjelmeland (2002) created GMRFs on rectangular grids that approximate Gaussian fields with a wide class of covariance functions.
- Having the field defined only on a regular grid leads to issues with mapping the observations to the grid points.
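The equivalence between the Markov property and zeros in Q can be seen numerically with an AR(1) process (an example of mine, not from the slides): its covariance matrix is completely dense, yet each point depends only on its two neighbours, so the precision matrix is tridiagonal.

```python
import numpy as np

# AR(1): x_t = a x_{t-1} + e_t. The neighbours of t are t-1 and t+1, so
# Q_ij should vanish whenever |i - j| > 1.
n, a = 8, 0.7
idx = np.arange(n)
Sigma = a ** np.abs(idx[:, None] - idx[None, :]) / (1 - a**2)  # dense
Q = np.linalg.inv(Sigma)                                       # tridiagonal

off = Q[np.abs(idx[:, None] - idx[None, :]) > 1]  # non-neighbour entries
```

The sparsity of Q (here only O(n) nonzeros instead of n^2) is what the fast algorithms exploit.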
Approximating Matérn covariances
- The Matérn covariance family on u \in R^d:

    r(u, v) = C(x(u), x(v)) = \frac{1}{\Gamma(\nu) 2^{\nu - 1}} (\kappa \|v - u\|)^\nu K_\nu(\kappa \|v - u\|),

  with scale (inverse range) \kappa > 0 and shape/smoothness \nu > 0. Here K_\nu is a modified Bessel function.
- Fields with Matérn covariances are solutions to a Stochastic Partial Differential Equation, SPDE (Whittle, 1954),

    (\kappa^2 - \Delta)^{\alpha/2} x(u) = \phi E(u),

  where E(u) is spatial white noise and \alpha = \nu + d/2.
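The Matérn formula above is straightforward to evaluate with scipy's modified Bessel function; a quick sanity check (parameters made up) is that \nu = 1/2 recovers the exponential covariance:

```python
import numpy as np
from scipy.special import kv, gamma

def matern(d, kappa, nu):
    """Matern correlation (unit variance) at distances d > 0."""
    z = kappa * np.asarray(d, dtype=float)
    return (z ** nu) * kv(nu, z) / (gamma(nu) * 2 ** (nu - 1))

d = np.linspace(0.1, 5.0, 50)
r_half = matern(d, kappa=1.3, nu=0.5)   # nu = 1/2 should be exponential
r_exp = np.exp(-1.3 * d)
```

This works because K_{1/2}(z) = sqrt(pi / (2z)) exp(-z), which cancels the normalising constant exactly.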
Construction of Q (Lindgren and Rue, 2007)
- The GMRF approximation is constructed using a finite element method to solve the SPDE.
- For \alpha = 2 the precision matrix of the approximating GMRF can be written as

    Q = (\kappa^2 C + G) C^{-1} C_E C^{-1} (\kappa^2 C + G) = (C^{-1} G + \kappa^2 I)^T C_E (C^{-1} G + \kappa^2 I).

- \kappa^2 C + G arises from the finite element approximation of (\kappa^2 - \Delta).
- C_E is the precision of the driving spatial white noise; it can be shown that C_E = C.
- C is tri-diagonal, with dense inverse. To obtain a sparse Q-matrix, Lindgren and Rue use a diagonal approximation, \tilde{C}.

Construction of Q: an example
- Given a regular grid, and taking \alpha = 2, the finite element approximation of (\kappa^2 - \Delta) has the stencil

    [  0            -1             0  ]
    [ -1        4 + \kappa^2      -1  ]
    [  0            -1             0  ]

- The corresponding elements in Q are (up to the noise scaling)

    [ 0        0                 1                  0          0 ]
    [ 0        2        -2(4 + \kappa^2)            2          0 ]
    [ 1  -2(4 + \kappa^2)  (4 + \kappa^2)^2 + 4  -2(4 + \kappa^2)  1 ]
    [ 0        2        -2(4 + \kappa^2)            2          0 ]
    [ 0        0                 1                  0          0 ]

- For an irregular triangulation things are slightly more complicated.
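On a regular grid the \alpha = 2 precision stencil is just the (\kappa^2 - \Delta) stencil convolved with itself (the operator applied twice), so the 5x5 table above can be reproduced in a couple of lines; \kappa^2 = 0.5 is an arbitrary choice for illustration:

```python
import numpy as np
from scipy.signal import convolve2d

kappa2 = 0.5
# Finite difference / finite element stencil of (kappa^2 - Delta).
A = np.array([[0.0, -1.0, 0.0],
              [-1.0, 4.0 + kappa2, -1.0],
              [0.0, -1.0, 0.0]])

# Convolving the stencil with itself gives the 5x5 precision stencil of Q
# (up to the noise scaling).
Q_stencil = convolve2d(A, A)
```

The centre element comes out as (4 + \kappa^2)^2 + 4, the first-order neighbours as -2(4 + \kappa^2), matching the table.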
INLA: fast estimation (Rue and Martino, 2007)
1. Assume an underlying GMRF with possibly non-Gaussian point observations, i.e.

    x \in N(\mu_\theta, Q_\theta^{-1}),    p(y_i | x, \theta) = p(y_i | x_i, \theta).

2. Obtain a Gaussian approximation of the posterior,

    p(x | y, \theta) \propto \exp( -\frac{1}{2} (x - \mu)^T Q (x - \mu) + \sum_i \log p(y_i | x_i, \theta) ),

   through a Taylor expansion of the log-observation density.
3. Use the Gaussian approximation to do numerical optimisation and integration of the log-likelihood.
- This provides a fast way of obtaining posteriors. Errors due to the Taylor expansion and numerical integration are usually smaller than the MCMC errors from a reasonable MCMC run.

INLA: results
- Posterior densities for the hyperparameters of the model and for the underlying field were estimated using INLA.
- Total estimation time on a Core 2 Duo laptop: 65 minutes.

Variance as a function of distance
- In a stationary model the variance will depend only on the distance to the closest measurement point.

    [Figure: V(X | Y) as a function of distance to the closest measurement.]

- If we use the variance to decide where to measure, this implies that we should measure far from existing measurements.
- The underlying field is most likely non-stationary. We need a non-stationary model and a way of estimating it.
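The Gaussian-approximation step (step 2) can be sketched with a toy model of my own choosing, not from the slides: a small GMRF prior with Poisson point observations, where a Newton iteration on the log-posterior yields the approximating mean (the mode) and precision (the negative Hessian at the mode).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20
# A proper tridiagonal (random-walk-like) prior precision, values made up.
Q = (np.diag(2.05 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1))
y = rng.poisson(1.0, size=n)        # Poisson observations, y_i | x_i ~ Po(exp(x_i))

x = np.zeros(n)
for _ in range(50):
    lam = np.exp(x)
    grad = -Q @ x + (y - lam)       # gradient of the log-posterior
    H = Q + np.diag(lam)            # negative Hessian = approximating precision
    step = np.linalg.solve(H, grad)
    x = x + step
    if np.max(np.abs(step)) < 1e-10:
        break

x_mode, Q_approx = x, Q + np.diag(np.exp(x))
```

The approximating Gaussian N(x_mode, Q_approx^{-1}) is what makes the subsequent optimisation and integration over hyperparameters cheap.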
A non-stationary extension
We now introduce a non-stationary version of the SPDE through two modifications:
1. Drive the SPDE with independent Gaussian noise, but let the variance be a function of the location.
2. Let the range parameter, \kappa^2, vary in space.
Taking \alpha = 2 we obtain

    (\kappa^2(s) - \Delta) x(s) = \frac{1}{\sqrt{\phi(s)}} E(s),

where \phi(s) is a spatially varying precision of the driving noise. Introducing diagonal matrices \phi and \kappa^2 with elements (\kappa^2)_{ii} = \kappa^2(s_i) and \phi_{ii} = \phi(s_i), the non-stationary precision matrix becomes

    Q = (C^{-1} G + \kappa^2)^T \tilde{C} \phi (C^{-1} G + \kappa^2).

The model
- Gaussian point observations: Y_j | X \in N(X(s_j), \sigma^2).
- An underlying GMRF: X \in N(B \mu_\theta, Q_{\phi,\kappa}^{-1}).
- The trend, B \mu_\theta, is assumed to consist of a constant and a linear trend, with a N(0, 10^6 I) prior for \theta.
- Precision, \phi, and range, \kappa^2, are modelled using a set of basis functions (e.g. B-splines),

    \phi(s) = \exp(B(s) q_\phi)  and  \kappa^2(s) = \exp(B(s) q_\kappa).

- Ideally we would like a smoothness prior on \phi and \kappa^2, e.g.

    \log \phi(s) \in N(\phi_0, (\alpha_\phi Q_0)^{-1}),

  where the \alpha:s are hyper-parameters. This results in the following prior on q_\phi (and similarly on q_\kappa):

    q_\phi \in N(0, (\alpha_\phi B^T Q_0 B)^{-1}).

- Finally we take hyper-priors on the \alpha:s.
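Assembling the non-stationary precision matrix can be sketched in 1D with a lumped (diagonal) mass matrix; the grid, the spatially varying \kappa^2(s), and \phi(s) below are all made up for illustration, and the resulting Q should come out symmetric positive definite:

```python
import numpy as np

n, h = 50, 0.1
s = np.arange(n) * h

C = np.diag(h * np.ones(n))                          # lumped mass matrix C~
G = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h              # 1D stiffness matrix
kappa2 = np.diag(np.exp(0.5 * np.sin(s)))            # spatially varying range
phi = np.diag(np.exp(0.2 * s))                       # spatially varying precision

# Q = (C^{-1} G + kappa^2)^T C~ phi (C^{-1} G + kappa^2)
M = np.linalg.solve(C, G) + kappa2
Q = M.T @ (C @ phi) @ M
```

Since C, \kappa^2 and \phi are all diagonal, Q inherits the sparsity pattern of G^T G (here pentadiagonal), so the computational advantages of the stationary construction are kept.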
Estimation
- The posterior density is

    p(X, \theta, q_\phi, q_\kappa, \alpha, \sigma^2 | Y) \propto p(Y | X, \sigma^2) p(X | \theta, q_\phi, q_\kappa) p(\theta) p(q_\phi | \alpha_\phi) p(q_\kappa | \alpha_\kappa) p(\alpha) p(\sigma^2).

- p(X, \theta | q_\phi, q_\kappa, \alpha, \sigma^2, Y) is jointly Gaussian, so we can integrate out X and \theta, obtaining p(q_\phi, q_\kappa, \alpha, \sigma^2 | Y).
- Further, it is possible to explicitly calculate the derivatives of the log-likelihood, \log p(q_\phi, q_\kappa, \alpha, \sigma^2 | Y).
- With the derivatives we use an ordinary BFGS algorithm to obtain ML estimates of the parameters.
- Estimation of parameters, along with calculation of the conditional posterior expectation and variance, now takes slightly less than one hour on a Core 2 Duo laptop.
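The optimise-the-marginal-likelihood workflow can be illustrated with a toy stationary 1D GMRF whose precision depends on a single parameter (this sketch is mine and only shows the BFGS mechanics, not the full non-stationary model):

```python
import numpy as np
from scipy.optimize import minimize

n = 200
# Fixed random-walk structure matrix; Q(kappa2) = kappa2 * I + R.
R = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1))

def Qmat(log_kappa2):
    return np.exp(log_kappa2) * np.eye(n) + R

# Simulate x ~ N(0, Q^{-1}) at a "true" parameter value.
rng = np.random.default_rng(3)
true_log_kappa2 = np.log(0.5)
L = np.linalg.cholesky(Qmat(true_log_kappa2))
x = np.linalg.solve(L.T, rng.standard_normal(n))

def negloglik(par):
    Q = Qmat(par[0])
    sign, logdet = np.linalg.slogdet(Q)
    return -0.5 * (logdet - x @ Q @ x)     # up to an additive constant

res = minimize(negloglik, x0=np.array([0.0]), method="BFGS")
```

In the actual model the explicit derivatives of the log-likelihood are supplied to the optimiser instead of the numerical gradients scipy falls back on here.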
Results: variance as a function of distance
- In the stationary kriging model the variance depended only on the distance to the closest measurement point. This is, however, not the case for the non-stationary field.

    [Figure: V(X | Y) as a function of distance, for the non-stationary model.]

TODO: unresolved issues
- Priors for the parameters: especially the strong dependence between \kappa^2 and \phi.
- How to select the number of basis functions?
- Use the Hessian from the optimisation to run MCMC, to obtain posteriors.
- Measurement errors are probably correlated along the transect lines.
- How do we utilise the estimates to determine where to measure next?

Bibliography
- Lindgren, F. and Rue, H. (2007), "Explicit construction of GMRF approximations to generalised Matérn fields on irregular grids", Tech. Rep., Centre for Mathematical Sciences, Lund University, Lund, Sweden.
- Rue, H. and Held, L. (2005), Gaussian Markov Random Fields; Theory and Applications, vol. 104 of Monographs on Statistics and Applied Probability, Chapman & Hall/CRC.
- Rue, H. and Martino, S. (2007), "Approximate Bayesian inference for hierarchical Gaussian Markov random field models", J. Statist. Plann. and Inference, 137, 3177-3192.
- Rue, H. and Tjelmeland, H. (2002), "Fitting Gaussian Markov Random Fields to Gaussian Fields", Scand. J. Statist., 29, 31-49.
- Whittle, P. (1954), "On Stationary Processes in the Plane", Biometrika, 41, 434-449.

This presentation: www.maths.lth.se/matstat/staff/johanl/talks/
Appendix: SPDE issues
- Non-uniqueness: if x(u) is a solution to the SPDE for \alpha = 2, so is x(u) + c \exp(\kappa e^T u), for any unit-length vector e and any constant c.
- Non-stationarity: on a bounded domain, the SPDE solutions are non-stationary, unless conditioned on suitable boundary distributions.
- Practical solution to the non-uniqueness and non-stationarity: zero-normal-derivative (Neumann) boundaries reduce the impact of the null-space solutions.
- Resulting covariance, for \Omega = [0, L] \subset R:

    C(x(u), x(v)) = r_M(u, v) + r_M(u, -v) + r_M(u, 2L - v)
                  = r_M(0, v - u) + r_M(0, v + u) + r_M(0, 2L - v - u).

Appendix: computational costs for GMRFs
The log-likelihood contains a number of different terms; the most costly to compute are \log|Q| and b^T Q^{-1} b.
1. The Cholesky factor of Q = L L^T is sparse (possibly after reordering), and can be calculated efficiently.
2. The log-determinant is \log|Q| = 2 \sum_i \log L_{ii}.
3. We now have that Q^{-1} b = L^{-T} (L^{-1} b), where L^{-1} b can be calculated by solving a sparse triangular equation system.

For the derivatives, the additional difficult term is

    \frac{\partial}{\partial \theta} \log|Q| = tr( Q^{-1} \frac{\partial Q}{\partial \theta} ).

Due to the sparsity, the trace can be calculated as

    tr( Q^{-1} \frac{\partial Q}{\partial \theta} ) = \sum_{i=1}^n \sum_{j \in \{i\} \cup N_i} (Q^{-1})_{ij} ( \frac{\partial Q}{\partial \theta} )_{ji}.

Thus, to calculate the traces we need at most the elements of Q^{-1} that correspond to neighbouring points in the GMRF. Given the sparse Cholesky factor, these elements can be calculated in O(n \log n).

[Figure: comparing the stationary and the non-stationary model.]
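The sparse \log|Q| and Q^{-1} b computations above can be sketched with scipy; note that scipy has no built-in sparse Cholesky (a CHOLMOD wrapper such as the one in scikit-sparse would be the closer match to the slides), so this sketch uses a sparse LU factorisation instead, which gives the same two quantities:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import splu

# A sparse, positive definite tridiagonal precision matrix (values made up).
n = 300
Q = sparse.diags([-np.ones(n - 1), 2.1 * np.ones(n), -np.ones(n - 1)],
                 offsets=[-1, 0, 1], format="csc")
b = np.ones(n)

lu = splu(Q)
# log|Q| from the diagonal of the U factor (|det Q| = prod |U_ii|).
logdet = np.sum(np.log(np.abs(lu.U.diagonal())))
# Q^{-1} b via the sparse triangular solves inside lu.solve.
x = lu.solve(b)
```

For the fill-reducing reorderings mentioned on the slides, splu applies a column permutation automatically; a dedicated sparse Cholesky would additionally exploit symmetry.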