Spatial modelling using Gaussian Markov Random Fields: Applied to seabed reconstruction

Johan Lindström, Finn Lindgren, David Bolin, Håvard Rue, Peter Jonsson
Centre for Mathematical Sciences, Lund University; Department of Statistics, University of Washington; Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim; Engineering Geology, Lund University
Seattle, May 4, 2009

Overview
- Seabed reconstruction: background; what's done today?
- Spatial statistics: interpolation; computational costs
- Gaussian Markov random fields: basics; approximating Matérn covariances; fast estimation of the field; results
- A non-stationary extension: the model; estimation
- Results

Background: Seabed reconstruction
How do we reconstruct the seabed? Where should the next measurement be?
- A data set consists of geometry (x, y, z).
- Plane coordinates (x, y) are typically provided by GPS.
- Depth z is measured using a Single Beam EchoSounder (SBES).
- Data are collected along transect lines, with dense data along track but no coverage between tracks.
Given point measurements of depth:
[Figure: survey transects over the seabed; 200 m scale bar.]

What's done today?
1. Triangulate/grid the intermediate points.
2. Interpolate the values using: linear interpolation; splines.
- Gives no information about the uncertainties.
- Statistical methods can do better than this.

Spatial statistics: Interpolation
Spatial statistics in general deals with data observed over an area, with dependence between observations taken at different locations. A basic problem in spatial statistics is to use observations $Y_i$, taken at a number of locations $\{s_i\}_{i=1}^n$, to make inference about the value $Y_0$ at an unobserved location $s_0$.
Assume the observations come from a Gaussian field:
\[ \begin{bmatrix} Y_0 \\ Y \end{bmatrix} \sim \mathrm{N}\!\left( \begin{bmatrix} \mu_0 \\ \mu \end{bmatrix}, \begin{bmatrix} \Sigma_{00} & \Sigma_{0n} \\ \Sigma_{0n}^\top & \Sigma_{nn} \end{bmatrix} \right). \]
Given known mean and covariance matrix, the optimal predictor is the conditional mean
\[ \mathrm{E}(Y_0 \mid Y_1, \dots, Y_n) = \mu_0 + \Sigma_{0n} \Sigma_{nn}^{-1} (Y - \mu). \]
However, the mean and covariance are generally not known.
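To make the predictor concrete, a minimal numpy sketch (mine, not the talk's), assuming a simple exponential covariance and known zero mean:

```python
import numpy as np

def exp_cov(d, sigma2=1.0, kappa=0.5):
    """Exponential covariance, a simple stand-in for a Matern model."""
    return sigma2 * np.exp(-kappa * d)

rng = np.random.default_rng(0)
s = rng.uniform(0, 10, size=(50, 2))   # observation locations s_i
s0 = np.array([[5.0, 5.0]])            # prediction location s_0

D_nn = np.linalg.norm(s[:, None] - s[None, :], axis=-1)
D_0n = np.linalg.norm(s0 - s, axis=-1)
Sigma_nn = exp_cov(D_nn)
Sigma_0n = exp_cov(D_0n)

y = rng.multivariate_normal(np.zeros(len(s)), Sigma_nn)  # synthetic data

# Optimal predictor: E(Y0 | Y) = mu0 + Sigma_0n Sigma_nn^{-1} (Y - mu)
y0_hat = Sigma_0n @ np.linalg.solve(Sigma_nn, y)
print(y0_hat)
```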

Gaussian fields
Assume a parametric form for the covariance and mean: $Y \sim \mathrm{N}(\mu(\theta), \Sigma(\theta))$. The log-likelihood now becomes
\[ \log L(\theta \mid Y) = -\tfrac{1}{2} \log|\Sigma(\theta)| - \tfrac{1}{2} (Y - \mu(\theta))^\top \Sigma(\theta)^{-1} (Y - \mu(\theta)). \]
- Estimate the parameters by maximising the log-likelihood.
- The reconstruction becomes $\mathrm{E}(Y_0 \mid Y_1, \dots, Y_n, \hat\theta)$.
- Ideally we'd like to integrate out the parameter uncertainty, often done using MCMC.

Computational costs
Both optimisation and MCMC will require repeated evaluations of the log-likelihood above. This evaluation has two difficulties:
1. The covariance matrix is an n-by-n matrix, which contains $n(n+1)/2$ unique elements.
2. Calculating the determinant, inverse, or Cholesky factor of the covariance matrix requires $O(n^3)$ operations.
For the depth data we have 11705 observations. Storing the covariance matrix requires roughly 1 GB, and evaluating the log-likelihood takes 5 minutes on a CoreDuo T8100 laptop.
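The cost structure is easy to see in a dense implementation. A hedged sketch (illustrative, not the talk's code); the Cholesky factorisation is the $O(n^3)$ bottleneck, and it must be repeated for every new $\theta$:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def gauss_loglik(y, mu, Sigma):
    """Dense Gaussian log-likelihood; the Cholesky step is O(n^3)."""
    L, lower = cho_factor(Sigma, lower=True)
    r = y - mu
    quad = r @ cho_solve((L, lower), r)        # r^T Sigma^{-1} r
    logdet = 2.0 * np.sum(np.log(np.diag(L)))  # log |Sigma|
    n = len(y)
    return -0.5 * (logdet + quad + n * np.log(2 * np.pi))
```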

Gaussian Markov Random Fields (GMRFs)
A Gaussian Markov random field (GMRF) is a Gaussian random field with a Markov property. The neighbours $N_i$ of a point $s_i$ are the points that in some sense are close to $s_i$. The Gaussian random field $x \sim \mathrm{N}(\mu, Q^{-1})$ has a joint distribution that satisfies
\[ p(x_i \mid \{x_j : j \neq i\}) = p(x_i \mid \{x_j : j \in N_i\}), \]
and, for $j \notin N_i$,
\[ x_i \perp x_j \mid \{x_k : k \notin \{i, j\}\} \iff Q_{ij} = 0. \]
The density is
\[ p(x) = \frac{|Q|^{1/2}}{(2\pi)^{n/2}} \exp\!\left( -\tfrac{1}{2} (x - \mu)^\top Q (x - \mu) \right). \]
Fast algorithms that utilise the sparsity of $Q$ exist (C package GMRFlib). See Rue and Held (2005) for extensive details on GMRFs.

GMRFs: How do we choose Q?
A GMRF may be computationally effective, but it has been difficult to construct precision matrices that result in reasonable Gaussian fields. Various ad hoc methods exist; a common solution is to use a small neighbourhood and let the precision between two points depend on the distance between the points.
Rue and Tjelmeland (2002) created GMRFs on rectangular grids that approximate Gaussian fields with a wide class of covariance functions. Having the field defined only on a regular grid leads to issues with mapping the observations to the grid points.

Matérn covariances
The Matérn covariance family on $u \in \mathbb{R}^d$:
\[ r(u, v) = C(x_u, x_v) = \frac{1}{2^{\nu - 1} \Gamma(\nu)} \, (\kappa \lVert v - u \rVert)^{\nu} \, K_{\nu}(\kappa \lVert v - u \rVert), \]
with scale (inverse range) $\kappa > 0$ and shape/smoothness $\nu > 0$. Here $K_\nu$ is a modified Bessel function. Fields with Matérn covariances are solutions to a Stochastic Partial Differential Equation (SPDE) (Whittle, 1954),
\[ (\kappa^2 - \Delta)^{\alpha/2} x(u) = \mathcal{E}(u), \]
where $\mathcal{E}(u)$ is spatial white noise and $\alpha = \nu + d/2$.
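For reference, a small Python sketch of this Matérn correlation function (the function name and defaults are my own):

```python
import numpy as np
from scipy.special import gamma, kv

def matern_corr(d, kappa=1.0, nu=1.0):
    """Matern correlation r(d) = (kappa d)^nu K_nu(kappa d) / (2^(nu-1) Gamma(nu))."""
    d = np.asarray(d, dtype=float)
    r = np.empty_like(d)
    zero = d == 0
    r[zero] = 1.0                      # limit as the distance tends to 0
    kd = kappa * d[~zero]
    r[~zero] = kd**nu * kv(nu, kd) / (2**(nu - 1) * gamma(nu))
    return r
```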

Construction of Q (Lindgren and Rue, 2007)
The GMRF approximation is constructed using a finite element method to solve the SPDE. For $\alpha = 2$ the precision matrix of the approximating GMRF can be written as
\[ Q = (\kappa^2 C + G)\, C^{-1} C_E\, C^{-1} (\kappa^2 C + G), \]
where $\kappa^2 C + G$ arises from the finite element approximation of $(\kappa^2 - \Delta)$, and $C_E$ is the covariance of the projected driving spatial white noise. It can be shown that $C_E = C$, so that $Q = (\kappa^2 C + G)\, C^{-1} (\kappa^2 C + G)$. $C$ is tri-diagonal, with dense inverse. To obtain a sparse $Q$ matrix, Lindgren and Rue use a diagonal approximation, $\tilde{C}$.

Construction of Q: an example
Given a regular grid, and taking $\alpha = 2$, the finite element approximation of $(\kappa^2 - \Delta)$ is the stencil
\[ \begin{bmatrix} 0 & -1 & 0 \\ -1 & 4 + \kappa^2 & -1 \\ 0 & -1 & 0 \end{bmatrix}, \]
and the corresponding elements in $Q$ are
\[ \begin{bmatrix} 0 & 0 & 1 & 0 & 0 \\ 0 & 2 & -2(4+\kappa^2) & 2 & 0 \\ 1 & -2(4+\kappa^2) & 4 + (4+\kappa^2)^2 & -2(4+\kappa^2) & 1 \\ 0 & 2 & -2(4+\kappa^2) & 2 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix}. \]
For an irregular triangulation things are slightly more complicated.
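A hedged sketch of this construction, simplified by me to a 1-D mesh with unit element width (the talk works on 2-D triangulations): assemble $Q = (\kappa^2 \tilde{C} + G)\, \tilde{C}^{-1} (\kappa^2 \tilde{C} + G)$ with a lumped, diagonal mass matrix so that $Q$ stays sparse:

```python
import numpy as np
import scipy.sparse as sp

def spde_precision(n, kappa2=1.0):
    """Q = (kappa^2 C + G) C^{-1} (kappa^2 C + G) on a regular 1-D mesh,
    with the lumped (diagonal) mass matrix so that Q remains sparse."""
    # stiffness matrix G for piecewise linear elements, unit spacing
    main = 2.0 * np.ones(n); main[0] = main[-1] = 1.0
    G = sp.diags([-np.ones(n - 1), main, -np.ones(n - 1)], [-1, 0, 1], format="csc")
    # lumped mass matrix: row sums of the tri-diagonal mass matrix
    c = np.ones(n); c[0] = c[-1] = 0.5
    C = sp.diags(c, format="csc")
    Cinv = sp.diags(1.0 / c, format="csc")
    K = kappa2 * C + G
    return (K @ Cinv @ K).tocsc()

Q = spde_precision(200, kappa2=0.1)
print(Q.nnz, "non-zeros out of", 200 * 200)   # Q is banded, hence sparse
```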

INLA: Fast estimation (Rue and Martino, 2007)
1. Assume an underlying GMRF with possibly non-Gaussian point observations, i.e.
\[ x \sim \mathrm{N}(\mu(\theta), Q(\theta)^{-1}), \qquad p(y_i \mid x, \theta) = p(y_i \mid x_i, \theta). \]
2. Obtain a Gaussian approximation of the posterior,
\[ p(x \mid y, \theta) \propto \exp\!\left( -\tfrac{1}{2} (x - \mu)^\top Q (x - \mu) + \sum_i \log p(y_i \mid x_i, \theta) \right), \]
through a Taylor expansion of the log-observation density.
3. Use the Gaussian approximation to do numerical optimisation and integration of the log-likelihood.
This provides a fast way of obtaining posteriors. Errors due to the Taylor expansion and numerical integration are usually smaller than the MCMC errors from a reasonable MCMC run.

INLA: Results
Posterior densities for the hyperparameters of the model and for the underlying field were estimated using INLA. Total estimation time on a CoreDuo laptop: 65 minutes.

Variance as a function of distance
In a stationary model the variance will depend only on the distance to the closest measurement point.
[Figure: V(X | Y) against distance to the closest measurement, log-log axes.]
If we use the variance to decide where to measure, this implies that we should measure far from existing measurements. But the underlying field is most likely non-stationary: we need a non-stationary model and a way of estimating it.
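As an aside on step 2 of the INLA slide above: a self-contained sketch (mine, not R-INLA's internals) of the Gaussian approximation for a hypothetical Poisson observation model $y_i \sim \mathrm{Po}(\exp(x_i))$; each Newton step corresponds to a second-order Taylor expansion of the log-observation density around the current point:

```python
import numpy as np

def gaussian_approx(y, mu, Q, n_iter=20):
    """Gaussian approximation of p(x | y) for y_i ~ Po(exp(x_i)), x ~ N(mu, Q^{-1}).
    Returns the mode x* and the approximate posterior precision Q + diag(exp(x*))."""
    x = mu.copy()
    for _ in range(n_iter):
        w = np.exp(x)                    # -d^2/dx^2 log p(y | x) = exp(x)
        grad = Q @ (x - mu) - (y - w)    # gradient of the negative log-posterior
        H = Q + np.diag(w)               # Hessian of the negative log-posterior
        x = x - np.linalg.solve(H, grad)
    return x, Q + np.diag(np.exp(x))

# tiny demo with an iid N(0, I) prior and simulated counts
rng = np.random.default_rng(2)
mu, Q = np.zeros(10), np.eye(10)
y = rng.poisson(np.exp(rng.normal(size=10)))
x_star, Q_post = gaussian_approx(y, mu, Q)
```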

A non-stationary extension
We now introduce a non-stationary version of the SPDE through two modifications:
1. Drive the SPDE with independent Gaussian noise, but let the variance be a function of the location.
2. Let the range parameter, $\kappa^2$, vary in space.
Taking $\alpha = 2$ we obtain
\[ (\kappa^2(s) - \Delta)\, x(s) = \frac{1}{\sqrt{\phi(s)}}\, \mathcal{E}(s), \]
where $\phi(s)$ is a spatially varying precision of the driving noise. Introducing diagonal matrices $\kappa^2$ and $\phi$ with elements $\kappa^2_{ii} = \kappa^2(s_i)$ and $\phi_{ii} = \phi(s_i)$, the non-stationary precision matrix becomes
\[ Q = (C^{-1} G + \kappa^2)^\top C \phi\, (C^{-1} G + \kappa^2). \]
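Continuing the earlier 1-D sketch, an illustration (names my own) of assembling this non-stationary precision from diagonal $\kappa^2$ and $\phi$ matrices:

```python
import numpy as np
import scipy.sparse as sp

def nonstat_precision(kappa2, phi, G, c):
    """Q = (C^{-1} G + kappa2)^T C phi (C^{-1} G + kappa2), with diagonal
    kappa2_{ii} = kappa^2(s_i), phi_{ii} = phi(s_i), lumped mass c = diag(C)."""
    Cinv = sp.diags(1.0 / c)
    A = Cinv @ G + sp.diags(kappa2)          # C^{-1} G + kappa^2
    return (A.T @ sp.diags(c * phi) @ A).tocsc()

# example: range and noise precision varying smoothly over the mesh
n = 200
s = np.linspace(0, 1, n)
main = 2.0 * np.ones(n); main[0] = main[-1] = 1.0
G = sp.diags([-np.ones(n - 1), main, -np.ones(n - 1)], [-1, 0, 1], format="csc")
c = np.ones(n); c[0] = c[-1] = 0.5
Q = nonstat_precision(np.exp(np.sin(2 * np.pi * s)), np.exp(0.5 * s), G, c)
```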

The model
- Gaussian point observations: $Y_j \mid X \sim \mathrm{N}(X(s_j), \sigma^2)$.
- An underlying GMRF: $X \sim \mathrm{N}(B_\mu \theta_\mu, Q(\phi, \kappa^2)^{-1})$.
- The trend, $B_\mu \theta_\mu$, is assumed to consist of a constant and a linear trend, with a $\mathrm{N}(0, 10^6 I)$ prior for $\theta_\mu$.
- The precision, $\phi$, and range, $\kappa^2$, are modelled using a set of basis functions (e.g. B-splines): $\phi(s) = \exp(B_q \theta_\phi)$ and $\kappa^2(s) = \exp(B_q \theta_\kappa)$.
- Ideally we would like a smoothness prior on $\phi$ and $\kappa^2$, e.g. $\log \phi(s) \sim \mathrm{N}(\mu_\phi, (\alpha_\phi Q_0)^{-1})$, where the $\alpha$'s are hyper-parameters. This results in the following prior on $\theta_\phi$ and $\theta_\kappa$:
\[ \theta_\phi \sim \mathrm{N}\!\left( 0, \left( \alpha_\phi\, B_q^\top Q_0 B_q \right)^{-1} \right). \]
- Finally we take hyper-priors on the $\alpha$'s.

Estimation
The posterior density is
\[ p(X, \theta_\mu, \theta_\phi, \theta_\kappa, \alpha \mid Y) \propto p(Y \mid X, \sigma^2)\, p(X \mid \theta_\mu, \theta_\phi, \theta_\kappa)\, p(\theta_\mu)\, p(\alpha_\phi)\, p(\alpha_\kappa)\, p(\sigma^2). \]
$p(X, \theta_\mu \mid \theta_\phi, \theta_\kappa, \alpha, Y)$ is jointly Gaussian, so we can integrate out $X$ and $\theta_\mu$, obtaining $p(\theta_\phi, \theta_\kappa, \alpha \mid Y)$. Further, it is possible to explicitly calculate the derivatives of the log-likelihood, $\log p(\theta_\phi, \theta_\kappa, \alpha \mid Y)$. With the derivatives we use an ordinary BFGS algorithm to obtain ML estimates of the parameters.
Estimation of the parameters, along with calculation of the conditional posterior expectation and variance, now takes slightly less than one hour on a CoreDuo laptop.
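Schematically, the BFGS step could look as follows (a generic scipy sketch with a stand-in objective; the talk's actual marginal log-likelihood and its gradient are far more involved):

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal(theta):
    """Stand-in for -log p(theta | Y); returns the value and its gradient.
    A simple quadratic is used here so the example is self-contained."""
    target = np.array([0.5, -1.0, 2.0])
    f = 0.5 * np.sum((theta - target) ** 2)
    grad = theta - target
    return f, grad

res = minimize(neg_log_marginal, x0=np.zeros(3), jac=True, method="BFGS")
print(res.x)   # ML estimates of the parameters, schematically
```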

Results: Variance as a function of distance
In the stationary kriging model the variance depended only on the distance to the closest measurement point. This is, however, not the case for the non-stationary field.
[Figure: V(X | Y) against distance for the non-stationary model, log-log axes.]

TODO: Unresolved issues
- Priors for the parameters, especially $\sigma^2$?
- Strong dependence between $\kappa^2$ and $\phi$.
- How to select the number of basis functions?
- Use the Hessian from the optimisation to run MCMC, to obtain posteriors.
- Measurement errors are probably correlated along the transect lines.
- How do we utilize the estimates to determine where to measure next?

Bibliography
Lindgren, F. and Rue, H. (2007), Explicit construction of GMRF approximations to generalised Matérn fields on irregular grids, Tech. Rep. 1, Centre for Mathematical Sciences, Lund University, Lund, Sweden.
Rue, H. and Held, L. (2005), Gaussian Markov Random Fields: Theory and Applications, vol. 104 of Monographs on Statistics and Applied Probability, Chapman & Hall/CRC.
Rue, H. and Martino, S. (2007), Approximate Bayesian inference for hierarchical Gaussian Markov random field models, J. Statist. Plann. Inference, 137, 3177-3192.
Rue, H. and Tjelmeland, H. (2002), Fitting Gaussian Markov Random Fields to Gaussian Fields, Scand. J. Statist., 29, 31-49.
Whittle, P. (1954), On Stationary Processes in the Plane, Biometrika, 41, 434-449.
This presentation: www.maths.lth.se/matstat/staff/johanl/talks/

SPDE issues
- Non-uniqueness: if $x(u)$ is a solution to the SPDE for $\alpha = 2$, so is $x(u) + c \exp(\kappa\, e^\top u)$, for any unit length vector $e$ and any constant $c$.
- Non-stationarity: on a bounded domain, the SPDE solutions are non-stationary, unless conditioned on suitable boundary distributions.
- Practical solution to the non-uniqueness and non-stationarity: zero-normal-derivative (Neumann) boundaries reduce the impact of the null-space solutions.
- Resulting covariance, for $\Omega = [0, L] \subset \mathbb{R}$:
\[ C(x_u, x_v) \approx r_M(u, v) + r_M(u, -v) + r_M(u, 2L - v) = r_M(0, v - u) + r_M(0, v + u) + r_M(0, 2L - v - u). \]

Computational costs for GMRFs
The log-likelihood contains a number of different terms; the most costly to compute are $\log|Q|$ and $b^\top Q^{-1} b$.
1. The Cholesky factor of $Q = L L^\top$ is sparse (possibly after reordering), and can be calculated efficiently.
2. The log-determinant is $\log|Q| = 2 \sum_i \log L_{ii}$.
3. We now have that $Q^{-1} b = L^{-\top} (L^{-1} b)$, where $L^{-1} b$ can be calculated by solving a sparse triangular equation system.

Computational costs for GMRFs (cont.)
For the derivatives, the additional difficult term is
\[ \frac{\partial}{\partial \theta} \log|Q| = \operatorname{tr}\!\left( Q^{-1} \frac{\partial Q}{\partial \theta} \right). \]
Due to the sparsity, the trace can be calculated as
\[ \operatorname{tr}\!\left( Q^{-1} \frac{\partial Q}{\partial \theta} \right) = \sum_{i=1}^{n} \sum_{j \in \{i\} \cup N_i} \left( Q^{-1} \right)_{ij} \left( \frac{\partial Q}{\partial \theta} \right)_{ji}. \]
Thus, to calculate the traces we need at most the elements of $Q^{-1}$ that correspond to neighbouring points in the GMRF. Given the sparse Cholesky factor, these elements can be calculated in $O(n \log n)$.

Comparing the stationary and non-stationary model
[Figure: comparison of the stationary and non-stationary reconstructions.]
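Returning to the computational-cost slides above: a runnable sketch of the log-determinant and solve steps, using scipy's sparse LU as a stand-in for a dedicated sparse Cholesky such as CHOLMOD (scipy has no built-in sparse Cholesky; for a symmetric positive-definite $Q$, $\log|Q|$ can be read off the LU factor's diagonal, since the permutation matrices have determinant $\pm 1$):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

# a sparse SPD precision matrix (first-order random walk plus a nugget)
n = 1000
D = sp.diags([-np.ones(n - 1), np.ones(n)], [-1, 0], format="csc")
Q = (D.T @ D + 0.1 * sp.identity(n)).tocsc()

lu = splu(Q)                                        # sparse factorisation with fill-reducing ordering
logdet = np.sum(np.log(np.abs(lu.U.diagonal())))    # log |Q|
b = np.random.default_rng(1).normal(size=n)
x = lu.solve(b)                                     # Q^{-1} b via sparse triangular solves
print(logdet, x[:3])
```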