The Matrix Reloaded: Computations for large spatial data sets

Doug Nychka, National Center for Atmospheric Research

Outline: the spatial model; solving linear systems; matrix multiplication; creating sparsity. Themes: sparsity, fast matrix multiplications, iterative solutions. Supported by the National Science Foundation DMS.

The Cast
Reinhard Furrer and Marc Genton; Chris Wikle and J. Andrew Royle; Tim Hoar; Ralph Milliff and Mark Berliner; Craig Johns.

A large spatial dataset: reporting precipitation stations for 1997.

Spatial Models
We observe a random field u(x), e.g. ozone concentration at location x, with covariance function k(x, x') = COV(u(x), u(x')). There are other parts of u that are important and that I really don't want to talk about today: E(u(x)), fixed effects and covariates; u(x) is not Gaussian; copies of u(x) observed at different times are correlated, e.g. ozone fields for each day.

Spatial Models (continued)
Let u be the field values on a large, regular 2-d grid, stacked as a vector. This is our universe. The monster matrix is Σ = COV(u): for the ozone example, an exponential covariance with a range of 340 miles.

Observational model
We observe part of u, possibly with error: Y = Ku + e. K is a known observational functional, such as an incidence matrix of ones and zeroes for irregularly spaced data, or weighted averages, or both. K is usually sparse. COV(e) = R (where part of the variability may be due to discretization error).

Kriging
Assume Y has zero mean. Then û = COV(u, Y) COV(Y)^{-1} Y, or, with Q = COV(Y) = KΣK^T + R, û = ΣK^T Q^{-1} Y, and the covariance of the estimate is Σ - ΣK^T Q^{-1} KΣ. I like to think of the estimate as based on the conditional multivariate normal distribution of the grid points given the data, [u | Y]: an approximate posterior.
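
For concreteness, a minimal dense sketch of these formulas in Python/NumPy on a made-up 1-d toy problem (the grid size m, number of observations n, range rho, and noise level are illustrative choices, not values from the talk):

    import numpy as np

    # Toy 1-d problem: m grid points on [0, 1], n of them observed with error.
    m, n, rho = 200, 50, 0.2                                  # illustrative sizes and range
    rng = np.random.default_rng(0)
    grid = np.linspace(0.0, 1.0, m)
    Sigma = np.exp(-np.abs(grid[:, None] - grid[None, :]) / rho)  # exponential covariance

    obs_idx = rng.choice(m, size=n, replace=False)
    K = np.zeros((n, m))                                      # incidence matrix of 0s and 1s
    K[np.arange(n), obs_idx] = 1.0
    R = 0.1 * np.eye(n)                                       # measurement-error covariance

    u_true = rng.multivariate_normal(np.zeros(m), Sigma)
    Y = K @ u_true + rng.multivariate_normal(np.zeros(n), R)

    Q = K @ Sigma @ K.T + R                                   # COV(Y)
    u_hat = Sigma @ K.T @ np.linalg.solve(Q, Y)               # kriging estimate
    post_cov = Sigma - Sigma @ K.T @ np.linalg.solve(Q, K @ Sigma)  # covariance of estimate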

Surface Ozone Pollution (YAOZE): 8-hour average ozone measurements (in PPB) for June 19, 1987, Midwest US, and the posterior mean surface.

Five samples from the posterior.

The problems: Σ is big, and so are Q and Y! Simplistic implementations will take too long or involve matrices that are too big.

Some ideas for computation
Don't invert the matrix Q = KΣK^T + R! Instead solve the linear system Qω = Y for ω and then û = ΣK^T ω. Estimate variability by generating samples (an ensemble) from the conditional distribution. To obtain a realization: draw u* from N(0, Σ), form synthetic data Y* = Ku* + e*, take perturbation = û* - u* = ΣK^T Q^{-1} Y* - u*, and use û - perturbation. Note that the key step in generating ensemble members is just a Kriging estimate.
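
A hedged continuation of the toy sketch above (reusing Sigma, K, R, Q, Y, m, n and rng from it); the perturbation step follows the conditional-simulation recipe described on this slide:

    # Solve Q w = Y once instead of forming Q^{-1}, then apply Sigma K^T.
    w = np.linalg.solve(Q, Y)
    u_hat = Sigma @ K.T @ w

    # One ensemble member: krige synthetic data and recycle the residual.
    u_star = rng.multivariate_normal(np.zeros(m), Sigma)      # unconditional field u*
    e_star = rng.multivariate_normal(np.zeros(n), R)
    Y_star = K @ u_star + e_star                               # synthetic data Y*
    u_hat_star = Sigma @ K.T @ np.linalg.solve(Q, Y_star)      # kriging applied to Y*
    perturbation = u_hat_star - u_star
    draw = u_hat - perturbation                                # a sample from [u | Y]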

Solving linear systems
Our job is to find ω with Qω = Y. The conjugate gradient algorithm (CGA) is an iterative method for finding a solution. If Q is n × n it will find the exact solution in n steps, but one can stop after far fewer steps to obtain an approximate solution. Each iteration only requires two multiplications of Q by a vector. So CGA never needs to create Q; one only needs to multiply Q by vectors a limited number of times. But how do we multiply matrices fast?
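
A matrix-free sketch of this step with SciPy's conjugate gradient solver, again reusing the toy Sigma, K, R, Y and n from above; the point is that Q is only touched through matrix-vector products and is never formed:

    from scipy.sparse.linalg import LinearOperator, cg

    def Q_matvec(v):
        # Apply Q = K Sigma K^T + R to a vector without ever forming Q.
        return K @ (Sigma @ (K.T @ v)) + R @ v

    Q_op = LinearOperator((n, n), matvec=Q_matvec, dtype=float)
    omega, info = cg(Q_op, Y)          # info == 0 means the iteration converged
    u_hat_cg = Sigma @ (K.T @ omega)   # the same kriging estimate as before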

Fast multiplication
Convolution: if Σ is formed from a stationary covariance function then the i-th component of Σv is the sum over j of k(x_i, x_j) v_j = ψ(x_i - x_j) v_j, so the multiplication is just a convolution of the covariance kernel with the vector (actually an image); a sketch follows this slide. Convolution of an image can be done very quickly using a 2-d fast Fourier transform.
Sparsity: if K or K^T is sparse (mostly zeroes) then multiplications can be done quickly by skipping zero elements.
Multi-resolution: if Σ = WDW^T, where Wv is the (inverse) discrete wavelet transform and D is sparse, then the multiplication is fast.
In all of these we never need to create the full matrices to do the multiplications!
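
A self-contained sketch of the convolution route on a regular 2-d grid (grid size, spacing, and range are invented); scipy.signal.fftconvolve performs the zero-padded FFT convolution, so each product Σv costs O(n log n) rather than O(n^2):

    import numpy as np
    from scipy.signal import fftconvolve

    nx, ny, h, rho = 128, 128, 1.0, 25.0             # illustrative grid size, spacing, range
    # Kernel image: psi evaluated at every possible lag vector between grid points.
    lag_x = (np.arange(2 * nx - 1) - (nx - 1)) * h
    lag_y = (np.arange(2 * ny - 1) - (ny - 1)) * h
    dist = np.hypot(lag_x[:, None], lag_y[None, :])
    psi = np.exp(-dist / rho)                         # stationary exponential covariance

    def sigma_times(v_image):
        # Sigma @ v for v stored as an nx-by-ny image: a 2-d convolution with psi.
        return fftconvolve(v_image, psi, mode="same")

    v = np.random.default_rng(1).standard_normal((nx, ny))
    Sv = sigma_times(v)                               # never forms the (nx*ny)^2 matrix Sigma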

A 1-d wavelet basis of 32 functions.

Back to Q and Kriging
Q = KΣK^T + R. If all the intermediate matrices can be multiplied quickly, then so can Q. Also note that ΣK^T ω can be computed quickly as well.

GibbsWinds
A project to create ocean surface wind fields by blending two forms of data. The CGA was used in the core of a Gibbs sampler with multi-resolution-based covariances. The results are posterior wind realizations for the tropical ocean every 6 hours for three years: a 256 × 96 spatial grid and 4648 time points.

An example of a GibbsWinds realization

Comparison to TAO buoy data

Another approach: Enforcing sparsity
Our job is to find ω with Qω = Y. If Q is sparse we can: find a sparse (and exact) Cholesky factorization Q = CC^T; solve Cη = Y using sparsity; solve C^T ω = η using sparsity. (The two triangular solves are sketched below.) But how can Q be made sparse?
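
SciPy itself does not ship a sparse Cholesky factorization (CHOLMOD, available through the scikit-sparse package, is one common choice), so here is a small dense stand-in that only illustrates the factor-then-two-triangular-solves algebra on a made-up positive definite Q:

    import numpy as np
    from scipy.linalg import solve_triangular

    rng = np.random.default_rng(2)
    A = rng.standard_normal((50, 50))
    Q = A @ A.T + 50.0 * np.eye(50)                    # a toy symmetric positive definite Q
    Y = rng.standard_normal(50)

    C = np.linalg.cholesky(Q)                          # Q = C C^T with C lower triangular
    eta = solve_triangular(C, Y, lower=True)           # forward solve:  C eta = Y
    omega = solve_triangular(C.T, eta, lower=False)    # backward solve: C^T omega = eta
    assert np.allclose(Q @ omega, Y)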

Tapering Σ
Introduce zeroes by multiplying Σ componentwise with a positive definite tapering function h: Σ̃_{i,j} = Σ_{i,j} h_{i,j}. This certainly introduces zeroes... but are we still close to Kriging? Is it faster?
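
A sketch of one way to build the tapered matrix directly in sparse form, using a Wendland-type taper with compact support (the locations, covariance range rho, and taper range theta are invented for illustration):

    import numpy as np
    from scipy import sparse
    from scipy.spatial import cKDTree

    rng = np.random.default_rng(3)
    pts = rng.uniform(0.0, 1.0, size=(2000, 2))       # irregular 2-d locations
    rho, theta = 0.3, 0.05                            # covariance range and taper range

    # Only pairs closer than the taper range theta survive, so collect just those.
    pairs = cKDTree(pts).query_pairs(r=theta, output_type="ndarray")
    i, j = pairs[:, 0], pairs[:, 1]
    d = np.linalg.norm(pts[i] - pts[j], axis=1)

    sigma_ij = np.exp(-d / rho)                                   # exponential covariance
    taper_ij = (1.0 - d / theta) ** 4 * (1.0 + 4.0 * d / theta)   # Wendland-type taper
    vals = sigma_ij * taper_ij                                    # componentwise product

    n = pts.shape[0]
    Sigma_t = sparse.coo_matrix((vals, (i, j)), shape=(n, n))
    # Symmetrize and add the diagonal (Sigma_ii * h_ii = 1 for this covariance/taper pair).
    Sigma_t = (Sigma_t + Sigma_t.T + sparse.eye(n)).tocsc()
    print(f"nonzero fraction: {Sigma_t.nnz / n**2:.4f}")

The resulting Σ̃ (and hence Q) is sparse and ready for the sparse Cholesky route above.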

Some numerical results for MSE
A 30 × 30 grid, predicting at the center. Methods compared: Ordinary Kriging (OK); OK with tapering (0.2, 0.15, 0.1); nearest-neighbor OK (4, 16).

Matérn smoothness 0.75

Matérn smoothness 1.25

Some timing results for sparse Kriging: at 2000 points the difference is more than 100:1!

Summary
One can multiply stationary covariance matrices (and multi-resolution matrices) quickly if points are on grids. CGA can be used to handle large problems. Sparsity can be enforced with little penalty and much improvement in speed. Asymptotic theory of Stein supports these results. Some open issues: companion efficiency in estimating the covariance model; the relationship to the spatial Kalman filter.