The Matrix Reloaded: Computations for large spatial data sets


The Matrix Reloaded: Computations for large spatial data sets. Outline: the spatial model; solving linear systems; matrix multiplication; creating sparsity. Doug Nychka, National Center for Atmospheric Research. Themes: sparsity, fast matrix multiplication, iterative solutions. Supported by the National Science Foundation DMS.

The Cast: Reinhard Furrer and Marc Genton; Chris Wikle and J. Andrew Royle; Tim Hoar; Ralph Milliff and Mark Berliner; Craig Johns.

A large spatial dataset: reporting precipitation stations for 1997.

Spatial Models. We observe a random field u(x), e.g. ozone concentration at location x, with covariance function k(x, x') = COV(u(x), u(x')). There are other parts of the model that are important: the mean E(u(x)), fixed effects, and covariates; u(x) may not be Gaussian; and copies of u(x) observed at different times are correlated, e.g. ozone fields for each day. I really don't want to talk about these today!

Spatial Models (continued). Let u be the field values on a large, regular 2-d grid (stacked as a vector). This is our universe. The monster matrix: Σ = COV(u). For the ozone example, an exponential covariance with range 340 miles.

Observational model. We observe part of u, possibly with error: Y = Ku + e. K is a known observational functional, such as an incidence matrix of ones and zeroes for irregularly spaced data, or weighted averages... or both. K is usually sparse. COV(e) = R (where part of the variability may be due to discretization error).

Kriging. Assuming Y has zero mean, û = COV(u, Y) COV(Y)^{-1} Y, or, with Q = COV(Y) = KΣK^T + R, û = ΣK^T Q^{-1} Y, and the covariance of the estimate is Σ − ΣK^T Q^{-1} KΣ. I like to think of the estimate as based on the conditional multivariate normal distribution of the grid points given the data, [u | Y]: an approximate posterior.
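
The formulas above can be checked directly on a toy problem. This is a minimal sketch, not from the talk: the grid size, covariance range, and noise level are illustrative choices.

```python
import numpy as np

# Toy Kriging computation on a tiny 1-d grid. Grid of m points, n of
# them observed with noise through an incidence matrix K.
rng = np.random.default_rng(0)
m, n = 50, 10
x = np.linspace(0.0, 1.0, m)

# Exponential covariance: Sigma_ij = exp(-|x_i - x_j| / range)
Sigma = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.3)

# K: incidence matrix of ones and zeroes picking out n observed points
obs = rng.choice(m, size=n, replace=False)
K = np.zeros((n, m))
K[np.arange(n), obs] = 1.0

R = 0.1 * np.eye(n)              # observation-error covariance
Q = K @ Sigma @ K.T + R          # COV(Y)

# Simulate data Y = K u + e and form u_hat = Sigma K^T Q^{-1} Y
u = rng.multivariate_normal(np.zeros(m), Sigma)
Y = K @ u + rng.multivariate_normal(np.zeros(n), R)
u_hat = Sigma @ K.T @ np.linalg.solve(Q, Y)

# Covariance of the estimate: Sigma - Sigma K^T Q^{-1} K Sigma
post_cov = Sigma - Sigma @ K.T @ np.linalg.solve(Q, K @ Sigma)
```

Note that the posterior variances (the diagonal of `post_cov`) never exceed the prior variances: conditioning on data can only reduce uncertainty.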

Surface Ozone Pollution (YAOZE). 8-hour average ozone measurements (in PPB) for June 19, 1987, Midwest US, and the posterior mean surface.

Five samples from the posterior. (Figure.)

The problems. Σ is big, and so are Q and Y! Simplistic implementations will take too long or involve matrices that are too big.

Some ideas for computation. Don't invert the matrix Q = KΣK^T + R! Instead, solve the linear system Qω = Y for ω and then set û = ΣK^T ω. Estimate variability by generating samples from the conditional distribution: draw u* ~ N(0, Σ); form Y* = Ku* + e*; then the perturbation is u* − ΣK^T Q^{-1} Y*, which is added to û. The key third step is just the Kriging estimate!
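
The three-step conditional-simulation recipe above can be sketched as follows; the algebra follows the slide, while the problem sizes and variable names are illustrative choices of ours.

```python
import numpy as np

# Conditional simulation on a toy 1-d problem: draw u* ~ N(0, Sigma),
# form Y* = K u* + e*, and add the perturbation u* - Sigma K^T Q^{-1} Y*
# to the Kriging estimate u_hat.
rng = np.random.default_rng(1)
m, n = 40, 8
x = np.linspace(0.0, 1.0, m)
Sigma = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.25)
obs = rng.choice(m, size=n, replace=False)
K = np.zeros((n, m))
K[np.arange(n), obs] = 1.0
R = 0.05 * np.eye(n)
Q = K @ Sigma @ K.T + R

Y = K @ rng.multivariate_normal(np.zeros(m), Sigma)   # synthetic data
u_hat = Sigma @ K.T @ np.linalg.solve(Q, Y)           # Kriging estimate

def posterior_sample():
    u_star = rng.multivariate_normal(np.zeros(m), Sigma)
    e_star = rng.multivariate_normal(np.zeros(n), R)
    Y_star = K @ u_star + e_star
    # The third step is itself a Kriging estimate, as noted on the slide.
    return u_hat + u_star - Sigma @ K.T @ np.linalg.solve(Q, Y_star)

samples = np.stack([posterior_sample() for _ in range(200)])
```

Because the perturbation has mean zero, the sample mean converges to the Kriging estimate û as the number of draws grows.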

Solving linear systems. Our job is to find ω for Qω = Y. The conjugate gradient algorithm (CGA) is an iterative method for finding a solution. If Q is n × n it will find the exact solution in n steps, but one can stop in far fewer steps to obtain an approximate solution. Each iteration requires only one or two multiplications of Q by a vector. So the CGA never needs to create Q; one only needs to multiply Q by vectors a limited number of times. But how do we multiply matrices fast?
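
A bare-bones conjugate gradient solver makes the point concrete: it touches Q only through a matrix-vector product, so Q is never formed explicitly. This is a sketch for illustration; in practice one would use a library routine such as scipy.sparse.linalg.cg.

```python
import numpy as np

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=None):
    """Solve Q omega = b given only a function computing Q @ v."""
    n = b.size
    if max_iter is None:
        max_iter = n                 # exact (in exact arithmetic) by n steps
    omega = np.zeros(n)
    r = b - matvec(omega)            # residual
    p = r.copy()                     # search direction
    rs = r @ r
    for _ in range(max_iter):
        Qp = matvec(p)               # the one multiplication by Q per step
        alpha = rs / (p @ Qp)
        omega += alpha * p
        r -= alpha * Qp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:    # stop early for an approximate solution
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return omega

# Example with a symmetric positive definite stand-in for Q
rng = np.random.default_rng(2)
A = rng.standard_normal((30, 30))
Q = A @ A.T + 30 * np.eye(30)
Y = rng.standard_normal(30)
omega = conjugate_gradient(lambda v: Q @ v, Y)
```

Here `matvec` could just as well be a fast FFT-based or sparse multiplication, which is the point of the next slide.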

Fast multiplication. Convolution: if Σ is formed from a stationary covariance, then Σv has components (Σv)_i = Σ_j k(x_i, x_j) v(x_j) = Σ_j ψ(x_i − x_j) v(x_j), so the multiplication is just a convolution of the covariance with the vector (actually an image). Convolution of an image can be done very quickly using a 2-d fast Fourier transform. Sparsity: if K or K^T is sparse (mostly zeroes), then multiplications can be done quickly by skipping the zero elements. Multi-resolution: if Σ = WDW^T, where Wv is the (inverse) discrete wavelet transform and D is sparse, then the multiplication is fast. In all of these we never need to create the full matrices to do the multiplications!
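
The convolution idea can be sketched in 1-d via circulant embedding (the 2-d case on the slide works the same way with a 2-d FFT). The grid size and covariance are illustrative choices, not values from the talk.

```python
import numpy as np

# FFT-based multiplication of a stationary covariance matrix by a
# vector on a regular 1-d grid, without ever forming the matrix.
m = 64
psi = np.exp(-np.arange(m) / (0.1 * m))   # k(x_i, x_j) = psi(|i - j|)

# Embed the covariance sequence in a circulant vector of length 2m
c = np.concatenate([psi, [0.0], psi[:0:-1]])
c_hat = np.fft.fft(c)

def sigma_matvec(v):
    """Compute Sigma @ v in O(m log m) via circular convolution."""
    v_pad = np.concatenate([v, np.zeros(len(c) - m)])
    return np.real(np.fft.ifft(c_hat * np.fft.fft(v_pad))[:m])

# Check against the dense O(m^2) computation
Sigma = psi[np.abs(np.arange(m)[:, None] - np.arange(m)[None, :])]
v = np.random.default_rng(3).standard_normal(m)
```

The zero padding prevents wrap-around, so the circulant product agrees exactly with the dense matrix-vector product on the first m entries.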

A 1-d wavelet basis of 32 functions. (Figure.)

Back to Q and Kriging. Q = KΣK^T + R. If all the constituent matrices can be multiplied quickly, then so can Q. Also note that ΣK^T ω can be computed quickly.

GibbsWinds. A project to create ocean surface wind fields by blending two forms of data. The CGA was used in the core of a Gibbs sampler using multi-resolution-based covariances. Results are posterior wind realizations for the tropical ocean every 6 hours for three years: a 256 × 96 spatial grid and 4648 time points.

An example of a GibbsWinds realization

Comparison to TAO buoy data

Another approach: enforcing sparsity. Our job is to find ω for Qω = Y. If Q is sparse we can: find a sparse (and exact) Cholesky factorization Q = CC^T; solve Cη = Y using sparsity; solve C^T ω = η using sparsity. But how can Q be made sparse?
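
The factor-then-two-triangular-solves pattern looks like this. For clarity the sketch uses dense routines; for a genuinely sparse Q one would use a sparse Cholesky factorization (e.g. CHOLMOD, available in scikit-sparse), but the sequence of solves is identical.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

# Solve Q omega = Y via Q = C C^T and two triangular solves.
rng = np.random.default_rng(4)
n = 100
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)      # symmetric positive definite stand-in
Y = rng.standard_normal(n)

C = cholesky(Q, lower=True)                       # Q = C C^T
eta = solve_triangular(C, Y, lower=True)          # C eta = Y
omega = solve_triangular(C.T, eta, lower=False)   # C^T omega = eta
```

Each triangular solve costs only as much as the number of nonzeroes in C, which is why sparsity in the factor translates directly into speed.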

Tapering Σ. Introduce zeroes by multiplying Σ with a positive definite, compactly supported tapering function H (componentwise multiplication): Σ̃_{i,j} = Σ_{i,j} H_{i,j}. This certainly introduces zeroes... but are we still close to Kriging? Is it faster?
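
A small sketch of tapering, with a Wendland taper standing in for H; the taper range and grid are illustrative choices, not values from the talk. The Schur product theorem guarantees the elementwise product of two positive definite matrices stays positive definite.

```python
import numpy as np

# Taper an exponential covariance with a compactly supported Wendland
# function, producing exact zeroes beyond the taper range.
m = 200
x = np.linspace(0.0, 1.0, m)
d = np.abs(x[:, None] - x[None, :])      # pairwise distances

Sigma = np.exp(-d / 0.3)                 # exponential covariance
theta = 0.15                             # taper range (illustrative)
# Wendland taper (1 - r)_+^4 (4r + 1): positive definite, zero for r >= 1
r = d / theta
H = np.clip(1.0 - r, 0.0, None) ** 4 * (4.0 * r + 1.0)

Sigma_tapered = Sigma * H                # componentwise (Schur) product
sparsity = np.mean(Sigma_tapered == 0.0) # fraction of exact zeroes
```

With this taper range, well over half the entries of the tapered matrix are exactly zero, so sparse factorizations become worthwhile.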

Some numerical results for MSE. A 30 × 30 grid, predicting at the center. Methods compared: ordinary Kriging (OK); OK with tapering (taper ranges 0.2, 0.15, 0.1); nearest-neighbor OK (4 and 16 neighbors).

(Figures: results for Matérn smoothness 0.75 and for Matérn smoothness 1.25.)

Some timing results for sparse Kriging. At 2000 points the difference is more than 100:1!

Why does this work? Working in the spectral domain: stationary covariance functions are uniquely determined by their spectral densities (a.k.a. Fourier transforms or characteristic functions). From Michael Stein's work, if the tails of the spectral densities are the same, then the MSEs of the estimates are asymptotically equivalent. Effect of tapering: convolutions of densities correspond to products of Fourier transforms, and products of densities correspond to convolutions of Fourier transforms. So tapering (a product of covariances) convolves the spectral densities, and under certain smoothness conditions on the taper, lim_{u→∞} [∫ σ̂(u − v) ĥ(v) dv] / σ̂(u) = constant.

Summary. One can multiply stationary covariance matrices (and multi-resolution matrices) quickly if the points are on grids. The CGA can be used to handle large problems. Sparsity can be enforced with little penalty and much improvement in speed. The asymptotic theory of Stein supports these results. Some open issues: comparable efficiency in estimating the covariance model; the relationship to the spatial Kalman filter.