Karhunen-Loève Expansion and Optimal Low-Rank Model for Spatial Processes

Hao Zhang
Department of Statistics and Department of Forestry and Natural Resources, Purdue University
zhanghao@purdue.edu

TTU, October 26, 2012

Outline

- Motivations for studying the KL expansion in spatial statistics
- Numerical algorithms
- Some examples and results
- Applications to multivariate spatial processes

Spatial data

Spatial data: $Y(s_i)$, $s_i \in D \subset \mathbb{R}^2$ or $\mathbb{R}^3$.

- Correlated
- Massive: satellite data, remotely sensed data, sensor networks, multi-source data

[Figure: scatter plot of observation locations; axes X1 and X2.]

Spatial process

View the spatial data as a realization of an underlying spatial process: data = mean + error.

- Mean function: deterministic, a linear function of some covariates (e.g., latitude and longitude).
- Error process: a second-order process, often assumed to be stationary. The spatial correlation is attributable to unobserved latent variables, accounting for the spatial variation not explained by the mean.

Here I assume the mean is a constant (known or unknown).

Covariogram and kriging

Covariogram: $C(s_1, s_2) = \mathrm{Cov}(Y(s_1), Y(s_2))$. Stationary if $C(s_1, s_2) = C(0, s_2 - s_1)$; isotropic if $C(s_1, s_2) = C(\|s_1 - s_2\|)$.

Kriging = best linear prediction: given $Y(s_1), \ldots, Y(s_n)$,

$$\hat{Y}(s_0) = E\,Y(s_0) + \sum_{i=1}^{n} \alpha_i \big(Y(s_i) - E\,Y(s_i)\big) = E\,Y(s_0) + \mathrm{Cov}(Y(s_0), Y)\,[\mathrm{Var}(Y)]^{-1}(Y - E\,Y).$$
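To make the kriging formula concrete, here is a minimal numpy sketch of the best linear predictor under a known constant mean and an isotropic exponential covariogram. The covariance model, parameter values, and function names are illustrative choices, not from the talk.

```python
import numpy as np

def exp_cov(locs1, locs2, sigma2=1.0, phi=0.3):
    """Isotropic exponential covariogram between two sets of 2-D locations."""
    d = np.linalg.norm(locs1[:, None, :] - locs2[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / phi)

def simple_krige(y, locs, s0, mu=0.0, sigma2=1.0, phi=0.3):
    """Best linear prediction of Y(s0) from Y at locs, with known mean mu."""
    V = exp_cov(locs, locs, sigma2, phi)            # Var(Y), n x n
    c = exp_cov(s0[None, :], locs, sigma2, phi)[0]  # Cov(Y(s0), Y), length n
    alpha = np.linalg.solve(V, c)                   # kriging weights
    return mu + alpha @ (y - mu)

# Usage with placeholder data (not an actual draw from the model):
rng = np.random.default_rng(0)
locs = rng.random((200, 2))
y = rng.standard_normal(200)
print(simple_krige(y, locs, np.array([0.5, 0.5])))
```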

Large covariance matrix

$n$ is in the hundreds of thousands or millions, and the data are correlated. The inverse of the $n \times n$ covariance matrix $V_n$ is needed for the full MLE or Bayesian inference, and for kriging, rendering the traditional methods impractical:

$$L_n(\theta) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log|V_n(\theta)| - \frac{1}{2}(Y_n - X\beta)'\,V_n^{-1}(\theta)\,(Y_n - X\beta).$$

The inverse matrix is also needed for prediction.

Large storage and memory

For an $n \times n$ matrix, double-precision storage is $8n^2$ bytes. When $n = 10^4$, the storage is 800 MB; when $n = 10^6$, it is 8,000 GB!

Computing $V_{n \times n}^{-1}$ is $O(n^3)$ unless $V_{n \times n}$ has a special structure. Therefore, for massive spatial data, traditional geostatistical methods may not be applicable and innovative methods are needed.

Low-rank models

$$Y(s) = \mu + a(s)' Z + \epsilon(s), \qquad (1)$$

where $a(s) = (a_1(s), \ldots, a_m(s))'$ is a function of $s$ that may depend on some parameters $\theta$, $Z = (Z(u_1), \ldots, Z(u_m))'$ is a vector of latent variables at pre-chosen locations $u_i$, and $\epsilon(s)$ is white noise with variance $\tau^2$, independent of $Z$.

- Gaussian process convolution: $a_j(s) = \exp(-\|s - u_j\|^2/\phi)/\sqrt{2\pi}$, with the $Z(u_i)$ i.i.d.
- Fixed rank kriging: $a_j(s) = b(\|s - u_j\|^2/\phi)$ for some known basis function $b$, and $Z = (Z(u_1), \ldots, Z(u_m))'$ is a realization of a latent process at the locations $u_j$, $j = 1, \ldots, m$.
- Predictive process: $a(s) = \mathrm{Var}(Z)^{-1}\,\mathrm{Cov}(Z, Z(s))$, where $Z(s)$ is a stationary Gaussian process with mean 0 and some parametric covariogram.

Covariance matrix of low rank

Observing $Y = (Y(s_i), i = 1, \ldots, n)'$ from the low-rank model, write $Y = \mu 1 + AZ + \epsilon$. The covariance matrix of $Y$ is $V = A V_z A' + \tau^2 I_n$, where $A$ is $n \times m$ and $V_z = \mathrm{Var}(Z)$.

Woodbury formula:
$$V^{-1} = \tau^{-2}\big(I_n - A(\tau^2 V_z^{-1} + A'A)^{-1}A'\big).$$

The memory requirement is $8(n + nm + m^2)$ bytes, linear in $n$.

Sylvester's determinant theorem:
$$|V| = \tau^{2(n-m)}\,|V_z|\,|\tau^2 V_z^{-1} + A'A|.$$
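A minimal sketch of how these two identities are used: the likelihood and kriging equations then need only $m \times m$ factorizations plus matrix-vector products, never an $n \times n$ inverse. The function name is illustrative; inputs are assumed to be $A$ ($n \times m$), a positive definite $V_z$, and $\tau^2 > 0$.

```python
import numpy as np

def lowrank_solve_logdet(A, Vz, tau2, y):
    """Return V^{-1} y and log|V| for V = A Vz A' + tau2 I_n."""
    n, m = A.shape
    M = tau2 * np.linalg.inv(Vz) + A.T @ A   # tau2 Vz^{-1} + A'A, only m x m
    # Woodbury: V^{-1} y = tau^{-2} (y - A M^{-1} A' y)
    Vinv_y = (y - A @ np.linalg.solve(M, A.T @ y)) / tau2
    # Sylvester: |V| = tau2^(n-m) |Vz| |M|
    logdet = ((n - m) * np.log(tau2)
              + np.linalg.slogdet(Vz)[1]
              + np.linalg.slogdet(M)[1])
    return Vinv_y, logdet
```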

Optimal low-rank models

Given a random field $Y(s)$, $s \in D$, consider the best low-rank model, i.e., minimize

$$\int_D E\Big(Y(s) - \sum_{k=1}^{m} a_k(s) Z_k\Big)^2 ds$$

over all functions $a_k(s)$ and random variables $Z_k$.

Solution: $a_k(s) = \sqrt{\lambda_k}\, f_k(s)$, where $\lambda_k$ and $f_k(s)$, $k = 1, \ldots, m$, are the $m$ largest eigenvalues and the corresponding eigenfunctions in the Karhunen-Loève expansion of the random field.

K-L expansion

$$Y(s) = \sum_{k=1}^{\infty} \sqrt{\lambda_k}\, f_k(s)\, Z_k, \quad s \in D,$$

where $\lambda_1 \geq \lambda_2 \geq \cdots \geq 0$, the $Z_k$ are white noise with unit variance, and $\int_D f_j(s) f_k(s)\,ds = \delta_{jk}$. Hence

$$C(s,x) = \sum_{k=1}^{\infty} \lambda_k f_k(s) f_k(x) \quad \text{and} \quad \int_D C(s,x) f_k(x)\,dx = \lambda_k f_k(s).$$

Truncated KL

For any functions $a_k(s)$ and zero-mean random variables $Z_k$, with $D$ bounded,

$$\int_D E\Big\{Y(s) - \sum_{k=1}^{m} \sqrt{\lambda_k}\, f_k(s) Z_k\Big\}^2 ds \;\leq\; \int_D E\Big\{Y(s) - \sum_{k=1}^{m} a_k(s) Z_k\Big\}^2 ds.$$

Computation of eigenpairs

Given a covariance function $C(s,x)$, $s, x \in D \subset \mathbb{R}^d$, find $\lambda > 0$ and a function $f(x)$ such that

$$\int_D C(s,x) f(x)\,dx = \lambda f(s).$$

Approximation: choose a basis $\{\phi_i, i = 1, \ldots, m\} \subset L^2(D)$ and approximate an eigenfunction by $f(s) = \sum_{i=1}^{m} d_i \phi_i(s)$, so that $\int_D C(s,x) f(x)\,dx - \lambda f(s)$ is orthogonal to each basis function $\phi_j$.

Galerkin approximation

Finding $\lambda$ and $d$:

$$\int_D \Big( \sum_{i=1}^{m} d_i \int_D C(s,x)\,\phi_i(x)\,dx - \lambda \sum_{i=1}^{m} d_i \phi_i(s) \Big)\,\phi_j(s)\,ds = 0, \quad j = 1, \ldots, m,$$

i.e.,

$$G d = \lambda B d, \qquad (2)$$

where $G$ is $m \times m$ with $(i,j)$ element $\int_D \int_D \phi_i(s)\, C(s,x)\, \phi_j(x)\,ds\,dx$, $B$ is $m \times m$ with $(i,j)$ element $\int_D \phi_i(s)\phi_j(s)\,ds$, and $d = (d_1, \ldots, d_m)'$. Solve the generalized eigenvalue problem (2) to get $\lambda$ and $d$. The $m^2$ elements of $G$ are each 4-dimensional integrals.

(Ghanem and Spanos, 1991; Schwab and Todor, 2005; Phoon et al., 2002; Huang et al., 2001)
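A 1-D sketch of the Galerkin step, assuming $D = [-1, 1]$, a monomial basis, and Gauss-Legendre quadrature for the double integrals defining $G$; scipy.linalg.eigh then solves the generalized problem (2). The basis and covariance below are illustrative choices.

```python
import numpy as np
from scipy.linalg import eigh

def galerkin_kl(cov, m=8, nq=40):
    x, w = np.polynomial.legendre.leggauss(nq)       # quadrature nodes/weights
    Phi = np.vander(x, m, increasing=True)           # Phi[k, i] = x_k**i
    C = cov(x[:, None], x[None, :])                  # C[k, l] = C(x_k, x_l)
    G = Phi.T @ (w[:, None] * C * w[None, :]) @ Phi  # double integral for G
    B = Phi.T @ (w[:, None] * Phi)                   # Gram matrix of the basis
    lam, d = eigh(G, B)                              # generalized problem (2)
    return lam[::-1], d[:, ::-1]                     # largest eigenvalues first

# Usage: exponential covariance with range 0.3 (illustrative).
lam, d = galerkin_kl(lambda s, x: np.exp(-np.abs(s - x) / 0.3))
```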

Gaussian quadrature method

Gauss-Legendre quadrature:

$$\int_{-1}^{1} g(x)\,dx \approx \sum_{i=1}^{m} r_i\, g(x_i).$$

The approximation is exact for polynomials of degree $2m - 1$ or less. The nodes $x_i$ are the roots of the Legendre polynomial of order $m$, and the weights $r_i$ are also computed from the Legendre polynomial.

This extends to higher dimensions:

$$\int_{-1}^{1}\int_{-1}^{1} g(y_1, y_2)\,dy_1\,dy_2 \approx \sum_{i=1}^{m}\sum_{j=1}^{m} r_i r_j\, g(x_i, x_j).$$
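As a quick check of the exactness claim, a small sketch assuming numpy's leggauss, which returns exactly the nodes and weights above:

```python
import numpy as np

x, w = np.polynomial.legendre.leggauss(5)      # m = 5 nodes/weights on [-1, 1]
# Exact for polynomials up to degree 2m - 1 = 9: integral of x^8 is 2/9.
print(np.sum(w * x**8), 2 / 9)
# Tensor-product rule in 2-D: sum_i sum_j r_i r_j g(x_i, x_j).
g = lambda y1, y2: y1**2 * y2**4               # exact value (2/3)(2/5) = 4/15
approx = np.einsum('i,j,ij->', w, w, g(x[:, None], x[None, :]))
print(approx, 4 / 15)
```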

Example of nodes

[Figure: tensor-product Gauss-Legendre nodes on [-1, 1]^2 with m = 20.]

Gaussian quadrature method for KL

Given $C(s,x)$, $s, x \in [-1,1]^d$ (here $d = 2$), find $\lambda$ and $f$ such that

$$\int_{[-1,1]^d} C(s,x) f(x)\,dx = \lambda f(s).$$

Let $u_i$, $i = 1, \ldots, m$, be the $m$ roots of the Legendre polynomial, and let $u_i$, $i = 1, \ldots, m^2$, denote the pairs $(u_j, u_k)$, $j, k = 1, \ldots, m$. Find $f = (f(u_1), \ldots, f(u_{m^2}))'$ by solving

$$\sum_{j=1}^{m^2} w_j\, C(u_i, u_j)\, f(u_j) = \lambda f(u_i),$$

where $w_j = r_k r_l$, the product of the corresponding quadrature weights, if $u_j = (u_k, u_l)$.

Gaussian quadrature method for KL (cont'd)

Write $V = (C(u_i, u_j))$ and $W = \mathrm{diag}(w_1, \ldots, w_{m^2})$. Then

$$V W f = \lambda f,$$

a generalized eigenvalue problem. Hence $\lambda$ is an eigenvalue of the symmetric matrix $W^{1/2} V W^{1/2}$, and $W^{1/2} f$ is the corresponding eigenvector.
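A minimal 1-D sketch of this Nyström-type computation: symmetrize with $W^{1/2}$, take the eigendecomposition, and map the eigenvectors back by $W^{-1/2}$. The covariance passed in is an illustrative choice.

```python
import numpy as np

def nystrom_kl(cov, m=50):
    x, w = np.polynomial.legendre.leggauss(m)   # nodes u_i and weights w_i
    C = cov(x[:, None], x[None, :])             # V = (C(u_i, u_j))
    sw = np.sqrt(w)                             # W^{1/2}; the weights are > 0
    lam, v = np.linalg.eigh(sw[:, None] * C * sw[None, :])  # symmetric problem
    f = v / sw[:, None]                         # recover f = W^{-1/2} v
    return lam[::-1], f[:, ::-1], x, w          # largest eigenvalues first
```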

Polynomial interpolation

Having solved the eigenproblem, we have the eigenfunction values $f(u_i)$ at the nodes. We then approximate the eigenfunction everywhere by a polynomial through Lagrange interpolation.
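A short self-contained sketch, assuming scipy's barycentric form of Lagrange interpolation (a numerically stable equivalent) to evaluate an eigenfunction off the nodes; the covariance is again illustrative.

```python
import numpy as np
from scipy.interpolate import BarycentricInterpolator

# Eigenfunction values at the nodes, as in the Nystrom sketch above.
x, w = np.polynomial.legendre.leggauss(30)
C = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.3)   # illustrative covariance
sw = np.sqrt(w)
lam, v = np.linalg.eigh(sw[:, None] * C * sw[None, :])
f_nodes = (v / sw[:, None])[:, -1]                   # top eigenfunction values

f1 = BarycentricInterpolator(x, f_nodes)             # Lagrange interpolant
print(f1(np.linspace(-1.0, 1.0, 5)))                 # evaluate off the nodes
```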

Example 1

[Figure: six panels (a)-(f) of covariance versus distance, each showing the true function together with the 5% quantile, median, and 95% quantile of the approximation. Top row: Gaussian quadrature; bottom row: Galerkin projection.]

Example 2

[Figure: as in Example 1, for a second covariance function. Top row: Gaussian quadrature; bottom row: Galerkin projection.]

Simulation study

Objective: compare the predictive performance of different low-rank approximations.

True model: a Gaussian process $Y(s)$, $s \in [0,1]^2$, with mean 0 and exponential covariance function $\exp(-h/0.3)$.

We use three low-rank models to approximate the true model: KL by Galerkin projection (GPKL), KL by Gaussian quadrature (GQKL), and the predictive process (Banerjee et al., 2008).

We simulate from the true model on an 81 x 81 lattice, but use each low-rank model to estimate the model parameters.
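For reference, a minimal sketch of drawing one realization of the true model on the lattice via a dense Cholesky factor, which is still feasible at n = 6561; the tiny jitter is a numerical convenience, not part of the model.

```python
import numpy as np
from scipy.spatial.distance import cdist

g = np.linspace(0, 1, 81)
locs = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)   # 81*81 = 6561 sites
V = np.exp(-cdist(locs, locs) / 0.3)                         # exponential covariance
L = np.linalg.cholesky(V + 1e-10 * np.eye(len(locs)))        # jitter for stability
y = L @ np.random.default_rng(1).standard_normal(len(locs))  # one realization
```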

Predictive performance

MSE:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \big(Y(s_i) - \hat{Y}(s_i)\big)^2.$$

Logarithmic score (LogS): with $Z_i = (Y(s_i) - \hat{Y}(s_i))/\hat{\sigma}(s_i)$,
$$\mathrm{LogS} = \frac{1}{n}\sum_{i=1}^{n}\Big(\frac{1}{2}\log\big(2\pi\hat{\sigma}(s_i)^2\big) + \frac{1}{2}Z_i^2\Big).$$

Continuous ranked probability score:
$$\mathrm{crps}(F, x) = \int \big(F(y) - 1\{y \geq x\}\big)^2\,dy, \qquad \mathrm{CRPS} = \frac{1}{n}\sum_{i=1}^{n}\mathrm{crps}\big(F_i, Y(s_i)\big).$$

(Gneiting et al., 2006, 2007)
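For Gaussian predictive distributions the CRPS has a closed form (a standard identity from the scoring-rule literature, e.g., Gneiting and Raftery), which makes all three scores cheap to compute; a minimal sketch with illustrative names:

```python
import numpy as np
from scipy.stats import norm

def scores(y, mu, sigma):
    """MSE, LogS, and CRPS for Gaussian predictive distributions N(mu, sigma^2)."""
    z = (y - mu) / sigma
    mse = np.mean((y - mu) ** 2)
    logs = np.mean(0.5 * np.log(2 * np.pi * sigma**2) + 0.5 * z**2)
    # Closed-form Gaussian CRPS: sigma * (z (2 Phi(z) - 1) + 2 phi(z) - 1/sqrt(pi)).
    crps = np.mean(sigma * (z * (2 * norm.cdf(z) - 1)
                            + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi)))
    return mse, logs, crps
```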

150 eigen terms

[Figure: (a) MSE, (b) CRPS, and (c) LogS scores for GQKL, PP, and GPKL across simulation runs, using 150 eigen terms.]

300 eigen terms

[Figure: (a) MSE, (b) CRPS, and (c) LogS scores for GQKL, PP, and GPKL across simulation runs, using 300 eigen terms.]

600 eigen terms

[Figure: (a) MSE, (b) CRPS, and (c) LogS scores for GQKL, PP, and GPKL across simulation runs, using 600 eigen terms.]

Multivariate covariogram

Two processes $Y_i(s)$, $i = 1, 2$.

- Marginal or direct covariograms: $C_{ii}(s,x) = \mathrm{Cov}(Y_i(s), Y_i(x))$.
- Cross covariogram: $C_{ij}(s,x) = \mathrm{Cov}(Y_i(s), Y_j(x))$, $i \neq j$.

The matrix-valued function $C(s,x) = (C_{ij}(s,x))$ must satisfy: for any $s_1, \ldots, s_n$ and any vectors $\alpha_i \in \mathbb{R}^p$,

$$\sum_{i,j} \alpha_i'\, C(s_i, s_j)\, \alpha_j \geq 0.$$

It is not trivial to specify a cross covariogram.

Some existing models

- Proportional model: $C_{ij}(s,x) = \sigma_{ij}\,\rho(s,x)$.
- Linear model of coregionalization: a sum of proportional models.
- Convolution models: Gaspari and Cohn (1999), Gelfand et al. (2004).
- Asymmetric models: Apanasovich and Genton (2009), Li and Zhang (2010).
- Multivariate Matérn model: Gneiting et al. (2010).

Limitations of existing models

- Estimation of parameters becomes more complex, requiring constrained maximization.
- They do not allow arbitrary marginal covariogram models.
- There are cases where the multivariate covariogram model does not improve on the univariate model in predictive performance.

Valid cross covariograms for given marginals

The problem: given the two marginal covariograms $C_{ii}(s,x)$, $s, x \in \mathbb{R}^d$, $i = 1, 2$, find the class of all valid cross covariograms.

A theoretical solution: apply the Karhunen-Loève expansion to the marginals to write

$$C_{ii}(s,x) = \sum_{k=1}^{\infty} \lambda_{ik}\, f_{ik}(s) f_{ik}(x), \quad s, x \in D,$$

where the $\lambda_{ik}$ form a decreasing sequence of positive numbers, $\int_D f_{ik}(s)^2\,ds = 1$, and $D$ is a compact subset of $\mathbb{R}^d$. The class of all valid cross covariograms is

$$C_{ij}(s,x) = \sum_{k=1}^{\infty}\sum_{l=1}^{\infty} r_{kl}\,\sqrt{\lambda_{ik}\lambda_{jl}}\, f_{ik}(s) f_{jl}(x),$$

where the $r_{kl} \in [-1,1]$ are such that the matrix $(r_{kl})_{k,l=1}^{n}$ is non-negative definite for any $n$. Hence all cross covariograms are given by such an array $\{r_{kl}\}$.
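A minimal sketch of assembling a truncated $C_{12}$ on finite grids from this representation; lam1, F1, lam2, F2, and R are assumed inputs (marginal eigenvalues, eigenfunction values at the grid points, and the array $(r_{kl})$), with names chosen for illustration.

```python
import numpy as np

def cross_cov(lam1, F1, lam2, F2, R):
    """C12 on the grids: sum_{k,l} r_kl sqrt(lam1_k lam2_l) f1_k(s) f2_l(x).

    F1 is n1 x m1 (eigenfunction values at the s-grid), F2 is n2 x m2,
    lam1, lam2 are the eigenvalue vectors, and R is the m1 x m2 array (r_kl).
    """
    return (F1 * np.sqrt(lam1)) @ R @ (F2 * np.sqrt(lam2)).T
```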

Non-parametric cross covariogram

Consider a bivariate process and write the KL expansion for each component:

$$Y_i(s) = \sum_{k=1}^{\infty} \sqrt{\lambda_{ik}}\, f_{ik}(s)\, Z_{ik}, \quad s \in D,$$

where $Z_{ik}$, $k = 1, 2, \ldots$, are white noise ($i = 1, 2$). In the special case where each pair $Z_{1j}$, $Z_{2k}$ is either uncorrelated or perfectly correlated, we get the non-parametric cross covariogram

$$C_{12}(s,x) = \sum_{j,k:\,\mathrm{Cov}(Z_{1j}, Z_{2k}) = 1} \sqrt{\lambda_{1j}\lambda_{2k}}\, f_{1j}(s) f_{2k}(x).$$

Two practical steps

For the non-parametric cross covariogram to work in practice, we need to:

- have an efficient algorithm to compute the eigenpairs $(\lambda_j, f_j(s))$ for any given covariogram;
- find the pairs $(j,k)$ such that $\mathrm{Cov}(Z_{1j}, Z_{2k}) = 1$.

Inferences

- Estimate the parameters in each marginal model using only the marginal data.
- Since the cross covariogram involves no new parameters, we only need to identify the perfectly correlated pairs $Z_{1j}$, $Z_{2k}$. This is equivalent to specifying the correlation matrix $\mathrm{Corr}(Z_1, Z_2)$ for $Z_i = (Z_{i1}, \ldots, Z_{im})'$, which has at most $m$ ones, with all other elements 0.
- Start with $\mathrm{Corr}(Z_1, Z_2) = 0$. Replace one 0 element with 1 to optimize some objective function (e.g., the likelihood or a prediction score). Fixing the non-zero elements, replace one more 0 element with 1 to optimize the objective function. Iterate; a sketch of this greedy search follows below.
- The non-parametric cross covariogram leads to a much simplified estimation procedure.
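A sketch of the greedy search just described, assuming a user-supplied objective(pairs) to maximize (e.g., the joint likelihood or a prediction score) and enforcing at most one 1 per row and column of the correlation matrix; the stopping rule (quit when no swap improves the objective) is an assumption, since the talk only says to iterate.

```python
def greedy_pairs(m, objective, max_pairs=None):
    """Greedily add perfectly correlated pairs (j, k) while the objective improves."""
    pairs, used1, used2 = [], set(), set()
    best = objective(pairs)                      # objective with no 1s placed
    while max_pairs is None or len(pairs) < max_pairs:
        cand = [(objective(pairs + [(j, k)]), (j, k))
                for j in range(m) if j not in used1
                for k in range(m) if k not in used2]
        if not cand:
            break
        val, (j, k) = max(cand)                  # best single 0 -> 1 swap
        if val <= best:                          # assumed stopping rule
            break
        best = val
        pairs.append((j, k))
        used1.add(j)
        used2.add(k)
    return pairs, best
```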

An example

Wang and Zhang (2012): let $W_i(s)$, $s \in [0,1]^2$, $i = 1, 2$, be two latent processes with mean 0 and a bivariate exponential covariogram, i.e.,

$$\mathrm{Cov}(W_i(s+h), W_j(s)) = \sigma_{ij}\exp(-\|h\|/\phi_{ij}).$$

We observe the processes $Y_i(s) = W_i(s) + \tau_i \epsilon_i(s)$ at the 1225 equally spaced locations $\{(i/35, j/35),\ i, j = 1, \ldots, 35\}$. The covariogram parameters are chosen to be $\phi_{11} = 0.2$, $\phi_{22} = 0.3$, $\phi_{12} = 0.2$, $\sigma_{12} = 5.4$, $\sigma_{11} = \sigma_{22} = 9$, and $\tau_1^2 = \tau_2^2 = 1$.

Model fitting

We fit two models: the true model and the truncated KL expansion with $m = 600$ terms. For the true model, we estimate the parameters by maximizing the joint likelihood. For the truncated KL model, we estimate the marginal parameters by maximizing the marginal likelihoods and identify the perfectly correlated pairs by maximizing the joint likelihood.

Predictive performance

Predictive scores (MSE and logarithmic score):

$$\mathrm{MSE}^{(i)} = \frac{1}{n}\sum_{k=1}^{n} \big(Y_i(s_k) - \hat{Y}_i(s_k)\big)^2,$$

$$\mathrm{LogS}^{(i)} = \frac{1}{n}\sum_{k=1}^{n}\Big(\frac{1}{2}\log\big(2\pi\hat{\sigma}_{ik}^2\big) + \frac{1}{2}z_{ik}^2\Big), \quad i = 1, 2,$$

where $\hat{Y}_i(s_k)$ denotes the cokriging prediction of $Y_i(s_k)$ using the data at all locations except $s_k$, $\hat{\sigma}_{ik}$ is the corresponding cokriging standard deviation, and $z_{ik} = (Y_i(s_k) - \hat{Y}_i(s_k))/\hat{\sigma}_{ik}$.

In each of the 100 simulations, the non-parametric cross covariogram model beats the fitted bivariate parametric model on both predictive scores.