On the smallest eigenvalues of covariance matrices of multivariate spatial processes

François Bachoc, Reinhard Furrer

Toulouse Mathematics Institute, University Paul Sabatier, France
Institute of Mathematics and Institute of Computational Science, University of Zurich, Switzerland

February 4, 2016

Abstract: There has been a growing interest in providing models for multivariate spatial processes. A majority of these models specify a parametric matrix covariance function. Based on observations, the parameters are estimated by maximum likelihood or variants thereof. While the asymptotic properties of maximum likelihood estimators for univariate spatial processes have been analyzed in detail, maximum likelihood estimators for multivariate spatial processes have not yet received the attention they deserve. In this article we consider the classical increasing-domain asymptotic setting, which restricts the minimum distance between the locations. In this setting, one of the main components to be studied from a theoretical point of view is the asymptotic positive definiteness of the underlying covariance matrix. Under very weak assumptions on the matrix covariance function, we show that the smallest eigenvalue of the covariance matrix is asymptotically bounded away from zero. Several practical implications are discussed as well.

Keywords: increasing-domain asymptotics; matrix covariance function; maximum likelihood; multivariate process; spectral representation; spectrum

1 Introduction

The motivation for this work is the question of which simple conditions on the sampling design and the covariance functions have to be imposed for a given likelihood estimator to be consistent or, say, asymptotically normal, in the setting of multivariate spatial processes. More precisely, let

    { Z_k(s) : s ∈ D ⊂ R^d, 1 ≤ k ≤ p }    (1)

be a multivariate stationary random process, for some fixed d ∈ N_+ and p ∈ N_+. Without loss of generality, we assume that the process (1) has zero mean and a matrix covariance

function of the form C(h) = {c_kl(h)}_{1 ≤ k,l ≤ p}. The diagonal elements c_kk are called direct covariance functions, and the off-diagonal entries c_kl, k ≠ l, are called cross covariance functions.

Consider p sets (x_i^(1))_{1 ≤ i ≤ n_1}, ..., (x_i^(p))_{1 ≤ i ≤ n_p} of points in D. In the geostatistical literature, these points are often called locations, and each Z_k(s) is assumed to be observed at (x_i^(k))_{1 ≤ i ≤ n_k}. One important problem is to estimate the matrix covariance function from these observations. A common practice is to assume that this function belongs to a given parametric set and to estimate the corresponding covariance parameters by maximum likelihood approaches. With the availability of massive computing power, these approaches are now largely implementable in practice. In contrast, finite-sample and asymptotic properties of these approaches for multivariate processes are still on the research agenda.

Currently, the majority of the asymptotic results are specific to the univariate case, where p = 1 and n = n_1. While two types of asymptotic settings exist, fixed-domain and increasing-domain [Ste99], we focus throughout this article on the increasing-domain setting. In a seminal article, [MM84] give conditions which warrant consistency and asymptotic normality of the maximum likelihood estimator. Additional results and conditions are provided in [Bac14] for the maximum likelihood estimator and in [SR12] for the tapered maximum likelihood estimator. A central quantity in these three references is the covariance matrix of the observation vector. In particular, a necessary condition for the results above is that the smallest eigenvalue of this covariance matrix be bounded away from zero as n → ∞. This seemingly difficult-to-check condition is guaranteed by much simpler and non-restrictive conditions on the covariance function alone, provided that there exists a fixed minimal distance between any two different observation points, as shown in [Bac14].
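To make this smallest-eigenvalue condition concrete, here is a minimal numerical sketch (not from the paper) for a univariate exponential covariance: with a fixed minimal distance between observation points, the smallest eigenvalue of the covariance matrix stabilizes as n grows instead of collapsing to zero.

```python
import numpy as np

def min_eig_exp_cov(n, spacing=1.0):
    """Smallest eigenvalue of the n x n covariance matrix of c(h) = exp(-|h|)
    at n points on a 1-d grid with fixed minimal distance `spacing`."""
    x = spacing * np.arange(n)
    sigma = np.exp(-np.abs(x[:, None] - x[None, :]))
    return np.linalg.eigvalsh(sigma)[0]

# lambda_1 decreases with n but stays bounded away from zero, approaching
# (1 - rho) / (1 + rho) with rho = exp(-spacing) for this regular design.
for n in (10, 50, 200):
    print(n, min_eig_exp_cov(n))
```

The regular grid is only the simplest design satisfying the minimal-distance requirement; the paper's point is that the lower bound holds for irregular designs as well.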
The picture is similar in the multivariate setting (p > 1), but currently incomplete. The recent references [BFG+15] and [FBD15] provide asymptotic results for maximum likelihood approaches, and require that the smallest eigenvalue of the covariance matrix of the observations be bounded away from zero as n_1, ..., n_p → ∞. However, this condition has so far only been shown to hold for some specific covariance functions and regular grids; see [BFG+15]. In this article, we show that this condition holds in general, provided that the minimal distance between two different observation points of the same process (x_i^(k) and x_j^(k) for i ≠ j) is bounded away from zero as n_1, ..., n_p → ∞, and provided that the matrix covariance function satisfies certain non-restrictive conditions generalizing those of [Bac14]. While the starting argument is similar to that of Proposition D.4 in [Bac14], the proof here requires different and more elaborate tools, based on complex matrices and Fourier functions. Our approach covers both the collocated and the non-collocated observation case. We also show that the lower bound we provide can be made uniform over parametric families of matrix covariance functions. We perceive this article as a first step towards a rigorous

analysis of the asymptotic properties of various maximum likelihood type estimators for multivariate spatial processes in the framework of increasing-domain asymptotics.

The next section introduces the notation and states the main result. Because of the interesting tools used in its proof, the result is of interest in its own right, and we have kept the proof in the main part of the article. Section 3 concludes with some remarks.

2 Notation and main result

For x ∈ C^m, we let |x|_∞ = max_{1 ≤ i ≤ m} |x_i|, where |z| is the modulus of a complex number z. Recall that any Hermitian complex matrix M of size m × m has real eigenvalues, which we write λ_1(M) ≤ ... ≤ λ_m(M). For a complex vector v of size m, we let v* be the transpose of its conjugate vector, and we let |v|^2 = v* v.

The next assumption is satisfied by most standard covariance and cross covariance functions.

Assumption 1. There exist a finite fixed constant A > 0 and a fixed constant τ > 0 so that the functions c_kl satisfy, for all x ∈ R^d,

    |c_kl(x)| ≤ A / (1 + |x|^{d+τ}).    (2)

We define the Fourier transform of a function g : R^d → R by

    ĝ(f) = (2π)^{-d} ∫_{R^d} g(x) e^{-ı f·x} dx,

where ı^2 = -1. Then, from (2), the covariance functions c_kl have Fourier transforms ĉ_kl that are continuous and bounded. Note also that, for any f ∈ R^d, Ĉ(f) = {ĉ_kl(f)}_{1 ≤ k,l ≤ p} is a Hermitian complex matrix, with real non-negative eigenvalues 0 ≤ λ_1{Ĉ(f)} ≤ ... ≤ λ_p{Ĉ(f)}. We further assume the following.

Assumption 2. We have 0 < λ_1{Ĉ(f)} for all f ∈ R^d. Also, we have

    c_kl(x) = ∫_{R^d} ĉ_kl(f) e^{ı f·x} df.    (3)

The condition (3) is very weak and is satisfied by most standard covariance and cross covariance functions. On the other hand, the condition 0 < λ_1{Ĉ(f)} for all f ∈ R^d is less innocuous, and is further discussed in Section 3.

Let d ∈ N_+ and p ∈ N_+ be fixed. Consider p sequences (x_i^(1))_{i ∈ N_+}, ..., (x_i^(p))_{i ∈ N_+} of points in R^d, for which we assume the following.

Assumption 3. There exists a fixed Δ > 0 so that, for all 1 ≤ k ≤ p,

    inf_{i,j ∈ N_+; i ≠ j} |x_i^(k) - x_j^(k)| ≥ Δ.
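As a quick illustration of Assumption 2 (a sketch under assumptions not in the paper), consider a hypothetical bivariate "proportional" model c_kl(h) = R_kl exp(-|h|) in d = 1 with R positive definite. With the Fourier convention above, Ĉ(f) = R / (π(1 + f^2)), so λ_1{Ĉ(f)} = λ_1(R) / (π(1 + f^2)) > 0 for all f, which can be checked numerically:

```python
import numpy as np

# Hypothetical bivariate proportional model: c_kl(h) = R[k, l] * exp(-|h|), d = 1.
# Its Fourier transform is C_hat(f) = R / (pi * (1 + f**2)), so Assumption 2
# reduces to lambda_1(R) > 0.
R = np.array([[1.0, 0.7],
              [0.7, 1.0]])   # positive definite: eigenvalues 0.3 and 1.7

def C_hat(f):
    return R / (np.pi * (1.0 + f**2))

freqs = np.linspace(-50.0, 50.0, 2001)
lam_min = np.array([np.linalg.eigvalsh(C_hat(f))[0] for f in freqs])
print(lam_min.min() > 0)   # strictly positive at every sampled frequency
```

Note that λ_1{Ĉ(f)} → 0 as |f| → ∞ is allowed; Assumption 2 only requires strict positivity at each fixed f.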
For all n_1, ..., n_p ∈ N_+, let, for 0 ≤ k ≤ p, N_k = n_1 + ... + n_k, with the convention that N_0 = 0. Let also N = N_p. Note that N_1, ..., N_p depend on n_1, ..., n_p, but we do not write this dependence explicitly, for concision. Then, let Σ (also depending on n_1, ..., n_p) be the N × N covariance matrix, filled as follows: for a = N_{k-1} + i and b = N_{l-1} + j, with 1 ≤ k, l ≤ p, 1 ≤ i ≤ n_k and 1 ≤ j ≤ n_l,

    σ_ab = c_kl(x_i^(k) - x_j^(l)).

Our main result is the following.
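Before stating the main result, the indexing convention for Σ can be illustrated with a short sketch; the proportional matrix covariance function used here is a hypothetical example, not from the paper.

```python
import numpy as np

def build_sigma(points, c):
    """Assemble the N x N matrix Sigma with sigma_ab = c_kl(x_i^(k) - x_j^(l)),
    where a = N_{k-1} + i and b = N_{l-1} + j.

    points : list of p one-dimensional location arrays (d = 1 here)
    c      : c(k, l, h), the matrix covariance function, evaluated entry-wise
    """
    p = len(points)
    blocks = [[c(k, l, points[k][:, None] - points[l][None, :])
               for l in range(p)] for k in range(p)]
    return np.block(blocks)

# Hypothetical example: c_kl(h) = R[k, l] * exp(-|h|) with R positive definite.
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])
c = lambda k, l, h: R[k, l] * np.exp(-np.abs(h))
points = [np.arange(4.0), 0.3 + np.arange(6.0)]   # n_1 = 4, n_2 = 6, N = 10
sigma = build_sigma(points, c)
assert sigma.shape == (10, 10) and np.allclose(sigma, sigma.T)
```

The two location sets are deliberately non-collocated (offset 0.3), matching the generality of the result below.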

Theorem 4. Assume that Assumptions 1, 2 and 3 are satisfied. Then we have

    inf_{n_1, ..., n_p ∈ N_+} λ_1(Σ) > 0.

Proof of Theorem 4. From the proof of Proposition D.4 in [Bac14], there exists a function ĥ : R^d → R_+ that is C^∞ with compact support [0, 1]^d, and so that the function h : R^d → R defined by

    h(x) = ∫_{R^d} ĥ(f) e^{ı f·x} df

satisfies h(0) > 0 and, with the notation of (2),

    |h(x)| ≤ A / (1 + |x|^{d+τ}),   τ > 0.

It can be shown, similarly as in the proof of Lemma D.2 in [Bac14], that there exists a fixed 0 < δ < 1 so that

    sup_{1 ≤ k ≤ p, n_k ∈ N_+, 1 ≤ i ≤ n_k}  Σ_{1 ≤ j ≤ n_k; j ≠ i} |h{δ(x_i^(k) - x_j^(k))}| ≤ h(0)/2.    (4)

For all 1 ≤ k ≤ p, by the Gershgorin circle theorem, the eigenvalues of the n_k × n_k symmetric matrix [h{δ(x_i^(k) - x_j^(k))}]_{1 ≤ i,j ≤ n_k} belong to the balls with center h(0) and radius Σ_{1 ≤ j ≤ n_k, j ≠ i} |h{δ(x_i^(k) - x_j^(k))}|. Thus, because of (4), these eigenvalues belong to the segment [h(0) - (1/2)h(0), h(0) + (1/2)h(0)] and are larger than (1/2)h(0). Hence, the N × N matrix T defined as block diagonal, with p blocks and with block k equal to [h{δ(x_i^(k) - x_j^(k))}]_{1 ≤ i,j ≤ n_k}, also has eigenvalues larger than (1/2)h(0).

Consider now a real vector v of size N. Then, because T has eigenvalues larger than (1/2)h(0),

    (1/2) h(0) Σ_{k=1}^p Σ_{i=1}^{n_k} v_{N_{k-1}+i}^2  ≤  Σ_{k=1}^p Σ_{i,j=1}^{n_k} v_{N_{k-1}+i} v_{N_{k-1}+j} h{δ(x_i^(k) - x_j^(k))}.    (5)

Now, define the Np × 1 real vector w as follows: the first N components are

    v_1, ..., v_{n_1}, 0, ..., 0   (N - n_1 zeros),

the components N + 1 to 2N are

    0, ..., 0 (n_1 zeros), v_{n_1+1}, ..., v_{n_1+n_2}, 0, ..., 0   (N - n_1 - n_2 zeros),

and so on, until the last N components, which are

    0, ..., 0 (N_{p-1} zeros), v_{N_{p-1}+1}, ..., v_{N_{p-1}+n_p}.

Let also {s_1, ..., s_N} be defined as {x_1^(1), ..., x_{n_1}^(1), x_1^(2), ..., x_{n_2}^(2), ..., x_1^(p), ..., x_{n_p}^(p)}. Then we have, from (5),

    (1/2) h(0) Σ_{k=1}^p Σ_{i=1}^{n_k} v_{N_{k-1}+i}^2  ≤  Σ_{k=1}^p Σ_{i,j=1}^N w_{(k-1)N+i} w_{(k-1)N+j} h{δ(s_i - s_j)}.

Let, for 1 ≤ i ≤ N, w_i be the p × 1 vector (w_{(k-1)N+i})_{1 ≤ k ≤ p}. Then we can write

    (1/2) h(0) Σ_{k=1}^p Σ_{i=1}^{n_k} v_{N_{k-1)+i}^2
      ≤  Σ_{i,j=1}^N Σ_{k=1}^p w_{(k-1)N+i} w_{(k-1)N+j} δ^{-d} ∫_{R^d} ĥ(f/δ) e^{ı f·(s_i - s_j)} df
      =  δ^{-d} ∫_{R^d} ĥ(f/δ) Σ_{i,j=1}^N (e^{ı f·s_i} w_i)* (e^{ı f·s_j} w_j) df
      =  δ^{-d} ∫_{R^d} ĥ(f/δ) | Σ_{i=1}^N e^{ı f·s_i} w_i |^2 df.    (6)

Let E be the set of f for which ĥ(f/δ) is non-zero. Then E is bounded, since ĥ has compact support. Hence, because the Hermitian matrix Ĉ(f) has strictly positive eigenvalues for all f ∈ R^d and is continuous in f, we have inf_{f ∈ E} λ_1{Ĉ(f)} > 0. Also, because ĥ is continuous, sup_{f ∈ E} ĥ(f/δ) < +∞. Hence, there exists a fixed δ_2 > 0 so that, for any z ∈ C^p and f ∈ R^d,

    z* Ĉ(f) z ≥ δ_2 ĥ(f/δ) |z|^2.

Hence, from (6),

    (1/2) h(0) Σ_{k=1}^p Σ_{i=1}^{n_k} v_{N_{k-1}+i}^2
      ≤  (1/(δ^d δ_2)) ∫_{R^d} ( Σ_{i=1}^N e^{ı f·s_i} w_i )* Ĉ(f) ( Σ_{i=1}^N e^{ı f·s_i} w_i ) df
      =  (1/(δ^d δ_2)) Σ_{i,j=1}^N Σ_{k,l=1}^p w_{(k-1)N+i} w_{(l-1)N+j} ∫_{R^d} ĉ_kl(f) e^{ı f·(s_i - s_j)} df
      =  (1/(δ^d δ_2)) Σ_{k,l=1}^p Σ_{i,j=1}^N w_{(k-1)N+i} w_{(l-1)N+j} c_kl(s_i - s_j)
      =  (1/(δ^d δ_2)) v^t Σ v.

Thus v^t Σ v ≥ (δ^d δ_2 / 2) h(0) Σ_{k=1}^p Σ_{i=1}^{n_k} v_{N_{k-1}+i}^2, that is, λ_1(Σ) ≥ (δ^d δ_2 / 2) h(0) > 0, uniformly in n_1, ..., n_p. This concludes the proof.

We now extend Theorem 4 to the case of a parametric family of covariance and cross covariance functions {c_{θ,kl}(h)}_{1 ≤ k,l ≤ p}, indexed by a parameter θ in a compact set Θ of R^q. Let Ĉ_θ(f) be defined as Ĉ(f), with {c_kl(h)}_{1 ≤ k,l ≤ p} replaced by {c_{θ,kl}(h)}_{1 ≤ k,l ≤ p}. The proof of the next theorem is identical to that of Theorem 4, up to more cumbersome notation, and is omitted.

Theorem 5. Assume that, for all θ ∈ Θ, the functions {c_{θ,kl}(h)}_{1 ≤ k,l ≤ p} satisfy Assumption 1, where A and τ can be chosen independently of θ. Assume that (3) holds with {c_kl(h)}_{1 ≤ k,l ≤ p} replaced by {c_{θ,kl}(h)}_{1 ≤ k,l ≤ p}, for all θ ∈ Θ. Assume that Ĉ_θ(f) is jointly

continuous in f and θ, and that λ_1{Ĉ_θ(f)} > 0 for all f and θ. Assume finally that Assumption 3 is satisfied. Then, with Σ_θ defined as Σ, with {c_kl(h)}_{1 ≤ k,l ≤ p} replaced by {c_{θ,kl}(h)}_{1 ≤ k,l ≤ p}, we have

    inf_{θ ∈ Θ, n_1, ..., n_p ∈ N_+} λ_1(Σ_θ) > 0.

The lower bound provided by Theorem 5 is typically assumed in the references [SR12], [BFG+15] and [FBD15].

3 Concluding remarks

Note that Theorem 4 is typically applicable to models of stochastic processes with discrete definition spaces, such as time series or Gauss-Markov random fields. Indeed, these models incorporate a fixed minimal distance between any two different observation points.

It is well known (e.g., [Wac03]) that covariance and cross covariance functions {c_kl(h)}_{1 ≤ k,l ≤ p} satisfy λ_1{Ĉ(f)} ≥ 0 for all f. It is also known that if λ_1{Ĉ(f)} > 0 for almost all f ∈ R^d, then λ_1(Σ) > 0 whenever the points (x_i^(k))_{1 ≤ i ≤ n_k} are two-by-two distinct for all k. We believe that this latter assumption on λ_1{Ĉ(f)} is nevertheless generally insufficient for Theorem 4 to hold. Indeed, consider for illustration the univariate and unidimensional case with the triangular covariance function, that is, d = 1, p = 1 and c(h) = c_11(h) = (1 - |h|) 1{|h| ≤ 1}. Then (see, e.g., [Ste99]) ĉ(f) = sinc^2(f/2)/(2π) has a countable number of zeros. Consider now the sequence of observation points (x_i^(1))_{i ∈ N_+} = (x_i)_{i ∈ N_+} = (i/2)_{i ∈ N_+}. Then Σ is a tridiagonal Toeplitz matrix with σ_ii = 1 and σ_{i,i+1} = σ_{i+1,i} = 1/2. Hence (see, e.g., [NPR13]), for any value of n = n_1, the eigenvalues of Σ are [1 + cos{(iπ)/(n + 1)}]_{1 ≤ i ≤ n}. Thus, although λ_1(Σ) = 1 + cos{(nπ)/(n + 1)} is strictly positive for any n, it goes to 0 as n → ∞. Hence, Assumption 2, which states that λ_1{Ĉ(f)} is strictly positive for all f, appears to be generally necessary for Theorem 4 to hold.

In order to derive increasing-domain asymptotic properties for covariance tapering (e.g., in [SR12] or [BFG+15]), one typically assumes that the smallest eigenvalues of tapered covariance matrices are bounded below, uniformly in n_1, ..., n_p.
A tapered covariance matrix is of the form Σ ∘ A, where ∘ is the Schur (component-by-component) product and where A is symmetric positive semi-definite with diagonal components equal to 1. Because of the relation λ_1(Σ ∘ A) ≥ (min_i A_ii) λ_1(Σ) (see [HJ91], Theorem 5.3.4), Theorem 4 directly provides a uniform lower bound for the smallest eigenvalues of tapered covariance matrices.

Acknowledgment: Reinhard Furrer acknowledges support of the UZH Research Priority Program (URPP) on Global Change and Biodiversity and the Swiss National Science Foundation SNSF-143282.
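As a closing numerical illustration (a sketch, not part of the paper), both the triangular-covariance counterexample and the Schur-product bound above can be verified in a few lines; the Gaussian taper used for A is a hypothetical choice of a positive semi-definite matrix with unit diagonal.

```python
import numpy as np

# (i) Triangular covariance c(h) = max(1 - |h|, 0) at x_i = i/2: Sigma is
# tridiagonal Toeplitz with unit diagonal and off-diagonal 1/2, and its
# eigenvalues are 1 + cos(i*pi/(n + 1)), so lambda_1(Sigma) -> 0 as n grows.
n = 50
x = 0.5 * np.arange(1, n + 1)
h = x[:, None] - x[None, :]
sigma = np.maximum(1.0 - np.abs(h), 0.0)
eigs = np.linalg.eigvalsh(sigma)                       # ascending order
expected = np.sort(1.0 + np.cos(np.arange(1, n + 1) * np.pi / (n + 1)))
assert np.allclose(eigs, expected)
assert eigs[0] < 0.01        # already close to zero for n = 50

# (ii) Schur-product (taper) bound lambda_1(Sigma o A) >= (min_i A_ii) * lambda_1(Sigma),
# here with a hypothetical Gaussian taper A of range 10.
A = np.exp(-(h / 10.0) ** 2)
assert np.linalg.eigvalsh(sigma * A)[0] >= A.diagonal().min() * eigs[0] - 1e-12
```

Since min_i A_ii = 1 for any valid taper, the bound transfers the uniform lower bound of Theorem 4 to the tapered matrix unchanged.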

References

[Bac14] F. Bachoc. Asymptotic analysis of the role of spatial sampling for covariance parameter estimation of Gaussian processes. Journal of Multivariate Analysis, 125:1-35, 2014.

[BFG+15] M. Bevilacqua, A. Fassò, C. Gaetan, E. Porcu, and D. Velandia. Covariance tapering for multivariate Gaussian random fields estimation. Statistical Methods & Applications, pages 1-17, 2015.

[FBD15] R. Furrer, F. Bachoc, and J. Du. Asymptotic properties of multivariate tapering for estimation and prediction. http://arxiv.org/abs/1506.01833, 2015+.

[HJ91] R. Horn and C. Johnson. Topics in Matrix Analysis. Cambridge University Press, Cambridge, 1991.

[MM84] K.V. Mardia and R.J. Marshall. Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika, 71:135-146, 1984.

[NPR13] S. Noschese, L. Pasquini, and L. Reichel. Tridiagonal Toeplitz matrices: properties and novel applications. Numerical Linear Algebra with Applications, 20(2):302-326, 2013.

[SR12] B.A. Shaby and D. Ruppert. Tapered covariance: Bayesian estimation and asymptotics. Journal of Computational and Graphical Statistics, 21(2):433-452, 2012.

[Ste99] M.L. Stein. Interpolation of Spatial Data: Some Theory for Kriging. Springer, New York, 1999.

[Wac03] H. Wackernagel. Multivariate Geostatistics. Springer, Berlin, 2003.