Nonstationary cross-covariance models for multivariate processes on a globe

Mikyoung Jun¹

April 15, 2011

Abstract: In geophysical and environmental problems, it is common to have multiple variables of interest measured at the same location and time. These multiple variables typically have dependence over space (and/or time). As a consequence, there is a growing interest in developing models for multivariate spatial processes, in particular, the cross-covariance models. On the other hand, many data sets these days cover a large portion of the Earth such as satellite data, which require valid covariance models on a globe. We present a class of parametric covariance models for multivariate processes on a globe. The covariance models are flexible in capturing nonstationarity in the data yet computationally feasible and require moderate numbers of parameters. We apply our covariance model to surface temperature and precipitation data from an NCAR climate model output. We compare our model to the multivariate version of the Matérn cross-covariance function and models based on coregionalization and demonstrate the superior performance of our model in terms of AIC (and/or maximum loglikelihood values) and predictive skill. We also present some challenges in modeling the cross-covariance structure of the temperature and precipitation data. Based on the fitted results using full data, we give the estimated cross-correlation structure between the two variables.

KEY WORDS: cross-covariance model, linear model of coregionalization, multivariate process, nonstationary process, process on a globe

¹ Mikyoung Jun is Assistant Professor, Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX (mjun@stat.tamu.edu).

1. INTRODUCTION

Geophysical or environmental problems routinely involve multiple variables measured at the same spatial location and time point. Often, the main interest is to study the relationships between the multiple variables, which would include an accounting of any spatial and/or temporal correlations. On the other hand, with the advance of science and technology, it is common to have data with global coverage. One good example is the study of the relationship between surface temperature and precipitation (Trenberth and Shea 2005, Tebaldi and Lobell 2008, Tebaldi and Sansó 2009). As stated in Tebaldi and Lobell (2008), from the climate impact research point of view, studying the joint distribution of surface temperature and precipitation is more interesting than studying each variable separately. Trenberth and Shea (2005) estimate the empirical (spatial) cross-correlation between surface temperature and precipitation using the numerical model outputs from the Community Climate System Model version 3 (CCSM3) developed by the National Center for Atmospheric Research (NCAR). However, their estimates are based on sample correlations and, as several authors point out including Bishop and Hodyss (2007), sample correlations often give spurious correlations when the dimension of the system is much larger than the sample size. Therefore, it is essential to develop joint models for the variables that can account for not only the marginal but also the cross-covariance structures and that are valid over the whole globe.

A number of authors have developed cross-covariance models for multivariate spatial processes. One of the most traditional methods is the linear model of coregionalization (LMC) (Goulard and Voltz 1992, Wackernagel 2003) and the key idea is to represent each process as a linear combination of latent, independent, and stationary (often isotropic) processes.
Schmidt and Gelfand (2003) present a Bayesian stationary cross-covariance model based on the idea of the LMC. Gelfand, Schmidt, Banerjee, and Sirmans (2004) provide a good review of the history of the methods for multivariate processes and Schmidt and Gelfand (2003) extend the model by using a spatially varying LMC to account for nonstationarity. Majumdar and Gelfand (2007) present an approach to modeling stationary processes based on convolving covariance functions. This model is then extended to nonstationary processes in Majumdar, Paul, and Bautista (2010). A semiparametric approach to modeling multivariate spatial processes is proposed in Reich and Fuentes (2007). Their covariance model is nonstationary but has a separable structure; the cross-covariance is factored into a multivariate component and a spatial component, which may be limiting in some situations. Choi, Reich, Fuentes, and Davis (2009) use a spatio-temporal version of the LMC model to deal with speciated fine particles over the US with separable covariance functions.

In terms of developing parametric classes of covariance models for multivariate spatial processes, other than those based on LMC, there have been only a few papers. Apanasovich and Genton (2010) propose using latent dimensions to create a valid covariance model for multivariate processes from a covariance model for a univariate process. They present the model for spatio-temporal processes. Their method is convenient for producing valid covariance models for multivariate processes but they assume stationarity (although in principle their method can be used for nonstationary processes). They introduce a concept of distance between different processes, which is different from the usual spatial distances or temporal lags, and it is not clear what this distance actually means and how it compares with physical distances in the space and time domains. Under their setting, one needs to estimate this distance along with covariance parameters. Gneiting, Kleiber, and Schlather (2010) present a Matérn type covariance model for multivariate processes. Their model is for isotropic processes only. One of the nice features of their model is that it allows different smoothness for different processes in the multivariate setting, which can be useful for some data. Note though that their cross-covariance model is symmetric. That is, if we consider a bivariate process, $(Z_1, Z_2)$, on locations $s_1$ and $s_2$, their model implies $\mathrm{Cov}\{Z_1(s_1), Z_2(s_2)\} = \mathrm{Cov}\{Z_2(s_1), Z_1(s_2)\}$ for all $s_1$ and $s_2$, which may not be the case in many geophysical and environmental data sets. The co-located correlation parameter $\rho$, which controls the strength of cross-correlation between the two variables, is constant over the entire domain and this may be too restrictive for some data sets. As demonstrated in Section 4, for the data set that we consider in this paper, this limitation leads to an estimate of $\hat{\rho} \approx 0$, even though there is a clear dependence between the two variables.
Furthermore, the above covariance functions are designed for stationary (or isotropic) processes and none are for processes on a globe. To the best of our knowledge, there is no such flexible nonstationary cross-covariance function for spatial processes on a sphere. Our focus in this paper is to develop cross-covariance functions for multivariate processes on a globe. Moreover, the covariance model is flexible enough to capture nonstationarity in the data and other complex covariance patterns. Our covariance models require moderate numbers of covariance parameters and are thus computationally feasible. The remainder of the paper is organized as follows. In Section 2, we discuss some properties of nonstationary covariance structure. Section 3 presents the construction of our covariance model and discusses some computational issues. The application to the joint modeling of global surface temperature and precipitation data is presented in Section 4. Section 4 shows a comparison of our model with some models proposed in Gneiting et al. (2010) and the LMC models. We also show our estimated cross-correlation between the surface temperature and precipitation and we compare it with the result of Trenberth and Shea (2005). We

conclude the paper with some discussion in the final section.

2. NONSTATIONARY COVARIANCE STRUCTURE

In this section, we explore some prevalent features of nonstationarity in the covariance structure of geophysical processes on a globe. We discuss those properties for both marginal and cross-covariances in parallel and show some empirical figures from our data (for details of the data, see Section 4.1). Throughout the section, we denote the variable of interest as $(Z_i(L, l),\ i = 1, 2, \ldots, N)$, a multivariate process on the surface of a globe $S^2$ (the surface of a sphere in $\mathbb{R}^3$ with radius $R$) and we illustrate for the case when $N = 2$. Note that $L$ and $l$ denote latitude and longitude, respectively.

2.1 Dependence on latitude

It is common for geophysical processes on a globe to have covariance structure depending on latitude (Stein 2007). In particular, the local variation of the process usually changes with latitude and in fact, Jun and Stein (2008) show that variances of several linear combinations of total column ozone level exhibit strong dependence on latitude. Surface temperature and precipitation data that we consider in this paper possess this kind of nonstationarity for both marginal and cross-covariances. Figures 1-3 in Trenberth and Shea (2005) show that the standard deviations for both variables as well as their cross-correlations have patterns depending on latitude. Figure 1 of this paper displays the standard deviations and cross-correlations of surface temperature and precipitation, averaged over November to March each year and then averaged over 1970 to 1999. Figures (a), (c), and (e) give each quantity with respect to latitude, that is, at each latitude, the standard deviation or cross-correlation of the data across all longitude values is calculated. Figures (b), (d), and (f) give the standard deviation or cross-correlation of the data calculated across all latitude values, at each longitude. See Appendix for more details on how these values are calculated.
Notice that although we see some dependence of standard deviation and cross-correlation with respect to longitude, the dependence is more obvious with respect to latitude. This may suggest the processes are reasonably modeled as axially symmetric (Jones 1963) both marginally and jointly; the covariance structure is stationary with respect to longitude and nonstationary with respect to latitude.
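The by-latitude summaries described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `by_latitude_summaries` and the array layout (rows are latitudes, columns are longitudes) are hypothetical, and the exact procedure used for Figure 1 is the one given in the paper's Appendix, which may differ in detail.

```python
import numpy as np

def by_latitude_summaries(temp, precip):
    """Per-latitude standard deviations and cross-correlation.

    temp, precip: arrays of shape (n_lat, n_lon) holding two fields on the
    same grid (hypothetical layout). Each summary is computed across all
    longitude values at a fixed latitude.
    """
    sd_temp = temp.std(axis=1, ddof=1)
    sd_precip = precip.std(axis=1, ddof=1)
    # Center each latitude row, then form the sample correlation row by row.
    tc = temp - temp.mean(axis=1, keepdims=True)
    pc = precip - precip.mean(axis=1, keepdims=True)
    cross = (tc * pc).sum(axis=1) / np.sqrt((tc ** 2).sum(axis=1) * (pc ** 2).sum(axis=1))
    return sd_temp, sd_precip, cross

# Synthetic example: the spread of the first field grows with latitude index.
rng = np.random.default_rng(0)
scale = np.linspace(0.5, 2.0, 6)[:, None]
field_1 = scale * rng.standard_normal((6, 48))
field_2 = 0.5 * field_1 + rng.standard_normal((6, 48))
sd_1, sd_2, cross_corr = by_latitude_summaries(field_1, field_2)
```

The by-longitude versions follow by transposing both inputs.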

2.2 Longitudinal reversibility

We say a univariate process $Z_1$ is longitudinally reversible if $\mathrm{Cov}\{Z_1(L_1, l_1), Z_1(L_2, l_2)\} = \mathrm{Cov}\{Z_1(L_1, l_2), Z_1(L_2, l_1)\}$ for all $L_1, L_2, l_1, l_2$ (Stein 2007). Stein (2007) and Jun and Stein (2008) show that the total column ozone process is longitudinally irreversible; we find that it is also the case for both temperature and precipitation data marginally (not shown). This concept can be applied to cross-covariances as well. Call the cross-covariance of the two processes $Z_1$ and $Z_2$ longitudinally reversible if $\mathrm{Cov}\{Z_1(L_1, l_1), Z_2(L_2, l_2)\} = \mathrm{Cov}\{Z_1(L_1, l_2), Z_2(L_2, l_1)\}$ for all $L_1, L_2, l_1, l_2$. For some data sets, however, longitudinally irreversible cross-covariances may be hard to estimate. Figure 2 (a) shows the empirical estimate of the difference of the cross-correlations at a latitude band,

$$\mathrm{Cor}\{Z_1(L, l), Z_2(L, l + \Delta l)\} - \mathrm{Cor}\{Z_1(L, l + \Delta l), Z_2(L, l)\},$$

against latitude (x-axis) and longitude lag, $\Delta l$ (y-axis). The empirical estimate for the above quantity is based on temporally averaged data, a 30 year average (see Section 4.1 for details on how we aggregate the data temporally). To get these empirical estimates, first we bin the latitude with a bin size of roughly $7^\circ$. Then within each bin, for each $\Delta l$, we calculate the correlations between the two variables at the longitude lag of $\Delta l$. To assess the uncertainty of the empirical longitudinal irreversibility, we also split the total 30 year period into 30 intervals of one year and instead of calculating irreversibility using a 30 year average, we calculate the irreversibility based on 30 annual averages. The mean and the standard deviation of these irreversibility quantities based on these 30 data points (annual averages) are given in (b) and (c), respectively. Note that (a) and (b) give quite similar patterns although the range of irreversibility in (b) is narrower.
Although there seems to be some strong longitudinal irreversibility in the cross-correlation near the poles and mid latitude of Southern Hemisphere at large longitudinal lags, the uncertainty associated with it (especially near the poles) is high. It might be hard to fit the pattern with any smooth function of latitude and longitude lags due to the complex nature of the empirical irreversibility surface. See Section 4 for more discussion on this issue.
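A minimal sketch of an empirical irreversibility estimate within one latitude bin is given below. It assumes the two fields sit on a full, equally spaced longitude grid, so that a longitude lag is a circular shift; the function names and the pooling of all grid cells in the bin into one sample correlation are illustrative assumptions, not necessarily the exact estimator behind Figure 2.

```python
import numpy as np

def lagged_cross_correlation(z1, z2, lag):
    """Sample version of Cor{Z1(L, l), Z2(L, l + lag)}, pooled over the bin.

    z1, z2: arrays of shape (n_lat_in_bin, n_lon) on a full, equally spaced
    longitude grid; lag is an integer number of grid steps.
    """
    z2_shifted = np.roll(z2, -lag, axis=1)  # z2_shifted[:, j] == z2[:, j + lag]
    return np.corrcoef(z1.ravel(), z2_shifted.ravel())[0, 1]

def empirical_irreversibility(z1, z2, lag):
    """Cor{Z1(L, l), Z2(L, l + lag)} - Cor{Z1(L, l + lag), Z2(L, l)}."""
    forward = lagged_cross_correlation(z1, z2, lag)
    # On a circular grid, pairing Z1(l + lag) with Z2(l) is the same as
    # pairing Z1(l) with Z2(l - lag).
    backward = lagged_cross_correlation(z1, z2, -lag)
    return forward - backward

# Synthetic example: z2 is z1 displaced eastward by 5 grid cells plus noise,
# which produces a strongly irreversible cross-correlation at lag 5.
rng = np.random.default_rng(0)
z1 = rng.standard_normal((3, 64))
z2 = np.roll(z1, 5, axis=1) + 0.1 * rng.standard_normal((3, 64))
irr_at_5 = empirical_irreversibility(z1, z2, 5)
```

By construction the estimate is antisymmetric in the lag and vanishes when the two fields are identical, which mirrors the definition in the text.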

2.3 Asymmetry

We now define a general concept of asymmetry for multivariate spatial processes. We call the cross-covariance of $Z_1$ and $Z_2$ symmetric if $\mathrm{Cov}\{Z_1(L_1, l_1), Z_2(L_2, l_2)\} = \mathrm{Cov}\{Z_1(L_2, l_2), Z_2(L_1, l_1)\}$ for all $L_1, L_2, l_1, l_2$. If the cross-covariance structure is asymmetric, then the cross-covariance matrix of some set of observations will generally be asymmetric. Note that the model proposed in Gneiting et al. (2010) is always symmetric. The model from the linear model of coregionalization is also symmetric unless the coefficients vary spatially. Apanasovich and Genton (2010) present cross-covariance models that are asymmetric in the space-time domain, which is somewhat different from the asymmetry discussed in this paper.

3. METHODOLOGY

In this section, we develop joint covariance models that can exhibit the nonstationary properties discussed in Section 2 for not only marginal but also cross-covariance structure. We also discuss computational methods that enable us to compute full likelihoods efficiently when we have large global data sets on a regular grid for multivariate processes.

3.1 Model

Jun and Stein (2007) proposed an approach to produce nonstationary covariance models for a univariate process on a globe to capture space-time asymmetry, which is commonly found in environmental data (Gneiting 2002; Jun and Stein 2004; Li, Genton, and Sherman 2008). The key idea is to apply differential operators with respect to latitude, longitude, and time to an isotropic spatio-temporal process and Section 4 of Jun and Stein (2007) demonstrates the effectiveness of the approach in capturing such space-time asymmetry. Jun and Stein (2008) further explore the idea of applying differential operators with respect to latitude and longitude to an isotropic spatial process to represent various nonstationary properties of a univariate process on a globe.
They demonstrate that their model captures small scale variation in a univariate process well. The key to the model is its flexibility, which results from the products of first order differential operators with respect to latitude and longitude, applied to an underlying process. We extend this idea to multivariate spatial processes on a globe. Now we show how the idea of applying differential operators to the processes can be applied to multivariate isotropic spatial processes to create nonstationary cross-covariance structure, in particular,

asymmetry and longitudinal irreversibility, which depends on latitude in a flexible way. Suppose we have a multivariate spatial process, $(Z_1(L, l), \ldots, Z_N(L, l))$, defined on a globe, $S^2$, and we are interested in modeling the joint distribution of the $Z_i$'s. We will focus on the case that $N = 2$ here; for the case that $N > 2$, the method extends in a natural way. Let us assume $(Z_1, Z_2)$ is a bivariate process with mean zero. We also assume that the $Z_i$'s are axially symmetric both marginally and jointly. Let us write $Y = G(\alpha, \beta, \nu)$ if the process $Y$ defined on $S^2$ has mean zero and its covariance is given by a Matérn covariance function:

$$\mathrm{Cov}\{Y(L_1, l_1), Y(L_2, l_2)\} = K(L_1, L_2, l_1 - l_2) = \alpha \left(\frac{d}{\beta}\right)^{\nu} K_{\nu}\!\left(\frac{d}{\beta}\right). \tag{1}$$

Here $K$ denotes the covariance function of $Y$, the $L_i$'s are latitude values and the $l_i$'s are longitude values ($i = 1, 2$). The parameters, $\alpha, \beta, \nu > 0$, are the sill, spatial range, and the smoothness parameters for a Matérn class, respectively, $K_\nu$ is the modified Bessel function, and

$$d = d(L_1, L_2, l_1 - l_2) = 2R \left\{ \sin^2\!\left(\frac{L_1 - L_2}{2}\right) + \cos L_1 \cos L_2 \sin^2\!\left(\frac{l_1 - l_2}{2}\right) \right\}^{1/2} \tag{2}$$

denotes the chordal distance between the two locations, $(L_1, l_1)$ and $(L_2, l_2)$. To ensure the positive definiteness of (1) on $S^2$, we need to use chordal distance as a spatial metric instead of a geodesic distance (see Jun and Stein (2007) for a detailed discussion). Let us first consider a simple setting:

$$\delta\, Z_i(L, l) = a_i \{Y(L + \delta, l) - Y(L, l)\} + b_i \{Y(L, l + \delta) - Y(L, l)\}, \quad i = 1, 2, \tag{3}$$

with $a_i, b_i$ constants and $\delta > 0$. When $\delta \to 0$, (3) is essentially equivalent to the model with differential operators with respect to latitude and longitude applied to the process $Y$ in the $L^2$ sense (instead of taking differences).
It is easy to see from (3) that when $\delta \to 0$, the cross-covariance of $Z_1$ and $Z_2$ can be written as

$$\mathrm{Cov}\{Z_1(L_1, l_1), Z_2(L_2, l_2)\} = a_1 a_2 \frac{\partial^2}{\partial L_1 \partial L_2} K(L_1, L_2, l) - b_1 b_2 \frac{\partial^2}{\partial l^2} K(L_1, L_2, l) - a_1 b_2 \frac{\partial^2}{\partial L_1 \partial l} K(L_1, L_2, l) + b_1 a_2 \frac{\partial^2}{\partial L_2 \partial l} K(L_1, L_2, l), \tag{4}$$

with $l = l_1 - l_2$. Note that the second order partial derivatives of $K$ in (4) originate from the limits of the second order differences of the covariance $K$. For example,

$$\frac{\partial^2}{\partial L_1 \partial L_2} K(L_1, L_2, l) = \lim_{\delta \to 0} \frac{1}{\delta^2} \{K(L_1 + \delta, L_2 + \delta, l) - K(L_1 + \delta, L_2, l) - K(L_1, L_2 + \delta, l) + K(L_1, L_2, l)\}.$$

Furthermore, to have the limit properly defined, we need to have $\nu > 1$. For more details on this condition, see Section 2 of Stein (1999) and the result in Jun and Stein (2007).
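As a small numerical illustration of (1) and (2), the sketch below evaluates the chordal distance and the Matérn form for the special case $\nu = 3/2$, where $x^{3/2} K_{3/2}(x) = \sqrt{\pi/2}\,(1 + x)e^{-x}$ is available in closed form (and $\nu > 1$ holds). The function names, the Earth radius of 6371 km, and the default $\beta = 2000$ km are assumptions chosen for illustration; a general $\nu$ would need a modified Bessel function routine.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, for illustration only

def chordal_distance(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Chordal distance (2); angles in radians, result in the units of radius."""
    s = (math.sin((lat1 - lat2) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon1 - lon2) / 2) ** 2)
    return 2 * radius * math.sqrt(s)

def matern_three_halves(d, alpha=1.0, beta=2000.0):
    """Covariance (1) with nu = 3/2: alpha * (d/beta)^{3/2} * K_{3/2}(d/beta).

    Uses the closed form x^{3/2} K_{3/2}(x) = sqrt(pi/2) * (1 + x) * exp(-x).
    In this unnormalized form the value at d = 0 is alpha * sqrt(pi/2).
    """
    x = d / beta
    return alpha * math.sqrt(math.pi / 2) * (1 + x) * math.exp(-x)

def to_xyz(lat, lon, radius=EARTH_RADIUS_KM):
    """Cartesian coordinates of a point on the sphere (used only as a check)."""
    return (radius * math.cos(lat) * math.cos(lon),
            radius * math.cos(lat) * math.sin(lon),
            radius * math.sin(lat))

# The chordal distance is the straight-line distance through the sphere,
# so it should agree with the Euclidean distance between the 3-D points.
lat_a, lon_a = math.radians(30.0), math.radians(-96.0)
lat_b, lon_b = math.radians(-10.0), math.radians(20.0)
d_chord = chordal_distance(lat_a, lon_a, lat_b, lon_b)
d_euclid = math.dist(to_xyz(lat_a, lon_a), to_xyz(lat_b, lon_b))
cov_ab = matern_three_halves(d_chord)
```

Because the chordal distance is a Euclidean distance in $\mathbb{R}^3$, any covariance function valid in $\mathbb{R}^3$ stays positive definite when restricted to the sphere, which is the reason given in the text for preferring it over the geodesic distance.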

Now let us consider the longitudinal irreversibility and the asymmetry discussed in Sections 2.2 and 2.3. It is straightforward from (4) that when $\delta \to 0$, the longitudinal irreversibility is given by

$$\mathrm{Cov}\{Z_1(L_1, l_1), Z_2(L_2, l_2)\} - \mathrm{Cov}\{Z_1(L_1, l_2), Z_2(L_2, l_1)\} = -2 a_1 b_2 \frac{\partial^2}{\partial L_1 \partial l} K(L_1, L_2, l) + 2 b_1 a_2 \frac{\partial^2}{\partial L_2 \partial l} K(L_1, L_2, l), \tag{5}$$

and the asymmetry is given by

$$\mathrm{Cov}\{Z_1(L_1, l_1), Z_2(L_2, l_2)\} - \mathrm{Cov}\{Z_1(L_2, l_2), Z_2(L_1, l_1)\} = (-a_1 b_2 + b_1 a_2) \left\{ \frac{\partial^2}{\partial L_1 \partial l} K(L_1, L_2, l) + \frac{\partial^2}{\partial L_2 \partial l} K(L_1, L_2, l) \right\}. \tag{6}$$

From (5) and (6), it is clear that the types of nonstationarity in the cross-covariance discussed in Sections 2.2 and 2.3 are achieved mainly from the interactions between the first and second order differences of $Y$ in (3). If $a_1 = a_2$ and $b_1 = b_2$, the asymmetry in (6) reduces to zero, although the longitudinal irreversibility in (5) may not be zero.

We show some plots of the cross-covariance structure for various values of the $a_i$'s and $b_i$'s to further demonstrate the behavior of the proposed covariance model in a simple setting. We work with the correlation scale instead of the covariance scale. We fix $\alpha = 1$, $a_1 = 1$, and $\beta = 2000$ km. We vary $a_2$, $b_1$, $b_2$, and $\nu$ to explore the nonstationarities of the covariance model of (3) when $\delta \to 0$. Figure 3 gives the longitudinal irreversibility in the cross-correlation structure given in (5). The irreversibility in the covariance scale should have the same shape except for the scale. Notice that different $\nu$ values give different shapes of the irreversibility. From (5), the irreversibility depends on the $a_i$'s and $b_i$'s through $a_1 b_2$ and $b_1 a_2$. If the signs of $a_1 b_2$ and $b_1 a_2$ change together, the sign of the irreversibility should also change. Therefore, we see that several sets of the $a_i$'s and $b_i$'s give either the same irreversibility or the same magnitude of irreversibility with different signs.
For example, when $a_2 = 0$, the pairs, (a) and (b), (c) and (d), (e) and (f), and (g) and (h), give the same irreversibility. Moreover, when $a_2 = 0$, the irreversibilities in (a) and (b) and those in (c) and (d) have the same magnitudes but different signs. When $a_2 \ne 0$, the pairs, (a) and (d), (b) and (c), (e) and (h), and (f) and (g), give the same magnitude of the irreversibility with different signs. It may then appear that there are some identifiability problems in the $a_i$'s and $b_i$'s since some sets of these coefficients give the same irreversibility curves. However, this is not the case for the covariance structure of the bivariate process. As long as we fix the sign of only one of the four coefficients (the $a_i$'s and $b_i$'s), we can avoid the identifiability problem (see (4)). Figure 4 gives the asymmetry against longitudinal lags in the cross-correlation structure given in (6). Unlike the longitudinal irreversibility, the asymmetry in the covariance scale may have a different

shape than the asymmetry in the correlation scale. This is because the asymmetry in the correlation scale is

$$\mathrm{Cor}\{Z_1(L_1, l_1), Z_2(L_2, l_2)\} - \mathrm{Cor}\{Z_1(L_2, l_2), Z_2(L_1, l_1)\} = \frac{\mathrm{Cov}\{Z_1(L_1, l_1), Z_2(L_2, l_2)\}}{\sqrt{\mathrm{Var}\{Z_1(L_1, l_1)\}\,\mathrm{Var}\{Z_2(L_2, l_2)\}}} - \frac{\mathrm{Cov}\{Z_1(L_2, l_2), Z_2(L_1, l_1)\}}{\sqrt{\mathrm{Var}\{Z_1(L_2, l_2)\}\,\mathrm{Var}\{Z_2(L_1, l_1)\}}}, \tag{7}$$

and the terms in each denominator are in general not the same except in cases such as $b_2 = a_2 b_1$ ($a_2 \ne 0$), for any $L_1, L_2, l_1, l_2$. Therefore, the asymmetries in the correlation scale in general do not simply depend on the coefficients through $a_1 b_2 - b_1 a_2$ but in a more complex manner. For example, when $a_2 = 0$, even if (a) and (b) have the same $b_2$ value, their asymmetries are different. When $a_2 = \pm 0.1$ and $b_2 = a_2 b_1$, the asymmetry is zero. When $a_2 = \pm 0.1$ and $b_2 \ne a_2 b_1$, then we may get the same magnitude of the irreversibility but with different signs. For example, the pairs, (b) and (c) with $a_2 = 0.1$ and (a) and (d) with $a_2 = 0.1$, give the same magnitude of the irreversibility with different signs. It is interesting to note that even if $l_1 - l_2 = 0$, the asymmetry for some combinations of the coefficients is not zero (because $L_1 \ne L_2$). Figure 5 (a)-(d) display the asymmetry against latitude. Note the asymmetries are zero when $L_2 = 0$ and the pairs, (a) and (d) or (b) and (c), give symmetric asymmetry values around $L_2 = 0$. The fact that the asymmetries are zero when $L_2 = 0$ can also be easily explained by (11) (we will discuss this further when we introduce (11) later). Note that the asymmetries are not necessarily symmetric against the equator, which is realistic for most real data sets.

We now generalize the model in (3) in the sense that the coefficients are functions of latitude values. That is, we write:

$$Z_i(L, l) = \sum_{k=1}^{n} \left\{ A_{i,k}(L) \frac{\partial}{\partial L} + B_{i,k}(L) \frac{\partial}{\partial l} \right\} Y_k(L, l) + C_i(L)\, Y_0(L, l). \tag{8}$$

Here, the partial derivatives are defined in the $L^2$ sense and $Y_k = G(\alpha_k, \beta_k, \nu_k)$ ($\alpha_k, \beta_k > 0$, $k = 0, \ldots, n$, $\nu_0 > 0$, and $\nu_k > 1$, $k = 1, \ldots, n$). If $\nu_k \le 1$ for $k = 1, \ldots, n$, then the mean square derivatives of $Y_k$ are not properly defined. We assume the $Y_k$'s ($k = 0, \ldots, n$) are independent of each other. A useful simplification is to assume the $Y_k$ share the same covariance parameters for $k = 1, \ldots, n$, but we generally let $Y_0$ have different covariance parameters than $Y_1, \ldots, Y_n$ to allow sufficient flexibility in the local behavior of the model. Note it is not necessary to include $Y_0$ in (8). We may then let $C_i = 0$ for parsimony. The functions $A_{i,k}$, $B_{i,k}$, and $C_i$ in (8) are nonrandom functions and we model these functions as

linear combinations of Legendre polynomials. For instance, we let

$$A_{i,k}(L) = \sum_{j=0}^{m} a_{ikj}\, P_j(\sin L), \tag{9}$$

where $P_j$ denotes the Legendre polynomial of order $j$. Then the $a_{ikj} \in \mathbb{R}$ are additional covariance parameters to be estimated along with other covariance parameters. The maximum order of Legendre polynomials used here, $m$, is chosen arbitrarily and we expect a modest value of $m$ should be able to produce flexible covariance functions. The values of $m$ for $A_{i,k}$, $B_{i,k}$, and $C_i$ may be different. Larger $m$ will obviously give more flexibility to the covariance structure and we may let $m = 0$ for a parsimonious model. We compare the different possibilities discussed here in Section 4.

Although the $Y_k$'s are independent of each other, the $Z_i$'s have nonzero cross-covariance and we can get explicit expressions for the cross-covariance of the $Z_i$'s. For instance, suppose $n = 1$, $Y_1 = G(1, \beta, \nu)$ ($\nu > 1$), and $C_i = 0$ ($i = 1, 2$). Set $h = h(L_1, L_2, l_1 - l_2) = (d/\beta)^2$ for $d$ defined in (2), $h_p = h_p(L_1, L_2, l_1 - l_2) = \partial h / \partial x_p$ and $h_{pq} = h_{pq}(L_1, L_2, l_1 - l_2) = \partial^2 h / \partial x_p \partial x_q$, where $x_1 = L_1$, $x_2 = L_2$, and $x_3 = l_1 - l_2$ (see Appendix A of Jun and Stein (2007) for explicit expressions for $h_p$ and $h_{pq}$). Also let $M_\nu(x) = x^\nu K_\nu(x)$. Then the cross-covariance function of $Z_1$ and $Z_2$ is given by

$$\mathrm{Cov}\{Z_1(L_1, l_1), Z_2(L_2, l_2)\} = \Gamma_1 M_{\nu - 2}(\sqrt{h}) + \Gamma_2 M_{\nu - 1}(\sqrt{h}), \tag{10}$$

where $\Gamma_1$ and $\Gamma_2$ are

$$\Gamma_1 = \frac{1}{4} \{A_{1,1}(L_1) A_{2,1}(L_2) h_1 h_2 - B_{1,1}(L_1) B_{2,1}(L_2) h_3^2 - A_{1,1}(L_1) B_{2,1}(L_2) h_1 h_3 + B_{1,1}(L_1) A_{2,1}(L_2) h_2 h_3\},$$

and

$$\Gamma_2 = -\frac{1}{2} \{A_{1,1}(L_1) A_{2,1}(L_2) h_{12} - B_{1,1}(L_1) B_{2,1}(L_2) h_{33} - A_{1,1}(L_1) B_{2,1}(L_2) h_{13} + B_{1,1}(L_1) A_{2,1}(L_2) h_{23}\}.$$
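To make the parameterization in (9) concrete, the sketch below evaluates a latitude-dependent coefficient function from a list of weights $a_{ikj}$. The function names and the example weights are hypothetical; the Legendre polynomials are computed with the standard Bonnet recurrence $(j+1)P_{j+1}(x) = (2j+1)xP_j(x) - jP_{j-1}(x)$.

```python
import math

def legendre_values(x, max_order):
    """Return [P_0(x), ..., P_max_order(x)] using Bonnet's recurrence."""
    values = [1.0]
    if max_order >= 1:
        values.append(x)
    for j in range(1, max_order):
        values.append(((2 * j + 1) * x * values[j] - j * values[j - 1]) / (j + 1))
    return values

def coefficient_function(lat, weights):
    """Evaluate sum_j weights[j] * P_j(sin(lat)) as in (9); lat in radians."""
    basis = legendre_values(math.sin(lat), len(weights) - 1)
    return sum(w * p for w, p in zip(weights, basis))

# A constant coefficient corresponds to m = 0 (a single weight).
a_constant = coefficient_function(math.radians(45.0), [1.0])

# Hypothetical m = 2 example with weights (5, 0, 5): 5 + 5 * P_2(sin L).
weights = [5.0, 0.0, 5.0]
at_equator = coefficient_function(0.0, weights)
at_north_pole = coefficient_function(math.pi / 2, weights)
```

With $m = 0$ the function reduces to a constant, which recovers the simple setting in (3); each extra order adds one parameter per coefficient function.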
The cross product terms of $A_{1,k}$ or $B_{1,k}$ and $A_{2,k}$ or $B_{2,k}$ in $\Gamma_1$ and $\Gamma_2$ come from the covariance of processes with $\partial/\partial L$ and $\partial/\partial l$ applied in (8), and through the linear combination terms for $A_{i,k}$ and $B_{i,k}$ as in (9), the resulting cross-covariance model in (10) achieves great flexibility and can capture complex nonstationary structure in the data. In particular, it can be easily shown that for any latitude $L$, longitude $l$ and longitudinal lag $\Delta l$,

$$\mathrm{Cov}\{Z_1(L, l), Z_2(L, l + \Delta l)\} - \mathrm{Cov}\{Z_1(L, l + \Delta l), Z_2(L, l)\} = \{B_{1,1}(L) A_{2,1}(L) - A_{1,1}(L) B_{2,1}(L)\}\, 4\beta^{-2} R^2 \sin L \cos L \sin\!\left(\frac{\Delta l}{2}\right) \cos\!\left(\frac{\Delta l}{2}\right) \left\{ \frac{1}{2} M_{\nu - 2}(\sqrt{h})\, 4\beta^{-2} R^2 \cos^2 L \sin^2\!\left(\frac{\Delta l}{2}\right) - M_{\nu - 1}(\sqrt{h}) \right\}, \tag{11}$$

where $h = 4\beta^{-2} R^2 \cos^2 L \sin^2(\Delta l / 2)$. Therefore, by letting the $A_{i,k}$ and $B_{i,k}$ functions depend on the latitude, $L$, the proposed covariance model can produce flexible longitudinal irreversibility. Equation (11) can also be used to prove the fact that the asymmetries are zero when $L_2 = 0$ in Figure 5 (note that when $L_1 = L_2$, the asymmetry reduces to the longitudinal irreversibility). In fact, under the current covariance model, longitudinal irreversibility at the equator is always zero due to the term $\sin L$ in (11). Figure 5 (e)-(f) display the asymmetry when $A_{1,1}(L) = 1$, $A_{2,1}(L) = a_2$, $B_{1,1}(L) = b_1$, $B_{2,1}(L) = b_2(L) = b_{20}(5 + 5P_2(\sin L))$. These plots demonstrate that by allowing the coefficients $A_{i,k}$ and $B_{i,k}$ to depend on the latitude, we get more flexibility in the resulting covariance structure.

It is shown in Jun and Stein (2008) that the marginal correlation from the model in (8) (with $C_i = 0$) can be as small as $-1$. For cross-correlations, we also achieve the range of $-1$ to $1$. For an extreme example, in (8), suppose $n = 1$, $A_{1,1} = A_{2,1} = 1$, $B_{1,1} = B_{2,1} = 0$, and $C_1 = C_2 = 0$. Then it is easy to see that the cross-correlation between $Z_1$ and $Z_2$ is 1 everywhere.

The proposed method here has a similar spirit to the LMC model in the sense that each process $Z_i$ is modeled as a linear combination of latent processes. However, the model proposed in this paper differs from the LMC in several significant ways. In the model, the cross-covariance structure is characterized by the first term of (8) and even when $n = 1$ with $C_i = 0$, we do achieve fairly flexible cross-covariance models that we cannot achieve with LMC models that have more covariance parameters (see Section 4.3). The expression in the summation of (8) may appear to be a linear combination of latent processes, but in fact the differential operators are defined in the $L^2$ sense. The variations of LMC models that give nonstationary covariance models such as in Gelfand et al.
(2004) achieve the nonstationarity quite differently from the way the models in (8) achieve it. One of the fundamental differences between the approach proposed in this paper and the LMC model is that the differential operators with respect to latitude and longitude are applied to the same process ($Y_k$ in (8)). Suppose we have $n = 1$ in (8). The LMC models consider linear combinations of independent processes, but the differential operators in (8) effectively evaluate covariances of differences of the same process, $Y_1$, with small latitudinal or longitudinal lags (see (3)). Hence, the nonstationarity with respect to latitude comes not only from the coefficients of the partial differential operators, $A_{i,k}$ and $B_{i,k}$, but also from the differential operators, $\partial/\partial L$ and $\partial/\partial l$, and the covariance between the processes with each differential operator applied. We choose to use Legendre polynomials in modeling the coefficients of the differential operators. It is not clear how we could get empirical estimates of these coefficients from the data and thus we instead model these coefficients through some orthogonal polynomials of the latitude. Legendre polynomials in that sense are a natural choice since they are orthogonal over the interval $[-1, 1]$ and

thus the $P_j(\sin L)$'s for $-90^\circ \le L \le 90^\circ$ are orthogonal.

There are possible limitations of the model in (8). The first limitation is that all the $Z_i$'s may have the same spatial range and smoothness parameter since the $Z_i$'s consist of the same processes (the $Y_k$'s). One easy fix of this problem is either to let $Y_0$ have different covariance parameters than the $Y_k$'s ($k > 0$) or to add more terms (processes) in (8) and let these terms have different covariance parameters. We will explore this issue further for the climatological application in Section 4.

3.2 Computational Issues

It is common to estimate the covariance parameters as well as the mean parameters using maximum likelihood estimation and for that purpose, we from now on assume that the process is multivariate Gaussian. Many spatial data sets these days are of large dimension and often it can be quite challenging to efficiently compute the full likelihood. For the case of regularly spaced data, which is usually the case for satellite data and the numerical model outputs, however, the computation of the exact likelihood can be quite efficient. Jun and Stein (2008) demonstrate such a method using the discrete Fourier transform (DFT) for a univariate spatial process. The key idea is the following: since the covariance model is axially symmetric and we have regularly spaced longitude values covering the full range, the resulting covariance matrix can be written in a block circulant form. Then using the fact that a block circulant matrix can be diagonalized by applying the DFT, we can calculate the inverse and the determinant of the covariance matrix efficiently (see Jun and Stein (2008) for more details on how this works). The same idea can be applied for the cross-covariance matrix.
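The block circulant idea can be checked numerically on a small grid. The sketch below is an illustration under stated assumptions rather than the implementation used in the paper: it uses an exponential covariance in chordal distance on a unit sphere as a simple axially symmetric stand-in, orders the observations longitude by longitude, and all function names and parameter values are hypothetical. The log-determinant from the DFT of the lag blocks is compared with a direct computation on the full matrix.

```python
import numpy as np

def chordal(lat1, lat2, dlon, radius=1.0):
    """Chordal distance as in (2); angles in radians."""
    s = np.sin((lat1 - lat2) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * radius * np.sqrt(s)

def lag_blocks(lats, n_lon, beta=0.5):
    """blocks[m][j, k] = K(L_j, L_k, m * 2*pi / n_lon), exponential covariance (assumed)."""
    dlon = 2 * np.pi * np.arange(n_lon) / n_lon
    d = chordal(lats[None, :, None], lats[None, None, :], dlon[:, None, None])
    return np.exp(-d / beta)  # shape (q, p, p)

def full_covariance(blocks):
    """Assemble the (q*p) x (q*p) block circulant matrix, longitude-major ordering."""
    q, p, _ = blocks.shape
    sigma = np.empty((q * p, q * p))
    for a in range(q):
        for b in range(q):
            sigma[a * p:(a + 1) * p, b * p:(b + 1) * p] = blocks[(a - b) % q]
    return sigma

def logdet_via_dft(blocks):
    """Log-determinant from the q spectral blocks (each p x p) given by the DFT over lags."""
    spectral = np.fft.fft(blocks, axis=0)
    _, logabs = np.linalg.slogdet(spectral)  # slogdet broadcasts over the leading axis
    return float(logabs.sum())

lats = np.deg2rad(np.array([-60.0, -20.0, 20.0, 60.0]))  # p = 4 latitudes, poles excluded
blocks = lag_blocks(lats, n_lon=8)                       # q = 8 longitudes
logdet_fast = logdet_via_dft(blocks)
logdet_direct = np.linalg.slogdet(full_covariance(blocks))[1]
```

The direct computation factorizes one matrix of size $pq \times pq$, while the DFT route only ever factorizes $q$ matrices of size $p \times p$, which is where the savings on a global grid come from.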
As long as the multiple spatial processes are on the same longitudinal grids, cover the full longitude range, and the cross-covariance structure is axially symmetric (note the model in (8) does give an axially symmetric cross-covariance structure), both the marginal covariance matrix for each process and the cross-covariance matrix can be diagonalized by applying the DFT. Note Chan and Wood (1999) consider a multivariate stationary Gaussian random field defined on a rectangular grid in $\mathbb{R}^d$ and they apply circulant embedding of a block Toeplitz matrix (Toeplitz structure comes from the stationarity of the random field) to create a block circulant covariance matrix. Then they perform the DFT to block diagonalize the covariance matrix. Since in our domain, the block circulant structure of the covariance matrix is naturally given through regularly spaced longitudinal points with $360^\circ$ coverage, we do not need the step of circulant embedding. Suppose we consider a bivariate process $(Z_1, Z_2)$ observed on a regular grid with $p$ latitude points and $q$ longitude points (longitudinal points must be equally spaced over the full longitude range). We denote $Z_i(L_j) = \{Z_i(L_j, l_1), \ldots, Z_i(L_j, l_q)\}^T$ for $j = 1, \ldots, p$ and $FZ_i(L_j)$ is the DFT (with

respect to longitude) of $Z_i(L_j)$. Then it is well known that the corresponding covariance matrix of the complex normal vector, $FZ_i(L_j)$, is a diagonal matrix. Although $FZ_i(L)$ is a complex normal random variable, the likelihood of it can be obtained simply by calculating as if it is a real normal random variable with the appropriate covariance matrix (Wooding 1956). Therefore, if we denote $\tilde{Z}_i = \{\tilde{Z}_{i,1,1}, \ldots, \tilde{Z}_{i,p,1}, \tilde{Z}_{i,1,2}, \ldots, \tilde{Z}_{i,p,2}, \ldots, \tilde{Z}_{i,p,q}\}^T$ where $\tilde{Z}_{i,j,k}$ is the $k$th element of $FZ_i(L_j)$, then the covariance matrix of $\{\tilde{Z}_1^T, \tilde{Z}_2^T\}^T$ can be written as

$$\Sigma = \begin{pmatrix} D_1 & D_{12} \\ D_{12}^* & D_2 \end{pmatrix},$$

where $D_1$, $D_2$, and $D_{12}$ are complex block diagonal matrices with $p \times p$ block diagonals and $D_{12}^*$ is the conjugate transpose of $D_{12}$. The determinant of the matrix $\Sigma$ can be calculated using

$$\det(\Sigma) = \det(D_1 - D_{12} D_2^{-1} D_{12}^*)\, \det(D_2)$$

and the quadratic form in the likelihood can be efficiently calculated using the fact that

$$\Sigma^{-1} = \begin{pmatrix} (D_1 - D_{12} D_2^{-1} D_{12}^*)^{-1} & -D_1^{-1} D_{12} (D_2 - D_{12}^* D_1^{-1} D_{12})^{-1} \\ -D_2^{-1} D_{12}^* (D_1 - D_{12} D_2^{-1} D_{12}^*)^{-1} & (D_2 - D_{12}^* D_1^{-1} D_{12})^{-1} \end{pmatrix}.$$

Note that the lower off diagonal matrix is the conjugate transpose of the upper off diagonal matrix and the inverses of $D_1$ and $D_2$ can be calculated efficiently since they are block diagonal matrices with block size $p \times p$. In our application, we have p =

4. APPLICATION

4.1 Data

As noted in Section 1, the relationship between precipitation and surface temperature has received a lot of attention from scientists and it is important in the climate impact research area. We apply the covariance functions developed here to build a joint model between surface temperature and precipitation; the data originate from one of the numerical model outputs used in Trenberth and Shea (2005), the NCAR CCSM3. The NCAR CCSM3 is one of the climate models developed by NCAR. Jun, Knutti, and Nychka (2008) give a more detailed background on this and other climate models.
We look at the five-month average for the Northern winter (November to March) and take averages of these over 1970 to 1999 (we call it NDJFM from now on). The temperature output from this model, CCSM3, has also been analyzed by Jun et al. (2008), but they consider the differences between the observations and numerical model outputs, and they only look at the latitude range of 50°S to 50°N on a coarser grid resolution (of 5° × 5°). We use the numerical model output only (no observations) for the entire globe (full longitude and latitude ranges) in the original resolution of […] ([…] in both longitude and latitude). It is common in climate studies to use numerical model outputs rather than observations since observations usually have a large fraction of missing values (especially near the poles). For example, Trenberth and Shea (2005) used numerical model outputs only to study the relationship between temperature and precipitation. Note that the unit for temperature is K and the unit for precipitation is kg/(m² s) (kilograms per square meter per second). Tebaldi and Sansó (2009) deal with multiple numerical model outputs along with observations to build a joint model for surface temperature and precipitation, but their approach is relatively simple in terms of modeling the cross-covariance structure of the two variables. In their approach, cross-correlations between the two variables come only from the mean of the processes; that is, they let the mean of the precipitation be a linear function of the surface temperature. They consider spatio-temporal processes, but in this work we focus on the spatial component of the process.

4.2 Model

Model for the mean

The first row of Figure 6 gives the NDJFM average of the temperature and precipitation data. Since the order of the precipitation data is $10^{-5}$, from now on we multiply the precipitation data by $10^{5}$ to make it comparable to the surface temperature data. For temperature, it is clear that the mean structure of the field mainly depends on the latitude. For precipitation, such dependence is not as strong as for the temperature data, and there are places with large amounts of precipitation around the equator. We first filter out the spatial mean structure using spherical harmonics and work with the residuals. Specifically, we use spherical harmonics up to order $r = 12$ and regress each variable (surface temperature and precipitation) separately on the spherical harmonics $\{Y_r^s(\sin L, l) \mid r = 0, 1, 2, \ldots;\ s = -r, \ldots, r\}$, with $r$ up to 12.
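The mean-filtering step can be sketched as an ordinary least-squares regression on a real spherical-harmonic basis. The example below is only an illustration under stated assumptions: it uses a synthetic field on a hypothetical 5° grid, a low maximum order (4 rather than 12), unnormalized associated Legendre functions computed by the standard recurrence, and column rescaling for numerical stability. The helper names `assoc_legendre` and `harmonic_design` are made up for this sketch.

```python
import numpy as np

def assoc_legendre(r, s, x):
    """Associated Legendre function P_r^s(x), 0 <= s <= r, by the standard recurrence."""
    pmm = np.ones_like(x)
    if s > 0:
        somx2 = np.sqrt(1.0 - x * x)
        fact = 1.0
        for _ in range(s):
            pmm = -pmm * fact * somx2
            fact += 2.0
    if r == s:
        return pmm
    pmmp1 = x * (2 * s + 1) * pmm
    for n in range(s + 2, r + 1):
        pmm, pmmp1 = pmmp1, (x * (2 * n - 1) * pmmp1 - (n + s - 1) * pmm) / (n - s)
    return pmmp1

def harmonic_design(lat_deg, lon_deg, max_order):
    """Design matrix of real spherical harmonics in (sin L, l) up to max_order."""
    x = np.sin(np.radians(lat_deg))
    lon = np.radians(lon_deg)
    cols = []
    for r in range(max_order + 1):
        for s in range(r + 1):
            p = assoc_legendre(r, s, x)
            cols.append(p * np.cos(s * lon))
            if s > 0:
                cols.append(p * np.sin(s * lon))
    X = np.column_stack(cols)
    return X / np.linalg.norm(X, axis=0)   # rescale columns; the fitted mean is unchanged

# Synthetic "temperature-like" field on a hypothetical 5-degree grid.
lat = np.arange(-87.5, 90.0, 5.0)
lon = np.arange(0.0, 360.0, 5.0)
LAT, LON = np.meshgrid(lat, lon, indexing="ij")
lat_flat, lon_flat = LAT.ravel(), LON.ravel()
mean_field = (280.0 - 40.0 * np.sin(np.radians(lat_flat)) ** 2
              + 5.0 * np.cos(np.radians(lat_flat)) ** 2 * np.cos(2.0 * np.radians(lon_flat)))
rng = np.random.default_rng(0)
y = mean_field + 0.5 * rng.standard_normal(mean_field.size)

X = harmonic_design(lat_flat, lon_flat, max_order=4)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef        # these residuals would feed the covariance fit
print(X.shape, residuals.std())
```

A basis up to order $r$ has $(r + 1)^2$ columns, so order 12 corresponds to 169 regressors per variable.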
The second row of Figure 6 gives the estimated mean structure and the third row gives the residuals. Overall, the estimated mean field removes most of the large-scale spatial patterns in the data.

Model for the covariance

Since our main interest in this paper is estimating the covariance structure, we focus on fitting the covariance models using the residuals. We fit several covariance models to the data. We consider a Matérn model in Gneiting et al. (2010), a version of the LMC model, and a couple of variations of our covariance model developed in Section 3.1. Here, $Z_1$ denotes the surface temperature process and $Z_2$ denotes the precipitation process. Note that these processes are the residuals after filtering out the mean as explained earlier in Section 4.2.

1. Matérn model (MAT): we use the parsimonious bivariate Matérn model in Gneiting et al. (2010). In particular, we let $Z_i = G(\alpha_i, \beta, \nu_i)$, $i = 1, 2$. The parameter $\nu_3$ gives the smoothness for the cross-covariance and, by construction, $\nu_3 = (\nu_1 + \nu_2)/2$. We also have the co-located correlation coefficient $\rho$.

2. LMC model (LMC): we use a version of the LMC model, that is, we set $Z_i(L, l) = a_i W_1(L, l) + b_i W_2(L, l) + c_i U_i(L, l)$, where $a_i$, $b_i$, and $c_i$ are constants, $W_j = G(1, \beta, \nu_j)$ ($j = 1, 2$), and $U_i = G(1, \beta, \omega_i)$. We also assume that the $W_j$s are independent, the $U_i$s are independent, and the $W_j$s and $U_i$s are independent of each other. Therefore $U_i$ does not contribute to the cross-covariance structure of the $Z_i$s.

3. Our covariance model (Nonstationary Multivariate Global model):

(a) NMG1: we set $Z_i(L, l) = \left\{ a_i \frac{\partial}{\partial L} + b_i \frac{\partial}{\partial l} \right\} Y(L, l) + c_i U_i(L, l)$. Here, $Y = G(1, \beta, \nu)$, $U_i = G(1, \beta, \omega_i)$, and we assume the $U_i$s are independent of $Y$. Note that $\nu > 1$.

All of the above models have the property that the processes $Z_1$ and $Z_2$ have the same spatial range parameter. The properties of each model discussed in Section 2 are summarized in Table 1. These models are intentionally kept relatively simple since they will be used in Section 4.3 with the data over a subregion. We use the following more complex model to fit the full data in Section 4.4.

(b) NMG2: we let $Z_i(L, l) = \left\{ A_i(L) \frac{\partial}{\partial L} + B_i(L) \frac{\partial}{\partial l} \right\} Y(L, l) + C_i(L) W(L, l) + d_i U_i(L, l)$, with $A_i$, $B_i$, and $C_i$ defined as in (9) (for instance, $A_i(L) = \sum_{j=0}^{m} a_{ij} P_j(\sin L)$). We let $Y = G(1, \beta_1, \nu_1)$ ($\nu_1 > 1$), $W = G(1, \beta_2, \nu_2)$, and $U_1 = G(1, \beta_3, \nu_3)$ (note that $\nu_2, \nu_3 > 0$). For the choices of $m$ and $d_i$, see Section 4.4.
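To make the role of the latent processes in the LMC specification concrete, the sketch below assembles the joint covariance matrix of $(Z_1, Z_2)$ implied by $Z_i = a_i W_1 + b_i W_2 + c_i U_i$ with independent latent processes. Several things here are assumptions made for illustration rather than the paper's definitions: $G(1, \beta, \nu)$ is taken to be a unit-variance process with Matérn correlation proportional to $(d/\beta)^{\nu} K_{\nu}(d/\beta)$, distance is chordal distance on the unit sphere, only the closed-form half-integer smoothness values are supported, and all parameter values are hypothetical.

```python
import numpy as np

def matern(d, beta, nu):
    """Matérn correlation (d/beta)^nu K_nu(d/beta), normalized; closed forms only."""
    u = np.asarray(d) / beta
    if nu == 0.5:
        return np.exp(-u)
    if nu == 1.5:
        return (1.0 + u) * np.exp(-u)
    if nu == 2.5:
        return (1.0 + u + u ** 2 / 3.0) * np.exp(-u)
    raise ValueError("this sketch supports nu in {0.5, 1.5, 2.5} only")

def chordal_distance(lat_deg, lon_deg):
    """Pairwise chordal distances between points on the unit sphere."""
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    xyz = np.column_stack([np.cos(lat) * np.cos(lon),
                           np.cos(lat) * np.sin(lon),
                           np.sin(lat)])
    diff = xyz[:, None, :] - xyz[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def lmc_covariance(d, a, b, c, beta, nu, omega):
    """Joint covariance of (Z1, Z2) for Z_i = a_i W1 + b_i W2 + c_i U_i."""
    M1, M2 = matern(d, beta, nu[0]), matern(d, beta, nu[1])
    blocks = [[None, None], [None, None]]
    for i in range(2):
        for j in range(2):
            blk = a[i] * a[j] * M1 + b[i] * b[j] * M2
            if i == j:                      # U_i enters the marginal covariance only
                blk = blk + c[i] ** 2 * matern(d, beta, omega[i])
            blocks[i][j] = blk
    return np.block(blocks)

rng = np.random.default_rng(1)
lat = rng.uniform(-60.0, 60.0, 30)
lon = rng.uniform(0.0, 360.0, 30)
d = chordal_distance(lat, lon)
Sigma = lmc_covariance(d, a=(2.0, 0.3), b=(0.2, 1.0), c=(0.5, 0.5),
                       beta=0.5, nu=(0.5, 1.5), omega=(0.5, 2.5))
min_eig = np.linalg.eigvalsh(Sigma).min()
print(Sigma.shape, min_eig)
```

The off-diagonal block contains only the $W_1$ and $W_2$ terms, which is the sense in which $U_i$ does not contribute to the cross-covariance.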
For the models LMC, NMG1, and NMG2, we may have an identifiability problem if we let all of the coefficients $a_i$, $b_i$, $c_i$ and the coefficients of the linear combinations in $A_i$, $B_i$, and $C_i$ vary in $\mathbb{R}$. To avoid the problem, for the LMC model we take the signs of $a_1$, $b_1$, $c_1$, and $c_2$ to be positive. For the NMG1 model, we take the signs of $a_1$, $c_1$, and $c_2$ to be positive. For the NMG2 model, we take the signs of $b_{10}$, $c_{20}$, and $d_1$ to be positive.

4.3 Fit over North America

Even if we use the computational technique through DFT described in Section 3.2, using the full data set for both variables to estimate the covariance structure takes quite some time (the total data size is […] = 65,536). Therefore, we first choose a subregion over some parts of North America and fit several covariance models for a quick comparison in terms of likelihood and prediction accuracy. We perform the prediction on a region over North America that is disjoint from the estimation sites. Figure 7 shows the locations of the estimation sites and the prediction sites. Note that there are 774 estimation sites and 172 prediction sites. For the fit in this section, since the data size is manageable, we do not use the DFT technique. In fact, it cannot be used here, since the DFT technique requires that the data cover the entire longitude range. We compare the covariance models listed in Section 4.2 (models 1, 2, and 3(a)) and estimate each covariance parameter by maximum likelihood. We used numerical optimization (the nlm and optim functions in R; for optim, we use the default Nelder-Mead algorithm) and tried several starting points. The optimization procedures reached the same maximum for all of the starting points that were tried.

Table 2 gives the estimated covariance parameter values along with their asymptotic standard errors. Asymptotic standard errors are obtained from the inverse of the Hessian matrix. The maximized loglikelihood values for each model and the corresponding AIC values are also given. The first thing to note is that the NMG1 model gives significantly larger loglikelihood values than the other models, given a comparable number of covariance parameters (the LMC model has the most covariance parameters). The AIC value for the NMG1 model is the smallest among the three.
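The likelihood-based comparison described above rests on two quantities: the Gaussian loglikelihood of the residuals under a candidate covariance matrix, and AIC $= 2k - 2\ell$ for a model with $k$ covariance parameters and maximized loglikelihood $\ell$. The sketch below shows both on simulated one-dimensional data. It is an illustration under assumptions: the exponential covariance, the locations, and the candidate range values are hypothetical, and a coarse grid search stands in for the nlm/optim optimization used in the paper.

```python
import numpy as np

def gaussian_loglik(z, Sigma):
    """Loglikelihood of a mean-zero Gaussian vector z with covariance Sigma."""
    chol = np.linalg.cholesky(Sigma)            # Sigma = chol @ chol.T
    w = np.linalg.solve(chol, z)                # w @ w equals z' Sigma^{-1} z
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (len(z) * np.log(2.0 * np.pi) + logdet + w @ w)

def aic(loglik, n_params):
    """Akaike information criterion; smaller is better."""
    return 2.0 * n_params - 2.0 * loglik

# Simulated data with an exponential covariance (hypothetical true range 1.5).
x = np.linspace(0.0, 10.0, 60)
dist = np.abs(x[:, None] - x[None, :])
rng = np.random.default_rng(0)
z = np.linalg.cholesky(np.exp(-dist / 1.5)) @ rng.standard_normal(len(x))

# Coarse grid search over the range parameter (a stand-in for numerical optimization).
candidates = [0.5, 1.0, 1.5, 2.0, 3.0]
logliks = [gaussian_loglik(z, np.exp(-dist / b)) for b in candidates]
best = candidates[int(np.argmax(logliks))]
print(best, aic(max(logliks), n_params=1))
```

Using the Cholesky factor for both the log-determinant and the quadratic form avoids forming an explicit inverse, which matters once the number of sites reaches the hundreds.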
The fact that the LMC model, despite having the most covariance parameters, gives a much smaller loglikelihood value than NMG1 may be a sign that the nonstationarity (in particular, the dependence of the covariance structure on latitude, the longitudinal irreversibility, and the asymmetry in the data, for the marginal and/or cross-covariance structure) is rather strong, and that the differential operator term in (8) helps to explain these properties better. In terms of the difference in smoothness between the two variables, it seems that the precipitation process is smoother than the temperature process. In the LMC model, the process with the smallest smoothness parameter value, $\nu_1$, is shared by the two processes, temperature and precipitation, but the coefficient $a_1$ is much larger in magnitude than the coefficient $a_2$, and $b_2$ is larger in magnitude than $b_1$. Therefore, the roughest process, $W_1$, mostly contributes to the temperature process and the smoother process, $W_2$, mostly contributes to the precipitation process. In the NMG1 model, first of all, $\omega_1$ is smaller than $\omega_2$. Note that the effective smoothness of the process $Y$ is $\nu - 1 = 1.07$, and thus $Y$ has a similar amount of smoothness to the process $U_1$. In that sense, it is not clear whether the precipitation process is smoother than the temperature process or not. Nevertheless, the result in Tebaldi and Sansó (2009) shows that the precipitation process is smoother than the temperature process. For the LMC model, the estimate of $\omega_1$ came close to the upper boundary of the parameter space and thus we could not obtain its asymptotic standard error. It is common practice to set the range for the smoothness parameter to be $(0, 2.5)$, since the covariance model is not valid for zero or negative values and large values often lead to numerical instability. This poor fit may imply that the data do not provide enough information on this parameter for the particular LMC model. The NMG1 model does not have this problem. The estimate of the co-located correlation parameter of the MAT model is almost zero ($\hat{\rho} = 5.5 \times 10^{-6}$), while the empirical cross-covariance estimate is around 0.3.

Figure 8 shows the prediction errors for the covariance models MAT, LMC, and NMG1 at the prediction sites. We display the difference between the true and the predicted values at the prediction sites against latitude. Note that the three models do not show much difference in prediction accuracy for the temperature variable, although we see a significant difference for precipitation. The superiority of the NMG1 model is apparent for the prediction of precipitation. For the precipitation variable, the predictive skills of the MAT and LMC models are similar.

To make a fair comparison of the predictive performance of the three models, we now repeat the above procedure over 24 disjoint subdomains, $S_1, \ldots, S_{24}$, that together cover most of the globe. That is, $S_{2n}$ is a subdomain in the Northern Hemisphere with latitude range 0° to 60°N and $S_{2n-1}$ is a subdomain in the Southern Hemisphere with latitude range 0° to 60°S, for $n = 1, \ldots, 12$. For each $n$, $S_{2n}$ and $S_{2n-1}$ cover the longitude range $30(n-1)$° to $30n$°, and for $S_{2n}$ and $S_{2n-1}$ we set aside the data in the longitude range $\{30(n-1) + 10\}$° to $\{30(n-1) + 15\}$° for the validation of prediction. Table 3 shows the maximum loglikelihood values for the three models from the fits over the 24 subdomains.
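The layout of the 24 subdomains and their validation strips can be written out directly from the definition. The sketch below does so, assuming longitudes are measured in degrees from 0 to 360 and southern latitudes are encoded as negative numbers; the function names and the half-open treatment of the strip boundaries are choices made for this illustration.

```python
def subdomains():
    """Return {k: bounds} for S_1, ..., S_24 (degrees; southern latitudes negative)."""
    out = {}
    for n in range(1, 13):
        lon = (30 * (n - 1), 30 * n)
        validation_lon = (30 * (n - 1) + 10, 30 * (n - 1) + 15)
        # S_{2n}: Northern Hemisphere band; S_{2n-1}: Southern Hemisphere band.
        out[2 * n] = {"lat": (0, 60), "lon": lon, "validation_lon": validation_lon}
        out[2 * n - 1] = {"lat": (-60, 0), "lon": lon, "validation_lon": validation_lon}
    return out

def is_validation(lon_deg, bounds):
    """True if a longitude falls in the strip held out for prediction (half-open interval)."""
    lo, hi = bounds["validation_lon"]
    return lo <= lon_deg < hi

S = subdomains()
print(S[1], S[24])
print(is_validation(12.5, S[1]), is_validation(20.0, S[1]))
```

Each subdomain is fitted on the longitudes outside its strip, and the mean squared prediction error is then computed on the held-out strip.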
Except for $S_{12}$, $S_{16}$, and $S_{24}$, NMG1 achieves the largest maximum loglikelihood values, and the differences between the loglikelihood values of NMG1 and the other two models are large in most of the subdomains. Table 4 summarizes the prediction performance of the three models over the 24 subdomains. It provides the median, mean, and maximum values of the mean squared errors (MSEs) from the prediction over the 24 subdomains. Except for the mean for temperature and the maximum for precipitation, NMG1 gives the smallest MSE values for all the summary statistics. Along with the result in Figure 8, this result demonstrates that the NMG1 model indeed outperforms the other two models not only in terms of the maximum loglikelihood values (and AIC) but also in predictive performance.

4.4 Fit over full domain

We now fit the full data set through the computational technique described in Section 3.2. From the fitted results in Section 4.3, it is clear that there is strong nonstationarity in the data and that the covariance models listed in Section 4.2 (except NMG2) are not flexible enough to capture such nonstationarity. On the other hand, from Tables 2-4 and Figure 8, it is clear that the NMG1 model outperforms the MAT and LMC models given a comparable number of covariance parameters. Hence, we fit the full data to estimate the cross-correlation between the temperature and precipitation data using NMG2, the extended version of NMG1 described in Section 4.2. We set $d_1 \in (0, \infty)$ and $d_2 = 0$. This is an attempt to capture the difference in smoothness between the two variables. We keep $d_1$ positive to avoid the identifiability problem. We could let $d_1 = 0$ instead of $d_2$, but since the parameter estimates in Table 2 and the study by Tebaldi and Sansó (2009) suggest that surface temperature data are less smooth than precipitation data, by adding the term $d_1 U_1$ we hope to capture the roughness in the temperature data. We also fitted the model with $d_1, d_2 \in (0, \infty)$, but the improvement over NMG2 was not significant (that is, the loglikelihood values do not increase significantly and the fitted values do not change noticeably). We also let $Y$, $W$, and $U_1$ have different spatial range parameters. That is, we set $Y = G(1, \beta_1, \nu_1)$, $W = G(1, \beta_2, \nu_2)$, and $U_1 = G(1, \beta_3, \nu_3)$. With a data size of over 30,000 for each variable, we have sufficient information to let these parameters differ.

The estimated covariance parameter values along with their asymptotic standard errors are given in Table 5. It is interesting to note that all three spatial range parameters are different, although for both variables the maximum spatial range parameter estimate is given by the process $W$ ($\hat{\beta}_2$). The smoothness parameter estimates for $Y$ and $U_1$ are comparable to the corresponding estimates for the fit in Section 4.3 in Table 2. Also, the smallest estimate of the spatial range parameters, $\hat{\beta}_1$, is similar to the estimate of the corresponding parameter, $\beta$, in Table 2.
The estimates for the remaining parameters differ significantly from those in Table 2, as we expected given the difference between the two models, NMG1 and NMG2. As explained in Section 2.1, Figure 1 gives a comparison between the empirical and fitted variances and cross-correlations for the temperature and precipitation variables (NDJFM). For figures (a), (c), and (e), the solid line gives the fitted values with the parameters in Table 5. For figures (b), (d), and (f), it is not obvious how to display the corresponding fitted values since, as described in the Appendix, the empirical quantities are calculated through the sum across latitudes; thus at each longitude value the corresponding fitted values do not come as one number, but rather one gets different fitted values for different combinations of latitudes. Figures (a)-(d) show the standard deviation for the univariate processes, and (e) and (f) show cross-correlations. Overall, the fitted values do a reasonable job of capturing the pattern of the empirical values. The fitted variance for temperature is rather flat with respect to latitude since $d_1$ has a relatively large estimate and the estimates of A 1,1 and B 1,1 in NMG2 for the temperature process got smaller weight. On the other hand, the fitted variance for precipitation captures the pattern in the data well. It may be interesting to see if increasing $m$ for the temperature process would improve the fit. Fitted values for the cross-covariance structure are problematic in some places, but this may be partly due to the complex nature of the cross-covariance structure of the data.

Figure 2 shows the comparison of the (a)-(b) empirical, (d) fitted using OLS, and (e) fitted using MLE values of the longitudinal irreversibility, that is,

$$r(l, L, \Delta) = \mathrm{Cor}\{Z_1(L, l), Z_2(L, l + \Delta)\} - \mathrm{Cor}\{Z_1(L, l + \Delta), Z_2(L, l)\},$$

for $L$ (x-axis) and $\Delta$ (y-axis) in degrees. As explained in Section 2.2, we bin the latitude with a bin size of roughly 7°. For the OLS fit, we obtain another set of parameter estimates by minimizing the sum of the squared differences between the empirical irreversibility and the model-fitted irreversibility across the latitude bins (fitted values are evaluated at the center of the latitude bins) and longitude lags, $\Delta$, up to 180°. We also tried weighted least squares using the reciprocals of the number of data points at each latitude bin and longitude lag as weights, but the results were quite similar to the OLS fit. For (e), we use the covariance parameter estimates in Table 5. The OLS fit captures the empirical pattern better than the MLE fit, although irreversibility is overestimated near the North Pole. The MLE fit captures the positive irreversibility values near the North Pole, but overall the estimates are much smaller in magnitude than the empirical values. The fact that the OLS fit captures the empirical pattern quite well suggests that the model in (8) is indeed flexible. However, the fitted irreversibility from MLE differs from the empirical values by a factor of 10. The misfit of the MLE estimates may be somewhat disappointing at first sight, but considering that there are not many covariance models that can produce such irreversibility, it is encouraging to develop covariance models in this direction.
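An empirical version of the longitudinal irreversibility at a single latitude can be sketched as follows. This is only an illustration and not the estimator defined in the paper's Appendix: it assumes one field per variable on an equally spaced, full-circle longitude grid, pools over longitude (which axial symmetry suggests), measures the longitude lag in grid steps (called `shift` here), and uses synthetic data.

```python
import numpy as np

def cross_cor_lag(z1, z2, shift):
    """Sample Cor{Z1(L, l), Z2(L, l + shift)} at one latitude, pooling over longitude."""
    a = z1 - z1.mean()
    b = np.roll(z2, -shift) - z2.mean()     # b[l] = z2[l + shift], wrapping around the globe
    return np.mean(a * b) / (z1.std() * z2.std())

def irreversibility(z1, z2, shift):
    """Cor{Z1(l), Z2(l + shift)} - Cor{Z1(l + shift), Z2(l)} at one latitude."""
    return cross_cor_lag(z1, z2, shift) - cross_cor_lag(z1, z2, -shift)

# Synthetic example: Z2 is Z1 displaced eastward by 3 grid steps.
rng = np.random.default_rng(0)
z1 = rng.standard_normal(72)                # hypothetical 5-degree longitude grid
z2 = np.roll(z1, 3)                         # z2[l + 3] = z1[l]

for shift in (0, 3, -3):
    print(shift, irreversibility(z1, z2, shift))
```

By construction the quantity is zero at lag zero and antisymmetric in the lag, so only positive lags need to be displayed; a nonzero value at some lag indicates that the cross-correlation depends on the direction of the longitudinal displacement.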
Also note that, as shown in figure (c), the uncertainty in the empirical longitudinal irreversibility is quite large. We also calculated the standard errors for the fitted irreversibility in (e) using the asymptotic standard errors of the fitted covariance parameters in Table 5, but the magnitude of the standard error is almost the same as the magnitude of the fitted irreversibility in (e).

Now let us compare our estimated cross-correlation (in Figures 1 (e) and (f)) to the one in Trenberth and Shea (2005). Note that due to the axial symmetry assumption, our fitted cross-covariance is identical within each latitude band. The high correlation level in the high-latitude area of the Northern Hemisphere matches well with the result in Trenberth and Shea (2005). Our estimated correlation values are close to the average correlation levels across longitude at each latitude level in Trenberth and Shea (2005), except in the South Pole area; in this region, our estimated levels are slightly negative whereas Trenberth and Shea (2005) give high correlation levels. The estimated cross-correlation in Trenberth and Shea (2005) shows a clear distinction between land and sea. Unlike in Trenberth and

Non-stationary Cross-Covariance Models for Multivariate Processes on a Globe

Scandinavian Journal of Statistics, Vol. 38: 726–747, 2011. doi: 10.1111/j.1467-9469.2011.00751.x. Published by Blackwell Publishing Ltd. Non-stationary Cross-Covariance Models for Multivariate Processes