Simple example of analysis on spatial-temporal data set

I used the ground-level ozone data in North Carolina (from Suhasini Subba Rao's website). The original data consist of 920 days of data over 72 locations in North Carolina; I used the first 50 days of data over 28 locations. Missing values are imputed in a simple way. The data are standardized so that the mean is close to zero and the distribution is close to Normal.

© Mikyoung Jun (Texas A&M), Stat647 Lecture 16, October 23, 2012

Location

Normality

[Figure: histogram of ozone (vertical axis: Frequency) and Normal Q-Q plot (Sample Quantiles against Theoretical Quantiles)]

Temporal correlation

[Figure: ACF plots for Series o3[, 10] and Series o3[, 15] (ACF against Lag, lags 0 to 30)]
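The autocorrelation plots on this slide show the sample ACF of the ozone series at two stations. As a minimal sketch of what such a plot computes, the function below evaluates the sample autocorrelation at lags 0 to 30. The ozone data are not reproduced here, so `series` is a hypothetical stand-in (a simulated AR(1) series of length 50), and the name `sample_acf` is an illustrative choice, not something from the lecture.

```python
import numpy as np

def sample_acf(x, max_lag=30):
    """Sample autocorrelation of a 1-D series at lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()                 # center the series
    denom = np.dot(xc, xc)            # lag-0 sum of squares
    return np.array(
        [np.dot(xc[: n - k], xc[k:]) / denom for k in range(max_lag + 1)]
    )

# Hypothetical stand-in for one station's 50-day series: an AR(1) process.
rng = np.random.default_rng(0)
series = np.zeros(50)
for t in range(1, 50):
    series[t] = 0.7 * series[t - 1] + rng.standard_normal()

acf = sample_acf(series, max_lag=30)
```

The lag-0 value is always 1, and slowly decaying values at larger lags indicate temporal correlation that a space-time covariance model has to account for.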

Two simple models considered

Model 1: isotropic, fully symmetric Matérn model. Parameters: $\alpha$, $\beta_1$, $\beta_2$, $\nu$.

Model 2: separable model in which both the spatial and the temporal components are Matérn models. I tried the same number of covariance parameters as above (fixing the spatial and temporal smoothness to be the same) as well as the case with different spatial and temporal smoothness parameter values.
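The slide does not write out the covariance functions, so the following is only a sketch under an assumed parametrization: $\alpha$ is taken to be the variance, $\beta_1$ and $\beta_2$ the spatial and temporal range parameters, and $\nu$ the Matérn smoothness. Under that assumption, Model 1 applies one Matérn correlation to a scaled space-time distance, while Model 2 multiplies a spatial Matérn by a temporal Matérn. The function names and the parameter values in the usage lines are hypothetical.

```python
import numpy as np
from scipy.special import gamma, kv

def matern_corr(d, nu):
    """Matern correlation 2^(1-nu)/Gamma(nu) * d^nu * K_nu(d), equal to 1 at d = 0."""
    d = np.atleast_1d(np.asarray(d, dtype=float))
    out = np.ones_like(d)
    pos = d > 0
    out[pos] = (2.0 ** (1.0 - nu) / gamma(nu)) * d[pos] ** nu * kv(nu, d[pos])
    return out

def cov_model1(h, u, alpha, beta1, beta2, nu):
    """Assumed form of Model 1: one Matern in a scaled space-time distance."""
    h, u = np.asarray(h, dtype=float), np.asarray(u, dtype=float)
    d = np.sqrt((h / beta1) ** 2 + (u / beta2) ** 2)
    return alpha * matern_corr(d, nu)

def cov_model2(h, u, alpha, beta1, beta2, nu1, nu2):
    """Assumed form of Model 2: product of a spatial and a temporal Matern."""
    h, u = np.asarray(h, dtype=float), np.asarray(u, dtype=float)
    return alpha * matern_corr(np.abs(h) / beta1, nu1) * matern_corr(np.abs(u) / beta2, nu2)

# Illustrative (hypothetical) spatial lags h, temporal lags u, and parameters.
h = np.array([0.0, 0.1, 0.5])
u = np.array([0.0, 1.0, 5.0])
c1 = cov_model1(h, u, alpha=1.0, beta1=0.3, beta2=10.0, nu=0.5)
c2 = cov_model2(h, u, alpha=1.0, beta1=0.3, beta2=10.0, nu1=0.5, nu2=0.5)
```

With smoothness 0.5 the Matérn correlation reduces to the exponential $e^{-d}$, which gives a quick sanity check on the implementation.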

Fitted results

From model 1: $\alpha = 0.158$, $\beta_1 = 0.00770$ (miles), $\beta_2 = 105.617$ (days), $\nu = 0.115$, LogLik = 108.584.

From model 2: $\alpha = 0.034$, $\beta_1 = 0.301$, $\beta_2 = 108.853$, $\nu = 0.114$, LogLik = 107.859.

From model 2 (different smoothness): $\alpha = 0.147$, $\beta_1 = 0.295$, $\beta_2 = 109.324$, $\nu_1 = 1.205$, $\nu_2 = 0.114$, LogLik = 107.859.

Now... How to simulate random fields?

Simulation of Random Fields

Now we will see some methods of simulating spatial data. Why do we care about it? Sometimes we have to demonstrate that our statistical method works by showing that it exhibits satisfactory long-run behavior. We may have to perform randomization tests or other hypothesis tests (we will see an example in a minute). Many times we lack replication in spatial data sets.

Example of usage of simulated spatial data set

Jun, Knutti, and Nychka (2008, JASA). We have 20 climate models (each model is a gigantic system of PDEs). The assumption is that some of these climate models have correlated errors. For a given time point, we have one output from each climate model. If we estimate the correlation between a certain pair of model errors, we cannot do inference on its statistical significance. What do we do?

Example of usage of simulated spatial data set

Our idea is to build a spatial (or spatial-temporal) model for each climate model. Based on our spatial model, we run independent simulations many, many times for each climate model. If the climate model errors are independent, the correlation that we get from the independent simulations should be similar to the correlation that we get from the actual climate model outputs. Our spatial model and the simulations from it give us grounds for a test of the significance of the correlation.
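The testing idea on this slide can be sketched as a Monte Carlo test. This is not the procedure of Jun, Knutti, and Nychka (2008); it is a minimal illustration that assumes each climate model's error field is Gaussian with mean zero and a known (fitted) covariance matrix. The covariance matrices, the observed correlation of 0.6, and all function names are hypothetical.

```python
import numpy as np

def null_correlations(sigma_a, sigma_b, n_sim=1000, seed=0):
    """Correlations between two fields simulated independently of each other."""
    rng = np.random.default_rng(seed)
    root_a = np.linalg.cholesky(sigma_a)   # sigma_a = root_a @ root_a.T
    root_b = np.linalg.cholesky(sigma_b)
    n = sigma_a.shape[0]
    corrs = np.empty(n_sim)
    for i in range(n_sim):
        field_a = root_a @ rng.standard_normal(n)
        field_b = root_b @ rng.standard_normal(n)
        corrs[i] = np.corrcoef(field_a, field_b)[0, 1]
    return corrs

def mc_p_value(observed_corr, null_corrs):
    """Two-sided Monte Carlo p-value for the observed correlation."""
    exceed = np.sum(np.abs(null_corrs) >= abs(observed_corr))
    return (1 + exceed) / (1 + null_corrs.size)

# Hypothetical setup: 30 locations on a line, exponential covariances.
coords = np.linspace(0.0, 1.0, 30)
dist = np.abs(coords[:, None] - coords[None, :])
sigma_a = np.exp(-dist / 0.2)
sigma_b = np.exp(-dist / 0.4)

nulls = null_correlations(sigma_a, sigma_b, n_sim=500, seed=1)
p_value = mc_p_value(0.6, nulls)   # 0.6 is a made-up observed correlation
```

Because each field is spatially correlated, the null distribution of the sample correlation is wider than it would be for independent observations, which is why the simulation is needed to judge significance.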

Jun et al. (2008, Tellus)

Jun et al. (2008, JASA)

Jun et al. (2008, Tellus)

Jun et al. (2008, Tellus)

Simulations and Gaussianity

From now on, we will assume the spatial field has a Gaussian distribution. This is because all we know about the process is its mean and covariance structure. The Gaussian distribution is special in that it is completely determined by its mean and covariance. We cannot do much if the process is not Gaussian.

Conditional vs unconditional simulation

Suppose our spatial domain is $D \subset \mathbb{R}^2$ and we have observations at $s_1, \ldots, s_m$ for some $m$. We consider simulating values of the spatial process (with the same mean and the same covariance, and thus the same distribution under Gaussianity) at locations of $D$ other than $s_1, \ldots, s_m$.

Conditional simulation means that we respect the observed values. That is, the simulated values at the observation locations should be the same as the actual observations. Unconditional simulation means that we have no such restriction. Obviously, unconditional simulation is easier than conditional simulation.

Unconditional simulation of Gaussian random fields

Suppose we consider simulating $Y \sim N(\mu, \Sigma)$. Note that since $\Sigma$ is symmetric and positive definite, we can write $\Sigma = \Sigma^{1/2} (\Sigma^{1/2})^T$. Then if $X \sim N(0, I)$ ($I$ is an identity matrix with the same dimension as $\Sigma$), we can show that $\mu + \Sigma^{1/2} X$ has the same distribution as $Y$. Now the question is how to find $\Sigma^{1/2}$.
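The claim on this slide follows in two lines. A linear transformation of a Gaussian vector is again Gaussian, so it is enough to match the mean and the covariance:

$$
E\left[\mu + \Sigma^{1/2} X\right] = \mu + \Sigma^{1/2} E[X] = \mu,
$$

$$
\operatorname{Cov}\left(\mu + \Sigma^{1/2} X\right) = \Sigma^{1/2} \operatorname{Cov}(X) \left(\Sigma^{1/2}\right)^T = \Sigma^{1/2} I \left(\Sigma^{1/2}\right)^T = \Sigma.
$$

Note that the argument only uses $\Sigma = \Sigma^{1/2} (\Sigma^{1/2})^T$, so any matrix with this property works; it does not need to be symmetric.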

Unconditional simulation of Gaussian random fields

1. Cholesky decomposition. As we discussed in the previous lecture, we can decompose $\Sigma = U^T U$, where $U$ is an upper triangular matrix. We can take all of the diagonal entries of $U$ to be positive, in which case $U$ is unique. You can use $\Sigma^{1/2} = U^T$.

2. Eigenvalue decomposition. There is another way of decomposing $\Sigma$: we can write $\Sigma = P \Lambda P^T$, where $\Lambda$ is a diagonal matrix whose diagonal entries are the eigenvalues of $\Sigma$ (they should all be positive) and $P$ is an orthonormal matrix ($P^T P = I$). Since the diagonal entries of $\Lambda$ are positive, we can let $\Sigma^{1/2} = P \Lambda^{1/2} P^T$.
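The two decompositions on this slide can be sketched with NumPy as follows. This is an illustration under assumptions rather than code from the lecture: the exponential covariance on five locations, the zero mean, and the function names are all hypothetical. Note that `np.linalg.cholesky` returns the lower triangular factor $L$ with $\Sigma = L L^T$, so $L$ plays the role of $U^T$.

```python
import numpy as np

def sqrt_cholesky(sigma):
    """Lower triangular L with sigma = L @ L.T (L corresponds to U^T)."""
    return np.linalg.cholesky(sigma)

def sqrt_eigen(sigma):
    """Symmetric square root P diag(sqrt(eigenvalues)) P^T."""
    eigvals, p = np.linalg.eigh(sigma)      # eigh is for symmetric matrices
    return p @ np.diag(np.sqrt(eigvals)) @ p.T

def simulate_gaussian(mu, sigma_sqrt, n_draws, rng):
    """Draw n_draws samples of mu + sigma_sqrt @ X with X ~ N(0, I)."""
    x = rng.standard_normal((sigma_sqrt.shape[0], n_draws))
    return (mu[:, None] + sigma_sqrt @ x).T  # shape (n_draws, dimension)

# Hypothetical example: exponential covariance on 5 locations along a line.
coords = np.linspace(0.0, 1.0, 5)
sigma = np.exp(-np.abs(coords[:, None] - coords[None, :]) / 0.3)
mu = np.zeros(5)

rng = np.random.default_rng(1)
root_chol = sqrt_cholesky(sigma)
root_eig = sqrt_eigen(sigma)
draws = simulate_gaussian(mu, root_chol, 20000, rng)
```

The two square roots are different matrices, but both reproduce $\Sigma$ and therefore give draws with the same distribution. The Cholesky factor is usually cheaper to compute, while the eigenvalue route makes it easy to see when an eigenvalue is numerically close to zero.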

Conditional simulation of Gaussian random fields

Suppose we want to simulate the random field $Z(s)$, $s \in D$. Suppose also that we have observations $Z(s_1), \ldots, Z(s_m)$. We denote the simulated values of $Z$ by $S$. A conditional simulation produces $n = m + k$ values such that

$$S(s) = [Z(s_1), \ldots, Z(s_m), S(s_{m+1}), \ldots, S(s_{m+k})].$$

Conditional simulation of Gaussian random fields

1. Sequential simulation for a Gaussian random field. Recall the following fact about the multivariate Gaussian distribution: if

$$
\begin{pmatrix} Z(s_0) \\ Z(s) \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_0 \\ \mu \end{pmatrix}, \begin{pmatrix} \sigma^2 & c^T \\ c & \Sigma \end{pmatrix} \right),
$$

then

$$
Z(s_0) \mid Z(s) \sim N\left( \mu_0 + c^T \Sigma^{-1} (Z(s) - \mu),\; \sigma^2 - c^T \Sigma^{-1} c \right).
$$

Using this fact, we calculate the conditional distribution of $S(s_{m+i})$ given $[Z(s_1), \ldots, Z(s_m), S(s_{m+1}), \ldots, S(s_{m+i-1})]$, which is Gaussian.

2. Conditioning a simulation by kriging. Consider the decomposition

$$
Z(s) = p_{sk}(s; Z) + \left[ Z(s) - p_{sk}(s; Z) \right].
$$

Then we replace $Z(s) - p_{sk}(s; Z)$ by $S(s) - p_{sk}(s; S_m)$, where $p_{sk}(s; S_m)$ denotes the simple kriging predictor at location $s$ based on the values of the unconditional simulation at $s_1, \ldots, s_m$. Then we can show that the resulting quantity has the desired property.
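Both approaches on this slide can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the lecture's implementation: the mean vector `mu` and covariance matrix `sigma` are treated as known for all $n = m + k$ locations, the first $m$ entries correspond to the observed locations, and the coordinates, observed values, and function names are hypothetical. For the kriging approach, the simple kriging predictor is linear in the data, so $p_{sk}(s; Z) - p_{sk}(s; S_m)$ reduces to the kriging weights applied to $Z - S_m$.

```python
import numpy as np

def sequential_simulation(z_obs, mu, sigma, rng):
    """Simulate the unobserved locations one at a time from their conditionals."""
    m, n = z_obs.size, mu.size
    values = np.empty(n)
    values[:m] = z_obs                           # observations are kept as they are
    for j in range(m, n):
        c = sigma[:j, j]                         # covariances with all earlier values
        w = np.linalg.solve(sigma[:j, :j], c)    # Sigma^{-1} c
        cond_mean = mu[j] + w @ (values[:j] - mu[:j])
        cond_var = sigma[j, j] - w @ c
        values[j] = cond_mean + np.sqrt(max(cond_var, 0.0)) * rng.standard_normal()
    return values

def condition_by_kriging(z_obs, mu, sigma, rng):
    """Unconditional simulation corrected by simple kriging of (Z - S_m)."""
    m = z_obs.size
    s_uncond = mu + np.linalg.cholesky(sigma) @ rng.standard_normal(mu.size)
    # Row i holds the simple kriging weights for location i, shape (n, m).
    weights = np.linalg.solve(sigma[:m, :m], sigma[:m, :]).T
    return s_uncond + weights @ (z_obs - s_uncond[:m])

# Hypothetical example: 3 observed locations first, then 5 unobserved ones.
coords = np.array([0.1, 0.5, 0.9, 0.0, 0.25, 0.4, 0.7, 1.0])
sigma = np.exp(-np.abs(coords[:, None] - coords[None, :]) / 0.3)
mu = np.zeros(coords.size)
z_obs = np.array([1.2, -0.4, 0.3])

rng = np.random.default_rng(2)
sim_seq = sequential_simulation(z_obs, mu, sigma, rng)
sim_krig = condition_by_kriging(z_obs, mu, sigma, rng)
```

In both outputs the first three entries equal `z_obs`, which is the defining requirement of a conditional simulation. The sequential version solves a growing linear system at every step, whereas the kriging version needs one unconditional simulation and one solve against the $m \times m$ observation covariance.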