Statistics for analyzing and modeling precipitation isotope ratios in IsoMAP


IsoMAP uses multiple linear regression and geostatistical methods to analyze isotope data. Suppose the response variable Y(s) and the vector of independent variables x(s) are observed at a finite number of sites s₁, …, sₙ. The multiple linear regression method assumes Y(s) and Y(s′) are independent, whereas the geostatistical method assumes they are not. In the following, we first present the statistical methodology and then provide a real example to illustrate it numerically.

1 Multiple Linear Regression

The multiple regression model provides estimates of the parameters together with their ANOVA table. The multiple linear regression approach views the data as arising from independent observations, so the statistical model is

Y(s) = x′(s)β + ϵ(s),   (1)

where β is the vector of unknown parameters and ϵ(s) is an independent and identically distributed (iid) error term with ϵ(s) ~iid N(0, σ²). In matrix notation, Model (1) can be expressed as Y = Xβ + ϵ, where Y = (Y(s₁), …, Y(sₙ))′, X = (x(s₁), …, x(sₙ))′ and ϵ = (ϵ(s₁), …, ϵ(sₙ))′. It is assumed that ϵ ~ N(0, σ²I), so that Y ~ N(Xβ, σ²I). The unknown parameters β and σ² can be estimated by the least squares (LS) method:

β̂ = (X′X)⁻¹X′Y   (2)

and

σ̂² = (1/(n − p)) Y′[I − X(X′X)⁻¹X′]Y,   (3)

where p is the column dimension of X. The estimated variance-covariance matrix of β̂ is

Ĉov(β̂) = σ̂²(X′X)⁻¹.   (4)

Let β̂ⱼ be the j-th component of β̂ and σ̂ⱼₖ be the (j, k) entry of Ĉov(β̂). Then the variance of β̂ⱼ is V(β̂ⱼ) = σ̂ⱼⱼ
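As a sketch, the least squares quantities in (2)-(4) can be computed directly with NumPy (illustrative code only; the function name `ols_fit` is ours, not part of IsoMAP):

```python
import numpy as np

def ols_fit(X, Y):
    """Least squares estimates for Y = X beta + eps, eps ~ iid N(0, sigma^2).

    Returns beta_hat (Eq. 2), sigma2_hat (Eq. 3), and the estimated
    covariance matrix of beta_hat (Eq. 4).
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ Y              # (2)
    resid = Y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - p)      # (3), unbiased: divide by n - p
    cov_beta = sigma2_hat * XtX_inv           # (4)
    return beta_hat, sigma2_hat, cov_beta
```

In practice one would use a QR-based solver (e.g. `np.linalg.lstsq`) rather than forming (X′X)⁻¹ explicitly; the normal-equations form above simply mirrors the formulas in the text.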

and its standard error is std(β̂ⱼ) = √σ̂ⱼⱼ. The t-value of β̂ⱼ is

t(β̂ⱼ) = β̂ⱼ / std(β̂ⱼ).

Let α be the significance level. For the test

H₀: βⱼ = 0 versus H₁: βⱼ ≠ 0,

we claim βⱼ ≠ 0 if |t(β̂ⱼ)| > t_{α/2,n−p}, where t_{α/2,n−p} is the upper α/2 quantile of the t_{n−p} distribution. The p-value of the test is P{|t_{n−p}| > |t(β̂ⱼ)|}.

The fitted value of the response variable at a general site s is Ŷ(s) = x′(s)β̂. The model sum of squares (SSM) is defined by SSM = Σᵢ [Ŷ(sᵢ) − Ȳ]², the error sum of squares (SSE) by SSE = Σᵢ [Y(sᵢ) − Ŷ(sᵢ)]², and the total sum of squares (SST) by SST = Σᵢ [Y(sᵢ) − Ȳ]², where Ȳ = (1/n) Σᵢ Y(sᵢ). Then

SSM = SST − SSE.

The R-square of the model is defined by R² = SSM/SST. It is known that 0 ≤ R² ≤ 1: the closer R² is to 1, the better the fit. As a rough rule of thumb here, the fit of a model is considered acceptable if R² > 0.2.
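The sum-of-squares decomposition and R² can be sketched as follows (illustrative helper, ours; note the identity SSM = SST − SSE holds exactly only for a least squares fit that includes an intercept):

```python
import numpy as np

def r_squared(Y, Y_hat):
    """Sum-of-squares decomposition and R^2 for fitted values Y_hat."""
    Y_bar = Y.mean()
    ssm = np.sum((Y_hat - Y_bar) ** 2)   # model sum of squares (SSM)
    sse = np.sum((Y - Y_hat) ** 2)       # error sum of squares (SSE)
    sst = np.sum((Y - Y_bar) ** 2)       # total sum of squares (SST)
    return ssm, sse, sst, ssm / sst
```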

Table 1: Format of parameter estimates of the regression method

Variable   Estimate   Std           t-value                p-value
β₀         β̂₀         std(β̂₀)       β̂₀/std(β̂₀)            P{|t_{n−p}| > |β̂₀/std(β̂₀)|}
β₁         β̂₁         std(β̂₁)       β̂₁/std(β̂₁)            P{|t_{n−p}| > |β̂₁/std(β̂₁)|}
⋮
β_{p−1}    β̂_{p−1}    std(β̂_{p−1})  β̂_{p−1}/std(β̂_{p−1})  P{|t_{n−p}| > |β̂_{p−1}/std(β̂_{p−1})|}

Table 2: Format of ANOVA of the regression method

Variable   Degrees of Freedom   Sum of Squares   Mean Square                   F-value                 p-value
β₁         df₁                  SS₁              MS₁ = SS₁/df₁                 F₁ = MS₁/MSE            P{F_{df₁,n−p} > F₁}
⋮
β_{p−1}    df_{p−1}             SS_{p−1}         MS_{p−1} = SS_{p−1}/df_{p−1}  F_{p−1} = MS_{p−1}/MSE  P{F_{df_{p−1},n−p} > F_{p−1}}
Error      n − p                SSE              MSE = SSE/(n − p)
Total      n − 1                SST

IsoMAP reports the type I ANOVA obtained by the stepwise method. To compute the ANOVA value of the j-th variable, it fits the two nested models

Y(s) = β₀ + β₁x₁(s) + ⋯ + β_{j−1}x_{j−1}(s) + ϵ(s),  ϵ(s) ~ N(0, σ²),
Y(s) = β₀ + β₁x₁(s) + ⋯ + β_{j−1}x_{j−1}(s) + βⱼxⱼ(s) + ϵ(s),  ϵ(s) ~ N(0, σ²),

and computes the model sums of squares SSM₁ and SSM₂ of the first and second models, respectively. The ANOVA value of βⱼ is then

SSⱼ = SSM₂ − SSM₁ = SSE₁ − SSE₂.

Let dfⱼ be the degrees of freedom of the j-th variable; its p-value is P{F_{dfⱼ,n−p} > Fⱼ}, where

Fⱼ = (SSⱼ/dfⱼ) / (SSE/(n − p)) = MSⱼ/MSE,  j = 1, …, p − 1.

If the p-value is less than the significance level, then βⱼ is significantly different from 0. To summarize, we display the format of the parameter estimates in Table 1 and the format of the ANOVA in Table 2.
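The sequential (type I) sums of squares SSⱼ = SSE₁ − SSE₂ can be sketched by refitting nested models, one added column at a time (illustrative code, ours; each column is assumed to carry one degree of freedom):

```python
import numpy as np

def type1_anova(X, Y):
    """Type I (sequential) sums of squares and F-values.

    Columns of X are added one at a time; column 0 is the intercept.
    Returns SS_j = SSE_{j-1} - SSE_j and F_j = MS_j / MSE for j = 1..p-1.
    """
    n, p = X.shape

    def sse(cols):
        beta, *_ = np.linalg.lstsq(X[:, cols], Y, rcond=None)
        r = Y - X[:, cols] @ beta
        return r @ r

    mse = sse(list(range(p))) / (n - p)   # MSE from the full model
    ss, f = [], []
    prev = sse([0])                       # intercept-only SSE
    for j in range(1, p):
        cur = sse(list(range(j + 1)))
        ss.append(prev - cur)             # SS_j (df_j = 1 per column)
        f.append((prev - cur) / mse)      # F_j = MS_j / MSE
        prev = cur
    return np.array(ss), np.array(f)
```

By the telescoping construction, the SSⱼ plus the full-model SSE always sum to SST.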

2 Geostatistical Model

The approach taken here is to view the data as arising from geographic space, where each observation corresponds to a spatial location. From this perspective, it might be appropriate to call our method a geostatistical method [2]. The statistical model describes the relationship between the response variable and the independent variables according to a geostatistical model in which ϵ(s) in Equation (1) is assumed to be a spatially correlated error term. The error term ϵ(s) is normally distributed with zero mean and a certain covariance function. The covariance function is assumed stationary and isotropic, with the form

Cov(ϵ(s), ϵ(s + h)) = σ²ρ(u),  u = ‖h‖,   (5)

where σ² = V[ϵ(s)] is the variance of the Gaussian process and ρ(u) is the correlation function. A legitimate correlation function must be positive-definite; in practice, this is usually ensured by working within one of several standard classes of parametric models. Overall, one can generally choose the well-known Matérn correlation function [9], given by

ρ_θ(u) = θ₁ [2^{θ₃−1} Γ(θ₃)]⁻¹ (u/θ₂)^{θ₃} K_{θ₃}(u/θ₂),  θ = (θ₁, θ₂, θ₃),  u > 0,   (6)

where 0 ≤ θ₁ ≤ 1, θ₂ > 0, and K_{θ₃}(·) is the modified Bessel function. The range parameter θ₂ controls the rate of decay of ρ_θ(u) as the distance between observations increases. The smoothness parameter θ₃ controls the smoothness of ρ_θ(u). The Matérn class includes the exponential correlation function when θ₃ = 0.5. The correlation function also accommodates a nugget effect, which is present if θ₁ < 1. In addition, θ₁ controls the magnitude of spatial dependence: the geostatistical model reduces to a multiple regression model if θ₁ = 0. The unknown parameter β in Equation (1), along with the correlation function ρ_θ(u), reflects the spatial distribution of the response variable Y(s) (i.e., δ¹⁸O in this article). In matrix notation, Models (1) and (5) can be expressed as Y = Xβ + ϵ with ϵ ~ N(0, σ²R_θ), where

R_θ = Corr(ϵ) =
[ 1          ρ_θ(d₁₂)   …   ρ_θ(d₁ₙ) ]
[ ρ_θ(d₂₁)   1          …   ρ_θ(d₂ₙ) ]
[ ⋮          ⋮          ⋱   ⋮        ]
[ ρ_θ(dₙ₁)   ρ_θ(dₙ₂)   …   1        ]   (7)

and d_ij is the distance between sites sᵢ and sⱼ. We use the maximum likelihood method to estimate β, σ² and θ by maximizing the loglikelihood function

l(β, σ², θ) = −(n/2) log(2π) − (n/2) log(σ²) − (1/2) log det(R_θ) − (1/(2σ²)) (Y − Xβ)′ R_θ⁻¹ (Y − Xβ).   (8)
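A sketch of the Matérn correlation (6), using SciPy for the modified Bessel function (illustrative, ours; θ₁ < 1 induces the nugget, and θ₃ = 0.5 recovers the exponential correlation):

```python
import numpy as np
from scipy.special import gamma, kv  # kv: modified Bessel function K_nu

def matern_corr(u, theta1, theta2, theta3):
    """Matern correlation rho_theta(u) of Eq. (6) for u > 0.

    theta1 in [0, 1] scales the correlation (nugget effect if theta1 < 1),
    theta2 > 0 is the range parameter, theta3 > 0 the smoothness.
    """
    x = np.asarray(u, dtype=float) / theta2
    return theta1 * (x ** theta3) * kv(theta3, x) / (2 ** (theta3 - 1) * gamma(theta3))
```

For θ₃ = 0.5 this equals θ₁·exp(−u/θ₂), and for θ₃ = 1 it reduces to the form θ₁(u/θ₂)K₁(u/θ₂) fitted in Section 4.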

Because it is difficult to maximize l(β, σ², θ) with respect to β, σ² and θ simultaneously, we use the profile likelihood method: we first compute the MLE of θ and then compute the MLEs of β and σ². We briefly introduce our algorithm below. If θ is given, the (conditional) MLE of β can be derived by the generalized least squares (GLS) method:

β̂_θ = (X′R_θ⁻¹X)⁻¹X′R_θ⁻¹Y   (9)

and

σ̂²_θ = (1/n) Y′M_θY,   (10)

where M_θ = R_θ⁻¹ − R_θ⁻¹X(X′R_θ⁻¹X)⁻¹X′R_θ⁻¹. Substituting (9) and (10) into (8), we obtain the profile loglikelihood function

l_P(θ) = −(n/2)[1 + log(2π/n)] − (1/2) log det(R_θ) − (n/2) log(Y′M_θY).   (11)

The profile maximum likelihood estimate (PMLE) of θ can be derived by using the Newton-Raphson algorithm. Because the smoothness parameter θ₃ is sensitive in the algorithm, we compute the PMLE of θ₁ and θ₂ conditional on θ₃ and then find the best θ₃ by a one-dimensional optimization method (e.g., the golden section algorithm). Once θ̂ is derived, the MLEs of β and σ² follow from (9) and (10).

When θ̂, σ̂² and β̂ are derived, the interpolation of Y(s) at an unobserved site s₀ can be obtained by the universal kriging method. Let x₀ = x(s₀) be the vector of independent variables at site s₀ and

c₀ = Corr(Y(s₀), Y) = (ρ_θ̂(d₀₁), …, ρ_θ̂(d₀ₙ))′,

where d₀ᵢ is the distance between sites s₀ and sᵢ. The universal kriging method interpolates Y(s₀) by Ŷ(s₀), the conditional expected value

Ŷ(s₀) = E[Y(s₀) | Y] = x₀′β̂ + c₀′R_θ̂⁻¹(Y − Xβ̂).   (12)

The variability of the universal kriging interpolation is given by the mean squared prediction error (MSPE),

MSPE[Ŷ(s₀)] = E[Ŷ(s₀) − Y(s₀)]²
            = σ̂²[1 − c₀′R_θ̂⁻¹c₀ + (x₀ − X′R_θ̂⁻¹c₀)′(X′R_θ̂⁻¹X)⁻¹(x₀ − X′R_θ̂⁻¹c₀)].   (13)

The details of the derivation of Equations (12) and (13) can be found in [10].

The aim of model selection is to find the best linear function x′(s)β in Equation (1), selected from a collection of candidates. The AIC of a specific model is

AIC = −2l(β̂, σ̂², θ̂) + 2k,
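A minimal sketch of the GLS step (9)-(10) and the universal kriging predictor (12), assuming the correlation matrix R_θ has already been built from a fitted correlation function (illustrative code, ours; IsoMAP's actual implementation differs):

```python
import numpy as np

def gls_krige(X, Y, R, x0, c0):
    """GLS estimates (9)-(10) and universal kriging prediction (12).

    R  : n x n correlation matrix R_theta at the observed sites
    x0 : covariate vector at the prediction site s0
    c0 : correlations between Y(s0) and the observed Y (length n)
    """
    n = len(Y)
    Ri = np.linalg.inv(R)
    A = np.linalg.inv(X.T @ Ri @ X)
    beta = A @ X.T @ Ri @ Y             # (9)
    resid = Y - X @ beta
    sigma2 = resid @ Ri @ resid / n     # (10): Y'M_theta Y = resid' R^-1 resid
    y0 = x0 @ beta + c0 @ Ri @ resid    # (12)
    return beta, sigma2, y0
```

When s₀ coincides with an observed site (so c₀ is the corresponding row of R), the predictor reproduces the observed value exactly, as kriging should in the absence of a nugget.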

where β̂, σ̂² and θ̂ are the MLEs of β, σ² and θ, and k is the number of parameters under the specific model. Let Ŷ₍ᵢ₎(sᵢ) be the kriging interpolation of Y(sᵢ) when the i-th observation is excluded from the dataset. Then the CV of a model is given by

CV = (1/n) Σᵢ [Y(sᵢ) − Ŷ₍ᵢ₎(sᵢ)]².

The best model has the lowest AIC or CV value.

3 Moran's I Test for Spatial Dependence

A number of permutation testing methods have been proposed to test for spatial dependence. Almost all of them need a well-defined measure of the closeness (or weight) between two units. We choose Moran's I [8] in IsoMAP because it is the most popular one. Let ϵ̂ᵢ be the residual of the regression model at unit i and wᵢⱼ be the measure of the closeness between units i and j. Moran's I is defined as

I = (1/(S₀b₂)) Σᵢ Σ_{j≠i} wᵢⱼ ϵ̂ᵢ ϵ̂ⱼ,   (14)

where S₀ = Σᵢ Σ_{j≠i} wᵢⱼ and b_k = Σᵢ ϵ̂ᵢᵏ / n. Moran's I statistic usually ranges between −1 and 1, even though its absolute value can exceed 1 in extreme cases. A coefficient close to −1 indicates neighborhood dissimilarity; a coefficient close to 1 indicates neighborhood similarity; and a coefficient close to 0 indicates spatial randomness or independence [1].

The p-value of Moran's I is computed under a random permutation test scheme. A random permutation test generally calculates the moments of the test statistic under every possible arrangement of the data; these arrangements are used to generate the distribution of the test statistic under the null hypothesis. A related approach uses many Monte Carlo rearrangements of the data rather than an enumeration of all possible arrangements [3]. If the number of Monte Carlo rearrangements is large and each arrangement has equal probability in each Monte Carlo replicate, then the exact test and the Monte Carlo permutation test give similar results, and the Monte Carlo permutation test is asymptotically equivalent to the exact permutation test ([4], pp. 185-187). Note that for a general n, there are n! possible arrangements in the exact permutation test, so it is generally impossible to obtain the exact moments of a test statistic by enumeration. However, since the numerator of Moran's I is a quadratic form and its denominator is permutation invariant, exact expressions for its moments under the random permutation scheme are available and have been included in many textbooks (e.g., see Cliff and Ord [1]). In the following, we denote by E_R(·) and V_R(·) the expected value and variance of a statistic under the exact permutation test scheme. The formulae of the moments are

E_R(I) = −1/(n − 1)
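A sketch of Moran's I (14) with a Monte Carlo permutation p-value (illustrative, ours; IsoMAP itself uses the exact permutation moments given below rather than simulation):

```python
import numpy as np

def morans_i(eps, W):
    """Moran's I of Eq. (14) for residuals eps and weight matrix W
    (the diagonal of W is assumed zero)."""
    n = len(eps)
    S0 = W.sum()
    b2 = np.sum(eps ** 2) / n
    return (eps @ W @ eps) / (S0 * b2)

def moran_pvalue(eps, W, n_perm=999, seed=0):
    """Two-sided Monte Carlo permutation p-value for Moran's I."""
    rng = np.random.default_rng(seed)
    i_obs = morans_i(eps, W)
    perm = np.array([morans_i(rng.permutation(eps), W) for _ in range(n_perm)])
    e = -1.0 / (len(eps) - 1)   # permutation mean E_R(I)
    # proportion of permuted |I - E_R(I)| at least as extreme as observed
    return (1 + np.sum(np.abs(perm - e) >= np.abs(i_obs - e))) / (n_perm + 1)
```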

and

E_R(I²) = S₁(nb₂² − b₄) / [S₀²b₂²(n − 1)]
        + (S₂ − 2S₁)(2b₄ − nb₂²) / [S₀²b₂²(n − 1)(n − 2)]
        + (S₀² − S₂ + S₁)(3nb₂² − 6b₄) / [S₀²b₂²(n − 1)(n − 2)(n − 3)],

where b_k = Σᵢ ϵ̂ᵢᵏ / n as before, S₁ = (1/2) Σᵢ Σ_{j≠i} (wᵢⱼ + wⱼᵢ)², and S₂ = Σᵢ [Σ_{j≠i} (wᵢⱼ + wⱼᵢ)]². The variance is

V_R(I) = E_R(I²) − E_R²(I).

Assume I is asymptotically normal. Then its p-value is calculated by a two-sided z-test as

2[1 − Φ(|I − E_R(I)| / √V_R(I))].

If the p-value is less than the significance level (e.g., 0.05), we conclude that the spatial dependence is significant. In IsoMAP, we take ϵ̂ᵢ as the i-th residual of the regression model and choose wᵢⱼ by the k-nearest-neighbor method: wᵢⱼ = 1 if the distance between sᵢ and sⱼ is among the k smallest distances between sᵢ and all the other sites, and wᵢⱼ = 0 otherwise.

4 An Example

The example is available at the homepage of IsoMAP with key 5373, case name O18Global, and start time August 15, 2011. In this example, we selected data from 1980 to 1995, including all months, from the South Pole to the North Pole. The final dataset contained 341 stations. We chose δ¹⁸O as the response variable, and elevation, absolute latitude, and the square of latitude as the independent variables. The statistical model was

δ¹⁸O(s) = β₀ + β₁e(s) + β₂l(s) + β₃l²(s) + ϵ(s),

where e(s) is the elevation and l(s) is the absolute latitude of site s. We computed the spherical distance between sites sᵢ and sⱼ according to the formula

d_ij = 2R_E arcsin{ (1/2) [ (cos lₐ(sᵢ) cos l_o(sᵢ) − cos lₐ(sⱼ) cos l_o(sⱼ))²
                          + (cos lₐ(sᵢ) sin l_o(sᵢ) − cos lₐ(sⱼ) sin l_o(sⱼ))²
                          + (sin lₐ(sᵢ) − sin lₐ(sⱼ))² ]^{1/2} },   (15)

where lₐ(s) and l_o(s) are the latitude and longitude of site s and R_E = 6378.1 km is the radius of the Earth. We used d_ij to define the Matérn correlation function ρ_θ(u) in Model (6) and the correlation matrix R_θ in Model (7).

The ANOVA and parameter estimates of the regression model are given in Tables 3 and 4. The estimate of the variance of the error term is

σ̂² = MSE = 1851.66/337 = 5.49454.

Therefore, the fitted multiple linear regression model was

δ¹⁸O(s) = 5.689 − 0.00168 Elev(s) + 0.2053 Lat(s) − 0.00538 Lat²(s) + ϵ(s)
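Formula (15) converts the chord length between two points on the sphere into great-circle distance; a sketch in Python (illustrative, ours; angles assumed in degrees):

```python
import numpy as np

R_E = 6378.1  # Earth's radius in km, as in the text

def sphere_dist(lat1, lon1, lat2, lon2):
    """Great-circle distance of Eq. (15) via the chord between two points
    given as (latitude, longitude) in degrees."""
    la1, lo1, la2, lo2 = np.radians([lat1, lon1, lat2, lon2])
    dx = np.cos(la1) * np.cos(lo1) - np.cos(la2) * np.cos(lo2)
    dy = np.cos(la1) * np.sin(lo1) - np.cos(la2) * np.sin(lo2)
    dz = np.sin(la1) - np.sin(la2)
    chord = np.sqrt(dx * dx + dy * dy + dz * dz)
    return 2 * R_E * np.arcsin(chord / 2)
```

For antipodal points the chord equals the diameter and the distance is πR_E ≈ 20037 km.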

Table 3: ANOVA of the multiple regression model

Variable              df    SS       MS       F-value   p-value
Elevation (β₁)        1     58.0629  58.0629  10.5674   0.00126721
Abs(Latitude) (β₂)    1     4024.97  4024.97  732.539   0
Latitude Square (β₃)  1     449.111  449.111  81.7377   0
Error                 337   1851.66  5.49454
Total                 340   6383.80

Table 4: Parameter estimates of the multiple regression model

Variable   Estimate      Standard Error   t-value    p-value
β₀         5.68898       0.249172         22.8315    0
β₁         −0.00168255   0.000106863      −15.7449   0
β₂         0.20534       0.0148348        13.8418    0
β₃         −0.00538409   0.000209837      −25.6584   0

with ϵ(s) ~iid N(0, 5.49454). This model can be used to interpolate δ¹⁸O by the multiple regression method. The QQ-plot of the regression residuals was almost a straight line, which implies no significant violation of the normality assumption. The residual plot of the regression method showed randomness, which implies no additional transformation of the response variable is required (e.g., by the Box-Cox transformation).

We also fitted a geostatistical model; the ANOVA table and parameter estimates are displayed in Tables 5 and 6, respectively. The fitted correlation function was

ρ_θ̂(u) = ρ_(0.8820, 10.9253, 1)(u) = 0.8820 (u/10.9253) K₁(u/10.9253)  for u > 0.

Because the p-value of Moran's I was almost 0, we recommend using the geostatistical model instead of the regression model.

Table 5: ANOVA of the geostatistical model

Variable              df    SS       MS       F-value   p-value
Elevation (β₁)        1     144.974  144.974  19.1415   0
Abs(Latitude) (β₂)    1     6936.22  6936.22  915.813   0
Latitude Square (β₃)  1     55.9661  55.9661  7.3894    0.00690032
Error                 337   2552.38  7.57384
Total                 340   9689.54

Table 6: Parameter estimates of the geostatistical model

Variable   Estimate      Standard Error   t-value    p-value
β₀         4.43342       1.06554          4.16071    4.03135e−05
β₁         −0.00165697   0.000116952      −14.168    0
β₂         0.162598      0.0598151        2.71835    0.00690032
β₃         −0.00500147   0.000812748      −6.15379   2.14676e−09

References

[1] Cliff, A.D. and Ord, J.K. (1981), Spatial Processes: Models and Applications, Pion, London.

[2] Cressie, N. (1989), Geostatistics, The American Statistician, 42, 197-202.

[3] Dwass, M. (1957), Modified randomization tests for nonparametric hypotheses, Annals of Mathematical Statistics, 28, 181-187.

[4] Good, P. (2000), Permutation Tests, Springer, New York.

[5] Hoeting, J.A., Davis, R.A., Merton, A.A. and Thompson, S.A. (2006), Model Selection for Geostatistical Models, Ecological Applications, 16, 87-98.

[6] Huang, H.C. and Chen, C.S. (2007), Optimal Geostatistical Model Selection, Journal of the American Statistical Association, 102, 1009-1024.

[7] Huang, H.C., Martinez, F., Mateu, J. and Montes, F. (2007), Model Comparison and Selection for Stationary Space-Time Models, Computational Statistics and Data Analysis, 51, 4577-4596.

[8] Moran, P.A.P. (1948), The Interpretation of Statistical Maps, Journal of the Royal Statistical Society, Series B, 10, 243-251.

[9] Matérn, B. (1986), Spatial Variation, 2nd Edition (Lecture Notes in Statistics 36), Springer, Berlin.

[10] Stein, A. and Corsten, L.C.A. (1991), Universal kriging and cokriging as a regression procedure, Biometrics, 47, 575-587.