Kriging Models Predicting Atrazine Concentrations in Surface Water Draining Agricultural Watersheds

Paul L. Mosquin, Jeremy Aldworth, Wenlin Chen

Supplemental Material

Number of pages = 21
Number of supplemental figures = 10
Number of supplemental tables = 2

SUPPLEMENTAL MATERIALS, TABLES AND FIGURES

Kriging Details

Consider the following model, which decomposes an observation (on the log scale) into additive signal and noise components:

$$Z_t = S_t + \varepsilon_t; \quad t \in T, \tag{1}$$

where $Z_t$ represents an observation at time $t$ within some continuous time interval $T$, and $S_t$ and $\varepsilon_t$ represent the underlying signal and noise components, respectively. In practice, $T$ is typically discretized into a uniform grid of, say, daily units. The signal and noise components are assumed to be uncorrelated, and the noise component is assumed to be a zero-mean uncorrelated time-series process with constant variance $\operatorname{var}(\varepsilon_t) = \tau^2$.

Suppose that the signal component can be further decomposed into a fixed regression component and a random component representing a zero-mean, second-order stationary time-series process possessing a temporal correlation structure. Specifically, consider the following model:

$$S_t = \mathbf{x}_t'\boldsymbol{\beta} + \delta_t; \quad t \in T, \tag{2}$$

where $\mathbf{x}_t$ represents a (fixed) vector of covariate values at time $t$ and $\boldsymbol{\beta}$ represents the vector of associated regression coefficients; and $\delta_t$ represents the zero-mean residual at time $t$. Assume that the residual time-series process has the following second-order stationary variance-covariance structure:

$$\operatorname{cov}(\delta_t, \delta_u) = C(|t - u|); \quad \text{for all } t, u \in T, \tag{3}$$

where $C(h)$ represents the covariance between residuals a distance of $h$ apart; in particular, let $\sigma^2 \equiv C(0)$. Second-order stationarity assumes a constant mean and variance, and a temporal correlation structure that depends only on the lag distance between two observations, and this can be checked graphically.

Therefore, we can express the observed process as:

$$Z_t = \mathbf{x}_t'\boldsymbol{\beta} + \delta_t + \varepsilon_t; \quad t \in T, \tag{4}$$

where $\delta_t$ and $\varepsilon_t$ are assumed to be uncorrelated, and $r_t \equiv \delta_t + \varepsilon_t$ represents the observed residual at time $t$, and hence has the following second-order stationary variance-covariance structure:

$$\operatorname{cov}(r_t, r_u) \equiv C_Z(|t - u|) = \begin{cases} \sigma^2 + \tau^2; & t = u \\ C(|t - u|); & t \neq u \end{cases} \quad \text{for all } t, u \in T. \tag{5}$$

Note that $\operatorname{cov}(\delta_t, \delta_u)$ and $\operatorname{cov}(r_t, r_u)$ are identical except in the case where $t = u$ (i.e., where the sampled and prediction locations coincide).

Suppose that a sample of time points has been drawn, represented by $A \equiv \{t_i : i = 1, \ldots, n\} \subset T \equiv \{1, \ldots, N\}$, where $T$ represents the complete set of days in the time interval of interest (i.e., the $N = 122$ days of the usage season). Following Cressie (1993; Section 3.2.1), we typically want to predict (some function of) the signal process $\mathbf{S} \equiv (S_1, \ldots, S_N)'$ over all days in $T$ (i.e., we want to filter out the noise component), using information about $\mathbf{Z} \equiv (Z_{t_1}, \ldots, Z_{t_n})'$ obtained from the sampled data and information from the covariate(s) from the sampled and non-sampled data. In the absence of noise (i.e., if $\tau^2 = 0$), predictions at the sampled points will coincide with the Z-values at those points.

The (theoretical) universal kriging predictor of the signal at time $t_0$ can be expressed as:

$$\hat{S}_{t_0} = \mathbf{x}_{t_0}'\hat{\boldsymbol{\beta}} + \mathbf{c}_0'\boldsymbol{\Sigma}^{-1}(\mathbf{Z} - \mathbf{X}\hat{\boldsymbol{\beta}}), \tag{6}$$

where $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\Sigma}^{-1}\mathbf{Z}$ is the generalized least squares (GLS) estimator of $\boldsymbol{\beta}$, $\mathbf{X} \equiv (\mathbf{x}_{t_1}, \ldots, \mathbf{x}_{t_n})'$, $\boldsymbol{\Sigma} \equiv \operatorname{var}(\mathbf{Z})$, and $\mathbf{c}_0 \equiv (C(|t_1 - t_0|), \ldots, C(|t_n - t_0|))'$. From linear-model theory (Goldberger, 1962), we have that $\hat{\boldsymbol{\beta}}$ is the best linear unbiased estimator (BLUE) of $\boldsymbol{\beta}$, and that $\hat{S}_{t_0}$ is the best linear unbiased predictor (BLUP) of $S_{t_0}$, where "best" means that the mean squared (prediction) error is minimized. The mean squared prediction error (MSPE) of $\hat{S}_{t_0}$ can be expressed as:

$$\operatorname{MSPE}(\hat{S}_{t_0}) \equiv E(\hat{S}_{t_0} - S_{t_0})^2 = \sigma^2 - \mathbf{c}_0'\boldsymbol{\Sigma}^{-1}\mathbf{c}_0 + (\mathbf{x}_{t_0}' - \mathbf{c}_0'\boldsymbol{\Sigma}^{-1}\mathbf{X})(\mathbf{X}'\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}(\mathbf{x}_{t_0} - \mathbf{X}'\boldsymbol{\Sigma}^{-1}\mathbf{c}_0), \tag{7}$$

hence the $(1 - \alpha) \times 100\%$ prediction interval for $S_{t_0}$ can be expressed as:

$$\left(\hat{S}_{t_0} - z_{\alpha/2}\sqrt{\operatorname{MSPE}(\hat{S}_{t_0})},\; \hat{S}_{t_0} + z_{\alpha/2}\sqrt{\operatorname{MSPE}(\hat{S}_{t_0})}\right), \tag{8}$$

where $z_{\alpha/2}$ is the upper $\alpha/2$ critical value of the standard normal distribution. Typically, the elements of $\boldsymbol{\Sigma}$ and $\mathbf{c}_0$ are unknown, so we need to insert estimated values in their place; the estimated values are obtained from the following procedures involving semivariograms.

The (theoretical) semivariogram of the observable (noisy) $r$ residuals is defined as:

$$\gamma_r(t, u) \equiv \tfrac{1}{2}\operatorname{var}(r_t - r_u) = \begin{cases} 0, & t = u \\ \sigma^2 + \tau^2 - C(|t - u|), & t \neq u \end{cases} \quad \text{for all } t, u \in T, \tag{9}$$

where $\sigma^2$ is often referred to as the "partial sill," and $\sigma^2 + \tau^2$ is often referred to as the "sill." Also, $\gamma_r(0^+) \equiv \lim_{|t - u| \to 0} \gamma_r(t, u) = \tau^2$ is often referred to as the "nugget," and the "range" often refers to the smallest lag distance for which $C(|t - u|) = 0$. A special case of universal kriging (called ordinary kriging) applies when $\mathbf{x}_t'\boldsymbol{\beta} \equiv \mu$ (i.e., the mean function is constant).
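As a rough illustration of equations (6) to (8), the following Python/NumPy sketch evaluates the universal kriging predictor, its MSPE, and a normal-theory prediction interval when the covariance function and noise variance are treated as known. The function name `universal_kriging`, its argument names, and the exponential covariance used in the usage note are assumptions made for illustration only; this is not the code used in the paper.

```python
import numpy as np

def universal_kriging(t_obs, z_obs, X_obs, t0, x0, cov_fn, tau2, z_crit=1.645):
    """Sketch of eqs. (6)-(8) with a known covariance function (hypothetical helper).

    cov_fn(h) returns the signal covariance C(h) at lag h, with C(0) = sigma^2.
    tau2 is the noise variance; z_crit = 1.645 gives a 90% interval.
    """
    t_obs = np.asarray(t_obs, dtype=float)
    n = len(t_obs)
    lags = np.abs(t_obs[:, None] - t_obs[None, :])
    Sigma = cov_fn(lags) + tau2 * np.eye(n)      # var(Z): signal covariance plus noise
    c0 = cov_fn(np.abs(t_obs - t0))              # cov(S_t0, Z)

    Si_X = np.linalg.solve(Sigma, X_obs)
    Si_c0 = np.linalg.solve(Sigma, c0)
    XtSiX = X_obs.T @ Si_X

    # GLS estimator of beta
    beta_hat = np.linalg.solve(XtSiX, X_obs.T @ np.linalg.solve(Sigma, z_obs))

    # Eq. (6): regression mean plus kriged residual
    s_hat = x0 @ beta_hat + c0 @ np.linalg.solve(Sigma, z_obs - X_obs @ beta_hat)

    # Eq. (7): mean squared prediction error
    d = x0 - X_obs.T @ Si_c0
    mspe = cov_fn(0.0) - c0 @ Si_c0 + d @ np.linalg.solve(XtSiX, d)

    # Eq. (8): prediction interval (guard against tiny negative round-off)
    half = z_crit * np.sqrt(max(mspe, 0.0))
    return s_hat, mspe, (s_hat - half, s_hat + half)
```

For example, calling the function with `cov_fn = lambda h: 1.5 * np.exp(-np.abs(h) / 10.0)` and `tau2 = 0.0` at one of the sampled days should return that day's Z-value with an MSPE of essentially zero, which matches the no-noise remark above.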

In practice, $\gamma_r(t, u)$ is first estimated at the $|t - u|$ lags in the sampled data. For example, a 7-day sampling frequency would allow us to estimate the semivariogram only at lags 7, 14, 21, etc. The next step is to fit a semivariogram model to those estimates (see Cressie, 1993, for further details). The semivariogram model will provide estimates of $\sigma^2$, $\tau^2$, and $C(|t - u|)$, which can then be used as estimates of the elements of $\boldsymbol{\Sigma}$ and $\mathbf{c}_0$.

Because of the limited sample and lack of data at short lags, only the piecewise linear semivariogram model (with range = 14) was used for 14-day sampling. For 7-day sampling, we compared two approaches: the first was to successively fit exponential, then spherical models until a stable fit was achieved, with each semivariogram model fitted to semivariogram estimates successively derived from three methods described in Cressie (1993; Section 2.4) (e.g., method-of-moments, then robust, then median approach). If no fit was stable then a piecewise linear semivariogram model (with range = 7) was used. The second was to fit only the linear semivariogram model.

Also in practice, universal kriging prediction typically proceeds according to an iterative process involving the following steps: 1) a regression model is fitted to the data by means of ordinary least squares (OLS); 2) the elements of $\boldsymbol{\Sigma}$ and $\mathbf{c}_0$ are then estimated from the residuals of the resulting model; 3) the regression model is refitted using the updated elements of $\boldsymbol{\Sigma}$ and $\mathbf{c}_0$; 4) the elements of $\boldsymbol{\Sigma}$ and $\mathbf{c}_0$ are then re-estimated from the residuals of the updated model; and 5) this iterative process is continued until the estimates converge (e.g., Cressie, 1993; Section 3.4.3). However, for this paper, the iterative process was truncated at the second step, and hence the MSPE of $\hat{S}_{t_0}$ described above can be viewed as an approximation. The justifications for truncating the iterative process include logistical reasons (i.e., we needed a general simplified prediction approach that could be applied across many site years), and that since our goal is prediction (with method performance evaluated by comparison against near-complete monitoring datasets), questions about the approximations assumed in deriving the models are of less interest. Ordinary kriging is then applied to predict the residual time series, and finally those predicted values are added to the regression model to form the universal kriging predictions.

The reason for OLS estimation in the first step instead of GLS estimation as described in the theoretical universal kriging predictor (6), is that the GLS estimator requires simultaneous knowledge of $\boldsymbol{\Sigma}$ and $\mathbf{c}_0$, but these quantities can only be estimated in the second step of the iterative process. OLS estimation is still unbiased and is only slightly less precise than GLS estimation, so the practical application in the two steps described is typically followed.

Other practical considerations included the use of trimming predicted values. For example, predictions that fell below the LOD were trimmed to the LOD value, and ordinary kriging predictions of the residuals at non-sampled locations were trimmed so that they could not exceed two standard deviations from zero.

Finally, exponentiation of the kriging predictions and prediction limits was applied to back-transform these values to the original atrazine scale. Exponentiation may result in a small amount of bias, but no correction factor was applied (Aldworth and Cressie, 2003).

SUPPLEMENTAL FIGURE CAPTIONS

Supplemental Figure S1. Study monitoring sites, as identified by the Atrazine Ecological Monitoring Program (AEMP) site ID code.

Supplemental Figure S2. Semivariogram models fitted to empirical semivariograms for two example site-years with daily data.

Supplemental Figure S3. RRMSPE (i.e., RMSPE/observed maximum rolling average) versus observed maximum rolling average for ordinary kriging (solid circle) and linear interpolation (open circle) for 7-day sampling and maximum 7-day rolling average (a), and 14-day sampling and maximum 7-day rolling average (b). Connecting lines are blue if kriging performs better and red if it performs worse than linear interpolation.

Supplemental Figure S4. RRMSPE (i.e., RMSPE/observed maximum rolling average) versus observed maximum rolling average for ordinary kriging (solid circle) and linear interpolation (open circle) for 7-day sampling for yearly maximum, maximum 7-day rolling average, maximum 14-day rolling average, and maximum 30-day rolling average. Connecting lines are blue if kriging performs better and red if it performs worse than linear interpolation.

Supplemental Figure S5. RRMSPE (i.e., RMSPE/observed maximum rolling average) versus observed maximum rolling average for ordinary kriging (solid circle) and linear interpolation (open circle) for 14-day sampling for yearly maximum, maximum 7-day rolling average, maximum 14-day rolling average, and maximum 30-day rolling average. Connecting lines are blue if kriging performs better and red if it performs worse than linear interpolation.

Supplemental Figure S6. RRMSPE (i.e., RMSPE/observed maximum rolling average) versus observed maximum rolling average for universal kriging with PRZM-hybrid covariate (solid circle) and linear interpolation (open circle) for 7-day sampling for yearly maximum, maximum 7-day rolling average, maximum 14-day rolling average, and maximum 30-day rolling average. Connecting lines are blue if kriging performs better and red if it performs worse than linear interpolation.

Supplemental Figure S7. RRMSPE (i.e., RMSPE/observed maximum rolling average) versus observed maximum rolling average for universal kriging with PRZM-hybrid covariate (solid circle) and linear interpolation (open circle) for 14-day sampling for yearly maximum, maximum 7-day rolling average, maximum 14-day rolling average, and maximum 30-day rolling average. Connecting lines are blue if kriging performs better and red if it performs worse than linear interpolation.

Supplemental Figure S8. RRMSPE (i.e., RMSPE/observed maximum rolling average) versus observed maximum rolling average for log-linear interpolation (solid circle) and linear interpolation (open circle) for 7-day sampling for yearly maximum, maximum 7-day rolling average, maximum 14-day rolling average, and maximum 30-day rolling average. Connecting lines are blue if log-linear interpolation performs better and red if it performs worse than linear interpolation.

Supplemental Figure S9. RRMSPE (i.e., RMSPE/observed maximum rolling average) versus observed maximum rolling average for log-linear interpolation (solid circle) and linear interpolation (open circle) for 14-day sampling for yearly maximum, maximum 7-day rolling average, maximum 14-day rolling average, and maximum 30-day rolling average. Connecting lines are blue if log-linear interpolation performs better and red if it performs worse than linear interpolation.

Supplemental Figure S10. Ordinary kriging predictions (red) and corresponding prediction intervals at the 90% level (orange) for each possible sample of every 14-day sampling for site KS-02 in 2011.
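Returning to the practical procedure described under Kriging Details (OLS fit on the log scale, covariance estimation from the residuals, ordinary kriging of the residuals, trimming, and back-transformation), the following Python/NumPy sketch shows one way those steps could fit together. It rests on strong simplifying assumptions that are not taken from the paper: the nugget is set to zero, the sill is set to the sample variance of the OLS residuals, and the piecewise linear semivariogram is taken to rise linearly up to the range and stay flat beyond it. The names `linear_cov` and `two_step_predict` are hypothetical, and prediction intervals are omitted.

```python
import numpy as np

def linear_cov(h, sill, rng):
    """Covariance implied by a piecewise linear semivariogram (zero nugget assumed):
    C(h) = sill * (1 - h / rng) for h < rng, and 0 otherwise."""
    h = np.abs(np.asarray(h, dtype=float))
    return sill * np.clip(1.0 - h / rng, 0.0, None)

def two_step_predict(t_obs, conc_obs, X_obs, t_all, X_all, rng, lod):
    """Hypothetical sketch of the truncated two-step procedure on daily grid t_all."""
    t_obs = np.asarray(t_obs, dtype=float)
    t_all = np.asarray(t_all, dtype=float)
    z = np.log(np.asarray(conc_obs, dtype=float))        # work on the log scale

    # Step 1: OLS regression and residuals
    beta, *_ = np.linalg.lstsq(X_obs, z, rcond=None)
    resid = z - X_obs @ beta

    # Step 2: covariance elements from the residuals (assumption: sill = residual variance)
    sill = resid.var(ddof=X_obs.shape[1])
    Sigma = linear_cov(t_obs[:, None] - t_obs[None, :], sill, rng)
    C0 = linear_cov(t_all[:, None] - t_obs[None, :], sill, rng)

    # Ordinary kriging of the residuals (constant unknown mean mu)
    ones = np.ones(len(t_obs))
    mu = (ones @ np.linalg.solve(Sigma, resid)) / (ones @ np.linalg.solve(Sigma, ones))
    r_hat = mu + C0 @ np.linalg.solve(Sigma, resid - mu * ones)

    # Trim residual predictions at non-sampled days to within two standard deviations of zero
    sd = np.sqrt(sill)
    unsampled = ~np.isin(t_all, t_obs)
    r_hat[unsampled] = np.clip(r_hat[unsampled], -2.0 * sd, 2.0 * sd)

    # Add back the regression mean, exponentiate (no bias correction), trim to the LOD
    pred = np.exp(X_all @ beta + r_hat)
    return np.maximum(pred, lod)
```

With 14-day sampling, `rng = 14`, and an intercept-only design matrix, the covariance matrix of the sampled residuals is diagonal under these assumptions, so the kriged residuals between two sampled days reduce to a linear interpolation of the neighbouring residuals on the log scale.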


TABLES

Supplemental Table S1. Analysis data set of site-years, usage season sample sizes and percent days sampled in the Atrazine Ecological Monitoring Program (AEMP). No. Days Sampled has a maximum of 122 (April 1 to July 31).

Site      Year    No. Days Sampled    Percent Complete
IA-03     2010     99                 81
IA-03     2011    104                 85
IA-04     2010     97                 80
IA-04     2011    100                 82
IA-05     2011     98                 80
IL-11     2010     99                 81
IL-12     2010     99                 81
IL-14     2010    100                 82
IL-15     2010     97                 80
IL-16     2010     98                 80
IL-17     2010    100                 82
IN-12     2010    100                 82
KS-01     2010    101                 83
KS-01     2011    100                 82
KS-02     2010    103                 84
KS-02     2011    100                 82
KS-03     2010    102                 84
KS-03     2011     95                 78
MO-01     2011    100                 82
MO-01C    2011    100                 82
MO-01D    2010     96                 79
MO-01D    2011     99                 81
MO-02     2009     99                 81
MO-02     2010    112                 92
MO-04A    2010    100                 82
MO-04A    2011    100                 82
MO-05     2010    100                 82
MO-05B    2010    100                 82
MO-06     2010    103                 84
MO-06     2011    100                 82
MO-07     2010     95                 78
MO-07     2011     97                 80
MO-08     2010    100                 82
NE-04     2010    103                 84
NE-04     2011     96                 79
NE-05     2010    100                 82
NE-05     2011     98                 80
NE-08B    2009     92                 75
NE-09     2010    100                 82
NE-09     2011    100                 82
OH-05     2010    103                 84
OH-06     2010     92                 75

Supplemental Table S2. Comparison of linear versus nonlinear semivariogram models using mean RRMSPE (i.e., RMSPE/observed maximum rolling average) for maximum rolling averages of duration m = 1, 7, 14, 30 days, and for 7-day sampling.

                                             Duration of Rolling Average
                                             1 Day     7 Days    14 Days   30 Days
Linear semivariogram model                   0.4811    0.3581    0.3196    0.2806
Nonlinear semivariogram model                0.4811    0.3624    0.3245    0.2852
Ratio between linear and nonlinear models    1.000     0.988     0.985     0.984
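The ratio row of Table S2 can be reproduced directly from the two rows of mean RRMSPE values. The short Python check below does this arithmetic; the dictionary names are illustrative only, and the values are copied from the table.

```python
# Mean RRMSPE by rolling-average duration (days), copied from Table S2
linear = {1: 0.4811, 7: 0.3581, 14: 0.3196, 30: 0.2806}
nonlinear = {1: 0.4811, 7: 0.3624, 14: 0.3245, 30: 0.2852}

# Ratio of linear to nonlinear mean RRMSPE, rounded to three decimals as in the table
ratios = {m: round(linear[m] / nonlinear[m], 3) for m in linear}

for m, ratio in ratios.items():
    print(f"{m:>2}-day rolling average: ratio = {ratio:.3f}")
```

A ratio below 1 indicates a lower mean RRMSPE for the linear semivariogram model at that duration.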

SUPPLEMENTAL FIGURES

Supplemental Figure S1

Supplemental Figure S2

Supplemental Figure S3

Supplemental Figure S4

Supplemental Figure S5

Supplemental Figure S6

Supplemental Figure S7

Supplemental Figure S8

Supplemental Figure S9

Supplemental Figure S10