Recursive Estimation and Order Determination of Space-Time Autoregressive Processes

Recursive Estimation and Order Determination of Space-Time Autoregressive Processes. Ana Monica C. Antunes, Barry G. Quinn & T. Subba Rao. First version: 23 September 2005. Research Report No. 4, 2005, Probability and Statistics Group, School of Mathematics, The University of Manchester.

Recursive estimation and order determination of space-time autoregressive processes

By ANA MÓNICA C. ANTUNES, Department of Mathematics, University of Manchester Institute of Science and Technology (UMIST), PO Box 88, Manchester M60 1QD, UK; BARRY G. QUINN, Department of Statistics, Macquarie University, Sydney, NSW 2109, Australia; and T. SUBBA RAO, Department of Mathematics, UMIST.

Abstract

Space-time autoregressive moving average models may be used for time series measured at the same times in a number of locations. In this paper we propose a recursive algorithm for estimating space-time autoregressive (AR) models. We also propose an information criterion for estimating the model order, and prove its strong consistency. The methods are illustrated using both simulated and real data. The real data correspond to hourly carbon monoxide (CO) concentrations recorded in September 1995 at four different locations in Venice.

Key words: AIC, asymptotic normality, BIC, consistency, law of the iterated logarithm, order selection, recursive estimation, space-time AR process, Yule-Walker equations.

Corresponding Author. Fax:

1 Introduction

In many areas of research one is interested in the statistical analysis of measurements of one quantity taken at various locations (space) and times. Examples include air pollution concentrations at various locations in a city at different times, temperatures in a region, wind speeds and the spread of epidemics. Different methods have been proposed for the analysis of real data sets (see, for example, Mardia et al. (1998), Wikle & Cressie (1999), Guttorp et al. (1994), Shaddick & Wakefield (2002), Haslett & Raftery (1989)). In this paper we shall consider space-time autoregressions (Pfeifer & Deutsch (1980)). Let X(s_i, t) (i = 1, …, N, t = 1, …, T) denote a random variable associated with location s_i and time t. Let X(t) = [X(s_1, t), …, X(s_N, t)]′, t ∈ Z. X(t) is said to be a space-time AR process of order k if X(t) satisfies a difference equation of the form

X(t) + Σ_{j=1}^{k} (φ_j I_N + ψ_j W) X(t−j) = ε(t)   (1)

where the scalars φ_j, ψ_j (j = 1, …, k) are unknown parameters. I_N is the N-dimensional identity matrix, and W is a non-zero N × N known weighting matrix, with diagonal entries 0 (without loss of generality) and off-diagonal entries related to the distances between the sites. Each row sum of W is normalised to 1, again without loss of generality. We assume that ε(t) is weakly stationary, with

E ε(t) = 0   (2)

and

E ε(t) ε′(t+s) = σ² I_N ; s = 0,
             = 0 ; s ≠ 0.   (3)

It is further assumed that the zeros of

det{ I_N + Σ_{j=1}^{k} (φ_j I_N + ψ_j W) z^j }   (4)

are outside the unit circle, which ensures that there exists a weakly stationary process which satisfies (1). Fields of application of the above models include meteorology (Subba Rao & Antunes (2004)), criminology (Pfeifer & Deutsch (1980)), ecology (Epperson (2000), Stoffer (1986)), transportation studies (Garrido (2000)) and many others. A space-time AR process is a very parsimonious multivariate autoregression. It is this parsimony, however, which prevents the use of algorithms such as the Whittle (1963) algorithm to estimate the unknown parameters. However, we can use the principal ideas of Whittle (1963) and Quinn (1980) (see also Hannan & Deistler (1988, sec. 6, ch. 5)) to develop recursive equations for the estimation of these parameters. The Levinson-Durbin and Whittle algorithms have a natural place in the estimation of univariate and multivariate autoregressive order (references). In this paper we also introduce an information criterion to estimate the order of the process and demonstrate its consistency. In Section 2, we derive Yule-Walker relations for the space-time AR process. Using these equations we obtain recursive relations for estimating the parameters of the model. The sampling properties of the estimators are discussed in Section 3. The problem of estimating the order of the model is considered in Section 4. For clarity of exposition, the proofs of the results obtained in Sections 2-4 are given in a separate section (Section 5). We illustrate the methodology with simulated data in Section 6. In Section 7 we use hourly carbon monoxide (CO) concentrations recorded in 1995 at four locations in Venice to illustrate our estimation procedures.

2 Yule-Walker equations and recursive algorithm

Here we obtain Yule-Walker-like equations for space-time AR processes. These will not be the usually specified ones, which represent a 1:1 relationship between autocovariances and system parameters, because of the fact that the autoregressive matrices

are specified by the (2k + 1) parameters σ² and the φ_j and ψ_j. For k ∈ Z, let

γ_k = E X′(t) X(t+k),  π_k = E X′(t) W X(t+k),  λ_k = E X′(t) W′W X(t+k).

Put σ²_k for the innovation variance associated with the order-k fit (so that σ²_k = σ² when k is the true order), and

Φ^{(k)} = [φ^{(k)}_1, …, φ^{(k)}_k]′,  Ψ^{(k)} = [ψ^{(k)}_1, …, ψ^{(k)}_k]′,
U_k = [γ_{i−j}]_{i,j=1,…,k},  V_k = [π_{i−j}]_{i,j=1,…,k},  F_k = [λ_{i−j}]_{i,j=1,…,k},
Γ_k = [γ_1 ⋯ γ_k]′,  Λ_k = [λ_1 ⋯ λ_k]′,  Π_k = [π_1 ⋯ π_k]′,  Π⁻_k = [π_{−1} ⋯ π_{−k}]′.

THEOREM 1 Let X(t) be a space-time AR process satisfying (1), where ε(t) satisfies (2) and (3). Then

[U_k V_k; V′_k F_k] [Φ^{(k)}; Ψ^{(k)}] = −[Γ_k; Π⁻_k]   (5)

and

Nσ²_k = γ_0 + Γ′_k Φ^{(k)} + Π⁻′_k Ψ^{(k)}.   (6)

Proof: See Section 5.

That Φ^{(k)} and Ψ^{(k)} can easily be calculated by solving (5) follows from the following lemma.

LEMMA 2 The 2k × 2k matrix

Ω_k = [U_k V_k; V′_k F_k]

is invertible for any k.

Proof: See Section 5.

For given k, the determination of Φ^{(k)} and Ψ^{(k)} requires the inversion of the 2k × 2k matrix Ω_k. The computational load can be reduced by using the Töplitz structure of Ω_k, a fact which is utilised in the Levinson-Durbin and Whittle algorithms. Moreover, as is the case with these algorithms, at each recursive step information about the system order is revealed. In the next theorem we introduce a recursive, computationally efficient algorithm for solving (5). It should be recalled that the Whittle recursion involves the introduction of forward autoregressive parameters. Our recursion will need these as well as another augmented set of parameters. Let Δ^{(k)}, Θ^{(k)} and the forward quantities Φ̄^{(k)}, Ψ̄^{(k)}, Δ̄^{(k)}, Θ̄^{(k)} satisfy

[U_k V_k; V′_k F_k] [Φ^{(k)} Δ^{(k)}; Ψ^{(k)} Θ^{(k)}] = −[Γ_k Π_k; Π⁻_k Λ_k]

and

[U_k V_k; V′_k F_k] [Φ̄^{(k)} Δ̄^{(k)}; Ψ̄^{(k)} Θ̄^{(k)}] = −[Γ̃_k Π̃⁻_k; Π̃_k Λ̃_k],   (7)

where x̃ denotes the vector x with its elements reversed (see below). For k = 0, 1, …, write

Φ^{(k+1)} = [Φ^{(k+1)′}_* , φ^{(k+1)}_{k+1}]′,  Δ^{(k+1)} = [Δ^{(k+1)′}_* , δ^{(k+1)}_{k+1}]′,
Ψ^{(k+1)} = [Ψ^{(k+1)′}_* , ψ^{(k+1)}_{k+1}]′,  Θ^{(k+1)} = [Θ^{(k+1)′}_* , θ^{(k+1)}_{k+1}]′,

and similarly for the barred quantities,

where Φ^{(k+1)}_*, Ψ^{(k+1)}_*, Δ^{(k+1)}_* and Θ^{(k+1)}_* (and their barred counterparts) are k × 1 and φ^{(k+1)}_{k+1}, ψ^{(k+1)}_{k+1}, δ^{(k+1)}_{k+1} and θ^{(k+1)}_{k+1} (and their barred counterparts) are scalar. For any vector x = [x_1 ⋯ x_m]′, let x̃ = [x_m ⋯ x_1]′, i.e. x with its elements reversed. Let

S_0 = T_0 = [γ_0 π_0; π_0 λ_0],  P_0 = Q′_0 = [γ_1 π_1; π_{−1} λ_1],

and, for k ≥ 1,

P_k = [γ_{k+1} π_{k+1}; π_{−(k+1)} λ_{k+1}] + [Γ̃_k Π̃⁻_k; Π̃_k Λ̃_k]′ [Φ^{(k)} Δ^{(k)}; Ψ^{(k)} Θ^{(k)}],   (8)
Q_k = [γ_{k+1} π_{−(k+1)}; π_{k+1} λ_{k+1}] + [Γ_k Π_k; Π⁻_k Λ_k]′ [Φ̄^{(k)} Δ̄^{(k)}; Ψ̄^{(k)} Θ̄^{(k)}],   (9)
S_k = [γ_0 π_0; π_0 λ_0] + [Γ_k Π_k; Π⁻_k Λ_k]′ [Φ^{(k)} Δ^{(k)}; Ψ^{(k)} Θ^{(k)}],   (10)
T_k = [γ_0 π_0; π_0 λ_0] + [Γ̃_k Π̃⁻_k; Π̃_k Λ̃_k]′ [Φ̄^{(k)} Δ̄^{(k)}; Ψ̄^{(k)} Θ̄^{(k)}].   (11)

THEOREM 3 For k ≥ 0,

[φ^{(k+1)}_{k+1} δ^{(k+1)}_{k+1}; ψ^{(k+1)}_{k+1} θ^{(k+1)}_{k+1}] = −T^{−1}_k P_k,  [φ̄^{(k+1)}_{k+1} δ̄^{(k+1)}_{k+1}; ψ̄^{(k+1)}_{k+1} θ̄^{(k+1)}_{k+1}] = −S^{−1}_k Q_k,

and, for k ≥ 1,

[Φ^{(k+1)}_* Δ^{(k+1)}_*; Ψ^{(k+1)}_* Θ^{(k+1)}_*] = [Φ^{(k)} Δ^{(k)}; Ψ^{(k)} Θ^{(k)}] + [Φ̄̃^{(k)} Δ̄̃^{(k)}; Ψ̄̃^{(k)} Θ̄̃^{(k)}] [φ^{(k+1)}_{k+1} δ^{(k+1)}_{k+1}; ψ^{(k+1)}_{k+1} θ^{(k+1)}_{k+1}],

[Φ̄^{(k+1)}_* Δ̄^{(k+1)}_*; Ψ̄^{(k+1)}_* Θ̄^{(k+1)}_*] = [Φ̄^{(k)} Δ̄^{(k)}; Ψ̄^{(k)} Θ̄^{(k)}] + [Φ̃^{(k)} Δ̃^{(k)}; Ψ̃^{(k)} Θ̃^{(k)}] [φ̄^{(k+1)}_{k+1} δ̄^{(k+1)}_{k+1}; ψ̄^{(k+1)}_{k+1} θ̄^{(k+1)}_{k+1}].

Proof: See Section 5.

The following theorem shows that S_k and T_k are covariance matrices and are invertible.

THEOREM 4 Let ε_k(t), u_k(t), ε̄_k(t) and ū_k(t) be the differences between X(t) and

its predictors Σ_{j=1}^{k} (a_j I + b_j W) X(t∓j) which respectively minimise

E {X(t) − Σ_j (a_j I + b_j W) X(t−j)}′ {X(t) − Σ_j (a_j I + b_j W) X(t−j)},
E {W X(t) − Σ_j (c_j I + d_j W) X(t−j)}′ {W X(t) − Σ_j (c_j I + d_j W) X(t−j)},
E {X(t) − Σ_j (a_j I + b_j W) X(t+j)}′ {X(t) − Σ_j (a_j I + b_j W) X(t+j)}

and

E {W X(t) − Σ_j (c_j I + d_j W) X(t+j)}′ {W X(t) − Σ_j (c_j I + d_j W) X(t+j)}.

Then

S_k = E {[ε_k(t) u_k(t)]′ [ε_k(t) u_k(t)]},  T_k = E {[ε̄_k(t) ū_k(t)]′ [ε̄_k(t) ū_k(t)]},

and S_k and T_k are of full rank for all k. Furthermore, Nσ²_k is the (1, 1) entry of S_k.

Proof: See Section 5.

The following lemma presents recursive expressions for S_k, T_k and σ²_k. The equations (12) and (13) should be used instead of (10) and (11) when estimating, as the former equations are expected to be less prone to rounding problems. Similar relations exist for the Levinson-Durbin and Whittle algorithms.

LEMMA 5 Let S_k and T_k be as defined above. Then

S_{k+1} = S_k { I − [φ̄^{(k+1)}_{k+1} δ̄^{(k+1)}_{k+1}; ψ̄^{(k+1)}_{k+1} θ̄^{(k+1)}_{k+1}] [φ^{(k+1)}_{k+1} δ^{(k+1)}_{k+1}; ψ^{(k+1)}_{k+1} θ^{(k+1)}_{k+1}] },   (12)

T_{k+1} = T_k { I − [φ^{(k+1)}_{k+1} δ^{(k+1)}_{k+1}; ψ^{(k+1)}_{k+1} θ^{(k+1)}_{k+1}] [φ̄^{(k+1)}_{k+1} δ̄^{(k+1)}_{k+1}; ψ̄^{(k+1)}_{k+1} θ̄^{(k+1)}_{k+1}] }.   (13)

Proof: See Section 5.

The following lemma reduces the computations needed in solving the recursive equations in Theorem 3.

LEMMA 6 Let P_k and Q_k be as defined in (8) and (9), respectively. Then Q_k = P′_k.

Proof: See Section 5.

3 Parameter estimation for space-time AR processes

The above results describe exact relationships between the theoretical covariance and autocovariance matrices and the true parameters. The above algorithm becomes an estimation algorithm when the theoretical covariances are replaced by consistent estimators. Let X(t) be a space-time process. Given {X(t); 1 ≤ t ≤ T}, estimate γ_k, π_k and λ_k by the obvious moment estimators

γ̂_k = T^{−1} Σ_{t=1}^{T−k} X′(t) X(t+k),
π̂_k = T^{−1} Σ_{t=1}^{T−k} X′(t) W X(t+k)

and

λ̂_k = T^{−1} Σ_{t=1}^{T−k} X′(t) W′W X(t+k),

respectively. Without loss of generality, we shall assume that the X(t) have mean zero, for, in practice, we shall always mean-correct them. This mean-correction will have no asymptotic effect, so we omit any further reference to it for notational simplicity. The following additional weak assumptions will be made in order to develop the asymptotic properties of the estimators:

1. ε(t) is strictly stationary and ergodic;
2. E {ε(t) | F_{t−1}} = 0;

3. E {ε(t) ε′(t) | F_{t−1}} = σ² I_N;
4. E ε⁴_j(t) < ∞, j = 1, …, N,

where F_t is the σ-field generated by {ε(s); s ≤ t}. In the following we denote generically by θ̂ the estimator of θ obtained from the recursion by substituting γ̂_k for γ_k, etc.

THEOREM 7 Let X(t) satisfy (1) and the above assumptions. Then

1. Φ̂^{(k)} and Ψ̂^{(k)} converge almost surely to Φ^{(k)} and Ψ^{(k)};
2. The distribution of √T [Φ̂^{(k)′} − Φ^{(k)′}  Ψ̂^{(k)′} − Ψ^{(k)′}]′ converges, as T → ∞, to the normal with mean zero and covariance matrix

σ² [U_k V_k; V′_k F_k]^{−1}.

Proof: See Section 5.

4 Estimating k

The above presupposes the true order to be known. In practice, however, we shall need to estimate it. The following theorem describes the behaviour of a class of AIC-type estimators of k and establishes conditions under which these estimators are strongly and weakly consistent.

THEOREM 8 Let k̂ be the minimiser of

φ_f(k) = NT log σ̂²_k + 2k f(T)

over k = 0, 1, …, K, where K is known a priori to be larger than k_0, the true order, and f(T)/T → 0 as T → ∞. Then

1. k̂ converges to k_0 in probability if f(T) diverges to ∞;

2. If f(T) = c, for all T, then

lim_{T→∞} Pr(k̂ = k_0 + j) = 0 ; j < 0,
                        = p_j q_{K−k_0−j} ; 0 ≤ j ≤ K − k_0,

where the p_j and q_j are generated by

Σ_{j=0}^{∞} p_j z^j = exp{ Σ_{j=1}^{∞} j^{−1} z^j Pr(χ²_{2j} > 2cj) }

and

Σ_{j=0}^{∞} q_j z^j = exp{ Σ_{j=1}^{∞} j^{−1} z^j Pr(χ²_{2j} ≤ 2cj) }.

In particular,

lim_{T→∞} Pr(k̂ = k_0) = q_{K−k_0} → exp{ −Σ_{j=1}^{∞} j^{−1} Pr(χ²_{2j} > 2cj) }

as K → ∞.

3. k̂ converges to k_0 almost surely if

lim inf_{T→∞} f(T) / log log T > 1,

and only if

lim inf_{T→∞} f(T) / log log T ≥ 1.

Thus, it can be seen that, when f(T) is constant, the asymptotic probability of overestimating the order is greater than zero and the probability of underestimation is zero. Also, the probability of estimating the correct order is strictly less than 1. Hence the AIC estimator (which corresponds to the case where f(T) = 2) is not a consistent estimator of the order. The case

lim inf_{T→∞} f(T) / log log T = 1

is of some interest but will not be dealt with here, especially as the law of the iterated logarithm used to prove these results (see Lemma 10 in the appendix) can only be expected to hold for extremely large values of T.

5 Proofs of theorems

Here we prove the results contained in the theorems and lemmas stated in Sections 2-4.

PROOF of Theorem 1

We motivate the normal equations and recursions by developing the Gaussian maximum likelihood procedure, i.e. the maximum likelihood estimators when ε(t) is assumed to be Gaussian. We do not, however, need to assume that ε(t) is Gaussian; we merely use the Gaussian likelihood to obtain the correct normal equations, which will hold whether or not ε(t) is Gaussian. The Gaussian maximum likelihood estimators of the parameters φ_j, ψ_j (j = 1, …, k) and σ² for a space-time AR process of order k can be obtained by maximising the Gaussian log-likelihood function

l = −(TN/2) log(2πσ²) − (2σ²)^{−1} Σ_{t=1}^{T} ε′(t) ε(t),

where ε(t) is now understood to be the function of φ_1, …, φ_k, ψ_1, …, ψ_k, σ² and X(1), …, X(T) given by

ε(t) = X(t) + Σ_{j=1}^{k} (φ_j I + ψ_j W) X(t−j).

This is equivalent to minimising the sum of squares

S(Φ, Ψ) = Σ_{t=1}^{T} ε′(t) ε(t) = Σ_{t=1}^{T} {X(t) + Σ_{j=1}^{k} (φ_j I + ψ_j W) X(t−j)}′ {X(t) + Σ_{j=1}^{k} (φ_j I + ψ_j W) X(t−j)},

where Φ = [φ_1, …, φ_k]′ and Ψ = [ψ_1, …, ψ_k]′, and then setting TNσ̂² equal to the minimum value of S(Φ, Ψ). Since S(Φ, Ψ) is quadratic in Φ and Ψ, we can minimise S(Φ, Ψ) by differentiating with respect to φ_m, ψ_m (m = 1, …, k) and equating the derivatives to zero. Now,

∂S(Φ, Ψ)/∂φ_m = 2 Σ_{t=1}^{T} X′(t−m) {X(t) + Σ_{j=1}^{k} (φ_j I + ψ_j W) X(t−j)},
∂S(Φ, Ψ)/∂ψ_m = 2 Σ_{t=1}^{T} X′(t−m) W′ {X(t) + Σ_{j=1}^{k} (φ_j I + ψ_j W) X(t−j)}.

Least squares estimators may be obtained by solving these equations, which comprise a set of 2k linear equations in 2k unknowns. We wish, however, to develop a recursive algorithm such as the Levinson-Durbin and Whittle algorithms, and so need to obtain the correct formulation for the Yule-Walker relations for space-time autoregressions. These equations, which are theoretical normal equations, are obtained by equating the expectations of the right hand sides of the above to 0. We thus have, for each m = 1, …, k,

γ_m + Σ_{j=1}^{k} (φ_j γ_{m−j} + ψ_j π_{m−j}) = 0,
π_{−m} + Σ_{j=1}^{k} (φ_j π_{j−m} + ψ_j λ_{m−j}) = 0.

Stacking these equations from m = 1 to k, we obtain (5). Note that γ_{−m} = γ_m and λ_{−m} = λ_m, but that π_{−m} ≠ π_m in general. It is straightforward to verify that these equations hold true for any weakly stationary process X(t) satisfying (1). Moreover, from the

above,

Nσ² = E { T^{−1} Σ_{t=1}^{T} ε′(t) ε(t) }
    = γ_0 + Σ_{j=1}^{k} (φ_j γ_j + ψ_j π_{−j})
    = γ_0 + Γ′_k Φ^{(k)} + Π⁻′_k Ψ^{(k)}.

This completes the proof of the theorem.

PROOF of Lemma 2

Let α and β be arbitrary constant k × 1 vectors, with [α′ β′]′ ≠ 0. The quadratic form

[α′ β′] Ω_k [α′ β′]′

is the expectation of

{ Σ_{i=1}^{k} (α_i X′(t−i) + β_i X′(t−i) W′) } { Σ_{j=1}^{k} (α_j X(t−j) + β_j W X(t−j)) } = Z′Z,

where Z = Σ_{j=1}^{k} (α_j I + β_j W) X(t−j). Thus Ω_k is non-negative definite, and it is invertible unless Z = 0 almost everywhere, which occurs only when there is a linear relation amongst X(t−1), …, X(t−k) and therefore amongst the ε(t). However, we have assumed that X(t) satisfies (1) and is weakly stationary, and that ε(t) is an uncorrelated process. There can therefore be no linear relations, and so Ω_k is invertible and positive definite.
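Lemma 2 guarantees that (5) has a unique solution, so the parameters can also be obtained by a direct (non-recursive) solve once the moments are estimated as in Section 3. The following is a minimal numerical sketch in Python/NumPy, not the authors' implementation; the function names are ours, and the sign and lag conventions follow our reading of (5) and (6):

```python
import numpy as np

def moments(X, W, K):
    """Sample moments gamma_k, pi_k, lambda_k for k = -K..K, where
    gamma_k = T^-1 sum_t X'(t) X(t+k), pi_k inserts W, lambda_k W'W.
    X is T x N, with row t holding X(t)'."""
    T = X.shape[0]
    g, p, l = {}, {}, {}
    for k in range(-K, K + 1):
        if k >= 0:
            A, B = X[:T - k], X[k:]      # rows: X(t) and X(t+k)
        else:
            A, B = X[-k:], X[:T + k]
        WB = B @ W.T                      # rows: W X(t+k)
        g[k] = np.sum(A * B) / T
        p[k] = np.sum(A * WB) / T
        l[k] = np.sum((A @ W.T) * WB) / T
    return g, p, l

def fit_star(X, W, k):
    """Direct solve of the order-k Yule-Walker system (5), plus the
    innovation-variance estimate from (6)."""
    g, p, l = moments(X, W, k)
    N = W.shape[0]
    idx = range(1, k + 1)
    U = np.array([[g[i - j] for j in idx] for i in idx])
    V = np.array([[p[i - j] for j in idx] for i in idx])
    F = np.array([[l[i - j] for j in idx] for i in idx])
    Gam = np.array([g[j] for j in idx])       # gamma_1 .. gamma_k
    Pi_neg = np.array([p[-j] for j in idx])   # pi_{-1} .. pi_{-k}
    Omega = np.block([[U, V], [V.T, F]])
    sol = np.linalg.solve(Omega, -np.concatenate([Gam, Pi_neg]))
    Phi, Psi = sol[:k], sol[k:]
    sigma2 = (g[0] + Gam @ Phi + Pi_neg @ Psi) / N
    return Phi, Psi, sigma2
```

With data generated from (1), fit_star recovers Φ and Ψ up to sampling error; the recursion of Theorem 3 yields the same solutions with less work when fits at all orders 0, …, K are required.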

15 PROOF of Theorem 3 From (5), the normal equations of order + 1 may be written, using the notation in Theorem 3, as U Γ V Π Γ γ 0 Π π 0 V Π F Λ Π π 0 Λ λ 0 Φ (+1) φ (+1) +1 Ψ (+1) ψ (+1) +1 = Γ γ +1 Π π 1. of Φ () We now proceed to derive the recursions for calculating Φ (+1) +1 and Ψ (+1) +1 in terms and Ψ (). The above equations may be written as the two equations U V V F Φ(+1) Ψ (+1) + Γ Π Π Λ φ(+1) +1 ψ (+1) +1 = Γ Π (14) and Γ Π Π Λ Φ(+1) Ψ (+1) + γ 0 π 0 π 0 λ 0 φ(+1) +1 ψ (+1) +1 = γ +1 π 1. (15) Equations (14) and (5) give Φ(+1) Ψ (+1) = U = V Φ() Ψ () V F 1 U V Γ Π V F U 1 Γ Π V V F Π Λ 1 Γ Π φ(+1) +1 ψ (+1) +1 Π Λ. φ(+1) +1 ψ (+1) +1 Thus, from (15), we have γ 0 π 0 π 0 λ 0 Γ Π Π Λ U V V F 1 Γ Π Π Λ φ(+1) +1 ψ (+1) +1 = γ +1 π 1 + Γ Π Π Λ Φ() Ψ (), 14

16 or φ(+1) +1 ψ (+1) +1 = 0 π 0 π 0 λ 0 = 0 π 0 π 0 λ 0 Γ Π Π Λ Γ Π Π Λ U V V F U V V F 1 1 Γ Π Γ Π Π Λ Π Λ 1 1 P 1 P 1 (16) where P 1 is the first column of P. The above simplification results using the following reasoning: Suppose U V V F a c b d = Γ Π Π where a, b, c and d are of the same dimension. We can obtain Λ, (17) Γ Π Π Λ on the right hand side of (17) by reversing the rows of [ U V ] and [ V F ]. To obtain recognisable matrices, we then need to reverse the relevant columns. U and F remain unchanged, after reversal of both rows and columns, as they are Töplitz, while V is changed to V. We preserve (17) by reversing a, b, c and d. Thus (17) is the same as U V V F ã c b d = Γ Π Π Λ, 15

17 and we therefore have Γ Π Π Λ U V = Γ Π a Π Λ c = Γ Π ã Π Λ c = Γ Π Π Λ V F b d b d U V V F 1 Γ Π 1 Γ Π Π Λ Π Λ. Noting the differences between U V V F 1 Γ Π Π Λ and the solution to (5) : U V V F 1 Γ Π = Φ() Ψ (), it is clear that we need a double recursion and that we also need to augment the equations. Using the notation defined in the statement of the theorem, we augment (5) to the equation U V V F Φ() Ψ () () Θ () = and in the light of (16), define the forward equations Γ Π Π Λ (18) U V V F Φ() Ψ () () Θ () = Γ Π Π Λ. (19) 16

18 Using similar methods as used to obtain (14) and (15), we obtain Φ(+1) Ψ (+1) = = Φ() Ψ () Φ() Ψ () Φ(+1) Ψ (+1) = = Φ() Ψ () Φ() Ψ () (+1) Θ (+1) () Θ () () Θ () (+1) U Θ (+1) () Θ () () Θ () + + V () Φ Ψ () V F U V V () Φ Ψ () F () Θ () () Θ () 1 Γ Π 1 Π Λ φ(+1) +1 δ (+1) +1 ψ (+1) +1 θ (+1) +1 Γ Π Π Λ φ(+1) +1 δ (+1) +1 ψ (+1) +1 θ (+1) +1 φ(+1) +1 δ (+1) +1 ψ (+1) +1 θ (+1) +1 (20), (21) φ(+1) +1 ψ (+1) +1, δ (+1) +1 θ (+1) +1 (22) φ(+1) +1 δ (+1) +1 ψ (+1) +1 θ (+1) +1 = γ 0 π 0 Γ Π π 0 λ 0 Π Λ = 0 π 0 + π 0 λ 0 = T 1 P, Γ Π Π Λ U V V Φ() Ψ () F () Θ () 1 Γ Π 1 P Π Λ 1 P and γ 0 π 0 Γ π 0 λ 0 Π Π Λ π 1 = γ +1 π +1 λ +1 U V V F + Γ Π 1 Π Λ Γ Π Π Φ() Ψ () Λ () Θ () φ(+1) +1 ψ (+1), +1 δ (+1) +1 θ (+1) +1 17

19 i.e. φ(+1) +1 ψ (+1) +1 δ (+1) +1 θ (+1) = 0 π 0 π 0 λ 0 = 0 π 0 π 0 λ 0 + Γ Π Γ Π Π Λ Π Λ U V Φ() Ψ () V F () Θ () Γ Π Π 1 Q Λ Q = S 1 Q. The above equations are valid for 1. The recursion commences with the matrices φ(1) 1 δ (1) 1 ψ (1) 1 θ (1) 1 and φ(1) 1 ψ (1) 1 From above, these are, respectively, given by δ (1) 1 θ (1) 1. U 1 V 1 V 1 F 1 1 Γ 1 Π 1 Π 1 Λ 1 = γ 0 π 0 π 0 λ 0 = T 1 0 P 0 1 γ 1 π 1 π 1 λ 1 and 1 1 U 1 V 1 V 1 F 1 Γ 1 Π 1 Π 1 Λ 1 = γ 0 π 0 π 0 λ 0 γ 1 π 1 π 1 λ 1 = S 1 0 Q 0 This completes the proof. PROOF of Theorem 4 18

Let ε_k(t) be X(t) − Σ_{j=1}^{k} (a_j I + b_j W) X(t−j), where the a_j and b_j are chosen to minimise

E {X(t) − Σ_j (a_j I + b_j W) X(t−j)}′ {X(t) − Σ_j (a_j I + b_j W) X(t−j)}.

Then, since all relevant covariances exist, by differentiating and equating to zero we obtain

0 = E X′(t−i) {X(t) − Σ_j (a_j I + b_j W) X(t−j)} = γ_i − Σ_j (a_j γ_{i−j} + b_j π_{i−j})

and

0 = E X′(t−i) W′ {X(t) − Σ_j (a_j I + b_j W) X(t−j)} = π_{−i} − Σ_j (a_j π_{j−i} + b_j λ_{i−j}).

Similarly, we obtain

0 = E X′(t−i) {W X(t) − Σ_j (c_j I + d_j W) X(t−j)} = π_i − Σ_j (c_j γ_{i−j} + d_j π_{i−j})

and

0 = E X′(t−i) W′ {W X(t) − Σ_j (c_j I + d_j W) X(t−j)} = λ_i − Σ_j (c_j π_{j−i} + d_j λ_{i−j}).

Thus, from (18), if a = [a_1 ⋯ a_k]′, b = [b_1 ⋯ b_k]′, c = [c_1 ⋯ c_k]′ and d = [d_1 ⋯ d_k]′, then

[a c; b d] = −[Φ^{(k)} Δ^{(k)}; Ψ^{(k)} Θ^{(k)}].

Hence, letting φ^{(k)}_j and ψ^{(k)}_j be the jth components of Φ^{(k)} and Ψ^{(k)}, we obtain, using (5),

E ε′_k(t) ε_k(t) = E {X(t) + Σ_j (φ^{(k)}_j I + ψ^{(k)}_j W) X(t−j)}′ {X(t) + Σ_j (φ^{(k)}_j I + ψ^{(k)}_j W) X(t−j)}
               = γ_0 + Γ′_k Φ^{(k)} + Π⁻′_k Ψ^{(k)}.

The above expression is just the (1, 1) element of S_k, which is also equal to Nσ²_k. The same reasoning shows that E ε′_k(t) u_k(t) and E u′_k(t) u_k(t) are the (1, 2) and (2, 2) elements of S_k. The result for T_k follows using an identical argument, but using prediction based on the future, rather than the past. We now show that S_k and T_k are (strictly) positive definite for each k. Suppose S_k has a zero eigenvalue. Then there exist scalars α and β, not both 0, such that

α ε_k(t) + β u_k(t) = 0,

a.e. But, letting k_0 be the true order, we have

α ε_k(t) + β u_k(t) = (αI + βW) X(t) − Σ_j {(α a_j + β c_j) I + (α b_j + β d_j) W} X(t−j)
                   = (αI + βW) ε(t) − (αI + βW) Σ_{j=1}^{k_0} (φ_j I + ψ_j W) X(t−j) − Σ_j {(α a_j + β c_j) I + (α b_j + β d_j) W} X(t−j).

However, the last two terms (the sums) are linear in X(t−1), … and therefore uncorrelated with ε(t). Thus α ε_k(t) + β u_k(t) is zero a.e. only if (αI + βW) ε(t) is zero. This in turn implies that

Σ = E {[ε(t) W ε(t)]′ [ε(t) W ε(t)]}

is not of full rank. But

Σ = σ² [N tr W; tr W tr(W′W)] = σ² [N 0; 0 tr(W′W)].

Thus the S_k (and in turn the T_k, since det T_k = det S_k) are invertible as long as tr(W′W) ≠ 0. But

tr(W′W) = Σ_{i,j=1}^{N} W²_{ij}.

Thus tr(W′W) ≥ 0, with equality if and only if the W_{ij} are zero for all i and j. But we have ruled out this case, for otherwise X(t) would be a vector of uncorrelated autoregressions.

PROOF of Lemma 5

Although (10) and (11) define S_k and T_k, these formulae can sometimes cause problems in practice. Just as the Levinson-Durbin (Durbin (1960)) algorithm and the Whittle (1963) recursion have alternative, more numerically stable formulae for the innovation variances and covariance matrices (see Quinn (1980), for example), there are alternative recursive formulae for the S_k and T_k. From (10) and (21) we have

S_{k+1} = S_k + Q_k [φ^{(k+1)}_{k+1} δ^{(k+1)}_{k+1}; ψ^{(k+1)}_{k+1} θ^{(k+1)}_{k+1}].

By Theorem 3, Q_k = −S_k [φ̄^{(k+1)}_{k+1} δ̄^{(k+1)}_{k+1}; ψ̄^{(k+1)}_{k+1} θ̄^{(k+1)}_{k+1}], and so

S_{k+1} = S_k { I − [φ̄^{(k+1)}_{k+1} δ̄^{(k+1)}_{k+1}; ψ̄^{(k+1)}_{k+1} θ̄^{(k+1)}_{k+1}] [φ^{(k+1)}_{k+1} δ^{(k+1)}_{k+1}; ψ^{(k+1)}_{k+1} θ^{(k+1)}_{k+1}] },

which is (12). Similarly,

T_{k+1} = T_k { I − [φ^{(k+1)}_{k+1} δ^{(k+1)}_{k+1}; ψ^{(k+1)}_{k+1} θ^{(k+1)}_{k+1}] [φ̄^{(k+1)}_{k+1} δ̄^{(k+1)}_{k+1}; ψ̄^{(k+1)}_{k+1} θ̄^{(k+1)}_{k+1}] },

which is (13).

PROOF of Lemma 6

Write M = [Γ_k Π_k; Π⁻_k Λ_k] and M̄ = [Γ̃_k Π̃⁻_k; Π̃_k Λ̃_k] for the matrices on the right hand sides of (18) and (19), so that [Φ^{(k)} Δ^{(k)}; Ψ^{(k)} Θ^{(k)}] = −Ω^{−1}_k M and [Φ̄^{(k)} Δ̄^{(k)}; Ψ̄^{(k)} Θ̄^{(k)}] = −Ω^{−1}_k M̄. Then, by definition,

P_k = [γ_{k+1} π_{k+1}; π_{−(k+1)} λ_{k+1}] − M̄′ Ω^{−1}_k M,
Q_k = [γ_{k+1} π_{−(k+1)}; π_{k+1} λ_{k+1}] − M′ Ω^{−1}_k M̄.

Since Ω_k is symmetric, Q_k = P′_k.

PROOF of Theorem 7

From (5),

[Φ̂^{(k)}; Ψ̂^{(k)}] = −[Û_k V̂_k; V̂′_k F̂_k]^{−1} [Γ̂_k; Π̂⁻_k].

Thus

[Φ̂^{(k)}; Ψ̂^{(k)}] − [Φ^{(k)}; Ψ^{(k)}] = −[Û_k V̂_k; V̂′_k F̂_k]^{−1} { [Γ̂_k; Π̂⁻_k] + [Û_k V̂_k; V̂′_k F̂_k] [Φ^{(k)}; Ψ^{(k)}] }.

Now the jth component of Γ̂_k + [Û_k V̂_k] [Φ^{(k)′} Ψ^{(k)′}]′ is

γ̂_j + Σ_{i=1}^{k} (φ^{(k)}_i γ̂_{j−i} + ψ^{(k)}_i π̂_{j−i}) = T^{−1} Σ_t {X(t) + Σ_{i=1}^{k} (φ^{(k)}_i I + ψ^{(k)}_i W) X(t−i)}′ X(t−j) + o(T^{−1+δ}),

almost surely, for any δ > 0, since the difference involves only a finite number of quadratic terms in the components of X(t), X(t−1), …. Thus the jth component is

T^{−1} Σ_t ε′(t) X(t−j) + o(T^{−1+δ}),

almost surely as T → ∞. Similarly, the jth component of Π̂⁻_k + [V̂′_k F̂_k] [Φ^{(k)′} Ψ^{(k)′}]′ is

T^{−1} Σ_t ε′(t) W X(t−j) + o(T^{−1+δ}).

But both T^{−1} Σ_t ε′(t) X(t−j) and T^{−1} Σ_t ε′(t) W X(t−j) converge almost surely to 0 by the ergodic theorem. Also, since [Û_k V̂_k; V̂′_k F̂_k] converges almost surely to [U_k V_k; V′_k F_k], again by the ergodic theorem, the first part of the theorem follows. Now, letting a and b be arbitrary constant k-dimensional vectors, we have from the above

[a′ b′] { [Γ̂_k; Π̂⁻_k] + [Û_k V̂_k; V̂′_k F̂_k] [Φ^{(k)}; Ψ^{(k)}] } = T^{−1} Σ_t ε′(t) Σ_{j=1}^{k} (a_j I_N + b_j W) X(t−j) + o(T^{−1+δ}).

But, letting

ξ(t) = ε′(t) Σ_{j=1}^{k} (a_j I_N + b_j W) X(t−j),

we have

E {ξ(t) | F_{t−1}} = 0

and

E {ξ²(t) | F_{t−1}} = σ² Σ_{i,j=1}^{k} X′(t−i) (a_i I_N + b_i W)′ (a_j I_N + b_j W) X(t−j),

and

E ξ²(t) = σ² E { Σ_i X′(t−i) (a_i I_N + b_i W)′ } { Σ_j (a_j I_N + b_j W) X(t−j) }
       = σ² Σ_{i,j} (a_i a_j γ_{i−j} + a_i b_j π_{i−j} + b_i a_j π_{j−i} + b_i b_j λ_{i−j})
       = σ² [a′ b′] [U_k V_k; V′_k F_k] [a′ b′]′.

Thus, by Billingsley's martingale central limit theorem (Billingsley (1961)), it follows that √T [Φ̂^{(k)′} − Φ^{(k)′}  Ψ̂^{(k)′} − Ψ^{(k)′}]′ is asymptotically normal with mean zero and covariance matrix

σ² [U_k V_k; V′_k F_k]^{−1} [U_k V_k; V′_k F_k] [U_k V_k; V′_k F_k]^{−1} = σ² [U_k V_k; V′_k F_k]^{−1}.

PROOF of Theorem 8

The derivations of conditions for weak and strong consistency of the minimiser of φ_f(k) each depend on the asymptotic properties of

φ_f(k+1) − φ_f(k) = NT log(σ̂²_{k+1}/σ̂²_k) + 2f(T).

Now,

σ̂²_{k+1}/σ̂²_k = 1 + (σ̂²_{k+1} − σ̂²_k)/σ̂²_k = 1 + (ŝ_{k+1} − ŝ_k)/ŝ_k,

where ŝ_k denotes the (1, 1) element of Ŝ_k. From (12),

Ŝ_{k+1} − Ŝ_k = Q̂_k [φ̂^{(k+1)}_{k+1} δ̂^{(k+1)}_{k+1}; ψ̂^{(k+1)}_{k+1} θ̂^{(k+1)}_{k+1}] = −Q̂_k T̂^{−1}_k P̂_k = −P̂′_k T̂^{−1}_k P̂_k,

by Lemma 6. Thus,

ŝ_{k+1} − ŝ_k = −P̂′_{k,1} T̂^{−1}_k P̂_{k,1} = −[φ̂^{(k+1)}_{k+1} ψ̂^{(k+1)}_{k+1}] T̂_k [φ̂^{(k+1)}_{k+1} ψ̂^{(k+1)}_{k+1}]′,

where P̂_{k,1} is the first column of P̂_k. It follows from Theorem 7 that T̂_k converges almost surely to T_k, which is positive definite, and that φ̂^{(k+1)}_{k+1} and ψ̂^{(k+1)}_{k+1} converge almost surely to φ^{(k+1)}_{k+1} and ψ^{(k+1)}_{k+1}, which are both zero when k ≥ k_0, but not when k = k_0 − 1. Thus, when k ≥ k_0, we have

ŝ_{k+1} − ŝ_k = −T^{−1} σ² z′_k z_k {1 + o(1)},

almost surely as T → ∞, where z_k is defined in Lemma 10, and

φ_f(k+1) − φ_f(k) = NT log { 1 + (ŝ_{k+1} − ŝ_k)/ŝ_k } + 2f(T)
                 = NT log [ 1 − T^{−1} σ² z′_k z_k {1 + o(1)} / (Nσ²) ]
                 = −z′_k z_k {1 + o(1)} + 2f(T).

Also, when k < k_0, we have, almost surely as T → ∞,

ŝ_{k+1} − ŝ_k → −[φ^{(k+1)}_{k+1} ψ^{(k+1)}_{k+1}] T_k [φ^{(k+1)}_{k+1} ψ^{(k+1)}_{k+1}]′ ≤ 0 ; k < k_0 − 1,
            → −[φ^{(k_0)}_{k_0} ψ^{(k_0)}_{k_0}] T_{k_0−1} [φ^{(k_0)}_{k_0} ψ^{(k_0)}_{k_0}]′ < 0 ; k = k_0 − 1.

It follows that T^{−1} {φ_f(k+1) − φ_f(k)} converges almost surely to

N log { 1 − s^{−1}_k [φ^{(k+1)}_{k+1} ψ^{(k+1)}_{k+1}] T_k [φ^{(k+1)}_{k+1} ψ^{(k+1)}_{k+1}]′ } ≤ 0,

if k < k_0 − 1, to

N log { 1 − s^{−1}_{k_0−1} [φ^{(k_0)}_{k_0} ψ^{(k_0)}_{k_0}] T_{k_0−1} [φ^{(k_0)}_{k_0} ψ^{(k_0)}_{k_0}]′ } < 0,

if k = k_0 − 1, and φ_f(k+1) − φ_f(k) is asymptotically equivalent to

−z′_k z_k + 2f(T)   (23)

if k ≥ k_0. Hence φ_f(k) is asymptotically non-increasing when k < k_0 − 1, and strictly decreasing at k = k_0 − 1. Since by Lemma 10 the z′_k z_k are asymptotically independent and χ²_2 when k ≥ k_0, φ_f(k+1) − φ_f(k) is asymptotically positive (in probability) if f(T) diverges to ∞ with T. Thus, k̂ will be weakly consistent if f(T) diverges to ∞. Of course, consistency is an asymptotic concept, and criteria with larger values of f(T) yield estimators which are equal or smaller. Thus in small samples, we may in fact underestimate the order. It is of interest therefore to choose f so as to fix the (asymptotic) probability of overestimation at some acceptable level. We proceed as

30 in Quinn (1988). Put f (T ) = c and note from above that Pr < 0 0. Then, noting that the z z 2c are asymptotically independent and distributed as χ 2 2 2c for = 0,..., K 1, we have lim Pr = 0 + T Pr W 1 < 0,..., W K 0 < 0 ; = 0 = Pr 0 < W, W 1 < W,..., W 1 < W, W +1 < W,..., W K 0 < W ; 1, where W = U i and the U i are independent and identically distributed with U i + 2c distributed as χ 2 2. Now Pr 0 < W, W 1 < W,..., W 1 < W, W +1 < W,..., W K 0 < W = Pr U U > 0,..., U > 0, U +1 < 0,..., U U K 0 < 0 = Pr U U > 0,..., U > 0 Pr U +1 < 0,..., U U K 0 < 0 = Pr W 1 > 0,..., W > 0 Pr W 1 < 0,..., W K 0 < 0 = p q K 0, where and p = Pr W 1 > 0,..., W > 0 q = Pr W 1 < 0,..., W < 0. Note that in the above we have relabelled the U i in order to obtain expressions in terms of probabilities involving the W i. Setting p 0 = q 0 = 1, and using Sparre-Andersen s theorems (see, for example, Sparre-Andersen(1954), Spitzer(1956) and Feller (1971)), 29

31 we have lim Pr = 0 + = p q K 0, T where the p i and q i may be obtained from the generating functions i=0 p i z i = e 1 z Prχ 2 2 >2c and i=0 q i z i = e 1 z Prχ 2 2 <2c. When K is moderately large, we can approximate lim T Pr = 0 + where q = lim q. Let τ = q 1 q. Then by p q, τ z = (q 1 q ) z 1 = 1 + z q z =0 =0 q z. 30

32 Hence 1 q = lim (1 q ) = lim lim z 1 = lim lim (q 1 q ) z z 1 = lim z 1 τ z = 1 + lim z 1 ( (q 1 q ) z z q z =0 ) q z =0 = 1 lim (1 z) e 1 z Prχ 2 <2c 2 z 1 = 1 lim z 1 elog(1 z)+ 1 z Prχ 2 2 <2c = 1 lim z 1 e 1 z [Prχ 2 2 <2c 1] = 1 e 1 Prχ 2 2 >2c. Thus q = e 1 Prχ 2 2 >2c. Note that when c > 1, q > 0, since by the central limit theorem as, Pr χ 2 2 > 2c χ = Pr 4 > (c 1) Pr Z > (c 1) 1 2π e 1 2 (c 1). Finally, we establish the conditions for strong consistency. From (23), the minimiser of φ f depends on the behaviour of the terms z z + 2f (T ), 31

k = 0, …, K − 1. From Lemma 10, we have, for k = 0, …, K − 1,

lim sup_{T→∞} z′_k z_k / (2 log log T) = 1.

If lim inf_{T→∞} f(T)/log log T > 1, then

lim inf_{T→∞} { −z′_k z_k + 2f(T) } / (2 log log T) > 0,

and consequently k̂ → k_0, almost surely. If, on the other hand, lim inf_{T→∞} f(T)/log log T < 1, then we have

lim inf_{T→∞} { −z′_k z_k + 2f(T) } / (2 log log T) < 0,

and so the event {φ_f(k+1) − φ_f(k) < 0} occurs infinitely often, for each k ≥ k_0. Hence k̂ will not converge almost surely to k_0. The condition

lim inf_{T→∞} f(T)/log log T ≥ 1

is thus necessary for strong consistency.

6 Empirical analysis of the properties of k̂

Let φ_f(k) = NT log σ̂²_k + 2k f(T) be the criterion for selecting the order of the space-time AR process. AIC, BIC and the Hannan & Quinn criterion (HQIC) are obtained by putting f(T) = 2, log T and 2 log(log T) (Quinn (1980)), respectively. In order to compare the performance of the three criteria, the procedures developed in the previous sections were implemented in MATLAB and tested on simulated data. The performance is compared only for relatively small values of T. All the simulations reported were performed using pseudo-normal random numbers.
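As a companion to the simulation study that follows, data from model (1) are straightforward to generate directly. The sketch below is our own Python/NumPy code, not the MATLAB used by the authors, and the 3-site weighting matrix is purely illustrative (the paper's experiments use nine sites):

```python
import numpy as np

def simulate_star(phi, psi, W, T, sigma=1.0, burn=200, seed=0):
    """Generate X(t) + sum_{j=1..k} (phi_j I_N + psi_j W) X(t-j) = eps(t),
    with eps(t) i.i.d. N(0, sigma^2 I_N).  Returns a T x N array
    (row t holds X(t)'), after discarding a burn-in period."""
    rng = np.random.default_rng(seed)
    N = W.shape[0]
    k = len(phi)
    # AR coefficient matrices, moved to the right-hand side
    A = [-(phi[j] * np.eye(N) + psi[j] * W) for j in range(k)]
    X = np.zeros((burn + T, N))
    for t in range(k, burn + T):
        X[t] = sum(A[j] @ X[t - j - 1] for j in range(k)) \
               + sigma * rng.standard_normal(N)
    return X[burn:]

# illustrative 3-site, order-1 example with row-normalised weights
W3 = np.array([[0., .5, .5], [.5, 0., .5], [.5, .5, 0.]])
X = simulate_star([0.3], [0.2], W3, T=500)
```

The chosen coefficients satisfy the stationarity condition (4); for other parameter values that condition should be checked before simulating.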

Data were simulated from a system of nine sites with weighting matrix W (entries not recovered in this transcription), and the errors were pseudo-N(0, 1). Each table below shows the results of 100 replications of each model for varying sample sizes T and different parameters. Tables 1 and 2 show the frequencies of the orders estimated by each criterion for space-time AR processes of order 1. Results for space-time AR processes of order 2 are shown in Tables 3 and 4. From the following tables it can be seen that, for small Φ and Ψ and small T, AIC overestimates the order of the process more often than the other two criteria and BIC underestimates the order more often. HQIC lies between these two criteria, underestimating to a lesser degree than BIC and overestimating to a lesser degree than AIC. On the basis of this we prefer HQIC. Negative values of the parameters do not seem to give different results from positive values. When considering different values of the parameters, the correct order seems to be found more often when the value of Φ is larger than when the value of Ψ is larger. This can be observed in sub-tables 3, 4 and 5 of Table 2. This observation seems to indicate a stronger weight of Φ in the model, and reflects the fact that the influence of Ψ is somewhat weakened by the existence of the weighting matrix W. Although the performance of AIC improves, as T increases, for small Φ and Ψ, this improvement stops after a certain value of T. Also, for larger values of the coefficients, the performance does not appear to improve with T. This is easily explained by the inconsistency of AIC, shown in Theorem 8. Particularly bad performance from all criteria is observed in Table 3 for Φ = Ψ = [ ]. The

[Table 1: sub-tables for Φ = Ψ = [0.1], [0.2], [0.3], [−0.1] and [−0.2]; for each, the frequencies with which AIC, HQIC and BIC selected order 1, 2 or > 2 at sample sizes T = 50, 100, 150, …; cell counts not recovered in this transcription.]

Table 1: Frequency of correct order selection, for simulated data from Space-Time AR models of order 1.

[Table 2: sub-tables for (Φ, Ψ) = ([0.2], [−0.2]), ([−0.2], [0.2]), ([0.3], [−0.1]), ([0.1], [−0.3]) and ([0.1], [0.3]); for each, the frequencies with which AIC, HQIC and BIC selected order 1, 2 or > 2 at sample sizes T = 50, 100, 150, …; cell counts not recovered in this transcription.]

Table 2: Frequency of correct order selection, for simulated data from Space-Time AR models of order 1.

[Table 3: four sub-tables with Φ = Ψ set to two-dimensional coefficient vectors (values not recovered); for each, the frequencies with which AIC, HQIC and BIC selected order 1, 2 or > 2 at sample sizes T = 50, 100, 150, …; cell counts not recovered in this transcription.]

Table 3: Frequency of correct order selection, for simulated data from Space-Time AR models of order 2.

[Table 4: three sub-tables with Φ ≠ Ψ, both two-dimensional (values not recovered); for each, the frequencies with which AIC, HQIC and BIC selected order 1, 2 or > 2 at sample sizes T = 50, 100, 150, …; cell counts not recovered in this transcription.]

Table 4: Frequency of correct order selection, for simulated data from Space-Time AR models of order 2.
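The order-selection criteria compared in Tables 1-4 can be computed directly from the innovation-variance estimates σ̂²_k produced by the recursion. A minimal sketch of that computation follows; the function names are ours, and the σ̂²_k values in the usage note are illustrative, not taken from the paper's simulations:

```python
import numpy as np

def phi_f(sigma2_hat, N, T, f_T):
    """phi_f(k) = N*T*log(sigma2_hat[k]) + 2*k*f(T), for k = 0..K,
    where sigma2_hat[k] is the innovation-variance estimate at order k
    and 2k counts the parameters phi_1..phi_k, psi_1..psi_k."""
    return np.array([N * T * np.log(s) + 2 * k * f_T
                     for k, s in enumerate(sigma2_hat)])

def select_order(sigma2_hat, N, T, f_T):
    """k-hat: the minimiser of phi_f over k = 0..K."""
    return int(np.argmin(phi_f(sigma2_hat, N, T, f_T)))
```

For example, with N = 4 sites and T = 300, AIC corresponds to select_order(s, 4, 300, 2.0), BIC to f_T = np.log(300) and HQIC to f_T = 2 * np.log(np.log(300)); the heavier BIC and HQIC penalties can select a smaller order than AIC from the same variance sequence.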

reason for this is that, for these values, two of the zeros in (4) are 0. (Something wrong here: two zeros appear to be outside the unit circle. Could you do the simulations again with 0.45 instead of 0.5?)

7 Data Analysis: Space-time AR models for carbon monoxide concentrations in Venice

The atmospheric concentration of carbon monoxide (CO) is a central issue in environmental monitoring, because of the direct relationship between excess CO and global warming (the greenhouse effect). The problem is spatio-temporal in nature and, given time series from different monitoring stations, we expect better insight into the dynamics of the process by fitting a spatio-temporal model to the data than by analysing the spatial and temporal features separately. The data consist of 300 hourly carbon monoxide (CO) concentrations (in micrograms per cubic metre) recorded in September 1995 at four different locations in Venice (see Figure 1). The data were considered in Tonellato (2001) and are available online. In Tonellato (2001) a multivariate time series model is fitted to the data using Bayesian methods. Tonellato assumes that the process is isotropic, which may be difficult to justify. With data available at only four locations it is also difficult to model the spatial covariance structure of the system. Hence, we believe that the space-time autoregressive model is suitable for modelling these data.

Figure 1: Original series of hourly carbon monoxide at the 4 stations in Venice.
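The order-selection comparison reported in the tables above can be mimicked in a few lines. The following is a minimal sketch, not the paper's code: it assumes a hypothetical inverse-ring-distance weighting matrix for nine sites (the nine-site matrix actually used in the simulations is not reproduced here), simulates a space-time AR(1) as in (1), fits orders 1, …, 5 by least squares, and applies the three penalties.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical nine-site weighting matrix: inverse ring distances,
# zero diagonal, each row standardised to sum to 1.
N = 9
idx = np.arange(N)
D = np.abs(np.subtract.outer(idx, idx)).astype(float)
D = np.minimum(D, N - D)
W = np.where(D > 0, 1.0 / np.where(D > 0, D, 1.0), 0.0)
W /= W.sum(axis=1, keepdims=True)

# Simulate a space-time AR(1) as in (1): X(t) + (phi I + psi W) X(t-1) = eps(t).
phi, psi, T = 0.3, 0.2, 400
A = phi * np.eye(N) + psi * W
X = np.zeros((T + 50, N))
for t in range(1, T + 50):
    X[t] = -A @ X[t - 1] + rng.standard_normal(N)
X = X[50:]                      # discard burn-in

def sigma2_hat(X, W, k):
    """Least-squares fit of a space-time AR(k); returns the error-variance estimate."""
    T, N = X.shape
    y = X[k:].ravel()
    cols = []
    for j in range(1, k + 1):   # regressors -X(t-j) and -W X(t-j)
        cols.append(-X[k - j:T - j].ravel())
        cols.append(-(X[k - j:T - j] @ W.T).ravel())
    Z = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return np.sum((y - Z @ beta) ** 2) / y.size

# Criterion: N T log(sigma2_hat) + 2 k c_T, with c_T = 2 (AIC),
# 2 log log T (HQIC) and log T (BIC); two parameters per lag.
K = 5
chosen = {}
for name, c in [("AIC", 2.0), ("HQIC", 2.0 * np.log(np.log(T))), ("BIC", np.log(T))]:
    crit = [N * T * np.log(sigma2_hat(X, W, k)) + 2 * k * c for k in range(1, K + 1)]
    chosen[name] = 1 + int(np.argmin(crit))
print(chosen)
```

Because the BIC penalty dominates the HQIC penalty, which in turn dominates the AIC penalty, the selected orders always satisfy BIC ≤ HQIC ≤ AIC; with T = 400 and true order 1, the heavier penalties select order 1 almost every time.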

Table 5: Distances between the sites (in m).

7.1 Analysis

We follow Tonellato (2001) in replacing the missing data with the means of the values at the same site and same hour over the whole sample period. We analyse the logarithms of the data, depicted in Figure 2. (This figure seems wrong. Compare with the one I've attached; in particular, compare the first ten values in the first picture. I'm worried that the analysis might not be correct.) There is evidence that this transformation stabilises the variance, but we believe the data are still non-Gaussian. The distances between the sites (Table 5) are used to construct the weighting matrix W: the weights are taken as the inverses of the corresponding distances between the sites, and the matrix is standardised so that each row sum is 1.

Figure 2: Logarithm of the series of hourly carbon monoxide at the 4 stations in Venice.

Following Tonellato, we subtract the seasonal component present in each time

series individually prior to modelling. Assuming that the length of each cycle is 24 hours and denoting by Y(t) the logarithm of the original observations, we write

Y(t) = S(t) ξ + X(t),   (24)

where

S(t) = [ cos(πt/12) ⋯ cos(6πt/12) sin(πt/12) ⋯ sin(6πt/12) ] ⊗ I_4

and

ξ = ( ξ_{1,1}, …, ξ_{1,6}, …, ξ_{4,1}, …, ξ_{4,6}, ξ'_{1,1}, …, ξ'_{1,6}, …, ξ'_{4,1}, …, ξ'_{4,6} )',

and X(t) is the deseasonalised process in Figure 3, which will be used for fitting a space-time AR model. In practice, X(t) are the residuals obtained after fitting the model (24) by least squares to the transformed data.

Figure 3: Deseasonalised series (using harmonics) of log-hourly carbon monoxide at the 4 stations in Venice.

Using our methodology for order determination and the recursive estimation techniques, we found that the best order for the space-time AR model is 1, and the corresponding estimated parameters are φ̂ = , ψ̂ = , σ̂² = . The graph of the residuals obtained after fitting the space-time AR model (Figure 4) does not present any specific pattern. The residual autocorrelation functions are not significantly different from zero, and the cross-correlations are also very close to zero.
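The two preprocessing steps just described — the inverse-distance weighting matrix and the harmonic deseasonalisation (24) — can be sketched in a few lines. This is an illustration under stated assumptions: the distances are hypothetical (Table 5's values are not reproduced here) and the series is a synthetic stand-in for the log CO data.

```python
import numpy as np

rng = np.random.default_rng(1)

# --- Weighting matrix: inverse distances, zero diagonal, row sums 1. ---
# Hypothetical pairwise distances (in m) between four sites.
dist = np.array([[   0., 2500., 3000., 4000.],
                 [2500.,    0., 1500., 3500.],
                 [3000., 1500.,    0., 2000.],
                 [4000., 3500., 2000.,    0.]])
W = np.zeros_like(dist)
W[dist > 0] = 1.0 / dist[dist > 0]
W /= W.sum(axis=1, keepdims=True)

# --- Harmonic deseasonalisation, as in (24). ---
T, N = 300, 4
t = np.arange(1, T + 1)
# Regressors cos(pi k t / 12), sin(pi k t / 12), k = 1, ..., 6 (24-hour cycle).
S = np.column_stack([f(np.pi * k * t / 12)
                     for k in range(1, 7) for f in (np.cos, np.sin)])

# Illustrative stand-in for the log CO series: a daily cycle plus noise.
Y = 1.5 * np.cos(np.pi * t / 12)[:, None] + 0.3 * rng.standard_normal((T, N))

xi, *_ = np.linalg.lstsq(S, Y, rcond=None)  # least-squares fit, site by site
X = Y - S @ xi                              # deseasonalised residuals X(t)
```

The residuals X are exactly orthogonal to the fitted harmonics; they are what would then be passed to the recursive estimation and order-determination procedure.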

Figure 4: Residuals after fitting a space-time AR(1) model to the data in Figure 3.

Acknowledgment

The work of Ana Mónica C. Antunes was supported by a grant from Fundação para a Ciência e a Tecnologia, Portugal (SFRH/BD/1473/2000), and the author is grateful for the grant.

Appendix A. Application of the law of the iterated logarithm

The following lemmas are needed for the proof of Theorem 8.

LEMMA 9 Let P_k1 denote the first column of P_k, where P_k is defined in Theorem 3. Then, when k ≥ k₀, the true order,

P_k1 = ( T^{-1} Σ_{t=1}^{T} ε_k'(t−k−1) ε(t) ; T^{-1} Σ_{t=1}^{T} u_k'(t−k−1) ε(t) ) {1 + o(1)},

almost surely as T → ∞.

LEMMA 10 Let z_k = T^{1/2} σ^{-1} T_k^{-1/2} P_k1, where T_k^{-1/2} is any matrix whose square is T_k^{-1}. Then

1. For k = 0, …, K, where K is fixed, the z_k are asymptotically independent and normally distributed with zero means and identity covariance matrices.

2. For k ≥ k₀, if α is any 2-dimensional vector,

lim sup_{T→∞} (2 α'α log log T)^{-1/2} α'z_k = 1,

almost surely.

3. For k = 0, …, K, where K is fixed,

lim sup_{T→∞} (2 log log T)^{-1} z_k'z_k = 1,

almost surely.

PROOF of Lemma 9. From (8),

P_k1 = ( γ̂_{k+1} ; π̂_{k+1} ) + ( Γ̂_k  Π̂_k ; Π̂'_k  Λ̂_k ) ( Φ̂(k) ; Ψ̂(k) ),

where the hatted quantities are the sample analogues of γ_{k+1}, π_{k+1}, Γ_k, Π_k, Λ_k and of the coefficient vectors Φ(k), Ψ(k).

When k ≥ k₀, substituting the model equation (1) for X(t + k + 1) in the sample autocovariances gives

( γ̂_{k+1} ; π̂_{k+1} ) + ( Γ̂_k  Π̂_k ; Π̂'_k  Λ̂_k ) ( Φ(k) ; Ψ(k) ) = T^{-1} Σ_{t=1}^{T} ( X'(t) ε(t+k+1) ; X'(t) W' ε(t+k+1) ).

Now, from the proof of Theorem 7, the sample moment matrices converge almost surely to their population counterparts, and

( Φ̂(k) ; Ψ̂(k) ) = ( Φ(k) ; Ψ(k) ) {1 + o(1)},

almost surely.

Thus, collecting terms and using the definitions of ε_k and u_k,

P_k1 = ( T^{-1} Σ_{t=1}^{T} ε_k'(t−k−1) ε(t) ; T^{-1} Σ_{t=1}^{T} u_k'(t−k−1) ε(t) ) {1 + o(1)}.

PROOF of Lemma 10. The asymptotic behaviour of P_k1 is, from the above, the same as that of

( T^{-1} Σ_{t=1}^{T} ε_k'(t−k−1) ε(t) ; T^{-1} Σ_{t=1}^{T} u_k'(t−k−1) ε(t) ).

Now

E[ ( ε_k'(t−k−1) ε(t) ; u_k'(t−k−1) ε(t) ) ( ε'(t) ε_k(t−k−1) , ε'(t) u_k(t−k−1) ) ]
= E{ E[ ( ε_k'(t−k−1) ε(t) ; u_k'(t−k−1) ε(t) ) ( ε'(t) ε_k(t−k−1) , ε'(t) u_k(t−k−1) ) | F_{t−1} ] }
= σ² E[ ( ε_k(t−k−1) , u_k(t−k−1) )' ( ε_k(t−k−1) , u_k(t−k−1) ) ]
= σ² T_k,

by Theorem 4, since both ε_k(t−k−1) and u_k(t−k−1) are constructed from X(t−1), …, X(t−k−1) and are thus fixed given F_{t−1}. Again using Billingsley's martingale central limit theorem (Billingsley (1961)), α'z_k is asymptotically normal with zero mean and variance α'α. In fact, if α_k, k = 0, …, K, are arbitrary 2-dimensional vectors, then the α_k'z_k, k = 0, …, K, are asymptotically independent. To see this, without loss of

generality choose k and l with k > l ≥ 0. Then

E[ ( Σ_{t=1}^{T} ε_k'(t−k−1) ε(t) ; Σ_{t=1}^{T} u_k'(t−k−1) ε(t) ) ( Σ_{t=1}^{T} ε'(t) ε_l(t−l−1) , Σ_{t=1}^{T} ε'(t) u_l(t−l−1) ) ]
= σ² E[ Σ_{t=1}^{T} ε_k'(t−k−1) ε_l(t−l−1) , Σ_{t=1}^{T} ε_k'(t−k−1) u_l(t−l−1) ; Σ_{t=1}^{T} u_k'(t−k−1) ε_l(t−l−1) , Σ_{t=1}^{T} u_k'(t−k−1) u_l(t−l−1) ].

Now, expanding ε_k(t−k−1) and ε_l(t−l−1) in terms of X(t−1), …, using (7), and noting that k − l + i > 0 for i = 0, 1, …, l, every combination of autocovariances that arises cancels, and we obtain

E[ ε_k'(t−k−1) ε_l(t−l−1) ] = 0.

Similarly,

E[ u_k'(t−k−1) u_l(t−l−1) ] = 0   and   E[ ε_k'(t−k−1) u_l(t−l−1) ] = 0,

and

E[ u_k'(t−k−1) ε_l(t−l−1) ] = 0.

The z_k are thus jointly asymptotically normal and independent. Using Stout's law of the iterated logarithm (Stout (1970)), we also have

lim sup_{T→∞} (2 α'α log log T)^{-1/2} α'z_k = 1

almost surely. Using the methods in Hannan (1980), which we do not reproduce here, it is also the case that

lim sup_{T→∞} (2 log log T)^{-1} z_k'z_k = 1.
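As a quick numerical sanity check on the martingale central limit argument above, one can verify that, for white noise ε(t) with σ = 1, the normalised sum of the martingale differences ε'(t−1)ε(t) has unit variance once divided by √(NT). This is an illustrative Monte Carlo sketch, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, reps = 4, 500, 2000

# For white noise eps(t), s_T = sum_t eps'(t-1) eps(t) is a sum of
# uncorrelated martingale differences, each with variance N sigma^4,
# so z = s_T / sqrt(N T) should be approximately standard normal.
z = np.empty(reps)
for r in range(reps):
    eps = rng.standard_normal((T + 1, N))
    s = np.sum(eps[:-1] * eps[1:])      # sum_t eps'(t-1) eps(t)
    z[r] = s / np.sqrt(N * T)

print(round(z.mean(), 3), round(z.var(), 3))
```

With 2000 replications, the sample mean and variance of z should be close to 0 and 1 respectively, consistent with the asymptotic normality used in the proof of Lemma 10.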

References

Billingsley, P. (1961). The Lindeberg-Lévy theorem for martingales. Proc. Amer. Math. Soc. 12.
Durbin, J. (1960). The fitting of time series models. Rev. Inst. Internat. Statist. 28.
Epperson, B. (2000). Spatial and space-time correlations in ecological models. Ecological Modelling 132.
Feller, W. (1971). An Introduction to Probability Theory and its Applications, Vol. 2. New York: Wiley, 2nd ed.
Garrido, R. (2000). Spatial interaction between the truck flows through the Mexico-Texas border. Transportation Research, Part A 34.
Guttorp, P., Meiring, W. & Sampson, P. D. (1994). A space-time analysis of ground level ozone data. Environmetrics 5.
Hannan, E. J. (1980). The estimation of the order of an ARMA process. Ann. Statist. 8.
Hannan, E. J. & Deistler, M. (1988). The Statistical Theory of Linear Systems. New York: John Wiley & Sons, 1st ed.
Haslett, J. & Raftery, A. E. (1989). Space-time modelling with long-memory dependence: assessing Ireland's wind power resource. Appl. Statist. 38.
Mardia, K. V., Goodall, C., Redfern, E. J. & Alonso, F. J. (1998). The kriged Kalman filter. Test 7.
Pfeifer, P. & Deutsch, S. (1980). A three-stage iterative procedure for space-time modeling. Technometrics 22.
Quinn, B. (1980). Order determination for a multivariate autoregression. J. Roy. Statist. Soc. B 42.
Quinn, B. (1988). A note on AIC order determination for multivariate autoregressions. J. Time Ser. Anal. 9.
Shaddick, G. & Wakefield, J. (2002). Modelling daily multivariate pollutant data at multiple sites. Appl. Statist. 51.
Sparre-Andersen, E. (1954). On the fluctuations of sums of random variables II. Math. Scand. 2.
Spitzer, F. (1956). A combinatorial lemma and its applications to probability theory.

Trans. Am. Math. Soc. 82.
Stoffer, D. (1986). Estimation and identification of space-time ARMAX models in the presence of missing data. J. Amer. Statist. Assoc. 81.
Stout, W. (1970). The Hartman-Wintner law of the iterated logarithm for martingales. Ann. Math. Statist. 41.
Subba Rao, T. & Antunes, A. (2004). Spatio-temporal modelling of temperature time series: a comparative study. In Time Series Analysis and Applications to Geophysical Systems, F. Schoenberg, D. R. Brillinger & E. Robinson, eds, vol. 139 of IMA Volumes in Mathematics and its Applications. New York: Springer-Verlag.
Tonellato, S. F. (2001). A multivariate time series model for the analysis and prediction of carbon monoxide atmospheric concentrations. Appl. Statist. 50.
Whittle, P. (1963). On the fitting of multivariate autoregressions, and the approximate canonical factorization of a spectral density matrix. Biometrika 50.
Wikle, C. K. & Cressie, N. (1999). A dimension-reduced approach to space-time Kalman filtering. Biometrika 86.


More information

STAT 443 Final Exam Review. 1 Basic Definitions. 2 Statistical Tests. L A TEXer: W. Kong

STAT 443 Final Exam Review. 1 Basic Definitions. 2 Statistical Tests. L A TEXer: W. Kong STAT 443 Final Exam Review L A TEXer: W Kong 1 Basic Definitions Definition 11 The time series {X t } with E[X 2 t ] < is said to be weakly stationary if: 1 µ X (t) = E[X t ] is independent of t 2 γ X

More information

Some Time-Series Models

Some Time-Series Models Some Time-Series Models Outline 1. Stochastic processes and their properties 2. Stationary processes 3. Some properties of the autocorrelation function 4. Some useful models Purely random processes, random

More information

Single Equation Linear GMM with Serially Correlated Moment Conditions

Single Equation Linear GMM with Serially Correlated Moment Conditions Single Equation Linear GMM with Serially Correlated Moment Conditions Eric Zivot November 2, 2011 Univariate Time Series Let {y t } be an ergodic-stationary time series with E[y t ]=μ and var(y t )

More information

EXTENDED GLRT DETECTORS OF CORRELATION AND SPHERICITY: THE UNDERSAMPLED REGIME. Xavier Mestre 1, Pascal Vallet 2

EXTENDED GLRT DETECTORS OF CORRELATION AND SPHERICITY: THE UNDERSAMPLED REGIME. Xavier Mestre 1, Pascal Vallet 2 EXTENDED GLRT DETECTORS OF CORRELATION AND SPHERICITY: THE UNDERSAMPLED REGIME Xavier Mestre, Pascal Vallet 2 Centre Tecnològic de Telecomunicacions de Catalunya, Castelldefels, Barcelona (Spain) 2 Institut

More information

Figure 29: AR model fit into speech sample ah (top), the residual, and the random sample of the model (bottom).

Figure 29: AR model fit into speech sample ah (top), the residual, and the random sample of the model (bottom). Original 0.4 0.0 0.4 ACF 0.5 0.0 0.5 1.0 0 500 1000 1500 2000 0 50 100 150 200 Residual 0.05 0.05 ACF 0 500 1000 1500 2000 0 50 100 150 200 Generated 0.4 0.0 0.4 ACF 0.5 0.0 0.5 1.0 0 500 1000 1500 2000

More information

Mi-Hwa Ko. t=1 Z t is true. j=0

Mi-Hwa Ko. t=1 Z t is true. j=0 Commun. Korean Math. Soc. 21 (2006), No. 4, pp. 779 786 FUNCTIONAL CENTRAL LIMIT THEOREMS FOR MULTIVARIATE LINEAR PROCESSES GENERATED BY DEPENDENT RANDOM VECTORS Mi-Hwa Ko Abstract. Let X t be an m-dimensional

More information

DATA ASSIMILATION FOR FLOOD FORECASTING

DATA ASSIMILATION FOR FLOOD FORECASTING DATA ASSIMILATION FOR FLOOD FORECASTING Arnold Heemin Delft University of Technology 09/16/14 1 Data assimilation is the incorporation of measurement into a numerical model to improve the model results

More information

Problem Set 2: Box-Jenkins methodology

Problem Set 2: Box-Jenkins methodology Problem Set : Box-Jenkins methodology 1) For an AR1) process we have: γ0) = σ ε 1 φ σ ε γ0) = 1 φ Hence, For a MA1) process, p lim R = φ γ0) = 1 + θ )σ ε σ ε 1 = γ0) 1 + θ Therefore, p lim R = 1 1 1 +

More information

Nonlinear Parameter Estimation for State-Space ARCH Models with Missing Observations

Nonlinear Parameter Estimation for State-Space ARCH Models with Missing Observations Nonlinear Parameter Estimation for State-Space ARCH Models with Missing Observations SEBASTIÁN OSSANDÓN Pontificia Universidad Católica de Valparaíso Instituto de Matemáticas Blanco Viel 596, Cerro Barón,

More information

Long-Run Covariability

Long-Run Covariability Long-Run Covariability Ulrich K. Müller and Mark W. Watson Princeton University October 2016 Motivation Study the long-run covariability/relationship between economic variables great ratios, long-run Phillips

More information

A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED

A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED by W. Robert Reed Department of Economics and Finance University of Canterbury, New Zealand Email: bob.reed@canterbury.ac.nz

More information

FIR Filters for Stationary State Space Signal Models

FIR Filters for Stationary State Space Signal Models Proceedings of the 17th World Congress The International Federation of Automatic Control FIR Filters for Stationary State Space Signal Models Jung Hun Park Wook Hyun Kwon School of Electrical Engineering

More information

Ch. 14 Stationary ARMA Process

Ch. 14 Stationary ARMA Process Ch. 14 Stationary ARMA Process A general linear stochastic model is described that suppose a time series to be generated by a linear aggregation of random shock. For practical representation it is desirable

More information

AR, MA and ARMA models

AR, MA and ARMA models AR, MA and AR by Hedibert Lopes P Based on Tsay s Analysis of Financial Time Series (3rd edition) P 1 Stationarity 2 3 4 5 6 7 P 8 9 10 11 Outline P Linear Time Series Analysis and Its Applications For

More information

10. Time series regression and forecasting

10. Time series regression and forecasting 10. Time series regression and forecasting Key feature of this section: Analysis of data on a single entity observed at multiple points in time (time series data) Typical research questions: What is the

More information

LEAST SQUARES ESTIMATION OF ARCH MODELS WITH MISSING OBSERVATIONS

LEAST SQUARES ESTIMATION OF ARCH MODELS WITH MISSING OBSERVATIONS LEAST SQUARES ESTIMATION OF ARCH MODELS WITH MISSING OBSERVATIONS Pascal Bondon CNRS UMR 8506, Université Paris XI, France. and Natalia Bahamonde Instituto de Estadística, Pontificia Universidad Católica

More information

Time Series Analysis. Asymptotic Results for Spatial ARMA Models

Time Series Analysis. Asymptotic Results for Spatial ARMA Models Communications in Statistics Theory Methods, 35: 67 688, 2006 Copyright Taylor & Francis Group, LLC ISSN: 036-0926 print/532-45x online DOI: 0.080/036092050049893 Time Series Analysis Asymptotic Results

More information

Vector autoregressions, VAR

Vector autoregressions, VAR 1 / 45 Vector autoregressions, VAR Chapter 2 Financial Econometrics Michael Hauser WS17/18 2 / 45 Content Cross-correlations VAR model in standard/reduced form Properties of VAR(1), VAR(p) Structural VAR,

More information

2.5 Forecasting and Impulse Response Functions

2.5 Forecasting and Impulse Response Functions 2.5 Forecasting and Impulse Response Functions Principles of forecasting Forecast based on conditional expectations Suppose we are interested in forecasting the value of y t+1 based on a set of variables

More information

Dimension Reduction. David M. Blei. April 23, 2012

Dimension Reduction. David M. Blei. April 23, 2012 Dimension Reduction David M. Blei April 23, 2012 1 Basic idea Goal: Compute a reduced representation of data from p -dimensional to q-dimensional, where q < p. x 1,...,x p z 1,...,z q (1) We want to do

More information

Regression of Time Series

Regression of Time Series Mahlerʼs Guide to Regression of Time Series CAS Exam S prepared by Howard C. Mahler, FCAS Copyright 2016 by Howard C. Mahler. Study Aid 2016F-S-9Supplement Howard Mahler hmahler@mac.com www.howardmahler.com/teaching

More information

covariance function, 174 probability structure of; Yule-Walker equations, 174 Moving average process, fluctuations, 5-6, 175 probability structure of

covariance function, 174 probability structure of; Yule-Walker equations, 174 Moving average process, fluctuations, 5-6, 175 probability structure of Index* The Statistical Analysis of Time Series by T. W. Anderson Copyright 1971 John Wiley & Sons, Inc. Aliasing, 387-388 Autoregressive {continued) Amplitude, 4, 94 case of first-order, 174 Associated

More information

1. Fundamental concepts

1. Fundamental concepts . Fundamental concepts A time series is a sequence of data points, measured typically at successive times spaced at uniform intervals. Time series are used in such fields as statistics, signal processing

More information

MCMC analysis of classical time series algorithms.

MCMC analysis of classical time series algorithms. MCMC analysis of classical time series algorithms. mbalawata@yahoo.com Lappeenranta University of Technology Lappeenranta, 19.03.2009 Outline Introduction 1 Introduction 2 3 Series generation Box-Jenkins

More information

Exercises - Time series analysis

Exercises - Time series analysis Descriptive analysis of a time series (1) Estimate the trend of the series of gasoline consumption in Spain using a straight line in the period from 1945 to 1995 and generate forecasts for 24 months. Compare

More information

Revisiting linear and non-linear methodologies for time series prediction - application to ESTSP 08 competition data

Revisiting linear and non-linear methodologies for time series prediction - application to ESTSP 08 competition data Revisiting linear and non-linear methodologies for time series - application to ESTSP 08 competition data Madalina Olteanu Universite Paris 1 - SAMOS CES 90 Rue de Tolbiac, 75013 Paris - France Abstract.

More information

First-order random coefficient autoregressive (RCA(1)) model: Joint Whittle estimation and information

First-order random coefficient autoregressive (RCA(1)) model: Joint Whittle estimation and information ACTA ET COMMENTATIONES UNIVERSITATIS TARTUENSIS DE MATHEMATICA Volume 9, Number, June 05 Available online at http://acutm.math.ut.ee First-order random coefficient autoregressive RCA model: Joint Whittle

More information

ECON 4160, Spring term Lecture 12

ECON 4160, Spring term Lecture 12 ECON 4160, Spring term 2013. Lecture 12 Non-stationarity and co-integration 2/2 Ragnar Nymoen Department of Economics 13 Nov 2013 1 / 53 Introduction I So far we have considered: Stationary VAR, with deterministic

More information

Modelling trends in the ocean wave climate for dimensioning of ships

Modelling trends in the ocean wave climate for dimensioning of ships Modelling trends in the ocean wave climate for dimensioning of ships STK1100 lecture, University of Oslo Erik Vanem Motivation and background 2 Ocean waves and maritime safety Ships and other marine structures

More information

Stationary Stochastic Time Series Models

Stationary Stochastic Time Series Models Stationary Stochastic Time Series Models When modeling time series it is useful to regard an observed time series, (x 1,x,..., x n ), as the realisation of a stochastic process. In general a stochastic

More information

EECE Adaptive Control

EECE Adaptive Control EECE 574 - Adaptive Control Recursive Identification Algorithms Guy Dumont Department of Electrical and Computer Engineering University of British Columbia January 2012 Guy Dumont (UBC EECE) EECE 574 -

More information

Prof. Dr. Roland Füss Lecture Series in Applied Econometrics Summer Term Introduction to Time Series Analysis

Prof. Dr. Roland Füss Lecture Series in Applied Econometrics Summer Term Introduction to Time Series Analysis Introduction to Time Series Analysis 1 Contents: I. Basics of Time Series Analysis... 4 I.1 Stationarity... 5 I.2 Autocorrelation Function... 9 I.3 Partial Autocorrelation Function (PACF)... 14 I.4 Transformation

More information

1 Linear Difference Equations

1 Linear Difference Equations ARMA Handout Jialin Yu 1 Linear Difference Equations First order systems Let {ε t } t=1 denote an input sequence and {y t} t=1 sequence generated by denote an output y t = φy t 1 + ε t t = 1, 2,... with

More information

MATH 205C: STATIONARY PHASE LEMMA

MATH 205C: STATIONARY PHASE LEMMA MATH 205C: STATIONARY PHASE LEMMA For ω, consider an integral of the form I(ω) = e iωf(x) u(x) dx, where u Cc (R n ) complex valued, with support in a compact set K, and f C (R n ) real valued. Thus, I(ω)

More information

Kriging models with Gaussian processes - covariance function estimation and impact of spatial sampling

Kriging models with Gaussian processes - covariance function estimation and impact of spatial sampling Kriging models with Gaussian processes - covariance function estimation and impact of spatial sampling François Bachoc former PhD advisor: Josselin Garnier former CEA advisor: Jean-Marc Martinez Department

More information

Note: The primary reference for these notes is Enders (2004). An alternative and more technical treatment can be found in Hamilton (1994).

Note: The primary reference for these notes is Enders (2004). An alternative and more technical treatment can be found in Hamilton (1994). Chapter 4 Analysis of a Single Time Series Note: The primary reference for these notes is Enders (4). An alternative and more technical treatment can be found in Hamilton (994). Most data used in financial

More information