Modeling Long-Range Dependence, Nonlinearity, and Periodic Phenomena in Sea Surface Temperatures Using TSMARS

Size: px

Start display at page:

Download "Modeling Long-Range Dependence, Nonlinearity, and Periodic Phenomena in Sea Surface Temperatures Using TSMARS"

Walter Rich
5 years ago
Views:

1 Modeling Long-Range Dependence, Nonlinearity, and Periodic Phenomena in Sea Surface Temperatures Using TSMARS Peter A. W. Lewis; Bonnie K. Ray Journal of the American Statistical Association, Vol. 92, No (Sep., 1997), pp Stable URL: Journal of the American Statistical Association is currently published by American Statistical Association. Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in the JSTOR archive only for your personal, non-commercial use. Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission. JSTOR is an independent not-for-profit organization dedicated to creating and preserving a digital archive of scholarly journals. For more information regarding JSTOR, please contact support@jstor.org. Sun Aug 6 12:

2 Modeling Long-Range Dependence, Nonlinearity, and Periodic Phenomena in Sea Surface Temperatures Using TSMARS Peter A. W. LEWIS and Bonnie K. RAY We analyze a time series of 20 years of daily sea surface temperatures measured off the California coast. The temperatures exhibit quite complicated features, such as effects on many different time scales, nonlinear effects, and long-range dependence. We show how a time series version of the multivariate adaptive regression splines (MARS) algorithm, TSMARS, can be used to obtain univariate adaptive spline threshold autoregressive models that capture many of the physical characteristics of the temperatures and are useful for short- and long-term prediction. We also discuss practical modeling issues, such as handling cycles, long-range dependence, and concurrent predictor time series using TSMARS. Models for the temperatures are evaluated using out-of-sample forecast comparisons, residual diagnostics, model skeletons, and sample functions of simulated series. We show that a categorical seasonal indicator variable can be used to model nonlinear structure in the data that is changing with time of year, but find that none of the models captures all of the cycles apparent in the data. KEY WORDS: Adaptive spline threshold autoregressive models; Forecasting; Long memory; Multivariate adaptive regression splines; Nonlinear time series; Threshold autoregression. 1. INTRODUCTION AND DATA DESCRIPTION Physical systems, such as sea surface temperatures (SSTs), cloud cover, wind velocities, wind speeds, and water salinity, exhibit complex and interconnected behaviors that are difficult to capture using standard time series models. For example, Figure 1 shows the natural logarithm of daily SSTs (in "C) at Granite Canyon, on the coast of California approximately 30 km south of Monterey, over the period March 1, November 9, Oceanographers are interested in modeling SSTs to understand what drives changes in temperatures and to obtain accurate predictions of SSTs. Short-term predictions (approximately 1-20 days) are used in large-scale weather models (Haltiner and Williams 1980), whereas long-term predictions (2-3 years or more) are used to explore issues such as global warming and El Nino effects. Our goals are to model SSTs as a function of past SST values and other climatic variables and to develop good short- and long-term predictions. The series of logged SSTs has cycles on several different time scales. The cyclic time-of-year effect is the dominant feature in the data. Temperature drops every spring, corresponding to the coastal upwelling, although the drop does not occur at precisely the same time every year. Following the spring transition, the temperatures remain low and rise after the summer. It is well known (Hidaka 1954; see also Breaker and Lewis 1988) that this effect is due mainly to the wind direction. There is also a longer cyclic effect Peter A. W. Lewis is Distinguished Professor, Department of Operations Research, Naval Postgraduate School, Monterey, CA Bonnie K. Ray is Associate Professor, Department of Mathematics and Center for Applied Math and Statistics, New Jersey Institute of Technology, Newark, NJ The research of Bonnie Ray was supported in part by a separately budgeted research grant from the New Jersey Institute of Technology and National Science Foundation grant DMS The authors thank Larry Breaker for his input concerning the substantive issues of interest to oceanographers. They also thank the editor, associate editor, and the referees for comments that helped to substantially improve the presentation. due to the El Nino phenomenon. This phenomenon is reflected in the higher temperatures seen in Figure 1 in 1972, 1976, 1983, and 1987; it has a quasi-period of 4-5 years. An 18.6-year cycle in the SSTs has also been investigated (Royer 1991). A sinusoid with period 1 year fit to the first 20 years of the data using least squares approximates the yearly effect (see Fig. 1). The futility of trying to capture the cyclic effects completely with a global deterministic term can be seen from Figure la; the later data (March 1, 1991 and beyond) shows a massive and broad El Nifio warming in progress, which extended well into Therefore, fitting a 1-year cycle to the entire series would produce a sinusoid with a higher level than the sinusoid fitted using only the first 20 years of data, giving a worse fit to the data in the earlier years than that shown. The SST data also exhibit long-range dependence, which shows up as very long cycles of different lengths. The clear downward slope of the log periodogram relative to the log frequency shown in Figure 2 is characteristic of long-range dependent processes (Mandelbrot and Van Ness 1968). Because the logged periodogram ordinates are distributed as log exponential random variables, which are strongly negatively skewed, a robust straight line fit is shown. A similar spectral behavior has been seen in 60 years of daily SSTs recorded at Pacific Grove, CA, as well as other sets of SSTs (Mendelson 1990 and personal communication). A more subtle cyclic pattern is an approximately 47-day oscillation (Breaker and Lewis 1988). Previous attempts to model this oscillation by complex demodulation, for example, have proved fruitless, but it is believed that this oscillation may be related to the effect of the wind on SSTs (Hidaka 1954). We examine this nonlinear hypothesis in more detail in Section 1997 American Statistical Association Journal of the American Statistical Association September 1997, Vol. 92, No. 439, Applications and Case Studies

3 Journal of the American Statistical Association, September Years of Daily Sea Surface Temps at Granite Canyon.. :: Year Xr.. (a) 7 0 ' I I 1 I I I I Time in Days from 3/1/1971 (b) Figure 1. (a) The Logarithm of the SSTs From March 1, 1971-November 9, A fitted sinusoid with a I-year period is also shown for this period. (b) the residuals from this fit. It has been suggested that a linear trend component be included in a model for SSTs to describe the apparent longterm increase in temperatures. In fact, investigation of the SST data began in the early 1980s when oceanographers at the Naval Postgraduate School inquired about the existence Initial part of periodogram of daily sea surface temperatures at Granite Canyon years Log of frequency in cycles per year Figure 2. Plot of the Logarithm of the Periodogram of 20 Years of Logged SSTs at Granite Canyon Versus the Logarithm of an Initial Set of Fourier Frequencies. The circles depict logarithms of each periodograrn ordinate. The x's are logarithms of averages of adjacent periodograrn points; the straight line is a fit to the x points. of a long-term trend. Changes observed in flora and fauna in the region suggested that an increase in temperatures had occurred; oceanographers wanted to know whether such a trend was likely to continue. Fitting a linear trend plus 1-year cycle to the first 20 years of logged data gives a trend coefficient x 1OP' with estimated standard error,0588 x lop5 (neglecting the effect of the dependence structure of Xt), which translates into a roughly.43% yearly increase in SST. Studies of global temperatures (see, e.g., Smith 1993) give conflicting results about long-term trends, especially when long-range dependence is taken into consideration. Additionally, although there may be a very longterm cycle in the data, continued warming at a rate predicted by a straight line fit is unlikely. Thus a linear trend model is not satisfactory to oceanographers as a descriptive or predictive device. Because global warming is debatable, other models for the SSTs that do not assume a linear trend, such as spline-type functions, are preferable. We analyze the daily SSTs at Granite Canyon using both a linear long-range dependent model and nonlinear threshold-type autoregressive (AR) models obtained using TSMARS, a modification of the multivariate adaptive regression splines (MARS) algorithm (Friedman 199 1, 1993a, 1993b) appropriate for time series. The period March 1, 1971-February 28, 1991 is used for model estimation; the remainder of the data up to November 9, 1993 (620 days) are used for model validation. We also investigate models that include covariate series of wind velocities and wind directions at Granite Canyon and models that include a co-

Lewis and Ray: Time Series Multivariate Adaptive Regression Splines 883 variate categorical time series representing seasons of the year. The article is organized as follows.

4 Lewis and Ray: Time Series Multivariate Adaptive Regression Splines 883 variate categorical time series representing seasons of the year. The article is organized as follows. Section 2 describes TSMARS and discusses some computational issues. Section 3.1 discusses methods of handling the yearly cyclic effects in the data when using TSMARS. Sections 3.2 and 3.3 analyze the extent of long-range dependence in the SSTs and model it using both a linear long-memory model and an approximating nonlinear AR model with lags of up to 5 years. Section 3.4 evaluates the fitted models using out-of-sample forecast root-mean squared errors (RMSEs), residual diagnostics, model skeletons, and stability of simulated series. Section 4 illustrates how wind speeds, wind directions, and seasonal effects can be incorporated into our model using TSMARS. Section 5 summarizes our findings and gives directions for future research. Previous attempts at modeling SSTs using TSMARS have been reported by Stevens (1991), Lewis, Ray, and Stevens (1993), and Lewis and Ray (1993). Unlike that previous work, here we explore the inclusion of a categorical seasonal predictor variable that affects the entire nonlinear structure of the SSTs rather than only their mean value. We find that explicitly subtracting a fitted sinusoidal cycle from the data before using TSMARS gives the best predictive models. We also incorporate lagged values of up to 5 years in the nonlinear models in an attempt to capture long-range dependent behavior. The models of Lewis and Ray (1993) included lags only up to 365 and 50 days, which were not sufficient to account for long-range dependence or the El Nino warming. 2. USING TSMARS FOR NONLINEAR TIME SERIES MODELING 2.1 Threshold Autoregressive Models Threshold models, or models with partition points, are nonlinear time series models that emerge naturally when different models for the response variable {Xt) hold over different regions of the predictor space. Tong (1983, 1990) introduced threshold autoregression (TAR) models, which characterize this structure using different linear AR models in disjoint subregions of the predictor space. TAR models are powerful and flexible, but their identification is not systematic. Additionally, they can contain subregion AR models that are discontinuous at subregion boundaries. Systematic implementation of the TAR modeling methodology based on Friedman's (1991) multivariate adaptive regression splines (MARS) algorithm was introduced by Lewis and Stevens (1991). For a time series of re- sponses {Xt) with predictors {Xt-l}, {XtP2).... > {XtPp), the MARS algorithm can be used to obtain continuous nonlinear threshold-type AR models with interactions between predictors. We call the MARS algorithm modified for time series TSMARS, and we call the resulting models adaptive spline threshold autoregressive (ASTAR) models. (The form and analysis of univariate ASTAR models have been discussed in detail in Lewis and Stevens 1991.) TSMARS provides a systematic procedure for fit- ting nonlinear threshold models that are continuous in the domain of the predictor variables, allow interactions among lagged predictor variables, and have multiple lagged predictor variable thresholds, thus overcoming the limitations of Tong's approach and admitting more general nonlinear threshold models. Models containing covariate predictor series are called semi-multivariate ASTAR (SMASTAR) models (Lewis and Stevens, unpublished manuscript). For t = , n, let {x) denote the input time series and let {Xt) denote the output time series. A simple SMA- STAR model for {Xt) with threshold value T, on Xt-dl and T, on and an interaction between {Xt-,l,) and {x-d~) is where c is an intercept and (x - a)+ denotes a truncated spline function with (x - a)+ = (x - a) if x > a and 0 otherwise. The lags dl and d2 are not necessarily equal. Also, Yt, the current value of the covariate time series, may or may not be included in (1). The SMASTAR model can have a physical interpretation in the sense that the effects of lagged variables, covariates, and threshold levels on the response variable can be seen from the form of the fitted model. This is not always true of other nonlinear models, such as neural networks. 2.2 The TSMARS Algorithm TSMARS is a computer-intensive methodology that uses an exhaustive search algorithm to evaluate a very large number of potential models for the data, selects one of them, and estimates parameters for that model. The exact form of the selected model for Xt depends on the nature of the data, as well as on the parameters that control the implementation of the TSMARS algorithm, as specified by the user. A SPAN parameter is used to specify the minimum number of observations allowed between threshold placements for each predictor variable, thus acting to control the level of fine structure captured by the model. The SPAN parameter may be specified a priori or selected to minimize a model selection criterion, such as out-of-sample RMSE. The maximum number of linear truncated spline functions allowed in an interaction term is specified by the maximum interaction (MI) parameter. For ease of interpretation, MI is usually set to 2 or 3. An MI value of 1 precludes interactions between predictors. The maximum number of basis functions, MB, allowed in the forward step recursive partitioning algorithm can also be specified by the user. Additionally, a SPEED factor can be set to control the exhaustiveness of the search algorithm, with 1 indicating complete exhaustive search of the predictor space and 5 indicating a modified, less thorough, search. We use SPEED = 1 to obtain all models discussed in this article. Different depths of search do not seem to affect the accuracy of the approximating models in the regression framework (see Friedman 1993b), but may affect out-of-sample predictive accuracy.

884 Journal of the American Statistical Association, September 1997 Given the large number of algorithm parameters and the difficulty of predicting even simple nonlinear models, we have found it best

5 884 Journal of the American Statistical Association, September 1997 Given the large number of algorithm parameters and the difficulty of predicting even simple nonlinear models, we have found it best to allow TSMARS as much flexibility in fitted models as is possible in the available computing environment. To evaluate model fit, TSMARS uses the residual sum of squared errors in the forward steps of the algorithm. A final model is selected by backward trimming using one of four selection criteria that penalize for overparameterization: the generalized cross-validation (GCV) criterion, the Akaike information criterion (AIC), Amemyia's prediction criterion, or the Schwarz-Rissanen criterion (SC). When using MARS in a time series setting, the SC (Schwarz 1978) was found by Stevens (1991) and Lewis and Ray (1993) to give more parsimonious models than the GCV criterion used by Friedman. This result is consistent with the behavior of the SC when selecting among linear ARMA(p, q) models. The SC is used to select all models investigated in this article. Of course, common sense and physical interpretation should dictate the form of the final model. Estimating the precision of TSMARS parameter estimates is difficult, because a TSMARS-selected model rarely corresponds to the true model. Estimates of precision that are not conditional on the fitted model have not been fully addressed even for simple cases; for example, estimating a linear AR(p) model by first selecting p and then estimating the coefficients (see Chatfield 1995 for a discussion). For TAR models, Tong (1990, thms. 5.7 and 5.8) showed that assuming that the delay variable, threshold values, and orders of the subregion autoregressions are known, the least squares estimates of the AR coefficients are asymptotically normally distributed, with standard errors equal to those given by linear regression theory under certain conditions. The estimated thresholds do not have a limiting normal distribution, however. Tong obtained standard errors for estimated threshold values in a TAR model using a bootstrap method. For the more complex ASTAR models, bootstrapping is often impractical; a single model may have several predictor variables, with each variable having several threshold points. Even if bootstrapping is feasible, models for the bootstrapped data are not guaranteed to contain the same terms as the original model. Consequently, we give only regression standard errors for the estimated coefficients conditioned on the selected ASTAR and SMASTAR models, although these provide only a rough lower bound on precision. 3. MODELING SSTs USING ASTAR AND ARFIMA MODELS It is well known that SSTs exhibit high, positive-valued autocorrelations at lags between 1 and approximately 40, and that the autocorrelation function (ACF) is not well modeled by the usual linear ARMA(p, q) models (Breaker, Lewis, and Orav 1983). In this section we model possible long-range dependence with a linear fractionally integrated ARMA (ARFIMA) model and with approximating ASTAR models obtained using TSMARS. 3.1 Handling Cyclic Effects Using TSMARS The dominant effect in the SSTs (see Fig. 1) is the 1-year cycle. This cycle is a fixed effect. That it is a true cycle can be shown by computing the amplitude of the yearly spectral component for the first year, the next 2 years, the next 3 years, and so on. A regression analysis shows that the amplitude increases linearly in n, the number of days. We consider three ways of handling this cyclic effect: 1. Use the general form of a univariate ASTAR model with an extensive range of lagged XtVj's as predictors and allow thresholds and interactions between lagged values. This can generate cyclic processes and limit cycles (see Lewis and Stevens 1991). Of course even a linear, firstorder AR equation with parameter close to 1 generates long "cycles," but the cycles vary in length. 2. Subtract a fitted sinusoidal function St = /A + cy cos(2rt/365) + psin(2rt/365) from Xt and let X,' = Xt - St. TSMARS is then used to model the transformed process, X,', which is assumed to be stationary. Although our results in Section indicate that this method gives possibly the best predictive model for the Granite Canyon SSTs, it is awkward and restrictive as an explanatory model from the viewpoint of oceanographers. To see this, consider X,' = Xt - St and transform it back to the variable Xt. Then the threshold on Xt will be cyclic! In practical terms, this implies that the nonlinearity occurs when the variable gets higher than a "time-of-year level" rather than when the variable reaches an absolute level. The nonlinear dynamical structure is not affected by the linear filter X,' = Xt - St (Broomhead, Huke, and Muldoon 1992); however, it is not clear whether inference is affected by substituting residuals from the cyclic fit in the TSMARS algorithm. Some initial results regarding this issue are discussed later in this section. 3. Use both lagged values of the SSTs and the 1-year cosine and sine terms as covariate predictor time series. This is a more general model than that described in method 1. However, for the SST data, the sine and cosine terms seldom came into the final model, even with 20 years of data, and the resulting predictions were extremely poor. In fact, TSMARS seems very poor at detecting a fixed cyclic effect with a long period. As a simple test, we generated series from a Gaussian AR process of order one with an added sinusoid and used TSMARS with a sinusoidal covariate and one lagged predictor to obtain strictly linear models for the series. Other TSMARS parameters were set commensurate with those used to model the Granite Canyon data. In 50 replications, the resulting TSMARS model only once included a sinusoid. The estimates of the AR coefficient, q5, were better than those obtained by the usual device of fitting a cycle by least squares, subtracting the cycle, and estimating q5 from the residuals, however. Consequently, for the univariate TSMARS models discussed in the following sections, a yearly cycle was removed from the logged data prior to modeling and predicting. As noted earlier, this may not be the best procedure from an explanatory viewpoint, but it does give the best predictors, as we discuss in Section The same variance stabilizing and cyclic detrending was applied before fitting a linear long-range dependent model

6 Lewis and Ray: Time Series Multivariate Adaptive Regression Splines 885 to the data, as discussed in Section 3.2. Because the longmemory model is linear, however, the (linear) detrending does not affect the interpretability. 3.2 Handling Long-Memory Using ARFIMA Models The existence of long-range dependence in physical systems in the form of very long cycles and apparently shifting mean levels has been the subject of much study in the last 20 years (see, e.g., Haslett and Raftery 1989, Hosking 1984, Mendelson 1990, and McLeod and Hipel 1978). By long-range dependence, we mean that the spectrum of the process at frequencies w close to 0 is proportional to up", 0 < a < 1 and is unbounded at zero frequency. Existence of such behavior might help to explain the anomalous effects of apparent global warming (Smith 1993). Figure 2 shows that the Granite Canyon SSTs have a sample spectrum that takes very large values at small frequencies, which is characteristic of series having long-range dependence. Several authors have proposed linear time series models that describe the slowly decaying structure of such series (Hosking 1981; Jonas 1983; Mandelbrot and Van Ness 1968). For example, the ARFIMA model of Granger and Joyeux (1980) and Hosking (1981), a generalization of the Box-Jenkins ARIMA models in which the degree of integration, d, is allowed to take fractional values, has long memory. The fractional integration operation (1- B ) ~ is defined by the binomial expansion (1 - B )~ = CZo($)(-B)" where B denotes the backward shift operator. The exponent d is related to the long-range dependence parameter a by a = 1-2d. We use an ARFIMA model to describe the low-frequency components of the 20 years of daily SSTs at Granite Canyon after first removing the 1-year cycle. Removing the cycle does not change the characteristics of the residual data (Yajima 1988). Because of the large sample size, we estimate the model in two steps. The long-memory parameter, d, is estimated using the periodogram spectral regression method of Geweke and Porter-Hudak (GPH; 1983), with number of regressors equal to [&I = 85. For the 20 years of Granite Canyon SSTs, we obtain d =.37, with standard error.076. This is consistent with the slope of the estimated log-spectral density for the SSTs shown in Figure 2. Hurvich, Deo, and Brodsky (1996) showed that the GPH estimate of d is consistent assuming that the process is Gaussian. A normal probability plot for the logarithm of the SSTs indicated that a Gaussian assumption is plausible. The estimate of d is used to approximate the fractional "difference" of the SST data, (1- B)'x~, using the infinite AR representation of the fractional differencing operator truncated at lag 500. The sample ACF and partial autocorrelation function (PACF) of the resulting fractionally differenced data indicate that the series contains shortmemory ARMA components. In fact, the fractionally differenced data exhibit behavior consistent with that of an AR(1) process, with estimated coefficient 4 =,606 (standard error.009). Thus there is strong correlation between SSTs from one day to the next and small, but nonnegligi- ble, correlation between SSTs many days apart. The result of the two-step procedure is an ARFIMA(1, d. 0) model for the logged and detrended SSTs. The ACF of the residuals from the fitted ARFIMA(1: d: 0) model (Fig. 3a) shows little correlation. However, given the nonlinear nature of the data, it seems highly unlikely that the linear ARFIMA model describes the data adequately. Examining the ACF for the squared residuals, as suggested by Granger and Anderson (1978), shows that the squared residuals retain some correlation (Fig. 3b). The correlation of squared residuals may indicate nonlinearity in the data, possibly in the direction of bilinearity, or a more general model misspecification, perhaps reflecting omission of a co- variate regressor. A more general model is discussed in Section Handling Long-Memory Using TSMARS An invertible long-memory ARFIMA(p. d, q) process has an infinite AR representation with AR that decay at w j-d-l as j + O. Thus it is reasonable to try to approximate long-range effects using an AR model with many lags. For a linear fractional noise process (an ARFIMA(0. d. 0) process), Ray (1993) found that an approximating AR(p) model accurately forecasted a fractional noise process at long lead times. Thus we attempt to approximate long-range dependence in ASTAR models by in- cluding many lagged predictor variables. We do not require the AR coefficients to decay as earlier; if the process has long-range dependence, then fitting a long AR model to the data should result in significant AR coefficients at long lags of approximately the correct form. We use TSMARS to model 20 years of the Granite Canyon data (with the 1-year cycle removed) using all lagged SSTs up to lag 100, and every fifth lag thereafter up to lag 1,925, representing lags up to approximately 5 years and 3 months. (Using all lags to 1,925 is computationally infeasible; the limited grid here represents a tenfold increase over the computation times in Lewis and Ray 1993.) We allow the SPAN and MI parameters to vary, to investigate the effect of these parameters on the selected model. Let the log-transformed and detrended process be where St is the SST on day t and the values in parentheses are the standard errors of the estimated coefficients (neglecting the effect of the dependence structure of xt). The sample variance of Xt is 62 = The differences that result from prohibiting as opposed to allowing interactions between predictors in the TSMARS algorithm can be seen from examining the following two models. Using SPAN = 1 and no allowable interactions (MI = 1) when applying the TSMARS algorithm to Xt, the resulting model is as follows:

7 ARFIMA Model Residuals Journal of the American Statistical Association, September 1997 Figure 3. (a) The Sample ACF for the Residuals From an ARFlMA(1, d, 0) Model Fitted to Granite Canyon SSTs. (b) The Sample ACF for the Squared Residuals. The dotted lines indicate upper and lower bounds on the estimated ACF assuming the residuals are white noise. Threshold TSMARS model for 20 years of SSTs at Gran- Granite Canyon SPAN 1; order 3 interactions allowed (MI ite Canyon SPAN 1; no interactions allowed (MI = 1): = 3): with 6; = var(xt - = The threshold in the lag-one term in the model indicates that SST is strongly correlated with the previous day's SST and that the dependence structure is nonlinear, with temperature increasing with coefficient if the temperature the previous day was more than -.I16 and increasing with coefficient.838 if it was less than Given that the minimum value of Xt over the 20 years is -.340, we see that lags 2 and 17 enter the model linearly (no interior threshold). Terms with lags of 8 and 17 probably reflect the fact that the average time between storm fronts in the vicinity of Granite Canyon in the winter is about 8 days. The model does not contain terms with lags greater than 17, as would be expected for a long-range dependent process. This suggests that the long-memory behavior may only be observed in conjunction with other variables; that is, interactively. When three interaction levels are allowed (MI = 3), the resulting model for Xt has the following form: Nonlinear long-memory model for 20 years of SSTs at with 8: = var(xt - x ~)~ = The highly nonlinear first-order terms are similar to those of the previous model, but lags 8 and 17, as well as higher-order terms, now enter as interaction terms. (The previous model did not allow interactions between variables.) The existence of long lag terms in the model (up to 1,425, or slightly less than 4 years) is consistent with long-range dependence in SSTs, although the variables Xt-435 or Xt-1415, for example, may not be physically interpretable. In fact, monthly averages corresponding to 14 and 48 months previous may provide more stable and interpretable models in the final analysis. We do not explore that avenue here, but note that TSMARS currently allows

8 Lewis and Ray: Time Series Multivariate Adaptive Regression Splines K-STEP-AHEAD FORECAST COMPARISONS - W T VALUE... MCAN VALUE _ _ R t -... ARFlHA(l.d,O) THRESHOLD. SPAN 1 IMERACTIM: SPAN 25 INTERACTIM: SPAN 1 Figure 4. The RMSEs of k-step-ahead Forecasts for the Granite Canyon SSTs. The forecast period begins on March 1, 1991, and extends for 620 days. aggregated variables as predictors. The last three-way interaction term is nonzero if XtP2 > (which always occurs), Xt-435 >,071 (which occurs about half of the time), and Xt-1415 < (which rarely occurs). If the interaction is nonzero, then the contribution to xt is large and negative. Consequently, the following xt values take very large positive values, quickly causing the model to become unstable (see Sec ). With SPAN = 25 and allowing up to three-way interactions, the resulting model for Xt has the following form: Nonlinear long-memory model for 20 years of SSTs at Granite Canyon SPAN 25; order 3 interactions allowed (MI = 3): with 32 = var(xt - x ~)~ =, The first-order terms are similar to those of the previous models, but there are now no level-2 interactions. SPAN = 50 gives a similar model, but contains only three terms of interaction level 3. As would be expected, using larger SPAN values results in a "smoother" model. The model of (5) has no three-way interactions with very large coefficients, as in the model of (4), and the resulting process appears to be stable. Equations (4) and (5) are approximations to a model with long-range dependence. We do not know of a specific ASTAR model that is also truly long-range dependent, as opposed to an approximating model. Other types of nonlinear models having long-range dependence, such as fractionally integrated models with ARCH errors or random solutions to the Burger's' equation, have been discussed by Robinson (1994), Rosenblatt (1987), and Taqqu (1987). In the following sections we compare the estimated linear ARFIMA model and the ASTAR models on the basis of their forecasts, residuals, and model skeletons. 3.4 Comparison of Linear and Nonlinear Models Out-of-Sample Forecasts. Predicted SSTs are used as input to large-scale weather models and for analysis of long-term environmental trends, so the predictive capability of the models at both short and long forecast horizons is important. Figure 4 shows the RMSEs of predicted k-step-ahead forecasts beginning on March 1, 1991, for the linear and nonlinear models discussed in Sections 3.2 and 3.3. The RMSEs are calculated in the original scale of the data and are shown in "C. We also include the RMSE obtained if the last observed value in the time series is used for prediction. The last observed value is a reasonable predictor

888 Journal of the American Statistical Association, September 1997 for highly correlated data and is typically the standard by which physicists judge forecast ability.

9 888 Journal of the American Statistical Association, September 1997 for highly correlated data and is typically the standard by which physicists judge forecast ability. The RMSE of the naive forecast (i.e., the sample mean) is also shown. The mean is the typical basis of forecast comparison for statisticians. The RMSEs were obtained using a moving forecast origin, forecasting k-steps ahead, k = 1,...,620, and taking the square root of the average squared difference between forecast and actual value. We see that the RMSEs do not increase monotonically in k, probably because the data have nonlinear structure (Casdagli 1992). Accuracy of nonlinear forecasts depends on the region over which the forecasts are made. We extend our forecasts out to 620 days to include more than one year in the forecast region. All models (except the sample mean) are competitive for small forecast steps (i.e., k = 1,...,20). This is not surprising, given the high correlation between adjacent SSTs. However, there are differences at larger forecast steps. For example, at k = 365, the RMSEs of the forecasts range from about 1.35"C using (4) to about 1.74"C using the naive mean forecast. Thus we have a forecast improvement using the TSMARS model of about 22%. The TSMARS model with SPAN = 1 and MI = 3, the model of (4), appears to be the best predictive ASTAR model. The SPAN 25 model is competitive at steps > 200, differing from the SPAN 1 model by at most about 5%. The linear ARFIMA model performs worse than any of the ASTAR models. The RM- SEs of the ASTAR model forecasts shown in Figure 4 are larger than the measurement errors for the SSTs, which are known to be approximately *0.l0C, but are smaller than the RMSEs for the forecasts based on the last observed value, the method currently used by oceanographers (Haltiner and Williams 1980). Of course, RMSE measurements are subject to sampling variability, which has not been investigated here. Estimates of the precision of the RMSEs for the models discussed could be obtained by considering the variability of the RMSEs over different segments of the data. From Figure 1 there appears to be a change (at least in overall level) of the SSTs at about the time the forecast period begins (February 28, 1991). There is no known reason for a permanent change in the structure of the SST process around late 1990 or early 1991, but there is evidence of a massive El Niiio warming in progress. Alternatively, the apparent changes in behavior around this period could be attributed to the long-range dependence property of the SST data. What is the effect of the change in SSTs on the forecasts? The forecasts may be less accurate than they could be, because the forecasting model is based on the behavior of the SST process up until February 28, 1991, and there does not appear to have been a comparably broad warming episode in the SST process before that time. Model updating (i.e., reestimation of the model using - additional data including w the El Nifio episode) can be used to investigate changes in process behavior, but this is not computationally feasible Residual Diagnostics. Figure 5 shows the ACF for the residuals and squared residuals from the TSMARS model of Eq. (4) fitted to the 20 years of SSTs. The dotted lines indicate upper and lower bounds on the estimated ACF assuming that the residuals are white noise. The residuals are uncorrelated, but the ACF for the squared residuals indicates some remaining correlation, although less than that observed for the squared residuals from the ARFIMA model. The ACFs of Figure 5 are similar to those of Figure 3 for the fitted ARFIMA(1, d, 0) process, and both suggest the kind of pattern seen in ARCH processes (see Tong 1991, p. 115); that is, processes having variances that are a function of lagged values of the squared process. The linear ARFIMA model and the TSMARS models of Sections 3.2 and 3.3 assume that the dominant-year cycle manifests itself both in the mean and the standard deviation of the process. No fitted ASTAR model showed a limit cycle of 365 days, and the cyclic effect on the mean value was removed (approximately) with an additive sinusoidal component for the logarithm of the data. We can also ask whether the structure of the process is changing with time of year. (Note that in an ASTAR model, the structure changes with the level of the process; no cyclic time thresholds have been postulated.) Figure 6 shows the mean, the standard deviation, and the lag 1 serial correlation, pl, for each of the 365 days in the year. The plots are formed by averaging across the 20 years of logged data and then smoothing along the time axis. The mean value of the logged data (Fig. 6a) is time-of-year dependent; the standard deviation (Fig. 6b) is not. The value of pl (Fig. 6c) is large and changes with time of year, but not always in the same way that the mean changes. Given the multiplier of Xt-l in Equation (5) when XtPl > -.116, the size of pl is not surprising. But pl does not remain high whenever XtPl is high. To cope with this nonstationarity in the data, a TSMARS model that introduces time of year as a categorical predictor variable is considered in Section Model Skeletons and Simulated Periodograms. The right side of an equation such as (4) is called the model skeleton. If it is used with the first 1,415 observed Xt's to generate x ~ and ~ subsequently, ~ ~, x1416 and X1415,..., X2 are used to generate x ~ and ~ so ~ on, then ~ the, resulting output sequence is a trajectory of the skeleton of the estimated ASTAR model. If the initial 1,415 values in the SSTs eventually reappear in this sequence, then the model is said to have a limit cycle. Although Lewis and Stevens (1991) found such a limit cycle for the Wolfer sunspot series using TSMARS, no limit cycles were found for any of the models for SSTs discussed in this article. If trajectories are generated as just described, but with white noise having variance 6: added, (i.e., xi416 is now for the SST data. But because we are using an adaptive ~1416+~1416, etc.), then we have a simulated sample path of spline-type model, an updated model would accommodate the ASTAR model. In generating trajectories having differexperience in the SST process of a broader kind than that ent lengths and/or initial values with and without noise, we modeled in, for example, (5). may find empirically that the process is unstable, meaning

10 Lewis and Ray: Time Series Multivariate Adaptive Regression Splines Best TSMARS Model Residuals Figure 5. (a) The Sample ACF for the Residuals From Equation (4), the Fitted Interactive TSMARS Model With SPAN I, for the Granite Canyon SSTs. (b) The Sample ACF for the Squared Residuals. that /Xt/ tends to infinity with t (see Tong 1990, p. 7 the last three-way interaction term is so large that it uland sec. 4.1). Thus in generating data from the SPAN timately builds up an unstable oscillation with period 2, 1 model of (4) with added noise, we found that the fit- 435, or 1,415 days. Clearly, the interaction is complex. Alted model was in fact stochastically unstable. One can see though examination of simulated data from (4) suggests the the cause of this in (4). The multiplier of a stochastically unstable model, the process does appear LOG SEA SURFACE TEMPERATURES March I to February 28 (a) March 1 to February 28 Figure 6. Averaged and Smoothed (Along the Time Axis) Sample Statistics for the 20 Years of Logged SSTs.

11 890 Journal of the American Statistical Association, September 1997 to be deterministically stable. Other studies of dynarnical systems have also found a nonsystematic relationship between stability of a skeleton generated with and without noise. Intuitively, it seems that models with small SPAN (little smoothing) may be ultimately unstable due to overfitting of the data, although these models generally seem to provide the best long-term predictions. Nonetheless, the difference in predictive power is not great, as can be seen from Figure 4. If the sample path of the series remains stable throughout the simulation, as was the case when using (5), then one can look at sample properties of the simulated series, such as the log periodogram versus log frequency, to see whether the model captures the behavior of the series, such as the long-range dependence. By examining a few very long simulated time series, we conclude that the series simulated from the fitted ASTAR models for SSTs appear to behave like long-range dependent processes; that is, they have long cycles, changing mean levels, and large-sample spectra values at low frequencies, even though the estimated models only approximate true long-range dependence. Other model properties can also be investigated via simulation; for example, the sample ACF for series simulated from a fitted ASTAR model can be used to investigate stationarity of the model. 4. SMASTAR AND CASTAR MODELS A recent implementation of the MARS algorithm allows for categorical predictor variables in MARS (Friedman 1993a). Adapting this implementation to time series implies that if one had m categorical predictor variables and each was constrained in the TSMARS algorithm to enter linearly and without interactions, then each value of each of the categorical predictor variables would contribute a possibly different additive constant to the model for Xt at time t. If interactions among the categorical variables are allowed, then more complex patterns of dependence of Xt can occur. If interactions between categorical and ordinal variables are allowed, then a different AR threshold model can be obtained for each different combination of the categorical variables. We include categorical predictor variables in TSMARS to obtain semimultivariate models of SSTs. For example, the SSTs may be a function of past values of the SSTs, present or past wind speeds, and present or past wind directions, which are categorical predictors. (WD) using the categories 1 = East: On< WD < 45"or 315"< WD 5 360"; 2 = North: 45"< WD 5 135"; 3 =West: 135"< WD < 225"; and 4 = South: 225"< WD < 315". Days in which no wind or only light airs are reported receive a code of 5. The wind categorical variable is not ordered, so that categories East and South are not adjacent, but TSMARS allows these categories to be combined if the SSTs are found to have similar behavior when the wind blows from either direction. Categorizing the wind direction gives information concerning the broad directional effect and leads to interpretable models. We call models that include categorical predictor variables categorical ASTAR (CASTAR) models. The effect of wind direction on SSTs is shown in Figure 7. Using the 5 years and 3 months of data, the figure depicts the range of temperatures reported for each wind category when temperature lags wind direction by 1, 6, 11, or 16 days. Clearly, the SSTs depend on the wind direction; they tend to be lower when the wind blows from the northwest (Categories 2 and 3). The dependence is strongest at lag 1. In fact, oceanographers agree (Hidaka 1954) that the spring transition is driven by the wind shifting to blow from the northwest in the spring. A good descriptive model should show this nonlinear effect on SSTs. We model logged SST (Xt) using 10 lags of wind direction (WDt-,), 10 lags of the logarithm of (1 + wind speed), WSi-,., and 50 lags of logged SSTs (Xt-j). Here we do not remove the 1-year cycle from the data, but include sine and cosine terms as potential predictors variables to investigate whether the cyclic effects are wind driven. The SPAN parameter in the TSMARS model is set equal to 10. The fitted CASTAR model, given next, makes explicit the relationship seen in Figure 7 between the wind direction and the SSTs: Model for 5 years of logged SSTs using wind direction and log of (1 + wind speed): 4.1 Modeling SSTs Using Wind Speeds and Directions Wind speeds and wind directions are two of the most important meteorological variables affecting SSTs, but they are difficult and expensive to measure reliably. For Granite Canyon, wind speed and direction data are available from January 1, 1986-March 31, The direction from which the wind blows at Granite Canyon is a circular variable measured from 0 C-360 C. We code it as a categorical variable The fitted model has the following properties: 1. The logged SSTs range in value from over this 5-year period, and the first, second, and fifth terms in (6) (not counting the constant term) are functions of lagged SSTs. The first term is always present because the threshold of 2.13 is the minimum value of the logged SST series. The third term adds to xt only when Xt-34 is less than 2.22, which rarely happens because the minimum value of the logged SSTs is The three-way interaction between

12 Lewis and Ray: Time Series Multivariate Adaptive Regression Splines EFFECT OF WIND DIRECTION ON LOGARITHM OF SEA SURFACE TEMPERATURES Temperature lags Wind Direction by 1 day Temperature lags Wind Direction by 6 days I Wind Direction Category Temperature lags Wind Direction by 11 days Wind Direction Category Temperature lags Wind Direction by 16 days Wind Direction Category Wind Direction Cateeorv Figure 7. The Effect of Wind Direction on the SSTs at Granite Canyon From January 1, 1986-March 31, logged SSTs in the fifth term occurs infrequently. Lags 1, 7, and 17 were also present in the models of (3), (4), and (5). 2. The change in xt when the wind blows from the northwest on the previous day, which is explicit in the third and fourth terms of (6), agrees with the relationship seen in Figure 7. Splitting the two terms into three by separating out directions 1, 2 and 3, we see that direction 1 causes a slight increase in the overall level (+.013), while directions 2 and 3 cause slight decreases of,013 -,035 = and Both the third and fourth three-way interaction terms involve Xt-4S. Only one of these terms is present for a given t, depending on whether Xt-4g is greater than or less than 2.51 and whether the wind direction is in category 2 or 3. When Xt-4g is larger than 2.51 and the wind speed is greater than 3.00 knots, Xt and Xt-4S are coupled. This kind of oscillation was observed by Breaker and Lewis (1988) for several sets of SSTs, including the one at Granite Canyon. The last two terms of the CASTAR model show that the "oscillation" is present only when the wind direction is from the northwest (WDtPl E {2.3)), and the oscillation increases as lag 1 wind speed increases. Attempts to understand the oscillation with complex demodulation models were fruitless. 4. The WSt = log(1 + wind speed) thresholds, which are selected automatically by the TSMARS algorithm, have a meteorological interpretation. A transformed wind speed threshold of 1.10 knots translates into m/sec, below which wind speeds have little effect on SSTs. Looking at the seventh term in the CASTAR model, a transformed wind speed greater than 3.00 knots, which translates into 9.82 m/sec, lowers SSTs when the wind blows from the northwest and the logged temperature 49 days ago exceeded Note that in deriving (6), we did not subtract a time-of-year cycle from either transformed wind speeds or the transformed SSTs. As noted before, subtracting the cycle from the data destroys the relationship of the thresholds to the physical measurements. It may be better to first remove the cyclic trend if one were to make forecasts from the model. To forecast more than one step ahead requires forecasts for wind speeds and directions. One solution (Lewis and Ray 1993) is to use TSMARS to fit a semi-multivariate model for the wind speeds, as well as for the SSTs, and use these models for prediction. 4.2 Modeling Cyclic Nonstationary in the SSTs using Seasonal Categorical Variables The residual plots in Section 3 (Figs. 3 and 5) indicated that the squares of the residuals for the fitted ARFIMA and ASTAR models are not uncorrelated sequences. This result can perhaps be explained by the fact that the 1-year cycle manifests itself in the SSTs not only in the mean of the time series, but also in its lag-1 serial correlation. A categorical variable Ct taking value 1, 2, 3, 4, 5, or 6, depending on whether t corresponds to the first 2 months of the year, the second 2 months of the year, and so on, can be included as an additional predictor variable to capture more of the yearly cycle. The resulting model can be thought of as a nonlinear generalization of the periodically stationary models of Gladyev (1961). However, it has a much larger

13 892 Journal of the American Statistical Association, September 1997 periodicity (365) than typically considered in the literature. (See McLeod 1994 for details concerning linear modeling of periodically correlated time series.) Preliminary results concerning the use of TSMARS for nonlinear modeling of periodically correlated monthly data having period 1 year have been reported in another work (Lewis and Ray, unpublished manuscript). Additional research into the properties of such models is ongoing. 5. DISCUSSION AND DIRECTIONS FOR FUTURE RESEARCH We have demonstrated how the TSMARS algorithm can be used to obtain ASTAR, SMASTAR, and CASTAR models for a set of SSTs having complex nonlinear and cyclic patterns. The ASTAR models obtained using TSMARS are able to model nonlinear effects and long-range dependence. The ASTAR models with a fixed cycle give better long-term predictions than the linear ARFIMA model analyzed here. None of the skeletons produced from the ASTAR models for SSTs indicates a limit cycle, but the number of cycles present (20) may be too small to induce this behavior. A model containing a limit cycle would be better than one in which a 1-year harmonic must be subtracted from the data for adequate fitting. The SMASTAR model that includes wind direction and wind speed is interpretable by oceanographers. Predictions for more than one step ahead are difficult with the SMA- STAR model, however. Modeling several series with separate SMASTAR models is one solution, but to have a complete multivariate model, one needs to postulate or empirically derive a simple model for the resultant error sequences. Alternatively, an extension of TSMARS to the multivariate case, analogous to multivariate nonparametric regression modeling, could be developed for nonlinear multivariate time series modeling. This is an area for future research. [Received June Revised December REFERENCES Breaker, L. C., and Lewis, P. A. W. (1988), "A Day Oscillation in Sea-Surface Temperature Along the Central California Coast," Estuarine, Coastal, and Shelf Science, 26, Breaker, L. C., Lewis, P. A. W., and Orav, E. J. (19831, "Analysis of a 12- Year Record of Sea-Surface Temperatures Off Point Sur, California," Report NPS , Naval Postgraduate School, Dept. of Operations Research. Broomhead, D. S., Huke, J. P., and Muldoon, M. R. (1992),"Linear Filters and Nonlinear Systems," Biometrika, 54, Casdagli, M. (1992), "Nonlinear Forecasting, Chaos, and Statistics," in Modeling Complex Phenomena, eds. L. Lam and V. Naroditsky, New York: Springer-Verlag, pp Chatfield, C. (1995), "Model Uncertainty, Data Mining, and Statistical Inference" (with discussion), Journal of the Royal Statistical Society, Ser. A, 158,419466, Friedman, J. H. (1991), "Multivariate Adaptive Regression Splines," The Annals of Statistics, 19, (1993a), "Estimating Functions of Mixed Ordinal and Categorical Variables Using Adaptive Splines," in New Directions in Statistical Data Analysis and Robustness, eds. S. Morgenthaler, E. Rohchetti, and W. A. Stahel, Boston: Birkhausen-Verlag. (1993), "Fast MARS," Technical Report 110, Stanford University, Dept, of Statistics. Geweke, J., and Porter-Hudak, S. (19831, "The Estimation and Application of Long Memory Time Series Models," Journal of Time Series Analysis, 4, Gladyev, E. G. (1961), "Periodically Correlated Random Sequences," Soviet Mathematics (Doklady Akademic Nauk), 2, Granger, C. W., and Anderson, A. P. (1978), "An Introduction to Bilinear Time Series Models," Vandenhoeck and Ruprecht, Granger, C. W. J., and Joyeux, R. (19801, "An Introduction to Long- Memory Time Series Models and Fractional Differencing," Journal of Time Series Analysis, 1, Haltiner, G. J., and Williams, R. T. (1980), Numerical Prediction and Dynamic Meteorology, New York: Wiley. Haslett, J., and Raftery, A. E. (1989), "Space-Time Modelling With Long- Memory Dependence: Assessing Ireland's Wind Power Resource," Applied Statistics, 38, Hidaka, K. (1954), "A Contribution to the Theory of Upwelling and Coastal Currents," Transactions of the American Geophysical Union, 35, Hosking, J. R. M. (1981), "Fractional Differencing," Biometrika, 68, (1984), "Modeling Persistence in Hydrological Time Series Using Fractional Differencing," Water Resources Research, 20, Hurvich, C., Deo, R., and Brodsky, J. (in press), "The Mean-Squared Error of Geweke and Porter-Hudak's Estimator of the Memory Parameter of a Long Memory Time Series," submitted to Journal of Time Series Analysis. Jonas, A. B. (1983), "Persistent Memory Random Processes," Ph.D, thesis, Harvard University. Lewis, P. A. W., and Ray, B. K. (19931, "Nonlinear Modelling of Multivariate and Categorical Time Series Using Multivariate Adaptive Regression Splines," in Dimension Estimation and Models, ed. H. Tong, Singapore: World Scientific, pp Lewis, P. A. W., Ray, B. K., and Stevens, J. (19931, "Modelling Time Series Using Multivariate Adaptive Regression Splines (MARS)," in Time Series Prediction: Forecasting the Future and Understanding the Past, eds. A. Weigend and N. Gershenfeld, Santa Fe Institute, Sante Fe, NM: Addison-Wesley, pp Lewis, P. A. W., and Stevens, J. G. (1991), "Nonlinear Modeling of Time Series Using Multivariate Adaptive Regression Splines (MARS)," Journal of the American Statistical Association, 87, (1992), "Semi-Multivariate Modeling of Time Series Using Adaptive Spline Threshold Autoregressions (ASTAR)," preprint. Mandelbrot, B. B., and Van Ness, J. W. (1968), "Fractional Brownian Motions, Fractional Noises, and Applications," SIAM Review, 10, McLeod, A. I. (1994), "Diagnostic Checking of Periodic Autoregression Models With Application," Journal of Time Series Analysis, 15, McLeod, A. I., and Hipel, K. W. (1978), "Preservation of the Rescaled Adjusted Range. 1. A Reassessment of the Hurst Phenomenon," Water Resources Research, 14, Mendelson, R. (1990), "Relating Fisheries to the Environment in the Gulf of Guinea: Information, Causality and Long-Term Memory," in Les Pecheries Ouest-Africaines: Variabilite, instabilite et changement, eds. P. Cury and C. L. Roy, Paris: OSTROM, pp Ray, B. K. (1993), "Modeling Long Memory Processes for Optimal Long- Range Prediction," Journal of Time Series Analysis, 14, Rosenblatt, M. (1987), "Scale Renormalization and Random Solutions of the Burgers Equation," Journal ofapplied Probability, 24, Robinson, P. M. (1994), "Time Series With Strong Dependence," in Advances in Econometrics: Sixth World Congress, Vol. 1, ed. C. Sims, Cambridge, U.K.: Cambridge University Press, pp Royer, T. C. (1991), "High-Latitude Oceanic Variability Associated With the 18.6-Year Nodal Tide," Journal of Geophysical Research, 98, Schwarz, G. (19781, "Estimating the Dimension of a Model," The Annals of Statistics, 6, Smith, R. L. (1993), "Long-Range Dependence and Global Warming," in Statistics for the Environment, eds. V. Barnett and K. F. Turkman, New York: Wiley, pp

at least 50 and preferably 100 observations should be available to build a proper model

III Box-Jenkins Methods 1. Pros and Cons of ARIMA Forecasting a) need for data at least 50 and preferably 100 observations should be available to build a proper model used most frequently for hourly or