Trend Analysis. By Andrew Harvey. Keywords: global warming, outlier, prediction, signal extraction, stochastic trend, unit root, unobserved component

Abstract: The question of defining a trend is one that has exercised the minds of statisticians for many years. In much of the statistical literature, a trend is conceived of as that part of a series that changes relatively slowly over time. Viewed in terms of prediction, the estimated trend is that part of the series that, when extrapolated, gives the clearest indication of the future long-term movements in the series. Trend analysis is best carried out by means of unobserved-component, or structural, time-series models. The trend components in structural time-series models are always set up in such a way that their forecast functions indicate the long-term movements in the series, and the trend within the series can be obtained by signal extraction methods.

The question of defining a trend is one that has exercised the minds of statisticians for many years. Most people would claim to be able to recognize a trend when they saw one, but few would be able to go beyond the rather vague definition in the Concise Oxford Dictionary, which is that a trend is a "general direction and tendency" [1, p. 1488]. Actually, this definition is not a bad one, in that it defines the trend in terms of prediction. This is the view taken here. In much of the statistical literature, however, a trend is conceived of as that part of a series that changes relatively slowly over time. In other words, smoothness plays a key role in the definition. There is no fundamental reason, though, why a trend should be smooth, except that it is somewhat easier on the eye. What, then, is a trend? Viewed in terms of prediction, the estimated trend is that part of the series that, when extrapolated, gives the clearest indication of the future long-term movements in the series.
The definition makes no mention of smoothness, and it is consistent with the idea of indicating a general direction. Having defined the trend in terms of its properties when extrapolated, a mechanism is needed for making this extrapolation, just as a mechanism is needed for making predictions of future values of the series itself. Such a mechanism is provided by a statistical model, which is fitted in the hope that it will provide the best possible predictions. A statistical model for the trend component can be set up as part of the overall model for the series. This submodel should be such that the optimal estimator of the trend at the end of the series gives rise to a forecast function satisfying the criterion of the previous paragraph. The optimal estimator of the trend component within the series is then defined automatically. The properties of the estimated trend within the series therefore emerge as a consequence of the required properties of the trend forecast function and the characteristics of the data.

Cambridge University, Cambridge, UK. Update based on the original article by Andrew Harvey, Wiley StatsRef: Statistics Reference Online, 2014, John Wiley & Sons, Ltd.

Given the above-mentioned framework, trend analysis is best carried out by means of unobserved-component, or structural, time-series models. This is the approach described in the books by Harvey [2], Jones [3], Kitagawa and Gersch [4], West and Harrison [5], and Young [6]. The trend components in structural time-series models are always set up in such a way that their forecast functions indicate the long-term movements in the series, and the trend within the series can be obtained by signal extraction methods. The following section describes structural time-series models and explains how they are handled statistically. This is followed by a section looking at the implied weighting patterns for such models. These weighting patterns are important for understanding the relationship between parametric and nonparametric approaches to trend analysis. After looking at kernel estimation and cubic splines, two types of tests are introduced. These are tests against nonstationary trends and tests of an increase or decrease in the level of the trend. Outliers and breaks are then discussed and a new class of robust time-series models is introduced. The last two sections look at multivariate trend analysis and non-Gaussian observations. Most of the data sets used as illustrations can be found in the STAMP computer package of Koopman et al. [7].

1 Structural Time Series Models

The simplest time-series models for trend analysis are made up of a stochastic trend component, $\mu_t$, and a random irregular term. A deterministic trend emerges as a special case. In the local level model, the trend is just a random walk. Thus,

$$y_t = \mu_t + \varepsilon_t, \quad \varepsilon_t \sim NID(0, \sigma_\varepsilon^2), \quad t = 1, \ldots, T \quad (1)$$
$$\mu_t = \mu_{t-1} + \eta_t, \quad \eta_t \sim NID(0, \sigma_\eta^2) \quad (2)$$

where the irregular and level disturbances, $\varepsilon_t$ and $\eta_t$ respectively, are mutually independent, and where the notation $NID(0, \sigma^2)$ denotes normally and independently distributed with mean zero and variance $\sigma^2$. When $\sigma_\eta^2$ is zero, the level is constant.

The signal-to-noise ratio, $q = \sigma_\eta^2 / \sigma_\varepsilon^2$, determines how observations should be weighted for prediction and signal extraction. The higher is $q$, the more past observations are discounted in forecasting the future. Similarly, a higher $q$ means that the closest observations receive a higher weight when signal extraction is carried out. Note that, even though the forecast function is horizontal, the model is deemed to have a trend, unless $\sigma_\eta^2$ is zero, because the level changes over time.

The local linear trend model is more general in that the trend component in Equation (2) has a stochastic slope, $\beta_t$, which itself follows a random walk. Thus

$$\mu_t = \mu_{t-1} + \beta_{t-1} + \eta_t, \quad \eta_t \sim NID(0, \sigma_\eta^2) \quad (3)$$
$$\beta_t = \beta_{t-1} + \zeta_t, \quad \zeta_t \sim NID(0, \sigma_\zeta^2) \quad (4)$$

where the irregular, level, and slope disturbances, $\varepsilon_t$, $\eta_t$, and $\zeta_t$, respectively, are mutually independent. If both variances $\sigma_\eta^2$ and $\sigma_\zeta^2$ are zero, the trend is deterministic, that is

$$\mu_t = \mu_0 + \beta t, \quad t = 1, \ldots, T \quad (5)$$

When only $\sigma_\zeta^2$ is zero, the slope is fixed and the trend reduces to a random walk with drift:

$$\mu_t = \mu_{t-1} + \beta + \eta_t \quad (6)$$

Allowing $\sigma_\zeta^2$ to be positive but setting $\sigma_\eta^2$ to zero gives an integrated random walk (IRW) trend, which when estimated tends to be relatively smooth. The signal-to-noise ratio is now given by $q_\zeta = \sigma_\zeta^2 / \sigma_\varepsilon^2$. The model is often referred to as the smooth trend model. Other things being equal, it is desirable to have a smooth trend because it is easier on the eye and more appealing to policymakers. Thus, if the IRW and random walk plus drift trends give a similar fit, the IRW may be preferred.
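The special cases above are easy to illustrate by simulation. The following is a minimal sketch (the function name, starting values, and parameter choices are illustrative, not from the article): the general recursion of Equations (3) and (4) is simulated, and setting both trend disturbance variances to zero recovers the deterministic linear trend of Equation (5).

```python
import numpy as np

def simulate_local_linear_trend(T, sigma_eps, sigma_eta, sigma_zeta, seed=0):
    """Simulate y_t = mu_t + eps_t with mu_t = mu_{t-1} + beta_{t-1} + eta_t
    and beta_t = beta_{t-1} + zeta_t (Equations 1, 3, and 4)."""
    rng = np.random.default_rng(seed)
    mu, beta = 0.0, 0.1          # illustrative starting level and slope
    y = np.empty(T)
    levels = np.empty(T)
    for t in range(T):
        mu = mu + beta + sigma_eta * rng.standard_normal()
        beta = beta + sigma_zeta * rng.standard_normal()
        levels[t] = mu
        y[t] = mu + sigma_eps * rng.standard_normal()
    return y, levels

# Special cases:
#   sigma_eta > 0, sigma_zeta = 0, beta = 0  -> local level (random walk) trend
#   sigma_eta = 0, sigma_zeta > 0            -> integrated random walk (smooth) trend
#   sigma_eta = sigma_zeta = 0               -> deterministic linear trend, Equation (5)
y_det, mu_det = simulate_local_linear_trend(100, 1.0, 0.0, 0.0)
# With both trend variances zero, the level path is exactly mu_0 + beta * t.
```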

1.1 State-Space Form and the Kalman Filter

The statistical treatment of unobserved-component models is based on the state-space form (SSF; see State-Space Methods). Once a model has been put in SSF, the Kalman filter yields estimators of the components based on current and past observations (see Space-Time Kalman Filter). Signal extraction refers to estimation of components based on all the information in the sample. Signal extraction is based on smoothing recursions that run backward from the last observation. Predictions are made by extending the Kalman filter forward. Root mean square errors (RMSEs) can be computed for all estimators, and prediction or confidence intervals constructed.

The unknown variance parameters are estimated by constructing a likelihood function from the one-step-ahead prediction errors, or innovations, produced by the Kalman filter. The likelihood function is maximized by an iterative procedure. Full details can be found in Ref. 2. The calculations can be done with the STAMP package of Koopman et al. [7]. Once estimated, the fit of the model can be checked using standard time-series diagnostics, such as tests for residual serial correlation.

Missing data can be handled very easily within a state-space framework. Irregularly spaced observations are treated by setting up a model in continuous time and then letting the SSF deal with the implied time-varying discrete-time model (Ref. 2, Chapter 9).

1.2 Reduced Form

If the local level model is differenced, it can be seen that the autocorrelations are zero beyond the first lag. The reduced form is therefore an autoregressive integrated moving average, ARIMA(0,1,1), model (see Autoregressive Integrated Moving Average (ARIMA) Models):

$$\Delta y_t = \xi_t + \theta \xi_{t-1}, \quad \xi_t \sim NID(0, \sigma^2) \quad (7)$$

and, by equating the autocorrelations at lag one, it can be shown that

$$\theta = \left[ -q - 2 + (q^2 + 4q)^{1/2} \right] / 2 \quad (8)$$

Note that $\theta$ is restricted to the range $-1 \leq \theta \leq 0$. The local linear trend model is made stationary by differencing twice.
It can be shown that the reduced form is ARIMA(0,2,2):

$$\Delta^2 y_t = \xi_t + \theta_1 \xi_{t-1} + \theta_2 \xi_{t-2}, \quad \xi_t \sim NID(0, \sigma^2)$$

but only part of the invertibility region is admissible (Ref. 2, p. 69).

1.3 Cycles

Distinguishing a long-term trend from short-term cyclical movements is important for macroeconomic time series, and it also has applications in many environmental problems. Short-term movements may be captured by adding a serially correlated stationary component, $\psi_t$, to the model. Thus

$$y_t = \mu_t + \psi_t + \varepsilon_t, \quad t = 1, \ldots, T \quad (9)$$

An autoregressive process is often used for $\psi_t$. Another possibility is the stochastic cycle:

$$\begin{bmatrix} \psi_t \\ \psi_t^* \end{bmatrix} = \rho \begin{bmatrix} \cos \lambda_c & \sin \lambda_c \\ -\sin \lambda_c & \cos \lambda_c \end{bmatrix} \begin{bmatrix} \psi_{t-1} \\ \psi_{t-1}^* \end{bmatrix} + \begin{bmatrix} \kappa_t \\ \kappa_t^* \end{bmatrix}, \quad t = 1, \ldots, T \quad (10)$$

where $\lambda_c$ is frequency in radians and $\kappa_t$ and $\kappa_t^*$ are two mutually independent white-noise disturbances with zero means and common variance $\sigma_\kappa^2$. Given the initial conditions that the vector $(\psi_0, \psi_0^*)'$ has zero mean and covariance matrix $\sigma_\psi^2 I$, it can be shown that for $0 \leq \rho < 1$, the process $\psi_t$ is stationary and indeterministic with zero mean, variance $\sigma_\psi^2 = \sigma_\kappa^2 / (1 - \rho^2)$, and autocorrelation function (ACF):

$$\rho(\tau) = \rho^\tau \cos \lambda_c \tau, \quad \tau = 0, 1, 2, \ldots \quad (11)$$

For $0 < \lambda_c < \pi$, the spectrum (see Spectral Analysis) of $\psi_t$ displays a peak, centered on $\lambda_c$, which becomes sharper as $\rho$ moves closer to one (Ref. 2, p. 60). The period corresponding to $\lambda_c$ is $2\pi / \lambda_c$. In the limiting cases when $\lambda_c = 0$ or $\pi$, $\psi_t$ collapses to first-order autoregressive processes with coefficients $\rho$ and $-\rho$, respectively. More generally, the reduced form is an autoregressive moving average, ARMA(2,1), process in which the autoregressive part has complex roots. The complex-root restriction can be very helpful in fitting a model, particularly if there is reason to include more than one cycle. Imposing the smooth trend restriction often allows a clearer separation into trend and cycle. However, following the comments made earlier, such a model should not be forced on the data if diagnostic and goodness-of-fit statistics indicate that it is inappropriate.

Example 1. Minks in Canada. Data on the number of mink furs traded annually by the Hudson Bay Company in Canada from 1848 to 1906 have featured as an example in many time series studies. The data display a pronounced cycle, and fitting the local linear trend model would simply track the cycle. This is not what is wanted. Use of the STAMP package of Koopman et al. [7] to fit a smooth stochastic trend plus a stochastic cycle plus an irregular disturbance gives the breakdown shown in Figure 1. The damping factor of the cycle, $\rho$, is estimated as 0.93, and the period is 10.5.
The trend actually reduces to a deterministic linear trend, as the estimate of the slope variance, $\sigma_\zeta^2$, is zero. However, assuming a deterministic trend at the outset would have been inadvisable.

Figure 1. Mink fur trading in Canada by the Hudson Bay Company, 1848-1906: (a) trend and log mink, (b) cycle, (c) irregular disturbance. [Plot not reproduced here; the panels show logarithms against year.]
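The stationary properties of the cycle in Equation (10), namely the variance $\sigma_\psi^2 = \sigma_\kappa^2/(1-\rho^2)$ and the ACF $\rho^\tau \cos \lambda_c \tau$, can be checked by simulation. A minimal sketch (parameter values are illustrative; the period 10.5 echoes Example 1):

```python
import numpy as np

def simulate_cycle(T, rho, lam_c, sigma_kappa, seed=1):
    """Simulate the stochastic cycle of Equation (10)."""
    rng = np.random.default_rng(seed)
    # Damped rotation through angle lam_c (radians) per period.
    R = rho * np.array([[np.cos(lam_c), np.sin(lam_c)],
                        [-np.sin(lam_c), np.cos(lam_c)]])
    state = np.zeros(2)
    psi = np.empty(T)
    for t in range(T):
        state = R @ state + sigma_kappa * rng.standard_normal(2)
        psi[t] = state[0]
    return psi

rho, lam_c = 0.9, 2 * np.pi / 10.5     # damping 0.9, period 10.5
psi = simulate_cycle(100_000, rho, lam_c, 1.0)
theoretical_var = 1.0 / (1 - rho**2)    # sigma_psi^2 = sigma_kappa^2 / (1 - rho^2)
lag1_theory = rho * np.cos(lam_c)       # ACF at tau = 1, Equation (11)
lag1_sample = np.corrcoef(psi[:-1], psi[1:])[0, 1]
```

With a long sample, the sample variance and lag-one autocorrelation settle close to their theoretical values, which is a useful sanity check on the recursion.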

When $\rho$ is unity, the cycle becomes nonstationary. However, suppose that the model is amended slightly by regarding $\sigma_\psi^2$, rather than $\sigma_\kappa^2$, as being fixed. In this case, $\sigma_\kappa^2$ is defined by $\sigma_\kappa^2 = (1 - \rho^2)\sigma_\psi^2$, and so $\sigma_\kappa^2 \to 0$ as $\rho \to 1$. When $\rho = 1$, the model becomes deterministic, but the initial conditions are stochastic because $(\psi_0, \psi_0^*)'$ still has zero mean and covariance matrix $\sigma_\psi^2 I$. As a result, the process is still stationary as, in each realization, different initial values are generated and $\psi_t$ has zero mean and ACF as in Equation (11). Hence, the process is deterministic in the sense of Wold, and it has a purely discrete or line spectrum. The frequency-domain properties of a cycle-plus-noise model are displayed by the spectral distribution function, which shows a sudden jump at $\lambda_c$. A test of the null hypothesis of a deterministic cycle, $\rho = 1$, against $\rho < 1$ is given in Ref. 8.

Example 2. Rainfall in Fortaleza. An initial analysis of an annual series on the number of centimeters of rain falling in Fortaleza, a town in northeast Brazil, indicates that there may be a cycle of around 12-13 years and another of around 25 years. Fitting a model with a random walk trend and two stochastic cycles results in the cycles being estimated as deterministic. The random walk reduces to a constant, so the series is apparently stationary.

The stochastic cycle model may be extended with the aim of yielding a smoother extracted cycle. Harvey and Trimbur [9] propose an $n$th-order stochastic cycle, $\psi_{n,t}$, defined by

$$\begin{bmatrix} \psi_{1,t} \\ \psi_{1,t}^* \end{bmatrix} = \rho \begin{bmatrix} \cos \lambda_c & \sin \lambda_c \\ -\sin \lambda_c & \cos \lambda_c \end{bmatrix} \begin{bmatrix} \psi_{1,t-1} \\ \psi_{1,t-1}^* \end{bmatrix} + \begin{bmatrix} \kappa_t \\ 0 \end{bmatrix} \quad (12)$$

$$\begin{bmatrix} \psi_{i,t} \\ \psi_{i,t}^* \end{bmatrix} = \rho \begin{bmatrix} \cos \lambda_c & \sin \lambda_c \\ -\sin \lambda_c & \cos \lambda_c \end{bmatrix} \begin{bmatrix} \psi_{i,t-1} \\ \psi_{i,t-1}^* \end{bmatrix} + \begin{bmatrix} \psi_{i-1,t} \\ 0 \end{bmatrix}, \quad i = 2, \ldots, n$$

where $\kappa_t \sim NID(0, \sigma_\kappa^2)$.
The first-order cycle is as in Equation (10), but with the variance of the disturbance term for the auxiliary process $\psi_{1,t}^*$ set to zero instead of $\sigma_\kappa^2$. For $n = 2$, $\psi_{2,t}$ has a first-order stochastic cycle as a driving variable, so the shocks to a second-order cycle are themselves cyclical. The idea of restricting the shock to $\psi_{1,t}$ is so that the extracted cycle changes relatively smoothly, in the same way as the IRW trend. The estimation of models with higher-order cycles is an option in the STAMP package.

1.4 Seasonality

Observations recorded quarterly or monthly often exhibit seasonal effects. A seasonal component, $\gamma_t$, therefore needs to be incorporated into the model. For example, the basic structural model consists of trend, seasonal, and irregular components, that is

$$y_t = \mu_t + \gamma_t + \varepsilon_t, \quad t = 1, \ldots, T \quad (13)$$

Seasonal patterns may evolve over time. In the trigonometric form of stochastic seasonality

$$\gamma_t = \sum_{j=1}^{[s/2]} \gamma_{j,t}, \quad t = 1, \ldots, T \quad (14)$$

where $s$ is the number of seasons and each $\gamma_{j,t}$ is generated by

$$\begin{bmatrix} \gamma_{j,t} \\ \gamma_{j,t}^* \end{bmatrix} = \begin{bmatrix} \cos \lambda_j & \sin \lambda_j \\ -\sin \lambda_j & \cos \lambda_j \end{bmatrix} \begin{bmatrix} \gamma_{j,t-1} \\ \gamma_{j,t-1}^* \end{bmatrix} + \begin{bmatrix} \omega_{j,t} \\ \omega_{j,t}^* \end{bmatrix}, \quad j = 1, \ldots, \left[\frac{s-1}{2}\right] \quad (15)$$

where $\lambda_j = 2\pi j / s$ is frequency, in radians, and $\omega_{j,t}$ and $\omega_{j,t}^*$ are two mutually independent white-noise disturbances with zero means and common variance $\sigma_\omega^2$, which is the same for all $j$. For $s$ even, $[s/2] = s/2$, whereas for $s$ odd, $[s/2] = (s-1)/2$. For $s$ even, there is a single component at $j = s/2$:

$$\gamma_{s/2,t} = \gamma_{s/2,t-1} \cos \lambda_{s/2} + \omega_{s/2,t}, \quad \text{var}(\omega_{s/2,t}) = \sigma_\omega^2 / 2 \quad (16)$$

An equivalent way of modeling a changing seasonal pattern is by using dummy variables, defined in such a way that $z_{jt}$ is one in season $j$ and zero otherwise, $j = 1, \ldots, s$. Then

$$\gamma_t = \sum_{j=1}^{s} \gamma_{jt} z_{jt}$$

where the coefficients, $\gamma_{jt}$, $j = 1, \ldots, s$, are constrained to sum to zero so that the seasonal component is not confounded with the trend. Although all $s$ seasonal components are continually evolving, only one affects the observations at any particular point in time, that is, $\gamma_t = \gamma_{jt}$ when season $j$ is prevailing at time $t$. The seasonal components are continually evolving as random walks, that is

$$\gamma_{jt} = \gamma_{j,t-1} + \omega_{jt}, \quad j = 1, \ldots, s \quad (17)$$

where $\omega_{jt}$ is a zero-mean, serially uncorrelated variable, but they are constrained always to sum to zero by the restriction that the disturbances always sum to zero. This restriction is implemented by the correlation structure in

$$\text{Var}(\omega_t) = \sigma_\omega^2 (I - s^{-1} i i') \quad (18)$$

where $\omega_t = (\omega_{1t}, \ldots, \omega_{st})'$, coupled with initial conditions requiring that the seasonals sum to zero at $t = 0$. It can be seen from Equation (18) that $\text{Var}(i' \omega_t) = 0$.

In the present context, the main reason for modeling seasonal effects is so that attention can be focused on the trend. If desired, the seasonal component can be extracted by smoothing and then subtracted from the observations to give a seasonally adjusted time series.

2 Weighting Patterns for Prediction and Signal Extraction

As already noted, the calculations for prediction and signal extraction are generally carried out by means of state-space methods. This section examines the way in which observations are weighted in order to make predictions and extract trends.
This can be done analytically in simple cases. For more complex models and irregularly spaced observations, the algorithm described in Ref. 10 may be used to compute weights numerically. Seeing the weighting patterns is important in acquiring an understanding of what exactly the models are doing and how they relate to other methods. The term optimal in this context means minimum mean square error (MSE), as it is assumed that the models are Gaussian. Without Gaussianity, we would still have minimum-MSE estimators within the class of linear estimators.

2.1 Prediction

An expression for an optimal predictor of a future observation in a local level model can be obtained from the steady state of the Kalman filter or by using classic signal extraction theory as in Ref. 11. It can also be obtained from the ARIMA(0,1,1) reduced form. It depends only on $\theta$, which is a function of $q$ as in Equation (8). Given observations up to and including $y_T$, the optimal predictor of future observations is the same for all lead times and is equal to the filtered estimator of the level at time $T$, $m_T$. Thus,

$$\tilde{y}_{T+l|T} = m_T = (1 + \theta) \sum_{j=0}^{\infty} (-\theta)^j y_{T-j} = \lambda \sum_{j=0}^{\infty} (1 - \lambda)^j y_{T-j}, \quad l = 1, 2, 3, \ldots \quad (19)$$

where $\lambda = 1 + \theta$ is the smoothing constant in the exponentially weighted moving average (EWMA). A negative $\theta$, as implied by Equation (8), corresponds to a value of $\lambda$ between zero and one. The fact that this model underpins the EWMA was pointed out by Muth [12]. The relationship between forecasts from the linear trend and techniques such as Holt-Winters and double exponential smoothing is discussed in Ref. [2, Chapter 2].

2.2 Signal Extraction

Consider a Gaussian model consisting of two mutually independent stochastic components, $\mu_t$ and $\nu_t$, that is

$$y_t = \mu_t + \nu_t, \quad t = 1, \ldots, T \quad (20)$$

The classic Wiener-Kolmogorov (WK) formula for finding the weights used to extract $m_t$, that is, the optimal estimator of $\mu_t$ in a doubly infinite sample, is

$$m_t = w(L) y_t = \sum_{j=-\infty}^{\infty} w_j L^{-j} y_t = \sum_{j=-\infty}^{\infty} w_j y_{t+j}, \quad w(L) = \frac{\gamma_\mu(L)}{\gamma_y(L)} \quad (21)$$

where $L$ is the lag operator and $\gamma_\mu(L)$ the autocovariance generating function (ACGF) of $\mu_t$. The ACGF of $y_t$ is given by

$$\gamma_y(L) = \gamma_\mu(L) + \gamma_\nu(L) \quad (22)$$

although this is usually evaluated in terms of the reduced-form parameters. For a stationary ARMA process, written $\phi^{-1}(L)\theta(L)\xi_t$, where $\phi(L)$ and $\theta(L)$ are polynomials in the lag operator and $\xi_t$ is white noise with variance $\sigma^2$, the ACGF is given directly by

$$\gamma_y(L) = \frac{|\theta(L)|^2}{|\phi(L)|^2} \sigma^2 \quad (23)$$

where $|\theta(L)|^2 = \theta(L)\theta(L^{-1})$, and similarly for $\phi(L)$. Although Equation (21) is only proved for stationary models in Ref. [11, pp. 56-58], it can still be used for nonstationary models, even though expressions such as Equation (23) are no longer strictly ACGFs.

The MSE of an estimated component may be derived from the ACGF of $m_t - \mu_t$, denoted $\gamma_{m-\mu}(L)$, by evaluating it at $L^0$. As

$$m_t - \mu_t = \frac{\gamma_\mu(L)}{\gamma_y(L)} y_t - \mu_t = \left[ \frac{\gamma_\mu(L)}{\gamma_y(L)} - 1 \right] \mu_t + \frac{\gamma_\mu(L)}{\gamma_y(L)} \nu_t$$

its ACGF is

$$\gamma_{m-\mu}(L) = \frac{\gamma_\mu(L) \gamma_\nu(L)}{\gamma_y(L)} \quad (24)$$

If $\nu_t$ is white noise, $\varepsilon_t$, then $\gamma_{m-\mu}(L) = \sigma_\varepsilon^2 w(L)$, and if expressions for the weights used to extract the trend can be found, then so can the estimation MSE.

2.2.1 Local-level model

Provided $q > 0$, the WK formula for estimating $\mu_t$ yields

$$m_t = w(L) y_t = \frac{\sigma_\eta^2 / |1 - L|^2}{\sigma^2 |1 + \theta L|^2 / |1 - L|^2} \, y_t = \frac{(1 + \theta)^2}{|1 + \theta L|^2} \, y_t \quad (25)$$

as $\sigma_\eta^2 = (1 + \theta)^2 \sigma^2$. On recognizing that $1 / |1 + \theta L|^2$ is the ACGF of a first-order autoregressive, AR(1), model with parameter $-\theta$, it can be seen that the weights decline symmetrically and exponentially, that is,

$$w_j = \left( \frac{1 + \theta}{1 - \theta} \right) (-\theta)^{|j|}, \quad j = 0, \pm 1, \pm 2, \ldots \quad (26)$$

Setting $L = 1$ in Equation (25) shows that the weights sum to unity. The weights for the smoothed estimator of $\mu_t$ near the end of a semi-infinite sample are given in Ref. [11, p. 69] as

$$w_j = \left( \frac{1 + \theta}{1 - \theta} \right) \left[ (-\theta)^{|j|} + (-\theta)^{|j| + 2(T - t) + 1} \right], \quad -\infty < j \leq T - t \quad (27)$$

Setting $t = T$ gives the weights for the filtered estimator (Equation 25), while if $t \ll T$, the weights are as for a doubly infinite sample, as given in Equation (26). The MSE for Equation (25) is

$$\text{MSE}(m_t) = \sigma_\varepsilon^2 \left( \frac{1 + \theta}{1 - \theta} \right) \quad (28)$$

The formula may be adapted to semi-infinite samples. The MSE associated with Equation (27) is shown by Whittle [11, p. 70] to be

$$\sigma_\varepsilon^2 \left( \frac{1 + \theta}{1 - \theta} \right) \left[ 1 - \theta(-\theta)^{2(T - t)} \right], \quad t \leq T \quad (29)$$

The MSE of the filtered estimator (Equation 19) is obtained when $t = T$, so it is $\sigma_\varepsilon^2 (1 + \theta)$, whereas if $T - t$ is large, we get the MSE of the smoother, namely Equation (28).

2.2.2 Local linear trend

In the local linear trend model, the weights may be found from the WK formula (Equation 21) by noting that the reduced form is ARIMA(0,2,2). For the smooth trend model, $\sigma_\eta^2 = 0$ but $\sigma_\zeta^2 > 0$, so

$$w(L) = \frac{\sigma_\zeta^2}{\sigma_\zeta^2 + |1 - L|^4 \sigma_\varepsilon^2} = \frac{\sigma_\zeta^2}{\sigma^2 |1 + \theta_1 L + \theta_2 L^2|^2} \quad (30)$$

Because $1 / |1 + \theta_1 L + \theta_2 L^2|^2$ is the ACGF of an AR(2) process with complex roots, the weights decay according to a damped sine wave, as in Figure 2. The pattern depends on the signal-to-noise ratio, $q_\zeta = \sigma_\zeta^2 / \sigma_\varepsilon^2$.
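The mapping from $q$ to $\theta$ in Equation (8) and the sum-to-unity property of the weights in Equation (26) are easy to verify numerically. A minimal sketch (the truncation point $J$ is an arbitrary choice for illustration):

```python
import numpy as np

def theta_from_q(q):
    """MA(1) reduced-form parameter of the local level model, Equation (8)."""
    return (-q - 2 + np.sqrt(q**2 + 4*q)) / 2

def wk_weights(q, J=200):
    """Two-sided signal-extraction weights of Equation (26), truncated at |j| = J."""
    th = theta_from_q(q)
    j = np.arange(-J, J + 1)
    return ((1 + th) / (1 - th)) * (-th) ** np.abs(j)

th = theta_from_q(1.0)   # q = 1 gives theta = (sqrt(5) - 3) / 2, about -0.382
w = wk_weights(1.0)
# theta always lies in [-1, 0]; q = 0 gives theta = -1 (strict noninvertibility),
# and the truncated weights sum to one to numerical precision.
```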

Figure 2. Implied weighting patterns for the smoothed estimator of the trend in integrated random walk plus noise models: (a) $q = 1$, (b) $q = 0.25$, (c) $q = 0.0001$, and (d) $q = 0.000625$, where $q$ is the signal-to-noise ratio. [Plot not reproduced here; each panel shows the weights $w_j$ against $j$.]

In the more general case, when $\sigma_\eta^2$ is not necessarily zero, the weighting function is of the form

$$w(L) = \frac{\sigma_\zeta^2 + |1 - L|^2 \sigma_\eta^2}{\sigma_\zeta^2 + |1 - L|^2 \sigma_\eta^2 + |1 - L|^4 \sigma_\varepsilon^2} = \frac{\sigma_\mu^2 |1 + \theta_\mu L|^2}{\sigma^2 |1 + \theta_1 L + \theta_2 L^2|^2} \quad (31)$$

where the numerator is obtained from the reduced form of the trend component, which is such that $\Delta^2 \mu_t$ is a moving average, MA(1), process with MA parameter $\theta_\mu$ and disturbance variance $\sigma_\mu^2$. The weights are now obtained from the ACGF of an ARMA(2,1) process in which $-1 \leq \theta_\mu \leq 0$ and, as before, $0 \leq \theta_2 < 1$ and $-2 < \theta_1 \leq 0$. The roots of the implied autoregressive polynomial need no longer be complex but, if they are, the damped sine wave starts from $j = 1$ rather than $j = 0$. If the roots are real, they must be non-negative.

The weights for the slope are obtained by first writing

$$\Delta y_t = \beta_{t-1} + \eta_t + \Delta \varepsilon_t \quad (32)$$

Application of the WK filter then gives a set of weights attached to first differences of the observations, that is,

$$b_t = w_\beta(L) \Delta y_{t+1} = \frac{\sigma_\zeta^2}{\sigma^2 |1 + \theta_1 L + \theta_2 L^2|^2} \Delta y_{t+1} \quad (33)$$

As the denominator is equal to $\sigma_\zeta^2 + |1 - L|^2 \sigma_\eta^2 + |1 - L|^4 \sigma_\varepsilon^2$, it can be seen that $w_\beta(1) = 1$, so the weights sum to one. The weights attached to the observations themselves are given by $w_\beta(L)(1 - L)L^{-1}$, and these clearly sum to zero. However, the weights for estimating the level of the trend sum to one.

For the smooth trend, obtained by setting $\sigma_\eta^2 = 0$, it follows immediately from Equations (30) and (33) that $w_\beta(L) = w(L)$. It is interesting that this weighting function for extracting the level from the observations is the same as the one for extracting the slope from the first differences. However, this is not surprising because

$$b_t = w_\beta(L)(1 - L) y_{t+1} = m_{t+1} - m_t \quad (34)$$

and this corresponds to the identity $\mu_{t+1} = \mu_t + \beta_t$ in the model.

2.2.3 Higher-order trends

The IRW model may be extended further to give a trend in which the $m$th difference is normally and independently distributed with mean zero and constant variance. The WK filter for a doubly infinite series yields a low-pass Butterworth filter, as described by Gomez [13].

3 Nonparametric Signal Extraction: Kernels and Splines

There is a very close link between the so-called nonparametric techniques for signal extraction and those based on unobserved-component time-series models. If the trend in a time series is regarded as a deterministic function of unspecified form, it may be estimated at all points by a weighted MA, the shape of this MA being termed a kernel. By adopting the rather artificial device that observations are assumed to arrive more frequently (in a given time interval) as the sample size increases, it can be shown that a suitably designed kernel will estimate the trend consistently [14]. The method assumes that the nontrending part of the series is white noise, but it may be extended by adopting a hybrid procedure based on fitting an ARMA model as in Ref. 15. An RMSE can sometimes be calculated for the estimated trend, as in the case of the simple MA considered in Ref. 16.

The implied signal extraction weights from a fitted unobserved-component time-series model constitute a kernel. In the random walk plus noise model, the signal-to-noise ratio plays a similar role to a kernel bandwidth. A lower $q$ corresponds to a wider bandwidth.
The difference is that $q$ is treated as a parameter and estimated by maximum likelihood, and the fit of the model is checked by standard time-series diagnostics.

Cubic splines are another way of estimating trends nonparametrically (see Splines in Nonparametric Regression). The connection between cubic splines and the local linear trend model has been known for many years [17]. It is surprising that the cubic-spline-fitting methodology, as expounded in Ref. 18 and elsewhere, makes little or no reference to the time series literature. The local linear trend model that gives the cubic spline is very close to the smooth-trend (IRW) model, and so Figure 2 gives a good indication of the weight patterns. The filter proposed by Hodrick and Prescott [19], which is used extensively in macroeconomics, is the smoothed trend or detrended series obtained as the smoothed component from an IRW model with a preassigned signal-to-noise ratio, $q_\zeta$; for quarterly data, $q_\zeta = 1/1600$.

Fitting a model seems to be altogether more satisfactory than using a kernel or fitting a spline. Furthermore, the model implicitly provides appropriate weights near the beginning and end of the series and allows RMSEs to be computed for extracted trends at all points in the series. Predictions can also be made and, as discussed later, the model can be made robust to outliers and structural breaks by specifying $t$-distributions for the disturbances. When the Hodrick-Prescott filter is applied to a macroeconomic time series, the aim is usually to extract a cycle by detrending. Thus, it makes far more sense to include a cycle in the model that forms the basis for detrending; the gains from doing so are analyzed in Ref. 20.

4 Testing against Nonstationary Trends

The stochastic trend component in the structural time-series model reduces to a deterministic trend as a special case. We may therefore wish to test the null hypothesis that the trend is deterministic against the alternative that the level is stochastic and nonstationary. When the level is stochastic, we may wish to test the null hypothesis that the slope is deterministic against the alternative that it is stochastic. In other words, the test is of the random walk plus drift against the local linear trend.

4.1 Testing against a Stochastic Level

Tests of the null hypothesis of a deterministic trend can be based on test statistics that are locally best invariant (LBI). Standard asymptotic theory does not apply, but the test statistics have asymptotic distributions under the null hypothesis that belong to the Cramér-von Mises family. In the Gaussian random-walk plus noise model (Equations 1 and 2), the LBI test of the null hypothesis that σ²_η = 0, against the alternative that σ²_η > 0, can be formulated as

η = [ Σ_{i=1}^T ( Σ_{t=1}^i e_t )² ] / (T² s²) > c    (35)

where

e_t = y_t − ȳ    (36)

s² = T^{−1} Σ_{t=1}^T (y_t − ȳ)²    (37)

and c is a critical value [21]. The test can also be interpreted as a one-sided Lagrange multiplier (LM) test. The asymptotic distribution of the statistic is Cramér-von Mises, for which the 5% critical value is 0.461. If the trend is a random-walk plus drift, as in Equation (6), it becomes deterministic when σ²_η = 0. Thus,

y_t = μ_0 + βt + ε_t, t = 1, …, T    (38)

The test statistic, η₂, is as in Equation (35) except that it is formed from the ordinary least squares (OLS) residuals from a regression on a constant and time. The asymptotic distribution is a second-level Cramér-von Mises distribution for which the 5% critical value is 0.149.
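The statistic in Equations (35)-(37) is simple to compute; a sketch in pure Python, taking the observations as a plain list:

```python
def eta_statistic(y):
    """LBI/LM statistic (35) for H0: sigma_eta^2 = 0 in the random-walk
    plus noise model. Reject for values above the Cramer-von Mises
    5% critical value of 0.461."""
    T = len(y)
    ybar = sum(y) / T
    e = [v - ybar for v in y]          # residuals (36)
    s2 = sum(v * v for v in e) / T     # variance estimate (37)
    cum, num = 0.0, 0.0
    for et in e:                       # accumulate squared partial sums
        cum += et
        num += cum * cum
    return num / (T * T * s2)
```

For the η₂ variant, the residuals would instead come from an OLS regression on a constant and time, and the 5% critical value drops to 0.149.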
If ε_t is any indeterministic stationary process, rather than white noise, the asymptotic distribution of the test statistic under the null hypothesis remains the same if s² is replaced by a consistent estimator of the long-run variance:

σ²_L = Σ_{τ=−∞}^{∞} γ(τ)    (39)

where γ(τ) is the autocovariance of ε_t at lag τ. Kwiatkowski, Phillips, Schmidt, and Shin (KPSS) [22] construct such an estimator nonparametrically as

σ̂²_L(l) = T^{−1} Σ_{t=1}^T e_t² + 2T^{−1} Σ_{τ=1}^l w(τ, l) Σ_{t=τ+1}^T e_t e_{t−τ} = γ̂(0) + 2 Σ_{τ=1}^l w(τ, l) γ̂(τ)    (40)

where w(τ, l) is a weighting function, such as w(τ, l) = 1 − τ/(l + 1), τ = 1, …, l. Note that the long-run variance can be interpreted as the spectrum at frequency zero, and the weighting function w(τ, l) is a lag window. An important decision in applying the KPSS test is the choice of l. Although asymptotic theory tells us the rate at which l should increase with T, it gives no guidance as to the choice of l in a small sample. Indeed, the best choice depends on the properties of the series, but it is precisely this information that the nonparametric approach assumes is unavailable. Serial correlation can be handled parametrically by estimating an unrestricted model and then constructing a test statistic of the form (Equation 35) from the innovations obtained by running the Kalman filter with σ²_η set to zero [23]. Following the argument in Ref. 24, a parametric test will tend to have a more reliable size and higher power.

Example 3. Nile Annual data on the volume of the flow of the Nile are shown in Figure 3. Fitting a mean and computing the test statistic in Equation (35) gives a value of η = 2.53, indicating a clear rejection of the null hypothesis that there is no random walk component. The KPSS test gives the same result, with the statistics for l = 3 and 7 being 1.10 and 0.73, respectively. Before jumping to the conclusion that the trend is a random walk, it should be noted that the η test also has power against structural breaks in an otherwise stationary series. We will return to this point later.

Example 4. Rainfall and Minks Environmental time series may contain cyclical components that are deterministic or near deterministic (see Time Series, Periodic). Care has to be taken if nonparametric tests are used in such situations. For example, the near-deterministic cycle in the mink series (Example 1) results in the KPSS statistic (with time trend fitted) taking the values 0.176, 0.096, 0.146, 0.133, and 0.164 for lag lengths, l, of 0, 5, 10, 15, and 20, respectively, so the test gives no clear message.
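The KPSS correction replaces s² with the long-run variance estimator (40); a sketch with Bartlett weights w(τ, l) = 1 − τ/(l + 1):

```python
def long_run_variance(e, l):
    """KPSS nonparametric long-run variance estimator (40) applied to
    residuals e, using Bartlett weights w(tau, l) = 1 - tau/(l+1)."""
    T = len(e)
    s = sum(v * v for v in e) / T                  # gamma_hat(0)
    for tau in range(1, l + 1):
        w = 1.0 - tau / (l + 1.0)
        s += 2.0 * w * sum(e[t] * e[t - tau] for t in range(tau, T)) / T
    return s

def kpss_level(y, l):
    """KPSS level-stationarity statistic: the eta statistic (35) with s^2
    replaced by the long-run variance estimate sigma_L^2(l)."""
    T = len(y)
    ybar = sum(y) / T
    e = [v - ybar for v in y]
    cum, num = 0.0, 0.0
    for et in e:
        cum += et
        num += cum * cum
    return num / (T * T * long_run_variance(e, l))
```

With l = 0 the long-run variance collapses to the sample variance of the residuals, so kpss_level reproduces the η statistic exactly.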
Deterministic cyclical components, as found in the Fortaleza rainfall series (Example 2), should be accounted for by including the relevant sines and cosines as explanatory variables. The test takes a different form when the alternative is a smooth trend and the null is σ²_ζ = 0. Nyblom and Harvey [25] derive this test but then go on to show that it has no more power than η₂.

Figure 3. Flow of the Nile (- - -) and its estimated level (—).

4.2 Testing against a Slope

Taking first differences in Equation (3), and assuming for simplicity of notation that y_0 is also observed, gives

Δy_t = β_{t−1} + ω_t, t = 1, 2, …, T    (41)

where ω_t = η_t + Δε_t. More generally, ω_t may be a zero mean weakly dependent process with long-run variance σ²_L > 0.

4.2.1 Deterministic slope

When the slope in Equation (41) is assumed to be deterministic, a nonparametric test of the hypothesis that it is zero can be set up in first differences using the test statistic

t_β(l) = T^{1/2} β̄ / σ̂_L(l)    (42)

where β̄ = T^{−1} Σ_{t=1}^T Δy_t = (y_T − y_0)/T and the denominator is the square root of an estimator of the long-run variance based on l lags, as in Equation (40). Various options for the kernel and guidelines for choosing l may be found in Andrews [26]. Under the null hypothesis of no slope, that is β = 0, t_β(l) is asymptotically standard normal; see Busetti and Harvey [27]. The test can also be carried out by fitting a parametric model and testing the significance of the estimate of the fixed slope β. Either an ARMA model can be fitted to first differences or a structural time series model may be estimated. The latter may have some attraction when the series is such that it is natural to include components such as cycles and seasonals in a model for the levels.

4.2.2 Stochastic slope

Consider the model given by Equations (41) and (4). If ω_t ∼ NID(0, σ²_ω) and ζ_t ∼ NID(0, σ²_ζ), the LBI test of the null hypothesis of a deterministic slope against the alternative of a nonstationary stochastic slope, that is H_0: σ²_ζ = 0 against H_1: σ²_ζ > 0, is to reject for large values of

ζ = T^{−2} σ̂^{−2} Σ_{t=1}^T [ Σ_{i=1}^t (Δy_i − β̄) ]² = T^{−2} σ̂^{−2} Σ_{t=1}^T (y_t − y_0 − tβ̄)²    (43)

where σ̂² is the sample variance of the first differences. This is the test of Nyblom and Mäkeläinen [21] applied to first differences. Asymptotically, ζ has the Cramér-von Mises distribution, denoted CvM₁, under the null hypothesis.
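Both slope tests are easy to sketch in code. The names t_beta and zeta_statistic below follow Equations (42) and (43); the long-run variance inside t_beta uses the same Bartlett-weighted estimator as the KPSS statistic (40):

```python
import math

def t_beta(dy, l):
    """Nonparametric slope test (42): t = sqrt(T) * beta_bar / sigma_L(l),
    asymptotically N(0,1) under H0: beta = 0. dy holds first differences."""
    T = len(dy)
    beta = sum(dy) / T                       # beta_bar = (y_T - y_0)/T
    e = [v - beta for v in dy]
    s = sum(v * v for v in e) / T            # long-run variance, Bartlett weights
    for tau in range(1, l + 1):
        w = 1.0 - tau / (l + 1.0)
        s += 2.0 * w * sum(e[t] * e[t - tau] for t in range(tau, T)) / T
    return math.sqrt(T) * beta / math.sqrt(s)

def zeta_statistic(dy):
    """LBI statistic (43) for a deterministic against a stochastic slope:
    cumulate the demeaned first differences, exactly as in the level test."""
    T = len(dy)
    beta = sum(dy) / T
    s2 = sum((v - beta) ** 2 for v in dy) / T
    cum, num = 0.0, 0.0
    for v in dy:
        cum += v - beta
        num += cum * cum
    return num / (T * T * s2)
```

zeta_statistic is just the η statistic applied to first differences, which is how the Nyblom-Mäkeläinen connection arises.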
Parametric and nonparametric forms of the above-mentioned LBI tests may be constructed. In the nonparametric case, σ̂²_L(l) replaces σ̂², just as in the t_β(l) statistic of Equation (42). The properties of this test are investigated in Ref. 27. A parametric statistic can be constructed by fitting an ARIMA or structural time series model.

5 Tests of an Increase or Decrease in the Level

Grenander [28] considered the deterministic trend model

y_t = α + βt + υ_t, t = 1, …, T    (44)

where υ_t is a stationary indeterministic process, and suggested a test of significance, which could be one-sided, on b, the OLS estimator of β. The variance of b in this case is given by σ²_L / Σ(t − t̄)², where σ²_L is the long-run variance defined in Equation (39). While the stationary part of the model is handled nonparametrically, the assumption of a linear trend is very restrictive.

A test constructed to have power against the alternative of a deterministic but monotonically increasing (or decreasing) trend was suggested by Brillinger [29]. The test uses the linear combination Σ w_t y_t, where

w_t = {(t − 1)(1 − (t − 1)/T)}^{1/2} − {t(1 − t/T)}^{1/2}, t = 1, …, T    (45)

The weighting pattern means that it strongly contrasts the beginning and end levels of the series. The test statistic

τ = Σ w_t y_t / ( σ̂²_L Σ w_t² )^{1/2}    (46)

has a standard normal distribution when the level is constant, and the test is one-sided on the positive (negative) tail. The method suggested by Brillinger for computing an estimator of σ²_L is different from the one in Equation (40) in that it is based on a rectangular spectral window applied to the residuals from a simple MA.

When a stochastic trend is fitted, the relevant question may be whether there has been a change in the underlying level. This requires estimating the temporal (level) contrast, μ_T − μ_1, and its RMSE. To see how this is done, suppose a model contains a general stochastic trend of the form (3), (4) and consider estimating the more general contrast, μ_r − μ_s, where r > s. The smoothed estimators, μ̃_{r|T} and μ̃_{s|T}, are easily obtained. To find the RMSE of μ̃_{r|T} − μ̃_{s|T}, define

μ^{(s,r)}_t = { μ_t − μ_s for (s + 1) ≤ t ≤ r
             { μ_r − μ_s for t > r    (47)

and add it to the state vector by including the transition equation

μ^{(s,r)}_t = μ^{(s,r)}_{t−1} + β_{t−1} + δ^{(s,r)}_t η_t, t = s + 1, …, T    (48)

with μ^{(s,r)}_s = 0 and

δ^{(s,r)}_t = { 1 for (s + 1) ≤ t ≤ r
             { 0 for t > r    (49)

The estimator of μ^{(s,r)}_t is tracked by the Kalman filter. The estimator of μ_r − μ_s is given by μ̃^{(s,r)}_T, together with its RMSE. A test of significance, possibly one-sided, may be based on the standard normal distribution.
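Brillinger's weights (45) telescope: writing a(t) = {t(1 − t/T)}^{1/2}, we have w_t = a(t − 1) − a(t), so the weights sum to zero, are negative at the start of the series, and positive at the end. A sketch of the weights and the statistic (46), with the long-run variance estimate supplied by the caller (e.g., from a KPSS-type estimator or Brillinger's rectangular-window version):

```python
import math

def brillinger_weights(T):
    """Weights (45): w_t = sqrt((t-1)(1-(t-1)/T)) - sqrt(t(1-t/T)),
    t = 1..T. They contrast the end of the series against the beginning
    and telescope to zero."""
    a = lambda t: math.sqrt(t * (1.0 - t / T))
    return [a(t - 1) - a(t) for t in range(1, T + 1)]

def brillinger_tau(y, sigma2_L):
    """Statistic (46): tau = sum(w_t y_t) / sqrt(sigma_L^2 * sum(w_t^2)),
    standard normal when the level is constant. sigma2_L is a long-run
    variance estimate supplied by the caller."""
    w = brillinger_weights(len(y))
    num = sum(wt * yt for wt, yt in zip(w, y))
    den = math.sqrt(sigma2_L * sum(wt * wt for wt in w))
    return num / den
```

An increasing series loads positively on the end weights, producing a positive τ, which is why the test is one-sided on the positive tail for an increase.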
The temporal contrast test statistic is

( μ̃_{r|T} − μ̃_{s|T} ) / RMSE( μ̃_{r|T} − μ̃_{s|T} ) = μ̃^{(s,r)}_T / RMSE( μ̃^{(s,r)}_T )    (50)

The above-mentioned algorithm is basically a fixed-point smoother, as in Ref. [2, subsection 3.6.1]. Using a fixed-interval smoother requires that the covariance between the estimation errors at times s and r be computed. This may be done as outlined in Ref. 30, but adding the extra code may be tedious. However, if r − s is relatively large, and the trend evolves fairly rapidly, the covariance is negligible and the test statistic can be computed by noting that

MSE( μ̃_{r|T} − μ̃_{s|T} ) ≈ MSE( μ̃_{r|T} ) + MSE( μ̃_{s|T} ), s ≪ r    (51)

In the case where the two endpoints are to be compared, the above-mentioned MSE is given by 2MSE( μ̃_{T|T} ), and so it may be obtained from the Kalman filter output at the end of the series. However, in making such a comparison, it should be noted that the estimates of the trend at the end and beginning have a larger MSE than those within the sample. A more accurate contrast is given between estimates a few periods after the beginning and before the end.

Example 5. Nile Fitting a local level to the Nile data introduced earlier (Example 3) gives σ̂_ε = 122.9 and σ̂_η = 38.3. The estimated underlying level is 1111.7 at the beginning of the series and 798.4 at the end. The temporal contrast test statistic is 2.64, which is clearly significant.

Example 6. Mink In the mink series (see also Examples 1 and 4), the trend is deterministic, so the approximate formula (51) cannot be used. However, there is no need, as a direct test can be carried out on the t-statistic obtained by dividing the estimate of the slope by its RMSE. As this is based on a full model, the effect of the cyclical component is taken into account. However, as was demonstrated at the end of the previous section, a nonparametric estimator of the long-run variance needs to be constructed with some care.

Example 7. Global Warming Figure 4 shows the global annual surface land and marine air anomalies from 1880 to 1990 with respect to the 1950-1979 average; see Ref. 31. The smooth trend (IRW) model gives a good fit. The STAMP output shows that the level at the end of the series is 0.331 with an RMSE of 0.052. The level at the beginning, in 1880, is −0.423, and the difference is clearly significantly different from zero because the RMSE for the temporal contrast test statistic (Equation 50) is 0.076.

Figure 4.
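A minimal Kalman filter for the local level model makes the endpoint comparison concrete. This is a sketch only: it uses a simple diffuse initialization and produces filtered, rather than smoothed, quantities (a backward smoothing pass would be needed for the interior estimates in Equation 51). For a Nile-style comparison as in Example 5, one would set var_eps = 122.9² and var_eta = 38.3²:

```python
def local_level_filter(y, var_eta, var_eps):
    """Kalman filter for the local level model (1)-(2) with a diffuse
    start: m_1 = y_1, P_1 = var_eps. Returns the filtered level means
    and their MSEs at every time point."""
    m, P = y[0], var_eps
    means, mses = [m], [P]
    for obs in y[1:]:
        P += var_eta                 # prediction step: level is a random walk
        K = P / (P + var_eps)        # Kalman gain
        m += K * (obs - m)           # measurement update
        P *= (1.0 - K)
        means.append(m)
        mses.append(P)
    return means, mses
```

The MSE at the end of the series comes straight from the final filtered variance, as used in the endpoint version of the approximation (51). When var_eta = 0, the filter reduces to the recursive sample mean, with MSE = var_eps / t.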
Global annual surface land and marine air anomalies with respect to the 1950-1979 average (horizontal line), together with stochastic trend and deterministic trend (straight line).

Figure 5. Predictions of temperature (anomalies) made from fitting an IRW plus noise model to data up to 1990.

Example 8. Global Warming Revisited The accumulation of more observations since the data were analyzed for the first edition of this volume allows a genuine assessment of the performance of the model; see the HadCRUT4 data set at http://www.cru.uea.ac.uk/cru/data/temperature. Figure 5 shows the unconditional forecasts made from 1990 together with the realized observations. The performance of the model is impressive. If an AR(1) disturbance is fitted, the predictions, shown in the figure, are even better: the projected trend is lower and very close to the 2015 observation.

In a model with a stochastic trend, the lack of interaction between estimators of the level at the beginning and end of the series makes it easy to obtain the implicit weighting function used to construct μ̃_T. At the end of the series, it is simply the weights used to form the filtered estimator. At the beginning, the weights are the same but negative. Thus

μ̃_T = Σ_t w_t y_t    (52)

where, for the local level model,

w_t = λ(1 − λ)^{T−t} − λ(1 − λ)^{t−1}, t = 1, …, T

As in Equation (19), the rate of decay, λ, depends on the signal-to-noise ratio, q. If q = ∞, there is no irregular component and λ = 1; in this case, w_T = 1 and w_1 = −1, with MSE( μ̃_T ) = 0. The weighting pattern for the local linear trend model will show a slower initial decay, as it has a shape that derives from Figure 2. It is interesting to contrast these weighting patterns with the nonparametric weights used in Brillinger's test statistic (Equation 46). More complicated structural time-series models, including ones with nonstationary seasonal components, may be constructed and the test of a rise or fall in the level carried out in the same way.
The weighting pattern for μ̃_T may be obtained from the algorithm of Koopman and Harvey [10].
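The implicit weighting pattern (52) for the local level model is a one-liner: EWMA weights at the end of the series minus their mirror image at the beginning, summing to zero by construction. A sketch:

```python
def level_contrast_weights(T, lam):
    """Implicit weights (52) for the level contrast in the local level
    model: w_t = lam*(1-lam)^(T-t) - lam*(1-lam)^(t-1), t = 1..T."""
    return [lam * (1 - lam) ** (T - t) - lam * (1 - lam) ** (t - 1)
            for t in range(1, T + 1)]
```

Setting λ = 1 (the q = ∞ case with no irregular component) reproduces the pure endpoint contrast: w_T = 1, w_1 = −1, and zeros in between.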

6 Outliers and Structural Breaks

6.1 Detection

A stochastic trend allows for small breaks in every time period. However, there may sometimes be large structural breaks that arise because of some external event or policy change (see Change, Detecting). These can sometimes be detected by looking at a plot of the trend. A better way, which leads to formal tests, is to look at what Harvey and Koopman [32] call the auxiliary residuals. These are estimates of the various disturbances in the model. Thus, the level auxiliary residual is an estimate of η_t in Equation (2) or (3), and a large value indicates a shift in the level. If a break can be identified with a particular event, it is reasonable to model it by including a dummy variable in the model.

Example 9. Nile Fitting a local level model to the Nile data of Figure 3 shows the estimated trend falling around 1900. A plot of the level auxiliary residuals (see State-Space Methods) points to 1899 as being the likely date for a break, and in fact, this was when the first Aswan dam was built. If the local level is fitted with a dummy variable that is zero up to 1898 and one thereafter, σ²_η is estimated to be zero and so the level is fixed. Outlying observations can be similarly detected by the irregular auxiliary residuals.

6.2 Robust Estimation

Robust models offer an alternative way of dealing with outliers and breaks (see Robust Inference). Thus, allowing the irregular term, ε_t in Equation (1), to have a heavy-tailed distribution, such as Student's t, provides a way of dealing with outliers, while a heavy-tailed distribution in the transition equation allows for the possibility of structural breaks. Simulation techniques, such as those described in Durbin and Koopman [33] and Robert and Casella [34], are used to estimate the model.

Example 10.
Gas Consumption in the United Kingdom Estimating a Gaussian basic structural model for UK gas consumption produces a rather unappealing wobble in the seasonal component at the time North Sea gas was introduced in 1970. Durbin and Koopman [33] allow the irregular distribution to follow a t-distribution and estimate its degrees of freedom to be 13. The robust treatment of the atypical observations in 1970 produces a seasonal component that evolves more smoothly around that time.

An alternative approach to constructing a robust trend model is to adopt an observation-driven model in which the conditional distribution of the tth observation is specified to have a heavy-tailed distribution with a mean that is a linear combination of past scores; see Refs 35-37. Thus, for the simple case when the trend takes the form of a random walk plus drift,

y_t = μ_{t|t−1} + v_t, t = 1, …, T
μ_{t+1|t} = δ + μ_{t|t−1} + κ u_t    (53)

where κ is a parameter, v_t is a serially independent random variable with zero location and scale φ, and u_t is the conditional score with respect to location, that is, the first derivative of the logarithm of the conditional probability density function of y_t with respect to μ_{t|t−1}. When the conditional distribution is Student's t with f degrees of freedom,

u_t = ( 1 + (y_t − μ_{t|t−1})² / (φ² f) )^{−1} v_t    (54)
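A sketch of the recursion (53) with the Student-t score (54); the function names are illustrative, not from any particular package:

```python
def t_score(v, phi, f):
    """Conditional score (54) for a Student-t location model:
    u = v / (1 + v^2/(phi^2 f)), with v the prediction error.
    Redescending: u ~ v near zero, u -> 0 as |v| -> infinity."""
    return v / (1.0 + v * v / (phi * phi * f))

def dcs_random_walk_drift(y, delta, kappa, phi, f, mu0):
    """Observation-driven recursion (53):
    mu_{t+1|t} = delta + mu_{t|t-1} + kappa * u_t.
    Returns the sequence of one-step-ahead locations, starting from mu0."""
    mu, path = mu0, [mu0]
    for obs in y:
        u = t_score(obs - mu, phi, f)
        mu = delta + mu + kappa * u
        path.append(mu)
    return path
```

The redescending shape is what delivers the soft trimming discussed below: a wild observation produces a near-zero score and so barely moves the filtered location.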

Figure 6. Score for t_5 (thick dash) with score for EGB2 (with shape parameter ξ = 0.5) and normal (thin dash).

Figure 6 shows the impact, u, of a standardized (φ = 1) observation for five degrees of freedom. The Gaussian response is the 45° line. For low degrees of freedom, observations that would be seen as outliers for a Gaussian distribution are far less influential. The score is redescending in that, as y → ∞, the response tends to zero. As such, the score is an impact curve that exhibits a soft form of what is called trimming in the robust literature; see Maronna et al. [38]. Trimming may be contrasted with Winsorizing, where the response remains a nonzero constant after a certain point. Figure 6 shows a soft form of Winsorizing induced by modeling the observations with an EGB2 distribution, as in Refs 39 and 40.

The above-mentioned model belongs to the class of what are called dynamic conditional score (DCS) models in Ref. 35 and generalized autoregressive score (GAS) models in Ref. 36. These models have proved to be very effective for modeling volatility, especially in financial series, as well as for modeling location.

7 Multivariate Models

The local linear trend model can be generalized to the multivariate case in a straightforward manner by simply writing

y_t = μ_t + ε_t, ε_t ∼ NID(0, Σ_ε), t = 1, …, T    (55)
μ_t = μ_{t−1} + β_{t−1} + η_t, η_t ∼ NID(0, Σ_η), t = 1, …, T    (56)
β_t = β_{t−1} + ζ_t, ζ_t ∼ NID(0, Σ_ζ), t = 1, …, T    (57)

where y_t is an N × 1 vector and Σ_ε, Σ_η, and Σ_ζ are N × N covariance matrices. Seasonals and cycles can be added. The covariance matrices could reflect all types of influences. For example, if the series are for different regions, they may have some kind of spatial structure that captures geographical proximity. As well as having an interesting interpretation, multivariate models may provide more efficient inferences and forecasts. Common factor models, in which some of the covariance matrices are less than full rank, are of particular importance. The basic ideas can be illustrated with a bivariate local level model

y_1t = μ_1t + ε_1t, μ_1t = μ_{1,t−1} + η_1t, t = 1, …, T    (58)
y_2t = μ_2t + ε_2t, μ_2t = μ_{2,t−1} + η_2t, t = 1, …, T    (59)

The covariance matrix of (η_1t, η_2t)′ may be written

Σ_η = [ σ²_1η            ρ_η σ_1η σ_2η
        ρ_η σ_1η σ_2η    σ²_2η        ]    (60)

where ρ_η is the correlation. The model can be transformed as follows:

y_1t = μ_1t + ε_1t    (61)
y_2t = π μ_1t + μ†_t + ε_2t    (62)

where π = ρ_η σ_2η / σ_1η, and the covariance matrix of the disturbances driving the new multivariate random walk (μ_1t, μ†_t)′ is

var[ η_1t ]   [ σ²_1η    0                  ]
   [ η†_t ] = [ 0        σ²_2η(1 − ρ²_η)    ]    (63)

In other words, by setting η_2t = π η_1t + η†_t, two uncorrelated levels, μ_1t and μ†_t, based, respectively, on η_1t and η†_t, are obtained. If ρ_η = ±1, then μ†_t is constant and there is only one common trend, which will now be denoted as μ_t rather than as μ_1t. Hence

y_1t = μ_t + ε_1t, t = 1, …, T    (64)
y_2t = π μ_t + μ̄ + ε_2t, t = 1, …, T    (65)

If π = 1, the trend in the second series is always at a constant distance, μ̄, from the trend in the first series, that is μ_2t = μ̄ + μ_1t. This is known as balanced growth. Premultiplying the observation vector (y_2t, y_1t)′ in Equations (64) and (65) by the matrix

[ 1  −π
  0   1 ]

gives

y_2t = π y_1t + μ̄ + ε†_t    (66)
y_1t = μ_t + ε_1t    (67)

where ε†_t = ε_2t − π ε_1t and π = sgn(ρ_η) σ_2η / σ_1η. This is equivalent to the original common trends model and can be estimated directly. Because the linear combination y_2t − π y_1t is stationary, the nonstationary series y_2t and y_1t are said to be cointegrated. There is a vast amount of econometric literature on cointegration, and it is important to note the connection with common trends; see, for example, Ref. 41. These techniques have carried over to the climate change literature in recent years. For example, the 2013 article by Kaufmann et al. [42] concludes that "these results indicate that using the ideas of stochastic trends, cointegration, and error correction can generate reliable conclusions regarding the causes of changes in global surface temperature and more reliable forecasts of how temperature will evolve in the future."
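The decomposition behind Equations (60)-(63) is a small calculation: regressing η_2t on η_1t gives the loading π and the residual variance of η†_t. A sketch:

```python
def common_trend_decomposition(s1, s2, rho):
    """Decompose the level disturbances of a bivariate local level model:
    eta_2 = pi * eta_1 + eta_dagger, with pi = rho * s2 / s1 and
    var(eta_dagger) = s2^2 * (1 - rho^2), as in Equation (63).
    rho = +/-1 collapses the second level onto the first, leaving a
    single common trend (the cointegrated case)."""
    pi = rho * s2 / s1
    var_dagger = s2 * s2 * (1.0 - rho * rho)
    return pi, var_dagger
```

A perfectly correlated pair of level disturbances leaves no independent variation in the second level, which is exactly the common-trend restriction.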

Multivariate models can be used for estimating an intervention effect with a control group. Common trends can lead to significant gains in efficiency, as illustrated by the analysis of the British seat-belt law in Ref. 43.

8 Count Data and Qualitative Observations

Data are sometimes intrinsically non-Gaussian. For example, point processes give rise to count data or duration data, while qualitative responses, such as yes or no, may be coded as a binary series (see Binary Data). Most of the distributions used to model such data are from the exponential family. There are two main ways of dealing with stochastic trends. The first is to combine a nonlinear measurement equation with a linear transition equation. Thus, for a Poisson distribution

p(y_t) = λ_t^{y_t} e^{−λ_t} / y_t!    (68)

the mean, λ_t, may be connected to trend, cycle, and seasonal components by an exponential link function, that is, ln λ_t = μ_t + ψ_t + γ_t. The statistical treatment is by simulation methods; see Refs 44 and 45. Computer-intensive techniques can be applied within the classical framework or via Bayesian methods and modeling. Alternatively, simulation techniques can be avoided by developing observation-driven models, as in Ref. 44. One class of such models, described in Refs 33 and 34, makes use of conjugate filters. Thus, for Poisson observations, a gamma prior combines naturally to produce a gamma posterior and a likelihood function. The disadvantage is that only the level can be stochastic. Nevertheless, although the slope and seasonals have to be fixed, this may not be a major drawback for count and qualitative observations. As regards testing, Nyblom [47] shows that the η test against a random walk component can be employed with non-Gaussian observations, and Brillinger [48] generalizes the test statistic in Equation (46) so as to detect an increase or decrease in the level of binary data.

Related Articles

Change, Detecting; Forecasting, Environmental; Temporal Change; Trend Detecting.
References

[1] OUP (1995) The Concise Oxford Dictionary, 9th edn, Oxford University Press, Oxford.
[2] Harvey, A.C. (1989) Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge University Press, Cambridge.
[3] Jones, R.H. (1993) Longitudinal Data with Serial Correlation: A State-Space Approach, Chapman and Hall, London.
[4] Kitagawa, G. and Gersch, W. (1996) Smoothness Priors Analysis of Time Series, Springer-Verlag, Berlin.
[5] West, M. and Harrison, P.J. (1989) Bayesian Forecasting and Dynamic Models, Springer-Verlag, New York.
[6] Young, P. (1984) Recursive Estimation and Time Series Analysis, Springer-Verlag, Berlin.
[7] Koopman, S.J., Harvey, A.C., Doornik, J.A., and Shephard, N. (2009) STAMP 8.2: Structural Time Series Analysis Modeller and Predictor, Timberlake Consultants Ltd, London.
[8] Harvey, A.C. and Streibel, M. (1998) Tests for deterministic versus indeterministic cycles. J. Time Ser. Anal., 19, 505-529.
[9] Harvey, A.C. and Trimbur, T. (2003) General model-based filters for extracting trends and cycles in economic time series. Rev. Econ. Stat., 85, 244-255.
[10] Koopman, S.J. and Harvey, A.C. (2003) Computing observation weights for signal extraction and filtering. J. Econ. Dyn. Control, 27, 1317-1333.
[11] Whittle, P. (1983) Prediction and Regulation, 2nd edn, Blackwell, Oxford.
[12] Muth, J.F. (1960) Optimal properties of exponentially weighted forecasts. J. Am. Stat. Assoc., 55, 299-305.