Time Series
Anthony Davison © 2008
http://stat.epfl.ch


Contents

Periodogram (slide 76)
- Motivation (77)
- Luteinizing hormone data (78)
- Periodogram (79)
- Example: Sine wave with white noise (80)
- Example: Sine wave with white noise (81)
- Example: AR(1), 0.9 (82)
- Example: AR(1), −0.9 (83)
- Example (84)
- Comments (85)
- A trigonometric lemma (86)
- Properties of the periodogram (87)
- Reminder: Multivariate normal distribution (88)

Smoothing (slide 89)
- Motivation (90)
- Moving averages (91)
- Polynomial regression (92)
- Local polynomial regression (94)
- Local linear polynomial smoother (95)
- Comments (97)
- STL decomposition (98)
- Summary (101)

Week 3: Periodogram; Smoothing

Time Series Autumn 2008 slide 75

Periodogram

Time Series Autumn 2008 slide 76

Motivation
- Many series have periodic structure (e.g. sunspots, CO2 data, ...), but we may not know what the frequencies are in advance of looking at the data.
- The periodogram is a summary description based on representing the observed series as a superposition of sine and cosine waves of various frequencies.
- The idea is that the periodogram will tell us what frequencies are most important.
- Consider first the simple model

    Y_t = α cos(ωt) + β sin(ωt) + ε_t,   t = 1, ..., n,   (1)

  where {ε_t} is white noise, ω = 2π/p is the known frequency of the fluctuations, p is their known period, and α, β are unknown parameters to be estimated by least squares.
- As we can write α cos(ωt) + β sin(ωt) = (α² + β²)^{1/2} sin(ωt + γ), with γ = tan⁻¹(α/β), the right-hand side of equation (1) is equivalent to any sinusoidal function with frequency ω.

Time Series Autumn 2008 slide 77

Luteinizing hormone data
Data on luteinizing hormone in n = 48 successive blood samples from a woman, taken at 10-minute intervals. Fourier series for n = 48 are shown in the lower panel.
[Figure: the lh series (top) and Fourier series for n = 48 (bottom).]

Time Series Autumn 2008 slide 78
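The harmonic regression (1) is just a linear model, so it can be fitted by least squares in R. Below is a minimal sketch on the built-in lh data (the luteinizing hormone series above), assuming a trial period of p = 10 samples chosen purely for illustration; an intercept is included because the series is not centred.

```r
y <- as.numeric(lh)   # luteinizing hormone series, n = 48 (datasets::lh)
n <- length(y)
t <- 1:n
p <- 10               # trial period, an illustrative assumption
omega <- 2 * pi / p   # corresponding known frequency
fit <- lm(y ~ cos(omega * t) + sin(omega * t))  # least-squares fit of (1)
coef(fit)             # intercept, alpha-hat, beta-hat
```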

Periodogram

Definition 12
(a) If y_1, ..., y_n is an equally-spaced time series, its periodogram ordinate at frequency ω is defined as

    I(ω) = n⁻¹ [ {Σ_t y_t sin(ωt)}² + {Σ_t y_t cos(ωt)}² ],   0 < ω ≤ π.

(b) The periodogram is a plot of I(2πj/n) at the Fourier frequencies 2πj/n for j = 1, ..., m = ⌊(n − 1)/2⌋; I(π) is included only if n is even. By default R plots the log periodogram, log I(2πj/n), against j/n.
(c) The cumulative periodogram is the plot of

    C_r = Σ_{j=1}^{r} I(2πj/n) / Σ_{l=1}^{m} I(2πl/n),   r = 1, ..., m,

against the frequencies j/n for j = 1, ..., m.

Time Series Autumn 2008 slide 79

Example: Sine wave with white noise
Top: data from a simulated sine wave with added white noise. Bottom: log periodogram, with a red horizontal line showing the noise variance σ² = 0.25, and a green vertical line showing the signal frequency 1/200. The blue line shows the width of a 95% confidence interval for the true value at each point.
[Figure: "Sine wave with frequency 200 and white noise variance 0.25" (top); log periodogram against frequency, bandwidth = 0.000577 (bottom).]

Time Series Autumn 2008 slide 80
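Definition 12 translates directly into R. The sketch below (not the course's own code) computes the ordinates at the Fourier frequencies; spec.pgram does the same job, with its own scaling conventions.

```r
# Periodogram ordinates I(2*pi*j/n) for j = 1, ..., m = floor((n-1)/2).
pgram <- function(y) {
  n <- length(y)
  m <- floor((n - 1) / 2)
  t <- 1:n
  I <- sapply(2 * pi * (1:m) / n, function(w)
    (sum(y * sin(w * t))^2 + sum(y * cos(w * t))^2) / n)
  data.frame(freq = (1:m) / n, I = I)
}

plot(pgram(as.numeric(lh)), type = "h", log = "y")  # log periodogram of lh
```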

Example: Sine wave with white noise
Top: data from a simulated sine wave with added white noise. Bottom: log periodogram, with a red horizontal line showing the noise variance σ² = 1, and a green vertical line showing the signal frequency 1/20. The blue line shows the width of a 95% confidence interval for the true value at each point.
[Figure: "Sine wave with frequency 20 and white noise variance 1" (top); log periodogram against frequency, bandwidth = 0.000577 (bottom).]

Time Series Autumn 2008 slide 81

Example: AR(1), 0.9
Data from a simulated AR(1) model with parameter 0.9, with log periodogram and theoretical value (in red). The blue line shows the width of a 95% confidence interval for the true value at each point. The log scale on the vertical axis means there is a very large change in the periodogram itself.
[Figure: the simulated series (top); log periodogram with theoretical spectrum, bandwidth = 0.000577 (bottom).]

Time Series Autumn 2008 slide 82
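Simulated examples like these can be reproduced in outline with arima.sim and spectrum; a sketch, assuming the figures show the raw (untapered, undetrended) periodogram:

```r
set.seed(1)                               # arbitrary seed, for reproducibility
y <- arima.sim(list(ar = 0.9), n = 500)   # AR(1) with parameter 0.9
spectrum(y, taper = 0, detrend = FALSE)   # raw periodogram, log scale by default
```

Setting ar = -0.9 instead moves the power from low to high frequencies, as on the next slide.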

Example: AR(1), −0.9
Data from a simulated AR(1) model with parameter −0.9, with log periodogram and theoretical value (in red).
[Figure: the simulated series (top); log periodogram with theoretical spectrum, bandwidth = 0.00144 (bottom).]

Time Series Autumn 2008 slide 83

Example
Data on luteinizing hormone in n = 48 blood samples at 10-minute intervals from a human female. Top left: data; top right: periodogram; bottom left: possible Fourier series for n = 48; bottom right: cumulative periodogram.
[Figure: the four panels described above; periodogram bandwidth = 0.00601.]

Time Series Autumn 2008 slide 84
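The panels of this example can be approximated with standard R tools; cpgram from the MASS package draws the cumulative periodogram together with the diagonal Kolmogorov–Smirnov band discussed on the next slide.

```r
library(MASS)             # provides cpgram
op <- par(mfrow = c(2, 2))
plot.ts(lh)               # data
spectrum(lh, taper = 0)   # log periodogram
frame()                   # (Fourier-series panel omitted here)
cpgram(lh)                # cumulative periodogram with KS band
par(op)
```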

Comments
- Low-frequency variation (trend) appears at the left of the periodogram, and high-frequency variation (rapid oscillations) appears at the right.
- The rationale for considering only the frequencies ω = 2πj/n is that

    Σ_t y_t² = I(0) + 2 Σ_{j=1}^{m} I(2πj/n) + I(π),   (2)

  with I(π) included only if n is even. Thus the periodogram decomposes the total variability Σ_t y_t² of the data into components associated with each of these frequencies, plus one for the grand mean, I(0) = n ȳ², which we ignore because it is not periodic.
- The rationale for plotting the log periodogram is that the periodogram ordinates are (roughly) exponentially distributed, and the log transformation is variance-stabilising for the exponential distribution. A rough significance scale for the log periodogram is shown by the vertical line at its right.
- The cumulative periodogram provides a visual test of whether the series is white noise. We compare C_r with its expected value r/m: a large value of the Kolmogorov–Smirnov statistic D = max_r |C_r − r/m| suggests that the underlying series is not white noise. The test involves seeing whether the cumulative periodogram falls outside a diagonal band, whose width determines the size of the test.

Time Series Autumn 2008 slide 85

A trigonometric lemma

Lemma 13
(a) Let ω_j = 2πj/n, for n ∈ N and positive integer j < n/2, and write c_t = cos(tω_j), s_t = sin(tω_j). Then

    Σ_t c_t = Σ_t s_t = Σ_t s_t c_t = 0,   Σ_t s_t² = Σ_t c_t² = n/2.

(b) If ω_1 = 2πj_1/n and ω_2 = 2πj_2/n for positive integers j_1 ≠ j_2 < n/2, and we write c_{kt} = cos(tω_k), s_{kt} = sin(tω_k) for k = 1, 2, then

    Σ_t s_{1t} s_{2t} = Σ_t s_{1t} c_{2t} = Σ_t c_{1t} c_{2t} = 0.

(c) If n is odd, and we write s_{tj} = sin(tω_j), c_{tj} = cos(tω_j), ω_j = 2πj/n, for j = 1, ..., m = (n − 1)/2 and t = 1, ..., n, then the columns of the n × n matrix

    Q = | 1  s_{11}  c_{11}  s_{12}  c_{12}  ...  s_{1m}  c_{1m} |
        | 1  s_{21}  c_{21}  s_{22}  c_{22}  ...  s_{2m}  c_{2m} |
        | :    :       :       :       :           :       :    |
        | 1  s_{n1}  c_{n1}  s_{n2}  c_{n2}  ...  s_{nm}  c_{nm} |

are orthogonal: Q^T Q = diag(n, n/2, ..., n/2) = D, say.

Time Series Autumn 2008 slide 86
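Lemma 13(c) is easy to check numerically; a short sketch that builds Q for a small odd n and verifies that Q^T Q is the stated diagonal matrix:

```r
n <- 9; m <- (n - 1) / 2; t <- 1:n
Q <- cbind(1, do.call(cbind, lapply(1:m, function(j)
  cbind(sin(t * 2 * pi * j / n), cos(t * 2 * pi * j / n)))))
round(crossprod(Q), 10)   # Q^T Q = diag(9, 4.5, ..., 4.5); off-diagonals zero
```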

Properties of the periodogram

Theorem 14
If Y_1, ..., Y_n iid N(µ, σ²), then all the periodogram ordinates are independent, and
(a) the I(2πj/n), for j = 1, ..., m, are exponential random variables with common mean σ²;
(b) if n is even, I(π) ∼ σ²χ₁²; finally,
(c) the cumulative periodogram ordinate

    C_r = Σ_{j=1}^{r} I(2πj/n) / Σ_{l=1}^{m} I(2πl/n),   r = 1, ..., m,

has a beta(r, m − r) distribution, and so has mean r/m and variance r(m − r)/{m²(m + 1)}.

Part (a) of this result tells us that Gaussian white noise has a flat spectrum. In fact the assumption of Gaussianity is needed only for the independence, and the spectrum of non-Gaussian white noise is also flat.

Time Series Autumn 2008 slide 87

Reminder: Multivariate normal distribution

Definition 15
The vector random variable Y = (Y_1, ..., Y_n)^T is said to have the multivariate normal distribution with mean vector µ = (µ_1, ..., µ_n)^T (n × 1), whose ith element is µ_i = E(Y_i), and (co)variance matrix Ω (n × n), with (i, j) element ω_{ij} = cov(Y_i, Y_j), written Y ∼ N_n(µ, Ω), if its density function is

    f(y; µ, Ω) = (2π)^{−n/2} |Ω|^{−1/2} exp{ −½ (y − µ)^T Ω^{−1} (y − µ) },   y, µ ∈ R^n,

where Ω is a symmetric positive definite matrix. In this case its moment-generating function is

    E(e^{t^T Y}) = exp(t^T µ + ½ t^T Ω t),   t ∈ R^n.

In particular, if Y_1, ..., Y_n iid N(µ, σ²), then the mean vector and variance matrix are µ1_n and σ²I_n.

Lemma 16
If Y ∼ N_n(µ, Ω), and a (m × 1) and B (m × n) are constant, with B of rank m < n, then a + BY ∼ N_m(a + Bµ, BΩB^T).

Time Series Autumn 2008 slide 88
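Theorem 14(a) can likewise be illustrated by simulation, using the pgram function sketched earlier: for Gaussian white noise the ordinates should look like an exponential sample with mean σ².

```r
set.seed(2)
sigma2 <- 1
I <- pgram(rnorm(512, sd = sqrt(sigma2)))$I  # ordinates of Gaussian white noise
mean(I)                                      # should be close to sigma2 = 1
ks.test(I, "pexp", rate = 1 / sigma2)        # no evidence against exponential
```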

Smoothing

Time Series Autumn 2008 slide 89

Motivation
- The underlying model is Y_t = µ(t) + ε_t, where µ(t) is a smooth function of t and {ε_t} is stationary.
- Differencing removes (some) trend to give a (roughly) stationary series.
- Sometimes we want to examine the trend itself by smoothing the time series.
- Approaches:
  - moving average (simple, related to differencing)
  - polynomial (simple, doesn't work very well)
  - local polynomial (simple, easy to robustify)
  - spline (simple, similar to local polynomial)
  - STL decomposition (robust fitting of local polynomial, with seasonal effects)

Time Series Autumn 2008 slide 90

Moving averages
- Classical approach to smoothing: given data y_1, ..., y_n, replace y_t by (y_{t+1} + y_t + y_{t−1})/3, or in general construct the moving average of order 2p + 1,

    s_t = Σ_{j=−p}^{p} w_j y_{t+j},   t = p + 1, ..., n − p,   p ∈ N,

  with weights w_j satisfying Σ_j w_j = 1 and (usually) w_j > 0 and w_{−j} = w_j. This is an example of a linear filter.
- Fixes are possible near the ends, but usually p ≪ n, so the details are unimportant.
- Choose the weights by
  - iterating simple (equally-weighted) smoothers (example);
  - choosing a higher order to remove (or at least decrease) seasonality, for example taking p = 6, w_{−6} = w_6 = 1/24 and all other w_j = 1/12 (see the R sketch below);
  - taking a smaller order to highlight seasonality.

Time Series Autumn 2008 slide 91
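The moving averages above are one call to stats::filter; a sketch of the order-13 seasonal filter with p = 6, applied to the monthly co2 series:

```r
w <- c(1/24, rep(1/12, 11), 1/24)       # p = 6: half-weights at the two ends
stopifnot(all.equal(sum(w), 1))         # the weights sum to one
sm <- stats::filter(co2, w, sides = 2)  # centred moving average; NA at the ends
plot(co2); lines(sm, col = "red")       # seasonality smoothed away
```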

Polynomial regression
- Fit a polynomial of degree k to the data; that is, assume that

    Y_t = s(t) + ε_t = β_0 + β_1 t + ... + β_k t^k + ε_t,

  where {ε_t} is a stationary series.
- Choose the parameters β_0, ..., β_k to minimise the sum of squares

    Σ_t {y_t − s(t)}² = Σ_t {y_t − (β_0 + β_1 t + ... + β_k t^k)}²,

  giving the (k + 1) × 1 estimate β̂ = (X^T X)^{−1} X^T y, where y^T = (y_1, ..., y_n) and the (t, j) element of the n × (k + 1) matrix X is t^{j−1}.
- Comments:
  - sensitivity to observations at the extremities of the series often leads to a poor fit;
  - usually doesn't work well, because polynomials are too restrictive;
  - may need orthogonal polynomials to avoid numerical problems if n, k are large;
  - easily copes with missing values/unequally spaced observations.

Time Series Autumn 2008 slide 92

Example: Northern hemisphere temperatures
Temperature anomaly (°C) for years 0–1979 relative to the 1961–1990 instrumental average, with polynomials of degree k = 3 (blue), 10 (red), 20 (cyan).
[Figure: the moberg series with the three polynomial fits.]

Time Series Autumn 2008 slide 93
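In R the least-squares fit is a single lm call, and poly() generates orthogonal polynomials, avoiding the numerical problems just mentioned. A sketch on a stand-in series (the moberg data are not shipped with base R):

```r
y <- as.numeric(lh); t <- seq_along(y)  # stand-in series for illustration
k <- 3                                  # polynomial degree
fit <- lm(y ~ poly(t, k))               # orthogonal polynomial basis
plot(t, y); lines(t, fitted(fit))       # data with the fitted degree-k trend
```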

Local polynomial regression
- Fit a polynomial of degree k = 0, 1 or so to the data, but locally (see the picture on the next slide).
- Use kernel weights w(t − t_0) that downweight observations far from t_0, and minimise the weighted sum of squares

    Σ_t w(t − t_0) [ y_t − {β_0 + β_1(t − t_0) + ... + β_k(t − t_0)^k} ]²,

  giving β̂(t_0) = (X^T W X)^{−1} X^T W y, with y and X as before, and where the n × n diagonal matrix W contains the weights.
- Use β̂(t_0) to estimate the curve at t_0. Refit for numerous t_0 in 1 ≤ t_0 ≤ n, and interpolate the fitted values.
- Can robustify by downweighting observations with large residuals in an initial fit.
- Lowess (locally weighted scatterplot smoother) uses a nearest-neighbour smoother; with span p = 2/3 it uses the 2/3 of the data nearest to t_0 (see the R sketch below).
- Automatic choice of the bandwidth (or, equivalently, the degrees of freedom or degree of polynomial) for the kernel tends to be too small, owing to autocorrelation of the time series.

Time Series Autumn 2008 slide 94

Local linear polynomial smoother
Left: observations in the shaded part of the panel are weighted using the kernel shown at the foot, with h = 0.8, and the solid straight line is fitted by weighted least squares. The local estimate is the fitted value at t = t_0, shown by the vertical line. Two hundred local estimates formed using equi-spaced t_0 were interpolated to give the dotted line, which is the estimate of g(t). Right: local linear smoothers with h = 0.2 (solid) and h = 5 (dots).
[Figure: the two panels described above.]

Time Series Autumn 2008 slide 95
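Lowess as described above is built into R; a sketch with the default span f = 2/3, the fraction of the data used around each t_0, and with robustness iterations enabled:

```r
y <- as.numeric(lh); t <- seq_along(y)
plot(t, y)
lines(lowess(t, y, f = 2/3, iter = 3))  # iter > 0 downweights large residuals
```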

Example: Northern hemisphere temperatures
Temperature anomaly (°C) for years 0–1979 relative to the 1961–1990 instrumental average, with smoothing splines with degrees of freedom k = 3 (blue), 10 (red), 20 (cyan), and the automatically chosen (and much too big) value 158 (green).
[Figure: the moberg series with the four spline fits.]

Time Series Autumn 2008 slide 96

Comments
- Local polynomial smoothing is an example of nonparametric smoothing; another example is the use of smoothing splines.
- Such methods allow for local behaviour of the series, and so are preferable to global fitting of polynomials.
- They all
  - use a local fit;
  - depend on a bandwidth, related to the equivalent degrees of freedom: high bandwidth ⇒ low degrees of freedom ⇒ smooth fit, and small bandwidth ⇒ high degrees of freedom ⇒ wiggly fit;
  - have approaches to choosing the bandwidth automatically, though for time series this usually gives a fit that is too wiggly;
  - can be robustified, so that outliers have less impact on the fitted curves.

Time Series Autumn 2008 slide 97
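Smoothing splines at fixed equivalent degrees of freedom, as in the example above, are available via smooth.spline; a sketch on a stand-in series (again, the moberg data are not in base R):

```r
y <- as.numeric(lh); t <- seq_along(y)
plot(t, y)
dfs <- c(3, 10, 20); cols <- c("blue", "red", "cyan")
for (i in seq_along(dfs))                   # fits at fixed degrees of freedom
  lines(smooth.spline(t, y, df = dfs[i]), col = cols[i])
fit <- smooth.spline(t, y)                  # automatic (GCV) choice
fit$df                                      # often too big for time series
```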

STL decomposition
- An approach to removing overall trend and seasonal components; robust, and (in principle) copes with missing data (but the R function stl does not!).
- The underlying model is Y_t = U(t) + S(t) + ε_t, where U(t) is the trend, S(t) is the seasonal variation, and {ε_t} is stationary.
- Can fit a single seasonal component (next slide) or a slowly-varying one (next slide but one); see the R sketch below.
- Note how the seasonal component gradually increases in amplitude in the second plot: why?

Time Series Autumn 2008 slide 98

Example: Mauna Loa data
[Figure: STL decomposition of the CO2 series, 1960–1990, into data, seasonal, trend, and remainder panels, with a single seasonal component.]

Time Series Autumn 2008 slide 99

Example: Mauna Loa data
[Figure: STL decomposition as above, but with a slowly-varying seasonal component.]

Time Series Autumn 2008 slide 100
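Both decompositions can be reproduced with stl applied to the built-in co2 (Mauna Loa) series; s.window = "periodic" forces a single fixed seasonal component, while a finite odd span (here 35, an arbitrary illustrative choice) lets it vary slowly:

```r
plot(stl(co2, s.window = "periodic"))  # single seasonal component
plot(stl(co2, s.window = 35))          # slowly-varying seasonal component
```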

Summary
Today we talked about:
- the periodogram, which
  - decomposes the total variation into frequency components;
  - provides a test of white noise based on the cumulative periodogram;
- trend/seasonality estimation by smoothing:
  - moving average;
  - polynomial fitting;
  - local polynomial fitting;
  - robust local polynomial fitting;
  - STL decomposition.
Next time: detailed consideration of the AR(1) model.

Time Series Autumn 2008 slide 101