Time Series
Anthony Davison © 2008
http://stat.epfl.ch


Contents

Periodogram (slide 76)
- Motivation (77)
- Luteinizing hormone data (78)
- Periodogram (79)
- Example: Sine wave with white noise (80)
- Example: Sine wave with white noise (81)
- Example: AR(1), 0.9 (82)
- Example: AR(1), −0.9 (83)
- Example (84)
- Comments (85)
- A trigonometric lemma (86)
- Properties of the periodogram (87)
- Reminder: Multivariate normal distribution (88)

Smoothing (slide 89)
- Motivation (90)
- Moving averages (91)
- Polynomial regression (92)
- Local polynomial regression (94)
- Local linear polynomial smoother (95)
- Comments (97)
- STL decomposition (98)
- Summary (101)

Week 3: Periodogram; Smoothing

Time Series Autumn 2008 slide 75

Periodogram

Time Series Autumn 2008 slide 76

Motivation
- Many series have periodic structure (e.g. sunspots, CO2 data, ...), but we may not know what the frequencies are in advance of looking at the data.
- The periodogram is a summary description based on representing the observed series as a superposition of sine and cosine waves of various frequencies.
- The idea is that the periodogram will tell us what frequencies are most important.
- Consider first the simple model

    Y_t = α cos(ωt) + β sin(ωt) + ε_t,   t = 1, ..., n,   (1)

  where {ε_t} is white noise, ω = 2π/p is the known frequency of the fluctuations, p is their known period, and α, β are unknown parameters to be estimated by least squares.
- As we can write α cos(ωt) + β sin(ωt) = (α² + β²)^{1/2} sin(ωt + γ), with γ = tan⁻¹(α/β), the right-hand side of equation (1) is equivalent to any sinusoidal function with frequency ω.

Time Series Autumn 2008 slide 77

Luteinizing hormone data
Data on luteinizing hormone in n = 48 successive blood samples from a woman, taken at 10-minute intervals. Fourier series for n = 48 are shown in the lower panel.
[Figure: the lh series (top) and Fourier series for n = 48 (bottom).]

Time Series Autumn 2008 slide 78
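The harmonic regression (1) is just a linear model, so it can be fitted by least squares in R. Below is a minimal sketch on the built-in lh data (the luteinizing hormone series above), assuming a trial period of p = 10 samples chosen purely for illustration; an intercept is included because the series is not centred.

```r
y <- as.numeric(lh)   # luteinizing hormone series, n = 48 (datasets::lh)
n <- length(y)
t <- 1:n
p <- 10               # trial period, an illustrative assumption
omega <- 2 * pi / p   # corresponding known frequency
fit <- lm(y ~ cos(omega * t) + sin(omega * t))  # least-squares fit of (1)
coef(fit)             # intercept, alpha-hat, beta-hat
```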

Periodogram

Definition 12
(a) If y_1, ..., y_n is an equally-spaced time series, its periodogram ordinate at frequency ω is defined as

    I(ω) = n⁻¹ [ {Σ_t y_t sin(ωt)}² + {Σ_t y_t cos(ωt)}² ],   0 < ω ≤ π.

(b) The periodogram is a plot of I(2πj/n) at the Fourier frequencies 2πj/n for j = 1, ..., m = ⌊(n − 1)/2⌋; I(π) is included only if n is even. By default R plots the log periodogram, log I(2πj/n), against j/n.
(c) The cumulative periodogram is the plot of

    C_r = Σ_{j=1}^{r} I(2πj/n) / Σ_{l=1}^{m} I(2πl/n),   r = 1, ..., m,

against the frequencies j/n for j = 1, ..., m.

Time Series Autumn 2008 slide 79

Example: Sine wave with white noise
Top: data from a simulated sine wave with added white noise. Bottom: log periodogram, with a red horizontal line showing the noise variance σ² = 0.25, and a green vertical line showing the signal frequency 1/200. The blue line shows the width of a 95% confidence interval for the true value at each point.
[Figure: "Sine wave with frequency 200 and white noise variance 0.25" (top); log periodogram against frequency, bandwidth = 0.000577 (bottom).]

Time Series Autumn 2008 slide 80
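Definition 12 translates directly into R. The sketch below (not the course's own code) computes the ordinates at the Fourier frequencies; spec.pgram does the same job, with its own scaling conventions.

```r
# Periodogram ordinates I(2*pi*j/n) for j = 1, ..., m = floor((n-1)/2).
pgram <- function(y) {
  n <- length(y)
  m <- floor((n - 1) / 2)
  t <- 1:n
  I <- sapply(2 * pi * (1:m) / n, function(w)
    (sum(y * sin(w * t))^2 + sum(y * cos(w * t))^2) / n)
  data.frame(freq = (1:m) / n, I = I)
}

plot(pgram(as.numeric(lh)), type = "h", log = "y")  # log periodogram of lh
```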

Example: Sine wave with white noise
Top: data from a simulated sine wave with added white noise. Bottom: log periodogram, with a red horizontal line showing the noise variance σ² = 1, and a green vertical line showing the signal frequency 1/20. The blue line shows the width of a 95% confidence interval for the true value at each point.
[Figure: "Sine wave with frequency 20 and white noise variance 1" (top); log periodogram against frequency, bandwidth = 0.000577 (bottom).]

Time Series Autumn 2008 slide 81

Example: AR(1), 0.9
Data from a simulated AR(1) model with parameter 0.9, with log periodogram and theoretical value (in red). The blue line shows the width of a 95% confidence interval for the true value at each point. The log scale on the vertical axis means there is a very large change in the periodogram itself.
[Figure: the simulated series (top); log periodogram with theoretical spectrum, bandwidth = 0.000577 (bottom).]

Time Series Autumn 2008 slide 82
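Simulated examples like these can be reproduced in outline with arima.sim and spectrum; a sketch, assuming the figures show the raw (untapered, undetrended) periodogram:

```r
set.seed(1)                               # arbitrary seed, for reproducibility
y <- arima.sim(list(ar = 0.9), n = 500)   # AR(1) with parameter 0.9
spectrum(y, taper = 0, detrend = FALSE)   # raw periodogram, log scale by default
```

Setting ar = -0.9 instead moves the power from low to high frequencies, as on the next slide.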

Example: AR(1), −0.9
Data from a simulated AR(1) model with parameter −0.9, with log periodogram and theoretical value (in red).
[Figure: the simulated series (top); log periodogram with theoretical spectrum, bandwidth = 0.00144 (bottom).]

Time Series Autumn 2008 slide 83

Example
Data on luteinizing hormone in n = 48 blood samples at 10-minute intervals from a human female. Top left: data; top right: periodogram; bottom left: possible Fourier series for n = 48; bottom right: cumulative periodogram.
[Figure: the four panels described above; periodogram bandwidth = 0.00601.]

Time Series Autumn 2008 slide 84
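The panels of this example can be approximated with standard R tools; cpgram from the MASS package draws the cumulative periodogram together with the diagonal Kolmogorov–Smirnov band discussed on the next slide.

```r
library(MASS)             # provides cpgram
op <- par(mfrow = c(2, 2))
plot.ts(lh)               # data
spectrum(lh, taper = 0)   # log periodogram
frame()                   # (Fourier-series panel omitted here)
cpgram(lh)                # cumulative periodogram with KS band
par(op)
```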

Comments
- Low-frequency variation (trend) appears at the left of the periodogram, and high-frequency variation (rapid oscillations) appears at the right.
- The rationale for considering only the frequencies ω = 2πj/n is that

    Σ_t y_t² = I(0) + 2 Σ_{j=1}^{m} I(2πj/n) + I(π),   (2)

  with I(π) included only if n is even. Thus the periodogram decomposes the total variability Σ_t y_t² of the data into components associated with each of these frequencies, plus one for the grand mean, I(0) = n ȳ², which we ignore because it is not periodic.
- The rationale for plotting the log periodogram is that the periodogram ordinates are (roughly) exponentially distributed, and the log transformation is variance-stabilising for the exponential distribution. A rough significance scale for the log periodogram is shown by the vertical line at its right.
- The cumulative periodogram provides a visual test of whether the series is white noise. We compare C_r with its expected value r/m: a large value of the Kolmogorov–Smirnov statistic D = max_r |C_r − r/m| suggests that the underlying series is not white noise. The test involves seeing whether the cumulative periodogram falls outside a diagonal band, whose width determines the size of the test.

Time Series Autumn 2008 slide 85

A trigonometric lemma

Lemma 13
(a) Let ω_j = 2πj/n, for n ∈ N and positive integer j < n/2, and write c_t = cos(tω_j), s_t = sin(tω_j). Then

    Σ_t c_t = Σ_t s_t = Σ_t s_t c_t = 0,   Σ_t s_t² = Σ_t c_t² = n/2.

(b) If ω_1 = 2πj_1/n and ω_2 = 2πj_2/n for positive integers j_1 ≠ j_2 < n/2, and we write c_{kt} = cos(tω_k), s_{kt} = sin(tω_k) for k = 1, 2, then

    Σ_t s_{1t} s_{2t} = Σ_t s_{1t} c_{2t} = Σ_t c_{1t} c_{2t} = 0.

(c) If n is odd, and we write s_{tj} = sin(tω_j), c_{tj} = cos(tω_j), ω_j = 2πj/n, for j = 1, ..., m = (n − 1)/2 and t = 1, ..., n, then the columns of the n × n matrix

    Q = | 1  s_{11}  c_{11}  s_{12}  c_{12}  ...  s_{1m}  c_{1m} |
        | 1  s_{21}  c_{21}  s_{22}  c_{22}  ...  s_{2m}  c_{2m} |
        | :    :       :       :       :           :       :    |
        | 1  s_{n1}  c_{n1}  s_{n2}  c_{n2}  ...  s_{nm}  c_{nm} |

are orthogonal: Q^T Q = diag(n, n/2, ..., n/2) = D, say.

Time Series Autumn 2008 slide 86
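Lemma 13(c) is easy to check numerically; a short sketch that builds Q for a small odd n and verifies that Q^T Q is the stated diagonal matrix:

```r
n <- 9; m <- (n - 1) / 2; t <- 1:n
Q <- cbind(1, do.call(cbind, lapply(1:m, function(j)
  cbind(sin(t * 2 * pi * j / n), cos(t * 2 * pi * j / n)))))
round(crossprod(Q), 10)   # Q^T Q = diag(9, 4.5, ..., 4.5); off-diagonals zero
```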

Properties of the periodogram

Theorem 14
If Y_1, ..., Y_n iid N(µ, σ²), then all the periodogram ordinates are independent, and
(a) the I(2πj/n), for j = 1, ..., m, are exponential random variables with common mean σ²;
(b) if n is even, I(π) ∼ σ²χ₁²; finally,
(c) the cumulative periodogram ordinate

    C_r = Σ_{j=1}^{r} I(2πj/n) / Σ_{l=1}^{m} I(2πl/n),   r = 1, ..., m,

has a beta(r, m − r) distribution, and so has mean r/m and variance r(m − r)/{m²(m + 1)}.

Part (a) of this result tells us that Gaussian white noise has a flat spectrum. In fact the assumption of Gaussianity is needed only for the independence, and the spectrum of non-Gaussian white noise is also flat.

Time Series Autumn 2008 slide 87

Reminder: Multivariate normal distribution

Definition 15
The vector random variable Y = (Y_1, ..., Y_n)^T is said to have the multivariate normal distribution with mean vector µ = (µ_1, ..., µ_n)^T (n × 1), whose ith element is µ_i = E(Y_i), and (co)variance matrix Ω (n × n), with (i, j) element ω_{ij} = cov(Y_i, Y_j), written Y ∼ N_n(µ, Ω), if its density function is

    f(y; µ, Ω) = (2π)^{−n/2} |Ω|^{−1/2} exp{ −½ (y − µ)^T Ω^{−1} (y − µ) },   y, µ ∈ R^n,

where Ω is a symmetric positive definite matrix. In this case its moment-generating function is

    E(e^{t^T Y}) = exp(t^T µ + ½ t^T Ω t),   t ∈ R^n.

In particular, if Y_1, ..., Y_n iid N(µ, σ²), then the mean vector and variance matrix are µ1_n and σ²I_n.

Lemma 16
If Y ∼ N_n(µ, Ω), and a (m × 1) and B (m × n) are constant, with B of rank m < n, then a + BY ∼ N_m(a + Bµ, BΩB^T).

Time Series Autumn 2008 slide 88
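Theorem 14(a) can likewise be illustrated by simulation, using the pgram function sketched earlier: for Gaussian white noise the ordinates should look like an exponential sample with mean σ².

```r
set.seed(2)
sigma2 <- 1
I <- pgram(rnorm(512, sd = sqrt(sigma2)))$I  # ordinates of Gaussian white noise
mean(I)                                      # should be close to sigma2 = 1
ks.test(I, "pexp", rate = 1 / sigma2)        # no evidence against exponential
```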

Smoothing

Time Series Autumn 2008 slide 89

Motivation
- The underlying model is Y_t = µ(t) + ε_t, where µ(t) is a smooth function of t and {ε_t} is stationary.
- Differencing removes (some) trend to give a (roughly) stationary series.
- Sometimes we want to examine the trend itself by smoothing the time series.
- Approaches:
  - moving average (simple, related to differencing)
  - polynomial (simple, doesn't work very well)
  - local polynomial (simple, easy to robustify)
  - spline (simple, similar to local polynomial)
  - STL decomposition (robust fitting of local polynomial, with seasonal effects)

Time Series Autumn 2008 slide 90

Moving averages
- Classical approach to smoothing: given data y_1, ..., y_n, replace y_t by (y_{t+1} + y_t + y_{t−1})/3, or in general construct the moving average of order 2p + 1,

    s_t = Σ_{j=−p}^{p} w_j y_{t+j},   t = p + 1, ..., n − p,   p ∈ N,

  with weights w_j satisfying Σ_j w_j = 1 and (usually) w_j > 0 and w_{−j} = w_j. This is an example of a linear filter.
- Fixes are possible near the ends, but usually p ≪ n, so the details are unimportant.
- Choose the weights by
  - iterating simple (equally-weighted) smoothers (example);
  - choosing a higher order to remove (or at least decrease) seasonality, for example taking p = 6, w_{−6} = w_6 = 1/24 and all other w_j = 1/12 (see the R sketch below);
  - taking a smaller order to highlight seasonality.

Time Series Autumn 2008 slide 91
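The moving averages above are one call to stats::filter; a sketch of the order-13 seasonal filter with p = 6, applied to the monthly co2 series:

```r
w <- c(1/24, rep(1/12, 11), 1/24)       # p = 6: half-weights at the two ends
stopifnot(all.equal(sum(w), 1))         # the weights sum to one
sm <- stats::filter(co2, w, sides = 2)  # centred moving average; NA at the ends
plot(co2); lines(sm, col = "red")       # seasonality smoothed away
```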

Polynomial regression
- Fit a polynomial of degree k to the data; that is, assume that

    Y_t = s(t) + ε_t = β_0 + β_1 t + ... + β_k t^k + ε_t,

  where {ε_t} is a stationary series.
- Choose the parameters β_0, ..., β_k to minimise the sum of squares

    Σ_t {y_t − s(t)}² = Σ_t {y_t − (β_0 + β_1 t + ... + β_k t^k)}²,

  giving the (k + 1) × 1 estimate β̂ = (X^T X)^{−1} X^T y, where y^T = (y_1, ..., y_n) and the (t, j) element of the n × (k + 1) matrix X is t^{j−1}.
- Comments:
  - sensitivity to observations at the extremities of the series often leads to a poor fit;
  - usually doesn't work well, because polynomials are too restrictive;
  - may need orthogonal polynomials to avoid numerical problems if n, k are large;
  - easily copes with missing values/unequally spaced observations.

Time Series Autumn 2008 slide 92

Example: Northern hemisphere temperatures
Temperature anomaly (°C) for years 0–1979 relative to the 1961–1990 instrumental average, with polynomials of degree k = 3 (blue), 10 (red), 20 (cyan).
[Figure: the moberg series with the three polynomial fits.]

Time Series Autumn 2008 slide 93
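In R the least-squares fit is a single lm call, and poly() generates orthogonal polynomials, avoiding the numerical problems just mentioned. A sketch on a stand-in series (the moberg data are not shipped with base R):

```r
y <- as.numeric(lh); t <- seq_along(y)  # stand-in series for illustration
k <- 3                                  # polynomial degree
fit <- lm(y ~ poly(t, k))               # orthogonal polynomial basis
plot(t, y); lines(t, fitted(fit))       # data with the fitted degree-k trend
```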

Local polynomial regression
- Fit a polynomial of degree k = 0, 1 or so to the data, but locally (see the picture on the next slide).
- Use kernel weights w(t − t_0) that downweight observations far from t_0, and minimise the weighted sum of squares

    Σ_t w(t − t_0) [ y_t − {β_0 + β_1(t − t_0) + ... + β_k(t − t_0)^k} ]²,

  giving β̂(t_0) = (X^T W X)^{−1} X^T W y, with y and X as before, and where the n × n diagonal matrix W contains the weights.
- Use β̂(t_0) to estimate the curve at t_0. Refit for numerous t_0 in 1 ≤ t_0 ≤ n, and interpolate the fitted values.
- Can robustify by downweighting observations with large residuals in an initial fit.
- Lowess (locally weighted scatterplot smoother) uses a nearest-neighbour smoother; with span p = 2/3 it uses the 2/3 of the data nearest to t_0 (see the R sketch below).
- Automatic choice of the bandwidth (or, equivalently, the degrees of freedom or degree of polynomial) for the kernel tends to be too small, owing to autocorrelation of the time series.

Time Series Autumn 2008 slide 94

Local linear polynomial smoother
Left: observations in the shaded part of the panel are weighted using the kernel shown at the foot, with h = 0.8, and the solid straight line is fitted by weighted least squares. The local estimate is the fitted value at t = t_0, shown by the vertical line. Two hundred local estimates formed using equi-spaced t_0 were interpolated to give the dotted line, which is the estimate of g(t). Right: local linear smoothers with h = 0.2 (solid) and h = 5 (dots).
[Figure: the two panels described above.]

Time Series Autumn 2008 slide 95
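Lowess as described above is built into R; a sketch with the default span f = 2/3, the fraction of the data used around each t_0, and with robustness iterations enabled:

```r
y <- as.numeric(lh); t <- seq_along(y)
plot(t, y)
lines(lowess(t, y, f = 2/3, iter = 3))  # iter > 0 downweights large residuals
```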

Example: Northern hemisphere temperatures
Temperature anomaly (°C) for years 0–1979 relative to the 1961–1990 instrumental average, with smoothing splines with degrees of freedom k = 3 (blue), 10 (red), 20 (cyan), and the automatically chosen (and much too big) value 158 (green).
[Figure: the moberg series with the four spline fits.]

Time Series Autumn 2008 slide 96

Comments
- Local polynomial smoothing is an example of nonparametric smoothing; another example is the use of smoothing splines.
- Such methods allow for local behaviour of the series, and so are preferable to global fitting of polynomials.
- They all
  - use a local fit;
  - depend on a bandwidth, related to the equivalent degrees of freedom: high bandwidth ⇒ low degrees of freedom ⇒ smooth fit, and small bandwidth ⇒ high degrees of freedom ⇒ wiggly fit;
  - have approaches to choosing the bandwidth automatically, though for time series this usually gives a fit that is too wiggly;
  - can be robustified, so that outliers have less impact on the fitted curves.

Time Series Autumn 2008 slide 97
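Smoothing splines at fixed equivalent degrees of freedom, as in the example above, are available via smooth.spline; a sketch on a stand-in series (again, the moberg data are not in base R):

```r
y <- as.numeric(lh); t <- seq_along(y)
plot(t, y)
dfs <- c(3, 10, 20); cols <- c("blue", "red", "cyan")
for (i in seq_along(dfs))                   # fits at fixed degrees of freedom
  lines(smooth.spline(t, y, df = dfs[i]), col = cols[i])
fit <- smooth.spline(t, y)                  # automatic (GCV) choice
fit$df                                      # often too big for time series
```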

STL decomposition
- An approach to removing overall trend and seasonal components; robust, and (in principle) copes with missing data (but the R function stl does not!).
- The underlying model is Y_t = U(t) + S(t) + ε_t, where U(t) is the trend, S(t) is the seasonal variation, and {ε_t} is stationary.
- Can fit a single seasonal component (next slide) or a slowly-varying one (next slide but one); see the R sketch below.
- Note how the seasonal component gradually increases in amplitude in the second plot: why?

Time Series Autumn 2008 slide 98

Example: Mauna Loa data
[Figure: STL decomposition of the CO2 series, 1960–1990, into data, seasonal, trend, and remainder panels, with a single seasonal component.]

Time Series Autumn 2008 slide 99

Example: Mauna Loa data
[Figure: STL decomposition as above, but with a slowly-varying seasonal component.]

Time Series Autumn 2008 slide 100
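Both decompositions can be reproduced with stl applied to the built-in co2 (Mauna Loa) series; s.window = "periodic" forces a single fixed seasonal component, while a finite odd span (here 35, an arbitrary illustrative choice) lets it vary slowly:

```r
plot(stl(co2, s.window = "periodic"))  # single seasonal component
plot(stl(co2, s.window = 35))          # slowly-varying seasonal component
```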

Summary
Today we talked about:
- the periodogram, which
  - decomposes the total variation into frequency components;
  - provides a test of white noise based on the cumulative periodogram;
- trend/seasonality estimation by smoothing:
  - moving average;
  - polynomial fitting;
  - local polynomial fitting;
  - robust local polynomial fitting;
  - STL decomposition.
Next time: detailed consideration of the AR(1) model.

Time Series Autumn 2008 slide 101