Akaike criterion: Kullback-Leibler discrepancy


Model choice. Akaike's criterion

Given a family of probability densities $\{f(\cdot\,;\psi),\ \psi\in\Psi\}$, the Kullback-Leibler index of $f(\cdot\,;\psi)$ relative to $f(\cdot\,;\theta)$ is

$$\Delta(\psi\mid\theta) = E_\theta\big(-2\log f(X;\psi)\big) = -2\int_{\mathbb{R}^n}\log(f(x;\psi))\,f(x;\theta)\,dx.$$

The Kullback-Leibler discrepancy between $f(\cdot\,;\psi)$ and $f(\cdot\,;\theta)$ is

$$d(\psi\mid\theta) = \Delta(\psi\mid\theta) - \Delta(\theta\mid\theta) = -2\int_{\mathbb{R}^n}\log\left(\frac{f(x;\psi)}{f(x;\theta)}\right)f(x;\theta)\,dx.$$

Jensen's inequality implies $E(\log Y)\le\log(E(Y))$ for any positive random variable $Y$. Hence

$$d(\psi\mid\theta) \ \ge\ -2\log\int_{\mathbb{R}^n}\frac{f(x;\psi)}{f(x;\theta)}\,f(x;\theta)\,dx \ =\ -2\log(1) \ =\ 0,$$

with equality only if $f(x;\psi) = f(x;\theta)$ a.e. $[f(\cdot\,;\theta)]$.
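To make the definitions concrete, here is a minimal numerical sketch (not from the original slides; the Gaussian family and the parameter values are illustrative assumptions) that evaluates $d(\psi\mid\theta)$ by integrating against the true density:

```r
## Sketch: Kullback-Leibler discrepancy d(psi | theta) for two univariate
## Gaussian models, by numerical integration against the true density.
kl_discrepancy <- function(mu_psi, sd_psi, mu_theta, sd_theta) {
  integrand <- function(x) {
    # -2 * log(f(x; psi) / f(x; theta)), weighted by the true density f(x; theta)
    -2 * (dnorm(x, mu_psi, sd_psi, log = TRUE) -
          dnorm(x, mu_theta, sd_theta, log = TRUE)) *
      dnorm(x, mu_theta, sd_theta)
  }
  integrate(integrand, lower = -Inf, upper = Inf)$value
}

kl_discrepancy(1, 2, 0, 1)  # > 0 when the candidate differs from the truth
kl_discrepancy(0, 1, 0, 1)  # = 0 exactly when the densities coincide a.e.
```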

Approximating the Kullback-Leibler discrepancy

Given observations $X_1,\dots,X_n$, we would like to minimize $d(\psi\mid\theta)$ over all candidate models $\psi$, where $\theta$ is the true model. As the true model is unknown, we estimate $d(\psi\mid\theta)$ instead.

Let $\psi = (\phi,\vartheta,\sigma^2)$ be the parameters of an ARMA(p,q) model and $\hat\psi$ the MLE based on $X_1,\dots,X_n$. Let $Y$ be an independent realization of the same process. Then

$$-2\log L_Y(\hat\phi,\hat\vartheta,\hat\sigma^2) = n\log(2\pi) + n\log\hat\sigma^2 + \log(r_0\cdots r_{n-1}) + \frac{S_Y(\hat\phi,\hat\vartheta)}{\hat\sigma^2}.$$

Indeed, recall that for an ARMA(p,q) process

$$L(\phi,\vartheta,\sigma^2) = (2\pi\sigma^2)^{-n/2}(r_0\cdots r_{n-1})^{-1/2}\exp\left\{-\frac{1}{2\sigma^2}S(\phi,\vartheta)\right\},
\qquad S(\phi,\vartheta) = \sum_{j=1}^{n}\frac{(x_j-\hat x_j)^2}{r_{j-1}}.$$

The quantities $r_0,\dots,r_{n-1}$ depend only on the parameters $(\phi,\vartheta)$ and not on the observed data; the data enter the likelihood only through the terms $(x_j-\hat x_j)^2$ in $S(\phi,\vartheta)$.

Moreover,

$$-2\log L_Y(\hat\phi,\hat\vartheta,\hat\sigma^2)
= -2\log L_X(\hat\phi,\hat\vartheta,\hat\sigma^2) + \frac{S_Y(\hat\phi,\hat\vartheta)}{\hat\sigma^2} - \frac{S_X(\hat\phi,\hat\vartheta)}{\hat\sigma^2}
= -2\log L_X(\hat\phi,\hat\vartheta,\hat\sigma^2) + \frac{S_Y(\hat\phi,\hat\vartheta)}{\hat\sigma^2} - n,$$

since $n\hat\sigma^2 = S_X(\hat\phi,\hat\vartheta)$. Taking expectations under the true model,

$$E_\theta\big(\Delta(\hat\psi\mid\theta)\big) = E_{(\phi,\vartheta,\sigma^2)}\big(-2\log L_X(\hat\phi,\hat\vartheta,\hat\sigma^2)\big) + E_{(\phi,\vartheta,\sigma^2)}\left(\frac{S_Y(\hat\phi,\hat\vartheta)}{\hat\sigma^2} - n\right).$$

Kullback-Leibler discrepancy and AICC

Using linear approximations and the asymptotic distributions of the estimators, one arrives at

$$E_{(\phi,\vartheta,\sigma^2)}\big(S_Y(\hat\phi,\hat\vartheta)\big) \approx \sigma^2(n+p+q).$$

Similarly, $n\hat\sigma^2 = S_X(\hat\phi,\hat\vartheta)$ is, for large $n$, distributed approximately as $\sigma^2\chi^2(n-p-q-2)$ and is asymptotically independent of $(\hat\phi,\hat\vartheta)$. Hence

$$E_{(\phi,\vartheta,\sigma^2)}\left(\frac{S_Y(\hat\phi,\hat\vartheta)}{\hat\sigma^2}\right) \approx \frac{\sigma^2(n+p+q)}{\sigma^2(n-p-q-2)/n} = \frac{n(n+p+q)}{n-p-q-2}.$$

Substituting into the expression for $E_\theta(\Delta(\hat\psi\mid\theta))$ above, and noting that $\frac{n(n+p+q)}{n-p-q-2} - n = \frac{2(p+q+1)n}{n-p-q-2}$, one finds that

$$\mathrm{AICC} = -2\log L_X(\hat\phi,\hat\vartheta,\hat\sigma^2) + \frac{2(p+q+1)n}{n-p-q-2}$$

is an approximately unbiased estimator of $E_\theta(\Delta(\hat\psi\mid\theta))$.

Criteria for model choice

The order is chosen by minimizing the value of the AICC (corrected Akaike information criterion):

$$\mathrm{AICC} = -2\log L_X(\hat\phi,\hat\vartheta,\hat\sigma^2) + \frac{2(p+q+1)n}{n-p-q-2}.$$

The second term can be viewed as a penalty on models with a large number of parameters. For large $n$ it is approximately the same as Akaike's information criterion (AIC),

$$\mathrm{AIC} = -2\log L_X(\hat\phi,\hat\vartheta,\hat\sigma^2) + 2(p+q+1),$$

but it carries a higher penalty for finite $n$, and is therefore somewhat less likely to overfit. In R:

AICC <- AIC(myfit, k = 2*n/(n-p-q-2))

A rule of thumb is that the fits of model 1 and model 2 are not significantly different if $|\mathrm{AICC}_1 - \mathrm{AICC}_2| < 2$ (only the difference matters, not the absolute value of the AICC). Hence we may decide to choose model 1 if it is simpler than model 2 (or if its residuals are closer to white noise), even if $\mathrm{AICC}_1 > \mathrm{AICC}_2$, as long as $\mathrm{AICC}_1 < \mathrm{AICC}_2 + 2$.
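As a worked illustration of the R one-liner above, here is a hedged sketch of order selection over a small grid (the simulated series and the candidate grid are assumptions, not part of the slides; note that `AIC()` counts all fitted parameters in its penalty, so this follows the slide's recipe only approximately):

```r
## Sketch: select (p, q) by minimizing AICC over a small grid of orders.
set.seed(1)
x <- arima.sim(model = list(ar = 0.7, ma = 0.4), n = 200)  # illustrative data
n <- length(x)

## AIC() with k = 2n/(n - p - q - 2) reproduces the AICC penalty of the slide.
aicc <- function(fit, p, q) AIC(fit, k = 2 * n / (n - p - q - 2))

orders <- expand.grid(p = 0:2, q = 0:2)
orders$AICC <- apply(orders, 1, function(o) {
  fit <- arima(x, order = c(o["p"], 0, o["q"]))
  aicc(fit, o["p"], o["q"])
})
orders[which.min(orders$AICC), ]  # the order with the smallest AICC wins
```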

Tests on residuals

Let $\hat X_t(\hat\phi,\hat\vartheta)$ be the predicted value of $X_t$ given the estimates $(\hat\phi,\hat\vartheta)$, and define the standardized residuals

$$\hat W_t = \frac{X_t - \hat X_t(\hat\phi,\hat\vartheta)}{\big(r_{t-1}(\hat\phi,\hat\vartheta)\big)^{1/2}}.$$

Diagnostic checks include:
portmanteau tests on the ACF of $\hat W_t$ (Box-Pierce; Ljung-Box);
tests on turning points;
rank tests;
...
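For instance, a Ljung-Box portmanteau test on the residuals of a fitted ARMA model might look like this (a sketch; the series and the orders are illustrative):

```r
## Sketch: Ljung-Box portmanteau test on ARMA residuals.
set.seed(1)
x <- arima.sim(model = list(ar = 0.7, ma = 0.4), n = 200)  # illustrative series
fit <- arima(x, order = c(1, 0, 1))
## fitdf = p + q accounts for the estimated ARMA coefficients
Box.test(residuals(fit), lag = 20, type = "Ljung-Box", fitdf = 2)
## A large p-value is consistent with white-noise residuals.
```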

Autocovariance

A multivariate stochastic process $\{X_t\in\mathbb{R}^m\}$, $t\in\mathbb{Z}$, is weakly stationary if

$$E(X_{t,i}^2)<\infty\ \ \forall\, t,i,\qquad E(X_t)\equiv\mu,\qquad \mathrm{Cov}(X_{t+h},X_t)\equiv\Gamma(h).$$

In particular, $\gamma_{ij}(h) = \mathrm{Cov}(X_{t+h,i},X_{t,j}) = E\big((X_{t+h,i}-\mu_i)(X_{t,j}-\mu_j)\big)$.

Note that in general $\gamma_{ij}(h)\neq\gamma_{ji}(h)$, while

$$\gamma_{ij}(h) = \mathrm{Cov}(X_{t+h,i},X_{t,j}) \overset{\text{(stationarity)}}{=} \mathrm{Cov}(X_{t,i},X_{t-h,j}) \overset{\text{(symmetry)}}{=} \mathrm{Cov}(X_{t-h,j},X_{t,i}) = \gamma_{ji}(-h).$$

Another simple property is $|\gamma_{ij}(h)|\le(\gamma_{ii}(0)\gamma_{jj}(0))^{1/2}$. The ACF is

$$\rho_{ij}(h) = \frac{\gamma_{ij}(h)}{(\gamma_{ii}(0)\gamma_{jj}(0))^{1/2}}.$$

Multivariate white noise and MA

A multivariate stochastic process $\{Z_t\in\mathbb{R}^m\}$ is white noise with covariance $S$, written $\{Z_t\}\sim\mathrm{WN}(0,S)$, if $\{Z_t\}$ is stationary with mean $0$ and ACVF

$$\Gamma(h) = \begin{cases} S & h = 0,\\ 0 & h\neq 0.\end{cases}$$

$\{X_t\in\mathbb{R}^m\}$ is a linear process if

$$X_t = \sum_{k=-\infty}^{+\infty} C_k Z_{t-k},\qquad \{Z_t\}\sim\mathrm{WN}(0,S),$$

where the $C_k$ are matrices such that $\sum_{k=-\infty}^{+\infty}|(C_k)_{ij}|<+\infty$ for all $i,j = 1,\dots,m$.

A linear process $\{X_t\}$ is stationary, with

$$\Gamma_X(h) = \sum_{k=-\infty}^{\infty} C_{k+h}\,S\,C_k^T.$$
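A small sketch of the last formula for a bivariate MA(1), where only $C_0$ and $C_1$ are nonzero (the matrices are arbitrary illustrations):

```r
## Sketch: Gamma_X(h) = sum_k C_{k+h} S C_k' for a bivariate MA(1),
## where only C_0 and C_1 are nonzero (matrices chosen arbitrarily).
C0 <- diag(2)
C1 <- matrix(c(0.5, 0.2, 0.0, -0.3), 2, 2)
S  <- matrix(c(1, 0.4, 0.4, 2), 2, 2)      # white-noise covariance

Gamma <- function(h) {
  Cs <- list(C0, C1)                       # C_k for k = 0, 1; zero otherwise
  G <- matrix(0, 2, 2)
  for (k in 0:1) {
    if (k + h >= 0 && k + h <= 1) {
      G <- G + Cs[[k + h + 1]] %*% S %*% t(Cs[[k + 1]])
    }
  }
  G
}

Gamma(1)
t(Gamma(-1))  # equals Gamma(1): the property gamma_ij(h) = gamma_ji(-h)
```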

Estimation of the mean

The mean $\mu$ can be estimated by $\bar X_n$. From the univariate theory, we know that

$$E(\bar X_n) = \mu,\qquad V\big((\bar X_n)_i\big)\to 0 \ \text{(as } n\to\infty\text{)}\ \text{ if } \gamma_{ii}(h)\underset{h\to\infty}{\longrightarrow}0,$$

$$nV\big((\bar X_n)_i\big)\ \to\ \sum_{h=-\infty}^{+\infty}\gamma_{ii}(h)\qquad\text{if}\quad \sum_{h=-\infty}^{+\infty}|\gamma_{ii}(h)|<+\infty.$$

Moreover, $(\bar X_n)_i$ is asymptotically normal. Stronger assumptions are required for the vector $\bar X_n$ to be asymptotically normal.

Theorem. If

$$X_t = \mu + \sum_{k=-\infty}^{+\infty} C_k Z_{t-k},\qquad \{Z_t\}\sim\mathrm{WN}(0,S),$$

then

$$n^{1/2}(\bar X_n - \mu)\ \Rightarrow\ N\Big(0,\ \sum_{h=-\infty}^{+\infty}\Gamma_X(h)\Big),\qquad \Gamma_X(h) = \sum_{k=-\infty}^{+\infty}C_{k+h}\,S\,C_k^T.$$

Confidence intervals for the mean

In principle, from $\bar X_n\approx N\big(\mu,\ \tfrac{1}{n}\sum_{h}\Gamma_X(h)\big)$ one could build an m-dimensional confidence ellipsoid. But... this is not intuitive, and the $C_k$ and $S$ are not known and would have to be estimated...

Instead, build confidence intervals from

$$(\bar X_n)_i \approx N\Big(\mu_i,\ \frac{1}{n}\sum_{h=-\infty}^{+\infty}\gamma_{ii}(h)\Big).$$

The quantity $\sum_{h=-\infty}^{+\infty}\gamma_{ii}(h) = 2\pi f_i(0)$ can be consistently estimated by

$$2\pi\hat f_i(0) = \sum_{h=-r}^{r}\Big(1-\frac{|h|}{r}\Big)\hat\gamma_{ii}(h),\qquad\text{where } r_n\to\infty \text{ and } \frac{r_n}{n}\to 0.$$

Componentwise confidence intervals can be combined. If we find $u_i(\alpha)$ such that $P\big(|\mu_i-(\bar X_n)_i|\ge u_i(\alpha)\big)\le\alpha$, then

$$P\big(|\mu_i-(\bar X_n)_i| < u_i(\alpha),\ i=1,\dots,m\big)\ \ge\ 1 - \sum_{i=1}^{m} P\big(|\mu_i-(\bar X_n)_i|\ge u_i(\alpha)\big)\ \ge\ 1 - m\alpha.$$

Choosing $\alpha = 0.05/m$, one obtains a 95%-confidence m-rectangle.
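A sketch of this recipe in R, combining the Bartlett-window estimate of $2\pi f_i(0)$ with the Bonferroni correction $\alpha = 0.05/m$ (the simulated data and the choice $r = \lfloor\sqrt{n}\rfloor$ are illustrative assumptions):

```r
## Sketch: 95% confidence m-rectangle for the mean of a multivariate series.
set.seed(2)
X <- cbind(arima.sim(list(ar = 0.9), 200),
           arima.sim(list(ar = 0.5), 200))     # illustrative bivariate series
n <- nrow(X); m <- ncol(X)
r <- floor(sqrt(n))                            # r_n -> Inf with r_n / n -> 0

ci <- sapply(seq_len(m), function(i) {
  g <- acf(X[, i], lag.max = r, type = "covariance", plot = FALSE)$acf
  ## Bartlett-window estimate of sum_h gamma_ii(h) = 2*pi*f_i(0)
  v <- g[1] + 2 * sum((1 - (1:r) / r) * g[-1])
  half <- qnorm(1 - 0.05 / (2 * m)) * sqrt(v / n)  # Bonferroni: alpha = 0.05/m
  mean(X[, i]) + c(-half, half)
})
dimnames(ci) <- list(c("lower", "upper"), paste0("mu_", seq_len(m)))
ci
```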

Estimation of the ACVF (bivariate case, m = 2)

$$\hat\Gamma(h) = \begin{cases}\displaystyle \frac{1}{n}\sum_{t=1}^{n-h}(X_{t+h}-\bar X_n)(X_t-\bar X_n)^T & 0\le h<n,\\[2ex] \displaystyle \frac{1}{n}\sum_{t=-h+1}^{n}(X_{t+h}-\bar X_n)(X_t-\bar X_n)^T & -n<h<0,\end{cases}$$

$$\hat\rho_{ij}(h) = \frac{\hat\gamma_{ij}(h)}{(\hat\gamma_{ii}(0)\hat\gamma_{jj}(0))^{1/2}}.$$

Theorem. If

$$X_t = \mu + \sum_{k=-\infty}^{+\infty}C_k Z_{t-k},\qquad\{Z_t\}\sim\mathrm{IID}(0,S),$$

then for every $h$, $\hat\gamma_{ij}(h)\overset{P}{\to}\gamma_{ij}(h)$ and $\hat\rho_{ij}(h)\overset{P}{\to}\rho_{ij}(h)$ as $n\to\infty$.
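In R, $\hat\Gamma(h)$ and $\hat\rho_{ij}(h)$ for a multivariate series are available directly; a usage sketch with illustrative simulated data:

```r
## Sketch: sample ACFs and cross-correlations of a bivariate series.
set.seed(2)
X <- cbind(x1 = arima.sim(list(ar = 0.9), 200),
           x2 = arima.sim(list(ar = 0.5), 200))
acf(X, lag.max = 20, type = "covariance")  # matrix of Gamma_hat(h)
acf(X, lag.max = 20)                       # 2x2 panel: ACFs and CCFs (rho_hat_ij)
ccf(X[, 1], X[, 2], lag.max = 20)          # rho_hat_12(h) alone, lags of both signs
```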

An example: Southern Oscillation Index

The Southern Oscillation Index (an environmental measure) compared to fish recruitment in the South Pacific (1950 to 1985).

[Figure: time series plots of the Southern Oscillation Index and of Recruitment, 1950-1985.]

ACF of Southern Oscillation Index

[Figure: 2x2 panel of sample ACFs and CCFs for (soi, rec). The bottom-left panel is $\hat\gamma_{12}$ at negative lags.]

An example from Box and Jenkins

Sales (V2) with a leading indicator (V1).

[Figure: time series plots of V1 and V2.]

ACF of sales data

The data are not stationary.

[Figure: 2x2 panel of sample ACFs and CCFs for (V1, V2).]

Differenced sales data

[Figure: time series plots of the differenced series V1 and V2.]

ACF of differenced sales data

The cross-correlation is relevant only at lags 2 and 3.

[Figure: 2x2 panel of sample ACFs and CCFs of the differenced series.]

Testing for independence of time series: basis

In general, the asymptotic distribution of $\hat\gamma_{ij}(h)$ is complicated. But:

Theorem. Let

$$X_{t,1} = \sum_{j=-\infty}^{\infty}\alpha_j Z_{t-j,1},\qquad X_{t,2} = \sum_{j=-\infty}^{\infty}\beta_j Z_{t-j,2},$$

with $\{Z_{t,1}\}\sim\mathrm{WN}(0,\sigma_1^2)$ and $\{Z_{t,2}\}\sim\mathrm{WN}(0,\sigma_2^2)$ independent of each other. Then

$$nV\big(\hat\gamma_{12}(h)\big)\ \underset{n\to\infty}{\longrightarrow}\ \sum_{j=-\infty}^{\infty}\gamma_{11}(j)\gamma_{22}(j)$$

and

$$n^{1/2}\hat\rho_{12}(h)\ \Rightarrow\ N\Big(0,\ \sum_{j=-\infty}^{\infty}\rho_{11}(j)\rho_{22}(j)\Big).$$

Testing for independence of time series: an example

Suppose $\{X_{t,1}\}$ and $\{X_{t,2}\}$ are independent AR(1) processes with $\rho_{ii}(h) = 0.8^{|h|}$. Then the asymptotic variance of $\hat\rho_{12}(h)$ is

$$n^{-1}\sum_{h=-\infty}^{\infty}0.64^{|h|}\ \approx\ 4.556\,n^{-1}.$$

Values of $|\hat\rho_{12}(h)|$ quite a bit larger than $1.96\,n^{-1/2}$ are therefore common even when the two series are independent. If instead one of the series is white noise, then $V(\hat\rho_{12}(h))\approx 1/n$. Hence, in testing for independence, it is often recommended to prewhiten one of the series first.
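The effect is easy to reproduce (a sketch; the sample size is an arbitrary choice):

```r
## Sketch: two independent AR(1) series with phi = 0.8 produce many
## cross-correlations beyond the naive white-noise bounds +/- 1.96/sqrt(n).
set.seed(3)
n  <- 200
x1 <- arima.sim(list(ar = 0.8), n)
x2 <- arima.sim(list(ar = 0.8), n)

r12 <- ccf(x1, x2, lag.max = 20, plot = FALSE)$acf
mean(abs(r12) > 1.96 / sqrt(n))  # fraction flagged: often far above 5%
c(true_sd = sqrt(4.556 / n), naive_bound = 1.96 / sqrt(n))
```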

Pre-whitening a time series

Instead of testing $\hat\rho_{12}(h)$ of the original series, one transforms them into white noise. If $\{X_{t,1}\}$ and $\{X_{t,2}\}$ are invertible ARMA processes, then

$$Z_{t,i} = \sum_{k=0}^{\infty}\pi^{(i)}_k X_{t-k,i}\ \sim\ \mathrm{WN}(0,\sigma_i^2),\qquad i = 1,2,$$

where

$$\sum_{k=0}^{\infty}\pi^{(i)}_k z^k = \pi^{(i)}(z) = \phi^{(i)}(z)/\theta^{(i)}(z).$$

$\{X_{t,1}\}$ and $\{X_{t,2}\}$ are independent if and only if $\{Z_{t,1}\}$ and $\{Z_{t,2}\}$ are, hence one tests $\hat\rho_{Z_1,Z_2}(h)$. As $\phi^{(i)}(z)$ and $\theta^{(i)}(z)$ are not known, one fits an ARMA model to each series and uses the residuals $\hat W_{t,i}$ in place of $Z_{t,i}$. It may be enough to do this for just one of the series.
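A sketch of the prewhitening recipe on the two independent AR(1) series from the previous example (the model orders are assumed known here for simplicity):

```r
## Sketch: prewhiten by fitting AR(1) models and testing the residual
## cross-correlations instead of those of the raw series.
set.seed(3)
n  <- 200
x1 <- arima.sim(list(ar = 0.8), n)
x2 <- arima.sim(list(ar = 0.8), n)

w1 <- residuals(arima(x1, order = c(1, 0, 0)))  # stands in for Z_{t,1}
w2 <- residuals(arima(x2, order = c(1, 0, 0)))  # stands in for Z_{t,2}

r12 <- ccf(w1, w2, lag.max = 20, plot = FALSE)$acf
mean(abs(r12) > 1.96 / sqrt(n))  # now close to the nominal 5% level
```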

Simulated data

The first series is AR(1) with $\phi = 0.9$; the second series is AR(2) with $\phi_1 = 0.7$, $\phi_2 = 0.27$.

[Figure: time series plots of the two simulated series, dat1 and dat2.]

ACF of simulated data

[Figure: 2x2 panel of sample ACFs and CCFs for (dat1, dat2).]

ACF of residuals

The correct model is fitted to both series by maximum likelihood. A few cross-correlation coefficients may appear slightly significant.

[Figure: 2x2 panel of ACFs and CCFs of the residuals of the two fitted models.]

Bartlett's formula

More generally:

Theorem. If $\{X_t\}$ is a bivariate Gaussian time series with $\sum_{h=-\infty}^{+\infty}|\gamma_{ij}(h)|<\infty$, then

$$\begin{aligned}
\lim_{n\to\infty} n\,\mathrm{Cov}\big(\hat\rho_{12}(h),\hat\rho_{12}(k)\big)
= \sum_{j=-\infty}^{+\infty}\Big[&\rho_{11}(j)\rho_{22}(j+k-h) + \rho_{12}(j+k)\rho_{21}(j-h)\\
&- \rho_{12}(h)\big(\rho_{11}(j)\rho_{12}(j+k) + \rho_{22}(j)\rho_{21}(j-k)\big)\\
&- \rho_{12}(k)\big(\rho_{11}(j)\rho_{12}(j+h) + \rho_{22}(j)\rho_{21}(j-h)\big)\\
&+ \rho_{12}(h)\rho_{12}(k)\Big(\tfrac{1}{2}\rho_{11}^2(j) + \rho_{12}^2(j) + \tfrac{1}{2}\rho_{22}^2(j)\Big)\Big].
\end{aligned}$$

Spectral density of multivariate series

If $\sum_{h=-\infty}^{+\infty}|\gamma_{ij}(h)|<\infty$, one can define

$$f(\lambda) = \frac{1}{2\pi}\sum_{h=-\infty}^{\infty}e^{-ih\lambda}\,\Gamma(h),\qquad\lambda\in[-\pi,\pi],$$

and one obtains

$$\Gamma(h) = \int_{-\pi}^{\pi}e^{i\lambda h}f(\lambda)\,d\lambda$$

and

$$X_t = \int_{-\pi}^{\pi}e^{i\lambda t}\,dZ(\lambda),$$

where the $Z_i(\cdot)$ are (complex-valued) processes with independent increments such that

$$\int_{\lambda_1}^{\lambda_2}f_{ij}(\lambda)\,d\lambda = E\Big(\big(Z_i(\lambda_2)-Z_i(\lambda_1)\big)\,\overline{\big(Z_j(\lambda_2)-Z_j(\lambda_1)\big)}\Big).$$

Coherence of series

For a bivariate series, the coherence at frequency $\lambda$ is

$$\mathcal{X}_{12}(\lambda) = \frac{f_{12}(\lambda)}{[f_{11}(\lambda)f_{22}(\lambda)]^{1/2}},$$

and represents the correlation between $dZ_1(\lambda)$ and $dZ_2(\lambda)$. The squared coherency function $|\mathcal{X}_{12}(\lambda)|^2$ satisfies $0\le|\mathcal{X}_{12}(\lambda)|^2\le 1$.

Remark. If $X_{t,2} = \sum_{k=-\infty}^{+\infty}\psi_k X_{t-k,1}$, then $|\mathcal{X}_{12}(\lambda)|^2\equiv 1$.

Periodogram

Define

$$J(\omega_j) = n^{-1/2}\sum_{t=1}^{n}X_t e^{-it\omega_j},\qquad \omega_j = 2\pi j/n,$$

for $j$ between $-[(n-1)/2]$ and $[n/2]$. Then $I_n(\omega_j) = J(\omega_j)J^*(\omega_j)$, where $*$ denotes transposition combined with complex conjugation. In particular,

$$I_{12}(\omega_j) = \frac{1}{n}\Big(\sum_{t=1}^{n}X_{t1}e^{-it\omega_j}\Big)\overline{\Big(\sum_{t=1}^{n}X_{t2}e^{-it\omega_j}\Big)}$$

is the cross periodogram.

Estimation of spectral density and coherence

Again, one estimates $f(\lambda)$ by

$$\hat f(\lambda) = \frac{1}{2\pi}\sum_{k=-m_n}^{m_n}W_n(k)\,I_n\Big(g(n,\lambda) + \frac{2\pi k}{n}\Big).$$

If $X_t = \sum_{k=-\infty}^{+\infty}C_k Z_{t-k}$ with $\{Z_t\}\sim\mathrm{IID}(0,S)$, then

$$\hat f_{ij}(\lambda)\ \text{is}\ \mathrm{AN}\Big(f_{ij}(\lambda),\ f_{ij}^2(\lambda)\sum_{k=-m_n}^{m_n}W_n^2(k)\Big),\qquad 0<\lambda<\pi.$$

The natural estimator of $|\mathcal{X}_{12}(\lambda)|^2$ is

$$\hat\chi^2_{12}(\lambda) = \frac{|\hat f_{12}(\lambda)|^2}{\hat f_{11}(\lambda)\hat f_{22}(\lambda)}.$$
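A sketch of coherency estimation in R with a smoothed periodogram; here the second series is a linear filter of the first, so by the remark above the true squared coherency is identically 1 (the filter and the smoothing spans are illustrative choices):

```r
## Sketch: estimated squared coherency via a smoothed periodogram.
## y_t = x_t + 0.5 x_{t-1} is a linear filter of x_t, so the true squared
## coherency is identically 1.
set.seed(5)
z <- rnorm(501)
x <- z[2:501]
y <- z[2:501] + 0.5 * z[1:500]

sp <- spec.pgram(cbind(x, y), spans = c(7, 7), taper = 0.1, plot = FALSE)
plot(sp$freq, sp$coh, type = "l", ylim = c(0, 1),
     xlab = "frequency", ylab = "squared coherency")  # close to 1 throughout
```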

An example of coherency estimation

Squared coherency between SOI and recruitment. The horizontal line represents a (conservative) test of the hypothesis $|\mathcal{X}_{12}(\lambda)|^2 = 0$. There is strong coherency at a period of one year and at periods longer than three years.

[Figure: estimated squared coherency between SOI and recruitment as a function of frequency, with the test threshold drawn as a horizontal line.]