Model selection using penalty function criteria


Laimonis Kavalieris, University of Otago, Dunedin, New Zealand
Econometrics, Time Series Analysis, and Systems Theory, Wien, June 18-20

Outline
- Classes of models.
- Penalty function model selection criteria: AIC (Akaike, 1973), BIC (Schwarz, 1978), DIC, FIC, SIC, TIC and more alphabet soup; MDL (Rissanen, 1979, ...).
- Analysis from the distant past: the AR model under 4th-moment conditions.
- Relaxation of the moment conditions to 2 < α < 4 moments.
- Applications: AR modelling; long memory (fractionally integrated AR models); counting structural breaks.


Model set
Models live in a union of parametric classes $\mathcal{M}_h$,
$$\mathcal{M} = \bigcup_{h \in H} \mathcal{M}_h,$$
where $\dim(\mathcal{M}_h) = d_h$ and often $d_h < d_{h+1}$. The simplest example is AR modelling of a stationary time series. Then $\mathcal{M}_h$ consists of causal AR(h) models of the form
$$\sum_{j=0}^{h} \phi_{h,j}\, y_{t-j} = \epsilon_t,$$
where $\phi_{h,0} = 1$ and, to avoid ambiguities, $\phi_{h,h} \neq 0$. We do not assume that the true model is in any of the $\mathcal{M}_h$; however, it may be in the closure of $\mathcal{M}$. For (regular) stationary series the Wold decomposition gives a causal AR(∞) representation.

Model selection criteria
Given observations $y^n = y_1, \ldots, y_n$, take an estimation criterion $\gamma(y^n; \theta)$. For each $h$ we can find a $\hat\theta_h$ minimizing $\gamma(y^n; \theta)$ over $\theta \in \mathcal{M}_h$. From all of the $\hat\theta_h$ we would like to select a best model. Define a loss function $l(\theta_1, \theta_2)$: a best model should minimize the expected loss. Writing $\theta^*$ for the true model, which need not be in $\mathcal{M}$, select $h_n$ to minimize
$$E_y\{l(\theta^*, \hat\theta_h)\}.$$
Of course $\theta^*$ is unknown, but any data-based model selection procedure should mimic this choice (this choice is occasionally called an "oracle").

Maximum likelihood
Minimize the negative log likelihood $\gamma(y^n; \theta) = -\log f(y^n; \theta)$ over $\mathcal{M}_h$ to obtain $\hat\theta_h$. Then an appropriate loss function is the Kullback-Leibler divergence
$$D(\theta^* \,\|\, \hat\theta) = \int f(z; \theta^*) \log\big[f(z; \theta^*)/f(z; \hat\theta)\big]\,dz,$$
and the best model should minimize
$$E_y\{D(\theta^* \,\|\, \hat\theta_h)\} = \int f(z; \theta^*) \log f(z; \theta^*)\,dz \;-\; E_y\Big\{\int f(z; \theta^*) \log f(z; \hat\theta_h)\,dz\Big\}.$$
The first term on the RHS is independent of $h$, so select $h$ to minimize the second term. Akaike's argument suggests that $-\log f(y^n; \hat\theta_h)$ is a biased estimate of the expected K-L divergence, and the bias correction is 2 × [number of parameters].
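
To make this concrete, here is a minimal numpy sketch (my illustration, not from the talk) for Gaussian AR models, where the conditional log likelihood reduces to $-(n/2)\log\hat\sigma_h^2$ up to constants, so Akaike's correction gives the familiar $n\log\hat\sigma_h^2 + 2h$. Function names are hypothetical.

```python
import numpy as np

def fit_ar(y, h):
    """Conditional least-squares fit of an AR(h); for Gaussian errors this
    is conditional maximum likelihood. Returns the residual variance."""
    n = len(y)
    if h == 0:
        return y @ y / n
    X = np.column_stack([y[h - j - 1:n - j - 1] for j in range(h)])
    e = y[h:] - X @ np.linalg.lstsq(X, y[h:], rcond=None)[0]
    return e @ e / n

def aic(y, h):
    """Gaussian AIC up to an additive constant:
    -2 log f(y; theta_hat_h) + 2h = n log(sigma2_hat_h) + 2h (+ const)."""
    return len(y) * np.log(fit_ar(y, h)) + 2 * h
```

The order minimizing `aic(y, h)` over $h = 0, \ldots, H$ then plays the role of the selected $\hat h$.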

Least squares prediction
For 1-step prediction, minimize the mean squared error
$$\sum_t \Big[y_t + \sum_{j=1}^{h} a_j y_{t-j}\Big]^2,$$
obtaining coefficients $\hat\phi(h) = [\hat\phi_{h,1}, \ldots, \hat\phi_{h,h}]$ and prediction errors denoted by $\hat\epsilon_{h,t}$. Denote the true AR(∞) model by
$$y_t + \sum_{j=1}^{\infty} \phi_j y_{t-j} = \epsilon_t,$$
with $E\{\epsilon_t^2\} = \sigma^2$ and $\phi = [\phi_1, \phi_2, \ldots]$. The loss function is
$$E\{(\hat\epsilon_{h,t} - \epsilon_t)^2\} = E\Big\{\Big(\sum_j (\hat\phi_{h,j} - \phi_j)\, y_{t-j}\Big)^2\Big\},$$
and the preferred model will minimize this risk. Define $\epsilon_{h,t} = y_t + \sum_{j=1}^{h} \phi_{h,j} y_{t-j}$ as the (unobservable) prediction error obtained by minimizing $E\{[y_t + \sum_{j=1}^{h} a_j y_{t-j}]^2\}$.

Least squares prediction ctd.
Evidently
$$\hat\epsilon_{h,t} - \epsilon_t = (\hat\epsilon_{h,t} - \epsilon_{h,t}) + (\epsilon_{h,t} - \epsilon_t).$$
Then the expected loss becomes the sum of two squared terms:
$$E\{(\hat\epsilon_{h,t} - \epsilon_t)^2\} = E\{(\hat\epsilon_{h,t} - \epsilon_{h,t})^2\} + \|\phi(h) - \phi\|^2_{\Gamma}.$$
The first term is approximately $h\sigma^2/n$ (it is $\sigma^2/n$ times the expectation of a $\chi^2_h$ variate); this term accounts for the variance in estimating the $h$ parameters of the autoregression. The second term is the bias due to using the wrong model; it decreases with increasing $h$. Let $h_n$ minimize the expected loss. As there is no true model in $\mathcal{M}$ we cannot talk about consistency, so a good data-based model selection procedure should select a model close to $\hat\theta_{h_n}$.
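
A small numerical illustration of this tradeoff (mine, not from the talk): for an MA(1) truth, which has an AR(∞) representation, both terms can be computed exactly, the bias term $\sigma_h^2 - \sigma^2$ from the Yule-Walker equations with the true autocovariances, and the variance term approximated by $h\sigma^2/n$. All names are illustrative.

```python
import numpy as np

# Truth: an MA(1), y_t = e_t + theta * e_{t-1}, whose AR(infinity)
# representation means every AR(h) in the model set is "wrong".
theta, sigma2, n, H = 0.8, 1.0, 500, 12

gamma = np.zeros(H + 1)              # true autocovariances of the MA(1)
gamma[0] = sigma2 * (1 + theta**2)
gamma[1] = sigma2 * theta

for h in range(1, H + 1):
    # Population AR(h): solve Gamma_h phi = -g for the model
    # y_t + sum_j phi_{h,j} y_{t-j} = eps_{h,t}.
    G = gamma[np.abs(np.subtract.outer(np.arange(h), np.arange(h)))]
    g = gamma[1:h + 1]
    phi = np.linalg.solve(G, -g)
    sigma2_h = gamma[0] + phi @ g    # E{eps_{h,t}^2}
    bias = sigma2_h - sigma2         # ||phi(h) - phi||^2_Gamma
    var = h * sigma2 / n             # approx E{(eps_hat_{h,t} - eps_{h,t})^2}
    print(f"h={h:2d}  bias={bias:.5f}  variance={var:.5f}  "
          f"total={bias + var:.5f}")
```

The printed totals fall and then rise again in $h$, so the minimizing $h_n$ sits strictly between 0 and $H$: exactly the balance the expected-loss argument describes.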

Penalty function criteria
Penalize the estimation criterion according to the number of parameters: select the model that minimizes
$$\gamma(y^n; \hat\theta_h) + p(h, n).$$
Depending on the kind of model being considered, every parameter need not carry the same penalty. [When estimating a periodic signal, amplitudes are estimated with variance $O(n^{-1})$ while frequencies are estimated with much higher precision, variance $O(n^{-3})$. This requires different penalties: MDL, for example, suggests a penalty of $\log n / n$ for each amplitude and $3 \log n / n$ for each frequency (Kavalieris and Hannan, 1994). Similar considerations apply when estimating the number of breakpoints, though there the details depend on the choice of asymptotics.]
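
A minimal sketch (my illustration, with hypothetical names) of this generic recipe for AR order selection, with the penalty $p(h, n)$ kept interchangeable:

```python
import numpy as np

def ar_sigma2(y, h):
    """Mean-square one-step prediction error of an OLS-fitted AR(h)."""
    n = len(y)
    if h == 0:
        return y @ y / n
    X = np.column_stack([y[h - j - 1:n - j - 1] for j in range(h)])
    e = y[h:] - X @ np.linalg.lstsq(X, y[h:], rcond=None)[0]
    return e @ e / n

# Interchangeable penalty functions p(h, n).
PENALTIES = {
    "AIC": lambda h, n: 2 * h / n,
    "BIC": lambda h, n: h * np.log(n) / n,
}

def select_order(y, h_max, penalty="BIC"):
    """Return the order h minimizing log(sigma2_hat_h) + p(h, n)."""
    n, p = len(y), PENALTIES[penalty]
    crit = [np.log(ar_sigma2(y, h)) + p(h, n) for h in range(h_max + 1)]
    return int(np.argmin(crit))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = 600                     # 100 burn-in + 500 kept observations
    e = rng.standard_normal(m)
    y = np.zeros(m)
    for t in range(2, m):
        y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + e[t]
    y = y[100:]
    print({name: select_order(y, 12, name) for name in PENALTIES})
```

Swapping in an unequal-weight penalty (as in the periodic-signal example above) only changes the `PENALTIES` entry, not the selection loop.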

Technical results from the 80s
The basic results used for the analysis are uniform convergence results for autocovariances. Denote the sample and population autocovariances by
$$\hat\gamma(h) = n^{-1} \sum_{t=h+1}^{n} y_t y_{t-h} \quad\text{and}\quad \gamma(h) = E\{y_t y_{t-h}\}$$
respectively. Then
$$\max_{0 \le h \le H_n} |\hat\gamma(h) - \gamma(h)| = O\Big\{\Big(\frac{\log n}{n}\Big)^{1/2}\Big\}$$
almost surely. Conditions: $y_t = \sum_j \psi_j \epsilon_{t-j}$ where the $\epsilon_t$ are stationary martingale differences with a finite 4th moment. These results are due to Ted Hannan and coworkers; a synthesis of those results appears in Hannan and Deistler (1988).
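
A quick Monte Carlo check of this uniform rate (my illustration, not from the talk), taking $H_n = \sqrt{n}$ and an AR(1) with standard normal innovations; the ratio column should stay roughly bounded as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)

def max_acov_error(n, phi=0.5):
    """Simulate a unit-innovation-variance AR(1) of length n and return
    max_{0 <= h <= H_n} |gamma_hat(h) - gamma(h)| with H_n = sqrt(n)."""
    H = int(np.sqrt(n))
    e = rng.standard_normal(n + 200)
    y = np.zeros(n + 200)
    for t in range(1, n + 200):
        y[t] = phi * y[t - 1] + e[t]
    y = y[200:]                                   # drop burn-in
    gamma = phi ** np.arange(H + 1) / (1 - phi**2)
    gamma_hat = np.array([y[h:] @ y[:n - h] / n for h in range(H + 1)])
    return np.max(np.abs(gamma_hat - gamma))

for n in [250, 1000, 4000, 16000]:
    err = np.mean([max_acov_error(n) for _ in range(50)])
    rate = np.sqrt(np.log(n) / n)
    print(f"n={n:6d}  mean max error={err:.4f}  rate={rate:.4f}  "
          f"ratio={err / rate:.2f}")
```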

Technical results ctd.
For AR modelling, a penalty function criterion is
$$\log \hat\sigma_h^2 + h \log n / n,$$
where $\hat\sigma_h^2 = n^{-1} \sum_t \hat\epsilon_{h,t}^2$ is the mean square prediction error from an AR(h) model. The prediction variance is partitioned into three bits:
$$\hat\sigma_h^2 = \frac{1}{n}\sum_t \hat\epsilon_{h,t}^2 = \frac{1}{n}\sum_t \epsilon_t^2 + [\text{variance due to } h \text{ parameters}] + [\text{bias due to the wrong model}].$$
The penalty term must wipe out enough of the variation in the 2nd and 3rd terms so that they look like the expected loss function for least squares estimation. The results are quite complete when an AR model is used to estimate a time series with geometrically decaying ACF (the ARMA case).
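
One convenient way to evaluate $\hat\sigma_h^2$ for all $h$ at once is the Levinson-Durbin recursion applied to the sample autocovariances (the Yule-Walker route rather than least squares). A minimal sketch, with hypothetical function names, run on an ARMA(1,1) truth so that the ACF decays geometrically but no finite AR order is exact:

```python
import numpy as np

def levinson_durbin(gamma, h_max):
    """Prediction variances sigma2[0..h_max] of the best linear AR(h)
    predictors, from autocovariances gamma[0..h_max], via the
    Levinson-Durbin recursion (convention y_t = sum_j phi_j y_{t-j} + e_t)."""
    sigma2 = np.empty(h_max + 1)
    sigma2[0] = gamma[0]
    phi = np.zeros(h_max + 1)
    for h in range(1, h_max + 1):
        k = (gamma[h] - phi[1:h] @ gamma[h - 1:0:-1]) / sigma2[h - 1]
        new = phi.copy()
        new[h] = k
        new[1:h] = phi[1:h] - k * phi[h - 1:0:-1]
        phi = new
        sigma2[h] = sigma2[h - 1] * (1 - k**2)
    return sigma2

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n, H = 1000, 15
    e = rng.standard_normal(n + 100)
    y = np.zeros(n + 100)
    for t in range(1, n + 100):
        y[t] = 0.7 * y[t - 1] + e[t] + 0.4 * e[t - 1]  # ARMA(1,1) truth
    y = y[100:]
    gam = np.array([y[h:] @ y[:n - h] / n for h in range(H + 1)])
    crit = np.log(levinson_durbin(gam, H)) + np.arange(H + 1) * np.log(n) / n
    print("selected h:", int(np.argmin(crit)))
```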

Moments of order 2 < α < 4
Can we relax the moment conditions? Yes, but at a cost. Kavalieris (2008) shows that if $E\{|\epsilon_t|^\alpha\} < \infty$ and the $\epsilon_t$ are independent, then
$$\max_{0 \le h \le H_n} \Big| n^{-1} \sum_t \epsilon_t \epsilon_{t-h} \Big| = O\Big\{\Big(\frac{\log n}{n}\Big)^{1/2}\Big\}$$
almost surely, but this is not enough to establish a similar rate of convergence for the autocovariances. Under the independence assumption, we can show that
$$\max_{0 \le h \le H_n} |\hat\gamma(h) - \gamma(h)| = O_p\Big\{\Big(\frac{\log n}{n}\Big)^{1/2}\Big\}$$
in probability (and the rate given here is the best possible when $H_n \sim n^a$ for any $0 < a < 1$). Then most of the Hannan and Kavalieris (1986) results can be recovered.
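
For intuition, here is a simulation under this relaxed regime (mine, not from the talk), using iid Student-t innovations with 3 degrees of freedom, so $E|\epsilon_t|^\alpha < \infty$ only for $\alpha < 3$ and the 4th moment is infinite:

```python
import numpy as np

rng = np.random.default_rng(3)
phi = 0.5
for n in [1000, 8000, 64000]:
    H = int(np.sqrt(n))
    # iid t(3) innovations: finite moments of order alpha < 3 only.
    e = rng.standard_t(3, size=n + 200)
    y = np.zeros(n + 200)
    for t in range(1, n + 200):
        y[t] = phi * y[t - 1] + e[t]
    y = y[200:]
    s2 = 3.0                      # Var{t(3)} = df / (df - 2) = 3
    gamma = s2 * phi ** np.arange(H + 1) / (1 - phi**2)
    gamma_hat = np.array([y[h:] @ y[:n - h] / n for h in range(H + 1)])
    err = np.max(np.abs(gamma_hat - gamma))
    print(f"n={n:6d}  max|gamma_hat - gamma|={err:.4f}  "
          f"(log n/n)^(1/2)={np.sqrt(np.log(n)/n):.4f}")
```

Consistent with the in-probability (rather than almost-sure) result, individual runs are noisier here than in the Gaussian experiment above, with occasional large deviations driven by extreme innovations.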

Long memory
The $O\{(\log n/n)^{1/2}\}$ convergence rate was not established for long memory; it requires $\sum_j j^{1/2} |\psi_j| < \infty$, where $y_t = \sum_j \psi_j \epsilon_{t-j}$. If the $\epsilon_t$ are iid and normally distributed, then we can recover the $O\{(\log n/n)^{1/2}\}$ rate, and a little more. [Normality is not really needed, only many moments!] Define $v_{k,t} = \sum_j \psi_{k,j} \epsilon_{t-j}$, $k \le n^a$ for any $a > 0$, where the $\psi_{k,j}$ are square summable. Then
$$\max_{k,l \le n^a} \frac{\big|\sum_t [v_{k,t} v_{l,t} - E\{v_{k,t} v_{l,t}\}]\big|}{\sqrt{\mathrm{Var}\{v_{k,t}\}\,\mathrm{Var}\{v_{l,t}\}}} = O\{(n \log n)^{1/2}\}.$$
Here it does not matter that $\min_k \mathrm{Var}\{v_{k,t}\}$ may tend to 0.

Fractional AR models
Model: $(1-z)^d \phi(z)\, y_t = \epsilon_t$, where $\phi(z) = 1 + \sum_{j=1}^{\infty} \phi_j z^j$. This is the fractional ARIMA(∞, d, 0) model, where the AR component is meant to capture the short-range correlation structure. We want to estimate the model as an ARIMA(h, d, 0) model
$$(1-z)^d \phi_h(z)\, y_t = \epsilon_t,$$
where now $\phi_h(z)$ contains only $h$ lags. In this case estimating $h$ by minimizing a BIC penalty criterion does lead to a consistent estimate of the spectrum (and a consistent estimate of $d$).
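
A hedged sketch of this estimation scheme, assuming (as is common, though not stated in the talk) that $d$ is profiled over a grid: fractionally difference the series with the truncated binomial expansion of $(1-L)^d$, fit AR(h) to the result, and minimize the BIC-type criterion jointly over $(d, h)$. All helper names are hypothetical.

```python
import numpy as np

def fracdiff(y, d):
    """Truncated fractional difference (1 - L)^d applied to y, using the
    binomial weights pi_0 = 1, pi_j = pi_{j-1} * (j - 1 - d) / j."""
    n = len(y)
    pi = np.empty(n)
    pi[0] = 1.0
    for j in range(1, n):
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    return np.array([pi[:t + 1] @ y[t::-1] for t in range(n)])

def ar_sigma2(x, h):
    """Mean-square one-step prediction error of an OLS-fitted AR(h)."""
    n = len(x)
    if h == 0:
        return x @ x / n
    X = np.column_stack([x[h - j - 1:n - j - 1] for j in range(h)])
    e = x[h:] - X @ np.linalg.lstsq(X, x[h:], rcond=None)[0]
    return e @ e / n

def select_arfima(y, d_grid, h_max):
    """Minimize log sigma2_hat(d, h) + h log(n)/n jointly over (d, h)."""
    n, best = len(y), (np.inf, None, None)
    for d in d_grid:
        x = fracdiff(y, d)[50:]   # drop the edge most distorted by truncation
        for h in range(h_max + 1):
            c = np.log(ar_sigma2(x, h)) + h * np.log(n) / n
            if c < best[0]:
                best = (c, d, h)
    return best[1], best[2]

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    m = 700                        # 100 burn-in + 600 kept observations
    e = rng.standard_normal(m)
    u = np.zeros(m)
    for t in range(1, m):
        u[t] = 0.5 * u[t - 1] + e[t]      # short-memory AR(1) component
    y = fracdiff(u[100:], -0.3)           # fractional integration, d = 0.3
    d_hat, h_hat = select_arfima(y, np.arange(0.0, 0.45, 0.05), 6)
    print(f"d_hat={d_hat:.2f}  h_hat={h_hat}")
```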

References
- Hannan, E.J. and Kavalieris, L. (1986). Regression-autoregression models. Journal of Time Series Analysis, 7, 27-49.
- Hannan, E.J. and Deistler, M. (1988). The Statistical Theory of Linear Systems. Wiley, New York.
- Kavalieris, L. and Hannan, E.J. (1994). Determining the number of terms in a trigonometrical series. Journal of Time Series Analysis, 15, 613-626.
- Kavalieris, L. (2008). Uniform convergence of autocovariances. Statistics and Probability Letters, 78, 830-838.