Focused Information Criteria for Time Series

Focused Information Criteria for Time Series
Gudmund Horn Hermansen, with Nils Lid Hjort
University of Oslo, May 10, 2015

Do we really need more model selection criteria?

There is already a wide range of criteria, e.g. AIC, AIC$_c$, BIC, TIC, FPR, HQ, etc. The underlying motivations are not particularly well known among practitioners.

Example: For stationary time series there are two versions of the AIC based on similar reasoning, i.e. for model $M$ we have
$$\mathrm{AIC}_n(M) = 2\,\text{log-likelihood}_{\max}(M) - 2p$$
and
$$\mathrm{AIC}^*_n(M) = 2\,\text{Whittle-log-likelihood}_{\max}(M) - 2p + 2q,$$
where $p = \dim(M)$ and $q$ has to be estimated, see Hermansen and Hjort (2015).

There are (at least) three good selling points for the FIC:
(1) it allows for a problem-specific focus,
(2) it has a clear and simple motivation (minimising the estimated mse),
(3) it is in principle as easy to use as the AIC (e.g. no tuning parameters).
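As an illustration of the two criteria above, here is a minimal R sketch (not the authors' code) that fits zero-mean AR(p) candidates and computes the exact-likelihood AIC together with a Whittle-likelihood analogue. One common discrete form of the Whittle log-likelihood is used, it is evaluated at the exact-likelihood estimates rather than re-maximised, and the extra penalty term 2q of Hermansen and Hjort (2015) is not computed here.

## Minimal sketch, assuming a zero-mean AR(p) candidate set; not the authors' code.
set.seed(1)
y <- arima.sim(n = 200, list(ar = c(0.5, -0.3)))

whittle_loglik <- function(phi, sigma2, y) {
  # one common discrete Whittle form: -sum{log f(w_j) + I(w_j)/f(w_j)}
  n   <- length(y)
  j   <- 1:floor((n - 1) / 2)
  om  <- 2 * pi * j / n                              # Fourier frequencies
  per <- Mod(fft(as.numeric(y)))^2 / (2 * pi * n)    # periodogram ordinates
  I   <- per[j + 1]
  f   <- sapply(om, function(w)
    sigma2 / (2 * pi) / Mod(1 - sum(phi * exp(-1i * w * seq_along(phi))))^2)
  -sum(log(f) + I / f)
}

for (p in 1:4) {
  fit  <- arima(y, order = c(p, 0, 0), include.mean = FALSE)
  aic  <- 2 * fit$loglik - 2 * p      # 2 * log-likelihood_max - 2p, as on the slide
  waic <- 2 * whittle_loglik(fit$coef, fit$sigma2, y) - 2 * p   # the 2q term is omitted
  cat(sprintf("AR(%d): AIC = %.1f, Whittle-based analogue (without 2q) = %.1f\n",
              p, aic, waic))
}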

The focused information criterion

For model $M$ let $\mu_M$ be a focus parameter as a function of $M$, e.g. a quantile, a threshold probability, a covariance lag, etc. It is important that $\mu_M$ has the same interpretation across models.

Let $\mu_M$ be estimated by the plug-in principle for each $M$, e.g. in the case where $M$ is specified by $\theta_M \in \mathbb{R}^p$ we have $\hat\mu_M = \mu(\hat\theta_M)$.

The goal is to find the best model/estimator for $\mu$ with respect to
$$\mathrm{mse}(\hat\mu_M) = \{\mathrm{bias}(\hat\mu_M)\}^2 + \mathrm{Var}(\hat\mu_M) = \mathrm{sqb}(\hat\mu_M) + \mathrm{Var}(\hat\mu_M).$$

Model selection strategy:
(1) Obtain a reasonable estimator for the mse and use
$$\mathrm{FIC}(\mu, M) = \widehat{\mathrm{mse}}(\hat\mu_M) = \widehat{\mathrm{sqb}}(\hat\mu_M) + \widehat{\mathrm{Var}}(\hat\mu_M).$$
(2) Choose the model and estimator with the smallest estimated mse.
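To make the strategy concrete, here is a toy R sketch of steps (1)-(2) in an iid setting, using a parametric bootstrap as a crude stand-in for the estimated mse (the FIC proper uses the analytic large-sample estimates derived below, not a bootstrap). The focus, the threshold a and the two candidate models are illustrative choices only.

## Toy sketch of the strategy above in an iid setting; the focus, the threshold
## a, the candidate models and the bootstrap mse stand-in are illustrative
## choices, not the FIC's analytic mse estimator.
set.seed(2)
y <- rnorm(100, mean = 0.3, sd = 1.2)
a <- 1                                                       # focus: mu = Pr(Y > a)

focus_narrow <- function(z) 1 - pnorm(a, mean = mean(z), sd = 1)     # narrow: sd fixed at 1
focus_wide   <- function(z) 1 - pnorm(a, mean = mean(z), sd = sd(z)) # wide: sd estimated

est_mse <- function(focus_fun, B = 2000) {
  # parametric bootstrap around the fitted wide model, taken as "truth"
  mu_ref <- focus_wide(y)
  sims   <- replicate(B, focus_fun(rnorm(length(y), mean(y), sd(y))))
  mean((sims - mu_ref)^2)                                    # squared bias plus variance
}

c(narrow = est_mse(focus_narrow), wide = est_mse(focus_wide))
# step (2): pick the candidate with the smallest estimated mse for this focus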

The focused information criterion

Here we take the common large-sample approximation approach to obtain general estimators for the mse. This work extends Claeskens and Hjort (2003): a parametric approach, where all models are nested between a wide model specified by $(\theta, \gamma)$ and a narrow model with $(\theta, \gamma_0)$ and $\gamma_0$ known. The true generating model for $Y$ is parametrised by $(\theta_0, \gamma_0 + \delta/\sqrt{n})$. The estimate $\widehat{\mathrm{mse}}(\hat\mu_M)$ is based on an (unbiased) estimate of the mse of $\sqrt{n}(\hat\mu_M - \mu_{\mathrm{true}})$ in the limit experiment, where $\mu_{\mathrm{true}}$ is $\mu$ evaluated at the truth.

The extension has certain time-series-specific challenges, e.g.
- we would like to include predictions,
- and data-dependent foci like $\mu_M(Y_1, \ldots, Y_n) = \Pr\{Y_{n+1} > a \text{ and } Y_{n+2} > a \mid Y_1, \ldots, Y_n\}$, for a certain constant $a$.

Model and assumptions

Let $Y_t = m_\beta(x_t) + \epsilon_t$, where $\epsilon_t$ is a stationary Gaussian time series with $\mathrm{E}\,\epsilon_t = 0$ and spectral density $f_\eta$, and the $x_t$ are covariate vectors.

Following Claeskens and Hjort (2003) the true model is specified by
$$m_{\mathrm{true}}(x_t) = m(x_t; \beta_0, \gamma_{0,1} + \delta_1/\sqrt{n}) \quad\text{and}\quad f_{\mathrm{true}}(\omega) = f(\omega; \nu_0, \gamma_{0,2} + \delta_2/\sqrt{n}),$$
where $\theta_0 = (\beta_0, \nu_0) \in \mathbb{R}^{p_1 + p_2}$ and $\gamma_0 = (\gamma_{0,1}, \gamma_{0,2}) \in \mathbb{R}^{q_1 + q_2}$.

In addition, we need (essentially) that
$$\frac{1}{n}\,[\dot m_0(x_t)]^{\mathrm{t}}\,\Sigma(f_0)^{-1}\,[\dot m_0(x_t)] \to M$$
exists, with $\dot m_0(x_t) = \partial m(x_t; \beta_0, \gamma_{0,1})/\partial\beta$ and $\Sigma(f_0)$ being the associated covariance matrix, and where $f_0(\omega) = f(\omega; \nu_0, \gamma_{0,2})$ has continuous and bounded second-order derivatives.

Model and assumptions

This setup allows for misspecification in both trend and dependency. The wide model has $p + q = (p_1 + p_2) + (q_1 + q_2)$ parameters. The candidate models are nested between the wide model and the narrow model, where $\gamma = \gamma_0$. There is a total of $2^{q_1 + q_2}$ possible models, obtained by including/excluding elements of $\gamma_0$; we only consider those that are judged sufficiently plausible.

There are few papers dealing with FIC-related topics for time series models, see e.g. Claeskens et al. (2007). The derived results are also valid for the locally-stationary processes of Dahlhaus (1997).

Example: Suppose $Y_t = 0 + \epsilon_t$ and that we are interested in a certain (important) covariance lag $h$; then
$$\mu_M(h) = \mathrm{cov}_{f_\nu}(Y_t, Y_{t+h}) = \int_{-\pi}^{\pi} \cos(\omega h)\, f_\nu(\omega)\, \mathrm{d}\omega.$$
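A quick R check of the covariance-lag identity above for an AR(2) spectral density (a sketch with illustrative coefficients; the function names are mine, not from any package):

## Sketch: the lag-h focus as an integral of the spectral density, checked
## against the closed-form AR autocorrelation from ARMAacf().
ar_spec <- function(omega, phi, sigma2 = 1) {
  # spectral density of a causal AR(p) process with coefficients phi
  sigma2 / (2 * pi) /
    Mod(1 - sapply(omega, function(w) sum(phi * exp(-1i * w * seq_along(phi)))))^2
}
cov_lag <- function(h, phi, sigma2 = 1)
  integrate(function(w) cos(w * h) * ar_spec(w, phi, sigma2), -pi, pi)$value

phi <- c(0.5, -0.3)                                       # illustrative AR(2) coefficients
c(integral = cov_lag(2, phi) / cov_lag(0, phi),
  ARMAacf  = unname(ARMAacf(ar = phi, lag.max = 2)[3]))   # both give rho(2)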

Okay, but does it really work?

A simple simulation study with an AR(4) model, focusing on various $\mu_M(h) = \mathrm{cov}_{f_\nu}(Y_t, Y_{t+h})$. The true model has $\sigma = 1$ and $\rho = (0.4, 0.4, 0.4, 0.2)$, and the figure is based on 50 simulated series of length $n = 50$.
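In the same spirit, a rough R sketch of such a study, comparing how well AR(k) candidate models estimate a lag-h covariance over repeated simulations. The AR coefficients below are my own illustrative choice, since the slide's rho may refer to a different parametrisation of the AR(4) model (its components sum to more than one, which rules them out as ordinary AR coefficients).

## Rough sketch of such a study: how well do AR(k) candidates, k = 1, ..., 4,
## estimate the lag-h covariance over repeated simulations?  The AR coefficients
## below are illustrative assumptions, not the slide's rho.
set.seed(6)
phi_true <- c(0.4, 0.2, 0.1, 0.1); h <- 2; n <- 50

cov_lag_ar <- function(phi, sigma2, h) {
  # autocovariance at lag h of an AR(p) process with coefficients phi
  rho    <- ARMAacf(ar = phi, lag.max = max(h, length(phi)))
  gamma0 <- sigma2 / (1 - sum(phi * rho[2:(length(phi) + 1)]))
  gamma0 * rho[h + 1]
}
truth <- cov_lag_ar(phi_true, 1, h)

est <- replicate(50, {
  y <- arima.sim(n = n, list(ar = phi_true))
  sapply(1:4, function(k) {
    fit <- arima(y, order = c(k, 0, 0), include.mean = FALSE)
    cov_lag_ar(fit$coef, fit$sigma2, h)
  })
})
rowMeans((est - truth)^2)   # empirical mse of the lag-h covariance for AR(1), ..., AR(4)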

The focused information criterion for time series

Suppose $\mu = \mu(\theta)$, where $\theta = (\beta, \nu)$, only depends on the model parameters. Then, for each submodel $M$, a general argument gives
$$\sqrt{n}(\hat\mu_M - \mu_{\mathrm{true}}) \to_d \Lambda_M,$$
where $\Lambda_M$ has a certain multivariate normal distribution. From this an unbiased estimator for $\mathrm{mse}(\Lambda_M)$ is constructed via
$$\widehat{\mathrm{mse}}(\Lambda_M) = \widehat{\mathrm{sqb}}(\Lambda_M) + \widehat{\mathrm{Var}}(\Lambda_M).$$
A common challenge is that $\widehat{\mathrm{sqb}}(\Lambda_M)$ is itself biased and should be corrected, resulting in the robust
$$\widehat{\mathrm{mse}}(\Lambda_M) = \max\{0,\, \widehat{\mathrm{sqb}}(\Lambda_M) - \mathrm{bias}(\widehat{\mathrm{sqb}}(\Lambda_M))\} + \widehat{\mathrm{Var}}(\Lambda_M).$$
The FIC strategy is to use $\widehat{\mathrm{mse}}(\Lambda_M)$ to approximate the mse for each submodel $M$. Here, compared to Claeskens and Hjort (2003), the general structure, arguments and formulas are quite similar.
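The bias correction and truncation above can be seen in a one-dimensional toy case: if $D \sim N(\delta, \tau^2)$ then $D^2 - \tau^2$ is unbiased for the squared bias $\delta^2$ but can be negative, so it is truncated at zero. A small R check (illustrative only, not the paper's code):

## Toy illustration (not from the paper) of why the squared-bias estimate is
## truncated at zero: D^2 - tau^2 is unbiased for delta^2 but can go negative.
set.seed(3)
delta <- 0.5; tau <- 1
D <- rnorm(1e5, mean = delta, sd = tau)
c(true_sqb  = delta^2,
  naive     = mean(D^2),                  # overshoots by tau^2
  unbiased  = mean(D^2 - tau^2),          # right on average, negative in single draws
  truncated = mean(pmax(0, D^2 - tau^2))) # the max{0, .} device used above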

The focused information criterion for time series

Under the Gaussian assumption, the parameters related to trend and dependency are independent in the limit. The traditional (non-robust) FIC can therefore be expressed as
$$\mathrm{FIC}(\mu, M) = \hat\sigma^2_{\mathrm{narrow}} + 2(\hat\sigma^2_{M_f} + \hat\sigma^2_{M_m}) + (\hat\psi_{\mathrm{wide}} - \hat\psi_{M_f} - \hat\psi_{M_m})^2,$$
where the $\hat\sigma$ are related to the variance and the $\hat\psi$ to the bias.

If $\mu$ is independent of either trend or dependency, e.g. $\mu = m_\beta(1) - m_\beta(0)$ or $\mu = C(0)$, the FIC scores are indifferent to changes in the excluded direction. This suggests either detrending prior to the analysis or estimating the scores under the respective candidate model.

Also, the formulas involved simplify if:
- $m_\beta(x_t) = \beta$,
- $m_\beta(x_t) = x_t^{\mathrm{t}}\beta$ and the $x_t$ are from a well-behaved distribution,
- $x_t$ is smooth in $t$,
- $Y_t$ is a locally-stationary process (cf. Dahlhaus (1997)).

What makes focus functions data-dependent?

Some foci are more interesting in a conditional framework. Illustration: consider the data-independent threshold probability
$$\mu = \Pr\{Y_{n+1} > a \text{ and } Y_{n+2} > a\}$$
or the data-dependent
$$\mu(H_m) = \Pr\{Y_{n+1} > a \text{ and } Y_{n+2} > a \mid H_m\}$$
for a suitable constant $a$, and with (recent) history $H_m = (Y_{n-m+1}, \ldots, Y_n)$.

In principle we could have $H_m$ with $m = n$. In practice, quite often $m$ is independent of $n$ and $m \ll n$. Example: for AR(q) processes it will often be sufficient with $m = q$. Short-memory series with $H_m$ and $m = m_n$ should be effectively approximated in a fixed and recent history framework.
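For concreteness, a small R sketch (illustrative, not the paper's computation) of such a conditional focus under a fitted AR(1), where a recent history of length m = 1 suffices; the threshold a and the Monte Carlo plug-in evaluation are my own choices:

## Illustrative sketch of a conditional, data-dependent focus under a fitted
## AR(1): mu(H_m) = Pr{Y_{n+1} > a and Y_{n+2} > a | H_m}, by plug-in Monte Carlo.
set.seed(4)
y   <- arima.sim(n = 200, list(ar = 0.6))
fit <- arima(y, order = c(1, 0, 0), include.mean = FALSE)
phi <- unname(fit$coef["ar1"]); s <- sqrt(fit$sigma2)
a   <- 1
y_n <- tail(as.numeric(y), 1)             # for an AR(1), H_m with m = 1 suffices

B  <- 1e5
y1 <- phi * y_n + rnorm(B, 0, s)          # draws of Y_{n+1} given H_m
y2 <- phi * y1  + rnorm(B, 0, s)          # draws of Y_{n+2} given Y_{n+1}
mean(y1 > a & y2 > a)                     # plug-in estimate of the conditional focus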

What about predictions?

Predictions are essentially data-dependent focus functions. To easily see why,
- let $F_k = (Y_{n+1}, \ldots, Y_{n+k})$ represent the near future,
- suppose we intend to predict $g(F_k)$, e.g. for one-step-ahead predictions $g(F_k) = Y_{n+1}$,
- and let $\hat\mu_M(H_m)$ be a predictor for $g(F_k)$;
- then
$$\mathrm{mse}(\hat\mu_M(H_m)) = \mathrm{E}\{\hat\mu_M(H_m) - \mathrm{E}[g(F_k) \mid H_m] + \mathrm{E}[g(F_k) \mid H_m] - g(F_k)\}^2$$
$$= \mathrm{E}\{\hat\mu_M(H_m) - \mathrm{E}[g(F_k) \mid H_m]\}^2 + 0 + \mathrm{E}\{\mathrm{E}[g(F_k) \mid H_m] - g(F_k)\}^2,$$
- with the conclusion that a good predictor for $g(F_k)$ is equivalent (in terms of mse) to a good estimator for $\mu_{\mathrm{true}}(H_m) = \mathrm{E}[g(F_k) \mid H_m]$.
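The middle term is the cross term; a short check that it vanishes (a sketch, under the extra assumption that the recent history is sufficient for the future, e.g. an AR(q) with $m \ge q$, so that $\mathrm{E}[g(F_k) \mid Y_1, \ldots, Y_n] = \mathrm{E}[g(F_k) \mid H_m]$): writing $\mathcal{F}_n$ for the full past,
$$\begin{aligned}
& \mathrm{E}\bigl[\{\hat\mu_M(H_m) - \mathrm{E}[g(F_k)\mid H_m]\}\,\{\mathrm{E}[g(F_k)\mid H_m] - g(F_k)\}\bigr] \\
&\quad = \mathrm{E}\bigl[\{\hat\mu_M(H_m) - \mathrm{E}[g(F_k)\mid H_m]\}\,\mathrm{E}\{\mathrm{E}[g(F_k)\mid H_m] - g(F_k) \mid \mathcal{F}_n\}\bigr] \\
&\quad = \mathrm{E}\bigl[\{\hat\mu_M(H_m) - \mathrm{E}[g(F_k)\mid H_m]\}\,\{\mathrm{E}[g(F_k)\mid H_m] - \mathrm{E}[g(F_k)\mid \mathcal{F}_n]\}\bigr] = 0,
\end{aligned}$$
by the tower property, since $\hat\mu_M(H_m)$ and $\mathrm{E}[g(F_k)\mid H_m]$ are functions of the observed data.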

Why this interest in recent history?

A data-dependent focus begs the question of $\mathrm{mse}(\hat\mu_M(H_m))$ or $\mathrm{mse}(\hat\mu_M(H_m) \mid H_m)$; use the one that best represents what is important.

This does not necessarily make sense if $m = n$, since
$$\mathrm{mse}(\hat\mu_M(H_m) \mid H_m) = \mathrm{mse}(\hat\mu_M(H_n) \mid H_n) = 0$$
for all unbiased submodel estimators. If $m$ is independent of $n$ and $m \ll n$, it makes sense to introduce
$$\mathrm{cFIC}(\mu, M, H_m) = \widehat{\mathrm{mse}}(\hat\mu_M(H_m) \mid H_m).$$

If the large-sample arguments hold conditionally and things are independent of $H_m$ in the limit, then:
- the familiar FIC formulas remain largely unchanged (everything involving $\mu$ does now depend on $H_m$),
- and they should be interpreted in relation to the conditional mse.

Why this interest in recent history?

A key step of the FIC argument depends on the delta method, i.e.
$$\sqrt{n}\bigl(\hat\mu_M(H_m) - \mu_{\mathrm{true}}(H_m)\bigr) = \sqrt{n}\bigl(\mu(\hat\theta_M, \hat\gamma_M, \gamma_{0,M^c}, H_m) - \mu_{\mathrm{true}}(H_m)\bigr) \doteq \partial\mu(\theta_0, \gamma_0 + \delta/\sqrt{n}, H_m)^{\mathrm{t}} Z_n,$$
where
$$Z_n = \begin{pmatrix} \sqrt{n}(\hat\theta - \theta_0) \\ \sqrt{n}(\hat\gamma_M - \gamma_{0,M}) \end{pmatrix}$$
depends on $H_m$ through $(\hat\theta, \hat\gamma_M)$.

In the conditional framework $\partial\mu(\theta_0, \gamma_0 + \delta/\sqrt{n}, H_m)$ is not random anymore, which simplifies the arguments needed. And if $Z_n \mid H_m \to_d Z$, with $Z$ independent of $H_m$, we have justified our
$$\mathrm{cFIC}(\mu, M, H_m) = \widehat{\mathrm{mse}}(\hat\mu_M(H_m) \mid H_m).$$

Is the conditional convergence of $Z_n$ generally true? Again, this will not work if $m = n$ and $H_m = H_n = (Y_1, \ldots, Y_n)$.
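For orientation, the scalar version of this delta-method step (a standard fact, written with generic symbols rather than the paper's exact notation): if $\sqrt{n}(\hat\theta - \theta_0) \to_d N(0, \tau^2)$ and $\mu$ is differentiable at $\theta_0$, then
$$\sqrt{n}\bigl(\mu(\hat\theta) - \mu(\theta_0)\bigr) = \mu'(\theta_0)\,\sqrt{n}(\hat\theta - \theta_0) + o_{\mathrm{pr}}(1) \to_d N\bigl(0, \mu'(\theta_0)^2 \tau^2\bigr),$$
and the multivariate, locally misspecified version used above works in the same way, with the gradient $\partial\mu$ and the limit of $Z_n$ in place of $\mu'$ and the scalar limit.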

Now, how does this play out unconditionally?

It is much harder to find a simple limit experiment such that
$$\sqrt{n}\bigl(\hat\mu_M(H_m) - \mu_{\mathrm{true}}(H_m)\bigr) - \Lambda_M(H_m) \to_{\mathrm{pr}} 0$$
unconditionally. However, following the general idea,
$$\mathrm{mse}(\Lambda_M(H_m)) \approx \mathrm{mse}\bigl(\sqrt{n}(\hat\mu_M(H_m) - \mu_{\mathrm{true}}(H_m))\bigr) = \mathrm{E}\bigl\{\mathrm{mse}\bigl(\sqrt{n}(\hat\mu_M(H_m) - \mu_{\mathrm{true}}(H_m)) \mid H_m\bigr)\bigr\} \approx \mathrm{E}\{\mathrm{mse}(\Lambda_M(H_m) \mid H_m)\}.$$
A quick (and dirty) solution resulting in explicit (but quite messy) formulas is to use
$$\mathrm{FIC}(\mu, M, H_m) = \hat{\mathrm{E}}\{\widehat{\mathrm{mse}}(\hat\mu_M(H_m) \mid H_m)\}.$$

Illustration: The Hjort liver quality index (1859-2012)

For an individual fish,
$$\mathrm{HSI}_{\mathrm{fish}} = 100 \cdot \frac{\text{weight of liver}}{\text{weight of fish}}.$$
The HSI is a measure of the quality of life and is e.g. related to reproduction. We want to understand the dynamics of the HSI index in relation to e.g. external factors.

Illustration: The Hjort liver quality index (1859-2012)

Predicting the future liver quality index. The model we consider is
$$\mathrm{HSI}_{\mathrm{year}_i} = \beta_0 + \beta_1\,\mathrm{year}_i + x_i^{\mathrm{t}}\beta + \epsilon_i,$$
where the intercept $\beta_0$ and $\sigma$ are protected, and $x_i$ contains winter Kola temperature, mortality rate (F) and food availability (capelin).
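A hypothetical R sketch of how a model of this form could be fitted, using simulated stand-in data (the actual HSI series, the covariates and the authors' estimation code are not reproduced here):

## Hypothetical sketch: trend-plus-covariates regression with dependent errors,
## on simulated stand-in data, not the real HSI series.
set.seed(5)
n    <- 150
year <- seq_len(n)
x    <- cbind(kola = rnorm(n), F = rnorm(n), capelin = rnorm(n))
eps  <- arima.sim(n = n, list(ar = 0.5))
hsi  <- as.numeric(5 + 0.01 * year + x %*% c(0.3, -0.2, 0.1) + eps)

# Regression with AR(1) errors; candidate models would drop covariates and/or
# simplify the error structure, with the FIC judging them for a chosen focus.
fit <- arima(hsi, order = c(1, 0, 0), xreg = cbind(year, x))
fit$coef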

Illustration: The Hjort liver quality index (1859-2012)

Other foci we looked at were the relative slope and the probability of two lean years in a row. More details and discussion can be found in Hermansen et al. (2015).

Illustration: Prototype FIC R-package

A simple threshold probability simulation experiment. The true model is $Y_t = 1 + 2(t/n) + \epsilon_t$, where $\epsilon_t$ is an AR(4) process with $\rho = (0.3, 0.2, 0.1, 0.1)$ and $\sigma = 1$, and $n = 50$. The focus parameter is $\mu_1(H_m) = \Pr\{Y_{n+1} > 0 \mid H_m\}$, with true value $\mu_1(H_m) = 0.48$.

    m    f   mu   mse  bias   sd   psi tau.sq  fic fic.b     aic     bic p fic.r fic.b.r aic.r bic.r
 1  1 1111 0.56  0.97  0.00 0.99  1.70  0.820 1.60   1.6 -258.98 -277.22 7     5       5     5     8
 2  1 1110 0.52  0.44 -0.46 0.81  2.00  0.500 1.10   1.3 -257.30 -272.93 6     3       3     3     4
 3  1 1100 0.53  0.43 -0.47 0.81  2.00  0.490 1.10   1.3 -255.31 -268.34 5     2       2     1     2
 4  1 1000 0.64  0.74  0.45 0.73  0.86  0.380 1.40   1.4 -263.35 -273.77 4     4       4     7     6
 5  1 0000 0.78  7.30  2.70 0.50 -1.10  0.096 8.00   8.0 -273.17 -280.98 3    10      10     9     9
 6  0 1111 0.41  2.10  1.10 0.94  2.80  0.720 2.70   2.7 -259.35 -274.98 6     6       6     6     7
 7  0 1110 0.40  2.20  1.30 0.75  3.10  0.400 2.90   2.9 -257.44 -270.47 5     9       9     4     3
 8  0 1100 0.40  2.20  1.30 0.75  3.10  0.400 2.90   2.9 -255.54 -265.96 4     8       8     2     1
 9  0 1000 0.48  0.01 -0.65 0.66  2.00  0.280 0.67   1.1 -265.65 -273.46 3     1       1     8     5
10  0 0000 0.61  2.10  1.40 0.40  0.00  0.000 2.80   2.8 -279.65 -284.87 2     7       7    10    10

tau.null = 0.4   psi.wide = 1.7

Illustration: Prototype FIC R-package

A simple threshold probability simulation experiment. The true model is $Y_t = 1 + 2(t/n) + \epsilon_t$, where $\epsilon_t$ is an AR(4) process with $\rho = (0.3, 0.2, 0.1, 0.1)$ and $\sigma = 1$, and $n = 50$. The focus parameter is $\mu_2(H_m) = \Pr\{Y_{n+1} > 0 \text{ and } Y_{n+2} > 0 \mid H_m\}$, with true value $\mu_2(H_m) = 0.35$.

    m    f   mu   mse   bias   sd    psi tau.sq  fic fic.b     aic     bic p fic.r fic.b.r aic.r bic.r
 1  1 1111 0.39  0.98  0.000 0.99  1.400   0.74 1.50  1.50 -258.98 -277.22 7     4       4     5     8
 2  1 1110 0.38  0.98 -0.037 0.99  1.400   0.74 1.50  1.50 -257.30 -272.93 6     3       3     3     4
 3  1 1100 0.38  0.39 -0.550 0.83  1.400   0.44 0.89  1.20 -255.31 -268.34 5     2       2     1     2
 4  1 1000 0.52  1.80  1.100 0.72  0.025   0.28 2.30  2.30 -263.35 -273.77 4     6       6     7     6
 5  1 0000 0.61  7.20  2.600 0.62 -1.400   0.14 7.70  7.70 -273.17 -280.98 3    10      10     9     9
 6  0 1111 0.24  2.60  1.300 0.92  2.700   0.60 3.10  3.10 -259.35 -274.98 6     8       8     6     7
 7  0 1110 0.24  2.70  1.300 0.92  2.800   0.60 3.20  3.20 -257.44 -270.47 5     9       9     4     3
 8  0 1100 0.25  2.00  1.200 0.74  2.700   0.30 2.50  2.50 -255.54 -265.96 4     7       7     2     1
 9  0 1000 0.33 -0.22 -0.780 0.62  1.400   0.14 0.28  0.88 -265.65 -273.46 3     1       1     8     5
10  0 0000 0.37  1.30  1.100 0.49  0.000   0.00 1.80  1.80 -279.65 -284.87 2     5       5    10    10

tau.null = 0.49   psi.wide = 1.4

Illustration: Prototype FIC R-package

A simple threshold probability simulation experiment. The true model is $Y_t = 1 + 2(t/n) + \epsilon_t$, where $\epsilon_t$ is an AR(4) process with $\rho = (0.3, 0.2, 0.1, 0.1)$ and $\sigma = 1$, and $n = 50$. The focus parameter is $\mu_{10}(H_m) = \Pr\{Y_{n+10} > 0 \mid H_m\}$, with true value $\mu_{10}(H_m) = 0.79$.

    m    f   mu  mse    bias   sd   psi tau.sq  fic fic.b     aic     bic p fic.r fic.b.r aic.r bic.r
 1  1 1111 0.78 0.52  0.0000 0.72 -0.84  0.200 0.39  0.39 -258.98 -277.22 7     3       3     5     8
 2  1 1110 0.77 0.52 -0.0032 0.72 -0.83  0.200 0.39  0.39 -257.30 -272.93 6     2       2     3     4
 3  1 1100 0.77 0.52 -0.0096 0.72 -0.83  0.200 0.39  0.39 -255.31 -268.34 5     1       1     1     2
 4  1 1000 0.79 0.58  0.2400 0.72 -1.10  0.190 0.44  0.44 -263.35 -273.77 4     4       4     7     6
 5  1 0000 0.78 0.96  0.6800 0.71 -1.50  0.180 0.83  0.83 -273.17 -280.98 3     6       6     9     9
 6  0 1111 0.59 2.50  1.5000 0.59  0.69  0.020 2.40  2.40 -259.35 -274.98 6     8       8     6     7
 7  0 1110 0.59 2.50  1.5000 0.59  0.69  0.020 2.40  2.40 -257.44 -270.47 5    10      10     4     3
 8  0 1100 0.59 2.50  1.5000 0.59  0.69  0.020 2.40  2.40 -255.54 -265.96 4     9       9     2     1
 9  0 1000 0.60 1.80  1.2000 0.58  0.43  0.013 1.60  1.60 -265.65 -273.46 3     7       7     8     5
10  0 0000 0.61 0.83  0.7100 0.57  0.00  0.000 0.70  0.70 -279.65 -284.87 2     5       5    10    10

tau.null = 0.57   psi.wide = -0.84

Concluding remarks

Some work is still needed before completion:
- An R-package is planned/under development.
- Simulation study.
- Model averaging and AFIC.
- There are no good model selection tools for the locally-stationary processes of Dahlhaus (1997).
- The FIC based on the Whittle approximation (cf. Whittle (1953)) has an equally rational motivation.
- The methodology is valid for models with known change-point locations.
- Seasonality.
- Nonparametric (focused) covariance estimator.

References

Claeskens, G., Croux, C., and Van Kerckhoven, J. (2007). Prediction focussed model selection for autoregressive models. Australian & New Zealand Journal of Statistics, 49(4):359-379.

Claeskens, G. and Hjort, N. L. (2003). The focused information criterion. Journal of the American Statistical Association, 98:900-916.

Dahlhaus, R. (1997). Fitting time series models to nonstationary processes. Annals of Statistics, 25:1-37.

Hermansen, G. and Hjort, N. L. (2015). A new approach to Akaike's information criterion and model selection issues in stationary Gaussian time series. Technical report, University of Oslo and Norwegian Computing Centre.

Hermansen, G. H., Hjort, N. L., Kjesbu, O. S., and Tara Marshall, C. (2015). Recent advances in statistical methodology applied to the Hjort liver index time series (1859-2012) and associated influential factors. Canadian Journal of Fisheries and Aquatic Sciences, 73(999):1-17.

Whittle, P. (1953). The analysis of multiple stationary time series. Journal of the Royal Statistical Society, Series B (Methodological), 15(1):125-139.