Minimum Message Length Autoregressive Model Order Selection


Leigh J. Fitzgibbon, School of Computer Science and Software Engineering, Monash University, Clayton, Victoria 3800, Australia. leighf@csse.monash.edu.au
David L. Dowe, School of Computer Science and Software Engineering, Monash University, Clayton, Victoria 3800, Australia.
Farshid Vahid, Department of Econometrics and Business Statistics, Monash University, Clayton, Victoria 3800, Australia. Farshid.Vahid@BusEco.monash.edu.au

Abstract

We derive a Minimum Message Length (MML) estimator for stationary and nonstationary autoregressive models using the Wallace and Freeman (1987) approximation. The MML estimator's model selection performance is empirically compared with AIC, AICc, BIC and HQ in a Monte Carlo experiment by uniformly sampling from the autoregressive stationarity region. Generally applicable uniform priors are used on the coefficients, the model order and log σ² for the MML estimator. The experimental results show the MML estimator to have the best overall average mean squared prediction error and the best ability to choose the true model order.

Keywords: Minimum Message Length, MML, Bayesian, Information Theory, Time Series, Autoregression, AR, Order Selection.

INTRODUCTION

The Wallace and Freeman (1987) Minimum Message Length estimator (MML87) [22] is an information-theoretic criterion for model selection and point estimation. It has been successfully applied to many problems, including (univariate) linear and polynomial regression models [19, 18, 7, 14] and sequence data [6, 7], among others [20, 21, 22]. The purpose of this paper is to investigate the use of the MML87 methodology for autoregressive (time series) models.

Autoregressive models differ from standard linear regression models in that they do not regress on independent variables: the regressors are a subset of the dependent variables (i.e., the lagged values of the series), and the only truly independent variable is time. This has important philosophical (and practical) ramifications for the MML87 regression estimator, which otherwise assumes that the independent variables are transmitted to the receiver up front (or are already known by the receiver). The independent variables appear in the Fisher information matrix, which is necessary for computation of the MML87 coding volume. Such a protocol would be nonsensical if applied directly to autoregression, since it would correspond to transmitting the majority of the data, before the data, in order to transmit the data.

In this paper we investigate several MML87 estimators that take these issues into account. In particular, we focus on an MML87 estimator that is based on the conditional likelihood function, using the least squares parameter estimates. This MML estimator's model selection performance is empirically compared with AIC [1], corrected AIC (AICc) [10], BIC [15] (or 1978 MDL [13]), and HQ [9] in a Monte Carlo experiment by uniformly sampling from the autoregressive stationarity region. While MML is geared towards inference rather than prediction, we find that choosing the autoregressive model order having the minimum message length gives a prediction error that is superior to the other model selection criteria in the Monte Carlo experiments conducted.

BACKGROUND

Autoregression

Regression of a time series on itself, known as autoregression, is a fundamental building block of time series analysis. A linear autoregression with unconditional mean of zero relates the expected value of the time series linearly to the p previous values:

    y_t = φ_1 y_{t−1} + ... + φ_p y_{t−p} + e_t,   e_t ~ N(0, σ²)   (1)

An order p autoregression (AR(p)) model has p + 1 parameters: θ = (φ_1, ..., φ_p, σ²). Some example data generated from various AR(2) models are displayed in Figures 2 to 7. The examples chosen are all stationary, with all inside, but many close to, the boundary of the stationarity region (see Figure 1), to illustrate some of the diversity that can be found in an autoregression model.

Figure 1. AR(2) stationarity region with the plotted example points (see Figures 2 to 7) identified.

In this paper we only consider zero-mean autoregressions, for simplicity. This does not cause any loss of generality, and all results would be qualitatively the same without this assumption.
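A process of this kind is easy to simulate. The following minimal sketch (Python with NumPy; the paper specifies no implementation, so the function name and defaults here are our own) generates data from Equation 1, discarding an initial burn-in segment as the experiments in this paper do:

    import numpy as np

    def simulate_ar(phi, sigma2=1.0, T=200, burn_in=100, seed=None):
        """Simulate T observations from the zero-mean AR(p) of Equation 1,
        discarding an initial burn-in segment."""
        rng = np.random.default_rng(seed)
        phi = np.asarray(phi, dtype=float)
        p = len(phi)
        n = T + burn_in
        y = np.zeros(n)
        e = rng.normal(0.0, np.sqrt(sigma2), n)
        for t in range(p, n):
            # phi_1 * y_{t-1} + ... + phi_p * y_{t-p} + e_t
            y[t] = phi @ y[t - p:t][::-1] + e[t]
        return y[burn_in:]

    # Example: an AR(2) with parameters near the stationarity boundary
    y = simulate_ar([0.5, 0.45], T=150, seed=1)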

Figure. Example data: φ =, φ =.99 and σ =. Figure 3. σ =. Example data: φ =.499, φ =.499 and Figure 4. Example data: φ =.99, φ =.995 and σ =. Figure 5. Example data: φ =, φ =.5 and σ =. Figure 6. Example data: φ =, φ =.99 and σ =. Figure 7. Example data: φ =.99, φ =.995 and σ =. he conditional negative log-likelihood of θ is: log fy p+,..., y θ, y,.., y p ) = p logπσ ) + σ y i φ y i... φ p y i p ) ) i=p+ It is common to make the assumption that y t is weakly stationary. his assumption means that the time series has a constant mean in our case zero) and the autocovariances depend only on the distance in time between observations and not the times themselves: Ey t ) = t 3) Ey t y t j ) = Ey t y t+j ) = γ j t j 4) he first p γ s can be calculated as the first p elements of the first column of the p by p matrix σ I p F F )) [8, page 59], where F is defined as [8]: F = φ φ φ 3... φ p φ p............... 5) he stationarity assumption implies that the first p values are distributed as a multivariate Normal distribution [8, page 4]. he negative log-likelihood of θ for the first p values, z p = [y, y,..., y p ], is: log fy,..., y p θ) = p logπσ ) log Vp + σ z pvp z p 6) where σ V p is the p by p autocovariance matrix. he exact negative log-likelihood function, when stationarity is assumed, is therefore using equations 6 and ): log fy θ) = log fy,..., y p θ) log fy p+,..., y θ, y,.., y p )7) Parameter Estimation here are a large number of methods used to estimate the parameters of an autoregressive model see, e.g., [8, 3, 4]). Some of these methods include: Least squares Conditional least squares OLS) Yule-Walker Burg Maximum likelihood Maximum likelihood is rarely used because its solution is non-linear, requiring a numerical solution which is slow and can suffer from convergence problems [4]. he estimates provided by the Yule-Walker, Burg and maximum likelihood methods are guaranteed to be stationary whereas the least conditional on the first p observed values.

Minimum Message Length Inference

The Minimum Message Length (MML) principle [20, 21, 22] is a method of inductive inference, and it encompasses a large class of approximations and algorithms. In this paper we use the popular MML87 approximation [22], which approximates the message length for a model consisting of n continuous parameters θ = (θ_1, ..., θ_n) by:

    MessLen(θ, y) = −log( h(θ) f(y_1, ..., y_N | θ) ε^N / √|I(θ)| ) + (n/2)(1 + log κ_n) − log h(n)   (13)

where h(θ) is a prior distribution over the n parameter values, f(y_1, ..., y_N | θ) is the standard statistical likelihood function, I(θ) is the expected Fisher information matrix, κ_n is a lattice constant (κ_1 = 1/12, κ_2 = 5/(36√3), κ_3 = 19/(192 · 2^{1/3}), and κ_n → 1/(2πe) as n → ∞) [5] [22, sec. 5.3] which accounts for the expected error in the log-likelihood function due to quantisation of the n-dimensional parameter space, ε is the measurement accuracy of the data, and h(n) is the prior on the number of parameters. The MML87 message length equation, Equation 13, is used for model selection and parameter estimation by choosing the model order and parameters that minimise the message length. (For a comparison between MML and the subsequent Minimum Description Length (MDL) principle [13], see [21].)

MML87 AUTOREGRESSION

In this section we derive several MML87 message length expressions for an autoregression model. The first decision that must be made is the form of the MML message. There are two seemingly obvious message formats that we could use, based on the conditional or the exact likelihood function. If the exact likelihood function is used, we are assuming that the data come from a stationary process. If the conditional likelihood function (Equation 2) is used, stationarity need not be assumed, but we must transmit the first p elements ({y_1, ..., y_p}) of the series prior to transmission of the data. In this section we give three message formats. The first uses the exact likelihood function and thus assumes stationarity, the second uses the conditional likelihood function and does not assume stationarity, and the third is a combination of the previous two.

Stationary Message Format

The format of the MML87 message using the exact likelihood function (Equation 7) is as follows:

1. First part: p, φ_1, ..., φ_p, σ²
2. Second part: y_1, ..., y_T
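The exact likelihood that this format encodes (Equation 7) can be evaluated directly from the definitions above. The following sketch (ours; it assumes the stationarity required by Equation 6) computes the autocovariances with the vec formula quoted from [8] and sums the two negative log-likelihood terms:

    import numpy as np

    def companion(phi):
        """Companion matrix F of Equation 5."""
        p = len(phi)
        F = np.zeros((p, p))
        F[0, :] = phi
        F[1:, :-1] = np.eye(p - 1)
        return F

    def exact_neg_log_lik(phi, sigma2, y):
        """Exact negative log-likelihood (Equation 7) for a stationary AR(p):
        the multivariate-Normal term for y_1..y_p (Equation 6) plus the
        conditional term for y_{p+1}..y_T (Equation 2)."""
        phi = np.asarray(phi, dtype=float)
        p, T = len(phi), len(y)
        F = companion(phi)
        # First column of [I - F kron F]^{-1}: autocovariances divided by sigma^2
        vecV = np.linalg.solve(np.eye(p * p) - np.kron(F, F),
                               np.eye(p * p)[:, 0])
        gamma = vecV[:p]                      # gamma_0 .. gamma_{p-1}, over sigma^2
        Vp = np.array([[gamma[abs(i - j)] for j in range(p)] for i in range(p)])
        zp = y[:p]
        nll_init = (p / 2) * np.log(2 * np.pi * sigma2) \
            + 0.5 * np.log(np.linalg.det(Vp)) \
            + zp @ np.linalg.solve(Vp, zp) / (2 * sigma2)
        X = np.column_stack([y[p - k:T - k] for k in range(1, p + 1)])
        resid = y[p:] - X @ phi
        nll_cond = ((T - p) / 2) * np.log(2 * np.pi * sigma2) \
            + np.sum(resid ** 2) / (2 * sigma2)
        return nll_init + nll_cond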
MML87 requires the determinant of the expected Fisher information matrix in order to determine the uncertainty volume for the continuous parameters. We can write the expected Fisher information as the sum of two terms, which arise from the two factors of the exact likelihood function (Equation 7, via Equations 6 and 2):

    I(θ) = I_{y_1,...,y_p}(θ) + I_{y_{p+1},...,y_T}(θ)   (14)

The expected Fisher information is easily calculated for the conditional likelihood term:

    I_{y_{p+1},...,y_T}(θ) = [ (1/σ²) E(X′X)        0        ]   (15)
                             [        0        (T−p)/(2σ⁴)  ]

where X is a (T − p) by p matrix:

        [ y_p      y_{p−1}  ...  y_1     ]
    X = [ y_{p+1}  y_p      ...  y_2     ]   (16)
        [  ⋮        ⋮             ⋮      ]
        [ y_{T−1}  y_{T−2}  ...  y_{T−p} ]

Since we have assumed stationarity, E(y_t y_{t−j}) = γ_j, and therefore:

    I_{y_{p+1},...,y_T}(θ) = [ (T − p) V_p        0        ]   (17)
                             [      0       (T−p)/(2σ⁴)   ]

The Fisher information for the multivariate Normal term (recall Equation 6), I_{y_1,...,y_p}(θ), is difficult to calculate, and I(θ) can instead be approximated as (T/(T − p)) I_{y_{p+1},...,y_T}(θ) (see, e.g., [3]).
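The expectation step between Equations 15 and 17, E(X′X) = (T − p)σ²V_p under stationarity, can be checked by simulation. A small sketch for the AR(1) case, where V_1 = γ_0/σ² = 1/(1 − φ²) (our verification code, not the paper's):

    import numpy as np

    # For a stationary AR(1), gamma_0 = sigma^2 / (1 - phi^2), so Equation 17
    # predicts (1/sigma^2) E(X'X) = (T - 1) / (1 - phi^2).
    rng = np.random.default_rng(0)
    phi, sigma2, T, reps = 0.8, 1.0, 500, 2000
    acc = 0.0
    for _ in range(reps):
        e = rng.normal(0.0, np.sqrt(sigma2), T + 200)
        y = np.zeros_like(e)
        for t in range(1, len(e)):
            y[t] = phi * y[t - 1] + e[t]       # Equation 1 with p = 1
        y = y[200:]                             # discard burn-in
        X = y[:T - 1, None]                     # regressor column: lagged values
        acc += (X.T @ X)[0, 0] / reps
    print(acc / sigma2)                 # Monte Carlo estimate of (1/sigma^2) E(X'X)
    print((T - 1) / (1 - phi ** 2))     # theoretical value from Equation 17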

Using this approximation, the determinant of the expected Fisher information is equal to:

    |I(θ)| ≈ (T^{p+1} / (2σ⁴)) |V_p|   (18)

The stationary-model message length is calculated by substituting Equations 7 and 18 into Equation 13, along with the selection of a suitable prior (we give a generally applicable prior in a later section).

Nonstationary Message Format

In this subsection we give the format of the MML message based on the conditional likelihood function (Equation 2). When the conditional likelihood function is used, we must transmit the first p elements, {y_1, ..., y_p}, of the series to the receiver prior to transmission of (the rest of) the data. The format of the message is therefore:

1. First part: p, y_1, ..., y_p, φ_1, ..., φ_p, σ²
2. Second part: y_{p+1}, ..., y_T

We see that the initial values are transmitted before the continuous parameters; they can therefore be used to determine the optimal uncertainty volume for the continuous parameters. The downside of this ordering, however, is that the receiver must be able to decode the first p values of the time series before it can decode the parameters and the (remaining) data. The initial values must therefore be encoded without such knowledge. We assume that the receiver knows the minimum (y_min) and maximum (y_max) of the data (or, equivalently, that these values have been transmitted using a general code, thus adding a constant to the message length). We can then encode the initial p values independently, using a uniform density over the interval [y_min − ε/2, y_max + ε/2], where ε is the measurement accuracy of the data, in a message of length:

    MessLen(y_1, ..., y_p) = p log( (y_max − y_min)/ε + 1 )   (19)

Since the data elements in the message are also encoded to an accuracy of ε (recall Equation 13), the term −T log ε appears in the overall message length, and therefore (for sufficiently small ε) the choice of ε does not significantly affect model order selection.

Calculation of the expected Fisher information for the conditional likelihood function in the nonstationary case is difficult. Instead, we use the partial expected Fisher information:

    I_{y_{p+1},...,y_T}(θ) ≈ [ (1/σ²) X′X        0        ]   (20)
                             [      0       (T−p)/(2σ⁴)  ]

The nonstationary-model message length is then the sum of Equation 19 and Equation 13, after substitution of Equations 2 and 20 and the selection of a suitable prior (again, given in the next section).

Combining the Two Formats (Combination Format)

A slightly more efficient message can be constructed by using a combination of the stationary and nonstationary messages. The sender can compute the length of the transmission using both messages and then choose the one having the shorter message length. An additional bit is required at the beginning of the message to tell the receiver which message format has been used. In practice we will generally be using the OLS estimate, since the proper MML estimate requires a non-linear solution. For the OLS estimate, this coding scheme translates to the following procedure (the stationarity check is sketched below):

1. Compute the OLS estimate.
2. Determine whether the estimated model is stationary (e.g., by checking the eigenvalues of F, Equation 5).
3. If it is nonstationary, then the nonstationary message must be used, because the assumptions made in the multivariate Normal term (Equation 6) of the likelihood function are no longer valid.
4. Else (if it is stationary), use the shorter of the two messages.
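Step 2 of the procedure above is a direct companion-matrix computation: the fitted model is stationary exactly when every eigenvalue of F (Equation 5) lies strictly inside the unit circle. A minimal sketch (ours):

    import numpy as np

    def is_stationary(phi):
        """True iff the AR(p) with coefficients phi is stationary, i.e. all
        eigenvalues of the companion matrix F (Equation 5) lie strictly
        inside the unit circle."""
        p = len(phi)
        F = np.zeros((p, p))
        F[0, :] = phi
        F[1:, :-1] = np.eye(p - 1)
        return bool(np.all(np.abs(np.linalg.eigvals(F)) < 1.0))

    # e.g. is_stationary([0.5, 0.45]) -> True; is_stationary([1.2, 0.1]) -> False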
BAYESIAN PRIORS

Supposing that we are ignorant prior to observation of the data, but expect the data to be stationary, we place a uniform⁴ prior on p, on φ_1, ..., φ_p and on log σ²:

    h(p) ∝ 1   (21)
    h(φ_1, ..., φ_p, σ²) = 1/(R_p σ²)   (22)

where R_p is the hypervolume of the stationarity region. Based on results in [2], [12] gives the following recursion for computing the stationarity region hypervolume:

    M_1 = 2
    M_{p+2} = ((p + 1)/(p + 2)) M_p
    R_p = (M_1 M_3 ... M_{p−1})²   for p even
    R_{p+1} = R_p M_{p+1}

The stationarity region hypervolume is plotted in Figure 8 for orders 1 to 30.

EMPIRICAL COMPARISON

In the following experiments we have used the conditional ordinary least squares (OLS) estimates and concentrated on the nonstationary message format (i.e., the second format)⁵, using the priors given in the previous section. The nonstationary MML format is labelled MML in the graphs and tables. We have used the OLS estimates because they come closest to the true minimum message length estimates. When data are generated, we simulate more elements than are needed and dispose of the initial (burn-in) elements. The residual variance estimate, σ̂², is calculated using Equation 12 (where T is the total sample size).

⁴ We note that the ranges used on p and log σ² do not affect the order selection experiments conducted in this paper, but they would need to be specified in some modelling situations.

⁵ While the combination message format was found to provide a slight improvement over the nonstationary message format, it did not fit into our experimental protocol (i.e., it could be considered to have an unfair advantage over the other criteria by gaining information from the multivariate Normal term appearing in the exact likelihood function). It is also less practical, since it requires significantly more computation time.
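Both the hypervolume recursion of the BAYESIAN PRIORS section and the nonstationary message length itself are straightforward to implement. The sketch below (Python/NumPy) is our own reading of the construction, not the authors' code: Equation 19 for the initial values, plus Equation 13 evaluated with the conditional likelihood (Equation 2), the partial Fisher information (Equation 20) and the priors of Equations 21–22. The lattice constant κ_n is replaced by its large-n limit 1/(2πe), and terms constant across candidate orders (such as the range constants for p and log σ² of footnote 4) are dropped, so only differences between the returned lengths are meaningful.

    import numpy as np

    def stationarity_volume(p):
        """Hypervolume R_p of the AR(p) stationarity region via the recursion
        above (R_1 = 2, the interval; R_2 = 4, the triangle; R_3 = 16/3; ...)."""
        M = {1: 2.0}
        for k in range(1, p, 2):               # odd M's; M_{2j} equals M_{2j-1}
            M[k + 2] = (k + 1) / (k + 2) * M[k]
        R = M[1]
        for k in range(2, p + 1):
            R *= M[k] if k % 2 else M[k - 1]
        return R

    def mml87_nonstationary(y, p, eps=0.01):
        """Sketch of the nonstationary MML87 message length for an AR(p),
        evaluated at the OLS estimates, up to an additive constant."""
        y = np.asarray(y, dtype=float)
        T, n = len(y), p + 1                   # n parameters: phi_1..phi_p, sigma^2
        X = np.column_stack([y[p - k:T - k] for k in range(1, p + 1)])
        phi = np.linalg.lstsq(X, y[p:], rcond=None)[0]
        s2 = np.sum((y[p:] - X @ phi) ** 2) / (T - p)        # Equation 12
        # Equation 19: the first p values under a uniform density on the range
        len_init = p * np.log((y.max() - y.min()) / eps + 1.0)
        # negative log prior, Equation 22
        neg_log_prior = np.log(stationarity_volume(p)) + np.log(s2)
        # conditional negative log-likelihood (Equation 2), data coded to eps
        nll = ((T - p) / 2) * np.log(2 * np.pi * s2) \
            + np.sum((y[p:] - X @ phi) ** 2) / (2 * s2) - (T - p) * np.log(eps)
        # half log-determinant of the partial Fisher information, Equation 20
        _, logdet = np.linalg.slogdet(X.T @ X / s2)
        half_log_I = 0.5 * (logdet + np.log((T - p) / (2 * s2 ** 2)))
        kappa = 1.0 / (2 * np.pi * np.e)       # large-n lattice constant limit
        return len_init + neg_log_prior + nll + half_log_I \
            + (n / 2) * (1 + np.log(kappa))

    # Order selection: choose the p minimising the message length, e.g.
    # best_p = min(range(1, 11), key=lambda p: mml87_nonstationary(y, p))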

Figure 8. Plot of the hypervolume R_p of the stationarity region for orders 1 to 30.

The mean squared prediction error (MSPE) over a post-sample period is used to measure performance:

    MSPE(p) = (1/T_h) Σ_{i=T+1}^{T+T_h} (y_i − (φ̂_1 y_{i−1} + ... + φ̂_p y_{i−p}))²   (23)

where T_h denotes the number of hold-out observations generated from the true model.

AR(1) Experiments

Data were generated from an AR(1) model for varied values of φ_1 over the stationarity region, with T = 50. Each method was required to select the model order from the set of OLS estimates for orders 1 up to a fixed maximum order. The results can be seen in Figure 9; each point represents the average MSPE over the simulated data-sets. We see that MML has the smallest average MSPE, especially when the signal is strong (i.e., for |φ_1| close to 1).

AR(p) Experiments

We conducted experiments for p = 1 to p = 10 in which autoregression models were sampled uniformly from the stationarity region (see [11] for a means of sampling). Each method was required to select the model order from the set of OLS estimates for orders 1 to p_max. The results for p_max = 10 and T = 30 can be found in Table 1 and Figure 10; results for p_max = 10 and T = 50 can be found in Table 2 and Figure 11. The tables include the frequency counts of how often each method under-inferred (p̂ < p), correctly inferred (p̂ = p) and over-inferred (p̂ > p) the model order, together with the average MSPE and its standard deviation. Each figure and table shows that the MML criterion has the best overall MSPE and the best ability to choose the correct model order.

Table 1. AR(p) simulation result totals for T = 30

    Criterion   p̂ < p   p̂ = p   p̂ > p   Average MSPE
    AIC           35      74     859     .87   ± 5.5384
    AICc        3899     893      48     .59   ± 4.75
    BIC          337      55     578     .336  ± 5.8
    HQ            85      65    7496     .655  ± 5.4838
    MML         5999     396      85     .7    ± .668

Table 2. AR(p) simulation result totals for T = 50

    Criterion   p̂ < p   p̂ = p   p̂ > p   Average MSPE
    AIC           86       3      95     .559  ± .759
    AICc          36    3445    4549     .67   ± .585
    BIC           45      38      39     .65   ± .649
    HQ            99     385     663     .655  ± .876
    MML          573     434     973     .956  ± .668

Figure 9. Average mean squared prediction error for varied true φ_1, with T = 50.

Figure 10. Average mean squared prediction error for varied true p, with T = 30.
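Equation 23 is the average squared one-step-ahead error on a hold-out continuation of the series. A small self-contained sketch (ours; the argument names are illustrative):

    import numpy as np

    def mspe(phi_hat, y_history, y_future):
        """Mean squared one-step prediction error (Equation 23): predict each
        hold-out value from the p most recent observations preceding it."""
        phi_hat = np.asarray(phi_hat, dtype=float)
        p = len(phi_hat)
        y = np.concatenate([y_history[-p:], y_future])   # p values of context
        preds = np.array([phi_hat @ y[i - p:i][::-1] for i in range(p, len(y))])
        return np.mean((y[p:] - preds) ** 2)

    # y_history: the T estimation observations; y_future: T_h further values
    # generated from the same true model.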

Figure 11. Average mean squared prediction error for varied true p, with T = 50.

ACKNOWLEDGEMENTS

We gratefully acknowledge the support of Australian Research Council (ARC) Discovery Grant DP34365.

CONCLUSION

We have investigated autoregressive modelling in the MML framework using the Wallace and Freeman (1987) approximation. Three message formats were formulated. The most appropriate format for general use is the nonstationary format, which is based on the conditional likelihood function. This format is simple and easily computed, and it can be applied regardless of whether the data is stationary or not. When used in conjunction with the least squares estimates, the MML estimator was found to have very good performance (in squared error and in order selection), as the experimental results show.

REFERENCES

[1] Akaike, H. A new look at the statistical model identification. IEEE Transactions on Automatic Control AC-19, 6 (1974), 716–723.
[2] Barndorff-Nielsen, O., and Schou, G. On the parametrization of autoregressive models by partial autocorrelations. Journal of Multivariate Analysis 3 (1973), 408–419.
[3] Box, G. E. P., Jenkins, G. M., and Reinsel, G. C. Time Series Analysis: Forecasting and Control. Prentice Hall, Englewood Cliffs, New Jersey, 1994.
[4] Broersen, P. M. T., and de Waele, S. Empirical time series analysis and maximum likelihood estimation. In IEEE Benelux Signal Processing Symposium (2000).
[5] Conway, J. H., and Sloane, N. J. A. Sphere Packings, Lattices and Groups. Springer-Verlag, New York, 1999.
[6] Edgoose, T., and Allison, L. MML Markov classification of sequential data. Statistics and Computing 9 (1999), 269–278.
[7] Fitzgibbon, L. J., Allison, L., and Dowe, D. L. Minimum message length grouping of ordered data. In Algorithmic Learning Theory (Berlin, 2000), vol. 1968 of LNAI, Springer-Verlag, pp. 56–70.
[8] Hamilton, J. D. Time Series Analysis. Princeton University Press, New Jersey, 1994.
[9] Hannan, E. J., and Quinn, B. G. The determination of the order of an autoregression. Journal of the Royal Statistical Society, Series B (Methodological) 41, 2 (1979), 190–195.
[10] Hurvich, C. M., and Tsai, C. Regression and time series model selection in small samples. Biometrika 76 (1989), 297–307.
[11] Jones, M. C. Randomly choosing parameters from the stationarity and invertibility region of autoregressive-moving average models. Applied Statistics 36, 2 (1987), 134–138.
[12] Piccolo, D. The size of the stationarity and invertibility region of an autoregressive-moving average process. Journal of Time Series Analysis 3, 4 (1982), 245–247.
[13] Rissanen, J. J. Modeling by shortest data description. Automatica 14 (1978), 465–471.
[14] Rumantir, G. W., and Wallace, C. S. Sampling of highly correlated data for polynomial regression and model discovery. In Advances in Intelligent Data Analysis (2001), vol. 2189 of LNCS, pp. 370–377.
[15] Schwarz, G. Estimating the dimension of a model. The Annals of Statistics 6 (1978), 461–464.
[16] Tsay, R. S. Order selection in nonstationary autoregressive models. The Annals of Statistics 12, 4 (1984), 1425–1433.
[17] Vahid, F. Partial pooling: A possible answer to "To pool or not to pool". In Cointegration, Causality, and Forecasting: A Festschrift in Honour of Clive W. J. Granger, R. F. Engle and H. White, Eds. Oxford University Press, New York, 1999, Chapter 7.
[18] Viswanathan, M., and Wallace, C. S. A note on the comparison of polynomial selection methods. In Workshop on Artificial Intelligence and Statistics (Jan. 1999), Morgan Kaufmann, pp. 169–177.
[19] Wallace, C. S. On the selection of the order of a polynomial model. Tech. rep., Royal Holloway College, London, 1997.
[20] Wallace, C. S., and Boulton, D. M. An information measure for classification. Computer Journal 11, 2 (Aug. 1968), 185–194.
[21] Wallace, C. S., and Dowe, D. L. Minimum message length and Kolmogorov complexity. The Computer Journal 42, 4 (1999), 270–283.
[22] Wallace, C. S., and Freeman, P. R. Estimation and inference by compact encoding. Journal of the Royal Statistical Society, Series B (Methodological) 49 (1987), 240–252.