AR-order estimation by testing sets using the Modified Information Criterion

Rudy Moddemeijer

14th March 2006

Abstract

The Modified Information Criterion (MIC) is an Akaike-like criterion which allows performance control by means of a simple a priori defined parameter, the upper-bound on the error of the first kind (false alarm probability). The criterion MIC is for example used to estimate the order of Auto-Regressive (AR) processes. The criterion can only be used to test pairs of composite hypotheses; in AR-order estimation this leads to sequential testing. Usually the Akaike criterion is used to test sets of composite hypotheses. The difference between sequential and set testing corresponds with the difference between searching for the first local minimum and for the global minimum of the Akaike criterion. We extend the criterion MIC to testing a composite null-hypothesis versus a set of composite alternative hypotheses; these alternative hypotheses form a sequence in which every element introduces one additional parameter. The theory is verified by simulations and is compared with the Akaike criterion used in sequential and set testing. Due to the excellent correspondence between the theory and the experimental results we consider the AR-model order estimation problem for low order AR-processes with Gaussian white noise as solved.

Key words: AIC, Akaike criterion, AR, autoregressive processes, composite hypothesis, maximum likelihood, model order, system identification, time series analysis.

1 Introduction

In a recent publication [1] (see also [2, 3]) we introduced the Modified Information Criterion (MIC^1) and applied this criterion to Auto-Regressive (AR) model order estimation. In its original form the criterion MIC could only be applied to testing pairs of composite hypotheses [5, chapter 35]; later some extensions have been derived. The criterion MIC can be compared with the Akaike criterion (AIC) [6, 4, 7] and the Generalized Information Criterion (GIC) [8]. All criteria mentioned are used to test composite hypotheses [9, pp. 86-96]; a composite hypothesis is a hypothesis with some unknown parameters to be estimated.

University of Groningen, Department of Computing Science, P.O. Box 800, NL-9700 AV Groningen, The Netherlands, e-mail: rudy@cs.rug.nl

^1) IC stands for information criterion and A is added so that similar statistics, BIC, DIC etc., may follow [4]. Similarly the M of modified is added.

ML: Maximum Likelihood
MLL: Mean Log-Likelihood (expected log-likelihood)
MMLL: Maximum Mean Log-Likelihood, i.e. the maximum of the MLL-function
ALL: Average Log-Likelihood, an unbiased statistic to estimate the MLL
MALL: Maximum Average Log-Likelihood, i.e. the ALL-function at the ML-estimate; a biased statistic to estimate the MMLL

Table 1: Frequently used abbreviations

In AR-model order estimation there is an essential difference between the application of MIC versus AIC or GIC. The models with different AR-order are called the hypotheses $H_I$; the index $I$ of a hypothesis corresponds to the AR-model order. There exist two essentially different approaches to testing composite hypotheses applied to AR-model order estimation: sequential testing and set testing.

In sequential testing we test an $I$-th order AR-model versus a $J$-th order AR-model, where $J = I + 1$, for increasing $I$ until the $I$-th order model is preferred above the $J$-th order model. The last value of $I$ is considered to be an estimate of the correct AR-model order $M$. Sequential testing requires a reliable method to test pairs of composite hypotheses. In case of the Akaike criterion or GIC the method of order selection is not described, although it is essential for the result. Most researchers use set testing: they compute the criterion AIC or GIC for all orders up to a certain maximum candidate order $L$ and use the order with the smallest AIC or GIC as an estimate of the correct order. Using sequential testing the estimated AR-model order corresponds to the first local minimum of the criterion; set testing selects the AR-order corresponding to the absolute minimum as a function of the model order given $L$. In a recent publication [10] we showed that, in case of AR-model selection using AIC or GIC, significantly better results can be achieved by sequential testing than by set testing.

The criterion MIC has been designed to test pairs of composite hypotheses, so it can only be used in sequential testing. The introduction of the criterion MIC made it possible to estimate the AR-order with a preselected maximum probability $\alpha$ of selecting a model with a too high AR-order. Referee reports on our recent publications indicate that the signal processing community is reluctant to accept sequential testing. Is it possible to apply the criterion MIC in case of set testing? In this publication we present such an extension to the criterion MIC.
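To make the difference between the two strategies concrete, the following minimal sketch (ours, not part of the paper; the criterion values in crit[] are hypothetical and deliberately contain a spurious local minimum) contrasts first-local-minimum selection with global-minimum selection:

  PROGRAM SeqVsSet;
  { Illustrative only: contrasts sequential testing (first local minimum)
    with set testing (global minimum) of a criterion evaluated at
    orders 0..L.  The criterion values below are hypothetical. }
  CONST
    L = 6;
  VAR
    crit : ARRAY[0..L] OF REAL;
    I, seqOrder, setOrder : INTEGER;
  BEGIN
    { hypothetical criterion values with a spurious local minimum at order 2 }
    crit[0] := 1.50; crit[1] := 1.20; crit[2] := 1.18; crit[3] := 1.19;
    crit[4] := 1.10; crit[5] := 1.12; crit[6] := 1.15;

    { sequential testing: stop at the first order that is not improved upon }
    seqOrder := 0;
    WHILE (seqOrder < L) AND (crit[seqOrder+1] < crit[seqOrder]) DO
      seqOrder := seqOrder + 1;

    { set testing: take the global minimum over all candidate orders }
    setOrder := 0;
    FOR I := 1 TO L DO
      IF crit[I] < crit[setOrder] THEN setOrder := I;

    WriteLn('sequential (first local minimum): order ', seqOrder);
    WriteLn('set testing (global minimum)    : order ', setOrder);
  END.

With these hypothetical values sequential testing stops at order 2, the first local minimum, whereas set testing selects order 4, the global minimum; this is exactly the situation in which the two strategies disagree.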

I: AR-order of the (null) hypothesis
J: AR-order of the alternative hypothesis
K: size of the alternative set of hypotheses
L: maximum candidate order
M: correct AR-order
N: number of samples

Table 2: Used symbols

2 Formulation of the problem

Assume the stochastic signal $x$ is generated by an $M$-th order AR-model:

$$x_n = \epsilon_n + a_1 x_{n-1} + a_2 x_{n-2} + \ldots + a_M x_{n-M} \quad (1)$$

where the noise $\epsilon$ is stationary, white and normally distributed with zero mean and unit variance. The correct conditional probability density function (pdf) $g(x_n \mid x_{n-1}, \ldots, x_{n-M})$ of the stochastic process $x$ is according to hypothesis $H_I$ modeled by the conditional pdf of an $I$-th order AR-model

$$f_x(x_n \mid x_{n-1}, \ldots, x_{n-I}; a_1, \ldots, a_I, \sigma) = f_\epsilon(\epsilon_n; \sigma) \quad (2)$$

where $f_\epsilon(\epsilon_n; \sigma)$ is a normal distribution with variance $\sigma^2$. This AR-model has a parameter vector $p_I = (a_1, a_2, \ldots, a_I, \sigma)$ which consists of $\dim p_I = I + 1$ independently adjustable parameters.

The task is to determine the AR-model order by selecting the best hypothesis $H_I$ given a sequence of $N$ observations $x_1, x_2, \ldots, x_N$. We aim to estimate the simplest model (lowest AR-order) which maximizes the Mean Log-Likelihood (MLL) $E\{l_n(p_I)\}$. The single observation log-likelihood of hypothesis $H_I$ given the parameter vector $p_I$ is defined by

$$l_n(p_I) = \log f_x(x_n \mid x_{n-1}, \ldots, x_{n-I}; p_I) \quad (3)$$

The MLL is a function of the parameter vector $p_I$ and is, due to the (information theoretical) information inequality [11, (2.82)], bounded by the conditional neg(ative)-entropy:

$$E\{l_n(p_I)\} \le E\{\log g(x_n \mid x_{n-1}, \ldots, x_{n-M})\} = -H\{x_n \mid x_{n-1}, \ldots, x_{n-M}\} \quad (4)$$

where the conditional entropy $H\{x_n \mid x_{n-1}, \ldots, x_{n-M}\}$ corresponds with the entropy per sample of the stationary stochastic process $x$. Equality holds if and only if $g(x_n \mid x_{n-1}, \ldots, x_{n-M}) = f_x(x_n \mid x_{n-1}, \ldots, x_{n-I}; p_I)$ and $I \ge M$, i.e. if the correct conditional pdf can exactly be modeled. This can only be the case if $p_I$ equals the correct parameter vector; so searching for the vector $p_I$ for which the MLL reaches its maximum, the Maximum Mean Log-Likelihood (MMLL), searches for the optimal fit of $f_x$ to $g$. The method of Maximum Likelihood (ML) is based on this principle.

Theoretically the MMLL is a non-decreasing (increasing or flat) function of $I$: adding superfluous parameters never decreases the MMLL, so the MMLL is flat for $I \ge M$.
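For concreteness, a minimal sketch (ours, assuming Free Pascal; the Box-Muller transform supplies the Gaussian white noise, and the coefficients are those of the second order example model of section 5) that generates a realization of the process (1):

  PROGRAM GenerateAR;
  { A sketch that generates N samples of the AR(M) process of eq. (1)
    with unit-variance Gaussian white noise.  Assumes Free Pascal
    (Random, Randomize, Pi); the coefficients a[] are an example. }
  CONST
    M = 2;             { correct AR-order }
    N = 1000;          { number of samples }
    Burn = 100;        { burn-in to forget the all-zero initial state }
  VAR
    a : ARRAY[1..M] OF REAL;
    x : ARRAY[1..N] OF REAL;
    past : ARRAY[1..M] OF REAL;
    t, i : INTEGER;
    xn : REAL;

  FUNCTION Gauss : REAL;             { standard normal via Box-Muller }
  VAR u1, u2 : REAL;
  BEGIN
    REPEAT u1 := Random UNTIL u1 > 0.0;
    u2 := Random;
    Gauss := Sqrt(-2.0 * Ln(u1)) * Cos(2.0 * Pi * u2);
  END;

  BEGIN
    Randomize;
    a[1] := 0.55; a[2] := 0.05;      { example model from section 5 }
    FOR i := 1 TO M DO past[i] := 0.0;
    FOR t := 1 - Burn TO N DO
    BEGIN
      xn := Gauss;                   { epsilon_n }
      FOR i := 1 TO M DO xn := xn + a[i] * past[i];
      FOR i := M DOWNTO 2 DO past[i] := past[i-1];
      past[1] := xn;
      IF t >= 1 THEN x[t] := xn;
    END;
    WriteLn('generated ', N, ' samples, last sample = ', x[N]:8:4);
  END.

The burn-in discards the initial transient so that the retained samples are approximately stationary.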

Testing two composite hypotheses, the null-hypothesis $H_I$ and the alternative hypothesis $H_J$, we distinguish two essentially different test situations: the critical test and the non-critical test.

In case of a critical test we are testing a correct null-hypothesis ($I \ge M$) versus an alternative hypothesis ($J > I$) with (more) superfluous parameters. Both hypotheses are equally likely and consequently have an equal MMLL, so the null-hypothesis is the simplest and therefore the best choice. In AR-model order estimation the tests between hypotheses become critical if both hypotheses model the AR-process sufficiently ($J > I \ge M$).

The non-critical test occurs if the alternative hypothesis is significantly more likely than the null-hypothesis ($I < M$ and $J > I$); consequently the MMLLs differ. Because an order estimation algorithm always performs better in the non-critical case than in the critical case, the non-critical case is irrelevant for the development of an order estimation algorithm.

All criteria (AIC, GIC and MIC) are based on a statistic, the Maximum Average Log-Likelihood (MALL), to estimate the MMLL. First we introduce the Average Log-Likelihood (ALL), an unbiased statistic to estimate the MLL [1]:

$$\hat{E}\{l_n(p_I)\} = \frac{1}{N} \sum_{n=1}^{N} \log f_x(x_n \mid x_{n-1}, \ldots, x_{n-I}; p_I) \quad (5)$$

where $\hat{E}\{\ldots\}$ denotes a statistic to estimate the mean. This ALL-function corresponds with a normalized log-likelihood function as in the method of ML [12, chapter 7][13, section 11.5]. Analogous to the average being an unbiased statistic to estimate the mean, the ALL is for every value of $p_I$ an unbiased statistic to estimate the MLL. The MMLL can be estimated by replacing $p_I$ in the ALL-function by the ML-estimate $\hat{p}_I$ of $p_I$; the resulting statistic is the MALL. The MALL is biased with respect to the MMLL due to the finite number of observations [1]:

$$E\{\hat{E}\{l(\hat{p}_I)\}\} \approx E\{l(p_I)\} + \frac{\dim p_I}{2N} \quad (6)$$

where $\dim p_I$ is the dimension, i.e. the number of elements, of $p_I$.

2.1 Set testing using GIC

The Akaike criterion [6, 4, 7] is a member of a family of criteria, the Generalized Information Criteria (GIC) [8]:

$$\mathrm{GIC}(\lambda_K) = -\hat{E}\{l(\hat{p}_I)\} + \lambda_K \frac{\dim \hat{p}_I}{N} \quad (7)$$

Different values for $\lambda_K$ are used: no correction ($\lambda_K = 0$), Bhansali ($\lambda_K = \frac{1}{2}$) [14], Akaike ($\lambda_K = 1$) [6], Broersen ($\lambda_K = \frac{3}{2}$) [15] and Åström ($\lambda_K = 2$) [16]. Considerably larger values of $\lambda_K$, even depending on $N$, have been used as well [17, 18]. The estimated AR-order is the order for which GIC reaches a minimum. GIC can be used for both set testing and sequential testing.

2.2 Sequential testing using MIC

The criterion MIC [1] is designed to test a pair of composite hypotheses with a test which can be compared with a likelihood-ratio test.

The test statistic is the difference in Maximum Average Log-Likelihood ($\Delta$MALL^2):

$$C_{I,J} = \hat{E}\{l(\hat{p}_J)\} - \hat{E}\{l(\hat{p}_I)\} \quad (8)$$

Notice that $C_{I,J} = C_{I,k} + C_{k,J}$ for any $k$. In case of a critical test the statistic $2N C_{I,J}$ is approximately chi-squared distributed with $\dim \hat{p}_J - \dim \hat{p}_I = J - I$ degrees of freedom [1, 19, 20]. Consequently:

$$E\{C_{I,J}\} = \mathrm{BIAS}\{C_{I,J}\} = \frac{J-I}{2N}, \qquad \mathrm{VAR}\{C_{I,J}\} = \frac{J-I}{2N^2} \quad (9)$$

The expectation equals the bias because the difference in Maximum Mean Log-Likelihood ($\Delta$MMLL) is zero in case of a critical test. In the non-critical case the statistic is, due to the central limit theorem, normally distributed with expectation $\Delta$MMLL. The range of the outcomes of the statistic is, depending on a threshold $\eta_{high}$, divided into two intervals:

$$C_{I,J} < \eta_{high}: \text{ accept } H_I, \qquad C_{I,J} \ge \eta_{high}: \text{ accept } H_J \quad (10)$$

The constant $\eta_{high}$ depends on the probability $\alpha$ of erroneously selecting the alternative hypothesis $H_J$ (a too high order) and can be solved from the equation:

$$\alpha = \int_{2N\eta_{high}}^{\infty} \frac{1}{\Gamma(k)\,2^k}\, x^{k-1} e^{-\frac{1}{2}x}\, dx \qquad \text{where } k = \tfrac{1}{2}(J-I) \quad (11)$$

For $J = I + 1$ the constant $\eta_{high} = \lambda_1 / N$, where $\lambda_1$ equals $\lambda_K$ of (7) with $K = 1$, is shown in table 3.

alpha     K=1      K=2      K=3      K=4      K=5
1%        3.3174   3.3678   3.3717   3.3720   3.3717
2%        2.7059   2.7854   2.7949   2.7964   2.7960
5%        1.9207   2.0538   2.0799   2.0867   2.0885
10%       1.3528   1.5334   1.5810   1.5978   1.6033
20%       0.8212   1.0420   1.1169   1.1501   1.1671
Akaike    1.0000   1.0000   1.0000   1.0000   1.0000

Table 3: The $\lambda_K = N\eta_{J,J+1,K}$ as a function of the size $K$ of the alternative set of hypotheses and the false alarm probability $\alpha$.

The criterion MIC, testing the $I$-th order AR-model versus the $J$-th order AR-model where $J = I + 1$, is equivalent to using $\mathrm{GIC}(\lambda_1)$ to test a pair of hypotheses.

^2) In related work we used the term MALL-ratio instead. That term does not correctly represent the order in which the operations are performed: take the logarithm, take the average and find the maximum; dividing the likelihoods consequently becomes subtracting the log-likelihoods.
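As an illustration, the $K = 1$ column of table 3 can be reproduced numerically; the sketch below is ours, not the paper's own computation. It uses $P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$, the Abramowitz & Stegun rational approximation 7.1.26 of erfc, and bisection on $\lambda_1$:

  PROGRAM ThresholdK1;
  { A sketch that solves eq. (11) for the K = 1 threshold
    lambda_1 = N*eta_high: find lambda with
    P(chi-square_1 > 2*lambda) = alpha. }
  VAR
    alpha, lo, hi, mid : REAL;
    i : INTEGER;

  FUNCTION Erfc(z : REAL) : REAL;    { Abramowitz & Stegun 7.1.26, z >= 0 }
  CONST
    p = 0.3275911;
    a1 = 0.254829592; a2 = -0.284496736; a3 = 1.421413741;
    a4 = -1.453152027; a5 = 1.061405429;
  VAR t : REAL;
  BEGIN
    t := 1.0 / (1.0 + p * z);
    Erfc := t * (a1 + t*(a2 + t*(a3 + t*(a4 + t*a5)))) * Exp(-z*z);
  END;

  FUNCTION TailChi1(x : REAL) : REAL;     { P(chi-square_1 > x) }
  BEGIN
    TailChi1 := Erfc(Sqrt(0.5 * x));
  END;

  BEGIN
    alpha := 0.05;
    lo := 0.0; hi := 10.0;           { tail probability decreases in lambda }
    FOR i := 1 TO 60 DO
    BEGIN
      mid := 0.5 * (lo + hi);
      IF TailChi1(2.0 * mid) > alpha THEN lo := mid ELSE hi := mid;
    END;
    WriteLn('alpha = ', alpha:6:4, '  lambda_1 = ', mid:8:4);
  END.

For alpha = 0.05 the program should print $\lambda_1 \approx 1.9207$, the $K = 1$, $\alpha = 5\%$ entry of table 3.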

3 Testing hypotheses

Originally the theory with respect to MIC was developed for testing a composite null-hypothesis versus a composite alternative hypothesis [1]. In a previous publication [21] we extended the theory to testing a composite null-hypothesis versus an alternative set of composite hypotheses where all these hypotheses introduce one additional parameter. Now we extend the theory to testing a composite null-hypothesis versus an alternative set of composite hypotheses where these hypotheses introduce additional parameters incrementally. The null-hypothesis consists of an $I$-th order model and the alternative set of hypotheses consists of $K = L - I$ models with order $I+1, I+2, \ldots, L$.

We generalize the theory of testing two composite hypotheses using MIC (see subsection 2.2) to testing a composite null-hypothesis versus an alternative set of composite hypotheses. We accept the null-hypothesis if, similar to (10), all of the following conditions are fulfilled; otherwise we accept the alternative set of hypotheses:

$$C_{I,I+1} < \eta_{I,I+1,K} \ \text{ and } \ C_{I,I+2} < \eta_{I,I+2,K} \ \text{ and } \ \ldots \ \text{ and } \ C_{I,L} < \eta_{I,L,K} \quad (12)$$

There are $K = L - I$ different thresholds $\eta_{I,J,K}$ to be determined, where $K$ is the number of elements in the alternative set of hypotheses. To reduce the number of degrees of freedom we choose a relation between these constants. To make the test procedure more Akaike-like, we have chosen the relation:

$$\eta_{I,J,K} = \lambda_K \frac{J-I}{N} \quad (13)$$

Alternative choices are possible. We now possess the framework for testing the null-hypothesis, consisting of an $I$-th order model, versus an alternative set of hypotheses consisting of the $(I+1)$-th, $(I+2)$-th, ..., $L$-th order models. Performing these tests for increasing $I$ until the null-hypothesis is selected leads to an algorithm to estimate the AR-order. An example of a Pascal implementation is:

  I := 0;
  REPEAT
    stop := true;
    FOR J := I+1 TO L DO
      stop := stop AND (MALL[J] - MALL[I] <= lambda[L-I] * (J-I) / N);
    IF NOT stop THEN I := I + 1;
  UNTIL stop;

where MALL[J] corresponds with $\hat{E}\{l(\hat{p}_J)\}$ and lambda[L-I] corresponds with $\lambda_{L-I}$. The selected model order remains in the variable I.

Suppose we want to perform the tests of (12) using GIC or AIC. We select the null-hypothesis if the $I$-th order model has minimum GIC and we select the alternative hypothesis if one of the models with order $I+1, I+2, \ldots, L$ has minimum GIC. Every condition in (12) corresponds with comparing the GIC, as defined in (7), of the $I$-th order model with the GIC of the $J$-th order model:

$$-\hat{E}\{l(\hat{p}_J)\} + \lambda_K \frac{\dim \hat{p}_J}{N} < -\hat{E}\{l(\hat{p}_I)\} + \lambda_K \frac{\dim \hat{p}_I}{N} \quad (14)$$

Due to $\dim \hat{p}_I = I + 1$ and (8) this inequality is equivalent to:

$$C_{I,J} > \lambda_K \frac{J-I}{N} = \eta_{I,J,K} \quad (15)$$

This inequality justifies our choice of the relation (13). It also suggests that the criteria GIC and AIC should depend on $K$, which is, due to $K = L - I$, a function of the maximum candidate order $L$. In the next section we derive $\lambda_K$ given an a priori determined probability of selecting a too high order (false alarm probability).
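The algorithm above presumes that the array MALL[0..L] is available. The paper does not prescribe a particular estimator; as a hedged sketch, for Gaussian AR-models the MALL is approximately $-\frac{1}{2}\ln(2\pi\hat{\sigma}^2_I) - \frac{1}{2}$, with $\hat{\sigma}^2_I$ the order-$I$ residual variance, and Yule-Walker estimation via the Levinson-Durbin recursion provides these variances (exact ML estimation would differ slightly for small $N$):

  PROGRAM MallSketch;
  { A sketch (ours, not prescribed by the paper) of one way to obtain
    the MALL[] array used by the selection algorithm above.  For the
    demonstration x[] is filled with white noise, so all true
    AR-coefficients are zero; replace with real observations. }
  CONST
    L = 10;            { maximum candidate order }
    N = 1000;          { number of samples }
  VAR
    x : ARRAY[1..N] OF REAL;
    r : ARRAY[0..L] OF REAL;          { sample autocovariances }
    a, aOld : ARRAY[1..L] OF REAL;    { AR-coefficient estimates }
    MALL : ARRAY[0..L] OF REAL;
    v, k : REAL;
    t, i, j, lag : INTEGER;

  FUNCTION Gauss : REAL;              { standard normal via Box-Muller }
  VAR u1, u2 : REAL;
  BEGIN
    REPEAT u1 := Random UNTIL u1 > 0.0;
    u2 := Random;
    Gauss := Sqrt(-2.0 * Ln(u1)) * Cos(2.0 * Pi * u2);
  END;

  BEGIN
    Randomize;
    FOR t := 1 TO N DO x[t] := Gauss; { demonstration data }
    FOR lag := 0 TO L DO
    BEGIN
      r[lag] := 0.0;
      FOR t := lag + 1 TO N DO r[lag] := r[lag] + x[t] * x[t - lag];
      r[lag] := r[lag] / N;
    END;
    v := r[0];                        { order-0 residual variance }
    MALL[0] := -0.5 * Ln(2.0 * Pi * v) - 0.5;
    FOR i := 1 TO L DO                { Levinson-Durbin recursion }
    BEGIN
      k := r[i];
      FOR j := 1 TO i - 1 DO k := k - a[j] * r[i - j];
      k := k / v;                     { reflection coefficient }
      FOR j := 1 TO i - 1 DO aOld[j] := a[j];
      a[i] := k;
      FOR j := 1 TO i - 1 DO a[j] := aOld[j] - k * aOld[i - j];
      v := v * (1.0 - k * k);         { updated residual variance }
      MALL[i] := -0.5 * Ln(2.0 * Pi * v) - 0.5;
    END;
    WriteLn('MALL[0] = ', MALL[0]:10:6, '  MALL[', L, '] = ', MALL[L]:10:6);
  END.

For the white-noise demonstration data MALL[0] should be close to $-\frac{1}{2}\ln(2\pi) - \frac{1}{2} \approx -1.419$, and the higher-order values should exceed it only by amounts of the size predicted by the bias term in (6).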

4 Computation of the threshold

In the method described in the previous section there remains a threshold $\lambda_K$ to be determined. This threshold will be determined such that the probability $\alpha$ of selecting a too high order has a preselected value. In case of a critical test we are testing a correct null-hypothesis versus a superfluous alternative hypothesis. Both hypotheses are equally likely, so the null-hypothesis is the best choice. The test statistic $2N C_{I,J}$ is chi-squared distributed with $J - I$ (assume $J > I$) degrees of freedom [19, 20, 1].

In the relevant critical case we assume that the $C_{I,I+1}$ for different $I$ are mutually independent. At this moment we cannot prove this assumption. The assumption seems reasonable because the sum $2N C_{I,J} = 2N C_{I,I+1} + 2N C_{I+1,I+2} + \ldots + 2N C_{J-1,J}$ must be chi-squared distributed with $J - I$ degrees of freedom; this condition is satisfied if all $2N C_{I,I+1}$ are chi-squared distributed with one degree of freedom and are mutually independent. The resulting thresholds based on this assumption have been verified by simulations; the simulation results match the theory perfectly.

We will show how to compute the $\lambda_K$ for $K = 1$, 2 and 3; the reader may extend the theory for $K > 3$. In case of a critical test and $K = 1$ the alternative hypothesis will, according to (11), erroneously be selected with a probability

$$\alpha = \int_{2\lambda_1}^{\infty} f_\chi(c_1)\, dc_1 \quad (16)$$

where $f_\chi$ is a chi-squared distribution with one degree of freedom, where $c_1 = 2N C_{I,I+1}$ and where $\lambda_K = N\eta_{J,J+1,K}$. Searching the $\lambda_1$ for which (16) holds at a preselected value of $\alpha$ provides us with the threshold for $K = 1$ (see table 3).

In case of $K = 2$ we have an alternative set of two hypotheses. Now two chi-squared distributed variables $c_1 = 2N C_{I,I+1}$ and $c_2 = 2N C_{I+1,I+2}$, where $c_1 + c_2 = 2N C_{I,I+2}$, play a role. According to (12) the alternative set of hypotheses will be selected if one of the inequalities

$$c_1 > 2\lambda_2 \quad \text{or} \quad c_1 + c_2 > 4\lambda_2 \quad (17)$$

is satisfied. The probability of erroneously selecting the alternative set can be computed by integration of the product of chi-squared distributions $f_\chi(c_1) f_\chi(c_2)$ over the area defined by (17):

$$\alpha = \int_{2\lambda_2}^{\infty} f_\chi(c_1)\, dc_1 + \int_{0}^{2\lambda_2} \int_{4\lambda_2 - c_1}^{\infty} f_\chi(c_1) f_\chi(c_2)\, dc_2\, dc_1 \quad (18)$$

Solving $\lambda_2$ from this equation provides us with $\lambda_2$ given $\alpha$ for $K = 2$ (see also table 3).

For $K = 3$ the set of conditions is:

$$c_1 > 2\lambda_3 \quad \text{or} \quad c_1 + c_2 > 4\lambda_3 \quad \text{or} \quad c_1 + c_2 + c_3 > 6\lambda_3 \quad (19)$$

The corresponding equation is:

$$\alpha = \int_{2\lambda_3}^{\infty} f_\chi(c_1)\, dc_1 + \int_{0}^{2\lambda_3} \int_{4\lambda_3 - c_1}^{\infty} f_\chi(c_1) f_\chi(c_2)\, dc_2\, dc_1 + \int_{0}^{2\lambda_3} \int_{0}^{4\lambda_3 - c_1} \int_{6\lambda_3 - c_1 - c_2}^{\infty} f_\chi(c_1) f_\chi(c_2) f_\chi(c_3)\, dc_3\, dc_2\, dc_1 \quad (20)$$

Generalization to higher values of $K$ is in theory trivial. In practice, however, the computation or approximation of the threshold $\lambda_K$ for $K > 5$ is an unsolved numerical problem. We have tried to solve this problem using Mathematica; for $K > 5$ the integration method becomes too inaccurate or the computational effort becomes too large. For the missing values we have assumed $\lambda_K \approx \lambda_5$; for large $K$ this is a reasonable assumption (see table 3).
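Under the same independence assumption a Monte Carlo approximation is an obvious alternative for large $K$; the following sketch (ours, not the method used in the paper) estimates the false alarm probability for a given $\lambda$ by drawing $K$ i.i.d. $\chi^2_1$ variables and checking the conditions generalizing (17) and (19), and then solves for $\lambda_K$ by bisection. The attainable accuracy is limited by the Monte Carlo noise of roughly $\sqrt{\alpha(1-\alpha)/\mathrm{Trials}}$ per evaluation:

  PROGRAM MonteCarloLambda;
  { A Monte Carlo sketch for lambda_K, usable also for K > 5. }
  CONST
    K = 8;                 { size of the alternative set }
    Trials = 200000;       { Monte Carlo sample size }
  VAR
    alpha, lo, hi, mid : REAL;
    it : INTEGER;

  FUNCTION Gauss : REAL;   { standard normal via Box-Muller }
  VAR u1, u2 : REAL;
  BEGIN
    REPEAT u1 := Random UNTIL u1 > 0.0;
    u2 := Random;
    Gauss := Sqrt(-2.0 * Ln(u1)) * Cos(2.0 * Pi * u2);
  END;

  FUNCTION AlphaHat(lambda : REAL) : REAL;  { estimated false alarm prob. }
  VAR
    t, m, rejected : LONGINT;
    s, g : REAL;
    hit : BOOLEAN;
  BEGIN
    rejected := 0;
    FOR t := 1 TO Trials DO
    BEGIN
      s := 0.0; hit := FALSE;
      FOR m := 1 TO K DO
      BEGIN
        g := Gauss;
        s := s + g * g;               { partial sum of chi-square_1 terms }
        IF s > 2.0 * m * lambda THEN hit := TRUE;
      END;
      IF hit THEN rejected := rejected + 1;
    END;
    AlphaHat := rejected / Trials;
  END;

  BEGIN
    Randomize;
    alpha := 0.05;
    lo := 0.0; hi := 10.0;            { alpha-hat is decreasing in lambda }
    FOR it := 1 TO 25 DO
    BEGIN
      mid := 0.5 * (lo + hi);
      IF AlphaHat(mid) > alpha THEN lo := mid ELSE hi := mid;
    END;
    WriteLn('K = ', K, '  alpha = ', alpha:6:4, '  lambda_K approx ', mid:8:4);
  END.

For $K \le 5$ the result should reproduce the corresponding column of table 3 up to Monte Carlo noise; for $K > 5$ it provides the values that section 5 approximates by $\lambda_5$.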

5 Simulations

Within the scope of this article we compare our estimated AR-orders with the AR-orders estimated using the Akaike criterion in case of set testing and sequential testing. For a more extended comparison of MIC with AIC and GIC see our earlier work [1, 10, 21].

The simulation results in table 4 to table 7 confirm our earlier results. Sequential testing applied to AIC performs significantly better than set testing [10]. In case of sequential testing the Akaike criterion has a theoretical probability of 15.7% [1] of estimating a too high order. Notice that sequential testing using AIC in an AR-order estimation context is equivalent to sequential testing using MIC with $\alpha = 15.7\%$.

Of course our method should select a 0-th order model in case of no model: $x_n = \epsilon_n$. The estimated number of 0-th order models in table 4 corresponds with the preselected value of $\alpha$. Only for $L = 10$ and $\alpha = 20\%$ some deviations are observed; these deviations are mainly caused by the practical approximation $\lambda_K \approx \lambda_5$ for $K > 5$. The Akaike criterion in case of set testing depends strongly on the maximum candidate order $L$; this problem disappears using sequential testing. Notice how accurately the theoretical probability $\alpha = 15.7\%$, in theory 8430 correct 0-th order identifications, is reproduced.

The model $x_n = 0.55 x_{n-1} + 0.05 x_{n-2} + \epsilon_n$ is a typical example of a second order model which can (for small $N$ easily) be misidentified as a first order model (see table 5). For $N = 100$ or $N = 1000$ the 1st order model is erroneously preferred above the 2nd order model; for the given number of samples $N$ these models are indistinguishable. Our claim with respect to the probability of estimating a too high order is never seriously violated. Notice the better performance of AIC using sequential testing instead of set testing. Our method performs better due to the performance control using different values of $\alpha$.

The model $x_n = 0.5 x_{n-1} + 0.25 x_{n-2} + 0.125 x_{n-3} + 0.0625 x_{n-4} + 0.0625 x_{n-5} + 0.015625 x_{n-6} + \epsilon_n$ is hard to identify as a sixth order model; only for $N = 10000$ has the correct order been estimated a reasonable number of times (see table 6). Because $\lambda_K$ for $K > 5$ has never been used we expect no deviations. Our claim with respect to the probability of estimating a too high order is never violated. As expected the estimated order increases with the number of samples $N$. AIC in a set testing context performs better because this method will in general overestimate the correct order, which is a benefit in this situation.

The last model, $x_n = 0.5 x_{n-1} - 0.25 x_{n-4} + \epsilon_n$ where $a_2 = a_3 = 0$, is typically a model where set testing could perform better than sequential testing (see table 7). For $N \le 1000$ the set testing strategy using AIC performs better than the sequential testing strategy; for $N = 10000$ sequential testing performs better. Our method with $\alpha = 20\%$ has for small $N$ a similar performance as AIC using set testing; the overall performance of our method is in general better. Is, for $N = 1000$, the 4th order model significantly better than the 1st order model? If these models are in 1731 cases more or less equivalent, sequential testing is to be preferred. The problem of indistinguishable models demands further study.

Due to the noticeable deviations we may conclude that for $\alpha = 20\%$ it is important to develop techniques to compute $\lambda_K$ for $K > 5$. Except for this known deficiency the method of order selection works satisfactorily.

6 Discussion

Although many researchers think otherwise, sequential testing is to be preferred above set testing. Theoretically the MMLL is a non-decreasing (increasing or flat) function of the order: adding superfluous parameters never decreases the MMLL. Consequently spurious local minima of the MALL can only be caused by statistical fluctuations. Notice that the bias (6) is also an increasing function of the order, so the bias correction term decreases with the order. Overcompensation for bias creates an absolute maximum of the bias corrected MALL function; this corresponds with a minimum of $\mathrm{GIC}(\lambda)$ where $\lambda > \frac{1}{2}$. The position of this absolute maximum highly depends on the chosen value of $\lambda$, i.e. on the overcompensation for bias.

What is the meaning of the first local maximum of the MALL? At the first local maximum the estimation error is of the same order of magnitude as the increase in MMLL between the $I$-th and $(I+1)$-th model, i.e. the improvement of the model is negligible with respect to the estimation noise. Consequently the models are indistinguishable. Only if the MMLL as a function of the order has a plateau is the strategy of selecting the first local maximum incorrect.

Broersen [15] states: "the influence of L is very important for all asymptotic criteria, and remains a nuisance for the finite sample criteria." The value of $\lambda$ in case of set testing should indeed depend on $L$. Assume we increase $L$ by adding a superfluous parameter. There is a finite probability that the AR-order corresponding to this superfluous parameter will be selected. To keep the probability of selecting a too high order constant, the probability of selecting any other model containing superfluous parameters should decrease. Therefore the value of $\lambda$ should increase, and it therefore depends on $L$. The dependence is shown in table 3 and derived in section 3.

7 Conclusions

In this publication, as in earlier work [1, 21], the correspondence between the theory and the simulations is so good that we consider the AR-model order estimation problem for low order AR-processes with Gaussian white noise as solved.

Earlier work [10] showed that sequential testing leads to excellent results and that set testing is insufficiently motivated. Except for AR-models where the maximum mean log-likelihood as a function of the order has a plateau, we prefer sequential testing. Criteria like AIC, GIC or MIC should depend on the application and should therefore depend on the testing strategy: sequential testing or set testing. The missing information with respect to the testing strategy is a nuisance for criteria like AIC and GIC.

In this publication we have presented an extension of the criterion MIC, which was designed for sequential testing, such that it can also be applied in a set testing strategy. We now possess a theoretically well-founded criterion, MIC, which can be used in both sequential testing and set testing. In both situations this criterion allows performance control by means of the upper-bound on the probability of selecting a too high order (false alarm probability).

For criteria like AIC and GIC using set testing the constant $\lambda_K$ should depend on $K$ and therefore also on the maximum candidate order $L$. This is a serious shortcoming of AIC and GIC.

At this moment, although in theory the values are known, the computation or approximation of the threshold $\lambda_K$ for $K > 5$ is an unsolved numerical problem. This problem is a limitation of our method. Furthermore, a theoretical foundation of the assumption that the $C_{I,I+1}$ are independent is necessary.

References

[1] R. Moddemeijer, "Testing composite hypotheses applied to AR order estimation; the Akaike-criterion revised," submitted to IEEE Transactions on Signal Processing, 1997.

[2] R. Moddemeijer, "Testing composite hypotheses applied to AR order estimation; the Akaike-criterion revised," in Signal Processing Symposium (SPS 98), Leuven (B), Mar. 26-27 1998, pp. 135-138, IEEE Benelux Signal Processing Chapter.

[3] R. Moddemeijer, "Testing composite hypotheses applied to AR order estimation; the Akaike-criterion revised," in Nineteenth Symposium on Information Theory in the Benelux, P. H. N. de With and M. van der Schaar-Mitrea, Eds., Veldhoven (NL), May 28-29 1998, pp. 149-156, Werkgemeenschap Informatie- en Communicatietheorie, Enschede (NL).

[4] H. Akaike, "A new look at the statistical model identification," IEEE Trans. on Automatic Control, vol. 19, no. 6, pp. 716-723, 1974.

[5] H. Cramér, Mathematical Methods of Statistics, Princeton Univ. Press, Princeton, 1945.

[6] H. Akaike, "Information theory and an extension of the maximum likelihood principle," in Proc. 2nd Int. Symp. on Information Theory, P. N. Petrov and F. Csaki, Eds., Budapest (H), 1973, pp. 267-281, Akademia Kiado.

[7] Y. Sakamoto, Akaike Information Criterion Statistics, Reidel Publ. Comp., Dordrecht (NL), 1986.

[8] P. M. T. Broersen and H. E. Wensink, "On the penalty factor for autoregressive order selection in finite samples," IEEE Trans. on Signal Processing, vol. 44, no. 3, pp. 748-752, 1996.

[9] H. L. van Trees, Detection, Estimation, and Modulation Theory, Part I, John Wiley & Sons, Inc., New York, 1968.

[10] R. Moddemeijer, "Application of information criteria to AR order estimation," resubmitted as Technical Paper for publication in IEEE Transactions on Automatic Control, 1998.

[11] T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley & Sons, Inc., New York, 1991.

[12] S. Brandt, Statistical and Computational Methods in Data Analysis, North-Holland Publ. Comp., Amsterdam (NL), 2nd edition, 1976.

[13] E. Kreyszig, Introductory Mathematical Statistics, John Wiley & Sons, Inc., New York, 1970.

[14] R. J. Bhansali, "A Monte Carlo comparison of the regression method and the spectral methods of regression," Journal of the American Statistical Association, vol. 68, no. 343, pp. 621-625, 1973.

[15] P. M. T. Broersen, "The ABC of autoregressive order selection criteria," in 11th IFAC Symp. on System Identification, SYSID 97, Kitakyushu, Fukuoka, Japan, July 8-11 1997, vol. 1, pp. 231-236, Society of Instrument and Control Engineers (SICE).

[16] K. J. Åström, "Maximum likelihood and prediction error methods," Automatica, vol. 16, pp. 551-574, 1980.

[17] J. Rissanen, "Modelling by shortest data description," Automatica, vol. 14, pp. 465-471, 1978.

[18] E. J. Hannan and B. G. Quinn, "The determination of the order of an autoregression," J. R. Statist. Soc. Ser. B, vol. 41, pp. 190-195, 1979.

[19] S. S. Wilks, "The large-sample distribution of the likelihood ratio for testing composite hypotheses," Ann. Math. Stat., vol. 9, pp. 60-62, 1938.

[20] V. K. Rohatgi, An Introduction to Probability Theory and Mathematical Statistics, John Wiley & Sons, Inc., New York, 1976.

[21] R. Moddemeijer, "An efficient algorithm for selecting optimal configurations of AR-coefficients," submitted for publication in IEEE Transactions on Signal Processing, 1999.

N=100               0     1     2     3     4     5     6     7     8     9    10
L=4   1%         9903    89     7     1     0
      5%         9503   383    78    28     8
      20%        8013  1020   487   257   223
      AIC(set)   7453  1180   646   394   327
L=6   1%         9903    89     7     1     0     0     0
      5%         9503   384    77    27     8     1     0
      20%        7991   996   467   234   142   101    69
      AIC(set)   7292  1141   610   351   263   201   142
L=10  1%         9903    89     7     1     0     0     0     0     0     0     0
      5%         9503   384    77    27     8     1     0     0     0     0     0
      20%        7934   985   465   226   141   103    45    41    34    15    11
      AIC(set)   7155  1119   598   332   243   174   111   104    78    43    43
      AIC(seq.)  8440  1311   212    29     8     0     0     0     0     0     0

N=1000              0     1     2     3     4     5     6     7     8     9    10
L=4   1%         9911    81     7     1     0
      5%         9540   352    77    24     7
      20%        8074   986   459   269   212
      AIC(set)   7533  1160   609   395   303
L=6   1%         9911    81     7     1     0     0     0
      5%         9539   352    76    23     7     3     0
      20%        8021   964   420   262   138   106    89
      AIC(set)   7355  1110   570   366   238   190   171
L=10  1%         9911    81     7     1     0     0     0     0     0     0     0
      5%         9539   352    76    23     7     3     0     0     0     0     0
      20%        7965   953   414   262   136    98    59    42    31    20    20
      AIC(set)   7229  1080   558   352   220   170   130    87    71    55    48
      AIC(seq.)  8486  1275   202    33     2     2     0     0     0     0     0

N=10000             0     1     2     3     4     5     6     7     8     9    10
L=4   1%         9894    97     7     2     0
      5%         9489   402    72    30     7
      20%        7952  1047   478   281   242
      AIC(set)   7399  1188   632   426   355
L=6   1%         9894    97     7     2     0     0     0
      5%         9489   402    71    31     5     2     0
      20%        7891  1028   452   260   151   115   103
      AIC(set)   7226  1152   599   374   269   200   180
L=10  1%         9894    97     7     2     0     0     0     0     0     0     0
      5%         9488   402    71    31     5     2     0     1     0     0     0
      20%        7838  1020   446   260   150   105    71    33    23    35    19
      AIC(set)   7124  1123   581   351   246   166   135    88    61    73    52
      AIC(seq.)  8404  1335   215    39     6     1     0     0     0     0     0

Table 4: The estimated AR-order in case of 10,000 estimations of the 0-th order AR-process $x_n = \epsilon_n$, as a function of the number of samples $N$ and the maximum candidate order $L$ (columns: estimated AR-order). The order has been estimated using the algorithm proposed in section 3 for $\alpha$ = 1%, 5% and 20%. The results are compared with the orders estimated using the Akaike criterion, with both set testing and sequential testing.

N=100               0     1     2     3     4     5     6     7     8     9    10
L=4   1%            9  9856   126     8     1
      5%            5  9338   542    80    35
      20%           2  7682  1308   521   487
      AIC(set)      1  7299  1478   667   555
L=6   1%            9  9856   126     8     1     0     0
      5%            5  9343   537    76    31     6     2
      20%           2  7698  1229   465   314   147   145
      AIC(set)      1  7047  1408   604   449   254   237
L=10  1%            9  9856   126     8     1     0     0     0     0     0     0
      5%            5  9343   537    76    31     6     2     0     0     0     0
      20%           2  7621  1214   464   300   136    97    72    42    29    23
      AIC(set)      1  6878  1363   585   415   219   174   122   105    75    63
      AIC(seq.)     3  8130  1598   222    42     3     2     0     0     0     0

N=1000              0     1     2     3     4     5     6     7     8     9    10
L=4   1%            0  8590  1392    15     3
      5%            0  6736  3016   199    49
      20%           0  4319  4193   863   625
      AIC(set)      0  3914  4385  1038   663
L=6   1%            0  8589  1392    15     3     1     0
      5%            0  6749  3003   187    42    12     7
      20%           0  4370  4053   792   370   240   175
      AIC(set)      0  3757  4138   951   523   369   262
L=10  1%            0  8589  1392    15     3     1     0     0     0     0     0
      5%            0  6747  3004   185    42    12     6     4     0     0     0
      20%           0  4320  4013   775   359   225   116    66    57    36    33
      AIC(set)      0  3656  4010   918   483   338   197   135   110    83    70
      AIC(seq.)     0  4435  4672   745   117    27     3     0     1     0     0

N=10000             0     1     2     3     4     5     6     7     8     9    10
L=4   1%            0    95  9809    86    10
      5%            0    12  9474   409   105
      20%           0     0  7966  1175   859
      AIC(set)      0     0  7826  1319   855
L=6   1%            0    95  9809    86    10     0     0
      5%            0    13  9469   395    93    19    11
      20%           0     0  7960  1026   485   270   259
      AIC(set)      0     0  7379  1197   648   413   363
L=10  1%            0    95  9809    86    10     0     0     0     0     0     0
      5%            0    13  9472   392    91    20     8     2     1     1     0
      20%           0     0  7901   992   466   236   161    97    68    42    37
      AIC(set)      0     0  7152  1139   594   353   261   179   127    98    97
      AIC(seq.)     0     0  8395  1319   242    35     6     3     0     0     0

Table 5: As table 4, for the 2nd order AR-process $x_n = 0.55x_{n-1} + 0.05x_{n-2} + \epsilon_n$.

N=100               0     1     2     3     4     5     6     7     8     9    10
L=4   1%            0   873  7064  1928   135
      5%            0   234  5358  3636   772
      20%           0    40  2807  4355  2798
      AIC(set)      0    31  2692  4771  2506
L=6   1%            0   872  7070  1918   131     8     1
      5%            0   234  5367  3593   652   131    23
      20%           0    44  2922  4165  1671   730   468
      AIC(set)      0    28  2447  4130  1890   960   545
L=10  1%            0   872  7070  1918   131     8     1     0     0     0     0
      5%            0   233  5376  3585   648   130    20     5     3     0     0
      20%           0    44  2927  4098  1599   663   283   151   107    66    62
      AIC(set)      0    28  2343  3873  1762   833   404   258   228   138   133
      AIC(seq.)     0    79  3364  4738  1472   293    41    13     0     0     0

N=1000              0     1     2     3     4     5     6     7     8     9    10
L=4   1%            0     0     2  4629  5369
      5%            0     0     0  2351  7649
      20%           0     0     0  1506  8494
      AIC(set)      0     0     0  1037  8963
L=6   1%            0     0     2  4533  4805   632    28
      5%            0     0     0  2241  5641  1826   292
      20%           0     0     0   784  4593  3064  1559
      AIC(set)      0     0     0   652  4534  3370  1444
L=10  1%            0     0     2  4532  4805   632    27     2     0     0     0
      5%            0     0     0  2253  5655  1801   234    43    12     1     1
      20%           0     0     0   812  4684  2837   874   363   200   127   103
      AIC(set)      0     0     0   604  4203  2893  1084   501   322   208   185
      AIC(seq.)     0     0     0  1037  5068  3113   651   113    16     2     0

N=10000             0     1     2     3     4     5     6     7     8     9    10
L=4   1%            0     0     0     0 10000
      5%            0     0     0     0 10000
      20%           0     0     0     0 10000
      AIC(set)      0     0     0     0 10000
L=6   1%            0     0     0     0   899  7635  1466
      5%            0     0     0     0   256  6370  3374
      20%           0     0     0     0    46  3897  6057
      AIC(set)      0     0     0     0    42  4431  5527
L=10  1%            0     0     0     0   902  7678  1397    20     3     0     0
      5%            0     0     0     0   261  6587  2889   201    41    18     3
      20%           0     0     0     0    51  4351  4043   785   383   205   182
      AIC(set)      0     0     0     0    41  3757  4113   945   519   343   282
      AIC(seq.)     0     0     0     0    78  4431  4631   734   110    13     3

Table 6: As table 4, for the 6th order AR-process $x_n = 0.5x_{n-1} + 0.25x_{n-2} + 0.125x_{n-3} + 0.0625x_{n-4} + 0.0625x_{n-5} + 0.015625x_{n-6} + \epsilon_n$.

N=100               0     1     2     3     4     5     6     7     8     9    10
L=4   1%           42  8639   187   169   963
      5%           14  5662   321   359  3644
      20%           5  2168   236   353  7238
      AIC(set)      3  1780   249   476  7492
L=6   1%           42  8635   187   169   949    16     2
      5%           14  5640   317   384  3409   188    48
      20%           5  2203   251   421  5516   923   681
      AIC(set)      3  1617   222   403  5889  1134   732
L=10  1%           42  8635   187   169   949    16     2     0     0     0     0
      5%           14  5634   319   382  3404   184    44    15     1     1     2
      20%           5  2165   251   433  5464   780   381   222   132    89    78
      AIC(set)      3  1544   217   385  5439   981   547   337   236   170   141
      AIC(seq.)     7  7314  1058   220  1195   165    37     4     0     0     0

N=1000              0     1     2     3     4     5     6     7     8     9    10
L=4   1%            0     0     0     0 10000
      5%            0     0     0     0 10000
      20%           0     0     0     0 10000
      AIC(set)      0     0     0     0 10000
L=6   1%            0     0     0     0  9894   102     4
      5%            0     0     0     0  9472   434    94
      20%           0     0     0     0  7972  1212   816
      AIC(set)      0     0     0     0  7861  1344   795
L=10  1%            0     0     0     0  9894   102     4     0     0     0     0
      5%            0     0     0     0  9476   417    85    17     4     1     0
      20%           0     0     0     0  7940  1043   460   246   137    95    79
      AIC(set)      0     0     0     0  7329  1175   572   359   234   181   150
      AIC(seq.)     0  1731    15     0  6930  1128   164    28     4     0     0

N=10000             0     1     2     3     4     5     6     7     8     9    10
L=4   1%            0     0     0     0 10000
      5%            0     0     0     0 10000
      20%           0     0     0     0 10000
      AIC(set)      0     0     0     0 10000
L=6   1%            0     0     0     0  9904    86    10
      5%            0     0     0     0  9466   420   114
      20%           0     0     0     0  7966  1188   846
      AIC(set)      0     0     0     0  7856  1304   840
L=10  1%            0     0     0     0  9904    87     9     0     0     0     0
      5%            0     0     0     0  9459   400    97    33     8     2     1
      20%           0     0     0     0  7930   995   468   257   166    93    91
      AIC(set)      0     0     0     0  7250  1130   622   377   267   182   172
      AIC(seq.)     0     0     0     0  8455  1304   200    33     8     0     0

Table 7: As table 4, for the 4th order AR-process $x_n = 0.5x_{n-1} - 0.25x_{n-4} + \epsilon_n$.