Finite Sample Confidence Regions for Parameters in Prediction Error Identification using Output Error Models

Proceedings of the 17th World Congress, The International Federation of Automatic Control

Arnold J. den Dekker, Xavier Bombois, Paul M.J. Van den Hof

Delft Center for Systems and Control, Delft University of Technology, Delft, The Netherlands (e-mail: a.j.dendekker@tudelft.nl).

Abstract: The purpose of this paper is to evaluate the reliability in finite samples of different methods for constructing probabilistic parameter confidence regions in prediction error identification using Output Error (OE) models. The paper presents alternatives to the classical method of constructing asymptotically valid confidence regions, which is based on the asymptotic statistical properties of the parameter estimator. It is shown that if alternative test statistics are used, more reliable confidence regions for finite samples can be obtained. In particular, it is demonstrated that the use of a test statistic based on the Fisher score allows the construction of exact confidence regions for finite samples.

Keywords: Prediction error methods; uncertainty; system identification; statistical inference; parameter identification; maximum likelihood estimators; parameter estimation.

1. INTRODUCTION

Prediction error (PE) methods have become a widespread technique for system identification. In PE identification, the reliability of the parametric dynamic models identified on the basis of measurement data is generally limited due to noise disturbances and the finite length of the data. The need for quantifying model uncertainties is especially relevant when identified models are used as a basis for model based control, monitoring, simulation or any other model based decision-making. An indication of the reliability of the identified model is given by probabilistic confidence regions for its parameters.
A confidence region is a region in the parameter space that attempts to cover the true but unknown parameter vector with a nominal probability. Confidence regions are exact if the coverage probability equals the nominal probability. The confidence regions that are most widely used in PE identification are derived from the asymptotic statistical properties of the parameter estimator. These regions generally have a simple ellipsoidal shape, but for finite data lengths their nominal level is often misleading. Finite-time analysis of parameter inference in PE identification is an important problem for which few results are available so far. For some results see e.g. (Campi and Weyer, 2002; Weyer and Campi, 2002; Campi and Weyer, 2005).

In this paper, we consider the construction of confidence regions for the parameters of Output Error (OE) models. The classical method as well as a recently proposed alternative (Douma and Van den Hof, 2005) are briefly reviewed. In addition, three new alternative methods (with finite time perspectives) are proposed, two of which are based on likelihood theory. The goal of the paper is to evaluate, validate and compare the reliability of different methods for constructing confidence regions for finite data lengths. This is done by means of Monte-Carlo simulation experiments. It is shown that although all methods considered are equivalent in very large samples, for finite data lengths the methods show different reliability. Results of a similar study for the case of ARX (Auto Regression with eXogenous inputs) models were presented earlier (den Dekker et al., 2007).

2. STATISTICAL INFERENCE IN PREDICTION ERROR IDENTIFICATION

We will consider dynamical data generating systems of the form

$$y(t) = G_0(q)u(t) + H_0(q)e(t) \tag{1}$$

with $q$ the standard shift operator, $y(t)$ the stochastic (measurable) output signal, $u(t)$ the deterministic (measurable) input signal and $e(t)$ (non-measurable) zero-mean Gaussian white noise.
In (1), $G_0(z)$ and $H_0(z)$ are proper rational transfer functions that have no poles in $|z| \geq 1$, which means that the system is stable. In addition, $H_0(z)$ will be restricted to be monic and minimum-phase. The one-step ahead predictor of $y(t)$, given the system (1) and given the observations $(y(s), u(s))$, $s \leq t-1$, is given by

$$\hat{y}(t|t-1) = H_0^{-1}(q)G_0(q)u(t) + [1 - H_0^{-1}(q)]y(t), \tag{2}$$

which can be rewritten as

$$y(t) = \hat{y}(t|t-1) + e(t). \tag{3}$$

The one-step ahead predictor (2) is the best one-step ahead predictor in the sense of the conditional expectation (Ljung, 1999). In reality, the true system $(G_0(z), H_0(z))$ is generally unknown, and predictor models determined by a collection of two rational transfer functions $(G(z), H(z))$ are considered instead. A predictor model set $\mathcal{M}$ is defined as any collection of predictor models:

$$\mathcal{M} := \{(G(q,\theta), H(q,\theta)) \mid \theta \in \Theta \subset \mathbb{R}^n\} \tag{4}$$

Here $\theta$ is a real-valued parameter vector ranging over a subset $\Theta$ of $\mathbb{R}^n$. It is assumed that this model set is composed of predictor models (i.e., transfer functions) that satisfy the same conditions of properness, stability and monicity as the transfer functions $H_0(z)$ and $G_0(z)$ described above. Underlying the set of models, there is a parameterization that determines the specific relation between a parameter $\theta \in \Theta$ and a model $M$ in $\mathcal{M}$. If we assume that the data generating system $\mathcal{S}$ belongs to the model set ($\mathcal{S} \in \mathcal{M}$), there exists an exact parameter $\theta_0$ reflecting the transfer functions $G_0$ and $H_0$ and one may rewrite (3) as

$$y(t) = \hat{y}(t|t-1;\theta_0) + e(t), \tag{5}$$

with

$$\hat{y}(t|t-1;\theta) = H^{-1}(q,\theta)G(q,\theta)u(t) + [1 - H^{-1}(q,\theta)]y(t). \tag{6}$$

The model of the observations is given by

$$y(t) = \hat{y}(t|t-1;\theta) + \varepsilon(t,\theta), \tag{7}$$

with $\varepsilon(t,\theta)$ the prediction errors. Since $\mathcal{S} \in \mathcal{M}$, the prediction errors evaluated at $\theta_0$ are equal to $e(t)$ and thus zero mean, independent and Gaussian distributed, with probability density function (PDF)

$$f_\varepsilon(\varepsilon(t,\theta_0);\theta_0) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2\sigma^2}\varepsilon^2(t,\theta_0)\right] \tag{8}$$

with $\sigma^2$ the variance of $e(t)$. The joint PDF of the observations $y^N = \{y(t)\}_{t=1,\ldots,N}$ (conditioned on the given deterministic input sequence $u^N$) is given by:

$$f_y(y^N;\theta_0) = \prod_{t=1}^{N} f_\varepsilon(y(t) - \hat{y}(t|t-1;\theta_0)) = \prod_{t=1}^{N} f_\varepsilon(\varepsilon(t,\theta_0);\theta_0). \tag{9}$$

Taking the logarithm yields:

$$\log f_y(y^N;\theta_0) = \sum_{t=1}^{N} \log f_\varepsilon(\varepsilon(t,\theta_0);\theta_0), \tag{10}$$

which can be written as

$$\log f_y(y^N;\theta_0) = -\frac{N}{2}\log(2\pi) - N\log\sigma - \frac{1}{2\sigma^2}V_N(\theta_0) \tag{11}$$

with

$$V_N(\theta_0) = \sum_{t=1}^{N} \varepsilon(t,\theta_0)^2. \tag{12}$$

2.1 The Fisher score

The Fisher score $S(\theta)$ is defined as

$$S(\theta) = \frac{\partial \log f_y(y^N;\theta)}{\partial\theta}. \tag{13}$$

It can be shown that the Fisher score (13) evaluated at the true value $\theta_0$ of $\theta$ has mean zero:

$$E[S(\theta_0)] = 0. \tag{14}$$

It follows from (13) and (11) that the Fisher score can be written as

$$S(\theta) = -\frac{1}{2\sigma^2}V_N'(\theta) \tag{15}$$

with $V_N'(\theta)$ the first derivative of $V_N(\theta)$ with respect to $\theta$.
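As a short worked step, and assuming (as the text does) that $\sigma^2$ is known and does not depend on $\theta$, the score expression follows by differentiating the Gaussian log-likelihood term by term; only the sum-of-squares term $V_N(\theta) = \sum_{t=1}^{N}\varepsilon(t,\theta)^2$ depends on $\theta$:

$$S(\theta) = \frac{\partial}{\partial\theta}\left[-\frac{N}{2}\log(2\pi) - N\log\sigma - \frac{1}{2\sigma^2}V_N(\theta)\right] = -\frac{1}{2\sigma^2}\frac{\partial V_N(\theta)}{\partial\theta} = -\frac{1}{2\sigma^2}V_N'(\theta).$$

Setting the score to zero is therefore the same condition as $V_N'(\theta) = 0$, which is why maximizing the likelihood and minimizing the sum of squared prediction errors give the same estimate in this setting.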
2.2 The Fisher information matrix

The covariance matrix of the Fisher score $S(\theta_0)$ is described by

$$J(\theta_0) = E\left[S(\theta_0)S^T(\theta_0)\right] \tag{16}$$

which is known as the Fisher information matrix (Fisher, 1922). It can be shown that $J(\theta_0)$ may alternatively be written as

$$J(\theta_0) = -E\left[\left.\frac{\partial^2 \log f_y(y^N;\theta)}{\partial\theta^2}\right|_{\theta=\theta_0}\right]. \tag{17}$$

It follows from (11), (15), (16) and (17) that the Fisher information matrix is given by:

$$J(\theta_0) = \frac{1}{4\sigma^4}E\left[V_N'(\theta_0)V_N'^T(\theta_0)\right], \tag{18}$$

or, alternatively, by

$$J(\theta_0) = \frac{1}{2\sigma^2}E\left[V_N''(\theta_0)\right], \tag{19}$$

with $V_N''(\theta)$ the second derivative of $V_N(\theta)$ with respect to $\theta$. Furthermore, by the multivariate central limit theorem, it is generally derived that, for $N \to \infty$,

$$S(\theta_0) \sim \mathcal{N}(0, J(\theta_0)), \tag{20}$$

that is, the Fisher score is asymptotically normally distributed with expectation value zero and covariance matrix $J(\theta_0)$ (Wilks, 1962).

2.3 The likelihood function and the maximum likelihood estimator

By substituting the available observations $y^N$ for the corresponding indeterminate variables in (9) and regarding the resulting expression as a function of the parameter vector $\theta$ for fixed observations $y^N$, the likelihood function, written as $f_y(\theta; y^N)$, is obtained. The maximum likelihood estimator (MLE) of $\theta_0$ is given by

$$\hat\theta_N = \arg\max_\theta f_y(\theta; y^N) = \arg\min_\theta V_N(\theta). \tag{21}$$

Fisher (1922) has shown that, for $N \to \infty$,

$$\hat\theta_N \sim \mathcal{N}(\theta_0, J^{-1}(\theta_0)). \tag{22}$$

Furthermore, Wald (1949) has shown that, under very general conditions, the MLE $\hat\theta_N$ is a consistent estimator. Finally, it can be shown that the MLE of $\sigma^2$ is given by

$$\hat\sigma^2_{ML} = \frac{1}{N}V_N(\hat\theta_N), \tag{23}$$

whereas a (slightly) more accurate estimator of $\sigma^2$ is given by

$$\hat\sigma^2_N = \frac{1}{N-n}V_N(\hat\theta_N). \tag{24}$$

3. OUTPUT ERROR MODELLING

The OE model structure describes the input-output relationship of a linear dynamical system as in (1) with

$$G(q,\theta) = q^{-n_k}\frac{B(q,\theta)}{F(q,\theta)}, \quad H(q,\theta) = 1, \tag{25}$$

with $n_k$ the delay and

$$B(q,\theta) = b_0 + b_1 q^{-1} + \cdots + b_{n_b-1} q^{-n_b+1}, \tag{26}$$

$$F(q,\theta) = 1 + f_1 q^{-1} + \cdots + f_{n_f} q^{-n_f}, \tag{27}$$

with $\theta^T = [b_0, b_1, \ldots, b_{n_b-1}, f_1, \ldots, f_{n_f}]$. In an output error model structure we consider the one-step ahead predictor

$$\hat{y}(t|t-1;\theta) = G(q,\theta)u(t) = q^{-n_k}\frac{B(q,\theta)}{F(q,\theta)}u(t) \tag{28}$$

We denote the predictor derivative by

$$\psi(t,\theta) = \frac{\partial}{\partial\theta}\hat{y}(t|t-1;\theta). \tag{29}$$

Furthermore, define

$$\Psi(\theta) = \begin{bmatrix} \psi^T(1,\theta) \\ \vdots \\ \psi^T(N,\theta) \end{bmatrix}. \tag{30}$$

Suppose that the data generating system belongs to the model class ($\mathcal{S} \in \mathcal{M}$). The MLE $\hat\theta_N$ of $\theta_0$ is then given by (21), where

$$V_N(\theta) = \sum_{t=1}^{N}(y(t) - \hat{y}(t|t-1;\theta))^2 \tag{31}$$

and the Fisher score is equal to

$$S(\theta) = -\frac{1}{2\sigma^2}V_N'(\theta) = \frac{1}{\sigma^2}\sum_{t=1}^{N}\psi(t,\theta)[y(t) - \hat{y}(t|t-1;\theta)] = \frac{1}{\sigma^2}\Psi^T(\theta)(y - \hat{y}(\theta)), \tag{32}$$

with

$$y = [y(1), y(2), \ldots, y(N)]^T, \tag{33}$$

and

$$\hat{y}(\theta) = [\hat{y}(1|0;\theta), \hat{y}(2|1;\theta), \ldots, \hat{y}(N|N-1;\theta)]^T. \tag{34}$$

Evaluating (32) at $\theta = \theta_0$ yields:

$$S(\theta_0) = \frac{1}{\sigma^2}\Psi^T(\theta_0)e, \tag{35}$$

with $e = [e(1) \cdots e(N)]^T$. It is very important to note that for OE models, the Fisher score (35) is exactly normally distributed (due to the fact that the term $\Psi(\theta_0)$ is deterministic) and the asymptotic result (20) holds for finite data as well:

$$S(\theta_0) \sim \mathcal{N}(0, J(\theta_0)), \quad \forall N, \tag{36}$$

where the Fisher information matrix $J(\theta_0)$ is given by:

$$J(\theta_0) = E\left[S(\theta_0)S^T(\theta_0)\right] = \frac{1}{\sigma^2}\Psi^T(\theta_0)\Psi(\theta_0). \tag{37}$$

As we will see later, this result allows us to construct confidence regions that are exact also for finite values of $N$, which is remarkable since in the classical approach, based on the statistical properties of the parameter estimator, exact confidence regions for finite samples are only obtained for linear regression models with deterministic regressors, such as Finite Impulse Response (FIR) models (Ljung, 1999).

4. CONFIDENCE REGIONS

Confidence regions can be interpreted as the result of hypothesis testing. If we want to test the null hypothesis

$$H_0: \theta_0 = \theta \tag{38}$$

against the alternative hypothesis

$$H_1: \theta_0 \neq \theta, \tag{39}$$

at significance level $\alpha$, where the significance level is defined as the probability of rejecting $H_0$ when $H_0$ is true, we first have to construct a test statistic with known distribution under $H_0$. Such a test statistic will be a function of $\theta$ and $y^N$. The general test principle now states that the null hypothesis $H_0$ is rejected if the sample value of the test statistic used is larger than some user specified threshold.
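Every test statistic in this section is built from the OE predictor, its gradient matrix $\Psi(\theta)$ and the Fisher score of section 3, so a numerical sketch of those three quantities may help. The code below is a minimal illustration in Python with NumPy and SciPy, not the authors' MATLAB implementation. The function names (`oe_predict`, `oe_gradient`, `fisher_score`) are hypothetical, zero initial conditions are assumed, and the gradient formulas are the standard OE ones obtained by differentiating $\hat{y} = q^{-n_k}(B/F)u$ with respect to each coefficient.

```python
import numpy as np
from scipy.signal import lfilter


def oe_predict(theta, u, nb, nf, nk):
    """OE prediction y_hat = q^{-nk} B(q)/F(q) u with zero initial conditions."""
    theta = np.asarray(theta, dtype=float)
    f = np.concatenate(([1.0], theta[nb:nb + nf]))    # F(q) = 1 + f_1 q^-1 + ...
    num = np.concatenate((np.zeros(nk), theta[:nb]))  # q^{-nk} B(q)
    return lfilter(num, f, u)


def oe_gradient(theta, u, nb, nf, nk):
    """N x n matrix Psi(theta) whose rows are the predictor derivatives psi^T(t, theta)."""
    theta = np.asarray(theta, dtype=float)
    f = np.concatenate(([1.0], theta[nb:nb + nf]))
    N = len(u)
    u_f = lfilter([1.0], f, u)                        # u filtered by 1/F
    yhat_f = lfilter([1.0], f, oe_predict(theta, u, nb, nf, nk))
    Psi = np.zeros((N, nb + nf))
    for k in range(nb):                               # d y_hat / d b_k = u_f(t - nk - k)
        d = nk + k
        Psi[d:, k] = u_f[:N - d]
    for k in range(1, nf + 1):                        # d y_hat / d f_k = -yhat_f(t - k)
        Psi[k:, nb + k - 1] = -yhat_f[:N - k]
    return Psi


def fisher_score(theta, y, u, nb, nf, nk, sigma2):
    """S(theta) = Psi^T(theta) (y - y_hat(theta)) / sigma^2, assuming known sigma^2."""
    resid = np.asarray(y, dtype=float) - oe_predict(theta, u, nb, nf, nk)
    return oe_gradient(theta, u, nb, nf, nk).T @ resid / sigma2
```

A quick sanity check is to compare `oe_gradient` against a finite-difference derivative of `oe_predict`; the two should agree to numerical precision for a stable $F$.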
Knowledge of the PDF of the test statistic under $H_0$ allows one to compose tests (i.e., set thresholds) with a desired significance level. This principle can now be used to compose confidence regions for the parameters $\theta_0$. This is done as follows. First, select a test statistic for testing the null hypothesis $\theta_0 = \theta$ against the alternative $\theta_0 \neq \theta$, at significance level $\alpha$. A $100(1-\alpha)\%$ confidence region for $\theta_0$ is then constituted by the set of all values $\theta$ for which the null hypothesis $\theta_0 = \theta$ would be accepted.

4.1 The classical approach

In the classical approach to constructing confidence regions for the parameters of OE models, the starting point is a first order Taylor expansion:

$$(\hat\theta_N - \theta_0) \approx -[V_N''(\theta_0)]^{-1}[V_N'(\theta_0)], \tag{40}$$

with $V_N''(\theta_0) = 2\Psi^T(\theta_0)\Psi(\theta_0)$ (assuming a deterministic input sequence $u^N$) (Ljung, 1999). It can be shown that

$$V_N'(\theta_0) \sim \mathcal{N}(0, 2\sigma^2 V_N''(\theta_0)). \tag{41}$$

Hence, in the first order Taylor approximation, an expression for the covariance matrix of $(\hat\theta_N - \theta_0)$ is given by

$$P = [V_N''(\theta_0)]^{-1}\, 2\sigma^2 V_N''(\theta_0)\, [V_N''(\theta_0)]^{-1} = 2\sigma^2[V_N''(\theta_0)]^{-1}. \tag{42}$$

This covariance matrix is approximated using an estimate of $V_N''(\theta_0)$ given by the term $2\Psi^T(\hat\theta_N)\Psi(\hat\theta_N)$ to arrive at the test statistic

$$\frac{1}{\sigma^2}(\hat\theta_N - \theta)^T\Psi^T(\hat\theta_N)\Psi(\hat\theta_N)(\hat\theta_N - \theta). \tag{43}$$

Under $H_0$, the test statistic (43) has approximately a $\chi^2_n$ distribution, i.e., a chi-square distribution with $n$ degrees of freedom. An approximately valid $100(1-\alpha)\%$ confidence region for $\theta_0$ is then given by

$$\left\{\theta \;\middle|\; \frac{1}{\sigma^2}(\hat\theta_N - \theta)^T\Psi^T(\hat\theta_N)\Psi(\hat\theta_N)(\hat\theta_N - \theta) \leq \chi^2_{n,1-\alpha}\right\}, \tag{44}$$

where $\chi^2_{n,1-\alpha}$ is the $1-\alpha$ quantile of the chi-square distribution with $n$ degrees of freedom (cf. Mood et al., 1974). This confidence region corresponds with the one implemented in the Matlab System Identification Toolbox (Ljung, 2003) and has also been derived by Douma and Van den Hof (2006) using an alternative paradigm for probabilistic uncertainty bounding. Alternatively, (44) can be derived starting from the (asymptotic) statistical properties of the MLE described in section 2.
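The membership test behind the classical ellipsoid can be sketched in a few lines. This is an illustrative Python sketch rather than the toolbox implementation; the function names are hypothetical, and the estimate $\hat\theta_N$, the gradient matrix $\Psi(\hat\theta_N)$ and the noise variance $\sigma^2$ are assumed to be supplied by the caller.

```python
import numpy as np
from scipy import stats


def classical_statistic(theta, theta_hat, Psi_hat, sigma2):
    """Quadratic form (theta_hat - theta)^T Psi^T Psi (theta_hat - theta) / sigma^2."""
    d = np.asarray(theta_hat, dtype=float) - np.asarray(theta, dtype=float)
    return float(d @ (Psi_hat.T @ Psi_hat) @ d) / sigma2


def in_classical_region(theta, theta_hat, Psi_hat, sigma2, alpha=0.05):
    """True if theta lies inside the asymptotic 100(1 - alpha)% ellipsoid."""
    n = len(theta_hat)
    threshold = stats.chi2.ppf(1.0 - alpha, df=n)   # 1 - alpha quantile of chi^2_n
    return classical_statistic(theta, theta_hat, Psi_hat, sigma2) <= threshold
```

Because $\Psi$ is evaluated only once, at $\hat\theta_N$, checking many candidate values of $\theta$ costs one quadratic form each, which is what makes the classical region cheap compared with regions whose weighting matrix depends on $\theta$.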
It follows from (22) and the consistency property of the MLE that the quadratic form

$$(\hat\theta_N - \theta_0)^T J(\hat\theta_N)(\hat\theta_N - \theta_0) \tag{45}$$

has asymptotically (i.e., for $N \to \infty$) a $\chi^2_n$ distribution (Kay, 1998). Then, if we want to test the null hypothesis (38) against the alternative hypothesis (39), the test statistic

$$T_W = (\hat\theta_N - \theta)^T J(\hat\theta_N)(\hat\theta_N - \theta), \tag{46}$$

which is known as the Wald test statistic, may be used. Asymptotically, (46) has a $\chi^2_n$ distribution under $H_0$. It follows from section 3 that for the case of normally distributed data $y^N$ and an OE model structure, the test statistic $T_W$ can be written as:

$$T_W = \frac{1}{\sigma^2}(\hat\theta_N - \theta)^T\Psi^T(\hat\theta_N)\Psi(\hat\theta_N)(\hat\theta_N - \theta), \tag{47}$$

which equals (43). Finally, it can be shown that if $\sigma^2$ is replaced by $\hat\sigma^2_N$, the distribution of (43) is better approximated by an $F$ distribution and the term $\chi^2_{n,1-\alpha}$ in (44) is to be replaced by $nF_{n,N-n,1-\alpha}$, with $F_{n,N-n,1-\alpha}$ the $1-\alpha$ quantile of the $F$ distribution with $n$ and $N-n$ degrees of freedom. This leads to the following asymptotically valid $100(1-\alpha)\%$ confidence region for $\theta_0$:

$$\left\{\theta \;\middle|\; \frac{1}{n}\,\frac{(\hat\theta_N - \theta)^T\Psi^T(\hat\theta_N)\Psi(\hat\theta_N)(\hat\theta_N - \theta)}{\hat\sigma^2_N} \leq F_{n,N-n,1-\alpha}\right\} \tag{48}$$

4.2 An alternative approach without Taylor approximation

An alternative approach, without Taylor approximation, was recently proposed by Douma and Van den Hof (2005). Since the MLE $\hat\theta_N$ of $\theta_0$ is given by Eq. (21), it must satisfy $V_N'(\hat\theta_N) = 0$. By defining

$$y_F(t) = F(q,\hat\theta_N)^{-1}y(t); \quad u_F(t) = F(q,\hat\theta_N)^{-1}u(t) \tag{49}$$

the equation $V_N'(\hat\theta_N) = 0$ can be rewritten as

$$\sum_{t=1}^{N}\left[F(q,\hat\theta_N)y_F(t) - q^{-n_k}B(q,\hat\theta_N)u_F(t)\right]\psi(t,\hat\theta_N) = 0. \tag{50}$$

The parameter estimate $\hat\theta_N$ satisfying these equations can now be written in a linear regression-type equation:

$$\hat\theta_N = \left(\Psi^T\Phi\right)^{-1}\Psi^T y_F \tag{51}$$

with $\Psi = \Psi(\hat\theta_N)$, $y_F = [y_F(1) \cdots y_F(N)]^T$,

$$\Phi^T = \left[\varphi_F(1,\hat\theta_N), \ldots, \varphi_F(N,\hat\theta_N)\right], \tag{52}$$

and

$$\varphi_F^T(t,\hat\theta_N) = [u_F(t-n_k) \;\cdots\; u_F(t-n_k-n_b+1) \;\; -y_F(t-1) \;\cdots\; -y_F(t-n_f)]. \tag{53}$$

The system's relations

$$y(t) = q^{-n_k}\frac{B_0(q)}{F_0(q)}u(t) + e(t) \tag{54}$$

can then be rewritten as:

$$F_0(q)y_F(t) = q^{-n_k}B_0(q)u_F(t) + \frac{F_0(q)}{F(q,\hat\theta_N)}e(t), \tag{55}$$

which can be rewritten in the regression form:

$$y_F = \Phi\theta_0 + e_F, \tag{56}$$

where $e_F = \frac{F_0(q)}{F(q,\hat\theta_N)}[e(1) \cdots e(N)]^T$. Substituting (56) in (51) now delivers:

$$\left(\Psi^T\Phi\right)(\hat\theta_N - \theta_0) = \Psi^T e_F. \tag{57}$$

Unlike the Taylor approximation (40), (57) is an exact result. Unfortunately, the statistical distribution of the random variable $\Psi^T e_F$ is unknown for finite values of $N$. It can be shown, however, that asymptotically $\Psi^T e_F$ is normally distributed with zero mean and covariance matrix $Q = \sigma^2\Psi^T(\theta_0)\Psi(\theta_0)$. Therefore, the test statistic

$$(\hat\theta_N - \theta)^T P_D^{-1}(\hat\theta_N - \theta), \quad P_D = \left(\Psi^T\Phi\right)^{-1} Q \left(\Phi^T\Psi\right)^{-1} \tag{58}$$

is asymptotically $\chi^2_n$ distributed under $H_0$.
By replacing the term $\Psi^T(\theta_0)\Psi(\theta_0)$ by the estimate $\Psi^T\Psi$, and $\sigma^2$ by $\hat\sigma^2_N$, Douma and Van den Hof (2005) arrive at the following asymptotically valid $100(1-\alpha)\%$ confidence region for $\theta_0$:

$$\left\{\theta \;\middle|\; (\hat\theta_N - \theta)^T P_s^{-1}(\hat\theta_N - \theta) \leq nF_{n,N-n,1-\alpha}\right\}, \tag{59}$$

$$P_s = \left(\Psi^T\Phi\right)^{-1}\hat\sigma^2_N\,\Psi^T\Psi\left(\Phi^T\Psi\right)^{-1}. \tag{60}$$

4.3 A new approach without Taylor approximation

A new alternative approach starts again from the equation $V_N'(\hat\theta_N) = 0$, or equivalently,

$$\sum_{t=1}^{N}\psi(t,\hat\theta_N)\varepsilon(t,\hat\theta_N) = 0. \tag{61}$$

It can be shown that, in the case of OE systems,

$$\varepsilon(t,\hat\theta_N) = e(t) - \varphi_{oe}^T(t,\theta_0)(\hat\theta_N - \theta_0), \tag{62}$$

with

$$\varphi_{oe}^T(t,\theta_0) = [u_F(t-n_k) \;\cdots\; u_F(t-n_k-n_b+1) \;\; -G(q,\theta_0)u_F(t-1) \;\cdots\; -G(q,\theta_0)u_F(t-n_f)] \tag{63}$$

being a vector with dimension $n = n_b + n_f$. Substituting (62) for $\varepsilon(t,\hat\theta_N)$ in (61) and using a matrix notation, it follows that

$$\Psi^T\Phi_{oe}(\theta_0)(\hat\theta_N - \theta_0) = \Psi^T e, \tag{64}$$

with $\Psi = \Psi(\hat\theta_N)$ and $\Phi_{oe}^T(\theta_0) = [\varphi_{oe}(1,\theta_0), \ldots, \varphi_{oe}(N,\theta_0)]$. Pursuing a similar line of reasoning as in subsection 4.2, we can conclude that the test statistic

$$(\hat\theta_N - \theta)^T P_B^{-1}(\theta)(\hat\theta_N - \theta), \quad P_B(\theta) = \left(\Psi^T\Phi_{oe}(\theta)\right)^{-1}\sigma^2\,\Psi^T\Psi\left(\Phi_{oe}^T(\theta)\Psi\right)^{-1}, \tag{65}$$

is asymptotically $\chi^2_n$ distributed under $H_0$. An asymptotically valid $100(1-\alpha)\%$ confidence region for $\theta_0$ can then be formulated as

$$\left\{\theta \;\middle|\; (\hat\theta_N - \theta)^T P_{oe}^{-1}(\theta)(\hat\theta_N - \theta) \leq nF_{n,N-n,1-\alpha}\right\}, \tag{66}$$

$$P_{oe}(\theta) = \left(\Psi^T\Phi_{oe}(\theta)\right)^{-1}\hat\sigma^2_N\,\Psi^T\Psi\left(\Phi_{oe}^T(\theta)\Psi\right)^{-1}. \tag{67}$$

Note that unlike $P_s$ in (59) and $\Psi^T\Psi$ in (48), the term $P_{oe}$ in (66) depends on $\theta$. Therefore, the construction of (66) is generally computationally expensive, requiring the evaluation of $P_{oe}(\theta)$ at a sufficient number of points to produce contours. In addition, whereas (48) and (59) are ellipsoids, confidence region (66) generally is not.

4.4 An approach based on the likelihood ratio

A second alternative approach is to use a test statistic that is based on a comparison of (maximized) likelihood functions under the hypotheses $H_0$ and $H_1$.
Since the models underlying these hypotheses are nested, the generalized likelihood ratio (LR)

$$L_G = \frac{f_y(\theta; y^N)}{\sup_\theta f_y(\theta; y^N)} = \frac{f_y(\theta; y^N)}{f_y(\hat\theta_N; y^N)} \tag{68}$$

is bound to be between 0 (likelihoods are non-negative) and 1. It has been shown that under certain regularity conditions, the test statistic

$$T_{LR} = -2\log L_G \tag{69}$$

has asymptotically a $\chi^2_n$ distribution under $H_0$ (Kay, 1998). It follows from section 3 that for the case of normally distributed data $y^N$ and an OE model structure, the test statistic $T_{LR}$ can be written as:

$$T_{LR} = \frac{1}{\sigma^2}\left(V_N(\theta) - V_N(\hat\theta_N)\right). \tag{70}$$

Based on this test statistic, the following (asymptotically valid) $100(1-\alpha)\%$ confidence region for $\theta_0$ can be derived:

$$\left\{\theta \;\middle|\; \frac{V_N(\theta) - V_N(\hat\theta_N)}{\sigma^2} \leq \chi^2_{n,1-\alpha}\right\}. \tag{71}$$

If $\sigma^2$ is unknown and replaced by $\hat\sigma^2_N$, $\chi^2_{n,1-\alpha}$ is replaced by $nF_{n,N-n,1-\alpha}$ to arrive at

$$\left\{\theta \;\middle|\; \frac{V_N(\theta) - V_N(\hat\theta_N)}{n\,\hat\sigma^2_N} \leq F_{n,N-n,1-\alpha}\right\}. \tag{72}$$

Note that the construction of (71) and (72) is generally computationally expensive, requiring the evaluation of $V_N(\theta)$ at a sufficient number of points to produce contours. Likelihood ratio based confidence regions have also been discussed by Quinn et al. (2005).

4.5 An exact Fisher score based finite sample approach

A third new approach is based on the statistical properties of the Fisher score described in section 3. It follows from (36) that for OE models the quadratic form

$$S(\theta_0)^T J^{-1}(\theta_0)S(\theta_0) \tag{73}$$

has a $\chi^2_n$ distribution. Then, if we want to test the null hypothesis (38) against the alternative hypothesis (39), the test statistic

$$T_R = S(\theta)^T J^{-1}(\theta)S(\theta), \tag{74}$$

which is known as the Rao (or score) test statistic (Kay, 1998), may be used, since it is known to be exactly $\chi^2_n$ distributed under $H_0$ (for all $N$). It follows from section 3 that (74) can be written as:

$$T_R = \frac{1}{\sigma^2}(y - \hat{y}(\theta))^T P(\theta)(y - \hat{y}(\theta)), \tag{75}$$

$$P(\theta) = \Psi(\theta)\left(\Psi^T(\theta)\Psi(\theta)\right)^{-1}\Psi^T(\theta). \tag{76}$$

Note that the matrix (76) is an orthogonal projection matrix since it is symmetric and idempotent, i.e., $P^2 = P$. Based on this test statistic, the following exact $100(1-\alpha)\%$ confidence region for $\theta_0$ can be derived:

$$\left\{\theta \;\middle|\; \frac{(y - \hat{y}(\theta))^T P(\theta)(y - \hat{y}(\theta))}{\sigma^2} \leq \chi^2_{n,1-\alpha}\right\}. \tag{77}$$

If $\sigma^2$ is unknown, it is known from nonlinear regression theory (Hamilton, 1986; Seber and Wild, 1989) that it is still possible to construct an exact confidence region based on the Rao test statistic.
Such a region is obtained by using the following variant of the test statistic $T_R$:

$$T_R' = \frac{N-n}{n}\,\frac{(y - \hat{y}(\theta))^T P(\theta)(y - \hat{y}(\theta))}{(y - \hat{y}(\theta))^T [I - P(\theta)](y - \hat{y}(\theta))}, \tag{78}$$

with $I$ the identity matrix. It can be shown that, under $H_0$, the test statistic $T_R'$ is the ratio of two independent $\chi^2$ distributed random variables with $n$ and $N-n$ degrees of freedom, respectively, each divided by its number of degrees of freedom. Therefore, it is exactly $F_{n,N-n}$ distributed under $H_0$. This leads to the following exact $100(1-\alpha)\%$ confidence region:

$$\left\{\theta \;\middle|\; \frac{N-n}{n}\,\frac{(y - \hat{y}(\theta))^T P(\theta)(y - \hat{y}(\theta))}{(y - \hat{y}(\theta))^T [I - P(\theta)](y - \hat{y}(\theta))} \leq F_{n,N-n,1-\alpha}\right\} \tag{79}$$

Note that the construction of (77) and (79) is generally computationally expensive, requiring the evaluation of $P(\theta)$ at a sufficient number of points to produce contours. The method of obtaining the exact confidence region (79) is often referred to as the lack-of-fit method (Donaldson and Schnabel, 1987; Gallant, 1987).

5. SIMULATION EXPERIMENT

In a MATLAB environment, a Monte Carlo simulation experiment was performed to evaluate and compare the methods for computing confidence regions described in the preceding sections. For different data lengths $N$, $K$ data sets $(y^N, u^N) = \{y(t), u(t)\}_{t=1,\ldots,N}$ were generated using a data generating system $\mathcal{S}$ that is completely known and belongs to the OE model class:

$$G_0(q) = \frac{q^{-1}(b_0 + b_1 q^{-1})}{1 + f_1 q^{-1} + f_2 q^{-2}}, \quad H_0(q) = 1, \tag{80}$$

with b = 0.047, b_2 = , f = .5578 and f_2 = . For each value of $N$, we used a fixed input sequence $u^N$, with $u^N$ a realization of a zero mean, Gaussian distributed white noise process with variance $\sigma_u^2$, being uncorrelated with the zero mean, Gaussian distributed white noise process $e(t)$ with variance $\sigma^2$. For each value of $N$, $K$ different data sets were obtained by repeating the same experiment $K$ times, where each time only the noise realization $e^N$ was different.
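To make the lack-of-fit test and the coverage experiment concrete, the following Python sketch evaluates the F-distributed variant of the Rao statistic for a second-order OE model and estimates its coverage at the true parameter by Monte Carlo. It is an illustration under stated assumptions, not the authors' MATLAB code: all function names are hypothetical, zero initial conditions are assumed, and the parameter values $\theta_0 = [0.1, 0.05, -1.5, 0.7]$, the unit variances, $N = 20$ and $K = 2000$ are placeholders chosen for the sketch rather than the values used in the paper.

```python
import numpy as np
from scipy import stats
from scipy.signal import lfilter


def oe_predict_and_gradient(theta, u, nb, nf, nk):
    """Return y_hat(theta) and the N x n gradient matrix Psi(theta) of an OE model."""
    theta = np.asarray(theta, dtype=float)
    f = np.concatenate(([1.0], theta[nb:nb + nf]))
    num = np.concatenate((np.zeros(nk), theta[:nb]))
    N = len(u)
    y_hat = lfilter(num, f, u)
    u_f = lfilter([1.0], f, u)
    yhat_f = lfilter([1.0], f, y_hat)
    Psi = np.zeros((N, nb + nf))
    for k in range(nb):
        Psi[nk + k:, k] = u_f[:N - nk - k]
    for k in range(1, nf + 1):
        Psi[k:, nb + k - 1] = -yhat_f[:N - k]
    return y_hat, Psi


def lack_of_fit_statistic(theta, y, u, nb, nf, nk):
    """(N - n)/n * r^T P r / r^T (I - P) r with r = y - y_hat(theta)."""
    y_hat, Psi = oe_predict_and_gradient(theta, u, nb, nf, nk)
    r = y - y_hat
    coef, *_ = np.linalg.lstsq(Psi, r, rcond=None)
    proj = Psi @ coef                 # P(theta) r, the projection of r on range(Psi)
    explained = proj @ proj           # r^T P r
    remaining = r @ r - explained     # r^T (I - P) r
    N, n = Psi.shape
    return (N - n) / n * explained / remaining


def in_exact_region(theta, y, u, nb, nf, nk, alpha=0.05):
    """True if theta is accepted by the lack-of-fit test at level alpha."""
    N, n = len(y), nb + nf
    threshold = stats.f.ppf(1.0 - alpha, n, N - n)
    return lack_of_fit_statistic(theta, y, u, nb, nf, nk) <= threshold


def coverage_at_true_theta(theta0, N=20, K=2000, alpha=0.05, seed=0):
    """Fraction of K noise realizations for which theta0 lies in the region."""
    rng = np.random.default_rng(seed)
    nb, nf, nk = 2, 2, 1
    u = rng.standard_normal(N)        # fixed input, unit variance (assumed)
    y0, _ = oe_predict_and_gradient(theta0, u, nb, nf, nk)
    hits = sum(
        in_exact_region(theta0, y0 + rng.standard_normal(N), u, nb, nf, nk, alpha)
        for _ in range(K)
    )
    return hits / K
```

Calling `coverage_at_true_theta([0.1, 0.05, -1.5, 0.7])` should return a value close to 0.95 even for this short data length, up to Monte Carlo error. As in the experiment described here, only the test at $\theta_0$ is evaluated, so the full region never has to be traced out.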
From each data set, the model was identified using a model set $\mathcal{M}$ with the same OE structure as the data generating system:

$$G(q,\theta) = \frac{q^{-1}(b_0 + b_1 q^{-1})}{1 + f_1 q^{-1} + f_2 q^{-2}}; \quad H(q,\theta) = 1, \tag{81}$$

with $\theta^T = [b_0 \; b_1 \; f_1 \; f_2]$, and it was recorded whether or not the confidence regions described by (48), (59), (66), (72), and (79) contained the true value $\theta_0$. Note that determining whether $\theta_0$ lay in the confidence regions did not require the construction of the full confidence regions. The observed coverage $\gamma_\alpha$, for a particular nominal confidence level $1-\alpha$, is defined as the percentage of the total number of data sets $K$ for which $\theta_0$ lay in the confidence region. In this study, we used K = . Furthermore, a nominal level $\alpha = 0.05$ was chosen. This means that the (asymptotic) theory predicts an observed coverage of 95%. Figure 1 shows the observed coverage rates $\gamma_{0.05}$ as a function of the number of data points $N$. The 95% confidence intervals for $\gamma_{0.05}$ can be obtained from the binomial distribution. The maximum width of these confidence intervals was approximately 0.0.

The results show that for increasing data lengths, all observed coverage rates tend to 0.95, as predicted by asymptotic theory. For finite data lengths, however, the different confidence regions show different reliability. The lack-of-fit method (based on the Rao test statistic) turns out to yield the most reliable confidence regions in the sense that the coverage probability equals the nominal probability for all data lengths (as predicted by theory). For the other confidence regions considered, coverage and nominal probabilities differ significantly for small $N$. In particular, the classical confidence region (48) and the alternative (59) turn out to be unreliable for small $N$. The reliability of the LR based confidence region (72) and of the alternative (66) turns out to be relatively high, but suboptimal when compared to the lack-of-fit method.

[Figure 1: observed coverage rates (vertical axis) versus number of data points $N$ (horizontal axis).]

Fig. 1. Results of the simulation experiment described in section 5. Observed coverage rates of the confidence regions based on the classical approach (48), the alternative approaches (59) (+) and (66) (*), the LR based approach (72), and the lack-of-fit method (79), as a function of $N$. The data generating system is given by (80) and the OE model set is described by (81). The nominal confidence level is 0.95. All results were obtained from $K$ realizations.

The simulation experiment was repeated for data sequences obtained using different realizations of both the noise contribution $e^N$ and the input sequence $u^N$ in each of the $K$ data sets (for each value of $N$). The results were similar to those obtained with a fixed input sequence $u^N$. More simulation experiments were performed, using alternative data generating systems (all belonging to the OE model class), parameters and nominal confidence rates. All experiments yielded similar results.

6. CONCLUSION

It was shown that the classical method for constructing confidence regions for OE model parameters yields unreliable inference results for small data lengths. In addition, the possibility of constructing exact confidence regions for finite data lengths was demonstrated. It should be noted that whereas the classical regions (ellipsoids) are easy to compute, the construction of the newly proposed (exact) confidence regions is computationally expensive. Therefore, more research should be done towards their practical implementation. The prospect of using randomized algorithms (Tempo et al., 2004) for this purpose is a subject of ongoing research, as is a further generalization of the results to the Box-Jenkins model structure.

ACKNOWLEDGEMENTS

The authors gratefully acknowledge fruitful discussions with Sippe Douma and Asbjörn Klomp.

REFERENCES

M.C. Campi and E. Weyer. Finite sample properties of system identification methods. IEEE Trans. Autom. Control, 47(8), 2002.
M.C. Campi and E. Weyer. Guaranteed non-asymptotic confidence regions in system identification. Automatica, 41(10):1751-1764, 2005.
A.J. den Dekker, X. Bombois, and P.M.J. Van den Hof. Likelihood based uncertainty bounding in prediction error identification using ARX models: A simulation study. In Proc. European Control Conference ECC'07, Kos, Greece, July 2-5, 2007.
J.R. Donaldson and R.B. Schnabel. Computational experience with confidence regions and confidence intervals for nonlinear least squares. Technometrics, 27:67-82, 1987.
S.G. Douma and P.M.J. Van den Hof. An alternative paradigm for probabilistic uncertainty bounding in prediction error identification. In Proc. 44th IEEE Conf. Decision and Control and European Control Conference ECC'05 (CDC-ECC'05), Sevilla, Spain, December 12-15, 2005.
S.G. Douma and P.M.J. Van den Hof. Probabilistic model uncertainty bounding: An approach with finite-time perspectives. In Preprints 14th IFAC Symposium on System Identification, Newcastle, Australia, March 27-29, 2006.
R.A. Fisher. On the mathematical foundations of theoretical statistics. Phil. Trans. Roy. Soc. London, Series A, 222, 1922.
R.A. Gallant. Nonlinear statistical models. John Wiley and Sons, Inc., New York, 1987.
D. Hamilton. Confidence regions for parameter subsets in nonlinear regression. Biometrika, 73(1):57-64, 1986.
S.M. Kay. Fundamentals of Statistical Signal Processing, Volume II: Detection Theory. Prentice Hall PTR, Upper Saddle River, New Jersey, 1998.
L. Ljung. System Identification - Theory for the User. Prentice Hall, Upper Saddle River, NJ, 2nd edition, 1999.
L. Ljung. System identification toolbox in Matlab. The Mathworks, Inc., 2003.
A.M. Mood, F.A. Graybill, and D.C. Boes. Introduction to the Theory of Statistics. McGraw-Hill, Tokyo, 3rd edition, 1974.
S.L. Quinn, T.J. Harris, and D.W. Bacon. Accounting for uncertainty in control-relevant statistics. Journal of Process Control, 15, 2005.
G.A.F. Seber and C.J. Wild. Nonlinear regression. John Wiley and Sons, New York, 1989.
R. Tempo, G. Calafiore, and F. Dabbene. Randomized algorithms for analysis and control of uncertain systems. Springer Verlag, New York, 2004.
A. Wald. Note on the consistency of the maximum likelihood estimate. Ann. Math. Stat., 20:595-601, 1949.
E. Weyer and M.C. Campi. Non-asymptotic confidence ellipsoids for the least-squares estimate. Automatica, 38, 2002.
S.S. Wilks. Mathematical Statistics. John Wiley and Sons, Inc., New York, 1962.
