RESEARCH ARTICLE Parameter Consistency and Quadratically Constrained Errors-in-Variables Least-Squares Identification


Harish J. Palanthandalam-Madapusi
Department of Mechanical and Aerospace Engineering, Syracuse University, 149 Link Hall, Syracuse, NY 13244, USA. Phone: Fax:

Tobin H. Van Pelt
i3d Technologies, LLC, 953 W. California Ave., Mill Valley, CA 94941. tobin@hyphen-8.com

Dennis S. Bernstein
Department of Aerospace Engineering, The University of Michigan, Ann Arbor, MI 48109. dsbaero@umich.edu

(Received 00 Month 200x; final version received 00 Month 200x)

In this paper we investigate the consistency of parameter estimates obtained from least squares identification with a quadratic parameter constraint. For generality, we consider infinite impulse response systems with colored input and output noise. In the case of finite data, we show that there always exists a possibly indefinite quadratic constraint depending on the noise realization that yields the true parameters of the system when a persistency condition is satisfied. When the autocorrelation matrix of the output noise is known to within a scalar multiple, we show that the quadratically constrained least squares estimator with a semidefinite constraint matrix yields consistent parameter estimates.

This research was supported in part by the National Science Foundation Information Technology Research initiative, through Grant ATM and by the Air Force Office of Scientific Research under grants F and F. Corresponding Author

International Journal of Control

1 Introduction

Least squares is undoubtedly one of the most widely developed, extensively applied, and generally useful techniques in science and engineering Björck (1996), Lawson and Hanson (1995), Draper and Smith (1981). In its simplest form, least squares seeks approximate solutions to Ax = b when this equation has no solution per se due to inherent inconsistency or due to errors in the regressor matrix A or the output vector b. More precisely, least squares seeks a vector x that minimizes the Euclidean norm ||Ax - b||_2, which can be viewed as seeking the smallest residual w in the equation Ax = b + w. In a stochastic setting, where A, b, or w is a random variable, the solution x̂ to the least squares problem is also a random variable. In this case, questions of interest concern the unbiasedness and consistency of the least squares estimator x̂, where unbiasedness refers to the situation in which the expected value of x̂ is the true value x_true, while consistency refers to the convergence, with probability 1, of x̂ = x̂_N to x_true as the number N of data points used to construct A and b increases without bound. Although unbiasedness and consistency are independent conditions, consistency is usually obtained by constructing an unbiased estimator whose variance converges to zero as the amount of available data increases without bound. The best linear unbiased estimator (BLUE) has been extensively studied, see, for example, (Campbell and Meyer 1991, Chapter 6).

In system identification, least squares is the key tool for estimating the coefficients of time series and state space models Ljung (1999), Söderström and Stoica (1989), Hsia (1977), Juang (1993), Van Overschee and De Moor (1996). For time series models, a difficulty arises from the fact that the components of the residual w are past values of the process noise, while the entries of the regressor matrix A include past values of the output.
Consequently, the residual is correlated with the regressor, resulting in a biased estimator x̂. This case is not addressed by the classical theory of BLUE. Hence, time series estimation poses difficulties that transcend the central results of classical linear estimation theory.

In time series identification, correlation between the regressor matrix and residual is absent in two very special cases, namely, when the time-series model has an equation error with known spectrum (Ljung 1999, p. 205), (Söderström and Stoica 1989, pp. 66, 87) and when the dynamics are finite impulse response. As shown in (Ljung 1999, p. 205), when the time-series model has an equation error with known spectrum, the time series model can be written using the notation of Ljung (1999) as A(q)y(t) = B(q)u(t) + κ(q)e(t), where q is the backward shift operator, A(q) and B(q) are unknown polynomials in q, κ(q) is a known filter, and e(t) is white. Then the time series model can be rewritten as A(q)y_F(t) = B(q)u_F(t) + e(t), where y_F(t) = κ^{-1}(q)y(t) and u_F(t) = κ^{-1}(q)u(t). By pre-filtering the data to obtain y_F(t) and u_F(t), standard least-squares techniques can be applied to estimate the time series model.

Alternatively, consider an errors-in-variables time series model in which the input u(t) and the output y(t) are corrupted by colored noise κ_1(q)e_1(t) and κ_2(q)e_2(t), respectively, where

e_1(t) and e_2(t) are white and κ_1(q) and κ_2(q) are coloring filters. The time series model can then be written as A(q)y(t) = B(q)u(t) + A(q)κ_2(q)e_2(t) - B(q)κ_1(q)e_1(t). In this case, it follows that the equation error is A(q)κ_2(q)e_2(t) - B(q)κ_1(q)e_1(t). Now, even if the coloring filters κ_1(q) and κ_2(q) are known, the spectrum of the equation error remains unknown since A(q) and B(q) are unknown. Therefore the pre-filtering technique can no longer be applied. Further details on these cases are given in Section 5.

The issue of bias in time series model identification is well known, and the relevant literature is extensive. An early, pivotal contribution is the Koopmans-Levin (KL) method Levin (1964), in which knowledge of the noise statistics is used to reduce the bias in the estimates of the model coefficients. This technique hinges on the minimization of a ratio of quadratic forms, which in turn leads to the solution of a generalized eigenvalue problem, for which the generalized eigenvector provides the parameter estimates. The KL method is further investigated in Smith and Hilton (1967), Aoki and Yue (1970), Sagara and Wada (1977), Zhdanov and Katsyuba (1979), Fernando and Nicholson (1985), Zheng (2002).

Since identification of a transfer function entails estimation of the coefficients of the numerator and denominator polynomials, scaling is needed to remove the inherent ambiguity due to a ratio of polynomials. The most obvious scaling is to scale both polynomials so that the denominator polynomial is monic. However, alternative constraints can be enforced. For example, a constraint on the norm of the coefficients can be used to scale the numerator and denominator coefficients Furuta and Paquet (1970), Regalia (1994), Z.-Ying and Chunbo (1995).
If the norm constraint involves a quadratic form, then the resulting quadratically constrained least squares (QCLS) estimation problem can be solved using Lagrange multipliers, as pursued in Lemmerling and Moor (2001), Yeredor and Moor (2004), van Pelt and Bernstein (2001). The resulting optimality conditions have the form of a generalized eigenvalue problem, in which the generalized eigenvector is an estimate of the transfer function coefficients. The above discussion suggests that the KL method can be viewed as the consequence of a particular QCLS optimization problem. A careful examination of the literature on unbiasedness and consistency shows that this linkage has been largely overlooked, resulting in a fragmented collection of approaches to this problem. Related techniques for removing the bias in least squares identification include James (1972), Akashi (1975), Söderström (1981), Merhav (1975), Zheng (1998), Zheng and Feng (1989), Zheng (1999, 2002), Moor (1994), Guo and Billings (2007). Finally, the instrumental variable method Wong and Polak (1967), Söderström and Stoica (1981), Stoica and Söderström (1982) provides yet another approach to this problem. Recent work such as Zheng (2002), Lemmerling and Moor (2001), Yeredor and Moor (2004), Chen (2007), Agüero and Goodwin (2008), Söderström (2002), Söderström (2007), Mahata (2007), Hong (2007) suggests continued interest in unbiased and consistent estimation.

Errors-in-variables identification problems can be classified as either functional or structural Agüero and Goodwin (2008). When the true input to the system is an arbitrary deterministic signal, the identification problem is called a functional problem, whereas a structural problem arises when the true input to the system is a stochastic, usually white noise, signal. For the structural case, results on consistency and unbiasedness are available (see Mahata (2007), Söderström (2002) for example), but for the functional case proofs of consistency are mostly absent.
Moreover, in the structural case, the noise covariance can be estimated from the input

data; in the functional case, however, it is almost always assumed that the noise covariance matrix is known. For instance, for the functional problem, Hong (2007) shows that the estimates are asymptotically Gaussian distributed (unbiased but not consistent) when the covariance matrix is known to within two unknown parameters. A detailed discussion of all of these cases is provided in Söderström (2007). In the present paper, we focus on the functional case, and we assume that the autocovariance matrix of the noise is known to within a scalar multiple. Moreover, the algorithm derived in the present paper is closely related to the KL method, bias-compensation methods, and the Frisch scheme Koopmans (1937), Levin (1964), Aoki and Yue (1970), Fernando and Nicholson (1985), Zheng (1998, 2002), Hong (2006), Mahata (2007), Söderström (2007). However, the novel features of the present paper include an alternative formulation of the above-mentioned methods, and, more importantly, this alternative formulation allows us to derive unbiasedness and consistency results for the functional case.

The first objective of the present paper is accomplished by the above discussion, that is, to point out that the KL method can be viewed as a special case of QCLS estimation. With this realization in mind, our next objective is to characterize the solution set of an appropriate quadratically constrained quadratic programming problem. Because of space constraints, the details of this development are provided in Palanthandalam-Madapusi (2008), with only the key results quoted for use herein. We note, however, that, unlike earlier work based on Lagrange multipliers and necessary conditions, we provide necessary and sufficient conditions for the existence of solutions along with a complete characterization of the solution set directly in terms of the properties of matrix pencils.
Next, we consider an errors-in-variables SISO ARMAX identification problem with minimal assumptions on the structure of the model and the nature of the noise. For this problem we show that there exists a possibly indefinite quadratic constraint on the system coefficients such that the resulting solution of the optimization problem yields the true coefficients. This analysis, as well as all persistency conditions, is entirely deterministic. The existence of a quadratic constraint that leads to the true parameter values provides a framework for understanding the success of the KL algorithm. However, the constraint matrix in the deterministic formulation depends on the sensor errors (noise) and thus is unknowable. We therefore assume the noise signal to be stochastic and ergodic, and replace the indefinite QCLS constraint with a positive-semidefinite constraint based on the noise covariance. Under the assumption that this noise covariance is known to within a multiplicative constant, we prove that solutions of this modified QCLS problem are both unbiased and consistent in the sense that the averaged problem and the limiting problem produce, respectively, unbiased and true (with probability 1) estimators. In addition, we provide numerical results that suggest that the QCLS estimator is unbiased and consistent in the standard sense.

The main objective of this paper is to provide the missing foundation for the KL method and its numerous variants in the literature, while providing a complete development of unbiasedness and consistency in a precise sense. The essential idea of estimating the transfer function coefficients by means of constrained optimization is given an intuitively compelling foundation, thus linking, clarifying, and extending the prior literature.

Notation

q^{-1} is the backward shift operator, 0_n is the n×n zero matrix, I_n is the n×n identity matrix, R^n is the set of real n×1 column vectors, and R^{n×m} is the set of real n×m matrices. For A ∈ R^{n×m}, rank A is the rank of A, R(A) is the range of A, N(A) is the null space of A, def A is the defect (nullity) of A, λ_max(A) is the largest eigenvalue of A, λ_min(A) is the smallest eigenvalue of A, and A^+ is the Moore-Penrose generalized inverse of A. Furthermore, diag(a_i) is the diagonal matrix with diagonal entries a_i. For a symmetric matrix A, A > 0 means that A is positive definite, and A ≥ 0 means that A is positive semidefinite.

2 Null Space Condition

Consider the single-input, single-output system

a_0 y_0(k) + a_1 y_0(k-1) + ... + a_n y_0(k-n) = b_0 u_0(k) + b_1 u_0(k-1) + ... + b_n u_0(k-n), (1)

where u_0(k) is the system input and y_0(k) is the system output. Defining the backward shift operator q^{-1} and the polynomials

A(q^{-1}; ϑ) = a_0 + a_1 q^{-1} + ... + a_n q^{-n}, (2)

B(q^{-1}; ϑ) = b_0 + b_1 q^{-1} + ... + b_n q^{-n}, (3)

(1) can be rewritten as

A(q^{-1}; ϑ) y_0(k) = B(q^{-1}; ϑ) u_0(k), (4)

where the system parameter vector ϑ ∈ R^{2n+2} is

ϑ = [a_0 ... a_n b_0 ... b_n]^T. (5)

We assume that the polynomials A(q^{-1}; ϑ) and B(q^{-1}; ϑ) are coprime, that is, (1) is a minimal representation. Furthermore, we assume that a_0 ≠ 0, which is equivalent to the assumption that (1) is causal. Hence ϑ ≠ 0. Next, defining the nth-order proper transfer function G(q^{-1}; ϑ) by

G(q^{-1}; ϑ) = B(q^{-1}; ϑ)/A(q^{-1}; ϑ), (6)

we have

y_0(k) = G(q^{-1}; ϑ) u_0(k). (7)
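As a concrete numerical illustration (a hypothetical first-order system with illustrative coefficients; NumPy assumed), the sketch below builds the noise-free regression matrix, verifies the null space condition Φ_0 ϑ = 0, and confirms that, for a persistently exciting input, the null space is one-dimensional and spanned by ϑ:

```python
import numpy as np

# Hypothetical noise-free first-order system (n = 1):
#   a_0 y_0(k) + a_1 y_0(k-1) = b_0 u_0(k) + b_1 u_0(k-1)
# with parameter vector vartheta = [a_0, a_1, b_0, b_1]^T.
n = 1
vartheta = np.array([1.0, -0.5, 2.0, 1.0])

rng = np.random.default_rng(0)
l = 200
u0 = rng.standard_normal(l + 1)
y0 = np.zeros(l + 1)
for k in range(1, l + 1):
    # y_0(k) = (-a_1 y_0(k-1) + b_0 u_0(k) + b_1 u_0(k-1)) / a_0
    y0[k] = 0.5 * y0[k - 1] + 2.0 * u0[k] + 1.0 * u0[k - 1]

# Noise-free regression matrix with rows
#   phi_0(k) = [y_0(k), y_0(k-1), -u_0(k), -u_0(k-1)]^T,  k = n, ..., l
Phi0 = np.column_stack([y0[1:], y0[:-1], -u0[1:], -u0[:-1]])

residual = np.linalg.norm(Phi0 @ vartheta)   # null space condition: ~0
rank = np.linalg.matrix_rank(Phi0)           # 2n + 1 when the input is persistently exciting
_, _, Vt = np.linalg.svd(Phi0)
null_vec = Vt[-1]                            # spans the one-dimensional null space
```

Because the null space is one-dimensional, the smallest right singular vector of Φ_0 recovers ϑ up to the inherent scaling ambiguity.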

Remark 2.1. Note that G(q^{-1}; ϑ) = G(q^{-1}; ηϑ) for all nonzero η ∈ R. This nonuniqueness of the system parameterization can be removed by choosing η = 1/a_0 so that the first component of ηϑ is unity. Although this normalization yields a unique system parameterization, we choose not to do this in order to facilitate the following analysis.

Next, assuming the system input u_0(k) and the system output y_0(k) are available for k = 0, ..., l, where l ≥ n, we define the noise-free regression matrix Φ_0 ∈ R^{(l-n+1)×(2n+2)} by

Φ_0 = [φ_0^T(n) ; ... ; φ_0^T(l)], (8)

where the noise-free regression vector φ_0(k) ∈ R^{2n+2} is

φ_0(k) = [y_0(k) ... y_0(k-n) -u_0(k) ... -u_0(k-n)]^T. (9)

With this notation, (1) can be written as

Φ_0 ϑ = 0, (10)

which is equivalent to the null space condition

ϑ ∈ N(Φ_0). (11)

Finally, define the positive-semidefinite matrix M_0 ∈ R^{(2n+2)×(2n+2)} by

M_0 = (1/l) Φ_0^T Φ_0. (12)

Note that rank M_0 = rank Φ_0, N(M_0) = N(Φ_0), and def M_0 = def Φ_0 Bernstein (2005). Therefore, the null space condition (11) is equivalent to

ϑ ∈ N(M_0). (13)

Next, partition Φ_0 as

Φ_0 = [Φ_{01} Φ_{02}], (14)

where Φ_{01} ∈ R^{l-n+1} and Φ_{02} ∈ R^{(l-n+1)×(2n+1)}, and partition ϑ as

ϑ = [a_0 ; ϑ̄], (15)

where ϑ̄ ∈ R^{2n+1}. Then the null space condition (11) is equivalent to

a_0 Φ_{01} + Φ_{02} ϑ̄ = 0, (16)

which implies that Φ_{01} = -a_0^{-1} Φ_{02} ϑ̄ ∈ R(Φ_{02}) = R(Φ_0), and thus rank Φ_{02} = rank Φ_0. Hence, Φ_{01} = Φ_{02} Φ_{02}^+ Φ_{01} = Φ_0 Φ_0^+ Φ_{01} and

[1 ; -Φ_{02}^+ Φ_{01}] ∈ N(Φ_0). (17)

The following result characterizes N(Φ_0).

Proposition 2.2. Let {u_0(k)}_{k=0}^l and {y_0(k)}_{k=0}^l satisfy (7). Then

Φ_0^+ = [β^{-1} Φ_{01}^T (Φ_{02} Φ_{02}^T)^+ ; Φ_{02}^+ - β^{-1} Φ_{02}^+ Φ_{01} Φ_{01}^T (Φ_{02} Φ_{02}^T)^+], (18)

where

β = 1 + Φ_{01}^T (Φ_{02} Φ_{02}^T)^+ Φ_{01}. (19)

Furthermore,

N(Φ_0) = R(I_{2n+2} - Φ_0^+ Φ_0) = R([β^{-1}, -β^{-1} (Φ_{02}^+ Φ_{01})^T ; -β^{-1} Φ_{02}^+ Φ_{01}, I_{2n+1} - Φ_{02}^+ Φ_{02} + β^{-1} Φ_{02}^+ Φ_{01} (Φ_{02}^+ Φ_{01})^T]). (20)

Proof. See Bernstein (2005).

Next, note that the null space condition (11) uniquely determines ϑ up to a scalar factor if N(Φ_0) is one-dimensional, that is, if def Φ_0 = 1. Noting that rank Φ_0 + def Φ_0 = 2n + 2, the following definition considers this case.

Definition 2.3. The input sequence {u_0(k)}_{k=0}^l is persistently exciting for G(q^{-1}; ϑ) if rank Φ_0 = 2n + 1.

Note that, given {u_0(k)}_{k=0}^∞ such that {u_0(k)}_{k=0}^{l̂} is persistently exciting for G(q^{-1}; ϑ), it follows that, for all l ≥ l̂, {u_0(k)}_{k=0}^l is persistently exciting for G(q^{-1}; ϑ).

Proposition 2.4. Let {u_0(k)}_{k=0}^l and {y_0(k)}_{k=0}^l satisfy (7). Then the following statements are equivalent:

i) {u_0(k)}_{k=0}^l is persistently exciting for G(q^{-1}; ϑ).
ii) rank M_0 = 2n + 1.
iii) def M_0 = 1.
iv) rank Φ_0 = 2n + 1.
v) def Φ_0 = 1.
vi) rank Φ_{02} = 2n + 1.
vii) def Φ_{02} = 0.
viii) Φ_{02}^T Φ_{02} > 0.

In this case, it follows that l ≥ 3n, Φ_{02}^+ = (Φ_{02}^T Φ_{02})^{-1} Φ_{02}^T, Φ_{02}^+ Φ_{02} = I_{2n+1}, and

N(Φ_0) = N(M_0) = R([1 ; -Φ_{02}^+ Φ_{01}]) = R(ϑ). (21)

Suppose that {u_0(k)}_{k=0}^l is persistently exciting for G(q^{-1}; ϑ) so that N(Φ_0) = R(ϑ). Then G(q^{-1}; ϑ) = G(q^{-1}; θ) for all θ ∈ N(Φ_0)\{0} = R(ϑ)\{0}, and G(q^{-1}; ϑ) is determined uniquely up to an arbitrary scaling by the null space condition (11), which is characterized by (21).

3 Errors-in-Variables Formulation

Next, we assume that the system output y_0(k) is corrupted by output noise w(k) so that the measured output y(k) is given by

y(k) = y_0(k) + w(k). (22)

Hence

y(k) = G(q^{-1}; ϑ) u_0(k) + w(k). (23)

Furthermore, suppose that the system input u_0(k) is uncertain so that the measured input u(k) is given by

u(k) = u_0(k) + v(k), (24)

where v(k) is input noise. Then (23) can be written as the errors-in-variables model

y(k) = G(q^{-1}; ϑ) u(k) - G(q^{-1}; ϑ) v(k) + w(k). (25)

Remark 3.1. We make no assumptions on w(k) and v(k) until Section 9. Since w(k) need not be white, (25) can represent a Box-Jenkins model. By setting v(k) ≡ 0 and w(k) = H(q^{-1}) w̄(k), where H(q^{-1}) is a stable transfer function and w̄(k) is zero-mean and white, it

follows that (25) becomes the Box-Jenkins model

y(k) = G(q^{-1}; ϑ) u(k) + H(q^{-1}) w̄(k). (26)

We write (25) in regression form as

A(q^{-1}; ϑ) y(k) - B(q^{-1}; ϑ) u(k) = A(q^{-1}; ϑ) w(k) - B(q^{-1}; ϑ) v(k), (27)

which can be rewritten as

φ^T(k) ϑ = ψ^T(k) ϑ, (28)

where the regression vector φ(k) ∈ R^{2n+2} is

φ(k) = [y(k) ... y(k-n) -u(k) ... -u(k-n)]^T (29)

and the noise vector ψ(k) ∈ R^{2n+2} is defined as

ψ(k) = [w(k) ... w(k-n) -v(k) ... -v(k-n)]^T. (30)

Furthermore, note that

φ(k) = φ_0(k) + ψ(k). (31)

Henceforth, we consider a finite measured output sequence {y(k)}_{k=0}^l generated by (25) with measured input sequence {u(k)}_{k=0}^l and noise sequences {w(k)}_{k=0}^l and {v(k)}_{k=0}^l. Assuming l ≥ n, we define the noisy regression matrix Φ ∈ R^{(l-n+1)×(2n+2)} by

Φ = [φ^T(n) ; ... ; φ^T(l)] (32)

and the noise matrix Ψ ∈ R^{(l-n+1)×(2n+2)} by

Ψ = [ψ^T(n) ; ... ; ψ^T(l)]. (33)

It then follows from (28) that

Φ ϑ = Ψ ϑ. (34)

Since

Φ - Ψ = Φ_0, (35)

it follows that (34) is equivalent to (10). However, note that

ϑ ∈ N(Φ) (36)

if and only if Ψ ϑ = 0. Now, partition Φ = [Φ_1 Φ_2], where Φ_1 ∈ R^{l-n+1} and Φ_2 ∈ R^{(l-n+1)×(2n+1)}, and define M ∈ R^{(2n+2)×(2n+2)} by

M = (1/l) Φ^T Φ. (37)

Next, we consider the following definition.

Definition 3.2. The input sequence {u_0(k)}_{k=0}^l and the noise sequences {w(k)}_{k=0}^l and {v(k)}_{k=0}^l are jointly persistently exciting for G(q^{-1}; ϑ) if rank Φ = 2n + 2.

The assumption that {u_0(k)}_{k=0}^l is persistently exciting for G(q^{-1}; ϑ) and the assumption that {u_0(k)}_{k=0}^l, {w(k)}_{k=0}^l, and {v(k)}_{k=0}^l are jointly persistently exciting for G(q^{-1}; ϑ) are independent, that is, one does not imply the other.

Proposition 3.3. Let {u(k)}_{k=0}^l and {y(k)}_{k=0}^l satisfy (25). Then the following statements are equivalent:

i) {u_0(k)}_{k=0}^l, {w(k)}_{k=0}^l, and {v(k)}_{k=0}^l are jointly persistently exciting for G(q^{-1}; ϑ).
ii) rank M = 2n + 2.
iii) def M = 0.
iv) M > 0.
v) rank Φ = 2n + 2.
vi) def Φ = 0.

In this case, l ≥ 3n + 1.

4 Errors-in-Variables Least Squares

In this section we review the standard least squares problem to provide a framework for QCLS identification. Consider the model G(q^{-1}; θ) of G(q^{-1}; ϑ), where the model parameter vector θ ∈ R^{2n+2} is

θ = [θ_0 ... θ_n θ_{n+1} ... θ_{2n+1}]^T, (38)

and where Remark 2.1 is also valid for θ. Note that the definition of θ assumes that the system order n is known. We partition θ as

θ = [θ_0 ; θ̄], (39)

where θ̄ ∈ R^{2n+1}. In the noise-free case, ϑ satisfies Φ_0 ϑ = 0 (see (10)). Hence, when noise is present, we seek θ that minimizes ||Φθ||_2. Here we fix θ_0 and consider the standard least squares cost

J(θ̄) = ||Φθ||_2^2 = ||θ_0 Φ_1 + Φ_2 θ̄||_2^2. (40)

The standard least squares problem is thus

min_{θ̄ ∈ R^{2n+1}} J(θ̄). (41)

All solutions to (41) are given by the following result.

Proposition 4.1. Assume that {u(k)}_{k=0}^l and {y(k)}_{k=0}^l satisfy (25). Then θ̄̂ is a global minimizer of J(θ̄) if and only if there exists ξ ∈ R^{2n+1} such that

θ̄̂ = -θ_0 Φ_2^+ Φ_1 + (I_{2n+1} - Φ_2^+ Φ_2) ξ. (42)

In this case,

J(θ̄̂) = θ_0^2 Φ_1^T (I_{l-n+1} - Φ_2 Φ_2^+) Φ_1. (43)

If, in addition, {u_0(k)}_{k=0}^l, {w(k)}_{k=0}^l, and {v(k)}_{k=0}^l are jointly persistently exciting for G(q^{-1}; ϑ), then

θ̄̂ = -θ_0 Φ_2^+ Φ_1 = -θ_0 (Φ_2^T Φ_2)^{-1} Φ_2^T Φ_1 (44)

is the unique minimizer of (40).

Proof. Note that

J(θ̄) = ||θ_0 Φ_1 + Φ_2 θ̄||_2^2 = ||θ_0 Φ_1 + Φ_2 θ̄||_2^2 + θ_0^2 Φ_1^T Φ_2 Φ_2^+ Φ_1 - θ_0^2 Φ_1^T Φ_2 Φ_2^+ Φ_1 = ||Φ_2 (θ̄ + θ_0 Φ_2^+ Φ_1)||_2^2 + θ_0^2 Φ_1^T (I_{l-n+1} - Φ_2 Φ_2^+) Φ_1.

Since θ_0^2 Φ_1^T (I_{l-n+1} - Φ_2 Φ_2^+) Φ_1 is independent of θ̄ and ||Φ_2 (θ̄ + θ_0 Φ_2^+ Φ_1)||_2^2 is nonnegative, all global minimizers of J(θ̄) satisfy ||Φ_2 (θ̄ + θ_0 Φ_2^+ Φ_1)||_2^2 = 0. Thus θ̄̂ is a global minimizer of

J(θ̄) if and only if θ̄̂ = -θ_0 Φ_2^+ Φ_1 + ν, where ν ∈ N(Φ_2). Since N(Φ_2) = R(I_{2n+1} - Φ_2^+ Φ_2), all global minimizers of J(θ̄) are of the form (42) with minimum value given by (43). Finally, assume that {u_0(k)}_{k=0}^l, {w(k)}_{k=0}^l, and {v(k)}_{k=0}^l are jointly persistently exciting for G(q^{-1}; ϑ). Then, since rank Φ_2 = 2n + 1, it follows that Φ_2^+ Φ_2 = I_{2n+1}, and (42) specializes to (44).

5 Correlation Between the Regressor and Noise Vectors

First, partition Ψ = [Ψ_1 Ψ_2], where Ψ_1 ∈ R^{l-n+1} and Ψ_2 ∈ R^{(l-n+1)×(2n+1)}. Then, as noted in (Ljung 1999, p. 205), the asymptotic error in the least-squares estimate is due to correlation between Φ_2 and Ψ_2. Since, for all k, y(k) and w(k) are correlated, and u(k) and v(k) are correlated, it follows that Φ_2 and Ψ_2 are generally correlated for the system (25). However, under the two special cases discussed in the following subsections, this correlation disappears. As noted in (Söderström and Stoica 1989, pp. 66, 87), both of these cases are extremely specialized.

5.1 White Equation Error

Consider the white equation-error model

y(k) = G(q^{-1}; ϑ) u(k) + (1/A(q^{-1}; ϑ)) w̄(k), (45)

where w̄(k) is white. The model (45) is obtained as a special case of (25) by setting v(k) ≡ 0 and w(k) = (1/A(q^{-1}; ϑ)) w̄(k). Writing (45) in the time-series form yields

a_0 y(k) + a_1 y(k-1) + ... + a_n y(k-n) = b_0 u(k) + b_1 u(k-1) + ... + b_n u(k-n) + w̄(k), (46)

and thus it follows that Ψ_2 = 0, and hence Φ_2 and Ψ_2 are uncorrelated.

5.2 Finite Impulse Response

A related case is the finite impulse-response model

y(k) = (B(q^{-1}; ϑ)/a_0) u(k) + w(k), (47)

which is a special case of (25) with v(k) ≡ 0 and A(q^{-1}; ϑ) = a_0. Again, writing (47) in the time-series form yields

a_0 y(k) = b_0 u(k) + b_1 u(k-1) + ... + b_n u(k-n) + a_0 w(k), (48)

and thus it follows that Ψ_2 = 0, and hence Φ_2 and Ψ_2 are uncorrelated. Note that w(k) need not be white. However, if w(k) is white, then (48) is obtained as a special case of (46) by setting a_1 = a_2 = ... = a_n = 0.
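The white equation-error case can be illustrated numerically. In the sketch below (hypothetical coefficients and noise level; NumPy assumed), the residual w̄(k) is white and uncorrelated with the regressor, so ordinary least squares with a_0 fixed at 1 recovers the remaining coefficients from a long data record:

```python
import numpy as np

# Hypothetical white equation-error model (n = 1, a_0 = 1):
#   y(k) + a_1 y(k-1) = b_0 u(k) + b_1 u(k-1) + wbar(k),  wbar white
rng = np.random.default_rng(1)
a1, b0, b1 = -0.5, 2.0, 1.0
l = 20000
u = rng.standard_normal(l + 1)
wbar = 0.5 * rng.standard_normal(l + 1)
y = np.zeros(l + 1)
for k in range(1, l + 1):
    y[k] = -a1 * y[k - 1] + b0 * u[k] + b1 * u[k - 1] + wbar[k]

# Regress y(k) on [-y(k-1), u(k), u(k-1)]; the white residual wbar(k)
# is uncorrelated with the regressor, so the estimates of [a1, b0, b1]
# are consistent here (unlike the general errors-in-variables case)
X = np.column_stack([-y[:-1], u[1:], u[:-1]])
est, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
```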

6 Quadratically Constrained Least Squares

Since the standard least-squares solutions may be biased, we now develop an alternative approach to estimating ϑ. Instead of fixing θ_0 in (38), we minimize the least squares cost (40) subject to a quadratic constraint on θ. Hence, we consider the generalized least squares cost

J_1(θ) = (1/l) ||Φθ||_2^2 = (1/l) ||θ_0 Φ_1 + Φ_2 θ̄||_2^2, (49)

which is identical to (40) except that in (49) there is an additional factor of 1/l and the argument is θ ∈ R^{2n+2}. Using (37), it follows that (49) can be rewritten as

J_1(θ) = θ^T M θ. (50)

Since M is positive semidefinite, it follows that J_1 is convex on R^{2n+2}. Furthermore, J_1 is strictly convex on R^{2n+2} if and only if M is positive definite (see Bernstein (2005), p. 320).

Next, let γ > 0, let N ∈ R^{(2n+2)×(2n+2)} be symmetric, and define the parameter constraint set D_γ(N) by

D_γ(N) = {θ ∈ R^{2n+2} : θ^T N θ = γ}. (51)

The quadratically constrained least squares (QCLS) problem is then given by

min_{θ ∈ D_γ(N)} J_1(θ). (52)

Note that D_γ(N) is closed and symmetric, that is, θ ∈ D_γ(N) if and only if -θ ∈ D_γ(N). Furthermore, D_γ(N) ≠ ∅ if and only if λ_max(N) > 0. Next, define the solution set W_γ(N) by

W_γ(N) = {θ ∈ D_γ(N) : J_1(θ) = min_{θ′ ∈ D_γ(N)} J_1(θ′)}. (53)

W_γ(N) is closed and symmetric. Furthermore, if W_γ(N) ≠ ∅, then the QCLS problem (52) has at least two solutions. In particular, θ ∈ R^{2n+2} solves the QCLS problem (52) if and only if -θ does. Define N_LS ∈ R^{(2n+2)×(2n+2)} by

N_LS = diag(γ, 0, ..., 0). (54)

With N = N_LS it can be seen that solutions of the QCLS problem (52) are solutions of the standard least squares problem (41) with θ_0 = ±1. Although N_LS is positive semidefinite, we do not require that N be positive semidefinite.
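To make the QCLS problem concrete, the following sketch (hypothetical cost matrix; NumPy and SciPy assumed) solves (52) for the special case N = N_LS with γ = 1, where the constraint reduces to θ_0 = ±1. As developed in the next section, the minimizer can be computed from the pencil M - sN: the genuine finite generalized eigenvalue of (M, N) gives the minimum cost, and the minimizer lies in the corresponding null space.

```python
import numpy as np
from scipy import linalg

# Hypothetical positive-definite cost matrix M and the constraint
# N = N_LS = diag(gamma, 0, ..., 0), i.e. theta_0 = ±1 for gamma = 1
rng = np.random.default_rng(2)
p = 4
A = rng.standard_normal((30, p))
M = A.T @ A / 30 + 0.1 * np.eye(p)
gamma = 1.0
N = np.zeros((p, p))
N[0, 0] = gamma

# Since N has rank one, the pencil M - s N has a single genuine finite
# generalized eigenvalue; near-infinite spurious values are discarded
eigvals = linalg.eigvals(M, N)
finite = eigvals[np.isfinite(eigvals)]
alpha_max = float(finite[np.argmin(np.abs(finite))].real)

# Minimizer: null vector of M - alpha_max N, scaled so that theta_0 = 1
_, _, Vt = np.linalg.svd(M - alpha_max * N)
theta = Vt[-1] / Vt[-1][0]

# Closed-form cross-check for this particular constraint:
# M theta = alpha_max e_1, so theta is proportional to M^{-1} e_1
x = np.linalg.solve(M, np.eye(p)[:, 0])
theta_ref = x / x[0]
```

With this N, the QCLS solution coincides (up to sign) with the standard least squares solution for fixed θ_0, which is the sense in which the standard problem is a special case.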

7 Existence and Uniqueness for the QCLS Problem

Let A, B ∈ R^{p×p}. Then the matrix pencil P_{A,B}(s) is defined by

P_{A,B}(s) = A - sB. (55)

Furthermore, define the characteristic polynomial χ_{A,B}(s) by

χ_{A,B}(s) = det(A - sB). (56)

The pair (A, B) is regular if χ_{A,B}(s) is not the zero polynomial. The roots of χ_{A,B}(s) are the generalized eigenvalues of (A, B). Furthermore, if A > 0, B is symmetric, and (A, B) is regular, then all generalized eigenvalues of (A, B) are real Palanthandalam-Madapusi (2008).

Next, define

S = {α ≥ 0 : αN ≤ M} (57)

and α_max = max S, and note that S = [0, α_max]. The following result concerns properties of α_max.

Proposition 7.1. The following statements hold:

i) If N ≤ M, then α_max ≥ 1.
ii) α_max is a generalized eigenvalue of (M, N).

Furthermore, assume that α_max > 0 and (M, N) is regular. Then the following statements hold:

iii) Let α_1, α_2 ∈ (0, α_max). Then

0 = def(M - α_1 N) = def(M - α_2 N) < def(M - α_max N). (58)

iv) α_max is the smallest positive generalized eigenvalue of (M, N).

Proof. See Palanthandalam-Madapusi (2008).

Thus M - α_max N has a nontrivial null space. The following result considers existence of solutions to (52).

Theorem 7.2. For all θ ∈ D_γ(N),

J_1(θ) ≥ α_max γ. (59)

Furthermore, θ ∈ D_γ(N) satisfies

J_1(θ) = α_max γ (60)

if and only if θ ∈ N(M - α_max N) ∩ D_γ(N). Finally,

W_γ(N) = N(M - α_max N) ∩ D_γ(N). (61)

Proof. See Palanthandalam-Madapusi (2008).

Corollary 7.3. Assume M > 0. Then

min_{θ ∈ D_γ(N)} J_1(θ) = α_max γ, (62)

and thus (52) has a solution.

8 Choosing N in QCLS

Define M̃ ∈ R^{(2n+2)×(2n+2)} by

M̃ = M - M_0. (63)

Note that M̃ ≤ M and

M̃ = (1/l) (Φ^T Φ - Φ_0^T Φ_0), (64)

which may be indefinite. In the noise-free case M̃ = 0, and thus D_γ(M̃) = ∅. In the noisy case, however, M̃ may be nonzero. For the rest of this section we set N = M̃ and assume that λ_max(M̃) > 0.

Proposition 8.1. α_max ≥ 1. Furthermore, if {u_0(k)}_{k=0}^l is persistently exciting, and {u_0(k)}_{k=0}^l, {w(k)}_{k=0}^l, and {v(k)}_{k=0}^l are jointly persistently exciting, then α_max = 1 is the smallest positive generalized eigenvalue of (M, M̃).

Proof. Since M̃ ≤ M, it follows from Proposition 7.1 that α_max ≥ 1. Next, since N(M) ∩ N(M̃) = {0} and rank M_0 = rank(M - M̃) = 2n + 1 < 2n + 2 = rank M, it follows that def M = 0 < 1 = def M_0 = def(M - M̃), and thus Proposition 7.1 with N = M̃ in (58) implies that α_max = 1.

The following result gives conditions under which the QCLS problem correctly identifies G(q^{-1}; ϑ).

Theorem 8.2. Assume that {u_0(k)}_{k=0}^l is persistently exciting and {u_0(k)}_{k=0}^l, {w(k)}_{k=0}^l, and {v(k)}_{k=0}^l are jointly persistently exciting. Then

W_γ(M̃) = {± √(γ/(ϑ^T M̃ ϑ)) ϑ}. (65)

Proof. From Proposition 8.1, it follows that α_max = 1. Next, since M > 0, it follows from Corollary 7.3 that W_γ(M̃) ≠ ∅. Furthermore, it follows from Theorem 7.2 that W_γ(M̃) = N(M_0) ∩ D_γ(M̃). Furthermore, since M > 0, M = M_0 + M̃, and ϑ^T M_0 ϑ = 0, it follows that ϑ^T M̃ ϑ > 0. Hence, since def M_0 = 1, it follows from (21) that (65) holds.

Note that if M > 0, then N(M) ∩ N(M̃) = {0}. Thus, if rank M_0 = 2n + 1 and M > 0, then Theorem 8.2 implies that G(q^{-1}; ϑ) = G(q^{-1}; θ) for all θ ∈ W_γ(M̃). However, the choice N = M̃ is not possible in practice since M̃ is not known. In the next section, we consider the QCLS problem with an approximation N ≈ M̃.

9 Unbiasedness of QCLS

It was shown in Section 8 that, if N = M̃ and M > 0, then the QCLS problem has exactly two solutions ±θ, both of which satisfy G(q^{-1}; θ) = G(q^{-1}; ϑ). However, since M̃ is not known, we cannot set N = M̃ in practice. In this section we choose an alternative but closely related value of N and show that the resulting QCLS solution is unbiased.

For the remainder of the paper, we assume that {w(k)}_{k=0}^∞ and {v(k)}_{k=0}^∞ are stationary and have zero mean and finite second moments. Furthermore, we assume that {w(k)}_{k=0}^∞ and {v(k)}_{k=0}^∞ are jointly ergodic random processes in the sense that, for all ρ, σ ∈ {0, 1, 2} and for all i,

E[w^ρ(i) v^σ(i)] = lim_{l→∞} (1/l) Σ_{k=0}^{l} w^ρ(k) v^σ(k) w.p. 1.

For convenience, define the positive-semidefinite matrix R ∈ R^{(2n+2)×(2n+2)} by R = E[ψ(k) ψ^T(k)], where n ≤ k ≤ l. Note that, since w(k) and v(k) are stationary, R is independent of k. Next, partition R as

R = [R_ww R_wv ; R_vw R_vv], (66)

where R_ww ∈ R^{(n+1)×(n+1)} is given by

R_ww = E[ w^2(k), w(k)w(k-1), ..., w(k)w(k-n) ; w(k)w(k-1), w^2(k), ..., w(k)w(k-n+1) ; ... ; w(k)w(k-n), w(k)w(k-n+1), ..., w^2(k) ], (67)

while R_vv ∈ R^{(n+1)×(n+1)} and R_wv ∈ R^{(n+1)×(n+1)} are defined analogously. Next, using Φ = Φ_0 + Ψ, M̃ has the form

M̃ = (1/l) (Φ^T Φ - Φ_0^T Φ_0) = (1/l) ((Φ_0 + Ψ)^T (Φ_0 + Ψ) - Φ_0^T Φ_0) = (1/l) (Φ_0^T Ψ + Ψ^T Φ_0 + Ψ^T Ψ). (68)

Proposition 9.1. Assume that {y_0(k)}_{k=0}^l and {u_0(k)}_{k=0}^l are bounded. Then

E[M̃] = R (69)

and

E[M] = M_0 + R. (70)

Proof. The result follows from (68) and the fact that, for all continuous functions f : R → R, E[(1/l) Σ_{k=0}^{l} f(w(k))] = E[f(w(k))].

Let η > 0, and define ᾱ_max = max{α ≥ 0 : αηR ≤ E[M]}. Then QCLS with N = ηR is an unbiased generator of ϑ if

N(E[M] - ᾱ_max ηR) = R(ϑ). (71)

Finally, we have the following unbiasedness result.

Proposition 9.2. Assume that {y_0(k)}_{k=0}^l and {u_0(k)}_{k=0}^l are bounded and satisfy (25). Furthermore, assume that {u_0(k)}_{k=0}^l is persistently exciting for G(q^{-1}; ϑ), assume that {u_0(k)}_{k=0}^l, {w(k)}_{k=0}^l, and {v(k)}_{k=0}^l are jointly persistently exciting for G(q^{-1}; ϑ), and assume that M_0 + R > 0. Then, for all η > 0, QCLS with N = ηR is an unbiased generator of ϑ.

Proof. From Proposition 9.1, it follows that E[M] = M_0 + R. Next, since M_0 ≥ 0, R ≥ 0, and M_0 + R > 0, it follows from cogredient simultaneous diagonalization of M_0 and R (Rao and Mitra (1971), Theorem 6.2.3) that there exists S ∈ R^{(2n+2)×(2n+2)} such that S M_0 S^T and S R S^T are diagonal. Next, for i = 1, ..., 2n+2, let m_i ≥ 0 and r_i ≥ 0 denote the diagonal entries of S M_0 S^T and S R S^T, respectively. Hence S(M_0 + R)S^T = diag(m_i + r_i). Since M_0 + R is positive definite, it follows that m_i + r_i > 0 for all i = 1, ..., 2n+2. Furthermore, since {u_0(k)}_{k=0}^l is persistently exciting, it follows from (21) that N(M_0) = R(ϑ) and def M_0 = 1, and thus exactly one m_i is zero. For convenience, assume m_1 = 0. Next, since S E[M] S^T = diag(m_i + r_i), η S R S^T = diag(η r_i), and m_1 = 0, it follows from Proposition 7.1 that ᾱ_max = 1/η.
Finally, N(E[M] − ᾱ_max ηR) = N(M_0 + R − R) = N(M_0) = R(ϑ).
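The cogredient simultaneous diagonalization invoked in this proof (and again in Theorem 10.10) can be constructed explicitly when M_0 + R > 0: whiten M_0 + R and eigendecompose the whitened M_0. The following sketch is our own construction for illustration, not the proof of Rao and Mitra:

```python
import numpy as np

def cogredient_diag(M0, R):
    """Return S with S M0 S^T and S R S^T both diagonal (requires M0 + R > 0).

    Construction: T = (M0 + R)^{-1/2}; eigendecompose T M0 T = U diag(d) U^T;
    then S = U^T T gives S M0 S^T = diag(d) and S (M0 + R) S^T = I,
    so S R S^T = I - diag(d) is diagonal as well."""
    w, V = np.linalg.eigh(M0 + R)
    T = V @ np.diag(w ** -0.5) @ V.T      # symmetric inverse square root
    d, U = np.linalg.eigh(T @ M0 @ T)
    return U.T @ T, d
```

With M_0 singular of defect 1, exactly one entry of d is zero, which is the fact the proofs exploit.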

Let θ̂ be a solution of the QCLS problem with M and N = ηR, where η > 0. Since M is a random variable, it follows that θ̂ is a random variable. The traditional notion of unbiasedness states that

E[θ̂] = ± ϑ / sqrt(η ϑ^T R ϑ).   (72)

Numerical simulations given below suggest that QCLS with N = ηR is unbiased in this sense.

10 Consistency of QCLS

In this section, we write M_{0,l}, M_l, M̄_l, θ̂_l, and α_{max,l} for M_0, M, M̄, θ̂, and α_max, respectively, to indicate the dependence on l. We then let l tend to infinity and show that the resulting QCLS solution is consistent.

Lemma 10.1. Let {x(k)}_{k=n}^∞ ⊂ ℝ, assume there exists β > 0 such that, for all k ≥ n, 0 ≤ x(k) ≤ β, let K ⊆ {n, n+1, …}, and define K_l = K ∩ {n, …, l}. Then

lim_{l→∞} (1/l) Σ_{k∈K_l} w(k)x(k) = 0  wp 1.   (73)

Proof. Write K = {k_1, k_2, …}. First, suppose that K has a finite number of elements. Then, since w(k) has a finite second moment, it follows that (73) holds. Next, suppose that K has an infinite number of elements. Letting l̂ be the number of elements in K_l, it follows that

−(β l̂ / l)(1/l̂) Σ_{i=1}^{l̂} w(k_i) ≤ (1/l) Σ_{k∈K_l} w(k)x(k) ≤ (β l̂ / l)(1/l̂) Σ_{i=1}^{l̂} w(k_i).

Therefore, since l̂ ≤ l, it follows that

−β lim_{l̂→∞} (1/l̂) Σ_{i=1}^{l̂} w(k_i) ≤ lim_{l→∞} (1/l) Σ_{k∈K_l} w(k)x(k) ≤ β lim_{l̂→∞} (1/l̂) Σ_{i=1}^{l̂} w(k_i).

Furthermore, since {w(k)}_{k=0}^∞ is stationary and ergodic, it follows that {w(k_i)}_{i=1}^∞ is stationary and ergodic. Hence

0 = −β E[w(k)] ≤ lim_{l→∞} (1/l) Σ_{k∈K_l} w(k)x(k) ≤ β E[w(k)] = 0.

Lemma 10.2. Assume that {y_0(k)}_{k=0}^∞ and {u_0(k)}_{k=0}^∞ are bounded. Then

lim_{l→∞} (1/l) Ψ Φ_0^T = 0  wp 1.   (74)

Proof. The (1,1) entry of ΨΦ_0^T is Σ_{k=n}^{l} w(k)y_0(k). Next, let K_+ = {k ≥ n : sign(y_0(k)) = 1} and K_− = {k ≥ n : sign(y_0(k)) = −1}. Furthermore, define K_{l,+} = K_+ ∩ {n, …, l} and K_{l,−} = K_− ∩ {n, …, l}. Then, it follows that

Σ_{k=n}^{l} w(k)y_0(k) = Σ_{k∈K_{l,+}} w(k)|y_0(k)| − Σ_{k∈K_{l,−}} w(k)|y_0(k)|.

Using Lemma 10.1, it follows that

lim_{l→∞} (1/l) Σ_{k=n}^{l} w(k)y_0(k) = lim_{l→∞} (1/l) Σ_{k∈K_{l,+}} w(k)|y_0(k)| − lim_{l→∞} (1/l) Σ_{k∈K_{l,−}} w(k)|y_0(k)| = 0  wp 1.

A similar argument holds for the remaining entries of ΨΦ_0^T.

Proposition 10.3. Assume that {y_0(k)}_{k=0}^∞ and {u_0(k)}_{k=0}^∞ are bounded. Then

lim_{l→∞} M̄_l = R  wp 1.   (75)

Proof. It follows from (68), (66), Lemma 10.2, and the fact that w(k) and v(k) are jointly ergodic that

lim_{l→∞} M̄_l = lim_{l→∞} (1/l)[Ψ^T Ψ + Φ_0^T Ψ + Ψ^T Φ_0] = lim_{l→∞} (1/l) Ψ^T Ψ = E[ψ(n)^T ψ(n)] = R  wp 1.

For the following development we consider M_{0,∞} ∈ ℝ^{(2n+2)×(2n+2)} defined by

M_{0,∞} = lim_{l→∞} M_{0,l},   (76)

when the limit exists.
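Proposition 10.3 is easy to check numerically: with a bounded noise-free signal and white output noise, the cross terms in (68) wash out and M̄_l approaches R. The following toy sketch (n = 1, v ≡ 0; the signal choices are ours, purely for illustration) has R = I for unit-variance white w:

```python
import numpy as np

rng = np.random.default_rng(2)
l = 300_000
w = rng.standard_normal(l)               # white output noise, unit variance
y0 = np.sin(0.3 * np.arange(l))          # bounded noise-free output

Psi = np.column_stack([w[1:], w[:-1]])       # rows ~ psi(k) = [w(k), w(k-1)]
Phi0 = np.column_stack([y0[1:], y0[:-1]])    # rows ~ phi0(k) = [y0(k), y0(k-1)]

# the three terms of (68); Lemma 10.2 says the cross terms vanish as l grows
Mbar = (Psi.T @ Psi + Phi0.T @ Psi + Psi.T @ Phi0) / l
```

For this realization `Mbar` is close to the identity, consistent with (75).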

Proposition 10.4. M_{0,∞} exists if and only if, for all 0 ≤ κ ≤ n, the limits lim_{l→∞} (1/l) Σ_{k=0}^{l} u_0(k)u_0(k+κ), lim_{l→∞} (1/l) Σ_{k=0}^{l} y_0(k)y_0(k+κ), lim_{l→∞} (1/l) Σ_{k=0}^{l} y_0(k)u_0(k+κ), and lim_{l→∞} (1/l) Σ_{k=0}^{l} u_0(k)y_0(k+κ) exist.

Proposition 10.4 shows that M_{0,∞} exists if and only if the given autocorrelations exist.

Lemma 10.5. Given {u_0(k)}_{k=0}^∞, assume that there exists l̂ ≥ 3n such that {u_0(k)}_{k=0}^{l̂} is persistently exciting for G(q^{-1};ϑ). Then, for all l ≥ l̂, {u_0(k)}_{k=0}^l is persistently exciting for G(q^{-1};ϑ).

Next, let {u_0(k)}_{k=0}^{l̂} be persistently exciting for G(q^{-1};ϑ). Then, using Lemma 10.5, it follows that {u_0(0), …, u_0(l̂), 0, 0, …} is persistently exciting for G(q^{-1};ϑ), and thus it follows from (2) that, for all l ≥ l̂, N(M_{0,l}) = R(ϑ). Furthermore, for the input {u_0(0), …, u_0(l̂), 0, 0, …}, the stability of G(q^{-1};ϑ) implies that y_0(k) → 0 as k → ∞. It thus follows that M_{0,∞} = 0 and hence N(M_{0,∞}) = ℝ^{2n+2} ≠ R(ϑ) = N(M_{0,l}) for all l ≥ l̂. Hence, in this case, for all l ≥ l̂, N(M_{0,l}) is a proper subset of N(M_{0,∞}). More generally, if M_{0,∞} exists, then, for all l ≥ 3n, N(M_{0,l}) ⊆ N(M_{0,∞}). However, the above example shows that N(M_{0,∞}) = N(M_{0,l}) is not true in general. Thus we have the following definition. For convenience, for κ ≥ n, let σ_{2n+1,κ} be the second smallest singular value of the stacked matrix

[ φ_0(κ) ; ⋯ ; φ_0(κ+3n) ].

Definition 10.6. The input sequence {u_0(k)}_{k=0}^∞ is infinitely persistently exciting for G(q^{-1};ϑ) if there exists ε > 0 such that, for all κ ≥ n, σ_{2n+1,κ} > ε.

The following result is immediate.

Proposition 10.7. If {u_0(k)}_{k=0}^∞ is infinitely persistently exciting for G(q^{-1};ϑ), then the following statements hold:
i) For all κ ≥ n, {u_0(κ+k)}_{k=0}^{3n} is persistently exciting for G(q^{-1};ϑ).
ii) For all θ ∈ ℝ^{2n+2} such that θ ∉ R(ϑ), there exists ε_θ > 0 such that, for all κ ≥ n,

‖ [ φ_0(κ) ; ⋯ ; φ_0(κ+3n) ] θ ‖_2 > ε_θ.
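Definition 10.6 can be checked empirically over a finite range of κ by computing the second-smallest singular value of each stacked regressor block. A minimal sketch, with function names of our own choosing; `phi` is assumed to be an array whose k-th row is φ_0(k):

```python
import numpy as np

def sigma_2np1(phi, kappa, n):
    """sigma_{2n+1,kappa}: second-smallest singular value of the block
    stacking rows phi_0(kappa), ..., phi_0(kappa + 3n)."""
    block = phi[kappa : kappa + 3 * n + 1]
    s = np.sort(np.linalg.svd(block, compute_uv=False))  # ascending
    return s[1]

def looks_infinitely_pe(phi, n, eps, max_kappa):
    """Empirical (finite-horizon) check of Definition 10.6 for kappa = n..max_kappa."""
    return all(sigma_2np1(phi, k, n) > eps for k in range(n, max_kappa + 1))
```

Only a finite range of κ can be scanned in practice, so this is a necessary check, not a proof of the definition.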

Lemma 10.8. Assume that M_{0,∞} exists and that {u_0(k)}_{k=0}^∞ is infinitely persistently exciting for G(q^{-1};ϑ). Then, for all l ≥ 3n,

N(M_{0,∞}) = N(M_{0,l}),   (77)

and thus

N(M_{0,∞}) = R(ϑ).   (78)

Proof. Since ϑ ∈ N(M_{0,l}) for all l ≥ 3n, it follows that ϑ ∈ N(M_{0,∞}). Hence, for all l ≥ 3n, N(M_{0,l}) ⊆ N(M_{0,∞}). Conversely, since {u_0(k)}_{k=0}^∞ is infinitely persistently exciting, it follows that, for all l ≥ 3n and κ ≥ n,

N(M_{0,l}) = N([ φ_0(κ) ; ⋯ ; φ_0(κ+3n) ]) = R(ϑ).   (79)

Next, let θ ∈ ℝ^{2n+2} be such that θ ∉ N(M_{0,l}) for all l ≥ 3n. Then it follows from (79) that, for all κ ≥ 0, [ φ_0(κ) ; ⋯ ; φ_0(κ+3n) ] θ ≠ 0. Next, for all l ≥ 3n, we have θ^T M_{0,l} θ = θ^T ((1/l) Σ_{k=0}^{l} φ_0(k)^T φ_0(k)) θ = (1/l) Σ_{k=0}^{l} r_k², where r_k = φ_0(k)θ. Since {u_0(k)}_{k=0}^∞ is infinitely persistently exciting and θ ∉ R(ϑ), it follows from Proposition 10.7 that there exists ε_θ > 0 such that, for all κ ≥ n,

Σ_{k=κ}^{κ+3n} r_k² = ‖ [ φ_0(κ) ; ⋯ ; φ_0(κ+3n) ] θ ‖_2² > ε_θ²,

and thus, for all l_0 > 0, (1/(3n l_0)) Σ_{k=0}^{3n l_0} r_k² > ε_θ²/(3n). It then follows that θ^T M_{0,∞} θ = lim_{l→∞} θ^T M_{0,l} θ = lim_{l→∞} (1/l) Σ_{k=0}^{l} r_k² ≥ ε_θ²/(3n) > 0. Since M_{0,∞} ≥ 0, it follows that θ ∉ N(M_{0,∞}).

Next, QCLS with N = ηR is a consistent generator of ϑ if the following three conditions are satisfied:

(1) M_∞ = lim_{l→∞} M_l exists wp 1.
(2) W_{γ,∞}(ηR) = N(M_∞ − α_{max,∞} ηR) ∩ D_γ(ηR) ≠ ∅, where α_{max,∞} = max{α ≥ 0 : αηR ≤ M_∞}.
(3) N(M_∞ − α_{max,∞} ηR) = R(ϑ).

Condition (2) states that QCLS with M replaced by M_∞ and with N = ηR has a solution.

Lemma 10.9. Assume that M_{0,∞} exists and that {y_0(k)}_{k=0}^∞ and {u_0(k)}_{k=0}^∞ are bounded and satisfy (25). Furthermore, for all l ≥ 3n, assume that {u_0(k)}_{k=0}^l is persistently exciting for G(q^{-1};ϑ), and {u_0(k)}_{k=0}^l, {w(k)}_{k=0}^l, and {v(k)}_{k=0}^l are jointly persistently exciting for

G(q^{-1};ϑ). Then M_∞ exists and is given by

M_∞ = M_{0,∞} + R  wp 1.   (80)

Proof. Proposition 10.3 implies that lim_{l→∞} M_l = lim_{l→∞} M_{0,l} + lim_{l→∞} M̄_l = M_{0,∞} + R wp 1.

Next, we have the following consistency result.

Theorem 10.10. Assume that M_{0,∞} exists and that {y_0(k)}_{k=0}^∞ and {u_0(k)}_{k=0}^∞ are bounded and satisfy (25). Furthermore, assume that {u_0(k)}_{k=0}^∞ is infinitely persistently exciting for G(q^{-1};ϑ), assume that, for all l ≥ 3n, {u_0(k)}_{k=0}^l, {w(k)}_{k=0}^l, and {v(k)}_{k=0}^l are jointly persistently exciting for G(q^{-1};ϑ), let η > 0, and assume that M_∞ > 0. Then QCLS with N = ηR is a consistent generator of ϑ.

Proof. From Theorem 7.2, it follows that θ̂_l satisfies

(M_l − α_{max,l} ηR) θ̂_l = 0.   (81)

Next, using Lemma 10.9 and Corollary 7.3, it follows that M_∞ = M_{0,∞} + R > 0 and W_{γ,∞}(ηR) ≠ ∅. Furthermore, since M_{0,∞} and R are positive semidefinite, it follows by cogredient simultaneous diagonalization of M_{0,∞} and R (Rao and Mitra (1971), Theorem 6.2.3, p. 22) that there exists S ∈ ℝ^{(2n+2)×(2n+2)} such that SM_{0,∞}S^T and SRS^T are diagonal. Next, for i = 1, …, 2n+2, let m_i ≥ 0 and r_i ≥ 0 denote the diagonal entries of SM_{0,∞}S^T and SRS^T, respectively. Hence S(M_{0,∞} + R)S^T = diag(m_i + r_i). Since M_{0,∞} + R is positive definite, it follows that m_i + r_i > 0 for all i = 1, …, 2n+2. Furthermore, since {u_0(k)}_{k=0}^∞ is infinitely persistently exciting, it follows from Lemma 10.8 that N(M_{0,∞}) = R(ϑ) and def M_{0,∞} = 1, and thus exactly one m_i is zero. For convenience, assume m_1 = 0. Next, since SM_∞S^T = diag(m_i + r_i), ηSRS^T = diag(ηr_i), and m_1 = 0, it follows from Proposition 7.1 that α_{max,∞} = 1/η. Finally, since N(M_∞ − α_{max,∞}ηR) = N(M_{0,∞} + R − R) = N(M_{0,∞}), it follows from Lemma 10.8 that N(M_∞ − α_{max,∞}ηR) = R(ϑ), and thus QCLS with N = ηR is consistent.

We note that Theorem 10.10 holds for arbitrary η > 0. Hence, in practice, R need only be known to within a scalar multiple. The traditional notion of consistency states that
θ̂_l → ± ϑ / sqrt(η ϑ^T R ϑ)  as l → ∞.   (82)

Numerical simulations given below suggest that QCLS with N = ηR is consistent in this sense.
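For finite l, the estimate θ̂_l in (81) can be computed directly from its characterization in Theorem 7.2: find α_max = max{α ≥ 0 : αN ≤ M} and then a null vector of M − α_max N. The sketch below is our own illustrative numerical route (bisection on semidefiniteness, smallest-eigenvalue eigenvector), not the authors' algorithm; the function names and the normalization θ^T N θ = γ are our choices:

```python
import numpy as np

def qcls_alpha_max(M, N, tol=1e-10):
    """Largest alpha >= 0 such that M - alpha*N remains positive semidefinite."""
    def is_psd(a):
        return np.linalg.eigvalsh(M - a * N).min() >= -tol
    lo, hi = 0.0, 1.0
    while is_psd(hi) and hi < 1e12:      # bracket the boundary
        lo, hi = hi, 2.0 * hi
    for _ in range(200):                 # bisect down to the boundary
        mid = 0.5 * (lo + hi)
        if is_psd(mid):
            lo = mid
        else:
            hi = mid
    return lo

def qcls_solve(M, N, gamma=1.0):
    """Null direction of M - alpha_max*N, scaled so theta^T N theta = gamma."""
    a = qcls_alpha_max(M, N)
    w, V = np.linalg.eigh(M - a * N)
    theta = V[:, np.argmin(w)]           # eigenvector of the smallest eigenvalue
    return theta / np.sqrt((theta @ N @ theta) / gamma)
```

The sign ambiguity in (82) appears here as the arbitrary sign of the returned eigenvector.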

11 An Analytical Example Illustrating Consistency

In this section, we illustrate that when the noise statistics are known, QCLS with N = ηR is a consistent generator of ϑ, whereas the standard least squares estimator is generally not consistent. Consider the first-order strictly proper stable system

G(q^{-1};ϑ) = b_1 q^{-1} / (a_0 + a_1 q^{-1}),   (83)

where, in accordance with Remark 2.1, ϑ = [a_0  a_1  −b_1]^T, θ = [θ_0  θ_1  θ_2]^T, and φ(k) = [y(k)  y(k−1)  u(k−1)]^T. Furthermore, assume that the noise sequence w(k) is white, v(k) ≡ 0, and let the input sequence {u_0(k)}_{k=0}^l be a realization of a zero-mean white noise sequence with variance σ_u². The measured output {y(k)}_{k=0}^l is given by (22).

Next, note that the standard least squares error can be written as

δθ_l = (1/θ̂_0) θ̂_l − (1/a_0) ϑ = ((1/l) Φ_2^T Φ_2)^{-1} (1/l) Φ_2^T Ψ ϑ,   (84)

so that the asymptotic standard least squares bias δθ_∞ is given by

δθ_∞ = lim_{l→∞} δθ_l = R^{-1} q,   (85)

where R is defined by (66) and

q = lim_{l→∞} (1/l) Φ_2^T Ψ ϑ = [ a_0 E[y(k−1)w(k)] + a_1 E[y(k−1)w(k−1)] ; a_0 E[u(k−1)w(k)] + a_1 E[u(k−1)w(k−1)] ].   (86)

Furthermore, computing the expectations in R and (86), we have

E[y²(k)] = b_1²σ_u²/(a_0² − a_1²) + σ_w²,   (87)
E[y(k−1)u(k−1)] = 0,   (88)
E[u²(k)] = σ_u²,   (89)
E[y(k−1)w(k)] = 0,   (90)
E[y(k−1)w(k−1)] = σ_w²,   (91)
E[u(k−1)w(k)] = 0,   (92)
E[u(k−1)w(k−1)] = 0.   (93)

Substituting (87)–(93) into R and (86), it follows that (85) becomes

δθ_∞ = diag( b_1²σ_u²/(a_0² − a_1²) + σ_w² , σ_u² )^{-1} [ a_1σ_w² ; 0 ] = [ a_1σ_w²(a_0² − a_1²) / (b_1²σ_u² + (a_0² − a_1²)σ_w²) ; 0 ].   (94)

Furthermore, we define the signal-to-noise ratio ς by

ς = Σ_{k=0}^{l} y²(k) / Σ_{k=0}^{l} w²(k),   (95)

so that

lim_{l→∞} ς = ς_∞ = E[y²(k)] / E[w²(k)] = b_1²σ_u² / ((a_0² − a_1²)σ_w²) + 1.   (96)

Therefore, using (96), (94) becomes

δθ_∞ = [ a_1/ς_∞ ; 0 ].   (97)

Note that the estimate of b_1 is consistent, while the estimate of a_1 is biased.

Next, using QCLS identification, we assume that R_ww is known to within a scalar multiple. Since R_ww = σ_w² I_{n+1} in this example and v(k) ≡ 0, we choose

N = diag(1, 1, 0).   (98)

Using Theorem 7.2 and Corollary 7.3, we write

(M_∞ − α_max N) θ̂_∞ = 0.   (99)

We note that

M_∞ = lim_{l→∞} M = [ E[y²(k)]        E[y(k)y(k−1)]   E[y(k)u(k−1)]
                      E[y(k)y(k−1)]   E[y²(k−1)]      E[y(k−1)u(k−1)]
                      E[y(k)u(k−1)]   E[y(k−1)u(k−1)] E[u²(k−1)]      ].   (100)

Noting that E[y²(k−1)] = E[y²(k)] and

E[y(k)y(k−1)] = −(a_1/a_0) · b_1²σ_u²/(a_0² − a_1²),   (101)
E[y(k)u(k−1)] = b_1σ_u²/a_0,   (102)

we substitute (87)–(93), (101), and (102) into (100) and (99). It then follows that θ̂_∞ satisfies

[ b_1²σ_u²/(a_0² − a_1²) + σ_w² − α_max    −a_1b_1²σ_u²/(a_0(a_0² − a_1²))         b_1σ_u²/a_0
  −a_1b_1²σ_u²/(a_0(a_0² − a_1²))          b_1²σ_u²/(a_0² − a_1²) + σ_w² − α_max   0
  b_1σ_u²/a_0                              0                                       σ_u²        ] θ̂_∞ = 0.   (103)

Solving (103) for α_max, we obtain

α_max = σ_w²,  or  α_max = σ_w² + b_1²σ_u²(a_0² + a_1²)/(a_0²(a_0² − a_1²)).   (104)

Furthermore, since the system is stable, |a_1| < a_0, and thus b_1²σ_u²(a_0² + a_1²)/(a_0²(a_0² − a_1²)) > 0. Hence α_max = σ_w². By substituting α_max into (103) and solving for θ̂_∞, we obtain

θ̂_∞ = (θ_0/a_0) ϑ,   (105)

where θ_0/a_0 is a nonzero scalar. Therefore, using Remark 2.1, it follows that G(q^{-1};θ̂_∞) = G(q^{-1};ϑ). Hence, in accordance with Theorem 10.10, QCLS with N = ηR is a consistent generator of ϑ.
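Both conclusions of this example — the least-squares bias (97) and the QCLS consistency — can be confirmed in simulation. The sketch below uses illustrative coefficient values of our own choosing (not from the paper); with N = diag(1, 1, 0), α_max reduces to the smallest eigenvalue of the Schur complement of the (3,3) entry of M, which is our own computational shortcut for this particular N:

```python
import numpy as np

rng = np.random.default_rng(1)
a0, a1, b1 = 1.0, 0.5, 2.0               # illustrative values, not from the paper
sw, su, l = 0.3, 1.0, 200_000

u0 = su * rng.standard_normal(l)
y0 = np.zeros(l)
for k in range(1, l):                     # a0*y0(k) + a1*y0(k-1) = b1*u0(k-1)
    y0[k] = (-a1 * y0[k - 1] + b1 * u0[k - 1]) / a0
y = y0 + sw * rng.standard_normal(l)      # white output noise, v = 0

# standard least squares: fit y(k) on [y(k-1), u(k-1)]; a1 is shrunk by 1/zeta_inf
c = np.linalg.lstsq(np.column_stack([y[:-1], u0[:-1]]), y[1:], rcond=None)[0]
a1_hat_ls = -a0 * c[0]
zeta_inf = b1**2 * su**2 / ((a0**2 - a1**2) * sw**2) + 1   # limiting SNR, cf. (96)

# QCLS with N = diag(1, 1, 0): det(M - alpha*N) = 0 reduces to an eigenproblem
# for the Schur complement of the (3,3) entry; alpha_max is its smallest eigenvalue
Phi = np.column_stack([y[1:], y[:-1], u0[:-1]])   # rows phi(k) = [y(k), y(k-1), u(k-1)]
M = Phi.T @ Phi / len(Phi)
N = np.diag([1.0, 1.0, 0.0])
S = M[:2, :2] - np.outer(M[:2, 2], M[:2, 2]) / M[2, 2]
alpha_max = np.linalg.eigvalsh(S).min()           # ~ sw**2, cf. (104)
w_, V = np.linalg.eigh(M - alpha_max * N)
theta = V[:, np.argmin(w_)]
theta = theta * a0 / theta[0]                     # scale so the first entry is a0
# theta ~ [a0, a1, -b1], i.e. the true parameter direction, while a1_hat_ls is biased
```

The least-squares error in a_1 matches a_1/ς_∞ from (97), while the QCLS null direction recovers ϑ up to scale.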

12 A Numerical Example Illustrating Unbiasedness and Consistency

Consider the stable transfer function

G(q^{-1};ϑ) = (q^{-1} + q^{-2}) / (1 − 0.5q^{-1} + 0.9q^{-2}),   (106)

where ϑ = [ϑ_1 ⋯ ϑ_6]^T and θ = [θ_1 θ_2 θ_3 θ_4 θ_5 θ_6]^T. We construct {u_0(k)}_{k=0}^l to be the sum of two sinusoids at frequencies 2π rad/s and 1.5π rad/s as well as a realization of a zero-mean white noise sequence with standard deviation 1. The input {u_0(k)}_{k=0}^l is corrupted by white noise v(k) with standard deviation 0.04, while the output {y_0(k)}_{k=0}^l is corrupted by white noise w(k) with standard deviation 0.4. However, since the numerical implementation of the white noise process introduces an approximation, we compute R based on the actual (numerically generated) realization of the white noise processes; for example, for l = 8 × 10^5, this computation yields the matrix given in (107).

Figure 1. Numerical example. Comparison of estimates of system parameters obtained using standard least squares and using QCLS, for l = 1000. The large markers represent the averages of the estimates over 50 runs for the QCLS and standard least-squares solutions, respectively. The averages of the QCLS estimates over 50 runs are close to the true parameter values, which suggests that the QCLS solutions are unbiased.

Since we use N = R in the QCLS problem, the normalized parameter vector is

ϑ_0 = sqrt(γ/(ϑ^T R ϑ)) ϑ.   (108)

We set l = 1000, perform 50 runs with different realizations of w(k) and v(k), and compute the QCLS estimates as determined by Theorem 7.2 along with the standard least-squares estimates. Since standard least squares is equivalent to normalizing θ_1 to ±1, we scale the least-squares estimates so that θ_1 matches the first component of ϑ_0. Figure 1 shows that the averages of the QCLS estimates over the 50 runs are close to the true parameters, indicating that the QCLS estimates are unbiased.

Next, we vary l from 50 upward and compute the QCLS solutions as determined by

Figure 2. Numerical example. Comparison of estimates of the six system parameters obtained using standard least squares and using QCLS, as a function of the amount of data. It is seen that the QCLS estimates converge to the true parameters as l becomes large.

Theorem 7.2. We compare these estimates to parameter estimates obtained by standard least squares. Figure 2 shows a comparison between the QCLS estimates and the standard least-squares estimates of all six parameters for increasing l. It is seen that the QCLS estimates converge to the true parameters as l becomes large.
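In this example R is computed from the actual noise realizations rather than from the nominal statistics. For the w-block (67) this amounts to filling a Toeplitz matrix with empirical autocovariances; a minimal sketch of such an estimate (the function name and the biased 1/l normalization are our own choices):

```python
import numpy as np

def autocov_matrix(w, n):
    """Empirical (n+1)x(n+1) Toeplitz matrix whose (i, j) entry
    estimates E[w(k-i) w(k-j)] from a single realization."""
    w = np.asarray(w, dtype=float)
    l = len(w)
    r = np.array([w[: l - k] @ w[k:] / l for k in range(n + 1)])  # lags 0..n
    idx = np.arange(n + 1)
    return r[np.abs(idx[:, None] - idx[None, :])]
```

For white w with standard deviation 0.4, as in this example, the estimate approaches 0.16·I_{n+1}, which is the structure exploited in Section 11.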

13 Conclusions

In this paper we investigated the consistency of parameter estimates obtained from least squares identification with a quadratic parameter constraint. For generality, we considered infinite impulse response systems with colored and possibly correlated input and output noise. In the case of finite data, we showed that there always exists a possibly indefinite quadratic constraint, depending on the noise realization, that yields the true parameters of the system when a persistency condition is satisfied. When the autocorrelation matrix of the output noise is known to within a scalar multiple, we showed that the quadratically constrained least squares estimator with a definite constraint matrix yields consistent parameter estimates. We thus provided the missing foundation for the KL method and its numerous variants in the literature, while providing a complete development of unbiasedness and consistency in a precise sense. Future work will focus on proving that the QCLS estimator is unbiased and consistent in the traditional sense.

Acknowledgments

We wish to thank Spilios Fassois and Dave Bayard for lengthy discussions and numerous helpful suggestions.

References

Agüero, J.C., and Goodwin, G.C. (2008), Identifiability of Errors in Variables Dynamical Systems, Automatica.
Akashi, H. (1975), Parameter Identification of Systems with Noise in Input and Output, Memoirs of the Faculty of Engineering, Kyoto Univ., 37.
Aoki, M., and Yue, P.C. (1970), On A Priori Error Estimates of Some Identification Methods, IEEE Trans. Autom. Contr., AC-15.
Bernstein, D.S., Matrix Mathematics: Theory, Facts, and Formulas with Application to Linear Systems Theory, Princeton University Press (2005).
Björck, Å., Numerical Methods for Least Squares Problems, SIAM (1996).
Campbell, S.L., and Meyer, C.D., Generalized Inverses of Linear Transformations, Dover Publications (1991).
Chen, H.F. (2007), Recursive Identification for Multivariate Errors-in-variables Systems, Automatica, 43.
Draper, N.R., and Smith, H., Applied Regression Analysis, 2nd ed., New York: Wiley (1981).
Fernando, K.V., and Nicholson, H. (1985), Identification of Linear Systems with Input and Output Noise: The Koopmans-Levin Method, IEE Proc. Part D, 132.
Furuta, K., and Paquet, J.G. (1970), On the Identification of Time-invariant Discrete Processes, IEEE Trans. Autom. Contr., 15.
Guo, L., and Billings, S.A. (2007), A Modified Orthogonal Forward Regression Least-Squares Algorithm for System Modelling from Noisy Regressors, Int. J. Contr., 80.
Hong, M., Söderström, T., and Zheng, W.X. (2006), Accuracy Analysis of Bias-eliminating Least Squares Estimates for Errors-in-variables Identification, in Proc. IFAC Symp. on Sys. Ident., Newcastle, Australia.
Hong, M., Söderström, T., and Zheng, W.X. (2007), Accuracy Analysis of Bias-eliminating Least Squares Estimates for Errors-in-variables Systems, Automatica, 43.
Hsia, T.C., Identification: Least Squares Methods, Lexington Books (1977).
James, P.N., Souter, P., and Dixon, D.C. (1972), Suboptimal Estimation of the Parameters of Discrete Systems in the Presence of Correlated Noise, Electron. Lett., 8.
Juang, J.N., Applied System Identification, Prentice Hall (1993).
Koopmans, T.J., Linear Regression Analysis of Economic Time Series, De erven F. Bohn N.V., Haarlem: The Netherlands Economic Institute (1937).
Lawson, C.L., and Hanson, R.J., Solving Least Squares Problems, SIAM (1995).
Lemmerling, P., and De Moor, B. (2001), Misfit versus Latency, Automatica, 37.
Levin, M.J. (1964), Estimation of a System Pulse Transfer Function in the Presence of Noise, IEEE Trans. Autom. Contr., AC-9.
Ljung, L., System Identification: Theory for the User, 2nd ed., Prentice Hall (1999).
Mahata, K. (2007), An Improved Bias-compensation Approach for Errors-in-variables Model Identification, Automatica, 43.
Merhav, S.J. (1975), Unbiased Parameter Estimation by Means of Autocorrelation Functions, IEEE Trans. Autom. Contr., AC-20.
De Moor, B., Gevers, M., and Goodwin, G.C. (1994), L2-overbiased, L2-underbiased and L2-unbiased estimation of transfer functions, Automatica, 30.
Palanthandalam-Madapusi, H.J., Van Pelt, T.H., and Bernstein, D.S. (2008), Existence Conditions for Quadratic Programming with a Sign-Indefinite Quadratic Equality Constraint, submitted for publication.


Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 Instructions Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 The exam consists of four problems, each having multiple parts. You should attempt to solve all four problems. 1.

More information

Identification of continuous-time systems from samples of input±output data: An introduction

Identification of continuous-time systems from samples of input±output data: An introduction SaÅdhanaÅ, Vol. 5, Part, April 000, pp. 75±83. # Printed in India Identification of continuous-time systems from samples of input±output data: An introduction NARESH K SINHA Department of Electrical and

More information

Notes on Time Series Modeling

Notes on Time Series Modeling Notes on Time Series Modeling Garey Ramey University of California, San Diego January 17 1 Stationary processes De nition A stochastic process is any set of random variables y t indexed by t T : fy t g

More information

Cramér-Rao Bounds for Estimation of Linear System Noise Covariances

Cramér-Rao Bounds for Estimation of Linear System Noise Covariances Journal of Mechanical Engineering and Automation (): 6- DOI: 593/jjmea Cramér-Rao Bounds for Estimation of Linear System oise Covariances Peter Matiso * Vladimír Havlena Czech echnical University in Prague

More information

Cumulative Retrospective Cost Adaptive Control with RLS-Based Optimization

Cumulative Retrospective Cost Adaptive Control with RLS-Based Optimization 21 American Control Conference Marriott Waterfront, Baltimore, MD, USA June 3-July 2, 21 ThC3.3 Cumulative Retrospective Cost Adaptive Control with RLS-Based Optimization Jesse B. Hoagg 1 and Dennis S.

More information

EL1820 Modeling of Dynamical Systems

EL1820 Modeling of Dynamical Systems EL1820 Modeling of Dynamical Systems Lecture 10 - System identification as a model building tool Experiment design Examination and prefiltering of data Model structure selection Model validation Lecture

More information

SELECTIVE ANGLE MEASUREMENTS FOR A 3D-AOA INSTRUMENTAL VARIABLE TMA ALGORITHM

SELECTIVE ANGLE MEASUREMENTS FOR A 3D-AOA INSTRUMENTAL VARIABLE TMA ALGORITHM SELECTIVE ANGLE MEASUREMENTS FOR A 3D-AOA INSTRUMENTAL VARIABLE TMA ALGORITHM Kutluyıl Doğançay Reza Arablouei School of Engineering, University of South Australia, Mawson Lakes, SA 595, Australia ABSTRACT

More information

Stability Theory for Nonnegative and Compartmental Dynamical Systems with Time Delay

Stability Theory for Nonnegative and Compartmental Dynamical Systems with Time Delay 1 Stability Theory for Nonnegative and Compartmental Dynamical Systems with Time Delay Wassim M. Haddad and VijaySekhar Chellaboina School of Aerospace Engineering, Georgia Institute of Technology, Atlanta,

More information

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley Time Series Models and Inference James L. Powell Department of Economics University of California, Berkeley Overview In contrast to the classical linear regression model, in which the components of the

More information

6.435, System Identification

6.435, System Identification System Identification 6.435 SET 3 Nonparametric Identification Munther A. Dahleh 1 Nonparametric Methods for System ID Time domain methods Impulse response Step response Correlation analysis / time Frequency

More information

UNCERTAINTY MODELING VIA FREQUENCY DOMAIN MODEL VALIDATION

UNCERTAINTY MODELING VIA FREQUENCY DOMAIN MODEL VALIDATION AIAA 99-3959 UNCERTAINTY MODELING VIA FREQUENCY DOMAIN MODEL VALIDATION Martin R. Waszak, * NASA Langley Research Center, Hampton, Virginia Dominick Andrisani II, Purdue University, West Lafayette, Indiana

More information

SMITH MCMILLAN FORMS

SMITH MCMILLAN FORMS Appendix B SMITH MCMILLAN FORMS B. Introduction Smith McMillan forms correspond to the underlying structures of natural MIMO transfer-function matrices. The key ideas are summarized below. B.2 Polynomial

More information

Riccati difference equations to non linear extended Kalman filter constraints

Riccati difference equations to non linear extended Kalman filter constraints International Journal of Scientific & Engineering Research Volume 3, Issue 12, December-2012 1 Riccati difference equations to non linear extended Kalman filter constraints Abstract Elizabeth.S 1 & Jothilakshmi.R

More information

Prediction, filtering and smoothing using LSCR: State estimation algorithms with guaranteed confidence sets

Prediction, filtering and smoothing using LSCR: State estimation algorithms with guaranteed confidence sets 2 5th IEEE Conference on Decision and Control and European Control Conference (CDC-ECC) Orlando, FL, USA, December 2-5, 2 Prediction, filtering and smoothing using LSCR: State estimation algorithms with

More information

Math Matrix Algebra

Math Matrix Algebra Math 44 - Matrix Algebra Review notes - (Alberto Bressan, Spring 7) sec: Orthogonal diagonalization of symmetric matrices When we seek to diagonalize a general n n matrix A, two difficulties may arise:

More information

Errors-in-variables identification through covariance matching: Analysis of a colored measurement noise case

Errors-in-variables identification through covariance matching: Analysis of a colored measurement noise case 008 American Control Conference Westin Seattle Hotel Seattle Washington USA June -3 008 WeB8.4 Errors-in-variables identification through covariance matching: Analysis of a colored measurement noise case

More information

Blind Identification of FIR Systems and Deconvolution of White Input Sequences

Blind Identification of FIR Systems and Deconvolution of White Input Sequences Blind Identification of FIR Systems and Deconvolution of White Input Sequences U. SOVERINI, P. CASTALDI, R. DIVERSI and R. GUIDORZI Dipartimento di Elettronica, Informatica e Sistemistica Università di

More information

New insights into best linear unbiased estimation and the optimality of least-squares

New insights into best linear unbiased estimation and the optimality of least-squares Journal of Multivariate Analysis 97 (2006) 575 585 www.elsevier.com/locate/jmva New insights into best linear unbiased estimation and the optimality of least-squares Mario Faliva, Maria Grazia Zoia Istituto

More information

EE226a - Summary of Lecture 13 and 14 Kalman Filter: Convergence

EE226a - Summary of Lecture 13 and 14 Kalman Filter: Convergence 1 EE226a - Summary of Lecture 13 and 14 Kalman Filter: Convergence Jean Walrand I. SUMMARY Here are the key ideas and results of this important topic. Section II reviews Kalman Filter. A system is observable

More information

Optimal State Estimators for Linear Systems with Unknown Inputs

Optimal State Estimators for Linear Systems with Unknown Inputs Optimal tate Estimators for Linear ystems with Unknown Inputs hreyas undaram and Christoforos N Hadjicostis Abstract We present a method for constructing linear minimum-variance unbiased state estimators

More information

On BELS parameter estimation method of transfer functions

On BELS parameter estimation method of transfer functions Paper On BELS parameter estimation method of transfer functions Member Kiyoshi WADA (Kyushu University) Non-member Lijuan JIA (Kyushu University) Non-member Takeshi HANADA (Kyushu University) Member Jun

More information

EECE Adaptive Control

EECE Adaptive Control EECE 574 - Adaptive Control Recursive Identification in Closed-Loop and Adaptive Control Guy Dumont Department of Electrical and Computer Engineering University of British Columbia January 2010 Guy Dumont

More information

1 Linear Difference Equations

1 Linear Difference Equations ARMA Handout Jialin Yu 1 Linear Difference Equations First order systems Let {ε t } t=1 denote an input sequence and {y t} t=1 sequence generated by denote an output y t = φy t 1 + ε t t = 1, 2,... with

More information

Bounding the parameters of linear systems with stability constraints

Bounding the parameters of linear systems with stability constraints 010 American Control Conference Marriott Waterfront, Baltimore, MD, USA June 30-July 0, 010 WeC17 Bounding the parameters of linear systems with stability constraints V Cerone, D Piga, D Regruto Abstract

More information

Ch. 15 Forecasting. 1.1 Forecasts Based on Conditional Expectations

Ch. 15 Forecasting. 1.1 Forecasts Based on Conditional Expectations Ch 15 Forecasting Having considered in Chapter 14 some of the properties of ARMA models, we now show how they may be used to forecast future values of an observed time series For the present we proceed

More information

On instrumental variable-based methods for errors-in-variables model identification

On instrumental variable-based methods for errors-in-variables model identification On instrumental variable-based methods for errors-in-variables model identification Stéphane Thil, Marion Gilson, Hugues Garnier To cite this version: Stéphane Thil, Marion Gilson, Hugues Garnier. On instrumental

More information

UTILIZING PRIOR KNOWLEDGE IN ROBUST OPTIMAL EXPERIMENT DESIGN. EE & CS, The University of Newcastle, Australia EE, Technion, Israel.

UTILIZING PRIOR KNOWLEDGE IN ROBUST OPTIMAL EXPERIMENT DESIGN. EE & CS, The University of Newcastle, Australia EE, Technion, Israel. UTILIZING PRIOR KNOWLEDGE IN ROBUST OPTIMAL EXPERIMENT DESIGN Graham C. Goodwin James S. Welsh Arie Feuer Milan Depich EE & CS, The University of Newcastle, Australia 38. EE, Technion, Israel. Abstract:

More information

COMPLEX CONSTRAINED CRB AND ITS APPLICATION TO SEMI-BLIND MIMO AND OFDM CHANNEL ESTIMATION. Aditya K. Jagannatham and Bhaskar D.

COMPLEX CONSTRAINED CRB AND ITS APPLICATION TO SEMI-BLIND MIMO AND OFDM CHANNEL ESTIMATION. Aditya K. Jagannatham and Bhaskar D. COMPLEX CONSTRAINED CRB AND ITS APPLICATION TO SEMI-BLIND MIMO AND OFDM CHANNEL ESTIMATION Aditya K Jagannatham and Bhaskar D Rao University of California, SanDiego 9500 Gilman Drive, La Jolla, CA 92093-0407

More information

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate

More information

A NEW INFORMATION THEORETIC APPROACH TO ORDER ESTIMATION PROBLEM. Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A.

A NEW INFORMATION THEORETIC APPROACH TO ORDER ESTIMATION PROBLEM. Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A. A EW IFORMATIO THEORETIC APPROACH TO ORDER ESTIMATIO PROBLEM Soosan Beheshti Munther A. Dahleh Massachusetts Institute of Technology, Cambridge, MA 0239, U.S.A. Abstract: We introduce a new method of model

More information

Unbiased minimum variance estimation for systems with unknown exogenous inputs

Unbiased minimum variance estimation for systems with unknown exogenous inputs Unbiased minimum variance estimation for systems with unknown exogenous inputs Mohamed Darouach, Michel Zasadzinski To cite this version: Mohamed Darouach, Michel Zasadzinski. Unbiased minimum variance

More information

Discrete time processes

Discrete time processes Discrete time processes Predictions are difficult. Especially about the future Mark Twain. Florian Herzog 2013 Modeling observed data When we model observed (realized) data, we encounter usually the following

More information

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Jeremy S. Conner and Dale E. Seborg Department of Chemical Engineering University of California, Santa Barbara, CA

More information

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY Time Series Analysis James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY & Contents PREFACE xiii 1 1.1. 1.2. Difference Equations First-Order Difference Equations 1 /?th-order Difference

More information

Covariance Stationary Time Series. Example: Independent White Noise (IWN(0,σ 2 )) Y t = ε t, ε t iid N(0,σ 2 )

Covariance Stationary Time Series. Example: Independent White Noise (IWN(0,σ 2 )) Y t = ε t, ε t iid N(0,σ 2 ) Covariance Stationary Time Series Stochastic Process: sequence of rv s ordered by time {Y t } {...,Y 1,Y 0,Y 1,...} Defn: {Y t } is covariance stationary if E[Y t ]μ for all t cov(y t,y t j )E[(Y t μ)(y

More information

RECURSIVE ESTIMATION AND KALMAN FILTERING

RECURSIVE ESTIMATION AND KALMAN FILTERING Chapter 3 RECURSIVE ESTIMATION AND KALMAN FILTERING 3. The Discrete Time Kalman Filter Consider the following estimation problem. Given the stochastic system with x k+ = Ax k + Gw k (3.) y k = Cx k + Hv

More information

ON MODEL SELECTION FOR STATE ESTIMATION FOR NONLINEAR SYSTEMS. Robert Bos,1 Xavier Bombois Paul M. J. Van den Hof

ON MODEL SELECTION FOR STATE ESTIMATION FOR NONLINEAR SYSTEMS. Robert Bos,1 Xavier Bombois Paul M. J. Van den Hof ON MODEL SELECTION FOR STATE ESTIMATION FOR NONLINEAR SYSTEMS Robert Bos,1 Xavier Bombois Paul M. J. Van den Hof Delft Center for Systems and Control, Delft University of Technology, Mekelweg 2, 2628 CD

More information

Identification of modal parameters from ambient vibration data using eigensystem realization algorithm with correlation technique

Identification of modal parameters from ambient vibration data using eigensystem realization algorithm with correlation technique Journal of Mechanical Science and Technology 4 (1) (010) 377~38 www.springerlink.com/content/1738-494x DOI 107/s106-010-1005-0 Identification of modal parameters from ambient vibration data using eigensystem

More information

5: MULTIVARATE STATIONARY PROCESSES

5: MULTIVARATE STATIONARY PROCESSES 5: MULTIVARATE STATIONARY PROCESSES 1 1 Some Preliminary Definitions and Concepts Random Vector: A vector X = (X 1,..., X n ) whose components are scalarvalued random variables on the same probability

More information

Title without the persistently exciting c. works must be obtained from the IEE

Title without the persistently exciting c.   works must be obtained from the IEE Title Exact convergence analysis of adapt without the persistently exciting c Author(s) Sakai, H; Yang, JM; Oka, T Citation IEEE TRANSACTIONS ON SIGNAL 55(5): 2077-2083 PROCESS Issue Date 2007-05 URL http://hdl.handle.net/2433/50544

More information

1. Let m 1 and n 1 be two natural numbers such that m > n. Which of the following is/are true?

1. Let m 1 and n 1 be two natural numbers such that m > n. Which of the following is/are true? . Let m and n be two natural numbers such that m > n. Which of the following is/are true? (i) A linear system of m equations in n variables is always consistent. (ii) A linear system of n equations in

More information

Prediction-based adaptive control of a class of discrete-time nonlinear systems with nonlinear growth rate

Prediction-based adaptive control of a class of discrete-time nonlinear systems with nonlinear growth rate www.scichina.com info.scichina.com www.springerlin.com Prediction-based adaptive control of a class of discrete-time nonlinear systems with nonlinear growth rate WEI Chen & CHEN ZongJi School of Automation

More information

THIS paper deals with robust control in the setup associated

THIS paper deals with robust control in the setup associated IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL 50, NO 10, OCTOBER 2005 1501 Control-Oriented Model Validation and Errors Quantification in the `1 Setup V F Sokolov Abstract A priori information required for

More information

Stationary Graph Processes: Nonparametric Spectral Estimation

Stationary Graph Processes: Nonparametric Spectral Estimation Stationary Graph Processes: Nonparametric Spectral Estimation Santiago Segarra, Antonio G. Marques, Geert Leus, and Alejandro Ribeiro Dept. of Signal Theory and Communications King Juan Carlos University

More information

The Kalman filter is arguably one of the most notable algorithms

The Kalman filter is arguably one of the most notable algorithms LECTURE E NOTES «Kalman Filtering with Newton s Method JEFFREY HUMPHERYS and JEREMY WEST The Kalman filter is arguably one of the most notable algorithms of the 0th century [1]. In this article, we derive

More information

Large Sample Properties of Estimators in the Classical Linear Regression Model

Large Sample Properties of Estimators in the Classical Linear Regression Model Large Sample Properties of Estimators in the Classical Linear Regression Model 7 October 004 A. Statement of the classical linear regression model The classical linear regression model can be written in

More information

2 Introduction of Discrete-Time Systems

2 Introduction of Discrete-Time Systems 2 Introduction of Discrete-Time Systems This chapter concerns an important subclass of discrete-time systems, which are the linear and time-invariant systems excited by Gaussian distributed stochastic

More information

Next is material on matrix rank. Please see the handout

Next is material on matrix rank. Please see the handout B90.330 / C.005 NOTES for Wednesday 0.APR.7 Suppose that the model is β + ε, but ε does not have the desired variance matrix. Say that ε is normal, but Var(ε) σ W. The form of W is W w 0 0 0 0 0 0 w 0

More information

POSITIVE SEMIDEFINITE INTERVALS FOR MATRIX PENCILS

POSITIVE SEMIDEFINITE INTERVALS FOR MATRIX PENCILS POSITIVE SEMIDEFINITE INTERVALS FOR MATRIX PENCILS RICHARD J. CARON, HUIMING SONG, AND TIM TRAYNOR Abstract. Let A and E be real symmetric matrices. In this paper we are concerned with the determination

More information

State Estimation of Linear and Nonlinear Dynamic Systems

State Estimation of Linear and Nonlinear Dynamic Systems State Estimation of Linear and Nonlinear Dynamic Systems Part I: Linear Systems with Gaussian Noise James B. Rawlings and Fernando V. Lima Department of Chemical and Biological Engineering University of

More information

AN IDENTIFICATION ALGORITHM FOR ARMAX SYSTEMS

AN IDENTIFICATION ALGORITHM FOR ARMAX SYSTEMS AN IDENTIFICATION ALGORITHM FOR ARMAX SYSTEMS First the X, then the AR, finally the MA Jan C. Willems, K.U. Leuven Workshop on Observation and Estimation Ben Gurion University, July 3, 2004 p./2 Joint

More information

CLOSED-FORM FREQUENCY ESTIMATION USING SECOND-ORDER NOTCH FILTERS. S.M. Savaresi, S. Bittanti, H.C. So*

CLOSED-FORM FREQUENCY ESTIMATION USING SECOND-ORDER NOTCH FILTERS. S.M. Savaresi, S. Bittanti, H.C. So* CLOSED-FORM FREQUECY ESTIMATIO USIG SECOD-ORDER OTCH FILTERS S.M. Savaresi S. Bittanti H.C. So* Dipartimento di Elettronica e Informazione Politecnico di Milano Piazza L. da Vinci Milano ITALY. * Department

More information

Eigenvalues and Eigenvectors

Eigenvalues and Eigenvectors Sec. 6.1 Eigenvalues and Eigenvectors Linear transformations L : V V that go from a vector space to itself are often called linear operators. Many linear operators can be understood geometrically by identifying

More information

CANONICAL LOSSLESS STATE-SPACE SYSTEMS: STAIRCASE FORMS AND THE SCHUR ALGORITHM

CANONICAL LOSSLESS STATE-SPACE SYSTEMS: STAIRCASE FORMS AND THE SCHUR ALGORITHM CANONICAL LOSSLESS STATE-SPACE SYSTEMS: STAIRCASE FORMS AND THE SCHUR ALGORITHM Ralf L.M. Peeters Bernard Hanzon Martine Olivi Dept. Mathematics, Universiteit Maastricht, P.O. Box 616, 6200 MD Maastricht,

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

MA 265 FINAL EXAM Fall 2012

MA 265 FINAL EXAM Fall 2012 MA 265 FINAL EXAM Fall 22 NAME: INSTRUCTOR S NAME:. There are a total of 25 problems. You should show work on the exam sheet, and pencil in the correct answer on the scantron. 2. No books, notes, or calculators

More information

CS281 Section 4: Factor Analysis and PCA

CS281 Section 4: Factor Analysis and PCA CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we

More information

Recursive Estimation of Terrestrial Magnetic and Electric Potentials

Recursive Estimation of Terrestrial Magnetic and Electric Potentials Proceedings of the 47th IEEE Conference on Decision and Control Cancun, Mexico, Dec. 9-11, 2008 Recursive Estimation of Terrestrial Magnetic and Electric Potentials A. M. D Amato, B. O. S. Teixeira, H.

More information

Lecture 7: Discrete-time Models. Modeling of Physical Systems. Preprocessing Experimental Data.

Lecture 7: Discrete-time Models. Modeling of Physical Systems. Preprocessing Experimental Data. ISS0031 Modeling and Identification Lecture 7: Discrete-time Models. Modeling of Physical Systems. Preprocessing Experimental Data. Aleksei Tepljakov, Ph.D. October 21, 2015 Discrete-time Transfer Functions

More information