Identification of Stochastic Systems Under Multiple Operating Conditions: The Vector Dependent FP ARX Parametrization Fotis P Kopsaftopoulos and Spilios D Fassois Abstract The problem of identifying stochastic systems under multiple operating conditions, by using excitation response signals obtained from each condition, is addressed Each operating condition is characterized by several measurable variables forming a vector operating parameter The problem is tackled within a novel framework consisting of postulated Vector dependent Functionally Pooled ARX (VFP ARX) models, proper data pooling techniques, and statistical parameter estimation Least Squares (LS) and Maximum Likelihood (ML) estimation methods are developed Their strong consistency is established and their performance characteristics are assessed via a Monte Carlo study I INTRODUCTION In conventional system identification a mathematical model representing a system at a specific operating condition is identified based upon a single data record of excitation response signals Yet, in many applications, a system may operate under different operating conditions in different intervals of time, maintaining one such condition in each interval These operating conditions affect the system characteristics, and thus its dynamics Typical examples include physiological systems under different environmental conditions, mechanical systems under different load or lubrication conditions, systems under different configurations, hydraulic systems operating under different temperatures or fluid pressures, material and structures (civil mechanical aerospace) under different environmental conditions (such as temperature and humidity), and so on In such cases it is of interest to identify a global model describing the system under any operating condition, based upon excitation response data records available from each condition It could be, perhaps, argued that this may be handled by using conventional mathematical models and customary identification techniques that could artificially split the problem into a number of seemingly unrelated subproblems and derive a model based upon a single data record at a time Nevertheless, such a solution would be both awkward and statistically suboptimal Awkwardness has to do with the fact that a potentially large number of seemingly unrelated models (one per operating condition) would be obtained Statistical suboptimality has to do with the fact that the set of FP Kopsaftopoulos and SD Fassois are with the Stochastic Mechanical Systems & Automation (SMSA) Group, Department of Mechanical & Aeronautical Engineering, University of Patras, GR 6500, Patras, Greece Tel/fax: +30 610 997 405 E-mail: {fkopsaf,fassois}@mechupatrasgr, Internet: http://wwwmechupatrasgr/ sms Corresponding author identified models would be of suboptimal accuracy This is due to two reasons The first is the violation of the principle of statistical parsimony (model economy) as a large number of models would be used for representing the system This would result in a large number of estimated parameters, and thus reduced accuracy The second is the ineffective use of the information available in the totality of the data records Indeed, not all available information would be extracted, as the interrelations among the different records would be ignored as a result of separating the problem into seemingly unrelated subproblems This work aims at the postulation of a proper framework and methods for effectively tackling the problem of identifying stochastic systems under multiple operating conditions This is to be based upon three important entities: 14th Mediterranean Conference on Control and Automation (MED 006) Ancona, Italy, June 006 (a) (b) (c) A novel, Functionally Pooled (FP), stochastic model structure that explicitly allows for system modelling under multiple operating conditions via a single mathematical representation This representation uses parameters that functionally depend upon the operating condition It also uses a stochastic structure that accounts for the statistical dependencies among the different data records Data pooling techniques (see [1) for combining and optimally treating (as one entity) the data obtained from the various experiments Statistical techniques for model estimation The resulting framework is referred to as a statistical Functional Pooling framework, and the corresponding models as stochastic Functionally Pooled (FP) models A schematic representation is provided in Fig 1 The only essential practical condition for using this framework and identifying global system models is that each operating condition corresponds to a specific value of a measurable variable, henceforth referred to as the operating parameter The case of a scalar operating parameter (for instance operating temperature) is treated in a companion paper [ The present paper focuses on the case of a vector operating parameter (consisting of two or more scalars, for instance operating temperature and humidity) It should be also noted that early versions of the Functional Pooling framework, including certain simple models and estimation methods, have been already applied to practical fault diagnosis problems with very promising results The interested reader is referred to [3, [4 for details
C5DAE F 7 ( & G G 1 7 1 &, 0 /IH& 8 (, J0 (<0 K&<G 6, 7 0 1 (,+ 8L/ 6H/ > + 7 & % & ' (*)& +,- / 0 + 0 1 (,+ ' -435+ 6/ / 1 +,-46, 7 (' ' & 8 + 0 & 9:( ;& '<0 1 )&- 7 (, 0 & )> (' +, & (6/ 8?7 (' ' & 8 + 0 & 9@( ;& '<7 ' (A/ /AB / & 7 0 1 (,/!#"$ Fig 1 Schematic representation of the problem showing the operating points on the (k 1, k ) plane, an excitation response data set corresponding to one particular operating point, and the VFP-ARX model structure II THE DATA SET Excitation response data records from different operating points corresponding to various values of the (vector) operating parameter are used 1 : Z NM1M { x k, y k k [k 1 k T, with t 1,, N, k 1 {k 1 1,, k 1 M 1 }, k {k 1,, k M } } In this expression t designates normalized discrete time (the corresponding analog time being t T with T standing for the sampling period), k [k 1 k T the operating parameter (without loss of generality assumed to be twodimensional), and x k, y k the excitation and response signals corresponding to k N stands for the signal length (in samples) corresponding to each single experiment (each k) A total of M 1 M experiments (one for each element of k) are performed, with the complete series covering the required range of each scalar parameter, say [kmin 1, k1 max and [kmin, k max, via the discretizations k 1 k1, 1 k, 1, km 1 1 and k k1, k,, km Hence each experiment is characterized by a specific value of k, say k [ki 1, k j This vector is, for simplicity of notation, also designated as the duplet k i,j (ki 1, k j ) (the first subscript of k i,j designating the value of k 1 and the second that of k ) III THE VFP-ARX MODEL STRUCTURE The Vector dependent Functionally Pooled AutoRegressive with exogenous excitation (VFP-ARX) model structure postulated for treating the problem is of the form: na nb y k + a i (k) y k [t i b i (k) x k [t i+w k (1) i1 i0 1 Lower case/capital bold face symbols designate column-vector/matrix quantities, respectively a i (k) w k iid N ( 0, σ w(k) ) k R () a i,j G j (k), b i (k) j1 b i,j G j (k) (3) j1 E { w ki,j w km,n [t τ } γ w [k i,j, k m,n δ[τ (4) with na, nb designating the AutoRegressive (AR) and exogenous (X) orders, respectively, x k, y k the excitation and response signals, respectively, and w k the disturbance (innovations) signal that is a white (serially uncorrelated) zero-mean with variance σ w(k) and potentially crosscorrelated with its counterparts corresponding to different experiments The symbol E{ } designates statistical expectation, δ[τ the Kronecker delta (equal to unity for τ 0 and equal to zero for τ 0), N (, ) Gaussian distribution with the indicated mean and variance, and iid stands for identically independently distributed As (3) indicates, the AR and X parameters a i (k), b i (k) are modelled as explicit functions of the vector k belonging to a p-dimensional functional subspace spanned by the (mutually independent) functions G 1 (k), G (k),, G p (k) (functional basis) The functional basis consists of polynomials of two variables (vector polynomials) obtained as crossproducts from univariate polynomials (of the Chebyshev, Legendre, Jacobi and other families [5) The constants a i,j, b i,j designate the AR and X, respectively, coefficients of projection Defining: A[B, k na 1 + a i (k)b i, B[B, k nb b i (k)b i i1 i0 where A[B, k, B[B, k are the AutoRegressive (AR) and exogenous (X) polynomials in the backshift operator B ( B j u k u k [t j ), the VFP-ARX representation of (1) is
rewritten as: A[B, k y k B[B, k x k + w k (5) As already mentioned, the innovations sequences w k corresponding to different operating conditions may be contemporaneously correlated, that is E{w ki,j w ki,j } σ w([k i,j and E{w ki,j w km,n } γ w [k i,j, k m,n Defining the VFP-ARX model s cross-section innovations vector as: w [ w k1,1 w k1, w k1,m w km1,m T with covariance matrix: Γw E ww T σw[k 1,1 γ w[k 1,1, k M1,M 6 4 γ w[k M1,M, k 1,1 σw[k M1,M (6) (7) 3 then the covariance matrix corresponding to the time instants t 1,, N is given by: 7 5 Γw Γw I N (8) with designating Kronecker product [6, chap 7 In the case of cross-sectionally uncorrelated innovations sequences with different variances (σw[k 1,1 σw[k 1, σw[k M1,M, groupwise heteroscedasticity), the covariance matrix is given by: Γw σw[k 1,1 I N 0 0 σw[k M1,M I N (9) In the simpler case of cross-sectionally uncorrelated innovations sequences with equal variances (σ w[k 1,1 σ w[k 1, σ w[k M1,M σ w, groupwise homoscedasticity), the covariance matrix is given by Γw σ wi NM1M with I NM1M indicating the unity matrix The representation of equations (1) (4) is referred to as a VFP-ARX model of orders (n a, n b ) and functional subspace dimensionality p, or in short a VFP-ARX(n a, n b ) p model It is parameterized in terms of the parameter vector: θ [ α i,j bi,j γw [k i,j, k m,n T i, j, m, n (10) with γ w [k i,j, k i,j σ w[k i,j The VFP-ARX representation is assumed to satisfy the following conditions: A1 Stability condition The poles of the AR polynomial (see (5)) lie inside the unit circle for all operating parameters k A Irreducibility condition The polynomials A[B, k and B[B, k are coprime (have no common factors) k A3 The input signal x k is stationary, ergodic and persistently exciting with E { x ki,j w km,n } 0 i, j, m, n IV MODEL ESTIMATION A VFP-ARX model corresponding to the true system of (1) (4) may be expressed as: na nb y k + a i (k) y k [t i b i (k) x k [t i+e k (11) i1 a i (k) i0 e k iid N ( 0, σe(k) ) k R (1) a i,j G j (k), b i (k) b i,j G j (k) (13) j1 j1 E { e ki,j e km,n [t τ } γ e [k i,j, k m,n δ[τ (14) with e k designating the model s one-step-ahead prediction error or residual (corresponding to w k ) with variance σe(k) In the general case the model s one-step-ahead prediction error (residual) sequences e k may be contemporaneously correlated, that is E{e ki,j e ki,j } σe[k i,j and E{e ki,j e km,n } γ e [k i,j, k m,n, with the model residual cross-section vector defined as e [ e ek1,1 k T M1,M The cross-section vector covariance then is: Γe E ee T σe[k 1,1 γ e[k 1,1, k M1,M 3 6 4 7 5 γ e[k M1,M, k 1,1 σe[k M1,M and the covariance matrix for the time instants t 1,, N is given as: Γe Γe I N The VFP-ARX model estimation problem may then be stated as follows: Given the excitation response data records select an element M( θ) from the VFP-ARX model set: M { M( θ) : A[B, k, θ y k B[B, k, θ x k +e k [t, θ γ w [k i,j, k m,n E{e ki,j [t, θe km,n [t, θ}, i, j, m, n } that best fits the measured data The model identification problem is usually distinguished into two subproblems: the parameter estimation subproblem and the model structure selection subproblem The present paper focuses on the parameter estimation part, while the model structure selection subproblem is treated in a forthcoming paper [7 A A Functionally Pooled Linear Regression Framework The VFP-ARX model (11) may be rewritten as: y k [ ϕ T k gt (k) θ+e k φ T k θ+e k (15) with: ϕ k g(k) θ h y k [t 1 y k [t na i T xk x k [t nb ˆG 1(k) G p(k) T h i a 1,1 a T na,p b0,1 b nb,p
Pooling together the expressions of the VFP-ARX model [Equation (15) corresponding to all operating parameters k (k 1,1, k 1,,, k M1,M ) considered in the experiments (cross-sectional pooling) yields: y k1,1 y km1,m φ T k 1,1 φ T k M1,M θ+ y Φ θ + e e k1,1 e km1,m Then, following substitution of the data for t 1,, N the following expression is obtained: with: y y[1 y[n, Φ y Φ θ + e (16) Φ[1 Φ[N, e B Least Squares (LS) Based Estimation Methods e[1 e[n Using the above linear regression framework the simplest possible approach for estimating the projection coefficient vector θ is based upon minimization of the Ordinary Least Squares criterion: J OLS (θ, Z NM1M ) 1 N e T e which leads to the Ordinary Least Squares (OLS) estimator: ˆθ OLS [ Φ T Φ 1[ Φ T y (17) A more appropriate criterion for the contemporaneously correlated residual case is (in view of the Gauss-Markov theorem [8) the Weighted Least Squares (WLS) criterion: J WLS (θ, Z NM1M ) 1 N e T Γ 1 w e 1 N et Γ 1 w e with Γw, Γw given by (7) and (8), respectively This leads to the Weighted Least Squares (WLS) estimator: ˆθ WLS [ Φ T Γ 1 w Φ 1[ Φ T Γ 1 w y (18) As the covariance matrix Γw is practically unavailable, it may be consistently estimated by using the Ordinary Least Squares (OLS) estimator, thus: Γ OLS w 1 N e[t, ˆθ OLS e T [t, ˆθ OLS with e[t, ˆθ OLS designating the residuals e for θ ˆθ OLS Then: Γ OLS OLS w Γ w I N The estimator in (18) is then expressed as: ˆθ WLS [ Φ T ( Γ OLS w ) 1 Φ 1[ Φ T ( Γ OLS w ) 1 y (19) while the final residual covariance matrix is estimated as: Γ WLS w 1 e[t, N ˆθ WLS e T [t, ˆθ WLS In the case of cross-sectionally uncorrelated residual sequences with different variances (σ e[k 1,1 σ e[k 1, σ e[k M1,M, groupwise heteroscedasticity) the residual covariance matrix Γw for all k has the same form as (9) As the variances are practically unavailable, they may be consistently estimated as [9: ˆσ e(k, ˆθ OLS ) 1 N e k [t, ˆθ OLS (0) for all k, with e k [t, ˆθ OLS designating the residual sequences obtained by applying OLS The ˆθ WLS estimator is then given by (19) The final residual variance is estimated as: ˆσ w(k) ˆσ e(k, ˆθ WLS ) 1 N e k [t, ˆθ WLS (1) In the simpler case of cross-sectionally uncorrelated residual sequences with equal variances (σ e[k 1,1 σ e[k 1, σ e[k M1,M σ e, groupwise homoscedasticity) the covariance matrix is Γw σ wi NM1M with I NM1M designating the unit matrix In this case the WLS estimator coincides with its OLS counterpart The residual variance is estimated by (1) C The Maximum Likelihood (ML) Estimation Method The complete parameter vector θ is estimated as: ˆ θ ML arg max L(θ, Γw/e) θ with L( ) the natural logarithm of the conditional likelihood function [10, [11 In the general case of normally distributed and contemporaneously correlated residuals e k k [10, p 198 we have: N L(θ, Γw/e[t 1,, e[t N ) ln p(e/θ, Γw) 1 e T Γ 1 w e NM 1M ln π N ln det{γ w} () with p( ) designating the Gaussian probability density function By setting: () becomes: Λ(θ) 1 N e[t, θe T [t, θ (3) L(θ, Γw/e) N T rλ(θ)γ 1 w N ln det{γ w} NM 1M ln π (4)
The first derivative of (4) with respect to Γw leads to: L(θ, Γw/e) Γw N Γ 1 w Λ(θ)Γ 1 w N Γ 1 w and equating it to zero yields Γw Λ(θ) As it may be shown, L(θ, Γw/e) is maximized with respect to Γw for Γw Λ(θ) and the maximum likelihood estimate of Λ(θ) is given by (3) for the optimum value of θ that has to be determined By replacing Γw with Λ(θ) in (4) yields: L(θ/e) NM 1M (ln π + 1) N ln det{λ(θ)} (5) Maximizing equation (5) with respect to θ leads to the ML estimator: ˆθ ML arg min det{λ(θ)} (6) θ and Γw Λ(ˆθ ML ) 1 N N e[t, ˆθ ML e T [t, ˆθ ML Notice that obtaining ˆθ requires the use of iterative opti- ML mization techniques [10 In the heteroscedastic case we have: ln det{λ(θ)} ln ( σe[k 1,1, θ σe[k M1,M, θ ) ln σ e[k 1,1, θ + + ln σ e[k M1,M, θ k 1 M 1 k M k 1 k1 1 k k1 ln σ e(k, θ) (7) Maximizing (5) with respect to θ leads to the optimal value of θ (as in (6)) and: ˆσ w(k) ˆσ e(k, ˆθ ML ) 1 N In the homoscedastic case we have: e k [t, ˆθ ML (8) ln det{λ(θ)} ln [ σ e(θ) M 1M M1 M ln σ e(θ) (9) and the final residual variance is given by (8) V CONSISTENCY ANALYSIS The consistency of the OLS, WLS and ML estimators of the previous section is examined For simplicity, the case of cross-sectionally uncorrelated innovations sequences with different variances (heteroscedastic case) is considered The estimated model is assumed to have the exact structure of the true system, with the latter and the excitation signals satisfying the assumptions A1, A and A3 of section III The proofs of the theorems are provided in [7 For the Least Squares (LS) estimators of the previous section we have the following theorem: Theorem 1: Least Squares estimator consistency Let θ o be the true projection coefficient vector, w k a white zero mean process with E{w k } σ w(k) for every operating point, and E{φ k φ k T } a nonsingular matrix Then: ˆθ LS N as θ o (N ) with as designating convergence in the almost sure sense [9, pp 18-19 Note that, using the Kolmogorov theorem [9, p 3, it is easily seen that: as well ˆσ w(k, N) as σ w(k) (N ) Theorem : Maximum Likelihood estimator consistency Let θo [ θ T o γ w [k i,j, k m,n be the true parameter vector, w k a normally distributed zero mean white process with E{w k } σ w(k) for every operating point, and E{φ k φ T k } a nonsingular matrix Then: ˆ θ ML N as θ o (N ) VI MONTE CARLO STUDY The effectiveness of the OLS, WLS and ML estimators for VFP-ARX models is now examined via a Monte Carlo study It is noted that the ML estimator is initialized by the WLS estimates, and makes use of the Gauss Newton nonlinear optimization scheme [10 (maximum number of iterations 500; maximum number of function evaluations 5000; termination tolerance of the loss function 10 ; termination tolerance of the estimated parameters 10 1 ) The study is based upon a VFP ARX(4,) 9 model with zero delay (b 0 0 in the exogenous polynomial) and AR, X subspaces consisting of the cross-products of the first three (hence functional dimensionality p 9) shifted Chebyshev polynomials of the second kind [5 It includes 500 runs, in each one of which the first scalar operating parameter takes 16 values (ki 1 [1, 16) and the second scalar operating parameter takes 0 values (kj [1, 0) Thus, each run includes excitation response signals (of length equal to N 104 samples) from M 1 M 30 operating conditions Each response is corrupted by random noise at the 10% standard deviation level in accordance with the ARX structure expression (innovations standard deviation over the noise free response standard deviation equal to 010) The innovations sequences corresponding to different operating conditions are cross-sectionally uncorrelated, but characterized by different variances (groupwise heteroscedasticity) Some of the true system coefficients of projection (out of a total of 54) are indicated in the second column of Table I Monte Carlo partial estimation results by the Ordinary Least Squares (OLS), the Weighted Least Squares (WLS) and the Maximum Likelihood (ML) methods are presented in Table I (mean estimates ± standard deviations) Some of these results are pictorially depicted in Fig (AR/X coefficient of projection estimates, 95% confidence intervals) As may be readily observed, the results are all very accurate All three methods provide essentially unbiased estimates, with the WLS and ML methods expectedly providing better accuracy for the coefficients of projection (smaller standard deviations, thus narrower confidence intervals)
TABLE I INDICATIVE MONTE CARLO ESTIMATION RESULTS FOR THE VFP-ARX(4,) 9 MODEL (SELECTED PARAMETERS; 500 RUNS PER METHOD; MEAN ESTIMATE ± STANDARD DEVIATION) COEFF TRUE OLS ESTIMATE WLS ESTIMATE ML ESTIMATE a 1,1-00459 -00460 ± 000038-004599 ± 00004-004598 ± 00004 a 1,7-00058 -000579 ± 000018-000578 ± 00001-000579 ± 00001 a,1-03869 -038690 ± 00003-03869 ± 00001-03869 ± 00001 a,5 0035 00356 ± 0000 00349 ± 000015 00349 ± 000015 a 3,3-00533 -005333 ± 00004-005334 ± 000016-005334 ± 000016 a 3,8 00179 001790 ± 000018 001788 ± 00001 001788 ± 00001 a 4,1 06046 060466 ± 000030 060467 ± 0000 060467 ± 0000 a 4,9-00013 -000133 ± 000015-000135 ± 000011-000135 ± 000011 b 1,1 07453 07456 ± 000061 07457 ± 00004 07457 ± 00004 b 1,8 0484 04843 ± 000079 04844 ± 000035 04844 ± 000035 b,3-01987 -019871 ± 00008-019871 ± 000043-019871 ± 000043 b,6 0168 016815 ± 000084 01686 ± 000045 01687 ± 000045 Fig 3 finally depicts the true system s frequency response magnitude versus frequency and the first scalar operating parameter k 1 (for set k ) along with their mean estimated (by the OLS, WLS and ML methods) counterparts The agreement between each estimate and the true frequency response magnitude is excellent ACKNOWLEDGMENT The authors acknowledge the funding of this work by the European Social Fund (ESF), Operational Program for Education and Vocational Training II (EPEAEK II), and particularly the program PYTHAGORAS II of the Greek Ministry of Education REFERENCES [1 T Dielman, Pooled Cross-Sectional and Time Series Data Analysis Marcel Dekker, Inc, 1989 [ J Sakellariou and S Fassois, Externally Dependent Functional ARX (FARX) Models for the Identification of Dynamical Systems Under Multiple Operating Conditions, submitted for publication [3 J Sakellariou, K Petsounis, and S Fassois, A Functional Model Based Method for Fault Detection and Identification in Stochastic Structural Systems, in Proc of the Second European Workshop on Structural Health Monitoring (SHM 004), Munich, Germany, 004, pp 679 686 [4 J Sakellariou and S Fassois, Fault Detection and Identification in a Scale Aircraft Skeleton Structure via a Functional Model Based Method, in Proc of the International Conference on Noise and Vibration Engineering (ISMA 004), Leuven, Belgium, 004, pp 515 59 [5 C Dunkl and Y Xu, Orthogonal Polynomials of Several Variables Cambridge University Press, 001 [6 D Bernstein, Matrix Mathematics Princeton University Press, 005 [7 F Kopsaftopoulos and S Fassois, On the Estimation of Vector Dependent Functionally Pooled ARX (VFP-ARX) Models, under preparation for publication [8 W Greene, Econometric Analysis Prentice-Hall International, Inc, 1997 [9 H White, Asymptotic Theory for Econometricians, nd ed Academic Press, 001 [10 T Soderstrom and P Stoica, System Identification Prentice Hall, 1989 [11 J Mendel, Lessons in Estimation Theory for Signal Processing, Communications and Control Prentice Hall, 1995 α 1,1 0045 0046 0047 α,4 α 3,6 b 1,1 85 8 75 75 8 85 0746 0744 OLS WLS ML α 1, α,9 α 4,5 b 1,6 0045 0046 0053 0055 45 5 55 07 0718 OLS WLS ML α,3 α 3,3 b,3 001 00105 0011 0053 00535 α 4,9 0054 1 15 0198 0 OLS WLS ML Fig VFP-ARX(4,) 9 true values of the coefficients of projection (dashed lines) and Monte Carlo estimates (boxes indicate mean ±196 standard deviations) based upon the OLS, WLS and ML methods (500 runs per method) Fig 3 VFP-ARX(4,) 9 based frequency response magnitude versus frequency and k 1 (k is set to k9 ): (a) true system, (b) mean OLS estimate, (c) mean WLS estimate and (d) mean ML estimate (mean parameters over 500 runs)