System Modeling and Identification
Lecture Note #7 (Chap. ), CBE 702, Korea University, Prof. Dae Ryook Yang
Chap. Real-time Identification

Real-time identification
- Supervision and tracking of time-varying parameters for:
  - Adaptive control, filtering, prediction
  - Signal processing
  - Detection, diagnosis, artificial neural networks, etc.
- Identification methods based on a complete set of measurements are not suitable; only a few data points need to be stored.
- Drawbacks:
  - Requires a priori knowledge of the model structure
  - Iterative solutions based on larger data sets may be difficult to organize
Recursive estimation of a constant

Consider the following noisy observation of a constant parameter:
  y_k = \varphi\theta + v_k,  E\{v_k\} = 0,  E\{v_i v_j\} = \sigma^2\delta_{ij}  (\varphi = 1)
The least-squares estimate is found as the sample average:
  \hat\theta_k = \frac{1}{k}\sum_{i=1}^{k} y_i
Recursive form:
  \hat\theta_k = \hat\theta_{k-1} + \frac{1}{k}(y_k - \hat\theta_{k-1})
Variance estimate of the least-squares estimate:
  P_k = E\{(\hat\theta_k - \theta)(\hat\theta_k - \theta)^T\} = \sigma^2\Big(\sum_{i=1}^{k}\varphi_i\varphi_i^T\Big)^{-1} = \frac{\sigma^2}{k}
with the recursion
  \frac{1}{P_k} = \frac{1}{P_{k-1}} + \frac{1}{\sigma^2},  i.e.,  P_k = \frac{\sigma^2 P_{k-1}}{\sigma^2 + P_{k-1}}
Note that P_k \to 0 as k \to \infty.
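The recursion above can be checked numerically; in the minimal sketch below, the true value \theta = 2 and noise level \sigma = 0.5 are illustrative assumptions, and the recursive estimate is verified to reproduce the sample average exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma = 2.0, 0.5                  # illustrative true constant and noise std
y = theta + sigma * rng.standard_normal(1000)

theta_hat = 0.0
for k, yk in enumerate(y, start=1):
    # theta_k = theta_{k-1} + (1/k)(y_k - theta_{k-1})
    theta_hat += (yk - theta_hat) / k

# the recursion is algebraically identical to the running sample average
assert abs(theta_hat - y.mean()) < 1e-9
```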
Derivation of recursive least-squares identification

Consider as usual the regressor \varphi_i and the observation y_i:
  \Phi_k = [\varphi_1 \cdots \varphi_k]^T,  Y_k = [y_1 \cdots y_k]^T
The least-squares criterion based on k samples is
  V_k(\theta) = \frac{1}{2}(Y_k - \Phi_k\theta)^T(Y_k - \Phi_k\theta)
The ordinary least-squares estimate:
  \hat\theta_k = (\Phi_k^T\Phi_k)^{-1}\Phi_k^T Y_k = \Big(\sum_{i=1}^{k}\varphi_i\varphi_i^T\Big)^{-1}\sum_{i=1}^{k}\varphi_i y_i
Introduce the matrix
  P_k = (\Phi_k^T\Phi_k)^{-1} = \Big(\sum_{i=1}^{k}\varphi_i\varphi_i^T\Big)^{-1},  so that  P_k^{-1} = P_{k-1}^{-1} + \varphi_k\varphi_k^T  (P_0^{-1} = 0)
Then
  \hat\theta_k = P_k\Big(\sum_{i=1}^{k-1}\varphi_i y_i + \varphi_k y_k\Big) = P_k(P_{k-1}^{-1}\hat\theta_{k-1} + \varphi_k y_k) = \hat\theta_{k-1} + P_k\varphi_k(y_k - \varphi_k^T\hat\theta_{k-1})
Alternative form (avoiding inversion of matrices):
  P_k = (\Phi_k^T\Phi_k)^{-1} = (\Phi_{k-1}^T\Phi_{k-1} + \varphi_k\varphi_k^T)^{-1} = (P_{k-1}^{-1} + \varphi_k\varphi_k^T)^{-1}
  = P_{k-1} - P_{k-1}\varphi_k(1 + \varphi_k^T P_{k-1}\varphi_k)^{-1}\varphi_k^T P_{k-1}
cf) matrix inversion lemma:
  (A + BC)^{-1} = A^{-1} - A^{-1}B(I + CA^{-1}B)^{-1}CA^{-1}
(In practice P_0 = \alpha I with \alpha large, since P_0^{-1} = 0 cannot be used directly.)
Recursive Least-Squares (RLS) Identification

The recursive least-squares (RLS) identification algorithm:
  \hat\theta_k = \hat\theta_{k-1} + P_k\varphi_k\varepsilon_k   (\hat\theta_k: the parameter estimate)
  \varepsilon_k = y_k - \varphi_k^T\hat\theta_{k-1}   (the prediction error)
  P_k = P_{k-1} - \frac{P_{k-1}\varphi_k\varphi_k^T P_{k-1}}{1 + \varphi_k^T P_{k-1}\varphi_k},  P_0 given   (the parameter covariance estimate, up to the factor \sigma^2)

Some properties of RLS estimation: parameter accuracy and convergence.
Let \tilde\theta_k = \hat\theta_k - \theta and define
  Q(\hat\theta_k) = \frac{1}{2}(\hat\theta_k - \theta)^T P_k^{-1}(\hat\theta_k - \theta) = \frac{1}{2}\tilde\theta_k^T P_k^{-1}\tilde\theta_k
Using \tilde\theta_k = \tilde\theta_{k-1} + P_k\varphi_k\varepsilon_k and P_k^{-1} - P_{k-1}^{-1} = \varphi_k\varphi_k^T,
  2\big(Q(\hat\theta_k) - Q(\hat\theta_{k-1})\big) = \tilde\theta_k^T P_k^{-1}\tilde\theta_k - \tilde\theta_{k-1}^T P_{k-1}^{-1}\tilde\theta_{k-1}
  = \tilde\theta_{k-1}^T(P_k^{-1} - P_{k-1}^{-1})\tilde\theta_{k-1} + 2\varepsilon_k\varphi_k^T\tilde\theta_{k-1} + \varepsilon_k^2\varphi_k^T P_k\varphi_k
  = (\varphi_k^T\tilde\theta_{k-1})^2 + 2\varepsilon_k\varphi_k^T\tilde\theta_{k-1} + \varepsilon_k^2\varphi_k^T P_k\varphi_k
where
  \varphi_k^T P_k\varphi_k = \frac{\varphi_k^T P_{k-1}\varphi_k}{1 + \varphi_k^T P_{k-1}\varphi_k}
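The RLS recursion above can be sketched as follows. The first-order model y_k = a y_{k-1} + b u_{k-1} with a = 0.8, b = 1.5 is an illustrative, noise-free example, so the estimate should recover the parameters essentially exactly; P_0 = \alpha I with large \alpha stands in for P_0^{-1} = 0.

```python
import numpy as np

def rls_step(theta, P, phi, y):
    """One RLS recursion: returns updated (theta, P) and the prediction error."""
    eps = y - phi @ theta
    denom = 1.0 + phi @ P @ phi
    theta = theta + (P @ phi / denom) * eps
    P = P - np.outer(P @ phi, phi @ P) / denom
    return theta, P, eps

# illustrative noise-free data: y_k = a*y_{k-1} + b*u_{k-1}
rng = np.random.default_rng(1)
a, b = 0.8, 1.5
u = rng.standard_normal(200)
y = np.zeros(201)
for k in range(1, 201):
    y[k] = a * y[k-1] + b * u[k-1]

theta = np.zeros(2)
P = 1e6 * np.eye(2)            # P0 = alpha*I with alpha large
for k in range(1, 201):
    phi = np.array([y[k-1], u[k-1]])
    theta, P, _ = rls_step(theta, P, phi, y[k])

assert np.allclose(theta, [a, b], atol=1e-5)
```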
Under the linear model assumption y_k = \varphi_k^T\theta + v_k (so that \varepsilon_k = -\varphi_k^T\tilde\theta_{k-1} + v_k):

Theorem:
  Q(\hat\theta_k) - Q(\hat\theta_{k-1}) = \frac{v_k^2}{2} - \frac{\varepsilon_k^2}{2(1 + \varphi_k^T P_{k-1}\varphi_k)}
- If v_k = 0 for all k, Q decreases in each recursion step.
- If Q tends to zero, \varepsilon_k tends to zero as k \to \infty, since the sequence of weighting matrices \{P_k^{-1}\} is an increasing sequence of positive definite matrices with P_k^{-1} \ge P_0^{-1} for all k > 0.

The errors of the estimated parameters and the prediction errors of least-squares estimation have a bound determined by the noise magnitude according to
  V_k(\hat\theta_k) + Q(\hat\theta_k) = \frac{1}{2}(Y_k - \Phi_k\hat\theta_k)^T(Y_k - \Phi_k\hat\theta_k) + \frac{1}{2}\tilde\theta_k^T P_k^{-1}\tilde\theta_k = \frac{1}{2}\sum_{i=1}^{k} v_i^2
It implies
  \tilde\theta_k^T(\Phi_k^T\Phi_k + P_0^{-1})\tilde\theta_k \le \sum_{i=1}^{k} v_i^2
Parameter convergence can be obtained for a stationary stochastic process \{v_k\} if \Phi_k^T\Phi_k \ge c k I (c a positive constant). Thus, poor convergence is obtained in cases of large disturbances and a rank-deficient \Phi_k^T\Phi_k matrix.
Properties of the P matrix
- P_k is a positive definite, symmetric matrix (P_k = P_k^T > 0).
- P_k \to 0 as k \to \infty.
- P_k is asymptotically proportional to the covariance of the parameter estimate, provided that a correct model structure has been used; it is therefore often called the covariance matrix.

Comparison between RLS and offline LS identification
- If the initial values P_0 and \hat\theta_0 are chosen to be compatible with the results of the ordinary least-squares method, the result obtained from RLS is the same as that of offline least-squares identification.
- Thus, calculate the initial values for RLS from the ordinary LS method using some block of initial data.
Modification for time-varying parameters
- RLS gives equal weighting to old data and new data. If the parameters are time-varying, pay less attention to old data.
Forgetting factor (\lambda):
  J(\theta) = \frac{1}{2}\sum_{i=1}^{k}\lambda^{k-i}(y_i - \varphi_i^T\theta)^2,  0 < \lambda \le 1
Modified RLS:
  \hat\theta_k = \hat\theta_{k-1} + P_k\varphi_k\varepsilon_k
  \varepsilon_k = y_k - \varphi_k^T\hat\theta_{k-1}
  P_k = \frac{1}{\lambda}\Big(P_{k-1} - \frac{P_{k-1}\varphi_k\varphi_k^T P_{k-1}}{\lambda + \varphi_k^T P_{k-1}\varphi_k}\Big),  P_0 given
Disadvantages:
- The noise sensitivity becomes more prominent as \lambda decreases.
- The P matrix may increase as k grows if the input is such that the magnitude of P_{k-1}\varphi_k is small (P-matrix explosion, or covariance matrix explosion).
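The modified recursion can be sketched on a parameter that jumps mid-run; the jump, the noise level, and \lambda = 0.95 are illustrative assumptions. With roughly 1/(1-\lambda) = 20 samples of effective memory, the estimate settles near the new value well before the end of the run.

```python
import numpy as np

def rls_ff_step(theta, P, phi, y, lam):
    """One RLS step with forgetting factor lam (0 < lam <= 1)."""
    eps = y - phi @ theta
    denom = lam + phi @ P @ phi
    theta = theta + (P @ phi / denom) * eps
    P = (P - np.outer(P @ phi, phi @ P) / denom) / lam
    return theta, P

rng = np.random.default_rng(2)
N = 400
theta_true = np.where(np.arange(N) < N // 2, 1.0, 2.0)   # illustrative jump at mid-run
phi_seq = rng.standard_normal(N)
y = phi_seq * theta_true + 0.05 * rng.standard_normal(N)

theta, P = np.zeros(1), 100.0 * np.eye(1)
for k in range(N):
    theta, P = rls_ff_step(theta, P, np.array([phi_seq[k]]), y[k], lam=0.95)

assert abs(theta[0] - 2.0) < 0.1   # tracks the new value after the jump
```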
Choice of forgetting factor
- Trade-off between the required ability to track a time-varying parameter (i.e., a small value of \lambda) and the allowed noise sensitivity.
- A value of \lambda close to 1: less sensitive to disturbances, but slow tracking of rapid variations in the parameters.
- Default choice: 0.97 \le \lambda \le 0.995.
- Rough estimate of the number of data points in memory (time constant): 1/(1-\lambda).

Example .3: Choice of forgetting factor
  y_k = \varphi\theta + v_k,  E\{v_k\} = 0,  E\{v_i v_j\} = \sigma^2\delta_{ij}  (\varphi = 1)
Modified RLS as above with \theta = 2 and \lambda = 0.99, 0.98, 0.95.
[Figure: parameter estimates for the three values of \lambda; a smaller \lambda gives faster tracking.]
Delta model
Disadvantages of the z-transform model:
- Its parameters do not converge to the continuous-time (Laplace-domain) parameters from which they were derived as the sampling period decreases.
- Very small sampling periods yield very small numbers in the transfer-function numerator.
- The poles of the transfer function approach the unstable domain as the sampling period decreases.
These disadvantages can be avoided by introducing a more suitable discrete model.
\delta-operator: \delta \triangleq (z - 1)/h
  x_{k+1} = \Phi x_k + \Gamma u_k  \Rightarrow  \delta x_k = \bar\Phi x_k + \bar\Gamma u_k = \frac{1}{h}(\Phi - I)x_k + \frac{1}{h}\Gamma u_k
  y_k = C x_k
This formulation makes the state-space realization and the corresponding system identification less error-prone, due to the favorable numerical scaling properties of the \bar\Phi and \bar\Gamma matrices as compared to the ordinary z-transform based algebra.
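The convergence of the \delta-model matrices \bar\Phi = (\Phi - I)/h and \bar\Gamma = \Gamma/h to the continuous-time (A, B) as h \to 0 can be checked numerically. The second-order system below is an arbitrary illustration, and the truncated-series zero-order-hold discretization is a self-contained stand-in for a matrix-exponential routine.

```python
import numpy as np

def zoh_discretize(A, B, h, terms=30):
    """ZOH discretization: Phi = exp(A h), Gamma = (int_0^h exp(A t) dt) B,
    both via truncated power series (adequate for small ||A h||)."""
    n = A.shape[0]
    Phi = np.eye(n)
    S = h * np.eye(n)
    term = np.eye(n)
    for i in range(1, terms):
        term = term @ A * (h / i)        # (A h)^i / i!
        Phi = Phi + term
        S = S + term * (h / (i + 1))     # A^i h^(i+1) / (i+1)!
    return Phi, S @ B

A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # illustrative continuous-time system
B = np.array([[0.0], [1.0]])

for h in (0.1, 0.01, 0.001):
    Phi, Gamma = zoh_discretize(A, B, h)
    A_delta = (Phi - np.eye(2)) / h        # delta-model matrices
    B_delta = Gamma / h
    print(h, np.abs(A_delta - A).max(), np.abs(B_delta - B).max())
```

The printed discrepancies shrink roughly linearly in h, in line with the first-order difference (\Phi - I)/h = A + A^2 h/2 + \cdots.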
Kalman filter interpretation
Assume that the time-varying system parameter \theta_k may be described by the state-space equation
  \theta_{k+1} = \theta_k + v_k,  E\{v_i\} = 0,  E\{v_i v_j^T\} = R_1\delta_{ij},  \forall i, j
  y_k = \varphi_k^T\theta_k + e_k,  E\{e_i\} = 0,  E\{e_i e_j\} = R_2\delta_{ij},  \forall i, j
Kalman filter for estimation of \theta_k:
  \hat\theta_k = \hat\theta_{k-1} + K_k\varepsilon_k
  K_k = \frac{P_{k-1}\varphi_k}{R_2 + \varphi_k^T P_{k-1}\varphi_k}
  \varepsilon_k = y_k - \varphi_k^T\hat\theta_{k-1}
  P_k = P_{k-1} - \frac{P_{k-1}\varphi_k\varphi_k^T P_{k-1}}{R_2 + \varphi_k^T P_{k-1}\varphi_k} + R_1
Differences from RLS with forgetting factor:
- The dynamics of P change from exponential growth to a linear growth rate for \varphi_k = 0, due to R_1.
- P_k of the Kalman filter does not approach zero as k \to \infty for a nonzero sequence \{\varphi_k\}, as it does for RLS.
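A sketch of the Kalman-filter parameter estimator above for a scalar drifting parameter; the random-walk drift rate and the covariances R_1, R_2 are illustrative assumptions. The R_1 term keeps P, and hence the gain, from vanishing, which is what lets the filter keep tracking.

```python
import numpy as np

def kf_param_step(theta, P, phi, y, R1, R2):
    """Kalman-filter update for the random-walk parameter model."""
    denom = R2 + phi @ P @ phi
    eps = y - phi @ theta
    theta = theta + (P @ phi / denom) * eps
    P = P - np.outer(P @ phi, phi @ P) / denom + R1   # R1 keeps P from vanishing
    return theta, P

rng = np.random.default_rng(3)
N = 500
theta_true = 1.0 + np.cumsum(0.02 * rng.standard_normal(N))  # slowly drifting parameter
phi_seq = rng.standard_normal(N)
y = phi_seq * theta_true + 0.1 * rng.standard_normal(N)

theta, P = np.zeros(1), np.eye(1)
R1, R2 = 4e-4 * np.eye(1), 1e-2      # illustrative noise covariances
for k in range(N):
    theta, P = kf_param_step(theta, P, np.array([phi_seq[k]]), y[k], R1, R2)

assert abs(theta[0] - theta_true[-1]) < 0.3   # estimate follows the drifting parameter
```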
Other forms of RLS algorithm
Basic version:
  \hat\theta_k = \hat\theta_{k-1} + K_k[y_k - \varphi_k^T\hat\theta_{k-1}]
  K_k = P_k\varphi_k = \frac{P_{k-1}\varphi_k}{\lambda + \varphi_k^T P_{k-1}\varphi_k}   (prediction gain update)
  P_k = \frac{1}{\lambda}\Big(P_{k-1} - \frac{P_{k-1}\varphi_k\varphi_k^T P_{k-1}}{\lambda + \varphi_k^T P_{k-1}\varphi_k}\Big)   (parameter error covariance update)
Normalized gain version: with R_k = \gamma_k P_k^{-1},
  \hat\theta_k = \hat\theta_{k-1} + \gamma_k R_k^{-1}\varphi_k[y_k - \varphi_k^T\hat\theta_{k-1}]
  R_k = R_{k-1} + \gamma_k(\varphi_k\varphi_k^T - R_{k-1}),  \gamma_k = \frac{\gamma_{k-1}}{\lambda + \gamma_{k-1}}
Multivariable case:
  \hat\theta_k = \arg\min_{\theta}\frac{1}{2}\sum_{i=1}^{k}\Big(\prod_{j=i+1}^{k}\lambda_j\Big)[y_i - \varphi_i^T\theta]^T\Lambda^{-1}[y_i - \varphi_i^T\theta]
  \hat\theta_k = \hat\theta_{k-1} + K_k[y_k - \varphi_k^T\hat\theta_{k-1}]
  K_k = P_{k-1}\varphi_k(\lambda_k\Lambda + \varphi_k^T P_{k-1}\varphi_k)^{-1}
  P_k = \frac{1}{\lambda_k}\big(P_{k-1} - P_{k-1}\varphi_k(\lambda_k\Lambda + \varphi_k^T P_{k-1}\varphi_k)^{-1}\varphi_k^T P_{k-1}\big)
  \Lambda_k = \Lambda_{k-1} + \gamma_k(\varepsilon_k\varepsilon_k^T - \Lambda_{k-1})   (output error covariance update)
(If \lambda_j = \lambda for all j, then \prod_{j=i+1}^{k}\lambda_j = \lambda^{k-i}.)
Recursive Instrumental Variable (RIV) Method
The ordinary IV solution:
  \hat\theta = (Z^T\Phi)^{-1}Z^T Y = \Big(\sum_{i=1}^{k} z_i\varphi_i^T\Big)^{-1}\sum_{i=1}^{k} z_i y_i
RIV:
  \hat\theta_k = \hat\theta_{k-1} + K_k\varepsilon_k
  K_k = \frac{P_{k-1} z_k}{1 + \varphi_k^T P_{k-1} z_k}
  \varepsilon_k = y_k - \varphi_k^T\hat\theta_{k-1}
  P_k = P_{k-1} - \frac{P_{k-1} z_k\varphi_k^T P_{k-1}}{1 + \varphi_k^T P_{k-1} z_k}
Standard choice of instrumental variable:
  z_k = (x_{k-1} \cdots x_{k-n_A}\ u_{k-1} \cdots u_{k-n_B})^T
The variable x may be, for instance, the estimated output. RIV has some stability problems associated with the choice of IV and the updating of the P matrix.
Recursive Prediction Error Methods (RPEM)
Consider a weighted quadratic prediction error criterion:
  J_k(\theta) = \frac{\gamma_k}{2}\sum_{i=1}^{k}\lambda^{k-i}\varepsilon_i^2(\theta),  \varepsilon_i(\theta) = y_i - \hat y_i(\theta)
Its gradient, with \psi_i(\theta) = -\partial\varepsilon_i(\theta)/\partial\theta:
  J_k'(\theta) = -\gamma_k\sum_{i=1}^{k}\lambda^{k-i}\psi_i(\theta)\varepsilon_i(\theta)
which can be updated recursively,
  J_k'(\theta) = \frac{\lambda\gamma_k}{\gamma_{k-1}}J_{k-1}'(\theta) - \gamma_k\psi_k(\theta)\varepsilon_k(\theta)
General RPEM search algorithm (a stochastic Gauss-Newton iteration toward the optimal \theta):
  \hat\theta_k = \hat\theta_{k-1} + \gamma_k R_k^{-1}\psi_k\varepsilon_k
  R_k = R_{k-1} + \gamma_k(\psi_k\psi_k^T - R_{k-1})
Stochastic gradient methods
- A family of RPEM; also called stochastic approximation or least mean squares (LMS).
- Uses the steepest-descent direction to update the parameters:
  \psi_k = -\partial\varepsilon_k/\partial\theta = -\partial(\varphi_k^T\theta - y_k)/\partial\theta ... = \varphi_k for the linear model y_k = \varphi_k^T\theta
The algorithm (time-varying, regressor-dependent gain version):
  \hat\theta_k = \hat\theta_{k-1} + \gamma_k\varphi_k\varepsilon_k,  \varepsilon_k = y_k - \varphi_k^T\hat\theta_{k-1}
  \gamma_k\varphi_k = Q\varphi_k/r_k  (Q = Q^T > 0),  r_k = r_{k-1} + \varphi_k^T Q\varphi_k
- Rapid computation, as there is no P matrix to evaluate.
- Good detection of time-varying parameters.
- Slow convergence and noise sensitivity.
Modification for time-varying parameters:
  r_k = \lambda r_{k-1} + \varphi_k^T Q\varphi_k,  0 < \lambda \le 1
keeping the factor r_k at a lower magnitude.
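A sketch of the stochastic-gradient (LMS) update above with Q = I; the true parameters and noise level are illustrative. Note there is no matrix P to propagate, only the scalar r_k.

```python
import numpy as np

def lms_step(theta, r, phi, y, lam=1.0):
    """Normalized stochastic-gradient (LMS) step with Q = I."""
    r = lam * r + phi @ phi            # r_k = lam*r_{k-1} + phi' Q phi
    eps = y - phi @ theta
    theta = theta + (phi / r) * eps    # gain is phi_k / r_k
    return theta, r

rng = np.random.default_rng(4)
theta_true = np.array([0.5, -1.0])     # illustrative parameters
theta, r = np.zeros(2), 1.0
for k in range(2000):
    phi = rng.standard_normal(2)
    y = phi @ theta_true + 0.01 * rng.standard_normal()
    theta, r = lms_step(theta, r, phi, y)

assert np.allclose(theta, theta_true, atol=0.1)   # converges, though slower than RLS
```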
RPEM for the multivariable case
  \hat\theta_k = \hat\theta_{k-1} + \gamma_k R_k^{-1}\psi_k\Lambda_k^{-1}\varepsilon_k
  R_k = R_{k-1} + \gamma_k(\psi_k\Lambda_k^{-1}\psi_k^T - R_{k-1})
  \Lambda_k = \Lambda_{k-1} + \gamma_k(\varepsilon_k\varepsilon_k^T - \Lambda_{k-1})
Projection of the parameters into the parameter domain D_M:
  \hat\theta_k = \hat\theta_k  if \hat\theta_k \in D_M
  \hat\theta_k = \hat\theta_{k-1}  if \hat\theta_k \notin D_M
Recursive Pseudolinear Regression (RPLR)
- Also called recursive ML estimation, or the extended LS method.
The regression model:
  y_k = \varphi_k^T\theta + v_k
  \theta = (a_1 \cdots a_{n_A}\ b_1 \cdots b_{n_B}\ c_1 \cdots c_{n_C})^T
The regression vector (\varepsilon: estimate of v):
  \varphi_k = (-y_{k-1} \cdots -y_{k-n_A}\ u_{k-1} \cdots u_{k-n_B}\ \varepsilon_{k-1} \cdots \varepsilon_{k-n_C})^T
The recursive algorithm:
  \hat\theta_k = \hat\theta_{k-1} + K_k\varepsilon_k
  K_k = \frac{P_{k-1}\varphi_k}{1 + \varphi_k^T P_{k-1}\varphi_k}
  \varepsilon_k = y_k - \varphi_k^T\hat\theta_{k-1}
  P_k = P_{k-1} - \frac{P_{k-1}\varphi_k\varphi_k^T P_{k-1}}{1 + \varphi_k^T P_{k-1}\varphi_k}
The algorithm may be modified to iterate for the best possible \varepsilon.
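A sketch of RPLR/extended least squares on a first-order ARMAX model, where the past prediction error stands in for the unmeasurable noise term in the regressor; the model orders, parameter values, and noise level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
# illustrative ARMAX truth: y_k + a1*y_{k-1} = b1*u_{k-1} + e_k + c1*e_{k-1}
a1, b1, c1 = -0.7, 1.0, 0.5
N = 5000
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)
y = np.zeros(N)
for k in range(1, N):
    y[k] = -a1 * y[k-1] + b1 * u[k-1] + e[k] + c1 * e[k-1]

theta = np.zeros(3)                    # estimates of (a1, b1, c1)
P = 100.0 * np.eye(3)
eps_prev = 0.0                         # past prediction error replaces e_{k-1}
for k in range(1, N):
    phi = np.array([-y[k-1], u[k-1], eps_prev])
    denom = 1.0 + phi @ P @ phi
    eps = y[k] - phi @ theta
    theta = theta + (P @ phi / denom) * eps
    P = P - np.outer(P @ phi, phi @ P) / denom
    eps_prev = eps

assert np.allclose(theta, [a1, b1, c1], atol=0.1)
```

The c_1 estimate converges more slowly than a_1 and b_1, since the information about the noise model comes only from the noise itself.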
Application to Models
RPEM applied to the state-space innovation model:
  x_{k+1}(\theta) = F(\theta)x_k(\theta) + G(\theta)u_k + K(\theta)v_k
  y_k = H(\theta)x_k(\theta) + v_k
Predictor and innovation:
  \hat y_k(\theta) = H(\theta)\hat x_k(\theta),  v_k = y_k - \hat y_k(\theta)
Algorithm:
  \varepsilon_k = y_k - \hat y_k
  \Lambda_k = \Lambda_{k-1} + \gamma_k(\varepsilon_k\varepsilon_k^T - \Lambda_{k-1})
  R_k = R_{k-1} + \gamma_k(\psi_k\Lambda_k^{-1}\psi_k^T - R_{k-1})
  \hat\theta_k = \hat\theta_{k-1} + \gamma_k R_k^{-1}\psi_k\Lambda_k^{-1}\varepsilon_k
  \hat x_{k+1} = F_k\hat x_k + G_k u_k + K_k\varepsilon_k,  \hat y_k = H_k\hat x_k
  W_{k+1} = (F_k - K_k H_k)W_k + M_k - K_k D_k
  \psi_{k+1}^T = H_{k+1}W_{k+1} + D_{k+1}
where F_k = F(\hat\theta_k), G_k = G(\hat\theta_k), H_k = H(\hat\theta_k), K_k = K(\hat\theta_k), and
  W_k(\theta) = \frac{d\hat x_k(\theta)}{d\theta},  \psi_k(\theta) = \frac{d\hat y_k(\theta)}{d\theta}
  M_k = \frac{\partial}{\partial\theta}\big[F(\theta)\hat x_k + G(\theta)u_k + K(\theta)\varepsilon_k\big]\Big|_{\theta=\hat\theta_k},  D_k = \frac{\partial}{\partial\theta}\big[H(\theta)\hat x_k\big]\Big|_{\theta=\hat\theta_k}
RPEM applied to general input-output models
System:
  A(q)y_k = \frac{B(q)}{F(q)}u_k + \frac{C(q)}{D(q)}e_k
Predictor:
  \hat y_k(\theta) = \Big[1 - \frac{D(q)A(q)}{C(q)}\Big]y_k + \frac{D(q)B(q)}{C(q)F(q)}u_k
Error definitions:
  w_k(\theta) = \frac{B(q)}{F(q)}u_k,  v_k(\theta) = A(q)y_k - w_k(\theta),  \varepsilon_k(\theta) = \frac{D(q)}{C(q)}v_k(\theta) = y_k - \hat y_k(\theta)
Polynomials:
  A(q) = 1 + a_1 q^{-1} + \cdots + a_{n_A}q^{-n_A}
  B(q) = b_1 q^{-1} + \cdots + b_{n_B}q^{-n_B}
  F(q) = 1 + f_1 q^{-1} + \cdots + f_{n_F}q^{-n_F}
  C(q) = 1 + c_1 q^{-1} + \cdots + c_{n_C}q^{-n_C}
  D(q) = 1 + d_1 q^{-1} + \cdots + d_{n_D}q^{-n_D}
Parameter vector:
  \theta = (a_1 \cdots a_{n_A}\ b_1 \cdots b_{n_B}\ f_1 \cdots f_{n_F}\ c_1 \cdots c_{n_C}\ d_1 \cdots d_{n_D})^T
Regressor:
  \varphi_k(\theta) = (-y_{k-1} \cdots -y_{k-n_A}\ u_{k-1} \cdots u_{k-n_B}\ -w_{k-1} \cdots -w_{k-n_F}\ \varepsilon_{k-1} \cdots \varepsilon_{k-n_C}\ -v_{k-1} \cdots -v_{k-n_D})^T
Error calculations:
  w_k(\theta) = b_1 u_{k-1} + \cdots + b_{n_B}u_{k-n_B} - f_1 w_{k-1}(\theta) - \cdots - f_{n_F}w_{k-n_F}(\theta)
  v_k(\theta) = y_k + a_1 y_{k-1} + \cdots + a_{n_A}y_{k-n_A} - w_k(\theta)
Expression for the prediction error:
  \varepsilon_k(\theta) = v_k(\theta) + d_1 v_{k-1}(\theta) + \cdots + d_{n_D}v_{k-n_D}(\theta) - c_1\varepsilon_{k-1}(\theta) - \cdots - c_{n_C}\varepsilon_{k-n_C}(\theta)
  = y_k - \hat y_k(\theta) = y_k - \theta^T\varphi_k(\theta)
with C(q)F(q)\hat y_k(\theta) = F(q)[C(q) - D(q)A(q)]y_k + D(q)B(q)u_k.
Gradient expressions, \psi_k(\theta) = \partial\hat y_k(\theta)/\partial\theta:
  \frac{\partial\hat y_k(\theta)}{\partial a_i} = -\frac{D(q)}{C(q)}y_{k-i}
  \frac{\partial\hat y_k(\theta)}{\partial b_i} = \frac{D(q)}{C(q)F(q)}u_{k-i}
  \frac{\partial\hat y_k(\theta)}{\partial f_i} = -\frac{D(q)}{C(q)F(q)}w_{k-i}(\theta)
  \frac{\partial\hat y_k(\theta)}{\partial c_i} = \frac{1}{C(q)}\varepsilon_{k-i}(\theta)
  \frac{\partial\hat y_k(\theta)}{\partial d_i} = -\frac{1}{C(q)}v_{k-i}(\theta)
Algorithm
  w_k = \hat b_1 u_{k-1} + \cdots + \hat b_{n_B}u_{k-n_B} - \hat f_1 w_{k-1} - \cdots - \hat f_{n_F}w_{k-n_F}
  v_k = y_k + \hat a_1 y_{k-1} + \cdots + \hat a_{n_A}y_{k-n_A} - w_k
  \varphi_k = (-y_{k-1} \cdots -y_{k-n_A}\ u_{k-1} \cdots u_{k-n_B}\ -w_{k-1} \cdots -w_{k-n_F}\ \varepsilon_{k-1} \cdots \varepsilon_{k-n_C}\ -v_{k-1} \cdots -v_{k-n_D})^T
  \varepsilon_k = y_k - \hat\theta_{k-1}^T\varphi_k
Filtered signals for the gradient (with g_i the coefficients of C(q)F(q)):
  \bar y_k = y_k + \hat d_1 y_{k-1} + \cdots + \hat d_{n_D}y_{k-n_D} - \hat c_1\bar y_{k-1} - \cdots - \hat c_{n_C}\bar y_{k-n_C}
  \bar u_k = u_k + \hat d_1 u_{k-1} + \cdots + \hat d_{n_D}u_{k-n_D} - \hat g_1\bar u_{k-1} - \cdots - \hat g_{n_G}\bar u_{k-n_G}
  \bar w_k = w_k + \hat d_1 w_{k-1} + \cdots + \hat d_{n_D}w_{k-n_D} - \hat g_1\bar w_{k-1} - \cdots - \hat g_{n_G}\bar w_{k-n_G}
  \bar\varepsilon_k = \varepsilon_k - \hat c_1\bar\varepsilon_{k-1} - \cdots - \hat c_{n_C}\bar\varepsilon_{k-n_C}
  \bar v_k = v_k - \hat c_1\bar v_{k-1} - \cdots - \hat c_{n_C}\bar v_{k-n_C}
  \psi_k = (-\bar y_{k-1} \cdots -\bar y_{k-n_A}\ \bar u_{k-1} \cdots \bar u_{k-n_B}\ -\bar w_{k-1} \cdots -\bar w_{k-n_F}\ \bar\varepsilon_{k-1} \cdots \bar\varepsilon_{k-n_C}\ -\bar v_{k-1} \cdots -\bar v_{k-n_D})^T
Updates:
  R_k = R_{k-1} + \gamma_k(\psi_k\psi_k^T - R_{k-1})
  \hat\theta_k = \hat\theta_{k-1} + \gamma_k R_k^{-1}\psi_k\varepsilon_k
Extended Kalman Filter
Kalman filter for a nonlinear state-space model.
System:
  x_{k+1} = F(x_k, \theta) + G(\theta)u_k + w_k  (E\{w_i w_j^T\} = R_1\delta_{ij})
  y_k = H(x_k, \theta) + e_k  (E\{e_i e_j^T\} = R_2\delta_{ij},  E\{w_i e_j^T\} = R_{12}\delta_{ij})
With the extended state vector X_k = [x_k; \theta_k]:
  X_{k+1} = \bar F(X_k) + \bar G(\theta_k)u_k + \bar w_k
  y_k = \bar H(X_k) + e_k
where
  \bar F(X) = \begin{bmatrix}F(x, \theta)\\ \theta\end{bmatrix},  \bar G = \begin{bmatrix}G\\ 0\end{bmatrix},  \bar w = \begin{bmatrix}w\\ 0\end{bmatrix},  \bar H(X) = H(x, \theta)
Linearization:
  F_k = \frac{d\bar F(X, u)}{dX}\Big|_{X=\hat X_k},  H_k = \frac{d\bar H(X, u)}{dX}\Big|_{X=\hat X_k}
Algorithm
Given \hat X_0 (i.e., \hat x_0, \hat\theta_0) and P_0, starting from k = 0:
  K_k = [F_k P_k H_k^T + R_{12}][H_k P_k H_k^T + R_2]^{-1}
  \hat X_{k+1} = \bar F(\hat X_k) + \bar G(\hat\theta_k)u_k + K_k[y_k - \bar H(\hat X_k)]
  P_{k+1} = F_k P_k F_k^T + R_1 - K_k[H_k P_k H_k^T + R_2]K_k^T
To avoid the calculation of large matrices, partition the matrices.
Filter form, using the latest available measurement to update the parameter estimates:
  K_k = P_k H_k^T[H_k P_k H_k^T + R_2]^{-1}
  \hat X_{k|k} = \hat X_k + K_k[y_k - \bar H(\hat X_k)]
  \hat X_{k+1} = \bar F(\hat X_{k|k}) + \bar G(\hat\theta_k)u_k
  P_{k+1} = F_k P_k F_k^T + R_1 - K_k[H_k P_k H_k^T + R_2]K_k^T
with F_k = \frac{d\bar F(X, u)}{dX}\Big|_{X=\hat X_{k|k}},  H_k = \frac{d\bar H(X, u)}{dX}\Big|_{X=\hat X_k}
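The filter form above can be sketched for joint state-and-parameter estimation on a first-order system x_{k+1} = a x_k + u_k with unknown a; the noise covariances, initial guesses, and data-generation settings are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
a_true, N = 0.9, 1000
u = rng.standard_normal(N)
x = 0.0
ys = np.zeros(N)
for k in range(N):
    ys[k] = x + 0.01 * rng.standard_normal()          # measurement y = x + e
    x = a_true * x + u[k] + 0.01 * rng.standard_normal()

# extended state X = [x, a]; f(X, u) = [a*x + u, a]; h(X) = x
X = np.array([0.0, 0.5])              # illustrative initial guesses (a0 = 0.5)
P = np.eye(2)
R1 = np.diag([1e-4, 1e-8])            # small process noise keeps 'a' adaptable
R2 = 1e-4
H = np.array([1.0, 0.0])
for k in range(N):
    # measurement update
    denom = H @ P @ H + R2
    K = P @ H / denom
    X = X + K * (ys[k] - X[0])
    P = P - np.outer(K, H @ P)
    # time update with the Jacobian F of f at the current estimate
    F = np.array([[X[1], X[0]], [0.0, 1.0]])
    X = np.array([X[1] * X[0] + u[k], X[1]])
    P = F @ P @ F.T + R1

assert abs(X[1] - a_true) < 0.1       # parameter component converges toward a
```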
Subspace Methods for Estimating State-Space Models
Offline estimation of the system matrices A, B, C, and D:
  x_{k+1} = A x_k + B u_k + w_k,  x_k \in R^n,  u_k \in R^m
  y_k = C x_k + D u_k + v_k,  y_k \in R^p
assuming a minimal realization.
- If estimates of A and C are known, estimates of B and D can be obtained using the linear least-squares method:
  y_k = C(qI - A)^{-1}B u_k + D u_k + v_k,  or
  y_k = C(qI - A)^{-1}x_0\delta_k + C(qI - A)^{-1}B u_k + D u_k + v_k
- The estimates of B and D will converge to the true values if A and C are exactly known, or at least consistent.
- If the (extended) observability matrix (r > n)
  O_r = \begin{bmatrix}C\\ CA\\ \vdots\\ CA^{r-1}\end{bmatrix}
is known, then A and C can be estimated.
Under a linear state transformation \bar x = T x, the observability matrix transforms as \bar O_r = O_r T^{-1}.
For known system order (n^* = n), with G = O_r \in R^{pr \times n^*}:
  C = O_r(1:p, 1:n)
  O_r(p+1:pr, 1:n) = O_r(1:p(r-1), 1:n)\,A   (solve for A)
For unknown system order (n^* > n), use the SVD:
  G = USV^T
- Partition the matrices according to the singular values and neglect the portion corresponding to the smaller singular values.
- The estimate of the observability matrix can be \hat O_r = U_1 S_1, or \hat O_r = U_1, etc.; in general \hat O_r = U_1 R with R invertible.
Noisy estimate of the extended observability matrix:
  G = USV^T = U_1 S_1 V_1^T + (\text{other terms}) = O_r + E_N
If O_r explains the system well, E_N stems from the noise; if E_N is small, the estimate is consistent.
Using weighting matrices in the SVD
For flexibility, a pretreatment can be applied before the SVD:
  \tilde G = W_1 G W_2 = USV^T \approx U_1 S_1 V_1^T
Then the estimate of the extended observability matrix becomes
  \hat O_r = W_1^{-1} U_1 R
When noise is present, W_1 has an important influence on the space spanned by U_1, and hence on the quality of the estimates of A and C.

Estimating the extended observability matrix: the basic expression
  y_k = C x_k + D u_k + v_k
  y_{k+i} = CA^i x_k + CA^{i-1}B u_k + CA^{i-2}B u_{k+1} + \cdots + CB u_{k+i-1} + D u_{k+i}
            + CA^{i-1}w_k + CA^{i-2}w_{k+1} + \cdots + C w_{k+i-1} + v_{k+i}
Define the stacked vectors
  Y_k = \begin{bmatrix}y_k\\ y_{k+1}\\ \vdots\\ y_{k+r-1}\end{bmatrix},  U_k = \begin{bmatrix}u_k\\ u_{k+1}\\ \vdots\\ u_{k+r-1}\end{bmatrix},  S_r = \begin{bmatrix}D & 0 & \cdots & 0\\ CB & D & \cdots & 0\\ \vdots & & \ddots & \vdots\\ CA^{r-2}B & CA^{r-3}B & \cdots & D\end{bmatrix}
Then
  Y_k = O_r x_k + S_r U_k + V_k
Introduce
  Y = [Y_1\ Y_2\ \cdots\ Y_N],  X = [x_1\ x_2\ \cdots\ x_N],  U = [U_1\ U_2\ \cdots\ U_N],  V = [V_1\ V_2\ \cdots\ V_N]
Then
  Y = O_r X + S_r U + V
To remove the U-term, use the projection orthogonal to U:
  \Pi_{U^\perp} = I - U^T(UU^T)^{-1}U,  so that  U\Pi_{U^\perp} = U - UU^T(UU^T)^{-1}U = 0
  Y\Pi_{U^\perp} = O_r X\Pi_{U^\perp} + V\Pi_{U^\perp}
Choose a matrix \Phi/N so that the effect of noise vanishes:
  G = \frac{1}{N}Y\Pi_{U^\perp}\Phi^T = \frac{1}{N}O_r X\Pi_{U^\perp}\Phi^T + \frac{1}{N}V\Pi_{U^\perp}\Phi^T \to O_r\bar X_N + \bar V_N
  \lim_{N\to\infty}\bar V_N = \lim_{N\to\infty}\frac{1}{N}V\Pi_{U^\perp}\Phi^T = 0
  \lim_{N\to\infty}\bar X_N = \lim_{N\to\infty}\frac{1}{N}X\Pi_{U^\perp}\Phi^T   (\bar X has full rank n)
Finding good instruments
Let \Phi = [\varphi_1^s\ \varphi_2^s\ \cdots\ \varphi_N^s]. Then
  \frac{1}{N}V\Pi_{U^\perp}\Phi^T = \frac{1}{N}\sum_{k=1}^{N}V_k(\varphi_k^s)^T - \Big(\frac{1}{N}\sum_{k=1}^{N}V_k U_k^T\Big)\bar R_u^{-1}\Big(\frac{1}{N}\sum_{k=1}^{N}U_k(\varphi_k^s)^T\Big)
From the law of large numbers,
  \lim_{N\to\infty}\frac{1}{N}V\Pi_{U^\perp}\Phi^T = E\{V_k(\varphi_k^s)^T\} - E\{V_k U_k^T\}\bar R_u^{-1}E\{U_k(\varphi_k^s)^T\} = 0
where \bar R_u = E\{U_k U_k^T\} (if V and U are independent and V is uncorrelated with \varphi^s).
Thus, choose \varphi_k^s so that it is uncorrelated with V. A typical choice is past inputs and outputs:
  \varphi_k^s = (y_{k-1} \cdots y_{k-s_1}\ u_{k-1} \cdots u_{k-s_2})^T
Finding the states and estimating the noise statistics
r-step-ahead predictor:
  Y_k = \Theta\varphi_k^s + \Gamma U_k + E_k,  i.e.,  Y = \Theta\Phi + \Gamma U + E
Least-squares estimate of the parameters:
  [\hat\Theta\ \hat\Gamma] = [Y\Phi^T\ YU^T]\begin{bmatrix}\Phi\Phi^T & \Phi U^T\\ U\Phi^T & UU^T\end{bmatrix}^{-1},  \hat\Theta\Phi = Y\Pi_{U^\perp}\Phi^T(\Phi\Pi_{U^\perp}\Phi^T)^{-1}\Phi
Predicted output:
  \hat Y = [\hat Y_1\ \cdots\ \hat Y_N] = Y\Pi_{U^\perp}\Phi^T(\Phi\Pi_{U^\perp}\Phi^T)^{-1}\Phi
SVD and deletion of the small singular values:
  \hat Y \approx U_1 S_1 V_1^T = (U_1 R)(R^{-1}S_1 V_1^T) = \hat O_r\hat X   (\hat X = R^{-1}S_1 V_1^T,  \hat O_r = U_1 R)
Alternatively,
  \hat X = L\hat Y = [\hat x_1\ \cdots\ \hat x_N],  where L = R^{-1}U_1^T   (\hat X = R^{-1}U_1^T U_1 S_1 V_1^T = R^{-1}U_1^T\hat Y)
The noise characteristics can then be calculated from the residuals:
  \hat w_k = \hat x_{k+1} - \hat A\hat x_k - \hat B u_k
  \hat v_k = y_k - \hat C\hat x_k - \hat D u_k
Subspace identification algorithm
1. From the input-output data, form G = \frac{1}{N}Y\Pi_{U^\perp}\Phi^T.
2. Select weighting matrices W_1 and W_2 and perform the SVD:
  \tilde G = W_1 G W_2 = USV^T \approx U_1 S_1 V_1^T
  MOESP: W_1 = I,  W_2 = \big(\frac{1}{N}\Phi\Pi_{U^\perp}\Phi^T\big)^{-1}\Phi\Pi_{U^\perp}
  N4SID: W_1 = I,  W_2 = \big(\frac{1}{N}\Phi\Pi_{U^\perp}\Phi^T\big)^{-1}\Phi
  IVM:   W_1 = \big(\frac{1}{N}Y\Pi_{U^\perp}Y^T\big)^{-1/2},  W_2 = \big(\frac{1}{N}\Phi\Phi^T\big)^{-1/2}
  CVA:   W_1 = \big(\frac{1}{N}Y\Pi_{U^\perp}Y^T\big)^{-1/2},  W_2 = \big(\frac{1}{N}\Phi\Pi_{U^\perp}\Phi^T\big)^{-1/2}
3. Select a full-rank matrix R and define \hat O_r = W_1^{-1}U_1 R (typical choices are R = I, R = S_1, or R = S_1^{1/2}). Then solve for C and A:
  \hat C = \hat O_r(1:p, 1:n)
  \hat O_r(p+1:pr, 1:n) = \hat O_r(1:p(r-1), 1:n)\hat A
4. Estimate B, D, and x_0 from the linear regression problem
  \min_{B, D, x_0}\frac{1}{N}\sum_{k=1}^{N}\big\|y_k - \hat C(qI - \hat A)^{-1}B u_k - D u_k - \hat C(qI - \hat A)^{-1}x_0\delta_k\big\|^2
5. If a noise model is sought, calculate \hat X and estimate the noise contributions.
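The algorithm can be sketched in the noise-free special case, where V = 0 so the instrument \Phi can be dropped, with W_1 = W_2 = I and R = I; the system matrices below are illustrative. The shift-invariance of the estimated observability matrix recovers A up to similarity, so its eigenvalues match the true ones.

```python
import numpy as np

rng = np.random.default_rng(7)
A = np.array([[0.8, 0.2], [0.0, 0.5]])    # illustrative true system (noise-free)
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])

N, r = 200, 5
u = rng.standard_normal(N)
x = np.zeros(2)
y = np.zeros(N)
for k in range(N):
    y[k] = (C @ x).item()                 # D = 0 for simplicity
    x = A @ x + B.ravel() * u[k]

# block Hankel matrices: row i holds y_{k+i} (resp. u_{k+i}) for k = 0..M-1
M = N - r
Y = np.array([y[i:i + M] for i in range(r)])
U = np.array([u[i:i + M] for i in range(r)])

# project out the input term: Y Pi = O_r X Pi  (V = 0 here)
Pi = np.eye(M) - U.T @ np.linalg.pinv(U @ U.T) @ U
G = Y @ Pi

Usv, s, _ = np.linalg.svd(G, full_matrices=False)
n = int((s > 1e-8 * s[0]).sum())          # numerical rank = system order
Or = Usv[:, :n]                           # estimated observability matrix (R = I)
A_hat = np.linalg.pinv(Or[:-1]) @ Or[1:]  # shift-invariance solve for A

assert n == 2
assert np.allclose(np.sort(np.linalg.eigvals(A_hat).real), [0.5, 0.8], atol=1e-6)
```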
Nonlinear System Identification
General nonlinear systems:
  \dot x(t) = f(x(t)) + g(x(t), u(t)) + v(t)
  y(t) = h(x(t), u(t))
Discrete-time nonlinear models:
  x_{k+1} = f(x_k, u_k) + v_k
  y_k = h(x_k, u_k) + w_k
Hammerstein models (static nonlinearity at the input):
  y_k = \frac{B(z)}{A(z)}F(u_k)
Wiener models (static nonlinearity at the output):
  y_k = F\Big(\frac{B(z)}{A(z)}u_k\Big)
Wiener Models
The nonlinear aspects are approximated by Laguerre and Hermite series expansions.
Laguerre operators:
  L_i(s) = \frac{1}{1 + s\tau}\Big(\frac{1 - s\tau}{1 + s\tau}\Big)^i,  L_i(z) = \frac{\sqrt{1 - a^2}}{z - a}\Big(\frac{1 - az}{z - a}\Big)^i
Hermite polynomials:
  H_i(x) = (-1)^i e^{x^2}\frac{d^i}{dx^i}e^{-x^2}
The dynamics are approximated by Laguerre filters and the static nonlinearity by the Hermite polynomials:
  x_{i,k} = L_i(z)u_k
  \hat y_k = \sum_{i_1=0}^{n}\sum_{i_2=0}^{n}\cdots\sum_{i_n=0}^{n} c_{i_1 i_2 \cdots i_n}H_{i_1}(x_{1,k})H_{i_2}(x_{2,k})\cdots H_{i_n}(x_{n,k})
The coefficients c are found by least squares:
  \min_c \frac{1}{N}\sum_{k=1}^{N}\Big(y_k - \sum_{i_1=0}^{n}\cdots\sum_{i_n=0}^{n} c_{i_1 i_2 \cdots i_n}H_{i_1}(x_{1,k})\cdots H_{i_n}(x_{n,k})\Big)^2
Volterra-Wiener Models
Volterra series expansion:
  y_k = h^0 + \sum_{i_1=0}^{L}h_{i_1}^1 u_{k-i_1} + \sum_{i_1=0}^{L}\sum_{i_2=0}^{L}h_{i_1 i_2}^2 u_{k-i_1}u_{k-i_2} + \sum_{i_1=0}^{L}\sum_{i_2=0}^{L}\sum_{i_3=0}^{L}h_{i_1 i_2 i_3}^3 u_{k-i_1}u_{k-i_2}u_{k-i_3} + \cdots
Volterra kernel h^n: an n-dimensional weighting function, i.e., multidimensional impulse response coefficients (L plays the same role as the model horizon in MPC).
Limitations of Volterra-Wiener models:
- Difficult to extend to systems with feedback.
- Difficult to relate the estimated Volterra kernels to a priori information.
Power Series Expansions
Example: time-domain identification of a nonlinear system.
System:
  x_{k+1} = a x_k + b_1 x_k u_k + b_0 u_k
Unknown parameters: \theta = (a\ b_1\ b_0)^T
Collection of data:
  \begin{bmatrix}x_1\\ \vdots\\ x_N\end{bmatrix} = \begin{bmatrix}x_0 & x_0 u_0 & u_0\\ \vdots & \vdots & \vdots\\ x_{N-1} & x_{N-1}u_{N-1} & u_{N-1}\end{bmatrix}\begin{bmatrix}a\\ b_1\\ b_0\end{bmatrix}
Least-squares solution: \hat\theta = (\Phi^T\Phi)^{-1}\Phi^T Y
Properties of this approach:
- Standard statistical validation tests are applicable without extensive modifications.
- A frequency-domain approach is also possible.
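The least-squares identification of the bilinear example above can be sketched directly; the parameter values and input are illustrative, and recovery is exact (up to rounding) because the data are noise-free.

```python
import numpy as np

rng = np.random.default_rng(8)
a, b1, b0 = 0.5, 0.1, 1.0             # illustrative true parameters
N = 100
u = rng.standard_normal(N)
x = np.zeros(N + 1)
for k in range(N):
    x[k+1] = a * x[k] + b1 * x[k] * u[k] + b0 * u[k]

# regressor rows [x_k, x_k*u_k, u_k], targets x_{k+1}
Phi = np.column_stack([x[:N], x[:N] * u, u])
Y = x[1:]
theta = np.linalg.lstsq(Phi, Y, rcond=None)[0]

assert np.allclose(theta, [a, b1, b0], atol=1e-8)
```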
Continuous-time version
System:
  \dot x = a x + b_1 x u + b_0 u
Unknown parameters: \theta = (a\ b_1\ b_0)^T
Solving the ODE by integration over each sampling interval:
  x_{t_{k+1}} - x_{t_k} = a\int_{t_k}^{t_{k+1}}x\,dt + b_1\int_{t_k}^{t_{k+1}}xu\,dt + b_0\int_{t_k}^{t_{k+1}}u\,dt
Collection of data:
  \begin{bmatrix}x_{t_1} - x_{t_0}\\ \vdots\\ x_{t_N} - x_{t_{N-1}}\end{bmatrix} = \begin{bmatrix}\int_{t_0}^{t_1}x\,dt & \int_{t_0}^{t_1}xu\,dt & \int_{t_0}^{t_1}u\,dt\\ \vdots & \vdots & \vdots\\ \int_{t_{N-1}}^{t_N}x\,dt & \int_{t_{N-1}}^{t_N}xu\,dt & \int_{t_{N-1}}^{t_N}u\,dt\end{bmatrix}\begin{bmatrix}a\\ b_1\\ b_0\end{bmatrix}
Least-squares solution: \hat\theta = (\Phi^T\Phi)^{-1}\Phi^T Y
Power-series expansion of a general nonlinear system:
  \dot x(t) = \sum_{i=1}^{m}a_i x^i(t) + \sum_{i=1}^{N}\sum_{j=1}^{N}b_{ij}x^i(t)u^j(t) + v(t)
Similarly, the parameters can be obtained from the least-squares method.