Dr.-Ing. Sudchai Boonto, Department of Control System and Instrumentation Engineering, King Mongkut's University of Technology Thonburi, Thailand
Nonlinear System Identification
Given a data set Z^N = {y(k), φ(k); k = 1, …, N}, generated from a nonlinear function y = g(φ), where the regressor vector φ(k) contains past input and output data, a nonlinear regressor model is of the form

y(k) = g(φ(k), θ) + e(k)

- y(k) represents the measured system output
- the regressor vector φ(k) ∈ R^r contains samples of measured input and output signals taken prior to sampling instant k
- θ ∈ R^{n_p} is a vector whose elements are the weights and biases of the MLP network to be trained
Nonlinear System Identification cont.
- The nonlinear function g(·) describes the mapping of neural network inputs into outputs
- The perturbation v(k) is assumed to be already transformed into the form v(k) = e(k) + past values of v, where e(k) is a white noise process
- The disturbance acting on the measured plant output may be non-white; a noise model taking account of this is absorbed into the function g(·) and is represented together with the plant dynamics by the parameter vector θ
- The prediction error is white when θ takes its optimal value
- The predictor model is ŷ(k) = g(φ(k), θ)
NNARX Model Structure
The simplest neural-network-based model structure is the NNARX structure, where

φ(k) = [y(k−1) … y(k−n) u(k−d) … u(k−d−m)]^T

Figure: NNARX predictor model (the regressors φ_1(k), …, φ_r(k) feed the network NN with parameters θ; the network output ŷ(k) is compared with y(k) to form the prediction error ε(k))
NNARX Model Structure cont.
- a static feedforward network is used to represent a dynamic nonlinear system as a predictor model
- in training mode, such a predictor model is used to find the weights and biases that give the best fit between predicted and measured output
- the data are applied to the network, and the network outputs ŷ(k) obtained with weights and biases according to θ^(l) are compared with the measured outputs y(k)
- the resulting values of ŷ(k) and ε(k) are used in Levenberg-Marquardt backpropagation to compute a search direction and update θ^(l)
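The training loop above can be sketched in Python. This is an illustrative stand-in, not the slides' MATLAB setup: the toy system, the regressor orders, the network size, and the use of plain gradient descent in place of Levenberg-Marquardt are all assumptions made for brevity.

```python
import numpy as np

# Sketch of NNARX predictor training with a one-hidden-layer MLP.
# Assumed toy system (not from the slides): y(k) = tanh(0.8 y(k-1)) + 0.5 u(k-1) + noise
rng = np.random.default_rng(0)
N = 500
u = rng.uniform(-1, 1, N)
y = np.zeros(N)
for k in range(1, N):
    y[k] = np.tanh(0.8 * y[k - 1]) + 0.5 * u[k - 1] + 0.01 * rng.standard_normal()

# Regressor phi(k) = [y(k-1), u(k-1)]  (n = 1, m = 0, d = 1)
Phi = np.column_stack([y[:-1], u[:-1]])   # network inputs
T = y[1:]                                  # targets y(k)

# One hidden layer of tanh units, linear output
nh = 8
W1 = 0.5 * rng.standard_normal((nh, 2)); b1 = np.zeros(nh)
W2 = 0.5 * rng.standard_normal(nh); b2 = 0.0

eta = 0.05                                 # gradient-descent step (LM in the slides)
for it in range(2000):
    H = np.tanh(Phi @ W1.T + b1)           # hidden activations
    yhat = H @ W2 + b2                     # network output  y_hat(k)
    eps = T - yhat                         # prediction error epsilon(k)
    # Backpropagation of the mean-squared-error gradient
    gW2 = -(eps @ H) / len(T); gb2 = -eps.mean()
    dH = (-eps[:, None] * W2) * (1 - H**2)
    gW1 = dH.T @ Phi / len(T); gb1 = dH.mean(axis=0)
    W2 -= eta * gW2; b2 -= eta * gb2
    W1 -= eta * gW1; b1 -= eta * gb1

mse = np.mean(eps**2)
```

After training, `mse` should approach the noise floor of the simulated system; a Levenberg-Marquardt update would typically reach it in far fewer iterations than plain gradient descent.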
NNARMAX Model Structure
The regressor vector for an NNARMAX model is

φ(k, θ) = [y(k−1) … y(k−n) u(k−d) … u(k−d−m) ε(k−1) … ε(k−n)]^T

The predictor model includes feedback from network output to inputs: the regressor vector depends on the network parameters θ, thus instead of ŷ(k) = g(φ(k), θ) we have ŷ(k) = g(φ(k, θ), θ)
NNARMAX Model Structure cont.
Figure: NNARMAX predictor model (past prediction errors ε(k−1), …, ε(k−n), obtained through z^{−1} delay blocks, are fed back into the regressor vector φ(k) of the network NN with parameters θ)
NNARMAX Model Structure cont.
Since the predicted output is not a function of independent variables, the gradient of the cost is determined by the total derivative

dŷ(k)/dθ = ∂ŷ(k)/∂θ + (∂ŷ(k)/∂ε(k−1)) dε(k−1)/dθ + ⋯ + (∂ŷ(k)/∂ε(k−n)) dε(k−n)/dθ

Denote ψ(k) = dŷ(k)/dθ and ϕ(k) = ∂ŷ(k)/∂θ; since ε(k) = y(k) − ŷ(k) implies dε(k)/dθ = −ψ(k), this becomes

ψ(k) = ϕ(k) − (∂ŷ(k)/∂ε(k−1)) ψ(k−1) − ⋯ − (∂ŷ(k)/∂ε(k−n)) ψ(k−n)
NNARMAX Model Structure: backpropagation
- Computing ψ(k) also requires the partial derivatives of the predicted outputs with respect to past prediction errors
- The problem simplifies if the NNARMAX structure is modified such that the disturbance model is linear. Such a regressor model is

ŷ(k) = g(φ̄(k), θ̄) + (C(z^{−1}) − 1) ε(k)

where φ̄ and θ̄ represent the regressor vector and the network parameters, respectively
NNARMAX Model Structure: backpropagation cont.
The polynomial C(z^{−1}) = 1 + c_1 z^{−1} + ⋯ + c_n z^{−n} describes the noise characteristics, as in linear ARMAX models

Figure: modified NNARMAX structure (the network NN maps φ̄(k) through parameters θ̄; the linear filter C(z^{−1}) − 1 acts on ε(k), and its output is added to the network output to form ŷ(k), which is compared with y(k))
NNARMAX Model Structure: backpropagation cont.
With the modification, the total derivative becomes

ψ(k) = ϕ(k) − c_1 ψ(k−1) − ⋯ − c_n ψ(k−n)

or

ψ(k) = (1 / C(z^{−1})) ϕ(k)

where

θ = [θ̄^T c_1 ⋯ c_n]^T and ϕ(k) = [(∂ŷ(k)/∂θ̄)^T ε(k−1) ⋯ ε(k−n)]^T
NNARMAX Model Structure: backpropagation cont.
- Backpropagation can be used to compute ∂ŷ(k)/∂θ̄
- ψ(k) is then computed by forming ϕ(k) and filtering it through 1/C(z^{−1})
- the coefficients of C(z^{−1}) are contained in θ and updated in each iteration
- the stability of 1/C(z^{−1}) must therefore be checked at each step; if C is unstable, the spectral factorization theorem can be used to replace it with a stable polynomial
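The filtering and stability check above can be sketched as follows. The polynomial coefficients and gradient dimensions are illustrative assumptions, not values from the slides.

```python
import numpy as np

# Sketch: psi(k) is obtained by filtering the gradient phi(k) through
# 1/C(z^-1), i.e. the recursion psi(k) = phi(k) - c1 psi(k-1) - ... - cn psi(k-n),
# after checking that C has all roots strictly inside the unit circle.
# The numeric values below are assumed for illustration.

c = np.array([1.0, -0.7, 0.1])     # C(z^-1) = 1 - 0.7 z^-1 + 0.1 z^-2

# Stability check: roots of z^n C(z^-1) must lie inside the unit circle
roots = np.roots(c)
stable = bool(np.all(np.abs(roots) < 1.0))
# If C were unstable, spectral factorization would supply a stable
# polynomial with the same spectral density (roots mirrored into the
# unit circle).

rng = np.random.default_rng(0)
phi = rng.standard_normal((200, 5))          # 200 samples of a 5-dim gradient
psi = np.zeros_like(phi)
for k in range(len(phi)):
    psi[k] = phi[k]
    for i in range(1, len(c)):
        if k - i >= 0:
            psi[k] -= c[i] * psi[k - i]      # enforces C(z^-1) psi(k) = phi(k)
```

For this C the roots are 0.5 and 0.2, so the filter is stable; in training, this check (and if necessary the replacement of C) would run once per iteration because the c_i are themselves updated.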
Practical Issues
Before a two-layer MLP network is trained for a control application, the following choices have to be made:
- Sampling time
- Dynamic system order
- Number of hidden neurons
- Training signal
pH Plant
Figure: pH neutralisation process
- u_1: the base (NaOH) flow rate
- u_2: the buffer (NaHCO_3) flow rate
- u_3: the acid (HNO_3) flow rate
- y: the pH of the effluent solution
- u_3 and the volume V of the tank are assumed to be constant
NNARX
Figure: NNARX model structure (past outputs y(k−1), …, y(k−n) and past inputs u(k−d), …, u(k−d−m) feed a neural network that produces ŷ(k))
NNARX: Lag Space
Dynamic system order (lag space):

y(k) = g_0(ϕ(k), θ), ϕ^T(k) = [y(k−1) ⋯ y(k−n) u(k−d) ⋯ u(k−d−m)]

- Too small a lag space implies that essential dynamics will not be modelled
- Use the He and Asada (1993) method via the function lipschit, but sometimes it is inconclusive
- Trial and error, starting from small values
- Recommended starting values: n = 4, m = 4, d = 1
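Building the regressor matrix for the orders above can be sketched as follows; the function name and data are illustrative, not from a toolbox.

```python
import numpy as np

# Sketch: assemble NNARX regressors phi(k) = [y(k-1)..y(k-n), u(k-d)..u(k-d-m)]
# with the recommended starting orders n = 4, m = 4, d = 1.

def nnarx_regressors(y, u, n=4, m=4, d=1):
    """Return (Phi, T): one row per usable sample k, targets T[k] = y(k)."""
    k0 = max(n, d + m)                     # first k with all lags available
    rows, targets = [], []
    for k in range(k0, len(y)):
        past_y = [y[k - i] for i in range(1, n + 1)]        # y(k-1)..y(k-n)
        past_u = [u[k - d - j] for j in range(0, m + 1)]    # u(k-d)..u(k-d-m)
        rows.append(past_y + past_u)
        targets.append(y[k])
    return np.array(rows), np.array(targets)

# Tiny synthetic records just to show the shapes
y = np.arange(10.0)
u = np.arange(10.0) * 0.1
Phi, T = nnarx_regressors(y, u)
print(Phi.shape)    # (5, 9): 5 usable samples, n + m + 1 = 9 regressors
```

The row count shrinks by max(n, d + m) samples at the start of the record; this matters when comparing fits over short data sets.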
NNARX: Sampling Time
A reasonable choice of the sampling time T_s gives between 5 and 10 samples within the rise time.

Figure: step test of the pH plant with rise time T_r = 50 s; in this case we choose T_s = 5 s
NNARX: Hidden Neurons
Number of hidden neurons:
- Too small: the network will not have sufficient degrees of freedom to adequately represent the mapping
- Too large: more local minima, overfitting, slow training

Figure: function approximation example (network output versus input)

Start from a large value, e.g. 10, and reduce it to the smallest value that can still map the data
Training Signal
The input signal should:
- Excite all the frequencies of interest (persistent excitation)
- Excite the process over the whole of the required operating region

For nonlinear systems it is now well known that a PRBS signal is not enough; use instead:
- Multi-level PRBS
- Multisine

You should know:
- The system bandwidth
- The input ranges
Training Signal: Multi-Level PRBS
A multi-level PRBS is generated as

u(k) = u(k−1) with probability α, u(k) = e(k) with probability 1 − α

where e(k) is normally distributed random noise.
Note: the minimum hold time should be long enough that the system output has time to approach the new set point
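The generator above can be sketched as follows; the hold time, α, and amplitude scaling are illustrative assumptions.

```python
import numpy as np

# Sketch of the multi-level PRBS: keep the previous level with probability
# alpha, otherwise jump to a new normally distributed level. Each level is
# held for `hold` samples so the output can approach its new set point.
# Parameter values are assumptions, not from the slides.

def multilevel_prbs(N, alpha=0.9, hold=10, scale=1.0, seed=0):
    rng = np.random.default_rng(seed)
    levels = []
    u = 0.0
    for _ in range(N // hold + 1):
        if rng.uniform() > alpha:          # with probability 1 - alpha
            u = scale * rng.standard_normal()
        levels.append(u)                   # with probability alpha: unchanged
    return np.repeat(levels, hold)[:N]

sig = multilevel_prbs(200)
print(sig.shape)    # (200,)
```

Tuning α trades off how often the level changes against how long the plant settles at each level; with a slow plant like the pH process, a longer hold time is needed.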
Training Signal: Multi-Level PRBS cont.
Figure: Estimation (first 10000 points) and validation (remaining 10000 points) input-output data (u_1 and y (pH) versus time)
Training Signal: Multisine
A multisine signal is

u(k) = Σ_{l=1}^{N} A cos(2π ω_l k + ϕ_l), N = (ω_max − ω_min)/η + 1

where
- ϕ_l are random phases (more robust than Schroeder phasing)
- A is the required overall amplitude
- the frequencies ω_l normally run from 0 to 3 times the bandwidth of the open-loop system
- use the command idinput in the System Identification Toolbox
Training Signal: Multisine cont.
First trick:

u(k) = Σ_{l=1}^{N_1} A cos(2π ω_l k + ϕ_l) + γ Σ_{m=1}^{N_2} A cos(2π ω_m k + ϕ_m)

where the first term covers 0 to 3 times the bandwidth and the second term covers 3 times the bandwidth up to half the sampling frequency; γ is very small, e.g. 0.3

Second trick: the validation signal should be 10 times slower than the training signal
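The two-band multisine construction can be sketched as follows. The bandwidth, sampling frequency, number of lines, and γ are illustrative assumptions, not plant values from the slides.

```python
import numpy as np

# Sketch of a multisine with the two-band trick: a main band from 0 to
# 3x the (assumed) open-loop bandwidth, plus a second band weighted by a
# small gamma from 3x bandwidth up to half the sampling frequency.

def multisine(N, freqs, A=1.0, seed=0):
    rng = np.random.default_rng(seed)
    k = np.arange(N)
    phases = rng.uniform(0, 2 * np.pi, len(freqs))   # random phases
    return sum(A * np.cos(2 * np.pi * f * k + p)
               for f, p in zip(freqs, phases))

N, fs, bw = 1000, 1.0, 0.05                 # samples, sampling freq, bandwidth (assumed)
low_band = np.linspace(0.001, 3 * bw, 20)   # 0 .. 3x bandwidth
high_band = np.linspace(3 * bw, fs / 2, 20) # 3x bandwidth .. fs/2
gamma = 0.3                                 # small weight on the high band
u = multisine(N, low_band) + gamma * multisine(N, high_band, seed=1)
print(u.shape)    # (1000,)
```

In practice the same effect is obtained with idinput in the MATLAB System Identification Toolbox; the sketch only illustrates the frequency-grid and random-phase construction.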
Training Signal: Multisine cont.
Figure: An example of a multisine signal (u_1 and the resulting y (pH) versus time)
Training Results
Figure: Output (solid) and one-step-ahead prediction (dashed), and the prediction error y − ŷ, versus time in samples
Training Results cont.
Figure: Autocorrelation function of the prediction error, and cross-correlation of u_1 with the prediction error, versus lag
Training Results cont.
Figure: Output (solid) and 10-step-ahead prediction (dashed) versus time in samples
MATLAB NN Toolbox: NNARX
MATLAB NN Toolbox cont.
References
1. Werner, H., Lecture notes on Neural and Genetic Computing for Control Engineering, TUHH.
2. Ljung, L., System Identification: Theory for the User, Prentice Hall, 1999.
3. Norgaard, M., Ravn, O., Poulsen, N. K. and Hansen, L. K., Neural Networks for Modelling and Control of Dynamic Systems.