Identification of Nonlinear Systems using Polynomial Nonlinear State Space Models


FACULTY OF ENGINEERING
Department of Fundamental Electricity and Instrumentation

Identification of Nonlinear Systems using Polynomial Nonlinear State Space Models

Thesis submitted in fulfillment of the requirements for the degree of Doctor in de Ingenieurswetenschappen (Doctor in Engineering) by ir. Johan Paduart

Chair: Prof. Dr. ir. Annick Hubin (Vrije Universiteit Brussel)
Vice chair: Prof. Dr. ir. Jean Vereecken (Vrije Universiteit Brussel)
Secretary: Prof. Dr. Steve Vanlanduit (Vrije Universiteit Brussel)
Advisers: Prof. Dr. ir. Johan Schoukens (Vrije Universiteit Brussel), Prof. Dr. ir. Rik Pintelon (Vrije Universiteit Brussel)
Jury: Prof. Dr. ir. Lennart Ljung (Linköping University), Prof. Dr. ir. Johan Suykens (Katholieke Universiteit Leuven), Prof. Dr. ir. Jan Swevers (Katholieke Universiteit Leuven), Prof. Dr. ir. Yves Rolain (Vrije Universiteit Brussel)

Print: Grafikon, Oostkamp
Vrije Universiteit Brussel - ELEC Department
Johan Paduart 2007
Uitgeverij VUBPRESS Brussels University Press
VUBPRESS is an imprint of ASP nv (Academic and Scientific Publishers nv)
Ravensteingalerij 28, B-1000 Brussels
Tel (0) Fax ++32 (0)
info@vubpress.be
ISBN NUR 910
Legal deposit D/2008/11.161/012

All rights reserved. No parts of this book may be reproduced or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the author or the ELEC Department of the Vrije Universiteit Brussel.

Leaving the calm water of the linear sea for the more unpredictable waves of the nonlinear ocean.


Acknowledgements

Researching and writing a PhD thesis is not something one can do all by oneself. Therefore, I would like to thank everyone who contributed to this thesis in one or many ways. First of all, I am grateful to Johan Schoukens and Rik Pintelon for giving me the opportunity to pursue my PhD degree at the ELEC department, and for introducing me to this fascinating research field. Without the guidance of the patjes (Jojo, Rik and Yves) over the past four years, this work would have been a much tougher job. I would also like to thank all my colleagues at the ELEC department for the stimulating work environment, the exchange of interesting ideas, and the relaxing talks. I am indebted to Lieve Lauwers, Rik Pintelon, Johan Schoukens, and Wendy Van Moer for proofreading (parts of) my thesis. A special word of thanks goes to Lieve, who has polished the rough edges of this text. Lieve, you have made my thesis much more pleasant to read. Furthermore, I thank Tom Coen, Thomas Delwiche, Liesbeth Gommé, and Kris Smolders for letting me use their measurements. I very much appreciate your difficult and time-consuming experimental work. Last but not least, I am grateful to my family and my lieve Lieve for their unconditional support and love. Johan Paduart


Table of Contents

Operators and Notational Conventions
Symbols
Abbreviations

Chapter 1  Introduction
  What are Nonlinear Systems?
  Why build Nonlinear Models?
  A Framework for Nonlinear Modelling
    Approximation Criteria
    The Volterra-Wiener Theory
    Continuous-time versus Discrete-time
    Single Input, Single Output versus Multiple Input, Multiple Output
    What is not included in the Volterra Framework?
  Outline of the thesis
  Contributions
  Publication List

Chapter 2  The Best Linear Approximation
  Introduction
  Class of Excitation Signals
    Random Phase Multisine
    Gaussian Noise
  Properties of the Best Linear Approximation
    Single Input, Single Output Systems
    Multiple Input, Multiple Output Systems
  Some Properties of Nonlinear Systems
    Response to a Sine Wave
    Even and Odd Nonlinear Behaviour
    The Multisine as a Detection Tool for Nonlinearities
  Estimating the Best Linear Approximation
    Single Input, Single Output Systems
    Multiple Input, Multiple Output Systems
  Appendix 2.A  Calculation of the FRF Covariance from the Input/Output Covariances
  Appendix 2.B  Covariance of the FRF for Non Periodic Data

Chapter 3  Fast Measurement of Quantization Distortions
  Introduction
  The Multisine as a Detection Tool for Non-idealities
  DSP Errors
    Truncation Errors of the Filter Coefficients
    Finite Precision Distortion
    Finite Range Distortion
    Influence of the Implementation Quality
  Analysis of Audio Codecs
  Conclusion

Chapter 4  Identification of Nonlinear Feedback Systems
  Introduction
  Model Structure
  Estimation Procedure
    Best Linear Approximation
    Nonlinear Feedback
    Nonlinear Optimization
  Experimental Results
    Linear Model
    Estimation of the Nonlinear Feedback Coefficients
    Nonlinear Optimization
    Upsampling
  Conclusion
  Appendix 4.A  Analytic Expressions for the Jacobian

Chapter 5  Nonlinear State Space Modelling of Multivariable Systems
  Introduction
  The Quest for a Good Model Structure
    Volterra Models
    NARX Approach
    State Space Models
  Polynomial Nonlinear State Space Models
    Multinomial Expansion Theorem
    Graded Lexicographic Order
    Approximation Behaviour
    Stability
    Some Remarks on the Polynomial Approach
  On the Equivalence with some Block-oriented Models
    Hammerstein
    Wiener
    Wiener-Hammerstein
    Nonlinear Feedback
    Conclusion
  A Step beyond the Volterra Framework
    Duffing Oscillator
    Lorenz Attractor
  Identification of the PNLSS Model
    Best Linear Approximation
    Frequency Domain Subspace Identification
    Nonlinear Optimization of the Linear Model
    Estimation of the Full Nonlinear Model
  Appendix 5.A  Some Combinatorials
  Appendix 5.B  Construction of the Subspace Weighting Matrix from the FRF Covariance
  Appendix 5.C  Nonlinear Optimization Methods
  Appendix 5.D  Explicit Expressions for the PNLSS Jacobian
  Appendix 5.E  Computation of the Jacobian regarded as an alternative PNLSS system

Chapter 6  Applications of the Polynomial Nonlinear State Space Model
  Silverbox
    Description of the DUT; Description of the Experiments; Best Linear Approximation; Nonlinear Model; Comparison with Other Approaches
  Combine Harvester
    Description of the DUT; Description of the Experiments; Best Linear Approximation; Nonlinear Model
  Semi-active Damper
    Description of the DUT; Description of the Experiments; Best Linear Approximation; Nonlinear Model
  Quarter Car Set-up
    Description of the DUT; Description of the Experiments; Best Linear Approximation; Nonlinear Model
  Robot Arm
    Description of the DUT; Description of the Experiments; Best Linear Approximation; Nonlinear Model
  Wiener-Hammerstein
    Description of the DUT; Description of the Experiments; Level of Nonlinear Distortions; Best Linear Approximation; Nonlinear Model; Comparison with a Block-oriented Approach
  Crystal Detector
    Description of the DUT; Description of the Experiments; Best Linear Approximation; Nonlinear Model; Comparison with a Block-oriented Approach

Chapter 7  Conclusions

References

Publication List

Operators and Notational Conventions

ℕ, ℤ, ℝ, ℂ  outline upper case font denotes a set: respectively the natural, the integer, the real, and the complex numbers
⊗  the Kronecker matrix product
Re(x)  real part of x
Im(x)  imaginary part of x
arg min f(x)  the minimizing argument of f(x)
O(x)  an arbitrary function with the property lim_{x→0} O(x)/x < ∞
θ̂  estimated value of θ
x̄  complex conjugate of x
|x|  magnitude of a complex number x: |x| = sqrt(Re(x)² + Im(x)²)
arg(x)  phase (argument) of the complex number x
subscript re  A_re = [Re(A); Im(A)], the real and imaginary parts stacked on top of each other
subscript u  with respect to the input of the system
subscript y  with respect to the output of the system
x^(r)  vector which contains all the distinct nonlinear combinations of the elements of vector x, of exactly degree r
x^{r}  vector which contains all the distinct nonlinear combinations of the elements of vector x, from degree 2 up to r
superscript T  matrix transpose
superscript −T  transpose of the inverse matrix
superscript H  Hermitian transpose: complex conjugate transpose of a matrix
superscript −H  Hermitian transpose of the inverse matrix
superscript +  Moore-Penrose pseudo-inverse
A[:, j]  j-th column of the matrix A
A[i, :]  i-th row of the matrix A
κ(A) = σ_max(A)/σ_min(A)  condition number of an n × m matrix A, with σ_i the singular values
diag(A_1, A_2, …, A_K)  block diagonal matrix with blocks A_k, k = 1, 2, …, K
herm(A) = (A + A^H)/2  Hermitian symmetric part of a matrix A
rank(A)  rank of the n × m matrix A, i.e., the maximum number of linearly independent rows (columns)
vec(A)  column vector formed by stacking the columns of the matrix A on top of each other
E{·}  mathematical expectation
Cov(X, Y)  cross-covariance matrix of X and Y
var(x)  variance of x
C_X = Cov(X) = Cov(X, X)  covariance matrix of X
Ĉ_X  sample covariance matrix of X
C_XY = Cov(X, Y)  cross-covariance matrix of X and Y
Ĉ_XY  sample cross-covariance matrix of X and Y
DFT(x(t))  Discrete Fourier Transform of the samples x(t), t = 0, 1, …, N − 1
I_m  m × m identity matrix
0_{m×n}  m × n zero matrix
S_XX(jω)  auto-power spectrum of x(t)
S_XY(jω)  cross-power spectrum of x(t) and y(t)
X̂  sample mean of X
μ_x = E{x}  mean value of x
σ_x² = var(x)  variance of x
σ̂_x²  sample variance of x
σ_xy² = covar(x, y)  covariance of x and y
σ̂_xy²  sample covariance of x and y

Symbols

C_BLA(k)  total covariance of the Best Linear Approximation (MIMO)
C_n(k)  covariance of the BLA due to the measurement noise (MIMO)
C_NL(k)  covariance of the BLA due to the stochastic nonlinear contributions (MIMO)
f  frequency
F  number of frequency domain data samples
f_s  sampling frequency
G(jω)  frequency response function
G_BLA(jω)  best linear approximation of a nonlinear plant
j  j² = −1
k  frequency index
M  number of (repeated) experiments
N  number of time domain data samples
n_a, n_u, n_y  state dimension, input dimension, and output dimension
n_θ  dimension of the parameter vector θ
s  Laplace transform variable
s_k  Laplace transform variable evaluated along the imaginary axis at DFT frequency k: s_k = jω_k
t  continuous- or discrete-time variable
T_s  sampling period
U(k), Y(k)  discrete Fourier transform of the samples u(tT_s) and y(tT_s), t = 0, 1, …, N − 1
U_k, Y_k  Fourier coefficients of the periodic signals u(t), y(t)
U(jω), Y(jω)  Fourier transform of u(t) and y(t)
u(t), y(t)  input and output time signals
V_F(θ, Z)  cost function based on F measurements
z  Z-transform variable
z_k  Z-transform variable evaluated along the unit circle at DFT frequency k: z_k = e^{jω_k T_s} = e^{j2πk/N}

ε(θ, Z)  column vector of the model residuals (dimension F)
J(θ, Z) = ∂ε(θ, Z)/∂θ  gradient of the residuals w.r.t. the parameters (dimension F × n_θ)
θ  column vector of the model parameters
ζ(t)  nonlinear vector map of the state equation
η(t)  nonlinear vector map of the output equation
ξ(t)  column vector that contains the stacked state and input vectors
σ²_BLA(k)  total variance of the Best Linear Approximation (SISO)
σ²_n(k)  variance of the BLA due to the measurement noise (SISO)
σ²_NL(k)  variance of the BLA due to the stochastic nonlinear contributions (SISO)
ω = 2πf  angular frequency

Abbreviations

BL  Band-Limited (measurement set-up)
BLA  Best Linear Approximation
DFT  Discrete Fourier Transform
DUT  Device Under Test
FIR  Finite Impulse Response
FFT  Fast Fourier Transform
FRF  Frequency Response Function
GN  Gaussian Noise
iid  independent identically distributed
IIR  Infinite Impulse Response
LS  Least Squares
LTI  Linear Time Invariant
MIMO  Multiple Input Multiple Output
MISO  Multiple Input Single Output
NARX  Nonlinear Auto Regressive with external input (model)
NLS  Nonlinear Least Squares
NOE  Nonlinear Output Error (model)
pdf  probability density function
PID  Proportional-Integral-Derivative (controller)
PISPOT  Periodic Input, Same Periodic OuTput
PNLSS  Polynomial NonLinear State Space
PSD  Power Spectral Density
RF  Radio Frequency
RBF  Radial Basis Function
RMS  Root Mean Square (value)
RMSE  Root Mean Square Error
rpm  rotations per minute
RPM  Random Phase Multisine
SA  State Affine (model)
SISO  Single Input Single Output
SDR  Signal-to-Distortion Ratio
SNR  Signal-to-Noise Ratio
SVD  Singular Value Decomposition
w.p.1  with probability one
WLS  Weighted Least Squares


CHAPTER 1 INTRODUCTION

1.1 What are Nonlinear Systems?

It is difficult, if not impossible, to give a conclusive definition of nonlinear systems. The famous paradigm by the mathematician Stan Ulam illustrates this [6]: using a term like nonlinear science is like referring to the bulk of zoology as the study of non-elephant animals. Nevertheless, the world around us is filled with nonlinear phenomena, and we are very familiar with some of these effects. Essentially, a system is nonlinear when the rule of three is not applicable to its behaviour. Tax rating systems in Belgium, for instance, behave nonlinearly: the higher someone's gross salary gets, the higher his/her average tax rate becomes. Audio amplifiers are another good example of nonlinear systems. When their volume is turned up too eagerly, the signals they produce get clipped, and the music we hear becomes distorted instead of sounding louder. The weather system also behaves nonlinearly: slight perturbations of this system can lead to massive modifications after a long period of time. This is the so-called butterfly effect. It explains why it is so hard to accurately predict the weather with a time horizon of more than a couple of days. In some situations, nonlinear behaviour is a desired effect. Video and audio broadcasting, mobile telephony, and CMOS technology would simply be impossible without nonlinear devices such as transistors and mixers. Hence, it is important to understand and model their behaviour. Finally, let us define, in a slightly more rigorous way, what nonlinear systems are. To this end, we start by defining a linear system. With zero initial conditions, the system T{·} is linear if it obeys the superposition principle and the scaling property

T{α u_1(t) + β u_2(t)} = α T{u_1(t)} + β T{u_2(t)}, (1-1)

where u_1(t) and u_2(t) are two arbitrary input signals as a function of time, and α and β are two arbitrary scalar numbers.
When the superposition principle or the scaling property is not fulfilled, we call T{·} a nonlinear system. The most important implication of this open definition is that there exists no general nonlinear framework. That is why studying nonlinear systems is such a difficult task.
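Property (1-1) is easy to probe numerically. The sketch below (with a hypothetical helper `is_linear`, not part of the thesis) feeds random input pairs and random scalars to a system and checks superposition and scaling; a moving-average filter passes, while a tanh "clipper" fails:

```python
import numpy as np

def is_linear(T, n_trials=10, n_samples=256, tol=1e-9, rng=None):
    """Numerically probe the superposition/scaling property (1-1):
    T{a*u1 + b*u2} should equal a*T{u1} + b*T{u2} for arbitrary
    inputs u1, u2 and scalars a, b (zero initial conditions assumed)."""
    rng = np.random.default_rng(rng)
    for _ in range(n_trials):
        u1, u2 = rng.standard_normal((2, n_samples))
        a, b = rng.standard_normal(2)
        lhs = T(a * u1 + b * u2)
        rhs = a * T(u1) + b * T(u2)
        if np.max(np.abs(lhs - rhs)) > tol:
            return False
    return True

# A moving-average (FIR) filter is linear; a tanh clipper is not.
fir = lambda u: np.convolve(u, [0.5, 0.3, 0.2])[: len(u)]
clip = lambda u: np.tanh(u)

print(is_linear(fir))   # True
print(is_linear(clip))  # False
```

Such a randomized test can of course only falsify linearity, never prove it, which is in line with the "open definition" above.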

1.2 Why build Nonlinear Models?

From the handful of examples listed in the previous section, it is clear that many real-life phenomena are nonlinear. Often, it is possible to use linear models to approximate their behaviour. This is an attractive idea because the linear framework is well established. Furthermore, linear models are easy to interpret and to understand. Building linear models usually requires significantly less effort than the estimation of nonlinear models. Unfortunately, linear approximations are only valid for a given input range. Hence, there has been a tendency towards nonlinear modelling in various application fields during the last decades. Technological innovations have resulted in fewer limitations on the computational, memory, and data-acquisition level, making nonlinear modelling a more feasible option. In order to build models for the studied nonlinear devices, we will employ system identification methods. Classic text books about system identification are [75], [38], and [56]. The basic goal of system identification is to identify mathematical models from the available input/output data. This is often achieved via the minimization of a cost function embedded in a statistical framework (Figure 1-1). An excellent starting point for nonlinear modelling is [70]. Other reference works on nonlinear systems and nonlinear modelling include [60], [59], [7], [5], [33], [77], and [80]. In this thesis, we focus on the estimation of so-called simulation models: from measured input/output data, we estimate a model that, given a new input data set, simulates the output as well as possible. Such models can for instance be used to replace expensive experiments by cheap computer simulations. The major difference with prediction error modelling is that no past measured outputs are used to predict new output values. As mentioned before, there is no general nonlinear framework.
However, there exists a class of nonlinear systems that has been studied intensively in the past, and which covers a broad spectrum of well-behaved nonlinear behaviour: the class of Wiener systems. This class will be used as a starting point in this thesis; it stems from the Volterra-Wiener theory, which is briefly explained in what follows.

Figure 1-1. Basic idea of system identification: the cost function relates data and model.
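The cost-function view of Figure 1-1 can be made concrete with a toy problem. The sketch below (the 3-tap FIR system, noise level, and record length are illustrative assumptions, not from the thesis) fits a linear simulation model to input/output data by minimizing a quadratic cost with linear least squares:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed "true" system: a 3-tap FIR filter; only noisy input/output
# records are observed, as in a real identification experiment.
h_true = np.array([1.0, 0.6, -0.3])
u = rng.standard_normal(1000)
y = np.convolve(u, h_true)[: len(u)] + 0.01 * rng.standard_normal(len(u))

# Model: y_model(t) = sum_k theta_k * u(t - k).  Least squares
# minimizes the cost V(theta) = sum_t (y(t) - y_model(t))^2.
n_taps = 3
rows = np.column_stack([np.roll(u, k) for k in range(n_taps)])
U, Y = rows[n_taps:], y[n_taps:]   # drop rows with wrapped-around samples
theta, *_ = np.linalg.lstsq(U, Y, rcond=None)
print(np.round(theta, 2))          # close to h_true
```

Nonlinear model structures lead to non-quadratic costs and iterative optimization, but the data-cost-model triangle is the same.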

1.3 A Framework for Nonlinear Modelling

Approximation Criteria

In this thesis, we will use models to approximate the behaviour of nonlinear systems. This requires an approximation quality measure. For this, two principal convergence criteria can be employed: convergence in mean square sense and uniform convergence.

Definition 1.1 (Convergence in the Mean) The model f̂(u, θ) converges in mean square sense to a system f(u) if, for all ε > 0, there exists an M_ε independent of θ such that for all M > M_ε there exists a θ_M with

E{ |f(u) − f̂(u, θ_M)|² } < ε, with M = dim(θ),

where the expected value is taken over the class of excitation signals.

Definition 1.2 (Uniform Convergence) The model f̂(u, θ) converges uniformly to a system f(u) if, for all ε > 0, there exists an M_ε independent of θ and u such that for all M > M_ε there exists a θ_M with

|f(u) − f̂(u, θ_M)| < ε for all u, with M = dim(θ).

Note that uniform convergence is a stronger result than convergence in mean square sense, but the latter is often easier to obtain.

The Volterra-Wiener Theory

In order to set up a rigorous framework for the following chapter, we consider a particular class of nonlinear systems. A classical approach is to make use of the Volterra-Wiener theory, which is thoroughly described in [59] and [60]. A short overview of the results that will serve in the rest of this thesis is given here.
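Definition 1.1 can be illustrated numerically: for a static system f(u) = tanh(u) excited by white Gaussian noise, the mean square error of the best polynomial model decreases as the model dimension M grows. A minimal sketch (the tanh example, sample size, and model orders are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.standard_normal(10_000)   # class of excitations: white Gaussian
f = np.tanh(u)                    # static system to approximate

def mse_of_best_poly(M):
    """Mean square error of the least-squares polynomial model
    with M parameters theta (basis 1, u, u^2, ..., u^(M-1))."""
    U = np.vander(u, M, increasing=True)
    theta, *_ = np.linalg.lstsq(U, f, rcond=None)
    return np.mean((f - U @ theta) ** 2)

errs = [mse_of_best_poly(M) for M in (2, 4, 6, 8)]
print(errs)  # non-increasing as M grows
```

Since the models are nested, the empirical mean square error cannot increase with M; uniform convergence, in contrast, would require the worst-case error over all inputs to shrink, which is a much stronger demand.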

Volterra series can be seen as the dynamic extension of power series, and they are defined as the (infinite) sum of Volterra operators H_n. For an input u(t) as a function of time, the output y(t) of the series is given by

y(t) = Σ_{n=1}^{∞} H_n[u(t)]. (1-2)

The n-th order continuous-time Volterra operator H_n is defined as

H_n[u(t)] = ∫_0^∞ … ∫_0^∞ h_n(τ_1, …, τ_n) u(t − τ_1) … u(t − τ_n) dτ_1 … dτ_n, (1-3)

where h_n(τ_1, …, τ_n) is the n-th order Volterra kernel. For a first order system (n = 1), equation (1-3) reduces to the well-known relation

H_1[u(t)] = ∫_0^∞ h(τ) u(t − τ) dτ, (1-4)

which is the convolution representation of a linear system with an impulse response h(τ). When the kernel h_n is causal, it is zero for any negative argument:

h_n(τ_1, …, τ_n) = 0 for τ_i < 0, i = 1, …, n. (1-5)

Because we restrict ourselves to causal systems, the lower integral limits in (1-3) and (1-4) are set equal to zero. Volterra series can be used to approximate the behaviour of a certain class of nonlinear systems. However, they can suffer from severe convergence problems, which is a common phenomenon for power series. This can occur for example in the presence of discontinuities like hard clipping or dead-zones. To overcome this difficulty, Wiener introduced the Wiener-G functionals [60], which are Volterra functionals orthogonalized with respect to white Gaussian input signals. Note that the Wiener-G functionals are only required to solve numerical issues. Hence, what follows holds for both Volterra and Wiener-G functionals. The Wiener theory states that any nonlinear system f satisfying a number of conditions can be represented arbitrarily well in mean square sense by Volterra/Wiener-G functionals. The restrictions on system f are [60]:

1. f is not explosive; in other words, the system's response to a bounded input sequence is finite;
2. f has a finite memory, i.e., the present output becomes asymptotically independent of the past values of the input;
3. f is causal and time-invariant.

The set of systems W satisfying these conditions is known as the class of Wiener systems. When approximating f by a Volterra/Wiener-G functional f̂, the mean square convergence is guaranteed over a finite time interval, with respect to the class of white Gaussian input signals u:

E{ [f(u(t)) − f̂(u(t))]² } < ε, ∀t ∈ [0, T], ∀f ∈ W. (1-6)

Boyd and Chua achieved even more powerful results with Volterra series via the introduction of a concept called Fading Memory [5].

Definition 1.3 (Fading Memory) f has Fading Memory on a subset K of a compact set if there is a decreasing function w: ℝ⁺ → (0, 1], with lim_{t→∞} w(t) = 0, such that for each u_1 ∈ K and ε > 0 there is a δ > 0 such that for all u_2 ∈ K:

sup_{t ≤ 0} w(−t) |u_1(t) − u_2(t)| < δ  ⟹  |f(u_1(t)) − f(u_2(t))| < ε. (1-7)

Loosely explained, an operator has Fading Memory when two input signals that are close to each other in the recent past, but not necessarily in the remote past, yield present output signals that are close. This strengthened continuity requirement on f allows one to obtain more powerful approximation results with Volterra series. Boyd and Chua have proved the uniform convergence of finite (n < ∞) Volterra series to any continuous-time Fading Memory system for the class of input signals with bounded amplitude and bounded slew rate, without any restrictions on the time interval T.
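A truncated, finite-memory Volterra series of the kind discussed above can be simulated directly. The sketch below (with a hypothetical helper `volterra_output`; discrete time is used, anticipating the next section) evaluates first- and second-order kernels stored as NumPy arrays:

```python
import numpy as np
from itertools import product

def volterra_output(kernels, u):
    """Evaluate a truncated discrete-time Volterra series:
    y(t) = sum_n sum_{tau_1..tau_n} h_n(tau_1,..,tau_n) u(t-tau_1)..u(t-tau_n),
    with each finite-memory kernel h_n given as an n-dimensional array."""
    N = len(u)
    y = np.zeros(N)
    for h in kernels:
        n, mem = h.ndim, h.shape[0]
        for taus in product(range(mem), repeat=n):
            coeff = h[taus]
            if coeff == 0.0:
                continue
            term = np.ones(N)
            for tau in taus:          # multiply the delayed input copies
                shifted = np.zeros(N)
                shifted[tau:] = u[: N - tau]
                term *= shifted
            y += coeff * term
    return y

# First-order kernel = impulse response; the second-order kernel
# adds a cross term 0.25 * u(t) * u(t-1).
h1 = np.array([1.0, 0.5])
h2 = np.zeros((2, 2)); h2[0, 1] = 0.25
u = np.array([1.0, 2.0, 0.0])
print(volterra_output([h1, h2], u))  # [1. 3. 1.]
```

For u = (1, 2, 0) the output is y(t) = u(t) + 0.5 u(t−1) + 0.25 u(t) u(t−1), i.e. (1, 3, 1), which shows how the quadratic kernel breaks superposition.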

Continuous-time versus Discrete-time

Until now, we have only considered continuous-time Volterra series. However, this thesis mainly deals with the identification of discrete-time models. The discrete-time version of the n-th order causal Volterra operator is defined as

H_n[u(t)] = Σ_{τ_1=0}^{∞} … Σ_{τ_n=0}^{∞} h_n(τ_1, …, τ_n) u(t − τ_1) … u(t − τ_n), (1-8)

where h_n(τ_1, …, τ_n) denotes the n-th order, discrete-time Volterra kernel. In [5], the approximation properties of the Volterra series are shown under the Fading Memory assumption for discrete-time systems. The only difference with the continuous-time case is that the slew rate of the input signal does not need to be bounded.

Single Input, Single Output versus Multiple Input, Multiple Output

So far, only SISO (Single Input, Single Output) systems were considered, but in Chapter 5 we will deal with MIMO (Multiple Input, Multiple Output) systems as well. MIMO Volterra models with n_y outputs are defined as n_y separate MISO Volterra series. In the following, the analysis is only pursued for discrete-time systems. For notational simplicity, we define u(t) as the vector of the n_u assembled inputs at time instance t:

u(t) = [u_1(t), …, u_{n_u}(t)]^T. (1-9)

A MISO Volterra series is defined as the sum of Volterra functionals H_n:

y(t) = Σ_{n=0}^{∞} H_n[u(t)]. (1-10)

Next, consider the n-th term of (1-10). The n-th order, discrete-time MISO Volterra functional is defined as

H_n[u(t)] = Σ_{j_1, …, j_n} H_n^{j_1, …, j_n}[u(t)], (1-11)

where the j_i are input indices between 1 and n_u, and

H_n^{j_1, …, j_n}[u(t)] = Σ_{τ_1=0}^{∞} … Σ_{τ_n=0}^{∞} h_n^{j_1, …, j_n}(τ_1, …, τ_n) u_{j_1}(t − τ_1) … u_{j_n}(t − τ_n). (1-12)

To determine the number of distinct operators H_n^{j_1, …, j_n}[u(t)] of order n, we need to apply some combinatorics. In Appendix 5.A, the same problem is solved in a different context, but the idea remains the same. Hence, we distinguish

binom(n_u + n − 1, n) = (n + n_u − 1)! / (n! (n_u − 1)!) (1-13)

different terms.

What is not included in the Volterra Framework?

Since Volterra series are open loop models, they cannot represent a number of closed loop phenomena. Bearing the negative definition of nonlinear systems in mind, it is impossible to give an exhaustive list of the systems that cannot be approximated by Volterra models. However, it is possible to sum up a couple of examples.

No Subharmonic Generation

In [60], it was shown that the steady-state response of a Volterra series to a harmonic input is harmonic, and has the same period as the input. Hence, systems that generate subharmonics are excluded from the Volterra-Wiener framework. For this reason, we sometimes say that Wiener systems are PISPOT (Periodic Input, Same Periodic OuTput) systems. An example of subharmonic generation is given in the Duffing Oscillator on p.
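The count (1-13) is the number of combinations with repetition of n input indices out of n_u, which Python's `math.comb` evaluates directly; the cross-check against the factorial form below is an illustrative addition, not part of the thesis:

```python
from math import comb, factorial

def n_distinct_kernels(n, n_u):
    """Number of distinct n-th order MISO Volterra operators (1-13):
    combinations with repetition of n input indices out of n_u inputs."""
    return comb(n_u + n - 1, n)

# Cross-check against the factorial form of (1-13).
assert n_distinct_kernels(3, 2) == factorial(3 + 2 - 1) // (factorial(3) * factorial(2 - 1))

# n = 3, n_u = 2: the index multisets (1,1,1), (1,1,2), (1,2,2), (2,2,2).
print(n_distinct_kernels(3, 2))  # 4
```

Because the product u_{j_1}(t − τ_1) … u_{j_n}(t − τ_n) is symmetric in the index/lag pairs, only the multiset of indices matters, which is exactly what the binomial coefficient counts.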

No Chaotic Behaviour

Chaos is typically the result of a nonlinear dynamic system whose output depends extremely sensitively on the initial conditions. Such behaviour conflicts with the finite memory requirement of the Wiener class: the present output does not become asymptotically independent of the past. An example of chaotic behaviour is given in the Lorenz Attractor on p.

No Multiple-valued Output

Volterra series are a single-valued output representation. Hence, they cannot represent systems that exhibit output multiplicity, like for instance hysteresis.

1.4 Outline of the thesis

All the work in this thesis relies on the concept of the Best Linear Approximation (BLA). Therefore, in Chapter 2 the BLA is first introduced in an intuitive way, and then rigorously defined for SISO and MIMO nonlinear systems. Furthermore, some interesting properties of multisine excitation signals with respect to the qualification and quantification of nonlinear behaviour are reviewed. Finally, we explain how the BLA should be estimated, for both non-periodic and periodic input/output data. As will become clear in Chapter 2, periodic excitations are preferred, since in that case more information can be extracted from the Device Under Test. The tools described in Chapter 2 are applied to a number of Digital Signal Processing (DSP) algorithms in Chapter 3. A measurement technique is proposed to characterize the non-idealities of DSP algorithms which are induced by quantization effects, overflows, or other nonlinear effects. The main idea is to apply specially designed excitations such that a distinction can be made between the output of the ideal system and the contributions of the system's non-idealities. The proposed method is applied to digital filtering and to an audio compression codec. In Chapter 4, an identification procedure is presented for a specific kind of block-oriented model: the Nonlinear Feedback model. By estimating the Best Linear Approximation of the system and by rearranging the model's structure, the identification of the feedback model parameters is reduced to a linear problem. The numerical parameter values obtained by solving the linear problem are then used as starting values for a nonlinear optimization procedure. The proposed method is illustrated on measurements obtained from a physical system. Chapter 5 introduces the Polynomial Nonlinear State Space (PNLSS) model and studies its approximation capabilities.
Next, a link is established between this model and a number of classical block-oriented models, such as Hammerstein and Wiener models. Furthermore, by means of two simple examples, we illustrate that the proposed model class is broader than the Volterra framework. In the last part of Chapter 5, a general identification procedure is presented which utilizes the Best Linear Approximation of the nonlinear system. Next,

frequency domain subspace identification is employed to initialize the PNLSS model. The identification of the full PNLSS model is then regarded as a nonlinear optimization problem. In Chapter 6, the proposed identification procedure is applied to measurements from various real-life systems. The SISO test cases comprise three electronic circuits (the Silverbox, a Wiener-Hammerstein system, and an RF crystal detector), and two mechanical set-ups (a quarter car set-up and a robot arm). Furthermore, two mechanical MISO applications are discussed (a combine harvester and a semi-active magneto-rheological damper). Finally, Chapter 7 deals with the conclusions and some ideas on further research.

1.5 Contributions

The main goal of this thesis is to study and design tools which allow the practicing engineer to qualify, to understand, and to model nonlinear systems. In this context, the contributions of this thesis are:

- The characterization of DSP systems/algorithms via the Best Linear Approximation and multisine excitation signals.
- A method to generate starting values for a block-oriented, Nonlinear Feedback model with a static nonlinearity in the feedback loop.
- A method that initializes the Polynomial NonLinear State Space (PNLSS) model by means of the BLA of the Device Under Test.
- The establishment of a link between the PNLSS model structure and five classical block-oriented models.
- The application of the proposed identification method to several real-life measurement problems.

1.6 Publication List

Chapter 3 was published as: J. Paduart, J. Schoukens, Y. Rolain. Fast Measurement of Quantization Distortions in DSP Algorithms. IEEE Transactions on Instrumentation and Measurement, vol. 56, no. 5, pp , .

The major part of Chapter 4 was presented at the Nolcos 2004 conference: J. Paduart, J. Schoukens. Fast Identification of systems with nonlinear feedback. Proceedings of the 6th IFAC Symposium on Nonlinear Control Systems, Stuttgart, Germany, pp , .

The comparative study between the PNLSS model and the block-oriented models from Chapter 5 was presented at IMTC 2007: J. Paduart, J. Schoukens, L. Gommé. On the Equivalence between some Block-oriented Nonlinear Models and the Nonlinear Polynomial State Space Model. Proceedings of the IEEE Instrumentation and Measurement Technology Conference, Warsaw, Poland, pp. 1-6, .

The identification of the PNLSS model and its application to two real-life set-ups was presented at the SYSID 2006 conference: J. Paduart, J. Schoukens, R. Pintelon, T. Coen. Nonlinear State Space Modelling of Multivariable Systems. Proceedings of the 14th IFAC Symposium on System Identification, Newcastle, Australia, pp , .

The application of the PNLSS model to a quarter car set-up led to the following publication:

J. Paduart, J. Schoukens, K. Smolders, J. Swevers. Comparison of two different nonlinear state-space identification algorithms. Proceedings of the International Conference on Noise and Vibration Engineering, Leuven, Belgium, pp , .

Finally, the cooperation with colleagues from the ELEC Department (Vrije Universiteit Brussel) and the PMA and BIOSYST-MeBioS Departments (KULeuven) resulted in the following publications:

J. Schoukens, J. Swevers, J. Paduart, D. Vaes, K. Smolders, R. Pintelon. Initial estimates for block structured nonlinear systems with feedback. Proceedings of the International Symposium on Nonlinear Theory and its Applications, Brugge, Belgium, pp , .

J. Schoukens, R. Pintelon, J. Paduart, G. Vandersteen. Nonparametric Initial Estimates for Wiener-Hammerstein systems. Proceedings of the 14th IFAC Symposium on System Identification, Newcastle, Australia, pp , .

T. Coen, J. Paduart, J. Anthonis, J. Schoukens, J. De Baerdemaeker. Nonlinear system identification on a combine harvester. Proceedings of the American Control Conference, Minneapolis, Minnesota, USA, pp , .

CHAPTER 2 THE BEST LINEAR APPROXIMATION

In this chapter, we introduce the Best Linear Approximation in an intuitive way. Next, the excitation signals used throughout this thesis are presented, followed by a formal definition of the Best Linear Approximation. We then explain how the properties of a multisine excitation signal can be exploited to quantify and qualify the nonlinear behaviour of a system. Finally, we show how the Best Linear Approximation of a nonlinear system can be obtained.

2.1 Introduction

As was shown in the introductory chapter, linear models have many attractive properties. Therefore, it can be useful to approximate nonlinear systems by linear models. Since we are dealing with an approximation, model errors will be present. Hence, a framework needs to be selected in order to decide in which sense the approximate linear model is optimal. We will use a classical approach and minimize the errors in mean square sense.

Definition 2.1 (Best Linear Approximation) The Best Linear Approximation (BLA) is defined as the model G belonging to the set of linear models, such that

    G_BLA = arg min_G E{ |y(t) - G(u(t))|^2 },    (2-1)

where u(t) and y(t) are the input and output of the nonlinear system, respectively.

In general, the Best Linear Approximation G_BLA of a nonlinear system depends on the amplitude distribution, the power spectrum, and the higher order moments of the stochastic input u(t) [56],[19],[21]. The amplitude dependency is illustrated by means of a short simulation example.

Example 2.2 Consider the following static nonlinear system:

    y = tanh(u),    (2-2)

and three white excitation signals drawn from different distributions: a uniform, a Gaussian, and a binary distribution (Figure 2-1 (a)). The parameters of these distributions are chosen such that their variance is equal to one. Figure 2-1 (b) shows the BLA (grey) for the three distributions, together with the static nonlinearity (black). In this set-up, the BLA is a straight line through the origin. Note that in general, a static nonlinearity does not necessarily have a static BLA [19]. It can be seen from Figure 2-1 (b) that the slope g of the BLA changes from distribution to distribution.

From this example, it is clear that the properties of the input are of paramount importance when the BLA of a nonlinear system is determined. That is why we will start by discussing the class of excitation signals used throughout this thesis. Next, a formal definition of the Best Linear Approximation is given. Then, we will demonstrate how multisine signals can be used to quantify and qualify the nonlinear behaviour of the Device Under Test (DUT). Finally, we will show how the BLA can be determined for SISO and MIMO systems.

Figure 2-1. (a) Three different probability density functions; (b) Static nonlinearity (black) and BLA (grey).
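The slopes in Example 2.2 can be checked numerically. The sketch below (a Monte Carlo experiment of our own, not part of the thesis; variable and function names are ours) draws unit-variance inputs from the three distributions and computes the least-squares slope E{u·y}/E{u²} of the BLA of y = tanh(u):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Three zero-mean, unit-variance input distributions, as in Example 2.2.
inputs = {
    "uniform": rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), n),
    "gaussian": rng.standard_normal(n),
    "binary": rng.choice([-1.0, 1.0], n),
}

def bla_slope(u):
    """Least-squares slope of the BLA of y = tanh(u): E{u*y} / E{u^2}."""
    y = np.tanh(u)
    return np.dot(u, y) / np.dot(u, u)

slopes = {name: bla_slope(u) for name, u in inputs.items()}
```

For the binary input (u = ±1) the slope is exactly tanh(1) ≈ 0.76, which matches the value that survives in the annotation of Figure 2-1; the uniform and Gaussian inputs give smaller slopes.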

2.2 Class of Excitation Signals

Since the Best Linear Approximation of a nonlinear system depends on the properties of the applied input signal, it is important to define the kind of excitations that will be employed, and to discuss their properties. In this thesis, we will utilize the class of Gaussian excitation signals with a user-defined power spectrum. Furthermore, it is required that the signals are stationary such that their power spectrum is well defined. Three excitation signals that are commonly used belong to this class: Gaussian noise, Gaussian periodic noise, and random phase multisines (Figure 2-2). For periodic noise and random phase multisines, this membership is only asymptotic, i.e., for the number of excited frequency components going to infinity (N → ∞). We will restrict ourselves to Gaussian noise and random phase multisines, and give a definition and a brief overview of some of the properties of these signals.

Figure 2-2. Class of excitation signals. *: asymptotic result, for the number of excited frequency components going to infinity.

Random Phase Multisine

Definition 2.3 (Random Phase Multisine) A random phase multisine is a periodic signal, defined as a sum of harmonically related sine waves:

    u(t) = (1/√N) Σ_{k=-N}^{N} U_k e^{j(2πk(f_max/N)t + φ_k)}    (2-3)

with φ_{-k} = -φ_k, U_{-k} = U_k = U(k f_max/N), and f_max the maximum frequency of the excitation signal. The amplitudes U(f) are chosen in a custom fashion, according to the user-defined power spectrum that should be realized. The phases φ_k are the realizations of an independent distributed random process such that E{e^{jφ_k}} = 0.

The factor 1/√N serves as a normalization such that, asymptotically (N → ∞), the power of the multisine remains finite, and its Root Mean Square (RMS) value stays constant as N increases. A typical choice is to take φ_k uniformly distributed over [0, 2π), but for instance discrete phase distributions can be used as well, as long as E{e^{jφ_k}} = 0 holds. Note that the random phase multisine is asymptotically normally distributed (N → ∞), but in practice 20 excited lines already work very well for smoothly varying amplitude distributions [56],[62]. Next, we illustrate some of the properties of a random phase multisine signal with a short example.

Example 2.4 We consider a random phase multisine with N = 128, a flat power spectrum, and f_max = 0.25 Hz. Figure 2-3 (a) shows the histogram of this signal together with a theoretic Gaussian pdf having the same variance and expected value. Figure 2-3 (b) and (c) show the multisine in the time and frequency domain, respectively. From these plots, we see that the random phase multisine has a noisy behaviour in the time domain, and perfectly realizes the user-defined amplitude spectrum in the frequency domain.

The main advantage of random phase multisines is the fact that their periodicity can be exploited to distinguish the measurement noise from the nonlinear distortions [15]. Further in this chapter, we will go into detail about this property. A drawback is the need to introduce a settling time for the transients, which is common to periodic excitation signals.

Figure 2-3. Some properties of a Random Odd, Random Phase Multisine: (a) Histogram (black) and theoretic Gaussian pdf (grey), (b) Time domain and (c) Frequency domain representation (DFT spectrum).
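A random phase multisine according to Definition 2.3 is straightforward to synthesize with an inverse FFT. The sketch below is our own illustration (function name and the normalization choice are ours: instead of carrying the 1/√N factor explicitly, the signal is simply rescaled to unit RMS):

```python
import numpy as np

def random_phase_multisine(n_samples, excited_lines, rng):
    """One period of a random phase multisine (cf. Definition 2.3), unit RMS."""
    # Positive-frequency half spectrum; irfft enforces U_{-k} = conj(U_k),
    # i.e. phi_{-k} = -phi_k, so u(t) is real-valued.
    spectrum = np.zeros(n_samples // 2 + 1, dtype=complex)
    phases = rng.uniform(0.0, 2.0 * np.pi, len(excited_lines))  # E{e^{j phi}} = 0
    spectrum[excited_lines] = np.exp(1j * phases)               # flat amplitude spectrum
    u = np.fft.irfft(spectrum, n_samples)
    return u / np.sqrt(np.mean(u ** 2))                         # normalize the RMS to one

# 128 excited lines in a record of 512 samples.
u = random_phase_multisine(512, np.arange(1, 129), np.random.default_rng(1))
```

Its DFT is exactly zero outside the excited lines and perfectly flat on them, while the time domain waveform looks like noise, as in Figure 2-3.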

Gaussian Noise

Definition 2.5 (Gaussian Noise) A Gaussian noise signal is a random sequence drawn from a Gaussian distribution with a user-defined power spectral density.

Example 2.6 An example of a Gaussian noise signal is shown in Figure 2-4 (b). To generate this sequence, a signal of N = 128 samples was drawn from a normal distribution. In order to achieve the same bandwidth as the random phase multisine from Figure 2-3, the signal was filtered using a 6th order Butterworth filter with a cut-off frequency of 0.25 Hz. Finally, to obtain the same RMS value as for the multisine example, the amplitude of the filtered sequence was normalized. Figure 2-4 (a) shows the histogram of this signal together with a theoretic Gaussian pdf. In Figure 2-4 (c), the DFT spectrum of the sequence is plotted.

The DFT spectrum of the Gaussian noise contains dips that can lead to unfavourable results such as a low SNR at some frequencies. Two more disadvantages are associated with the non periodic nature of random Gaussian noise. First of all, no distinction can be made between the measurement noise and the nonlinear distortions using simple tools. Secondly, leakage errors are present when computing the DFT of this signal. Note that when comparing Figure 2-3 and Figure 2-4, it is impossible to distinguish a random phase multisine from a random Gaussian noise sequence based on their histogram and time domain waveform.

Figure 2-4. Some properties of filtered random Gaussian Noise: (a) Histogram (black) and Gaussian pdf (grey), (b) Time domain and (c) Frequency domain representation (DFT spectrum).
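Example 2.6 can be reproduced along the following lines. This sketch is ours and substitutes an ideal (brick-wall) FFT low-pass filter for the 6th order Butterworth filter used in the thesis; that is enough to show the band limitation and the RMS normalization (the function name is hypothetical):

```python
import numpy as np

def bandlimited_gaussian_noise(n_samples, cutoff, fs, rng):
    """White Gaussian noise, low-pass filtered and rescaled to unit RMS.

    A brick-wall FFT filter stands in for the 6th order Butterworth filter
    of Example 2.6; the qualitative picture (Figure 2-4) is the same.
    """
    e = rng.standard_normal(n_samples)
    E = np.fft.rfft(e)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    E[freqs > cutoff] = 0.0                  # remove everything above the cut-off
    u = np.fft.irfft(E, n_samples)
    return u / np.sqrt(np.mean(u ** 2))      # normalize the RMS value to one

u = bandlimited_gaussian_noise(4096, 0.25, 1.0, np.random.default_rng(2))
```

Unlike the multisine, the amplitude spectrum of this sequence is only flat on average: individual realizations show the dips mentioned above.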

2.3 Properties of the Best Linear Approximation

Single Input, Single Output Systems

Consider a noiseless Single Input, Single Output (SISO) nonlinear system S with an input u and an output y (see Figure 2-5 (a)). We make the following assumption on S.

Assumption 2.7 There exists a uniformly bounded Volterra series of which the output converges in mean square sense to the output y of S for inputs u belonging to the class of excitation signals defined above. These systems are also called Wiener, or PISPOT systems. This class includes discontinuities like quantizers or relays, and excludes chaotic behaviour or systems with bifurcations.

Theorem 2.8 If S satisfies Assumption 2.7, it can be modelled as the sum of a linear system G_BLA(jω), called the Best Linear Approximation, and a noise source y_s. The Best Linear Approximation is calculated as

    G_BLA(jω) = S_yu(jω) / S_uu(jω),    (2-4)

where S_uu(jω) is the auto-power spectrum of the input, and S_yu(jω) the cross-power spectrum between the output and the input.

Relation (2-4) is obtained by calculating the Fourier transform of the Wiener-Hopf equation, which in turn follows from equation (2-1) (see for instance [24] for this classic result). Note that by using equation (2-4) no causality is imposed on G_BLA(jω). In [53], it was shown that the BLAs for Gaussian noise and for random phase multisines with an equivalent power spectrum are asymptotically identical. The BLA for random phase multisines converges to G_BLA(jω) as the number of frequency components N goes to infinity:

    G_BLA,N(jω) = G_BLA(jω) + O(N^{-1})    (2-5)

The Best Linear Approximation for a nonlinear SISO system is illustrated in Figure 2-5 (b). The noise source y_s represents that part of the output y that cannot be captured by the linear model G_BLA(jω). Hence, for frequency ω_k we have that

    Y(jω_k) = G_BLA(jω_k) U(jω_k) + Y_s(jω_k).    (2-6)

Y_s(jω_k) depends on the particular input realization and exhibits a stochastic behaviour from realization to realization, with E{Y_s(jω_k)} = 0. Hence, G_BLA(jω) can be determined by averaging the system's response over several input realizations.

Figure 2-5. (a) SISO nonlinear system vs. (b) its alternative representation.

Multiple Input, Multiple Output Systems

In [17], the Best Linear Approximation was extended to a Multiple Input, Multiple Output (MIMO) framework. Consider a noiseless nonlinear system S with inputs u_i (i = 1, …, n_u) and outputs y_j (j = 1, …, n_y). As in the SISO case, S needs to fulfil some conditions in order to define the Best Linear Approximation.

Assumption 2.9 For all i and j, there exists a uniformly bounded MIMO Volterra series of which the outputs converge in mean square sense to the outputs y_j of S for inputs u_i belonging to the considered class of excitation signals.

Theorem 2.10 If S satisfies Assumption 2.9, it can be modelled as the sum of a linear system G_BLA(jω) and n_y noise sources y_s^(j). The Best Linear Approximation of S is calculated as

    G_BLA(jω) = S_yu(jω) S_uu^{-1}(jω),    (2-7)

where S_uu(jω) ∈ C^{n_u×n_u} is the auto-power spectrum of the inputs, and S_yu(jω) ∈ C^{n_y×n_u} the cross-power spectrum between the outputs and the inputs.

Also for MIMO systems, it was proven that the Best Linear Approximation for Gaussian noise and random phase multisines is asymptotically equivalent [17]. Figure 2-6 (a) shows a nonlinear MIMO system with n_u inputs and n_y outputs. In Figure 2-6 (b), the alternative representation of this system is given: the Best Linear Approximation G_BLA(jω) together with the n_y stochastic nonlinear noise sources Y_s^(j)(jω_k). For frequency ω_k, the following equation holds:

    Y(jω_k) = G_BLA(jω_k) U(jω_k) + Y_s(jω_k),    (2-8)

Figure 2-6. (a) MIMO nonlinear system vs. (b) its alternative representation.

with Y(jω_k) ∈ C^{n_y×1}, G_BLA(jω_k) ∈ C^{n_y×n_u}, U(jω_k) ∈ C^{n_u×1}, and Y_s(jω_k) ∈ C^{n_y×1}. Here also, we have that E{Y_s(jω_k)} = 0.
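The spectral division (2-4) can be tried out on a simulated example. The sketch below is ours: it estimates the BLA of the static system y = u + 0.1u³ from white Gaussian input data by averaging cross- and auto-power spectra over blocks. For this system and a unit-variance Gaussian input, the BLA is the constant 1 + 3·0.1·σ_u² = 1.3, a standard (Bussgang-type) result for static nonlinearities with Gaussian inputs:

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 400, 256                          # M data blocks of N samples each

S_yu = np.zeros(N // 2 + 1, dtype=complex)
S_uu = np.zeros(N // 2 + 1)
for _ in range(M):
    u = rng.standard_normal(N)           # white Gaussian input, variance 1
    y = u + 0.1 * u ** 3                 # static nonlinear system
    U, Y = np.fft.rfft(u), np.fft.rfft(y)
    S_yu += Y * np.conj(U) / M           # cross-power spectrum S_yu
    S_uu += np.abs(U) ** 2 / M           # auto-power spectrum S_uu

G_bla = S_yu / S_uu                      # equation (2-4), bin by bin
```

The estimated FRF is flat and close to 1.3 on all bins; the residual scatter around that value is exactly the stochastic nonlinear contribution Y_s of (2-6).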

2.4 Some Properties of Nonlinear Systems

From Definition 2.3, we know that the amplitude spectrum of a random phase multisine can be customized. We will demonstrate that this freedom can be used to detect, quantify and qualify the nonlinear behaviour of devices that satisfy Assumption 2.7. The tools developed here will be used extensively in Chapter 3 to characterize Digital Signal Processing (DSP) algorithms.

Response to a Sine Wave

Figure 2-7 shows the response of a linear (a) and a nonlinear (b) system to a sine wave. The output of the linear system consists of a sine wave, possibly with a modified amplitude and phase. The output spectrum of the nonlinear system, in general, contains additional spectral components, harmonically related to the input sine wave. Hence, the spectral components on the non excited spectral lines indicate the level of nonlinear behaviour of the DUT.

Figure 2-7. Response of (a) a linear system and (b) a nonlinear system to a sine wave.

Even and Odd Nonlinear Behaviour

Using this principle, we can also retrieve qualitative information about the nonlinear behaviour of the DUT. In Figure 2-8, two kinds of nonlinear systems are considered. In (a), an even nonlinear system is excited with a sine wave with a frequency f_0. The output spectrum of this system only contains contributions on the even harmonic lines. This is due

to the fact that the output spectrum of an even nonlinear system (e.g. y = e^{u²}) only contains even combinations of the input frequencies (e.g. f_0 + f_0, f_0 − f_0, ...). For an odd nonlinear system (b) (e.g. y = tanh(u)), the converse is true: its output spectrum consists of components on the odd frequency lines. Here, the output spectrum contains only odd combinations of the input frequencies (e.g. f_0 + f_0 + f_0, f_0 + f_0 − f_0, ...).

Figure 2-8. Response of (a) an even nonlinear system and (b) an odd nonlinear system to a sine wave.

The Multisine as a Detection Tool for Nonlinearities

Consider now the case of a multisine signal, applied to a nonlinear system (see Figure 2-9). We will show that by carefully choosing the spectrum of the multisine, we can qualify and quantify the nonlinear behaviour of the DUT. For a multisine having only odd frequency components (f_0, 3f_0, 5f_0, ...), the even nonlinearities will only generate spectral components at even frequencies, since an even combination of odd frequencies always yields an even frequency. Hence, the even frequency lines at the output can be used to detect even nonlinear behaviour of the DUT. Furthermore, odd combinations of odd frequency lines always result in odd frequency lines. Hence, when some of the odd frequency lines are not excited, they can serve to detect odd nonlinear behaviour of the DUT [56].

Figure 2-9. Response of a nonlinear system to a multisine.

Depending on which frequency lines are used to detect the nonlinear behaviour, several kinds of random phase multisines can be distinguished:

Full multisine: all frequencies up to f_max are excited,

    U_k = A for k = 1, …, N.    (2-9)

Odd multisine: only the odd lines are excited,

    U_k = A for odd k (k = 2n + 1 ≤ N, n a non-negative integer), and U_k = 0 elsewhere.    (2-10)

Since even nonlinearities only contribute to the even harmonic output lines, they do not disturb the FRF measurements in this case. Hence, a lower uncertainty is achieved for the BLA [64].

Random odd multisine: this is an odd multisine where the odd frequency lines are divided into groups with a predefined length, for instance, a block length of 4. In each block, all odd lines are excited (U_k = A) except for one line, which serves as a detection line for the odd nonlinearities. The frequency index of this line is randomly selected in each consecutive block [57]. This is the best way to reflect the nonlinear behaviour of the DUT; hence, this type of multisine should be the default option when employing random phase multisines. The ability to analyse the nonlinear behaviour of the DUT comes with a price: the frequency resolution diminishes or, equivalently, the measurement time increases.
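The three excitation grids above can be constructed as follows (a sketch of ours; the function name, block handling and return types are hypothetical). For the random odd multisine, one odd line per block of four is left unexcited as a detection line:

```python
import numpy as np

def multisine_grids(n_lines, block_len=4, rng=None):
    """Excited-line index sets for the full, odd, and random odd multisines."""
    if rng is None:
        rng = np.random.default_rng()
    full = np.arange(1, n_lines + 1)         # all lines up to f_max, cf. (2-9)
    odd = full[full % 2 == 1]                # only odd lines, cf. (2-10)
    random_odd = []
    # Split the odd grid into blocks; drop one randomly chosen line per block
    # as a detection line for the odd nonlinear distortions.
    for start in range(0, len(odd) - block_len + 1, block_len):
        block = odd[start:start + block_len].tolist()
        block.pop(rng.integers(block_len))
        random_odd.extend(block)
    return full, odd, np.array(random_odd)

full, odd, rodd = multisine_grids(32, rng=np.random.default_rng(5))
```

For 32 lines this yields 16 odd lines; the random odd grid excites 3 out of every 4 of them, leaving 4 detection lines, which is the resolution penalty mentioned above.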

2.5 Estimating the Best Linear Approximation

In the last part of this chapter, we explain how the Best Linear Approximation of a nonlinear system can be determined. First, single input, single output systems are treated. Next, the results are extended to MIMO systems. Depending on the kind of excitation signals used during the experiments, the BLA is calculated differently. Therefore, we will make a distinction between periodic and non periodic data.

Single Input, Single Output Systems

A. Periodic Data

When using periodic excitation signals to determine the BLA of a SISO system, the experiments should be carried out according to the scheme depicted in Figure 2-10 [15]. In total, M different random phase multisines are applied, as shown on the vertical axis. After the transients have settled, we measure for each experiment P periods of the input and the output. Per experiment m and period p, we compute the DFT of the input (U(k)^[m,p]) and the output (Y(k)^[m,p]). Then, the spectra are averaged over the periods. For frequency k, we obtain:

    Û(k)^[m] = (1/P) Σ_{p=1}^{P} U(k)^[m,p]
    Ŷ(k)^[m] = (1/P) Σ_{p=1}^{P} Y(k)^[m,p]    (2-11)

For every experiment m, we then calculate the FRF estimate Ĝ(jω_k)^[m]:

    Ĝ(jω_k)^[m] = Ŷ(k)^[m] / Û(k)^[m],    (2-12)

which is equivalent to (2-4) for periodic excitations. Next, the M FRF estimates are combined in order to obtain the Best Linear Approximation Ĝ_BLA(jω_k):

    Ĝ_BLA(jω_k) = (1/M) Σ_{m=1}^{M} Ĝ(jω_k)^[m]    (2-13)

Figure 2-10. Experiment design to calculate the BLA of a SISO nonlinear system.

Furthermore, due to the periodic nature of the excitation signals, the effect of the nonlinear distortions and the measurement noise on the Best Linear Approximation can be distinguished from each other. The variations over the P periods stem from the measurement noise, while the variations over the M experiments are due to the combined effect of the measurement noise and the stochastic nonlinear behaviour. Note that non-stationary disturbances such as non-synchronous periodic signals can also be detected using the measurement scheme from Figure 2-10 [15].

First, we will determine the sample variance of Ĝ_BLA(jω_k) due to the measurement noise. A straightforward way to achieve this is to calculate the FRFs per period:

    Ĝ(jω_k)^[m,p] = Y(k)^[m,p] / U(k)^[m,p],    (2-14)

and to employ Ĝ(jω_k)^[m,p] to calculate the sample variance σ̂_n²[m] of Ĝ(jω_k)^[m], which is then given by

    σ̂_n²[m] = 1/(P(P−1)) Σ_{p=1}^{P} |Ĝ(jω_k)^[m,p] − Ĝ(jω_k)^[m]|².    (2-15)

The drawback of this approach is that in equation (2-14) raw input data are employed, without increasing the SNR by averaging over the periods. If the SNR of the input is low, the estimates Ĝ(jω_k)^[m,p] will be of poor quality: a non negligible bias is present (for SNR < 10 dB [55]), as well as a high uncertainty (for SNR < 20 dB [54]). A better option to determine σ̂_n²[m] is to use the covariance information from the averaged input and output spectra, and to apply a first order approximation [56]. First, we calculate the sample variances and covariance of the estimated spectra Û(k)^[m] and Ŷ(k)^[m]:

    σ̂_U²[m] = 1/(P−1) Σ_{p=1}^{P} |U^[m,p] − Û^[m]|²
    σ̂_Y²[m] = 1/(P−1) Σ_{p=1}^{P} |Y^[m,p] − Ŷ^[m]|²
    σ̂_YU²[m] = 1/(P−1) Σ_{p=1}^{P} (Y^[m,p] − Ŷ^[m])(U^[m,p] − Û^[m])^*    (2-16)

In (2-16), the frequency index k was omitted in order to simplify the formulas. From σ̂_U²[m], σ̂_Y²[m], and σ̂_YU²[m], the sample variance σ̂_n²[m] of Ĝ(jω_k)^[m] can be approximated by [56]:

    σ̂_n²[m] = (|Ĝ^[m]|²/P) ( σ̂_Y²[m]/|Ŷ^[m]|² + σ̂_U²[m]/|Û^[m]|² − 2Re( σ̂_YU²[m]/(Ŷ^[m] Û^[m]*) ) ),    (2-17)

with σ̂_n²[m] ∈ R. In a noiseless input framework, expression (2-17) simplifies to

    σ̂_n²[m] = (1/P) σ̂_Y²[m] / |Û^[m]|².    (2-18)

Next, the M estimates σ̂_n²[m] are averaged in order to acquire an improved estimate:

    (1/M) Σ_{m=1}^{M} σ̂_n²[m].    (2-19)

After applying the averaging law once more over the M independent estimates, we obtain the uncertainty of Ĝ_BLA due to the measurement noise:

    σ̂_n² = (1/M²) Σ_{m=1}^{M} σ̂_n²[m].    (2-20)

Furthermore, the combined effect of the stochastic nonlinear behaviour and the measurement noise can also be measured by calculating the total sample variance σ̂²_BLA from the M estimates Ĝ(jω_k)^[m]. It is determined as

    σ̂²_BLA = 1/(M(M−1)) Σ_{m=1}^{M} |Ĝ^[m] − Ĝ_BLA|².    (2-21)

The total variance of the BLA is equal to the sum of the measurement noise variance σ_n²(k) and the variance due to the stochastic nonlinear contributions σ²_NL(k):

    σ²_BLA(k) = σ²_NL(k) + σ_n²(k).    (2-22)

Hence, σ̂²_NL(k) is estimated with

    σ̂²_NL(k) = σ̂²_BLA(k) − σ̂_n²(k).    (2-23)

To conclude, for frequency index k we now have:

Ĝ_BLA(jω_k): the Best Linear Approximation.
σ̂²_BLA(k): the total sample variance of the estimate Ĝ_BLA(jω_k), due to the combined effect of the measurement noise and the stochastic nonlinear behaviour.
σ̂_n²(k): the measurement noise sample variance of the estimate Ĝ_BLA(jω_k).
σ̂²_NL(k): the sample variance of the stochastic nonlinear contributions.
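The whole periodic-data procedure, equations (2-11) to (2-23), fits in a short simulation. The sketch below is ours; the plant y = u + 0.05u³ plus a little additive output noise is a stand-in for a real DUT, and the noiseless-input variance formula (2-18) applies because the simulated input is exactly periodic and noise free:

```python
import numpy as np

rng = np.random.default_rng(6)
M, P, N = 8, 16, 256                       # experiments, periods, samples per period
lines = np.arange(1, 60)                   # excited DFT lines

G = np.zeros((M, len(lines)), dtype=complex)
var_n = np.zeros((M, len(lines)))
for m in range(M):
    # One random phase multisine realization per experiment, unit RMS.
    spec = np.zeros(N // 2 + 1, dtype=complex)
    spec[lines] = np.exp(1j * rng.uniform(0, 2 * np.pi, len(lines)))
    u = np.fft.irfft(spec, N)
    u /= np.sqrt(np.mean(u ** 2))
    u_all = np.tile(u, P)                  # P periods, transient free by construction
    y_all = u_all + 0.05 * u_all ** 3 + 1e-3 * rng.standard_normal(P * N)
    U = np.fft.rfft(u_all.reshape(P, N), axis=1)[:, lines]
    Y = np.fft.rfft(y_all.reshape(P, N), axis=1)[:, lines]
    U_hat, Y_hat = U.mean(axis=0), Y.mean(axis=0)            # (2-11)
    G[m] = Y_hat / U_hat                                     # (2-12)
    var_Y = np.sum(np.abs(Y - Y_hat) ** 2, axis=0) / (P - 1)
    var_n[m] = var_Y / (P * np.abs(U_hat) ** 2)              # (2-18)

G_bla = G.mean(axis=0)                                       # (2-13)
var_noise = np.sum(var_n, axis=0) / M ** 2                   # (2-19)-(2-20)
var_bla = np.sum(np.abs(G - G_bla) ** 2, axis=0) / (M * (M - 1))  # (2-21)
var_NL = var_bla - var_noise                                 # (2-23)
```

With this low noise level, the total variance is dominated by the stochastic nonlinear contributions, exactly the situation in which the split (2-22) is informative.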

B. Non Periodic Data

A combination of higher order correlation tests can be used to detect unmodelled nonlinearities for arbitrary excitations [20]. In contrast to the case of periodic excitation signals, however, no simple methods exist to distinguish between the nonlinear distortions and the effect of measurement noise when non periodic excitations are used. Furthermore, leakage errors will be present when calculating the input and output DFT spectra [53]. However, it is still possible to characterize the combined effect of nonlinear distortions and measurement noise if we assume a noiseless input framework.

First, the measured input and output time domain data are split into M blocks. In order to reduce the leakage effect, a Hanning or a diff window can for instance be applied to the signals; we will employ the latter [65]. Then, the input and output DFT spectra of each block are calculated. The next step is to calculate the sample cross-power spectrum between the output and the input Ŝ_YU(k) and the auto-power spectrum of the input Ŝ_UU(k), using Welch's method [84]:

    Ŝ_XY = (1/M) Σ_{m=1}^{M} X^[m] Y^[m]H.    (2-24)

The BLA is then given by

    Ĝ_BLA(jω_k) = Ŝ_YU(k) / Ŝ_UU(k).    (2-25)

When working in a noiseless input framework, it can be shown that the following expression yields an unbiased estimate of the covariance of the output DFT Ŷ(k):

    σ̂_Y²(k) = M/(2(M−1)) ( Ŝ_YY(k) − Ĝ_BLA(jω_k) Ŝ_UY(k) ),    (2-26)

where the factor 2 stems from the usage of the diff window. From this expression, we can derive the uncertainty of Ĝ_BLA(jω_k) (see the end result of Appendix 2.B, simplified to the SISO case):

    σ̂²_BLA(k) = (1/M) σ̂_Y²(k) / Ŝ_UU(k).    (2-27)

To summarize, for frequency k we have:

Ĝ_BLA(jω_k): the Best Linear Approximation.
σ̂²_BLA(k): the total sample variance of the estimate Ĝ_BLA(jω_k), due to the combined effect of the measurement noise and the stochastic nonlinear behaviour.

Remark: when choosing the number of blocks M in which to split the input and output data, a trade-off is made between the leakage effects and the uncertainty due to the stochastic nonlinear behaviour and the measurement noise. When M is large, the time records per block, of length N, become shorter. Hence, the leakage will be more important, since leakage is an O(N^{-1}) effect for a rectangular window, and an O(N^{-2}) effect for a Hanning window [53]. Furthermore, the frequency resolution diminishes with increasing M. On the other hand, it can be seen from (2-27) that the uncertainty on the estimate Ĝ(k) reduces for a larger M. To conclude, with the number of blocks M we can balance between a lower variance and a better frequency resolution of the BLA.
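For non periodic data, the procedure (2-24)-(2-27) looks as follows in code. This sketch is ours; a Hanning window stands in for the diff window employed in the thesis, so the plain M/(M−1) scaling is used instead of the diff-window factor of (2-26):

```python
import numpy as np

rng = np.random.default_rng(7)
M, N = 64, 256                       # number of blocks and block length
win = np.hanning(N)                  # stand-in for the diff window of the thesis

S_yy = np.zeros(N // 2 + 1)
S_uu = np.zeros(N // 2 + 1)
S_yu = np.zeros(N // 2 + 1, dtype=complex)
u_all = rng.standard_normal(M * N)   # one long non periodic Gaussian record
y_all = u_all + 0.1 * u_all ** 3     # static nonlinear system
for m in range(M):
    U = np.fft.rfft(win * u_all[m * N:(m + 1) * N])
    Y = np.fft.rfft(win * y_all[m * N:(m + 1) * N])
    S_uu += np.abs(U) ** 2 / M       # Welch auto-power spectrum, cf. (2-24)
    S_yy += np.abs(Y) ** 2 / M
    S_yu += Y * np.conj(U) / M       # Welch cross-power spectrum

G_bla = S_yu / S_uu                                        # (2-25)
# Residual output power not explained by the BLA, cf. (2-26); note that
# G_bla * S_uy = |S_yu|^2 / S_uu = |G_bla|^2 * S_uu.
var_y = M / (M - 1) * (S_yy - np.abs(G_bla) ** 2 * S_uu)
var_bla = var_y / (M * S_uu)                               # (2-27)
```

By the Cauchy-Schwarz inequality the residual power is nonnegative bin by bin, so the variance estimate is always well defined.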

Multiple Input, Multiple Output Systems

Next, we explain how the Best Linear Approximation is estimated for MIMO systems. In the case of periodic excitation signals, the essential difference between the SISO and the MIMO framework is the need for multiple experiments. This stems from the fact that the influences of the different inputs, superposed in a single experiment, need to be separated. Again, a distinction is made between periodic and non periodic data.

A. Periodic Data

When periodic signals are employed to determine the Best Linear Approximation of a MIMO system, the experiments are usually carried out according to the scheme depicted in Figure 2-11. In total, M blocks of n_u experiments are performed, as shown on the vertical axis. After the transients have settled, P periods of the input and the output are measured for each experiment. Per block m and period p, we assemble the DFT spectra of the n_u inputs and n_y outputs in the matrices U(k)^[m,p] ∈ C^{n_u×n_u} and Y(k)^[m,p] ∈ C^{n_y×n_u}, respectively. Then, the input and output spectra are averaged over the periods per block m. For frequency k, we have:

    Û(k)^[m] = (1/P) Σ_{p=1}^{P} U(k)^[m,p]
    Ŷ(k)^[m] = (1/P) Σ_{p=1}^{P} Y(k)^[m,p]    (2-28)

For every block m, we now calculate an FRF estimate Ĝ(jω_k)^[m] ∈ C^{n_y×n_u}:

    Ĝ(jω_k)^[m] = Ŷ(k)^[m] (Û(k)^[m])^{-1}.    (2-29)

The M FRF estimates Ĝ(jω_k)^[m] are then combined in order to obtain the Best Linear Approximation Ĝ_BLA(jω_k):

    Ĝ_BLA(jω_k) = (1/M) Σ_{m=1}^{M} Ĝ(jω_k)^[m]    (2-30)

Figure 2-11. Experiment design to calculate the BLA of a MIMO nonlinear system.

Again, we are able to make a distinction between the nonlinear distortions and the effect of the measurement noise. First, we will determine the sample covariance matrix of Ĝ_BLA(jω_k) due to the measurement noise. For the discussion of why the calculation should be carried out via the covariances of the input and output spectra, we refer to the SISO case (Periodic Data on p. 28). The sample covariance matrices of the averaged DFT spectra Û(k)^[m] and Ŷ(k)^[m] are given by:

    Ĉ_Û^[m] = 1/(P(P−1)) Σ_{p=1}^{P} vec(U^[m,p] − Û^[m]) vec(U^[m,p] − Û^[m])^H
    Ĉ_Ŷ^[m] = 1/(P(P−1)) Σ_{p=1}^{P} vec(Y^[m,p] − Ŷ^[m]) vec(Y^[m,p] − Ŷ^[m])^H
    Ĉ_ŶÛ^[m] = 1/(P(P−1)) Σ_{p=1}^{P} vec(Y^[m,p] − Ŷ^[m]) vec(U^[m,p] − Û^[m])^H    (2-31)

In (2-31), the frequency index k was omitted in order to simplify the formulas. From Ĉ_Û^[m] ∈ C^{n_u n_u × n_u n_u}, Ĉ_Ŷ^[m] ∈ C^{n_y n_u × n_y n_u}, and Ĉ_ŶÛ^[m] ∈ C^{n_y n_u × n_u n_u}, the sample covariance Ĉ_n^[m] of Ĝ^[m] is estimated with (see Appendix 2.A):

    Ĉ_n^[m] = ((Û^[m])^{-T} ⊗ I_{n_y}) Ĉ_Ŷ^[m] ((Û^[m])^{-T} ⊗ I_{n_y})^H
            + ((Û^[m])^{-T} ⊗ Ĝ^[m]) Ĉ_Û^[m] ((Û^[m])^{-T} ⊗ Ĝ^[m])^H
            − 2herm( ((Û^[m])^{-T} ⊗ I_{n_y}) Ĉ_ŶÛ^[m] ((Û^[m])^{-T} ⊗ Ĝ^[m])^H )    (2-32)

with Ĉ_n^[m] ∈ C^{n_y n_u × n_y n_u}. In a noiseless input framework, expression (2-32) simplifies to

    Ĉ_n^[m] = ((Û^[m])^{-T} ⊗ I_{n_y}) Ĉ_Ŷ^[m] ((Û^[m])^{-T} ⊗ I_{n_y})^H.    (2-33)

Since M estimates Ĉ_n^[m] are at our disposal, we can combine them in order to obtain an improved estimate of the covariance matrix that characterizes the measurement noise:

    Ĉ_n = (1/M²) Σ_{m=1}^{M} Ĉ_n^[m].    (2-34)

The combined effect of the stochastic nonlinear behaviour and the measurement noise is characterized by the total sample covariance Ĉ_BLA. This quantity is determined from the M estimates Ĝ(jω_k)^[m]:

    Ĉ_BLA = 1/(M(M−1)) Σ_{m=1}^{M} vec(Ĝ^[m] − Ĝ_BLA) vec(Ĝ^[m] − Ĝ_BLA)^H.    (2-35)

The total covariance of the BLA is equal to the sum of the measurement noise covariance C_n(k) and the covariance due to the stochastic nonlinear contributions C_NL(k):

    C_BLA(k) = C_NL(k) + C_n(k).    (2-36)

Hence, Ĉ_NL(k) is estimated with

    Ĉ_NL(k) = Ĉ_BLA(k) − Ĉ_n(k).    (2-37)

To conclude, for frequency k we have:

Ĝ_BLA(jω_k): the Best Linear Approximation.
Ĉ_BLA(k): the total sample covariance matrix of the estimate Ĝ_BLA(jω_k), due to the combined effect of the measurement noise and the stochastic nonlinear behaviour.
Ĉ_n(k): the measurement noise sample covariance matrix of the estimate Ĝ_BLA(jω_k).
Ĉ_NL(k): the sample covariance of the stochastic nonlinear contributions.

Remark: when random phase multisines are used to perform FRF measurements on a multivariable system, it is possible to make an optimal choice for the phases. Within a block of experiments m, orthogonal random phase multisines should be used whenever possible, as they are optimal in the sense that they minimize the variance of the estimated BLA [18],[85]. When these signals are used, the condition number of the matrix Û(k)^[m] in equation (2-29) equals one. Hence, the inverse of Û(k)^[m] in (2-29) is calculated in optimal numerical conditions. Orthonormal random phase multisines are created in the following way. First, n_u ordinary random phase multisines are generated: U_1, …, U_{n_u}. Then, a unitary matrix W ∈ C^{n_u×n_u} is used to define the excitation signals for the n_u experiments. For frequency k, the applied signal is:

    U(k) = [ w_11 U_1(k) … w_{1 n_u} U_{n_u}(k) ; … ; w_{n_u 1} U_1(k) … w_{n_u n_u} U_{n_u}(k) ],    (2-38)

where we omitted the index [m]. For W, the DFT matrix can for instance be used:

    w_kl = (1/√n_u) exp( j2π(k−1)(l−1)/n_u ),    (2-39)

where w_kl is the element of W at position (k, l). In the case of a system with three inputs (n_u = 3), we have

    W = (1/√3) [ 1 1 1 ; 1 e^{j2π/3} e^{j4π/3} ; 1 e^{j4π/3} e^{j8π/3} ].    (2-40)

B. Non Periodic Data

Contrary to the case of periodic excitation signals, no simple methods are available to distinguish between the nonlinear distortions and the noise effects when non periodic excitations are used. Furthermore, leakage errors will be present when calculating the input and the output spectra. However, it is still possible to characterize the combined effect of the nonlinear distortions and the measurement noise. First, the input and output data are split into M > n_u blocks. Then, the input and output DFT spectra are calculated per block. Again, leakage effects are diminished by means of a Hanning or diff window [65]. The next step is to calculate the sample cross-power spectrum of the inputs and outputs Ŝ_YU(k) and the auto-power spectrum of the inputs Ŝ_UU(k) using (2-24). The Best Linear Approximation is then given by

    Ĝ_BLA(jω_k) = Ŝ_YU(k) Ŝ_UU^{-1}(k).    (2-41)

If we assume noiseless inputs, the following expression can be used to calculate the covariance of the output spectrum:

    Ĉ_Y(k) = M/(2(M−n_u)) ( Ŝ_YY(k) − Ĝ_BLA(jω_k) Ŝ_UY(k) ),    (2-42)

where the factor 2 stems from using a diff window. Making use of Ĉ_Y(k), the uncertainty of Ĝ_BLA(jω_k) can be derived (see Appendix 2.B):

    Ĉ_BLA(k) = (1/M) Ŝ_UU^{-T}(k) ⊗ Ĉ_Y(k).    (2-43)
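Returning briefly to the remark on orthogonal multisines: the construction (2-38)-(2-40) is easy to verify numerically. In this sketch of ours, three multisine spectra with equal amplitudes at one frequency bin are combined with the DFT matrix W; because W is unitary and the amplitudes are equal, the resulting input matrix has condition number one:

```python
import numpy as np

def dft_matrix(n_u):
    """Unitary DFT matrix of (2-39): w_kl = exp(j*2*pi*(k-1)*(l-1)/n_u)/sqrt(n_u)."""
    k, l = np.meshgrid(np.arange(n_u), np.arange(n_u), indexing="ij")
    return np.exp(2j * np.pi * k * l / n_u) / np.sqrt(n_u)

n_u = 3
W = dft_matrix(n_u)
# U_1(k)..U_3(k) at one frequency bin: equal amplitudes, random-looking phases.
U_lines = np.exp(1j * np.array([0.3, 1.7, -2.1]))
# Element (e, l) = w_el * U_l(k), cf. (2-38): one experiment per row (our convention).
U_k = W * U_lines[np.newaxis, :]
```

Since U_k is a unitary matrix times a diagonal matrix of unit-modulus entries, all its singular values equal one, which is the optimal numerical condition mentioned above for the inversion in (2-29).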

To summarize, for frequency k we have:

Ĝ_BLA(jω_k): the Best Linear Approximation.
Ĉ_BLA(k): the total sample covariance matrix of the estimate Ĝ_BLA(jω_k), due to the combined effect of the measurement noise and the stochastic nonlinear behaviour.

Remark: again, a trade-off is made when choosing the number of blocks M. See the SISO case (Non Periodic Data on p. 32) for a discussion.
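As a closing illustration of the MIMO formulas, the sketch below (ours; the system matrix and distortion level are invented for the example) estimates the BLA of a static 2x2 system with a mild cubic distortion at the outputs via per-bin matrix algebra, as in (2-7) and (2-41). For Gaussian inputs, the expected BLA is the true gain matrix with each row scaled by 1 + 0.15 times the variance of the corresponding linear output, a Bussgang-type result:

```python
import numpy as np

rng = np.random.default_rng(4)
A = np.array([[1.0, 0.5], [-0.3, 2.0]])   # true (static) 2x2 MIMO system
M, N, n_u = 200, 128, 2

n_bins = N // 2 + 1
S_yu = np.zeros((n_bins, n_u, n_u), dtype=complex)
S_uu = np.zeros((n_bins, n_u, n_u), dtype=complex)
for _ in range(M):
    u = rng.standard_normal((n_u, N))     # white Gaussian inputs
    y = A @ u + 0.05 * (A @ u) ** 3       # mild odd nonlinearity at the outputs
    U = np.fft.rfft(u, axis=1)            # shape (n_u, n_bins)
    Y = np.fft.rfft(y, axis=1)
    for k in range(n_bins):
        S_yu[k] += np.outer(Y[:, k], np.conj(U[:, k])) / M
        S_uu[k] += np.outer(U[:, k], np.conj(U[:, k])) / M

# G_BLA(jw) = S_yu(jw) S_uu^{-1}(jw), evaluated bin by bin, cf. (2-7)/(2-41)
G_bla = np.stack([S_yu[k] @ np.linalg.inv(S_uu[k]) for k in range(n_bins)])
```

Averaging the per-bin estimates over the band confirms the expected row-scaled gain matrix; the scatter around it is the stochastic nonlinear contribution that Ĉ_BLA(k) quantifies.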

Appendix 2.A  Calculation of the FRF Covariance from the Input/Output Covariances

The measured input and output DFT coefficients for a block of n_u experiments are given by

    U(k) = U_0(k) + N_U(k)
    Y(k) = Y_0(k) + N_Y(k)   (2-44)

for frequencies k = 1, …, F. U_0(k) ∈ C^{n_u × n_u} and Y_0(k) ∈ C^{n_y × n_u} are the noiseless Fourier coefficients; N_U(k) and N_Y(k) are the contributions of all the noise sources in the experimental set-up.

Assumption 2.11 (Disturbing Noise): the input N_U(k) and output N_Y(k) errors satisfy the following set of equations:

    E{vec(N_U(k))} = 0
    E{vec(N_Y(k))} = 0
    E{vec(N_U(k)) vec(N_U(k))^H} = C_U(k)
    E{vec(N_Y(k)) vec(N_Y(k))^H} = C_Y(k)
    E{vec(N_Y(k)) vec(N_U(k))^H} = C_YU(k) = C_UY^H(k)   (2-45)

Furthermore, we assume that N_U(k) and N_Y(k) are independent of U_0(k) and Y_0(k). The FRF estimate G(jω_k) is given by

    G(jω_k) = Y(k) U^{-1}(k) = (Y_0(k) + N_Y(k)) (U_0(k) + N_U(k))^{-1}.   (2-46)

We will calculate the variability of the FRF estimate using a first order Taylor approximation. For notational simplicity, we omit the frequency index k in the following calculations. First, we isolate U_0^{-1} from U^{-1}:

    G = (Y_0 + N_Y) U_0^{-1} (I_{n_u} + N_U U_0^{-1})^{-1}.   (2-47)

Next, we apply the Taylor expansion, restricting ourselves to the first order terms. For small α, we have

    (I_{n_u} + α)^{-1} ≈ I_{n_u} − α.   (2-48)

When we apply this to (2-47), we obtain

    G ≈ (Y_0 + N_Y) U_0^{-1} (I_{n_u} − N_U U_0^{-1}),   (2-49)

and when we omit the second order terms in N_U and N_Y,

    G ≈ Y_0 U_0^{-1} − Y_0 U_0^{-1} N_U U_0^{-1} + N_Y U_0^{-1}.   (2-50)

We define

    G_0 = Y_0 U_0^{-1},   (2-51)

and then rewrite (2-50):

    G = G_0 + N_G = G_0 − G_0 N_U U_0^{-1} + N_Y U_0^{-1}.   (2-52)

Hence, we obtain

    N_G = N_Y U_0^{-1} − G_0 N_U U_0^{-1}.   (2-53)

In order to compute vec(N_G) as a function of vec(N_U) and vec(N_Y), we will apply the following vectorization property:

    vec(ABC) = (C^T ⊗ A) vec(B).   (2-54)

This results in

    vec(N_G) = (U_0^{-T} ⊗ I_{n_y}) vec(N_Y) − (U_0^{-T} ⊗ G_0) vec(N_U).   (2-55)

Next, we determine the covariance matrix C_G, which is defined as

    C_G = E{vec(N_G) vec(N_G)^H}.   (2-56)

By combining equations (2-45), (2-55) and (2-56), we obtain:

    C_G = (U_0^{-T} ⊗ I_{n_y}) C_Y (U_0^{-T} ⊗ I_{n_y})^H
        + (U_0^{-T} ⊗ G_0) C_U (U_0^{-T} ⊗ G_0)^H
        − 2 herm{ (U_0^{-T} ⊗ I_{n_y}) C_YU (U_0^{-T} ⊗ G_0)^H }.   (2-57)

Appendix 2.B  Covariance of the FRF for Non Periodic Data

If we assume a noiseless input framework, the measured input and output DFT spectra for block m are given by

    U^[m](k) = U_0^[m](k)
    Y^[m](k) = Y_0^[m](k) + N_Y^[m](k)   (2-58)

for frequencies k = 1, …, F. U_0(k) ∈ C^{n_u × 1} and Y_0(k) ∈ C^{n_y × 1} are the noiseless DFT spectra; N_Y^[m](k) represents the contributions of all the noise sources and the stochastic nonlinear behaviour in the experimental set-up.

Assumption 2.12 (Disturbing Noise): the output error N_Y^[m](k) satisfies the following set of equations:

    E{N_Y^[m](k)} = 0
    E{N_Y^[m](k) N_Y^[m](k)^H} = C_Y(k)   (2-59)

Furthermore, we assume that N_Y^[m](k) is uncorrelated with U_0(k) and Y_0(k). The noiseless FRF estimate Ĝ_0(jω_k) is given by

    Ĝ_0(jω_k) = Ŝ_{Y_0 U_0}(k) Ŝ_{U_0 U_0}^{-1}(k).   (2-60)

We will calculate the variability of the FRF estimate; for notational simplicity, we omit the frequency index k in the following calculations. From (2-58), we have

    Y^[m] = Y_0^[m] + N_Y^[m] = Ĝ_0 U_0^[m] + N_Y^[m].   (2-61)

We right-multiply both sides of equation (2-61) with (1/M) U_0^[m]H, and compute the summation over the block index m:

    (1/M) Σ_{m=1}^{M} Y^[m] U_0^[m]H = (1/M) Σ_{m=1}^{M} N_Y^[m] U_0^[m]H + Ĝ_0 (1/M) Σ_{m=1}^{M} U_0^[m] U_0^[m]H,   (2-62)

or, when we apply (2-24),

    Ŝ_{Y U_0} = (1/M) Σ_{m=1}^{M} N_Y^[m] U_0^[m]H + Ĝ_0 Ŝ_{U_0 U_0}.   (2-63)

We then right-multiply (2-63) with Ŝ_{U_0 U_0}^{-1} and use (2-60):

    Ĝ = Ĝ_0 + N_G = Ĝ_0 + (1/M) Σ_{m=1}^{M} N_Y^[m] U_0^[m]H Ŝ_{U_0 U_0}^{-1}.   (2-64)

Hence, we obtain

    N_G = (1/M) Σ_{m=1}^{M} N_Y^[m] U_0^[m]H Ŝ_{U_0 U_0}^{-1}.   (2-65)

In order to compute vec(N_G) as a function of vec(N_Y^[m]) = N_Y^[m], we apply the vectorization property

    vec(ABC) = (C^T ⊗ A) vec(B).   (2-66)

This results in

    vec(N_G) = (1/M) Σ_{m=1}^{M} ( (U_0^[m]H Ŝ_{U_0 U_0}^{-1})^T ⊗ I_{n_y} ) N_Y^[m].   (2-67)

Next, we determine the covariance matrix C_G, which is defined as

    C_G = E{vec(N_G) vec(N_G)^H}.   (2-68)

By combining equations (2-67) and (2-68), we obtain:

    C_G = (1/M²) E{ Σ_{m=1}^{M} Σ_{n=1}^{M} ( (U_0^[m]H Ŝ_{U_0 U_0}^{-1})^T ⊗ I_{n_y} ) N_Y^[m] N_Y^[n]H ( (U_0^[n]H Ŝ_{U_0 U_0}^{-1})^T ⊗ I_{n_y} )^H }.   (2-69)

In order to eliminate one of the two sums, we make use of the independence of N_Y^[m] and N_Y^[n] for m ≠ n, and of U_0^[m] and U_0^[n] for m ≠ n. We also have that N_Y^[m] is uncorrelated with U_0^[n] for any m and n. This results in

    C_G = (1/M²) Σ_{m=1}^{M} ( (U_0^[m]H Ŝ_{U_0 U_0}^{-1})^T ⊗ I_{n_y} ) E{N_Y^[m] N_Y^[m]H} ( (U_0^[m]H Ŝ_{U_0 U_0}^{-1})^T ⊗ I_{n_y} )^H.   (2-70)

Applying (2-59), together with (Ŝ_XX^{-1})^H = Ŝ_XX^{-1} and the fact that C_Y = 1 ⊗ C_Y, results in

    C_G = (1/M²) Σ_{m=1}^{M} ( Ŝ_{U_0 U_0}^{-T} Ū_0^[m] ⊗ I_{n_y} ) C_Y ( U_0^[m]T Ŝ_{U_0 U_0}^{-T} ⊗ I_{n_y} ).   (2-71)

Next, we make use of the Mixed-Product rule:

    (A ⊗ B)(C ⊗ D) = AC ⊗ BD,   (2-72)

provided that A, B, C, D have compatible matrix dimensions. We then obtain

    C_G = (1/M) Ŝ_{U_0 U_0}^{-T} ( (1/M) Σ_{m=1}^{M} U_0^[m] U_0^[m]H )^T Ŝ_{U_0 U_0}^{-T} ⊗ C_Y,   (2-73)

or finally, after using (2-24) and reintroducing the frequency index k:

    C_G(k) = (1/M) Ŝ_{U_0 U_0}^{-T}(k) ⊗ C_Y(k).   (2-74)


CHAPTER 3  FAST MEASUREMENT OF QUANTIZATION DISTORTIONS

A measurement technique is proposed to characterize, in one single experiment or simulation, the non-idealities of DSP algorithms which are induced by quantization effects, overflows (fixed point), or nonlinear distortions. The main idea is to apply specially designed multisine excitations such that a distinction can be made between the output of the ideal system and the contributions of the system's non-idealities. This approach makes it possible to compare and quantify the quality of different implementation alternatives. Applications of this method include, for instance, digital filters, FFTs, and audio codecs.

Chapter 3: Fast Measurement of Quantization Distortions

3.1 Introduction

Digital Signal Processing (DSP) systems have the advantage of being flexible when compared with analog circuits. However, they are prone to calculation errors, especially when a fixed point implementation is used. These errors induce non-ideal signal contributions at the output, such as quantization noise, limit cycles, and nonlinear distortions. In order to be sure that the design specifications are still met, it is necessary to verify the presence of these effects and to quantify them.

A simple approach uses single sine excitations to test the system. Unfortunately, this method does not reveal all problems and requires many experiments in order to cover the full frequency range. A more thorough approach consists in a theoretical analysis of the Device Under Test (DUT). A good example of this is given in [50], where the effects of coefficient quantization of Infinite Impulse Response (IIR) systems are analysed. The main disadvantage is that for every new DUT, or for every different implementation of the DUT, the full analysis needs to be repeated. This can be an involved and time-consuming task, as a different approach may be needed for every new DUT.

In this chapter, a method is proposed that detects and quantifies the quantization errors using one single multisine experiment with a well chosen amplitude spectrum. First, the measurement concept will be introduced. Then, a brief discussion of the major errors in DSP algorithms will be given. Finally, the results will be illustrated on a number of examples.

3.2 The Multisine as a Detection Tool for Non-idealities

Here, a special kind of multisine will be used, namely a Random Odd Multisine (see "The Multisine as a Detection Tool for Nonlinearities" on p. 26). Using this excitation signal, it is possible to extract the BLA G_BLA(jω) of the DSP system for the class of Gaussian excitations with a fixed Power Spectral Density [62],[66]. The BLA can be obtained in two ways. The first way is to average the measured transfer functions over different phase realizations of the input multisine (see "Estimating the Best Linear Approximation" on p. 28). The second way is to identify a low order parametric model from a single phase realization. In order to have a realistic idea of the level of nonlinear distortions, the power spectrum of the excitation signal should be chosen such that it coincides with the power spectrum of the input signal expected in later use.

To summarize, the aim of the proposed method is twofold:
1. extract the BLA in order to evaluate the linear behaviour of the DSP system;
2. detect and quantify the nonlinear effects in the DSP system in order to see whether the design specifications are met.

The main advantage of this method is that it can be used for any DSP system or algorithm, as long as the aim is to achieve a linear operation. A possible drawback is the limitation to a specific class of input signals. However, it must be said that the class of Gaussian signals is not that restrictive, since, for example, most telecommunication signals fall inside this class [4],[13].
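A random odd multisine of the kind used here can be sketched as follows. This is our own minimal construction, not the thesis code: only odd harmonics are excited, and within every group of `block` consecutive odd lines one randomly chosen line is left out as a detection line; the exact grid parameters of the experiments in this chapter may differ.

```python
import numpy as np

def random_odd_multisine(n_samples, f_max_rel, block=4, seed=None):
    """Random odd multisine: excite odd DFT bins up to f_max_rel * fs,
    dropping one randomly chosen bin per group of `block` odd bins
    (a detection line for odd nonlinearities).  Even bins are never
    excited, so they act as detection lines for even nonlinearities.
    Returns the time signal and the sorted array of excited bins."""
    rng = np.random.default_rng(seed)
    odd = np.arange(1, int(f_max_rel * n_samples), 2)
    excited = []
    for g in range(0, len(odd), block):
        grp = list(odd[g:g + block])
        if len(grp) == block:
            grp.pop(rng.integers(block))   # random detection line
        excited.extend(grp)
    excited = np.sort(np.array(excited))
    spectrum = np.zeros(n_samples // 2 + 1, dtype=complex)
    spectrum[excited] = np.exp(2j * np.pi * rng.random(len(excited)))
    return np.fft.irfft(spectrum, n_samples), excited
```

Checking the DFT of the generated signal confirms that only the intended odd bins carry power.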

3.3 DSP Errors

In this section, multisines are used to quantify the quantization distortions. A fixed point digital filter serves as an example. The advantages of a fixed point representation in DSP systems are well-known when compared to floating point processing: fixed point implementations are both faster and cheaper. However, they suffer from a number of serious drawbacks as well. Fixed point implementations require some knowledge about the expected dynamic range of the input signal and the intermediate signal levels. They induce a finite numerical precision and a finite dynamic range for the internal representation of the processed samples. In contrast with a fixed point representation, floating point arithmetic allows input numbers with a practically unlimited dynamic range. Depending on how the numbers are quantized (e.g. ceil, floor or round), different kinds of distortion arise in the fixed point implementation. The overflow behaviour (saturation or two's complement overflow) also plays an important role, because it can lead to a higher level of distortions and even to chaotic behaviour [8].

To investigate the influence of the finite quantization and the range problems, a fourth order Butterworth low pass filter is considered with a cut-off frequency of 0.2 Hz, to be operated at a sampling rate of f_s = 1 Hz. The normalized, non-quantized filter coefficients are (in Matlab notation, with a precision of five digits after the decimal point):

    b = [...]    a = [...]   (3-1)

with b the numerator and a the denominator coefficients. The filter is implemented in Direct Form (DF) II [50], with 32 bit wide accumulators and a 16 bit wide data memory, including one sign bit. The fixed point representation is illustrated in Figure 3-1. The accumulators consist of 14 integer bits and 17 fractional bits; the memory has 7 integer bits and 8 fractional bits.
These settings are summarized in Table 3-1, together with the largest and smallest numbers that can be represented, and the least significant bit (lsb). The lsb is nothing more than the numerical resolution.

Figure 3-1. Fixed point representation: 1: Sign bit - 2: Integer bits - 3: Fractional bits.
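The coefficient values elided in (3-1) can be reproduced with SciPy's filter design routines, and the truncation step discussed below can be mimicked with a small helper. This is a sketch: the helper name `quantize` and the choice of 8 fractional bits (the data-memory format) are our own.

```python
import numpy as np
from scipy import signal

# 4th order Butterworth low pass, cut-off 0.2 Hz at fs = 1 Hz
# (0.2 Hz corresponds to 0.4 in SciPy's Nyquist-normalized units)
b, a = signal.butter(4, 0.4)

def quantize(x, frac_bits, mode="round"):
    """Quantize to a fixed point grid with resolution 2**-frac_bits,
    using arithmetic rounding or flooring (cf. Section 3.3.1)."""
    scaled = np.asarray(x) * 2.0**frac_bits
    q = np.round(scaled) if mode == "round" else np.floor(scaled)
    return q / 2.0**frac_bits

b_q = quantize(b, 8)   # 8 fractional bits, as in the 16 bit data memory
a_q = quantize(a, 8)
```

Comparing the frequency responses of (b, a) and (b_q, a_q) reproduces the pole-displacement effect discussed in Section 3.3.1.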

              Sign Bit | Int. Bits | Fr. Bits | Max. Val.     | Min. Val. | lsb
 Accumulator: 1        | 14        | 17       | 2^14 − 2^−17  | −2^14     | 2^−17
 Memory:      1        | 7         | 8        | 2^7 − 2^−8    | −2^7      | 2^−8

Table 3-1. DSP settings.

The input signal consists of 16 phase realizations of a Random Odd Multisine, each with a period length of 1024 samples. The random grid has excited lines between DC (f = 0 Hz) and 80% of the Nyquist frequency (f = 0.4 Hz). Using a block length of 4, this results in 154 excited harmonic lines. The default RMS value of the input signal is set to about 328 lsb, which is 1% of the largest number that can be represented in the data memory. Two periods of each phase realization are applied. Here, the first period of the output signal is discarded in order to eliminate transients. A common way to determine the number of samples that need to be discarded is to plot the difference between subsequent output periods; the number of required transient points can then be determined by visual inspection. Next, the Best Linear Approximation is calculated by averaging the measured Frequency Response Functions (FRFs) over the phase realizations [56]. The total length of the input sequence applied to the DUT is in our experiment 1024 × 2 × 16 = 32768 samples. The default truncation and overflow behaviour (these terms will be explained in the following sections) are set to rounding and saturation, respectively, but we will alter these settings in order to analyse their influence.

3.3.1 Truncation Errors of the Filter Coefficients

Numerous different truncation methods exist [34]. Since the exact implementation of the truncation is of no importance to the proposed method, we shall consider two common truncation methods: arithmetic rounding and flooring. Both truncation methods are depicted in Figure 3-2. The solid black line represents the truncation characteristic, the dashed line stands for the ideal behaviour, and the grey line shows the average behaviour of the truncation method.
The floor operation is often used since it requires the least computation time: the least significant bits of the calculated samples are simply discarded. However, it introduces an average offset of ½ lsb, which is not present when using the computationally more involved rounding method.

Figure 3-2. Truncation characteristics: (a) Floor, (b) Round.

To inspect the quantization effect on the filter coefficients, the FRF is calculated for every phase realization by dividing the measured output spectrum by the input spectrum at the excited frequencies. Then, these FRFs are averaged over all the phase realizations. The following definition is used:

    x_d(t) = x(t T_s),  t = 1, 2, …, N
    X = DFT(x):  X(l) = (1/√N) Σ_{t=1}^{N} x_d(t) e^{−j2πtl/N}   (3-2)

G is the measured transfer function of the DUT:

    G(k f_0) = Y(k f_0) / U(k f_0),  k ∈ excited frequency lines,   (3-3)

with U and Y the DFT of the input and output signals, respectively:

    U = DFT(u),  Y = DFT(y).   (3-4)

The measured transfer function can now be compared with the designed one (see Figure 3-3) for both truncation methods. Since the quantization error of the coefficients results in displaced system poles, the realized transfer functions (full grey lines) differ from the designed one (black line). Furthermore, the amplitudes of the complex model errors (dashed grey lines) are shown on the same plot. From Figure 3-3, we conclude that rounding should be used in this particular example in order to implement a filter with a transfer function that is as close as possible to the designed transfer function in the pass-band. Hence, we will continue the analysis in the following sections with the rounded coefficients.
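The averaging of (3-3) over phase realizations can be sketched as follows. This is a minimal sketch under our own conventions: the DUT is passed in as a callable, two periods are applied and the first is discarded, exactly as described above.

```python
import numpy as np

def measure_frf(system, u_realizations, excited):
    """Sketch of eq. (3-3): apply each phase realization (two periods,
    the first discarded to remove transients), divide the output by the
    input spectrum at the excited lines, and average over realizations."""
    G = []
    for u in u_realizations:
        y = system(np.tile(u, 2))      # two periods of the excitation
        n = len(u)
        U = np.fft.rfft(u)
        Y = np.fft.rfft(y[n:])         # keep the second period only
        G.append(Y[excited] / U[excited])
    return np.mean(G, axis=0)
```

For a linear DUT the averaged FRF reproduces the exact transfer function at the excited bins, since in steady state the output of each period is periodic with the excitation.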

Figure 3-3. Distortion of the transfer function due to quantized filter coefficients.

3.3.2 Finite Precision Distortion

The level of nonlinear distortions at the output, due to the finite precision of the arithmetic and the storage operations, is investigated from the same data that was used to calculate G. In Figure 3-4 (a) and (b), the output spectrum is plotted as a function of the frequency for the rounding and the floor operation, respectively. At the non excited lines in these plots, crosses represent the even contributions, and circles denote the odd contributions. Next, the distortion level at these lines is extrapolated to the excited lines. Hence, the Signal to Distortion Ratio (SDR) at the latter can be determined. At the even detection lines, the even nonlinearities (e.g. x²) pop up, and at the odd detection lines, the odd nonlinearities (e.g. x³) are visible.

From Figure 3-4 (a) and (b), it can be seen that the rounding method only leads to odd nonlinearities, while the floor operation leads to both odd and even nonlinearities. This behaviour can be explained by the symmetry properties of the truncation characteristics (see Figure 3-2). The rounding operation is an odd function: Round(−x) = −Round(x). If this function were decomposed in its Taylor series, it would only consist of odd terms. The following properties hold for odd functions:
- the sum of two odd functions is odd;
- the multiplication of an odd function with a constant number is odd;
- the cascade of odd functions results in an odd function;

Figure 3-4. Output spectrum of a single multisine for different truncation methods: (a) Round, (b) Floor.

The filtering operation performed by the DUT consists of subsequent additions and multiplications with constant numbers (i.e., the filter coefficients). That is why only odd nonlinearities are seen at the output of the system when rounding is used as the truncation method. The floor operation is neither an odd, nor an even nonlinear operation. This leads to both even and odd terms when this function is decomposed in its Taylor series. As a result, odd and even nonlinearities are both observed in the output spectrum. Figure 3-4 (b) also shows the presence of a DC component, corresponding to the offset of ½ lsb that is introduced by the flooring operation (see Figure 3-2). In the pass band, we observe that for this RMS level the SDR due to the finite precision effects is about 60 dB.

3.3.3 Finite Range Distortion

Finally, the effects of the limited dynamic range of the numeric representation are discussed. Two approaches can be used to deal with this problem. First, when no precautions are taken and the value of the samples exceeds the allowed dynamic range, a two's complement overflow occurs. The resulting transfer characteristic is given in Figure 3-5.

Figure 3-5. Transfer characteristic for two's complement overflow.

A second and better method to deal with the limited range is to detect the overflow and to saturate the representation. This leads to the characteristic given in Figure 3-6.

Figure 3-6. Transfer characteristic for saturation overflow.

In the following simulation experiment, the RMS value of the input signal is increased by a factor of 3, up to 983 lsb. Figure 3-7 shows heavy distortion of the transfer function in the case of two's complement overflow, while in the case of saturation the distortion is much lower. In Figure 3-8 (a) and (b), the level of nonlinear distortion at the output is analysed, using the same data as before.
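The two overflow characteristics of Figures 3-5 and 3-6 can be modelled with a small helper. This is a sketch with our own naming; it assumes the asymmetric fixed point range [−2^int_bits, 2^int_bits − lsb] with one sign bit, as in Table 3-1.

```python
import numpy as np

def apply_range(x, int_bits, frac_bits, mode="saturate"):
    """Model the finite range of a fixed point number with int_bits
    integer bits, frac_bits fractional bits and one sign bit: either
    saturate at the range limits (Figure 3-6), or wrap around as in
    two's complement overflow (Figure 3-5)."""
    lsb = 2.0**-frac_bits
    lo, hi = -(2.0**int_bits), 2.0**int_bits - lsb
    x = np.asarray(x, dtype=float)
    if mode == "saturate":
        return np.clip(x, lo, hi)
    span = hi - lo + lsb              # = 2**(int_bits + 1)
    return (x - lo) % span + lo       # two's complement wrap-around
```

With the 16 bit memory format (7 integer, 8 fractional bits), a value just above the positive limit wraps to the most negative representable number, which is exactly the sawtooth behaviour of Figure 3-5.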

Figure 3-7. Transfer function for finite range.

Figure 3-8. Output spectrum of a single multisine for different overflow methods: (a) Saturation, (b) Two's complement overflow.

We observe the presence of a small nonlinear contribution at the even detection lines. This is caused by the asymmetry of the fixed point representation: the largest positive number is 2^7 − 2^−8, while the largest negative number is −2^7. Consequently, the relation Overflow(−x) = −Overflow(x) does not hold for all x (neither for the saturation, nor for the two's complement overflow), resulting in a nonlinear behaviour that is not strictly odd. Thus, when rounding is used to truncate the numbers, the even distortions in the output spectrum can only be caused by an overflow event. Hence, the even detection lines can act as a warning flag for internal overflows.

3.3.4 Influence of the Implementation

The method presented here is also appropriate to evaluate the relative performance of different types of implementations. In the following experiment, we compare three types of implementations: an ordinary Direct Form II, a cascade of Second Order Sections (SOS) optimized to reduce the probability of overflow, and a cascade of SOS reducing the round-off noise. The results for the different implementations are shown in Figure 3-9. For this particular system and RMS value of the input signal (328 lsb), the Direct Form II implementation shows the smallest quantization noise.

Figure 3-9. Output spectrum of a single multisine for different implementations: (a) DF II, (b) SOS Overflow, (c) SOS Round-off.

3.4 Quality Analysis of Audio Codecs

The method described above can also be applied to the encoding and decoding of music in a compressed format, for instance MP3. Although the coding/decoding process cannot be considered strictly time-invariant, we can still use the multisine technique to get an idea of the level of distortion that arises. This makes it easy to compare different codecs at identical bit rates, or different bit rates for the same codec. Of course, the psycho-acoustic models employed in the encoding process can only be rated through subjective listening tests; they are not considered here. Such models are used to determine the spectral music components that are less audible to the human ear. This is caused by the so-called masking effect: the human ear is less sensitive to small spectral components residing in the proximity of large spectral components. The codec takes advantage of this shortcoming in order to achieve higher compression ratios [31].

Since MP3 codecs are designed to handle music, it is more interesting to use a music sample instead of a multisine signal. However, the benefits of even and odd detection lines are still needed. Detection lines in a music sample can easily be obtained with the following procedure. Consider the music sample vector x and the following anti-symmetrical sequence:

    [ x  x  0[x]  −x  −x  0[x] ],   (3-5)

where 0[x] denotes a set of zero samples with the same length as vector x. For such a sequence, all the frequency lines which are a multiple of 2 or 3 of the fundamental frequency will be zero. Hence, they will serve as detection lines for the odd and even nonlinearities.

The LAME MP3 codec (version 1.30, engine 3.92 [88]) is used in this example. The input signal is a monophonic music sequence, sampled at 44.1 kHz with 16 bits per sample. In order to have a representative set of data, an excerpt of 2^20 samples from a pop song is used.
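The construction of (3-5) takes one line, and the claimed zero lines can be verified numerically. The function name is ours; the detection-line property follows because the DFT of the six-block sequence contains the factor 1 + z − z³ − z⁴ with z = e^{−jπk/3}, which vanishes for every k that is a multiple of 2 or 3.

```python
import numpy as np

def detection_sequence(x):
    """Eq. (3-5): the anti-symmetrical sequence [x, x, 0, -x, -x, 0].
    Its DFT is zero at every harmonic that is a multiple of 2 or 3 of
    the fundamental frequency 1/(6*len(x)), creating odd and even
    detection lines in an arbitrary (e.g. music) signal."""
    z = np.zeros_like(x)
    return np.concatenate([x, x, z, -x, -x, z])
```

Only the bins with k ≡ 1 or 5 (mod 6) carry signal power; all other bins are available as detection lines.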
After applying the above procedure (3-5) to create detection lines, the length of the test sequence increases to 6 × 2^20 ≈ 6.3 million samples. In order not to overload the figures, the number of plotted points in Figure 3-10 has been reduced. The level of distortion for three bit rates (64 kbps, 128 kbps and 256 kbps) is plotted at the left hand side of Figure 3-10. For the 64 kbps encoding/decoding process, the distortion level lies 30 dB below the signal level for low frequencies. The results for 128 kbps and 256 kbps show that the distortion level decreases by about 10 dB per additional 64 kbps.

Figure 3-10. MP3 coding/decoding distortion and the BLA for different bit rates (64, 128 and 256 kbps).

Furthermore, the Best Linear Approximation is computed for the three bit rates and plotted at the right hand side of Figure 3-10. We see a flat amplitude spectrum and a zero phase, as expected. The variations in the amplitude spectrum for the lowest bit rate are probably caused by the effect of masking (which lines are masked is decided by the psycho-acoustic model of the encoder). It can also be observed that in the encoding process for 64 and 128 kbps, a low pass characteristic is present (cut-off frequencies of 14 kHz and 18 kHz, respectively). Consequently, the MP3 codec cuts off the high frequencies when low bit rates are used.

3.5 Conclusion

In this chapter, we showed that it is possible to identify and quantify many non-idealities that occur in DSP systems, using custom designed multisines. The proposed concepts make it possible to verify quickly whether the input range and the quantization level are well chosen for input signals with a certain pdf and power spectrum. We have illustrated the ideas on an IIR system, for which the impact of the filter coefficient quantization, the presence of round-off noise, the overflow behaviour, and the effect of the chosen implementation can easily be measured and compared with the design specifications. Finally, the method was successfully applied to analyse the performance of an audio compression/decompression process.

CHAPTER 4  IDENTIFICATION OF NONLINEAR FEEDBACK SYSTEMS

In this chapter, a method is proposed to estimate block-oriented models which are composed of a linear, time-invariant system and a static nonlinearity in the feedback loop. By rearranging the model's structure and by imposing one delay tap on the linear system, the identification process is reduced to a linear problem, allowing a fast estimation of the feedback parameters. The numerical parameter values obtained by solving the linear problem are then used as starting values for the nonlinear optimization. Finally, the proposed method is illustrated on measurements from a physical system.

Chapter 4: Identification of Nonlinear Feedback Systems

4.1 Introduction

Many physical systems contain, in an implicit manner, a nonlinear feedback. Consider for example a mass-spring-damper system with a nonlinear, hardening spring. For this system, the differential equation describing the displacement y_c(t) of the mass m is given by

    m ÿ_c(t) + d ẏ_c(t) + k_1 y_c(t) + k_3 y_c³(t) = u_c(t),   (4-1)

where u_c(t) is the input force and d is the damping coefficient. The constants k_1 and k_3 characterize the behaviour of the hardening spring (k_3 > 0). To demonstrate the implicit feedback behaviour of this system, equation (4-1) is rewritten as follows:

    m ÿ_c(t) + d ẏ_c(t) + k_1 y_c(t) = u_c(t) − k_3 y_c³(t).   (4-2)

The model structure which corresponds to this equation is shown in Figure 4-1. In this block scheme, G(s) is the Laplace transfer function between the input signal u_c(t) and the output signal y_c(t) of the system. The nonlinear block NL contains a static nonlinearity and represents the term k_3 y_c³(t), which is fed back negatively. In the following sections, we will develop an identification procedure for this kind of Nonlinear Feedback systems.

Figure 4-1. LTI system with static nonlinear feedback.
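The feedback interpretation of (4-2) is easy to simulate. The sketch below uses illustrative parameter values of our own choosing (m, d, k_1, k_3 and the sinusoidal force are not taken from the text); the right hand side shows the cubic spring force acting as a static nonlinearity fed back negatively around the linear mass-spring-damper.

```python
import numpy as np
from scipy.integrate import solve_ivp

# illustrative parameter values (assumptions, not from the text)
m, d, k1, k3 = 1.0, 0.2, 1.0, 5.0

def rhs(t, s, u):
    """State-space form of eq. (4-2): s = (y_c, dy_c/dt); the hardening
    spring term k3*y^3 enters as a negative feedback of the output."""
    y, v = s
    return [v, (u(t) - d * v - k1 * y - k3 * y**3) / m]

u = lambda t: np.sin(2 * np.pi * 0.1 * t)      # input force
sol = solve_ivp(rhs, [0.0, 50.0], [0.0, 0.0], args=(u,), max_step=0.05)
```

Because the spring hardens (k_3 > 0), the displacement stays well below the amplitude that the linear system alone would reach near resonance.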

4.2 Model Structure

For a band-limited input signal u_c(t) (for which the power spectrum S_uu(ω) = 0 for |ω| > ω_max), the linear system G(s) can be approximated in the frequency domain by a discrete-time model

    G(z, θ_L) = ( Σ_{i=0}^{n_b} b_i z^{−i} ) / ( Σ_{j=0}^{n_a} a_j z^{−j} ),  with θ_L = (a, b),   (4-3)

provided that the sampling frequency f_s is sufficiently high such that no aliasing occurs. In this respect, note that the bandwidth of y_c(t) is possibly higher than the bandwidth of the input signal, due to the nonlinearity which is present in the feedback loop. The relation between the discrete-time signals u and y and the continuous-time signals u_c and y_c, for a given sampling period T_s, is described by the following equations:

    u(t) = u_c(t T_s),  y(t) = y_c(t T_s).   (4-4)

The relation between the input u(t) and the output y(t) is then given by

    y(t) = G(z, θ_L) u(t).   (4-5)

The static nonlinearity is represented by a polynomial f_NL:

    f_NL(x) = Σ_{l=0}^{r} p_l x^l,  with θ_NL = p.   (4-6)

The vector θ, which contains all the model parameters, is defined as

    θ = (θ_L, θ_NL).   (4-7)

The proposed model structure to identify the system is shown in Figure 4-2. The set of difference equations that describes the input/output relation is given by

Figure 4-2. Model structure.

    Σ_{j=0}^{n_a} a_j y(t − j) = Σ_{i=0}^{n_b} b_i x(t − i)
    x(t) = u(t) − Σ_{l=0}^{r} p_l y^l(t)   (4-8)

Without loss of generality, we divide the first equation by a_0, or equivalently, set a_0 ≡ 1. We then substitute the second equation of (4-8) into the first one, and get

    y(t) = Σ_{i=0}^{n_b} b_i u(t − i) − Σ_{j=1}^{n_a} a_j y(t − j) − Σ_{i=0}^{n_b} b_i Σ_{l=0}^{r} p_l y^l(t − i).   (4-9)

We isolate the terms in y^l(t) and move them to the left hand side:

    y(t) + b_0 Σ_{l=0}^{r} p_l y^l(t) = Σ_{i=0}^{n_b} b_i u(t − i) − Σ_{j=1}^{n_a} a_j y(t − j) − Σ_{i=1}^{n_b} b_i Σ_{l=0}^{r} p_l y^l(t − i).   (4-10)

This equation is an example of a nonlinear algebraic loop [41]: a nonlinear algebraic equation has to be solved for every time step t when the model output is calculated. In principle, this problem can be tackled by a numeric solver. The disadvantage of this loop is that it can have multiple solutions. In fact, it is even possible that no solution exists when the degree r is even. As an example, let us take a look at the polynomial shown in Figure 4-3. The black curve is a polynomial of third degree; it corresponds to the left hand side of (4-10). The grey horizontal levels represent possible values of the right hand side of (4-10). In this example,

Figure 4-3. Number of solutions of the algebraic equation.

we observe that the number of solutions varies with the horizontal level: one solution for the solid grey line, two solutions for the dotted line, and three solutions for the dash-dotted line. Although y(t − 1) should be a good initial guess, leading the numeric solver to the correct solution, we prefer to avoid multiple solutions altogether. To do this in a simple way, we will impose one delay tap for the linear block or, equivalently, set b_0 = 0 in equation (4-10). Taking into account the imposed delay, we obtain the following model equation:

    y(t) = Σ_{i=1}^{n_b} b_i u(t − i) − Σ_{j=1}^{n_a} a_j y(t − j) − Σ_{i=1}^{n_b} b_i Σ_{l=0}^{r} p_l y^l(t − i).   (4-11)
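With b_0 = 0, the recursion (4-11) is explicit: y(t) depends only on past samples, so no algebraic loop has to be solved. A minimal simulation sketch (function name and zero initial conditions are our own assumptions):

```python
import numpy as np

def simulate_feedback_model(u, a, b, p):
    """Simulate eq. (4-11).  Assumes a[0] = 1 and b[0] = 0 (one delay
    tap imposed on the linear block), so y(t) follows explicitly from
    past inputs and outputs; initial conditions are zero."""
    p = np.asarray(p, dtype=float)
    n_a, n_b = len(a) - 1, len(b) - 1
    powers = np.arange(len(p))
    y = np.zeros(len(u))
    for t in range(len(u)):
        acc = 0.0
        for i in range(1, n_b + 1):
            if t - i >= 0:
                # b_i * (u(t-i) - f_NL(y(t-i))), cf. x(t) in (4-8)
                acc += b[i] * (u[t - i] - np.dot(p, y[t - i] ** powers))
        for j in range(1, n_a + 1):
            if t - j >= 0:
                acc -= a[j] * y[t - j]
        y[t] = acc
    return y
```

For instance, with a = [1], b = [0, 1] and f_NL(y) = 0.5 y, a unit impulse produces the geometric sequence 0, 1, −0.5, 0.25, … (each new output is the delayed input minus half the previous output).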

4.3 Estimation Procedure

A three step procedure is used to identify the parameters of the model equation (4-11). In the first step, the parameters of the linear model are identified. Next, the coefficients of the static nonlinearity are estimated. Finally, a nonlinear search procedure is employed in order to refine the initial values obtained in the first two steps.

4.3.1 Best Linear Approximation

The first step in the identification process consists in estimating the Best Linear Approximation (BLA) from the measured input u_m(t) and output y_m(t) for t = 1, …, N (see "Estimating the Best Linear Approximation" on p. 28). A parametric linear model is then estimated in the z-domain and denoted as Ĝ_BLA(z, θ_L). Note that the linear behaviour Lin(NL) of the nonlinear feedback branch is implicitly included in the estimated BLA (see Figure 4-4).

4.3.2 Nonlinear Feedback

In the second step, the nonlinear block which is present in the feedback branch is identified. To achieve this, the feedback loop is opened, and the model is restructured as shown in Figure 4-5. In order to keep the identification simple, the measured output y_m(t) is used at the input of the NL block. The idea of using measured outputs instead of estimated outputs in order to avoid recurrence is similar to the series-parallel architecture from the identification of neural networks [47].

Figure 4-4. The BLA (grey box) of a Nonlinear Feedback system.

Figure 4-5. Rearranged model structure.

As will be shown in what follows, opening the loop allows the formulation of an estimation problem that is linear in the parameters $\theta_{NL}$ for a fixed $\hat{G}_{BLA}(z, \theta_L)$. From Figure 4-5, we obtain the following equation:

$$y(t) = \hat{G}_{BLA}(z, \theta_L)\left[u(t) - \sum_{l=0}^{r} p_l\, y^l(t)\right]. \quad (4-12)$$

The residual $w(t)$ is defined as

$$w(t) \triangleq y_m(t) - \hat{G}_{BLA}(z, \theta_L)\, u_m(t), \quad (4-13)$$

where $u_m(t)$ and $y_m(t)$ are the measured input and output, respectively. Note that $w(t)$ is independent of the parameters $\theta_{NL}$. Next, the error $e_w(t, \theta_{NL})$ needs to be minimized:

$$e_w(t, \theta_{NL}) \triangleq w(t) + \hat{G}_{BLA}(z, \theta_L) \sum_{l=0}^{r} p_l\, y_m^l(t). \quad (4-14)$$

To achieve this, the least squares cost function

$$V(\theta_{NL}) = \sum_{t=1}^{N} e_w(t, \theta_{NL})^2 \quad (4-15)$$

is minimized with respect to the coefficients $p_l$.
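The minimization of (4-15) can be sketched numerically. The FIR impulse response standing in for $\hat{G}_{BLA}$, the sign of the regressor columns, and all names are assumptions of this sketch:

```python
import numpy as np

def estimate_feedback_coeffs(u_m, y_m, g_impulse, r):
    """Linear-in-the-parameters estimate of p = [p_0..p_r].
    g_impulse: impulse response standing in for G_BLA (assumption);
    the operator is applied column-wise, powers element-wise."""
    def G(x):  # apply the linear operator G_BLA
        return np.convolve(x, g_impulse)[:len(x)]
    w = y_m - G(u_m)                        # residual, cf. (4-13)
    # observation matrix: -G applied to y_m**l, l = 0..r
    H = np.column_stack([-G(y_m ** l) for l in range(r + 1)])
    return np.linalg.pinv(H) @ w            # SVD-based pseudo-inverse
```

The pseudo-inverse solves the least squares problem in one shot; with noiseless data consistent with the model, the true coefficients are recovered exactly.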

Since $\hat{G}_{BLA}(z, \theta_L)$ is a known linear operator, independent of $\theta_{NL}$, this minimization is a problem that is linear in the parameters, and it can be solved in the time or the frequency domain. Its solution is given by the matrix equation

$$\hat{\theta}_{NL} = H^{+} w, \quad (4-16)$$

with

$$H = -\hat{G}_{BLA}(z, \theta_L)\, \begin{bmatrix} y_m^0 & y_m^1 & \cdots & y_m^r \end{bmatrix}, \quad (4-17)$$

and where $w$ and $y_m$ are vectors that contain the elements $w(t)$ and $y_m(t)$ for $t = 1, \ldots, N$, respectively. In (4-17), the powers should be computed element-wise, and the operator $\hat{G}_{BLA}(z, \theta_L)$ should be applied to each column. The pseudo-inverse $H^{+}$ can be calculated in a numerically stable way via a Singular Value Decomposition (SVD) [27]. Since measurements are used in the observation matrix $H$ of this linear least squares problem, a bias is present on the estimated parameters $\hat{\theta}_{NL}$ [56]. However, when the Signal to Noise Ratio (SNR) achieved by the measurement set-up is reasonable, this bias remains small, yielding results that are good enough to initialize the nonlinear search procedure.

4.3.3 Nonlinear Optimization

The starting values obtained from the initialization procedure can be improved by solving the full nonlinear estimation problem. The Levenberg-Marquardt algorithm (see The Levenberg-Marquardt Algorithm on p. 135) is used to minimize the weighted least squares cost function

$$V_{WLS}(\theta) = \sum_{k=1}^{F} W(k)\, |\varepsilon(k, \theta)|^2, \quad (4-18)$$

where $W(k)$ is a user-chosen, frequency domain weighting. The model error $\varepsilon(k, \theta)$ is defined as

$$\varepsilon(k, \theta) = Y(k, \theta) - Y_m(k), \quad (4-19)$$

where $Y(k, \theta)$ and $Y_m(k)$ are the DFT of the modelled output from equation (4-11) and of the measured output, respectively. Hence, the cost function is formulated in the frequency domain, which

enables the use of a nonparametric weighting. Typically, the weighting $W(k)$ is chosen equal to the inverse covariance matrix of the output, $\hat{C}_Y^{-1}(k)$. This matrix can be obtained straightforwardly when periodic signals are used to excite the DUT. Leakage can appear in equation (4-19) in two different situations: when arbitrary excitations are employed, or when subharmonics are present in the measured or modelled output. In the first case, the leakage can be reduced by windowing techniques, or by increasing the length of the data record. In the second case, it suffices to increase the DFT window length such that an integer number of periods is measured in the window. The Levenberg-Marquardt algorithm requires the computation of the derivatives of the model error with respect to the model parameters $\theta$. The analytical expressions of the Jacobian are given in Appendix 4.A. In these expressions, the modelled outputs are utilized instead of the measured outputs. Hence, the bias which was present in the previous section due to the noise on the output is now removed.
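A direct transcription of the cost function (4-18), with a scalar weight per frequency bin, can look as follows (names are illustrative):

```python
import numpy as np

def wls_cost(y_meas, y_model, W):
    """Frequency-domain weighted least squares cost, cf. (4-18):
    V = sum_k W(k) |Y_model(k) - Y_meas(k)|**2.
    W is typically 1/var(Y(k)); here a plain array of weights."""
    E = np.fft.fft(np.asarray(y_model)) - np.fft.fft(np.asarray(y_meas))
    return float(np.real(np.sum(W * np.abs(E) ** 2)))
```

By Parseval's relation, with a constant weighting this cost is proportional to the time-domain sum of squared errors, which is why a constant weight is a sensible fallback when no covariance information is available.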

4.4 Experimental Results

We will now apply the ideas of the previous sections to a practical measurement set-up. The Device Under Test is an electronic circuit, also known as the Silverbox [67], which emulates the behaviour of a mass-spring-damper system with a nonlinear spring. The experimental data originate from a single measurement and contain two main parts. In all experiments, the input and output signals are measured at a sampling frequency of 10 MHz/2^14 ≈ 610.35 Hz. The first part of the data is a filtered Gaussian signal with an RMS value that increases linearly with time. This sequence has a bandwidth of 200 Hz; it will be used for validation purposes. Note that the amplitude of the validation sequence exceeds the amplitude of the estimation sequence. A warning is in place here: generally speaking, extrapolation during the validation test should be avoided, since it reveals no information about the model quality. Good extrapolation performance, certainly in a black box framework, is often a matter of luck: the model structure happens to correspond exactly to the system's internal structure. The second part of the data consists of ten consecutive realizations of a random phase multisine with 8192 samples and 500 transient points per realization, depicted in Figure 4-6 with alternating colours. The bandwidth of the excitation signal is also 200 Hz and its RMS value is 22.3 mV. The multisines will be employed to estimate the model. In this measurement, odd multisines were used:

$$U_k = A \quad \text{for } k = 2n + 1,\; n \in \mathbb{N}; \qquad U_k = 0 \quad \text{elsewhere}, \quad (4-20)$$

Figure 4-6. Excitation signal that consists of a validation and an estimation set.
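Generating an odd random-phase multisine of the type (4-20) can be sketched in a few lines (the generator name and the IFFT-based construction are illustrative choices):

```python
import numpy as np

def odd_multisine(n_samples, n_lines, amp=1.0, rng=None):
    """Odd random-phase multisine, cf. (4-20): equal amplitude at the
    odd bins k = 1, 3, 5, ..., phases uniform on [0, 2*pi)."""
    rng = rng or np.random.default_rng()
    U = np.zeros(n_samples, dtype=complex)
    k = 1 + 2 * np.arange(n_lines)               # excited odd bins
    U[k] = amp * np.exp(1j * rng.uniform(0, 2 * np.pi, n_lines))
    return 2 * np.real(np.fft.ifft(U))           # real periodic signal
```

Because only odd bins are excited, even nonlinear distortions fall on the unexcited even bins, which is what makes this excitation useful for detecting and classifying nonlinear behaviour.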

using the same symbols as in equation (2-3). The phases $\varphi_k$ were chosen uniformly distributed over $[0, 2\pi)$. We will extract the BLA of the DUT by averaging the measured transfer functions for different phase realizations of the multisine [56],[62],[66]. Note that this also diminishes the measurement noise on the BLA.

4.4.1 Linear Model

We start with a comparison of two second order linear models. The first model has completely unrestricted parameters. For the second model, we impose one delay tap, which is equivalent to forcing $b_0$ to zero during the estimation. As mentioned before, this delay is imposed in order to avoid an algebraic loop. Furthermore, the order of the numerator is increased to reduce the model error. The consequences of the delay are now investigated by comparing the quality of the following models:

1. Full linear model:
$$G_1(z) = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{a_0 + a_1 z^{-1} + a_2 z^{-2}} \quad (4-21)$$

2. Linear model with imposed delay:
$$G_2(z) = \frac{b_1 z^{-1} + b_2 z^{-2} + b_3 z^{-3} + b_4 z^{-4} + b_5 z^{-5}}{a_0 + a_1 z^{-1} + a_2 z^{-2}} \quad (4-22)$$

The estimation of the parameters of $G_1(z)$ and $G_2(z)$ for the DUT is carried out in the frequency domain, using the ELiS Frequency Domain Identification toolbox [56]. The resulting models are plotted in Figure 4-7 (dash-dotted and dashed grey lines, respectively), together with the BLA (solid black line). From this figure, it can be seen that imposing a delay results in a slight distortion of the modelled transfer function at high frequencies. The following step consists in validating these models on the validation data set. The simulation error signals of the linear models are plotted in Figure 4-8 (b) and (c), together with the measured output signal (a). From these plots, we conclude that the same model quality is achieved for both linear models: the Root Mean Square Error (RMSE) obtained on the validation data set is 14.3 mV.
This means that imposing a delay tap does not significantly deteriorate the quality of the linear model in this particular experimental set-up.

Figure 4-7. Measured FRF (solid black line), model $G_1(z)$ (dash-dotted grey line) and model $G_2(z)$ (dashed grey line).

Figure 4-8. Validation of the linear models: (a) measured output signal; (b) error of linear model 1 (RMSE: 14.3 mV); (c) error of linear model 2 (RMSE: 14.3 mV).
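The RMSE figure quoted throughout this section is a plain root mean square of the simulation error; a one-line helper makes the metric explicit:

```python
import numpy as np

def rmse(y_meas, y_sim):
    """Root Mean Square Error between measured and simulated output,
    the validation metric used in this chapter."""
    e = np.asarray(y_meas, dtype=float) - np.asarray(y_sim, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))
```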

The estimated coefficients of the second linear model are $\hat{b} = \ldots$, $\hat{a} = \ldots$ (4-23). We now proceed with the identification procedure, using the second linear model and extending it with the static nonlinear feedback branch (see Figure 4-2).

4.4.2 Estimation of the Nonlinear Feedback Coefficients

After estimating the linear transfer characteristics, the nonlinear feedback coefficients $p$ are estimated in the time domain, using the ten measured multisine realizations. Several degrees were tried out, and $r = 1{:}3$ (i.e., powers one up to three) yielded the best result. The identified coefficients are $\hat{p} = \ldots$ (4-24). Again, the model is validated on the Gaussian noise sequence. Figure 4-9 (a) shows the simulation error of the Nonlinear Feedback model. Note that the vertical scale is enlarged 10 times compared with the plots in Figure 4-8. The RMSE has dropped by more than a factor of 10 compared to the linear model. Furthermore, in Figure 4-9 (a) the large spikes in the error signal have disappeared.

Figure 4-9. Validation of the nonlinear models: (a) simulation error of the NLFB model (RMSE: 1.01 mV); (b) simulation error of the optimized NLFB model (RMSE: 0.77 mV).

4.4.3 Nonlinear Optimization

The proposed identification procedure significantly enhances the results compared to the linear model. But we can achieve even better results by applying the nonlinear optimization method from section 4.3.3. Since no covariance information is available from the measured data, a constant weighting is employed. The resulting simulation error after applying the Levenberg-Marquardt algorithm is plotted in Figure 4-9 (b). The RMSE then decreases further by about 20%, to 0.77 mV.

4.4.4 Upsampling

To obtain a further improvement of the modelling results, we upsample the input and output data. The idea behind upsampling is to reduce the influence of the artificially imposed delay, which is one sample period long. Hence, the model quality should improve. After upsampling the input and output data by a factor 2, the estimation procedure described in the previous sections is applied. The simulation error of the Nonlinear Feedback model is shown in Figure 4-10 (a). We observe indeed that the validation test yields better results: the RMS error has decreased to 0.70 mV. In addition, a nonlinear search routine is used to optimize the parameters, resulting in a simulation error of 0.38 mV (Figure 4-10 (b)).

Figure 4-10. Validation of the nonlinear models for upsampled data: (a) simulation error of the NLFB model, P=2 (RMSE: 0.70 mV); (b) simulation error of the optimized NLFB model, P=2 (RMSE: 0.38 mV).

The modelling results are summarized in Table 4-1. The linear models and the linear parts of

the Nonlinear Feedback models are all of order $n_a = 2$, $n_b = 5$; the degree of the polynomial is set to $r = 1{:}3$.

Model                               RMSE Validation
Linear                              14.3 mV
Linear + delay                      14.3 mV
NLFB, $f_s$ = 610 Hz                1.01 mV
NLFB, $f_s$ = 610 Hz (optimized)    0.77 mV
NLFB, $f_s$ = 1221 Hz               0.70 mV
NLFB, $f_s$ = 1221 Hz (optimized)   0.38 mV

Table 4-1. Summary of the modelling results.

From Table 4-1, we conclude that extending the linear model with a static nonlinear feedback reduces the simulation error on the validation data set by more than 20 dB: from 14.3 mV to 1.01 mV. The nonlinear optimization slightly improves this result, down to 0.77 mV. Furthermore, with upsampling the total error reduction increases to more than 30 dB compared to the linear model: the simulation error of the optimized nonlinear model using the upsampled data set decreases to 0.38 mV. For a comparison with other modelling approaches on the same DUT, using the same data set, we refer to the Silverbox case study in Chapter 6 (see Comparison with Other Approaches on p. 151).
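The upsampling step can be sketched with an FFT-based resampler for periodic records. The thesis does not specify which resampling implementation was used; this numpy-only spectrum zero-padding is one common choice, assuming an even-length record:

```python
import numpy as np

def upsample_fft(x, factor=2):
    """FFT-based upsampling of a periodic record by an integer factor:
    zero-pad the spectrum, inverse-transform, rescale.
    Assumes an even-length, band-limited record."""
    n = len(x)
    X = np.fft.fft(x)
    Xu = np.zeros(n * factor, dtype=complex)
    Xu[: n // 2] = X[: n // 2]        # positive frequencies
    Xu[-(n // 2):] = X[-(n // 2):]    # negative frequencies
    return np.real(np.fft.ifft(Xu)) * factor
```

For band-limited periodic data this interpolation is exact: the original samples reappear at every `factor`-th point of the upsampled record.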

Figure 4-11. Spectrum of the measured output (solid black line); linear model error (solid grey line); NLFB model error (black dots); NLFB model error for upsampled data (grey dots).

In Figure 4-11, the amplitude spectrum of the simulation errors of the various models is shown, together with the measured output spectrum (solid black line). The solid grey line represents the error of the (unrestricted) linear model. The black and grey dots represent the Nonlinear Feedback model errors without and with upsampling, respectively.

4.5 Conclusion

The technique proposed in this chapter provides a practical and fast way to model systems that are composed of a linear, time-invariant system and a static nonlinear feedback. The estimated model gives satisfying modelling results, which can be further improved by applying a nonlinear optimization procedure. We have applied the method to experimental data and obtained good results: the modelling error was reduced to less than 3% of the error obtained with an ordinary linear model.

Appendix 4.A Analytic Expressions for the Jacobian

Recall equation (4-11), which describes the Nonlinear Feedback model:

$$y(k) = \sum_{i=1}^{n_b} b_i u(k-i) - \sum_{i=1}^{n_b} b_i \sum_{l=0}^{r} p_l\, y^l(k-i) - \sum_{j=1}^{n_a} a_j y(k-j). \quad (4-25)$$

In order to use the Levenberg-Marquardt algorithm, we need to compute the derivatives of the output with respect to the model parameters, i.e., the Jacobian. The Jacobian elements are defined as

$$J_{b_n}(k) \triangleq \frac{\partial y(k)}{\partial b_n}, \qquad J_{a_n}(k) \triangleq \frac{\partial y(k)}{\partial a_n}, \qquad J_{p_n}(k) \triangleq \frac{\partial y(k)}{\partial p_n}. \quad (4-26)$$

Finally, we obtain:

$$J_{b_n}(k) = u(k-n) - \sum_{l=0}^{r} p_l\, y^l(k-n) - \sum_{i=1}^{n_b} b_i \sum_{l=1}^{r} l\, p_l\, y^{l-1}(k-i)\, J_{b_n}(k-i) - \sum_{j=1}^{n_a} a_j J_{b_n}(k-j),$$

$$J_{a_n}(k) = -y(k-n) - \sum_{i=1}^{n_b} b_i \sum_{l=1}^{r} l\, p_l\, y^{l-1}(k-i)\, J_{a_n}(k-i) - \sum_{j=1}^{n_a} a_j J_{a_n}(k-j), \quad (4-27)$$

$$J_{p_n}(k) = -\sum_{i=1}^{n_b} b_i\, y^n(k-i) - \sum_{i=1}^{n_b} b_i \sum_{l=1}^{r} l\, p_l\, y^{l-1}(k-i)\, J_{p_n}(k-i) - \sum_{j=1}^{n_a} a_j J_{p_n}(k-j).$$
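Recursions of this kind are easy to get wrong, so it is worth checking them against finite differences. The sketch below implements the output recursion and the derivative with respect to $a_n$; the sign conventions follow the negative-feedback form used above and the helper names are illustrative:

```python
import numpy as np

def simulate_out(u, b, a, p):
    # y(k) = sum_i b_i [u(k-i) - sum_l p_l y(k-i)**l] - sum_j a_j y(k-j)
    y = np.zeros(len(u))
    for k in range(len(u)):
        s = 0.0
        for i, bi in enumerate(b, start=1):
            if k - i >= 0:
                s += bi * (u[k - i] - sum(pl * y[k - i] ** l
                                          for l, pl in enumerate(p)))
        for j, aj in enumerate(a, start=1):
            if k - j >= 0:
                s -= aj * y[k - j]
        y[k] = s
    return y

def jacobian_a(u, b, a, p, n):
    """Recursive Jacobian J_{a_n}(k) = dy(k)/da_n: the derivative
    propagates through all past outputs, as in the expressions above."""
    y = simulate_out(u, b, a, p)
    J = np.zeros(len(u))
    for k in range(len(u)):
        s = -y[k - n] if k - n >= 0 else 0.0
        for i, bi in enumerate(b, start=1):
            if k - i >= 0:
                dpoly = sum(l * pl * y[k - i] ** (l - 1)
                            for l, pl in enumerate(p) if l >= 1)
                s -= bi * dpoly * J[k - i]
        for j, aj in enumerate(a, start=1):
            if k - j >= 0:
                s -= aj * J[k - j]
        J[k] = s
    return J
```

A central finite difference on `simulate_out` should agree with `jacobian_a` to well below the perturbation size.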

CHAPTER 5

NONLINEAR STATE SPACE MODELLING OF MULTIVARIABLE SYSTEMS

This chapter deals with the modelling of multivariable nonlinear systems. We compare a number of candidate model structures and select the one that is most suitable for our modelling problem. Next, the different classes of systems that can be represented exactly by the selected model structure are discussed. Finally, an identification procedure to determine the model parameters is presented.

5.1 Introduction

The aim of this chapter is to model nonlinear Multiple Input, Multiple Output (MIMO) systems. One way to achieve this is to examine the Device Under Test (DUT) thoroughly, and to build a model using first principles, for instance the laws of physics and chemistry. This process can be very time-consuming, since it requires exact knowledge of the system's structure and all its parameters. It also implies that the system should be fully understood, which is not always feasible. Another way to tackle the modelling problem is to consider the system as a black box. In that case, the only available information about the system is given by its measured inputs and outputs. This approach usually means that no physical parameters or quantities are estimated. Hence, no physical interpretation whatsoever can be given to the model. Black box modelling implies the application of a model structure that is as flexible as possible, since no information about the device's internal structure is utilized. Often, this flexibility results in a high number of parameters.

In this chapter, we will make use of discrete-time models. One of the arguments for this choice is that, when looking at control applications, discrete-time descriptions are more suitable, since control actions are usually taken at discrete time instances. Furthermore, the estimation of nonlinear continuous-time models is not a trivial task and can be computationally involved, because it may imply the calculation of time-derivatives or integrals of sophisticated nonlinear functions of the measured signals [79]. Finally, it should be noted that a continuous-time approach is not strictly necessary, since we are not interested in the estimation of physical system parameters.

One of the objectives is to choose a model structure that is suitable for MIMO systems.
Hence, it is important that the common dynamics present in the different outputs of the DUT are exploited in such a way that they result in a smaller number of model parameters. First, a number of candidate model structures found in the literature will be examined. Next, a specific model structure will be selected, and the relation with some other model structures (standard block-oriented nonlinear models, among others) will be investigated. Finally, an identification procedure for the selected model structure will be proposed.

5.2 The Quest for a Good Model Structure

The literature on nonlinear system identification is vast, and the number of available model structures is practically unlimited. In order not to re-invent the wheel, we will briefly discuss a number of candidate model structures and pick the one that seems most adequate. Initially, only deterministic models are considered: the presence of any kind of noise is ignored. First, two popular examples of input/output models are considered: Volterra and NARX models are appealing both from a system theoretic viewpoint and because of their approximation capabilities.

5.2.1 Volterra Models

An introduction to Volterra series was already given in the first chapter (see The Volterra-Wiener Theory on p. 4). Since we have chosen the Volterra-Wiener approach as a framework for the Best Linear Approximation (see Properties of the Best Linear Approximation on p. 21), it is a logical first step to consider Volterra models. These models have already been employed in many application fields [43]: video and image enhancement, speech processing, communication channel equalization, and compensation of loudspeaker nonlinearities. The main advantage of Volterra series is their conceptual simplicity: they can be viewed as generalized LTI descriptions. Furthermore, they are open loop models for which stability is easy to check and to enforce. However, in the case of a nonparametric representation, these benefits do not outweigh one important disadvantage: when identifying discrete-time Volterra kernels, an enormous number of kernel coefficients needs to be identified, even for a modest kernel degree. We illustrate this with a simple example of a SISO Volterra model. Table 5-1 shows the number of kernel samples $N_{kern}$ of two Volterra functionals for a memory length of $N = 10$ and $N = 100$ samples.
Note that triangular (regular) kernels were considered in order to eliminate the redundancy of the kernel coefficients. The number of effective kernel samples is then computed using the following binomial coefficient [43] (see also Appendix 5.A):

$$N_{kern} = \binom{N + n - 1}{n}, \quad (5-1)$$

where $n$ is the kernel degree. Table 5-1 shows that the number of required kernel coefficients increases dramatically with growing model degree and memory length. From this example, it is clear that, despite the theoretical insights they provide, nonparametric Volterra functionals are not useful in practical identification situations. However, the combinatorial growth of $N_{kern}$ can be tackled in various ways. For instance, a frequency domain IIR representation can be employed for the kernels [61]. But in a black box framework such a parametrization is not straightforward, and it poses a difficult model selection problem. Another solution is to apply interpolation methods to approximate the kernels in the time or frequency domain. This approach works well when the kernels exhibit a certain smoothness, see for instance [48]. The application of this idea leads to a significant decrease in the number of parameters and, thus, in measurement time.

Table 5-1. Number of kernel coefficients $N_{kern}$ in a nonparametric representation, as a function of the kernel degree $n$ and the memory length $N$ ($N = 10$ and $N = 100$).

A multivariable Volterra model is, by definition, composed of $n_y$ different MISO models. When these models are parametrized independently, no advantage is taken of the common dynamics that appear in the different outputs. Consequently, such a representation does not satisfy our needs, since we are looking for a more parsimonious model structure.

5.2.2 NARX Approach

To avoid the excessive number of parameters, NARX (Nonlinear AutoRegressive model with eXogenous inputs) models were intensively studied in the eighties as an alternative to Volterra series. The generic single input, single output NARX model is defined as

$$y(t) = f(y(t-1), \ldots, y(t-n_y), u(t-1), \ldots, u(t-n_u)), \quad (5-2)$$

where $f(\cdot)$ is an arbitrary nonlinear function of delayed inputs and outputs [7].
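The kernel-count formula (5-1) can be checked numerically; the function name is illustrative:

```python
from math import comb

def n_kernel_coeffs(N, n):
    """Number of distinct (triangular) Volterra kernel coefficients,
    cf. (5-1): C(N + n - 1, n) for memory length N and degree n."""
    return comb(N + n - 1, n)
```

For example, a degree-2 kernel with memory length 100 already needs 5050 coefficients, and a degree-3 kernel with the same memory needs 171700, which illustrates the combinatorial growth discussed above.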
Contrary to the Volterra approach, the model output $y(t)$ is now also a function of delayed output

samples. This is very similar to the extension of linear FIR models to IIR models, and it has two important consequences. First of all, a longer memory length is achieved without the dramatic increase in the number of parameters that was observed with nonparametric Volterra series. Secondly, due to the nonlinear feedback which is present, the stability analysis becomes very difficult compared with Volterra models. This is the major price that is paid for the increase in flexibility. In [37], it is proven that a nonlinear discrete-time, time-invariant system can always be represented by a general NARX model in a region around an equilibrium point, when it satisfies two sufficient conditions:

- the response function of the system is finitely realizable (i.e., the state space representation has a finite number of states);
- a linearised model exists when the system operates close to the chosen equilibrium point.

Often, a more specific kind of NARX model is employed: the polynomial NARX model. By applying the Stone-Weierstrass Theorem [16], it was shown in [7] that these models can approximate any sampled nonlinear system arbitrarily well, under the assumption that the space of input and output signals is compact (i.e., bounded and closed). The NARX model can also be used to handle multivariable systems. Just as with Volterra models, multivariable NARX models are defined as a set of MISO NARX models:

$$y_i(t+n) = f_i[\,y_1(t+n_1-1), \ldots, y_1(t),\; \ldots,\; y_{n_y}(t+n_{n_y}-1), y_{n_y}(t+n_{n_y}-2), \ldots, y_{n_y}(t),\; u_1(t+n), u_1(t+n-1), \ldots, u_1(t),\; \ldots,\; u_{n_u}(t+n), u_{n_u}(t+n-1), \ldots, u_{n_u}(t)\,], \quad (5-3)$$

with $i = 1, \ldots, n_y$ the output index, $n_u$ the number of inputs, $n_i$ the delay per output $y_i$, and $n = \max(n_1, \ldots, n_{n_y})$ the maximum delay [37]. For this general nonlinear model, there is no straightforward way to parametrize the functions $f_i$ such that advantage is taken of the

common dynamics present in the different outputs. Hence, we will investigate another class of models than input/output models, namely state space models. It will turn out that the latter are a very suitable description for multiple input, multiple output systems.

5.2.3 State Space Models

The most natural way to represent systems with multiple inputs and outputs is to use the state space framework. In its most general form, a model is expressed as an $n_a$-th order discrete-time state space model:

$$x(t+1) = f(x(t), u(t)), \qquad y(t) = g(x(t), u(t)). \quad (5-4)$$

In these equations, $u(t) \in \mathbb{R}^{n_u}$ is the vector that contains the $n_u$ input values at time instance $t$, and $y(t) \in \mathbb{R}^{n_y}$ is the vector of the $n_y$ outputs. The state vector $x(t) \in \mathbb{R}^{n_a}$ represents the memory of the system, and contains the common dynamics present in the different outputs. The use of this intermediary variable constitutes the essential difference between state space and input/output models; for the latter, the memory is created by utilizing delayed inputs or outputs. The first equation of (5-4), referred to as the state equation, describes the evolution of the state as a function of the input and the previous state. The second equation of (5-4) is called the output equation; it relates the system output to the state and the input. Furthermore, the state space representation is not unique: by means of a similarity transform, the model equations (5-4) can be converted into a new model that exhibits exactly the same input/output behaviour. The similarity transform $x_T(t) = T^{-1} x(t)$ with an arbitrary non-singular square matrix $T$ yields

$$x_T(t+1) = T^{-1} f(T x_T(t), u(t)) = f_T(x_T(t), u(t)), \qquad y(t) = g(T x_T(t), u(t)) = g_T(x_T(t), u(t)). \quad (5-5)$$

Note that when the similarity transform is applied to arbitrary functions $f$ and $g$, the resulting functions $f_T$ and $g_T$ do not necessarily have the same form as $f$ and $g$. This is illustrated in the following example.
Consider the output equation

$$y(t) = \frac{a}{x_1(t)} + b\, x_2(t), \quad (5-6)$$

where $a$ and $b$ are the model parameters. With $t_{ij}$ the $(i,j)$-th element of the matrix $T$, the similarity transform results in

$$y(t) = \frac{a}{t_{11}\, x_{1T}(t) + t_{12}\, x_{2T}(t)} + b\,(t_{21}\, x_{1T}(t) + t_{22}\, x_{2T}(t)), \quad (5-7)$$

which obviously cannot be written in the form

$$y(t) = \frac{a_T}{x_{1T}(t)} + b_T\, x_{2T}(t). \quad (5-8)$$

Hence, the similarity transform can have an influence on the model complexity. Whether the similarity transform introduces redundancy in the representation depends on the (fixed) parametrization of $f$ and $g$. However, this issue is not so important to us: all the state space models discussed in this chapter retain their model structure under a similarity transform. In what follows, we assume that $f(0,0) = 0$ and $g(0,0) = 0$, such that $x = 0$ is an equilibrium state. In the following sections, we describe different kinds of state space models.

A. Linear State Space Models

The model equations of the well-known linear state space model are given by

$$x(t+1) = A x(t) + B u(t), \qquad y(t) = C x(t) + D u(t), \quad (5-9)$$

with the state space matrices $A \in \mathbb{R}^{n_a \times n_a}$, $B \in \mathbb{R}^{n_a \times n_u}$, $C \in \mathbb{R}^{n_y \times n_a}$, and $D \in \mathbb{R}^{n_y \times n_u}$. The transfer function $G(z)$ that corresponds to (5-9) is given by

$$G(z) = C (z I_{n_a} - A)^{-1} B + D, \quad (5-10)$$

with $I_{n_a}$ the identity matrix of dimension $n_a$. From (5-10), it is clear that the poles of $G(z)$ are given by the eigenvalues of $A$. By means of a similarity transform, the set of state space matrices $A, B, C, D$ can be converted into a new set $A_T, B_T, C_T, D_T$ that exhibits exactly the same input/output behaviour. The similarity transform $x_T(t) = T^{-1} x(t)$ with an arbitrary non-singular square matrix $T$ yields

$$A_T = T^{-1} A T, \quad B_T = T^{-1} B, \quad C_T = C T, \quad D_T = D. \quad (5-11)$$

It is easily verified that the similarity transform has no influence on the transfer function:

$$G_T(z) = C_T (z I_{n_a} - A_T)^{-1} B_T + D_T = C T (z I_{n_a} - T^{-1} A T)^{-1} T^{-1} B + D = C (z T T^{-1} - T T^{-1} A T T^{-1})^{-1} B + D = G(z). \quad (5-12)$$

B. Bilinear State Space Models

A continuous-time, bilinear state space model is defined as

$$\frac{dx_c(t)}{dt} = A x_c(t) + B u_c(t) + F\,(x_c(t) \otimes u_c(t)), \qquad y_c(t) = C x_c(t) + D u_c(t), \quad (5-13)$$

where $A \in \mathbb{R}^{n_a \times n_a}$, $B \in \mathbb{R}^{n_a \times n_u}$, $C \in \mathbb{R}^{n_y \times n_a}$, $D \in \mathbb{R}^{n_y \times n_u}$, and $F \in \mathbb{R}^{n_a \times n_u n_a}$ are the bilinear state space matrices. These models are a straightforward extension of linear state space models, which enables them to cope with nonlinear systems. It was shown in the past that, in continuous time, these models are universal approximators for nonlinear systems [26],[46]: any continuous causal functional can be approximated arbitrarily well by a continuous-time, bilinear state space model within a bounded time interval. The discrete-time, bilinear state space model is given by

$$x(t+1) = A x(t) + B u(t) + F\,(x(t) \otimes u(t)), \qquad y(t) = C x(t) + D u(t). \quad (5-14)$$

Intuitively, it is expected that this model preserves the approximation capabilities of its continuous-time counterpart. Unfortunately, this is not the case: it is not possible to approximate all (nonlinear) discrete-time systems by discrete-time, bilinear models. The reason for this is that the set of discrete-time, bilinear systems is not closed with respect to the product operation: the product of the outputs of two discrete-time, bilinear state space systems is not necessarily a bilinear system again [26]. In order to maintain the universal

approximation property also for discrete-time systems, a more generic class of models needs to be defined: state affine models.

C. State Affine Models

A single input, single output, state affine model of degree $r$ is defined as

$$x(t+1) = \sum_{i=0}^{r-1} A_i\, u^i(t)\, x(t) + \sum_{i=1}^{r} B_i\, u^i(t), \qquad y(t) = \sum_{i=0}^{r-1} C_i\, u^i(t)\, x(t) + \sum_{i=1}^{r} D_i\, u^i(t), \quad (5-15)$$

with $A_i \in \mathbb{R}^{n_a \times n_a}$, $B_i \in \mathbb{R}^{n_a \times n_u}$, $C_i \in \mathbb{R}^{n_y \times n_a}$, and $D_i \in \mathbb{R}^{n_y \times n_u}$. These models were introduced in [73], and they pop up in a natural way in the description of sampled continuous-time, bilinear systems [7],[59]. On a finite time interval and for bounded inputs, they can approximate any continuous, discrete-time system arbitrarily well in uniform sense [26]. Just as in the case of bilinear models, the states $x(t)$ appear in the state and output equations in an affine way. Hence, such a model structure enables the use of subspace identification techniques to estimate the state space matrices [80].

D. Other Kinds of State Space Models

In the literature, state space models come in many different flavours. In this section, we give a non-exhaustive list of various existing approaches and mention their most remarkable properties. The idea behind Linear Parameter Varying (LPV) models [80] is to create a linear, time-variant model whose parameters are a function of a user-chosen vector $p(t) \in \mathbb{R}^s$ which characterizes the operating point of the system, and which is assumed to be measurable. The state space equations are an affine function of $p(t)$:

$$x(t+1) = A \begin{bmatrix} x(t) \\ p(t) \otimes x(t) \end{bmatrix} + B \begin{bmatrix} u(t) \\ p(t) \otimes u(t) \end{bmatrix}, \qquad y(t) = C \begin{bmatrix} x(t) \\ p(t) \otimes x(t) \end{bmatrix} + D \begin{bmatrix} u(t) \\ p(t) \otimes u(t) \end{bmatrix}, \quad (5-16)$$

with $A = [\,A_0 \;\; A_1 \;\; \cdots \;\; A_s\,]$ and $A_i \in \mathbb{R}^{n_a \times n_a}$. The other state space matrices $B$, $C$, and $D$ are partitioned in a similar way. Two special cases of the LPV model are the bilinear and the state affine model. When $p(t)$ is chosen equal to $u(t)$, and $B_i = 0$, $C_i = 0$, and $D_i = 0$ for $i = 1, \ldots, s$, the model equations become identical to the bilinear model equations (5-14). The state affine description of degree $r$ is obtained from the LPV model by choosing $p(t)$ equal to a vector that contains all distinct nonlinear combinations of $u(t)$ up to degree $r-1$. The LPV model structure is particularly interesting for nonlinear control: it enables the use of different linear controllers at different operating points (i.e., gain scheduling).

Another kind of state space model are the so-called Local Linear Models (LLM) [80]. The idea here is to partition the input space and the state space into operating regions in which a particular linear model dominates. The state space equations are defined as a sum of weighted local linear models:

$$x(t+1) = \sum_{i=1}^{s} p_i(\phi_t)\,(A_i x(t) + B_i u(t) + O_i), \qquad y(t) = \sum_{i=1}^{s} p_i(\phi_t)\,(C_i x(t) + D_i u(t) + P_i). \quad (5-17)$$

The scalar weighting functions $p_i(\cdot)$ generally have local support, like for instance radial basis functions. The scheduling vector $\phi_t$ is a function of the input $u(t)$ and the state $x(t)$. The last type of nonlinear state space model discussed here is the deterministic Neural State Space model [77],[78]. The general nonlinear equations in (5-4) are parametrized by multilayer feedforward neural networks with hyperbolic tangents as activation functions:

$$x(t+1) = W_{AB} \tanh(V_A x(t) + V_B u(t) + \beta_{AB}), \qquad y(t) = W_{CD} \tanh(V_C x(t) + V_D u(t) + \beta_{CD}). \quad (5-18)$$
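Simulating the neural state space equations (5-18) is a direct transcription of the recursion; this single-input sketch uses illustrative names and shapes:

```python
import numpy as np

def simulate_neural_ss(u, W_AB, V_A, V_B, b_AB, W_CD, V_C, V_D, b_CD):
    """Simulate the deterministic neural state space model (5-18)
    for a scalar input sequence u (single-input sketch)."""
    x = np.zeros(V_A.shape[1])          # state of dimension n_a
    y = np.zeros(len(u))
    for t, ut in enumerate(u):
        y[t] = W_CD @ np.tanh(V_C @ x + V_D * ut + b_CD)
        x = W_AB @ np.tanh(V_A @ x + V_B * ut + b_AB)
    return y
```

With the state-equation weights set to zero the model degenerates to a static nonlinearity $y(t) = W_{CD}\tanh(V_D u(t) + \beta_{CD})$, which makes a convenient sanity check.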

The model in (5-18) can be viewed as a multi-layer recurrent neural network with one hidden layer. It is also a specific kind of NLq system, for which sufficient conditions for global asymptotic stability were derived in [78]. Furthermore, the NLq theory allows one to check and to ensure the global asymptotic stability of neural control loops.
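Returning to the state affine model of (5-15): a minimal SISO simulator makes the structure concrete. The function name and the list-of-matrices calling convention are assumptions of this sketch:

```python
import numpy as np

def simulate_state_affine(u, A_list, B_list, C_list, D_list):
    """Simulate the SISO state affine model (5-15):
    x(t+1) = sum_{i=0}^{r-1} A_i u(t)**i x(t) + sum_{i=1}^{r} B_i u(t)**i
    y(t)   = sum_{i=0}^{r-1} C_i u(t)**i x(t) + sum_{i=1}^{r} D_i u(t)**i.
    A_list[i] is A_i (n_a x n_a); B_list, C_list as 1-D arrays; D_list scalars."""
    n_a = A_list[0].shape[0]
    x = np.zeros(n_a)
    y = np.zeros(len(u))
    for t, ut in enumerate(u):
        y[t] = sum(Ci @ x * ut ** i for i, Ci in enumerate(C_list)) \
             + sum(Di * ut ** (i + 1) for i, Di in enumerate(D_list))
        x = sum(Ai @ x * ut ** i for i, Ai in enumerate(A_list)) \
          + sum(Bi * ut ** (i + 1) for i, Bi in enumerate(B_list))
    return y
```

Retaining only $A_0$, $B_1$, $C_0$, and $D_1$ recovers the ordinary linear state space model (5-9), which again gives an easy consistency check.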

5.3 Polynomial Nonlinear State Space Models

The approach followed in this thesis starts from the general model

$$x(t+1) = f(x(t), u(t), \theta)$$
$$y(t) = g(x(t), u(t), \theta) \qquad (5\text{-}19)$$

and applies a functional expansion to the functions $f(\cdot)$ and $g(\cdot)$. For this, we need to choose a set of basis functions out of the many possibilities: sigmoid functions, wavelets, radial basis functions, polynomials, hyperbolic tangents, ... We opted for a polynomial approach. The main advantage of polynomial basis functions is that they are straightforward to compute, and easy to apply in a multivariable framework. We propose the following notation for the Polynomial NonLinear State Space (PNLSS) model:

$$x(t+1) = Ax(t) + Bu(t) + E\zeta(t)$$
$$y(t) = Cx(t) + Du(t) + F\eta(t) \qquad (5\text{-}20)$$

The coefficients of the linear terms in $x(t)$ and $u(t)$ are given by the matrices $A \in \mathbb{R}^{n_a\times n_a}$ and $B \in \mathbb{R}^{n_a\times n_u}$ in the state equation, and $C \in \mathbb{R}^{n_y\times n_a}$ and $D \in \mathbb{R}^{n_y\times n_u}$ in the output equation. The vectors $\zeta(t) \in \mathbb{R}^{n_\zeta}$ and $\eta(t) \in \mathbb{R}^{n_\eta}$ contain monomials in $x(t)$ and $u(t)$; the matrices $E \in \mathbb{R}^{n_a\times n_\zeta}$ and $F \in \mathbb{R}^{n_y\times n_\eta}$ contain the coefficients associated with those monomials. The separation between the linear and the nonlinear terms in (5-20) is of no importance for the behaviour of the model. However, later on in the identification procedure this distinction will turn out to be very practical, because the first stage of that procedure consists of estimating a linear model.

First, we briefly summarize the multinomial expansion theorem and the graded lexicographic order, which are both useful concepts when dealing with multivariable monomials.

Multinomial Expansion Theorem

In order to denote monomials in an uncomplicated way, we first define the $n$-dimensional multi-index $\alpha$, which contains the powers of a multivariable monomial:

$$\alpha = [\,\alpha_1\;\;\alpha_2\;\;\cdots\;\;\alpha_n\,], \qquad (5\text{-}21)$$

with $\alpha_i \in \mathbb{N}$. A monomial composed of the components of the vector $\xi \in \mathbb{R}^n$ is then simply written as

$$\xi^\alpha = \prod_{i=1}^{n} \xi_i^{\alpha_i}, \qquad (5\text{-}22)$$

where $\xi_i$ is the $i$-th component of $\xi$. The total degree of the monomial is given by

$$|\alpha| = \sum_{i=1}^{n} \alpha_i, \qquad (5\text{-}23)$$

and the factorial function of the multi-index $\alpha$ is defined as

$$\alpha! = \alpha_1!\,\alpha_2!\cdots\alpha_n!. \qquad (5\text{-}24)$$

Furthermore, we define $\xi^{(r)}$ as the column vector of all the distinct monomials of degree $r$ (i.e., with multi-index $|\alpha| = r$) composed from the elements of the vector $\xi$. The number of elements in the vector $\xi^{(r)}$ is given by the following binomial coefficient (see Appendix 5.A):

$$\binom{n+r-1}{r}. \qquad (5\text{-}25)$$

Finally, the vector $\xi^{\{r\}}$ is defined as the column vector containing all the monomials of degree two up to degree $r$. The length of this vector is given by

$$L_{n,r} = \binom{n+r}{r} - 1 - n. \qquad (5\text{-}26)$$

The notations introduced above can now be used to express the multinomial expansion theorem [83].
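Both counts can be checked by explicitly generating the monomial vectors. In the sketch below (helper functions of our own, not part of the thesis), a monomial of degree $d$ is represented by the sorted tuple of its variable indices, and the monomials are produced grouped by increasing degree:

```python
from itertools import combinations_with_replacement
from math import comb

def xi_r(n, r):
    """All distinct monomials of exact degree r in n variables; eq. (5-25)
    counts them. Each monomial is the tuple of its variable indices,
    e.g. (1, 1, 2) stands for xi1^2 * xi2."""
    return list(combinations_with_replacement(range(1, n + 1), r))

def xi_braces(n, r):
    """All monomials of degree 2 up to r, whose number is L_{n,r} in eq. (5-26)."""
    return [m for d in range(2, r + 1) for m in xi_r(n, d)]

# The explicit enumeration matches the binomial-coefficient counts:
assert len(xi_r(3, 2)) == comb(3 + 2 - 1, 2)            # eq. (5-25): 6 monomials
assert len(xi_braces(3, 4)) == comb(3 + 4, 4) - 1 - 3   # eq. (5-26): 31 monomials
```

The subtraction of $1 + n$ in (5-26) removes the single degree-zero monomial and the $n$ degree-one monomials from the full count $\binom{n+r}{r}$.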

Theorem 5.1 (Multinomial Expansion Theorem) The multinomial expansion theorem gives an expression for the power of a sum, as a function of the powers of the terms:

$$\left(\sum_{i=1}^{n}\xi_i\right)^{k} = \sum_{|\alpha|=k} \frac{k!}{\alpha!}\,\xi^\alpha \qquad (5\text{-}27)$$

Graded Lexicographic Order

To assemble the monomials in a deterministic way, it is convenient to define an order of succession among the monomials. A possible choice is the lexicographic (or alphabetical) order. For this, a sequence is chosen between the symbols $\{\xi_1\}, \{\xi_2\}, \dots, \{\xi_n\}$. The most trivial choice is to base the ordering on the index $i$ of $\xi_i$:

$$\{\xi_1\} < \{\xi_2\} < \cdots < \{\xi_n\} \qquad (5\text{-}28)$$

When the symbols $\xi_i$ are combined into a word, we arrange them according to this order. For instance, the disordered monomial $\xi_3\xi_1^2\xi_2$ should be written as $\{\xi_1\xi_1\xi_2\xi_3\}$. Furthermore, monomials of the same degree can be ordered like words in a dictionary, e.g.

$$\{\xi_1\xi_1\xi_2\} < \{\xi_1\xi_1\xi_3\} < \{\xi_2\xi_2\xi_3\}. \qquad (5\text{-}29)$$

Monomials of different degrees are placed in groups of increasing degree. Within each degree, the lexicographic order is applied. This results in the so-called graded lexicographic order. We will use it to order the elements of the vector $\xi^{\{r\}}$, which contains all monomials with a degree between two and $r$. For instance, for $n = 2$ the vector $\xi^{\{3\}}$ denotes

$$\xi^{\{3\}} = \begin{bmatrix}\xi^{(2)} \\ \xi^{(3)}\end{bmatrix} = \begin{bmatrix}\xi_1^2 & \xi_1\xi_2 & \xi_2^2 & \xi_1^3 & \xi_1^2\xi_2 & \xi_1\xi_2^2 & \xi_2^3\end{bmatrix}^T \qquad (5\text{-}30)$$

Approximation Behaviour

When a full polynomial expansion of (5-19) is carried out, all monomials up to a chosen degree $r$ must be taken into account. First, we define $\xi(t)$ as the concatenation of the state vector and the input vector:

$$\xi(t) = [\,x_1(t)\;\cdots\;x_{n_a}(t)\;\;u_1(t)\;\cdots\;u_{n_u}(t)\,]^T. \qquad (5\text{-}31)$$

As a consequence, the dimension of the vector $\xi(t)$ is given by $n = n_a + n_u$. Then, we define $\zeta(t)$ and $\eta(t)$ in equation (5-20) as

$$\zeta(t) = \eta(t) = \xi(t)^{\{r\}}. \qquad (5\text{-}32)$$

This is our default choice for the PNLSS model structure. The total number of parameters required by the model in (5-20) is given by

$$(n_a+n_y)\left(\binom{n+r}{r}-1\right) = (n_a+n_y)\left(\binom{n_a+n_u+r}{r}-1\right). \qquad (5\text{-}33)$$

When all the nonlinear combinations of the states are present in $\zeta(t)$ and $\eta(t)$ for a given degree, the proposed model structure is invariant under a similarity transform. Since the $n_a^2$ elements of the transform matrix $T$ can be chosen freely provided that $T$ is non-singular, the effective number of parameters becomes

$$(n_a+n_y)\left(\binom{n_a+n_u+r}{r}-1\right) - n_a^2. \qquad (5\text{-}34)$$

A. The PNLSS Approach versus State Affine Models

The question we want to answer is what the approximation properties of this model structure are. When taking a closer look at (5-15) (State Affine Models on p. 89), we observe that the State Affine (SA) representation forms a subclass of the default PNLSS model structure. Therefore, the PNLSS model structure inherits its approximation properties from the state affine framework. The remaining question is then what the additional advantage is of the PNLSS approach over the state affine representation. To investigate this, we recapitulate a derivation given in [59] for a SISO system. This derivation starts from a polynomial expansion of degree $2r$ of the general state space equations (5-19). This expansion is expressed by Kronecker products:

$$x(t+1) = \sum_{i=0}^{r}\sum_{j=0}^{r} F_{ij}\, x^{(i)}(t)\,u^{j}(t) \quad\text{with } F_{00} = 0$$
$$y(t) = \sum_{i=0}^{r}\sum_{j=0}^{r} G_{ij}\, x^{(i)}(t)\,u^{j}(t) \quad\text{with } G_{00} = 0 \qquad (5\text{-}35)$$

where $F_{ij}$ and $G_{ij}$ are the Taylor series coefficients of $f$ and $g$, respectively, and $x^{(i)}(t)$ is defined by repeated Kronecker products:

$$x^{(2)}(t) = x(t)\otimes x(t),\quad \dots,\quad x^{(r)}(t) = \underbrace{x(t)\otimes\cdots\otimes x(t)}_{r\ \text{factors}} \qquad (5\text{-}36)$$

The notation with Kronecker products is more elegant than the one we proposed in (5-20), but it has the disadvantage that redundant monomials are present in the vector $x^{(i)}(t)$. However, for the ideas developed here the Kronecker notation is well suited. Next, difference equations are developed for $x^{(i)}(t+1)$. For instance, for $x^{(2)}(t+1)$ we have

$$x^{(2)}(t+1) = x(t+1)\otimes x(t+1) = \left(\sum_{i=0}^{r}\sum_{j=0}^{r} F_{ij}\,x^{(i)}(t)\,u^{j}(t)\right)\otimes\left(\sum_{i=0}^{r}\sum_{j=0}^{r} F_{ij}\,x^{(i)}(t)\,u^{j}(t)\right) \qquad (5\text{-}37)$$

Still following the calculations in [59] and using implicit summation, this results in a difference equation of the form

$$x^{(2)}(t+1) = \sum_{i,j\ge 0}\left(\sum_{k+q=i}\;\sum_{m+n=j} F_{km}\otimes F_{qn}\right) x^{(i)}(t)\,u^{j}(t). \qquad (5\text{-}38)$$

We apply the same procedure for $x^{(3)}(t+1)$, and so on. Furthermore, a new state vector is defined:

$$\tilde{x}(t) = \begin{bmatrix} x^{(1)}(t) \\ \vdots \\ x^{(r)}(t)\end{bmatrix}. \qquad (5\text{-}39)$$

Note that this state vector is non-minimal for $n_a > 1$, because it contains identical elements due to the redundant monomials of the Kronecker representation. Finally, the terms in (5-38) with a nonlinear degree greater than $r$ (i.e., the terms for which $i + j > r$) are neglected. This implies that the approximation of the system is actually of degree $r$. The following state affine model is then obtained:

$$\tilde{x}(t+1) = \sum_{i=0}^{r-1} A_i\,u^{i}(t)\,\tilde{x}(t) + \sum_{i=1}^{r} B_i\,u^{i}(t)$$
$$y(t) = \sum_{i=0}^{r-1} C_i\,u^{i}(t)\,\tilde{x}(t) + \sum_{i=1}^{r} D_i\,u^{i}(t) \qquad (5\text{-}40)$$

B. Comparison of the Number of Parameters

In (5-39), we observe that the number of states in the state affine approximation grows combinatorially with the degree of approximation $r$. This is the price to be paid for the state affine representation. To calculate the number of required states, the redundant states that originate from the use of the Kronecker product need to be taken into account. The total number of distinct states $n$ of model (5-40) is:

$$n = \binom{n_a+r}{r} - 1 \qquad (5\text{-}41)$$

For a SISO state affine model, there are $n^2 + 2n + 1$ matrix coefficients per set of state affine matrices $\{A_i, B_i, C_i, D_i\}$, and in total there are $r$ such sets. We also have to take into account the similarity transform. Hence, the actual number of parameters is given by

$$r(n^2+2n+1) - n^2. \qquad (5\text{-}42)$$

We will now compare this to the number of parameters that are required for a PNLSS approximation of degree $r$. In Figure 5-1, the ratio between expressions (5-42) (state affine

approach) and (5-33) (PNLSS approach) is shown for various system orders ($n_a = 1,\dots,10$), and for different degrees of approximation ($r = 1,\dots,5$). For $r = 1$ (red line), the ratio is one. This is a natural result, since it corresponds to a linear model, which has the same order for both approximations. For $r > 1$, it can be seen from Figure 5-1 that the ratio is always higher than one.

Figure 5-1. Ratio of the number of parameters for the SA approximation and the PNLSS approximation, for different degrees.

C. Conclusion

We have shown that, for an approximation of the same quality, the PNLSS model always requires a lower number of parameters than the state affine model. For this reason, we prefer the PNLSS model structure over the state affine one.

Stability

The only recursive relation present in the general state space model (5-19) is the state equation. Hence, the stability of the model only depends on the function $f$, the initial conditions of the state, and the properties of the input signal. Therefore, when analysing the stability of (5-19), it suffices to study the following equation

$$x(t+1) = f(x(t), u(t)), \qquad (5\text{-}43)$$

with $x(0) = x_0$. The concept of Input-to-State Stability (ISS) reflects this idea. It was introduced in [74] for continuous-time systems, and extended to the discrete-time case in

[32]. In order to define ISS, the following notations and definitions are used: $\mathbb{N}$ denotes the set of all non-negative integers. The set of all input functions $u: \mathbb{N}\to\mathbb{R}^m$ with the norm $\|u\|_\infty = \sup\{\|u(t)\| : t\in\mathbb{N}\} < \infty$ is denoted by $l_\infty^m$, and $\|\cdot\|$ is the Euclidean norm. The initial state is given by $x(0) = x_0$.

Definition 5.2 A function $\gamma: \mathbb{R}_{\ge 0}\to\mathbb{R}_{\ge 0}$ is a $\mathcal{K}$-function if it is continuous, strictly increasing, and if $\gamma(0) = 0$. A function $\beta: \mathbb{R}_{\ge 0}\times\mathbb{R}_{\ge 0}\to\mathbb{R}_{\ge 0}$ is a $\mathcal{KL}$-function if for each fixed $t \ge 0$ the function $\beta(\cdot, t)$ is a $\mathcal{K}$-function, if for each fixed $s \ge 0$ the function $\beta(s, \cdot)$ is decreasing, and if $\beta(s,t)\to 0$ as $t\to\infty$.

Definition 5.3 (Input-to-State Stability) System (5-43) is globally ISS if there exist a $\mathcal{KL}$-function $\beta$ and a $\mathcal{K}$-function $\gamma$ such that for each input $u \in l_\infty^m$ and each $x_0 \in \mathbb{R}^{n_a}$ it holds that

$$\|x(t, x_0, u)\| \le \beta(\|x_0\|, t) + \gamma(\|u\|_\infty) \qquad (5\text{-}44)$$

for each $t \in \mathbb{N}$.

Loosely explained, a system is ISS if every state trajectory corresponding to a bounded input remains bounded, and if the trajectory eventually becomes small when the input signal becomes small as well, independent of the initial state. In this thesis, we will not try to find such functions $\beta$ and $\gamma$.

Some Remarks on the Polynomial Approach

A. Orthogonal Polynomials

The question addressed here is whether the use of orthogonal polynomials can add some value to the identification of the PNLSS model. Orthogonal polynomials proved their usefulness in times when computing power and memory were scarce. For linear problems, the orthogonality of the regressors brings two advantages. First of all, the re-estimation of parameters is circumvented when new regressors are added to an already solved problem. Unfortunately, this asset is of no use here, since the identification of the proposed model (5-20) requires solving a nonlinear problem (see Identification of the PNLSS Model on p. 115). Secondly, orthogonality can improve the numerical conditioning. To this

matter, it should be noted that orthogonal basis functions only offer a clear advantage when they are applied to signals with a given probability density function. For instance, Hermite polynomials and Chebyshev polynomials are well suited for Gaussian and uniformly distributed signals, respectively. Bearing in mind the class of excitation signals chosen in Chapter 2, it would be logical to select Hermite polynomials. However, the states, which are polynomial functions of the input and the previous states, do not necessarily have a Gaussian distribution. Therefore, we will employ ordinary polynomials, since no clear advantage can be drawn from the application of orthogonal polynomials.

B. Disadvantages of the Polynomial Approach

A number of drawbacks come along with the application of polynomials. The most important one is the explosive behaviour of polynomials outside the region in which they were estimated. Indeed, a polynomial quickly attains large numerical values when its arguments are large. At first sight, this might seem a serious drawback compared with the well-behaved extrapolation of basis functions that tend to a constant value for large input values. However, it should be noted that, in general, it is never a good idea to extrapolate with an estimated model. This fact is independent of the chosen basis functions, whether they are polynomials, hyperbolic tangents, radial basis functions, or sigmoids. The only exception to this rule is when there exists an exact match between the DUT's internal structure and the model structure. In a black box framework, this is seldom the case. We illustrate this rule of thumb by means of a short simulation, where we use two kinds of basis functions to approximate the relation

$$y = \operatorname{atan}(u). \qquad (5\text{-}45)$$

To generate the estimation data set, an input signal of 2000 samples, uniformly distributed between $u = -5$ and $u = 5$, is used. First, we estimate a 15th degree polynomial using linear least squares. Then, a Gaussian Radial Basis Function (RBF) network with 8 centres is estimated with the RBF Matlab toolbox [51]. Both models require the estimation of 16 parameters. Next, we evaluate the extrapolation behaviour of both models by applying an input between $u = -10$ and $u = 10$. The result of this test is shown in Figure 5-2. The top plot shows the original function (solid black line), together with the polynomial approximation (dashed grey line) and the RBF approximation (solid grey line). The bottom plot shows the error of both approximations on a logarithmic scale. Although the output of the RBF model

Figure 5-2. Top: arctangent function (solid black line); polynomial approximation (dash-dotted grey line); Gaussian RBF approximation (solid grey line). Bottom: model error for both approximations.

does not explode like the polynomial approach (as a matter of fact, it converges to zero for $x \to \pm\infty$), it still exhibits severe extrapolation errors close to the estimation region. We conclude that the use of well-behaved basis functions like RBFs, hyperbolic tangents, or sigmoids is no justification for employing them for extrapolation.
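The polynomial half of this experiment is easy to reproduce. The sketch below (our own code; we use numpy's `Polynomial.fit` in place of the original Matlab implementation and leave out the RBF model) shows the small interpolation error and the explosive extrapolation error:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)
u_est = rng.uniform(-5.0, 5.0, 2000)             # estimation data on [-5, 5]
p = Polynomial.fit(u_est, np.arctan(u_est), deg=15)   # 15th degree least squares fit

u_in = np.linspace(-5.0, 5.0, 1001)              # inside the estimation region
u_out = np.array([-10.0, 10.0])                  # extrapolation points
err_in = np.max(np.abs(p(u_in) - np.arctan(u_in)))
err_out = np.max(np.abs(p(u_out) - np.arctan(u_out)))
print(err_in, err_out)    # small inside the estimation region, explodes outside
```

`Polynomial.fit` internally rescales the data to [-1, 1], so the degree-15 least squares problem stays well conditioned; the extrapolation failure is a property of the polynomial itself, not of the fitting procedure.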

5.4 On the Equivalence with some Block-oriented Models

In the past years, simple block-oriented models have been utilized extensively to model nonlinear systems. The block-oriented models most commonly used are the Hammerstein, Wiener, Wiener-Hammerstein, and the Nonlinear Feedback model. Numerous applications of these models in various fields can be found in the literature: the modelling of heat exchangers [10], transmission lines [58], chemical processes [49], and biological systems [30]. Furthermore, various identification methods for block-oriented models exist [2],[3],[11]. In the following sections, we will establish a link between the Polynomial NonLinear State Space (PNLSS) model and a number of standard block-oriented models. We will restrict ourselves to SISO systems, because there exists no rigorous definition for MIMO block-oriented models: when the number of system inputs is different from the number of outputs, it is not clear which dimensions the intermediate signals (i.e., the signals between the blocks) should have. Furthermore, a distinction will be made between the Hammerstein, Wiener, and Wiener-Hammerstein system. The first two systems can be considered as special cases of the Wiener-Hammerstein system, but making the distinction renders the analysis simpler and more interpretable.

Hammerstein

A Hammerstein system consists of a static nonlinearity followed by a linear dynamic system (see Figure 5-3). A typical example where this model is utilized is the case of a non-ideal sensor exhibiting a static nonlinear effect, followed by a transmission line showing linear dynamic behaviour.

Figure 5-3. Hammerstein system.

In general, the input signal $u$ is distorted by a static nonlinearity $P$, resulting in the intermediate signal $v$ which is filtered by a linear system $G_0(z)$. The linear system is

parametrized as an $n_a$-th order linear state space model with parameters $\{A_0, B_0, C_0, D_0\}$. For the parametrization of the static nonlinearity, we rely on the Weierstrass theorem.

Theorem 5.4 (Weierstrass Approximation Theorem) Let $f$ be a continuous function on a closed interval $[a, b]$. Then, given any $\varepsilon > 0$, there exists a polynomial $P$ of degree $r$ such that

$$|f(x) - P(x)| < \varepsilon \qquad (5\text{-}46)$$

for all $x$ in $[a, b]$.

In other words, a continuous function on a closed interval can be uniformly approximated by polynomials [35]. Hence, the static nonlinearity in Figure 5-3 is parametrized as a polynomial with coefficients $p_i$. The following equations describe the Hammerstein system:

$$v(t) = \sum_{i=1}^{r} p_i u^{i}(t) \qquad (5\text{-}47)$$

$$x_0(t+1) = A_0 x_0(t) + B_0 v(t)$$
$$y(t) = C_0 x_0(t) + D_0 v(t) \qquad (5\text{-}48)$$

The substitution of (5-47) in (5-48) results in a set of equations identical to (5-20), when we define the system matrices in (5-20) as

$$A = A_0,\quad B = p_1 B_0,\quad C = C_0,\quad D = p_1 D_0,$$
$$E = [\,p_2 B_0\;\;\cdots\;\;p_r B_0\,],\quad F = [\,p_2 D_0\;\;\cdots\;\;p_r D_0\,] \qquad (5\text{-}49)$$

and the vectors of monomials as

$$\zeta(t) = \eta(t) = u(t)^{\{r\}}. \qquad (5\text{-}50)$$

For the Hammerstein system, $\zeta(t)$ and $\eta(t)$ are a subset of the polynomial vector functions defined in (5-32). Therefore, we can conclude that a Hammerstein system with a continuous nonlinearity can be represented by the PNLSS model in (5-20).

Wiener

A Wiener system is composed of a linear dynamic block $G_0(z)$ followed by a static nonlinear block, as shown in Figure 5-4.

Figure 5-4. Wiener system.

The equations that describe the behaviour of a Wiener system are

$$x_0(t+1) = A_0 x_0(t) + B_0 u(t)$$
$$v(t) = C_0 x_0(t) + D_0 u(t) \qquad (5\text{-}51)$$

$$y(t) = \sum_{i=1}^{r} p_i v^{i}(t) \qquad (5\text{-}52)$$

By substituting the second equation of (5-51) in (5-52), and by applying the Multinomial Expansion (see Multinomial Expansion Theorem on p. 92), we find the following set of system matrices:

$$A = A_0,\quad B = B_0,\quad C = p_1 C_0,\quad D = p_1 D_0,\quad E = 0,$$
$$F = [\,p_2 C_0^2(1)\;\; 2p_2 C_0(1)C_0(2)\;\;\cdots\;\; rp_r C_0(n_a) D_0^{r-1}\;\; p_r D_0^{r}\,] \qquad (5\text{-}53)$$

where $C_0(j)$ denotes the $j$-th entry of $C_0$, and

$$\zeta(t) = 0,\quad \eta(t) = \xi(t)^{\{r\}}. \qquad (5\text{-}54)$$
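The Hammerstein-to-PNLSS mapping (5-49)-(5-50) can be verified in simulation: the cascade and its PNLSS form produce identical outputs. Below is a minimal sketch of ours for a first order linear block and a cubic nonlinearity (all numerical values are arbitrary choices, not taken from the thesis):

```python
import numpy as np

# First order linear block G0 and polynomial coefficients p1, p2, p3 (r = 3)
A0, B0, C0, D0 = 0.9, 1.0, 0.5, 0.2
p = [0.7, -0.3, 0.1]

rng = np.random.default_rng(7)
u = rng.standard_normal(200)

# Direct simulation of the Hammerstein cascade, eqs. (5-47)-(5-48)
x, y_ham = 0.0, []
for ut in u:
    v = p[0]*ut + p[1]*ut**2 + p[2]*ut**3
    y_ham.append(C0*x + D0*v)
    x = A0*x + B0*v

# Equivalent PNLSS simulation with the matrices of (5-49),
# zeta(t) = eta(t) = [u^2, u^3], cf. (5-50)
A, B, C, D = A0, p[0]*B0, C0, p[0]*D0
E = np.array([p[1]*B0, p[2]*B0])
F = np.array([p[1]*D0, p[2]*D0])
x, y_pnlss = 0.0, []
for ut in u:
    z = np.array([ut**2, ut**3])
    y_pnlss.append(C*x + D*ut + F @ z)
    x = A*x + B*ut + E @ z

assert np.allclose(y_ham, y_pnlss)
```

The same experiment carries over to the Wiener case with the matrices of (5-53), with the multinomial expansion supplying the entries of $F$.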

A similar conclusion as for the Hammerstein system can be drawn: $\zeta(t)$ and $\eta(t)$ in (5-54) are a subset of the polynomial vector functions defined in (5-32). Hence, Wiener systems with a continuous static nonlinearity can be represented using the PNLSS approach.

Wiener-Hammerstein

Wiener-Hammerstein systems are defined as a static nonlinear block sandwiched between two linear dynamic blocks $G_1(z)$ and $G_2(z)$ with orders $n_1$ and $n_2$, respectively (see Figure 5-5). The intermediate signals are denoted as $v(t)$ and $w(t)$.

Figure 5-5. Wiener-Hammerstein system.

The system equations are:

$$x_1(t+1) = A_1 x_1(t) + B_1 u(t)$$
$$v(t) = C_1 x_1(t) + D_1 u(t) \qquad (5\text{-}55)$$

$$w(t) = \sum_{i=1}^{r} p_i v^{i}(t) \qquad (5\text{-}56)$$

$$x_2(t+1) = A_2 x_2(t) + B_2 w(t)$$
$$y(t) = C_2 x_2(t) + D_2 w(t) \qquad (5\text{-}57)$$

These equations are combined in order to obtain the representation of (5-20). For this, the state vectors $x_1(t)$ and $x_2(t)$ are merged into the new state vector $x(t)$. Again, the equivalence holds, and we obtain the system matrices in (5-58).

$$A = \begin{bmatrix} A_1 & 0_{n_1\times n_2} \\ p_1 B_2 C_1 & A_2 \end{bmatrix},\quad B = \begin{bmatrix} B_1 \\ p_1 B_2 D_1 \end{bmatrix},\quad C = [\,p_1 D_2 C_1\;\; C_2\,],\quad D = p_1 D_2 D_1,$$
$$E = \begin{bmatrix} 0_{n_1\times n_\zeta} \\ p_2 B_2 C_1^2(1)\;\; 2p_2 B_2 C_1(1)C_1(2)\;\;\cdots\;\; rp_r B_2 C_1(n_1) D_1^{r-1}\;\; p_r B_2 D_1^{r} \end{bmatrix},$$
$$F = D_2\,[\,p_2 C_1^2(1)\;\; 2p_2 C_1(1)C_1(2)\;\;\cdots\;\; rp_r C_1(n_1) D_1^{r-1}\;\; p_r D_1^{r}\,] \qquad (5\text{-}58)$$

The vectors of monomials are defined as

$$\zeta(t) = \eta(t) = \xi'(t)^{\{r\}}, \qquad (5\text{-}59)$$

where

$$\xi'(t) = [\,x_1(t)\;\cdots\;x_{n_1}(t)\;\;u(t)\,]^T. \qquad (5\text{-}60)$$

Nonlinear Feedback

In this section, we discuss a simple (I) and a more general (II) type of Nonlinear Feedback system. The first system is shown in Figure 5-6, and is referred to as NLFB I.

Figure 5-6. Nonlinear Feedback I.

It is described by the following equations:

$$x_0(t+1) = A_0 x_0(t) + B_0 v(t)$$
$$y(t) = C_0 x_0(t) + D_0 v(t) \qquad (5\text{-}61)$$

$$v(t) = u(t) - \sum_{i=1}^{r} p_i y^{i}(t) \qquad (5\text{-}62)$$

After substitution of (5-62) in (5-61), we obtain:

$$x_0(t+1) = A_0 x_0(t) + B_0\left(u(t) - \sum_{i=1}^{r} p_i y^{i}(t)\right)$$
$$y(t) = C_0 x_0(t) + D_0\left(u(t) - \sum_{i=1}^{r} p_i y^{i}(t)\right) \qquad (5\text{-}63)$$

The last equation of (5-63) is a nonlinear algebraic equation due to the presence of the direct term coefficient $D_0$. This coefficient renders the system incompatible with the PNLSS model. For the more general Nonlinear Feedback system (see Figure 5-7), similar nonlinear algebraic equations pop up due to the direct terms of the linear subsystems.

Figure 5-7. Nonlinear Feedback II.

In order to continue the analysis, the following assumptions are made:

Assumption 5.5 (Delay in system NLFB I) A delay is present in the linear dynamic block $G_0(z)$, i.e., $D_0 = 0$.

Assumption 5.6 (Delay in system NLFB II) A delay is present in at least one of the linear blocks $G_1(z)$, $G_2(z)$, or $G_3(z)$. This is equivalent to $D_1 = 0$, $D_2 = 0$, or $D_3 = 0$.

Assumption 5.5 and Assumption 5.6 hold when, for instance, a digital controller is present somewhere in the feedback loop. If this is not the case, we will still assume that a zero direct

term is present in one of the linear blocks. Note that a delay is always present in real-life systems. If the sampling frequency is sufficiently high, the delay will be of the same order of magnitude as one delay tap. When this condition is not fulfilled, the data can be upsampled in order to achieve a negligible direct term [12]. The equations in (5-63) then reduce to:

$$x_0(t+1) = A_0 x_0(t) + B_0\left(u(t) - \sum_{i=1}^{r} p_i\,(C_0 x_0(t))^{i}\right)$$
$$y(t) = C_0 x_0(t) \qquad (5\text{-}64)$$

These system equations are equivalent to (5-20) using the following system parameters:

$$A = A_0 - p_1 B_0 C_0,\quad B = B_0,\quad C = C_0,\quad D = 0,\quad F = 0,$$
$$E = -B_0\,[\,p_2 C_0^2(1)\;\; 2p_2 C_0(1)C_0(2)\;\;\cdots\;\; rp_r C_0(n_a-1) C_0^{r-1}(n_a)\;\; p_r C_0^{r}(n_a)\,] \qquad (5\text{-}65)$$

and the following vectors of monomials:

$$\zeta(t) = x(t)^{\{r\}},\quad \eta(t) = 0. \qquad (5\text{-}66)$$

To prove the equivalence for the system NLFB II, different polynomial vector maps are necessary. They depend on the position of the delay in the feedback loop (i.e., the linear system for which $D_i$ is assumed to be zero). In (5-67), we define three vectors composed of the state vectors of the linear systems $G_1(z)$, $G_2(z)$, and $G_3(z)$:

$$\xi_1(t) = [\,x_1(t)\;\;x_2(t)\,]^T,\quad \xi_2(t) = x_2(t),\quad \xi_3(t) = [\,x_1(t)\;\;x_2(t)\;\;x_3(t)\;\;u(t)\,]^T \qquad (5\text{-}67)$$

The necessary monomials $\zeta(t)$ and $\eta(t)$ are listed in Table 5-2.

              D1 = 0           D2 = 0           D3 = 0
  ζ(t)        ξ1(t)^{r}        ξ2(t)^{r}        ξ3(t)^{r}
  η(t)        0                ξ2(t)^{r}        0

Table 5-2. Required monomials as a function of the position of the delay.
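Under Assumption 5.5, the feedback recursion (5-64) and the PNLSS parametrization (5-65)-(5-66) describe the same dynamics. For a first order example with a cubic feedback polynomial this can be checked directly (a sketch; all numerical values are our own, arbitrary choices):

```python
import numpy as np

A0, B0, C0 = 0.8, 1.0, 0.6          # linear block with D0 = 0 (Assumption 5.5)
p = [0.5, 0.2, -0.1]                # feedback polynomial p1, p2, p3 (r = 3)

rng = np.random.default_rng(3)
u = 0.1 * rng.standard_normal(300)  # a small input keeps the loop well behaved

# Direct simulation of the feedback loop, eq. (5-64)
x, y_fb = 0.0, []
for ut in u:
    y = C0 * x
    y_fb.append(y)
    x = A0*x + B0*(ut - p[0]*y - p[1]*y**2 - p[2]*y**3)

# PNLSS form, eqs. (5-65)-(5-66): zeta(t) = [x^2, x^3], eta(t) = 0
A = A0 - p[0]*B0*C0
E = np.array([-B0*p[1]*C0**2, -B0*p[2]*C0**3])
x, y_pn = 0.0, []
for ut in u:
    y_pn.append(C0 * x)
    x = A*x + B0*ut + E @ np.array([x**2, x**3])

assert np.allclose(y_fb, y_pn)
```

Note how the linear part of the feedback polynomial is absorbed into $A = A_0 - p_1B_0C_0$, while the higher powers of $y = C_0x$ become monomials in the state.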

Conclusion

We have established a link between a number of standard block-oriented models and the PNLSS model. The results for the different nonlinear block structures are summarized in Table 5-3 and Table 5-4. Each column lists the monomials required by the PNLSS approach.

              Hammerstein (5-49)    Wiener (5-53)    Wiener-Hammerstein (5-58)
  ζ(t)        u(t)^{r}              0                ξ'(t)^{r}
  η(t)        u(t)^{r}              ξ(t)^{r}         ξ'(t)^{r}

Table 5-3. PNLSS monomials for open loop block-oriented models.

              NLFB I (5-65)    NLFB II, D1 = 0 (5-67)    NLFB II, D2 = 0 (5-67)    NLFB II, D3 = 0 (5-67)
  ζ(t)        x(t)^{r}         ξ1(t)^{r}                 ξ2(t)^{r}                 ξ3(t)^{r}
  η(t)        0                0                         ξ2(t)^{r}                 0

Table 5-4. PNLSS monomials for feedback block-oriented models.

It is beyond discussion that block-oriented models give the most physical insight to the user. From an identification point of view, they often require fewer parameters than the PNLSS approach. The open loop block-oriented models have the advantage that the intermediate signals are computed in a non-recurrent way during estimation and simulation. Therefore, their stability is simple to check and to ensure. On the other hand, block-oriented models require prior knowledge about the structure of the device, which is not always easy to obtain. For some block-oriented models, like the Wiener-Hammerstein system or the Nonlinear Feedback structure, initial values are not always straightforward to obtain. The PNLSS model is inherently compatible with MIMO systems, and it does not need any prior knowledge. The price paid for this flexibility is the explosion of the number of required model parameters. The pros and cons of both approaches are summarized in Table 5-5. To conclude,

neither of the two approaches is clearly better than the other. For this reason, the user should choose an appropriate model structure according to his/her needs.

                                Block-Oriented    State Space
  Physical interpretation
  Number of parameters
  Flexibility of the model
  Model initialization

Table 5-5. Comparison of the block-oriented approach vs. the state space approach.
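The "explosion of the number of required model parameters" mentioned above is easy to quantify from (5-33)-(5-34). A small sketch (function names are ours):

```python
from math import comb

def pnlss_n_parameters(n_a, n_u, n_y, r):
    """Total number of PNLSS coefficients, eq. (5-33)."""
    n = n_a + n_u
    return (n_a + n_y) * (comb(n + r, r) - 1)

def pnlss_n_effective(n_a, n_u, n_y, r):
    """Eq. (5-34): eq. (5-33) minus the n_a^2 similarity-transform
    degrees of freedom."""
    return pnlss_n_parameters(n_a, n_u, n_y, r) - n_a**2

# Growing the degree r quickly inflates the parameter count,
# here for a 6th order model with 2 inputs and 2 outputs:
for r in range(1, 6):
    print(r, pnlss_n_parameters(6, 2, 2, r))
```

For $r = 1$ the formula reduces to the familiar parameter count of a linear state space model, which is the consistency check used below.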

5.5 A Step beyond the Volterra Framework

By means of two simple examples, we illustrate that systems which fit into the polynomial nonlinear state space model structure do not necessarily belong to the Volterra framework that was set up in the introductory chapter.

Duffing Oscillator

The Duffing oscillator is a second order nonlinear dynamic system which is excited with a harmonic signal. Its behaviour is described by the following differential equation:

$$\frac{d^2 y_c}{dt^2} + a\frac{d y_c}{dt} + b y_c + c y_c^3 = d\cos(\omega t), \qquad (5\text{-}68)$$

where $a$, $b$, and $c$ are the system parameters. The amplitude of the sinusoidal signal is determined by the parameter $d$. According to the value of this parameter, several kinds of output behaviour can occur, such as ordinary harmonic output, period doubling, period quadrupling, and even chaotic behaviour. The Duffing equation can also be written in state space form:

$$\frac{dX_1(t)}{dt} = X_2(t)$$
$$\frac{dX_2(t)}{dt} = d\,u_c(t) - aX_2(t) - bX_1(t) - cX_1^3(t) \qquad (5\text{-}69)$$

where $u_c(t) = \cos(\omega t)$. Next, this continuous-time model is converted into a discrete-time model using the Euler rule with a time step $h$:

$$\frac{dX(t)}{dt} = f(X(t), u_c(t)) \;\;\rightarrow\;\; x(t+1) = x(t) + h f(x(t), u(t)) \qquad (5\text{-}70)$$

such that

$$X(th) \approx x(t), \qquad (5\text{-}71)$$

and

$$u(t) = u_c(th). \qquad (5\text{-}72)$$

We apply this principle to (5-69), and obtain

$$x_1(t+1) = x_1(t) + h x_2(t)$$
$$x_2(t+1) = h d\,u(t) + (1 - ha)x_2(t) - hb\,x_1(t) - hc\,x_1^3(t) \qquad (5\text{-}73)$$

The discrete-time model in (5-73) might be a poor approximation of its continuous-time counterpart due to the simplicity of the differentiation rule. However, in what follows the focus lies solely on the behaviour of the discrete-time model, and not on the relation between (5-69) and (5-73). From (5-73), it is clear that this system belongs to the PNLSS model class. In the next simulation, these equations are simulated during $N_{sim} = 10^6$ iterations using the following settings:

$$a = 1,\quad b = 10,\quad c = 100,\quad d = 0.82,\quad \omega = 3.5,\quad h = \frac{2\pi}{\omega N} \qquad (5\text{-}74)$$

Figure 5-8. Top plot: state trajectory of the discretized Duffing oscillator; bottom plot: DFT of the state trajectory (grey) and the input signal (black).
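The discrete-time model (5-73) is a two-state PNLSS model whose only monomial is $x_1^3$, and it can be simulated in a few lines. A sketch with the settings of (5-74), where we set the number of samples per period to N = 4096 and shorten the run to 10^5 iterations (both are our own choices for this sketch; the thesis simulates 10^6 iterations):

```python
import numpy as np

a, b, c, d = 1.0, 10.0, 100.0, 0.82
omega = 3.5
N = 4096                            # samples per period (our choice for this sketch)
h = 2*np.pi / (omega * N)           # time step, cf. (5-74)

n_sim = 100_000                     # shortened run (the thesis uses 10^6 iterations)
x1, x2 = 0.0, 0.0
traj = np.empty(n_sim)
for t in range(n_sim):
    u = np.cos(omega * t * h)       # u(t) = cos(omega * t * h), cf. (5-72)
    x1, x2 = (x1 + h*x2,
              h*d*u + (1 - h*a)*x2 - h*b*x1 - h*c*x1**3)
    traj[t] = x2

assert np.all(np.isfinite(traj))    # the damped, forced oscillation stays bounded
```

A DFT of such a (steady-state) record reveals the subharmonic lines discussed below, which is what places this system outside the Volterra framework.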

The value of the time step $h$ is chosen such that no leakage is present in the calculated DFTs. In Figure 5-8, the state trajectory of $x_2(t)$ during the first 100 seconds is shown (top plot). In the bottom plot, the DFT of the last $8N$ samples of the state trajectory is plotted (grey), together with the DFT of the input signal (black). From this figure, we observe that besides harmonic components, subharmonic components are also present: the harmonic lines corresponding to $\omega/4$, $\omega/2$, and $3\omega/4$ are excited as well. Although the DUT can be represented by the PNLSS model, it does not fit into the Volterra framework.

Lorenz Attractor

In [40], E. N. Lorenz studied the nonlinear differential equations that describe the behaviour of a forced, dissipative hydrodynamic flow. The solutions of these equations appeared to be extremely sensitive to minor changes of the initial conditions: the so-called butterfly effect. The behaviour of this system is often referred to as chaotic, although it is described by the following deterministic model equations:

$$\frac{dX_1(t)}{dt} = \sigma(X_2(t) - X_1(t))$$
$$\frac{dX_2(t)}{dt} = X_1(t)(\rho - X_3(t)) - X_2(t)$$
$$\frac{dX_3(t)}{dt} = X_1(t)X_2(t) - bX_3(t) \qquad (5\text{-}75)$$

where the model parameters are given by $\rho = 28$, $\sigma = 10$, and $b = 8/3$. Like in the previous section, we convert these equations into a discrete-time description by applying the Euler differentiation method with time step $h$:

$$x_1(t+1) = h\sigma(x_2(t) - x_1(t)) + x_1(t) + hu(t)$$
$$x_2(t+1) = h\bigl(x_1(t)(\rho - x_3(t)) - x_2(t)\bigr) + x_2(t)$$
$$x_3(t+1) = h\bigl(x_1(t)x_2(t) - bx_3(t)\bigr) + x_3(t) \qquad (5\text{-}76)$$

with $X_i(th) \approx x_i(t)$. An input term $hu(t)$ is added to the first state equation, such that initial conditions can be imposed on the system. It can easily be seen that these equations fit into the proposed PNLSS model structure. As with the Duffing oscillator, we are not interested in a

Figure 5-9. State trajectory of the discretized Lorenz attractor (2D plot).

perfect match between (5-75) and (5-76). In the following simulation, we use a time step $h = 0.01$, apply an impulse with amplitude $A = 0.01$ on the state $x_1(t)$, and simulate the equations in (5-76). The resulting state trajectory is shown in Figures 5-9 and 5-10. The chaotic behaviour of this model is in contradiction with the Volterra framework, since here the response to a periodic input is definitely not periodic. This example illustrates that the PNLSS model structure is richer than the Volterra framework.

Figure 5-10. State trajectory of the discretized Lorenz attractor (3D plot).
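Likewise, the discretized Lorenz system (5-76) is a three-state PNLSS model whose nonlinear part consists of the bilinear monomials $x_1x_3$ and $x_1x_2$. A short simulation (our own sketch; the run length of 5000 steps is an arbitrary choice) reproduces the qualitative behaviour: after the tiny impulse the state leaves the unstable origin and settles on the bounded chaotic attractor:

```python
import numpy as np

rho, sigma, b = 28.0, 10.0, 8.0/3.0
h = 0.01
n_sim = 5000

x = np.zeros(3)
traj = np.empty((n_sim, 3))
for t in range(n_sim):
    u = 0.01 if t == 0 else 0.0            # impulse of amplitude A = 0.01
    x = np.array([h*sigma*(x[1] - x[0]) + x[0] + h*u,
                  h*(x[0]*(rho - x[2]) - x[1]) + x[1],
                  h*(x[0]*x[1] - b*x[2]) + x[2]])
    traj[t] = x

assert np.all(np.isfinite(traj))           # bounded (chaotic, but not divergent)
```

Rerunning the same code with the impulse amplitude perturbed in the last digit produces a trajectory that diverges from the original one: the butterfly effect in action.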

131 Identification of the PNLSS Model 5.6 Identification of the PNLSS Model In this part of the chapter, an identification procedure for the model in (5-20) is proposed. It consists of four major steps. First, we estimate, in mean square sense, the Best Linear Approximation (BLA) of the plant. Then, a parametric linear model is estimated from the BLA using frequency domain subspace techniques. This is immediately followed by a nonlinear optimization to improve the linear model estimates. The last step consists of a nonlinear optimization procedure in order to obtain the parameters of the full nonlinear model Best Linear Approximation For calculating the Best Linear Approximation of the Device Under Test (DUT), we refer to Chapter 2 (see Estimating the Best Linear Approximation on p. 28). The procedure explained there converts the measured input/output data into a nonparametric linear model Ĝ( k), and its sample covariance denoted by Ĉ G ( k). Ĝ( k) is given in the form of a Frequency Response Function (FRF). This data reduction step offers a number of advantages. First of all, the Signal to Noise Ratio (SNR) is enhanced. Secondly, it allows the user to select, in a straightforward way, a frequency band of interest. Finally, when periodic data are available, the measurement noise and the effect of the nonlinear behaviour can be separated Frequency Domain Subspace Identification The next step is to transform the nonparametric estimate Ĝ( k) into a parametric model. The purpose is to estimate a linear, discrete-time state space model from Ĝ( k), taking into account the covariance matrix Ĉ G ( k). For this, we make use of the frequency domain subspace algorithm in [44] which allows to incorporate covariance information for non uniformly spaced frequency domain data. 
Furthermore, we rely on the results presented in [53], where the stochastic properties of this algorithm were analysed for the case in which the sample covariance matrix is employed instead of the true covariance matrix. In this section, we briefly recapitulate the algorithm and the model equations on which the algorithm is based.

A. Model Equations

First, consider the DFT of N samples of the state space equations (5-9):

z_k X(k) = A X(k) + B U(k) + z_k x_I
Y(k) = C X(k) + D U(k)    (5-77)

for k = 1, …, F, and with z_k = e^{jω_k T_s}. In the transient term z_k x_I, x_I is defined as

x_I = (1/√N) ( x(0) − x(N) ).    (5-78)

In what follows, we will neglect the transient term. The procedures that determine the BLA result in an estimate in the form of an FRF. Hence, we rewrite (5-77) into the FRF form as well. This is done by setting U(k) = I_{n_u}. The plant model then looks as follows:

z_k X(k) = A X(k) + B
G(k) = C X(k) + D    (5-79)

with the state matrix X(k) ∈ C^{n_a × n_u}, G(k) ∈ C^{n_y × n_u}, and n_a the order of the model. We multiply the second equation of (5-79) by z_k^p, and elaborate it by repeatedly substituting z_k X(k) with the first equation of (5-79):

z_k^p G(k) = z_k^{p−1} ( C z_k X(k) + z_k D )
           = z_k^{p−1} ( C A X(k) + C B + z_k D )
           = z_k^{p−2} ( C A^2 X(k) + C A B + z_k C B + z_k^2 D )    (5-80)

After p − 1 substitutions, we obtain

z_k^p G(k) = C A^p X(k) + C A^{p−1} B + z_k C A^{p−2} B + … + z_k^{p−1} C B + z_k^p D.    (5-81)

We write down equation (5-81) for p = 0, …, r − 1, with r > n_a:

G(k) = C X(k) + D
z_k G(k) = C A X(k) + C B + z_k D
⋮
z_k^{r−1} G(k) = C A^{r−1} X(k) + C A^{r−2} B + … + z_k^{r−2} C B + z_k^{r−1} D    (5-82)

The extended observability matrix O_r and the matrix S_r that contains the Markov parameters are defined as:

O_r = [ C ; C A ; … ; C A^{r−1} ]

S_r = [ D            0            …  0
        C B          D            …  0
        ⋮                             ⋮
        C A^{r−2} B  C A^{r−3} B  …  C B  D ]    (5-83)

We also define

W_r(k) = [ 1  z_k  …  z_k^{r−1} ]^T.    (5-84)

By applying the definitions (5-83) and (5-84) to the r equations of (5-82), we obtain the following relation:

G = O_r X + S_r I,    (5-85)

where the matrices G, X, and I are defined as

G = [ W_r(1) ⊗ G(1)  …  W_r(F) ⊗ G(F) ]
X = [ X(1)  …  X(F) ]    (5-86)
I = [ W_r(1) ⊗ I_{n_u}  …  W_r(F) ⊗ I_{n_u} ]

The complex data equation in (5-85) is now converted into a real equation:

G_re = O_r X_re + S_r I_re,    (5-87)

where we define Z_re = [ Re(Z)  Im(Z) ].

Assumption 5.7. We assume the following additive noise setting:

G(k) = G_0(k) + N_G(k),    (5-88)

where the noise matrix N_G(k) has independent (over k), circular complex normally distributed elements with zero mean

E{ N_G(k) } = 0,    (5-89)

and covariance C_G(k):

C_G(k) = E{ vec( N_G(k) ) vec^H( N_G(k) ) }.    (5-90)

Equation (5-87) then becomes

G_re = O_r X_re + S_r I_re + N_G,re,    (5-91)

with

N_G = [ W_r(1) ⊗ N_G(1)  …  W_r(F) ⊗ N_G(F) ].    (5-92)

Another assumption concerns the controllability and the observability of the true plant model:

Assumption 5.8. The true plant model can be written in the form (5-79), where (A, C) is observable and (A, B) is controllable.

At first sight, it is awkward to refer to a true linear model, whereas the algorithm will be used to identify a model for a nonlinear DUT. The actual system will definitely not adhere to the linear representation in (5-79). However, one should bear in mind that, at this stage, the goal is to retrieve a parametric model for the Best Linear Approximation of the system. From the viewpoint of the BLA, the nonlinear behaviour of the DUT only results in two kinds of effects: bias contributions which change the dynamic behaviour of the BLA, and stochastic contributions which act like ordinary disturbing noise (see also Chapter 2).

The state space matrices can be retrieved from equation (5-91) using the frequency domain subspace identification algorithm summarized in paragraph B.

B. Frequency Domain Subspace Identification Algorithm [44]

1. Estimate the extended observability matrix O_r, given G(k) and C_G(k).

1a. Initialization: choose r > n_a and form

Z = [ I_re ; G_re ]  and  C_N = Re( Σ_{k=1}^{F} W_r(k) W_r^H(k) ⊗ Σ_{i=1}^{n_u} C_G^i(k) ),    (5-93)

where C_G^i(k) denotes the i-th diagonal partition of C_G(k) (see Appendix 5.B).

1b. Eliminate the input I_re from Z using a QR decomposition of Z^T: Z = R^T Q^T.

Z = [ I_re ; G_re ] = [ R_11^T  0 ; R_12^T  R_22^T ] [ Q_1^T ; Q_2^T ]    (5-94)

Define R_11^T as the left upper block of rn_u × rn_u elements. Then, R_12^T has dimensions rn_y × rn_u, and R_22^T (rn_y × rn_y) remains after the elimination of I_re from Z.

1c. Remove the noise influence from (5-91): calculate the SVD of C_N^{−1/2} R_22^T:

C_N^{−1/2} R_22^T = U Σ V^T,    (5-95)

and estimate O_r as

Ô_r = C_N^{1/2} U[:, 1:n_a].    (5-96)

2. Make use of the shift property of O_r to estimate A and C from Ô_r:

Â = Ô_r[1:(r−1)n_y, :]^+ Ô_r[n_y+1:rn_y, :]  and  Ĉ = Ô_r[1:n_y, :],    (5-97)

where ^+ denotes the Moore-Penrose pseudoinverse.

3. Estimate B and D, given Â and Ĉ: minimize V_SS with respect to B and D:

V_SS = Σ_{k=1}^{F} ε^H(k) C_G^{−1}(k) ε(k),    (5-98)

with

ε(k) = vec( G_SS(Â, B, Ĉ, D, k) − G(k) ),    (5-99)
G_SS(A, B, C, D, k) = C ( z_k I_{n_a} − A )^{−1} B + D.    (5-100)
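As an illustration of step 2, the shift property (5-97) can be checked numerically on a noise-free toy example. The sketch below builds the extended observability matrix (5-83) for a known pair (A, C) and recovers both matrices with a pseudoinverse; in a real identification run, Ô_r comes from the SVD in step 1c and is only known up to a similarity transform, so the recovery then holds up to that transform and only approximately.

```python
import numpy as np

def extended_observability(A, C, r):
    """Stack C, CA, ..., CA^(r-1) as in (5-83)."""
    blocks = [C]
    for _ in range(r - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)

def shift_estimate(Or, ny):
    """Estimate (A, C) from the extended observability matrix via (5-97):
    the first (r-1) block rows, shifted down by one block, equal O_{r-1} A."""
    A_hat = np.linalg.pinv(Or[:-ny, :]) @ Or[ny:, :]
    C_hat = Or[:ny, :]
    return A_hat, C_hat

# toy check on a known second-order, single-output system (made-up numbers)
A = np.array([[0.8, 0.2], [-0.1, 0.7]])
C = np.array([[1.0, 0.5]])
Or = extended_observability(A, C, r=5)
A_hat, C_hat = shift_estimate(Or, ny=1)
```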

5.6.3 Nonlinear Optimization of the Linear Model

The weighted least squares cost function V_SS defined in (5-98) is a measure of the model quality. According to this measure, it turns out that the subspace algorithm generates acceptable model estimates. However, in practical applications V_SS strongly depends on the dimension parameter r chosen in step 1a of the identification procedure. A first action that can be taken to improve the model estimates is to apply the subspace algorithm for different values of r, for instance r = n_a + 1, …, 6n_a, and to select the model that corresponds to the lowest V_SS. The second way to obtain better modelling results is to consider the cost function

V_WLS = Σ_{k=1}^{F} ε^H(k) C_G^{−1}(k) ε(k),    (5-101)

with

ε(k) = vec( G_SS(A, B, C, D, k) − G(k) ),    (5-102)

and to minimize V_WLS with respect to all the parameters (A, B, C, D). This is a nonlinear problem that can be solved using the Levenberg-Marquardt algorithm (see The Levenberg-Marquardt Algorithm on p. 135). This method requires the computation of the Jacobian of the model error ε(k) with respect to the model parameters. From (5-102) and (5-100), we calculate the following expressions:

∂ε(k)/∂A_ij = vec( C ( z_k I_{n_a} − A )^{−1} I_ij^{n_a × n_a} ( z_k I_{n_a} − A )^{−1} B )
∂ε(k)/∂B_ij = vec( C ( z_k I_{n_a} − A )^{−1} I_ij^{n_a × n_u} )    (5-103)
∂ε(k)/∂C_ij = vec( I_ij^{n_y × n_a} ( z_k I_{n_a} − A )^{−1} B )
∂ε(k)/∂D_ij = vec( I_ij^{n_y × n_u} )

where I_ij^{m × n} denotes the m × n matrix with entry (i, j) equal to one and all other entries zero.

Figure 5-11. WLS cost function V_SS of the subspace estimates (grey dots), and the cost function V_WLS after the nonlinear optimization (black dots), for different values of the dimensioning parameter r.

The subspace method is used to generate a number of initial linear models (e.g. r = n_a + 1, …, 6n_a), which are used as starting values for the nonlinear optimization procedure. Finally, the model that corresponds to the lowest cost function V_WLS is selected. Due to the fact that a high number of different initial models is employed, there is a higher probability to end up in a global minimum of V_WLS, or at least in a good local minimum. Note that in the parameter space, there exists an infinite number of global minimizers for V_WLS, more precisely a subspace of dimension n_a^2. This is a consequence of fully parametrizing the linear state space representation. Furthermore, while carrying out the nonlinear optimization, the unstable models estimated with the subspace algorithm are stabilized, for instance using the methods described in [14].

To exemplify this method, we apply it to the semi-active damper data set (see Description of the Experiments on p. 161), and estimate models of order n_a = 3 for different values of r: r = 3, …, 75. In Figure 5-11, the cost function of the subspace estimates V_SS is shown (grey dots), together with the cost function of the optimized models V_WLS (black dots). Figure 5-11 illustrates that V_SS is a craggy function of r, and that for this particular data set, the Levenberg-Marquardt algorithm ends up in the same local minimum: the same value of V_WLS is attained for a large number of initial models.

5.6.4 Estimation of the Full Nonlinear Model

The last step in the identification process is to estimate the full nonlinear model

x(t+1) = A x(t) + B u(t) + E ζ(t)
y(t) = C x(t) + D u(t) + F η(t) + e(t)    (5-104)

with the initial state given by x(1) = x_0, and where e(t) is the output noise. For this, a weighted least squares approach will be employed. In order to keep the estimates of the model parameters unbiased, the following assumption is required.

Assumption 5.9. It is assumed that the input u(t) of the model in (5-104) is noiseless, i.e., it is observed without any errors and independent of the output noise.

In practical situations, it may occur that Assumption 5.9 is not fulfilled. When the SNR at the input is sufficiently high (> 40 dB), the resulting bias in the estimated model parameters is negligible. When the SNR is too low, it can be increased by employing periodic signals: by measuring a sufficient number of periods, and averaging over time or frequency, the SNR is improved in a straightforward way.

The weighted least squares cost function V_WLS will be minimized with respect to the model parameters θ = [ vec(A); vec(B); vec(C); vec(D); vec(E); vec(F) ]:

V_WLS(θ) = Σ_{k=1}^{F} ε^H(k, θ) W(k) ε(k, θ),    (5-105)

where W(k) ∈ C^{n_y × n_y} is a user-chosen, frequency domain weighting matrix. Typically, this matrix is chosen equal to the inverse covariance matrix of the output, Ĉ_Y^{−1}(k). This matrix can be obtained straightforwardly when periodic signals are used to excite the DUT. By choosing W(k) properly, it is also possible to put more weight on a certain frequency band of interest. When no covariance information is available and no specific weighting is required by the user, a constant weighting ( W(k) = I_{n_y}, for k = 1, …, F ) is employed. Furthermore, the model error ε(k, θ) ∈ C^{n_y} is defined as

ε(k, θ) = Y_m(k, θ) − Y(k),    (5-106)

where Y_m(k, θ) and Y(k) are the DFT of the modelled and the measured output, respectively. Note that when Y_m(k, θ) is calculated with correct initial conditions, equation (5-106) does not pose serious leakage problems in the case of non periodic data, because the leakage terms present in Y_m(k, θ) and Y(k) cancel each other.

A. Calculation of the Jacobian

We minimize V_WLS(θ) by means of the Levenberg-Marquardt algorithm (see The Levenberg-Marquardt Algorithm on p. 135). This requires the computation of the Jacobian J(k, θ) of the modelled output with respect to the model parameters. Hence, we need to compute

J(k, θ) = ∂ε(k, θ)/∂θ = ∂Y_m(k, θ)/∂θ.    (5-107)

Given the nonlinear relationship in (5-104), it is impractical to calculate the model output and the Jacobian directly in the frequency domain. Therefore, we will perform these operations in the time domain, followed by a DFT in order to obtain Y_m(k, θ) and J(k, θ). Before deriving explicit expressions, we recapitulate some general aspects with respect to the calculation of the Jacobian, which are pointed out in [47] and [78]. Consider a general discrete-time nonlinear model

x(t+1) = f( x(t), u(t), a )
y(t) = g( x(t), u(t), b )    (5-108)

where a and b are the model parameters present in the state and output equation, respectively. The derivatives of the output y(t) with respect to a and b are given by

∂x(t+1)/∂a = ∂f( x(t), u(t), a )/∂x(t) · ∂x(t)/∂a + ∂f( x(t), u(t), a )/∂a
∂y(t)/∂a = ∂g( x(t), u(t), b )/∂x(t) · ∂x(t)/∂a    (5-109)
∂y(t)/∂b = ∂g( x(t), u(t), b )/∂b

These equations can be rewritten as follows:

x_a(t+1) = f_x( x(t), u(t), a ) x_a(t) + f_a( x(t), u(t), a )
y_a(t) = g_x( x(t), u(t), b ) x_a(t)    (5-110)
y_b(t) = g_b( x(t), u(t), b )

Hence, the expressions that define the calculation of the Jacobian (5-110) can be regarded as a new dynamic discrete-time nonlinear model. The inputs of this Jacobian model are the inputs and the simulated states of the original model. These states are obtained by simulating the original model with the estimated parameters of the previous Levenberg-Marquardt iteration. For the model equations (5-104), explicit expressions for the Jacobian can be found in Appendix 5.D. Furthermore, due to the polynomial nature of (5-104), the equations in (5-110) are in a polynomial form as well. Hence, a PNLSS model can be determined that calculates the elements of the Jacobian. In Appendix 5.E, explicit expressions are derived for the state space matrices of this new model.

B. Initial Conditions

In (5-110), the simulated states are employed to calculate the Jacobian. Hence, when the state sequence is computed, the initial state x_0 of the model in (5-104) should be taken into account. For this, three possible approaches are distinguished. The simplest, but rather inefficient, way is to calculate the Jacobian for the full data set, and then to discard the first N_trans transient samples of both the Jacobian and the model error. In this way, a part of the data is not used for the model estimation.

The second method can only be employed when periodic excitations are applied during the experiments. As mentioned earlier, the simulated states from the previous Levenberg-Marquardt iteration are used to calculate the Jacobian. By applying several periods of the input sequence, and by considering only the last simulated state period, the transients become negligible. This principle is depicted in Figure 5-12 for two input periods.
In this particular example, it suffices to discard the first period to obtain states that are in regime. In order to save computing time, a fraction of a period can be used as a preamble. This can be done for highly damped systems, or when the number of samples per period is high.
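The recursion (5-110) can be illustrated on a scalar toy model (a made-up example, not the full PNLSS matrix recursion of Appendix 5.E): for x(t+1) = a x(t) + u(t) + e x(t)^2 with y(t) = x(t), the sensitivity of the output with respect to the nonlinear coefficient e obeys the same Jacobian-model structure, and can be verified against a finite-difference approximation.

```python
import numpy as np

def simulate(a, e, u):
    """Scalar PNLSS-flavoured model: x(t+1) = a x(t) + u(t) + e x(t)^2, y = x."""
    x, y = 0.0, []
    for ut in u:
        y.append(x)
        x = a * x + ut + e * x**2
    return np.array(y)

def sensitivity_e(a, e, u):
    """Recursive Jacobian dy/de following (5-110):
    x_e(t+1) = f_x x_e(t) + f_e, with f_x = a + 2 e x(t) and f_e = x(t)^2."""
    x, xe, dyde = 0.0, 0.0, []
    for ut in u:
        dyde.append(xe)
        xe = (a + 2 * e * x) * xe + x**2   # update x_e before x (uses old x)
        x = a * x + ut + e * x**2
    return np.array(dyde)

rng = np.random.default_rng(0)
u = 0.1 * rng.standard_normal(200)
a, e = 0.5, 0.05
J_analytic = sensitivity_e(a, e, u)
delta = 1e-6                                # central finite-difference check
J_fd = (simulate(a, e + delta, u) - simulate(a, e - delta, u)) / (2 * delta)
```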

Figure 5-12. Removal of the transients in the simulated states; state(s) in regime (black), transient (red).

The last method, which is suitable for both periodic and non periodic excitations, is to estimate the initial conditions x_0 as if they were ordinary model parameters. This can be achieved in a straightforward way, since the estimation of x_0 is equivalent to estimating an extra column in the state space matrix B. The idea is to add an artificial model input u_art to the model, which only contributes to the state equation in a linear way (i.e., only via the B matrix). The resulting input is then given by

u′(t) = [ u(t) ; u_art(t) ].    (5-111)

Assume that the original input, the state and the output data sequences are defined for time indices t = 1, …, N. We consider u(0) = 0 and x(0) = 0, and apply an impulse signal to the artificial input of the system:

u_art(t) = 1 for t = 0
u_art(t) = 0 for t = 1, …, N    (5-112)

Then, we obtain the following state equation for t = 0:

x(1) = A x(0) + B u′(0) + E ζ(0) = B u′(0) = B[:, n_u+1].    (5-113)

Consequently, the initial conditions can be estimated like ordinary model parameters by adding an artificial input u_art to the model.
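The equivalence (5-113) is easy to verify numerically. The sketch below restricts itself to a linear model (E = 0, which suffices to show the mechanism, with made-up system matrices): the output simulated from an initial state x_0 is reproduced exactly by the augmented model started from the zero state, with x_0 as the extra column of B and an impulse on the artificial input one sample before the data starts.

```python
import numpy as np

def simulate_lss(A, B, C, u, x0):
    """y(t) = C x(t), x(t+1) = A x(t) + B u(t), starting from x0."""
    x, y = x0.copy(), []
    for ut in u:
        y.append(C @ x)
        x = A @ x + B @ ut
    return np.array(y)

A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, -1.0]])
x0 = np.array([0.3, -0.2])

rng = np.random.default_rng(1)
u = rng.standard_normal((50, 1))
y_ref = simulate_lss(A, B, C, u, x0)        # reference: start from x0

# augmented model: extra B column equals x0, driven by an impulse at t = 0
B_aug = np.hstack([B, x0[:, None]])
u_ext = np.vstack([np.zeros((1, 1)), u])    # u(0) = 0 prepended
uart_ext = np.zeros((51, 1))
uart_ext[0] = 1.0                           # impulse on u_art at t = 0
u_aug = np.hstack([u_ext, uart_ext])
y_aug = simulate_lss(A, B_aug, C, u_aug, np.zeros(2))
```

Discarding the artificial sample at t = 0, y_aug[1:] coincides with y_ref, so x_0 can indeed be optimized as one extra column of B.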

C. Starting Values

The last obstacle that needs to be cleared before starting the nonlinear optimization is to choose good starting values for θ. For the matrices A, B, C, and D, we will use the estimates obtained from the parametric Best Linear Approximation. The other state space matrices (E and F) are initially set to zero. The idea of using a linear model as a starting point for the nonlinear modelling is certainly not new (e.g. [71]), and is quite often employed. Using the parametric BLA as the initial nonlinear model offers two important advantages. First of all, it guarantees that the estimated nonlinear model performs at least as well as the best linear model. Secondly, for the model structure in (5-104), this principle results in a rough estimate of the model order n_a.

D. How to Handle the Similarity Transform?

As mentioned earlier, the state space representation is not unique: the similarity transform x_T(t) = T^{−1} x(t) leaves the input/output behaviour unaffected. The n_a^2 elements of the transformation matrix T can be chosen freely, under the condition that T is non singular. The parameter space thus has at least n_a^2 unnecessary dimensions. This poses a problem for the gradient-based identification of the model parameters θ ∈ R^{n_θ}: the Jacobian does not have full rank and, hence, an infinite number of equivalent solutions exists. One way to deal with this problem is to use a canonical parametrization, such that the redundancy disappears. However, it is known that this may lead to numerically ill-conditioned estimation problems [45]. A second way to cope with the overparametrization is to employ so-called Data Driven Local Coordinates (DDLC) [45], or a Projected Gradient search [80]. The key idea of these methods is to identify the manifold of models parametrized by θ in the parameter space, for which the models have an identical input/output behaviour.
Thus, any parameter update for which the model remains on this manifold does not change the input/output behaviour. Therefore, the methods presented in [45] and [80] compute the parameter update such that it is locally orthogonal to the manifold: this is achieved by computing a projection matrix P ∈ R^{n_θ × (n_θ − n_a^2)} such that the new Jacobian

J_DDLC(θ) = J(θ) P    (5-114)

has n_a^2 columns less than the original Jacobian J(θ), and has full rank. The matrix P needs to be determined during every iteration step. The third method to deal with the rank deficiency of the Jacobian consists of using a full parametrization, and employing a truncated Singular Value Decomposition (for more details, see The Levenberg-Marquardt Algorithm on p. 135). In [86], it is shown that this method and the DDLC method are equivalent: the search direction in the θ-space computed with DDLC and the one obtained by means of a truncated SVD are identical. The additional advantage of the DDLC method is the calculation of n_a^2 fewer columns of the Jacobian matrix compared with the full parametrization. This can save a considerable amount of computation time, especially when the model order is high. The DDLC approach is feasible when the computation of P is straightforward. This is the case for linear, bilinear, and LPV state space models. However, for the polynomial nonlinear state space model, the calculation of P is very involved. Hence, we will employ the third method: a full parametrization and a truncated SVD.

E. Overfitting and Validation

The nonlinear search should be pursued until the cost function in (5-105) stops decreasing. However, as is often the case for model structures with many parameters, overfitting can occur during the nonlinear optimization [70]. This phenomenon can be visualized by applying a fresh data set to the models obtained from the iterations of the nonlinear search. In the case of overfitting, the model quality first increases up to an optimum, and then deteriorates as a function of the number of iterations. The reason for this is the following: at the start of the optimization, the important parameters are quickly pulled to minimizing values, and diminish the bias error. As the minimization continues, the less important parameters are more and more drawn to minimizing values.
Hence, a growing number of parameters becomes activated, and the variance of the parameter estimates increases. In order to avoid this effect, we use the so-called stopped search [70]: we evaluate the model quality of every estimated model on a test set, and then select the model that achieves the best result. This method is a form of implicit regularization, because it prevents the activation of unnecessary parameters.

F. Stability during Estimation and Validation

We will assume that the parametric linear model obtained from the BLA is stable. As mentioned before, this can be ensured using stabilizing methods for linear models, like for instance [14]. Hence, the nonlinear optimization of the full PNLSS model is started from a stable model. The first phase in the nonlinear optimization consists of calculating the Jacobian. Hence, the first question is whether this calculation remains stable. Consider the recursive expressions for the Jacobian given in (5-168). From these equations, it is observed that the time-varying dynamics of the Jacobian are determined by the factor

( A + E ζ′(t) ).    (5-115)

On the other hand, the Jacobian matrix of the original state equation in (5-104), with respect to the states, is given by

∂x(t+1)/∂x(t) = A + E ζ′(t).    (5-116)

This expression describes the linearised dynamic behaviour of the original model at every time instance. Since (5-115) and (5-116) are identical, the original model and the Jacobian model share the same dynamic behaviour (i.e., the instantaneous poles of both models are identical). Consequently, a stable model always yields a stable Jacobian.

The second question is whether a parameter update during the nonlinear optimization yields a stable model. Naturally, this is not necessarily the case. When instability occurs, it will be reflected by the value of the cost function (Inf or NaN). This phenomenon can easily be handled by the nonlinear optimization procedure, as if it were an ordinary increase of the cost function.

On experimental data, it occurs from time to time that the estimated model becomes unstable on the validation set. To overcome this problem, the following heuristic approach is employed. The validation input signal is also passed on to the nonlinear optimization algorithm.
In this way, the validation output of the updated model with parameters θ_test (see Figure 5-14 in The Levenberg-Marquardt Algorithm on p. 135) can be computed in every iteration. When the validation output is unstable, the optimization algorithm reacts as if the cost function had increased. This approach guarantees a model which is stable on the validation set.

Nevertheless, this procedure prevents the iterative search from passing through an unstable (validation) zone before ending up in a stable zone again. Consequently, this method should only be applied when it is strictly necessary.

Appendix 5.A Some Combinatorials

In this appendix, we calculate the number of distinct monomials in n variables of a given degree r. Choosing r different elements out of a set of n elements can be done in a number of different ways, which is given by the binomial coefficient:

C(n, r) = n! / ( r! (n − r)! )    (5-117)

For instance, if we have to choose two different elements from {1, 2, 3, 4}, this results in

C(4, 2) = 4! / ( 2! (4 − 2)! ) = 6    (5-118)

combinations, namely

{1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}.    (5-119)

We would also like to add identical combinations, like {1, 1} for instance. To do so, we need to add one dummy variable s to the set {1, 2, 3, 4}. Then, the resulting 10 combinations are:

{1, 2}, {1, 3}, {1, 4}, {1, s} with s = 1
{2, 3}, {2, 4}, {2, s} with s = 2    (5-120)
{3, 4}, {3, s} with s = 3
{4, s} with s = 4

In general, we need to add r − 1 dummy variables in order to obtain

C(n + r − 1, r) = (n + r − 1)! / ( r! (n − 1)! )    (5-121)

monomials of degree r in n variables.
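The count (5-121) matches a direct enumeration of multisets; a minimal check using Python's standard library:

```python
from math import comb
from itertools import combinations_with_replacement

def n_monomials(n, r):
    """Number of distinct monomials of degree r in n variables: C(n+r-1, r)."""
    return comb(n + r - 1, r)

# the worked example: degree-2 monomials in 4 variables
count = n_monomials(4, 2)

# cross-check against an explicit enumeration of the multisets
enumerated = len(list(combinations_with_replacement(range(4), 2)))
```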

Appendix 5.B Construction of the Subspace Weighting Matrix from the FRF Covariance

The subspace identification algorithm requires the computation of the weighting matrix C_N, which is defined as

C_N = Re( E{ N_G N_G^H } ).    (5-122)

In Chapter 2, we have determined the covariance matrix C_G(k) as:

C_G(k) = E{ vec( N_G(k) ) vec^H( N_G(k) ) }.    (5-123)

The purpose of this appendix is to find an expression for C_N as a function of the elements of C_G(k).

When we substitute (5-92) in (5-122), we obtain:

C_N = Re( Σ_{k=1}^{F} E{ [ W_r(k) ⊗ N_G(k) ] [ W_r(k) ⊗ N_G(k) ]^H } ).    (5-124)

The following identities hold:

( A ⊗ B )^H = A^H ⊗ B^H
( A ⊗ B )( C ⊗ D ) = AC ⊗ BD    (5-125)

i.e., the Hermitian transpose of a Kronecker product and the Mixed Product rule. Applying these Kronecker product properties to (5-124) results in:

C_N = Re( Σ_{k=1}^{F} W_r(k) W_r^H(k) ⊗ E{ N_G(k) N_G^H(k) } ).    (5-126)

We denote the i-th column of N_G(k) as N_{[:, i]}(k) and obtain

N_G(k) N_G^H(k) = Σ_{i=1}^{n_u} N_{[:, i]}(k) N_{[:, i]}^H(k).    (5-127)

On the other hand, we also have that

Figure 5-13. Ñ divided into n_y × n_y partitions.

Ñ = vec( N_G(k) ) vec^H( N_G(k) ) = [ N_{[:, 1]}(k) ; … ; N_{[:, n_u]}(k) ] [ N_{[:, 1]}^H(k)  …  N_{[:, n_u]}^H(k) ].    (5-128)

Hence, the partition Ñ_ij at position (i, j) in Ñ, with dimensions n_y × n_y (see Figure 5-13), is given by

Ñ_ij = N_{[:, i]}(k) N_{[:, j]}^H(k).    (5-129)

Taking into account equation (5-127), it is clear that E{ N_G(k) N_G^H(k) } can be computed from the elements of C_G(k) = E{ Ñ(k) }. First, C_G(k) should be divided into n_u × n_u partitions, in which each partition contains n_y × n_y elements. Next, the diagonal partitions should be summed in order to determine E{ N_G(k) N_G^H(k) }. Finally, we obtain

C_N = Re( Σ_{k=1}^{F} W_r(k) W_r^H(k) ⊗ Σ_{i=1}^{n_u} C_G^i(k) ),    (5-130)

where C_G^i(k) denotes the i-th n_y × n_y partition on the diagonal of C_G(k).
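The partition argument (5-127)-(5-129) can be verified numerically: summing the n_u diagonal n_y × n_y partitions of vec(N) vec^H(N) indeed reproduces N N^H. A minimal sketch, with hypothetical dimensions n_y = 3 and n_u = 2:

```python
import numpy as np

def sum_diagonal_partitions(M, ny, nu):
    """Sum the nu diagonal ny-by-ny partitions of an (ny*nu)-square matrix,
    as used in (5-130) to recover E{N_G N_G^H} from C_G."""
    return sum(M[i*ny:(i+1)*ny, i*ny:(i+1)*ny] for i in range(nu))

rng = np.random.default_rng(2)
ny, nu = 3, 2
N = rng.standard_normal((ny, nu)) + 1j * rng.standard_normal((ny, nu))

vecN = N.reshape(-1, 1, order="F")      # column-wise vec()
N_tilde = vecN @ vecN.conj().T          # vec(N) vec^H(N), cf. (5-128)

lhs = sum_diagonal_partitions(N_tilde, ny, nu)   # sum of diagonal partitions
rhs = N @ N.conj().T                             # N N^H, cf. (5-127)
```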

Appendix 5.C Nonlinear Optimization Methods

Consider the cost function V(θ, Z), which is a function of the parameter vector θ ∈ R^{n_θ} and the measurements Z. In this appendix, we will summarize a number of standard iterative nonlinear optimization methods that can be used to minimize V(θ, Z) with respect to θ. Given their iterative nature, these methods all have in common the computation of a parameter update Δθ.

A. The Gradient Descent Algorithm

The gradient descent algorithm is the most intuitive method to find the minimum of a function. In this iterative procedure, the parameter update Δθ is proportional to the negative gradient ∇V of the cost function:

Δθ = −λ ∇V,    (5-131)

with λ the damping factor. The main advantages of the gradient method are its conceptual simplicity and its large region of convergence to a (local) minimum. The most important drawback is its slow convergence.

B. The Gauss-Newton Algorithm

When a quadratic cost function V(θ, Z) needs to be minimized:

V(θ, Z) = e^T(θ, Z) e(θ, Z) = Σ_{k=1}^{N} e_k^2(θ, Z),    (5-132)

with e(θ, Z) ∈ R^N a residual, the Gauss-Newton algorithm is well suited. The reason for this is that this iterative procedure makes explicit use of the quadratic nature of the cost function. This results in a faster convergence compared with the gradient method [25]. The parameter update Δθ of this method is given by

Δθ = −( ∇²V )^{−1} ∇V.    (5-133)

This approach requires the knowledge of the Hessian matrix ∇²V (i.e., the matrix containing the second derivatives) and the gradient ∇V of the cost function, both with respect to θ.

Further on, it will become clear that, for a quadratic cost function, the Hessian can be approximated by making use of only the first order derivatives of e(θ, Z). The Hessian matrix and the gradient are given by

∇²V = ∂²V(θ, Z)/∂θ² = 2 J^T(θ, Z) J(θ, Z) + 2 Σ_{k=1}^{N} e_k(θ, Z) ∂²e_k(θ, Z)/∂θ²,    (5-134)

∇V = ∂V(θ, Z)/∂θ = 2 J^T(θ, Z) e(θ, Z),    (5-135)

where J(θ, Z) is defined as the Jacobian matrix of e(θ, Z) with respect to θ:

J(θ, Z) = ∂e(θ, Z)/∂θ.    (5-136)

When the residuals e_k(θ, Z) are small, the second term in (5-134) is negligible compared with the first term. Hence, the Hessian can be approximated by

∂²V(θ, Z)/∂θ² ≈ 2 J^T(θ, Z) J(θ, Z),    (5-137)

which is only a function of the Jacobian. Hence, the Gauss-Newton parameter update Δθ is found by solving

J^T(θ, Z) J(θ, Z) Δθ = −J^T(θ, Z) e(θ, Z).    (5-138)

The step Δθ can be computed in a numerically stable way via the Singular Value Decomposition (SVD) [27] of J(θ, Z):

J(θ, Z) = U Σ V^T.    (5-139)

The parameter update is then given by

Δθ = −V Σ^{−1} U^T e(θ, Z).    (5-140)

If J(θ, Z) is not of full rank, then Σ is singular, and a truncated SVD should be used in order to compute (5-140). This occurs for example when an overparametrized model is utilized.
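The SVD-based update (5-139)-(5-140) can be illustrated on a small zero-residual curve-fitting problem (a hypothetical exponential-decay example, not taken from the thesis); with small residuals, the iteration converges in a handful of steps:

```python
import numpy as np

# toy residual: e_k(theta) = exp(-theta * t_k) - y_k, a single parameter
t = np.linspace(0.0, 2.0, 20)
theta_true = 1.3
y = np.exp(-theta_true * t)

def residual(theta):
    return np.exp(-theta * t) - y

def jacobian(theta):
    # d e_k / d theta = -t_k exp(-theta t_k), as a single-column matrix
    return (-t * np.exp(-theta * t))[:, None]

theta = 0.5                               # starting value
for _ in range(20):
    e = residual(theta)
    U, s, Vt = np.linalg.svd(jacobian(theta), full_matrices=False)
    dtheta = -(Vt.T @ np.diag(1.0 / s) @ U.T @ e)   # Gauss-Newton step (5-140)
    theta = theta + dtheta.item()
```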

The convergence rate of the Gauss-Newton algorithm depends on the assumption that the residuals e_k(θ, Z) are small. If this is the case, then the convergence rate is quadratic; otherwise, it can become supralinear. The main drawback of the Gauss-Newton algorithm is its smaller region of convergence compared with the gradient method.

C. The Levenberg-Marquardt Algorithm

The Levenberg-Marquardt algorithm [36], [42] combines the large convergence region of the gradient descent method with the fast convergence of the Gauss-Newton method. In order to increase the numerical stability and to avoid comparing apples with oranges, the columns of the Jacobian matrix J(θ, Z) need to be normalized prior to the computation of the parameter update. The normalized Jacobian matrix J_N(θ, Z) is given by

J_N(θ, Z) = J(θ, Z) N,    (5-141)

where the diagonal normalization matrix N ∈ R^{n_θ × n_θ} is defined as

N = diag( 1/rms( J_{[:, 1]}(θ, Z) ), …, 1/rms( J_{[:, n_θ]}(θ, Z) ) ).    (5-142)

In most cases, the normalization yields a better condition number (i.e., the ratio between the largest and the smallest non zero singular value) for J_N(θ, Z) compared with J(θ, Z). Next, the parameter update Δθ_N is computed by solving the equation

( J_N^T(θ, Z) J_N(θ, Z) + λ² I_{n_θ} ) Δθ_N = −J_N^T(θ, Z) e(θ, Z),    (5-143)

where the damping factor λ determines the weight between the two methods. If λ has a large numerical value, then the second term in (5-143) is important, and hence the gradient descent method dominates. When λ is small, the Gauss-Newton method takes over. In order to compute (5-143) in a numerically stable way, the SVD of J_N(θ, Z) is calculated first. When the Jacobian is singular, J_N(θ, Z) has rank ñ_θ < n_θ and the SVD is given by

J_N(θ, Z) = U diag( σ_1, σ_2, …, σ_{ñ_θ}, 0, …, 0 ) V^T.    (5-144)

Next, the parameter update Δθ_N is calculated using a truncated SVD. This results in

Δθ_N = −V Λ U^T e(θ, Z),    (5-145)

where the matrix Λ is defined as

Λ = diag( σ_1/(σ_1² + λ²), σ_2/(σ_2² + λ²), …, σ_{ñ_θ}/(σ_{ñ_θ}² + λ²), 0, …, 0 ).    (5-146)

In the last step, the parameter update Δθ_N needs to be denormalized again:

Δθ = N Δθ_N.    (5-147)

As a starting value for λ, the largest singular value of J_N(θ, Z) from the first iteration can be used [25]. Next, λ is adjusted according to the success of the parameter update. When the cost function decreases, the approximation made in (5-137) works well. Hence, λ should be decreased such that the Gauss-Newton influence becomes more important. Conversely, when the cost function increases, the gradient descent method should gain more weight: this is obtained by increasing λ.

Different stop criteria can be employed to bring the iterative Levenberg-Marquardt algorithm to an end. For instance, the optimization can be stopped when the relative decrease of the cost function becomes smaller than a user-chosen value, or when the relative update of the parameter vector becomes too small. However, the simplest approach is to stop the optimization when a sufficiently high number of iterations i_max is exceeded. A full optimization scheme that makes use of this stop criterion is shown in Figure 5-14. In practice, we will also evaluate the cost function on the validation set, and choose the model which performs best on this data set (see Overfitting and Validation on p. 127).
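Steps (5-141)-(5-147) combine into a single parameter update. The sketch below is a direct transcription of those equations, under the assumption of a real-valued Jacobian and residual (complex data would first be converted as in section D); the truncation tolerance tol is an implementation choice, not taken from the text.

```python
import numpy as np

def lm_step(J, e, lam, tol=1e-10):
    """One Levenberg-Marquardt update with column normalization (5-141/142)
    and a truncated-SVD solve (5-144/146); returns the denormalized step."""
    scale = np.sqrt(np.mean(J**2, axis=0))      # rms of each Jacobian column
    scale = np.where(scale == 0, 1.0, scale)    # guard all-zero columns
    N = np.diag(1.0 / scale)
    JN = J @ N                                  # (5-141)
    U, s, Vt = np.linalg.svd(JN, full_matrices=False)
    keep = s > tol * s[0]                       # truncate rank-deficient part
    filt = np.where(keep, s / (s**2 + lam**2), 0.0)   # Lambda diagonal (5-146)
    dtheta_N = -(Vt.T @ (filt * (U.T @ e)))     # (5-145)
    return N @ dtheta_N                         # denormalize (5-147)

# sanity check on a full-rank linear least squares problem (made-up data)
rng = np.random.default_rng(4)
J = rng.standard_normal((30, 4))
theta_star = np.array([1.0, -2.0, 0.5, 3.0])
e0 = -(J @ theta_star)                          # residual at theta = 0
step_gn = lm_step(J, e0, lam=0.0)               # lam = 0: pure Gauss-Newton
step_damped = lm_step(J, e0, lam=1e3)           # large lam: heavily damped
```

With lam = 0 the step reproduces the exact least squares solution, and a large lam shrinks the step toward a short gradient-descent-like move, matching the interpolation described above.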

[Flowchart: initialize θ and λ, and compute the cost V(θ, Z); in each iteration i, compute and normalize the Jacobian, take the SVD [U, S, V] = svd(J_N), compute and denormalize the update δθ_N, evaluate the test cost V_test(θ_test, Z) with θ_test = θ + δθ, accept the step and decrease λ when V_test < V, otherwise increase λ and retry; stop when i_max iterations are exceeded.]

Figure: Levenberg-Marquardt algorithm.

D. Dealing with Complex Data

Suppose the following cost function V(θ, Z) needs to be minimized:

\[ V(\theta, Z) = \varepsilon^H(\theta, Z)\,\varepsilon(\theta, Z) = \sum_{k=1}^{N} \big| \varepsilon_k(\theta, Z) \big|^2, \tag{5-148} \]

with ε(θ, Z) ∈ C^N a complex residual vector. V(θ, Z) can be rewritten as

\[ V(\theta, Z) = \varepsilon_{re}^T(\theta, Z)\,\varepsilon_{re}(\theta, Z), \tag{5-149} \]

where ε_re(θ, Z) is defined as

\[ \varepsilon_{re}(\theta, Z) = \begin{bmatrix} \operatorname{Re}\big( \varepsilon(\theta, Z) \big) \\ \operatorname{Im}\big( \varepsilon(\theta, Z) \big) \end{bmatrix}. \tag{5-150} \]

Furthermore, the matrix J_re(θ, Z) is defined as

\[ J_{re}(\theta, Z) = \begin{bmatrix} \operatorname{Re}\big( J(\theta, Z) \big) \\ \operatorname{Im}\big( J(\theta, Z) \big) \end{bmatrix}. \tag{5-151} \]

The vector ε_re(θ, Z) and the matrix J_re(θ, Z) are real, which allows us to recycle the ideas described in Section C.

E. Weighted Least Squares

In general, a Weighted Least Squares (WLS) cost function is defined as

\[ V_{WLS}(\theta, Z) = \varepsilon^H(\theta, Z)\,W\,\varepsilon(\theta, Z), \tag{5-152} \]

where W ∈ C^{N×N} is a Hermitian, positive definite weighting matrix. Any Hermitian positive (semi-)definite matrix can be decomposed as [27]

\[ W = W^{1/2}\,W^{1/2}, \tag{5-153} \]

where the square root matrix W^{1/2} is also Hermitian. Using the SVD W = UΣV^H, W^{1/2} can be calculated straightforwardly:

\[ W^{1/2} = V\,\Sigma^{1/2}\,V^H = \big( W^{1/2} \big)^H. \tag{5-154} \]

For real matrices, a similar result holds:

\[ W^{1/2} = V\,\Sigma^{1/2}\,V^T = \big( W^{1/2} \big)^T. \tag{5-155} \]

Equation (5-152) can thus be rewritten as

\[ V_{WLS}(\theta, Z) = \big( W^{1/2}\varepsilon(\theta, Z) \big)^H \big( W^{1/2}\varepsilon(\theta, Z) \big), \tag{5-156} \]

or

\[ V_{WLS}(\theta, Z) = \varepsilon'^{\,H}(\theta, Z)\,\varepsilon'(\theta, Z), \tag{5-157} \]

with

\[ \varepsilon'(\theta, Z) = W^{1/2}\,\varepsilon(\theta, Z). \tag{5-158} \]

The Jacobian of ε'(θ, Z) is then given by

\[ J'(\theta, Z) = \frac{\partial \varepsilon'(\theta, Z)}{\partial \theta} = \frac{\partial \big( W^{1/2}\varepsilon(\theta, Z) \big)}{\partial \theta} = W^{1/2}\,J(\theta, Z). \tag{5-159} \]

In this way, we recast the WLS problem such that it can be solved using the techniques from Sections C and D.
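Sections D and E can be combined in one short numerical check: build a Hermitian positive definite W, form its Hermitian square root via the SVD as in (5-153)-(5-155), weight a complex residual as in (5-158), and stack it into a real vector as in (5-150). An illustrative NumPy sketch with our own variable names, for a randomly generated W:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5

# A Hermitian, positive definite weighting matrix W (random, for illustration).
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
W = A.conj().T @ A + n * np.eye(n)

# Hermitian square root via the SVD of W, cf. (5-153)-(5-154).
U, S, Vh = np.linalg.svd(W)
W_half = Vh.conj().T @ np.diag(np.sqrt(S)) @ Vh
assert np.allclose(W_half, W_half.conj().T)      # W^{1/2} is Hermitian
assert np.allclose(W_half @ W_half, W)           # square root property (5-153)

# The WLS cost (5-152) equals the unweighted cost of eps' = W^{1/2} eps
# (5-157)-(5-158), which equals the real cost of the stacked residual (5-150).
eps = rng.standard_normal(n) + 1j * rng.standard_normal(n)
eps_p = W_half @ eps
eps_re = np.concatenate([eps_p.real, eps_p.imag])
V_wls = np.vdot(eps, W @ eps).real
print(np.isclose(V_wls, np.vdot(eps_p, eps_p).real),
      np.isclose(V_wls, eps_re @ eps_re))
```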

Appendix 5.D Explicit Expressions for the PNLSS Jacobian

In this appendix, we compute explicit expressions for the derivatives of the model output (5-104) with respect to the parameters θ. We first define the matrices ζ'(t) ∈ R^{n_ζ × n_a} and η'(t) ∈ R^{n_η × n_a} as

\[ \zeta'(t) = \frac{\partial \zeta(t)}{\partial x(t)} = \begin{bmatrix} \dfrac{\partial \zeta(t)}{\partial x_1(t)} & \cdots & \dfrac{\partial \zeta(t)}{\partial x_{n_a}(t)} \end{bmatrix}, \qquad \eta'(t) = \frac{\partial \eta(t)}{\partial x(t)} = \begin{bmatrix} \dfrac{\partial \eta(t)}{\partial x_1(t)} & \cdots & \dfrac{\partial \eta(t)}{\partial x_{n_a}(t)} \end{bmatrix}. \tag{5-160} \]

Furthermore, I_ij^{m×n} denotes an m × n zero matrix with a single element equal to one at entry (i, j):

\[ I_{ij}^{m \times n} = \begin{bmatrix} 0 & \cdots & 0 \\ \vdots & 1 & \vdots \\ 0 & \cdots & 0 \end{bmatrix} \quad \text{(one in row } i \text{, column } j\text{)}. \tag{5-161} \]

We begin by computing the Jacobian with respect to the elements A_ij of the matrix A. The derivative of the output equation with respect to A_ij is given by

\[ \frac{\partial y(t)}{\partial A_{ij}} = \frac{\partial}{\partial A_{ij}} \big( Cx(t) + Du(t) + F\eta(t) \big) = C\,\frac{\partial x(t)}{\partial A_{ij}} + F\,\eta'(t)\,\frac{\partial x(t)}{\partial A_{ij}}. \tag{5-162} \]

In order to determine the right-hand side of (5-162), we also need the derivatives of the state equation, which are given by

\[ \frac{\partial x(t+1)}{\partial A_{ij}} = \frac{\partial}{\partial A_{ij}} \big( Ax(t) + Bu(t) + E\zeta(t) \big). \tag{5-163} \]

We define x_{A_ij}(t) ∈ R^{n_a} as

\[ x_{A_{ij}}(t) = \frac{\partial x(t)}{\partial A_{ij}}. \tag{5-164} \]

Then, equation (5-163) is rewritten as

\[ x_{A_{ij}}(t+1) = I_{ij}^{n_a \times n_a}\,x(t) + \big( A + E\,\zeta'(t) \big)\,x_{A_{ij}}(t). \tag{5-165} \]

Combining equations (5-162) and (5-165) results in

\[ \begin{aligned} x_{A_{ij}}(t+1) &= I_{ij}^{n_a \times n_a}\,x(t) + \big( A + E\,\zeta'(t) \big)\,x_{A_{ij}}(t), \\ J_{A_{ij}}(t) &= \big( C + F\,\eta'(t) \big)\,x_{A_{ij}}(t), \end{aligned} \tag{5-166} \]

where J_{A_ij}(t) ∈ R^{n_y} is defined as

\[ J_{A_{ij}}(t) = \frac{\partial y(t)}{\partial A_{ij}}. \tag{5-167} \]

The Jacobians with respect to the other model parameters are computed in a similar way. We summarize the results below:

\[ \begin{aligned} x_{A_{ij}}(t+1) &= I_{ij}^{n_a \times n_a}\,x(t) + \big( A + E\,\zeta'(t) \big)\,x_{A_{ij}}(t), & J_{A_{ij}}(t) &= \big( C + F\,\eta'(t) \big)\,x_{A_{ij}}(t), \\ x_{B_{ij}}(t+1) &= I_{ij}^{n_a \times n_u}\,u(t) + \big( A + E\,\zeta'(t) \big)\,x_{B_{ij}}(t), & J_{B_{ij}}(t) &= \big( C + F\,\eta'(t) \big)\,x_{B_{ij}}(t), \\ x_{E_{ij}}(t+1) &= I_{ij}^{n_a \times n_\zeta}\,\zeta(t) + \big( A + E\,\zeta'(t) \big)\,x_{E_{ij}}(t), & J_{E_{ij}}(t) &= \big( C + F\,\eta'(t) \big)\,x_{E_{ij}}(t), \\ J_{C_{ij}}(t) &= I_{ij}^{n_y \times n_a}\,x(t), \quad J_{D_{ij}}(t) = I_{ij}^{n_y \times n_u}\,u(t), & J_{F_{ij}}(t) &= I_{ij}^{n_y \times n_\eta}\,\eta(t). \end{aligned} \tag{5-168} \]
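The recursion (5-165)-(5-166) for a single entry A_ij can be simulated alongside the model and verified against a finite-difference derivative. The sketch below uses a hypothetical second order, single output toy model with a linear output equation (F = 0) and our own naming:

```python
import numpy as np

# Toy PNLSS model (hypothetical numbers): x(t+1) = A x + B u + E z(x),
# y(t) = C x(t), with monomials z(x) = [x1^2, x1*x2, x2^2] and F = 0.
A = np.array([[0.7, 0.1], [-0.2, 0.5]])
B = np.array([1.0, 0.5])
E = 0.05 * np.array([[1.0, -1.0, 0.3], [0.2, 0.4, -0.6]])
C = np.array([1.0, -1.0])

zeta  = lambda x: np.array([x[0]**2, x[0]*x[1], x[1]**2])
dzeta = lambda x: np.array([[2*x[0], 0.0],      # zeta'(t), cf. (5-160)
                            [x[1],   x[0]],
                            [0.0,    2*x[1]]])

def simulate(A, u):
    x, out = np.zeros(2), []
    for ut in u:
        out.append(C @ x)
        x = A @ x + B * ut + E @ zeta(x)
    return np.array(out)

def jacobian_Aij(A, u, i, j):
    """Recursion (5-165)-(5-166): x_Aij(t+1) = I_ij x(t) + (A + E z'(t)) x_Aij(t),
    J_Aij(t) = (C + F eta'(t)) x_Aij(t); here the output equation is linear."""
    x, xA, out = np.zeros(2), np.zeros(2), []
    for ut in u:
        out.append(C @ xA)
        xA_new = np.zeros(2)
        xA_new[i] = x[j]                        # I_ij^{na x na} x(t)
        xA_new += (A + E @ dzeta(x)) @ xA       # (A + E zeta'(t)) x_Aij(t)
        x = A @ x + B * ut + E @ zeta(x)
        xA = xA_new
    return np.array(out)

# Finite-difference check of dy/dA[0,1].
u = np.sin(0.3 * np.arange(30))
h = 1e-6
Ap = A.copy(); Ap[0, 1] += h
JA = jacobian_Aij(A, u, 0, 1)
JA_fd = (simulate(Ap, u) - simulate(A, u)) / h
print(np.max(np.abs(JA - JA_fd)))
```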

Appendix 5.E Computation of the Jacobian Regarded as an Alternative PNLSS System

It is clear from (5-168) that the computation of J_{A_ij}(t), J_{B_ij}(t), and J_{E_ij}(t) is equivalent to calculating the output of an alternative PNLSS system. We consider here, for instance, the calculation of J_{A_ij}(t), and we attempt to write equations (5-166) in the following form:

\[ \begin{aligned} \tilde x(t+1) &= \tilde A\,\tilde x(t) + \tilde B\,\tilde u(t) + \tilde E\,\tilde\zeta(t), \\ \tilde y(t) &= \tilde C\,\tilde x(t) + \tilde F\,\tilde\eta(t). \end{aligned} \tag{5-169} \]

For this, we define the new inputs, states, and outputs as follows:

\[ \tilde u(t) = \begin{bmatrix} x(t) \\ u(t) \end{bmatrix}, \qquad \tilde x(t) = x_{A_{ij}}(t), \qquad \tilde y(t) = J_{A_{ij}}(t). \tag{5-170} \]

A number of relations between the original and the new system matrices are trivial:

\[ \tilde A = A, \qquad \tilde B = \begin{bmatrix} I_{ij}^{n_a \times n_a} & 0^{\,n_a \times n_u} \end{bmatrix}, \qquad \tilde C = C, \qquad \tilde D = 0. \tag{5-171} \]

The remaining system matrices and monomials require slightly more effort to determine. The goal of the following calculations is to rewrite the terms E ζ'(t) x_{A_ij}(t) and F η'(t) x_{A_ij}(t) as Ẽ ζ̃(t) and F̃ η̃(t), respectively. The time indices are omitted for the sake of simplicity. Using the multinomial notation, with the multi-index α_j collecting the powers of the j-th monomial ζ_j, we have

\[ \zeta = \begin{bmatrix} \zeta_1 \\ \vdots \\ \zeta_{n_\zeta} \end{bmatrix} = \begin{bmatrix} \tilde u^{\alpha_1} \\ \vdots \\ \tilde u^{\alpha_{n_\zeta}} \end{bmatrix}. \tag{5-172} \]

Next, we derive an expression for ζ' = ∂ζ/∂x. The derivative of the monomial ζ_j with respect to the state variable x_i is equal to α_j(i) ζ_j x_i^{-1}. We can neglect the presence of the factor x_i^{-1} when x_i is not present in a given monomial, since the corresponding α_j(i) is in that case equal to zero. Hence, we obtain the following relation:

\[ \zeta' = \begin{bmatrix} \alpha_1(1)\,\zeta_1 x_1^{-1} & \cdots & \alpha_1(n_a)\,\zeta_1 x_{n_a}^{-1} \\ \vdots & & \vdots \\ \alpha_{n_\zeta}(1)\,\zeta_{n_\zeta} x_1^{-1} & \cdots & \alpha_{n_\zeta}(n_a)\,\zeta_{n_\zeta} x_{n_a}^{-1} \end{bmatrix}. \tag{5-173} \]

Then, the product ζ' x̃ is given by

\[ \zeta'\tilde x = \begin{bmatrix} \alpha_1(1)\,\zeta_1 & \cdots & \alpha_1(n_a)\,\zeta_1 \\ \vdots & & \vdots \\ \alpha_{n_\zeta}(1)\,\zeta_{n_\zeta} & \cdots & \alpha_{n_\zeta}(n_a)\,\zeta_{n_\zeta} \end{bmatrix} \begin{bmatrix} \tilde x_1\,x_1^{-1} \\ \vdots \\ \tilde x_{n_a}\,x_{n_a}^{-1} \end{bmatrix}. \tag{5-174} \]

We define the new j-th monomial vector ζ̃_j as

\[ \tilde\zeta_j = \zeta_j \begin{bmatrix} \tilde x_1\,x_1^{-1} \\ \vdots \\ \tilde x_{n_a}\,x_{n_a}^{-1} \end{bmatrix}, \tag{5-175} \]

which allows us to rewrite (5-174) as

\[ \zeta'\tilde x = \begin{bmatrix} \alpha_1^T & & 0 \\ & \alpha_2^T & \\ 0 & & \ddots \\ & & & \alpha_{n_\zeta}^T \end{bmatrix} \begin{bmatrix} \tilde\zeta_1 \\ \tilde\zeta_2 \\ \vdots \\ \tilde\zeta_{n_\zeta} \end{bmatrix}. \tag{5-176} \]

This leads to the definition of the new matrix Ẽ:

\[ \tilde E = E \begin{bmatrix} \alpha_1^T & & 0 \\ & \ddots & \\ 0 & & \alpha_{n_\zeta}^T \end{bmatrix}. \tag{5-177} \]
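The exponent-based derivative (5-173) can be sketched generically: represent each monomial by its multi-index α_j and differentiate by decrementing exponents, so that the x_i^{-1} factor never has to be evaluated. An illustrative sketch with our own helper names and a small hand-picked monomial set:

```python
import numpy as np

# Each monomial zeta_j is represented by its multi-index alpha_j: row j of
# `alpha` holds the exponents of (x_1, x_2), as in (5-172).
alpha = np.array([[2, 0],   # x1^2
                  [1, 1],   # x1*x2
                  [0, 2]])  # x2^2

def zeta(x):
    return np.prod(x ** alpha, axis=1)

def zeta_prime(x):
    """zeta' from (5-173): entry (j, i) equals alpha_j(i) * zeta_j / x_i,
    realized by decrementing the i-th exponent (so x_i = 0 is harmless)."""
    n_z, n_a = alpha.shape
    out = np.zeros((n_z, n_a))
    for j in range(n_z):
        for i in range(n_a):
            if alpha[j, i] > 0:
                a = alpha[j].copy()
                a[i] -= 1
                out[j, i] = alpha[j, i] * np.prod(x ** a)
    return out

# Finite-difference check of the derivative at a test point.
x = np.array([0.3, -1.2])
h = 1e-7
fd = np.column_stack([(zeta(x + h * np.eye(2)[i]) - zeta(x)) / h
                      for i in range(2)])
print(np.max(np.abs(zeta_prime(x) - fd)))
```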


CHAPTER 6

APPLICATIONS OF THE POLYNOMIAL NONLINEAR STATE SPACE MODEL

In this chapter, the nonlinear state space approach is applied to a number of real-life systems: the Silverbox, a combine harvester, a semi-active damper, a quarter car set-up, a robot arm, a Wiener-Hammerstein system, and a crystal detector. For each case study, the Device Under Test (DUT) and the performed experiments are described. Next, the Best Linear Approximation is estimated and a nonlinear state space model is built. Whenever possible, we compare our approach with other modelling methods.

6.1 Silverbox

Description of the DUT

The Silverbox is an electronic circuit that emulates the behaviour of a mass-spring-damper system (Figure 6-1). The input u_c of the system is the force applied to the mass m; the output y_c represents the mass displacement. The spring acts nonlinearly and is characterized by the parameters k_1 and k_3. Since k_3 is positive, the spring is hardening: relatively more force is required as the spring is extended. The equation that describes the system's behaviour is given by

\[ m\,\ddot y_c(t) + d\,\dot y_c(t) + k_1\,y_c(t) + k_3\,y_c^3(t) = u_c(t). \tag{6-1} \]

The parameter d determines the damping which is present in the system. For a sinusoidal input u_c(t), (6-1) is also known as the Duffing equation (see Duffing Oscillator on p. 111).

Figure 6-1. Mass-spring-damper system (mass m, damper d, nonlinear spring k_1, k_3, input force u_c).

Description of the Experiments

The applied excitation signal consists of two parts (Figure 6-2). The first part of the signal is filtered Gaussian white noise with a linearly increasing RMS value as a function of time; this sequence has a bandwidth of 200 Hz and an average RMS value of 22.3 mV. This data set will be used to validate the models. The second part of the excitation signal contains 10 realizations of an odd random phase multisine with 8192 samples and 500 transient points per realization. The bandwidth of this excitation signal is also 200 Hz and its RMS value is 22.3 mV. This sequence is applied once to the system under test and will be used to estimate the models. In all experiments, the input and output signals are measured at a sampling frequency of 10 MHz/2^14 ≈ 610.35 Hz.
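For intuition, the hardening behaviour implied by (6-1) can be reproduced in simulation: at large amplitudes the effective stiffness grows, so the amplitude-normalized gain below resonance drops. The sketch below uses illustrative parameter values (not the identified Silverbox constants) and a semi-implicit Euler scheme:

```python
import numpy as np

# Illustrative parameters only (not the identified Silverbox constants).
m, d, k1, k3 = 1.0, 0.1, 1.0, 100.0

def simulate(u, dt=1e-3):
    """Semi-implicit Euler integration of m*y'' + d*y' + k1*y + k3*y^3 = u."""
    y, v = 0.0, 0.0
    out = np.empty(len(u))
    for t, ut in enumerate(u):
        out[t] = y
        a = (ut - d * v - k1 * y - k3 * y**3) / m
        v += dt * a
        y += dt * v
    return out

# Drive well below resonance with a small and a large sine; compare the
# normalized gains over the second half of the record (transient decayed).
t = np.arange(100_000) * 1e-3                     # 100 s at 1 kHz
y_small = simulate(0.01 * np.sin(2 * np.pi * 0.05 * t))
y_large = simulate(1.00 * np.sin(2 * np.pi * 0.05 * t))
gain_small = np.max(np.abs(y_small[50_000:])) / 0.01
gain_large = np.max(np.abs(y_large[50_000:])) / 1.00
print(gain_small > gain_large)   # hardening: the large input sees a stiffer spring
```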

Figure 6-2. Excitation signal, containing the validation set followed by the estimation set (input signal [V] versus time [s]).

Best Linear Approximation

In order to obtain a nonparametric estimate of the Best Linear Approximation (BLA), the FRF is determined for every phase realization of the estimation data set. The BLA Ĝ_BLA(jω_k) is then calculated by averaging those FRFs. Next, a parametric second order linear state space model is estimated. From this model, initial values will be extracted in order to estimate the nonlinear models. The results are plotted in Figure 6-3: the top and bottom plots show the amplitude and phase of the BLA, respectively. The solid black line denotes the BLA; the solid grey line represents the linear model. The total standard deviation σ̂_BLA(k) is also given (black dashed line), together with the model error (dashed grey line), i.e., the difference between the measured BLA and the linear model. Unfortunately, only one period per realization was measured. Hence, no distinction can be made between the nonlinear contributions and the measurement noise (see Periodic Data on p. 28). From Figure 6-3, we observe that the linear model is of good quality: up to a frequency of 120 Hz, the model error coincides with the standard deviation. A statistically significant, but small model error is present in the frequency band between 120 Hz and 200 Hz. This error is surprising, because equation (6-1) corresponds to a linear, second order system when the nonlinear term k_3 y_c^3(t) is omitted. To explain this behaviour, the way the measurements

were made should be taken into account. A band-limited (BL) set-up was employed during the measurements [56]. Hence, a discrete-time, second order model does not suffice to model this continuous-time, second order system. Although the model error disappears for a third order model, we will neglect it and continue with the second order model.

Figure 6-3. BLA of the Silverbox (solid black line); total standard deviation (black dashed line); 2nd order linear model (solid grey line); model error (dashed grey line). Amplitude [dB] and phase [°] versus frequency [Hz].

The second data set is now used to validate the linear model. The measured output of the system is plotted in Figure 6-4 (black line) together with the simulation error (grey line). The RMS value of the simulation error (RMSE) is 13.7 mV. This number should be compared to the RMS output level, which measures 53.4 mV.
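The nonparametric BLA procedure used above — estimate one FRF per random phase multisine realization and average over realizations — can be sketched on a toy weakly nonlinear system. The sketch uses full-grid rather than odd multisines for brevity; all names and the toy system are ours:

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 1024, 10
G_hat = []
for _ in range(M):
    # One full-grid random phase multisine realization.
    U = np.exp(1j * rng.uniform(0, 2 * np.pi, N // 2 + 1))
    U[0] = 0.0
    u = np.fft.irfft(U)
    # Toy weakly nonlinear system: y(t) = 0.9 y(t-1) + u(t) + 0.01 u(t)^3.
    # Simulate two periods and keep the second, steady-state period.
    up = np.tile(u, 2)
    y = np.zeros(2 * N)
    for k in range(1, 2 * N):
        y[k] = 0.9 * y[k - 1] + up[k] + 0.01 * up[k]**3
    Uf, Yf = np.fft.rfft(u), np.fft.rfft(y[N:])
    G_hat.append(Yf[1:N // 2] / Uf[1:N // 2])    # FRF of this realization

G_hat = np.array(G_hat)
G_bla = G_hat.mean(axis=0)                       # nonparametric BLA
sigma = G_hat.std(axis=0) / np.sqrt(M)           # std. dev. of the averaged FRF

# The BLA stays close to the underlying linear FRF 1/(1 - 0.9 exp(-jw)).
w = 2 * np.pi * np.arange(1, N // 2) / N
print(np.max(np.abs(G_bla - 1.0 / (1.0 - 0.9 * np.exp(-1j * w)))))
```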

Figure 6-4. Validation result for the 2nd order linear model: measured output (black) and model simulation error (grey); RMSE: 13.7 mV.

Nonlinear Model

Now, we investigate whether better modelling results can be obtained with a nonlinear model. First, a second order polynomial nonlinear state space (PNLSS) model is estimated with the following settings:

\[ \xi(t) = \begin{bmatrix} x_1(t) & x_2(t) & u(t) \end{bmatrix}^T \tag{6-2} \]

and

\[ \zeta(t) = \xi(t)^{\{3\}}, \qquad \eta(t) = 0. \tag{6-3} \]

Hence, the nonlinear degree in the state equation is nx=[2 3], and we include all cross products of the states and the input. In the output equation, only the linear terms are present (ny=0). This results in a nonlinear model that contains 37 parameters.

The validation results for this nonlinear model are shown in Figure 6-5. Again, the measured output signal is denoted by the black line; the simulation error of the nonlinear model is plotted in grey. The RMS value of the model error has dropped significantly, from 13.7 mV for the linear model to 0.26 mV for the nonlinear model. Hence, the second order polynomial nonlinear state space model performs more than a factor of 50 better than the linear one. The

spectra of the measured validation output signal (black), linear simulation error (light grey), and nonlinear simulation error (dark grey) are shown in Figure 6-6. The errors of the linear model are particularly pronounced around the resonance frequency (approximately 60 Hz), i.e., for large signal amplitudes. The errors of the nonlinear model are concentrated around the resonance frequency and close to DC.

Figure 6-5. Validation result for the best nonlinear model: measured output (black) and model simulation error (grey); RMSE: 0.26 mV.

Figure 6-6. DFT spectra of the measured validation output signal (black), linear simulation error (light grey), and nonlinear simulation error (dark grey).

Higher model orders and degrees of nonlinearity were also tried out, but none of them gave better results than this second order nonlinear model. We also estimated some state affine

models (see State Affine Models on p. 89) of various orders and degrees. In Table 6-1, the validation results for such models of degree 3 and 4 are summarized.

Table 6-1. Validation results (RMSE [mV] and number of parameters) for state affine models of degree 3 and 4, for various model orders n.

It is clear that the state affine approach yields unsatisfactory results on the Silverbox data set. A possible reason for the poor performance is that the state affine approach approximates the nonlinear behaviour of a system by polynomial functions of the input, while the nonlinear behaviour of the Silverbox mainly consists of a nonlinear feedback of the output. As a result, high system orders are required in order to obtain a good model [69].

Comparison with Other Approaches

At the Symposium on Nonlinear Control Systems (NOLCOS) in 2004, a special session was organized around the Silverbox device. The aim was to compare different modelling approaches applied to the same nonlinear device, using the same experimental data. In all the papers participating in this session, the multisine and the Gaussian noise data sets were used for estimation and validation, respectively.

Before continuing, a warning concerning the validation data is appropriate. As can be seen from Figure 6-2, the amplitude of the last part of the validation input exceeds the amplitude of the estimation data. Hence, for this part of the validation data, extrapolation will occur. It is important to emphasize that the performance achieved in this region is not a good measure of the model quality. It is rather a matter of luck: if there is an exact correspondence between the internal structure of the DUT and the model structure, this will generally yield good extrapolation behaviour. But if this is not the case, and the estimated model is only an approximation, the extrapolation will in general be

poor. Therefore, the extrapolation behaviour should be discarded in a fair assessment. Note that the danger of extrapolation also resides in the use of too small amplitudes (for instance, dead zones become relatively more important for smaller inputs).

Among the papers, we distinguish three methodologies. The first one is a white box approach (H. Hjalmarsson [29], J. Paduart [52]), making explicit use of the knowledge about the internal structure of the Silverbox device. M. Espinoza [22], L. Sragner [76], and V. Verdult [81] employ a black box approach. Finally, the paper of L. Ljung [39] shows results for black box and grey box modelling. In the following, we briefly describe each methodology.

In [29], the internal block structure of the Silverbox is reordered to turn it into a MISO Hammerstein system. An existing relaxation method for Hammerstein-Wiener systems (published by E. Bai [1]) is extended to MISO systems and applied to the Silverbox device. The model obtained with this method achieves a validation RMSE of 0.96 mV. Another white box approach is presented in [52]; the ideas from this paper are elaborated in Chapter 4. The RMSE on the validation set is 0.38 mV.

In [22], Least Squares Support Vector Machines (LS-SVM) for nonlinear regression are applied to model the Silverbox. The idea here is to consider a model where the inputs are mapped to a high dimensional feature space with a nonlinear mapping. This feature space is converted to a low dimensional dual space by means of a positive definite kernel K. As the final model is expressed as a function of K, there is no need to compute the feature map explicitly. Furthermore, the dual space makes it possible to estimate the model parameters by solving a linear least squares problem under equality constraints. In [22], polynomial kernels are used on the Silverbox data, yielding a validation result of 0.32 mV.
An even better model is obtained in [23] using a partially linear model (PL-LS-SVM): 0.27 mV. This approach includes the prior knowledge that linear regressors are present in the model. The model used in [81] is a state space model, composed of a weighted sum of two local second order linear models (LLM). The weights are a function of a scheduling vector, which is chosen equal to the output of the system for this particular DUT. A typical choice for the

weighting functions are radial basis functions, which are also used in [81]. The validation result obtained here is 1.3 mV.

In [76], different types of Artificial Neural Networks (ANN) are assessed: Multi-Layer Perceptron (MLP) networks and ΣΠ networks, both making use of hyperbolic tangent basis functions. In both cases, the maximal time lag for the input and the output is chosen equal to 5. The MLP network has one hidden layer and contains 60 neurons. The ΣΠ network has 20 multiplicative and 20 additive elements. The best model is a special MLP network with only 10 hidden neurons, which also makes use of linear regressors. This results in an RMSE of 7.8 mV.

A whole arsenal of different black and grey box techniques is applied in [39], such as neural networks, wavenets, block-oriented models, and physical models. The best result is achieved by a one-hidden-layer sigmoidal neural network with 30 neurons, using input and output regressors with a maximal time lag of 10. Note that a custom, cubic regressor is included, which improves the results significantly. This leads to an RMSE of 0.30 mV.

In Table 6-2, the validation RMSE and the number of required parameters are summarized for the different approaches.

Author              | Approach                        | Validation RMSE [mV] | Number of parameters
J. Paduart          | PNLSS                           | 0.26                 | 37
H. Hjalmarsson [29] | Physical block-oriented model   | 0.96                 |
J. Paduart [52]     | Physical block-oriented model   | 0.38                 |
M. Espinoza [22]    | LS-SVM with NARX                | 0.32                 |
M. Espinoza [23]    | PL-LS-SVM with PL-NARX          | 0.27                 |
L. Sragner [76]     | MLP-ANN                         | 7.8                  |
V. Verdult [81]     | Local Linear State Space model  | 1.3                  |
L. Ljung [39]       | NL ARX model                    | 0.30                 |

Table 6-2. Validation results for various modelling approaches.

From Table 6-2, we conclude that the physical models achieve reasonable validation RMSE values. The main advantage of this approach is the small number of parameters, and the ability to give a physical interpretation to the identified parameters. In general, the black and deep-grey box models show the lowest RMSE values.
The price paid for their excellent

performance is the higher number of parameters they require. An exception to the rule in Table 6-2 is the MLP neural network. Several reasons can explain its poor performance on this particular set-up. First of all, the hyperbolic tangent functions used in the ANN approach do not exploit the polynomial behaviour of the Silverbox. Secondly, it could be that the neural network was not properly initialized, or that the nonlinear search used in the estimation procedure got stuck in a local minimum.

We obtained a good result with the polynomial nonlinear state space model: a low RMSE value (0.26 mV) and a reasonable number of parameters (37). The reason why our approach works so well is the correspondence between the PNLSS model and the internal structure of the Silverbox, which basically consists of a cubic feedback of the output (see Nonlinear Feedback on p. 106). This match is a clear advantage in a validation test that requires extrapolation. To conclude, three black box models clearly stand out in this comparison, with RMSE values close to the noise level of 0.25 mV. This level was obtained from another data set, recorded under similar experimental conditions.

6.2 Combine Harvester

Description of the DUT

The system that we model in this section is a New Holland CR-960 combine harvester (see Figure 6-7); note that no grain header is mounted at the front side of the harvester. We use measurements performed by ir. Tom Coen from the KULeuven, Faculty of Bioscience Engineering, Department MeBioS. A block scheme of the traction system of the machine is shown in Figure 6-8. The black connections denote mechanical transmissions; the grey connection is part of the hydrostatic transmission. The diesel engine delivers the traction power and is coupled to a hydrostatic pump, which in turn drives a hydrostatic engine. The speed of the diesel engine is kept at the requested set point by a regulator which varies the fuel injection. The flow of the hydrostatic pump is controlled by an electric current. The power is then transferred to the front axle through the mechanical gearbox and the front differential.

The traction system to be modelled has two inputs and one output. The first input is the steering current of the hydrostatic pump; the second input is the speed setting of the diesel engine. The engine speed is limited between 1300 and 2100 rotations per minute (rpm); the steering current of the hydrostatic pump can be varied between 0% and 100%. The output of this MISO system is the measured driving speed, expressed in km/h.

Figure 6-7. Combine harvester.

A detailed analysis of

the expected system order of the traction system is presented in [9]. The dynamic behaviour is mainly located in the pump and consists of three second order subsystems. A part of these dynamics can be neglected, as they are relatively fast; hence, the required model order turned out to be four.

Figure 6-8. Traction system of the combine harvester: diesel engine, hydrostatic pump, hydrostatic engine, mechanical gearbox, and front axle; inputs: speed setting [rpm] and steering current [%]; output: machine speed [km/h].

Description of the Experiments

All experiments were performed on the road, with the gearbox fixed in second gear. Two sets of orthogonal random odd, random phase multisines were generated (see Periodic Data on p. 34); hence, a total of 4 realizations were applied to both input channels. Each realization consisted of two periods of 4096 samples each, plus 192 transient samples. The RMS values of the multisines for the first and second input were 57% and 1715 rpm, respectively. The bandwidth of the excitation signals was 2 Hz, and the sampling frequency f_s used in the experiments was 20 Hz. The first two realizations are used to estimate the models, and the remaining two to validate them.

Due to timing problems with the PXI instrumentation system used to perform the experiments, the applied input signals were not completely periodic. As a consequence, we cannot exploit the periodic nature of the original signals to separate the measurement noise from the nonlinear contributions. Hence, we treat the data sequences as if they were non-periodic (see Non Periodic Data on p. 38).

Best Linear Approximation

To estimate the Multiple Input, Single Output (MISO) BLA and its covariance, we split the estimation data (2 × 8384 samples) into M = 32 subrecords of 524 samples.
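The subrecord-based MISO BLA estimate — averaged auto- and cross-spectra followed by G(k) = S_YU(k) S_UU^{-1}(k) — can be sketched as follows. This is an illustrative NumPy implementation with our own function names, checked on a noise-free static two-input toy system:

```python
import numpy as np

def miso_bla(u, y, M):
    """MISO BLA from M subrecords: G(k) = S_yu(k) @ inv(S_uu(k)) per bin."""
    n_u, N = u.shape
    L = N // M
    U = np.stack([np.fft.rfft(u[:, m*L:(m+1)*L], axis=1) for m in range(M)])
    Y = np.stack([np.fft.rfft(y[m*L:(m+1)*L]) for m in range(M)])
    # Averaged cross spectrum S_yu (n_f x n_u) and auto spectrum S_uu (n_f x n_u x n_u).
    S_yu = np.einsum('mk,mik->ki', Y, U.conj()) / M
    S_uu = np.einsum('mik,mjk->kij', U, U.conj()) / M
    return np.stack([S_yu[k] @ np.linalg.inv(S_uu[k])
                     for k in range(S_uu.shape[0])])

# Toy 2-input, 1-output static system y = 2*u1 - 3*u2 (no noise): the BLA
# recovers the constant gains [2, -3] at every frequency bin.
rng = np.random.default_rng(4)
u = rng.standard_normal((2, 2048))
y = 2 * u[0] - 3 * u[1]
G = miso_bla(u, y, M=32)
print(np.round(G[5].real, 6))
```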

Figure 6-9. The MISO BLA (G11 and G12) of the combine harvester (solid black line); total standard deviation (dashed black line); 6th order linear model (solid grey line); model error (dashed grey line). Amplitude [dB] and phase [°] versus frequency [Hz].

Then, we compute the auto- and cross-spectra with equation (2-24). Finally, the BLA is obtained with equation (2-41), and its covariance with equation (2-43). Next, 4th, 5th, and 6th order linear models are estimated with a subspace method (see Frequency Domain Subspace Identification on p. 115) from the BLA and its covariance matrix. The subspace method is then followed by a nonlinear optimization. Figure 6-9 shows the BLA (solid black line) and the total standard deviation (dashed black line), together with the 6th order linear model (solid grey line) and the amplitude of the complex model error (dashed grey line). G11 is the transfer function from the steering current (input 1) to the measured speed (output); G12 is the transfer function from the diesel engine's speed setting (input 2) to the measured speed. Next, a validation test is carried out with the 6th order linear model. In

Figure 6-10, the model error for the validation data is shown (grey), together with the measured output (black). The validation data consist of two merged multisine realizations; therefore, two transient phenomena are present in the model error: one at the start of the data set and one around 400 s. For the calculation of the RMSE (0.73 km/h), we discard 200 samples at the start of each realization in order to eliminate the effect of the transients. When taking a closer look at the simulation error, we observe periodic residuals. This effect can be caused by periodic disturbances (e.g., coupling with the 50 Hz mains) or by unmodelled dynamics (since a quasi-periodic excitation signal was employed). In the next section, it will become clear that there are unmodelled dynamics.

Figure 6-10. Validation result for the 6th order linear model: measured output (black) and model simulation error (grey); RMSE: 0.73 km/h.

Nonlinear Model

For the nonlinear modelling, we use as starting values the 4th, 5th, and 6th order linear models obtained in the previous step. Two types of nonlinear state space models are considered: polynomial nonlinear models and state affine models. For the polynomial models, we have observed that a nonlinear output equation does not enhance the modelling results; therefore, we only show the results for models with a linear output equation (η(t) = 0). We have also noticed that the nonlinear combinations with the inputs in the state equation do not improve the results; hence, these terms are omitted in what follows. The validation RMSE of the estimated polynomial nonlinear models is shown in Table 6-3 (left: ζ(t) = x(t)^(3); right: ζ(t) = x(t)^{3}). From this table, it is clear that the RMSE decreases for higher model orders,

Table 6-3. Validation results (RMSE [km/h] and number of parameters) for the polynomial nonlinear state space models with nx=[3], ny=[] (left) and nx=[2 3], ny=[] (right), for model orders n = 4, 5, 6.

while the number of parameters increases significantly. The best result achieved with the polynomial nonlinear approach is an RMSE of 0.35 km/h, using a 6th order model with nx=[3]. The validation error of this model is shown in Figure 6-11 (grey), together with the measured output (black). The two transients are discarded in the same way as described in the previous section. Figure 6-12 shows the spectra of the measured validation output (black), the linear simulation error (light grey), and the nonlinear simulation error (dark grey). From this plot, we observe that the nonlinear model reduces the linear model error between DC and 1 Hz, but for higher frequencies no significant improvement is obtained.

Figure 6-11. Validation result for the best nonlinear model: measured output (black) and model simulation error (grey); RMSE: 0.35 km/h.

Furthermore, state affine models of degree 3 and 4 are also estimated. The validation results for these models are shown in Table 6-4. A 5th order model of degree 3 yields the best result (0.39 km/h). No clear trends are visible in Table 6-4; the results are comparable to what we

obtained with the polynomial nonlinear state space models. Hence, both approaches perform equally well.

Figure 6-12. DFT spectra of the measured validation output signal (black), linear simulation error (light grey), and nonlinear simulation error (dark grey).

Table 6-4. Validation results (RMSE [km/h] and number of parameters) for state affine models of degree 3 and 4.
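The transient-discarding RMSE used throughout this section can be sketched directly. The helper name and the toy data below are ours; 200 discarded samples per realization start, as in the text:

```python
import numpy as np

def rmse_without_transients(err, realization_starts, n_discard=200):
    """RMS of the simulation error, skipping n_discard samples after the
    start of each merged realization (transient suppression)."""
    keep = np.ones(len(err), dtype=bool)
    for s in realization_starts:
        keep[s:s + n_discard] = False
    return np.sqrt(np.mean(err[keep]**2))

# Toy check: constant error 1, plus a large decaying transient at each of
# two merged-realization starts; the transients do not affect the RMSE.
err = np.ones(8192)
for s in (0, 4096):
    err[s:s + 200] += 50 * np.exp(-np.arange(200) / 30)
print(rmse_without_transients(err, (0, 4096)))   # 1.0
```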

6.3 Semi-active Damper

Description of the DUT

This application concerns the modelling of a magneto-rheological (MR) damper. This damper is called semi-active because the characteristics of the viscous fluid inside the damper are influenced by a magnetic field; hence, the relation between the force over the damper and the position/velocity of the piston changes.

Figure 6-13. Measurement set-up of the magneto-rheological damper (shaker, damper with piston and viscous fluid, load cell, steering current).

Two quantities serve as input to this system: the reference signal applied to the PID controller that regulates the piston position via the shaker, and the current which determines the magnetic field over the viscous fluid. As system output, we consider the force over the damper, which is measured by a load cell. The measurement set-up is shown in Figure 6-13. For an ideal, linear damper, we expect to obtain an improper first order model, i.e., the theoretical relationship between displacement and force for a perfect damper. Due to the non-idealities of the device, the required model order will turn out to be higher.

Description of the Experiments

Both the construction of the set-up and the measurements were carried out by ir. Kris Smolders from the PMA Department of the KULeuven. He applied three realizations of a full grid, random phase multisine to the DUT. The multisines were excited in a frequency band between 0.12 Hz and 10 Hz, and 6 periods per realization were measured. In all the measurements, a sampling frequency f_s of 2000 Hz was used.

A slow DC trend present in the measured output data was removed prior to the estimation procedures. After removal of the DC levels, the signals applied to the first (piston reference) and second input (damper current) of the DUT have RMS values of 39 mV and 194 mV, respectively. The first two multisine realizations are used for the estimation of the models, and the third realization for the validation.

Best Linear Approximation

First, we estimate the device's Best Linear Approximation. Unfortunately, only two realizations were available for this purpose. For a dual input system, this is sufficient to calculate the BLA, but not enough to determine an estimate of its covariance.

Figure 6-14. The MISO BLA (G11 and G12) of the semi-active damper (solid black line); 3rd order linear model (solid grey line); total standard deviation (dashed black line); model error (dashed grey line). Amplitude [dB] and phase [°] versus frequency [Hz].

Hence, the approach described in

Figure 6-15. Validation result for the 3rd order linear model: measured output (black) and model simulation error (grey). RMSE: 34 mV.

"Periodic Data" on p. 34 is not suitable. Therefore, we employ the method described in "Non Periodic Data" on p. 38 to determine the BLA from the averaged input/output data. We compute the auto- and cross spectra with equation (2-24), with M = 32 blocks of 4096 samples. Finally, the BLA is obtained with equation (2-41) and its covariance with equation (2-43). Then, linear models of different orders (2nd to 5th) are estimated from the BLA and its covariance matrix, using a subspace method (see "Frequency Domain Subspace Identification" on p. 115) followed by a nonlinear optimization. Figure 6-14 shows the MISO BLA (solid black line) and the total standard deviation (dashed black line), together with the 3rd order linear model (solid grey line) and the amplitude of the complex model error (dashed grey line). G_11 is the transfer function from the piston reference (input 1) to the measured force (output 1); G_12 is the transfer function from the damper current (input 2) to the measured force (output 1). G_11 behaves as expected for a damper: ideally, the force over the damper should be proportional to the velocity of the piston, i.e., jω times the displacement, and this is indeed roughly what we observe. Furthermore, the top plots show that the relative uncertainty on G_12 is high compared with that on G_11. Hence, the estimated linear model is mainly determined by G_11. Next, a validation test is carried out with the 3rd order linear model. In Figure 6-15, the model error for the validation data is shown (grey), together with the measured output (black). The RMS value of the model error (34 mV) is quite high compared with the RMS value of the measured output (71 mV). We will reduce this error using nonlinear models.
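The non-periodic BLA estimate used here (equations (2-24), (2-41), and (2-43) in the thesis) can be sketched as below for a single input; the MISO case solves a small linear system per frequency instead of a scalar division, and the exact covariance normalization of (2-43) is an assumption, so treat the variance line as a heuristic.

```python
import numpy as np

def bla_nonperiodic(u, y, nblocks):
    """Welch-style sketch of the non-periodic BLA estimate: averaged
    cross/auto spectra over data blocks, G_BLA(k) = S_yu(k) / S_uu(k).
    The variance line is a heuristic stand-in for the thesis's (2-43)."""
    L = u.size // nblocks
    U = np.fft.rfft(u[: nblocks * L].reshape(nblocks, L), axis=1)
    Y = np.fft.rfft(y[: nblocks * L].reshape(nblocks, L), axis=1)
    Syu = np.mean(Y * np.conj(U), axis=0)
    Suu = np.mean(np.abs(U) ** 2, axis=0)
    Syy = np.mean(np.abs(Y) ** 2, axis=0)
    G = Syu / Suu
    coh2 = np.abs(Syu) ** 2 / (Suu * Syy)      # squared coherence
    varG = (1.0 - coh2) * Syy / Suu / nblocks  # heuristic variance of G
    return G, varG
```

With M = 32 blocks, the averaging suppresses both noise and the stochastic nonlinear contributions, so G approaches the BLA.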

Nonlinear Model

For the nonlinear modelling, we use as starting values the 2nd to 5th order linear models obtained in the previous step. Again, two types of nonlinear state space models are considered: PNLSS and state affine models. For the polynomial models, we have observed that using a nonlinear relation in both the state and the output equation always yields better modelling results. Hence, neither a purely linear state equation nor a purely linear output equation is considered in what follows.

             PNLSS, "full", nx=[2 3], ny=[2 3]    PNLSS, "states only", nx=[2 3], ny=[2 3]
Model order  RMSE [mV]   # parameters             RMSE [mV]   # parameters
n=2          …           …                        …           …
n=3          …           …                        …           …
n=4          …           …                        …           …
n=5          …           …                        …           …

Table 6-5. Validation results for the PNLSS models with (left) all the nonlinear combinations, and (right) without nonlinear combinations using the input.

For the PNLSS model, we distinguish between two choices for the nonlinear vectors ζ(t) and η(t). First, we take into account all nonlinear combinations of the states and the inputs (referred to as "full": ζ(t) = η(t) = ξ(t)^{3}, with ξ(t) = [x(t); u(t)]).

Figure 6-16. Validation result for the best nonlinear model: measured output (black) and model simulation error (grey). RMSE: 6.6 mV.
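The nonlinear vector ζ(t) = η(t) = ξ(t)^{3} collects all distinct monomials of degrees 2 and 3 in the entries of ξ(t) = [x(t); u(t)]; a minimal sketch (the toy dimensions are illustrative, not the thesis code):

```python
import numpy as np
from itertools import combinations_with_replacement

def monomials(xi, degrees=(2, 3)):
    """All distinct monomials of the given degrees in the entries of
    xi = [x; u] (the "full" choice; pass the state vector alone for
    "states only")."""
    return np.array([np.prod([xi[i] for i in idx])
                     for d in degrees
                     for idx in combinations_with_replacement(range(len(xi)), d)])

xi = np.array([2.0, 3.0])   # toy xi = [x; u]: one state, one input
z = monomials(xi)           # x^2, x*u, u^2, x^3, x^2*u, x*u^2, u^3
```

Using combinations with replacement avoids duplicate monomials such as x*u and u*x, which would otherwise make the parametrization non-identifiable.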

Figure 6-17. DFT spectra of the measured validation output signal (black), linear simulation error (light grey), and nonlinear simulation error (dark grey).

Secondly, we consider only the nonlinear combinations of the states, without the inputs (referred to as "states only": ζ(t) = η(t) = x(t)^{3}). The validation RMSE of the estimated polynomial models is given in Table 6-5. It is clear that the RMSE decreases for higher model orders up to n=4. We also conclude that taking the nonlinear combinations of the input into account improves, on average, the RMSE, at the price of a significantly higher number of parameters. For the PNLSS approach, the best result is achieved by a "states only" 4th order model with degrees nx=[2 3], ny=[2 3], resulting in a RMSE of 6.6 mV. This is a reduction of the model error by a factor of 5 compared with the linear model. The validation error for the best nonlinear model is given in Figure 6-16 (grey), together with the measured output (black). Figure 6-17 shows the spectra of the measured validation output signal (black), the linear simulation error (light grey), and the nonlinear simulation error (dark grey). This plot illustrates that the nonlinear model squeezes down the model error over a broad frequency range. Furthermore, different state affine models of degree 3 and 4 are estimated. The validation results for these models are given in Table 6-6. A 4th order model of degree 4 yields the best result (13.5 mV). For this DUT, the PNLSS approach clearly performs better than the state affine approach. By employing a 4th order PNLSS model, the simulation error on the validation set was reduced by more than a factor of 5 compared with the BLA: from 34 mV to 6.6 mV. This result should

             State affine, degree 3     State affine, degree 4
Model order  RMSE [mV]   # parameters   RMSE [mV]   # parameters
n=…          …           …              …           …
n=…          …           …              …           …
n=…          …           …              …           …
n=…          …           …              …           …

Table 6-6. Validation results for different state affine models of degree 3 and 4.

be compared with the noise level (1.8 mV), which can easily be determined since several periods of the measured data are available. Hence, it should be possible to reduce the model error by an additional factor of about 3. However, this was not achieved with the PNLSS approach. Possibly the nonlinear optimization got stuck in a local minimum, or the model order/degree should be increased further in order to obtain better results.
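Evaluating the validation RMSE of the models compared above requires simulating the PNLSS model sample by sample; a minimal sketch of the state space form x(t+1) = A x + B u + E ζ(x,u), y(t) = C x + D u + F η(x,u) used in this thesis (the monomial maps and the matrices in the test are placeholders, not an estimated model):

```python
import numpy as np

def simulate_pnlss(A, B, C, D, E, F, zeta, eta, u, x0=None):
    """Simulate a PNLSS model
        x(t+1) = A x(t) + B u(t) + E zeta(x(t), u(t))
        y(t)   = C x(t) + D u(t) + F eta(x(t), u(t)),
    where zeta/eta return the monomial vectors.  Shapes are assumed
    consistent; u has one row per time step."""
    x = np.zeros(A.shape[0]) if x0 is None else np.asarray(x0, float).copy()
    y = np.empty((u.shape[0], C.shape[0]))
    for t, ut in enumerate(u):
        y[t] = C @ x + D @ ut + F @ eta(x, ut)
        x = A @ x + B @ ut + E @ zeta(x, ut)
    return y
```

The recursion makes clear why validation uses a free-run simulation: the polynomial state feedback can diverge for inputs outside the range covered during estimation, which is the extrapolation risk mentioned for the quarter car data below.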

6.4 Quarter Car Set-up

Description of the DUT

In this test case, we study a quarter car set-up which is situated at the PMA Department of the KULeuven, and which was built by ir. Kris Smolders [72]. The set-up is a scale model of a car suspension based on masses, springs, and the magneto-rheological damper that was modelled in section 6.3.

Figure 6-18. Quarter car set-up (car mass, load cell, magneto-rheological damper, wheel mass, shaker).

The system is excited by a hydraulic shaker which emulates, by means of a PID controller, the vertical road displacement. The reference signal for the position of the shaker serves as system input. The force over the damper is considered as the system output; it is measured with a load cell placed between the damper and the car mass. Taking into account the various interactions between the masses, springs, and damper, as well as the shaker dynamics, the expected model order is about six (for a more elaborate discussion, see [72]).

Description of the Experiments

K. Smolders applied two realizations of a full grid, random phase multisine (RPM), excited in a frequency band between 0.05 Hz and 10 Hz. Per multisine realization, 10 periods

Figure 6-19. (a) Random Phase Multisine data set, and (b) Gaussian Noise data set. RMS RPM (light grey line), RMS GN (dark grey line), and extrapolation zone (grey block).

were measured, with … samples per period. Furthermore, a filtered Gaussian noise (GN) sequence with a linearly increasing RMS value over time was applied to the system. This signal consisted of about … data samples. The RMS values of the multisine and the noise sequence are 75 mV and 58 mV, respectively. Both data sets are shown in Figure 6-19. In the plot on the right side, the RMS value of the RPM sequence is given (light grey line), together with the RMS value of the GN data set (dark grey line). The latter is calculated per block of … samples. From this plot, we observe that the RMS value of the Gaussian noise sequence exceeds the RMS value of the multisine data set around t = 100 s. For larger values of t, we end up in the extrapolation zone (grey block). In all the measurements, the current applied to the semi-active damper was fixed at 1 A, and a sampling frequency f_s of 2000 Hz was used. Prior to the estimation, a slow DC trend that stems from the load cell sensor was removed from all the measured data, using linear detrending. Originally, the RPM data set was intended for the estimation, and the GN data set for the validation. However, this leads to poor modelling results, even when the GN sequence is only used up to t = 100 s. A possible

Figure 6-20. DFT spectrum of (a) the RPM signal, and (b) the GN data set.

explanation for this is the fact that the spectrum of the GN signal is broader than the RPM's spectrum (see Figure 6-20). Hence, we decided to interchange the roles of both data sets such that spectral extrapolation is avoided: the GN data set now serves for the estimation of the models, and the RPM data set for the validation.

Best Linear Approximation

Since the GN data set is employed for the estimation of the models, the approach described in "Non Periodic Data" on p. 38 is used to determine the BLA. First, we compute the auto- and cross spectra with equation (2-24), with M = 34 blocks of … samples. Then, the BLA is obtained with equation (2-41) and its covariance with equation (2-43). From these data, 4th to 6th order linear models are estimated using a subspace method (see "Frequency Domain

Figure 6-21. BLA of the quarter car set-up (solid black line); total standard deviation (dashed black line); 4th order linear model (solid grey line); model error (dashed grey line).

Figure 6-22. Validation result for the 4th order linear model: measured output (black) and model simulation error (grey). RMSE: 136 mV.

Subspace Identification" on p. 115), followed by a nonlinear optimization. Figure 6-21 shows the BLA (solid black line) and the total standard deviation (dashed black line), together with the 4th order linear model (solid grey line) and the amplitude of the complex model error (dashed grey line). Next, the 4th order linear model is validated on the RPM data set. In Figure 6-22, the simulation error for the validation data is given (grey), together with the measured output (black). The RMS value of the model error (136 mV) is quite high compared to the RMS value of the measured output (285 mV). Hence, we will try to reduce this error with the PNLSS approach.

Nonlinear Model

In what follows, we only show the results for the PNLSS models. State affine models were also estimated but are omitted here, since they yielded poor results. In Table 6-7, the validation results are shown for PNLSS models of various orders, with a nonlinear state and output equation. Two kinds of models are discussed: models that contain all the nonlinear combinations of the states and the inputs (PNLSS "full": ζ(t) = η(t) = ξ(t)^{3}), and models that only employ the nonlinear combinations of the states (PNLSS "states only": ζ(t) = η(t) = x(t)^{3}). The entries indicated with N.A. correspond to models which could not be estimated due to memory restrictions. In this respect, recall that the estimation data set consists of about … data samples. The best validation result is achieved by the 5th order model from the right hand side of the table, giving a simulation

             PNLSS, "full", nx=[2 3], ny=[2 3]    PNLSS, "states only", nx=[2 3], ny=[2 3]
Model order  RMSE [mV]   # parameters             RMSE [mV]   # parameters
n=4          …           …                        …           …
n=5          N.A.        N.A.                     …           …
n=6          N.A.        N.A.                     …           …

Table 6-7. Validation results for the PNLSS models with all the nonlinear combinations (left), and without nonlinear combinations with the input (right).

error of 44 mV. Since the validation data are periodic and several periods were measured, we can compare this figure to the noise level at the output, which is 1.8 mV. Apparently, a significant amount of unmodelled dynamics is still present in the residuals, although the model error decreased by more than a factor of 3 compared with the linear model. Hence, this DUT is an example where the PNLSS approach delivers unsatisfactory results. A higher model order or an increased nonlinear degree might improve the results, but the size of the data set prevents the estimation of such models due to memory restrictions. Another possible explanation for the poor result is that the polynomial approximation is not suited to this set-up, e.g. due to the presence of hard saturation in the DUT. In Figure 6-24, the spectra of the measured validation output signal (black), the linear simulation error (light grey), and the nonlinear simulation error (dark grey) are shown. From this plot, it can be seen that a significant model error reduction is achieved by the nonlinear model between DC and approximately 50 Hz.

Figure 6-23. Validation result for the best nonlinear model: measured output (black) and model simulation error (grey). RMSE: 44 mV.
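The parameter counts in these tables grow quickly with the model order, because the number of distinct monomials of degrees [2 3] in q variables is C(q+1,2) + C(q+2,3). A small bookkeeping sketch, counting only the nonlinear E and F coefficients (how exactly the tables were tallied is an assumption; the linear A, B, C, D coefficients come on top):

```python
from math import comb

def n_monomials(q, degrees=(2, 3)):
    """Number of distinct monomials of the given degrees in q variables:
    sum of C(q + d - 1, d) over the degrees (combinations with repetition)."""
    return sum(comb(q + d - 1, d) for d in degrees)

def pnlss_nl_params(n, m, l, full=True):
    """Coefficients in E (n rows) and F (l rows) for degrees [2 3]:
    q = n + m variables for the "full" variant, q = n for "states only"."""
    q = n + m if full else n
    return (n + l) * n_monomials(q)

# e.g. n = 4 states, m = 1 input, l = 1 output (illustrative dimensions):
# "full" gives 5 * 50 = 250 nonlinear coefficients, "states only" 5 * 30 = 150.
```

This combinatorial growth is why the "full" variant runs into the memory restrictions mentioned above for n = 5 and n = 6.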

Figure 6-24. DFT spectra of the measured validation output signal (black), linear simulation error (light grey), and nonlinear simulation error (dark grey).

6.5 Robot Arm

Description of the DUT

In this case study, we model a robot arm (see Figure 6-25) that was constructed by ir. Thomas Delwiche and his co-workers from the Control Engineering Department of the Université Libre de Bruxelles (ULB). The goal of his research is to design a controller for the robot arm such that it can be used for long-distance surgery: the manipulations carried out by a surgeon with the master robot arm should be repeated accurately by a slave device, which should give force feedback to the surgeon. T. Delwiche carried out experiments on the device in cooperation with the Department ELEC of the Vrije Universiteit Brussel. The robot arm rotates by means of a DC motor that is driven by a servo-amplifier, which incorporates a controller. The reference voltage sent to the servo-amplifier serves as input of the system. The input signal is proportional to the torque applied to the arm (1 V = … Nm). The output of the system is the angle of the arm, measured with a 1024 counts per turn encoder connected to the motor shaft (1 V ≈ 90°). Furthermore, the speed of the arm is fed back through a controller in order to introduce damping into the system. This feedback loop is considered an intrinsic part of the DUT. When we neglect the dynamics of the DC motor and the nonlinear effects in the set-up, we expect a second order relationship between the torque applied to the arm and the resulting angle.

Figure 6-25. Robot arm.

Description of the Experiments

T. Delwiche performed several multisine experiments on the robot arm, using different RMS input levels and different bandwidths for the excitation signal. All experiments were performed at a sampling frequency of 10 MHz/2^14 ≈ 610.35 Hz. From all the available data, we selected a set of experiments in which the excitation signal has a bandwidth of 30 Hz and an RMS value of 80 mV. Ten realizations of a random odd, random phase multisine were applied to the DUT. Each realization consisted of two periods with … samples per period. We use eight of these realizations (… samples in total) to estimate the models and the remaining two realizations (… samples) to validate them.

Figure 6-26. BLA of the robot arm (solid black line); total standard deviation (dashed black line); measurement noise level (dotted black line); 2nd order linear model (solid grey line); model error (dashed grey line).

Best Linear Approximation

First, we calculate the BLA with formula (2-30). Since periodic excitations were used and more than one period per realization was measured, it is possible to distinguish the nonlinear contributions from the measurement noise. Figure 6-26 shows the estimated BLA of the robot arm (solid black line). Furthermore, the total standard deviation σ̂_BLA(k), due to the combined effect of measurement noise and nonlinear distortions (dashed black line), and the standard deviation due to the measurement noise, σ̂_n(k) (dotted black line), are also plotted. We see that the total standard deviation lies significantly higher than the measurement noise, indicating that the nonlinear behaviour is dominant compared with the measurement noise. Then, a number of linear models are estimated with subspace techniques, followed by a nonlinear optimization of the cost function (5-101), using only the excited frequency lines. The best result is achieved by a 3rd order linear model. This model is also plotted in Figure 6-26 (solid grey line), together with the model error (dashed grey line). Although these models seem to fit the nonparametric Best Linear Approximation estimate well, they deliver poor validation results. Hence, a second nonlinear optimization is applied, this time using all the frequency lines including DC. After this optimization, the 3rd order linear model is validated. The result is presented in Figure 6-27, which shows the measured output (black) and the model error (grey). The RMS error of this model is 34.6 mV. This should be compared to the RMS level of the output, which is 218 mV. We will now try to reduce this model error by estimating a number of nonlinear state space models.

Figure 6-27. Validation result for the 3rd order linear model: measured output (black) and model simulation error (grey). RMSE: 34.6 mV.
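Separating the noise level σ̂_n(k) from the total standard deviation σ̂_BLA(k) exploits the two averaging directions in periodic data; a sketch of that idea (a simplified variant that assumes a noiseless, known input when dividing the spectra, which the thesis's formula (2-30) handles more carefully):

```python
import numpy as np

def bla_robust(U, Y):
    """Robust BLA sketch for periodic data.  U, Y hold the input/output DFT
    spectra at the excited lines, shaped (M realizations, P periods, F lines).
    The period-to-period scatter yields the noise variance; the scatter of
    the per-realization FRFs yields the total (noise + nonlinear) variance."""
    M, P, _ = U.shape
    G_mp = Y / U                                   # FRF per period
    G_m = G_mp.mean(axis=1)                        # FRF per realization
    G = G_m.mean(axis=0)                           # the BLA
    var_noise = G_mp.var(axis=1, ddof=1).mean(axis=0) / (P * M)
    var_total = G_m.var(axis=0, ddof=1) / M
    return G, var_noise, var_total
```

When var_total clearly exceeds var_noise, as observed for the robot arm, the excess is attributed to the stochastic nonlinear distortions.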

Nonlinear Model

Again, we start by estimating some polynomial nonlinear state space models, using the linear models obtained in the previous step as starting values. The results are given in Table 6-8. The second column shows models with a nonlinear state and output equation (ζ(t) = η(t) = ξ(t)^{3}); the third column shows models that only have a nonlinear state equation (ζ(t) = ξ(t)^{3}, η(t) = 0). The best result is achieved by the 3rd order model of the second column (13.5 mV).

             PNLSS, nx=[2 3], ny=[2 3]    PNLSS, nx=[2 3], ny=[]
Model order  RMSE [mV]   # parameters     RMSE [mV]   # parameters
n=…          …           …                …           …
n=…          …           …                …           …
n=…          …           …                …           …

Table 6-8. Validation results for the polynomial nonlinear state space models.

Furthermore, state affine models of degree 3 and 4 are estimated. The results are summarized in Table 6-9. An increasing model order improves the RMSE, apart from some exceptions where the nonlinear optimization probably got stuck in a local minimum.

             State affine, degree 3     State affine, degree 4
Model order  RMSE [mV]   # parameters   RMSE [mV]   # parameters
n=…          …           …              …           …
n=…          …           …              …           …
n=…          …           …              …           …
n=…          …           …              …           …
n=…          …           …              …           …
n=…          …           …              …           …
n=…          …           …              …           …
n=…          …           …              …           …

Table 6-9. Validation results for state affine models of degree 3 and 4.

Figure 6-28. Validation result for the best nonlinear model: measured output (black) and model simulation error (grey). RMSE: 5.3 mV.

For the robot arm, the state affine approach clearly yields better results than the PNLSS approach when it comes to minimizing the RMSE: compare the best RMSE values achieved on the validation set, 5.3 mV versus 13.5 mV. The validation test of the best nonlinear model is shown in Figure 6-28. Compared with the linear model, the model error is reduced by almost a factor of 7: from 34.6 mV to 5.3 mV. Although this is a good result, the smallest validation error is still large compared with the noise level (0.4 mV). This indicates that there are still unmodelled dynamics in the residuals. Figure 6-29 shows the DFT spectra

Figure 6-29. DFT spectra of the measured validation output signal (black), linear simulation error (light grey), and nonlinear simulation error (dark grey).

of the measured validation output signal (black), the linear simulation error (light grey), and the nonlinear simulation error (dark grey). The nonlinear model visibly reduces the model error over a broad spectral range. The remaining errors are concentrated around DC and stem from the (low frequency) drift problems observed during the measurements.
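Error-spectrum plots like Figure 6-29 show the one-sided amplitude DFT spectrum of each signal in dBV; a minimal sketch of that computation (the dBV convention assumed here is the per-bin amplitude of the full record):

```python
import numpy as np

def dft_spectrum_dbv(x, fs):
    """One-sided amplitude DFT spectrum in dBV: per-bin amplitude of the
    length-N record, with the negative-frequency energy folded in."""
    N = x.size
    X = np.fft.rfft(x) / N
    X[1:-1] *= 2                              # fold in the negative frequencies
    f = np.fft.rfftfreq(N, d=1.0 / fs)
    return f, 20 * np.log10(np.maximum(np.abs(X), 1e-300))
```

Plotting the linear and nonlinear simulation errors this way makes it easy to see in which bands the nonlinear model actually helps.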

6.6 Wiener-Hammerstein

Description of the DUT

In this section, we model an electronic circuit with a Wiener-Hammerstein structure, designed by Gerd Vandersteen [82] from the Vrije Universiteit Brussel, Department ELEC. The system is composed of a static nonlinear block sandwiched between two linear dynamic systems.

Figure 6-30. Wiener-Hammerstein system: a static nonlinearity f between two linear dynamic blocks (signals u, u_0, y_0, y).

The first linear system is a 3rd order Chebyshev low-pass filter with 0.5 dB ripple and a pass band up to 4.4 kHz. The static nonlinearity is realized with resistors and a diode. The second linear system is a 3rd order inverse Chebyshev low-pass filter with a -40 dB stop band starting at 5 kHz.

Description of the Experiments

The excitation signal consists of two parts: four periods of a random odd, random phase multisine with … samples per period, and about … data points of filtered Gaussian noise. Both signals have a bandwidth of 10 kHz and an RMS value of about 640 mV. The multisine is used for the estimation procedure, and the Gaussian noise for validation purposes. We performed the measurements at a sampling frequency of 51.2 kHz.

Level of Nonlinear Distortions

First, we analyse the level of nonlinear distortions from the multisine experiment. In Figure 6-31, the spectrum of the averaged output is shown. The solid black line represents the output at the excited lines. The grey circles and crosses denote the contributions at the odd and even detection lines, respectively. In order to improve the visibility of the figure, the

Figure 6-31. Averaged output spectrum: excited lines (solid black line), standard deviation (dashed black line), odd nonlinear distortions (grey circles), even nonlinear distortions (grey crosses).

number of plotted contributions on the detection lines is reduced. From Figure 6-31, it is clear that in the pass band the nonlinear distortions lie about 20 dB below the linear contributions. Furthermore, the even nonlinear distortions slightly dominate the odd nonlinear contributions. The standard deviation on the excited lines, which is a measure for the measurement noise, is also plotted (dashed black line). We see that in the pass band, the noise level is about 30 dB lower than the nonlinear distortion level.

Best Linear Approximation

We now calculate Ĝ_BLA(jω_k) using the multisine data set. Since several periods were measured, the variance due to the measurement noise, σ̂_n²(k), is estimated using formula (2-17). To calculate the total variance σ̂_BLA²(k) (i.e., the effect of the nonlinear distortions and the measurement noise), we cannot use equation (2-21), because only one multisine realization was applied. However, the level of nonlinear distortions at the non-excited harmonic lines can be interpolated to the excited frequency lines. This allows us to calculate the total variance on the BLA. The BLA is plotted in Figure 6-32 (solid black line), together with the standard deviation σ̂_n(k) due to the measurement noise (dotted black line), and the total standard deviation σ̂_BLA(k) (dashed black line).
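The random odd, random phase multisine behind this distortion analysis can be sketched as follows. The design details (group size, lengths) are illustrative assumptions: only odd bins are excited, one odd bin per group is randomly omitted as an odd detection line, and the even bins stay unexcited as detection lines for even distortions.

```python
import numpy as np

def random_odd_multisine(N, k_max, group=2, rng=None):
    """Sketch of a random odd, random phase multisine: only odd bins are
    excited, and in every group of `group` consecutive odd bins one randomly
    chosen bin is left out as an odd detection line.  All even bins remain
    unexcited and act as detection lines for even nonlinear distortions."""
    if rng is None:
        rng = np.random.default_rng()
    odd = np.arange(1, k_max + 1, 2)
    excited = []
    for g in range(0, odd.size, group):
        grp = odd[g:g + group]
        drop = rng.integers(grp.size)             # the omitted detection line
        excited.extend(int(k) for i, k in enumerate(grp) if i != drop)
    U = np.zeros(N // 2 + 1, dtype=complex)
    U[excited] = np.exp(2j * np.pi * rng.random(len(excited)))
    u = np.fft.irfft(U, n=N)
    return u, np.array(excited)
```

Any output energy observed at the non-excited odd (even) bins can then only stem from odd (even) nonlinear distortions or noise, which is exactly how Figure 6-31 is read.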

Figure 6-32. BLA of the Wiener-Hammerstein circuit (solid black line); total standard deviation (dashed black line); measurement noise level (dotted black line); 6th order linear model (solid grey line); model error (dashed grey line).

Figure 6-33. Validation result for the 6th order linear model: measured output (black) and model simulation error (grey). RMSE: 36.2 mV.

Linear models of various orders are estimated. For this, a subspace technique is used, followed by a numerical optimization, both carried out in the frequency domain. The 6th order linear model yields the best result and is shown in Figure 6-32 (solid grey line), together with the model error (dashed grey line). Next, the 6th order linear model is validated using the Gaussian noise sequence. In Figure 6-33, the simulation error (grey) is plotted together with the measured output (black). The RMSE is quite high compared to the RMS level of the output: 36.2 mV versus 213 mV. Note also that the asymmetric behaviour of the model error is in agreement with the dominant even nonlinear behaviour of the system.

Nonlinear Model

The linear models obtained in the previous section are now used as starting values to estimate a number of polynomial nonlinear state space models. We estimate models of 4th, 5th, and 6th order. Two kinds of models are discussed: models that use all the nonlinear combinations of the states and the input (ξ(t)^{3}, "full"), and models that use the nonlinear combinations of the states only (x(t)^{3}, "states only"). We also verify whether the use of a linear or nonlinear output equation influences the results. Table 6-10 shows the modelling results for the "full" PNLSS models.

             PNLSS, "full", nx=[2 3], ny=[2 3]    PNLSS, "full", nx=[2 3], ny=[]
Model order  RMSE [mV]   # parameters             RMSE [mV]   # parameters
n=4          …           …                        …           …
n=5          …           …                        …           …
n=6          …           …                        …           …

Table 6-10. Validation results for the "full" PNLSS models.

In Table 6-11, the validation results are shown for models that do not use the input in the nonlinear combinations. Since there are fewer nonlinear terms in these models, the number of required parameters is significantly lower.
However, the RMSE values are always higher than the corresponding entries in Table 6-10. Taking a closer look at Table 6-10, we observe that the models with a linear output equation (η(t) = 0, on the right side of the table) always yield better results than the models with a nonlinear output equation. The best PNLSS

model is the 6th order model with a linear output equation that uses all the nonlinear combinations of the states up to degree [2 3]. It has a validation RMSE of 3.21 mV.

             PNLSS, "states only", nx=[2 3], ny=[2 3]    PNLSS, "states only", nx=[2 3], ny=[]
Model order  RMSE [mV]   # parameters                    RMSE [mV]   # parameters
n=4          …           …                               …           …
n=5          …           …                               …           …
n=6          …           …                               …           …

Table 6-11. Validation results for the "states only" PNLSS models.

Next, we estimate state affine models of various orders and of degree 3 and 4. Looking at Table 6-12, which shows the validation RMSE for the state affine approach, we observe a similar trend as with the robot arm data: the RMSE diminishes smoothly as the model order increases.

             State affine, degree 3     State affine, degree 4
Model order  RMSE [mV]   # parameters   RMSE [mV]   # parameters
n=…          …           …              …           …
n=…          …           …              …           …
n=…          …           …              …           …
n=…          …           …              …           …
n=…          …           …              …           …
n=…          …           …              …           …

Table 6-12. Validation results for state affine models of degree 3 and 4.

The best validation result is achieved by the 10th order model of degree 4, with a RMSE of 2.6 mV. The simulation error of this model is plotted in Figure 6-34 (grey), together with the measured output signal (black).

Figure 6-34. Validation result for the best nonlinear model: measured output (black) and model simulation error (grey). RMSE: 2.6 mV.

Figure 6-35 shows the spectra of the measured validation output signal (black), the linear simulation error (light grey), and the nonlinear simulation error (dark grey). In the pass band of the device, the nonlinear model pushes down the model error by about 20 dB. Beyond 5 kHz, no significant difference between the linear and the nonlinear model error can be observed.

Figure 6-35. DFT spectra of the measured validation output signal (black), linear simulation error (light grey), and nonlinear simulation error (dark grey).

Comparison with a Block-oriented Approach

In [68], the same measurements were used to model this electronic circuit using a block-oriented approach. Both linear blocks of the Wiener-Hammerstein model were identified as 6th order linear models. The static nonlinearity was parametrized as a 9th degree polynomial. Taking into account two exchangeable gains between the three blocks, this results in a total of 34 parameters. Furthermore, the RMS value of the simulation error for this model is 3.8 mV. Hence, we see that this error is reduced by more than 30% to 2.6 mV using the PNLSS / state affine approach, at the cost of a significantly higher number of parameters.
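The quoted total of 34 parameters is consistent with the following bookkeeping; the per-block split into a 7-coefficient numerator and a monic 6-coefficient denominator is an assumption, not stated in the text.

```python
# Hypothetical parameter bookkeeping for the block-oriented Wiener-Hammerstein
# model of [68]: two 6th order linear blocks and a 9th degree static polynomial.
order = 6
tf_params = (order + 1) + order          # one 6th order block: numerator + monic denominator
poly_params = 9 + 1                      # a 9th degree polynomial has 10 coefficients
total = 2 * tf_params + poly_params - 2  # minus the two exchangeable gains
print(total)
```

Compare this with the several hundred coefficients of a comparable PNLSS model (see the monomial-counting sketch earlier in this chapter): the block-oriented structure buys parsimony, the PNLSS model buys flexibility.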

6.7 Crystal Detector

Description of the DUT

The last modelling challenge discussed in this chapter is an Agilent-HP 420C crystal detector (see Figure 6-36). This kind of device is often used in microwave applications to measure the envelope of a signal. The RF connection of the crystal detector (the left part in Figure 6-36) serves as input of the DUT. The video connection of the detector (right part) is considered as output of the system. From the physical, block-oriented model proposed in [63], we expect to find a second order relationship between the input and the output of this device.

Figure 6-36. Agilent-HP crystal detector (Agilent 423B).

Description of the Experiments

Ir. Liesbeth Gommé from the ELEC Department at the Vrije Universiteit Brussel carried out the experiments with the crystal detector. She applied two filtered Gaussian noise sequences of … samples with a growing RMS value as a function of time. Both signals are superimposed on a DC level of 117 mV, and have a total RMS value of 118 mV. The input

Figure 6-37. Estimation and validation input sequences.
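A Gaussian-noise record with linearly growing RMS, together with the per-block RMS used earlier in this chapter to track such excitation levels, can be sketched as follows (the block size, record length, and growth profile are illustrative; the actual values are elided in this transcription):

```python
import numpy as np

def blockwise_rms(x, blocksize):
    """RMS of x per non-overlapping block, used to track the slowly
    growing excitation level of a Gaussian-noise record."""
    nb = x.size // blocksize
    blocks = x[: nb * blocksize].reshape(nb, blocksize)
    return np.sqrt(np.mean(blocks ** 2, axis=1))

# Illustrative record whose RMS grows linearly with time.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 200_000)
gn = (0.01 + 0.1 * t) * rng.standard_normal(t.size)
rms = blockwise_rms(gn, 10_000)
```

A growing-RMS excitation sweeps the device through a range of operating levels within one record, which is useful when the nonlinearity is amplitude dependent, as for the damper and the crystal detector.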


More information

LTI Approximations of Slightly Nonlinear Systems: Some Intriguing Examples

LTI Approximations of Slightly Nonlinear Systems: Some Intriguing Examples LTI Approximations of Slightly Nonlinear Systems: Some Intriguing Examples Martin Enqvist, Lennart Ljung Division of Automatic Control Department of Electrical Engineering Linköpings universitet, SE-581

More information

On Identification of Cascade Systems 1

On Identification of Cascade Systems 1 On Identification of Cascade Systems 1 Bo Wahlberg Håkan Hjalmarsson Jonas Mårtensson Automatic Control and ACCESS, School of Electrical Engineering, KTH, SE-100 44 Stockholm, Sweden. (bo.wahlberg@ee.kth.se

More information

Detection and Quantification of the Influence of Time-Variation in Frequency Response Function Measurements Using Arbitrary Excitations

Detection and Quantification of the Influence of Time-Variation in Frequency Response Function Measurements Using Arbitrary Excitations Detection and Quantification of the Influence of Time-Variation in Frequency Response Function Measurements Using Arbitrary Excitations R. Pintelon, E. Louarroudi, and J. Lataire, Vrije Universiteit Brussel,

More information

Data Driven Discrete Time Modeling of Continuous Time Nonlinear Systems. Problems, Challenges, Success Stories. Johan Schoukens

Data Driven Discrete Time Modeling of Continuous Time Nonlinear Systems. Problems, Challenges, Success Stories. Johan Schoukens 1/51 Data Driven Discrete Time Modeling of Continuous Time Nonlinear Systems Problems, Challenges, Success Stories Johan Schoukens fyuq (,, ) 4/51 System Identification Data Distance Model 5/51 System

More information

Theory and Problems of Signals and Systems

Theory and Problems of Signals and Systems SCHAUM'S OUTLINES OF Theory and Problems of Signals and Systems HWEI P. HSU is Professor of Electrical Engineering at Fairleigh Dickinson University. He received his B.S. from National Taiwan University

More information

FREQUENCY response function (FRF) measurement, is

FREQUENCY response function (FRF) measurement, is Using the Best Linear Approximation With Varying Excitation Signals for Nonlinear System Characterization Alireza Fakhrizadeh Esfahani, Student Member IEEE, Johan Schoukens, Fellow, IEEE, and Laurent Vanbeylen,

More information

System Identification in a Real World

System Identification in a Real World AMC2014-Yokohama March 14-16,2014, Yokohama, Japan System Identification in a Real World J. Schoukens, A. Marconato, R. Pintelon, Y. Rolain, M. Schoukens, K. Tiels, L. Vanbeylen, G. Vandersteen, A. Van

More information

On Input Design for System Identification

On Input Design for System Identification On Input Design for System Identification Input Design Using Markov Chains CHIARA BRIGHENTI Masters Degree Project Stockholm, Sweden March 2009 XR-EE-RT 2009:002 Abstract When system identification methods

More information

Review of Linear Time-Invariant Network Analysis

Review of Linear Time-Invariant Network Analysis D1 APPENDIX D Review of Linear Time-Invariant Network Analysis Consider a network with input x(t) and output y(t) as shown in Figure D-1. If an input x 1 (t) produces an output y 1 (t), and an input x

More information

GATE EE Topic wise Questions SIGNALS & SYSTEMS

GATE EE Topic wise Questions SIGNALS & SYSTEMS www.gatehelp.com GATE EE Topic wise Questions YEAR 010 ONE MARK Question. 1 For the system /( s + 1), the approximate time taken for a step response to reach 98% of the final value is (A) 1 s (B) s (C)

More information

IMPROVEMENTS IN MODAL PARAMETER EXTRACTION THROUGH POST-PROCESSING FREQUENCY RESPONSE FUNCTION ESTIMATES

IMPROVEMENTS IN MODAL PARAMETER EXTRACTION THROUGH POST-PROCESSING FREQUENCY RESPONSE FUNCTION ESTIMATES IMPROVEMENTS IN MODAL PARAMETER EXTRACTION THROUGH POST-PROCESSING FREQUENCY RESPONSE FUNCTION ESTIMATES Bere M. Gur Prof. Christopher Niezreci Prof. Peter Avitabile Structural Dynamics and Acoustic Systems

More information

Structure detection of Wiener-Hammerstein systems with process noise

Structure detection of Wiener-Hammerstein systems with process noise 1 Structure detection of Wiener-Hammerstein systems with process noise Erliang Zhang, Maarten Schoukens, and Johan Schoukens, Fellow, IEEE arxiv:1804.10022v1 [cs.sy] 26 Apr 2018 Abstract Identification

More information

Andrea Zanchettin Automatic Control AUTOMATIC CONTROL. Andrea M. Zanchettin, PhD Spring Semester, Linear systems (frequency domain)

Andrea Zanchettin Automatic Control AUTOMATIC CONTROL. Andrea M. Zanchettin, PhD Spring Semester, Linear systems (frequency domain) 1 AUTOMATIC CONTROL Andrea M. Zanchettin, PhD Spring Semester, 2018 Linear systems (frequency domain) 2 Motivations Consider an LTI system Thanks to the Lagrange s formula we can compute the motion of

More information

3. ESTIMATION OF SIGNALS USING A LEAST SQUARES TECHNIQUE

3. ESTIMATION OF SIGNALS USING A LEAST SQUARES TECHNIQUE 3. ESTIMATION OF SIGNALS USING A LEAST SQUARES TECHNIQUE 3.0 INTRODUCTION The purpose of this chapter is to introduce estimators shortly. More elaborated courses on System Identification, which are given

More information

DURING THE last years, there has been an increasing

DURING THE last years, there has been an increasing IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 57, NO. 2, FEBRUARY 2008 395 Estimation and Validation of Semiparametric Dynamic Nonlinear Models Yves Rolain, Fellow, IEEE, Wendy Van Moer, Senior

More information

CONTROL SYSTEMS, ROBOTICS, AND AUTOMATION - Vol. V - Prediction Error Methods - Torsten Söderström

CONTROL SYSTEMS, ROBOTICS, AND AUTOMATION - Vol. V - Prediction Error Methods - Torsten Söderström PREDICTIO ERROR METHODS Torsten Söderström Department of Systems and Control, Information Technology, Uppsala University, Uppsala, Sweden Keywords: prediction error method, optimal prediction, identifiability,

More information

Approximation Approach for Timing Jitter Characterization in Circuit Simulators

Approximation Approach for Timing Jitter Characterization in Circuit Simulators Approximation Approach for iming Jitter Characterization in Circuit Simulators MM.Gourary,S.G.Rusakov,S.L.Ulyanov,M.M.Zharov IPPM, Russian Academy of Sciences, Moscow, Russia K.K. Gullapalli, B. J. Mulvaney

More information

AN IDENTIFICATION ALGORITHM FOR ARMAX SYSTEMS

AN IDENTIFICATION ALGORITHM FOR ARMAX SYSTEMS AN IDENTIFICATION ALGORITHM FOR ARMAX SYSTEMS First the X, then the AR, finally the MA Jan C. Willems, K.U. Leuven Workshop on Observation and Estimation Ben Gurion University, July 3, 2004 p./2 Joint

More information

Structure Discrimination in Block-Oriented Models. Using Linear Approximations: a Theoretic. Framework

Structure Discrimination in Block-Oriented Models. Using Linear Approximations: a Theoretic. Framework Structure Discrimination in Block-Oriented Models Using Linear Approximations: a Theoretic Framework J. Schoukens, R. Pintelon, Y. Rolain, M. Schoukens, K. Tiels, L. Vanbeylen, A. Van Mulders, G. Vandersteen

More information

Lecture 5: Linear Systems. Transfer functions. Frequency Domain Analysis. Basic Control Design.

Lecture 5: Linear Systems. Transfer functions. Frequency Domain Analysis. Basic Control Design. ISS0031 Modeling and Identification Lecture 5: Linear Systems. Transfer functions. Frequency Domain Analysis. Basic Control Design. Aleksei Tepljakov, Ph.D. September 30, 2015 Linear Dynamic Systems Definition

More information

APPROXIMATE REALIZATION OF VALVE DYNAMICS WITH TIME DELAY

APPROXIMATE REALIZATION OF VALVE DYNAMICS WITH TIME DELAY APPROXIMATE REALIZATION OF VALVE DYNAMICS WITH TIME DELAY Jan van Helvoirt,,1 Okko Bosgra, Bram de Jager Maarten Steinbuch Control Systems Technology Group, Mechanical Engineering Department, Technische

More information

System Modeling and Identification CHBE 702 Korea University Prof. Dae Ryook Yang

System Modeling and Identification CHBE 702 Korea University Prof. Dae Ryook Yang System Modeling and Identification CHBE 702 Korea University Prof. Dae Ryook Yang 1-1 Course Description Emphases Delivering concepts and Practice Programming Identification Methods using Matlab Class

More information

Analysis of Finite Wordlength Effects

Analysis of Finite Wordlength Effects Analysis of Finite Wordlength Effects Ideally, the system parameters along with the signal variables have infinite precision taing any value between and In practice, they can tae only discrete values within

More information

Introduction to System Identification and Adaptive Control

Introduction to System Identification and Adaptive Control Introduction to System Identification and Adaptive Control A. Khaki Sedigh Control Systems Group Faculty of Electrical and Computer Engineering K. N. Toosi University of Technology May 2009 Introduction

More information

Stochastic Processes. M. Sami Fadali Professor of Electrical Engineering University of Nevada, Reno

Stochastic Processes. M. Sami Fadali Professor of Electrical Engineering University of Nevada, Reno Stochastic Processes M. Sami Fadali Professor of Electrical Engineering University of Nevada, Reno 1 Outline Stochastic (random) processes. Autocorrelation. Crosscorrelation. Spectral density function.

More information

Correlator I. Basics. Chapter Introduction. 8.2 Digitization Sampling. D. Anish Roshi

Correlator I. Basics. Chapter Introduction. 8.2 Digitization Sampling. D. Anish Roshi Chapter 8 Correlator I. Basics D. Anish Roshi 8.1 Introduction A radio interferometer measures the mutual coherence function of the electric field due to a given source brightness distribution in the sky.

More information

PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION. A Thesis MELTEM APAYDIN

PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION. A Thesis MELTEM APAYDIN PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION A Thesis by MELTEM APAYDIN Submitted to the Office of Graduate and Professional Studies of Texas A&M University in partial fulfillment of the

More information

Robust Implementation of the MUSIC algorithm Zhang, Johan Xi; Christensen, Mads Græsbøll; Dahl, Joachim; Jensen, Søren Holdt; Moonen, Marc

Robust Implementation of the MUSIC algorithm Zhang, Johan Xi; Christensen, Mads Græsbøll; Dahl, Joachim; Jensen, Søren Holdt; Moonen, Marc Aalborg Universitet Robust Implementation of the MUSIC algorithm Zhang, Johan Xi; Christensen, Mads Græsbøll; Dahl, Joachim; Jensen, Søren Holdt; Moonen, Marc Published in: I E E E International Conference

More information

The Local Polynomial Method for nonparametric system identification: improvements and experimentation

The Local Polynomial Method for nonparametric system identification: improvements and experimentation The Local Polynomial Method for nonparametric system identification: improvements and experimentation Michel Gevers, Rik Pintelon and Johan Schoukens Abstract The Local Polynomial Method (LPM) is a recently

More information

A system that is both linear and time-invariant is called linear time-invariant (LTI).

A system that is both linear and time-invariant is called linear time-invariant (LTI). The Cooper Union Department of Electrical Engineering ECE111 Signal Processing & Systems Analysis Lecture Notes: Time, Frequency & Transform Domains February 28, 2012 Signals & Systems Signals are mapped

More information

Observability and state estimation

Observability and state estimation EE263 Autumn 2015 S Boyd and S Lall Observability and state estimation state estimation discrete-time observability observability controllability duality observers for noiseless case continuous-time observability

More information

Some of the different forms of a signal, obtained by transformations, are shown in the figure. jwt e z. jwt z e

Some of the different forms of a signal, obtained by transformations, are shown in the figure. jwt e z. jwt z e Transform methods Some of the different forms of a signal, obtained by transformations, are shown in the figure. X(s) X(t) L - L F - F jw s s jw X(jw) X*(t) F - F X*(jw) jwt e z jwt z e X(nT) Z - Z X(z)

More information

A New Subspace Identification Method for Open and Closed Loop Data

A New Subspace Identification Method for Open and Closed Loop Data A New Subspace Identification Method for Open and Closed Loop Data Magnus Jansson July 2005 IR S3 SB 0524 IFAC World Congress 2005 ROYAL INSTITUTE OF TECHNOLOGY Department of Signals, Sensors & Systems

More information

Discrete Simulation of Power Law Noise

Discrete Simulation of Power Law Noise Discrete Simulation of Power Law Noise Neil Ashby 1,2 1 University of Colorado, Boulder, CO 80309-0390 USA 2 National Institute of Standards and Technology, Boulder, CO 80305 USA ashby@boulder.nist.gov

More information

Lecture 19 Observability and state estimation

Lecture 19 Observability and state estimation EE263 Autumn 2007-08 Stephen Boyd Lecture 19 Observability and state estimation state estimation discrete-time observability observability controllability duality observers for noiseless case continuous-time

More information

Adaptive Channel Modeling for MIMO Wireless Communications

Adaptive Channel Modeling for MIMO Wireless Communications Adaptive Channel Modeling for MIMO Wireless Communications Chengjin Zhang Department of Electrical and Computer Engineering University of California, San Diego San Diego, CA 99- Email: zhangc@ucsdedu Robert

More information

Prof. Dr.-Ing. Armin Dekorsy Department of Communications Engineering. Stochastic Processes and Linear Algebra Recap Slides

Prof. Dr.-Ing. Armin Dekorsy Department of Communications Engineering. Stochastic Processes and Linear Algebra Recap Slides Prof. Dr.-Ing. Armin Dekorsy Department of Communications Engineering Stochastic Processes and Linear Algebra Recap Slides Stochastic processes and variables XX tt 0 = XX xx nn (tt) xx 2 (tt) XX tt XX

More information

VARIANCE COMPUTATION OF MODAL PARAMETER ES- TIMATES FROM UPC SUBSPACE IDENTIFICATION

VARIANCE COMPUTATION OF MODAL PARAMETER ES- TIMATES FROM UPC SUBSPACE IDENTIFICATION VARIANCE COMPUTATION OF MODAL PARAMETER ES- TIMATES FROM UPC SUBSPACE IDENTIFICATION Michael Döhler 1, Palle Andersen 2, Laurent Mevel 1 1 Inria/IFSTTAR, I4S, Rennes, France, {michaeldoehler, laurentmevel}@inriafr

More information

DFT & Fast Fourier Transform PART-A. 7. Calculate the number of multiplications needed in the calculation of DFT and FFT with 64 point sequence.

DFT & Fast Fourier Transform PART-A. 7. Calculate the number of multiplications needed in the calculation of DFT and FFT with 64 point sequence. SHRI ANGALAMMAN COLLEGE OF ENGINEERING & TECHNOLOGY (An ISO 9001:2008 Certified Institution) SIRUGANOOR,TRICHY-621105. DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING UNIT I DFT & Fast Fourier

More information

Rozwiązanie zagadnienia odwrotnego wyznaczania sił obciąŝających konstrukcje w czasie eksploatacji

Rozwiązanie zagadnienia odwrotnego wyznaczania sił obciąŝających konstrukcje w czasie eksploatacji Rozwiązanie zagadnienia odwrotnego wyznaczania sił obciąŝających konstrukcje w czasie eksploatacji Tadeusz Uhl Piotr Czop Krzysztof Mendrok Faculty of Mechanical Engineering and Robotics Department of

More information

EECE Adaptive Control

EECE Adaptive Control EECE 574 - Adaptive Control Basics of System Identification Guy Dumont Department of Electrical and Computer Engineering University of British Columbia January 2010 Guy Dumont (UBC) EECE574 - Basics of

More information

A6523 Modeling, Inference, and Mining Jim Cordes, Cornell University

A6523 Modeling, Inference, and Mining Jim Cordes, Cornell University A6523 Modeling, Inference, and Mining Jim Cordes, Cornell University Lecture 19 Modeling Topics plan: Modeling (linear/non- linear least squares) Bayesian inference Bayesian approaches to spectral esbmabon;

More information

Machine Learning. A Bayesian and Optimization Perspective. Academic Press, Sergios Theodoridis 1. of Athens, Athens, Greece.

Machine Learning. A Bayesian and Optimization Perspective. Academic Press, Sergios Theodoridis 1. of Athens, Athens, Greece. Machine Learning A Bayesian and Optimization Perspective Academic Press, 2015 Sergios Theodoridis 1 1 Dept. of Informatics and Telecommunications, National and Kapodistrian University of Athens, Athens,

More information

Control Systems I. Lecture 7: Feedback and the Root Locus method. Readings: Jacopo Tani. Institute for Dynamic Systems and Control D-MAVT ETH Zürich

Control Systems I. Lecture 7: Feedback and the Root Locus method. Readings: Jacopo Tani. Institute for Dynamic Systems and Control D-MAVT ETH Zürich Control Systems I Lecture 7: Feedback and the Root Locus method Readings: Jacopo Tani Institute for Dynamic Systems and Control D-MAVT ETH Zürich November 2, 2018 J. Tani, E. Frazzoli (ETH) Lecture 7:

More information

E : Lecture 1 Introduction

E : Lecture 1 Introduction E85.2607: Lecture 1 Introduction 1 Administrivia 2 DSP review 3 Fun with Matlab E85.2607: Lecture 1 Introduction 2010-01-21 1 / 24 Course overview Advanced Digital Signal Theory Design, analysis, and implementation

More information

PART 1. Review of DSP. f (t)e iωt dt. F(ω) = f (t) = 1 2π. F(ω)e iωt dω. f (t) F (ω) The Fourier Transform. Fourier Transform.

PART 1. Review of DSP. f (t)e iωt dt. F(ω) = f (t) = 1 2π. F(ω)e iωt dω. f (t) F (ω) The Fourier Transform. Fourier Transform. PART 1 Review of DSP Mauricio Sacchi University of Alberta, Edmonton, AB, Canada The Fourier Transform F() = f (t) = 1 2π f (t)e it dt F()e it d Fourier Transform Inverse Transform f (t) F () Part 1 Review

More information

A New Approach for Computation of Timing Jitter in Phase Locked Loops

A New Approach for Computation of Timing Jitter in Phase Locked Loops A New Approach for Computation of Timing Jitter in Phase ocked oops M M. Gourary (1), S. G. Rusakov (1), S.. Ulyanov (1), M.M. Zharov (1),.. Gullapalli (2), and B. J. Mulvaney (2) (1) IPPM, Russian Academy

More information

Limit Cycles in High-Resolution Quantized Feedback Systems

Limit Cycles in High-Resolution Quantized Feedback Systems Limit Cycles in High-Resolution Quantized Feedback Systems Li Hong Idris Lim School of Engineering University of Glasgow Glasgow, United Kingdom LiHonIdris.Lim@glasgow.ac.uk Ai Poh Loh Department of Electrical

More information

Lecture 7: Discrete-time Models. Modeling of Physical Systems. Preprocessing Experimental Data.

Lecture 7: Discrete-time Models. Modeling of Physical Systems. Preprocessing Experimental Data. ISS0031 Modeling and Identification Lecture 7: Discrete-time Models. Modeling of Physical Systems. Preprocessing Experimental Data. Aleksei Tepljakov, Ph.D. October 21, 2015 Discrete-time Transfer Functions

More information

Dominant Pole Localization of FxLMS Adaptation Process in Active Noise Control

Dominant Pole Localization of FxLMS Adaptation Process in Active Noise Control APSIPA ASC 20 Xi an Dominant Pole Localization of FxLMS Adaptation Process in Active Noise Control Iman Tabatabaei Ardekani, Waleed H. Abdulla The University of Auckland, Private Bag 9209, Auckland, New

More information

System Identification & Parameter Estimation

System Identification & Parameter Estimation System Identification & Parameter Estimation Wb3: SIPE lecture Correlation functions in time & frequency domain Alfred C. Schouten, Dept. of Biomechanical Engineering (BMechE), Fac. 3mE // Delft University

More information

MODELING AND IDENTIFICATION OF A MECHANICAL INDUSTRIAL MANIPULATOR 1

MODELING AND IDENTIFICATION OF A MECHANICAL INDUSTRIAL MANIPULATOR 1 Copyright 22 IFAC 15th Triennial World Congress, Barcelona, Spain MODELING AND IDENTIFICATION OF A MECHANICAL INDUSTRIAL MANIPULATOR 1 M. Norrlöf F. Tjärnström M. Östring M. Aberger Department of Electrical

More information

Modeling Parallel Wiener-Hammerstein Systems Using Tensor Decomposition of Volterra Kernels

Modeling Parallel Wiener-Hammerstein Systems Using Tensor Decomposition of Volterra Kernels Modeling Parallel Wiener-Hammerstein Systems Using Tensor Decomposition of Volterra Kernels Philippe Dreesen 1, David T. Westwick 2, Johan Schoukens 1, Mariya Ishteva 1 1 Vrije Universiteit Brussel (VUB),

More information

Expressions for the covariance matrix of covariance data

Expressions for the covariance matrix of covariance data Expressions for the covariance matrix of covariance data Torsten Söderström Division of Systems and Control, Department of Information Technology, Uppsala University, P O Box 337, SE-7505 Uppsala, Sweden

More information

Identification of Nonlinear Dynamic Systems with Multiple Inputs and Single Output using discrete-time Volterra Type Equations

Identification of Nonlinear Dynamic Systems with Multiple Inputs and Single Output using discrete-time Volterra Type Equations Identification of Nonlinear Dnamic Sstems with Multiple Inputs and Single Output using discrete-time Volterra Tpe Equations Thomas Treichl, Stefan Hofmann, Dierk Schröder Institute for Electrical Drive

More information

An experimental robot load identification method for industrial application

An experimental robot load identification method for industrial application An experimental robot load identification method for industrial application Jan Swevers 1, Birgit Naumer 2, Stefan Pieters 2, Erika Biber 2, Walter Verdonck 1, and Joris De Schutter 1 1 Katholieke Universiteit

More information

Black Box Modelling of Power Transistors in the Frequency Domain

Black Box Modelling of Power Transistors in the Frequency Domain Jan Verspecht bvba Mechelstraat 17 B-1745 Opwijk Belgium email: contact@janverspecht.com web: http://www.janverspecht.com Black Box Modelling of Power Transistors in the Frequency Domain Jan Verspecht

More information

ECE Branch GATE Paper The order of the differential equation + + = is (A) 1 (B) 2

ECE Branch GATE Paper The order of the differential equation + + = is (A) 1 (B) 2 Question 1 Question 20 carry one mark each. 1. The order of the differential equation + + = is (A) 1 (B) 2 (C) 3 (D) 4 2. The Fourier series of a real periodic function has only P. Cosine terms if it is

More information

Non-parametric estimate of the system function of a time-varying system

Non-parametric estimate of the system function of a time-varying system Non-parametric estimate of the system function of a time-varying system John Lataire a, Rik Pintelon a, Ebrahim Louarroudi a a Vrije Universiteit Brussel, Pleinlaan 2, 1050 Elsene Abstract The task of

More information

1.1 OBJECTIVE AND CONTENTS OF THE BOOK

1.1 OBJECTIVE AND CONTENTS OF THE BOOK 1 Introduction 1.1 OBJECTIVE AND CONTENTS OF THE BOOK Hysteresis is a nonlinear phenomenon exhibited by systems stemming from various science and engineering areas: under a low-frequency periodic excitation,

More information

Analysis of Discrete-Time Systems

Analysis of Discrete-Time Systems TU Berlin Discrete-Time Control Systems 1 Analysis of Discrete-Time Systems Overview Stability Sensitivity and Robustness Controllability, Reachability, Observability, and Detectabiliy TU Berlin Discrete-Time

More information

SIMON FRASER UNIVERSITY School of Engineering Science

SIMON FRASER UNIVERSITY School of Engineering Science SIMON FRASER UNIVERSITY School of Engineering Science Course Outline ENSC 810-3 Digital Signal Processing Calendar Description This course covers advanced digital signal processing techniques. The main

More information

Recovering Wiener-Hammerstein nonlinear state-space models using linear algebra

Recovering Wiener-Hammerstein nonlinear state-space models using linear algebra Preprints of the 17th IFAC Symposium on System Identification Beijing International Convention Center October 19-21, 215 Beijing, China Recovering Wiener-Hammerstein nonlinear state-space models using

More information

Lecture 4: Analysis of MIMO Systems

Lecture 4: Analysis of MIMO Systems Lecture 4: Analysis of MIMO Systems Norms The concept of norm will be extremely useful for evaluating signals and systems quantitatively during this course In the following, we will present vector norms

More information

Notes for System Identification: Impulse Response Functions via Wavelet

Notes for System Identification: Impulse Response Functions via Wavelet Notes for System Identification: Impulse Response Functions via Wavelet 1 Basic Wavelet Algorithm for IRF Extraction In contrast to the FFT-based extraction procedure which must process the data both in

More information

Wavelet Footprints: Theory, Algorithms, and Applications

Wavelet Footprints: Theory, Algorithms, and Applications 1306 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 51, NO. 5, MAY 2003 Wavelet Footprints: Theory, Algorithms, and Applications Pier Luigi Dragotti, Member, IEEE, and Martin Vetterli, Fellow, IEEE Abstract

More information

NON-LINEAR PARAMETER ESTIMATION USING VOLTERRA AND WIENER THEORIES

NON-LINEAR PARAMETER ESTIMATION USING VOLTERRA AND WIENER THEORIES Journal of Sound and Vibration (1999) 221(5), 85 821 Article No. jsvi.1998.1984, available online at http://www.idealibrary.com on NON-LINEAR PARAMETER ESTIMATION USING VOLTERRA AND WIENER THEORIES Department

More information

Subspace-based Identification

Subspace-based Identification of Infinite-dimensional Multivariable Systems from Frequency-response Data Department of Electrical and Electronics Engineering Anadolu University, Eskişehir, Turkey October 12, 2008 Outline 1 2 3 4 Noise-free

More information

Constructing Polar Codes Using Iterative Bit-Channel Upgrading. Arash Ghayoori. B.Sc., Isfahan University of Technology, 2011

Constructing Polar Codes Using Iterative Bit-Channel Upgrading. Arash Ghayoori. B.Sc., Isfahan University of Technology, 2011 Constructing Polar Codes Using Iterative Bit-Channel Upgrading by Arash Ghayoori B.Sc., Isfahan University of Technology, 011 A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree

More information

New Trends in Modeling and Identification of Loudspeaker with Nonlinear Distortion

New Trends in Modeling and Identification of Loudspeaker with Nonlinear Distortion New Trends in Modeling and Identification of Loudspeaker with Nonlinear Distortion Pascal Brunet and Bahram Shafai Department of Electrical and Computer Engineering, Northeastern University, Boston, MA,

More information

AUTOMATIC CONTROL COMMUNICATION SYSTEMS LINKÖPINGS UNIVERSITET. Questions AUTOMATIC CONTROL COMMUNICATION SYSTEMS LINKÖPINGS UNIVERSITET

AUTOMATIC CONTROL COMMUNICATION SYSTEMS LINKÖPINGS UNIVERSITET. Questions AUTOMATIC CONTROL COMMUNICATION SYSTEMS LINKÖPINGS UNIVERSITET The Problem Identification of Linear and onlinear Dynamical Systems Theme : Curve Fitting Division of Automatic Control Linköping University Sweden Data from Gripen Questions How do the control surface

More information

FRF parameter identification with arbitrary input sequence from noisy input output measurements

FRF parameter identification with arbitrary input sequence from noisy input output measurements 21st International Symposium on Mathematical Theory of Networks and Systems July 7-11, 214. FRF parameter identification with arbitrary input sequence from noisy input output measurements mberto Soverini

More information

RECURSIVE SUBSPACE IDENTIFICATION IN THE LEAST SQUARES FRAMEWORK

RECURSIVE SUBSPACE IDENTIFICATION IN THE LEAST SQUARES FRAMEWORK RECURSIVE SUBSPACE IDENTIFICATION IN THE LEAST SQUARES FRAMEWORK TRNKA PAVEL AND HAVLENA VLADIMÍR Dept of Control Engineering, Czech Technical University, Technická 2, 166 27 Praha, Czech Republic mail:

More information

BLIND DECONVOLUTION ALGORITHMS FOR MIMO-FIR SYSTEMS DRIVEN BY FOURTH-ORDER COLORED SIGNALS
