
Identification of Linear and Nonlinear Dynamical Systems
Theme: Nonlinear Models. Grey-box Models
Division of Automatic Control, Linköping University, Sweden

General Aspects

Let Z^t denote all available (input-output) data up to time t. A mathematical model for the system is a function from these data to the space where the output at time t, y(t), lives; in general

    ŷ(t|t-1) = g(Z^{t-1}, t)

The function can be thought of as a predictor of the next output. A parametric model structure is a parameterized family of such models: g(Z^{t-1}, θ).

Topics:
- Choice of regressors and nonlinear function
- Functions for a scalar regressor
- Expansion into multiple regressors
- Examples of named structures

All aspects of curve fitting apply pretty much also to this case. The difficulty is the enormous richness in the possibilities of parameterization. There are two main cases:
- General black-box models of great flexibility
- Grey-box models: models that incorporate some knowledge of the character of the actual system

Black-box Models: Choice of g

The general mapping g(Z^t, θ) is normally too flexible. Let us split it into one mapping from Z^t to a regression vector ϕ(t) of fixed dimension d, and a mapping g from R^d to R (assuming the output to be scalar):

    g(Z^t, θ) = g(ϕ(t), θ),    ϕ(t) = ϕ(Z^t)

This leaves two problems:
- choose the mapping g(ϕ, θ)
- choose the regression vector ϕ(t) (or ϕ(t, θ) = ϕ(Z^t, θ))

Functions for a Scalar Regressor

First, consider ϕ to be scalar. The basic form is

    g(ϕ, θ) = Σ_{k=1}^{N} α_k κ(β_k(ϕ - γ_k))

where the α are coordinates, the β scale or dilation parameters, and the γ location parameters. Typical choices of the basic function κ:
- κ(x) = cos(x): Fourier transform
- κ(x) = U(x), a unit pulse: gives piecewise constant functions; soft version κ(x) = e^{-x²/2}
- κ(x) = H(x), a step at x = 0: also gives piecewise constant functions; soft version κ(x) = 1/(1 + e^{-x})

Several Regressors

Consider now ϕ to be a d-dimensional vector, but let κ(x) still be a function of one variable. How should κ(β(ϕ - γ)) be interpreted?
- Radial: β(ϕ - γ) = ‖ϕ - γ‖_β, with ‖ϕ - γ‖²_β = (ϕ - γ)^T β (ϕ - γ); γ is a d-dimensional vector and β a d×d (positive definite) matrix, or a scaled version of the identity matrix with β a scalar. Describes an ellipsoid in R^d.
- Ridge: β(ϕ - γ) = β^T ϕ - γ; β is a d-dimensional vector and γ a scalar. Describes a hyperplane in R^d.
- Tensor: a product of factors corresponding to the components of the vector: κ(β(ϕ - γ)) = Π_{k=1}^{d} κ(β_k(ϕ_k - γ_k)); γ and β are d-dimensional vectors, and the subscript denotes the component.

Examples of Named Structures

- ANN (artificial neural networks), one hidden layer, sigmoidal: κ(x) = 1/(1 + e^{-x}), ridge extension
- Radial basis networks: κ(x) = e^{-x²/2}, radial extension
- Wavelets: κ is the mother wavelet, with the dilations β_j = 2^j and locations γ_{jk} = 2^{-j}k (double indexing) as fixed choices
- (Neuro-)fuzzy models: the κ are the membership functions, tensor expansion
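As a concrete, hypothetical illustration of these constructions (not from the slides; all variable names and values are chosen for the example), the following minimal MATLAB sketch evaluates the expansion g(ϕ, θ) = Σ α_k κ(β_k(ϕ - γ_k)) in the ridge case (a one-hidden-layer sigmoidal network) and in the radial case (a Gaussian radial basis network):

% Hypothetical illustration: evaluating
%   g(phi) = sum_{k=1}^{N} alpha_k * kappa(beta_k(phi - gamma_k))
% for the ridge and radial constructions in d dimensions.
rng(0);
d = 3;  N = 10;                      % regressor dimension, number of basis functions
phi   = randn(d,1);                  % current regressor
alpha = randn(N,1);                  % coordinates alpha_k

% Ridge construction, sigmoidal kappa (one-hidden-layer ANN):
Beta  = randn(d,N);                  % directions beta_k as columns
gamma = randn(N,1);                  % scalar locations gamma_k
kappa = @(x) 1./(1 + exp(-x));       % kappa(x) = 1/(1 + e^{-x})
g_ridge = alpha' * kappa(Beta'*phi - gamma);

% Radial construction, Gaussian kappa (radial basis network), with
% beta_k a scalar scale and gamma_k a d-dimensional centre:
centres = randn(d,N);
scales  = rand(1,N);
g_radial = 0;
for k = 1:N
    r2 = scales(k) * sum((phi - centres(:,k)).^2);  % squared beta-norm of phi - gamma_k
    g_radial = g_radial + alpha(k) * exp(-r2/2);    % kappa of the norm: exp(-||.||^2/2)
end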

Simulation and Prediction

Suppose ϕ(t) = [y(t-1), u(t-1)]^T. The (one-step-ahead) predicted output at time t for a given model θ is then

    ŷ_p(t|θ) = g([y(t-1), u(t-1)]^T, θ)

It uses the previous measurement y(t-1). A tougher test is to check how the model would behave in simulation, i.e., when only the input sequence u is used. The simulated output is obtained as above, by replacing the measured output by the simulated output from the previous step:

    ŷ_s(t, θ) = g([ŷ_s(t-1, θ), u(t-1)]^T, θ)

Notice a possible stability problem!

Choice of Regressors

Four players:
- Outputs y(t-k)
- Inputs u(t-k)
- Simulated model outputs ŷ_s(t-k, θ)
- Predicted model outputs ŷ_p(t-k|θ)

Regressors for dynamical systems are typically chosen among those:
- NFIR models use past inputs only
- NARX models use past inputs and outputs
- NOE models use past inputs and past simulated outputs
- NARMAX models use inputs, outputs and predicted outputs
- NBJ models use all four regressor types

Recurrent Networks

For NOE, NARMAX and NBJ, previous outputs from the model have to be fed back into the model computations on-line. These are called recurrent networks and require considerably more computational work to fit to data.

Network Aspects; Several Layers

The model structures are really basis function expansions. However, since the basis functions are variants of the same function, a graphical description looks like a network (input layer, hidden layers, output layer). One can also let the regressors be outputs from a previous layer of the network.

Example: Hydraulic Crane Data

These are data from a forest harvest machine. [Figures: measured input and output; black: measured output, blue: model simulated output.]
- Linear model: fit 7 %
- Sigmoidal ANN model: fit 9 %
- Wavenet model (a radial basis-function ANN model): fit 7 %
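The difference between prediction and simulation is easy to see in code. A minimal hypothetical MATLAB sketch, with a toy first-order system and an assumed "fitted" predictor g (none of this is from the slides):

% Hypothetical sketch: one-step-ahead prediction vs. free-run simulation
% for a NARX-type predictor with phi(t) = [y(t-1), u(t-1)]'.
rng(0);
N = 200;
u = randn(N,1);
y = zeros(N,1);
for t = 2:N
    y(t) = 0.7*y(t-1) + tanh(u(t-1)) + 0.05*randn;  % toy "true" system
end

g  = @(phi) 0.7*phi(1) + tanh(phi(2));  % assumed fitted predictor g(phi, theta)
yp = zeros(N,1);  ys = zeros(N,1);
yp(1) = y(1);  ys(1) = y(1);            % initialize with a measured value
for t = 2:N
    yp(t) = g([y(t-1);  u(t-1)]);       % prediction: uses the measured y(t-1)
    ys(t) = g([ys(t-1); u(t-1)]);       % simulation: feeds back the model output
end
% If g is even slightly wrong, errors accumulate in ys but not in yp --
% this is the possible stability problem noted above.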

Physical Modeling

Agenda: physical modeling, semi-physical modeling, block-oriented models, local linear models.

Perform physical modeling (e.g., in MODELICA) and denote the unknown physical parameters by θ. Collect the model equations as

    ẋ(t) = f(x(t), u(t), θ)
    y(t) = h(x(t), u(t), θ)

(or in DAE, Differential Algebraic Equations, form). For each parameter value θ this defines a simulated (predicted) output ŷ(t|θ), which is the parameterized function ŷ(t|θ) = g(Z^{t-1}, θ) in somewhat implicit form. To be a correct predictor this really assumes white measurement noise. Some more sophisticated noise modeling is possible, usually involving ad hoc nonlinear observers. The approach is conceptually simple, but can be very demanding in practice.

Example: Missile

The equations form a nonlinear state-space model with five outputs (the angular velocities around the x-, y- and z-axes and the accelerations in the y- and z-directions) and unknown physical parameters p, written as a MATLAB model file of the following form (the detailed right-hand-side expressions in p(·), x(·) and u(·) are left as ...):

function [dx, y] = missile(t, x, p, u)
% MISSILE  A nonlinear missile system.

% Output equation
y = [x(1);    % angular velocity around the x-axis
     x(2);    % angular velocity around the y-axis
     x(3);    % angular velocity around the z-axis
     ...;     % acceleration in the y-direction, of the form -p(8)*u(·)*(p(·)*x(·)+p(·)*u(·))/p(·)
     ...];    % acceleration in the z-direction, of the same form

% State equations
dx = [...;    % angular velocity around the x-axis
      ...;    % angular velocity around the y-axis
      ...;    % angular velocity around the z-axis
      ...;    % attack angle
      ...];   % slide angle

Initial Fit between Model and Data

[Figure: the five outputs vs. time in seconds; thin line: measurement, thick line: simulation.]

Adjusted Fit between Model and Data

[Figure: the same outputs after parameter adjustment; thin line: measurement, thick line: simulation.]

Semi-physical Models

Apply nonlinear transformations to the measured data, so that the transformed data stand a better chance of describing the system in a linear relationship. Rules: only high-school physics and at most a few minutes of effort.
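To make the fitting procedure concrete, here is a minimal hypothetical MATLAB sketch of the grey-box loop: simulate ẋ = f(x, u, θ), y = h(x) for a candidate θ, form the sum of squared output errors, and minimize over θ. The first-order tank-like f and h are assumptions for illustration, not the missile system above:

% Hypothetical sketch of grey-box fitting: simulate the model for a given
% theta and minimize the sum of squared output errors (file demo_greybox.m).
function demo_greybox
    rng(0);
    Ts = 0.1;  N = 300;
    u  = 0.5 + 0.5*rand(N,1);
    f  = @(x,u,th) -th(1)*sqrt(max(x,0)) + th(2)*u;  % state equation f(x,u,theta)
    h  = @(x) sqrt(max(x,0));                        % output equation h(x)
    y  = simulate([0.8; 1.2], u, Ts, f, h) + 0.01*randn(N,1);  % "measured" data

    V     = @(th) sum((y - simulate(th, u, Ts, f, h)).^2);  % criterion V(theta)
    thhat = fminsearch(V, [1; 1])                           % estimated parameters
end

function yhat = simulate(th, u, Ts, f, h)
    % Forward-Euler simulation: the (implicitly defined) yhat(t|theta)
    x = 1;  yhat = zeros(size(u));
    for t = 1:numel(u)
        yhat(t) = h(x);
        x = x + Ts*f(x, u(t), th);
    end
end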

Example: A Solar Heated House. Data

[Figure: solar panel, pump and storage; storage temperature (degrees C), solar intensity, and fan/pump ON-OFF signals plotted over hours.]

- y(t): temperature in the storage
- I(t): solar intensity
- u(t): pump velocity

Linear Model

[Figure: measured vs. linear-model simulated storage temperature; time in hours.]

Think

Suppose we had measured the temperature x(t) in the solar panel. A simple heat balance gives equations of the form

    x(t+1) - x(t) = d1 I(t) - d2 x(t) - d3 x(t)u(t)
    y(t+1) - y(t) = d4 x(t)u(t) - d5 y(t)

Eliminate x(t): this expresses y(t) as a linear combination of delayed and multiplied measured signals, with regressors such as y(t-1), u(t-1), y(t-1)u(t-1), u(t-1)I(t-1) and u(t-2)y(t-2), and with coefficients built from the parameters d1, ..., d5. Reparameterize with θ being these coefficients, ignoring the links between them: in the transformed regressors the model is a linear regression.

Semi-physical Model; Neural Network Model

[Figures: measured vs. simulated storage temperature, time in hours, for the semi-physical model and for a neural network model.]

Block-oriented Models

Agenda: physical modeling, semi-physical modeling, block-oriented models, local linear models.

Building blocks:
- Linear dynamic system G(s)
- Nonlinear static function f(u)
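Once the transformed regressors are formed, the estimation itself is ordinary linear least squares. A minimal hypothetical MATLAB sketch: synthetic signals stand in for the measured y, u and I, and the regressor set mimics, rather than reproduces, the eliminated expression above:

% Hypothetical sketch of semi-physical estimation: form physically motivated
% nonlinear combinations of measured signals and fit theta by least squares.
rng(0);
N = 500;
I = rand(N,1);  u = 0.5 + 0.5*rand(N,1);  y = zeros(N,1);
for t = 3:N     % synthetic data standing in for the measurements
    y(t) = 0.9*y(t-1) + 0.1*u(t-1)*I(t-1) - 0.05*y(t-1)*u(t-1) + 0.01*randn;
end

t   = 3:N;
Phi = [y(t-1), u(t-1), y(t-1).*u(t-1), u(t-1).*I(t-1), u(t-2).*y(t-2)];
th  = Phi \ y(t);     % least-squares estimate of the coefficients theta
yfit = Phi * th;      % semi-physical model output on the estimation data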

Common Models

- Wiener: a linear dynamic block followed by a static nonlinearity
- Hammerstein: a static nonlinearity followed by a linear dynamic block
- Hammerstein-Wiener: a linear dynamic block sandwiched between two static nonlinearities

Other combinations are possible. With the linear blocks parameterized as linear dynamic systems and the static blocks parameterized as functions ("curves"), this gives a parameterization of the output as ŷ(t|θ) = g(Z^{t-1}, θ), and the general approach of model fitting can be applied. However, in this context many algorithmic variants have been suggested.

The Hydraulic Crane Again

Hammerstein model: fit 76 %.

Local Linear Models

Nonlinear systems are often handled by linearization around a working point. The idea behind local linear models is to deal with the nonlinearities by selecting, or averaging over, relevant linearized models.

Example: a tank with inflow u, outflow y and level h. Equations:

    ḣ = -√h + u
    y = √h

Linearize around a level h₀ with corresponding stationary flows u₀ = y₀ = √h₀:

    ḣ = -(1/(2√h₀))(h - h₀) + (u - u₀)
    y = y₀ + (1/(2√h₀))(h - h₀)

Tank Example, continued

Sampled-data model around the level h₀:

    ŷ_{h₀}(t) = θ_{h₀}^T ϕ(t),    ϕ(t) = [y(t - T_s), u(t - T_s), 1]^T,    θ_{h₀} = [α_{h₀}, β_{h₀}, γ_{h₀}]^T

where T_s is the sampling time and the constant γ_{h₀} accounts for the operating-point offset. Total model: select or average over these local predictions, computed at a grid of values of h₀,

    ŷ(t) = Σ_k w_k(h, h_k) ŷ_{h_k}(t)

with various possible choices of the weights w_k.

Data and Linear Model

[Figures: measured inflow u and level h; fit with a single linear model (d = 1).]

Local Linear Models

[Figures: fits with two local models (d = 2) and with five local models (d = 5).]

Local Linear Models: General Comments

Let the measured working-point variable (the tank level in the example) be denoted by ρ(t) (sometimes called the regime variable). If the regime variable is partitioned into d values ρ_k, the predicted output will be

    ŷ(t) = Σ_{k=1}^{d} w_k(ρ(t), ρ_k) ŷ^{(k)}(t)

If the prediction ŷ^{(k)}(t) corresponding to ρ_k is linear in the parameters, ŷ^{(k)}(t) = ϕ^T(t) θ^{(k)}, the whole model will be a linear regression.
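A minimal hypothetical MATLAB sketch of this weighted combination, ŷ(t) = Σ_k w_k(ρ, ρ_k) ŷ^{(k)}(t), using normalized Gaussian weights over a grid of operating points; the grid, widths and local parameter values are illustrative assumptions:

% Hypothetical sketch: a local linear model as a weighted sum of local
% linear predictors over a grid of operating points.
rho_k = linspace(0.2, 1.8, 5);     % grid of operating points (levels h0)
sigma = 0.4;                       % width of each local validity region
wfun  = @(rho) exp(-(rho - rho_k).^2/(2*sigma^2)) ...
             / sum(exp(-(rho - rho_k).^2/(2*sigma^2)));  % normalized weights

% one column of Theta per local model: [alpha_k; beta_k; gamma_k]
Theta = [0.90 0.85 0.80 0.75 0.70;    % alpha_k: coefficient of y(t-Ts)
         0.10 0.15 0.20 0.25 0.30;    % beta_k:  coefficient of u(t-Ts)
         0.00 0.01 0.02 0.03 0.04];   % gamma_k: operating-point offset

phi  = [0.5; 1.0; 1];                 % regressor [y(t-Ts); u(t-Ts); 1]
rho  = 0.7;                           % current value of the regime variable
yhat = wfun(rho) * (Theta' * phi);    % weighted sum of the local predictions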

To Build a Local Linear Model

To build the model, we need to
- select the regime variable ρ,
- decide the partition of the regime variable, w_k(ρ(t), η), where η is a parameter that describes the partition,
- find the local models in each partition.

If the local models are linear regressions, the total model will be

    ŷ(t, θ, η) = Σ_{k=1}^{d} w_k(ρ(t), η) ϕ^T(t) θ^{(k)}

which for fixed η is a linear regression. If the partition is to be estimated too, the problem is considerably more difficult.

Hybrid Models and LPV Models

The local linear model above is also an example of a hybrid model (piecewise linear). So-called Linear Parameter-Varying (LPV) models are also closely related:

    ẋ = A(ρ(t))x + B(ρ(t))u
    y = C(ρ(t))x + D(ρ(t))u

Experiment Design

Topics: input design, sparsity, local minima.

The design of inputs for nonlinear models is considerably more difficult than for linear models. For example, it is clear that a binary input will not work (think of a Hammerstein model!). In addition to exciting all frequencies, an input for a general nonlinear model must also excite all amplitudes. Gaussian noise (white or colored) would be an example of a signal that is generically exciting for nonlinear models. (A small numerical illustration of this point is given at the end of these notes.)

Sparsity

The basic problem: nonlinear surfaces in high dimensions can be very complicated and need the support of many observed data points. How do we find parameterizations of such surfaces that both give a good chance of being close to the true system and use a moderate number of parameters? The data cloud of observations is by necessity sparse in the surface space.

Some ideas:
- Using physical insight in grey-box models is one way to allow extrapolation and interpolation in the data space on physical grounds.
- Hoping that most of the nonlinear action takes place across hyperplanes or subspaces is another idea that radically reduces the flexibility of the model. In statistics, the problem of finding such subspaces is known as the index problem. Finding projections S of dimension r × m, r << m, such that f(ϕ) = g(Sϕ) captures most of the nonlinearity is consequently an important problem. Ridge-based neural networks can be seen as one way to find several such hyperplanes that define the structure of the nonlinear effects.

Local Minima

Adjusting a parameterized model structure to data is typically a non-convex problem, and several local minima of the criterion function may exist. This is one of the most pressing problems in nonlinear identification, and it calls for sophisticated initialization procedures:
- In neural networks, some normalization is first applied to the data, and then a randomized initialization is made. Typically one will have to try several initializations.
- Wavenets use an initialization based on fixed location and dilation parameters, which gives a linear regression.
- For physical models, algebraic methods may produce linear regressions for initial estimates.
- Block-oriented models often employ several steps, fixing the linear and/or nonlinear blocks to create smaller problems.

Conclusions: Nonlinear Models

- A nonlinear model can be seen as a nonlinear mapping from past data to the space where the output lives: ŷ(t|t-1) = g(Z^{t-1}, t). Observations are then y(t) = ŷ(t|t-1) + e(t).
- A useful split of the mapping: g(Z^t) = g(ϕ(Z^t)).
- There are non-parametric and parametric methods, and black-box and grey-box parameterizations g(ϕ, θ).
- Black-box parameterizations usually employ one basic basis function that is scaled and located at different points.
- Grey-boxes can be based on (serious) physical modeling or on more leisurely semi-physical modeling.
- Non-convexity of the optimization remains one of the more serious problems for most parametric methods.
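Finally, the numerical illustration of the input-design remark promised in the Experiment Design section: a minimal hypothetical MATLAB sketch of why a binary input cannot reveal the static nonlinearity of a Hammerstein system, while a Gaussian input excites all amplitudes. The toy system below is an assumption for illustration:

% Hypothetical sketch: binary vs. Gaussian input for an assumed Hammerstein
% system (static nonlinearity f followed by linear dynamics G).
rng(1);
N  = 1000;
f  = @(u) u.^3;                        % static input nonlinearity
G  = @(v) filter(0.2, [1 -0.8], v);    % linear dynamic block
ub = sign(randn(N,1));                 % binary input: takes only the values +/-1
ug = randn(N,1);                       % Gaussian input: excites all amplitudes

yb = G(f(ub));   % f(+/-1) = +/-1, so yb is indistinguishable from a linear response
yg = G(f(ug));   % the cubic nonlinearity shows up in the amplitude distribution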

