
1 The Problem

Identification of Linear and Nonlinear Dynamical Systems, Theme: Curve Fitting. Lennart Ljung, Division of Automatic Control, Linköping University, Sweden.

Data from Gripen
[Figure: flight data from Gripen: pitch rate, canard angle, elevator angle, leading-edge flap angle.]

Questions:
- How do the control surface angles affect the pitch rate?
- Aerodynamical derivatives?
- How to use the information in flight data?

Aircraft dynamics, from input to output: y(t) is the pitch rate at time t and u₁(t) the canard angle at time t. Try a fourth-order difference-equation model with sampling interval T:

y(t) = −a₁ y(t−T) − a₂ y(t−2T) − a₃ y(t−3T) − a₄ y(t−4T) + b₁ u₁(t−T) + b₂ u₁(t−2T) + b₃ u₁(t−3T) + b₄ u₁(t−4T)

Using all inputs (u₁ canard angle, u₂ elevator angle, u₃ leading-edge flap):

y(t) = −a₁ y(t−T) − … − a₄ y(t−4T) + Σ_{i=1}^{3} ( b_{i,1} uᵢ(t−T) + … + b_{i,4} uᵢ(t−4T) )

[Figure: dashed line: actual pitch rate; solid line: step-ahead predicted pitch rate, based on the fourth-order model from the canard angle only.]

System Identification: Issues
- Select a class of candidate models
- Select a member in this class using the observed data
- Evaluate the quality of the obtained model
- Design the experiment so that the model will be good

Course Outline: Themes
- The basic questions and (statistical) tools, illustrated for a simple curve-fitting problem
- Linear models: the model structures, special techniques for linear models, time- and frequency-domain data, a software session with hands-on experience
- Nonlinear models: parameterizations, problems and techniques
- Some practical issues in system identification: experiment design and data quality
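The model above is linear in its parameters, so it can be fitted by ordinary least squares. Below is a minimal Python sketch of that fit; the signals, system coefficients and noise level are simulated stand-ins (the actual Gripen flight data is not reproduced here), so everything in the simulation is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500
u = rng.standard_normal(N)                # stand-in canard angle (assumed)
y = np.zeros(N)
for t in range(4, N):                     # stand-in 4th-order dynamics (assumed coefficients)
    y[t] = (1.5*y[t-1] - 0.7*y[t-2] + 0.1*y[t-3] - 0.05*y[t-4]
            + 0.2*u[t-1] + 0.1*u[t-2] + 0.01*rng.standard_normal())

# Regressor phi(t) = [-y(t-1), ..., -y(t-4), u(t-1), ..., u(t-4)]
Phi = np.array([np.concatenate((-y[t-4:t][::-1], u[t-4:t][::-1]))
                for t in range(4, N)])
Y = y[4:]

theta, *_ = np.linalg.lstsq(Phi, Y, rcond=None)  # theta = [a1..a4, b1..b4]
print("one-step-ahead fit (MSE):", np.mean((Y - Phi @ theta) ** 2))
```

With real flight data one would replace the simulation by the measured pitch rate and control-surface angles, and validate with the step-ahead prediction shown in the figure.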

2 Background

Theme outline: Background, Curve Fitting, Curve Fitting II: Several Regressors, Nonparametric Methods.

Background terms:
- Random (stochastic) variable, expectation, variance
- Independent random variables, random processes
- Law of Large Numbers, Central Limit Theorem

Curve Fitting
The most basic ideas from system identification, the choice of model structures and of model sizes, are brought out by considering the basic curve-fitting problem from elementary statistics. There is an unknown function g₀(x). For a sequence of x-values (regressors) {x₁, x₂, …, x_N} (that may or may not be chosen by the user), observe the corresponding function values with some noise:

y(k) = g₀(x_k) + e(k)

Construct an estimate ĝ_N(x) from {y(1), x₁, y(2), x₂, …, y(N), x_N}.

Curve Fitting II: Several Regressors ("surface fitting")
[Figure: the floor is formed by the regressors x, and the upright wall is the function value y = g₀(x).]

The Curve-fitting Problem
The error ĝ_N(x) − g₀(x) should be as small as possible. Approaches:
- Parametric: construct ĝ_N(x) by searching over a parameterized set of candidate functions.
- Non-parametric: construct ĝ_N(x) by smoothing over (carefully chosen subsets of) the y(k).

Parametric Approach
Search for the function g in a parameterized family of functions:

g(x, θ) = Σ_{k=1}^{n} α_k f_k(x, β_k), θ = {α_k, β_k, k = 1, …, n}

Grey box/black box; local/global basis functions. Examples:
- g(x, θ) = θ₀ + θ₁x + … + θ_n xⁿ (polynomial)
- g(x, θ) = (θ₀ + θ₁x + … + θ_n xⁿ) / (1 + θ_{n+1}x + … + θ_{n+m}x^m) (rational)
- g(x, θ) = θ₀ + Σ_{k=1}^{n} ( a_k cos(kπx) + b_k sin(kπx) ) (Fourier)
- g(x, θ) = Σ_{k=1}^{n} α_k U((x − γ_k)/β_k), with U(x) the unit pulse

The Choices in the Parametric Case
- Type of function family (basis functions f_k(x, β))
- Size of model (n, or dim θ)
- The parameter values
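As a concrete instance of the parametric approach, here is a short sketch assuming a made-up true function g₀ and the Fourier parameterization from the list above; since that family is linear in its coefficients, the fit reduces to linear least squares:

```python
import numpy as np

rng = np.random.default_rng(1)
g0 = lambda x: np.abs(x - 0.4)              # assumed "unknown" true function
N, n = 100, 5                               # data size and model size (assumed)
x = rng.uniform(0, 1, N)                    # regressors x_1, ..., x_N
y = g0(x) + 0.1 * rng.standard_normal(N)    # y(k) = g0(x_k) + e(k)

def basis(x):
    """Fourier basis: [1, cos(pi x), sin(pi x), ..., cos(n pi x), sin(n pi x)]."""
    cols = [np.ones_like(x)]
    for k in range(1, n + 1):
        cols += [np.cos(k * np.pi * x), np.sin(k * np.pi * x)]
    return np.column_stack(cols)

theta, *_ = np.linalg.lstsq(basis(x), y, rcond=None)  # linear least squares
xg = np.linspace(0, 1, 200)
print("max |ghat - g0| on a grid:", np.max(np.abs(basis(xg) @ theta - g0(xg))))
```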

3 Weighted Least Squares

y(t) = g₀(x_t) + e(t). Weighted least squares:

θ̂_N = arg min_θ V_N(θ)
V_N(θ) = (1/N) Σ_{t=1}^{N} L(x_t) |y(t) − g(x_t, θ)|² / λ_t

Here λ_t is proportional to the reliability of the t-th measurement (λ_t = Ee²(t)). An extra weighting L(x_t) could also reflect the relevance of the point x_t ("focus in fit").

(Regularized) Least Squares

θ̂_N = arg min_θ V_N(θ) + δ‖θ‖²
V_N(θ) = (1/N) Σ_{t=1}^{N} |y(t) − g(x_t, θ)|²

The term δ‖θ‖² penalizes excessive model flexibility; the penalty could come in various forms.

Why the Least Squares Criterion? Gauss! Maximum likelihood:

θ̂ = arg min_θ Σ_{t=1}^{N} ℓ(y(t) − g(x_t, θ), t), ℓ(z, t) = −log p(z, t)

where p(z, t) is the probability density function (pdf) of e(t). A Gaussian distribution p(z, t) ∝ exp(−z²/2λ_t) gives a quadratic criterion! Other choices:
- min_θ max_t |y(t) − g(x_t, θ)| ("unknown-but-bounded")
- min_θ Σ |y(t) − g(x_t, θ)|_ε (ε-insensitive ℓ₁ norm, support vector machines)

Linear Least Squares
Note that if the parameterization g(x, θ) is linear in θ, the basic criterion becomes quadratic in θ, and the minimum can be found analytically:

g(x, θ) = φ(x)ᵀθ
V_N(θ) = Σ_{t=1}^{N} (y(t) − φ(x_t)ᵀθ)² = ‖Y − Φθ‖², Y = col y(t), Φ = col φ(x_t)ᵀ
θ̂_N = (ΦᵀΦ)⁻¹ΦᵀY
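A minimal numpy sketch of the three linear-in-parameters estimators above (plain, weighted and regularized least squares); the data shapes and the quadratic form of the penalty δ‖θ‖² are assumptions consistent with the formulas:

```python
import numpy as np

def ls(Phi, Y):
    """theta = argmin ||Y - Phi theta||^2  (= (Phi^T Phi)^-1 Phi^T Y)."""
    return np.linalg.lstsq(Phi, Y, rcond=None)[0]

def wls(Phi, Y, lam):
    """theta = argmin sum |y(t) - phi(t)^T theta|^2 / lam_t."""
    w = 1.0 / lam                                    # weight proportional to reliability
    return np.linalg.solve(Phi.T @ (w[:, None] * Phi), Phi.T @ (w * Y))

def ridge(Phi, Y, delta):
    """theta = argmin ||Y - Phi theta||^2 + delta ||theta||^2."""
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + delta * np.eye(d), Phi.T @ Y)
```

All three are direct transcriptions of the formulas; for an ill-conditioned Φ, the regularized version is the numerically safest.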

4 Choices in Parametric Methods

So, the choice of parameter values within a parameterized model is not that difficult: fit to the observed data, by one criterion or another. The choice of model size and model parameterization is a more interesting issue.

Choice of Model Size: Example
[Figure: observed data with the true curve.] Fit polynomials of different orders to these data.
[Figure: model curves. Blue: true curve; green: 2nd-order polynomial; red and cyan: higher-order polynomials.]

The Model Fit
Plotted as functions of the polynomial order:
- the value of the criterion (on the estimation data),
- the fit between the true curve and the model curve,
- the value of the criterion evaluated on a fresh data set.
[Figure: fit versus polynomial order.]
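A sketch of this model-size experiment, with an assumed true curve and noise level: fit polynomials of increasing order and compare the criterion on the estimation data with the criterion on a fresh validation data set.

```python
import numpy as np

rng = np.random.default_rng(1)
g0 = lambda x: np.sin(2 * np.pi * x)          # assumed "true curve"
x_e, x_v = rng.uniform(0, 1, 50), rng.uniform(0, 1, 50)
y_e = g0(x_e) + 0.2 * rng.standard_normal(50)  # estimation data
y_v = g0(x_v) + 0.2 * rng.standard_normal(50)  # fresh validation data

for order in range(1, 11):
    c = np.polyfit(x_e, y_e, order)            # least-squares polynomial fit
    fit_e = np.mean((y_e - np.polyval(c, x_e)) ** 2)  # empirical fit
    fit_v = np.mean((y_v - np.polyval(c, x_v)) ** 2)  # fit on fresh data
    print(f"order {order:2d}: estimation {fit_e:.4f}, validation {fit_v:.4f}")
```

The estimation-data fit decreases monotonically with the order, while the validation fit typically turns upward once the model starts fitting the noise.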

5 Choice of Size and Structure

This is a more difficult choice, and we need to understand how the model error ĝ_N(x) − g₀(x) depends on our choices. Players:
- The fit for a certain data set Z: V_N(θ, Z) = (1/N) Σ_{t=1}^{N} (y(t) − g(x_t, θ))²
- Estimation (training) data Z_e and validation (generalization) data Z_v
- The empirical fit: V_N(θ̂_N, Z_e) = min_θ V_N(θ, Z_e) (blue curve)
- The validation fit: V_N(θ̂_N, Z_v) (black curve)
- The curve fit: H(x, θ) = |g₀(x) − g(x, θ)|²; for a given x_t-sequence, H_N(θ) = (1/N) Σ_{t=1}^{N} H(x_t, θ). H_N(θ̂_N) was the red curve.

Testing the Model Parameterizations
Do not get fooled by the empirical fit V_N(θ̂_N, Z_e); we need to understand how the empirical fit relates to H_N(θ̂_N). The expected (typical) value of H_N(θ̂_N) would be a suitable goodness measure for the chosen parameterization. Two routes:
- Test by simulation: Monte Carlo.
- Compute by calculations: analysis, an "analytical Monte Carlo": assume certain properties of x_k and e(k), then compute (if possible) the error.

Basic Tools for Analysis
For a stationary stochastic process e(·), under mild conditions:
- Law of large numbers (LLN): (1/N) Σ_{t=1}^{N} e(t) → Ee(t) as N → ∞.
- Central limit theorem (CLT): if e(t) has zero mean, (1/√N) Σ_{t=1}^{N} e(t) converges in distribution to the normal (Gaussian) distribution with zero mean and variance λ = lim_{N→∞} (1/N) Σ_{t,s=1}^{N} Ee(t)e(s); in short, (1/√N) Σ e(t) → N(0, λ).

Asymptotic Analysis: Probabilistic Setup
"Analytical Monte Carlo experiment": for a given g₀(·) and a given sequence x_t, collect the data

y(t) = g₀(x_t) + e(t), Ee²(t) = λ

where the stochastic process e(·) obeys the LLN and the CLT. Use a parameterization g(x, θ) and form the estimate

θ̂_N = arg min_θ (1/N) Σ_{t=1}^{N} (y(t) − g(x_t, θ))², ĝ_N(x) = g(x, θ̂_N)

Then θ̂_N and ĝ_N(x) are random variables with properties inherited from e. What can be said about their distributions?

Asymptotic Analysis: Basic Facts (BIAS)
Except for very simple parameterizations g(x, θ), the distribution of θ̂_N cannot be calculated (mainly due to the "arg min"). However, its asymptotic distribution as N → ∞ can be established. Let Ē denote averaging over x_k: Ēf(x) = lim_{N→∞} (1/N) Σ f(x_k). Then

H̄(θ) = lim_{N→∞} H_N(θ) = Ē |g₀(x_t) − g(x_t, θ)|²

The best possible model in the parameterization is θ* = arg min_θ H̄(θ). If H̄(θ*) = 0 we have a perfect curve fit; otherwise there will be some bias in the curve fit. Main result: lim_{N→∞} θ̂_N = θ*.

Proof (formal calculations):

V_N(θ) = (1/N) Σ (g₀(x_t) + e(t) − g(x_t, θ))²
       = (1/N) Σ (g₀(x_t) − g(x_t, θ))² + (1/N) Σ e²(t) + (2/N) Σ (g₀(x_t) − g(x_t, θ)) e(t)

By the LLN the cross term (2/N) Σ (g₀(x_t) − g(x_t, θ)) e(t) → 0 (uniformly in θ!), so V_N(θ) → H̄(θ) + λ as N → ∞.

Asymptotic Analysis: Basic Facts (VARIANCE)
Suppose the limit model is correct, g(x, θ*) ≡ g₀(x), and e is white noise with variance λ. Then the asymptotic distribution of √N (θ̂_N − θ*) is normal with zero mean and covariance matrix

P = λ [Ē ψ(t)ψᵀ(t)]⁻¹, ψ(t) = (d/dθ) g(x_t, θ) at θ = θ*

so Cov θ̂_N ≈ (λ/N) [Ē ψ(t)ψᵀ(t)]⁻¹.
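A small "test by simulation" sketch, assuming a model that is linear in θ so the estimate can be computed exactly; it compares the Monte Carlo spread of θ̂_N with the asymptotic covariance P = λ[Ēψψᵀ]⁻¹ (all numbers in the setup are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
N, lam, theta_star = 200, 0.1, np.array([1.0, -0.5])
x = rng.uniform(-1, 1, N)
Psi = np.column_stack([x, x**2])            # psi(t) for g(x, theta) = th1*x + th2*x^2
P = lam * np.linalg.inv(Psi.T @ Psi / N)    # asymptotic cov of sqrt(N)(theta_hat - theta*)

est = []
for _ in range(2000):                       # Monte Carlo over noise realizations
    y = Psi @ theta_star + np.sqrt(lam) * rng.standard_normal(N)
    est.append(np.linalg.lstsq(Psi, y, rcond=None)[0])
cov_mc = np.cov(np.array(est).T) * N        # Monte Carlo estimate of N * Cov(theta_hat)
print("asymptotic P:\n", P.round(4), "\nMonte Carlo:\n", cov_mc.round(4))
```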

6 Proof: Formal Calculations (asymptotic distribution)

0 = V_N′(θ̂_N) ≈ V_N′(θ*) + V_N″(θ*)(θ̂_N − θ*), so (θ̂_N − θ*) ≈ −[V_N″(θ*)]⁻¹ V_N′(θ*)

V_N′(θ) = −(2/N) Σ (y(t) − g(x_t, θ)) g′(x_t, θ). With g₀(x) = g(x, θ*), V_N′(θ*) = −(2/N) Σ e(t)ψ(t), and

LLN: V_N″(θ*) = (2/N) Σ ψ(t)ψᵀ(t) − (2/N) Σ e(t) g″(x_t, θ*) → 2 Ēψψᵀ
CLT: (1/√N) Σ e(t)ψ(t) → N(0, λ Ēψψᵀ)

and hence √N (θ̂_N − θ*) → N(0, λ [Ēψψᵀ]⁻¹).

A Remarkable Observation
Recall the curve fit H(x, θ) = |g₀(x) − g(x, θ)|², H̄(θ) = lim (1/N) Σ H(x_t, θ) (for the x-sequence of the estimation data). H̄(θ̂_N) is a random variable, since the estimate depends on the e-sequence, and

E H̄(θ̂_N) ≈ H̄(θ*) + λ d / N

where d is the number of estimated parameters, independently of the parameterization! Proof (formal calculations), under the assumption g₀(x) = g(x, θ*):

H̄(θ̂_N) ≈ H̄(θ*) + H̄′(θ*)(θ̂_N − θ*) + ½ (θ̂_N − θ*)ᵀ H̄″(θ*)(θ̂_N − θ*)
H̄′(θ*) = 0 (θ* minimizes H̄(θ))
H̄′(θ) = −2 Ē (g₀(x_t) − g(x_t, θ)) g′(x_t, θ)ᵀ, so H̄″(θ*) = 2 Ē g′(x_t, θ*) g′(x_t, θ*)ᵀ = 2 Ē ψ(t)ψᵀ(t)

E H̄(θ̂_N) ≈ H̄(θ*) + ½ E (θ̂_N − θ*)ᵀ H̄″(θ*)(θ̂_N − θ*)
½ E tr[(θ̂_N − θ*)ᵀ H̄″(θ*)(θ̂_N − θ*)] = ½ E tr[H̄″(θ*)(θ̂_N − θ*)(θ̂_N − θ*)ᵀ]
= ½ tr[H̄″(θ*) Cov θ̂_N] = (λ/N) tr[(Ēψψᵀ)(Ēψψᵀ)⁻¹] = λ d / N

The Regularized Case
The variance is reduced by regularization, at the price of some bias. In the previous result, the number of parameters d is replaced by d_eff, the effective dimension of θ: the number of eigenvalues of the Hessian of V that are larger than δ (the regularization parameter). Note: d_eff ≤ d = dim θ.

Choice of Size: Basic Trade-off

H̄(θ) = lim (1/N) Σ |g₀(x_t) − g(x_t, θ)|², E H̄(θ̂_N) ≈ H̄(θ*) + λ d / N

- A good model size is one that minimizes this expression.
- H̄(θ*) is the best possible fit that can be achieved within the parameterization; a smaller value means less bias. Thus more parameters give a more flexible model parameterization and hence less bias.
- More parameters lead, however, to higher variance (the term λd/N).
- The model size is thus a bias-variance trade-off. Note that this balance is usually reached with a non-zero H̄(θ*), that is, it is normal to accept bias.
- A larger model can be used when more data are available (larger N).
- If a regularized criterion is used, the size of the regularization parameter δ can also be used to control the flexibility of the parameterization.

Choice of Type (basis functions)
Generally speaking, the parameterization should be such that useful flexibility is achieved with as few parameters as possible:
- Grey-box models
- Tunable or non-tunable basis functions: g(x, θ) = Σ_{k=1}^{n} α_k f_k(x, β_k)
- A more flexible structure means fewer parameters but more work to minimize (non-tunable basis functions give linear least squares)
- Use the number of parameters d, or the regularization parameter δ, as a size-tuning knob; when there is no natural ordering of the structures it is easier to use δ.
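A sketch of the effective-dimension idea, assuming a linear-in-parameters model so that the Hessian of V_N is simply 2ΦᵀΦ/N; counting its eigenvalues above δ shows how the regularization parameter acts as a size-tuning knob (the model and threshold values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 100)
Phi = np.vander(x, 10)                  # 10-parameter polynomial model (assumed)
H = 2 * Phi.T @ Phi / len(x)            # Hessian of V_N(theta) = (1/N)||Y - Phi theta||^2
eigs = np.linalg.eigvalsh(H)            # eigenvalues of the Hessian

for delta in (1e-6, 1e-3, 1e-1, 1.0):
    d_eff = int(np.sum(eigs > delta))   # effective dimension at this delta
    print(f"delta = {delta:g}: d_eff = {d_eff} (d = {Phi.shape[1]})")
```

Increasing δ shrinks d_eff continuously from d toward 0, which is why δ is convenient when the candidate structures have no natural ordering.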

7 Non-parametric Methods

A simple idea is to locally smooth the noisy observations of the function values:

ĝ_N(x) = Σ_{k=1}^{N} C(x, x_k) y(k), with Σ_{k=1}^{N} C(x, x_k) = 1

Often C(x, x_k) = c(x − x_k)/λ_k (λ_k a normalization), with c(r) = 0 for |r| > β, where β is the bandwidth. These are known as kernel methods in statistics. If C(x, x_t) is chosen so that it is non-zero (= 1/k) only for the k observed values x_t closest to x, this is the k-nearest-neighbor method.

Example: C(x, x_k) = U((x − x_k)/β), with U(·) the unit pulse.
[Figure: noisy observations and the smoothed estimate; cyan dots: the estimate computed on a grid of x-values.]

Analysis of Non-parametric Methods

ε_N(x) = ĝ_N(x) − g₀(x) = Σ C(x, x_k) y(k) − g₀(x) = Σ C(x, x_k) ( e(k) + [g₀(x_k) − g₀(x)] )

E ε_N(x) = Σ C(x, x_k) [g₀(x_k) − g₀(x)]
Var ε_N(x) = Σ C²(x, x_k) λ (for white e with variance λ)

The Trade-off
Think of the unit-pulse choice, normalized over the bin:

C(x, x_k) = 1/k if |x − x_k| ≤ β, and 0 else, where k = number of x_k in the bin |x − x_k| ≤ β

MSE: H(x) = Σ C²(x, x_k) λ + ( Σ C(x, x_k)[g₀(x_k) − g₀(x)] )² ≈ λ/k + (variation of g₀(x) over |x − x_k| ≤ β)²

Trade-off: we want β to be small for small bias, and β to be large (many points in the bin, large k) for small variance. The best choice depends on the nature of g₀.

Parametric and Nonparametric Methods
Consider the parametric method using unit pulses U(x): g(x, θ) = Σ_{k=1}^{n} θ_k U((x − γ_k)/β), with β and the centers γ_k given so that the bins tile the x-axis. Then

Σ_{t=1}^{N} (y(t) − g(x_t, θ))² = Σ_{k=1}^{n} Σ_{t: |x_t − γ_k| < β} (y(t) − θ_k)²

which is minimized by the bin averages

θ̂_k = (1/N_k) Σ_{t: |x_t − γ_k| < β} y(t), N_k = number of x_t in bin k

This means that ĝ(γ_k) = θ̂_k. If we use a nonparametric method to estimate g₀ at x = γ_k with C(x, x_k) = (1/N_k) U((x − x_k)/β), we obtain the same estimate.

Summary of the Theme
We have used the simple case of curve fitting to illustrate basic issues, frameworks and techniques for linear and nonlinear system identification:
- Parametric methods: choice of model parameterization, model size and parameter values. The parameter values are the easy part: some version of a least-squares fit.
- Basic asymptotic properties: θ̂_N → θ*, the best possible approximation available in the parameterization (for the used x_t-sequence), and √N (θ̂_N − θ*) → N(0, P), P = λ [Ē ψ(t)ψᵀ(t)]⁻¹ (normal distribution).
- Choice of parametric model structure guided by the bias-variance trade-off (number of parameters).
- Choice of nonparametric method guided by the bias-variance trade-off (bandwidth of the kernel).
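A minimal sketch of the unit-pulse kernel smoother above, with an assumed true function and noise level; sweeping the bandwidth β makes the bias-variance trade-off visible as a U-shaped mean-squared error:

```python
import numpy as np

def kernel_smoother(x_grid, x_obs, y_obs, beta):
    """ghat(x) = sum_k C(x, x_k) y(k), C = 1/k inside the bin |x - x_k| <= beta."""
    ghat = np.full(len(x_grid), np.nan)
    for i, x in enumerate(x_grid):
        in_bin = np.abs(x_obs - x) <= beta   # unit-pulse kernel support
        if in_bin.any():
            ghat[i] = y_obs[in_bin].mean()   # equal weights 1/k within the bin
    return ghat

rng = np.random.default_rng(4)
x_obs = np.sort(rng.uniform(0, 1, 200))
y_obs = np.sin(2 * np.pi * x_obs) + 0.3 * rng.standard_normal(200)  # assumed g0 + noise
x_grid = np.linspace(0, 1, 50)
for beta in (0.02, 0.1, 0.3):                # small beta: low bias; large beta: low variance
    mse = np.nanmean((kernel_smoother(x_grid, x_obs, y_obs, beta)
                      - np.sin(2 * np.pi * x_grid)) ** 2)
    print(f"beta = {beta}: MSE = {mse:.4f}")
```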
