
System Identification
Lecture: Statistical properties of parameter estimators, Instrumental variable methods
Roy Smith

Statistical basis for estimation methods

Parametrised models: G = G(θ, z), H = H(θ, z) (pulse response, ARX, ARMAX, ..., state-space).

Estimation:

  θ̂_N = argmin_θ J(θ, Z_N),   (Z_N: finite-length measured noisy data).

Examples: least squares (linear regression), prediction error methods, correlation methods.

How do the statistical properties of the data (i.e. noise effects) influence our choice of methods and our results?

Maximum likelihood estimation
Basic formulation

Consider N observations, z_1, ..., z_N. Each is a realisation of a random variable, with joint probability distribution

  f(x_1, ..., x_N; θ)   ← a family of distributions, parametrised by θ, over the random variables x_1, ..., x_N.

Another common notation is

  f(x_1, ..., x_N | θ)   ← the pdf for x_1, ..., x_N given θ.

For independent variables,

  f(x_1, ..., x_N; θ) = f_1(x_1; θ) f_2(x_2; θ) ··· f_N(x_N; θ) = ∏_{i=1}^{N} f_i(x_i; θ).

Maximum likelihood estimation
Likelihood function

Substituting the observation, Z_N = {z_1, ..., z_N}, gives a function of θ,

  L(θ) = f(x_1, ..., x_N; θ) |_{x_i = z_i, i = 1, ..., N}   (likelihood function).

Maximum likelihood estimator:

  θ̂_ML = argmax_θ L(θ).

The value chosen for θ is the one that gives the most agreement with the observation.

Maximum likelihood estimation
Estimating the mean of a Gaussian distribution (σ² = 0.5)

  f(x; θ) = (1/√(2πσ²)) exp(-(x - θ)²/(2σ²))

[Figure: f(x; θ) plotted as a function of x for several values of θ.]

Maximum likelihood estimation
Estimating the mean of a Gaussian distribution (σ² = 0.5)

Datum: z_1 = 7.

  f(x; θ) = (1/√(2πσ²)) exp(-(x - θ)²/(2σ²)),   L(θ) = f(z_1; θ),   θ̂_ML = 7.

[Figure: the likelihood L(θ) = f(z_1; θ) as a function of θ, maximised at θ = 7.]

Maximum likelihood estimation
Log-likelihood function

It is often mathematically easier to consider

  θ̂_ML = argmax_θ ln L(θ).

As the ln function is monotonic this gives the same θ̂. The natural logarithm is typically used so as to handle the exponentials appearing in typical pdfs.
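As a concrete illustration of the likelihood and log-likelihood definitions above, here is a minimal numpy sketch; the simulated data, true mean, and grid are assumed for illustration and are not from the slides. It evaluates ln L(θ) on a grid for i.i.d. Gaussian observations with known σ² and picks the maximising θ, which for this model coincides with the sample mean.

```python
# Sketch (assumed example): grid maximisation of the Gaussian log-likelihood.
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 0.5
theta_true = 3.0
z = rng.normal(theta_true, np.sqrt(sigma2), size=50)   # observations z_1..z_N

theta_grid = np.linspace(0.0, 6.0, 2001)
# ln L(theta) = sum_i ln f(z_i; theta) for the Gaussian pdf with known sigma^2
loglik = np.array([np.sum(-0.5*np.log(2*np.pi*sigma2) - (z - th)**2/(2*sigma2))
                   for th in theta_grid])
theta_ml = theta_grid[np.argmax(loglik)]
print(theta_ml, z.mean())   # grid ML estimate vs. closed-form sample mean
```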

Example
Estimation of the mean of a set of samples

Samples z_i, i = 1, ..., N, with z_i ~ N(θ_0, σ_i²) (note: different variances).

Sample mean estimate:

  θ̂_SM = (1/N) Σ_{i=1}^{N} z_i.

Probability density functions (pdf): θ is the common mean of the distributions,

  f_i(x_i; θ) = (1/√(2πσ_i²)) exp(-(x_i - θ)²/(2σ_i²)).

For independent samples the joint pdf is

  f(x_1, ..., x_N; θ) = ∏_{i=1}^{N} (1/√(2πσ_i²)) exp(-(x_i - θ)²/(2σ_i²)).

The maximum likelihood estimate is

  θ̂_ML = argmax_θ ln f(x_1, ..., x_N; θ) |_{x_i = z_i, i = 1, ..., N}
        = argmax_θ ln L(θ)
        = argmax_θ [ -(N/2) ln(2π) - Σ_i ln σ_i - (1/2) Σ_i (z_i - θ)²/σ_i² ].

This gives (differentiate and equate to zero),

  θ̂_ML = ( Σ_i z_i/σ_i² ) / ( Σ_i 1/σ_i² ).
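A short numpy sketch of the worked example above; the particular variances σ_i² and the data are simulated here, i.e. assumed for illustration. It compares the plain sample mean with the variance-weighted ML estimate θ̂_ML = (Σ z_i/σ_i²)/(Σ 1/σ_i²).

```python
# Sketch (assumed data): ML estimate of a common mean from samples with
# different known variances sigma_i^2, compared with the plain sample mean.
import numpy as np

rng = np.random.default_rng(1)
theta0 = 2.0
sigma2 = rng.uniform(0.1, 4.0, size=200)               # known, sample-dependent variances
z = theta0 + rng.normal(0.0, np.sqrt(sigma2))          # z_i ~ N(theta0, sigma_i^2)

theta_sm = z.mean()                                    # sample-mean estimate
theta_ml = np.sum(z / sigma2) / np.sum(1.0 / sigma2)   # variance-weighted ML estimate
print(theta_sm, theta_ml)                              # ML estimate typically closer to theta0
```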

Bayesian approach
Random parameter framework

Consider θ to be a random variable with pdf f_θ(x). This is an a priori distribution (assumed before the experiment).

Conditional distribution (inference from the experiment): our model (plus assumptions) gives a conditional distribution,

  f(x_1, ..., x_N | θ).

On the basis of the experiment (x_i = z_i),

  Prob(θ | z_1, ..., z_N) = Prob(Z | θ) Prob(θ) / Prob(Z).

So,

  argmax_θ f(θ | z_1, ..., z_N) = argmax_θ f(Z | θ) f_θ(θ).

Maximum a posteriori (MAP) estimation
Estimator

Given data, Z_N,

  θ̂_MAP = argmax_θ f(Z | θ) f_θ(θ).

We can interpret the maximum likelihood estimator as

  θ̂_ML = argmax_θ f(x_1, ..., x_N; θ) |_{x_i = z_i, i = 1, ..., N} = argmax_θ f(Z | θ).

These estimates coincide if we assume a uniform distribution for θ.

MAP estimation
A priori parameter distribution

  f_θ(θ) = (1/√(2πσ_a²)) exp(-(θ - a)²/(2σ_a²)),   a = 5, σ_a² = 1.

[Figure: the prior pdf f_θ(θ), centred at a = 5 with standard deviation σ_a.]

MAP estimation
Estimating the mean: Gaussian distribution (σ² = 0.5, a = 5, σ_a² = 1)

  f(x; θ) f_θ(θ) = (1/√(2πσ²)) exp(-(x - θ)²/(2σ²)) · (1/√(2πσ_a²)) exp(-(θ - a)²/(2σ_a²))

[Figure: the product f(x; θ) f_θ(θ) plotted against x and θ.]
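A minimal sketch of the MAP computation for this example, using the values σ² = 0.5, a = 5, σ_a² = 1 and the single datum z = 7 from the example; the comparison against the closed-form precision-weighted mean is an assumed check, not something shown on the slides.

```python
# Sketch (numbers taken from the slide example): MAP estimate of a Gaussian mean
# with prior theta ~ N(a, sigma_a^2) and one datum z ~ N(theta, sigma^2).
import numpy as np

sigma2, a, sigma_a2 = 0.5, 5.0, 1.0
z = 7.0

theta = np.linspace(0.0, 10.0, 100001)
# log posterior up to an additive constant: log f(z; theta) + log f_theta(theta)
log_post = -(z - theta)**2 / (2*sigma2) - (theta - a)**2 / (2*sigma_a2)
theta_map_grid = theta[np.argmax(log_post)]

# Gaussian prior + Gaussian likelihood -> closed-form precision-weighted mean
theta_map = (z/sigma2 + a/sigma_a2) / (1/sigma2 + 1/sigma_a2)
print(theta_map_grid, theta_map)   # both ~ 6.33
```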

MAP estimation
Estimating the mean: Gaussian distribution (σ² = 0.5, a = 5, σ_a² = 1)

Datum: z_1 = 7.

  f(x; θ) f_θ(θ) = (1/√(2πσ²)) exp(-(x - θ)²/(2σ²)) · (1/√(2πσ_a²)) exp(-(θ - a)²/(2σ_a²)),   θ̂_MAP = 6.33.

[Figure: f(z_1; θ) f_θ(θ) as a function of θ, maximised at θ = 6.33.]

Cramér-Rao bound

Mean-square error matrix:

  P = E{ (θ̂(Z) - θ_0)(θ̂(Z) - θ_0)^T }.

Assume that the pdf for Z is f(Z; θ_0).

Cramér-Rao inequality: assume E{θ̂(Z)} = θ_0, and Z ⊂ R^N. Then,

  P ≥ M^{-1}   (M is the Fisher information matrix),

  M = E{ (d/dθ ln f(Z; θ)) (d/dθ ln f(Z; θ))^T } |_{θ = θ_0} = -E{ d²/dθ² ln f(Z; θ) } |_{θ = θ_0}.
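A small Monte Carlo sketch of the Cramér-Rao bound for the Gaussian-mean problem above; the sample size, variance, and number of trials are assumed for illustration. For N i.i.d. samples with known σ², the Fisher information is M = N/σ², so any unbiased estimator has variance at least σ²/N, and the sample mean attains this bound.

```python
# Sketch (assumed Gaussian example): Fisher information and the Cramer-Rao bound
# for estimating the mean theta from N i.i.d. samples with known sigma^2.
# Here M = N/sigma^2, so var(theta_hat) >= sigma^2/N; the sample mean attains it.
import numpy as np

rng = np.random.default_rng(2)
N, sigma2, theta0 = 50, 0.5, 3.0
crb = sigma2 / N

est = np.array([rng.normal(theta0, np.sqrt(sigma2), N).mean() for _ in range(20000)])
print(crb, est.var())   # empirical variance of the ML estimate ~ CRB
```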

Maximum likelihood: statistical properties
Asymptotic results for i.i.d. variables

Consider a parametrised family of pdfs,

  f(x_1, ..., x_N; θ) = ∏_{i=1}^{N} f_i(x_i; θ).

Then, with probability 1,

  θ̂_ML → θ_0 as N → ∞,

and

  √N (θ̂_ML(Z_N) - θ_0) → N(0, M^{-1}) as N → ∞.

Prediction error statistics
Prediction error framework

  ε(k, θ) = y(k) - ŷ(k, θ).

Assume that ε(k, θ) is i.i.d. with pdf f_ε(x; θ). For example, in the ARX case, ε(k, θ_0) ~ N(0, σ²).

Joint pdf for the prediction:

  f(X_N; θ) = ∏_{k=1}^{N} f_ε(ε(k, θ); θ).

Prediction error statistics
Maximum likelihood approach

  θ̂_ML = argmax_θ f(X_N; θ) |_{X_N = Z_N}
        = argmax_θ L(θ)
        = argmax_θ ln f(Z_N | θ)
        = argmax_θ Σ_{k=1}^{N} ln f_ε(ε(k, θ); θ).

If we choose the prediction error cost function as

  ℓ(ε, θ) = -ln f_ε(ε; θ),

then

  θ̂_PE = argmin_θ (1/N) Σ_{k=1}^{N} ℓ(ε(k, θ), θ) = θ̂_ML.

Prediction error statistics
Example

Gaussian noise case, ε(k) ~ N(0, σ²):

  ℓ(ε(k, θ), θ) = -ln f_ε(ε; θ) = constant + (1/2) ln σ² + ε(k, θ)²/(2σ²).

If σ² is constant (and not a parameter to be estimated) then

  θ̂_ML = argmax_θ L(θ) = argmin_θ Σ_k ℓ(ε(k, θ), θ) = argmin_θ Σ_k ε(k, θ)² = θ̂_PE.
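A minimal sketch of the equivalence just stated, on an assumed first-order ARX system (the parameter values and noise level are chosen for illustration): with σ² fixed, the negative log-likelihood Σ_k ℓ(ε(k, θ), θ) and the least-squares cost Σ_k ε(k, θ)² are minimised by the same θ. One parameter is swept on a grid to show that the minimisers coincide.

```python
# Sketch (assumed first-order ARX example): with i.i.d. Gaussian prediction errors
# and fixed sigma^2, the ML criterion sum_k -ln f_eps(eps(k,theta)) and the
# least-squares criterion sum_k eps(k,theta)^2 have the same minimiser.
import numpy as np

rng = np.random.default_rng(3)
N, a0, b0, sigma = 400, -0.7, 1.5, 0.3
u = rng.normal(size=N)
y = np.zeros(N)
for k in range(1, N):
    y[k] = -a0*y[k-1] + b0*u[k-1] + sigma*rng.normal()

a_grid = np.linspace(-0.95, 0.0, 400)        # sweep a with b fixed at b0 for illustration
ls, nll = [], []
for a in a_grid:
    eps = y[1:] + a*y[:-1] - b0*u[:-1]       # eps(k, theta) = y(k) - yhat(k, theta)
    ls.append(np.sum(eps**2))
    nll.append(np.sum(0.5*np.log(2*np.pi*sigma**2) + eps**2/(2*sigma**2)))
print(a_grid[np.argmin(ls)], a_grid[np.argmin(nll)])   # identical minimisers
```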

Prediction error statistics
Example (continued)

If we have a linear predictor, and independent Gaussian noise, then

  θ̂ = argmin_θ Σ_k ε(k, θ)²

- is a linear, least-squares problem;
- is equivalent to minimizing -Σ_k ln f_ε(ε; θ);
- is equivalent to a maximum likelihood estimation;
- gives (asymptotically) the minimum variance parameter estimates.

Linear regression statistics
One-step-ahead predictor

  ŷ(k | k-1) = φ^T(k) θ + μ(k).

In the ARX case μ(k) = 0. In other special cases μ(k) can depend on Z_{k-1}.

Prediction error:

  ε(k) = y(k) - φ^T(k) θ.

A typical cost function is

  J(θ, Z_N) = (1/N) Σ_{k=1}^{N} ε(k)².

Least-squares criterion:

  θ̂_LS = [ (1/N) Σ_{k=1}^{N} φ(k) φ^T(k) ]^{-1} [ (1/N) Σ_{k=1}^{N} φ(k) y(k) ] = R^{-1} f,

where R ∈ R^{d×d} and f ∈ R^d.
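A sketch of the least-squares criterion and its closed-form solution θ̂_LS = R^{-1} f on an assumed first-order ARX system; the model orders, parameter values, and noise level are chosen for illustration.

```python
# Sketch (assumed ARX(1,1) example): the closed-form least-squares solution
# theta_LS = R^{-1} f with R = (1/N) sum phi phi^T and f = (1/N) sum phi y.
import numpy as np

rng = np.random.default_rng(4)
N, a0, b0 = 1000, -0.7, 1.5
u = rng.normal(size=N)
e = 0.2*rng.normal(size=N)
y = np.zeros(N)
for k in range(1, N):
    y[k] = -a0*y[k-1] + b0*u[k-1] + e[k]          # ARX: A(z) y = B(z) u + e

Phi = np.column_stack([-y[:-1], u[:-1]])          # regressor phi(k) = [-y(k-1), u(k-1)]^T
Y = y[1:]
R = Phi.T @ Phi / len(Y)
f = Phi.T @ Y / len(Y)
theta_ls = np.linalg.solve(R, f)
print(theta_ls)                                   # ~ [a0, b0]
```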

Linear regression statistics
Least-squares estimator properties

[Block diagram: ARX structure, y(k) = (B(θ, z) u(k) + v(k)) / A(θ, z).]

The least-squares estimate can be expressed as

  θ̂_LS = R^{-1} f.

True plant:

  y(k) = φ^T(k) θ_0 + v(k).

Substituting the true plant,

  θ̂_LS = θ_0 + R^{-1} (1/N) Σ_k φ(k) v(k),

so the asymptotic bias is

  lim_{N→∞} θ̂_LS = θ_0 + R̄^{-1} f̄,   with   R̄ = E{ φ(k) φ^T(k) },   f̄ = E{ φ(k) v(k) }.

Linear regression statistics
Consistency of the LS estimator

For consistency,

  lim_{N→∞} θ̂_LS = θ_0,

we require R̄^{-1} f̄ = 0. So:

1. R̄ must be non-singular (a persistency of excitation requirement).
2. f̄ = E{ φ(k) v(k) } = 0. This happens if either:
   a. v(k) is zero-mean and independent of φ(k); or
   b. u(k) is independent of v(k) and G is FIR (n = 0).

This gives

  √N (θ̂_LS - θ_0) → N(0, σ² R̄^{-1}) as N → ∞.
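A sketch of the consistency conditions above, again on an assumed first-order ARX-type simulation: with white equation noise the LS estimate converges to θ_0, whereas with coloured noise v(k) (so that E{φ(k)v(k)} ≠ 0) a bias remains even for large N.

```python
# Sketch (assumed example): least-squares consistency. With white equation noise the
# ARX least-squares estimate converges to theta0; with coloured noise v(k) correlated
# with phi(k), a bias remains in the 'a' estimate even as N grows.
import numpy as np

rng = np.random.default_rng(5)
def ls_arx(y, u):
    Phi = np.column_stack([-y[:-1], u[:-1]])
    return np.linalg.lstsq(Phi, y[1:], rcond=None)[0]

N, a0, b0 = 100000, -0.7, 1.5
u = rng.normal(size=N)
e = 0.3*rng.normal(size=N)
v = e + 0.9*np.roll(e, 1); v[0] = e[0]            # coloured noise (MA(1))

y_white, y_col = np.zeros(N), np.zeros(N)
for k in range(1, N):
    y_white[k] = -a0*y_white[k-1] + b0*u[k-1] + e[k]
    y_col[k]   = -a0*y_col[k-1]   + b0*u[k-1] + v[k]

print(ls_arx(y_white, u))   # ~ [a0, b0]  (consistent)
print(ls_arx(y_col, u))     # biased 'a' estimate: E{phi(k) v(k)} != 0
```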

Correlation methods
Ideal prediction error estimator

  y(k) - ŷ(k | k-1) = ε(k) = e(k)   (ideally).

The sequence of prediction errors, {e(k), k = 1, 2, ...}, is white. If the estimator is optimal (θ = θ_0) then the prediction errors contain no further information about the process.

Another interpretation: the prediction errors, ε(k), are uncorrelated with the experimental data, Z_N.

Correlation methods
Approach

Select a sequence, ζ(k), derived from the past data, Z_{k-1}. Require that the error, ε(k, θ), is uncorrelated with ζ(k),

  (1/N) Σ_{k=1}^{N} ζ(k) ε(k, θ) = 0

(one could also use a function α(ε) of the error). We can view the identification problem as finding θ such that this relationship is satisfied. The values ζ(k) are known as instruments. Typically ζ(k) ∈ R^{d×n_y}, where θ ∈ R^d and y(k) ∈ R^{n_y}.
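A small sketch of the whiteness and uncorrelatedness idea above, on an assumed simulated ARX data set: after fitting the model by least squares, the prediction errors should show negligible autocorrelation and negligible correlation with past inputs.

```python
# Sketch (assumed example): checking whether the prediction errors of a fitted ARX
# model look white, via their sample autocorrelation and their cross-correlation
# with past inputs.
import numpy as np

rng = np.random.default_rng(8)
N, a0, b0 = 5000, -0.7, 1.5
u = rng.normal(size=N)
y = np.zeros(N)
for k in range(1, N):
    y[k] = -a0*y[k-1] + b0*u[k-1] + 0.3*rng.normal()

Phi = np.column_stack([-y[:-1], u[:-1]])
theta = np.linalg.lstsq(Phi, y[1:], rcond=None)[0]
eps = y[1:] - Phi @ theta                          # prediction errors

def xcorr(a, b, lag):                              # normalised sample correlation at a lag
    return np.corrcoef(a[lag:], b[:len(b)-lag])[0, 1]

print([round(xcorr(eps, eps, L), 3) for L in range(1, 4)])    # ~ 0: residuals white
print([round(xcorr(eps, u[1:], L), 3) for L in range(1, 4)])  # ~ 0: uncorrelated with past u
```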

Correlation methods
Procedure

1. Choose a linear filter, F(z), for the prediction errors,
     ε_F(k, θ) = F(z) ε(k, θ)   (this is optional).
2. Choose a sequence of correlation vectors, ζ(k, Z_{k-1}, θ), constructed from the data (and possibly θ).
3. Choose a function α(ε) (the default is α(ε) = ε).

Then θ̂ solves

  f(θ, Z_N) = (1/N) Σ_{k=1}^{N} ζ(k, θ) α(ε_F(k, θ)) = 0.

Pseudo-linear regressions
Regression-based one-step-ahead predictors

For ARX, ARMAX, etc., model structures we can write the predictor as

  ŷ(k | k-1) = φ^T(k, θ) θ.

We previously solved this via LS (or iterative LS, or optimisation) methods.

Correlation-based solution: θ̂_PLR solves

  (1/N) Σ_{k=1}^{N} φ(k, θ) ( y(k) - φ^T(k, θ) θ ) = 0,

where y(k) - φ^T(k, θ) θ is the prediction error. The prediction errors are orthogonal to the regressor, φ(k, θ).

Instrumental variable methods
Instrumental variables

θ̂_IV solves

  (1/N) Σ_{k=1}^{N} ζ(k, θ) ( y(k) - φ^T(k) θ ) = 0.

This is solved by

  θ̂_IV = [ (1/N) Σ_k ζ(k) φ^T(k) ]^{-1} [ (1/N) Σ_k ζ(k) y(k) ].

So, for consistency we require

  E{ ζ(k) φ^T(k) }   to be nonsingular, and
  E{ ζ(k) v(k) } = 0   (instruments uncorrelated with the prediction error).

Example
ARX model

  y(k) + a_1 y(k-1) + ··· + a_n y(k-n) = b_1 u(k-1) + ··· + b_m u(k-m) + v(k).

One approach: filtered input signals as instruments.

[Block diagram: y(k) = (B(θ, z) u(k) + v(k)) / A(θ, z); instrument signal x(k) = (P(z)/Q(z)) u(k).]

  x(k) + q_1 x(k-1) + ··· + q_n x(k-n) = p_1 u(k-1) + ··· + p_m u(k-m).
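A minimal sketch of the closed-form IV solution above, on the same kind of assumed coloured-noise ARX simulation used earlier. For simplicity the instruments here are delayed inputs, ζ(k) = [u(k-2), u(k-1)]^T, rather than the filtered signals of the example; this is an assumed simpler choice that still satisfies the two consistency conditions.

```python
# Sketch (assumed first-order ARX-with-coloured-noise example): the IV estimate
# theta_IV = [sum_k zeta(k) phi(k)^T]^{-1} [sum_k zeta(k) y(k)], using delayed
# inputs as instruments, which are uncorrelated with the noise v(k).
import numpy as np

rng = np.random.default_rng(6)
N, a0, b0 = 100000, -0.7, 1.5
u = rng.normal(size=N)
e = 0.3*rng.normal(size=N)
v = e + 0.9*np.roll(e, 1); v[0] = e[0]            # coloured noise -> LS would be biased
y = np.zeros(N)
for k in range(1, N):
    y[k] = -a0*y[k-1] + b0*u[k-1] + v[k]

k = np.arange(2, N)
Phi  = np.column_stack([-y[k-1], u[k-1]])         # regressor phi(k)
Zeta = np.column_stack([ u[k-2], u[k-1]])         # instruments zeta(k)
theta_iv = np.linalg.solve(Zeta.T @ Phi, Zeta.T @ y[k])
print(theta_iv)                                   # ~ [a0, b0]
```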

Instrumental variable example

[Block diagram: y(k) = (B(θ, z) u(k) + v(k)) / A(θ, z); instrument signal x(k) = (P(z)/Q(z)) u(k).]

Here,

  ζ(k) = [ -x(k-1), ..., -x(k-n), u(k-1), ..., u(k-m) ]^T.

R = (1/N) Σ_k ζ(k) φ^T(k) is required to be invertible, and we also need

  E{ (1/N) Σ_k ζ(k) v(k) } = 0.

Instrumental variable example
Invertibility of R?

  y = (B(z)/A(z)) u + (1/A(z)) v,   x = (P(z)/Q(z)) u.

So, in the first-order case, ζ(k) φ^T(k) has the form

  ζ(k) φ^T(k) = [ (P/Q) u_k ; u_k ] [ (B/A) u_k + (1/A) v_k    u_k ]
              = [ (P/Q) u_k ; u_k ] [ (B/A) u_k    u_k ]   (invertible?)
                + [ (P/Q) u_k ; u_k ] [ (1/A) v_k    0 ]   (vanishing? → 0).

Instrumental variable example

With y = (B(z)/A(z)) u + (1/A(z)) v and x = (P(z)/Q(z)) u, the expression for ζ(k) φ^T(k) above will be invertible if:

- v(k) and u(k) are uncorrelated;
- u(k) and x(k) = (P(z)/Q(z)) u(k) are sufficiently exciting;
- there are no pole/zero cancellations between P(z)/Q(z) and B(z)/A(z).

Instrumental variable approach
A nonlinear estimation problem

[Block diagram: y(k) = (B(θ, z) u(k) + v(k)) / A(θ, z); instrument signal x(k) = (P(z)/Q(z)) u(k).]

Choosing P(z) and Q(z): the procedure works well when P(z) ≈ B(z) and Q(z) ≈ A(z).

Approach:
1. Estimate θ̂_LS via linear regression.
2. Select Q(z) = Â_LS(z) and P(z) = B̂_LS(z).
3. Calculate θ̂_IV.
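A sketch of the three-step procedure above on an assumed coloured-noise ARX simulation: a (biased) least-squares fit provides Â_LS and B̂_LS, the noise-free auxiliary signal x = (B̂_LS/Â_LS) u generates the instruments, and the IV equations are then solved.

```python
# Sketch (assumed example) of the three-step procedure: LS estimate first,
# then instruments from x = (P/Q) u with Q = A_LS and P = B_LS, then IV.
import numpy as np

rng = np.random.default_rng(7)
N, a0, b0 = 100000, -0.7, 1.5
u = rng.normal(size=N)
e = 0.3*rng.normal(size=N)
v = e + 0.9*np.roll(e, 1); v[0] = e[0]
y = np.zeros(N)
for k in range(1, N):
    y[k] = -a0*y[k-1] + b0*u[k-1] + v[k]

# Step 1: least squares (biased here because v is coloured)
Phi = np.column_stack([-y[:-1], u[:-1]])
a_ls, b_ls = np.linalg.lstsq(Phi, y[1:], rcond=None)[0]

# Step 2: noise-free auxiliary model output x = (P/Q) u with Q = A_LS, P = B_LS
x = np.zeros(N)
for k in range(1, N):
    x[k] = -a_ls*x[k-1] + b_ls*u[k-1]

# Step 3: IV estimate with zeta(k) = [-x(k-1), u(k-1)]^T
Zeta = np.column_stack([-x[:-1], u[:-1]])
theta_iv = np.linalg.solve(Zeta.T @ Phi, Zeta.T @ y[1:])
print([a_ls, b_ls], theta_iv)                     # IV estimate ~ [a0, b0]
```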

Instrumental variable approach
Considerations

Variance and MSE depend on the choice of instruments.

Consistency (asymptotic unbiasedness) is lost if:

- noise and instruments are correlated (for example, in closed loop, generating instruments from u);
- the model order selection is incorrect;
- the filter dynamics cancel plant dynamics;
- the true system is not in the model set.

Closed-loop approaches: generate the instruments from the external excitation, r.

Bibliography

Prediction error minimization: Lennart Ljung, System Identification: Theory for the User, 2nd Ed., Prentice-Hall, 1999, sections 7.1, 7.2 & 7.3.

Parameter estimation statistics: Lennart Ljung, System Identification: Theory for the User, 2nd Ed., Prentice-Hall, 1999, section 7.4.

Correlation and instrumental variable methods: Lennart Ljung, System Identification: Theory for the User, 2nd Ed., Prentice-Hall, 1999, sections 7.5 & 7.6.