System Identification

Lecture: Statistical properties of parameter estimators; instrumental variable methods

Roy Smith
Statistical basis for estimation methods

Parametrised models: G = G(θ, z), H = H(θ, z) (pulse response, ARX, ARMAX, ..., state-space).

Estimation:

    θ̂ = argmin_θ J(θ, Z^N)    (Z^N: finite-length measured noisy data).

Examples:
- Least squares (linear regression)
- Prediction error methods
- Correlation methods

How do the statistical properties of the data (i.e. noise effects) influence our choice of methods and our results?

Maximum likelihood estimation

Basic formulation

Consider N observations, z_1, ..., z_N. Each is a realisation of a random variable, with joint probability distribution

    f(x_1, ..., x_N; θ)    ← a family of distributions parametrised by θ,

where x_1, ..., x_N are the random variables. Another common notation is f(x_1, ..., x_N | θ): the pdf for x_1, ..., x_N given θ.

For independent variables,

    f(x_1, ..., x_N; θ) = f_1(x_1; θ) f_2(x_2; θ) ··· f_N(x_N; θ) = ∏_{i=1}^N f_i(x_i; θ).
Maximum likelihood estimation

Likelihood function

Substituting the observations, Z^N = {z_1, ..., z_N}, gives a function of θ,

    L(θ) = f(x_1, ..., x_N; θ) |_{x_i = z_i, i = 1, ..., N}    (likelihood function).

Maximum likelihood estimator:

    θ̂_ML = argmax_θ L(θ).

The value chosen for θ is the one that gives the most agreement with the observations.

Maximum likelihood estimation

Estimating the mean of a Gaussian distribution (σ² = 0.5):

    f(x; θ) = 1/√(2πσ²) e^{−(x−θ)²/(2σ²)}.

[Figure: the pdf f(x; θ) plotted as a surface over x and θ.]
Maximum likelihood estimation

Estimating the mean of a Gaussian distribution (σ² = 0.5). Datum: z = 7.

    f(x; θ) = 1/√(2πσ²) e^{−(x−θ)²/(2σ²)},    L(θ) = f(z; θ),    θ̂_ML = 7.

[Figure: the slice L(θ) = f(z; θ) of the pdf at x = z = 7, maximised at θ̂_ML = 7.]

Maximum likelihood estimation

Log-likelihood function

It is often mathematically easier to consider

    θ̂_ML = argmax_θ ln L(θ).

As the ln function is monotonic, this gives the same θ̂. The natural logarithm is typically used so as to handle the exponentials that appear in typical pdfs.
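The single-datum example above is easy to reproduce numerically. A minimal sketch (not from the slides; a grid search stands in for the analytic maximisation): maximising L(θ) = f(z; θ) over a grid of candidate means recovers θ̂_ML = z.

```python
import math

def gaussian_pdf(x, theta, var):
    """Gaussian pdf f(x; theta) with mean theta and variance var."""
    return math.exp(-(x - theta) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Single observation and the (assumed known) noise variance from the slides.
z, var = 7.0, 0.5

# Likelihood L(theta) = f(z; theta); maximise it over a grid of candidate means.
grid = [i / 100 for i in range(-200, 1201)]   # theta in [-2, 12], step 0.01
theta_ml = max(grid, key=lambda th: gaussian_pdf(z, th, var))

print(theta_ml)  # -> 7.0 (the ML estimate sits on the datum)
```

The grid search is only for illustration; the next slides obtain the same result in closed form via the log-likelihood.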
Example: estimating the mean of a set of samples

    z_i, i = 1, ..., N,    z_i ~ N(θ_0, σ_i²)    (note: different variances).

Sample mean estimate:

    θ̂_SM = (1/N) Σ_{i=1}^N z_i.

Probability density functions (pdfs): θ is the common mean of the distributions,

    f_i(x_i; θ) = 1/√(2πσ_i²) exp( −(x_i − θ)²/(2σ_i²) ).

For independent samples the joint pdf is:

    f(x_1, ..., x_N; θ) = ∏_{i=1}^N 1/√(2πσ_i²) exp( −(x_i − θ)²/(2σ_i²) ).

Example: estimating the mean of a set of samples

    θ̂_ML = argmax_θ ln f(x_1, ..., x_N; θ) |_{x_i = z_i, i = 1, ..., N}
          = argmax_θ ln L(θ)
          = argmax_θ [ −(N/2) ln(2π) − Σ_{i=1}^N ln σ_i − Σ_{i=1}^N (z_i − θ)²/(2σ_i²) ].

This gives (differentiate and equate to zero),

    θ̂_ML = ( Σ_{i=1}^N z_i/σ_i² ) / ( Σ_{i=1}^N 1/σ_i² ).
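The inverse-variance weighting derived above can be checked in simulation. A sketch with invented values (true mean 3.0, two small and two large known standard deviations): both estimators are unbiased, but the ML estimate down-weights the noisy samples and so has much lower variance than the plain sample mean.

```python
import random

random.seed(1)

theta0 = 3.0                      # true common mean
sigmas = [0.5, 0.5, 4.0, 4.0]     # known, unequal standard deviations

# Draw repeated rounds of one sample per distribution.
z, sig = [], []
for _ in range(500):
    for s in sigmas:
        z.append(random.gauss(theta0, s))
        sig.append(s)

# Sample mean: weights every sample equally.
theta_sm = sum(z) / len(z)

# ML estimate: inverse-variance weighting, as derived on the slide.
w = [1.0 / s ** 2 for s in sig]
theta_ml = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)

print(theta_sm, theta_ml)  # both close to 3.0; theta_ml fluctuates far less
```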
Bayesian approach

Random parameter framework

Consider θ to be a random variable with pdf f_θ(x). This is an a priori distribution (assumed before the experiment).

Conditional distribution (inference from the experiment)

Our model (plus assumptions) gives a conditional distribution,

    f(x_1, ..., x_N | θ).

On the basis of the experiment (x_i = z_i), Bayes' rule gives

    Prob(θ | z_1, ..., z_N) = Prob(Z^N | θ) Prob(θ) / Prob(Z^N).

So,

    argmax_θ f(θ | z_1, ..., z_N) = argmax_θ f(Z^N | θ) f_θ(θ).

Maximum a posteriori (MAP) estimation

Estimator

Given data, Z^N,

    θ̂_MAP = argmax_θ f(Z^N | θ) f_θ(θ).

We can interpret the maximum likelihood estimator as

    θ̂_ML = argmax_θ f(x_1, ..., x_N; θ) |_{x_i = z_i, i = 1, ..., N} = argmax_θ f(Z^N | θ).

These estimates coincide if we assume a uniform distribution for θ.
MAP estimation

A priori parameter distribution (a = 5, σ_a² = 1):

    f_θ(θ) = 1/√(2πσ_a²) e^{−(θ−a)²/(2σ_a²)}.

[Figure: the prior f_θ(θ), centred at a = 5 with standard deviation σ_a.]

MAP estimation

Estimating the mean: Gaussian distribution (σ² = 0.5, a = 5, σ_a² = 1):

    f(x; θ) f_θ(θ) = 1/√(2πσ²) e^{−(x−θ)²/(2σ²)} · 1/√(2πσ_a²) e^{−(θ−a)²/(2σ_a²)}.

[Figure: the product f(x; θ) f_θ(θ) plotted as a surface over x and θ.]
MAP estimation

Estimating the mean: Gaussian distribution (σ² = 0.5, a = 5, σ_a² = 1). Datum: z = 7.

    f(x; θ) f_θ(θ) = 1/√(2πσ²) e^{−(x−θ)²/(2σ²)} · 1/√(2πσ_a²) e^{−(θ−a)²/(2σ_a²)},
    θ̂_MAP = argmax_θ f(z; θ) f_θ(θ) = 6.33.

[Figure: the slice f(z; θ) f_θ(θ) at x = z = 7, maximised at θ̂_MAP = 6.33; the prior pulls the estimate from the datum (7) towards the prior mean (5).]

Cramér–Rao bound

Mean-square error matrix

    P = E{ (θ̂(Z) − θ_0)(θ̂(Z) − θ_0)^T }.

Assume that the pdf for Z is f(Z; θ).

Cramér–Rao inequality

Assume E{θ̂(Z)} = θ_0, and Z ⊂ R^N. Then,

    P ≥ M^{−1}    (M is the Fisher information matrix),

where

    M = E{ (d/dθ ln f(Z; θ)) (d/dθ ln f(Z; θ))^T } |_{θ = θ_0}
      = −E{ d²/dθ² ln f(Z; θ) } |_{θ = θ_0}.
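For the scalar Gaussian case above, the MAP estimate has a standard closed form (not spelled out on the slide): the product of the Gaussian likelihood and the Gaussian prior is maximised at an inverse-variance-weighted combination of the datum and the prior mean. A minimal sketch with the slide's values:

```python
def map_gaussian(z, var, a, var_a):
    """MAP estimate of a Gaussian mean theta from one datum z ~ N(theta, var),
    under the Gaussian prior theta ~ N(a, var_a)."""
    return (z / var + a / var_a) / (1.0 / var + 1.0 / var_a)

# Values from the slide: z = 7, sigma^2 = 0.5, a = 5, sigma_a^2 = 1.
theta_map = map_gaussian(7.0, 0.5, 5.0, 1.0)
print(round(theta_map, 2))  # -> 6.33

# A very wide (nearly flat) prior recovers the ML estimate, theta = z.
print(round(map_gaussian(7.0, 0.5, 5.0, 1e9), 2))  # -> 7.0
```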
Maximum likelihood: statistical properties

Asymptotic results for i.i.d. variables

Consider a parametrised family of pdfs,

    f(x_1, ..., x_N; θ) = ∏_{i=1}^N f_i(x_i; θ).

Then,

    θ̂_ML → θ_0  with probability 1 as N → ∞,

and

    √N (θ̂_ML(Z^N) − θ_0) → N(0, M^{−1})  as N → ∞.

Prediction error statistics

Prediction error framework

    ε(k, θ) = y(k) − ŷ(k, θ).

Assume that ε(k, θ) is i.i.d. with pdf f_ε(x; θ). For example, in the ARX case, ε(k, θ) ~ N(0, σ²).

Joint pdf for the prediction errors:

    f(X^N; θ) = ∏_{k=1}^N f_ε(ε(k, θ); θ).
Prediction error statistics

Maximum likelihood approach

    θ̂_ML = argmax_θ f(X^N; θ) |_{X^N = Z^N}
          = argmax_θ L(θ)
          = argmax_θ ln L(θ)
          = argmax_θ Σ_{k=1}^N ln f_ε(ε(k, θ); θ).

If we choose the prediction error cost function as

    l(ε, θ) = −ln f_ε(ε; θ),

then

    θ̂_PE = argmin_θ (1/N) Σ_{k=1}^N l(ε(k, θ), θ) = θ̂_ML.

Prediction error statistics

Example

Gaussian noise case, ε(k) ~ N(0, σ²):

    l(ε(k, θ), θ) = −ln f_ε(ε; θ) = constant + ln σ + ε(k, θ)²/(2σ²).

If σ is constant (and not a parameter to be estimated) then

    θ̂_ML = argmax_θ L(θ) = argmin_θ Σ_{k=1}^N l(ε(k, θ), θ) = argmin_θ Σ_{k=1}^N ε(k, θ)² = θ̂_PE.
Prediction error statistics

Example

If we have a linear predictor and independent Gaussian noise, then

    θ̂ = argmin_θ Σ_{k=1}^N ε(k, θ)²

- is a linear least-squares problem;
- is equivalent to minimising −Σ_{k=1}^N ln f_ε(ε; θ);
- is equivalent to maximum likelihood estimation;
- gives (asymptotically) the minimum variance parameter estimates.

Linear regression statistics

One-step ahead predictor

    ŷ(k | k−1) = φ^T(k) θ + µ(k).

In the ARX case µ(k) = 0. In other special cases µ(k) can depend on Z^{k−1}.

Prediction error:

    ε(k) = y(k) − φ^T(k) θ.

A typical cost function is:

    J(θ, Z^N) = (1/N) Σ_{k=1}^N ε(k)².

Least-squares criterion:

    θ̂_LS = R^{−1} f,

where

    R = (1/N) Σ_{k=1}^N φ(k) φ^T(k) ∈ R^{d×d},    f = (1/N) Σ_{k=1}^N φ(k) y(k) ∈ R^d.
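The least-squares criterion above can be exercised on a simulated first-order ARX system. A sketch with invented values (a = 0.7, b = 1.5, white equation noise); the 2×2 normal equations θ̂_LS = R^{−1} f are solved directly rather than with a linear-algebra library:

```python
import random

random.seed(0)

# Simulate a first-order ARX system:
#   y(k) = -a*y(k-1) + b*u(k-1) + e(k),  with white noise e(k),
# so theta = [a, b] and phi(k) = [-y(k-1), u(k-1)].
a_true, b_true = 0.7, 1.5
N = 20000
y, u = [0.0], [random.gauss(0, 1)]
for k in range(1, N):
    e = random.gauss(0, 0.5)
    y.append(-a_true * y[k - 1] + b_true * u[k - 1] + e)
    u.append(random.gauss(0, 1))

# Build R = (1/N) sum phi phi^T and f = (1/N) sum phi y.
R = [[0.0, 0.0], [0.0, 0.0]]
f = [0.0, 0.0]
for k in range(1, N):
    phi = [-y[k - 1], u[k - 1]]
    for i in range(2):
        f[i] += phi[i] * y[k] / N
        for j in range(2):
            R[i][j] += phi[i] * phi[j] / N

# theta_LS = R^{-1} f, via the explicit 2x2 inverse.
det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
a_hat = (R[1][1] * f[0] - R[0][1] * f[1]) / det
b_hat = (-R[1][0] * f[0] + R[0][0] * f[1]) / det

print(a_hat, b_hat)  # close to (0.7, 1.5): white equation noise, LS consistent
```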
Linear regression statistics vpkq ypkq Ap, zq ` Bp, zq upkq Least-squares estimator properties The least-squares estimate can be expressed as, ˆ LS R f True plant: ypkq ϕ T pkq ` vpkq Asymptotic bias: lim ˆ LS ÝÑ8 R E lim R ÝÑ8 ÿ k! ) ϕpkqϕ T pkq, f E tϕpkqvpkqu. ϕ T pkqvpkq pr q f. 8--8.3 Linear regression statistics Consistency of the LS estimator For consistency, lim ˆ LS, ÝÑ8 we require, pr q f. So,. R must be non-singular. Persistency of excitation requirement.. f E tϕpkqvpkqu. This happens if either: a. vpkq is zero-mean and independent of ϕpkq; or b. upkq is independent of vpkq and G is FIR. (n ). This gives,? lim ˆLS ÝÑ8 N `, σ pr q. 8--8.4
Correlation methods

Ideal prediction error estimator

    y(k) − ŷ(k | k−1) = ε(k) = e(k)    (ideally).

The sequence of prediction errors, {e(k), k = 1, ...}, is white. If the estimator is optimal (θ = θ_0) then the prediction errors contain no further information about the process.

Another interpretation: the prediction errors, ε(k), are uncorrelated with the experimental data, Z^{k−1}.

Correlation methods

Approach

Select a sequence, ζ(k), derived from the past data, Z^{k−1}. Require that the error, ε(k, θ), is uncorrelated with ζ(k):

    (1/N) Σ_{k=1}^N ζ(k) ε(k, θ) = 0    (could also use a function, α(ε)).

We can view the identification problem as finding θ̂ such that this relationship is satisfied. The values, ζ(k), are known as instruments. Typically ζ(k) ∈ R^{d×n_y}, where θ ∈ R^d and y(k) ∈ R^{n_y}.
Correlation methods

Procedure

1. Choose a linear filter, F(z), for the prediction errors,
       ε_F(k, θ) = F(z) ε(k, θ)    (this is optional).
2. Choose a sequence of correlation vectors, ζ(k, Z^{k−1}, θ), constructed from the data (and possibly θ).
3. Choose a function α(ε) (default is α(ε) = ε).

Then θ̂ solves

    f(θ, Z^N) = (1/N) Σ_{k=1}^N ζ(k, θ) α(ε_F(k, θ)) = 0.

Pseudo-linear regressions

Regression-based one-step ahead predictors

For ARX, ARMAX, etc., model structures we can write the predictor as

    ŷ(k | θ) = φ^T(k, θ) θ.

We previously solved this via LS (or iterative LS, or optimisation) methods.

Correlation-based solution

    θ̂_PLR solves (1/N) Σ_{k=1}^N φ(k, θ) ( y(k) − φ^T(k, θ) θ ) = 0,

where the bracketed term is the prediction error. The prediction errors are orthogonal to the regressor, φ(k, θ).
Instrumental variable methods

Instrumental variables

    θ̂_IV solves (1/N) Σ_{k=1}^N ζ(k, θ) ( y(k) − φ^T(k) θ ) = 0.

This is solved by

    θ̂_IV = [ (1/N) Σ_{k=1}^N ζ(k) φ^T(k) ]^{−1} (1/N) Σ_{k=1}^N ζ(k) y(k).

So, for consistency we require

    E{ζ(k) φ^T(k)}  to be non-singular,  and  E{ζ(k) v(k)} = 0  (instruments uncorrelated with the noise).

Example: ARX model

    y(k) + a_1 y(k−1) + ··· + a_n y(k−n) = b_1 u(k−1) + ··· + b_m u(k−m) + v(k).

One approach: filtered input signals as instruments.

[Block diagram: y(k) = B(θ, z)/A(θ, z) u(k) + 1/A(θ, z) v(k); instrument generation x(k) = P(z)/Q(z) u(k).]

    x(k) + q_1 x(k−1) + ··· + q_n x(k−n) = p_1 u(k−1) + ··· + p_m u(k−m).
Instrumental variable example

[Block diagram: y(k) = B(θ, z)/A(θ, z) u(k) + 1/A(θ, z) v(k); x(k) = P(z)/Q(z) u(k).]

Here,

    ζ(k) = [ −x(k−1), ..., −x(k−n), u(k−1), ..., u(k−m) ]^T.

    R = (1/N) Σ_{k=1}^N ζ(k) φ^T(k)

is required to be invertible, and we also need

    E{ (1/N) Σ_{k=1}^N ζ(k) v(k) } = 0.

Instrumental variable example

Invertibility of R?

    y = [B(z)/A(z)] u + [1/A(z)] v,    x = [P(z)/Q(z)] u.

Substituting for x(k) and y(k) (signs and time shifts suppressed for clarity), ζ(k) φ^T(k) splits into a noise-free term involving only filtered versions of u, and a cross term between filtered u and filtered v:

    ζ(k) φ^T(k) = col( (P(z)/Q(z)) u_k , u_k ) · [ (B(z)/A(z)) u_k   u_k ]    ← invertible?
                + col( (P(z)/Q(z)) u_k , u_k ) · [ (1/A(z)) v_k   0 ]        ← vanishing? (→ 0 as N → ∞)
Instrumental variable example

    y = [B(z)/A(z)] u + [1/A(z)] v,    x = [P(z)/Q(z)] u.

    ζ(k) φ^T(k) = col( (P(z)/Q(z)) u_k , u_k ) · [ (B(z)/A(z)) u_k   u_k ]
                + col( (P(z)/Q(z)) u_k , u_k ) · [ (1/A(z)) v_k   0 ].

This will be invertible (in the limit) if:

- v(k) and u(k) are uncorrelated;
- u(k) and x(k) = [P(z)/Q(z)] u(k) are sufficiently exciting;
- there are no pole/zero cancellations between P(z)/Q(z) and B(z)/A(z).

Instrumental variable approach

A nonlinear estimation problem

[Block diagram: y(k) = B(θ, z)/A(θ, z) u(k) + 1/A(θ, z) v(k); x(k) = P(z)/Q(z) u(k).]

Choosing P(z) and Q(z)

The procedure works well when P(z) ≈ B(z) and Q(z) ≈ A(z).

Approach:
1. Estimate θ̂_LS via linear regression.
2. Select Q(z) = Â_LS(z) and P(z) = B̂_LS(z).
3. Calculate θ̂_IV.
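The two-step procedure above can be sketched end-to-end on a simulated first-order system (all numerical values are invented for the demonstration). With coloured equation noise the LS estimate of a is asymptotically biased; instruments x(k), generated by filtering u through the LS model, depend only on the input and restore consistency:

```python
import random

random.seed(2)

# First-order system with *coloured* equation noise:
#   y(k) = -a*y(k-1) + b*u(k-1) + v(k),  v(k) = e(k) + 0.8*e(k-1).
# Coloured v(k) correlates with y(k-1), so E{phi v} != 0 and LS is biased.
a_true, b_true = 0.7, 1.5
N = 50000
e_prev = 0.0
y, u = [0.0], [random.gauss(0, 1)]
for k in range(1, N):
    e = random.gauss(0, 0.5)
    y.append(-a_true * y[k - 1] + b_true * u[k - 1] + e + 0.8 * e_prev)
    u.append(random.gauss(0, 1))
    e_prev = e

def solve_2x2(R, f):
    """Solve R theta = f for a 2x2 system via the explicit inverse."""
    det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
    return [(R[1][1] * f[0] - R[0][1] * f[1]) / det,
            (-R[1][0] * f[0] + R[0][0] * f[1]) / det]

def estimate(zeta, phi, yk):
    """Correlation estimate: [sum zeta phi^T]^{-1} sum zeta y."""
    R = [[0.0, 0.0], [0.0, 0.0]]
    f = [0.0, 0.0]
    for z_k, p_k, y_k in zip(zeta, phi, yk):
        for i in range(2):
            f[i] += z_k[i] * y_k
            for j in range(2):
                R[i][j] += z_k[i] * p_k[j]
    return solve_2x2(R, f)

phi = [[-y[k - 1], u[k - 1]] for k in range(1, N)]
yk = y[1:]

# Step 1: least squares (zeta = phi). Biased here because v is coloured.
a_ls, b_ls = estimate(phi, phi, yk)

# Step 2: instruments from the input only, filtered through the LS model:
#   x(k) = -a_ls*x(k-1) + b_ls*u(k-1),  i.e. Q = A_LS, P = B_LS.
x = [0.0]
for k in range(1, N):
    x.append(-a_ls * x[k - 1] + b_ls * u[k - 1])
zeta = [[-x[k - 1], u[k - 1]] for k in range(1, N)]

# Step 3: instrumental variable estimate.
a_iv, b_iv = estimate(zeta, phi, yk)

print("LS :", a_ls, b_ls)   # a noticeably biased away from 0.7
print("IV :", a_iv, b_iv)   # close to (0.7, 1.5)
```

The same `estimate` routine computes both estimators; only the choice of ζ(k) changes, which is exactly the point of the correlation framework.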
Instrumental variable approach

Considerations

Variance and MSE depend on the choice of instruments.

Consistency (asymptotic unbiasedness) is lost if:
- the noise and the instruments are correlated (for example, in closed loop, when generating instruments from u);
- the model order selection is incorrect;
- the filter dynamics cancel plant dynamics;
- the true system is not in the model set.

Closed-loop approaches: generate the instruments from the external excitation, r.

Bibliography

Prediction error minimisation:
Lennart Ljung, System Identification: Theory for the User, 2nd Ed., Prentice-Hall, 1999 [sections 7.1, 7.2 & 7.3].

Parameter estimation statistics:
Lennart Ljung, System Identification: Theory for the User, 2nd Ed., Prentice-Hall, 1999 [section 7.4].

Correlation and instrumental variable methods:
Lennart Ljung, System Identification: Theory for the User, 2nd Ed., Prentice-Hall, 1999 [sections 7.5 & 7.6].