Parameter Estimation in a Moving Horizon Perspective

State and Parameter Estimation in Dynamical Systems
Reglerteknik, ISY, Linköpings Universitet

Outline
1. Problem Formulation
2. State Estimation with Sparse Process Disturbances in Linear Systems
3. Parameter and State Estimation in Unknown (Linear) Systems

Model
x(t+1) = f(x(t), u(t), w(t), θ)
y(t) = h(x(t), u(t), θ) + e(t)
Problem: Measure y(t) and u(t), t = 1, ..., N. Find x(t) and θ.
Linear Case:
f(x(t), u(t), w(t), θ) = A(θ)x(t) + B(θ)u(t) + w(t)
h(x(t), u(t), θ) = C(θ)x(t)
Known System Case: θ is a known vector.
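
To fix notation, here is a minimal simulation sketch of the linear case in Python/numpy. The matrices A, B, C and the noise levels are illustrative placeholders of my own, not the examples used later in the talk.

import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.7]])   # A(theta), illustrative values
B = np.array([[0.0], [1.0]])             # B(theta)
C = np.array([[1.0, 0.0]])               # C(theta)
N = 100
x = np.zeros((N + 1, 2))                 # state trajectory
y = np.zeros(N)                          # measurements y(t)
u = np.ones(N)                           # known input u(t)
for t in range(N):
    e = 0.1 * rng.standard_normal()      # measurement noise e(t)
    w = 0.1 * rng.standard_normal(2)     # process disturbance w(t)
    y[t] = (C @ x[t])[0] + e
    x[t + 1] = A @ x[t] + B[:, 0] * u[t] + w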

Maximum Likelihood
View Θ = [θ, x(t), t = 1, ..., N] as unknown parameters. Assume e(t) ~ N(0, I). Then the negative log-likelihood function is
V(θ, x(·)) = Σ_{t=1}^{N} ||y(t) − h(x(t), u(t), θ)||²
Too many parameters! Regularize!

Change of Parameterization
Do a (nonlinear) change of parameters: view Θ = [θ, x(1), w(1), ..., w(N−1)] = [θ, w(·)] as the new set of parameters, with
x(k) = f(x(k−1), u(k−1), w(k−1), θ) = x(k, Θ)
[The ML estimate is unaffected by a change of parameters!]
The negative log-likelihood function for Θ is
V(Θ) = Σ_{t=1}^{N} ||y(t) − h(x(t, Θ), u(t), θ)||²
This is to be minimized with respect to Θ = [θ, w(·)].
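
The point of the reparameterization is that the whole state trajectory is determined by θ, x(1) and the disturbance sequence w(·). A small sketch of x(k, Θ); the callable f, the sequences u and w, and theta are placeholders for whatever model is at hand.

import numpy as np

def states_from_disturbances(f, x1, u, w, theta):
    # x(k) = f(x(k-1), u(k-1), w(k-1), theta), k = 2, ..., N, with x(1) given.
    # w holds w(1), ..., w(N-1); u holds (at least) u(1), ..., u(N-1).
    x = [np.asarray(x1, dtype=float)]
    for k in range(len(w)):
        x.append(np.asarray(f(x[-1], u[k], w[k], theta)))
    return np.array(x)                   # the trajectory x(1), ..., x(N)

# e.g. the linear case: f = lambda x, u, w, th: th["A"] @ x + th["B"][:, 0] * u + w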

Regularization
With regularization:
W(Θ) = Σ_{t=1}^{N} ||y(t) − h(x(t, Θ), u(t), θ)||² + λ R(Θ)
Choices of regularization:
R(Θ) = Σ_t ||w(t)||²   [Tikhonov]
or
R(Θ) = Σ_t ||w(t)||   [sum-of-norms]
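
The two choices, written as plain functions of the stacked disturbance sequence (rows of W hold w(1), ..., w(N−1)); the stacking convention is my own, introduced only for illustration.

import numpy as np

def tikhonov(W):
    # sum of squared norms, sum_t ||w(t)||^2: shrinks every w(t) a little
    return float(np.sum(np.linalg.norm(W, axis=1) ** 2))

def sum_of_norms(W):
    # sum of unsquared norms, sum_t ||w(t)||: favors many w(t) exactly zero
    return float(np.sum(np.linalg.norm(W, axis=1)))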

Classical Interpretation
Regularization curbs the flexibility of (large) model sets by pulling the parameters toward the origin.
Tikhonov: regularization for the bias-variance trade-off.
Sum-of-norms: regularization for sparsity: solutions with many w(t) = 0 are favored.

Bayesian Interpretation
Suppose w(t) ~ N(0, I) and θ is a random vector, θ ~ N(0, cI) (dim θ = d). Then the joint pdf of θ, y(·), w(·) satisfies
−2 log P(y(·), w(·), θ) = Σ_{t=1}^{N} [ ||y(t) − h(x(t, w(·)), u(t), θ)||² + ||w(t)||² ] + ||θ||²/c + const
x(t, w(·)) = f(x(t−1, w(·)), u(t−1), w(t−1), θ)
so the MAP estimate of Θ is
ˆΘ = arg min W(Θ) + ||θ||²/c
which for c → ∞ is the same as the Tikhonov-regularized ML estimate of Θ.

Outline
2. State Estimation with Sparse Process Disturbances in Linear Systems
3. Parameter and State Estimation in Unknown (Linear) Systems

Linear System with Sparse Process Disturbances
x(t+1) = Ax(t) + Bu(t) + w(t)
y(t) = Cx(t) + e(t)
Here e is white measurement noise and w is a process disturbance. In many applications w is mostly zero and strikes, w(t*) ≠ 0, only occasionally.
Examples of applications:
- Control: load disturbances
- Tracking: sudden maneuvers
- FDI: additive system faults
- Recursive identification (x = parameters): model segmentation

Approaches
Find the jump times t* (where w(t*) ≠ 0) and/or the smoothed state estimates ˆx_s(t|N), t = 1, ..., N. Common methods:
- Treat the jump w(t*) ≠ 0 at an unknown time: view t* and w(t*) as unknown parameters and estimate them (Willsky-Jones GLR).
- Set the process noise variance to a small number and use Kalman smoothing to estimate x (and w(t)); see the sketch after this list.
- Branch the KF at each time instant: jump / no jump. Prune or merge trajectories (IMM).
- It is a nonlinear filtering problem (linear dynamics but non-Gaussian noise), so try particle filtering.
All methods require some design variables that reflect the trade-off between measurement-noise sensitivity and jump alertness.
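
A plain numpy sketch of the second approach in the list: a Kalman filter followed by a Rauch-Tung-Striebel smoothing pass, with a small user-chosen process-noise covariance Q. This is a generic textbook implementation under my own argument conventions, not code from the talk.

import numpy as np

def rts_smoother(y, A, C, Q, R, x0, P0):
    # Smoothed estimates xhat_s(t|N) for x(t+1) = A x(t) + w(t), y(t) = C x(t) + e(t),
    # with cov(w) = Q (set small to stay alert to rare jumps) and cov(e) = R.
    N, n = len(y), A.shape[0]
    xf = np.zeros((N, n)); Pf = np.zeros((N, n, n))   # filtered
    xp = np.zeros((N, n)); Pp = np.zeros((N, n, n))   # one-step predicted
    x, P = np.asarray(x0, dtype=float), np.asarray(P0, dtype=float)
    for t in range(N):
        xp[t], Pp[t] = x, P
        K = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)  # measurement update
        x = x + K @ (np.atleast_1d(y[t]) - C @ x)
        P = P - K @ C @ P
        xf[t], Pf[t] = x, P
        x, P = A @ x, A @ P @ A.T + Q                 # time update
    xs = xf.copy()
    for t in range(N - 2, -1, -1):                    # backward RTS pass
        G = Pf[t] @ A.T @ np.linalg.inv(Pp[t + 1])
        xs[t] = xf[t] + G @ (xs[t + 1] - xp[t + 1])
    return xs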

More on Willsky-Jones GLR
For one jump at time t*, estimate t* and w(t*) as parameters.
x(t+1) = Ax(t) + Bu(t) + w(t);  y(t) = Cx(t) + e(t)
If t* is known, estimating w(t*) is a simple LS problem, since x(t) is a linear function of w(t*):
min_{w(t*)} Σ_t ||y(t) − Cx(t)||²
Using the variance of the estimate, the significance of the jump size can be decided by a χ² test. The time of the most significant jump is the t* that minimizes
min_{t*} min_{w(t*)} Σ_t ||y(t) − Cx(t)||²
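
A brute-force sketch of this idea (not Willsky and Jones' recursive GLR implementation, and the χ² significance test is omitted): scan every candidate jump time t*, solve the LS problem for w(t*), and keep the time giving the smallest residual cost. Assumes x(1) = 0, u ≡ 0, scalar measurements and a single jump.

import numpy as np

def glr_scan(y, A, C):
    N, n = len(y), A.shape[0]
    base = float(np.sum(y ** 2))                 # cost with no jump (x stays 0)
    best_cost, best_t, best_w = np.inf, None, None
    for tstar in range(N - 1):
        # for t > tstar: x(t) = A^(t-tstar-1) w(tstar), so y(t) ~ C A^(t-tstar-1) w(tstar)
        rows, Ak = [], np.eye(n)
        for t in range(tstar + 1, N):
            rows.append((C @ Ak).ravel())
            Ak = A @ Ak
        Phi = np.array(rows)                     # regressor for w(tstar)
        w_hat, *_ = np.linalg.lstsq(Phi, y[tstar + 1:], rcond=None)
        cost = base - float(np.sum(y[tstar + 1:] ** 2)) \
                    + float(np.sum((y[tstar + 1:] - Phi @ w_hat) ** 2))
        if cost < best_cost:
            best_cost, best_t, best_w = cost, tstar, w_hat
    return best_t, best_w, best_cost             # most significant single jump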

Willsky-Jones as a Constrained Optimization Problem
Can be written as
min_{w(k), k=1,...,N−1} Σ_{t=1}^{N} ||y(t) − Cx(t)||²   s.t. ||W||_0 = 1,  W = [||w(1)||, ..., ||w(N−1)||]
such that x(t+1) = Ax(t) + Bu(t) + w(t), x(1) = 0.
k jumps: the same problem with ||W||_0 = k.
Adjustable number of jumps:
min_{w(k), k=1,...,N−1} Σ_{t=1}^{N} ||y(t) − Cx(t)||² + λ||W||_0

Do the ℓ1 Trick! (ℓ0 → ℓ1)
This problem is computationally forbidding, so relax the ℓ0 norm:
min_{w(k), k=1,...,N−1} Σ_{t=1}^{N} ||y(t) − Cx(t)||² + λ||W||_1
= min_{w(k), k=1,...,N−1} Σ_{t=1}^{N} ||y(t) − Cx(t)||² + λ Σ_t ||w(t)||   [StateSON]
This is our moving horizon state estimation problem with sum-of-norms regularization. Choice of λ: ...
Ohlsson, Gustafsson, Ljung, Boyd: Smoothed state estimates under abrupt changes using sum-of-norms regularization. Automatica 48(4):595-605, April 2012.
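
A minimal sketch of the StateSON problem using cvxpy as a generic convex solver; this is my own illustration, not the implementation from the Automatica paper. Here y holds scalar measurements, lam is the design parameter λ, and x(1) = 0 as above.

import cvxpy as cp
import numpy as np

def stateson(y, A, C, lam):
    N, n = len(y), A.shape[0]
    x = cp.Variable((N, n))                       # states x(1), ..., x(N)
    w = cp.Variable((N - 1, n))                   # disturbances w(1), ..., w(N-1)
    fit = cp.sum_squares(y - (x @ C.T)[:, 0])     # sum_t ||y(t) - C x(t)||^2
    reg = cp.sum(cp.norm(w, 2, axis=1))           # sum_t ||w(t)||  (sum-of-norms)
    constraints = [x[1:] == x[:-1] @ A.T + w,     # x(t+1) = A x(t) + w(t)
                   x[0] == 0]
    cp.Problem(cp.Minimize(fit + lam * reg), constraints).solve()
    return x.value, w.value

Replacing reg by cp.sum_squares(w) gives the Tikhonov (Kalman-smoothing-like) alternative; with the sum-of-norms term most rows of w come out exactly zero, which is what identifies the jump times.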

How Does it Work?
DC motor with impulse disturbances at t = 49, 55. State RMSE over 500 realizations as a function of time t, 0 ≤ t ≤ 100.
Figure: RMSE versus time. Dashed blue: Willsky-Jones; solid green: StateSON.

Varying SNRs
Same system. Jump probability µ = 0.015. Varying SNR: Q = jump size, R_e = measurement noise variance. For each SNR, the RMSE is averaged over 500 MC runs. Many different approaches are compared.
Figure: MSE versus (Q/R_e)^{1/2} (log-log). Compared methods: Clairvoyant, CUSUM, StateSON, WJ, PF, Kalman, IMM.

Conclusions: Sparse State Estimation
Solving the (moving horizon) state estimation problem with sum-of-norms (ℓ1) regularization is a good way to handle sparse process noise. Performance is at least as good as that of more complicated (hypothesis-testing) routines.

State and Parameter Estimation
New problem: no longer assume that the parameter vector θ is known. How do we also estimate the parameter θ in the system description?

State and Parameter Estimation
Recall:
Θ = [θ, x(1), w(1), ..., w(N−1)] = [θ, w(·)]
x(k) = f(x(k−1), u(k−1), w(k−1), θ) = x(k, Θ)
min_Θ Σ_{t=1}^{N} [ ||y(t) − h(x(t, Θ), u(t), θ)||² + ||w(t)||² ]
This is (a) ML/MAP joint estimate of the states and the parameter vector. Equivalently, view it as minimization over
Θ = [θ, x(1), x(2), ..., x(N)] = [θ, x(·)]
with w(k−1, x(·)) defined implicitly by x(k) = f(x(k−1), u(k−1), w(k−1, x(·)), θ).

Joint State and Parameter Estimate
min_Θ V(θ, x(·)),
V(θ, x(·)) = Σ_{t=1}^{N} [ ||y(t) − h(x(t), u(t), θ)||² + ||w(t, x)||² ]   (= −2 log P(Y, X) up to constants)
[ˆθ_J, x_s(t, ˆθ_J)] = ˆΘ = arg min_Θ V(θ, x(·))
x_s(t, θ) = the smoothed states for a given parameter θ.
The states are nuisance parameters in the estimation of θ.

Possible Parameter Estimates
ˆθ_J as above: the joint estimate with the states.
ˆθ_ML = arg max_θ P(Y|θ) = arg max_θ ∫ P(Y|X, θ) P(X|θ) dX
ˆθ_SEM = arg min_θ Σ_{t=1}^{N} ||y(t) − h(x_s(t, θ), u(t), θ)||²   ("smoothing error estimate")
ˆθ_J is conceptually simple to compute (in line with MPC), though it can require a lot of numerical work.
ˆθ_SEM sounds like a good idea: smoothing-error minimization should be better than prediction-error minimization.
ˆθ_ML has good credentials, but the ML criterion for nonlinear models involves solving the nonlinear filtering problem. (The marginalization with respect to X above is an extensive task.)

Marginalization: Picture
The integration will of course in general affect the maximum: the θ that maximizes the marginal P(Y|θ) need not coincide with the θ-component of the maximizer of the joint P(Y, X|θ).

One More Estimation Method: EM
When the likelihood function is difficult to form, it may be advantageous to extend the problem with latent variables that give a well-defined likelihood function, and then iterate between estimating these variables and the parameters. This is the EM algorithm, and in our case the states x can serve as the latent variables.
Take the expectation of V(θ, x(·)) under the assumption that x has been generated by the model with parameter value α:
Q(θ, α) = E[ V(θ, x(·)) | Y, θ = α ]
θ_k = arg min_θ Q(θ, θ_{k−1})
ˆθ_EM = lim_{k→∞} θ_k   [= ˆθ_ML?]
How much work is required to form Q(θ, α)?
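
For concreteness, a self-contained sketch of EM for the scalar special case used later, x(t+1) = a x(t) + w(t), y(t) = x(t) + e(t), with w, e ~ N(0, 1) and θ = a. The E-step is a scalar Kalman filter plus RTS smoother (with the lag-one covariances the M-step needs). This is a plain textbook version, not the code behind the later figures; the prior on x(1) is an assumed N(0, 1).

import numpy as np

def e_step(y, a):
    N = len(y)
    mf = np.zeros(N); Pf = np.zeros(N)            # filtered mean / variance
    mp = np.zeros(N); Pp = np.zeros(N)            # predicted mean / variance
    m, P = 0.0, 1.0                               # assumed prior on x(1)
    for t in range(N):
        mp[t], Pp[t] = m, P
        K = P / (P + 1.0)                         # measurement noise variance = 1
        m, P = m + K * (y[t] - m), (1.0 - K) * P
        mf[t], Pf[t] = m, P
        m, P = a * m, a * a * P + 1.0             # process noise variance = 1
    ms, Ps, G = mf.copy(), Pf.copy(), np.zeros(N)
    for t in range(N - 2, -1, -1):                # RTS backward pass
        G[t] = Pf[t] * a / Pp[t + 1]
        ms[t] = mf[t] + G[t] * (ms[t + 1] - mp[t + 1])
        Ps[t] = Pf[t] + G[t] ** 2 * (Ps[t + 1] - Pp[t + 1])
    Ex2 = ms ** 2 + Ps                            # E[x(t)^2 | Y]
    Exx = ms[1:] * ms[:-1] + G[:-1] * Ps[1:]      # E[x(t+1) x(t) | Y]
    return Ex2, Exx

def em(y, a0, iters=6):
    a = a0
    for _ in range(iters):                        # a_k = argmin_a Q(a, a_{k-1})
        Ex2, Exx = e_step(y, a)
        a = np.sum(Exx) / np.sum(Ex2[:-1])        # closed-form M-step for this model
    return a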

Linear Models
How are these estimates related, and are they any good?
3. Parameter and State Estimation in Unknown Linear Systems
(Joint discussions with Thomas Schön and David Törnqvist)
Linear model:
x(t+1) = A(θ)x(t) + B(θ)u(t) + w(t)
y(t) = C(θ)x(t) + e(t)
E w(t)wᵀ(t) = Q(θ),  E e(t)eᵀ(t) = R(θ)
Specialize to (without much loss of generality): u(t) ≡ 0, Q(θ) = I, R(θ) = I.

Notation
Xᵀ = [x(1)ᵀ x(2)ᵀ ... x(N)ᵀ];  Wᵀ, Eᵀ, and Yᵀ analogously.
F_θ is the block lower-triangular Toeplitz matrix
F_θ =
  [ I            0            0           ...  0 ]
  [ A(θ)         I            0           ...  0 ]
  [ A²(θ)        A(θ)         I           ...  0 ]
  [ ...                                           ]
  [ A^{N−1}(θ)   A^{N−2}(θ)   A^{N−3}(θ)  ...  I ]
H_θ = blockdiag(C(θ), C(θ), ..., C(θ))

Matrix Formulation
X = F_θ W,  Y = H_θ X + E
W and E are Gaussian random vectors, ~ N(0, I).
Y = H_θ F_θ W + E,  so Y ~ N(0, R_θ) with R_θ = H_θ F_θ F_θᵀ H_θᵀ + I
−2 log P(Y|θ) = Yᵀ R_θ⁻¹ Y + log det R_θ
−2 log P(Y|X, θ) = ||Y − H_θ X||²
In a Bayesian setting with θ ~ N(0, cI):
P(Y, X, θ) = P(Y|X, θ) P(X|θ) P(θ)
V(Y, X, θ) = −2 log P(Y, X, θ) = ||Y − H_θ X||² + ||F_θ⁻¹ X||² + ||θ||²/c

The Estimates
Joint criterion (c → ∞):
W(θ, X) = ||Y − H_θ X||² + ||F_θ⁻¹ X||²
Estimates:
State:    X_s(θ) = F_θ F_θᵀ H_θᵀ R_θ⁻¹ Y   (note Y − H_θ X_s(θ) = ... = R_θ⁻¹ Y)
Joint:    ˆθ_J = arg min_θ ||R_θ⁻¹ Y||² + ||F_θ⁻¹ F_θ F_θᵀ H_θᵀ R_θ⁻¹ Y||² = arg min_θ Yᵀ R_θ⁻¹ Y
Smoothed: ˆθ_SEM = arg min_θ ||R_θ⁻¹ Y||² = arg min_θ Yᵀ R_θ⁻² Y
ML:       ˆθ_ML = arg min_θ Yᵀ R_θ⁻¹ Y + log det R_θ
EM:       Q(θ, α) = ...
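
A quick numerical sanity check (my own, with a small N and arbitrary test matrices) of the identity Y − H_θ X_s(θ) = R_θ⁻¹ Y, building F_θ and H_θ as on the Notation slide:

import numpy as np

rng = np.random.default_rng(1)
n, N = 2, 5
A = np.array([[0.9, 0.2], [0.0, 0.7]])           # A(theta), arbitrary test values
C = np.array([[1.0, 0.0]])                       # C(theta)
F = np.zeros((N * n, N * n))                     # block lower-triangular Toeplitz
for i in range(N):
    for j in range(i + 1):
        F[i*n:(i+1)*n, j*n:(j+1)*n] = np.linalg.matrix_power(A, i - j)
H = np.kron(np.eye(N), C)                        # blockdiag(C, ..., C)
R = H @ F @ F.T @ H.T + np.eye(N)                # R_theta
Y = rng.standard_normal(N)
Xs = F @ F.T @ H.T @ np.linalg.inv(R) @ Y        # smoothed states X_s(theta)
print(np.allclose(Y - H @ Xs, np.linalg.inv(R) @ Y))   # prints True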

Expected Values of the Criteria
Let the true covariance matrix of Y be R_0 = E[YYᵀ]. The expected criteria are then:
ML:  trace(R_0 R_θ⁻¹) + log det R_θ
J:   trace(R_0 R_θ⁻¹)
SEM: trace(R_θ⁻¹ R_0 R_θ⁻¹)
Since trace(BA⁻¹) + log det A ≥ dim B + log det B, with equality iff all eigenvalues of BA⁻¹ equal 1 (i.e. A = B), the expected ML criterion is minimized exactly at R_θ = R_0.
So ML is consistent (but not the others!)

Numerical Illustration
The values that minimize the expected value of the criterion functions (= the limiting estimates as the number of observations tends to infinity) for the system
x(t+1) = a x(t) + w(t);  y(t) = x(t) + e(t);  a = 0.7
Figure: The minimizing values of the expected criterion functions as a function of N. Blue solid line: ML; green dash-dotted line: SEM; red dashed line: J.
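
The kind of computation behind this figure can be sketched in a few lines: for one horizon N, build R_θ for the scalar model, plug in the true R_0 (a_0 = 0.7), and minimize the three expected criteria over a grid of a. A grid search is used purely for simplicity; the grid and N below are my own choices.

import numpy as np

def R_matrix(a, N):
    # scalar case: F[t, k] = a^(t-k) for t >= k, C = 1, so R = F F' + I
    idx = np.arange(N)
    L = idx[:, None] - idx[None, :]
    F = np.where(L >= 0, a ** np.clip(L, 0, None), 0.0)
    return F @ F.T + np.eye(N)

N, a0 = 30, 0.7
R0 = R_matrix(a0, N)                              # true covariance of Y
grid = np.linspace(0.3, 0.99, 200)
crit = {"ML": [], "J": [], "SEM": []}
for a in grid:
    R = R_matrix(a, N)
    Rinv = np.linalg.inv(R)
    crit["J"].append(np.trace(R0 @ Rinv))
    crit["ML"].append(np.trace(R0 @ Rinv) + np.linalg.slogdet(R)[1])
    crit["SEM"].append(np.trace(Rinv @ R0 @ Rinv))
for name, vals in crit.items():
    print(name, "limiting estimate of a:", grid[int(np.argmin(vals))])

In line with the consistency statement above, the ML minimizer should land at the grid point nearest a_0 = 0.7, while the J and SEM minimizers settle elsewhere.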

EM Algorithm for this Simple Case
Figure: The estimates over the first six iterations of the EM algorithm for different initial guesses.

Some Observations
- ˆθ_J minimizes W(θ, X) jointly over (θ, X); ˆθ_ML minimizes the marginalized criterion −2 log ∫ e^{−W(θ,X)/2} dX.
- Concretely: ˆθ_J minimizes Yᵀ R_θ⁻¹ Y, while ˆθ_ML minimizes Yᵀ R_θ⁻¹ Y + log det R_θ.
- ˆθ_J is not consistent (but ˆθ_ML is, of course).
- ˆθ_J and ˆθ_ML thus handle W(θ, X) differently: joint maximization versus marginalization over X.
- The marginalization of W(θ, X) only adds a data-independent (regularization) term, log det R_θ.
- Is a similar result true also in the nonlinear case? How would EM work in the nonlinear case? (Schön, Wills, Ninness: Automatica 2011.)

Conclusions: State and Parameter Estimation
It is tempting to use MPC-thinking for model estimation with Moving Horizon Estimation: just minimize over θ as well. This, however, leads to inconsistent estimates. Can it be saved by thoughtful regularization?