Data-driven signal processing

1 / 35 Data-driven signal processing Ivan Markovsky

2 / 35 Modern signal processing is model-based
1. system identification: prior information fixes the model structure; identification data determines the parameter vector (the model)
2. model-based design: the design objective and the model determine the filter, which maps the to-be-filtered data to the filtered signal

3 / 35 Data-driven methods avoid modeling
model-based route: data collection → data → system identification → model → filter design → filter → filtered signal
data-driven route: data-driven filter design / data-driven filtering maps the collected data directly to the filtered signal

4 / 35 Combined modeling + design has benefits
- identification ignores the design objective
- the two-step approach is suboptimal
- we define and solve a direct problem: observed data + filtering objective → filtered signal

5 / 35 Plan
- Example: data-driven Kalman smoothing
- Generalization: missing data estimation
- Solution approach: matrix completion

6 / 35 Plan
- Example: data-driven Kalman smoothing
- Generalization: missing data estimation
- Solution approach: matrix completion

7 / 35 Dynamical system B is a set of signals w
w ∈ B ⟺ the signal w is a trajectory of the system B ⟺ B is an exact model for w ⟺ B is unfalsified by w
we consider linear time-invariant (LTI) systems; L denotes the LTI model class

8 / 35 Initial conditions = past of the signal w
w = w_p ∧ w_f (past w_p concatenated with future w_f)
[figure: the signal w plotted over time t, split into w_p and w_f]

9 / 35 Estimation in terms of trajectories
observer: given a model B and an exact trajectory w_f, find w_p such that w_p ∧ w_f ∈ B
smoother: given a model B and a noisy trajectory w_f,
  minimize ‖w_f − ŵ_f‖ subject to ŵ_p ∧ ŵ_f ∈ B   (MBS)

10 / 35 When does a trajectory w_d ∈ B specify B?
identifiability conditions:
1. w_d is persistently exciting of "sufficiently high order"
2. B is controllable
how to obtain B back from w_d? w_d ↦ B: choose the simplest exact model for w_d

11 / 35 The most powerful unfalsified model of w_d, B_mpum(w_d), is the data-generating system
complexity c(B) = (m, n): number of inputs m and number of states n
the most powerful unfalsified model:
  B_mpum(w_d) := arg min over B̂ ∈ L of c(B̂)   ("most powerful")
                 subject to w_d ∈ B̂            ("unfalsified model")
L_{m,n}: set of models with complexity bounded by (m, n)

12 / 35 Data-driven state estimation replaces the model B by a trajectory w_d ∈ B
observer: given trajectories w_d and w_f of B, find w_p such that w_p ∧ w_f ∈ B_mpum(w_d)
smoother: given noisy trajectories w_d and w_f of B and (m, ℓ),
  minimize ‖w_f − ŵ_f‖₂² (estimation error) + ‖w_d − ŵ_d‖₂² (identification error)   (DDS)
  subject to ŵ_p ∧ ŵ_f ∈ B_mpum(ŵ_d) ∈ L_{m,ℓ}

13 / 35 Classical approach: divide and conquer
1. identification: given w_d and (m, ℓ),
   minimize ‖w_d − ŵ_d‖ subject to B_mpum(ŵ_d) ∈ L_{m,ℓ}
2. model-based filtering: given w_f and B̂ := B_mpum(ŵ_d),
   minimize ‖w_f − ŵ_f‖ subject to ŵ_p ∧ ŵ_f ∈ B̂

14 / 35 Numerical example with Kalman smoothing
simulation setup:
- B̄ ∈ L_{1,2}: 2nd-order LTI system
- w_f = w̄_f + noise, where w̄_f ∈ B̄ is a step response
- w_d = w̄_d + noise, where w̄_d ∈ B̄
smoothing with known model: state-space solution; solution of (MBS)
smoothing with unknown model: identification + model-based design; solution of (DDS)

15 / 35 Smoothing with known model
state-space solution:
  minimize over x_ini and û_f  ‖ [u_f; y_f] − [0, I; O_T(A,C), T_T(H)] [x_ini; û_f] ‖   (SSS)
representation-free solution: (MBS) is a generalized least-squares problem
approximation error: e := ‖w_f − ŵ_f‖ / ‖w_f‖

method    (MBS)      (SSS)
error e   0.083653   0.083653
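To make (SSS) concrete, here is a minimal NumPy sketch for a SISO system given in standard state-space form, with B, C as 1-D arrays and D a scalar; it assembles the block matrix [0, I; O_T(A,C), T_T(H)] explicitly and solves the least-squares problem for (x_ini, û_f). This is a naive dense construction for exposition, not the implementation behind the numbers above.

```python
import numpy as np

def smooth_sss(A, B, C, D, u, y):
    """State-space smoother (SSS) for a SISO system: minimize over
    (x_ini, u_hat) the 2-norm of [u; y] - [0, I; O_T(A,C), T_T(H)] [x_ini; u_hat]."""
    T, n = len(y), A.shape[0]
    # extended observability matrix O_T(A, C): rows C A^t
    O = np.vstack([C @ np.linalg.matrix_power(A, t) for t in range(T)])
    # Markov parameters H(0) = D, H(k) = C A^(k-1) B
    h = [D] + [C @ np.linalg.matrix_power(A, k - 1) @ B for k in range(1, T)]
    Tm = np.zeros((T, T))          # lower-triangular Toeplitz T_T(H)
    for i in range(T):
        for j in range(i + 1):
            Tm[i, j] = h[i - j]
    M = np.block([[np.zeros((T, n)), np.eye(T)], [O, Tm]])
    z, *_ = np.linalg.lstsq(M, np.concatenate([u, y]), rcond=None)
    x_ini, u_hat = z[:n], z[n:]
    return u_hat, O @ x_ini + Tm @ u_hat   # smoothed trajectory (u_hat, y_hat)
```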

16 / 35 Smoothing with unknown model
classical approach: identification + (SSS)
data-driven approach: solution of (DDS) with local optimization
simulation result:

method    (MBS)      (DDS)      classical
error e   0.083653   0.087705   0.091948

17 / 35 Plan
- Example: data-driven Kalman smoothing
- Generalization: missing data estimation
- Solution approach: matrix completion

18 / 35 We aim to find the missing part of a trajectory
- missing data: interpolated from ŵ ∈ B̂
- exact data: kept fixed
- inexact / "noisy" data: approximated by minimizing ‖error‖₂²

19 / 35 Other examples fit in the same setting
legend: ? missing, E exact, N noisy; w = Π [u; y], u input, y output

example                       w_p   u_f   y_f
state estimation              ?     E     E
EIV Kalman smoothing          ?     N     N
classical Kalman smoothing    ?     E     N
simulation                    E     E     ?
partial realization           E     E     E/?
noisy realization             E     E     N/?
output tracking               E     ?     N

20 / 35 classical Kalman filter:
  minimize ‖y − ŷ‖ subject to w_p ∧ (u, ŷ) ∈ B

          past   future
  input   ?      u
  output  ?      y

output tracking control:
  minimize ‖y_ref − ŷ‖ (tracking error) subject to w_p ∧ (û, ŷ) ∈ B

          past   future
  input   u_p    ?
  output  y_p    y_ref

21 / 35 Weighted approximation criterion accounts for exact, missing, and noisy data
error vector e := w − ŵ,  ‖e‖_v := sqrt( Σ_t Σ_i v_i(t) e_i²(t) )

weight             used for          to                   by
v_i(t) = ∞         exact w_i(t)      interpolate w_i(t)   e_i(t) = 0
v_i(t) ∈ (0, ∞)    noisy w_i(t)      approximate w_i(t)   min e_i(t)
v_i(t) = 0         missing w_i(t)    fill in w_i(t)       ŵ ∈ B̂
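A small sketch of how the three weight regimes can be encoded; the function name and the NaN convention for missing samples are assumptions for illustration:

```python
import numpy as np

def weighted_error(w, w_hat, v):
    """Weighted seminorm ||w - w_hat||_v of the error e = w - w_hat:
    v = inf marks exact entries (the error must vanish), v = 0 marks
    missing entries (ignored, filled in from w_hat), and finite v > 0
    weighs noisy entries."""
    e = w - w_hat
    exact = np.isinf(v)
    assert np.allclose(e[exact], 0), "exact data must be interpolated"
    noisy = np.isfinite(v) & (v > 0)
    return np.sqrt(np.sum(v[noisy] * e[noisy] ** 2))
```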

22 / 35 Data-driven signal processing can be posed as a missing data estimation problem
  minimize ‖w_d − ŵ_d‖₂² + ‖w − ŵ‖_v²   (DD-SP)
  subject to ŵ ∈ B_mpum(ŵ_d) ∈ L_{m,ℓ}
the recovered missing values of ŵ are the desired result

23 / 35 Plan
- Example: data-driven Kalman smoothing
- Generalization: missing data estimation
- Solution approach: matrix completion

24 / 35 w ∈ B ⟺ Hankel matrix is low-rank
exact trajectory: w ∈ B ∈ L_{m,ℓ}
⟺ R_0 w(t) + R_1 w(t+1) + ··· + R_ℓ w(t+ℓ) = 0 for all t
⟺ the Hankel matrix

        [ w(1)    w(2)    ···  w(T−ℓ)   ]
        [ w(2)    w(3)    ···  w(T−ℓ+1) ]
H(w) := [ w(3)    w(4)    ···  w(T−ℓ+2) ]
        [   ⋮       ⋮             ⋮      ]
        [ w(ℓ+1)  w(ℓ+2)  ···  w(T)     ]

is rank deficient

25 / 35 relation at time t = 1:
  R_0 w(1) + R_1 w(2) + ··· + R_ℓ w(ℓ+1) = 0
in matrix form:
  [R_0 R_1 ··· R_ℓ] [w(1); w(2); …; w(ℓ+1)] = 0

25 / 35 relation at time t = 2:
  R_0 w(2) + R_1 w(3) + ··· + R_ℓ w(ℓ+2) = 0
in matrix form:
  [R_0 R_1 ··· R_ℓ] [w(2); w(3); …; w(ℓ+2)] = 0

25 / 35 relation at time t = T−ℓ:
  R_0 w(T−ℓ) + R_1 w(T−ℓ+1) + ··· + R_ℓ w(T) = 0
in matrix form:
  [R_0 R_1 ··· R_ℓ] [w(T−ℓ); w(T−ℓ+1); …; w(T)] = 0

26 / 35 relation for t = 1, …, T−ℓ:
  R_0 w(t) + R_1 w(t+1) + ··· + R_ℓ w(t+ℓ) = 0
in matrix form: R H(w) = 0, with R := [R_0 R_1 ··· R_ℓ] and H(w) the Hankel matrix of slide 24

27 / 35 w ∈ B ∈ L_{m,ℓ}
⟺ there is a full-row-rank R ∈ R^{(q−m) × q(ℓ+1)} such that R H(w) = 0
⟺ rank( H(w) ) ≤ qℓ + m, where q is the number of variables
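To make the rank condition concrete, a short NumPy sketch: build H(w) for an exact trajectory of a made-up 2nd-order SISO system (q = 2 variables, m = 1 input, lag ℓ = 2) and check that the q(ℓ+1) × (T−ℓ) Hankel matrix has rank qℓ + m = 5 rather than full row rank 6. The system matrices are hypothetical, chosen only for the demo.

```python
import numpy as np

def block_hankel(w, ell):
    """H(w): q(ell+1) x (T-ell) block-Hankel matrix of a trajectory w in R^(T x q)."""
    T, q = w.shape
    return np.vstack([w[i:i + T - ell].T for i in range(ell + 1)])

rng = np.random.default_rng(0)
A = np.array([[0.0, 1.0], [-0.5, 1.2]])          # stable 2nd-order system
B, C, D = np.array([0.0, 1.0]), np.array([1.0, 0.0]), 0.0
T = 50
u = rng.standard_normal(T)                        # persistently exciting input
y, x = np.zeros(T), np.zeros(2)
for t in range(T):
    y[t] = C @ x + D * u[t]
    x = A @ x + B * u[t]
w = np.column_stack([u, y])                       # trajectory w = (u, y), q = 2
H = block_hankel(w, ell=2)                        # q(ell+1) = 6 rows
print(np.linalg.matrix_rank(H))                   # prints 5 = q*ell + m < 6
```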

28 / 35 ŵ ∈ B_mpum(ŵ_d) ⟺ mosaic-Hankel matrix is rank deficient
ŵ ∈ B_mpum(ŵ_d) ∈ L_{m,ℓ}
⟺ ŵ_d ∈ B̂ ∈ L_{m,ℓ} and ŵ ∈ B̂
⟺ rank( H(ŵ_d, ŵ) ) ≤ qℓ + m, where H(ŵ_d, ŵ) := [H(ŵ_d) H(ŵ)]
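Reusing block_hankel from the sketch above, the mosaic-Hankel matrix is just a column-wise concatenation; here w_d and w stand for two trajectories of the same hypothetical system:

```python
# both trajectories satisfy the same kernel R, so the blocks share a left null space
H_mosaic = np.hstack([block_hankel(w_d, ell=2), block_hankel(w, ell=2)])
# for exact data: np.linalg.matrix_rank(H_mosaic) <= q*ell + m, as on the slide
```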

29 / 35 Data-driven signal processing ⟺ structured low-rank approximation
  minimize ‖w_d − ŵ_d‖₂² + ‖w − ŵ‖_v²
  subject to ŵ ∈ B_mpum(ŵ_d) ∈ L_{m,ℓ}
is equivalent to
  minimize ‖w − ŵ‖_v
  subject to rank( H(ŵ) ) ≤ r

30 / 35 Three main classes of solution methods
- local optimization
- nuclear norm relaxation
- subspace methods
considerations: generality; user-defined hyper-parameters; availability of efficient algorithms/software
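As a sketch of the second class, a CVXPY formulation of a nuclear-norm relaxation for a scalar (q = 1) Hankel structure; the penalized form and the parameter lam are illustrative choices, not the formulation used in the talk:

```python
import cvxpy as cp
import numpy as np

def slra_nuclear(w, ell, lam):
    """Convex surrogate for Hankel structured low-rank approximation:
    minimize ||w - w_hat||_2^2 + lam * ||H(w_hat)||_*
    (the nuclear norm ||.||_* relaxes the rank constraint)."""
    T = len(w)
    w_hat = cp.Variable(T)
    # Hankel matrix with ell+1 rows, a linear map of the variable w_hat
    H = cp.vstack([w_hat[i:i + T - ell] for i in range(ell + 1)])
    cost = cp.sum_squares(w - w_hat) + lam * cp.normNuc(H)
    cp.Problem(cp.Minimize(cost)).solve()
    return w_hat.value
```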

31 / 35 Local optimization using variable projections
kernel representation:
  min over full-row-rank R ( min over ŵ ‖w − ŵ‖ subject to R H(ŵ) = 0 )
variable projection (VARPRO): elimination of ŵ leads to
  minimize f(R) subject to R full row rank
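For fixed R, the inner minimization is an orthogonal projection: the constraint R H(ŵ) = 0 is linear in ŵ, i.e., G(R) vec(ŵ) = 0 for a block-banded G, giving the closed form f(R) = wᵀGᵀ(GGᵀ)⁻¹Gw. A minimal dense sketch follows; a serious implementation exploits the banded structure of G for fast evaluation, as slide 33 notes.

```python
import numpy as np

def varpro_cost(R, w, ell):
    """f(R) = min over w_hat of ||w - w_hat||_2^2 s.t. R H(w_hat) = 0.
    R: (p, q*(ell+1)) with p = q - m; w: (T, q)."""
    T, q = w.shape
    p = R.shape[0]
    # G(R) is block-banded: row block t applies [R_0 ... R_ell] to w(t..t+ell)
    G = np.zeros((p * (T - ell), q * T))
    for t in range(T - ell):
        G[p * t:p * (t + 1), q * t:q * (t + ell + 1)] = R
    g = G @ w.reshape(-1)              # residual of w w.r.t. the model of R
    lam = np.linalg.solve(G @ G.T, g)  # normal equations of the projection
    return g @ lam                     # = ||w - w_hat||_2^2 at the optimum
```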

32 / 35 Dealing with the "R full row rank" constraint
1. impose a quadratic equality constraint R Rᵀ = I
2. use specialized methods for optimization on a manifold
3. R full row rank ⟺ R Π = [X I] with Π a permutation
   - Π fixed → total least squares
   - Π changed during the optimization

33 / 35 Summary of the VARPRO approach
kernel representation, parameter optimization problem:
  min over ŵ and full-row-rank R ‖w − ŵ‖ subject to R H(ŵ) = 0
elimination of ŵ, optimization on a manifold:
  min over full-row-rank R f(R)
in the case of mosaic-Hankel H, f can be evaluated fast

34 / 35 Conclusion
- combined modeling and design problem (DD-SP)
- what we aim to find is a missing part of w ∈ B
- reformulation as weighted structured low-rank approximation

35 / 35 Future work
- statistical analysis
- computational efficiency / recursive computation
- other methods: subspace, convex relaxation, …