1 / 35  Data-driven signal processing
Ivan Markovsky
2 / 35  Modern signal processing is model-based
1. system identification: data and prior information (model structure) → model parameter vector
2. model-based design: model and design objective → filter, which maps the to-be-filtered signal to the filtered signal
3 / 35  Data-driven methods avoid modeling
classical route: data collection → system identification → model → filter design → filter → filtered signal
data-driven route: data → data-driven filtering → filtered signal
4 / 35  Combined modeling + design has benefits
identification ignores the design objective, so the two-step approach is suboptimal
instead, we define and solve a direct problem: observed data + filtering objective → filtered signal
5 / 35  Plan
- Example: data-driven Kalman smoothing
- Generalization: missing data estimation
- Solution approach: matrix completion
6 / 35  Plan
- Example: data-driven Kalman smoothing
- Generalization: missing data estimation
- Solution approach: matrix completion
7 / 35  Dynamical system B is a set of signals w
w ∈ B means: the signal w is a trajectory of the system B,
i.e., B is an exact model for w (B is unfalsified by w)
we consider linear time-invariant (LTI) systems; L is the LTI model class
8 / 35  Initial conditions: past of the signal
the signal is split into past and future, w = (w_p, w_f);
the past w_p specifies the initial conditions for the future w_f
9 / 35  Estimation in terms of trajectories
observer: given a model B and an exact trajectory w_f,
  find w_p such that (w_p, w_f) ∈ B
smoother: given a model B and a noisy trajectory w_f,
  minimize ‖w_f − ŵ_f‖ subject to (ŵ_p, ŵ_f) ∈ B   (MBS)
10 / 35  When does a trajectory w_d ∈ B specify B?
identifiability conditions:
1. w_d is persistently exciting of "sufficiently high order"
2. B is controllable
how to obtain B back from w_d (w_d ↦ B): choose the simplest exact model for w_d
11 / 35  The most powerful unfalsified model of w_d, B_mpum(w_d), is the data-generating system
complexity: c(B) = (m, n), with m the number of inputs and n the number of states
the most powerful unfalsified model:
  B_mpum(w_d) := arg min c(B̂)   ("most powerful")
  subject to w_d ∈ B̂ ∈ L        ("unfalsified model")
L_{m,n} is the set of LTI models with complexity bounded by (m, n)
12 / 35  Data-driven state estimation replaces the model B by a trajectory w_d ∈ B
observer: given trajectories w_d and w_f of B,
  find w_p such that (w_p, w_f) ∈ B_mpum(w_d)
smoother: given noisy trajectories w_d and w_f of B and a complexity bound (m, l),
  minimize ‖w_f − ŵ_f‖²₂ (estimation error) + ‖w_d − ŵ_d‖²₂ (identification error)
  subject to (ŵ_p, ŵ_f) ∈ B_mpum(ŵ_d) ∈ L_{m,l}   (DDS)
13 / 35  Classical approach: divide and conquer
1. identification: given w_d and (m, l),
   minimize ‖w_d − ŵ_d‖ subject to B_mpum(ŵ_d) ∈ L_{m,l}
2. model-based filtering: given w_f and B := B_mpum(ŵ_d),
   minimize ‖w_f − ŵ_f‖ subject to (ŵ_p, ŵ_f) ∈ B
14 / 35  Numerical example with Kalman smoothing
simulation setup:
- B ∈ L_{1,2}, a 2nd-order LTI system
- w̃_f = w_f + noise, with w_f ∈ B a step response
- w̃_d = w_d + noise, with w_d ∈ B
smoothing with known model: state-space solution; solution of (MBS)
smoothing with unknown model: identification + model-based design; solution of (DDS)
15 / 35  Smoothing with known model
state-space solution:
  minimize over x_ini and û_f
  ‖ [u_f; y_f] − [0, I; O_T(A,C), T_T(H)] [x_ini; û_f] ‖   (SSS)
(O_T(A,C) is the extended observability matrix; T_T(H) is the lower-triangular Toeplitz matrix of the impulse response H)
representation-free solution: (MBS) is a generalized least-squares problem
approximation error: e := ‖w_f − ŵ_f‖ / ‖w_f‖
method:   (MBS)      (SSS)
error e:  0.083653   0.083653
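The (SSS) problem above is an ordinary least-squares fit once the two matrix blocks are built. A minimal Python sketch, with an arbitrary stable 2nd-order SISO system (A, b, c, d) and noise level chosen here for illustration (not the slide's simulation setup):

```python
import numpy as np

rng = np.random.default_rng(1)

# an arbitrary stable 2nd-order SISO state-space model (illustrative)
A = np.array([[1.5, -0.7], [1.0, 0.0]])
b = np.array([1.0, 0.0])
c = np.array([0.5, 0.3])
d = 0.0
n, T = 2, 30

# exact step response and its noisy measurement w_f + noise
u = np.ones(T)
x = np.zeros(n); y = np.zeros(T)
for t in range(T):
    y[t] = c @ x + d * u[t]
    x = A @ x + b * u[t]
w  = np.concatenate([u, y])                  # exact w_f = (u_f, y_f)
wn = w + 0.05 * rng.standard_normal(2 * T)   # noisy trajectory

# blocks of the (SSS) regressor: extended observability matrix O_T(A, c)
# and lower-triangular Toeplitz matrix T_T(H) of the impulse response
O = np.array([c @ np.linalg.matrix_power(A, t) for t in range(T)])
h = np.array([d] + [c @ np.linalg.matrix_power(A, k) @ b for k in range(T - 1)])
Tt = np.zeros((T, T))
for t in range(T):
    Tt[t, :t + 1] = h[t::-1]

# stack [0 I; O Tt] and solve for (x_ini, u_hat) in least squares
M = np.block([[np.zeros((T, n)), np.eye(T)], [O, Tt]])
theta, *_ = np.linalg.lstsq(M, wn, rcond=None)
what = M @ theta                             # smoothed trajectory (u_hat, y_hat)

e = np.linalg.norm(w - what) / np.linalg.norm(w)
print("relative error:", e)
```

Since the least-squares solution projects the noisy trajectory onto the set of exact model trajectories, the smoothed error never exceeds the raw noise level.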
16 / 35  Smoothing with unknown model
classical approach: identification + (SSS)
data-driven approach: solution of (DDS) with local optimization
simulation result:
method:   (MBS)      (DDS)      classical
error e:  0.083653   0.087705   0.091948
17 / 35  Plan
- Example: data-driven Kalman smoothing
- Generalization: missing data estimation
- Solution approach: matrix completion
18 / 35  We aim to find the missing part of a trajectory
- missing data: interpolated from w ∈ B
- exact data: kept fixed
- inexact / "noisy" data: approximated by minimizing ‖error‖²
19 / 35  Other examples fit in the same setting
? missing, E exact, N noisy;  w = Π [u; y], u input, y output

example                      w_p   u_f   y_f
state estimation             ?     E     E
EIV Kalman smoothing         ?     N     N
classical Kalman smoothing   ?     E     N
simulation                   E     E     ?
partial realization          E     E     E/?
noisy realization            E     E     N/?
output tracking              E     ?     N
20 / 35
classical Kalman filter:
  minimize ‖y − ŷ‖ subject to (w_p, (u, ŷ)) ∈ B
  past: w_p missing;  future: input u, output y
output tracking control:
  minimize ‖y_ref − ŷ‖ (tracking error) subject to (w_p, (û, ŷ)) ∈ B
  past: input u_p, output y_p;  future: input missing, output y_ref
21 / 35  Weighted approximation criterion accounts for exact, missing, and noisy data
error vector: e := w − ŵ,  ‖e‖_v := Σ_t Σ_i v_i(t) e_i²(t)

weight              used for         to                   by
v_i(t) = ∞          exact w_i(t)     interpolate w_i(t)   e_i(t) = 0
v_i(t) ∈ (0, ∞)     noisy w_i(t)     approximate w_i(t)   minimizing e_i(t)
v_i(t) = 0          missing w_i(t)   fill in w_i(t)       ŵ ∈ B
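The three weight regimes can be encoded directly. A small sketch; the function name and the NaN convention for missing entries are choices of this note, not from the slides:

```python
import numpy as np

def weighted_error(w, what, v):
    """Squared weighted error sum_i v_i * e_i^2 with the convention:
    v = inf       -> w_i is exact: any mismatch makes the error infinite,
    v in (0, inf) -> w_i is noisy: weighted squared error,
    v = 0         -> w_i is missing (may be NaN in w): ignored."""
    e = np.where(v > 0, w - what, 0.0)       # missing entries contribute 0
    exact = np.isinf(v)
    if not np.allclose(e[exact], 0.0):
        return np.inf                        # an 'exact' entry is violated
    finite = (v > 0) & ~exact
    return float(np.sum(v[finite] * e[finite] ** 2))

w    = np.array([1.0, 2.0, np.nan])          # exact, noisy, missing
v    = np.array([np.inf, 1.0, 0.0])
what = np.array([1.0, 2.5, 7.0])             # a candidate approximation
print(weighted_error(w, what, v))            # only the noisy entry counts
```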
22 / 35  Data-driven signal processing can be posed as a missing data estimation problem
  minimize ‖w_d − ŵ_d‖²₂ + ‖w − ŵ‖²_v
  subject to ŵ ∈ B_mpum(ŵ_d) ∈ L_{m,l}   (DD-SP)
the recovered missing values of ŵ are the desired result
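In the exact-data special case (no noise, only exact and missing entries), the missing values can be recovered by linear algebra: with w_d persistently exciting, every length-L trajectory of B is a linear combination of the columns of a depth-L Hankel matrix built from w_d. A sketch under those assumptions; the example system, the names Tini/Tf, and the scenario (past samples and future input known, future output missing) are illustrative choices, not the slides' algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)

# simulate an arbitrary 2nd-order SISO system to get exact data w_d
def simulate(u):
    A = np.array([[1.5, -0.7], [1.0, 0.0]])
    b = np.array([1.0, 0.0]); c = np.array([0.5, 0.3]); d = 0.0
    x = np.zeros(2); y = np.zeros(len(u))
    for t in range(len(u)):
        y[t] = c @ x + d * u[t]
        x = A @ x + b * u[t]
    return y

ud = rng.standard_normal(100)          # persistently exciting input
wd = np.column_stack([ud, simulate(ud)])

Tini, Tf = 2, 5                        # past (initial conditions) and future
L, q = Tini + Tf, 2
# depth-L block-Hankel matrix: each column is a length-L trajectory segment
H = np.vstack([wd[i:len(wd) - L + 1 + i, :].T for i in range(L)])

# a fresh trajectory of the same system; its future output is 'missing'
u = rng.standard_normal(L)
y = simulate(u)
w = np.column_stack([u, y])

known = [i * q + v for i in range(Tini) for v in range(q)] \
      + [i * q for i in range(Tini, L)]          # past u, y and future u
miss  = [i * q + 1 for i in range(Tini, L)]      # future y (to fill in)

# match the known entries with a combination of columns, read off the rest
g, *_ = np.linalg.lstsq(H[known, :], w.reshape(-1)[known], rcond=None)
y_f = H[miss, :] @ g
print(np.allclose(y_f, y[Tini:], atol=1e-6))
```

With noisy data the same idea leads to the weighted low-rank approximation problems discussed next.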
23 / 35  Plan
- Example: data-driven Kalman smoothing
- Generalization: missing data estimation
- Solution approach: matrix completion
24 / 35  w ∈ B ⟹ Hankel matrix is low-rank
exact trajectory: w ∈ B ∈ L_{m,l} means
  R_0 w(t) + R_1 w(t+1) + ... + R_l w(t+l) = 0
so the Hankel matrix

  H(w) := [ w(1)    w(2)    ...  w(T−l)
            w(2)    w(3)    ...  w(T−l+1)
            w(3)    w(4)    ...  w(T−l+2)
            ...
            w(l+1)  w(l+2)  ...  w(T)    ]

is rank deficient
25 / 35  relation at time t = 1
  R_0 w(1) + R_1 w(2) + ... + R_l w(l+1) = 0
in matrix form:
  [R_0 R_1 ... R_l] [w(1); w(2); ...; w(l+1)] = 0
the same relation holds at t = 2, ..., T − l

26 / 35  relation for t = 1, ..., T − l
  R_0 w(t) + R_1 w(t+1) + ... + R_l w(t+l) = 0
in matrix form: R H(w) = 0, with R := [R_0 R_1 ... R_l] and

  H(w) = [ w(1)    w(2)    ...  w(T−l)
           w(2)    w(3)    ...  w(T−l+1)
           w(3)    w(4)    ...  w(T−l+2)
           ...
           w(l+1)  w(l+2)  ...  w(T)    ]
27 / 35  w ∈ B ∈ L_{m,l}
⟺ there is R ∈ ℝ^{(q−m) × q(l+1)}, full row rank, such that R H(w) = 0
⟺ rank( H(w) ) ≤ q l + m,  where q is the number of variables
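The rank bound can be checked numerically. A sketch with an arbitrary 2nd-order SISO example (q = 2 variables, m = 1 input, lag l = 2), so that rank H(w) ≤ q·l + m = 5 while H(w) has q(l+1) = 6 rows:

```python
import numpy as np

def block_hankel(w, ell):
    """Block-Hankel matrix H(w) with ell+1 block rows; w is T x q."""
    T, q = w.shape
    cols = T - ell
    return np.vstack([w[i:i + cols, :].T for i in range(ell + 1)])

# exact trajectory of y(t) = 1.5 y(t-1) - 0.7 y(t-2) + u(t-1)
rng = np.random.default_rng(0)
T = 50
u = rng.standard_normal(T)
y = np.zeros(T)
for t in range(2, T):
    y[t] = 1.5 * y[t - 1] - 0.7 * y[t - 2] + u[t - 1]
w = np.column_stack([u, y])      # q = 2 variables, m = 1 input

H = block_hankel(w, ell=2)       # 6 rows, 48 columns
print(H.shape, np.linalg.matrix_rank(H))
```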
28 / 35  ŵ ∈ B_mpum(ŵ_d) ⟺ mosaic-Hankel matrix is rank deficient
ŵ ∈ B_mpum(ŵ_d) ∈ L_{m,l}
⟺ ŵ_d ∈ B̂ ∈ L_{m,l} and ŵ ∈ B̂
⟺ rank( [H(ŵ_d)  H(ŵ)] ) ≤ q l + m
the matrix H(ŵ_d, ŵ) := [H(ŵ_d)  H(ŵ)] is mosaic-Hankel
29 / 35  Data-driven signal processing ⟺ structured low-rank approximation
  minimize ‖w_d − ŵ_d‖²₂ + ‖w − ŵ‖²_v
  subject to ŵ ∈ B_mpum(ŵ_d) ∈ L_{m,l}
is equivalent to
  minimize ‖w − ŵ‖_v subject to rank( H(ŵ) ) ≤ r
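For contrast, unstructured rank-r approximation (truncated SVD, by the Eckart-Young theorem) solves the problem without the Hankel constraint and with uniform weights; the result is generally not Hankel, so it is only a baseline or an initialization for the structured methods on the next slide. A minimal sketch:

```python
import numpy as np

def lra(M, r):
    """Optimal unstructured rank-r approximation of M in the Frobenius
    norm, via truncated SVD; the result is generally NOT Hankel."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

rng = np.random.default_rng(4)
M = rng.standard_normal((6, 48))     # stand-in for a noisy Hankel matrix
Mr = lra(M, r=5)
print(np.linalg.matrix_rank(Mr), np.linalg.norm(M - Mr))
```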
30 / 35  Three main classes of solution methods
- local optimization
- nuclear norm relaxation
- subspace methods
considerations: generality, user-defined hyperparameters, availability of efficient algorithms/software
31 / 35  Local optimization using variable projection
kernel representation:
  min over R full row rank ( min over ŵ ‖w − ŵ‖ subject to R H(ŵ) = 0 )
variable projection (VARPRO): elimination of ŵ leads to
  minimize f(R) subject to R full row rank
32 / 35  Dealing with the "R full row rank" constraint
1. impose a quadratic equality constraint R Rᵀ = I
2. use specialized methods for optimization on a manifold
3. R full row rank ⟺ R Π = [X I] for some permutation Π
   - Π fixed: total least squares
   - Π can be changed during the optimization
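A sketch of variable projection with option 3 and Π fixed (R = [X 1], scalar kernel): the inner minimum f(R) = min_ŵ ‖w − ŵ‖² subject to R H(ŵ) = 0 has a closed form via the matrix G(R) that represents the linear map ŵ ↦ R H(ŵ), and X is then optimized with a general-purpose local method. The example system and the scipy optimizer are illustrative choices, not the slides' algorithm:

```python
import numpy as np
from scipy.optimize import minimize

q, m, ell = 2, 1, 2          # 2 variables, 1 input, lag 2

def G_matrix(R, T):
    """Banded matrix G(R) with R H(w) = 0  <=>  G(R) vec(w) = 0,
    where vec(w) stacks w(1), ..., w(T), q entries at a time."""
    G = np.zeros((T - ell, q * T))
    for t in range(T - ell):
        G[t, t * q:(t + ell + 1) * q] = R
    return G

def misfit(X, w):
    """f(R) = min_what ||w - what||^2 s.t. R H(what) = 0, with
    R = [X 1] (permutation fixed); what is the orthogonal projection
    of w onto ker G(R), so f(R) = w' G' (G G')^{-1} G w."""
    R = np.append(X, 1.0)
    G = G_matrix(R, len(w) // q)
    Gw = G @ w
    return Gw @ np.linalg.solve(G @ G.T, Gw)

# noisy trajectory of y(t) = 1.5 y(t-1) - 0.7 y(t-2) + u(t-1)
rng = np.random.default_rng(3)
T = 40
u = rng.standard_normal(T); y = np.zeros(T)
for t in range(2, T):
    y[t] = 1.5 * y[t - 1] - 0.7 * y[t - 2] + u[t - 1]
w = np.column_stack([u, y]).reshape(-1) + 0.05 * rng.standard_normal(2 * T)

X0 = rng.standard_normal(5)          # random initial kernel parameters
res = minimize(misfit, X0, args=(w,), method="Nelder-Mead",
               options={"maxiter": 2000})
print(misfit(X0, w), "->", res.fun)  # local optimization reduces f(R)
```

Fixing the last entry of R to 1 keeps G(R) full row rank, so the linear solve inside f is always well posed.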
33 / 35  Summary of the VARPRO approach
kernel representation → parameter optimization problem:
  min over ŵ and R full row rank ‖w − ŵ‖ subject to R H(ŵ) = 0
elimination of ŵ → optimization on a manifold:
  min over R full row rank f(R)
in the case of mosaic-Hankel H, f can be evaluated fast
34 / 35  Conclusion
- combined modeling and design problem (DD-SP)
- what we aim to find is a missing part of a trajectory w ∈ B
- reformulation as weighted structured low-rank approximation
35 / 35  Future work
- statistical analysis
- computational efficiency / recursive computation
- other methods: subspace, convex relaxation, ...