Sparsity in system identification and data-driven control


Ivan Markovsky

This signal is not sparse in the "time domain".

But it is sparse in the "frequency domain": it is a weighted sum of six damped sines.

Problem: find a sparse representation (a small number of basis signals) — existence, representation, approximation.

System theory offers alternative methods, based on the low rank of the Hankel matrix

$$ \begin{bmatrix} y(1) & y(2) & y(3) & \cdots \\ y(2) & y(3) & y(4) & \cdots \\ y(3) & y(4) & y(5) & \cdots \\ \vdots & \vdots & \vdots & \\ y(\ell) & y(\ell+1) & y(\ell+2) & \cdots \end{bmatrix} $$
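
A minimal numerical sketch of this rank property (not from the slides; the signal parameters below are arbitrary): a sum of six damped sines gives a Hankel matrix of numerical rank 12, since each real damped sine contributes two complex exponentials.

```python
import numpy as np
from scipy.linalg import hankel

T = 200                              # number of samples
t = np.arange(T)
rng = np.random.default_rng(0)

# sum of six damped sines with arbitrary (hypothetical) parameters
y = sum(a * np.exp(-d * t) * np.sin(w * t + phi)
        for a, d, w, phi in zip(rng.uniform(0.5, 2, 6),
                                rng.uniform(0.001, 0.01, 6),
                                rng.uniform(0.1, 3, 6),
                                rng.uniform(0, np.pi, 6)))

# Hankel matrix with L rows: H[i, j] = y[i + j]
L = 20
H = hankel(y[:L], y[L - 1:])

# numerical rank = number of singular values above a tolerance
s = np.linalg.svd(H, compute_uv=False)
print("numerical rank:", np.sum(s > 1e-8 * s[0]))   # expected: 12
```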

Plan:
- Sparse signals and linear time-invariant systems
- System identification as sparse approximation
- Solution methods and generalizations

Sum-of-damped-exponentials signals are solutions of linear constant-coefficient difference equations:

y = α_1 exp_{z_1} + ... + α_n exp_{z_n},   where exp_z(t) := z^t

p_0 y + p_1 σy + ... + p_n σ^n y = 0,   where (σy)(t) := y(t + 1)

y = Cx, σx = Ax,   with state x(t) ∈ R^n

The solution set of a linear constant-coefficient difference equation is a linear time-invariant (LTI) system:

n-th order autonomous LTI system   B := { y = Cx | σx = Ax, x(0) ∈ R^n }

dim(B) = n is the complexity of B;   L_n denotes the class of LTI systems of order n

y ∈ B ∈ L_n is constrained/structured/sparse:
- belongs to an n-dimensional subspace
- is a linear combination of n signals
- is described by 2n parameters

We assume that a sparse representation exists, but we do not know the basis.

- classical definition of a sparse signal y: y has a few nonzero values (we don't know which ones); basis: the unit vectors
- y ∈ B ∈ L_n with n ≪ # of samples: y is a sum of a few damped sines (their frequencies and dampings are unknown); basis: damped complex exponentials

The assumption y ∈ B ∈ L_n makes ill-posed problems well-posed:

- noise filtering: given y = ȳ + ỹ, with ỹ noise, find the true value ȳ
- forecasting: given "past" samples ( y(−t+1), ..., y(0) ), find "future" samples ( y(1), ..., y(t) )
- missing data estimation: given the samples y(t), t ∈ T_given, find the missing samples y(t), t ∉ T_given

Noise filtering: given y = ȳ + ỹ, find ȳ, with prior knowledge ȳ ∈ B̄ ∈ L_n and ỹ ∼ N(0, νI).

Heuristic: smooth the data by a low-pass filter.

Optimal/Kalman filtering requires a model. The best (but unrealistic) option is to use the true system B̄.

Kalman filtering using an identified model B̂, i.e., using only the prior knowledge B̄ ∈ L_n.

Summary:
- the assumption y ∈ B ∈ L_n imposes sparsity
- the basis consists of damped exponentials with unknown dampings and frequencies
- y ∈ B ∈ L_n "regularizes" ill-posed problems

Plan:
- Sparse signals and linear time-invariant systems
- System identification as sparse approximation
- Solution methods and generalizations

System identification is an inverse problem:

- simulation B ↦ y: given a model B ∈ L_n and initial conditions, find the response y ∈ B
- identification y ↦ B: given a response y and the model class L_n, find a model B̂ ∈ L_n that fits y well

19 / 40 "fits well" is often defined in stochastic setting assumption y = ȳ + ỹ where ȳ B L n is the true signal ỹ N(0,νI) is noise (zero mean white Gaussian) maximum likelihood estimator minimize over ŷ and B y ŷ subject to ŷ B L n "The noise model is just an alibi for determining the cost function." L. Ljung

Example: monthly airline passenger data (1949–1960), fit by a 6th-order LTI model.

How well does a given model B fit the data y?

error(y, B) := min_{ŷ ∈ B} ‖y − ŷ‖
- the likelihood of y, given B
- the projection of y onto B
- the validation error

identification problem:   minimize over B̂ ∈ L_n   error(y, B̂)
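
A hedged sketch of how error(y, B) can be computed when B is given in state-space form (A, C); the function and variable names are mine. Every response of the autonomous system is ŷ = O x(0), where O is the extended observability matrix, so the misfit is the distance from y to the column span of O.

```python
import numpy as np

def misfit(y, A, C):
    """error(y, B) = min_{y_hat in B} ||y - y_hat||_2 for an autonomous
    LTI system B = { y = Cx, sigma x = Ax } with scalar output."""
    T, n = len(y), A.shape[0]
    # extended observability matrix O = [C; CA; ...; CA^{T-1}]
    O = np.zeros((T, n))
    Ak = np.eye(n)
    for t in range(T):
        O[t] = C @ Ak
        Ak = A @ Ak
    x0, *_ = np.linalg.lstsq(O, y, rcond=None)   # best initial condition
    return np.linalg.norm(y - O @ x0)            # distance of y to span(O)
```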

The link between system identification and sparse approximation is low rank:

y ∈ B ∈ L_n   ⟺   rank( H_{n+1}(y) ) ≤ n,

where H_{n+1}(y) is the Hankel structured matrix

$$ H_{n+1}(y) = \begin{bmatrix} y(1) & y(2) & \cdots & y(T-n) \\ y(2) & y(3) & \cdots & y(T-n+1) \\ \vdots & \vdots & & \vdots \\ y(n+1) & y(n+2) & \cdots & y(T) \end{bmatrix} $$

LTI system identification is equivalent to Hankel structured low-rank approximation:

minimize over ŷ and B̂   ‖y − ŷ‖   subject to ŷ ∈ B̂ ∈ L_n

⟺   minimize over ŷ   ‖y − ŷ‖   subject to rank( H_{n+1}(ŷ) ) ≤ n
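
One simple heuristic for this structured low-rank approximation problem, shown here only as an illustrative baseline (it is not the method advocated in the talk), is Cadzow's alternating projections: alternate a truncated-SVD rank reduction with a projection back onto Hankel structure by anti-diagonal averaging.

```python
import numpy as np
from scipy.linalg import hankel

def cadzow(y, n, L=None, iters=50):
    """Approximate y by y_hat with rank(H_L(y_hat)) <= n (L >= n + 1)."""
    T = len(y)
    L = L or n + 1
    yh = np.asarray(y, dtype=float).copy()
    for _ in range(iters):
        H = hankel(yh[:L], yh[L - 1:])
        # projection onto the set of rank-n matrices (truncated SVD)
        U, s, Vt = np.linalg.svd(H, full_matrices=False)
        Hn = (U[:, :n] * s[:n]) @ Vt[:n]
        # projection back onto Hankel structure: average the anti-diagonals
        yh = np.array([np.mean(np.diag(Hn[:, ::-1], k))
                       for k in range(Hn.shape[1] - 1, -Hn.shape[0], -1)])
    return yh
```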

Summary:
- system identification aims at a map y ↦ B̂
- the map is defined through an optimization problem
- equivalent problem: Hankel low-rank approximation (impose sparsity on the singular values)

Plan:
- Sparse signals and linear time-invariant systems
- System identification as sparse approximation
- Solution methods and generalizations

Three solution approaches:
- nuclear norm heuristic
- subspace methods
- local optimization

The nuclear norm heuristic induces sparsity on the singular values:

- rank: the number of nonzero singular values
- nuclear norm ‖·‖_*: the l_1-norm of the vector of singular values
- minimization of the nuclear norm tends to increase sparsity, i.e., reduce rank, and leads to a convex optimization problem

Nuclear norm minimization methods involve a hyper-parameter:

minimize over ŷ   ‖y − ŷ‖   subject to ‖H_{n+1}(ŷ)‖_* ≤ γ

or, equivalently,   minimize over ŷ   α ‖y − ŷ‖ + ‖H_{n+1}(ŷ)‖_*

γ (respectively α) determines the rank of H_{n+1}(ŷ); we want α_opt = max{ α : rank( H_{n+1}(ŷ) ) ≤ n }, and α_opt can be found by bisection.
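
A hedged CVXPY sketch of the regularized form above (the helper name and the solver choice are mine); sweeping α, e.g. by bisection, then selects the largest α for which rank(H_{n+1}(ŷ)) ≤ n.

```python
import cvxpy as cp
import numpy as np

def nuclear_norm_fit(y, n, alpha):
    """minimize  alpha * ||y - y_hat||_2 + ||H_{n+1}(y_hat)||_*"""
    T = len(y)
    yh = cp.Variable(T)
    # Hankel matrix H_{n+1}(y_hat), assembled row by row from the variable
    H = cp.vstack([yh[i:i + T - n] for i in range(n + 1)])
    cost = alpha * cp.norm(y - yh, 2) + cp.normNuc(H)
    # the nuclear norm requires an SDP-capable conic solver (e.g. SCS)
    cp.Problem(cp.Minimize(cost)).solve()
    return yh.value
```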

Originally, the subspace identification methods were developed for exact data:

- L_n: the class of LTI systems of order n
- state-space representation: B := { y = Cx | σx = Ax, x(0) ∈ R^n }
- exact identification problem y ↦ (A, C): given exact data y ∈ B ∈ L_n, find the model parameters (A, C)

Two-step solution method

1. rank-revealing factorization

$$ H_L(y) = \underbrace{\begin{bmatrix} C \\ CA \\ \vdots \\ CA^{L-1} \end{bmatrix}}_{O} \; \underbrace{\begin{bmatrix} x(0) & Ax(0) & A^2 x(0) & \cdots & A^{T-L} x(0) \end{bmatrix}}_{X} $$

2. shift equation

$$ \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{L-2} \end{bmatrix} A = \begin{bmatrix} CA \\ CA^2 \\ \vdots \\ CA^{L-1} \end{bmatrix} \quad\Longleftrightarrow\quad O(1{:}L-1, :)\, A = O(2{:}L, :) $$

T = 2n + 1 samples suffice, with L ∈ [n + 1, T − n].

For noisy data, the subspace methods perform steps 1 and 2 approximately, via unstructured low-rank approximation:

1. (truncated) singular value decomposition of H_L(y)
2. least-squares solution of the shift equation

L is a hyper-parameter that affects the estimated model B̂.
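
A hedged sketch of these two approximate steps for a scalar autonomous system (names are mine; L > n is the user-chosen hyper-parameter):

```python
import numpy as np
from scipy.linalg import hankel

def subspace_id(y, n, L):
    """Estimate (A, C) of an order-n autonomous LTI model from y."""
    H = hankel(y[:L], y[L - 1:])                  # L x (T - L + 1), L > n
    # step 1: truncated SVD in place of the exact rank-revealing factorization
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    O = U[:, :n] * s[:n]                          # approximate observability matrix
    C = O[0]                                      # first row (scalar output)
    # step 2: shift equation O(1:L-1,:) A = O(2:L,:), solved in least squares
    A, *_ = np.linalg.lstsq(O[:-1], O[1:], rcond=None)
    return A, C
```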

Local optimization using variable projections: a "double" optimization

min_{B ∈ L_n} ( min_{ŷ ∈ B} ‖y − ŷ‖ )

- the "inner" minimization is a projection: error(y, B) = ‖y − Π_B y‖, with Π_B the orthogonal projector onto B
- the "outer" minimization is over the model class: min_{B ∈ L_n} error(y, B)

Representation of an LTI system as the kernel of a polynomial operator:

p_0 y + p_1 σy + ... + p_n σ^n y = 0,   (σy)(t) := y(t + 1)

equivalently p(σ) y = 0, where p(z) = p_0 + p_1 z + ... + p_n z^n

model parameter: p = [ p_0  p_1  ...  p_n ]

Parameter optimization problem

optimization over a manifold:   min_{B ∈ L_n} error(y, B)   ⟺   min_{‖p‖ = 1} error(y, p)

[figure: the constraint ‖p‖ = 1 is the unit sphere in the parameter space]

optimization over Euclidean spaces: fix one entry of p to 1 via a permutation Π, i.e., p = Π [x; 1]
- Π fixed: total least-squares
- Π can be changed during the optimization
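
A hedged sketch of the variable-projection idea in the kernel parametrization p(σ)ŷ = 0 (names, the choice of a general-purpose optimizer, and the unit-norm handling are mine): for fixed p, the inner minimization is an orthogonal projection onto the row space of the banded matrix built from p; the outer problem over p is then passed to a generic solver.

```python
import numpy as np
from scipy.optimize import minimize

def vp_misfit(p, y):
    """error(y, B(p)) with B(p) = { y_hat : p(sigma) y_hat = 0 }."""
    n, T = len(p) - 1, len(y)
    # banded matrix Tp with (Tp y_hat)(t) = p_0 y_hat(t) + ... + p_n y_hat(t+n)
    Tp = np.zeros((T - n, T))
    for i in range(T - n):
        Tp[i, i:i + n + 1] = p
    # distance from y to ker(Tp): norm of the projection of y onto range(Tp')
    r = np.linalg.solve(Tp @ Tp.T, Tp @ y)
    return np.linalg.norm(Tp.T @ r)

def identify(y, n, p0=None):
    """Outer minimization over p on the unit sphere (scaling of p is irrelevant)."""
    p0 = p0 if p0 is not None else np.random.default_rng(0).standard_normal(n + 1)
    obj = lambda p: vp_misfit(p / np.linalg.norm(p), y)
    res = minimize(obj, p0, method="Nelder-Mead")
    return res.x / np.linalg.norm(res.x)
```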

Three generalizations:
- systems with inputs
- missing data estimation
- nonlinear system identification

Dealing with missing data:

minimize over ŷ   ‖y − ŷ‖_v   subject to rank( H_{n+1}(ŷ) ) ≤ n

weighted 2-norm approximation   ‖y − ŷ‖_v := sqrt( Σ_{k,t} v_k(t) ( y_k(t) − ŷ_k(t) )² ), with element-wise weights:

- v_k(t) ∈ (0, ∞) if y_k(t) is noisy → approximate y_k(t)
- v_k(t) = 0 if y_k(t) is missing → interpolate y_k(t)
- v_k(t) = ∞ if y_k(t) is exact → ŷ_k(t) = y_k(t)
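
A hedged sketch of the weighted misfit for a fixed model (A, C) (the slides' problem optimizes over the model as well; names are mine): zero weights drop the missing samples from the fit, so the returned ŷ interpolates them.

```python
import numpy as np

def weighted_fit(y, v, A, C):
    """Fit y by y_hat in B(A, C) in the weighted 2-norm; samples with
    v[t] == 0 (missing) are interpolated. v[t] = inf (exact data) would
    require an equality constraint and is not handled in this sketch."""
    T, n = len(y), A.shape[0]
    O = np.zeros((T, n))                      # extended observability matrix
    Ak = np.eye(n)
    for t in range(T):
        O[t] = C @ Ak
        Ak = A @ Ak
    w = np.sqrt(v)                            # row weights
    yz = np.where(v > 0, y, 0.0)              # value at missing samples is irrelevant
    x0, *_ = np.linalg.lstsq(O * w[:, None], w * yz, rcond=None)
    return O @ x0                             # y_hat, defined at all t
```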

Example: piecewise cubic interpolation vs. LTI identification on the "airline passenger data".

Conclusion:
- y is a response of an LTI system ⟹ y is sparse
- LTI identification ⟺ low-rank approximation
- solution methods: convex relaxation (nuclear norm), subspace (SVD + least squares), local optimization

DFT analysis suffers from "leakage". [figure: DFT magnitude plot]

Gridding the frequency axis and using l_1-norm minimization has limited resolution:

- given a signal y, select a "dictionary" Φ(t) = [ sin(ω_1 t)  ...  sin(ω_N t) ]
- minimize over a   ‖a‖_1   subject to y = Φa
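
A hedged CVXPY sketch of this gridding + l_1 approach (the grid and names are mine; the equality constraint assumes y is exactly representable on the chosen grid, and the grid spacing is precisely what limits the resolution):

```python
import cvxpy as cp
import numpy as np

def l1_on_grid(y, omegas):
    """Basis pursuit on a fixed frequency grid: min ||a||_1 s.t. y = Phi a."""
    t = np.arange(len(y))
    Phi = np.column_stack([np.sin(w * t) for w in omegas])   # dictionary
    a = cp.Variable(len(omegas))
    cp.Problem(cp.Minimize(cp.norm(a, 1)), [Phi @ a == y]).solve()
    return a.value
```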