Introduction to Sparsity in Signal Processing


Ivan Selesnick
Polytechnic Institute of New York University
Brooklyn, New York
selesi@poly.edu
2012

Under-determined linear equations

Consider an under-determined system of equations

    y = Ax    (1)

where A is an M x N matrix with M < N, y is the length-M vector [y(0), ..., y(M-1)]^T, and x is the length-N vector [x(0), ..., x(N-1)]^T. The system has more unknowns than equations; the matrix A is wider than it is tall. We assume A A^H is invertible, therefore the system of equations (1) has infinitely many solutions.

Norms

We will use the l2 and l1 norms:

    ||x||_2^2 := sum_{n=0}^{N-1} |x(n)|^2    (2)
    ||x||_1   := sum_{n=0}^{N-1} |x(n)|.     (3)

||x||_2^2, i.e. the sum of squares, is referred to as the energy of x.

Least squares

To solve y = Ax, it is common to minimize the energy of x:

    argmin_x ||x||_2^2 such that y = Ax.    (4)

Solution to (4):

    x = A^H (A A^H)^{-1} y.    (5)

When y is noisy, do not solve y = Ax exactly. Instead, find an approximate solution:

    argmin_x ||y - Ax||_2^2 + λ ||x||_2^2.    (6)

Solution to (6):

    x = (A^H A + λ I)^{-1} A^H y.    (7)

For large-scale systems, fast algorithms are needed.
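As a quick illustration (not from the slides), both closed-form solutions (5) and (7) can be computed directly with NumPy; here A is a random real matrix, so A^H reduces to A^T:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 10, 30                        # under-determined: more unknowns than equations
A = rng.standard_normal((M, N))
y = rng.standard_normal(M)

# Minimum-energy solution (5): x = A^T (A A^T)^{-1} y
x_ls = A.T @ np.linalg.solve(A @ A.T, y)
assert np.allclose(A @ x_ls, y)      # satisfies y = Ax exactly

# Regularized solution (7): x = (A^T A + lam I)^{-1} A^T y
lam = 0.1
x_reg = np.linalg.solve(A.T @ A + lam * np.eye(N), A.T @ y)
```

Since x_reg is the global minimizer of (6), its cost is no larger than that of any other vector, including x_ls.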

Sparse solutions

Another approach to solving y = Ax:

    argmin_x ||x||_1 such that y = Ax.    (8)

Problem (8) is the basis pursuit (BP) problem. When y is noisy, do not solve y = Ax exactly. Instead, find an approximate solution:

    argmin_x ||y - Ax||_2^2 + λ ||x||_1.    (9)

Problem (9) is the basis pursuit denoising (BPD) problem. The BP/BPD problems cannot be solved in explicit form, only by iterative numerical algorithms.

Least squares & BP/BPD

Least squares and BP/BPD solutions are quite different. Why? To minimize ||x||_2^2, the largest values of x must be made small, as they count much more than the smallest values. Least-squares solutions therefore have many small values, as they are relatively unimportant: least-squares solutions are not sparse.

[Figure: the penalty functions |x|^2 and |x|.]

Therefore, when it is known/expected that x is sparse, use the l1 norm, not the l2 norm.

Algorithms for sparse solutions

The cost function is: non-differentiable, convex, and large-scale.

Algorithms (matrix-free): ISTA, FISTA, SALSA (ADMM), and more...
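As a sketch of one such algorithm, ISTA for the BPD cost (9) alternates a gradient step on ||y - Ax||_2^2 with soft-thresholding (the implementation details below are my own, not from the slides):

```python
import numpy as np

def soft(x, t):
    """Soft-threshold: shrink each entry of x toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(y, A, lam, n_iter=200):
    """ISTA for  min_x ||y - Ax||_2^2 + lam * ||x||_1  (problem (9))."""
    alpha = np.linalg.norm(A, 2) ** 2        # alpha >= max eigenvalue of A^T A
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft(x + (A.T @ (y - A @ x)) / alpha, lam / (2 * alpha))
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 50))
x_true = np.zeros(50); x_true[[4, 17, 33]] = [3.0, -2.0, 1.5]
y = A @ x_true + 0.01 * rng.standard_normal(20)
x_hat = ista(y, A, lam=0.1)              # sparse estimate of x_true
```

Each iteration needs only products with A and A^T, which is what makes the method matrix-free.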

Parseval frames

If A satisfies

    A A^H = p I,    (10)

then the columns of A are said to form a Parseval frame. A can be rectangular. The least-squares solution (5) becomes

    x = A^H (A A^H)^{-1} y = (1/p) A^H y.    (11)

No matrix inversion is needed. The least-squares solution (7) becomes

    x = (A^H A + λ I)^{-1} A^H y = 1/(λ + p) A^H y    (12)

(use the matrix inverse lemma). Again, no matrix inversion is needed. If A satisfies (10), then it is very easy to find least-squares solutions. Some algorithms for BP/BPD also become computationally easier.

Example: Sparse Fourier coefficients using BP

The Fourier transform tells how to write a signal as a sum of sinusoids, but it is not the only way. Basis pursuit gives a sparse spectrum. Suppose the M-point signal y(m) is written as

    y(m) = sum_{n=0}^{N-1} c(n) exp(j (2π/N) m n),    0 <= m <= M-1,    (13)

where c(n) is a length-N coefficient sequence, with M <= N. In matrix form,

    y = Ac    (14)

    A_{m,n} = exp(j (2π/N) m n),    0 <= m <= M-1,  0 <= n <= N-1,    (15)

with c of length N. The coefficients c(n) are frequency-domain (Fourier) coefficients.

Example: Sparse Fourier coefficients using BP

1. If N = M, then A is the inverse N-point DFT matrix.

2. If N > M, then A consists of the first M rows of the inverse N-point DFT matrix. Both A and A^H can be implemented efficiently using the FFT. For example, in Matlab, y = Ac is implemented as:

    function y = A(c, M, N)
        v = N * ifft(c);
        y = v(1:M);
    end

Similarly, c = A^H y can be obtained by zero-padding and computing the DFT. In Matlab:

    function c = AT(y, M, N)
        c = fft([y; zeros(N-M, 1)]);
    end

This enables matrix-free algorithms.

3. Due to the orthogonality properties of complex sinusoids,

    A A^H = N I_M.    (16)
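For readers working in Python, the same operators can be written with NumPy's FFT (a sketch; the function names A_op/AH_op are my own), together with a numerical check of (16):

```python
import numpy as np

def A_op(c, M, N):
    """y = A c: inverse N-point DFT of c, truncated to the first M samples."""
    v = N * np.fft.ifft(c)       # N * ifft matches the sum in (13)
    return v[:M]

def AH_op(y, M, N):
    """c = A^H y: zero-pad y to length N, then take the N-point DFT."""
    return np.fft.fft(np.concatenate([y, np.zeros(N - len(y))]))

# Check (16): A A^H = N I_M, by applying A A^H to each standard basis vector
M, N = 8, 16
AAH = np.column_stack([A_op(AH_op(e, M, N), M, N) for e in np.eye(M)])
assert np.allclose(AAH, N * np.eye(M))
```

No M x N matrix is ever formed; both operators cost O(N log N), which is what makes matrix-free algorithms practical here.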

Example: Sparse Fourier coefficients using BP

When N = M, the coefficients c satisfying y = Ac are uniquely determined. When N > M, the coefficients c are not unique; any vector c satisfying y = Ac can be considered a valid set of coefficients. To find a particular solution we can minimize either ||c||_2^2 or ||c||_1.

Least squares:

    argmin_c ||c||_2^2 such that y = Ac.    (17)

Basis pursuit:

    argmin_c ||c||_1 such that y = Ac.    (18)

The two solutions can be quite different...

Example: Sparse Fourier coefficients using BP

[Figure: the signal (real and imaginary parts) vs. time (samples).]

Least-squares solution:

    c = A^H (A A^H)^{-1} y = (1/N) A^H y.

A^H y is computed by: 1. zero-padding the length-M signal y to length N; 2. computing its DFT.

BP solution: computed using the algorithm SALSA.

Example: Sparse Fourier coefficients using BP

[Figure: Fourier coefficients obtained by (A) the DFT, (B) the least-squares solution, (C) the basis pursuit solution.]

The BP solution does not exhibit the leakage phenomenon.

Example: Sparse Fourier coefficients using BP

[Figure: cost function history of the algorithm for the basis pursuit solution, vs. iteration.]

Example: Denoising using BPD

Digital LTI filters are often used for noise reduction (denoising). But if the noise and signal overlap in the frequency domain, or if the respective frequency bands are unknown, then LTI filters are difficult to apply. However, if the signal has sparse (or relatively sparse) Fourier coefficients, then BPD can be used for noise reduction.

Example: Denoising using BPD

Noisy speech signal y(m):

    y(m) = s(m) + w(m),    0 <= m <= M-1,  M = 500,    (19)

where s(m) is the noise-free speech signal and w(m) is the noise sequence.

[Figure: the noisy signal vs. time (samples), and (A) the Fourier coefficients (FFT) of the noisy signal vs. frequency (index).]

Example: Denoising using BPD

Assume the noise-free speech signal s(m) has a sparse set of Fourier coefficients:

    y = Ac + w

where y is the noisy speech signal (length M), A is the M x N DFT matrix (15), c is the sparse Fourier coefficient vector (length N), and w is the noise (length M). As y is noisy, find c by solving the least-squares problem

    argmin_c ||y - Ac||_2^2 + λ ||c||_2^2    (20)

or the basis pursuit denoising (BPD) problem

    argmin_c ||y - Ac||_2^2 + λ ||c||_1.    (21)

Once c is found, an estimate of the speech signal is given by ŝ = Ac.

Example: Denoising using BPD

Least-squares solution:

    c = (A^H A + λ I)^{-1} A^H y = 1/(λ + N) A^H y    (A A^H = N I)    (22)

using (12) and (16). The least-squares estimate of the speech signal is therefore

    ŝ = Ac = N/(λ + N) y    (least-squares solution).

But this is only a scaled version of the noisy signal! No real filtering is achieved.

Example: Denoising using BPD

BPD solution, obtained with the algorithm SALSA:

[Figure: (B) the Fourier coefficients of the BPD solution vs. frequency (index), and the BPD-denoised signal vs. time (samples).]

Effective noise reduction, unlike least squares!

Example: Deconvolution using BPD

If the signal of interest x(m) is not only noisy but is also distorted by an LTI system with impulse response h(m), then the available data y(m) is

    y(m) = (h * x)(m) + w(m),    (23)

where * denotes linear convolution and w(m) is additive noise. Given the observed data y, we aim to estimate the signal x. We assume the sequence h is known.

Example: Deconvolution using BPD

Let x have length N, h have length L, and y have length M = N + L - 1. Then

    y = Hx + w,    (24)

where H is the M x N convolution (Toeplitz) matrix, illustrated here for L = 3 and N = 4:

        [ h(0)                ]
        [ h(1) h(0)           ]
    H = [ h(2) h(1) h(0)      ]    (25)
        [      h(2) h(1) h(0) ]
        [           h(2) h(1) ]
        [                h(2) ]

H is of size M x N with M > N (because M = N + L - 1).
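A sketch of building H and checking it against NumPy's linear convolution (the helper name conv_matrix is my own):

```python
import numpy as np

def conv_matrix(h, N):
    """M x N Toeplitz matrix H of (25): H @ x equals the linear convolution h * x."""
    L = len(h)
    H = np.zeros((N + L - 1, N))
    for n in range(N):
        H[n:n + L, n] = h        # each column is a shifted copy of h
    return H

h = np.array([1.0, 1.0, 1.0, 1.0]) / 4   # 4-point moving average (next slide)
x = np.zeros(10); x[2], x[6] = 1.0, -0.5
H = conv_matrix(h, len(x))
assert H.shape == (len(x) + len(h) - 1, len(x))
assert np.allclose(H @ x, np.convolve(h, x))
```

In practice one would not form H explicitly; products with H and H^T can be computed by (flipped) convolution, keeping algorithms matrix-free.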

Example: Deconvolution using BPD

A sparse signal is convolved by the 4-point moving average filter

    h(n) = 1/4 for n = 0, 1, 2, 3;  0 otherwise.

[Figure: the sparse signal and the observed (convolved, noisy) signal.]

Example: Deconvolution using BPD

Due to the noise w, solve the least-squares problem

    argmin_x ||y - Hx||_2^2 + λ ||x||_2^2    (26)

or the basis pursuit denoising (BPD) problem

    argmin_x ||y - Hx||_2^2 + λ ||x||_1.    (27)

Least-squares solution:

    x = (H^T H + λ I)^{-1} H^T y.    (28)

Example: Deconvolution using BPD

[Figure: deconvolution results, the least-squares solution and the BPD solution.]

The BPD solution, obtained using SALSA, is more faithful to the original signal.

Example: Filling in missing samples using BP

Due to data transmission/acquisition errors, some signal samples may be lost. Filling in missing values is used for error concealment. Part of a signal or image may also be intentionally deleted (image editing, etc.); convincingly filling in missing values according to the surrounding area is called inpainting.

[Figure: an incomplete signal with 200 missing samples, vs. time (samples).]

Example: Filling in missing samples using BP

The incomplete signal y:

    y = Sx,    (29)

where x has length M, y has length K < M, and S is a selection (or "sampling") matrix of size K x M. For example, if only the first, second, and last elements of a 5-point signal x are observed, then the matrix S is given by:

        [ 1 0 0 0 0 ]
    S = [ 0 1 0 0 0 ].    (30)
        [ 0 0 0 0 1 ]

Problem: given y and S, find x such that y = Sx. This is an underdetermined system with infinitely many solutions. The least-squares and BP solutions are very different...

Example: Filling in missing samples using BP

Properties of S:

1.  S S^T = I,    (31)

where I is the K x K identity matrix. For example, with S in (30),

            [ 1 0 0 ]
    S S^T = [ 0 1 0 ].
            [ 0 0 1 ]

2.  S^T y sets the missing samples to zero. For example, with S in (30),

            [ 1 0 0 ]              [ y(0) ]
            [ 0 1 0 ]   [ y(0) ]   [ y(1) ]
    S^T y = [ 0 0 0 ] * [ y(1) ] = [  0   ].    (32)
            [ 0 0 0 ]   [ y(2) ]   [  0   ]
            [ 0 0 1 ]              [ y(2) ]
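Both properties are easy to verify numerically; a small NumPy sketch using the S of (30):

```python
import numpy as np

M = 5
keep = [0, 1, 4]                 # observe the 1st, 2nd, and last of 5 samples
S = np.eye(M)[keep]              # K x M selection matrix, as in (30)

assert np.allclose(S @ S.T, np.eye(len(keep)))   # property (31): S S^T = I

x = np.array([10., 20., 30., 40., 50.])
y = S @ x                        # y = Sx: the observed samples
z = S.T @ y                      # S^T y: missing samples set to zero, as in (32)
assert np.allclose(z, [10., 20., 0., 0., 50.])
```

Building S by selecting rows of the identity matrix makes both properties immediate.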

Example: Filling in missing samples using BP

Suppose x has a sparse representation with respect to A:

    x = Ac,    (33)

where c is a sparse vector of length N, and A is of size M x N with M <= N. The incomplete signal y can then be written as

    y = Sx      from (29)    (34a)
      = SAc     from (33).   (34b)

Therefore, if we can find a vector c satisfying

    y = SAc,    (35)

then we can create an estimate x̂ of x by setting

    x̂ = Ac.    (36)

From (35), S x̂ = y; that is, x̂ agrees with the known samples y.

Example: Filling in missing samples using BP

Note that y is shorter than the coefficient vector c, so there are infinitely many solutions to (35); any vector c satisfying y = SAc can be considered a valid set of coefficients. To find a particular solution, solve the least-squares problem

    argmin_c ||c||_2^2 such that y = SAc    (37)

or the basis pursuit problem

    argmin_c ||c||_1 such that y = SAc.    (38)

The two solutions are very different... Let us assume A satisfies

    A A^H = p I    (39)

for some positive real number p.

Example: Filling in missing samples using BP

The least-squares solution is

    c = (SA)^H ((SA)(SA)^H)^{-1} y    (40)
      = A^H S^T (S A A^H S^T)^{-1} y    (41)
      = A^H S^T (p S S^T)^{-1} y      using (39)    (42)
      = A^H S^T (p I)^{-1} y          using (31)    (43)
      = (1/p) A^H S^T y.    (44)

Hence, the least-squares estimate x̂ is given by

    x̂ = Ac    (45)
       = (1/p) A A^H S^T y    using (44)    (46)
       = S^T y                using (39).    (47)

This estimate consists of setting all the missing values to zero! See (32). No real estimation of the missing values has been achieved. The least-squares solution is of no use here.
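The derivation above can be checked numerically; the sketch below uses the dense DFT frame of (15), for which p = N (dimensions and signal are my own illustrative choices):

```python
import numpy as np

# DFT frame A of (15), with A A^H = N I (so p = N), and a selection matrix S
M, N = 8, 16
A = np.exp(2j * np.pi * np.outer(np.arange(M), np.arange(N)) / N)
keep = [0, 1, 2, 5, 6, 7]            # samples 3 and 4 are missing
S = np.eye(M)[keep]

x = np.cos(2 * np.pi * 2 * np.arange(M) / N)    # some test signal
y = S @ x                            # the observed (incomplete) samples

B = S @ A                            # the K x N matrix in (35)
c = B.conj().T @ np.linalg.solve(B @ B.conj().T, y)   # least-squares coefficients (40)
x_hat = A @ c                        # estimate (45)
assert np.allclose(x_hat, S.T @ y)   # equals S^T y: missing samples are just zeros (47)
```

As predicted by (47), the minimum-norm coefficients reproduce the known samples and put zeros everywhere else.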

Example: Filling in missing samples using BP

Short segments of speech can be sparsely represented using the DFT; therefore we set A equal to the M x N DFT matrix (15) with N = 1024. The BP solution was obtained using 100 iterations of SALSA:

[Figure: the estimated coefficients vs. frequency (DFT index), and the estimated signal vs. time (samples).]

The missing samples have been filled in quite accurately.

Total Variation Denoising/Deconvolution

A signal is observed in additive noise: y = x + w. Total variation denoising is suitable when it is known that the signal of interest has a sparse derivative:

    argmin_x ||y - x||_2^2 + λ sum_{n=2}^{N} |x(n) - x(n-1)|.    (48)

The total variation of x is defined as

    TV(x) := sum_{n=2}^{N} |x(n) - x(n-1)| = ||Dx||_1,    (49)

where

        [ -1  1             ]
    D = [    -1  1          ]
        [         ...       ]
        [            -1  1  ]

is the first-difference matrix of size (N-1) x N.
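As a small sketch (illustrative only), D and TV(x) are easy to form directly; on a piecewise-constant signal, ||Dx||_1 simply sums the jump heights:

```python
import numpy as np

def diff_matrix(N):
    """(N-1) x N first-difference matrix D of (49): row n computes x(n+1) - x(n)."""
    return np.diff(np.eye(N), axis=0)

x = np.array([0., 0., 3., 3., 3., 1., 1.])   # piecewise constant: two jumps
D = diff_matrix(len(x))
tv = np.sum(np.abs(D @ x))                   # TV(x) = ||Dx||_1
assert np.isclose(tv, 5.0)                   # |3 - 0| + |1 - 3| = 3 + 2
```

Dx is sparse exactly where x is piecewise constant, which is why the l1 penalty on Dx preserves edges.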

Total Variation Denoising/Deconvolution

[Figure: the noise-free signal, the noisy signal, two L2-filtered signals (λ = 3.0 and λ = 8.0), and the TV-filtered signal (λ = 3.0).]

TV filtering preserves edges/jumps more accurately than least-squares filtering.

Polynomial Smoothing of Time Series with Additive Step Discontinuities

[Figure: (a) noisy data, (b) calculated TV component, (c) TV-compensated data, (d) estimated signal.]

Observed noisy data: a low-frequency signal with additive step-wise discontinuities (steps/jumps). Goal: simultaneous smoothing and change/jump detection. Conventional low-pass filtering blurs discontinuities.

Polynomial Smoothing of Time Series with Additive Step Discontinuities

Variational formulation: given noisy data y, find x (the step signal) and p (the local polynomial signal) by minimization:

    argmin_{a,x} λ sum_{n=2}^{N} |x(n) - x(n-1)| + sum_{n=1}^{N} |y(n) - p(n) - x(n)|^2,

where

    p(n) = a_1 n + ... + a_d n^d.

The cost function is non-differentiable. We use variable splitting and ADMM to derive an efficient iterative algorithm.

Polynomial Smoothing of Time Series with Additive Step Discontinuities

Nano-particle detection: the Whispering Gallery Mode (WGM) biosensor, designed to detect individual nano-particles in real time, is based on detecting discrete changes in the resonance frequency of a microspherical cavity. The adsorption of a nano-particle onto the surface of a custom-designed microsphere produces a theoretically predicted but minute change in the optical properties of the sphere [Arnold, 2003].

[Figure: measured wavelength shift (fm) vs. time (seconds).]

Polynomial Smoothing of Time Series with Additive Step Discontinuities

Nano-particle detection:

[Figure: the measured wavelength shift (fm), the estimated step signal x(t), and its derivative diff(x(t)), vs. time (seconds).]

Reference: "Polynomial Smoothing of Time Series with Additive Step Discontinuities," I. Selesnick, S. Arnold, V. Dantham. Submitted to IEEE Trans. on Signal Processing.

Summary

1. Basis pursuit
2. Basis pursuit denoising
3. Sparse Fourier coefficients using BP
4. Denoising using BPD
5. Deconvolution using BPD
6. Filling in missing samples using BP
7. Total variation filtering
8. Simultaneous least-squares & sparsity