Sparse linear models

Similar documents
Sparse linear models and denoising

Multiresolution Analysis

Lecture Notes 5: Multiresolution Analysis

Overview. Optimization-Based Data Analysis. Carlos Fernandez-Granda

Applied Machine Learning for Biomedical Engineering. Enrico Grisan

Multiresolution image processing

Sparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda

Introduction to Compressed Sensing

Introduction to Wavelet. Based on A. Mukherjee s lecture notes

Digital Image Processing

Sparse analysis Lecture III: Dictionary geometry and greedy algorithms

Introduction to Discrete-Time Wavelet Transform

Towards a Mathematical Theory of Super-resolution

Super-resolution via Convex Programming

Compressed Sensing and Neural Networks

EE123 Digital Signal Processing

An Introduction to Sparse Approximation

An Introduction to Wavelets and some Applications

Digital Image Processing Lectures 15 & 16

Sparsity in Underdetermined Systems

Sparsifying Transform Learning for Compressed Sensing MRI

Multiresolution analysis

Digital Image Processing

Sparse & Redundant Signal Representation, and its Role in Image Processing

INTRODUCTION TO. Adapted from CS474/674 Prof. George Bebis Department of Computer Science & Engineering University of Nevada (UNR)

Scientific Computing: An Introductory Survey

2D Wavelets. Hints on advanced Concepts

Machine Learning for Signal Processing Sparse and Overcomplete Representations

Wavelets in Pattern Recognition

Recent developments on sparse representation

2.3. Clustering or vector quantization 57

Wavelets and multiresolution representations. Time meets frequency

EE 381V: Large Scale Optimization Fall Lecture 24 April 11

SGN Advanced Signal Processing Project bonus: Sparse model estimation

Wavelets and Multiresolution Processing

Niklas Grip, Department of Mathematics, Luleå University of Technology. Last update:

Wavelets, Filter Banks and Multiresolution Signal Processing

SPARSE signal representations have gained popularity in recent

An Overview of Sparsity with Applications to Compression, Restoration, and Inverse Problems

Compressed Sensing: Extending CLEAN and NNLS

Introduction to Wavelets and Wavelet Transforms

ECG782: Multidimensional Digital Signal Processing

Sparse Time-Frequency Transforms and Applications.

Machine Learning for Signal Processing Sparse and Overcomplete Representations. Bhiksha Raj (slides from Sourish Chaudhuri) Oct 22, 2013

Wavelet Footprints: Theory, Algorithms, and Applications

Structured matrix factorizations. Example: Eigenfaces

LEARNING OVERCOMPLETE SPARSIFYING TRANSFORMS FOR SIGNAL PROCESSING. Saiprasad Ravishankar and Yoram Bresler

Image Noise: Detection, Measurement and Removal Techniques. Zhifei Zhang

Wavelet Transform. Figure 1: Non stationary signal f(t) = sin(100 t 2 ).

Digital Image Processing

MIT 9.520/6.860, Fall 2017 Statistical Learning Theory and Applications. Class 19: Data Representation by Design

Multiresolution schemes

Bayesian Paradigm. Maximum A Posteriori Estimation

Introduction to Signal Processing

Support Detection in Super-resolution

Review: Learning Bimodal Structures in Audio-Visual Data

Strengthened Sobolev inequalities for a random subspace of functions

Wavelets and Signal Processing

ECE472/572 - Lecture 13. Roadmap. Questions. Wavelets and Multiresolution Processing 11/15/11

Optimization-based sparse recovery: Compressed sensing vs. super-resolution

Designing Information Devices and Systems I Discussion 13B

Tutorial: Sparse Signal Processing Part 1: Sparse Signal Representation. Pier Luigi Dragotti Imperial College London

( nonlinear constraints)

MLISP: Machine Learning in Signal Processing Spring Lecture 10 May 11

Sensing systems limited by constraints: physical size, time, cost, energy

Transforms and Orthogonal Bases

Multiresolution schemes

Sparse Solutions of Systems of Equations and Sparse Modelling of Signals and Images

Translation-Invariant Shrinkage/Thresholding of Group Sparse. Signals

Sparse Optimization Lecture: Basic Sparse Optimization Models

Lecture 16: Multiresolution Image Analysis

Linear Algebra and Dirac Notation, Pt. 1

MULTIRATE DIGITAL SIGNAL PROCESSING

Lecture Notes 9: Constrained Optimization

Random projections. 1 Introduction. 2 Dimensionality reduction. Lecture notes 5 February 29, 2016

Morphological Diversity and Source Separation

Physics 342 Lecture 2. Linear Algebra I. Lecture 2. Physics 342 Quantum Mechanics I

Sparse Approximation and Variable Selection

Jim Lambers ENERGY 281 Spring Quarter Lecture 5 Notes

Introduction to Sparsity in Signal Processing

Module 4 MULTI- RESOLUTION ANALYSIS. Version 2 ECE IIT, Kharagpur

Invariant Scattering Convolution Networks

Inverse problems and sparse models (6/6) Rémi Gribonval INRIA Rennes - Bretagne Atlantique, France.

Ch. 15 Wavelet-Based Compression

Sparse Recovery Beyond Compressed Sensing

Multiscale Image Transforms

Constrained optimization

Image Processing by the Curvelet Transform

1. Fourier Transform (Continuous time) A finite energy signal is a signal f(t) for which. f(t) 2 dt < Scalar product: f(t)g(t)dt

c 1998 Society for Industrial and Applied Mathematics

EE67I Multimedia Communication Systems

Wavelets. Introduction and Applications for Economic Time Series. Dag Björnberg. U.U.D.M. Project Report 2017:20

Multiple Change Point Detection by Sparse Parameter Estimation

Wavelets For Computer Graphics

A powerful method for feature extraction and compression of electronic nose responses

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28

Digital Image Processing Lectures 13 & 14

Signal Recovery, Uncertainty Relations, and Minkowski Dimension

Multiscale Geometric Analysis: Thoughts and Applications (a summary)

A Review of Sparsity-Based Methods for Analysing Radar Returns from Helicopter Rotor Blades

Transcription:

Sparse linear models Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 2/22/2016

Introduction Linear transforms Frequency representation Short-time Fourier transform (STFT) Wavelets Overcomplete sparse models Denoising The denoising problem Thresholding Synthesis model Analysis model

Linear model A linear model for a signal x R n is a representation of the form x = m c i φ i i=1 {φ 1,..., φ m } is a family of atoms in R n c R m is an alternative representation or transform of x

Sparse linear model A sparse linear model contains a small number of coefficients x = i S c i φ i S m How do we choose a transform that sparsifies a class of signals? Intuition / Domain knowledge (this lecture) Learning it from the data (later on)

Why? Signals of interest (speech, natural images, biomedical activity, etc.) are often highly structured Sparse linear models are able to exploit this structure to enhance data analysis and processing Applications: Compression Denoising Inverse problems

Sparse representation in an orthonormal basis If the atoms {φ 1,..., φ n } form an orthonormal basis, then U := [ ] φ 1 φ 2 φ n UU T = I Coefficients are obtained by computing inner products with the atoms x = UU T x = m φ i, x φ i i=1

Sparse representation in a basis If the atoms {φ 1,..., φ n } form a basis, then B := [ φ 1 φ 2 φ n ] BB 1 = I Coefficients are obtained by computing inner products with dual atoms x = BB 1 x = m θ i, x φ i i=1 where θ T 1 B 1 θ = T 2 θ T n

Overcomplete dictionaries If the atoms {φ 1,..., φ m } are linearly independent and m > n D := [ φ 1 φ 2 φ m ] Two alternative sparse models 1. Synthesis sparse model x = Dc Problem: Given x find a sparse c 2. Analysis sparse model: where c is sparse D T x is sparse Both are equivalent if D is an orthonormal basis

Introduction Linear transforms Frequency representation Short-time Fourier transform (STFT) Wavelets Overcomplete sparse models Denoising The denoising problem Thresholding Synthesis model Analysis model

Introduction Linear transforms Frequency representation Short-time Fourier transform (STFT) Wavelets Overcomplete sparse models Denoising The denoising problem Thresholding Synthesis model Analysis model

Fourier series Fourier series coefficients of f : [0, 1] C Fourier series S n (t) := c k := n k= n 1 0 f (t) e i2πkt dt c k e i2πkt = n k= n φ k, f φ k Sinusoidal atoms φ k (t) = e i2πkt = cos (2πkt) + i sin (2πkt) Orthonormal basis of L 2 under the usual inner product lim n f (t) S n (t) = 0 for all f L 2

Discrete Fourier transform (DFT) Discretized frequency representation for vectors in C n Sinusoidal atoms are a basis for C n φ k = 1 n 1 e i2πk n e i2πk2 n e i2πk(n 1) n DFT {x} k := (Fx) k = φ k, x F := [ φ 0 φ 1 φ n 1 ] The fast Fourier transform (FFT) computes the DFT in O (n log n) The discrete cosine transform (DCT) is a related transformation designed for real vectors

Discrete cosine transform Signal DCT coefficients

Electrocardiogram

Electrocardiogram (spectrum)

Electrocardiogram (spectrum)

2D DFT Discretized frequency representation for 2D arrays in C n n Sinusoidal atoms are a basis for C n n φ 2D k 1,k 2 = 1 n 1 e i2πk 2 n e i2πk 2 (n 1) n e i2πk 1 n e i2π(k 1 +k 2) n e i2π(k 1 +k 2 (n 1)) n e i2πk 1 (n 1) n e i2π(k 1 (n 1)+k 2) n e i2π(k 1 (n 1)+k 2 (n 1)) n ( ) T = φ 1D k 1 φ 1D k 2 DFT 2D {X } := FXF = n 1 n 1 k 1 =0 k 2 =0 φ 2D k 1,k 2, X φ 2D k 1,k 2 Generalizes to C m n, m n, and to higher dimensions

Compression via frequency representation The 2D frequency representation of images tends to be sparse Thresholding the coefficients yields a compressed representation The JPEG compression standard is based on the 2D DCT High-frequency coefficients are discarded according to a perceptual model

Compression via frequency representation Original

Compression via frequency representation 10 % largest DCT coeffs

Compression via frequency representation 2 % largest DCT coeffs

Introduction Linear transforms Frequency representation Short-time Fourier transform (STFT) Wavelets Overcomplete sparse models Denoising The denoising problem Thresholding Synthesis model Analysis model

Motivation Spectrum of speech, music, etc. varies over time Idea: Compute frequency representation of time segments of the signal We must use a window to avoid introducing spurious high frequencies

The need for windowing Signal Window = Spectrum =

The need for windowing Signal Window = Spectrum =

Short-time Fourier transform Let w : [0, 1] C be a window function localized in time and frequency STFT {f } (k, τ) := 1 0 f (t) w (t τ)e i2πkt dt = φ k,τ, f Each atom φ k,τ (t) := w (t τ) e i2πkt corresponds to w shifted by τ in time and by k in frequency In discrete time, pointwise multiplication by a shifted window followed by a DFT, equivalent to D T x where D C n m, m > n The STFT of speech tends to be sparse (analysis sparse model) Including dilations of w (in addition to time and frequency translations) yields a dictionary of Gabor atoms

Atom τ = 0, k = 0 Real part Imaginary part Spectrum

Atom τ = 1/32, k = 0 Real part Imaginary part Spectrum

Atom τ = 0, k = 64 Real part Imaginary part Spectrum

Atom τ = 1/32, k = 64 Real part Imaginary part Spectrum

Speech signal

Spectrum

Spectrogram (log magnitude of STFT coefficients) Frequency Time

Introduction Linear transforms Frequency representation Short-time Fourier transform (STFT) Wavelets Overcomplete sparse models Denoising The denoising problem Thresholding Synthesis model Analysis model

Wavelets Aim: Approximate signals at different scales A wavelet ψ is a unit-norm, zero-mean function in L 2 Wavelet transform W {f } (s, τ) := 1 1 ( t τ f (t) ψ s s 0 ) dt = φ s,τ, f Atoms are dilations and translations of the mother wavelet φ s,τ (t) = 1 ( ) t τ ψ s s We can build orthonormal basis for L 2 using wavelets

Multiresolution approximation Sequence {V j, j Z} of closed subspaces of L 2 (R) such that P Vj (f ) is an approximation of f at scale 2 j Conditions: Dilating functions in V j by 2 yields functions in V j+1 f (t) V j f ( t V j+1 2) Approximations at a scale 2 j are always better than at 2 j+1 V j+1 V j

Multiresolution approximation V j is invariant to translations at the scale 2 j f (t) V j f ( t 2 j k ) V j for all k Z As j the approximation loses all information lim V j = {0} j As j the approximation is perfect lim j V j = L 2 There exists a scaling function ζ V 0 such that { ζ0,k (t) := ζ (t k), k Z } is an orthonormal basis for V 0

{ ψ2 j,k, k Z } is an orthonormal basis for V j V j { ζ0,k (t), ψ 2 1,k, ψ 2 2,k,..., ψ 2 j,k, k Z } is an orthonormal basis for V j Wavelet basis Mallat and Meyer prove that there exists a wavelet ψ such that P Vj (f ) = P Vj+1 (f ) + k Z ψ2 j,k, f ψ 2 j,k. Many different wavelet bases: Meyer, Daubechies, Battle-Lemarie,... Discrete wavelet transform can be computed in O (n) Signal processing interpretation: Wavelets act as band-pass filters, scaling functions act as low-pass filters

Haar wavelet Scaling function Mother wavelet

Electrocardiogram Signal Haar transform

Scale 2 9 Contribution Approximation

Scale 2 8 Contribution Approximation

Scale 2 7 Contribution Approximation

Scale 2 6 Contribution Approximation

Scale 2 5 Contribution Approximation

Scale 2 4 Contribution Approximation

Scale 2 3 Contribution Approximation

Scale 2 2 Contribution Approximation

Scale 2 1 Contribution Approximation

Scale 2 0 Contribution Approximation

2D Wavelets Extension to 2D by using outer products of 1D atoms φ 2D s 1,s 2,k 1,k 2 := φ 1D s 1,k 1 ( φ 1D s 2,k 2 ) T Yields sparse representation of natural images The JPEG 2000 compression standard is based on 2D wavelets Many extensions: Steerable pyramid, ridgelets, curvelets, bandlets,...

2D wavelet transform

2D wavelet transform

Sorted coefficients 10 3 10 1 10 1 10 3

Introduction Linear transforms Frequency representation Short-time Fourier transform (STFT) Wavelets Overcomplete sparse models Denoising The denoising problem Thresholding Synthesis model Analysis model

Overcomplete dictionaries Atoms {φ 1,..., φ m } are linearly independent and m > n D := [ φ 1 φ 2 φ m ] Synthesis sparse model: x = Dc where c is sparse Problem: There are infinite choices of c such that x = Dc Some may not be sparse at all! Example: Dictionary of sinusoids

Dictionary of sinusoids x = Dc c

First idea Apply pseudoinverse ĉ := D x = D T ( DD T ) 1 x Interpretations: Projection of c onto row space of D Solution to minimize c 2 subject to x = D c,

Dictionary of sinusoids: Minimum l 2 -norm coefficients

Computing sparse representations We would like to solve Computationally intractable minimize c 0 subject to x = D c Two possibilities: Greedy methods: Select atoms one by one l 1 -norm minimization: min c R m c 1 such that x = Dc

Matching pursuit (MP) Iteratively choose atoms that are most correlated with the signal Initialization: Iterations: k = 1, 2,... φ (k) = arg max j ˆx (k) = ˆx (k 1) + r (0) = x ˆx (0) = 0 r (k 1), φ k r (k 1), φ (k) φ (k) r (k) = r (k 1) r (k 1), φ (k) φ (k)

Dictionary of sinusoids: Coefficients Original MP

Dictionary of sinusoids: Approximation Original MP

Orthogonal matching pursuit (OMP) Makes sure approximation is orthogonal to residual at Initialization: Iterations: k = 1, 2,... φ (k) = arg max j r (0) = x r (k 1), φ k A (k) = [ φ (1) φ (2) φ (k)] ĉ (k) = A (k) x = ˆx (k) = A (k) ĉ (k) r (k) = x ˆx (k) ( A (k) A (k)t ) 1 A (k)t x

Dictionary of sinusoids: Coefficients Original OMP

Dictionary of sinusoids: Approximation Original OMP

l 1 -norm minimization Estimate coefficients by solving minimize c 1 subject to x = D c Computationally tractable (convex program) Known as basis pursuit in the literature

Geometric intuition c 2 Min. l 1 -norm solution Min. l 2 -norm solution c 1 Dc = x

Dictionary of sinusoids: Coefficients Original l 1 -norm min.

Dictionary of sinusoids: Approximation Original l 1 -norm min.

Introduction Linear transforms Frequency representation Short-time Fourier transform (STFT) Wavelets Overcomplete sparse models Denoising The denoising problem Thresholding Synthesis model Analysis model

Introduction Linear transforms Frequency representation Short-time Fourier transform (STFT) Wavelets Overcomplete sparse models Denoising The denoising problem Thresholding Synthesis model Analysis model

Denoising Aim: Extracting information (signal) from data in the presence of uninformative perturbations (noise) Additive noise model data = signal + noise y = x + z Prior knowledge about structure of signal vs structure of noise is required

Electrocardiogram Spectrum

Electrocardiogram: High-frequency noise (power line hum) Original spectrum Low-pass filtered spectrum

Electrocardiogram: High-frequency noise (power line hum) Original spectrum Low-pass filtered spectrum

Electrocardiogram: High-frequency noise (power line hum) Original Low-pass filtered

Electrocardiogram: High-frequency noise (power line hum) Original Low-pass filtered

Introduction Linear transforms Frequency representation Short-time Fourier transform (STFT) Wavelets Overcomplete sparse models Denoising The denoising problem Thresholding Synthesis model Analysis model

Thresholding Prior knowledge: Signal is a sparse superposition of dictionary atoms Noise is not (incoherence between atoms and noise) Hard-thresholding operator H η (x) i := { x i if x i > η, 0 otherwise

Denoising via thresholding Data Signal

Denoising via thresholding Estimate Signal

Sparsity in a basis Assumption: x = Bc, where c is sparse Threshold B 1 y ĉ = H η ( B 1 y ) = H η ( c + B 1 z ) ŷ = Bĉ Noise and sparsifying atoms should be incoherent, i.e. B 1 z is not sparse Example: Orthogonal sparsifying basis and Gaussian noise

Denoising via thresholding in DCT basis DCT coefficients Data Signal Data

Denoising via thresholding in DCT basis DCT coefficients Estimate Signal Estimate

Denoising via thresholding in a wavelet basis

Denoising via thresholding in a wavelet basis

2D wavelet coefficients

Original coefficients

Thresholded coefficients

Estimate

Estimate

Denoising via thresholding in a wavelet basis Original Noisy Estimate

Analysis model Assumption: D T x is sparse Threshold, then use left inverse of D T L ) ĉ = H η (D T y ) = H η (D T x + D T z ŷ = Lĉ Example: Thresholding STFT coefficients for speech denoising

Block thresholding Assumption: Coefficients are group sparse, nonzero coefficients cluster together Block thresholding: Partition coefficients into blocks I 1, I 2,..., I k and threshold whole blocks { x i if i I j such that x Ij 2 > η, B η (x) i := 0 otherwise

Haar transform

2D wavelet transform

Speech denoising

Time thresholding

Spectrum

Frequency thresholding

Frequency thresholding Data DFT thresholding

Spectrogram (STFT) Frequency Time

STFT thresholding Frequency Time

STFT thresholding Data STFT thresholding

STFT block thresholding Frequency Time

STFT block thresholding Data STFT block thresh.

Wavelets

Sorted wavelet coefficients 10 3 10 1 10 1 10 3

Introduction Linear transforms Frequency representation Short-time Fourier transform (STFT) Wavelets Overcomplete sparse models Denoising The denoising problem Thresholding Synthesis model Analysis model

Denoising via l 1 -norm regularized least squares Synthesis sparse model x = Dc where c is sparse We would like to solve minimize c 0 subject to y D c Computationally intractable if D is overcomplete Basis-pursuit denoising ĉ = arg min x R m y D c 2 2 + λ c 1 ˆx = Dĉ

Sines and spikes x = Dc c DCT subdictionary Spike subdictionary

Denoising via l 1 -norm regularized least squares

Denoising via l 1 -norm regularized least squares Signal Estimate

Denoising via l 1 -norm regularized least squares Signal Estimate

Introduction Linear transforms Frequency representation Short-time Fourier transform (STFT) Wavelets Overcomplete sparse models Denoising The denoising problem Thresholding Synthesis model Analysis model

Denoising via l 1 -norm regularized least squares Analysis sparse model We would like to solve D T x minimize subject to is sparse D T x y x 0 Computationally intractable if D is overcomplete Instead, we solve ˆx = arg min y x R x 2 m 2 + λ AT x Significantly more challenging to solve than synthesis formulation 1

Total variation Images and some time series tend to be piecewise constant Equivalently: Sparse gradient (or derivative) The total variation of an image I is the l 1 -norm of the horizontal and vertical components of the gradient TV (I ) := x I 1 + y I 1 Equivalent to l 1 -norm regularization with an overcomplete analysis operator Denoising via TV regularization Y Î = arg min Ĩ 2 ) (Ĩ + λ TV Ĩ R n n F

Denoising via TV regularization Signal Data

Denoising via TV regularization Signal TV reg. (small λ)

Denoising via TV regularization Signal TV reg. (medium λ)

Denoising via TV regularization Signal TV reg. (large λ)

Denoising via TV regularization

Denoising via TV regularization

Small λ

Small λ

Medium λ

Medium λ

Large λ

Large λ

Denoising via TV regularization Original Noisy Estimate