Machine Learning for Signal Processing Sparse and Overcomplete Representations Bhiksha Raj (slides from Sourish Chaudhuri) Oct 22, 2013 1

Key Topics in this Lecture Basics Component-based representations Overcomplete and Sparse Representations, Dictionaries Pursuit Algorithms How to learn a dictionary Why is an overcomplete representation powerful? 2

Representing Data Dictionary (codebook) 3

Representing Data Dictionary Atoms 4

Representing Data Dictionary Atoms Each atom is a basic unit that can be used to compose larger units. 5

Representing Data Atoms 6

Representing Data 9

Representing Data Example weights: 6.44, 0.004, 0.01, 12.19, 0.02, 4.00, 9.12 12

Representing Data Symbolic weights w_1, w_2, …, w_7 13

Overcomplete Representations What is the dimensionality of the input image? (say 64x64 image) 4096 What is the dimensionality of the dictionary? (each image = 64x64 pixels) N x 4096 14

Overcomplete Representations What is the dimensionality of the input image? (say 64x64 image) 4096 What is the dimensionality of the dictionary? (each image = 64x64 pixels) N x 4096??? 15

Overcomplete Representations What is the dimensionality of the input image? (say 64x64 image) 4096 What is the dimensionality of the dictionary? (each image = 64x64 pixels) N x 4096 VERY LARGE!!! 16

Overcomplete Representations What is the dimensionality of the input image? (say 64x64 image) 4096 What is the dimensionality of the dictionary? (each image = 64x64 pixels) N x 4096 VERY LARGE!!! If N > 4096 (as it likely is), we have an overcomplete representation 17

Overcomplete Representations What is the dimensionality of the input image? (say 64x64 image) 4096 What is the dimensionality of the dictionary? (each image = 64x64 pixels) N x 4096 VERY LARGE!!! More generally: if #(basis vectors) > dimensions of the input, we have an overcomplete representation 18

Representing Data The input expressed as a weighted combination of dictionary atoms 19

Representing Data D.α = X (the input) 21

Representing Data D.α = X (the input), with α unknown 22

Quick Linear Algebra Refresher Remember, #(Basis Vectors) = #unknowns. D.α = X, where D holds the basis vectors, α the unknowns, and X the input data 23

Quick Linear Algebra Refresher Remember, #(Basis Vectors) = #unknowns. D.α = X, where D holds the basis vectors, α the unknowns, and X the input data When can we solve for α? 24

Quick Linear Algebra Refresher D.α = X When #(Basis Vectors) = dim(input Data) and the basis vectors are linearly independent, we have a unique solution When #(Basis Vectors) < dim(input Data), we may have no solution When #(Basis Vectors) > dim(input Data), we have infinitely many solutions 25

Quick Linear Algebra Refresher D.α = X When #(Basis Vectors) = dim(input Data), we have a unique solution When #(Basis Vectors) < dim(input Data), we may have no solution When #(Basis Vectors) > dim(input Data), we have infinitely many solutions Our Case 26
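To make the refresher concrete, here is a minimal NumPy sketch (a toy 3-dimensional example with 5 atoms; all values are illustrative, not from the slides) showing that the overcomplete case has infinitely many exact solutions:

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((3, 5))   # dictionary: 3-dim data, 5 basis vectors
X = rng.standard_normal(3)        # input data point

alpha = np.linalg.pinv(D) @ X                 # minimum-norm solution
null_basis = np.linalg.svd(D)[2][3:].T        # basis of the null space of D
alpha2 = alpha + null_basis @ np.array([1.0, -2.0])  # another exact solution

print(np.allclose(D @ alpha, X))   # True
print(np.allclose(D @ alpha2, X))  # True: infinitely many such solutions
```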

Overcomplete Representations #(Basis Vectors) > dimensions of the input 27

Overcomplete Representation D.α = X, with α unknown; #(Basis Vectors) > dimensions of the input 28

Overcomplete Representations Why do we use them? How do we learn them? 29

Overcomplete Representations Why do we use them? A more natural representation of the real world More flexibility in matching data Can yield a better approximation of the statistical distribution of the data. How do we learn them? 30

Overcompleteness and Sparsity To solve an overcomplete system of the type: D.α = X Make assumptions about the data. Suppose we say that X is composed of no more than a fixed number (k) of bases from D (k ≪ dim(X)) The term bases is an abuse of terminology here, since the atoms of an overcomplete dictionary cannot all be linearly independent. Now, we can find the set of k bases that best fit the data point, X. 31

Representing Data Using bases that we know But no more than k=4 bases 32

Overcompleteness and Sparsity Atoms But no more than k=4 bases are active 33

Overcompleteness and Sparsity Atoms But no more than k=4 bases 34

No more than 4 bases D.α = X, with all but 4 entries of α equal to zero 35

No more than 4 bases D.α = X ONLY THE α COMPONENTS CORRESPONDING TO THE 4 KEY DICTIONARY ENTRIES ARE NON-ZERO 36

No more than 4 bases D.α = X ONLY THE α COMPONENTS CORRESPONDING TO THE 4 KEY DICTIONARY ENTRIES ARE NON-ZERO MOST OF α IS ZERO!! α IS SPARSE 37

Sparsity- Definition Sparse representations are representations that account for most or all information of a signal with a linear combination of a small number of atoms. (from: www.see.ed.ac.uk/~tblumens/sparse/sparse.html) 38

The Sparsity Problem We don't really know k You are given a signal X Assuming X was generated using the dictionary, can we find the α that generated it? 39

The Sparsity Problem We want to use as few basis vectors as possible to do this: $\min_\alpha \|\alpha\|_0 \text{ s.t. } X = D\alpha$ 40

The Sparsity Problem We want to use as few basis vectors as possible to do this: $\min_\alpha \|\alpha\|_0 \text{ s.t. } X = D\alpha$ where $\|\alpha\|_0$ counts the number of nonzero elements in α 41

The Sparsity Problem We want to use as few basis vectors as possible to do this: $\min_\alpha \|\alpha\|_0 \text{ s.t. } X = D\alpha$ How can we solve the above? 42

Obtaining Sparse Solutions We will look at 2 algorithms: Matching Pursuit (MP) Basis Pursuit (BP) 43

Matching Pursuit (MP) Greedy algorithm Finds an atom in the dictionary that best matches the input signal Remove the weighted value of this atom from the signal Again, find an atom in the dictionary that best matches the remaining signal. Continue till a defined stop condition is satisfied. 44
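A minimal NumPy sketch of this greedy loop (the function name and the unit-norm-atoms assumption are ours, not from the slides):

```python
import numpy as np

def matching_pursuit(D, x, max_atoms=4, tol=1e-6):
    """D: (dim, n_atoms) dictionary with unit-norm columns; x: input signal."""
    alpha = np.zeros(D.shape[1])
    residual = x.astype(float).copy()
    for _ in range(max_atoms):
        corr = D.T @ residual                 # match every atom to the residual
        best = np.argmax(np.abs(corr))        # atom that best matches
        alpha[best] += corr[best]             # record its weight
        residual -= corr[best] * D[:, best]   # remove the weighted atom
        if np.linalg.norm(residual) < tol:    # stopping condition
            break
    return alpha, residual
```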

Matching Pursuit Find the dictionary atom that best matches the given signal. Weight = w_1 45

Matching Pursuit Remove weighted image to obtain updated signal Find best match for this signal from the dictionary 46

Matching Pursuit Find best match for updated signal 47

Matching Pursuit Find best match for updated signal Iterate till you reach a stopping condition, e.g. norm(residual input signal) < threshold 48

Matching Pursuit From http://en.wikipedia.org/wiki/matching_pursuit 49

Matching Pursuit Problems??? 50

Matching Pursuit Main problem: computational complexity; the entire dictionary has to be searched at every iteration 51

Comparing MP and BP Matching Pursuit: hard thresholding; greedy optimization at each step; weights obtained using greedy rules. Basis Pursuit: (remember the equations) 52

Basis Pursuit (BP) Remember: $\min_\alpha \|\alpha\|_0 \text{ s.t. } X = D\alpha$ 53

Basis Pursuit Remember: $\min_\alpha \|\alpha\|_0 \text{ s.t. } X = D\alpha$ In the general case, this is intractable 54

Basis Pursuit Remember: $\min_\alpha \|\alpha\|_0 \text{ s.t. } X = D\alpha$ In the general case, this is intractable Requires combinatorial optimization 55

Basis Pursuit Replace the intractable expression by an expression that is solvable: $\min_\alpha \|\alpha\|_1 \text{ s.t. } X = D\alpha$ 56

Basis Pursuit Replace the intractable expression by an expression that is solvable: $\min_\alpha \|\alpha\|_1 \text{ s.t. } X = D\alpha$ This relaxation recovers the sparse solution when D obeys the Restricted Isometry Property. 57

Basis Pursuit Replace the intractable expression by an expression that is solvable: $\min_\alpha \|\alpha\|_1$ (objective) s.t. $X = D\alpha$ (constraint) 58

Basis Pursuit We can formulate the optimization term as: $\min_\alpha \{\|X - D\alpha\|_2^2 + \lambda\|\alpha\|_1\}$ (the first term absorbs the former constraint; the second is the former objective) 59

Basis Pursuit We can formulate the optimization term as: $\min_\alpha \{\|X - D\alpha\|_2^2 + \lambda\|\alpha\|_1\}$ λ is a penalty term on the non-zero elements and promotes sparsity 60

Basis Pursuit This is equivalent to LASSO; for more details, see the paper by Tibshirani. We can formulate the optimization term as: $\min_\alpha \{\|X - D\alpha\|_2^2 + \lambda\|\alpha\|_1\}$ λ is a penalty term on the non-zero elements and promotes sparsity 61

Basis Pursuit There are efficient ways to solve the LASSO formulation. [Link to Matlab code] 62
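One simple solver for this LASSO formulation is iterative soft thresholding (ISTA); the sketch below is an assumed solver choice on our part, not the Matlab code the slide links to:

```python
import numpy as np

def ista(D, x, lam=0.1, n_iter=200):
    """Minimize ||x - D @ alpha||_2^2 + lam * ||alpha||_1 by ISTA."""
    L = np.linalg.norm(D, 2) ** 2          # step size from the spectral norm
    alpha = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ alpha - x)       # gradient of the quadratic term
        z = alpha - grad / L               # gradient step
        # soft thresholding: the "soft" counterpart of MP's hard atom choice
        alpha = np.sign(z) * np.maximum(np.abs(z) - lam / (2 * L), 0.0)
    return alpha
```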

Comparing MP and BP Matching Pursuit: hard thresholding; greedy optimization at each step; weights obtained using greedy rules. Basis Pursuit: soft thresholding (remember the equations); global optimization; can force N-sparsity with appropriately chosen weights. 63

Many Other Methods Iterative Hard Thresholding (IHT) CoSaMP OMP (Orthogonal Matching Pursuit) 64
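For completeness, OMP is also available off the shelf; a usage sketch assuming scikit-learn is installed and D, x are as in the earlier matching-pursuit sketch:

```python
from sklearn.linear_model import OrthogonalMatchingPursuit

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=4, fit_intercept=False)
omp.fit(D, x)        # D: (dim, n_atoms) dictionary, x: signal of length dim
alpha = omp.coef_    # sparse weight vector with at most 4 nonzero entries
```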

Applications of Sparse Representations Many, many applications: signal representation, statistical modelling, … Two extremely popular signal processing applications: compressive sensing and denoising 65

Compressive Sensing Recall the Nyquist criterion? To reconstruct a signal, you need to sample at twice the maximum frequency of the original signal 66

Compressive Sensing Recall the Nyquist criterion? To reconstruct a signal, you need to sample at twice the maximum frequency of the original signal Is it possible to reconstruct signals when they have not been sampled so as to satisfy the Nyquist criterion? 67

Compressive Sensing Recall the Nyquist criterion? To reconstruct a signal, you need to sample at twice the maximum frequency of the original signal Is it possible to reconstruct signals when they have not been sampled so as to satisfy the Nyquist criterion? Under specific criteria, yes!!!! 68

Compressive Sensing What criteria? 69

Compressive Sensing What criteria? Sparsity! 70

Compressive Sensing What criteria? Sparsity! Exploit the structure of the data Most signals are sparse, in some domain 71
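As a toy illustration (all values are assumed: a random Gaussian sensing matrix, an exactly 4-sparse signal, and the ista() sketch from earlier as the reconstruction routine):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 256, 64                        # signal length, number of measurements
alpha_true = np.zeros(n)
alpha_true[rng.choice(n, 4, replace=False)] = rng.standard_normal(4)

Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # sensing matrix, m << n
y = Phi @ alpha_true                  # far fewer samples than Nyquist asks for

alpha_hat = ista(Phi, y, lam=0.01, n_iter=2000)  # l1 reconstruction
print(np.linalg.norm(alpha_hat - alpha_true))    # should be small
```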

Applications of Sparse Representations Two extremely popular applications: Compressive sensing You will hear more about this in Aswin's class Denoising 72

Applications of Sparse Representations Two extremely popular applications: Compressive sensing Denoising 73

Denoising As the name suggests, remove noise! 74

Denoising As the name suggests, remove noise! We will look at image denoising as an example 75

Image Denoising Here's what we want 76

Denoising As the name suggests, remove noise! We will look at image denoising as an example A more general take-away: How to learn the dictionaries 79

The Image Denoising Problem Given an image Remove Gaussian additive noise from it 80

Image Denoising Y = X + ε, where Y is the noisy input, X the original image, and ε ~ N(0, σ²) is Gaussian noise 81
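A toy instance of this noise model (the stand-in clean image and σ = 20 on a 0-255 scale are assumed values):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 255, size=(64, 64))          # stand-in "clean" image
sigma = 20.0
Y = X + sigma * rng.standard_normal(X.shape)    # noisy observation Y = X + eps
```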

Image Denoising Remove the noise from Y, to obtain X as best as possible. 82

Image Denoising Remove the noise from Y, to obtain X as best as possible Using sparse representations over learned dictionaries 83

Image Denoising Remove the noise from Y, to obtain X as best as possible Using sparse representations over learned dictionaries Yes, we will learn the dictionaries 84

Image Denoising Remove the noise from Y, to obtain X as best as possible Using sparse representations over learned dictionaries Yes, we will learn the dictionaries What data will we use? The corrupted image itself! 85

Image Denoising We use the data to be denoised to learn the dictionary. Training and denoising become an iterated process. We use image patches of size n x n pixels (e.g. if the image is 64x64, patches might be 8x8) 86
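A minimal patch-extraction sketch (assuming fully overlapping 8x8 patches, as in the slide's 64x64 example):

```python
import numpy as np

def extract_patches(image, p=8):
    """Return all overlapping p x p patches as columns of a (p*p, n) matrix."""
    H, W = image.shape
    patches = [image[i:i + p, j:j + p].ravel()
               for i in range(H - p + 1)
               for j in range(W - p + 1)]
    return np.stack(patches, axis=1)
```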

Image Denoising The data dictionary D Size = n x k (k > n), where n is the number of pixels in a patch This is known and fixed, to start with Every image patch can be sparsely represented using D 87

Image Denoising Recall our equations from before. We want to find α so as to minimize: $\min_\alpha \{\|X - D\alpha\|_2^2 + \lambda\|\alpha\|_0\}$ 88

Image Denoising Recall our equations from before. We want to find α so as to minimize: $\min_\alpha \{\|X - D\alpha\|_2^2 + \lambda\|\alpha\|_0\}$ Can Matching Pursuit solve this? 89

Image Denoising Recall our equations from before. We want to find α so as to minimize: $\min_\alpha \{\|X - D\alpha\|_2^2 + \lambda\|\alpha\|_0\}$ Can Matching Pursuit solve this? Yes 90

Image Denoising Recall our equations from before. We want to find α so as to minimize: $\min_\alpha \{\|X - D\alpha\|_2^2 + \lambda\|\alpha\|_0\}$ Can Matching Pursuit solve this? Yes What constraints does it need? 91

Image Denoising Recall our equations from before. We want to find α so as to minimize: $\min_\alpha \{\|X - D\alpha\|_2^2 + \lambda\|\alpha\|_0\}$ Can Basis Pursuit solve this? 92

Image Denoising Recall our equations from before. We want to find α so as to minimize: $\min_\alpha \{\|X - D\alpha\|_2^2 + \lambda\|\alpha\|_0\}$ But this is intractable! 93

Image Denoising Recall our equations from before. We want to find α so as to minimize: $\min_\alpha \{\|X - D\alpha\|_2^2 + \lambda\|\alpha\|_1\}$ Can Basis Pursuit solve this? 94

Image Denoising Recall our equations from before. We want to find α so as to minimize: $\min_\alpha \{\|X - D\alpha\|_2^2 + \lambda\|\alpha\|_1\}$ Can Basis Pursuit solve this? Yes 95

Image Denoising $\min_\alpha \{\|X - D\alpha\|_2^2 + \lambda\|\alpha\|_1\}$ In the above, X is a patch. 96

Image Denoising $\min_\alpha \{\|X - D\alpha\|_2^2 + \lambda\|\alpha\|_1\}$ In the above, X is a patch. If the larger image is fully expressed by the set of every patch in it, how can we go from patches to the image? 97

Image Denoising $\min_{\alpha_{ij}, X} \{\mu\|X - Y\|_2^2 + \sum_{ij}\|R_{ij}X - D\alpha_{ij}\|_2^2 + \sum_{ij}\lambda_{ij}\|\alpha_{ij}\|_0\}$ 98

Image Denoising $\min_{\alpha_{ij}, X} \{\mu\|X - Y\|_2^2 + \sum_{ij}\|R_{ij}X - D\alpha_{ij}\|_2^2 + \sum_{ij}\lambda_{ij}\|\alpha_{ij}\|_0\}$ (X − Y) is the error between the input and denoised image. μ is a penalty on the error. 99

Image Denoising $\min_{\alpha_{ij}, X} \{\mu\|X - Y\|_2^2 + \sum_{ij}\|R_{ij}X - D\alpha_{ij}\|_2^2 + \sum_{ij}\lambda_{ij}\|\alpha_{ij}\|_0\}$ Error bounding in each patch: what is R_ij? (the operator that extracts the patch at location (i, j) from X) How many terms in the summation? (one per patch) 100

Image Denoising $\min_{\alpha_{ij}, X} \{\mu\|X - Y\|_2^2 + \sum_{ij}\|R_{ij}X - D\alpha_{ij}\|_2^2 + \sum_{ij}\lambda_{ij}\|\alpha_{ij}\|_0\}$ λ forces sparsity 101

Image Denoising But, we don't know our dictionary D. We want to estimate D as well. 102

Image Denoising But, we don't know our dictionary D. We want to estimate D as well: $\min_{D, \alpha_{ij}, X} \{\mu\|X - Y\|_2^2 + \sum_{ij}\|R_{ij}X - D\alpha_{ij}\|_2^2 + \sum_{ij}\lambda_{ij}\|\alpha_{ij}\|_0\}$ We can use the previous equation itself!!! 103

Image Denoising $\min_{D, \alpha_{ij}, X} \{\mu\|X - Y\|_2^2 + \sum_{ij}\|R_{ij}X - D\alpha_{ij}\|_2^2 + \sum_{ij}\lambda_{ij}\|\alpha_{ij}\|_0\}$ How do we estimate all 3 at once? 104

Image Denoising $\min_{D, \alpha_{ij}, X} \{\mu\|X - Y\|_2^2 + \sum_{ij}\|R_{ij}X - D\alpha_{ij}\|_2^2 + \sum_{ij}\lambda_{ij}\|\alpha_{ij}\|_0\}$ How do we estimate all 3 at once? We cannot estimate them at the same time! 105

Image Denoising $\min_{D, \alpha_{ij}, X} \{\mu\|X - Y\|_2^2 + \sum_{ij}\|R_{ij}X - D\alpha_{ij}\|_2^2 + \sum_{ij}\lambda_{ij}\|\alpha_{ij}\|_0\}$ How do we estimate all 3 at once? Fix 2, and find the optimal 3rd. 106

Image Denoising $\min_{D, \alpha_{ij}, X} \{\mu\|X - Y\|_2^2 + \sum_{ij}\|R_{ij}X - D\alpha_{ij}\|_2^2 + \sum_{ij}\lambda_{ij}\|\alpha_{ij}\|_0\}$ Initialize X = Y 107

Image Denoising 2 Min { X Y a 2 ij } ij a ij 0 ij ij R ij X Da ij 2 0 2 Initialize X = Y, initialize D You know how to solve the remaining portion for α MP, BP! 108

Image Denoising Now, update the dictionary D. Update D one column at a time, following the K-SVD algorithm K-SVD maintains the sparsity structure 109

Image Denoising Now, update the dictionary D. Update D one column at a time, following the K-SVD algorithm K-SVD maintains the sparsity structure Iteratively update α and D 110

K-SVD vs K-Means K-means: Given data Y, find D and α such that Error = $\|Y - D\alpha\|^2$ is minimized, with constraint $\|\alpha_i\|_0 = 1$ K-SVD: Find D and α such that Error = $\|Y - D\alpha\|^2$ is minimized, with constraint $\|\alpha_i\|_0 < T$ 111

Image Denoising Updating D For each basis vector, compute its contribution to the image: $E_k = Y - \sum_{j \neq k} d_j \alpha_j$ 112

Image Denoising Updating D For each basis vector, compute its contribution to the image Then take the singular value decomposition of $E_k$: $E_k = U \Delta V^T$ 113

K-SVD Updating D For each basis vector, compute its contribution to the image Take the SVD of $E_k$ and use the principal singular vector as the updated basis vector: $d_k = u_1$ Update every entry in D 114

K-SVD Initialize D Estimate α Update every entry in D Iterate 115
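A minimal sketch of the K-SVD dictionary update (assumptions: patches are columns of Y, sparse codes are columns of A; restricting the rank-1 SVD update to the patches that actually use atom k is what preserves the sparsity structure):

```python
import numpy as np

def ksvd_update(D, A, Y):
    """One pass of K-SVD atom updates. D: (dim, k), A: (k, n), Y: (dim, n)."""
    for k in range(D.shape[1]):
        users = np.nonzero(A[k])[0]        # patches that use atom k
        if users.size == 0:
            continue
        # residual without atom k, restricted to the patches that use it
        E_k = Y[:, users] - D @ A[:, users] + np.outer(D[:, k], A[k, users])
        U, s, Vt = np.linalg.svd(E_k, full_matrices=False)
        D[:, k] = U[:, 0]                  # principal singular vector
        A[k, users] = s[0] * Vt[0]         # matching updated weights
    return D, A
```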

Image Denoising Learned Dictionary for Face Image denoising From: M. Elad and M. Aharon, Image denoising via learned dictionaries and sparse representation, CVPR, 2006. 116

Image Denoising $\min_X \{\mu\|X - Y\|_2^2 + \sum_{ij}\|R_{ij}X - D\alpha_{ij}\|_2^2\}$ (the sparsity term is constant w.r.t. X) We know D and α The quadratic term above has a closed-form solution 117

Image Denoising $\min_X \{\mu\|X - Y\|_2^2 + \sum_{ij}\|R_{ij}X - D\alpha_{ij}\|_2^2\}$ (the sparsity term is constant w.r.t. X) We know D and α, so $X = \left(\mu I + \sum_{ij} R_{ij}^T R_{ij}\right)^{-1}\left(\mu Y + \sum_{ij} R_{ij}^T D\alpha_{ij}\right)$ 118
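Because $\sum_{ij} R_{ij}^T R_{ij}$ is diagonal (each entry counts how many patches cover that pixel), the inverse reduces to a per-pixel division; a minimal sketch under that assumption, reusing the patch ordering of the earlier extract_patches sketch:

```python
import numpy as np

def reconstruct_image(Y, D, alphas, p=8, mu=1.0):
    """Closed-form X from noisy Y, dictionary D, and per-patch codes alphas."""
    H, W = Y.shape
    num = mu * Y.astype(float).copy()   # mu*Y + sum_ij R_ij^T D alpha_ij
    den = mu * np.ones((H, W))          # diagonal of mu*I + sum_ij R_ij^T R_ij
    idx = 0
    for i in range(H - p + 1):
        for j in range(W - p + 1):
            patch = (D @ alphas[:, idx]).reshape(p, p)
            num[i:i + p, j:j + p] += patch   # accumulate denoised patches
            den[i:i + p, j:j + p] += 1.0     # count patch coverage per pixel
            idx += 1
    return num / den                    # per-pixel weighted average
```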

Image Denoising Summarizing We wanted to obtain 3 things 119

Image Denoising Summarizing We wanted to obtain 3 things Weights α Dictionary D Denoised Image X 120

Image Denoising Summarizing We wanted to obtain 3 things Weights α Your favorite pursuit algorithm Dictionary D Using K-SVD Denoised Image X 121

Image Denoising Summarizing We wanted to obtain 3 things Weights α Your favorite pursuit algorithm Dictionary D Using K-SVD Denoised Image X Iterating 122

Image Denoising Summarizing We wanted to obtain 3 things Weights α Dictionary D Denoised Image X: closed-form solution 123

K-SVD algorithm (skip) 124

Comparing to Other Techniques Non-Gaussian data PCA or ICA? Which is which? Images from Lewicki and Sejnowski, Learning Overcomplete Representations, 2000. 125

Comparing to Other Techniques Non-Gaussian data PCA ICA Images from Lewicki and Sejnowski, Learning Overcomplete Representations, 2000. 126

Comparing to Other Techniques On the non-Gaussian data, PCA predicts data along its principal directions but doesn't predict the data off them; ICA does pretty well. Images from Lewicki and Sejnowski, Learning Overcomplete Representations, 2000. 127

Comparing to Other Techniques ICA keeps the data in the original 2-D space and doesn't capture the underlying representation, which overcomplete representations can. 128

Summary Overcomplete representations can be more powerful than component analysis techniques. Dictionary can be learned from data. Relative advantages and disadvantages of the pursuit algorithms. 129