A Quest for a Universal Model for Signals: From Sparsity to ConvNets

1 A Quest for a Universal Model for Signals: From Sparsity to ConvNets. Yaniv Romano, The Electrical Engineering Department, Technion - Israel Institute of Technology. Joint work with Vardan Papyan, Jeremias Sulam, and Prof. Michael Elad. The research leading to these results has received funding from the European Union's Seventh Framework Program (FP/ ) under ERC grant agreement ERC-SPARSE.

2 Local Sparsity: Signal = Dictionary (learned) × Sparse vector.

3 Independent patch-processing, Local Sparsity.

4 Independent patch-processing, Local Sparsity.

5 Independent patch-processing, Local Sparsity, Convolutional Sparse Coding.

6 Independent patch-processing, Local Sparsity, Convolutional Sparse Coding, Multi-Layer Convolutional Sparse Coding.

7 Independent patch-processing, Local Sparsity, Convolutional Neural Networks, Convolutional Sparse Coding, Multi-Layer Convolutional Sparse Coding.

8 Independent patch-processing, Local Sparsity, Convolutional Sparse Coding, Multi-Layer Convolutional Sparse Coding, Convolutional Neural Networks. The forward pass is a sparse-coding algorithm, serving our model. Extension of the classical sparse theory to a multi-layer setting.

9 Our Story Begins with...

10 Image Denoising. Original image X, white Gaussian noise E, noisy image Y. Many (thousands of) image denoising algorithms can be cast as the minimization of an energy function of the form

$$\min_X \ \frac{1}{2}\|X - Y\|_2^2 + G(X)$$

where the first term is the relation to the measurements and $G(X)$ is the prior (regularization). (Literature search: Topic = image and noise and (removal or denoising).)

11 Leading image denoising methods are built upon powerful patch-based local models. Popular local models: GMM, Sparse-Representation, Example-based, Low-rank, Field-of-Experts, and Neural networks. Working locally allows us to learn the model, e.g. via dictionary learning.

12 Why is it so Popular? Denoising comes up in many applications. Other inverse problems can be recast as iterative denoising [Zoran & Weiss 11] [Venkatakrishnan, Bouman & Wohlberg 13] [Brifman, Romano & Elad 16]. It is also the simplest inverse problem, revealing the limitations of the model:

$$\min_X \ \frac{1}{2}\|HX - Y\|_2^2 + G(X)$$

(relation to measurements plus prior). In a recent work we show that a denoiser $f(X)$ can itself form a regularizer: Regularization by Denoising (RED) [Romano, Elad & Milanfar 16],

$$\min_X \ \frac{1}{2}\|HX - Y\|_2^2 + \frac{\rho}{2} X^T\big(X - f(X)\big)$$

Under simple conditions on $f(X)$, the gradient of the regularizer is simply $X - f(X)$.
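
As a rough illustration of how a denoiser acts as a regularizer, here is a minimal sketch of a RED-style gradient step. It assumes the simple case where H is an explicit matrix and uses a Gaussian blur as a stand-in for a real denoiser f; the function name, step size, and rho are illustrative choices and not part of the original work.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def red_gradient_step(x, y, H, f, rho=0.1, step=0.1):
    """One gradient step on 0.5*||Hx - y||^2 + (rho/2) * x^T (x - f(x)).
    Under the RED conditions, the gradient of the regularizer is rho*(x - f(x))."""
    data_grad = H.T @ (H @ x - y)     # gradient of the fidelity term
    prior_grad = rho * (x - f(x))     # RED gradient of the regularizer
    return x - step * (data_grad + prior_grad)

# toy usage: denoising (H = I), with a Gaussian blur standing in for the denoiser f
n = 64
x_true = np.random.rand(n)
y = x_true + 0.1 * np.random.randn(n)
H = np.eye(n)
f = lambda v: gaussian_filter(v, sigma=1.0)
x = y.copy()
for _ in range(50):
    x = red_gradient_step(x, y, H, f)
```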

13 The Sparse-Land Model assumes that every patch is a linear combination of a few columns, called atoms, taken from a matrix called a dictionary $\Omega \in \mathbb{R}^{n \times m}$ (with $m > n$). The operator $R_i$ extracts the i-th $n$-dimensional patch from $X \in \mathbb{R}^N$. Sparse coding:

$$(P_0):\quad \min_{\gamma_i} \ \|\gamma_i\|_0 \ \text{ s.t. } \ R_i X = \Omega\gamma_i$$

14 Patch Denoising. Given a noisy patch $R_i Y$, solve

$$(P_0^\epsilon):\quad \hat\gamma_i = \arg\min_{\gamma_i} \ \|\gamma_i\|_0 \ \text{ s.t. } \ \|R_i Y - \Omega\gamma_i\|_2 \le \epsilon$$

Clean patch: $\Omega\hat\gamma_i$. Both $(P_0)$ and $(P_0^\epsilon)$ are hard to solve. Approximations: greedy methods such as Orthogonal Matching Pursuit (OMP) or Thresholding, and convex relaxations such as Basis Pursuit (BP):

$$(P_1^\epsilon):\quad \min_{\gamma_i} \ \|\gamma_i\|_1 + \xi\|R_i Y - \Omega\gamma_i\|_2^2$$
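
To make the greedy option concrete, here is a minimal OMP sketch for a single patch. It assumes the dictionary columns are roughly l2-normalized; the function name and the fixed number of non-zeros k are illustrative.

```python
import numpy as np

def omp(y, Omega, k):
    """Orthogonal Matching Pursuit: greedily approximate
    min ||gamma||_0 s.t. ||y - Omega @ gamma||_2 is small, with at most k non-zeros.
    Assumes the columns (atoms) of Omega are roughly l2-normalized."""
    gamma = np.zeros(Omega.shape[1])
    residual = y.copy()
    support = []
    for _ in range(k):
        # pick the atom most correlated with the current residual
        support.append(int(np.argmax(np.abs(Omega.T @ residual))))
        # least-squares fit of y on the atoms selected so far
        coeffs, *_ = np.linalg.lstsq(Omega[:, support], y, rcond=None)
        residual = y - Omega[:, support] @ coeffs
    gamma[support] = coeffs
    return gamma
```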

15 Recall K-SVD Denoising [Elad & Aharon 06]: starting from the noisy image and an initial dictionary, alternate between denoising each patch using OMP and updating the dictionary using K-SVD, then reconstruct the image. Despite its simplicity, this is a very well-performing algorithm. We refer to this framework as patch averaging. A modification of this method leads to state-of-the-art results [Mairal, Bach, Ponce, Sapiro & Zisserman 09].
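
The patch-averaging framework itself can be sketched in a few lines. The following toy loop assumes a fixed dictionary (the K-SVD dictionary update is omitted) and reuses the omp() sketch above; it is only meant to show the extract / denoise / put-back / average structure.

```python
import numpy as np

def denoise_patch_averaging(Y, Omega, patch=8, k=4):
    """Sparse-code every overlapping patch (here with the omp() sketch above)
    and average the cleaned patches back into the image."""
    H, W = Y.shape
    out = np.zeros_like(Y, dtype=float)
    weight = np.zeros_like(Y, dtype=float)
    for i in range(H - patch + 1):
        for j in range(W - patch + 1):
            p = Y[i:i + patch, j:j + patch].reshape(-1)
            gamma = omp(p, Omega, k)                    # denoise the patch
            clean = (Omega @ gamma).reshape(patch, patch)
            out[i:i + patch, j:j + patch] += clean      # put the patch back
            weight[i:i + patch, j:j + patch] += 1.0
    return out / weight                                  # average the overlaps
```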

16 What is Missing? Many researchers kept revisiting this algorithm with the feeling that key features are still lacking. Here is what WE thought of. The Local-Global Gap: efficient independent local patch processing vs. the global need to model the entire image.

17-20 What is Missing? Whitening the residual image [Romano & Elad 13]: the patch residual $R_i Y - \Omega\hat\gamma_i$ is orthogonal to the reconstructed patch when using the OMP, but this orthogonality is lost due to the patch-averaging.

21-24 What is Missing? SOS-Boosting [Romano & Elad 15]: Denoise, then Strengthen the previous result, Operate the denoiser on it, and Subtract the previous result.

25 What is Missing? SOS-Boosting [Romano & Elad 15]:

$$\hat{X}^{k+1} = f\big(Y + \rho \hat{X}^k\big) - \rho \hat{X}^k$$

Relation to Game Theory: encourage overlapping patches to reach a consensus. Relation to Graph Theory: minimizing a Laplacian regularization functional,

$$\hat{X} = \arg\min_X \ \|X - f(Y)\|_2^2 + \rho\, X^T\big(X - f(X)\big)$$

Why boosting? Since it is guaranteed to be able to improve any denoiser.
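
The SOS-Boosting recursion above translates almost verbatim into code. This sketch treats the denoiser as a black box and leaves the choice of rho and the number of iterations open.

```python
import numpy as np

def sos_boosting(Y, denoiser, rho=1.0, iters=10):
    """SOS boosting: Strengthen, Operate, Subtract,
    X_{k+1} = f(Y + rho * X_k) - rho * X_k, for any black-box denoiser f."""
    X = denoiser(Y)                       # plain denoising as initialization
    for _ in range(iters):
        X = denoiser(Y + rho * X) - rho * X
    return X
```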

26 What is Missing? Exploiting self-similarities [Ram & Elad 13] [Romano, Protter & Elad 14].

27 A multi-scale treatment [Ophir, Lustig & Elad 11] [Sulam, Ophir & Elad 14] [Papyan & Elad 15].

28-30 Leveraging the context of the patch [Romano & Elad 16] [Romano, Isidoro & Milanfar 16]: Con-Patch (a self-similarity feature) and RAISR (upscaling, with edge features: direction, strength, and consistency).

31 Enforcing the local model on the final patches (EPLL): Sparse EPLL [Sulam & Elad 15] (figure: Noisy / K-SVD / EPLL).

32-34 Image synthesis [Ren, Romano & Elad 17].

35 What is Missing? All of the above are missing a theoretical backbone! Why can we use a local prior to solve a global problem?

36 Now, Our Story Takes a Surprising Turn.

37 Convolutional Sparse Coding. Working Locally Thinking Globally: Theoretical Guarantees for Convolutional Sparse Coding (Vardan Papyan, Jeremias Sulam and Michael Elad); Convolutional Dictionary Learning via Local Processing (Vardan Papyan, Yaniv Romano, Jeremias Sulam, and Michael Elad). Related work: [LeCun, Bottou, Bengio and Haffner 98] [Lewicki & Sejnowski 99] [Hashimoto & Kurata 00] [Mørup, Schmidt & Hansen 08] [Zeiler, Krishnan, Taylor & Fergus 10] [Jarrett, Kavukcuoglu, Gregor, LeCun 11] [Heide, Heidrich & Wetzstein 15] [Gu, Zuo, Xie, Meng, Feng & Zhang 15].

38 Intuitively: the signal X is composed of shifted instances of the first filter and of the second filter.

39 Convolutional Sparse Coding (CSC).

40 Convolutional Sparse Representation. Globally $X = D\Gamma$, while every patch satisfies $R_i X = \Omega\gamma_i$: here $\Omega$ is the stripe-dictionary of size $n \times (2n-1)m$ (built from the local dictionary $D_L$ with $m$ filters of length $n$), and $\gamma_i$ is the corresponding stripe vector of length $(2n-1)m$. Adjacent representations overlap, as they skip by $m$ items as we sweep through the patches of $X$.
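
For intuition, the global convolutional dictionary can be built explicitly from the local one. This toy construction uses circular shifts of 1-D filters and is only meant to illustrate the banded structure, not to be run at realistic sizes.

```python
import numpy as np

def conv_dictionary(D_L, N):
    """Build the N x (N*m) global convolutional dictionary from a local one.
    D_L holds m filters of length n in its columns; column i*m + j of the output
    is filter j placed at position i (circular shifts, for simplicity)."""
    n, m = D_L.shape
    D = np.zeros((N, N * m))
    for i in range(N):
        for j in range(m):
            col = np.zeros(N)
            col[:n] = D_L[:, j]
            D[:, i * m + j] = np.roll(col, i)
    return D
```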

41 CSC: Relation to Our Story. A clear global model: every patch has a sparse representation w.r.t. the same local dictionary $\Omega$, just as we have assumed, and there is no notion of disagreement on the patch overlaps. It is related to the current common practice of patch averaging ($R_i^T$ puts the patch $\Omega\gamma_i$ back in the i-th location of the global vector):

$$X = D\Gamma = \frac{1}{n}\sum_i R_i^T \Omega\gamma_i$$

What about the pursuit? Patch averaging performs independent sparse coding for each patch, while CSC should seek all the representations together. Is there a bridge between the two? We will come back to this later. What about the theory? Until recently, little was known regarding the theoretical aspects of CSC.

42 Classical Sparse Theory (Noiseless).

$$(P_0):\quad \min_\Gamma \ \|\Gamma\|_0 \ \text{ s.t. } \ X = D\Gamma$$

Definition (Mutual Coherence): $\mu(D) = \max_{i \ne j} |d_i^T d_j|$ [Donoho & Elad 03]. Theorem: For a signal $X = D\Gamma$, if $\|\Gamma\|_0 < \frac{1}{2}\left(1 + \frac{1}{\mu(D)}\right)$ then this solution is necessarily the sparsest [Donoho & Elad 03]. Theorem: The OMP and BP are guaranteed to recover the true sparse code assuming that $\|\Gamma\|_0 < \frac{1}{2}\left(1 + \frac{1}{\mu(D)}\right)$ [Tropp 04], [Donoho & Elad 03].
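
The mutual coherence appearing in these theorems is straightforward to compute; a small sketch assuming nothing beyond numpy:

```python
import numpy as np

def mutual_coherence(D):
    """mu(D) = max_{i != j} |d_i^T d_j| over l2-normalized columns."""
    Dn = D / np.linalg.norm(D, axis=0, keepdims=True)
    G = np.abs(Dn.T @ Dn)                  # absolute Gram matrix
    np.fill_diagonal(G, 0.0)               # ignore the diagonal (i == j)
    return G.max()

# the classical sparsity bound then reads: ||Gamma||_0 < 0.5 * (1 + 1/mu)
```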

43 CSC: The Need for a Theoretical Study. Assuming that $m = 2$ and $n = 64$, the Welch bound [Welch 74] lower-bounds $\mu(D)$. As a result, uniqueness and success of pursuits is guaranteed only as long as $\|\Gamma\|_0 < \frac{1}{2}\left(1 + \frac{1}{\mu(D)}\right)$: less than 8 non-zeros GLOBALLY are allowed! This is a very pessimistic result. Repeating the above for the noisy case leads to even worse performance predictions. Bottom line: classic SparseLand theory cannot provide good explanations for the CSC model.

44 Moving to Local Sparsity: Stripes. Define the stripe norm $\|\Gamma\|_{0,\infty}^s = \max_i \|\gamma_i\|_0$ and the problem

$$(P_{0,\infty}):\quad \min_\Gamma \ \|\Gamma\|_{0,\infty}^s \ \text{ s.t. } \ X = D\Gamma$$

$\|\Gamma\|_{0,\infty}^s$ being low means that all the $\gamma_i$ are sparse, i.e. every patch has a sparse representation over $\Omega$. If $\Gamma$ is locally sparse [Papyan, Sulam & Elad 16], the solution of $(P_{0,\infty})$ is necessarily unique, and the global OMP and BP are guaranteed to recover it. This result poses a local constraint for a global guarantee, and as such the guarantees scale linearly with the dimension of the signal.
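
The stripe norm can likewise be computed directly. A toy 1-D version, assuming m coefficients per spatial location and circular stripes of 2n-1 locations; the function name and layout are illustrative.

```python
import numpy as np

def l0_inf_stripe(Gamma, N, m, n):
    """||Gamma||_{0,inf}^s: the number of non-zeros in the densest stripe.
    Gamma has N*m entries (m coefficients per location); a stripe gathers the
    coefficients of 2n-1 consecutive locations (circular, for simplicity)."""
    G = Gamma.reshape(N, m)
    counts = []
    for i in range(N):
        idx = [(i + k) % N for k in range(-(n - 1), n)]   # the 2n-1 locations around i
        counts.append(int(np.count_nonzero(G[idx])))
    return max(counts)
```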

45 From Ideal to Noisy Signals. In practice $Y = D\Gamma + E$, where $E$ is due to noise or model deviations, so $\|Y - D\Gamma\|_2 \le \epsilon$. Consider

$$(P_{0,\infty}^\epsilon):\quad \min_\Gamma \ \|\Gamma\|_{0,\infty}^s \ \text{ s.t. } \ \|Y - D\Gamma\|_2 \le \epsilon$$

How close is $\hat\Gamma$ to $\Gamma$? If $\Gamma$ is locally sparse and the noise is bounded [Papyan, Sulam & Elad 16], the solution of $(P_{0,\infty}^\epsilon)$ is stable, the solution obtained via global OMP/BP is stable, and the true and estimated representations are close.

46 Global Pursuit via Local Processing.

$$(P_1^\epsilon):\quad \Gamma_{BP} = \arg\min_\Gamma \ \frac{1}{2}\|Y - D\Gamma\|_2^2 + \xi\|\Gamma\|_1$$

Write $X = D\Gamma = \sum_i R_i^T D_L \alpha_i = \sum_i R_i^T s_i$, where the $s_i = D_L\alpha_i$ are slices, not patches ($D_L$ is the $n \times m$ local dictionary and $\alpha_i \in \mathbb{R}^m$).

47 Global Pursuit via Local Processing (2).

$$(P_1^\epsilon):\quad \Gamma_{BP} = \arg\min_\Gamma \ \frac{1}{2}\|Y - D\Gamma\|_2^2 + \xi\|\Gamma\|_1$$

Using variable splitting ($s_i = D_L\alpha_i$):

$$\min_{\{s_i\},\{\alpha_i\}} \ \frac{1}{2}\Big\|Y - \sum_i R_i^T s_i\Big\|_2^2 + \xi\sum_i \|\alpha_i\|_1 \ \text{ s.t. } \ s_i = D_L\alpha_i$$

These two problems are equivalent, and convex w.r.t. their variables. The new formulation targets the local slices and their sparse representations, and can be solved via ADMM, replacing the constraint with a penalty.

48 Slice-Based Pursuit. Local sparse coding: $\alpha_i = \arg\min_{\alpha_i} \frac{\rho}{2}\|s_i - D_L\alpha_i + u_i\|_2^2 + \xi\|\alpha_i\|_1$. Slice reconstruction: $p_i = \frac{1}{\rho}R_i Y + D_L\alpha_i - u_i$. Slice aggregation: $X = \sum_j R_j^T p_j$. Local Laplacian: $s_i = p_i - \frac{1}{\rho + n}R_i X$. Dual variable update: $u_i = u_i + s_i - D_L\alpha_i$. Comment: one iteration of this procedure amounts to the very same patch-averaging algorithm we started with.
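
The local sparse-coding step above is just a small LASSO on each slice. A minimal ISTA sketch for that subproblem follows; the surrounding slice reconstruction, aggregation and dual updates follow the steps listed on the slide and are not reproduced here, and the function names and iteration count are illustrative.

```python
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def slice_sparse_coding(s_i, u_i, D_L, xi, rho, iters=100):
    """Local sparse-coding step: min_a (rho/2)*||s_i - D_L a + u_i||^2 + xi*||a||_1,
    solved here by plain ISTA on the equivalent problem with target = s_i + u_i."""
    target = s_i + u_i
    a = np.zeros(D_L.shape[1])
    L = np.linalg.norm(D_L, 2) ** 2        # Lipschitz constant of the smooth part
    for _ in range(iters):
        a = soft(a - (D_L.T @ (D_L @ a - target)) / L, (xi / rho) / L)
    return a
```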

49 Two Comments About this Scheme. (1) We work with slices and not patches: comparing patches extracted from natural images with their corresponding slices, observe how the slices are far simpler and are contained within their corresponding patches. (2) The proposed scheme can be used for dictionary ($D_L$) learning: a slice-based DL algorithm built from standard patch-based tools, leading to a faster and simpler method compared to existing ones [Wohlberg 2016].

50 Two Comments About this Scheme (cont.): the same comparison, now also showing the convergence of Heide et al. and Wohlberg [Wohlberg 2016] versus ours.

51 Going Deeper: Convolutional Neural Networks Analyzed via Convolutional Sparse Coding. Joint work with Vardan Papyan and Michael Elad.

52 CSC and CNN. There is an analogy between CSC and CNN: convolutional structure, data-driven models, ReLU is a sparsifying operator, and more. We propose a principled way to analyze CNN* via SparseLand (sparse representation theory). The underlying idea: modeling data sources enables a theoretical analysis of algorithms' performance. But first, a short review of CNN. (*Our analysis holds true for fully connected networks as well.)

53 CNN. (Figure: a two-layer convolutional network on a signal $Y$ of length $N$, with $m_1$ filters of size $n_0$ in $W_1$ and $m_2$ filters of size $n_1$ in $W_2$.) $\mathrm{ReLU}(z) = \max(0, z)$. [LeCun, Bottou, Bengio and Haffner 98] [Krizhevsky, Sutskever & Hinton 12] [Simonyan & Zisserman 14] [He, Zhang, Ren & Sun 15]

54 Mathematically...

$$f\big(Y, \{W_i\}, \{b_i\}\big) = \mathrm{ReLU}\Big(b_2 + W_2^T\, \mathrm{ReLU}\big(b_1 + W_1^T Y\big)\Big)$$

where $Y \in \mathbb{R}^N$, $W_1^T \in \mathbb{R}^{Nm_1 \times N}$, $b_1 \in \mathbb{R}^{Nm_1}$, $W_2^T \in \mathbb{R}^{Nm_2 \times Nm_1}$, $b_2 \in \mathbb{R}^{Nm_2}$, and the output $Z_2 \in \mathbb{R}^{Nm_2}$.
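
This two-layer forward pass is easy to mirror in code; a minimal dense sketch with the same notation (weights and biases treated as plain matrices and vectors):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward_pass(Y, W1, b1, W2, b2):
    """Two-layer forward pass: f(Y) = ReLU(b2 + W2^T ReLU(b1 + W1^T Y))."""
    Z1 = relu(b1 + W1.T @ Y)   # first layer,  Z1 in R^{N*m1}
    Z2 = relu(b2 + W2.T @ Z1)  # second layer, Z2 in R^{N*m2}
    return Z2
```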

55 Training Stage of CNN. Consider the task of classification: given a set of signals $\{Y_j\}_j$ and their corresponding labels $h(Y_j)$, the CNN learns an end-to-end mapping

$$\min_{\{W_i\},\{b_i\},U} \ \sum_j \ell\Big(h(Y_j),\, U,\, f\big(Y_j, \{W_i\}, \{b_i\}\big)\Big)$$

where $h(Y_j)$ is the true label, $U$ is a classifier, and $f(\cdot)$ is the output of the last layer.

56 Back to CSC. Convolutional sparsity (CSC) assumes an inherent structure is present in natural signals: $X \in \mathbb{R}^N$, $D_1 \in \mathbb{R}^{N \times Nm_1}$, $\Gamma_1 \in \mathbb{R}^{Nm_1}$. We propose to impose the same structure on the representations themselves: $\Gamma_1 \in \mathbb{R}^{Nm_1}$, $D_2 \in \mathbb{R}^{Nm_1 \times Nm_2}$, $\Gamma_2 \in \mathbb{R}^{Nm_2}$. This is the Multi-Layer CSC (ML-CSC) model.

57 Intuition: From Atoms to Molecules. Columns in $D_1$ are convolutional atoms; columns in $D_2$ combine the atoms in $D_1$, creating more complex structures. The dictionary $D_1 D_2$ is a superposition of the atoms of $D_1$, and the size of the effective atoms increases throughout the layers (receptive field).

58 Intuition: From Atoms to Molecules (cont.). One could chain the multiplication of all the dictionaries into one effective dictionary, $D_{\mathrm{eff}} = D_1 D_2 D_3 \cdots D_K$, and then $X = D_{\mathrm{eff}}\Gamma_K$ as in SparseLand. However, a key property of this model is the sparsity of each intermediate representation (the feature maps).
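
A toy example of the effective-dictionary view, using small unstructured random matrices as stand-ins for the convolutional dictionaries; note that with a dense random $D_2$ the intermediate representation is not sparse, which is exactly the point the slide makes.

```python
import numpy as np

# a toy (unstructured) ML-CSC signal: X = D1 Gamma1, Gamma1 = D2 Gamma2, Gamma2 sparse
N, m1, m2 = 64, 128, 256
D1 = np.random.randn(N, m1)
D2 = np.random.randn(m1, m2)
Gamma2 = np.zeros(m2)
Gamma2[np.random.choice(m2, 5, replace=False)] = np.random.randn(5)
Gamma1 = D2 @ Gamma2          # intermediate representation (not sparse here,
                              # since this toy D2 is dense rather than sparse)
X = D1 @ Gamma1
D_eff = D1 @ D2               # effective dictionary: X = D_eff Gamma2
assert np.allclose(X, D_eff @ Gamma2)
```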

59 A Small Taste: Model Training (MNIST). MNIST dictionary: $D_1$: 32 filters of size 7×7, with stride of 2 (dense); $D_2$: 128 filters of size ... with stride of ..., ...% sparse; $D_3$: 1024 filters of size ..., ...% sparse. (Figure: $D_1$ (7×7), $D_1 D_2$ (15×15), $D_1 D_2 D_3$ (28×28).)

60 A Small Taste: Pursuit. (Figure: a test signal Y and its estimated representations $\Gamma_1$, $\Gamma_2$, $\Gamma_3$, which are ...% sparse (213 nnz), ...% sparse (30 nnz), and ...% sparse (5 nnz), respectively.)

61 A Small Taste: Pursuit (another example). (Figure: Y and $\Gamma_1$, $\Gamma_2$, $\Gamma_3$, which are ...% sparse (302 nnz), ...% sparse (47 nnz), and ...% sparse (6 nnz), respectively.)

62 A Small Taste: Model Training (CIFAR). CIFAR dictionary: $D_1$: 64 filters of size 5×5×3, stride of 2, dense; $D_2$: 256 filters of size 5×5×64, stride of ..., ...% sparse; $D_3$: 1024 filters of size 5×5×..., ...% sparse. (Figure: $D_1$ (5×5×3), $D_1 D_2$ (13×13), $D_1 D_2 D_3$ (32×32).)

63 Deep Coding Problem (DCP). Noiseless pursuit: find $\{\Gamma_j\}_{j=1}^K$ s.t. $X = D_1\Gamma_1$, $\|\Gamma_1\|_{0,\infty}^s \le \lambda_1$; $\Gamma_1 = D_2\Gamma_2$, $\|\Gamma_2\|_{0,\infty}^s \le \lambda_2$; ...; $\Gamma_{K-1} = D_K\Gamma_K$, $\|\Gamma_K\|_{0,\infty}^s \le \lambda_K$. Noisy pursuit: find $\{\Gamma_j\}_{j=1}^K$ s.t. $\|Y - D_1\Gamma_1\|_2 \le \mathcal{E}_0$, $\|\Gamma_1\|_{0,\infty}^s \le \lambda_1$; $\|\Gamma_1 - D_2\Gamma_2\|_2 \le \mathcal{E}_1$, $\|\Gamma_2\|_{0,\infty}^s \le \lambda_2$; ...; $\|\Gamma_{K-1} - D_K\Gamma_K\|_2 \le \mathcal{E}_{K-1}$, $\|\Gamma_K\|_{0,\infty}^s \le \lambda_K$.

64 Deep Learning Problem (DLP). Supervised dictionary learning (task-driven dictionary learning [Mairal, Bach, Sapiro & Ponce 12]):

$$\min_{\{D_i\}_{i=1}^K,\, U} \ \sum_{j=1}^J \ell\Big(h(Y_j),\, U,\, \mathrm{DCP}^\star\big(Y_j, \{D_i\}\big)\Big)$$

where $h(Y_j)$ is the true label, $U$ a classifier, and $\mathrm{DCP}^\star$ the deepest representation obtained from the DCP. Unsupervised dictionary learning: find $\{D_i\}_{i=1}^K$ s.t. for every $j$, $\|Y_j - D_1\Gamma_1^j\|_2 \le \mathcal{E}_0$, $\|\Gamma_1^j - D_2\Gamma_2^j\|_2 \le \mathcal{E}_1$, ..., $\|\Gamma_{K-1}^j - D_K\Gamma_K^j\|_2 \le \mathcal{E}_{K-1}$, with $\|\Gamma_i^j\|_{0,\infty}^s \le \lambda_i$.

65 ML-CSC: The Simplest Pursuit. The simplest pursuit algorithm (single-layer case) is the THR algorithm, which operates on a given input signal $Y = D\Gamma + E$ (with $\Gamma$ sparse) by

$$\hat\Gamma = \mathcal{P}_\beta\big(D^T Y\big)$$

Restricting the coefficients to be non-negative does not restrict the expressiveness of the model; indeed, ReLU = soft non-negative thresholding.
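
The ReLU / soft non-negative thresholding identity is a one-liner; beta plays the role of a negated bias. A minimal sketch:

```python
import numpy as np

def soft_nonneg_threshold(z, beta):
    """Soft non-negative thresholding: keep z - beta where positive, else zero.
    With beta = -b this is exactly ReLU(b + z)."""
    return np.maximum(z - beta, 0.0)

# single-layer THR pursuit for Y = D Gamma + E:
# Gamma_hat = soft_nonneg_threshold(D.T @ Y, beta)
```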

66 Consider this for Solving the DCP. Layered Thresholding (LT): estimate $\Gamma_1$ via the THR algorithm, then estimate $\Gamma_2$ via the THR algorithm,

$$\hat\Gamma_2 = \mathcal{P}_{\beta_2}\Big(D_2^T\, \mathcal{P}_{\beta_1}\big(D_1^T Y\big)\Big)$$

as an approximation of $\mathrm{DCP}_\lambda^{\mathcal{E}}$: find $\{\Gamma_j\}_{j=1}^K$ s.t. $\|Y - D_1\Gamma_1\|_2 \le \mathcal{E}_0$, $\|\Gamma_1\|_{0,\infty}^s \le \lambda_1$; ...; $\|\Gamma_{K-1} - D_K\Gamma_K\|_2 \le \mathcal{E}_{K-1}$, $\|\Gamma_K\|_{0,\infty}^s \le \lambda_K$. Forward pass of CNN:

$$f(X) = \mathrm{ReLU}\big(b_2 + W_2^T\, \mathrm{ReLU}(b_1 + W_1^T Y)\big)$$

Correspondence: forward pass $f$ = layered soft non-negative THR (DCP); ReLU = soft non-negative THR; bias $b$ = thresholds $\beta$; weights $W$ = dictionary $D$.

67 Consider this for Solving the DCP (cont.). The layered (soft non-negative) thresholding and the forward pass algorithm are the very same thing!
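
Accordingly, the layered soft non-negative thresholding, i.e. the forward pass in this interpretation, can be sketched as a simple loop over the layers; dictionaries and thresholds are placeholders.

```python
import numpy as np

def layered_thresholding(Y, dictionaries, betas):
    """Layered soft non-negative thresholding (the forward pass in this view):
    Gamma_i = P_{beta_i}(D_i^T Gamma_{i-1}), starting from Gamma_0 = Y."""
    Gamma = Y
    for D, beta in zip(dictionaries, betas):
        Gamma = np.maximum(D.T @ Gamma - beta, 0.0)
    return Gamma
```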

68 Consider this for Solving the DLP. DLP (supervised):

$$\min_{\{D_i\}_{i=1}^K,\, U} \ \sum_j \ell\Big(h(Y_j),\, U,\, \mathrm{DCP}^\star\big(Y_j, \{D_i\}\big)\Big)$$

where the DCP is estimated via the layered THR algorithm, and the thresholds for the DCP should also be learned. CNN training:

$$\min_{\{W_i\},\{b_i\},U} \ \sum_j \ell\Big(h(Y_j),\, U,\, f\big(Y_j, \{W_i\}, \{b_i\}\big)\Big)$$

CNN language: forward pass $f$; SparseLand language: layered soft non-negative THR (DCP).

69 Consider this for Solving the DLP (cont.). The problem solved by the training stage of CNN and the DLP* are equivalent as well, assuming that the DCP is approximated via the layered thresholding algorithm. (*Recall that for the ML-CSC there exists an unsupervised avenue for training the dictionaries, which has no simple parallel in CNN.)

70 Theoretical Questions. Suppose $X = D_1\Gamma_1$, $\Gamma_1 = D_2\Gamma_2$, ..., $\Gamma_{K-1} = D_K\Gamma_K$, where each $\Gamma_i$ is $\ell_{0,\infty}$-sparse, and we observe $Y = X + E$. Solving $\mathrm{DCP}_\lambda^{\mathcal{E}}$ via Layered Thresholding (the forward pass) yields estimates $\{\hat\Gamma_i\}_{i=1}^K$: what can be guaranteed about them? What else can be said?

71 Theoretical Path: Possible Questions. Having established the importance of the ML-CSC model and its associated pursuit, the DCP problem, we now turn to its analysis. The main questions we aim to address: I. Uniqueness of the solution (set of representations) to the $\mathrm{DCP}_\lambda$? II. Stability of the solution to the $\mathrm{DCP}_\lambda^{\mathcal{E}}$ problem? III. Stability of the solution obtained via the hard and soft layered THR algorithms (forward pass)? IV. Limitations of this (very simple) algorithm, and alternative pursuits? V. Algorithms for training the dictionaries $\{D_i\}_{i=1}^K$ vs. CNN training? VI. New insights on how to operate on signals via CNN?

72 Uniqueness of $\mathrm{DCP}_\lambda$. $\mathrm{DCP}_\lambda$: find a set of representations satisfying $X = D_1\Gamma_1$, $\|\Gamma_1\|_{0,\infty}^s \le \lambda_1$; ...; $\Gamma_{K-1} = D_K\Gamma_K$, $\|\Gamma_K\|_{0,\infty}^s \le \lambda_K$. Is this set unique? Theorem: If a set of solutions $\{\Gamma_i\}_{i=1}^K$ is found for $\mathrm{DCP}_\lambda$ such that

$$\|\Gamma_i\|_{0,\infty}^s = \lambda_i < \frac{1}{2}\left(1 + \frac{1}{\mu(D_i)}\right)$$

then these are necessarily the unique solution to this problem. The feature maps CNN aims to recover are unique. (Mutual coherence: $\mu(D) = \max_{i\ne j}|d_i^T d_j|$ [Donoho & Elad 03].)

73 Stability of $\mathrm{DCP}_\lambda^{\mathcal{E}}$. $\mathrm{DCP}_\lambda^{\mathcal{E}}$: find a set of representations satisfying $\|Y - D_1\Gamma_1\|_2 \le \mathcal{E}_0$, $\|\Gamma_1\|_{0,\infty}^s \le \lambda_1$; ...; $\|\Gamma_{K-1} - D_K\Gamma_K\|_2 \le \mathcal{E}_{K-1}$, $\|\Gamma_K\|_{0,\infty}^s \le \lambda_K$. Is this set stable? Suppose that we manage to solve the $\mathrm{DCP}_\lambda^{\mathcal{E}}$ and find a feasible set of representations $\{\hat\Gamma_i\}_{i=1}^K$ satisfying all the conditions. The question we pose is: how close is $\hat\Gamma_i$ to $\Gamma_i$?

74 Stability of $\mathrm{DCP}_\lambda^{\mathcal{E}}$ (cont.). Theorem: If the true representations $\{\Gamma_i\}_{i=1}^K$ satisfy

$$\|\Gamma_i\|_{0,\infty}^s \le \lambda_i < \frac{1}{2}\left(1 + \frac{1}{\mu(D_i)}\right)$$

then the set of solutions $\{\hat\Gamma_i\}_{i=1}^K$ obtained by solving this problem (somehow) with $\mathcal{E}_0 = \|E\|_2$ and $\mathcal{E}_i = 0$ ($i \ge 1$) must obey

$$\|\hat\Gamma_i - \Gamma_i\|_2^2 \ \le\ 4\|E\|_2^2 \prod_{j=1}^{i} \frac{1}{1 - (2\lambda_j - 1)\mu(D_j)}$$

The problem CNN aims to solve is stable under certain conditions. Observe, however, this annoying effect of error magnification as we dive into the model.

75 Local Noise Assumption. Our analysis so far relied on the local sparsity of the underlying solution $\Gamma$, enforced through the $\ell_{0,\infty}$ norm. In what follows, we present stability guarantees that also depend on the local energy of the noise vector $E$. This will be enforced via the $\ell_{2,\infty}$ norm, defined as $\|E\|_{2,\infty}^p = \max_i \|R_i E\|_2$.

76 Stability of Layered-THR. Theorem: If

$$\|\Gamma_i\|_{0,\infty}^s < \frac{1}{2}\left(1 + \frac{1}{\mu(D_i)}\cdot\frac{|\Gamma_i^{\min}|}{|\Gamma_i^{\max}|}\right) - \frac{1}{\mu(D_i)}\cdot\frac{\varepsilon_L^{i-1}}{|\Gamma_i^{\max}|}$$

then the layered hard THR (with the proper thresholds) will find the correct supports*, and $\|\hat\Gamma_i^{LT} - \Gamma_i\|_{2,\infty}^p \le \varepsilon_L^i$, where we have defined $\varepsilon_L^0 = \|E\|_{2,\infty}^p$ and

$$\varepsilon_L^i = \sqrt{\|\Gamma_i\|_{0,\infty}^p}\,\Big(\varepsilon_L^{i-1} + \mu(D_i)\big(\|\Gamma_i\|_{0,\infty}^s - 1\big)\,|\Gamma_i^{\max}|\Big)$$

The stability of the forward pass is guaranteed if the underlying representations are locally sparse and the noise is locally bounded. (*Least-squares update of the non-zeros?)

77 Limitations of the Forward Pass. The stability analysis reveals several inherent limitations of the forward pass (a.k.a. layered THR) algorithm: even in the noiseless case, the forward pass is incapable of recovering the perfect solution of the DCP problem; its success depends on the ratio $|\Gamma_i^{\min}| / |\Gamma_i^{\max}|$, a direct consequence of relying on a simple thresholding operator; and the distance between the true sparse vector and the estimated one increases exponentially as a function of the layer depth. Next we propose a new algorithm that attempts to solve some of these problems.

78 Special Case: Sparse Dictionaries. Throughout the theoretical study we assumed that the representations in the different layers are $\ell_{0,\infty}$-sparse. Do we know of a simple example of a set of dictionaries $\{D_i\}_{i=1}^K$ and corresponding signals $X$ that obey this property? Assuming the dictionaries are sparse:

$$\|\Gamma_j\|_{0,\infty}^s \ \le\ \|\Gamma_K\|_{0,\infty}^s \prod_{i=j+1}^{K} \|D_i\|_0$$

where $\|D_i\|_0$ is the maximal number of non-zeros in an atom of $D_i$. In the context of CNN, the above happens if a sparsity-promoting regularization, such as the $\ell_1$, is employed on the filters.

79 Better Pursuit? $\mathrm{DCP}_\lambda$: find a set of representations satisfying $X = D_1\Gamma_1$, $\|\Gamma_1\|_{0,\infty}^s \le \lambda_1$; ...; $\Gamma_{K-1} = D_K\Gamma_K$, $\|\Gamma_K\|_{0,\infty}^s \le \lambda_K$. So far we proposed the layered THR:

$$\hat\Gamma_K = \mathcal{P}_{\beta_K}\Big(D_K^T \cdots \mathcal{P}_{\beta_2}\big(D_2^T\, \mathcal{P}_{\beta_1}(D_1^T X)\big)\Big)$$

The motivation is clear: getting close to what CNN use. However, this is the simplest and weakest pursuit known in the field of sparsity. Can we offer something better?

80 Layered Basis Pursuit (Noiseless). For the $\mathrm{DCP}_\lambda$ above, we can propose a Layered Basis Pursuit algorithm:

$$\hat\Gamma_1^{LBP} = \arg\min_{\Gamma_1} \ \|\Gamma_1\|_1 \ \text{ s.t. } \ X = D_1\Gamma_1$$
$$\hat\Gamma_2^{LBP} = \arg\min_{\Gamma_2} \ \|\Gamma_2\|_1 \ \text{ s.t. } \ \hat\Gamma_1^{LBP} = D_2\Gamma_2$$

(cf. deconvolutional networks [Zeiler, Krishnan, Taylor & Fergus 10]).

81 Guarantee for Success of Layered BP. As opposed to prior work in CNN, we can do far more than just propose an algorithm: we can analyze its conditions for success. Theorem: If a set of representations $\{\Gamma_i\}_{i=1}^K$ of the Multi-Layered CSC model satisfies

$$\|\Gamma_i\|_{0,\infty}^s \le \lambda_i < \frac{1}{2}\left(1 + \frac{1}{\mu(D_i)}\right)$$

then the Layered BP is guaranteed to find them. Consequences: the Layered BP can retrieve the underlying representations in the noiseless case, a task in which the forward pass fails; and the Layered BP's success does not depend on the ratio $|\Gamma_i^{\min}| / |\Gamma_i^{\max}|$.

82 Layered Basis Pursuit (Noisy). For the $\mathrm{DCP}_\lambda^{\mathcal{E}}$ above, similarly to the noiseless case, this can also be solved via the Layered Basis Pursuit algorithm, but in a Lagrangian form:

$$\hat\Gamma_1^{LBP} = \arg\min_{\Gamma_1} \ \frac{1}{2}\|Y - D_1\Gamma_1\|_2^2 + \xi_1\|\Gamma_1\|_1$$
$$\hat\Gamma_2^{LBP} = \arg\min_{\Gamma_2} \ \frac{1}{2}\|\hat\Gamma_1^{LBP} - D_2\Gamma_2\|_2^2 + \xi_2\|\Gamma_2\|_1$$

83 Stability of Layered BP. Theorem: Assuming that

$$\|\Gamma_i\|_{0,\infty}^s \le \frac{1}{3}\left(1 + \frac{1}{\mu(D_i)}\right)$$

then, for correctly chosen $\{\xi_i\}_{i=1}^K$, we are guaranteed that: 1. The support of $\hat\Gamma_i^{LBP}$ is contained in that of $\Gamma_i$. 2. The error is bounded, $\|\hat\Gamma_i^{LBP} - \Gamma_i\|_{2,\infty}^p \le \varepsilon_L^i$, where $\varepsilon_L^i = 7.5^i\, \|E\|_{2,\infty}^p \prod_{j=1}^{i} \sqrt{\|\Gamma_j\|_{0,\infty}^p}$. 3. Every entry of $\Gamma_i$ greater than $\varepsilon_L^i / \sqrt{\|\Gamma_i\|_{0,\infty}^p}$ will be found.

84 Layered Iterative Thresholding. Layered BP:

$$\hat\Gamma_j^{LBP} = \arg\min_{\Gamma_j} \ \frac{1}{2}\|\hat\Gamma_{j-1}^{LBP} - D_j\Gamma_j\|_2^2 + \xi_j\|\Gamma_j\|_1 \quad \forall j$$

Layered Iterative Soft-Thresholding:

$$\Gamma_j^t = \mathcal{S}_{\xi_j/c_j}\Big(\Gamma_j^{t-1} + \frac{1}{c_j} D_j^T\big(\hat\Gamma_{j-1} - D_j\Gamma_j^{t-1}\big)\Big) \quad \forall j, t$$

with $c_i > 0.5\,\lambda_{\max}(D_i^T D_i)$. Note that our suggestion implies that groups of layers share the same dictionaries, and it can be seen as a recurrent neural network [Gregor & LeCun 10].
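
A minimal sketch of the layered iterative soft-thresholding, solving each layer's Lagrangian BP by plain ISTA before moving to the next layer; dictionaries, thresholds and the iteration count are placeholders, and the step constant is taken as the spectral norm squared (which satisfies the condition on $c_i$).

```python
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def layered_ista(Y, dictionaries, xis, inner_iters=100):
    """Layered iterative soft-thresholding: each layer approximately solves its
    Lagrangian BP, 0.5*||prev - D_j Gamma_j||^2 + xi_j*||Gamma_j||_1, via ISTA,
    and passes its estimate on to the next layer."""
    prev = Y
    estimates = []
    for D, xi in zip(dictionaries, xis):
        c = np.linalg.norm(D, 2) ** 2 + 1e-12   # step constant, >= lambda_max(D^T D)
        G = np.zeros(D.shape[1])
        for _ in range(inner_iters):
            G = soft(G + (D.T @ (prev - D @ G)) / c, xi / c)
        estimates.append(G)
        prev = G
    return estimates
```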

85 This Talk: Independent patch-processing, Local Sparsity. We described the limitations of patch-based processing and ways to overcome some of them.

86 This Talk: Convolutional Sparse Coding. We presented a theoretical study of the CSC and a practical algorithm that works locally.

87 This Talk: Convolutional Neural Networks. We mentioned several interesting connections between CSC and CNN, and this led us to...

88 This Talk: Multi-Layer Convolutional Sparse Coding. ...propose and analyze a multi-layer extension of CSC, shown to be tightly connected to CNN.

89 This Talk: The ML-CSC was shown to enable a theoretical study of CNN, along with new insights: an extension of the classical sparse theory to a multi-layer setting, and a novel interpretation and theoretical understanding of CNN.

90 This Talk: The underlying idea: modeling the data source in order to be able to theoretically analyze algorithms' performance.

91 Questions?


More information

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18 CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$

More information

Minimax Reconstruction Risk of Convolutional Sparse Dictionary Learning

Minimax Reconstruction Risk of Convolutional Sparse Dictionary Learning Minimax Reconstruction Risk of Convolutional Sparse Dictionary Learning Shashank Singh Barnabás Póczos Jian Ma bapoczos@cs.cmu.edu Machine Learning Department Carnegie Mellon University sss1@cs.cmu.edu

More information

Double Sparsity: Learning Sparse Dictionaries for Sparse Signal Approximation

Double Sparsity: Learning Sparse Dictionaries for Sparse Signal Approximation IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. X, NO. X, XX 200X 1 Double Sparsity: Learning Sparse Dictionaries for Sparse Signal Approximation Ron Rubinstein, Student Member, IEEE, Michael Zibulevsky,

More information

Pre-weighted Matching Pursuit Algorithms for Sparse Recovery

Pre-weighted Matching Pursuit Algorithms for Sparse Recovery Journal of Information & Computational Science 11:9 (214) 2933 2939 June 1, 214 Available at http://www.joics.com Pre-weighted Matching Pursuit Algorithms for Sparse Recovery Jingfei He, Guiling Sun, Jie

More information

Convolutional neural networks

Convolutional neural networks 11-1: Convolutional neural networks Prof. J.C. Kao, UCLA Convolutional neural networks Motivation Biological inspiration Convolution operation Convolutional layer Padding and stride CNN architecture 11-2:

More information

Artificial Neural Networks D B M G. Data Base and Data Mining Group of Politecnico di Torino. Elena Baralis. Politecnico di Torino

Artificial Neural Networks D B M G. Data Base and Data Mining Group of Politecnico di Torino. Elena Baralis. Politecnico di Torino Artificial Neural Networks Data Base and Data Mining Group of Politecnico di Torino Elena Baralis Politecnico di Torino Artificial Neural Networks Inspired to the structure of the human brain Neurons as

More information

Machine Learning And Applications: Supervised Learning-SVM

Machine Learning And Applications: Supervised Learning-SVM Machine Learning And Applications: Supervised Learning-SVM Raphaël Bournhonesque École Normale Supérieure de Lyon, Lyon, France raphael.bournhonesque@ens-lyon.fr 1 Supervised vs unsupervised learning Machine

More information

How to do backpropagation in a brain

How to do backpropagation in a brain How to do backpropagation in a brain Geoffrey Hinton Canadian Institute for Advanced Research & University of Toronto & Google Inc. Prelude I will start with three slides explaining a popular type of deep

More information

Sparse Solutions of Linear Systems of Equations and Sparse Modeling of Signals and Images!

Sparse Solutions of Linear Systems of Equations and Sparse Modeling of Signals and Images! Sparse Solutions of Linear Systems of Equations and Sparse Modeling of Signals and Images! Alfredo Nava-Tudela John J. Benedetto, advisor 1 Happy birthday Lucía! 2 Outline - Problem: Find sparse solutions

More information

One Picture and a Thousand Words Using Matrix Approximtions October 2017 Oak Ridge National Lab Dianne P. O Leary c 2017

One Picture and a Thousand Words Using Matrix Approximtions October 2017 Oak Ridge National Lab Dianne P. O Leary c 2017 One Picture and a Thousand Words Using Matrix Approximtions October 2017 Oak Ridge National Lab Dianne P. O Leary c 2017 1 One Picture and a Thousand Words Using Matrix Approximations Dianne P. O Leary

More information

2 Regularized Image Reconstruction for Compressive Imaging and Beyond

2 Regularized Image Reconstruction for Compressive Imaging and Beyond EE 367 / CS 448I Computational Imaging and Display Notes: Compressive Imaging and Regularized Image Reconstruction (lecture ) Gordon Wetzstein gordon.wetzstein@stanford.edu This document serves as a supplement

More information

Infinite Ensemble Learning with Support Vector Machinery

Infinite Ensemble Learning with Support Vector Machinery Infinite Ensemble Learning with Support Vector Machinery Hsuan-Tien Lin and Ling Li Learning Systems Group, California Institute of Technology ECML/PKDD, October 4, 2005 H.-T. Lin and L. Li (Learning Systems

More information

Designing Information Devices and Systems I Discussion 13B

Designing Information Devices and Systems I Discussion 13B EECS 6A Fall 7 Designing Information Devices and Systems I Discussion 3B. Orthogonal Matching Pursuit Lecture Orthogonal Matching Pursuit (OMP) algorithm: Inputs: A set of m songs, each of length n: S

More information

Bias-free Sparse Regression with Guaranteed Consistency

Bias-free Sparse Regression with Guaranteed Consistency Bias-free Sparse Regression with Guaranteed Consistency Wotao Yin (UCLA Math) joint with: Stanley Osher, Ming Yan (UCLA) Feng Ruan, Jiechao Xiong, Yuan Yao (Peking U) UC Riverside, STATS Department March

More information

Compressed Sensing and Neural Networks

Compressed Sensing and Neural Networks and Jan Vybíral (Charles University & Czech Technical University Prague, Czech Republic) NOMAD Summer Berlin, September 25-29, 2017 1 / 31 Outline Lasso & Introduction Notation Training the network Applications

More information