A Quest for a Universal Model for Signals: From Sparsity to ConvNets
1 A Quest for a Universal Model for Signals: From Sparsity to ConvNets Yaniv Romano The Electrical Engineering Department Technion Israel Institute of Technology Joint work with Vardan Papyan, Jeremias Sulam and Prof. Michael Elad The research leading to these results has received funding from the European Union's Seventh Framework Program (FP/ ) ERC grant Agreement ERC-SPARSE
2 Local Sparsity = Signal Dictionary (learned) Sparse vector
3 Independent patch-processing Local Sparsity =
5 Independent patch-processing Local Sparsity Convolutional Sparse Coding =
6 Independent patch-processing Local Sparsity Convolutional Sparse Coding Multi-Layer Convolutional Sparse Coding
7 Independent patch-processing Local Sparsity Convolutional Neural Networks Convolutional Sparse Coding Multi-Layer Convolutional Sparse Coding
8 Independent patch-processing Convolutional Sparse Coding Local Sparsity! Convolutional Neural Networks Multi-Layer Convolutional Sparse Coding The forward pass is a sparse-coding algorithm, serving our model Forward pass Extension of the classical sparse theory to a multi-layer setting
9 Our Story Begins with 9
10 Image Denoising Original Image X + White Gaussian Noise E = Noisy Image Y Many (thousands of) image denoising algorithms can be cast as the minimization of an energy function of the form min_X (1/2)‖X − Y‖₂² + G(X), where the first term expresses the relation to the measurements and G(X) is the prior or regularization
11 Leading Image Denoising Methods are built upon powerful patch-based local models: Popular local models: GMM Sparse-Representation Example-based Low-rank Field-of-Experts & Neural networks Working locally allows us to learn the model, e.g. dictionary learning 11
12 Why is it so Popular? It is the simplest inverse problem, revealing the limitations of the model. It comes up in many applications, and other inverse problems can be recast as iterative denoising [Zoran & Weiss 11] [Venkatakrishnan, Bouman & Wohlberg 13] [Brifman, Romano & Elad 16] In a recent work we show that a denoiser f(X) can form a regularizer: Regularization by Denoising (RED) [Romano, Elad & Milanfar 16]: min_X (1/2)‖HX − Y‖₂² + (ρ/2) Xᵀ(X − f(X)), where the first term is the relation to the measurements and the second is the prior. Under simple conditions on f(X), the gradient of the regularizer G(X) = (ρ/2) Xᵀ(X − f(X)) is simply ρ(X − f(X))
13 The Sparse-Land Model Assumes that every patch is a linear combination of a few columns, called atoms, taken from a matrix called a dictionary Ω of size n × m (m > n). The operator R_i extracts the i-th n-dimensional patch from X ∈ R^N Sparse coding: (P₀): min_{γ_i} ‖γ_i‖₀ s.t. R_i X = Ω γ_i
14 Patch Denoising Given a noisy patch R_i Y, solve (P₀^ε): γ̂_i = argmin_{γ_i} ‖γ_i‖₀ s.t. ‖R_i Y − Ω γ_i‖₂ ≤ ε Clean patch estimate: Ω γ̂_i (P₀) and (P₀^ε) are hard to solve exactly. Approximations: greedy methods such as Orthogonal Matching Pursuit (OMP) or Thresholding, and convex relaxations such as Basis Pursuit (BP): (P₁^ε): min_{γ_i} ‖γ_i‖₁ + ξ‖R_i Y − Ω γ_i‖₂²
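The greedy pursuit mentioned above can be sketched in a few lines of NumPy. This is a minimal illustration of OMP run on a synthetic noisy patch, not the authors' code; the dictionary size, sparsity level, and noise level are arbitrary choices for the demo.

```python
# Minimal OMP sketch for denoising one patch (illustrative sizes/seed).
import numpy as np

def omp(Omega, y, k):
    """Greedy OMP: pick k atoms that best explain the current residual."""
    n, m = Omega.shape
    support, residual = [], y.copy()
    gamma = np.zeros(m)
    for _ in range(k):
        # Atom most correlated with the residual
        j = int(np.argmax(np.abs(Omega.T @ residual)))
        support.append(j)
        # Least-squares fit of y on the selected atoms
        coef, *_ = np.linalg.lstsq(Omega[:, support], y, rcond=None)
        residual = y - Omega[:, support] @ coef
    gamma[support] = coef
    return gamma

rng = np.random.default_rng(0)
n, m, k = 64, 128, 4                     # patch dim, #atoms, sparsity
Omega = rng.standard_normal((n, m))
Omega /= np.linalg.norm(Omega, axis=0)   # unit-norm atoms
gamma_true = np.zeros(m)
gamma_true[rng.choice(m, k, replace=False)] = rng.standard_normal(k)
y = Omega @ gamma_true + 0.01 * rng.standard_normal(n)  # noisy patch R_i Y
gamma_hat = omp(Omega, y, k)
denoised = Omega @ gamma_hat             # clean patch estimate
```

In the K-SVD denoising scheme described next, this per-patch step is repeated for every overlapping patch and the results are averaged.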
15 Recall K-SVD Denoising [Elad & Aharon 06] Noisy Image and Initial Dictionary: update the dictionary using K-SVD, denoise each patch using OMP, obtain the Reconstructed Image Despite its simplicity, this is a very well-performing algorithm We refer to this framework as patch averaging A modification of this method leads to state-of-the-art results [Mairal, Bach, Ponce, Sapiro, Zisserman '09]
16 What is? Many researchers kept revisiting this algorithm with a feeling that key features are still lacking Here is what WE thought of The Local-Global Gap: Efficient Independent Local Patch Processing VS. The Global Need to Model The Entire Image 16
17 What is? Many researchers kept revisiting this algorithm with a feeling that key features are still lacking Here is what WE thought of Whitening the residual image [Romano & Elad 13] R i Y Ωγ i Y 17
18 What is? Many researchers kept revisiting this algorithm with a feeling that key features are still lacking Here is what WE thought of Whitening the residual image [Romano & Elad 13] R i Y Ωγ i Orthogonal (when using the OMP) Y X 18
19 What is? Many researchers kept revisiting this algorithm with a feeling that key features are still lacking Here is what WE thought of Whitening the residual image [Romano & Elad 13] R i Y Ωγ i Orthogonal (when using the OMP) The orthogonality is lost due to patch averaging Y X 19
21 What is? Many researchers kept revisiting this algorithm with a feeling that key features are still lacking Here is what WE thought of Whitening the residual image [Romano & Elad 13] SOS-Boosting [Romano & Elad 15] Denoise 21
22 What is? Many researchers kept revisiting this algorithm with a feeling that key features are still lacking Here is what WE thought of Whitening the residual image [Romano & Elad 13] SOS-Boosting [Romano & Elad 15] Denoise Strengthen Previous Result 22
23 What is? Many researchers kept revisiting this algorithm with a feeling that key features are still lacking Here is what WE thought of Whitening the residual image [Romano & Elad 13] SOS-Boosting [Romano & Elad 15] Denoise Strengthen Operate Previous Result 23
24 What is? Many researchers kept revisiting this algorithm with a feeling that key features are still lacking Here is what WE thought of Whitening the residual image [Romano & Elad 13] SOS-Boosting [Romano & Elad 15] Denoise Strengthen Operate Subtract Previous Result 24
25 What is? Many researchers kept revisiting this algorithm with a feeling that key features are still lacking Here is what WE thought of Whitening the residual image [Romano & Elad 13] SOS-Boosting [Romano & Elad 15]: X̂^{k+1} = f(Y + ρX̂^k) − ρX̂^k Relation to Game Theory: encourages overlapping patches to reach a consensus Relation to Graph Theory: minimizes a Laplacian regularization functional X̂ = argmin_X ‖X − f(Y)‖₂² + ρ Xᵀ(X − f(X)) Why boosting? Since it is guaranteed to be able to improve any denoiser
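The SOS iteration X̂^{k+1} = f(Y + ρX̂^k) − ρX̂^k (Strengthen, Operate, Subtract) can be demonstrated on a toy 1-D signal. The moving-average denoiser f below is an illustrative stand-in for any black-box denoiser, and ρ = 0.5 and the iteration count are hypothetical choices, not values from the talk.

```python
# Toy SOS-boosting sketch: wrap a simple moving-average denoiser.
import numpy as np

def f(x, width=5):
    """Stand-in denoiser: moving average with edge padding."""
    pad = width // 2
    xp = np.pad(x, pad, mode='edge')
    kernel = np.ones(width) / width
    return np.convolve(xp, kernel, mode='valid')

rng = np.random.default_rng(1)
clean = np.sin(np.linspace(0, 4 * np.pi, 256))
y = clean + 0.3 * rng.standard_normal(256)

rho = 0.5
x = f(y)                       # plain denoising as the starting point
for _ in range(10):            # SOS: strengthen, operate, subtract
    x = f(y + rho * x) - rho * x

mse_plain = np.mean((f(y) - clean) ** 2)
mse_sos = np.mean((x - clean) ** 2)
```

The boosted iterate re-injects the previous estimate into the denoiser input, which (for this smoothing denoiser) suppresses residual noise more aggressively while barely touching the slowly varying signal.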
26 What is? Many researchers kept revisiting this algorithm with a feeling that key features are still lacking Here is what WE thought of Whitening the residual image [Romano & Elad 13] SOS-Boosting [Romano & Elad 15] Exploiting self-similarities [Ram & Elad 13] [Romano, Protter & Elad 14] 26
27 What is? Many researchers kept revisiting this algorithm with a feeling that key features are still lacking Here is what WE thought of Whitening the residual image [Romano & Elad 13] SOS-Boosting [Romano & Elad 15] Exploiting self-similarities [Ram & Elad 13] [Romano, Protter & Elad 14] A multi-scale treatment [Ophir, Lustig & Elad 11] [Sulam, Ophir & Elad 14] [Papyan & Elad 15] 27
28 What is? Many researchers kept revisiting this algorithm with a feeling that key features are still lacking Here is what WE thought of Whitening the residual image [Romano & Elad 13] SOS-Boosting [Romano & Elad 15] Exploiting self-similarities [Ram & Elad 13] [Romano, Protter & Elad 14] A multi-scale treatment [Ophir, Lustig & Elad 11] [Sulam, Ophir & Elad 14] [Papyan & Elad 15] Leveraging the context of the patch [Romano & Elad 16] [Romano, Isidoro & Milanfar 16] Con-Patch Self-Similarity feature 28
29 What is? Many researchers kept revisiting this algorithm with a feeling that key features are still lacking Here is what WE thought of Whitening the residual image [Romano & Elad 13] SOS-Boosting [Romano & Elad 15] Exploiting self-similarities [Ram & Elad 13] [Romano, Protter & Elad 14] A multi-scale treatment [Ophir, Lustig & Elad 11] [Sulam, Ophir & Elad 14] [Papyan & Elad 15] Leveraging the context of the patch [Romano & Elad 16] [Romano, Isidoro & Milanfar 16] Con-Patch RAISR (Upscaling) Edge Features: Direction, Strength, Consistency 29
31 What is? Many researchers kept revisiting this algorithm with a feeling that key features are still lacking Here is what WE thought of Whitening the residual image [Romano & Elad 13] SOS-Boosting [Romano & Elad 15] Exploiting self-similarities [Ram & Elad 13] [Romano, Protter & Elad 14] A multi-scale treatment [Ophir, Lustig & Elad 11] [Sulam, Ophir & Elad 14] [Papyan & Elad 15] Leveraging the context of the patch [Romano & Elad 16] [Romano, Isidoro & Milanfar 16] Enforcing the local model on the final patches (EPLL) Sparse EPLL [Sulam & Elad 15] Noisy K-SVD EPLL 31
32 What is? Many researchers kept revisiting this algorithm with a feeling that key features are still lacking Here is what WE thought of Whitening the residual image [Romano & Elad 13] SOS-Boosting [Romano & Elad 15] Exploiting self-similarities [Ram & Elad 13] [Romano, Protter & Elad 14] A multi-scale treatment [Ophir, Lustig & Elad 11] [Sulam, Ophir & Elad 14] [Papyan & Elad 15] Leveraging the context of the patch [Romano & Elad 16] [Romano, Isidoro & Milanfar 16] Enforcing the local model on the final patches (EPLL) Sparse EPLL [Sulam & Elad 15] Image Synthesis [Ren, Romano & Elad 17] 32
33 What is? Many researchers kept revisiting this algorithm with a feeling that key features are still lacking Here is what WE thought of Whitening the residual image [Romano & Elad 13] SOS-Boosting [Romano & Elad 15] Exploiting self-similarities [Ram & Elad 13] [Romano, Protter & Elad 14] A multi-scale treatment [Ophir, Lustig & Elad 11] [Sulam, Ophir & Elad 14] [Papyan & Elad 15] Leveraging the context of the patch [Romano & Elad 16] [Romano, Isidoro & Milanfar 16] Enforcing the local model on the final patches (EPLL) Sparse EPLL [Sulam & Elad 15] Image Synthesis [Ren, Romano & Elad 17] 33
35 What is? Many researchers kept revisiting this algorithm with a feeling that key features are still lacking Here is what WE thought of Whitening the residual image [Romano & Elad 13] SOS-Boosting [Romano & Elad 15] Exploiting self-similarities [Ram & Elad 13] [Romano, Protter & Elad 14] A multi-scale treatment [Ophir, Lustig & Elad 11] [Sulam, Ophir & Elad 14] [Papyan & Elad 15] Leveraging the context of the patch [Romano & Elad 16] [Romano, Isidoro & Milanfar 16] Enforcing the local model on the final patches (EPLL) Sparse EPLL [Sulam & Elad 15] Image Synthesis [Ren, Romano & Elad 17] Missing: a theoretical backbone! Why can we use a local prior to solve a global problem?
36 Now, Our Story Takes a Surprising Turn 36
37 Convolutional Sparse Coding Working Locally Thinking Globally: Theoretical Guarantees for Convolutional Sparse Coding Vardan Papyan, Jeremias Sulam and Michael Elad Convolutional Dictionary Learning via Local Processing Vardan Papyan, Yaniv Romano, Jeremias Sulam, and Michael Elad [LeCun, Bottou, Bengio and Haffner 98] [Lewicki & Sejnowski 99] [Hashimoto & Kurata, 00] [Mørup, Schmidt & Hansen 08] [Zeiler, Krishnan, Taylor & Fergus 10] [Jarrett, Kavukcuoglu, Gregor, LeCun 11] [Heide, Heidrich & Wetzstein 15] [Gu, Zuo, Xie, Meng, Feng & Zhang 15] 37
38 Intuitively X = The first filter The second filter 38
39 Convolutional Sparse Coding (CSC) =
40 Convolutional Sparse Representation X = DΓ, and R_i X = Ω γ_i, where Ω is the n × (2n−1)m stripe-dictionary (built from the local dictionary D_L) and γ_i is the corresponding stripe vector Adjacent representations overlap, as they skip by m items as we sweep through the patches of X
41 CSC Relation to Our Story A clear global model: every patch has a sparse representation w.r.t. the same local dictionary Ω, just as we have assumed No notion of disagreement on the patch overlaps Related to the current common practice of patch averaging (R_iᵀ puts the patch Ωγ_i back in the i-th location of the global vector): X = DΓ = (1/n) Σ_i R_iᵀ Ω γ_i What about the Pursuit? Patch averaging: independent sparse coding for each patch; CSC: should seek all the representations together Is there a bridge between the two? We'll come back to this later What about the theory? Until recently little was known regarding the theoretical aspects of CSC
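The patch-extraction operators R_i and their transposes are easy to make concrete. The sketch below (illustrative sizes, cyclic boundary handling assumed for simplicity) verifies the identity behind patch averaging: every sample is covered by exactly n overlapping patches, so (1/n) Σ_i R_iᵀ R_i X = X.

```python
# Sketch of the R_i / R_i^T operators on a 1-D signal (cyclic borders).
import numpy as np

def extract_patches(x, n):
    """All n-dimensional patches R_i X, one per starting position."""
    N = len(x)
    return np.stack([np.roll(x, -i)[:n] for i in range(N)])

def put_back(patches, N):
    """Sum_i R_i^T applied to each patch (accumulate into place)."""
    n = patches.shape[1]
    x = np.zeros(N)
    for i, p in enumerate(patches):
        idx = np.arange(i, i + n) % N   # cyclic indexing
        x[idx] += p
    return x

rng = np.random.default_rng(2)
N, n = 32, 6
X = rng.standard_normal(N)
patches = extract_patches(X, n)
X_avg = put_back(patches, N) / n        # (1/n) * sum_i R_i^T R_i X
```

Replacing each raw patch R_i X by its sparse approximation Ω γ_i before averaging gives exactly the patch-averaging denoiser of the earlier slides.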
42 Classical Sparse Theory (Noiseless) (P₀): min_Γ ‖Γ‖₀ s.t. X = DΓ Definition: Mutual Coherence: μ(D) = max_{i≠j} |d_iᵀ d_j| [Donoho & Elad 03] Theorem: For a signal X = DΓ, if ‖Γ‖₀ < (1/2)(1 + 1/μ(D)) then this solution is necessarily the sparsest [Donoho & Elad 03] Theorem: The OMP and BP are guaranteed to recover the true sparse code assuming that ‖Γ‖₀ < (1/2)(1 + 1/μ(D)) [Tropp 04], [Donoho & Elad 03]
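Mutual coherence is a one-liner to compute, which makes the classical bound easy to probe numerically. A small sketch (the dictionary sizes are illustrative, not the ones from the talk):

```python
# Mutual coherence: largest |inner product| between distinct unit-norm atoms.
import numpy as np

def mutual_coherence(D):
    Dn = D / np.linalg.norm(D, axis=0)   # normalize the atoms
    G = np.abs(Dn.T @ Dn)                # absolute Gram matrix
    np.fill_diagonal(G, 0.0)             # ignore self inner products
    return G.max()

rng = np.random.default_rng(3)
D = rng.standard_normal((64, 128))
mu = mutual_coherence(D)
# The uniqueness/pursuit bound from the theorems above:
k_max = 0.5 * (1 + 1 / mu)
```

For a random 64 × 128 dictionary, k_max comes out small, which already hints at the pessimism discussed on the next slide when the bound is applied globally to CSC.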
43 CSC: The Need for a Theoretical Study Assuming that m = 2 and n = 64, the Welch bound [Welch 74] lower-bounds μ(D) As a result, uniqueness and success of pursuits is guaranteed only as long as ‖Γ‖₀ < (1/2)(1 + 1/μ(D)): less than 8 non-zeros GLOBALLY are allowed!!! This is a very pessimistic result! Repeating the above for the noisy case leads to even worse performance predictions Bottom line: classic SparseLand theory cannot provide good explanations for the CSC model
44 Moving to Local Sparsity: Stripes ‖Γ‖₀,∞^s = max_i ‖γ_i‖₀ (P₀,∞): min_Γ ‖Γ‖₀,∞^s s.t. X = DΓ ‖Γ‖₀,∞^s being low means all γ_i are sparse: every patch has a sparse representation over Ω If Γ is locally sparse [Papyan, Sulam & Elad 16]: the solution of P₀,∞ is necessarily unique, and the global OMP and BP are guaranteed to recover it This result poses a local constraint for a global guarantee, and as such, the guarantees scale linearly with the dimension of the signal
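The ℓ₀,∞ stripe norm is just a sliding count of non-zeros. A small sketch, using the stripe length (2n−1)m from the previous slide and a cyclic indexing simplification (the boundary handling here is an assumption for the demo, not the paper's exact convention):

```python
# l_{0,inf} stripe norm: non-zero count of the densest stripe of Gamma.
import numpy as np

def l0_inf_stripe(gamma, n, m):
    """max_i ||gamma_i||_0 over stripes of length (2n-1)*m (cyclic)."""
    N = len(gamma) // m                    # number of signal samples
    L = (2 * n - 1) * m                    # stripe length
    ext = np.concatenate([gamma, gamma])   # cyclic extension
    return max(
        int(np.count_nonzero(ext[i * m : i * m + L])) for i in range(N)
    )

gamma = np.zeros(20 * 2)    # N = 20 samples, m = 2 filters
gamma[[3, 8, 30]] = 1.0     # three non-zeros, spread out
val = l0_inf_stripe(gamma, n=3, m=2)
```

A globally dense but locally spread vector gets a small stripe norm, which is exactly why the guarantees based on it scale linearly with the signal dimension.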
45 From Ideal to Noisy Signals In practice, Y = DΓ + E, where E is due to noise or model deviations (P₀,∞^ε): min_Γ ‖Γ‖₀,∞^s s.t. ‖Y − DΓ‖₂ ≤ ε How close is Γ̂ to Γ? If Γ is locally sparse and the noise is bounded [Papyan, Sulam & Elad 16]: the solution of P₀,∞^ε is stable, the solution obtained via global OMP/BP is stable, and the true and estimated representations are close
46 Global Pursuit via Local Processing (P₁^ε): Γ̂_BP = argmin_Γ (1/2)‖Y − DΓ‖₂² + ξ‖Γ‖₁ X = DΓ = Σ_i R_iᵀ D_L α_i = Σ_i R_iᵀ s_i The s_i are slices, not patches
47 Global Pursuit via Local Processing (2) (P₁^ε): Γ̂_BP = argmin_Γ (1/2)‖Y − DΓ‖₂² + ξ‖Γ‖₁ Using variable splitting s_i = D_L α_i: min_{s_i, α_i} (1/2)‖Y − Σ_i R_iᵀ s_i‖₂² + ξ Σ_i ‖α_i‖₁ s.t. s_i = D_L α_i These two problems are equivalent, and convex w.r.t. their variables The new formulation targets the local slices and their sparse representations Can be solved via ADMM: replace the constraint with a penalty
48 Slice-Based Pursuit Local Sparse Coding: α̂_i = argmin_{α_i} (ρ/2)‖s_i − D_L α_i + u_i‖₂² + ξ‖α_i‖₁ Slice Reconstruction: p_i = (1/ρ) R_i Y + D_L α̂_i − u_i Slice Aggregation: X = Σ_j R_jᵀ p_j Local Laplacian: s_i = p_i − (1/(ρ + n)) R_i X Dual Variable Update: u_i = u_i + s_i − D_L α̂_i Comment: one iteration of this procedure amounts to the very same patch-averaging algorithm we started with
50 Two Comments About this Scheme We work with Slices and not Patches Patches extracted from natural images, and their corresponding slices. Observe how the slices are far simpler, and contained by their corresponding patches The Proposed Scheme can be used for Dictionary (D_L) Learning Slice-based DL algorithm using standard patch-based tools, leading to a faster and simpler method, compared to existing methods [Heide et al. 15] [Wohlberg 16]
51 Going Deeper Convolutional Neural Networks Analyzed via Convolutional Sparse Coding Joint work with Vardan Papyan and Michael Elad 51
52 CSC and CNN There is an analogy between CSC and CNN: Convolutional structure, data driven models, ReLU is a sparsifying operator, and more We propose a principled way to analyze CNN SparseLand Sparse Representation Theory The Underlying Idea Modeling data sources enables a theoretical analysis of algorithms performance But first, a short review of CNN CNN * Convolutional Neural Networks *Our analysis holds true for fully connected networks as well 52
53 CNN [figure: a two-layer convolutional network, input Y of length N, filters W_1 and W_2 with local supports n_0, n_1 and m_1, m_2 channels] [LeCun, Bottou, Bengio and Haffner 98] [Krizhevsky, Sutskever & Hinton 12] [Simonyan & Zisserman 14] [He, Zhang, Ren & Sun 15] ReLU(z) = max(0, z)
54 Mathematically... f(Y, {W_i, b_i}) = ReLU(b_2 + W_2ᵀ ReLU(b_1 + W_1ᵀ Y)), where Y ∈ R^N, W_1ᵀ ∈ R^{Nm_1 × N}, b_1 ∈ R^{Nm_1}, W_2ᵀ ∈ R^{Nm_2 × Nm_1}, b_2 ∈ R^{Nm_2}, and Z_2 ∈ R^{Nm_2}
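The two-layer forward pass above is a one-screen NumPy function. The matrices here are dense rather than convolutional, which the talk itself allows ("our analysis holds true for fully connected networks as well"); all dimensions are illustrative.

```python
# Two-layer forward pass: f(Y) = ReLU(b2 + W2^T ReLU(b1 + W1^T Y)).
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(Y, W1, b1, W2, b2):
    Z1 = relu(b1 + W1.T @ Y)    # first feature map
    Z2 = relu(b2 + W2.T @ Z1)   # second feature map
    return Z2

rng = np.random.default_rng(4)
N, m1, m2 = 100, 30, 20
Y = rng.standard_normal(N)
W1 = rng.standard_normal((N, m1)); b1 = rng.standard_normal(m1)
W2 = rng.standard_normal((m1, m2)); b2 = rng.standard_normal(m2)
Z2 = forward(Y, W1, b1, W2, b2)
```

Keeping the forward pass in this explicit form makes the later identification with layered thresholding immediate: swap W for D and the bias for a threshold.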
55 Training Stage of CNN Consider the task of classification. Given a set of signals {Y_j} and their corresponding labels h(Y_j), the CNN learns an end-to-end mapping: min_{{W_i, b_i}, U} Σ_j ℓ(h(Y_j), U, f(Y_j, {W_i, b_i})), where h(Y_j) is the true label, U is the classifier, and f(·) is the output of the last layer
56 Back to CSC Convolutional sparsity (CSC) assumes an inherent structure is present in natural signals: X ∈ R^N, D_1 ∈ R^{N × Nm_1}, Γ_1 ∈ R^{Nm_1} We propose to impose the same structure on the representations themselves: Γ_1 ∈ R^{Nm_1}, D_2 ∈ R^{Nm_1 × Nm_2}, Γ_2 ∈ R^{Nm_2} Multi-Layer CSC (ML-CSC)
57 Intuition: From Atoms to Molecules X ∈ R^N, D_1 ∈ R^{N × Nm_1}, D_2 ∈ R^{Nm_1 × Nm_2}, Γ_1 ∈ R^{Nm_1}, Γ_2 ∈ R^{Nm_2} Columns in D_1 are convolutional atoms Columns in D_2 combine the atoms in D_1, creating more complex structures The dictionary D_1 D_2 is a superposition of the atoms of D_1 The size of the effective atoms is increased throughout the layers (receptive field)
58 Intuition: From Atoms to Molecules X R N D 1 R N Nm 1 D 2 R Nm 1 Nm 2 Γ 2 R Nm 2 One could chain the multiplication of all the dictionaries into one effective dictionary D eff = D 1 D 2 D 3 D K and then X = D eff Γ K as in SparseLand However, a key property in this model is the sparsity of each representation (feature-maps) Γ 1 R Nm 1 58
59 A Small Taste: Model Training (MNIST) MNIST Dictionary: D 1 : 32 filters of size 7 7, with stride of 2 (dense) D 2 : 128 filters of size with stride of % sparse D3: 1024 filters of size % sparse D 1 (7 7) D 1 D 2 (15 15) D 1 D 2 D 3 (28 28) 59
60 A Small Taste: Pursuit Y Γ 0 Γ % sparse (213 nnz) Γ % sparse (30 nnz) Γ % sparse (5 nnz) 60
61 A Small Taste: Pursuit Y Γ 0 Γ 1 Γ % sparse (302 nnz) % sparse (47 nnz) % sparse (6 nnz) Γ 3 61
62 A Small Taste: Model Training (CIFAR) D 1 (5 5 3) D 1 D 2 (13 13) D 1 D 2 D 3 (32 32) CIFAR Dictionary: D 1 : 64 filters of size 5x5x3, stride of 2 dense D 2 : 256 filters of size 5x5x64, stride of % sparse D 3 : 1024 filters of size 5x5x % sparse 62
63 Deep Coding Problem (DCP) Noiseless Pursuit: Find {Γ_j}_{j=1}^K s.t. X = D_1Γ_1, ‖Γ_1‖₀,∞^s ≤ λ_1; Γ_1 = D_2Γ_2, ‖Γ_2‖₀,∞^s ≤ λ_2; … ; Γ_{K−1} = D_KΓ_K, ‖Γ_K‖₀,∞^s ≤ λ_K Noisy Pursuit: Find {Γ_j}_{j=1}^K s.t. ‖Y − D_1Γ_1‖₂ ≤ E_0, ‖Γ_1‖₀,∞^s ≤ λ_1; ‖Γ_1 − D_2Γ_2‖₂ ≤ E_1, ‖Γ_2‖₀,∞^s ≤ λ_2; … ; ‖Γ_{K−1} − D_KΓ_K‖₂ ≤ E_{K−1}, ‖Γ_K‖₀,∞^s ≤ λ_K
64 Deep Learning Problem (DLP) Supervised Dictionary Learning (task-driven dictionary learning [Mairal, Bach, Sapiro & Ponce 12]): min_{{D_i}_{i=1}^K, U} Σ_j ℓ(h(Y_j), U, DCP*(Y_j, {D_i})), where h(Y_j) is the true label, U is the classifier, and DCP* is the deepest representation obtained from the DCP Unsupervised Dictionary Learning: Find {D_i}_{i=1}^K s.t. for every j = 1, …, J: ‖Y_j − D_1Γ_1^j‖₂ ≤ E_0, ‖Γ_1^j‖₀,∞^s ≤ λ_1; ‖Γ_1^j − D_2Γ_2^j‖₂ ≤ E_1, ‖Γ_2^j‖₀,∞^s ≤ λ_2; … ; ‖Γ_{K−1}^j − D_KΓ_K^j‖₂ ≤ E_{K−1}, ‖Γ_K^j‖₀,∞^s ≤ λ_K
65 ML-CSC: The Simplest Pursuit The simplest pursuit algorithm (single-layer case) is the THR algorithm, which operates on a given input signal Y = DΓ + E (with Γ sparse) by: Γ̂ = P_β(DᵀY) Restricting the coefficients to be nonnegative does not restrict the expressiveness of the model ReLU = Soft Nonnegative Thresholding
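The claim "ReLU = soft nonnegative thresholding" is worth seeing concretely: a bias of −β followed by ReLU is exactly the one-sided soft-threshold operator, so the single-layer THR pursuit Γ̂ = P_β(DᵀY) is one neural-network layer. A sketch with illustrative sizes:

```python
# ReLU with bias -beta versus nonnegative soft thresholding.
import numpy as np

def soft_nonneg(x, beta):
    """S_beta^+(x): shrink positive entries by beta, zero out the rest."""
    return np.maximum(np.abs(x) - beta, 0.0) * (x > 0)

def relu_layer(x, bias):
    return np.maximum(x + bias, 0.0)

rng = np.random.default_rng(5)
D = rng.standard_normal((64, 100))
D /= np.linalg.norm(D, axis=0)   # unit-norm atoms
Y = rng.standard_normal(64)
beta = 0.5

gamma_thr = soft_nonneg(D.T @ Y, beta)       # THR pursuit P_beta(D^T Y)
gamma_relu = relu_layer(D.T @ Y, -beta)      # the same thing as a ReLU layer
```

The two outputs are identical entry by entry, and both are sparse: everything below the threshold/bias is clipped to zero.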
66 Consider this for Solving the DCP Layered thresholding (LT): estimate Γ_1 via the THR algorithm, then estimate Γ_2 via the THR algorithm: Γ̂_2 = P_{β_2}(D_2ᵀ P_{β_1}(D_1ᵀ Y)) Forward pass of CNN: f(Y) = ReLU(b_2 + W_2ᵀ ReLU(b_1 + W_1ᵀ Y)) The correspondence: ReLU ↔ Soft Nonnegative THR, Bias b ↔ Thresholds β, Weights W ↔ Dictionary D, Forward pass f ↔ Layered Soft NN THR (DCP)
67 Consider this for Solving the DCP Γ̂_2 = P_{β_2}(D_2ᵀ P_{β_1}(D_1ᵀ Y)) versus f(Y) = ReLU(b_2 + W_2ᵀ ReLU(b_1 + W_1ᵀ Y)): the layered (soft nonnegative) thresholding and the forward pass algorithm are the very same things!!!
68 Consider this for Solving the DLP DLP (supervised): min_{{D_i}_{i=1}^K, U} Σ_j ℓ(h(Y_j), U, DCP*(Y_j, {D_i})), estimated via the layered THR algorithm; the thresholds for the DCP should also be learned CNN training: min_{{W_i, b_i}, U} Σ_j ℓ(h(Y_j), U, f(Y_j, {W_i, b_i})) CNN language: forward pass f ↔ SparseLand language: layered soft NN THR (DCP)
69 Consider this for Solving the DLP DLP (supervised*): min_{{D_i}_{i=1}^K, U} Σ_j ℓ(h(Y_j), U, DCP*(Y_j, {D_i})) CNN training: min_{{W_i, b_i}, U} Σ_j ℓ(h(Y_j), U, f(Y_j, {W_i, b_i})) The problem solved by the training stage of CNN and the DLP are equivalent as well, assuming that the DCP is approximated via the layered thresholding algorithm * Recall that for the ML-CSC, there exists an unsupervised avenue for training the dictionaries that has no simple parallel in CNN
70 Theoretical Questions [schematic: a signal X generated by the ML-CSC model (X = D_1Γ_1, Γ_1 = D_2Γ_2, …, Γ_{K−1} = D_KΓ_K, with each Γ_i being ℓ₀,∞-sparse) is contaminated to give Y, which is fed to the DCP_λ^E or to the layered thresholding (forward pass), producing estimates {Γ̂_i}_{i=1}^K] Other questions?
71 Theoretical Path: Possible Questions Having established the importance of the ML-CSC model and its associated pursuit, the DCP problem, we now turn to its analysis The main questions we aim to address: I. Uniqueness of the solution (set of representations) to the DCP_λ? II. Stability of the solution to the DCP_λ^E problem? III. Stability of the solution obtained via the hard and soft layered THR algorithms (forward pass)? IV. Limitations of this (very simple) algorithm and alternative pursuits? V. Algorithms for training the dictionaries {D_i}_{i=1}^K vs. CNN? VI. New insights on how to operate on signals via CNN?
72 Uniqueness of DCP_λ DCP_λ: Find a set of representations satisfying X = D_1Γ_1, ‖Γ_1‖₀,∞^s ≤ λ_1; Γ_1 = D_2Γ_2, ‖Γ_2‖₀,∞^s ≤ λ_2; … ; Γ_{K−1} = D_KΓ_K, ‖Γ_K‖₀,∞^s ≤ λ_K. Is this set unique? Theorem: If a set of solutions {Γ_i}_{i=1}^K is found for (DCP_λ) such that ‖Γ_i‖₀,∞^s = λ_i < (1/2)(1 + 1/μ(D_i)), then these are necessarily the unique solution to this problem Mutual Coherence: μ(D) = max_{i≠j} |d_iᵀd_j| [Donoho & Elad 03] The feature maps CNN aims to recover are unique
73 Stability of DCP_λ^E DCP_λ^E: Find a set of representations satisfying ‖Y − D_1Γ_1‖₂ ≤ E_0, ‖Γ_1‖₀,∞^s ≤ λ_1; ‖Γ_1 − D_2Γ_2‖₂ ≤ E_1, ‖Γ_2‖₀,∞^s ≤ λ_2; … ; ‖Γ_{K−1} − D_KΓ_K‖₂ ≤ E_{K−1}, ‖Γ_K‖₀,∞^s ≤ λ_K. Is this set stable? Suppose that we manage to solve the DCP_λ^E and find a feasible set of representations {Γ̂_i}_{i=1}^K satisfying all the conditions The question we pose is: how close is Γ̂_i to Γ_i?
74 Stability of DCP_λ^E Theorem: If the true representations {Γ_i}_{i=1}^K satisfy ‖Γ_i‖₀,∞^s ≤ λ_i < (1/2)(1 + 1/μ(D_i)), then the set of solutions {Γ̂_i}_{i=1}^K obtained by solving this problem (somehow) with E_0 = ‖E‖₂ and E_i = 0 (i ≥ 1) must obey ‖Γ̂_i − Γ_i‖₂² ≤ 4^i ‖E‖₂² ∏_{j=1}^i (1 − (2λ_j − 1)μ(D_j))^{−1} The problem CNN aims to solve is stable under certain conditions Observe this annoying effect of error magnification as we dive into the model
75 Local Noise Assumption Our analysis relied on the local sparsity of the underlying solution Γ, which was enforced through the ℓ₀,∞ norm In what follows, we present stability guarantees that will also depend on the local energy in the noise vector E This will be enforced via the ℓ₂,∞ norm, defined as: ‖E‖₂,∞^p = max_i ‖R_i E‖₂
76 Stability of Layered-THR Theorem: If ‖Γ_i‖₀,∞^s < (1/2)(1 + (1/μ(D_i)) · |Γ_i^min|/|Γ_i^max|) − (1/μ(D_i)) · ε_L^{i−1}/|Γ_i^max|, then the layered hard THR (with the proper thresholds) will find the correct supports*, and ‖Γ̂_i^LT − Γ_i‖₂,∞^p ≤ ε_L^i, where we have defined ε_L^0 = ‖E‖₂,∞^p and ε_L^i = √(‖Γ_i‖₀,∞^p) · (ε_L^{i−1} + μ(D_i)(‖Γ_i‖₀,∞^s − 1)|Γ_i^max|) The stability of the forward pass is guaranteed if the underlying representations are locally sparse and the noise is locally bounded * Least-Squares update of the non-zeros?
77 Limitations of the Forward Pass The stability analysis reveals several inherent limitations of the forward pass (a.k.a. layered THR) algorithm: Even in the noiseless case, the forward pass is incapable of recovering the perfect solution of the DCP problem Its success depends on the ratio |Γ_i^min|/|Γ_i^max|; this is a direct consequence of relying on a simple thresholding operator The distance between the true sparse vector and the estimated one increases exponentially as a function of the layer depth Next we propose a new algorithm that attempts to solve some of these problems
78 Special Case: Sparse Dictionaries Throughout the theoretical study we assumed that the representations in the different layers are ℓ₀,∞-sparse Do we know of a simple example of a set of dictionaries {D_i}_{i=1}^K and their corresponding signals X that will obey this property? Assuming the dictionaries are sparse: ‖Γ_j‖₀,∞^s ≤ ‖Γ_K‖₀,∞^s · ∏_{i=j+1}^K ‖D_i‖₀, where ‖D_i‖₀ is the maximal number of non-zeros in an atom of D_i In the context of CNN, the above happens if a sparsity-promoting regularization, such as the ℓ₁, is employed on the filters
79 Better Pursuit? DCP_λ: Find a set of representations satisfying X = D_1Γ_1, Γ_1 = D_2Γ_2, …, Γ_{K−1} = D_KΓ_K, each ℓ₀,∞-sparse. So far we proposed the Layered THR: Γ̂_K = P_{β_K}(D_Kᵀ ⋯ P_{β_2}(D_2ᵀ P_{β_1}(D_1ᵀ X))) The motivation is clear: getting close to what CNN use. However, this is the simplest and weakest pursuit known in the field of sparsity. Can we offer something better?
80 Layered Basis Pursuit (Noiseless) For the noiseless DCP_λ, we can propose a Layered Basis Pursuit algorithm: Γ̂_1^LBP = argmin_{Γ_1} ‖Γ_1‖₁ s.t. X = D_1Γ_1 Γ̂_2^LBP = argmin_{Γ_2} ‖Γ_2‖₁ s.t. Γ̂_1^LBP = D_2Γ_2 Related: deconvolutional networks [Zeiler, Krishnan, Taylor & Fergus 10]
81 Guarantee for Success of Layered BP As opposed to prior work in CNN, we can do far more than just proposing an algorithm: we can analyze its terms for success Theorem: If a set of representations {Γ_i}_{i=1}^K of the Multi-Layered CSC model satisfies ‖Γ_i‖₀,∞^s = λ_i < (1/2)(1 + 1/μ(D_i)), then the Layered BP is guaranteed to find them Consequences: The Layered BP can retrieve the underlying representations in the noiseless case, a task in which the forward pass fails The Layered BP's success does not depend on the ratio |Γ_i^min|/|Γ_i^max|
82 Layered Basis Pursuit (Noisy) Similarly to the noiseless case, the noisy DCP_λ^E can also be solved via the Layered Basis Pursuit algorithm, but in a Lagrangian form: Γ̂_1^LBP = argmin_{Γ_1} (1/2)‖Y − D_1Γ_1‖₂² + ξ_1‖Γ_1‖₁ Γ̂_2^LBP = argmin_{Γ_2} (1/2)‖Γ̂_1^LBP − D_2Γ_2‖₂² + ξ_2‖Γ_2‖₁
83 Stability of Layered BP Theorem: Assuming that ‖Γ_i‖₀,∞^s ≤ (1/3)(1 + 1/μ(D_i)), then for correctly chosen {ξ_i}_{i=1}^K we are guaranteed that: 1. The support of Γ̂_i^LBP is contained in that of Γ_i 2. The error is bounded: ‖Γ̂_i^LBP − Γ_i‖₂,∞^p ≤ ε_L^i, where ε_L^i = 7.5^i ‖E‖₂,∞^p ∏_{j=1}^i √(‖Γ_j‖₀,∞^p) 3. Every entry in Γ_i greater than ε_L^i / √(‖Γ_i‖₀,∞^p) will be found
84 Layered Iterative Thresholding Layered BP: Γ̂_j^LBP = argmin_{Γ_j} (1/2)‖Γ̂_{j−1}^LBP − D_jΓ_j‖₂² + ξ_j‖Γ_j‖₁ Layered Iterative Soft-Thresholding: Γ̂_j^t = S_{ξ_j/c_j}(Γ̂_j^{t−1} + (1/c_j) D_jᵀ(Γ̂_{j−1} − D_jΓ̂_j^{t−1})), where t is the iteration index and c_i > 0.5 λ_max(D_iᵀD_i) Note that unfolding the iterations implies that groups of layers share the same dictionaries; this can be seen as a recurrent neural network [Gregor & LeCun 10]
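The layered iterative soft-thresholding idea can be sketched directly: each layer runs plain ISTA on its own Lagrangian BP problem, using the previous layer's output as the "signal". This is an illustrative sketch with dense random dictionaries and arbitrary ξ values, not the experiments from the talk.

```python
# Layered ISTA sketch: ISTA per layer instead of a single thresholding step.
import numpy as np

def soft(x, t):
    """Soft-thresholding operator S_t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista_layer(g_prev, D, xi, c, iters=50):
    """Approximately solve min_g 0.5||g_prev - D g||^2 + xi ||g||_1."""
    g = np.zeros(D.shape[1])
    for _ in range(iters):
        g = soft(g + (1.0 / c) * D.T @ (g_prev - D @ g), xi / c)
    return g

def layered_ista(Y, dictionaries, xis):
    g = Y
    for D, xi in zip(dictionaries, xis):
        c = 1.1 * np.linalg.eigvalsh(D.T @ D).max()  # step bound: c > lambda_max
        g = ista_layer(g, D, xi, c)
    return g

rng = np.random.default_rng(6)
D1 = rng.standard_normal((64, 80)); D1 /= np.linalg.norm(D1, axis=0)
D2 = rng.standard_normal((80, 96)); D2 /= np.linalg.norm(D2, axis=0)
Y = rng.standard_normal(64)
G2 = layered_ista(Y, [D1, D2], xis=[0.2, 0.2])
```

With one inner iteration and nonnegativity, each layer collapses back to the single thresholding step of the forward pass, which is exactly the unfolding observation behind the recurrent-network view.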
85 This Talk Independent patch-processing Local Sparsity We described the limitations of patch-based processing and ways to overcome some of these 85
86 This Talk Independent patch-processing Local Sparsity Convolutional Sparse Coding We presented a theoretical study of the CSC and a practical algorithm that works locally 86
87 This Talk Independent patch-processing Local Sparsity Convolutional Neural Networks Convolutional Sparse Coding We mentioned several interesting connections between CSC and CNN and this led us to 87
88 This Talk Independent patch-processing Local Sparsity Convolutional Neural Networks Convolutional Sparse Coding Multi-Layer Convolutional Sparse Coding propose and analyze a multi-layer extension of CSC, shown to be tightly connected to CNN 88
89 This Talk The ML-CSC was shown to enable a theoretical study of CNN, along with new insights Convolutional Neural Networks Independent patch-processing Convolutional Sparse Coding Local Sparsity Multi-Layer Convolutional Sparse Coding Extension of the classical sparse theory to a multi-layer setting A novel interpretation and theoretical understanding of CNN 89
90 This Talk Independent patch-processing Local Sparsity Convolutional Neural Networks Convolutional Sparse Coding Multi-Layer Convolutional Sparse Coding Extension of the classical sparse theory to a multi-layer setting The underlying idea: Modeling the data source in order to be able to theoretically analyze algorithms performance A novel interpretation and theoretical understanding of CNN 90
91 Questions? 91
More informationNew Coherence and RIP Analysis for Weak. Orthogonal Matching Pursuit
New Coherence and RIP Analysis for Wea 1 Orthogonal Matching Pursuit Mingrui Yang, Member, IEEE, and Fran de Hoog arxiv:1405.3354v1 [cs.it] 14 May 2014 Abstract In this paper we define a new coherence
More informationEUSIPCO
EUSIPCO 013 1569746769 SUBSET PURSUIT FOR ANALYSIS DICTIONARY LEARNING Ye Zhang 1,, Haolong Wang 1, Tenglong Yu 1, Wenwu Wang 1 Department of Electronic and Information Engineering, Nanchang University,
More information2.3. Clustering or vector quantization 57
Multivariate Statistics non-negative matrix factorisation and sparse dictionary learning The PCA decomposition is by construction optimal solution to argmin A R n q,h R q p X AH 2 2 under constraint :
More informationSparse Approximation and Variable Selection
Sparse Approximation and Variable Selection Lorenzo Rosasco 9.520 Class 07 February 26, 2007 About this class Goal To introduce the problem of variable selection, discuss its connection to sparse approximation
More informationIntroduction to Compressed Sensing
Introduction to Compressed Sensing Alejandro Parada, Gonzalo Arce University of Delaware August 25, 2016 Motivation: Classical Sampling 1 Motivation: Classical Sampling Issues Some applications Radar Spectral
More informationAn Introduction to Sparse Approximation
An Introduction to Sparse Approximation Anna C. Gilbert Department of Mathematics University of Michigan Basic image/signal/data compression: transform coding Approximate signals sparsely Compress images,
More informationGreedy Dictionary Selection for Sparse Representation
Greedy Dictionary Selection for Sparse Representation Volkan Cevher Rice University volkan@rice.edu Andreas Krause Caltech krausea@caltech.edu Abstract We discuss how to construct a dictionary by selecting
More informationStructured matrix factorizations. Example: Eigenfaces
Structured matrix factorizations Example: Eigenfaces An extremely large variety of interesting and important problems in machine learning can be formulated as: Given a matrix, find a matrix and a matrix
More informationSparsifying Transform Learning for Compressed Sensing MRI
Sparsifying Transform Learning for Compressed Sensing MRI Saiprasad Ravishankar and Yoram Bresler Department of Electrical and Computer Engineering and Coordinated Science Laborarory University of Illinois
More informationThe Analysis Cosparse Model for Signals and Images
The Analysis Cosparse Model for Signals and Images Raja Giryes Computer Science Department, Technion. The research leading to these results has received funding from the European Research Council under
More informationMLCC 2018 Variable Selection and Sparsity. Lorenzo Rosasco UNIGE-MIT-IIT
MLCC 2018 Variable Selection and Sparsity Lorenzo Rosasco UNIGE-MIT-IIT Outline Variable Selection Subset Selection Greedy Methods: (Orthogonal) Matching Pursuit Convex Relaxation: LASSO & Elastic Net
More informationNoise Removal? The Evolution Of Pr(x) Denoising By Energy Minimization. ( x) An Introduction to Sparse Representation and the K-SVD Algorithm
Sparse Representation and the K-SV Algorithm he CS epartment he echnion Israel Institute of technology Haifa 3, Israel University of Erlangen - Nürnberg April 8 Noise Removal? Our story begins with image
More informationIntroduction to Convolutional Neural Networks (CNNs)
Introduction to Convolutional Neural Networks (CNNs) nojunk@snu.ac.kr http://mipal.snu.ac.kr Department of Transdisciplinary Studies Seoul National University, Korea Jan. 2016 Many slides are from Fei-Fei
More informationc 2011 International Press Vol. 18, No. 1, pp , March DENNIS TREDE
METHODS AND APPLICATIONS OF ANALYSIS. c 2011 International Press Vol. 18, No. 1, pp. 105 110, March 2011 007 EXACT SUPPORT RECOVERY FOR LINEAR INVERSE PROBLEMS WITH SPARSITY CONSTRAINTS DENNIS TREDE Abstract.
More informationSparse molecular image representation
Sparse molecular image representation Sofia Karygianni a, Pascal Frossard a a Ecole Polytechnique Fédérale de Lausanne (EPFL), Signal Processing Laboratory (LTS4), CH-115, Lausanne, Switzerland Abstract
More informationSparsity in Underdetermined Systems
Sparsity in Underdetermined Systems Department of Statistics Stanford University August 19, 2005 Classical Linear Regression Problem X n y p n 1 > Given predictors and response, y Xβ ε = + ε N( 0, σ 2
More informationStable and Efficient Representation Learning with Nonnegativity Constraints. Tsung-Han Lin and H.T. Kung
Stable and Efficient Representation Learning with Nonnegativity Constraints Tsung-Han Lin and H.T. Kung Unsupervised Representation Learning Layer 3 Representation Encoding Sparse encoder Layer 2 Representation
More informationWavelet Footprints: Theory, Algorithms, and Applications
1306 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 51, NO. 5, MAY 2003 Wavelet Footprints: Theory, Algorithms, and Applications Pier Luigi Dragotti, Member, IEEE, and Martin Vetterli, Fellow, IEEE Abstract
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationGreedy Signal Recovery and Uniform Uncertainty Principles
Greedy Signal Recovery and Uniform Uncertainty Principles SPIE - IE 2008 Deanna Needell Joint work with Roman Vershynin UC Davis, January 2008 Greedy Signal Recovery and Uniform Uncertainty Principles
More informationEE 381V: Large Scale Optimization Fall Lecture 24 April 11
EE 381V: Large Scale Optimization Fall 2012 Lecture 24 April 11 Lecturer: Caramanis & Sanghavi Scribe: Tao Huang 24.1 Review In past classes, we studied the problem of sparsity. Sparsity problem is that
More informationOvercomplete Dictionaries for. Sparse Representation of Signals. Michal Aharon
Overcomplete Dictionaries for Sparse Representation of Signals Michal Aharon ii Overcomplete Dictionaries for Sparse Representation of Signals Reasearch Thesis Submitted in Partial Fulfillment of The Requirements
More informationImage Noise: Detection, Measurement and Removal Techniques. Zhifei Zhang
Image Noise: Detection, Measurement and Removal Techniques Zhifei Zhang Outline Noise measurement Filter-based Block-based Wavelet-based Noise removal Spatial domain Transform domain Non-local methods
More informationBlind Compressed Sensing
1 Blind Compressed Sensing Sivan Gleichman and Yonina C. Eldar, Senior Member, IEEE arxiv:1002.2586v2 [cs.it] 28 Apr 2010 Abstract The fundamental principle underlying compressed sensing is that a signal,
More informationAnalysis of Greedy Algorithms
Analysis of Greedy Algorithms Jiahui Shen Florida State University Oct.26th Outline Introduction Regularity condition Analysis on orthogonal matching pursuit Analysis on forward-backward greedy algorithm
More informationLecture 17: Neural Networks and Deep Learning
UVA CS 6316 / CS 4501-004 Machine Learning Fall 2016 Lecture 17: Neural Networks and Deep Learning Jack Lanchantin Dr. Yanjun Qi 1 Neurons 1-Layer Neural Network Multi-layer Neural Network Loss Functions
More informationSensing systems limited by constraints: physical size, time, cost, energy
Rebecca Willett Sensing systems limited by constraints: physical size, time, cost, energy Reduce the number of measurements needed for reconstruction Higher accuracy data subject to constraints Original
More informationDetecting Sparse Structures in Data in Sub-Linear Time: A group testing approach
Detecting Sparse Structures in Data in Sub-Linear Time: A group testing approach Boaz Nadler The Weizmann Institute of Science Israel Joint works with Inbal Horev, Ronen Basri, Meirav Galun and Ery Arias-Castro
More informationA NEW FRAMEWORK FOR DESIGNING INCOHERENT SPARSIFYING DICTIONARIES
A NEW FRAMEWORK FOR DESIGNING INCOERENT SPARSIFYING DICTIONARIES Gang Li, Zhihui Zhu, 2 uang Bai, 3 and Aihua Yu 3 School of Automation & EE, Zhejiang Univ. of Sci. & Tech., angzhou, Zhejiang, P.R. China
More informationA Generalized Uncertainty Principle and Sparse Representation in Pairs of Bases
2558 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 48, NO 9, SEPTEMBER 2002 A Generalized Uncertainty Principle Sparse Representation in Pairs of Bases Michael Elad Alfred M Bruckstein Abstract An elementary
More informationTheories of Deep Learning
Theories of Deep Learning Lecture 02 Donoho, Monajemi, Papyan Department of Statistics Stanford Oct. 4, 2017 1 / 50 Stats 385 Fall 2017 2 / 50 Stats 285 Fall 2017 3 / 50 Course info Wed 3:00-4:20 PM in
More informationTopographic Dictionary Learning with Structured Sparsity
Topographic Dictionary Learning with Structured Sparsity Julien Mairal 1 Rodolphe Jenatton 2 Guillaume Obozinski 2 Francis Bach 2 1 UC Berkeley 2 INRIA - SIERRA Project-Team San Diego, Wavelets and Sparsity
More informationLEARNING OVERCOMPLETE SPARSIFYING TRANSFORMS FOR SIGNAL PROCESSING. Saiprasad Ravishankar and Yoram Bresler
LEARNING OVERCOMPLETE SPARSIFYING TRANSFORMS FOR SIGNAL PROCESSING Saiprasad Ravishankar and Yoram Bresler Department of Electrical and Computer Engineering and the Coordinated Science Laboratory, University
More informationSparse representation classification and positive L1 minimization
Sparse representation classification and positive L1 minimization Cencheng Shen Joint Work with Li Chen, Carey E. Priebe Applied Mathematics and Statistics Johns Hopkins University, August 5, 2014 Cencheng
More informationIntroduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin
1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)
More informationDeep Learning Basics Lecture 7: Factor Analysis. Princeton University COS 495 Instructor: Yingyu Liang
Deep Learning Basics Lecture 7: Factor Analysis Princeton University COS 495 Instructor: Yingyu Liang Supervised v.s. Unsupervised Math formulation for supervised learning Given training data x i, y i
More informationWHY ARE DEEP NETS REVERSIBLE: A SIMPLE THEORY,
WHY ARE DEEP NETS REVERSIBLE: A SIMPLE THEORY, WITH IMPLICATIONS FOR TRAINING Sanjeev Arora, Yingyu Liang & Tengyu Ma Department of Computer Science Princeton University Princeton, NJ 08540, USA {arora,yingyul,tengyu}@cs.princeton.edu
More informationRecent developments on sparse representation
Recent developments on sparse representation Zeng Tieyong Department of Mathematics, Hong Kong Baptist University Email: zeng@hkbu.edu.hk Hong Kong Baptist University Dec. 8, 2008 First Previous Next Last
More informationFive Lectures on Sparse and Redundant Representations Modelling of Images. Michael Elad
Five Lectures on Sparse and Redundant Representations Modelling of Images Michael Elad IAS/Park City Mathematics Series Volume 19, 2010 Five Lectures on Sparse and Redundant Representations Modelling
More informationThe Little Engine That Could: Regularization by Denoising (RED)
SIAM J. IMAGING SCIENCES Vol. 10, No. 4, pp. 1804 1844 c 2017 Society for Industrial and Applied Mathematics The Little Engine That Could: Regularization by Denoising (RED) Yaniv Romano, Michael Elad,
More informationBackpropagation Rules for Sparse Coding (Task-Driven Dictionary Learning)
Backpropagation Rules for Sparse Coding (Task-Driven Dictionary Learning) Julien Mairal UC Berkeley Edinburgh, ICML, June 2012 Julien Mairal, UC Berkeley Backpropagation Rules for Sparse Coding 1/57 Other
More informationLecture 8: Introduction to Deep Learning: Part 2 (More on backpropagation, and ConvNets)
COS 402 Machine Learning and Artificial Intelligence Fall 2016 Lecture 8: Introduction to Deep Learning: Part 2 (More on backpropagation, and ConvNets) Sanjeev Arora Elad Hazan Recap: Structure of a deep
More informationSparse Solutions of Systems of Equations and Sparse Modelling of Signals and Images
Sparse Solutions of Systems of Equations and Sparse Modelling of Signals and Images Alfredo Nava-Tudela ant@umd.edu John J. Benedetto Department of Mathematics jjb@umd.edu Abstract In this project we are
More informationA Randomized Approach for Crowdsourcing in the Presence of Multiple Views
A Randomized Approach for Crowdsourcing in the Presence of Multiple Views Presenter: Yao Zhou joint work with: Jingrui He - 1 - Roadmap Motivation Proposed framework: M2VW Experimental results Conclusion
More informationProvable Alternating Minimization Methods for Non-convex Optimization
Provable Alternating Minimization Methods for Non-convex Optimization Prateek Jain Microsoft Research, India Joint work with Praneeth Netrapalli, Sujay Sanghavi, Alekh Agarwal, Animashree Anandkumar, Rashish
More informationCOMPARATIVE ANALYSIS OF ORTHOGONAL MATCHING PURSUIT AND LEAST ANGLE REGRESSION
COMPARATIVE ANALYSIS OF ORTHOGONAL MATCHING PURSUIT AND LEAST ANGLE REGRESSION By Mazin Abdulrasool Hameed A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for
More informationCombining Sparsity with Physically-Meaningful Constraints in Sparse Parameter Estimation
UIUC CSL Mar. 24 Combining Sparsity with Physically-Meaningful Constraints in Sparse Parameter Estimation Yuejie Chi Department of ECE and BMI Ohio State University Joint work with Yuxin Chen (Stanford).
More informationPHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION. A Thesis MELTEM APAYDIN
PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION A Thesis by MELTEM APAYDIN Submitted to the Office of Graduate and Professional Studies of Texas A&M University in partial fulfillment of the
More informationIntroduction to Machine Learning Midterm Exam
10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but
More informationISyE 691 Data mining and analytics
ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)
More informationConvolutional Neural Networks II. Slides from Dr. Vlad Morariu
Convolutional Neural Networks II Slides from Dr. Vlad Morariu 1 Optimization Example of optimization progress while training a neural network. (Loss over mini-batches goes down over time.) 2 Learning rate
More informationSparse & Redundant Representations by Iterated-Shrinkage Algorithms
Sparse & Redundant Representations by Michael Elad * The Computer Science Department The Technion Israel Institute of technology Haifa 3000, Israel 6-30 August 007 San Diego Convention Center San Diego,
More informationA practical theory for designing very deep convolutional neural networks. Xudong Cao.
A practical theory for designing very deep convolutional neural networks Xudong Cao notcxd@gmail.com Abstract Going deep is essential for deep learning. However it is not easy, there are many ways of going
More informationThe Iteration-Tuned Dictionary for Sparse Representations
The Iteration-Tuned Dictionary for Sparse Representations Joaquin Zepeda #1, Christine Guillemot #2, Ewa Kijak 3 # INRIA Centre Rennes - Bretagne Atlantique Campus de Beaulieu, 35042 Rennes Cedex, FRANCE
More informationSparse Solutions of an Undetermined Linear System
1 Sparse Solutions of an Undetermined Linear System Maddullah Almerdasy New York University Tandon School of Engineering arxiv:1702.07096v1 [math.oc] 23 Feb 2017 Abstract This work proposes a research
More informationNeural networks and optimization
Neural networks and optimization Nicolas Le Roux Criteo 18/05/15 Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 1 / 85 1 Introduction 2 Deep networks 3 Optimization 4 Convolutional
More informationLecture Support Vector Machine (SVM) Classifiers
Introduction to Machine Learning Lecturer: Amir Globerson Lecture 6 Fall Semester Scribe: Yishay Mansour 6.1 Support Vector Machine (SVM) Classifiers Classification is one of the most important tasks in
More informationGlobal Optimality in Matrix and Tensor Factorization, Deep Learning & Beyond
Global Optimality in Matrix and Tensor Factorization, Deep Learning & Beyond Ben Haeffele and René Vidal Center for Imaging Science Mathematical Institute for Data Science Johns Hopkins University This
More informationL-statistics based Modification of Reconstruction Algorithms for Compressive Sensing in the Presence of Impulse Noise
L-statistics based Modification of Reconstruction Algorithms for Compressive Sensing in the Presence of Impulse Noise Srdjan Stanković, Irena Orović and Moeness Amin 1 Abstract- A modification of standard
More informationSparse Linear Models (10/7/13)
STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine
More information5742 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 12, DECEMBER /$ IEEE
5742 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 12, DECEMBER 2009 Uncertainty Relations for Shift-Invariant Analog Signals Yonina C. Eldar, Senior Member, IEEE Abstract The past several years
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More informationSGD and Deep Learning
SGD and Deep Learning Subgradients Lets make the gradient cheating more formal. Recall that the gradient is the slope of the tangent. f(w 1 )+rf(w 1 ) (w w 1 ) Non differentiable case? w 1 Subgradients
More information9 Classification. 9.1 Linear Classifiers
9 Classification This topic returns to prediction. Unlike linear regression where we were predicting a numeric value, in this case we are predicting a class: winner or loser, yes or no, rich or poor, positive
More informationStatistical Pattern Recognition
Statistical Pattern Recognition Feature Extraction Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi, Payam Siyari Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Agenda Dimensionality Reduction
More informationNeural Networks and Ensemble Methods for Classification
Neural Networks and Ensemble Methods for Classification NEURAL NETWORKS 2 Neural Networks A neural network is a set of connected input/output units (neurons) where each connection has a weight associated
More informationMachine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6
Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)
More informationStochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure
Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure Alberto Bietti Julien Mairal Inria Grenoble (Thoth) March 21, 2017 Alberto Bietti Stochastic MISO March 21,
More informationImproved Local Coordinate Coding using Local Tangents
Improved Local Coordinate Coding using Local Tangents Kai Yu NEC Laboratories America, 10081 N. Wolfe Road, Cupertino, CA 95129 Tong Zhang Rutgers University, 110 Frelinghuysen Road, Piscataway, NJ 08854
More informationDesigning Information Devices and Systems I Spring 2018 Homework 13
EECS 16A Designing Information Devices and Systems I Spring 2018 Homework 13 This homework is due April 30, 2018, at 23:59. Self-grades are due May 3, 2018, at 23:59. Submission Format Your homework submission
More informationON THE STABILITY OF DEEP NETWORKS
ON THE STABILITY OF DEEP NETWORKS AND THEIR RELATIONSHIP TO COMPRESSED SENSING AND METRIC LEARNING RAJA GIRYES AND GUILLERMO SAPIRO DUKE UNIVERSITY Mathematics of Deep Learning International Conference
More informationCompressed Sensing: Extending CLEAN and NNLS
Compressed Sensing: Extending CLEAN and NNLS Ludwig Schwardt SKA South Africa (KAT Project) Calibration & Imaging Workshop Socorro, NM, USA 31 March 2009 Outline 1 Compressed Sensing (CS) Introduction
More informationCSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18
CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$
More informationMinimax Reconstruction Risk of Convolutional Sparse Dictionary Learning
Minimax Reconstruction Risk of Convolutional Sparse Dictionary Learning Shashank Singh Barnabás Póczos Jian Ma bapoczos@cs.cmu.edu Machine Learning Department Carnegie Mellon University sss1@cs.cmu.edu
More informationDouble Sparsity: Learning Sparse Dictionaries for Sparse Signal Approximation
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. X, NO. X, XX 200X 1 Double Sparsity: Learning Sparse Dictionaries for Sparse Signal Approximation Ron Rubinstein, Student Member, IEEE, Michael Zibulevsky,
More informationPre-weighted Matching Pursuit Algorithms for Sparse Recovery
Journal of Information & Computational Science 11:9 (214) 2933 2939 June 1, 214 Available at http://www.joics.com Pre-weighted Matching Pursuit Algorithms for Sparse Recovery Jingfei He, Guiling Sun, Jie
More informationConvolutional neural networks
11-1: Convolutional neural networks Prof. J.C. Kao, UCLA Convolutional neural networks Motivation Biological inspiration Convolution operation Convolutional layer Padding and stride CNN architecture 11-2:
More informationArtificial Neural Networks D B M G. Data Base and Data Mining Group of Politecnico di Torino. Elena Baralis. Politecnico di Torino
Artificial Neural Networks Data Base and Data Mining Group of Politecnico di Torino Elena Baralis Politecnico di Torino Artificial Neural Networks Inspired to the structure of the human brain Neurons as
More informationMachine Learning And Applications: Supervised Learning-SVM
Machine Learning And Applications: Supervised Learning-SVM Raphaël Bournhonesque École Normale Supérieure de Lyon, Lyon, France raphael.bournhonesque@ens-lyon.fr 1 Supervised vs unsupervised learning Machine
More informationHow to do backpropagation in a brain
How to do backpropagation in a brain Geoffrey Hinton Canadian Institute for Advanced Research & University of Toronto & Google Inc. Prelude I will start with three slides explaining a popular type of deep
More informationSparse Solutions of Linear Systems of Equations and Sparse Modeling of Signals and Images!
Sparse Solutions of Linear Systems of Equations and Sparse Modeling of Signals and Images! Alfredo Nava-Tudela John J. Benedetto, advisor 1 Happy birthday Lucía! 2 Outline - Problem: Find sparse solutions
More informationOne Picture and a Thousand Words Using Matrix Approximtions October 2017 Oak Ridge National Lab Dianne P. O Leary c 2017
One Picture and a Thousand Words Using Matrix Approximtions October 2017 Oak Ridge National Lab Dianne P. O Leary c 2017 1 One Picture and a Thousand Words Using Matrix Approximations Dianne P. O Leary
More information2 Regularized Image Reconstruction for Compressive Imaging and Beyond
EE 367 / CS 448I Computational Imaging and Display Notes: Compressive Imaging and Regularized Image Reconstruction (lecture ) Gordon Wetzstein gordon.wetzstein@stanford.edu This document serves as a supplement
More informationInfinite Ensemble Learning with Support Vector Machinery
Infinite Ensemble Learning with Support Vector Machinery Hsuan-Tien Lin and Ling Li Learning Systems Group, California Institute of Technology ECML/PKDD, October 4, 2005 H.-T. Lin and L. Li (Learning Systems
More informationDesigning Information Devices and Systems I Discussion 13B
EECS 6A Fall 7 Designing Information Devices and Systems I Discussion 3B. Orthogonal Matching Pursuit Lecture Orthogonal Matching Pursuit (OMP) algorithm: Inputs: A set of m songs, each of length n: S
More informationBias-free Sparse Regression with Guaranteed Consistency
Bias-free Sparse Regression with Guaranteed Consistency Wotao Yin (UCLA Math) joint with: Stanley Osher, Ming Yan (UCLA) Feng Ruan, Jiechao Xiong, Yuan Yao (Peking U) UC Riverside, STATS Department March
More informationCompressed Sensing and Neural Networks
and Jan Vybíral (Charles University & Czech Technical University Prague, Czech Republic) NOMAD Summer Berlin, September 25-29, 2017 1 / 31 Outline Lasso & Introduction Notation Training the network Applications
More information