TRADEOFFS BETWEEN SPEED AND ACCURACY IN INVERSE PROBLEMS RAJA GIRYES TEL AVIV UNIVERSITY
1 TRADEOFFS BETWEEN SPEED AND ACCURACY IN INVERSE PROBLEMS RAJA GIRYES TEL AVIV UNIVERSITY Deep Learning TCE Conference June 14, 2018
2 DEEP LEARNING IMPACT Today we get 3.5% error with a 152-layer network on the ImageNet dataset: 1,400,000 images, 1,000 categories, with held-out sets for testing and validation.
3 CUTTING EDGE PERFORMANCE IN MANY OTHER APPLICATIONS Disease diagnosis [Greenspan, Riklin, Kiriaty]. Language translation [Sutskever et al., 2014]. Video classification [Karpathy et al., 2014]. Handwriting recognition [Poznanski & Wolf, 2016]. Sentiment classification [Socher et al., 2013]. Image denoising [Remez et al., 2017]. Depth reconstruction [Haim et al., 2017]. Super-resolution [Kim et al., 2016], [Bruna et al., 2016]. And many other applications.
4 CLASS AWARE DENOISING [Image comparison: class-aware denoising vs. agnostic denoising.] [Remez, Litani, Giryes and Bronstein, 2017]
5 DEEP ISP [Schwartz, Giryes and Bronstein, 2018]
6 NON-UNIFORMLY QUANTIZED DEEP LEARNING Performance vs. complexity of different quantized neural networks. [Baskin et al., 2018]
7 DEPTH ESTIMATION BY PHASE CODED CUES [Haim, Elmalem, Bronstein, Marom, Giryes, 2018]
8 COMPRESSED COLOR LIGHT FIELD [Nabati, Mendelovic, Giryes, 2018]
9 EXOPLANETS DETECTION [Zucker and Giryes, 2018]
10 PARTIAL SHAPE ALIGNMENT Alignment is performed by a free-form deformation generated by a network. [Hanocka et al., 2018]
11 WARNING: NO DEEP LEARNING HERE! [Levy, Vahid and Giryes, 2018]
12 WHY DO THINGS WORK BETTER TODAY? More data: larger datasets, more access (internet). Better hardware (GPU). Better learning regularization (dropout). Deep learning's impact and success are not unique to image classification. But it is still unclear why deep neural networks are so remarkably successful and how they achieve it.
13 SAMPLE OF RELATED EXISTING THEORY Universal approximation for any measurable Borel function [Hornik et al., 1989, Cybenko 1989]. Depth of a network provides an exponential gain in expressive power relative to the number of parameters [Montúfar et al. 2014, Cohen et al. 2016, Eldan & Shamir, 2016] and invariance to more complex deformations [Bruna & Mallat, 2013]. Number of training samples scales as the number of parameters [Shalev-Shwartz & Ben-David 2014] or the norm of the weights in the DNN [Neyshabur et al. 2015]. Pooling's relation to shift invariance and phase retrieval [Bruna et al. 2013, 2014]. Deeper networks have more local minima that are close to the global one and fewer saddle points [Saxe et al. 2014], [Dauphin et al. 2014], [Choromanska et al. 2015], [Haeffele & Vidal, 2015], [Soudry & Hoffer, 2017]. Relation to dictionary learning [Papyan et al. 2016]. Information bottleneck [Shwartz-Ziv & Tishby, 2017], [Tishby & Zaslavsky 2017]. Invariant representation for certain tasks [Soatto & Chiuso, 2016]. Bayesian deep learning [Kendall and Gal, 2017], [Patel, Nguyen & Baraniuk, 2016].
14 SAMPLE OF OUR THEORY DNNs with random Gaussian weights are good for classifying the average points in the data, and an important goal of training is to classify the boundary points between the different classes in the data [Giryes et al. 2016]. Generalization error of neural networks [Sokolic, Giryes, Sapiro & Rodrigues, 2017]. Relationship between invariance and generalization in deep learning [Sokolic, Giryes, Sapiro & Rodrigues, 2017]. Solving optimization problems with neural networks [Giryes, Eldar, Bronstein & Sapiro, 2018]. Robustness to adversarial examples [Jakubovitz & Giryes, 2018]. Robustness to label noise [Drory, Avidan & Giryes, 2018]. Observation of k-NN behavior in neural networks to explain the coexistence of memorization and generalization in neural networks [Cohen, Sapiro & Giryes, 2018].
15 SAMPLE OF OUR THEORY (slide repeated) Our focus today: solving optimization problems with neural networks [Giryes, Eldar, Bronstein & Sapiro, 2018].
16 INVERSE PROBLEMS We are given X = AZ + E, where X is the given set of measurements, A is a linear operator, Z is the unknown signal, and E is noise. Standard technique for recovery: min_Z ‖X − AZ‖² s.t. Z ∈ Υ, where Z resides in a low-dimensional set Υ. Unconstrained form: min_Z ‖X − AZ‖² + λf(Z), where λ is a regularization parameter and f is a penalty function.
17 ℓ1 MINIMIZATION CASE Unconstrained form: min_Z ‖X − AZ‖² + λ‖Z‖₁. Can be solved by a proximal gradient method, e.g., the iterative shrinkage and thresholding algorithm (ISTA): Z_{t+1} = ψ_{λμ}(Z_t + μAᵀ(X − AZ_t)), where ψ_{λμ} is the soft thresholding operation with threshold λμ and μ is the step size.
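The ISTA update above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code behind the experiments in the talk; problem sizes and parameters below are arbitrary choices.

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise soft thresholding: the shrinkage step psi_tau of ISTA."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(X, A, lam, mu, n_iter=200):
    """Proximal gradient for min_Z ||X - A Z||^2 + lam * ||Z||_1,
    using the slide's update Z <- psi_{lam*mu}(Z + mu A^T (X - A Z))."""
    Z = np.zeros(A.shape[1])
    for _ in range(n_iter):
        # Gradient step on the quadratic term, then shrinkage.
        Z = soft_threshold(Z + mu * A.T @ (X - A @ Z), lam * mu)
    return Z
```

With μ no larger than 1/‖A‖² the iterates converge; a larger λ yields a sparser estimate.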
18 ISTA CONVERGENCE [Plot: reconstruction mean squared error (MSE) E‖Z − Ẑ_t‖² as a function of the number of iterations t.]
19 LISTA Rewriting ISTA: Z_{t+1} = ψ_{λμ}((I − μAᵀA)Z_t + μAᵀX). Learned ISTA (LISTA): Z_{t+1} = ψ_λ(WZ_t + SX), with learned thresholds λ and learned operators W and S.
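A LISTA forward pass is just this recursion unrolled for a fixed number of layers. A hedged sketch follows: training of W, S, and the thresholds is omitted, and the names here are illustrative. Initializing the parameters from ISTA, as the rewriting above suggests, makes the untrained network reproduce plain ISTA exactly.

```python
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lista_forward(X, W, S, theta, n_layers):
    """Unrolled LISTA: each layer computes Z <- psi_theta(W Z + S X)."""
    Z = np.zeros(W.shape[0])
    for _ in range(n_layers):
        Z = soft_threshold(W @ Z + S @ X, theta)
    return Z

def ista_init(A, lam, mu):
    """ISTA-equivalent initialization: W = I - mu A^T A, S = mu A^T,
    threshold lam * mu. Training would then adapt these parameters."""
    d = A.shape[1]
    return np.eye(d) - mu * A.T @ A, mu * A.T, lam * mu
```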
20 LISTA CONVERGENCE Replacing I − μAᵀA and μAᵀ in ISTA with the learned W and S improves convergence [Gregor & LeCun, 2010]. [Plot: E‖Z − Ẑ_t‖² vs. t, up to t = 100.] Extensions to other models: [Sprechmann, Bronstein & Sapiro, 2015], [Remez, Litani & Bronstein, 2015], [Tompson, Schlachter, Sprechmann & Perlin, 2016].
21 LISTA AS A NEURAL NETWORK [Block diagram: the input X ∈ R^d, with X = AZ + E and Z ∈ Υ, passes through the learned linear operator S; a feedback branch applies the learned operator W; the sum feeds the soft thresholding operation ψ with threshold λ, producing Ẑ, an estimate of Z.] [Gregor & LeCun, 2010]
22 ISTA [Block diagram: the iterative soft thresholding algorithm (ISTA). The input X ∈ R^d, with X = AZ + E, passes through μAᵀ; a feedback branch applies I − μAᵀA; the sum feeds the soft thresholding operation ψ with threshold λμ. The step size μ obeys μ ≤ 1/‖A‖². Output: the minimizer of min_Z ‖X − AZ‖² + λ‖Z‖₁.] [Daubechies, Defrise & De Mol, 2004], [Beck & Teboulle, 2009]
23 PROJECTED GRADIENT DESCENT (PGD) [Block diagram: the input X ∈ R^d, with X = AZ + E, passes through μAᵀ; a feedback branch applies I − μAᵀA, where μ is the step size; the sum feeds ψ, which projects onto the set Υ = {Z : f(Z) ≤ R}.] Output: Ẑ, an estimate of Z. Aim at solving min_Ẑ ‖X − AẐ‖² s.t. f(Ẑ) ≤ R.
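For a concrete instance of the projection ψ, take Υ to be the set of k-sparse vectors, so that ψ keeps the k largest-magnitude entries. This is the classical iterative-hard-thresholding special case of PGD; the slides' ψ is the projection onto a general set Υ, and the sizes below are arbitrary.

```python
import numpy as np

def project_ksparse(v, k):
    """Projection onto {Z : ||Z||_0 <= k}: keep the k largest magnitudes."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def pgd(X, A, k, mu, n_iter=500):
    """Projected gradient descent for min ||X - A Z||^2 s.t. ||Z||_0 <= k."""
    Z = np.zeros(A.shape[1])
    for _ in range(n_iter):
        # Gradient step followed by projection onto the feasible set.
        Z = project_ksparse(Z + mu * A.T @ (X - A @ Z), k)
    return Z
```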
24 THEORY FOR PGD Theorem 8: Let Z ∈ R^d, f: R^d → R a proper function, f(Z) ≤ R, C_f(Z) the tangent cone of f at the point Z, A ∈ R^{m×d} a random Gaussian matrix, and X = AZ + E. Then the estimate of PGD at iteration t, Ẑ_t, obeys ‖Ẑ_t − Z‖ ≤ κ_f ρ^t ‖Z‖, where ρ = sup_{U,V ∈ C_f(Z) ∩ B^d} Uᵀ(I − μAᵀA)V, and κ_f = 1 if f is convex and κ_f = 2 otherwise. [Oymak, Recht & Soltanolkotabi, 2016]
25 PGD CONVERGENCE RATE ρ = sup_{U,V ∈ C_f(Z) ∩ B^d} Uᵀ(I − μAᵀA)V is the convergence rate of PGD. Let ω be the Gaussian mean width of C_f(Z) ∩ B^d. If μ = 1/m, then ρ = O(ω/√m). If μ = 1/(√m + √d)², then ρ = 1 − O((√m − ω)/√(m + d)). For the k-sparse model, ω² = O(k log d). For a GMM with k Gaussians, ω² = O(k). How may we make ω smaller, to get a better convergence rate?
26 INACCURATE PROJECTION PGD iterations project onto Υ = {Z : f(Z) ≤ R}. Smaller Υ ⇒ smaller ω ⇒ faster convergence, as ρ = 1 − O((√m − ω)/√(m + d)) or O(ω/√m). Let us assume that our signal belongs to a smaller set Υ' = {Z : f'(Z) ≤ R'} with ω' ≪ ω. Ideally, we would like to project onto Υ' instead of Υ; this would lead to faster convergence. What if such a projection is not feasible? [Illustration: the smaller set Υ' contained inside Υ.]
27 INACCURATE PROJECTION We will approximate the projection onto Υ' by a linear projection P followed by a projection onto Υ. Assumption: ‖P_Υ(PZ) − Z‖ ≤ ϵ, i.e., applying P to the target vector Z and then projecting onto Υ stays ϵ-close to Z.
28 INACCURATE PGD (IPGD) [Block diagram: the input X ∈ R^d, with X = AZ + E, passes through μPAᵀ; a feedback branch applies P(I − μAᵀA), where μ is the step size; the sum feeds ψ, which projects onto the set Υ = {Z : f(Z) ≤ R}.] Output: Ẑ, an estimate of Z. Aim at solving min_Ẑ ‖X − AẐ‖² s.t. f(Ẑ) ≤ R.
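IPGD only changes where the linear map P enters: the gradient step is filtered through P before the feasible projection. A sketch, again using a k-sparse projection as the concrete ψ (illustrative names; with P = I this reduces to plain PGD):

```python
import numpy as np

def project_ksparse(v, k):
    """Keep the k largest-magnitude entries; zero the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ipgd(X, A, P, k, mu, n_iter=500):
    """Inaccurate PGD: Z <- psi(P (Z + mu A^T (X - A Z)))."""
    Z = np.zeros(A.shape[1])
    for _ in range(n_iter):
        Z = project_ksparse(P @ (Z + mu * A.T @ (X - A @ Z)), k)
    return Z
```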
29 THEORY FOR IPGD Theorem 9: Let Z ∈ R^d, f: R^d → R a proper convex* function, f(Z) ≤ R, C_f(Z) the tangent cone of f at the point Z, A ∈ R^{m×d} a random Gaussian matrix, and X = AZ + E. Then the estimate of IPGD at iteration t, Ẑ_t, obeys ‖Ẑ_t − Z‖ ≤ (ρ_P^t + ((1 − ρ_P^t)/(1 − ρ_P)) ε) ‖Z‖, where ρ_P = sup_{U,V ∈ C_f(Z) ∩ B^d} UᵀP(I − μAᵀA)PV and ε = (2 + ρ_P)ϵ. [Giryes, Eldar, Bronstein & Sapiro, 2018] *We have a version of this theorem also when f is non-proper or non-convex.
30 CONVERGENCE RATE COMPARISON PGD convergence: ρ^t. IPGD convergence: ρ_P^t + ((1 − ρ_P^t)/(1 − ρ_P))(2 + ρ_P)ϵ ≈(a) ρ_P^t + ε ≈(b) ρ_P^t ≤(c) ρ^t. (a) ε is negligible compared to ρ_P. (b) For small values of t (early iterations). (c) Faster convergence, as ρ_P ≤ ρ (because ω_P ≤ ω).
31 MODEL BASED COMPRESSED SENSING Υ' is the set of sparse vectors whose sparsity patterns obey a tree structure. Projecting onto Υ' improves the convergence rate compared to projecting onto the set of unstructured sparse vectors Υ [Baraniuk et al., 2010]. However, the projection onto Υ' is more demanding than the projection onto Υ. Note that the probability of selecting atoms from lower tree levels is smaller than from upper ones. P will be a projection onto certain tree levels, zeroing the values at the lower levels.
32 MODEL BASED COMPRESSED SENSING [Plot: reconstruction error ‖Z − Ẑ_t‖ vs. iteration t.] The picked non-zero entries have a zero-mean random Gaussian distribution, with variance 1 at the first two levels and smaller variances at the third and remaining levels.
33 SPECTRAL COMPRESSED SENSING Υ is the set of vectors with a sparse representation in a 2-times redundant DCT dictionary such that: the active atoms are selected uniformly at random with a minimum distance of 20 between neighboring atoms; the value of each representation coefficient is ~N(0,1) i.i.d.; the neighboring coefficients at distance 2 of each active atom are set to be zero-mean Gaussian variables of smaller variance. We set P to be a pooling-like operation that keeps in each window of size 3 only the largest value.
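The pooling-like P described above can be sketched as follows. This is a minimal illustration assuming non-overlapping windows and "largest value" meaning largest magnitude; the exact windowing used in the experiments may differ.

```python
import numpy as np

def window_keep_max(v, w):
    """In each non-overlapping window of length w, keep only the entry
    of largest magnitude and zero out the rest."""
    out = np.zeros_like(v)
    for start in range(0, len(v), w):
        block = v[start:start + w]
        j = start + int(np.argmax(np.abs(block)))  # winner within the window
        out[j] = v[j]
    return out
```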
34 SPECTRAL COMPRESSED SENSING [Plot: reconstruction error ‖Z − Ẑ_t‖ vs. iteration t.]
35 SPECTRAL COMPRESSED SENSING Υ is the set of vectors with a sparse representation in a 4-times redundant DCT dictionary such that: the active atoms are selected uniformly at random with a minimum distance of 5 between neighboring atoms; the value of each representation coefficient is ~N(0,1) i.i.d.; the neighboring coefficients at distances 1 and 2 of each active atom are set to be zero-mean Gaussian variables of smaller variance. We set P to be a pooling-like operation that keeps in each window of size 5 only the largest value.
36 SPECTRAL COMPRESSED SENSING [Plot: reconstruction error ‖Z − Ẑ_t‖ vs. iteration t.]
37 LEARNING THE PROJECTION If we have no explicit information about Υ', it might be desirable to learn the projection. Instead of learning P, it is possible to replace P(I − μAᵀA) and μPAᵀ with two learned matrices W and S, respectively. This leads to a scheme very similar to LISTA and provides a theoretical foundation for LISTA's success.
38 LEARNED IPGD [Block diagram: the input X ∈ R^d, with X = AZ + E, passes through the learned linear operator S; a feedback branch applies the learned operator W; the sum feeds ψ, which projects onto the set Υ = {Z : f(Z) ≤ R}.] Output: Ẑ, an estimate of Z. Aim at solving min_Ẑ ‖X − AẐ‖² s.t. f(Ẑ) ≤ R.
39 SUPER RESOLUTION A popular super-resolution technique uses a pair of low-res and high-res dictionaries [Zeyde et al. 2012]. The original work uses OMP with sparsity 3 to decode the representation of patches in the low-res image. Then the representation is used to reconstruct the patches of the high-res image. We replace OMP with learned IPGD (LIPGD) with 3 levels but a higher target sparsity. This leads to better reconstruction results (with up to 0.5 dB improvement).
40 LISTA [Block diagram: the input X ∈ R^d, with X = AZ + E, passes through the learned linear operator S; a feedback branch applies the learned operator W; the sum feeds ψ, producing Ẑ, an estimate of Z.] Here ψ is a proximal mapping: ψ(U) = argmin_{Z ∈ R^d} ‖U − Z‖² + λf(Z). Aim at solving min_Ẑ ‖X − AẐ‖² + λf(Ẑ).
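For f(Z) = ‖Z‖₁, the proximal mapping in this slide has a closed form: with the data term written as ‖U − Z‖² (no ½ factor, as above), the minimizer is soft thresholding with threshold λ/2. A quick numeric sanity check of that closed form (an illustrative sketch, not code from the talk):

```python
import numpy as np

def prox_l1(u, lam):
    """argmin_z (u - z)^2 + lam * |z|, per coordinate:
    soft thresholding at lam / 2 (note the missing 1/2 in the data term)."""
    return np.sign(u) * np.maximum(np.abs(u) - lam / 2.0, 0.0)

def prox_l1_bruteforce(u, lam):
    """Grid-search the same scalar objective as an independent check."""
    grid = np.linspace(-10.0, 10.0, 400001)  # grid step 5e-5
    return grid[np.argmin((u - grid) ** 2 + lam * np.abs(grid))]
```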
41 LISTA MIXTURE MODEL Approximating the projection onto Υ with one linear projection may not be accurate enough, which requires more LISTA layers/iterations. Instead, one may use several LISTA networks, where each approximates a different part of Υ. Training multiple LISTA networks accelerates the convergence further. [Illustration: the set Υ covered by several local approximations.]
42 LISTA MIXTURE MODEL X A Z t Z t 1 X AZ 2 2 Z 1 42
43 RELATED WORKS In [Bruna et al. 2017] it is shown that learning may give a gain due to better preconditioning of A. In [Xin et al. 2016] a relation to the restricted isometry property (RIP) is drawn. In [Borgerding & Schniter, 2016] a connection is drawn to approximate message passing (AMP). All these works consider only the sparsity case.
44 ACKNOWLEDGEMENTS Guillermo Sapiro (Duke University), Yonina C. Eldar (Technion), Alex M. Bronstein (Technion).
45 QUESTIONS? WEB.ENG.TAU.AC.IL/~RAJA
46 GAUSSIAN MEAN WIDTH Gaussian mean width: ω(Υ) = E[sup_{V,W ∈ Υ} ⟨V − W, g⟩], g ~ N(0, I). [Illustration: the width of the set Υ in the direction of g.]
47 MEASURE FOR LOW DIMENSIONALITY Gaussian mean width: ω(Υ) = E[sup_{V,W ∈ Υ} ⟨V − W, g⟩], g ~ N(0, I). ω²(Υ) is a measure of the dimensionality of the data. Examples: if Υ ⊆ B^d is a Gaussian mixture model with k Gaussians, then ω²(Υ) = O(k); if Υ ⊆ B^d is data with k-sparse representations, then ω²(Υ) = O(k log d).
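The k-sparse example can be checked by Monte Carlo: for the symmetric set of unit-norm k-sparse vectors, sup_V ⟨V, g⟩ equals the ℓ₂ norm of the k largest-magnitude entries of g, so the mean width is twice its expectation. A sketch (the constants below are loose assumptions, not from the talk):

```python
import numpy as np

def mean_width_ksparse(d, k, n_trials=300, seed=0):
    """Monte Carlo estimate of the Gaussian mean width of
    {V : ||V||_0 <= k, ||V||_2 <= 1}."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_trials):
        g = rng.standard_normal(d)
        topk = np.sort(np.abs(g))[-k:]
        # sup_{V,W} <V - W, g> = 2 sup_V <V, g> = 2 ||top-k entries of g||_2
        total += 2.0 * np.linalg.norm(topk)
    return total / n_trials
```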
More informationPhil Schniter. Supported in part by NSF grants IIP , CCF , and CCF
AMP-inspired Deep Networks, with Comms Applications Phil Schniter Collaborators: Sundeep Rangan (NYU), Alyson Fletcher (UCLA), Mark Borgerding (OSU) Supported in part by NSF grants IIP-1539960, CCF-1527162,
More informationA Convex Approach for Designing Good Linear Embeddings. Chinmay Hegde
A Convex Approach for Designing Good Linear Embeddings Chinmay Hegde Redundancy in Images Image Acquisition Sensor Information content
More informationMultilevel Preconditioning and Adaptive Sparse Solution of Inverse Problems
Multilevel and Adaptive Sparse of Inverse Problems Fachbereich Mathematik und Informatik Philipps Universität Marburg Workshop Sparsity and Computation, Bonn, 7. 11.6.2010 (joint work with M. Fornasier
More informationBM3D-prGAMP: Compressive Phase Retrieval Based on BM3D Denoising
BM3D-prGAMP: Compressive Phase Retrieval Based on BM3D Denoising Chris Metzler, Richard Baraniuk Rice University Arian Maleki Columbia University Phase Retrieval Applications: Crystallography Microscopy
More informationStatistical Machine Learning
Statistical Machine Learning Christoph Lampert Spring Semester 2015/2016 // Lecture 12 1 / 36 Unsupervised Learning Dimensionality Reduction 2 / 36 Dimensionality Reduction Given: data X = {x 1,..., x
More informationA picture of the energy landscape of! deep neural networks
A picture of the energy landscape of! deep neural networks Pratik Chaudhari December 15, 2017 UCLA VISION LAB 1 Dy (x; w) = (w p (w p 1 (... (w 1 x))...)) w = argmin w (x,y ) 2 D kx 1 {y =i } log Dy i
More informationRecovering overcomplete sparse representations from structured sensing
Recovering overcomplete sparse representations from structured sensing Deanna Needell Claremont McKenna College Feb. 2015 Support: Alfred P. Sloan Foundation and NSF CAREER #1348721. Joint work with Felix
More informationNon-convex optimization. Issam Laradji
Non-convex optimization Issam Laradji Strongly Convex Objective function f(x) x Strongly Convex Objective function Assumptions Gradient Lipschitz continuous f(x) Strongly convex x Strongly Convex Objective
More informationSparsifying Transform Learning for Compressed Sensing MRI
Sparsifying Transform Learning for Compressed Sensing MRI Saiprasad Ravishankar and Yoram Bresler Department of Electrical and Computer Engineering and Coordinated Science Laborarory University of Illinois
More informationNeural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17
3/9/7 Neural Networks Emily Fox University of Washington March 0, 207 Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer) Single-layer neural network 3/9/7 Perceptron as a neural
More informationHow to Escape Saddle Points Efficiently? Praneeth Netrapalli Microsoft Research India
How to Escape Saddle Points Efficiently? Praneeth Netrapalli Microsoft Research India Chi Jin UC Berkeley Michael I. Jordan UC Berkeley Rong Ge Duke Univ. Sham M. Kakade U Washington Nonconvex optimization
More informationHoldout and Cross-Validation Methods Overfitting Avoidance
Holdout and Cross-Validation Methods Overfitting Avoidance Decision Trees Reduce error pruning Cost-complexity pruning Neural Networks Early stopping Adjusting Regularizers via Cross-Validation Nearest
More informationClassification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses about the label (Top-5 error) No Bounding Box
ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton Motivation Classification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses
More informationIntroduction to Sparsity. Xudong Cao, Jake Dreamtree & Jerry 04/05/2012
Introduction to Sparsity Xudong Cao, Jake Dreamtree & Jerry 04/05/2012 Outline Understanding Sparsity Total variation Compressed sensing(definition) Exact recovery with sparse prior(l 0 ) l 1 relaxation
More informationOn the Sub-Optimality of Proximal Gradient Descent for l 0 Sparse Approximation
On the Sub-Optimality of Proximal Gradient Descent for l 0 Sparse Approximation Yingzhen Yang YYANG58@ILLINOIS.EDU Jianchao Yang JIANCHAO.YANG@SNAPCHAT.COM Snapchat, Venice, CA 90291 Wei Han WEIHAN3@ILLINOIS.EDU
More informationSparse and Low Rank Recovery via Null Space Properties
Sparse and Low Rank Recovery via Null Space Properties Holger Rauhut Lehrstuhl C für Mathematik (Analysis), RWTH Aachen Convexity, probability and discrete structures, a geometric viewpoint Marne-la-Vallée,
More informationLearning Sparsifying Transforms
1 Learning Sparsifying Transforms Saiprasad Ravishankar, Student Member, IEEE, and Yoram Bresler, Fellow, IEEE Abstract The sparsity of signals and images in a certain transform domain or dictionary has
More informationarxiv: v1 [cs.lg] 13 Dec 2017
Mathematics of Deep Learning René Vidal Joan Bruna Raja Giryes Stefano Soatto arxiv:1712.04741v1 [cs.lg] 13 Dec 2017 Abstract Recently there has been a dramatic increase in the performance of recognition
More informationOPTIMIZATION METHODS IN DEEP LEARNING
Tutorial outline OPTIMIZATION METHODS IN DEEP LEARNING Based on Deep Learning, chapter 8 by Ian Goodfellow, Yoshua Bengio and Aaron Courville Presented By Nadav Bhonker Optimization vs Learning Surrogate
More informationMirror Descent for Metric Learning. Gautam Kunapuli Jude W. Shavlik
Mirror Descent for Metric Learning Gautam Kunapuli Jude W. Shavlik And what do we have here? We have a metric learning algorithm that uses composite mirror descent (COMID): Unifying framework for metric
More informationMMSE Denoising of 2-D Signals Using Consistent Cycle Spinning Algorithm
Denoising of 2-D Signals Using Consistent Cycle Spinning Algorithm Bodduluri Asha, B. Leela kumari Abstract: It is well known that in a real world signals do not exist without noise, which may be negligible
More informationMaximum Margin Matrix Factorization
Maximum Margin Matrix Factorization Nati Srebro Toyota Technological Institute Chicago Joint work with Noga Alon Tel-Aviv Yonatan Amit Hebrew U Alex d Aspremont Princeton Michael Fink Hebrew U Tommi Jaakkola
More informationAccelerating Stochastic Optimization
Accelerating Stochastic Optimization Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem and Mobileye Master Class at Tel-Aviv, Tel-Aviv University, November 2014 Shalev-Shwartz
More informationSTAT 200C: High-dimensional Statistics
STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57
More information