TRADEOFFS BETWEEN SPEED AND ACCURACY IN INVERSE PROBLEMS RAJA GIRYES TEL AVIV UNIVERSITY


1 TRADEOFFS BETWEEN SPEED AND ACCURACY IN INVERSE PROBLEMS RAJA GIRYES TEL AVIV UNIVERSITY Deep Learning TCE Conference June 14, 2018

2 DEEP LEARNING IMPACT Today we get 3.5% error with 152 layers. ImageNet dataset: 1,400,000 images, 1,000 categories, plus held-out images for testing and validation.

3 CUTTING EDGE PERFORMANCE IN MANY OTHER APPLICATIONS Disease diagnosis [Greenspan, Riklin, Kiriaty]. Language translation [Sutskever et al., 2014]. Video classification [Karpathy et al., 2014]. Handwriting recognition [Poznanski & Wolf, 2016]. Sentiment classification [Socher et al., 2013]. Image denoising [Remez et al., 2017]. Depth reconstruction [Haim et al., 2017]. Super-resolution [Kim et al., 2016], [Bruna et al., 2016]. And many other applications.

4 CLASS AWARE DENOISING [Figure: class-aware denoising compared with agnostic denoising.] [Remez, Litani, Giryes and Bronstein, 2017]

5 DEEP ISP [Schwartz, Giryes and Bronstein, 2018]

6 NON-UNIFORMLY QUANTIZED DEEP LEARNING Performance vs. complexity of different quantized neural networks. [Baskin et al., 2018]

7 DEPTH ESTIMATION BY PHASE CODED CUES [Haim, Elmalem, Bronstein, Marom, Giryes, 2018]

8 COMPRESSED COLOR LIGHT FIELD [Nabati, Mendelovic, Giryes, 2018]

9 EXOPLANETS DETECTION [Zucker and Giryes, 2018]

10 PARTIAL SHAPE ALIGNMENT Alignment is performed by a free-form deformation generated by a network. [Hanocka et al., 2018]

11 WARNING: NO DEEP LEARNING HERE! [Levy, Vahid and Giryes, 2018]

12 WHY DO THINGS WORK BETTER TODAY? More data: larger datasets, more access (internet). Better hardware (GPUs). Better learning regularization (e.g., dropout). Deep learning's impact and success are not unique to image classification. But it is still unclear why deep neural networks are so remarkably successful and how they achieve it.

13 SAMPLE OF RELATED EXISTING THEORY Universal approximation for any measurable Borel function [Hornik et al., 1989], [Cybenko, 1989]. Depth of a network provides an exponential complexity gain compared to the number of parameters [Montúfar et al., 2014], [Cohen et al., 2016], [Eldan & Shamir, 2016] and invariance to more complex deformations [Bruna & Mallat, 2013]. The number of training samples scales as the number of parameters [Shalev-Shwartz & Ben-David, 2014] or the norm of the weights in the DNN [Neyshabur et al., 2015]. Pooling relation to shift invariance and phase retrieval [Bruna et al., 2013, 2014]. Deeper networks have more local minima that are close to the global one and fewer saddle points [Saxe et al., 2014], [Dauphin et al., 2014], [Choromanska et al., 2015], [Haeffele & Vidal, 2015], [Soudry & Hoffer, 2017]. Relation to dictionary learning [Papyan et al., 2016]. Information bottleneck [Shwartz-Ziv & Tishby, 2017], [Tishby & Zaslavsky, 2017]. Invariant representation for certain tasks [Soatto & Chiuso, 2016]. Bayesian deep learning [Kendall & Gal, 2017], [Patel, Nguyen & Baraniuk, 2016].

14 SAMPLE OF OUR THEORY DNNs with random Gaussian weights are good for classifying the average points in the data, and an important goal of training is to classify the boundary points between the different classes in the data [Giryes et al., 2016]. Generalization error of neural networks [Sokolic, Giryes, Sapiro & Rodrigues, 2017]. Relationship between invariance and generalization in deep learning [Sokolic, Giryes, Sapiro & Rodrigues, 2017]. Solving optimization problems with neural networks [Giryes, Eldar, Bronstein & Sapiro, 2018]. Robustness to adversarial examples [Jakubovitz & Giryes, 2018]. Robustness to label noise [Drory, Avidan & Giryes, 2018]. Observation of k-NN behavior in neural networks to explain the coexistence of memorization and generalization in neural networks [Cohen, Sapiro & Giryes, 2018].

15 SAMPLE OF OUR THEORY (same list as the previous slide) Our focus today: solving optimization problems with neural networks [Giryes, Eldar, Bronstein & Sapiro, 2018].

16 INVERSE PROBLEMS We are given $X = AZ + E$, where $X$ is the given set of measurements, $A$ is a linear operator, $Z$ is the unknown signal, and $E$ is noise. Standard technique for recovery: $\min_Z \|X - AZ\|_2^2 \;\text{s.t.}\; Z \in \Upsilon$, where $\Upsilon$ is a low-dimensional set in which $Z$ resides. Unconstrained form: $\min_Z \|X - AZ\|_2^2 + \lambda f(Z)$, where $\lambda$ is a regularization parameter and $f$ is a penalty function.

17 $\ell_1$ MINIMIZATION CASE Unconstrained form: $\min_Z \|X - AZ\|_2^2 + \lambda \|Z\|_1$. This can be solved by a proximal gradient method, e.g., the iterative shrinkage and thresholding algorithm (ISTA): $Z_{t+1} = \psi_{\lambda\mu}\big(Z_t + \mu A^T (X - A Z_t)\big)$, where $\psi_{\lambda\mu}$ is the soft-thresholding operation with threshold $\lambda\mu$ and $\mu$ is the step size.
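
To make the ISTA update concrete, here is a minimal NumPy sketch; the matrix sizes, noise level, λ, and iteration count in the toy usage are illustrative placeholders, not values from the talk.

```python
import numpy as np

def soft_threshold(v, tau):
    """Soft-thresholding operator psi_tau: shrink every entry toward zero by tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(X, A, lam, mu, n_iter=200):
    """ISTA: Z_{t+1} = psi_{lam*mu}(Z_t + mu * A^T (X - A Z_t))."""
    Z = np.zeros(A.shape[1])
    for _ in range(n_iter):
        Z = soft_threshold(Z + mu * A.T @ (X - A @ Z), lam * mu)
    return Z

# Toy usage: recover a sparse vector from noisy Gaussian measurements.
rng = np.random.default_rng(0)
m, d, k = 50, 200, 5
A = rng.standard_normal((m, d)) / np.sqrt(m)
Z_true = np.zeros(d)
Z_true[rng.choice(d, size=k, replace=False)] = rng.standard_normal(k)
X = A @ Z_true + 0.01 * rng.standard_normal(m)
mu = 1.0 / np.linalg.norm(A, 2) ** 2   # step size obeying mu <= 1/||A||^2
Z_hat = ista(X, A, lam=0.05, mu=mu)
```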

18 ISTA CONVERGENCE [Plot: reconstruction mean squared error $E\|Z - \hat{Z}_t\|^2$ as a function of the number of iterations $t$.]

19 LISTA ISTA: $Z_{t+1} = \psi_{\lambda\mu}\big(Z_t + \mu A^T (X - A Z_t)\big)$. Rewriting ISTA: $Z_{t+1} = \psi_{\lambda\mu}\big((I - \mu A^T A) Z_t + \mu A^T X\big)$. Learned ISTA (LISTA): $Z_{t+1} = \psi_{\lambda}\big(W Z_t + S X\big)$, with learned thresholds $\lambda$ and learned operators $W$ and $S$.
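
One way the LISTA recursion could be written as an unrolled network, sketched here in PyTorch; the initialization of W and S at their ISTA values, the per-layer thresholds, and the layer count are assumptions for illustration rather than the exact architecture of [Gregor & LeCun, 2010]. Training would then minimize the reconstruction error of the output against ground-truth signals on example pairs.

```python
import torch
import torch.nn as nn

class LISTA(nn.Module):
    """Unrolled LISTA: Z_{t+1} = psi_theta(W Z_t + S X), with W, S and the thresholds learned."""
    def __init__(self, A, n_layers=16, mu=0.1, lam=0.1):
        super().__init__()
        d = A.shape[1]
        # Initialize at the ISTA operators: W = I - mu A^T A, S = mu A^T.
        self.W = nn.Parameter(torch.eye(d) - mu * A.T @ A)
        self.S = nn.Parameter(mu * A.T)
        self.theta = nn.Parameter(torch.full((n_layers,), lam * mu))
        self.n_layers = n_layers

    def forward(self, X):                       # X: (batch, m)
        Z = X.new_zeros(X.shape[0], self.W.shape[0])
        for t in range(self.n_layers):
            pre = Z @ self.W.T + X @ self.S.T
            Z = torch.sign(pre) * torch.relu(pre.abs() - self.theta[t])  # soft thresholding
        return Z
```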

20 LISTA CONVERGENCE Replacing $I - \mu A^T A$ and $\mu A^T$ in ISTA with the learned $W$ and $S$ improves convergence [Gregor & LeCun, 2010]. [Plot: $E\|Z - \hat{Z}_t\|^2$ as a function of the iteration $t$.] Extensions to other models: [Sprechmann, Bronstein & Sapiro, 2015], [Remez, Litani & Bronstein, 2015], [Tompson, Schlachter, Sprechmann & Perlin, 2016].

21 LISTA AS A NEURAL NETWORK [Block diagram: the input $X \in \mathbb{R}^d$ (with $X = AZ + E$, $Z \in \Upsilon$) passes through the learned linear operator $S$, is summed with the feedback through the learned $W$, and then goes through the soft-thresholding operation $\psi_{\lambda}$, producing an estimate of $Z$.] [Gregor & LeCun, 2010]

22 ISTA [Block diagram: the iterative soft thresholding algorithm (ISTA); the input $X \in \mathbb{R}^d$ (with $X = AZ + E$) passes through $\mu A^T$, is summed with the feedback through $I - \mu A^T A$, and then goes through the soft-thresholding operation $\psi_{\lambda\mu}$.] The step size $\mu$ obeys $\mu \le 1/\|A\|^2$, and the output is the minimizer of $\min_{\hat{Z}} \|X - A\hat{Z}\|_2^2 + \lambda \|\hat{Z}\|_1$. [Daubechies, Defrise & De Mol, 2004], [Beck & Teboulle, 2009]

23 PROJECTED GRADIENT DESCENT (PGD) [Block diagram: the input $X \in \mathbb{R}^d$ (with $X = AZ + E$, $f(Z) \le R$) passes through $\mu A^T$, is summed with the feedback through $I - \mu A^T A$, and then goes through $\psi$, producing an estimate $\hat{Z}$ of $Z$.] Here $\mu$ is the step size and $\psi$ projects onto the set $\Upsilon = \{\hat{Z} : f(\hat{Z}) \le R\}$. PGD aims at solving $\min_{\hat{Z}} \|X - A\hat{Z}\|_2^2 \;\text{s.t.}\; f(\hat{Z}) \le R$.
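
A minimal NumPy sketch of PGD for the case where $\Upsilon$ is the set of k-sparse vectors, so that the projection ψ is hard thresholding (keep the k largest-magnitude entries); k, μ, and the iteration count are illustrative parameters.

```python
import numpy as np

def project_k_sparse(v, k):
    """Projection onto the set of k-sparse vectors: keep the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def pgd(X, A, k, mu, n_iter=100):
    """PGD: Z_{t+1} = psi(Z_t + mu * A^T (X - A Z_t)), with psi the projection onto Upsilon."""
    Z = np.zeros(A.shape[1])
    for _ in range(n_iter):
        Z = project_k_sparse(Z + mu * A.T @ (X - A @ Z), k)
    return Z
```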

24 THEORY FOR PGD Theorem 8: Let $Z \in \mathbb{R}^d$, $f: \mathbb{R}^d \to \mathbb{R}$ a proper function, $f(Z) \le R$, $C_f(Z)$ the tangent cone of $f$ at the point $Z$, $A \in \mathbb{R}^{m \times d}$ a random Gaussian matrix and $X = AZ + E$. Then the estimate of PGD at iteration $t$, $\hat{Z}_t$, obeys $\|\hat{Z}_t - Z\| \le \kappa_f \rho^t \|Z\|$, where $\rho = \sup_{U,V \in C_f(Z) \cap B^d} U^T (I - \mu A^T A) V$ and $\kappa_f = 1$ if $f$ is convex and $\kappa_f = 2$ otherwise. [Oymak, Recht & Soltanolkotabi, 2016]

25 PGD CONVERGENCE RATE $\rho = \sup_{U,V \in C_f(Z) \cap B^d} U^T (I - \mu A^T A) V$ is the convergence rate of PGD. Let $\omega$ be the Gaussian mean width of $C_f(Z) \cap B^d$. If $\mu = \frac{1}{(\sqrt{m}+\sqrt{d})^2}$ then $\rho = 1 - O\!\left(\frac{\sqrt{m}-\omega}{\sqrt{m}+\sqrt{d}}\right)$. If $\mu = \frac{1}{m}$ then $\rho = O\!\left(\frac{\omega}{\sqrt{m}}\right)$. For the $k$-sparse model $\omega^2 = O(k \log d)$; for a GMM with $k$ Gaussians $\omega^2 = O(k)$. How may we make $\omega$ smaller in order to get a better convergence rate?

26 INACCURATE PROJECTION The PGD iterations project onto $\Upsilon = \{Z : f(Z) \le R\}$. A smaller $\Upsilon$ means a smaller $\omega$ and hence faster convergence, as $\rho = 1 - O\!\left(\frac{\sqrt{m}-\omega}{\sqrt{m}+\sqrt{d}}\right)$ or $O\!\left(\frac{\omega}{\sqrt{m}}\right)$. Let us assume that our signal belongs to a smaller set $\tilde{\Upsilon} = \{Z : \tilde{f}(Z) \le \tilde{R}\}$ with $\tilde{\omega} \le \omega$. Ideally, we would like to project onto $\tilde{\Upsilon}$ instead of $\Upsilon$; this would lead to faster convergence. What if such a projection is not feasible?

27 INACCURATE PROJECTION We estimate the projection onto $\tilde{\Upsilon}$ by a linear projection $P$ followed by a projection onto $\Upsilon$. Assumption: $\|\mathcal{P}_{\Upsilon}(PZ) - Z\| \le \epsilon$, i.e., projecting the target vector $Z$ by $P$ and then onto $\Upsilon$ leaves it $\epsilon$-close to $Z$.

28 INACCURATE PGD (IPGD) [Block diagram: the input $X \in \mathbb{R}^d$ (with $X = AZ + E$, $f(Z) \le R$) passes through $\mu P A^T$, is summed with the feedback through $P(I - \mu A^T A)$, and then goes through $\psi$, producing an estimate $\hat{Z}$ of $Z$.] Here $\mu$ is the step size and $\psi$ projects onto the set $\Upsilon = \{\hat{Z} : f(\hat{Z}) \le R\}$. IPGD aims at solving $\min_{\hat{Z}} \|X - A\hat{Z}\|_2^2 \;\text{s.t.}\; f(\hat{Z}) \le R$.
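
The IPGD update differs from PGD only in the extra linear map P applied before the projection onto $\Upsilon$; here is a sketch under the same k-sparse assumptions as the PGD example above (project_k_sparse is the hypothetical projection defined there).

```python
import numpy as np

def ipgd(X, A, P, project, mu, n_iter=100):
    """IPGD: Z_{t+1} = psi(P (I - mu A^T A) Z_t + mu P A^T X)
             = psi(P (Z_t + mu A^T (X - A Z_t))), with psi the projection onto Upsilon."""
    Z = np.zeros(A.shape[1])
    for _ in range(n_iter):
        Z = project(P @ (Z + mu * A.T @ (X - A @ Z)))
    return Z

# Example call, reusing the k-sparse projection from the PGD sketch:
# Z_hat = ipgd(X, A, P, lambda v: project_k_sparse(v, k), mu)
```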

29 THEORY FOR IPGD Theorem 9: Let $Z \in \mathbb{R}^d$, $f: \mathbb{R}^d \to \mathbb{R}$ a proper convex* function, $f(Z) \le R$, $C_f(Z)$ the tangent cone of $f$ at the point $Z$, $A \in \mathbb{R}^{m \times d}$ a random Gaussian matrix and $X = AZ + E$. Then the estimate of IPGD at iteration $t$, $\hat{Z}_t$, obeys $\|\hat{Z}_t - Z\| \le \left(\rho_P^t + \frac{1 - \rho_P^t}{1 - \rho_P}\,\varepsilon\right)\|Z\|$, where $\rho_P = \sup_{U,V \in C_f(Z) \cap B^d} U^T P(I - \mu A^T A)PV$ and $\varepsilon = (2 + \rho_P)\epsilon$. [Giryes, Eldar, Bronstein & Sapiro, 2018] *A version of this theorem also holds when $f$ is a non-proper or non-convex function.

30 CONVERGENCE RATE COMPARISON PGD convergence: $\rho^t$. IPGD convergence: $\rho_P^t + \frac{1-\rho_P^t}{1-\rho_P}(2+\rho_P)\epsilon \;\overset{(a)}{\approx}\; \rho_P^t + \epsilon \;\overset{(b)}{\approx}\; \rho_P^t \;\overset{(c)}{\le}\; \rho^t$, where (a) treats the accumulated projection-error term as being on the order of $\epsilon$, (b) holds for small values of $t$ (early iterations), for which $\epsilon$ is negligible compared to $\rho_P^t$, and (c) reflects faster convergence since $\rho_P \le \rho$ (because $\omega_P \le \omega$).
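
As a tiny numeric illustration of this comparison, the sketch below evaluates the two bounds for made-up values of ρ, ρ_P, and ε (not values from the talk): the IPGD bound decays faster at early iterations and then saturates at a level set by the projection error.

```python
import numpy as np

# Illustrative values only: rho (PGD rate), rho_P (IPGD rate), eps (projection error).
rho, rho_P, eps = 0.95, 0.80, 1e-3
t = np.arange(1, 51)
pgd_bound = rho ** t
ipgd_bound = rho_P ** t + (1 - rho_P ** t) / (1 - rho_P) * (2 + rho_P) * eps
# pgd_bound keeps decaying toward 0, while ipgd_bound drops much faster early on
# and then levels off near (2 + rho_P) * eps / (1 - rho_P) ~= 0.014.
```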

31 MODEL BASED COMPRESSED SENSING $\tilde{\Upsilon}$ is the set of sparse vectors whose sparsity patterns obey a tree structure. Projecting onto $\tilde{\Upsilon}$ improves the convergence rate compared to projecting onto the set of generic sparse vectors $\Upsilon$ [Baraniuk et al., 2010]. The projection onto $\tilde{\Upsilon}$ is more demanding than the projection onto $\Upsilon$. Note that the probability of selecting atoms from the lower tree levels is smaller than from the upper ones, so $P$ will be a projection onto certain tree levels, zeroing the values at the lower levels.
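
As one concrete reading of that last point, here is a sketch of such a P for coefficients of a full binary tree stored level by level (root first); the storage convention and the number of kept levels are illustrative assumptions, not the exact construction used in the experiments.

```python
import numpy as np

def tree_level_projection(v, n_levels_kept, n_levels_total):
    """Keep the coefficients of the top tree levels and zero the lower (deeper) levels.

    v is assumed to store a full binary tree level by level: 1 root, 2 children,
    4 grandchildren, ..., so level l (starting at 0) occupies 2**l entries.
    """
    assert len(v) == 2 ** n_levels_total - 1
    n_kept = 2 ** n_levels_kept - 1          # number of nodes in the kept levels
    out = np.zeros_like(v)
    out[:n_kept] = v[:n_kept]
    return out

# e.g. keep the top 3 of 7 levels of a length-127 coefficient vector:
# P = lambda v: tree_level_projection(v, n_levels_kept=3, n_levels_total=7)
```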

32 MODEL BASED COMPRESSED SENSING [Plot: $\|Z - \hat{Z}_t\|$ as a function of the iteration $t$.] The non-zero entries are picked with a zero-mean random Gaussian distribution whose variance is 1 at the first two levels and decreases at the third and the remaining levels.

33 SPECTRAL COMPRESSED SENSING $\Upsilon$ is the set of vectors with a sparse representation in a 2-times redundant DCT dictionary such that: the active atoms are selected uniformly at random with a minimum distance of 20 between neighboring atoms; the value of each representation coefficient is $\sim N(0,1)$ i.i.d.; the neighboring coefficients at distances 1 and 2 of each active atom are set to be zero-mean Gaussians with smaller variances, respectively. We set $P$ to be a pooling-like operation that keeps in each window of size 3 only the largest value.
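
A sketch of the pooling-like P just described, keeping a single entry per non-overlapping window and zeroing the rest; selecting by magnitude rather than signed value is an assumption of this sketch.

```python
import numpy as np

def pooling_like_projection(v, window=3):
    """In each non-overlapping window, keep only the largest-magnitude entry; zero the others."""
    out = np.zeros_like(v)
    for start in range(0, len(v), window):
        block = np.abs(v[start:start + window])
        j = start + int(np.argmax(block))
        out[j] = v[j]
    return out
```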

34 SPECTRAL COMPRESSED SENSING [Plot: $\|Z - \hat{Z}_t\|$ as a function of the iteration $t$.]

35 SPECTRAL COMPRESSED SENSING $\Upsilon$ is the set of vectors with a sparse representation in a 4-times redundant DCT dictionary such that: the active atoms are selected uniformly at random with a minimum distance of 5 between neighboring atoms; the value of each representation coefficient is $\sim N(0,1)$ i.i.d.; the neighboring coefficients at distances 1 and 2 of each active atom are set to be zero-mean Gaussians with smaller variances. We set $P$ to be a pooling-like operation that keeps in each window of size 5 only the largest value.

36 SPECTRAL COMPRESSED SENSING [Plot: $\|Z - \hat{Z}_t\|$ as a function of the iteration $t$.]

37 LEARNING THE PROJECTION If we have no explicit information about $\tilde{\Upsilon}$, it might be desirable to learn the projection. Instead of learning $P$, it is possible to replace $P(I - \mu A^T A)$ and $\mu P A^T$ with two learned matrices $W$ and $S$, respectively. This leads to a scheme very similar to LISTA and provides a theoretical foundation for the success of LISTA.

38 LEARNED IPGD [Block diagram: the input $X \in \mathbb{R}^d$ (with $X = AZ + E$, $f(Z) \le R$) passes through the learned linear operator $S$, is summed with the feedback through the learned $W$, and then goes through $\psi$, producing an estimate $\hat{Z}$ of $Z$.] Here $\psi$ projects onto the set $\Upsilon = \{\hat{Z} : f(\hat{Z}) \le R\}$. Aim at solving $\min_{\hat{Z}} \|X - A\hat{Z}\|_2^2 \;\text{s.t.}\; f(\hat{Z}) \le R$.

39 SUPER RESOLUTION A popular super-resolution technique uses a pair of low-res and high-res dictionaries [Zeyde et al., 2012]. The original work uses OMP with sparsity 3 to decode the representation of the patches in the low-res image; the representation is then used to reconstruct the patches of the high-res image. We replace OMP with learned IPGD (LIPGD) with 3 levels but a higher target sparsity. This leads to better reconstruction results (up to 0.5 dB improvement).

40 LISTA [Block diagram: the input $X \in \mathbb{R}^d$ (with $X = AZ + E$, $f(Z) \le R$) passes through the learned linear operator $S$, is summed with the feedback through the learned $W$, and then goes through $\psi$, producing an estimate $\hat{Z}$ of $Z$.] Here $\psi$ is a proximal mapping: $\psi(U) = \operatorname{argmin}_{\hat{Z} \in \mathbb{R}^d} \|U - \hat{Z}\|_2^2 + \lambda f(\hat{Z})$. Aim at solving $\min_{\hat{Z}} \|X - A\hat{Z}\|_2^2 + \lambda f(\hat{Z})$.

41 LISTA MIXTURE MODEL Approximating the projection onto $\tilde{\Upsilon}$ with a single linear projection may not be accurate enough, which requires more LISTA layers/iterations. Instead, one may use several LISTA networks, each approximating a different part of $\tilde{\Upsilon}$. Training multiple LISTA networks accelerates the convergence further.

42 LISTA MIXTURE MODEL [Diagram: the LISTA networks of the mixture produce candidate estimates $\hat{Z}^1, \ldots, \hat{Z}^t$, and the candidate minimizing the residual $\|X - A\hat{Z}\|_2^2$ is selected.]
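
A minimal sketch of the selection suggested by the diagram: evaluate each LISTA network in the mixture and keep, per sample, the estimate with the smallest residual. The `networks` list and the residual-based rule are assumptions for illustration.

```python
import torch

def lista_mixture_estimate(X, A, networks):
    """Run every LISTA network on X and return, per sample, the candidate estimate
    whose data-fidelity residual ||X - A Z_hat||_2^2 is smallest."""
    candidates = torch.stack([net(X) for net in networks])              # (n_nets, batch, d)
    residuals = ((X.unsqueeze(0) - candidates @ A.T) ** 2).sum(dim=-1)  # (n_nets, batch)
    best = residuals.argmin(dim=0)                                      # (batch,)
    return candidates[best, torch.arange(X.shape[0])]                   # (batch, d)
```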

43 RELATED WORKS In [Bruna et al., 2017] it is shown that learning may give a gain due to better preconditioning of $A$. In [Xin et al., 2016] a relation to the restricted isometry property (RIP) is drawn. In [Borgerding & Schniter, 2016] a connection is drawn to approximate message passing (AMP). All these works consider only the sparsity case.

44 ACKNOWLEDGEMENTS Guillermo Sapiro (Duke University), Yonina C. Eldar (Technion), Alex M. Bronstein (Technion).

45 QUESTIONS? WEB.ENG.TAU.AC.IL/~RAJA

46 GAUSSIAN MEAN WIDTH Gaussian mean width: $\omega(\Upsilon) = E\big[\sup_{V,W \in \Upsilon} \langle V - W, g \rangle\big]$, $g \sim N(0, I)$. [Figure: the width of the set $\Upsilon$ in the direction of $g$.]

47 MEASURE FOR LOW DIMENSIONALITY Gaussian mean width: $\omega(\Upsilon) = E\big[\sup_{V,W \in \Upsilon} \langle V - W, g \rangle\big]$, $g \sim N(0, I)$. $\omega^2(\Upsilon)$ is a measure of the dimensionality of the data. Examples: if $\Upsilon \subseteq B^d$ is a Gaussian mixture model with $k$ Gaussians, then $\omega^2(\Upsilon) = O(k)$; if $\Upsilon \subseteq B^d$ is data with $k$-sparse representations, then $\omega^2(\Upsilon) = O(k \log d)$.
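
The Gaussian mean width can also be estimated numerically. Below is a minimal Monte Carlo sketch for the set of unit-norm k-sparse vectors, using the fact that for this set $\sup_{V,W} \langle V - W, g \rangle$ equals twice the $\ell_2$ norm of the $k$ largest-magnitude entries of $g$; the sample count is arbitrary.

```python
import numpy as np

def mean_width_k_sparse(d, k, n_samples=2000, seed=0):
    """Monte Carlo estimate of omega(Upsilon) = E[sup_{V,W in Upsilon} <V - W, g>]
    for Upsilon = {unit-norm k-sparse vectors in R^d}."""
    rng = np.random.default_rng(seed)
    widths = []
    for _ in range(n_samples):
        g = rng.standard_normal(d)
        top_k = np.sort(np.abs(g))[-k:]        # k largest-magnitude entries of g
        widths.append(2.0 * np.linalg.norm(top_k))
    return float(np.mean(widths))

# Compare the estimate with the O(sqrt(k log d)) scaling:
# print(mean_width_k_sparse(d=1000, k=5) ** 2, 5 * np.log(1000))
```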
