Compressed Sensing and Neural Networks
1 Compressed Sensing and Neural Networks. Jan Vybíral (Charles University & Czech Technical University, Prague, Czech Republic). NOMAD Summer, Berlin, September 25-29,
2 Outline. Part I: Lasso & Compressed Sensing. Part II: Neural Networks (Introduction, Notation, Training the network, Applications).
3 Part I: Lasso & Compressed Sensing.
4 Least squares. Fitting a cloud of points by a linear hyperplane. Considered already by Gauss and Legendre around the turn of the 19th century (Legendre published the method in 1805, Gauss in 1809). In 2D: (figure not reproduced).
6 Least squares. Objects (= points) are described by $\Omega$ real numbers: $d_1 = (d_{1,1}, \dots, d_{1,\Omega}) \in \mathbb{R}^{\Omega}$, ..., $d_N = (d_{N,1}, \dots, d_{N,\Omega}) \in \mathbb{R}^{\Omega}$. $N$ is the number of objects; $D$ is the $N \times \Omega$ matrix with rows $d_1, \dots, d_N$. $P = (P_1, \dots, P_N)$ are the properties of interest. We look for a linear dependence $P = f(d)$ with a linear $f$, i.e. $P_i = \sum_{j=1}^{\Omega} c_j d_{i,j}$, or in matrix form $P = Dc$.
8 Least squares. The solution is found by minimizing the least-squares error: $\hat{c} = \arg\min_{c \in \mathbb{R}^{\Omega}} \sum_{i=1}^{N} \big( P_i - \sum_{j=1}^{\Omega} c_j d_{i,j} \big)^2 = \arg\min_{c \in \mathbb{R}^{\Omega}} \|P - Dc\|_2^2$. A closed formula exists. The objective function is convex. In general, $\hat{c}$ has all coordinates occupied (non-zero). An absolute term is incorporated by adding a column full of ones to $D$.
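As a minimal sketch of the closed formula, the following pure-Python example fits a line in 2D by solving the $2 \times 2$ normal equations $(D^T D) c = D^T P$, where $D$ has a column of ones (the absolute term) and a column with the single feature. The function name `fit_line` and the sample data are illustrative assumptions, not part of the talk.

```python
def fit_line(d, P):
    """Least-squares fit P_i ~ c0 + c1 * d_i for one feature d.

    The absolute term c0 corresponds to an extra column of ones in D.
    """
    N = len(d)
    # Entries of the normal equations (D^T D) c = D^T P with D = [1, d].
    s1 = float(N)
    sd = sum(d)
    sdd = sum(x * x for x in d)
    sP = sum(P)
    sdP = sum(x * y for x, y in zip(d, P))
    det = s1 * sdd - sd * sd
    if det == 0:
        raise ValueError("all d_i are equal; the minimizer is not unique")
    # Cramer's rule for the 2x2 system.
    c0 = (sdd * sP - sd * sdP) / det
    c1 = (s1 * sdP - sd * sP) / det
    return c0, c1


# Points lying exactly on P = 1 + 2 d.
c0, c1 = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(c0, c1)
```

For more than a couple of features one would normally call a linear algebra library instead of writing out the normal equations by hand.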
9 Regularization. How can we include prior knowledge about $c$? Say we prefer a linear fit with small coefficients. We simply weigh the error of the fit against the size of the coefficients: $\hat{c} = \arg\min_{c \in \mathbb{R}^{\Omega}} \|P - Dc\|_2^2 + \lambda \|c\|_2^2$. Here $\lambda > 0$ is the regularization parameter. $\lambda \to 0$: least squares. $\lambda \to \infty$: $\hat{c} = 0$.
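The effect of $\lambda$ is easiest to see with a single feature. Assuming the size of the coefficient is measured by its square (a ridge-type penalty, which is an assumption made for this sketch), minimizing $\sum_i (P_i - c\, d_i)^2 + \lambda c^2$ gives $\hat{c} = \langle d, P \rangle / (\langle d, d \rangle + \lambda)$. The function name `ridge_1d` and the data are hypothetical.

```python
def ridge_1d(d, P, lam):
    """Minimizer of sum_i (P_i - c*d_i)^2 + lam * c^2 for one feature."""
    num = sum(x * y for x, y in zip(d, P))
    den = sum(x * x for x in d) + lam
    return num / den


d = [1.0, 2.0, 3.0]
P = [2.0, 4.0, 6.0]  # exactly P = 2 d
lams = (0.0, 1.0, 100.0, 1e6)
coeffs = [ridge_1d(d, P, lam) for lam in lams]
for lam, c in zip(lams, coeffs):
    print(lam, c)
```

With `lam = 0` the least-squares coefficient is recovered, and the coefficient shrinks towards zero as `lam` grows.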
11 Tractability. Convexity: a strictly convex objective has a unique minimizer; a local minimum of a convex function is also a global one; many effective methods exist (convex optimization). P vs. NP: P-problems are solvable in polynomial time (in the size of the input); NP-problems have solutions verifiable in polynomial time; $P \subseteq NP$. The one-million-dollar problem: is $P = NP$? (Computational complexity.)
12 Sparsity. If $\Omega$ is large (especially $\Omega \gg N$), we are often interested in selecting features, i.e. in $c$ with many coordinates equal to zero. $\|c\|_0 := \#\{i : c_i \neq 0\}$ is the number of non-zero coordinates of $c$. Looking for a linear fit using only two features: $\hat{c} = \arg\min_{c \in \mathbb{R}^{\Omega},\ \|c\|_0 \le 2} \|P - Dc\|_2^2$. Regularized version: $\hat{c} = \arg\min_{c \in \mathbb{R}^{\Omega}} \|P - Dc\|_2^2 + \lambda \|c\|_0$. NP-hard!
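13 $\ell_1$-minimization. Other ways to measure the size of $c$: the $\ell_p$-norms $\|c\|_p = \big( \sum_{j=1}^{\Omega} |c_j|^p \big)^{1/p}$. Unit balls of $\ell_p$ in $\mathbb{R}^2$ (figure not reproduced). $p = \infty$: $\|c\|_\infty = \max_{j=1,\dots,\Omega} |c_j|$. $p \ge 1$: convex problem. $p \le 1$: promotes sparsity.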
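A small sketch of these size measures in plain Python. It compares a sparse and a dense vector of the same $\ell_2$-norm: the $\ell_1$-norm is smaller for the sparse one, which is one informal way to see why $p = 1$ favours sparsity. The helper names `lp_norm` and `l0` are illustrative.

```python
import math


def lp_norm(c, p):
    """(sum_j |c_j|^p)^(1/p) for p > 0, and max_j |c_j| for p = infinity."""
    if p == float("inf"):
        return max(abs(x) for x in c)
    return sum(abs(x) ** p for x in c) ** (1.0 / p)


def l0(c):
    """Number of non-zero coordinates of c."""
    return sum(1 for x in c if x != 0)


sparse = [1.0, 0.0]
dense = [1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0)]

for name, v in (("sparse", sparse), ("dense", dense)):
    print(name, l0(v), lp_norm(v, 1), lp_norm(v, 2), lp_norm(v, float("inf")))
```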
14 $\ell_1$-minimization. $p \le 1$ promotes sparsity. Solution of $S_p = \arg\min_{z \in \mathbb{R}^2} \|z\|_p$ s.t. $Az = y$, for $p = 1$ and $p = 2$ (figure not reproduced).
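15 $\ell_1$-minimization. Take $p = 1$ (Lasso; Tibshirani, 1996): $\hat{c} = \arg\min_{c \in \mathbb{R}^{\Omega}} \|P - Dc\|_2^2 + \lambda \|c\|_1$. Chen, Donoho, Saunders: Basis pursuit (1998). $\lambda \to 0$: least squares. $\lambda \to \infty$: $\hat{c} = 0$. In between: $\lambda$ selects the sparsity.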
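One common way to compute the Lasso minimizer is cyclic coordinate descent with soft-thresholding. The sketch below is one possible solver, not necessarily the one used in the talk. For the objective $\|P - Dc\|_2^2 + \lambda \|c\|_1$ (no factor $1/2$), the update of coordinate $j$ is $c_j = S(\rho_j, \lambda/2) / \|D_j\|_2^2$, where $\rho_j$ is the inner product of column $D_j$ with the residual that ignores feature $j$ and $S$ is the soft-thresholding function. The names `soft_threshold`, `lasso_cd` and the toy data are assumptions.

```python
def soft_threshold(t, tau):
    """Shrink t towards zero by tau; values in [-tau, tau] become exactly 0."""
    if t > tau:
        return t - tau
    if t < -tau:
        return t + tau
    return 0.0


def lasso_cd(D, P, lam, sweeps=200):
    """Cyclic coordinate descent for min_c ||P - D c||_2^2 + lam * ||c||_1."""
    N, Omega = len(D), len(D[0])
    c = [0.0] * Omega
    col_sq = [sum(D[i][j] ** 2 for i in range(N)) for j in range(Omega)]
    for _ in range(sweeps):
        for j in range(Omega):
            if col_sq[j] == 0.0:
                continue  # an all-zero feature carries no information
            # Correlation of column j with the residual ignoring feature j.
            rho = 0.0
            for i in range(N):
                partial = sum(D[i][k] * c[k] for k in range(Omega) if k != j)
                rho += D[i][j] * (P[i] - partial)
            c[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
    return c


# Three mutually orthogonal features; P = 3*feature0 + 0.1*feature1.
D = [[1.0, 1.0, 1.0],
     [1.0, -1.0, 1.0],
     [1.0, 1.0, -1.0],
     [1.0, -1.0, -1.0]]
P = [3.1, 2.9, 3.1, 2.9]

c_ls = lasso_cd(D, P, lam=0.0)     # lam = 0 reduces to least squares
c_lasso = lasso_cd(D, P, lam=2.0)  # the weak feature is switched off
print(c_ls)
print(c_lasso)
```

In this toy example the penalty removes the weakly contributing feature from the support and shrinks the dominant coefficient from 3 to about 2.75.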
16 $\ell_1$-minimization. Effect of $\lambda > 0$ on the support of $\omega$ (figure not reproduced).
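17 Compressed Sensing (aka Compressive Sensing, Compressive Sampling). Theorem: Let $D \in \mathbb{R}^{N \times \Omega}$ have independent Gaussian entries. Let $0 < \varepsilon < 1$, let $s$ be a natural number, and let $N \ge C \big( s \log(\Omega) + \log(1/\varepsilon) \big)$, where $C$ is a universal constant. If $c \in \mathbb{R}^{\Omega}$ is $s$-sparse, $P = Dc$, and $\hat{c}$ is the minimizer of $\hat{c} = \arg\min_{u \in \mathbb{R}^{\Omega}} \|u\|_1$ s.t. $P = Du$, then $c = \hat{c}$ with probability at least $1 - \varepsilon$.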
18 Compressed Sensing (aka Compressive Sensing, Compressive Sampling). Candès, Romberg, Tao (2006); Donoho (2006). Extensive theory of recovery of sparse vectors from linear measurements. Optimal conditions on the number of measurements (i.e. data points): $N \ge C s \log \Omega$. This is only true if most of the features (i.e. the columns of $D$) are incoherent with the majority of the others (if two features are very similar, it is difficult to distinguish between them). H. Boche, R. Calderbank, G. Kutyniok, J. V., A Survey of Compressed Sensing, first chapter in Compressed Sensing and its Applications, Birkhäuser, Springer,
19 Dictionaries. Real-life signals are (almost) never sparse in the canonical basis of $\mathbb{R}^{\Omega}$; more often they are sparse in some orthonormal basis, i.e. $x = Bc$, where $c \in \mathbb{R}^{\Omega}$ is sparse and the columns (and rows) of $B \in \mathbb{R}^{\Omega \times \Omega}$ are orthonormal vectors: wavelets, Fourier basis, etc. Compressed sensing then applies without any essential change: just replace $D$ with $DB$, i.e. you rotate the problem.
20 Dictionaries. Even more often, the signal is represented in an overcomplete dictionary/lexicon: $x = Lc$, where $c \in \mathbb{R}^{l}$ is sparse and $L \in \mathbb{R}^{\Omega \times l}$ is the dictionary/lexicon; its columns form an overcomplete system ($l > \Omega$). $x$ is a sparse combination of non-orthogonal vectors, the columns of $L$. Examples: unions of two or more orthonormal bases, each capturing different features.
21 Dictionaries. Compressed sensing can also be adapted to this situation. Optimization: $\hat{x} = \arg\min_{u \in \mathbb{R}^{\Omega}} \|L^{*} u\|_1$ s.t. $P = Du$. We do not recover the (non-unique!) sparse coefficients $c$, but (an approximation of) the signal $x$. The error bound involves $L^{*} x$ and is reasonably small, for example, when $L^{*} L$ is nearly diagonal, i.e. when not too many features in the dictionary are too correlated.
22 $\ell_1$-based optimization. $\ell_1$-SVM: Support vector machines are a standard tool for classification problems; an $\ell_1$-penalty term leads to sparse classifiers. Nuclear norm: minimizing the nuclear norm (= sum of the singular values, which for a symmetric matrix is the sum of the absolute values of its eigenvalues) of a matrix leads to low-rank matrices. TV (= total variation) norm: minimizing $\sum_{i,j} |u_{i,j+1} - u_{i,j}|$ over images $u$ gives images with edges and flat parts. $L_1$: minimizing the $L_1$-norm (= integral of the absolute value) of a function leads to functions with small support. TV-norm of $f$: minimizing $\int |\nabla f|$ leads to functions with jumps along curves.
23 Part II: Neural Networks (Introduction, Notation, Training the network, Applications).
24 Neural Networks: Introduction. W. McCulloch, W. Pitts (1943). Motivated by biological research on the human brain and neurons. A neural network is a graph of nodes, partially connected. Nodes represent neurons; oriented connections between the nodes represent the transfer of outputs of some neurons to inputs of other neurons.
25 Neural Networks: Introduction. In the 1970s and 1980s a number of obstacles appeared: insufficient computer power to train large neural networks, theoretical problems with processing exclusive-or, etc. Support vector machines (and other simple algorithms) took over the field of machine learning. 2010s: algorithmic advances and higher computational power made it possible to train large neural networks to human (and superhuman) performance in pattern recognition. Large neural networks (a.k.a. deep learning) are used successfully in many tasks.
26 Neural Networks: Artificial Neuron. An artificial neuron gets activated if a linear combination of its inputs grows over a certain threshold. Inputs: $x = (x_1, \dots, x_n) \in \mathbb{R}^n$. Weights: $w = (w_1, \dots, w_n) \in \mathbb{R}^n$. The inner product $\langle w, x \rangle$ is compared with a threshold $b \in \mathbb{R}$, and the result is plugged into the activation function $\sigma$, a jump (or smoothed jump) function. An artificial neuron is thus a function $x \mapsto \sigma(\langle x, w \rangle - b)$, where $\sigma : \mathbb{R} \to \mathbb{R}$ might be $\sigma(x) = \mathrm{sgn}(x)$ or $\sigma(x) = e^{x} / (1 + e^{x})$, etc.
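A direct transcription of the map $x \mapsto \sigma(\langle x, w \rangle - b)$ into Python may help. The sketch below implements both activation functions mentioned on the slide; the weights and threshold in the example (a neuron that fires only when both inputs are on) are hypothetical values chosen for illustration.

```python
import math


def sgn(t):
    """Jump activation: the sign of t."""
    return 1.0 if t > 0 else (-1.0 if t < 0 else 0.0)


def sigmoid(t):
    """Smoothed jump e^t / (1 + e^t), written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def neuron(x, w, b, sigma=sigmoid):
    """Artificial neuron: sigma(<x, w> - b)."""
    return sigma(sum(xi * wi for xi, wi in zip(x, w)) - b)


# With w = (1, 1) and b = 1.5 the sum exceeds the threshold only for (1, 1).
w, b = [1.0, 1.0], 1.5
outputs = {x: neuron(x, w, b, sigma=sgn)
           for x in [(0, 0), (0, 1), (1, 0), (1, 1)]}
print(outputs)
print(neuron((1, 1), w, b))  # smooth version of the same neuron
```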
27 Neural Networks: Layers. An artificial neural network is a directed, acyclic graph of artificial neurons. The neurons are grouped into layers by their distance to the input.
28 Neural Networks: Layers. Input: $x = (x_1, \dots, x_n) \in \mathbb{R}^n$. First layer of neurons: $y_1 = \sigma(\langle x, w^1_1 \rangle - b^1_1), \dots, y_{n_1} = \sigma(\langle x, w^1_{n_1} \rangle - b^1_{n_1})$. The outputs $y = (y_1, \dots, y_{n_1})$ become inputs for the next layer, and so on; the last layer outputs $y \in \mathbb{R}$. Training the network: given inputs $x_1, \dots, x_N$ and outputs $y_1, \dots, y_N$, optimize over the weights $w$ and the thresholds $b$.
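The layer-by-layer evaluation can be sketched as follows. As an illustration, a two-layer network with a 0/1 jump activation computes exclusive-or, which a single neuron cannot represent. The weights are hand-picked for this example rather than trained, and the names `step`, `layer` and `network` are assumptions.

```python
def step(t):
    """Jump activation with values 0 and 1."""
    return 1.0 if t > 0 else 0.0


def layer(x, W, b, sigma):
    """One layer: neuron k outputs sigma(<x, W[k]> - b[k])."""
    return [sigma(sum(xi * wi for xi, wi in zip(x, wk)) - bk)
            for wk, bk in zip(W, b)]


def network(x, layers, sigma):
    """Feed x through the layers; each output becomes the next input."""
    y = list(x)
    for W, b in layers:
        y = layer(y, W, b, sigma)
    return y


# Hidden layer: neuron 0 computes OR, neuron 1 computes AND.
# Output neuron fires when OR is on and AND is off, i.e. exclusive-or.
xor_layers = [
    ([[1.0, 1.0], [1.0, 1.0]], [0.5, 1.5]),
    ([[1.0, -1.0]], [0.5]),
]

results = {x: network(x, xor_layers, step)[0]
           for x in [(0, 0), (0, 1), (1, 0), (1, 1)]}
print(results)
```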
29 Neural Networks: Training. The parameters $p$ of the network are initialized (for example in a random way), which defines the network $\mathcal{N}_p$. For a set of input/output pairs $(x_i, y_i)$ we calculate the output of the neural network with the current parameters, $z_i = \mathcal{N}_p(x_i)$. In an optimal case, $z_i = y_i$ for all inputs. Update the parameters of the neural network to decrease the loss function, i.e. $\sum_i \|y_i - z_i\|^2$, and repeat.
30 Neural Networks: Training. Non-convex minimization over a huge space! A huge number of local minimizers exist, so the initialization of the minimization algorithm is important. Backpropagation algorithm: the error at the output is redistributed to the neurons of the last hidden layer, then to the previous one, etc. The error is distributed back through the network and used to update the parameters of each neuron by a gradient descent method.
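A minimal sketch of this training loop for a network with one hidden layer and a single sigmoid output, trained by full-batch gradient descent on the squared loss. The architecture (three hidden neurons), the learning rate, the number of steps and the exclusive-or data are assumptions chosen to keep the example small. Because the problem is non-convex, a different seed may end in a different local minimizer.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def forward(params, x):
    """Return the hidden activations h and the network output z."""
    W1, b1, w2, b2 = params
    h = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) - b)
         for w, b in zip(W1, b1)]
    z = sigmoid(sum(wi * hi for wi, hi in zip(w2, h)) - b2)
    return h, z


def loss(params, data):
    """Squared loss sum_i (y_i - z_i)^2."""
    return sum((y - forward(params, x)[1]) ** 2 for x, y in data)


def train_step(params, data, lr):
    """One gradient-descent step; gradients come from backpropagation."""
    W1, b1, w2, b2 = params
    gW1 = [[0.0] * len(w) for w in W1]
    gb1 = [0.0] * len(b1)
    gw2 = [0.0] * len(w2)
    gb2 = 0.0
    for x, y in data:
        h, z = forward(params, x)
        # Error at the output neuron (derivative w.r.t. its pre-activation).
        delta_out = 2.0 * (z - y) * z * (1.0 - z)
        gb2 -= delta_out  # the threshold enters with a minus sign
        for k in range(len(w2)):
            gw2[k] += delta_out * h[k]
            # Error redistributed to hidden neuron k.
            delta_h = delta_out * w2[k] * h[k] * (1.0 - h[k])
            gb1[k] -= delta_h
            for j in range(len(x)):
                gW1[k][j] += delta_h * x[j]
    new_W1 = [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W1, gW1)]
    new_b1 = [b - lr * g for b, g in zip(b1, gb1)]
    new_w2 = [w - lr * g for w, g in zip(w2, gw2)]
    return new_W1, new_b1, new_w2, b2 - lr * gb2


random.seed(0)
data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
        ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
init = ([[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)],
        [random.uniform(-1, 1) for _ in range(3)],
        [random.uniform(-1, 1) for _ in range(3)],
        random.uniform(-1, 1))

initial_loss = loss(init, data)
params = init
for _ in range(500):
    params = train_step(params, data, lr=0.1)
final_loss = loss(params, data)
print(initial_loss, final_loss)
```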
31 Neural Networks: Training. Backpropagation was discovered in the 1960s and applied to neural networks in the 1970s. Theoretical progress followed in the 1980s and 1990s. It profited from increased computational power in the 2010s, which allowed applications to large data sets and to neural networks of tens or hundreds of layers. It achieved human and super-human performance in pattern recognition and, later on, in many other applications.
32 Neural Networks: Deep learning. Training of a network with a large number (of the order of 100) of layers. Made possible by the use of GPUs (Nvidia), which accelerated deep learning by a factor of roughly 100. The use of many parameters makes it sensitive to overfitting (= too exact an adaptation to the training data, which is not observed on other data from the same area). Overfitting is reduced by regularization methods: an $\ell_2$ penalty (weight decay) or an $\ell_1$ penalty (sparsity) on the weights. Further tricks are used to accelerate the learning algorithm.
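The two regularizers change the weight update in different ways, as the following sketch shows. It assumes penalties $\lambda \|w\|_2^2$ and $\lambda \|w\|_1$ added to the loss, plain gradient descent for the $\ell_2$ case, and a proximal (soft-thresholding) step for the $\ell_1$ case. The function names and numbers are hypothetical, and actual deep learning frameworks may implement these penalties differently.

```python
def l2_step(w, grad, lr, lam):
    """Gradient step on loss + lam * ||w||_2^2 (weight decay)."""
    return [wi - lr * (gi + 2.0 * lam * wi) for wi, gi in zip(w, grad)]


def l1_step(w, grad, lr, lam):
    """Proximal gradient step on loss + lam * ||w||_1."""
    tau = lr * lam
    out = []
    for wi, gi in zip(w, grad):
        t = wi - lr * gi  # ordinary gradient step on the loss
        # Soft-thresholding: small weights are set exactly to zero.
        out.append(max(t - tau, 0.0) if t > 0 else min(t + tau, 0.0))
    return out


w = [1.0, 0.05, -0.5]
grad = [0.0, 0.0, 0.0]  # zero loss gradient isolates the penalty's effect

w_l2 = l2_step(w, grad, lr=0.1, lam=1.0)
w_l1 = l1_step(w, grad, lr=0.1, lam=1.0)
print(w_l2)  # every weight shrinks proportionally, none becomes zero
print(w_l1)  # every weight shrinks by lr*lam, the small one becomes zero
```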
33 Applications. Pattern recognition; computer vision; speech recognition; social network filtering; recommendation systems; bioinformatics; AlphaGo.
34 Thank you for your attention!
Neural Networks David Rosenberg New York University July 26, 2017 David Rosenberg (New York University) DS-GA 1003 July 26, 2017 1 / 35 Neural Networks Overview Objectives What are neural networks? How
More informationEUSIPCO
EUSIPCO 013 1569746769 SUBSET PURSUIT FOR ANALYSIS DICTIONARY LEARNING Ye Zhang 1,, Haolong Wang 1, Tenglong Yu 1, Wenwu Wang 1 Department of Electronic and Information Engineering, Nanchang University,
More informationIntroduction to Compressed Sensing
Introduction to Compressed Sensing Alejandro Parada, Gonzalo Arce University of Delaware August 25, 2016 Motivation: Classical Sampling 1 Motivation: Classical Sampling Issues Some applications Radar Spectral
More informationECE662: Pattern Recognition and Decision Making Processes: HW TWO
ECE662: Pattern Recognition and Decision Making Processes: HW TWO Purdue University Department of Electrical and Computer Engineering West Lafayette, INDIANA, USA Abstract. In this report experiments are
More informationLearning MMSE Optimal Thresholds for FISTA
MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Learning MMSE Optimal Thresholds for FISTA Kamilov, U.; Mansour, H. TR2016-111 August 2016 Abstract Fast iterative shrinkage/thresholding algorithm
More informationFast Hard Thresholding with Nesterov s Gradient Method
Fast Hard Thresholding with Nesterov s Gradient Method Volkan Cevher Idiap Research Institute Ecole Polytechnique Federale de ausanne volkan.cevher@epfl.ch Sina Jafarpour Department of Computer Science
More informationGeneralized Orthogonal Matching Pursuit- A Review and Some
Generalized Orthogonal Matching Pursuit- A Review and Some New Results Department of Electronics and Electrical Communication Engineering Indian Institute of Technology, Kharagpur, INDIA Table of Contents
More informationMaking Flippy Floppy
Making Flippy Floppy James V. Burke UW Mathematics jvburke@uw.edu Aleksandr Y. Aravkin IBM, T.J.Watson Research sasha.aravkin@gmail.com Michael P. Friedlander UBC Computer Science mpf@cs.ubc.ca Vietnam
More informationCompressed Sensing and Sparse Recovery
ELE 538B: Sparsity, Structure and Inference Compressed Sensing and Sparse Recovery Yuxin Chen Princeton University, Spring 217 Outline Restricted isometry property (RIP) A RIPless theory Compressed sensing
More informationarxiv: v1 [cs.it] 26 Oct 2018
Outlier Detection using Generative Models with Theoretical Performance Guarantees arxiv:1810.11335v1 [cs.it] 6 Oct 018 Jirong Yi Anh Duc Le Tianming Wang Xiaodong Wu Weiyu Xu October 9, 018 Abstract This
More informationLinear Methods for Regression. Lijun Zhang
Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived
More informationOslo Class 6 Sparsity based regularization
RegML2017@SIMULA Oslo Class 6 Sparsity based regularization Lorenzo Rosasco UNIGE-MIT-IIT May 4, 2017 Learning from data Possible only under assumptions regularization min Ê(w) + λr(w) w Smoothness Sparsity
More informationAlgorithms for sparse analysis Lecture I: Background on sparse approximation
Algorithms for sparse analysis Lecture I: Background on sparse approximation Anna C. Gilbert Department of Mathematics University of Michigan Tutorial on sparse approximations and algorithms Compress data
More informationApprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning
Apprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning Nicolas Thome Prenom.Nom@cnam.fr http://cedric.cnam.fr/vertigo/cours/ml2/ Département Informatique Conservatoire
More informationConvex envelopes, cardinality constrained optimization and LASSO. An application in supervised learning: support vector machines (SVMs)
ORF 523 Lecture 8 Princeton University Instructor: A.A. Ahmadi Scribe: G. Hall Any typos should be emailed to a a a@princeton.edu. 1 Outline Convexity-preserving operations Convex envelopes, cardinality
More information