Analysis of Greedy Algorithms
Slide 1: Analysis of Greedy Algorithms
Jiahui Shen, Florida State University, Oct. 26th
Slide 2: Outline
- Introduction
- Regularity conditions
- Analysis of orthogonal matching pursuit
- Analysis of the forward-backward greedy algorithm
- Analysis of hard-thresholding pursuit
Slide 3: Introduction
Greedy algorithms:
- Optimize at each step
- No global optimality guarantee
Examples:
- Boosting (AdaBoost, gradient boosting)
- Matching pursuit (OMP, CoSaMP)
- Forward-backward algorithms (FoBa)
Slide 4: Some notation
- Abbreviate $\ell(X\beta; y)$ as $\ell(\beta)$
- $J(\beta)$: support of $\beta$, i.e. $J(\beta) = \{j : \beta_j \neq 0\}$
- $X_S$: sub-matrix of $X$ formed by the columns in set $S$
- $\beta_S$: sub-vector of $\beta$ on set $S$
- $\beta^*$: the true coefficient vector; $\beta^t$: the estimate of $\beta$ in the $t$-th iteration
- $J^* \setminus J$: the elements in $J^*$ but not in $J$, i.e. $J^* \cap J^C$
- $|J|$: cardinality of the set $J$
- $e_j$: the vector whose $j$-th element is 1 and all other elements are 0
Slide 5: Example
Consider OMP (greedy least squares) with the true model $y = X\beta^* + \varepsilon$.
Note: $p > n$, so $X^T X$ is not invertible.
OMP procedure:
- Select and update the support: $j^t = \arg\max_j |\nabla\ell(\beta^{t-1})_j| = \arg\max_j |\langle X_j,\, y - X\beta^{t-1}\rangle|$; $J^t = J^{t-1} \cup \{j^t\}$
- Update the estimator: $\beta^t = \arg\min_\beta \ell(\beta)$ subject to $J(\beta) \subseteq J^t$ (full correction)
Orthogonal: the residual $y - X\beta^t$ is orthogonal to the selected columns (due to full correction).
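The procedure above can be sketched in a few lines of NumPy. This is a minimal illustration, not the slides' reference implementation; the function name `omp` and its arguments are assumed for the example, and the toy run uses orthonormal columns so exact recovery is guaranteed.

```python
import numpy as np

def omp(X, y, steps):
    """Sketch of OMP with full correction (greedy least squares).

    Illustrative names; the two steps mirror the slide:
    gradient-based selection, then least squares on the support.
    """
    p = X.shape[1]
    support = []
    beta = np.zeros(p)
    for _ in range(steps):
        residual = y - X @ beta
        # Selection: largest |<X_j, y - X beta>| (largest gradient coordinate)
        j = int(np.argmax(np.abs(X.T @ residual)))
        if j not in support:
            support.append(j)
        # Full correction: refit by least squares on the selected support
        beta = np.zeros(p)
        beta[support] = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
    return beta, support

# Toy run with orthonormal columns, where each selection picks the
# largest remaining true coefficient
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((30, 30)))
beta_true = np.zeros(30)
beta_true[[2, 11, 25]] = [3.0, -2.0, 1.5]
y = Q @ beta_true
beta_hat, support = omp(Q, y, steps=3)
```

With orthonormal columns the correlations $X^T(y - X\beta)$ equal the residual coefficients exactly, so the three true features are chosen in order of magnitude.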
Slide 6: Problem setup
Key ingredients of a greedy algorithm:
- Choice of loss function: quadratic loss (regression); exponential loss; non-convex losses
- Selection criterion: select one or multiple features; choose the one with the largest gradient or the largest decrease in function value; with or without a backward procedure
- Iterative rule: keep the previous weights or modify them
Slide 7: Problem setup
Objective function: $\min \ell(X\beta; y)$ subject to $\|\beta\|_0 \le q$
- Consider learning problems with a large number of features ($p > n$)
- Sparse target: a linear combination of a small number of features ($q < n$)
- This directly solves the sparse learning problem ($L_0$ regularization)
- Given weak classifiers, boosting can be formulated in this framework
Slide 8: Example
Assumptions: no noise; $\|X_j\|_2 = 1$ for each $j$ (unit columns).
Intuition: connect $\ell(\beta^t)$ to $\ell(\beta^{t-1})$. In regression, $\ell(\beta) = \|y - X\beta\|^2$ and $\nabla\ell(\beta) = 2X^T(X\beta - y)$.
A simple analysis (here $\|y\|_{L_1}$ is not exactly the $L_1$ norm; definition omitted):
$$\|y - X\beta^t\|_2^2 \le \min_\alpha \|y - X\beta^{t-1} - \alpha X_{j^t}\|_2^2 = \|y - X\beta^{t-1}\|_2^2 - 2\alpha\langle y - X\beta^{t-1}, X_{j^t}\rangle + \alpha^2,$$
minimized at $\alpha = \langle y - X\beta^{t-1}, X_{j^t}\rangle$. By the selection rule, and since full correction makes the residual orthogonal to $X\beta^{t-1}$,
$$\langle y - X\beta^{t-1}, X_{j^t}\rangle \ge \frac{\langle y - X\beta^{t-1}, y\rangle}{\|y\|_{L_1}} = \frac{\|y - X\beta^{t-1}\|_2^2}{\|y\|_{L_1}}.$$
Slide 9: Example
Combining the two inequalities:
$$\|y - X\beta^t\|_2^2 \le \|y - X\beta^{t-1}\|_2^2 \Big(1 - \frac{\|y - X\beta^{t-1}\|_2^2}{\|y\|_{L_1}^2}\Big).$$
Result by induction (no noise, so $X\beta^* = y$):
$$\|X\beta^* - X\beta^t\|_2^2 = \|y - X\beta^t\|_2^2 \le \frac{\|y\|_{L_1}^2}{t+1}.$$
Drawbacks: What about noise? The estimation error? The size of $\|y\|_{L_1}^2$?
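The induction can be sanity-checked numerically: iterate the extremal case of the recurrence $a_t \le a_{t-1}(1 - a_{t-1}/M)$, with $a_t$ playing the role of $\|y - X\beta^t\|_2^2$ and $M$ of $\|y\|_{L_1}^2$, and compare against $M/(t+1)$. The constants below are arbitrary test values.

```python
M = 5.0          # stands in for ||y||_L1^2
a = 0.9 * M      # a_0 <= M; stands in for the initial squared residual
for t in range(1, 201):
    a = a * (1 - a / M)                   # extremal case of the recurrence
    assert a <= M / (t + 1) + 1e-12       # the claimed O(1/t) bound
```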
Slide 10: Targets of analysis
Commonly used:
- Prediction error: $\|X\beta^* - X\beta^t\|_2^2$
- Statistical error: $\|\beta^* - \beta^t\|_2^2$
- Selection consistency (support recovery): $J(\beta^*) = J(\beta^t)$
Some others:
- Minimax error bounds
- Iteration count
Note: many papers consider the globally optimal solution instead of the true $\beta^*$. Most of the time the two can be interchanged (the belief being that $\beta^*$ should approximately minimize $\ell(\beta)$).
Slide 11: Regularity conditions
Commonly used and well known:
- Restricted isometry property (RIP):
$$\rho_-(s)\,\|\beta\|_2^2 \le \|X\beta\|_2^2 \le \rho_+(s)\,\|\beta\|_2^2 \quad \text{for all } \beta \in \mathbb{R}^p \text{ with } \|\beta\|_0 \le s$$
- Restricted strong convexity/smoothness (RSC/RSS):
$$\rho_-(s)\,\|\beta' - \beta\|_2^2 \le \ell(\beta') - \ell(\beta) - \langle \nabla\ell(\beta),\, \beta' - \beta\rangle \le \rho_+(s)\,\|\beta' - \beta\|_2^2$$
for all $\beta', \beta \in \mathbb{R}^p$ with $\|\beta' - \beta\|_0 \le s$
Slide 12: Regularity conditions
[Figure: values of $\rho_+(s)$ and $\rho_-(s)$ for $n = 200$ as $s$ increases from 1 to $n$; $X$ has i.i.d. $N(0, 1/n)$ entries]
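The figure's quantities can be approximated empirically. The sketch below bounds $\rho_\pm(s)$ by the extreme eigenvalues of $X_S^T X_S$ over randomly sampled supports $S$; note this is only a Monte Carlo estimate over the sampled supports, not a certificate over all of them, and the sizes and trial count are arbitrary choices.

```python
import numpy as np

def restricted_eigs(X, s, trials=200, seed=0):
    """Monte Carlo estimate of restricted eigenvalues rho_-(s), rho_+(s)."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    lo, hi = np.inf, 0.0
    for _ in range(trials):
        S = rng.choice(p, size=s, replace=False)        # random support of size s
        evals = np.linalg.eigvalsh(X[:, S].T @ X[:, S]) # spectrum of the restricted Gram
        lo, hi = min(lo, evals[0]), max(hi, evals[-1])
    return lo, hi

rng = np.random.default_rng(1)
n, p = 200, 400
X = rng.standard_normal((n, p)) / np.sqrt(n)   # i.i.d. N(0, 1/n) entries, as in the figure
rho_minus, rho_plus = restricted_eigs(X, s=5)
```

For small $s$ both values stay close to 1, with the gap widening as $s$ grows toward $n$.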
Slide 13: Regularity conditions
Others:
- Restricted gradient optimal constant: $\langle \nabla\ell(\beta^*),\, \beta\rangle \le \epsilon_s(\beta^*)\,\|\beta\|_2$ for all $\|\beta\|_0 \le s$. Here $\epsilon_s(\beta^*)$ is a measure of the noise, on the $\sigma\sqrt{s\log p}$ level for regression.
- Sparse eigenvalue condition (another name for RIP, but using only one side):
$$\rho_-(s) = \inf\Big\{\frac{\|X\beta\|_2^2}{\|\beta\|_2^2} :\ \|\beta\|_0 \le s\Big\}$$
We will use $\rho_-$ and $\rho_+$ with the RSC/RSS definition in this talk.
Slide 14: Full correction effect
Full correction step: $\hat\beta = \arg\min_\beta \ell(\beta)$ subject to $J(\beta) \subseteq J$.
Effect: $\nabla\ell(\hat\beta)_J = 0$.
Result: for any $\beta'$ with support $J'$,
$$\ell(\beta') - \ell(\hat\beta) \ge \rho_-(s)\,\|\beta' - \hat\beta\|_2^2 + \langle \nabla\ell(\hat\beta)_{J'\setminus J},\, (\beta' - \hat\beta)_{J'\setminus J}\rangle, \qquad s \ge |J' \cup J|$$
Benefit: whenever $\langle \nabla\ell(\hat\beta),\, \beta' - \hat\beta\rangle$ appears, only $\langle \nabla\ell(\hat\beta)_{J'\setminus J},\, (\beta' - \hat\beta)_{J'\setminus J}\rangle$ has to be considered; a bound in terms of $J'\setminus J$ is better than one in terms of $J' \cup J$.
Slide 15: Forward effect
Two common choices (when adding one feature per step):
- Select $j^t = \arg\min_{\eta,\, j}\, \ell(\beta + \eta e_j)$ (line search)
- Select $j^t = \arg\max_j |\nabla\ell(\beta)_j|$ (computationally efficient)
With full correction, both selections satisfy the same result (due to a crude bound): for any $\beta'$,
$$|J'\setminus J|\,\Big\{\ell(\beta) - \min_\eta \ell(\beta + \eta e_{j^t})\Big\} \ge \frac{\rho_-(s)}{\rho_+(1)}\,\{\ell(\beta) - \ell(\beta')\}$$
Comments:
- Interpretation: transfer $\ell(\beta) - \min_\eta \ell(\beta + \eta e_{j^t})$ into $\ell(\beta) - \ell(\beta')$ for any $\beta'$
- Full correction turns $J' \cup J$ into $J' \setminus J$
Slide 16: Forward effect
More details:
- Selecting $j^t = \arg\min_{\eta,\, j}\, \ell(\beta + \eta e_j)$:
$$\ell(\beta) - \min_\eta \ell(\beta + \eta e_{j^t}) \overset{\text{optimality}}{\ge} \ell(\beta) - \min_{\eta,\ j \in J'\setminus J} \ell(\beta + \eta e_j) = \ell(\beta) - \min_{\eta,\ j \in J'\setminus J} \ell\big(\beta + \eta(\beta'_j - \beta_j)e_j\big)$$
- Selecting $j^t = \arg\max_j |\nabla\ell(\beta)_j|$:
$$\ell(\beta) - \min_\eta \ell(\beta + \eta e_{j^t}) = \ell(\beta) - \min_\eta \ell\big(\beta + \eta\,\mathrm{sgn}(\beta'_{j^t})\,e_{j^t}\big) \overset{\text{optimality}}{\ge} \ell(\beta) - \min_{\eta,\ j \in J'\setminus J} \ell\big(\beta + \eta\,\mathrm{sgn}(\beta'_j)\,e_j\big)$$
Comment: a union bound over $J' \setminus J$ is used to derive the final result.
Slide 17: OMP
A slightly refined analysis using the forward effect:
$$\ell(\beta) - \min_\eta \ell(\beta + \eta e_{j^t}) \ge \frac{\rho_-(s)}{\rho_+(1)\,|J'\setminus J|}\,\{\ell(\beta) - \ell(\beta')\}$$
Taking $\beta$ as $\beta^t$, $\beta + \eta e_{j^t}$ as $\beta^{t+1}$ and $\beta'$ as $\beta^*$, we have
$$\ell(\beta^{t+1}) - \ell(\beta^t) \le -c_t\,\{\ell(\beta^t) - \ell(\beta^*)\}, \qquad c_t = \frac{\rho_-(s)}{\rho_+(1)\,|J^*\setminus J^t|}$$
This can be transformed into
$$\ell(\beta^{t+1}) - \ell(\beta^*) \le (1 - c_t)\,\{\ell(\beta^t) - \ell(\beta^*)\} \le e^{-c_t}\,\{\ell(\beta^t) - \ell(\beta^*)\},$$
which gives
$$\ell(\beta^t) \le (1 - e^{-\Sigma c_t})\,\ell(\beta^*) + e^{-\Sigma c_t}\,\ell(\beta^0).$$
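The middle step only uses the elementary inequality $1 - c \le e^{-c}$, which turns the per-step contraction factor into an exponential that sums over steps. A quick grid check (purely illustrative):

```python
import numpy as np

# 1 - c <= exp(-c) for all c; here checked on the range [0, 1] that the
# contraction factor c_t lives in.
c = np.linspace(0.0, 1.0, 1001)
gap = np.exp(-c) - (1.0 - c)
assert np.all(gap >= -1e-15)
```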
Slide 18: OMP
Recall the restricted gradient optimal constant: for $\|\beta\|_0 \le s$, $\langle \nabla\ell(\beta^*),\, \beta\rangle \le \epsilon_s(\beta^*)\,\|\beta\|_2$.
Usage: a statistical error bound can be obtained from $\ell(\beta) - \ell(\beta^*)$:
$$\rho_-(s)\,\|\beta - \beta^*\|_2^2 \le 2\ell(\beta) - 2\ell(\beta^*) + \frac{\epsilon_s(\beta^*)^2}{\rho_-(s)}, \qquad s \ge |J^t \cup J^*|$$
Key step in the proof:
$$\ell(\beta) - \ell(\beta^*) = \big[\ell(\beta) - \ell(\beta^*) - \langle \nabla\ell(\beta^*),\, \beta - \beta^*\rangle\big] + \langle \nabla\ell(\beta^*),\, \beta - \beta^*\rangle$$
Once a bound on $\ell(\beta^t) - \ell(\beta^*)$ is available, a bound on $\|\beta^t - \beta^*\|_2^2$ follows.
Slide 19: OMP
The analysis can be further refined using several techniques:
- Use a different reference value in place of $\ell(\beta^*)$ in each step so the bound becomes more precise, with an extra term $q_k$. The term $q_k$ comes from a bound of the form $\ell(\beta) - \ell(\beta^*) \le 1.5\,\rho_+(s)\,\|\beta^*_{J^*\setminus J}\|^2$, with entries truncated at the level $\epsilon_s(\beta^*)/\rho_+(s)$.
- Give a criterion on $t$ so that $c_t$ can be made a constant, to combine with $q_k$ in the induction.
Final result (with $s = \|\beta^*\|_0 + t$, since we consider $\beta^* - \beta^t$):
$$\ell(\beta^t) \le \ell(\beta^*) + \frac{2.5\,\epsilon_s(\beta^*)^2}{\rho_-(s)}, \qquad \|\beta^t - \beta^*\|_2 \le \frac{6\,\epsilon_s(\beta^*)}{\rho_-(s)} = O\big(\sigma\sqrt{|J(\beta^*)|\log p}\big)$$
when $t = 4|J^*|\,\frac{\rho_+(1)}{\rho_-(s)}\,\ln\frac{20\,\rho_+(|J^*|)}{\rho_-(s)}$.
Slide 20: Termination time
Intuition: if the decrease is significant in every step, there cannot be too many iterations.
Stop before any over-fitting happens: $\ell(\beta^t) \ge \ell(\beta^*)$.
A routine to get a bound: the iteration count $t$ controls a certain parameter in another bound; a restriction on that parameter then gives a bound on the iteration count.
Slide 21: Forward-backward greedy algorithm (FoBa-obj/FoBa-gdt)
Process:
- Forward: select the feature with the largest decrease in function value (FoBa-obj) or the largest gradient (FoBa-gdt), then do full correction; stop if $\delta^t = \ell(\beta^t) - \ell(\beta^{t+1}) \le \delta$
- Backward: delete a selected feature if $\min_j \ell(\beta^t - \beta^t_j e_j) - \ell(\beta^t) \le \delta^t/2$, then do full correction
Intuition behind FoBa:
- The forward procedure ensures a significant decrease in function value
- The backward procedure removes incorrect features at an early stage
- If the decrease is significant, the gradient must be large; otherwise there is a bound on the infinity norm of the gradient
- $\delta$ is used to control the forward and backward effects
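The loop above can be sketched for the quadratic loss as follows. This is a minimal illustration under assumptions, not the paper's reference implementation: the function name, the brute-force forward search (FoBa-gdt would select by gradient instead), and the stopping constants are all illustrative choices.

```python
import numpy as np

def foba(X, y, delta, max_steps=50):
    """Sketch of FoBa-obj for l(b) = ||y - X b||^2 (illustrative names)."""
    p = X.shape[1]
    support, beta = [], np.zeros(p)

    def full_correct(S):
        b = np.zeros(p)
        if S:
            b[S] = np.linalg.lstsq(X[:, S], y, rcond=None)[0]
        return b

    def loss(b):
        return np.sum((y - X @ b) ** 2)

    for _ in range(max_steps):
        # Forward step: add the single feature with the largest decrease
        best_gain, best_j = -np.inf, None
        for j in range(p):
            if j in support:
                continue
            gain = loss(beta) - loss(full_correct(support + [j]))
            if gain > best_gain:
                best_gain, best_j = gain, j
        if best_gain <= delta:               # stop: decrease no longer significant
            break
        support.append(best_j)
        beta = full_correct(support)
        delta_t = best_gain
        # Backward steps: drop features whose deletion costs at most delta_t / 2
        while len(support) > 1:
            cost, j = min((loss(full_correct([k for k in support if k != j]))
                           - loss(beta), j) for j in support)
            if cost > delta_t / 2:
                break
            support.remove(j)
            beta = full_correct(support)
    return beta, support

# Toy run with orthonormal columns (gains equal squared coefficients)
rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.standard_normal((30, 30)))
beta_true = np.zeros(30)
beta_true[[1, 7, 19]] = [3.0, -2.0, 1.5]
y = Q @ beta_true
beta_hat, support = foba(Q, y, delta=1e-6)
```

In the orthonormal toy case no backward deletion fires, since removing a true feature would cost its full squared coefficient, well above $\delta^t/2$.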
Slide 22: Backward effect
Assume $\beta'$ is also the globally optimal solution. Delete $j^t = \arg\min_j \ell(\beta - \beta_j e_j) - \ell(\beta)$ and do full correction.
This gives good control of $\beta$ on $J \setminus J'$:
$$\|\beta_{J\setminus J'}\|_2^2 \ge \frac{|J\setminus J'|}{\rho_+(1)}\,\Big\{\min_j \ell(\beta - \beta_j e_j) - \ell(\beta)\Big\}$$
Crude usage: $\|\beta - \beta'\|_2 \ge \|(\beta - \beta')_{J\setminus J'}\|_2 = \|\beta_{J\setminus J'}\|_2$.
Full correction turns $J \cup J'$ into $J \setminus J'$.
Slide 23: FoBa
How to analyze? $\delta$ can be a tool for bounding different quantities; $\delta^t$ can be a bridge connecting the bounds.
A simple proof of a bound on the gradient, $\|\nabla\ell(\beta)\|_\infty \le 2\sqrt{\rho_+(1)\,\delta}$:
$$\delta \ge \ell(\beta) - \min_{\eta,\, j} \ell(\beta + \eta e_j) \ge \max_{\eta,\, j}\big\{-\langle \eta e_j,\, \nabla\ell(\beta)\rangle - \rho_+(1)\,\eta^2\big\} = \max_j \frac{\nabla\ell(\beta)_j^2}{4\rho_+(1)}$$
Start from an assumption on selecting an appropriate $\delta$ so that $\ell(\beta^*) \le \ell(\beta^t)$:
$$\delta > \frac{4\rho_+(1)}{\rho_-^2(s)}\,\|\nabla\ell(\beta^*)\|_\infty^2$$
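For the quadratic loss $\ell(\beta) = \|y - X\beta\|^2$ with unit-norm columns (so $\rho_+(1) = 1$), the chain above is tight: $\ell(\beta + \eta e_j) = \ell(\beta) + \eta\,\nabla\ell(\beta)_j + \eta^2$, so the best single-coordinate decrease equals $\nabla\ell(\beta)_j^2/4$ exactly. A small numeric check (sizes are arbitrary test values):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 40, 60
X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=0)        # unit-norm columns => rho_+(1) = 1
y = rng.standard_normal(n)
beta = np.zeros(p)

loss = lambda b: np.sum((y - X @ b) ** 2)
g = 2.0 * X.T @ (X @ beta - y)        # gradient of the quadratic loss

# Closed form says the best decrease along coordinate j is g_j^2 / 4;
# verify by a fine grid search over the step size eta.
j = int(np.argmax(np.abs(g)))
e_j = np.zeros(p); e_j[j] = 1.0
etas = np.linspace(-6.0, 6.0, 12001)
brute = loss(beta) - min(loss(beta + eta * e_j) for eta in etas)
```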
Slide 24: General framework
Strategy I: use an auxiliary variable $\beta'$, the optimal solution on $J(\beta') = J(\beta^*) \cup J(\beta^t)$, to help the analysis.
- The termination rule comes from $\ell(\beta^t) \ge \ell(\beta^*)$. Divide $\ell(\beta^t) - \ell(\beta^*)$ into $\{\ell(\beta^t) - \ell(\beta')\} - \{\ell(\beta^*) - \ell(\beta')\}$ and use the full correction result on each part.
- Each part yields $\|\beta^t - \beta'\|$ and $\|\beta^* - \beta'\|$.
- The forward step gives a bound on $\|\beta^t - \beta'\|$; the backward step gives a bound on $\|\beta^* - \beta'\|$; both through $\delta^t$.
- $\|\beta^t - \beta^*\| \le \|\beta^t - \beta'\| + \|\beta^* - \beta'\|$ gives a relationship between $\|\beta^* - \beta'\|$ and $\|\beta^t - \beta'\|$.
Slide 25: Termination time for FoBa
Full correction and RSC/RSS:
$$0 \le \ell(\beta^t) - \ell(\beta^*) = \{\ell(\beta^t) - \ell(\beta')\} - \{\ell(\beta^*) - \ell(\beta')\} \le \big\{\rho_+(s) - \rho_-(s)(k-1)^2\big\}\,\|\beta^t - \beta'\|_2^2$$
where $\|\beta^* - \beta'\|_2 \le k\,\|\beta^t - \beta'\|_2$.
Bound from the forward step: $\delta^t \ge \dfrac{\rho_-^2(s)}{\rho_+(1)\,|J^*\setminus J^{t-1}|}\,\|\beta^t - \beta'\|_2^2$
Bound from the backward step: $\|\beta^* - \beta'\|_2^2 \le \dfrac{|J'\setminus J^*|}{\rho_+(1)}\,\delta^t$
Combining the two through $\delta^t$ gives $k^2 = \dfrac{\rho_-^2(s)\,|J'\setminus J^*|}{\rho_+^2(1)\,|J^*\setminus J^{t-1}|}$.
Recall $|J'| = |J^* \cup J^t| \le |J^*| + t$, which gives an upper bound on $t$:
$$t \le (|J^*| + 1)\,\Big\{\Big(\sqrt{\rho_+(s)/\rho_-(s)} + 1\Big)\,\frac{2\rho_+(1)}{\rho_-(s)}\Big\}^2$$
Slide 26: General framework
Strategy II (an easy approach): use simple inequalities together with the regularity condition to derive a bound.
- Use RSC/RSS to transfer $\ell(\beta^t) - \ell(\beta^*)$ into terms involving the gradient and $\|\beta^* - \beta^t\|_2^2$.
- Use Holder's inequality directly to handle the gradient term:
$$\langle \nabla\ell(\beta^t),\, \beta^t - \beta^*\rangle \le \|\nabla\ell(\beta^t)\|_\infty\,\|\beta^t - \beta^*\|_1$$
- $\|\beta^t - \beta^*\|_1$ transfers into a 2-norm bound; $\|\nabla\ell(\beta)\|_\infty$ is bounded by the design of the algorithm (involving $\delta$).
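The Holder step can be checked numerically on random vectors (purely illustrative; the inequality holds for any pair):

```python
import numpy as np

# |<g, v>| <= ||g||_inf * ||v||_1 : splits the gradient term into an
# infinity-norm factor (controlled by delta) and an l1 factor.
rng = np.random.default_rng(3)
holds = True
for _ in range(100):
    g = rng.standard_normal(50)
    v = rng.standard_normal(50)
    holds &= abs(g @ v) <= np.max(np.abs(g)) * np.sum(np.abs(v)) + 1e-12
```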
Slide 27: FoBa
Details (with $s \ge |J^* \cup J^t|$ and $\ell(\beta^*) \le \ell(\beta^t)$):
$$0 \ge \ell(\beta^*) - \ell(\beta^t) \ge \langle \nabla\ell(\beta^t),\, \beta^* - \beta^t\rangle + \rho_-(s)\,\|\beta^* - \beta^t\|_2^2$$
By full correction and the gradient bound $\|\nabla\ell(\beta^t)\|_\infty \le 2\sqrt{\rho_+(1)\,\delta}$:
$$\rho_-(s)\,\|\beta^* - \beta^t\|_2^2 \le -\langle \nabla\ell(\beta^t)_{J^*\setminus J^t},\, (\beta^* - \beta^t)_{J^*\setminus J^t}\rangle \le 2\sqrt{\rho_+(1)\,\delta\,|J^*\setminus J^t|}\;\|\beta^* - \beta^t\|_2$$
Final result:
$$\|\beta^* - \beta^t\|_2^2 \le \frac{4\rho_+(1)\,\delta}{\rho_-^2(s)}\,|J^*\setminus J^t|$$
In the refined statement, $|J^*\setminus J^t|$ is replaced by $|\Delta|$ with $\Delta = \{j \in J^*\setminus J^t : |\beta^*_j| \ge \gamma\}$, $\gamma = 2\sqrt{2\rho_+(1)\,\delta}/\rho_-(s)$.
Other bounds can be derived as well, e.g. (up to constants)
$$\ell(\beta^t) - \ell(\beta^*) \lesssim \frac{\rho_+(1)\,\delta}{\rho_-(s)}\,|J^*\setminus J^t|, \qquad |J^t\setminus J^*| \lesssim \frac{\rho_+^2(1)}{\rho_-^2(s)}\,|J^*\setminus J^t|$$
Slide 28: FoBa
To make the bound look better (a trick): start from
$$\frac{4\rho_+(1)\,\delta}{\rho_-^2(s)}\,|J^*\setminus J^t| \ge \|\beta^*_{J^*\setminus J^t}\|_2^2 \ge \gamma^2\,\big|\{j \in J^*\setminus J^t : |\beta^*_j| \ge \gamma\}\big|, \qquad \gamma = \frac{2\sqrt{2\rho_+(1)\,\delta}}{\rho_-(s)}.$$
Since $\gamma^2 = 8\rho_+(1)\,\delta/\rho_-^2(s)$,
$$|J^*\setminus J^t| \ge 2\,\big|\{j \in J^*\setminus J^t : |\beta^*_j| \ge \gamma\}\big| = 2\big(|J^*\setminus J^t| - |\{j \in J^*\setminus J^t : |\beta^*_j| < \gamma\}|\big),$$
which leads to
$$|J^*\setminus J^t| \le 2\,\big|\{j \in J^*\setminus J^t : |\beta^*_j| < \gamma\}\big|.$$
Slide 29: FoBa
Strategy III: use random matrix theory and simple inequalities to derive the bound. For quadratic loss, with $y = X\beta^* + \varepsilon$,
$$\|X\beta^t - y\|_2^2 = \|X\beta^t - X\beta^*\|_2^2 - 2\langle \varepsilon,\, X\beta^t - X\beta^*\rangle + \|\varepsilon\|_2^2$$
Since $\ell(\beta^*) = \|\varepsilon\|_2^2$, a generalized version is
$$\ell(\beta^t) = \|X\beta^t - X\beta^*\|_2^2 - 2\langle \varepsilon,\, X\beta^t - X\beta^*\rangle + \ell(\beta^*)$$
- $\langle \varepsilon,\, X\beta^t - X\beta^*\rangle$ can be bounded using random matrix theory
- $\ell(\beta^t) - \ell(\beta^*)$ can be upper bounded through the forward and backward effects on $\ell(\beta^t) - \ell(\beta')$ and $\ell(\beta^*) - \ell(\beta')$, though some more precise analysis with tricks is involved
- The termination time bound changes accordingly
- Benefit: no assumption on RSS ($\rho_+$) is needed
Slide 30: FoBa
Assume $\varepsilon$ is sub-Gaussian with parameter $\sigma$.
Comparison between the results from strategies II and III (up to constants):
- Strategy II: $\|\beta^* - \beta^t\|_2^2 \lesssim \delta\,\rho_+^2(1)\,\rho_-^{-1}(s)$, with $\delta \gtrsim \rho_+^2(1)\,\rho_-^{-1}(s)\,\|\varepsilon\|^2$
- Strategy III: $\|\beta^* - \beta^t\|_2^2 \lesssim \rho_-^{-1}(s)\,\sigma^2|J^*| + \delta\,\rho_-^{-2}(s)$, with $\delta \gtrsim \rho_-^{-1}(s)\,\sigma^2\log p$
Comparison with the LASSO:
- A bit better than the LASSO error bound $O(\sigma^2 |J^*| \log p)$
- The LASSO also needs a stronger condition (the irrepresentable condition) for selection consistency
Slide 31: Selection consistency
Target: $J^* = J^t$.
Several ways to evaluate it:
- In FoBa, $\max\{|J^*\setminus J^t|,\, |J^t\setminus J^*|\} = O(\cdot)$; the bound needs to be $< 1$ with high probability
- Suppose $\beta^*$ is known; build a necessary/sufficient condition and analyze it (e.g. via the KKT conditions)
- Derive an upper bound for $\|\beta^* - \beta^t\|$ and add a $\beta^*$-min condition
Slide 32: Hard-thresholding pursuit
HTP procedure: select the $q$ features with the largest absolute values after a gradient-descent step,
$$\beta^t = \Theta\big(\beta^{t-1} - \eta\,\nabla\ell(\beta^{t-1});\ q\big),$$
then do full correction.
The analysis in the paper uses the globally optimal solution in the discussion ($\bar\beta$ is the global minimum subject to $\|\beta\|_0 \le q$).
The global optimum is easier to analyze; random matrix theory can then be used to derive bounds between $\beta^*$ and the global optimum.
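A minimal sketch of the HTP iteration for the quadratic loss, where $\Theta(\cdot;\, q)$ keeps the $q$ largest entries in absolute value. Function and argument names are illustrative assumptions, and the toy run uses orthonormal columns, for which a single iteration finds the true support.

```python
import numpy as np

def htp(X, y, q, eta=1.0, max_steps=20):
    """Sketch of hard-thresholding pursuit for l(b) = (1/2)||y - X b||^2."""
    p = X.shape[1]
    beta = np.zeros(p)
    support = []
    for _ in range(max_steps):
        grad = X.T @ (X @ beta - y)        # gradient of the quadratic loss
        z = beta - eta * grad              # gradient-descent step
        new_support = sorted(np.argsort(np.abs(z))[-q:])   # Theta(.; q)
        if list(new_support) == list(support):
            break                          # support stabilized: terminate
        support = new_support
        beta = np.zeros(p)
        # Full correction on the thresholded support
        beta[support] = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
    return beta, support

# Toy run with orthonormal columns
rng = np.random.default_rng(5)
Q, _ = np.linalg.qr(rng.standard_normal((30, 30)))
beta_true = np.zeros(30)
beta_true[[4, 9, 22]] = [3.0, -2.0, 1.5]
y = Q @ beta_true
beta_hat, support = htp(Q, y, q=3)
```

With orthonormal columns and $\eta = 1$, the first gradient step from zero is exactly $\beta^*$, so thresholding keeps the true support and full correction recovers the coefficients.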
Slide 33: Hard-thresholding pursuit
A naive analysis. Assume $\ell(\beta^*) \ge \ell(\beta^t)$. RSC and Holder's inequality give
$$\ell(\beta^*) - \ell(\beta^t) \ge -\|\nabla\ell(\beta^t)_{J^*\setminus J^t}\|_2\,\|\beta^t - \beta^*\|_2 + \rho_-(s)\,\|\beta^t - \beta^*\|_2^2$$
If $J^t \neq J^*$, then $\min_{j\in J^*} |\beta^*_j| \le \|\beta^* - \beta^t\|_2 \le \dfrac{\sqrt{2q}\,\|\nabla\ell(\beta^t)\|_\infty}{\rho_-(2q)}$, so
$$\min_{j\in J^*} |\beta^*_j| > \frac{\sqrt{2q}\,\|\nabla\ell(\beta^t)\|_\infty}{\rho_-(2q)}$$
guarantees support recovery.
Slide 34: Hard-thresholding pursuit
The complete analysis is more precise, with several lemmas and tricks (details omitted).
Main ideas:
- Under certain conditions (with some unknown constant terms involved), HTP terminates once $\beta^t$ reaches the $q$-sparse global optimum
- HTP does not terminate before $\beta^t$ reaches the global optimum
- The number of iterations is finite
Slide 35: Forward effect for HTP
Key idea: handle the gradient through the regularity condition.
$$\|(\beta' - \beta)_J\|_2^2 = \langle \beta' - \beta,\, (\beta' - \beta)_J\rangle = \langle \beta' - \beta - \eta\,\nabla\ell(\beta') + \eta\,\nabla\ell(\beta),\, (\beta' - \beta)_J\rangle - \eta\,\langle \nabla\ell(\beta),\, (\beta' - \beta)_J\rangle + \eta\,\langle \nabla\ell(\beta'),\, (\beta' - \beta)_J\rangle$$
$$\le \big(\sqrt{\tilde\rho}\,\|\beta' - \beta\|_2 + \eta\,\|\nabla\ell(\beta)_J\|_2\big)\,\|(\beta' - \beta)_J\|_2$$
(the $\nabla\ell(\beta')$ term vanishes on $J$ when $\beta'$ is full-corrected), where $\tilde\rho = 1 - 2\eta\,\rho_-(s) + \eta^2\rho_+(s)^2$ is obtained from
$$\langle \beta' - \beta,\, \nabla\ell(\beta') - \nabla\ell(\beta)\rangle \ge \rho_-(s)\,\|\beta' - \beta\|_2^2, \qquad \|\nabla\ell(\beta') - \nabla\ell(\beta)\|_2 \le \rho_+(s)\,\|\beta' - \beta\|_2$$
Result:
$$\|\beta' - \beta\|_2 \le \frac{\|\beta'_{J'\setminus J}\|_2}{1 - \sqrt{\tilde\rho}} + \frac{\eta\,\|\nabla\ell(\beta)_{J'\setminus J}\|_2}{1 - \sqrt{\tilde\rho}}$$
Slide 36: Some comments
- In general, full correction makes the analysis easier, but not necessarily the practical performance better
- Almost every analysis needs RSC/RSS (or an equivalent condition)
- Induction is still a good analysis tool, but the resulting bounds can be very complicated
- The so-called constant part of a bound can play a significant role in practice, so a method may fail despite a good-looking bound
Slide 37: Literature
Forward-backward greedy algorithms:
- Barron, A. R., Cohen, A., Dahmen, W., & DeVore, R. A. (2008). Approximation and learning by greedy algorithms. The Annals of Statistics, 36(1).
- Liu, J., Ye, J., & Fujimaki, R. (2014). Forward-backward greedy algorithms for general convex smooth functions over a cardinality constraint. In International Conference on Machine Learning.
- Zhang, T. (2011). Adaptive forward-backward greedy algorithm for learning sparse representations. IEEE Transactions on Information Theory, 57(7).
Matching pursuit:
- Needell, D., & Tropp, J. A. (2009). CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Applied and Computational Harmonic Analysis, 26(3).
- Zhang, T. (2009). On the consistency of feature selection using greedy least squares regression. Journal of Machine Learning Research, 10(Mar).
- Zhang, T. (2011). Sparse recovery with orthogonal matching pursuit under RIP. IEEE Transactions on Information Theory, 57(9).
Hard-thresholding pursuit:
- Bahmani, S., Raj, B., & Boufounos, P. T. (2013). Greedy sparsity-constrained optimization. Journal of Machine Learning Research, 14(Mar).
- Yuan, X., Li, P., & Zhang, T. (2014). Gradient hard thresholding pursuit for sparsity-constrained optimization. In International Conference on Machine Learning.
- Yuan, X., Li, P., & Zhang, T. (2016). Exact recovery of hard thresholding pursuit. In Advances in Neural Information Processing Systems.
More informationCSC 576: Variants of Sparse Learning
CSC 576: Variants of Sparse Learning Ji Liu Department of Computer Science, University of Rochester October 27, 205 Introduction Our previous note basically suggests using l norm to enforce sparsity in
More informationConfidence Intervals for Low-dimensional Parameters with High-dimensional Data
Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Cun-Hui Zhang and Stephanie S. Zhang Rutgers University and Columbia University September 14, 2012 Outline Introduction Methodology
More informationBoosting Methods: Why They Can Be Useful for High-Dimensional Data
New URL: http://www.r-project.org/conferences/dsc-2003/ Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003) March 20 22, Vienna, Austria ISSN 1609-395X Kurt Hornik,
More informationOptimization methods
Optimization methods Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda /8/016 Introduction Aim: Overview of optimization methods that Tend to
More informationOptimization for Compressed Sensing
Optimization for Compressed Sensing Robert J. Vanderbei 2014 March 21 Dept. of Industrial & Systems Engineering University of Florida http://www.princeton.edu/ rvdb Lasso Regression The problem is to solve
More informationSparse Approximation and Variable Selection
Sparse Approximation and Variable Selection Lorenzo Rosasco 9.520 Class 07 February 26, 2007 About this class Goal To introduce the problem of variable selection, discuss its connection to sparse approximation
More informationPenalized Squared Error and Likelihood: Risk Bounds and Fast Algorithms
university-logo Penalized Squared Error and Likelihood: Risk Bounds and Fast Algorithms Andrew Barron Cong Huang Xi Luo Department of Statistics Yale University 2008 Workshop on Sparsity in High Dimensional
More informationOptimization methods
Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,
More informationAnalysis of Multi-stage Convex Relaxation for Sparse Regularization
Journal of Machine Learning Research 11 (2010) 1081-1107 Submitted 5/09; Revised 1/10; Published 3/10 Analysis of Multi-stage Convex Relaxation for Sparse Regularization Tong Zhang Statistics Department
More informationMethods for sparse analysis of high-dimensional data, II
Methods for sparse analysis of high-dimensional data, II Rachel Ward May 23, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 47 High dimensional
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Proximal-Gradient Mark Schmidt University of British Columbia Winter 2018 Admin Auditting/registration forms: Pick up after class today. Assignment 1: 2 late days to hand in
More informationLasso Regression: Regularization for feature selection
Lasso Regression: Regularization for feature selection Emily Fox University of Washington January 18, 2017 Feature selection task 1 Why might you want to perform feature selection? Efficiency: - If size(w)
More informationA Short Introduction to the Lasso Methodology
A Short Introduction to the Lasso Methodology Michael Gutmann sites.google.com/site/michaelgutmann University of Helsinki Aalto University Helsinki Institute for Information Technology March 9, 2016 Michael
More informationSparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda
Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic
More informationA new method on deterministic construction of the measurement matrix in compressed sensing
A new method on deterministic construction of the measurement matrix in compressed sensing Qun Mo 1 arxiv:1503.01250v1 [cs.it] 4 Mar 2015 Abstract Construction on the measurement matrix A is a central
More informationNon-linear Supervised High Frequency Trading Strategies with Applications in US Equity Markets
Non-linear Supervised High Frequency Trading Strategies with Applications in US Equity Markets Nan Zhou, Wen Cheng, Ph.D. Associate, Quantitative Research, J.P. Morgan nan.zhou@jpmorgan.com The 4th Annual
More informationCOMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017
COMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University UNDERDETERMINED LINEAR EQUATIONS We
More informationAn iterative hard thresholding estimator for low rank matrix recovery
An iterative hard thresholding estimator for low rank matrix recovery Alexandra Carpentier - based on a joint work with Arlene K.Y. Kim Statistical Laboratory, Department of Pure Mathematics and Mathematical
More informationLecture 5 : Projections
Lecture 5 : Projections EE227C. Lecturer: Professor Martin Wainwright. Scribe: Alvin Wan Up until now, we have seen convergence rates of unconstrained gradient descent. Now, we consider a constrained minimization
More informationDescent methods. min x. f(x)
Gradient Descent Descent methods min x f(x) 5 / 34 Descent methods min x f(x) x k x k+1... x f(x ) = 0 5 / 34 Gradient methods Unconstrained optimization min f(x) x R n. 6 / 34 Gradient methods Unconstrained
More informationNonlinear Optimization for Optimal Control
Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]
More informationExponential decay of reconstruction error from binary measurements of sparse signals
Exponential decay of reconstruction error from binary measurements of sparse signals Deanna Needell Joint work with R. Baraniuk, S. Foucart, Y. Plan, and M. Wootters Outline Introduction Mathematical Formulation
More informationA New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables
A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of
More informationarxiv: v2 [cs.lg] 6 May 2017
arxiv:170107895v [cslg] 6 May 017 Information Theoretic Limits for Linear Prediction with Graph-Structured Sparsity Abstract Adarsh Barik Krannert School of Management Purdue University West Lafayette,
More informationMachine Learning: Chenhao Tan University of Colorado Boulder LECTURE 5
Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 5 Slides adapted from Jordan Boyd-Graber, Tom Mitchell, Ziv Bar-Joseph Machine Learning: Chenhao Tan Boulder 1 of 27 Quiz question For
More informationStatistics for high-dimensional data: Group Lasso and additive models
Statistics for high-dimensional data: Group Lasso and additive models Peter Bühlmann and Sara van de Geer Seminar für Statistik, ETH Zürich May 2012 The Group Lasso (Yuan & Lin, 2006) high-dimensional
More informationSolving Corrupted Quadratic Equations, Provably
Solving Corrupted Quadratic Equations, Provably Yuejie Chi London Workshop on Sparse Signal Processing September 206 Acknowledgement Joint work with Yuanxin Li (OSU), Huishuai Zhuang (Syracuse) and Yingbin
More information