Bias-free Sparse Regression with Guaranteed Consistency
1 Bias-free Sparse Regression with Guaranteed Consistency
Wotao Yin (UCLA Math), joint with Stanley Osher, Ming Yan (UCLA) and Feng Ruan, Jiechao Xiong, Yuan Yao (Peking U).
UC Riverside, STATS Department, March 10, 2015
2 Background
Goal: recover a sparse $x^* \in \mathbb{R}^n$ from the noisy linear observation $b := Ax^* + \varepsilon$, where $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$ are given and $\varepsilon$ is zero-mean unknown noise.
Our focus: the under-determined case, $m \ll n$.
LASSO is a common approach, but its solution is biased. Fan and Li (2001): to avoid bias, minimization must use a nonconvex prior.
Our approach keeps the convex prior but replaces minimization.
3 This talk
- Reviews LASSO and explains its solution bias.
- A new regularization path: the solution of an ordinary differential inclusion. It uses a convex prior, is free of bias, has the oracle property, and has sign/$\ell_2$ consistency.
- How to compute the exact path, as well as its fast approximations.
- How to try it by making a 2-line change to your existing code.
5 LASSO and its bias
Minimization form: $x^{\mathrm{lasso}} \in \arg\min_x \|x\|_1 + \frac{t}{2m}\|Ax - b\|_2^2$.
Variational form (optimality condition): $0 = p + \frac{t}{m}A^T(Ax^{\mathrm{lasso}} - b)$ and $p \in \partial\|x^{\mathrm{lasso}}\|_1$.
Suppose $S := \mathrm{supp}(x^*)$, that is, $x^* = [x^*_S; 0]$, and LASSO recovers the exact support, $S = \mathrm{supp}(x^{\mathrm{lasso}})$. Then
$$x^{\mathrm{lasso}}_S = \underbrace{x^*_S + \tfrac{1}{m}(A_S^T A_S)^{-1} A_S^T \varepsilon}_{\text{oracle estimate, } \mathbb{E}(\cdots) = x^*_S} - \underbrace{\tfrac{1}{t}(A_S^T A_S)^{-1}\,\mathrm{sign}(x^{\mathrm{lasso}}_S)}_{\text{bias}}.$$
7 Toy example 1
Consider $b > 0$ and the all-scalar problem $b = x^* + \varepsilon$.
Oracle estimate: $x^{\mathrm{oracle}} = b$.
LASSO: $x^{\mathrm{lasso}} \in \arg\min_x |x| + \frac{t}{2}(x - b)^2$.
LASSO solution:
$$x^{\mathrm{lasso}} = \begin{cases} 0, & 0 \le t \le \tfrac{1}{b}, \\ b - \tfrac{1}{t}, & \tfrac{1}{b} < t < \infty. \end{cases}$$
LASSO reduces the signal magnitude.
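The two-branch formula above is easy to sanity-check numerically. A minimal sketch (my own illustration, not from the slides): compare the closed form against a brute-force grid search of the scalar objective.

```python
import numpy as np

def scalar_lasso(b, t):
    """Closed-form minimizer of |x| + (t/2)*(x - b)**2 for b > 0:
    zero until t exceeds 1/b, then the shrunken value b - 1/t."""
    return 0.0 if t <= 1.0 / b else b - 1.0 / t

b = 2.0
grid = np.linspace(-1.0, 3.0, 4001)  # brute-force search grid, spacing 0.001
for t in (0.25, 0.5, 1.0, 4.0):
    obj = np.abs(grid) + 0.5 * t * (grid - b) ** 2
    x_closed = scalar_lasso(b, t)
    # closed form agrees with the grid minimizer up to grid resolution
    assert abs(x_closed - grid[np.argmin(obj)]) < 1e-3
    # LASSO never exceeds the oracle estimate x_oracle = b
    assert x_closed <= b
```

For `t = 4.0` this returns `1.75`, visibly short of the oracle value `2.0`: the magnitude reduction the slide describes.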
9 Toy example 2
Suppose sorted $a_1 > a_2 \ge \dots \ge a_n > 0$ and the given measurement is $b = a^T x^* + \varepsilon \in \mathbb{R}$, where $x^* \in \mathbb{R}^n$.
LASSO solution: $x^{\mathrm{lasso}}_2 = \dots = x^{\mathrm{lasso}}_n = 0$ for all $t \ge 0$, and
$$x^{\mathrm{lasso}}_1 = \begin{cases} 0, & 0 \le t \le \tfrac{1}{a_1 b}, \\ \tfrac{b}{a_1} - \tfrac{1}{t a_1^2}, & \tfrac{1}{a_1 b} < t < \infty. \end{cases}$$
LASSO selects $a_1$ but reduces the signal magnitude.
10 A more realistic example
Setup: $n = 256$, $m = 25$, Gaussian noise $\varepsilon$.
[Figure: true signal vs. BPDN (LASSO) recovery, with $t$ hand-tuned.]
The LASSO solution:
- selects large signals but reduces their magnitudes
- misses several moderate-sized signals (false negatives)
- includes small false signals (false positives)
12 LASSO post-debiasing
Goal: restore the reduced magnitudes. Let $S := \mathrm{supp}(x^{\mathrm{lasso}})$.
Common approach: solve $\min_x \|Ax - b\|^2$ subject to $\mathrm{supp}(x) = S$ (the solution and $x^{\mathrm{lasso}}$ may have different signs).
Another approach: remove $\frac{1}{t}(A_S^T A_S)^{-1}\mathrm{sign}(x^{\mathrm{lasso}}_S)$ from $x^{\mathrm{lasso}}_S$.
Issues:
- extra computation of a matrix inversion
- cannot correct false positives or false negatives in $x^{\mathrm{lasso}}$
- cannot work with continuous support (e.g., low-rank matrix recovery)
13 Proposed: inverse scale space (ISS) dynamic
The name comes from image processing.
Idea: instead of minimizing prior + fitting, evolve prior and fitting along their (sub)gradients. Get the solution path $\{x(t), p(t)\}_{t \ge 0}$ by evolving from the initial $x(0) = p(0) = 0$:
$$\underbrace{\dot p(t) = \tfrac{1}{m}A^T(b - Ax(t))}_{\text{fitting}}, \qquad \underbrace{p(t) \in \partial\|x(t)\|_1}_{\text{prior}}.$$
The ISS path is well-defined under assumptions: $p(t)$ is right-continuously differentiable and $x(t)$ is right-continuous.
14 Compare LASSO and ISS
Apply LASSO and ISS to the same example shown before.
[Figures: true signal vs. BPDN (LASSO) recovery, shown previously; true signal vs. ISS (Bregman) recovery.]
Compared to LASSO:
- ISS does not reduce signal magnitudes
- ISS has fewer false positives
- ISS has fewer false negatives; it recovers the moderate-sized signals
16 Under the hood: removing LASSO bias at its origin
Recall that in LASSO, we have $p \in \partial\|x^{\mathrm{lasso}}\|_1$ and $p = \frac{t}{m}A^T(b - Ax^{\mathrm{lasso}})$.
Differentiating the equation w.r.t. $t$ gives
$$\dot p = \tfrac{1}{m}A^T\big(b - A(t\dot x^{\mathrm{lasso}} + x^{\mathrm{lasso}})\big).$$
In fact, $t\dot x^{\mathrm{lasso}} + x^{\mathrm{lasso}}$ is LASSO's post-debiasing solution!
Replacing $t\dot x^{\mathrm{lasso}} + x^{\mathrm{lasso}}$ by $x$ removes the bias, yielding the ISS dynamic
$$\dot p = \tfrac{1}{m}A^T(b - Ax).$$
ISS works better than (LASSO + post-debiasing).
17 Numerical result: prostate tumor size
The first example from Hastie-Tibshirani-Friedman.
Problem: given 8 clinical features, select predictors for prostate tumor size.
Data: 67 training cases + 30 testing cases; parameters picked by cross validation.
[Table: estimated coefficients of LS, Subset Selection, LASSO, and ISS for the predictors Intercept, lcavol, lweight, age, lbph, svi, lcp, gleason, and pgg45, followed by the number of selected predictors and the test error of each method.]
ISS uses the fewest predictors and achieves the best test error!
19 Theory: consistency guarantees for ISS
Question: is there a $\bar t$ so that $x(\bar t)$ has the following properties?
- sign consistency: $\mathrm{sign}(x^*) = \mathrm{sign}(x(\bar t))$
- no false positive: if the true $x^*_i = 0$, then $x_i(\bar t) = 0$
- no false negative: if the true $x^*_i \ne 0$, then $x_i(\bar t) \ne 0$
Theorem. Make the assumptions: Gaussian noise $\omega \sim N(0, \sigma^2 I)$, normalized columns $\frac{1}{n}\max_j \|A_j\|^2 \le 1$, and the irrepresentable and strong-signal conditions. Then, with high probability, the ISS point $x(\bar t)$ has sign consistency and gives an unbiased estimate of $x^*$. (There is an explicit formula for $\bar t$.)
Proof is based on the next two lemmas.
20 No false positive
Define the true support $S := \mathrm{supp}(x^*)$, and let $T := S^c$.
Lemma. Under the assumptions, if $A_S$ has full column rank and
$$\max_{j \in T} \|A_j^T A_S (A_S^T A_S)^{-1}\|_1 \le 1 - \eta$$
for some $\eta \in (0, 1)$, then with high probability
$$\mathrm{supp}(x(s)) \subseteq S, \quad \forall s \le \bar t := O\Big(\frac{\eta\sqrt{m}}{\sigma\sqrt{\log n}}\Big).$$
Proof uses: (i) a concentration inequality, and (ii) if $\mathrm{supp}(x(s)) \subseteq S$ for all $s \le t$, then
$$p_T(s) = A_T^T A_S (A_S^T A_S)^{-1} p_S(s) + \tfrac{s}{m} A_T^T P^{\perp}_{A_S}\, \omega, \quad \forall s \le t.$$
21 No false negative / sign consistency
Lemma. Under the assumptions, if $A_S^T A_S \succeq \gamma I$ and
$$x^*_{\min} \ge \max\Big\{ O\Big(\frac{\sigma\sqrt{\log|S|}}{\gamma\sqrt{m}}\Big),\; O\Big(\frac{\sigma\sqrt{\log|S|\,\log n}}{\eta\gamma\sqrt{m}}\Big) \Big\},$$
then there exists $\bar t$ (which can be given explicitly) so that, with high probability,
$$x(\bar t) = x^* + (A_S^T A_S)^{-1} A_S^T \omega \quad \text{obeys} \quad \mathrm{sign}(x(\bar t)) = \mathrm{sign}(x^*)$$
and $\|x(\bar t) - x^*\|_\infty \le x^*_{\min}/2$.
- The first term in the max ensures $\|(A_S^T A_S)^{-1} A_S^T \omega\|_\infty \le x^*_{\min}/2$.
- The second term ensures $\inf\{t : \mathrm{sign}(x_S(t)) = \mathrm{sign}(x^*_S)\} \le \bar t$.
24 Compute the ISS path
Theorem. The solution path of $\dot p(t) = \frac{1}{m}A^T(b - Ax(t))$ and $p(t) \in \partial\|x(t)\|_1$, with initial $t_0 = 0$, $p(0) = 0$, $x(0) = 0$, is given piecewise by the following iteration: for $k = 1, 2, \dots, K$ compute
- $p(t)$, which is piecewise linear:
$$p(t) = p(t_{k-1}) + \frac{t - t_{k-1}}{m} A^T\big(b - Ax(t_{k-1})\big), \quad t \in [t_{k-1}, t_k],$$
where $t_k := \sup\{t > t_{k-1} : p(t) \in \partial\|x(t_{k-1})\|_1\}$;
- $x(t)$, which for $t \in [t_{k-1}, t_k)$ is constantly equal to $x(t_{k-1})$; if $t_k < \infty$, the next point is
$$x(t_k) = \arg\min_u \|Au - b\|_2^2 \quad \text{subject to} \quad \begin{cases} u_i \ge 0, & p_i(t_k) = 1, \\ u_i = 0, & p_i(t_k) \in (-1, 1), \\ u_i \le 0, & p_i(t_k) = -1. \end{cases}$$
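The theorem above can be sketched in a few lines of NumPy. This is my own illustration, not the authors' code, and it simplifies the theorem in one way: the least-squares step at each break point drops the sign constraints on $u$, which is harmless only when the unconstrained solution on the active set already matches $\mathrm{sign}(p)$ (as it does in the small example below).

```python
import numpy as np

def iss_path(A, b, max_breaks=10):
    """Trace the piecewise ISS path: p(t) is piecewise linear, x(t) is
    piecewise constant, with break points where a new |p_i| reaches 1."""
    m, n = A.shape
    p, x, t = np.zeros(n), np.zeros(n), 0.0
    path = [(t, x.copy())]
    for _ in range(max_breaks):
        v = A.T @ (b - A @ x) / m                 # dp/dt on this segment
        inactive = np.abs(p) < 1 - 1e-9
        # time for each inactive p_i to reach the boundary +1 or -1
        dts = np.full(n, np.inf)
        pos, neg = v > 1e-12, v < -1e-12
        dts[pos] = (1 - p[pos]) / v[pos]
        dts[neg] = (-1 - p[neg]) / v[neg]
        dts[~inactive] = np.inf
        if not np.isfinite(dts).any():
            break                                  # path has stalled
        dt = dts.min()
        t += dt
        p = np.clip(p + dt * v, -1.0, 1.0)
        S = np.abs(p) >= 1 - 1e-9                  # active set at break point
        u, *_ = np.linalg.lstsq(A[:, S], b, rcond=None)
        x = np.zeros(n)
        x[S] = u                                   # sign constraints omitted
        path.append((t, x.copy()))
    return path

# With A = I, the path picks up each coordinate of b in order of magnitude,
# at its exact (unbiased) value.
path = iss_path(np.eye(2), np.array([3.0, 1.0]))
```

Here the break points occur at $t = 2/3$ and $t = 2$, and the final point is exactly $(3, 1)$: no shrinkage.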
25 ISS computation
ISS is fast on moderately sized problems:
- evolve from $t = 0$ through (finitely many) break points
- each break point poses a constrained least-squares subproblem (since it is similar to the one at the previous break point, it can be solved by maintaining a QR decomposition)
How to evolve ISS for huge problems with many break points? Answer: fast discrete approximations.
- Bregman iteration: LASSO subproblem + add-back-the-residual
- Linearized Bregman iteration: closed-form iteration, parallelizable
28 Discrete ISS = Bregman iteration
Apply forward Euler to $\dot p = \frac{1}{m}A^T(b - Ax)$ while keeping $p \in \partial\|x\|_1$:
$$p^{k+1} = p^k + \frac{\delta}{m}A^T(b - Ax^k),$$
which is the first-order optimality condition of
$$x^{k+1} \in \arg\min_x\; \underbrace{\|x\|_1 - \|x^k\|_1 - \langle p^k, x - x^k\rangle}_{\text{Bregman distance of } \ell_1} + \frac{\delta}{2m}\|Ax - b\|^2.$$
By a change of variable, obtain the equivalent iteration:
$$x^{k+1} \in \arg\min_x \|x\|_1 + \frac{\delta}{2m}\|Ax - b^k\|^2, \qquad b^{k+1} \leftarrow b^k + (b - Ax^{k+1}) \quad \text{(add back the residual)}.$$
Keep your LASSO solver, use a small $\delta$, and just add back the residual.
Important: the derivation still holds with $\|\cdot\|_1$ replaced by any convex $r(\cdot)$.
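In code, "add back the residual" is the promised 2-line change wrapped around any LASSO solver. A minimal sketch (my illustration): a plain ISTA loop stands in for "your existing solver"; only the two marked lines are the Bregman outer loop.

```python
import numpy as np

def shrink(z, lam):
    """Soft-thresholding, the proximal operator of lam*||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_solver(A, b, delta, iters=2000):
    """Stand-in LASSO solver (ISTA) for min ||x||_1 + (delta/2m)||Ax-b||^2."""
    m, n = A.shape
    x = np.zeros(n)
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L for the quadratic term
    for _ in range(iters):
        x = shrink(x - step * (A.T @ (A @ x - b)), step * m / delta)
    return x

def bregman(A, b, delta, outer=5):
    """Bregman iteration: keep the LASSO solver, just add back the residual."""
    bk = b.copy()
    for _ in range(outer):
        x = lasso_solver(A, bk, delta)   # line 1: unchanged LASSO solve
        bk = bk + (b - A @ x)            # line 2: add back the residual
    return x
```

With $A = I_3$ and $b = (2, 0.5, 0)$ at $\delta = 1$, five outer iterations return $(2, 0, 0)$: the large entry is recovered at its exact magnitude (no shrinkage) while the small one is still screened out; more outer iterations would eventually pick it up as well, tracing a regularization path.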
31 Faster alternative: linearized Bregman ISS
Add the damping term $\frac{1}{\kappa}\dot x(t)$ to ISS:
$$\dot p(t) + \frac{1}{\kappa}\dot x(t) = \frac{1}{m}A^T(b - Ax(t)), \qquad p(t) \in \partial\|x(t)\|_1.$$
The solution is piecewise smooth, and every piece has a closed form. It converges to the ISS solution exponentially fast in $\kappa$.
By $z(t) = p(t) + \frac{1}{\kappa}x(t)$, it reduces to an ODE:
$$\dot z(t) = \frac{1}{m}A^T\big(b - \kappa A\,\mathrm{shrink}(z(t))\big).$$
Insight: given $z(t)$, uniquely recover
$$x(t) = \kappa\,\mathrm{shrink}(z(t)), \qquad p(t) = z(t) - \frac{1}{\kappa}x(t).$$
33 Discrete linearized Bregman iteration
ODE from the last slide: $\dot z = \frac{1}{m}A^T(b - \kappa A\,\mathrm{shrink}(z(t)))$.
Forward Euler:
$$z^{k+1} = z^k + \frac{\alpha_k}{m} A^T\big(b - A\underbrace{(\kappa\,\mathrm{shrink}(z^k))}_{x^k}\big).$$
Easy to parallelize for very large datasets. For example, $A = [A_1\; A_2\; \cdots\; A_L]$, where each $A_l$ is distributed.
Distributed implementation: for $l = 1, \dots, L$ in parallel:
$$z_l^{k+1} = z_l^k + \frac{\alpha_k}{m} A_l^T (b - w^k), \qquad w_l^{k+1} = \kappa A_l\,\mathrm{shrink}(z_l^{k+1});$$
all-reduce sum: $w^{k+1} = \sum_{l=1}^L w_l^{k+1}$.
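A serial sketch of the forward-Euler iteration (my illustration; the distributed version above partitions this same update across column blocks of $A$):

```python
import numpy as np

def shrink(z):
    """Soft-thresholding with unit threshold."""
    return np.sign(z) * np.maximum(np.abs(z) - 1.0, 0.0)

def linearized_bregman(A, b, kappa, alpha, iters=500):
    """z^{k+1} = z^k + (alpha/m) A^T (b - A x^k), x^k = kappa*shrink(z^k).
    The intermediate x^k stay sparse, tracing a regularization path."""
    m, n = A.shape
    z = np.zeros(n)
    for _ in range(iters):
        x = kappa * shrink(z)
        z = z + (alpha / m) * (A.T @ (b - A @ x))
    return kappa * shrink(z)
```

For a consistent system the iterates converge to the minimizer of $\|x\|_1 + \frac{1}{2\kappa}\|x\|^2$ subject to $Ax = b$; with $A = I$ that feasible set is a single point, so the iteration recovers $b$ exactly.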
36 Comparison to ISTA
ISTA iteration: $x^{k+1} = \mathrm{shrink}\big(x^k - \frac{\alpha_k}{m}A^T(Ax^k - b),\; \frac{1}{t}\big)$.
Discrete linearized Bregman (LBreg) iteration: $z^{k+1} = z^k - \frac{\alpha_k}{m}A^T\big(A(\kappa\,\mathrm{shrink}(z^k)) - b\big)$.
Comparison:
- ISTA solves LASSO as $k \to \infty$; the intermediate $x^k$ are dense.
- LBreg's intermediate $x^k$ are sparse (useful as a regularization path); as $k \to \infty$, it solves
$$\text{minimize}_x\; \|x\|_1 + \frac{1}{2\kappa}\|x\|^2 \quad \text{subject to} \quad Ax = b,$$
with the exact penalty property: a sufficiently large $\kappa$ gives an $\ell_1$ minimizer.
38 Comparison to orthogonal matching pursuit (OMP) [1]
OMP: start with the index set $S = \emptyset$ and the vector $x = 0$; iterate:
1. compute the residual correlation $A^T(b - Ax)$ and add the index of its largest entry to $S$;
2. set $x \leftarrow \arg\min \|b - Ax\|_2^2$ subject to $x_i = 0$ for all $i \notin S$.
Differences:
- OMP increases the index set $S$ (OMP variants evolve $S$ in other ways).
- ISS evolves $p \in \partial\|x\|_1$, encoding how likely a current zero becomes nonzero.
[1] Mallat-Zhang '93, Tropp-Gilbert '07
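The two OMP steps above fit in a short function. A minimal sketch (my illustration, not from the slides):

```python
import numpy as np

def omp(A, b, steps):
    """Orthogonal matching pursuit: greedily add the column most correlated
    with the residual, then re-fit by least squares on the support."""
    m, n = A.shape
    S, x = [], np.zeros(n)
    for _ in range(steps):
        corr = A.T @ (b - A @ x)             # residual correlations
        j = int(np.argmax(np.abs(corr)))     # index of the largest entry
        if j not in S:
            S.append(j)
        u, *_ = np.linalg.lstsq(A[:, S], b, rcond=None)
        x = np.zeros(n)
        x[S] = u
    return x
```

Note the contrast with ISS: OMP's state is the discrete set $S$, while ISS carries the continuous dual variable $p$, whose entries drift toward $\pm 1$ before a coordinate activates.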
41 Generalization
Consider any convex regression model, parameterized by $t$:
$$\text{minimize}_x\; r(x) + t f(x). \tag{1}$$
Fan and Li (2001): a convex $r$ causes bias; their solution is to make $r$ nonconvex.
Our solution: time differentiation,
$$\dot p(t) = -\nabla f(x(t)), \qquad p(t) \in \partial r(x(t)).$$
Applications:
- prior $r$: weighted $\ell_1$, $\ell_{1,2}$, nuclear norm; can incorporate nonnegativity or box constraints as indicator functions
- fitting $f$: square loss, logistic loss, etc.
You can keep your existing solver for (1) and try iteratively adding back the residual. In fact, there is even a simple way to make $r$ nonconvex.
42 Related work in optimization / image processing
Discrete:
- Bregman iteration for imaging and compressed sensing: Osher-Burger-Goldfarb-Xu-Yin '06, Yin-Osher-Goldfarb-Darbon '08
- Linearized Bregman on $\ell_1$: Yin-Osher-Goldfarb-Darbon '08, Yin '10, Lai-Yin '13
- Matrix completion SVT on $X$: Cai-Candès-Shen '10
- Extension and analysis: Zhang '13, Zhang '14
Continuous:
- Inverse scale space (ISS) on TV: Burger-Gilboa-Osher-Xu '06
- Adaptive ISS on $\ell_1$: Burger-Möller-Benning-Osher '11
- Greedy ISS on $\ell_1$: Möller-Zhang '13
43 Summary
Instead of minimizing $r(x) + t f(x)$, try solving
$$\dot p(t) = -\nabla f(x(t)), \qquad p(t) \in \partial r(x(t)).$$
The solution will have the structure you seek, with no or less bias, and it often has simple and fast approximation algorithms.
Reference: S. Osher, F. Ruan, J. Xiong, Y. Yao and W. Yin, "Sparse Recovery via Differential Inclusions," UCLA CAM Report, July 2014.
More informationMaster 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique
Master 2 MathBigData S. Gaïffas 1 3 novembre 2014 1 CMAP - Ecole Polytechnique 1 Supervised learning recap Introduction Loss functions, linearity 2 Penalization Introduction Ridge Sparsity Lasso 3 Some
More informationESL Chap3. Some extensions of lasso
ESL Chap3 Some extensions of lasso 1 Outline Consistency of lasso for model selection Adaptive lasso Elastic net Group lasso 2 Consistency of lasso for model selection A number of authors have studied
More informationCompressive Sensing (CS)
Compressive Sensing (CS) Luminita Vese & Ming Yan lvese@math.ucla.edu yanm@math.ucla.edu Department of Mathematics University of California, Los Angeles The UCLA Advanced Neuroimaging Summer Program (2014)
More informationComputing Sparse Representation in a Highly Coherent Dictionary Based on Difference of L 1 and L 2
Computing Sparse Representation in a Highly Coherent Dictionary Based on Difference of L and L 2 Yifei Lou, Penghang Yin, Qi He and Jack Xin Abstract We study analytical and numerical properties of the
More informationNear Ideal Behavior of a Modified Elastic Net Algorithm in Compressed Sensing
Near Ideal Behavior of a Modified Elastic Net Algorithm in Compressed Sensing M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas M.Vidyasagar@utdallas.edu www.utdallas.edu/ m.vidyasagar
More informationAn iterative hard thresholding estimator for low rank matrix recovery
An iterative hard thresholding estimator for low rank matrix recovery Alexandra Carpentier - based on a joint work with Arlene K.Y. Kim Statistical Laboratory, Department of Pure Mathematics and Mathematical
More informationregression Lie Wang Abstract In this paper, the high-dimensional sparse linear regression model is considered,
L penalized LAD estimator for high dimensional linear regression Lie Wang Abstract In this paper, the high-dimensional sparse linear regression model is considered, where the overall number of variables
More informationA Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models
A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models Jingyi Jessica Li Department of Statistics University of California, Los
More informationLeast Absolute Gradient Selector: variable selection via Pseudo-Hard Thresholding
arxiv:204.2353v4 [stat.ml] 9 Oct 202 Least Absolute Gradient Selector: variable selection via Pseudo-Hard Thresholding Kun Yang September 2, 208 Abstract In this paper, we propose a new approach, called
More information1 Regression with High Dimensional Data
6.883 Learning with Combinatorial Structure ote for Lecture 11 Instructor: Prof. Stefanie Jegelka Scribe: Xuhong Zhang 1 Regression with High Dimensional Data Consider the following regression problem:
More informationNew Coherence and RIP Analysis for Weak. Orthogonal Matching Pursuit
New Coherence and RIP Analysis for Wea 1 Orthogonal Matching Pursuit Mingrui Yang, Member, IEEE, and Fran de Hoog arxiv:1405.3354v1 [cs.it] 14 May 2014 Abstract In this paper we define a new coherence
More informationECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis
ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis Lecture 3: Sparse signal recovery: A RIPless analysis of l 1 minimization Yuejie Chi The Ohio State University Page 1 Outline
More informationSCIENCE CHINA Information Sciences. Received December 22, 2008; accepted February 26, 2009; published online May 8, 2010
. RESEARCH PAPERS. SCIENCE CHINA Information Sciences June 2010 Vol. 53 No. 6: 1159 1169 doi: 10.1007/s11432-010-0090-0 L 1/2 regularization XU ZongBen 1, ZHANG Hai 1,2, WANG Yao 1, CHANG XiangYu 1 & LIANG
More informationCOMPARATIVE ANALYSIS OF ORTHOGONAL MATCHING PURSUIT AND LEAST ANGLE REGRESSION
COMPARATIVE ANALYSIS OF ORTHOGONAL MATCHING PURSUIT AND LEAST ANGLE REGRESSION By Mazin Abdulrasool Hameed A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for
More informationGREEDY SIGNAL RECOVERY REVIEW
GREEDY SIGNAL RECOVERY REVIEW DEANNA NEEDELL, JOEL A. TROPP, ROMAN VERSHYNIN Abstract. The two major approaches to sparse recovery are L 1-minimization and greedy methods. Recently, Needell and Vershynin
More informationSubgradient Method. Ryan Tibshirani Convex Optimization
Subgradient Method Ryan Tibshirani Convex Optimization 10-725 Consider the problem Last last time: gradient descent min x f(x) for f convex and differentiable, dom(f) = R n. Gradient descent: choose initial
More informationRegression Shrinkage and Selection via the Lasso
Regression Shrinkage and Selection via the Lasso ROBERT TIBSHIRANI, 1996 Presenter: Guiyun Feng April 27 () 1 / 20 Motivation Estimation in Linear Models: y = β T x + ɛ. data (x i, y i ), i = 1, 2,...,
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Lecture 8: Optimization Cho-Jui Hsieh UC Davis May 9, 2017 Optimization Numerical Optimization Numerical Optimization: min X f (X ) Can be applied
More informationContraction Methods for Convex Optimization and monotone variational inequalities No.12
XII - 1 Contraction Methods for Convex Optimization and monotone variational inequalities No.12 Linearized alternating direction methods of multipliers for separable convex programming Bingsheng He Department
More informationGradient Descent. Ryan Tibshirani Convex Optimization /36-725
Gradient Descent Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: canonical convex programs Linear program (LP): takes the form min x subject to c T x Gx h Ax = b Quadratic program (QP): like
More informationA New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables
A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of
More informationSparse analysis Lecture III: Dictionary geometry and greedy algorithms
Sparse analysis Lecture III: Dictionary geometry and greedy algorithms Anna C. Gilbert Department of Mathematics University of Michigan Intuition from ONB Key step in algorithm: r, ϕ j = x c i ϕ i, ϕ j
More informationSparse Optimization Lecture: Sparse Recovery Guarantees
Those who complete this lecture will know Sparse Optimization Lecture: Sparse Recovery Guarantees Sparse Optimization Lecture: Sparse Recovery Guarantees Instructor: Wotao Yin Department of Mathematics,
More informationSimultaneous Sparsity
Simultaneous Sparsity Joel A. Tropp Anna C. Gilbert Martin J. Strauss {jtropp annacg martinjs}@umich.edu Department of Mathematics The University of Michigan 1 Simple Sparse Approximation Work in the d-dimensional,
More informationA Modern Look at Classical Multivariate Techniques
A Modern Look at Classical Multivariate Techniques Yoonkyung Lee Department of Statistics The Ohio State University March 16-20, 2015 The 13th School of Probability and Statistics CIMAT, Guanajuato, Mexico
More informationSummary and discussion of: Controlling the False Discovery Rate via Knockoffs
Summary and discussion of: Controlling the False Discovery Rate via Knockoffs Statistics Journal Club, 36-825 Sangwon Justin Hyun and William Willie Neiswanger 1 Paper Summary 1.1 Quick intuitive summary
More informationLinear Regression with Strongly Correlated Designs Using Ordered Weigthed l 1
Linear Regression with Strongly Correlated Designs Using Ordered Weigthed l 1 ( OWL ) Regularization Mário A. T. Figueiredo Instituto de Telecomunicações and Instituto Superior Técnico, Universidade de
More information