Bias-free Sparse Regression with Guaranteed Consistency


1 Bias-free Sparse Regression with Guaranteed Consistency
Wotao Yin (UCLA Math)
Joint with: Stanley Osher, Ming Yan (UCLA); Feng Ruan, Jiechao Xiong, Yuan Yao (Peking U)
UC Riverside, STATS Department, March 10, 2015

2 Background
Goal: recover a sparse $x \in \mathbb{R}^n$ from the noisy linear observation $b := Ax + \epsilon$, where $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$ are given and $\epsilon$ is zero-mean, unknown noise.
Our focus: the under-determined case, $m < n$.
LASSO is a common approach, but its solution is biased.
Fan and Li (2001): to avoid bias, minimization must use a nonconvex prior.
Our approach keeps the convex prior but replaces minimization.

3 This talk
Review LASSO and explain its solution bias.
A new regularization path: the solution to an ordinary differential inclusion
- uses a convex prior, is free of bias, and has the oracle property
- has sign / $\ell_2$ consistency
How to compute the exact path, as well as its fast approximation.
How to try it by making a 2-line change to your existing code.

4 LASSO and its bias
Minimization form: $x^{\mathrm{lasso}} \in \arg\min_x \|x\|_1 + \frac{t}{2m}\|Ax - b\|_2^2$
Variational form (optimality condition): $0 = p + \frac{t}{m}A^T(Ax^{\mathrm{lasso}} - b)$ and $p \in \partial\|x^{\mathrm{lasso}}\|_1$

5 LASSO and its bias
Minimization form: $x^{\mathrm{lasso}} \in \arg\min_x \|x\|_1 + \frac{t}{2m}\|Ax - b\|_2^2$
Variational form (optimality condition): $0 = p + \frac{t}{m}A^T(Ax^{\mathrm{lasso}} - b)$ and $p \in \partial\|x^{\mathrm{lasso}}\|_1$
Suppose $S := \mathrm{supp}(x^*)$, that is, $x^* = [x^*_S; 0]$, and LASSO recovers the exact support, $S = \mathrm{supp}(x^{\mathrm{lasso}})$; then
$x^{\mathrm{lasso}}_S = \underbrace{x^*_S + \tfrac{1}{m}(A_S^T A_S)^{-1}A_S^T\epsilon}_{\text{oracle estimate, } \mathbb{E}(\cdots)\, =\, x^*_S} - \underbrace{\tfrac{1}{t}(A_S^T A_S)^{-1}\mathrm{sign}(x^{\mathrm{lasso}}_S)}_{\text{bias}}.$

6 Toy example 1
Consider $b > 0$ and the all-scalar problem: $b = x^* + \epsilon$.
Oracle estimate: $x^{\mathrm{oracle}} = b$

7 Toy example 1
Consider $b > 0$ and the all-scalar problem: $b = x^* + \epsilon$.
Oracle estimate: $x^{\mathrm{oracle}} = b$
LASSO: $x^{\mathrm{lasso}} \in \arg\min_x |x| + \frac{t}{2}(x - b)^2$
LASSO solution:
$x^{\mathrm{lasso}} = \begin{cases} 0, & 0 \le t \le 1/b, \\ b - 1/t, & 1/b < t < \infty. \end{cases}$
LASSO reduces the signal magnitude.
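As a sanity check on the closed form above, here is a minimal numerical sketch; the value $b = 3$ and the grid of $t$ are illustrative choices, not from the slides:

```python
import numpy as np

def lasso_scalar(b, t):
    # closed-form minimizer of |x| + (t/2)(x - b)^2 for b > 0:
    # soft-thresholding of b at level 1/t
    return max(b - 1.0 / t, 0.0)

b = 3.0                             # the oracle estimate is b itself
for t in [0.2, 1.0, 5.0]:
    print(t, lasso_scalar(b, t))    # 0.0, 2.0, 2.8 -- always below the oracle b
```

The recovered magnitude approaches the oracle value $b$ only as $t \to \infty$; this gap is the bias the talk targets.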

8 Toy example 2
Suppose sorted $a_1 > a_2 \ge \cdots \ge a_n > 0$ and the given measurement
$b = a^T x^* + \epsilon \in \mathbb{R}$,
where $x^* \in \mathbb{R}^n$.

9 Toy example 2
Suppose sorted $a_1 > a_2 \ge \cdots \ge a_n > 0$ and the given measurement
$b = a^T x^* + \epsilon \in \mathbb{R}$,
where $x^* \in \mathbb{R}^n$.
LASSO solution:
$x^{\mathrm{lasso}}_2 = \cdots = x^{\mathrm{lasso}}_n = 0$ for all $t \ge 0$, and
$x^{\mathrm{lasso}}_1 = \begin{cases} 0, & 0 \le t \le \frac{1}{a_1 b}, \\ \frac{b}{a_1} - \frac{1}{t a_1^2}, & \frac{1}{a_1 b} < t < \infty. \end{cases}$
LASSO selects $a_1$ but reduces the signal magnitude.

10 A more realistic example
Setup: $n = 256$, $m = 25$, Gaussian noise $\epsilon$.
[Figure: true signal vs. LASSO/BPDN recovery, with $t$ hand-tuned.]
LASSO solution:
- selects large signals but reduces their magnitudes
- misses several moderate-sized signals (false negatives)
- includes small false signals (false positives)

11 LASSO post-debiasing
Goal: to restore the reduced magnitudes. Let $S := \mathrm{supp}(x^{\mathrm{lasso}})$.
Common approach: solve
$\min_x \|Ax - b\|_2$ subject to $\mathrm{supp}(x) = S$
(the solution and $x^{\mathrm{lasso}}$ may have different signs)
Another approach: remove $\frac{1}{t}(A_S^T A_S)^{-1}\mathrm{sign}(x^{\mathrm{lasso}}_S)$ from $x^{\mathrm{lasso}}_S$

12 LASSO post-debiasing
Goal: to restore the reduced magnitudes. Let $S := \mathrm{supp}(x^{\mathrm{lasso}})$.
Common approach: solve
$\min_x \|Ax - b\|_2$ subject to $\mathrm{supp}(x) = S$
(the solution and $x^{\mathrm{lasso}}$ may have different signs)
Another approach: remove $\frac{1}{t}(A_S^T A_S)^{-1}\mathrm{sign}(x^{\mathrm{lasso}}_S)$ from $x^{\mathrm{lasso}}_S$
Issues:
- extra computation of a matrix inversion
- cannot correct false positives or false negatives in $x^{\mathrm{lasso}}$
- cannot work with continuous support (e.g., low-rank matrix recovery)

13 Proposed: inverse scale space (ISS) dynamic
The name comes from image processing.
Idea: instead of minimizing prior + fitting, evolve the prior and the fitting along their (sub)gradients.
Get the solution path $\{x(t), p(t)\}_{t \ge 0}$ by evolving from the initial point $x(0) = p(0) = 0$:
$\dot p(t) = \underbrace{\tfrac{1}{m}A^T(b - Ax(t))}_{\text{fitting}}, \qquad p(t) \in \partial\|x(t)\|_1.$
The ISS path is well defined under assumptions: $p(t)$ is right-continuously differentiable and $x(t)$ is right-continuous.

14 Compare LASSO and ISS
Apply LASSO and ISS to the same example shown before.
[Figures: true signal vs. LASSO/BPDN recovery (shown previously); true signal vs. ISS (Bregman) recovery.]
Compared to LASSO:
- ISS does not reduce signal magnitudes
- ISS has fewer false positives
- ISS has fewer false negatives; it recovers the moderate-sized signals

15 Under the hood: removing LASSO bias at its origin
Recall that in LASSO we have $p \in \partial\|x^{\mathrm{lasso}}\|_1$ and $p = \frac{t}{m}A^T(b - Ax^{\mathrm{lasso}})$.
Differentiating the equation w.r.t. $t$ gives
$\dot p = \frac{1}{m}A^T\big(b - A(t\dot x^{\mathrm{lasso}} + x^{\mathrm{lasso}})\big)$
In fact, $t\dot x^{\mathrm{lasso}} + x^{\mathrm{lasso}}$ is LASSO's post-debiasing solution!

16 Under the hood: removing LASSO bias at its origin
Recall that in LASSO we have $p \in \partial\|x^{\mathrm{lasso}}\|_1$ and $p = \frac{t}{m}A^T(b - Ax^{\mathrm{lasso}})$.
Differentiating the equation w.r.t. $t$ gives
$\dot p = \frac{1}{m}A^T\big(b - A(t\dot x^{\mathrm{lasso}} + x^{\mathrm{lasso}})\big)$
In fact, $t\dot x^{\mathrm{lasso}} + x^{\mathrm{lasso}}$ is LASSO's post-debiasing solution!
Replacing $t\dot x^{\mathrm{lasso}} + x^{\mathrm{lasso}}$ by $x$ removes the bias, yielding the ISS dynamic
$\dot p = \frac{1}{m}A^T(b - Ax)$
ISS works better than (LASSO + post-debiasing).

17 Numerical result: prostate tumor size
The first example from Hastie-Tibshirani-Friedman.
Problem: given 8 clinical features, select predictors for prostate tumor size.
Data: 67 training cases + 30 testing cases; parameters picked by cross validation.
[Table: estimated coefficients of LS, Subset Selection, LASSO, and ISS for the predictors Intercept, lcavol, lweight, age, lbph, svi, lcp, gleason, pgg45, together with the number of selected predictors and the test error; the numerical entries were lost in transcription.]
ISS uses the fewest predictors and achieves the best test error!

18 Theory: consistency guarantees for ISS
Question: is there a $t$ so that $x(t)$ has the following properties?
- sign consistency: $\mathrm{sign}(x^*) = \mathrm{sign}(x(t))$
- no false positive: if the true $x^*_i = 0$, then $x_i(t) = 0$
- no false negative: if the true $x^*_i \ne 0$, then $x_i(t) \ne 0$

19 Theory: consistency guarantees for ISS
Question: is there a $t$ so that $x(t)$ has the following properties?
- sign consistency: $\mathrm{sign}(x^*) = \mathrm{sign}(x(t))$
- no false positive: if the true $x^*_i = 0$, then $x_i(t) = 0$
- no false negative: if the true $x^*_i \ne 0$, then $x_i(t) \ne 0$
Theorem. Make the assumptions:
- Gaussian noise: $\omega \sim N(0, \sigma^2 I)$,
- normalized columns: $\frac{1}{\sqrt{n}}\max_j \|A_j\|_2 \le 1$,
- the irrepresentable and strong-signal conditions.
Then, with high probability, the ISS point $x(\bar t)$ has sign consistency and gives an unbiased estimate of $x^*$. (There is an explicit formula for $\bar t$.)
The proof is based on the next two lemmas.

20 No false positive
Define the true support $S := \mathrm{supp}(x^*)$ and let $T := S^c$.
Lemma. Under the assumptions, if $A_S$ has full column rank and
$\max_{j \in T}\|A_j^T A_S (A_S^T A_S)^{-1}\|_1 \le 1 - \eta$ for some $\eta \in (0, 1)$,
then with high probability
$\mathrm{supp}(x(s)) \subseteq S$ for all $s \le \bar t := O\!\left(\frac{\eta}{\sigma}\sqrt{\frac{m}{\log n}}\right)$.
The proof uses: (i) a concentration inequality, and (ii) the fact that if $\mathrm{supp}(x(s)) \subseteq S$ for all $s \le t$, then
$p(s)_T = A_T^T A_S (A_S^T A_S)^{-1} p(s)_S + \frac{s}{m} A_T^T P_{A_S}^{\perp}\,\omega$ for all $s \le t$.

21 No false negative / sign consistency
Lemma. Under the assumptions, if $A_S^T A_S \succeq \gamma I$ and
$x^*_{\min} \ge \max\left\{ O\!\left(\frac{\sigma\sqrt{\log|S|}}{\gamma\sqrt{m}}\right),\ O\!\left(\frac{\sigma\sqrt{\log|S|\,\log n}}{\eta\gamma\sqrt{m}}\right)\right\}$,
then there exists a $\bar t$ (which can be given explicitly) so that, with high probability,
$x_S(\bar t) = x^*_S + (A_S^T A_S)^{-1}A_S^T\omega$ obeys $\mathrm{sign}(x(\bar t)) = \mathrm{sign}(x^*)$ and $\|x(\bar t) - x^*\|_\infty \le x^*_{\min}/2$.
- The first term in the max ensures $\|(A_S^T A_S)^{-1}A_S^T\omega\|_\infty \le x^*_{\min}/2$.
- The second term ensures $\inf\{t : \mathrm{sign}(x_S(t)) = \mathrm{sign}(x^*_S)\} \le \bar t$.

22 Compute the ISS path
Theorem. The solution path of
$\dot p(t) = \frac{1}{m}A^T(b - Ax(t))$ and $p(t) \in \partial\|x(t)\|_1$,
with initial $t_0 = 0$, $p(0) = 0$, $x(0) = 0$, is given piecewise by the following iteration: for $k = 1, 2, \dots, K$, compute

23 Compute the ISS path
Theorem. The solution path of
$\dot p(t) = \frac{1}{m}A^T(b - Ax(t))$ and $p(t) \in \partial\|x(t)\|_1$,
with initial $t_0 = 0$, $p(0) = 0$, $x(0) = 0$, is given piecewise by the following iteration: for $k = 1, 2, \dots, K$, compute
$p(t)$ is piecewise linear, given by
$p(t) = p(t_{k-1}) + \frac{t - t_{k-1}}{m}A^T(b - Ax(t_{k-1})), \quad t \in [t_{k-1}, t_k]$,
where $t_k := \sup\{t > t_{k-1} : p(t) \in \partial\|x(t_{k-1})\|_1\}$.

24 Compute the ISS path
Theorem. The solution path of
$\dot p(t) = \frac{1}{m}A^T(b - Ax(t))$ and $p(t) \in \partial\|x(t)\|_1$,
with initial $t_0 = 0$, $p(0) = 0$, $x(0) = 0$, is given piecewise by the following iteration: for $k = 1, 2, \dots, K$, compute
$p(t)$ is piecewise linear, given by
$p(t) = p(t_{k-1}) + \frac{t - t_{k-1}}{m}A^T(b - Ax(t_{k-1})), \quad t \in [t_{k-1}, t_k]$,
where $t_k := \sup\{t > t_{k-1} : p(t) \in \partial\|x(t_{k-1})\|_1\}$.
$x(t)$, for $t \in [t_{k-1}, t_k)$, is constant and equal to $x(t_{k-1})$; if $t_k < \infty$, the next
$x(t_k) = \arg\min_u \|Au - b\|_2^2$ subject to $\begin{cases} u_i \ge 0, & p_i(t_k) = 1, \\ u_i = 0, & p_i(t_k) \in (-1, 1), \\ u_i \le 0, & p_i(t_k) = -1. \end{cases}$
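A rough Python sketch of this piecewise procedure, assuming SciPy's nnls is acceptable for the sign-constrained least-squares subproblem; the function name iss_path, the stopping parameter t_max, and the tolerances are illustrative, and degenerate ties at the +/-1 boundary are not handled:

```python
import numpy as np
from scipy.optimize import nnls

def iss_path(A, b, t_max):
    # Sketch of the exact ISS path: p(t) is piecewise linear, x(t) is piecewise constant.
    # Returns the break points t_k and the solutions x(t_k).
    m, n = A.shape
    t, p, x = 0.0, np.zeros(n), np.zeros(n)
    ts, xs = [0.0], [x.copy()]
    while t < t_max:
        d = A.T @ (b - A @ x) / m                   # slope of p on the current piece
        steps = []                                  # time until some |p_i| with x_i = 0 reaches 1
        for i in range(n):
            if x[i] == 0 and abs(d[i]) > 1e-12:
                target = 1.0 if d[i] > 0 else -1.0
                steps.append((target - p[i]) / d[i])
        steps = [s for s in steps if s > 1e-12]
        if not steps:
            break                                   # residual is (numerically) zero off the support
        dt = min(steps)
        t, p = t + dt, p + dt * d
        active = np.where(np.abs(p) > 1.0 - 1e-10)[0]
        signs = np.sign(p[active])
        # sign-constrained least squares: flip columns so the constraint becomes v >= 0
        v, _ = nnls(A[:, active] * signs, b)
        x = np.zeros(n)
        x[active] = signs * v
        ts.append(t)
        xs.append(x.copy())
    return ts, xs
```

Flipping the columns whose $p_i = -1$ turns the sign constraints into plain nonnegativity, which is what nnls solves; a QR-updating implementation, as mentioned on the next slide, would be faster.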

25 ISS computation
ISS is fast on moderately sized problems:
- evolve from $t = 0$ through (finitely many) break points
- each break point: a constrained least-squares subproblem (since it is similar to the one at the previous break point, it can be solved by maintaining a QR decomposition)
How to evolve ISS for huge problems with many break points?
Answer: fast discrete approximations
- Bregman iteration: LASSO subproblem + add-back-the-residual
- Linearized Bregman iteration: closed-form iteration, parallelizable

26 Discrete ISS = Bregman iteration
Apply forward Euler to $\dot p = \frac{1}{m}A^T(b - Ax)$ while keeping $p \in \partial\|x\|_1$:
$p^{k+1} = p^k + \frac{\delta}{m}A^T(b - Ax^{k+1})$,

27 Discrete ISS = Bregman iteration
Apply forward Euler to $\dot p = \frac{1}{m}A^T(b - Ax)$ while keeping $p \in \partial\|x\|_1$:
$p^{k+1} = p^k + \frac{\delta}{m}A^T(b - Ax^{k+1})$,
which is the first-order optimality condition of
$x^{k+1} \in \arg\min_x\ \underbrace{\|x\|_1 - \|x^k\|_1 - \langle p^k, x - x^k\rangle}_{\text{Bregman distance of }\ell_1} + \frac{\delta}{2m}\|Ax - b\|^2$,

28 Discrete ISS = Bregman iteration
Apply forward Euler to $\dot p = \frac{1}{m}A^T(b - Ax)$ while keeping $p \in \partial\|x\|_1$:
$p^{k+1} = p^k + \frac{\delta}{m}A^T(b - Ax^{k+1})$,
which is the first-order optimality condition of
$x^{k+1} \in \arg\min_x\ \underbrace{\|x\|_1 - \|x^k\|_1 - \langle p^k, x - x^k\rangle}_{\text{Bregman distance of }\ell_1} + \frac{\delta}{2m}\|Ax - b\|^2$,
By a change of variable, we obtain the equivalent iteration:
$x^{k+1} \in \arg\min_x\ \|x\|_1 + \frac{\delta}{2m}\|Ax - b^k\|^2, \qquad b^{k+1} \leftarrow b^k + (b - Ax^{k+1})$   (add back the residual)
Keep your LASSO solver, use a small $\delta$, and just add back the residual.
Important: the derivation still holds with $\|\cdot\|_1$ replaced by any convex $r(\cdot)$.
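A minimal sketch of the add-back-the-residual loop, with a plain ISTA routine standing in for "your existing LASSO solver"; the function names, iteration counts, and step rule are illustrative assumptions:

```python
import numpy as np

def ista_lasso(A, b, delta, iters=500, x0=None):
    # Plain ISTA stand-in for an existing LASSO solver:
    # approximately solves  min_x ||x||_1 + (delta/2m) ||Ax - b||^2.
    m, n = A.shape
    x = np.zeros(n) if x0 is None else x0.copy()
    L = (delta / m) * np.linalg.norm(A, 2) ** 2     # Lipschitz constant of the smooth part
    for _ in range(iters):
        g = (delta / m) * (A.T @ (A @ x - b))
        v = x - g / L
        x = np.sign(v) * np.maximum(np.abs(v) - 1.0 / L, 0.0)
    return x

def bregman(A, b, delta, outer=20):
    # Bregman iteration written as "add back the residual".
    bk, x = b.copy(), np.zeros(A.shape[1])
    for _ in range(outer):
        x = ista_lasso(A, bk, delta, x0=x)          # LASSO subproblem with data b^k
        bk = bk + (b - A @ x)                       # add back the residual
    return x
```

Only the two lines inside the outer loop differ from calling the LASSO solver once, which is the "2-line change" mentioned at the start of the talk.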

29 Faster alternative: linearized Bregman ISS
Add the (blue) term $\frac{1}{\kappa}\dot x(t)$ to ISS:
$\dot p(t) + \frac{1}{\kappa}\dot x(t) = \frac{1}{m}A^T(b - Ax(t)), \qquad p(t) \in \partial\|x(t)\|_1.$
The solution is piecewise smooth; every piece has a closed form.
It converges to the ISS solution exponentially fast in $\kappa$.

30 Faster alternative: linearized Bregman ISS
Add the (blue) term $\frac{1}{\kappa}\dot x(t)$ to ISS:
$\dot p(t) + \frac{1}{\kappa}\dot x(t) = \frac{1}{m}A^T(b - Ax(t)), \qquad p(t) \in \partial\|x(t)\|_1.$
The solution is piecewise smooth; every piece has a closed form.
It converges to the ISS solution exponentially fast in $\kappa$.
By $z(t) = p(t) + \frac{1}{\kappa}x(t)$, it reduces to an ODE:
$\dot z(t) = \frac{1}{m}A^T\big(b - \kappa A\,\mathrm{shrink}(z(t))\big).$

31 Faster alternative: linearized Bregman ISS
Add the (blue) term $\frac{1}{\kappa}\dot x(t)$ to ISS:
$\dot p(t) + \frac{1}{\kappa}\dot x(t) = \frac{1}{m}A^T(b - Ax(t)), \qquad p(t) \in \partial\|x(t)\|_1.$
The solution is piecewise smooth; every piece has a closed form.
It converges to the ISS solution exponentially fast in $\kappa$.
By $z(t) = p(t) + \frac{1}{\kappa}x(t)$, it reduces to an ODE:
$\dot z(t) = \frac{1}{m}A^T\big(b - \kappa A\,\mathrm{shrink}(z(t))\big).$
Insight: given $z(t)$, we can uniquely recover
$x(t) = \kappa\,\mathrm{shrink}(z(t)), \qquad p(t) = z(t) - \frac{1}{\kappa}x(t).$

32 Discrete linearized Bregman iteration
ODE from the last slide: $\dot z = \frac{1}{m}A^T\big(b - \kappa A\,\mathrm{shrink}(z(t))\big)$.
Forward Euler:
$z^{k+1} = z^k + \frac{\alpha_k}{m}A^T\big(b - A\underbrace{(\kappa\,\mathrm{shrink}(z^k))}_{x^k}\big)$

33 Discrete linearized Bregman iteration
ODE from the last slide: $\dot z = \frac{1}{m}A^T\big(b - \kappa A\,\mathrm{shrink}(z(t))\big)$.
Forward Euler:
$z^{k+1} = z^k + \frac{\alpha_k}{m}A^T\big(b - A\underbrace{(\kappa\,\mathrm{shrink}(z^k))}_{x^k}\big)$
Easy to parallelize for very large datasets. For example: $A = [A_1\ A_2\ \cdots\ A_L]$, where the blocks $A_l$ are distributed.
Distributed implementation: for $l = 1, \dots, L$ in parallel:
$z_l^{k+1} = z_l^k + \frac{\alpha_k}{m}A_l^T(b - w^k), \qquad w_l^{k+1} = \kappa A_l\,\mathrm{shrink}(z_l^{k+1})$;
all-reduce sum: $w^{k+1} = \sum_{l=1}^{L} w_l^{k+1}$.
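A short Python sketch of the serial discrete linearized Bregman iteration; the function names and the fixed step size are illustrative assumptions. In the distributed variant above, each node would hold one column block $(z_l, A_l)$ and only the partial products $w_l$ are all-reduced:

```python
import numpy as np

def shrink(z):
    # soft-thresholding at level 1: sign(z_i) * max(|z_i| - 1, 0)
    return np.sign(z) * np.maximum(np.abs(z) - 1.0, 0.0)

def linearized_bregman(A, b, kappa, alpha, iters=2000):
    # Forward Euler on  zdot = (1/m) A^T (b - kappa * A * shrink(z)).
    m, n = A.shape
    z = np.zeros(n)
    for _ in range(iters):
        x = kappa * shrink(z)                       # sparse iterate recovered from z
        z = z + (alpha / m) * (A.T @ (b - A @ x))
    return kappa * shrink(z)
```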

34 Comparison to ISTA
Compare the ISTA iteration:
$x^{k+1} = \mathrm{shrink}\!\big(x^k - \frac{\alpha_k}{m}A^T(Ax^k - b),\ \frac{1}{t}\big)$
with the discrete linearized Bregman (LBreg) iteration:
$z^{k+1} = z^k - \frac{\alpha_k}{m}A^T\big(A(\kappa\,\mathrm{shrink}(z^k)) - b\big)$

35 Comparison to ISTA
Compare the ISTA iteration:
$x^{k+1} = \mathrm{shrink}\!\big(x^k - \frac{\alpha_k}{m}A^T(Ax^k - b),\ \frac{1}{t}\big)$
with the discrete linearized Bregman (LBreg) iteration:
$z^{k+1} = z^k - \frac{\alpha_k}{m}A^T\big(A(\kappa\,\mathrm{shrink}(z^k)) - b\big)$
Comparison:
- ISTA: solves LASSO as $k \to \infty$; the intermediate $x^k$ is dense
- LBreg: the intermediate $x^k$ is sparse (useful as a regularization path)

36 Comparison to ISTA
Compare the ISTA iteration:
$x^{k+1} = \mathrm{shrink}\!\big(x^k - \frac{\alpha_k}{m}A^T(Ax^k - b),\ \frac{1}{t}\big)$
with the discrete linearized Bregman (LBreg) iteration:
$z^{k+1} = z^k - \frac{\alpha_k}{m}A^T\big(A(\kappa\,\mathrm{shrink}(z^k)) - b\big)$
Comparison:
- ISTA: solves LASSO as $k \to \infty$; the intermediate $x^k$ is dense
- LBreg: the intermediate $x^k$ is sparse (useful as a regularization path)
- LBreg: as $k \to \infty$, solves
$\min_x\ \|x\|_1 + \frac{1}{2\kappa}\|x\|_2^2$ subject to $Ax = b$,
with the exact penalty property: a sufficiently large $\kappa$ gives an $\ell_1$ minimizer
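To make the contrast concrete, here is a sketch of the two updates side by side as single-step Python functions; the step sizes and thresholds follow the formulas above, and the function names are illustrative:

```python
import numpy as np

def ista_step(x, A, b, alpha, m, t):
    # One ISTA step: gradient step on the quadratic, then shrink at level 1/t.
    # The iterates x^k are generally dense until convergence.
    v = x - (alpha / m) * (A.T @ (A @ x - b))
    return np.sign(v) * np.maximum(np.abs(v) - 1.0 / t, 0.0)

def lbreg_step(z, A, b, alpha, m, kappa):
    # One discrete linearized Bregman step: x^k = kappa * shrink(z^k) is already sparse.
    x = kappa * np.sign(z) * np.maximum(np.abs(z) - 1.0, 0.0)
    return z - (alpha / m) * (A.T @ (A @ x - b))
```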

37 Comparison to orthogonal matching pursuit (OMP) [1]
OMP: start with the index set $S = \emptyset$ and the vector $x = 0$; iterate:
1. compute the residual correlations $A^T(b - Ax)$ and add the index of the largest entry (in magnitude) to $S$;
2. set $x \leftarrow \arg\min \|b - Ax\|_2^2$ subject to $x_i = 0$ for all $i \notin S$.
[1] Mallat-Zhang 93, Tropp-Gilbert 07

38 Comparison to orthogonal matching pursuit (OMP) [1]
OMP: start with the index set $S = \emptyset$ and the vector $x = 0$; iterate:
1. compute the residual correlations $A^T(b - Ax)$ and add the index of the largest entry (in magnitude) to $S$;
2. set $x \leftarrow \arg\min \|b - Ax\|_2^2$ subject to $x_i = 0$ for all $i \notin S$.
Differences:
- OMP: increases the index set $S$ (OMP variants evolve $S$ in other ways)
- ISS: evolves $p \in \partial\|x\|_1$, which encodes how likely a currently zero entry is to become nonzero
[1] Mallat-Zhang 93, Tropp-Gilbert 07
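For reference, a compact Python sketch of plain OMP as described above; the function name and the fixed number of iterations k are illustrative:

```python
import numpy as np

def omp(A, b, k):
    # Orthogonal matching pursuit: greedily grow the support S for k steps.
    m, n = A.shape
    S, x = [], np.zeros(n)
    for _ in range(k):
        corr = A.T @ (b - A @ x)                    # residual correlations
        S.append(int(np.argmax(np.abs(corr))))      # index of the largest entry in magnitude
        xs, *_ = np.linalg.lstsq(A[:, S], b, rcond=None)
        x = np.zeros(n)
        x[S] = xs                                   # least squares on the current support
    return x
```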

39 Generalization
Consider any convex regression model, parameterized by $t$:
$\min_x\ r(x) + t\, f(x)$   (1)
Fan and Li (2001): a convex $r$ causes bias; their solution is to make $r$ nonconvex.

40 Generalization
Consider any convex regression model, parameterized by $t$:
$\min_x\ r(x) + t\, f(x)$   (1)
Fan and Li (2001): a convex $r$ causes bias; their solution is to make $r$ nonconvex.
Our solution: time differentiation
$\dot p(t) = -\nabla f(x(t)), \qquad p(t) \in \partial r(x(t)).$

41 Generalization
Consider any convex regression model, parameterized by $t$:
$\min_x\ r(x) + t\, f(x)$   (1)
Fan and Li (2001): a convex $r$ causes bias; their solution is to make $r$ nonconvex.
Our solution: time differentiation
$\dot p(t) = -\nabla f(x(t)), \qquad p(t) \in \partial r(x(t)).$
Applications:
- prior $r$: weighted $\ell_1$, $\ell_{1,2}$, nuclear norm; can incorporate nonnegativity or box constraints as indicator functions
- fitting $f$: square loss, logistic loss, etc.
You can keep your existing solver for (1); try iteratively adding back the residual. In fact, there is even a simple way to make $r$ nonconvex.

42 Related work in optimization / image processing
Discrete:
- Bregman iteration for imaging and compressed sensing: Osher-Burger-Goldfarb-Xu-Y 06, Y-Osher-Goldfarb-Darbon 08
- Linearized Bregman on $\ell_1$: Y-Osher-Goldfarb-Darbon 08, Y 10, Lai-Y 13
- Matrix completion, SVT on $\|X\|_*$: Cai-Candès-Shen 10
- Extension and analysis: Zhang 13, Zhang 14
Continuous:
- Inverse scale space (ISS) on TV: Burger-Gilboa-Osher-Xu 06
- Adaptive ISS on $\ell_1$: Burger-Möller-Benning-Osher 11
- Greedy ISS on $\ell_1$: Möller-Zhang 13

43 Summary
Instead of minimizing $r(x) + t\, f(x)$, try solving
$\dot p(t) = -\nabla f(x(t)), \qquad p(t) \in \partial r(x(t)).$
The solution will have the structure you seek, with no or less bias, and it often has simple and fast approximation algorithms.
Reference: UCLA CAM report, S. Osher, F. Ruan, J. Xiong, Y. Yao and W. Yin, "Sparse Recovery via Differential Inclusions," July 2014.
