A REMARK ON THE LASSO AND THE DANTZIG SELECTOR


YOHANN DE CASTRO

Abstract. This article investigates a new parameter for high-dimensional regression with noise: the distortion. This latter has attracted a lot of attention recently with the appearance of new deterministic constructions of almost-Euclidean sections of the $\ell_1$-ball. It measures how far the intersection between the kernel of the design matrix and the unit $\ell_1$-ball is from an $\ell_2$-ball. We show that the distortion holds enough information to derive oracle inequalities (i.e. a comparison to an ideal situation where one knows the $s$ largest coefficients of the target) for the lasso and the Dantzig selector.

Key words and phrases: Lasso; Dantzig selector; Oracle inequality; Almost-Euclidean section; Distortion.

Introduction

In the past decade, much emphasis has been put on recovering a large number of unknown variables from few noisy observations. Consider the high-dimensional linear model where one observes a vector $y \in \mathbb{R}^n$ such that
$y = X\beta^* + \varepsilon,$
where $X \in \mathbb{R}^{n\times p}$ is called the design matrix (known from the experimenter), $\beta^* \in \mathbb{R}^p$ is an unknown target vector one would like to recover, and $\varepsilon \in \mathbb{R}^n$ is a stochastic error term that contains all the perturbations of the experiment. A standard hypothesis in high-dimensional regression [HTF09] requires that one can provide a constant $\lambda_0^n \in \mathbb{R}$, as small as possible, such that
(1) $\|X^\top \varepsilon\|_{\ell_\infty} \le \lambda_0^n$
with an overwhelming probability, where $X^\top \in \mathbb{R}^{p\times n}$ denotes the transpose of $X$. In the case of an $n$-multivariate Gaussian distribution, it is known that $\lambda_0^n = O(\sigma_n \sqrt{\log p})$, where $\sigma_n > 0$ denotes the standard deviation of the noise; see Lemma A.1.

Suppose that you have far fewer observation variables $y_i$ than unknown variables $\beta^*_i$. For instance, let us mention the Compressed Sensing problem [Don06, CRT06], where one would like to simultaneously acquire and compress a signal using few (non-adaptive) linear measurements, i.e. $n \ll p$. In general terms, we are interested in accurately estimating the target vector $\beta^*$ and the response $X\beta^*$ from few and corrupted observations. During the past decade, this challenging issue has attracted a lot of attention in the statistical community. In 1996, R. Tibshirani introduced the lasso [Tib96]:
(2) $\hat\beta^{\ell} \in \arg\min_{\beta\in\mathbb{R}^p} \Big\{ \tfrac12 \|X\beta - y\|_{\ell_2}^2 + \lambda_\ell \|\beta\|_{\ell_1} \Big\},$
where $\lambda_\ell > 0$ denotes a tuning parameter. Two decades later, this estimator continues to play a key role in our understanding of high-dimensional inverse problems. Its popularity might be due to the fact that this estimator is computationally tractable: indeed, the lasso can be recast as a Second Order Cone Program (SOCP) that can be solved using an interior point method.
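The optimization problem (2) can be written down directly in a few lines of code. The following sketch is an editorial illustration and is not part of the original article; it assumes the convex-programming library cvxpy (and NumPy) is available, and all dimensions and the tuning parameter lam_l are hypothetical example values.

```python
# Illustrative sketch (not from the article): solving the lasso (2) with cvxpy.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p, s = 50, 200, 5                       # few observations, many unknowns (n << p)
X = rng.standard_normal((n, p))            # design matrix known to the experimenter
beta_star = np.zeros(p)
beta_star[:s] = rng.standard_normal(s)     # s-sparse target
sigma_n = 0.5
y = X @ beta_star + sigma_n * rng.standard_normal(n)

lam_l = 3 * sigma_n * np.sqrt(2 * np.log(p))   # tuning parameter of the order of the noise level

beta = cp.Variable(p)
objective = cp.Minimize(0.5 * cp.sum_squares(X @ beta - y) + lam_l * cp.norm1(beta))
cp.Problem(objective).solve()

beta_lasso = beta.value
print("lasso l1 estimation error:", np.linalg.norm(beta_lasso - beta_star, 1))
```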

Recently, E.J. Candès and T. Tao [CT07] introduced the Dantzig selector as
(3) $\hat\beta^{d} \in \arg\min_{\beta\in\mathbb{R}^p} \|\beta\|_{\ell_1}$ s.t. $\|X^\top(y - X\beta)\|_{\ell_\infty} \le \lambda_d,$
where $\lambda_d > 0$ is a tuning parameter. It is known that it can be recast as a linear program; hence it is also computationally tractable. A great statistical challenge is then to find efficiently verifiable conditions on $X$ ensuring that the lasso (2) or the Dantzig selector (3) would recover most of the information about the target vector $\beta^*$.

Our goal. What do we precisely mean by "most of the information about the target"? What is the amount of information one could recover from few observations? These are two of the important questions raised by Compressed Sensing. Suppose that you want to find an $s$-sparse vector (i.e. a vector with at most $s$ non-zero coefficients) that represents the target; then you would probably want it to contain the $s$ largest (in magnitude) coefficients $\beta^*_i$. More precisely, denote by $S \subseteq \{1,\dots,p\}$ the set of the indices of the $s$ largest coefficients. The $s$-best term approximation vector is $\beta^*_S \in \mathbb{R}^p$, where $(\beta^*_S)_i = \beta^*_i$ if $i \in S$ and $0$ otherwise. Observe that it is the $s$-sparse projection with respect to any $\ell_q$-norm for $1 \le q < +\infty$ (i.e. it minimizes the $\ell_q$-distance to $\beta^*$ among all the $s$-sparse vectors), and hence the most natural approximation by an $s$-sparse vector.

Suppose that someone gives you all the keys to recover $\beta^*_S$. More precisely, imagine that you know the subset $S$ in advance and that you observe
$y_{\mathrm{oracle}} = X\beta^*_S + \varepsilon.$
This is an ideal situation, referred to as the oracle case. Assume that the noise $\varepsilon$ is a Gaussian white noise of standard deviation $\sigma_n$, i.e. $\varepsilon \sim \mathcal N_n(0, \sigma_n^2\,\mathrm{Id}_n)$, where $\mathcal N_n$ denotes the $n$-multivariate Gaussian distribution. Then the optimal estimator is the ordinary least squares $\hat\beta_{\mathrm{ideal}} \in \mathbb{R}^p$ on the subset $S$, namely
$\hat\beta_{\mathrm{ideal}} \in \arg\min_{\beta\in\mathbb{R}^p,\ \mathrm{Supp}(\beta)\subseteq S} \|X\beta - y_{\mathrm{oracle}}\|_{\ell_2},$
where $\mathrm{Supp}(\beta) \subseteq \{1,\dots,p\}$ denotes the support (i.e. the set of the indices of the non-zero coefficients) of the vector $\beta$. It holds
$\|\hat\beta_{\mathrm{ideal}} - \beta^*\|_{\ell_1} = \|\hat\beta_{\mathrm{ideal}} - \beta^*_S\|_{\ell_1} + \|\beta^*_{S^c}\|_{\ell_1} \le \sqrt s\,\|\hat\beta_{\mathrm{ideal}} - \beta^*_S\|_{\ell_2} + \|\beta^*_{S^c}\|_{\ell_1},$
where $\beta^*_{S^c} = \beta^* - \beta^*_S$ denotes the error vector of the $s$-best term approximation. A calculation of the solution of the least squares estimator shows that
$\mathbb{E}\,\|\hat\beta_{\mathrm{ideal}} - \beta^*_S\|_{\ell_2}^2 = \mathbb{E}\,\|(X_S^\top X_S)^{-1}X_S^\top y_{\mathrm{oracle}} - \beta^*_S\|_{\ell_2}^2 = \mathbb{E}\,\|(X_S^\top X_S)^{-1}X_S^\top \varepsilon\|_{\ell_2}^2 = \mathrm{Trace}\big((X_S^\top X_S)^{-1}\big)\,\sigma_n^2 \le \rho_S^{-2}\,\sigma_n^2\, s,$
where $X_S \in \mathbb{R}^{n\times s}$ denotes the matrix composed of the columns $X_i \in \mathbb{R}^n$ of $X$ such that $i \in S$, and $\rho_S$ is the smallest singular value of $X_S$. It yields that
$\mathbb{E}\big[\|\hat\beta_{\mathrm{ideal}} - \beta^*_S\|_{\ell_2}^2\big]^{1/2} \le \sigma_n\sqrt s/\rho_S.$
In a nutshell, the $\ell_1$-distance between the target $\beta^*$ and the optimal estimator $\hat\beta_{\mathrm{ideal}}$ can reasonably be said to be of the order of
(4) $\dfrac{\sigma_n\, s}{\rho_S} + \|\beta^*_{S^c}\|_{\ell_1}.$
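For illustration, the Dantzig selector (3) and the oracle least-squares estimator described above can both be computed in a few lines. This sketch is not from the article; it assumes cvxpy and NumPy, and uses the same kind of hypothetical synthetic data as the previous snippet.

```python
# Illustrative sketch (not from the article): Dantzig selector (3) and oracle least squares.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p, s, sigma_n = 50, 200, 5, 0.5
X = rng.standard_normal((n, p))
beta_star = np.zeros(p)
beta_star[:s] = rng.standard_normal(s)
eps = sigma_n * rng.standard_normal(n)
y = X @ beta_star + eps
lam_d = 3 * sigma_n * np.sqrt(2 * np.log(p))

# Dantzig selector (3): minimize ||beta||_1 subject to ||X^T (y - X beta)||_inf <= lam_d.
beta = cp.Variable(p)
prob = cp.Problem(cp.Minimize(cp.norm1(beta)),
                  [cp.norm_inf(X.T @ (y - X @ beta)) <= lam_d])
prob.solve()
beta_dantzig = beta.value

# Oracle case: the support S of the s largest coefficients of beta_star is known in advance,
# one observes y_oracle = X beta*_S + eps and runs least squares on the columns X_S.
S = np.argsort(-np.abs(beta_star))[:s]
y_oracle = X[:, S] @ beta_star[S] + eps
beta_ideal = np.zeros(p)
beta_ideal[S] = np.linalg.lstsq(X[:, S], y_oracle, rcond=None)[0]

print("Dantzig selector, l1 error:", np.linalg.norm(beta_dantzig - beta_star, 1))
print("Oracle estimator, l1 error:", np.linalg.norm(beta_ideal - beta_star, 1))
```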

In this article, we say that the lasso satisfies a variable selection oracle inequality of order $s$ if and only if its $\ell_1$-distance to the target, namely $\|\hat\beta^{\ell} - \beta^*\|_{\ell_1}$, is bounded by (4) up to a satisfactory multiplicative factor. In some situations, it could be interesting to have a good approximation of $X\beta^*$. In the oracle case, we have
$\|X\hat\beta_{\mathrm{ideal}} - X\beta^*\|_{\ell_2} \le \|X\hat\beta_{\mathrm{ideal}} - X\beta^*_S\|_{\ell_2} + \|X_{S^c}\beta^*_{S^c}\|_{\ell_2} \le \|X\hat\beta_{\mathrm{ideal}} - X\beta^*_S\|_{\ell_2} + \rho_{S^c}\,\|\beta^*_{S^c}\|_{\ell_2},$
where $\rho_{S^c}$ denotes the largest singular value of $X_{S^c}$. An easy calculation gives that
$\mathbb{E}\,\|X\hat\beta_{\mathrm{ideal}} - X\beta^*_S\|_{\ell_2}^2 = \mathrm{Trace}\big(X_S(X_S^\top X_S)^{-1}X_S^\top\big)\,\sigma_n^2 = \sigma_n^2\, s.$
Hence a tolerable upper bound is given by
(5) $\sigma_n\sqrt s + \rho_{S^c}\,\|\beta^*_{S^c}\|_{\ell_2}.$
We say that the lasso satisfies an error prediction oracle inequality of order $s$ if and only if its prediction error is upper bounded by (5) up to a satisfactory multiplicative factor (say logarithmic in $p$).

Framework. In this article, we investigate designs with known distortion. We begin with the definition of this latter:

Definition. A subspace $\Gamma \subseteq \mathbb{R}^p$ has a distortion $1 \le \delta \le \sqrt p$ if and only if
$\forall x \in \Gamma,\quad \sqrt p\,\|x\|_{\ell_2} \le \delta\,\|x\|_{\ell_1}.$

A long-standing issue in approximation theory in Banach spaces is to find almost-Euclidean sections of the unit $\ell_1$-ball, i.e. subspaces with a distortion $\delta$ close to $1$ and a dimension close to $p$. In particular, we recall that it has been established [Kas77] that, with an overwhelming probability, a random subspace of dimension $p - n$ (with respect to the Haar measure on the Grassmannian) satisfies
(6) $\delta \le C\,\Big(\dfrac{p\,(1 + \log(p/n))}{n}\Big)^{1/2},$
where $C > 0$ is a universal constant. In other words, it was shown that for all $n \le p$ there exists a subspace $\Gamma_n$ of dimension $p - n$ such that for all $x \in \Gamma_n$,
$\|x\|_{\ell_2} \le C\,\Big(\dfrac{1 + \log(p/n)}{n}\Big)^{1/2} \|x\|_{\ell_1}.$

Remark. Hence our framework also deals with unitarily invariant random matrices, for instance matrices with i.i.d. Gaussian entries. Observe that their distortion satisfies (6).

Recently, new deterministic constructions of almost-Euclidean sections of the $\ell_1$-ball have been given. Most of them can be viewed as related to the context of error-correcting codes. Indeed, the construction of [Ind07] is based on amplifying the minimum distance of a code using expanders, while the construction of [GLR08] is based on Low-Density Parity Check (LDPC) codes. Finally, the construction of [IS10] is related to the tensor product of error-correcting codes. The main reason for this surprising fact is that the vectors of a subspace with low distortion must be well-spread, i.e. a small subset of their coordinates cannot contain most of their $\ell_1$-norm (cf. [Ind07, GLR08]). This property is required from a good error-correcting code, where the weight (i.e. the $\ell_0$-norm) of each codeword cannot be concentrated on a small subset of its coordinates. Similarly, this property was intensively studied in Compressed Sensing; see for instance the Nullspace Property in [CDD09].
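To make the notion of distortion concrete, the following sketch (an editorial illustration, not from the article) numerically probes the distortion of the kernel of a Gaussian design: it builds an orthonormal basis of $\ker(X)$ and maximizes $\sqrt p\,\|x\|_{\ell_2}/\|x\|_{\ell_1}$ over randomly drawn kernel directions. Random search only certifies a lower bound on $\delta$ (exact computation is a hard optimization problem), and the comparison with the Kashin-type bound (6) uses an arbitrary illustrative constant C.

```python
# Illustrative sketch (not from the article): lower-bounding the distortion of ker(X).
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 200
X = rng.standard_normal((n, p))

# Orthonormal basis of the kernel of X (dimension p - n for a full rank design).
_, _, Vt = np.linalg.svd(X, full_matrices=True)
kernel_basis = Vt[n:].T                      # p x (p - n), columns span ker(X)

# The distortion is delta = sup over x in ker(X) of sqrt(p) * ||x||_2 / ||x||_1.
# Sampling random kernel directions only yields a lower bound on delta.
coeffs = rng.standard_normal((p - n, 20000))
vectors = kernel_basis @ coeffs              # points of ker(X)
ratios = np.sqrt(p) * np.linalg.norm(vectors, 2, axis=0) / np.linalg.norm(vectors, 1, axis=0)
delta_lower = ratios.max()

C = 2.0                                      # illustrative constant, not the one of (6)
kashin_bound = C * np.sqrt(p * (1 + np.log(p / n)) / n)
print("lower bound on delta     :", delta_lower)
print("Kashin-type bound (C = 2):", kashin_bound)
```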

Remark. The main point of this article is that all of these deterministic constructions give efficient designs for the lasso and the Dantzig selector.

The Universal Distortion Property

In the past decade, numerous conditions have been given to prove oracle inequalities for the lasso and the Dantzig selector. An overview of important conditions can be found in [vdGB09]. We introduce a new condition, the Universal Distortion Property (UDP).

Definition (UDP($S_0, \kappa_0, \Delta$)). Given $1 \le S_0 \le p$, $0 < \kappa_0 < 1/2$ and $\Delta > 0$, we say that a matrix $X \in \mathbb{R}^{n\times p}$ satisfies the universal distortion condition of order $S_0$, magnitude $\kappa_0$ and parameter $\Delta$ if and only if for all $\gamma \in \mathbb{R}^p$, for all integers $s \in \{1,\dots,S_0\}$, for all subsets $S \subseteq \{1,\dots,p\}$ such that $|S| = s$, it holds
(7) $\|\gamma_S\|_{\ell_1} \le \Delta\sqrt s\,\|X\gamma\|_{\ell_2} + \kappa_0\,\|\gamma\|_{\ell_1}.$

Remark. Observe that the design $X$ is not normalized. Equation (8) below shows that $\Delta$ can depend on the inverse of the smallest singular value of $X$; hence the quantity $\Delta\|X\gamma\|_{\ell_2}$ is scale invariant. The UDP condition is similar to the Magic Condition [BLPR11] and the Compatibility Condition [vdGB09].

The main point of this article is that UDP is verifiable as soon as one can give an upper bound on the distortion of the kernel of the design matrix; see (8) below. Hence, instead of proving that a sufficient condition (such as RIP [CRT06], REC [BRT09], Compatibility [vdGB09], ...) holds, it is sufficient to compute the distortion and the smallest singular value of the design. Especially as these conditions can be hard to prove for a given matrix: we recall that an open problem is to find a computationally efficient algorithm that can tell whether a given matrix satisfies the RIP condition [CRT06] or not. We call the property Universal Distortion because it is satisfied by all the full rank matrices (Universal) and the parameters $S_0$ and $\Delta$ can be expressed in terms of the distortion of the kernel $\Gamma$ of $X$:

Theorem. Let $X \in \mathbb{R}^{n\times p}$ be a full rank matrix. Denote by $\delta$ the distortion of its kernel,
$\delta = \sup_{\gamma\in\ker(X),\ \gamma\neq0} \dfrac{\sqrt p\,\|\gamma\|_{\ell_2}}{\|\gamma\|_{\ell_1}},$
and by $\rho_n$ its smallest singular value. Then for all $\gamma \in \mathbb{R}^p$,
$\|\gamma\|_{\ell_2} \le \dfrac{\delta}{\sqrt p}\,\|\gamma\|_{\ell_1} + \dfrac{2\delta}{\rho_n}\,\|X\gamma\|_{\ell_2}.$
Equivalently, we have $B := \{\gamma \in \mathbb{R}^p:\ (\delta/\sqrt p)\,\|\gamma\|_{\ell_1} + (2\delta/\rho_n)\,\|X\gamma\|_{\ell_2} \le 1\} \subseteq B_2^p$, where $B_2^p$ denotes the Euclidean unit ball in $\mathbb{R}^p$.

This result implies that every full rank matrix satisfies UDP with parameters described as follows.

Theorem. Let $X \in \mathbb{R}^{n\times p}$ be a full rank matrix. Denote by $\delta$ the distortion of its kernel and by $\rho_n$ its smallest singular value. Let $0 < \kappa_0 < 1/2$; then $X$ satisfies UDP($S_0, \kappa_0, \Delta$) where
(8) $S_0 = \big\lfloor(\kappa_0/\delta)^2\, p\big\rfloor$ and $\Delta = 2\delta/\rho_n.$

This theorem is sharp in the following sense. The parameter $S_0$ represents (see Theorem 1) the maximum number of coefficients that can be recovered using the lasso; we call it the sparsity level.

It is known [CDD09] that the best bound one could expect is $s_{\mathrm{opt}} \simeq n/\log(p/n)$, up to a multiplicative constant. In the case where (6) holds, the sparsity level satisfies
(9) $S_0 \ge (\kappa_0/C)^2\, s_{\mathrm{opt}},$
up to a numerical constant. It shows that any design matrix with low distortion satisfies UDP with an optimal sparsity level.

Oracle inequalities

The results presented here fold into two parts. In the first part, we assume only that UDP holds; in particular, it is not excluded that one can get better upper bounds on the parameters than (8). As a matter of fact, the smaller $\Delta$ is, the sharper the oracle inequalities are. Then we give oracle inequalities in terms of only the distortion of the design.

Theorem 1. Let $X \in \mathbb{R}^{n\times p}$ be a full rank matrix.
- Assume that $X$ satisfies UDP($S_0, \kappa_0, \Delta$) and that (1) holds. Then for any
(10) $\lambda_\ell > \lambda_0^n/(1 - 2\kappa_0),$
it holds
(11) $\|\hat\beta^{\ell} - \beta^*\|_{\ell_1} \le 2\Big(1 - \dfrac{\lambda_0^n}{\lambda_\ell} - 2\kappa_0\Big)^{-1} \min_{S\subseteq\{1,\dots,p\},\ |S| = s,\ s \le S_0} \Big( \lambda_\ell\,\Delta^2 s + \|\beta^*_{S^c}\|_{\ell_1} \Big).$
- For every full rank matrix $X \in \mathbb{R}^{n\times p}$, for all $0 < \kappa_0 < 1/2$ and $\lambda_\ell$ satisfying (10), we have
(12) $\|\hat\beta^{\ell} - \beta^*\|_{\ell_1} \le 2\Big(1 - \dfrac{\lambda_0^n}{\lambda_\ell} - 2\kappa_0\Big)^{-1} \min_{S\subseteq\{1,\dots,p\},\ |S| = s,\ s \le (\kappa_0/\delta)^2 p} \Big( \dfrac{4\lambda_\ell\,\delta^2 s}{\rho_n^2} + \|\beta^*_{S^c}\|_{\ell_1} \Big),$
where $\rho_n$ denotes the smallest singular value of $X$ and $\delta$ the distortion of its kernel.

Consider the case where the noise satisfies the hypothesis of Lemma A.1 and take $\lambda_0^n = \lambda_0^n(1)$. Taking $\kappa_0$ constant and $\lambda_\ell$ a constant multiple of $\lambda_0^n$ in accordance with (10), (11) becomes an inequality of the form
$\|\hat\beta^{\ell} - \beta^*\|_{\ell_1} \lesssim \min_{|S| = s,\ s \le S_0} \Big( \|X\|_{(\ell_2,\infty)}\,\sqrt{\log p}\;\sigma_n\,\Delta^2 s + \|\beta^*_{S^c}\|_{\ell_1} \Big),$
where $\|X\|_{(\ell_2,\infty)}$ denotes the maximum $\ell_2$-norm of the columns of $X$ (see Lemma A.1), which is an oracle inequality up to a multiplicative factor of order $\sqrt{\log p}$. In the same way, (12) becomes
$\|\hat\beta^{\ell} - \beta^*\|_{\ell_1} \lesssim \min_{|S| = s,\ s \le (\kappa_0/\delta)^2 p} \Big( \dfrac{\|X\|_{(\ell_2,\infty)}\,\delta^2\sqrt{\log p}\;\sigma_n\, s}{\rho_n^2} + \|\beta^*_{S^c}\|_{\ell_1} \Big),$
which is an oracle inequality up to a multiplicative factor $C_{\mathrm{mult}} := \delta\sqrt{\log p}/\rho_n$. In the optimal case (6), this latter becomes
(13) $C_{\mathrm{mult}} = \dfrac{C}{\rho_n}\Big(\dfrac{p\,(1 + \log(p/n))\,\log p}{n}\Big)^{1/2},$
where $C > 0$ is the same universal constant as in (6). Roughly speaking, up to a factor of the order of (13), the lasso is as good as the oracle that knows the $S_0$-best term approximation of the target. Moreover, as mentioned in (9), $S_0$ is an optimal sparsity level. However, this multiplicative constant takes small values only for a restrictive range of the parameter $n$; as a matter of fact, it is meaningful when $n$ is a constant fraction of $p$.
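As a purely numerical illustration of (8), (9) and (13) (not part of the article), one can tabulate the sparsity level $S_0 = (\kappa_0/\delta)^2 p$ and the factor $C_{\mathrm{mult}}$ for a design whose kernel distortion meets the Kashin-type bound (6); the constant C and the value of $\rho_n$ below are placeholders chosen for the example.

```python
# Illustrative computation (not from the article) of the sparsity level S_0 from (8)
# and of the multiplicative factor (13), assuming the distortion meets the bound (6).
import numpy as np

C = 2.0            # placeholder for the universal constant of (6)
rho_n = 1.0        # placeholder for the smallest singular value of X
kappa_0 = 1.0 / 3.0

for n, p in [(200, 1000), (500, 1000), (1000, 2000)]:
    delta = C * np.sqrt(p * (1 + np.log(p / n)) / n)        # bound (6) on the distortion
    S_0 = np.floor((kappa_0 / delta) ** 2 * p)              # sparsity level of (8)
    s_opt = n / np.log(p / n)                               # best achievable order
    C_mult = (C / rho_n) * np.sqrt(p * (1 + np.log(p / n)) * np.log(p) / n)   # factor (13)
    print(f"n={n:5d} p={p:5d}  delta~{delta:6.2f}  S_0~{S_0:6.0f}  s_opt~{s_opt:7.1f}  C_mult~{C_mult:7.1f}")
```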

Similarly, we show oracle inequalities in error prediction in terms of the distortion of the kernel of the design.

Theorem 2. Let $X \in \mathbb{R}^{n\times p}$ be a full rank matrix.
- Assume that $X$ satisfies UDP($S_0, \kappa_0, \Delta$) and that (1) holds. Then for any $\lambda_\ell$ satisfying (10), it holds
(14) $\|X\hat\beta^{\ell} - X\beta^*\|_{\ell_2} \le \min_{S\subseteq\{1,\dots,p\},\ |S| = s,\ s \le S_0} \Big[ 4\lambda_\ell\,\Delta\sqrt s + \dfrac{\|\beta^*_{S^c}\|_{\ell_1}}{\Delta\sqrt s} \Big].$
- For every full rank matrix $X \in \mathbb{R}^{n\times p}$, for all $0 < \kappa_0 < 1/2$ and $\lambda_\ell$ satisfying (10), we have
(15) $\|X\hat\beta^{\ell} - X\beta^*\|_{\ell_2} \le \min_{S\subseteq\{1,\dots,p\},\ |S| = s,\ s \le (\kappa_0/\delta)^2 p} \Big[ \dfrac{8\lambda_\ell\,\delta\sqrt s}{\rho_n} + \dfrac{\rho_n\,\|\beta^*_{S^c}\|_{\ell_1}}{2\delta\sqrt s} \Big],$
where $\rho_n$ denotes the smallest singular value of $X$ and $\delta$ the distortion of its kernel.

Consider the case where the noise satisfies the hypothesis of Lemma A.1 and take $\lambda_0^n = \lambda_0^n(1)$. Assume that $\kappa_0$ is constant (say $\kappa_0 = 1/3$) and take $\lambda_\ell = 3\lambda_0^n$; then (14) becomes
$\|X\hat\beta^{\ell} - X\beta^*\|_{\ell_2} \le \min_{|S| = s,\ s \le S_0} \Big[ 24\,\|X\|_{(\ell_2,\infty)}\sqrt{2\log p}\;\sigma_n\,\Delta\sqrt s + \dfrac{\|\beta^*_{S^c}\|_{\ell_1}}{\Delta\sqrt s} \Big],$
which is not an oracle inequality stricto sensu because of the $1/(\Delta\sqrt s)$ in the second term; as a matter of fact, it tends to lower the $s$-best term approximation term $\|\beta^*_{S^c}\|_{\ell_1}$. Nevertheless, it is almost an oracle inequality up to a multiplicative factor of the order of $\sqrt{\log p}$. In the same way, (15) becomes
$\|X\hat\beta^{\ell} - X\beta^*\|_{\ell_2} \le \min_{|S| = s,\ s \le p/(9\delta^2)} \Big[ \dfrac{48\,\|X\|_{(\ell_2,\infty)}\,\delta\sqrt{2\log p}\;\sigma_n\sqrt s}{\rho_n} + \dfrac{\rho_n\,\|\beta^*_{S^c}\|_{\ell_1}}{2\delta\sqrt s} \Big],$
which is an oracle inequality up to a multiplicative factor $C_{\mathrm{mult}} := \delta\sqrt{\log p}/\rho_n$. In the optimal case (6), this latter becomes
(16) $C_{\mathrm{mult}} = \dfrac{C}{\rho_n}\Big(\dfrac{p\,\log p\,(1 + \log(p/n))}{n}\Big)^{1/2},$
where $C > 0$ is the same universal constant as in (6).

Results for the Dantzig selector. Similarly, we derive the same results for the Dantzig selector. The only difference is that the parameter $\kappa_0$ must be less than $1/4$. Here again the results fold into two parts: in the first one, we only assume that UDP holds; in the second, we invoke (8) to derive results in terms of the distortion of the design.

Theorem 3. Let $X \in \mathbb{R}^{n\times p}$ be a full rank matrix.
- Assume that $X$ satisfies UDP($S_0, \kappa_0, \Delta$) with $\kappa_0 < 1/4$ and that (1) holds. Then for any
(17) $\lambda_d > \lambda_0^n/(1 - 4\kappa_0),$
it holds
(18) $\|\hat\beta^{d} - \beta^*\|_{\ell_1} \le 4\Big(1 - \dfrac{\lambda_0^n}{\lambda_d} - 4\kappa_0\Big)^{-1} \min_{S\subseteq\{1,\dots,p\},\ |S| = s,\ s \le S_0} \Big( \lambda_d\,\Delta^2 s + \|\beta^*_{S^c}\|_{\ell_1} \Big).$

- For every full rank matrix $X \in \mathbb{R}^{n\times p}$, for all $0 < \kappa_0 < 1/4$ and $\lambda_d$ satisfying (17), we have
(19) $\|\hat\beta^{d} - \beta^*\|_{\ell_1} \le 4\Big(1 - \dfrac{\lambda_0^n}{\lambda_d} - 4\kappa_0\Big)^{-1} \min_{S\subseteq\{1,\dots,p\},\ |S| = s,\ s \le (\kappa_0/\delta)^2 p} \Big( \dfrac{4\lambda_d\,\delta^2 s}{\rho_n^2} + \|\beta^*_{S^c}\|_{\ell_1} \Big),$
where $\rho_n$ denotes the smallest singular value of $X$ and $\delta$ the distortion of its kernel.

The prediction error is given by the following theorem.

Theorem 4. Let $X \in \mathbb{R}^{n\times p}$ be a full rank matrix.
- Assume that $X$ satisfies UDP($S_0, \kappa_0, \Delta$) with $\kappa_0 < 1/4$ and that (1) holds. Then for any $\lambda_d$ satisfying (17), it holds
(20) $\|X\hat\beta^{d} - X\beta^*\|_{\ell_2} \le \min_{S\subseteq\{1,\dots,p\},\ |S| = s,\ s \le S_0} \Big[ 4\lambda_d\,\Delta\sqrt s + \dfrac{\|\beta^*_{S^c}\|_{\ell_1}}{\Delta\sqrt s} \Big].$
- For every full rank matrix $X \in \mathbb{R}^{n\times p}$, for all $0 < \kappa_0 < 1/4$ and $\lambda_d$ satisfying (17), we have
(21) $\|X\hat\beta^{d} - X\beta^*\|_{\ell_2} \le \min_{S\subseteq\{1,\dots,p\},\ |S| = s,\ s \le (\kappa_0/\delta)^2 p} \Big[ \dfrac{8\lambda_d\,\delta\sqrt s}{\rho_n} + \dfrac{\rho_n\,\|\beta^*_{S^c}\|_{\ell_1}}{2\delta\sqrt s} \Big],$
where $\rho_n$ denotes the smallest singular value of $X$ and $\delta$ the distortion of its kernel.

Observe that the same comments as in the lasso case (e.g. (13), (16)) hold. Eventually, every deterministic construction of almost-Euclidean sections gives a design that satisfies the oracle inequalities above.

3. An overview of the standard results

Oracle inequalities for the lasso and the Dantzig selector have been established under a variety of different conditions on the design. In this section, we show that the UDP condition is comparable to the standard conditions (RIP, REC and Compatibility) and that our results are relevant in the literature on high-dimensional regression.

3.1. The standard conditions. We recall some sufficient conditions here. For all $s \in \{1,\dots,p\}$, we denote by $\Sigma_s \subseteq \mathbb{R}^p$ the set of all the $s$-sparse vectors.

Restricted Isometry Property: A matrix $X \in \mathbb{R}^{n\times p}$ satisfies RIP($\theta_N, N$) if and only if there exists $0 < \theta_N < 1$ (as small as possible) such that for all $s \in \{1,\dots,N\}$ and all $\gamma \in \Sigma_s$ it holds
$(1 - \theta_N)\,\|\gamma\|_{\ell_2}^2 \le \|X\gamma\|_{\ell_2}^2 \le (1 + \theta_N)\,\|\gamma\|_{\ell_2}^2.$
The constant $\theta_N$ is called the $N$-restricted isometry constant.

Restricted Eigenvalue Assumption [BRT09]: A matrix $X$ satisfies RE($s, c_0$) if and only if
$\kappa(s, c_0) = \min_{S\subseteq\{1,\dots,p\},\ |S| \le s}\ \min_{\gamma \neq 0,\ \|\gamma_{S^c}\|_{\ell_1} \le c_0\|\gamma_S\|_{\ell_1}} \dfrac{\|X\gamma\|_{\ell_2}}{\|\gamma_S\|_{\ell_2}} > 0.$
The constant $\kappa(s, c_0)$ is called the $(s, c_0)$-restricted $\ell_2$-eigenvalue.

Compatibility Condition [vdGB09]: We say that a matrix $X \in \mathbb{R}^{n\times p}$ satisfies the condition Compatibility($S, c_0$) if and only if
(22) $\phi(S, c_0) = \min_{\gamma \neq 0,\ \|\gamma_{S^c}\|_{\ell_1} \le c_0\|\gamma_S\|_{\ell_1}} \dfrac{\sqrt{|S|}\,\|X\gamma\|_{\ell_2}}{\|\gamma_S\|_{\ell_1}} > 0.$
The constant $\phi(S, c_0)$ is called the $(S, c_0)$-restricted $\ell_1$-eigenvalue.

H Condition [JN10]: $X \in \mathbb{R}^{n\times p}$ satisfies the $\mathbf{H}_s(\kappa)$ condition (with $\kappa < 1/2$) if and only if for all $\gamma \in \mathbb{R}^p$ and for all $S \subseteq \{1,\dots,p\}$ such that $|S| \le s$ it holds
$\|\gamma_S\|_{\ell_1} \le \dfrac{s}{\hat\lambda}\,\|X\gamma\|_{\ell_2} + \kappa\,\|\gamma\|_{\ell_1},$
where $\hat\lambda$ denotes the maximum of the $\ell_2$-norms of the columns of $X$.

Remark. The first term of the right-hand side (i.e. $(s/\hat\lambda)\|X\gamma\|_{\ell_2}$) is greater than the first term of the right-hand side of the UDP condition (i.e. $\Delta\sqrt s\,\|X\gamma\|_{\ell_2}$); hence the H condition is weaker than the UDP condition. Nevertheless, the authors of [JN10] established limits of performance of their conditions: the condition $\mathbf{H}_s(1/3)$ is feasible only in a severely restricted range of the sparsity parameter $s$. Notice that this is not the case of the UDP condition: the inequality (9) shows that it is feasible for a large range of the sparsity parameter $s$. Moreover, a comparison of the two approaches is given in Table 1. Let us emphasize that the above description is not meant to be exhaustive; in particular, we do not mention the irrepresentable condition [ZY06], which ensures exact recovery of the support.

The next proposition shows that the UDP condition is weaker than the RIP, RE and Compatibility conditions.

Proposition 3. Let $X \in \mathbb{R}^{n\times p}$ be a full rank matrix; then the following is true:
- The RIP($\theta_{5s}, 5s$) condition with $\theta_{5s} < 1$ implies UDP($s, \kappa_0, \Delta$) for all pairs $(\kappa_0, \Delta)$ such that
(23) $\Big[1 + 2\sqrt{\dfrac{1-\theta_{5s}}{1+\theta_{5s}}}\Big]^{-1} < \kappa_0 < \dfrac12$ and $\Delta = \Big[\sqrt{1-\theta_{5s}} - \dfrac{(1-\kappa_0)\sqrt{1+\theta_{5s}}}{2\kappa_0}\Big]^{-1}.$
- The RE($s, c_0$) condition implies UDP($s, (1+c_0)^{-1}, \kappa(s, c_0)^{-1}$).
- The Compatibility($S, c_0$) condition implies UDP($s, (1+c_0)^{-1}, \phi(S, c_0)^{-1}$).

Remark. The point here is to show that the UDP condition is similar to the standard conditions of high-dimensional regression. For the sake of simplicity, we do not study whether the converse of Proposition 3 is true. As a matter of fact, the UDP, RE and Compatibility conditions are expressions with the same flavor: they aim at controlling the eigenvalues of $X$ on a cone
$\{\gamma \in \mathbb{R}^p:\ \exists\, S \subseteq \{1,\dots,p\}\ \text{s.t.}\ |S| \le s\ \text{and}\ \|\gamma_{S^c}\|_{\ell_1} \le c\,\|\gamma_S\|_{\ell_1}\},$
where $c > 0$ is a tuning parameter.

3.2. The results. Table 1 shows that our results are similar to standard results in the literature.

Table 1. Comparison of results in risk and prediction for the lasso and the Dantzig selector.

Reference    | Condition | Risk                                              | Prediction
[CP09]       | RIP       | $\ell_2 \lesssim \sigma\sqrt{s\log p} + \ell_2$   | $L_2 \lesssim \sigma\sqrt{s\log p} + L_2$
[BRT09]      | REC       | $\ell_1 \lesssim \sigma\, s\sqrt{\log p}$ (*)     | $L_2 \lesssim \sigma\sqrt{s\log p}$ (*)
[vdGB09]     | Comp      | $\ell_1 \lesssim \sigma\, s\sqrt{\log p}$ (*)     | $L_2 \lesssim \sigma\sqrt{s\log p}$ (*)
[JN10]       | H         | $\ell_1 \lesssim \sigma\, s\sqrt{\log p} + \ell_1$ | No
This article | UDP       | $\ell_1 \lesssim \sigma\, s\sqrt{\log p} + \ell_1$ | $L_2 \lesssim \sigma\sqrt{s\log p} + \ell_1/\sqrt s$

where the notations are given by:

Location        | $\ell_1$                            | $\ell_2$                            | $L_2$
Right-hand side | $\|\beta^*_{S^c}\|_{\ell_1}$        | $\|\beta^*_{S^c}\|_{\ell_2}$        | $\|X\hat\beta_{\mathrm{ideal}} - X\beta^*\|_{\ell_2}$
Left-hand side  | $\|\hat\beta - \beta^*\|_{\ell_1}$  | $\|\hat\beta - \beta^*\|_{\ell_2}$  | $\|X\hat\beta - X\beta^*\|_{\ell_2}$

Observe that all the inequalities are satisfied with an overwhelming probability. The notation $\lesssim$ means that the inequality holds up to a multiplicative factor that may depend on the parameters of the condition. The (*) notation means that the result is given for $s$-sparse targets. The $\hat\beta$ notation represents the estimator (i.e. the lasso or the Dantzig selector). The parameters $\sigma$ and $p$ represent respectively the standard deviation of the noise and the dimension of the target vector $\beta^*$.

Appendix A

The appendix is devoted to the proofs of the different results of this paper.

Lemma A.1. Suppose that $\varepsilon = (\varepsilon_i)_{i=1}^n$ is such that the $\varepsilon_i$'s are i.i.d. with respect to a Gaussian distribution with mean zero and variance $\sigma_n^2$. Choose $t > 0$ and set
$\lambda_0^n(t) = (1+t)\,\|X\|_{(\ell_2,\infty)}\,\sigma_n\sqrt{2\log p},$
where $\|X\|_{(\ell_2,\infty)}$ denotes the maximum $\ell_2$-norm of the columns of $X$. Then
$\mathbb{P}\big( \|X^\top\varepsilon\|_{\ell_\infty} \ge \lambda_0^n(t) \big) \le \dfrac{1}{(1+t)\sqrt{\pi\log p}}\; p^{\,1-(1+t)^2}.$

Proof of Lemma A.1. Observe that $X^\top\varepsilon \sim \mathcal N_p(0, \sigma_n^2\, X^\top X)$. Hence, for all $j = 1,\dots,p$, $X_j^\top\varepsilon \sim \mathcal N(0, \sigma_n^2\,\|X_j\|_{\ell_2}^2)$. Using Šidák's inequality [Sid68], it yields
$\mathbb{P}\big( \|X^\top\varepsilon\|_{\ell_\infty} \le \lambda_0^n \big) \ge \prod_{j=1}^p \mathbb{P}\big( |X_j^\top\varepsilon| \le \lambda_0^n \big) \ge \prod_{i=1}^p \mathbb{P}\big( |\tilde\varepsilon_i| \le \lambda_0^n \big),$
where the $\tilde\varepsilon_i$'s are i.i.d. with respect to $\mathcal N(0, \sigma_n^2\,\|X\|_{(\ell_2,\infty)}^2)$. Denote by $\Phi$ and $\varphi$ respectively the cumulative distribution function and the probability density function of the standard normal. Set $\theta = (1+t)\sqrt{2\log p}$. It holds
$\prod_{i=1}^p \mathbb{P}\big( |\tilde\varepsilon_i| \le \lambda_0^n \big) = \mathbb{P}\big( |\tilde\varepsilon_1| \le \lambda_0^n \big)^p = \big(2\Phi(\theta) - 1\big)^p > \big(1 - 2\varphi(\theta)/\theta\big)^p,$
using an integration by parts to get $1 - \Phi(\theta) < \varphi(\theta)/\theta$. It yields that
$\mathbb{P}\big( \|X^\top\varepsilon\|_{\ell_\infty} \ge \lambda_0^n \big) \le 1 - \big(1 - 2\varphi(\theta)/\theta\big)^p \le \dfrac{2p\,\varphi(\theta)}{\theta} = \dfrac{1}{(1+t)\sqrt{\pi\log p}}\; p^{\,1-(1+t)^2}.$
This concludes the proof.
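The bound of Lemma A.1 is easy to check by simulation. The following sketch (an editorial illustration, not from the article) compares the empirical probability that $\|X^\top\varepsilon\|_{\ell_\infty}$ exceeds $\lambda_0^n(t)$ with the stated bound, for an arbitrary small Gaussian design and an arbitrary value of t.

```python
# Illustrative Monte Carlo check (not from the article) of the bound of Lemma A.1.
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma_n, t = 50, 200, 1.0, 0.25
X = rng.standard_normal((n, p))

col_norm_max = np.linalg.norm(X, axis=0).max()                 # max l2-norm of the columns
lam_0 = (1 + t) * col_norm_max * sigma_n * np.sqrt(2 * np.log(p))

n_trials = 20000
eps = sigma_n * rng.standard_normal((n_trials, n))
# Empirical P(||X^T eps||_inf >= lam_0(t)) over the trials.
exceed = (np.abs(eps @ X).max(axis=1) >= lam_0).mean()

bound = p ** (1 - (1 + t) ** 2) / ((1 + t) * np.sqrt(np.pi * np.log(p)))
print("empirical probability:", exceed)
print("bound of Lemma A.1   :", bound)
```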

Proof of the theorems of the Universal Distortion Property section. Consider the following singular value decomposition, $X = U D A$, where
- $U \in \mathbb{R}^{n\times n}$ is such that $UU^\top = \mathrm{Id}_n$,
- $D = \mathrm{Diag}(\rho_1,\dots,\rho_n)$ is a diagonal matrix, where $\rho_1 \ge \dots \ge \rho_n > 0$ are the singular values of $X$, and
- $A \in \mathbb{R}^{n\times p}$ is such that $AA^\top = \mathrm{Id}_n$.
We recall that the only assumption on the design is that it has full rank, which yields $\rho_n > 0$. Let $\delta$ be the distortion of the kernel $\Gamma$ of the design. Denote by $\pi_\Gamma$ (resp. $\pi_{\Gamma^\perp}$) the $\ell_2$-projection onto $\Gamma$ (resp. $\Gamma^\perp$). Let $\gamma \in \mathbb{R}^p$; then $\gamma = \pi_\Gamma(\gamma) + \pi_{\Gamma^\perp}(\gamma)$. An easy calculation shows that $\pi_{\Gamma^\perp}(\gamma) = A^\top A\gamma$. Let $s \in \{1,\dots,S_0\}$ and let $S \subseteq \{1,\dots,p\}$ be such that $|S| = s$. It holds
$\|\gamma_S\|_{\ell_1} \le \sqrt s\,\|\gamma\|_{\ell_2} = \sqrt s\,\big(\|\pi_\Gamma(\gamma)\|_{\ell_2}^2 + \|\pi_{\Gamma^\perp}(\gamma)\|_{\ell_2}^2\big)^{1/2} \le \sqrt s\,\Big(\dfrac{\delta}{\sqrt p}\,\|\pi_\Gamma(\gamma)\|_{\ell_1} + \|A^\top A\gamma\|_{\ell_2}\Big)$
$\le \sqrt s\,\Big(\dfrac{\delta}{\sqrt p}\,\big(\|\gamma\|_{\ell_1} + \|\pi_{\Gamma^\perp}(\gamma)\|_{\ell_1}\big) + \|A\gamma\|_{\ell_2}\Big) \le \sqrt s\,\Big(\dfrac{\delta}{\sqrt p}\,\|\gamma\|_{\ell_1} + \delta\,\|A^\top A\gamma\|_{\ell_2} + \|A\gamma\|_{\ell_2}\Big)$
$= \sqrt s\,\Big(\dfrac{\delta}{\sqrt p}\,\|\gamma\|_{\ell_1} + (1+\delta)\,\|A\gamma\|_{\ell_2}\Big) \le \dfrac{\delta\sqrt s}{\sqrt p}\,\|\gamma\|_{\ell_1} + \dfrac{(1+\delta)\sqrt s}{\rho_n}\,\|X\gamma\|_{\ell_2} \le \dfrac{\delta\sqrt s}{\sqrt p}\,\|\gamma\|_{\ell_1} + \dfrac{2\delta\sqrt s}{\rho_n}\,\|X\gamma\|_{\ell_2},$
using the triangle inequality, the distortion of the kernel $\Gamma$, the bound $\|\pi_{\Gamma^\perp}(\gamma)\|_{\ell_1} \le \sqrt p\,\|\pi_{\Gamma^\perp}(\gamma)\|_{\ell_2}$, and $\|X\gamma\|_{\ell_2} = \|DA\gamma\|_{\ell_2} \ge \rho_n\,\|A\gamma\|_{\ell_2}$. Eventually, set $\kappa_0 = (\sqrt{S_0}/\sqrt p)\,\delta$ and $\Delta = 2\delta/\rho_n$. This ends the proof.

Proof of Theorem 1. We recall that $\lambda_0^n$ denotes an upper bound on the amplification of the noise; see (1). We begin with a standard result.

Lemma A.2. Let $h = \hat\beta^{\ell} - \beta^* \in \mathbb{R}^p$ and $\lambda_\ell \ge \lambda_0^n$. Then for all subsets $S \subseteq \{1,\dots,p\}$ it holds
(A1) $\dfrac{1}{2\lambda_\ell}\,\|Xh\|_{\ell_2}^2 + \Big[1 - \dfrac{\lambda_0^n}{\lambda_\ell}\Big]\,\|h\|_{\ell_1} \le 2\,\|h_S\|_{\ell_1} + 2\,\|\beta^*_{S^c}\|_{\ell_1}.$

Proof. By optimality, we have
$\tfrac12\|X\hat\beta^{\ell} - y\|_{\ell_2}^2 + \lambda_\ell\|\hat\beta^{\ell}\|_{\ell_1} \le \tfrac12\|X\beta^* - y\|_{\ell_2}^2 + \lambda_\ell\|\beta^*\|_{\ell_1}.$
It yields
$\tfrac12\|Xh\|_{\ell_2}^2 \le \langle X^\top\varepsilon, h\rangle + \lambda_\ell\|\beta^*\|_{\ell_1} - \lambda_\ell\|\hat\beta^{\ell}\|_{\ell_1}.$
Let $S \subseteq \{1,\dots,p\}$; we have
$\tfrac12\|Xh\|_{\ell_2}^2 + \lambda_\ell\|\hat\beta^{\ell}_{S^c}\|_{\ell_1} \le \lambda_\ell\big(\|\beta^*_S\|_{\ell_1} - \|\hat\beta^{\ell}_S\|_{\ell_1}\big) + \lambda_\ell\|\beta^*_{S^c}\|_{\ell_1} + \langle X^\top\varepsilon, h\rangle \le \lambda_\ell\|h_S\|_{\ell_1} + \lambda_\ell\|\beta^*_{S^c}\|_{\ell_1} + \lambda_0^n\|h\|_{\ell_1},$
using (1). Adding $\lambda_\ell\|\beta^*_{S^c}\|_{\ell_1}$ on both sides, it holds
$\tfrac12\|Xh\|_{\ell_2}^2 + (\lambda_\ell - \lambda_0^n)\,\|h_{S^c}\|_{\ell_1} \le (\lambda_\ell + \lambda_0^n)\,\|h_S\|_{\ell_1} + 2\lambda_\ell\,\|\beta^*_{S^c}\|_{\ell_1}.$
Adding $(\lambda_\ell - \lambda_0^n)\,\|h_S\|_{\ell_1}$ on both sides and dividing by $\lambda_\ell$, we conclude the proof.

Using (7) and (A1), it follows that
(A2) $\dfrac{1}{2\lambda_\ell}\,\|Xh\|_{\ell_2}^2 + \Big[1 - \dfrac{\lambda_0^n}{\lambda_\ell} - 2\kappa_0\Big]\,\|h\|_{\ell_1} \le 2\Delta\sqrt s\,\|Xh\|_{\ell_2} + 2\,\|\beta^*_{S^c}\|_{\ell_1}.$
It yields
$\Big[1 - \dfrac{\lambda_0^n}{\lambda_\ell} - 2\kappa_0\Big]\,\|h\|_{\ell_1} \le 2\lambda_\ell\,\Delta^2 s + 2\,\|\beta^*_{S^c}\|_{\ell_1},$
using the fact that the polynomial $x \mapsto -\frac{1}{2\lambda_\ell}x^2 + 2\Delta\sqrt s\,x$ is not greater than $2\lambda_\ell\Delta^2 s$. This concludes the proof.

Proof of Theorem 3. We begin with a standard result.

Lemma A.3. Let $h = \beta^* - \hat\beta^{d} \in \mathbb{R}^p$ and $\lambda_d \ge \lambda_0^n$. Then for all subsets $S \subseteq \{1,\dots,p\}$ it holds
(A3) $\dfrac{1}{4\lambda_d}\,\|Xh\|_{\ell_2}^2 + \dfrac14\Big[1 - \dfrac{\lambda_0^n}{\lambda_d}\Big]\,\|h\|_{\ell_1} \le \|h_S\|_{\ell_1} + \|\beta^*_{S^c}\|_{\ell_1}.$

Proof. Set $h = \beta^* - \hat\beta^{d}$. Recall that $\|X^\top\varepsilon\|_{\ell_\infty} \le \lambda_0^n$; it yields
$\|X^\top Xh\|_{\ell_\infty} \le \|X^\top(y - X\hat\beta^{d})\|_{\ell_\infty} + \|X^\top(X\beta^* - y)\|_{\ell_\infty} \le \lambda_d + \lambda_0^n.$
Hence we get
(A4) $\|Xh\|_{\ell_2}^2 = \langle X^\top Xh, h\rangle \le (\lambda_d + \lambda_0^n)\,\|h\|_{\ell_1} \le (\lambda_d + \lambda_0^n)\,\big(\|h_S\|_{\ell_1} + \|h_{S^c}\|_{\ell_1}\big).$
Since $\beta^*$ is feasible, it yields $\|\hat\beta^{d}\|_{\ell_1} \le \|\beta^*\|_{\ell_1}$. Thus
$\|\hat\beta^{d}_{S^c}\|_{\ell_1} \le \big(\|\beta^*_S\|_{\ell_1} - \|\hat\beta^{d}_S\|_{\ell_1}\big) + \|\beta^*_{S^c}\|_{\ell_1} \le \|h_S\|_{\ell_1} + \|\beta^*_{S^c}\|_{\ell_1}.$
Since $\|h_{S^c}\|_{\ell_1} \le \|\hat\beta^{d}_{S^c}\|_{\ell_1} + \|\beta^*_{S^c}\|_{\ell_1}$, it yields
(A5) $\|h_{S^c}\|_{\ell_1} \le \|h_S\|_{\ell_1} + 2\,\|\beta^*_{S^c}\|_{\ell_1}.$
Combining (A4) + $2\lambda_d\,$(A5), we get
$\|Xh\|_{\ell_2}^2 + (\lambda_d - \lambda_0^n)\,\|h_{S^c}\|_{\ell_1} \le (3\lambda_d + \lambda_0^n)\,\|h_S\|_{\ell_1} + 4\lambda_d\,\|\beta^*_{S^c}\|_{\ell_1}.$
Adding $(\lambda_d - \lambda_0^n)\,\|h_S\|_{\ell_1}$ on both sides and dividing by $4\lambda_d$, we conclude the proof.

Using (7) and (A3), it follows that
(A6) $\dfrac{1}{4\lambda_d}\,\|Xh\|_{\ell_2}^2 + \dfrac14\Big[1 - \dfrac{\lambda_0^n}{\lambda_d} - 4\kappa_0\Big]\,\|h\|_{\ell_1} \le \Delta\sqrt s\,\|Xh\|_{\ell_2} + \|\beta^*_{S^c}\|_{\ell_1}.$
It yields
$\dfrac14\Big[1 - \dfrac{\lambda_0^n}{\lambda_d} - 4\kappa_0\Big]\,\|h\|_{\ell_1} \le \lambda_d\,\Delta^2 s + \|\beta^*_{S^c}\|_{\ell_1},$
using the fact that the polynomial $x \mapsto -\frac{1}{4\lambda_d}x^2 + \Delta\sqrt s\,x$ is not greater than $\lambda_d\Delta^2 s$. This concludes the proof.

Proof of Theorem 2 and Theorem 4. Using (A2), we know that
$\dfrac{1}{2\lambda_\ell}\,\|Xh\|_{\ell_2}^2 + \Big[1 - \dfrac{\lambda_0^n}{\lambda_\ell} - 2\kappa_0\Big]\,\|h\|_{\ell_1} \le 2\Delta\sqrt s\,\|Xh\|_{\ell_2} + 2\,\|\beta^*_{S^c}\|_{\ell_1}.$
It follows that
$\|Xh\|_{\ell_2}^2 \le 4\lambda_\ell\,\Delta\sqrt s\,\|Xh\|_{\ell_2} + 4\lambda_\ell\,\|\beta^*_{S^c}\|_{\ell_1}.$
This latter is of the form $x^2 \le bx + c$, which implies that $x \le b + c/b$. Hence
$\|Xh\|_{\ell_2} \le 4\lambda_\ell\,\Delta\sqrt s + \dfrac{\|\beta^*_{S^c}\|_{\ell_1}}{\Delta\sqrt s}.$
The same analysis, starting from (A6), holds for Theorem 4.

Proof of Proposition 3. One can check that RE($s, c_0$) implies UDP($s, (1+c_0)^{-1}, \kappa(s, c_0)^{-1}$) and that Compatibility($S, c_0$) implies UDP($s, (1+c_0)^{-1}, \phi(S, c_0)^{-1}$). Assume now that $X$ satisfies RIP($\theta_{5s}, 5s$). Let $\gamma \in \mathbb{R}^p$, $s \in \{1,\dots,S_0\}$ and $T_0 \subseteq \{1,\dots,p\}$ such that $|T_0| = s$. Choose a pair $(\kappa_0, \Delta)$ as in (23). If $\|\gamma_{T_0}\|_{\ell_1} \le \kappa_0\,\|\gamma\|_{\ell_1}$, then $\|\gamma_{T_0}\|_{\ell_1} \le \Delta\sqrt s\,\|X\gamma\|_{\ell_2} + \kappa_0\,\|\gamma\|_{\ell_1}$. Suppose that $\|\gamma_{T_0}\|_{\ell_1} > \kappa_0\,\|\gamma\|_{\ell_1}$; then
(A7) $\|\gamma_{T_0^c}\|_{\ell_1} < \dfrac{1-\kappa_0}{\kappa_0}\,\|\gamma_{T_0}\|_{\ell_1}.$
Denote by $T_1$ the set of the indices of the $4s$ largest coefficients (in absolute value) in $T_0^c$, by $T_2$ the set of the indices of the $4s$ largest coefficients in $(T_0\cup T_1)^c$, etc. Hence we decompose $T_0^c$ into disjoint sets, $T_0^c = T_1 \cup T_2 \cup \dots \cup T_l$. Using (A7), it yields
(A8) $\sum_{i\ge2} \|\gamma_{T_i}\|_{\ell_2} \le (4s)^{-1/2}\sum_{i\ge1} \|\gamma_{T_i}\|_{\ell_1} = (4s)^{-1/2}\,\|\gamma_{T_0^c}\|_{\ell_1} \le \dfrac{1-\kappa_0}{2\kappa_0\sqrt s}\,\|\gamma_{T_0}\|_{\ell_1}.$
Using RIP($\theta_{5s}, 5s$) and (A8), it follows that
$\|X\gamma\|_{\ell_2} \ge \|X(\gamma_{T_0\cup T_1})\|_{\ell_2} - \sum_{i\ge2} \|X(\gamma_{T_i})\|_{\ell_2} \ge \sqrt{1-\theta_{5s}}\,\|\gamma_{T_0\cup T_1}\|_{\ell_2} - \sqrt{1+\theta_{5s}}\,\sum_{i\ge2}\|\gamma_{T_i}\|_{\ell_2}$
$\ge \Big[\sqrt{1-\theta_{5s}} - \dfrac{(1-\kappa_0)\sqrt{1+\theta_{5s}}}{2\kappa_0}\Big]\,\dfrac{\|\gamma_{T_0}\|_{\ell_1}}{\sqrt s},$
using $\|\gamma_{T_0\cup T_1}\|_{\ell_2} \ge \|\gamma_{T_0}\|_{\ell_2} \ge \|\gamma_{T_0}\|_{\ell_1}/\sqrt s$. The lower bound on $\kappa_0$ shows that the right-hand side is positive. Observe that we took $\Delta$ such that this latter is exactly $\|\gamma_{T_0}\|_{\ell_1}/(\Delta\sqrt s)$. Eventually, we get $\|\gamma_{T_0}\|_{\ell_1} \le \Delta\sqrt s\,\|X\gamma\|_{\ell_2} \le \Delta\sqrt s\,\|X\gamma\|_{\ell_2} + \kappa_0\,\|\gamma\|_{\ell_1}$. This ends the proofs.

Acknowledgments. The author would like to thank Jean-Marc Azaïs and Franck Barthe for their support, and the anonymous reviewers for their valuable comments and suggestions.

References

[BLPR11] K. Bertin, E. Le Pennec and V. Rivoirard. Adaptive Dantzig density estimation. Ann. Inst. Henri Poincaré Probab. Stat. 47 (2011).
[BRT09] P. J. Bickel, Y. Ritov and A. B. Tsybakov. Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 (2009).
[CDD09] A. Cohen, W. Dahmen and R. DeVore. Compressed sensing and best k-term approximation. J. Amer. Math. Soc. 22 (2009).
[CP09] E. J. Candès and Y. Plan. Near-ideal model selection by l1 minimization. Ann. Statist. 37 (2009).
[CRT06] E. J. Candès, J. K. Romberg and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Comm. Pure Appl. Math. 59 (2006).
[CT07] E. J. Candès and T. Tao. The Dantzig selector: statistical estimation when p is much larger than n. Ann. Statist. 35 (2007).
[DET06] D. L. Donoho, M. Elad and V. N. Temlyakov. Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inform. Theory 52 (2006).
[Don06] D. L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory 52 (2006).
[GLR08] V. Guruswami, J. R. Lee and A. Razborov. Almost Euclidean subspaces of l1^N via expander codes. Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, Society for Industrial and Applied Mathematics, 2008.
[HTF09] T. Hastie, R. Tibshirani and J. Friedman. High-dimensional problems: p >> n. The Elements of Statistical Learning (2009).
[Ind07] P. Indyk. Uncertainty principles, extractors, and explicit embeddings of l2 into l1. Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing, ACM, 2007.
[IS10] P. Indyk and S. Szarek. Almost-Euclidean subspaces of l1^N via tensor products: a simple approach to randomness reduction. Approximation, Randomization, and Combinatorial Optimization: Algorithms and Techniques (2010).
[JN10] A. Juditsky and A. Nemirovski. Accuracy guarantees for l1-recovery.
[Kas77] B. Kashin. The widths of certain finite-dimensional sets and classes of smooth functions. Izv. Akad. Nauk SSSR Ser. Mat. 41 (1977).
[KT07] B. Kashin and V. N. Temlyakov. A remark on the problem of compressed sensing. Mat. Zametki 82 (2007).
[Sid68] Z. Šidák. On multivariate normal probabilities of rectangles: their dependence on correlations. Ann. Math. Statist. 39 (1968).
[Tib96] R. Tibshirani. Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 (1996).
[vdGB09] S. van de Geer and P. Bühlmann. On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 3 (2009).
[ZY06] P. Zhao and B. Yu. On model selection consistency of Lasso. J. Mach. Learn. Res. 7 (2006).

Yohann de Castro is with the Département de Mathématiques, Université Paris-Sud, Faculté des Sciences d'Orsay, 91405 Orsay, France. This article was written mostly during his PhD at the Institut de Mathématiques de Toulouse.
E-mail address: yohann.decastro@math.u-psud.fr
