A REMARK ON THE LASSO AND THE DANTZIG SELECTOR
YOHANN DE CASTRO

Abstract. This article investigates a new parameter for the high-dimensional regression with noise: the distortion. This latter has attracted a lot of attention recently with the appearance of new deterministic constructions of almost-Euclidean sections of the ℓ1-ball. It measures how far the intersection between the kernel of the design matrix and the unit ℓ1-ball is from an ℓ2-ball. We show that the distortion holds enough information to derive oracle inequalities (i.e. a comparison to an ideal situation where one knows the s largest coefficients of the target) for the lasso and the Dantzig selector.

Introduction

In the past decade, much emphasis has been put on recovering a large number of unknown variables from few noisy observations. Consider the high-dimensional linear model where one observes a vector y ∈ R^n such that

    y = Xβ* + ε,

where X ∈ R^{n×p} is called the design matrix (known from the experimenter), β* ∈ R^p is an unknown target vector one would like to recover, and ε ∈ R^n is a stochastic error term that contains all the perturbations of the experiment. A standard hypothesis in high-dimensional regression [HTF09] requires that one can provide a constant λ0_n ∈ R, as small as possible, such that

(1)    ‖X^⊤ε‖_{ℓ∞} ≤ λ0_n

with an overwhelming probability, where X^⊤ ∈ R^{p×n} denotes the transpose matrix of X. In the case of the n-multivariate Gaussian distribution, it is known that λ0_n = O(σ_n √(log p)), where σ_n > 0 denotes the standard deviation of the noise; see Lemma A.1. Suppose that you have far fewer observed variables y_i than unknown variables β*_i. For instance, let us mention the Compressed Sensing problem [Don06, CRT06], where one would like to simultaneously acquire and compress a signal using few (non-adaptive) linear measurements, i.e. n ≪ p. In general terms, we are interested in accurately estimating the target vector β* and the response Xβ* from few and corrupted observations. During the past decade, this challenging issue has attracted a lot of
attention among the statistical community. In 1996, R. Tibshirani introduced the lasso [Tib96]:

(2)    β̂_l ∈ arg min_{β ∈ R^p} { (1/2) ‖Xβ − y‖²_{ℓ2} + λ_l ‖β‖_{ℓ1} },

where λ_l > 0 denotes a tuning parameter. Two decades later, this estimator continues to play a key role in our understanding of high-dimensional inverse problems. Its popularity might be due to the fact that this estimator is computationally tractable. Indeed, the lasso can be recast

Date: September 8, 2011.
Key words and phrases. Lasso; Dantzig selector; Oracle inequality; Almost-Euclidean section; Distortion.
in a Second Order Cone Program (SOCP) that can be solved using an interior point method. Recently, E.J. Candès and T. Tao [CT07] have introduced the Dantzig selector as

(3)    β̂_d ∈ arg min_{β ∈ R^p} ‖β‖_{ℓ1}  such that  ‖X^⊤(y − Xβ)‖_{ℓ∞} ≤ λ_d,

where λ_d > 0 is a tuning parameter. It is known that it can be recast as a linear program, hence it is also computationally tractable. A great statistical challenge is then to find efficiently verifiable conditions on X ensuring that the lasso (2) or the Dantzig selector (3) would recover most of the information about the target vector β*.

Our goal. What do we precisely mean by "most of the information about the target"? What is the amount of information one could recover from few observations? These are two of the important questions raised by Compressed Sensing. Suppose that you want to find an s-sparse vector (i.e. a vector with at most s non-zero coefficients) that represents the target; then you would probably want it to contain the s largest (in magnitude) coefficients β*_i. More precisely, denote by S ⊆ {1,…,p} the set of the indices of the s largest coefficients. The s-best term approximation vector is β*_S ∈ R^p, where (β*_S)_i = β*_i if i ∈ S and 0 otherwise. Observe that it is the s-sparse projection with respect to any ℓq-norm for 1 ≤ q < +∞ (i.e. it minimizes the ℓq-distance to β* among all the s-sparse vectors), and hence the most natural approximation by an s-sparse vector. Suppose that someone gives you all the keys to recover β*_S. More precisely, imagine that you know the subset S in advance and that you observe

    y_oracle = Xβ*_S + ε.

This is an ideal situation, referred to as the oracle case. Assume that the noise ε is a Gaussian white noise of standard deviation σ_n, i.e. ε ∼ N_n(0, σ_n² Id_n), where N_n denotes the n-multivariate Gaussian distribution. Then the optimal estimator is the ordinary least squares β̂_ideal ∈ R^p on the subset S, namely:

    β̂_ideal ∈ arg min_{β ∈ R^p, Supp(β) ⊆ S} ‖Xβ − y_oracle‖_{ℓ2},

where Supp(β) ⊆ {1,…,p} denotes the support (i.e. the set of the indices of the non-zero coefficients) of the vector β. It holds
    ‖β̂_ideal − β*‖_{ℓ1} ≤ ‖β̂_ideal − β*_S‖_{ℓ1} + ‖β*_{S^c}‖_{ℓ1},

where β*_{S^c} = β* − β*_S denotes the error vector of the s-best term approximation. A calculation of the solution of the least squares estimator shows that:

    E ‖β̂_ideal − β*_S‖²_{ℓ2} = E ‖(X_S^⊤ X_S)^{-1} X_S^⊤ y_oracle − β*_S‖²_{ℓ2} = E ‖(X_S^⊤ X_S)^{-1} X_S^⊤ ε‖²_{ℓ2} = Trace((X_S^⊤ X_S)^{-1}) σ_n² ≤ (1/ρ_S²) σ_n² s,

where X_S ∈ R^{n×s} denotes the matrix composed of the columns X_i ∈ R^n of the matrix X such that i ∈ S, and ρ_S is the smallest singular value of X_S. It yields that

    E[ ‖β̂_ideal − β*_S‖_{ℓ1} ] ≤ √s · E[ ‖β̂_ideal − β*_S‖²_{ℓ2} ]^{1/2} ≤ σ_n s / ρ_S.

In a nutshell, the ℓ1-distance between the target β* and the optimal estimator β̂_ideal can reasonably be said to be of the order of

(4)    (σ_n / ρ_S) s + ‖β*_{S^c}‖_{ℓ1}.
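The oracle identity used above can be checked numerically. The sketch below is a minimal Monte Carlo illustration (the sizes, seed and target are assumptions chosen for the example, not taken from the paper): it verifies that the mean squared ℓ2-error of least squares on the known support S matches Trace((X_S^⊤X_S)^{-1})σ_n².

```python
import numpy as np

# Monte Carlo check of the oracle identity
#   E || b_ideal - b*_S ||_2^2 = Trace((X_S^T X_S)^{-1}) * sigma^2,
# where b_ideal is ordinary least squares on the known support S.
rng = np.random.default_rng(1)
n, s, sigma = 50, 4, 0.3
X_S = rng.standard_normal((n, s))    # columns of X indexed by S
beta_S = rng.standard_normal(s)      # s-best term approximation, restricted to S
theory = np.trace(np.linalg.inv(X_S.T @ X_S)) * sigma**2

errs = []
for _ in range(2000):
    y_oracle = X_S @ beta_S + sigma * rng.standard_normal(n)
    b_hat, *_ = np.linalg.lstsq(X_S, y_oracle, rcond=None)
    errs.append(np.sum((b_hat - beta_S) ** 2))
mc = float(np.mean(errs))
print(mc, theory)  # the two values agree up to Monte Carlo error
```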
In this article, we say that the lasso satisfies a variable selection oracle inequality of order s if and only if its ℓ1-distance to the target, namely ‖β̂_l − β*‖_{ℓ1}, is bounded by (4) up to a satisfactory multiplicative factor. In some situations it could be interesting to have a good approximation of Xβ*. In the oracle case we have

    ‖Xβ̂_ideal − Xβ*‖_{ℓ2} ≤ ‖Xβ̂_ideal − Xβ*_S‖_{ℓ2} + ‖Xβ*_{S^c}‖_{ℓ2} ≤ ‖Xβ̂_ideal − Xβ*_S‖_{ℓ2} + ρ1 ‖β*_{S^c}‖_{ℓ2},

where ρ1 denotes the largest singular value of X. An easy calculation gives that

    E ‖Xβ̂_ideal − Xβ*_S‖²_{ℓ2} = Trace( X_S (X_S^⊤ X_S)^{-1} X_S^⊤ ) σ_n² = σ_n² s.

Hence a tolerable upper bound is given by

(5)    σ_n √s + ρ1 ‖β*_{S^c}‖_{ℓ2}.

We say that the lasso satisfies an error prediction oracle inequality of order s if and only if its prediction error is upper bounded by (5) up to a satisfactory multiplicative factor (say logarithmic in p).

Framework. In this article we investigate designs with known distortion. We begin with the definition of this latter:

Definition. A subspace Γ ⊆ R^p has a distortion δ ≥ 1 if and only if

    ∀x ∈ Γ,  ‖x‖_{ℓ1} ≤ √p ‖x‖_{ℓ2} ≤ δ ‖x‖_{ℓ1}.

A long-standing issue in approximation theory in Banach spaces is to find almost-Euclidean sections of the unit ℓ1-ball, i.e. subspaces with a distortion δ close to 1 and a dimension close to p. In particular, we recall that it has been established [Kas77] that, with an overwhelming probability, a random subspace of dimension p − n (with respect to the Haar measure on the Grassmannian) satisfies

(6)    δ ≤ C ( p (1 + log(p/n)) / n )^{1/2},

where C > 0 is a universal constant. In other words, it was shown that for all n ≤ p there exists a subspace Γ_n of dimension p − n such that for all x ∈ Γ_n,

    ‖x‖_{ℓ2} ≤ C ( (1 + log(p/n)) / n )^{1/2} ‖x‖_{ℓ1}.

Remark. Hence our framework also covers unitarily invariant random matrices, for instance matrices with i.i.d. Gaussian entries. Observe that their distortion satisfies (6).

Recently, new deterministic constructions of almost-Euclidean sections of the ℓ1-ball have been given. Most of them can be viewed as related to the context of error-correcting codes. Indeed, the construction of [Ind07]
is based on amplifying the minimum distance of a code using expanders, while the construction of [GLR08] is based on Low-Density Parity Check (LDPC) codes. Finally, the construction of [IS10] is related to the tensor product of error-correcting codes. The main reason for this surprising fact is that the vectors of a subspace with low distortion must be well-spread, i.e. a small subset of their coordinates cannot contain most of their ℓ1-norm (cf. [Ind07, GLR08]). This property is required of a good error-correcting code, where the weight (i.e. the ℓ0-norm) of each codeword cannot be concentrated on a small subset of its coordinates. Similarly, this property was intensively studied in Compressed Sensing; see for instance the Nullspace Property in [CDD09].
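The distortion of a given kernel is hard to compute exactly, but it is easy to lower-bound by sampling: δ is the smallest constant with √p‖x‖_{ℓ2} ≤ δ‖x‖_{ℓ1} on the kernel, so the maximum of √p‖x‖_{ℓ2}/‖x‖_{ℓ1} over sampled kernel vectors is a lower bound on δ. A minimal sketch, where the Gaussian design and the sizes are illustrative assumptions (not a construction from the paper):

```python
import numpy as np

# Empirical lower bound on the distortion delta of ker(X).
rng = np.random.default_rng(2)
n, p = 60, 240
X = rng.standard_normal((n, p))

# Orthonormal basis of ker(X): the last p - n right singular vectors.
_, _, Vt = np.linalg.svd(X)
kernel_basis = Vt[n:]                 # shape (p - n, p)

ratios = []
for _ in range(5000):
    x = rng.standard_normal(p - n) @ kernel_basis   # a random kernel vector
    ratios.append(np.sqrt(p) * np.linalg.norm(x) / np.abs(x).sum())
delta_lb = max(ratios)
print(delta_lb)  # Kashin's bound (6) suggests O(sqrt((p/n)(1 + log(p/n))))
```

By Cauchy-Schwarz the sampled ratio always lies between 1 and √p, so the printed value is a genuine (if crude) lower bound on δ.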
Remark. The main point of this article is that all of these deterministic constructions give efficient designs for the lasso and the Dantzig selector.

The Universal Distortion Property

In the past decade, numerous conditions have been given to prove oracle inequalities for the lasso and the Dantzig selector. An overview of important conditions can be found in [vdGB09]. We introduce a new condition, the Universal Distortion Property (UDP).

Definition (UDP(s0, κ0, Δ)). Given 1 ≤ s0 ≤ p, 0 < κ0 < 1/2 and Δ > 0, we say that a matrix X ∈ R^{n×p} satisfies the universal distortion condition of order s0, magnitude κ0 and parameter Δ if and only if for all γ ∈ R^p, for all integers s ∈ {1,…,s0}, for all subsets S ⊆ {1,…,p} such that |S| = s (γ_S denoting the restriction of γ to S), it holds

(7)    ‖γ_S‖_{ℓ1} ≤ Δ √s ‖Xγ‖_{ℓ2} + κ0 ‖γ‖_{ℓ1}.

Remark. Observe that the design X is not normalized. Equation (8) in the theorem below shows that Δ can depend on the inverse of the smallest singular value of X; hence the quantity Δ‖Xγ‖_{ℓ2} is scale invariant.

The UDP condition is similar to the Magic Condition [BLPR11] and the Compatibility Condition [vdGB09]. The main point of this article is that UDP is verifiable as soon as one can give an upper bound on the distortion of the kernel of the design matrix; see the theorem below. Hence, instead of proving that a sufficient condition (such as RIP [CRT06], REC [BRT09], Compatibility [vdGB09], …) holds, it is sufficient to compute the distortion and the smallest singular value of the design. This matters especially as these conditions can be hard to prove for a given matrix: we recall that an open problem is to find a computationally efficient algorithm that can tell whether a given matrix satisfies the RIP condition [CRT06] or not. We call the property Universal Distortion because it is satisfied by all full rank matrices (Universal) and the parameters s0 and Δ can be expressed in terms of the distortion of the kernel Γ of X:

Theorem. Let X ∈ R^{n×p} be a full rank matrix. Denote by δ the distortion of its kernel:

    δ = sup_{γ ∈ ker(X), γ ≠ 0} √p ‖γ‖_{ℓ2} / ‖γ‖_{ℓ1},

and by ρ_n its smallest singular value. Then for all γ ∈ R^p,

    ‖γ‖_{ℓ2} ≤ (δ/√p) ‖γ‖_{ℓ1} + (2δ/ρ_n) ‖Xγ‖_{ℓ2}.
Equivalently, we have B := { γ ∈ R^p : (δ/√p)‖γ‖_{ℓ1} + (2δ/ρ_n)‖Xγ‖_{ℓ2} ≤ 1 } ⊆ B^p_{ℓ2}, where B^p_{ℓ2} denotes the Euclidean unit ball in R^p. This result implies that every full rank matrix satisfies UDP, with parameters described as follows.

Theorem. Let X ∈ R^{n×p} be a full rank matrix. Denote by δ the distortion of its kernel and by ρ_n its smallest singular value. Let 0 < κ0 < 1/2; then X satisfies UDP(s0, κ0, Δ) where

(8)    s0 = ⌊(κ0/δ)² p⌋  and  Δ = 2δ/ρ_n.

This theorem is sharp in the following sense. The parameter s0 represents (see Theorem 1) the maximum number of coefficients that can be recovered using the lasso; we call it the sparsity
level. It is known [CDD09] that the best bound one could expect is s_opt := n/log(p/n), up to a multiplicative constant. In the case where (6) holds, the sparsity level satisfies

(9)    s0 ≥ (κ0/C)² s_opt.

It shows that any design matrix with low distortion satisfies UDP with an optimal sparsity level.

Oracle inequalities

The results presented here fall into two parts. In the first part we assume only that UDP holds; in particular, it is not excluded that one can get better upper bounds on the parameters than the ones given in the previous section. As a matter of fact, the smaller Δ is, the sharper the oracle inequalities are. Then we give oracle inequalities in terms of the distortion of the design only.

Theorem 1. Let X ∈ R^{n×p} be a full rank matrix.
- Assume that X satisfies UDP(s0, κ0, Δ) and that (1) holds. Then for any

(10)    λ_l > λ0_n / (1 − 2κ0),

it holds

(11)    ‖β̂_l − β*‖_{ℓ1} ≤ 2 (1 − λ0_n/λ_l − 2κ0)^{-1} min_{S ⊆ {1,…,p}, |S| = s, s ≤ s0} ( Δ² λ_l s + ‖β*_{S^c}‖_{ℓ1} ).

- For every full rank matrix X ∈ R^{n×p}, for all 0 < κ0 < 1/2 and λ_l satisfying (10), we have

(12)    ‖β̂_l − β*‖_{ℓ1} ≤ 2 (1 − λ0_n/λ_l − 2κ0)^{-1} min_{S ⊆ {1,…,p}, |S| = s, s ≤ (κ0/δ)² p} ( 4δ²λ_l s/ρ_n² + ‖β*_{S^c}‖_{ℓ1} ),

where ρ_n denotes the smallest singular value of X and δ the distortion of its kernel.

Consider the case where the noise satisfies the hypothesis of Lemma A.1, and take λ0_n = λ0_n(1). Assume that κ0 is constant (say κ0 = 1/3) and take λ_l = 3λ0_n; then (11) becomes

    ‖β̂_l − β*‖_{ℓ1} ≲ min_{S ⊆ {1,…,p}, |S| = s, s ≤ s0} ( Δ² ‖X‖_{ℓ∞} √(log p) σ_n s + ‖β*_{S^c}‖_{ℓ1} ),

which is an oracle inequality up to a multiplicative factor √(log p). In the same way, (12) becomes

    ‖β̂_l − β*‖_{ℓ1} ≲ min_{S ⊆ {1,…,p}, |S| = s, s ≤ p/(9δ²)} ( ‖X‖_{ℓ∞} (δ²/ρ_n²) √(log p) σ_n s + ‖β*_{S^c}‖_{ℓ1} ),

which is an oracle inequality up to a multiplicative factor C_mult := δ² √(log p) / ρ_n². In the optimal case (6), this latter becomes:

(13)    C_mult = C² ( p (1 + log(p/n)) / n ) √(log p) / ρ_n²,

where C > 0 is the same universal constant as in (6). Roughly speaking, up to a factor of the order of (13), the lasso is as good as the oracle that knows the s0-best term approximation of the target. Moreover, as mentioned in (9), s0 is an optimal sparsity level. However, this multiplicative constant takes small
values for a restrictive range of the parameter n. As a matter of fact, it is meaningful when n is a constant fraction of p. Similarly, we show oracle inequalities in error prediction in terms of the distortion of the kernel of the design.
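The regime just discussed (a sparse target, λ_l proportional to λ0_n) can be illustrated with a small simulation. Everything below is an illustrative assumption, not the paper's procedure: the sizes, the seed, the signal strength, and the choice of a proximal-gradient (ISTA) solver for the lasso program (2).

```python
import numpy as np

# Illustrative lasso experiment: sparse target, Gaussian design,
# lasso (2) solved by proximal gradient descent (ISTA).
rng = np.random.default_rng(4)
n, p, s, sigma = 100, 200, 5, 0.1
X = rng.standard_normal((n, p))
beta_star = np.zeros(p)
beta_star[:s] = 5.0
y = X @ beta_star + sigma * rng.standard_normal(n)

# lambda_l ~ 3 * lambda0_n, with column norms of order sqrt(n)
lam = 3 * sigma * np.sqrt(n) * np.sqrt(2 * np.log(p))
L = np.linalg.norm(X, 2) ** 2        # Lipschitz constant of the smooth part

def objective(b):
    return 0.5 * np.sum((X @ b - y) ** 2) + lam * np.abs(b).sum()

b = np.zeros(p)
for _ in range(2000):
    b = b - (X.T @ (X @ b - y)) / L              # gradient step
    b = np.sign(b) * np.maximum(np.abs(b) - lam / L, 0.0)  # soft threshold

print(np.abs(b - beta_star).sum(), np.abs(beta_star).sum())
# l1 error of the lasso vs. that of the zero estimator
```

With a step size 1/L, ISTA decreases the objective monotonically from the zero initialization, and in this well-conditioned regime the lasso's ℓ1 error is far below ‖β*‖_{ℓ1}, the error of the trivial zero estimator.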
Theorem 2. Let X ∈ R^{n×p} be a full rank matrix.
- Assume that X satisfies UDP(s0, κ0, Δ) and that (1) holds. Then for any λ_l as in (10), it holds

(14)    ‖Xβ̂_l − Xβ*‖_{ℓ2} ≤ min_{S ⊆ {1,…,p}, |S| = s, s ≤ s0} [ 4λ_l Δ √s + ‖β*_{S^c}‖_{ℓ1} / (Δ√s) ].

- For every full rank matrix X ∈ R^{n×p}, for all 0 < κ0 < 1/2 and λ_l satisfying (10), we have

(15)    ‖Xβ̂_l − Xβ*‖_{ℓ2} ≤ min_{S ⊆ {1,…,p}, |S| = s, s ≤ (κ0/δ)² p} [ 8λ_l δ√s/ρ_n + ρ_n ‖β*_{S^c}‖_{ℓ1} / (2δ√s) ],

where ρ_n denotes the smallest singular value of X and δ the distortion of its kernel.

Consider the case where the noise satisfies the hypothesis of Lemma A.1, and take λ0_n = λ0_n(1). Assume that κ0 is constant (say κ0 = 1/3) and take λ_l = 3λ0_n; then (14) becomes

    ‖Xβ̂_l − Xβ*‖_{ℓ2} ≤ min_{S ⊆ {1,…,p}, |S| = s, s ≤ s0} [ 24 Δ ‖X‖_{ℓ∞} √(log p) σ_n √s + ‖β*_{S^c}‖_{ℓ1} / (Δ√s) ],

which is not an oracle inequality stricto sensu because of the factor 1/(Δ√s) in the second term. As a matter of fact, it tends to lower the s-best term approximation term ‖β*_{S^c}‖_{ℓ1}. Nevertheless, it is almost an oracle inequality, up to a multiplicative factor of the order of √(log p). In the same way, (15) becomes

    ‖Xβ̂_l − Xβ*‖_{ℓ2} ≤ min_{S ⊆ {1,…,p}, |S| = s, s ≤ p/(9δ²)} [ 48 ‖X‖_{ℓ∞} (δ/ρ_n) √(log p) σ_n √s + ρ_n ‖β*_{S^c}‖_{ℓ1} / (2δ√s) ],

which is an oracle inequality up to a multiplicative factor C_mult := δ √(log p) / ρ_n. In the optimal case (6), this latter becomes:

(16)    C_mult = C √( p log p (1 + log(p/n)) / n ) / ρ_n,

where C > 0 is the same universal constant as in (6).

Results for the Dantzig selector. Similarly, we derive the same results for the Dantzig selector; the only difference is that the parameter κ0 must be less than 1/4. Here again the results fall into two parts: in the first one we only assume that UDP holds, and in the second we invoke the theorem of the previous section to derive results in terms of the distortion of the design.

Theorem 3. Let X ∈ R^{n×p} be a full rank matrix.
- Assume that X satisfies UDP(s0, κ0, Δ) with κ0 < 1/4 and that (1) holds. Then for any

(17)    λ_d > λ0_n / (1 − 4κ0),

it holds

(18)    ‖β̂_d − β*‖_{ℓ1} ≤ 4 (1 − λ0_n/λ_d − 4κ0)^{-1} min_{S ⊆ {1,…,p}, |S| = s, s ≤ s0} ( Δ² λ_d s + ‖β*_{S^c}‖_{ℓ1} ).
- For every full rank matrix X ∈ R^{n×p}, for all 0 < κ0 < 1/4 and λ_d satisfying (17), we have

(19)    ‖β̂_d − β*‖_{ℓ1} ≤ 4 (1 − λ0_n/λ_d − 4κ0)^{-1} min_{S ⊆ {1,…,p}, |S| = s, s ≤ (κ0/δ)² p} ( 4δ²λ_d s/ρ_n² + ‖β*_{S^c}‖_{ℓ1} ),

where ρ_n denotes the smallest singular value of X and δ the distortion of its kernel.

The prediction error is given by the following theorem.

Theorem 4. Let X ∈ R^{n×p} be a full rank matrix.
- Assume that X satisfies UDP(s0, κ0, Δ) with κ0 < 1/4 and that (1) holds. Then for any λ_d as in (17), it holds

(20)    ‖Xβ̂_d − Xβ*‖_{ℓ2} ≤ min_{S ⊆ {1,…,p}, |S| = s, s ≤ s0} [ 4λ_d Δ √s + ‖β*_{S^c}‖_{ℓ1} / (Δ√s) ].

- For every full rank matrix X ∈ R^{n×p}, for all 0 < κ0 < 1/4 and λ_d satisfying (17), we have

(21)    ‖Xβ̂_d − Xβ*‖_{ℓ2} ≤ min_{S ⊆ {1,…,p}, |S| = s, s ≤ (κ0/δ)² p} [ 8λ_d δ√s/ρ_n + ρ_n ‖β*_{S^c}‖_{ℓ1} / (2δ√s) ],

where ρ_n denotes the smallest singular value of X and δ the distortion of its kernel. Observe that the same comments as in the lasso case (e.g. (13) and (16)) hold. Eventually, every deterministic construction of almost-Euclidean sections gives a design that satisfies the oracle inequalities above.

An overview of the standard results

Oracle inequalities for the lasso and the Dantzig selector have been established under a variety of different conditions on the design. In this section we show that the UDP condition is comparable to the standard conditions (RIP, REC and Compatibility) and that our results are relevant in the literature on high-dimensional regression.

The standard conditions. We recall some sufficient conditions here. For all s ∈ {1,…,p}, we denote by Σ_s ⊆ R^p the set of all the s-sparse vectors.

Restricted Isometry Property: A matrix X ∈ R^{n×p} satisfies RIP(θ_S) if and only if there exists 0 < θ_S < 1 (as small as possible) such that for all s ∈ {1,…,S}, for all γ ∈ Σ_s, it holds

    (1 − θ_S) ‖γ‖²_{ℓ2} ≤ ‖Xγ‖²_{ℓ2} ≤ (1 + θ_S) ‖γ‖²_{ℓ2}.

The constant θ_S is called the S-restricted isometry constant.

Restricted Eigenvalue Assumption [BRT09]: A matrix X satisfies RE(s, c0) if and only if

    κ(s, c0) := min_{S ⊆ {1,…,p}, |S| ≤ s} min_{γ ≠ 0, ‖γ_{S^c}‖_{ℓ1} ≤ c0 ‖γ_S‖_{ℓ1}} ‖Xγ‖_{ℓ2} / ‖γ_S‖_{ℓ2} > 0.

The constant κ(s, c0) is called the (s, c0)-restricted ℓ2-eigenvalue.
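Checking RIP for a given matrix requires examining every support of size s, which is precisely the open computational problem recalled earlier; random search over supports certifies only a lower bound on the isometry constant. A minimal Monte Carlo sketch (the normalization and sizes are illustrative assumptions):

```python
import numpy as np

# Random search gives a LOWER bound on the restricted isometry
# constant theta_s: the worst observed deviation of the extreme
# eigenvalues of X_S^T X_S from 1 over sampled supports S, |S| = s.
rng = np.random.default_rng(3)
n, p, s = 80, 200, 5
X = rng.standard_normal((n, p)) / np.sqrt(n)   # columns of unit norm in expectation

theta_lb = 0.0
for _ in range(1000):
    S = rng.choice(p, size=s, replace=False)
    sv = np.linalg.svd(X[:, S], compute_uv=False)
    theta_lb = max(theta_lb, abs(sv[0] ** 2 - 1.0), abs(sv[-1] ** 2 - 1.0))
print(theta_lb)
```

An exhaustive certificate would need all C(p, s) supports, which is why conditions computable from global quantities (the distortion and a singular value, as in UDP) are attractive.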
Compatibility Condition [vdGB09]: We say that a matrix X ∈ R^{n×p} satisfies Compatibility(s, c0) if and only if

(22)    φ(s, c0) := min_{S ⊆ {1,…,p}, |S| ≤ s} min_{γ ≠ 0, ‖γ_{S^c}‖_{ℓ1} ≤ c0 ‖γ_S‖_{ℓ1}} √s ‖Xγ‖_{ℓ2} / ‖γ_S‖_{ℓ1} > 0.

The constant φ(s, c0) is called the (s, c0)-restricted ℓ1-eigenvalue.

H Condition [JN10]: X ∈ R^{n×p} satisfies the H_s(κ) condition (with κ < 1/2) if and only if for all γ ∈ R^p and for all S ⊆ {1,…,p} such that |S| ≤ s, it holds

    ‖γ_S‖_{ℓ1} ≤ s λ̂ ‖Xγ‖_{ℓ2} + κ ‖γ‖_{ℓ1},

where λ̂ denotes the maximum of the ℓ2-norms of the columns of X.

Remark. The first term of the right-hand side (i.e. s λ̂ ‖Xγ‖_{ℓ2}) is greater than the first term of the right-hand side of the UDP condition (i.e. Δ√s‖Xγ‖_{ℓ2}); hence the H condition is weaker than the UDP condition. Nevertheless, the authors of [JN10] established limits of performance for their conditions: their stronger, verifiable condition (which implies H_s(1/3)) is feasible only in a severely restricted range of the sparsity parameter s. Notice that this is not the case for the UDP condition: the equality (9) shows that it is feasible for a large range of the sparsity parameter s. Moreover, a comparison of the two approaches is given in Table 1. Let us emphasize that the above description is not meant to be exhaustive; in particular, we do not mention the irrepresentable condition [ZY06], which ensures exact recovery of the support. The next proposition shows that the UDP condition is weaker than the RIP, RE and Compatibility conditions.

Proposition 3. Let X ∈ R^{n×p} be a full rank matrix; then the following holds:
- The RIP(θ_{5s}) condition with θ_{5s} < 1 implies UDP(s, κ0, Δ) for all pairs (κ0, Δ) such that

(23)    ( 1 + 2 √( (1−θ_{5s})/(1+θ_{5s}) ) )^{-1} < κ0 < 1/2  and  Δ = ( √(1−θ_{5s}) − ((1−κ0)/(2κ0)) √(1+θ_{5s}) )^{-1}.

- The RE(s, c0) condition implies UDP(s, 1/(1+c0), 1/κ(s, c0)).
- The Compatibility(s, c0) condition implies UDP(s, 1/(1+c0), 1/φ(s, c0)).

Remark. The point here is to show that the UDP condition is similar to the standard conditions of high-dimensional regression. For the sake of simplicity, we do not study whether the converse of Proposition 3 is true. As a matter of fact, the UDP, RE and Compatibility conditions are expressions
with the same flavor: they aim at controlling the eigenvalues of X on a cone:

    { γ ∈ R^p : ∃ S ⊆ {1,…,p}, |S| ≤ s, such that ‖γ_{S^c}‖_{ℓ1} ≤ c ‖γ_S‖_{ℓ1} },

where c > 0 is a tuning parameter.

The results. Table 1 shows that our results are similar to standard results in the literature.

Appendix

The appendix is devoted to the proofs of the different results of this paper.

Lemma A.1. Suppose that ε = (ε_i)_{i=1}^n is such that the ε_i's are i.i.d. with respect to a Gaussian distribution with mean zero and variance σ_n². Choose t > 0 and set

    λ0_n(t) = σ_n ‖X‖_{ℓ∞} √( 2(1+t) log p ),
    Reference       Condition   Risk                            Prediction
    [CP09]          RIP         ℓ1: σ s √(log p) + ℓ1           L2: σ √(s log p) + L2
    [BRT09]         REC         ℓ1: σ s √(log p) (*)            L2: σ √(s log p) (*)
    [vdGB09]        Comp        ℓ1: σ s √(log p) (*)            L2: σ √(s log p) (*)
    [JN10]          H           ℓ1: σ s √(log p) + ℓ1           No
    This article    UDP         ℓ1: σ s √(log p) + ℓ1           L2: σ √(s log p) + ℓ1/(Δ√s)

with notations given by:

    Location          ℓ1                       ℓ2                       L2
    Right-hand side   ‖β*_{S^c}‖_{ℓ1}         ‖β*_{S^c}‖_{ℓ2}          ‖Xβ̂_ideal − Xβ*‖_{ℓ2}
    Left-hand side    ‖β̂ − β*‖_{ℓ1}          ‖β̂ − β*‖_{ℓ2}           ‖Xβ̂ − Xβ*‖_{ℓ2}

Table 1. Comparison of results in risk and prediction for the lasso and the Dantzig selector. Observe that all the inequalities are satisfied with an overwhelming probability. The entries mean that the inequality holds up to a multiplicative factor that may depend on the parameters of the condition. The (*) notation means that the result is given for s-sparse targets. The β̂ notation represents the estimator (i.e. the lasso or the Dantzig selector). The parameters σ and p represent respectively the standard deviation of the noise and the dimension of the target vector β*.

where ‖X‖_{ℓ∞} denotes the maximum ℓ2-norm of the columns of X. Then

    P( ‖X^⊤ε‖_{ℓ∞} ≥ λ0_n(t) ) ≤ p^{−t} / √( π(1+t) log p ).

Proof of Lemma A.1. Observe that X^⊤ε ∼ N_p(0, σ_n² X^⊤X); hence, for j = 1,…,p, X_j^⊤ε ∼ N(0, σ_n² ‖X_j‖²_{ℓ2}). Using Šidák's inequality [Sid68], it yields

    P( ‖X^⊤ε‖_{ℓ∞} ≤ λ0_n ) ≥ P( ‖ε̃‖_{ℓ∞} ≤ λ0_n ) = ∏_{i=1}^p P( |ε̃_i| ≤ λ0_n ),

where the ε̃_i's are i.i.d. with respect to N(0, σ_n² ‖X‖²_{ℓ∞}). Denote by Φ and φ respectively the cumulative distribution function and the probability density function of the standard normal. Set θ = √(2(1+t) log p). It holds

    ∏_{i=1}^p P( |ε̃_i| ≤ λ0_n ) = P( |ε̃_1| ≤ λ0_n )^p = ( 2Φ(θ) − 1 )^p > ( 1 − 2φ(θ)/θ )^p,

using an integration by parts to get 1 − Φ(θ) < φ(θ)/θ. It yields that

    P( ‖X^⊤ε‖_{ℓ∞} ≥ λ0_n ) ≤ 2p φ(θ)/θ = p^{1−(1+t)} / √( π(1+t) log p ).

This concludes the proof.

Proof of the theorems of the UDP section. Consider the following singular value decomposition: X = U D A, where U ∈ R^{n×n} is such that UU^⊤ = Id_n,
D = Diag(ρ_1,…,ρ_n) is a diagonal matrix, where ρ_1 ≥ … ≥ ρ_n > 0 are the singular values of X, and A ∈ R^{n×p} is such that AA^⊤ = Id_n. We recall that the only assumption on the design is that it has full rank, which yields that ρ_n > 0. Let δ be the distortion of the kernel Γ of the design. Denote by π_Γ (resp. π_{Γ⊥}) the ℓ2-projection onto Γ (resp. Γ⊥). Let γ ∈ R^p; then γ = π_Γ(γ) + π_{Γ⊥}(γ). An easy calculation shows that π_{Γ⊥}(γ) = A^⊤Aγ. Let s ∈ {1,…,p} and let S ⊆ {1,…,p} be such that |S| = s. It holds

    ‖γ_S‖_{ℓ1} ≤ √s ‖γ‖_{ℓ2} ≤ √s ( ‖π_Γ(γ)‖_{ℓ2} + ‖π_{Γ⊥}(γ)‖_{ℓ2} )
              ≤ √s ( (δ/√p) ‖π_Γ(γ)‖_{ℓ1} + ‖A^⊤Aγ‖_{ℓ2} )
              ≤ √s ( (δ/√p) ( ‖γ‖_{ℓ1} + ‖π_{Γ⊥}(γ)‖_{ℓ1} ) + ‖Aγ‖_{ℓ2} )
              ≤ √s ( (δ/√p) ‖γ‖_{ℓ1} + (δ/√p) √p ‖A^⊤Aγ‖_{ℓ2} + ‖Aγ‖_{ℓ2} )
              = (√s δ/√p) ‖γ‖_{ℓ1} + (1 + δ) √s ‖Aγ‖_{ℓ2}
              ≤ (√s δ/√p) ‖γ‖_{ℓ1} + ((1 + δ)/ρ_n) √s ‖Xγ‖_{ℓ2}
              ≤ (√s δ/√p) ‖γ‖_{ℓ1} + (2δ/ρ_n) √s ‖Xγ‖_{ℓ2},

using the triangle inequality, the distortion of the kernel Γ, the bounds ‖π_{Γ⊥}(γ)‖_{ℓ1} ≤ √p ‖π_{Γ⊥}(γ)‖_{ℓ2} = √p ‖Aγ‖_{ℓ2} and ρ_n ‖Aγ‖_{ℓ2} ≤ ‖Xγ‖_{ℓ2}, and δ ≥ 1. Dividing the chain from its second term by √s proves the distortion theorem. Eventually, set κ0 = (√s0/√p) δ and Δ = 2δ/ρ_n. This ends the proof.

Proof of Theorem 1. We recall that λ0_n denotes an upper bound on the amplification of the noise; see (1). We begin with a standard result.

Lemma A.2. Let h = β̂_l − β* ∈ R^p and λ_l ≥ λ0_n. Then for all subsets S ⊆ {1,…,p} it holds

(A1)    (1/(4λ_l)) ‖Xh‖²_{ℓ2} + (1/2)(1 − λ0_n/λ_l) ‖h‖_{ℓ1} ≤ ‖h_S‖_{ℓ1} + ‖β*_{S^c}‖_{ℓ1}.

Proof. By optimality, we have

    (1/2) ‖Xβ̂_l − y‖²_{ℓ2} + λ_l ‖β̂_l‖_{ℓ1} ≤ (1/2) ‖Xβ* − y‖²_{ℓ2} + λ_l ‖β*‖_{ℓ1}.

It yields (1/2)‖Xh‖²_{ℓ2} ≤ ⟨X^⊤ε, h⟩ + λ_l ( ‖β*‖_{ℓ1} − ‖β̂_l‖_{ℓ1} ). Let S ⊆ {1,…,p}; we have

    ‖β*‖_{ℓ1} − ‖β̂_l‖_{ℓ1} ≤ ‖h_S‖_{ℓ1} − ‖h_{S^c}‖_{ℓ1} + 2 ‖β*_{S^c}‖_{ℓ1}  and  ⟨X^⊤ε, h⟩ ≤ λ0_n ‖h‖_{ℓ1},

using (1). It holds

    (1/2) ‖Xh‖²_{ℓ2} + (λ_l − λ0_n) ‖h_{S^c}‖_{ℓ1} ≤ (λ_l + λ0_n) ‖h_S‖_{ℓ1} + 2λ_l ‖β*_{S^c}‖_{ℓ1}.
Adding (λ_l − λ0_n)‖h_S‖_{ℓ1} on both sides and dividing by 2λ_l, we conclude the proof.

Using (7) and (A1), it follows that

(A2)    (1/(4λ_l)) ‖Xh‖²_{ℓ2} + (1/2)(1 − λ0_n/λ_l) ‖h‖_{ℓ1} ≤ Δ√s ‖Xh‖_{ℓ2} + κ0 ‖h‖_{ℓ1} + ‖β*_{S^c}‖_{ℓ1}.

It yields

    (1/2)(1 − λ0_n/λ_l − 2κ0) ‖h‖_{ℓ1} ≤ [ −(1/(4λ_l)) ‖Xh‖²_{ℓ2} + Δ√s ‖Xh‖_{ℓ2} ] + ‖β*_{S^c}‖_{ℓ1} ≤ Δ²λ_l s + ‖β*_{S^c}‖_{ℓ1},

using the fact that the polynomial x ↦ −(1/(4λ_l)) x² + Δ√s x is not greater than Δ²λ_l s. This concludes the proof.

Proof of Theorem 3. We begin with a standard result.

Lemma A.3. Let h = β̂_d − β* ∈ R^p and λ_d ≥ λ0_n. Then for all subsets S ⊆ {1,…,p} it holds

(A3)    (1/(4λ_d)) ‖Xh‖²_{ℓ2} + (1/4)(1 − λ0_n/λ_d) ‖h‖_{ℓ1} ≤ ‖h_S‖_{ℓ1} + ‖β*_{S^c}‖_{ℓ1}.

Proof. Recall that ‖X^⊤ε‖_{ℓ∞} ≤ λ0_n; it yields

    ‖X^⊤Xh‖_{ℓ∞} = ‖X^⊤( Xβ̂_d − y ) + X^⊤( y − Xβ* )‖_{ℓ∞} ≤ λ_d + λ0_n.

Hence we get

(A4)    ‖Xh‖²_{ℓ2} = ⟨X^⊤Xh, h⟩ ≤ (λ_d + λ0_n) ( ‖h_S‖_{ℓ1} + ‖h_{S^c}‖_{ℓ1} ).

Since λ0_n ≤ λ_d, β* is feasible; it yields ‖β̂_d‖_{ℓ1} ≤ ‖β*‖_{ℓ1}. Thus ‖(β̂_d)_{S^c}‖_{ℓ1} ≤ ‖h_S‖_{ℓ1} + ‖β*_{S^c}‖_{ℓ1}. Since ‖h_{S^c}‖_{ℓ1} ≤ ‖(β̂_d)_{S^c}‖_{ℓ1} + ‖β*_{S^c}‖_{ℓ1}, it yields

(A5)    ‖h_{S^c}‖_{ℓ1} ≤ ‖h_S‖_{ℓ1} + 2 ‖β*_{S^c}‖_{ℓ1}.

Combining (A4) + 2λ_d (A5), we get

    ‖Xh‖²_{ℓ2} + (λ_d − λ0_n) ‖h_{S^c}‖_{ℓ1} ≤ (3λ_d + λ0_n) ‖h_S‖_{ℓ1} + 4λ_d ‖β*_{S^c}‖_{ℓ1}.

Adding (λ_d − λ0_n)‖h_S‖_{ℓ1} on both sides and dividing by 4λ_d, we conclude the proof.

Using (7) and (A3), it follows that

(A6)    (1/(4λ_d)) ‖Xh‖²_{ℓ2} + (1/4)(1 − λ0_n/λ_d) ‖h‖_{ℓ1} ≤ Δ√s ‖Xh‖_{ℓ2} + κ0 ‖h‖_{ℓ1} + ‖β*_{S^c}‖_{ℓ1}.

It yields

    (1/4)(1 − λ0_n/λ_d − 4κ0) ‖h‖_{ℓ1} ≤ [ −(1/(4λ_d)) ‖Xh‖²_{ℓ2} + Δ√s ‖Xh‖_{ℓ2} ] + ‖β*_{S^c}‖_{ℓ1} ≤ Δ²λ_d s + ‖β*_{S^c}‖_{ℓ1},

using the fact that the polynomial x ↦ −(1/(4λ_d)) x² + Δ√s x is not greater than Δ²λ_d s. This concludes the proof.
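Theorem 3 controls the Dantzig selector (3), which, as noted in the introduction, can be recast as a linear program. A minimal sketch of that recast (the variable split and the choice of solver are illustrative assumptions, not code from the paper):

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(X, y, lam):
    """Dantzig selector (3): minimize ||b||_1 subject to
    ||X^T (y - X b)||_inf <= lam, written as a linear program
    via the split b = u - v with u, v >= 0."""
    n, p = X.shape
    G = X.T @ X
    Xty = X.T @ y
    c = np.ones(2 * p)                    # sum(u) + sum(v) = ||b||_1
    A_ub = np.block([[G, -G], [-G, G]])   # encodes |G(u - v) - X^T y| <= lam
    b_ub = np.concatenate([lam + Xty, lam - Xty])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    uv = res.x
    return uv[:p] - uv[p:]
```

For instance, with X the identity, the constraint reduces to ‖y − b‖_{ℓ∞} ≤ λ and the minimizer is the soft-thresholding of y at level λ.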
Proof of Theorem 2 and Theorem 4. Using (A2), we know that

    (1/(4λ_l)) ‖Xh‖²_{ℓ2} + (1/2)(1 − λ0_n/λ_l − 2κ0) ‖h‖_{ℓ1} ≤ Δ√s ‖Xh‖_{ℓ2} + ‖β*_{S^c}‖_{ℓ1}.

It follows that ‖Xh‖²_{ℓ2} ≤ 4λ_l Δ√s ‖Xh‖_{ℓ2} + 4λ_l ‖β*_{S^c}‖_{ℓ1}. This latter is of the form x² ≤ bx + c, which implies that x ≤ b + c/b. Hence

    ‖Xh‖_{ℓ2} ≤ 4λ_l Δ√s + ‖β*_{S^c}‖_{ℓ1} / (Δ√s).

The same analysis holds for Theorem 4, using (A6).

Proof of Proposition 3. One can check that RE(s, c0) implies UDP(s, 1/(1+c0), 1/κ(s, c0)) and that Compatibility(s, c0) implies UDP(s, 1/(1+c0), 1/φ(s, c0)). Assume that X satisfies RIP(θ_{5s}). Let γ ∈ R^p and T0 ⊆ {1,…,p} be such that |T0| = s. Choose a pair (κ0, Δ) as in (23). If ‖γ_{T0}‖_{ℓ1} ≤ κ0 ‖γ‖_{ℓ1}, then ‖γ_{T0}‖_{ℓ1} ≤ Δ√s ‖Xγ‖_{ℓ2} + κ0 ‖γ‖_{ℓ1} trivially. Suppose then that ‖γ_{T0}‖_{ℓ1} > κ0 ‖γ‖_{ℓ1}; then

(A7)    ‖γ_{T0^c}‖_{ℓ1} < ((1 − κ0)/κ0) ‖γ_{T0}‖_{ℓ1}.

Denote by T1 the set of the indices of the 4s largest coefficients (in absolute value) in T0^c, by T2 the set of the indices of the 4s largest coefficients in (T0 ∪ T1)^c, etc. Hence we decompose T0^c into disjoint sets: T0^c = T1 ∪ T2 ∪ … ∪ Tl. Using (A7), it yields

(A8)    Σ_{i ≥ 2} ‖γ_{Ti}‖_{ℓ2} ≤ (4s)^{−1/2} ‖γ_{T0^c}‖_{ℓ1} ≤ ((1 − κ0)/(2κ0 √s)) ‖γ_{T0}‖_{ℓ1}.

Using RIP(θ_{5s}) and (A8), it follows that

    ‖Xγ‖_{ℓ2} ≥ ‖X γ_{T0 ∪ T1}‖_{ℓ2} − Σ_{i ≥ 2} ‖X γ_{Ti}‖_{ℓ2}
             ≥ √(1 − θ_{5s}) ‖γ_{T0 ∪ T1}‖_{ℓ2} − √(1 + θ_{5s}) Σ_{i ≥ 2} ‖γ_{Ti}‖_{ℓ2}
             ≥ [ √(1 − θ_{5s}) − ((1 − κ0)/(2κ0)) √(1 + θ_{5s}) ] ‖γ_{T0}‖_{ℓ1} / √s.

The lower bound on κ0 shows that the right-hand side is positive. Observe that we took Δ such that this latter is exactly ‖γ_{T0}‖_{ℓ1} / (Δ√s). Eventually we get ‖γ_{T0}‖_{ℓ1} ≤ Δ√s ‖Xγ‖_{ℓ2} ≤ Δ√s ‖Xγ‖_{ℓ2} + κ0 ‖γ‖_{ℓ1}. This ends the proofs.

Acknowledgments. The author would like to thank Jean-Marc Azaïs and Franck Barthe for their support, and the anonymous reviewers for their valuable comments and suggestions.
References

[BLPR11] K. Bertin, E. Le Pennec and V. Rivoirard. Adaptive Dantzig density estimation. Ann. Inst. Henri Poincaré Probab. Stat. 47 (2011).
[BRT09] P.J. Bickel, Y. Ritov and A.B. Tsybakov. Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 (2009).
[CDD09] A. Cohen, W. Dahmen and R. DeVore. Compressed sensing and best k-term approximation. J. Amer. Math. Soc. 22 (2009).
[CP09] E.J. Candès and Y. Plan. Near-ideal model selection by ℓ1 minimization. Ann. Statist. 37 (2009), no. 5A.
[CRT06] E.J. Candès, J.K. Romberg and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Comm. Pure Appl. Math. 59 (2006).
[CT07] E.J. Candès and T. Tao. The Dantzig selector: statistical estimation when p is much larger than n. Ann. Statist. 35 (2007).
[DET06] D.L. Donoho, M. Elad and V.N. Temlyakov. Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inform. Theory 52 (2006).
[Don06] D.L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory 52 (2006).
[GLR08] V. Guruswami, J.R. Lee and A. Razborov. Almost Euclidean subspaces of ℓ1 via expander codes. Proceedings of the nineteenth annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, 2008.
[HTF09] T. Hastie, R. Tibshirani and J. Friedman. High-dimensional problems: p ≫ n. The Elements of Statistical Learning (2009).
[Ind07] P. Indyk. Uncertainty principles, extractors, and explicit embeddings of ℓ2 into ℓ1. Proceedings of the thirty-ninth annual ACM Symposium on Theory of Computing, ACM, 2007.
[IS10] P. Indyk and S. Szarek. Almost-Euclidean subspaces of ℓ1^n via tensor products: a simple approach to randomness reduction. Approximation, Randomization and Combinatorial Optimization: Algorithms and Techniques (2010).
[JN10] A. Juditsky and A. Nemirovski. Accuracy guarantees for ℓ1-recovery.
[Kas77] B. Kashin. The widths of certain finite-dimensional sets and classes of
smooth functions. Izv. Akad. Nauk SSSR Ser. Mat. 41 (1977).
[KT07] B. Kashin and V.N. Temlyakov. A remark on the problem of compressed sensing. Mat. Zametki 82 (2007).
[Sid68] Z. Šidák. On multivariate normal probabilities of rectangles: their dependence on correlations. Ann. Math. Statist. 39 (1968).
[Tib96] R. Tibshirani. Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 (1996).
[vdGB09] S.A. van de Geer and P. Bühlmann. On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 3 (2009).
[ZY06] P. Zhao and B. Yu. On model selection consistency of Lasso. J. Mach. Learn. Res. 7 (2006).

Yohann de Castro is with the Département de Mathématiques, Université Paris-Sud, Faculté des Sciences d'Orsay, 91405 Orsay, France. This article was written mostly during his PhD at the Institut de Mathématiques de Toulouse.
E-mail address: yohann.decastro@math.u-psud.fr
More informationGuaranteed Sparse Recovery under Linear Transformation
Ji Liu JI-LIU@CS.WISC.EDU Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA Lei Yuan LEI.YUAN@ASU.EDU Jieping Ye JIEPING.YE@ASU.EDU Department of Computer Science
More informationTractable performance bounds for compressed sensing.
Tractable performance bounds for compressed sensing. Alex d Aspremont, Francis Bach, Laurent El Ghaoui Princeton University, École Normale Supérieure/INRIA, U.C. Berkeley. Support from NSF, DHS and Google.
More informationLeast squares under convex constraint
Stanford University Questions Let Z be an n-dimensional standard Gaussian random vector. Let µ be a point in R n and let Y = Z + µ. We are interested in estimating µ from the data vector Y, under the assumption
More informationConvex relaxation for Combinatorial Penalties
Convex relaxation for Combinatorial Penalties Guillaume Obozinski Equipe Imagine Laboratoire d Informatique Gaspard Monge Ecole des Ponts - ParisTech Joint work with Francis Bach Fête Parisienne in Computation,
More informationAn Introduction to Sparse Approximation
An Introduction to Sparse Approximation Anna C. Gilbert Department of Mathematics University of Michigan Basic image/signal/data compression: transform coding Approximate signals sparsely Compress images,
More informationCS 229r: Algorithms for Big Data Fall Lecture 19 Nov 5
CS 229r: Algorithms for Big Data Fall 215 Prof. Jelani Nelson Lecture 19 Nov 5 Scribe: Abdul Wasay 1 Overview In the last lecture, we started discussing the problem of compressed sensing where we are given
More informationNecessary and Sufficient Conditions of Solution Uniqueness in 1-Norm Minimization
Noname manuscript No. (will be inserted by the editor) Necessary and Sufficient Conditions of Solution Uniqueness in 1-Norm Minimization Hui Zhang Wotao Yin Lizhi Cheng Received: / Accepted: Abstract This
More informationThe Sparsest Solution of Underdetermined Linear System by l q minimization for 0 < q 1
The Sparsest Solution of Underdetermined Linear System by l q minimization for 0 < q 1 Simon Foucart Department of Mathematics Vanderbilt University Nashville, TN 3784. Ming-Jun Lai Department of Mathematics,
More informationSTAT 200C: High-dimensional Statistics
STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57
More informationHigh-dimensional statistics: Some progress and challenges ahead
High-dimensional statistics: Some progress and challenges ahead Martin Wainwright UC Berkeley Departments of Statistics, and EECS University College, London Master Class: Lecture Joint work with: Alekh
More informationCompressed Sensing and Sparse Recovery
ELE 538B: Sparsity, Structure and Inference Compressed Sensing and Sparse Recovery Yuxin Chen Princeton University, Spring 217 Outline Restricted isometry property (RIP) A RIPless theory Compressed sensing
More informationCourse Notes for EE227C (Spring 2018): Convex Optimization and Approximation
Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Instructor: Moritz Hardt Email: hardt+ee227c@berkeley.edu Graduate Instructor: Max Simchowitz Email: msimchow+ee227c@berkeley.edu
More informationEuclidean sections of l N 1 with sublinear randomness and error-correction over the reals
Euclidean sections of l N 1 with sublinear randomness and error-correction over the reals Venkatesan Guruswami James R. Lee Avi Wigderson Abstract It is well-known that R N has subspaces of dimension proportional
More informationOn the l 1 -Norm Invariant Convex k-sparse Decomposition of Signals
On the l 1 -Norm Invariant Convex -Sparse Decomposition of Signals arxiv:1305.6021v2 [cs.it] 11 Nov 2013 Guangwu Xu and Zhiqiang Xu Abstract Inspired by an interesting idea of Cai and Zhang, we formulate
More informationAnalysis of Greedy Algorithms
Analysis of Greedy Algorithms Jiahui Shen Florida State University Oct.26th Outline Introduction Regularity condition Analysis on orthogonal matching pursuit Analysis on forward-backward greedy algorithm
More informationGeneralized Orthogonal Matching Pursuit- A Review and Some
Generalized Orthogonal Matching Pursuit- A Review and Some New Results Department of Electronics and Electrical Communication Engineering Indian Institute of Technology, Kharagpur, INDIA Table of Contents
More informationarxiv: v1 [math.st] 5 Oct 2009
On the conditions used to prove oracle results for the Lasso Sara van de Geer & Peter Bühlmann ETH Zürich September, 2009 Abstract arxiv:0910.0722v1 [math.st] 5 Oct 2009 Oracle inequalities and variable
More informationMath 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces.
Math 350 Fall 2011 Notes about inner product spaces In this notes we state and prove some important properties of inner product spaces. First, recall the dot product on R n : if x, y R n, say x = (x 1,...,
More informationPre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models
Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider variable
More information19.1 Problem setup: Sparse linear regression
ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 Lecture 19: Minimax rates for sparse linear regression Lecturer: Yihong Wu Scribe: Subhadeep Paul, April 13/14, 2016 In
More informationConditions for Robust Principal Component Analysis
Rose-Hulman Undergraduate Mathematics Journal Volume 12 Issue 2 Article 9 Conditions for Robust Principal Component Analysis Michael Hornstein Stanford University, mdhornstein@gmail.com Follow this and
More informationLinear Algebra Massoud Malek
CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product
More informationAverage-case hardness of RIP certification
Average-case hardness of RIP certification Tengyao Wang Centre for Mathematical Sciences Cambridge, CB3 0WB, United Kingdom t.wang@statslab.cam.ac.uk Quentin Berthet Centre for Mathematical Sciences Cambridge,
More informationA Practical Scheme and Fast Algorithm to Tune the Lasso With Optimality Guarantees
Journal of Machine Learning Research 17 (2016) 1-20 Submitted 11/15; Revised 9/16; Published 12/16 A Practical Scheme and Fast Algorithm to Tune the Lasso With Optimality Guarantees Michaël Chichignoud
More informationLecture Notes 9: Constrained Optimization
Optimization-based data analysis Fall 017 Lecture Notes 9: Constrained Optimization 1 Compressed sensing 1.1 Underdetermined linear inverse problems Linear inverse problems model measurements of the form
More informationSignal Recovery From Incomplete and Inaccurate Measurements via Regularized Orthogonal Matching Pursuit
Signal Recovery From Incomplete and Inaccurate Measurements via Regularized Orthogonal Matching Pursuit Deanna Needell and Roman Vershynin Abstract We demonstrate a simple greedy algorithm that can reliably
More informationSparsest Solutions of Underdetermined Linear Systems via l q -minimization for 0 < q 1
Sparsest Solutions of Underdetermined Linear Systems via l q -minimization for 0 < q 1 Simon Foucart Department of Mathematics Vanderbilt University Nashville, TN 3740 Ming-Jun Lai Department of Mathematics
More informationTHE LASSO, CORRELATED DESIGN, AND IMPROVED ORACLE INEQUALITIES. By Sara van de Geer and Johannes Lederer. ETH Zürich
Submitted to the Annals of Applied Statistics arxiv: math.pr/0000000 THE LASSO, CORRELATED DESIGN, AND IMPROVED ORACLE INEQUALITIES By Sara van de Geer and Johannes Lederer ETH Zürich We study high-dimensional
More informationSparsity in Underdetermined Systems
Sparsity in Underdetermined Systems Department of Statistics Stanford University August 19, 2005 Classical Linear Regression Problem X n y p n 1 > Given predictors and response, y Xβ ε = + ε N( 0, σ 2
More informationRecent Developments in Compressed Sensing
Recent Developments in Compressed Sensing M. Vidyasagar Distinguished Professor, IIT Hyderabad m.vidyasagar@iith.ac.in, www.iith.ac.in/ m vidyasagar/ ISL Seminar, Stanford University, 19 April 2018 Outline
More informationDS-GA 1002 Lecture notes 10 November 23, Linear models
DS-GA 2 Lecture notes November 23, 2 Linear functions Linear models A linear model encodes the assumption that two quantities are linearly related. Mathematically, this is characterized using linear functions.
More informationGREEDY SIGNAL RECOVERY REVIEW
GREEDY SIGNAL RECOVERY REVIEW DEANNA NEEDELL, JOEL A. TROPP, ROMAN VERSHYNIN Abstract. The two major approaches to sparse recovery are L 1-minimization and greedy methods. Recently, Needell and Vershynin
More informationCompressed Sensing and Robust Recovery of Low Rank Matrices
Compressed Sensing and Robust Recovery of Low Rank Matrices M. Fazel, E. Candès, B. Recht, P. Parrilo Electrical Engineering, University of Washington Applied and Computational Mathematics Dept., Caltech
More informationConvex optimization. Javier Peña Carnegie Mellon University. Universidad de los Andes Bogotá, Colombia September 2014
Convex optimization Javier Peña Carnegie Mellon University Universidad de los Andes Bogotá, Colombia September 2014 1 / 41 Convex optimization Problem of the form where Q R n convex set: min x f(x) x Q,
More informationA Comparative Framework for Preconditioned Lasso Algorithms
A Comparative Framework for Preconditioned Lasso Algorithms Fabian L. Wauthier Statistics and WTCHG University of Oxford flw@stats.ox.ac.uk Nebojsa Jojic Microsoft Research, Redmond jojic@microsoft.com
More informationUniform Uncertainty Principle and Signal Recovery via Regularized Orthogonal Matching Pursuit
Claremont Colleges Scholarship @ Claremont CMC Faculty Publications and Research CMC Faculty Scholarship 6-5-2008 Uniform Uncertainty Principle and Signal Recovery via Regularized Orthogonal Matching Pursuit
More informationError Correction via Linear Programming
Error Correction via Linear Programming Emmanuel Candes and Terence Tao Applied and Computational Mathematics, Caltech, Pasadena, CA 91125 Department of Mathematics, University of California, Los Angeles,
More informationLecture: Introduction to Compressed Sensing Sparse Recovery Guarantees
Lecture: Introduction to Compressed Sensing Sparse Recovery Guarantees http://bicmr.pku.edu.cn/~wenzw/bigdata2018.html Acknowledgement: this slides is based on Prof. Emmanuel Candes and Prof. Wotao Yin
More informationSignal Recovery from Permuted Observations
EE381V Course Project Signal Recovery from Permuted Observations 1 Problem Shanshan Wu (sw33323) May 8th, 2015 We start with the following problem: let s R n be an unknown n-dimensional real-valued signal,
More informationLatent Variable Graphical Model Selection Via Convex Optimization
Latent Variable Graphical Model Selection Via Convex Optimization The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published
More informationLinear Regression with Strongly Correlated Designs Using Ordered Weigthed l 1
Linear Regression with Strongly Correlated Designs Using Ordered Weigthed l 1 ( OWL ) Regularization Mário A. T. Figueiredo Instituto de Telecomunicações and Instituto Superior Técnico, Universidade de
More informationA Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models
A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models Jingyi Jessica Li Department of Statistics University of California, Los
More informationThe Pros and Cons of Compressive Sensing
The Pros and Cons of Compressive Sensing Mark A. Davenport Stanford University Department of Statistics Compressive Sensing Replace samples with general linear measurements measurements sampled signal
More informationSparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda
Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic
More informationDimensionality Reduction Notes 3
Dimensionality Reduction Notes 3 Jelani Nelson minilek@seas.harvard.edu August 13, 2015 1 Gordon s theorem Let T be a finite subset of some normed vector space with norm X. We say that a sequence T 0 T
More informationSparsity and the Lasso
Sparsity and the Lasso Statistical Machine Learning, Spring 205 Ryan Tibshirani (with Larry Wasserman Regularization and the lasso. A bit of background If l 2 was the norm of the 20th century, then l is
More informationNecessary and sufficient conditions of solution uniqueness in l 1 minimization
1 Necessary and sufficient conditions of solution uniqueness in l 1 minimization Hui Zhang, Wotao Yin, and Lizhi Cheng arxiv:1209.0652v2 [cs.it] 18 Sep 2012 Abstract This paper shows that the solutions
More informationAdaptive L p (0 <p<1) Regularization: Oracle Property and Applications
Adaptive L p (0
More informationApproximate Message Passing with Built-in Parameter Estimation for Sparse Signal Recovery
Approimate Message Passing with Built-in Parameter Estimation for Sparse Signal Recovery arxiv:1606.00901v1 [cs.it] Jun 016 Shuai Huang, Trac D. Tran Department of Electrical and Computer Engineering Johns
More informationDeterministic constructions of compressed sensing matrices
Journal of Complexity 23 (2007) 918 925 www.elsevier.com/locate/jco Deterministic constructions of compressed sensing matrices Ronald A. DeVore Department of Mathematics, University of South Carolina,
More informationarxiv: v1 [math.st] 8 Jan 2008
arxiv:0801.1158v1 [math.st] 8 Jan 2008 Hierarchical selection of variables in sparse high-dimensional regression P. J. Bickel Department of Statistics University of California at Berkeley Y. Ritov Department
More informationIEOR 265 Lecture 3 Sparse Linear Regression
IOR 65 Lecture 3 Sparse Linear Regression 1 M Bound Recall from last lecture that the reason we are interested in complexity measures of sets is because of the following result, which is known as the M
More informationStable Signal Recovery from Incomplete and Inaccurate Measurements
Stable Signal Recovery from Incomplete and Inaccurate Measurements EMMANUEL J. CANDÈS California Institute of Technology JUSTIN K. ROMBERG California Institute of Technology AND TERENCE TAO University
More informationEstimating LASSO Risk and Noise Level
Estimating LASSO Risk and Noise Level Mohsen Bayati Stanford University bayati@stanford.edu Murat A. Erdogdu Stanford University erdogdu@stanford.edu Andrea Montanari Stanford University montanar@stanford.edu
More informationMinimax Rates of Estimation for High-Dimensional Linear Regression Over -Balls
6976 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 10, OCTOBER 2011 Minimax Rates of Estimation for High-Dimensional Linear Regression Over -Balls Garvesh Raskutti, Martin J. Wainwright, Senior
More informationStochastic Design Criteria in Linear Models
AUSTRIAN JOURNAL OF STATISTICS Volume 34 (2005), Number 2, 211 223 Stochastic Design Criteria in Linear Models Alexander Zaigraev N. Copernicus University, Toruń, Poland Abstract: Within the framework
More informationLimitations in Approximating RIP
Alok Puranik Mentor: Adrian Vladu Fifth Annual PRIMES Conference, 2015 Outline 1 Background The Problem Motivation Construction Certification 2 Planted model Planting eigenvalues Analysis Distinguishing
More informationNoisy Signal Recovery via Iterative Reweighted L1-Minimization
Noisy Signal Recovery via Iterative Reweighted L1-Minimization Deanna Needell UC Davis / Stanford University Asilomar SSC, November 2009 Problem Background Setup 1 Suppose x is an unknown signal in R d.
More informationRandom projections. 1 Introduction. 2 Dimensionality reduction. Lecture notes 5 February 29, 2016
Lecture notes 5 February 9, 016 1 Introduction Random projections Random projections are a useful tool in the analysis and processing of high-dimensional data. We will analyze two applications that use
More informationChapter 3 Transformations
Chapter 3 Transformations An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Linear Transformations A function is called a linear transformation if 1. for every and 2. for every If we fix the bases
More informationSparse recovery using sparse random matrices
Sparse recovery using sparse random matrices Radu Berinde MIT texel@mit.edu Piotr Indyk MIT indyk@mit.edu April 26, 2008 Abstract We consider the approximate sparse recovery problem, where the goal is
More informationGaussian Phase Transitions and Conic Intrinsic Volumes: Steining the Steiner formula
Gaussian Phase Transitions and Conic Intrinsic Volumes: Steining the Steiner formula Larry Goldstein, University of Southern California Nourdin GIoVAnNi Peccati Luxembourg University University British
More informationExact Low-rank Matrix Recovery via Nonconvex M p -Minimization
Exact Low-rank Matrix Recovery via Nonconvex M p -Minimization Lingchen Kong and Naihua Xiu Department of Applied Mathematics, Beijing Jiaotong University, Beijing, 100044, People s Republic of China E-mail:
More informationUniversal low-rank matrix recovery from Pauli measurements
Universal low-rank matrix recovery from Pauli measurements Yi-Kai Liu Applied and Computational Mathematics Division National Institute of Standards and Technology Gaithersburg, MD, USA yi-kai.liu@nist.gov
More informationRestricted Strong Convexity Implies Weak Submodularity
Restricted Strong Convexity Implies Weak Submodularity Ethan R. Elenberg Rajiv Khanna Alexandros G. Dimakis Department of Electrical and Computer Engineering The University of Texas at Austin {elenberg,rajivak}@utexas.edu
More informationDASSO: connections between the Dantzig selector and lasso
J. R. Statist. Soc. B (2009) 71, Part 1, pp. 127 142 DASSO: connections between the Dantzig selector and lasso Gareth M. James, Peter Radchenko and Jinchi Lv University of Southern California, Los Angeles,
More informationTowards a Mathematical Theory of Super-resolution
Towards a Mathematical Theory of Super-resolution Carlos Fernandez-Granda www.stanford.edu/~cfgranda/ Information Theory Forum, Information Systems Laboratory, Stanford 10/18/2013 Acknowledgements This
More informationRui ZHANG Song LI. Department of Mathematics, Zhejiang University, Hangzhou , P. R. China
Acta Mathematica Sinica, English Series May, 015, Vol. 31, No. 5, pp. 755 766 Published online: April 15, 015 DOI: 10.1007/s10114-015-434-4 Http://www.ActaMath.com Acta Mathematica Sinica, English Series
More informationLIMITATION OF LEARNING RANKINGS FROM PARTIAL INFORMATION. By Srikanth Jagabathula Devavrat Shah
00 AIM Workshop on Ranking LIMITATION OF LEARNING RANKINGS FROM PARTIAL INFORMATION By Srikanth Jagabathula Devavrat Shah Interest is in recovering distribution over the space of permutations over n elements
More informationMethods for sparse analysis of high-dimensional data, II
Methods for sparse analysis of high-dimensional data, II Rachel Ward May 23, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 47 High dimensional
More informationAbout Split Proximal Algorithms for the Q-Lasso
Thai Journal of Mathematics Volume 5 (207) Number : 7 http://thaijmath.in.cmu.ac.th ISSN 686-0209 About Split Proximal Algorithms for the Q-Lasso Abdellatif Moudafi Aix Marseille Université, CNRS-L.S.I.S
More informationHigh-dimensional covariance estimation based on Gaussian graphical models
High-dimensional covariance estimation based on Gaussian graphical models Shuheng Zhou Department of Statistics, The University of Michigan, Ann Arbor IMA workshop on High Dimensional Phenomena Sept. 26,
More informationOptimal prediction for sparse linear models? Lower bounds for coordinate-separable M-estimators
Electronic Journal of Statistics ISSN: 935-7524 arxiv: arxiv:503.0388 Optimal prediction for sparse linear models? Lower bounds for coordinate-separable M-estimators Yuchen Zhang, Martin J. Wainwright
More informationThe Stability of Low-Rank Matrix Reconstruction: a Constrained Singular Value Perspective
Forty-Eighth Annual Allerton Conference Allerton House UIUC Illinois USA September 9 - October 1 010 The Stability of Low-Rank Matrix Reconstruction: a Constrained Singular Value Perspective Gongguo Tang
More informationComputable Performance Analysis of Sparsity Recovery with Applications
Computable Performance Analysis of Sparsity Recovery with Applications Arye Nehorai Preston M. Green Department of Electrical & Systems Engineering Washington University in St. Louis, USA European Signal
More informationThe lasso, persistence, and cross-validation
The lasso, persistence, and cross-validation Daniel J. McDonald Department of Statistics Indiana University http://www.stat.cmu.edu/ danielmc Joint work with: Darren Homrighausen Colorado State University
More informationECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis
ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis Lecture 7: Matrix completion Yuejie Chi The Ohio State University Page 1 Reference Guaranteed Minimum-Rank Solutions of Linear
More informationTHE DANTZIG SELECTOR: STATISTICAL ESTIMATION WHEN p IS MUCH LARGER THAN n 1
The Annals of Statistics 27, Vol. 35, No. 6, 2313 2351 DOI: 1.1214/953661523 Institute of Mathematical Statistics, 27 THE DANTZIG SELECTOR: STATISTICAL ESTIMATION WHEN p IS MUCH LARGER THAN n 1 BY EMMANUEL
More information