Signal and Noise Statistics Oblivious Orthogonal Matching Pursuit


Sreejith Kallummil *  Sheetal Kalyani *

Abstract

Orthogonal matching pursuit (OMP) is a widely used algorithm for recovering sparse high dimensional vectors in linear regression models. The optimal performance of OMP requires a priori knowledge of either the sparsity of the regression vector or the noise statistics. Both these statistics are rarely known a priori and are very difficult to estimate. In this paper, we present a novel technique called residual ratio thresholding (RRT) to operate OMP without any a priori knowledge of sparsity and noise statistics and establish finite sample and large sample support recovery guarantees for the same. Both analytical results and numerical simulations in real and synthetic data sets indicate that RRT has a performance comparable to OMP with a priori knowledge of sparsity and noise statistics.

1. Introduction

This article deals with the estimation of the regression vector β ∈ R^p in the linear regression model y = Xβ + w, where X ∈ R^{n×p} is a known design matrix with unit Euclidean norm columns, w is the noise vector and y is the observation vector. Throughout this article, we assume that the entries of the noise w are independent, zero mean and Gaussian distributed with variance σ². We consider the high dimensional and sample starved scenario of n < p or n ≪ p where classical techniques like ordinary least squares (OLS) are no longer applicable. This problem of estimating high dimensional vectors in sample starved scenarios is ill-posed even in the absence of noise unless strong structural assumptions are made on X and β. A widely used and practically valid assumption is sparsity. The vector β ∈ R^p is sparse if the support of β, given by S = supp(β) = {k : β_k ≠ 0}, has cardinality k_0 = card(S) ≪ p.

* Equal contribution. Department of Electrical Engineering, IIT Madras, India. Correspondence to: Sreejith Kallummil <sreejith.k.venugopal@gmail.com>. Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018. Copyright 2018 by the author(s).

A number of algorithms like the least absolute shrinkage and selection operator (LASSO) (Tropp, 2006; Tibshirani, 1996), the Dantzig selector (DS) (Candes & Tao, 2007), subspace pursuit (SP) (Dai & Milenkovic, 2009), OMP (Pati et al., 1993; Mallat & Zhang, 1993; Tropp, 2004; Cai & Wang, 2011), the elastic net (Zou & Hastie, 2005) etc. have been proposed to efficiently estimate β. Tuning the hyper parameters of the aforementioned algorithms to achieve optimal performance requires a priori knowledge of signal parameters like the sparsity k_0 or noise statistics like σ². Unfortunately, these parameters are rarely known a priori. To the best of our knowledge, no computationally efficient technique to estimate k_0 is reported in the open literature. Limited success on the estimation of σ² has been reported (Dicker, 2014; Fan et al., 2012; Dicker & Erdogdu, 2016; Bayati et al., 2013); however, the performance of these σ² estimates when used for tuning hyper parameters in LASSO, DS, OMP etc. is largely unknown. Generalised techniques for hyper parameter selection like cross validation (CV) (Arlot et al., 2010) and re-sampling (Meinshausen & Bühlmann, 2010) are computationally challenging. Further, CV is reported to have poor variable selection behaviour (Chichignoud et al., 2016; Arlot et al., 2010). Indeed, algorithms that are oblivious to signal and noise statistics have also been proposed in the literature.
These include algorithms inspired by or related to LASSO like the square root LASSO (Belloni et al., 2011), AV_∞ (Chichignoud et al., 2016), approximate message passing (Mousavi et al., 2013; Bayati et al., 2013) etc. and ridge regression inspired techniques like least squares adaptive thresholding (LAT) and ridge adaptive thresholding (RAT) (Wang et al., 2016). However, most existing signal and noise statistics oblivious sparse recovery techniques have only large sample performance guarantees. Further, many of these techniques assume that the design matrix X is sampled from a random ensemble, a condition which is rarely satisfied in practice.

1.1. Contributions of this paper

This article presents a novel technique called residual ratio thresholding (RRT) for finding a good estimate of the support S from the data dependent/adaptive sequence of supports generated by OMP. RRT is analytically shown to accomplish exact support recovery, i.e., identifying S, under the same finite sample and deterministic constraints on X, like restricted isometry constants (RIC) or mutual coherence, required by OMP with a priori knowledge of k_0 or σ². However, the signal to noise ratio (SNR = ||Xβ||²_2/(nσ²)) required for support recovery using RRT is slightly higher than that of OMP with a priori knowledge of k_0 or σ². This extra SNR requirement is shown to decrease with increasing sample size n. RRT and OMP with a priori knowledge of k_0 or σ² are shown to be equivalent as n → ∞ in terms of the SNR required for support recovery. RRT involves a tuning parameter α that can be set independently of the ambient SNR or noise statistics. The hyper parameter α in RRT has an interesting semantic interpretation of being the high SNR upper bound on the support recovery error. Also, RRT is asymptotically tuning free in the sense that a very wide range of α delivers similar performances as n → ∞. Numerical simulations indicate that RRT can deliver a highly competitive performance when compared to OMP having a priori knowledge of k_0 or σ², OMP with k_0 estimated using CV and the recently proposed LAT algorithm. Further, RRT also delivered a highly competitive performance when applied to identify outliers in real data sets, an increasingly popular application of sparse estimation algorithms (Mitra et al., 2010; 2013).

The remainder of this article is organised as follows. Section 2 discusses the OMP algorithm. The RRT algorithm is presented in Section 3. Section 4 presents theoretical performance guarantees for RRT. Section 5 presents numerical simulation results. All the proofs are provided in the supplementary material.

1.2. Notations used

||x||_q = (Σ_{k=1}^p |x_k|^q)^{1/q} is the l_q norm of x ∈ R^p. 0_n is the n×1 zero vector and I_n is the n×n identity matrix. span(X) is the column space of X. X^† = (X^T X)^{-1} X^T is the Moore-Penrose pseudo inverse of X. X_J denotes the sub-matrix of X formed using the columns indexed by J. N(u, C) represents a Gaussian random vector (R.V) with mean u and covariance matrix C. B(a, b) denotes a Beta R.V with parameters a and b. a ∼ b denotes that a and b are identically distributed. [x] denotes the floor of x. φ represents the null set. For any two sets J_1 and J_2, J_1/J_2 denotes the set difference. a →_P b denotes the convergence of the R.V a to the R.V b in probability.

2. Orthogonal Matching Pursuit (OMP)

OMP (Algorithm 1) starts with a null support estimate and in each iteration adds to the current support the column index that is most correlated with the previous residual r^{k-1}, i.e., t^k = arg max_j |X_j^T r^{k-1}|. Then an LS estimate of β restricted to the current support S^k_omp is computed as an intermediate estimate of β, and this estimate is used to update the residual.

Algorithm 1 Orthogonal matching pursuit
Input: observation y, matrix X.
Initialize: S^0_omp = φ, k = 1 and residual r^0 = y.
repeat
  Identify the next column: t^k = arg max_j |X_j^T r^{k-1}|.
  Expand the current support: S^k_omp = S^{k-1}_omp ∪ {t^k}.
  Restricted LS estimate: β̂_{S^k_omp} = X^†_{S^k_omp} y, β̂_{{1,...,p}/S^k_omp} = 0_{p-k}.
  Update the residual: r^k = y − Xβ̂ = (I_n − P_k)y.
  Increment k ← k + 1.
until the stopping condition (SC) is true
Output: support estimate Ŝ = S^k_omp and vector estimate β̂.

Note that P_k in Algorithm 1 refers to X_{S^k_omp} X^†_{S^k_omp}, the projection matrix onto span(X_{S^k_omp}). Since the residual r^k is orthogonal to span(X_{S^k_omp}), X_j^T r^k = 0 for all j ∈ S^k_omp. Consequently, t^{k+1} ∉ S^k_omp, i.e., the same index will not be selected in two different iterations. Hence S^{k+1}_omp ⊋ S^k_omp, i.e., the support sequence is monotonically increasing. The monotonicity of S^k_omp in turn implies that the residual norm ||r^k||_2 is a non increasing function of k, i.e., ||r^{k+1}||_2 ≤ ||r^k||_2.
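Algorithm 1 maps directly onto a few lines of linear algebra. The following minimal Python/NumPy sketch (an illustration, not the authors' implementation; the fixed iteration count stands in for a generic stopping condition) mirrors the steps of Algorithm 1 and also records the residual norms that RRT will need later.

import numpy as np

def omp(X, y, k_max):
    # Minimal OMP sketch: returns the support sequence and ||r^0||,...,||r^{k_max}||.
    support = []                    # S^k_omp, grown by one index per iteration
    residual = y.copy()             # r^0 = y
    res_norms = [np.linalg.norm(y)]
    for _ in range(k_max):
        # pick the column most correlated with the current residual
        t = int(np.argmax(np.abs(X.T @ residual)))
        support.append(t)
        # restricted least-squares estimate on the current support
        beta_S, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        residual = y - X[:, support] @ beta_S       # r^k = (I_n - P_k) y
        res_norms.append(np.linalg.norm(residual))
    return support, np.array(res_norms)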
Most of the theoretical properties of OMP are derived assuming a priori knowledge of the true sparsity level k_0, in which case OMP stops after exactly k_0 iterations (Tropp, 2004; Wang, 2015). When k_0 is not known, one has to rely on stopping conditions (SC) based on the properties of the residual r^k as k varies. For example, one can stop the OMP iterations once the residual power is too low compared to the expected noise power. Mathematically, when the noise w is l_2 bounded, i.e., ||w||_2 ≤ ε for some a priori known ε, then OMP can be stopped once ||r^k||_2 ≤ ε. For a Gaussian noise vector w ∼ N(0_n, σ²I_n), ε_σ = σ sqrt(n + 2 sqrt(n log(n))) satisfies (Cai & Wang, 2011)

P(||w||_2 ≤ ε_σ) ≥ 1 − 1/n,    (1)

i.e., Gaussian noise is l_2 bounded with a very high probability. Consequently, one can stop the OMP iterations in Gaussian noise once ||r^k||_2 ≤ ε_σ.
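The l_2 bound (1) is easy to check empirically. The short sketch below (an illustrative Monte Carlo check under assumed values of n and σ, not part of the paper) draws Gaussian noise vectors and estimates P(||w||_2 ≤ ε_σ).

import numpy as np

rng = np.random.default_rng(0)
n, sigma, trials = 500, 1.0, 20000

# epsilon_sigma = sigma * sqrt(n + 2*sqrt(n*log(n))), the bound used in (1)
eps_sigma = sigma * np.sqrt(n + 2.0 * np.sqrt(n * np.log(n)))

w = sigma * rng.standard_normal((trials, n))
frac = np.mean(np.linalg.norm(w, axis=1) <= eps_sigma)
print(f"empirical P(||w||_2 <= eps_sigma) = {frac:.4f},  1 - 1/n = {1 - 1/n:.4f}")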

A number of deterministic recovery guarantees have been proposed for OMP. Among these guarantees, the conditions based on the restricted isometry constant (RIC) are the most popular. The RIC of order j, denoted by δ_j, is defined as the smallest value of δ such that

(1 − δ)||b||²_2 ≤ ||Xb||²_2 ≤ (1 + δ)||b||²_2    (2)

holds true for all b ∈ R^p with ||b||_0 = card(supp(b)) ≤ j. A smaller value of δ_j implies that X acts as a near orthogonal matrix for all j-sparse vectors b. Such a situation is ideal for the recovery of a j-sparse vector b using any sparse recovery technique. The latest RIC based support recovery guarantee for OMP is given in Lemma 1 (Liu et al., 2017).

Lemma 1. OMP with k_0 iterations or with SC ||r^k||_2 ≤ ||w||_2 can recover any k_0-sparse vector β provided that δ_{k_0+1} < 1/sqrt(k_0+1) and ||w||_2 ≤ ε_omp, where ε_omp is an explicit constant, proportional to β_min = min_{j∈S} |β_j|, that depends only on the restricted isometry constants δ_{k_0} and δ_{k_0+1} (Liu et al., 2017).

Since P(||w||_2 > ε_σ) ≤ 1/n when w ∼ N(0_n, σ²I_n), it follows from Lemma 1 that OMP with k_0 iterations or with SC ||r^k||_2 ≤ ε_σ can recover any k_0-sparse vector β with probability greater than 1 − 1/n provided that δ_{k_0+1} < 1/sqrt(k_0+1) and ε_σ ≤ ε_omp. Lemma 1 implies that OMP with a priori knowledge of k_0 or σ² can recover the support S once the matrix satisfies the regularity condition δ_{k_0+1} < 1/sqrt(k_0+1) and the SNR is sufficiently high. It is also known that this RIC condition is worst case necessary. Consequently, Lemma 1 is one of the best deterministic guarantees for OMP available in the literature. Note that the mutual incoherence condition µ_X = max_{j≠k} |X_j^T X_k| < 1/(2k_0 − 1) also ensures exact support recovery at high SNR. However, the a priori knowledge of k_0 or σ² required to materialise the recovery guarantees in Lemma 1 is not available in practical problems. Further, k_0 and σ² are very difficult to estimate. This motivates the proposed RRT algorithm, which does not require a priori knowledge of k_0 or σ².

3. Residual Ratio Thresholding (RRT)

RRT is a novel signal and noise statistics oblivious technique to estimate the support S based on the behaviour of the residual ratio statistic RR(k) = ||r^k||_2 / ||r^{k-1}||_2 as k increases from k = 1 to a predefined value k = k_max > k_0. As noted above, identifying the support using the behaviour of ||r^k||_2 requires a priori knowledge of σ². However, as we will show in this section, support detection using RR(k) does not require a priori knowledge of σ². Since the residual norms are non negative and non increasing, RR(k) always satisfies 0 ≤ RR(k) ≤ 1.

3.1. Minimal superset and implications

Consider running k_max > k_0 iterations of OMP and let {S^k_omp}_{k=1}^{k_max} be the support sequence generated by OMP. Recall that S^k_omp is monotonically increasing.

Definition 1:- The minimal superset in the OMP support sequence {S^k_omp}_{k=1}^{k_max} is S^{k_min}_omp, where k_min = min{k : S ⊆ S^k_omp}. When the set {k : S ⊆ S^k_omp} = φ, we set k_min = ∞ and S^{k_min}_omp = φ.

In words, the minimal superset is the smallest superset of the support S present in a particular realization of the support estimate sequence {S^k_omp}_{k=1}^{k_max}. Note that both k_min and S^{k_min}_omp are unobservable random variables. Since card(S^k_omp) = k, S^k_omp for k < k_0 cannot satisfy S ⊆ S^k_omp and hence k_min ≥ k_0. Further, the monotonicity of S^k_omp implies that S ⊆ S^k_omp for all k ≥ k_min.

Case 1:- When k_min = k_0, then S^{k_0}_omp = S and S^k_omp ⊋ S for k > k_0, i.e., S is present in the solution path. Further, when k_min = k_0, it is true that S^k_omp ⊉ S for k < k_0.

Case 2:- When k_0 < k_min ≤ k_max, then S^k_omp ≠ S for all k and S^k_omp ⊋ S for k ≥ k_min, i.e., S is not present in the solution path; however, a superset of S is present.

Case 3:- When k_min = ∞, then S^k_omp ⊉ S for all k, i.e., neither S nor a superset of S is present in {S^k_omp}_{k=1}^{k_max}.

To summarize, exact support recovery using any OMP based scheme, including the signal and noise statistics aware schemes, is possible only if k_min = k_0. Whenever k_min > k_0, it is possible to estimate the true support S without any false negatives; however, one then has to suffer from false positives.
When k_min = ∞, every support in {S^k_omp}_{k=1}^{k_max} has to suffer from false negatives, and all supports S^k_omp for k > k_0 suffer from false positives as well. Note that the matrix and SNR conditions required for exact support recovery in Lemma 1 automatically imply that k_min = k_0. We formulate the proposed RRT scheme assuming that k_min = k_0.

3.2. Behaviour of RR(k_0)

Next we consider the behaviour of the residual ratio statistic at the k_0-th iteration, i.e., RR(k_0) = ||r^{k_0}||_2 / ||r^{k_0−1}||_2, under the assumption that ||w||_2 ≤ ε_omp and δ_{k_0+1} < 1/sqrt(k_0+1), which ensures k_min = k_0 and S^k_omp ⊉ S for all k < k_0. Since Xβ = X_S β_S ∈ span(X_S), we have (I_n − P_k)Xβ ≠ 0_n if S ⊄ S^k_omp and (I_n − P_k)Xβ = 0_n if S ⊆ S^k_omp. This along with the monotonicity of S^k_omp implies the following: (I_n − P_k)Xβ ≠ 0_n for k < k_min = k_0 and (I_n − P_k)Xβ = 0_n for k ≥ k_min = k_0. Thus r^k = (I_n − P_k)y = (I_n − P_k)X_S β_S + (I_n − P_k)w for k < k_min = k_0, whereas r^k = (I_n − P_k)w for k ≥ k_min = k_0. Consequently, at k = k_0, the numerator ||r^{k_0}||_2 of RR(k_0) contains a contribution only from the noise term (I_n − P_{k_0})w, whereas the denominator ||r^{k_0−1}||_2 of RR(k_0) contains contributions from both the signal term (I_n − P_{k_0−1})X_S β_S and the noise term (I_n − P_{k_0−1})w. This behaviour of RR(k_0), together with the fact that ||w||_2 →_P 0 as σ² → 0, implies the following theorem.

Theorem 1. Assume that the matrix X satisfies the RIC constraint δ_{k_0+1} < 1/sqrt(k_0+1) and k_max > k_0. Then
a). RR(k_min) →_P 0 as σ² → 0.
b). lim_{σ²→0} P(k_min = k_0) = 1.
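Theorem 1 is easy to visualise numerically. The sketch below (an illustrative simulation with arbitrary dimensions and noise levels, not taken from the paper) runs OMP for a few extra iterations and prints RR(k_0) and RR(k_0 + 1) as σ decreases: the former shrinks towards zero while the latter stays bounded away from zero.

import numpy as np

rng = np.random.default_rng(1)
n, p, k0 = 100, 200, 5

# unit-norm Gaussian design and a k0-sparse beta (illustrative setup)
X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=0)
S = rng.choice(p, k0, replace=False)
beta = np.zeros(p)
beta[S] = 1.0

def omp_residual_norms(X, y, k_max):
    support, r = [], y.copy()
    norms = [np.linalg.norm(y)]
    for _ in range(k_max):
        support.append(int(np.argmax(np.abs(X.T @ r))))
        coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        r = y - X[:, support] @ coef
        norms.append(np.linalg.norm(r))
    return np.array(norms)

for sigma in [1.0, 0.1, 0.01]:
    y = X @ beta + sigma * rng.standard_normal(n)
    norms = omp_residual_norms(X, y, k0 + 3)
    rr = norms[1:] / norms[:-1]        # rr[k-1] = RR(k) = ||r^k|| / ||r^{k-1}||
    print(f"sigma = {sigma:5.2f}   RR(k0) = {rr[k0-1]:.3f}   RR(k0+1) = {rr[k0]:.3f}")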

Algorithm 2 Residual ratio thresholding (RRT)
Input: observation y, matrix X.
Step 1: Run k_max iterations of OMP.
Step 2: Compute RR(k) for k = 1, ..., k_max.
Step 3: Estimate k_RRT = max{k : RR(k) ≤ Γ^α_RRT(k)}.
Output: support estimate Ŝ = S^{k_RRT}_omp. Vector estimate β̂(S^{k_RRT}_omp) = X^†_{S^{k_RRT}_omp} y, β̂({1,...,p}/S^{k_RRT}_omp) = 0_{p−k_RRT}.

3.3. Behaviour of RR(k) for k > k_min

Next we discuss the behaviour of RR(k) for k > k_min. By the definition of k_min, we have S ⊆ S^k_omp, which implies that r^k = (I_n − P_k)w for k ≥ k_min. The absence of signal terms in the numerator and the denominator of RR(k) = ||(I_n − P_k)w||_2 / ||(I_n − P_{k−1})w||_2 for k > k_min implies that even when ||w||_2 → 0 or σ² → 0, RR(k) for k > k_min does not converge to zero. This behaviour of RR(k) for k > k_min is captured in Theorem 2, where we provide explicit σ² and SNR independent lower bounds on RR(k) for k > k_min.

Theorem 2. Let F_{a,b}(x) denote the cumulative distribution function of a B(a, b) random variable. Then for all σ² > 0,

Γ^α_RRT(k) = sqrt( F^{-1}_{(n−k)/2, 0.5}( α / (k_max (p − k + 1)) ) )

satisfies

P( RR(k) > Γ^α_RRT(k), for all k > k_min ) ≥ 1 − α.    (3)

Theorem 2 states that the residual ratio statistic RR(k) for k > k_min is lower bounded by the deterministic sequence {Γ^α_RRT(k)}_{k=k_min+1}^{k_max} with a high probability (for small values of α). Please note that k_min is itself a R.V. Note that the sequence Γ^α_RRT(k) depends only on the matrix dimensions n and p. Further, Theorem 2 does not make any assumptions on the noise variance σ² or the design matrix X. Theorem 2 is extremely non trivial considering the fact that the support estimate sequence {S^k_omp}_{k=1}^{k_max} produced by OMP is adaptive and data dependent.

Lemma 2. The following important properties of Γ^α_RRT(k) are direct consequences of the monotonicity of the CDF and the fact that a Beta R.V takes values only in [0, 1].
1). Γ^α_RRT(k) is defined only in the interval 0 ≤ α ≤ k_max (p − k + 1).
2). 0 ≤ Γ^α_RRT(k) ≤ 1.
3). Γ^α_RRT(k) is a monotonically increasing function of α.
4). Γ^α_RRT(k) = 0 when α = 0 and Γ^α_RRT(k) = 1 when α = k_max (p − k + 1).

3.4. Residual ratio thresholding framework

From Theorem 1, it is clear that P(k_min = k_0) and P(S^{k_0}_omp = S) increase with increasing SNR (or decreasing σ²), whereas RR(k_min) decreases to zero with increasing SNR. At the same time, for small values of α, RR(k) for k > k_min is lower bounded by Γ^α_RRT(k) with a very high probability at all SNR. Hence, finding the last index k such that RR(k) ≤ Γ^α_RRT(k), i.e., k_RRT = max{k : RR(k) ≤ Γ^α_RRT(k)}, gives k_0, and equivalently S^{k_0}_omp = S, with a probability increasing with increasing SNR. This motivates the proposed signal and noise statistics oblivious RRT algorithm presented in Algorithm 2.

Remark 1. An important aspect of RRT in Algorithm 2 is the choice of k_RRT when the set {k : RR(k) ≤ Γ^α_RRT(k)} = φ. This situation happens only at very low SNR. When {k : RR(k) ≤ Γ^α_RRT(k)} = φ for a given value of α, we increase the value of α to the smallest value α_new > α such that {k : RR(k) ≤ Γ^{α_new}_RRT(k)} ≠ φ. Mathematically, we set k_RRT = max{k : RR(k) ≤ Γ^{α_new}_RRT(k)}, where α_new = min_{a > α} {a : {k : RR(k) ≤ Γ^a_RRT(k)} ≠ φ}. Since α = p k_max gives Γ^α_RRT(1) = 1 and RR(1) ≤ 1, a value of α_new ≤ p k_max always exists. α_new can be easily computed by first pre-computing {Γ^a_RRT(k)}_{k=1}^{k_max} for a prefixed grid of values of a in the interval (α, p k_max].
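The threshold sequence of Theorem 2 and the selection rule of Algorithm 2, including the fallback of Remark 1, take only a few lines to implement. The sketch below (a minimal Python/SciPy illustration under the Beta-quantile form of Γ^α_RRT(k) stated above; not the authors' code, and the geometric grid of candidate α values in the fallback is an illustrative choice) consumes the residual norms produced by any OMP run.

import numpy as np
from scipy.stats import beta

def rrt_threshold(n, p, k_max, alpha):
    # Gamma^alpha_RRT(k), k = 1..k_max: sqrt of the Beta((n-k)/2, 1/2) quantile
    # evaluated at alpha / (k_max * (p - k + 1)), as in Theorem 2
    k = np.arange(1, k_max + 1)
    q = alpha / (k_max * (p - k + 1))
    return np.sqrt(beta.ppf(q, (n - k) / 2.0, 0.5))

def rrt_select(res_norms, n, p, alpha=0.1):
    # res_norms = [||r^0||, ..., ||r^k_max||] from an OMP run
    k_max = len(res_norms) - 1
    rr = res_norms[1:] / res_norms[:-1]            # RR(k), k = 1..k_max
    hits = np.flatnonzero(rr <= rrt_threshold(n, p, k_max, alpha))
    if hits.size == 0:
        # Remark 1 style fallback: raise alpha until some index passes the threshold
        for a in np.geomspace(alpha, p * k_max, 50):
            hits = np.flatnonzero(rr <= rrt_threshold(n, p, k_max, a))
            if hits.size:
                break
    return int(hits[-1]) + 1                       # k_RRT (1-based iteration index)

The support estimate is then the first k_RRT indices selected by OMP and the vector estimate is the corresponding restricted least squares fit, exactly as in the output step of Algorithm 2.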
Remark 2. RRT requires performing k_max iterations of OMP. All the quantities required for RRT, including RR(k) and the final estimates, can be computed while performing these k_max iterations themselves. Consequently, RRT has complexity O(k_max n p). As we will see later, a good choice of k_max is k_max = [0.5(n+1)], which results in a complexity of order O(n²p). This complexity is approximately n/(2k_0) times higher than the O(npk_0) complexity of OMP when k_0 or σ² is known a priori. This is the computational cost paid for not knowing k_0 or σ² a priori. In contrast, L-fold CV requires running (1 − 1/L)n iterations of OMP L times, resulting in a O(L(1 − 1/L)n²p) = O(Ln²p) complexity, i.e., RRT is approximately L times computationally less complex than CV.

Remark 3. The RRT algorithm is developed only assuming that the support sequence generated by the sparse recovery algorithm is monotonically increasing. Apart from OMP, algorithms such as orthogonal least squares (Wen et al., 2017) and OMP with thresholding (Yang & de Hoog, 2015) also produce monotonic support sequences. The RRT principle can be directly applied to operate these algorithms in a signal and noise statistics oblivious fashion.

4. Analytical Results for RRT

In this section we present support recovery guarantees for RRT and compare them with the results available for OMP with a priori knowledge of k_0 or σ². The first result in this section deals with the finite sample and finite SNR performance of RRT.

Theorem 3. Let k_max ≥ k_0 and suppose that the matrix X satisfies δ_{k_0+1} < 1/sqrt(k_0+1). Then RRT can recover the true support S with probability greater than 1 − 1/n − α provided that ε_σ ≤ min(ε_omp, ε_rrt), where

ε_rrt = Γ^α_RRT(k_0) sqrt(1 − δ_{k_0}) β_min / (1 + Γ^α_RRT(k_0)).    (4)

Theorem 3 implies that RRT can identify the support S at a slightly higher SNR, or equivalently a lower noise level, than that required by OMP with a priori knowledge of k_0 and σ². For small values of α, the probability of exact support recovery, i.e., 1 − 1/n − α, is similar to the 1 − 1/n probability of exact support recovery in Lemma 1. Also, please note that the RRT framework does not impose any extra conditions on the design matrix X. Consequently, the only appreciable difference between RRT and OMP with a priori knowledge of k_0 and σ² is in the extra SNR required by RRT, which is quantified next using the metric ε_extra = ε_omp / ε_rrt. The larger the value of ε_extra, the larger the SNR (equivalently, the smaller the noise level) required for RRT to accomplish exact support recovery. Substituting the values of ε_omp and ε_rrt and using the bound δ_{k_0} ≤ δ_{k_0+1} gives

ε_extra ≤ 0.5 (1 + 1/Γ^α_RRT(k_0)).    (6)

Since Γ^α_RRT(k_0) ≤ 1, the right hand side of (6) is always greater than or equal to one. However, this bound on ε_extra decreases with increasing Γ^α_RRT(k_0). In particular, when Γ^α_RRT(k_0) = 1, there is no extra SNR requirement.

Remark 4. The RRT algorithm involves two hyper parameters, viz. k_max and α. Exact support recovery using RRT requires only that k_max ≥ k_0. However, k_0 is an unknown quantity. In our numerical simulations, we set k_max = min(p, [0.5(rank(X)+1)]). This choice is motivated by the facts that k_0 < [0.5(rank(X)+1)] is a necessary condition for exact support recovery using any sparse estimation algorithm (Elad, 2010) when n < p, and that min(n, p) is the maximum possible number of iterations in OMP. Since evaluating rank(X) requires extra computations, one can always use rank(X) ≤ n to set k_max = min(p, [0.5(n+1)]). Please note that this choice of k_max is independent of the operating SNR, the design matrix and the vector to be estimated, and the user is not required to tune this parameter. Hence, α is the only user specified hyper parameter in the RRT algorithm.

4.1. Large sample behaviour of RRT

Next we discuss the behaviour of RRT as n → ∞. From (6), it is clear that the extra SNR required for support recovery using RRT decreases with increasing Γ^α_RRT(k_0). However, by Lemma 2, increasing Γ^α_RRT(k_0) requires an increase in the value of α, and increasing α decreases the probability of support recovery, given by 1 − 1/n − α. In other words, one cannot have exact support recovery using RRT at a lower SNR without increasing the probability of error in the process. An answer to this conundrum is available in the large sample regime, where it is possible to achieve both α → 0 and Γ^α_RRT(k_0) → 1, i.e., no extra SNR requirement and no decrease in the probability of support recovery. The following theorem states the conditions required for Γ^α_RRT(k_0) → 1 for large values of n.

Theorem 4. Define k_lim = lim_{n→∞} k_0/n, p_lim = lim_{n→∞} log(p)/n and α_lim = lim_{n→∞} log(α)/n. Let k_max = min(p, [0.5(n+1)]). Then Γ^α_RRT(k_0) = sqrt( F^{-1}_{(n−k_0)/2, 0.5}( α / (k_max (p − k_0 + 1)) ) ) satisfies the following asymptotic limits.
Case 1:- lim_{n→∞} Γ^α_RRT(k_0) = 1 whenever k_lim < 0.5, p_lim = 0 and α_lim = 0.
Case 2:- 0 < lim_{n→∞} Γ^α_RRT(k_0) < 1 if k_lim < 0.5, α_lim = 0 and p_lim > 0. In particular, lim_{n→∞} Γ^α_RRT(k_0) = exp(−p_lim/(1 − k_lim)).
Case 3:- lim_{n→∞} Γ^α_RRT(k_0) = 0 if k_lim < 0.5, α_lim = 0 and p_lim = ∞.
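The limits in Theorem 4 are easy to probe numerically. The sketch below (illustrative parameter choices only) evaluates Γ^α_RRT(k_0) for growing n with p = 2n and α = 1/log(n); since p_lim = 0, k_lim = 0 and α_lim = 0 here, the values approach one as predicted by Case 1.

import numpy as np
from scipy.stats import beta

def gamma_rrt(n, p, k, k_max, alpha):
    # sqrt of the Beta((n-k)/2, 1/2) quantile at alpha / (k_max * (p - k + 1))
    return np.sqrt(beta.ppf(alpha / (k_max * (p - k + 1)), (n - k) / 2.0, 0.5))

k0 = 10
for n in [100, 1000, 10000, 100000]:
    p = 2 * n                          # p grows linearly with n, so p_lim = 0
    k_max = min(p, (n + 1) // 2)
    print(n, round(gamma_rrt(n, p, k0, k_max, alpha=1.0 / np.log(n)), 4))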
Theorem 4 states that all choices of (n, p, k_0) satisfying p_lim = 0 and k_lim < 0.5 result in lim_{n→∞} Γ^α_RRT(k_0) = 1, provided that the parameter α satisfies α_lim = 0. Note that α_lim = 0 for a wide variety of α, including α = constant, α = 1/n^δ for some δ > 0, α = 1/log(n) etc. It is interesting to see which (n, p, k_0) scenarios give p_lim = 0 and k_lim < 0.5. Note that exact recovery in the n < p scenario is possible only if k_0 ≤ [0.5(n+1)]. Thus, the assumption k_lim < 0.5 is satisfied in all interesting problem scenarios.

Regime 1:- lim_{n→∞} Γ^α_RRT(k_0) = 1 in low dimensional regression problems with p fixed and n → ∞, or for (n, p) → (∞, ∞) with lim_{n→∞} p/n < ∞.

Regime 2:- lim_{n→∞} Γ^α_RRT(k_0) = 1 in the high dimensional case where p increases sub exponentially with n as exp(n^δ) for some δ < 1, or where p increases polynomially with n, i.e., p = n^δ for some δ > 0. In both cases, p_lim = lim_{n→∞} log(n^δ)/n = 0 and p_lim = lim_{n→∞} log(exp(n^δ))/n = 0.

Regime 3:- lim_{n→∞} Γ^α_RRT(k_0) = 1 in the extreme high dimensional case where (n, p, k_0) → (∞, ∞, ∞) satisfying n ≥ c k_0 log(p) for some constant c > 0. Here p_lim = lim_{n→∞} log(p)/n ≤ lim_{n→∞} 1/(c k_0) = 0 and k_lim = lim_{n→∞} k_0/n ≤ lim_{n→∞} 1/(c log(p)) = 0. Note that a sampling regime where n scales as k_0 log(p) is the best known asymptotic guarantee available for OMP (Fletcher & Rangan, 2012).

Regime 4:- Consider a sampling regime where (n, p) → (∞, ∞) such that k_0 is fixed and n = c k_0 log(p), i.e., p increases exponentially with n. Here p_lim = 1/(c k_0) > 0 and k_lim = 0. Consequently, lim_{n→∞} Γ^α_RRT(k_0) = exp(−1/(c k_0)) < 1. A good example of this sampling regime is (Tropp & Gilbert, 2007), where it was shown that OMP can recover a particular (not every) k_0-dimensional signal from n random measurements (in the noiseless case) when n = c k_0 log(p). Even with such a small number of measurements, the limit stays close to one for moderate values of c k_0; for example, lim_{n→∞} Γ^α_RRT(k_0) = exp(−0.05) ≈ 0.95 when c k_0 = 20 (so that the bound on ε_extra is about 1.03) and lim_{n→∞} Γ^α_RRT(k_0) = exp(−0.025) ≈ 0.9753 when c k_0 = 40 (bound on ε_extra about 1.01).

Note that Γ^α_RRT(k_0) → 1 as n → ∞ implies ε_extra → 1 and min(ε_omp, ε_rrt) → ε_omp. This asymptotic behaviour of Γ^α_RRT(k_0) and ε_extra implies the large sample consistency of RRT, as stated in the following theorem.

Theorem 5. Suppose that the sample size n → ∞ such that the matrix X satisfies δ_{k_0+1} < 1/sqrt(k_0+1), ε_σ ≤ ε_omp and p_lim = 0. Then,
a). OMP running k_0 iterations and OMP with SC ||r^k||_2 ≤ ε_σ are large sample consistent, i.e., lim_{n→∞} P(Ŝ = S) = 1.
b). RRT with hyper parameter α satisfying lim_{n→∞} α = 0 and α_lim = 0 is also large sample consistent.

Theorem 5 implies that at large sample sizes, RRT can accomplish exact support recovery under the same SNR and matrix conditions required by OMP with a priori knowledge of k_0 or σ². Theorem 5 has a very important corollary.

Remark 5. Theorem 5 implies that all choices of α satisfying α → 0 and α_lim = 0 deliver similar performances as n → ∞. Note that the range of adaptations satisfying α → 0 and α_lim = 0 includes α = 1/log(n), α = 1/n^δ for δ > 0 etc. Since a very wide range of tuning parameters delivers similar results as n → ∞, RRT is in fact asymptotically tuning free.

Remark 6. Based on the large sample analysis of RRT, one can give the following guidelines on the choice of α. When the sample size n is large, one can choose α as a function of n that satisfies both lim_{n→∞} α = 0 and α_lim = 0. Also, since the support recovery guarantees are of the form 1 − 1/n − α, it does not make sense to choose a value of α that decays to zero faster than 1/n. Hence, it is preferable to choose values of α that decrease to zero more slowly than 1/n, like α = 1/log(n), α = 1/sqrt(n) etc.

4.2. A high SNR operational interpretation of α

Having discussed the large sample behaviour of RRT, we next discuss the finite sample and high SNR behaviour of RRT. Define the support recovery error event E = {Ŝ ≠ S}, the false positive event F = {card(Ŝ/S) > 0} and the missed discovery or false negative event M = {card(S/Ŝ) > 0}. The following theorem characterizes the likelihood of these events as the SNR increases to infinity, i.e., as σ² → 0.

Theorem 6. Let k_max > k_0 and suppose the matrix X satisfies δ_{k_0+1} < 1/sqrt(k_0+1). Then,
a). lim_{σ²→0} P(M) = 0.
b). lim_{σ²→0} P(E) = lim_{σ²→0} P(F) ≤ α.

Theorem 6 states that when the matrix X allows for exact support recovery in the noiseless or low noise situation, RRT will not suffer from missed discoveries. Under such favourable conditions, α is a high SNR upper bound on both the probability of error and the probability of false positives. Please note that such explicit characterizations are not available for the hyper parameters in the square root LASSO, RAT, LAT etc.
5. Numerical Simulations

In this section, we provide extensive numerical simulations comparing the performance of RRT with state of the art sparse recovery techniques. In particular, we compare the performance of RRT with OMP with k_0 estimated using five fold CV and the least squares adaptive thresholding (LAT) proposed in (Wang et al., 2016). In synthetic data sets, we also compare RRT with OMP running exactly k_0 iterations and OMP with SC ||r^k||_2 ≤ σ sqrt(n + 2 sqrt(n log(n))) (Cai & Wang, 2011). These algorithms are denoted in Figures 1-4 by CV, LAT, OMP1 and OMP2 respectively. RRT1 and RRT2 represent RRT with the parameter α set to α = 1/log(n) and α = 1/sqrt(n) respectively. By Theorem 5, RRT1 and RRT2 are large sample consistent.

5.1. Synthetic data sets

The synthetic data sets are generated as follows. We consider two models for the matrix X. Model 1 samples each entry of the design matrix X ∈ R^{n×p} independently according to N(0, 1). The matrix X in Model 2 is formed by concatenating I_n with an n×n Hadamard matrix H_n, i.e., X = [I_n, H_n]. This matrix guarantees exact support recovery using OMP at high SNR once k_0 < (1 + sqrt(n))/2 (Elad, 2010). The columns of X in both models are normalised to have unit l_2 norm. Based on the choice of X and the support S, we conduct four experiments.
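Both design models are simple to generate. The sketch below (illustrative code with hypothetical dimensions; it takes σ² = 1 and uses scipy.linalg.hadamard for H_n) builds a unit-column-norm X for either model and a k_0-sparse β with ±1 entries scaled to a target SNR, matching the setup described after the figures and table.

import numpy as np
from scipy.linalg import hadamard

def make_design(n, p, model, rng):
    if model == 1:                        # Model 1: i.i.d. N(0, 1) entries
        X = rng.standard_normal((n, p))
    else:                                 # Model 2: X = [I_n, H_n], so p = 2n (n a power of 2)
        X = np.hstack([np.eye(n), hadamard(n).astype(float)])
    return X / np.linalg.norm(X, axis=0)  # unit l2-norm columns

def make_signal(X, k0, snr, rng):
    n, p = X.shape
    S = rng.choice(p, k0, replace=False)
    beta = np.zeros(p)
    beta[S] = rng.choice([-1.0, 1.0], k0)               # random +/-1 entries on the support
    beta *= np.sqrt(snr * n / np.sum((X @ beta) ** 2))  # scale so ||X beta||^2 / n = snr (sigma^2 = 1)
    return beta, np.sort(S)

rng = np.random.default_rng(0)
X = make_design(128, 256, model=2, rng=rng)
beta, S = make_signal(X, k0=6, snr=3.0, rng=rng)
y = X @ beta + rng.standard_normal(128)                 # unit-variance Gaussian noise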

[Figure 1. Experiment 1: box plots of the l_2 error ||β̂ − β||_2 (left), false positives (middle) and false negatives (right) for OMP1, OMP2, RRT1, RRT2, LAT and CV.]

[Figure 2. Experiment 2: box plots of the l_2 error ||β̂ − β||_2 (left), false positives (middle) and false negatives (right).]

[Figure 3. Experiment 3: box plots of the l_2 error ||β̂ − β||_2 (left), false positives (middle) and false negatives (right).]

[Figure 4. Experiment 4: box plots of the l_2 error ||β̂ − β||_2 (left), false positives (middle) and false negatives (right).]

[Table 1. Outliers detected by the various algorithms (RRT, CV and LAT) on the Stack Loss, AR, Brain and Body Weight, and Stars data sets, compared with the outliers reported in the literature (Rousseeuw & Leroy, 2005; Atkinson & Riani, 2000). RRT with both α = 1/log(n) and α = 1/sqrt(n) delivered similar results. Existing results on the Stack loss, Brain and Body weight and Stars data sets are based on the combinatorially complex least median of squares (LMedS) algorithm. Existing results on the AR data set are based on extensive graphical analysis.]

Experiments 1 and 2 involve the matrix of Model 1 with two different (n, p) pairs, with the support S sampled randomly from the set {1, ..., p}. Experiments 3 and 4 involve the matrix of Model 2 with (n = 128, p = 256). For experiment 3, the support S is sampled randomly from the set {1, ..., p}, whereas in experiment 4 the support S is fixed at {1, 2, ..., k_0}. The noise w is sampled according to N(0_n, σ²I_n). The non zero entries of β are randomly assigned β_j = ±1. Subsequently, these entries are scaled to achieve SNR = ||Xβ||²_2/(nσ²) = 3. The number of non zero entries k_0 in all experiments is fixed at six. We compare the algorithms in terms of the l_2 error, the number of false positives and the number of false negatives produced over repeated runs of each experiment.

From the box plots given in Figures 1-4, it is clear that RRT with both values of α performs very similarly to OMP1. They differ only in one run of experiment 3, where RRT1 and RRT2 suffer from a false negative. Further, RRT1 and RRT2 outperform CV and LAT in all four experiments in terms of all three metrics considered for evaluation. This is primarily because LAT and CV are more prone to false positives, whereas RRT1 and RRT2 do not report any false positives. OMP2 consistently made false negatives, which explains its poor performance in terms of l_2 error. We have observed that once the SNR is made slightly higher, OMP2 delivers a performance similar to OMP1. Also note that RRT with two significantly different choices of α, viz. α = 1/sqrt(n) and α = 1/log(n), delivered similar performances. This observation is in agreement with the claim of asymptotic tuning freeness made in Remark 5. Similar trends are also visible in the simulation results presented in the supplementary materials.

5.2. Outlier detection in real data sets

We next consider the application of sparse estimation techniques, including RRT, to identify outliers in low dimensional or full column rank (i.e., n > p) real life data sets, an approach first considered in (Mitra et al., 2010; 2013). Consider a robust regression model of the form y = Xβ + w + g_out with the usual interpretations for X, β and w. The extra term g_out ∈ R^n represents the gross errors in the regression model that cannot be modelled using the distributional assumptions on w. The outlier detection problem in linear regression refers to the identification of the support S_g = supp(g_out). Since X has full column rank, one can always annihilate the signal component Xβ by projecting onto the subspace orthogonal to span(X). This results in a simple linear regression model of the form

ỹ = (I_n − XX^†)y = (I_n − XX^†)g_out + (I_n − XX^†)w,    (7)

i.e., identifying S_g in robust regression is equivalent to a sparse support identification problem in linear regression. Even though this is a regression problem with n observations and n variables, the design matrix I_n − XX^† in (7) is rank deficient (i.e., rank(I_n − XX^†) = n − rank(X) < n). Hence, classical techniques based on LS are not useful for identifying S_g. Since card(S_g) and the variance of w are unknown, we only consider the application of RRT, OMP with CV and LAT in detecting S_g. We consider four widely studied real life data sets and compare the outliers identified by these algorithms with the existing and widely replicated studies on these data sets. More details on these data sets are given in the supplementary materials. The outliers detected by the aforementioned algorithms and the outliers reported in the existing literature are tabulated in Table 1.
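The reduction in (7) is straightforward to set up. The sketch below (an illustration on made-up data, not one of the real data sets) forms the annihilated observation ỹ and the rank deficient design I_n − XX^† on which OMP together with RRT can then be run after normalising its columns.

import numpy as np

rng = np.random.default_rng(2)
n, p = 60, 3

# full column rank design (with intercept), nuisance regression vector and Gaussian noise
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
beta = rng.standard_normal(p)
w = 0.5 * rng.standard_normal(n)

# gross errors on a few observations (the outliers to be detected)
g_out = np.zeros(n)
g_out[[4, 17, 42]] = 10.0

y = X @ beta + w + g_out

# annihilate X beta by projecting onto the orthogonal complement of span(X), as in (7)
P = X @ np.linalg.pinv(X)                  # projection onto span(X)
A = np.eye(n) - P                          # rank deficient design of (7)
y_tilde = A @ y                            # = A @ g_out + A @ w, free of X beta

A_unit = A / np.linalg.norm(A, axis=0)     # unit-norm columns, ready for OMP + RRT
print(np.linalg.norm(y_tilde - A @ g_out)) # equals ||(I_n - P)w||, small relative to the outliers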
Among the four data sets considered, the outliers detected by RRT and the existing results are in consensus for two data sets, viz. the Stack loss and Stars data sets. In the AR data set, RRT identifies all the reported outliers; however, RRT also flags two additional observations as outliers, and these identifications can be potential false positives. In the Brain and Body Weight data set, RRT agrees with the existing results on most observations; however, RRT misses two observations which are claimed to be outliers by the existing results. LAT agrees with RRT in all data sets except the Stack loss data set, where it missed two of the reported outlier indices. CV correctly identified all the outliers identified by the other algorithms in all four data sets; however, it made a lot of false positives in three data sets. To summarize, among the three algorithms considered, RRT delivered the outlier detection performance most similar to the results reported in the literature.

6. Conclusions

This article proposed a novel signal and noise statistics oblivious sparse recovery technique based on OMP called residual ratio thresholding and derived finite and large sample guarantees for the same. Numerical simulations in real and synthetic data sets demonstrate a highly competitive performance of RRT when compared to OMP with a priori knowledge of signal and noise statistics. The RRT technique developed in this article can be used to operate any sparse recovery technique that produces a monotonic sequence of support estimates in a signal and noise statistics oblivious fashion. However, the support estimate sequences generated by algorithms like LASSO, DS, SP etc. are not monotonic in nature. Hence, extending the concept of RRT to operate sparse estimation techniques that produce non monotonic support sequences in a signal and noise statistics oblivious fashion is an interesting direction for future research.

References

Arlot, S. and Celisse, A. A survey of cross-validation procedures for model selection. Statistics Surveys, 4:40-79, 2010.
Atkinson, A. and Riani, M. Robust Diagnostic Regression Analysis. Springer Science & Business Media, 2000.
Bayati, M., Erdogdu, M. A., and Montanari, A. Estimating LASSO risk and noise level. In Advances in Neural Information Processing Systems, 2013.
Belloni, A., Chernozhukov, V., and Wang, L. Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika, 98(4):791-806, 2011.
Cai, T. T. and Wang, L. Orthogonal matching pursuit for sparse signal recovery with noise. IEEE Transactions on Information Theory, 57(7):4680-4688, 2011.
Candes, E. and Tao, T. The Dantzig selector: Statistical estimation when p is much larger than n. The Annals of Statistics, 35(6):2313-2351, 2007.
Chichignoud, M., Lederer, J., and Wainwright, M. J. A practical scheme and fast algorithm to tune the lasso with optimality guarantees. Journal of Machine Learning Research, 17, 2016.
Dai, W. and Milenkovic, O. Subspace pursuit for compressive sensing signal reconstruction. IEEE Transactions on Information Theory, 55(5):2230-2249, May 2009.
Dicker, L. H. Variance estimation in high-dimensional linear models. Biometrika, 101(2):269-284, 2014.
Dicker, L. H. and Erdogdu, M. A. Maximum likelihood for variance estimation in high-dimensional linear models. In Artificial Intelligence and Statistics, 2016.
Elad, M. Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing. Springer, 2010.
Fan, J., Guo, S., and Hao, N. Variance estimation using refitted cross-validation in ultrahigh dimensional regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(1):37-65, 2012.
Fletcher, A. K. and Rangan, S. Orthogonal matching pursuit: A Brownian motion analysis. IEEE Transactions on Signal Processing, 60(3), March 2012.
Liu, C., Fang, Y., and Liu, J. Some new results about sufficient conditions for exact support recovery of sparse signals via orthogonal matching pursuit. IEEE Transactions on Signal Processing, 65(17), September 2017.
Mallat, S. G. and Zhang, Z. Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41(12):3397-3415, 1993.
Meinshausen, N. and Bühlmann, P. Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4):417-473, 2010.
Mitra, K., Veeraraghavan, A., and Chellappa, R. Robust regression using sparse learning for high dimensional parameter estimation problems. In IEEE International Conference on Acoustics, Speech and Signal Processing, March 2010.
Mitra, K., Veeraraghavan, A., and Chellappa, R. Analysis of sparse regularization based robust regression approaches. IEEE Transactions on Signal Processing, 61(5):1249-1257, March 2013.
Mousavi, A., Maleki, A., and Baraniuk, R. G. Parameterless optimal approximate message passing. arXiv preprint, 2013.
Pati, Y. C., Rezaiifar, R., and Krishnaprasad, P. S. Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. In Conference Record of The Twenty-Seventh Asilomar Conference on Signals, Systems and Computers, pp. 40-44. IEEE, 1993.
Rousseeuw, P. J. and Leroy, A. M. Robust Regression and Outlier Detection. John Wiley & Sons, 2005.
Tibshirani, R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), 58(1):267-288, 1996.
Tropp, J. A. Greed is good: Algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 50(10):2231-2242, 2004.
Tropp, J. A. Just relax: Convex programming methods for identifying sparse signals in noise. IEEE Transactions on Information Theory, 52(3):1030-1051, 2006.
Tropp, J. A. and Gilbert, A. C. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Transactions on Information Theory, 53(12):4655-4666, 2007.
Wang, J. Support recovery with orthogonal matching pursuit in the presence of noise. IEEE Transactions on Signal Processing, 63(21):5868-5877, November 2015.
Wang, X., Dunson, D., and Leng, C. No penalty no tears: Least squares in high-dimensional linear models. In International Conference on Machine Learning, 2016.
Wen, J., Wang, J., and Zhang, Q. Nearly optimal bounds for orthogonal least squares. IEEE Transactions on Signal Processing, 65(20):5347-5356, 2017.
Yang, M. and de Hoog, F. Orthogonal matching pursuit with thresholding and its application in compressive sensing. IEEE Transactions on Signal Processing, 63(20):5479-5486, 2015.
Zou, H. and Hastie, T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):301-320, 2005.


More information

Large-Scale L1-Related Minimization in Compressive Sensing and Beyond

Large-Scale L1-Related Minimization in Compressive Sensing and Beyond Large-Scale L1-Related Minimization in Compressive Sensing and Beyond Yin Zhang Department of Computational and Applied Mathematics Rice University, Houston, Texas, U.S.A. Arizona State University March

More information

An iterative hard thresholding estimator for low rank matrix recovery

An iterative hard thresholding estimator for low rank matrix recovery An iterative hard thresholding estimator for low rank matrix recovery Alexandra Carpentier - based on a joint work with Arlene K.Y. Kim Statistical Laboratory, Department of Pure Mathematics and Mathematical

More information

Sparse Solutions of Systems of Equations and Sparse Modelling of Signals and Images

Sparse Solutions of Systems of Equations and Sparse Modelling of Signals and Images Sparse Solutions of Systems of Equations and Sparse Modelling of Signals and Images Alfredo Nava-Tudela ant@umd.edu John J. Benedetto Department of Mathematics jjb@umd.edu Abstract In this project we are

More information

Sparsity in Underdetermined Systems

Sparsity in Underdetermined Systems Sparsity in Underdetermined Systems Department of Statistics Stanford University August 19, 2005 Classical Linear Regression Problem X n y p n 1 > Given predictors and response, y Xβ ε = + ε N( 0, σ 2

More information

Gradient Descent with Sparsification: An iterative algorithm for sparse recovery with restricted isometry property

Gradient Descent with Sparsification: An iterative algorithm for sparse recovery with restricted isometry property : An iterative algorithm for sparse recovery with restricted isometry property Rahul Garg grahul@us.ibm.com Rohit Khandekar rohitk@us.ibm.com IBM T. J. Watson Research Center, 0 Kitchawan Road, Route 34,

More information

Message Passing Algorithms for Compressed Sensing: II. Analysis and Validation

Message Passing Algorithms for Compressed Sensing: II. Analysis and Validation Message Passing Algorithms for Compressed Sensing: II. Analysis and Validation David L. Donoho Department of Statistics Arian Maleki Department of Electrical Engineering Andrea Montanari Department of

More information

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population

More information

A Comparative Framework for Preconditioned Lasso Algorithms

A Comparative Framework for Preconditioned Lasso Algorithms A Comparative Framework for Preconditioned Lasso Algorithms Fabian L. Wauthier Statistics and WTCHG University of Oxford flw@stats.ox.ac.uk Nebojsa Jojic Microsoft Research, Redmond jojic@microsoft.com

More information

EUSIPCO

EUSIPCO EUSIPCO 013 1569746769 SUBSET PURSUIT FOR ANALYSIS DICTIONARY LEARNING Ye Zhang 1,, Haolong Wang 1, Tenglong Yu 1, Wenwu Wang 1 Department of Electronic and Information Engineering, Nanchang University,

More information

Compressed Sensing in Cancer Biology? (A Work in Progress)

Compressed Sensing in Cancer Biology? (A Work in Progress) Compressed Sensing in Cancer Biology? (A Work in Progress) M. Vidyasagar FRS Cecil & Ida Green Chair The University of Texas at Dallas M.Vidyasagar@utdallas.edu www.utdallas.edu/ m.vidyasagar University

More information

LASSO-TYPE RECOVERY OF SPARSE REPRESENTATIONS FOR HIGH-DIMENSIONAL DATA

LASSO-TYPE RECOVERY OF SPARSE REPRESENTATIONS FOR HIGH-DIMENSIONAL DATA The Annals of Statistics 2009, Vol. 37, No. 1, 246 270 DOI: 10.1214/07-AOS582 Institute of Mathematical Statistics, 2009 LASSO-TYPE RECOVERY OF SPARSE REPRESENTATIONS FOR HIGH-DIMENSIONAL DATA BY NICOLAI

More information

Low-rank Matrix Completion with Noisy Observations: a Quantitative Comparison

Low-rank Matrix Completion with Noisy Observations: a Quantitative Comparison Low-rank Matrix Completion with Noisy Observations: a Quantitative Comparison Raghunandan H. Keshavan, Andrea Montanari and Sewoong Oh Electrical Engineering and Statistics Department Stanford University,

More information

Linear Model Selection and Regularization

Linear Model Selection and Regularization Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In

More information

The Analysis Cosparse Model for Signals and Images

The Analysis Cosparse Model for Signals and Images The Analysis Cosparse Model for Signals and Images Raja Giryes Computer Science Department, Technion. The research leading to these results has received funding from the European Research Council under

More information

Greedy Signal Recovery and Uniform Uncertainty Principles

Greedy Signal Recovery and Uniform Uncertainty Principles Greedy Signal Recovery and Uniform Uncertainty Principles SPIE - IE 2008 Deanna Needell Joint work with Roman Vershynin UC Davis, January 2008 Greedy Signal Recovery and Uniform Uncertainty Principles

More information

Restricted Strong Convexity Implies Weak Submodularity

Restricted Strong Convexity Implies Weak Submodularity Restricted Strong Convexity Implies Weak Submodularity Ethan R. Elenberg Rajiv Khanna Alexandros G. Dimakis Department of Electrical and Computer Engineering The University of Texas at Austin {elenberg,rajivak}@utexas.edu

More information

Robust Principal Component Analysis

Robust Principal Component Analysis ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M

More information

Leveraging Machine Learning for High-Resolution Restoration of Satellite Imagery

Leveraging Machine Learning for High-Resolution Restoration of Satellite Imagery Leveraging Machine Learning for High-Resolution Restoration of Satellite Imagery Daniel L. Pimentel-Alarcón, Ashish Tiwari Georgia State University, Atlanta, GA Douglas A. Hope Hope Scientific Renaissance

More information

Sparse Approximation of Signals with Highly Coherent Dictionaries

Sparse Approximation of Signals with Highly Coherent Dictionaries Sparse Approximation of Signals with Highly Coherent Dictionaries Bishnu P. Lamichhane and Laura Rebollo-Neira b.p.lamichhane@aston.ac.uk, rebollol@aston.ac.uk Support from EPSRC (EP/D062632/1) is acknowledged

More information

MATCHING-PURSUIT DICTIONARY PRUNING FOR MPEG-4 VIDEO OBJECT CODING

MATCHING-PURSUIT DICTIONARY PRUNING FOR MPEG-4 VIDEO OBJECT CODING MATCHING-PURSUIT DICTIONARY PRUNING FOR MPEG-4 VIDEO OBJECT CODING Yannick Morvan, Dirk Farin University of Technology Eindhoven 5600 MB Eindhoven, The Netherlands email: {y.morvan;d.s.farin}@tue.nl Peter

More information

SOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING. Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu

SOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING. Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu SOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu LITIS - EA 48 - INSA/Universite de Rouen Avenue de l Université - 768 Saint-Etienne du Rouvray

More information

Least squares under convex constraint

Least squares under convex constraint Stanford University Questions Let Z be an n-dimensional standard Gaussian random vector. Let µ be a point in R n and let Y = Z + µ. We are interested in estimating µ from the data vector Y, under the assumption

More information

Shrinkage Methods: Ridge and Lasso

Shrinkage Methods: Ridge and Lasso Shrinkage Methods: Ridge and Lasso Jonathan Hersh 1 Chapman University, Argyros School of Business hersh@chapman.edu February 27, 2019 J.Hersh (Chapman) Ridge & Lasso February 27, 2019 1 / 43 1 Intro and

More information

Approximate Message Passing with Built-in Parameter Estimation for Sparse Signal Recovery

Approximate Message Passing with Built-in Parameter Estimation for Sparse Signal Recovery Approimate Message Passing with Built-in Parameter Estimation for Sparse Signal Recovery arxiv:1606.00901v1 [cs.it] Jun 016 Shuai Huang, Trac D. Tran Department of Electrical and Computer Engineering Johns

More information

Summary and discussion of: Controlling the False Discovery Rate via Knockoffs

Summary and discussion of: Controlling the False Discovery Rate via Knockoffs Summary and discussion of: Controlling the False Discovery Rate via Knockoffs Statistics Journal Club, 36-825 Sangwon Justin Hyun and William Willie Neiswanger 1 Paper Summary 1.1 Quick intuitive summary

More information

The Pros and Cons of Compressive Sensing

The Pros and Cons of Compressive Sensing The Pros and Cons of Compressive Sensing Mark A. Davenport Stanford University Department of Statistics Compressive Sensing Replace samples with general linear measurements measurements sampled signal

More information

Compressed Sensing and Sparse Recovery

Compressed Sensing and Sparse Recovery ELE 538B: Sparsity, Structure and Inference Compressed Sensing and Sparse Recovery Yuxin Chen Princeton University, Spring 217 Outline Restricted isometry property (RIP) A RIPless theory Compressed sensing

More information

GREEDY SIGNAL RECOVERY REVIEW

GREEDY SIGNAL RECOVERY REVIEW GREEDY SIGNAL RECOVERY REVIEW DEANNA NEEDELL, JOEL A. TROPP, ROMAN VERSHYNIN Abstract. The two major approaches to sparse recovery are L 1-minimization and greedy methods. Recently, Needell and Vershynin

More information

A Generalized Restricted Isometry Property

A Generalized Restricted Isometry Property 1 A Generalized Restricted Isometry Property Jarvis Haupt and Robert Nowak Department of Electrical and Computer Engineering, University of Wisconsin Madison University of Wisconsin Technical Report ECE-07-1

More information

Bhaskar Rao Department of Electrical and Computer Engineering University of California, San Diego

Bhaskar Rao Department of Electrical and Computer Engineering University of California, San Diego Bhaskar Rao Department of Electrical and Computer Engineering University of California, San Diego 1 Outline Course Outline Motivation for Course Sparse Signal Recovery Problem Applications Computational

More information

Sensing systems limited by constraints: physical size, time, cost, energy

Sensing systems limited by constraints: physical size, time, cost, energy Rebecca Willett Sensing systems limited by constraints: physical size, time, cost, energy Reduce the number of measurements needed for reconstruction Higher accuracy data subject to constraints Original

More information

Randomness-in-Structured Ensembles for Compressed Sensing of Images

Randomness-in-Structured Ensembles for Compressed Sensing of Images Randomness-in-Structured Ensembles for Compressed Sensing of Images Abdolreza Abdolhosseini Moghadam Dep. of Electrical and Computer Engineering Michigan State University Email: abdolhos@msu.edu Hayder

More information

Regularization: Ridge Regression and the LASSO

Regularization: Ridge Regression and the LASSO Agenda Wednesday, November 29, 2006 Agenda Agenda 1 The Bias-Variance Tradeoff 2 Ridge Regression Solution to the l 2 problem Data Augmentation Approach Bayesian Interpretation The SVD and Ridge Regression

More information

Equivalence Probability and Sparsity of Two Sparse Solutions in Sparse Representation

Equivalence Probability and Sparsity of Two Sparse Solutions in Sparse Representation IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 19, NO. 12, DECEMBER 2008 2009 Equivalence Probability and Sparsity of Two Sparse Solutions in Sparse Representation Yuanqing Li, Member, IEEE, Andrzej Cichocki,

More information

Does l p -minimization outperform l 1 -minimization?

Does l p -minimization outperform l 1 -minimization? Does l p -minimization outperform l -minimization? Le Zheng, Arian Maleki, Haolei Weng, Xiaodong Wang 3, Teng Long Abstract arxiv:50.03704v [cs.it] 0 Jun 06 In many application areas ranging from bioinformatics

More information

Scalable Sparse Subspace Clustering by Orthogonal Matching Pursuit

Scalable Sparse Subspace Clustering by Orthogonal Matching Pursuit Scalable Sparse Subspace Clustering by Orthogonal Matching Pursuit Chong You, Daniel P. Robinson, René Vidal Johns Hopkins University, Baltimore, MD, 21218, USA Abstract Subspace clustering methods based

More information

arxiv: v1 [cs.it] 21 Feb 2013

arxiv: v1 [cs.it] 21 Feb 2013 q-ary Compressive Sensing arxiv:30.568v [cs.it] Feb 03 Youssef Mroueh,, Lorenzo Rosasco, CBCL, CSAIL, Massachusetts Institute of Technology LCSL, Istituto Italiano di Tecnologia and IIT@MIT lab, Istituto

More information

ORTHOGONAL matching pursuit (OMP) is the canonical

ORTHOGONAL matching pursuit (OMP) is the canonical IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 56, NO. 9, SEPTEMBER 2010 4395 Analysis of Orthogonal Matching Pursuit Using the Restricted Isometry Property Mark A. Davenport, Member, IEEE, and Michael

More information

Lecture 22: More On Compressed Sensing

Lecture 22: More On Compressed Sensing Lecture 22: More On Compressed Sensing Scribed by Eric Lee, Chengrun Yang, and Sebastian Ament Nov. 2, 207 Recap and Introduction Basis pursuit was the method of recovering the sparsest solution to an

More information

Introduction to Compressed Sensing

Introduction to Compressed Sensing Introduction to Compressed Sensing Alejandro Parada, Gonzalo Arce University of Delaware August 25, 2016 Motivation: Classical Sampling 1 Motivation: Classical Sampling Issues Some applications Radar Spectral

More information