Online Adaptive Estimation of Sparse Signals: where RLS meets the ℓ1-norm


IEEE TRANSACTIONS ON SIGNAL PROCESSING (TO APPEAR)

Online Adaptive Estimation of Sparse Signals: where RLS meets the ℓ1-norm

Daniele Angelosante, Member, IEEE, Juan-Andres Bazerque, Student Member, IEEE, and Georgios B. Giannakis, Fellow, IEEE

Manuscript submitted September 2, 2009; revised January 20, 2010; accepted March 9, 2010. Parts of this paper were presented at the IEEE Intl. Conf. on Acoust., Speech and Signal Proc., Taipei, Taiwan, April 2009, and at the IEEE Workshop on Stat. Signal Proc., Cardiff, Wales, UK, August 2009. The authors are with the Dept. of Electrical and Computer Engr. at the Univ. of Minnesota, Minneapolis, MN 55455, USA (e-mails: {angel29, bazer002, georgios}@umn.edu).

Abstract—Using the ℓ1-norm to regularize the least-squares criterion, the batch least-absolute shrinkage and selection operator (Lasso) has well-documented merits for estimating sparse signals of interest emerging in various applications where observations adhere to parsimonious linear regression models. To cope with the high complexity, increasing memory requirements, and lack of tracking capability that batch Lasso estimators face when processing observations sequentially, the present paper develops a novel time-weighted Lasso (TWL) approach. Performance analysis reveals that TWL cannot estimate consistently the desired signal support without compromising its rate of convergence. This motivates the development of a time- and norm-weighted Lasso (TNWL) scheme with ℓ1-norm weights obtained from the recursive least-squares (RLS) algorithm. The resultant algorithm consistently estimates the support of sparse signals without reducing the convergence rate. To cope with sparsity-aware recursive real-time processing, novel adaptive algorithms are also developed to enable online coordinate descent solvers of TWL and TNWL that provably converge to the true sparse signal in the time-invariant case. Simulated tests compare competing alternatives and corroborate the performance of the novel algorithms in estimating time-invariant signals, and in tracking time-varying signals under sparsity constraints.

I. INTRODUCTION

Sparsity is an attribute present in a plethora of natural as well as man-made signals and systems. This is reasonable not only because nature itself is parsimonious, but also because simple models with minimal degrees of freedom, and the processing they entail, are attractive from an implementation perspective. Exploitation of sparsity is critical in applications as diverse as variable selection in linear regression models for diabetes [24], image compression [5], distributed spectrum sensing for cognitive radios [3], estimation of wireless multipath channels [9], [23], and signal decomposition using overcomplete bases [8].

The notion of variable selection (VS) associated with sparse linear regression [24] is the cornerstone of the emerging area of compressive sampling (CS) [4], [5], [8]. VS is a combinatorially complex task closely related (but not identical) to the well-known model order selection problem tackled through Akaike's information [1], Bayesian information [20], and risk inflation [11] criteria. A typically convex function of the model fitting error is penalized with the ℓ0-norm of the unknown vector, which equals the number of nonzero entries and thus accounts for model complexity (degrees of freedom). To bypass the non-convexity of the ℓ0-norm, VS and CS approaches replace it with convex penalty terms (e.g., the ℓ1-norm) that capture sparsity but also lead to computationally efficient solvers.

Research on CS and VS has concentrated on batch processing, and various algorithms for sparse linear regression are available. These include basis pursuit and the Lasso [8], [24], the Dantzig selector [6], and the ℓ2-norm constrained ℓ1-norm minimizer [4]. CS and VS estimators are nonlinear functions of the available observations, which they process in batch form using iterative algorithms. However, many sparse signals encountered in practice must be estimated based on noisy observations that become available sequentially in time. For such cases, batch signal estimators typically incur complexity and memory requirements that grow as time progresses. In addition, the sparse signal may vary with time, both in its nonzero support and in the values of its nonzero entries. To cope with these challenges, the present paper develops adaptive algorithms for recursive estimation and tracking of (possibly time-varying) sparse signals based on noisy sequential observations adhering to a linear regression model.

Adaptive estimation of sparse signals has received little consideration so far. Sequential noise-free signal recovery was considered in [18], and a sparsity-aware least mean-square (LMS) algorithm was pursued in [14]. Sparsity-aware RLS-like algorithms are reported in [2], while an effort to combine Kalman filtering and compressed sensing can be found in [26].

After preliminaries, the problem is stated in Section II. Section III deals with two pseudo-real-time adaptive Lasso algorithms: the time-weighted Lasso (TWL), and the time- and norm-weighted Lasso (TNWL). The novel TWL replaces the ℓ2-norm regularizing term of the RLS algorithm with the ℓ1-norm that encourages sparsity. It turns out that if the sparse signal vector is time-invariant, the TWL cannot estimate the signal support consistently while at the same time ensuring a convergence rate comparable to RLS. To overcome this shortcoming, the TNWL introduces a weighted ℓ1-norm regularization, where the weights are obtained from the RLS algorithm. TNWL estimates consistently the support of the sparse signal vector at a convergence rate identical to that of RLS. Since TWL and TNWL estimates are not available in closed form, a sequence of convex programs has to be solved as new data are acquired, which is not desirable for real-time applications. Recent advances in sparse linear regression have shown that the coordinate descent approach provides an efficient means of solving Lasso-like problems, since it is fast and numerically stable [13], [27].

This motivates the present paper's novel, low-complexity online coordinate descent algorithms developed in Section IV. Partial coordinate-wise updates have been considered for adaptive but sparsity-agnostic processing to lower the computational complexity of LMS and RLS [12], [28]. On top of lowering the computational burden, online coordinate descent is well motivated for sparsity-aware adaptive processing because the scalar coordinate estimates become available in closed form. Corroborating simulations are presented in Section V, both for time-invariant and time-varying sparse signals. Conclusions are drawn in Section VI.

Notation. Column vectors (matrices) are denoted with lower-case (upper-case) boldface letters, and sets with calligraphic letters; (·)^T stands for transposition, and (·)^† for the Moore-Penrose pseudo-inverse. The pth entry of vector x is denoted as x(p), and the (p,q)th entry of matrix R as R(p,q). The function N(µ, σ²) stands for the Gaussian probability density function with mean µ and variance σ²; the ℓ1- and ℓ2-norms of x ∈ R^P are denoted as ||x||_1 := Σ_{p=1}^P |x(p)| and ||x||_2 := (Σ_{p=1}^P x(p)²)^{1/2}, respectively.

II. PRELIMINARIES AND PROBLEM STATEMENT

Consider a vector x_o ∈ R^P which is sparse, meaning that only a few of its entries x_o(p), p = 1,...,P, are nonzero. Let S_{x_o} := {p : x_o(p) ≠ 0} denote its support, P_1 := |S_{x_o}| the number of nonzero entries, and P_0 := P − P_1. Sparsity amounts to having P_1 ≪ P_0. Suppose that such a sparse vector is to be estimated sequentially in time from scalar observations obeying the linear regression model

y_n := h_n^T x_o + v_n,  n = 1,...,N   (1)

where h_n ∈ R^P is the regression vector at time n, and the additive noise v_n is assumed uncorrelated with h_n, white, with mean zero, and variance σ². The goal of this paper is to develop sequential and adaptive estimators of x_o, which is a priori known to be sparse, and perhaps slowly varying with n.

The least-squares (LS) criterion is the workhorse for linear regression analysis [19, p. 658]. If y_N := [y_1,...,y_N]^T and H_N := [h_1,...,h_N]^T, the LS estimator of x_o at time N solves the minimization problem

x̂_N^LS := arg min_{x ∈ R^P} ||y_N − H_N x||_2².   (2)

If N < P or H_N is not full column rank, the problem in (2) does not admit a unique solution. For such cases, minimizing also the ℓ2-norm of x renders the LS solver unique and expressible as x̂_N^LS = H_N^† y_N, where † denotes the matrix pseudo-inverse defined as in, e.g., [15, p. 275]. In the sequential context considered herein, LS faces three challenges: i) increasing memory requirements for storing y_N and H_N as N grows large; ii) complexity of order O(P³) per time instant n to perform the inversion in H_N^†; and iii) lack of capability to track possible variations of x_o with n. These challenges are met by the recursive least-squares (RLS) estimator obtained as [19, Chap. 2]

x̂_N^RLS := arg min_{x ∈ R^P} Σ_{n=1}^N β_{N,n} (y_n − h_n^T x)²,  N = 1, 2,...   (3)

where the so-called forgetting factor β_{N,n} describes one of the following data windowing choices:

(w1) Infinite window with β_{N,n} = 1. This choice is adopted for time-invariant signals and, with proper initialization, renders RLS equivalent to LS at complexity O(P²) per datum.

(w2) Exponentially decaying window with β_{N,n} = β^{N−n}, and 0 ≤ β < 1. With this choice, RLS downweighs old samples and can track time-varying signals.

(w3) Finite window with β_{N,n} = 1 if N − M + 1 ≤ n ≤ N, and β_{N,n} = 0 otherwise. Here, only the most recent M samples are utilized to form x̂_N^RLS, while the rest are discarded.

The estimator in (3) can be expressed recursively in terms of x̂_{N−1}^RLS. Supposing that {h_n}_{n=1}^P are linearly independent, setting β_{N,n} = 1, and initializing this recursion with the LS solution for N = P, that is x̂_P^RLS = x̂_P^LS, the RLS coincides with the LS for successive time instants N > P, provided that x_o remains time-invariant [19, p. 740]. For N < P, or when {h_n}_{n=1}^P are linearly dependent, the RLS estimator can be regularized by augmenting the LS cost with a scaled ℓ2-norm of x [19, p. 739]. Specifically, the regularized RLS is

x̂_N^RLS := arg min_{x ∈ R^P} [ Σ_{n=1}^N β_{N,n} (y_n − h_n^T x)² + γ_N ||x||_2² ],  N = 1, 2,...   (4)

where γ_N > 0 is a pre-selected decreasing function of N that depends on the selected window, and whose effect vanishes for N large. Clearly, for γ_N = 0 the regularized RLS in (4) reduces to the ordinary one in (3), but neither exploits the sparsity present in x_o.

Sparse linear regression has been a topic of intense research in the last decade, and the Lasso is one of the most widely applied sparsity-aware estimators [5], [24]. The Lasso estimator is given by

x̂_N^Lasso := arg min_{x ∈ R^P} [ (1/2) ||y_N − H_N x||_2² + λ_N ||x||_1 ].   (5)

Thanks to the scaled ℓ1-norm, the cost encourages sparse solutions [24]: the larger the chosen λ_N is, the more components are shrunk to zero. Interestingly, the Lasso performs well in sparse problems also when N < P, and the convex ℓ1-norm regularization, which affords efficient solvers when optimizing (5) given batch data, performs similarly to its non-convex ℓ0-norm counterpart [4].

The question that arises is how the ℓ1-norm regularization can be effectively utilized in adaptive signal processing. Specifically, given {y_n, h_n}_{n=1}^N, we wish to develop recursive schemes to estimate the sparse signal of interest with: i) minimal memory requirements; ii) tracking capability; and iii) limited complexity.
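To make the setup of (1)-(5) concrete, the short sketch below simulates the sparse linear model (1) and contrasts the sparsity-agnostic regularized LS of (4) with the batch Lasso of (5), the latter solved here by a few passes of the cyclic coordinate descent discussed later in Section IV. This is a minimal illustration under assumed parameter values (P, P_1, σ², γ, and the λ choice are picked only for the example), not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sparse linear regression model (1): y_n = h_n^T x_o + v_n  (illustrative values).
P, P1, N, sigma2 = 30, 3, 200, 1e-1
x_o = np.zeros(P)
x_o[:P1] = 1.0                               # first P1 entries nonzero, rest zero
H = rng.standard_normal((N, P))              # regressors h_n ~ N(0_P, I_P), stacked row-wise
y = H @ x_o + np.sqrt(sigma2) * rng.standard_normal(N)

# Sparsity-agnostic regularized LS, cf. (4): minimize ||y - Hx||_2^2 + gamma ||x||_2^2.
gamma = 1e-2
x_ls = np.linalg.solve(H.T @ H + gamma * np.eye(P), H.T @ y)

# Batch Lasso (5): minimize (1/2)||y - Hx||_2^2 + lam ||x||_1, solved by a fixed number
# of cyclic coordinate descent passes (soft thresholding per coordinate).
lam = 2 * np.sqrt(sigma2 * 2 * N * np.log(P))    # assumed scale, in the spirit of Sec. III-C
x_lasso = np.zeros(P)
R, r = H.T @ H, H.T @ y
for _ in range(100):
    for p in range(P):
        rp = r[p] - R[p] @ x_lasso + R[p, p] * x_lasso[p]
        x_lasso[p] = np.sign(rp) * max(abs(rp) - lam, 0.0) / R[p, p]

print("nonzeros (regularized LS):", np.sum(np.abs(x_ls) > 1e-3))
print("nonzeros (Lasso)         :", np.sum(np.abs(x_lasso) > 1e-3))
```

Even in this toy setting, the ℓ1 penalty returns exact zeros for the inactive entries, whereas the ℓ2-regularized LS does not.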

III. ADAPTIVE PSEUDO-REAL-TIME LASSO

Motivated by (3), a time-weighted Lasso (TWL) approach emerges naturally to endow the batch Lasso in (5) with the ability to handle sequential processing. Specifically, the proposed TWL estimator is

x̂_N^TWL := arg min_{x ∈ R^P} J_N^TWL(x),  N = 1, 2,...   (6)

where J_N^TWL(x) := (1/2) Σ_{n=1}^N β_{N,n} (y_n − h_n^T x)² + λ_N ||x||_1. In addition to windowing, note that λ_N is now allowed to vary with time. Neglecting constant terms, the cost function in (6) can be re-written as

J_N^TWL(x) = (1/2) x^T R_N x − x^T r_N + λ_N ||x||_1   (7)

where

r_N := Σ_{n=1}^N β_{N,n} y_n h_n,   R_N := Σ_{n=1}^N β_{N,n} h_n h_n^T.   (8)

Due to data windowing, r_N and R_N can be updated recursively as [cf. (w1)-(w3)]

(w1): r_N = r_{N−1} + y_N h_N,  R_N = R_{N−1} + h_N h_N^T   (9a)
(w2): r_N = β r_{N−1} + y_N h_N,  R_N = β R_{N−1} + h_N h_N^T   (9b)
(w3): r_N = r_{N−1} + y_N h_N − y_{N−M} h_{N−M},  R_N = R_{N−1} + h_N h_N^T − h_{N−M} h_{N−M}^T.   (9c)

Relative to the batch Lasso in (5), any of the TWL updates in (9) offers memory savings. Clearly, choices (w2) and (w3) also allow the signal of interest to vary slowly with time. With respect to RLS in (4), the TWL estimator inherits the properties brought by the ℓ1-norm, namely sparsity awareness and the ability to deal with under-determined systems (N < P). Summarizing, the attractive features of TWL are: i) reduced memory requirements with respect to batch Lasso; ii) improved performance relative to RLS when x_o is sparse and time-invariant; and iii) enhanced tracking capability when x_o is sparse and time-varying, relative to batch Lasso and RLS, for windows of size less than the dimension P of x_o.

Despite these attractive features, the main limitation of TWL is that a convex program has to be solved per time instant. While the RLS cost is differentiable, and thus amenable to closed-form minimization, J_N^TWL(x) is not. However, initializing the convex program at time N with the solution x̂_{N−1}^TWL at time N−1 provides a warm start-up, which speeds up convergence to the optimum x̂_N^TWL. For these reasons, TWL is a pseudo-real-time algorithm. Low-complexity real-time algorithms will be developed in Section IV. But for now, it is worth checking TWL for consistency.

A. (In)Consistency of the TWL estimator

Since the nonzero support of x_o is unknown, and sparse vector estimators are nonlinear functions of the data not expressible in closed form, performance analysis is distinct from, and far more challenging than, that of LS estimators. Consider for simplicity that x_o is time-invariant, for which (w1) is prudent to adopt, and suppose that the regressors and noise satisfy these regularity (ergodicity) conditions:

(r1) lim_{N→∞} (1/N) Σ_{n=1}^N h_n h_n^T = R_∞ with probability 1 (w.p. 1), with R_∞ positive definite; and
(r2) lim_{N→∞} (1/N) Σ_{n=1}^N v_n h_n = r_{vh} w.p. 1.

If the noise v_n and the regressors {h_n} are mixing, which is the case for most stationary processes with vanishing memory in practice, then (r1) and (r2) are readily met. Since v_n in (1) is zero mean and uncorrelated with h_n, the cross-covariance in (r2) vanishes. With r_{vh} = 0, it follows readily from (1) that r_∞ := lim_{N→∞} (1/N) Σ_{n=1}^N y_n h_n = R_∞ x_o w.p. 1. Upon dividing both sides of (7) by N and taking limits, (r1) and (r2) then imply that lim_{N→∞} (1/N) J_N^TWL(x) = (1/2) x^T R_∞ x − x^T r_∞ := J_∞(x) w.p. 1, provided λ_N is chosen to grow slower than N. In this case, as N → ∞ it holds that x̂_N^TWL = arg min_x J_N^TWL(x) → arg min_x J_∞(x) = R_∞^{-1} r_∞ = x_o w.p. 1. This proves the following result.

Proposition 1. For the model in (1) with (r1), (r2), and (w1) in effect, the TWL estimator is strongly consistent, provided that λ_N is chosen to satisfy lim_{N→∞} λ_N/N = 0.

At this point it is instructive to recall that, under the conditions of Proposition 1, the LS estimator x̂_N^LS = R_N^{-1} r_N also converges w.p. 1 to x_o = R_∞^{-1} r_∞, and is thus strongly consistent [16]. For this reason, to assess performance of TWL and differentiate it from that of LS, it is pertinent to consider N sufficiently large (but preferably finite), for which the standard sparsity-agnostic LS is unable to accurately estimate the zero entries of x_o. It is thus of interest to check whether TWL can estimate jointly the nonzero support and the nonzero entries of x_o consistently for N sufficiently large. To this end, suppose that the first P_1 entries of x_o are nonzero, i.e., S_{x_o} := {1,...,P_1}, and partition accordingly the R_∞ matrix as

R_∞ = [ R_∞^{11}  R_∞^{10} ; R_∞^{01}  R_∞^{00} ].

Following the definitions in [10], support consistency amounts to having

lim_{N→∞} Prob[ S_{x̂_N} = S_{x_o} ] = 1   (10)

and √N-estimation consistency requires convergence in distribution (→_d), that is,

√N ( x̂_N^{P_1} − x_o^{P_1} ) →_d N( 0_{P_1}, σ² (R_∞^{11})^{-1} )   (11)

where x̂_N^{P_1} denotes the P_1 × 1 vector obtained by extracting the first P_1 components of x̂_N. Properties (10) and (11) are referred to as oracle properties, because a sparsity-aware estimator possessing these properties is asymptotically as good as if the support S_{x_o} were known in advance [10]. Under (w1), the TWL corresponds to a sequential version of the batch Lasso estimator. Hence, asymptotic properties of the latter derived in [16], [29] carry over to the TWL estimator introduced here.

Lemma 1 (see also [29, Prop. 1]). For the model in (1) with (r1), (r2), and (w1) in effect, if lim_{N→∞} λ_N/√N = λ_0 ≥ 0, then lim_{N→∞} Prob[ S_{x̂_N^TWL} = S_{x_o} ] = c(λ_0) < 1, with c(λ_0) denoting an increasing function of λ_0.

In words, Lemma 1 asserts that if λ_N grows as √N, support consistency cannot be achieved. Since c(λ_0) increases with λ_0, the hope for the TWL to satisfy the oracle properties is left for cases wherein λ_N grows faster than √N. Unfortunately, the next result discourages this.

Lemma 2 (see also [29, Lemma 3]). For the model in (1) with (r1), (r2), and (w1) in effect, if lim_{N→∞} λ_N/√N = ∞ and lim_{N→∞} λ_N/N = 0, then lim_{N→∞} (N/λ_N)( x̂_N^TWL − x_o ) = C, where C is a non-random constant.

Lemma 2 states that if λ_N grows faster than √N but slower than N, the rate of convergence is N/λ_N, that is, slower than √N; hence, √N( x̂_N^TWL − x_o ) diverges. Combining Lemmas 1 and 2, the following negative result holds for the batch Lasso, and thus for TWL.

Proposition 2. For the model in (1) with (r1), (r2), and (w1) in effect, the TWL estimator cannot achieve the oracle properties for any choice of λ_N.

Before exploring alternatives to TWL that satisfy the oracle properties, one remark is in order.

Remark 1. If, instead of the convex ℓ1-norm, the LS cost is regularized with suitably chosen non-convex functions of x, it is possible to construct sparsity-aware estimators that asymptotically possess the oracle properties [10]. Of course, the price paid is inefficient optimization due to non-convexity. These considerations motivate searching for convex regularizing terms which result in pseudo-real-time Lasso estimators satisfying the oracle properties. Such a class is developed next, using the weighted ℓ1-norm regularization introduced in [29], [30] for the batch Lasso.

B. Time- and norm-weighted Lasso

Let u(·) denote the step function, {µ_N} a positive sequence dependent on the sample size, and a > 1 a constant tuning parameter. Based on these, define the weight function w_{µ_N}(·) : R_+ → [0,1] as

w_{µ_N}(x) := ( [aµ_N − x]_+ / ((a−1)µ_N) ) u(x − µ_N) + u(µ_N − x)   (12)

where [α]_+ := max(α, 0). Using w_{µ_N}, the novel time- and norm-weighted Lasso (TNWL) estimator weighs the ℓ1-norm with coefficients depending on the entries of the RLS estimate x̂_N^RLS; that is,

x̂_N^TNWL := arg min_{x ∈ R^P} [ (1/2) Σ_{n=1}^N β_{N,n} (y_n − h_n^T x)² + λ_N Σ_{p=1}^P w_{µ_N}( |x̂_N^RLS(p)| ) |x(p)| ],  N = 1, 2,...   (13)

Figure 1 shows the weight function w_{µ_N}(x) for µ_N = 0.2 and a = 3.7. Notice that, while λ_N in TWL weighs identically all summands |x(p)| in the ℓ1-norm, the TNWL estimator places higher weight on small entries, and lower weight on entries with large amplitudes. In fact, estimates of size less than µ_N are penalized as in TWL, while estimates between µ_N and aµ_N are penalized in a linearly decreasing manner. Finally, estimates larger than aµ_N are not penalized at all (cf. Fig. 1).

It is worth recalling at this point that, albeit sparsity-agnostic, the RLS estimator is √N-estimation consistent [16], that is,

√N ( x̂_N^RLS − x_o ) →_d N( 0, σ² R_∞^{-1} ).   (14)

Based on (14), it is possible to establish the following result.

Proposition 3 (see also [30, Theorem 4]). For the model in (1), with (r1), (r2), and (w1) in effect, if lim_{N→∞} λ_N/√N = ∞, lim_{N→∞} λ_N/N = 0, and µ_N = λ_N/N, the TNWL estimator satisfies the oracle properties (10) and (11).

Weighted ℓ1-norm regularization was introduced in [7], [29], and [30] using different weight functions to effect sparsity and satisfy the oracle properties of the batch weighted Lasso estimator. The weight function in (12) corresponds to the local linear approximation of the smoothly clipped absolute deviation (SCAD) regularizer introduced by [30].
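For concreteness, the short sketch below is a plain transcription of the weight function (12) and of how TNWL uses it; the values of µ_N and a match Fig. 1, while the RLS estimates fed to it are hypothetical and chosen only for illustration.

```python
import numpy as np

def w_mu(x, mu, a=3.7):
    """SCAD-type weight of (12): full weight below mu, linear decay on [mu, a*mu], zero above."""
    x = np.abs(np.asarray(x, dtype=float))
    return (np.maximum(a * mu - x, 0.0) / ((a - 1.0) * mu)) * (x > mu) + 1.0 * (x <= mu)

# TNWL weighs the p-th l1-term in (13) by w_mu(|x_rls[p]|): small RLS estimates are
# penalized fully, large ones (likely true nonzeros) are left unpenalized.
mu_N = 0.2                                   # illustrative value, matching Fig. 1
x_rls = np.array([0.05, 0.15, 0.5, 0.9])     # hypothetical RLS estimates
print(w_mu(x_rls, mu_N))                     # -> [1.  1.  0.4444...  0.]
```

In the adaptive setting, these weights are recomputed from the current RLS estimate at every time step, so the penalty adapts as evidence about the support accumulates.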
The difference here is its coupling with RLS to ensure consistency of the novel adaptive TNWL estimator. Next, the implications of Propositions 2 and 3 are demonstrated through simulated tests.

C. Numerical examples

Gaussian observations are generated according to (1) with a time-invariant x_o, P = 30, P_1 = 3, v_n ~ N(0, σ²), σ² = 10^{-1}, h_n ~ N(0_P, I_P), and infinite windowing as in (w1). The penalty scale is set to λ_N = 2σ√(2N log P) for the TWL (see also [29]), and λ_N = 2σ√(N log P) for TNWL, with µ_N = 1/√N and a = 3.7. The first three entries of x_o are chosen equal to unity, and all other entries are set to zero. Figure 2 depicts the mean-square error (MSE), E[ ||x̂_N − x_o||² ], across time for the TWL, TNWL, and RLS, along with what is termed genie-aided RLS (GA-RLS), which knows in advance the support and performs standard RLS to estimate the nonzero components. The convex optimization problem per time instant is solved using the SeDuMi package [22] interfaced with Yalmip [17]. Observe that while TWL outperforms RLS, it is outperformed by TNWL, whose performance approaches that of the GA-RLS benchmark. Indeed, the TNWL does achieve the oracle properties in the considered simulation setting.

Next, Gaussian observations are generated according to (1) with a time-varying x_o (henceforth denoted as x_n), and parameters P = 30, P_1 = 3, v_n ~ N(0, σ²), σ² = 10^{-1}, and h_n ~ N(0_P, I_P). A Gauss-Markov model is assumed for x_n; that is, x_n(p) = α x_{n−1}(p) + w_n(p) with x_0(p) ~ N(0,1), α = 0.99, and w_n(p) ~ N(0, 1−α²) for p = 1, 2, 3. Without loss of generality, (w2) is adopted with β = 0.9, λ_N = 2σ√(2 log P · Σ_{n=1}^N β^{2(N−n)}) for both TWL and TNWL, and µ_N = 1/Σ_{n=1}^N β^{N−n}.

Clearly, in a time-varying setting these estimators are not expected to achieve the consistency properties established when x_o remains time-invariant. Figure 3 depicts the squared error (SE) for a realization of the RLS, GA-RLS, TWL and TNWL. In the considered setting, TWL and TNWL perform similarly, and both outperform RLS while approaching the performance of the GA-RLS benchmark.

Next, Gaussian observations are generated according to (1) with a time-varying x_n, and parameters P = 30, P_1 = 3, v_n ~ N(0, σ²), σ² = 10^{-2}, and h_n ~ N(0_P, I_P). A Gauss-Markov model is assumed for x_n with α = 0.995, and (w3) windowing is adopted. For brevity, only the regularized RLS in (4) with constant γ_N is shown along with the TWL estimators. In Fig. 4, two window sizes of length M = 5 and M = 30 are simulated. Interestingly, while RLS with M = 30 outperforms RLS with M = 5, TWL with M = 5 outperforms TWL with M = 30, and achieves the overall best performance. In fact, TWL performs well even for small window sizes, M < P, when the signal of interest is sparse. Thus, TWL exhibits better tracking capability than RLS, which requires a longer window size and can therefore only track signals with slower variations.

IV. ADAPTIVE REAL-TIME LASSO

As mentioned earlier, the TWL and TNWL estimators are not suitable for real-time implementation. In this section, online algorithms are developed and analyzed. The vector iterates developed in the next subsection provide online solvers of (6) and (13), admit a closed-form solution per iteration, and are proved convergent to x_o when the unknown vector is time-invariant. For notational brevity, the algorithms are developed for the TWL estimator, but carry over to TNWL as well.

A. Online coordinate descent

One approach to finding the solution x̂_N^TWL in (6) is to run a cyclic coordinate descent (CCD) algorithm, which in its simplest form entails cyclic iterative minimization of J_N^TWL(x) in (7) with respect to one coordinate per iteration cycle. Let x_N^{(i)} denote the solution at time N and iteration i. The pth variable at the ith iteration is updated as

x_N^{(i)}(p) := arg min_x J_N^TWL( x_N^{(i)}(1),..., x_N^{(i)}(p−1), x, x_N^{(i−1)}(p+1),..., x_N^{(i−1)}(P) )   (15)

for p = 1,...,P. During the ith cycle each coordinate (here the pth) is optimized, while the pre-cursor coordinates (those with index less than p) are kept fixed to their values at the ith cycle, and the post-cursor coordinates (those with index greater than p) are kept fixed to their values at the (i−1)st cycle. Albeit convex, the cost J_N^TWL(x) is non-differentiable. Nonetheless, convergence of the CCD algorithm for Lasso-type problems follows readily using the results of [25]. In addition to affording effective initialization (with the all-zero vector), another attractive feature of CCD Lasso solvers is that each coordinate-wise minimizer is available in closed form. Recent comparative studies show that CCD exhibits computational complexity similar to (if not lower than) state-of-the-art batch Lasso solvers, and is numerically stable [13], [27].

The online coordinate descent (OCD) algorithm introduced next can be viewed as an adaptive counterpart of CCD Lasso, where a new datum is incorporated at each iteration; that is, the iteration index (i) in CCD is replaced in OCD by the time index N. The challenge arises because the cost function changes with time. The crux of OCD is to update only one variable per datum, in the spirit of, e.g., the partial least mean-squares (PLMS) algorithm [12].
Notwithstanding, PLMS is a sparsity-agnostic first-order algorithm, whereas OCD is sparsity-cognizant, it capitalizes on second-order statistics similar to RLS, and it is also provably convergent.

For notational convenience, express the time index as N = kP + p, where p ∈ {1,...,P} corresponds to the only entry of x to be updated at time N, and k = (N − p)/P indexes the number of cycles, that is, how many times the pth coordinate has been updated. Let x̃_N^OCD denote the solution of the OCD algorithm at time N, and set x̃_N^OCD(q) = x̃_{N−1}^OCD(q) for q ≠ p, which amounts to setting all but the pth coordinate at time N equal to those at time N−1, and selecting the pth one by minimizing J_N^TWL(x); that is,

x̃_N^OCD(p) := arg min_x J_N^TWL( x̃_{N−1}^OCD(1),..., x̃_{N−1}^OCD(p−1), x, x̃_{N−1}^OCD(p+1),..., x̃_{N−1}^OCD(P) ).   (16)

In the cyclic update (16), the pre-cursor coordinates {x̃_{N−1}^OCD(q), q < p} have been updated k+1 times, while the post-cursor entries {x̃_{N−1}^OCD(q), q > p} have been updated k times. After isolating from J_N^TWL(x) only the terms which depend on the pth coordinate that is currently optimized, recursion (16) can be rewritten as [cf. (7)]

x̃_N^OCD(p) = arg min_x [ (1/2) R_N(p,p) x² − r_{N,p} x + λ_N |x| ]   (17)

r_{N,p} := r_N(p) − Σ_{q ≠ p} R_N(p,q) x̃_{N−1}^OCD(q).   (18)

Being a scalar optimization problem, it is well known that the minimization in (17) accepts a closed-form solution, namely [13]

x̃_N^OCD(p) = sgn(r_{N,p}) [ |r_{N,p}| − λ_N ]_+ / R_N(p,p).   (19)

Equation (19) amounts to a soft-thresholding operation that sets to zero inactive entries, thus facilitating convergence to sparse iterates. The OCD-TWL scheme is tabulated as Algorithm 1.

Algorithm 1 OCD-TWL
Initialize with x̃_0^OCD(p) = 0, p = 1,...,P.
for k = 0, 1,... do
  for p = 1,...,P do
    S1. Acquire datum y_N and regressor h_N, N = kP + p.
    S2. Obtain r_N and R_N as in (8).
    S3. Set x̃_N^OCD(q) = x̃_{N−1}^OCD(q) for all q ≠ p.
    S4. Compute r_{N,p} via (18).
    S5. Update x̃_N^OCD(p) as in (19).
  end for
end for
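A compact sketch of Algorithm 1 under the infinite window (w1) follows. It is an illustrative rendition rather than the authors' implementation: the data model, the λ_N schedule, the run length, and the function name ocd_twl are assumptions made only for this example.

```python
import numpy as np

def ocd_twl(stream, P, lam_fn):
    """Online coordinate descent TWL (Algorithm 1) with infinite window (w1).

    stream : iterable of (y_n, h_n) pairs arriving sequentially in time
    P      : signal dimension
    lam_fn : callable N -> lambda_N (assumed regularization schedule)
    """
    R = np.zeros((P, P))
    r = np.zeros(P)
    x = np.zeros(P)                          # current estimate x_N^OCD
    for N, (y, h) in enumerate(stream, start=1):
        # S1-S2: incorporate the new datum into R_N and r_N, cf. (9a)
        R += np.outer(h, h)
        r += y * h
        # S3-S5: update only coordinate p (cycled with N) via soft thresholding (19)
        p = (N - 1) % P
        r_Np = r[p] - R[p] @ x + R[p, p] * x[p]
        if R[p, p] > 0:
            x[p] = np.sign(r_Np) * max(abs(r_Np) - lam_fn(N), 0.0) / R[p, p]
        yield x.copy()

# Example usage on synthetic data from model (1); parameter values are illustrative.
rng = np.random.default_rng(1)
P, sigma = 30, np.sqrt(1e-1)
x_o = np.zeros(P); x_o[:3] = 1.0
data = ((h @ x_o + sigma * rng.standard_normal(), h)
        for h in (rng.standard_normal(P) for _ in range(3000)))
lam = lambda N: 2 * sigma * np.sqrt(2 * N * np.log(P))
for est in ocd_twl(data, P, lam):
    pass
print("support found:", np.flatnonzero(np.abs(est) > 1e-3))
```

Replacing the (w1) accumulation by the (w2) or (w3) recursions of (9) yields the tracking variants discussed in Section V.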

Convergence of OCD-TWL is established in Appendix A, and the main result can be summarized as follows.

Proposition 4. For the model in (1) with (r1), (r2), and (w1) in effect, if lim_{N→∞} λ_N/N = 0, it holds w.p. 1 that lim_{N→∞} x̃_N^OCD = x_o.

In words, Proposition 4 asserts that the OCD-TWL estimator is strongly consistent.

B. Online selective coordinate descent

The OCD-TWL solver has low complexity, but may exhibit slow convergence since each variable is updated every P observations. But since P_1 ≪ P_0 due to sparsity, most of the time OCD-TWL re-sets to zero inactive entries of x_o. On the other hand, updating zero variables cannot be skipped a priori, since new nonzero entries may arise in time-varying scenarios. To address this dilemma, it is prudent to select which coordinate to update. A related selective approach has been pursued for the batch Lasso in [27], and is extended here to the novel OCD solver.

Let d_{e_p} J_N^TWL(x̃_{N−1}^OSCD) and d_{−e_p} J_N^TWL(x̃_{N−1}^OSCD) denote the forward and backward directional derivatives w.r.t. x(p) evaluated at x̃_{N−1}^OSCD, which denotes the online selective coordinate descent (OSCD) estimate at time N−1. Define also the vectors d_N^+, d_N^− ∈ R^P whose pth entries are d_{e_p} J_N^TWL(x̃_{N−1}^OSCD) and d_{−e_p} J_N^TWL(x̃_{N−1}^OSCD), respectively. It is not difficult to verify that (see also [27])

d_N^+ = R_N x̃_{N−1}^OSCD − r_N + λ_N s_N^+   (20)
d_N^− = r_N − R_N x̃_{N−1}^OSCD + λ_N s_N^−   (21)

with s_N^+, s_N^− ∈ R^P, where s_N^+(p) = 1 if x̃_{N−1}^OSCD(p) ≥ 0 and s_N^+(p) = −1 otherwise, while s_N^−(p) = 1 if x̃_{N−1}^OSCD(p) ≤ 0 and s_N^−(p) = −1 otherwise. After evaluating (20) and (21), the coordinate with the most negative directional derivative, either forward or backward, is updated. The OSCD-TWL scheme is summarized as Algorithm 2.

Algorithm 2 OSCD-TWL
Initialize x̃_0^OSCD(p) = 0, p = 1,...,P.
for N = 1, 2,... do
  S1. Acquire datum y_N and regressor h_N.
  S2. Evaluate d_N^+ and d_N^− as in (20) and (21).
  S3. Select p* = arg min_p { d_N^+(p), d_N^−(p) }_{p=1}^P.
  S4. Set x̃_N^OSCD(q) = x̃_{N−1}^OSCD(q) for all q ≠ p*.
  S5. Compute r_{N,p*} via (18).
  S6. Update x̃_N^OSCD(p*) as in (19).
end for

Remark 2. A subgradient-based, LMS-like RLS is developed in [2] for sparsity-aware online estimation. However, subgradient methods are first-order algorithms that possess slow convergence. For this reason, the OCD and OSCD alternatives developed here should be preferred.

C. Complexity issues

Recall that the RLS algorithm requires O(P²) algebraic operations per datum. On the other hand, the OCD-TWL Algorithm 1 requires r_{N,p}, whose computational burden is O(P) given R_N and r_N. As far as OSCD is concerned, the selection step requires evaluation of d_N^+ and d_N^−, whose computation entails O(P · P_{1,N−1}) algebraic operations, where P_{1,N−1} denotes the number of nonzero entries of x̃_{N−1}^OSCD. However, the overall computational burden of the OCD-TWL algorithm is dominated by the update of R_N, which requires O(P²) algebraic operations. In this general case, the OCD (OSCD) can be implemented cyclically to update each coordinate per datum, without affecting the overall complexity order, in order to speed up convergence. We summarize the online cyclic coordinate descent (OCCD) TWL as Algorithm 3.

Algorithm 3 OCCD-TWL
Initialize x̃_0^OCCD(p) = 0, p = 1,...,P.
for N = 1, 2,... do
  S1. Acquire datum y_N and regressor h_N.
  S2. Obtain r_N and R_N as in (8).
  for p = 1,...,P do
    S3. Evaluate r_{N,p} = r_N(p) − Σ_{q ≠ p} R_N(p,q) x̃_N^OCCD(q).
    S4. Update x̃_N^OCCD(p) via (19).
  end for
end for

An important simplification, which appears in problems such as system identification and beamforming, is that the regressors are sliding with time; that is, h_n := [h(n), h(n−1),..., h(n−P+1)]^T with h_0 = 0_P. In this case, the R_N updates and the RLS estimates incur complexity O(P) [19, p. 86], [28]. Likewise, OCD-TWL and OSCD-TWL in Algorithms 1 and 2 can also be implemented with complexity O(P).
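For illustration, a sketch of one OSCD-TWL step follows: it evaluates the directional derivatives (20)-(21), selects the steepest descent coordinate as in step S3 of Algorithm 2, and updates it by the soft threshold (19). It is a minimal rendition under assumed inputs (R, r, x, lam standing for R_N, r_N, the previous estimate, and λ_N), not the authors' code.

```python
import numpy as np

def oscd_step(R, r, x, lam):
    """One OSCD-TWL step: pick the coordinate with the most negative directional
    derivative, cf. (20)-(21), and update it by soft thresholding, cf. (19)."""
    g = R @ x - r                                # gradient of the smooth part of (7)
    s_plus = np.where(x >= 0, 1.0, -1.0)
    s_minus = np.where(x <= 0, 1.0, -1.0)
    d_plus = g + lam * s_plus                    # forward directional derivatives, (20)
    d_minus = -g + lam * s_minus                 # backward directional derivatives, (21)
    d = np.minimum(d_plus, d_minus)
    p = int(np.argmin(d))                        # steepest descent coordinate, S3
    if d[p] >= 0:                                # no descent direction: current x is optimal
        return x
    r_Np = r[p] - R[p] @ x + R[p, p] * x[p]
    x = x.copy()
    x[p] = np.sign(r_Np) * max(abs(r_Np) - lam, 0.0) / R[p, p]
    return x
```

Per datum, r_N and R_N would first be refreshed via (9), exactly as in Algorithm 1, before calling such a step.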
Same conclusions can be drawn for online implementations of the TNWL through OCD or OSCD. In a nutshell, the novel online algorithms entail complexity analogous to RLS.

V. SIMULATED TESTS

The online algorithms developed in Section IV are simulated here and compared with the TWL and TNWL algorithms of Section III, and also with the RLS, in both time-invariant and time-varying scenarios.

Gaussian observations are generated according to (1) with a time-invariant x_o, P = 30, P_1 = 3, v_n ~ N(0, σ²), σ² = 10^{-1}, h_n ~ N(0_P, I_P), and windowing as in (w1). The first three entries of x_o are chosen equal to unity, and all other entries are set to zero. Figure 5 depicts the MSE of the OCD-TWL, OCCD-TWL, OSCD-TWL, and TWL versus time. The scale is set to λ_N = 2σ√(2N log P). As expected, the OCD-TWL converges to the TWL, which requires the solution of a convex program per time instant. Similar results hold for the OCCD-TWL and OSCD-TWL algorithms, which also provide a means of enhancing the convergence speed.

Figure 6 depicts the MSE of the OCD-TNWL, OCCD-TNWL, OSCD-TNWL, and TNWL versus time. The scale is set to λ_N = 2σ√(N log P) with µ_N = 1/√N and a = 3.7. Also in this case, the online algorithms converge to their pseudo-real-time counterparts.

Next, Gaussian observations are generated according to (1) with a time-varying x_n, and parameters P = 30, P_1 = 3, v_n ~ N(0, σ²), σ² = 10^{-1}, and h_n ~ N(0_P, I_P).

A Gauss-Markov model is assumed for x_n, with entries generated according to x_n(p) = α x_{n−1}(p) + w_n(p), x_0(p) ~ N(0,1), α = 0.99, and w_n(p) ~ N(0, 1−α²) for p = 1, 2, 3; (w2) is adopted with β = 0.9 and scale λ_N = 2σ√(2 log P · Σ_{n=1}^N β^{2(N−n)}). Figure 7 shows a realization of the squared error (SE) for the OCD-TWL, OCCD-TWL, OSCD-TWL, TWL, and RLS. The OCD-TWL exhibits performance similar to that of the RLS. Indeed, updating only one coordinate per observation in time-varying settings weakens tracking capabilities [12]. However, the performance of OCCD-TWL and OSCD-TWL approaches that of TWL, and both outperform the RLS algorithm.

Successively, simulated tests are performed to assess performance when the support of x_n changes with time. The setting in this example is identical to that of Fig. 7, except that here the support of the sparse x_n also undergoes step changes. Specifically, at time N = 25 the third entry of x_n starts decreasing, and after N = 50 the same entry is set to zero. In addition, at N = 25 the fourth entry becomes nonzero. Figures 8 and 9 depict, respectively, the true variations of x_n(3) and x_n(4) across time, along with their estimates obtained using the RLS, the OCCD-TWL, and the OCCD-TNWL with µ_N = 1/Σ_{n=1}^N β^{N−n} and a = 3.7. Observe that the developed sparsity-aware algorithms can set to zero inactive entries, while RLS estimates are not sparse and yield a nonzero value even when the true entry is zero. Moreover, a few time instants after the support change points, the developed algorithms are able to track entries that become nonzero, and are further able to set to zero entries that disappear.

Finally, the novel algorithms are tested for identifying the sparse, finite impulse response of a discrete-time, linear system using input and noisy output data satisfying the input-output relationship

y_n = Σ_{p=0}^{P−1} h_p x_{n−p} + v_n = x_n^T h_o + v_n,  n = 1,...,N   (22)

where h_o := [h_0,...,h_{P−1}]^T collects the unknown impulse response coefficients, x_n := [x_n,...,x_{n−P+1}]^T denotes the given input data (the regressor vector in (1)), and y_n the output at time n. As the system order may be unknown, a large known upper bound P is selected. Since many entries of h_o may be zero or negligible, the impulse response is sparse. In addition, nonzero entries may exhibit slow variations, which gives rise to a time-varying impulse response h_n.

To assess performance of the introduced algorithms, a system with P = 28 and P_1 = 6 nonzero entries at unknown locations is simulated. The input sequence is assumed zero-mean, white, Gaussian, with unit variance, and v_n ~ N(0, σ²) with σ² = 10^{-2}. RLS, OSCD-TWL, and OSCD-TNWL are tested along with the GA-RLS for an exponential window (w2). Since the regressors here are shift-invariant, all algorithms incur a computational burden that scales linearly with P. The impulse response is generated according to a first-order Gauss-Markov process. The tuning parameters of the OSCD-TWL and OSCD-TNWL have been chosen as in Fig. 8. Figure 10 depicts the MSE (averaged over 100 realizations) across time. It is clear that both OSCD-TWL and OSCD-TNWL outperform the RLS. In particular, the gain of the OSCD-TNWL is more than one order of magnitude.

VI. CONCLUDING SUMMARY

Recursive algorithms were developed in this paper for estimation of (possibly time-varying) sparse signals based on observations that obey a linear regression model and become available sequentially in time. The novel TWL and TNWL algorithms can be viewed as ℓ1-norm regularized versions of the RLS. Simulations illustrated that TWL outperforms the sparsity-agnostic RLS scheme when estimating time-invariant and slowly-varying sparse signals. Moreover, the novel algorithms exhibit enhanced tracking capability with respect to RLS, especially for short observation windows. Performance analysis revealed that TWL estimates cannot simultaneously recover the signal support and maintain the convergence rate of RLS. This prompted the development of TNWL, which for proper selection of design parameters can achieve the oracle consistency properties for time-invariant sparse signals. However, TWL and TNWL require solving a convex problem per time step, and may be less desirable for real-time applications. To overcome this limitation, low-complexity sparsity-aware online schemes were also developed. The crux of these schemes is a novel optimization algorithm that implements the basic coordinate descent iteration online. Albeit simple, the resulting OCD-TWL (OCD-TNWL) algorithm was proved convergent when the sparse signal is time-invariant. At complexity comparable to OCD-TWL (OCD-TNWL), but with improved convergence speed, online selective variants choose the best coordinate to optimize and exhibit performance similar to the pseudo-real-time TWL (TNWL).

APPENDIX A
PROOF OF PROPOSITION 4

Define the vector χ_k := x̃_N^OCD for N = kP, that is, the one containing the iterates at the end of the kth cycle, when each variable has been updated k times. The proof that χ_k converges to x_o as k → ∞ will proceed in five stages. In the first one, Algorithm 1 is put in the form of a noisy vector-matrix difference equation. The second and third stages prove that the corresponding discrete-time dynamical system is exponentially stable, and that the sequence {χ_k}_{k=1}^∞ is bounded. In the fourth stage, convergence to a limit point χ_∞ is proved. The proof concludes by showing that χ_∞ = x_o.

A. Dynamical system

Let R̄_k denote the matrix with entries R̄_k(p,q) := R_{kP+p}(p,q)/(kP+p), and r̄_k the vector with entries r̄_k(p) := r_{kP+p}(p)/(kP+p). Conditions (r1) and (r2) guarantee that R̄_k → R_∞ and r̄_k → r_∞ as k → ∞ w.p. 1. Consider the decomposition R̄_k = D̄_k + L̄_k + Ū_k, where D̄_k is diagonal, and L̄_k (Ū_k) is strictly lower (upper) triangular. Observe that, in general, L̄_k ≠ Ū_k^T.

Dividing the cost function of the problem in (17) by kP + p yields

χ_{k+1}(p) = arg min_χ [ (1/2)( R_{kP+p}(p,p)/(kP+p) ) χ² − ( r_{kP+p,p}/(kP+p) ) χ + ( λ_{kP+p}/(kP+p) ) |χ| ].   (23)

The solution of this scalar minimization problem can be obtained in two steps. First, solve the differentiable linear-quadratic part of (23) using the auxiliary vector z_{k+1} to obtain

z_{k+1}(p) = arg min_z [ (1/2)( R_{kP+p}(p,p)/(kP+p) ) z² − ( r_{kP+p}(p)/(kP+p) − Σ_{q<p} ( R_{kP+p}(p,q)/(kP+p) ) χ_{k+1}(q) − Σ_{q>p} ( R_{kP+p}(p,q)/(kP+p) ) χ_k(q) ) z ]   (24)

where r_{kP+p,p} was expanded according to its definition in (18) and split into two sums: one over the coordinates already updated in cycle k+1 (q < p), and one over those updated in the kth cycle (q > p). The second step to solve (23) is to pass z_{k+1}(p) through the soft-threshold operator

χ_{k+1}(p) = sgn( z_{k+1}(p) ) [ |z_{k+1}(p)| − λ_{kP+p}/(kP+p) ]_+.   (25)

Using the decomposition of R̄_k, (24) can be rewritten as

z_{k+1}(p) = arg min_z [ (1/2) D̄_k(p,p) z² − ( r̄_k(p) − Σ_{q<p} L̄_k(p,q) χ_{k+1}(q) − Σ_{q>p} Ū_k(p,q) χ_k(q) ) z ]

whose solution is obtained (after equating the derivative to zero) as

D̄_k(p,p) z_{k+1}(p) = r̄_k(p) − Σ_{q<p} L̄_k(p,q) χ_{k+1}(q) − Σ_{q>p} Ū_k(p,q) χ_k(q).   (26)

Concatenating the latter for p = 1,...,P yields the matrix-vector difference equation

D̄_k z_{k+1} = r̄_k − L̄_k χ_{k+1} − Ū_k χ_k.   (27)

The soft-thresholding operation in (25) can be accounted for by defining the error vector ǫ_k := χ_{k+1} − z_{k+1}, in which case (27) can be re-written as

D̄_k ( χ_{k+1} − ǫ_k ) = r̄_k − L̄_k χ_{k+1} − Ū_k χ_k.   (28)

Assuming that there exists a k̄ such that D̄_k + L̄_k is invertible for each k > k̄, (28) can be written as

χ_{k+1} = Ḡ_k χ_k + ( D̄_k + L̄_k )^{-1} r̄_k + ( D̄_k + L̄_k )^{-1} D̄_k ǫ_k   (29)

with Ḡ_k := −( D̄_k + L̄_k )^{-1} Ū_k. The key point to be used subsequently is that (25) guarantees that the entries of ǫ_k are bounded by a vanishing sequence. Specifically, it holds that

|ǫ_k(p)| ≤ λ_{kP+p}/(kP+p),  for p = 1,...,P   (30)

since the input-output variables of the soft-threshold operator χ = sgn(z)[|z| − λ]_+ obey |χ − z| ≤ λ.

B. Exponential stability

Let Ḡ(l : k) := Π_{i=l}^k Ḡ_i denote the product of the transition matrices Ḡ_i. The goal of this stage is to prove that ||Ḡ(l : k)|| ≤ c ρ^{k−l+1}, with ρ < 1. The convergence of R̄_k to R_∞ implies convergence of Ḡ_k to G_∞ := −( D_∞ + L_∞ )^{-1} U_∞, where D_∞, L_∞ and U_∞ are the diagonal, strictly lower triangular, and strictly upper triangular parts of R_∞, respectively. Since R_∞ is positive definite, the spectral radius of G_∞ is strictly less than one, i.e., ϱ(G_∞) < 1 [15, p. 52]. Furthermore, for every δ > 0 there exists a constant c(δ), not depending on k, such that ||G_∞^k|| < c(δ)[ ϱ(G_∞) + δ ]^k [15, p. 336]. Then, by selecting δ = (1 − ϱ(G_∞))/2 and defining ρ_1 := (1 + ϱ(G_∞))/2, it holds that

||G_∞^k|| < c ρ_1^k,  with ρ_1 < 1.   (31)

Upon defining G̃_k := Ḡ_k − G_∞, the following recursion is obtained

Ḡ(1 : k) = Ḡ_k Ḡ(1 : k−1) = G_∞ Ḡ(1 : k−1) + G̃_k Ḡ(1 : k−1) = G_∞^k + Σ_{i=1}^k G_∞^{k−i} G̃_i Ḡ(1 : i−1),  Ḡ(1 : 0) := I.

Using (31), the latter can be bounded as

||Ḡ(1 : k)|| ≤ c ρ_1^k + c Σ_{i=1}^k ρ_1^{k−i} ||G̃_i|| ||Ḡ(1 : i−1)||

which, after multiplying both sides by ρ_1^{−k}, yields

ρ_1^{−k} ||Ḡ(1 : k)|| ≤ c + Σ_{i=1}^k ( c ρ_1^{−1} ||G̃_i|| ) ρ_1^{−(i−1)} ||Ḡ(1 : i−1)||   (32)

and allows one to apply the discrete Bellman-Gronwall lemma (see e.g., [21, p. 35]).

Lemma 5 (Bellman-Gronwall). If c, ξ_k, h_k ≥ 0 for all k satisfy the recursive inequality

ξ_k ≤ c + Σ_{i=1}^{k} h_i ξ_{i−1}   (33)

then ξ_k obeys the non-recursive inequality

ξ_k ≤ c Π_{i=1}^{k} ( 1 + h_i ).   (34)

For ξ_k = ρ_1^{−k} ||Ḡ(1 : k)|| and h_i = c ρ_1^{−1} ||G̃_i||, (32) takes the form of (33), so that (34) holds and (after multiplying both sides by ρ_1^k) results in

||Ḡ(1 : k)|| ≤ ρ_1^k c Π_{i=1}^{k} ( 1 + c ρ_1^{−1} ||G̃_i|| ) = c Π_{i=1}^{k} ( ρ_1 + c ||G̃_i|| ).   (35)

Raising both sides of (35) to the power 1/k and applying the geometric-arithmetic mean inequality, it follows that

||Ḡ(1 : k)||^{1/k} ≤ c^{1/k} (1/k) Σ_{i=1}^{k} ( ρ_1 + c ||G̃_i|| )

which is readily rewritten as

||Ḡ(1 : k)||^{1/k} ≤ c^{1/k} ( ρ_1 + (c/k) Σ_{i=1}^{k} ||G̃_i|| ).

Since Ḡ_k → G_∞, and thus ||G̃_k|| → 0, as k → ∞ w.p. 1, for every δ > 0 there exists an integer k_0 such that if k ≥ k_0, then (1/k) Σ_{i=1}^{k} ||G̃_i|| ≤ δ w.p. 1. Thus, if δ is selected as (1 − ρ_1)/(2c), and ρ as (1 + ρ_1)/2, the following bound is obtained

||Ḡ(1 : k)|| ≤ c ρ^k,  ρ < 1,  ∀ k ≥ k_0,  w.p. 1.

It is clear, by inspection, that the proof so far carries over even if the product of transition matrices starts at l > 1; that is,

||Ḡ(l : k)|| ≤ c ρ^{k−l+1},  ρ < 1,  ∀ k ≥ k_0,  w.p. 1.   (36)

Certainly, c, ρ, and k_0 do not depend on l. However, k_0 does depend on the realization of the random sequence Ḡ_k, and its existence and finiteness are guaranteed w.p. 1.

C. Boundedness

Define v_k := ( D̄_k + L̄_k )^{-1} ( r̄_k + D̄_k ǫ_k ), and rewrite (29) as χ_{k+1} = Ḡ_k χ_k + v_k. Using back substitution, χ_{k+1} can then be expressed as

χ_{k+1} = Ḡ(k_0 : k) χ_{k_0} + Σ_{i=1}^{k−k_0} Ḡ(k−i+1 : k) v_{k−i} + v_k.

Since D̄_k, L̄_k, r̄_k, and ǫ_k converge, the random sequence v_k converges too w.p. 1; hence, it can be stochastically bounded by a random variable v, that is, ||v_k|| < v, ∀k, w.p. 1. This, combined with the exponential stability ensured by (36), guarantees that the realizations of the random sequence χ_k are (stochastically) bounded; thus

||χ_{k+1}|| ≤ c ρ^{k−k_0+1} ||χ_{k_0}|| + c Σ_{i=1}^{k−k_0} ρ^i v + v ≤ c ||χ_{k_0}|| + v ( cρ/(1−ρ) + 1 ),  w.p. 1.   (37)

D. Convergence

Define the error D̃_k := D̄_k − D_∞, and similarly L̃_k := L̄_k − L_∞, Ũ_k := Ū_k − U_∞, and r̃_k := r̄_k − r_∞. Using these new variables, (28) can be rewritten in error form as

( D_∞ + D̃_k )( χ_{k+1} − ǫ_k ) = ( r_∞ + r̃_k ) − ( L_∞ + L̃_k ) χ_{k+1} − ( U_∞ + Ũ_k ) χ_k   (38)

and, after regrouping terms, as

χ_{k+1} = −( D_∞ + L_∞ )^{-1} U_∞ χ_k + ( D_∞ + L_∞ )^{-1} ( r_∞ + r̃_k − ( D̃_k + L̃_k ) χ_{k+1} + ( D_∞ + D̃_k ) ǫ_k − Ũ_k χ_k ).   (39)

Equation (39) describes an exponentially stable linear time-invariant system with transition matrix −( D_∞ + L_∞ )^{-1} U_∞, and input u_k := ( D_∞ + L_∞ )^{-1} [ r_∞ + r̃_k − ( D̃_k + L̃_k ) χ_{k+1} + ( D_∞ + D̃_k ) ǫ_k − Ũ_k χ_k ]. The input can be divided into its limiting point u_∞ := ( D_∞ + L_∞ )^{-1} r_∞, and the error ũ_k := ( D_∞ + L_∞ )^{-1} [ r̃_k − ( D̃_k + L̃_k ) χ_{k+1} + ( D_∞ + D̃_k ) ǫ_k − Ũ_k χ_k ]. As k → ∞, the vector ũ_k goes to zero almost surely, because the sequence χ_k is bounded and the errors r̃_k, D̃_k, L̃_k, and Ũ_k, as well as ǫ_k, all go to zero w.p. 1.

With this notation, and recalling the definition G_∞ := −( D_∞ + L_∞ )^{-1} U_∞, (39) is rewritten as χ_{k+1} = G_∞ χ_k + u_∞ + ũ_k, and back-substituting again the new expression for χ_{k+1} yields

χ_{k+1} = G_∞^k χ_1 + Σ_{i=1}^{k} G_∞^{k−i} u_∞ + Σ_{i=1}^{k} G_∞^{k−i} ũ_i.   (40)

Convergence of this recursion will be established by showing that the first and third terms on the right-hand side vanish as k → ∞, while the surviving one corresponds to a stable geometric series. Given that there exist c > 0 and ρ_1 < 1 such that ||G_∞^n|| ≤ c ρ_1^n for all n, convergence of the first term to zero follows readily from (31). The third term represents the limiting output value of a multiple-input multiple-output stable linear time-invariant system with vanishing input; that is, lim_{i→∞} ũ_i = 0. As ||G_∞^k|| ≤ c ρ_1^k, (31) implies that it is possible to bound the sum under consideration as

|| Σ_{i=1}^{k} G_∞^{k−i} ũ_i || ≤ c Σ_{i=1}^{k} ρ_1^{k−i} ||ũ_i||.   (41)

Since lim_{i→∞} ũ_i = 0, it holds by the definition of the limit that for any ǫ > 0 there exists an ī so that ||ũ_i|| ≤ ǫ, ∀ i ≥ ī. Using the latter along with (41), it follows that for k ≥ ī

|| Σ_{i=1}^{k} G_∞^{k−i} ũ_i || ≤ c Σ_{i=1}^{ī−1} ρ_1^{k−i} ||ũ_i|| + c ǫ Σ_{i=ī}^{k} ρ_1^{k−i} = c ρ_1^k Σ_{i=1}^{ī−1} ρ_1^{−i} ||ũ_i|| + c ǫ Σ_{i=ī}^{k} ρ_1^{k−i}.   (42)

Because Σ_{i=1}^{ī−1} ρ_1^{−i} ||ũ_i|| does not depend on k, the first summand in (42) goes to zero as k → ∞; hence,

lim_{k→∞} || Σ_{i=1}^{k} G_∞^{k−i} ũ_i || ≤ c ǫ / (1 − ρ_1).

The last inequality holds ∀ ǫ > 0; thus,

lim_{k→∞} Σ_{i=1}^{k} G_∞^{k−i} ũ_i = 0

which establishes convergence to zero of the third sum on the right-hand side of (40).

E. Limit point

Once convergence is established, it is possible to take the limit as k → ∞ in (38) to obtain

D_∞ ( χ_∞ − ǫ_∞ ) = r_∞ − L_∞ χ_∞ − U_∞ χ_∞,  w.p. 1.   (43)

Recalling that ǫ_∞ := lim_{k→∞} ǫ_k = 0, (43) reduces to

( D_∞ + L_∞ + U_∞ ) χ_∞ = r_∞,  w.p. 1   (44)

and since D_∞ + L_∞ + U_∞ = R_∞, it holds that

χ_∞ = R_∞^{-1} r_∞ = x_o,  w.p. 1   (45)

which concludes the proof.

ACKNOWLEDGMENTS

The work in this paper was supported by the NSF grants CCF and CON, and also through collaborative participation in the Communications and Networks Consortium sponsored by the U.S. Army Research Laboratory under the Collaborative Technology Alliance Program, Cooperative Agreement DAAD. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation thereon.

REFERENCES

[1] H. Akaike, "A new look at the statistical model identification," IEEE Trans. Automatic Control, 1974.
[2] D. Angelosante and G. B. Giannakis, "RLS-weighted Lasso for adaptive estimation of sparse signals," in Proc. IEEE Intl. Conf. Acoust., Speech, Signal Process., Taipei, Taiwan, Apr. 2009.
[3] J.-A. Bazerque and G. B. Giannakis, "Distributed spectrum sensing for cognitive radios by exploiting sparsity," in Proc. of 42nd Asilomar Conf., Pacific Grove, CA, Oct. 2008.
[4] E. Candès, J. Romberg, and T. Tao, "Stable signal recovery from incomplete and inaccurate measurements," Communications on Pure and Applied Mathematics, vol. 59, no. 8, 2006.
[5] E. Candès and T. Tao, "Near optimal signal recovery from random projections: Universal encoding strategies?," IEEE Trans. on Information Theory, vol. 52, no. 12, 2006.
[6] E. Candès and T. Tao, "The Dantzig selector: Statistical estimation when p is much larger than n," Annals of Statistics, vol. 35, no. 6, 2007.
[7] E. Candès, M. Wakin, and S. Boyd, "Enhancing sparsity by reweighted ℓ1 minimization," J. Fourier Anal. Appl., vol. 14, 2008.
[8] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM J. on Scientific Computing, vol. 20, no. 1, pp. 33-61, 1998.
[9] S. F. Cotter and B. D. Rao, "Sparse channel estimation via matching pursuit with application to equalization," IEEE Trans. on Communications, vol. 50, no. 3, March 2002.
[10] J. Fan and R. Li, "Variable selection via nonconcave penalized likelihood and its oracle properties," J. Amer. Statist. Assoc., vol. 96, 2001.
[11] D. P. Foster and E. I. George, "The risk inflation criterion for multiple regression," Annals of Statistics, vol. 22, no. 4, 1994.
[12] M. Godavarti and A. O. Hero III, "Partial update LMS algorithms," IEEE Trans. on Signal Processing, vol. 53, no. 7, July 2005.
[13] J. Friedman, T. Hastie, H. Höfling, and R. Tibshirani, "Pathwise coordinate optimization," Annals of Applied Statistics, vol. 1, Dec. 2007.
[14] Y. Gu, Y. Chen, and A. O. Hero, "Sparse LMS for system identification," in Proc. IEEE Intl. Conf. Acoust., Speech, Signal Process., Taipei, Taiwan, Apr. 2009.
[15] G. Golub and C. Van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, 1996.
[16] K. Knight and W. J. Fu, "Asymptotics for Lasso-type estimators," Annals of Statistics, vol. 28, 2000.
[17] J. Löfberg, "Yalmip: Software for solving convex (and nonconvex) optimization problems," in Proc. Amer. Contr. Conf., Minneapolis, MN, June 2006.
[18] D. M. Malioutov, S. Sanghavi, and A. S. Willsky, "Compressed sensing with sequential observations," in Proc. of Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), Las Vegas, NV, April 2008.
[19] A. H. Sayed, Fundamentals of Adaptive Filtering, John Wiley & Sons, 2003.
[20] G. Schwarz, "Estimating the dimension of a model," Annals of Statistics, vol. 6, no. 2, 1978.
[21] V. Solo and X. Kong, Adaptive Signal Processing Algorithms: Stability and Performance, Prentice-Hall, Upper Saddle River, NJ, 1995.
[22] J. Sturm, "Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones," Optim. Meth. Softw., vol. 11-12, 1999.
[23] G. Taubock and F. Hlawatsch, "A compressed sensing technique for OFDM channel estimation in mobile environments: Exploiting channel sparsity for reducing pilots," in Proc. of Int. Conf. Acoust., Speech, Signal Process., Las Vegas, NV, Apr. 2008.
[24] R. Tibshirani, "Regression shrinkage and selection via the Lasso," J. Royal Statist. Soc., vol. 58, no. 1, 1996.
[25] P. Tseng, "Convergence of a block coordinate descent method for nondifferentiable minimization," J. Optim. Theory Appl., vol. 109, June 2001.
[26] N. Vaswani, "Kalman filtered compressed sensing," in Proc. of Intl. Conf. Image Proc., San Diego, CA, Oct. 2008.
[27] T. T. Wu and K. Lange, "Coordinate descent algorithms for Lasso penalized regression," Annals of Applied Statistics, vol. 2, March 2008.
[28] G. P. White, Y. V. Zakharov, and J. Liu, "Low-complexity RLS algorithms using dichotomous coordinate descent iterations," IEEE Trans. Sign. Proc., vol. 56, July 2008.
[29] H. Zou, "The adaptive Lasso and its oracle properties," J. of the American Statistical Association, vol. 101, no. 476, 2006.
[30] H. Zou and R. Li, "One-step sparse estimates in nonconcave penalized likelihood models," Annals of Statistics, vol. 36, no. 4, 2008.

Fig. 1. Weight functions for the TWL and TNWL estimators (µ_N = 0.2, a = 3.7).
Fig. 2. MSE comparisons of pseudo-real-time estimators (time-invariant x_o).
Fig. 3. Squared error comparisons of pseudo-real-time estimators (time-varying x_n with exponentially decreasing window).
Fig. 4. Squared error comparisons of pseudo-real-time estimators (time-varying x_n with finite window).
Fig. 5. MSE comparisons of online estimators (time-invariant x_o).
Fig. 6. MSE comparisons of online weighted-norm estimators (time-invariant x_o).


More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

Stability and the elastic net

Stability and the elastic net Stability and the elastic net Patrick Breheny March 28 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/32 Introduction Elastic Net Our last several lectures have concentrated on methods for

More information

Compressed Sensing and Robust Recovery of Low Rank Matrices

Compressed Sensing and Robust Recovery of Low Rank Matrices Compressed Sensing and Robust Recovery of Low Rank Matrices M. Fazel, E. Candès, B. Recht, P. Parrilo Electrical Engineering, University of Washington Applied and Computational Mathematics Dept., Caltech

More information

DNNs for Sparse Coding and Dictionary Learning

DNNs for Sparse Coding and Dictionary Learning DNNs for Sparse Coding and Dictionary Learning Subhadip Mukherjee, Debabrata Mahapatra, and Chandra Sekhar Seelamantula Department of Electrical Engineering, Indian Institute of Science, Bangalore 5612,

More information

SIGNALS with sparse representations can be recovered

SIGNALS with sparse representations can be recovered IEEE SIGNAL PROCESSING LETTERS, VOL. 22, NO. 9, SEPTEMBER 2015 1497 Cramér Rao Bound for Sparse Signals Fitting the Low-Rank Model with Small Number of Parameters Mahdi Shaghaghi, Student Member, IEEE,

More information

Iterative Methods for Solving A x = b

Iterative Methods for Solving A x = b Iterative Methods for Solving A x = b A good (free) online source for iterative methods for solving A x = b is given in the description of a set of iterative solvers called templates found at netlib: http

More information

Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery

Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery Jorge F. Silva and Eduardo Pavez Department of Electrical Engineering Information and Decision Systems Group Universidad

More information

A Generalized Uncertainty Principle and Sparse Representation in Pairs of Bases

A Generalized Uncertainty Principle and Sparse Representation in Pairs of Bases 2558 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 48, NO 9, SEPTEMBER 2002 A Generalized Uncertainty Principle Sparse Representation in Pairs of Bases Michael Elad Alfred M Bruckstein Abstract An elementary

More information

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions International Journal of Control Vol. 00, No. 00, January 2007, 1 10 Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions I-JENG WANG and JAMES C.

More information

SOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING. Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu

SOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING. Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu SOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu LITIS - EA 48 - INSA/Universite de Rouen Avenue de l Université - 768 Saint-Etienne du Rouvray

More information

/16/$ IEEE 1728

/16/$ IEEE 1728 Extension of the Semi-Algebraic Framework for Approximate CP Decompositions via Simultaneous Matrix Diagonalization to the Efficient Calculation of Coupled CP Decompositions Kristina Naskovska and Martin

More information

Optimization for Compressed Sensing

Optimization for Compressed Sensing Optimization for Compressed Sensing Robert J. Vanderbei 2014 March 21 Dept. of Industrial & Systems Engineering University of Florida http://www.princeton.edu/ rvdb Lasso Regression The problem is to solve

More information

Coordinate descent. Geoff Gordon & Ryan Tibshirani Optimization /

Coordinate descent. Geoff Gordon & Ryan Tibshirani Optimization / Coordinate descent Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Adding to the toolbox, with stats and ML in mind We ve seen several general and useful minimization tools First-order methods

More information

Regularization: Ridge Regression and the LASSO

Regularization: Ridge Regression and the LASSO Agenda Wednesday, November 29, 2006 Agenda Agenda 1 The Bias-Variance Tradeoff 2 Ridge Regression Solution to the l 2 problem Data Augmentation Approach Bayesian Interpretation The SVD and Ridge Regression

More information

Near Ideal Behavior of a Modified Elastic Net Algorithm in Compressed Sensing

Near Ideal Behavior of a Modified Elastic Net Algorithm in Compressed Sensing Near Ideal Behavior of a Modified Elastic Net Algorithm in Compressed Sensing M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas M.Vidyasagar@utdallas.edu www.utdallas.edu/ m.vidyasagar

More information

ADAPTIVE FILTER THEORY

ADAPTIVE FILTER THEORY ADAPTIVE FILTER THEORY Fifth Edition Simon Haykin Communications Research Laboratory McMaster University Hamilton, Ontario, Canada International Edition contributions by Telagarapu Prabhakar Department

More information

Comparative Performance Analysis of Three Algorithms for Principal Component Analysis

Comparative Performance Analysis of Three Algorithms for Principal Component Analysis 84 R. LANDQVIST, A. MOHAMMED, COMPARATIVE PERFORMANCE ANALYSIS OF THR ALGORITHMS Comparative Performance Analysis of Three Algorithms for Principal Component Analysis Ronnie LANDQVIST, Abbas MOHAMMED Dept.

More information

Compressed Sensing and Sparse Recovery

Compressed Sensing and Sparse Recovery ELE 538B: Sparsity, Structure and Inference Compressed Sensing and Sparse Recovery Yuxin Chen Princeton University, Spring 217 Outline Restricted isometry property (RIP) A RIPless theory Compressed sensing

More information

Reconstruction from Anisotropic Random Measurements

Reconstruction from Anisotropic Random Measurements Reconstruction from Anisotropic Random Measurements Mark Rudelson and Shuheng Zhou The University of Michigan, Ann Arbor Coding, Complexity, and Sparsity Workshop, 013 Ann Arbor, Michigan August 7, 013

More information

5742 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 12, DECEMBER /$ IEEE

5742 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 12, DECEMBER /$ IEEE 5742 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 12, DECEMBER 2009 Uncertainty Relations for Shift-Invariant Analog Signals Yonina C. Eldar, Senior Member, IEEE Abstract The past several years

More information

Bias-free Sparse Regression with Guaranteed Consistency

Bias-free Sparse Regression with Guaranteed Consistency Bias-free Sparse Regression with Guaranteed Consistency Wotao Yin (UCLA Math) joint with: Stanley Osher, Ming Yan (UCLA) Feng Ruan, Jiechao Xiong, Yuan Yao (Peking U) UC Riverside, STATS Department March

More information

Bhaskar Rao Department of Electrical and Computer Engineering University of California, San Diego

Bhaskar Rao Department of Electrical and Computer Engineering University of California, San Diego Bhaskar Rao Department of Electrical and Computer Engineering University of California, San Diego 1 Outline Course Outline Motivation for Course Sparse Signal Recovery Problem Applications Computational

More information

Adaptive MMSE Equalizer with Optimum Tap-length and Decision Delay

Adaptive MMSE Equalizer with Optimum Tap-length and Decision Delay Adaptive MMSE Equalizer with Optimum Tap-length and Decision Delay Yu Gong, Xia Hong and Khalid F. Abu-Salim School of Systems Engineering The University of Reading, Reading RG6 6AY, UK E-mail: {y.gong,x.hong,k.f.abusalem}@reading.ac.uk

More information

Estimation Error Bounds for Frame Denoising

Estimation Error Bounds for Frame Denoising Estimation Error Bounds for Frame Denoising Alyson K. Fletcher and Kannan Ramchandran {alyson,kannanr}@eecs.berkeley.edu Berkeley Audio-Visual Signal Processing and Communication Systems group Department

More information

Sparsity Regularization

Sparsity Regularization Sparsity Regularization Bangti Jin Course Inverse Problems & Imaging 1 / 41 Outline 1 Motivation: sparsity? 2 Mathematical preliminaries 3 l 1 solvers 2 / 41 problem setup finite-dimensional formulation

More information

LTI Systems, Additive Noise, and Order Estimation

LTI Systems, Additive Noise, and Order Estimation LTI Systems, Additive oise, and Order Estimation Soosan Beheshti, Munther A. Dahleh Laboratory for Information and Decision Systems Department of Electrical Engineering and Computer Science Massachusetts

More information

LEAST Mean Squares Algorithm (LMS), introduced by

LEAST Mean Squares Algorithm (LMS), introduced by 1 Hard Threshold Least Mean Squares Algorithm Lampros Flokas and Petros Maragos arxiv:1608.0118v1 [cs.sy] 3 Aug 016 Abstract This work presents a new variation of the commonly used Least Mean Squares Algorithm

More information

Likelihood Bounds for Constrained Estimation with Uncertainty

Likelihood Bounds for Constrained Estimation with Uncertainty Proceedings of the 44th IEEE Conference on Decision and Control, and the European Control Conference 5 Seville, Spain, December -5, 5 WeC4. Likelihood Bounds for Constrained Estimation with Uncertainty

More information

EFFECTS OF ILL-CONDITIONED DATA ON LEAST SQUARES ADAPTIVE FILTERS. Gary A. Ybarra and S.T. Alexander

EFFECTS OF ILL-CONDITIONED DATA ON LEAST SQUARES ADAPTIVE FILTERS. Gary A. Ybarra and S.T. Alexander EFFECTS OF ILL-CONDITIONED DATA ON LEAST SQUARES ADAPTIVE FILTERS Gary A. Ybarra and S.T. Alexander Center for Communications and Signal Processing Electrical and Computer Engineering Department North

More information

Sparse Covariance Selection using Semidefinite Programming

Sparse Covariance Selection using Semidefinite Programming Sparse Covariance Selection using Semidefinite Programming A. d Aspremont ORFE, Princeton University Joint work with O. Banerjee, L. El Ghaoui & G. Natsoulis, U.C. Berkeley & Iconix Pharmaceuticals Support

More information

NONLINEAR systems with memory appear frequently in

NONLINEAR systems with memory appear frequently in IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL 59, NO 12, DECEMBER 2011 5907 Sparse Volterra and Polynomial Regression Models: Recoverability and Estimation Vassilis Kekatos, Member, IEEE, and Georgios B

More information

Sparse inverse covariance estimation with the lasso

Sparse inverse covariance estimation with the lasso Sparse inverse covariance estimation with the lasso Jerome Friedman Trevor Hastie and Robert Tibshirani November 8, 2007 Abstract We consider the problem of estimating sparse graphs by a lasso penalty

More information

Alternating Optimization with Shrinkage

Alternating Optimization with Shrinkage Sparsity-Aware Adaptive Algorithms Based on 1 Alternating Optimization with Shrinkage Rodrigo C. de Lamare and Raimundo Sampaio-Neto arxiv:1401.0463v1 [cs.sy] 2 Jan 2014 Abstract This letter proposes a

More information

On Mixture Regression Shrinkage and Selection via the MR-LASSO

On Mixture Regression Shrinkage and Selection via the MR-LASSO On Mixture Regression Shrinage and Selection via the MR-LASSO Ronghua Luo, Hansheng Wang, and Chih-Ling Tsai Guanghua School of Management, Peing University & Graduate School of Management, University

More information

ESTIMATOR STABILITY ANALYSIS IN SLAM. Teresa Vidal-Calleja, Juan Andrade-Cetto, Alberto Sanfeliu

ESTIMATOR STABILITY ANALYSIS IN SLAM. Teresa Vidal-Calleja, Juan Andrade-Cetto, Alberto Sanfeliu ESTIMATOR STABILITY ANALYSIS IN SLAM Teresa Vidal-Calleja, Juan Andrade-Cetto, Alberto Sanfeliu Institut de Robtica i Informtica Industrial, UPC-CSIC Llorens Artigas 4-6, Barcelona, 88 Spain {tvidal, cetto,

More information

A New Method for Varying Adaptive Bandwidth Selection

A New Method for Varying Adaptive Bandwidth Selection IEEE TRASACTIOS O SIGAL PROCESSIG, VOL. 47, O. 9, SEPTEMBER 1999 2567 TABLE I SQUARE ROOT MEA SQUARED ERRORS (SRMSE) OF ESTIMATIO USIG THE LPA AD VARIOUS WAVELET METHODS A ew Method for Varying Adaptive

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com

More information

Lecture 9: Numerical Linear Algebra Primer (February 11st)

Lecture 9: Numerical Linear Algebra Primer (February 11st) 10-725/36-725: Convex Optimization Spring 2015 Lecture 9: Numerical Linear Algebra Primer (February 11st) Lecturer: Ryan Tibshirani Scribes: Avinash Siravuru, Guofan Wu, Maosheng Liu Note: LaTeX template

More information

GENERALIZED DEFLATION ALGORITHMS FOR THE BLIND SOURCE-FACTOR SEPARATION OF MIMO-FIR CHANNELS. Mitsuru Kawamoto 1,2 and Yujiro Inouye 1

GENERALIZED DEFLATION ALGORITHMS FOR THE BLIND SOURCE-FACTOR SEPARATION OF MIMO-FIR CHANNELS. Mitsuru Kawamoto 1,2 and Yujiro Inouye 1 GENERALIZED DEFLATION ALGORITHMS FOR THE BLIND SOURCE-FACTOR SEPARATION OF MIMO-FIR CHANNELS Mitsuru Kawamoto,2 and Yuiro Inouye. Dept. of Electronic and Control Systems Engineering, Shimane University,

More information

Auxiliary signal design for failure detection in uncertain systems

Auxiliary signal design for failure detection in uncertain systems Auxiliary signal design for failure detection in uncertain systems R. Nikoukhah, S. L. Campbell and F. Delebecque Abstract An auxiliary signal is an input signal that enhances the identifiability of a

More information

Minimax MMSE Estimator for Sparse System

Minimax MMSE Estimator for Sparse System Proceedings of the World Congress on Engineering and Computer Science 22 Vol I WCE 22, October 24-26, 22, San Francisco, USA Minimax MMSE Estimator for Sparse System Hongqing Liu, Mandar Chitre Abstract

More information

Denoising of NIRS Measured Biomedical Signals

Denoising of NIRS Measured Biomedical Signals Denoising of NIRS Measured Biomedical Signals Y V Rami reddy 1, Dr.D.VishnuVardhan 2 1 M. Tech, Dept of ECE, JNTUA College of Engineering, Pulivendula, A.P, India 2 Assistant Professor, Dept of ECE, JNTUA

More information

CCA BASED ALGORITHMS FOR BLIND EQUALIZATION OF FIR MIMO SYSTEMS

CCA BASED ALGORITHMS FOR BLIND EQUALIZATION OF FIR MIMO SYSTEMS CCA BASED ALGORITHMS FOR BLID EQUALIZATIO OF FIR MIMO SYSTEMS Javier Vía and Ignacio Santamaría Dept of Communications Engineering University of Cantabria 395 Santander, Cantabria, Spain E-mail: {jvia,nacho}@gtasdicomunicanes

More information

Recursive l 1, Group lasso

Recursive l 1, Group lasso Recursive l, Group lasso Yilun Chen, Student Member, IEEE, Alfred O. Hero, III, Fellow, IEEE Abstract We introduce a recursive adaptive group lasso algorithm for real-time penalized least squares prediction

More information

An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss

An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss arxiv:1811.04545v1 [stat.co] 12 Nov 2018 Cheng Wang School of Mathematical Sciences, Shanghai Jiao

More information

Signal Recovery from Permuted Observations

Signal Recovery from Permuted Observations EE381V Course Project Signal Recovery from Permuted Observations 1 Problem Shanshan Wu (sw33323) May 8th, 2015 We start with the following problem: let s R n be an unknown n-dimensional real-valued signal,

More information

Linear-Quadratic Optimal Control: Full-State Feedback

Linear-Quadratic Optimal Control: Full-State Feedback Chapter 4 Linear-Quadratic Optimal Control: Full-State Feedback 1 Linear quadratic optimization is a basic method for designing controllers for linear (and often nonlinear) dynamical systems and is actually

More information

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of

More information

arxiv: v1 [math.st] 1 Dec 2014

arxiv: v1 [math.st] 1 Dec 2014 HOW TO MONITOR AND MITIGATE STAIR-CASING IN L TREND FILTERING Cristian R. Rojas and Bo Wahlberg Department of Automatic Control and ACCESS Linnaeus Centre School of Electrical Engineering, KTH Royal Institute

More information

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO. 6, JUNE

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO. 6, JUNE IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO. 6, JUNE 2010 2935 Variance-Component Based Sparse Signal Reconstruction and Model Selection Kun Qiu, Student Member, IEEE, and Aleksandar Dogandzic,

More information

Sparse orthogonal factor analysis

Sparse orthogonal factor analysis Sparse orthogonal factor analysis Kohei Adachi and Nickolay T. Trendafilov Abstract A sparse orthogonal factor analysis procedure is proposed for estimating the optimal solution with sparse loadings. In

More information

Title without the persistently exciting c. works must be obtained from the IEE

Title without the persistently exciting c.   works must be obtained from the IEE Title Exact convergence analysis of adapt without the persistently exciting c Author(s) Sakai, H; Yang, JM; Oka, T Citation IEEE TRANSACTIONS ON SIGNAL 55(5): 2077-2083 PROCESS Issue Date 2007-05 URL http://hdl.handle.net/2433/50544

More information

A Cross-Associative Neural Network for SVD of Nonsquared Data Matrix in Signal Processing

A Cross-Associative Neural Network for SVD of Nonsquared Data Matrix in Signal Processing IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 12, NO. 5, SEPTEMBER 2001 1215 A Cross-Associative Neural Network for SVD of Nonsquared Data Matrix in Signal Processing Da-Zheng Feng, Zheng Bao, Xian-Da Zhang

More information

Sparse PCA with applications in finance

Sparse PCA with applications in finance Sparse PCA with applications in finance A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon 1 Introduction

More information

Linear Model Selection and Regularization

Linear Model Selection and Regularization Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In

More information

ACCORDING to Shannon s sampling theorem, an analog

ACCORDING to Shannon s sampling theorem, an analog 554 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL 59, NO 2, FEBRUARY 2011 Segmented Compressed Sampling for Analog-to-Information Conversion: Method and Performance Analysis Omid Taheri, Student Member,

More information

Local Strong Convexity of Maximum-Likelihood TDOA-Based Source Localization and Its Algorithmic Implications

Local Strong Convexity of Maximum-Likelihood TDOA-Based Source Localization and Its Algorithmic Implications Local Strong Convexity of Maximum-Likelihood TDOA-Based Source Localization and Its Algorithmic Implications Huikang Liu, Yuen-Man Pun, and Anthony Man-Cho So Dept of Syst Eng & Eng Manag, The Chinese

More information

Noisy Signal Recovery via Iterative Reweighted L1-Minimization

Noisy Signal Recovery via Iterative Reweighted L1-Minimization Noisy Signal Recovery via Iterative Reweighted L1-Minimization Deanna Needell UC Davis / Stanford University Asilomar SSC, November 2009 Problem Background Setup 1 Suppose x is an unknown signal in R d.

More information

COMPARATIVE ANALYSIS OF ORTHOGONAL MATCHING PURSUIT AND LEAST ANGLE REGRESSION

COMPARATIVE ANALYSIS OF ORTHOGONAL MATCHING PURSUIT AND LEAST ANGLE REGRESSION COMPARATIVE ANALYSIS OF ORTHOGONAL MATCHING PURSUIT AND LEAST ANGLE REGRESSION By Mazin Abdulrasool Hameed A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for

More information

Exact Topology Identification of Large-Scale Interconnected Dynamical Systems from Compressive Observations

Exact Topology Identification of Large-Scale Interconnected Dynamical Systems from Compressive Observations Exact Topology Identification of arge-scale Interconnected Dynamical Systems from Compressive Observations Borhan M Sanandaji, Tyrone Vincent, and Michael B Wakin Abstract In this paper, we consider the

More information

ANYTIME OPTIMAL DISTRIBUTED KALMAN FILTERING AND SMOOTHING. University of Minnesota, 200 Union Str. SE, Minneapolis, MN 55455, USA

ANYTIME OPTIMAL DISTRIBUTED KALMAN FILTERING AND SMOOTHING. University of Minnesota, 200 Union Str. SE, Minneapolis, MN 55455, USA ANYTIME OPTIMAL DISTRIBUTED KALMAN FILTERING AND SMOOTHING Ioannis D. Schizas, Georgios B. Giannakis, Stergios I. Roumeliotis and Alejandro Ribeiro University of Minnesota, 2 Union Str. SE, Minneapolis,

More information

A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression

A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression Noah Simon Jerome Friedman Trevor Hastie November 5, 013 Abstract In this paper we purpose a blockwise descent

More information

LASSO Review, Fused LASSO, Parallel LASSO Solvers

LASSO Review, Fused LASSO, Parallel LASSO Solvers Case Study 3: fmri Prediction LASSO Review, Fused LASSO, Parallel LASSO Solvers Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade May 3, 2016 Sham Kakade 2016 1 Variable

More information

An iterative hard thresholding estimator for low rank matrix recovery

An iterative hard thresholding estimator for low rank matrix recovery An iterative hard thresholding estimator for low rank matrix recovery Alexandra Carpentier - based on a joint work with Arlene K.Y. Kim Statistical Laboratory, Department of Pure Mathematics and Mathematical

More information

GREEDY SIGNAL RECOVERY REVIEW

GREEDY SIGNAL RECOVERY REVIEW GREEDY SIGNAL RECOVERY REVIEW DEANNA NEEDELL, JOEL A. TROPP, ROMAN VERSHYNIN Abstract. The two major approaches to sparse recovery are L 1-minimization and greedy methods. Recently, Needell and Vershynin

More information

Modeling and Analysis of Dynamic Systems

Modeling and Analysis of Dynamic Systems Modeling and Analysis of Dynamic Systems by Dr. Guillaume Ducard c Fall 2016 Institute for Dynamic Systems and Control ETH Zurich, Switzerland G. Ducard c 1 Outline 1 Lecture 9: Model Parametrization 2

More information

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 XVI - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 A slightly changed ADMM for convex optimization with three separable operators Bingsheng He Department of

More information

Smoothly Clipped Absolute Deviation (SCAD) for Correlated Variables

Smoothly Clipped Absolute Deviation (SCAD) for Correlated Variables Smoothly Clipped Absolute Deviation (SCAD) for Correlated Variables LIB-MA, FSSM Cadi Ayyad University (Morocco) COMPSTAT 2010 Paris, August 22-27, 2010 Motivations Fan and Li (2001), Zou and Li (2008)

More information

Lecture Notes 9: Constrained Optimization

Lecture Notes 9: Constrained Optimization Optimization-based data analysis Fall 017 Lecture Notes 9: Constrained Optimization 1 Compressed sensing 1.1 Underdetermined linear inverse problems Linear inverse problems model measurements of the form

More information

Generalized Elastic Net Regression

Generalized Elastic Net Regression Abstract Generalized Elastic Net Regression Geoffroy MOURET Jean-Jules BRAULT Vahid PARTOVINIA This work presents a variation of the elastic net penalization method. We propose applying a combined l 1

More information