Recursive l1,∞ Group lasso

Yilun Chen, Student Member, IEEE, and Alfred O. Hero, III, Fellow, IEEE

Abstract: We introduce a recursive adaptive group lasso algorithm for real-time penalized least squares prediction that produces a time sequence of optimal sparse predictor coefficient vectors. At each time index the proposed algorithm computes an exact update of the optimal l1,∞-penalized recursive least squares (RLS) predictor. Each update requires solving a convex but nondifferentiable optimization problem. We develop an online homotopy method to reduce the computational complexity. Numerical simulations demonstrate that the proposed algorithm outperforms the l1-regularized RLS algorithm for a group sparse system identification problem and has lower implementation complexity than direct group lasso solvers.

Index Terms: RLS, group sparsity, mixed norm, homotopy, group lasso, system identification

Y. Chen and A. O. Hero are with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA. E-mails: {yilun, hero}@umich.edu. This work was partially supported by AFOSR, grant number FA.

I. INTRODUCTION

Recursive Least Squares (RLS) is a widely used method for adaptive filtering and prediction in signal processing and related fields. Its applications include acoustic echo cancelation, wireless channel equalization, interference cancelation and data streaming predictors. In these applications a measurement stream is recursively fitted to a linear model, described by the coefficients of an FIR prediction filter, in such a way as to minimize a weighted average of squared residual prediction errors. Compared to other adaptive filtering algorithms such as Least Mean Square (LMS) filters, RLS is popular because of its fast convergence and low steady-state error.

In many applications it is natural to constrain the predictor coefficients to be sparse. In such cases the adaptive FIR prediction filter is a sparse system: only a few of the impulse response coefficients are non-zero. Sparse systems can be divided into general sparse systems and group sparse systems [1], [2]. Unlike a general sparse system, whose impulse response can have arbitrary sparse structure, a group sparse system has an impulse response composed of a few distinct clusters of non-zero coefficients. Examples of group sparse systems include specular multipath acoustic and wireless channels [3], [4] and compressive spectrum sensing of narrowband sources [5].

The exploitation of sparsity to improve prediction performance has attracted considerable interest. For general sparse systems, the l1 norm has been recognized as an effective promoter of sparsity [6], [7]. In particular, l1-regularized LMS [2], [8] and RLS [9], [10] algorithms have been proposed for sparsification of adaptive filters. For group sparse systems, mixed norms such as the l1,2 norm and the l1,∞ norm have been applied to promote group-level sparsity in statistical regression [11]-[13], commonly referred to as the group lasso, and in sparse signal recovery in signal processing and communications [1], [14]. In [2], the authors proposed a family of LMS filters with convex regularizers. As a specific scenario, they considered group sparse LMS filters with l1,2-type regularizers and empirically demonstrated the superior performance of l1,2-regularized LMS over l1-regularized LMS for identifying unknown group-sparse channels. In this paper, we develop group sparse RLS algorithms for real-time system identification.
Specifically, we consider RLS penalized by the l1,∞ norm to promote group sparsity, which we call the recursive l1,∞ group lasso. The algorithm is based on the homotopy approach to solving the lasso problem and is a nontrivial extension of [15]-[17] to group sparse structure. Both the l1,2 and the l1,∞ regularizers have been widely adopted to enforce group sparse structure in regression and other problems. While, as shown in [18], in some cases the l1,2 norm is more effective in enforcing group sparsity, we adopt the l1,∞ norm due to its more favorable computational properties. As we show in this paper, the l1,∞ regularizer can be very efficiently implemented in the recursive setting of RLS. Our implementation exploits the piecewise linearity of the l1,∞ norm to obtain an exact solution to each iteration of the group sparsity-penalized RLS algorithm using the homotopy approach. Compared to other contemporary l1,∞ and l1,2 group lasso solvers [12], [19], our simulation results demonstrate that our proposed method attains equivalent or better performance but at lower computational cost for online group sparse RLS problems.

The paper is organized as follows. Section II formulates the problem. In Section III we develop the homotopy based algorithm to solve the recursive l1,∞ group lasso in an online recursive manner. Section IV provides numerical simulation results and Section V summarizes our principal conclusions. The proofs of theorems and some details of the proposed algorithm are provided in the Appendix.

Notations: In the following, matrices and vectors are denoted by boldface upper case letters and boldface lower case letters, respectively; (·)^T denotes the transpose operator; ||·||_1 and ||·||_∞ denote the l1 and l∞ norms of a vector, respectively; for a set A, |A| denotes its cardinality and φ denotes the empty set; x_A denotes the sub-vector of x indexed by the index set A and R_{AB} denotes the sub-matrix of R formed from the row index set A and column index set B.

II. PROBLEM FORMULATION

A. Recursive Least Squares

Let w be a p-dimensional coefficient vector. Let y be an n-dimensional vector comprised of observations {y_j}_{j=1}^n.

Let {x_j}_{j=1}^n be a sequence of p-dimensional predictor variables. In standard adaptive filtering terminology, y_j, x_j and w are the primary signal, the reference signal, and the adaptive filter weights. The RLS algorithm solves the following quadratic minimization problem recursively over time n = p, p + 1, ...:

ŵ_n = arg min_w Σ_{j=1}^n γ^{n-j} (y_j - w^T x_j)^2, (1)

where γ ∈ (0, 1] is the forgetting factor controlling the tradeoff between transient and steady-state behaviors. To serve as a template for the sparse RLS extensions described below we briefly review the RLS update algorithm. Define R_n and r_n as

R_n = Σ_{j=1}^n γ^{n-j} x_j x_j^T (2)
r_n = Σ_{j=1}^n γ^{n-j} x_j y_j. (3)

The solution ŵ_n to (1) can then be expressed as

ŵ_n = R_n^{-1} r_n. (4)

The matrix R_n and the vector r_n are updated as R_n = γR_{n-1} + x_n x_n^T and r_n = γr_{n-1} + x_n y_n. Applying the Sherman-Morrison-Woodbury formula [20],

R_n^{-1} = γ^{-1} R_{n-1}^{-1} - γ^{-1} α_n^{-1} g_n g_n^T, (5)

where

g_n = R_{n-1}^{-1} x_n (6)
α_n = γ + x_n^T g_n. (7)

Substituting (5) into (4), we obtain the weight update [21]

ŵ_n = ŵ_{n-1} + α_n^{-1} g_n e_n, (8)

where

e_n = y_n - ŵ_{n-1}^T x_n. (9)

Equations (5)-(9) define the RLS algorithm, which has computational complexity of order O(p^2).

Fig. 1. Examples of (a) a general sparse system and (b) a group-sparse system.

B. Non-recursive l1,∞ group lasso

The l1,∞ group lasso is a regularized least squares approach which uses the l1,∞ mixed norm to promote a group-wise sparse pattern on the predictor coefficient vector. The l1,∞ norm of a vector w is defined as

||w||_{1,∞} = Σ_{m=1}^M ||w_{G_m}||_∞,

where {G_m}_{m=1}^M is a group partition of the index set G = {1, ..., p}, i.e., ∪_{m=1}^M G_m = G, G_m ∩ G_{m'} = φ if m ≠ m', and w_{G_m} is the sub-vector of w indexed by G_m. The l1,∞ norm is a mixed norm: it encourages correlation among coefficients inside each group via the l∞ norm within each group and promotes sparsity across groups using the l1 norm. The mixed norm ||w||_{1,∞} is convex in w and reduces to ||w||_1 when each group contains only one coefficient, i.e., |G_1| = |G_2| = ... = |G_M| = 1.

The l1,∞ group lasso solves the following penalized least squares problem:

ŵ_n = arg min_w (1/2) Σ_{j=1}^n γ^{n-j} (y_j - w^T x_j)^2 + λ ||w||_{1,∞}, (10)

where λ is a regularization parameter and, as in standard RLS, γ controls the trade-off between the convergence rate and steady-state performance. Eq. (10) is a convex problem and can be solved by standard convex optimizers or path tracing algorithms [12]. Direct solution of (10) has computational complexity of O(p^3).

C. Recursive l1,∞ group lasso

In this subsection we obtain a recursive solution for (10) that gives an update of ŵ_n from ŵ_{n-1}. The approach taken is a group-wise generalization of recent works [15], [16] that use the homotopy approach to sequentially solve the lasso problem. Using the definitions (2) and (3), the problem (10) is equivalent to

ŵ_n = arg min_w (1/2) w^T R_n w - w^T r_n + λ ||w||_{1,∞} (11)
    = arg min_w (1/2) w^T (γR_{n-1} + x_n x_n^T) w - w^T (γr_{n-1} + x_n y_n) + λ ||w||_{1,∞}.
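As a concrete illustration of the recursion (5)-(9) and of the mixed norm defined above, the following short Python/NumPy sketch implements one standard RLS step and the l1,∞ norm. The function names, the default forgetting factor and the group representation (a list of index arrays) are our own illustrative choices, not part of the paper.

import numpy as np

def rls_update(R_inv, w, x, y, gamma=0.99):
    # One standard RLS step, Eqs. (5)-(9): returns the updated R_n^{-1} and w_n.
    g = R_inv @ x                                     # Eq. (6)
    alpha = gamma + x @ g                             # Eq. (7)
    R_inv = (R_inv - np.outer(g, g) / alpha) / gamma  # Eq. (5)
    e = y - w @ x                                     # Eq. (9), a priori error
    w = w + g * e / alpha                             # Eq. (8)
    return R_inv, w

def l1_inf_norm(w, groups):
    # ||w||_{1,inf}: sum over groups of the within-group maximum magnitude.
    return sum(np.max(np.abs(w[g])) for g in groups)

Iterating rls_update over the data stream reproduces (4) exactly when no penalty is applied; the penalized case is the subject of the remainder of the paper.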

Let f(β, λ) be the solution to the following parameterized problem:

f(β, λ) = arg min_w (1/2) w^T (γR_{n-1} + β x_n x_n^T) w - w^T (γr_{n-1} + β x_n y_n) + λ ||w||_{1,∞}, (12)

where β is a constant between 0 and 1. ŵ_{n-1} and ŵ_n of problem (11) can be expressed as ŵ_{n-1} = f(0, γλ) and ŵ_n = f(1, λ). Our proposed method computes ŵ_n from ŵ_{n-1} in the following two steps:

Step 1. Fix β = 0 and calculate f(0, λ) from f(0, γλ). This is accomplished by computing the regularization path between γλ and λ using homotopy methods introduced for the non-recursive l1,∞ group lasso. The solution path is piecewise linear and the algorithm is described in [12].

Step 2. Fix λ and calculate the solution path between f(0, λ) and f(1, λ). This is the key problem addressed in this paper.

To ease notation we denote x_n and y_n by x and y, respectively, and define the following variables:

R(β) = γR_{n-1} + β x x^T (13)
r(β) = γr_{n-1} + β x y. (14)

Problem (12) is then

f(β, λ) = arg min_w (1/2) w^T R(β) w - w^T r(β) + λ ||w||_{1,∞}. (15)

In Section III we will show how to propagate f(0, λ) to f(1, λ) using the homotopy approach applied to (15).

III. ONLINE HOMOTOPY UPDATE

A. Set notation

We begin by introducing a series of set definitions. Figure 2 provides an example. We divide the entire group index set into P and Q, where P contains the active groups and Q is its complement. For each active group m ∈ P, we partition the group into two parts: the maximal values, with indices A_m, and the rest of the values, with indices B_m:

A_m = arg max_{i ∈ G_m} |w_i|, m ∈ P,   B_m = G_m - A_m.

The sets A and B are defined as the unions of the A_m and B_m sets, respectively:

A = ∪_{m ∈ P} A_m,   B = ∪_{m ∈ P} B_m.

Finally, we define

C = ∪_{m ∈ Q} G_m,   C_m = G_m ∩ C, m ∈ Q.

Fig. 2. Illustration of the partitioning of a 20-element coefficient vector w into 5 groups of 4 indices. The sets P and Q contain the active groups and the inactive groups, respectively. Within each of the two active groups the maximal coefficients are denoted by the dark red color.

B. Optimality condition

The objective function in (15) is convex but non-smooth as the l1,∞ norm is non-differentiable. Therefore, problem (15) reaches its global minimum at w if and only if the subdifferential of the objective function contains the zero vector. Let ∂||w||_{1,∞} denote the sub-differential of the l1,∞ norm at w. A vector z ∈ ∂||w||_{1,∞} only if z satisfies the following conditions (details can be found in [12], [14]):

||z_{A_m}||_1 = 1, m ∈ P, (16)
sgn(z_{A_m}) = sgn(w_{A_m}), m ∈ P, (17)
z_B = 0, (18)
||z_{C_m}||_1 ≤ 1, m ∈ Q, (19)

where A, B, C, P and Q are β-dependent sets defined on w as in Section III-A. For notational convenience we drop β in R(β) and r(β), leaving the β-dependency implicit. The optimality condition is then written as

Rw - r + λz = 0, z ∈ ∂||w||_{1,∞}. (20)

As w_C = 0 and z_B = 0, (20) implies the three conditions

R_{AA} w_A + R_{AB} w_B - r_A + λz_A = 0, (21)
R_{BA} w_A + R_{BB} w_B - r_B = 0, (22)
R_{CA} w_A + R_{CB} w_B - r_C + λz_C = 0. (23)

The vector w_A lies in a low dimensional subspace. Indeed, by definition of A_m, if |A_m| > 1 then |w_i| = |w_{i'}| for all i, i' ∈ A_m. Therefore, for any active group m ∈ P,

w_{A_m} = s_{A_m} α_m, (24)

where α_m = ||w_{G_m}||_∞ and s_A = sgn(w_A).
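Before turning to the matrix form of (24), the set construction of Section III-A can be made concrete with a small sketch. This is illustrative only: the function name, the tolerance argument and the list-based bookkeeping are our own assumptions, not part of the paper.

import numpy as np

def partition_sets(w, groups, tol=1e-12):
    # Split group indices into the sets of Section III-A:
    # P: active groups, Q: inactive groups, A: indices attaining the
    # within-group maximum magnitude, B: remaining indices of active
    # groups, C: indices belonging to inactive groups.
    P, Q, A, B, C = [], [], [], [], []
    for m, g in enumerate(groups):
        mags = np.abs(w[np.asarray(g)])
        alpha_m = mags.max()
        if alpha_m <= tol:                 # entire group is zero: inactive
            Q.append(m)
            C.extend(g)
        else:
            P.append(m)
            A_m = [i for i, mag in zip(g, mags) if np.isclose(mag, alpha_m)]
            A.extend(A_m)
            B.extend([i for i in g if i not in A_m])
    return P, Q, A, B, C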

Using matrix notation, we represent (24) as

w_A = S a, (25)

where

S = diag(s_{A_1}, ..., s_{A_{|P|}}) (26)

is the |A| × |P| block-diagonal sign matrix whose m-th diagonal block is the sign vector s_{A_m}, and the vector a is comprised of α_m, m ∈ P.

The solution to (15) can be determined in closed form if the sign matrix S and the sets (A, B, C, P, Q) are available. Indeed, from (16) and (17),

S^T z_A = 1, (27)

where 1 is a |P| × 1 vector of ones. With (25) and (27), (21) and (22) are equivalent to

S^T R_{AA} S a + S^T R_{AB} w_B - S^T r_A + λ1 = 0,
R_{BA} S a + R_{BB} w_B - r_B = 0. (28)

Therefore, by defining the (a.s.) invertible matrix

H = [ S^T R_{AA} S   S^T R_{AB} ; R_{BA} S   R_{BB} ], (29)

and

b = [ S^T r_A ; r_B ],   v = [ a ; w_B ], (30)

(28) is equivalent to Hv = b - λe, where e = (1^T, 0^T)^T, so that

v = H^{-1} (b - λe). (31)

As w_A = S a, the solution vector w can be directly obtained from v via (31). For the sub-gradient vector, it can be shown that

λz_A = r_A - (R_{AA} S   R_{AB}) v, (32)
z_B = 0, (33)
λz_C = r_C - (R_{CA} S   R_{CB}) v. (34)

C. Online update

Now we consider (15) using the results in Section III-B. Let β_0 and β' be two constants such that β' > β_0. For a given value of β ∈ [β_0, β'] define the class of sets S = (A, B, C, P, Q) and make β explicit by writing S(β). Recall that S(β) is specified by the solution f(β, λ) defined in (12). Assume that S(β) does not change for β ∈ [β_0, β']. The following theorem propagates f(β_0, λ) to f(β', λ) via a simple algebraic relation.

Theorem 1. Let β_0 and β' be two constants such that β' > β_0 and for any β ∈ [β_0, β'] the solutions to (15) share the same sets S = (A, B, C, P, Q). Let v and v' be the vectors defined in (30) corresponding to f(β_0, λ) and f(β', λ), respectively. Then

v' = v + (β' - β_0)/(1 + σ_H^2 β') (y - ŷ) g, (35)

and the corresponding sub-gradient vector has the explicit update

λz'_A = λz_A + (β' - β_0)/(1 + σ_H^2 β') (y - ŷ) {x_A - (R_{AA} S   R_{AB}) g} (36)

and

λz'_C = λz_C + (β' - β_0)/(1 + σ_H^2 β') (y - ŷ) {x_C - (R_{CA} S   R_{CB}) g}, (37)

where R = R(0) as defined in (13), (x, y) is the new sample as defined in (13) and (14), the sign matrix S is obtained from the solution at β = β_0, H is calculated from (29) using S and R(0), and d, g, ŷ and σ_H^2 are defined by

d = [ S^T x_A ; x_B ], (38)
g = H^{-1} d, (39)
ŷ = d^T v, (40)
σ_H^2 = d^T g. (41)

The proof of Theorem 1 is provided in Appendix A. Theorem 1 provides the closed form update for the solution path f(β_0, λ) to f(β', λ), under the assumption that the associated sets S(β) remain unaltered over the path. Next, we partition the range β ∈ [0, 1] into contiguous segments over which S(β) is piecewise constant. Within each segment we can use Theorem 1 to propagate the solution from its left end point to its right end point. Below we specify an algorithm for finding the end points of each of these segments.

Fix an end point β_0 of one of these segments. We seek a critical point β* that is defined as the maximum β' ensuring that S(β) remains unchanged within [β_0, β']. By increasing β from β_0, the sets S(β) will not change until at least one of the following conditions is met:

Condition 1. There exists i ∈ A such that z_i = 0;
Condition 2. There exists i ∈ B_m such that |w_i| = α_m;
Condition 3. There exists m ∈ P such that α_m = 0;
Condition 4. There exists m ∈ Q such that ||z_{C_m}||_1 = 1.

Condition 1 is from (17) and (18), Conditions 2 and 3 are based on the definitions of A and P, respectively, and Condition 4 comes from (16) and (19). Following [12], [22], the four conditions can be assumed to be mutually exclusive. The actions with respect to Conditions 1-4 are given by

Action 1. Move the entry i from A to B: A ← A - {i}, B ← B ∪ {i};
Action 2. Move the entry i from B to A: A ← A ∪ {i}, B ← B - {i};
Action 3. Remove group m from the active group list and update the related sets: P ← P - {m}, Q ← Q ∪ {m}, A ← A - A_m, C ← C ∪ A_m;
Action 4. Select group m and update the related sets: P ← P ∪ {m}, Q ← Q - {m}, A ← A ∪ C_m, C ← C - C_m.
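A minimal NumPy rendering of the Theorem 1 update (35) is given below. It assumes that H has been assembled from R(0) via (29) and that v and d follow the stacking in (30) and (38); the function name is ours and the sketch omits the sub-gradient updates (36) and (37).

import numpy as np

def theorem1_update(v, H, d, y, beta0, beta1):
    # Propagate v = f(beta0, lambda) to f(beta1, lambda) via Eq. (35),
    # valid while the sets (A, B, C, P, Q) stay fixed on [beta0, beta1].
    g = np.linalg.solve(H, d)              # Eq. (39): g = H^{-1} d
    y_hat = d @ v                          # Eq. (40)
    sigma2_H = d @ g                       # Eq. (41)
    step = (beta1 - beta0) / (1.0 + sigma2_H * beta1)
    return v + step * (y - y_hat) * g      # Eq. (35)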

By Theorem 1, the solution update from β_0 to β* is in closed form. The critical point β* can be determined in a straightforward manner (details are provided in Appendix B). Let β^{(k)}, k = 1, ..., 4, be the minimum value that is greater than β_0 and meets Condition k, respectively. The critical point β* is then

β* = min_{k=1,...,4} β^{(k)}.

D. Homotopy algorithm implementation

We now have all the ingredients for the homotopy update algorithm and the pseudo code is given in Algorithm 1.

Algorithm 1: Homotopy update from f(0, λ) to f(1, λ).
Input: f(0, λ), R(0), x, y
Output: f(1, λ)
Initialize β_0 = 0, β* = 0, R = R(0);
Calculate (A, B, C, P, Q) and (v, λz_A, λz_C) from f(0, λ);
while β* < 1 do
  Calculate the environmental variables (S, H, d, g, ŷ, σ_H^2) from f(β_0, λ) and R;
  Calculate {β^{(k)}}_{k=1}^4 that meet Conditions 1-4, respectively;
  Calculate the critical point β* that meets Condition k*: k* = arg min_k β^{(k)}, β* = β^{(k*)};
  if β* ≤ 1 then
    Update (v, λz_A, λz_C) using (35), (36) and (37);
    Update (A, B, C, P, Q) by Action k*;
    β_0 = β*;
  else
    break;
end
β' = 1;
Update (v, λz_A, λz_C) using (35);
Calculate f(1, λ) from v.

Next we analyze the computational cost of Algorithm 1. The complexity of computing each critical point is summarized in Table I, where N is the dimension of H. As N = |P| + |B| ≤ |A| + |B|, N is upper bounded by the number of non-zeros in the solution vector. The vector g can be computed in O(N^2) time using the matrix-inverse lemma [20] and the fact that, for each action, H is perturbed by at most a rank-two matrix. This implies that the computational complexity per critical point is O(p max{N, log p}) and the total complexity of the online update is O(k_2 p max{N, log p}), where k_2 is the number of critical points of β in the solution path from f(0, λ) to f(1, λ). This is the computational cost required for Step 2 in Section II-C. A similar analysis can be performed for the complexity of Step 1, which requires O(k_1 p max{N, log p}), where k_1 is the number of critical points in the solution path from f(0, γλ) to f(0, λ). Therefore, the overall computational complexity of the recursive l1,∞ group lasso is O(k p max{N, log p}), where k = k_1 + k_2, i.e., the total number of critical points in the solution path f(0, γλ) to f(0, λ) to f(1, λ).

TABLE I
COMPUTATION COSTS OF THE ONLINE HOMOTOPY UPDATE FOR EACH CRITICAL POINT
g = H^{-1} d : O(N^2)
x_A - (R_{AA}S  R_{AB}) g : O(|A| N)
x_C - (R_{CA}S  R_{CB}) g : O(|C| N)
β^{(1)} : O(|A|)
β^{(2)} : O(|B|)
β^{(3)} : O(|P|)
β^{(4)} : O(|C| log |C|)

An instructive benchmark is to directly solve the n-sample problem (12) via the solution path from f(1, ∞) (i.e., a zero vector) to f(1, λ) [12], without using the previous solution ŵ_{n-1}. This algorithm, called icap in [12], requires O(k_0 p max{N, log p}), where k_0 is the number of critical points in the path from f(1, ∞) to f(1, λ). Empirical comparisons between k and k_0, provided in the following section, indicate that icap requires significantly more computation than our proposed Algorithm 1.
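The O(N^2) cost quoted above for refreshing g relies on low-rank updating of H^{-1} after each action. The generic matrix-inverse-lemma step that could serve this purpose is sketched below; the interface and names are illustrative assumptions, not the paper's.

import numpy as np

def low_rank_inverse_update(H_inv, U, V):
    # Matrix-inverse lemma: given H^{-1}, return (H + U V^T)^{-1}.
    # For the actions of Section III-C, U and V have at most two columns,
    # so this costs O(N^2) instead of a fresh O(N^3) inversion.
    K = np.eye(U.shape[1]) + V.T @ H_inv @ U      # small k-by-k system
    return H_inv - H_inv @ U @ np.linalg.solve(K, V.T @ H_inv)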
IV. NUMERICAL SIMULATIONS

In this section we demonstrate our proposed recursive l1,∞ group lasso algorithm by numerical simulation. We simulated the model

y_j = w^T x_j + v_j, j = 1, ..., 4000,

where v_j is zero mean Gaussian noise and w is a sparse p = 100 element vector containing only 4 non-zero coefficients, clustered in a single contiguous block of indices; see Fig. 3(a). After 2000 time units, the locations of the non-zero coefficients of w are shifted to the right, as indicated in Fig. 3(b).

Fig. 3. Responses of the time varying system. (a): Initial response. (b): Response after the 2000th iteration. The groups for Algorithm 1 were chosen as 20 equal size contiguous groups of coefficients partitioning the range 1, ..., 100.

The input vectors were generated as independent identically distributed Gaussian random vectors with zero mean and identity covariance matrix, and the variance of the observation noise v_j is 0.01. We created the groups in the recursive l1,∞ group lasso as follows: we divide the RLS filter coefficients w into 20 groups of 5 contiguous coefficients each.

The forgetting factor γ and the regularization parameter λ were set to 0.9 and 0.1, respectively. The simulation was repeated over independent trials and the averaged mean squared errors of the RLS, sparse RLS and proposed RLS are shown in Fig. 4. We implemented the standard RLS and sparse RLS using the l1 regularization, where the forgetting factors are also set to 0.9. As [9], [10] and [15] solve the same online optimization problem, it suffices to compare the MSE of [15] to that of sparse RLS. The regularization parameter λ was chosen to achieve the lowest steady-state error, resulting in λ = 0.5. It is important to note that, while their computational complexity is lower than that of our proposed RLS algorithm (and lower than that of [15]), the methods of [9] and [10] only approximate the exact l1-penalized solution.

Fig. 4. Averaged MSE of the proposed algorithm, RLS and recursive lasso. The corresponding system response is shown in Fig. 3.

It can be seen from Fig. 4 that our proposed sparse RLS method outperforms standard RLS and sparse RLS in both convergence rate and steady-state MSE. This demonstrates the power of our group sparsity penalty. At the change point after 2000 iterations, both the proposed method and the sparse RLS of [15] show superior tracking performance as compared to the standard RLS. We also observe that the proposed method achieves even smaller MSE after the change point occurs. This is due to the fact that the active cluster spans across group boundaries in the initial system (Fig. 3(a)), while the active cluster in the shifted system overlaps with fewer groups.

Fig. 5 shows the average number of critical points (accounting for both trajectories, in β and in λ) of the proposed algorithm, i.e., the number k as defined in Section III-D. As a comparison, we implement the icap method of [12], a homotopy based algorithm that traces the solution path only over λ. The average number of critical points for icap is plotted in Fig. 5; this is the number k_0 in Section III-D. Both the proposed algorithm and icap yield the same solution but have different computational complexities, proportional to k and k_0, respectively. It can be seen that the proposed algorithm saves as much as 75% of the computation cost for equivalent performance.

Fig. 5. Averaged number of critical points for the proposed recursive method of implementing the l1,∞ group lasso and for the icap [12] non-recursive method of implementation. The corresponding system response is shown in Fig. 3.

We next investigated the effect of adding a second group of non-zero coefficients to w and compared different group-sparse RLS algorithms. Fig. 6(a)-(b) shows these coefficients before and after a relocation of both of the two groups, in reverse directions, at the 2000th time point. The parameter settings and group allocation for our RLS algorithm are the same as before. The regularization parameter λ is set to 0.3 for the proposed method. For comparison, we utilized the SpaRSA algorithm [19], implemented in a public software package (available from mtf/sparsa/), to solve group-sparse regularized RLS problems in an online manner. The SpaRSA algorithm is an efficient iterative method in which each step is obtained by solving an optimization subproblem, and it supports warm-start capabilities by choosing good initial values. In our online implementation, the current estimate is used as the initial value for solving the next estimate using the updated measurement sample. Both l1,∞ and l1,2 regularizers are employed in the SpaRSA algorithm.
The regularization parameter λ of l1,∞-SpaRSA is set to the same value as in the proposed method and the λ for l1,2-SpaRSA is tuned for the best steady-state MSE. As SpaRSA is an iterative algorithm, its solution is dependent on the stopping rule, which is selected based on a stopping threshold on the relative change in the objective value. Two stopping threshold values, 10^-3 and 10^-5, are set for l1,∞-SpaRSA, and 10^-5 is set for l1,2-SpaRSA. The MSE performance comparison is shown in Fig. 7 and the averaged computational time, measured on an Intel Core 2.53 GHz CPU, is plotted in Fig. 8. It can be observed from Fig. 7 and Fig. 8 that both the MSE performance and the computational time of the SpaRSA algorithms depend on the stopping threshold. This is due to the fact that SpaRSA obtains approximate solutions of the regularized RLS problem rather than the exact solution attained by the proposed method. For l1,∞-SpaRSA, a large threshold value implies faster run time but worse tracking performance in MSE as compared to a smaller stopping threshold. We observed that the MSE curve of the more computationally demanding l1,∞-SpaRSA approaches that of the proposed method as the stopping threshold is made more stringent, since the two algorithms solve the same optimization problem (results not shown here). For l1,2-SpaRSA, the algorithm achieves computational time comparable to the proposed method at the specified stopping threshold, while the MSE performance of l1,2-SpaRSA is worse than that of the proposed method.

Note that the system response before the 2000th iteration is perfectly aligned with the group allocation but is shifted out of alignment after the change point. The difference between the MSE curves of the proposed method before and after the transition suggests that our proposed recursive group lasso is not overly sensitive to shifts in the group partition even when there are multiple groups.

Fig. 6. Coefficients for the spurious group misalignment experiment of Figs. 7 and 8. (a): Initial response. (b): Response after the 2000th iteration. The groups for the recursive group sparse RLS algorithm (Algorithm 1) were chosen as 20 equal size contiguous groups of coefficients partitioning the range 1, ..., 100.

Fig. 7. Averaged MSE of the proposed algorithm, RLS and the SpaRSA algorithms [19]. The stopping thresholds for l1,∞-SpaRSA (a), l1,∞-SpaRSA (b) and l1,2-SpaRSA are set to 10^-3, 10^-5 and 10^-5, respectively. The corresponding system response is shown in Fig. 6.

Fig. 8. Averaged computational time of the proposed algorithm, RLS and the SpaRSA algorithms [19] using an Intel Core 2.53 GHz CPU. The stopping thresholds for l1,∞-SpaRSA (a), l1,∞-SpaRSA (b) and l1,2-SpaRSA are set to 10^-3, 10^-5 and 10^-5, respectively. The corresponding system response is shown in Fig. 6.

V. CONCLUSION

In this paper we proposed an l1,∞-regularized RLS algorithm for online sparse linear prediction. We developed a homotopy based method to sequentially update the solution vector as new measurements are acquired. Our proposed algorithm uses the previous estimate as a warm-start, from which we compute the homotopy update to the exact current solution. The proposed algorithm can process streaming measurements with time varying predictors and is computationally efficient compared to non-recursive contemporary warm-start capable group lasso solvers. Numerical simulations demonstrated that the proposed method outperformed the standard l1-regularized RLS for identifying an unknown group sparse system, in terms of both tracking and steady-state mean squared error. As indicated in Section IV, the MSE performance of the proposed method depends on the choice of group partition. The development of performance-optimal data-dependent group partitioning algorithms is a worthwhile topic for future study. Another worthwhile area of research is the extension of the proposed method to overlapping groups and other flexible partitions [23].

VI. ACKNOWLEDGMENT

The authors would like to thank the reviewers for their valuable comments on the paper.

VII. APPENDIX

A. Proof of Theorem 1

We begin by deriving (35). According to (31),

v' = H'^{-1} (b' - λe'). (42)

As S = (A, B, C, P, Q) remains constant within [β_0, β'],

e' = e, (43)
b' = b + δ d y, (44)
H' = H + δ d d^T,

where δ = β' - β_0, and H and b are calculated using S within [β_0, β'] and R(β_0) and r(β_0), respectively. We emphasize that this H is based on R(β_0) and is different from the H defined in Theorem 1. According to the Sherman-Morrison-Woodbury formula,

H'^{-1} = H^{-1} - δ/(1 + σ^2 δ) (H^{-1}d)(H^{-1}d)^T, (45)

where σ^2 = d^T H^{-1} d. Substituting (43), (44) and (45) into (42), after simplification we obtain

v' = (H^{-1} - δ/(1 + σ^2 δ) (H^{-1}d)(H^{-1}d)^T) (b + δ d y - λe)
   = H^{-1}(b - λe) + δ H^{-1} d y - δ/(1 + σ^2 δ) H^{-1} d d^T H^{-1} (b - λe) - σ^2 δ^2/(1 + σ^2 δ) H^{-1} d y
   = v + δ/(1 + σ^2 δ) (y - d^T v) H^{-1} d
   = v + δ/(1 + σ^2 δ) (y - ŷ) H^{-1} d, (46)

where ŷ = d^T v as defined in (40). Note that the H used here is defined in terms of R(β_0) rather than R(0); writing H_0 for the H of Theorem 1 (based on R(0)), we have H = H_0 + β_0 d d^T, so that

H^{-1} = H_0^{-1} - β_0/(1 + σ_H^2 β_0) g g^T, (47)

where g and σ_H^2 are defined by (39) and (41), respectively. Accordingly,

H^{-1} d = H_0^{-1} d - σ_H^2 β_0/(1 + σ_H^2 β_0) g = g/(1 + σ_H^2 β_0). (48)

As σ_H^2 = d^T g,

σ^2 = d^T H^{-1} d = σ_H^2 - σ_H^2 β_0/(1 + σ_H^2 β_0) σ_H^2 = σ_H^2/(1 + σ_H^2 β_0). (49)

Substituting (48) and (49) into (46), we finally obtain

v' = v + δ/(1 + σ_H^2 β') (y - ŷ) g = v + (β' - β_0)/(1 + σ_H^2 β') (y - ŷ) g.

Equations (36) and (37) can be established by direct substitution of (35) into their definitions (32) and (34), and thus the proof of Theorem 1 is complete.

B. Computation of critical points

For ease of notation we work with ρ, defined by

ρ = (β - β_0)/(1 + σ_H^2 β). (50)

It is easy to see that over the range β > β_0, ρ is monotonically increasing in (0, 1/σ_H^2). Therefore, (50) can be inverted by

β = (ρ + β_0)/(1 - σ_H^2 ρ), (51)

where ρ ∈ (0, 1/σ_H^2) to ensure β > β_0. Suppose we have obtained ρ^{(k)}, k = 1, ..., 4; then β^{(k)} can be calculated using (51) and the critical point β* is then β* = min_{k=1,...,4} β^{(k)}. We now calculate the critical value of ρ for each condition one by one.

1) Critical point for Condition 1: Define the temporary vector

t_A = (y - ŷ) {x_A - (R_{AA} S   R_{AB}) g}.

According to (36),

λz'_A = λz_A + ρ t_A.

Condition 1 is met for ρ = ρ^{(1)}_i such that 0 = λz_i + ρ^{(1)}_i t_i, i ∈ A, i.e., ρ^{(1)}_i = -λz_i/t_i. Therefore, the critical value of ρ that satisfies Condition 1 is

ρ^{(1)} = min { ρ^{(1)}_i : i ∈ A, ρ^{(1)}_i ∈ (0, 1/σ_H^2) }.

2) Critical point for Condition 2: By the definition (30), v is a concatenation of α_m and w_{B_m}, m ∈ P:

v^T = ((α_m)_{m∈P}, w^T_{B_1}, ..., w^T_{B_{|P|}}), (52)

where (α_m)_{m∈P} denotes the vector comprised of α_m, m ∈ P. Now we partition the vector g in the same manner as (52) and denote τ_m and u_m as the counterparts of α_m and w_{B_m} in g, i.e.,

g^T = ((τ_m)_{m∈P}, u^T_1, ..., u^T_{|P|}).

Eq. (35) is then equivalent to

α'_m = α_m + ρ τ_m, (53)
w'_{B_m,i} = w_{B_m,i} + ρ u_{m,i},

where u_{m,i} is the i-th element of the vector u_m. Condition 2, which requires α'_m = ±w'_{B_m,i}, is satisfied if ρ = ρ^{(2+)}_{m,i} or ρ = ρ^{(2-)}_{m,i}, where

ρ^{(2+)}_{m,i} = (α_m - w_{B_m,i})/(u_{m,i} - τ_m),   ρ^{(2-)}_{m,i} = -(α_m + w_{B_m,i})/(u_{m,i} + τ_m).

Therefore, the critical value of ρ for Condition 2 is

ρ^{(2)} = min { ρ^{(2±)}_{m,i} : m ∈ P, i = 1, ..., |B_m|, ρ^{(2±)}_{m,i} ∈ (0, 1/σ_H^2) }.

3) Critical point for Condition 3: According to (53), α'_m = 0 yields ρ = ρ^{(3)}_m determined by

ρ^{(3)}_m = -α_m/τ_m, m ∈ P,

and the critical value ρ^{(3)} is

ρ^{(3)} = min { ρ^{(3)}_m : m ∈ P, ρ^{(3)}_m ∈ (0, 1/σ_H^2) }.

4) Critical point for Condition 4: Define

t_C = (y - ŷ) {x_C - (R_{CA} S   R_{CB}) g}.

Eq. (37) is then

λz'_{C_m} = λz_{C_m} + ρ t_{C_m}.

Condition 4 is equivalent to

Σ_{i ∈ C_m} |ρ t_i + λz_i| = λ. (54)

To solve (54) we develop a fast method that requires complexity of O(N log N), where N = |C_m|. The algorithm is given in Appendix C. For each m ∈ Q, let ρ^{(4)}_m be the minimum positive solution to (54). The critical value of ρ for Condition 4 is then

ρ^{(4)} = min { ρ^{(4)}_m : m ∈ Q, ρ^{(4)}_m ∈ (0, 1/σ_H^2) }.
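To illustrate how the candidate values are screened, the inversion (51) and the Condition 1 candidates can be coded as follows. The vectorized filtering and the function names are our own illustrative choices; analogous screens apply to Conditions 2-4.

import numpy as np

def beta_from_rho(rho, beta0, sigma2_H):
    # Invert Eq. (50) via Eq. (51); valid for rho in (0, 1/sigma2_H).
    return (rho + beta0) / (1.0 - sigma2_H * rho)

def rho_condition1(lambda_zA, tA, sigma2_H):
    # Condition 1 candidates: rho_i = -lambda*z_i / t_i for i in A,
    # kept only if they fall in the admissible interval (0, 1/sigma2_H).
    with np.errstate(divide="ignore", invalid="ignore"):
        rho = -np.asarray(lambda_zA) / np.asarray(tA)
    rho = rho[(rho > 0) & (rho < 1.0 / sigma2_H)]
    return rho.min() if rho.size else np.inf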

C. Fast algorithm for critical Condition 4

Here we develop an algorithm to solve problem (54). Consider solving the more general problem

Σ_{i=1}^N a_i |x - x_i| = y, (55)

where a_i and x_i are constants and a_i > 0. Please note that the notations here have no connection to those in previous sections. Define the following function:

h(x) = Σ_{i=1}^N a_i |x - x_i|.

The problem is then equivalent to finding h^{-1}(y), if it exists. An illustration of the function h(x) is shown in Fig. 9, where k_i denotes the slope of the i-th segment. It can be shown that h(x) is piecewise linear and convex in x. Therefore, equation (55) generally has two solutions, if solutions exist, denoted x_min and x_max. Based on the piecewise linearity we propose a search algorithm to solve (55). The pseudo code is shown in Algorithm 2 and its computational complexity is dominated by the sorting operation, which requires O(N log N).

Fig. 9. An illustration of the fast algorithm for critical Condition 4.

Algorithm 2: Solve x from Σ_{i=1}^N a_i |x - x_i| = y.
Input: {a_i, x_i}_{i=1}^N, y
Output: x_min, x_max
Sort {x_i}_{i=1}^N in ascending order: x_1 ≤ x_2 ≤ ... ≤ x_N;
Re-order {a_i}_{i=1}^N such that a_i corresponds to x_i;
Set k_0 = -Σ_{i=1}^N a_i;
for i = 1, ..., N do k_i = k_{i-1} + 2a_i;
Calculate h_1 = Σ_{i=2}^N a_i |x_1 - x_i|;
for i = 2, ..., N do h_i = h_{i-1} + k_{i-1}(x_i - x_{i-1});
if min_i h_i > y then
  No solution; Exit;
else
  if y > h_1 then x_min = x_1 + (y - h_1)/k_0;
  else seek j such that y ∈ [h_{j+1}, h_j] and set x_min = x_j + (y - h_j)/k_j;
  if y > h_N then x_max = x_N + (y - h_N)/k_N;
  else seek j such that y ∈ [h_j, h_{j+1}] and set x_max = x_j + (y - h_j)/k_j;
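A compact NumPy version of the same sorted-breakpoint search is given below for reference; it is a sketch (the function name and the convention of returning None when no solution exists are our own choices), not code reproduced from the paper.

import numpy as np

def solve_weighted_abs(a, xs, y):
    # Solve sum_i a_i * |x - x_i| = y (Eq. (55)) for a_i > 0.
    # Returns (x_min, x_max), or None if y lies below the minimum of the
    # convex piecewise-linear function h(x).
    order = np.argsort(xs)
    xs = np.asarray(xs, dtype=float)[order]
    a = np.asarray(a, dtype=float)[order]
    n = len(xs)
    # Slopes k_0, ..., k_N of h: k_0 = -sum(a), k_i = k_{i-1} + 2 a_i.
    k = np.concatenate(([-a.sum()], -a.sum() + 2.0 * np.cumsum(a)))
    # Breakpoint values h_i = h(x_i).
    h = np.empty(n)
    h[0] = np.sum(a[1:] * np.abs(xs[0] - xs[1:]))
    for i in range(1, n):
        h[i] = h[i - 1] + k[i] * (xs[i] - xs[i - 1])
    if y < h.min():
        return None
    # Left solution, on the decreasing branch of h.
    if y >= h[0]:
        x_min = xs[0] + (y - h[0]) / k[0]
    else:
        j = np.where((h[:-1] >= y) & (h[1:] <= y))[0][0]
        x_min = xs[j] + (y - h[j]) / k[j + 1]
    # Right solution, on the increasing branch of h.
    if y >= h[-1]:
        x_max = xs[-1] + (y - h[-1]) / k[-1]
    else:
        j = np.where((h[:-1] <= y) & (h[1:] >= y))[0][-1]
        x_max = xs[j] + (y - h[j]) / k[j + 1]
    return x_min, x_max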
REFERENCES

[1] Y. C. Eldar, P. Kuppinger, and H. Bolcskei, "Block-sparse signals: Uncertainty relations and efficient recovery," IEEE Transactions on Signal Processing, vol. 58, no. 6, 2010.
[2] Y. Chen, Y. Gu, and A. O. Hero, "Regularized Least-Mean-Square Algorithms," arXiv preprint arXiv:1012.5066, 2010.
[3] W. F. Schreiber, "Advanced television systems for terrestrial broadcasting: Some problems and some proposed solutions," Proceedings of the IEEE, vol. 83, no. 6, 1995.
[4] Y. Gu, J. Jin, and S. Mei, "l0 Norm Constraint LMS Algorithm for Sparse System Identification," IEEE Signal Processing Letters, vol. 16, 2009.
[5] M. Mishali and Y. C. Eldar, "From theory to practice: Sub-Nyquist sampling of sparse wideband analog signals," IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 2, 2010.
[6] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society: Series B, vol. 58, 1996.
[7] E. Candès, "Compressive sampling," Int. Congress of Mathematics, vol. 3, 2006.
[8] Y. Chen, Y. Gu, and A. O. Hero, "Sparse LMS for system identification," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2009.
[9] B. Babadi, N. Kalouptsidis, and V. Tarokh, "SPARLS: The sparse RLS algorithm," IEEE Transactions on Signal Processing, vol. 58, no. 8, 2010.
[10] D. Angelosante, J. A. Bazerque, and G. B. Giannakis, "Online Adaptive Estimation of Sparse Signals: Where RLS Meets the l1-Norm," IEEE Transactions on Signal Processing, vol. 58, no. 7, 2010.
[11] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 68, no. 1, 2006.

[12] P. Zhao, G. Rocha, and B. Yu, "The composite absolute penalties family for grouped and hierarchical variable selection," Annals of Statistics, vol. 37, no. 6A, 2009.
[13] F. R. Bach, "Consistency of the group Lasso and multiple kernel learning," Journal of Machine Learning Research, vol. 9, 2008.
[14] S. Negahban and M. J. Wainwright, "Joint support recovery under high-dimensional scaling: Benefits and perils of l1,∞-regularization," in Advances in Neural Information Processing Systems, 2008.
[15] P. J. Garrigues and L. El Ghaoui, "An homotopy algorithm for the Lasso with online observations," in Neural Information Processing Systems (NIPS), 2008, vol. 21.
[16] M. S. Asif and J. Romberg, "Dynamic Updating for l1 Minimization," IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 2, 2010.
[17] D. M. Malioutov, S. R. Sanghavi, and A. S. Willsky, "Sequential compressed sensing," IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 2, 2010.
[18] S. N. Negahban and M. J. Wainwright, "Simultaneous support recovery in high dimensions: Benefits and perils of block l1/l∞-regularization," IEEE Transactions on Information Theory, vol. 57, no. 6, 2011.
[19] S. J. Wright, R. D. Nowak, and M. A. T. Figueiredo, "Sparse reconstruction by separable approximation," IEEE Transactions on Signal Processing, vol. 57, no. 7, 2009.
[20] W. W. Hager, "Updating the inverse of a matrix," SIAM Review, vol. 31, no. 2, 1989.
[21] B. Widrow and S. D. Stearns, Adaptive Signal Processing, New Jersey: Prentice Hall, 1985.
[22] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, "Least angle regression," Annals of Statistics, vol. 32, no. 2, 2004.
[23] R. Jenatton, J. Y. Audibert, and F. Bach, "Structured Variable Selection with Sparsity-Inducing Norms," arXiv preprint, 2009.


More information

EE 381V: Large Scale Optimization Fall Lecture 24 April 11

EE 381V: Large Scale Optimization Fall Lecture 24 April 11 EE 381V: Large Scale Optimization Fall 2012 Lecture 24 April 11 Lecturer: Caramanis & Sanghavi Scribe: Tao Huang 24.1 Review In past classes, we studied the problem of sparsity. Sparsity problem is that

More information

Least Sparsity of p-norm based Optimization Problems with p > 1

Least Sparsity of p-norm based Optimization Problems with p > 1 Least Sparsity of p-norm based Optimization Problems with p > Jinglai Shen and Seyedahmad Mousavi Original version: July, 07; Revision: February, 08 Abstract Motivated by l p -optimization arising from

More information

On Algorithms for Solving Least Squares Problems under an L 1 Penalty or an L 1 Constraint

On Algorithms for Solving Least Squares Problems under an L 1 Penalty or an L 1 Constraint On Algorithms for Solving Least Squares Problems under an L 1 Penalty or an L 1 Constraint B.A. Turlach School of Mathematics and Statistics (M19) The University of Western Australia 35 Stirling Highway,

More information

Exact Topology Identification of Large-Scale Interconnected Dynamical Systems from Compressive Observations

Exact Topology Identification of Large-Scale Interconnected Dynamical Systems from Compressive Observations Exact Topology Identification of arge-scale Interconnected Dynamical Systems from Compressive Observations Borhan M Sanandaji, Tyrone Vincent, and Michael B Wakin Abstract In this paper, we consider the

More information

An On-line Method for Estimation of Piecewise Constant Parameters in Linear Regression Models

An On-line Method for Estimation of Piecewise Constant Parameters in Linear Regression Models Preprints of the 8th IFAC World Congress Milano (Italy) August 8 - September, An On-line Method for Estimation of Piecewise Constant Parameters in Linear Regression Models Soheil Salehpour, Thomas Gustafsson,

More information

Comparative Performance Analysis of Three Algorithms for Principal Component Analysis

Comparative Performance Analysis of Three Algorithms for Principal Component Analysis 84 R. LANDQVIST, A. MOHAMMED, COMPARATIVE PERFORMANCE ANALYSIS OF THR ALGORITHMS Comparative Performance Analysis of Three Algorithms for Principal Component Analysis Ronnie LANDQVIST, Abbas MOHAMMED Dept.

More information

Learning Bound for Parameter Transfer Learning

Learning Bound for Parameter Transfer Learning Learning Bound for Parameter Transfer Learning Wataru Kumagai Faculty of Engineering Kanagawa University kumagai@kanagawa-u.ac.jp Abstract We consider a transfer-learning problem by using the parameter

More information

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 XVI - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 A slightly changed ADMM for convex optimization with three separable operators Bingsheng He Department of

More information

Reconstruction of Block-Sparse Signals by Using an l 2/p -Regularized Least-Squares Algorithm

Reconstruction of Block-Sparse Signals by Using an l 2/p -Regularized Least-Squares Algorithm Reconstruction of Block-Sparse Signals by Using an l 2/p -Regularized Least-Squares Algorithm Jeevan K. Pant, Wu-Sheng Lu, and Andreas Antoniou University of Victoria May 21, 2012 Compressive Sensing 1/23

More information

Restricted Strong Convexity Implies Weak Submodularity

Restricted Strong Convexity Implies Weak Submodularity Restricted Strong Convexity Implies Weak Submodularity Ethan R. Elenberg Rajiv Khanna Alexandros G. Dimakis Department of Electrical and Computer Engineering The University of Texas at Austin {elenberg,rajivak}@utexas.edu

More information

Block-sparse Solutions using Kernel Block RIP and its Application to Group Lasso

Block-sparse Solutions using Kernel Block RIP and its Application to Group Lasso Block-sparse Solutions using Kernel Block RIP and its Application to Group Lasso Rahul Garg IBM T.J. Watson research center grahul@us.ibm.com Rohit Khandekar IBM T.J. Watson research center rohitk@us.ibm.com

More information

Near Optimal Adaptive Robust Beamforming

Near Optimal Adaptive Robust Beamforming Near Optimal Adaptive Robust Beamforming The performance degradation in traditional adaptive beamformers can be attributed to the imprecise knowledge of the array steering vector and inaccurate estimation

More information

MIST: l 0 Sparse Linear Regression with Momentum

MIST: l 0 Sparse Linear Regression with Momentum MIST: l 0 Sparse Linear Regression with Momentum Goran Marjanovic, Magnus O. Ulfarsson, Alfred O. Hero III arxiv:409.793v [stat.ml] 9 Mar 05 Abstract Significant attention has been given to minimizing

More information

Signal Recovery from Permuted Observations

Signal Recovery from Permuted Observations EE381V Course Project Signal Recovery from Permuted Observations 1 Problem Shanshan Wu (sw33323) May 8th, 2015 We start with the following problem: let s R n be an unknown n-dimensional real-valued signal,

More information

6. Regularized linear regression

6. Regularized linear regression Foundations of Machine Learning École Centrale Paris Fall 2015 6. Regularized linear regression Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech chloe agathe.azencott@mines paristech.fr

More information

Boosting Methods: Why They Can Be Useful for High-Dimensional Data

Boosting Methods: Why They Can Be Useful for High-Dimensional Data New URL: http://www.r-project.org/conferences/dsc-2003/ Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003) March 20 22, Vienna, Austria ISSN 1609-395X Kurt Hornik,

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Gaussian graphical models and Ising models: modeling networks Eric Xing Lecture 0, February 7, 04 Reading: See class website Eric Xing @ CMU, 005-04

More information

1-Bit Compressive Sensing

1-Bit Compressive Sensing 1-Bit Compressive Sensing Petros T. Boufounos, Richard G. Baraniuk Rice University, Electrical and Computer Engineering 61 Main St. MS 38, Houston, TX 775 Abstract Compressive sensing is a new signal acquisition

More information

Performance Analysis of Norm Constraint Least Mean Square Algorithm

Performance Analysis of Norm Constraint Least Mean Square Algorithm IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 60, NO. 5, MAY 2012 2223 Performance Analysis of Norm Constraint Least Mean Square Algorithm Guolong Su, Jian Jin, Yuantao Gu, Member, IEEE, and Jian Wang Abstract

More information

Elaine T. Hale, Wotao Yin, Yin Zhang

Elaine T. Hale, Wotao Yin, Yin Zhang , Wotao Yin, Yin Zhang Department of Computational and Applied Mathematics Rice University McMaster University, ICCOPT II-MOPTA 2007 August 13, 2007 1 with Noise 2 3 4 1 with Noise 2 3 4 1 with Noise 2

More information

Sparse Covariance Selection using Semidefinite Programming

Sparse Covariance Selection using Semidefinite Programming Sparse Covariance Selection using Semidefinite Programming A. d Aspremont ORFE, Princeton University Joint work with O. Banerjee, L. El Ghaoui & G. Natsoulis, U.C. Berkeley & Iconix Pharmaceuticals Support

More information

Gaussian Graphical Models and Graphical Lasso

Gaussian Graphical Models and Graphical Lasso ELE 538B: Sparsity, Structure and Inference Gaussian Graphical Models and Graphical Lasso Yuxin Chen Princeton University, Spring 2017 Multivariate Gaussians Consider a random vector x N (0, Σ) with pdf

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 1 Ridge Regression Ridge regression and the Lasso are two forms of regularized

More information