arxiv: v2 [cs.na] 10 Feb 2018


Efficient Robust Matrix Factorization with Nonconvex Penalties

Quanming Yao, James T. Kwok
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong
{qyaoaa, jamesk}@cse.ust.hk

Abstract

Robust matrix factorization (RMF), which uses the $\ell_1$-loss, often outperforms standard matrix factorization using the $\ell_2$-loss, particularly when outliers are present. The state-of-the-art RMF solver is the RMF-MM algorithm, which, however, cannot utilize data sparsity. Moreover, sometimes even the (convex) $\ell_1$-loss is not robust enough. In this paper, we propose the use of a nonconvex loss to enhance robustness. To address the resultant difficult optimization problem, we use majorization-minimization (MM) optimization and propose a new MM surrogate. To improve scalability, we exploit data sparsity and optimize the surrogate via its dual with the accelerated proximal gradient algorithm. The resultant algorithm has low time and space complexities and is guaranteed to converge to a critical point. Extensive experiments demonstrate its superiority over the state-of-the-art in terms of both accuracy and scalability.

Introduction

Matrix factorization (MF) is a fundamental machine learning tool, and serves as an important component in many applications such as computer vision (Basri, Jacobs, and Kemelmacher 2007; Gu et al. 2014; Ji et al. 2010), social networks (Yang and Leskovec 2013) and recommender systems (Koren 2008). The square loss has been commonly used in MF (Mnih and Salakhutdinov 2008; Koren 2008; Candès and Recht 2009). However, this implicitly assumes Gaussian noise, and is sensitive to outliers. Eriksson & Van Den Hengel (2010) proposed robust matrix factorization (RMF), which uses the $\ell_1$-loss instead, and obtains much better empirical performance. Unfortunately, the resultant nonconvex nonsmooth optimization problem is much more difficult.
Most RMF solvers are not scalable (Eriksson and Van Den Hengel 2010; Zheng et al. 2012; Cabral et al. 2013; Meng et al. 2013; Kim et al. 2015). Recently, Cambier & Absil (2016) proposed the robust matrix completion (RMC) algorithm, which smooths the $\ell_1$-loss and then performs gradient descent on the Riemannian manifold. Empirically, it can be used on large sparse matrices. However, as smoothing is used, RMC only approximates the RMF solution.

Copyright © 2018, Association for the Advancement of Artificial Intelligence. All rights reserved.

The current state-of-the-art RMF solver is RMF-MM (Lin, Xu, and Zha 2017), which is based on the technique of majorization minimization (MM) (Lange, Hunter, and Yang 2000; Hunter and Lange 2004; Mairal 2013). In each iteration, instead of optimizing the original RMF objective, a convex nonsmooth surrogate is constructed and optimized by the linearized alternating direction method with parallel splitting and adaptive penalty (LADMPSAP) algorithm (Lin, Liu, and Li 2015), a variant of the alternating direction method of multipliers (ADMM) (Boyd et al. 2011). Empirically, RMF-MM has fast convergence. Besides, it is the only RMF solver with convergence guarantees. However, it does not utilize data sparsity, and so cannot be used on large sparse matrices.

Though the $\ell_1$-loss used in RMF is more robust than the $\ell_2$-loss, it may still not be robust enough for outliers. A similar observation has recently been made on the $\ell_1$-regularizer in sparse learning and low-rank matrix learning (Gong et al. 2013; Zuo et al. 2013; Gu et al. 2014; Lu et al. 2016; Yao and Kwok 2016). To alleviate this problem, various nonconvex regularizers have been introduced. Examples include the Geman penalty (Geman and Yang 1995), log-sum penalty (Candès, Wakin, and Boyd 2008) and Laplace penalty (Trzasko and Manduca 2009). Empirically, they achieve much better performance on tasks such as feature selection (Gong et al. 2013; Zuo et al.
2013) and image denoising (Gu et al. 2014; Lu et al. 2016).

In this paper, we propose to improve the robustness of RMF by using these nonconvex functions (instead of $\ell_1$ or $\ell_2$) as the loss function. Note that the above papers used these nonconvex functions as regularizers, not as the loss. The resultant optimization problem is difficult, and existing RMF solvers cannot be used. As in RMF-MM, we rely on the more flexible MM optimization technique, and a new MM surrogate is proposed. To improve scalability, we transform the surrogate to its dual and then solve it with the accelerated proximal gradient (APG) algorithm (Beck 2009; Nesterov 2013). Data sparsity can also be exploited in the design of the APG algorithm. As for its convergence analysis, the proof techniques in RMF-MM cannot be used as the loss is no longer convex. Instead, we develop new proof techniques based on the Clarke subdifferential, and show that convergence to a critical point can be guaranteed. Extensive experiments demonstrate the superiority of the

proposed algorithm over the state-of-the-art in terms of both accuracy and scalability.

Notation

For a scalar $x$, $\mathrm{sign}(x) = 1$ if $x > 0$, $0$ if $x = 0$, and $-1$ otherwise. For a vector $x$, $\mathrm{Diag}(x)$ constructs a diagonal matrix $X$ with $X_{ii} = x_i$. For a matrix $X$, $\|X\|_F = (\sum_{i,j} X_{ij}^2)^{1/2}$ is its Frobenius norm, $\|X\|_1 = \sum_{i,j} |X_{ij}|$ is its $\ell_1$-norm, and $\mathrm{nnz}(X)$ is the number of nonzero elements in $X$. For a square matrix $X$, $\mathrm{tr}(X) = \sum_i X_{ii}$ is its trace. For two matrices $X, Y$, $X \odot Y$ denotes their element-wise product. For a smooth function $f$, $\nabla f$ is its gradient. For a convex $f$, $G \in \partial f(X) = \{U : f(Y) \ge f(X) + \mathrm{tr}(U^\top (Y - X))\}$ is a subgradient. For a continuous $f$, $\partial^\circ f$ is the Clarke subdifferential (Clarke 1990).

Related Work

Majorization Minimization

Majorization minimization (MM), also called optimization transfer, is a general technique to make difficult optimization problems easier (Lange, Hunter, and Yang 2000; Hunter and Lange 2004; Mairal 2013). Many algorithms can be seen as special cases of the MM algorithm. Examples include the EM algorithm, proximal algorithm, and difference of convex programming.

Consider a function $h(X)$, which is hard to optimize. In using MM, let the iterate at the $k$th iteration be $X^k$. The next iterate is generated as

$X^{k+1} = X^k + \arg\min_X f^k(X)$, (1)

where $f^k$ is a surrogate that is optimized instead of $h$. As suggested in (Lange, Hunter, and Yang 2000), a good surrogate should have the following properties:
(a) $h(X^k + X) \le f^k(X)$ for any $X$;
(b) $0 = \arg\min_X \left( f^k(X) - h(X^k + X) \right)$ and $h(X^k) = f^k(0)$;
(c) $f^k$ is convex.

Robust Matrix Factorization (RMF)

In matrix factorization (MF), a matrix $M \in \mathbb{R}^{m \times n}$ is approximated by $UV^\top$, where $U \in \mathbb{R}^{m \times r}$, $V \in \mathbb{R}^{n \times r}$ and $r \ll \min(m, n)$. Moreover, some entries of $M$ may be missing, as in applications such as structure from motion (SfM) (Basri, Jacobs, and Kemelmacher 2007) and recommender systems (Koren 2008).
The MF problem can be formulated as:

$\min_{U,V} \frac{1}{2} \|W \odot (M - UV^\top)\|_F^2 + \frac{\lambda}{2} (\|U\|_F^2 + \|V\|_F^2)$, (2)

where $W \in \{0, 1\}^{m \times n}$ indicates the observed entries of $M$ (with $W_{ij} = 1$ if $M_{ij}$ is observed, and 0 otherwise), and $\lambda \ge 0$ is a regularization parameter. The $\ell_2$-loss in (2) is sensitive to outliers. De La Torre & Black (2003) replaced it by the $\ell_1$-loss, leading to robust matrix factorization (RMF):

$\min_{U,V} \|W \odot (M - UV^\top)\|_1 + \frac{\lambda}{2} (\|U\|_F^2 + \|V\|_F^2)$. (3)

Many RMF solvers have been developed. However, as (3) is neither convex nor smooth, these solvers lack scalability and/or robustness and/or convergence guarantees.

Very recently, Lin et al. (2017) proposed the RMF-MM algorithm, which solves (3) using MM. Let the iterate at the $k$th iteration be $(U^k, V^k)$. RMF-MM finds increments $(\bar{U}, \bar{V})$ that are added to the current iterate to obtain the target solution $(U, V)$:

$U = U^k + \bar{U}, \quad V = V^k + \bar{V}$. (4)

Given $(U^k, V^k)$, the objective in (3) can be rewritten as

$H^k(\bar{U}, \bar{V}) \equiv \|W \odot (M - (U^k + \bar{U})(V^k + \bar{V})^\top)\|_1 + \frac{\lambda}{2} \|U^k + \bar{U}\|_F^2 + \frac{\lambda}{2} \|V^k + \bar{V}\|_F^2$. (5)

The following Proposition constructs a surrogate $F^k$ that satisfies properties (a) and (b) of a good surrogate listed above. Moreover, unlike $H^k$, $F^k$ is jointly convex in $(\bar{U}, \bar{V})$.

Proposition 0.1. (Lin, Xu, and Zha 2017) Let $\mathrm{nnz}(W_{(i,:)})$ (resp. $\mathrm{nnz}(W_{(:,j)})$) be the number of nonzero elements in the $i$th row (resp. $j$th column) of $W$, $\Lambda_r = \mathrm{Diag}(\sqrt{\mathrm{nnz}(W_{(1,:)})}, \dots, \sqrt{\mathrm{nnz}(W_{(m,:)})})$, and $\Lambda_c = \mathrm{Diag}(\sqrt{\mathrm{nnz}(W_{(:,1)})}, \dots, \sqrt{\mathrm{nnz}(W_{(:,n)})})$. Then, $H^k(\bar{U}, \bar{V}) \le F^k(\bar{U}, \bar{V})$, where

$F^k(\bar{U}, \bar{V}) \equiv \|W \odot (M - U^k (V^k)^\top - \bar{U} (V^k)^\top - U^k \bar{V}^\top)\|_1 + \frac{\lambda}{2} \|U^k + \bar{U}\|_F^2 + \frac{1}{2} \|\Lambda_r \bar{U}\|_F^2 + \frac{\lambda}{2} \|V^k + \bar{V}\|_F^2 + \frac{1}{2} \|\Lambda_c \bar{V}\|_F^2$. (6)

Equality holds iff $(\bar{U}, \bar{V}) = (0, 0)$.

However, because of the coupling of $\bar{U}, V^k$ (resp. $U^k, \bar{V}$) in $\bar{U} (V^k)^\top$ (resp. $U^k \bar{V}^\top$) in (6), $F^k$ is still difficult to optimize. To address this problem, RMF-MM uses the linearized alternating direction method with parallel splitting and adaptive penalty (LADMPSAP) (Lin, Liu, and Li 2015), a multi-block variant of the alternating direction method of multipliers (ADMM) (Boyd et al. 2011).
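As an illustration of Proposition 0.1, the following NumPy sketch evaluates objective (3) and surrogate (6) on a small random problem and measures the gap between them. It assumes the reading of (6) in which the diagonals of $\Lambda_r$ and $\Lambda_c$ are square roots of the row and column counts of $W$ and the two $\Lambda$ terms carry a factor $\frac{1}{2}$; the function names, problem sizes and random seed are illustrative choices, not taken from the paper.

```python
import numpy as np

def rmf_objective(M, W, U, V, lam):
    """Objective (3): masked l1 residual plus Frobenius regularization."""
    resid = W * (M - U @ V.T)
    return np.abs(resid).sum() + 0.5 * lam * ((U ** 2).sum() + (V ** 2).sum())

def rmf_mm_surrogate(M, W, U, V, dU, dV, lam):
    """Surrogate (6) built at (U, V) and evaluated at the increment (dU, dV)."""
    lin_resid = W * (M - U @ V.T - dU @ V.T - U @ dV.T)
    row_nnz = W.sum(axis=1)  # squared diagonal of Lambda_r
    col_nnz = W.sum(axis=0)  # squared diagonal of Lambda_c
    return (np.abs(lin_resid).sum()
            + 0.5 * lam * ((U + dU) ** 2).sum()
            + 0.5 * (row_nnz[:, None] * dU ** 2).sum()
            + 0.5 * lam * ((V + dV) ** 2).sum()
            + 0.5 * (col_nnz[:, None] * dV ** 2).sum())

# Small random instance (hypothetical sizes).
rng = np.random.default_rng(0)
m, n, r, lam = 8, 6, 2, 0.1
M = rng.standard_normal((m, n))
W = (rng.random((m, n)) < 0.6).astype(float)
U, V = rng.standard_normal((m, r)), rng.standard_normal((n, r))
dU, dV = rng.standard_normal((m, r)), rng.standard_normal((n, r))

# Surrogate minus true objective: nonnegative, and zero at the zero increment.
gap = (rmf_mm_surrogate(M, W, U, V, dU, dV, lam)
       - rmf_objective(M, W, U + dU, V + dV, lam))
gap_at_zero = (rmf_mm_surrogate(M, W, U, V, 0 * dU, 0 * dV, lam)
               - rmf_objective(M, W, U, V, lam))
```

The gap is nonnegative because the bilinear term $\bar{U}\bar{V}^\top$ dropped from the $\ell_1$ residual is paid for by the two quadratic $\Lambda$ terms.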
RMF-MM converges to a critical point of (3). Empirically, it converges faster than the other RMF solvers. However, due to the nonconvexity of (3), its convergence rate is not known. Moreover, its space and time complexities are $O(mn)$ and $O(mnrIK)$, respectively, where $I$ is the number of LADMPSAP iterations and $K$ is the number of MM iterations. Both grow linearly with the matrix size, and may be computationally expensive on large data sets. Besides, the $\ell_1$-loss can still be sensitive to outliers.

Proposed Algorithm

In this paper, we replace the $\ell_1$-loss by a general nonconvex loss. We first introduce the nonconvex RMF formulation. The resultant optimization problem is more difficult to solve, and existing RMF solvers cannot be used as they rely crucially on the $\ell_1$-norm. Instead, we use the more flexible MM technique. To handle nonconvexity, we then propose a new surrogate, based on relaxation back to $\ell_1$ and variable decoupling. Instead of using LADMPSAP

as in RMF-MM, we propose a more time- and space-efficient procedure that runs the accelerated proximal gradient (APG) algorithm on the surrogate's dual. In particular, the underlying gradient computations can efficiently utilize data sparsity. The complete algorithm and its convergence analysis are then presented.

Nonconvex Loss

With a nonconvex loss, problem (3) is changed to:

$\min_{U,V} \dot{H}(U, V) \equiv \sum_{i=1}^m \sum_{j=1}^n W_{ij} \, \phi(|M_{ij} - [UV^\top]_{ij}|) + \frac{\lambda}{2} (\|U\|_F^2 + \|V\|_F^2)$, (7)

where $\phi$ is nonconvex. We assume the following on $\phi$:

Assumption 1. $\phi(\alpha)$ is concave, smooth and strictly increasing on $\alpha \ge 0$.

Obviously, the (convex) $\ell_1$ function also satisfies Assumption 1, and thus (7) includes (3). As shown in Table 1, Assumption 1 is satisfied by popular nonconvex functions including the Geman penalty (Geman and Yang 1995), Laplace penalty (Trzasko and Manduca 2009), the log-sum-penalty (LSP) (Candès, Wakin, and Boyd 2008), and slightly modified variants of the minimax concave penalty (MCP) (Zhang 2010) and smoothly-clipped-absolute-deviation (SCAD) penalty (Fan and Li 2001). Note that unlike previous papers (Gong et al. 2013; Zuo et al. 2013; Yao and Kwok 2016), we use these nonconvex functions as the loss, not as the regularizer.

Table 1: Example nonconvex regularizers ($\theta > 2$ for SCAD and $\theta > 0$ for the others is a constant). Here, $\delta > 0$ is a small constant to ensure that the $\phi$'s for MCP and SCAD are strictly increasing.
penalty | $\phi(|\alpha|)$
Geman penalty | $\frac{|\alpha|}{\theta + |\alpha|}$
Laplace penalty | $1 - \exp(-\frac{|\alpha|}{\theta})$
log-sum-penalty (LSP) | $\log(1 + \frac{|\alpha|}{\theta})$
minimax concave penalty (MCP) | $(1 + \delta)|\alpha| - \frac{\alpha^2}{2\theta}$ if $|\alpha| \le \theta$; $\frac{1}{2}\theta^2 + \delta|\alpha|$ if $|\alpha| > \theta$
smoothly clipped absolute deviation (SCAD) penalty | $(1 + \delta)|\alpha|$ if $|\alpha| \le 1$; $\frac{-\alpha^2 + 2\theta|\alpha| - 1}{2(\theta - 1)} + \delta|\alpha|$ if $1 < |\alpha| \le \theta$; $\frac{1 + \theta}{2} + \delta|\alpha|$ if $|\alpha| > \theta$

Note from (7) that when the $i$th row of $W$ is zero, the $i$th row of $U$ obtained is also zero because of the $\|U\|_F^2$ regularizer. Similarly, when the $j$th column of $W$ is zero, the corresponding row of $V$ is also zero.
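To avoid this trivial solution, we make the following Assumption, as in matrix completion (Candès and Recht 2009) and RMF-MM.

Assumption 2. $W$ has no zero row or column.

Constructing the Surrogate

The following Proposition first obtains a convex upper bound of $\phi$ using the Taylor expansion.

Proposition 0.2. $\phi(|\alpha|) \le \phi'(|\beta|) |\alpha| + (\phi(|\beta|) - \phi'(|\beta|) |\beta|)$ for any given $\beta \in \mathbb{R}$. Equality holds iff $\alpha = \pm\beta$.

Figure 1 shows the upper bounds for the nonconvex penalties in Table 1. Interestingly, the upper bound is simply a scaled $\ell_1$-loss (with scaling factor $\phi'(|\beta|)$) offset by $\phi(|\beta|) - \phi'(|\beta|) |\beta|$. As one can expect, recovery of the $\ell_1$-loss makes optimization easier. At iteration $k$, $\dot{H}$ in (7) can be written as:

$\dot{H}^k(\bar{U}, \bar{V}) \equiv \sum_{i=1}^m \sum_{j=1}^n W_{ij} \, \phi(|M_{ij} - [(U^k + \bar{U})(V^k + \bar{V})^\top]_{ij}|) + \frac{\lambda}{2} \|U^k + \bar{U}\|_F^2 + \frac{\lambda}{2} \|V^k + \bar{V}\|_F^2$,

where $(\bar{U}, \bar{V})$ is as defined in (4). Applying Proposition 0.2 to the first term of $\dot{H}^k$, we obtain the following.

Corollary 0.3. Let $A^k_{ij} = \phi'(|M_{ij} - [U^k (V^k)^\top]_{ij}|)$. Then

$\dot{H}^k(\bar{U}, \bar{V}) \le b^k + \frac{\lambda}{2} \|U^k + \bar{U}\|_F^2 + \frac{\lambda}{2} \|V^k + \bar{V}\|_F^2 + \|\dot{W}^k \odot (M - U^k (V^k)^\top - \bar{U} (V^k)^\top - U^k \bar{V}^\top - \bar{U} \bar{V}^\top)\|_1$,

where $b^k = \sum_{i=1}^m \sum_{j=1}^n W_{ij} \left( \phi(|M_{ij} - [U^k (V^k)^\top]_{ij}|) - A^k_{ij} |M_{ij} - [U^k (V^k)^\top]_{ij}| \right)$ and $\dot{W}^k = A^k \odot W$.

The product $\bar{U} \bar{V}^\top$ inside the $\ell_1$ term still couples $\bar{U}$ and $\bar{V}$ together. As $\dot{H}^k$ is similar to $H^k$ in (5), one may want to reuse Proposition 0.1. However, note that the elements in $A^k$ are positive due to Assumption 1 and thus $\dot{W}^k$ is a nonnegative matrix, while $W$ in (3) is a binary matrix, so Proposition 0.1 does not apply directly. Using the Cauchy-Schwarz inequality, the following Lemma decouples $\bar{U}$ and $\bar{V}$. This then allows the design of a convex surrogate $F^k$ that upper-bounds $\dot{H}^k$.

Lemma 0.4. Let $\mathrm{sum}(\dot{W}^k_{(i,:)}) = \sum_{j=1}^n \dot{W}^k_{ij}$ (resp. $\mathrm{sum}(\dot{W}^k_{(:,j)}) = \sum_{i=1}^m \dot{W}^k_{ij}$) be the sum of elements on row $i$ (resp. column $j$) of $\dot{W}^k$. Then,

$\|\dot{W}^k \odot (\bar{U} \bar{V}^\top)\|_1 \le \frac{1}{2} \|\Lambda^k_r \bar{U}\|_F^2 + \frac{1}{2} \|\Lambda^k_c \bar{V}\|_F^2$,

where $\Lambda^k_r = \mathrm{Diag}(\sqrt{\mathrm{sum}(\dot{W}^k_{(1,:)})}, \dots, \sqrt{\mathrm{sum}(\dot{W}^k_{(m,:)})})$ and $\Lambda^k_c = \mathrm{Diag}(\sqrt{\mathrm{sum}(\dot{W}^k_{(:,1)})}, \dots, \sqrt{\mathrm{sum}(\dot{W}^k_{(:,n)})})$. Equality holds iff $(\bar{U}, \bar{V}) = (0, 0)$.

Proposition 0.5. $\dot{H}^k(\bar{U}, \bar{V}) \le F^k(\bar{U}, \bar{V})$, where

$F^k(\bar{U}, \bar{V}) \equiv \|\dot{W}^k \odot (M - U^k (V^k)^\top - \bar{U} (V^k)^\top - U^k \bar{V}^\top)\|_1 + \frac{\lambda}{2} \|U^k + \bar{U}\|_F^2 + \frac{1}{2} \|\Lambda^k_r \bar{U}\|_F^2 + \frac{\lambda}{2} \|V^k + \bar{V}\|_F^2 + \frac{1}{2} \|\Lambda^k_c \bar{V}\|_F^2 + b^k$. (8)

Equality holds iff $(\bar{U}, \bar{V}) = (0, 0)$.

In the special case where the $\ell_1$-loss is used, $\dot{W}^k = W$, $b^k = 0$ for all $k$, and $\Lambda^k_r = \Lambda_r$, $\Lambda^k_c = \Lambda_c$. The surrogate in (8) then reduces to that in (6), and Proposition 0.5 becomes Proposition 0.1.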
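To make Proposition 0.2 concrete, the sketch below implements the three penalties of Table 1 whose formulas are unambiguous (Geman, Laplace and LSP) together with their derivatives, and checks numerically that the scaled-and-offset $\ell_1$ function never falls below $\phi$. The function names and the evaluation grid are illustrative assumptions; $\beta = 1$ and $\theta = 0.5$ follow the setting stated for Figure 1.

```python
import numpy as np

# Penalties from Table 1, written for a = |alpha| >= 0 (theta > 0).
def geman(a, theta):
    return a / (theta + a)

def laplace(a, theta):
    return 1.0 - np.exp(-a / theta)

def lsp(a, theta):
    return np.log1p(a / theta)

# Their derivatives with respect to a.
def geman_grad(a, theta):
    return theta / (theta + a) ** 2

def laplace_grad(a, theta):
    return np.exp(-a / theta) / theta

def lsp_grad(a, theta):
    return 1.0 / (theta + a)

def l1_upper_bound(phi, phi_grad, alpha, beta, theta):
    """Right-hand side of Proposition 0.2: a scaled l1 term plus an offset."""
    b = np.abs(beta)
    scale = phi_grad(b, theta)
    return scale * np.abs(alpha) + (phi(b, theta) - scale * b)

alpha = np.linspace(-5.0, 5.0, 1001)
beta, theta = 1.0, 0.5
max_violation = {}   # largest value of phi(|alpha|) - bound over the grid
gap_at_beta = {}     # bound minus phi at alpha = beta (should vanish)
for name, phi, grad in [("geman", geman, geman_grad),
                        ("laplace", laplace, laplace_grad),
                        ("lsp", lsp, lsp_grad)]:
    bound = l1_upper_bound(phi, grad, alpha, beta, theta)
    max_violation[name] = float(np.max(phi(np.abs(alpha), theta) - bound))
    gap_at_beta[name] = float(l1_upper_bound(phi, grad, beta, beta, theta)
                              - phi(abs(beta), theta))
```

The scale $\phi'(|\beta|)$ evaluated at each observed residual is exactly the weight $A^k_{ij}$ of Corollary 0.3, so large residuals (likely outliers) receive small weights in the next surrogate.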

Figure 1: Illustration of the upper bound in Proposition 0.2 for the various nonconvex penalties in Table 1 ($\beta = 1$; $\theta = 2.5$ for SCAD and $\theta = 0.5$ for the others; $\delta = 0.05$ for MCP and SCAD). Panels: (a) Geman. (b) Laplace. (c) LSP. (d) MCP (modified). (e) SCAD (modified).

It can be easily seen that $F^k$ qualifies as a good surrogate in the sense of properties (a)-(c): (a) $\dot{H}(\bar{U} + U^k, \bar{V} + V^k) \le F^k(\bar{U}, \bar{V})$; (b) $(0, 0) = \arg\min_{\bar{U}, \bar{V}} \left( F^k(\bar{U}, \bar{V}) - \dot{H}^k(\bar{U}, \bar{V}) \right)$ and $F^k(0, 0) = \dot{H}^k(0, 0)$; and (c) $F^k$ is jointly convex in $\bar{U}, \bar{V}$.

Optimizing the Surrogate

Optimizing using LADMPSAP

LADMPSAP, which has been used in RMF-MM, can also be used to optimize (8). (Because of the lack of space, the detailed update equations will not be included here.) However, LADMPSAP cannot utilize possible sparsity of $W$. Moreover, LADMPSAP converges at a rate of $O(1/T)$ (Lin, Liu, and Li 2015), which is slow.

Optimizing using APG

We first transform the surrogate in (8) to the dual, which then has only $\mathrm{nnz}(W)$ variables (instead of $O(mn)$). Let $\Omega \equiv \{(i_1, j_1), \dots, (i_{\mathrm{nnz}(W)}, j_{\mathrm{nnz}(W)})\}$ be the set containing indices to the nonzero elements in $W$. Let $\mathcal{H}(\cdot)$ be the linear operator which maps a $\mathrm{nnz}(W)$-dimensional vector $x$ to a sparse matrix $X \in \mathbb{R}^{m \times n}$ with nonzero positions indicated by $\Omega$, i.e., $X_{i_t j_t} = x_t$ where $(i_t, j_t)$ is the $t$th element in $\Omega$. Let $\mathcal{H}^{-1}(\cdot)$ be the inverse operator of $\mathcal{H}$, which extracts a $\mathrm{nnz}(W)$-dimensional vector $x$ from $X$ with $x_t = X_{i_t j_t}$.

Proposition 0.6. Let $\dot{w}^k = \mathcal{H}^{-1}(\dot{W}^k)$. The dual problem of $\min_{\bar{U}, \bar{V}} F^k(\bar{U}, \bar{V})$ is

$\min_{x \in \mathcal{W}^k} D^k(x) \equiv \frac{1}{2} \mathrm{tr}\left( (\mathcal{H}(x) V^k - \lambda U^k)^\top A^k_r (\mathcal{H}(x) V^k - \lambda U^k) \right) - \mathrm{tr}(\mathcal{H}(x)^\top M) + \frac{1}{2} \mathrm{tr}\left( (\mathcal{H}(x)^\top U^k - \lambda V^k)^\top A^k_c (\mathcal{H}(x)^\top U^k - \lambda V^k) \right)$, (9)

where $\mathcal{W}^k \equiv \{x \in \mathbb{R}^{\mathrm{nnz}(W)} : |x_i| \le [\dot{w}^k]_i^{-1}\}$, $A^k_r = (\lambda I + (\Lambda^k_r)^2)^{-1}$, and $A^k_c = (\lambda I + (\Lambda^k_c)^2)^{-1}$. From the obtained $x$, the primal $(\bar{U}, \bar{V})$ solution can be recovered as $\bar{U} = A^k_r (\mathcal{H}(x) V^k - \lambda U^k)$ and $\bar{V} = A^k_c (\mathcal{H}(x)^\top U^k - \lambda V^k)$.

Problem (9) can be solved by the APG algorithm, which has a convergence rate of $O(1/T^2)$ (Beck 2009; Nesterov 2013) and is faster than LADMPSAP.
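The quantities appearing in Proposition 0.6 can all be built from the observed entries alone. The sketch below is one possible NumPy realization, assuming the LSP loss, an index-array representation of $\Omega$ (`rows`, `cols`), and the reading $A^k_r = (\lambda I + (\Lambda^k_r)^2)^{-1}$ with $(\Lambda^k_r)^2$ holding the row sums of $\dot{W}^k$. All function and variable names are hypothetical.

```python
import numpy as np

def lsp_grad(a, theta):
    """Derivative of the LSP loss log(1 + a / theta) for a >= 0."""
    return 1.0 / (theta + a)

def sparse_times(x, rows, cols, B, out_rows):
    """Compute H(x) @ B without forming H(x); costs O(nnz * r)."""
    out = np.zeros((out_rows, B.shape[1]))
    np.add.at(out, rows, x[:, None] * B[cols])
    return out

def dual_quantities(m_obs, rows, cols, U, V, lam, theta):
    """Weights w = H^{-1}(W_dot^k) and the diagonals of A_r^k and A_c^k."""
    resid = m_obs - np.einsum("tq,tq->t", U[rows], V[cols])  # observed residuals
    w = lsp_grad(np.abs(resid), theta)                       # A^k on Omega
    a_r = 1.0 / (lam + np.bincount(rows, weights=w, minlength=U.shape[0]))
    a_c = 1.0 / (lam + np.bincount(cols, weights=w, minlength=V.shape[0]))
    return w, a_r, a_c

def recover_primal(x, rows, cols, U, V, lam, a_r, a_c):
    """Primal increments of Proposition 0.6 from a dual vector x."""
    dU = a_r[:, None] * (sparse_times(x, rows, cols, V, U.shape[0]) - lam * U)
    # H(x)^T has its nonzeros at (j_t, i_t), so rows and cols swap roles.
    dV = a_c[:, None] * (sparse_times(x, cols, rows, U, V.shape[0]) - lam * V)
    return dU, dV

# Small random instance (hypothetical sizes).
rng = np.random.default_rng(0)
m, n, r, lam, theta = 6, 5, 2, 0.1, 0.5
rows, cols = np.nonzero(rng.random((m, n)) < 0.5)
M = rng.standard_normal((m, n))
U, V = rng.standard_normal((m, r)), rng.standard_normal((n, r))
w, a_r, a_c = dual_quantities(M[rows, cols], rows, cols, U, V, lam, theta)
x = rng.standard_normal(rows.size)
dU, dV = recover_primal(x, rows, cols, U, V, lam, a_r, a_c)
```

Because $A^k_r$ and $A^k_c$ are diagonal, storing only their diagonals keeps the memory footprint at $O(\mathrm{nnz}(W) + (m + n) r)$.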
In each iteration, one needs to compute the gradient $\nabla D^k$ of the objective:

$\nabla D^k(x) = \mathcal{H}^{-1}\left( A^k_r (\mathcal{H}(x) V^k - \lambda U^k)(V^k)^\top \right) + \mathcal{H}^{-1}\left( U^k [(U^k)^\top \mathcal{H}(x) - \lambda (V^k)^\top] A^k_c \right) - \mathcal{H}^{-1}(M)$. (10)

The following Proposition shows that the proximal step has a closed-form solution. As $x$ is $\mathrm{nnz}(W)$-dimensional, computing the proximal step takes only $O(\mathrm{nnz}(W))$ time.

Proposition 0.7. Let $x^* = \arg\min_{x \in \mathcal{W}^k} \frac{1}{2} \|x - z\|_F^2$ for a given $z$. Then, $x^*_i = \mathrm{sign}(z_i) \min(|z_i|, (\dot{w}^k_i)^{-1})$.

Exploiting Sparsity

A direct implementation of APG takes $O(mnr)$ time and $O(mn)$ space. In the following, we show how these can be reduced by exploiting sparsity. Objective (9) involves $A^k_r$, $A^k_c$ and $\mathcal{W}^k$, which are all related to $\dot{W}^k$. From its definition in Corollary 0.3, $\dot{W}^k$ is sparse (as $W$ is sparse). Thus, by exploiting sparsity, constructing $A^k_r$, $A^k_c$ and $\mathcal{W}^k$ only takes $O(\mathrm{nnz}(W))$ time and space.

In each APG iteration, one has to compute the objective, gradient, and proximal step. First, consider computing the gradient in (10). The first term can be rewritten as $\hat{g}^k = \mathcal{H}^{-1}(Q^k (V^k)^\top)$, where $Q^k = A^k_r (\mathcal{H}(x) V^k - \lambda U^k)$. As $A^k_r$ is diagonal and $\mathcal{H}(x)$ is sparse, $Q^k$ can be computed as $A^k_r (\mathcal{H}(x) V^k) - \lambda (A^k_r U^k)$ in $O(\mathrm{nnz}(W) r + mr)$ time, where $r$ is the number of columns in $U^k$ and $V^k$. Let the $t$th element in $\Omega$ be $(i_t, j_t)$. By the definition of $\mathcal{H}^{-1}(\cdot)$, we have $\hat{g}^k_t = \sum_{q=1}^r Q^k_{i_t q} V^k_{j_t q}$, and this takes $O(\mathrm{nnz}(W) r + mr)$

time. Similarly, computing the second term in (10) takes $O(\mathrm{nnz}(W) r + nr)$ time. Hence, computing the gradient $\nabla D^k(\cdot)$ using Algorithm 1 takes a total of $O(\mathrm{nnz}(W) r + (m + n) r)$ time and $O(\mathrm{nnz}(W) + (m + n) r)$ space.

Algorithm 1 Computing $\nabla D^k(x)$ by exploiting sparsity.
1: set $X_{i_t j_t} = x_t$ for all $(i_t, j_t) \in \Omega$; // i.e., $X = \mathcal{H}(x)$
2: $Q^k = A^k_r (X V^k) - \lambda (A^k_r U^k)$;
3: construct the $\mathrm{nnz}(W)$-dimensional vector $\hat{g}^k$ with $\hat{g}^k_t = \sum_{q=1}^r Q^k_{i_t q} V^k_{j_t q}$;
4: $P^k = A^k_c (X^\top U^k) - \lambda (A^k_c V^k)$;
5: construct the $\mathrm{nnz}(W)$-dimensional vector $\breve{g}^k$ with $\breve{g}^k_t = \sum_{q=1}^r U^k_{i_t q} P^k_{j_t q}$; // i.e., $\breve{g}^k = \mathcal{H}^{-1}(U^k (P^k)^\top)$
6: return $\hat{g}^k + \breve{g}^k - \mathcal{H}^{-1}(M)$.

Next, as $x \in \mathbb{R}^{\mathrm{nnz}(W)}$, the proximal step in Proposition 0.7 takes $O(\mathrm{nnz}(W))$ time and space. Similar to Algorithm 1, the objective can be obtained in $O(\mathrm{nnz}(W) r + (m + n) r)$ time and $O(\mathrm{nnz}(W) + (m + n) r)$ space. Thus, by exploiting sparsity, the APG algorithm takes $O(\mathrm{nnz}(W) + (m + n) r)$ space, which is much smaller than the $O(mn)$ space required by LADMPSAP. The per-iteration time complexity of APG is $O(\mathrm{nnz}(W) r + (m + n) r)$, which is again much smaller than the $O(mnr)$ time each LADMPSAP iteration takes.

Complete Algorithm

The complete procedure, which will be called the Robust Matrix Factorization with Nonconvex Loss (RMFNL) algorithm, is shown in Algorithm 2. The surrogate is optimized via its dual in step 4. The primal solution is recovered in step 5, and $(U^k, V^k)$ are updated in step 6.

Algorithm 2 Robust matrix factorization using nonconvex loss (RMFNL) algorithm.
1: initialize $U^1 \in \mathbb{R}^{m \times r}$ and $V^1 \in \mathbb{R}^{n \times r}$;
2: for $k = 1, 2, \dots, K$ do
3: compute $\dot{W}^k$ in Corollary 0.3 (only on the observed positions), and $\Lambda^k_r$, $\Lambda^k_c$ in Lemma 0.4;
4: compute $x^k = \arg\min_{x \in \mathcal{W}^k} D^k(x)$ in Proposition 0.6 using APG (use Algorithm 1 to compute $\nabla D^k(x)$);
5: compute $\bar{U}^k$ and $\bar{V}^k$ in Proposition 0.6;
6: $U^{k+1} = U^k + \bar{U}^k$, $V^{k+1} = V^k + \bar{V}^k$;
7: end for
8: return $U^{K+1}$ and $V^{K+1}$.

Convergence Analysis

In this section, we prove convergence of the proposed RMFNL algorithm.
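As an aside on Algorithm 1 and Proposition 0.7 above, the following NumPy sketch computes the dual gradient using only the observed positions and projects onto a box constraint. It follows the six steps of Algorithm 1 as stated; the index-array representation of $\Omega$, the function names, and the random test instance are assumptions for illustration. The `bound` argument of `project_box` would be $(\dot{w}^k_i)^{-1}$ under Proposition 0.7.

```python
import numpy as np

def sparse_times(x, rows, cols, B, out_rows):
    """Compute H(x) @ B without forming H(x); costs O(nnz * r)."""
    out = np.zeros((out_rows, B.shape[1]))
    np.add.at(out, rows, x[:, None] * B[cols])
    return out

def dual_gradient(x, rows, cols, m_obs, U, V, lam, a_r, a_c):
    """Algorithm 1; a_r and a_c hold the diagonals of A_r^k and A_c^k."""
    # Step 2: Q = A_r (X V) - lam (A_r U), with X = H(x) kept implicit.
    Q = a_r[:, None] * (sparse_times(x, rows, cols, V, U.shape[0]) - lam * U)
    # Step 3: g_hat_t = sum_q Q[i_t, q] * V[j_t, q].
    g_hat = np.einsum("tq,tq->t", Q[rows], V[cols])
    # Step 4: P = A_c (X^T U) - lam (A_c V).
    P = a_c[:, None] * (sparse_times(x, cols, rows, U, V.shape[0]) - lam * V)
    # Step 5: g_brev_t = sum_q U[i_t, q] * P[j_t, q].
    g_brev = np.einsum("tq,tq->t", U[rows], P[cols])
    # Step 6: subtract H^{-1}(M), i.e. the observed entries of M.
    return g_hat + g_brev - m_obs

def project_box(z, bound):
    """Euclidean projection onto {x : |x_i| <= bound_i} (Proposition 0.7)."""
    return np.sign(z) * np.minimum(np.abs(z), bound)

# Small random instance (hypothetical sizes and diagonal scalings).
rng = np.random.default_rng(1)
m, n, r, lam = 7, 5, 3, 0.2
rows, cols = np.nonzero(rng.random((m, n)) < 0.4)
M = rng.standard_normal((m, n))
U, V = rng.standard_normal((m, r)), rng.standard_normal((n, r))
a_r, a_c = rng.random(m) + 0.1, rng.random(n) + 0.1
x = rng.standard_normal(rows.size)
grad = dual_gradient(x, rows, cols, M[rows, cols], U, V, lam, a_r, a_c)
projected = project_box(np.array([3.0, -0.2, -5.0]), np.array([1.0, 1.0, 2.0]))
```

Every step touches either the $\mathrm{nnz}(W)$ observed positions or the $(m + n) r$ factor entries, which is where the $O(\mathrm{nnz}(W) r + (m + n) r)$ cost per gradient comes from.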
As $\phi$ in (7) is nonconvex, Proposition 1 in (Lin, Xu, and Zha 2017) cannot be extended to our surrogate $F^k$. The RMF-MM iterations guarantee a sufficient decrease on the objective (Lin, Xu, and Zha 2017) in the following sense:

Definition 0.1 ((Nesterov 1998)). The sequence $\{X^k\}$ has a sufficient decrease on $f$ if there exists a constant $\gamma > 0$ such that $f(X^k) - f(X^{k+1}) \ge \gamma \|X^k - X^{k+1}\|_F^2$, $\forall k$.

The following Proposition shows that RMFNL also achieves a sufficient decrease on its objective. Moreover, the generated sequence $\{(U^k, V^k)\}$ is bounded, and thus has at least one limit point. The Proposition is similar to Theorem 1 in (Lin, Xu, and Zha 2017). However, we use a different surrogate, and $\Lambda^k_r$, $\Lambda^k_c$ are not fixed but change with $k$. Hence, their proof cannot be directly extended.

Proposition 0.8. For Algorithm 2,
(i) $\{(U^k, V^k)\}$ is bounded;
(ii) $\{(U^k, V^k)\}$ has a sufficient decrease on $\dot{H}$, i.e., $\dot{H}(U^k, V^k) - \dot{H}(U^{k+1}, V^{k+1}) \ge \gamma \|U^{k+1} - U^k\|_F^2 + \gamma \|V^{k+1} - V^k\|_F^2$, where $\gamma > 0$ is a constant;
(iii) $\lim_{k \to \infty} (U^{k+1} - U^k) = 0$ and $\lim_{k \to \infty} (V^{k+1} - V^k) = 0$.

In the following, we show that limit points from Algorithm 2 are also critical points of (7). To prove convergence, the subgradient is used in (Lin, Xu, and Zha 2017). Here, as $\phi$ is nonconvex, we use the Clarke subdifferential (Clarke 1990), which generalizes subgradients to nonconvex functions. The following Proposition connects the subgradient of the surrogate $F^k$ to the Clarke subdifferential of $\dot{H}$.

Proposition 0.9. (i) $\partial F^k(0, 0) = \partial^\circ \dot{H}^k(0, 0)$; (ii) If $0 \in \partial^\circ \dot{H}^k(0, 0)$, then $(U^k, V^k)$ is a critical point of (7).

Theorem. The limit points of the sequence generated by Algorithm 2 are also critical points of (7).

RMF-MM is the only existing RMF solver that guarantees convergence to critical points. While our (7) is more difficult than their (3) (due to the nonconvex loss), the proposed RMFNL still shares the same guarantee as RMF-MM.
Moreover, as (7) is neither convex nor smooth, convergence to critical points is the best one can obtain (Nesterov 1998).

Experiments

In this section, we compare the proposed RMFNL with state-of-the-art MF algorithms on both synthetic and real-world data sets. Experiments are performed on a PC with an Intel i7 CPU and 32GB RAM. All the codes are in Matlab, with sparse matrix operations implemented in C++.

Synthetic Data

We first perform experiments with synthetic data. The synthetic data is generated as $X = UV^\top$, where $U \in \mathbb{R}^{m \times 5}$, $V \in \mathbb{R}^{m \times 5}$, and $m \in \{250, 500, 1000\}$. Elements of $U$ and $V$ are sampled i.i.d. from the normal distribution $\mathcal{N}(0, 1)$. This is then corrupted to produce $M = X + N + S$, where $N$ is the noise matrix from $\mathcal{N}(0, 0.1)$, and $S$ is a sparse matrix modeling outliers with 5% nonzero elements randomly sampled from $\{\pm 5\}$. We randomly draw a fraction $10 \log(m)/m$ of the elements from $M$ as observations, with half of them for training and the other half for validation. The remaining unobserved elements are for testing. Note that the larger $m$ is, the sparser the observed matrix.

We compare RMF-MM (Lin, Xu, and Zha 2017), which uses the $\ell_1$-loss, with the proposed RMFNL (Algorithm 2),

using the nonconvex log-sum-penalty (LSP) (Candès, Wakin, and Boyd 2008) as loss. As observed in (Gong et al. 2013; Yao, Kwok, and Zhong 2015; Lu et al. 2016; Yao and Kwok 2016), other nonconvex functions in Table 1 usually have similar behavior. We compare three different optimizers for RMFNL: (a) RMFNL(LADMPSAP), which optimizes the surrogate with LADMPSAP; (b) RMFNL(APG-dense), which uses APG but does not utilize data sparsity; and (c) RMFNL(APG) in Algorithm 2, which utilizes data sparsity. We initialize the iterates $(U^1, V^1)$ by Gaussian random matrices. The matrix rank $r$ is set to the ground-truth (i.e., 5). Optimization is stopped when the relative change in objective values falls below a small threshold.

Table 2: Performance on the synthetic data set (nnz is the percentage of nonzero elements). RMSE is scaled by $10^{-1}$ and CPU time is in seconds. The best and comparable results according to the pairwise t-test with 95% confidence are highlighted.
loss | algorithm | m = 250 (nnz: 11.04%), m = 500 (nnz: 6.21%), m = 1000 (nnz: 3.45%): RMSE and CPU time for each
$\ell_1$ | RMF-MM | 1.77± ± ± ± ± ±130.9
LSP | RMFNL(LADMPSAP) | 1.09± ± ± ± ± ±138.8
LSP | RMFNL(APG-dense) | 1.10± ± ± ± ± ±91.9
LSP | RMFNL(APG) | 1.09± ± ± ± ± ±3.2

Table 3: Performance on the recommender system data sets. The RMSE is scaled by $10^{-1}$; CPU time is in seconds for MovieLens and minutes for netflix and yahoo. The best and comparable results according to the pairwise t-test with 95% confidence are highlighted.
loss | algorithm | MovieLens-100K, MovieLens-10M, netflix, yahoo: RMSE and CPU time for each
$\ell_2$ | AltGrad | 10.40± ± ± ±8.9
$\ell_2$ | RP | 11.53± ± ± ± ± ± ± ±3.1
$\ell_2$ | ScaledASD | 10.73± ± ± ± ± ± ± ±9.4
$\ell_1$ | RMC | 9.14± ± ± ±3.9
$\ell_1$ | RMF-MM | 9.09± ±116.0
LSP | RMFNL | 8.91± ± ± ± ± ± ± ±6.8

For performance evaluation, we follow (Lin, Xu, and Zha 2017) and use (i) the root mean square error $\mathrm{RMSE} = \sqrt{\|\bar{W} \odot (X - \bar{U} \bar{V}^\top)\|_F^2 / \mathrm{nnz}(\bar{W})}$, where the testing elements are indicated by the binary matrix $\bar{W}$; and (ii) CPU time.
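The synthetic setup and the RMSE measure described above can be sketched in NumPy as follows. This is only an approximation of the protocol under stated assumptions: $\mathcal{N}(0, 0.1)$ is read as noise with standard deviation 0.1 (the text does not say whether 0.1 is a variance), entries are selected by independent coin flips rather than exact counts, and the natural logarithm is used in $10 \log(m)/m$ (which reproduces the 11.04% training density reported for $m = 250$). Function names are hypothetical.

```python
import numpy as np

def make_synthetic(m, rank=5, noise_std=0.1, outlier_frac=0.05, seed=0):
    """Generate X = U V^T, its corrupted version M, and train/val/test masks."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((m, rank))
    V = rng.standard_normal((m, rank))
    X = U @ V.T                                      # clean low-rank matrix
    N = noise_std * rng.standard_normal((m, m))      # dense Gaussian noise
    S = np.zeros((m, m))                             # sparse outliers in {-5, +5}
    outliers = rng.random((m, m)) < outlier_frac
    S[outliers] = rng.choice([-5.0, 5.0], size=int(outliers.sum()))
    M = X + N + S
    observed = rng.random((m, m)) < 10.0 * np.log(m) / m
    in_train = rng.random((m, m)) < 0.5              # split observations in half
    train = observed & in_train
    val = observed & ~in_train
    test = ~observed                                 # unobserved entries
    return X, M, train, val, test

def rmse(X, U_hat, V_hat, test_mask):
    """Root mean square error of U_hat V_hat^T against X on the test entries."""
    err = (X - U_hat @ V_hat.T)[test_mask]
    return float(np.sqrt(np.mean(err ** 2)))

X, M, train, val, test = make_synthetic(250)
train_density = float(train.mean())
```

Note that the error is measured against the clean matrix $X$ rather than the corrupted $M$, so a method that fits the outliers is penalized.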
To reduce statistical variability, results are averaged over five repetitions.

Table 2 shows the results. As can be seen, all RMFNL variants have lower RMSE than RMF-MM, as LSP is more robust than $\ell_1$. As for time, RMFNL(APG-dense) is faster than RMFNL(LADMPSAP), as APG has a faster convergence rate than LADMPSAP. By further utilizing data sparsity, RMFNL(APG) is the fastest. Figure 2 shows that the speedups of RMFNL(APG) over RMFNL(LADMPSAP) and RMFNL(APG-dense) increase with matrix size ($m$). Recall that the larger $m$ is, the sparser the matrix. Hence, this agrees with Table 2, which shows that the time complexity of RMFNL(APG) only depends on the number of nonzero elements, but not on the matrix size.

Figure 3 shows the convergence. When measured w.r.t. the number of iterations, all RMFNL variants and RMF-MM have the same behavior. However, when measured w.r.t. time, RMFNL(APG) is the fastest.

Recommender Systems

In this section, we perform experiments on some popular recommendation system data sets (Table 4). Following (Yao and Kwok 2016), we use 50% of the observed ratings for training, 25% for validation, and the rest for testing. We randomly corrupt 5% of the training ratings by adding ±5 with equal probability.

Table 4: Recommendation data sets used in the experiment. Here, nnz is the percentage of nonzero elements.
data set | #users, #movies, #ratings, nnz
MovieLens-100K | 943 1, , %
MovieLens-10M | 69,878 10,677 10,000, %
netflix | 480,189 17, ,480, %
yahoo | 249, ,111 62,551, %

The proposed RMFNL, using LSP as loss, is compared with the following state-of-the-art algorithms:
1. Matrix factorization algorithms using the square loss, including alternating gradient descent (AltGrad) (Mnih and Salakhutdinov 2008), Riemannian preconditioning (RP) (Mishra and Sepulchre 2016), and scaled alternating steepest descent (ScaledASD) (Tanner and Wei 2016);
2. RMF algorithms, including RMF-MM and robust matrix completion (RMC) (Cambier and Absil 2016).
All hyperparameters are tuned using the validation set. To reduce statistical variability, results are averaged over five repetitions. For performance evaluation, we use (i) the testing RMSE; and (ii) CPU time.

RMF-MM cannot converge in $10^4$ seconds on MovieLens-10M, and is thus not reported there (nor on the larger netflix and yahoo data sets). AltGrad is not compared on netflix and yahoo, as it is inferior to ScaledASD on MovieLens. RMC runs out of memory on netflix and yahoo. Table 3 shows the performance. As can be seen, RMFNL

consistently achieves the lowest RMSE, while the $\ell_2$-based methods (AltGrad, RP and ScaledASD) have the highest RMSE. Thus, the nonconvex LSP loss is the most robust while the $\ell_2$-loss is the most susceptible to outliers. RMC, which approximates the $\ell_1$-loss, has testing RMSE slightly higher than RMF-MM. Figures 4(a) and 4(b) show convergence of the testing RMSE on MovieLens. As can be seen, RMF-MM is the slowest on MovieLens-100K, as it cannot utilize data sparsity. AltGrad, RP and ScaledASD are fast, but have very high RMSEs. The speeds of RMFNL and RMC are comparable. Figures 4(c) and 4(d) show convergence on netflix and yahoo, and the observations are similar.

Figure 2: Speedup of RMFNL(APG) over the other two RMFNL variants (LADMPSAP and APG-dense).

Figure 3: Convergence on the synthetic data set. Top: RMSE vs number of iterations (note that RMFNL(APG-dense) and RMFNL(APG) overlap with each other); Bottom: RMSE vs CPU time. Panels: (a) m = 250. (b) m = 500. (c) m = 1000.

Table 5: Performance on the data subsets extracted from the Dinosaur sequence. CPU time is in seconds. The best and comparable results according to the pairwise t-test with 95% confidence are highlighted.
loss | algorithm | D1, D2, D3: $\ell_1$-loss and CPU time for each
$\ell_1$ | RMF-MM(heuristic) | 0.374± ± ± ± ± ±3.4
$\ell_1$ | RMF-MM | 0.442± ± ± ± ± ±2.1
LSP | RMFNL | 0.323± ± ± ± ± ±1.0

Affine Rigid Structure-from-Motion (SfM)

In this section, we perform experiments on affine rigid structure-from-motion (SfM) (Koenderink and Van Doorn 1991). This involves reconstructing the 3D scene from sparse feature points tracked in $m$ images of a moving camera. Each feature point is projected to every image plane, and is thus represented by a $2m$-dimensional vector. With $n$ feature points, this leads to a $2m \times n$ matrix. Often, this matrix has missing data (e.g., when feature points are not visible throughout the whole motion sequence) and outliers (which arise from feature mismatch).
We use the Oxford Dinosaur sequence, which has 36 images and 4,983 feature points. As in (Lin, Xu, and Zha 2017), we extract three data subsets using feature points observed in at least 5, 6 and 7 images (Table 6). The fully observed data matrix can be recovered by rank-4 matrix factorization (Eriksson and Van Den Hengel 2010; Meng et al. 2013; Zheng et al. 2012; Cabral et al. 2013; Lin, Xu, and Zha 2017). We compare RMFNL with RMF-MM, and an RMF-MM variant (denoted RMF-MM(heuristic)) described in Section 4.2 of (Lin, Xu, and Zha 2017). In this variant, the diagonal entries of $\Lambda_r$ and $\Lambda_c$ are initialized with small values and then gradually increased until they reach the upper bounds in (6). It is claimed that this enables RMF-MM to converge faster (Lin, Xu, and Zha 2017). However, our

experimental results show that this heuristic leads to more accurate, but not faster, results. Moreover, the key pitfall is that Proposition 0.1 and the convergence guarantee for RMF-MM no longer hold.

Figure 4: Testing RMSE vs CPU time (seconds) on recommendation data sets. RMF-MM is too slow to run on larger data sets. Panels: (a) MovieLens-100K. (b) MovieLens-10M. (c) netflix. (d) yahoo.

Table 6: Data sets extracted from the Dinosaur sequence. Here, min occurrence is the minimum number of images that all feature points in that data subset have to appear in. Columns: data subset (D1, D2, D3), min occurrence, matrix size, sparsity (%).

For performance evaluation, as there is no ground-truth, we follow (Lin, Xu, and Zha 2017) and use (i) the $\ell_1$-loss on reconstructing the corrupted matrix, $\|W \odot (UV^\top - M)\|_1 / \mathrm{nnz}(W)$, where $U$ and $V$ are output from the algorithm, and $M$ is the data matrix with the observed positions indicated by the binary matrix $W$; and (ii) CPU time.

Results are shown in Table 5. As can be seen, RMF-MM(heuristic) obtains a lower $\ell_1$-loss than RMF-MM, but is still outperformed by RMFNL. In terms of time, RMFNL is the fastest as it can utilize data sparsity, though the speedup is not as significant as in previous sections. This is because the extracted Dinosaur data subsets are not very sparse, as can be seen from Table 6.

Conclusion

In this paper, we addressed two problems in robust matrix factorization (RMF). First, to enhance robustness, we proposed the use of a nonconvex loss in place of the commonly used (convex) $\ell_1$- and $\ell_2$-losses. Second, we improved scalability by exploiting data sparsity (which RMF-MM cannot) and using the accelerated proximal gradient algorithm (which is faster than the commonly used ADMM). Theoretical analysis shows that the proposed RMFNL algorithm generates a decreasing sequence and converges to a critical point. Extensive experiments on both synthetic and real-world data sets demonstrate that RMFNL is more accurate and more scalable than the state-of-the-art.
References

Basri, R.; Jacobs, D.; and Kemelmacher, I. Photometric stereo with general, unknown lighting. International Journal of Computer Vision 72(3):
Beck, A., and Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences 2(1):
Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; and Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning 3(1):
Cabral, R.; De la Torre, F.; Costeira, J.; and Bernardino, A. Unifying nuclear norm and bilinear factorization approaches for low-rank matrix decomposition. In International Conference on Computer Vision,
Cambier, L., and Absil, P. Robust low-rank matrix completion by Riemannian optimization. SIAM Journal on Scientific Computing 38(5):S440-S460.
Candès, E., and Recht, B. Exact matrix completion via convex optimization. Foundations of Computational Mathematics 9(6):
Candès, E.; Wakin, M.; and Boyd, S. Enhancing sparsity by reweighted l 1 minimization. Journal of Fourier Analysis and Applications 14(5-6):
Clarke, F. Optimization and nonsmooth analysis. SIAM.
Eriksson, A., and Van Den Hengel, A. Efficient computation of robust low-rank matrix approximations in the presence of missing data using the l 1 -norm. In Computer Vision and Pattern Recognition,
Fan, J., and Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96(456):
Geman, D., and Yang, C. Nonlinear image recovery with half-quadratic regularization. IEEE Transactions on Image Processing 4(7):
Gong, P.; Zhang, C.; Lu, Z.; Huang, J.; and Ye, J. A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. In International Conference on Machine Learning,
Gu, S.; Zhang, L.; Zuo, W.; and Feng, X. Weighted nuclear norm minimization with application to image denoising. In Computer Vision and Pattern Recognition,
Hunter, D., and Lange, K. A tutorial on MM algorithms. The American Statistician 58(1):
Ji, H.; Liu, C.; Shen, Z.; and Xu, Y. Robust video denoising using low rank matrix completion. In Computer Vision and Pattern Recognition,
Kim, E.; Lee, M.; Choi, C.; Kwak, N.; and Oh, S. Efficient l 1 -norm-based low-rank matrix approximations for large-scale problems using alternating rectified gradient method. IEEE Transactions on Neural Networks and Learning Systems 26(2):
Koenderink, J., and Van Doorn, A. Affine structure from motion. Journal of the Optical Society of America 8(2):
Koren, Y. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,
Lange, K.; Hunter, R.; and Yang, I. Optimization transfer using surrogate objective functions. Journal of Computational and Graphical Statistics 9(1):1-20.
Lin, Z.; Liu, R.; and Li, H. Linearized alternating direction method with parallel splitting and adaptive penalty for separable convex programs in machine learning. Machine Learning 2(99):
Lin, Z.; Xu, C.; and Zha, H. Robust matrix factorization by majorization minimization. IEEE Transactions on Pattern Analysis and Machine Intelligence (99).
Lu, C.; Tang, J.; Yan, S.; and Lin, Z. Nonconvex nonsmooth low rank minimization via iteratively reweighted nuclear norm. IEEE Transactions on Image Processing 25(2):
Mairal, J. Optimization with first-order surrogate functions. In International Conference on Machine Learning,
Meng, D.; Xu, Z.; Zhang, L.; and Zhao, J. A cyclic weighted median method for l 1 low-rank matrix factorization with missing entries. In AAAI Conference on Artificial Intelligence,
Mishra, B., and Sepulchre, R. Riemannian preconditioning. SIAM Journal on Optimization 26(1):
Mnih, A., and Salakhutdinov, R. Probabilistic matrix factorization. In Advances in Neural Information Processing Systems,
Nesterov, Y. Introductory lectures on convex optimization: A basic course. Springer.
Nesterov, Y. Gradient methods for minimizing composite functions. Mathematical Programming 140(1):
Tanner, J., and Wei, K. Low rank matrix completion by alternating steepest descent methods. Applied and Computational Harmonic Analysis 40(2):
Trzasko, J., and Manduca, A. Highly undersampled magnetic resonance image reconstruction via homotopic l 0 -minimization. IEEE Transactions on Medical Imaging 28(1):
Yang, J., and Leskovec, J. Overlapping community detection at scale: a nonnegative matrix factorization approach. In International Conference on Web Search and Data Mining,
Yao, Q., and Kwok, J. Efficient learning with a family of nonconvex regularizers by redistributing nonconvexity. In International Conference on Machine Learning,
Yao, Q.; Kwok, J.; and Zhong, W. Fast low-rank matrix learning with nonconvex regularization. In International Conference on Data Mining,
Zhang, C. Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics 38(2):
Zheng, Y.; Liu, G.; Sugimoto, S.; Yan, S.; and Okutomi, M. Practical low-rank matrix approximation under robust l 1 -norm. In Computer Vision and Pattern Recognition,
Zuo, W.; Meng, D.; Zhang, L.; Feng, X.; and Zhang, D. A generalized iterated shrinkage algorithm for nonconvex sparse coding. In International Conference on Computer Vision,


More information

On the interior of the simplex, we have the Hessian of d(x), Hd(x) is diagonal with ith. µd(w) + w T c. minimize. subject to w T 1 = 1,

On the interior of the simplex, we have the Hessian of d(x), Hd(x) is diagonal with ith. µd(w) + w T c. minimize. subject to w T 1 = 1, Math 30 Winter 05 Solution to Homework 3. Recognizing the convexity of g(x) := x log x, from Jensen s inequality we get d(x) n x + + x n n log x + + x n n where the equality is attained only at x = (/n,...,

More information

Sparse and low-rank decomposition for big data systems via smoothed Riemannian optimization

Sparse and low-rank decomposition for big data systems via smoothed Riemannian optimization Sparse and low-rank decomposition for big data systems via smoothed Riemannian optimization Yuanming Shi ShanghaiTech University, Shanghai, China shiym@shanghaitech.edu.cn Bamdev Mishra Amazon Development

More information

Deep Learning: Approximation of Functions by Composition

Deep Learning: Approximation of Functions by Composition Deep Learning: Approximation of Functions by Composition Zuowei Shen Department of Mathematics National University of Singapore Outline 1 A brief introduction of approximation theory 2 Deep learning: approximation

More information

A Randomized Approach for Crowdsourcing in the Presence of Multiple Views

A Randomized Approach for Crowdsourcing in the Presence of Multiple Views A Randomized Approach for Crowdsourcing in the Presence of Multiple Views Presenter: Yao Zhou joint work with: Jingrui He - 1 - Roadmap Motivation Proposed framework: M2VW Experimental results Conclusion

More information

An iterative hard thresholding estimator for low rank matrix recovery

An iterative hard thresholding estimator for low rank matrix recovery An iterative hard thresholding estimator for low rank matrix recovery Alexandra Carpentier - based on a joint work with Arlene K.Y. Kim Statistical Laboratory, Department of Pure Mathematics and Mathematical

More information

Low-Rank Tensor Completion by Truncated Nuclear Norm Regularization

Low-Rank Tensor Completion by Truncated Nuclear Norm Regularization Low-Rank Tensor Completion by Truncated Nuclear Norm Regularization Shengke Xue, Wenyuan Qiu, Fan Liu, and Xinyu Jin College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou,

More information

Improved Bounded Matrix Completion for Large-Scale Recommender Systems

Improved Bounded Matrix Completion for Large-Scale Recommender Systems Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-7) Improved Bounded Matrix Completion for Large-Scale Recommender Systems Huang Fang, Zhen Zhang, Yiqun

More information

Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent

Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent KDD 2011 Rainer Gemulla, Peter J. Haas, Erik Nijkamp and Yannis Sismanis Presenter: Jiawen Yao Dept. CSE, UT Arlington 1 1

More information

A Local Non-Negative Pursuit Method for Intrinsic Manifold Structure Preservation

A Local Non-Negative Pursuit Method for Intrinsic Manifold Structure Preservation Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence A Local Non-Negative Pursuit Method for Intrinsic Manifold Structure Preservation Dongdong Chen and Jian Cheng Lv and Zhang Yi

More information

Spectral k-support Norm Regularization

Spectral k-support Norm Regularization Spectral k-support Norm Regularization Andrew McDonald Department of Computer Science, UCL (Joint work with Massimiliano Pontil and Dimitris Stamos) 25 March, 2015 1 / 19 Problem: Matrix Completion Goal:

More information

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER 2011 7255 On the Performance of Sparse Recovery Via `p-minimization (0 p 1) Meng Wang, Student Member, IEEE, Weiyu Xu, and Ao Tang, Senior

More information

Discriminative Feature Grouping

Discriminative Feature Grouping Discriative Feature Grouping Lei Han and Yu Zhang, Department of Computer Science, Hong Kong Baptist University, Hong Kong The Institute of Research and Continuing Education, Hong Kong Baptist University

More information

A direct formulation for sparse PCA using semidefinite programming

A direct formulation for sparse PCA using semidefinite programming A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon

More information

ARock: an algorithmic framework for asynchronous parallel coordinate updates

ARock: an algorithmic framework for asynchronous parallel coordinate updates ARock: an algorithmic framework for asynchronous parallel coordinate updates Zhimin Peng, Yangyang Xu, Ming Yan, Wotao Yin ( UCLA Math, U.Waterloo DCO) UCLA CAM Report 15-37 ShanghaiTech SSDS 15 June 25,

More information

Machine Learning and Computational Statistics, Spring 2017 Homework 2: Lasso Regression

Machine Learning and Computational Statistics, Spring 2017 Homework 2: Lasso Regression Machine Learning and Computational Statistics, Spring 2017 Homework 2: Lasso Regression Due: Monday, February 13, 2017, at 10pm (Submit via Gradescope) Instructions: Your answers to the questions below,

More information