Generalized Singular Value Thresholding

Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence

Generalized Singular Value Thresholding

Canyi Lu (1), Changbo Zhu (1), Chunyan Xu (2), Shuicheng Yan (1), Zhouchen Lin (3)
(1) Department of Electrical and Computer Engineering, National University of Singapore
(2) School of Computer Science and Technology, Huazhong University of Science and Technology
(3) Key Laboratory of Machine Perception (MOE), School of EECS, Peking University
canyilu@nus.edu.sg, zhuchangbo@gmail.com, xuchunyan@gmail.com, eleyans@nus.edu.sg, zlin@pku.edu.cn

Corresponding author: Zhouchen Lin. Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

This work studies the Generalized Singular Value Thresholding (GSVT) operator Prox_g^σ(·),

    Prox_g^σ(B) = argmin_X  Σ_{i=1}^m g(σ_i(X)) + (1/2)‖X − B‖_F^2,

associated with a nonconvex function g defined on the singular values of X. We prove that GSVT can be obtained by performing the proximal operator of g (denoted as Prox_g(·)) on the singular values, since Prox_g(·) is monotone when g is lower bounded. If the nonconvex g satisfies some conditions (many popular nonconvex surrogates of the l0-norm, e.g., the lp-norm with 0 < p < 1, are special cases), a general solver for finding Prox_g(b) is proposed for any b ≥ 0. GSVT greatly generalizes the known Singular Value Thresholding (SVT), which is a basic subroutine in many convex low rank minimization methods. We are able to solve nonconvex low rank minimization problems by using GSVT in place of SVT.

Introduction

Sparse and low rank structures have received much attention in recent years. There have been many applications which exploit these two structures, such as face recognition (Wright et al. 2009), subspace clustering (Cheng et al. 2010; Liu et al. 2013) and background modeling (Candès et al. 2011). To achieve sparsity, a principled approach is to use the convex l1-norm. However, l1-minimization may be suboptimal, since the l1-norm is a loose approximation of the l0-norm and often leads to an over-penalized problem. This brings attention back to nonconvex surrogates which interpolate between the l0-norm and the l1-norm. Many nonconvex penalties have been proposed, including the lp-norm with 0 < p < 1 (Frank and Friedman 1993), Smoothly Clipped Absolute Deviation (SCAD) (Fan and Li 2001), Logarithm (Friedman 2012), Minimax Concave Penalty (MCP) (Zhang 2010), Geman (Geman and Yang 1995) and Laplace (Trzasko and Manduca 2009). Their definitions are shown in Table 1. Numerical studies (Candès, Wakin, and Boyd 2008) have shown that nonconvex optimization usually outperforms convex models.

Table 1: Popular nonconvex surrogate functions of the l0-norm (θ ≥ 0, λ > 0).

    Penalty     Formula g(θ)
    lp-norm     λ θ^p, 0 < p < 1
    SCAD        λθ, if θ ≤ λ;  (−θ^2 + 2γλθ − λ^2)/(2(γ − 1)), if λ < θ ≤ γλ;  λ^2(γ + 1)/2, if θ > γλ  (γ > 2)
    Logarithm   λ/log(γ + 1) · log(γθ + 1)
    MCP         λθ − θ^2/(2γ), if θ < γλ;  γλ^2/2, if θ ≥ γλ
    Geman       λθ/(θ + γ)
    Laplace     λ(1 − exp(−θ/γ))

The low rank structure is an extension of sparsity defined on the singular values of a matrix. A principled way is to use the nuclear norm, which is a convex surrogate of the rank function (Recht, Fazel, and Parrilo 2010). However, it suffers from the same suboptimality issue as the l1-norm in many cases. Very recently, many of the popular nonconvex surrogate functions in Table 1 have been extended to the singular values to better approximate the rank function (Lu et al. 2014). However, different from the convex case, nonconvex low rank minimization is much more challenging than nonconvex sparse minimization.
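The penalties in Table 1 are easy to evaluate numerically. The following short NumPy sketch (not part of the original paper) implements a few of them; the argument names lam, gamma and p mirror λ, γ and p above and are our own choices.

```python
import numpy as np

def lp_norm(theta, lam, p):
    # l_p penalty: lam * theta^p, with 0 < p < 1
    return lam * np.power(theta, p)

def logarithm(theta, lam, gamma):
    # Logarithm penalty: lam / log(gamma + 1) * log(gamma * theta + 1)
    return lam / np.log(gamma + 1.0) * np.log(gamma * theta + 1.0)

def mcp(theta, lam, gamma):
    # MCP: lam*theta - theta^2/(2*gamma) if theta < gamma*lam, else gamma*lam^2/2
    return np.where(theta < gamma * lam,
                    lam * theta - theta ** 2 / (2.0 * gamma),
                    gamma * lam ** 2 / 2.0)

def geman(theta, lam, gamma):
    # Geman penalty: lam * theta / (theta + gamma)
    return lam * theta / (theta + gamma)

def laplace(theta, lam, gamma):
    # Laplace penalty: lam * (1 - exp(-theta / gamma))
    return lam * (1.0 - np.exp(-theta / gamma))

theta = np.linspace(0.0, 5.0, 6)
print(mcp(theta, lam=1.0, gamma=1.5))
```

All of these are concave and nondecreasing on [0, +∞), which is the property exploited throughout the paper.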
The Iteratively Reweighted Nuclear Norm (IRNN) method was proposed to solve the following nonconvex low rank minimization problem (Lu et al. 2014):

    min_X  F(X) = Σ_{i=1}^m g(σ_i(X)) + h(X),    (1)

where σ_i(X) denotes the i-th singular value of X ∈ R^{m×n} (we assume m ≤ n in this work). The penalty g: R+ → R+ is continuous, concave and nondecreasing on [0, +∞); the popular nonconvex surrogate functions in Table 1 are examples. The loss h: R^{m×n} → R+ has Lipschitz continuous gradient. IRNN updates X^{k+1} by minimizing a surrogate function which upper bounds the objective in (1); the surrogate is constructed by linearizing g and h at X^k simultaneously. In theory, IRNN is guaranteed to decrease the objective value of (1) in each iteration. However, it may decrease slowly, since the upper bound surrogate may be quite loose.

Figure 1: Gradients ∇g(θ) of some nonconvex penalties: (a) lp-norm (p = 0.5), (b) SCAD, (c) Logarithm, (d) MCP, (e) Geman, (f) Laplace; λ and γ are fixed across penalties.

It is expected that minimizing a tighter surrogate will lead to faster convergence. A tighter surrogate of the objective in (1) is obtained by keeping g and relaxing h only. This leads to the following updating rule, which is named the Generalized Proximal Gradient (GPG) method in this work:

    X^{k+1} = argmin_X  Σ_{i=1}^m g(σ_i(X)) + h(X^k) + ⟨∇h(X^k), X − X^k⟩ + (μ/2)‖X − X^k‖_F^2
            = argmin_X  Σ_{i=1}^m g(σ_i(X)) + (μ/2)‖X − X^k + (1/μ)∇h(X^k)‖_F^2,    (2)

where μ > L(h), the Lipschitz constant of ∇h, guarantees the convergence of GPG, as shown later. It can be seen that solving (2) requires solving the following problem:

    Prox_g^σ(B) = argmin_X  Σ_{i=1}^m g(σ_i(X)) + (1/2)‖X − B‖_F^2.    (3)

In this work, the mapping Prox_g^σ(·) is called the Generalized Singular Value Thresholding (GSVT) operator associated with the function Σ_{i=1}^m g(·) defined on the singular values. If g(x) = λx, then Σ_{i=1}^m g(σ_i(X)) degenerates to the convex nuclear norm λ‖X‖_*. In that case (3) has the closed form solution Prox_g^σ(B) = U Diag(D_λ(σ(B))) V^T, where D_λ(σ(B)) = {(σ_i(B) − λ)_+}_{i=1}^m, and U and V are from the SVD of B, i.e., B = U Diag(σ(B)) V^T. This is the known Singular Value Thresholding (SVT) operator associated with the convex nuclear norm (when g(x) = λx) (Cai, Candès, and Shen 2010). More generally, for a convex g, the solution to (3) is

    Prox_g^σ(B) = U Diag(Prox_g(σ(B))) V^T,    (4)

where Prox_g(·) is defined element-wise as follows:

    Prox_g(b) = argmin_{x ≥ 0}  f_b(x) = g(x) + (1/2)(x − b)^2,    (5)

and Prox_g(·) is the known proximal operator associated with a convex g (Combettes and Pesquet 2011). (For x < 0 we define g(x) = g(−x). If b ≥ 0, then Prox_g(b) ≥ 0; if b < 0, then Prox_g(b) = −Prox_g(−b). So we only need to discuss the case b ≥ 0, x ≥ 0 in this work.) That is to say, solving (3) is equivalent to performing Prox_g(·) on each singular value of B. In this case the mapping Prox_g(·) is unique, i.e., (5) has a unique solution. More importantly, Prox_g(·) is monotone, i.e., Prox_g(b_1) ≥ Prox_g(b_2) for any b_1 ≥ b_2. This guarantees that the nonincreasing order of the singular values is preserved after shrinkage and thresholding by the mapping Prox_g(·).

For a nonconvex g, we still call Prox_g(·) the proximal operator, but note that such a mapping may not be unique. It is still an open problem whether Prox_g(·) is monotone for a nonconvex g. Without proving the monotonicity of Prox_g(·), one cannot simply apply it to the singular values of B to obtain the solution to (3), as is done in SVT. Even if Prox_g(·) is monotone, since it is not unique, one also needs to carefully choose the solutions p_i ∈ Prox_g(σ_i(B)) such that p_1 ≥ p_2 ≥ ⋯ ≥ p_m. Another challenging problem is that there does not exist a general solver for (5) for a general nonconvex g.

It is worth mentioning that some previous works studied the solution to (3) for some special choices of nonconvex g (Nie, Huang, and Ding 2012; Chartrand 2012; Liu et al. 2013a). However, none of their proofs was rigorous, since they ignored proving the monotonicity of Prox_g(·). See the detailed discussions in the next section. Another recent work (Gu et al. 2014) considered the following problem related to the weighted nuclear norm:

    min_X  f_{w,B}(X) = Σ_{i=1}^m w_i σ_i(X) + (1/2)‖X − B‖_F^2,    (6)

where w_i ≥ 0, i = 1, …, m. Problem (6) is a little more general than (3), as it takes different g_i(x) = w_i x.
It is claimed in (Gu et al. 2014) that the solution to (6) is

    X* = U Diag({Prox_{g_i}(σ_i(B))}_{i=1}^m) V^T,    (7)

where B = U Diag(σ(B)) V^T is the SVD of B, and Prox_{g_i}(σ_i(B)) = max{σ_i(B) − w_i, 0}. However, such a result and its proof are not correct in general. A counterexample can be constructed with a 2×2 matrix B and weights satisfying w_1 > w_2: the X* given by (7) is then not optimal to (6), since there exists an X° with f_{w,B}(X°) < f_{w,B}(X*). The reason is that

    (Prox_{g_i}(σ_i(B)) − Prox_{g_j}(σ_j(B)))(σ_i(B) − σ_j(B)) ≥ 0    (8)

is not guaranteed to hold for every pair i ≤ j. Note that (8) does hold when w_1 ≤ w_2 ≤ ⋯ ≤ w_m, and thus (7) is optimal to (6) in that case.

In this work, we give the first rigorous proof that Prox_g(·) is monotone for any lower bounded function g (regardless of the convexity of g). Then solving (3) degenerates to solving (5) for each b = σ_i(B). The Generalized Singular Value Thresholding (GSVT) operator Prox_g^σ(·) in (3), associated with any lower bounded function, is much more general than the known SVT associated with the convex nuclear norm (Cai, Candès, and Shen 2010).
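A minimal NumPy sketch (not from the paper) of the GSVT operator in (3)–(4): compute the SVD of B and apply an element-wise proximal mapping to the singular values. The helper soft_threshold corresponds to the convex case g(x) = λx, in which GSVT reduces to classical SVT; any monotone scalar Prox_g can be substituted for it.

```python
import numpy as np

def soft_threshold(sigma, lam):
    # Proximal operator of g(x) = lam * x: max(sigma - lam, 0).
    return np.maximum(sigma - lam, 0.0)

def gsvt(B, prox, *args):
    # Generalized Singular Value Thresholding: apply a (monotone) scalar
    # proximal operator to the singular values of B and rebuild the matrix.
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    s_shrunk = prox(s, *args)   # element-wise shrinkage of the singular values
    return U @ np.diag(s_shrunk) @ Vt

# Example: with soft_threshold this is exactly the classical SVT operator.
B = np.random.randn(5, 8)
X = gsvt(B, soft_threshold, 0.5)
```

Because np.linalg.svd returns singular values in descending order, a monotone prox automatically preserves that order, which is precisely the property established below for nonconvex g.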

In order to compute GSVT, we analyze the solution to (5) for certain types of g (some special cases are shown in Table 1) in theory, and propose a general solver for (5). Finally, with GSVT, we can solve (1) by the Generalized Proximal Gradient (GPG) algorithm shown in (2). We test both Iteratively Reweighted Nuclear Norm (IRNN) and GPG on the matrix completion problem. Both synthetic and real data experiments show that GPG outperforms IRNN in terms of the recovery error and the objective function value.

Generalized Singular Value Thresholding

Problem Reformulation

A main goal of this work is to compute GSVT (3) and to use it to solve (1). We will show that, if Prox_g(·) is monotone, problem (3) can be reformulated into an equivalent problem which is much easier to solve.

Lemma 1 (von Neumann's trace inequality (Rhea 2011)). For any matrices A, B ∈ R^{m×n} (m ≤ n), Tr(A^T B) ≤ Σ_{i=1}^m σ_i(A) σ_i(B), where σ_1(A) ≥ σ_2(A) ≥ ⋯ ≥ 0 and σ_1(B) ≥ σ_2(B) ≥ ⋯ ≥ 0 are the singular values of A and B, respectively. The equality holds if and only if there exist unitaries U and V such that A = U Diag(σ(A)) V^T and B = U Diag(σ(B)) V^T are the SVDs of A and B, simultaneously.

Theorem 1. Let g: R+ → R+ be a function such that Prox_g(·) is monotone. Let B = U Diag(σ(B)) V^T be the SVD of B ∈ R^{m×n}. Then an optimal solution to (3) is

    X* = U Diag(ϱ*) V^T,    (9)

where ϱ* satisfies ϱ*_1 ≥ ϱ*_2 ≥ ⋯ ≥ ϱ*_m ≥ 0, and

    ϱ*_i ∈ Prox_g(σ_i(B)) = argmin_{ϱ_i ≥ 0}  g(ϱ_i) + (1/2)(ϱ_i − σ_i(B))^2,  i = 1, …, m.    (10)

Proof. Denote σ_1(X) ≥ ⋯ ≥ σ_m(X) ≥ 0 as the singular values of X. Problem (3) can be rewritten as

    min_{ϱ: ϱ_1 ≥ ⋯ ≥ ϱ_m ≥ 0}  min_{X: σ(X) = ϱ}  { Σ_{i=1}^m g(ϱ_i) + (1/2)‖X − B‖_F^2 }.    (11)

By using von Neumann's trace inequality in Lemma 1, we have

    (1/2)‖X − B‖_F^2 = (1/2)( Tr(X^T X) − 2 Tr(X^T B) + Tr(B^T B) )
                     = (1/2)( Σ_i σ_i^2(X) − 2 Tr(X^T B) + Σ_i σ_i^2(B) )
                     ≥ (1/2)( Σ_i σ_i^2(X) − 2 Σ_i σ_i(X) σ_i(B) + Σ_i σ_i^2(B) )
                     = (1/2) Σ_i (σ_i(X) − σ_i(B))^2.

The above equality holds when X admits the singular value decomposition X = U Diag(σ(X)) V^T, where U and V are the left and right orthonormal matrices in the SVD of B. In this case, problem (11) reduces to

    min_{ϱ: ϱ_1 ≥ ⋯ ≥ ϱ_m ≥ 0}  Σ_{i=1}^m ( g(ϱ_i) + (1/2)(ϱ_i − σ_i(B))^2 ).    (12)

Since Prox_g(·) is monotone and σ_1(B) ≥ σ_2(B) ≥ ⋯ ≥ σ_m(B), there exist ϱ*_i ∈ Prox_g(σ_i(B)) such that ϱ*_1 ≥ ϱ*_2 ≥ ⋯ ≥ ϱ*_m. Such a choice of ϱ* is optimal to (12), and thus (9) is optimal to (3).

From the above proof, it can be seen that the monotonicity of Prox_g(·) is the key condition which makes problem (11) conditionally separable. Thus the solution (9) to (3) shares a similar formulation with the known Singular Value Thresholding (SVT) operator associated with the convex nuclear norm (Cai, Candès, and Shen 2010). Note that for a convex g, Prox_g(·) is always monotone. Indeed,

    (Prox_g(b_1) − Prox_g(b_2))(b_1 − b_2) ≥ (Prox_g(b_1) − Prox_g(b_2))^2,  for all b_1, b_2 ∈ R+.

The above inequality follows from the optimality of Prox_g(·) and the convexity of g. The monotonicity of Prox_g(·) for a nonconvex g was previously unknown. Some previous works (Nie, Huang, and Ding 2012; Chartrand 2012; Liu et al. 2013a) claimed that the solution (9) is optimal to (3) for some special choices of nonconvex g. However, their results are not rigorous, since the monotonicity of Prox_g(·) is not proved. Surprisingly, we find that the monotonicity of Prox_g(·) holds for any lower bounded function g.

Theorem 2. For any lower bounded function g, its proximal operator Prox_g(·) is monotone, i.e., for any p_i ∈ Prox_g(x_i), i = 1, 2, we have p_1 ≥ p_2 when x_1 > x_2.

Note that it is possible that σ_i(B) = σ_j(B) for some i < j in (10). Since Prox_g(·) may not be unique, we need to choose ϱ*_i ∈ Prox_g(σ_i(B)) and ϱ*_j ∈ Prox_g(σ_j(B)) such that ϱ*_i ≥ ϱ*_j. This is the only difference between GSVT and SVT.
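Theorem 1 is easy to sanity-check numerically. The sketch below (not from the paper) uses a nonconvex, lower bounded penalty g(x) = min(x, 1) — a capped-l1 penalty of our choosing — solves the scalar problem (10) by brute-force grid search, assembles (9), and verifies that random perturbations do not improve the objective in (3).

```python
import numpy as np

rng = np.random.default_rng(0)

def capped_l1(x):
    # A nonconvex, lower bounded penalty: g(x) = min(x, 1).
    return np.minimum(x, 1.0)

def objective(X, B):
    # Objective of problem (3) for the capped-l1 penalty.
    s = np.linalg.svd(X, compute_uv=False)
    return capped_l1(s).sum() + 0.5 * np.linalg.norm(X - B, "fro") ** 2

B = rng.standard_normal((6, 9))
U, s, Vt = np.linalg.svd(B, full_matrices=False)

# Scalar proximal operator of capped_l1 by grid search (illustration only).
grid = np.linspace(0.0, s.max() + 1.0, 20001)
rho = np.array([grid[np.argmin(capped_l1(grid) + 0.5 * (grid - b) ** 2)] for b in s])
X_star = U @ np.diag(rho) @ Vt      # the GSVT solution suggested by Theorem 1

# X_star should not be beaten by random perturbations of itself.
trials = [objective(X_star + 0.1 * rng.standard_normal(B.shape), B) for _ in range(200)]
print(objective(X_star, B) <= min(trials))
```

The grid search stands in for the general solver developed in the next section; it is only meant to illustrate the statement of Theorem 1.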
Proximal Operator of a Nonconvex Function

So far, we have proved that for any lower bounded function g, solving (3) is equivalent to solving (5) for each b = σ_i(B), i = 1, …, m. For a nonconvex g, the candidate solutions to (5) have a closed form only in some special cases (Gong et al. 2013); there does not exist a general solver for a more general nonconvex g. In this section, we analyze the solution to (5) for a broad family of nonconvex g. A general solver is then proposed in the next section.

Assumption 1. g: R+ → R+, g(0) = 0. g is concave, nondecreasing and differentiable. The gradient ∇g is convex.

In this work, we are interested in nonconvex surrogates of the l0-norm. Except for the differentiability of g and the convexity of ∇g, all the other conditions in Assumption 1 are necessary for constructing a surrogate of the l0-norm. As shown later, these two additional conditions make our analysis much easier. Note that the assumptions on the nonconvex functions considered in Assumption 1 are quite general.

Algorithm 1: A General Solver to (5) in which g satisfies Assumption 1

    Input: b ≥ 0.
    Output: Identify an optimal solution, 0 or x̂_b = max{x | ∇f_b(x) = 0, x ≥ 0}.
    if ∇g(b) = 0 then
        return x̂_b = b;
    else
        // find x̂_b by fixed point iteration
        x^0 = b;  // initialization
        while not converged do
            x^{k+1} = b − ∇g(x^k);
            if x^{k+1} < 0 then
                return x̂_b = 0; break;
            end
        end
    end
    Compare f_b(0) and f_b(x̂_b) to identify the optimal one.

It is easy to verify that many popular surrogates of the l0-norm in Table 1 satisfy Assumption 1, including the lp-norm, Logarithm, MCP, Geman and Laplace penalties. Only the SCAD penalty violates the convexity assumption on ∇g, as shown in Figure 1.

Proposition 1. Given g satisfying Assumption 1, the optimal solution to (5) lies in [0, b].

The above fact is obvious, since both g(x) and (1/2)(x − b)^2 are nondecreasing on [b, +∞). Such a result limits the solution space, and thus is very useful for our analysis. Our general solver to (5) is also based on Proposition 1. Note that the solutions to (5) lie in {0} or in the set of local points {x | ∇f_b(x) = ∇g(x) + x − b = 0}. Our analysis is mainly based on the number of intersection points of D(x) = ∇g(x) and the line C_b(x) = b − x. Let b̄ = sup{b | C_b(x) and D(x) have no intersection}. We give the solution to (5) in different cases; please refer to the supplementary material for the detailed proofs.

Proposition 2. Given g satisfying Assumption 1 and ∇g(0) = +∞. Restricted on [0, +∞), when b > b̄, C_b(x) and D(x) have two intersection points, denoted as P_1^b = (x_1^b, y_1^b) and P_2^b = (x_2^b, y_2^b) with x_1^b < x_2^b. If there does not exist b > b̄ such that f_b(0) = f_b(x_2^b), then Prox_g(b) = 0 for all b ≥ 0. If there exists b > b̄ such that f_b(0) = f_b(x_2^b), let b* = inf{b > b̄ | f_b(0) = f_b(x_2^b)}. Then we have

    Prox_g(b) = argmin_{x ≥ 0} f_b(x) = { x_2^b, if b > b*;  0, if b ≤ b*. }

Proposition 3. Given g satisfying Assumption 1 and ∇g(0) < +∞. Restricted on [0, +∞), if C_{∇g(0)}(x) = ∇g(0) − x ≤ ∇g(x) for all x ∈ (0, ∇g(0)), then C_b(x) and D(x) have only one intersection point (x_1^b, y_1^b) when b > ∇g(0). Furthermore,

    Prox_g(b) = argmin_{x ≥ 0} f_b(x) = { x_1^b, if b > ∇g(0);  0, if b ≤ ∇g(0). }

Suppose instead that there exists 0 < x̂ < ∇g(0) such that C_{∇g(0)}(x̂) = ∇g(0) − x̂ > ∇g(x̂). Then, when ∇g(0) > b > b̄, C_b(x) and D(x) have two intersection points, denoted as P_1^b = (x_1^b, y_1^b) and P_2^b = (x_2^b, y_2^b) with x_1^b < x_2^b; when b > ∇g(0), C_b(x) and D(x) have only one intersection point (x_2^b, y_2^b). Also, there exists b such that ∇g(0) > b > b̄ and f_b(0) = f_b(x_2^b). Let b* = inf{b | ∇g(0) > b > b̄, f_b(0) = f_b(x_2^b)}. We have

    Prox_g(b) = argmin_{x ≥ 0} f_b(x) = { x_2^b, if b > ∇g(0);  x_2^b, if ∇g(0) ≥ b > b*;  0, if b ≤ b*. }

Corollary 1. Given g satisfying Assumption 1, denote x̂_b = max{x | ∇f_b(x) = 0, x ≥ 0} and x_b* = argmin_{x ∈ {0, x̂_b}} f_b(x). Then x_b* is optimal to (5).

The results in Propositions 2 and 3 give the solution to (5) in different cases, while Corollary 1 summarizes them. It can be seen that one only needs to compute x̂_b, the largest local minimum; comparing the objective values at 0 and at x̂_b then yields an optimal solution to (5).

Figure 2: Plots of b versus Prox_g(b) for different choices of g: the convex l1-norm and popular nonconvex functions satisfying Assumption 1 in Table 1. Panels: (a) l1-norm, (b) lp-norm, (c) MCP, (d) Logarithm, (e) Laplace, (f) Geman.

Algorithms

In this section, we first give a general solver to (5) for g satisfying Assumption 1. Then we are able to solve the GSVT problem (3). With GSVT, problem (1) can be solved by the Generalized Proximal Gradient (GPG) algorithm shown in (2). We also give the convergence guarantee of GPG.

A General Solver to (5)

Given g satisfying Assumption 1, as shown in Corollary 1, 0 and x̂_b = max{x | ∇f_b(x) = 0, x ≥ 0} are the candidate solutions to (5).
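A minimal Python sketch of Algorithm 1 (not from the paper), instantiated with the Logarithm penalty from Table 1, is given below; the stopping tolerance and iteration cap are our own arbitrary choices. It runs the fixed point iteration x^{k+1} = b − ∇g(x^k) to locate x̂_b and then compares the objective at 0 and at x̂_b, as in Corollary 1. Why the fixed point loop converges to x̂_b is explained in the discussion that follows.

```python
import numpy as np

def log_penalty(x, lam, gamma):
    # Logarithm penalty g(x) = lam / log(gamma + 1) * log(gamma * x + 1).
    return lam / np.log(gamma + 1.0) * np.log(gamma * x + 1.0)

def grad_log_penalty(x, lam, gamma):
    # Gradient of the Logarithm penalty.
    return lam * gamma / (np.log(gamma + 1.0) * (gamma * x + 1.0))

def prox_nonconvex(b, lam, gamma, tol=1e-10, max_iter=100):
    # Sketch of Algorithm 1: general solver to (5) for g satisfying Assumption 1.
    f = lambda x: log_penalty(x, lam, gamma) + 0.5 * (x - b) ** 2
    if grad_log_penalty(b, lam, gamma) == 0.0:
        x_hat = b
    else:
        x = b                     # initialization x^0 = b
        x_hat = 0.0
        for _ in range(max_iter):
            x_new = b - grad_log_penalty(x, lam, gamma)
            if x_new < 0.0:       # no stationary point in [0, b]
                x_hat = 0.0
                break
            if abs(x_new - x) < tol:
                x_hat = x_new
                break
            x = x_new
        else:
            x_hat = x
    # Compare the two candidates 0 and x_hat (Corollary 1).
    return 0.0 if f(0.0) <= f(x_hat) else x_hat

print(prox_nonconvex(b=3.0, lam=1.0, gamma=2.0))
```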
The remaining task is to find x̂_b, which is the largest local minimum point near x = b. So we can start searching for x̂_b from x^0 = b by the fixed point iteration algorithm.

Note that this search is very fast, since we only need to search within [0, b]. The whole procedure for finding x̂_b is given in Algorithm 1. In theory, it can be proved that the fixed point iteration is guaranteed to find x̂_b. If g is nonsmooth or ∇g is nonconvex, the fixed point iteration algorithm may still be applicable; the key is to find all the local solutions with carefully chosen initial points, and all the nonsmooth points should also be considered as candidates.

All the nonconvex surrogates g in Table 1 except SCAD satisfy Assumption 1, and thus the solution to (5) can be obtained by Algorithm 1. Figure 2 illustrates the shrinkage effect of the proximal operators of these functions and of the convex l1-norm. The shrinkage and thresholding effects of these proximal operators are similar when b is relatively small. However, when b is relatively large, the proximal operators of the nonconvex functions are nearly unbiased, i.e., they keep b nearly unchanged, as the l0-norm does. In contrast, the proximal operator of the convex l1-norm is biased. In this case, the l1-norm may be over-penalized and may perform quite differently from the l0-norm. This also supports the necessity of using nonconvex penalties on the singular values to approximate the rank function.

Generalized Proximal Gradient Algorithm for (1)

Given g satisfying Assumption 1, we are now able to obtain the optimal solution to (3) by (9) and Algorithm 1. We therefore have a better solver than IRNN for (1), based on the updating rule (2), or equivalently

    X^{k+1} = Prox^σ_{g/μ}( X^k − (1/μ) ∇h(X^k) ).

The above updating rule is named the Generalized Proximal Gradient (GPG) method for the nonconvex problem (1); it generalizes some previous methods (Beck and Teboulle 2009; Gong et al. 2013). The main per-iteration cost of GPG is to compute an SVD, which is the same as in many convex methods (Toh and Yun 2010a; Lin, Chen, and Ma 2009). In theory, we have the following convergence results for GPG.

Theorem 3. If μ > L(h), the sequence {X^k} generated by (2) satisfies the following properties:
(1) F(X^k) is monotonically decreasing;
(2) lim_{k→+∞} (X^k − X^{k+1}) = 0;
(3) if F(X) → +∞ when ‖X‖_F → +∞, then any limit point of {X^k} is a stationary point.

It is expected that GPG decreases the objective function value faster than IRNN, since it uses a tighter surrogate function. This will be verified by the experiments.
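Before turning to the experiments, here is a minimal sketch (not from the paper) of a GPG loop for a matrix completion loss h(X) = (1/2)‖P_Ω(X) − P_Ω(M)‖_F^2, whose gradient has Lipschitz constant 1, so any μ > 1 is admissible. It assumes the prox_nonconvex helper sketched earlier is in scope; note that the scalar subproblem in (2) involves g/μ, which for the Logarithm penalty simply rescales λ to λ/μ.

```python
import numpy as np

def gpg_matrix_completion(M_obs, mask, lam, gamma, mu=1.1, iters=200):
    """Sketch of GPG for min_X sum_i g(sigma_i(X)) + 0.5*||P_Omega(X - M)||_F^2.

    mask is a boolean array marking the observed entries Omega.
    Requires prox_nonconvex(b, lam, gamma) from the earlier sketch.
    """
    X = np.zeros_like(M_obs)
    for _ in range(iters):
        grad = np.where(mask, X - M_obs, 0.0)    # gradient of the smooth loss h
        Y = X - grad / mu                         # gradient step X^k - (1/mu) grad h(X^k)
        # GSVT step: apply the scalar prox of g/mu to the singular values of Y.
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        s_new = np.array([prox_nonconvex(b, lam / mu, gamma) for b in s])
        X = U @ np.diag(s_new) @ Vt
    return X
```

In practice one would also decrease λ along the iterations (the continuation technique mentioned below) and monitor the objective for stopping; those details are omitted here.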
Experiments

In this section, we conduct experiments on the matrix completion problem to test the proposed GPG algorithm:

    min_X  Σ_{i=1}^m g(σ_i(X)) + (1/2)‖P_Ω(X) − P_Ω(M)‖_F^2,    (13)

where Ω is the index set of observed entries, and P_Ω: R^{m×n} → R^{m×n} is a linear operator that keeps the entries in Ω unchanged and sets those outside Ω to zero. Given P_Ω(M), the goal of matrix completion is to recover M, which is of low rank. Note that there are many choices of g satisfying Assumption 1; we simply test the Logarithm penalty, since it is suggested in (Lu et al. 2014; Candès, Wakin, and Boyd 2008) that it usually performs well compared with other nonconvex penalties. Problem (13) can be solved by GPG by using GSVT (9) in each iteration. We compare GPG with IRNN on both synthetic and real data. The continuation technique is used to enhance low rank matrix recovery in GPG: the initial value of λ in the Logarithm penalty is set to λ_0, and it is dynamically decreased until reaching λ_t.

Figure 3: Experimental results of low rank matrix recovery on random data. (a) Frequency of Success (FoS) for the noise free case. (b) Relative error for the noisy case. (c) Convergence curves of IRNN and GPG for the noisy case.

Low-Rank Matrix Recovery on Random Data

We conduct two experiments on synthetic data, without and with noise (Lu et al. 2014). For the noise free case, we generate M = M_1 M_2, where M_1 ∈ R^{m×r} and M_2 ∈ R^{r×n} are i.i.d. random matrices, with m = n = 150. The underlying rank r varies up to 33. Half of the elements in M are missing. We set λ_0 = 0.9‖P_Ω(M)‖_∞ and λ_t = 10^{-5} λ_0. The relative error RelErr = ‖X* − M‖_F / ‖M‖_F is used to evaluate the recovery performance; if RelErr is smaller than 10^{-3}, X* is regarded as a successful recovery of M. We repeat the experiment for each r and report the frequency of success. We compare GPG (using GSVT) with IRNN and the convex Augmented Lagrange Multiplier (ALM) method (Lin, Chen, and Ma 2009). Figure 3 (a) plots r versus the frequency of success. It can be seen

that GPG is slightly better than IRNN when r is relatively small, while both IRNN and GPG fail when r approaches the largest tested rank. Both of them outperform the convex ALM method, since the nonconvex Logarithm penalty approximates the rank function better than the convex nuclear norm.

For the noisy case, the data matrix M is generated in the same way, but with additional noise 0.1E added, where E is an i.i.d. random matrix. For this task, λ_0 is again set proportional to ‖P_Ω(M)‖_∞, with a larger λ_t than in the noise free case. The convex APGL algorithm (Toh and Yun 2010) is also compared on this task. Each method is run repeatedly for each tested rank r. Figure 3 (b) shows the mean relative error. It can be seen that GPG, by using GSVT in each iteration, significantly outperforms IRNN and APGL. The reason is that λ_t is not as small as in the noise free case, so the upper bound surrogate of g in IRNN is much looser than the surrogate used in GPG. Figure 3 (c) plots some convergence curves of GPG and IRNN; GPG, which does not relax g, decreases the objective function value faster.

Applications on Real Data

Matrix completion can be applied to image inpainting, since the main information of an image is dominated by its top singular values. For a color image, assume that 40% of the pixels are uniformly missing. They can be recovered by applying low rank matrix completion to each channel (red, green and blue) of the image independently. Besides the relative error defined above, we also use the Peak Signal-to-Noise Ratio (PSNR) to evaluate the recovery performance.

Figure 4: Image inpainting by APGL, IRNN, and GPG. Panels: (a) original image; (b) image with missing pixels; (c) recovered by APGL; (d) recovered by IRNN; (e) recovered by GPG; (f) PSNR and relative error of the three methods.

Figure 4 shows two images recovered by APGL, IRNN and GPG, respectively. It can be seen that GPG achieves the best performance, i.e., the largest PSNR value and the smallest relative error.

We also apply matrix completion to collaborative filtering. The task of collaborative filtering is to predict the unknown preferences of a user on a set of unrated items, based on other similar users or similar items. We test on the MovieLens data sets (Herlocker et al. 1999), which include three problems: movie-100k, movie-1M and movie-10M. Since only the entries in Ω of M are known, we use the Normalized Mean Absolute Error (NMAE), ‖P_Ω(X*) − P_Ω(M)‖_1 / |Ω|, to evaluate the performance, as in (Toh and Yun 2010). As shown in Table 2, GPG achieves the lowest NMAE on all three problems. The improvement benefits from the GPG algorithm, which uses a fast and exact solver for GSVT (9).

Table 2: Comparison of NMAE of APGL, IRNN and GPG for collaborative filtering; GPG attains the lowest NMAE on each problem.

    Problem      size of M: (m, n)
    movie-100k   (943, 1682)
    movie-1M     (6040, 3706)
    movie-10M    (71567, 10677)

Conclusions

This paper studied the Generalized Singular Value Thresholding (GSVT) operator associated with a nonconvex function g applied to the singular values. We proved that the proximal operator Prox_g(·) of any lower bounded function g is monotone, so GSVT can be obtained by performing Prox_g(·) on the singular values separately. Given b ≥ 0, we also proposed a general solver for finding Prox_g(b) for a certain family of g. Finally, we applied the generalized proximal gradient algorithm, with GSVT as its subroutine, to solve the nonconvex low rank minimization problem (1). Experimental results showed that it outperforms previous methods, with smaller recovery error and objective function value.
For nonconvex low rank minimization, GSVT plays the same role as SVT does in convex minimization. One may extend other convex low rank models to their nonconvex counterparts and solve them by using GSVT in place of SVT. An interesting direction for future work is to solve the nonconvex low rank minimization problem with affine constraints by ALM (Lin, Chen, and Ma 2009) and to prove its convergence.

Acknowledgements

This research is supported by the Singapore National Research Foundation under its International Research Funding Initiative and administered by the IDM Programme Office. Z. Lin is supported by NSF China, the 973 Program of China (grant no. 2015CB352502) and the MSRA Collaborative Research Program. C. Lu is supported by the MSRA fellowship 2014.

References

Beck, A., and Teboulle, M. 2009. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences.
Cai, J.-F.; Candès, E. J.; and Shen, Z. 2010. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization 20(4).
Candès, E. J.; Li, X.; Ma, Y.; and Wright, J. 2011. Robust principal component analysis? Journal of the ACM 58(3).
Candès, E. J.; Wakin, M. B.; and Boyd, S. P. 2008. Enhancing sparsity by reweighted l1 minimization. Journal of Fourier Analysis and Applications 14(5-6).
Chartrand, R. 2012. Nonconvex splitting for regularized low-rank + sparse decomposition. IEEE Transactions on Signal Processing 60(11).
Cheng, B.; Yang, J.; Yan, S.; Fu, Y.; and Huang, T. S. 2010. Learning with l1-graph for image analysis. TIP 19.
Combettes, P. L., and Pesquet, J.-C. 2011. Proximal splitting methods in signal processing. In Fixed-Point Algorithms for Inverse Problems in Science and Engineering.
Fan, J., and Li, R. 2001. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96(456).
Frank, L., and Friedman, J. 1993. A statistical view of some chemometrics regression tools. Technometrics.
Friedman, J. 2012. Fast sparse regression and classification. International Journal of Forecasting 28(3).
Geman, D., and Yang, C. 1995. Nonlinear image recovery with half-quadratic regularization. TIP 4(7).
Gong, P.; Zhang, C.; Lu, Z.; Huang, J.; and Ye, J. 2013. A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. In ICML.
Gu, S.; Zhang, L.; Zuo, W.; and Feng, X. 2014. Weighted nuclear norm minimization with application to image denoising. In CVPR.
Herlocker, J. L.; Konstan, J. A.; Borchers, A.; and Riedl, J. 1999. An algorithmic framework for performing collaborative filtering. In International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM.
Lin, Z.; Chen, M.; and Ma, Y. 2009. The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. UIUC Technical Report UILU-ENG-09-2215.
Liu, D.; Zhou, T.; Qian, H.; Xu, C.; and Zhang, Z. 2013a. A nearly unbiased matrix completion approach. In Machine Learning and Knowledge Discovery in Databases.
Liu, G.; Lin, Z.; Yan, S.; Sun, J.; Yu, Y.; and Ma, Y. 2013. Robust recovery of subspace structures by low-rank representation. TPAMI 35(1):171–184.
Lu, C.; Tang, J.; Yan, S.; and Lin, Z. 2014. Generalized nonconvex nonsmooth low-rank minimization. In CVPR.
Nie, F.; Huang, H.; and Ding, C. H. 2012. Low-rank matrix recovery via efficient Schatten p-norm minimization. In AAAI.
Recht, B.; Fazel, M.; and Parrilo, P. A. 2010. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review 52(3):471–501.
Rhea, D. 2011. The case of equality in the von Neumann trace inequality. Preprint.
Toh, K., and Yun, S. 2010a. An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems. Pacific Journal of Optimization.
Toh, K., and Yun, S. 2010. An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems. Pacific Journal of Optimization 6:615–640.
Trzasko, J., and Manduca, A. 2009. Highly undersampled magnetic resonance image reconstruction via homotopic l0-minimization. IEEE Transactions on Medical Imaging 28(1).
Wright, J.; Yang, A. Y.; Ganesh, A.; Sastry, S. S.; and Ma, Y. 2009. Robust face recognition via sparse representation. TPAMI 31(2):210–227.
Zhang, C.-H. 2010. Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics 38(2).

Supplementary Material of Generalized Singular Value Thresholding

Canyi Lu (1), Changbo Zhu (1), Chunyan Xu (2), Shuicheng Yan (1), Zhouchen Lin (3)
(1) Department of Electrical and Computer Engineering, National University of Singapore
(2) School of Computer Science and Technology, Huazhong University of Science and Technology
(3) Key Laboratory of Machine Perception (MOE), School of EECS, Peking University
canyilu@nus.edu.sg, zhuchangbo@gmail.com, xuchunyan@gmail.com, eleyans@nus.edu.sg, zlin@pku.edu.cn

1 Analysis of the Proximal Operator of a Nonconvex Function

In the following development, we consider the problem

    Prox_g(b) = argmin_{x ≥ 0}  f_b(x) = g(x) + (1/2)(x − b)^2,    (1)

where g(x) satisfies the following assumption.

Assumption 1. g: R+ → R+, g(0) = 0. g is concave, nondecreasing and differentiable. The gradient ∇g is convex.

Set C_b(x) = b − x and D(x) = ∇g(x). Let b̄ = sup{b | C_b(x) and D(x) have no intersection}, and x_2^{b̄} = inf{x | (x, y) is an intersection point of C_{b̄}(x) and D(x)}.

1.1 Proof of Proposition 2

Proposition 2. Given g satisfying Assumption 1 and ∇g(0) = +∞. Restricted on [0, +∞), when b > b̄, C_b(x) and D(x) have two intersection points, denoted as P_1^b = (x_1^b, y_1^b) and P_2^b = (x_2^b, y_2^b) with x_1^b < x_2^b. If there does not exist b > b̄ such that f_b(0) = f_b(x_2^b), then Prox_g(b) = 0 for all b ≥ 0. If there exists b > b̄ such that f_b(0) = f_b(x_2^b), let b* = inf{b > b̄ | f_b(0) = f_b(x_2^b)}. Then we have

    Prox_g(b) = argmin_{x ≥ 0} f_b(x) = { x_2^b, if b > b*;  0, if b ≤ b*. }    (2)

Remark: When b* exists and b > b*, because D(x) = ∇g(x) is convex and decreasing, we can conclude that C_b(x) and D(x) have exactly two intersection points. When b ≤ b*, C_b(x) and D(x) may have multiple intersection points.

Proof. When b > b̄, since ∇f_b(x) = D(x) − C_b(x), we can easily see that f_b is increasing on (0, x_1^b), decreasing on (x_1^b, x_2^b) and increasing on (x_2^b, b). So 0 and x_2^b are the two local minimum points of f_b(x) on [0, b].

Case 1: There exists b > b̄ such that f_b(0) = f_b(x_2^b); denote b* = inf{b > b̄ | f_b(0) = f_b(x_2^b)}.

First, we consider b > b*. Let b = b* + ε for some ε > 0. We have

    f_b(x_2^{b*}) − f_b(0) = (1/2)(x_2^{b*} − b* − ε)^2 + g(x_2^{b*}) − (1/2)(b* + ε)^2
                           = (1/2)(x_2^{b*} − b*)^2 + g(x_2^{b*}) − (1/2)(b*)^2 − ε x_2^{b*}
                           = f_{b*}(x_2^{b*}) − f_{b*}(0) − ε x_2^{b*}
                           = −ε x_2^{b*} < 0.

Since f_b is decreasing on [x_2^{b*}, x_2^b], we conclude that f_b(0) > f_b(x_2^{b*}) ≥ f_b(x_2^b). So, when b > b*, x_2^b is the global minimum of f_b(x) on [0, b].

Second, we consider b̄ < b < b*. We show that f_b(0) ≤ f_b(x_2^b) by contradiction. Suppose that there exists such a b with f_b(0) > f_b(x_2^b). Since f_{b̄} is strictly increasing on (0, x_2^{b̄}), we have f_{b̄}(x_2^{b̄}) > f_{b̄}(0). Because f_{b̄}(x_2^{b̄}) > f_{b̄}(0) and f_b(x_2^b) < f_b(0), a direct computation gives

    2 g(x_2^{b̄}) − 2 x_2^{b̄} ∇g(x_2^{b̄}) − (x_2^{b̄})^2 > 0,
    2 g(x_2^{b}) − 2 x_2^{b} ∇g(x_2^{b}) − (x_2^{b})^2 < 0.

According to the intermediate value theorem, there exists x̃ with x_2^{b̄} < x̃ < x_2^{b} such that 2 g(x̃) − 2 x̃ ∇g(x̃) − x̃^2 = 0. Let b̃ = ∇g(x̃) + x̃. Then (x̃, ∇g(x̃)) is an intersection point of C_{b̃}(x) and D(x) such that f_{b̃}(x̃) = f_{b̃}(0). Since x_2^{b̄} < x̃ < x_2^{b} and ∇g is convex and nonincreasing, we conclude that b̄ < b̃ < b < b*, which contradicts the minimality of b*. Also, when b ≤ b̄, we have ∇f_b(x) = D(x) − C_b(x) ≥ 0, because D(x) is above C_b(x). So the global minimum of f_b(x) on [0, b] is 0.

Case 2: Suppose that for all b > b̄, f_b(0) ≠ f_b(x_2^b). Since f_{b̄} is increasing on (0, x_2^{b̄}), we have f_{b̄}(x_2^{b̄}) > f_{b̄}(0). We now show that f_b(x_2^b) ≥ f_b(0) for all b > b̄. Suppose this is not true and there exists b > b̄ such that f_b(x_2^b) < f_b(0). Because f_{b̄}(x_2^{b̄}) > f_{b̄}(0) and f_b(x_2^b) < f_b(0), a direct computation gives

    2 g(x_2^{b̄}) − 2 x_2^{b̄} ∇g(x_2^{b̄}) − (x_2^{b̄})^2 > 0,
    2 g(x_2^{b}) − 2 x_2^{b} ∇g(x_2^{b}) − (x_2^{b})^2 < 0.

So, according to the intermediate value theorem, there exists x̃ with x_2^{b̄} < x̃ < x_2^{b} such that 2 g(x̃) − 2 x̃ ∇g(x̃) − x̃^2 = 0. Let b̃ = ∇g(x̃) + x̃. Then (x̃, ∇g(x̃)) is an intersection point of C_{b̃}(x) and D(x) such that f_{b̃}(x̃) = f_{b̃}(0). Since x_2^{b̄} < x̃ < x_2^{b} and ∇g is convex and nonincreasing, we conclude that b̄ < b̃ < b, which contradicts f_b(0) ≠ f_b(x_2^b) for all b > b̄. So, for all b > b̄, 0 is the minimum of f_b(x) on [0, b]. Similarly, when b ≤ b̄, we have ∇f_b(x) = D(x) − C_b(x) ≥ 0 because D(x) is above C_b(x), so the global minimum of f_b(x) on [0, b] is 0. The proof is completed.

1.2 Proof of Proposition 3

Proposition 3. Given g satisfying Assumption 1 and ∇g(0) < +∞. Restricted on [0, +∞), if C_{∇g(0)}(x) = ∇g(0) − x ≤ ∇g(x) for all x ∈ (0, ∇g(0)), then C_b(x) and D(x) have only one intersection point (x_1^b, y_1^b) when b > ∇g(0). Furthermore,

    Prox_g(b) = argmin_{x ≥ 0} f_b(x) = { x_1^b, if b > ∇g(0);  0, if b ≤ ∇g(0). }    (3)

Suppose instead that there exists 0 < x̂ < ∇g(0) such that C_{∇g(0)}(x̂) = ∇g(0) − x̂ > ∇g(x̂). Then, when ∇g(0) > b > b̄, C_b(x) and D(x) have two intersection points, denoted as P_1^b = (x_1^b, y_1^b) and P_2^b = (x_2^b, y_2^b) with x_1^b < x_2^b; when b > ∇g(0), C_b(x) and D(x) have only one intersection point (x_2^b, y_2^b). Also, there exists b such that ∇g(0) > b > b̄ and f_b(0) = f_b(x_2^b). Let b* = inf{b | ∇g(0) > b > b̄, f_b(0) = f_b(x_2^b)}. We have

    Prox_g(b) = argmin_{x ≥ 0} f_b(x) = { x_2^b, if b > ∇g(0);  x_2^b, if ∇g(0) ≥ b > b*;  0, if b ≤ b*. }    (4)

Remark: If b* exists, then when b ≤ b*, it is possible that C_b(x) and D(x) have more than two intersection points. If b* does not exist, then when b ≤ ∇g(0), it is also possible that C_b(x) and D(x) have more than two intersection points.

Proof. Case 1: Suppose that C_{∇g(0)}(x) = ∇g(0) − x ≤ ∇g(x) for all x ∈ (0, ∇g(0)). Notice that for all b ≤ ∇g(0) we have ∇g(x) = D(x) ≥ C_b(x), so the minimum point of f_b(x) is 0. For all b > ∇g(0), C_b(x) = b − x and D(x) have only one intersection point, denoted as (x_1^b, y_1^b). Then we can easily see that f_b is decreasing on (0, x_1^b) and increasing on (x_1^b, b). So, when b > ∇g(0), the minimum point of f_b(x) is x_1^b.

Case 2: Suppose that there exists 0 < x̂ < ∇g(0) such that C_{∇g(0)}(x̂) = ∇g(0) − x̂ > ∇g(x̂). Then D(x) and C_{∇g(0)}(x) have two intersection points, i.e., (0, ∇g(0)) and (x_2^{∇g(0)}, y_2^{∇g(0)}). It is easily checked that f_{∇g(0)} is strictly decreasing on (0, x_2^{∇g(0)}), so we have f_{∇g(0)}(x_2^{∇g(0)}) < f_{∇g(0)}(0). Also, since f_{b̄} is strictly increasing on (0, x_2^{b̄}), we have f_{b̄}(x_2^{b̄}) > f_{b̄}(0). Because f_{b̄}(x_2^{b̄}) > f_{b̄}(0) and f_{∇g(0)}(x_2^{∇g(0)}) < f_{∇g(0)}(0), a direct computation gives

    2 g(x_2^{b̄}) − 2 x_2^{b̄} ∇g(x_2^{b̄}) − (x_2^{b̄})^2 > 0,
    2 g(x_2^{∇g(0)}) − 2 x_2^{∇g(0)} ∇g(x_2^{∇g(0)}) − (x_2^{∇g(0)})^2 < 0.

So, according to the intermediate value theorem, there exists x̃ with x_2^{b̄} < x̃ < x_2^{∇g(0)} such that 2 g(x̃) − 2 x̃ ∇g(x̃) − x̃^2 = 0. Let b̃ = ∇g(x̃) + x̃. Then (x̃, ∇g(x̃)) is an intersection point of C_{b̃}(x) and D(x) such that f_{b̃}(x̃) = f_{b̃}(0). Since x_2^{b̄} < x̃ < x_2^{∇g(0)} and ∇g is convex and nonincreasing, we conclude that b̄ < b̃ < ∇g(0). Next, we set b* = inf{b | b̄ < b < ∇g(0), f_b(0) = f_b(x_2^b)}.

Given ∇g(0) > b > b*, we can easily see that f_b is increasing on (0, x_1^b), decreasing on (x_1^b, x_2^b) and increasing on (x_2^b, b). So 0 and x_2^b are the two local minimum points of f_b(x) on [0, b]. Next, for ∇g(0) > b > b*, set b = b* + ε for some ε > 0. We have

    f_b(x_2^{b*}) − f_b(0) = (1/2)(x_2^{b*} − b* − ε)^2 + g(x_2^{b*}) − (1/2)(b* + ε)^2
                           = (1/2)(x_2^{b*} − b*)^2 + g(x_2^{b*}) − (1/2)(b*)^2 − ε x_2^{b*}
                           = f_{b*}(x_2^{b*}) − f_{b*}(0) − ε x_2^{b*}
                           = −ε x_2^{b*} < 0.

Since f_b is decreasing on (x_2^{b*}, x_2^b), we conclude that f_b(0) > f_b(x_2^{b*}) ≥ f_b(x_2^b). So, when b > b*, x_2^b is the global minimum of f_b(x) on [0, b].

Then, for all b < b*, we show that f_b(0) ≤ f_b(x_2^b). We prove this by contradiction. Suppose that there exists b < b* such that f_b(0) > f_b(x_2^b). Because f_{b̄}(x_2^{b̄}) > f_{b̄}(0) and f_b(x_2^b) < f_b(0), a direct computation gives

    2 g(x_2^{b̄}) − 2 x_2^{b̄} ∇g(x_2^{b̄}) − (x_2^{b̄})^2 > 0,
    2 g(x_2^{b}) − 2 x_2^{b} ∇g(x_2^{b}) − (x_2^{b})^2 < 0.

So, according to the intermediate value theorem, there exists x̃' with x_2^{b̄} < x̃' < x_2^{b} such that 2 g(x̃') − 2 x̃' ∇g(x̃') − (x̃')^2 = 0. Let b̃' = ∇g(x̃') + x̃'. Then (x̃', ∇g(x̃')) is an intersection point of C_{b̃'}(x) and D(x) such that f_{b̃'}(x̃') = f_{b̃'}(0). Since x_2^{b̄} < x̃' < x_2^{b} and ∇g is convex and nonincreasing, we conclude that b̄ < b̃' < b < b*, which contradicts the minimality of b*. Next, when b ≤ b̄, we have ∇f_b(x) = D(x) − C_b(x) ≥ 0, so the global minimum of f_b(x) on [0, b] is 0. Also, when b > ∇g(0), C_b(x) = b − x and D(x) have only one intersection point (x_2^b, y_2^b); then f_b is decreasing on (0, x_2^b) and increasing on (x_2^b, b). So, when b > ∇g(0), the global minimum point of f_b(x) is x_2^b.

1.3 Proof of Corollary 1

Corollary 1. Given g satisfying Assumption 1 in problem (1), denote x̂_b = max{x | ∇f_b(x) = 0, x ≥ 0} and x_b* = argmin_{x ∈ {0, x̂_b}} f_b(x). Then x_b* is optimal to (1), i.e., x_b* ∈ Prox_g(b).

Proof. As shown in Propositions 2 and 3, when b is larger than a certain threshold, Prox_g(b) (x_2^b in (2) and (4), or x_1^b in (3)) is unique. Actually this unique solution is the largest intersection point of C_b(x) and ∇g(x), i.e., Prox_g(b) = x̂_b = max{x | ∇f_b(x) = 0, x ≥ 0}. For all the other choices of b, 0 ∈ Prox_g(b). Thus one of 0 and x̂_b is optimal to (1), and therefore x_b* = argmin_{x ∈ {0, x̂_b}} f_b(x) is optimal to (1).

2 Proof of Theorem 2

Theorem 2. For any lower bounded function g, its proximal operator Prox_g(·) is monotone, i.e., for any p_i ∈ Prox_g(x_i), i = 1, 2, we have p_1 ≥ p_2 when x_1 > x_2.

Proof. The lower boundedness of g guarantees that problem (1) has a finite solution. By the optimality of p_1 and p_2, we have

    g(p_1) + (1/2)(p_1 − x_1)^2 ≤ g(p_2) + (1/2)(p_2 − x_1)^2,    (5)
    g(p_2) + (1/2)(p_2 − x_2)^2 ≤ g(p_1) + (1/2)(p_1 − x_2)^2.    (6)

Summing them together gives

    (1/2)(p_1 − x_1)^2 + (1/2)(p_2 − x_2)^2 ≤ (1/2)(p_2 − x_1)^2 + (1/2)(p_1 − x_2)^2.    (7)

It reduces to

    (p_1 − p_2)(x_1 − x_2) ≥ 0.    (8)

Thus p_1 ≥ p_2 when x_1 > x_2.

3 Convergence Analysis of Algorithm 1

Assume that there exists

    x̂_b = max{x | ∇f_b(x) = ∇g(x) + x − b = 0, x ≥ 0};

otherwise, 0 is a solution to (1). We only need to prove that the fixed point iteration is guaranteed to find x̂_b. First, if ∇g(b) = 0, then we have found x̂_b = b. For the case x̂_b < b, we prove that the fixed point iteration, starting from x^0 = b, converges to x̂_b. Indeed, we have b − ∇g(x) < x for any x > x̂_b. We prove this by contradiction: assume there exists x̃ > x̂_b such that b − ∇g(x̃) > x̃. Notice that g satisfies Assumption 1, so ∇g is continuous, nonincreasing and nonnegative. Then we have b − ∇g(b) < b (note that ∇g(b) > 0 since b > x̂_b). Thus there must exist some x' ∈ (min(b, x̃), max(b, x̃)), with x' > x̂_b, such that b − ∇g(x') = x'. This contradicts the definition of x̂_b. So we have x^{k+1} = b − ∇g(x^k) < x^k whenever x^k > x̂_b. On the other hand, {x^k} is lower bounded by x̂_b. So {x^k} has a limit, denoted as x̄, which is no less than x̂_b. Letting k → +∞ on both sides of x^{k+1} = b − ∇g(x^k), we see that x̄ = b − ∇g(x̄). So x̄ = x̂_b, i.e., lim_{k→+∞} x^k = x̂_b.
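Before turning to the convergence of GPG, a quick numerical sanity check of Theorem 2 (not part of the supplementary material): for a nonconvex penalty that is only assumed to be lower bounded, a brute-force grid-search proximal operator is nondecreasing in b. The capped-l1 penalty and the grid resolution are illustrative choices of ours.

```python
import numpy as np

def prox_brute(b, g, xs):
    # Brute-force proximal operator on a grid: argmin_x g(x) + 0.5*(x - b)^2.
    vals = g(xs) + 0.5 * (xs - b) ** 2
    return xs[np.argmin(vals)]

# Nonconvex, lower bounded penalty: capped-l1, g(x) = min(x, 1).
g = lambda x: np.minimum(x, 1.0)

xs = np.linspace(0.0, 10.0, 100001)
bs = np.linspace(0.0, 5.0, 201)
p = np.array([prox_brute(b, g, xs) for b in bs])
# Monotonicity (Theorem 2): p should be nondecreasing in b.
print(np.all(np.diff(p) >= -1e-8))
```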

4 Convergence Analysis of the Generalized Proximal Gradient Algorithm

Consider the following problem:

    min_X  F(X) = Σ_{i=1}^m g(σ_i(X)) + h(X),    (9)

where g: R+ → R+ is continuous, concave and nondecreasing on [0, +∞), and h: R^{m×n} → R+ has Lipschitz continuous gradient with Lipschitz constant L(h). The Generalized Proximal Gradient (GPG) algorithm solves the above problem by the following updating rule:

    X^{k+1} = argmin_X  Σ_{i=1}^m g(σ_i(X)) + h(X^k) + ⟨∇h(X^k), X − X^k⟩ + (μ/2)‖X − X^k‖_F^2
            = argmin_X  Σ_{i=1}^m g(σ_i(X)) + (μ/2)‖X − X^k + (1/μ)∇h(X^k)‖_F^2.    (10)

Then we have the following results.

Theorem 3. If μ > L(h), the sequence {X^k} generated by (10) satisfies the following properties:
(1) F(X^k) is monotonically decreasing. Indeed,

    F(X^k) − F(X^{k+1}) ≥ ((μ − L(h))/2) ‖X^k − X^{k+1}‖_F^2;

(2) lim_{k→+∞} (X^k − X^{k+1}) = 0;
(3) if F(X) → +∞ when ‖X‖_F → +∞, then any limit point of {X^k} is a stationary point.

Proof. Since X^{k+1} is optimal to (10), we have

    Σ_{i=1}^m g(σ_i(X^{k+1})) + h(X^k) + ⟨∇h(X^k), X^{k+1} − X^k⟩ + (μ/2)‖X^{k+1} − X^k‖_F^2
        ≤ Σ_{i=1}^m g(σ_i(X^k)) + h(X^k) + ⟨∇h(X^k), X^k − X^k⟩ + (μ/2)‖X^k − X^k‖_F^2
        = Σ_{i=1}^m g(σ_i(X^k)).    (11)

On the other hand, since h has Lipschitz continuous gradient, we have [1]

    h(X^{k+1}) ≤ h(X^k) + ⟨∇h(X^k), X^{k+1} − X^k⟩ + (L(h)/2)‖X^{k+1} − X^k‖_F^2.    (12)

Combining (11) and (12) leads to

    F(X^k) − F(X^{k+1}) = Σ_{i=1}^m g(σ_i(X^k)) + h(X^k) − Σ_{i=1}^m g(σ_i(X^{k+1})) − h(X^{k+1})
                        ≥ ((μ − L(h))/2) ‖X^{k+1} − X^k‖_F^2.    (13)

Thus μ > L(h) guarantees that F(X^k) ≥ F(X^{k+1}). Summing (13) over k = 0, 1, 2, …, we get

    F(X^0) ≥ ((μ − L(h))/2) Σ_{k=0}^{+∞} ‖X^{k+1} − X^k‖_F^2.    (14)

This implies that

    lim_{k→+∞} (X^k − X^{k+1}) = 0.    (15)

Furthermore, since F(X) → +∞ when ‖X‖_F → +∞, the sequence {X^k} is bounded. There exist X* and a subsequence {X^{k_j}} such that lim_{j→+∞} X^{k_j} = X*. By using (15), we get lim_{j→+∞} X^{k_j+1} = X*. Considering that X^{k_j+1} is optimal to (10), there exists Q^{k_j+1} ∈ ∂( Σ_{i=1}^m g(σ_i(X^{k_j+1})) ) [3] such that

    Q^{k_j+1} + ∇h(X^{k_j}) + μ(X^{k_j+1} − X^{k_j}) = 0.    (16)

Letting j → +∞ in (16) and using the upper semi-continuity of the subdifferential [2], there exists Q* ∈ ∂( Σ_{i=1}^m g(σ_i(X*)) ) such that

    0 = Q* + ∇h(X*) ∈ ∂F(X*).    (17)

Thus X* is a stationary point of (9).

References

[1] Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2009.
[2] Frank Clarke. Nonsmooth analysis and optimization. In Proceedings of the International Congress of Mathematicians, 1983.
[3] Adrian S. Lewis and Hristo S. Sendov. Nonsmooth analysis of singular values. Part I: Theory. Set-Valued Analysis, 13(3):213–241, 2005.


More information

Linearized Alternating Direction Method: Two Blocks and Multiple Blocks. Zhouchen Lin 林宙辰北京大学

Linearized Alternating Direction Method: Two Blocks and Multiple Blocks. Zhouchen Lin 林宙辰北京大学 Linearized Alternating Direction Method: Two Blocks and Multiple Blocks Zhouchen Lin 林宙辰北京大学 Dec. 3, 014 Outline Alternating Direction Method (ADM) Linearized Alternating Direction Method (LADM) Two Blocks

More information

Nonconvex Sparse Logistic Regression with Weakly Convex Regularization

Nonconvex Sparse Logistic Regression with Weakly Convex Regularization Nonconvex Sparse Logistic Regression with Weakly Convex Regularization Xinyue Shen, Student Member, IEEE, and Yuantao Gu, Senior Member, IEEE Abstract In this work we propose to fit a sparse logistic regression

More information

Accelerated primal-dual methods for linearly constrained convex problems

Accelerated primal-dual methods for linearly constrained convex problems Accelerated primal-dual methods for linearly constrained convex problems Yangyang Xu SIAM Conference on Optimization May 24, 2017 1 / 23 Accelerated proximal gradient For convex composite problem: minimize

More information

Low-rank Matrix Completion with Noisy Observations: a Quantitative Comparison

Low-rank Matrix Completion with Noisy Observations: a Quantitative Comparison Low-rank Matrix Completion with Noisy Observations: a Quantitative Comparison Raghunandan H. Keshavan, Andrea Montanari and Sewoong Oh Electrical Engineering and Statistics Department Stanford University,

More information

Exact Recoverability of Robust PCA via Outlier Pursuit with Tight Recovery Bounds

Exact Recoverability of Robust PCA via Outlier Pursuit with Tight Recovery Bounds Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence Exact Recoverability of Robust PCA via Outlier Pursuit with Tight Recovery Bounds Hongyang Zhang, Zhouchen Lin, Chao Zhang, Edward

More information

Low Rank Matrix Completion Formulation and Algorithm

Low Rank Matrix Completion Formulation and Algorithm 1 2 Low Rank Matrix Completion and Algorithm Jian Zhang Department of Computer Science, ETH Zurich zhangjianthu@gmail.com March 25, 2014 Movie Rating 1 2 Critic A 5 5 Critic B 6 5 Jian 9 8 Kind Guy B 9

More information

A Local Non-Negative Pursuit Method for Intrinsic Manifold Structure Preservation

A Local Non-Negative Pursuit Method for Intrinsic Manifold Structure Preservation Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence A Local Non-Negative Pursuit Method for Intrinsic Manifold Structure Preservation Dongdong Chen and Jian Cheng Lv and Zhang Yi

More information

arxiv: v3 [stat.me] 8 Jun 2018

arxiv: v3 [stat.me] 8 Jun 2018 Between hard and soft thresholding: optimal iterative thresholding algorithms Haoyang Liu and Rina Foygel Barber arxiv:804.0884v3 [stat.me] 8 Jun 08 June, 08 Abstract Iterative thresholding algorithms

More information

Sparse Regularization via Convex Analysis

Sparse Regularization via Convex Analysis Sparse Regularization via Convex Analysis Ivan Selesnick Electrical and Computer Engineering Tandon School of Engineering New York University Brooklyn, New York, USA 29 / 66 Convex or non-convex: Which

More information

PRINCIPAL Component Analysis (PCA) is a fundamental approach

PRINCIPAL Component Analysis (PCA) is a fundamental approach IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE Tensor Robust Principal Component Analysis with A New Tensor Nuclear Norm Canyi Lu, Jiashi Feng, Yudong Chen, Wei Liu, Member, IEEE, Zhouchen

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

Fast proximal gradient methods

Fast proximal gradient methods L. Vandenberghe EE236C (Spring 2013-14) Fast proximal gradient methods fast proximal gradient method (FISTA) FISTA with line search FISTA as descent method Nesterov s second method 1 Fast (proximal) gradient

More information

Analysis of Robust PCA via Local Incoherence

Analysis of Robust PCA via Local Incoherence Analysis of Robust PCA via Local Incoherence Huishuai Zhang Department of EECS Syracuse University Syracuse, NY 3244 hzhan23@syr.edu Yi Zhou Department of EECS Syracuse University Syracuse, NY 3244 yzhou35@syr.edu

More information

Lecture 8: February 9

Lecture 8: February 9 0-725/36-725: Convex Optimiation Spring 205 Lecturer: Ryan Tibshirani Lecture 8: February 9 Scribes: Kartikeya Bhardwaj, Sangwon Hyun, Irina Caan 8 Proximal Gradient Descent In the previous lecture, we

More information

Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation

Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation Adam J. Rothman School of Statistics University of Minnesota October 8, 2014, joint work with Liliana

More information

Analysis Methods for Supersaturated Design: Some Comparisons

Analysis Methods for Supersaturated Design: Some Comparisons Journal of Data Science 1(2003), 249-260 Analysis Methods for Supersaturated Design: Some Comparisons Runze Li 1 and Dennis K. J. Lin 2 The Pennsylvania State University Abstract: Supersaturated designs

More information

Application of Tensor and Matrix Completion on Environmental Sensing Data

Application of Tensor and Matrix Completion on Environmental Sensing Data Application of Tensor and Matrix Completion on Environmental Sensing Data Michalis Giannopoulos 1,, Sofia Savvaki 1,, Grigorios Tsagkatakis 1, and Panagiotis Tsakalides 1, 1- Institute of Computer Science

More information

Homotopy methods based on l 0 norm for the compressed sensing problem

Homotopy methods based on l 0 norm for the compressed sensing problem Homotopy methods based on l 0 norm for the compressed sensing problem Wenxing Zhu, Zhengshan Dong Center for Discrete Mathematics and Theoretical Computer Science, Fuzhou University, Fuzhou 350108, China

More information

Conditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint

Conditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint Conditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint Marc Teboulle School of Mathematical Sciences Tel Aviv University Joint work with Ronny Luss Optimization and

More information

Minimizing a convex separable exponential function subject to linear equality constraint and bounded variables

Minimizing a convex separable exponential function subject to linear equality constraint and bounded variables Minimizing a convex separale exponential function suect to linear equality constraint and ounded variales Stefan M. Stefanov Department of Mathematics Neofit Rilski South-Western University 2700 Blagoevgrad

More information

Stochastic Proximal Gradient Algorithm

Stochastic Proximal Gradient Algorithm Stochastic Institut Mines-Télécom / Telecom ParisTech / Laboratoire Traitement et Communication de l Information Joint work with: Y. Atchade, Ann Arbor, USA, G. Fort LTCI/Télécom Paristech and the kind

More information

A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices

A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices Ryota Tomioka 1, Taiji Suzuki 1, Masashi Sugiyama 2, Hisashi Kashima 1 1 The University of Tokyo 2 Tokyo Institute of Technology 2010-06-22

More information

A Customized ADMM for Rank-Constrained Optimization Problems with Approximate Formulations

A Customized ADMM for Rank-Constrained Optimization Problems with Approximate Formulations A Customized ADMM for Rank-Constrained Optimization Problems with Approximate Formulations Chuangchuang Sun and Ran Dai Abstract This paper proposes a customized Alternating Direction Method of Multipliers

More information

Non-Convex Rank/Sparsity Regularization and Local Minima

Non-Convex Rank/Sparsity Regularization and Local Minima Non-Convex Rank/Sparsity Regularization and Local Minima Carl Olsson, Marcus Carlsson Fredrik Andersson Viktor Larsson Department of Electrical Engineering Chalmers University of Technology {calle,mc,fa,viktorl}@maths.lth.se

More information

An iterative hard thresholding estimator for low rank matrix recovery

An iterative hard thresholding estimator for low rank matrix recovery An iterative hard thresholding estimator for low rank matrix recovery Alexandra Carpentier - based on a joint work with Arlene K.Y. Kim Statistical Laboratory, Department of Pure Mathematics and Mathematical

More information

Fantope Regularization in Metric Learning

Fantope Regularization in Metric Learning Fantope Regularization in Metric Learning CVPR 2014 Marc T. Law (LIP6, UPMC), Nicolas Thome (LIP6 - UPMC Sorbonne Universités), Matthieu Cord (LIP6 - UPMC Sorbonne Universités), Paris, France Introduction

More information

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 XVI - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 A slightly changed ADMM for convex optimization with three separable operators Bingsheng He Department of

More information

Exact penalty decomposition method for zero-norm minimization based on MPEC formulation 1

Exact penalty decomposition method for zero-norm minimization based on MPEC formulation 1 Exact penalty decomposition method for zero-norm minimization based on MPEC formulation Shujun Bi, Xiaolan Liu and Shaohua Pan November, 2 (First revised July 5, 22) (Second revised March 2, 23) (Final

More information

Stochastic dynamical modeling:

Stochastic dynamical modeling: Stochastic dynamical modeling: Structured matrix completion of partially available statistics Armin Zare www-bcf.usc.edu/ arminzar Joint work with: Yongxin Chen Mihailo R. Jovanovic Tryphon T. Georgiou

More information

Supplemental Figures: Results for Various Color-image Completion

Supplemental Figures: Results for Various Color-image Completion ANONYMOUS AUTHORS: SUPPLEMENTAL MATERIAL (NOVEMBER 7, 2017) 1 Supplemental Figures: Results for Various Color-image Completion Anonymous authors COMPARISON WITH VARIOUS METHODS IN COLOR-IMAGE COMPLETION

More information

EE 381V: Large Scale Optimization Fall Lecture 24 April 11

EE 381V: Large Scale Optimization Fall Lecture 24 April 11 EE 381V: Large Scale Optimization Fall 2012 Lecture 24 April 11 Lecturer: Caramanis & Sanghavi Scribe: Tao Huang 24.1 Review In past classes, we studied the problem of sparsity. Sparsity problem is that

More information

EUSIPCO

EUSIPCO EUSIPCO 013 1569746769 SUBSET PURSUIT FOR ANALYSIS DICTIONARY LEARNING Ye Zhang 1,, Haolong Wang 1, Tenglong Yu 1, Wenwu Wang 1 Department of Electronic and Information Engineering, Nanchang University,

More information

A direct formulation for sparse PCA using semidefinite programming

A direct formulation for sparse PCA using semidefinite programming A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon

More information

Matrix Completion for Structured Observations

Matrix Completion for Structured Observations Matrix Completion for Structured Observations Denali Molitor Department of Mathematics University of California, Los ngeles Los ngeles, C 90095, US Email: dmolitor@math.ucla.edu Deanna Needell Department

More information

A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models

A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models Jingyi Jessica Li Department of Statistics University of California, Los

More information

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of

More information

MATRIX RECOVERY FROM QUANTIZED AND CORRUPTED MEASUREMENTS

MATRIX RECOVERY FROM QUANTIZED AND CORRUPTED MEASUREMENTS MATRIX RECOVERY FROM QUANTIZED AND CORRUPTED MEASUREMENTS Andrew S. Lan 1, Christoph Studer 2, and Richard G. Baraniuk 1 1 Rice University; e-mail: {sl29, richb}@rice.edu 2 Cornell University; e-mail:

More information

ACCELERATED LINEARIZED BREGMAN METHOD. June 21, Introduction. In this paper, we are interested in the following optimization problem.

ACCELERATED LINEARIZED BREGMAN METHOD. June 21, Introduction. In this paper, we are interested in the following optimization problem. ACCELERATED LINEARIZED BREGMAN METHOD BO HUANG, SHIQIAN MA, AND DONALD GOLDFARB June 21, 2011 Abstract. In this paper, we propose and analyze an accelerated linearized Bregman (A) method for solving the

More information

A Primal-dual Three-operator Splitting Scheme

A Primal-dual Three-operator Splitting Scheme Noname manuscript No. (will be inserted by the editor) A Primal-dual Three-operator Splitting Scheme Ming Yan Received: date / Accepted: date Abstract In this paper, we propose a new primal-dual algorithm

More information

A Proximal Alternating Direction Method for Semi-Definite Rank Minimization (Supplementary Material)

A Proximal Alternating Direction Method for Semi-Definite Rank Minimization (Supplementary Material) A Proximal Alternating Direction Method for Semi-Definite Rank Minimization (Supplementary Material) Ganzhao Yuan and Bernard Ghanem King Abdullah University of Science and Technology (KAUST), Saudi Arabia

More information

Ranking from Crowdsourced Pairwise Comparisons via Matrix Manifold Optimization

Ranking from Crowdsourced Pairwise Comparisons via Matrix Manifold Optimization Ranking from Crowdsourced Pairwise Comparisons via Matrix Manifold Optimization Jialin Dong ShanghaiTech University 1 Outline Introduction FourVignettes: System Model and Problem Formulation Problem Analysis

More information

Coherent imaging without phases

Coherent imaging without phases Coherent imaging without phases Miguel Moscoso Joint work with Alexei Novikov Chrysoula Tsogka and George Papanicolaou Waves and Imaging in Random Media, September 2017 Outline 1 The phase retrieval problem

More information

Low-Rank Tensor Completion by Truncated Nuclear Norm Regularization

Low-Rank Tensor Completion by Truncated Nuclear Norm Regularization Low-Rank Tensor Completion by Truncated Nuclear Norm Regularization Shengke Xue, Wenyuan Qiu, Fan Liu, and Xinyu Jin College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou,

More information