FAST FIRST-ORDER METHODS FOR STABLE PRINCIPAL COMPONENT PURSUIT

Size: px
Start display at page:

Download "FAST FIRST-ORDER METHODS FOR STABLE PRINCIPAL COMPONENT PURSUIT"

Transcription

1 FAST FIRST-ORDER METHODS FOR STABLE PRINCIPAL COMPONENT PURSUIT N. S. AYBAT, D. GOLDFARB, AND G. IYENGAR Abstract. The stable principal component pursuit SPCP problem is a non-smooth convex optimization problem, the solution of which has been shown both in theory and in practice to enable one to recover the low ran and sparse components of a matrix whose elements have been corrupted by Gaussian noise. In this paper, we first show how several existing fast first-order methods can be applied to this problem very efficiently. Specifically, we show that the subproblems that arise when applying optimal gradient methods of Nesterov, alternating linearization methods and alternating direction augmented Lagrangian methods to the SPCP problem either have closed-form solutions or have solutions that can be obtained with very modest effort. Later, we develop a new first order algorithm, NSA, based on partial variable splitting. All but one of the methods analyzed require at least one of the non-smooth terms in the objective function to be smoothed and obtain an ɛ-optimal solution to the SPCP problem in O1/ɛ iterations. NSA, which wors directly with the fully non-smooth objective function, is proved to be convergent under mild conditions on the sequence of parameters it uses. Our preliminary computational tests show that the latter method, NSA, although its complexity is not nown, is the fastest among the four algorithms described and substantially outperforms ASALM, the only existing method for the SPCP problem. To best of our nowledge, an algorithm for the SPCP problem that has O1/ɛ iteration complexity and has a per iteration complexity equal to that of a singular value decomposition is given for the first time. 1. Introduction. In [, 1], it was shown that when the data matrix D R m n is of the form D = X 0 + S 0, where X 0 is a low-ran matrix, i.e. ranx 0 min{m, n}, and S 0 is a sparse matrix, i.e. S 0 0 mn. 0 counts the number of nonzero elements of its argument, one can recover the low-ran and sparse components of D by solving the principal component pursuit problem 1 where ξ =. max{m,n} min X + ξ D X 1, 1.1 X R m n For X R m n, X denotes the nuclear norm of X, which is equal to the sum of its singular values, X 1 := m n i=1 j=1 X ij, X := max{ X ij : 1 i m, 1 j n} and X := σ max X, where σ max X is the maximum singular value of X. To be more precise, let X 0 R m n with ranx 0 = r and let X 0 = UΣV T = r i=1 σ iu i vi T denote the singular value decomposition SVD of X 0. Suppose that for some µ > 0, U and V satisfy max U T e i µr i m, max V T e i µr i n, UV T µr mn, 1. where e i denotes the i-th unit vector. Theorem 1.1. [] Suppose D = X 0 + S 0, where X 0 R m n with m < n satisfies 1. for some µ > 0, and the support set of S 0 is uniformly distributed. Then there are constants c, ρ r, ρ s such that with probability of at least 1 cn 10, the principal component pursuit problem 1.1 exactly recovers X 0 and S 0 provided that ranx 0 ρ r mµ 1 logn and S 0 0 ρ s mn. 1.3 In [13], it is shown that the recovery is still possible even when the data matrix, D, is corrupted with a dense error matrix, ζ 0 such that ζ 0 F δ, by solving the stable principal component pursuit SPCP problem Specifically, the following theorem is proved in [13]. P : min X,S R m n{ X + ξ S 1 : X + S D F δ}. 1.4 IEOR Department, Columbia University. nsa106@columbia.edu. IEOR Department, Columbia University. goldfarb@columbia.edu. IEOR Department, Columbia University. gi10@columbia.edu. 
Research partially supported by ONR grant N , NSF Grant DMS and DOE Grant DE-FG

2 Theorem 1.. [13] Suppose D = X 0 + S 0 + ζ 0, where X 0 R m n with m < n satisfies 1. for some µ > 0, and the support set of S 0 is uniformly distributed. If X 0 and S 0 satisfy 1.3, then for any ζ 0 such that ζ 0 F δ the solution, X, S, to the stable principal component pursuit problem 1.4 satisfies X X 0 F + S S 0 F Cmnδ for some constant C with high probability. Principal component pursuit and stable principal component pursuit both have applications in video surveillance and face recognition. For existing algorithmic approaches to solving principal component pursuit see [, 3, 6, 7, 13] and references therein. In this paper, we develop four different fast first-order algorithms to solve the SPCP problem P. The first two algorithms are direct applications of Nesterov s optimal algorithm [9] and the proximal gradient method of Tseng [11], which is inspired by both FISTA and Nesterov s infinite memory algorithms that are introduced in [1] and [9], respectively. In this paper it is shown that both algorithms can compute an ɛ-optimal, feasible solution to P in O1/ɛ iterations. The third and fourth algorithms apply an alternating direction augmented Lagrangian approach to an equivalent problem obtained by partial variable splitting. The third algorithm can compute an ɛ-optimal, feasible solution to the problem in O1/ɛ iterations, which can be easily improved to O1/ɛ complexity. Given ɛ > 0, all first three algorithms use suitably smooth versions of at least one of the norms in the objective function. The fourth algorithm NSA wors directly with the original non-smooth objective function and can be shown to converge to an optimal solution of P, provided that a mild condition on the increasing sequence of penalty multipliers holds. To best of our nowledge, an algorithm for the SPCP problem that has O1/ɛ iteration complexity and has a per iteration complexity equal to that of a singular value decomposition is given for the first time. The only algorithm that we now of that has been designed to solve the SPCP problem P is the algorithm ASALM [10]. The results of our numerical experiments comparing NSA algorithm with ASALM has shown that NSA is faster and also more robust to changes in problem parameters.. Proximal Gradient Algorithm with Smooth Objective Function. In this section we show that Nesterov s optimal algorithm [8, 9] for simple sets is efficient for solving P. For fixed parameters µ > 0 and ν > 0, define the smooth C 1,1 functions f µ. and g ν. as follows f µ X = g ν S = max X, U µ U R m n : U 1 U F,.1 max S, W ν W R m n : W 1 W F.. Clearly, f µ. and g ν. closely approximate the non-smooth functions fx := X and gs := S 1, respectively. Also let χ := {X, S R m n R m n : X + S D F δ} and L = 1 µ + 1 ν, where 1 µ and 1 ν are the Lipschitz constants for the gradients of f µ. and g ν., respectively. Then Nesterov s optimal algorithm [8, 9] for simple sets applied to the problem: min X,S R m n{f µx + ξ g ν S : X, S χ},.3 is given by Algorithm 1. Because of the simple form of the set χ, it is easy to ensure that all iterates Y x, Y s, Zx, Zs and X +1, S +1 lie in χ. Hence, Algorithm 1 enjoys the full convergence rate of OL/ of the Nesterov s method. Thus, setting µ = Ωɛ and ν = Ωɛ, Algorithm 1 computes an ɛ-optimal and feasible solution to problem P in = O1/ɛ iterations. 
The iterates Y x, Y s and Zx, Zs that need to be computed at each iteration of Algorithm 1 are solutions to an optimization problem of the form: { L P s : min X X,S R m n X F + S S } F + Q x, X + Q s, S : X, S χ..4 The following lemma shows that the solution to problems of the form P s can be computed efficiently.

3 Algorithm 1 SMOOTH PROXIMAL GRADIENTX 0, S 0 1: input: X 0 R m n, S 0 R m n : 0 3: while do 4: Compute f µx and { g νs } 5: Y x, Y s argmin X,S fµx, X + g νs, S + L X X F + S S F : X, S χ 6: Γ X, S := i+1 i=0 { fµxi, X + gνsi, S } { } 7: Z x, Z s argmin X,S Γ X, S + L X X0 F + S S 0 F : X, S χ 8: X +1, S +1 9: : end while 11: return X, S Y x, Y s + +3 Z x, Z s Lemma.1. The optimal solution X, S to problem P s can be written in closed form as follows. When δ > 0, θ X S L + θ = D q s + q x X,.5 L + θ θ S = L + θ where q x X := X 1 L Q x, q s S := S 1 L Q s and θ = max { 0, D q x X + L + θ L + θ L + θ L q x X + q s S D F δ q s S,.6 1 }..7 When δ = 0, X = 1 D q s S + 1 q x X and S = 1 D q x X + 1 q s S..8 Proof. Suppose that δ > 0. Writing the constraint in problem P s, X, S χ, as the Lagrangian function for.4 is given as LX, S; θ = L 1 X + S D F δ,.9 X X F + S S F + Q x, X X + Q s, S S + θ X + S D F δ. Therefore, the optimal solution X, S and optimal Lagrangian multiplier θ R must satisfy the Karush- Kuhn-Tucer KKT conditions: i. X + S D F δ, ii. θ 0, iii. θ X + S D F δ = 0, iv. LX X + θ X + S D + Q x = 0, v. LS S + θ X + S D + Q s = 0. Conditions iv and v imply that X, S satisfy.5 and.6, from which it follows that X + S D = L L + θ q x X + q s S D..10 3

4 Case 1: q x X + q s S D F δ. Setting X = q x X, S = q s S and θ = 0, clearly satisfies.5,.6 and conditions i from.10, ii and iii. Thus, this choice of variables satisfies all the five KKT conditions. Case : q x X+q s S D F > δ. Set θ = L qx X+q s S D F δ 1. Since q x X+q s S D F > δ, θ > 0; hence, ii is satisfied. Moreover, for this value of θ, it follows from.10 that X +S D F = δ. Thus, KKT conditions i and iii are satisfied. Therefore, setting X and S according to.5 and.6, respectively; and setting { θ L q x = max 0, X + q s S } D F 1, δ satisfies all the five KKT conditions. Now, suppose that δ = 0. Since S = D X, problem P s can be written as min X R m n X X + Qx L F + D X S + Qs L F, which is also equivalent to the problem: min X R m n X q x X F + X D q s S F. Then.8 trivially follows from first-order optimality conditions for this problem and the fact that S = D X. 3. Proximal Gradient Algorithm with Partially Smooth Objective Function. In this section we show how the proximal gradient algorithm, Algorithm 3 in [11], can be applied to the problem min X,S R m n{f µx + ξ S 1 : X, S χ}, 3.1 where f µ. is the smooth function defined in.1 such that f µ. is Lipschitz continuous with constant L µ = 1 µ. This algorithm is given in Algorithm. Algorithm PARTIALLY SMOOTH PROXIMAL GRADIENTX 0, S 0 1: input: X 0 R m n, S 0 R m n : Z0 x, Z0 s X 0, S 0, 0 3: while do 4: Y x, Y s + 5: Compute f µy x X, S + + { 6: Z+1, x Z+1 s argmin X,S i=0 7: X +1, S +1 X, S + 8: + 1 9: end while 10: return X, S + Z x, Z s i+1 } x {ξ S 1 + fµyi, X } + Lµ X X0 F : X, S χ Z+1, x Z+1 s + Mimicing the proof in [11], it is easy to show that Algorithm, which uses the prox function 1 X X 0 F, converges to the optimal solution of 3.1. Given X 0, S 0 χ, e.g. X 0 = 0 and S 0 = D, the current algorithm eeps all iterates in χ as in Algorithm 1, and hence it enjoys the full convergence rate of OL/. Thus, setting µ = Ωɛ, Algorithm computes an ɛ-optimal, feasible solution of problem P in = O1/ɛ iterations. The only thing left to be shown is that the optimization subproblems in Algorithm can be solved efficiently. The subproblem that has to be solved at each iteration to compute Z+1 x, Zs +1 has the form: { P ns : min ξ S 1 + Q, X X + ρ X X } F : X, S χ, 3. for some ρ > 0. Lemma 3.1 shows that these computations can be done efficiently. 4

5 Lemma 3.1. The optimal solution X, S to problem P ns can be written in closed form as follows. When δ > 0, S = sign D q X { max D q X ξ ρ + } θ ρθ E, 0, 3.3 X = θ ρ + θ D S + ρ ρ + θ q X, 3.4 where q X := X 1 ρ Q, E and 0 Rm n are matrices with all components equal to ones and zeros, respectively, and denotes the componentwise multiplication operator. θ = 0 if D q X F δ; otherwise, θ is the unique positive solution of the nonlinear equation φθ = δ, where { ξ φθ := min θ E, ρ ρ + θ } D q X F. 3.5 Moreover, θ can be efficiently computed in Omn logmn time. When δ = 0, S = sign D q X max { D q X } ξρ E, 0 and X = D S. 3.6 Proof. Suppose that δ > 0. Let X, S be an optimal solution to problem P ns and θ denote the optimal Lagrangian multiplier for the constraint X, S χ written as.9. Then the KKT optimality conditions for this problem are i. Q + ρx X + θ X + S D = 0, ii. ξg + θ X + S D = 0 and G S 1, iii. X + S D F δ, iv. θ 0, v. θ X + S D F δ = 0. From i and ii, we have [ ] [ ] ρ + θ I θ I X θ I θ = I S [ θ D + ρ q X θ D ξg ], 3.7 where q X = X 1 ρ Q. From 3.7 it follows that [ ρ + θ I θ I 0 ρθ ρ+θ I ] [ ] [ X S = ρθ ρ+θ θ D + ρ q X D q X ξg ]. 3.8 From the second equation in 3.8, we have ξ ρ + θ ρθ G + S + q X D = But 3.9 is precisely the first-order optimality conditions for the shrinage problem { min S R m n ξ ρ + θ ρθ S S + q X D F Thus, S is the optimal solution to the shrinage problem and is given by follows from the first equation in 3.8, and it implies X + S D = }. ρ ρ + θ S + q X D

6 Therefore, X + S D F = ρ ρ + θ S + q X D F, = ρ ρ + θ sign D q X = ρ { { D q X ξ ρ + θ } ρθ E, 0 D q X F, max ρ + θ max D q X ξ ρ + } θ ρθ E, 0 D q X F, = ρ { ρ + θ min ξ ρ + } θ ρθ E, D q X F, { } ξ = min θ E, ρ D q X ρ + θ F, 3.11 where the second equation uses 3.3. Now let φ : R + R + be { } ξ φθ := min θ E, ρ D q X F. 3.1 ρ + θ Case 1: D q X F δ. θ = 0, S = 0 and X = q X trivially satisfy all the KKT conditions. Case : D q X F > δ. It is easy to show that φ. is a strictly decreasing function of θ. Since φ0 = D q X F > δ and lim θ φθ = 0, there exists a unique θ > 0 such that φθ = δ. Given θ, S and X can then be computed from equations 3.3 and 3.4, respectively. Moreover, since θ > 0 and φθ = δ, 3.11 implies that X, S and θ satisfy the KKT conditions. We now show that θ can be computed in Omn logmn time. Let A := D q X and 0 a 1 a... a mn be the mn elements of the matrix A sorted in increasing order, which can be done in Omn logmn time. Defining a 0 := 0 and a mn+1 :=, we then have for all j {0, 1,..., mn} that ρ ρ + θ a j ξ θ ρ ρ + θ a j+1 1 ξ a j 1 ρ 1 θ 1 ξ a j+1 1 ρ For all < j mn define θ j such that 1 θ j = 1 ξ a j 1 ρ and let := max Then for all < j mn { j : 1 θ j } 0, j {0, 1,..., mn}. j φθ j = ρ ξ a ρ + θ i + mn j j θ j i=0 Also define θ := and θ mn+1 := 0 so that φθ := 0 and φθ mn+1 = φ0 = A F > δ. Note that {θ j } { <j mn} contains all the points at which φθ may not be differentiable for θ 0. Define j := max{j : φθ j δ, j mn}. Then θ is the unique solution of the system j ρ ξ a ρ + θ i + mn j = δ and θ > 0, 3.15 θ i=0 since φθ is continuous and strictly decreasing in θ for θ 0. Solving the equation in 3.15 requires finding the roots of a fourth-order polynomial a..a. quartic function; therefore, one can compute θ > 0 using the algebraic solutions of quartic equations as shown by Lodovico Ferrari in 1540, which requires O1 operations. Note that if = mn, then θ is the solution of the equation ρ mn ρ + θ a i = δ, i=1

7 i.e. θ = ρ A F δ 1 = ρ D X F δ 1. Hence, we have proved that problem P ns can be solved efficiently. Now, suppose that δ = 0. Since S = D X, problem P ns can be written as min S R m n ξ ρ S S D q X F Then 3.6 trivially follows from first-order optimality conditions for the above problem and the fact that X = D S. The following lemma will be used later in Section 5. However, we give its proof here, since it uses some equations from the proof of Lemma 3.1. Let 1 χ.,. denote the indicator function of the closed convex set χ R m n R m n, i.e. if Z, S χ, then 1 χ Z, S = 0; otherwise, 1 χ Z, S =. Lemma 3.. Suppose that δ > 0. Let X, S be an optimal solution to problem P ns and θ be an optimal Lagrangian multiplier such that X, S and θ together satisfy the KKT conditions, i-v in the proof of Lemma 3.1. Then W, W 1 χ X, S, where W := Q + ρ X X = θ X + S D. Proof. Let W := Q + ρ X X, then from i and v of the KKT optimality conditions in the proof of Lemma 3.1, we have W = θ X + S D and W F = θ X + S D = θ X + S D δ + θ δ = θ δ Moreover, for all X, S χ, it follows from the definition of χ that W, θ X + S D θ W F X + S D F θ δ W F. Thus, for all X, S χ, we have W, W = W F = θ δ W F W, θ X + S D. Hence, 0 W, θ X + S D W = W, θ X X + S S X, S χ It follows from the proof of Lemma 3.1 that if D q X F > δ, then θ > 0, where q X = X 1 ρ Q. Therefore, 3.19 implies that 0 W, X X + S S X, S χ. 3.0 On the other hand, if D q X F δ, then θ = 0. Hence W = θ X + S D = 0, and 3.0 follows trivially. Therefore, 3.0 always holds and this shows that W, W 1 χ X, S. 4. Alternating Linearization and Augmented Lagrangian Algorithms. In this and the next section we present algorithms for solving problems 3.1 and 1.4 that are based on partial variable splitting combined with alternating minimization of a suitably linearized augmented Lagrangian function. We can write problems 1.4 and 3.1 generically as min X,S Rm n{φx + ξ gs : X, S χ}. 4.1 For problem 1.4, φx = fx = X, while for problem 3.1, φx = f µ X given in.1. In this section, we first assume that assume that φ : R m n R and g : R m n R m n R are any closed convex functions such that φ is Lipschitz continuous, and χ is a general closed convex set. Here we use partial variable splitting, i.e. we only split the X variables in 4.1, to arrive at the following equivalent problem min X,S,Z Rm n{φx + ξ gs : X = Z, Z, S χ}. 4. Let ψz, S := ξ gs + 1 χ Z, S and define the augmented Lagrangian function L ρ X, Z, S; Y = φx + ψz, S + Y, X Z + ρ X Z F. 4.3 Then minimizing 4.3 by alternating between X and then Z, S leads to several possible methods that can compute a solution to 4.. These include the alternating linearization method ALM with sipping step 7

8 Algorithm 3 ALM-SY 0 1: input: X 0 R m n, S 0 R m n, Y 0 R m n : Z 0 X 0, 0 3: while 0 do 4: X +1 argmin X L ρx, Z, S ; Y 5: if φx +1 + ψx +1, S > L ρx +1, Z, S ; Y then 6: X +1 Z 7: end if 8: Z +1, S +1 argmin Z,S ψz, S + φx +1 + φx +1, Z X +1 + ρ Z X +1 F 9: Y +1 φx +1 + ρx +1 Z +1 10: : end while that has an O ρ convergence rate, and the fast version of this method with an O ρ rate see [3] for full splitting versions of these methods. In this paper, we only provide a proof of the complexity result for the alternating linearization method with sipping steps ALM-S in Theorem 4.1 below. One can easily extend the proof of Theorem 4.1 to an ALM method based on 4.3 with the function gs replaced by a suitably smoothed version see [3] for the details of ALM algorithm. Theorem 4.1. Let φ : R m n R and ψ : R m n R m n R be closed convex functions such that φ is Lipschitz continuous with Lipschitz constant L, and χ be a closed convex set. Let ΦX, S := φx+ψx, S. For ρ L, the sequence {Z, S } Z+ in Algorithm ALM-S satisfies ΦZ, S ΦX, S ρ X 0 X F + n, 4.4 where X, S = argmin X,S R m n ΦX, S, n := 1 i=0 1 {ΦX i+1,s i>l ρx i+1,z i,s i;y i} and 1 {.} is 1 if its argument is true; otherwise, 0. Proof. See Appendix A for the proof. We obtain Algorithm 4 by applying Algorithm 3 to solve problem 3.1, where the smooth function φx = f µ X, defined in.1, the non-smooth closed convex function is ξ S χ X, S and χ = {X, S R m n R m n : X + S D F δ}. Theorem 4.1 shows that Algorithm 4 has an iteration complexity of O 1 ɛ to obtain ɛ-optimal and feasible solution of P. Algorithm 4 PARTIALLY SMOOTH ALMY 0 1: input: Y 0 R m n : Z 0 0, S 0 D, 0 3: while 0 do 4: X +1 argmin X f µx + Y, X Z + ρ X Z F 5: B f µx +1 + ξ S 1 + Y, X +1 Z + ρ X +1 Z F 6: if f µx +1 + ξ S χx +1, S > B then 7: X +1 Z 8: end if 9: Z +1, S +1 argmin Z,S {ξ S 1 + f µx +1, Z X +1 + ρ Z X +1 F : Z, S χ} 10: Y +1 f µx +1 + ρx +1 Z +1 11: + 1 1: end while Using the fast version of Algorithm 3, a fast version of Algorithm 4 with Oρ/ convergence rate, employing partial splitting and alternating linearization, can be constructed. This fast version can compute an ɛ-optimal and feasible solution to problem P in O1/ɛ iterations. Moreover, lie the proximal gradient methods described earlier, each iteration for these methods can be computed efficiently. The subproblems 8

9 to be solved at each iteration of Algorithm 4 and its fast version have the following generic form: min X R m n f µx + Q, X X + ρ X X F, 4.5 min {ξ S 1 + Q, Z Z + ρ Z,S R m n Z Z F : Z, S χ}. 4.6 Let U diagσv T denote the singular value decomposition of the matrix X Q/ρ, then X, the minimizer of the subproblem in 4.5, can be easily computed as U diag σ V T. And Lemma 3.1 shows how to solve the subproblem in 4.6. σ max{ρσ, 1+ρµ} 5. Non-smooth Augmented Lagrangian Algorithm. Algorithm 5 is a Non-Smooth Augmented Lagrangian Algorithm NSA that solves the non-smooth problem P. The subproblem in Step 4 of Algorithm 5 is a matrix shrinage problem and can be solved efficiently by computing a singular value decomposition SVD of an m n matrix; and Lemma 3.1 shows that the subproblem in Step 6 can also be solved efficiently. Algorithm 5 NSAZ 0, Y 0 1: input: Z 0 R m n, Y 0 R m n : 0 3: while 0 do 4: X +1 argmin X { X + Y, X Z + ρ X Z F } 5: Ŷ +1 Y + ρ X +1 Z 6: Z +1, S +1 argmin {Z,S: Z+S D F δ } {ξ S 1 + Y, Z X +1 + ρ Z X +1 F } 7: Let θ be an optimal Lagrangian dual variable for the 1 Z + S D F δ constraint 8: Y +1 Y + ρ X +1 Z +1 9: Choose ρ +1 such that ρ +1 ρ 10: : end while We now prove that Algorithm NSA converges under fairly mild conditions on the sequence {ρ } Z+ of penalty parameters. We first need the following lemma, which extends the similar result given in [6] to partial splitting of variables. Lemma 5.1. Suppose that δ > 0. Let {X, Z, S, Y, θ } Z+ be the sequence produced by Algorithm NSA. X, X, S 1 = argmin X,Z,S { X + ξ S 1 : Z + S D F δ, X = Z} be any optimal solution, Y R m n and θ 0 be any optimal Lagrangian duals corresponding to the constraints X = Z and 1 Z + S D F δ, respectively. Then { Z X F + ρ Y Y F } Z + is a non-increasing sequence and Z + Z +1 Z F < Z + ρ Y +1 Y F <, Z + ρ 1 Y +1 + Y, S +1 S < Z + ρ 1 Ŷ+1 + Y, X +1 X <, Z + ρ 1 Y Y +1, X + S Z +1 S +1 <. Proof. See Appendix B for the proof. Given partially split SPCP problem, min X,Z,S { X + ξ S 1 : X = Z, Z, S χ}, let L be its Lagrangian function LX, Z, S; Y, θ = X + ξ S 1 + Y, X Z + θ Z + S D F δ. 5.1 Theorem 5.. Suppose that δ > 0. Let {X, Z, S, Y, θ } Z+ be the sequence produced by Algorithm NSA. Choose {ρ } Z+ such that 9

10 i 1 Z + ρ = : Then lim Z+ Z = lim Z+ X = X, lim Z+ S = S such that X, S = argmin{ X + ξ S 1 : X + S D F δ}. ii Z + 1 ρ = : If D X F δ, then lim Z+ θ = θ 0 and lim Z+ Y = Y such that X, X, S, Y, θ is a saddle point of the Lagrangian function L in 5.1. Otherwise, if D X F = δ, then there exists a limit point, Y, θ, of the sequence {Y, θ } Z+ such that Y, θ = argmax Y,θ {LX, X, S ; Y, θ : θ 0}. Remar 5.1. Requiring 1 Z + ρ = is similar to the condition in Theorem in [6], which is needed to show that Algorithm I-ALM converges to an optimal solution of the robust PCA problem. Remar 5.. Let D = X 0 + S 0 + ζ 0 such that ζ 0 F δ and X 0, S 0 satisfies the assumptions of Theorem 1.. If S 0 F > Cmnδ, then with very high probability, D X F > δ, where C is the numerical constant defined in Theorem 1.. Therefore, most of the time in applications, one does not encounter the case where D X F = δ. Proof. From Lemma 5.1 and the fact that X +1 Z +1 = 1 ρ Y +1 Y for all 1, we have > ρ Y +1 Y F = X +1 Z +1 F. Z + Z + Hence, lim Z+ X Z = 0. Let X #, X #, S # 1 = argmin X,Z,S { X + ξ S 1 : Z + S D F δ, X = Z} be any optimal solution, Y # R m n and θ # 0 be any optimal Lagrangian duals corresponding to X = Z and 1 Z + S D F δ constraints, respectively and f := X # + ξ S # 1. Moreover, let χ = {Z, S R m n R m n : Z + S D F δ} and 1 χ Z, S denote the indicator function of the closed convex set χ, i.e. 1 χ Z, S = 0 if Z, S χ; otherwise, 1 χ Z, S =. Since the sequence {Z, S } Z+ produced by NSA is a feasible sequence for the set χ, we have 1 χ Z, S = 0 for all 1. Hence, the following inequality is true for all 0 X + ξ S 1 = X + ξ S χ Z, S, X # + ξ S # χ X #, S # Ŷ, X # X Y, S # S Y, X # + S # Z S, = f + Ŷ + Y #, X X # + Y + Y #, S S # + Y # Y, X # + S # Z S 5. + Y #, Z X, where the inequality follows from the convexity of norms and the fact that Y ξ S 1, Ŷ X and Y, Y 1 χ Z, S ; the final equality follows from rearranging the terms and the fact that X #, S # χ. From Lemma 5.1, we have Z + ρ 1 1 Ŷ + Y #, X X # + Y + Y #, S S # + Y # Y, X # + S # Z S Since 1 Z + ρ =, there exists K Z + such that lim Ŷ + Y #, X X # + Y + Y #, S S # + Y # Y, X # + S # Z S K <. = and the fact that lim Z+ Z X = 0 imply that along K 5. converges to f = X # + ξ S # 1 = min{ X +ξ S 1 : X, S χ}; hence along K subsequence, { X +ξ S 1 } K is a bounded sequence. Therefore, there exists K K Z + such that lim K X, S = X, S. Also, since lim Z+ Z X = 0 and Z, S χ for all 1, we also have X, S = lim K Z, S χ. Since the limit of both sides of 5. along K gives X + ξ S 1 = lim K X + ξ S 1 f and X, S χ, we conclude that X, S = argmin{ X + ξ S 1 : X, S χ}. 10

11 It is also true that X, X, S is an optimal solution to an equivalent problem: argmin X,Z,S { X + 1 ξ S 1 : Z + S D F δ, X = Z}. Now, let Ȳ Rm n and θ 0 be optimal Lagrangian duals corresponding to X = Z and 1 Z + S D F δ constraints, respectively. From Lemma 5.1, it follows that { Z X F + ρ Y Ȳ F } Z + is a bounded non-increasing sequence. Hence, it has a unique limit point, i.e. lim Z X F = lim Z X F + ρ Y Ȳ Z + Z F = lim Z X + K F + ρ Y Ȳ F = 0, where the equalities follow from the facts that lim K Z = X, µ as and {Ŷ} Z+, {Y } Z+ are bounded sequences. lim Z+ Z X F = 0 and lim Z+ Z X = 0 imply that lim Z+ X = X. Using Lemma 3.1 for the -th subproblem given in Step 6 in Algorithm 5, we have { S +1 = sign D X ρ Y max D X Y ξ ρ } + θ, 5.4 ρ Z +1 = θ ρ + θ D S +1 + ρ ρ + θ E, 0 ρ θ X ρ Y. 5.5 If D X ρ Y F δ, then θ = 0; otherwise, θ > 0 is the unique solution such that φ θ = δ, where { φ θ := ξ min θ E, ρ ρ + θ X D } F Y. 5.6 ρ In the following, it is shown that the sequence {S } Z+ has a unique limit point S. Since lim Z+ X = X, {Y } Z+ is a bounded sequence and ρ as, we have lim Z+ X ρ Y = X. Case 1: D X F δ. Previously, we have shown that that exists a subsequence K Z + such that lim K X, S = X, S = argmin X,S { X + ξ S 1 : X + S D F δ}. On the other hand, since D X F δ, X, 0 is a feasible solution. Hence, X + ξ S X, which implies that S = 0. X + ξ S 1 = X + ξ S χ Z, S, X + ξ χ X, 0 Ŷ, X X Y, 0 S Y, X + 0 Z S, = X + Ŷ, X X + Y, Z X. 5.7 Since the sequences {Y } Z+ and {Ŷ} Z+ are bounded and lim Z+ X = lim Z+ Z = X, taing the limit on both sides of 5.7, we have X + ξ lim S 1 = lim X + ξ S 1 Z + Z + = lim Z + X + Ŷ, X X + Y, Z X = X. Therefore, lim Z+ S 1 = 0, which implies that lim Z+ S = S = 0. Case : D X F > δ. Since D X ρ Y F D X F > δ, there exists K Z + such that for all K, D X ρ Y F > δ. For all K, φ. is a continuous and strictly decreasing function of θ for θ 0. Hence, inverse function φ 1. exits around δ for all K. Thus, φ 0 = D X ρ Y F > δ and lim θ φ θ = 0 imply that θ = φ 1 δ > 0 for all K. Moreover, φ θ φθ := ξ θ E F implies that θ ξ mn δ for all K. Therefore, {θ } Z+ is a bounded sequence, which has a convergent subsequence K θ Z + such that lim Kθ θ = θ. We also have φ θ φ θ pointwise for all 0 θ ξ mn δ, where { φ θ := ξ } min θ E, D X F

12 Since φ θ = δ for all K, we have { δ = lim φ θ = ξ K min E, θ ρ ρ + θ X D } F Y = φ θ. 5.9 ρ Note that since D X F > δ, φ is invertible around δ, i.e. φ 1 exists around δ. Thus, θ = φ 1 δ. Since K θ is an arbitrary subsequence, we can conclude that θ := lim Z+ θ = φ 1 δ. Since there exists θ > 0 such that θ = lim Z+ θ, taing the limit on both sides of 5.4, we have { S := lim S +1 = sign D X max D X ξ } Z + θ E, 0, 5.10 and this completes the first part of the theorem. Now, we will show that if D X F δ, then the sequences {θ } Z+ and {Y } Z+ have unique limits. Note that from B.3, it follows that Y = θ 1 Z + S D for all 1. First suppose that D X F < δ. Since D X ρ Y F D X F < δ, there exists K Z + such that for all K, D X ρ Y F < δ. Thus, from Lemma 3.1 for all K, θ = 0, S +1 = 0, Z +1 = X ρ Y, which implies that θ := lim Z+ θ = 0 and Y = lim Z+ Y = lim Z+ θ 1 Z + S D = 0 since S = lim K S = lim Z+ S = 0, lim Z+ Z = X and D X F < δ. Now suppose that D X F > δ. In Case above we have shown that θ = lim Z+ θ. Hence, there exists Y R m n such that Y = lim Z+ θ 1 Z + S D = θ X + S D. Suppose that 1 Z + =. From Lemma 5.1, we have ρ Z + Z +1 Z F <. Equivalently, the series can be written as > Z +1 Z F = ρ Ŷ+1 Y +1 F Z + Z + Since 1 Z + =, there exists a subsequence K Z ρ + such that lim K Ŷ+1 Y +1 F lim K ρ Z +1 Z F = 0, i.e. lim K ρ Z +1 Z = 0. Using B.1, B. and B.3, we have = 0. Hence, 0 X +1 + θ Z +1 + S +1 D + ρ Z +1 Z, ξ S θ Z +1 + S +1 D If D X = δ, then there exists Y R m n such that Y = lim Z+ θ 1 Z + S D = θ X + S D. Taing the limit of 5.1,5.13 along K Z + and using the fact that lim K ρ Z +1 Z = 0, we have 0 X + θ X + S D, ξ S 1 + θ X + S D and 5.15 together imply that X, S, Y = θ X + S D and θ satisfy KKT optimality conditions for the problem min X,Z,S { X +ξ S 1 : 1 Z +S D F δ, X = Z}. Hence, X, X, S, Y, θ is a saddle point of the Lagrangian function LX, Z, S; Y, θ = X + ξ S 1 + Y, X Z + θ Z + S D F δ. Suppose that D X F = δ. Fix > 0. If D X ρ Y F δ, then θ = 0. Otherwise, θ > 0 and as shown in case in the first part of the proof θ ξ mn δ. Thus, for any > 0, 0 θ ξ mn δ. Since {θ } Z+ is a bounded sequence, there exists a further subsequence K θ K such that θ := lim Kθ θ 1 and Y := lim Kθ θ 1 Z + S D = θ X + S D exist. Thus, taing the limit of 5.1,5.13 along K θ Z + and using the facts that lim K ρ Z +1 Z = 0 and X = lim Z+ X = lim Z+ Z, S = lim Z+ S exist, we conclude that X, X, S, Y, θ is a saddle point of the Lagrangian function LX, Z, S; Y, θ. 1

13 6. Numerical experiments. Our preliminary numerical experiments showed that among the four algorithms discussed in this paper, NSA is the fastest. It also has very few parameters that need to be tuned. Therefore, we only report the results for NSA. We conducted two sets of numerical experiments with 1 NSA to solve 1.4, where ξ =. In the first set we solved randomly generated instances of the max{m,n} stable principle component pursuit problem. In this setting, first we tested only NSA to see how the run times scale with respect to problem parameters and size; then we compared NSA with another alternating direction augmented Lagrangian algorithm ASALM [10]. In the second set of experiments, we ran NSA and ASALM to extract moving objects from an airport security noisy video [5] Random Stable Principle Component Pursuit Problems. We tested NSA on randomly generated stable principle component pursuit problems. The data matrices for these problems, D = X 0 + S 0 + ζ 0, were generated as follows i. X 0 = UV T, such that U R n r, V R n r for r = c r n and U ij N 0, 1, V ij N 0, 1 for all i, j are independent standard Gaussian variables and c r {0.05, 0.1}, ii. Λ {i, j : 1 i, j n} such that cardinality of Λ, Λ = p for p = c p n and c p {0.05, 0.1}, iii. Sij 0 U[ 100, 100] for all i, j Λ are independent uniform random variables between 100 and 100, iv. ζij 0 ϱn 0, 1 for all i, j are independent Gaussian variables. We created 10 random problems of size n {500, 1000, 1500}, i.e. D R n n, for each of the two choices of c r and c p using the procedure described above, where ϱ was set such that signal-to-noise ratio of D is either 80dB or 45dB. Signal-to-noise ratio of D is given by [ ] E X 0 + S 0 F SNRD = 10 log 10 E [ ζ 0 F ] cr n + c s 100 /3 = 10 log 10 ϱ. 6.1 Hence, for a given SNR value, we selected ϱ according to 6.1. Table 6.1 displays the ϱ value we have used in our experiments. As in [10], we set δ = n + 8nϱ in 1.4 in the first set of experiments for both Table 6.1 ϱ values depending on the experimental setting SNR n c r=0.05 c p=0.05 c r=0.05 c p=0.1 c r=0.1 c p=0.05 c r=0.1 c p= dB dB NSA and ASALM. Our code for NSA was written in MATLAB 7. and can be found at ~nsa106. We terminated the algorithm when X +1, S +1 X, S F X, S F + 1 ϱ. 6. The results of our experiments are displayed in Tables 6. and 6.3. In Table 6., the row labeled CPU lists the running time of NSA in seconds and the row labeled SVD# lists the number of partial singular value decomposition SVD computed by NSA. The minimum, average and maximum CPU times and number of partial SVD taen over the 10 random instances are given for each choice of n, c r and c p values. Table C.3 and Table C.4 in the appendix list additional error statistics. With the stopping condition given in 6., the solutions produced by NSA have Xsol +S sol D F D F approximately when SNRD = 80dB and when SNRD = 45dB, regardless of the problem dimension n and the problem parameters related to the ran and sparsity of D, i.e. c r and c p. After thresholding the singular values of X sol that were less than , NSA found the true ran in all 10 13

14 random problems solved when SNRD = 80dB, and it found the true ran for 113 out of 10 problems when SNRD = 45dB, while for 6 of the remaining problems ranx sol is off from ranx 0 only by 1. Table 6. shows that the number of partial SVD was a very slightly increasing function of n, c r and c p. Moreover, Table 6.3 shows that the relative error of the solution X sol, S sol was almost constant for different n, c r and c p values. Table 6. NSA: Solution time for decomposing D R n n, n {500, 1000, 1500} c r=0.05 c p=0.05 c r=0.05 c p=0.1 c r=0.1 c p=0.05 c r=0.1 c p=0.1 SNR n Field min/avg/max min/avg/max min/avg/max min/avg/max 80dB 45dB SVD# 9/9.0/9 9/9.5/10 10/10.0/10 11/11/11 CPU 3./4.4/ /5.1/ /5./ /6./8.1 SVD# 9/9.9/10 10/10.0/10 11/11/11 1/1.0/1 CPU 16.5/19.6/ /0.7/4.3 5./6.9/ /31./36.3 SVD# 10/10.0/10 10/10.9/11 1/1.0/1 1/1./13 CPU 38.6/44.1/ /48.6/ /84.1/ /97.7/155. SVD# 6/6/6 6/6.9/7 7/7.1/8 8/8/8 CPU.3/.9/4..9/3.6/4.5.9/3.9/6. 3.5/4./6.0 SVD# 7/7.0/7 7/7.0/7 8/8.1/9 9/9.0/9 CPU 11.5/13.4/ /13.3/ /18.7/ /3.8/8.9 SVD# 7/7.9/8 8/8.0/8 9/9.0/9 9/9.0/9 CPU 34.1/37.7/ /37.1/ /59.0/ /59.7/64.8 Table 6.3 NSA: Solution accuracy for decomposing D R n n, n {500, 1000, 1500} c r=0.05 c p=0.05 c r=0.05 c p=0.1 c r=0.1 c p=0.05 c r=0.1 c p=0.1 SNR n Relative Error avg / max avg / max avg / max avg / max 80dB 45dB X sol X 0 F X 0 F 4.0E-4 / 4.E-4 5.8E-4 / 8.5E-4 3.6E-4 / 3.9E-4 4.4E-4 / 4.5E-4 S sol S 0 F S 0 F 1.7E-4 / 1.8E-4 1.6E-4 /.5E-4 1.6E-4 / 1.8E-4 1.3E-4 / 1.3E-4 X sol X 0 F X 0 F.0E-4 /.4E-4 3.8E-4 / 4.1E-4.E-4 /.E-4.8E-4 /.9E-4 S sol S 0 F S 0 F 1.E-4 / 1.4E-4 1.5E-4 / 1.6E-4 1.E-4 / 1.3E-4 1.1E-4 / 1.1E-4 X sol X 0 F X 0 F 1.8E-4 /.E-4.1E-4 /.6E-4 1.3E-4 / 1.3E-4.8E-4 /.9E-4 S sol S 0 F S 0 F 1.3E-4 / 1.6E-4 9.6E-5 / 1.1E-4 8.1E-5 / 8.5E-5 1.3E-4 / 1.4E-4 X sol X 0 F X 0 F 6.0E-3 / 6.E-3 8.0E-3 / 9.E-3 6.1E-3 / 6.3E-3 8.1E-3 / 8.E-3 S sol S 0 F S 0 F.1E-3 /.E-3.3E-3 /.7E-3.E-3 /.3E-3.7E-3 /.9E-3 X sol X 0 F X 0 F 4.1E-3 / 4.E-3 6.1E-3 / 6.E-3 4.6E-3 / 4.7E-3 6.0E-3 / 6.5E-3 S sol S 0 F S 0 F 1.9E-3 / 1.9E-3.4E-3 /.5E-3.3E-3 / 3.5E-3 3.1E-3 / 3.7E-3 X sol X 0 F X 0 F 3.4E-3 / 3.6E-3 4.7E-3 / 4.7E-3 3.9E-3 / 4.0E-3 5.3E-3 / 5.3E-3 S sol S 0 F S 0 F 1.8E-3 / 1.8E-3.3E-3 /.3E-3.6E-3 / 3.5E-3 3.1E-3 / 3.1E-3 Next, we compared NSA with ASALM [10] for a fixed problem size, i.e. n = 1500 where D R n n. In all the numerical experiments, we terminated NSA according to 6.. For random problems with SNRD = 80dB, we terminated ASALM according to 6.. However, for random problems with SNRD = 45dB, ASALM produced solutions with 99% relative errors when 6. was used. Therefore, for random problems with SNRD = 45dB, we terminated ASALM either when it computed a solution with better relative errors comparing to NSA solution for the same problem or when an iterate satisfied 6. with the righthand side replaced by 0.1ϱ. The code for ASALM was obtained from the authors of [10]. 14

15 The comparison results are displayed in Table 6.5 and Table 6.6. In Table 6.5, the row labeled CPU lists the running time of each algorithm in seconds and the row labeled SVD# lists the number of partial SVD computation of each algorithm. In Table 6.5, the minimum, average and maximum of CPU times and the number of partial SVD computation of each algorithm taen over the 10 random instances are given for each two choices of c r and c p. Moreover, Table C.1 and Table C. given in the appendix list different error statistics. We used PROPACK [4] for computing partial singular value decompositions. In order to estimate the ran of X 0, we followed the scheme proposed in Equation 17 in [6]. Both NSA and ASALM found the true ran in all 40 random problems solved when SNRD = 80dB. NSA found the true ran for 39 out of 40 problems with n = 1500 when SNRD = 45dB, while for the remaining 1 problem ranx sol is off from ranx 0 only by 1. On the other hand, when SNRD = 45dB, ASALM could not find the true ran in any of the test problems. For each of the four problem settings corresponding to different c r and c p values, in Table 6.4 we report the average and maximum of ranx sol over 10 random instances, after thresholding the singular values of X sol that were less than Table 6.5 shows that for all of the problem classes, the number of partial SVD required by ASALM was Table 6.4 NSA vs ASALM: ranx sol values for problems with n = 1500, SNRD = 45dB ranx 0 = 75 ranx 0 = 150 c r=0.05 c p=0.05 c r=0.05 c p=0.1 c r=0.1 c p=0.05 c r=0.1 c p=0.1 Alg. avg / max avg / max avg / max avg / max NSA 75 / / / / 150 ASALM / / 07.4 / / 04 more than twice the number that NSA required. On the other hand, there was a big difference in CPU times; this difference can be explained by the fact that ASALM required more leading singular values than NSA did per partial SVD computation. Table 6.6 shows that although the relative errors of the low-ran components produced by NSA were slightly better, the relative errors of the sparse components produced by NSA were significantly better than those produced by ASALM. Finally, in Figure 6.1, we plot the decomposition of D = X 0 + S 0 + ζ 0 R n n generated by NSA, where ranx 0 = 75, S 0 0 = 11, 500 and SNRD = 45. In the first row, we plot randomly selected 1500 components of S 0 and 100 leading singular values of X 0 in the first row. In the second row, we plot the same components of S sol and 100 singular of X sol produced by NSA. In the third row, we plot the absolute errors of S sol and X sol. Note that the scales of the graphs showing absolute errors of S sol and X sol are larger than those of S 0 and X 0. And in the fourth row, we plot the same 1500 random components of ζ 0. When we compare the absolute error graphs of S sol and X sol with the graph showing ζ 0, we can confirm that the solution produced by NSA is inline with Theorem 1.. Table 6.5 NSA vs ASALM: Solution time for decomposing D R n n, n = 1500 c r=0.05 c p=0.05 c r=0.05 c p=0.1 c r=0.1 c p=0.05 c r=0.1 c p=0.1 SNR Alg. Field min/avg/max min/avg/max min/avg/max min/avg/max 80dB 45dB NSA ASALM NSA ASALM SVD# 10/10.0/10 10/10.9/11 1/1.0/1 1/1./13 CPU 38.6/44.1/ /48.6/ /84.1/ /97.7/155. SVD# /.0/ 0/0.0/0 9/9.0/9 9/9.4/30 CPU 657.3/677.8/ /850.0/ /1316.1/ /1905./004.7 SVD# 7/7.9/8 8/8.0/8 9/9.0/9 9/9.0/9 CPU 34.1/37.7/ /37.1/ /59.0/ /59.7/64.8 SVD# 1/1/1 18/18.5/19 8/8.0/8 7/7.3/8 CPU 666.6/686.9/ /857.1/ /13./ /1739.1/

16 Table 6.6 NSA vs ASALM: Solution accuracy for decomposing D R n n, n = 1500 c r=0.05 c p=0.05 c r=0.05 c p=0.1 c r=0.1 c p=0.05 c r=0.1 c p=0.1 SNR Alg. Relative Error avg / max avg / max avg / max avg / max 80dB 45dB NSA ASALM NSA ASALM X sol X 0 F X 0 F 1.8E-4 /.E-4.1E-4 /.6E-4 1.3E-4 / 1.3E-4.8E-4 /.9E-4 S sol S 0 F S 0 F 1.3E-4 / 1.6E-4 9.6E-5 / 1.1E-4 8.1E-5 / 8.5E-5 1.3E-4 / 1.4E-4 X sol X 0 F X 0 F 3.9E-4 / 4.E-4 8.4E-4 / 8.8E-4 6.6E-4 / 6.8E-4 1.4E-3 / 1.4E-3 S sol S 0 F S 0 F 5.7E-4 / 6.E-4 7.6E-4 / 8.0E-4 1.1E-3 / 1.1E-3 1.4E-3 / 1.4E-3 X sol X 0 F X 0 F 3.4E-3 / 3.6E-3 4.7E-3 / 4.7E-3 3.9E-3 / 4.0E-3 5.3E-3 / 5.3E-3 S sol S 0 F S 0 F 1.8E-3 / 1.8E-3.3E-3 /.3E-3.6E-3 / 3.5E-3 3.1E-3 / 3.1E-3 X sol X 0 F X 0 F 4.6E-3 / 4.8E-3 7.3E-3 / 8.4E-3 4.7E-3 / 4.7E-3 7.8E-3 / 7.9E-3 S sol S 0 F S 0 F 4.8E-3 / 4.9E-3 5.8E-3 / 7.0E-3 5.5E-3 / 5.5E-3 7.3E-3 / 7.5E-3 Fig NSA: Comparison of randomly selected 1500 components of ζ 0 with absolute errors of those components in S sol and σx sol. D R n n, n = 1500, SNRD = 45dB 6.. Foreground Detection on a Noisy Video. We used NSA and ASALM to extract moving objects in an airport security video [5], which is a sequence of 01 grayscale frames of size We assume that the airport security video [5] was not corrupted by Gaussian noise. We formed the i-th column of the data matrix D by stacing the columns of the i th frame into a long vector, i.e. D is in R In order to have a noisy video with SNR = 0dB signal-to-noise ratio SNR, given D, we chose ϱ = D F / SNR/0 and then obtained a noisy D by D = D + ϱ randn , 01, where randnm, n produces a random matrix with independent standard Gaussian entries. Solving for X, S = argmin X,S R { X + ξ S 1 : X + S D F δ}, we decompose D into a low ran 16

17 matrix X and a sparse matrix S. We estimate the i-th frame bacground image with the i-th column of X and estimate the i-th frame moving object with the i-th column of S. Both algorithms are terminated when X +1,S +1 X,S F X,S F +1 ϱ The recovery statistics of each algorithm are are displayed in Table 6.7. X sol, S sol denote the variables corresponding to the low-ran and sparse components of D, respectively, when the algorithm of interest terminates. Figure 7 and Figure 7 show the 35-th, 100-th and 15-th frames of the noise added airport security video [5] in their first row of images. The second and third rows in these tables have the recovered bacground and foreground images of the selected frames, respectively. Even though the visual quality of recovered bacground and foreground are very similar, Table 6.7 shows that both the number of partial SVDs and the CPU time of NSA are significantly less than those for ASALM. Table 6.7 NSA vs ASALM: Recovery statistics for foreground detection on a noisy video Alg. CPU SVD# X sol S sol 1 ranx sol X sol +S sol D F D F NSA ASALM Acnowledgements. We would lie to than to Min Tao for providing the code ASALM. Dt: X sol t: S sol t: Fig Bacground extraction from a video with 0dB SNR using NSA 17

18 Dt: X sol t: S sol t: Fig. 7.. Bacground extraction from a video with 0dB SNR using ASALM 18

19 REFERENCES [1] A. Bec and M. Teboulle, A fast iterative shrinage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Sciences, 009, pp [] E. J. Candès, X. Li, Y. Ma, and Wright J., Robust principle component analysis?, Journal of ACM, , pp [3] D. Goldfarb, S. Ma, and K. Scheinberg, Fast alternating linearization methods for minimizing the sum of two convex functions. arxiv: v, October 010. [4] R.M. Larsen, Lanczos bidiagonalization with partial reorthogonalization, Technical report DAIMI PB-357, Department of Computer Science, Aarhus University, [5] L. Li, W. Huang, I. Gu, and Q. Tian, Statistical modeling of complex bacgrounds for foreground object detection, IEEE Trans. on Image Processing, , pp [6] Z. Lin, M. Chen, L. Wu, and Y. Ma, The augmented lagrange multiplier method for exact recovery of corrupted low-ran matrices, arxiv: v, 011. [7] Z. Lin, A. Ganesh, J. Wright, L. Wu, M. Chen, and Y. Ma, Fast convex optimization algorithms for exact recovery of a corrupted low-ran matrix, tech. report, UIUC Technical Report UILU-ENG-09-14, 009. [8] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Kluwer Academic Publishers, 004. [9], Smooth minimization of nonsmooth functions, Mathematical Programming, , pp [10] M. Tao and X. Yuan, Recovering low-ran and sparse components of matrices from incomplete and noisy observations, SIAM Journal on Optimization, 1 011, pp [11] P. Tseng, On accelerated proximal gradient methods for convex-concave optimization, submitted to SIAM Journal on Optimization, 008. [1] J. Wright, A. Ganesh, S. Rao, Y. Peng, and Y. Ma, Robust principal component analysis: Exact recovery of corrupted low-ran matrices via convex optimization, in Proceedings of Neural Information Processing Systems NIPS, December 009. [13] Z. Zhou, X. Li, J. Wright, E. Candès, and Y. Ma, Stable principle component pursuit, Proceedings of International Symposium on Information Theory, 010. Appendix A. Proof of Theorem 4.1. Definition A.1. Let φ : R m n R and ψ : R m n R m n R be closed convex functions and define Q φ Z, S X := ψz, S + φx + γ φ X, Z X + ρ Z X F, Q ψ Z X, S := φz + ψx, S + γ ψ x X, S, Z X + ρ Z X F, A.1 A. and p φ x X, p φ s X := argmin Z,S R m n Q φ Z, S X, p ψ X, S := argmin Z R m n Q ψ Z X, S, where γ φ X is any subgradient in the subdifferential φ at the point X and A.3 A.4 γx ψ X, S, γs ψ X, S is any subgradient in the subdifferential ψ at the point X, S. Lemma A.. Let φ, ψ, Q φ, Q ψ, p φ x, p φ s, p ψ, γ φ, γ ψ x, γ ψ s be as given in Definition A.1. and ΦX, S := φx + ψx, S. Let X 0 R m n and define ˆX := p φ xx 0 and Ŝ := pφ s X 0. If then for any X, S R m n R m n, Moreover, if ρ ΦX, S Φ ˆX, Ŝ Φ ˆX, Ŝ Qφ ˆX, Ŝ X0, A.5 X ˆX F X X 0 F. A.6 Φ p ψ ˆX, Ŝ, Ŝ Q ψ p ψ ˆX, Ŝ ˆX, Ŝ, A.7 then for any X, S R m n R m n, ΦX, S Φ p ψ ˆX, Ŝ Ŝ, X p ψ ˆX, Ŝ F X ρ ˆX F. A.8 19

20 Proof. Let X 0 R m n satisfy A.5. Then for any X, S R m n R m n, we have ΦX, S Φ p φ xx 0, Ŝ ΦX, S Q φ ˆX, Ŝ X 0. A.9 First order optimality conditions for A.3 and ψ being a closed convex function guarantee that there exists γx ψ ˆX, Ŝ, γs ψ ˆX, Ŝ ψ ˆX, Ŝ such that γx ψ ˆX, Ŝ where ψ ˆX, Ŝ denotes the subdifferential of ψ.,. at the point ˆX, Ŝ. Moreover, using the convexity of ψ.,. and φ., we have + γ φ X 0 + ρ ˆX X 0 = 0, A.10 γs ψ ˆX, Ŝ = 0, A.11 ψx, S ψ ˆX, Ŝ + γx ψ ˆX, Ŝ, X ˆX + γs ψ ˆX, Ŝ, S Ŝ, φx φx 0 + γ φ X 0, X X 0. These two inequalities and A.11 together imply ΦX, S ψ ˆX, Ŝ + γx ψ ˆX, Ŝ, X ˆX + φx 0 + γ φ X 0, X X 0. A.1 This inequality together with A.5 and A.10 gives ΦX, S Φ ˆX, Ŝ γx ψ ˆX, Ŝ, X ˆX + γ φ X 0, X X 0 = γ φ X 0 + γx ψ ˆX, Ŝ, X ˆX ρ X X0 F, =ρ X 0 ˆX, X ˆX ρ X X0 F, = ρ X ˆX F X X 0 F. γ φ X 0, ˆX X 0 ρ X X0 F, Hence, we have A.6. Suppose that X 0 satisfies A.7. Then for any X, S R m n R m n, we have ΦX, S Φ p ψ ˆX, Ŝ, Ŝ ΦX, S Q ψ p ψ ˆX, Ŝ ˆX, Ŝ. A.13 First order optimality conditions for A.4 and φ being a closed convex function guarantee that there exists γ p φ ψ ˆX, Ŝ φ p ψ ˆX, Ŝ such that γ φ p ψ ˆX, Ŝ + γx ψ ˆX, Ŝ + ρ p ψ ˆX, Ŝ ˆX = 0. A.14 Moreover, using the convexity of φ. and ψ.,., we have φx φ p ψ ˆX, Ŝ + γ φ p ψ ˆX, Ŝ, X p ψ ˆX, Ŝ, A.15 ψx, S ψ ˆX, Ŝ + γx ψ ˆX, Ŝ, X ˆX, A.16 0

21 where A.16 follows from the fact that ˆX, Ŝ = argmin X,S Q φ X, S X 0 implies γx ψ ˆX, Ŝ, 0 ψ ˆX, Ŝ, i.e. we can set γs ψ ˆX, Ŝ = 0. Summing the two inequalities A.15 and A.16 give ΦX, S ψ ˆX, Ŝ + γx ψ ˆX, Ŝ, X ˆX + φ p ψ ˆX, Ŝ + γ φ p ψ ˆX, Ŝ, X p ψ ˆX, Ŝ. A.17 This inequality together with A.7 and A.14 gives ΦX, S Φ p ψ ˆX, Ŝ, Ŝ γx ψ ˆX, Ŝ, X ˆX + γ φ p ψ ˆX, Ŝ, X p ψ ˆX, Ŝ γx ψ ˆX, Ŝ, p ψ ˆX, Ŝ ˆX ρ pψ ˆX, Ŝ ˆX F, = γ φ p ψ ˆX, Ŝ + γx ψ ˆX, Ŝ, X p ψ ˆX, Ŝ ρ pψ ˆX, Ŝ ˆX F, =ρ ˆX p ψ ˆX, Ŝ, X p ψ ˆX, Ŝ ρ pψ ˆX, Ŝ ˆX F, = ρ X p ψ ˆX, Ŝ F X ˆX F. Hence, we have A.8. We are now ready to give the proof of Theorem 4.1. Proof. Let I := {0 i 1 : ΦX i+1, S i L ρ X i+1, Z i, S i ; Y i } and I c := {0, 1,..., 1} \ I. Since φ. is Lipschitz continuous with Lipschitz constant L and ρ L, Φp φ xx, p φ s X Q φ p φ xx, p φ s X X is true for all X R m n. Since A.5 in Lemma A. is true for all X 0 R m n, A.6 is true for all X, S R m n R m n. Particularly, since for all i I I c Z i+1, S i+1 = argmin Q φ Z, S X i+1, Z,S A.18 setting X, S := X, S and X 0 := X i+1 in Lemma A. imply that p φ xx i+1 = Z i+1, p φ s X i+1 = S i+1 and we have ρ ΦX, S ΦZ i+1, S i+1 Z i+1 X F X i+1 X F. Moreover, A.18 implies that for all i I I c, there exits γ ψ x Z i, S i, γ ψ s Z i, S i ψz i, S i such that A.19 γ ψ x Z i, S i + φx i + ρz i X i = 0, γ ψ s Z i, S i = 0. A.0 A.1 A.0 and the definition of Y i+1 of Algorithm ALM-S shown in Algorithm 3 imply that γ ψ x Z i, S i = φx i + ρx i Z i = Y i. Hence, by defining Q ψ. Z i, S i according to A. using γ ψ x Z i, S i = Y i, for all X R m n we have L ρ X, Z i, S i ; Y i = φx + ψz i, S i + Y i, X Z i + ρ X Z i F = Q ψ X Z i, S i. A. for all i I I c. Hence, for all i I X i+1 = argmin X L ρ X, Z i, S i ; Y i = argmin X Q ψ X Z i, S i. Thus, for all i I, setting X 0 := X i in Lemma A. imply p φ xx i = Z i, p φ s X i = S i and p ψ p φ xx i, p φ s X i = p ψ Z i, S i = X i+1. For all i I we have ΦX i+1, S i L ρ X i+1, Z i, S i ; Y i = Q ψ X i+1 Z i, S i. Hence, for all i I setting X 0 := X i in Lemma A. satisfies A.7. Therefore, setting X, S := X, S and X 0 := X i in Lemma A. implies that ρ ΦX, S ΦX i+1, S i X i+1 X F Z i X F. 1 A.3

22 For any i I, summing A.19 and A.3 gives ρ ΦX, S ΦX i+1, S i ΦZ i+1, S i+1 Z i+1 X F Z i X F. A.4 Moreover, since X i+1 = Z i for i I c and A.19 holds for all i I I c, we trivially have ρ ΦX, S ΦZ i+1, S i+1 Z i+1 X F Z i X F. Summing A.4 and A.5 over i = 0, 1,..., 1 gives I + I c ΦX, S ΦX i+1, S i ρ i I 1 ΦZ i+1, S i+1 Z X F Z 0 X F. i=0 A.5 A.6 For any i I I c, setting X, S := X i+1, S i and X 0 := X i+1 in Lemma A. gives ρ ΦX i+1, S i ΦZ i+1, S i+1 Z i+1 X i+1 F 0. A.7 Trivially, for i = 1,..., we also have ρ ΦX i, S i 1 ΦZ i, S i Z i X i F 0. A.8 Moreover, since for all i I setting X 0 := X i in Lemma A. satisfies A.7, setting X, S := Z i, S i and X 0 := X i in Lemma A. implies that ρ ΦZ i, S i ΦX i+1, S i X i+1 Z i F 0. A.9 And since X i+1 = Z i for all i I c, A.9 trivially holds for all i I c. Thus, for all i I I c we have ρ ΦZ i, S i ΦX i+1, S i 0. A.30 Adding A.7 and A.30 yields ΦZ i, S i ΦZ i+1, S i+1 for all i I I c and adding A.8 and A.30 yields ΦX i, S i 1 ΦX i+1, S i for all i = 1,..., 1. Hence, 1 i=0 ΦZ i+1, S i+1 ΦZ, S, and i I ΦX i+1, S i n ΦX, S 1. A.31 These two inequalities, A.6 and the fact that X 0 = Z 0 imply ρ I + Ic ΦX, S n ΦX, S 1 ΦZ, S X 0 X F. A.3 Hence, 4.4 follows from the facts: I + I c = +n and n ΦX, S 1 +ΦZ, S +n ΦZ, S due to A.7. Appendix B. Proof of Lemma 5.1. Proof. Since Y and θ are optimal Lagrangian dual variables, we have X, X, S = argmin X + ξ S 1 + Y, X Z + θ Z + S D X,Z,S F δ.

Robust Principal Component Analysis Based on Low-Rank and Block-Sparse Matrix Decomposition

Robust Principal Component Analysis Based on Low-Rank and Block-Sparse Matrix Decomposition Robust Principal Component Analysis Based on Low-Rank and Block-Sparse Matrix Decomposition Gongguo Tang and Arye Nehorai Department of Electrical and Systems Engineering Washington University in St Louis

More information

First Order Methods for Large-Scale Sparse Optimization. Necdet Serhat Aybat

First Order Methods for Large-Scale Sparse Optimization. Necdet Serhat Aybat First Order Methods for Large-Scale Sparse Optimization Necdet Serhat Aybat Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and

More information

Robust Principal Component Analysis

Robust Principal Component Analysis ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M

More information

Robust PCA. CS5240 Theoretical Foundations in Multimedia. Leow Wee Kheng

Robust PCA. CS5240 Theoretical Foundations in Multimedia. Leow Wee Kheng Robust PCA CS5240 Theoretical Foundations in Multimedia Leow Wee Kheng Department of Computer Science School of Computing National University of Singapore Leow Wee Kheng (NUS) Robust PCA 1 / 52 Previously...

More information

Optimization for Learning and Big Data

Optimization for Learning and Big Data Optimization for Learning and Big Data Donald Goldfarb Department of IEOR Columbia University Department of Mathematics Distinguished Lecture Series May 17-19, 2016. Lecture 1. First-Order Methods for

More information

ACCELERATED LINEARIZED BREGMAN METHOD. June 21, Introduction. In this paper, we are interested in the following optimization problem.

ACCELERATED LINEARIZED BREGMAN METHOD. June 21, Introduction. In this paper, we are interested in the following optimization problem. ACCELERATED LINEARIZED BREGMAN METHOD BO HUANG, SHIQIAN MA, AND DONALD GOLDFARB June 21, 2011 Abstract. In this paper, we propose and analyze an accelerated linearized Bregman (A) method for solving the

More information

Linearized Alternating Direction Method with Adaptive Penalty for Low-Rank Representation

Linearized Alternating Direction Method with Adaptive Penalty for Low-Rank Representation Linearized Alternating Direction Method with Adaptive Penalty for Low-Rank Representation Zhouchen Lin Visual Computing Group Microsoft Research Asia Risheng Liu Zhixun Su School of Mathematical Sciences

More information

An ADMM Algorithm for Clustering Partially Observed Networks

An ADMM Algorithm for Clustering Partially Observed Networks An ADMM Algorithm for Clustering Partially Observed Networks Necdet Serhat Aybat Industrial Engineering Penn State University 2015 SIAM International Conference on Data Mining Vancouver, Canada Problem

More information

Dense Error Correction for Low-Rank Matrices via Principal Component Pursuit

Dense Error Correction for Low-Rank Matrices via Principal Component Pursuit Dense Error Correction for Low-Rank Matrices via Principal Component Pursuit Arvind Ganesh, John Wright, Xiaodong Li, Emmanuel J. Candès, and Yi Ma, Microsoft Research Asia, Beijing, P.R.C Dept. of Electrical

More information

Dual Proximal Gradient Method

Dual Proximal Gradient Method Dual Proximal Gradient Method http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/19 1 proximal gradient method

More information

Sparsity Regularization
