Low rank matrix completion by alternating steepest descent methods

Jared Tanner and Ke Wei

Abstract. Matrix completion involves recovering a matrix from a subset of its entries by utilizing interdependency between the entries, typically through low rank structure. Despite the combinatorial nature of matrix completion, there are many computationally efficient algorithms which are effective for a broad class of matrices. In this paper, we introduce an alternating steepest descent algorithm (ASD) and a scaled variant, ScaledASD, for the fixed-rank matrix completion problem. Empirical evaluation of ASD and ScaledASD on both image inpainting and random problems shows they are competitive with other state-of-the-art matrix completion algorithms in terms of recoverable rank and overall computational time. In particular, their low per iteration computational complexity makes ASD and ScaledASD efficient for large size problems, especially when computing the solutions to moderate accuracy. A preliminary convergence analysis is also presented.

Key words. Matrix completion, alternating minimization, gradient descent, exact line-search

AMS subject classifications. 15A29, 41A29, 65F10, 65J20, 68Q25, 90C26

1. Introduction. The problem of recovering a low rank matrix from partial entries - also known as matrix completion - arises in a wide variety of practical contexts, such as model reduction [23], pattern recognition [10], and machine learning [1, 2]. From the pioneering work on low rank approximation by Fazel [12] and matrix completion by Candès and Recht [6], this problem has received intensive investigation from both theoretical and algorithmic perspectives; see [3, 4, 5, 7, 8, 9, 13, 15, 16, 18, 21, 22, 24, 25, 26, 27, 29, 30, 31, 32] and their references for a partial review. These matrix completion and low rank approximation techniques rely on the dependencies between entries imposed by the low rank structure. Explicitly seeking the lowest rank matrix consistent with the known entries is expressed as

(1.1)  $\min_{Z \in \mathbb{R}^{m \times n}} \operatorname{rank}(Z)$, subject to $P_\Omega(Z) = P_\Omega(Z_0)$,

where $Z_0 \in \mathbb{R}^{m \times n}$ is the underlying matrix to be reconstructed, $\Omega$ is a subset of indices for the known entries, and $P_\Omega$ is the associated sampling operator which acquires only the entries indexed by $\Omega$. Problem (1.1) is non-convex and generally NP-hard [17] due to the rank objective. One of the most widely studied approaches is to replace the rank objective with a convex relaxation such as the Schatten 1-norm, also known as the nuclear norm. It has been proven that, provided the singular vectors are weakly correlated with the canonical basis, the solution of (1.1) can be obtained by solving the nuclear norm minimization problem

(1.2)  $\min_{Z \in \mathbb{R}^{m \times n}} \|Z\|_*$, subject to $P_\Omega(Z) = P_\Omega(Z_0)$,

where the nuclear norm of $Z$, denoted by $\|Z\|_*$, is the sum of its singular values. More precisely, if $Z_0$ satisfies the incoherence conditions and $|\Omega| = p$ of its entries are observed uniformly at random, $Z_0$ can be exactly recovered by (1.2) with high probability if $p$ is of the order $O(r(m + n - r)\log^\alpha(\max(m, n)))$ [6, 7, 15, 27].

Mathematical Institute, University of Oxford, Oxford, UK (tanner@maths.ox.ac.uk).
Mathematical Institute, University of Oxford, Oxford, UK (wei@maths.ox.ac.uk).
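To make the notation in (1.1) and (1.2) concrete, the following minimal Python/NumPy sketch (illustrative only; the array names and sizes are not from the paper) implements the sampling operator $P_\Omega$ as a boolean mask, evaluates the nuclear norm as the sum of singular values, and checks the data constraint for a candidate matrix.

```python
import numpy as np

def P_Omega(Z, mask):
    """Sampling operator of (1.1)-(1.2): keep the entries indexed by Omega, zero elsewhere."""
    return np.where(mask, Z, 0.0)

# Illustrative example with an arbitrary low rank matrix and 50% of entries observed.
rng = np.random.default_rng(0)
m, n, r = 20, 30, 2
Z0 = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # a rank-2 matrix
mask = rng.random((m, n)) < 0.5                                   # Omega as a boolean mask

nuclear_norm = np.linalg.svd(Z0, compute_uv=False).sum()          # ||Z0||_*, the objective of (1.2)
Z = np.zeros((m, n))                                              # some candidate matrix
feasible = np.allclose(P_Omega(Z, mask), P_Omega(Z0, mask))       # constraint P_Omega(Z) = P_Omega(Z0)
```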

In practice, (1.2) can be solved either by semidefinite programming or by iterative soft thresholding algorithms [5, 13, 20]. As an alternative to nuclear norm minimization, there are many algorithms designed to target (1.1) directly; most of them are adaptations of algorithms for compressed sensing, such as the hard thresholding algorithms [3, 18, 22, 29]. The iterative thresholding algorithms typically update a current rank r estimate along a line-search direction which departs from the manifold of rank r matrices, and then project back to the rank r manifold by computing the nearest matrix in Frobenius norm. The most direct implementation of the projection involves computing a partial singular value decomposition (SVD) in each iteration. For $r \le m \le n$, computing the SVD has computational complexity of $O(n^3)$, which is the dominant per iteration cost¹ and limits their applicability for large $n$. A subclass of methods which further exploit the manifold structure in their line-search updates achieve superior efficiency, for $r \ll n$, of order $O(r^3)$. Examples include LRGeomCG [31], which is a nonlinear conjugate gradient method, and ScGrassMN [26], which uses a scaled gradient in the subspace update, as well as other geometries and metrics in [24, 25].

To circumvent the high computational cost of an SVD, other algorithms explicitly remain on the manifold of rank r matrices by using the factorization $Z = XY$ where $X \in \mathbb{R}^{m \times r}$ and $Y \in \mathbb{R}^{r \times n}$. Based on this simple factorization model, rather than solving (1.1), algorithms are designed to solve the non-convex problem

(1.3)  $\min_{X, Y} \frac{1}{2}\|P_\Omega(Z_0) - P_\Omega(XY)\|_F^2$.

Algorithms for the solution of (1.3) usually follow an alternating minimization scheme, with PowerFactorization [16] and LMaFit [32] two representatives. We present alternating steepest descent (ASD), and a scaled variant ScaledASD, which incorporate an exact line-search to update the solutions for the model (1.3). In so doing ASD has a lower per iteration complexity than PowerFactorization, allowing a more efficient exploration of the manifold in the early iterations where the current estimate is inaccurate.

The manuscript is outlined as follows. In Sec. 2 we briefly review the alternating minimization algorithms PowerFactorization and LMaFit which motivate ASD. In Sec. 3 we propose ASD for (1.3), which replaces the least squares subproblem solution in PowerFactorization with an exact line-search so as to reduce the per iteration computational cost. In Sec. 4 we present ScaledASD, a version of ASD which is accelerated by scaling the search directions adaptively to improve the asymptotic convergence rate. Numerical experiments presented in Sec. 5 contrast the aforementioned algorithms and show that ASD is highly efficient, particularly for large problems when solved to moderate accuracy. Section 6 concludes the paper with some future research directions.

¹The related low rank approximation question [28] has an $O(n^4)$ per iteration cost dominated by computing the measurement operator and its adjoint; in this context, computing the SVD is of secondary cost.
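As a point of reference for the algorithms that follow, here is a minimal sketch (illustrative Python, not from the paper) of evaluating the objective of the factorization model (1.3); only the observed entries ever enter the computation.

```python
import numpy as np

def objective(X, Y, Z0_obs, mask):
    """f(X, Y) = 0.5 * || P_Omega(Z0) - P_Omega(X Y) ||_F^2, the objective of (1.3).

    Z0_obs holds the observed entries of Z0 with zeros off Omega, and mask is the
    boolean indicator of Omega.  A dense product X @ Y is used for readability; an
    efficient implementation forms only the |Omega| entries of X Y indexed by Omega.
    """
    residual = Z0_obs - np.where(mask, X @ Y, 0.0)
    return 0.5 * np.linalg.norm(residual, 'fro') ** 2
```

Forming only the entries of $XY$ on $\Omega$ is what gives the $O(|\Omega| r)$ per iteration costs quoted in Sec. 3.1.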

2. Alternating minimization. Alternating minimization is widely used for optimization problems due to its simplicity, low memory footprint and flexibility [11]. Without loss of generality, let us consider an objective function $f(X, Y)$ with two variable components $X \in \mathcal{X}$ and $Y \in \mathcal{Y}$, where $\mathcal{X}$ and $\mathcal{Y}$ are the admissible sets. The alternating minimization method, also known as the Gauss-Seidel or 2-block coordinate descent method, minimizes $f(X, Y)$ by successive searches over $\mathcal{X}$ and $\mathcal{Y}$. For problem (1.3), the objective function is

(2.1)  $f(X, Y) = \frac{1}{2}\|P_\Omega(Z_0) - P_\Omega(XY)\|_F^2$,

with $\mathcal{X} := \mathbb{R}^{m \times r}$, $\mathcal{Y} := \mathbb{R}^{r \times n}$. PowerFactorization (PF), Alg. 1, is a matrix completion algorithm which applies the alternating minimization method to (1.3).

Algorithm 1 PowerFactorization (PF, [16])
Input: $P_\Omega(Z_0)$, $X_0 \in \mathbb{R}^{m \times r}$, $Y_0 \in \mathbb{R}^{r \times n}$
Repeat
1. Fix $Y_i$, solve $X_{i+1} = \arg\min_X \|P_\Omega(Z_0) - P_\Omega(XY_i)\|_F^2$
2. Fix $X_{i+1}$, solve $Y_{i+1} = \arg\min_Y \|P_\Omega(Z_0) - P_\Omega(X_{i+1}Y)\|_F^2$
3. $i = i + 1$
Until termination criteria is reached
Output: $X_i$, $Y_i$

Note that each subproblem in Alg. 1 is a standard least squares problem. In [16], a rank increment technique is also proposed to improve the performance of the algorithm. As a typical alternating minimization method, limit points of Alg. 1 are necessarily stationary points, see Thm. 3.4. The per iteration computational cost of PF is determined by the solution of the least squares subproblems in Alg. 1. To decrease this per iteration cost, [32] proposes the model

(2.2)  $\min_{X, Y, Z} \frac{1}{2}\|Z - XY\|_F^2$, subject to $P_\Omega(Z) = P_\Omega(Z_0)$,

where the projection onto $\Omega$ is moved from the objective in (2.1) to a linear constraint. The alternating minimization approach applied to (2.2) gives the low-rank matrix fitting algorithm (LMaFit), Alg. 2.

Algorithm 2 Low-rank Matrix Fitting (LMaFit, [32])
Input: $P_\Omega(Z_0)$, $X_0 \in \mathbb{R}^{m \times r}$, $Y_0 \in \mathbb{R}^{r \times n}$
Repeat
1. $X_{i+1} = Z_i Y_i^\dagger = \arg\min_X \|XY_i - Z_i\|_F^2$
2. $Y_{i+1} = X_{i+1}^\dagger Z_i = \arg\min_Y \|X_{i+1}Y - Z_i\|_F^2$
3. $Z_{i+1} = X_{i+1}Y_{i+1} + P_\Omega(Z_0 - X_{i+1}Y_{i+1})$
4. $i = i + 1$
Until termination criteria is reached
Output: $X_i$, $Y_i$

LMaFit obtains new approximate solutions of $X$ and $Y$ by solving least squares problems while $Z$ is held fixed, as done in PF. However the subproblems in LMaFit have explicit solutions, which reduces the high per iteration computational cost of PF.
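The least squares subproblems in Alg. 1 decouple across the rows of $X$ (and, symmetrically, the columns of $Y$): row $j$ of $X$ only interacts with the observed entries in row $j$ of $Z_0$. The sketch below (illustrative Python, not the authors' Matlab/C implementation) solves the $X$-update of PowerFactorization in this row-by-row fashion.

```python
import numpy as np

def pf_update_X(Y, Z0_obs, mask):
    """PowerFactorization half-step (step 1 of Alg. 1):
    X = argmin_X || P_Omega(Z0) - P_Omega(X Y) ||_F^2.

    The problem separates row by row: for each row j, fit the r coefficients of
    X[j] to the observed entries Z0[j, cols] against the matching columns of Y.
    """
    m, r = mask.shape[0], Y.shape[0]
    X = np.zeros((m, r))
    for j in range(m):
        cols = np.flatnonzero(mask[j])            # observed column indices in row j
        if cols.size:
            A = Y[:, cols].T                      # |cols| x r design matrix
            b = Z0_obs[j, cols]
            sol, *_ = np.linalg.lstsq(A, b, rcond=None)
            X[j] = sol
    return X
```

The $Y$ half-step is obtained by exchanging the roles of $X$ and $Y$; each row (or column) solve costs on the order of $r^2$ times the number of observed entries it touches, which is the per iteration expense that ASD, introduced next, avoids.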

Moreover, [32] proposes using an over-relaxation scheme to further accelerate Alg. 2. The convergence of limit points to stationary points is similarly established for LMaFit, see Thm. 3.5 in [32].

3. Alternating steepest descent. Solving the least squares subproblems in PF to high accuracy is both computationally expensive and potentially of limited value when the factor that is held fixed is inconsistent with the known entries $P_\Omega(Z_0)$. To improve the computational efficiency we replace solving the least squares subproblems in PF with a single step of simple line-search along the gradient descent directions. The alternating steepest descent method (ASD) applies steepest gradient descent to $f(X, Y)$ in (2.1) alternately with respect to $X$ and $Y$. If $f(X, Y)$ is written as $f_Y(X)$ when $Y$ is held constant and $f_X(Y)$ when $X$ is held constant, the gradients (the directions of steepest ascent) are

(3.1)  $\nabla f_Y(X) = -(P_\Omega(Z_0) - P_\Omega(XY))\,Y^T$,
(3.2)  $\nabla f_X(Y) = -X^T(P_\Omega(Z_0) - P_\Omega(XY))$.

Let $t_x$, $t_y$ be the exact line-search stepsizes along the descent directions $-\nabla f_Y(X)$ and $-\nabla f_X(Y)$; then

(3.3)  $t_x = \dfrac{\|\nabla f_Y(X)\|_F^2}{\|P_\Omega(\nabla f_Y(X)\,Y)\|_F^2}$,
(3.4)  $t_y = \dfrac{\|\nabla f_X(Y)\|_F^2}{\|P_\Omega(X\,\nabla f_X(Y))\|_F^2}$,

which together form the alternating steepest descent algorithm, Alg. 3.

Algorithm 3 Alternating Steepest Descent (ASD)
Input: $P_\Omega(Z_0)$, $X_0 \in \mathbb{R}^{m \times r}$, $Y_0 \in \mathbb{R}^{r \times n}$
Repeat
1. $\nabla f_{Y_i}(X_i) = -(P_\Omega(Z_0) - P_\Omega(X_iY_i))Y_i^T$,  $t_{x_i} = \|\nabla f_{Y_i}(X_i)\|_F^2 \,/\, \|P_\Omega(\nabla f_{Y_i}(X_i)Y_i)\|_F^2$
2. $X_{i+1} = X_i - t_{x_i}\nabla f_{Y_i}(X_i)$
3. $\nabla f_{X_{i+1}}(Y_i) = -X_{i+1}^T(P_\Omega(Z_0) - P_\Omega(X_{i+1}Y_i))$,  $t_{y_i} = \|\nabla f_{X_{i+1}}(Y_i)\|_F^2 \,/\, \|P_\Omega(X_{i+1}\nabla f_{X_{i+1}}(Y_i))\|_F^2$
4. $Y_{i+1} = Y_i - t_{y_i}\nabla f_{X_{i+1}}(Y_i)$
5. $i = i + 1$
Until termination criteria is reached
Output: $X_i$, $Y_i$

Remark 3.1. The denominators in $t_{x_i}$ and $t_{y_i}$ are zero if and only if the corresponding gradients are zero. This follows from the contradiction that otherwise $f(X, Y)$ could be decreased without a lower bound, which violates the nonnegativity of the objective. If the denominator of $t_{x_i}$ or $t_{y_i}$ is zero in an iteration, it indicates that $X_i$ or $Y_i$ respectively is at a local minimum and should be held fixed for that iteration by setting the offending stepsize to zero while minimizing over the other variable. If both denominators are zero then a stationary point has been reached.
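The following sketch (illustrative Python, not the authors' Matlab/C implementation) carries out one sweep of Alg. 3, forming the gradients (3.1)-(3.2) and the exact line-search stepsizes (3.3)-(3.4); dense products restricted by the mask are used for readability.

```python
import numpy as np

def asd_iteration(X, Y, Z0_obs, mask):
    """One sweep of Alg. 3: steepest descent in X, then in Y, each with exact line-search."""
    # X update
    R = Z0_obs - np.where(mask, X @ Y, 0.0)                          # residual P_Omega(Z0) - P_Omega(XY)
    gX = -R @ Y.T                                                    # gradient (3.1)
    tx = np.sum(gX**2) / np.sum(np.where(mask, gX @ Y, 0.0)**2)      # stepsize (3.3)
    X = X - tx * gX
    # Y update (residual recomputed here; see Sec. 3.1 for the cheaper update)
    R = Z0_obs - np.where(mask, X @ Y, 0.0)
    gY = -X.T @ R                                                    # gradient (3.2)
    ty = np.sum(gY**2) / np.sum(np.where(mask, X @ gY, 0.0)**2)      # stepsize (3.4)
    Y = Y - ty * gY
    return X, Y
```

A zero denominator corresponds to a zero gradient for that block (Remark 3.1), in which case the block is simply held fixed for the iteration.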

3.1. Per iteration computational cost: efficient residual updates. Algorithm 3 consists of two main parts: the computation of the gradient and of the steepest descent stepsize. Forming the gradient involves a matrix product between the residual and an $m \times r$ or $r \times n$ matrix. The computations of both the residual and the matrix product require to leading order $2|\Omega|r$ floating point operations (flops), where $|\Omega|$ denotes the number of sampled entries. Computing the stepsize also requires $2|\Omega|r$ flops for the denominator. Naively implemented this would give a per iteration complexity for Alg. 3 of $12|\Omega|r$ flops. However the residual only needs to be computed once, at the start of the iterations, and can thereafter be updated efficiently. Taking the first two steps as an example, the residual after $X_i$ is updated to $X_{i+1}$ is

$P_\Omega(Z_0) - P_\Omega(X_{i+1}Y_i) = P_\Omega(Z_0) - P_\Omega\big((X_i - t_{x_i}\nabla f_{Y_i}(X_i))Y_i\big) = P_\Omega(Z_0) - P_\Omega(X_iY_i) + t_{x_i}P_\Omega(\nabla f_{Y_i}(X_i)Y_i)$.

Fortunately $P_\Omega(\nabla f_{Y_i}(X_i)Y_i)$ is already formed when computing $t_{x_i}$, so the new residual can be updated from the old one $P_\Omega(Z_0) - P_\Omega(X_iY_i)$ without computing a matrix-matrix product. Therefore the leading order cost per iteration of Alg. 3 is $8|\Omega|r$ flops.
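In code, the identity above means the masked product already formed for the stepsize denominator is reused to refresh the residual, so no second product with $Y$ is needed. A sketch of the $X$ half-step with this update (illustrative, following the derivation above):

```python
import numpy as np

def asd_x_halfstep(X, Y, R, mask):
    """X half-step of Alg. 3 with the Sec. 3.1 residual update.

    R is the current residual P_Omega(Z0) - P_Omega(XY); the refreshed residual
    is returned so it never has to be recomputed from scratch.
    """
    gX = -R @ Y.T                               # gradient (3.1); ~2|Omega|r flops when R is sparse
    gXY = np.where(mask, gX @ Y, 0.0)           # P_Omega(grad * Y), needed anyway for the stepsize
    tx = np.sum(gX**2) / np.sum(gXY**2)         # exact line-search stepsize (3.3)
    X = X - tx * gX
    R = R + tx * gXY                            # residual update: no extra matrix-matrix product
    return X, R
```

The $Y$ half-step is symmetric, reusing $P_\Omega(X\nabla f_X(Y))$; carrying the residual across the two half-steps in this way is what yields the $8|\Omega|r$ leading order count.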

3.2. ASD convergence analysis. The convergence analysis of block coordinate methods for general unconstrained optimization problems has been studied in [14] based on stationary points. However, because of the simple structure of problem (1.3), for completeness we offer a direct proof that limit points of ASD are necessarily stationary points of (1.3); that is, limit points $(X_*, Y_*)$ satisfy

(3.5)  $\nabla f_{X_*}(Y_*) = 0, \quad \nabla f_{Y_*}(X_*) = 0$.

Theorem 3.1. Any limit point of the sequence $(X_i, Y_i)$ generated by Alg. 3 is a stationary point.

Proof. Suppose $(X_{i_k}, Y_{i_k})$ is a subsequence of $(X_i, Y_i)$ such that $\lim_{i_k \to \infty} X_{i_k} = X_*$ and $\lim_{i_k \to \infty} Y_{i_k} = Y_*$ for some $(X_*, Y_*)$. In order to prove that $(X_*, Y_*)$ is a stationary point, by continuity of the gradient, it is sufficient to prove

(3.6)  $\lim_{i_k \to \infty} \nabla f_{X_{i_k}}(Y_{i_k}) = 0, \quad \lim_{i_k \to \infty} \nabla f_{Y_{i_k}}(X_{i_k}) = 0$.

Since $(X_{i_k}, Y_{i_k})$ are bounded for all $i_k$, (3.6) follows immediately from the subsequent Lems. 3.2 and 3.3.

Lemma 3.2. If $(X_{i_k}, Y_{i_k})$ is a bounded sequence, then

(3.7)  $\lim_{i_k \to \infty} \nabla f_{X_{i_k}}(Y_{i_k - 1}) = 0, \quad \lim_{i_k \to \infty} \nabla f_{Y_{i_k}}(X_{i_k}) = 0.$

Proof. By the quadratic form of $f(X, Y)$ with respect to $X$ and $Y$, the decrease of $f(X, Y)$ can be computed exactly in each iteration,

(3.8)  $f(X_i, Y_{i-1}) = f(X_{i-1}, Y_{i-1}) - \dfrac{1}{2}\dfrac{\|\nabla f_{Y_{i-1}}(X_{i-1})\|_F^4}{\|P_\Omega(\nabla f_{Y_{i-1}}(X_{i-1})Y_{i-1})\|_F^2}$,
(3.9)  $f(X_i, Y_i) = f(X_i, Y_{i-1}) - \dfrac{1}{2}\dfrac{\|\nabla f_{X_i}(Y_{i-1})\|_F^4}{\|P_\Omega(X_i\nabla f_{X_i}(Y_{i-1}))\|_F^2}$.

Summing up (3.8) and (3.9) from $i = 0$ to $i = l$ gives

$f(X_l, Y_l) = f(X_0, Y_0) - \dfrac{1}{2}\sum_{i=0}^{l-1}\left\{ \dfrac{\|\nabla f_{Y_i}(X_i)\|_F^4}{\|P_\Omega(\nabla f_{Y_i}(X_i)Y_i)\|_F^2} + \dfrac{\|\nabla f_{X_{i+1}}(Y_i)\|_F^4}{\|P_\Omega(X_{i+1}\nabla f_{X_{i+1}}(Y_i))\|_F^2}\right\}$.

The nonnegativity of $f(X_l, Y_l)$ for all $l$ implies

$\sum_{i=0}^{\infty}\left\{ \dfrac{\|\nabla f_{Y_i}(X_i)\|_F^4}{\|P_\Omega(\nabla f_{Y_i}(X_i)Y_i)\|_F^2} + \dfrac{\|\nabla f_{X_{i+1}}(Y_i)\|_F^4}{\|P_\Omega(X_{i+1}\nabla f_{X_{i+1}}(Y_i))\|_F^2}\right\} < \infty$.

Consequently,

(3.10)  $\lim_{i \to \infty} \dfrac{\|\nabla f_{Y_i}(X_i)\|_F^4}{\|P_\Omega(\nabla f_{Y_i}(X_i)Y_i)\|_F^2} = 0$,
(3.11)  $\lim_{i \to \infty} \dfrac{\|\nabla f_{X_i}(Y_{i-1})\|_F^4}{\|P_\Omega(X_i\nabla f_{X_i}(Y_{i-1}))\|_F^2} = 0$.

In particular, (3.10) and (3.11) remain true when $i$ is replaced by the subsequence $i_k$. Since $Y_{i_k}$ is bounded, one has

$\|P_\Omega(\nabla f_{Y_{i_k}}(X_{i_k})Y_{i_k})\|_F \le \|\nabla f_{Y_{i_k}}(X_{i_k})Y_{i_k}\|_F \le \|\nabla f_{Y_{i_k}}(X_{i_k})\|_F\,\|Y_{i_k}\| \le C\,\|\nabla f_{Y_{i_k}}(X_{i_k})\|_F$

for some $C > 0$. Thus

(3.12)  $\lim_{i_k \to \infty} \dfrac{\|\nabla f_{Y_{i_k}}(X_{i_k})\|_F^4}{C^2\|\nabla f_{Y_{i_k}}(X_{i_k})\|_F^2} \le \lim_{i_k \to \infty} \dfrac{\|\nabla f_{Y_{i_k}}(X_{i_k})\|_F^4}{\|P_\Omega(\nabla f_{Y_{i_k}}(X_{i_k})Y_{i_k})\|_F^2} = 0$,

which gives $\lim_{i_k \to \infty} \nabla f_{Y_{i_k}}(X_{i_k}) = 0$. Similarly $\lim_{i_k \to \infty} \nabla f_{X_{i_k}}(Y_{i_k - 1}) = 0$ since $X_{i_k}$ is bounded.

Lemma 3.2 justifies the second limit in (3.6). The first limit in (3.6) follows directly from Lem. 3.3.

Lemma 3.3. If $(X_{i_k}, Y_{i_k})$ are bounded, then

(3.13)  $\lim_{i_k \to \infty} \nabla f_{X_{i_k}}(Y_{i_k}) = 0$.

Proof. The difference between $\nabla f_{X_{i_k}}(Y_{i_k})$ and $\nabla f_{X_{i_k}}(Y_{i_k - 1})$ is

$\nabla f_{X_{i_k}}(Y_{i_k}) - \nabla f_{X_{i_k}}(Y_{i_k - 1}) = -X_{i_k}^T(P_\Omega(Z_0) - P_\Omega(X_{i_k}Y_{i_k})) + X_{i_k}^T(P_\Omega(Z_0) - P_\Omega(X_{i_k}Y_{i_k - 1}))$
$= X_{i_k}^T P_\Omega(X_{i_k}(Y_{i_k} - Y_{i_k - 1}))$
$= -t_{y_{i_k - 1}}\, X_{i_k}^T P_\Omega(X_{i_k}\nabla f_{X_{i_k}}(Y_{i_k - 1}))$
$= -\dfrac{\|\nabla f_{X_{i_k}}(Y_{i_k - 1})\|_F^2}{\|P_\Omega(X_{i_k}\nabla f_{X_{i_k}}(Y_{i_k - 1}))\|_F^2}\, X_{i_k}^T P_\Omega(X_{i_k}\nabla f_{X_{i_k}}(Y_{i_k - 1}))$,

where the third equality follows from $Y_{i_k} = Y_{i_k - 1} - t_{y_{i_k - 1}}\nabla f_{X_{i_k}}(Y_{i_k - 1})$. Hence,

(3.14)  $\|\nabla f_{X_{i_k}}(Y_{i_k}) - \nabla f_{X_{i_k}}(Y_{i_k - 1})\|_F \le \dfrac{\|X_{i_k}\|\,\|\nabla f_{X_{i_k}}(Y_{i_k - 1})\|_F^2}{\|P_\Omega(X_{i_k}\nabla f_{X_{i_k}}(Y_{i_k - 1}))\|_F} \le \dfrac{C_0\,\|\nabla f_{X_{i_k}}(Y_{i_k - 1})\|_F^2}{\|P_\Omega(X_{i_k}\nabla f_{X_{i_k}}(Y_{i_k - 1}))\|_F} \to 0$,

where the second inequality follows from the boundedness of $X_{i_k}$ and the limit follows from (3.11). Combining (3.14) with (3.7) gives $\lim_{i_k \to \infty} \nabla f_{X_{i_k}}(Y_{i_k}) = 0$.

3.3. PowerFactorization convergence analysis. This section presents a proof that limit points of PowerFactorization are necessarily stationary points. The proof follows that of Thm. 3.1.

Theorem 3.4. Any limit point of the sequence $(X_i, Y_i)$ generated by Alg. 1 is a stationary point.

Proof. Let $(X_i, Y_i)$ be a sequence from Alg. 1, with a subsequence $(X_{i_k}, Y_{i_k})$ converging to $(X_*, Y_*)$. Because $Y_i$ minimizes $f_{X_i}(Y)$ over all $Y \in \mathbb{R}^{r \times n}$, $\nabla f_{X_i}(Y_i) = 0$ for all $i$. By the continuity of the gradients,

$\nabla f_{X_*}(Y_*) = \lim_{i_k \to \infty} \nabla f_{X_{i_k}}(Y_{i_k}) = 0$.

To prove $\lim_{i_k \to \infty} \nabla f_{Y_{i_k}}(X_{i_k}) = 0$, let $(\tilde{X}_i, Y_{i-1})$ be the solution obtained by one step of steepest descent as in Alg. 3 from $(X_{i-1}, Y_{i-1})$, and let $(X_i, \tilde{Y}_i)$ be the solution obtained by one step of steepest descent from $(X_i, Y_{i-1})$. Then

$f(X_i, Y_{i-1}) \le f(\tilde{X}_i, Y_{i-1}) = f(X_{i-1}, Y_{i-1}) - \dfrac{1}{2}\dfrac{\|\nabla f_{Y_{i-1}}(X_{i-1})\|_F^4}{\|P_\Omega(\nabla f_{Y_{i-1}}(X_{i-1})Y_{i-1})\|_F^2}$,
$f(X_i, Y_i) \le f(X_i, \tilde{Y}_i) = f(X_i, Y_{i-1}) - \dfrac{1}{2}\dfrac{\|\nabla f_{X_i}(Y_{i-1})\|_F^4}{\|P_\Omega(X_i\nabla f_{X_i}(Y_{i-1}))\|_F^2}$,

where the inequalities follow from the fact that $X_i$ and $Y_i$ are, by definition, the minimizers, and the equalities follow from the same argument as in (3.8) and (3.9). Thus

$f(X_l, Y_l) \le f(X_0, Y_0) - \dfrac{1}{2}\sum_{i=0}^{l-1}\left\{ \dfrac{\|\nabla f_{Y_i}(X_i)\|_F^4}{\|P_\Omega(\nabla f_{Y_i}(X_i)Y_i)\|_F^2} + \dfrac{\|\nabla f_{X_{i+1}}(Y_i)\|_F^4}{\|P_\Omega(X_{i+1}\nabla f_{X_{i+1}}(Y_i))\|_F^2}\right\}$.

Repeating the argument for Lem. 3.2 yields

$\nabla f_{Y_*}(X_*) = \lim_{i_k \to \infty} \nabla f_{Y_{i_k}}(X_{i_k}) = 0$.

4. Scaled alternating steepest descent. In this section, we present an accelerated version of Alg. 3. To motivate its construction, let us consider the case when all the entries of $Z_0$ are observed. Under this assumption, problem (1.3) can be simplified to

(4.1)  $\min_{X, Y} \frac{1}{2}\|Z_0 - XY\|_F^2$.

The Newton directions for problem (4.1) with respect to $X$ and $Y$ are

(4.2)  $(Z_0 - XY)Y^T(YY^T)^{-1}$,
(4.3)  $(X^TX)^{-1}X^T(Z_0 - XY)$,

which are the gradient descent directions scaled by $(YY^T)^{-1}$ and $(X^TX)^{-1}$ respectively. However, when only partial entries of $Z_0$ are known, the Newton directions do not possess explicit formulas similar to (4.2) and (4.3), and solving each subproblem by Newton's method is the same as solving a least squares problem since $f(X, Y)$ is an exact quadratic function of $X$ or $Y$. Nevertheless, it is still possible to scale the gradient descent directions by $(YY^T)^{-1}$ and $(X^TX)^{-1}$. The scaled alternating steepest descent method (ScaledASD) uses the scaled gradient descent directions with exact line-search, Alg. 4.

Algorithm 4 Scaled Alternating Steepest Descent (ScaledASD)
Input: $P_\Omega(Z_0)$, $X_0 \in \mathbb{R}^{m \times r}$, $Y_0 \in \mathbb{R}^{r \times n}$
Repeat
1. $\nabla f_{Y_i}(X_i) = -(P_\Omega(Z_0) - P_\Omega(X_iY_i))Y_i^T$
2. $d_{x_i} = -\nabla f_{Y_i}(X_i)(Y_iY_i^T)^{-1}$,  $t_{x_i} = -\langle \nabla f_{Y_i}(X_i), d_{x_i}\rangle \,/\, \|P_\Omega(d_{x_i}Y_i)\|_F^2$
3. $X_{i+1} = X_i + t_{x_i}d_{x_i}$
4. $\nabla f_{X_{i+1}}(Y_i) = -X_{i+1}^T(P_\Omega(Z_0) - P_\Omega(X_{i+1}Y_i))$
5. $d_{y_i} = -(X_{i+1}^TX_{i+1})^{-1}\nabla f_{X_{i+1}}(Y_i)$,  $t_{y_i} = -\langle \nabla f_{X_{i+1}}(Y_i), d_{y_i}\rangle \,/\, \|P_\Omega(X_{i+1}d_{y_i})\|_F^2$
6. $Y_{i+1} = Y_i + t_{y_i}d_{y_i}$
7. $i = i + 1$
Until termination criteria is reached
Output: $X_i$, $Y_i$

Remark 4.1. The leading order per iteration computational cost of Alg. 4 remains $8|\Omega|r$, and the residual can be updated as in ASD, see Sec. 3.1. Although motivated from a different perspective, ScaledASD can also be viewed as a sequential version of the Riemannian gradient method suggested in [24], though with a new metric.
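A sketch of the ScaledASD $X$ half-step (illustrative Python, not the authors' implementation), showing the scaled direction of step 2 and the corresponding exact line-search stepsize; a linear solve is used instead of forming $(Y_iY_i^T)^{-1}$ explicitly.

```python
import numpy as np

def scaled_asd_x_halfstep(X, Y, R, mask):
    """X half-step of Alg. 4: scaled gradient direction with exact line-search.

    R is the current residual P_Omega(Z0) - P_Omega(XY), updated as in Sec. 3.1.
    """
    gX = -R @ Y.T                                  # gradient (3.1)
    dX = -np.linalg.solve(Y @ Y.T, gX.T).T         # d_x = -grad * (Y Y^T)^{-1}  (step 2)
    dXY = np.where(mask, dX @ Y, 0.0)              # P_Omega(d_x Y), reused for the residual update
    tx = -np.sum(gX * dX) / np.sum(dXY**2)         # exact line-search stepsize (step 2)
    X = X + tx * dX
    R = R - tx * dXY                               # refreshed residual
    return X, R
```

The $Y$ half-step is the mirror image with $(X^TX)^{-1}$; the extra $r \times r$ solves cost $O((m+n)r^2 + r^3)$ per iteration, which is negligible next to the $O(|\Omega|r)$ terms when $r$ is small.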

5. Numerical experiments. In this section, we present empirical results for our gradient based algorithms and other algorithms. ASD and ScaledASD are implemented in Matlab with two subroutines written in C to take advantage of the sparse structure. Other tested algorithms include PowerFactorization (PF) [16], LMaFit [32], LRGeomCG [31], and ScGrassMN [26], which are all downloaded from the authors' websites and, where applicable, use the aforementioned C subroutines. We compare these algorithms on both standard test images in Sec. 5.1 and random low rank matrices in Sec. 5.2.

5.1. Image inpainting. In this section, we demonstrate the performance of the proposed algorithms on image inpainting problems. Image inpainting concerns filling in the unknown pixels of an image from an incomplete set of observed entries. All the aforementioned algorithms are tested on three standard grayscale test images (Boat, Barbara, and Lena), each projected to the nearest rank 50 image in Frobenius norm. Two sampling schemes are considered: 1) random sampling, where 35% of the pixels of the low rank image were sampled uniformly at random, and 2) a cross mask, where 6.9% of the pixels of the image were masked in a non-random cross pattern. The tolerance was set to the same value for all algorithms. All algorithms were provided with the true rank of the image except LMaFit, which for random sampling used the warm starting strategies proposed by its authors in [32]; the hand tuned parameters chosen to give the best performance on the images tested were their increasing strategy with the initial rank set to 25 and the maximum rank set to 60. The tests were performed on the same computer as described in Sec. 5.2.2.

Figure 1. Image reconstruction for Boat from random sampling. The desired rank is 50, and 35% of the entries of the low rank image are sampled uniformly at random. Panels: original image, rank 50 approximation, the 35% randomly sampled image, and the reconstructions returned by the tested algorithms.
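The setup of Sec. 5.1 can be reproduced along the following lines (an illustrative sketch: the rank 50 projection and the 35% sampling rate come from the text above, while the loading of the image and the random seed are assumptions).

```python
import numpy as np

def low_rank_projection(img, r=50):
    """Nearest rank-r approximation in Frobenius norm via the truncated SVD."""
    U, s, Vt = np.linalg.svd(img, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def random_sampling_mask(shape, fraction=0.35, seed=0):
    """Boolean mask with roughly `fraction` of the pixels observed uniformly at random."""
    rng = np.random.default_rng(seed)
    return rng.random(shape) < fraction

# img = ...                                    # grayscale test image (e.g. Boat) as a 2-D float array
# Z0 = low_rank_projection(img, 50)            # rank 50 image to be recovered
# mask = random_sampling_mask(Z0.shape, 0.35)  # random sampling scheme 1)
# Z0_obs = np.where(mask, Z0, 0.0)             # the data P_Omega(Z0) supplied to each algorithm
```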

Figure 2. Image reconstruction for Boat from the cross mask. The desired rank is 50, and 6.9% of the entries of the low rank image are masked in a non-random way. Panels: original image, rank 50 approximation, the cross masked image, and the reconstructions returned by the tested algorithms.

The reconstructed images for the Boat test image and each tested algorithm are displayed in Figs. 1 and 2 for the random sampling and cross mask respectively. The relative residuals plotted against iteration number and computational time for each test image and each tested algorithm are displayed in Figs. 3 and 4 respectively. Fig. 4 (a,c,e) shows that ASD and ScaledASD have a rapid initial decrease in the residual, and that LRGeomCG has the fastest asymptotic rate. For moderate accuracy ASD and ScaledASD require the least time for random sampling of the tested natural images, while for high accuracy LRGeomCG is preferable. It should be noted that the completed images are visually indistinguishable once the relative residual reaches a moderate level, which is well before the fast asymptotic rate of LRGeomCG begins. Alternatively, for the cross mask, Fig. 4 (b,d,f) shows that ScaledASD, PowerFactorization, and LMaFit are preferable throughout, though the fast asymptotic rate of LRGeomCG suggests it would be superior for smaller relative tolerances. Only ScaledASD is preferred for both random sampling and the cross mask, suggesting its usage for moderate accuracy, and LRGeomCG for higher accuracy.

Figure 3. Relative residual as a function of iteration count for image recovery tests on Boat (top), Barbara (middle) and Lena (bottom), shown in panels (a)-(f). Left panels: random sampling; right panels: cross mask.

5.2. Random matrix completion problems. In this section we conduct tests on randomly drawn rank r matrices generated by the model $Z_0 = XY$, where $X \in \mathbb{R}^{m \times r}$ and $Y \in \mathbb{R}^{r \times n}$ have their entries drawn i.i.d. from the normal distribution $N(0, 1)$. A subset $\Omega$ of $p$ entries of $Z_0$ is sampled uniformly at random. For conciseness, the tests presented in this section consider square matrices, as is typical in the literature.
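Generating one such random test problem is straightforward; an illustrative sketch (sampling exactly $p$ entries without replacement is an assumption consistent with "uniformly at random"):

```python
import numpy as np

def random_test_problem(m, n, r, p, seed=0):
    """Rank-r test matrix Z0 = X Y with N(0,1) factors and p observed entries."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((m, r))
    Y = rng.standard_normal((r, n))
    Z0 = X @ Y
    flat = np.zeros(m * n, dtype=bool)
    flat[rng.choice(m * n, size=p, replace=False)] = True   # Omega drawn uniformly at random
    mask = flat.reshape(m, n)
    return Z0, mask

# Example: m = n = 1000 with 10% of the entries observed (delta = 0.1).
m = n = 1000
Z0, mask = random_test_problem(m, n, r=20, p=int(0.1 * m * n))
```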

Figure 4. Relative residual as a function of computational time (seconds) for image recovery tests on Boat (top), Barbara (middle) and Lena (bottom), shown in panels (a)-(f). Left panels: random sampling; right panels: cross mask.

5.2.1. Largest recoverable rank. Of central importance in matrix completion is the largest recoverable rank given $(p, m, n)$, or equivalently, given $(r, m, n)$, how many of the entries are needed in order to reliably recover a rank r matrix. To quantify the optimality of such statements we use the phase transition framework, which for matrix completion is defined

through the undersampling and oversampling ratios

(5.1)  $\delta = \dfrac{p}{mn}$ and $\rho = \dfrac{d_r}{p}$,

where $p$ is the number of sampled entries and $d_r = r(m + n - r)$ is the number of degrees of freedom in an $m \times n$ rank r matrix. The region of greatest importance is severe undersampling, $\delta \ll 1$, and we restrict our tests to the two values $\delta \in \{0.05, 0.1\}$, which also allows the testing of large matrices². Due to the high computational cost of accurately computing an algorithm's phase transition we limit the testing here to three of the most efficient algorithms: ScaledASD, LRGeomCG, and LMaFit. Both ScaledASD and LRGeomCG were provided with the true rank of the matrix, while the decreasing strategy with the initial rank set to 1.25r was used in LMaFit as suggested in [32]. In these tests an algorithm is considered to have successfully recovered the sampled matrix $Z_0$ if it returned a matrix $Z_{out}$ within the prescribed relative accuracy of $Z_0$ in the relative Frobenius norm, that is if $\mathrm{rel.err} := \|Z_{out} - Z_0\|_F / \|Z_0\|_F$ is below the tolerance. For each triple $(m, n, p)$, we started from a sufficiently small rank r so that the algorithm could successfully recover each of the sampled matrices in 100 random tests. The rank was then increased until it was large enough that the algorithm failed to recover any of the 100 tests to within the prescribed relative accuracy. We refer to the largest rank for which the algorithm succeeded in each of the 100 tests as $r_{min}$, and the smallest rank for which the algorithm failed all 100 tests as $r_{max}$. The exact values of $r_{min}$, $r_{max}$ and the associated $\rho_{min}$, $\rho_{max}$ are listed in Tab. 1. From Tab. 1 we can see that both ScaledASD and LRGeomCG are able to recover a rank r matrix from only $C\,r(m + n - r)$ measurements of its entries with $C$ only slightly larger than 1. The ability to recover a low rank matrix from nearly the minimum number of measurements was first reported for NIHT by the authors in [29]. In contrast, LMaFit has a substantially lower phase transition. Table 1 shows that these observations are consistent for different, moderately large size matrices.

Table 1. Phase transition table for ScaledASD, LRGeomCG, and LMaFit. For each $(m, n, p)$ with $p = \delta mn$, if $r \le r_{min}$ the algorithm was observed to recover each of the 100 randomly drawn $m \times n$ matrices of rank r, and it failed to recover each of the randomly drawn matrices if $r \ge r_{max}$. The oversampling ratios $\rho_{min}$ and $\rho_{max}$ are computed from $r_{min}$ and $r_{max}$ by (5.1). Columns: $m = n$, $\delta$, and $(r_{min}, r_{max}, \rho_{min}, \rho_{max})$ for each of the three algorithms.

²The tests were conducted on the IRIDIS HPC facility provided by the e-Infrastructure South Centre for Innovation. The IRIDIS facility runs Linux nodes composed of two 6-core 2.4GHz Intel Westmere processors and approximately 22GB of usable memory per node. The data presented required 4.5 months of CPU time.
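The quantities in (5.1) and the success criterion are simple to compute; a small illustrative sketch (the tolerance value is an assumption, since the paper's exact threshold is not reproduced here):

```python
import numpy as np

def undersampling_ratio(p, m, n):
    return p / (m * n)                     # delta in (5.1)

def oversampling_ratio(p, m, n, r):
    d_r = r * (m + n - r)                  # degrees of freedom of an m x n rank-r matrix
    return d_r / p                         # rho in (5.1); recovery is only possible for rho <= 1

def recovered(Z_out, Z0, tol=1e-2):
    """Success criterion: relative Frobenius error of the returned matrix below a tolerance."""
    rel_err = np.linalg.norm(Z_out - Z0, 'fro') / np.linalg.norm(Z0, 'fro')
    return rel_err <= tol
```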

5.2.2. Recovery time. In this section we evaluate the recovery time of the aforementioned algorithms when applied to the previously described randomly generated rank r matrices. The relative error, number of iterations, and computational time reported in this section are average results over 10 random tests. All algorithms were supplied with the true rank, except LMaFit which used the decreasing strategy with the initial rank 1.25r as advocated in [32]. The tolerance on $\|P_\Omega(Z_{out} - Z_0)\|_2 / \|P_\Omega(Z_0)\|_2$ was set to the same value for each algorithm and all other parameters were set to their default values. The simulations were conducted on a Linux machine with an Intel Xeon processor and 64GB of memory, and executed from Matlab R2013a.

The performance of ASD, ScaledASD, and PF is compared in Tab. 2 for medium size matrices of $m = n = 1000$ with undersampling ratio $\delta \in \{0.1, 0.3, 0.5\}$, with the rank of the matrix taken so that $p/d_r$ is approximately 2.0 and larger. Table 2 shows that PF takes many fewer iterations than ASD and ScaledASD; however, both ASD and ScaledASD can require much less computational time than PF due to their lower per iteration computational complexity. ScaledASD is observed to take fewer iterations than ASD and is typically faster, with the advantage of ScaledASD over ASD increasing as $\delta = p/mn$ increases for $p/d_r$ approximately fixed. This is because the more entries of a matrix are known, the more likely it is that ScaledASD will behave like a second order method; in particular, if all the entries have been observed, ScaledASD is actually an alternating Newton method for (4.1) while ASD is an alternating gradient descent method.

Next we compare the algorithms, excluding PF due to its slow convergence, on larger problem sizes and explore how their performance changes with additive noise. Tests with additive noise have the sampled entries $P_\Omega(Z_0)$ corrupted by the vector

(5.2)  $e = \epsilon\,\|P_\Omega(Z_0)\|_2\,\dfrac{w}{\|w\|_2}$,

where the entries of $w$ are i.i.d. standard Gaussian random variables, and $\epsilon$ is referred to as the noise level. The tests conducted here consider $\epsilon$ to be either zero or 0.1. Tests are conducted for $m = n \in \{4000, 8000, 16000\}$, $r \in \{40, 80\}$, and $p/d_r \in \{3, 5\}$. The results are presented in Tabs. 3 and 4 for noise levels $\epsilon = 0$ and $0.1$ respectively.

Table 3 shows that for $\epsilon = 0$, ASD and ScaledASD consistently require less time than LRGeomCG; ScaledASD always requires the least time for $p/d_r = 5$ and ASD requires the least time for $p/d_r = 3$. Table 4 shows that for $\epsilon = 0.1$ the performance is less clear cut, with no single algorithm requiring the least time in more than four of the twelve $(p, m, n, r)$ tuples tested. Despite the efficiency of LRGeomCG for random sampling of the natural images in Fig. 4 (a,c,e), Tabs. 3 and 4 show that it consistently requires substantially greater computational time, for random rank r matrices, than the other algorithms tested.

Tables 3 and 4 show the efficiency of ASD and ScaledASD when the random problems are solved to moderate accuracy. However it is worth noting that these randomly generated matrices are well-conditioned with high probability; as ASD and ScaledASD are both first order methods, it can be expected that they will suffer from slow asymptotic convergence for severely ill-conditioned matrices, in which case a higher order method such as RTRMC or ScGrassMN may be preferable; see [4, 26] for a discussion of this phenomenon.
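A sketch of the additive noise model (5.2) (illustrative; placing the noise directly onto the observed entries is an assumption consistent with corrupting the sampled entries):

```python
import numpy as np

def add_noise(Z0_obs, mask, eps, seed=0):
    """Corrupt the observed entries by e = eps * ||P_Omega(Z0)||_2 * w / ||w||_2, cf. (5.2)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(int(mask.sum()))        # i.i.d. standard Gaussian noise, one entry per sample
    e = eps * np.linalg.norm(Z0_obs[mask]) * w / np.linalg.norm(w)
    Z_noisy = Z0_obs.copy()
    Z_noisy[mask] += e                              # only the entries indexed by Omega are corrupted
    return Z_noisy
```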

Table 2. Average computational time (seconds) and average number of iterations of ASD, ScaledASD and PF over ten random rank r matrices per $(p, m, n, r)$ tuple, for $m = 1000$, $n = 1000$, $\delta \in \{0.1, 0.3, 0.5\}$. For each undersampling ratio $p/mn$ the rows report the rank r and the oversampling ratio $p/d_r$, with the relative error (rel.err), number of iterations (iter) and time for each algorithm.

Table 3. Average computational time (seconds) and average number of iterations of ASD, ScaledASD, LMaFit and LRGeomCG over ten random rank r matrices per $(p, m, n, r)$ tuple, for $m = n \in \{4000, 8000, 16000\}$, $r \in \{40, 80\}$ and $p/d_r \in \{3, 5\}$. The noise level $\epsilon$ is equal to 0. For each $m = n$ the rows report r and $p/d_r$, with the relative error (rel.err), number of iterations (iter) and time for each algorithm.

Table 4. Average computational time (seconds) and average number of iterations of ASD, ScaledASD, LMaFit and LRGeomCG over ten random rank r matrices per $(p, m, n, r)$ tuple, for $m = n \in \{4000, 8000, 16000\}$, $r \in \{40, 80\}$ and $p/d_r \in \{3, 5\}$. The noise level $\epsilon$ is equal to 0.1. The columns are as in Table 3.

6. Conclusion and future directions. In this paper, we have proposed an alternating steepest descent method, ASD, and a scaled variant, ScaledASD, for matrix completion. Despite their simplicity, empirical evaluation of ASD and ScaledASD on both image inpainting and random problems shows them to be competitive with other state-of-the-art matrix completion algorithms in terms of recoverable rank and overall computational time. In particular, their low per iteration computational complexity makes ASD and ScaledASD efficient for large size problems, especially when computing the solutions to moderate accuracy. Although ASD and ScaledASD are implemented for fixed rank problems in this paper, the common rank increasing or decreasing heuristics can be incorporated into them as well. For ScaledASD, it would be interesting to try different scaling techniques, especially for more diverse applications. In a different direction, a recent paper [19] proves that PowerFactorization with a specific initial point is guaranteed to find the objective matrix with high probability if the number of measurements is sufficiently large; a similar analysis for ASD appears to be a constructive approach.

REFERENCES

[1] Y. Amit, M. Fink, N. Srebro, and S. Ullman. Uncovering shared structures in multiclass classification. In Proceedings of the 24th International Conference on Machine Learning, ACM, pages 17-24.
[2] A. Argyriou, T. Evgeniou, and M. Pontil. Multi-task feature learning. Advances in Neural Information Processing Systems, 19:41-48.
[3] J. Blanchard, J. Tanner, and K. Wei. CGIHT: Conjugate gradient iterative hard thresholding for compressed sensing and matrix completion. Numerical Analysis Group preprint 14/08, University of Oxford.

[4] N. Boumal and P.-A. Absil. RTRMC: A Riemannian trust-region method for low-rank matrix completion. In J. Shawe-Taylor, R.S. Zemel, P. Bartlett, F.C.N. Pereira, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 24 (NIPS).
[5] J.-F. Cai, E. J. Candès, and Z. W. Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4).
[6] E. J. Candès and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6).
[7] E. J. Candès and T. Tao. The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5).
[8] W. Dai, O. Milenkovic, and E. Kerman. Subspace evolution and transfer (SET) for low-rank matrix completion. IEEE Transactions on Signal Processing, 59(7).
[9] Y. C. Eldar, D. Needell, and Y. Plan. Unicity conditions for low-rank matrix recovery. Applied and Computational Harmonic Analysis, 33(2).
[10] L. Eldén. Matrix Methods in Data Mining and Pattern Recognition. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA.
[11] R. Escalante and M. Raydan. Alternating Projection Methods. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA.
[12] M. Fazel. Matrix rank minimization with applications. Ph.D. dissertation, Stanford University.
[13] D. Goldfarb and S. Ma. Convergence of fixed-point continuation algorithms for matrix rank minimization. Foundations of Computational Mathematics, 11(2).
[14] L. Grippo and M. Sciandrone. Globally convergent block-coordinate techniques for unconstrained optimization. Optimization Methods & Software, 10(4).
[15] D. Gross. Recovering low-rank matrices from few coefficients in any basis. IEEE Transactions on Information Theory, 57(3).
[16] J. P. Haldar and D. Hernando. Rank-constrained solutions to linear matrix equations using PowerFactorization. IEEE Signal Processing Letters, 16(7).
[17] N. J. A. Harvey, D. R. Karger, and S. Yekhanin. The complexity of matrix completion. SODA 2006: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms.
[18] P. Jain, R. Meka, and I. Dhillon. Guaranteed rank minimization via singular value projection. Proceedings of the Neural Information Processing Systems Conference (NIPS).
[19] P. Jain, P. Netrapalli, and S. Sanghavi. Low-rank matrix completion using alternating minimization. arXiv preprint.
[20] K.-C. Toh and S. Yun. An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems. Pacific Journal of Optimization, 6.
[21] A. Kyrillidis and V. Cevher. Matrix ALPS: Accelerated low rank and sparse matrix reconstruction. Technical report.
[22] A. Kyrillidis and V. Cevher. Matrix recipes for hard thresholding methods. Journal of Mathematical Imaging and Vision, 48(2).
[23] Z. Liu and L. Vandenberghe. Interior-point method for nuclear norm approximation with application to system identification. SIAM Journal on Matrix Analysis and Applications, 31.
[24] B. Mishra, K. A. Apuroop, and R. Sepulchre. A Riemannian geometry for low-rank matrix completion. arXiv preprint.
[25] B. Mishra, G. Meyer, S. Bonnabel, and R. Sepulchre. Fixed-rank matrix factorizations and Riemannian low-rank optimization. Computational Statistics.
[26] T. Ngo and Y. Saad.
Scaled gradients on Grassmann manifolds for matrix completion. In Advances in Neural Information Processing Systems (NIPS).
[27] B. Recht. A simpler approach to matrix completion. The Journal of Machine Learning Research, 12.
[28] B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3).
[29] J. Tanner and K. Wei. Normalized iterative hard thresholding for matrix completion. SIAM Journal on Scientific Computing, 35(5):S104-S125.

[30] K.-C. Toh and S. Yun. An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems. Pacific Journal of Optimization.
[31] B. Vandereycken. Low rank matrix completion by Riemannian optimization. SIAM Journal on Optimization, 23(2).
[32] Z. Wen, W. Yin, and Y. Zhang. Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm. Mathematical Programming Computation, 4, 2012.


More information

Comparison of Modern Stochastic Optimization Algorithms

Comparison of Modern Stochastic Optimization Algorithms Comparison of Modern Stochastic Optimization Algorithms George Papamakarios December 214 Abstract Gradient-based optimization methods are popular in machine learning applications. In large-scale problems,

More information

Optimisation Combinatoire et Convexe.

Optimisation Combinatoire et Convexe. Optimisation Combinatoire et Convexe. Low complexity models, l 1 penalties. A. d Aspremont. M1 ENS. 1/36 Today Sparsity, low complexity models. l 1 -recovery results: three approaches. Extensions: matrix

More information

arxiv: v1 [math.na] 26 Nov 2009

arxiv: v1 [math.na] 26 Nov 2009 Non-convexly constrained linear inverse problems arxiv:0911.5098v1 [math.na] 26 Nov 2009 Thomas Blumensath Applied Mathematics, School of Mathematics, University of Southampton, University Road, Southampton,

More information

LOW ALGEBRAIC DIMENSION MATRIX COMPLETION

LOW ALGEBRAIC DIMENSION MATRIX COMPLETION LOW ALGEBRAIC DIMENSION MATRIX COMPLETION Daniel Pimentel-Alarcón Department of Computer Science Georgia State University Atlanta, GA, USA Gregory Ongie and Laura Balzano Department of Electrical Engineering

More information

Lecture Notes 10: Matrix Factorization

Lecture Notes 10: Matrix Factorization Optimization-based data analysis Fall 207 Lecture Notes 0: Matrix Factorization Low-rank models. Rank- model Consider the problem of modeling a quantity y[i, j] that depends on two indices i and j. To

More information

Convergence Rates for Greedy Kaczmarz Algorithms

Convergence Rates for Greedy Kaczmarz Algorithms onvergence Rates for Greedy Kaczmarz Algorithms Julie Nutini 1, Behrooz Sepehry 1, Alim Virani 1, Issam Laradji 1, Mark Schmidt 1, Hoyt Koepke 2 1 niversity of British olumbia, 2 Dato Abstract We discuss

More information

Symmetric Factorization for Nonconvex Optimization

Symmetric Factorization for Nonconvex Optimization Symmetric Factorization for Nonconvex Optimization Qinqing Zheng February 24, 2017 1 Overview A growing body of recent research is shedding new light on the role of nonconvex optimization for tackling

More information

A Scalable Spectral Relaxation Approach to Matrix Completion via Kronecker Products

A Scalable Spectral Relaxation Approach to Matrix Completion via Kronecker Products A Scalable Spectral Relaxation Approach to Matrix Completion via Kronecker Products Hui Zhao Jiuqiang Han Naiyan Wang Congfu Xu Zhihua Zhang Department of Automation, Xi an Jiaotong University, Xi an,

More information

Matrix Completion from Noisy Entries

Matrix Completion from Noisy Entries Matrix Completion from Noisy Entries Raghunandan H. Keshavan, Andrea Montanari, and Sewoong Oh Abstract Given a matrix M of low-rank, we consider the problem of reconstructing it from noisy observations

More information

1 Non-negative Matrix Factorization (NMF)

1 Non-negative Matrix Factorization (NMF) 2018-06-21 1 Non-negative Matrix Factorization NMF) In the last lecture, we considered low rank approximations to data matrices. We started with the optimal rank k approximation to A R m n via the SVD,

More information

Spectral gradient projection method for solving nonlinear monotone equations

Spectral gradient projection method for solving nonlinear monotone equations Journal of Computational and Applied Mathematics 196 (2006) 478 484 www.elsevier.com/locate/cam Spectral gradient projection method for solving nonlinear monotone equations Li Zhang, Weijun Zhou Department

More information

Guaranteed Rank Minimization via Singular Value Projection

Guaranteed Rank Minimization via Singular Value Projection 3 4 5 6 7 8 9 3 4 5 6 7 8 9 3 4 5 6 7 8 9 3 3 3 33 34 35 36 37 38 39 4 4 4 43 44 45 46 47 48 49 5 5 5 53 Guaranteed Rank Minimization via Singular Value Projection Anonymous Author(s) Affiliation Address

More information

Coherent imaging without phases

Coherent imaging without phases Coherent imaging without phases Miguel Moscoso Joint work with Alexei Novikov Chrysoula Tsogka and George Papanicolaou Waves and Imaging in Random Media, September 2017 Outline 1 The phase retrieval problem

More information

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER 2011 7255 On the Performance of Sparse Recovery Via `p-minimization (0 p 1) Meng Wang, Student Member, IEEE, Weiyu Xu, and Ao Tang, Senior

More information

Structured matrix factorizations. Example: Eigenfaces

Structured matrix factorizations. Example: Eigenfaces Structured matrix factorizations Example: Eigenfaces An extremely large variety of interesting and important problems in machine learning can be formulated as: Given a matrix, find a matrix and a matrix

More information

Convex Optimization and l 1 -minimization

Convex Optimization and l 1 -minimization Convex Optimization and l 1 -minimization Sangwoon Yun Computational Sciences Korea Institute for Advanced Study December 11, 2009 2009 NIMS Thematic Winter School Outline I. Convex Optimization II. l

More information

Breaking the Limits of Subspace Inference

Breaking the Limits of Subspace Inference Breaking the Limits of Subspace Inference Claudia R. Solís-Lemus, Daniel L. Pimentel-Alarcón Emory University, Georgia State University Abstract Inferring low-dimensional subspaces that describe high-dimensional,

More information

of Orthogonal Matching Pursuit

of Orthogonal Matching Pursuit A Sharp Restricted Isometry Constant Bound of Orthogonal Matching Pursuit Qun Mo arxiv:50.0708v [cs.it] 8 Jan 205 Abstract We shall show that if the restricted isometry constant (RIC) δ s+ (A) of the measurement

More information

A New Estimate of Restricted Isometry Constants for Sparse Solutions

A New Estimate of Restricted Isometry Constants for Sparse Solutions A New Estimate of Restricted Isometry Constants for Sparse Solutions Ming-Jun Lai and Louis Y. Liu January 12, 211 Abstract We show that as long as the restricted isometry constant δ 2k < 1/2, there exist

More information

Introduction to Compressed Sensing

Introduction to Compressed Sensing Introduction to Compressed Sensing Alejandro Parada, Gonzalo Arce University of Delaware August 25, 2016 Motivation: Classical Sampling 1 Motivation: Classical Sampling Issues Some applications Radar Spectral

More information

GREEDY SIGNAL RECOVERY REVIEW

GREEDY SIGNAL RECOVERY REVIEW GREEDY SIGNAL RECOVERY REVIEW DEANNA NEEDELL, JOEL A. TROPP, ROMAN VERSHYNIN Abstract. The two major approaches to sparse recovery are L 1-minimization and greedy methods. Recently, Needell and Vershynin

More information

Unconstrained optimization

Unconstrained optimization Chapter 4 Unconstrained optimization An unconstrained optimization problem takes the form min x Rnf(x) (4.1) for a target functional (also called objective function) f : R n R. In this chapter and throughout

More information

Positive semidefinite matrix approximation with a trace constraint

Positive semidefinite matrix approximation with a trace constraint Positive semidefinite matrix approximation with a trace constraint Kouhei Harada August 8, 208 We propose an efficient algorithm to solve positive a semidefinite matrix approximation problem with a trace

More information

Sparse Solutions of an Undetermined Linear System

Sparse Solutions of an Undetermined Linear System 1 Sparse Solutions of an Undetermined Linear System Maddullah Almerdasy New York University Tandon School of Engineering arxiv:1702.07096v1 [math.oc] 23 Feb 2017 Abstract This work proposes a research

More information

A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming

A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming Zhaosong Lu Lin Xiao March 9, 2015 (Revised: May 13, 2016; December 30, 2016) Abstract We propose

More information

arxiv: v2 [cs.it] 12 Jul 2011

arxiv: v2 [cs.it] 12 Jul 2011 Online Identification and Tracking of Subspaces from Highly Incomplete Information Laura Balzano, Robert Nowak and Benjamin Recht arxiv:1006.4046v2 [cs.it] 12 Jul 2011 Department of Electrical and Computer

More information

arxiv: v1 [math.na] 28 Oct 2018

arxiv: v1 [math.na] 28 Oct 2018 Iterative Hard Thresholding for Low-Rank Recovery from Rank-One Projections Simon Foucart and Srinivas Subramanian Texas A&M University Abstract arxiv:80.749v [math.na] 28 Oct 208 A novel algorithm for

More information

Tensor Completion for Estimating Missing Values in Visual Data

Tensor Completion for Estimating Missing Values in Visual Data Tensor Completion for Estimating Missing Values in Visual Data Ji Liu, Przemyslaw Musialski 2, Peter Wonka, and Jieping Ye Arizona State University VRVis Research Center 2 Ji.Liu@asu.edu, musialski@vrvis.at,

More information

DELFT UNIVERSITY OF TECHNOLOGY

DELFT UNIVERSITY OF TECHNOLOGY DELFT UNIVERSITY OF TECHNOLOGY REPORT 16-02 The Induced Dimension Reduction method applied to convection-diffusion-reaction problems R. Astudillo and M. B. van Gijzen ISSN 1389-6520 Reports of the Delft

More information

Matrix Completion from a Few Entries

Matrix Completion from a Few Entries Matrix Completion from a Few Entries Raghunandan H. Keshavan and Sewoong Oh EE Department Stanford University, Stanford, CA 9434 Andrea Montanari EE and Statistics Departments Stanford University, Stanford,

More information

Lecture 1: September 25

Lecture 1: September 25 0-725: Optimization Fall 202 Lecture : September 25 Lecturer: Geoff Gordon/Ryan Tibshirani Scribes: Subhodeep Moitra Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have

More information

Robust PCA by Manifold Optimization

Robust PCA by Manifold Optimization Journal of Machine Learning Research 19 (2018) 1-39 Submitted 8/17; Revised 10/18; Published 11/18 Robust PCA by Manifold Optimization Teng Zhang Department of Mathematics University of Central Florida

More information

A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation

A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation Zhouchen Lin Peking University April 22, 2018 Too Many Opt. Problems! Too Many Opt. Algorithms! Zero-th order algorithms:

More information

Proximal Gradient Descent and Acceleration. Ryan Tibshirani Convex Optimization /36-725

Proximal Gradient Descent and Acceleration. Ryan Tibshirani Convex Optimization /36-725 Proximal Gradient Descent and Acceleration Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: subgradient method Consider the problem min f(x) with f convex, and dom(f) = R n. Subgradient method:

More information

Nonlinear Optimization for Optimal Control

Nonlinear Optimization for Optimal Control Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]

More information

Deep Learning: Approximation of Functions by Composition

Deep Learning: Approximation of Functions by Composition Deep Learning: Approximation of Functions by Composition Zuowei Shen Department of Mathematics National University of Singapore Outline 1 A brief introduction of approximation theory 2 Deep learning: approximation

More information