A NON-MONOTONE ALTERNATING UPDATING METHOD FOR A CLASS OF MATRIX FACTORIZATION PROBLEMS

LEI YANG, TING KEI PONG, AND XIAOJUN CHEN

Abstract. In this paper we consider a general matrix factorization model which covers a large class of existing models with many applications in areas such as machine learning and imaging sciences. To solve this possibly nonconvex, nonsmooth and non-Lipschitz problem, we develop a non-monotone alternating updating method based on a potential function. Our method essentially updates two blocks of variables in turn by inexactly minimizing this potential function, and updates another auxiliary block of variables using an explicit formula. The special structure of our potential function allows us to take advantage of efficient computational strategies for non-negative matrix factorization to perform the alternating minimization over the two blocks of variables. A suitable line search criterion is also incorporated to improve the numerical performance. Under some mild conditions, we show that the line search criterion is well defined, and establish that the sequence generated is bounded and any cluster point of the sequence is a stationary point. Finally, we conduct some numerical experiments using real datasets to compare our method with some existing efficient methods for non-negative matrix factorization and matrix completion. The numerical results show that our method can outperform these methods for these specific applications.

Key words. Matrix factorization; non-monotone line search; stationary point; alternating updating.

AMS subject classifications. 90C26, 90C30, 90C90, 65K05

1. Introduction. In this paper we consider a class of matrix factorization problems, which can be modeled as

(1.1)    min_{X,Y} F(X, Y) := Ψ(X) + Φ(Y) + (1/2)‖A(XY^T) − b‖^2,

where X ∈ R^{m×r} and Y ∈ R^{n×r} are decision variables with r ≤ min{m, n}, the functions Ψ : R^{m×r} → R ∪ {∞} and Φ : R^{n×r} → R ∪ {∞} are proper closed but possibly nonconvex, nonsmooth and non-Lipschitz, b ∈ R^q is a given vector and A : R^{m×n} → R^q is a linear map with q ≤ mn and AA^* = I_q (I_q denotes the identity map from R^q to R^q). Model (1.1) covers many existing widely-studied models in many application areas such as machine learning [35] and imaging sciences [44]. In particular, Ψ(X) and Φ(Y) can be various regularizers for inducing desired structures, and A can be suitably chosen to model different scenarios. For example, when Ψ(X) and Φ(Y) are chosen as the indicator functions (see the next section for notation and definitions) of X = {X ∈ R^{m×r} : X ≥ 0} and Y = {Y ∈ R^{n×r} : Y ≥ 0}, respectively, and A is the identity map, (1.1) reduces to the non-negative matrix factorization (NMF) problem, which has been widely used in data mining applications to provide interpretable decompositions of data. NMF was first introduced by Paatero and Tapper [25], and then popularized by Lee and Seung [17]. The basic task of NMF is to find two nonnegative matrices X ∈ R^{m×r}_+ and Y ∈ R^{n×r}_+ such that M ≈ XY^T for a given nonnegative data matrix M ∈ R^{m×n}_+. We refer readers to [2, 9, 10, 18, 37] for more information on NMF and its variants. Another example of (1.1) arises in recent models of the matrix completion (MC) problem (see [30, 31, 32]), where Ψ(X)

Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, P.R. China (lei.yang@connect.polyu.hk, tk.pong@polyu.edu.hk, xiaojun.chen@polyu.edu.hk). The second author's work was supported in part by Hong Kong Research Grants Council PolyU153085/16p. The third author's work was supported in part by Hong Kong Research Grants Council PolyU153000/15p.

and Φ(Y) are chosen as the Schatten-p_1 quasi-norm and the Schatten-p_2 quasi-norm for suitable p_1, p_2 > 0, respectively, and A is the sampling map. The MC problem aims to recover an unknown low-rank matrix from a sample of its entries and arises in various applications (see, for example, [3, 2, 27, 33]). Many widely-studied models for MC are based on nuclear-norm minimization [5, 6, 26], or, more generally, Schatten-p (0 < p ≤ 1) (quasi-)norm minimization [16, 23, 24]. Recently, models based on low-rank matrix factorization such as (1.1) have become popular because singular value decompositions or eigenvalue decompositions of huge (m × n) matrices are not required for solving these models (see, for example, [15, 30, 31, 32, 34, 38]). More examples of (1.1) can be found in recent surveys [35, 44].

Problem (1.1) is in general nonconvex (even when Ψ, Φ are convex) and NP-hard^1. Therefore, in this paper, we focus on finding a stationary point of the objective F in (1.1). Note that F involves two blocks of variables. This kind of structure has been widely studied in the literature; see, for example, [1, 4, 13, 14, 40, 41, 43]. One popular class of methods for tackling this kind of problems is the alternating direction method of multipliers (ADMM) (see, for example, [41, 43]), in which each iteration consists of an alternating minimization of an augmented Lagrangian function that involves X, Y and some auxiliary variables, followed by updates of the associated multipliers. However, the conditions presented in [41, 43] that guarantee convergence of the ADMM are too restrictive. Moreover, updating the auxiliary variables and the multipliers can be expensive for large-scale problems. Another class of methods for (1.1) is the alternating-minimization-based (or block-coordinate-descent-type) methods (see [1, 4, 8, 11, 20, 21, 40]), which alternately (exactly or inexactly) minimize F(X, Y) over each block of variables and converge under some mild conditions. When A is not the identity map, the majorization technique can be used to simplify the subproblems. Some representative algorithms of this class are proximal alternating linearized minimization (PALM) [4], hierarchical alternating least squares (HALS) (for NMF only; see [8, 11, 20, 21]) and block coordinate descent (BCD) [40]. Compared with ADMM, it was reported in [40] that BCD outperforms ADMM in both CPU time and solution quality for NMF. PALM, HALS and BCD are currently the state-of-the-art algorithms for solving problems of the form (1.1). In this paper, we develop a new iterative method for (1.1), which, according to our numerical experiments in Section 6, outperforms HALS and BCD for NMF, and PALM for MC. Our method is based on the following potential function (specifically constructed for F in (1.1)):

(1.2)    Θ_{α,β}(X, Y, Z) := Ψ(X) + Φ(Y) + (α/2)‖XY^T − Z‖_F^2 + (β/2)‖A(Z) − b‖^2,

where α and β are real numbers. Instead of alternately (exactly or inexactly) minimizing F(X, Y) or the augmented Lagrangian function, our method alternately updates X and Y by inexactly minimizing Θ_{α,β}(X, Y, Z) over X and Y, and then updates Z by an explicit formula. Note that the coupled variable XY^T is now separated from A in our potential function. Thus, one can readily take advantage of efficient computational strategies for NMF, such as those used in HALS (see the hierarchical-prox updating strategy in Section 4), for inexactly minimizing Θ_{α,β}(X, Y, Z) over X or Y.

^1 Problem (1.1) is NP-hard because it contains NMF as a special case, which is NP-hard in general [36].
Furthermore, our method can be implemented for NMF and MC without explicitly forming the huge (m × n) matrix Z (see (6.3) and (6.5)) in each iteration. This significantly reduces the computational cost per iteration.

Finally, a suitable non-monotone line search criterion, which is motivated by recent studies on non-monotone algorithms (see, for example, [7, 12, 39]), is also incorporated to improve the numerical performance.

In the rest of this paper, we first present notation and preliminaries in Section 2. We then study the properties of our potential function Θ_{α,β} in Section 3. Specifically, if AA^* = I_q and α, β are chosen such that αI + βA^*A ≻ 0 and 1/α + 1/β = 1, then the problem min_{X,Y,Z} {Θ_{α,β}(X, Y, Z)} is equivalent to (1.1) (see Theorem 3.2). Furthermore, under the weaker conditions that AA^* = I_q and 1/α + 1/β = 1, we can show that (i) a stationary point of Θ_{α,β} gives a stationary point of F; (ii) a stationary point of F can be used to construct a stationary point of Θ_{α,β} (see Theorem 3.3). Thus, one can find a stationary point of F by finding a stationary point of Θ_{α,β}. In Section 4, we develop a non-monotone alternating updating method to find a stationary point of Θ_{α,β}, and hence of F. The convergence analysis of our method is presented in Section 5. We show that our non-monotone line search criterion is well defined and any cluster point of the sequence generated by our method is a stationary point of F under some mild conditions. Section 6 gives numerical experiments to evaluate the performance of our method for NMF and MC on real datasets. Our computational results illustrate the efficiency of our method. Finally, some concluding remarks are given in Section 7.

2. Notation and preliminaries. In this paper, for a vector x ∈ R^m, x_i denotes its i-th entry, ‖x‖ denotes the Euclidean norm of x and Diag(x) denotes the diagonal matrix whose i-th diagonal element is x_i. For a matrix X ∈ R^{m×n}, x_{ij} denotes its (i,j)-th entry, x_j denotes the j-th column of X and tr(X) denotes the trace of X. The Schatten-p (quasi-)norm (0 < p < ∞) of X is defined as ‖X‖_{S_p} = (Σ_{i=1}^{min{m,n}} ς_i(X)^p)^{1/p}, where ς_i(X) is the i-th singular value of X. For p = 2, the Schatten-2 norm reduces to the Frobenius norm ‖X‖_F, and for p = 1, the Schatten-1 norm reduces to the nuclear norm ‖X‖_*. Moreover, the spectral norm is denoted by ‖X‖, which is the largest singular value of X; and the l_1-norm and l_p-quasi-norm (0 < p < 1) of X are given by ‖X‖_1 := Σ_{i=1}^m Σ_{j=1}^n |x_{ij}| and ‖X‖_p := (Σ_{i=1}^m Σ_{j=1}^n |x_{ij}|^p)^{1/p}, respectively. For two matrices X and Y of the same size, we denote their trace inner product by ⟨X, Y⟩ := Σ_{i=1}^m Σ_{j=1}^n x_{ij} y_{ij}. We also use X ≥ Y (resp., X ≤ Y) to denote x_{ij} ≥ y_{ij} (resp., x_{ij} ≤ y_{ij}) for all (i, j). Furthermore, for a linear map A : R^{m×n} → R^q, A^* denotes the adjoint linear map and ‖A‖ denotes the induced operator norm of A, i.e., ‖A‖ = sup{‖A(X)‖ : ‖X‖_F ≤ 1}. A linear self-map T is said to be symmetric if T = T^*. For a symmetric linear self-map T : R^{m×n} → R^{m×n}, we say that T is positive definite, denoted by T ≻ 0, if ⟨X, T(X)⟩ > 0 for all X ≠ 0. The identity map from R^{m×n} to R^{m×n} is denoted by I and the identity map from R^q to R^q is denoted by I_q. Finally, for a nonempty closed set C ⊆ R^{m×n}, its indicator function δ_C is defined by δ_C(X) = 0 if X ∈ C, and δ_C(X) = +∞ otherwise. For an extended-real-valued function f : R^{m×n} → [−∞, ∞], we say that it is proper if f(X) > −∞ for all X ∈ R^{m×n} and its domain dom f := {X ∈ R^{m×n} : f(X) < ∞} is nonempty. A function f : R^{m×n} → [−∞, ∞] is level-bounded [28, Definition 1.8] if for every α ∈ R, the set {X ∈ R^{m×n} : f(X) ≤ α} is bounded (possibly empty). For a proper function f : R^{m×n} → (−∞, ∞], we use the notation

Y →_f X to denote Y → X (i.e., ‖Y − X‖_F → 0) and f(Y) → f(X). The (limiting) subdifferential [28, Definition 8.3] of f at X ∈ dom f used in this paper, denoted by ∂f(X), is defined as

∂f(X) := { D ∈ R^{m×n} : ∃ X^k →_f X and D^k → D with D^k ∈ ∂̂f(X^k) for all k },

where ∂̂f(Ỹ) denotes the Fréchet subdifferential of f at Ỹ ∈ dom f, which is the set of all D ∈ R^{m×n} satisfying

lim inf_{Y→Ỹ, Y≠Ỹ}  [ f(Y) − f(Ỹ) − ⟨D, Y − Ỹ⟩ ] / ‖Y − Ỹ‖_F  ≥ 0.

From the above definition, we can easily observe (see, for example, [28, Proposition 8.7]) that

(2.1)    { D ∈ R^{m×n} : ∃ X^k →_f X, D^k → D, D^k ∈ ∂f(X^k) } ⊆ ∂f(X).

When f is continuously differentiable or convex, the above subdifferential coincides with the classical concept of derivative or convex subdifferential of f; see, for example, [28, Exercise 8.8] and [28, Proposition 8.12]. In this paper, we say that X^* is a stationary point of f if 0 ∈ ∂f(X^*). For a proper closed function g : R^m → (−∞, ∞], the proximal mapping Prox_g : R^m → R^m of g is defined by

Prox_g(z) := Argmin_{x ∈ R^m} { g(x) + (1/2)‖x − z‖^2 }.

For any ν > 0, the matrix shrinkage operator S_ν : R^{m×n} → R^{m×n} is defined by

S_ν(X) := U Diag(s̄) V^T  with  s̄_i = s_i − ν if s_i − ν > 0, and s̄_i = 0 otherwise,

where U ∈ R^{m×t}, s ∈ R^t_+ and V ∈ R^{n×t} are given by the singular value decomposition of X, i.e., X = U Diag(s) V^T. We now present two propositions, which will be useful for developing our method in Section 4.

Proposition 2.1. Suppose that AA^* = I_q and α(α + β) ≠ 0. Then, αI + βA^*A is invertible and its inverse is given by (1/α)(I − (β/(α+β)) A^*A).

Proof. It is easy to check that (1/α)(I − (β/(α+β)) A^*A) is well defined since α(α+β) ≠ 0, and that (αI + βA^*A)((1/α)(I − (β/(α+β)) A^*A)) = I. This completes the proof.
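As a quick illustration of the proximal mapping and the matrix shrinkage operator defined above, the following NumPy sketch (ours, not part of the paper) implements S_ν by soft-thresholding the singular values; this is also the well-known proximal mapping of ν times the nuclear norm.

```python
import numpy as np

def matrix_shrinkage(X, nu):
    """Matrix shrinkage operator S_nu(X): soft-threshold the singular values of X by nu."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)   # reduced SVD: X = U diag(s) V^T
    s_bar = np.maximum(s - nu, 0.0)                    # keep s_i - nu when positive, else 0
    return (U * s_bar) @ Vt                            # U diag(s_bar) V^T
```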

Proposition 2.2. Let ψ : R^m → (−∞, ∞] and φ : R^n → (−∞, ∞] be proper closed functions. Given P, Q ∈ R^{m×n} and a ∈ R^n, b ∈ R^m with a ≠ 0, b ≠ 0, the following statements hold.
(i) The problem min_{x ∈ R^m} { ψ(x) + (1/2)‖xa^T − P‖_F^2 } is equivalent to min_{x ∈ R^m} { ψ(x) + (‖a‖^2/2)‖x − Pa/‖a‖^2‖^2 };
(ii) The problem min_{y ∈ R^n} { φ(y) + (1/2)‖by^T − Q‖_F^2 } is equivalent to min_{y ∈ R^n} { φ(y) + (‖b‖^2/2)‖y − Q^T b/‖b‖^2‖^2 }.

Proof. Statement (i) can be easily proved by noticing that

‖xa^T − P‖_F^2 = ‖xa^T‖_F^2 − 2⟨xa^T, P⟩ + ‖P‖_F^2 = ‖a‖^2‖x‖^2 − 2⟨x, Pa⟩ + ‖P‖_F^2 = ‖a‖^2 ‖x − Pa/‖a‖^2‖^2 − ‖Pa‖^2/‖a‖^2 + ‖P‖_F^2.

Then, statement (ii) can be easily proved by using statement (i) and ‖by^T − Q‖_F = ‖yb^T − Q^T‖_F.

Before ending this section, we discuss the first-order necessary conditions for (1.1). First, from [28, Exercise 8.8] and [28, Proposition 10.5], we see that

∂F(X, Y) = ( ∂Ψ(X) + A^*(A(XY^T) − b)Y ) × ( ∂Φ(Y) + (A^*(A(XY^T) − b))^T X ).

Then, it follows from the generalized Fermat's rule [28, Theorem 10.1] that any local minimizer (X̄, Ȳ) of (1.1) satisfies 0 ∈ ∂F(X̄, Ȳ), i.e.,

(2.2)    0 ∈ ∂Ψ(X̄) + A^*(A(X̄Ȳ^T) − b)Ȳ,    0 ∈ ∂Φ(Ȳ) + (A^*(A(X̄Ȳ^T) − b))^T X̄,

which implies that (X̄, Ȳ) is a stationary point of F. In this paper, we focus on finding a stationary point (X^*, Y^*) of F, i.e., (X^*, Y^*) satisfies (2.2) in place of (X̄, Ȳ).

3. The potential function for F. In this section, we analyze the relation between F and its potential function Θ_{α,β} defined in (1.2). Intuitively, Θ_{α,β} originates from F by separating the coupled variable XY^T from the linear map A via introducing an auxiliary variable Z and penalizing XY^T = Z. We will see later that the stationary points of F can be characterized by the stationary points of Θ_{α,β}. Before proceeding, we prove the following technical lemma.

Lemma 3.1. Suppose that AA^* = I_q and 1/α + 1/β = 1. Then, for any (X, Y, Z) satisfying

(3.1)    Z = (I − (β/(α+β)) A^*A)(XY^T) + (β/(α+β)) A^*(b),

we have F(X, Y) = Θ_{α,β}(X, Y, Z).

Proof. First, from (3.1), we have

(3.2)    XY^T − Z = (β/(α+β)) A^*(A(XY^T) − b),
(3.3)    A(Z) − b = A(XY^T) − (β/(α+β)) AA^*(A(XY^T) − b) − b = (α/(α+β)) (A(XY^T) − b),

where the last equality follows from AA^* = I_q. Then, we see that

(α/2)‖XY^T − Z‖_F^2 + (β/2)‖A(Z) − b‖^2
= (αβ^2/(2(α+β)^2)) ‖A^*(A(XY^T) − b)‖_F^2 + (α^2β/(2(α+β)^2)) ‖A(XY^T) − b‖^2
= (αβ^2/(2(α+β)^2)) ⟨A(XY^T) − b, AA^*(A(XY^T) − b)⟩ + (α^2β/(2(α+β)^2)) ‖A(XY^T) − b‖^2
= (αβ^2/(2(α+β)^2)) ‖A(XY^T) − b‖^2 + (α^2β/(2(α+β)^2)) ‖A(XY^T) − b‖^2
= (αβ/(2(α+β))) ‖A(XY^T) − b‖^2,

where the first equality follows from (3.2) and (3.3), and the third equality follows from AA^* = I_q. This, together with 1/α + 1/β = 1 and the definitions of F and Θ_{α,β}, completes the proof.

Based on the above lemma, we now establish the following property of Θ_{α,β}.

Theorem 3.2. Suppose that AA^* = I_q. If α and β are chosen such that αI + βA^*A ≻ 0 and 1/α + 1/β = 1, then the problem min_{X,Y,Z} {Θ_{α,β}(X, Y, Z)} is equivalent to (1.1).

Proof. First, it is easy to see from αI + βA^*A ≻ 0 that the function Z ↦ Θ_{α,β}(X, Y, Z) is strongly convex. Thus, for any fixed X and Y, the optimal solution Z^* to the problem min_Z {Θ_{α,β}(X, Y, Z)} exists and is unique, and can be obtained explicitly. Indeed, from the optimality condition, we have

α(Z^* − XY^T) + βA^*(A(Z^*) − b) = 0.

Then, since αI + βA^*A is invertible (as αI + βA^*A ≻ 0), we see that

Z^* = (αI + βA^*A)^{−1}[αXY^T + βA^*(b)]
    = (1/α)(I − (β/(α+β)) A^*A)[αXY^T + βA^*(b)]
    = (I − (β/(α+β)) A^*A)(XY^T) + (β/α) A^*(b) − (β^2/(α(α+β))) A^*(AA^*(b))
    = (I − (β/(α+β)) A^*A)(XY^T) + (β/α − β^2/(α(α+β))) A^*(b)
    = (I − (β/(α+β)) A^*A)(XY^T) + (β/(α+β)) A^*(b),

where the second equality follows from Proposition 2.1 and the fourth equality follows from AA^* = I_q. This, together with Lemma 3.1, implies that F(X, Y) = Θ_{α,β}(X, Y, Z^*). Then, we have that

min_{X,Y,Z} {Θ_{α,β}(X, Y, Z)} = min_{X,Y} { min_Z Θ_{α,β}(X, Y, Z) } = min_{X,Y} {Θ_{α,β}(X, Y, Z^*)} = min_{X,Y} {F(X, Y)}.

This completes the proof.

Remark 3.1. From the proof of Lemma 3.1, we see that if Φ and Ψ are the indicator functions of some nonempty closed sets, then F(X, Y) = (1/α + 1/β) Θ_{α,β}(X, Y, Z) holds with the special choice of Z in (3.1) whenever AA^* = I_q and 1/α + 1/β > 0. Thus, the result in Theorem 3.2 remains valid whenever AA^* = I_q and α, β are chosen such that αI + βA^*A ≻ 0 and 1/α + 1/β > 0.

We see from Theorem 3.2 that (1.1) is equivalent to minimizing Θ_{α,β} with some suitable choices of α and β. On the other hand, we can also characterize the relation between the stationary points of F and Θ_{α,β} under weaker conditions on α and β.

Theorem 3.3. Suppose that AA^* = I_q and α, β are chosen such that 1/α + 1/β = 1. Then, the following statements hold.
(i) If (X^*, Y^*, Z^*) is a stationary point of Θ_{α,β}, then (X^*, Y^*) is a stationary point of F;
(ii) If (X^*, Y^*) is a stationary point of F, then (X^*, Y^*, Z^*) is a stationary point of Θ_{α,β}, where Z^* is given by

(3.4)    Z^* = (I − (β/(α+β)) A^*A)(X^*(Y^*)^T) + (β/(α+β)) A^*(b).

Proof. First, if (X^*, Y^*, Z^*) is a stationary point of Θ_{α,β}, then we have 0 ∈ ∂Θ_{α,β}(X^*, Y^*, Z^*), i.e.,

(3.5a)    0 ∈ ∂Ψ(X^*) + α(X^*(Y^*)^T − Z^*)Y^*,
(3.5b)    0 ∈ ∂Φ(Y^*) + α(X^*(Y^*)^T − Z^*)^T X^*,
(3.5c)    0 = α(Z^* − X^*(Y^*)^T) + βA^*(A(Z^*) − b).

Since 1/α + 1/β = 1, we have α(α+β) ≠ 0 and hence αI + βA^*A is invertible from Proposition 2.1. Then, using the same arguments as in the proof of Theorem 3.2, we see from (3.5c) that (X^*, Y^*, Z^*) satisfies (3.4). Moreover, using (3.4) and the same arguments as in (3.2) and (3.3), we have

(3.6)    X^*(Y^*)^T − Z^* = (β/(α+β)) A^*(A(X^*(Y^*)^T) − b),
(3.7)    A(Z^*) − b = (α/(α+β)) (A(X^*(Y^*)^T) − b).

Thus, substituting (3.6) into (3.5a) and (3.5b), we see that

(3.8)    0 ∈ ∂Ψ(X^*) + (αβ/(α+β)) A^*(A(X^*(Y^*)^T) − b)Y^*,    0 ∈ ∂Φ(Y^*) + (αβ/(α+β)) (A^*(A(X^*(Y^*)^T) − b))^T X^*.

This together with 1/α + 1/β = 1 implies that (X^*, Y^*) is a stationary point of F. This proves statement (i).

We now prove statement (ii). First, if (X^*, Y^*) is a stationary point of F, then invoking 1/α + 1/β = 1 and (2.2), we have (3.8). Next, we consider (X^*, Y^*, Z^*) with Z^* given by (3.4). Then, (X^*, Y^*, Z^*) satisfies (3.6) and (3.7). Thus, substituting (3.6) into (3.8), we obtain (3.5a) and (3.5b). Moreover, we have from (3.6) and (3.7) that

(3.9)    α(Z^* − X^*(Y^*)^T) + βA^*(A(Z^*) − b) = −(αβ/(α+β)) A^*(A(X^*(Y^*)^T) − b) + βA^*((α/(α+β))(A(X^*(Y^*)^T) − b)) = 0.

This together with (3.5a) and (3.5b) implies that (X^*, Y^*, Z^*) is a stationary point of Θ_{α,β}. This proves statement (ii).

Remark 3.2. From the proof of Theorem 3.3, one can see that if ∂Ψ and ∂Φ are cones, Theorem 3.3 remains valid under the weaker conditions that AA^* = I_q and 1/α + 1/β > 0.

From Theorem 3.3, we see that a stationary point of F can be obtained from a stationary point of Θ_{α,β} with a suitable choice of α and β, i.e., 1/α + 1/β = 1. Since the linear map A is no longer associated with the coupled variable XY^T in Θ_{α,β}, finding a stationary point of Θ_{α,β} is conceivably easier. Thus, one can consider finding a stationary point of Θ_{α,β} in order to find a stationary point of F. Note that some existing alternating-minimization-based methods (see, for example, [1, 40]) can be used to find a stationary point of Θ_{α,β}, and hence of F, under the conditions that AA^* = I_q and α, β are chosen so that αI + βA^*A ≻ 0 and 1/α + 1/β = 1. These conditions further imply that α > 1 and β = α/(α−1) > 1. However, as we will see from our numerical results in Section 6, finding a stationary point of Θ_{α,β} with α > 1 can be slow. In view of this, in the next section, we develop a new non-monotone alternating updating method for finding a stationary point of Θ_{α,β} (and hence of F) under the weaker conditions that AA^* = I_q and 1/α + 1/β = 1. This allows more flexibility in choosing α and β.
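As a concrete illustration of the explicit formula (3.4) (equivalently (3.1), and (4.5) below), the following NumPy sketch (ours, not from the paper) evaluates Z^* when A is the sampling map P_Ω of the matrix completion setting, for which AA^* = I_q. Representing b as a dense array supported on Ω is an implementation assumption.

```python
import numpy as np

def z_update(X, Y, b_obs, mask, alpha, beta):
    """Closed-form Z from (3.4)/(3.1) when A = P_Omega (sampling map).

    mask:  boolean (m, n) array marking the observed entries Omega.
    b_obs: (m, n) array holding the observed values on Omega (arbitrary elsewhere).
    """
    w = beta / (alpha + beta)
    Z = X @ Y.T                                      # off Omega, (I - w A*A) acts as the identity
    Z[mask] = (1 - w) * Z[mask] + w * b_obs[mask]    # on Omega, blend X Y^T with the data
    return Z
```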

4. Non-monotone alternating updating method. In this section, we consider a non-monotone alternating updating method (NAUM) for finding a stationary point of Θ_{α,β} with 1/α + 1/β = 1. Compared to existing alternating-minimization-based methods [1, 40] applied to Θ_{α,β}, which update X, Y, Z by alternately solving subproblems related to Θ_{α,β}, NAUM updates Z by an explicit formula (see (4.5)) and updates X, Y by solving subproblems related to Θ_{α,β} in a Gauss-Seidel manner. Before presenting the complete algorithm, we first comment on the updates of X and Y.

Let (X^k, Y^k) denote the value of (X, Y) after the (k−1)-th iteration, and let (U^k, V^k) denote the candidate for (X^{k+1}, Y^{k+1}) at the k-th iteration (we will set (X^{k+1}, Y^{k+1}) to be (U^k, V^k) if a line search criterion is satisfied; more details can be found in Algorithm 1). For notational simplicity, we also define

H_α(X, Y, Z) := (α/2)‖XY^T − Z‖_F^2

for any (X, Y, Z). Then, at the k-th iteration, we first compute Z^k by (4.5) and, in the line search loop, we compute U^k in one of the following 3 ways for a given μ_k > 0:

Proximal
(4.1a)    U^k ∈ Argmin_X { Ψ(X) + H_α(X, Y^k, Z^k) + (μ_k/2)‖X − X^k‖_F^2 }.

Prox-linear
(4.1b)    U^k ∈ Argmin_X { Ψ(X) + ⟨∇_X H_α(X^k, Y^k, Z^k), X − X^k⟩ + (μ_k/2)‖X − X^k‖_F^2 }.

Hierarchical-prox
If Ψ is column-wise separable, i.e., Ψ(X) = Σ_{i=1}^r ψ_i(x_i) for X = [x_1, ..., x_r] ∈ R^{m×r}, we can update U^k column by column. Specifically, for i = 1, 2, ..., r, compute

(4.1c)    u_i^k ∈ Argmin_{x_i} { ψ_i(x_i) + H_α([u^k_{j<i}, x_i, x^k_{j>i}], Y^k, Z^k) + (μ_k/2)‖x_i − x_i^k‖^2 },

where u^k_{j<i} denotes (u_1^k, ..., u_{i−1}^k) and x^k_{j>i} denotes (x_{i+1}^k, ..., x_r^k).

After computing U^k, we compute V^k in one of the following 3 ways for a given σ_k > 0:

Proximal
(4.2a)    V^k ∈ Argmin_Y { Φ(Y) + H_α(U^k, Y, Z^k) + (σ_k/2)‖Y − Y^k‖_F^2 }.

Prox-linear
(4.2b)    V^k ∈ Argmin_Y { Φ(Y) + ⟨∇_Y H_α(U^k, Y^k, Z^k), Y − Y^k⟩ + (σ_k/2)‖Y − Y^k‖_F^2 }.

Hierarchical-prox
If Φ is column-wise separable, i.e., Φ(Y) = Σ_{i=1}^r φ_i(y_i) for Y = [y_1, ..., y_r] ∈ R^{n×r}, we can update V^k column by column. Specifically, for i = 1, 2, ..., r, compute

(4.2c)    v_i^k ∈ Argmin_{y_i} { φ_i(y_i) + H_α(U^k, [v^k_{j<i}, y_i, y^k_{j>i}], Z^k) + (σ_k/2)‖y_i − y_i^k‖^2 },

where v^k_{j<i} denotes (v_1^k, ..., v_{i−1}^k) and y^k_{j>i} denotes (y_{i+1}^k, ..., y_r^k).
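To make the prox-linear option concrete, the following NumPy sketch (ours) carries out the update (4.1b) for a generic, user-supplied proximal mapping prox_psi, which is an assumption and not part of the paper.

```python
import numpy as np

def prox_linear_X_update(X, Y, Z, mu, alpha, prox_psi):
    """Prox-linear update (4.1b): linearize H_alpha at (X^k, Y^k, Z^k), then take a prox step.

    prox_psi(V, t) should return argmin_X { Psi(X) + (1/(2t)) ||X - V||_F^2 }.
    """
    grad = alpha * (X @ Y.T - Z) @ Y          # gradient of H_alpha with respect to X
    return prox_psi(X - grad / mu, 1.0 / mu)  # proximal step with step size 1/mu
```

For instance, when Ψ is the indicator function of the nonnegative orthant, prox_psi(V, t) is simply np.maximum(V, 0.0), independently of t.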

For notational simplicity, we further let

(4.3)    ρ := ‖I − (β/(α+β)) A^*A‖

and let γ ≥ 0 be a nonnegative number satisfying

(4.4)    (α + γ)I + βA^*A ⪰ 0.

Remark 4.1 (Comments on hierarchical-prox). The hierarchical-prox updating scheme requires the column-wise separability of Ψ or Φ. This is satisfied for many common regularizers, for example, ‖·‖_F^2, ‖·‖_1, ‖·‖_p^p (0 < p < 1), and the indicator function of the nonnegativity (or box) constraint.

Remark 4.2 (Comments on ρ and γ). Since AA^* = I_q, we see that the eigenvalues of A^*A are either 0 or 1. Then, the eigenvalues of I − (β/(α+β)) A^*A must be either 1 or α/(α+β), and hence ρ = max{1, |α/(α+β)|}. Similarly, the eigenvalues of αI + βA^*A are either α or (α+β). Then, (4.4) is satisfied whenever γ ≥ max{0, −α, −(α+β)}.

Now, we are ready to present NAUM as Algorithm 1.

Algorithm 1 NAUM for finding a stationary point of F
Input: (X^0, Y^0), α and β such that 1/α + 1/β = 1, ρ as in (4.3), γ ≥ 0 satisfying (4.4), τ > 1, c > 0, μ_min > 0, σ_max > σ_min > 0, and an integer N ≥ 0. Set k = 0.
while a termination criterion is not met, do
  Step 1. Compute Z^k by
  (4.5)    Z^k = (I − (β/(α+β)) A^*A)(X^k(Y^k)^T) + (β/(α+β)) A^*(b).
  Step 2. Choose μ_k^0 ≥ μ_min and σ_k^0 ∈ [σ_min, σ_max] arbitrarily. Set μ_k = μ_k^0, σ_k = σ_k^0 and μ_k^max = (α + 2γρ^2)‖Y^k‖^2 + c.
    (2a) Set μ_k ← min{μ_k, μ_k^max}. Compute U^k by either (4.1a), (4.1b) or (4.1c).
    (2b) Compute V^k by either (4.2a), (4.2b) or (4.2c).
    (2c) If
    (4.6)    F(U^k, V^k) ≤ max_{[k−N]_+ ≤ i ≤ k} F(X^i, Y^i) − (c/2)(‖U^k − X^k‖_F^2 + ‖V^k − Y^k‖_F^2),
    then go to Step 3.
    (2d) If μ_k = μ_k^max, set σ_k^max = (α + 2γρ^2)‖U^k‖^2 + c, σ_k ← min{τσ_k, σ_k^max} and then go to step (2b); otherwise, set μ_k ← τμ_k and σ_k ← τσ_k and then go to step (2a).
  Step 3. Set X^{k+1} ← U^k, Y^{k+1} ← V^k, μ̄_k ← μ_k, σ̄_k ← σ_k, k ← k + 1 and go to Step 1.
end while
Output: (X^k, Y^k)
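The following NumPy sketch (ours, not the authors' implementation) illustrates the outer structure of Algorithm 1 for the special case A = I, using the prox-linear updates (4.1b)/(4.2b) for brevity. The objective pieces Psi, Phi and their proximal mappings prox_psi, prox_phi are user-supplied assumptions, and some bookkeeping of Step 2 is simplified.

```python
import numpy as np

def naum_identity(M, X, Y, Psi, Phi, prox_psi, prox_phi, alpha,
                  c=1e-4, tau=4.0, N=3, mu_min=1.0, sig_min=1.0, sig_max=1e6, iters=100):
    """Sketch of NAUM (Algorithm 1) for A = I, with prox-linear updates.

    prox_psi(V, t) ~ argmin_X Psi(X) + (1/(2t))||X - V||_F^2, and similarly prox_phi.
    """
    beta = alpha / (alpha - 1.0)                  # enforce 1/alpha + 1/beta = 1 (requires alpha != 1)
    w = beta / (alpha + beta)
    rho = max(1.0, abs(alpha / (alpha + beta)))   # Remark 4.2
    gamma = max(0.0, -alpha, -(alpha + beta))     # satisfies (4.4)
    F = lambda X, Y: Psi(X) + Phi(Y) + 0.5 * np.linalg.norm(X @ Y.T - M, 'fro')**2
    hist, mu_bar, sig_bar = [F(X, Y)], 1.0, 1.0
    for k in range(iters):
        Z = (1 - w) * (X @ Y.T) + w * M           # explicit Z-update (4.5) with A = I
        mu = max(0.1 * mu_bar, mu_min)
        sig = min(max(0.1 * sig_bar, sig_min), sig_max)
        mu_max = (alpha + 2 * gamma * rho**2) * np.linalg.norm(Y, 2)**2 + c
        while True:                               # non-monotone line search (Step 2)
            mu = min(mu, mu_max)
            U = prox_psi(X - alpha * (X @ Y.T - Z) @ Y / mu, 1.0 / mu)      # (4.1b)
            V = prox_phi(Y - alpha * (U @ Y.T - Z).T @ U / sig, 1.0 / sig)  # (4.2b)
            thresh = max(hist[-(N + 1):]) - 0.5 * c * (
                np.linalg.norm(U - X, 'fro')**2 + np.linalg.norm(V - Y, 'fro')**2)
            if F(U, V) <= thresh:                 # acceptance test (4.6)
                break
            if mu == mu_max:                      # step (2d), first branch
                sig_cap = (alpha + 2 * gamma * rho**2) * np.linalg.norm(U, 2)**2 + c
                sig = min(tau * sig, sig_cap)
            else:                                 # step (2d), second branch
                mu, sig = tau * mu, tau * sig
        X, Y, mu_bar, sig_bar = U, V, mu, sig     # Step 3
        hist.append(F(X, Y))
    return X, Y
```

In this sketch U^k is recomputed even when μ_k has reached its cap; with deterministic proximal mappings this coincides with step (2d), which only recomputes V^k.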

In Algorithm 1, the update for Z is given explicitly. This is motivated by the condition on Z at a stationary point of Θ_{α,β}; see (3.5c). In fact, following the same arguments as in (3.9), we see that (3.5c) always holds at (X^k, Y^k, Z^k) with Z^k given in (4.5) when AA^* = I_q and 1/α + 1/β = 1. If, in addition, αI + βA^*A ≻ 0 holds, one can show that Z^k is actually the optimal solution to the problem min_Z {Θ_{α,β}(X^k, Y^k, Z)}. In this case, our NAUM (with N = 0 in (4.6)) can be viewed as an alternating-minimization-based method (see, for example, [1, 40]) applied to the problem min_{X,Y,Z} {Θ_{α,β}(X, Y, Z)}. However, if αI + βA^*A is not positive semidefinite, then the corresponding inf_Z {Θ_{α,β}(X^k, Y^k, Z)} = −∞ for all k, and Z^k is only a stationary point of Z ↦ Θ_{α,β}(X^k, Y^k, Z).^2 In this case, the function value of Θ_{α,β} may increase after updating Z by (4.5). Fortunately, as we shall see later in (5.8) and (5.9), as long as AA^* = I_q and 1/α + 1/β = 1, we still have Θ_{α,β}(X^{k+1}, Y^{k+1}, Z^k) < Θ_{α,β}(X^k, Y^k, Z^k) by updating X^{k+1} and Y^{k+1} with properly chosen parameters μ_k and σ_k. Thus, if the possible increase in Θ_{α,β} induced by the Z-update is not too large, one can still ensure Θ_{α,β}(X^{k+1}, Y^{k+1}, Z^{k+1}) < Θ_{α,β}(X^k, Y^k, Z^k). Moreover, it can be seen from Lemma 3.1 and (4.5) that F(X^k, Y^k) = Θ_{α,β}(X^k, Y^k, Z^k), and hence the decrease of Θ_{α,β} translates to that of F (see Lemma 5.1 below). In view of this, Θ_{α,β} is a valid potential function for minimizing F as long as AA^* = I_q and 1/α + 1/β = 1, even when β < 0 or α < 0. Allowing negative α or β makes our NAUM (even with N = 0 in (4.6)) different from the classical alternating minimization schemes.

Our NAUM also allows U^k and V^k to be updated in three different ways respectively, and hence there are 9 possible combinations. Thus, one can choose suitable updating schemes to fit different applications. In particular, if Ψ or Φ is column-wise separable, taking advantage of the structure of Θ_{α,β} and the fact that XY^T can be written as Σ_{i=1}^r x_i y_i^T with X = [x_1, ..., x_r] ∈ R^{m×r} and Y = [y_1, ..., y_r] ∈ R^{n×r}, one can update X (or Y) column-wise even when A ≠ I. The motivation for updating X (or Y) column-wise rather than updating the whole X (or Y) is that the resulting subproblems (4.1c) (or (4.2c)) can be reduced to the computation of the proximal mapping of ψ_i (or φ_i), which is easy for many commonly used ψ_i (or φ_i). Indeed, from (4.1c) and (4.2c), u_i^k and v_i^k are given by

(4.7)    u_i^k ∈ Argmin_{x_i} { ψ_i(x_i) + (α/2)‖x_i(y_i^k)^T − P_i^k‖_F^2 + (μ_k/2)‖x_i − x_i^k‖^2 },
         v_i^k ∈ Argmin_{y_i} { φ_i(y_i) + (α/2)‖u_i^k y_i^T − Q_i^k‖_F^2 + (σ_k/2)‖y_i − y_i^k‖^2 },

where P_i^k and Q_i^k are defined by

(4.8)    P_i^k := Z^k − Σ_{j=1}^{i−1} u_j^k(y_j^k)^T − Σ_{j=i+1}^r x_j^k(y_j^k)^T,    Q_i^k := Z^k − Σ_{j=1}^{i−1} u_j^k(v_j^k)^T − Σ_{j=i+1}^r u_j^k(y_j^k)^T.

Then, from Proposition 2.2, we can reformulate the subproblems in (4.7) and obtain the corresponding solutions by computing the proximal mappings of ψ_i and φ_i, which can be computed efficiently when ψ_i and φ_i are some common regularizers used in the literature. In particular, when ψ_i (and φ_i) are ‖·‖_1, ‖·‖^2 or the indicator function of the box constraint, these subproblems have closed-form solutions. This updating strategy has also been used for NMF; see, for example, [8, 20, 21]. However, the methods used in [8, 20, 21] can only be applied to some specific problems with A = I, while NAUM can be applied to more general problems with AA^* = I_q.

Our NAUM adopts a non-monotone line search criterion (see Step 2 in Algorithm 1) to improve the numerical performance. This is motivated by recent studies on

^2 This may happen when 0 < α < 1 so that β = α(α−1)^{−1} < 0, or 0 < β < 1 so that α = β(β−1)^{−1} < 0.

non-monotone algorithms with promising performance; see, for example, [7, 12, 39]. However, different from the non-monotone line search criteria used there, NAUM only includes (U^k, V^k) in the line search loop and checks the stopping criterion (4.6) after updating a pair (U^k, V^k), rather than checking (4.6) immediately once U^k or V^k is updated. Thus, we do not need to compute the function value after updating each block of variables. This may reduce the cost of the line search and make NAUM more practical, especially when computing the function value is relatively expensive.

Before moving on to the convergence analysis of NAUM, we would like to point out an interesting connection between NAUM and the low-rank matrix fitting algorithm, LMaFit [38], for solving the following matrix completion model without regularizers:

min_{X,Y} (1/2)‖P_Ω(XY^T − M)‖_F^2,

where Ω is the index set of the known entries of M, and P_Ω(Z) keeps the entries of Z in Ω and sets the remaining ones to zero. If we apply our NAUM with (4.1a) and (4.2a), then at the k-th iteration, the iterates Z^k, X^{k+1} and Y^{k+1} are given by

Z^k = (I − (β/(α+β)) P_Ω)(X^k(Y^k)^T) + (β/(α+β)) P_Ω(M),
X^{k+1} = (μ_k X^k + αZ^k Y^k)(μ_k I + α(Y^k)^T Y^k)^{−1},
Y^{k+1} = (σ_k Y^k + α(Z^k)^T X^{k+1})(σ_k I + α(X^{k+1})^T X^{k+1})^{−1}.

One can verify that the sequence {(Z^k, X^{k+1}, Y^{k+1})} above can be equivalently generated by the following scheme with Z̃^0 = P_Ω(M) + P_{Ω^c}(X^0(Y^0)^T):

Z^k = (β/(α+β)) Z̃^k + (1 − β/(α+β)) X^k(Y^k)^T,
X^{k+1} = (μ_k X^k + αZ^k Y^k)(μ_k I + α(Y^k)^T Y^k)^{−1},
Y^{k+1} = (σ_k Y^k + α(Z^k)^T X^{k+1})(σ_k I + α(X^{k+1})^T X^{k+1})^{−1},
Z̃^{k+1} = P_Ω(M) + P_{Ω^c}(X^{k+1}(Y^{k+1})^T),

where Ω^c is the complement set of Ω. Surprisingly, when μ_k = σ_k = 0, this scheme is exactly the SOR (successive over-relaxation)-like scheme used in LMaFit (see [38, Eq. (2.11)]) with ω := β/(α+β) being an over-relaxation weight. With this connection, our NAUM, in some sense, can be viewed as an SOR-based algorithm. Moreover, just like the classical SOR for solving a system of linear equations, LMaFit with ω > 1 also appears to be more efficient from the extensive numerical experiments reported in [38]. Then, it is natural to consider β/(α+β) > 1, and hence 1/α > 1 (since 1/α + 1/β = 1), in NAUM. This also gives some insights into the necessity of allowing more flexibility in choosing α and β, and the promising performance of NAUM with a relatively small α ∈ (0, 1), as we shall see in Section 6.

5. Convergence analysis of NAUM. In this section, we discuss the convergence properties of Algorithm 1. First, we present the first-order optimality conditions for the three different updating schemes in (2a) of Algorithm 1 as follows:

Proximal
(5.1a)    0 ∈ ∂Ψ(U^k) + α(U^k(Y^k)^T − Z^k)Y^k + μ_k(U^k − X^k).

Prox-linear
(5.1b)    0 ∈ ∂Ψ(U^k) + α(X^k(Y^k)^T − Z^k)Y^k + μ_k(U^k − X^k).

Hierarchical-prox
For i = 1, 2, ..., r,
(5.1c)    0 ∈ ∂ψ_i(u_i^k) + α( Σ_{j=1}^i u_j^k(y_j^k)^T + Σ_{j=i+1}^r x_j^k(y_j^k)^T − Z^k ) y_i^k + μ_k(u_i^k − x_i^k).

Similarly, the first-order optimality conditions for the three different updating schemes in (2b) of Algorithm 1 are

Proximal
(5.2a)    0 ∈ ∂Φ(V^k) + α(U^k(V^k)^T − Z^k)^T U^k + σ_k(V^k − Y^k).

Prox-linear
(5.2b)    0 ∈ ∂Φ(V^k) + α(U^k(Y^k)^T − Z^k)^T U^k + σ_k(V^k − Y^k).

Hierarchical-prox
For i = 1, 2, ..., r,
(5.2c)    0 ∈ ∂φ_i(v_i^k) + α( Σ_{j=1}^i u_j^k(v_j^k)^T + Σ_{j=i+1}^r u_j^k(y_j^k)^T − Z^k )^T u_i^k + σ_k(v_i^k − y_i^k).

We also need to make the following assumptions.

Assumption 5.1.
(a1) Ψ, Φ are proper, closed, level-bounded functions and are continuous on their domains, respectively;
(a2) AA^* = I_q;
(a3) 1/α + 1/β = 1.

Remark 5.1. (i) From (a1), one can see from [28, Theorem 1.9] that inf Ψ and inf Φ are finite, i.e., Ψ and Φ are bounded from below. In particular, the iterates in (4.1a), (4.1b), (4.1c), (4.2a), (4.2b) and (4.2c) are well defined; (ii) The continuity assumption in (a1) holds for many common regularizers, for example, the l_1-norm, the nuclear norm and the indicator function of a nonempty closed set; (iii) (a2) is satisfied for some common linear maps, for example, the identity map and the sampling map.

We start our convergence analysis by proving the following auxiliary lemma.

Lemma 5.1 (Sufficient descent of F). Suppose that Assumption 5.1 holds. Let (X^k, Y^k) be generated by Algorithm 1 at the k-th iteration, and let (U^k, V^k) be the candidate for (X^{k+1}, Y^{k+1}) generated by steps (2a) and (2b). Then, for any integer k ≥ 0, we have

(5.3)    F(U^k, V^k) ≤ F(X^k, Y^k) − [(μ_k − (α + 2γρ^2)‖Y^k‖^2)/2] ‖U^k − X^k‖_F^2 − [(σ_k − (α + 2γρ^2)‖U^k‖^2)/2] ‖V^k − Y^k‖_F^2.

Proof. First, from Lemma 3.1 and (4.5), we see that F(X^k, Y^k) = Θ_{α,β}(X^k, Y^k, Z^k). For any (U^k, V^k), let

(5.4)    W^k = (I − (β/(α+β)) A^*A)(U^k(V^k)^T) + (β/(α+β)) A^*(b).

Then, from Lemma 3.1, we have F(U^k, V^k) = Θ_{α,β}(U^k, V^k, W^k). Thus, to establish (5.3), we only need to consider the difference Θ_{α,β}(U^k, V^k, W^k) − Θ_{α,β}(X^k, Y^k, Z^k). We start by noting that

(5.5)    A^*(A(W^k)) = A^*(A(U^k(V^k)^T)) − (β/(α+β)) A^*(AA^*(A(U^k(V^k)^T))) + (β/(α+β)) A^*(AA^*(b)) = (α/(α+β)) A^*(A(U^k(V^k)^T)) + (β/(α+β)) A^*(b),

where the last equality follows from (a2) in Assumption 5.1. Then, we obtain that

∇_Z Θ_{α,β}(U^k, V^k, W^k) = α(W^k − U^k(V^k)^T) + βA^*(A(W^k)) − βA^*(b)
= α[ −(β/(α+β)) A^*(A(U^k(V^k)^T)) + (β/(α+β)) A^*(b) ] + β[ (α/(α+β)) A^*(A(U^k(V^k)^T)) + (β/(α+β)) A^*(b) ] − βA^*(b) = 0,

where the second equality follows from (5.4) and (5.5). Moreover, since γ is chosen such that (α + γ)I + βA^*A ⪰ 0 (see (4.4)), we see that, for any k ≥ 0, the function Z ↦ Θ_{α,β}(U^k, V^k, Z) + (γ/2)‖Z − Z^k‖_F^2 is convex and hence

Θ_{α,β}(U^k, V^k, Z^k) + (γ/2)‖Z^k − Z^k‖_F^2 ≥ Θ_{α,β}(U^k, V^k, W^k) + (γ/2)‖W^k − Z^k‖_F^2 + ⟨∇_Z Θ_{α,β}(U^k, V^k, W^k) + γ(W^k − Z^k), Z^k − W^k⟩,

where ‖Z^k − Z^k‖_F^2 = 0 and ∇_Z Θ_{α,β}(U^k, V^k, W^k) = 0, which implies that

(5.6)    Θ_{α,β}(U^k, V^k, W^k) − Θ_{α,β}(U^k, V^k, Z^k) ≤ (γ/2)‖W^k − Z^k‖_F^2.

Then, substituting (4.5) and (5.4) into (5.6), we obtain

(5.7)    Θ_{α,β}(U^k, V^k, W^k) − Θ_{α,β}(U^k, V^k, Z^k) ≤ (γ/2)‖(I − (β/(α+β)) A^*A)(U^k(V^k)^T − X^k(Y^k)^T)‖_F^2
≤ (γ/2)‖I − (β/(α+β)) A^*A‖^2 ‖U^k(V^k)^T − X^k(Y^k)^T‖_F^2 = (γρ^2/2)‖U^k(V^k − Y^k)^T + (U^k − X^k)(Y^k)^T‖_F^2
≤ (γρ^2/2)(‖U^k(V^k − Y^k)^T‖_F + ‖(U^k − X^k)(Y^k)^T‖_F)^2
(i) ≤ (γρ^2/2)(‖U^k‖‖V^k − Y^k‖_F + ‖Y^k‖‖U^k − X^k‖_F)^2
(ii) ≤ γρ^2(‖U^k‖^2‖V^k − Y^k‖_F^2 + ‖Y^k‖^2‖U^k − X^k‖_F^2),

where the equality follows from the definition of ρ in (4.3); (i) follows from the relation ‖AB‖_F ≤ ‖A‖‖B‖_F; and (ii) follows from the relation (a + b)^2 ≤ 2a^2 + 2b^2. Next, we claim that

(5.8)    Θ_{α,β}(U^k, V^k, Z^k) − Θ_{α,β}(U^k, Y^k, Z^k) ≤ [(α‖U^k‖^2 − σ_k)/2] ‖V^k − Y^k‖_F^2,
(5.9)    Θ_{α,β}(U^k, Y^k, Z^k) − Θ_{α,β}(X^k, Y^k, Z^k) ≤ [(α‖Y^k‖^2 − μ_k)/2] ‖U^k − X^k‖_F^2.

Below, we will only prove (5.8). The proof of (5.9) can be done in a similar way. To prove (5.8), we consider the following three cases.

Proximal: In this case, we have

Θ_{α,β}(U^k, V^k, Z^k) − Θ_{α,β}(U^k, Y^k, Z^k) = Φ(V^k) + H_α(U^k, V^k, Z^k) − Φ(Y^k) − H_α(U^k, Y^k, Z^k)
= [ Φ(V^k) + H_α(U^k, V^k, Z^k) + (σ_k/2)‖V^k − Y^k‖_F^2 ] − [ Φ(Y^k) + H_α(U^k, Y^k, Z^k) ] − (σ_k/2)‖V^k − Y^k‖_F^2
≤ −(σ_k/2)‖V^k − Y^k‖_F^2,

where the inequality follows from the definition of V^k as a minimizer in (4.2a). This implies (5.8).

Prox-linear: In this case, we have

Θ_{α,β}(U^k, V^k, Z^k) − Θ_{α,β}(U^k, Y^k, Z^k) = Φ(V^k) + H_α(U^k, V^k, Z^k) − Φ(Y^k) − H_α(U^k, Y^k, Z^k)
≤ Φ(V^k) + H_α(U^k, Y^k, Z^k) + ⟨∇_Y H_α(U^k, Y^k, Z^k), V^k − Y^k⟩ + (α‖U^k‖^2/2)‖V^k − Y^k‖_F^2 − Φ(Y^k) − H_α(U^k, Y^k, Z^k)
= [ Φ(V^k) + ⟨∇_Y H_α(U^k, Y^k, Z^k), V^k − Y^k⟩ + (σ_k/2)‖V^k − Y^k‖_F^2 ] − Φ(Y^k) + [(α‖U^k‖^2 − σ_k)/2]‖V^k − Y^k‖_F^2
≤ [(α‖U^k‖^2 − σ_k)/2]‖V^k − Y^k‖_F^2,

where the first inequality follows from the fact that Y ↦ ∇_Y H_α(X, Y, Z) is Lipschitz continuous with modulus α‖X‖^2, and the last inequality follows from the definition of V^k as a minimizer in (4.2b).

Hierarchical-prox: In this case, for any 1 ≤ i ≤ r, we have

Θ_{α,β}(U^k, (v^k_{j<i}, v_i^k, y^k_{j>i}), Z^k) − Θ_{α,β}(U^k, (v^k_{j<i}, y_i^k, y^k_{j>i}), Z^k)
= φ_i(v_i^k) + H_α(U^k, (v^k_{j<i}, v_i^k, y^k_{j>i}), Z^k) − φ_i(y_i^k) − H_α(U^k, (v^k_{j<i}, y_i^k, y^k_{j>i}), Z^k)
= [ φ_i(v_i^k) + H_α(U^k, (v^k_{j<i}, v_i^k, y^k_{j>i}), Z^k) + (σ_k/2)‖v_i^k − y_i^k‖^2 ] − [ φ_i(y_i^k) + H_α(U^k, (v^k_{j<i}, y_i^k, y^k_{j>i}), Z^k) ] − (σ_k/2)‖v_i^k − y_i^k‖^2
≤ −(σ_k/2)‖v_i^k − y_i^k‖^2,

where the inequality follows from the definition of v_i^k as a minimizer in (4.2c). Then, summing the above relation from i = r to i = 1 and simplifying the resulting inequality, we obtain (5.8). The inequality (5.9) can be obtained via a similar argument.

Now, summing (5.7), (5.8) and (5.9), and using F(U^k, V^k) = Θ_{α,β}(U^k, V^k, W^k) and F(X^k, Y^k) = Θ_{α,β}(X^k, Y^k, Z^k), we obtain (5.3). This completes the proof.

From Lemma 5.1, we see that the sufficient descent of F(X^k, Y^k) can be guaranteed as long as μ_k and σ_k are sufficiently large. Thus, based on this lemma, we can show in the following proposition that our non-monotone line search criterion in Algorithm 1 is well defined.

Proposition 5.2 (Well-definedness of the non-monotone line search criterion). Suppose that Assumption 5.1 holds and Algorithm 1 is applied. Then, for each k ≥ 0, the line search criterion (4.6) is satisfied after finitely many inner iterations.

Proof. We prove this proposition by contradiction. Assume that there exists a k ≥ 0 such that the line search criterion (4.6) cannot be satisfied after finitely many inner iterations. Note from (2a) and (2d) in Step 2 of Algorithm 1 that μ_k ≤ μ_k^max = (α + 2γρ^2)‖Y^k‖^2 + c and hence μ_k = μ_k^max must be satisfied after finitely many inner iterations. Let n_k denote the number of inner iterations when μ_k = μ_k^max is satisfied for the first time. If μ_k^0 ≥ μ_k^max, then n_k = 1; otherwise, we have

μ_min τ^{n_k−2} ≤ μ_k^0 τ^{n_k−2} < μ_k^max,

which implies that

(5.10)    n_k ≤ (log(μ_k^max) − log(μ_min))/log τ + 2.

Then, from (2d) in Step 2 of Algorithm 1, we have U^k = U^k_{μ_k^max} and σ_k^max = (α + 2γρ^2)‖U^k_{μ_k^max}‖^2 + c after at most n_k + 1 inner iterations, where U^k_{μ_k^max} is computed by (4.1a), (4.1b) or (4.1c) with μ_k = μ_k^max. Moreover, we see that σ_k = σ_k^max must be satisfied after finitely many inner iterations. Similarly, let n̂_k denote the number of inner iterations when σ_k = σ_k^max is satisfied for the first time. If σ_k^0 > σ_k^max, then n̂_k = n_k; if σ_k^0 = σ_k^max, then n̂_k = 0; otherwise, we have

σ_min τ^{n̂_k−1} ≤ σ_k^0 τ^{n̂_k−1} < σ_k^max,

which implies that

n̂_k ≤ (log(σ_k^max) − log(σ_min))/log τ + 1.

Thus, after at most max{n_k, n̂_k} + 1 inner iterations, we must have V^k = V^k_{σ_k^max}, where V^k_{σ_k^max} is computed by (4.2a), (4.2b) or (4.2c) with σ_k = σ_k^max. Therefore, after at most max{n_k, n̂_k} + 1 inner iterations, we have

F(U^k_{μ_k^max}, V^k_{σ_k^max}) ≤ F(X^k, Y^k) − [(μ_k^max − (α + 2γρ^2)‖Y^k‖^2)/2]‖U^k_{μ_k^max} − X^k‖_F^2 − [(σ_k^max − (α + 2γρ^2)‖U^k_{μ_k^max}‖^2)/2]‖V^k_{σ_k^max} − Y^k‖_F^2
= F(X^k, Y^k) − (c/2)(‖U^k_{μ_k^max} − X^k‖_F^2 + ‖V^k_{σ_k^max} − Y^k‖_F^2),

where the inequality follows from (5.3) and the equality follows from μ_k^max = (α + 2γρ^2)‖Y^k‖^2 + c and σ_k^max = (α + 2γρ^2)‖U^k_{μ_k^max}‖^2 + c. This together with

F(X^k, Y^k) ≤ max_{[k−N]_+ ≤ i ≤ k} F(X^i, Y^i)

implies that (4.6) must be satisfied after at most max{n_k, n̂_k} + 1 inner iterations, which leads to a contradiction.

Now, we are ready to prove our main convergence result, which characterizes a cluster point of the sequence generated by Algorithm 1. Our proof of statement (ii) in the following theorem is similar to that of [39, Lemma 4]. However, the arguments involved are more intricate since we have two blocks of variables in our line search loop.

Theorem 5.3. Suppose that Assumption 5.1 holds. Let {(X^k, Y^k)} be the sequence generated by Algorithm 1. Then,
(i) (boundedness of sequence) {(X^k, Y^k)}, {μ_k} and {σ_k} are bounded;
(ii) (diminishing successive changes) lim_{k→∞} ‖X^{k+1} − X^k‖_F + ‖Y^{k+1} − Y^k‖_F = 0;
(iii) (global subsequential convergence) any cluster point (X^*, Y^*) of {(X^k, Y^k)} is a stationary point of F.

Proof. Statement (i). We first show that

(5.11)    F(X^k, Y^k) ≤ F(X^0, Y^0)  for all k ≥ 1.

We will prove it by induction. Indeed, for k = 1, it follows from Proposition 5.2 that

F(X^1, Y^1) ≤ F(X^0, Y^0) − (c/2)(‖X^1 − X^0‖_F^2 + ‖Y^1 − Y^0‖_F^2) ≤ F(X^0, Y^0)

is satisfied after finitely many inner iterations. Hence, (5.11) holds for k = 1. We now suppose that (5.11) holds for all k ≤ K for some integer K ≥ 1. Then, we only need to show that (5.11) also holds for k = K + 1. For k = K + 1, we have

F(X^{K+1}, Y^{K+1}) − F(X^0, Y^0) ≤ F(X^{K+1}, Y^{K+1}) − max_{[K−N]_+ ≤ i ≤ K} F(X^i, Y^i) ≤ −(c/2)(‖X^{K+1} − X^K‖_F^2 + ‖Y^{K+1} − Y^K‖_F^2) ≤ 0,

where the first inequality follows from the induction hypothesis and the second inequality follows from (4.6). Hence, (5.11) holds for k = K + 1. This completes the induction.

Then, from (5.11), we have that for any k ≥ 0,

F(X^0, Y^0) ≥ F(X^k, Y^k) = Ψ(X^k) + Φ(Y^k) + (1/2)‖A(X^k(Y^k)^T) − b‖^2,

which, together with (a1) in Assumption 5.1, implies that the sequences {X^k}, {Y^k} and {‖A(X^k(Y^k)^T) − b‖} are bounded. Moreover, from Step 2 and Step 3 in Algorithm 1, it is easy to see that μ_k ≤ μ_k^max = (α + 2γρ^2)‖Y^k‖^2 + c for all k. Since {Y^k} is bounded, the sequences {μ_k^max} and {μ_k} are bounded. Next, we prove the boundedness of {σ_k}. Indeed, at the k-th iteration, there are three possibilities:

μ_k < μ_k^max: In this case, we have σ_k ≤ σ_k^0 τ^{ñ_k} ≤ σ_max τ^{ñ_k}, where ñ_k denotes the number of inner iterations for the line search at the k-th iteration and ñ_k ≤ max{(log(μ_k^max) − log(μ_min))/log τ + 2, 1} (see (5.10) and the discussions preceding it).

μ_k = μ_k^max and σ_k^0 > σ_k^max: In this case, we have σ_k ≤ σ_k^0 τ^{ñ_k} ≤ σ_max τ^{ñ_k}, where ñ_k ≤ max{(log(μ_k^max) − log(μ_min))/log τ + 2, 1}.

Otherwise, we have σ_k ≤ σ_k^max = (α + 2γρ^2)‖X^{k+1}‖^2 + c.

Note that {ñ_k} is bounded as {μ_k^max} is bounded. Thus, {σ_k} is bounded as the sequences {X^k} and {ñ_k} are bounded. This proves statement (i).

Statement (ii). We first claim that any cluster point of {(X^k, Y^k)} is in dom F. Since {(X^k, Y^k)} is bounded from statement (i), there exists at least one cluster point. Suppose that (X^*, Y^*) is a cluster point of {(X^k, Y^k)} and let {(X^{k_i}, Y^{k_i})} be a convergent subsequence such that lim_{i→∞} (X^{k_i}, Y^{k_i}) = (X^*, Y^*). Then, from the lower semicontinuity of F (since Ψ, Φ are closed by (a1) in Assumption 5.1) and (5.11), we have

F(X^*, Y^*) ≤ liminf_{i→∞} F(X^{k_i}, Y^{k_i}) ≤ F(X^0, Y^0),

which implies that F(X^*, Y^*) is finite and hence (X^*, Y^*) ∈ dom F. For notational simplicity, from now on, we let

(5.12)    ΔX^k := X^{k+1} − X^k,  ΔY^k := Y^{k+1} − Y^k,  ΔZ^k := Z^{k+1} − Z^k  and  ℓ(k) := arg max_i { F(X^i, Y^i) : i = [k−N]_+, ..., k }.

Then, the line search criterion (4.6) can be rewritten as

(5.13)    F(X^{k+1}, Y^{k+1}) ≤ F(X^{ℓ(k)}, Y^{ℓ(k)}) − (c/2)(‖ΔX^k‖_F^2 + ‖ΔY^k‖_F^2).

Observe that

F(X^{ℓ(k+1)}, Y^{ℓ(k+1)}) = max_{[k+1−N]_+ ≤ i ≤ k+1} F(X^i, Y^i) = max{ F(X^{k+1}, Y^{k+1}), max_{[k+1−N]_+ ≤ i ≤ k} F(X^i, Y^i) }
(i) ≤ max{ F(X^{ℓ(k)}, Y^{ℓ(k)}), max_{[k+1−N]_+ ≤ i ≤ k} F(X^i, Y^i) }
≤ max{ F(X^{ℓ(k)}, Y^{ℓ(k)}), max_{[k−N]_+ ≤ i ≤ k} F(X^i, Y^i) }
(ii) = max{ F(X^{ℓ(k)}, Y^{ℓ(k)}), F(X^{ℓ(k)}, Y^{ℓ(k)}) } = F(X^{ℓ(k)}, Y^{ℓ(k)}),

where (i) follows from (5.13) and (ii) follows from (5.12). Therefore, the sequence {F(X^{ℓ(k)}, Y^{ℓ(k)})} is non-increasing. Since F(X^{ℓ(k)}, Y^{ℓ(k)}) is also bounded from below (due to (a1) in Assumption 5.1), we conclude that there exists a number F̄ such that

(5.14)    lim_{k→∞} F(X^{ℓ(k)}, Y^{ℓ(k)}) = F̄.

We next prove by induction that for all j ≥ 1,

(5.15a)    lim_{k→∞} ΔX^{ℓ(k)−j} = lim_{k→∞} ΔY^{ℓ(k)−j} = 0,
(5.15b)    lim_{k→∞} F(X^{ℓ(k)−j}, Y^{ℓ(k)−j}) = F̄.

We first prove (5.15a) and (5.15b) for j = 1. Applying (5.13) with k replaced by ℓ(k) − 1, we obtain

F(X^{ℓ(k)}, Y^{ℓ(k)}) ≤ F(X^{ℓ(ℓ(k)−1)}, Y^{ℓ(ℓ(k)−1)}) − (c/2)(‖ΔX^{ℓ(k)−1}‖_F^2 + ‖ΔY^{ℓ(k)−1}‖_F^2),

which, together with (5.14), implies that

(5.16)    lim_{k→∞} ΔX^{ℓ(k)−1} = lim_{k→∞} ΔY^{ℓ(k)−1} = 0.

Then, from (5.14) and (5.16), we have

F̄ = lim_{k→∞} F(X^{ℓ(k)}, Y^{ℓ(k)}) = lim_{k→∞} F(X^{ℓ(k)−1} + ΔX^{ℓ(k)−1}, Y^{ℓ(k)−1} + ΔY^{ℓ(k)−1}) = lim_{k→∞} F(X^{ℓ(k)−1}, Y^{ℓ(k)−1}),

where the last equality follows because {(X^k, Y^k)} is bounded, any cluster point of {(X^k, Y^k)} is in dom F and F is uniformly continuous on any compact subset of dom F under (a1) in Assumption 5.1. Thus, (5.15a) and (5.15b) hold for j = 1.

We next suppose that (5.15a) and (5.15b) hold for j = J for some J ≥ 1. It remains to show that they also hold for j = J + 1. Indeed, from (5.13) with k replaced by ℓ(k) − J − 1 (here, without loss of generality, we assume that k is large enough so that ℓ(k) − J − 1 is nonnegative), we have

F(X^{ℓ(k)−J}, Y^{ℓ(k)−J}) ≤ F(X^{ℓ(ℓ(k)−J−1)}, Y^{ℓ(ℓ(k)−J−1)}) − (c/2)(‖ΔX^{ℓ(k)−J−1}‖_F^2 + ‖ΔY^{ℓ(k)−J−1}‖_F^2),

which implies that

‖ΔX^{ℓ(k)−J−1}‖_F^2 + ‖ΔY^{ℓ(k)−J−1}‖_F^2 ≤ (2/c)( F(X^{ℓ(ℓ(k)−J−1)}, Y^{ℓ(ℓ(k)−J−1)}) − F(X^{ℓ(k)−J}, Y^{ℓ(k)−J}) ).

This together with (5.14) and the induction hypothesis implies that

lim_{k→∞} ΔX^{ℓ(k)−(J+1)} = lim_{k→∞} ΔY^{ℓ(k)−(J+1)} = 0.

Thus, (5.15a) holds for j = J + 1. From this, we further have

lim_{k→∞} F(X^{ℓ(k)−(J+1)}, Y^{ℓ(k)−(J+1)}) = lim_{k→∞} F(X^{ℓ(k)−J} − ΔX^{ℓ(k)−(J+1)}, Y^{ℓ(k)−J} − ΔY^{ℓ(k)−(J+1)}) = lim_{k→∞} F(X^{ℓ(k)−J}, Y^{ℓ(k)−J}) = F̄,

where the second equality follows because {(X^k, Y^k)} is bounded, any cluster point of {(X^k, Y^k)} is in dom F and F is uniformly continuous on any compact subset of dom F under (a1) in Assumption 5.1. Hence, (5.15b) also holds for j = J + 1. This completes the induction.

We are now ready to prove the main result in this statement. Indeed, from (5.12), we can see that k − N ≤ ℓ(k) ≤ k (without loss of generality, we assume that k is large enough such that k ≥ N). Thus, for any k, we must have k − N − 1 = ℓ(k) − j_k for some 1 ≤ j_k ≤ N + 1. Then, we have

‖ΔX^{k−N−1}‖_F = ‖ΔX^{ℓ(k)−j_k}‖_F ≤ max_{1 ≤ j ≤ N+1} ‖ΔX^{ℓ(k)−j}‖_F,
‖ΔY^{k−N−1}‖_F = ‖ΔY^{ℓ(k)−j_k}‖_F ≤ max_{1 ≤ j ≤ N+1} ‖ΔY^{ℓ(k)−j}‖_F.

This together with (5.15a) implies that

lim_{k→∞} ΔX^k = lim_{k→∞} ΔX^{k−N−1} = 0,    lim_{k→∞} ΔY^k = lim_{k→∞} ΔY^{k−N−1} = 0.

This proves statement (ii).

Statement (iii). Again, let (X^*, Y^*) be a cluster point of {(X^k, Y^k)} and let {(X^{k_i}, Y^{k_i})} be a convergent subsequence such that lim_{i→∞} (X^{k_i}, Y^{k_i}) = (X^*, Y^*). Recall that (X^*, Y^*) ∈ dom F. On the other hand, it is easy to see from (4.5) that lim_{i→∞} Z^{k_i} = Z^*, where Z^* is given by (3.4). Thus, it can be shown as in (3.9) that

(5.17)    α(Z^* − X^*(Y^*)^T) + βA^*(A(Z^*) − b) = 0.

We next show that

(5.18a)    0 ∈ ∂Ψ(X^*) + α(X^*(Y^*)^T − Z^*)Y^*,
(5.18b)    0 ∈ ∂Φ(Y^*) + α(X^*(Y^*)^T − Z^*)^T X^*.

We start by showing (5.18a) in the following cases:

Proximal & Prox-linear: In these two cases, passing to the limit along {(X^{k_i}, Y^{k_i})} in (5.1a) or (5.1b) with X^{k_i+1} in place of U^{k_i} and μ_{k_i} in place of μ_k, and invoking (a1) in Assumption 5.1, statements (i), (ii), (X^*, Y^*) ∈ dom F and (2.1), we obtain (5.18a).

Hierarchical-prox: In this case, passing to the limit along {(X^{k_i}, Y^{k_i})} in (5.1c) with X^{k_i+1} in place of U^{k_i} and μ_{k_i} in place of μ_k, and invoking (a1) in Assumption 5.1, statements (i), (ii), (X^*, Y^*) ∈ dom F and (2.1), we have 0 ∈ ∂ψ_i(x_i^*) + α(X^*(Y^*)^T − Z^*)y_i^* for any i = 1, 2, ..., r. Then, stacking them up, we obtain (5.18a).

Similarly, we can obtain (5.18b). Thus, combining (5.17), (5.18a) and (5.18b), we see that (X^*, Y^*, Z^*) is a stationary point of Θ_{α,β}, which further implies that (X^*, Y^*) is a stationary point of F from Theorem 3.3. This proves statement (iii).

Remark 5.2 (Comment on (a3) in Assumption 5.1). If Φ and Ψ are the indicator functions of some nonempty closed sets, Theorem 5.3 remains valid under the weaker condition on α and β that 1/α + 1/β > 0, with a slight modification in (4.6) of Algorithm 1. Indeed, when Φ and Ψ are the indicator functions, one can see from Remark 3.1 and the proofs of Lemma 5.1 and Proposition 5.2 that if 1/α + 1/β > 0, then

F(U^k, V^k) − F(X^k, Y^k) = (1/α + 1/β)( Θ_{α,β}(U^k, V^k, W^k) − Θ_{α,β}(X^k, Y^k, Z^k) )
≤ −(1/α + 1/β)( [(μ_k − (α + 2γρ^2)‖Y^k‖^2)/2]‖U^k − X^k‖_F^2 + [(σ_k − (α + 2γρ^2)‖U^k‖^2)/2]‖V^k − Y^k‖_F^2 ),

and the line search criterion is well defined with c replaced by (1/α + 1/β)c. Moreover, recalling [28, Exercise 8.14], we see that ∂Ψ and ∂Φ are normal cones. Thus, following Remark 3.2 and similar arguments to those in Theorem 5.3, we can obtain the same results when 1/α + 1/β > 0, with c replaced by (1/α + 1/β)c in (4.6) of Algorithm 1.

Remark 5.3 (Comments on updating μ_k^max and σ_k^max). In Algorithm 1, we need to evaluate μ_k^max = (α + 2γρ^2)‖Y^k‖^2 + c and σ_k^max = (α + 2γρ^2)‖U^k‖^2 + c in each iteration. However, computing the spectral norms of Y^k and U^k might be costly, especially when r is large. Hence, in our experiments, instead of computing ‖Y^k‖^2 and ‖U^k‖^2, we compute ‖Y^k‖_F^2 and ‖U^k‖_F^2, and update μ_k^max and σ_k^max by μ_k^max = (α + 2γρ^2)‖Y^k‖_F^2 + c and σ_k^max = (α + 2γρ^2)‖U^k‖_F^2 + c instead. Since ‖Y^k‖ ≤ ‖Y^k‖_F and ‖U^k‖ ≤ ‖U^k‖_F, it follows from (5.3) that

F(U^k, V^k) ≤ F(X^k, Y^k) − [(μ_k − (α + 2γρ^2)‖Y^k‖_F^2)/2]‖U^k − X^k‖_F^2 − [(σ_k − (α + 2γρ^2)‖U^k‖_F^2)/2]‖V^k − Y^k‖_F^2.

Then, one can show that Proposition 5.2 and Theorem 5.3 remain valid. In addition, we compute the quantities ‖U^k‖_F^2 and ‖Y^k‖_F^2 by tr((U^k)^T U^k) and tr((Y^k)^T Y^k), respectively. For some cases, the matrices (U^k)^T U^k and (Y^k)^T Y^k can be used repeatedly in updating the variables and evaluating the objective value and successive changes to reduce the cost of the line search; see a concrete example in Section 6.1.

6. Numerical experiments. In this section, we conduct numerical experiments to test our algorithm for NMF and MC on real datasets. All experiments are run in MATLAB R2015b on a 64-bit PC with an Intel Core i7 CPU (3.60 GHz) and 32 GB of RAM equipped with Windows 10 OS.

6.1. Non-negative matrix factorization. We first consider NMF:

(6.1)    min_{X,Y} (1/2)‖XY^T − M‖_F^2   s.t.  X ≥ 0, Y ≥ 0,

where X ∈ R^{m×r} and Y ∈ R^{n×r} are decision variables. Note that the feasible set of (6.1) is unbounded. We hence focus on the following model:

(6.2)    min_{X,Y} (1/2)‖XY^T − M‖_F^2   s.t.  0 ≤ X ≤ X^max, 0 ≤ Y ≤ Y^max,

where X^max ≥ 0 and Y^max ≥ 0 are upper bound matrices. One can show that, when X^max_{ij} and Y^max_{ij} are sufficiently large^3, solving (6.2) gives a solution of (6.1).

^3 The estimation of X^max_{ij} and Y^max_{ij} has been discussed in [9, Page 67].
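Since Ψ and Φ in (6.2) are indicator functions of boxes, their proximal mappings reduce to entrywise clipping. The following one-line NumPy helper (ours, not from the paper) can play the role of prox_psi or prox_phi in the earlier sketches of the (4.1)/(4.2) updates.

```python
import numpy as np

def prox_box(V, upper, t=None):
    """Prox of the indicator of {0 <= X <= upper}: the projection, i.e. entrywise clipping.
    The step size t is irrelevant for indicator functions; it is kept only for interface
    compatibility with the earlier prox-based sketches."""
    return np.clip(V, 0.0, upper)
```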

In our experiments, for simplicity, we set X^max_{ij} = … and Y^max_{ij} = … for all (i, j). Now, we see that (6.2) corresponds to (1.1) with Ψ(X) = δ_X(X), Φ(Y) = δ_Y(Y) and A = I, where X = {X ∈ R^{m×r} : 0 ≤ X ≤ X^max} and Y = {Y ∈ R^{n×r} : 0 ≤ Y ≤ Y^max}. We apply NAUM to solve (6.2), and use (4.1c) and (4.2c) to update U^k and V^k. The specific updates of Z^k, u_i^k and v_i^k are

Z^k = (α/(α+β)) X^k(Y^k)^T + (β/(α+β)) M,
u_i^k = max{ 0, min{ x_i^max, (α P_i^k y_i^k + μ_k x_i^k) / (α‖y_i^k‖^2 + μ_k) } },   i = 1, ..., r,
v_i^k = max{ 0, min{ y_i^max, (α (Q_i^k)^T u_i^k + σ_k y_i^k) / (α‖u_i^k‖^2 + σ_k) } },   i = 1, ..., r,

where P_i^k and Q_i^k are defined in (4.8). Note that here it is not necessary to update Z^k explicitly. Indeed, we can directly compute P_i^k y_i^k and (Q_i^k)^T u_i^k by substituting Z^k as below:

(6.3)    P_i^k y_i^k = (α/(α+β)) X^k(Y^k)^T y_i^k + (β/(α+β)) M y_i^k − Σ_{j=1}^{i−1} u_j^k ((y_j^k)^T y_i^k) − Σ_{j=i+1}^r x_j^k ((y_j^k)^T y_i^k),
         (Q_i^k)^T u_i^k = (α/(α+β)) Y^k(X^k)^T u_i^k + (β/(α+β)) M^T u_i^k − Σ_{j=1}^{i−1} v_j^k ((u_j^k)^T u_i^k) − Σ_{j=i+1}^r y_j^k ((u_j^k)^T u_i^k).

When computing X^k(Y^k)^T y_i^k and Y^k(X^k)^T u_i^k in the above, we first compute (Y^k)^T y_i^k and (X^k)^T u_i^k to avoid forming the huge (m × n) matrix X^k(Y^k)^T. Moreover, the matrices (X^k)^T U^k, (U^k)^T U^k, (Y^k)^T V^k and M^T U^k that have been computed in (6.3) can be used again to evaluate the successive changes and the objective value as follows:

‖U^k − X^k‖_F^2 = tr((U^k)^T U^k) − 2 tr((X^k)^T U^k) + tr((X^k)^T X^k),
‖V^k − Y^k‖_F^2 = tr((V^k)^T V^k) − 2 tr((Y^k)^T V^k) + tr((Y^k)^T Y^k),
‖U^k(V^k)^T − M‖_F^2 = tr(((U^k)^T U^k)((V^k)^T V^k)) − 2 tr((M^T U^k)(V^k)^T) + ‖M‖_F^2.

In the above relations, tr((X^k)^T X^k) and tr((Y^k)^T Y^k) can be obtained from (U^{k−1})^T U^{k−1} and (V^{k−1})^T V^{k−1} in the previous iteration, respectively, and ‖M‖_F^2 can be computed in advance. Additionally, as we discussed in Remark 5.3, tr((Y^k)^T Y^k) and tr((U^k)^T U^k) can also be used in computing μ_k^max and σ_k^max, respectively. These techniques were also used in many popular algorithms for NMF to reduce the computational cost (see, for example, [2, 9, 10, 18, 37]).

The experiments are conducted on face datasets (dense matrices) and text datasets (sparse matrices). For the face datasets, we use CBCL^4, ORL^5 [29] and the extended Yale Face Database B (e-YaleB)^6 [19] for our test. CBCL contains 2429 images of faces with 19 × 19 pixels, ORL contains 400 images of faces with 112 × 92 pixels, and e-YaleB contains 2414 images of faces with 192 × 168 pixels. In our experiments, for each face dataset, each image is vectorized and stacked as a column of a data matrix M of size m × n. For the text datasets, we use three datasets from the CLUTO toolkit^7. The specific values of m and n for each dataset and the values of r used for our tests are summarized in Table 2.

The parameters in NAUM are set as follows: μ_min = μ̄_{−1} = 1, σ_min = σ̄_{−1} = 1, σ_max = 10^6, τ = 4, c = 10^{−4}, N = 3, μ_k^0 = max{0.1 μ̄_{k−1}, μ_min} and σ_k^0 =

^4 Available at … ^5 Available at … ^6 Available at …iskwak/extyaledatabase/extyaleb.html. ^7 Available at …
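To illustrate the formulas above, here is a NumPy sketch (ours, not the authors' MATLAB code) of one sweep of the column-wise updates for (6.2). It computes P_i^T y_i and (Q_i)^T u_i via cached Gram matrices as in (6.3), so the (m × n) product X Y^T is never formed; the scalar bounds x_max and y_max stand in for the entries of X^max and Y^max.

```python
import numpy as np

def naum_nmf_sweep(X, Y, M, alpha, beta, mu, sigma, x_max=np.inf, y_max=np.inf):
    """One pass of the hierarchical-prox updates (4.1c)/(4.2c) for the NMF model (6.2),
    using the Gram-matrix identities in (6.3). Returns the candidates (U, V)."""
    r = X.shape[1]
    w1, w2 = alpha / (alpha + beta), beta / (alpha + beta)
    U, V = X.copy(), Y.copy()
    YtY = Y.T @ Y                   # (Y^k)^T Y^k, reused for every column i
    MY = M @ Y                      # M y_i^k for all i
    for i in range(r):
        # P_i^T y_i without forming Z or X Y^T (first line of (6.3))
        Py = w1 * (X @ YtY[:, i]) + w2 * MY[:, i] \
             - U[:, :i] @ YtY[:i, i] - X[:, i + 1:] @ YtY[i + 1:, i]
        U[:, i] = np.clip((alpha * Py + mu * X[:, i]) / (alpha * YtY[i, i] + mu), 0.0, x_max)
    UtU = U.T @ U                   # (U^k)^T U^k, reused for every column i
    MtU = M.T @ U                   # M^T u_i^k for all i
    XtU = X.T @ U                   # (X^k)^T u_i^k for all i
    for i in range(r):
        # Q_i^T u_i without forming Z or X Y^T (second line of (6.3))
        Qu = w1 * (Y @ XtU[:, i]) + w2 * MtU[:, i] \
             - V[:, :i] @ UtU[:i, i] - Y[:, i + 1:] @ UtU[i + 1:, i]
        V[:, i] = np.clip((alpha * Qu + sigma * Y[:, i]) / (alpha * UtU[i, i] + sigma), 0.0, y_max)
    return U, V
```

The cached matrices (for example U^T U, X^T U and M^T U) can then be reused when evaluating the objective value, the successive changes and μ_k^max, σ_k^max, as described above.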


More information

A NEW ITERATIVE METHOD FOR THE SPLIT COMMON FIXED POINT PROBLEM IN HILBERT SPACES. Fenghui Wang

A NEW ITERATIVE METHOD FOR THE SPLIT COMMON FIXED POINT PROBLEM IN HILBERT SPACES. Fenghui Wang A NEW ITERATIVE METHOD FOR THE SPLIT COMMON FIXED POINT PROBLEM IN HILBERT SPACES Fenghui Wang Department of Mathematics, Luoyang Normal University, Luoyang 470, P.R. China E-mail: wfenghui@63.com ABSTRACT.

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

COMPLEXITY OF A QUADRATIC PENALTY ACCELERATED INEXACT PROXIMAL POINT METHOD FOR SOLVING LINEARLY CONSTRAINED NONCONVEX COMPOSITE PROGRAMS

COMPLEXITY OF A QUADRATIC PENALTY ACCELERATED INEXACT PROXIMAL POINT METHOD FOR SOLVING LINEARLY CONSTRAINED NONCONVEX COMPOSITE PROGRAMS COMPLEXITY OF A QUADRATIC PENALTY ACCELERATED INEXACT PROXIMAL POINT METHOD FOR SOLVING LINEARLY CONSTRAINED NONCONVEX COMPOSITE PROGRAMS WEIWEI KONG, JEFFERSON G. MELO, AND RENATO D.C. MONTEIRO Abstract.

More information

Inexact Alternating Direction Method of Multipliers for Separable Convex Optimization

Inexact Alternating Direction Method of Multipliers for Separable Convex Optimization Inexact Alternating Direction Method of Multipliers for Separable Convex Optimization Hongchao Zhang hozhang@math.lsu.edu Department of Mathematics Center for Computation and Technology Louisiana State

More information

MATH 205C: STATIONARY PHASE LEMMA

MATH 205C: STATIONARY PHASE LEMMA MATH 205C: STATIONARY PHASE LEMMA For ω, consider an integral of the form I(ω) = e iωf(x) u(x) dx, where u Cc (R n ) complex valued, with support in a compact set K, and f C (R n ) real valued. Thus, I(ω)

More information

A proximal-like algorithm for a class of nonconvex programming

A proximal-like algorithm for a class of nonconvex programming Pacific Journal of Optimization, vol. 4, pp. 319-333, 2008 A proximal-like algorithm for a class of nonconvex programming Jein-Shan Chen 1 Department of Mathematics National Taiwan Normal University Taipei,

More information

ON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS

ON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS ON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS WEI DENG AND WOTAO YIN Abstract. The formulation min x,y f(x) + g(y) subject to Ax + By = b arises in

More information

Lecture 2: Linear Algebra Review

Lecture 2: Linear Algebra Review EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1

More information

A Geometric Framework for Nonconvex Optimization Duality using Augmented Lagrangian Functions

A Geometric Framework for Nonconvex Optimization Duality using Augmented Lagrangian Functions A Geometric Framework for Nonconvex Optimization Duality using Augmented Lagrangian Functions Angelia Nedić and Asuman Ozdaglar April 15, 2006 Abstract We provide a unifying geometric framework for the

More information

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods Renato D.C. Monteiro B. F. Svaiter May 10, 011 Revised: May 4, 01) Abstract This

More information

Reweighted Nuclear Norm Minimization with Application to System Identification

Reweighted Nuclear Norm Minimization with Application to System Identification Reweighted Nuclear Norm Minimization with Application to System Identification Karthi Mohan and Maryam Fazel Abstract The matrix ran minimization problem consists of finding a matrix of minimum ran that

More information

Lecture 7: Semidefinite programming

Lecture 7: Semidefinite programming CS 766/QIC 820 Theory of Quantum Information (Fall 2011) Lecture 7: Semidefinite programming This lecture is on semidefinite programming, which is a powerful technique from both an analytic and computational

More information

Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping.

Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping. Minimization Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping. 1 Minimization A Topological Result. Let S be a topological

More information

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2017 LECTURE 5

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2017 LECTURE 5 STAT 39: MATHEMATICAL COMPUTATIONS I FALL 17 LECTURE 5 1 existence of svd Theorem 1 (Existence of SVD) Every matrix has a singular value decomposition (condensed version) Proof Let A C m n and for simplicity

More information

HYBRID JACOBIAN AND GAUSS SEIDEL PROXIMAL BLOCK COORDINATE UPDATE METHODS FOR LINEARLY CONSTRAINED CONVEX PROGRAMMING

HYBRID JACOBIAN AND GAUSS SEIDEL PROXIMAL BLOCK COORDINATE UPDATE METHODS FOR LINEARLY CONSTRAINED CONVEX PROGRAMMING SIAM J. OPTIM. Vol. 8, No. 1, pp. 646 670 c 018 Society for Industrial and Applied Mathematics HYBRID JACOBIAN AND GAUSS SEIDEL PROXIMAL BLOCK COORDINATE UPDATE METHODS FOR LINEARLY CONSTRAINED CONVEX

More information

Block Coordinate Descent for Regularized Multi-convex Optimization

Block Coordinate Descent for Regularized Multi-convex Optimization Block Coordinate Descent for Regularized Multi-convex Optimization Yangyang Xu and Wotao Yin CAAM Department, Rice University February 15, 2013 Multi-convex optimization Model definition Applications Outline

More information

The Proximal Gradient Method

The Proximal Gradient Method Chapter 10 The Proximal Gradient Method Underlying Space: In this chapter, with the exception of Section 10.9, E is a Euclidean space, meaning a finite dimensional space endowed with an inner product,

More information

Optimization and Optimal Control in Banach Spaces

Optimization and Optimal Control in Banach Spaces Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,

More information

Gradient methods for minimizing composite functions

Gradient methods for minimizing composite functions Gradient methods for minimizing composite functions Yu. Nesterov May 00 Abstract In this paper we analyze several new methods for solving optimization problems with the objective function formed as a sum

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

Subdifferential representation of convex functions: refinements and applications

Subdifferential representation of convex functions: refinements and applications Subdifferential representation of convex functions: refinements and applications Joël Benoist & Aris Daniilidis Abstract Every lower semicontinuous convex function can be represented through its subdifferential

More information

Primal-dual first-order methods with O(1/ǫ) iteration-complexity for cone programming

Primal-dual first-order methods with O(1/ǫ) iteration-complexity for cone programming Mathematical Programming manuscript No. (will be inserted by the editor) Primal-dual first-order methods with O(1/ǫ) iteration-complexity for cone programming Guanghui Lan Zhaosong Lu Renato D. C. Monteiro

More information

A Primal-Dual Interior-Point Method for Nonlinear Programming with Strong Global and Local Convergence Properties

A Primal-Dual Interior-Point Method for Nonlinear Programming with Strong Global and Local Convergence Properties A Primal-Dual Interior-Point Method for Nonlinear Programming with Strong Global and Local Convergence Properties André L. Tits Andreas Wächter Sasan Bahtiari Thomas J. Urban Craig T. Lawrence ISR Technical

More information

Introduction to Alternating Direction Method of Multipliers

Introduction to Alternating Direction Method of Multipliers Introduction to Alternating Direction Method of Multipliers Yale Chang Machine Learning Group Meeting September 29, 2016 Yale Chang (Machine Learning Group Meeting) Introduction to Alternating Direction

More information

PARALLEL SUBGRADIENT METHOD FOR NONSMOOTH CONVEX OPTIMIZATION WITH A SIMPLE CONSTRAINT

PARALLEL SUBGRADIENT METHOD FOR NONSMOOTH CONVEX OPTIMIZATION WITH A SIMPLE CONSTRAINT Linear and Nonlinear Analysis Volume 1, Number 1, 2015, 1 PARALLEL SUBGRADIENT METHOD FOR NONSMOOTH CONVEX OPTIMIZATION WITH A SIMPLE CONSTRAINT KAZUHIRO HISHINUMA AND HIDEAKI IIDUKA Abstract. In this

More information

The Split Hierarchical Monotone Variational Inclusions Problems and Fixed Point Problems for Nonexpansive Semigroup

The Split Hierarchical Monotone Variational Inclusions Problems and Fixed Point Problems for Nonexpansive Semigroup International Mathematical Forum, Vol. 11, 2016, no. 8, 395-408 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/imf.2016.6220 The Split Hierarchical Monotone Variational Inclusions Problems and

More information

A NOVEL FILLED FUNCTION METHOD FOR GLOBAL OPTIMIZATION. 1. Introduction Consider the following unconstrained programming problem:

A NOVEL FILLED FUNCTION METHOD FOR GLOBAL OPTIMIZATION. 1. Introduction Consider the following unconstrained programming problem: J. Korean Math. Soc. 47, No. 6, pp. 53 67 DOI.434/JKMS..47.6.53 A NOVEL FILLED FUNCTION METHOD FOR GLOBAL OPTIMIZATION Youjiang Lin, Yongjian Yang, and Liansheng Zhang Abstract. This paper considers the

More information

Maximization of Submodular Set Functions

Maximization of Submodular Set Functions Northeastern University Department of Electrical and Computer Engineering Maximization of Submodular Set Functions Biomedical Signal Processing, Imaging, Reasoning, and Learning BSPIRAL) Group Author:

More information

A Unified Approach to Proximal Algorithms using Bregman Distance

A Unified Approach to Proximal Algorithms using Bregman Distance A Unified Approach to Proximal Algorithms using Bregman Distance Yi Zhou a,, Yingbin Liang a, Lixin Shen b a Department of Electrical Engineering and Computer Science, Syracuse University b Department

More information

An inexact subgradient algorithm for Equilibrium Problems

An inexact subgradient algorithm for Equilibrium Problems Volume 30, N. 1, pp. 91 107, 2011 Copyright 2011 SBMAC ISSN 0101-8205 www.scielo.br/cam An inexact subgradient algorithm for Equilibrium Problems PAULO SANTOS 1 and SUSANA SCHEIMBERG 2 1 DM, UFPI, Teresina,

More information

3.10 Lagrangian relaxation

3.10 Lagrangian relaxation 3.10 Lagrangian relaxation Consider a generic ILP problem min {c t x : Ax b, Dx d, x Z n } with integer coefficients. Suppose Dx d are the complicating constraints. Often the linear relaxation and the

More information

Math 273a: Optimization Subgradient Methods

Math 273a: Optimization Subgradient Methods Math 273a: Optimization Subgradient Methods Instructor: Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com Nonsmooth convex function Recall: For ˉx R n, f(ˉx) := {g R

More information

Iterative Methods for Solving A x = b

Iterative Methods for Solving A x = b Iterative Methods for Solving A x = b A good (free) online source for iterative methods for solving A x = b is given in the description of a set of iterative solvers called templates found at netlib: http

More information

A Proximal Alternating Direction Method for Semi-Definite Rank Minimization (Supplementary Material)

A Proximal Alternating Direction Method for Semi-Definite Rank Minimization (Supplementary Material) A Proximal Alternating Direction Method for Semi-Definite Rank Minimization (Supplementary Material) Ganzhao Yuan and Bernard Ghanem King Abdullah University of Science and Technology (KAUST), Saudi Arabia

More information

SOS TENSOR DECOMPOSITION: THEORY AND APPLICATIONS

SOS TENSOR DECOMPOSITION: THEORY AND APPLICATIONS SOS TENSOR DECOMPOSITION: THEORY AND APPLICATIONS HAIBIN CHEN, GUOYIN LI, AND LIQUN QI Abstract. In this paper, we examine structured tensors which have sum-of-squares (SOS) tensor decomposition, and study

More information

AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING

AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING XIAO WANG AND HONGCHAO ZHANG Abstract. In this paper, we propose an Augmented Lagrangian Affine Scaling (ALAS) algorithm for general

More information

Value and Policy Iteration

Value and Policy Iteration Chapter 7 Value and Policy Iteration 1 For infinite horizon problems, we need to replace our basic computational tool, the DP algorithm, which we used to compute the optimal cost and policy for finite

More information

Research Article Modified Halfspace-Relaxation Projection Methods for Solving the Split Feasibility Problem

Research Article Modified Halfspace-Relaxation Projection Methods for Solving the Split Feasibility Problem Advances in Operations Research Volume 01, Article ID 483479, 17 pages doi:10.1155/01/483479 Research Article Modified Halfspace-Relaxation Projection Methods for Solving the Split Feasibility Problem

More information

Proximal-like contraction methods for monotone variational inequalities in a unified framework

Proximal-like contraction methods for monotone variational inequalities in a unified framework Proximal-like contraction methods for monotone variational inequalities in a unified framework Bingsheng He 1 Li-Zhi Liao 2 Xiang Wang Department of Mathematics, Nanjing University, Nanjing, 210093, China

More information

Math 273a: Optimization Subgradients of convex functions

Math 273a: Optimization Subgradients of convex functions Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 42 Subgradients Assumptions

More information

Convex Functions and Optimization

Convex Functions and Optimization Chapter 5 Convex Functions and Optimization 5.1 Convex Functions Our next topic is that of convex functions. Again, we will concentrate on the context of a map f : R n R although the situation can be generalized

More information

A Regularized Directional Derivative-Based Newton Method for Inverse Singular Value Problems

A Regularized Directional Derivative-Based Newton Method for Inverse Singular Value Problems A Regularized Directional Derivative-Based Newton Method for Inverse Singular Value Problems Wei Ma Zheng-Jian Bai September 18, 2012 Abstract In this paper, we give a regularized directional derivative-based

More information

Newton-like method with diagonal correction for distributed optimization

Newton-like method with diagonal correction for distributed optimization Newton-lie method with diagonal correction for distributed optimization Dragana Bajović Dušan Jaovetić Nataša Krejić Nataša Krlec Jerinić August 15, 2015 Abstract We consider distributed optimization problems

More information

A Bregman alternating direction method of multipliers for sparse probabilistic Boolean network problem

A Bregman alternating direction method of multipliers for sparse probabilistic Boolean network problem A Bregman alternating direction method of multipliers for sparse probabilistic Boolean network problem Kangkang Deng, Zheng Peng Abstract: The main task of genetic regulatory networks is to construct a

More information

A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions

A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions Angelia Nedić and Asuman Ozdaglar April 16, 2006 Abstract In this paper, we study a unifying framework

More information

Convex Optimization Notes

Convex Optimization Notes Convex Optimization Notes Jonathan Siegel January 2017 1 Convex Analysis This section is devoted to the study of convex functions f : B R {+ } and convex sets U B, for B a Banach space. The case of B =

More information

Generalized Uniformly Optimal Methods for Nonlinear Programming

Generalized Uniformly Optimal Methods for Nonlinear Programming Generalized Uniformly Optimal Methods for Nonlinear Programming Saeed Ghadimi Guanghui Lan Hongchao Zhang Janumary 14, 2017 Abstract In this paper, we present a generic framewor to extend existing uniformly

More information

17 Solution of Nonlinear Systems

17 Solution of Nonlinear Systems 17 Solution of Nonlinear Systems We now discuss the solution of systems of nonlinear equations. An important ingredient will be the multivariate Taylor theorem. Theorem 17.1 Let D = {x 1, x 2,..., x m

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE7C (Spring 08): Convex Optimization and Approximation Instructor: Moritz Hardt Email: hardt+ee7c@berkeley.edu Graduate Instructor: Max Simchowitz Email: msimchow+ee7c@berkeley.edu October

More information

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9 MAT 570 REAL ANALYSIS LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: FALL 204 Contents. Sets 2 2. Functions 5 3. Countability 7 4. Axiom of choice 8 5. Equivalence relations 9 6. Real numbers 9 7. Extended

More information

Sequential Unconstrained Minimization: A Survey

Sequential Unconstrained Minimization: A Survey Sequential Unconstrained Minimization: A Survey Charles L. Byrne February 21, 2013 Abstract The problem is to minimize a function f : X (, ], over a non-empty subset C of X, where X is an arbitrary set.

More information

Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms

Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms JOTA manuscript No. (will be inserted by the editor) Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms Peter Ochs Jalal Fadili Thomas Brox Received: date / Accepted: date Abstract

More information

Second order forward-backward dynamical systems for monotone inclusion problems

Second order forward-backward dynamical systems for monotone inclusion problems Second order forward-backward dynamical systems for monotone inclusion problems Radu Ioan Boţ Ernö Robert Csetnek March 6, 25 Abstract. We begin by considering second order dynamical systems of the from

More information

The following definition is fundamental.

The following definition is fundamental. 1. Some Basics from Linear Algebra With these notes, I will try and clarify certain topics that I only quickly mention in class. First and foremost, I will assume that you are familiar with many basic

More information

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems Robert M. Freund February 2016 c 2016 Massachusetts Institute of Technology. All rights reserved. 1 1 Introduction

More information

THE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS

THE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS THE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS RALPH HOWARD DEPARTMENT OF MATHEMATICS UNIVERSITY OF SOUTH CAROLINA COLUMBIA, S.C. 29208, USA HOWARD@MATH.SC.EDU Abstract. This is an edited version of a

More information

Lecture 5 : Projections

Lecture 5 : Projections Lecture 5 : Projections EE227C. Lecturer: Professor Martin Wainwright. Scribe: Alvin Wan Up until now, we have seen convergence rates of unconstrained gradient descent. Now, we consider a constrained minimization

More information

Proximal Gradient Descent and Acceleration. Ryan Tibshirani Convex Optimization /36-725

Proximal Gradient Descent and Acceleration. Ryan Tibshirani Convex Optimization /36-725 Proximal Gradient Descent and Acceleration Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: subgradient method Consider the problem min f(x) with f convex, and dom(f) = R n. Subgradient method:

More information

An Augmented Lagrangian Approach for Sparse Principal Component Analysis

An Augmented Lagrangian Approach for Sparse Principal Component Analysis An Augmented Lagrangian Approach for Sparse Principal Component Analysis Zhaosong Lu Yong Zhang July 12, 2009 Abstract Principal component analysis (PCA) is a widely used technique for data analysis and

More information

On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean

On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean Renato D.C. Monteiro B. F. Svaiter March 17, 2009 Abstract In this paper we analyze the iteration-complexity

More information

On Penalty and Gap Function Methods for Bilevel Equilibrium Problems

On Penalty and Gap Function Methods for Bilevel Equilibrium Problems On Penalty and Gap Function Methods for Bilevel Equilibrium Problems Bui Van Dinh 1 and Le Dung Muu 2 1 Faculty of Information Technology, Le Quy Don Technical University, Hanoi, Vietnam 2 Institute of

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)

More information

Multiple integrals: Sufficient conditions for a local minimum, Jacobi and Weierstrass-type conditions

Multiple integrals: Sufficient conditions for a local minimum, Jacobi and Weierstrass-type conditions Multiple integrals: Sufficient conditions for a local minimum, Jacobi and Weierstrass-type conditions March 6, 2013 Contents 1 Wea second variation 2 1.1 Formulas for variation........................

More information

Big Data Analytics: Optimization and Randomization

Big Data Analytics: Optimization and Randomization Big Data Analytics: Optimization and Randomization Tianbao Yang Tutorial@ACML 2015 Hong Kong Department of Computer Science, The University of Iowa, IA, USA Nov. 20, 2015 Yang Tutorial for ACML 15 Nov.

More information

Gradient methods for minimizing composite functions

Gradient methods for minimizing composite functions Math. Program., Ser. B 2013) 140:125 161 DOI 10.1007/s10107-012-0629-5 FULL LENGTH PAPER Gradient methods for minimizing composite functions Yu. Nesterov Received: 10 June 2010 / Accepted: 29 December

More information

I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION

I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION Peter Ochs University of Freiburg Germany 17.01.2017 joint work with: Thomas Brox and Thomas Pock c 2017 Peter Ochs ipiano c 1

More information

Nonlinear Programming Algorithms Handout

Nonlinear Programming Algorithms Handout Nonlinear Programming Algorithms Handout Michael C. Ferris Computer Sciences Department University of Wisconsin Madison, Wisconsin 5376 September 9 1 Eigenvalues The eigenvalues of a matrix A C n n are

More information