Rank-Constrained Optimization: A Riemannian Manifold Approach

Guifang Zhou (1), Wen Huang (2), Kyle A. Gallivan (1), Paul Van Dooren (2), P.-A. Absil (2)

1 - Florida State University, Department of Mathematics, 1017 Academic Way, Tallahassee, FL 32306-4510, USA
2 - Université catholique de Louvain, ICTEAM Institute, Avenue G. Lemaître 4, 1348 Louvain-la-Neuve, Belgium

(This research was supported by the National Science Foundation under grant NSF-1262476. This paper presents research results of the Belgian Network DYSCO (Dynamical Systems, Control, and Optimization), funded by the Interuniversity Attraction Poles Programme initiated by the Belgian Science Policy Office. This work was supported by grant FNRS PDR T.0173.13.)

Abstract. This paper presents an algorithm that solves optimization problems on a matrix manifold $\mathcal{M} \subseteq \mathbb{R}^{m \times n}$ with an additional rank inequality constraint. New geometric objects are defined to facilitate efficiently finding a suitable rank. The convergence properties of the algorithm are given, and a weighted low-rank approximation problem is used to illustrate the efficiency and effectiveness of the algorithm.

1 Introduction

We consider low-rank optimization problems that combine a rank inequality constraint with a matrix manifold constraint:

$\min_{X \in \mathcal{M}_{\le k}} f(X)$,    (1)

where $k \le \min(m, n)$, $\mathcal{M}_{\le k} = \{X \in \mathcal{M} : \operatorname{rank}(X) \le k\}$, and $\mathcal{M}$ is a submanifold of $\mathbb{R}^{m \times n}$. Typical choices for $\mathcal{M}$ are the entire set $\mathbb{R}^{m \times n}$, a sphere, symmetric matrices, elliptopes, and spectrahedrons.

Recently, optimization on manifolds has attracted significant attention. Attempts have been made to understand problem (1) by considering the related but simpler problem $\min_{X \in \mathbb{R}^{m \times n}_k} f(X)$, where $\mathbb{R}^{m \times n}_k = \{X \in \mathbb{R}^{m \times n} : \operatorname{rank}(X) = k\}$ (see e.g. [1, 2]). Since $\mathbb{R}^{m \times n}_k$ is a submanifold of $\mathbb{R}^{m \times n}$ of dimension $(m + n - k)k$, the simpler problem can be solved by applying techniques from Riemannian optimization [3] to matrix manifolds. However, a disadvantage is that the manifold $\mathbb{R}^{m \times n}_k$ is not closed in $\mathbb{R}^{m \times n}$, which considerably complicates the convergence analysis and the performance of an iteration. Very recently, a more global view of a projected line-search method on $\mathbb{R}^{m \times n}_{\le k} = \{X \in \mathbb{R}^{m \times n} : \operatorname{rank}(X) \le k\}$, together with a convergence analysis, was developed in [4]. In [5], the results of [4] are exploited to propose an algorithm that successively increases the rank by a given constant.
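
To make the feasible set concrete, here is a minimal NumPy sketch (an illustration, not code from the paper): it builds a rank-$k$ matrix, projects a perturbation of it back onto $\{X : \operatorname{rank}(X) \le k\}$ by truncated SVD (the metric projection onto $\mathbb{R}^{m \times n}_{\le k}$, by the Eckart-Young theorem), and prints the dimension $(m+n-k)k$ of the fixed-rank manifold. The function name and the test sizes are chosen here for illustration only.

```python
import numpy as np

def project_rank_le_k(X, k):
    """Metric projection of X onto {rank <= k}: keep the k largest
    singular values (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k] @ Vt[:k, :]

rng = np.random.default_rng(0)
m, n, k = 80, 10, 5
X = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))   # a rank-k matrix
Y = project_rank_le_k(X + 1e-3 * rng.standard_normal((m, n)), k)
print(np.linalg.matrix_rank(Y))    # 5: Y lies on the fixed-rank manifold
print((m + n - k) * k)             # 425: dimension of R^{m x n}_k
```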

Its convergence result to critical points can be deduced from the convergence theory in [4]; it relies on the assumption, often satisfied in practice, that the limit points have rank $k$. Under this assumption, a line-search method on $\mathbb{R}^{m \times n}_{\le k}$ is ultimately the same as a line-search method on $\mathbb{R}^{m \times n}_k$.

In this paper, we develop an efficient algorithm for problem (1). The main contributions are as follows. First, we generalize the admissible set from $\mathbb{R}^{m \times n}_{\le k}$ to $\mathcal{M}_{\le k}$. Second, the proposed algorithm solves a rank inequality constrained problem while finding a suitable rank for the approximation of a locally optimal solution. Third, the proposed algorithm successively increases or decreases the rank by an adaptively chosen amount; the rank is therefore updated appropriately, which avoids an excessive rank increase and thereby saves computation and storage. Finally, theoretical convergence results are given.

The rest of this paper is organized as follows. A tangent cone descent algorithm is given in Section 2. Some theoretical results are presented in Section 3 without proofs. Finally, a performance study on the weighted low-rank approximation problem [6] is provided in Section 4.

2 A Tangent Cone Descent Algorithm

The following assumptions are made throughout this paper: (i) the closure of $\mathcal{M}_r := \{X \in \mathcal{M} : \operatorname{rank}(X) = r\}$ is a subset of or equal to $\mathcal{M}_{\le r}$; (ii) $\mathcal{M}_r$ is a manifold; (iii) the extension of the cost function $f$ in (1) to $\mathcal{M}$ is well defined and continuously differentiable.

The basic idea of the new algorithm is to apply Riemannian optimization methods on a fixed-rank manifold $\mathcal{M}_r$ while efficiently and effectively updating the rank $r$. For each fixed-rank manifold $\mathcal{M}_r$, efficient and well-understood Riemannian optimization algorithms, see e.g. [7, 8, 9], can be used directly. The main issue is to move from one fixed-rank manifold to another efficiently and effectively. A rank adjustment decision considers the following two functions: one is an extension of $f$ to $\mathcal{M}$, i.e., $\tilde f : \mathcal{M} \to \mathbb{R} : X \mapsto \tilde f(X)$ such that $f = \tilde f|_{\mathcal{M}_{\le k}}$, and the other is the restriction of $f$ to a fixed-rank manifold $\mathcal{M}_r$, i.e., $f_r : \mathcal{M}_r \to \mathbb{R} : X \mapsto f_r(X)$ such that $f_r = f|_{\mathcal{M}_r}$. The rank is increased at $X \in \mathcal{M}_r$ if $r < k$ and the following two conditions hold simultaneously:

Condition I (angle threshold $\epsilon_1$): $\tan(\angle(\operatorname{grad} \tilde f(X), \operatorname{grad} f_r(X))) > \epsilon_1$;
Condition II (difference threshold $\epsilon_2$): $\|\operatorname{grad} \tilde f(X) - \operatorname{grad} f_r(X)\| \ge \epsilon_2$,

where $\operatorname{grad} \tilde f(X)$ and $\operatorname{grad} f_r(X)$ denote the Riemannian gradients of $\tilde f$ and $f_r$ at a point $X \in \mathcal{M}_r$. The parameters $\epsilon_1$ and $\epsilon_2$ are important for finding exact or approximate solutions and for controlling the computational efficiency of the method. A smaller $\epsilon_1$ makes it easier to increase the rank at an iteration. The smaller $\epsilon_2$ is, the stricter the required accuracy of the approximate local minimizer. In particular, convergence to critical points of (1) is obtained if $\epsilon_2$ is set to 0.
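
The rank-increase test can be made concrete in the special case $\mathcal{M} = \mathbb{R}^{m \times n}$ with the Euclidean metric, where $\operatorname{grad} f_r(X)$ is the orthogonal projection of $\operatorname{grad} \tilde f(X)$ onto the tangent space of the fixed-rank manifold at $X = U S V^T$. The sketch below evaluates Conditions I and II; the helper names and the choice of metric are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def project_to_tangent(U, V, Z):
    """Project an ambient matrix Z onto the tangent space of the fixed-rank
    manifold at X = U S V^T (U, V orthonormal): Z - (I - UU^T) Z (I - VV^T)."""
    UUtZ = U @ (U.T @ Z)
    return UUtZ + (Z - UUtZ) @ V @ V.T

def should_increase_rank(grad_ambient, U, V, eps1, eps2):
    """Conditions I and II: increase the rank when the angle between
    grad f~ and grad f_r is large (tan > eps1) and the two gradients
    differ by at least eps2 in norm."""
    grad_r = project_to_tangent(U, V, grad_ambient)
    diff = np.linalg.norm(grad_ambient - grad_r)
    # grad f~ - grad f_r is orthogonal to grad f_r, so tan(angle) = diff / ||grad f_r||
    cond_I = diff > eps1 * np.linalg.norm(grad_r)
    cond_II = diff >= eps2
    return cond_I and cond_II

# Example: at a rank-2 point X = U S V^T, decide whether to increase the rank.
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((80, 2)))
V, _ = np.linalg.qr(rng.standard_normal((10, 2)))
G = rng.standard_normal((80, 10))          # stand-in for grad f~(X)
print(should_increase_rank(G, U, V, eps1=1.0, eps2=1e-4))
```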

Next, we discuss the details of updating the rank. The tangent cone to $\mathcal{M}_{\le k}$ at a point $X \in \mathcal{M}_{\le k}$, denoted $T_X \mathcal{M}_{\le k}$, is equal to the set of all smooth admissible directions for $\mathcal{M}_{\le k}$ at $X$ (see e.g. [10]). Note that $T_X \mathcal{M}_{\le k}$ is a linear space for $X \in \mathcal{M}_k$ but is not for any point $X \in \mathcal{M}_{\le k-1}$. The tangent cone $T_X \mathcal{M}_{\le k}$ contains the tangent space $T_X \mathcal{M}_r$ and also the directions of curves approaching $X$ through points of rank greater than $r$, but not beyond $k$, i.e., $T_X \mathcal{M}_{\le k} = T_X \mathcal{M}_r + \{\eta_{k-r} \in N_X \mathcal{M}_r \cap T_X \mathcal{M} : \operatorname{rank}(\eta_{k-r}) \le k - r\}$. The characterization of $T_X \mathcal{M}_{\le k}$ for $\mathcal{M} = \mathbb{R}^{m \times n}$ has been given in [4].

To obtain the next iterate on $\mathcal{M}_{\le k}$, a general line-search method is considered: $X_{n+1} = R_{X_n}(t_n \eta_{X_n, r})$, where $t_n$ is a step size and $R$ is a rank-related retraction defined below in Definition 1.

Definition 1 (Rank-related retraction) Let $X \in \mathcal{M}_r$. A mapping $R_X : T_X \mathcal{M}_{\le k} \to \mathcal{M}_{\le k}$ is a rank-related retraction if, for all $\eta_X \in T_X \mathcal{M}_{\le k}$: (i) $R_X(0) = X$; (ii) there exists $\delta > 0$ such that $[0, \delta) \ni t \mapsto R_X(t \eta_X)$ is smooth and $R_X(t \eta_X) \in \mathcal{M}_{\tilde r}$ for all $t \in [0, \delta)$, where $\tilde r$ is the integer such that $r \le \tilde r \le k$, $\eta_X \in T_X \mathcal{M}_{\le \tilde r}$, and $\eta_X \notin T_X \mathcal{M}_{\le \tilde r - 1}$; (iii) $\frac{d}{dt} R_X(t \eta_X)\big|_{t=0} = \eta_X$.

It can be shown that a rank-related retraction always exists. Note that $R_X$ is not necessarily a retraction on $\mathcal{M}_{\le k}$, since it may not be smooth on the tangent bundle $T\mathcal{M}_{\le k} := \bigcup_{X \in \mathcal{M}_{\le k}} T_X \mathcal{M}_{\le k}$. The modified Riemannian optimization algorithm is sketched as Alg. 1. Generally speaking, $\eta$ in Step 7 of Alg. 1 is not unique, and any such $\eta$ can be used to change to rank $\tilde r$. The key to the efficiency of the algorithm is to choose $\tilde r$ correctly and then choose an appropriate $\eta$. In [4], a similar idea is developed for $\mathcal{M} = \mathbb{R}^{m \times n}$ and $\tilde r = k$. The authors define the search direction as the projection of the negative gradient onto the tangent cone. If $\mathcal{M} = \mathbb{R}^{m \times n}$ and $\tilde r = k$, the choice of $\eta$ in Step 7 is equivalent to the corresponding definition in [4].
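
For $\mathcal{M} = \mathbb{R}^{m \times n}$, one simple way to realize a rank-related retraction in the sense of Definition 1 is to move along the ambient line $X + t\eta$ and truncate back to the rank $\tilde r$ associated with $\eta$. The sketch below is a plausible instantiation under that assumption, not necessarily the construction used in the paper's experiments.

```python
import numpy as np

def rank_related_retraction(X, eta, r_tilde, t=1.0):
    """Map the step t*eta at X to the best rank-r_tilde approximation of
    X + t*eta (truncated SVD), so the result lies on the fixed-rank
    manifold M_{r_tilde}."""
    U, s, Vt = np.linalg.svd(X + t * eta, full_matrices=False)
    return U[:, :r_tilde] * s[:r_tilde] @ Vt[:r_tilde, :]

# The result lies on the rank-3 fixed-rank manifold.
rng = np.random.default_rng(2)
X = rng.standard_normal((80, 2)) @ rng.standard_normal((2, 10))   # rank-2 point
eta = rng.standard_normal((80, 10))                               # a search direction
print(np.linalg.matrix_rank(rank_related_retraction(X, eta, r_tilde=3)))   # 3
```

An Armijo-type backtracking on the step size $t$, as in Step 9 of Alg. 1, would then evaluate $f$ at such retracted points until sufficient decrease is achieved.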

3 Main Theoretical Results

Supposing that the cost function $f$ is continuously differentiable and that the algorithm chosen in Step 2 of Alg. 1 has the property of global convergence to critical points of $f_r$, we state the following main results. The proofs are omitted due to space constraints (see [13]).

Theorem 1 (Global Result) The sequence $\{X_n\}$ generated by Alg. 1 satisfies
$\liminf_{n \to \infty} \left\| P_{T_{X_n} \mathcal{M}_{\le k}}\left(\operatorname{grad} \tilde f(X_n)\right) \right\| \le \frac{1}{2}\sqrt{1 + \epsilon_1^{2}}\;\epsilon_2$.

Theorem 2 (Local Result) Let $\tilde f$ be a $C^3$ function and $X^*$ be a nondegenerate minimizer of $\tilde f$ with rank $r^*$. Suppose $\epsilon_2 > 0$, and denote the sequence of iterates generated by Algorithm 1 by $\{X_n\}$. There exists a neighborhood $U_{X^*}$ of $X^*$ such that if $D = \{X \in \mathcal{M}_{\le k} : f(X) \le f(X_0)\} \subseteq U_{X^*}$, $D$ is compact, and $\hat f : T\mathcal{M} \to \mathbb{R} : \xi \mapsto \tilde f(R(\xi))$ is a radially L-$C^1$ function with sufficiently large $\beta_{RL}$ (defined in [3, Definition 7.4.1]) such that $\|R_X^{-1}(Y)\| < \beta_{RL}$ for any $X, Y \in D$, then there exists $N > 0$ such that, for all $n > N$, $\operatorname{rank}(X_n) = r^*$ and $X_n \in U_{X^*}$.

Algorithm 1 Modified Riemannian Optimization Algorithm
1:  for n = 0, 1, 2, ... do
2:    Apply a Riemannian algorithm (e.g. one of GenRTR [7], RBFGS [8, 9, 11], RTR-SR1 [9, 12]) to approximately optimize $f$ over $\mathcal{M}_r$ with initial point $X_n$, and stop at $\tilde X_n \in \mathcal{M}_r$, where either $\|\operatorname{grad} f_r(\tilde X_n)\| < \epsilon$ (flag $\leftarrow$ 1) or $\tilde X_n$ is close to $\mathcal{M}_{\le r-1}$ (flag $\leftarrow$ 0);
3:    if flag = 1 then
4:      if both Conditions I and II are satisfied then
5:        Set $\tilde r$ and $\eta$ to be $r$ and $-\operatorname{grad} f_r(\tilde X_n)$, respectively;
6:        while $\|{-\operatorname{grad} \tilde f(\tilde X_n)} - \eta\| > \epsilon_1 \|\eta\|$ do
7:          Set $\tilde r$ to be $\tilde r + 1$ and $\eta \leftarrow \operatorname{argmin}_{\tilde \eta \in T_{\tilde X_n} \mathcal{M}_{\le \tilde r}} \|{-\operatorname{grad} \tilde f(\tilde X_n)} - \tilde \eta\|_F$;
8:        end while
9:        Obtain $X_{n+1}$ by applying an Armijo-type line search algorithm along $\eta$ using a rank-related retraction;
10:     else
11:       If $\epsilon$ is small enough, stop; otherwise set $\epsilon \leftarrow \tau \epsilon$, where $\tau \in (0, 1)$;
12:     end if
13:   else {flag = 0}
14:     If the rank has not been increased on any previous iteration, reduce the rank of $\tilde X_n$ by truncation while keeping the function value decreasing, update $r$, and obtain the next iterate $X_{n+1}$;
15:     Else reduce the rank of $\tilde X_n$ such that the next iterate $X_{n+1}$ satisfies $f(X_{n+1}) - f(X_i) \le c\,(f(X_{i+1}) - f(X_i))$, where $i$ is such that the latest rank increase was from $X_i$ to $X_{i+1}$ and $0 < c < 1$; set $r$ to be the rank of $X_{n+1}$;
16:   end if
17: end for

4 Application

We illustrate the effectiveness of Alg. 1 on the weighted low-rank approximation problem:

$\min_{X \in \mathcal{M}_{\le k}} f(X) = \|A - X\|_W^2 = \operatorname{vec}\{A - X\}^T\, W\, \operatorname{vec}\{A - X\}$,    (2)

where $\mathcal{M} = \mathbb{R}^{80 \times 10}$, $A$ is given, $W \in \mathbb{R}^{800 \times 800}$ is a positive definite symmetric weighting matrix, and $\operatorname{vec}\{A\}$ denotes the vectorized form of $A$, i.e., a vector constructed by stacking the consecutive columns of $A$ in one vector. The matrix $A$ is generated as $A_1 A_2^T \in \mathbb{R}^{m \times n}$, where $A_1 \in \mathbb{R}^{80 \times 5}$ and $A_2 \in \mathbb{R}^{10 \times 5}$. The weighting matrix is $W = U \Sigma U^T$, where $U \in \mathbb{R}^{800 \times 800}$ is a random orthogonal matrix generated by Matlab's QR and RAND. The $mn$ singular values of the weighting matrix are generated by the Matlab function LOGSPACE with condition number 100 and multiplied, element-wise, by a matrix drawn from the uniform distribution on the interval [0.5, 1.5]. Three values of $k$ are considered: one less than the true rank, one equal to the true rank, and one greater than the true rank.
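
For reference, the sketch below evaluates the cost in (2) and generates test data roughly following the description above (a rank-5 matrix $A = A_1 A_2^T$ and a symmetric positive definite $W$ with condition number about 100); the random-number details are assumptions rather than the authors' exact Matlab setup.

```python
import numpy as np

def weighted_cost(X, A, W):
    """f(X) = vec(A - X)^T W vec(A - X), the weighted low-rank cost (2).
    vec() stacks the columns of the matrix, matching the paper's convention."""
    r = (A - X).flatten(order="F")    # column-stacking vectorization
    return r @ (W @ r)

rng = np.random.default_rng(0)
m, n, true_rank = 80, 10, 5
A = rng.standard_normal((m, true_rank)) @ rng.standard_normal((n, true_rank)).T
Q, _ = np.linalg.qr(rng.standard_normal((m * n, m * n)))          # random orthogonal U
eigs = np.logspace(0, 2, m * n) * rng.uniform(0.5, 1.5, m * n)    # condition number ~100
W = Q @ np.diag(eigs) @ Q.T                                       # SPD weighting matrix
print(weighted_cost(np.zeros((m, n)), A, W))
```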

Four algorithms are compared: Alg. 1, DMM [14], SULS [4], and APM [6]. The initial point of Alg. 1 and SULS is a randomly generated rank-1 matrix, and the initial points of DMM and APM are randomly generated $n$-by-$(n-k)$ and $m$-by-$(m-k)$ matrices, respectively. The results shown in Table 1 are the average of 100 runs over different data matrices $A$, weighting matrices $W$, and initial points.

Table 1: Method comparisons. For each of $k = 3$, $5$, and $7$ and each of the four methods (Alg. 1, DMM, SULS, APM), the table reports the rank found, the final cost $f$, the relative error $\|A - X\|_W / \|A\|_W$, and the run time in seconds. The number in parentheses indicates the fraction of experiments in which the numerical rank (the number of singular values greater than $10^{-8}$) found by the algorithm equals the true rank. Alg. 1 and SULS are stopped when the norm of the final gradient on the fixed-rank manifold divided by the norm of the initial full gradient is less than $10^{-8}$, while DMM and APM are stopped when the norm of the final gradient divided by the norm of the initial gradient is less than $10^{-7}$. [The numerical entries of the table are not reproduced here.]

The results show that Alg. 1 with $\epsilon_2 = 10^{-4}$ (GenRTR is used as the inner algorithm; when the sizes become larger, limited-memory RBFGS [9, 11], for example, may be a better choice) has significant advantages compared with the other methods. It achieves good accuracy in the approximation with less computational time. Furthermore, for Alg. 1, all Riemannian objects, their updates, and the iterates $X_n$ are based on an efficient three-factor representation $U D V^T$, so the singular values are immediately available, while for the other three methods an additional SVD is required.

References

[1] B. Mishra, G. Meyer, F. Bach, and R. Sepulchre. Low-rank optimization with trace norm penalty. SIAM Journal on Optimization, 23(4):2124–2149, 2013.

[2] M. Journée, F. Bach, P.-A. Absil, and R. Sepulchre. Low-rank optimization on the cone of positive semidefinite matrices. SIAM Journal on Optimization, 20(5):2327–2351, 2010.

[3] P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton, NJ, 2008.

[4] R. Schneider and A. Uschmajew. Convergence results for projected line-search methods on varieties of low-rank matrices via Łojasiewicz inequality. arXiv e-prints, February 2014.

[5] A. Uschmajew and B. Vandereycken. Line-search methods and rank increase on low-rank matrix varieties. In Proceedings of the 2014 International Symposium on Nonlinear Theory and its Applications (NOLTA2014), 2014.

[6] W.-S. Lu, S.-C. Pei, and P.-H. Wang. Weighted low-rank approximation of general complex matrices and its application in the design of 2-D digital filters. IEEE Transactions on Circuits and Systems I, 44:650–655, 1997.

[7] P.-A. Absil, C. G. Baker, and K. A. Gallivan. Trust-region methods on Riemannian manifolds. Foundations of Computational Mathematics, 7(3):303–330, 2007.

[8] W. Ring and B. Wirth. Optimization methods on Riemannian manifolds and their application to shape space. SIAM Journal on Optimization, 22(2):596–627, January 2012.

[9] W. Huang. Optimization algorithms on Riemannian manifolds with applications. PhD thesis, Florida State University, 2013.

[10] D. B. O'Shea and L. C. Wilson. Limits of tangent spaces to real surfaces. American Journal of Mathematics, 126(5):951–980, 2004.

[11] W. Huang, K. A. Gallivan, and P.-A. Absil. A Broyden class of quasi-Newton methods for Riemannian optimization. Technical Report UCL-INMA-2014.01, U.C. Louvain, submitted for publication, 2015.

[12] W. Huang, P.-A. Absil, and K. A. Gallivan. A Riemannian symmetric rank-one trust-region method. Mathematical Programming, February 2014. doi:10.1007/s10107-014-0765-1.

[13] G. Zhou. Rank-constrained optimization: A Riemannian approach. PhD thesis, Florida State University, in preparation.

[14] I. Brace and J. H. Manton. An improved BFGS-on-manifold algorithm for computing low-rank weighted approximations. In Proceedings of the 17th International Symposium on Mathematical Theory of Networks and Systems, 2006.