Optimization on the Grassmann manifold: a case study
Konstantin Usevich and Ivan Markovsky
Department ELEC, Vrije Universiteit Brussel
28 March 2013, 32nd Benelux Meeting on Systems and Control, Houffalize, Belgium
Introduction: optimization on the Grassmann manifold

Grassmann manifold: Gr(d, m) := { d-dimensional subspaces of R^m }

[Figure: a cost function f(L) over L in Gr(d, m)]

Example: fitting points x_1, ..., x_N in R^m to a d-dimensional subspace L:

    f_1(L) = \sum_{k=1}^{N} \rho(x_k, L)

Applications: computer vision, machine learning, system identification, ..., matrix/tensor low-rank factorization/approximation, ...
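For the special case where \rho(x_k, L) is the squared Euclidean distance from x_k to L (an assumption made here for illustration; the talk's setting allows general \rho), the best-fitting subspace has a closed-form solution via the SVD. A minimal numpy sketch:

```python
import numpy as np

def subspace_fit_error(X, P):
    """Sum of squared distances from the rows of X (the points x_k) to the
    subspace spanned by the columns of an orthonormal basis P (m x d)."""
    residual = X - (X @ P) @ P.T           # subtract the orthogonal projection
    return np.sum(residual ** 2)

def best_subspace(X, d):
    """Optimal d-dimensional subspace for squared distances: spanned by the
    d dominant right singular vectors of the data matrix."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:d].T                        # m x d orthonormal basis

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2)) @ rng.standard_normal((2, 3))  # rank-2 points in R^3
X += 0.01 * rng.standard_normal(X.shape)                          # small noise
P = best_subspace(X, d=2)
print(subspace_fit_error(X, P))            # small: the points nearly fit a plane
```

For other choices of \rho there is in general no closed form, which is what motivates numerical optimization over Gr(d, m).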
Introduction: what is this talk about

Grassmann manifold: Gr(d, m) := { d-dimensional subspaces of R^m }

(Absil, Mahony, Sepulchre 2008): how to define derivatives, how to optimize; atlases, tangent bundles, Riemannian metric, ...

This talk: (re)statement of the problem, reformulations as min_{x \in R^n} f(x)
Introduction: problem (re)statement

    minimize over R \in R^{d \times m}   f(R)   subject to R \in R_f,

where R_f := { R \in R^{d \times m} : rank R = d }, and f(R) = f(UR) for every nonsingular U \in R^{d \times d}.  (For d = 1: R_f = R^m \ {0}.)

Problems:
- How to impose the constraint R \in R_f?
- If R is a minimum, then UR is also a minimum ...

A solution: reduce the search space.
Reduction of the search space

    minimize over R \in R^{d \times m}   f(R)   subject to R \in R_f

How to reduce the search space:
1. R R^T = I_d: an orthonormal basis represents all subspaces, but the constraint is nonlinear.
2. R = [X  I_d], where X \in R^{d \times (m-d)}: represents almost all subspaces (all R with nonsingular trailing block R_{:, (m-d+1):m}). See the sketch below.
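A numpy sketch of option 2 (the helper names are mine, chosen for illustration): the chart representative X is obtained by normalizing the trailing d x d block to the identity, and it is invariant under R -> UR, so it is a genuine coordinate for the subspace:

```python
import numpy as np

def to_X(R):
    """Chart representative X such that U R = [X  I_d], assuming the trailing
    d x d block of R is nonsingular (hypothetical helper for illustration)."""
    d, m = R.shape
    return np.linalg.solve(R[:, m - d:], R[:, :m - d])   # X = R_last^{-1} R_lead

def from_X(X):
    """Embed X (d x (m-d)) back as R = [X  I_d]."""
    return np.hstack([X, np.eye(X.shape[0])])

rng = np.random.default_rng(0)
R = rng.standard_normal((2, 5))
U = rng.standard_normal((2, 2))            # generically nonsingular
print(np.allclose(to_X(R), to_X(U @ R)))   # True: X depends only on the row space
```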
Outline

- Introduction: problem (re)statement
- Optimization over [X  I_d] \Pi
- Optimization with R R^T = I_d
- A case study
Optimization over [X I_d]Π: parametrizations with permutation matrices

[X  I_d] does not represent all d-dimensional subspaces.

Solution: consider all permutation matrices \Pi \in R^{m \times m} and matrices of the form [X  I_d] \Pi. This corresponds to fixing R_{:, J} = I_d for some column index set J.

Moreover, for d = 1 we can select \Pi such that |X_{1,j}| \le 1 for all j.

Question: for d > 1, can we consider only X from a bounded subset of R^{d \times (m-d)}?
Bounded parametrizations with permutation matrices

Theorem (Knuth, 1980). For any R \in R^{d \times m} with rank R = d, there exist U and \Pi such that

    U R = [X  I_d] \Pi,   |X_{ij}| \le 1 for all (i, j).

Proof (sketch):
1. Find the d x d submatrix of R with maximal determinant in absolute value (maximal volume): J_0 = arg max_J |det R_{:, J}|.
2. Take \Pi_0 that permutes the columns R_{:, J_0} into the positions of R_{:, (m-d+1):m}.
3. Take [X  I_d] = (R_{:, J_0})^{-1} R \Pi_0. By Cramer's rule, |X_{i,j}| \le 1.
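The proof is constructive. A greedy variant of it ("maxvol"-style column swaps; this is my illustrative implementation, with a tolerance standing in for exact arithmetic, not the talk's code) finds a column set J with all |X_ij| <= 1:

```python
import numpy as np
from scipy.linalg import qr

def maxvol_columns(R, max_iter=100, tol=1e-12):
    """Greedy search for a column set J such that X = R[:, J]^{-1} R has
    |X_ij| <= 1 everywhere. Each swap multiplies |det R[:, J]| by
    |X[i, j]| > 1, so the loop terminates (the maximal-volume argument)."""
    d, m = R.shape
    _, _, piv = qr(R, pivoting=True)       # pivoted QR gives a nonsingular start
    J = list(piv[:d])
    for _ in range(max_iter):
        X = np.linalg.solve(R[:, J], R)    # X[:, J] == I_d by construction
        i, j = np.unravel_index(np.abs(X).argmax(), X.shape)
        if abs(X[i, j]) <= 1 + tol:        # bounded chart found
            return J, X
        J[i] = j                           # swap column j into the basis
    return J, X

rng = np.random.default_rng(0)
R = rng.standard_normal((3, 8))
J, X = maxvol_columns(R)
print(np.abs(X).max() <= 1 + 1e-12)        # True: |X_ij| <= 1 for all (i, j)
```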
Optimization over permutations

    minimize over permutations \Pi and X \in [-1, 1]^{d \times (m-d)}   f([X  I_d] \Pi)

is equivalent to

    minimize over R \in R_f   f(R).

There are m!/(m-d)! permutations \Pi ...
... but we can switch the permutation in the course of optimization, whenever |X_{i,j}| > \delta \ge 1 for some (i, j).
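A sketch of the chart-switching idea, reusing maxvol_columns from the sketch above and scipy's general-purpose minimizer. The function names and the restart-based switching policy (re-chart between solver runs rather than mid-run) are illustrative assumptions; the talk does not prescribe a particular solver:

```python
import numpy as np
from scipy.optimize import minimize

def embed(x, J, d, m):
    """Build R in the chart fixed by J: identity on columns J, x elsewhere."""
    R = np.empty((d, m))
    R[:, J] = np.eye(d)
    rest = [k for k in range(m) if k not in J]
    R[:, rest] = x.reshape(d, m - d)
    return R

def minimize_with_switching(f, R0, delta=2.0, max_switches=20):
    """Minimize an invariant f over charts R = [X I_d]Pi, re-selecting the
    chart (maxvol columns, as above) whenever the optimizer leaves the box
    |X_ij| <= delta. A restart-based sketch, not the talk's exact scheme."""
    d, m = R0.shape
    R = R0
    for _ in range(max_switches):
        J, X = maxvol_columns(R)                    # chart with |X_ij| <= 1
        rest = [k for k in range(m) if k not in J]
        res = minimize(lambda x: f(embed(x, J, d, m)),
                       X[:, rest].ravel(), method="Nelder-Mead")
        R = embed(res.x, J, d, m)
        if np.abs(res.x).max() <= delta:            # stayed inside the box: accept
            return R
    return R
```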
Optimization with R R^T = I_d: orthonormal basis

    minimize over R \in R^{d \times m}   f(R)   subject to R R^T = I_d

Disadvantage: nonlinear constraint.

Possible solutions:
1. Lagrange multipliers;
2. penalty: f(R) + \gamma \|R R^T - I_d\|_F^2.  Does \gamma have to go to infinity? Not at all (see the theorem below).
Penalty method for orthonormal bases

    minimize over R \in R_f   f(R)   subject to R R^T = I_d,          (*)

where f(R) = f(UR) for every nonsingular U.

Theorem. For any \gamma > 0, the local minima of

    minimize over R \in R_f   f(R) + \gamma \|R R^T - I_d\|_F^2

coincide with the local minima of (*).

Proof. Let R = U \Sigma V^T be an SVD of R. Then

    f(R) + \gamma \|R R^T - I_d\|_F^2  \ge  f(V^T) + \gamma \|V^T V - I_d\|_F^2  =  f(V^T)  =  f(R).
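A numerical check of the theorem's mechanism. The invariant cost f below is an arbitrary example of my own (distance from a fixed matrix to the row-space projector), not a cost from the talk; the point is only that replacing R by the orthonormal SVD factor V^T preserves f and zeroes the penalty:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))            # fixed data for the example cost

def f(R):
    """Example invariant cost: distance from A to the orthogonal projector
    onto the row space of R; satisfies f(UR) = f(R)."""
    P = R.T @ np.linalg.solve(R @ R.T, R)  # projector depends on row space only
    return np.linalg.norm(P - A, "fro") ** 2

def penalized(R, gamma=1.0):
    d = R.shape[0]
    return f(R) + gamma * np.linalg.norm(R @ R.T - np.eye(d), "fro") ** 2

R = rng.standard_normal((2, 6))
_, _, Vt = np.linalg.svd(R, full_matrices=False)   # R = U S V^T, V^T orthonormal
print(np.isclose(f(R), f(Vt)))             # True: invariance, f(R) = f(V^T)
print(penalized(Vt) <= penalized(R))       # True: the penalty only adds at R
```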
A case study: structured low-rank approximation

System identification, model reduction, etc. lead to structured low-rank approximation.

Structured low-rank approximation: given p \in R^{n_p}, a structure S, and r < m,

    minimize over \hat{p}   \|p - \hat{p}\|_2^2   subject to rank S(\hat{p}) \le r.        (SLRA)

Kernel representation + variable projection:

    minimize over R \in R^{d \times m}, rank R = d   f(R) := ( min over \hat{p}   \|p - \hat{p}\|_2^2   subject to R S(\hat{p}) = 0 ),

where d = m - r.
Structured low-rank approximation: an example

Hankel SLRA: given w = (w_1, ..., w_n),

    g([r_1  r_2  r_3]) := min over \hat{w}   \|w - \hat{w}\|_2^2   subject to

    [r_1  r_2  r_3] \begin{bmatrix} \hat{w}_1 & \hat{w}_2 & \cdots & \hat{w}_{n-2} \\ \hat{w}_2 & \hat{w}_3 & \cdots & \hat{w}_{n-1} \\ \hat{w}_3 & \hat{w}_4 & \cdots & \hat{w}_n \end{bmatrix} = 0.

Minimizing f(R) solves the SLRA problem; note f(\alpha R) = f(R) for \alpha \ne 0.
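For this example the inner minimization is a least-norm correction under a linear constraint and can be evaluated in closed form: the constraint r^T H(\hat{w}) = 0 reads G(r) \hat{w} = 0 with G banded, so min \|w - \hat{w}\|_2^2 = (Gw)^T (G G^T)^{-1} (Gw). A numpy sketch (the name g mirrors the slide; the Fibonacci test data is my own illustration):

```python
import numpy as np

def g(r, w):
    """Inner minimum for the 3-row Hankel example: min ||w - w_hat||^2
    subject to G(r) w_hat = 0, evaluated in closed form."""
    n = len(w)
    G = np.zeros((n - 2, n))
    for j in range(n - 2):
        G[j, j:j + 3] = r                  # row j: r1*w_j + r2*w_{j+1} + r3*w_{j+2}
    Gw = G @ w
    return Gw @ np.linalg.solve(G @ G.T, Gw)

# Fibonacci data satisfies w_{k+2} = w_{k+1} + w_k exactly, so the kernel
# vector r = [1, 1, -1] gives a zero residual, while other r do not.
w = np.array([1.0, 1.0])
for _ in range(8):
    w = np.append(w, w[-1] + w[-2])        # length n = 10
print(np.isclose(g(np.array([1.0, 1.0, -1.0]), w), 0.0))  # True
print(g(np.array([1.0, 0.0, -1.0]), w) > 0.0)             # True
```

The scale invariance on the slide is visible in the closed form: G(\alpha r) = \alpha G(r), so the factors of \alpha cancel and g(\alpha r) = g(r).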
A case study: f([x_1  x_2  1]) (3 x 10 Hankel matrix)

[Figure: plot of f([x_1  x_2  1]) over (x_1, x_2) \in [-1.5, 1.5]^2]
Some results

Mosaic Hankel low-rank approximation (LTI system identification on DAISY datasets); approximation errors are measured as percentage fits 100% \cdot (1 - \|p - \hat{p}\|_2^2 / \|p\|_2^2).

  test #   [X I_d]   [X I_d]\Pi   genrtr   R R^T = I_d   \|R R^T - I_d\|_F^2
    1       99.96      99.96      99.96      99.96            99.96
    2       99.93      99.93      99.92      99.92            99.91
    3       98.66      98.66      98.66      97.8             98.58
    4       96.45      96.45      95.69      95.71            95.69
    5       90.26      90.27      86.46      85.69            82.71

Table: percentage fits of the methods
Conclusions

- Two reformulations as optimization over a Euclidean space: freedom in choosing the optimization method.
- Issues: bound on the number of permutation switches, choosing the regularization parameter \gamma, ...

References:
- I. Markovsky, K. Usevich (in press). Structured low-rank approximation with missing values. SIAM J. Matrix Anal. Appl.
- K. Usevich, I. Markovsky (2012). Global and local optimization methods for structured low-rank approximation. Submitted.
- http://github.com/slra/slra/ - software and reproducible experiments