Optimization on the Grassmann manifold: a case study

Optimization on the Grassmann manifold: a case study. Konstantin Usevich and Ivan Markovsky, Department ELEC, Vrije Universiteit Brussel. 28 March 2013, 32nd Benelux Meeting on Systems and Control, Houffalize, Belgium.

Introduction: Optimization on the Grassmann manifold

Grassmann manifold: Gr(d, m) := { d-dimensional subspaces of R^m }

    min_{L ∈ Gr(d,m)} f(L)

Example: fitting x_1, ..., x_N ∈ R^m to a d-dimensional subspace L,

    f_1(L) = Σ_{k=1}^{N} ρ(x_k, L)

Examples: computer vision, machine learning, system identification, ..., matrix/tensor low-rank factorization/approximation, ...
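As a concrete illustration of the subspace-fitting example, here is a minimal numpy sketch in which ρ is taken to be the squared Euclidean distance (the slide leaves ρ generic); the function names and the toy data are illustrative only.

```python
import numpy as np

def subspace_fit_cost(X, basis):
    """f_1(L) = sum of distances of the points x_1, ..., x_N (columns of X)
    to the subspace L = span(basis); here rho is the squared Euclidean
    distance (an assumption, the slide leaves rho generic)."""
    Q, _ = np.linalg.qr(basis)            # orthonormal basis of L
    residuals = X - Q @ (Q.T @ X)         # components of the points outside L
    return np.sum(residuals ** 2)

def best_subspace(X, d):
    """For squared Euclidean distances the optimal d-dimensional subspace is
    spanned by the d leading left singular vectors of X (classical PCA fact)."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :d]

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 100))  # points near a plane in R^5
X += 0.01 * rng.standard_normal(X.shape)
print(subspace_fit_cost(X, best_subspace(X, d=2)))               # small residual cost
```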

Introduction: What is this talk about

Grassmann manifold: Gr(d, m) := { d-dimensional subspaces of R^m }

    min_{L ∈ Gr(d,m)} f(L)

(Absil, Mahony, Sepulchre 2008): how to define derivatives, how to optimize; atlases, tangent bundles, Riemannian metric, ...

This talk: (re)statement of the problem, reformulations as min_{X ∈ R^n} f(X)

Introduction: Problem (re)statement

    min_{R ∈ R^{d×m}} f(R) subject to R ∈ R_f,

where R_f := { R ∈ R^{d×m} : rank R = d } and f(R) = f(UR) for every nonsingular U ∈ R^{d×d}. (For d = 1: R_f = R^m \ {0}.)

Problems: How to impose the constraint R ∈ R_f? If R is a minimum, then UR is also a minimum ...

A solution: reduce the search space.

Introduction: Reduction of the search space

    min_{R ∈ R^{d×m}} f(R) subject to R ∈ R_f,

where R_f := { R ∈ R^{d×m} : rank R = d } and f(R) = f(UR) for every nonsingular U ∈ R^{d×d}. (For d = 1: R_f = R^m \ {0}.)

How to reduce the search space (a small numerical sketch of both reductions is given below):
1. RR^T = I_d: an orthonormal basis represents all subspaces, but the constraint is nonlinear
2. R = [ X  I_d ] with X ∈ R^{d×(m−d)}: represents almost all subspaces (all R with nonsingular R_{:,(m−d+1):m})
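A minimal numpy sketch of the two reductions; for reduction 2 it assumes the trailing d×d block of R is nonsingular, and the function names are illustrative.

```python
import numpy as np

def orthonormal_representative(R):
    """Reduction 1: replace R by an orthonormal basis of its row space,
    so the representative satisfies R R^T = I_d (nonlinear constraint)."""
    _, _, Vt = np.linalg.svd(R, full_matrices=False)
    return Vt

def xid_representative(R):
    """Reduction 2: normalize R to the form [X, I_d]; only possible when the
    trailing d x d block of R is nonsingular (covers almost all subspaces)."""
    d, m = R.shape
    U = np.linalg.inv(R[:, m - d:])       # the nonsingular change of basis
    return (U @ R)[:, : m - d]            # the free parameter X in R^{d x (m-d)}

rng = np.random.default_rng(0)
R = rng.standard_normal((2, 5))           # a random 2 x 5 matrix has rank d = 2 a.s.
Q = orthonormal_representative(R)
X = xid_representative(R)
print(np.allclose(Q @ Q.T, np.eye(2)))    # True: nonlinear constraint satisfied
print(X.shape)                            # (2, 3): unconstrained parametrization
```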

Outline
  Introduction: problem (re)statement
  Optimization over [X I_d]Π
  Optimization with RR^T = I_d
  A case study

Optimization over [X I_d]Π: Parametrizations with permutation matrices

[ X  I_d ] alone does not represent all d-dimensional subspaces.

Solution: consider all permutation matrices Π ∈ R^{m×m} and matrices of the form [ X  I_d ] Π. This corresponds to fixing R_{:,J} = I_d for the index set J selected by Π.

Moreover, for d = 1 we can select Π such that |X_{1,j}| ≤ 1 for all j.

Question: for d > 1, can we consider only X from a bounded subset of R^{d×(m−d)}?

Optimization over [X I_d]Π: Bounded parametrizations with permutation matrices

Theorem (Knuth, 1980). For any R ∈ R^{d×m} with rank R = d, there exist a nonsingular U and a permutation Π such that

    UR = [ X  I_d ] Π,   with |X_{ij}| ≤ 1 for all (i, j).

Proof (sketch):
1. Find the d×d submatrix with maximal determinant in absolute value (maximal volume): J_0 = arg max_J |det R_{:,J}|
2. Take Π_0 that permutes the columns R_{:,J_0} and R_{:,(m−d+1):m}
3. Take [ X  I_d ] = (R_{:,J_0})^{−1} R Π_0

By Cramer's rule, |X_{i,j}| ≤ 1.
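A brute-force numerical illustration of the proof, assuming m is small enough to enumerate all d-element column subsets; this is a sketch of the construction, not the algorithm used in the authors' software.

```python
import numpy as np
from itertools import combinations

def bounded_parametrization(R):
    """Return X and a column permutation `perm` such that, with
    U = inv(R[:, J0]), (U @ R)[:, perm] = [X, I_d] and |X_ij| <= 1.
    Enumerating all d-subsets is exponential in m: tiny examples only."""
    d, m = R.shape
    # 1. d x d submatrix with maximal |det| (maximal volume)
    J0 = max(combinations(range(m), d),
             key=lambda J: abs(np.linalg.det(R[:, list(J)])))
    # 2. permutation moving the columns J0 to the last d positions
    perm = [j for j in range(m) if j not in J0] + list(J0)
    # 3. normalize so that the last d columns become the identity
    XI = np.linalg.solve(R[:, list(J0)], R[:, perm])
    return XI[:, : m - d], perm            # |X_ij| <= 1 by Cramer's rule

rng = np.random.default_rng(1)
X, perm = bounded_parametrization(rng.standard_normal((2, 6)))
print(np.max(np.abs(X)) <= 1 + 1e-12)      # True
```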

Optimization over [X I_d]Π: Optimization over permutations

    min_{R ∈ R_f} f(R)   is equivalent to   min over permutations Π of  min_{X ∈ [−1,1]^{d×(m−d)}} f([ X  I_d ] Π)

There are m!/(m−d)! permutations Π ...

... but we can switch the permutation in the course of optimization, if |X_{i,j}| > δ for some (i, j), where δ ≥ 1 is a chosen threshold.
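A minimal sketch of the switching rule, reusing the maximal-volume selection from the previous sketch; `perm`, `delta`, and the brute-force subset search are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from itertools import combinations

def maybe_switch(X, perm, delta=1.0):
    """`perm` lists, for each column of [X, I_d], its index in the original
    matrix R.  If some |X_ij| exceeds the threshold delta (>= 1), re-select
    the d identity columns by the maximal-volume rule (brute force, small m
    only) and return the new (X, perm); otherwise return the input unchanged."""
    if np.max(np.abs(X)) <= delta:
        return X, perm
    d, md = X.shape
    m = d + md
    R = np.empty((d, m))
    R[:, perm] = np.hstack([X, np.eye(d)])          # R in the original column order
    J0 = max(combinations(range(m), d),
             key=lambda J: abs(np.linalg.det(R[:, list(J)])))
    new_perm = [j for j in range(m) if j not in J0] + list(J0)
    XI = np.linalg.solve(R[:, list(J0)], R[:, new_perm])
    return XI[:, :md], new_perm

# d = 1, m = 4: one entry of X exceeds 1, so the permutation is switched
X, perm = np.array([[2.0, 0.3, -0.4]]), [0, 1, 2, 3]
X, perm = maybe_switch(X, perm)
print(np.max(np.abs(X)) <= 1 + 1e-12, perm)          # True [1, 2, 3, 0]
```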

Optimization with RR^T = I_d: Orthonormal basis

    min_{R ∈ R^{d×m}} f(R) subject to RR^T = I_d

Disadvantage: nonlinear constraint.

Possible solutions:
1. Lagrange multipliers
2. penalty: f(R) + γ ||RR^T − I_d||_F^2, with γ → ∞? Not at all.

Optimization with RR^T = I_d: Penalty method for orthonormal bases

    min_{R ∈ R_f} f(R) subject to RR^T = I_d,   (*)

where f(R) = f(UR) for every nonsingular U.

Theorem. For any γ > 0, the local minima of

    min_{R ∈ R_f} f(R) + γ ||RR^T − I_d||_F^2

coincide with the local minima of (*).

Proof. Let R = UΣV^T be an SVD of R. Then

    f(R) + γ ||RR^T − I_d||_F^2 ≥ f(V^T) + γ ||V^T V − I_d||_F^2 = f(V^T) = f(R).
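A sketch of the penalty reformulation and of the proof's key step, with an illustrative invariant cost f built from orthogonal projectors; the function names and the toy cost are assumptions, not part of the slides.

```python
import numpy as np

def penalized(f, gamma):
    """Unconstrained objective f(R) + gamma * ||R R^T - I_d||_F^2; f is
    assumed to satisfy f(U R) = f(R) for every nonsingular U."""
    def g(R):
        return f(R) + gamma * np.linalg.norm(R @ R.T - np.eye(R.shape[0])) ** 2
    return g

def row_space_basis(R):
    """Orthonormal basis V^T of the row space of R (thin SVD), as in the proof."""
    return np.linalg.svd(R, full_matrices=False)[2]

# toy invariant cost: distance between the row space of R and a fixed target
# subspace, measured via orthogonal projectors (depends on the row space only)
rng = np.random.default_rng(0)
A = np.linalg.qr(rng.standard_normal((5, 2)))[0]        # target basis in R^5
def f(R):
    Vt = row_space_basis(R)
    return np.linalg.norm(Vt.T @ Vt - A @ A.T) ** 2

g = penalized(f, gamma=1.0)
R = rng.standard_normal((2, 5))
# as in the proof: the penalized cost never increases when R is replaced by V^T
print(g(R) >= g(row_space_basis(R)))                    # True
```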

A case study: Structured low-rank approximation

System identification, model reduction, etc. reduce to structured low-rank approximation.

Structured low-rank approximation: given p ∈ R^{n_p}, a structure S, and r < m,

    min_{p̂} ||p − p̂||_2^2 subject to rank S(p̂) ≤ r.   (SLRA)

Kernel representation + variable projection:

    min_{R ∈ R^{d×m}, rank R = d} f(R),   f(R) := ( min_{p̂} ||p − p̂||_2^2 subject to R S(p̂) = 0 ),

where d = m − r.

A case study: Structured low-rank approximation, an example

Hankel SLRA: given w = (w_1, ..., w_n),

    g([ r_1 r_2 r_3 ]) := min_{ŵ} ||w − ŵ||_2^2 subject to

    [ r_1 r_2 r_3 ] [ ŵ_1  ŵ_2  ...  ŵ_{n−2}
                      ŵ_2  ŵ_3  ...  ŵ_{n−1}
                      ŵ_3  ŵ_4  ...  ŵ_n   ] = 0

min f(R) solves the SLRA problem; f(αR) = f(R) for all α ≠ 0.
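A sketch of evaluating this cost by variable projection: for fixed r the constraint is linear in ŵ, so the inner minimization has a closed form via Lagrange multipliers. The function name and the toy data are illustrative; the general structured case is handled by the slra package referenced in the conclusions.

```python
import numpy as np

def f_hankel(r, w):
    """Variable-projection evaluation of the Hankel-SLRA cost on the slide:
    min over w_hat of ||w - w_hat||_2^2 subject to
    r[0]*w_hat[j] + r[1]*w_hat[j+1] + r[2]*w_hat[j+2] = 0 for all j,
    i.e. [r_1 r_2 r_3] times the 3 x (n-2) Hankel matrix of w_hat is zero."""
    n = len(w)
    G = np.zeros((n - 2, n))
    for j in range(n - 2):
        G[j, j:j + 3] = r                     # row j: r applied to window j of w_hat
    lam = np.linalg.solve(G @ G.T, G @ w)     # Lagrange multipliers of G w_hat = 0
    correction = G.T @ lam                    # w - w_hat at the optimum
    return float(correction @ correction)

# f is invariant under scaling of r, as stated on the slide: f(alpha*r) = f(r)
rng = np.random.default_rng(0)
w = np.cos(0.7 * np.arange(20)) + 0.05 * rng.standard_normal(20)
r = np.array([1.0, -1.2, 0.8])
print(np.isclose(f_hankel(r, w), f_hankel(3 * r, w)))   # True
```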

[Figure: plot of f([ x_1 x_2 1 ]) for a 3×10 Hankel matrix, over x_1, x_2 ∈ [−1.5, 1.5]]

A case study: Some results

Mosaic Hankel low-rank approximation (LTI system identification on DAISY datasets); approximation quality is measured as the percentage fit 100% · (1 − ||p − p̂||_2^2 / ||p||_2^2).

    dataset #   [X I_d]   [X I_d]Π   genrtr   RR^T = I_d   ||RR^T − I_d||_F^2
    1           99.96     99.96      99.96    99.96        99.96
    2           99.93     99.93      99.92    99.92        99.91
    3           98.66     98.66      98.66    97.8         98.58
    4           96.45     96.45      95.69    95.71        95.69
    5           90.26     90.27      86.46    85.69        82.71

Table: percentage fits of the methods

Conclusions

Two reformulations as optimization on a Euclidean space; freedom in choosing the optimization method.

Issues: bound on the number of permutation switches, choice of the regularization (penalty) parameter, ...

References:
I. Markovsky, K. Usevich (in press), Structured low-rank approximation with missing values, SIAM J. Matrix Anal. Appl.
K. Usevich, I. Markovsky (2012), Global and local optimization methods for structured low-rank approximation, submitted.
http://github.com/slra/slra/ : software and reproducible experiments