Subspace Arrangements in Theory and Practice

1 Subspace Arrangements in Theory and Practice Robert M. Fossum Department of Mathematics Beckman Institute University of Illinois at Urbana-Champaign Applications in Biology, Dynamics, and Statistics 09 March 2007

2 Outline 1 Introduction Abstract Motivating Examples of Data Types and Uses 2 Subspace Arrangements Definitions Quick Course on Algebraic Sets Hilbert Functions and Hilbert Series Hilbert Series and Subspace Arrangements 3 Estimating Hybrid Subspace Models from Noisy Samples Stability Estimation of Vanishing Polynomials GPCA-Voting 4 Robust Component Analysis 5 Other Recent Advances Occluded Face Recognition 6 Future Directions Further Investigations Summary and Conclusions Thanks

6 Abstract Imagery A subspace arrangement is a union of a finite number of subspaces of a vector space. We will discuss the importance of subspace arrangements, first as mathematical objects and now as a popular class of models for engineering. We will then introduce some new theoretical results that were motivated by practice. Using these results we will address the computational issue of how to extract subspace arrangements from noisy or corrupted data. Finally, we will turn to the broader importance of subspace arrangements by briefly discussing connections to sparse representations, manifold learning, and related topics.

7 Collaborators and Support Primary Collaborators Yi Ma, ECE, UIUC Allen Yang, ECE, UIUC and EECS, UC Berkeley Harm Derksen, Math, U Michigan Collaborators René Vidal, Johns Hopkins Kun Huang, OSU John Wright, ECE, UIUC Shankar Rao, ECE, UIUC Drew Wagner, ECE, UIUC Support UIUC ECE, CSL, and CSE. NSF CAREER, CRS, CCF ONR

8 Problems and Motivation Goal: process, represent, store, retrieve, and interpret multidimensional and multivariate data. Example (Imagery): multivariate data (extracted from images and videos) tend to be inhomogeneous and multi-modal. It is often desirable to segment such data into uni-modal or homogeneous subsets and to model each subset with a different model. Linear models are the easiest to use, but of course the data are usually not linear.

10 Example (Reconstruction of Dynamic Scenes) Segment feature points that belong to objects that have different 2D or 3D motions. Figure: subspace constraints in pairwise images [Vidal & Ma, ECCV 2004]. Panels: (a) 2D translations are hyperplanes in C^2; (b), (c) 3D homographies and 3D rigid-body motions are bilinear forms (in R and in C).

11 Example (Reconstruction of Dynamic Scenes) Segment feature points that belong to objects that have different 2D or 3D motions. parking-lot movie Figure: Stacked image features in an affine camera sequence lie in 4-D subspaces [Kanatani 2003]

12 DNA Sequences Clustering of DNA sequences Figure: Helicobacter pylori Zinovyev, Andreï: Visualizing the spatial structure of triplet distributions in genetic texts, IHES Technical Report 2002.

14 Example: Fingerprint Identification. Example of fingerprint data. (a) Final result; (b) original. Figure: pk/research/matlabfns/fingerprints/docs/

15 Image Segmentation Segment an image into regions of homogeneous texture. (a) GPCA (without post-processing); (b) Human. Reference: Huang, Ma, & Vidal. Minimum Effective Dimension for Mixtures of Subspaces. CVPR.

16 Video Segmentation and Event Detection Segment a video sequence into clips of homogeneous scenes. PingPong movie. (c) Ping-Pong sequence; (d) first three PCs; (e) segments. References: 1. Huang, Wagner, & Ma. Hybrid linear system identification. CDC. 2. Vidal & Ravichandran. Segmentation and Optical Flow for Multiple Moving Dynamic Textures. CVPR.

17 Other Applications Scientific studies on a large class of multivariate mixed data require effective segmentation methods to partition the data into multiple subsets described by simpler quantitative models. Face Recognition. Hyperspectral Images. Linear Switching Systems. Human Kinematics. Handwritten Digits. trackers movie

18 Segmentation? Example (What does segmentation mean?) Is there a formal mathematical definition of segmentation? What should the proper criterion for segmentation be? How should we measure the gain or loss of a segmentation? First, choose a simple class of models that each of the subsets is supposed to fit. Some popular choices are probabilistic distributions (e.g., Gaussian distributions) or geometric/algebraic sets (e.g., linear subspaces). The complete set of mixed data is then assumed to consist of samples drawn from a mixture of such probabilistic distributions [5, 15] or geometric/algebraic sets [17]. Then estimate the mixture of all the models and, simultaneously or subsequently, decompose it into the individual ones. Thus data segmentation is essentially identified with a (mixture) model estimation problem: segmenting the data and estimating the model are coupled together. Ma, Derksen, et al. Lossy coding and segmentation of multivariate mixed data. Tech. Report.

19 Method of Attack Model: find a model that fits the data and that fits the application. The goal of dimension reduction is to find a low-dimensional model for the high-dimensional data points. This problem has been studied for at least a century. One of the most commonly used techniques is principal component analysis (PCA), also known as the Karhunen-Loève transform (KLT). Given a finite set of data points, the goal is to find a low-dimensional linear model, such as a subspace, to fit the data points. Mathematically this is achieved by performing a singular value decomposition of the matrix of data points to obtain a basis for the subspace. PCA (or SVD) has been widely used. In bioinformatics it has frequently been used to cluster sequences, microarray data, or even spectroscopy signals.

24 Principal Component Analysis (PCA) and Extensions. PCA Problem: dimension reduction of high-dimensional data to a low-dimensional subspace [Pearson 1901, Eckart-Young 1930, Jolliffe 2002]. Solution: the singular value decomposition (SVD), $U \Sigma V^T = \mathrm{svd}([x_1, x_2, \ldots, x_N])$. Extensions to PCA: Nonlinear Kernel PCA [Schölkopf-Smola-Müller 1998]; Probabilistic PCA [Tipping-Bishop 1999, Collins et al. 2001]; Higher-order SVD [Tucker 1966, Davis 2002]; Independent Component Analysis (ICA) [Hyvärinen-Karhunen-Oja 2001]. Punch line: none of these methods has the capacity to estimate multiple low-dimensional models.
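
As a concrete illustration of the single-subspace case, the following sketch fits a d-dimensional subspace to data with the SVD; the function name `pca_subspace` and the choice of `d` are ours, not from the slides.

```python
import numpy as np

def pca_subspace(X, d):
    """Fit a d-dimensional linear subspace to the columns of X (D x N) via the SVD.

    Returns an orthonormal basis (D x d) for the estimated subspace.
    """
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :d]          # principal directions = leading left singular vectors

# Example: noisy samples from a 2-D subspace of R^5.
rng = np.random.default_rng(0)
B_true = np.linalg.qr(rng.standard_normal((5, 2)))[0]       # true basis
X = B_true @ rng.standard_normal((2, 200)) + 0.01 * rng.standard_normal((5, 200))
B_est = pca_subspace(X, d=2)
# The estimated subspace should (nearly) contain the true one:
print(np.linalg.norm(B_true - B_est @ (B_est.T @ B_true)))  # small residual
```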

25 Iterative Schemes for the Subspace-Segmentation Problem. EM-type solutions: EM [Dempster et al. 1977] and K-Subspaces [Ho et al. 2003]. Complete variable: $(x, z)$, where $x \in \mathbb{R}^D$ and $z$ is the missing membership label. Model parameters: $\Theta = \{\theta_1, \ldots, \theta_n\}$, where $\theta_i = (B_i, \sigma_i, \pi_i)$ for each $V_i$, $B_i$ is the basis, and $\pi_i = p(z = i)$. 1. E step: calculate $w_{ik}^{(m)} := p(z_k = i \mid x_k, \hat\Theta^{(m)})$. 2. M step: using the expected values $w_{ik}^{(m)}$, compute $\hat\Theta^{(m+1)}$. Difficulty: sensitive to the initialization (greedy). Random Sample Consensus (RANSAC) [Fischler & Bolles 1981, Torr 1998, Forsyth et al. 2001, Bartoli 2001, Schindler & Suter 2005]: 1. Sample a (minimal) subset to obtain an estimate of the parameters. 2. Compute the consensus of the estimated parameters with the total sample set. Difficulties: 1. Higher-dimensional models overfit samples on lower-dimensional subspaces. 2. Lower-dimensional models may fit samples at intersections.
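
A minimal sketch of the K-Subspaces iteration described above, assuming all subspaces have the same dimension d and alternating between assignment and PCA refits; names such as `k_subspaces` are illustrative, not from the references.

```python
import numpy as np

def k_subspaces(X, n, d, iters=50, seed=0):
    """Alternate between assigning points to their nearest subspace and refitting
    each subspace by PCA.  X is D x N; returns labels and a list of bases."""
    D, N = X.shape
    rng = np.random.default_rng(seed)
    bases = [np.linalg.qr(rng.standard_normal((D, d)))[0] for _ in range(n)]
    labels = np.zeros(N, dtype=int)
    for _ in range(iters):
        # Assignment step: distance of x to span(B) is ||x - B B^T x||.
        dists = np.stack([np.linalg.norm(X - B @ (B.T @ X), axis=0) for B in bases])
        labels = np.argmin(dists, axis=0)
        # Refit step: PCA on each cluster (skip clusters that are too small).
        for i in range(n):
            Xi = X[:, labels == i]
            if Xi.shape[1] >= d:
                bases[i] = np.linalg.svd(Xi, full_matrices=False)[0][:, :d]
    return labels, bases
```

As the slide notes, this scheme is sensitive to the (random) initialization, so it is usually restarted several times or seeded by another method.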

26 Outline 1 Introduction Abstract Motivating Examples of Data Types and Uses 2 Subspace Arrangements Definitions Quick Course on Algebraic Sets Hilbert Functions and Hilbert Series Hilbert Series and Subspace Arrangements 3 Estimating Hybrid Subspace Models from Noisy Samples Stability Estimation of Vanishing Polynomials GPCA-Voting 4 Robust Component Analysis 5 Other Recent Advances Occluded Face Recognition 6 Future Directions Further Investigations Summary and Conclusions Thanks

27 Subspace Arrangements. Definition. Let $W$ be a vector space of dimension $D$ over a scalar field $F$. A subspace arrangement $A$ in $W$ is a union of subspaces $V_1, V_2, \ldots, V_m$ of $W$: $$A = V_1 \cup V_2 \cup \cdots \cup V_m. \quad (1)$$ Definition. For each subset $S \subseteq \{1, 2, \ldots, m\}$ let $$V_S = \bigcap_{s \in S} V_s, \quad (2) \qquad n_S = \dim V_S, \quad (3) \qquad c_S = \dim W - \dim V_S. \quad (4)$$ The last integer is called the codimension of $V_S$.
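
The quantities $n_S$ and $c_S$ can be computed numerically from bases of the subspaces; the sketch below does so for a pair of subspaces using the rank of the stacked bases (function names are ours).

```python
import numpy as np

def intersection_dim(B1, B2, tol=1e-10):
    """Dimension of span(B1) ∩ span(B2), where B1, B2 are D x d_i basis matrices.
    Uses dim(V1) + dim(V2) = dim(V1 + V2) + dim(V1 ∩ V2)."""
    d1 = np.linalg.matrix_rank(B1, tol)
    d2 = np.linalg.matrix_rank(B2, tol)
    d_sum = np.linalg.matrix_rank(np.hstack([B1, B2]), tol)
    return d1 + d2 - d_sum

# Two random 2-D subspaces of R^4: generically V1 + V2 = R^4, so V1 ∩ V2 = {0}.
rng = np.random.default_rng(1)
B1, B2 = rng.standard_normal((4, 2)), rng.standard_normal((4, 2))
n_S = intersection_dim(B1, B2)
c_S = 4 - n_S
print(n_S, c_S)   # expect 0 and 4 for a generic pair of 2-D subspaces in R^4
```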

28 Subspace Arrangements. Definition. Let $W$ be a vector space of dimension $D$ over a scalar field $F$. A subspace arrangement $A$ in $W$ is a union of subspaces $V_1, V_2, \ldots, V_m$ of $W$: $A = V_1 \cup V_2 \cup \cdots \cup V_m$ (1). Transversal Arrangements. Definition. The arrangement is said to be transversal if $$c_S = \min\Big(\dim W, \sum_{i \in S} c_i\Big) \quad (5)$$ for all non-empty subsets $S$ of $\{1, 2, \ldots, m\}$. This means that the intersections of the subspaces are as small as possible.

29 Subspace Arrangements. Definition. Let $W$ be a vector space of dimension $D$ over a scalar field $F$. A subspace arrangement $A$ in $W$ is a union of subspaces $V_1, V_2, \ldots, V_m$ of $W$: $A = V_1 \cup V_2 \cup \cdots \cup V_m$ (1). Simple case of two subspaces. Suppose $V_1, V_2$ are subspaces of $W$. Then $$\dim V_1 + \dim V_2 = \dim(V_1 + V_2) + \dim(V_1 \cap V_2),$$ or, in terms of codimensions, $$\operatorname{codim} V_1 + \operatorname{codim} V_2 = \operatorname{codim}(V_1 + V_2) + \operatorname{codim}(V_1 \cap V_2).$$ We conclude that when $D \le \operatorname{codim}(V_1) + \operatorname{codim}(V_2)$, the arrangement is transversal if $V_1 \cap V_2 = 0$. When $D > \operatorname{codim}(V_1) + \operatorname{codim}(V_2)$, it is transversal if $V_1 + V_2 = W$.
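
A small worked instance of the two-subspace criterion (our choice of example, not from the slides): for two distinct planes $V_1, V_2 \subset \mathbb{R}^3$ we have $D = 3 > \operatorname{codim} V_1 + \operatorname{codim} V_2 = 2$ and $V_1 + V_2 = \mathbb{R}^3$, so the arrangement is transversal; indeed $$\dim(V_1 \cap V_2) = \dim V_1 + \dim V_2 - \dim(V_1 + V_2) = 2 + 2 - 3 = 1, \qquad c_{\{1,2\}} = 3 - 1 = 2 = \min(3,\, 1 + 1).$$ For two distinct lines in $\mathbb{R}^3$ (codimensions $2$ and $2$, so $D \le c_1 + c_2$), transversality amounts to $V_1 \cap V_2 = 0$, which holds automatically since distinct lines through the origin meet only at the origin.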

30 Subspace Arrangements. Definition. Let $W$ be a vector space of dimension $D$ over a scalar field $F$. A subspace arrangement $A$ in $W$ is a union of subspaces $V_1, V_2, \ldots, V_m$ of $W$: $A = V_1 \cup V_2 \cup \cdots \cup V_m$ (1). Problem. Given data $\{z_1, z_2, \ldots, z_N\} \subset \mathbb{R}^D$, find a subspace arrangement $A$, the constituent subspaces $V_1, \ldots, V_m$ and their bases, and then the segmentation of the data into these constituent subspaces.

31 In the first frame we see a collection of sample points that are arranged on a subspace arrangement pictured in the second frame. The third frame illustrates the separated subspaces.

32 Algebraic Sets and Vanishing Ideals. Definitions. The vector space is $W = \mathbb{R}^D$. The algebra of multivariate polynomial functions on $W$ is $R^{[D]} := \mathbb{R}[X_1, X_2, \ldots, X_D]$. A function $q(X_1, \ldots, X_D) \in R^{[D]}$ is homogeneous of degree $h$ if it is a linear combination of monomials of the form $X_1^{e_1} X_2^{e_2} \cdots X_D^{e_D}$, where $h = e_1 + e_2 + \cdots + e_D$. The subspace of homogeneous functions of degree $h$ is denoted by $R^{[D]}_h$. There is a decomposition $$R^{[D]} = R^{[D]}_0 \oplus R^{[D]}_1 \oplus \cdots \oplus R^{[D]}_h \oplus \cdots.$$ The dimension is $$\dim\big(R^{[D]}_h\big) = \binom{D - 1 + h}{h}.$$

37 Vanishing Polynomials. More Definitions. If $X \subseteq W$, then a vanishing polynomial on $X$ is a polynomial $f \in R^{[D]} := \mathbb{R}[X_1, \ldots, X_D]$ such that $f(x) = 0$ for all $x \in X$. The vanishing polynomials form an ideal, the vanishing ideal $$I(X) = \{f \in R^{[D]} : f(x) = 0 \text{ for all } x \in X\}.$$ The ideal generated by the polynomials $\{f_1, \ldots, f_n\}$ is denoted by $(f_1, \ldots, f_n)$ and is the set $\{f_1 g_1 + \cdots + f_n g_n : g_i \in R^{[D]} \text{ for all } i\}$. The zero set of a subset $F$ of $R^{[D]}$ is $$Z(F) = \{x \in \mathbb{R}^D : f(x) = 0 \text{ for all } f \in F\}.$$ An ideal $I$ in $R^{[D]}$ is said to be homogeneous if the homogeneous components of any function in $I$ are also in $I$. If $I$ is homogeneous, then there is a decomposition $$I = I_0 \oplus I_1 \oplus \cdots \oplus I_h \oplus \cdots, \quad \text{where } I_h = I \cap R^{[D]}_h.$$

43 Example Models. Geometry (zero sets) versus algebra (vanishing ideals):

    Zero set (geometry)                       Vanishing ideal (algebra)
    {0}                                       I(0) = {f ∈ R^[D] : f(0) = 0}
    V = {x : n_i^T x = 0, i = 1, ..., c}      I(V) = (n_1^T x, ..., n_c^T x)
    A = V_1 ∪ ... ∪ V_n                       I(A) = I(V_1) ∩ ... ∩ I(V_n)
    R^D (ambient space)                       I(R^D) = {0}

44 Algebraic Sets and their Vanishing Polynomials: Basics. Theorem (Correspondence between Algebraic Sets and Ideals). Suppose that $X \subseteq \mathbb{R}^D$ and that $J \subseteq R^{[D]}$ is an ideal. Then $$Z(I(X)) \supseteq X \quad \text{and} \quad I(Z(J)) \supseteq J,$$ with equality when $X = Z(J)$ for some ideal $J$ or when $J = I(X)$ for some subset $X$. Definition. A set of the form $Z(J)$ is called an algebraic set.

45 Examples. Example (1). Suppose $V$ is a subspace of $W$ (so a simple subspace arrangement). Then $V$ is the set of solutions of a homogeneous system of linear equations. If $\dim V = n$, then the system has $\operatorname{codim} V$ linearly independent equations. The corresponding ideal of vanishing functions is the ideal generated by these linear forms; denote it by $I(V)$. This is a homogeneous ideal, generated by its homogeneous forms of degree 1, so $I(V) = I_1(V) \oplus I_2(V) \oplus \cdots$, where $$I_1(V) = \Big\{c_1 X_1 + c_2 X_2 + \cdots + c_D X_D \;:\; c_1 x_1 + \cdots + c_D x_D = 0 \text{ for all } (x_1, x_2, \ldots, x_D)^T \in V\Big\}.$$ Example (2). If $A$ is the subspace arrangement $A = V_1 \cup \cdots \cup V_m$, then $I(A) = I(V_1) \cap \cdots \cap I(V_m)$.
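
Numerically, the degree-one part $I_1(V)$ is just the orthogonal complement of $V$: a coefficient vector $c$ defines a vanishing linear form exactly when $c^T B = 0$ for a basis matrix $B$ of $V$. A small sketch (names ours):

```python
import numpy as np

def degree_one_vanishing_forms(B, tol=1e-10):
    """Given a D x d basis matrix B of a subspace V, return a matrix whose rows are
    coefficient vectors c of linear forms c^T x vanishing on V, i.e. a basis of the
    left null space of B (equivalently, of the orthogonal complement of V)."""
    U, S, Vt = np.linalg.svd(B)
    rank = int(np.sum(S > tol))
    return U[:, rank:].T     # rows span the orthogonal complement of span(B)

# Example: the plane V = {x in R^3 : x_3 = 0}, spanned by e_1 and e_2.
B = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
C = degree_one_vanishing_forms(B)
print(C)   # one row, proportional to (0, 0, 1): the form X_3
```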

46 Hilbert Functions. Purpose. We are given a data set $\{z_1, z_2, \ldots, z_N\}$ whose vanishing ideal we wish to find. We assume or know that the vanishing ideal is homogeneous, and we would like to know the dimensions of the homogeneous components of this ideal. This dimension has been studied intensely for at least a century. Definition (Hilbert Function). Suppose $J$ is a homogeneous ideal. The function $H_J : \mathbb{N} \to \mathbb{N}$ with value $H_J(h) = \dim J_h$ is called the Hilbert function of $J$.

47 Hilbert Series: Putting Things Together. Definition. Suppose $J$ is a homogeneous ideal. The Hilbert series for $J$, denoted by $H_J(t)$, is defined to be the formal power series $$H_J(t) = \sum_{h \ge 0} H_J(h)\, t^h.$$ Example. Let the ideal be the whole algebra $R^{[D]} = \mathbb{R}[X_1, \ldots, X_D]$. Then $$H_{R^{[D]}}(h) = \binom{D - 1 + h}{h}, \qquad H_{R^{[D]}}(t) = \sum_{h \ge 0} \binom{D - 1 + h}{h} t^h = \frac{1}{(1 - t)^D}.$$

48 Examples. Example (of a Subspace). Suppose $V$ is a subspace of $\mathbb{R}^D$ of dimension $d$ and codimension $c = D - d$. Then $I(V)$ is generated as an ideal by the linear forms in $I_1(V)$. It is relatively easy to see that the Hilbert series of this ideal is $$H_{I(V)}(t) = \frac{1}{(1 - t)^D} - \frac{1}{(1 - t)^d} = \frac{1 - (1 - t)^c}{(1 - t)^D}.$$

49 Examples. Example (of a Subspace Arrangement). Now suppose $A = V_1 \cup \cdots \cup V_m$ is a subspace arrangement. It follows that $I(A) = I(V_1) \cap \cdots \cap I(V_m)$. This ideal is homogeneous. Another ideal that is also important for us is the product ideal $$P(A) = I(V_1)\, I(V_2) \cdots I(V_m).$$ The reason is that the zeros of this ideal also comprise the subspace arrangement. Theorem (Ma et al.). The zero set of the product ideal above is $A$: $Z(P(A)) = A$.

50 Examples. The Hilbert series of the product ideal is determined by the codimensions of the intersections. Theorem (Derksen). Let $A = V_1 \cup V_2 \cup \cdots \cup V_m$ be a subspace arrangement. Then the Hilbert function of the product ideal is a function of the codimensions of the individual subspaces and is given by $$H_{P(A)}(h) = \sum_{T} (-1)^{|T|} \binom{D - 1 - c_T + h}{D - 1 - c_T},$$ where the sum is over all subsets $T \subseteq \{1, 2, \ldots, m\}$ (including the empty subset), $c_T = \sum_{j \in T} c_j$, and a term is taken to be $0$ when $c_T \ge D$. When the arrangement is transversal even more can be said. Theorem (Derksen). If the subspace arrangement is transversal, then the Hilbert functions of the vanishing ideal and the product ideal agree in all degrees where they are defined; in particular, $$I_h(A) = P_h(A) \quad \text{for all } h \ge m.$$

51 Product Ideal and Vanishing Ideal. This equality tells us that the homogeneous components of the vanishing ideal, $I_h(A)$, and of the product ideal, $P_h(A)$, agree for all $h \ge m$ in the case of a transversal arrangement. Knowing the values of the Hilbert function facilitates the task of finding the correct subspace arrangement model for a given data set. This is especially important if the set consists of noisy or corrupted data. This formula was found by Harm Derksen (U Michigan).

54 Table of Hilbert Functions. Suppose that $A = V_1 \cup V_2 \cup V_3$ is a transversal arrangement in $\mathbb{R}^4$. Let $d_1, d_2, d_3$ (respectively $c_1, c_2, c_3$) be the dimensions (resp. codimensions) of $V_1, V_2, V_3$. We tabulate $h_I(h)$ for $h = 3, 4, 5$:

    c1, c2, c3    d1, d2, d3    h_I(3)   h_I(4)   h_I(5)
    1, 1, 1       3, 3, 3          1        4       10
    1, 1, 2       3, 3, 2          2        7       16
    1, 1, 3       3, 3, 1          3        9       19
    1, 2, 2       3, 2, 2          4       12       25
    1, 2, 3       3, 2, 1          6       15       29
    1, 3, 3       3, 1, 1          8       18       33
    2, 2, 2       2, 2, 2          8       20       38
    2, 2, 3       2, 2, 1         11       24       43
    2, 3, 3       2, 1, 1         14       28       48
    3, 3, 3       1, 1, 1         17       32       53

Note that the codimensions $c_1, c_2, c_3$ are almost determined by $h_I(3)$ alone (the two rows with $h_I(3) = 8$ coincide); they are uniquely determined by $h_I(3)$ and $h_I(4)$.
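
The table can be reproduced from Derksen's formula above; the sketch below (our code, with the convention that a binomial term is dropped when $c_T \ge D$) computes $H_{P(A)}(h)$, which equals $h_I(h)$ for $h \ge m$ in the transversal case.

```python
from itertools import combinations
from math import comb

def hilbert_product_ideal(codims, D, h):
    """Inclusion-exclusion formula for the Hilbert function of the product ideal
    of a subspace arrangement with the given codimensions in R^D."""
    m = len(codims)
    total = 0
    for r in range(m + 1):
        for T in combinations(range(m), r):
            c_T = sum(codims[j] for j in T)
            if D - 1 - c_T >= 0:                      # term vanishes otherwise
                total += (-1) ** r * comb(D - 1 - c_T + h, D - 1 - c_T)
    return total

# Reproduce a few rows of the table: transversal arrangements of 3 subspaces in R^4.
for codims in [(1, 1, 1), (1, 3, 3), (2, 2, 2), (3, 3, 3)]:
    print(codims, [hilbert_product_ideal(codims, 4, h) for h in (3, 4, 5)])
# expected: (1,1,1) -> [1, 4, 10], (1,3,3) -> [8, 18, 33],
#           (2,2,2) -> [8, 20, 38], (3,3,3) -> [17, 32, 53]
```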

55 Veronese Maps. Definition (Veronese Map). For each integer $h$ we define the Veronese map of order $h$ by $$\nu_h : \mathbb{R}^D \to \mathbb{R}^{\binom{D-1+h}{h}}, \qquad \nu_h\big((a_1, a_2, \ldots, a_D)^T\big) = \big(a_1^h,\; a_1^{h-1} a_2,\; \ldots,\; a_D^h\big)^T.$$ The entries on the right are the monomials of degree $h$ in the $a_i$, arranged in some order.

56 Veronese Maps: $D = 3, h = 3$ and $D = 4, h = 2$. $$\nu_3\big((a_1, a_2, a_3)^T\big) = \big(a_1^3,\; a_1^2 a_2,\; a_1^2 a_3,\; a_1 a_2^2,\; a_1 a_2 a_3,\; a_1 a_3^2,\; a_2^3,\; a_2^2 a_3,\; a_2 a_3^2,\; a_3^3\big)^T,$$ $$\nu_2\big((a_1, a_2, a_3, a_4)^T\big) = \big(a_1^2,\; a_1 a_2,\; a_1 a_3,\; a_1 a_4,\; a_2^2,\; a_2 a_3,\; a_2 a_4,\; a_3^2,\; a_3 a_4,\; a_4^2\big)^T.$$
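
A small sketch of the Veronese embedding using lexicographically ordered monomials (the ordering and the function name `veronese` are our choices):

```python
import numpy as np
from itertools import combinations_with_replacement
from math import comb

def veronese(x, h):
    """Map a point x in R^D to the vector of all degree-h monomials in its entries."""
    x = np.asarray(x, dtype=float)
    return np.array([np.prod([x[i] for i in idx])
                     for idx in combinations_with_replacement(range(len(x)), h)])

a = np.array([1.0, 2.0, 3.0])
v = veronese(a, 3)
print(len(v), comb(3 - 1 + 3, 3))   # both 10, matching dim R^[3]_3
print(v)                            # a1^3, a1^2 a2, ..., a3^3
```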

57 Homogeneous Polynomials. A homogeneous polynomial of degree $h$ is an element $$c^T \nu_h(X) = \sum_{e_1 + e_2 + \cdots + e_D = h} c_{e_1 \cdots e_D}\, X_1^{e_1} X_2^{e_2} \cdots X_D^{e_D},$$ where $c \in \mathbb{R}^{\binom{D-1+h}{h}}$ and $X = (X_1, \ldots, X_D)^T$.

58 Finding Vanishing Polynomials from a Data Set. Let $Z = \{z_1, \ldots, z_N\}$ be a given data set. A basis for the vector space of homogeneous polynomial functions of degree $h$ that vanish on $Z$ is found by computing a basis for the left null space of the $\binom{D-1+h}{h} \times N$ matrix $$\big(\nu_h(z_1)\;\; \nu_h(z_2)\;\; \cdots\;\; \nu_h(z_N)\big).$$
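
In practice the left null space is obtained from the SVD of the embedded data matrix; the sketch below (self-contained, with our own helper names) recovers the coefficients of the degree-2 polynomials vanishing on the union of a plane and a line in $\mathbb{R}^3$, the same arrangement used in the walk-through example that follows.

```python
import numpy as np
from itertools import combinations_with_replacement

def veronese(x, h):
    """All degree-h monomials in the entries of x, in a fixed order."""
    return np.array([np.prod([x[i] for i in idx])
                     for idx in combinations_with_replacement(range(len(x)), h)])

def vanishing_polynomials(Z, h, tol=1e-8):
    """Coefficient vectors (rows) of degree-h polynomials vanishing on the columns
    of the D x N data matrix Z: the left null space of [nu_h(z_1) ... nu_h(z_N)]."""
    L = np.column_stack([veronese(z, h) for z in Z.T])
    U, S, Vt = np.linalg.svd(L)
    rank = int(np.sum(S > tol * S[0]))
    return U[:, rank:].T

# Samples from the plane {x_3 = 0} and the line {x_1 = x_2 = 0} in R^3.
rng = np.random.default_rng(2)
plane = np.vstack([rng.standard_normal((2, 100)), np.zeros((1, 100))])
line = np.vstack([np.zeros((2, 50)), rng.standard_normal((1, 50))])
C = vanishing_polynomials(np.hstack([plane, line]), h=2)
print(C.shape[0])   # 2 independent quadratics (x_1 x_3 and x_2 x_3), i.e. h_I(2) = 2
```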

59 Why is knowing the value of $h_I(n)$ crucial in estimating subspace arrangements? 1. $H_I(m)$ determines the number of linearly independent vanishing polynomials of degree $m$: with $M = H_I(m)$ and a basis $\{f_1, f_2, \ldots, f_M\}$, we have $I_m = (f_1, f_2, \ldots, f_M)$ and $A = Z(I_m)$. 2. $H_I(m)$ facilitates model selection within a class of candidate arrangement models. Figure: data space $\mathbb{R}^3$, $M^{[3]}_3 = 10$; for three subspaces with codimensions $(c_1, c_2, c_3)$, $I_3$ is a subspace of $R^{[3]}_3$ (which has dimension 10) and $\dim(I_3)$ can only be 1, 2, 4, or 7. The subspace dimensions uniquely determine $\dim(I_m)$; conversely, a correctly estimated $H_I(m)$ constrains the possible dimensions of the individual subspaces. Reference: Yang, Rao, Wagner, Fossum, & Ma. Hilbert functions and applications to the estimation of subspace arrangements. ICCV, 2005.

60 GPCA: a walk-through example. Let $x \in V_1 \cup V_2 \iff (x_3 = 0) \text{ or } (x_1 = x_2 = 0)$, so that $I_2 = (x_1 x_3,\, x_2 x_3)$. Veronese map: given $N$ samples $x^1, \ldots, x^N \in \mathbb{R}^3$, form $$L_2 = [\nu_2(x^1), \ldots, \nu_2(x^N)] \in \mathbb{R}^{M^{[3]}_2 \times N}, \qquad M^{[3]}_2 = 6,$$ with rows indexed by the monomials $(x_1)^2, x_1 x_2, x_1 x_3, (x_2)^2, x_2 x_3, (x_3)^2$. Figure: the plane $V_1$ and the line $V_2$ in $\mathbb{R}^3$ (axes $x_1, x_2, x_3$). The (left) null space of $L_2$ is spanned by $c_1 = [0, 0, 1, 0, 0, 0]^T$ and $c_2 = [0, 0, 0, 0, 1, 0]^T$, giving $p_1 = c_1^T \nu_2(x) = x_1 x_3$ and $p_2 = c_2^T \nu_2(x) = x_2 x_3$. Punch line: 1. The null space of $L_n$ gives the coefficients of the vanishing polynomials. 2. The dimension of $\mathrm{Null}(L_n)$ is given by $h_I(n)$. Vidal, Ma, & Sastry. Generalized Principal Component Analysis, PAMI, 2005.

61 GPCA: a walk-through example (continued). Let $P(x) := [p_1(x), p_2(x)] = [x_1 x_3,\, x_2 x_3]$; then the Jacobian is $$J(P)(x) = \Big[\tfrac{\partial p_1}{\partial x}\;\; \tfrac{\partial p_2}{\partial x}\Big] = \begin{pmatrix} x_3 & 0 \\ 0 & x_3 \\ x_1 & x_2 \end{pmatrix}.$$ Evaluating $J(P)(x)$ at sample points gives normal vectors that span $V_1^{\perp}$ and $V_2^{\perp}$, respectively: $$z = (a, b, 0)^T \in V_1 \;\Rightarrow\; J(P)(z) = \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ a & b \end{pmatrix}, \qquad y = (0, 0, c)^T \in V_2 \;\Rightarrow\; J(P)(y) = \begin{pmatrix} c & 0 \\ 0 & c \\ 0 & 0 \end{pmatrix}.$$ Using $V_1^{\perp}$ and $V_2^{\perp}$ we can segment the samples and recover $V_1$ and $V_2$. Figure: normal vectors $b_{11}$ (to the plane $V_1$) and $b_{21}, b_{22}$ (to the line $V_2$). Polynomial Differentiation Algorithm (PDA), schematically: samples from $V_1 \cup V_2 \subset \mathbb{R}^D$ are embedded by $\nu_n(x)$ into $\mathbb{R}^{M^{[D]}_n}$; the null space of $L_n$ gives the vanishing polynomials $p(x) = c^T \nu_n(x)$; their derivatives at the samples give the normals, which segment the data back into $V_1 \cup V_2 \subset \mathbb{R}^D$. Here $\mathrm{Rank}(L_n) = M^{[D]}_n - h_I(n)$. MATLAB code available at:

62 Generalized Principal Component Analysis Algorithm. Given a set of (noise-free) samples $\{z_1, z_2, \ldots, z_N\}$ from $m$ linear subspaces of dimensions $d_1, d_2, \ldots, d_m$ in $\mathbb{R}^D$:
1: Construct the matrix $L_m = \big(\nu_m(z_1), \nu_m(z_2), \ldots, \nu_m(z_N)\big)$.
2: Compute the singular value decomposition (SVD) of $L_m$ and let $C$ be the singular vectors associated with the $M = H_I(m)$ smallest singular values.
3: Construct the polynomials $P(X) = C^T \nu_m(X)$.
4: for all $1 \le i \le m$ do
5:   Pick one point $z_i$ per subspace and compute the Jacobian $J(P)(z_i)$.
6:   Compute a basis $B_i = \big(b_1, b_2, \ldots, b_{d_i}\big)$ of $V_i$ from the null space of $J(P)(z_i)^T$, via the singular value decomposition of $J(P)(z_i)$.
7:   Assign to the subspace $V_i$ the samples $z_j$ that lie in $\mathrm{span}(B_i)$, i.e., that satisfy $(I - B_i B_i^T) z_j = 0$.
8: end for
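
Putting the pieces together, here is a compact, noise-free Python sketch of the algorithm above (all helper names are ours; for simplicity it assumes the i-th picked sample happens to lie on a subspace of dimension dims[i], which holds in the toy example below).

```python
import numpy as np
from itertools import combinations_with_replacement

def exponents(D, h):
    """Exponent vectors of all degree-h monomials in D variables, in a fixed order."""
    exps = []
    for idx in combinations_with_replacement(range(D), h):
        e = np.zeros(D, dtype=int)
        for i in idx:
            e[i] += 1
        exps.append(e)
    return np.array(exps)                       # shape (M, D)

def veronese(x, exps):
    return np.prod(x[None, :] ** exps, axis=1)  # shape (M,)

def jacobian_veronese(x, exps):
    """d nu(x)/dx, shape (M, D): entry (l, j) = e_lj * x^(e_l - delta_j)."""
    M, D = exps.shape
    J = np.zeros((M, D))
    for j in range(D):
        mask = exps[:, j] > 0
        e_minus = exps[mask].copy()
        e_minus[:, j] -= 1
        J[mask, j] = exps[mask, j] * np.prod(x[None, :] ** e_minus, axis=1)
    return J

def gpca(Z, m, dims, tol=1e-8):
    """Noise-free GPCA sketch: Z is D x N, m subspaces with (known) dimensions dims."""
    D, N = Z.shape
    exps = exponents(D, m)
    L = np.column_stack([veronese(Z[:, k], exps) for k in range(N)])
    U, S, _ = np.linalg.svd(L)
    rank = int(np.sum(S > tol * S[0]))
    C = U[:, rank:]                             # coefficients of vanishing polynomials
    labels = -np.ones(N, dtype=int)
    for i, d_i in enumerate(dims):
        k = int(np.argmax(labels < 0))          # pick an as-yet-unassigned sample
        z = Z[:, k]
        Jp = jacobian_veronese(z, exps).T @ C   # D x (#polys): columns span V_i-perp
        Uj, Sj, _ = np.linalg.svd(Jp, full_matrices=True)
        B_i = Uj[:, D - d_i:]                   # basis of V_i (left null space of Jp)
        resid = np.linalg.norm(Z - B_i @ (B_i.T @ Z), axis=0)
        labels[(resid < 1e-6) & (labels < 0)] = i
    return labels

# Example: the plane {x3 = 0} and the line {x1 = x2 = 0} in R^3.
rng = np.random.default_rng(3)
plane = np.vstack([rng.standard_normal((2, 80)), np.zeros((1, 80))])
line = np.vstack([np.zeros((2, 40)), rng.standard_normal((1, 40))])
Z = np.hstack([plane, line])
print(gpca(Z, m=2, dims=[2, 1]))                # 0 for plane points, 1 for line points
```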

63 Outline 1 Introduction Abstract Motivating Examples of Data Types and Uses 2 Subspace Arrangements Definitions Quick Course on Algebraic Sets Hilbert Functions and Hilbert Series Hilbert Series and Subspace Arrangements 3 Estimating Hybrid Subspace Models from Noisy Samples Stability Estimation of Vanishing Polynomials GPCA-Voting 4 Robust Component Analysis 5 Other Recent Advances Occluded Face Recognition 6 Future Directions Further Investigations Summary and Conclusions Thanks

64 From Ideal to Noisy Data. We have given some detail showing how to use the theory of subspace arrangements to segment a pure set of data into its component subspaces. The problem now is how to handle noisy data. Estimating the vanishing polynomials and subsequently retrieving the subspaces becomes a statistical problem (which algebraists are not equipped to handle). The data matrix is now usually of full rank, and when the vanishing polynomials are found, their Jacobians at noisy sample points no longer span the orthogonal complements of the subspaces. Yang, Ma, et al. have found techniques that apply in this case, a subject of Yang's thesis. The three regimes are: 1. noiseless samples from the subspaces; 2. samples corrupted by (Gaussian) noise; 3. samples contaminated by both noise and outliers.

67 Estimation of Arrangements from Noisy Samples. An algorithm is stable if it tolerates noise, and robust if it tolerates outliers (together with noise). The Polynomial Differentiation Algorithm (PDA) is stable on hyperplanes [Yang]. Figure: segmentation results at 0%, 4%, 8%, and 16% noise. However, PDA cannot handle subspaces of different dimensions in noisy data: 1. $P(x)$ estimated by PCA is noisy, so the estimated polynomials no longer vanish exactly on the samples ($P(x) \approx 0$ only approximately). 2. Each $x$ is perturbed away from its subspace, so $\nabla_x P(x)$ is not normal to the subspace, and the matrix of derivatives is always of full rank. The best one can do is to treat each $V_i$ as a hyperplane. Figure: (e) a priori segmentation, 8% noise; (f) estimation result.

68 We propose a new algorithm, GPCA-Voting, for subspaces of different dimensions, addressing the noise effects in the algebraic GPCA pipeline (samples from $V_1 \cup V_2 \subset \mathbb{R}^D$, Veronese embedding $\nu_n(x)$ into $\mathbb{R}^{M^{[D]}_n}$, vanishing polynomials $p(x) = c^T \nu_n(x)$ from $\mathrm{Null}(L_n)$, segmentation back in $\mathbb{R}^D$). Improve the vanishing polynomials, i.e., the estimation of $\mathrm{Null}(L_n)$: 1. $L_n$ is always of full rank, so replace the null space by an eigenspace. 2. Use a Fisher discriminant to minimize the within-subspace fitting error and maximize the between-subspace fitting error. Figure: eigenvalues of $L_n$, noise-free versus 8% noise. Improve the normal vectors with a voting scheme: 1. more accurate estimates of the bases; 2. determine the different dimensions of the subspaces.

69 Improving Vanishing Polynomials. Since $L_n$ is of full rank, replace the null space by the eigenspace associated with the smallest eigenvalues. Assuming $\nu_n(x)$ is zero-mean Gaussian distributed, the last singular vector $c$ of $L_n = [\nu_n(x_i)]$ is optimal for least-squares fitting: $$c^* = \arg\min_{\|c\| = 1} \sum_i \big(\nu_n(x_i)^T c\big)^2.$$ However, it is more practical to assume that $x$ itself is corrupted by Gaussian noise, $x_i = \hat{x}_i(c) + n_i$, $i = 1, 2, \ldots, N$, where $\hat{x}_i(c)$ is the projection of $x_i$ onto the zero set of the polynomial defined by $c$. Hence we want to minimize the mean-square distance $$c^* = \arg\min_{c} \frac{1}{N} \sum_{i=1}^{N} \|x_i - \hat{x}_i(c)\|^2.$$
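
The projection $\hat{x}_i(c)$ has no closed form in general; a standard first-order approximation of the point-to-zero-set distance (a Sampson-type distance, used here as an illustrative stand-in rather than the exact objective on the slide) is $|p(x)| / \|\nabla p(x)\|$. A small sketch, with our own helper names:

```python
import numpy as np

def sampson_distance(p, grad_p, x):
    """First-order approximation of the distance from x to the zero set {p = 0}:
    |p(x)| / ||grad p(x)||.  p and grad_p are callables."""
    g = grad_p(x)
    return abs(p(x)) / max(np.linalg.norm(g), 1e-12)

# Example: the quadratic p(x) = x1 * x3, whose zero set is the arrangement
# {x1 = 0} union {x3 = 0} in R^3.
p = lambda x: x[0] * x[2]
grad_p = lambda x: np.array([x[2], 0.0, x[0]])
x = np.array([1.0, 0.5, 0.1])           # a point near the plane {x3 = 0}
print(sampson_distance(p, grad_p, x))   # ~0.0995, close to the true distance 0.1
```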

70 A Voting Scheme. Goal: averaging the derivatives at all samples of a subspace produces a more stable basis. Difficulty: we do not yet know which samples belong to the same subspace. GPCA-Voting (a walk-through example): 1. Assume samples are drawn from 3 subspaces of dimensions 2, 1, 1 in R^3. 2. Estimate $h_I(3) = 6$ linearly independent 3rd-degree polynomials via eig(A, B). 3. Perform subspace voting over the two codimension classes (rank-1 and rank-2) with a tolerance threshold $\tau$. 4. Segment the samples based on the estimated bases. 5. (optional) Iteratively refine the segmentation via EM or K-Subspaces.

71 Simulation Results. 1. Improving the vanishing polynomials. Figure: eigenvalues (noise-free), eigenvalues (8% noise), eig(A,B) (8% noise). 2. Segmentation simulations. Table: segmentation errors with 4% Gaussian noise added.

    Subspace dimensions     EM     K-Subspaces   PDA     Voting   Voting+K-Subspaces
    (2, 2, 1) in R^3        29%    27%           13.2%   6.4%     5.4%
    (4, 2, 2, 1) in R^5     53%    57%           39.8%   5.7%     5.7%
    (4, 4, 4, 4) in R^5     20%    25%           25.3%   17%      11%

Figure: segmentation results for (2, 1, 1) in R^3 at 8%, 12%, and 16% noise, and for (2, 2, 1) in R^3 at 8%, 12%, and 16% noise.

72 Outline 1 Introduction Abstract Motivating Examples of Data Types and Uses 2 Subspace Arrangements Definitions Quick Course on Algebraic Sets Hilbert Functions and Hilbert Series Hilbert Series and Subspace Arrangements 3 Estimating Hybrid Subspace Models from Noisy Samples Stability Estimation of Vanishing Polynomials GPCA-Voting 4 Robust Component Analysis 5 Other Recent Advances Occluded Face Recognition 6 Future Directions Further Investigations Summary and Conclusions Thanks

73 Noisy Data Corrupted by Outliers. There isn't time to discuss the problems involved in estimating the vanishing polynomials in the case of outliers. Again, this is a topic that Yang has handled in his thesis. See Yi Ma, Allen Y. Yang, Harm Derksen, and Robert Fossum. Estimation of Subspace Arrangements with Applications in Modeling and Segmenting Mixed Data. To appear, SIAM Review, 2007.

74 Outline 1 Introduction Abstract Motivating Examples of Data Types and Uses 2 Subspace Arrangements Definitions Quick Course on Algebraic Sets Hilbert Functions and Hilbert Series Hilbert Series and Subspace Arrangements 3 Estimating Hybrid Subspace Models from Noisy Samples Stability Estimation of Vanishing Polynomials GPCA-Voting 4 Robust Component Analysis 5 Other Recent Advances Occluded Face Recognition 6 Future Directions Further Investigations Summary and Conclusions Thanks

75 Problem. Given labeled training images $X_i \in \mathbb{R}^{m \times n_i}$, $i = 1, \ldots, K$, of the faces of $K$ individuals, and a new test image $x_T$, infer the identity of the person in the test image. Under idealized conditions, images of a face (in a fixed pose) under all possible lighting conditions lie on a linear subspace (see the next figure).

76 Figure: Subspaces spanned by face images of three individuals under varying illumination.

77 Recovering Sparse Data. Thus, if $x_T$ belongs to the $i$-th individual, then $x_T \approx X_i w$ for some $w \in \mathbb{R}^{n_i}$. We can then view face recognition as identifying a point on an arrangement of $K$ subspaces, $\mathrm{range}(X_1) \cup \cdots \cup \mathrm{range}(X_K)$. Concatenating the training data into one large matrix $A = [X_1 \cdots X_K] \in \mathbb{R}^{m \times n}$, we have that $x_T = A w$ for some sparse $w \in \mathbb{R}^n$. How can we efficiently recover such a sparse $w$? Theorem (Donoho, Candès and Tao). Let $A$ be $m \times n$, with $m < n$, and suppose $y = A w_0$ for some sufficiently sparse $w_0$ (with $|\mathrm{supp}(w_0)| < \rho(A)$). Then the sparsest solution $w_0$ is also the minimum $\ell^1$-norm solution, $$w_0 = \arg\min_{w} \|w\|_1 \quad \text{subject to } A w = y, \quad (6)$$ and can therefore be computed by linear programming. Thus, minimizing the $\ell^1$ norm gives an efficient way of locating a point in the (combinatorially large) subspace arrangement given by all $\binom{n}{\rho(A)}$ subsets of $\rho(A)$ columns of $A$.
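
A minimal sketch of the $\ell^1$-minimization step (6) via linear programming, using scipy; the reformulation with auxiliary variables $u \ge |w|$ and the function name are ours. For the occlusion case on the next slide, the same routine applies with $A$ replaced by $[A\ I]$.

```python
import numpy as np
from scipy.optimize import linprog

def l1_min(A, y):
    """Solve min ||w||_1 subject to A w = y as a linear program:
    minimize sum(u) over (w, u) with -u <= w <= u and A w = y."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(n)])           # objective: sum of u
    A_ub = np.block([[np.eye(n), -np.eye(n)],                #  w - u <= 0
                     [-np.eye(n), -np.eye(n)]])              # -w - u <= 0
    b_ub = np.zeros(2 * n)
    A_eq = np.hstack([A, np.zeros((m, n))])                  # A w = y
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
                  bounds=[(None, None)] * n + [(0, None)] * n)
    return res.x[:n]

# Recover a 3-sparse vector from 30 random measurements in R^100.
rng = np.random.default_rng(4)
A = rng.standard_normal((30, 100))
w0 = np.zeros(100); w0[[5, 42, 77]] = [1.0, -2.0, 0.5]
w_hat = l1_min(A, A @ w0)
print(np.linalg.norm(w_hat - w0))   # small: the sparse solution is recovered
```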

78 Recognition under Occlusion. Now suppose that some fraction of the pixels of the test image are occluded: $$x_T = A w + e, \quad (7)$$ where $e$ is a vector of errors of arbitrary magnitude, uncorrelated with the image but concentrated on only a subset of the image pixels; that is, $e$ is sparse as well. We can estimate such simultaneously sparse $w, e$ in the same manner. Rewriting, $$x_T = [A\ \ I] \begin{pmatrix} w \\ e \end{pmatrix}. \quad (8)$$ We then wish to locate $x_T$ in the subspace arrangement given by sparse linear combinations of the columns of $[A\ \ I]$. The next figure shows some results for images with 30% of the pixels occluded.

79 Figure: Some visual results (three examples, each showing Image / Error / Reconstruction). (a) Occluded face image. (b) Estimated error, e. (c) Reconstruction, Aw.

80 Figure: Recovered coefficients $w$ for the above three examples. The red entries correspond to training images of the same person. Notice that in all cases the largest coefficient corresponds to the correct identity.

81 Outline 1 Introduction Abstract Motivating Examples of Data Types and Uses 2 Subspace Arrangements Definitions Quick Course on Algebraic Sets Hilbert Functions and Hilbert Series Hilbert Series and Subspace Arrangements 3 Estimating Hybrid Subspace Models from Noisy Samples Stability Estimation of Vanishing Polynomials GPCA-Voting 4 Robust Component Analysis 5 Other Recent Advances Occluded Face Recognition 6 Future Directions Further Investigations Summary and Conclusions Thanks

82 Further Investigation. 1. Investigate arrangements of manifolds of higher order. 2. Investigate multilinear arrangements, that is, arrangements within spaces of the form $\mathbb{R}^{D_1} \otimes \mathbb{R}^{D_2} \otimes \cdots \otimes \mathbb{R}^{D_r}$. Video sequences are examples with $r = 3$. 3. Scale the method up to handle larger data sets. 4. Apply the method to various data sets. As an example, consider a scene in which two or more vehicles, with occupants conversing, are moving. Try to segment the two vehicles AND the sound from the two vehicles. (Huang)

86 Summary. GPCA and its stable and robust versions offer a top-down method for analysing data: one studies the algebraic structure first and then studies the individual subspaces using both algebraic and statistical methods. A new greedy algorithm due to Derksen and Ma shows promise in segmenting data; it may be used in conjunction with GPCA and other techniques. The confluence of algebra, statistics, and computation is crucial for a complete and thorough understanding of the modeling of mixed data. Problems: the curse of dimensionality (the dimension of the ambient space of the Veronese embedding is huge); outlier detection; the number of motions (the method does not work well for large numbers of moving objects).

93 Acknowledgements/Collaborators Acknowledgements Director Pierre Wiltzius and Beckman Institute for its hospitality. Yi Ma, Allen Y. Yang, and others in our group for graciously allowing me to use slides. Frank Sottile for pointing us to subspace arrangements. Bernt Sturmfels for pointing us to the Veronese map. Collaborators Professor Tom Huang (ECE) Professor Yi Ma (ECE) Professor Harm Derksen University of Michigan Professor Rene Vidal at Johns Hopkins Professor Kun Huang at Ohio State Allen Yang, Wei Hong, Shankar Rao, Andrew Wagner, John Wright, Nemanja Petrovic, David Murphy, Joseph Brennan, and many others Supported by NSF CCF-TF

94 THANK YOU!


More information

Face Detection and Recognition

Face Detection and Recognition Face Detection and Recognition Face Recognition Problem Reading: Chapter 18.10 and, optionally, Face Recognition using Eigenfaces by M. Turk and A. Pentland Queryimage face query database Face Verification

More information

Machine Learning - MT & 14. PCA and MDS

Machine Learning - MT & 14. PCA and MDS Machine Learning - MT 2016 13 & 14. PCA and MDS Varun Kanade University of Oxford November 21 & 23, 2016 Announcements Sheet 4 due this Friday by noon Practical 3 this week (continue next week if necessary)

More information

Statistical Geometry Processing Winter Semester 2011/2012

Statistical Geometry Processing Winter Semester 2011/2012 Statistical Geometry Processing Winter Semester 2011/2012 Linear Algebra, Function Spaces & Inverse Problems Vector and Function Spaces 3 Vectors vectors are arrows in space classically: 2 or 3 dim. Euclidian

More information

Uncorrelated Multilinear Principal Component Analysis through Successive Variance Maximization

Uncorrelated Multilinear Principal Component Analysis through Successive Variance Maximization Uncorrelated Multilinear Principal Component Analysis through Successive Variance Maximization Haiping Lu 1 K. N. Plataniotis 1 A. N. Venetsanopoulos 1,2 1 Department of Electrical & Computer Engineering,

More information

Review of Some Concepts from Linear Algebra: Part 2

Review of Some Concepts from Linear Algebra: Part 2 Review of Some Concepts from Linear Algebra: Part 2 Department of Mathematics Boise State University January 16, 2019 Math 566 Linear Algebra Review: Part 2 January 16, 2019 1 / 22 Vector spaces A set

More information

Subspace Methods for Visual Learning and Recognition

Subspace Methods for Visual Learning and Recognition This is a shortened version of the tutorial given at the ECCV 2002, Copenhagen, and ICPR 2002, Quebec City. Copyright 2002 by Aleš Leonardis, University of Ljubljana, and Horst Bischof, Graz University

More information

Geometric Modeling Summer Semester 2010 Mathematical Tools (1)

Geometric Modeling Summer Semester 2010 Mathematical Tools (1) Geometric Modeling Summer Semester 2010 Mathematical Tools (1) Recap: Linear Algebra Today... Topics: Mathematical Background Linear algebra Analysis & differential geometry Numerical techniques Geometric

More information

Principal Component Analysis -- PCA (also called Karhunen-Loeve transformation)

Principal Component Analysis -- PCA (also called Karhunen-Loeve transformation) Principal Component Analysis -- PCA (also called Karhunen-Loeve transformation) PCA transforms the original input space into a lower dimensional space, by constructing dimensions that are linear combinations

More information

Lecture 3: Review of Linear Algebra

Lecture 3: Review of Linear Algebra ECE 83 Fall 2 Statistical Signal Processing instructor: R Nowak Lecture 3: Review of Linear Algebra Very often in this course we will represent signals as vectors and operators (eg, filters, transforms,

More information

IV. Matrix Approximation using Least-Squares

IV. Matrix Approximation using Least-Squares IV. Matrix Approximation using Least-Squares The SVD and Matrix Approximation We begin with the following fundamental question. Let A be an M N matrix with rank R. What is the closest matrix to A that

More information

MULTICHANNEL SIGNAL PROCESSING USING SPATIAL RANK COVARIANCE MATRICES

MULTICHANNEL SIGNAL PROCESSING USING SPATIAL RANK COVARIANCE MATRICES MULTICHANNEL SIGNAL PROCESSING USING SPATIAL RANK COVARIANCE MATRICES S. Visuri 1 H. Oja V. Koivunen 1 1 Signal Processing Lab. Dept. of Statistics Tampere Univ. of Technology University of Jyväskylä P.O.

More information

Notes on Latent Semantic Analysis

Notes on Latent Semantic Analysis Notes on Latent Semantic Analysis Costas Boulis 1 Introduction One of the most fundamental problems of information retrieval (IR) is to find all documents (and nothing but those) that are semantically

More information

Hybrid System Identification via Sparse Polynomial Optimization

Hybrid System Identification via Sparse Polynomial Optimization 2010 American Control Conference Marriott Waterfront, Baltimore, MD, USA June 30-July 02, 2010 WeA046 Hybrid System Identification via Sparse Polynomial Optimization Chao Feng, Constantino M Lagoa and

More information

Singular Value Decompsition

Singular Value Decompsition Singular Value Decompsition Massoud Malek One of the most useful results from linear algebra, is a matrix decomposition known as the singular value decomposition It has many useful applications in almost

More information

STA 414/2104: Lecture 8

STA 414/2104: Lecture 8 STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable models Background PCA

More information

COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017

COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017 COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University PRINCIPAL COMPONENT ANALYSIS DIMENSIONALITY

More information

Linear Regression and Its Applications

Linear Regression and Its Applications Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start

More information

Principal Component Analysis and Linear Discriminant Analysis

Principal Component Analysis and Linear Discriminant Analysis Principal Component Analysis and Linear Discriminant Analysis Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1/29

More information

Normed & Inner Product Vector Spaces

Normed & Inner Product Vector Spaces Normed & Inner Product Vector Spaces ECE 174 Introduction to Linear & Nonlinear Optimization Ken Kreutz-Delgado ECE Department, UC San Diego Ken Kreutz-Delgado (UC San Diego) ECE 174 Fall 2016 1 / 27 Normed

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

Linear Algebra and Robot Modeling

Linear Algebra and Robot Modeling Linear Algebra and Robot Modeling Nathan Ratliff Abstract Linear algebra is fundamental to robot modeling, control, and optimization. This document reviews some of the basic kinematic equations and uses

More information

Algebraic Clustering of Affine Subspaces

Algebraic Clustering of Affine Subspaces 1 Algebraic Clustering of Affine Subspaces Manolis C. Tsakiris and René Vidal, Fellow, IEEE Abstract Subspace clustering is an important problem in machine learning with many applications in computer vision

More information

Maximum variance formulation

Maximum variance formulation 12.1. Principal Component Analysis 561 Figure 12.2 Principal component analysis seeks a space of lower dimensionality, known as the principal subspace and denoted by the magenta line, such that the orthogonal

More information

Lecture 2: Linear Algebra Review

Lecture 2: Linear Algebra Review EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1

More information

ECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction

ECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction ECE 521 Lecture 11 (not on midterm material) 13 February 2017 K-means clustering, Dimensionality reduction With thanks to Ruslan Salakhutdinov for an earlier version of the slides Overview K-means clustering

More information

Generalized Principal Component Analysis (GPCA): an Algebraic Geometric Approach to Subspace Clustering and Motion Segmentation. René Esteban Vidal

Generalized Principal Component Analysis (GPCA): an Algebraic Geometric Approach to Subspace Clustering and Motion Segmentation. René Esteban Vidal Generalized Principal Component Analysis (GPCA): an Algebraic Geometric Approach to Subspace Clustering and Motion Segmentation by René Esteban Vidal B.S. (P. Universidad Católica de Chile) 1995 M.S. (University

More information

Focus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations.

Focus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations. Previously Focus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations y = Ax Or A simply represents data Notion of eigenvectors,

More information

Lecture 24: Principal Component Analysis. Aykut Erdem May 2016 Hacettepe University

Lecture 24: Principal Component Analysis. Aykut Erdem May 2016 Hacettepe University Lecture 4: Principal Component Analysis Aykut Erdem May 016 Hacettepe University This week Motivation PCA algorithms Applications PCA shortcomings Autoencoders Kernel PCA PCA Applications Data Visualization

More information

Image Analysis. PCA and Eigenfaces

Image Analysis. PCA and Eigenfaces Image Analysis PCA and Eigenfaces Christophoros Nikou cnikou@cs.uoi.gr Images taken from: D. Forsyth and J. Ponce. Computer Vision: A Modern Approach, Prentice Hall, 2003. Computer Vision course by Svetlana

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Feature Extraction Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi, Payam Siyari Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Agenda Dimensionality Reduction

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

Low Rank Matrix Completion Formulation and Algorithm

Low Rank Matrix Completion Formulation and Algorithm 1 2 Low Rank Matrix Completion and Algorithm Jian Zhang Department of Computer Science, ETH Zurich zhangjianthu@gmail.com March 25, 2014 Movie Rating 1 2 Critic A 5 5 Critic B 6 5 Jian 9 8 Kind Guy B 9

More information

Dimension Reduction Techniques. Presented by Jie (Jerry) Yu

Dimension Reduction Techniques. Presented by Jie (Jerry) Yu Dimension Reduction Techniques Presented by Jie (Jerry) Yu Outline Problem Modeling Review of PCA and MDS Isomap Local Linear Embedding (LLE) Charting Background Advances in data collection and storage

More information

Coding the Matrix Index - Version 0

Coding the Matrix Index - Version 0 0 vector, [definition]; (2.4.1): 68 2D geometry, transformations in, [lab]; (4.15.0): 196-200 A T (matrix A transpose); (4.5.4): 157 absolute value, complex number; (1.4.1): 43 abstract/abstracting, over

More information

Introduction PCA classic Generative models Beyond and summary. PCA, ICA and beyond

Introduction PCA classic Generative models Beyond and summary. PCA, ICA and beyond PCA, ICA and beyond Summer School on Manifold Learning in Image and Signal Analysis, August 17-21, 2009, Hven Technical University of Denmark (DTU) & University of Copenhagen (KU) August 18, 2009 Motivation

More information

Machine Learning 2nd Edition

Machine Learning 2nd Edition INTRODUCTION TO Lecture Slides for Machine Learning 2nd Edition ETHEM ALPAYDIN, modified by Leonardo Bobadilla and some parts from http://www.cs.tau.ac.il/~apartzin/machinelearning/ The MIT Press, 2010

More information

Linear Algebra Practice Problems

Linear Algebra Practice Problems Linear Algebra Practice Problems Page of 7 Linear Algebra Practice Problems These problems cover Chapters 4, 5, 6, and 7 of Elementary Linear Algebra, 6th ed, by Ron Larson and David Falvo (ISBN-3 = 978--68-78376-2,

More information

CS 540: Machine Learning Lecture 1: Introduction

CS 540: Machine Learning Lecture 1: Introduction CS 540: Machine Learning Lecture 1: Introduction AD January 2008 AD () January 2008 1 / 41 Acknowledgments Thanks to Nando de Freitas Kevin Murphy AD () January 2008 2 / 41 Administrivia & Announcement

More information

Principal Component Analysis

Principal Component Analysis B: Chapter 1 HTF: Chapter 1.5 Principal Component Analysis Barnabás Póczos University of Alberta Nov, 009 Contents Motivation PCA algorithms Applications Face recognition Facial expression recognition

More information

Lecture: Face Recognition and Feature Reduction

Lecture: Face Recognition and Feature Reduction Lecture: Face Recognition and Feature Reduction Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab 1 Recap - Curse of dimensionality Assume 5000 points uniformly distributed in the

More information

Lecture 7: Con3nuous Latent Variable Models

Lecture 7: Con3nuous Latent Variable Models CSC2515 Fall 2015 Introduc3on to Machine Learning Lecture 7: Con3nuous Latent Variable Models All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/

More information

Dimensionality Reduction Using the Sparse Linear Model: Supplementary Material

Dimensionality Reduction Using the Sparse Linear Model: Supplementary Material Dimensionality Reduction Using the Sparse Linear Model: Supplementary Material Ioannis Gkioulekas arvard SEAS Cambridge, MA 038 igkiou@seas.harvard.edu Todd Zickler arvard SEAS Cambridge, MA 038 zickler@seas.harvard.edu

More information

Dimension Reduction. David M. Blei. April 23, 2012

Dimension Reduction. David M. Blei. April 23, 2012 Dimension Reduction David M. Blei April 23, 2012 1 Basic idea Goal: Compute a reduced representation of data from p -dimensional to q-dimensional, where q < p. x 1,...,x p z 1,...,z q (1) We want to do

More information

series. Utilize the methods of calculus to solve applied problems that require computational or algebraic techniques..

series. Utilize the methods of calculus to solve applied problems that require computational or algebraic techniques.. 1 Use computational techniques and algebraic skills essential for success in an academic, personal, or workplace setting. (Computational and Algebraic Skills) MAT 203 MAT 204 MAT 205 MAT 206 Calculus I

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

More information

Lecture: Face Recognition and Feature Reduction

Lecture: Face Recognition and Feature Reduction Lecture: Face Recognition and Feature Reduction Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab Lecture 11-1 Recap - Curse of dimensionality Assume 5000 points uniformly distributed

More information

The University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013.

The University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013. The University of Texas at Austin Department of Electrical and Computer Engineering EE381V: Large Scale Learning Spring 2013 Assignment Two Caramanis/Sanghavi Due: Tuesday, Feb. 19, 2013. Computational

More information

Machine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang.

Machine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang. Machine Learning CUNY Graduate Center, Spring 2013 Lectures 11-12: Unsupervised Learning 1 (Clustering: k-means, EM, mixture models) Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning

More information

Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1. x 2. x =

Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1. x 2. x = Linear Algebra Review Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1 x x = 2. x n Vectors of up to three dimensions are easy to diagram.

More information

Lecture 3: Review of Linear Algebra

Lecture 3: Review of Linear Algebra ECE 83 Fall 2 Statistical Signal Processing instructor: R Nowak, scribe: R Nowak Lecture 3: Review of Linear Algebra Very often in this course we will represent signals as vectors and operators (eg, filters,

More information

SPECTRAL CLUSTERING AND KERNEL PRINCIPAL COMPONENT ANALYSIS ARE PURSUING GOOD PROJECTIONS

SPECTRAL CLUSTERING AND KERNEL PRINCIPAL COMPONENT ANALYSIS ARE PURSUING GOOD PROJECTIONS SPECTRAL CLUSTERING AND KERNEL PRINCIPAL COMPONENT ANALYSIS ARE PURSUING GOOD PROJECTIONS VIKAS CHANDRAKANT RAYKAR DECEMBER 5, 24 Abstract. We interpret spectral clustering algorithms in the light of unsupervised

More information

Covariance-Based PCA for Multi-Size Data

Covariance-Based PCA for Multi-Size Data Covariance-Based PCA for Multi-Size Data Menghua Zhai, Feiyu Shi, Drew Duncan, and Nathan Jacobs Department of Computer Science, University of Kentucky, USA {mzh234, fsh224, drew, jacobs}@cs.uky.edu Abstract

More information

SEQUENTIAL SUBSPACE FINDING: A NEW ALGORITHM FOR LEARNING LOW-DIMENSIONAL LINEAR SUBSPACES.

SEQUENTIAL SUBSPACE FINDING: A NEW ALGORITHM FOR LEARNING LOW-DIMENSIONAL LINEAR SUBSPACES. SEQUENTIAL SUBSPACE FINDING: A NEW ALGORITHM FOR LEARNING LOW-DIMENSIONAL LINEAR SUBSPACES Mostafa Sadeghi a, Mohsen Joneidi a, Massoud Babaie-Zadeh a, and Christian Jutten b a Electrical Engineering Department,

More information

Compressed Sensing Meets Machine Learning - Classification of Mixture Subspace Models via Sparse Representation

Compressed Sensing Meets Machine Learning - Classification of Mixture Subspace Models via Sparse Representation Compressed Sensing Meets Machine Learning - Classification of Mixture Subspace Models via Sparse Representation Allen Y Yang Feb 25, 2008 UC Berkeley What is Sparsity Sparsity A

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

Identification of PWARX Hybrid Models with Unknown and Possibly Different Orders

Identification of PWARX Hybrid Models with Unknown and Possibly Different Orders Identification of PWARX Hybrid Models with Unknown and Possibly Different Orders René Vidal Center for Imaging Science, Department of Biomedical Engineering, Johns Hopkins University 308B Clark Hall, 3400

More information

Math 4A Notes. Written by Victoria Kala Last updated June 11, 2017

Math 4A Notes. Written by Victoria Kala Last updated June 11, 2017 Math 4A Notes Written by Victoria Kala vtkala@math.ucsb.edu Last updated June 11, 2017 Systems of Linear Equations A linear equation is an equation that can be written in the form a 1 x 1 + a 2 x 2 +...

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning Christoph Lampert Spring Semester 2015/2016 // Lecture 12 1 / 36 Unsupervised Learning Dimensionality Reduction 2 / 36 Dimensionality Reduction Given: data X = {x 1,..., x

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 218 Outlines Overview Introduction Linear Algebra Probability Linear Regression 1

More information

A Statistical Analysis of Fukunaga Koontz Transform

A Statistical Analysis of Fukunaga Koontz Transform 1 A Statistical Analysis of Fukunaga Koontz Transform Xiaoming Huo Dr. Xiaoming Huo is an assistant professor at the School of Industrial and System Engineering of the Georgia Institute of Technology,

More information

Chemometrics: Classification of spectra

Chemometrics: Classification of spectra Chemometrics: Classification of spectra Vladimir Bochko Jarmo Alander University of Vaasa November 1, 2010 Vladimir Bochko Chemometrics: Classification 1/36 Contents Terminology Introduction Big picture

More information

A Multi-Affine Model for Tensor Decomposition

A Multi-Affine Model for Tensor Decomposition Yiqing Yang UW Madison breakds@cs.wisc.edu A Multi-Affine Model for Tensor Decomposition Hongrui Jiang UW Madison hongrui@engr.wisc.edu Li Zhang UW Madison lizhang@cs.wisc.edu Chris J. Murphy UC Davis

More information

LEC 2: Principal Component Analysis (PCA) A First Dimensionality Reduction Approach

LEC 2: Principal Component Analysis (PCA) A First Dimensionality Reduction Approach LEC 2: Principal Component Analysis (PCA) A First Dimensionality Reduction Approach Dr. Guangliang Chen February 9, 2016 Outline Introduction Review of linear algebra Matrix SVD PCA Motivation The digits

More information

Robust Principal Component Analysis

Robust Principal Component Analysis ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M

More information

Unsupervised learning: beyond simple clustering and PCA

Unsupervised learning: beyond simple clustering and PCA Unsupervised learning: beyond simple clustering and PCA Liza Rebrova Self organizing maps (SOM) Goal: approximate data points in R p by a low-dimensional manifold Unlike PCA, the manifold does not have

More information