Science is linear, isn't it?

The line, the circle, and the ray
Nonlinear spaces with efficient linearizations
R. Sepulchre -- University of Cambridge
Francqui Chair UCL, 2015

Page rank algorithm, consensus algorithms, power method:
all share the same linear iteration $x^+ = A x$.

But behaviors take place in nonlinear spaces
The line: $x \in \mathbb{R}$
The circle: $\theta \in S^1$
The ray: $r \in \mathbb{R}^+$
The power algorithm is an iteration on the projective space (orthogonality constraint, i.e. the circle).
The Perron-Frobenius theorem is a fixed-point result in the projective space (positivity constraints, i.e. the ray).
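The shared linear iteration can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: the matrix `A` is a hypothetical 2-by-2 column-stochastic example, and the function name `power_iteration` is an assumption. Renormalizing after each step is what turns the linear map into an iteration on directions, and rescaling the limit so that its entries sum to one reads the same direction as a point on the positive ray.

```python
import math

def power_iteration(A, x0, steps=200):
    """Iterate x+ = A x, rescaling to unit Euclidean length after every step."""
    x = list(x0)
    for _ in range(steps):
        y = [sum(a * xj for a, xj in zip(row, x)) for row in A]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x

# Hypothetical column-stochastic matrix with strictly positive entries.
A = [[0.9, 0.5],
     [0.1, 0.5]]

direction = power_iteration(A, [1.0, 0.0])

# Same direction, rescaled so that the entries sum to one (Perron vector).
total = sum(direction)
stationary = [v / total for v in direction]
```

Only the direction of the iterate carries information here, so the choice of normalization (unit length or unit sum) is a matter of convenience.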
Part I: Homogeneity is essential to nonlinear behaviors
Homogeneity is the next best thing to linearity.
It is necessary to account for the nonlinear nature of data.
It is sufficient to make local analysis efficient.

The line: linear spaces
$x, y \in \mathbb{R} \Rightarrow \alpha x + \beta y \in \mathbb{R}$
Linear combinations: the basis of calculus
$\mathbb{R}$, $\mathbb{R}^n$, $\mathbb{R}^{n \times n}$, $\mathbb{C}$, $\mathbb{C}^n$, Sym(n), Skew(n),...

The circle: phase and rotation spaces
Embedding: $S^1 \to \mathbb{C} : \theta \mapsto e^{i\theta}$
Projection: $\mathbb{C} \to S^1 : z \mapsto \arg(z)$
$S^1$, $S^n$, SO(n), SU(n),...
phase, rotation, attitude, orthogonal matrices, unitary matrices,...
Representation: linear spaces + orthogonality constraints

The ray: intensity spaces
Embedding: $\mathbb{R}^+ \to \mathbb{R} : r \mapsto \log r$
$\mathbb{R}^+$, $S^+(n)$, $GL^+(n)$,...
radius, intensity, concentration, probability, density,...
Representation: linear spaces + positivity constraints
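As a hedged illustration of the two embeddings, the sketch below averages phases by passing through the complex plane and averages intensities by passing through the logarithm. The function names and sample values are assumptions chosen for the example.

```python
import cmath
import math

def circular_mean(angles):
    """Embed S^1 in C with theta -> exp(i*theta), average, project back with arg."""
    z = sum(cmath.exp(1j * a) for a in angles) / len(angles)
    return cmath.phase(z)

def geometric_mean(values):
    """Map R^+ to R with log, average there, and map back with exp."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Two phases sitting just either side of the wrap-around at +/- pi.
angles = [math.pi - 0.1, -math.pi + 0.1]
naive_phase = sum(angles) / len(angles)   # arithmetic mean of the raw angles
embedded_phase = circular_mean(angles)    # mean computed through the embedding

# Two intensities two decades apart.
intensities = [1.0, 100.0]
log_mean = geometric_mean(intensities)
```

The naive mean of the raw angles lands near 0, on the opposite side of the circle from both data points, while the embedded mean lands at the wrap-around point between them. The log-domain mean of 1 and 100 is 10, the midpoint on a logarithmic scale.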
Polar coordinates
$z = r e^{i\theta} \in \mathbb{C}$
Linear objects (vectors) have intensities and orientation.

A fundamental result of linear matrix analysis
$A = QR$
$A = U \Sigma V^T$
Linear transformations (matrices) have intensities and orientation.
Any linear transformation can be decomposed as two rotations and one diagonal scaling.

Nonlinear data
Most sensors have a preference for phase or intensity. For good reasons.
Our ears favor amplitude. Our eyes favor phase. Our nose favors concentrations.

Examples of nonlinear measurements
[Figure panels: concentration signals; phase & intensity signals; intensity signals]
Behaviors (Willems)
$(V, \mathcal{B})$
$V$, the Universum: space in which we observe/measure/collect the data
$\mathcal{B}$, the Behavior: mathematical law that governs the data
E.g. dynamical systems:
$V$ signal space, i.e. $T \to W : (t, x) \mapsto w(t, x)$
$\mathcal{B}$ local law in $(t, x)$: $F(w, \dot{w}, \ldots, w^{(m)}) = 0$

Linear behaviors (a mature theory)
$V$: $T \to W$ vector space
$\mathcal{B}$: linear subspace
Linearization = local (and efficient) calculus:
Filtering
Interpolating
Optimizing (least squares, grad)
Averaging,... (lecture 4)

Linearized behaviors
Linearization principle (Newton): linear behaviors capture local behaviors near a nominal solution.
$w^* \in \mathcal{B}$, $w \approx w^* + \delta w$
$\delta w \in \mathcal{B}(w^*)$: linearized behavior around $w^*$
From Taylor's expansion: $DF(w^*)\,\delta w = 0$
Note: this requires W to be a linear space...
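A worked instance of the linearization principle may help. The pendulum law below is an assumed example, not taken from the slides, with $W = \mathbb{R}$; the nominal solution is written $w^*$ and the perturbation $\delta w$.

```latex
\begin{align*}
% Assumed example law (pendulum), with W = R
F(w, \dot w, \ddot w) &= \ddot w + \sin w = 0 \\
% First-order Taylor expansion around a nominal solution w^*
DF(w^*)\,\delta w
  &= \frac{\partial F}{\partial w}\,\delta w
   + \frac{\partial F}{\partial \dot w}\,\delta \dot w
   + \frac{\partial F}{\partial \ddot w}\,\delta \ddot w
   = \cos(w^*)\,\delta w + \delta \ddot w = 0 \\
% Two constant nominal solutions give two different linear behaviors
w^* \equiv 0 &: \quad \delta \ddot w + \delta w = 0 \\
w^* \equiv \pi &: \quad \delta \ddot w - \delta w = 0
\end{align*}
```

The linearized law depends on the nominal solution through $\cos(w^*)$, so the local behavior differs from point to point unless the law has some invariance.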
Linearized phase behavior
Embed the space W in a linear space and make the embedding part of the behavior.
$W = S^1$, $\mathcal{B} : F(\theta) = 0$
is equivalent to
$W = \mathbb{C}$, $\mathcal{B} : F(\theta) = 0$, $z = e^{i\theta}$
Note: conceptually, the embedding trick works for arbitrary differentiable manifolds.
What is special about the phase constraint?

Local analysis of nonlinear behaviors
[Figure: a behavior $\mathcal{B}$ with linearized behaviors $\mathcal{B}(w_1)$, $\mathcal{B}(w_2)$, $\mathcal{B}(w_3)$]
Patching linearized behaviors: intractable

Invariance and nonlinear data
We like to think of local laws over phases to be invariant with respect to data rotation (Moon phase measured in Tokyo or Paris).
We like to think of local laws over intensities to be invariant with respect to data scaling (T measured in °C or °F...).
Homogeneity: linearization should look the same everywhere...
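The embedding of a phase behavior into the complex plane can be illustrated on the simplest case. The constant-frequency oscillator below is an assumed example; $\omega$ is a hypothetical constant frequency.

```latex
\begin{align*}
&\text{On } W = S^1: && \dot\theta = \omega \\
&\text{On } W = \mathbb{C},\ z = e^{i\theta}: && \dot z = i\,\dot\theta\,e^{i\theta} = i\omega z \\
&\text{Constraint preserved:} && \tfrac{d}{dt}|z|^2 = 2\,\mathrm{Re}(\bar z\,\dot z)
   = 2\,\mathrm{Re}\!\left(i\omega |z|^2\right) = 0
\end{align*}
```

In this case the embedded law is linear in $z$ and keeps $|z| = 1$ on its own, and it has the same form at every phase $\theta$.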
Behaviors in invariant spaces
Laws independent from the locality of our data are invariant to specific transformations of the space.
$W = \mathbb{R}$: invariance to translation of data
$W = S^1$: invariance to rotation of data
$W = \mathbb{R}^+$: invariance to scaling of data

LT(S)I behaviors are maximally invariant
The law is the same everywhere and every time:
shift-invariant in T
translation invariant in W
Note: those are the exact assumptions under which behavioral theory is mature and efficient.

The key property
Phase and rotation spaces are homogeneous spaces.
In those spaces, linearization (and hence local calculus) can be made the same everywhere.

The geometry of scaling and rotating
The line, the circle, and the ray share a common mathematical structure.
The line: $x \mapsto x + z$
The circle: $e^{ix} \mapsto e^{ix} \cdot e^{iz}$
The ray: $e^{x} \mapsto e^{x} \cdot e^{z}$
Transitive group action. Reaching any point from any point.
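One way to see what invariance buys is to check that an averaging rule commutes with the group action on each of the three spaces. The sketch below is an illustration under assumed names and sample data: `z` plays the role of the group element (a shift on the line, a rotation angle on the circle, a log-scale factor on the ray).

```python
import cmath
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def circular_mean(thetas):
    # Mean phase computed through the embedding theta -> exp(i*theta).
    return cmath.phase(sum(cmath.exp(1j * t) for t in thetas))

def geometric_mean(rs):
    return math.exp(sum(math.log(r) for r in rs) / len(rs))

def angle_gap(a, b):
    """Distance between two angles measured on the circle."""
    return abs(cmath.phase(cmath.exp(1j * (a - b))))

z = 0.7  # hypothetical group element

# The line: translating the data translates the mean.
xs = [0.2, 1.5, -3.0]
line_gap = abs(arithmetic_mean([x + z for x in xs]) - (arithmetic_mean(xs) + z))

# The circle: rotating the data rotates the mean.
thetas = [0.1, 0.4, 2.0]
circle_gap = angle_gap(circular_mean([t + z for t in thetas]),
                       circular_mean(thetas) + z)

# The ray: scaling the data scales the mean.
rs = [0.5, 2.0, 8.0]
ray_gap = abs(geometric_mean([r * math.exp(z) for r in rs])
              - geometric_mean(rs) * math.exp(z))
```

All three gaps vanish up to rounding, which is the computational meaning of "the law is the same everywhere" for this particular operation.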
Lie groups
From Wikipedia: three major themes in 19th century mathematics were combined by Lie in creating his new theory: the idea of symmetry, as exemplified by Galois through the algebraic notion of a group; geometric theory and the explicit solutions of differential equations of mechanics, worked out by Poisson and Jacobi; and the new understanding of geometry that emerged in the works of Plücker, Möbius, Grassmann and others, and culminated in Riemann's revolutionary vision of the subject.
Pioneers in engineering: e.g. A.S. Willsky, Dynamical Systems Defined on Groups: Structural Properties and Estimation, Ph.D. Thesis, MIT Dept. of Aeronautics and Astronautics, May 1973. Thesis Advisors: R. W. Brockett, Wallace E. Vander Velde.

Matrix Lie groups
$\mathbb{R}^{n \times n}$: matrix translation
SO(n): matrix rotation
GL(n): matrix scaling

Homogeneity is essential to nonlinear modeling
The local description of the law can be made independent from the locality of the data window only if W is homogeneous.
Homogeneity is key to tractability: local coordinates can be made the same everywhere...

Part II: Examples of behaviors on homogeneous spaces
A homogeneous space is a space with a transitive group action by a Lie group.
(The sphere is not a Lie group but it is a homogeneous space.)
Homogeneous spaces
A homogeneous space M is a space with a transitive group action by a Lie group G.
Notation: $M \cong G/H$
H is the stabilizer: $H = \{ g \in G \mid g.e = e \}$
Sphere: $S^2 \cong O(3)/O(2)$

Two (important) examples
$S^+(n)$: symmetric positive definite matrices
(Behaviors in homogeneous spaces; positivity constraints)
$Gr(p, n)$: the set of p-dimensional subspaces of $\mathbb{R}^n$
(Homogeneous behaviors in linear spaces; orthogonality and rank constraints)

Behaviors on the space of positive definite matrices
Diffusion Tensor Imaging
voxel data = local measure of diffusion of water molecules
[Figure labels: Phase, Intensity]
Homogeneous data processing (filtering, interpolating, registering,...)
Quadratic forms on linear data
Zero-mean Gaussian distributions are characterized by covariance matrices $X = E(x x^T) \in S^+(n)$.
A linear transformation of the data points, $x \mapsto A x$, results in the group action of GL(n): $X \mapsto A X A^T$.
The group acts transitively on $S^+(n)$ by congruence.
$S^+(n) \cong GL(n)/O(n)$
Other examples of quadratic forms: kernels, distance matrices,...

Engineering impact of affine-invariant geometry of the cone
Statistical engineering: S. T. Smith, Covariance, subspace, and intrinsic Cramer-Rao bounds, IEEE Trans. Signal Process., 53 (2005), pp. 1610-1630.
Convex optimization: Yu. E. Nesterov and M. J. Todd, On the Riemannian geometry defined for self-concordant barriers and interior point methods, Found. Comput. Math., 2 (2002), pp. 333-361.
DTI imaging: X. Pennec, P. Fillard, and N. Ayache, A Riemannian framework for tensor computing, International Journal of Computer Vision, 66 (2006), pp. 41-66.
Behaviors on quadratic forms: Gaussian processes, kernel optimization,...

Diffusion Tensor Imaging: filtering and interpolating smarties

Different ways to make a space homogeneous
$X = A A^T = U \Lambda U^T = \exp(Z)$
Affine-invariant geometry (intrinsic): $X \in GL(n)/O(n)$
Group embedding (extrinsic): $(U, \Lambda)$, with $U \in O(n)$ and $\Lambda$ a positive diagonal scaling
Linear embedding (extrinsic): $Z \in \mathrm{Sym}(n)$
Issues: computation, singularities, invariance properties
(PhD Anne Collard, 2013: anisotropy preserving midpoints)
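The invariance under congruence can be checked numerically. The sketch below is restricted to 2-by-2 matrices so that it needs only the standard library. It assumes the usual affine-invariant distance $d(X, Y) = \sqrt{\sum_i \log^2 \lambda_i(X^{-1} Y)}$, which the slides do not spell out, and the matrices `X`, `Y`, `A` are hypothetical.

```python
import math

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(P):
    return [[P[j][i] for j in range(2)] for i in range(2)]

def inverse(P):
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    return [[P[1][1] / det, -P[0][1] / det],
            [-P[1][0] / det, P[0][0] / det]]

def affine_invariant_distance(X, Y):
    """sqrt(sum of squared logs of the eigenvalues of X^{-1} Y), 2x2 case only."""
    M = matmul(inverse(X), Y)
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    lam1 = (trace + disc) / 2.0
    lam2 = det / lam1  # product of the two eigenvalues equals det
    return math.sqrt(math.log(lam1) ** 2 + math.log(lam2) ** 2)

def frobenius_distance(X, Y):
    return math.sqrt(sum((X[i][j] - Y[i][j]) ** 2
                         for i in range(2) for j in range(2)))

def congruence(A, X):
    """Group action of GL(2) on the cone: X -> A X A^T."""
    return matmul(matmul(A, X), transpose(A))

X = [[2.0, 0.5], [0.5, 1.0]]    # hypothetical positive definite matrices
Y = [[1.0, -0.3], [-0.3, 3.0]]
A = [[1.0, 2.0], [0.0, 0.5]]    # hypothetical invertible transformation

before = affine_invariant_distance(X, Y)
after = affine_invariant_distance(congruence(A, X), congruence(A, Y))
flat_before = frobenius_distance(X, Y)
flat_after = frobenius_distance(congruence(A, X), congruence(A, Y))
```

The affine-invariant distance is unchanged by the transformation, whereas the Euclidean (Frobenius) distance between the same pair of matrices is not.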
Matrix completion: a popular benchmark
~$10^7$ known ratings (0.01% - 0.1%)
~$10^5$ items
~$10^6$ users
[Figure: sparse ratings matrix, items by users, with a few known entries]
Matrix completion with a low-rank prior

Big data behaviors
A recurrent theme: scarcity of data points in huge dimensional spaces makes behaviors ill-posed.
Remedy: rank and orthogonality constraints

Statistics with scarce data
Make the search space dimension consistent with the number of data points.
[Figure: gene expression array]
spots are gene expression levels
each row is an experiment (~$10^2$)
each column is a gene (~$10^4$)
DNA, mRNA, protein
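Matching the search-space dimension to the number of data points can be made concrete with a parameter count. The sketch below uses the standard fact that m-by-n matrices of rank p form a set of dimension p(m + n - p). The problem sizes are illustrative orders of magnitude, and the function names are assumptions.

```python
def rank_p_dimension(m, n, p):
    """Dimension of the set of m-by-n matrices of rank p: p * (m + n - p)."""
    return p * (m + n - p)

def largest_supported_rank(m, n, samples):
    """Largest rank whose parameter count does not exceed the number of samples."""
    p = 0
    while p < min(m, n) and rank_p_dimension(m, n, p + 1) <= samples:
        p += 1
    return p

# Illustrative orders of magnitude for a ratings problem.
users, items, ratings = 10**6, 10**5, 10**7

full_dimension = users * items   # unconstrained search space
p_max = largest_supported_rank(users, items, ratings)   # evaluates to 9 here
```

With these sizes the unconstrained problem has about 10^11 unknowns for 10^7 observations, while a rank constraint of 9 or less brings the number of unknowns below the number of observations.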
Statistics with a low-rank/sparsity prior
expression of a component for all experiments: test correlation with clinical data
gene signature of a component: test overlap with pathways, regulatory modules

The Grassmann manifold
$Gr(p, n)$: the set of p-dimensional subspaces of $\mathbb{R}^n$
A subspace is determined by the first p columns of an orthogonal matrix.
$Gr(p, n) \cong O(n)/\mathrm{stab}(e)$

Rank constraints
$M(p, m \times n)$: space of m by n matrices of rank p
Transitive group action: $(A, B) : X \mapsto A X B^T$
$M(p, m \times n) \cong GL(n) \times GL(m)/\mathrm{stab}(e_p)$
A key homogeneous space of behavioral theory

Mixing rank and positivity constraints
The set of positive semidefinite matrices of size n and rank p:
$S^+(p, n) = \{ X \in \mathbb{R}^{n \times n} \mid X = X^T \succeq 0,\ \mathrm{rank}(X) = p \}$
Transitive group action: $A : X \mapsto A X A^T$
$S^+(p, n) \cong GL(n)/\mathrm{stab}(e)$
The line, the circle, and the ray
Nonlinear spaces with efficient linearizations

Nonlinear data is a source of nonlinear behaviors.
Phase and intensity spaces are homogeneous.
Rank, orthogonality, and positivity constraints are homogeneous.
Behaviors on homogeneous spaces can be made independent from the locality of data.
Calculus on homogeneous spaces can be made invariant (lecture 3).
Even if they are not invariant, behaviors on homogeneous spaces might have invariant properties (see lecture 6).

Behaviors defined on non-homogeneous spaces are ill-posed.
Behaviors with invariant properties are tractable.