Information Geometric Structure on Positive Definite Matrices and its Applications

Similar documents
Information Geometry

Information Geometry of Positive Measures and Positive-Definite Matrices: Decomposable Dually Flat Structure

June 21, Peking University. Dual Connections. Zhengchao Wan. Overview. Duality of connections. Divergence: general contrast functions

AN AFFINE EMBEDDING OF THE GAMMA MANIFOLD

Information geometry of Bayesian statistics

Information Geometry on Hierarchy of Probability Distributions

Information geometry for bivariate distribution control

Riemannian geometry of positive definite matrices: Matrix means and quantum Fisher information

Applications of Information Geometry to Hypothesis Testing and Signal Detection

Left-invariant metrics and submanifold geometry

Differential Geometry, Lie Groups, and Symmetric Spaces

Information Geometric view of Belief Propagation

Introduction to Information Geometry

Statistical physics models belonging to the generalised exponential family

Information geometry of mirror descent

arxiv: v1 [cs.lg] 17 Aug 2018

Chapter 2 Exponential Families and Mixture Families of Probability Distributions

Information geometry connecting Wasserstein distance and Kullback Leibler divergence via the entropy-relaxed transportation problem

L. Vandenberghe EE236C (Spring 2016) 18. Symmetric cones. definition. spectral decomposition. quadratic representation. log-det barrier 18-1

Symplectic and Kähler Structures on Statistical Manifolds Induced from Divergence Functions

Dirac Cohomology, Orbit Method and Unipotent Representations

SUBTANGENT-LIKE STATISTICAL MANIFOLDS. 1. Introduction

1: Lie groups Matix groups, Lie algebras

Information geometry of the power inverse Gaussian distribution

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 17

REGULAR TRIPLETS IN COMPACT SYMMETRIC SPACES

distances between objects of different dimensions

AFFINE SPHERES AND KÄHLER-EINSTEIN METRICS. metric makes sense under projective coordinate changes. See e.g. [10]. Form a cone (1) C = s>0

Mean-field equations for higher-order quantum statistical models : an information geometric approach

On classification of minimal orbits of the Hermann action satisfying Koike s conditions (Joint work with Minoru Yoshida)

Distances between spectral densities. V x. V y. Genesis of this talk. The key point : the value of chordal distances. Spectrum approximation problem

On the exponential map on Riemannian polyhedra by Monica Alice Aprodu. Abstract

Law of Cosines and Shannon-Pythagorean Theorem for Quantum Information

arxiv: v3 [math.pr] 4 Sep 2018

DIFFERENTIAL GEOMETRY HW 12

x. Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ 2 ).

A Geometric View of Conjugate Priors

arxiv: v1 [stat.co] 14 Oct 2010

arxiv: v1 [cs.lg] 1 May 2010

Nonparametric Bayesian Methods (Gaussian Processes)

Lecture Note 5: Semidefinite Programming for Stability Analysis

Geometry of U-Boost Algorithms

Background Mathematics (2/2) 1. David Barber

A Geometric View of Conjugate Priors

Journée Interdisciplinaire Mathématiques Musique

Many of the exercises are taken from the books referred at the end of the document.

Lecture Notes 1: Vector spaces

Bregman divergence and density integration Noboru Murata and Yu Fujimoto

Single-channel source separation using non-negative matrix factorization

Convex Optimization in Classification Problems

Gaussian discriminant analysis Naive Bayes

Modern Geometric Structures and Fields

Symmetric Spaces Toolkit

The local geometry of compact homogeneous Lorentz spaces

Geometry of symmetric R-spaces

FANG-TING TU. Department of Mathematics, Louisiana State University, Baton Rouge, LA, 70803

POLYNOMIAL OPTIMIZATION WITH SUMS-OF-SQUARES INTERPOLANTS

STA414/2104 Statistical Methods for Machine Learning II

Loos Symmetric Cones. Jimmie Lawson Louisiana State University. July, 2018

Hessian Riemannian Gradient Flows in Convex Programming

Advanced Machine Learning & Perception

H-projective structures and their applications

Submanifolds of. Total Mean Curvature and. Finite Type. Bang-Yen Chen. Series in Pure Mathematics Volume. Second Edition.

Inference. Data. Model. Variates

Background on c-projective geometry

Comparison for infinitesimal automorphisms. of parabolic geometries

Quasi-conformal minimal Lagrangian diffeomorphisms of the

On projective classification of plane curves

ESSENTIAL KILLING FIELDS OF PARABOLIC GEOMETRIES: PROJECTIVE AND CONFORMAL STRUCTURES. 1. Introduction

Legendre transformation and information geometry

Let F be a foliation of dimension p and codimension q on a smooth manifold of dimension n.

Lecture: Examples of LP, SOCP and SDP

Fundamentals of Differential Geometry

Bias-Variance Trade-Off in Hierarchical Probabilistic Models Using Higher-Order Feature Interactions

Geometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling

6. Orthogonality and Least-Squares

Nonlinear Programming Models

Kernel-Based Contrast Functions for Sufficient Dimension Reduction

Global aspects of Lorentzian manifolds with special holonomy

Tensor Sparse Coding for Positive Definite Matrices

Geometric problems. Chapter Projection on a set. The distance of a point x 0 R n to a closed set C R n, in the norm, is defined as

Equivalence, Invariants, and Symmetry

Information Geometry: Background and Applications in Machine Learning

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.

Smooth Dynamics 2. Problem Set Nr. 1. Instructor: Submitted by: Prof. Wilkinson Clark Butler. University of Chicago Winter 2013

Problems in Linear Algebra and Representation Theory

The line, the circle, and the ray. R + x r. Science is linear, is nt? But behaviors take place in nonlinear spaces. The line The circle The ray

Machine learning - HT Maximum Likelihood

The Metric Geometry of the Multivariable Matrix Geometric Mean

STA 4273H: Statistical Machine Learning

Information Geometry and Primal-Dual Interior-point Algorithms

Warped Riemannian metrics for location-scale models

Is Every Invertible Linear Map in the Structure Group of some Algebraic Curvature Tensor?

Semidefinite Programming

INFORMATION THEORY AND STATISTICS

A Theory of Mean Field Approximation

Supplement: Universal Self-Concordant Barrier Functions

MANIFOLDS. (2) X is" a parallel vectorfield on M. HOMOGENEITY AND BOUNDED ISOMETRIES IN

(Multivariate) Gaussian (Normal) Probability Densities

A Convexity Theorem For Isoparametric Submanifolds

Transcription:

Information Geometric Structure on Positive Definite Matrices and its Applications Atsumi Ohara Osaka University 2010 Feb. 21 at Osaka City University 大阪市立大学数学研究所情報幾何関連分野研究会 2010 情報工学への幾何学的アプローチ 1

Outline 1. Introduction 2. Standard information geometry on positive definite matrices 3. Extension via the other potentials (Bregman divergence) Joint work with S. Eguchi (ISM) 4. Conclusions 2

1. Introduction : the set of positive definite real symmetric matrices related to matrix (in)eq. (Lyapunov,Riccati, ) mathematical programming (SDP) statistics (Gaussian, Covariance matrix) symmetric cones (hom. sp., Jordan alg.) 3

Information geometry on Dualistic geometric structure : arbitrary vector fields on :Riemannian metric, :a pair of dual affine connections 4

A simple way to introduce a dualistic structure (1) : open domain in :strongly convex on (i.e., positive definite Hessian mtx.) Cf. Hessian geometry Riemannian metric Dual affine connections 5

A simple way to introduce a dualistic structure (2) divergence projection MLE, MaxEnt and so on Pythagorean relations 6

2. Standard IG on : the set of positive definite real symmetric matrices logarithmic characteristic func. on ϕ( P) = log det P, P PD( n;r) - The standard case - 7 7

-log det P appears as Semidefinite Programming (SDP) self-concordant barrier function Multivariate Analysis (Gaussian dist.) log-likelihood function (structured covariance matrix estimation) Symmetric cone: log characteristic function Information geometry on potential function 8

Standard dualistic geometric structure on (1) [AO,Suda,Amari LAA96] the set of n by n real symmetric matrix :arbitrary set of basis matrices (primal) affine coordinate system Identification 9

Standard dualistic geometric structure on (2) ϕ(p) plays a role of potential function : Riemannian metric (Fisher for Gaussian), :dual affine connections Jordan product (mutation) 10 10

Properties symmetric cones -invariant :involution dual affine coordinate system (Legendre tfm.) divergence self-dual 11

Invariance of the structure Automorphism group, i.e., congruent T transformation: τ P = GPG, G GL( n, the differential: G ( τ G ) * X = GXG T R), Ex) Riemannian metric 12 12

Doubly autoparallel submanifold Def. Submanifold is doubly autoparallel when it is both - and -autoparallel, equivalently, is both linearly and inverselinearly constrained. 13

Linearly constrained Inverse-linearly -autoparallel -autoparallel 14

Set. 15

Doubly autoparallelism (special case) Jordan product for Sym(n) 16

Doubly autoparallelism - Examples (1) 17

Doubly autoparallelism - Examples - (2) 18

Applications of DA Nearness, matrix approximation, GL(n)-invariance, convex optimization Semidefinite Programming If a feasible region is DA, an explicit formula for the optimal solution exists. Maximum likelihood estimation of structured covariance matrix GGM, Factor analysis, signal processing (AR model) 19

MLE of str. cov. matrix (1) n samples of random variable z 20

MLE of str. cov. matrix (2) If is also inverse-linearly constrained, i.e., is DA, then MLE is a convex optimization problem with a solution formula: 21

MLE of str. cov. matrix (3) 22

MLE of str. cov. matrix (4) 23

MLE of str. cov. matrix (5) E-step: Explicit formula for simple imbedding (e.g., upper-left corner etc) M-step: reduces to solving a linear equation if the structure of C is DA. 24

3. Extension via the other potentials (Bregman divergence) The other convex potentials V-potential functions Study their different and common geometric natures Application to multivariate statistics? 25

Contents V-potential function Dualistic geometry on Foliated Structure Decomposition of divergence Application to statistics geometry of a family of multivariate elliptic distributions 26

Def. V-potential function -The standard case:, V (s) : R + R V ( s) = log s ϕ( P) = log det P Characteristic function on (strongly convex) 27

Def. Rem. The standard case: ν ( s) = 1, ν ( s) = 0, k 1 k 2 Prop. (Strong convexity condition) The Hessian matrix of the V-potential is positive definite on if and only if 28

Prop. When two conditions in Prop.1 hold, Riemannian metric derived from the V- potential is = Here, X,Y : vector field ~ symmetric matrix-valued func. Rem. The standard case: = 29

Prop. (affine connections) Let be the canonical flat connection on. Then the V-potential defines the following dual connection with respect to : 30

Rem. the standard case: mutation of the Jordan product of Ei and Ej 31

divergence function Divergence function derived from - a variant of relative entropy, - Pythagorean type decomposition 32 32

Prop. The largest group that preserves the dualistic structure invariant is τ G G SL( n, R) with except in the standard case. Rem. the standard case: Rem. The power potential of the form: τ G with G GL( n, R) has a special property. 33

Special properties for the power potentials The affine connections derived from the power potentials are GL(n)-invariant. Both - and -projection are GL(n) - invariant. 34

Foliated Structures The following foliated structure features the dualistic geometry derived by the V-potential. 35

Prop. Each leaf and are orthogonal each other with respect to. Prop. Every is simultaneously a - and - geodesic for an arbitrary V-potential. 36

Prop. a Each leaf is a homogeneous space with the constant negative curvature 37

Application to multivariate statistics Non Gaussian distribution (generalized exponential family) Robust statistics beta-divergence, Machine learning, and so on Nonextensive statistical physics Power distribution, generalized (Tsallis) entropy, and so on 38

Application to multivariate statistics Geometry of U-model Def. Given a convex function U and set u=u, U-model is a family of elliptic (probability) distributions specified by P: = :normalizing const. 39

Rem. When U=exp, the U-model is the family of Gaussian distributions. U-divergence: Natural closeness measure on the U-model Rem. When U=exp, the U-divergence is the Kullback-Leibler divergence (relative entropy). 40

Prop. Geometry of the U-model equipped with the U-divergence coincides with derived from the following V-potential function: 41

Conclusions Sec. 2 DA submanifold: needs a tractable characterization or the classification Sec. 3 Derived dualistic geometry is invariant under the SL n, -group actions Each leaf is a homogeneous manifold with a negative constant curvature Decomposition of the divergence function (skipped) Relation with the U-model with the U-divergence ( R) 42

Main References A. Ohara, N. Suda and S. Amari, Dualistic Differential Geometry of Positive Definite Matrices and Its Applications to Related Problems, Linear Algebra and its Applications, Vol.247, 31-53 (1996). A. Ohara, Information geometric analysis of semidefinite programming problems, Proceedings of the institute of statistical mathematics ( 統計数理 ), Vol.46, No.2, 317-334 (1998) in Japanese. A. Ohara and S. Eguchi, Geometry on positive definite matrices and V-potential function, Research Memorandum No. 950, The Institute of Statistical Mathematics, Tokyo, July (2005). 43