2015 Workshop on High-Dimensional Statistical Analysis, Dec. 11 (Friday) ~ 15 (Tuesday), Humanities and Social Sciences Center, Academia Sinica, Taiwan. Information Geometry and Spontaneous Data Learning. Shinto Eguchi, The Institute of Statistical Mathematics, Japan. This talk is based on joint work with Osamu Komori and Atsumi Ohara, University of Fukui
Outline: short review of information geometry; Kolmogorov-Nagumo mean and the φ-path in a function space; generalized mean and variance; φ-divergence geometry; U-divergence geometry; minimum U-divergence and density estimation 2
A short review of IG: nonparametric space, space of statistics; information geometry is discussed on the product space 3
Bartlett's identity. Parametric model; Bartlett's first identity; Bartlett's second identity 4
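For a regular parametric model p(x, θ) with log-likelihood ℓ(x, θ) = log p(x, θ), the two identities referred to here have the standard form

E_θ[ ∂ℓ(X, θ)/∂θ ] = 0   (first identity),
E_θ[ ∂²ℓ(X, θ)/∂θ∂θᵀ ] + E_θ[ (∂ℓ(X, θ)/∂θ)(∂ℓ(X, θ)/∂θ)ᵀ ] = 0   (second identity),

obtained by differentiating ∫ p(x, θ) dx = 1 once and twice under the integral sign.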
Metric and connections: information metric, mixture connection, exponential connection. Rao (1945), Dawid (1975), Amari (1982) 5
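In the notation of these references, with ℓ(x, θ) = log p(x, θ) and ∂_i = ∂/∂θ_i, the three objects are usually written as

g_ij(θ) = E_θ[ ∂_i ℓ ∂_j ℓ ]   (information metric),
Γ^(m)_{ij,k}(θ) = E_θ[ ( ∂_i ∂_j ℓ + ∂_i ℓ ∂_j ℓ ) ∂_k ℓ ]   (mixture connection),
Γ^(e)_{ij,k}(θ) = E_θ[ ( ∂_i ∂_j ℓ ) ∂_k ℓ ]   (exponential connection).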
Geodesic curves and surfaces in IG: m-geodesic curve, e-geodesic curve, m-geodesic surface, e-geodesic surface 6
Kullback-Leibler (K-L) divergence. 1. KL divergence is the expected log-likelihood ratio. 2. Maximum likelihood is minimum KL divergence. Akaike (1974). 3. KL divergence induces the m-connection and e-connection. Eguchi (1983) 7
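Written out, with g a data density and f a model density,

D_KL(g, f) = ∫ g(x) log( g(x) / f(x) ) dx = E_g[ log g(X) − log f(X) ],

so it is the expected log-likelihood ratio (point 1); and since the only f-dependent part of D_KL(g, p(·, θ)) is −E_g[ log p(X, θ) ], replacing this expectation by the sample average (1/n) Σ_{i=1}^n log p(x_i, θ) shows that minimum KL divergence corresponds to maximum likelihood (point 2).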
Pythagoras Thm Amari-Nagaoka (2001) Pf 8
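The statement in the KL case: for densities f, g, h,

D_KL(f, h) − D_KL(f, g) − D_KL(g, h) = ∫ ( f(x) − g(x) ) ( log g(x) − log h(x) ) dx,

so the Pythagorean relation D_KL(f, h) = D_KL(f, g) + D_KL(g, h) holds exactly when the m-geodesic joining g to f and the e-geodesic joining g to h meet orthogonally at g; the displayed integral is their inner product with respect to the information metric.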
Exponential model. Mean parameter. Amari (1982). Degenerate Bartlett identity 9
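Writing the exponential model as p(x, θ) = exp( θᵀ t(x) − ψ(θ) ), where t and ψ are my labels for the slide's statistic and normalizer, the mean parameter is μ(θ) = E_θ[ t(X) ] = ∂ψ(θ)/∂θ. Since ∂²/∂θ∂θᵀ log p(x, θ) = −∂²ψ(θ) does not depend on x, the second Bartlett identity reduces to ∂²ψ(θ) = Var_θ( t(X) ), which is one way to read the degenerate Bartlett identity noted here.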
Exponential model: mean-equal space, minimum-KL leaf 10
Pythagoras foliation 11
log & exp: Bartlett identities, KL-divergence, e-connection, m-connection, e-geodesic, m-geodesic, Pythagoras identity, exponential model, mean-equal space, Pythagoras foliation 12
Path geometry { m-geodesic, e-geodesic, ... } 13
Kolmogorov-Nagumo mean: the K-N mean is defined for positive numbers 14
K-N mean in Y Def. K-N mean Cf. Naudts (2009) 15
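A minimal numerical sketch of the K-N mean (not from the slides; the function and variable names are mine), with the generator φ = log giving the geometric mean and φ = identity the arithmetic mean:

import numpy as np

def kn_mean(values, phi, phi_inv, weights=None):
    # Kolmogorov-Nagumo mean: phi_inv( sum_i w_i * phi(x_i) )
    values = np.asarray(values, dtype=float)
    if weights is None:
        weights = np.full(values.size, 1.0 / values.size)
    return phi_inv(np.dot(weights, phi(values)))

x = [1.0, 2.0, 8.0]
print(kn_mean(x, np.log, np.exp))             # geometric mean, about 2.52
print(kn_mean(x, lambda t: t, lambda t: t))   # arithmetic mean, about 3.67

The φ-path below is the same construction applied pointwise to two densities with weights (1 − t, t).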
φ-path. Def. The φ-path connecting f and g. Thm (Pf) 16
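Matching the K-N mean with weights (1 − t, t), the φ-path connecting f and g can be written pointwise as

f_t(x) = φ^{-1}( (1 − t) φ(f(x)) + t φ(g(x)) ),   0 ≤ t ≤ 1,

possibly up to a normalizing term, depending on whether the definition on this slide keeps f_t a probability density.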
Examples of the φ-path: Exm 0, Exm 1, Exm 2, Exm 3 17
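Two of these examples can be made explicit from the general formula above: with φ(s) = s the φ-path is the mixture path f_t = (1 − t) f + t g (the m-geodesic), and with φ(s) = log s it is the log-linear path f_t ∝ f^{1−t} g^t (the e-geodesic after normalization); power-type generators give paths lying between these two cases.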
Identities of the φ-density model: 1st identity, 2nd identity 18
Generalized mean and variance Def Note
Generalized mean and variance Exm 20
Bartlett Identity Model Bartlett identity Bartlett identities 21
Tangent space of Y: tangent space, Riemannian metric. Expectation gives the tangent space. Topological properties of T_f depend on φ. If φ = log, then T_f is too large to do statistics on Y. Cf. Pistone (1992) 22
Def. Parallel transport: a vector field is parallel along a curve; a curve is a φ-geodesic if its tangent field is parallel along itself. Cf. Amari (1982). 23
φ-geodesic. Thm. The φ-path is the φ-geodesic curve. Proof. 24
φ-divergence: φ-cross entropy, φ-entropy, φ-divergence. Note: the φ-divergence is the KL-divergence if φ = log 25
Divergence geometry. Def. Let M be a statistical model, with the Riemannian metric on M and the pair of affine connections on M: 26
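The construction behind this definition (Eguchi, 1983): a divergence D on M = { p_θ } induces

g^(D)_{ij}(θ) = − ∂_i ∂'_j D(p_θ, p_{θ'}) |_{θ'=θ},
Γ^(D)_{ij,k}(θ) = − ∂_i ∂_j ∂'_k D(p_θ, p_{θ'}) |_{θ'=θ},
Γ^(D*)_{ij,k}(θ) = − ∂'_i ∂'_j ∂_k D(p_θ, p_{θ'}) |_{θ'=θ},

where ∂_i = ∂/∂θ_i acts on the first argument and ∂'_j = ∂/∂θ'_j on the second; the two connections are dual with respect to g^(D).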
φ-divergence geometry: the metric; the affine connection pair 27
Pythagorean theorem. Thm (φ-geodesic and dual geodesic; triangle with vertices f, g, h). Pf 28
φ-Pythagorean foliation: φ-mean equal space 29
φ-mean: φ-Bartlett identities, φ-divergence, the dual pair of φ-connections, the dual pair of φ-geodesics, φ-Pythagoras identity, φ-model, φ-Pythagoras foliation, φ-mean equal space. What is the statistical meaning of the φ-mean and φ-variance? 30
U-divergence U-cross-entropy U-entropy U-divergence Note Exm 31
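In the U-divergence framework (writing u = U' and ξ = u^{-1}; these symbols are my notation and may differ from the slide's), the three quantities are typically defined as

C_U(g, f) = ∫ { U(ξ(f(x))) − g(x) ξ(f(x)) } dx   (U-cross-entropy),
H_U(g) = C_U(g, g)   (U-entropy),
D_U(g, f) = C_U(g, f) − H_U(g) ≥ 0   (U-divergence),

with U(t) = exp(t), i.e. ξ = log, recovering the KL divergence, and a power-type U giving the β-(power) divergence.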
U-divergence geometry. The metric associated with U-divergence; affine connections associated with U-divergence. Thm (i) The U-geodesic is the mixture geodesic. (ii) The U*-geodesic is the ξ-geodesic 32
U-geometry = φ-geometry: the φ-metric on a model M is the U-metric on M; the φ-connections correspond to the U- and U*-connections 33
Triangle with D_U. Thm (mixture geodesic and ξ-geodesic; triangle with vertices f, g, h). Pf 34
U-loss function and U-estimation. Let g(x) be a data density function with a statistical model; U-empirical loss function; U-estimator for the model parameter 35
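With the same notation as above, replacing the expectation E_g[ ξ(f(X)) ] in the U-cross-entropy by its sample average gives the U-empirical loss

L_U(f) = −(1/n) Σ_{i=1}^n ξ(f(x_i)) + ∫ U(ξ(f(x))) dx,

and the U-estimator minimizes L_U(f_θ) over the model parameter; since H_U(g) does not depend on f, this is an empirical proxy for minimizing D_U(g, f_θ).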
U-estimator under the ξ-model. The ξ-model, its U-empirical loss function, and the U-estimator: the U-estimator under the ξ-model has an analogy with the MLE under the exponential model 36
Potential function. Def. We call this the potential function of the ξ-model. Note: we define the mean parameter accordingly (cf. the mean parameter of the exponential model). Thm. The U-estimator for the mean parameter is given by the sample mean 37
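To see why the sample mean appears (a short derivation under a ξ-model written as ξ(f_θ(x)) = θᵀ t(x) − ψ(θ), where t and ψ are my labels for the slide's statistic and potential function): differentiating L_U(f_θ) in θ, using U'(ξ(f_θ)) = f_θ and ∫ f_θ(x) dx = 1, the ∂ψ terms cancel and

∂L_U(f_θ)/∂θ = −(1/n) Σ_{i=1}^n t(x_i) + E_θ[ t(X) ],

so the U-estimating equation is E_θ[ t(X) ] = (1/n) Σ_{i=1}^n t(x_i): the U-estimator of the mean parameter μ(θ) = E_θ[ t(X) ] is the sample mean of t, exactly as the MLE is under an exponential model.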
Pythagoras foliation Thm Pf 38
Pythagoras foliation 39
U-Boost learning for density estimation. U-loss function:
L_U(f) = −(1/n) Σ_{i=1}^n ξ(f(x_i)) + ∫ U(ξ(f(x))) dx.
Dictionary of density functions on R^p:
W = { g(x) : g(x) > 0, ∫ g(x) dx = 1, ... }.
Learning space = ξ-model:
W*_U = ξ^{-1}( co( ξ(W) ) ) = { ξ^{-1}( Σ_k π_k ξ(g_k(x)) ) }.
Let f(x, π) = ξ^{-1}( Σ_k π_k ξ(g_k(x)) ) ∈ W*_U; then f(x, (0, ..., 1, ..., 0)) = g_k(x).
Goal: find f* = argmin_{f ∈ W*_U} L_U(f)
U-Boost algorithm.
(A) Find f_1 = argmin_{g ∈ W} L_U(g).
(B) Update f_k to f_{k+1} = ξ^{-1}( (1 − π_{k+1}) ξ(f_k) + π_{k+1} ξ(g_{k+1}) ), where (π_{k+1}, g_{k+1}) = argmin_{(π, g) ∈ (0,1) × W} L_U( ξ^{-1}( (1 − π) ξ(f_k) + π ξ(g) ) ).
(C) Select K, and set f̂ = ξ^{-1}( (1 − π_K) ξ(f_{K−1}) + π_K ξ(g_K) ).
Example 2. Power entropy (index β): f*(x) = ( Σ_k π_k g_k(x)^β )^{1/β}; if β → 0, f*(x) ∝ exp( Σ_k π_k log g_k(x) ); if β = 1, f*(x) = Σ_k π_k g_k(x). Klemelä (2007), Friedman et al. (1984)
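A minimal runnable sketch of steps (A)-(B) for the power-entropy case on a one-dimensional grid (the dictionary, grid, step sizes and function names are illustrative, not from the slides), using ξ_β(f) = (f^β − 1)/β, so that U(ξ_β(f)) = f^{1+β}/(1+β):

import numpy as np

# data, grid and dictionary of Gaussian densities (all illustrative)
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 0.6, 150), rng.normal(1.5, 1.0, 150)])
grid = np.linspace(-6.0, 6.0, 801)

def gauss(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

dictionary = [(m, s) for m in np.linspace(-4.0, 4.0, 17) for s in (0.5, 1.0)]

beta = 0.5                                            # power-entropy index
xi = lambda f: (f ** beta - 1.0) / beta               # xi = (U')^{-1}
xi_inv = lambda s: np.maximum(1.0 + beta * s, 0.0) ** (1.0 / beta)

def integral(values):
    # Riemann sum on the uniform grid
    return np.sum(values) * (grid[1] - grid[0])

def u_loss(f_data, f_grid):
    # empirical U-loss: -(1/n) sum_i xi(f(x_i)) + int U(xi(f(x))) dx
    return -np.mean(xi(f_data)) + integral(f_grid ** (1.0 + beta) / (1.0 + beta))

def member(m, s):
    # a dictionary density evaluated at the data points and on the grid
    return gauss(data, m, s), gauss(grid, m, s)

# (A) best single dictionary element
f_data, f_grid = min((member(m, s) for m, s in dictionary), key=lambda fg: u_loss(*fg))

# (B) stagewise updates f_{k+1} = xi_inv((1 - pi) xi(f_k) + pi xi(g))
for k in range(20):
    best = None
    for m, s in dictionary:
        g_data, g_grid = member(m, s)
        for pi in (0.05, 0.1, 0.2, 0.3):
            cand = (xi_inv((1 - pi) * xi(f_data) + pi * xi(g_data)),
                    xi_inv((1 - pi) * xi(f_grid) + pi * xi(g_grid)))
            val = u_loss(*cand)
            if best is None or val < best[0]:
                best = (val, cand)
    f_data, f_grid = best[1]

print("empirical U-loss after boosting:", u_loss(f_data, f_grid))
print("total mass of the fitted curve:", integral(f_grid))

Note that the combined function ξ_β^{-1}( (1 − π) ξ_β(f) + π ξ_β(g) ) = ( (1 − π) f^β + π g^β )^{1/β} is not renormalized; it is the ∫ U(ξ(f)) dx term in the loss that controls its total mass.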
Inner step in the convex hull. Dictionary W = { t(x, θ) : θ ∈ Θ }, learning space W*_U = ξ^{-1}( co( ξ(W) ) ). Goal: f* = argmin_{f ∈ W*_U} L_U(f); the inner step combines the current fit with a dictionary element as f̂(x) = ξ^{-1}( (1 − π̂) ξ(f̂_k(x)) + π̂ ξ(ĝ(x)) ). [Figure: dictionary elements g_1, ..., g_7 spanning W, the learning space W*_U, the target f*(x) and the estimate f̂(x).]
Non-asymptotic bound. Theorem. Assume that the data distribution has a density g(x) and that (A) a boundedness condition on U'' holds, with bound b_U. Then
E[ D_U(g, f̂_K) ] ≤ FA(g, W*_U) + EE(g, W) + IE(K, W),
where
FA(g, W*_U) = inf_{f ∈ W*_U} D_U(g, f)   (functional approximation),
EE(g, W) = 2 E[ sup_{f ∈ W} | (1/n) Σ_{i=1}^n ξ(f(x_i)) − E[ξ(f(X))] | ]   (estimation error),
IE(K, W): an iteration-effect term decreasing in K, involving the step-length constant c and b_U   (iteration effect).
Remark. Trade-off between FA(g, W*_U) and EE(g, W). 43
β-Boost. [Figure: β-Boost compared with KDE and RSDE (Girolami-He, 2004) on test densities C (skewed-unimodal), H (bimodal), L (quadrimodal).] 44
Conclusion. K-N mean: E_φ(·), Cov_φ(·). φ-path = φ-geodesic. φ-divergence (G^(φ), ∇^(φ), ∇^(φ)*): (φ-geodesic, φ*-geodesic). U-divergence (G^(U), ∇^(U), ∇^(U)*): (ξ-geodesic, m-geodesic); ξ-path; W*_U = ξ^{-1}( co( ξ(W) ) ) 45
Future problems Tangent space Path space 46
Future problems: φ-mean, φ-variance, φ-divergence. These are natural ideas from the IG viewpoint. We can build φ-efficiency in estimation theory, but what is the statistical meaning of the φ-mean and φ-variance? Can we define a φ-version of a random sample? 47
Thank you 48