Symplectic and Kähler Structures on Statistical Manifolds Induced from Divergence Functions Jun Zhang 1, and Fubo Li 1 University of Michigan, Ann Arbor, Michigan, USA Sichuan University, Chengdu, China junz@umich.edu Abstract. Divergence functions play a central role in information geometry. Given a manifold M, a divergence function D is a smooth, nonnegative function on the product manifold M M that achieves its global minimum of zero (with semi-positive definite Hessian) at those points that form its diagonal submanifold M x. It is well-known (Eguchi, 198) that the statistical structure on M (a Riemmanian metric with a pair of conjugate affine connections) can be constructed from the second and third derivatives of D evaluated at M x. Here, we investigate Riemannian and symplectic structures on M M as induced from D. Wederivea necessary condition about D for M M to admit a complex representation and thus become a Kähler manifold. In particular, Kähler potential is shown to be globally defined for the class of -divergence induced by a strictly convex function (Zhang, 004). In such case, we recover α- Hessian structure on the diagonal manifold M x, which is equiaffine and displays the so-called called reference-representation biduality. Divergence functons are fundamental objects in Information Geometry, the differential geometric study of the manifold of (parametric or non-parametric) probability distributions (see Amari, 1985; Amari and Nagaoka, 000). They measure the directed (asymmetric) difference between two points on this manifold, where each point represents a probability function or a vector in the parametric space. Divergence functions induce the statistical structure of a manifold a statistical structure consists of a Riemannian metric along with a pair of torsion-free affine connections that are conjugate to each other with respect to the metric. When the conjugate connections are Ricci-symmetric, i.e., when the connections are equiaffine, then the manifold admits, in addition, a pair of parallel volumn forms. Those geometric structures on tangent bundles were reviewed in Zhang and Matsuzoe (009). In this paper, we investigate how to use divergence functions to construct geometric structures of the cotangent bundle, specifically, the symplectic structure of a statistical manifold. For the class of -divergence functions (Zhang, 004), the statistical manifold admits, in addition, a compatible complex structure, and hence becomes a Kähler manifold. Corresponding author. F. Nielsen and F. Barbaresco (Eds.): GSI 013, LNCS 8085, pp. 595 603, 013. c Springer-Verlag Berlin Heidelberg 013
596 J. Zhang and F. Li 1 Statistical Manifolds Induced by Divergence Functions 1.1 -Divergence Functions Definition. A divergence function D : M M R 0 on a manifold M under a local chart V R n is a smooth function (differentiable up to third order) which satisfies (i) D(x, y) 0 x, y V with equality holding if and only if x = y; (ii) D i (x, x) =D,j (x, x) =0, i, j {1,,,n}; (iii) D i,j (x, x) is positive definite. Here D i (x, y) = x id(x, y), D,i (x, y) = y id(x, y) denote partial derivatives with respect to the i-th component of point x and of point y, respectively, D i,j (x, y) = x i y j D(x, y) the second-order mixed derivative, etc. Zhang (004) proposed the construction of a general family of divergece functions based on a convex function. This class of divergence functions, called - divergence here, included many familiar examples, such as Bregman divergence (Bregman, 1967), Kullback-Leibler divergence, f-divergence (Csiszar, 1967), α- divergence (Amari, 1985), U-divergence (Eguchi, 008), Jenson difference (Rao, 1987), etc. Let : V R n R,x (x) be a strictly convex function. For any two points x M, y M and any real number α ( 1, 1), strict convexity of guarantees 1 α (x)+ 1+α ( 1 α (y) x + 1+α ) y 0. The inequality sign is reversed when α > 1) (with equality holding only when x = y). Assuming to be sufficiently smooth, a family of functions on V V,as indexed by α R, can be constructed as -divergence functions, denoted D (α) (Zhang, 004): D (α) (x, y) = 4 1 α Clearly, D (α) (x, y) =D( α) ( 1 α (x)+ 1+α ( 1 α (y) x + 1+α )) y. (1) (y, x), and D (±1) (x, y) is defined by taking lim α ±1 : D (1) D ( 1) (x, y) = D( 1) (y, x) =B (x, y), (x, y) = D (1) (y, x) =B (y, x). where B is the Bregman divergence B (x, y) =(x) (y) x y, (y) () where = [ 1,, n ]and, n denotes the canonical pairing of x = [x 1,,x n ] V and u =[u 1,,u n ] Ṽ (dual to V ): x, u n = n i=1 xi u i.
Symplectic and Kähler Structures on Statistical Manifolds Induced 597 1. α-hessian Structure Induced from -Divergence A statistical manifold (M,g,Γ,Γ ) is equipped with a one-parameter family of affine connections, the so-called (Amari, 1985) α-connections Γ (α) (α R), where Γ (α) = 1+α Γ + 1 α Γ and Γ (0) = Γ. It is well-known that a statistical structure {M,g,Γ,Γ } can be induced from a divergence function. Lemma 1 (Eguchi, 1983). A divergence function induces a Riemannian metric g and a pair of torsion-free conjugate connections Γ, Γ given as g ij (x) = D i,j (x, y) x=y ; Γ ij,k (x) = D ij,k (x, y) x=y ; Γij,k k,ij(x, y) x=y. It is easily verifiable that Γ ij,k,γij,k as given above are torsion free1 and satisfy the conjugacy condition with respect to the induced metric g ij. Hence {M,g,Γ,Γ } as induced is a statistical manifold (Lauritzen, 1987). Applying the Eguchi construction to D,wehave Proposition 1 (Zhang, 004). The manifold {M,g,Γ (α),γ ( α) } associated with D (α) (x, y) is given by g ij (x) = ij (3) and Γ (α) α ij,k (x) =1 ijk, Γ (α) ij,k (x) =1+α ijk. (4) Here, ij, ijk denote, respectively, second and third partial derivatives of (x) ij = (x) x i x j, ijk = 3 (x) x i x j x k. The manifold {M,g,Γ (α),γ ( α) } is called an α-hessian manifold. It is equipped with an α-independent metric and a family of α-transitively flat connections Γ (α), i.e., Γ (α), with Γ (α) ij,k = Γ ( α) ij,k and Levi-Civita connection Γ ij,k (x) = 1 ijk. The concept of α-hessian manifold is a proper generalization of the dually flat Hessian manifold (Shima, 001; Shima and Yagi, 1997) induced from a strict convex function. Straightforward calculation shows that: Proposition (Zhang, 004; Zhang, 007; Zhang and Matsuzoe, 009). For α-hessian manifold {M,g,Γ (α),γ ( α) }, (i) the curvature tensor of the α-connection is given by: R (α) μνij = 1 α 4 ( ilν jkμ ilμ jkν )Ψ lk = R (α) l,k with Ψ ij being the matrix inverse of ij ; ijμν, 1 Conjugate connections which admit torsion has been recently studied by Calin, Matsuzoe, and Zhang (009) and Matsuzoe (010). Uohashi (00) called Γ (α) α-transitively flatness when Γ, Γ are dually flat.
598 J. Zhang and F. Li (ii) all α-connections are equiaffine, with the α-parallel volume forms (i.e., the volume forms that are parallel under α-connections) given by Ω (α) =det[ ij ] 1 α. α-hessian manifold manifests the so-called reference-representation biduality (Zhang 004; 006; Zhang and Matsuzoe, 009). Its auto-parallel curves have analytic expression. From d x i ds + j,k Γ (α)i dx j dx k jk ds ds =0 and substituting (4), we obtain d x i ki ds + 1 α dx j kij ds i i,l dx k ds ( ) d 1 α =0 ds k x =0. So the auto-parallel curves of an α-hessian manifold all have the form ( ) 1 α k x = a k s (α) + b k where the scalar s is the arc length and a k,b k,k =1,c...,nare constant vectors (determined by a point and the direction along which the auto-parallel curve flows through). For α = 1, the auto-parallel curves are given by u k = k (x) = a k s + b k are affine coordinates as previously noted. Symplectic Structue Induced by Divergence Functions A divergence function D is given as a bi-variable function on M (of dimension n). We now view it as a (single-variable) function on M M (of dimension n) that assumes zero value along the diagonal Δ M M M. We will use D to induce both a symplectic form and a compatible metric on M M, which, when restricted to Δ M, is a Lagrange submanifold that carries a statistical structure. Barndorff-Nielsen and Jupp (1997) associated a symplectic form on M M with D (called york there), defined as (apart from a minus sign that we add here) ω D (x, y) = D i,j (x, y)dx i dy j (5) (the comma separates the variable being in the first slot versus the second slot for differentiation). For example, Bregman divergence B (given by ()) induces the symplectic form ij dx i dy j. Fixing a particular y or a particular x in M M results in two n-dimensional submanifolds of M M that will be denoted, respectively, M x M {y} and M y {x} M. Let us write out the canonical symplectic form ω x on the cotangent bundle T M x given by ω x = dx i dξ i.
Symplectic and Kähler Structures on Statistical Manifolds Induced 599 Given D, we define a map L D from M M T M x, (x, y) (x, ξ) givenby L D : (x, y) (x, D i (x, y)dx i ). It is easily to check that in a neighborhood of the diagonal Δ M M M, the map L D is a diffeomorphism since the Jacobian matrix of the map ( δij D ij 0 D i,j ) is nondegenerate in such a neighborhood of the diagonal Δ M. We calculate the pullback of this symplectic form (defined on T M x )tom M: L D ω x = L D (dx i dξ i )=dx i dd i (x, y) = dx i (D ij (x, y)dx j + D i,j dy j )=D i,j (x, y)dx i dy j. (Here D ij dx i dx j =0sinceD ij (x, y) =D ji (x, y) always holds.) On the other hand, we consider the canonical symplectic form ω y = dy i dη i on M y and define a map R D from M M T M y, (x, y) (y, η) givenby R D : (x, y) =(y, D,i (x, y)dy i ). Using R D to pullback ω y to M M yields an analogous formula. Proposition 3. The symplectic form ω D (as in Eqn. 5) defined on M M is the pullback by the maps L D and R D of the canonical symplectic form ω x defined on T M x and ω y defined on T M y : L D ω x = D i,j (x, y)dx i dy j = ω D, RD ω y = D i,j (x, y)dx i dy j = ω D..1 Almost Complex, Hermite Metric, and Kähler Structures An almost complex structure J on M M is defined by a vector bundle isomorphism (from T M M to itself), with the property that J = I. Requiring J to be compatible with ω D,thatis, ω D (JX,JY)=ω D (X, Y ), X, Y T (x,y) M M, we may obtain a constraint on the divergence function D. From ( ) ( ω D x i, y j = ω D J x i,j ) ( y j = ω D y i, ) x j = ω D ( x j, y i we require, and subsequently call a divergence function proper if and only if ), D i,j = D j,i, (6)
600 J. Zhang and F. Li or explicitly D x i y j = D x j y i. Note that this condition is always satisfied on Δ M, by the definition of a divergence function D, which has allowed us to define a Riemannian structure on Δ M (Lemma 1). We now require it to be satisfied on M M (at least a neighborhood of Δ M. For proper divergence functions, we can induce a metric g D on M M the induced Riemannian (Hermit) metric g D is defined by g D (X, Y )=ω D (X, JY ). It is easily to verify g D is invariant under the almost complex structure J. The metric components are given by: g ij = g D ( x i, ) ( x j = ω D x i,j ) ( x j = ω D x i, ) y j = D i,j, ( ) ( g,ij = g D y i, y j = ω D y i,j ) ( y j = ω D y i, ) x j = D j,i, ( ) ( g i,j = g D x i, y j = ω D x i,j ) ( y j = ω D x i, ) x j =0. So the desired Riemannian metric on M M is g D = D i,j (dx i dx j + dy i dy j). In short, while a divergence function induces a Riemannian structure on the diagonal manifold Δ M of M M, a proper divergence function induces a Riemannian structure on M M that is compatible with ω D. We now discuss the Kähler structure on the product manifold M M. By definition, ds = g D 1ω ( D = D i,j dx i dx j + dy i dy j) + ( 1D i,j dx i dy j dy i dx j) = D i,j (dx i + 1dy i) ( dx j 1dy j) = D i,j dz i d z j. Now introduce complex coordinates z = x + 1y, ( z + z D(x, y) =D, z z ) D(z, z), 1
Symplectic and Kähler Structures on Statistical Manifolds Induced 601 so D z i z j = 1 4 (D ij + D,ij )= 1 D z i z j. If D satisfies D ij + D,ij = κd i,j (7) where κ is a constant, then M M admits a Kähler potential (and hence is a Kähler manifold) ds = κ D z i z j dzi d z j.. Kähler Structure Induced from -Divergence With respect to the -divergence (1), observe that ( 1 α x + 1+α ) ( y = ( 1 α + 1+α 4 4 1 )z+(1 α 1+α ) 4 4 1 ) z (α) (z, z), (8) we have (α) ( ( z i z j = 1+α 1 α ij 8 4 + 1+α ) ( 1 α 4 z + 1+α ) ) 1 4 4 z 1 which is symmetric in i, j. Both (6) and (7) are satisfied. The symplectic form, under the complex coordinates, is given by ( 1 α ω (α) = ij x + 1+α ) dx i dy j = 4 1 (α) 1+α z i z j dzi d z j and the line-element is given by ds (α) = 8 (α) 1+α z i z j dzi d z j. Theorem 1. A smooth, strictly convex function : U M R induces a a family of Kähler structure (M,ω (α),g (α) ) defined on U U M M with 1. the symplectic form ω (α) is given by ω (α) = (α) ij dxi dy j which is compatiable with the canonical almost complex structure ω (α) (JX,JY)=ω (α) (X, Y ), where X, Y are vector fields on U U;. the Riemannian metric g (α),compatiblewithj and ω (α) above, is given by g (α) = (α) ij (dxi dx j + dy i dy j );
60 J. Zhang and F. Li 3. the Kähler structure ds (α) = (α) ij dzi d z j = with the Kähler potential given by Here, (α) ij = ij ( 1 α x + 1+α y). 1+α (α) (z, z). 8 1+α (α) z i z j. For the diagonal manifold Δ M = {(x, x) :x M}, a basis of its tangent space T (x,x) Δ M can be selected as e i = 1 ( x i + y i ). The Riemannian metric on the diagonal, induced from g (α) is g (α) (e i,e j ) x=y = g (α),e i e j = (α) kl (dx k dx l + dy k dy l ), = (α) ij (x, x) = ij(x). 1 ( x i + y i ) 1 ( x j + y j ) Therefore, restricting to the diagonal Δ M, g (α) reduces to the Riemannian metric induced by the divergence D (α) through the Eguchi method. References 1. Amari, S.: Differential Geometric Methods in Statistics. Lecture Notes in Statistics, vol. 8. Springer, New York (1985) (reprinted in 1990). Amari, S., Nagaoka, H.: Method of Information Geometry. AMS Monograph. Oxford University Press (000) 3. Barndorff-Nielsen, O.E., Jupp, P.E.: Yorks and Symplectic Structures. Journal of Statistical Planning and Inferrence 63, 63133 63146 (1997) 4. Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Physics 7, 00 17 (1967) 5. Calin, O., Matsuzoe, H.: Zhang. J. Generalizations of conjugate connections. In: Trends in Differential Geometry, Complex Analysis and Mathematical Physics: Proceedings of the 9th International Workshop on Complex Structures and Vector Fields, pp. 4 34 (009) 6. Csiszár, I.: On topical properties of f-divergence. Studia Mathematicarum Hungarica, 39 339 (1967) 7. Eguchi, S.: Second order efficiency of minimum contrast estimators in a curved exponential family. Annals of Statistics 11, 793 803 (1983)
Symplectic and Kähler Structures on Statistical Manifolds Induced 603 8. Eguchi, S.: Information divergence geometry and the application to statistical machine learning. In: Emmert-Streib, F., Dehmer, M. (eds.) Information Theory and Statistical Learning, pp. 309 33. Springer (008) 9. Lauritzen, S.: Statistical manifolds. In: Amari, S., Barndorff-Nielsen, O., Kass, R., Lauritzen, S., Rao, C.R. (eds.) Differential Geometry in Statistical Inference, Hayward, CA. IMS Lecture Notes, vol. 10, pp. 163 16 (1987) 10. Matsuzoe, M.: Statistical manifolds and affine differential geometry. Advanced Studies in Pure Mathematics 57, 303 31 (010) 11. Rao, C.R.: Differential metrics in probability spaces. In: Amari, S., Barndorff- Nielsen, O., Kass, R., Lauritzen, S., Rao, C.R. (eds.) Differential Geometry in Statistical Inference, Hayward, CA. IMS Lecture Notes, vol. 10, pp. 17 40 (1987) 1. Shima, H.: Hessian Geometry, Shokabo, Tokyo (001) (in Japanese) 13. Shima, H., Yagi, K.: Geometry of Hessian manifolds. Differential Geometry and its Applications 7, 77 90 (1997) 14. Uohashi, K.: On α-conformal equivalence of statistical manifolds. Journal of Geometry 75, 179 184 (00) 15. Zhang, J.: Divergence function, duality, and convex analysis. Neural Computation 16, 159 195 (004) 16. Zhang, J.: Referential duality and representational duality on statistical manifolds. In: Proceedings of the Second International Symposium on Information Geometry and its Applications, Tokyo, pp. 58 67 (006) 17. Zhang, J.: A note on curvature of α-connections on a statistical manifold. Annals of Institute of Statistical Mathematics 59, 161 170 (007) 18. Zhang, J., Matsuzoe, H.: Dualistic differential geometry associated with a convex function. In: Gao, D.Y., Sherali, H.D. (eds.) Advances in Applied Mathematics and Global Optimization (Dedicated to Gilbert Strang on the Occasion of His 70th Birthday), Advances in Mechanics and Mathematics, ch. 13, vol. III, pp. 439 466. Springer (009)