Dual Connections. Zhengchao Wan, Peking University, June 21, 2016. Overview: duality of connections; divergences as general contrast functions.


1 Dual Connections Zhengchao Wan, Peking University, June 21, 2016

2 Overview Duality of connections. Divergences: general contrast functions.

3 Riemannian connection Let M be a manifold on which there is given a Riemannian metric $g = \langle\,,\rangle$. A connection $\nabla$ satisfying $Z\langle X, Y\rangle = \langle \nabla_Z X, Y\rangle + \langle X, \nabla_Z Y\rangle$ (1) for all vector fields $X, Y, Z \in \mathcal{T}(M)$ is called a metric connection; the torsion-free metric connection is the Riemannian (Levi-Civita) connection.

4 Dual connection Given two connections $\nabla$ and $\nabla^*$ on M, if for all $X, Y, Z$ $Z\langle X, Y\rangle = \langle \nabla_Z X, Y\rangle + \langle X, \nabla^*_Z Y\rangle$ (2) holds, then we say that $\nabla$ and $\nabla^*$ are duals of each other with respect to g, and call one either the dual connection or the conjugate connection of the other. In addition, we call such a triple $(g, \nabla, \nabla^*)$ a dualistic structure on M. If $\nabla$ is metric, then $\nabla^* = \nabla$. Hence the duality of connections may be considered a generalization of metric connections. In a statistical model S, $(g, \nabla^{(\alpha)}, \nabla^{(-\alpha)})$ is a dualistic structure.

5 Dual connection Given a local coordinate frame $[x^i]$, from Equation (2) we have $\partial_k g_{ij} = \Gamma_{ki,j} + \Gamma^*_{kj,i}$. (3) Thus, given g and $\nabla$, there exists a unique dual connection $\nabla^*$. In addition $(\nabla^*)^* = \nabla$ holds. We also see that $(\nabla + \nabla^*)/2$ becomes a metric connection. Conversely, if a connection $\nabla'$ has the same torsion as $\nabla^*$ and if $(\nabla + \nabla')/2$ is metric, then $\nabla' = \nabla^*$.

6 Submanifold Letting N be a submanifold of M, consider $\nabla^N$ and $\nabla^{*N}$, which are respectively the projections of $\nabla$ and $\nabla^*$ onto N with respect to g. These are dual with respect to $g_N$ (the metric on N determined by g). We call $(g_N, \nabla^N, \nabla^{*N})$ the dualistic structure on N induced by $(g, \nabla, \nabla^*)$, or the induced dualistic structure on N.

7 Covariant derivative Let $\gamma : t \mapsto \gamma(t)$ be a curve in M and let X and Y be vector fields along $\gamma$. In addition, let $D_t X$ and $D^*_t Y$ respectively denote the covariant derivatives of X with respect to $\nabla$ and of Y with respect to $\nabla^*$. Then from Equation (2) we see that $\frac{d}{dt}\langle X(t), Y(t)\rangle = \langle D_t X(t), Y(t)\rangle + \langle X(t), D^*_t Y(t)\rangle$. (4) Now suppose that X is parallel with respect to $\nabla$, and that Y is parallel with respect to $\nabla^*$, i.e., $D_t X = D^*_t Y = 0$. Then $\langle X(t), Y(t)\rangle$ is constant along $\gamma$.

8 Parallel translation Theorem Letting $\Pi_\gamma$ and $\Pi^*_\gamma$ ($: T_p(M) \to T_q(M)$, where p and q are the endpoints of $\gamma$) respectively denote the parallel translations along $\gamma$ with respect to $\nabla$ and $\nabla^*$, then for all $X, Y \in T_p(M)$ we have $\langle \Pi_\gamma(X), \Pi^*_\gamma(Y)\rangle_q = \langle X, Y\rangle_p$. (5) This is a generalization of the invariance of the inner product under parallel translation through a metric connection discussed in Chapter 1.

9 Curvature The relationship between $\Pi_\gamma$ and $\Pi^*_\gamma$ is completely determined by Equation (5). Hence if $\Pi_\gamma$ is independent of the actual curve joining p and q, and hence may be written as $\Pi_\gamma = \Pi_{p,q}$, then this is true of $\Pi^*_\gamma$ also. Theorem Letting the curvature tensors of $\nabla$ and $\nabla^*$ be denoted by R and $R^*$, respectively, we have $R = 0 \iff R^* = 0$. (6) This is immediate from $\langle R(X,Y)Z, W\rangle = -\langle R^*(X,Y)W, Z\rangle$ for all $X, Y, Z, W \in \mathcal{T}(M)$. (7)

10 Measure Consider a smooth function $D = D(\cdot\|\cdot) : M \times M \to \mathbb{R}$ satisfying, for any $p, q \in M$, $D(p\|q) \ge 0$, and $D(p\|q) = 0$ iff $p = q$. (8) D is a distance-like measure of the separation between two points. However, it does not in general satisfy the axioms of distance (symmetry and the triangle inequality).

11 Derivatives Given an arbitrary coordinate system $[x^i]$ of M, let us represent a pair of points $(p, p') \in M \times M$ by the pair of coordinates $([x^i], [x'^i])$ and denote the partial derivatives of $D(p\|p')$ with respect to p and p' by $D((\partial_i)_p \| p') \equiv \frac{\partial}{\partial x^i} D(x\|p')\big|_{x=p}$, $D((\partial_i)_p \| (\partial_j)_{p'}) \equiv \frac{\partial}{\partial x^i}\frac{\partial}{\partial y^j} D(x\|y)\big|_{x=p,\,y=p'}$, $D((\partial_i \partial_j)_p \| (\partial_k)_{p'}) \equiv \frac{\partial}{\partial x^i}\frac{\partial}{\partial x^j}\frac{\partial}{\partial y^k} D(x\|y)\big|_{x=p,\,y=p'}$, etc. (9) These definitions are naturally extended to $D((X_1 \cdots X_l)_p \| p')$, $D(p \| (Y_1 \cdots Y_m)_{p'})$ and $D((X_1 \cdots X_l)_p \| (Y_1 \cdots Y_m)_{p'})$ for any vector fields $X_1, \ldots, X_l, Y_1, \ldots, Y_m \in \mathcal{T}(M)$.

12 Divergence Now consider the restrictions onto the diagonal $\{(p, p) \mid p \in M\} \subset M \times M$ and denote the induced functions on M by $D[X_1 \cdots X_l \| \cdot] : p \mapsto D((X_1 \cdots X_l)_p \| p)$, $D[\cdot \| Y_1 \cdots Y_m] : p \mapsto D(p \| (Y_1 \cdots Y_m)_p)$, $D[X_1 \cdots X_l \| Y_1 \cdots Y_m] : p \mapsto D((X_1 \cdots X_l)_p \| (Y_1 \cdots Y_m)_p)$. (10) Easily, we have $D[\partial_i \| \cdot] = D[\cdot \| \partial_i] = 0$, (11) $D[\partial_i \partial_j \| \cdot] = D[\cdot \| \partial_i \partial_j] = -D[\partial_i \| \partial_j] \;(\equiv g^{(D)}_{ij})$. (12)

13 Divergence and Riemannian metric The matrix $[g^{(D)}_{ij}]$ is positive semidefinite (it is the Hessian matrix of D at its minimum point). When $[g^{(D)}_{ij}]$ is strictly positive definite everywhere on M, we say that D is a divergence or a contrast function on M. For a divergence D, a unique Riemannian metric $g^{(D)} = \langle\,,\rangle^{(D)}$ on M is defined by $\langle\partial_i, \partial_j\rangle^{(D)} = g^{(D)}_{ij}$, or equivalently by $\langle X, Y\rangle^{(D)} = -D[X \| Y]$. (13) Using the Taylor expansion, we have $D(p\|q) = \frac{1}{2} g^{(D)}_{ij}(q)\, \Delta x^i \Delta x^j + o(\|\Delta x\|^2)$, (14) where $\Delta x^i \equiv x^i(p) - x^i(q)$.
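
A minimal numeric sketch of Equation (14), assuming as an example the KL-divergence on the Bernoulli family with coordinate x = P(X = 1), whose induced metric is the Fisher information g(x) = 1/(x(1 − x)):

```python
# Near the diagonal, a divergence looks like half the squared length measured
# by the induced metric g^(D), cf. Eq. (14).
import numpy as np

def kl_bernoulli(p, q):
    """KL divergence D(p||q) between Bernoulli(p) and Bernoulli(q)."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

q = 0.3                      # base point
g = 1.0 / (q * (1 - q))      # Fisher metric at q
for dx in [1e-1, 1e-2, 1e-3]:
    exact = kl_bernoulli(q + dx, q)
    quad = 0.5 * g * dx**2   # leading term of Eq. (14)
    print(f"dx={dx:.0e}  D={exact:.3e}  (1/2)g dx^2={quad:.3e}  ratio={exact/quad:.4f}")
# The ratio tends to 1 as dx -> 0, as Eq. (14) predicts.
```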

14 Divergence and connection We define a connection $\nabla^{(D)}$ with the coefficients $\Gamma^{(D)}_{ij,k}$ by $\Gamma^{(D)}_{ij,k} = -D[\partial_i \partial_j \| \partial_k]$, (15) or equivalently by $\langle\nabla^{(D)}_X Y, Z\rangle^{(D)} = -D[XY \| Z]$. (16) It is easy to see that $\Gamma^{(D)}_{ij,k} = \Gamma^{(D)}_{ji,k}$, i.e., $\Gamma^{h(D)}_{ij} = \Gamma^{h(D)}_{ji}$, so $\nabla^{(D)}$ is torsion-free.

15 Divergence and connection $D(p\|q) = \frac{1}{2} g^{(D)}_{ij}(q)\, \Delta x^i \Delta x^j + \frac{1}{6} h^{(D)}_{ijk}(q)\, \Delta x^i \Delta x^j \Delta x^k + o(\|\Delta x\|^3)$, (17) where $h^{(D)}_{ijk} \equiv D[\partial_i \partial_j \partial_k \| \cdot] = \partial_i g^{(D)}_{jk} + \Gamma^{(D)}_{jk,i}$. (18) Conversely, we see that $g^{(D)}$ and $\nabla^{(D)}$ are determined by the expansion (17) through Equation (18).

16 Divergence and dual connection Replace D(p‖q) with its dual $D^*(p\|q) \equiv D(q\|p)$. Then we obtain $g^{(D^*)} = g^{(D)}$ and $\Gamma^{(D^*)}_{ij,k} = -D[\partial_k \| \partial_i \partial_j]$. (19) Theorem $\nabla^{(D)}$ and $\nabla^{(D^*)}$ are dual with respect to $g^{(D)}$.

17 Divergence and dual connection $D(p\|q) = D^*(q\|p) = \frac{1}{2} g^{(D)}_{ij}(p)\, \Delta x^i \Delta x^j - \frac{1}{6} h^{(D^*)}_{ijk}(p)\, \Delta x^i \Delta x^j \Delta x^k + o(\|\Delta x\|^3)$, (20) where $h^{(D^*)}_{ijk} \equiv D[\cdot \| \partial_i \partial_j \partial_k] = \partial_i g^{(D)}_{jk} + \Gamma^{(D^*)}_{jk,i}$. (21)

18 Dual connection and divergence We see that any divergence induces a torsion-free dualistic structure. Conversely, any torsion-free dualistic structure $(g, \nabla, \nabla^*)$ is induced from a divergence. In fact, if we let $D(p\|q) \equiv \frac{1}{2} g_{ij}(q)\, \Delta x^i \Delta x^j + \frac{1}{6} h_{ijk}(q)\, \Delta x^i \Delta x^j \Delta x^k$, (22) where $h_{ijk} \equiv \partial_i g_{jk} + \Gamma_{jk,i} = \Gamma_{ij,k} + \Gamma^*_{ik,j} + \Gamma_{jk,i}$, (23) then $(g, \nabla, \nabla^*) = (g^{(D)}, \nabla^{(D)}, \nabla^{(D^*)})$.

19 Dually flat space Let $(g, \nabla, \nabla^*)$ be a dualistic structure on a manifold M. If $\nabla$ and $\nabla^*$ are both symmetric ($T = T^* = 0$), then from the theorem before we see that $\nabla$-flatness and $\nabla^*$-flatness are equivalent. We call $(M, g, \nabla, \nabla^*)$ a dually flat space if both dual connections are flat.

20 Autoparallel Theorem Let $(M, g, \nabla, \nabla^*)$ be a dually flat space. If a submanifold N of M is autoparallel with respect to either $\nabla$ or $\nabla^*$, then N is a dually flat space with respect to the dualistic structure $(g_N, \nabla^N, \nabla^{*N})$ induced on N by $(g, \nabla, \nabla^*)$.

21 Dual coordinates Suppose $(U; \theta^i, \eta_j)$ is a coordinate neighborhood of a dually flat space $(M, g, \nabla, \nabla^*)$, where $[\theta^i]$ and $[\eta_j]$ denote affine coordinate systems for $\nabla$ and $\nabla^*$ respectively. We let $\partial_i \equiv \frac{\partial}{\partial\theta^i}$ and $\partial^j \equiv \frac{\partial}{\partial\eta_j}$. $\langle\partial_i, \partial^j\rangle$ is constant on U since the two fields are respectively parallel on the flat manifold. Thus we can choose particular coordinate systems such that $\langle\partial_i, \partial^j\rangle = \delta^j_i$. (24) Two such systems are called mutually dual. We see then that a Euclidean coordinate system, defined by $\langle\partial_i, \partial_j\rangle = \delta_{ij}$ (an affine coordinate system), is self-dual.

22 Dual coordinates Dual coordinate systems do not generally exist for a Riemannian manifold. If $(M, g, \nabla, \nabla^*)$ is a dually flat space, then dual coordinate systems exist. Conversely, if for a Riemannian manifold (M, g) there exist such coordinate systems, then the connections $\nabla$ and $\nabla^*$ for which they are affine are determined, and $(M, g, \nabla, \nabla^*)$ is a dually flat space.

23 Dual coordinates Let the components of g with respect to $[\theta^i]$ and $[\eta_j]$ be defined by $g_{ij} \equiv \langle\partial_i, \partial_j\rangle$ and $g^{ij} \equiv \langle\partial^i, \partial^j\rangle$. (25) Considering $\partial^j = (\partial^j\theta^i)\,\partial_i$ and $\partial_i = (\partial_i\eta_j)\,\partial^j$, Equation (24) is equivalent to $\frac{\partial\eta_j}{\partial\theta^i} = g_{ij}$ and $\frac{\partial\theta^i}{\partial\eta_j} = g^{ij}$, (26) and therefore $g_{ij}\, g^{jk} = \delta^k_i$.

24 Legendre transformations Suppose we are given mutually dual coordinate systems $[\theta^i]$ and $[\eta_j]$, and consider the following partial differential equation for a function $\psi : M \to \mathbb{R}$: $\partial_i \psi = \eta_i$. (27) Rewriting this as $d\psi = \eta_i\, d\theta^i$, a solution exists iff $\partial_i \eta_j = \partial_j \eta_i$. Since $\partial_i \eta_j = g_{ij} = \partial_j \eta_i$, a solution $\psi$ always exists, and $\partial_i \partial_j \psi = g_{ij}$. (28) The Hessian matrix of $\psi$ is positive definite; thus $\psi$ is a strictly convex function of $[\theta^1, \ldots, \theta^m]$.

25 Legendre transformations Similarly, a solution $\phi$ to $\partial^i \phi = \theta^i$ (29) exists. In fact, $\phi = \theta^i \eta_i - \psi$ is a solution, $\partial^i \partial^j \phi = g^{ij}$, (30) and hence $\phi$ is a strictly convex function of $[\eta_1, \ldots, \eta_m]$.

26 Legendre transformations From convexity we have $\phi(q) = \max_{p \in M}\{\theta^i(p)\eta_i(q) - \psi(p)\}$, (31) $\psi(p) = \max_{q \in M}\{\theta^i(p)\eta_i(q) - \phi(q)\}$. (32) Sometimes it is more natural to view these relations as $\phi(\eta) = \max_{\theta \in \Theta}\{\theta^i\eta_i - \psi(\theta)\}$, (33) $\psi(\theta) = \max_{\eta \in H}\{\theta^i\eta_i - \phi(\eta)\}$, (34) where $\psi$ and $\phi$ are simply convex functions defined on convex regions $\Theta$ and H in $\mathbb{R}^m$.
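
A small numeric sketch of Equations (27) and (33), assuming as an example the potential $\psi(\theta) = \log(1 + e^\theta)$ (the Bernoulli log-partition), whose conjugate is $\phi(\eta) = \eta\log\eta + (1-\eta)\log(1-\eta)$; the maximization in (33) is done by brute-force grid search:

```python
import numpy as np

psi = lambda th: np.log1p(np.exp(th))
phi_closed = lambda eta: eta * np.log(eta) + (1 - eta) * np.log(1 - eta)

thetas = np.linspace(-30, 30, 200001)            # grid over Theta
eta = 0.7
phi_grid = np.max(eta * thetas - psi(thetas))    # Eq. (33) by brute force
print(phi_grid, phi_closed(eta))                 # agree to grid accuracy

# The maximizer theta* satisfies psi'(theta*) = eta, i.e. Eq. (27):
theta_star = thetas[np.argmax(eta * thetas - psi(thetas))]
print(theta_star, np.log(eta / (1 - eta)))       # logit(eta)
```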

27 Legendre transformations The coordinate transformations expressed in Equations (27) through (32) are called Legendre transformations, and $\psi$ and $\phi$ are called their potentials. Note also that $\Gamma^*_{ij,k} \equiv \langle\nabla^*_{\partial_i}\partial_j, \partial_k\rangle = \partial_i \partial_j \partial_k \psi$, (35) $\Gamma^{ij,k} \equiv \langle\nabla_{\partial^i}\partial^j, \partial^k\rangle = \partial^i \partial^j \partial^k \phi$, (36) which are derived from Equation (3) combined with $\Gamma_{ij,k} = \Gamma^{*\,ij,k} = 0$.

28 Canonical divergence Let $(M, g, \nabla, \nabla^*)$ be a dually flat space, on which we are given mutually dual affine coordinate systems $\{[\theta^i], [\eta_i]\}$. The canonical divergence or $(g, \nabla)$-divergence is defined as $D(p\|q) \equiv \psi(p) + \phi(q) - \theta^i(p)\eta_i(q)$. (37) Then from Equations (31) and (32) we see that $D(p\|q) \ge 0$ and $D(p\|q) = 0 \iff p = q$. It is easy to verify the equations $D((\partial_i \partial_j)_p \| q) = g_{ij}(p)$ and $D(p \| (\partial^i \partial^j)_q) = g^{ij}(q)$, (38) which immediately imply that D is a divergence and induces g. Also $\nabla = \nabla^{(D)}$ and $\nabla^* = \nabla^{(D^*)}$, since the coefficients of $\nabla^{(D)}$ vanish in $[\theta^i]$ and those of $\nabla^{(D^*)}$ vanish in $[\eta_i]$, due to the $\nabla$-affinity of $[\theta^i]$ and the $\nabla^*$-affinity of $[\eta_i]$.
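
Written out numerically, Equation (37) is the Bregman divergence of $\psi$: with $\eta = \partial\psi$ (Equation (27)) and $\phi = \theta^i\eta_i - \psi$ (slide 25), $D(p\|q) = \psi(\theta_p) - \psi(\theta_q) - \partial\psi(\theta_q)\cdot(\theta_p - \theta_q)$. A sketch with an assumed convex potential (log-sum-exp, chosen only for illustration) and finite-difference gradients:

```python
import numpy as np

def psi(th):                      # log-sum-exp: convex and smooth (example choice)
    return np.log(1 + np.exp(th[0]) + np.exp(th[1]))

def grad_psi(th, h=1e-6):         # eta(theta) by central differences
    e = np.eye(2)
    return np.array([(psi(th + h * e[i]) - psi(th - h * e[i])) / (2 * h)
                     for i in range(2)])

def canonical_D(th_p, th_q):
    eta_q = grad_psi(th_q)
    phi_q = th_q @ eta_q - psi(th_q)           # phi at q via Legendre duality
    return psi(th_p) + phi_q - th_p @ eta_q    # Eq. (37)

def bregman_D(th_p, th_q):
    return psi(th_p) - psi(th_q) - grad_psi(th_q) @ (th_p - th_q)

p, q = np.array([0.5, -1.0]), np.array([-0.2, 0.8])
print(canonical_D(p, q), bregman_D(p, q))      # identical up to FD error
```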

29 Note 1 The canonical divergence is defined globally, though it uses locally defined charts; this is guaranteed by the following lemma. Lemma Suppose M is connected and flat with respect to $\nabla$; then any two (indeed, any finitely many) points of M can be contained in a single affine chart.

30 Note 2 If we are given another pair of dual affine coordinate systems expressed by $\tilde\theta^j = A^j_i \theta^i + B^j$, $\tilde\eta_j = C^i_j \eta_i + D_j$, (39) $\tilde\psi = \psi + D_j \tilde\theta^j + c$, $\tilde\phi = \phi + B^j \tilde\eta_j - B^j D_j - c$, (40) where $[A^j_i]$ is a regular matrix and $[C^i_j]$ is its inverse, $[B^j]$ and $[D_j]$ are real vectors, and c is a real number, then we have $\psi(p) + \phi(q) - \theta^i(p)\eta_i(q) = \tilde\psi(p) + \tilde\phi(q) - \tilde\theta^i(p)\tilde\eta_i(q)$, (41) which indicates that the canonical divergence is well defined. On $(M, g, \nabla^*, \nabla)$, we define the dual $(g, \nabla^*)$-divergence $D^*(p\|q) = D(q\|p)$.

31 Example If $\nabla$ is the Riemannian connection, the condition for dual flatness reduces to $\nabla$ being flat, and hence there exists a Euclidean coordinate system $[\theta^i]$, which is self-dual ($\theta^i = \eta_i$), with potential $\psi = \phi = \frac{1}{2}\sum_i (\theta^i)^2$. Hence we obtain $D(p\|q) = \frac{1}{2}\sum_i \{(\theta^i(p))^2 + (\theta^i(q))^2 - 2\theta^i(p)\theta^i(q)\} = \frac{1}{2}\{d(p,q)\}^2$, (42) where d is the Euclidean distance $d(p,q) \equiv \sqrt{\sum_i \{\theta^i(p) - \theta^i(q)\}^2}$. In general, D(p‖q) on a dually flat space is only approximately equal to $\frac{1}{2}\{d(p,q)\}^2$, in the sense of Equation (14).

32 Triangular relation Theorem Let $\{[\theta^i], [\eta_i]\}$ be mutually dual affine coordinate systems of a dually flat space $(M, g, \nabla, \nabla^*)$, and let D be a divergence on M. Then a necessary and sufficient condition for D to be the $(g, \nabla)$-divergence is that for all $p, q, r \in M$ the following triangular relation holds: $D(p\|q) + D(q\|r) - D(p\|r) = \{\theta^i(p) - \theta^i(q)\}\{\eta_i(r) - \eta_i(q)\}$. (43)

33 Pythagorean relation Theorem Let p, q, and r be three points in M. Let $\gamma_1$ be the $\nabla$-geodesic connecting p and q, and let $\gamma_2$ be the $\nabla^*$-geodesic connecting q and r. If at the intersection q the curves $\gamma_1$ and $\gamma_2$ are orthogonal (with respect to the inner product g), then we have the Pythagorean relation $D(p\|r) = D(p\|q) + D(q\|r)$. (44)
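
A numeric check of the relation (44) for the KL-divergence on the 3-point simplex (the specific numbers below are arbitrary choices): for D = KL, $\gamma_1$ must be an m-geodesic (straight in probabilities) and $\gamma_2$ an e-geodesic (straight in log-probabilities), and by Equation (43) orthogonality at q reads $\sum_i (p_i - q_i)(\log r_i - \log q_i) = 0$:

```python
import numpy as np

kl = lambda a, b: np.sum(a * np.log(a / b))

q = np.array([0.5, 0.3, 0.2])
u = np.array([0.1, -0.04, -0.06])        # m-direction: components sum to 0
p = q + u                                # point on the m-geodesic through q

w = np.array([1.0, 1.0, -1.0])           # e-direction, to be orthogonalized
w -= (w @ u) / (u @ u) * u               # enforce sum_i u_i w_i = 0
r = q * np.exp(0.3 * w); r /= r.sum()    # point on the e-geodesic through q

print(np.sum((p - q) * (np.log(r) - np.log(q))))   # ~0: orthogonality at q
print(kl(p, r), kl(p, q) + kl(q, r))               # Pythagorean relation (44)
```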

34 Pythagorean relation Figure: The Pythagorean relation for the $(g, \nabla)$-divergence.

35 Projection Corollary Let p be a point in M and let N be a submanifold of M which is $\nabla^*$-autoparallel. Then a necessary and sufficient condition for a point q in N to satisfy $D(p\|q) = \min_{r \in N} D(p\|r)$ is for the $\nabla$-geodesic connecting p and q to be orthogonal to N at q. The point q is called the $\nabla$-projection of p onto N when the $\nabla$-geodesic connecting p and $q \in N$ is orthogonal to N.

36 Projection Figure: The projection theorem for the $(g, \nabla)$-divergence.

37 Projection Theorem Let p be a point in M and let N be a submanifold of M. A necessary and sufficient condition for a point $q \in N$ to be a stationary point of the function $D(p\|\cdot) : r \mapsto D(p\|r)$ restricted to N (in other words, for the partial derivatives with respect to a coordinate system of N to all vanish) is for the $\nabla$-geodesic connecting p and q to be orthogonal to N at q. Corollary Given a point p in M and a positive number c, suppose that the D-sphere $N = \{q \in M \mid D(p\|q) = c\}$ forms a hypersurface in M. Then every $\nabla$-geodesic passing through the center p orthogonally intersects N.

38 em algorithm Given two submanifolds K and S of a dually flat space M, we define a divergence between K and S by $D[K\|S] \equiv \min_{p \in K,\, q \in S} D(p\|q) = D(\bar p\|\bar q)$, (45) where D is the $(g, \nabla)$-divergence of M and $\bar p \in K$ and $\bar q \in S$ are the closest pair between K and S. In order to obtain the closest pair, the following iterative algorithm is proposed.

39 em algorithm Figure: Iterated dual geodesic projections (em algorithm)

40 em algorithm Begin with an arbitrary $Q_0 \in S$ and, for $t = 0, 1, \ldots$, search for the $P \in K$ that minimizes $D(P\|Q_t)$, which is given by the geodesic projection of $Q_t$ onto K; let it be $P_t \in K$. Then search for the point in S that minimizes $D(P_t\|Q)$, which is given by the dual geodesic projection of $P_t$ onto S, denoted $Q_{t+1}$. Since we have $D(P_{t-1}\|Q_t) \ge D(P_t\|Q_t) \ge D(P_t\|Q_{t+1})$, (46) the procedure converges. The limit is unique when S is flat and K is dually flat; otherwise, the converging point is not necessarily unique.
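
A toy sketch of the iteration, with assumed choices: K an exponential-family curve and S a mixture segment on the 3-point simplex, and each projection computed by brute-force 1-D grid search rather than in closed form:

```python
import numpy as np

kl = lambda a, b: np.sum(a * np.log(a / b))

f = np.array([0.0, 1.0, 2.0])
def p_K(theta):                            # K: exponential-family curve (e-flat)
    w = np.exp(theta * f)
    return w / w.sum()

a = np.array([0.7, 0.2, 0.1])
b = np.array([0.1, 0.3, 0.6])
def q_S(lam):                              # S: mixture segment (m-flat)
    return (1 - lam) * a + lam * b

thetas = np.linspace(-3, 3, 1201)
lams = np.linspace(0.0, 1.0, 1201)

lam = 0.9                                  # arbitrary starting point Q_0 in S
for t in range(15):
    theta = thetas[np.argmin([kl(p_K(th), q_S(lam)) for th in thetas])]   # P_t
    lam = lams[np.argmin([kl(p_K(theta), q_S(l)) for l in lams])]         # Q_{t+1}
    print(t, kl(p_K(theta), q_S(lam)))     # non-increasing, as in Eq. (46)
```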

41 f-divergence Let f(u) be a convex function on $u > 0$. For each pair of probability distributions p, q, we define $D_f(p\|q) \equiv \int p(x)\, f\!\left(\frac{q(x)}{p(x)}\right) dx$ (47) and call it the f-divergence.
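
Equation (47) transcribed directly for finite distributions; the generator f is passed in, so the same function yields the KL-divergence and its dual as special cases (a sketch; the test distributions are arbitrary):

```python
import numpy as np

def f_divergence(p, q, f):
    """D_f(p||q) = sum_x p(x) f(q(x)/p(x)) for finite distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sum(p * f(q / p))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])

f_kl  = lambda u: -np.log(u)      # alpha = -1 generator: KL(p||q), cf. Eq. (52)
f_rkl = lambda u: u * np.log(u)   # alpha = +1 generator: KL(q||p)
print(f_divergence(p, q, f_kl),  np.sum(p * np.log(p / q)))   # agree
print(f_divergence(p, q, f_rkl), np.sum(q * np.log(q / p)))   # agree
```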

42 Properties of f-divergence Using Jensen's inequality we have $D_f(p\|q) \ge f\!\left(\int p(x)\, \frac{q(x)}{p(x)}\, dx\right) = f(1)$, (48) where the equality holds if $p = q$; conversely, the equality implies $p = q$ when f(u) is strictly convex at $u = 1$. $D_f$ is invariant when f(u) is replaced with $f(u) + c(u - 1)$ for any $c \in \mathbb{R}$.

43 Properties of f-divergence $D^*_f = D_{f^*}$, where $f^*(u) = u f(1/u)$. Monotonicity Let $\kappa = \{\kappa(y|x) \ge 0;\ x \in \mathcal{X}, y \in \mathcal{Y}\}$ be an arbitrary transition probability distribution such that $\int \kappa(y|x)\, dy = 1$ for all x, whereby the value of x is randomly transformed to y according to the probability $\kappa(y|x)$. Denoting the distributions of y derived from p(x) and q(x) by $p_\kappa(y)$ and $q_\kappa(y)$ respectively, we have $D_f(p\|q) \ge D_f(p_\kappa\|q_\kappa)$. (49)

44 Properties of f-divergence Proof of monotonicity. $D_f(p\|q) = \iint p(x)\kappa(y|x)\, f\!\left(\frac{q(x)}{p(x)}\right) dx\, dy = \iint p_\kappa(y)\, p_\kappa(x|y)\, f\!\left(\frac{q(x)}{p(x)}\right) dx\, dy \ge \int p_\kappa(y)\, f\!\left(\int p_\kappa(x|y)\, \frac{q(x)}{p(x)}\, dx\right) dy = D_f(p_\kappa\|q_\kappa)$. (50) The equality holds if $p_\kappa(x|y) = q_\kappa(x|y)$ for all x and y.
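
The monotonicity (49) checked numerically with an assumed random column-stochastic kernel κ (a sketch using the KL generator f(u) = −log u):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda u: -np.log(u)                        # KL generator (alpha = -1)
Df = lambda p, q: np.sum(p * f(q / p))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])

K = rng.random((4, 3))                          # kappa(y|x): 4 outputs, 3 inputs
K /= K.sum(axis=0, keepdims=True)               # each column sums to 1
p_k, q_k = K @ p, K @ q                         # transformed distributions

print(Df(p, q), ">=", Df(p_k, q_k))             # processing never increases D_f
assert Df(p, q) >= Df(p_k, q_k)
```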

45 Joint convexity The joint convexity $D_f(\lambda p_1 + (1-\lambda)p_2 \,\|\, \lambda q_1 + (1-\lambda)q_2) \le \lambda D_f(p_1\|q_1) + (1-\lambda) D_f(p_2\|q_2)$, $0 \le \lambda \le 1$, (51) follows from the convexity of $(p, q) \mapsto p f(q/p)$: $(\lambda_1 p_1 + \lambda_2 p_2)\, f\!\left(\frac{\lambda_1 q_1 + \lambda_2 q_2}{\lambda_1 p_1 + \lambda_2 p_2}\right) = (\lambda_1 p_1 + \lambda_2 p_2)\, f\!\left(\frac{\lambda_1 p_1}{\lambda_1 p_1 + \lambda_2 p_2}\frac{q_1}{p_1} + \frac{\lambda_2 p_2}{\lambda_1 p_1 + \lambda_2 p_2}\frac{q_2}{p_2}\right) \le \lambda_1 p_1 f\!\left(\frac{q_1}{p_1}\right) + \lambda_2 p_2 f\!\left(\frac{q_2}{p_2}\right)$.

46 f-divergence Assume f is strictly convex and smooth with $f(1) = 0$; then $D_f$ becomes a divergence and induces the metric $g^{(D_f)} = g^{(f)}$ and the connection $\nabla^{(D_f)} = \nabla^{(f)}$.

47 α-divergence Important examples of smooth f-divergences are given by the α-divergence $D^{(\alpha)} = D_{f^{(\alpha)}}$ for a real number α, which is defined by $f^{(\alpha)}(u) = \begin{cases} \frac{4}{1-\alpha^2}\{1 - u^{(1+\alpha)/2}\} & (\alpha \ne \pm 1) \\ u \log u & (\alpha = 1) \\ -\log u & (\alpha = -1). \end{cases}$ (52) We have for $\alpha \ne \pm 1$: $D^{(\alpha)}(p\|q) = \frac{4}{1-\alpha^2}\left\{1 - \int p(x)^{\frac{1-\alpha}{2}}\, q(x)^{\frac{1+\alpha}{2}}\, dx\right\}$. (53)

48 α-divergence and for $\alpha = \pm 1$: $D^{(-1)}(p\|q) = D^{(1)}(q\|p) = \int p(x) \log\frac{p(x)}{q(x)}\, dx$. (54) We can immediately see that the α-divergence $D^{(\alpha)}$ induces $(g^{(f^{(\alpha)})}, \nabla^{(f^{(\alpha)})}) = (g, \nabla^{(\alpha)})$. Note that $D^{(\alpha)}(p\|q) = D^{(-\alpha)}(q\|p)$ generally holds. In particular, $D^{(0)}(p\|q)$ is symmetric, and moreover $\sqrt{D^{(0)}(p\|q)}$ satisfies the axioms of distance, which follows since $D^{(0)}(p\|q) = 2\int (\sqrt{p(x)} - \sqrt{q(x)})^2\, dx$. (55) $\sqrt{D^{(0)}(p\|q)}$ is called the Hellinger distance.
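
Equation (53) for finite distributions, with two sanity checks from above: the duality $D^{(\alpha)}(p\|q) = D^{(-\alpha)}(q\|p)$ and the α = 0 form (55) (the test distributions are arbitrary choices):

```python
import numpy as np

def alpha_div(p, q, alpha):
    """Eq. (53), valid for alpha != +-1."""
    assert abs(alpha) != 1
    c = 4.0 / (1.0 - alpha**2)
    return c * (1.0 - np.sum(p**((1 - alpha) / 2) * q**((1 + alpha) / 2)))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])

print(alpha_div(p, q, 0.5), alpha_div(q, p, -0.5))                     # duality
print(alpha_div(p, q, 0.0), 2 * np.sum((np.sqrt(p) - np.sqrt(q))**2))  # Eq. (55)
```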

49 Kullback divergence The ±1-divergence is called the Kullback divergence or Kullback-Leibler (KL) divergence. Here we refer to $D^{(-1)}$ as the KL-divergence and to $D^{(1)}$ as its dual. The KL-divergence satisfies the chain rule: $D^{(-1)}(p\|q) = D^{(-1)}(p_\kappa\|q_\kappa) + \int D^{(-1)}(p_\kappa(\cdot|y)\,\|\,q_\kappa(\cdot|y))\, p_\kappa(y)\, dy$. (56)

50 Expectation parameters In an exponential family $p(x;\theta) = \exp[C(x) + \theta^i F_i(x) - \psi(\theta)]$, (57) the natural parameters $[\theta^i]$ form a 1-affine chart. Now if we define $\eta_i = \eta_i(\theta) \equiv E_\theta[F_i] = \int F_i(x)\, p(x;\theta)\, dx$, (58) then $\eta_i = \partial_i \psi$ and $\partial_i \partial_j \psi = g_{ij}$. Hence $[\eta_i]$ is a (−1)-affine chart dual to $[\theta^i]$, and $\psi$ is the potential of a Legendre transformation. We call this $[\eta_i]$ the expectation parameters or the dual parameters.

51 Examples Normal distribution: $\eta_1 = \mu = -\frac{\theta_1}{2\theta_2}$, $\eta_2 = \mu^2 + \sigma^2 = \frac{(\theta_1)^2}{4(\theta_2)^2} - \frac{1}{2\theta_2}$. Poisson distribution: $\eta = \xi = e^\theta$. $\mathcal{P}(\mathcal{X})$ for finite $\mathcal{X}$: $\eta_i = p(x_i) = \xi_i = \frac{e^{\theta^i}}{1 + \sum_{j=1}^n e^{\theta^j}}$.
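
The normal-distribution example checked numerically: in natural coordinates $\theta_1 = \mu/\sigma^2$, $\theta_2 = -1/(2\sigma^2)$ the log-partition is $\psi(\theta) = -\theta_1^2/(4\theta_2) + \frac{1}{2}\log(\pi/(-\theta_2))$, and $\eta = \partial\psi$ (Equation (58)) reproduces $(\mu, \mu^2 + \sigma^2)$. A sketch with finite differences:

```python
import numpy as np

def psi(th1, th2):                 # log-partition of N(mu, sigma^2), th2 < 0
    return -th1**2 / (4 * th2) + 0.5 * np.log(np.pi / (-th2))

mu, sigma2 = 1.5, 0.8
th1, th2 = mu / sigma2, -1.0 / (2 * sigma2)

h = 1e-6                           # central differences for eta = grad psi
eta1 = (psi(th1 + h, th2) - psi(th1 - h, th2)) / (2 * h)
eta2 = (psi(th1, th2 + h) - psi(th1, th2 - h)) / (2 * h)
print(eta1, mu)                    # eta_1 = mu
print(eta2, mu**2 + sigma2)        # eta_2 = mu^2 + sigma^2
```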

52 Entropy The dual potential $\phi$ is given by $\phi(\eta) = \theta^i\eta_i - \psi(\theta) = E_\theta[\log p_\theta - C] = -H(p_\theta) - E_\theta[C]$, (59) where H is the entropy: $H(p) \equiv -\int p(x) \log p(x)\, dx$. In addition, we have $\phi(\eta(\theta)) = \max_{\theta'}\{\theta'^i \eta_i(\theta) - \psi(\theta')\}$, (60) where the maximum is attained at $\theta' = \theta$.

53 Canonical divergence The ±1-divergence is exactly the canonical $(g, \nabla^{(\pm 1)})$-divergence. The triangular relation can be rewritten as $D(p\|q) + D(q\|r) - D(p\|r) = \int \{p(x) - q(x)\}\{\log r(x) - \log q(x)\}\, dx$, (61) where $D = D^{(-1)}$ is the KL-divergence.
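
Equation (61) checked directly for three arbitrary finite distributions; unlike the Pythagorean case, no orthogonality is needed here, since the identity always holds:

```python
import numpy as np

kl = lambda a, b: np.sum(a * np.log(a / b))

rng = np.random.default_rng(1)
p, q, r = (rng.dirichlet(np.ones(5)) for _ in range(3))

lhs = kl(p, q) + kl(q, r) - kl(p, r)
rhs = np.sum((p - q) * (np.log(r) - np.log(q)))
print(lhs, rhs)     # equal to floating-point accuracy
```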

54 Projection From the theorems on the canonical divergence, the solutions to the minimization problems $\min_{q \in M} D(p\|q)$ and $\min_{q \in M} D(q\|p)$ are respectively given by the $\nabla^{(m)}$-projection and the $\nabla^{(e)}$-projection.

55 Principle of maximum entropy Given (n + 1) functions $C, F_1, \ldots, F_n : \mathcal{X} \to \mathbb{R}$, let $S = \{p_\theta \mid \theta \in \Theta\}$ be the n-dimensional exponential family (57). Then for any $\theta \in \Theta$ and any $q \in \mathcal{P}(\mathcal{X})$ we have $H(p_\theta) + E_{p_\theta}[C] + \theta^i E_{p_\theta}[F_i] - H(q) - E_q[C] - \theta^i E_q[F_i] = D(q\|p_\theta) \ge 0$, (62) which leads to $\max_{q \in \mathcal{P}(\mathcal{X})}\{H(q) + E_q[C] + \theta^i E_q[F_i]\} = H(p_\theta) + E_{p_\theta}[C] + \theta^i E_{p_\theta}[F_i] = \psi(\theta)$. (63)

56 Principle of maximum entropy Given a vector $\lambda = (\lambda_1, \ldots, \lambda_n) \in \mathbb{R}^n$, let $M_\lambda \equiv \{q \in \mathcal{P} \mid E_q[F_i] = \lambda_i,\ i = 1, \ldots, n\}$. (64) Now assume $S \cap M_\lambda \ne \emptyset$ and suppose there exists $\theta_\lambda \in \Theta$ s.t. $\eta_i(\theta_\lambda) = E_{p_{\theta_\lambda}}[F_i] = \lambda_i$ for $i = 1, \ldots, n$. Then we have $\max_{q \in M_\lambda}\{H(q) + E_q[C]\} = H(p_{\theta_\lambda}) + E_{p_{\theta_\lambda}}[C] = \psi(\theta_\lambda) - \theta^i_\lambda \lambda_i = \min_{\theta \in \Theta}\{\psi(\theta) - \theta^i \lambda_i\}$. (65) When $C = 0$ it follows that $\max_{q \in M_\lambda} H(q) = H(p_{\theta_\lambda})$, which is called the principle of maximum entropy.
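
A numeric sketch of the dual problem $\min_\theta\{\psi(\theta) - \theta^i\lambda_i\}$ in Equation (65), on an assumed 4-point sample space with F(x) = x and C = 0, solved by 1-D grid search:

```python
import numpy as np

x = np.arange(4.0)                         # X = {0,1,2,3}, F(x) = x, C = 0
lam = 1.2                                  # moment constraint E_q[F] = lam

thetas = np.linspace(-5.0, 5.0, 100001)
psi = np.log(np.exp(thetas[:, None] * x).sum(axis=1))   # log-partition on a grid
theta = thetas[np.argmin(psi - thetas * lam)]           # dual problem, Eq. (65)

p = np.exp(theta * x); p /= p.sum()        # the max-entropy distribution
print(p @ x)                               # ~1.2: the constraint holds
print(-(p @ np.log(p)))                    # its entropy H(p_{theta_lambda})
```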

57 Boltzmann-Gibbs distribution The thermal equilibrium state which maximizes the thermodynamical entropy $S(p) \equiv kH(p)$, where $k > 0$ is Boltzmann's constant, under the constraint $E_q[\epsilon] = \bar\epsilon$ on the average of the energy function $\epsilon$, is given by the Boltzmann-Gibbs distribution $\bar p(x) = \frac{1}{Z} e^{-\epsilon(x)/kT}$, (66) where T is the temperature and Z is the partition function. This corresponds to the previous situation by letting $C = 0$, $n = 1$, $F_1 = -\epsilon$, $\lambda = -\bar\epsilon$, $\theta_\lambda = 1/kT$ and $\psi(\theta_\lambda) = \log Z$.

58 Statistical model with hidden variables Consider a statistical model $M = \{p(x;\xi)\}$, where x is divided into two parts $x = (y, h)$ so that $p(x;\xi) = p(y, h;\xi)$. When x is not fully observed but y is observed, h is called a hidden variable. In such a case, we estimate $\xi$ from the observed y. Actually, we want to compute the MLE for $p_Y(y;\xi) = \int p(y, h;\xi)\, dh$. However, in many cases the form of $p(x;\xi)$ is simple and estimation is tractable in M, while $p_Y(y;\xi)$ is complicated and its estimation is computationally intractable.

59 Empirical distribution Consider a larger model $S = \{q(y, h)\}$ consisting of all probability densities of (y, h). We do not have the empirical distribution $\hat q(x) = \frac{1}{N}\sum_i \delta(x - x_i)$ but only the empirical distribution $\hat q_Y(y)$ of y. We use an arbitrary conditional distribution $q(h|y)$ and put $q(y, h) = \hat q_Y(y)\, q(h|y)$. (67) Taking all such candidates for the observed points, we consider the submanifold $D = \{q(y, h) \mid q(y, h) = \hat q_Y(y)\, q(h|y),\ q(h|y)\ \text{arbitrary}\}$. (68)

60 Empirical distribution D is the observed submanifold of S specified by the partially observed data $y_1, \ldots, y_N$. By using the empirical distribution, it is written as $q(y, h) = \frac{1}{N}\sum_i \delta(y - y_i)\, q(h|y_i)$. (69) The data submanifold D is m-flat, because it is linear with respect to $q(h|y_i)$.

61 MLE and KL-divergence Consider the minimizer of the KL-divergence from the data manifold D to the model manifold M: $D[D : M] = \min \iint \hat q_Y(y)\, q(h|y) \log\frac{\hat q_Y(y)\, q(h|y)}{p(y, h;\xi)}\, dy\, dh$. (70) Theorem The MLE of $p_Y(y;\xi)$ is the minimizer of the KL-divergence from D to M. In fact, we minimize the expression above with respect to both $\xi$ and $q(h|y)$ alternately by the em algorithm, that is, by the alternating use of the e-projection and the m-projection.

62 Algorithm (em) 1. Choose an initial parameter $\xi_0$. 2. E-step: e-project $p(\cdot;\xi_0)$ to D. It can be verified that the e-projection is $q(h|y) = p(h|y;\xi_0)$. 3. M-step: maximize the expected log-likelihood $L(\xi, \xi_0) = \frac{1}{N}\sum_i \int p(h|y_i;\xi_0) \log p(y_i, h;\xi)\, dh$ (71) to obtain a new candidate $\xi_1$ in M. It can be verified that this is the m-projection. 4. Repeat steps 2 and 3.
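
A compact sketch of the algorithm for an assumed two-component Gaussian mixture with unit variances (the E-step computes $p(h|y;\xi_t)$; the M-step maximizes Equation (71) in closed form):

```python
import numpy as np

rng = np.random.default_rng(2)
y = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 700)])  # observed

pi_, mu = 0.5, np.array([-1.0, 1.0])      # initial xi_0 (unit variances fixed)
for t in range(50):
    # E-step: e-projection, q(h|y_i) = p(h|y_i; xi_t)
    w1 = pi_ * np.exp(-0.5 * (y - mu[0])**2)
    w2 = (1 - pi_) * np.exp(-0.5 * (y - mu[1])**2)
    r = w1 / (w1 + w2)                    # responsibility of component 0
    # M-step: m-projection, maximize the expected log-likelihood (71)
    pi_ = r.mean()
    mu = np.array([(r * y).sum() / r.sum(),
                   ((1 - r) * y).sum() / (1 - r).sum()])
print(pi_, mu)                            # ~0.3 and means near (-2, 3)
```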

63 Theorem The KL-divergence decreases monotonically by repeating the E-step and the M-step. Hence, the algorithm converges to an equilibrium. It should be noted that the m-projection is not necessarily unique unless M is e-flat; hence, there might exist local minima. However, we often come across the exponential family, in which case there exists a unique solution.
