Parametrized Measure Models


Parametrized Measure Models

Nihat Ay, Jürgen Jost, Hông Vân Lê, Lorenz Schwachhöfer

SFI WORKING PAPER:

SFI Working Papers contain accounts of scientific work of the author(s) and do not necessarily represent the views of the Santa Fe Institute. We accept papers intended for publication in peer-reviewed journals or proceedings volumes, but not papers that have already appeared in print. Except for papers by our external faculty, papers must be based on work done at SFI, inspired by an invited visit to or collaboration at SFI, or funded by an SFI grant.

NOTICE: This working paper is included by permission of the contributing author(s) as a means to ensure timely distribution of the scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the author(s). It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may be reposted only with the explicit permission of the copyright holder.

SANTA FE INSTITUTE

Parametrized measure models

Nihat Ay, Jürgen Jost, Hông Vân Lê, Lorenz Schwachhöfer

October 25, 2015

Abstract

We develop a new and general notion of parametric measure models and statistical models on an arbitrary sample space $\Omega$. This is given by a differentiable map from the parameter manifold $M$ into the set of finite measures or probability measures on $\Omega$, respectively, which is differentiable when regarded as a map into the Banach space of all signed measures on $\Omega$. Furthermore, we also give a rigorous definition of roots of measures and give a natural definition of the Fisher metric and the Amari-Chentsov tensor as the pullback of tensors defined on the space of roots of measures. We show that many features such as the preservation of this tensor under sufficient statistics and the monotonicity formula hold even in this very general set-up.

MSC2010: 53C99, 62B05

Keywords: Fisher quadratic form, Amari-Chentsov tensor, sufficient statistic, monotonicity

1 Introduction

Information geometry is concerned with the use of differential geometric methods in probability theory. Important objects of investigation are families of probability measures or, more generally, of finite measures on a given sample space $\Omega$ which depend differentiably on a finite number of parameters. Associated to such a family there are two symmetric tensors on the parameter space $M$. The first is a quadratic form (i.e., a Riemannian metric), called the Fisher metric $g^F$, and the second is a 3-tensor, called the Amari-Chentsov tensor $T$. The Fisher metric was first suggested by Rao [19], followed by Jeffreys [12], Efron [11] and then systematically developed by Chentsov and Morozova [8], [9] and [16]; the Amari-Chentsov tensor and its significance were discovered by Amari [1], [2] and Chentsov [10].
These tensors are of interest from the differential geometric point of view as they do not depend on the particular choice of parametrization of the family, but they are also natural objects from the point of view of statistics, as they are unchanged under sufficient statistics and are in fact characterized by this property; this was shown in the case of finite sample spaces by Chentsov in [9] and more recently for general sample spaces in [5]. In fact, Chentsov not only showed the invariance of these tensors under sufficient statistics, but also under what he called congruent embeddings of probability measures. These are Markov kernels between finite sample spaces which are right inverses of a statistic. We use this property to give a precise definition of congruent embeddings between arbitrary sample spaces (cf. Definition 3.1). As it turns out, every Markov kernel induces a congruent embedding in this sense, but there are congruent embeddings which are not induced by Markov kernels, cf. Theorem

The main conceptual difficulty in the investigation of families of probability measures is the lack of a canonical manifold structure on the spaces $\mathcal{M}(\Omega)$ and $\mathcal{P}(\Omega)$ of finite measures and probability measures on $\Omega$. If $\Omega$ is finite, then this issue is not a problem, as in this case a measure is given by finitely many parameters, allowing one to identify $\mathcal{M}(\Omega)$ with the positive orthant in $\mathbb{R}^{|\Omega|}$ and $\mathcal{P}(\Omega)$ with the intersection of this orthant with an affine hyperplane in $\mathbb{R}^{|\Omega|}$, so that both are (finite dimensional) manifolds in a canonical way. But this is no longer true if $\Omega$ is infinite.

Attempts have been made to provide $\mathcal{P}(\Omega)$ and $\mathcal{M}(\Omega)$ with a Banach manifold structure. For instance, Pistone and Sempi [18] equipped these spaces with a topology, the so-called e-topology. With this, $\mathcal{P}(\Omega)$ and $\mathcal{M}(\Omega)$ become Banach manifolds and have many remarkable features, see e.g. [7], [17]. On the other hand, the e-topology is very strong in the sense that many families of measures on $\Omega$ fail to be continuous w.r.t. the e-topology, so it cannot be applied as widely as one would wish.

Another approach was recently pursued by Bauer, Bruveris and Michor [6] under the assumption that $\Omega$ is a manifold. In this case, the space of smooth densities also carries a natural topology, and they were able to show that the invariance under diffeomorphisms already suffices to characterize the Fisher metric of a family of such densities.

In [5], the authors of the present article proposed to define parametrized measure models as a family given as

$$p(\xi) = p(\omega; \xi)\,\mu \tag{1.1}$$

for some reference measure $\mu$ and a positive function $p$ on $\Omega \times M$ which is differentiable in $\xi \in M$, an idea which closely follows the notion of Amari [1]. While this embraces many interesting families of measures, it is still restricted as it heavily depends on the choice of the reference measure $\mu$, which is a priori not naturally given.
It is the aim of the present article to provide a yet more general definition of parametrized measure models which embraces all of the aforementioned definitions, but is more general and more natural than these at the same time. Namely, in this article we define parametrized measure models and statistical models, respectively, as families $(p(\xi))_{\xi \in M}$ which are given by a map $p$ from $M$ to $\mathcal{M}(\Omega)$ and $\mathcal{P}(\Omega)$, respectively, which is differentiable when regarded as a map between the (finite or infinite dimensional) manifold $M$ and the Banach space $\mathcal{S}(\Omega)$ of finite signed measures on $\Omega$, since evidently $\mathcal{P}(\Omega)$ and $\mathcal{M}(\Omega)$ are subsets of $\mathcal{S}(\Omega)$. That is, the geometric structure on $\mathcal{M}(\Omega)$ and $\mathcal{P}(\Omega)$ is given by the inclusions $\mathcal{P}(\Omega) \hookrightarrow \mathcal{M}(\Omega) \hookrightarrow \mathcal{S}(\Omega)$.

If the model is given as in the definition from [5] by (1.1), then we say that it is given by a regular density function. We shall show that most of the statements shown in [5] for parametrized measure models or statistical models with a regular density function also hold without this assumption.

As it turns out, neither the Fisher metric nor the Amari-Chentsov tensor may be regarded as pull-backs of a tensor on $\mathcal{S}(\Omega)$ via the map $p$. In order to overcome this problem, let us for the moment assume that the family is given by a regular density function as in (1.1). In this case, the definitions of $g^F$ and $T^{AC}$ can be rewritten as

$$\begin{aligned}
g^F(V, W) &= \int_\Omega \partial_V \log p(\omega;\xi)\, \partial_W \log p(\omega;\xi)\, dp(\xi) \\
&= 4 \int_\Omega \partial_V \sqrt{p(\omega;\xi)}\, \partial_W \sqrt{p(\omega;\xi)}\, d\mu \\
&= 4 \int_\Omega d\left( \partial_V \sqrt{p(\xi)} \cdot \partial_W \sqrt{p(\xi)} \right)
\end{aligned} \tag{1.2}$$

and

$$\begin{aligned}
T^{AC}(V, W, U) &= \int_\Omega \partial_V \log p(\omega;\xi)\, \partial_W \log p(\omega;\xi)\, \partial_U \log p(\omega;\xi)\, dp(\xi) \\
&= 27 \int_\Omega \partial_V \sqrt[3]{p(\omega;\xi)}\, \partial_W \sqrt[3]{p(\omega;\xi)}\, \partial_U \sqrt[3]{p(\omega;\xi)}\, d\mu \\
&= 27 \int_\Omega d\left( \partial_V \sqrt[3]{p(\xi)} \cdot \partial_W \sqrt[3]{p(\xi)} \cdot \partial_U \sqrt[3]{p(\xi)} \right).
\end{aligned} \tag{1.3}$$

Of course, the last lines in (1.2) and (1.3) do not make sense a priori, as it is not clear what a root of a measure should be, but this is precisely the way we shall make use of this. Namely, for $r \in (0, 1]$ we define the space $\mathcal{S}^r(\Omega)$ of $r$-th powers of a measure, which has $\mathcal{M}^r(\Omega)$ and $\mathcal{P}^r(\Omega)$ as subsets. Again, $\mathcal{S}^r(\Omega)$ is a Banach space and $\mathcal{S}^1(\Omega) = \mathcal{S}(\Omega)$. Furthermore, there is a bounded bilinear multiplication map

$$\cdot : \mathcal{S}^r(\Omega) \times \mathcal{S}^s(\Omega) \to \mathcal{S}^{r+s}(\Omega), \quad \text{for } r, s, r + s \in (0, 1],$$

as well as differentiable maps taking the (signed) $k$-th power,

$$\pi^k, \tilde\pi^k : \mathcal{S}^r(\Omega) \to \mathcal{S}^{kr}(\Omega) \quad \text{for } r, kr \in (0, 1].$$

That is, one can work with these objects in a very suggestive way. We call a parametrized measure model $p : M \to \mathcal{M}(\Omega)$ $k$-integrable, if the map

$$p^{1/k} : M \to \mathcal{M}^{1/k}(\Omega) \subset \mathcal{S}^{1/k}(\Omega)$$

is (formally) differentiable. This notion of $k$-integrability turns out to be equivalent to that given in [5] for models with a regular density function.

On $\mathcal{S}^{1/n}(\Omega)$ we define the canonical $n$-tensor as

$$L^n(\nu_1, \ldots, \nu_n) := n^n \int_\Omega d(\nu_1 \cdots \nu_n).$$

Then by (1.2) and (1.3) the Fisher metric and the Amari-Chentsov tensor are given as the pull-backs

$$g^F = (p^{1/2})^*(L^2) \quad \text{and} \quad T^{AC} = (p^{1/3})^*(L^3),$$

if the model is $k$-integrable for $k \ge 2$ or $k \ge 3$, respectively. That is, these tensors may be defined as pullbacks of objects which are naturally defined in terms of $\Omega$ only.

We also discuss the behaviour of the Fisher metric under statistics. For this, let $\kappa : \Omega \to \Omega'$ be measurable and $\kappa_* : \mathcal{S}(\Omega) \to \mathcal{S}(\Omega')$ be the push-forward of (signed) measures. This is a bounded linear map which maps $\mathcal{M}(\Omega)$ to $\mathcal{M}(\Omega')$ and $\mathcal{P}(\Omega)$ to $\mathcal{P}(\Omega')$, respectively. In particular, given a parametrized measure model $p : M \to \mathcal{M}(\Omega)$, it induces a map $p' := \kappa_* p$. We then show that this process preserves $k$-integrability, i.e., if $p$ is $k$-integrable, then so is $p'$ (cf. Theorem 5.1).
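For a finite sample space the identities in (1.2) can be checked numerically. The following is a minimal sketch, not part of the paper: it assumes the model is the Binomial(2, ξ) family on a three-point sample space, approximates derivatives by central finite differences, and uses a hypothetical statistic that merges the first two outcomes. It compares the logarithmic and square-root forms of the Fisher metric and evaluates the metric of the pushed-forward model, which the monotonicity formula (1.4) predicts to be no larger.

```python
import math

def model(xi):
    """Binomial(2, xi) as a probability vector on Omega = {0, 1, 2} (illustrative choice)."""
    return [(1 - xi) ** 2, 2 * xi * (1 - xi), xi ** 2]

def derivative(f, xi, h=1e-6):
    """Central finite difference of a vector-valued function of xi."""
    hi, lo = f(xi + h), f(xi - h)
    return [(a - b) / (2 * h) for a, b in zip(hi, lo)]

def fisher_log(p, xi):
    """Sum over Omega of (d/dxi log p)^2 * p: the first line of (1.2)."""
    dlog = derivative(lambda x: [math.log(v) for v in p(x)], xi)
    return sum(d * d * v for d, v in zip(dlog, p(xi)))

def fisher_root(p, xi):
    """4 * sum over Omega of (d/dxi sqrt(p))^2: the square-root form of (1.2)."""
    droot = derivative(lambda x: [math.sqrt(v) for v in p(x)], xi)
    return 4 * sum(d * d for d in droot)

def push_forward(p, kappa, n_out):
    """Push-forward of the model under a statistic given as a list of target labels."""
    def pushed(xi):
        out = [0.0] * n_out
        for w, v in enumerate(p(xi)):
            out[kappa[w]] += v
        return out
    return pushed

xi = 0.3
g_log = fisher_log(model, xi)
g_root = fisher_root(model, xi)
# Hypothetical statistic: outcomes 0 and 1 are merged, outcome 2 is kept.
coarse = push_forward(model, [0, 0, 1], 2)
g_coarse = fisher_root(coarse, xi)
print(g_log, g_root, g_coarse)
```

In this sketch the two forms agree up to discretization error, and the coarse-grained model carries strictly less Fisher information because the merging statistic is not sufficient for this family.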
Moreover, we show that in this generality the monotonicity formula holds (Theorem 5.2):

$$g^F(V, V) \ge g'^F(V, V), \tag{1.4}$$

where $g^F$ and $g'^F$ denote the Fisher metrics of $p$ and $p'$, respectively, and where $V \in TM$. Moreover, if $p(\xi)$ has a regular density function $p$, then equality holds for all $V$ iff $\kappa$ is a sufficient statistic for the model. In fact, we show (1.4) even in the case where $\kappa_*$ is replaced by any Markov kernel $K : \Omega \to \mathcal{P}(\Omega')$. While the monotonicity formula has been known in special cases, e.g. if one of $\Omega$, $\Omega'$ is finite, or if both are manifolds and $\kappa$ is differentiable, our approach shows that (1.4) holds without any further

assumption on the model whatsoever. In particular, it is also valid for rather peculiar statistics as e.g. Example 3.2.

This paper is structured as follows. In Section 2 we give the formal definition of the spaces of powers of measures, setting up the formalism needed later on. In Section 3 we provide a precise definition of congruent embeddings for arbitrary sample spaces and discuss their relations with Markov kernels and the existence of transverse measures. In the following Section 4 we establish the notion of $k$-integrability, which is applied in the final Section 5 to the discussion of sufficient statistics and the proof of the monotonicity formula.

Acknowledgements. This work was mainly carried out at the Max Planck Institute for Mathematics in the Sciences in Leipzig, and we are grateful for the excellent working conditions provided at that institution. H.V. Lê is partially supported by Grant RVO: J. Jost acknowledges support from the ERC Advanced Grant FP

2 The spaces of measures and their powers

2.1 The space of (signed) finite measures

Let $(\Omega, \Sigma)$ be a measurable space, that is, an arbitrary set $\Omega$ together with a sigma algebra $\Sigma$ of subsets of $\Omega$. Regarding the sigma algebra $\Sigma$ on $\Omega$ as fixed, we let

$$\begin{aligned}
\mathcal{P}(\Omega) &:= \{\mu : \mu \text{ a probability measure on } \Omega\} \\
\mathcal{M}(\Omega) &:= \{\mu : \mu \text{ a finite measure on } \Omega\} \\
\mathcal{S}(\Omega) &:= \{\mu : \mu \text{ a signed finite measure on } \Omega\} \\
\mathcal{S}_0(\Omega) &:= \{\mu \in \mathcal{S}(\Omega) : \textstyle\int_\Omega d\mu = 0\}.
\end{aligned}$$

Clearly, $\mathcal{P}(\Omega) \subset \mathcal{M}(\Omega) \subset \mathcal{S}(\Omega)$, and $\mathcal{S}_0(\Omega)$, $\mathcal{S}(\Omega)$ are real vector spaces. In fact, both $\mathcal{S}_0(\Omega)$ and $\mathcal{S}(\Omega)$ are Banach spaces whose norm is given by the total variation of a signed measure, defined as

$$\|\mu\|_{TV} := \sup \sum_{i=1}^n |\mu(A_i)|,$$

where the supremum is taken over all finite partitions $\Omega = A_1 \,\dot\cup\, \ldots \,\dot\cup\, A_n$ with disjoint sets $A_i \in \Sigma$. By the Jordan decomposition theorem, each measure $\mu \in \mathcal{S}(\Omega)$ can be decomposed uniquely as

$$\mu = \mu_+ - \mu_- \quad \text{with } \mu_\pm \in \mathcal{M}(\Omega),\ \mu_+ \perp \mu_-. \tag{2.1}$$

Thus, if we define

$$|\mu| := \mu_+ + \mu_- \in \mathcal{M}(\Omega),$$

then (2.1) implies

$$|\mu(A)| \le |\mu|(A) \quad \text{for all } \mu \in \mathcal{S}(\Omega) \text{ and } A \in \Sigma, \tag{2.2}$$

so that

$$\|\mu\|_{TV} = \|\,|\mu|\,\|_{TV} = |\mu|(\Omega).$$

In particular,

$$\mathcal{P}(\Omega) = \{\mu \in \mathcal{M}(\Omega) : \|\mu\|_{TV} = 1\}.$$

Moreover, fixing a measure $\mu_0 \in \mathcal{M}(\Omega)$, we let

$$\begin{aligned}
\mathcal{P}(\Omega, \mu_0) &:= \{\mu \in \mathcal{P}(\Omega) : \mu \text{ is dominated by } \mu_0\} \\
\mathcal{M}(\Omega, \mu_0) &:= \{\mu \in \mathcal{M}(\Omega) : \mu \text{ is dominated by } \mu_0\} \\
\mathcal{S}(\Omega, \mu_0) &:= \{\mu \in \mathcal{S}(\Omega) : \mu \text{ is dominated by } \mu_0\} \\
\mathcal{S}_0(\Omega, \mu_0) &:= \mathcal{S}(\Omega, \mu_0) \cap \mathcal{S}_0(\Omega),
\end{aligned} \tag{2.3}$$

where we say that $\mu_0$ dominates $\mu$ if every $\mu_0$-null set is also a $|\mu|$-null set. We may canonically identify $\mathcal{S}(\Omega, \mu_0)$ with $L^1(\Omega, \mu_0)$ by the correspondence

$$\imath_{can} : L^1(\Omega, \mu_0) \to \mathcal{S}(\Omega, \mu_0), \quad \phi \mapsto \phi\,\mu_0.$$

By the Radon-Nikodým theorem, this is an isomorphism whose inverse is given by the Radon-Nikodým derivative $\mu \mapsto \frac{d\mu}{d\mu_0}$. Observe that $\imath_{can}$ is an isomorphism of Banach spaces, since evidently

$$\|\phi\|_{L^1(\Omega, \mu_0)} = \int_\Omega |\phi|\, d\mu_0 = \|\phi\,\mu_0\|_{TV}.$$

2.2 Differential maps between Banach manifolds and tangent spaces

In this section, we shall recall some basic notions of maps between Banach manifolds. For simplicity, we shall restrict ourselves to maps between open subsets of Banach spaces, even though this notion can be generalized to general Banach manifolds, see e.g. [13].

Let $V$ and $W$ be Banach spaces and $U \subset V$ an open subset. A map $\phi : U \to W$ is called differentiable at $x \in U$, if there is a bounded linear operator $d_x\phi \in \mathrm{Lin}(V, W)$ such that

$$\lim_{h \to 0} \frac{\|\phi(x + h) - \phi(x) - d_x\phi(h)\|_W}{\|h\|_V} = 0. \tag{2.4}$$

In this case, $d_x\phi$ is called the (total) differential of $\phi$ at $x$. Moreover, $\phi$ is called continuously differentiable or shortly a $C^1$-map, if it is differentiable at every $x \in U$, and the map $d\phi : U \to \mathrm{Lin}(V, W)$, $x \mapsto d_x\phi$, is continuous. Furthermore, a differentiable map $c : (-\varepsilon, \varepsilon) \to W$ is called a curve in $W$.

Definition 2.1. Let $X \subset V$ be an arbitrary subset and let $x_0 \in X$. Then $v \in V$ is called a tangent vector of $X$ at $x_0$, if there is a curve $c : (-\varepsilon, \varepsilon) \to X \subset V$ such that $c(0) = x_0$ and $\dot c(0) := d_0c(1) = v$. The set of all tangent vectors at $x_0$ is called the tangent space of $X$ at $x_0$ and is denoted by $T_{x_0}X$. We also let

$$TX := \bigsqcup_{x_0 \in X} T_{x_0}X \subset X \times V \subset V \times V,$$

equipped with the induced topology.
For instance, if $X = U \subset V$ is an open set, then $T_{x_0}U = V$ for all $x_0$ and hence, $TU = U \times V$. Indeed, the curve $c(t) = x_0 + tv \in U$ for small $|t|$ satisfies the properties required in the definition. While reparametrization of $c$ implies that $T_{x_0}X$ is invariant under scalar multiplication, this set fails to be a linear subspace in general; if $X \subset V$ is a submanifold, however, then $T_{x_0}X$ coincides with the standard notion of the tangent space of a (Banach) manifold, justifying our notation.
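To see concretely why the tangent space in the sense of Definition 2.1 can fail to be a linear subspace, the following worked example may help. It is an illustration added here under the assumption that $V = \mathbb{R}^2$ and that $X$ is the union of the two coordinate axes; neither the set nor the computation is taken from the paper.

```latex
% Illustrative example (not from the paper): V = R^2, X = the coordinate cross.
\[
  X := \{(x,y) \in \mathbb{R}^2 : xy = 0\}, \qquad x_0 := (0,0).
\]
% Let c = (c_1, c_2) : (-\varepsilon,\varepsilon) \to X be differentiable at 0
% with c(0) = x_0 and \dot c(0) = (a,b). Then
\[
  c_1(t)\,c_2(t) = 0
  \quad\text{and}\quad
  c_1(t) = at + o(t), \qquad c_2(t) = bt + o(t).
\]
% Dividing the product by t^2 and letting t -> 0 gives ab = 0, hence
\[
  T_{x_0}X \subset X .
\]
% Conversely, t -> (at, 0) and t -> (0, bt) are curves in X, so
\[
  T_{x_0}X = X .
\]
% This set is invariant under scalar multiplication, but
% (1,0) + (0,1) = (1,1) is not in T_{x_0}X, so it is not a linear subspace.
```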

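If $U \subset V$ is open and $\phi : U \to W$ is a $C^1$-map whose image is contained in $X \subset W$, then $d_{x_0}\phi(V) \subset T_{\phi(x_0)}X$, whence $\phi$ induces a continuous map

$$d\phi : TU = U \times V \to TX, \quad (u, v) \mapsto d_u\phi(v).$$

Proposition 2.1. Let $V = \mathcal{S}(\Omega)$ be the Banach space of signed measures on $\Omega$. Then the tangent spaces of $\mathcal{M}(\Omega)$ and $\mathcal{P}(\Omega)$ are

$$T\mathcal{M}(\Omega) := \bigsqcup_{\mu \in \mathcal{M}(\Omega)} \mathcal{S}(\Omega, \mu) \subset \mathcal{M}(\Omega) \times \mathcal{S}(\Omega)
\quad \text{and} \quad
T\mathcal{P}(\Omega) := \bigsqcup_{\mu \in \mathcal{P}(\Omega)} \mathcal{S}_0(\Omega, \mu) \subset \mathcal{P}(\Omega) \times \mathcal{S}(\Omega).$$

Remark 2.1. This result is not obvious, because, in the case of an infinite sample space, $\mathcal{P}(\Omega)$ and $\mathcal{M}(\Omega)$ are not Banach submanifolds of $\mathcal{S}(\Omega)$. Our proof will handle a more general situation, as needed in Proposition 2.2 below.

To prove Proposition 2.1, we first show the following simple

Lemma 2.1. Let $\{\nu_n : n \in \mathbb{N}\} \subset \mathcal{S}(\Omega)$ be a countable family of measures. Then there is a measure $\mu_0 \in \mathcal{M}(\Omega)$ dominating $\nu_n$ for all $n$.

Proof. We assume w.l.o.g. that $\nu_n \ne 0$ for all $n$ and define

$$\mu_0 := \sum_{n=1}^\infty \frac{1}{2^n \|\nu_n\|_{TV}}\, |\nu_n|.$$

Since $\|\nu_n\|_{TV} = |\nu_n|(\Omega)$, it follows that this sum converges, so that $\mu_0 \in \mathcal{M}(\Omega)$ is well defined. Moreover, if $\mu_0(A) = 0$, then $|\nu_n|(A) = 0$ for all $n$, showing that $\mu_0$ dominates all $\nu_n$ as claimed.

Proof of Proposition 2.1. Given $\nu \in T_{\mu_0}\mathcal{M}(\Omega) \subset \mathcal{S}(\Omega)$, let $(\mu_t)_{t \in (-\varepsilon, \varepsilon)}$ be a curve in $\mathcal{M}(\Omega)$ with $\dot\mu_0 = \nu$. Pick two sequences $t_n^\pm \to 0$, $t_n^+ > 0$, $t_n^- < 0$, and let $\hat\mu_0 \in \mathcal{M}(\Omega)$ be a measure which dominates the measures $\mu_{t_n^\pm}$, $\mu_0$ and $\nu$. This measure exists by Lemma 2.1. Thus, there are functions $\phi_0, \dot\phi_0, \phi_n^\pm \in L^1(\Omega, \hat\mu_0)$ such that

$$\mu_{t_n^\pm} = \phi_n^\pm\, \hat\mu_0, \quad \mu_0 = \phi_0\, \hat\mu_0, \quad \nu = \dot\phi_0\, \hat\mu_0.$$

By hypothesis, $\phi_n^\pm, \phi_0 \ge 0$. Adapting the definition of a $C^1$-map in (2.4), the differential quotient

$$(t_n^\pm)^{-1}(\mu_{t_n^\pm} - \mu_0) = (t_n^\pm)^{-1}(\phi_n^\pm - \phi_0)\, \hat\mu_0$$

converges in $\mathcal{S}(\Omega)$ to $\nu = \dot\phi_0\, \hat\mu_0$, whence

$$\frac{\phi_n^\pm - \phi_0}{t_n^\pm} \xrightarrow{n \to \infty} \dot\phi_0 \quad \text{in } L^1(\Omega, \hat\mu_0).$$

After passing to subsequences this implies pointwise convergence $\hat\mu_0$-a.e. If $\phi_0(\omega) = 0$, then

$$\frac{\phi_n^+(\omega) - \phi_0(\omega)}{t_n^+} \ge 0 \quad \text{while} \quad \frac{\phi_n^-(\omega) - \phi_0(\omega)}{t_n^-} \le 0.$$

Thus, if both converge to $\dot\phi_0(\omega)$, then $\dot\phi_0(\omega) = 0$. That is: on $\phi_0^{-1}(0)$ we have $\dot\phi_0 = 0$ $\hat\mu_0$-a.e., showing that $\nu = \dot\phi_0\, \hat\mu_0$ is dominated by $\mu_0 = \phi_0\, \hat\mu_0$ and hence $\nu \in \mathcal{S}(\Omega, \mu_0)$.

Thus, $T_{\mu_0}\mathcal{M}(\Omega) \subset \mathcal{S}(\Omega, \mu_0)$.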

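Conversely, given $\nu = \phi\mu_0 \in \mathcal{S}(\Omega, \mu_0)$, define $\mu_t := (1 + t\phi)_+\, \mu_0 \in \mathcal{M}(\Omega, \mu_0)$. Then

$$\frac{\|\mu_t - \mu_0 - t\nu\|_{TV}}{|t|} = \frac{\|((1 + t\phi)_+ - 1 - t\phi)\,\mu_0\|_{TV}}{|t|} = \frac{\|(1 + t\phi)_-\|_1}{|t|} \le \left\| \left( |t|^{-1} - |\phi| \right)_- \right\|_1 \xrightarrow{t \to 0} 0,$$

using the monotone convergence theorem in the last step. Thus, $\nu$ is the derivative of $(\mu_t)_{t \in (-\varepsilon, \varepsilon)}$, showing that $\nu \in T_{\mu_0}\mathcal{M}(\Omega)$ as claimed.

To show the statement for $\mathcal{P}(\Omega)$, let $(\mu_t)_{t \in (-\varepsilon, \varepsilon)}$ be a curve in $\mathcal{P}(\Omega)$ with $\dot\mu_0 = \nu$. Then as $\mu_t$ is a probability measure for all $t$, we conclude

$$\left| \int_\Omega d\nu \right| = \frac{1}{|t|} \left| \int_\Omega d(\mu_t - \mu_0 - t\nu) \right| \le \frac{\|\mu_t - \mu_0 - t\nu\|_{TV}}{|t|} \xrightarrow{t \to 0} 0,$$

so that $\nu \in \mathcal{S}_0(\Omega)$. Since $\mathcal{P}(\Omega) \subset \mathcal{M}(\Omega)$, it follows that $T_{\mu_0}\mathcal{P}(\Omega) \subset T_{\mu_0}\mathcal{M}(\Omega) \cap \mathcal{S}_0(\Omega) = \mathcal{S}_0(\Omega, \mu_0)$ for all $\mu_0 \in \mathcal{P}(\Omega)$.

Conversely, given $\nu = \phi\mu_0 \in \mathcal{S}_0(\Omega, \mu_0)$, define the curve $\lambda_t := \mu_t\, \|\mu_t\|_{TV}^{-1} \in \mathcal{P}(\Omega)$ with $\mu_t = (1 + t\phi)_+\, \mu_0$ as before, so that $\lambda_0 = \mu_0$. We set

$$c_t^\pm := \int_\Omega (1 + t\phi)_\pm\, d\mu_0 \ge 0,$$

so that $c_t^+ = \|\mu_t\|_{TV}$. Moreover,

$$c_t^+ - c_t^- = \int_\Omega (1 + t\phi)\, d\mu_0 = 1 \quad \Longrightarrow \quad \|\mu_t\|_{TV} = 1 + c_t^- \ge 1,$$

as $\mu_0 \in \mathcal{P}(\Omega)$ and $\phi\mu_0 \in \mathcal{S}_0(\Omega, \mu_0)$. Thus,

$$\begin{aligned}
\|\lambda_t - \lambda_0 - t\nu\|_{TV} &= \int_\Omega \left| \frac{(1 + t\phi)_+}{\|\mu_t\|_{TV}} - 1 - t\phi \right| d\mu_0 \\
&= \int_\Omega \left| -\frac{c_t^-}{\|\mu_t\|_{TV}} (1 + t\phi)_+ + (1 + t\phi)_- \right| d\mu_0 \\
&\le \frac{c_t^-}{\|\mu_t\|_{TV}} \underbrace{\int_\Omega (1 + t\phi)_+\, d\mu_0}_{= c_t^+ = \|\mu_t\|_{TV}} + \underbrace{\int_\Omega (1 + t\phi)_-\, d\mu_0}_{= c_t^-} = 2c_t^-.
\end{aligned}$$

But as shown above, $\lim_{t \to 0} c_t^-/|t| = 0$, whence $\nu = \dot\lambda_0$ and $\nu \in T_{\mu_0}\mathcal{P}(\Omega)$.

2.3 Powers of densities

Let us now give the formal definition of roots of measures. On the set $\mathcal{M}(\Omega)$ we define the preordering $\mu_1 \le \mu_2$ if $\mu_2$ dominates $\mu_1$. Then $(\mathcal{M}(\Omega), \le)$ is a directed set, meaning that for any pair $\mu_1, \mu_2 \in \mathcal{M}(\Omega)$ there is a $\mu_0 \in \mathcal{M}(\Omega)$ dominating both of them (use e.g. $\mu_0 := \mu_1 + \mu_2$, but observe that by Lemma 2.1, this even holds for countable families of measures).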

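For fixed $r \in (0, 1]$ and measures $\mu_1 \le \mu_2$ on $\Omega$ we define the linear embedding

$$\imath^{\mu_1}_{\mu_2} : L^{1/r}(\Omega, \mu_1) \to L^{1/r}(\Omega, \mu_2), \quad \phi \mapsto \phi \left( \frac{d\mu_1}{d\mu_2} \right)^r.$$

Observe that

$$\|\imath^{\mu_1}_{\mu_2}(\phi)\|_{1/r} = \left( \int_\Omega |\imath^{\mu_1}_{\mu_2}(\phi)|^{1/r}\, d\mu_2 \right)^r = \left( \int_\Omega |\phi|^{1/r}\, \frac{d\mu_1}{d\mu_2}\, d\mu_2 \right)^r = \left( \int_\Omega |\phi|^{1/r}\, d\mu_1 \right)^r = \|\phi\|_{1/r}, \tag{2.5}$$

so that $\imath^{\mu_1}_{\mu_2}$ is an isometry. Moreover, we have evidently $\imath^{\mu_2}_{\mu_3} \circ \imath^{\mu_1}_{\mu_2} = \imath^{\mu_1}_{\mu_3}$ whenever $\mu_1 \le \mu_2 \le \mu_3$. Then we define the space of $r$-th roots of measures on $\Omega$ to be the directed limit over the directed set $(\mathcal{M}(\Omega), \le)$

$$\mathcal{S}^r(\Omega) := \varinjlim L^{1/r}(\Omega, \mu). \tag{2.6}$$

Let us give a more concrete definition of $\mathcal{S}^r(\Omega)$. On the disjoint union of the spaces $L^{1/r}(\Omega, \mu)$ for $\mu \in \mathcal{M}(\Omega)$ we define the equivalence relation

$$L^{1/r}(\Omega, \mu_1) \ni \phi \sim \psi \in L^{1/r}(\Omega, \mu_2) \iff \imath^{\mu_1}_{\mu_0}(\phi) = \imath^{\mu_2}_{\mu_0}(\psi) \iff \phi \left( \frac{d\mu_1}{d\mu_0} \right)^r = \psi \left( \frac{d\mu_2}{d\mu_0} \right)^r$$

for some $\mu_0 \ge \mu_1, \mu_2$. Then $\mathcal{S}^r(\Omega)$ is the set of all equivalence classes of this relation. If we denote the equivalence class of $\phi \in L^{1/r}(\Omega, \mu)$ by $\phi\mu^r$, then the equivalence relation yields

$$\mu_1^r = \left( \frac{d\mu_1}{d\mu_2} \right)^r \mu_2^r \tag{2.7}$$

whenever $\mu_1 \le \mu_2$, justifying this notation. In fact, from this description in the case $r = 1$ we see that $\mathcal{S}^1(\Omega) = \mathcal{S}(\Omega)$. Observe that by (2.5) $\|\phi\|_{1/r}$ is constant on equivalence classes, whence there is a norm on $\mathcal{S}^r(\Omega)$, also denoted by $\|\cdot\|_{1/r}$, for which the inclusions

$$L^{1/r}(\Omega, \mu) \hookrightarrow \mathcal{S}^r(\Omega), \quad \phi \mapsto \phi\mu^r$$

are isometries. For $r = 1$, we have $\|\cdot\|_1 = \|\cdot\|_{TV}$. Thus,

$$\|\phi\mu^r\|_{1/r} = \|\phi\|_{1/r} = \left( \int_\Omega |\phi|^{1/r}\, d\mu \right)^r \quad \text{for } 0 < r \le 1. \tag{2.8}$$

Note that the equivalence relation also preserves nonnegativity of functions, whence we may define the subsets

$$\begin{aligned}
\mathcal{M}^r(\Omega) &:= \{\phi\mu^r : \mu \in \mathcal{M}(\Omega),\ \phi \ge 0\} \\
\mathcal{P}^r(\Omega) &:= \{\phi\mu^r : \mu \in \mathcal{P}(\Omega),\ \phi \ge 0,\ \|\phi\|_{1/r} = 1\}.
\end{aligned} \tag{2.9}$$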
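On a finite sample space the construction above reduces to elementary arithmetic, which the following minimal sketch illustrates. It is not from the paper: it assumes that an element $\phi\mu^r$ is stored as a pair of lists (density values and reference weights), and the function names are hypothetical. The sketch rewrites an element with respect to a dominating reference measure as in (2.7) and checks that the norm (2.8) does not change, as asserted by (2.5).

```python
def norm(phi, mu, r):
    """Norm of phi * mu^r as in (2.8): (sum |phi|^(1/r) * mu)^r."""
    return sum(abs(f) ** (1 / r) * m for f, m in zip(phi, mu)) ** r

def change_reference(phi, mu1, mu2, r):
    """Rewrite phi * mu1^r as psi * mu2^r with psi = phi * (dmu1/dmu2)^r, cf. (2.7).

    Assumes mu2 dominates mu1: wherever mu2 vanishes, mu1 must vanish too.
    """
    psi = []
    for f, m1, m2 in zip(phi, mu1, mu2):
        if m2 == 0:
            if m1 != 0:
                raise ValueError("mu2 does not dominate mu1")
            psi.append(0.0)  # value on a mu2-null set is irrelevant
        else:
            psi.append(f * (m1 / m2) ** r)
    return psi

r = 0.5                      # half-densities
mu1 = [1.0, 0.0, 2.0]        # reference measure with a null point
mu2 = [0.5, 3.0, 4.0]        # dominating reference measure
phi = [2.0, -7.0, 1.5]       # the value -7.0 sits on a mu1-null set
psi = change_reference(phi, mu1, mu2, r)
print(norm(phi, mu1, r), norm(psi, mu2, r))
```

For r = 1 the same `norm` function returns the total variation of the signed measure with density `phi` with respect to `mu`.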

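In analogy to (2.3) we define for a fixed measure $\mu_0 \in \mathcal{M}(\Omega)$ and $r \in (0, 1]$ the spaces

$$\begin{aligned}
\mathcal{S}^r(\Omega, \mu_0) &:= \{\phi\mu_0^r : \phi \in L^{1/r}(\Omega, \mu_0)\} \\
\mathcal{M}^r(\Omega, \mu_0) &:= \{\phi\mu_0^r : \phi \in L^{1/r}(\Omega, \mu_0),\ \phi \ge 0\} \\
\mathcal{P}^r(\Omega, \mu_0) &:= \{\phi\mu_0^r : \phi \in L^{1/r}(\Omega, \mu_0),\ \phi \ge 0,\ \|\phi\|_{1/r} = 1\} \\
\mathcal{S}^r_0(\Omega, \mu_0) &:= \left\{ \phi\mu_0^r : \phi \in L^{1/r}(\Omega, \mu_0),\ \textstyle\int_\Omega \phi\, d\mu_0 = 0 \right\}.
\end{aligned}$$

The elements of $\mathcal{P}^r(\Omega, \mu_0)$, $\mathcal{M}^r(\Omega, \mu_0)$, $\mathcal{S}^r(\Omega, \mu_0)$ are said to be dominated by $\mu_0^r$.

From Lemma 2.1, we can now conclude the following statement: Any sequence in $\mathcal{S}^r(\Omega)$ is contained in $\mathcal{S}^r(\Omega, \mu_0)$ for some $\mu_0 \in \mathcal{M}(\Omega)$. In particular, any Cauchy sequence in $\mathcal{S}^r(\Omega)$ is a Cauchy sequence in $\mathcal{S}^r(\Omega, \mu_0) \cong L^{1/r}(\Omega, \mu_0)$ for some $\mu_0$ and hence convergent. Thus, $(\mathcal{S}^r(\Omega), \|\cdot\|_{1/r})$ is a Banach space.

In analogy to Proposition 2.1, we can also determine the tangent spaces of the subsets $\mathcal{P}^r(\Omega) \subset \mathcal{M}^r(\Omega) \subset \mathcal{S}^r(\Omega)$. The proof of the statement is identical to that of Proposition 2.1 and is hence omitted.

Proposition 2.2. For each $\mu \in \mathcal{M}(\Omega)$ ($\mu \in \mathcal{P}(\Omega)$, respectively), the tangent spaces of $\mathcal{P}^r(\Omega) \subset \mathcal{M}^r(\Omega) \subset \mathcal{S}^r(\Omega)$ at $\mu^r$ are given as

$$T_{\mu^r}\mathcal{M}^r(\Omega) := \mathcal{S}^r(\Omega, \mu) \subset \mathcal{S}^r(\Omega) \quad \text{and} \quad T_{\mu^r}\mathcal{P}^r(\Omega) := \mathcal{S}^r_0(\Omega, \mu) \subset \mathcal{S}^r(\Omega).$$

The product of powers of measures can now be defined for all $r, s \in (0, 1]$ with $r + s \le 1$ and for measures $\phi\mu^r \in \mathcal{S}^r(\Omega, \mu)$ and $\psi\mu^s \in \mathcal{S}^s(\Omega, \mu)$:

$$(\phi\mu^r) \cdot (\psi\mu^s) := \phi\psi\, \mu^{r+s}.$$

By definition $\phi \in L^{1/r}(\Omega, \mu)$ and $\psi \in L^{1/s}(\Omega, \mu)$, whence Hölder's inequality implies that $\|\phi\psi\|_{1/(r+s)} \le \|\phi\|_{1/r}\, \|\psi\|_{1/s} < \infty$, so that $\phi\psi \in L^{1/(r+s)}(\Omega, \mu)$ and hence, $\phi\psi\,\mu^{r+s} \in \mathcal{S}^{r+s}(\Omega, \mu)$. Since by (2.7) this definition of the product is independent of the choice of representative $\mu$, it follows that it induces a bilinear product

$$\cdot : \mathcal{S}^r(\Omega) \times \mathcal{S}^s(\Omega) \to \mathcal{S}^{r+s}(\Omega), \quad \text{where } r, s, r + s \in (0, 1], \tag{2.10}$$

satisfying the Hölder inequality

$$\|\nu_r \cdot \nu_s\|_{1/(r+s)} \le \|\nu_r\|_{1/r}\, \|\nu_s\|_{1/s},$$

so that the product in (2.10) is a bounded bilinear map.

Definition 2.2. (Canonical pairing) For $r \in (0, 1)$ we define the pairing

$$(\cdot\,; \cdot) : \mathcal{S}^r(\Omega) \times \mathcal{S}^{1-r}(\Omega) \to \mathbb{R}, \quad (\nu_1; \nu_2) := \int_\Omega d(\nu_1 \cdot \nu_2). \tag{2.11}$$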
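As an illustration of the product and the canonical pairing, consider the special case $r = s = 1/2$ of half-densities with a common reference measure. The computation below is a sketch added here, not a statement from the paper; the functions $\phi$, $\psi$ are arbitrary square-integrable functions chosen for the example.

```latex
% Illustrative special case r = s = 1/2 (half-densities), with
% \nu_1 = \phi\,\mu^{1/2}, \nu_2 = \psi\,\mu^{1/2}, \phi, \psi \in L^2(\Omega,\mu).
\[
  \nu_1 \cdot \nu_2 = \phi\psi\,\mu \in \mathcal{S}(\Omega),
  \qquad
  (\nu_1;\nu_2) = \int_\Omega d(\nu_1\cdot\nu_2)
  = \int_\Omega \phi\psi\,d\mu
  = \langle \phi,\psi\rangle_{L^2(\Omega,\mu)} .
\]
% The H\"older inequality for the product reduces to the
% Cauchy-Schwarz inequality:
\[
  \|\nu_1\cdot\nu_2\|_{TV} = \int_\Omega |\phi\psi|\,d\mu
  \le \Big(\int_\Omega \phi^2\,d\mu\Big)^{1/2}
      \Big(\int_\Omega \psi^2\,d\mu\Big)^{1/2}
  = \|\nu_1\|_{2}\,\|\nu_2\|_{2} .
\]
```

In this case the pairing is simply the $L^2$ inner product, which is one way to read the remark that these objects can be handled in a very suggestive way.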

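It is straightforward to verify that this pairing is non-degenerate in the sense that

$$(\nu_r; \cdot) = 0 \iff \nu_r = 0. \tag{2.12}$$

Besides multiplication of roots of measures, we also wish to take their powers. Here, we have two possibilities to deal with signs. For $0 < k \le r^{-1}$ and $\nu_r = \phi\mu^r \in \mathcal{S}^r(\Omega)$ we define

$$|\nu_r|^k := |\phi|^k \mu^{rk} \quad \text{and} \quad \tilde\nu_r^k := \mathrm{sign}(\phi)\, |\phi|^k \mu^{rk}.$$

Since $\phi \in L^{1/r}(\Omega, \mu)$, it follows that $|\phi|^k \in L^{1/(rk)}(\Omega, \mu)$, so that $|\nu_r|^k, \tilde\nu_r^k \in \mathcal{S}^{rk}(\Omega)$. By (2.7) these powers are well defined, independent of the choice of the measure $\mu$, and, moreover,

$$\|\,|\nu_r|^k\|_{1/(rk)} = \|\tilde\nu_r^k\|_{1/(rk)} = \|\nu_r\|_{1/r}^k. \tag{2.13}$$

Proposition 2.3. Let $r \in (0, 1]$ and $0 < k \le 1/r$, and consider the maps

$$\pi^k, \tilde\pi^k : \mathcal{S}^r(\Omega) \to \mathcal{S}^{rk}(\Omega), \quad \pi^k(\nu) := |\nu|^k, \quad \tilde\pi^k(\nu) := \tilde\nu^k.$$

Then $\pi^k$, $\tilde\pi^k$ are continuous maps. Moreover, for $1 < k \le 1/r$ they are $C^1$-maps between Banach spaces, and their derivatives are given as

$$d_{\nu_r}\tilde\pi^k(\rho_r) = k\, |\nu_r|^{k-1} \cdot \rho_r \quad \text{and} \quad d_{\nu_r}\pi^k(\rho_r) = k\, \tilde\nu_r^{k-1} \cdot \rho_r. \tag{2.14}$$

Observe that for $k = 1$, $\pi^1(\nu_r) = |\nu_r|$ fails to be $C^1$, whereas $\tilde\pi^1(\nu_r) = \nu_r$, so that $\tilde\pi^1$ is the identity and hence a $C^1$-map.

Proof. Let us first assume that $0 < k \le 1$. We assert that in this case, there are constants $C_k, \tilde C_k > 0$ such that for all $x, y \in \mathbb{R}$

$$\big|\, |x + y|^k - |x|^k \big| \le C_k |y|^k \quad \text{and} \quad \big|\, \mathrm{sign}(x + y)|x + y|^k - \mathrm{sign}(x)|x|^k \big| \le \tilde C_k |y|^k. \tag{2.15}$$

Namely, by homogeneity it suffices to show this for $y = 1$, and since the functions

$$x \mapsto |x + 1|^k - |x|^k \quad \text{and} \quad x \mapsto \mathrm{sign}(x + 1)|x + 1|^k - \mathrm{sign}(x)|x|^k$$

are continuous and have finite limits for $x \to \pm\infty$, it follows that they are bounded, showing (2.15).

Let $\nu_1, \nu_2 \in \mathcal{S}^r(\Omega)$, and choose $\mu_0 \in \mathcal{M}(\Omega)$ such that $\nu_1, \nu_2 \in \mathcal{S}^r(\Omega, \mu_0)$, i.e., $\nu_i = \phi_i \mu_0^r$ with $\phi_i \in L^{1/r}(\Omega, \mu_0)$. Then

$$\begin{aligned}
\|\pi^k(\nu_1 + \nu_2) - \pi^k(\nu_1)\|_{1/(rk)} &= \big\|\, |\phi_1 + \phi_2|^k - |\phi_1|^k \big\|_{1/(rk)} \\
&\le C_k\, \big\|\, |\phi_2|^k \big\|_{1/(rk)} && \text{by (2.15)} \\
&= C_k\, \|\nu_2\|_{1/r}^k && \text{by (2.13)},
\end{aligned}$$

so that $\lim_{\|\nu_2\|_{1/r} \to 0} \|\pi^k(\nu_1 + \nu_2) - \pi^k(\nu_1)\|_{1/(rk)} = 0$, showing the continuity of $\pi^k$ for $0 < k \le 1$. The continuity of $\tilde\pi^k$ follows analogously.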

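Now let us assume that $1 < k \le 1/r$. In this case, the functions $x \mapsto |x|^k$ and $x \mapsto \mathrm{sign}(x)|x|^k$ with $x \in \mathbb{R}$ are $C^1$-maps with respective derivatives $x \mapsto k\, \mathrm{sign}(x)|x|^{k-1}$ and $x \mapsto k|x|^{k-1}$. Thus, if we pick $\nu_i = \phi_i \mu_0^r$ as above, then by the mean value theorem we have

$$\begin{aligned}
\pi^k(\nu_1 + \nu_2) - \pi^k(\nu_1) &= \left( |\phi_1 + \phi_2|^k - |\phi_1|^k \right) \mu_0^{rk} \\
&= k\, \mathrm{sign}(\phi_1 + \eta\phi_2)\, |\phi_1 + \eta\phi_2|^{k-1}\, \phi_2\, \mu_0^{rk} \\
&= k\, \mathrm{sign}(\phi_1 + \eta\phi_2)\, |\phi_1 + \eta\phi_2|^{k-1}\, \mu_0^{r(k-1)} \cdot \nu_2
\end{aligned}$$

for some function $\eta : \Omega \to (0, 1)$. If we let $\nu_\eta := \eta\phi_2\, \mu_0^r$, then $\|\nu_\eta\|_{1/r} \le \|\nu_2\|_{1/r}$, and we get

$$\pi^k(\nu_1 + \nu_2) - \pi^k(\nu_1) = k\, \tilde\pi^{k-1}(\nu_1 + \nu_\eta) \cdot \nu_2.$$

With the definition of $d_{\nu_1}\pi^k$ from (2.14) we have

$$\begin{aligned}
\|\pi^k(\nu_1 + \nu_2) - \pi^k(\nu_1) - d_{\nu_1}\pi^k(\nu_2)\|_{1/(rk)} &= \big\| k \left( \tilde\pi^{k-1}(\nu_1 + \nu_\eta) - \tilde\pi^{k-1}(\nu_1) \right) \cdot \nu_2 \big\|_{1/(rk)} \\
&\le k\, \|\tilde\pi^{k-1}(\nu_1 + \nu_\eta) - \tilde\pi^{k-1}(\nu_1)\|_{1/(r(k-1))}\, \|\nu_2\|_{1/r}
\end{aligned}$$

and hence,

$$\frac{\|\pi^k(\nu_1 + \nu_2) - \pi^k(\nu_1) - d_{\nu_1}\pi^k(\nu_2)\|_{1/(rk)}}{\|\nu_2\|_{1/r}} \le k\, \|\tilde\pi^{k-1}(\nu_1 + \nu_\eta) - \tilde\pi^{k-1}(\nu_1)\|_{1/(r(k-1))}.$$

Thus, the differentiability of $\pi^k$ will follow if

$$\|\tilde\pi^{k-1}(\nu_1 + \nu_\eta) - \tilde\pi^{k-1}(\nu_1)\|_{1/(r(k-1))} \xrightarrow{\|\nu_2\|_{1/r} \to 0} 0,$$

and because of $\|\nu_\eta\|_{1/r} \le \|\nu_2\|_{1/r}$, this is the case if $\tilde\pi^{k-1}$ is continuous. Analogously, one shows that $\tilde\pi^k$ is differentiable if $\pi^{k-1}$ is continuous.

Since we already know continuity of $\pi^k$ and $\tilde\pi^k$ for $0 < k \le 1$, and since $C^1$-maps are continuous, the claim now follows by induction on $\lceil k \rceil$.

Thus, (2.14) implies that the differentials of $\pi^k$ and $\tilde\pi^k$ (which coincide on $\mathcal{P}^r(\Omega)$ and $\mathcal{M}^r(\Omega)$) yield continuous maps

$$d\pi^k = d\tilde\pi^k : \begin{array}{ccc} T\mathcal{P}^r(\Omega) & \to & T\mathcal{P}^{rk}(\Omega) \\ T\mathcal{M}^r(\Omega) & \to & T\mathcal{M}^{rk}(\Omega) \end{array}, \quad (\mu, \rho) \mapsto k\, \mu^{rk - r} \cdot \rho.$$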
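The derivative formula (2.14) for the signed power can be sanity-checked on a finite sample space. The sketch below is an illustration under stated assumptions, not the paper's argument: it fixes one reference measure with hypothetical weights `mu`, represents elements by their densities, picks r = 1/2 and k = 3/2, and evaluates the remainder quotient from (2.4) for shrinking increments. One density value is zero on purpose, since that is where differentiability is least obvious.

```python
import math

def signed_power(phi, k):
    """Pointwise sign(phi) * |phi|^k: the density of the signed k-th power."""
    return [math.copysign(abs(f) ** k, f) for f in phi]

def differential(phi, rho, k):
    """Candidate differential from (2.14): k * |phi|^(k-1) * rho."""
    return [k * abs(f) ** (k - 1) * g for f, g in zip(phi, rho)]

def norm(phi, mu, r):
    """Norm of phi * mu^r as in (2.8)."""
    return sum(abs(f) ** (1 / r) * m for f, m in zip(phi, mu)) ** r

def remainder_ratio(phi, rho, mu, r, k, t):
    """Quotient from (2.4) for the increment h = t * rho."""
    h = [t * g for g in rho]
    moved = signed_power([f + d for f, d in zip(phi, h)], k)
    base = signed_power(phi, k)
    lin = differential(phi, h, k)
    rem = [a - b - c for a, b, c in zip(moved, base, lin)]
    return norm(rem, mu, r * k) / norm(h, mu, r)

r, k = 0.5, 1.5              # requires 1 < k <= 1/r
mu = [1.0, 0.5, 2.0]         # hypothetical reference weights
phi = [1.0, -2.0, 0.0]       # base point, including a zero density value
rho = [0.5, 1.0, -1.0]       # direction of the increment
ratios = [remainder_ratio(phi, rho, mu, r, k, t) for t in (1e-1, 1e-2, 1e-3)]
print(ratios)
```

The quotients shrink as the increment shrinks, which is consistent with differentiability; a numerical check of this kind cannot replace the proof above, it only guards against sign or exponent mistakes in the formula.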

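3 Congruent embeddings

3.1 Statistics and congruent embeddings

Given two measurable spaces $(\Omega, \Sigma)$ and $(\Omega', \Sigma')$, a measurable map $\kappa : \Omega \to \Omega'$ will be called a statistic. If $\mu$ is a (signed) measure on $(\Omega, \Sigma)$, it then induces a (signed) measure $\kappa_*\mu$ on $(\Omega', \Sigma')$, via

$$\kappa_*\mu(A') := \mu(\kappa^{-1}A'),$$

which is called the push-forward of $\mu$ by $\kappa$. Note that

$$\kappa_* : \mathcal{S}(\Omega) \to \mathcal{S}(\Omega') \tag{3.1}$$

is a bounded linear map. Indeed, the norm on both spaces is given by the total variation, and we have for any partition $\Omega' = A_1' \,\dot\cup\, \ldots \,\dot\cup\, A_n'$

$$\sum_i |\kappa_*\mu(A_i')| = \sum_i |\mu(\kappa^{-1}A_i')| \le \|\mu\|_{TV},$$

whence $\|\kappa_*\mu\|_{TV} \le \|\mu\|_{TV}$. Moreover, $\kappa_*$ is monotone, i.e., it maps nonnegative measures to nonnegative measures, and for these, the total variation is preserved:

$$\|\kappa_*\mu\|_{TV} = \|\mu\|_{TV} \quad \text{for all } \mu \in \mathcal{M}(\Omega).$$

In particular, $\kappa_*$ maps probability measures to probability measures, i.e., $\kappa_*(\mathcal{P}(\Omega)) \subset \mathcal{P}(\Omega')$.

We also define the pull-back of a measurable function $\phi' : \Omega' \to \mathbb{R}$ as $\kappa^*(\phi') := \phi' \circ \kappa$. With this, for subsets $A', B' \subset \Omega'$ and $A := \kappa^{-1}(A')$, $B := \kappa^{-1}(B')$ we have

$$\int_{B'} d\big(\kappa_*(\kappa^*(\chi_{A'})\,\mu)\big) = \int_B \kappa^*(\chi_{A'})\, d\mu = \int_{A \cap B} d\mu = \int_{A' \cap B'} d\kappa_*(\mu) = \int_{B'} \chi_{A'}\, d\kappa_*(\mu),$$

since $\kappa^*(\chi_{A'}) = \chi_A$. Thus, $\kappa_*(\kappa^*(\chi_{A'})\mu) = \chi_{A'}\kappa_*(\mu)$. By linearity, this equation holds when replacing $\chi_{A'}$ by a step function on $\Omega'$, whence by the density of step functions in $L^1(\Omega', \mu')$ we obtain

$$\kappa_*(\kappa^*(\phi')\, \mu) = \phi'\, \kappa_*(\mu) \quad \text{for all } \phi' \in L^1(\Omega', \kappa_*(\mu)). \tag{3.2}$$

Recall that $\mathcal{M}(\Omega)$ and $\mathcal{S}(\Omega)$ denote the spaces of all (signed) measures on $\Omega$, whereas $\mathcal{M}(\Omega, \mu)$ and $\mathcal{S}(\Omega, \mu)$ denote the subspaces of the (signed) measures on $\Omega$ which are dominated by $\mu$.

Definition 3.1. (Congruent embedding) Let $\kappa : \Omega \to \Omega'$ be a statistic and $\mu' \in \mathcal{M}(\Omega')$. A $\kappa$-congruent embedding is a bounded linear map $K_* : \mathcal{S}(\Omega', \mu') \to \mathcal{S}(\Omega)$ such that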

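1. $K_*$ is monotone, i.e., it maps nonnegative measures to nonnegative measures, or shortly: $K_*(\mathcal{M}(\Omega', \mu')) \subset \mathcal{M}(\Omega)$.

2. $\kappa_*(K_*(\nu')) = \nu'$ for all $\nu' \in \mathcal{S}(\Omega', \mu')$.

Furthermore, the image of a $\kappa$-congruent embedding $K_*$ in $\mathcal{S}(\Omega)$ is called a $\kappa$-congruent subspace of $\mathcal{S}(\Omega)$.

Example 3.1. Let $\kappa : \Omega \to \Omega'$ be a statistic, let $\mu \in \mathcal{M}(\Omega)$ and $\mu' := \kappa_*(\mu) \in \mathcal{M}(\Omega')$. Then the map

$$K_\mu : \mathcal{S}(\Omega', \mu') \to \mathcal{S}(\Omega, \mu) \subset \mathcal{S}(\Omega), \quad \phi'\mu' \mapsto \kappa^*(\phi')\,\mu \tag{3.3}$$

for all $\phi' \in L^1(\Omega', \mu')$ is a $\kappa$-congruent embedding, since

$$\kappa_*(K_\mu(\phi'\mu')) = \kappa_*(\kappa^*(\phi')\mu) = \phi'\kappa_*(\mu) = \phi'\mu'$$

by (3.2).

We shall now see that the above example exhausts all possibilities of congruent embeddings.

Proposition 3.1. Let $\kappa : \Omega \to \Omega'$ be a statistic, let $K_* : \mathcal{S}(\Omega', \mu') \to \mathcal{S}(\Omega)$ for some $\mu' \in \mathcal{M}(\Omega')$ be a $\kappa$-congruent embedding, and let $\mu := K_*(\mu') \in \mathcal{M}(\Omega)$. Then $K_* = K_\mu$ with the map $K_\mu$ given in (3.3).

Proof. We have to show that $K_*(\phi'\mu') = \kappa^*(\phi')\mu$ for all $\phi' \in L^1(\Omega', \mu')$. By continuity, it suffices to show this for step functions, as these are dense in $L^1(\Omega', \mu')$, whence by linearity, we have to show that for all $A' \subset \Omega'$, $A := \kappa^{-1}(A') \subset \Omega$

$$K_*(\chi_{A'}\mu') = \chi_A\mu. \tag{3.4}$$

Let $A_1' := A'$ and $A_2' := \Omega' \setminus A'$, and let $A_i := \kappa^{-1}(A_i')$. We define the measures $\mu_i' := \chi_{A_i'}\mu' \in \mathcal{M}(\Omega')$, and $\mu_i := K_*(\mu_i') \in \mathcal{M}(\Omega)$. Since $\mu_1' + \mu_2' = \mu'$, it follows that $\mu_1 + \mu_2 = \mu$ by the linearity of $K_*$. Taking indices mod 2, and using $\kappa_*(\mu_i) = \kappa_*(K_*(\mu_i')) = \mu_i'$ by the $\kappa$-congruency of $K_*$, note that

$$\mu_i(A_{i+1}) = \mu_i(\kappa^{-1}(A_{i+1}')) = \kappa_*(\mu_i)(A_{i+1}') = \mu_i'(A_{i+1}') = 0.$$

Thus, for any measurable $B \subset \Omega$ we have

$$\begin{aligned}
\mu_1(B) &= \mu_1(B \cap A_1) && \text{since } \mu_1(B \cap A_2) \le \mu_1(A_2) = 0 \\
&= \mu_1(B \cap A_1) + \mu_2(B \cap A_1) && \text{since } \mu_2(B \cap A_1) \le \mu_2(A_1) = 0 \\
&= \mu(B \cap A_1) && \text{since } \mu = \mu_1 + \mu_2 \\
&= (\chi_A\mu)(B) && \text{since } A_1 = A.
\end{aligned}$$

That is, $\chi_A\mu = \mu_1 = K_*(\mu_1') = K_*(\chi_{A'}\mu')$, so that (3.4) follows.

As a consequence, any $\kappa$-congruent subspace of $\mathcal{S}(\Omega)$ must be of the form

$$\mathcal{C}_{\kappa, \mu} := \{\kappa^*(\phi')\,\mu : \phi' \in L^1(\Omega', \kappa_*(\mu))\} \tag{3.5}$$

for some $\mu \in \mathcal{M}(\Omega)$.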
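For finite sample spaces the embedding of Example 3.1 can be written out explicitly. The following sketch is an illustration, not the paper's construction in general: it assumes measures are stored as lists of point masses, the statistic as a list of target labels, and that the push-forward of the reference measure has no null points so that densities can be computed by division. The function names are hypothetical.

```python
def push_forward(nu, kappa, n_out):
    """Push-forward of a (signed) measure nu under the statistic kappa."""
    out = [0.0] * n_out
    for w, v in enumerate(nu):
        out[kappa[w]] += v
    return out

def congruent_embedding(nu_prime, mu, kappa):
    """Map phi' * mu' to (phi' o kappa) * mu as in (3.3), with mu' the push-forward of mu.

    Assumes mu' has strictly positive mass at every point of the target space.
    """
    mu_prime = push_forward(mu, kappa, len(nu_prime))
    phi_prime = [n / m for n, m in zip(nu_prime, mu_prime)]  # density of nu' w.r.t. mu'
    return [phi_prime[kappa[w]] * mu[w] for w in range(len(mu))]

kappa = [0, 0, 1, 1]             # statistic from a 4-point to a 2-point space
mu = [1.0, 3.0, 2.0, 2.0]        # reference measure on the 4-point space
mu_prime = push_forward(mu, kappa, 2)
nu_prime = [2.0, -1.0]           # a signed measure on the 2-point space
embedded = congruent_embedding(nu_prime, mu, kappa)
print(embedded, push_forward(embedded, kappa, 2))
```

Pushing the embedded measure forward again recovers `nu_prime`, which is the second condition of Definition 3.1, and applying the embedding to `mu_prime` returns `mu`, in line with Proposition 3.1.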

3.2 Markov kernels and Markov morphisms

Definition 3.2. (Markov kernel and Markov morphism, cf. [8], [15]) A Markov kernel between two measurable spaces $(\Omega_1, \Sigma_1)$ and $(\Omega_2, \Sigma_2)$ is a map $K : \Omega_1 \to \mathcal{P}(\Omega_2)$, associating to each $\omega_1 \in \Omega_1$ a probability measure on $\Omega_2$ such that for each fixed measurable $A_2 \subset \Omega_2$ the map

$$\Omega_1 \to [0, 1], \quad \omega_1 \mapsto K(\omega_1; A_2) := K(\omega_1)(A_2)$$

is measurable. The Markov morphism induced by $K$ is the linear map

$$K_* : \mathcal{S}(\Omega_1) \to \mathcal{S}(\Omega_2), \quad K_*(\mu_1; A_2) = K_*(\mu_1)(A_2) := \int_{\Omega_1} K(\omega_1; A_2)\, d\mu_1(\omega_1). \tag{3.6}$$

Since $K(\omega_1) \in \mathcal{P}(\Omega_2)$, it follows that $K(\omega_1; \Omega_2) = 1$ and hence (3.6) implies that $K_*(\mu_1)(\Omega_2) = \mu_1(\Omega_1)$. Thus,

$$\|K_*(\mu_1)\|_{TV} = \|\mu_1\|_{TV} \quad \text{for all } \mu_1 \in \mathcal{M}(\Omega_1). \tag{3.7}$$

In particular, a Markov morphism maps probability measures to probability measures. For a general measure $\mu_1 \in \mathcal{S}(\Omega_1)$, (2.2) implies that $|K_*(\mu_1; A_2)| \le K_*(|\mu_1|; A_2)$ for all $A_2 \in \Sigma_2$ and hence,

$$\|K_*(\mu_1)\|_{TV} \le \|K_*(|\mu_1|)\|_{TV} = \|\mu_1\|_{TV} \quad \text{for all } \mu_1 \in \mathcal{S}(\Omega_1),$$

so that $K_* : \mathcal{S}(\Omega_1) \to \mathcal{S}(\Omega_2)$ is a bounded linear map. Observe that we can recover the Markov kernel $K$ from $K_*$ using the relation

$$K(\omega_1) = K_*(\delta^{\omega_1}) \quad \text{for all } \omega_1 \in \Omega_1,$$

where $\delta^{\omega_1}$ denotes the Dirac measure supported at $\omega_1 \in \Omega_1$.

Remark 3.1. From (3.6) it is immediate that $K_*$ preserves dominance of measures, i.e., if $\mu_1$ dominates $\mu_1'$, then $K_*(\mu_1)$ dominates $K_*(\mu_1')$. Thus, for each $\mu_1 \in \mathcal{M}(\Omega_1)$ there is a restriction

$$K_* : \mathcal{S}(\Omega_1, \mu_1) \to \mathcal{S}(\Omega_2, \mu_2),$$

where $\mu_2 := K_*(\mu_1)$.

Definition 3.3. (Composition of Markov kernels) Let $(\Omega_i, \Sigma_i)$, $i = 1, 2, 3$ be measurable spaces, and let $K_i : \Omega_i \to \mathcal{P}(\Omega_{i+1})$ for $i = 1, 2$ be Markov kernels. The composition of $K_1$ and $K_2$ is the Markov kernel

$$K_2K_1 : \Omega_1 \to \mathcal{P}(\Omega_3), \quad \omega \mapsto (K_2)_*(K_1(\omega)).$$

Since $\|(K_2)_*(K_1(\omega))\|_{TV} = \|K_1(\omega)\|_{TV} = 1$ by (3.7), $(K_2)_*(K_1(\omega))$ is a probability measure, hence this composition yields indeed a Markov kernel. Moreover, it is straightforward to verify that this composition is associative, and for the induced Markov morphism we have

$$(K_2K_1)_* = (K_2)_*(K_1)_*. \tag{3.8}$$

Markov kernels are generalizations of statistics.
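In fact, a statistic $\kappa : \Omega \to \Omega'$ induces a Markov kernel by $K^\kappa(\omega) := \delta^{\kappa(\omega)}$, so that $K^\kappa(\omega; A') := \chi_{\kappa^{-1}(A')}(\omega)$. In this case, the Markov morphism induced by $K^\kappa$ is the map $\kappa_* : \mathcal{S}(\Omega) \to \mathcal{S}(\Omega')$ from (3.1). We shall write the Markov kernel $K^\kappa$ also as $\kappa$ if there is no danger of confusion.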
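On finite sample spaces a Markov kernel is a row-stochastic matrix, and the definitions of this subsection become matrix operations. The sketch below is an illustration under that assumption only; the function names and the sample kernels are hypothetical. It implements the Markov morphism (3.6), the composition of Definition 3.3, and the kernel induced by a statistic, and lets one check (3.7) and (3.8) on an example.

```python
def morphism(K, mu):
    """Markov morphism (3.6): (K_* mu)(j) = sum_i K[i][j] * mu[i]."""
    n_out = len(K[0])
    return [sum(K[i][j] * mu[i] for i in range(len(mu))) for j in range(n_out)]

def compose(K2, K1):
    """Composition K2 K1 of Definition 3.3: each row K1(w) is pushed through (K2)_*."""
    return [morphism(K2, row) for row in K1]

def kernel_of_statistic(kappa, n_out):
    """Kernel induced by a statistic: the Dirac measure at kappa(w) for each point w."""
    return [[1.0 if kappa[i] == j else 0.0 for j in range(n_out)]
            for i in range(len(kappa))]

# Hypothetical kernels: K1 from a 2-point to a 3-point space,
# K2 induced by a statistic from the 3-point to a 2-point space.
K1 = [[0.2, 0.8, 0.0],
      [0.5, 0.25, 0.25]]
K2 = kernel_of_statistic([0, 1, 1], 2)

mu = [3.0, -1.0]                                  # a signed measure on the 2-point space
lhs = morphism(compose(K2, K1), mu)               # (K2 K1)_* mu
rhs = morphism(K2, morphism(K1, mu))              # (K2)_* (K1)_* mu
print(lhs, rhs, sum(morphism(K1, [3.0, 1.0])))
```

For the kernel induced by a statistic, `morphism` coincides with the push-forward of measures, which is the finite version of the statement that the Markov morphism induced by the statistic is the map from (3.1).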

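Definition 3.4. (Congruent Markov kernels) A Markov kernel $K : \Omega' \to \mathcal{P}(\Omega)$ is called $\kappa$-congruent for a statistic $\kappa : \Omega \to \Omega'$ if

$$\kappa_*(K(\omega')) = \delta^{\omega'} \quad \text{for all } \omega' \in \Omega', \tag{3.9}$$

or, equivalently,

$$(K^\kappa K)_* = \mathrm{Id}_{\mathcal{S}(\Omega')} : \mathcal{S}(\Omega') \to \mathcal{S}(\Omega').$$

In this case, we also call the induced Markov morphism $K_* : \mathcal{S}(\Omega') \to \mathcal{S}(\Omega)$ $\kappa$-congruent.

In order to relate the notions of $\kappa$-congruent Markov morphism and $\kappa$-congruent embeddings from Definition 3.1, we need the notion of $\kappa$-transverse measures.

Definition 3.5. (Transverse measures) Let $\kappa : \Omega \to \Omega'$ be a statistic. A measure $\mu \in \mathcal{M}(\Omega)$ is said to admit $\kappa$-transverse measures if there are measures $\mu^\perp_{\omega'}$ on $\kappa^{-1}(\omega')$ such that for all $\phi \in L^1(\Omega, \mu)$

$$\int_\Omega \phi\, d\mu = \int_{\Omega'} \left( \int_{\kappa^{-1}(\omega')} \phi\, d\mu^\perp_{\omega'} \right) d\mu'(\omega'), \tag{3.10}$$

where $\mu' := \kappa_*(\mu)$. In particular, the function

$$\Omega' \to \hat{\mathbb{R}}, \quad \omega' \mapsto \int_{\kappa^{-1}(\omega')} \phi\, d\mu^\perp_{\omega'}$$

is measurable for all $\phi \in L^1(\Omega, \mu)$.

Observe that the choice of $\kappa$-transverse measures $\mu^\perp_{\omega'}$ is not unique, but rather, one can change these measures for all $\omega'$ in a $\mu'$-null set.

Proposition 3.2. Let $\kappa : \Omega \to \Omega'$ be a statistic and $\mu \in \mathcal{M}(\Omega)$ a measure which admits $\kappa$-transverse measures $\{\mu^\perp_{\omega'} : \omega' \in \Omega'\}$. Then $\mu^\perp_{\omega'}$ is a probability measure for almost every $\omega' \in \Omega'$ and hence, we may assume w.l.o.g. that $\mu^\perp_{\omega'} \in \mathcal{P}(\kappa^{-1}(\omega'))$ for all $\omega' \in \Omega'$.

Proof. Given $\varepsilon > 0$, define $A_\varepsilon' := \{\omega' \in \Omega' : \mu^\perp_{\omega'}(\kappa^{-1}(\omega')) \ge 1 + \varepsilon\}$. Then for $\phi := \chi_{\kappa^{-1}(A_\varepsilon')}$ the two sides of equation (3.10) read

$$\int_\Omega \chi_{\kappa^{-1}(A_\varepsilon')}\, d\mu = \mu(\kappa^{-1}(A_\varepsilon')) = \mu'(A_\varepsilon')$$

and

$$\begin{aligned}
\int_{\Omega'} \left( \int_{\kappa^{-1}(\omega')} \chi_{\kappa^{-1}(A_\varepsilon')}\, d\mu^\perp_{\omega'} \right) d\mu'(\omega') &= \int_{A_\varepsilon'} \left( \int_{\kappa^{-1}(\omega')} d\mu^\perp_{\omega'} \right) d\mu'(\omega') \\
&= \int_{A_\varepsilon'} \mu^\perp_{\omega'}(\kappa^{-1}(\omega'))\, d\mu'(\omega') \\
&\ge (1 + \varepsilon)\, \mu'(A_\varepsilon').
\end{aligned}$$

Thus, (3.10) implies $\mu'(A_\varepsilon') \ge (1 + \varepsilon)\, \mu'(A_\varepsilon')$,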

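and hence, $\mu'(A_\varepsilon') = 0$ for all $\varepsilon > 0$. Thus,

$$\mu'\big(\{\omega' \in \Omega' : \mu^\perp_{\omega'}(\kappa^{-1}(\omega')) > 1\}\big) = \mu'\left( \bigcup_{n=1}^\infty A_{1/n}' \right) \le \sum_{n=1}^\infty \mu'(A_{1/n}') = 0,$$

whence $\{\omega' \in \Omega' : \mu^\perp_{\omega'}(\kappa^{-1}(\omega')) > 1\}$ is a $\mu'$-null set. Analogously, $\{\omega' \in \Omega' : \mu^\perp_{\omega'}(\kappa^{-1}(\omega')) < 1\}$ is a $\mu'$-null set, that is, $\mu^\perp_{\omega'} \in \mathcal{P}(\kappa^{-1}(\omega'))$ and hence $\|\mu^\perp_{\omega'}\|_{TV} = 1$ for $\mu'$-a.e. $\omega' \in \Omega'$.

Thus, if we replace $\mu^\perp_{\omega'}$ by $\tilde\mu^\perp_{\omega'} := \|\mu^\perp_{\omega'}\|_{TV}^{-1}\, \mu^\perp_{\omega'}$, then $\tilde\mu^\perp_{\omega'} \in \mathcal{P}(\kappa^{-1}(\omega'))$ for all $\omega' \in \Omega'$, and since $\tilde\mu^\perp_{\omega'} = \mu^\perp_{\omega'}$ for $\mu'$-a.e. $\omega' \in \Omega'$, it follows that (3.10) holds when replacing $\mu^\perp_{\omega'}$ by $\tilde\mu^\perp_{\omega'}$.

We are now ready to relate the notions of $\kappa$-congruent embeddings and $\kappa$-congruent Markov kernels.

Theorem 3.1. Let $\kappa : \Omega \to \Omega'$ be a statistic and $\mu' \in \mathcal{M}(\Omega')$ be a measure.

1. If $K : \Omega' \to \mathcal{P}(\Omega)$ is a $\kappa$-congruent Markov kernel, then the restriction of $K_*$ to $\mathcal{S}(\Omega', \mu') \subset \mathcal{S}(\Omega')$ is a $\kappa$-congruent embedding and hence, for $\phi' \in L^1(\Omega', \mu')$ we have

$$K_*(\phi'\mu') = \kappa^*(\phi')\, K_*(\mu').$$

2. Conversely, if $K_* : \mathcal{S}(\Omega', \mu') \to \mathcal{S}(\Omega)$ is a $\kappa$-congruent embedding, then the following are equivalent.

(a) $K_*$ is the restriction of a $\kappa$-congruent Markov morphism to $\mathcal{S}(\Omega', \mu') \subset \mathcal{S}(\Omega')$.

(b) $\mu := K_*(\mu') \in \mathcal{S}(\Omega)$ admits $\kappa$-transverse measures.

Theorem 3.1 implies that the two notions of congruency are equivalent for large classes of statistics $\kappa$, since the existence of transversal measures is guaranteed under rather mild hypotheses, e.g. if one of $\Omega$, $\Omega'$ is a finite set, or if $\Omega$, $\Omega'$ are differentiable manifolds equipped with a Borel measure $\mu$ and $\kappa$ is a differentiable map. However, there are examples of statistics and measures which do not admit $\kappa$-transverse measures, cf. Example 3.2 below.

Proof. The first statement follows directly from $(K^\kappa K)_* = (K^\kappa)_* K_* = \kappa_* K_*$ by (3.8) and Proposition 3.1.

For the second, suppose that $K_* : \mathcal{S}(\Omega', \mu') \to \mathcal{S}(\Omega)$ is a $\kappa$-congruent embedding. Then $K_* = K_\mu$ given in (3.3) for the measure $\mu := K_*(\mu')$ by Proposition 3.1.

If we assume that $K_*$ is the restriction of a $\kappa$-congruent Markov morphism induced by the $\kappa$-congruent Markov kernel $K : \Omega' \to \mathcal{P}(\Omega)$, then we define the measures

$$\mu^\perp_{\omega'} := K(\omega')|_{\kappa^{-1}(\omega')} \in \mathcal{M}(\kappa^{-1}(\omega')).$$

Note that for $\omega' \in \Omega'$

$$K(\omega'; \Omega \setminus \kappa^{-1}(\omega')) = \int_{\Omega \setminus \kappa^{-1}(\omega')} dK(\omega') = \int_{\Omega' \setminus \omega'} d\kappa_*(K(\omega')) \overset{(3.9)}{=} \int_{\Omega' \setminus \omega'} d\delta^{\omega'} = 0.$$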

18 That is, K(ω ) is supported on κ 1 (ω ) and hence, for an arbitrary set A we have K(ω ; A) = K(ω ; A κ 1 (ω )) = µ ω (A κ 1 (ω )) = χ A dµ ω. κ 1 (ω ) Substituting this into the definition of K we obtain for a subset A χ A dµ = µ(a) = K (µ ; A) (3.6) = K(ω ; A) dµ (ω ) ( ) = χ A dµ ω dµ (ω ), κ 1 (ω ) showing that (3.10) holds for φ = χ A. But then, by linearity (3.10) holds for any step function φ, and since these are dense in L 1 (, µ), it follows that (3.10) holds for all φ, so that the measures µ ω defined above yield indeed κ-transverse measures of µ. Conversely, suppose that µ := K (µ ) admits κ-transverse measures µ ω, and by Proposition 3.2 we may assume w.l.o.g. that µ ω P(κ 1 (ω )). Then we define the map K : P(), K(ω ; A) := µ ω (A κ 1 (ω )) = χ A dµ ω. κ 1 (ω ) Since for fixed A the map ω κ 1 (ω ) χ A dµ ω is measurable by the definition of transversal measures, K is indeed a Markov kernel. Moreover, for A κ (K(ω ))(A ) = K(ω ; κ 1 (A )) = µ ω (κ 1 (A ) κ 1 (ω )) = χ A (ω ), so that κ K(ω ) = δ ω for all ω, whence K is κ-congruent. Moreover, for any φ L 1 (, µ ) and A we have K µ (φ µ (3.3) )(A) = κ (φ )µ(a) = χ A κ (φ ) dµ ( ) (3.10) = χ A κ (φ ) dµ ω dµ (ω ) κ 1 (ω ) ( ) = χ A dµ ω φ (ω ) dµ (ω ) κ 1 (ω ) = K(ω ; A) d(φ µ )(ω ) (3.6) = K (φ µ )(A). Thus, K µ (φ µ ) = K (φ µ ) for all φ L 1 (, µ ) and hence, K µ (ν) = K (ν) for all ν S(, µ ). That is, the given congruent embedding K µ coincides with the Markov morphism K induced by K, and this completes the proof. Now we give an example of a statistic which does not admit κ-transverse measures. Example 3.2. Let := S 1 be the unit circle group in the complex plain with the 1-dimensional Borel algebra B. Let Γ := exp(2π 1Q) S 1 be the subgroup of rational rotations, and let := S 1 /Γ be the quotient space with the canonical projection κ :. Let B := {A : 17

19 κ 1 (A ) B}, so that κ : is measurable. For γ Γ, we let m γ : S 1 S 1 denote the multiplication by γ. Let λ be the 1-dimensional Lebesgue measure on and λ := κ (λ) be the induced measure on. Suppose that λ admits κ-transverse measures (λ ω ) ω. Then for each A B we have ( ) λ(a) = dλ ω dλ (ω ). (3.11) A κ 1 (ω ) Since λ is invariant under rotations, we have on the other hand for γ Γ ( ) λ(a) = λ(m 1 γ A) = dλ ω dλ (ω ) = ( (m 1 γ A) κ 1 (ω ) ) d((m γ ) λ ω ) A κ 1 (ω ) dλ (ω ). (3.12) Comparing (3.11) and (3.12) implies that ((m γ ) λ ω ) ω is another familiy of κ-transverse measures of λ which implies that (m γ ) λ ω = λ ω for λ -a.e. ω, and as Γ is countable, it follows that (m γ ) λ ω = λ ω for all γ Γ and λ -a.e. ω. Thus, for a.e. ω we have λ ω ({γ x}) = λ ω ({x}), and since Γ acts transitively on κ 1 (ω ), it follows that singleton subsets have equal measure, i.e., there is a constant c ω with λ ω (A ) = c ω A for all A κ 1 (ω ). As κ 1 (ω ) is countable and infinite, this implies that λ ω = 0 if c ω = 0, and λ ω (κ 1 (ω )) = if c ω > 0. Thus, λ ω is not a probability measure for a.e. ω, contradicting Proposition 3.2. This shows that λ does not admit κ-transverse measures. We conclude this section by the following result (cf. [5, Theorem 4.10]). Theorem 3.2. Any Markov kernel K = P( ) can be decomposed into a statistic and a congruent Markov kernel. That is, there is a Markov kernel K cong : P(ˆ) which is congruent w.r.t. some statistic κ 1 : ˆ, and a statistic κ 2 : ˆ such that K = K κ 2 K cong. Proof. Let ˆ := and let κ 1 : ˆ and κ 2 : ˆ be the canonical projections. We define the Markov kernel K cong : P(ˆ), K cong (ω 1 ; Â) := K(ω 1; κ 2 (Â ({ω 1} ))), and we assert that it is κ 1 -congruent. Namely, (κ 1 ) (K cong (ω 1 ))(A 1 ) = K cong (ω 1 ; κ 1 1 (A 1)) = K cong (ω 1 ; A 1 ) = K(ω 1 ; κ 2 ((A 1 ) ({ω 1 } ))) { K(ω1 ; = ) = 1 if ω 1 A 1 K(ω 1 ; ) = 0 if ω 1 / A 1 = χ A1 (ω 1 ), 18

20 whence (κ 1 ) (K cong (ω 1 )) = δ ω 1 as claimed. Observe that (κ 2 ) (K cong (ω 1 ))(A 2 ) = K cong (ω 1 ; κ 1 2 (A 2)) = K cong (ω 1 ; 1 A 2 ) = K(ω 1 ; κ 2 (( 1 A 2 ) ({ω 1 } ))) = K(ω 1 ; κ 2 ({ω 1 } A 2 )) = K(ω 1 ; A 2 ). Therefore, K = K κ 2 K cong, so the claim follows. 3.3 Powers of densities and congruent embeddings Given a statistic κ :, recall that by Proposition 3.1 any κ-congruent embedding is of the form K µ : S(, µ ) S(, µ) for some µ M() with µ := κ (µ), and the image of K µ, denoted by C κ,µ, is called a κ-congruent subspace. We now wish to generalize the notion of congruent embeddings and congruent subspaces to powers of measures. Namely, for r (0, 1] we define the congruent embedding of r-th powers to be the map K r µ : S r (, µ ) S r (), φµ r κ (φ)µ r, and denote its image by C r κ,µ := {κ (φ )µ r : φ L 1/r (, µ )}. (3.13) As before, we denote the spaces of the form C r κ,µ for µ M() as κ-congruent subspaces. Observe that K 1 µ = K µ and C 1 κ,µ = C κ,µ with the definitions from (3.3) and (3.5). Since the pull-back of functions preserves products and powers, we have immediately the following properties. K r µ(ν r ) 1/r = ν r 1/r K r 1+r 2 µ (ν r1 ν r 2 ) = K r 2 µ (ν r 1 ) K r 2 µ (ν r 2 ) (3.14) K rα µ (π α (ν r )) = π α (K r µ(ν r )) and K rα µ ( π α (ν r )) = π α (K r µ(ν r )) for r 1 + r 2 1 and 0 < α < 1/r. We now show the following decomposition result. Proposition 3.3. Let κ :, µ M() and µ := κ (µ) M( ) be as above. Then for the congruent subspaces C r κ,µ, we have the decomposition where S r (, µ) = C r κ,µ ( (C 1 r κ,µ ) S r (, µ) ), (3.15) (C 1 r κ,µ ) = {ν r S r () : (ν r ; ρ 1 r ) = 0 for all ρ 1 r C 1 r κ,µ } with the pairing (.;.) from (2.11). In particular, for r = 1/2 (3.15) is an orthogonal decomposition w.r.t. the Hilbert metric on S 1/2 (, µ). Moreover, (C 1 r κ,µ ) S r (, µ) = {ν r S r (, µ) : κ (ν r µ 1 r ) = 0}. (3.16) 19
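On finite sample spaces the decomposition (3.15) and the characterization (3.16) can be checked by direct computation. The following is a minimal sketch under assumptions that are not in the text: $\Omega = \{0,\dots,5\}$, $\Omega' = \{0,1,2\}$, the statistic $\kappa(\omega) = \lfloor \omega/2 \rfloor$, the weights `mu` and the coefficients `phi` are all chosen for illustration only, and $r = 1/2$. An element $\nu^r = \phi\mu^r$ is represented by the function $\phi$, so that the pairing of two such elements is $\sum \phi_1 \phi_2 \mu$.

```python
# Finite illustration: Omega = {0,...,5}, Omega' = {0,1,2}, kappa(w) = w // 2.
# All concrete numbers below are hypothetical, chosen only for the check.

def kappa(w):
    return w // 2

mu = [0.5, 1.5, 2.0, 1.0, 0.25, 0.75]   # positive finite measure on Omega
phi = [1.0, -2.0, 0.5, 3.0, 4.0, 0.0]   # nu^r = phi * mu^r

def pushforward(weights, n_out=3):
    """kappa_* of the signed measure with the given point masses."""
    out = [0.0] * n_out
    for w, m in enumerate(weights):
        out[kappa(w)] += m
    return out

def decompose(phi, mu):
    """Split phi into a part constant on the fibres of kappa and a remainder."""
    mu_prime = pushforward(mu)
    pushed = pushforward([f * m for f, m in zip(phi, mu)])
    # phi' is the density of kappa_*(phi mu) with respect to mu' = kappa_*(mu)
    phi_prime = [a / b for a, b in zip(pushed, mu_prime)]
    congruent = [phi_prime[kappa(w)] for w in range(len(mu))]   # kappa^*(phi')
    remainder = [f - c for f, c in zip(phi, congruent)]
    return phi_prime, congruent, remainder

phi_prime, congruent, remainder = decompose(phi, mu)

# (3.16): the remainder pushes forward to the zero measure on Omega'.
residual = pushforward([g * m for g, m in zip(remainder, mu)])

# r = 1/2: the two summands are orthogonal for the pairing sum(phi1 * phi2 * mu).
pairing = sum(c * g * m for c, g, m in zip(congruent, remainder, mu))
```

In this finite setting $\phi'$ is the conditional expectation of $\phi$ given $\kappa$, which is why the remainder has vanishing push-forward on every fibre.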

Proof. We begin by showing (3.16). Let $\nu^r = \phi\mu^r \in \mathcal{S}^r(\Omega, \mu)$ with $\phi \in L^{1/r}(\Omega, \mu)$. Then $\nu^r \in (C^{1-r}_{\kappa,\mu})^\perp$ if and only if for all $\psi' \in L^{1/(1-r)}(\Omega', \mu')$ we have
\[ 0 = \int_\Omega d\big( \nu^r \cdot (\kappa^*(\psi')\mu^{1-r}) \big) = \int_\Omega \phi \, \kappa^*(\psi') \, d\mu = \int_{\Omega'} d\kappa_*(\phi \, \kappa^*(\psi') \mu) \overset{(3.2)}{=} \int_{\Omega'} \psi' \, d\kappa_*(\phi\mu). \]
But $\int_{\Omega'} \psi' \, d\kappa_*(\phi\mu) = 0$ for all $\psi' \in L^{1/(1-r)}(\Omega', \mu')$ holds if and only if $\kappa_*(\phi\mu) = 0$, which shows (3.16) since $\phi\mu = \nu^r \cdot \mu^{1-r}$.

The spaces on the right hand side of (3.15) are obviously contained in $\mathcal{S}^r(\Omega, \mu)$. To see that $C^r_{\kappa,\mu} \cap (C^{1-r}_{\kappa,\mu})^\perp = 0$, observe that for $\nu^r = \kappa^*(\phi')\mu^r \in C^r_{\kappa,\mu}$ we have $\kappa_*(\nu^r \cdot \mu^{1-r}) = \kappa_*(\kappa^*(\phi')\mu) = \phi'\mu'$ by (3.2), so by (3.16) $\nu^r$ is contained in $(C^{1-r}_{\kappa,\mu})^\perp$ only if $\phi' = 0$.

Since $K^r_\mu : \mathcal{S}^r(\Omega', \mu') \to C^r_{\kappa,\mu}$ is an isometry by (3.14) and $\mathcal{S}^r(\Omega', \mu')$ is complete, it follows that $C^r_{\kappa,\mu}$ is complete and hence closed in $\mathcal{S}^r(\Omega, \mu)$. Also, $(C^{1-r}_{\kappa,\mu})^\perp \subset \mathcal{S}^r(\Omega)$ is a closed subspace, being the orthogonal complement of the subspace $C^{1-r}_{\kappa,\mu}$. Therefore, the right hand side of (3.15) is a closed subspace of the left, hence it suffices to show that the right hand side contains the dense subspace $\{\nu^r = \phi\mu^r : \phi \in L^\infty(\Omega, \mu)\}$ of $\mathcal{S}^r(\Omega, \mu)$.

Thus, for such a $\nu^r = \phi\mu^r \in \mathcal{S}^r(\Omega, \mu)$ define $\phi' \in L^1(\Omega', \mu')$ by
\[ \kappa_*(\nu^r \cdot \mu^{1-r}) = \kappa_*(\phi\mu) = \phi'\mu'. \]
Then for $A' \subset \Omega'$ we have
\[ |(\kappa_*(\phi\mu))(A')| = \Big| \int_{\kappa^{-1}(A')} \phi \, d\mu \Big| \leq \|\phi\|_\infty \, \mu(\kappa^{-1}A') = \|\phi\|_\infty \, \mu'(A'), \]
whence $\|\phi'\|_\infty \leq \|\phi\|_\infty$, so that $\phi'$ is bounded as well. In particular, $\phi'\mu'^r \in \mathcal{S}^r(\Omega', \mu')$, and we decompose
\[ \nu^r = \underbrace{\kappa^*(\phi')\mu^r}_{\in\, C^r_{\kappa,\mu}} + (\nu^r - \kappa^*(\phi')\mu^r). \]
Thus, it remains to show that $\nu^r_1 := \nu^r - \kappa^*(\phi')\mu^r \in (C^{1-r}_{\kappa,\mu})^\perp \cap \mathcal{S}^r(\Omega, \mu)$. Since evidently $\nu^r_1 \in \mathcal{S}^r(\Omega, \mu)$, we may use (3.16) to verify this. Namely,
\[ \kappa_*(\nu^r_1 \cdot \mu^{1-r}) = \kappa_*(\nu^r \cdot \mu^{1-r}) - \kappa_*(\kappa^*(\phi')\mu^r \cdot \mu^{1-r}) = \phi'\mu' - \phi'\kappa_*(\mu) = 0 \]
by (3.2) and the definition of $\phi'$. This completes the proof.

4 Parametrized measure models and k-integrability

In this section, we present our notion of a parametrized measure model.

Definition 4.1. (Parametrized measure model) Let $\Omega$ be a measure space.

1. A parametrized measure model is a triple $(M, \Omega, p)$ where $M$ is a (finite or infinite dimensional) Banach manifold and $p : M \to \mathcal{M}(\Omega) \subset \mathcal{S}(\Omega)$ is a $C^1$-map in the sense explained earlier.

2. The triple $(M, \Omega, p)$ is called a statistical model if it consists only of probability measures, i.e., such that the image of $p$ is contained in $\mathcal{P}(\Omega)$.

3. We call such a model dominated by $\mu_0$ if the image of $p$ is contained in $\mathcal{S}(\Omega, \mu_0)$. In this case, we use the notation $(M, \Omega, \mu_0, p)$ for this model.

Remark 4.1. Evidently, for the applications we have in mind, we are interested mainly in statistical models. However, we can take the point of view that $\mathcal{P}(\Omega) = \mathbb{P}(\mathcal{M}(\Omega) \setminus 0)$ is the projectivisation of $\mathcal{M}(\Omega)$ via rescaling. Thus, given a parametrized measure model $(M, \Omega, p)$, normalisation yields a statistical model $(M, \Omega, p_0)$ defined by
\[ p_0(\xi) := \frac{p(\xi)}{\|p(\xi)\|_{TV}}, \]
which is again a $C^1$-map. Indeed, the map $\mu \mapsto \|\mu\|_{TV}$ on $\mathcal{M}(\Omega)$ is a $C^1$-map, being the restriction of the continuous linear map $\mu \mapsto \int_\Omega d\mu$ on $\mathcal{S}(\Omega)$.

Observe that while $\mathcal{S}(\Omega)$ is a Banach space, the subsets $\mathcal{M}(\Omega)$ and $\mathcal{P}(\Omega)$ do not carry a canonical manifold structure.

If a parametrized measure model $(M, \Omega, \mu_0, p)$ is dominated by $\mu_0$, then there is a density function $p : \Omega \times M \to \mathbb{R}$ such that
\[ p(\xi) = p(\cdot\,; \xi)\mu_0. \tag{4.1} \]
It will be clear from the context, i.e., from the number of arguments, which map $p$ is meant, so we denote both maps by the same symbol. Evidently, we must have $p(\cdot\,; \xi) \in L^1(\Omega, \mu_0)$ for all ξ. In particular, for fixed ξ, $p(\cdot\,; \xi)$ is defined only up to changes on a $\mu_0$-null set.

Definition 4.2. (Regular density function) Let $(M, \Omega, \mu_0, p)$ be a parametrized measure model dominated by $\mu_0$. We say that this model has a regular density function if the density function $p : \Omega \times M \to \mathbb{R}$ satisfying (4.1) can be chosen such that for all $V \in T_\xi M$ the partial derivative $\partial_V p(\cdot\,; \xi)$ exists and lies in $L^1(\Omega, \mu_0)$.

Remark 4.2. The standard notion of a statistical model always assumes that it is dominated by some measure and has a regular density function (e.g. [3, §2, p. 25], [4, §2.1], [19], [5, Definition 2.4]). In fact, the definition of a parametrized measure model or statistical model in [5, Definition 2.4] is equivalent to a parametrized measure model or statistical model with a regular density function in the sense of Definition 4.2.

Let us point out why the present notion is indeed more general. The formal definition of differentiability of $p$ implies that for each $C^1$-path $\xi(t) \in M$ with $\xi(0) = \xi$, $\dot\xi(0) =: V \in T_\xi M$, the curve $t \mapsto p(\cdot\,; \xi(t)) \in L^1(\Omega, \mu_0)$ is differentiable. This implies that there is a $d_\xi p(V) \in L^1(\Omega, \mu_0)$ such that
\[ \Big\| \frac{p(\cdot\,; \xi(t)) - p(\cdot\,; \xi)}{t} - d_\xi p(V)(\cdot) \Big\|_1 \xrightarrow{\; t \to 0 \;} 0. \]
If this is a pointwise convergence, then $d_\xi p(V) = \partial_V p(\cdot\,; \xi)$ is the partial derivative, whence $\partial_V p(\cdot\,; \xi)$ lies in $L^1(\Omega, \mu_0)$, so that the density function is regular.

However, in general convergence in $L^1(\Omega, \mu_0)$ does not imply pointwise convergence, whence there are parametrized measure models in the sense of Definition 4.1 without a regular density function, cf. Example 4.1.2 below. Nevertheless, for simplicity we shall frequently use the notation $\partial_V p(\cdot\,; \xi)$ instead of $d_\xi p(V)(\cdot)$, even if the density function is not regular. By this convention, for a parametrized measure model $(M, \Omega, \mu_0, p)$ we can describe its derivative in the direction of $V \in T_\xi M$ as
\[ d_\xi p(V) = \partial_V p(\cdot\,; \xi) \, \mu_0. \]

Example 4.1.

1. The family of normal distributions on $\mathbb{R}$
\[ p(\mu, \sigma) := \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Big( -\frac{(x - \mu)^2}{2\sigma^2} \Big) dx \]
is a statistical model with regular density function on the upper half plane $H = \{(\mu, \sigma) : \mu, \sigma \in \mathbb{R}, \sigma > 0\}$.

2. To see that there are parametrized measure models without a regular density function, consider the family of measures on $\Omega = (0, \pi)$
\[ p(\xi) := \begin{cases} \big( 1 + \xi \, (\sin^2(t - 1/\xi))^{1/\xi^2} \big) \, dt & \text{for } \xi \neq 0, \\ dt & \text{for } \xi = 0. \end{cases} \]
This model is dominated by the Lebesgue measure $dt$, with density function $p(t; \xi) = 1 + \xi \, (\sin^2(t - 1/\xi))^{1/\xi^2}$ for $\xi \neq 0$, $p(t; 0) = 1$. Thus, the partial derivative $\partial_\xi p$ does not exist at $\xi = 0$, whence the density function is not regular.

On the other hand, $p : \mathbb{R} \to \mathcal{M}(\Omega, dt)$ is differentiable in the above sense at $\xi = 0$ with $d_0 p(\partial_\xi) = 0$, so that $(M, \Omega, p)$ is a parametrized measure model in the sense of Definition 4.1. To see this, we calculate
\[ \Big\| \frac{p(\xi) - p(0)}{\xi} \Big\|_1 = \int_0^\pi (\sin^2(t - 1/\xi))^{1/\xi^2} \, dt = \int_0^\pi (\sin^2 t)^{1/\xi^2} \, dt \xrightarrow{\; \xi \to 0 \;} 0, \]
which shows the claim. Here, we used the π-periodicity of the integrand for fixed ξ and dominated convergence in the last step.

Since for a parametrized measure model $(M, \Omega, p)$ the map $p$ is $C^1$, it follows that its derivative yields a continuous map between the tangent spaces
\[ dp : TM \to T\mathcal{M}(\Omega) = \bigcup_{\mu \in \mathcal{M}(\Omega)} \mathcal{S}(\Omega, \mu). \]
That is, for each tangent vector $V \in T_\xi M$, its differential $d_\xi p(V)$ is contained in $\mathcal{S}(\Omega, p(\xi))$ and hence dominated by $p(\xi)$.
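The two claims of Example 4.1.2 can be made tangible numerically. The sketch below is only an illustration, not part of the argument: it approximates $\int_0^\pi (\sin^2 t)^{1/\xi^2}\,dt$ with a midpoint rule (the step count and the sample values of ξ are arbitrary choices), and it evaluates the pointwise difference quotient $(p(t;\xi) - p(t;0))/\xi$ at the fixed point $t = 1$ along two hypothetical sequences $\xi_n \to 0$, one hitting the peaks of the sine and one hitting its zeros.

```python
import math

def l1_norm_of_quotient(xi, steps=20000):
    """Midpoint-rule value of the integral of (sin^2 t)^(1/xi^2) over (0, pi)."""
    h = math.pi / steps
    power = 1.0 / xi**2
    return h * sum(math.sin((i + 0.5) * h) ** (2.0 * power) for i in range(steps))

def pointwise_quotient(t, xi):
    """(p(t; xi) - p(t; 0)) / xi = (sin^2(t - 1/xi))^(1/xi^2) for xi != 0."""
    return (math.sin(t - 1.0 / xi) ** 2) ** (1.0 / xi**2)

# L^1 norms of the difference quotient shrink as xi -> 0.
norms = [l1_norm_of_quotient(xi) for xi in (1.0, 0.5, 0.1, 0.02)]

# At fixed t the quotient has no limit: it is ~1 along one sequence, ~0 along another.
t = 1.0
peaks = [pointwise_quotient(t, 1.0 / (t - math.pi / 2 + n * math.pi)) for n in range(5, 10)]
zeros = [pointwise_quotient(t, 1.0 / (t + n * math.pi)) for n in range(5, 10)]
```

The list `norms` decreases towards zero, in line with $d_0 p(\partial_\xi) = 0$ in $L^1$, while `peaks` and `zeros` stay near 1 and 0 respectively, which is the failure of pointwise convergence that makes the density non-regular.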

Definition 4.3. Let $(M, \Omega, p)$ be a parametrized measure model. Then for each tangent vector $V \in T_\xi M$ of $M$, we define
\[ \partial_V \log p(\xi) := \frac{d\{d_\xi p(V)\}}{dp(\xi)} \in L^1(\Omega, p(\xi)) \tag{4.2} \]
and call this the logarithmic derivative of $p$ at ξ in direction $V$.

If such a model is dominated by $\mu_0$ and has a regular density function $p$ for which (4.1) holds, then we can calculate the Radon-Nikodým derivative as
\[ \frac{d\{d_\xi p(V)\}}{dp(\xi)} = \frac{d\{d_\xi p(V)\}}{d\mu_0} \cdot \Big( \frac{dp(\xi)}{d\mu_0} \Big)^{-1} = \partial_V p(\cdot\,; \xi) \, (p(\cdot\,; \xi))^{-1} = \partial_V \log p(\cdot\,; \xi), \]
where we use the convention $\log 0 = 0$. This justifies the notation in (4.2) even for models without a regular density function.

For a parametrized measure model $(M, \Omega, p)$ and $k > 1$ we consider the map
\[ p^{1/k} := \pi^{1/k} p : M \to \mathcal{S}^{1/k}(\Omega), \qquad \xi \mapsto p(\xi)^{1/k}. \]
Since $\pi^{1/k}$ is continuous by Proposition 2.3, it follows that $p^{1/k}$ is continuous as well. Let us pretend for the moment that $p^{1/k}$ is a $C^1$-map, so that $d_\xi p^{1/k}(V) \in T_{p(\xi)^{1/k}} \mathcal{M}^{1/k}(\Omega) = \mathcal{S}^{1/k}(\Omega, p(\xi))$. In this case, because of $\pi^k \pi^{1/k} = \mathrm{Id}$, we have $p = \pi^k p^{1/k}$, whence by the chain rule and (2.14) we have for $\xi \in M$ and $V \in T_\xi M$
\[ d_\xi p(V) = k \, p(\xi)^{1 - 1/k} \cdot \big( d_\xi p^{1/k}(V) \big). \]
Together with (4.2) this implies
\[ d_\xi p^{1/k}(V) = \frac{1}{k} \, \partial_V \log p(\xi) \, p^{1/k}(\xi) \in \mathcal{S}^{1/k}(\Omega, p(\xi)) \tag{4.3} \]
and hence, in particular, $\partial_V \log p(\xi) \in L^k(\Omega, p(\xi))$, and it depends continuously on $V \in TM$. This motivates the following definition.

Definition 4.4. (k-integrable parametrized measure model) A parametrized measure model $(M, \Omega, p)$ is called k-integrable for $k \geq 1$ if for all $\xi \in M$ and $V \in T_\xi M$ we have
\[ \partial_V \log p(\xi) = \frac{d\{d_\xi p(V)\}}{dp(\xi)} \in L^k(\Omega, p(\xi)), \]
and moreover, the map $dp^{1/k} : TM \to T\mathcal{S}^{1/k}(\Omega)$ given in (4.3) is continuous. The map $dp^{1/k}$ is called the formal derivative of $p^{1/k}$. Furthermore, we call the model ∞-integrable if it is k-integrable for all $k \geq 1$.
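On a finite sample space every measure is a vector of point masses, so (4.2) and (4.3) reduce to componentwise formulas that can be compared with a difference quotient. The sketch below assumes a hypothetical one-parameter model on $\Omega = \{0, 1, 2\}$; the functions `p` and `dp`, the parameter value `xi0` and the exponent `k` are illustrative choices, not taken from the text.

```python
import math

def p(xi):
    """Point masses of a hypothetical model on Omega = {0, 1, 2} (not normalised)."""
    return [math.exp(xi), 1.0 + xi**2, math.exp(-2.0 * xi)]

def dp(xi):
    """Derivative of the point masses with respect to xi."""
    return [math.exp(xi), 2.0 * xi, -2.0 * math.exp(-2.0 * xi)]

def log_derivative(xi):
    """Radon-Nikodym derivative d{d_xi p} / dp(xi), cf. (4.2)."""
    return [d / m for d, m in zip(dp(xi), p(xi))]

def formal_derivative(xi, k):
    """Right-hand side of (4.3): (1/k) * (log-derivative) * p^(1/k)."""
    return [l * m ** (1.0 / k) / k for l, m in zip(log_derivative(xi), p(xi))]

def finite_difference(xi, k, h=1e-6):
    """Central difference quotient of the map xi -> p(xi)^(1/k)."""
    plus = [m ** (1.0 / k) for m in p(xi + h)]
    minus = [m ** (1.0 / k) for m in p(xi - h)]
    return [(a - b) / (2.0 * h) for a, b in zip(plus, minus)]

xi0, k = 0.3, 3
gap = max(abs(a - b) for a, b in zip(formal_derivative(xi0, k), finite_difference(xi0, k)))
```

Here `gap` is the largest componentwise discrepancy between the formal derivative (4.3) and the numerical derivative of $p^{1/k}$; in the finite case the two agree up to discretization error, since every such model with positive masses is k-integrable for all k.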

Since $p(\xi)$ is a finite measure, we have $L^k(\Omega, p(\xi)) \subset L^l(\Omega, p(\xi))$ for all $1 \leq l \leq k$. Thus, k-integrability implies l-integrability for all such $l$.

Remark 4.3.

1. By our previous discussion, a parametrized measure model $(M, \Omega, p)$ for which $p^{1/k}$ is a $C^1$-map is always k-integrable, and the derivative coincides with the formal derivative. However, it is not clear if there are k-integrable parametrized measure models for which $p^{1/k}$ is not a $C^1$-map.

2. Observe that for parametrized measure models with a regular density function the notion of k-integrability coincides with that given in [5, Definition 2.4].

Definition 4.5. (Canonical n-tensor) For $n \in \mathbb{N}$, the canonical n-tensor is the covariant n-tensor on $\mathcal{S}^{1/n}(\Omega)$, given by
\[ L^n(\nu_1, \ldots, \nu_n) = n^n \int_\Omega d(\nu_1 \cdots \nu_n), \quad \text{where } \nu_i \in \mathcal{S}^{1/n}(\Omega). \tag{4.4} \]
For $n = 2$, the pairing $(\cdot\,;\cdot) : \mathcal{S}^{1/2}(\Omega) \times \mathcal{S}^{1/2}(\Omega) \to \mathbb{R}$ from (2.11) satisfies
\[ (\nu_1; \nu_2) = \tfrac{1}{4} L^2(\nu_1, \nu_2). \]
Since $(\nu; \nu) = \|\nu\|_2^2$ by (2.8), it follows that $\big( \mathcal{S}^{1/2}(\Omega), \tfrac{1}{4} L^2 \big)$ is a Hilbert space with norm $\|\cdot\|_2$.

The main purpose of defining the notion of k-integrability is that for a k-integrable model, there is a well defined pullback of the canonical n-tensor $L^n$ via the map $p^{1/n}$ for all $n \leq k$. That is, we define for $V_1, \ldots, V_n \in T_\xi M$
\[ \begin{aligned} \tau^n_{(M,\Omega,p)}(V_1, \ldots, V_n) &:= L^n\big( d_\xi p^{1/n}(V_1), \ldots, d_\xi p^{1/n}(V_n) \big) \\ &= \int_\Omega \partial_{V_1} \log p(\xi) \cdots \partial_{V_n} \log p(\xi) \, dp(\xi), \end{aligned} \]
where the second line follows immediately from (4.3) and (4.4).

Example 4.2.

1. For $n = 1$, the canonical 1-form is given as
\[ \tau^1_{(M,\Omega,p)}(V) := \int_\Omega \partial_V \log p(\xi) \, dp(\xi) = \partial_V \|p(\xi)\|_{TV}. \]
Thus, it vanishes if and only if $\|p(\xi)\|_{TV}$ is locally constant, e.g., if $(M, \Omega, p)$ is a statistical model.

2. For $n = 2$, $\tau^2_{(M,\Omega,p)}$ coincides with the Fisher metric
\[ g^F(V, W)_\xi := \int_\Omega \partial_V \log p(\xi) \, \partial_W \log p(\xi) \, dp(\xi). \tag{4.5} \]

3. For $n = 3$, $\tau^3_{(M,\Omega,p)}$ coincides with the Amari-Chentsov tensor, a symmetric 3-tensor,
\[ T^{AC}(V, W, X)_\xi := \int_\Omega \partial_V \log p(\xi) \, \partial_W \log p(\xi) \, \partial_X \log p(\xi) \, dp(\xi). \]

Remark 4.4. While the Fisher metric and the Amari-Chentsov tensor give an interpretation of $\tau^n_{(M,\Omega,p)}$ for $n = 2, 3$, we do not know of any statistical significance of $\tau^n_{(M,\Omega,p)}$ for $n \geq 4$. However, one might hope to get an interpretation of e.g. $\tau^4_{(M,\Omega,p)}$ as some kind of curvature tensor of a suitable connection. Moreover, in [14, p. 212] the question is posed whether there are other significant tensors on statistical manifolds, and the canonical n-tensors may be considered as natural candidates.

5 Parametrized measure models and sufficient statistics

Given a parametrized measure model (statistical model, respectively) $(M, \Omega, p)$ and a Markov kernel $K : \Omega \to \mathcal{P}(\Omega')$ which induces the Markov morphism $K_* : \mathcal{M}(\Omega) \to \mathcal{M}(\Omega')$ as in (3.6), we obtain another parametrized measure model (statistical model, respectively) $(M, \Omega', p')$ by defining $p'(\xi) := K_*(p(\xi))$. It is the purpose of this section to investigate the relation between these two models in more detail.

Theorem 5.1. Let $(M, \Omega, p)$, $K : \Omega \to \mathcal{P}(\Omega')$ and $(M, \Omega', p')$ be as above, and suppose that $(M, \Omega, p)$ is k-integrable for some $k \geq 1$. Then the following hold.

1. $(M, \Omega', p')$ is also k-integrable.

2. If $K$ is congruent w.r.t. some statistic $\kappa : \Omega' \to \Omega$, then
\[ \|\partial_V \log p(\xi)\|_k = \|\partial_V \log p'(\xi)\|_k \quad \text{for all } V \in T_\xi M. \]

3. If $K$ is induced by a statistic $\kappa : \Omega \to \Omega'$, then
\[ \big( \partial_V \log p(\xi) - \kappa^*(\partial_V \log p'(\xi)) \big) \, p(\xi)^{1/k} \in \big( C^{1-1/k}_{\kappa, p(\xi)} \big)^\perp. \]

Proof. Since $K_*$ is the restriction of a bounded linear map, it is obvious that $p' : M \to \mathcal{M}(\Omega')$ is again differentiable, and in fact,
\[ d_\xi p'(V) = K_*(d_\xi p(V)) \]
for all $V \in T_\xi M$, $\xi \in M$.

Let us now assume that $K$ is κ-congruent w.r.t. a statistic $\kappa : \Omega' \to \Omega$. Applying Proposition 3.1 (with the roles of $\Omega$ and $\Omega'$ interchanged) to $\mu := p(\xi) \in \mathcal{M}(\Omega)$ and $\mu' := K_*(\mu) = p'(\xi) \in \mathcal{M}(\Omega')$, it follows that
\[ d_\xi p'(V) = K_*(d_\xi p(V)) = K_*(\partial_V \log p(\xi) \, \mu) = \kappa^*(\partial_V \log p(\xi)) \, \mu' = \kappa^*(\partial_V \log p(\xi)) \, p'(\xi), \]
and hence, for any κ-congruent Markov morphism $K_*$, we have
\[ \partial_V \log p'(\xi) = \frac{d\{d_\xi p'(V)\}}{dp'(\xi)} = \kappa^*(\partial_V \log p(\xi)), \]


More information

Chapter 1. Measure Spaces. 1.1 Algebras and σ algebras of sets Notation and preliminaries

Chapter 1. Measure Spaces. 1.1 Algebras and σ algebras of sets Notation and preliminaries Chapter 1 Measure Spaces 1.1 Algebras and σ algebras of sets 1.1.1 Notation and preliminaries We shall denote by X a nonempty set, by P(X) the set of all parts (i.e., subsets) of X, and by the empty set.

More information

Some SDEs with distributional drift Part I : General calculus. Flandoli, Franco; Russo, Francesco; Wolf, Jochen

Some SDEs with distributional drift Part I : General calculus. Flandoli, Franco; Russo, Francesco; Wolf, Jochen Title Author(s) Some SDEs with distributional drift Part I : General calculus Flandoli, Franco; Russo, Francesco; Wolf, Jochen Citation Osaka Journal of Mathematics. 4() P.493-P.54 Issue Date 3-6 Text

More information

Solutions to Tutorial 11 (Week 12)

Solutions to Tutorial 11 (Week 12) THE UIVERSITY OF SYDEY SCHOOL OF MATHEMATICS AD STATISTICS Solutions to Tutorial 11 (Week 12) MATH3969: Measure Theory and Fourier Analysis (Advanced) Semester 2, 2017 Web Page: http://sydney.edu.au/science/maths/u/ug/sm/math3969/

More information

Exercise 1. Show that the Radon-Nikodym theorem for a finite measure implies the theorem for a σ-finite measure.

Exercise 1. Show that the Radon-Nikodym theorem for a finite measure implies the theorem for a σ-finite measure. Real Variables, Fall 2014 Problem set 8 Solution suggestions xercise 1. Show that the Radon-Nikodym theorem for a finite measure implies the theorem for a σ-finite measure. nswer: ssume that the Radon-Nikodym

More information

1.1. MEASURES AND INTEGRALS

1.1. MEASURES AND INTEGRALS CHAPTER 1: MEASURE THEORY In this chapter we define the notion of measure µ on a space, construct integrals on this space, and establish their basic properties under limits. The measure µ(e) will be defined

More information

Reminder Notes for the Course on Measures on Topological Spaces

Reminder Notes for the Course on Measures on Topological Spaces Reminder Notes for the Course on Measures on Topological Spaces T. C. Dorlas Dublin Institute for Advanced Studies School of Theoretical Physics 10 Burlington Road, Dublin 4, Ireland. Email: dorlas@stp.dias.ie

More information

Dynkin (λ-) and π-systems; monotone classes of sets, and of functions with some examples of application (mainly of a probabilistic flavor)

Dynkin (λ-) and π-systems; monotone classes of sets, and of functions with some examples of application (mainly of a probabilistic flavor) Dynkin (λ-) and π-systems; monotone classes of sets, and of functions with some examples of application (mainly of a probabilistic flavor) Matija Vidmar February 7, 2018 1 Dynkin and π-systems Some basic

More information

Folland: Real Analysis, Chapter 7 Sébastien Picard

Folland: Real Analysis, Chapter 7 Sébastien Picard Folland: Real Analysis, Chapter 7 Sébastien Picard Problem 7.2 Let µ be a Radon measure on X. a. Let N be the union of all open U X such that µ(u) =. Then N is open and µ(n) =. The complement of N is called

More information

B 1 = {B(x, r) x = (x 1, x 2 ) H, 0 < r < x 2 }. (a) Show that B = B 1 B 2 is a basis for a topology on X.

B 1 = {B(x, r) x = (x 1, x 2 ) H, 0 < r < x 2 }. (a) Show that B = B 1 B 2 is a basis for a topology on X. Math 6342/7350: Topology and Geometry Sample Preliminary Exam Questions 1. For each of the following topological spaces X i, determine whether X i and X i X i are homeomorphic. (a) X 1 = [0, 1] (b) X 2

More information

THE FUNDAMENTAL GROUP OF MANIFOLDS OF POSITIVE ISOTROPIC CURVATURE AND SURFACE GROUPS

THE FUNDAMENTAL GROUP OF MANIFOLDS OF POSITIVE ISOTROPIC CURVATURE AND SURFACE GROUPS THE FUNDAMENTAL GROUP OF MANIFOLDS OF POSITIVE ISOTROPIC CURVATURE AND SURFACE GROUPS AILANA FRASER AND JON WOLFSON Abstract. In this paper we study the topology of compact manifolds of positive isotropic

More information

If Y and Y 0 satisfy (1-2), then Y = Y 0 a.s.

If Y and Y 0 satisfy (1-2), then Y = Y 0 a.s. 20 6. CONDITIONAL EXPECTATION Having discussed at length the limit theory for sums of independent random variables we will now move on to deal with dependent random variables. An important tool in this

More information

CHAPTER V DUAL SPACES

CHAPTER V DUAL SPACES CHAPTER V DUAL SPACES DEFINITION Let (X, T ) be a (real) locally convex topological vector space. By the dual space X, or (X, T ), of X we mean the set of all continuous linear functionals on X. By the

More information

Topological vectorspaces

Topological vectorspaces (July 25, 2011) Topological vectorspaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ Natural non-fréchet spaces Topological vector spaces Quotients and linear maps More topological

More information

Loos Symmetric Cones. Jimmie Lawson Louisiana State University. July, 2018

Loos Symmetric Cones. Jimmie Lawson Louisiana State University. July, 2018 Louisiana State University July, 2018 Dedication I would like to dedicate this talk to Joachim Hilgert, whose 60th birthday we celebrate at this conference and with whom I researched and wrote a big blue

More information

Overview of normed linear spaces

Overview of normed linear spaces 20 Chapter 2 Overview of normed linear spaces Starting from this chapter, we begin examining linear spaces with at least one extra structure (topology or geometry). We assume linearity; this is a natural

More information

INTRODUCTION TO MEASURE THEORY AND LEBESGUE INTEGRATION

INTRODUCTION TO MEASURE THEORY AND LEBESGUE INTEGRATION 1 INTRODUCTION TO MEASURE THEORY AND LEBESGUE INTEGRATION Eduard EMELYANOV Ankara TURKEY 2007 2 FOREWORD This book grew out of a one-semester course for graduate students that the author have taught at

More information

An introduction to some aspects of functional analysis

An introduction to some aspects of functional analysis An introduction to some aspects of functional analysis Stephen Semmes Rice University Abstract These informal notes deal with some very basic objects in functional analysis, including norms and seminorms

More information

Problem Set 2: Solutions Math 201A: Fall 2016

Problem Set 2: Solutions Math 201A: Fall 2016 Problem Set 2: s Math 201A: Fall 2016 Problem 1. (a) Prove that a closed subset of a complete metric space is complete. (b) Prove that a closed subset of a compact metric space is compact. (c) Prove that

More information

CALCULUS ON MANIFOLDS. 1. Riemannian manifolds Recall that for any smooth manifold M, dim M = n, the union T M =

CALCULUS ON MANIFOLDS. 1. Riemannian manifolds Recall that for any smooth manifold M, dim M = n, the union T M = CALCULUS ON MANIFOLDS 1. Riemannian manifolds Recall that for any smooth manifold M, dim M = n, the union T M = a M T am, called the tangent bundle, is itself a smooth manifold, dim T M = 2n. Example 1.

More information

The Caratheodory Construction of Measures

The Caratheodory Construction of Measures Chapter 5 The Caratheodory Construction of Measures Recall how our construction of Lebesgue measure in Chapter 2 proceeded from an initial notion of the size of a very restricted class of subsets of R,

More information

Open Questions in Quantum Information Geometry

Open Questions in Quantum Information Geometry Open Questions in Quantum Information Geometry M. R. Grasselli Department of Mathematics and Statistics McMaster University Imperial College, June 15, 2005 1. Classical Parametric Information Geometry

More information

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms (February 24, 2017) 08a. Operators on Hilbert spaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/real/notes 2016-17/08a-ops

More information

Part III. 10 Topological Space Basics. Topological Spaces

Part III. 10 Topological Space Basics. Topological Spaces Part III 10 Topological Space Basics Topological Spaces Using the metric space results above as motivation we will axiomatize the notion of being an open set to more general settings. Definition 10.1.

More information

( f ^ M _ M 0 )dµ (5.1)

( f ^ M _ M 0 )dµ (5.1) 47 5. LEBESGUE INTEGRAL: GENERAL CASE Although the Lebesgue integral defined in the previous chapter is in many ways much better behaved than the Riemann integral, it shares its restriction to bounded

More information

Real Analysis Notes. Thomas Goller

Real Analysis Notes. Thomas Goller Real Analysis Notes Thomas Goller September 4, 2011 Contents 1 Abstract Measure Spaces 2 1.1 Basic Definitions........................... 2 1.2 Measurable Functions........................ 2 1.3 Integration..............................

More information

A VERY BRIEF REVIEW OF MEASURE THEORY

A VERY BRIEF REVIEW OF MEASURE THEORY A VERY BRIEF REVIEW OF MEASURE THEORY A brief philosophical discussion. Measure theory, as much as any branch of mathematics, is an area where it is important to be acquainted with the basic notions and

More information

Measure Theory on Topological Spaces. Course: Prof. Tony Dorlas 2010 Typset: Cathal Ormond

Measure Theory on Topological Spaces. Course: Prof. Tony Dorlas 2010 Typset: Cathal Ormond Measure Theory on Topological Spaces Course: Prof. Tony Dorlas 2010 Typset: Cathal Ormond May 22, 2011 Contents 1 Introduction 2 1.1 The Riemann Integral........................................ 2 1.2 Measurable..............................................

More information

at time t, in dimension d. The index i varies in a countable set I. We call configuration the family, denoted generically by Φ: U (x i (t) x j (t))

at time t, in dimension d. The index i varies in a countable set I. We call configuration the family, denoted generically by Φ: U (x i (t) x j (t)) Notations In this chapter we investigate infinite systems of interacting particles subject to Newtonian dynamics Each particle is characterized by its position an velocity x i t, v i t R d R d at time

More information

Banach Spaces V: A Closer Look at the w- and the w -Topologies

Banach Spaces V: A Closer Look at the w- and the w -Topologies BS V c Gabriel Nagy Banach Spaces V: A Closer Look at the w- and the w -Topologies Notes from the Functional Analysis Course (Fall 07 - Spring 08) In this section we discuss two important, but highly non-trivial,

More information

Probability and Measure

Probability and Measure Part II Year 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2018 84 Paper 4, Section II 26J Let (X, A) be a measurable space. Let T : X X be a measurable map, and µ a probability

More information

AN ALGEBRAIC APPROACH TO GENERALIZED MEASURES OF INFORMATION

AN ALGEBRAIC APPROACH TO GENERALIZED MEASURES OF INFORMATION AN ALGEBRAIC APPROACH TO GENERALIZED MEASURES OF INFORMATION Daniel Halpern-Leistner 6/20/08 Abstract. I propose an algebraic framework in which to study measures of information. One immediate consequence

More information

Periodic Sinks and Observable Chaos

Periodic Sinks and Observable Chaos Periodic Sinks and Observable Chaos Systems of Study: Let M = S 1 R. T a,b,l : M M is a three-parameter family of maps defined by where θ S 1, r R. θ 1 = a+θ +Lsin2πθ +r r 1 = br +blsin2πθ Outline of Contents:

More information

The small ball property in Banach spaces (quantitative results)

The small ball property in Banach spaces (quantitative results) The small ball property in Banach spaces (quantitative results) Ehrhard Behrends Abstract A metric space (M, d) is said to have the small ball property (sbp) if for every ε 0 > 0 there exists a sequence

More information

GENERALIZED COVARIATION FOR BANACH SPACE VALUED PROCESSES, ITÔ FORMULA AND APPLICATIONS

GENERALIZED COVARIATION FOR BANACH SPACE VALUED PROCESSES, ITÔ FORMULA AND APPLICATIONS Di Girolami, C. and Russo, F. Osaka J. Math. 51 (214), 729 783 GENERALIZED COVARIATION FOR BANACH SPACE VALUED PROCESSES, ITÔ FORMULA AND APPLICATIONS CRISTINA DI GIROLAMI and FRANCESCO RUSSO (Received

More information

n [ F (b j ) F (a j ) ], n j=1(a j, b j ] E (4.1)

n [ F (b j ) F (a j ) ], n j=1(a j, b j ] E (4.1) 1.4. CONSTRUCTION OF LEBESGUE-STIELTJES MEASURES In this section we shall put to use the Carathéodory-Hahn theory, in order to construct measures with certain desirable properties first on the real line

More information

6.2 Fubini s Theorem. (µ ν)(c) = f C (x) dµ(x). (6.2) Proof. Note that (X Y, A B, µ ν) must be σ-finite as well, so that.

6.2 Fubini s Theorem. (µ ν)(c) = f C (x) dµ(x). (6.2) Proof. Note that (X Y, A B, µ ν) must be σ-finite as well, so that. 6.2 Fubini s Theorem Theorem 6.2.1. (Fubini s theorem - first form) Let (, A, µ) and (, B, ν) be complete σ-finite measure spaces. Let C = A B. Then for each µ ν- measurable set C C the section x C is

More information

A CONSTRUCTION OF TRANSVERSE SUBMANIFOLDS

A CONSTRUCTION OF TRANSVERSE SUBMANIFOLDS UNIVERSITATIS IAGELLONICAE ACTA MATHEMATICA, FASCICULUS XLI 2003 A CONSTRUCTION OF TRANSVERSE SUBMANIFOLDS by J. Szenthe Abstract. In case of Riemannian manifolds isometric actions admitting submanifolds

More information

l(y j ) = 0 for all y j (1)

l(y j ) = 0 for all y j (1) Problem 1. The closed linear span of a subset {y j } of a normed vector space is defined as the intersection of all closed subspaces containing all y j and thus the smallest such subspace. 1 Show that

More information

Holomorphic line bundles

Holomorphic line bundles Chapter 2 Holomorphic line bundles In the absence of non-constant holomorphic functions X! C on a compact complex manifold, we turn to the next best thing, holomorphic sections of line bundles (i.e., rank

More information

Another Riesz Representation Theorem

Another Riesz Representation Theorem Another Riesz Representation Theorem In these notes we prove (one version of) a theorem known as the Riesz Representation Theorem. Some people also call it the Riesz Markov Theorem. It expresses positive

More information

(1) Consider the space S consisting of all continuous real-valued functions on the closed interval [0, 1]. For f, g S, define

(1) Consider the space S consisting of all continuous real-valued functions on the closed interval [0, 1]. For f, g S, define Homework, Real Analysis I, Fall, 2010. (1) Consider the space S consisting of all continuous real-valued functions on the closed interval [0, 1]. For f, g S, define ρ(f, g) = 1 0 f(x) g(x) dx. Show that

More information

Solutions to the Hamilton-Jacobi equation as Lagrangian submanifolds

Solutions to the Hamilton-Jacobi equation as Lagrangian submanifolds Solutions to the Hamilton-Jacobi equation as Lagrangian submanifolds Matias Dahl January 2004 1 Introduction In this essay we shall study the following problem: Suppose is a smooth -manifold, is a function,

More information

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University February 7, 2007 2 Contents 1 Metric Spaces 1 1.1 Basic definitions...........................

More information

Signed Measures. Chapter Basic Properties of Signed Measures. 4.2 Jordan and Hahn Decompositions

Signed Measures. Chapter Basic Properties of Signed Measures. 4.2 Jordan and Hahn Decompositions Chapter 4 Signed Measures Up until now our measures have always assumed values that were greater than or equal to 0. In this chapter we will extend our definition to allow for both positive negative values.

More information

7 The Radon-Nikodym-Theorem

7 The Radon-Nikodym-Theorem 32 CHPTER II. MESURE ND INTEGRL 7 The Radon-Nikodym-Theorem Given: a measure space (Ω,, µ). Put Z + = Z + (Ω, ). Definition 1. For f (quasi-)µ-integrable and, the integral of f over is (Note: 1 f f.) Theorem

More information

Lecture 4 Lebesgue spaces and inequalities

Lecture 4 Lebesgue spaces and inequalities Lecture 4: Lebesgue spaces and inequalities 1 of 10 Course: Theory of Probability I Term: Fall 2013 Instructor: Gordan Zitkovic Lecture 4 Lebesgue spaces and inequalities Lebesgue spaces We have seen how

More information

1 Inner Product Space

1 Inner Product Space Ch - Hilbert Space 1 4 Hilbert Space 1 Inner Product Space Let E be a complex vector space, a mapping (, ) : E E C is called an inner product on E if i) (x, x) 0 x E and (x, x) = 0 if and only if x = 0;

More information

Analysis Comprehensive Exam Questions Fall 2008

Analysis Comprehensive Exam Questions Fall 2008 Analysis Comprehensive xam Questions Fall 28. (a) Let R be measurable with finite Lebesgue measure. Suppose that {f n } n N is a bounded sequence in L 2 () and there exists a function f such that f n (x)

More information

SYMPLECTIC GEOMETRY: LECTURE 5

SYMPLECTIC GEOMETRY: LECTURE 5 SYMPLECTIC GEOMETRY: LECTURE 5 LIAT KESSLER Let (M, ω) be a connected compact symplectic manifold, T a torus, T M M a Hamiltonian action of T on M, and Φ: M t the assoaciated moment map. Theorem 0.1 (The

More information

fy (X(g)) Y (f)x(g) gy (X(f)) Y (g)x(f)) = fx(y (g)) + gx(y (f)) fy (X(g)) gy (X(f))

fy (X(g)) Y (f)x(g) gy (X(f)) Y (g)x(f)) = fx(y (g)) + gx(y (f)) fy (X(g)) gy (X(f)) 1. Basic algebra of vector fields Let V be a finite dimensional vector space over R. Recall that V = {L : V R} is defined to be the set of all linear maps to R. V is isomorphic to V, but there is no canonical

More information

s P = f(ξ n )(x i x i 1 ). i=1

s P = f(ξ n )(x i x i 1 ). i=1 Compactness and total boundedness via nets The aim of this chapter is to define the notion of a net (generalized sequence) and to characterize compactness and total boundedness by this important topological

More information

Convexity in R n. The following lemma will be needed in a while. Lemma 1 Let x E, u R n. If τ I(x, u), τ 0, define. f(x + τu) f(x). τ.

Convexity in R n. The following lemma will be needed in a while. Lemma 1 Let x E, u R n. If τ I(x, u), τ 0, define. f(x + τu) f(x). τ. Convexity in R n Let E be a convex subset of R n. A function f : E (, ] is convex iff f(tx + (1 t)y) (1 t)f(x) + tf(y) x, y E, t [0, 1]. A similar definition holds in any vector space. A topology is needed

More information