Mapping of Probabilities


Mapping of Probabilities
Theory for the Interpretation of Uncertain Physical Measurements

Albert Tarantola
Université de Paris, Institut de Physique du Globe
4, place Jussieu; Paris; France
albert.tarantola@ipgp.jussieu.fr

September 17, 2006. © A. Tarantola, 2006


To Kike & Vittorio


Preface

Note: this is an old preface, that must be replaced.

In this book, I attempt to reach two goals. The first is purely mathematical: to clarify some of the basic concepts of probability theory. The second goal is physical: to clarify the methods to be used when handling the information brought by measurements, in order to understand the accuracy of the predictions we may wish to make.

Probability theory is solidly based on the Kolmogorov axioms, and there is no problem when treating discrete probabilities. But I am very unhappy with the usual way of extending the theory to continuous probability distributions. In this text, I introduce the notion of volumetric probability, different from the more usual notion of probability density. I claim that some of the most basic problems of the theory of continuous probability distributions can only be solved within this framework, and that many of the well-known paradoxes of the theory are fundamental misunderstandings, which I try to clarify.

I start the book with an introduction to tensor calculus, because I choose to develop probability theory on metric manifolds. The second chapter deals with probability theory per se. I try to use intrinsic notions everywhere, i.e., I only introduce definitions that make sense irrespective of the particular coordinates being used on the manifold under investigation. The reader shall see that this leads to many developments that are at odds with those found in usual texts.

In physical applications one not only needs to define probability distributions over (typically) large-dimensional manifolds; one also needs to make use of them, and this is achieved by sampling the probability distributions using the Monte Carlo methods described in chapter 3. There is no major discovery exposed in this chapter, but I make the effort to set Monte Carlo methods in the intrinsic point of view mentioned above.

The metric foundation used here allows us to introduce the important notion of homogeneous probability distributions. Contrary to the noninformative probability distributions common in the Bayesian literature, the homogeneity notion is not controversial (provided one has agreed on a given metric over the space of interest).

After a brief chapter that explains what an ideal measuring instrument should be, the book enters the four chapters developing what I see as the four most basic inference problems in physics: (i) problems that are solved using the notion of sum of probabilities (just an elaborate way of making histograms), (ii) problems that are solved using the product of probabilities (an approach that seems to be original), (iii) problems that are solved using conditional probabilities (these include the so-called inverse problems), and (iv) problems that are solved using the transport of probabilities (like the typical [indirect] measurement problem, but solved here by transporting probability distributions, rather than just transporting uncertainties).

I am very indebted to my colleagues (Bartolomé Coll, Georges Jobert, Klaus Mosegaard, Miguel Bosch, Guillaume Évrard, John Scales, Christophe Barnes, Frédéric Parrenin, and Bernard Valette) for illuminating discussions. I am also grateful to my collaborators at what was the Tomography Group at the Institut de Physique du Globe de Paris.

Paris, September 17, 2006
Albert Tarantola

Contents

1 Sets
  1.1 Sets
    1.1.1 Relations
    1.1.2 Sets
    1.1.3 Basic Properties
  1.2 Sigma-Fields
    1.2.1 Cardinality of a Set
    1.2.2 Field
    1.2.3 Sigma-Field
  1.3 Mappings
    1.3.1 Some Properties
    1.3.2 Indicator of the Image of a Set
  1.4 Assimilation of Data ("Inverse Problems")
    1.4.1 Exercise

2 Manifolds
  2.1 Manifolds and Coordinates
    2.1.1 Linear Spaces
    2.1.2 Manifolds
    2.1.3 Changing Coordinates
    2.1.4 Tensors, Capacities, and Densities
    2.1.5 Kronecker Tensors (I)
    2.1.6 Orientation of a Coordinate System
    2.1.7 Totally Antisymmetric Tensors
    2.1.8 Levi-Civita Capacity and Density
    2.1.9 Determinants
    2.1.10 Dual Tensors and Exterior Product of Vectors
    2.1.11 Capacity Element
    2.1.12 Integral
    2.1.13 Capacity Element and Change of Coordinates
  2.2 Volume
    2.2.1 Metric
    2.2.2 Bijection Between Forms and Vectors
    2.2.3 Kronecker Tensor (II)
    2.2.4 Fundamental Density
    2.2.5 Bijection Between Capacities, Tensors, and Densities
    2.2.6 Levi-Civita Tensor
    2.2.7 Volume Element
    2.2.8 Volume Element and Change of Variables
    2.2.9 Volume of a Domain
    2.2.10 Example: Mass Density and Volumetric Mass
  2.3 Mappings
    2.3.1 Image of the Volume Element
    2.3.2 Reciprocal Image of the Volume Element

3 Probabilities
  3.1 Introduction
    3.1.1 Preliminary Remarks
    3.1.2 What we Shall Do in this Chapter
    3.1.3 A Collection of Formulas; I: Discrete Probabilities
    3.1.4 A Collection of Formulas; II: Probabilities over Manifolds
  3.2 Probabilities
    3.2.1 Basic Definitions
    3.2.2 Image of a Probability
    3.2.3 Intersection of Probabilities
    3.2.4 Reciprocal Image of a Probability
    3.2.5 Compatibility Property
    3.2.6 Other Properties
    3.2.7 Marginal Probability
  3.3 Probabilities over Manifolds
    3.3.1 Probability Density
    3.3.2 Image of a Probability Density
    3.3.3 Marginal Probability Density
  3.4 Probabilities over Volume Manifolds
    3.4.1 Volumetric Probability
    3.4.2 Volumetric Histograms and Density Histograms
    3.4.3 Homogeneous Probability Function
    3.4.4 Image of a Volumetric Probability
    3.4.5 Intersection of Volumetric Probabilities
    3.4.6 Reciprocal Image of a Volumetric Probability
    3.4.7 Compatibility Property
    3.4.8 Assimilation of Uncertain Observations (the Bayes-Popper Approach)
    3.4.9 Popper-Bayes Algorithm
    3.4.10 Exercise
  3.5 Probabilities over Metric Manifolds
    3.5.1 Conditional Probability
    3.5.2 Marginal of a Conditional Probability
    3.5.3 Demonstration: Marginals of the Conditional
    3.5.4 Bayes Theorem

4 Examples
  4.1 Homogeneous Probability for Elastic Parameters
  4.2 Measuring a One-Dimensional Strain (I)
  4.3 Measuring a One-Dimensional Strain (II)
  4.4 Free-Fall of an Object
  4.5 Measure of Poisson's Ratio
  4.6 Mass Calibration
  4.7 Probabilistic Estimation of Earthquake Locations

5 Appendices
  5.1 Appendices for Set Theory
    5.1.1 Proof of the Set Property ϕ[ A ∩ ϕ⁻¹[B] ] = ϕ[A] ∩ B
  5.2 Appendices for Manifolds
    5.2.1 Capacity Element and Change of Coordinates
    5.2.2 Conditional Volume
  5.3 Appendices for Probability Theory
    5.3.1 Image of a Probability Density
    5.3.2 Proof of the Compatibility Property (Discrete Sets)
    5.3.3 Proof of the Compatibility Property (Manifolds)
    5.3.4 Axioms for the Union and the Intersection
    5.3.5 Union of Probabilities
    5.3.6 Conditional Volumetric Probability (I)
    5.3.7 Conditional Volumetric Probability (II)
    5.3.8 Marginal Probability Density
    5.3.9 Replacement Gymnastics
    5.3.10 The Borel Paradox
    5.3.11 Sampling a Probability
    5.3.12 Random Points on the Surface of the Sphere
    5.3.13 Basic Probability Distributions
    5.3.14 Fisher from Gaussian (Demonstration)
    5.3.15 Probability Distributions for Tensors
    5.3.16 Homogeneous Distribution of Second Rank Tensors
    5.3.17 Center of a Probability Distribution
    5.3.18 Dispersion of a Probability Distribution
    5.3.19 Monte Carlo (Sampling) Methods
  5.4 Appendices for Inverse Problems
    5.4.1 Inverse Problems
    5.4.2 Solution in the Observable Parameter Space
    5.4.3 Implementation of Inverse Problems
  5.5 Other Appendices
    5.5.1 Determinant of a Partitioned Matrix
    5.5.2 Operational Definitions Cannot Be Infinitely Accurate
    5.5.3 The Ideal Output of a Measuring Instrument
    5.5.4 Measurements
    5.5.5 The Shipwrecked Person Problem
    5.5.6 Problems Solved Using Conditional Probabilities
    5.5.7 Parameters

Bibliography

Index

Chapter 1
Sets

This is the introduction to the chapter.

1.1 Sets

1.1.1 Relations

The elements of a mathematical theory may have some properties, and may have some mutual relations. The elements are denoted by symbols (like x or 2), and the relations are denoted by inserting other symbols between the elements (like = or ≠). The element denoted by a symbol may be variable or may be determined. Given an initial set of (non-contradictory) relations, other relations may be demonstrated to be true or false. A relation (or property) containing variable elements is an identity if it becomes true for any determined value given to the variables.

If R and S are two relations containing variable elements, one says that R implies S, and one writes R ⇒ S, if S is true every time that R is true. If R ⇒ S and S ⇒ R, then one writes R ⇔ S, and one says that R and S are equivalent (or that R is true if, and only if, S is true). The relation ¬R, the negation of R, is true if R is false. Therefore, one has

    ¬(¬R) ⇔ R .    (1.1)

From R ⇒ S it follows ¬S ⇒ ¬R:

    ( R ⇒ S ) ⇔ ( (¬S) ⇒ (¬R) )    (1.2)

(but it does not follow that (¬R) ⇒ (¬S)). If R and S are two relations, then R OR S is also a relation, that is true if at least one of the two relations {R, S} is true. Similarly, the relation R AND S is true only if the two relations {R, S} are both true. Therefore, for any two relations {R, S}, the relation R OR S is false only if both R and S are false,

    ¬( R OR S ) ⇔ (¬R) AND (¬S) ,    (1.3)

and R AND S is false if either of R, S is false,

    ¬( R AND S ) ⇔ (¬R) OR (¬S) .    (1.4)

In theories where relations like a = b and a ∈ A make sense, the relation ¬(a = b) is written a ≠ b, while the relation ¬(a ∈ A) is written a ∉ A.
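Since these equivalences involve only finitely many truth assignments, they can be checked mechanically. Here is a minimal Python sketch of my own (not part of the book) that verifies 1.1–1.4 by enumeration:

```python
from itertools import product

# Check a two-variable logical equivalence over all truth assignments.
def equivalent(lhs, rhs):
    return all(lhs(r, s) == rhs(r, s) for r, s in product([True, False], repeat=2))

implies = lambda p, q: (not p) or q

# (1.1): not(not R) <=> R
print(all((not (not r)) == r for r in [True, False]))                 # True

# (1.2): (R => S) <=> ((not S) => (not R))
print(equivalent(lambda r, s: implies(r, s),
                 lambda r, s: implies(not s, not r)))                 # True

# (1.3): not(R OR S) <=> (not R) AND (not S)
print(equivalent(lambda r, s: not (r or s),
                 lambda r, s: (not r) and (not s)))                   # True

# (1.4): not(R AND S) <=> (not R) OR (not S)
print(equivalent(lambda r, s: not (r and s),
                 lambda r, s: (not r) or (not s)))                    # True
```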

1.1.2 Sets

A set is a well-defined collection of (abstract) elements. An element belongs to a set, or is a member of a set. If an element a is a member of a set A one writes a ∈ A (or a ∉ A if a is not a member of A). If a and b are elements of a given set, they may be different elements (a ≠ b), or they may, in fact, be the same element (a = b). Two sets A and B are equal if they have the same elements, and one writes A = B (or A ≠ B if they are different). If a set A consists of the elements a, b, …, one writes A = {a, b, …}. The empty set, the set without any element, is denoted ∅. The set of all the subsets of a set A is called the power set of A, and is denoted ℘(A). The Cartesian product of two sets A and B, denoted A × B, is the set whose elements are all the ordered pairs (a, b), with a ∈ A and b ∈ B.

Given two sets A and B, one says that A is a subset of B if every member of A is also a member of B. One then writes A ⊆ B, and one also says that A is contained in B. In particular, any set A is a subset of itself, A ⊆ A. If A ⊆ B but A ≠ B, one says that A is a proper subset of B, and one writes A ⊂ B.

The union of two sets A₁ and A₂, denoted A₁ ∪ A₂, is the set consisting of all elements that belong to A₁, or to A₂, or to both:

    a ∈ A₁ ∪ A₂  ⇔  a ∈ A₁ OR a ∈ A₂ .    (1.5)

The intersection of two sets A₁ and A₂, denoted A₁ ∩ A₂, is the set consisting of all elements that belong to both A₁ and A₂:

    a ∈ A₁ ∩ A₂  ⇔  a ∈ A₁ AND a ∈ A₂ .    (1.6)

If A₁ ∩ A₂ = ∅, one says that A₁ and A₂ are disjoint. Given some reference set A₀, to any subset A of A₀ one associates its complement, that is, the set of all the elements of A₀ that are not members of A. The complement of a set A (with respect to some reference set) is denoted A^c.

Given some reference set A₀, the indicator function¹ of a set A ⊆ A₀ is the function that to every element a ∈ A₀ associates the number one, if a ∈ A, or the number zero, if a ∉ A (see figure 1.1). This function may be denoted by a symbol like χ_A or ξ_A. For instance, using the former,

    χ_A(a) = { 1 if a ∈ A ;  0 if a ∉ A } .    (1.7)

The union and intersection of sets can be expressed in terms of indicator functions as

    χ_{A₁∪A₂} = max{ χ_{A₁} , χ_{A₂} } = χ_{A₁} + χ_{A₂} − χ_{A₁} χ_{A₂}
    χ_{A₁∩A₂} = min{ χ_{A₁} , χ_{A₂} } = χ_{A₁} χ_{A₂} .    (1.8)

¹ The indicator function is sometimes called the characteristic function, but that name has another sense in probability theory.

Figure 1.1: The indicator of a subset A (of a given set A₀) is the function that takes the value one for every element of the subset, and the value zero for the elements outside the subset.

As two subsets are equal if their indicators are equal, the properties of the two operations ∪ and ∩ (like those in equations 1.9–1.12) can be demonstrated using the numerical relations 1.8.

A partition of a set A is a set P of subsets of A such that the union of all the subsets equals A (the elements of P cover A) and the intersection of any two subsets is empty (the elements are pairwise disjoint). The elements of P are called the blocks of the partition.
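The numerical relations 1.8 are easy to check on a small example. The following Python sketch (my own illustration; the sets are arbitrary toy data) verifies both forms of each relation:

```python
A0 = set(range(10))                # reference set
A1 = {1, 2, 3, 4}
A2 = {3, 4, 5, 6}

chi = lambda A: (lambda a: 1 if a in A else 0)    # indicator of a subset

# Equations (1.8): indicator of the union is the max of the indicators,
# indicator of the intersection is their min (equivalently their product).
for a in A0:
    assert chi(A1 | A2)(a) == max(chi(A1)(a), chi(A2)(a))
    assert chi(A1 | A2)(a) == chi(A1)(a) + chi(A2)(a) - chi(A1)(a) * chi(A2)(a)
    assert chi(A1 & A2)(a) == min(chi(A1)(a), chi(A2)(a))
    assert chi(A1 & A2)(a) == chi(A1)(a) * chi(A2)(a)
print("equations 1.8 verified")
```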

1.1.3 Basic Properties

Let A₁, A₂, and A₃ be arbitrary subsets of some reference set A₀. From A₁ ⊆ A₂ and A₂ ⊆ A₃ it follows A₁ ⊆ A₃, while from A₁ ⊆ A₂ and A₂ ⊆ A₁ it follows A₁ = A₂. Among the many other valid properties, let us remark the De Morgan laws

    (A ∪ B)^c = A^c ∩ B^c ;  (A ∩ B)^c = A^c ∪ B^c ,    (1.9)

the commutativity relations

    A₁ ∪ A₂ = A₂ ∪ A₁ ;  A₁ ∩ A₂ = A₂ ∩ A₁ ,    (1.10)

the associativity relations

    A₁ ∪ ( A₂ ∪ A₃ ) = ( A₁ ∪ A₂ ) ∪ A₃
    A₁ ∩ ( A₂ ∩ A₃ ) = ( A₁ ∩ A₂ ) ∩ A₃ ,    (1.11)

and the distributivity relations

    A₁ ∪ ( A₂ ∩ A₃ ) = ( A₁ ∪ A₂ ) ∩ ( A₁ ∪ A₃ )
    A₁ ∩ ( A₂ ∪ A₃ ) = ( A₁ ∩ A₂ ) ∪ ( A₁ ∩ A₃ ) .    (1.12)
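As a quick sanity check (a sketch of mine, with arbitrary toy sets), Python's built-in set operations verify these identities directly:

```python
A0 = set(range(12))                          # reference set
A1, A2, A3 = {0, 1, 2, 3}, {2, 3, 4, 5}, {3, 5, 7, 9}
comp = lambda A: A0 - A                      # complement with respect to A0

# De Morgan laws (1.9)
assert comp(A1 | A2) == comp(A1) & comp(A2)
assert comp(A1 & A2) == comp(A1) | comp(A2)

# distributivity (1.12)
assert A1 | (A2 & A3) == (A1 | A2) & (A1 | A3)
assert A1 & (A2 | A3) == (A1 & A2) | (A1 & A3)
print("equations 1.9 and 1.12 verified")
```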

1.2 Sigma-Fields

1.2.1 Cardinality of a Set

When a set A has a finite number of elements, its cardinality, denoted |A|, is the number of elements in the set. Two sets with an infinite number of elements have the same cardinality if the elements of the two sets can be put in a one-to-one correspondence (through a bijection). The sets that can be put in correspondence with the set ℕ of natural numbers are called countable (or enumerable). The (infinite) cardinality of ℕ is denoted |ℕ| = ℵ₀ (aleph-zero), so if a set A is countable, its cardinality is |A| = ℵ₀. Cantor (1884) proved that the set ℝ of real numbers is not countable (the real numbers form a continuous set). The (infinite) cardinality of ℝ is denoted |ℝ| = ℵ₁ (aleph-one). Any set A that can be put in correspondence with the set of real numbers has, therefore, the cardinality |A| = ℵ₁. One can give a clear sense to the relation ℵ₁ > ℵ₀, but we don't need those details in this book.

1.2.2 Field

The power set of a set Ω, denoted ℘(Ω), has been defined as the set of all possible subsets of Ω. If the set Ω has a finite or a countably infinite number of elements, we can build probability theory on ℘(Ω), and we can then talk about the probability of any subset of Ω. Things are slightly more complicated when the set Ω has an uncountable number of elements. As in most of our applications the set Ω is a (finite-dimensional) manifold, these complications matter. The difficulty is that one can consider subsets of a manifold whose shape is so complicated that it is not possible to assign them a measure (be it a "volume" or a "probability"). Then, when dealing with a set Ω with an uncountable number of elements, one needs to consider only subsets of Ω with shapes that are simple enough. This is why we need to review here some of the mathematical concepts necessary for the development of probability theory, notably the notions of field, of sigma-field, and of Borel field.

Definition 1.1 Consider an arbitrary set Ω, and the set ℘(Ω) of all the subsets of Ω. A set F of subsets of Ω, i.e., a set F ⊆ ℘(Ω), is called a field if

— the empty set belongs to F,

    ∅ ∈ F ,    (1.13)

— F is closed under complementation,

    A ∈ F ⇒ A^c ∈ F ,    (1.14)

— F is closed under the union of two sets,

    A ∈ F AND B ∈ F ⇒ A ∪ B ∈ F .    (1.15)

Property 1.1 It immediately follows from this definition that

— the whole set Ω also belongs to F,

    Ω ∈ F ,    (1.16)

— F is closed under the intersection of two sets,

    A ∈ F AND B ∈ F ⇒ A ∩ B ∈ F .    (1.17)

It also follows that F is closed under finite union and finite intersection of sets, i.e., for any finite n,

    A₁ ∪ A₂ ∪ ⋯ ∪ Aₙ ∈ F ;  A₁ ∩ A₂ ∩ ⋯ ∩ Aₙ ∈ F .    (1.18)

Example 1.1 Ω being an arbitrary set, F = {∅, Ω} is a field (called the trivial field).

Example 1.2 Ω being an arbitrary set, F = ℘(Ω) is a field.

Definition 1.2 Let C be a collection of subsets of Ω. The minimal field containing C, denoted F(C), is the smallest field containing C. (See figure 1.2 for an example.)

Figure 1.2: Let Ω be an interval [a, b) of the real line (suggested at the top), and let C be the collection of the two intervals [a₁, b₁) and [a₂, b₂) suggested in the middle. The minimal field containing C is the collection of intervals suggested at the bottom.
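For a finite Ω, the minimal field F(C) can be computed by brute-force closure under complementation and pairwise union. The following Python sketch (the function name and the toy example are mine, not the book's) does exactly that:

```python
from itertools import combinations

def minimal_field(omega, collection):
    """Close a collection of subsets of a finite omega under complementation
    and (finite) union: the minimal field F(C) of Definition 1.2."""
    field = {frozenset(), frozenset(omega)} | {frozenset(c) for c in collection}
    while True:
        new = {frozenset(omega) - a for a in field}            # complements
        new |= {a | b for a, b in combinations(field, 2)}      # pairwise unions
        if new <= field:                                       # fixed point reached
            return field
        field |= new

omega = {1, 2, 3, 4}
print(sorted(map(sorted, minimal_field(omega, [{1}, {2}]))))
# [[], [1], [1, 2], [1, 2, 3, 4], [1, 3, 4], [2], [2, 3, 4], [3, 4]]
```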

1.2.3 Sigma-Field

Definition 1.3 Consider an arbitrary set Ω, and the set ℘(Ω) of all the subsets of Ω. A set F of subsets of Ω, i.e., a set F ⊆ ℘(Ω), is called a sigma-field (σ-field) or sigma-algebra (σ-algebra) if

— the empty set belongs to F,

    ∅ ∈ F ,    (1.19)

— F is closed under complementation,

    A ∈ F ⇒ A^c ∈ F ,    (1.20)

— F is closed under any countable union of sets,

    Aᵢ ∈ F ⇒ ∪_{i=1}^{∞} Aᵢ ∈ F .    (1.21)

For short, one says that a σ-field is a collection of subsets that is closed under countable unions. It is easy to demonstrate that the same holds for the intersection, so one can say that a σ-field is a collection of subsets that is closed under countable unions and intersections. The elements of a σ-field are called measurable sets. If a field has a finite number of elements, it is also a σ-field. A σ-field is always a field.

Example 1.3 The fields in examples 1.1 and 1.2 are σ-fields.

Example 1.4 Ω being an arbitrary set, consider a finite partition (resp. a countably infinite partition) of Ω, i.e., sets C₁, C₂, …, whose total union is Ω, and that do not overlap:

    ∪ᵢ Cᵢ = Ω ;  i ≠ j ⇒ Cᵢ ∩ Cⱼ = ∅ .    (1.22)

The set F consisting of the empty set plus all finite (resp. countable) unions of the sets Cᵢ is a σ-field.

Definition 1.4 Let C be a collection of subsets of Ω. The minimal σ-field containing C, denoted σ(C), is the smallest σ-field containing C. If C is finite, then σ(C) = F(C).

If Ω is non-denumerable and we use F = ℘(Ω), we get into trouble defining our probability measure, because ℘(Ω) is too huge, i.e., there will often be sets to which it is impossible to assign a unique measure, giving rise to some difficulties². So we have to use a smaller σ-field (e.g., the Borel field of Ω, which is the smallest σ-field that makes all open sets measurable).

² Like the Banach-Tarski paradox.
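Example 1.4 is easy to make concrete for a finite partition: the σ-field is just the set of all unions of blocks. A small Python sketch of mine (the function name is an assumption, not the book's notation):

```python
from itertools import chain, combinations

def sigma_field_from_partition(blocks):
    """Example 1.4: for a finite partition, the sigma-field consists of
    all unions of blocks (the empty union giving the empty set)."""
    blocks = [frozenset(b) for b in blocks]
    subsets = chain.from_iterable(
        combinations(blocks, r) for r in range(len(blocks) + 1))
    return {frozenset().union(*s) for s in subsets}

# A partition of {1, 2, 3, 4} into three blocks yields 2**3 = 8 measurable sets.
print(len(sigma_field_from_partition([{1}, {2}, {3, 4}])))   # 8
```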

Definition 1.5 Borel field. Let Ω = ℝ, and let C be the set of all intervals of the real line of the form [r₁, r₂). The Borel field, denoted B, is the set σ(C), i.e., the minimal σ-field containing C. The sets of the Borel field are called Borel sets.

The Borel field contains all countable sets of numbers, all open, semi-open, and closed intervals, and all the sets that can be obtained by countably many set operations. Although it contains a large collection of subsets of the real line, it is smaller than ℘(ℝ), the power set of ℝ, and it is possible (but nontrivial) to define subsets of the real line that are not Borel sets. These sets do not matter for the problems examined in this book.

Example 1.5 Consider the n-dimensional Euclidean manifold, Ω = ℝⁿ. Starting from the open or the closed … the Borel field is …

1.3 Mappings

Consider a mapping (or function) ϕ from a set A₀ into a set B₀. We write

    a ↦ b = ϕ(a) ,    (1.23)

and we say that a is the argument of the mapping, and that b is the image of a under the mapping ϕ. Given A ⊆ A₀, the set B ⊆ B₀ of all the points that are the images of the points in A is called the image of A under the mapping ϕ (see figure 1.3), and one writes

    A ↦ B = ϕ[A] .    (1.24)

Note that, while we write ϕ( · ) for the function mapping an element into an element, we write ϕ[ · ] for the function mapping a set into a set. Reciprocally, given B ⊆ B₀, the set A ⊆ A₀ of all the points a ∈ A₀ such that ϕ(a) ∈ B is called the reciprocal image (or preimage) of B (see figure 1.3), and one writes

    A = ϕ⁻¹[B] .    (1.25)

The mapping ϕ⁻¹[ · ] is called the reciprocal extension of ϕ( · ). Of course, the notation ϕ⁻¹ doesn't imply that the point-to-point mapping x ↦ ϕ(x) is invertible (in general, it is not). Note that there may exist sets B ⊆ B₀ for which ϕ⁻¹[B] = ∅.

Figure 1.3: Left: a mapping from a set A₀ into a set B₀. Center: the image ϕ[A] of a set A ⊆ A₀. Right: the reciprocal image ϕ⁻¹[B] of a set B ⊆ B₀.

A mapping ϕ from a set A into a set B is called surjective if for every b ∈ B there is at least one a ∈ A such that ϕ(a) = b (see figure 1.4). One also says that ϕ is a surjection, or that it maps A onto B. A mapping ϕ such that ϕ(a₁) = ϕ(a₂) ⇒ a₁ = a₂ is called injective (or a one-to-one mapping, or an injection). A mapping that is both injective and surjective is called bijective (or a bijection). It is then invertible: for every b ∈ B there is one, and only one, a ∈ A such that ϕ(a) = b, that one denotes a = ϕ⁻¹(b), and calls the inverse image of b. If ϕ is a bijection, then the reciprocal image of a set, as introduced by equation 1.25, equals the inverse image of the set.
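For finite sets, the image and the reciprocal image are one-liners. A small Python sketch of mine (function names are my own choices), used again in the checks below:

```python
def image(phi, A):
    """phi[A]: the set of images of the points of A (equation 1.24)."""
    return {phi(a) for a in A}

def preimage(phi, B, A0):
    """phi^{-1}[B]: all points of A0 whose image falls in B (equation 1.25)."""
    return {a for a in A0 if phi(a) in B}

A0 = set(range(-5, 6))
phi = lambda a: a * a                      # not injective on A0
print(image(phi, {-2, 1, 3}))              # {1, 4, 9}
print(preimage(phi, {4, 9}, A0))           # {-3, -2, 2, 3}
```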

Figure 1.4: The mapping at the top is not surjective (because there is one element in B that has no preimage in A), and is not injective (because two elements in A have the same image in B). At the bottom, examples of a surjection, an injection, and a bijection.

1.3.1 Some Properties

In what follows, let us always denote by ϕ a mapping from a set A₀ into a set B₀. The following properties are well known (see Bourbaki [1970] for a demonstration). For any A ⊆ A₀, one has

    A ⊆ ϕ⁻¹[ ϕ[A] ] ,    (1.26)

and one has A = ϕ⁻¹[ ϕ[A] ] if the mapping is injective. For any B ⊆ B₀, one has

    ϕ[ ϕ⁻¹[B] ] ⊆ B ,    (1.27)

and one has ϕ[ ϕ⁻¹[B] ] = B if the mapping is surjective. For any two subsets of B₀,

    ϕ⁻¹[ B₁ ∪ B₂ ] = ϕ⁻¹[B₁] ∪ ϕ⁻¹[B₂]
    ϕ⁻¹[ B₁ ∩ B₂ ] = ϕ⁻¹[B₁] ∩ ϕ⁻¹[B₂] .    (1.28)

For any two subsets of A₀,

    ϕ[ A₁ ∪ A₂ ] = ϕ[A₁] ∪ ϕ[A₂]
    ϕ[ A₁ ∩ A₂ ] ⊆ ϕ[A₁] ∩ ϕ[A₂] ,    (1.29)

and one has ϕ[ A₁ ∩ A₂ ] = ϕ[A₁] ∩ ϕ[A₂] if ϕ is injective (see figure 1.5).

Figure 1.5: In general, ϕ[ A₁ ∩ A₂ ] ⊆ ϕ[A₁] ∩ ϕ[A₂], with possibly strict inclusion, unless ϕ is injective, in which case the two sets are equal.

I now introduce a relation that is similar to the second relation in 1.29, but with an equality symbol (this relation shall play a major role when we become interested in problems of data interpretation). For any A ⊆ A₀ and any B ⊆ B₀, one has

    ϕ[ A ∩ ϕ⁻¹[B] ] = ϕ[A] ∩ B .    (1.30)
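These properties can be verified numerically for any concrete (even non-injective) map. A self-contained Python sketch of mine, with arbitrary toy sets:

```python
image = lambda phi, A: {phi(a) for a in A}
preimage = lambda phi, B, A0: {a for a in A0 if phi(a) in B}

A0 = set(range(-4, 5))
phi = lambda a: abs(a)                 # deliberately non-injective
A, A1, A2 = {-2, -1, 3}, {-2, 1}, {2, 3}
B = {1, 2}

assert A <= preimage(phi, image(phi, A), A0)                       # (1.26)
assert image(phi, preimage(phi, B, A0)) <= B                       # (1.27)
assert image(phi, A1 & A2) <= image(phi, A1) & image(phi, A2)      # (1.29)
assert image(phi, A & preimage(phi, B, A0)) == image(phi, A) & B   # (1.30)
print("properties 1.26, 1.27, 1.29, 1.30 verified")
```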

Figure 1.6: One always has ϕ[ A ∩ ϕ⁻¹[B] ] = ϕ[A] ∩ B. Note: this figure should perhaps replace figure 1.7.

See appendix 5.1.1 for the proof of this property, and figure 1.7 for a graphical illustration.

Figure 1.7: Geometrical construction illustrating the property ϕ[ A ∩ ϕ⁻¹[B] ] = ϕ[A] ∩ B when the sets are one-dimensional manifolds.

1.3.2 Indicator of the Image of a Set

In a later chapter of this book we shall be interested in the notions of image and of reciprocal image of a probability. Because a probability can be seen as a generalization of an indicator, we can prepare our future work by examining here the image and the reciprocal image of a set in terms of indicators.

Note: I should, perhaps, use the notation abuse

    ϕ⁻¹(b) ≡ ϕ⁻¹[{b}] .    (1.31)

Note: explain that I introduce the definition

    χ_{A₁}[A₂] = { 0 if A₂ = ∅ ;  max_{a ∈ A₂} χ_{A₁}(a) if A₂ ≠ ∅ } .    (1.32)

1.3.2.1 Indicator of the image. Let ϕ be a mapping from a set A₀ into a set B₀. For any A ⊆ A₀ we have introduced the image ϕ[A] ⊆ B₀, and, by definition of indicator,

    ξ_{ϕ[A]}(b) = { 1 if b ∈ ϕ[A] ;  0 if b ∉ ϕ[A] } .    (1.33)

How can we express the indicator ξ_{ϕ[A]} in terms of the indicator χ_A? Should the mapping ϕ be a bijection, then, of course,

    ξ_{ϕ[A]}(b) = χ_A( ϕ⁻¹(b) ) ,    (1.34)

but let us try to be general. If ϕ is an arbitrary mapping, then, for a given point b ∈ B₀, there is a set (perhaps empty) of points a ∈ A₀ that map into b, i.e., such that ϕ(a) = b. This is the set ϕ⁻¹[{b}]. Then,

    ξ_{ϕ[A]}(b) = { 0 if ϕ⁻¹[{b}] = ∅ ;  max_{a ∈ ϕ⁻¹[{b}]} χ_A(a) if ϕ⁻¹[{b}] ≠ ∅ } ,    (1.35)

i.e., using the definition 1.32, for any b ∈ B₀,

    ξ_{ϕ[A]}(b) = χ_A( ϕ⁻¹[{b}] ) .    (1.36)

The apparent simplicity of this expression should not mislead us: the evaluation of ξ_{ϕ[A]}(b) requires (for each b ∈ B₀) the identification of all the elements in ϕ⁻¹[{b}], which may not be easy if the mapping ϕ is complicated.
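Equation 1.35 translates directly into code for finite sets. A small self-contained Python sketch of mine:

```python
def xi_image(phi, A, A0):
    """Indicator of phi[A], via equation (1.35): for each b, take the max of
    chi_A over the preimage phi^{-1}[{b}] (and 0 if that preimage is empty)."""
    def xi(b):
        pre = {a for a in A0 if phi(a) == b}
        return max((1 if a in A else 0 for a in pre), default=0)
    return xi

A0 = set(range(-3, 4))
A = {-2, 1}
phi = lambda a: a * a
print([xi_image(phi, A, A0)(b) for b in [0, 1, 4, 9]])   # [0, 1, 1, 0]
```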

1.3.2.2 Indicator of the reciprocal image. Let ϕ be a mapping from a set A₀ into a set B₀. For any B ⊆ B₀, we can express the indicator of the set ϕ⁻¹[B] ⊆ A₀ in terms of the indicator of the set B ⊆ B₀: it is the function χ_{ϕ⁻¹[B]} that to any a ∈ A₀ associates the number ξ_B( ϕ(a) ):

    for any a ∈ A₀ ,  χ_{ϕ⁻¹[B]}(a) = ξ_B( ϕ(a) ) .    (1.37)

Using the simplified notation for the indicators: for any a ∈ A₀,

    (ϕ⁻¹[B])(a) = B( ϕ(a) ) .    (1.38)

1.4 Assimilation of Data ("Inverse Problems")

We need to become familiar with properties like

    a ∈ A ⇒ ϕ(a) ∈ ϕ[A]  (while ϕ(a) ∈ ϕ[A] does not imply a ∈ A)    (1.39)

and

    ϕ(a) ∈ B ⇔ a ∈ ϕ⁻¹[B] .    (1.40)

We also need to remember here that the definition of the intersection of sets (equation 1.6) can be written

    a ∈ A₁ AND a ∈ A₂ ⇔ a ∈ A₁ ∩ A₂ .    (1.41)

Consider an arbitrary mapping ϕ from a set A₀ into a set B₀. For any A ⊆ A₀ and any B ⊆ B₀, one has

    { a ∈ A AND b ∈ B AND b = ϕ(a) }
      ⇔ { a ∈ A ∩ ϕ⁻¹[B] AND b ∈ ϕ[A] ∩ B AND b = ϕ(a) } .    (1.42)

While the implication ⇐ is trivial, the implication ⇒ is an actual restriction of the domains. We do not need a formal proof: a look at the bottom panel of figure 1.8 will show that 1.42 is, indeed, a general property. What is less obvious is the relation

    B' = ϕ[A'] ,    (1.43)

with A' = A ∩ ϕ⁻¹[B] and B' = ϕ[A] ∩ B, that follows from the property ϕ[ A ∩ ϕ⁻¹[B] ] = ϕ[A] ∩ B (equation 1.30, demonstrated in appendix 5.1.1).

Remark that we are inside the paradigm typical of an "inverse problem": the mapping a ↦ b = ϕ(a) can be seen as the typical mapping between the model parameter space and the observable parameter space. Working with sets instead of working with volumetric probabilities corresponds to the interval estimation philosophy that some authors (e.g., Stark [1992, 1997]) prefer. Under this set-theory setting, we can be precise about the posterior domain for a and also about the posterior domain for b = ϕ(a).

In figure 1.8, the typical problem of data assimilation is examined (this is sometimes called an "inverse problem"). One starts with a set A₀ (the set of all conceivable models) and a set B₀ (the set of all conceivable observations). Any model a must satisfy a ∈ A₀ and any observation b must satisfy b ∈ B₀ (top row of the figure). One then may have the a priori information a ∈ A ⊆ A₀ and the (imprecise) observation b ∈ B ⊆ B₀ (second row of the figure). In the absence of any relation between a and b, this is all one has. But if some physical theory is able to provide a relation b = ϕ(a) (prediction of the observation, given a model), then one has { a ∈ A AND b ∈ B AND b = ϕ(a) } (third row of the figure). And, as explained above, this is equivalent to { a ∈ A ∩ ϕ⁻¹[B] AND b ∈ ϕ[A] ∩ B AND b = ϕ(a) } (bottom row of the figure).

The use of the (imprecise) observation b ∈ B has allowed us to pass (via the relation b = ϕ(a)) from the a priori information a ∈ A to the a posteriori information a ∈ A' ≡ A ∩ ϕ⁻¹[B] ⊆ A. This also increases the available information on b, as one passes from the initial b ∈ B to the final b ∈ B' ≡ ϕ[A] ∩ B ⊆ B. And, because of the identity expressed in equation 1.30, one has the consistency relation B' = ϕ[A'].

In chapter 3 of this book this theory is generalized so as to work with probability distributions instead of sets.

Figure 1.8: A typical problem of data assimilation (see text for details).

1.4.1 Exercise

This exercise is all explained in the caption of figure 1.9. The exercise is the same as that in section 3.4.10, except that while here we use sets, in section 3.4.10 we use volumetric probabilities. The initial intervals here (initial information on x and on y) correspond to the two-sigma intervals of the (truncated) Gaussians of section 3.4.10, so the results are directly comparable.

Figure 1.9: Two quantities x and y have some definite values x_true and y_true, that we try to uncover. For the time being, we have the following information on these two quantities: x_true ∈ f = [1, 5], and y_true ∈ p = [14, 22] (the black intervals in the figure). We then learn that, in fact, y_true = ϕ(x_true), with the function x ↦ y = ϕ(x) = x²(x − 3)³ represented above. To deduce the posterior interval containing x_true, we can first introduce the reciprocal image ϕ⁻¹(p) of the interval p (blue interval at the bottom), then define the intersection g = f ∩ ϕ⁻¹(p) (red interval at the bottom). To obtain the posterior interval containing y_true, we can just evaluate the image of the interval g by the mapping: q = ϕ(g) = ϕ( f ∩ ϕ⁻¹(p) ). We could have obtained the interval q following a different route: we could first have evaluated the image of f, to obtain the interval ϕ( f ) (blue interval at the left). The intersection of the interval p with the interval ϕ( f ) then gives the same interval q, because of the property ϕ( f ∩ ϕ⁻¹(p) ) = ϕ( f ) ∩ p.
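For readers who want to reproduce the construction numerically, here is a small Python sketch of mine. It assumes the reading ϕ(x) = x²(x − 3)³ of the caption, and stands in for the sets with a fine grid:

```python
import numpy as np

phi = lambda x: x**2 * (x - 3)**3
x = np.linspace(1.0, 5.0, 400_001)             # the prior interval f = [1, 5]
in_p = (phi(x) >= 14.0) & (phi(x) <= 22.0)     # points of f inside phi^{-1}[p]

g = x[in_p]                                    # g = f  ∩ phi^{-1}[p]
print(g.min(), g.max())                        # posterior for x, roughly [3.96, 4.10]
q = phi(g)                                     # q = phi[g] = phi[f] ∩ p
print(q.min(), q.max())                        # posterior for y, roughly [14, 22]
```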


Chapter 2
Manifolds

Old text begins. Probability densities play an important role in physics. To handle them properly, we must have a clear notion of what integrating a scalar function over a manifold means. While mathematicians may assume that a manifold has a notion of volume defined, physicists must check whether this is true in every application, and the answer is not always positive. We must understand how far we can go without having a notion of volume, and we must understand what supplementary theory appears when we do have such a notion.

It is my feeling that every book on probability theory should start with a chapter explaining all the notions of tensor calculus that are necessary to develop an intrinsic theory of probability. This is the role of this chapter. In it, in addition to ordinary tensors, we shall find the tensor capacities and tensor densities that were common in the books of a certain epoch, but that are not in fashion today (wrongly, I believe). Old text ends.

2.1 Manifolds and Coordinates

In this first chapter, the basic notions of tensor calculus and of integration theory are introduced. I do not try to be complete. Rather, I try to develop the minimum theory that is necessary in order to develop probability theory in the subsequent chapters. The reader is assumed to have a good knowledge of tensor calculus, the goal of the chapter being more to fix terminology and notations than to advance the theory.

Many books on tensor calculus exist. Among them, the best are (of course) in French, and Brillouin (1960) is the best among them. Many other books contain introductory discussions on tensor calculus; Weinberg (1972) is particularly lucid.

Perhaps original in this text is the notation proposed to distinguish between densities and capacities. While the trick of using indices in upper or lower position to distinguish between vectors and forms (or, in metric spaces, to distinguish between contravariant and covariant components) makes formulas intuitive, I propose to use a bar (in upper or lower position) to distinguish between densities (like a probability density) and capacities (like the capacity element of integration theory), this also leading to intuitive results. In particular, the bijection existing between these objects in metric spaces becomes as natural as the one just mentioned between contravariant and covariant components.

All through this book, the implicit sum convention over repeated indices is used: an expression like t^i_j n^j means Σ_j t^i_j n^j.

2.1.1 Linear Spaces

Consider a finite-dimensional linear space L, with vectors denoted u, v, … If {e_i} is a basis of the linear space, any vector v can be (uniquely) decomposed as

    v = v^i e_i ,    (2.1)

this defining the components {v^i} of the vector v in the basis {e_i}.

A linear form over L is a linear application from L into the set of real numbers, i.e., a linear application that to every vector v ∈ L associates a real number. Denoting by f a linear form, the number λ associated by f to an arbitrary vector v is denoted

    λ = ⟨ f , v ⟩ .    (2.2)

For any given linear form, say f, there is a unique set of quantities { f_i } such that, for any vector v,

    ⟨ f , v ⟩ = f_i v^i .    (2.3)

It is easy to see that the set of linear forms over a linear space L is itself a linear space, that is denoted L*. The quantities { f_i } can then be seen as being the components of the form f on a basis of forms {e^i}, that is called the dual of the vector basis {e_i}, and that may be defined by the condition

    ⟨ e^i , e_j ⟩ = δ^i_j    (2.4)

(where δ^i_j is the symbol that takes the value one when i = j and zero when i ≠ j).

The two linear spaces L and L* are the building blocks of an infinite series of more complex linear spaces. For instance, a set of coefficients t_i^{jk} can be used to define the linear application

    {v^i}, { f_i }, {g_i} ↦ λ = t_i^{jk} v^i f_j g_k .    (2.5)

As it is easy to define the sum of two such linear applications, and the multiplication of such a linear application by a real number, we can say that the coefficients {t_i^{jk}} define an element of a linear space, denoted L* ⊗ L ⊗ L. The coefficients {t_i^{jk}} can then be seen as the components of an element t of the linear space L* ⊗ L ⊗ L on a basis that is denoted {e^i ⊗ e_j ⊗ e_k}, and one writes

    t = t_i^{jk} e^i ⊗ e_j ⊗ e_k .    (2.6)

2.1.2 Manifolds

Grossly speaking, a manifold is a space of points. The physical 3D space is an example of a three-dimensional manifold, and the surface of a sphere is an example of a two-dimensional manifold. In our theory, we shall consider manifolds with an arbitrary but finite number of dimensions. Those manifolds may be flat or not (although the curvature of a manifold will appear only in one of the appendices [note: what about the curvature of the sphere?]). We shall examine smooth manifolds only. For instance, the surface of a sphere is a smooth manifold; the surface of a cone is smooth everywhere, except at the tip of the cone.

The points inside well-chosen portions of a manifold can be designated by their coordinates: a coordinate system with n coordinates defines a one-to-one application between a portion of a manifold and a portion of ℝⁿ. We then say that the manifold has n dimensions. The term "portion" is used here to stress that many manifolds cannot be completely covered by a single coordinate system: any single coordinate system on the surface of the sphere will be pathological at least at one point (the spherical coordinates are pathological at two points, the two poles).

In what follows, smooth manifolds shall be denoted by symbols like M and N, and the points of a manifold by symbols like P and Q. A coordinate system is denoted, for instance, by {x^i}.

At each point P of an n-dimensional manifold M one can introduce the linear tangent space, and all the vectors and tensors that can exist¹ at that point. When a system of coordinates {x^i} is defined over the manifold M, at each point P of the manifold there is the natural basis (of the tangent linear space at P). Actual tensors can be defined at any point independently of any coordinate system (and of any local basis), but their components are, of course, only defined when a basis is chosen. Usually, this basis is the natural basis associated to a coordinate system. When changing coordinates, the natural basis changes, so the components of the tensors change too. The formulas describing the change of components of a tensor under a change of coordinates are recalled below.

While tensors are intrinsic objects, it is sometimes useful to introduce tensor densities and tensor capacities, that depend on the coordinates being used in an essential way. These densities and capacities are useful, in particular, to develop the notion of volume (or of "measure") on a manifold, and, therefore, to introduce the basic concept of integral. It is for this reason that, in addition to tensors, densities and capacities are also considered below.

¹ The vectors belong to the tangent linear space, and the tensors belong to the different linear spaces that can be built at point P using the different tensor products of the tangent linear space and its dual.

2.1.3 Changing Coordinates

Consider, over a finite-dimensional (smooth) manifold M, a first system of coordinates {x^i} (i = 1, …, n) and a second system of coordinates {x^{i'}} (i' = 1, …, n) (putting the primes on the indices rather than on the x's greatly simplifies many tensor equations). One may write the coordinate transformation using either of the two equivalent functions

    x^{i'} = x^{i'}(x^1, …, x^n)  (i' = 1, …, n) ;  x^i = x^i(x^{1'}, …, x^{n'})  (i = 1, …, n) .    (2.7)

We shall need the two sets of partial derivatives²

    X^i_{i'} = ∂x^i/∂x^{i'} ;  X^{i'}_i = ∂x^{i'}/∂x^i .    (2.8)

One has

    X^i_{k'} X^{k'}_j = δ^i_j ;  X^{i'}_k X^k_{j'} = δ^{i'}_{j'} .    (2.9)

To simplify language and notations, it is useful to introduce two matrices of partial derivatives, arranging the elements X^i_{i'} and X^{i'}_i as follows:

    X = [ X^1_{1'}  X^1_{2'}  X^1_{3'}  ⋯
          X^2_{1'}  X^2_{2'}  X^2_{3'}  ⋯
          ⋯                              ] ;
    X' = [ X^{1'}_1  X^{1'}_2  X^{1'}_3  ⋯
           X^{2'}_1  X^{2'}_2  X^{2'}_3  ⋯
           ⋯                              ] .    (2.10)

Then, equations 2.9 just tell that the matrices X and X' are mutually inverse:

    X X' = X' X = I .    (2.11)

The two matrices X and X' are called Jacobian matrices. As the matrix X is obtained by taking derivatives of the variables x^i with respect to the variables x^{i'}, one obtains the matrix {X^i_{i'}} as a function of the variables {x^{i'}}, so we can write X(x') rather than just writing X. The reciprocal argument tells us that we can write X'(x) rather than just X'. We shall later use this to make some notations more explicit.

Finally, the Jacobian determinants of the transformation are the determinants of the two Jacobian matrices:

    X = det X ;  X' = det X' .    (2.12)

Of course, X X' = 1.

² Again, the same letter X is used here, the primes on the indices distinguishing the different quantities.
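A concrete instance (my own illustration, not the book's): take Cartesian coordinates (x, y) and polar coordinates (r, θ) in the plane, and check numerically that the two Jacobian matrices are mutually inverse and that the two Jacobian determinants multiply to one:

```python
import numpy as np

def X_matrix(r, t):        # X^i_{i'} = d(x, y)/d(r, theta)
    return np.array([[np.cos(t), -r * np.sin(t)],
                     [np.sin(t),  r * np.cos(t)]])

def X_prime_matrix(x, y):  # X^{i'}_i = d(r, theta)/d(x, y)
    r2 = x * x + y * y
    r = np.sqrt(r2)
    return np.array([[ x / r,  y / r ],
                     [-y / r2, x / r2]])

r, t = 2.0, 0.7
x, y = r * np.cos(t), r * np.sin(t)
print(np.allclose(X_matrix(r, t) @ X_prime_matrix(x, y), np.eye(2)))        # True
print(np.linalg.det(X_matrix(r, t)) * np.linalg.det(X_prime_matrix(x, y)))  # 1.0
```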

2.1.4 Tensors, Capacities, and Densities

Consider a finite-dimensional manifold M with some coordinates {x^i}. Let P be a point of the manifold, and {e_i} a basis of the linear space tangent to M at P, this basis being the natural basis associated to the coordinates {x^i} at point P. Let T = T^{ij}_{kl} e_i ⊗ e_j ⊗ e^k ⊗ e^l be a tensor at point P. The T^{ij}_{kl} are, therefore, the components of T on the basis e_i ⊗ e_j ⊗ e^k ⊗ e^l. On a change of coordinates from {x^i} into {x^{i'}}, the natural basis will change, and, therefore, the components of the tensor will also change, becoming T^{i'j'}_{k'l'}. It is well known that the new and the old components are related through

    T^{i'j'}_{k'l'} = (∂x^{i'}/∂x^i)(∂x^{j'}/∂x^j)(∂x^k/∂x^{k'})(∂x^l/∂x^{l'}) T^{ij}_{kl} ,    (2.13)

or, using the notations introduced above,

    T^{i'j'}_{k'l'} = X^{i'}_i X^{j'}_j X^k_{k'} X^l_{l'} T^{ij}_{kl} .    (2.14)

In particular, for totally contravariant and totally covariant tensors,

    T^{i'j'…} = X^{i'}_i X^{j'}_j ⋯ T^{ij…} ;  T_{i'j'…} = X^i_{i'} X^j_{j'} ⋯ T_{ij…} .    (2.15)

In addition to actual tensors, we shall encounter other objects that have indices too, but that transform in a slightly different way: densities and capacities (see, for instance, Weinberg [1972] and Winogradzki [1979]). Rather than a general exposition of the properties of densities and capacities, let us anticipate that we shall only find totally contravariant densities and totally covariant capacities (like the Levi-Civita capacity, to be introduced below). From now on, in all this text, a density is denoted with an overline, like in ā, and a capacity is denoted with an underline, like in b̲. Let me now give what we can take as defining properties.

Under the considered change of coordinates, a totally contravariant density ā = ā^{ij…} e_i ⊗ e_j ⊗ ⋯ changes components following the law

    ā^{i'j'…} = (1/X') X^{i'}_i X^{j'}_j ⋯ ā^{ij…} ,    (2.16)

or, equivalently, ā^{i'j'…} = X X^{i'}_i X^{j'}_j ⋯ ā^{ij…}. Here X = det X and X' = det X' are the Jacobian determinants introduced in equation 2.12. This rule for the change of components of a totally contravariant density is the same as that for a totally contravariant tensor (equation at left in 2.15), except that there is an extra factor, the Jacobian determinant X = 1/X'.

Similarly, a totally covariant capacity b̲ = b̲_{ij…} e^i ⊗ e^j ⊗ ⋯ changes components following the law

    b̲_{i'j'…} = (1/X) X^i_{i'} X^j_{j'} ⋯ b̲_{ij…} ,    (2.17)

or, equivalently, b̲_{i'j'…} = X' X^i_{i'} X^j_{j'} ⋯ b̲_{ij…}. Again, this rule for the change of components of a totally covariant capacity is the same as that for a totally covariant tensor (equation at right in 2.15), except that there is an extra factor, the Jacobian determinant X' = 1/X.

The most notable examples of tensor densities and capacities are the Levi-Civita density and the Levi-Civita capacity (examined in section 2.1.8 below).

The number of terms in equations 2.16 and 2.17 depends on the variance of the objects considered (i.e., on the number of indices they have). We shall find, in particular, scalar densities and scalar capacities, that do not have any index. The natural extension of equations 2.16 and 2.17 is (a scalar can be considered to be a totally antisymmetric tensor)

    ā' = (1/X') ā = X ā    (2.18)

for a scalar density, and

    b̲' = (1/X) b̲ = X' b̲    (2.19)

for a scalar capacity. The most notable example of a scalar capacity is the capacity element (as explained in section 2.1.11, this is the equivalent of the volume element that can be defined in metric manifolds). Scalar densities abound; for example, a probability density. Let us write the two equations more explicitly. Using x' as variable,

    ā'(x') = X(x') ā( x(x') ) ;  b̲'(x') = (1/X(x')) b̲( x(x') ) ,    (2.20)

or, equivalently, using x as variable,

    ā'( x'(x) ) = (1/X'(x)) ā(x) ;  b̲'( x'(x) ) = X'(x) b̲(x) .    (2.21)

For completeness, let me mention here that densities and capacities of higher degree are also usually introduced (they appear briefly below). For instance, under a change of variables, a second-degree (totally contravariant) tensor density would not satisfy equation 2.16, but, rather,

    a̿^{i'j'…} = (1/(X')²) X^{i'}_i X^{j'}_j ⋯ a̿^{ij…} ,    (2.22)
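A familiar instance of equations 2.20 is the change of variables of a probability density, which picks up the Jacobian determinant. Here is a numerical sketch of mine (the choice x' = exp(x) is an assumption for illustration, so x = log x' and X(x') = dx/dx' = 1/x'; a Gaussian density in x becomes the lognormal density in x'):

```python
import numpy as np

g = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # density in the x coordinate

xp = np.linspace(1e-6, 60.0, 2_000_001)    # the new coordinate x' > 0
gp = g(np.log(xp)) / xp                    # equation 2.20: g'(x') = X(x') g(x(x'))

# Total probability is invariant under the change of variables (simple Riemann sum):
print(np.sum(gp) * (xp[1] - xp[0]))        # ~= 1
```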

where the reader should note the double bar used to indicate that a̿^{ij…} is a second-degree tensor density. Similarly, under a change of variables, a second-degree (totally covariant) tensor capacity would not satisfy equation 2.17, but, rather,

    b̳_{i'j'…} = (1/X²) X^i_{i'} X^j_{j'} ⋯ b̳_{ij…} .    (2.23)

The multiplication of tensors is one possibility for defining new tensors, like in t^i_{jk} = f_j s^i_k. Using the rules of change of components given above, it is easy to demonstrate the following properties: the product of a density by a tensor gives a density (like in p̄^i = ρ̄ v^i); the product of a capacity by a tensor gives a capacity (like in s̲_{ij} = t̲_i u_j); the product of a capacity by a density gives a tensor (like in dσ = ḡ dτ̲). Therefore, in a tensor equality, the total number of bars on each side of the equality must be balanced (counting upper and lower bars with opposite sign).

2.1.5 Kronecker Tensors (I)

There are two Kronecker symbols, δ^i_j and δ_i^j. They are defined similarly:

    δ^i_j = { 1 if i and j are the same index ;  0 if i and j are different indices } ,    (2.24)

and

    δ_i^j = { 1 if i and j are the same index ;  0 if i and j are different indices } .    (2.25)

It is easy to verify that these are more than simple "symbols": they are tensors. For under a change of variables we should have, using equation 2.14, δ^{i'}_{j'} = X^{i'}_i X^j_{j'} δ^i_j, i.e., δ^{i'}_{j'} = X^{i'}_i X^i_{j'}, which is indeed true (see equation 2.9). Therefore, we shall say that δ^i_j and δ_i^j are the Kronecker tensors.

Warning: a common error among beginners is to give the value 1 to the symbol δ^i_i. In fact, the right value is n, the dimension of the space, as there is an implicit sum assumed: δ^i_i = δ^1_1 + δ^2_2 + ⋯ + δ^n_n = 1 + 1 + ⋯ + 1 = n.

2.1.6 Orientation of a Coordinate System

The Jacobian determinants associated to a change of variables x ↔ y have been defined in section 2.1.3. As their product must equal +1, they must be both positive or both negative. Two different coordinate systems x = {x^1, x^2, …, x^n} and y = {y^1, y^2, …, y^n} are said to have the same orientation (at a given point) if the Jacobian determinants of the transformation are positive. If they are negative, it is said that the two coordinate systems have opposite orientations.

Note: what follows is a useless complication! Note: what about ± det g? Note: talk with Tolo (he has very clear ideas on this matter).

As changing the orientation of a coordinate system simply amounts to changing the order of two of its coordinates, in what follows we shall assume that in all our changes of coordinates the new coordinates are always ordered in such a way that the orientation is preserved. The special one-dimensional case (where there is only one coordinate) is treated in an ad hoc way.

Example 2.1 In the Euclidean 3D space, a positive orientation is assigned to a Cartesian coordinate system {x, y, z} when the positive sense of the z axis is obtained from the positive senses of the x axis and the y axis following the screwdriver rule. Another Cartesian coordinate system {u, v, w} defined as u = y, v = x, w = z would then have a negative orientation. A system of three spherical coordinates, if taken in their usual order {r, θ, ϕ}, also has a positive orientation, but when changing the order of two coordinates, as in {r, ϕ, θ}, the orientation of the coordinate system is negative. For a system of geographical coordinates³, the reverse is true: while {r, ϕ, λ} is a positively oriented system, {r, λ, ϕ} is negatively oriented.

³ The geographical coordinate λ (latitude) is related to the spherical coordinate θ as λ + θ = π/2. Therefore, cos λ = sin θ.

2.1.7 Totally Antisymmetric Tensors

A tensor is completely antisymmetric if any even permutation of indices does not change the value of the components, and any odd permutation of indices changes the sign of the value of the components:

    t_{pqr…} = { +t_{ijk…} if ijk… is an even permutation of pqr… ;  −t_{ijk…} if ijk… is an odd permutation of pqr… } .    (2.26)

For instance, a fourth-rank tensor t_{ijkl} is totally antisymmetric if

    t_{ijkl} = t_{iklj} = t_{iljk} = t_{jilk} = t_{jkil} = t_{jlki}
             = t_{kijl} = t_{kjli} = t_{klij} = t_{likj} = t_{ljik} = t_{lkji}
             = −t_{ijlk} = −t_{ikjl} = −t_{ilkj} = −t_{jikl} = −t_{jkli} = −t_{jlik}
             = −t_{kilj} = −t_{kjil} = −t_{klji} = −t_{lijk} = −t_{ljki} = −t_{lkij} ,    (2.27)

a third-rank tensor t_{ijk} is totally antisymmetric if

    t_{ijk} = t_{jki} = t_{kij} = −t_{ikj} = −t_{jik} = −t_{kji} ,    (2.28)

and a second-rank tensor t_{ij} is totally antisymmetric if

    t_{ij} = −t_{ji} .    (2.29)

By convention, a first-rank tensor t_i and a scalar t are considered to be totally antisymmetric (they satisfy the properties typical of other antisymmetric tensors).

2.1.8 Levi-Civita Capacity and Density

When working in a manifold of dimension n, one introduces two Levi-Civita symbols, ε̲_{i₁i₂…iₙ} and ε̄^{i₁i₂…iₙ} (having n indices each). They are defined similarly:

    ε̲_{ijk…} = { +1 if ijk… is an even permutation of 12…n ;  0 if some indices are identical ;  −1 if ijk… is an odd permutation of 12…n } ,    (2.30)

and

    ε̄^{ijk…} = { +1 if ijk… is an even permutation of 12…n ;  0 if some indices are identical ;  −1 if ijk… is an odd permutation of 12…n } .    (2.31)

In fact, these are more than "symbols": they are, respectively, a capacity and a density. Let us check this, for instance, for ε̲_{ijk…}. In order for ε̲_{ijk…} to be a capacity, one should verify that, under a change of variables over the manifold, expression 2.17 holds, so one should have ε̲_{i'j'…} = (1/X) X^i_{i'} X^j_{j'} ⋯ ε̲_{ij…}. That this is true follows from the property X^i_{i'} X^j_{j'} ⋯ ε̲_{ij…} = X ε̲_{i'j'…}, that can be demonstrated using the definition of a determinant (see equation 2.33). It is not obvious a priori that a property as strong as that expressed by the two equations is conserved through an arbitrary change of variables. We see that this is due to the fact that the very definition of determinant (equation 2.33) contains the Levi-Civita symbols. Therefore, ε̲_{ijk…} is to be called the Levi-Civita capacity, and ε̄^{ijk…} is to be called the Levi-Civita density. By definition, these are totally antisymmetric.

In a space of dimension n the following properties hold:

    ε̲_{s₁…sₙ} ε̄^{s₁…sₙ} = n!
    ε̲_{i₁s₂…sₙ} ε̄^{j₁s₂…sₙ} = (n−1)! δ^{j₁}_{i₁}
    ε̲_{i₁i₂s₃…sₙ} ε̄^{j₁j₂s₃…sₙ} = (n−2)! ( δ^{j₁}_{i₁} δ^{j₂}_{i₂} − δ^{j₂}_{i₁} δ^{j₁}_{i₂} )
    ⋯ ,    (2.32)

the successive equations involving the Kronecker determinants, whose theory is not developed here.
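The contraction identities 2.32 are easy to verify numerically. A small sketch of mine, building the Levi-Civita array for n = 3 (the same array of numbers serves as the components of both the capacity and the density):

```python
import numpy as np
from itertools import permutations

def levi_civita(n):
    """Dense n-index Levi-Civita array: +1/-1 on permutations of (0..n-1), else 0."""
    eps = np.zeros((n,) * n)
    for perm in permutations(range(n)):
        # determinant of the permutation matrix gives the sign of the permutation
        eps[perm] = round(np.linalg.det(np.eye(n)[list(perm)]))
    return eps

eps = levi_civita(3)
# First identity of (2.32): full contraction gives n! = 6.
print(np.einsum('ijk,ijk->', eps, eps))      # 6.0
# Second identity: contraction over n-1 indices gives (n-1)! delta = 2 * I.
print(np.einsum('iab,jab->ij', eps, eps))    # 2 * identity matrix
```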

2.1.9 Determinants

The Levi-Civita densities and capacities can be used to define determinants. For instance, in a space of dimension n, the determinants of the tensors Q_{ij}, R_i^j, S^i_j, and T^{ij} are defined by

    Q̿ = (1/n!) ε̄^{i₁i₂…iₙ} ε̄^{j₁j₂…jₙ} Q_{i₁j₁} Q_{i₂j₂} ⋯ Q_{iₙjₙ}
    R = (1/n!) ε̄^{i₁i₂…iₙ} ε̲_{j₁j₂…jₙ} R_{i₁}^{j₁} R_{i₂}^{j₂} ⋯ R_{iₙ}^{jₙ}
    S = (1/n!) ε̲_{i₁i₂…iₙ} ε̄^{j₁j₂…jₙ} S^{i₁}_{j₁} S^{i₂}_{j₂} ⋯ S^{iₙ}_{jₙ}
    T̳ = (1/n!) ε̲_{i₁i₂…iₙ} ε̲_{j₁j₂…jₙ} T^{i₁j₁} T^{i₂j₂} ⋯ T^{iₙjₙ} .    (2.33)

In particular, it is the first of equations 2.33 that is used below (equation 2.73) to define the metric determinant.
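The first formula of 2.33 can be checked against an ordinary determinant routine. A sketch of mine for n = 3 (the epsilon array is built as in the previous sketch):

```python
import numpy as np
from itertools import permutations

eps = np.zeros((3, 3, 3))
for p in permutations(range(3)):
    eps[p] = round(np.linalg.det(np.eye(3)[list(p)]))   # sign of the permutation

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 3))

# det Q = (1/3!) eps^{abc} eps^{ijk} Q_{ai} Q_{bj} Q_{ck}
det_Q = np.einsum('abc,ijk,ai,bj,ck->', eps, eps, Q, Q, Q) / 6.0
print(np.allclose(det_Q, np.linalg.det(Q)))             # True
```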

2.1.10 Dual Tensors and Exterior Product of Vectors

In a space of dimension n, to any totally antisymmetric tensor T^{i₁…iₙ} of rank n one associates the scalar capacity

    t̲ = (1/n!) ε̲_{i₁…iₙ} T^{i₁…iₙ} ,    (2.34)

while to any scalar capacity t̲ one can associate the totally antisymmetric tensor of rank n

    T^{i₁…iₙ} = ε̄^{i₁…iₙ} t̲ .    (2.35)

These two equations are consistent when taken together (introducing one into the other gives an identity). We say that the capacity t̲ is the dual of the tensor T, and that the tensor T is the dual of the capacity t̲.

In a space of dimension n, given n vectors v₁, v₂, …, vₙ, one defines the scalar capacity w̲ = ε̲_{i₁…iₙ} (v₁)^{i₁} (v₂)^{i₂} ⋯ (vₙ)^{iₙ}, or, using simpler notations,

    w̲ = ε̲_{i₁…iₙ} v₁^{i₁} v₂^{i₂} ⋯ vₙ^{iₙ} ,    (2.36)

that is called the exterior product of the n vectors. This exterior product is usually denoted

    w̲ = v₁ ∧ v₂ ∧ ⋯ ∧ vₙ ,    (2.37)

and we shall sometimes use this notation, although it is not manifestly covariant (the number of bars at the left and the right is not balanced). The exterior product changes sign if the order of two vectors is changed, and is zero if the vectors are not linearly independent.

2.1.11 Capacity Element

Consider, at a point P of an n-dimensional manifold M, n vectors {dr₁, dr₂, …, drₙ} of the tangent linear space (the notation dr is used to suggest that, later on, a limit will be taken, where all these vectors tend to the zero vector). Their exterior product is

    dv̲ = ε̲_{i₁…iₙ} dr₁^{i₁} dr₂^{i₂} ⋯ drₙ^{iₙ} ,    (2.38)

or, equivalently,

    dv̲ = dr₁ ∧ dr₂ ∧ ⋯ ∧ drₙ .    (2.39)

Let us see why this has to be interpreted as the capacity element associated to the n vectors {dr₁, dr₂, …, drₙ}. Assume that some coordinates {x^i} have been defined over the manifold, and that we choose the n vectors at point P each tangent to one of the coordinate lines at this point:

    dr₁ = (dx^1, 0, …, 0) ;  dr₂ = (0, dx^2, …, 0) ;  … ;  drₙ = (0, 0, …, dx^n) .    (2.40)

The n vectors, then, can be interpreted as the perturbations of the n coordinates. The definition in equation 2.38 then gives

    dv̲ = ε̲_{12…n} dx^1 dx^2 ⋯ dx^n .    (2.41)

Using a notational abuse, this capacity element is usually written, in mathematical texts, as

    dv̲(x) = dx^1 ∧ dx^2 ∧ ⋯ ∧ dx^n ,    (2.42)

while in physical texts, using more elementary notations, one simply writes

    dv̲ = dx^1 dx^2 ⋯ dx^n .    (2.43)

This is the usual capacity element that appears in elementary calculus to develop the notion of integral. I say "capacity element" and not "volume element" because the volume spanned by the vectors {dr₁, dr₂, …, drₙ} shall only be defined when the manifold M is a metric manifold, i.e., when the distance between two points of the manifold is defined. The capacity element dv̲ can be interpreted as the small hyperparallelepiped defined by the small vectors {dr₁, dr₂, …, drₙ}, as suggested in figure 2.1 for a three-dimensional space.

Under a change of coordinates (see an explicit demonstration in appendix 5.2.1) one has

    dx^{1'} ∧ ⋯ ∧ dx^{n'} = (det X') dx^1 ∧ ⋯ ∧ dx^n .    (2.44)

This, of course, is just a special case of equation 2.19 (that defines a scalar capacity).

Figure 2.1: From three small vectors in a three-dimensional space one defines the three-dimensional capacity element dv̲ = ε̲_{ijk} dr₁^i dr₂^j dr₃^k, that can be interpreted as representing the small parallelepiped defined by the three vectors. To this parallelepiped there is no true notion of volume associated, unless the three-dimensional space is metric.

2.1.12 Integral

Consider an n-dimensional manifold M, with some coordinates {x^i}, and assume that a scalar density f̄(x^1, x^2, …) has been defined at each point of the manifold (this function being a density, its value at each point depends on the coordinates being used; an example of a practical definition of such a scalar density is given in section 2.2.10). Dividing each coordinate line in small increments Δx^i divides the manifold M (or some domain D of it) in small hyperparallelepipeds that are characterized, as we have seen, by the capacity element (equations 2.41–2.43)

    Δv̲ = ε̲_{12…n} Δx^1 Δx^2 ⋯ Δx^n = Δx^1 Δx^2 ⋯ Δx^n .    (2.45)

At every point, we can introduce the scalar Δv̲ f̄(x^1, x^2, …) and, therefore, for any domain D ⊆ M, the discrete sum Σ Δv̲ f̄(x^1, x^2, …) can be considered, where only the hyperparallelepipeds that are inside the domain D (or at the border of the domain) are taken into account (as suggested by figure 2.2).

Figure 2.2: The volume of an arbitrarily shaped, smooth domain D of a manifold M can be defined as the limit of a sum, using elementary regions adapted to the coordinates (regions whose elementary capacity is well defined).

The integral of the scalar density f̄ over the domain D is defined as the limit (when it exists)

    I = ∫_D dv̲ f̄(x^1, x^2, …) ≡ lim Σ Δv̲ f̄(x^1, x^2, …) ,    (2.46)

where the limit corresponds, taking smaller and smaller cells, to considering an infinite number of them. This defines an invariant quantity: while the capacity values Δv̲ and the density values f̄(x^1, x^2, …) essentially depend on the coordinates being used, the integral does not (the product of a capacity times a density is a tensor). This invariance is trivially checked when taking seriously the notation ∫_D dv̲ f̄. In a change of variables x → x', the two capacity elements dv̲(x) and dv̲'(x') are related via (equation 2.19)

    dv̲'(x') = (1/X(x')) dv̲( x(x') )    (2.47)

(where X(x') is the Jacobian determinant det{ ∂x^i/∂x^{i'} }), as they are tensorial capacities, in the sense of section 2.1.4. Also, for a density we have

    f̄'(x') = X(x') f̄( x(x') ) .    (2.48)

In the coordinates x we have

    I(D) = ∫_{x ∈ D} dv̲(x) f̄(x) ,    (2.49)

and in the coordinates x',

    I'(D) = ∫_{x' ∈ D} dv̲'(x') f̄'(x') .    (2.50)

Using the two equations 2.47–2.48, we immediately obtain I'(D) = I(D), this showing that the integral of a density (integrated using the capacity element) is an invariant.
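A quick numerical illustration of this invariance (my own sketch, not the book's): integrate the same scalar density over a disk of radius 2, once in Cartesian coordinates and once in polar coordinates, where the density representative picks up the Jacobian determinant r:

```python
import numpy as np

f_cart = lambda x, y: np.exp(-(x**2 + y**2))   # density in Cartesian coordinates
f_pol = lambda r, t: r * np.exp(-r**2)         # same density in polar coordinates

n = 1001
x = np.linspace(-2, 2, n)
X, Y = np.meshgrid(x, x)
inside = X**2 + Y**2 <= 4.0
dx = x[1] - x[0]
I_cart = np.sum(f_cart(X, Y) * inside) * dx * dx

r = np.linspace(0, 2, n); t = np.linspace(0, 2 * np.pi, n)
R, T = np.meshgrid(r, t)
I_pol = np.sum(f_pol(R, T)) * (r[1] - r[0]) * (t[1] - t[0])

print(I_cart, I_pol)   # both ~= pi * (1 - exp(-4)) ~= 3.084
```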

2.1.13 Capacity Element and Change of Coordinates

Note: real text yet to be written; this is a first attempt at the demonstration. The demonstration is possibly wrong, as I have not cared to well define the new capacity element.

At a given point of an n-dimensional manifold we can consider the n vectors {dr₁, …, drₙ}, associated to some coordinate system {x^1, …, x^n}, and we have the capacity element

    dv̲ = ε̲_{i₁…iₙ} (dr₁)^{i₁} ⋯ (drₙ)^{iₙ} .    (2.51)

In a change of coordinates {x^i} → {x^{i'}}, each of the n vectors shall have its components changed according to

    (dr)^{i'} = (∂x^{i'}/∂x^i) (dr)^i = X^{i'}_i (dr)^i .    (2.52)

Reciprocally,

    (dr)^i = (∂x^i/∂x^{i'}) (dr)^{i'} = X^i_{i'} (dr)^{i'} .    (2.53)

The capacity element introduced above can now be expressed as

    dv̲ = ε̲_{i₁…iₙ} X^{i₁}_{i₁'} ⋯ X^{iₙ}_{iₙ'} (dr₁)^{i₁'} ⋯ (drₙ)^{iₙ'} .    (2.54)

I guess that I can insert here the factor (1/n!) ε̄^{i₁'…iₙ'} ε̲_{j₁'…jₙ'}, to obtain

    dv̲ = ε̲_{i₁…iₙ} X^{i₁}_{i₁'} ⋯ X^{iₙ}_{iₙ'} ( (1/n!) ε̄^{i₁'…iₙ'} ε̲_{j₁'…jₙ'} ) (dr₁)^{j₁'} ⋯ (drₙ)^{jₙ'} .    (2.55)

If yes, then I would have

    dv̲ = ( (1/n!) ε̲_{i₁…iₙ} ε̄^{i₁'…iₙ'} X^{i₁}_{i₁'} ⋯ X^{iₙ}_{iₙ'} ) ε̲_{j₁'…jₙ'} (dr₁)^{j₁'} ⋯ (drₙ)^{jₙ'} ,    (2.56)

i.e., using the definition of determinant (third of equations 2.33),

    dv̲ = (det X) ε̲_{j₁'…jₙ'} (dr₁)^{j₁'} ⋯ (drₙ)^{jₙ'} .    (2.57)

We recognize here the capacity element dv̲' = ε̲_{j₁'…jₙ'} (dr₁)^{j₁'} ⋯ (drₙ)^{jₙ'} associated to the new coordinates. Therefore, we have obtained

    dv̲ = (det X) dv̲' .    (2.58)

This, of course, is consistent with the definition of a scalar capacity (equation 2.19).

2.2 Volume

2.2.1 Metric

OLD TEXT BEGINS. In some parameter spaces, there is an obvious definition of distance between points, and therefore of volume. For instance, in the 3D Euclidean space the distance between two points is just the Euclidean distance (which is invariant under translations and rotations). Should we choose to parameterize the position of a point by its Cartesian coordinates {x, y, z}, then… Note: I have to talk about the commensurability of distances,

    ds² = ds²_r + ds²_s ,    (2.59)

every time I have to define the Cartesian product of two spaces, each with its own metric. OLD TEXT ENDS.

A manifold is called a metric manifold if there is a definition of distance between points, such that the distance ds between the point of coordinates x = {x^i} and the point of coordinates x + dx = {x^i + dx^i} can be expressed as⁴

    ds² = g_{ij}(x) dx^i dx^j ,    (2.60)

i.e., if the notion of distance is of the L₂ type⁵. At every point of a metric manifold, therefore, there is a symmetric tensor g_{ij} defined, the metric tensor. The inverse metric tensor, denoted g^{ij}, is defined by the condition

    g^{ij} g_{jk} = δ^i_k .    (2.61)

It can be demonstrated that, under a change of variables, its components change like the components of a contravariant tensor, whence the notation g^{ij}. Therefore, the equations defining the change of components of the metric and of the inverse metric are (see equations 2.15)

    g_{i'j'} = X^i_{i'} X^j_{j'} g_{ij}  and  g^{i'j'} = X^{i'}_i X^{j'}_j g^{ij} .    (2.62)

⁴ This is a property that is valid for any coordinate system that can be chosen over the space.
⁵ As a counterexample, in a two-dimensional manifold, the distance defined as ds = |dx^1| + |dx^2| is not of the L₂ type (it is L₁).

the notation $\mathbf g^{-1}$ for the second matrix being justified by the definition 2.61, that now reads

$$ \mathbf g^{-1}\ \mathbf g = \mathbf I \ . \qquad (2.64) $$

In matrix notation, the change of the metric matrix under a change of variables, as given by the two equations 2.62, is written

$$ \mathbf g' = X^t\ \mathbf g\ X \quad ; \quad \mathbf g'^{-1} = X'\ \mathbf g^{-1}\ X'^t \ . \qquad (2.65) $$

If at every point $P$ of a manifold $\mathcal M$ there is a metric $g_{ij}$ defined, then the metric can be used to define a scalar product over the linear space tangent to $\mathcal M$ at $P$: given two vectors $v$ and $w$, their scalar product is

$$ v \cdot w \;\equiv\; g_{ij}\ v^i\ w^j \ . \qquad (2.66) $$

One can also define the scalar product of two forms $f$ and $h$ at $P$ (forms that belong to the dual of the linear space tangent to $\mathcal M$ at $P$):

$$ f \cdot h \;\equiv\; g^{ij}\ f_i\ h_j \ . \qquad (2.67) $$

The norm of a vector $v$ and the norm of a form $f$ are respectively defined as

$$ \| v \| = \sqrt{v \cdot v} = \sqrt{g_{ij}\ v^i\ v^j} \qquad \text{and} \qquad \| f \| = \sqrt{f \cdot f} = \sqrt{g^{ij}\ f_i\ f_j} \ . $$
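As an illustration, a small sketch (with an arbitrarily chosen point) can check the first of equations 2.65 on the passage from Cartesian coordinates $\{x, y\}$ to polar coordinates $\{r, \theta\}$ in the Euclidean plane, where the transformed metric matrix should be $\mathrm{diag}(1, r^2)$:

```python
import numpy as np

# Check of g' = X^t g X (first of equations 2.65) for Cartesian -> polar:
# x = r cos(theta), y = r sin(theta), so the Jacobian X^i_{i'} = dx^i/dx^{i'}
# has its columns indexed by the new coordinates {r, theta}.
r, theta = 1.7, 0.6                          # an arbitrary point
X = np.array([[np.cos(theta), -r * np.sin(theta)],
              [np.sin(theta),  r * np.cos(theta)]])
g = np.eye(2)                                # Cartesian metric, g_ij = delta_ij
g_polar = X.T @ g @ X
print(g_polar)                               # ~ diag(1, r**2), the polar metric
```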

2.2.2 Bijection Between Forms and Vectors

Let $\{e_i\}$ be the basis of a linear space, and $\{e^i\}$ the dual basis (that, as we have seen, is a basis of the dual space). In the absence of a metric, there is no natural association between vectors and forms. When there is a metric, to a vector $v = v^i\, e_i$ we can associate a form whose components on the dual basis $\{e^i\}$ are

$$ v_i \;\equiv\; g_{ij}\ v^j \ . \qquad (2.68) $$

Similarly, to a form $f = f_i\, e^i$, one can associate the vector whose components on the vector basis $\{e_i\}$ are

$$ f^i \;\equiv\; g^{ij}\ f_j \ . \qquad (2.69) $$

[Note: Give here some of the properties of this association (that the scalar product is preserved, etc.).]

2.2.3 Kronecker Tensor (II)

The Kronecker tensors $\delta^i{}_j$ and $\delta_i{}^j$ are defined whether the space has a metric or not. When one has a metric, one can raise and lower indices. Let us, for instance, lower the first index of $\delta^i{}_j$:

$$ \delta_{ij} \;\equiv\; g_{ik}\ \delta^k{}_j = g_{ij} \ . \qquad (2.70) $$

Equivalently, let us raise one index of $g_{ij}$:

$$ g^i{}_j \;\equiv\; g^{ik}\ g_{kj} = \delta^i{}_j \ . \qquad (2.71) $$

These equations demonstrate that when there is a metric, the Kronecker tensor and the metric tensor are identical. Therefore, when there is a metric, we can drop the symbols $\delta^i{}_j$ and $\delta_i{}^j$, and use the symbols $g^i{}_j$ and $g_i{}^j$ instead.

2.2.4 Fundamental Density

Let $g$ be the metric tensor of the manifold. For any (positively oriented) system of coordinates, we define the quantity $\bar g$, that we call the metric density (in the given coordinates), as

$$ \bar g = \sqrt{\det \mathbf g} \ . \qquad (2.72) $$

More explicitly, using the definition of determinant in the first of equations 2.33,

$$ \bar g = \sqrt{ \tfrac{1}{n!}\ \bar\epsilon^{\,i_1 i_2 \dots i_n}\ \bar\epsilon^{\,j_1 j_2 \dots j_n}\ g_{i_1 j_1}\ g_{i_2 j_2} \cdots g_{i_n j_n} } \ . \qquad (2.73) $$

This equation immediately suggests what it is possible to prove: the quantity $\bar g$ so defined is a scalar density (at the right, we have two upper bars under a square root). The quantity

$$ \underline g = 1/\bar g \qquad (2.74) $$

is obviously a capacity, that we call the metric capacity. It could also have been defined as

$$ \underline g = \sqrt{\det \mathbf g^{-1}} = \sqrt{ \tfrac{1}{n!}\ \underline\epsilon_{\,i_1 i_2 \dots i_n}\ \underline\epsilon_{\,j_1 j_2 \dots j_n}\ g^{i_1 j_1}\ g^{i_2 j_2} \cdots g^{i_n j_n} } \ . \qquad (2.75) $$

2.2.5 Bijection Between Capacities, Tensors, and Densities

As mentioned in section 2.1.4, (i) the product of a capacity by a density is a tensor, (ii) the product of a tensor by a density is a density, and (iii) the product of a tensor by a capacity is a capacity. So, when there is a metric, we have a natural bijection between capacities and tensors, and between tensors and densities. For instance, to a tensor capacity $\underline t^{ij}{}_{kl}$ we can associate the tensor

$$ t^{ij}{}_{kl} \;\equiv\; \bar g\ \underline t^{ij}{}_{kl} \ ; \qquad (2.76) $$

to a tensor $s^{ij}{}_{kl}$ we can associate the tensor density

$$ \bar s^{ij}{}_{kl} \;\equiv\; \bar g\ s^{ij}{}_{kl} \qquad (2.77) $$

and the tensor capacity

$$ \underline s^{ij}{}_{kl} \;\equiv\; \underline g\ s^{ij}{}_{kl} \ ; \qquad (2.78) $$

and to a tensor density $\bar r^{ij}{}_{kl}$ we can associate the tensor

$$ r^{ij}{}_{kl} \;\equiv\; \underline g\ \bar r^{ij}{}_{kl} \ . \qquad (2.79) $$

Equations 2.76 to 2.79 introduce an important notation (that seems to be novel): in the bijections defined by the metric density and the metric capacity, we keep the same letter for the tensors, and we just put bars or take out bars, much like in the bijection between vectors and forms defined by the metric, where we keep the same letter, and we raise or lower indices.

2.2.6 Levi-Civita Tensor

From the Levi-Civita capacity $\underline\epsilon_{ijk\dots}$ we can define the Levi-Civita tensor $\epsilon_{ijk\dots}$ as

$$ \epsilon_{ijk\dots} = \bar g\ \underline\epsilon_{ijk\dots} \ . \qquad (2.80) $$

Explicitly, this gives

$$ \epsilon_{ijk\dots} = \begin{cases} +\sqrt{\det \mathbf g} & \text{if } ijk\dots \text{ is an even permutation of } 1\,2 \dots n \\ 0 & \text{if some indices are identical} \\ -\sqrt{\det \mathbf g} & \text{if } ijk\dots \text{ is an odd permutation of } 1\,2 \dots n \end{cases} \qquad (2.81) $$

Alternatively, from the Levi-Civita density $\bar\epsilon^{\,ijk\dots}$ we could have defined the contravariant tensor $\epsilon^{ijk\dots}$ as

$$ \epsilon^{ijk\dots} = \underline g\ \bar\epsilon^{\,ijk\dots} \ . \qquad (2.82) $$

It can be shown that $\epsilon^{ijk\dots}$ can be obtained from $\epsilon_{ijk\dots}$ using the metric to raise the indices, so $\epsilon_{ijk\dots}$ and $\epsilon^{ijk\dots}$ are the same tensor (from where the notation).

2.2.7 Volume Element

We may here start by remembering equation 2.38,

$$ d\underline v = \underline\epsilon_{i_1 \dots i_n}\ dr_1^{\,i_1}\ dr_2^{\,i_2} \cdots dr_n^{\,i_n} \ , \qquad (2.83) $$

that expresses the capacity element defined by $n$ vectors $dr_1, dr_2, \dots, dr_n$. In the special situation where the $n$ vectors are taken successively along each of the $n$ coordinate lines, this gives (equation 2.43) $d\underline v = dx^1\, dx^2 \cdots dx^n$. The $dx^i$ in this expression are mere coordinate increments, that bear no relation to a length. As we are now working under the hypothesis that we have a metric, we know that the length associated to the coordinate increment, say, $dx^1$, is⁶ $ds = \sqrt{g_{11}}\ dx^1$. If the coordinate lines were orthogonal at the considered point, then the volume element, say $dv$, associated to the capacity element $d\underline v = dx^1\, dx^2 \cdots dx^n$ would be $dv = \sqrt{g_{11}}\, dx^1\ \sqrt{g_{22}}\, dx^2 \cdots \sqrt{g_{nn}}\, dx^n$. If the coordinates are not necessarily orthogonal, this expression needs, of course, to be generalized. One of the major theorems of integration theory is that the actual volume associated to the hyperparallelepiped characterized by the capacity element $d\underline v$, as expressed by equation 2.83, is

$$ dv = \bar g\ d\underline v \ , \qquad (2.84) $$

where $\bar g$ is the metric density introduced above. [Note: Should I give a demonstration of this property here?]

We know that $d\underline v$ is a capacity, and $\bar g$ a density. Therefore, the volume element $dv$, being the product of a density by a capacity, is a true scalar. While $d\underline v$ has been called a capacity element, $dv$ is called a volume element. The overbar in $\bar g$ is to remember that the determinant of the metric tensor is a density, in the tensorial sense of section 2.1.4, while the underbar in $d\underline v$ is to remember that the capacity element is a capacity in the tensorial sense of the term. In equation 2.84, the product of a density times a capacity gives the volume element $dv$, that is a true scalar (i.e., a scalar whose value is independent of the coordinates being used).

In view of equation 2.84, we can call $\bar g(x)$ the density of volume in the coordinates $x = \{x^1, \dots, x^n\}$. For short, we shall call $\bar g(x)$ the volume density⁷. It is important to realize that the values $\bar g(x)$ do not represent any intrinsic property of the space, but, rather, a property of the coordinates being used.

Example 2.2 In the Euclidean 3D space, using spherical coordinates $x = \{r, \theta, \varphi\}$, as the length element is $ds^2 = dr^2 + r^2\, d\theta^2 + r^2 \sin^2\theta\ d\varphi^2$, the metric matrix is

$$ \begin{pmatrix} g_{rr} & g_{r\theta} & g_{r\varphi} \\ g_{\theta r} & g_{\theta\theta} & g_{\theta\varphi} \\ g_{\varphi r} & g_{\varphi\theta} & g_{\varphi\varphi} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & r^2 & 0 \\ 0 & 0 & r^2 \sin^2\theta \end{pmatrix} \qquad (2.85) $$

---
⁶ Because the length of a general vector with components $dx^i$ is $ds^2 = g_{ij}\, dx^i\, dx^j$.
⁷ So we now have two names for $\bar g$: the metric density and the volume density.

and the metric determinant is

$$ \bar g = \sqrt{\det \mathbf g} = r^2 \sin\theta \ . \qquad (2.86) $$

As the capacity element of the space can be expressed (using notations that are not manifestly covariant)

$$ d\underline v = dr\ d\theta\ d\varphi \ , \qquad (2.87) $$

the expression $dv = \bar g\ d\underline v$ gives the volume element

$$ dv = r^2 \sin\theta\ dr\ d\theta\ d\varphi \ . \qquad (2.88) $$
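A quick numerical sketch (the radius and the grid sizes are arbitrary choices) shows that integrating the volume element 2.88, that is, the metric density $\bar g = r^2 \sin\theta$ times the capacity element, over the ball $r \le R$ recovers the Euclidean volume $4\pi R^3 / 3$, while the capacity element alone would not:

```python
import numpy as np

# Volume of the ball r <= R from dv = r^2 sin(theta) dr dtheta dphi
# (equation 2.88), by iterated trapezoidal integration over r and theta
# (the phi integral contributes a factor 2*pi).
trap = lambda y, x: np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0

R, n = 1.0, 2001
r = np.linspace(0.0, R, n)
theta = np.linspace(0.0, np.pi, n)
gbar = np.outer(r**2, np.sin(theta))        # metric density sqrt(det g) on the grid
inner = np.array([trap(row, theta) for row in gbar])   # theta integral, at each r
V = 2.0 * np.pi * trap(inner, r)            # times the phi integral, then the r one
print(V, 4.0 * np.pi * R**3 / 3.0)          # both ~ 4.18879
```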

2.2.8 Volume Element and Change of Variables

Assume that one has an $n$-dimensional manifold $\mathcal M$ and two coordinate systems, say $\{x^1, \dots, x^n\}$ and $\{x'^1, \dots, x'^n\}$. If the manifold is metric, the components of the metric tensor can be expressed both in the coordinates $x$ and in the coordinates $x'$. The (unique) volume element, say $dv$, accepts the two different expressions

$$ dv = \sqrt{\det \mathbf g_x}\ dx^1 \cdots dx^n = \sqrt{\det \mathbf g_{x'}}\ dx'^1 \cdots dx'^n \ . \qquad (2.89) $$

The Jacobian matrices of the transformation (matrices with the partial derivatives), $X$ and $X'$, have been introduced in section 2.1.3. The components of the metric are related through

$$ g_{i'j'} = X^i{}_{i'}\ X^j{}_{j'}\ g_{ij} \ , \qquad (2.90) $$

or, using matrix notations, $\mathbf g_{x'} = X^t\ \mathbf g_x\ X$. Using the identities $\det \mathbf g_{x'} = \det( X^t\, \mathbf g_x\, X ) = (\det X)^2\ \det \mathbf g_x$, one arrives at

$$ \sqrt{\det \mathbf g_{x'}} = \det X\ \sqrt{\det \mathbf g_x} \ . \qquad (2.91) $$

This is the relation between the two fundamental densities associated to each of the two coordinate systems. Of course, this corresponds to equation 2.18 (page 29), used to define scalar densities.

2.2.9 Volume of a Domain

With the volume element available, we can now define the volume of a domain $D$ of the manifold $\mathcal M$, that we shall denote as

$$ V(D) = \int_D dv \ , \qquad (2.92) $$

by the expression

$$ \int_D dv \;\equiv\; \int_D d\underline v\ \bar g \ . \qquad (2.93) $$

This definition makes sense because we have already defined the integral of a density in equation 2.46. Note that the (finite) capacity of a finite domain $D$ cannot be defined, as an expression like $\int_D d\underline v$ would not make any (invariant) sense.

We have here defined equation 2.92 in terms of equation 2.93, but it may well happen that, in numerical evaluations of an integral, the division of the space into small hyperparallelepipeds that is implied by the use of the capacity element is not the best choice. Figure 2.3 suggests a division of the space into cells having grossly similar volumes (to be compared with figure 2.4). If the volume $v_p$ of each cell is known, the volume of a domain $D$ can obviously be defined as a limit

$$ V(D) = \lim_{v_p \to 0} \sum_p v_p \ . \qquad (2.94) $$

We will discuss this point further in later chapters.

Figure 2.3: The volume of an arbitrarily shaped, smooth domain $D$ of a manifold $\mathcal M$ can be defined as the limit of a sum, using elementary regions whose individual volume is known (for instance, triangles in this 2D illustration). This way of defining the volume of a region does not require the definition of a coordinate system over the space.

Figure 2.4: For the same shape of figure 2.3, the volume can be evaluated using, for instance, a polar coordinate system. In a numerical integration, regions near the origin may be oversampled, while regions far from the origin may be undersampled. In some situations, this problem may become crucial, so this sort of coordinate integration is to be reserved to analytical developments only.

The finite volume obviously satisfies the following two properties:

for any domain $D$ of the manifold, $V(D) \ge 0$;

if $D_1$ and $D_2$ are two disjoint domains of the manifold, then $V(D_1 \cup D_2) = V(D_1) + V(D_2)$.

2.2.10 Example: Mass Density and Volumetric Mass

Imagine that a large number of particles of equal mass are distributed in the physical space (assimilated to an Euclidean 3D space) and that, for some reason, we choose to work with cylindrical coordinates $\{r, \varphi, z\}$. Choosing small increments $\{\Delta r, \Delta\varphi, \Delta z\}$ of the coordinates, we divide the space into cells of equal capacity, that (using notations that are not manifestly covariant) is given by

$$ \Delta\underline v = \Delta r\ \Delta\varphi\ \Delta z \ . \qquad (2.95) $$

We can count how many particles are inside each cell (see figure 2.5), and, therefore, the mass $\Delta m$ inside each cell. The quantity $\Delta m / \Delta\underline v$, being the ratio of a scalar by a capacity, is a density. In the limit of an infinite number of particles, we can take the limit where $\Delta r$, $\Delta\varphi$, and $\Delta z$ all tend to zero, and the limit

$$ \bar\rho(r, \varphi, z) = \lim_{\Delta r \to 0,\ \Delta\varphi \to 0,\ \Delta z \to 0} \frac{\Delta m}{\Delta\underline v} \qquad (2.96) $$

is the mass density at point $\{r, \varphi, z\}$.

Figure 2.5: We consider, in an Euclidean 3D space, a cylinder with a circular basis of radius 1, and cylindrical coordinates $(r, \varphi, z)$. Only a section of the cylinder is represented in the figure, with all its thickness, $\Delta z$, projected on the drawing plane. At the left, we have represented a map of the corresponding circle, and, at the middle, the coordinate lines on a metric representation of the space. By construction, all the cells in the middle have the same capacity $\Delta\underline v = \Delta r\, \Delta\varphi\, \Delta z$. The points represent particles with given masses. As explained in the text, counting how many particles are inside each cell directly gives an estimation of the mass density $\bar\rho(r, \varphi, z)$. To have, instead, a direct estimation of the volumetric mass $\rho(r, \varphi, z)$, a division of the space into cells of equal volume (not equal capacity) should have been done, as suggested at the right.

Given the mass density $\bar\rho(r, \varphi, z)$, the total mass inside a domain $D$ of the space is to be obtained as

$$ M(D) = \int_D d\underline v\ \bar\rho \ , \qquad (2.97) $$

where the capacity element $d\underline v$ appears, not the volume element $dv$.

If instead of dividing the space into cells of equal capacity $\Delta\underline v$, we divide it into cells of equal volume $\Delta v$ (as suggested at the right of figure 2.5), then the limit

$$ \rho(r, \varphi, z) = \lim_{\Delta v \to 0} \frac{\Delta m}{\Delta v} \qquad (2.98) $$

gives the volumetric mass $\rho(r, \varphi, z)$, different from the mass density $\bar\rho(r, \varphi, z)$. Given the volumetric mass $\rho(r, \varphi, z)$, the total mass inside a domain $D$ of the space is to be obtained as

$$ M(D) = \int_D dv\ \rho \ , \qquad (2.99) $$

where the volume element $dv$ appears, not the capacity element $d\underline v$.

The relation between the mass density $\bar\rho$ and the volumetric mass $\rho$ is the universal relation between any scalar density and a scalar,

$$ \bar\rho = \bar g\ \rho \ , \qquad (2.100) $$

where $\bar g$ is the metric density. As, in cylindrical coordinates, $\bar g = r$, the relation between mass density and volumetric mass is

$$ \bar\rho(r, \varphi, z) = r\ \rho(r, \varphi, z) \ . \qquad (2.101) $$

It is unfortunate that in common physical terminology the terms mass density and volumetric mass are used as synonyms. While for common applications this does not pose any problem, there is sometimes a serious misunderstanding in probability theory about the meaning of a probability density and of a volumetric probability.
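The distinction can be checked numerically. The following sketch (a uniform unit cylinder, an arbitrary choice made here) integrates the mass density $\bar\rho = r\,\rho$ with the capacity element, as in equation 2.97, and recovers the mass that the volumetric mass would give with the volume element:

```python
import numpy as np

# Mass of a homogeneous cylinder (radius 1, height 1, volumetric mass rho = 1),
# computed as the integral of the capacity element times the mass density
# rhobar(r, phi, z) = r * rho(r, phi, z) (equations 2.97 and 2.101).
trap = lambda y, x: np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0

rho = 1.0                                    # volumetric mass (a true scalar)
r = np.linspace(0.0, 1.0, 200_001)
rhobar = r * rho                             # mass density: rhobar = gbar * rho, gbar = r
M = 2.0 * np.pi * 1.0 * trap(rhobar, r)      # phi integral = 2*pi, z integral = 1
print(M, np.pi)                              # the exact mass is pi * 1**2 * 1 = pi
```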

2.3 Mappings

Note: explain here that we consider a mapping from a $p$-dimensional manifold $\mathcal M$ into a $q$-dimensional manifold $\mathcal N$.

2.3.1 Image of the Volume Element

This section is very provisional. I do not know yet how much of it I will need.

We have a $p$-dimensional manifold, with coordinates $\{x^a\} = \{x^1, \dots, x^p\}$. At a given point, we have $p$ vectors $\{dx_1, \dots, dx_p\}$. The associated capacity element is

$$ d\underline v = \underline\epsilon_{a_1 \dots a_p}\ (dx_1)^{a_1} \cdots (dx_p)^{a_p} \ . \qquad (2.102) $$

We also have a second, $q$-dimensional manifold, with coordinates $\{\psi^\alpha\} = \{\psi^1, \dots, \psi^q\}$. At a given point, we have $q$ vectors $\{d\psi_1, \dots, d\psi_q\}$. The associated capacity element is

$$ d\underline\omega = \underline\epsilon_{\alpha_1 \dots \alpha_q}\ (d\psi_1)^{\alpha_1} \cdots (d\psi_q)^{\alpha_q} \ . \qquad (2.103) $$

Consider now an application

$$ x \mapsto \psi = \psi(x) \qquad (2.104) $$

from the first into the second manifold. I examine here the case

$$ p \le q \ , \qquad (2.105) $$

i.e., the case where the dimension of the first manifold is smaller than or equal to that of the second manifold. When transporting the $p$ vectors $\{dx_1, \dots, dx_p\}$ from the first into the second manifold (via the application $\psi(x)$), this will define on the $q$-dimensional manifold a $p$-dimensional capacity element, $d\underline\omega_p$. We wish to relate $d\underline\omega_p$ and $d\underline v$. It is shown in the appendix (check!) that one has

$$ d\underline\omega_p = \sqrt{\det( \Psi^t\, \Psi )}\ \ d\underline v \ . \qquad (2.106) $$

Let us be interested in the image of the volume element. We denote by $g$ the metric tensor of the first manifold, and by $\gamma$ the metric tensor of the second manifold. Bla, bla, bla, and it follows from this that, letting $d\omega_p$ be the ($p$-dimensional) volume element obtained in the ($q$-dimensional) second manifold by transport of the volume element $dv$ of the first manifold, one has

$$ \frac{d\omega_p}{dv} = \sqrt{ \frac{\det( \Psi^t\, \gamma\, \Psi )}{\det g} } \ . \qquad (2.107) $$

2.3.2 Reciprocal Image of the Volume Element

I do not know yet if I will need this section.

Chapter 3

Probabilities

[Introduction to be written here.]

Note: Throughout this chapter, I should always say the probability function $P$ and the probability value $P[A]$.

3.1 Introduction

3.1.1 Preliminary Remarks

While probabilities over a discrete set are quite easy to introduce, in most of our applications we shall consider probabilities defined over a manifold, and the introduction of these probabilities requires care. In this respect, I disagree with some of the notions usually presented in the literature. Let me explain why.

A probability function over a set is a mapping that to some subsets of the given set associates a real nonnegative number. When the considered set is, in fact, a (finite-dimensional) manifold, say $\mathcal M$, it is a nontrivial question to discover to which kind of subsets of $\mathcal M$ one can unambiguously associate a probability value. Here, I do as everyone else, and choose to associate a probability value only to a special class of subsets of $\mathcal M$, those that belong to the Borel field of $\mathcal M$ (the set of all the subsets [of points] of $\mathcal M$ having a reasonably simple shape). Then, a probability function can axiomatically be introduced as the mapping that to every set $A$ of the Borel field of $\mathcal M$ associates a number, say $P[A]$, the mapping satisfying some conditions (for instance, the well-known set of Kolmogorov axioms).

Once a probability function $P$ has been introduced over a manifold $\mathcal M$, one can pass to the next nontrivial definition, that of conditional probability function. If $C$ is a given set of the Borel field of $\mathcal M$, with $P[C] > 0$, the conditional probability function given $C$ is the probability function over $\mathcal M$ that to any set $A$ of the Borel field of $\mathcal M$ associates a value, denoted $P[A \mid C]$, and defined as $P[A \mid C] = P[A \cap C] / P[C]$. So far, so good. But in all the interesting situations we shall face (when interpreting physical measurements), the conditioning set $C$ will not be an ordinary subset of $\mathcal M$, but a submanifold of $\mathcal M$, i.e., a manifold $\mathcal N$ contained in $\mathcal M$, but having a lower number of dimensions. At this point, one may have the illusion that this is a special case of the definition just mentioned, as a submanifold $\mathcal N$ of $\mathcal M$ can be seen as the limit of an ordinary set $C$ of $\mathcal M$. But problems arise when one realizes that one can define as many conditional probability functions (given $\mathcal N$) as there are ways of taking the limit $C \to \mathcal N$. And, with the sole elements so introduced, there is no natural way of taking the limit. Unfortunately, the problem is well known, and it surfaces from time to time. The difficulties then encountered are called paradoxes (like the so-called Borel paradox, see appendix 5.3.10). Those are no paradoxes, just the effects of loose definitions.

This problem is solved by just imposing that the convergence of the set $C \subseteq \mathcal M$ into the submanifold $\mathcal N$ must be uniform, but this is a notion that can only be introduced if the manifold $\mathcal M$ is a metric manifold, i.e., a manifold where the distance between points is defined, and is locally Euclidean (i.e., results from a length element $ds^2$). (Note: I have to check if I need a metric manifold or just a manifold with a notion of distance between points [not necessarily derivable from a $ds^2$].) (Note: explain that, more dramatically, the notion of center of a probability distribution [mean value]

can be totally meaningless in the absence of a notion of distance over a manifold [the definition is in appendix 5.3.17].)

I think that most mathematicians are reluctant to introduce the extra structure brought by a metric over the manifold. But in doing so, they leave probability theory incomplete (and open the way to the mentioned paradoxes). I will emphasize in chapter 4 that the abstract manifolds that we will introduce (to represent physical measurements) do accept a (nontrivial) metric.

We will not always need to consider metric manifolds. But there is another important problem encountered in practical applications that also requires the introduction of an extra structure on the manifold. I claim that some of the most basic inference problems can (and must) be solved using the notion of intersection of probabilities, the analogue for probabilities of the intersection of sets¹. We shall see that this very basic, and necessary, notion (the intersection of two probability functions) can only be considered if one has introduced over the manifold a notion of volume, i.e., if to every set $A$ of the Borel field is attached a positive real number $V[A]$, called the volume of the set. This volume must be defined intrinsically (i.e., independently of any possible choice of coordinates over the manifold). Practically, this means that one special measure function is introduced over the manifold $\mathcal M$, that is very special in that all the probability functions to be considered over $\mathcal M$ must be absolutely continuous with respect to the volume measure (i.e., all the probability functions to be introduced must give zero probability to the sets of zero volume). (Note: I have to be careful here, as some of the probability functions to be considered may be delta distributions.)

---
¹ Much in the way the intersection of fuzzy sets is an analogue of the intersection of sets.

3.1.2 What we Shall Do in this Chapter

The probabilities we are going to formalize below are intended to be used as mathematical models for different real-world situations. The two more common situations are as follows.

Consider a physical process producing outcomes that, when analyzed from a certain point of view, look random (in the intuitive sense of the word, without our needing to enter into any metaphysical discussion about the reality of the apparent randomness). Then a probabilistic (mathematical) model can be used to represent the physical process. The better the probabilistic model, the better the predictions that can be made about some outcomes (unknown or yet to materialize). Should we have access to a potentially infinite sequence of outcomes, the best probabilistic model would simply be obtained by taking the limit of properly defined histograms.

As a second example, one may wish to use probabilities to represent imperfect states of information. In particular, any measurement of a physical quantity has attached uncertainties, and the result of a measurement is well expressed as a probability distribution on the possible values of the quantity being measured. Typically, this probability distribution results in part from statistical considerations, and, in part, from a more or less subjective estimation of other uncertainties not amenable to statistical treatment (note: mention here ISO). An extreme case of this is when the measurement is, in fact, only an educated guess (as when using some forms of a priori information in a Bayesian reasoning).

To fix ideas, let us start with the first situation, when we use probabilities to model some random process. The outcomes could be produced by some physical process, but we will more simply assume that they are generated by some random algorithm (practical algorithms are, in fact, pseudo-random). With this situation in mind, it will be possible to illustrate the basic ideas of the chapter, without the need to immediately enter into all the technicalities that will later be necessary. In the first part of this discussion, let us assume that the outcomes are elements of a discrete set (and, to make the discussion simpler, consider that the number of elements is finite). In the second part, each outcome will be a point on a manifold, and this will require some extra structure that is better not to introduce yet.

Evaluating a Probability (Intuitive Introduction)

Consider, then, a discrete and finite set, say $A_0$, with elements $\{a_1, a_2, \dots, a_N\}$, and some random algorithm that produces, on demand, sample elements of $A_0$. For instance, if the algorithm is run $n$ times, we may successively obtain $a_7, a_1, a_7, a_7, a_3, \dots$ Alternatively, we may have $n$ copies of the same algorithm (with different random seeds) and run each algorithm once. When the two situations are indistinguishable, we say that the sample elements are independent.

We can use the random algorithm to generate $n$ sample elements, and denote $k_i$ the number of times the element $a_i$ has materialized. The experimental frequency

of $a_i$ is defined as $f(a_i; n) = k_i / n$. If, when $n \to \infty$, the experimental frequencies $f(a_i; n)$ converge to some definite values, say $p(a_i)$, then we say that the probability of element $a_i$ is $p(a_i)$, and we write

$$ p(a_i) = \lim_{n \to \infty} f(a_i; n) = \lim_{n \to \infty} \frac{k_i}{n} \ . \qquad (3.1) $$

Let $A$ be a subset of $A_0$. The probability of the set $A$, denoted $P[A]$, is defined, for obvious reasons, as

$$ P[A] = \sum_{a \in A} p(a) \ , \qquad (3.2) $$

where $a$ denotes a generic element of $A_0$. One has $P[\varnothing] = 0$ and $P[A_0] = 1$. The function $P$ that to any set $A \subseteq A_0$ associates the value $P[A]$ is called a probability function (or, for short, a probability), while the number $P[A]$ is the probability value of the set $A$. For any $a \in A_0$, $p(a)$ is the elementary probability value of the element $a$.

Image of a Probability

Let $A_0$ be a discrete and finite set with elements $\{a_1, a_2, \dots, a_N\}$, $B_0$ a discrete and finite set with elements $\{b_1, b_2, \dots, b_N\}$, and $\varphi$ a mapping from $A_0$ into $B_0$. For any $a \in A_0$, we write $a \mapsto b = \varphi(a)$. To any probability $P$ over $A_0$ we can associate a (unique) probability over $B_0$, that we shall call the image of $P$, and shall denote $\varphi[P]$, as follows. Let $\{a, a', a'', a''', \dots\}$ be a set of elements of $A_0$ that are sample elements obtained from the probability $P$. We can then consider the set $\{\varphi(a), \varphi(a'), \varphi(a''), \varphi(a'''), \dots\}$ of elements of $B_0$. By definition, this shall be a set of sample elements of the probability $\varphi[P]$, image of the probability $P$ via the mapping $\varphi$. As shown below, if $p(\,\cdot\,)$ is the elementary probability associated with the probability $P$, and $q(\,\cdot\,)$ is the elementary probability associated with $Q = \varphi[P]$, then, for any element $b \in B_0$,

$$ q(b) = \sum_{\text{all } a \text{ such that } \varphi(a) = b} p(a) \ . \qquad (3.3) $$
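The two notions just introduced are easy to simulate. The sketch below (with arbitrary numerical values) checks both the frequency limit 3.1 and the fact, stated above, that mapping sample elements of $P$ through $\varphi$ produces sample elements of $\varphi[P]$, with the elementary probabilities of equation 3.3:

```python
import numpy as np

# Frequencies k_i/n converge to p(a_i) (equation 3.1), and the histogram of
# the mapped samples phi(a) approaches q(b) = sum over a with phi(a)=b of p(a)
# (equation 3.3).
rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])                # p(a_1), p(a_2), p(a_3), arbitrary
phi = np.array([0, 0, 1])                    # phi(a_1) = phi(a_2) = b_1, phi(a_3) = b_2

for n in (1_000, 1_000_000):
    a = rng.choice(3, size=n, p=p)           # n independent sample elements
    print(n, np.bincount(a, minlength=3) / n)    # frequencies approach p

b = phi[a]                                   # sample elements of phi[P]
print(np.bincount(b, minlength=3) / b.size)  # approaches [p(a1)+p(a2), p(a3), 0]
```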

Intersection of Probabilities

Given a set $A_0$, we can consider different probabilities over $A_0$. For instance, $P_1$ may be the probability function that to any $A \subseteq A_0$ associates the probability value $P_1[A]$, while $P_2$ may be the probability function that to any $A \subseteq A_0$ associates the probability value $P_2[A]$. When assimilating a probability to a random algorithm, having two probabilities means, of course, having two random algorithms.

Assume that we play the following game. We generate a pair of elements of $A_0$, one using the random algorithm $P_1$, and one using the random algorithm $P_2$. These two elements may be distinct, or they may, in fact, be the same element. When they are distinct we ignore them, and when it is the same element, we keep it. We can then make the histogram of the elements obtained in that way (the histogram is made as outlined above). This histogram will, in the limit when the number of elements tends to infinity, define a new probability over $A_0$, that we shall denote

$$ P_3 = P_1 \cap P_2 \qquad (3.4) $$

and call the intersection of the two probabilities $P_1$ and $P_2$. We shall see below that this operation deserves the name of intersection because it generalizes the notion of intersection of sets. We shall also see that the intersection of two probabilities can easily be expressed in terms of the associated elementary probabilities: one has, for any $a \in A_0$,

$$ p_3(a) = \frac{1}{\nu}\ p_1(a)\ p_2(a) \ , \qquad (3.5) $$

where $\nu$ is the normalization constant $\nu = \sum_{a \in A_0} p_1(a)\, p_2(a)$.
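The game just described is straightforward to play numerically. This sketch (with arbitrary elementary probabilities) keeps the coinciding pairs and compares their histogram with the prediction of equation 3.5:

```python
import numpy as np

# The intersection game: draw one element with each algorithm, keep it only
# when the two draws coincide; the histogram of the kept elements approaches
# p_3(a) = p_1(a) p_2(a) / nu (equation 3.5).
rng = np.random.default_rng(1)
p1 = np.array([0.5, 0.3, 0.2])
p2 = np.array([0.1, 0.3, 0.6])
n = 1_000_000
a1 = rng.choice(3, size=n, p=p1)
a2 = rng.choice(3, size=n, p=p2)
kept = a1[a1 == a2]                          # same element drawn twice: keep it
print(np.bincount(kept, minlength=3) / kept.size)   # experimental histogram
print(p1 * p2 / np.sum(p1 * p2))             # predicted p_3
```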

Reciprocal Image of a Probability

While the image of a probability generalizes, in some sense, the notion of image of a set, we wish now to generalize for probabilities the notion of reciprocal image of a set. So, let $A_0$ be a discrete and finite set with elements $\{a_1, a_2, \dots, a_N\}$, $B_0$ a discrete and finite set with elements $\{b_1, b_2, \dots, b_N\}$, and $\varphi$ a mapping from $A_0$ into $B_0$. For any $a \in A_0$, we write $a \mapsto b = \varphi(a)$. To any probability $Q$ over $B_0$ we wish to associate a (unique) probability over $A_0$, that we shall call the reciprocal image of $Q$, and shall denote $\varphi^{-1}[Q]$. The way is not obvious. For we may just remember two of the properties relating images and reciprocal images of sets (see equations 1.26 and 1.27):

$$ A \subseteq \varphi^{-1}[\,\varphi[A]\,] \quad ; \quad \varphi[\,\varphi^{-1}[B]\,] \subseteq B \ . \qquad (3.6) $$

(Is this useful?) Given $Q$, there are many $P$'s such that $Q = \varphi[P]$. Then we sample $A_0$ homogeneously, we sample $B_0$ following $Q$, and we keep ... when ... This defines $\varphi^{-1}[Q]$. The formula is ... Bla, bla, bla.

The Popperian Problem

Let $A_0$ be the space of possible models, $B_0$ the space of possible observables, and $\varphi$ a mapping from $A_0$ into $B_0$ (the prediction mapping). We start with the a priori information that the true model belongs to $A \subseteq A_0$:

$$ a \in A \subseteq A_0 \ . \qquad (3.7) $$

If a measurement gives us the information that the true observable belongs to $B \subseteq B_0$,

$$ b \in B \subseteq B_0 \ , \qquad (3.8) $$

then, as, necessarily, the true model must satisfy

$$ b = \varphi(a) \ , \qquad (3.9) $$

we have

$$ a \in A \cap \varphi^{-1}[B] \ . \qquad (3.10) $$

As $A \cap \varphi^{-1}[B] \subseteq A$, we may have actually reduced the domain where the true model can be. All the models, if any, that were in $A$ but that are not in $A \cap \varphi^{-1}[B]$ have been falsified. By the way, as shown elsewhere in this text, one also has

$$ b \in \varphi[\, A \cap \varphi^{-1}[B]\,] = \varphi[A] \cap B \ . \qquad (3.11) $$

As $\varphi[A] \cap B \subseteq B$, we may have actually reduced the domain where the true observable can be. All the observables, if any, that were in $B$ but that are not in $\varphi[A] \cap B$ have also been falsified.

We can imagine many algorithms that actually find the set $A \cap \varphi^{-1}[B]$ and the set $\varphi[A] \cap B$. The simplest consists in visiting all the $a \in A$, in evaluating $b = \varphi(a)$ for all of them, and in keeping the pair $\{a, b\}$ only if $b \in B$. The set of all the $a$ that have been kept is $A \cap \varphi^{-1}[B]$, while the set of all the $b$ that have been kept is $\varphi[A] \cap B$. But we need to consider another algorithm, much less efficient, but that opens the way to probabilization. Bla, bla, bla. We have a probability $P$ over $A_0$, a probability $Q$ over $B_0$, and a mapping $\varphi$ from $A_0$ into $B_0$.

3.1.3 A Collection of Formulas; I: Discrete Probabilities

Image of a Probability Function: $Q = \varphi[P]$

$$ q(b) = \sum_{a\,:\,\varphi(a)=b} p(a) \qquad (3.12) $$

Intersection of Two Probability Functions: $P_3 = P_1 \cap P_2$

$$ p_3(a) = \frac{1}{\nu}\ p_1(a)\ p_2(a) \qquad (3.13) $$

Reciprocal Image of a Probability Function: $P = \varphi^{-1}[Q]$

$$ p(a) = \frac{1}{\nu}\ q(\,\varphi(a)\,) \qquad (3.14) $$

Bayes-Popper Inference: $\varphi[\, P \cap \varphi^{-1}[Q]\,] = \varphi[P] \cap Q$

Expression of the term $P_2 = P_1 \cap \varphi^{-1}[Q_1]$:

$$ p_2(a) = \frac{1}{\nu}\ p_1(a)\ q_1(\,\varphi(a)\,) \qquad (3.15) $$

Expression of the term $Q_2 = \varphi[\, P_1 \cap \varphi^{-1}[Q_1]\,] = \varphi[P_1] \cap Q_1$:

$$ q_2(b) = \frac{1}{\nu} \Big( \sum_{a\,:\,\varphi(a)=b} p_1(a) \Big)\ q_1(b) \qquad (3.16) $$

Note: $q_2$ is both the intersection of the image of $p_1$ with $q_1$, and the image of $p_2$.

Marginals of a Conditional:

$$ p_2(a) = \frac{1}{\nu}\ p_1(a)\ q_1(\,\varphi(a)\,) \qquad (3.17) $$

$$ q_2(b) = \frac{1}{\nu} \Big( \sum_{a\,:\,\varphi(a)=b} p_1(a) \Big)\ q_1(b) \qquad (3.18) $$

Bayes-Popper and the marginals of the conditional are identical.
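For the discrete case, the four operations above fit in a few lines of code. The sketch below (hypothetical helper names, with the mapping $\varphi$ stored as an integer array) implements equations 3.12 to 3.16; the example values reproduce the three-element mapping used in the figures of the following sections:

```python
import numpy as np

def image(p, phi, nb):                 # q(b) = sum_{a: phi(a)=b} p(a)   (3.12)
    return np.bincount(phi, weights=p, minlength=nb)

def intersection(p1, p2):              # p3(a) = p1(a) p2(a) / nu        (3.13)
    p3 = p1 * p2
    return p3 / p3.sum()

def reciprocal_image(q, phi):          # p(a) = q(phi(a)) / nu           (3.14)
    p = q[phi]
    return p / p.sum()

def bayes_popper(p1, q1, phi, nb):     # equations 3.15 and 3.16
    p2 = intersection(p1, reciprocal_image(q1, phi))
    return p2, image(p2, phi, nb)

# the mapping of figures 3.2 and 3.3: phi(a1) = phi(a2) = b1, phi(a3) = b2
phi = np.array([0, 0, 1])
p = np.array([0.2, 0.3, 0.5])
q = np.array([0.6, 0.3, 0.1])
print(image(p, phi, 3))                # [0.5, 0.5, 0.0]
print(reciprocal_image(q, phi))        # [0.4, 0.4, 0.2]
print(bayes_popper(p, q, phi, 3))
```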

3.1.4 A Collection of Formulas; II: Probabilities over Manifolds

Image of a Probability Function: $Q = \varphi[P]$

This is the only situation where we do not need any special structure over the manifolds (volume or metric). In terms of probability densities, one has²

$$ \bar g(y) = \sum_{x\,:\,\varphi(x)=y} \frac{\bar f(x)}{\sqrt{\det\big( \Phi(x)^t\ \Phi(x) \big)}} \ . \qquad (3.19) $$

If the manifolds are metric manifolds³, one can also express the image of a probability function in terms of volumetric probabilities:

$$ g(Q) = \sum_{P\,:\,\varphi(P)=Q} f(P)\ \frac{\sqrt{\det \gamma(P)}}{\sqrt{\det\big( \Phi(P)^t\ \Gamma(\,\varphi(P)\,)\ \Phi(P) \big)}} \ . \qquad (3.20) $$

Intersection of Two Probability Functions: $P_3 = P_1 \cap P_2$

$$ f_3(P) = \frac{1}{\nu}\ f_1(P)\ f_2(P) \qquad (3.21) $$

Equivalently, in terms of probability densities,

$$ \bar f_3(x) = \frac{1}{\nu}\ \frac{\bar f_1(x)\ \bar f_2(x)}{\bar v_x(x)} \ . \qquad (3.22) $$

Reciprocal Image of a Probability Function: $P = \varphi^{-1}[Q]$

$$ f(P) = \frac{1}{\nu}\ g(\,\varphi(P)\,) \qquad (3.23) $$

Equivalently, in terms of probability densities,

$$ \bar f(x) = \frac{1}{\nu}\ \bar g(\,\varphi(x)\,)\ \frac{\bar v_x(x)}{\bar v_y(\,\varphi(x)\,)} \ . \qquad (3.24) $$

---
² This equation corresponds to the case when the dimension of the departure manifold is smaller than or equal to the dimension of the arrival manifold. See the text for the other situation.
³ If the manifolds are only volume manifolds, it is better to write this making explicit the volume densities:
$$ g(Q) = \sum_{P\,:\,\varphi(P)=Q} f(P)\ \frac{\bar v_{\mathcal M}(P)}{\bar v_{\mathcal N}(\,\varphi(P)\,)\ \sqrt{\det\big( \Phi(P)^t\ \Phi(P) \big)}} \ . $$

Bayes-Popper Inference: $\varphi[\, P \cap \varphi^{-1}[Q]\,] = \varphi[P] \cap Q$

Expression of the term $P_2 = P_1 \cap \varphi^{-1}[Q_1]$:

$$ f_2(P) = \frac{1}{\nu}\ f_1(P)\ g_1(\,\varphi(P)\,) \qquad (3.25) $$

Equivalently, in terms of probability densities,

$$ \bar f_2(x) = \frac{1}{\nu}\ \frac{\bar f_1(x)\ \bar g_1(\,\varphi(x)\,)}{\bar v_y(\,\varphi(x)\,)} \ . \qquad (3.26) $$

Expression of the term $Q_2 = \varphi[\, P_1 \cap \varphi^{-1}[Q_1]\,] = \varphi[P_1] \cap Q_1$:

$$ g_2(Q) = \frac{1}{\nu} \Big( \sum_{P\,:\,\varphi(P)=Q} f_1(P)\ \frac{\sqrt{\det \gamma(P)}}{\sqrt{\det\big( \Phi(P)^t\ \Gamma(Q)\ \Phi(P) \big)}} \Big)\ g_1(Q) \ . \qquad (3.27) $$

Equivalently, in terms of probability densities,

$$ \bar g_2(Q) = \frac{1}{\nu} \Big( \sum_{P\,:\,\varphi(P)=Q} \frac{\bar f_1(P)}{\sqrt{\det\big( \Phi(P)^t\ \Gamma(Q)\ \Phi(P) \big)}} \Big)\ \bar g_1(Q) \ . \qquad (3.28) $$

Note: $g_2$ is both the intersection of the image of $f_1$ with $g_1$, and the image of $f_2$.

Note: although I have chosen to write equations 3.27 and 3.28 in the typical case when the manifolds have a metric, the only structure required for defining the image of a probability, the reciprocal image of a probability, and the intersection of probabilities is the volume. Therefore these two equations are expressible in terms of the volume densities in each of the two manifolds:

$$ g_2(Q) = \frac{1}{\nu} \Big( \sum_{P\,:\,\varphi(P)=Q} f_1(P)\ \frac{\bar v_{\mathcal M}(P)}{\sqrt{\det\big( \Phi(P)^t\ \Phi(P) \big)}} \Big)\ \frac{g_1(Q)}{\bar v_{\mathcal N}(Q)} \quad ; \quad \bar g_2(Q) = \frac{1}{\nu} \Big( \sum_{P\,:\,\varphi(P)=Q} \frac{\bar f_1(P)}{\sqrt{\det\big( \Phi(P)^t\ \Phi(P) \big)}} \Big)\ \frac{\bar g_1(Q)}{\bar v_{\mathcal N}(Q)} \qquad (3.29) $$

Marginals of a Conditional

Note: see formulas in figure 3.20.

$$ f_2(P) = \frac{1}{\nu}\ f_1(P)\ g_1(\,\varphi(P)\,)\ \sqrt{ \frac{\det\big( \gamma(P) + \Phi(P)^t\ \Gamma(\,\varphi(P)\,)\ \Phi(P) \big)}{\det \gamma(P)} } \qquad (3.30) $$

Equivalently, in terms of probability densities,

$$ \bar f_2(x) = \frac{1}{\nu}\ \bar f_1(x)\ \bar g_1(\,\varphi(x)\,)\ \sqrt{ \frac{\det\big( \gamma(x) + \Phi(x)^t\ \Gamma(\,\varphi(x)\,)\ \Phi(x) \big)}{\det \gamma(x)\ \det \Gamma(\,\varphi(x)\,)} } \ . \qquad (3.31) $$

The other marginal is

$$ g_2(Q) = \frac{1}{\nu} \Big( \sum_{P\,:\,\varphi(P)=Q} f_1(P)\ \sqrt{ \frac{\det\big( \gamma(P) + \Phi(P)^t\ \Gamma(Q)\ \Phi(P) \big)}{\det\big( \Phi(P)^t\ \Gamma(Q)\ \Phi(P) \big)} } \Big)\ g_1(Q) \ . \qquad (3.32) $$

Equivalently, in terms of probability densities,

$$ \bar g_2(y) = \frac{1}{\nu} \Big( \sum_{x\,:\,\varphi(x)=y} \bar f_1(x)\ \sqrt{ \frac{\det\big( \gamma(x) + \Phi(x)^t\ \Gamma(y)\ \Phi(x) \big)}{\det \gamma(x)\ \det\big( \Phi(x)^t\ \Gamma(y)\ \Phi(x) \big)} } \Big)\ \bar g_1(y) \ . \qquad (3.33) $$

Comparison Between Bayes-Popper and Marginal of the Conditional

For Bayes-Popper, one has

$$ f_2(P) = \frac{1}{\nu}\ f_1(P)\ g_1(\,\varphi(P)\,) \quad ; \quad g_2(Q) = \frac{1}{\nu} \Big( \sum_{P\,:\,\varphi(P)=Q} f_1(P)\ \frac{\bar v_{\mathcal M}(P)}{\sqrt{\det( \Phi(P)^t\, \Phi(P) )}} \Big)\ \frac{g_1(Q)}{\bar v_{\mathcal N}(Q)} = \frac{1}{\nu} \Big( \sum_{P\,:\,\varphi(P)=Q} f_1(P)\ \frac{\sqrt{\det \gamma(P)}}{\sqrt{\det( \Phi(P)^t\ \Gamma(Q)\ \Phi(P) )}} \Big)\ g_1(Q) \ , \qquad (3.34) $$

while for the marginal of the conditional, one has

$$ f_2(P) = \frac{1}{\nu}\ f_1(P)\ g_1(\,\varphi(P)\,)\ \sqrt{ \frac{\det( \gamma(P) + \Phi(P)^t\ \Gamma(\,\varphi(P)\,)\ \Phi(P) )}{\det \gamma(P)} } \quad ; \quad g_2(Q) = \frac{1}{\nu} \Big( \sum_{P\,:\,\varphi(P)=Q} f_1(P)\ \sqrt{ \frac{\det( \gamma(P) + \Phi(P)^t\ \Gamma(Q)\ \Phi(P) )}{\det( \Phi(P)^t\ \Gamma(Q)\ \Phi(P) )} } \Big)\ g_1(Q) \ . \qquad (3.35) $$

Besides the mathematical rigor, there is an obvious nice property of the Bayes-Popper way of reasoning: to evaluate the important volumetric probability, $f_2(P)$, no knowledge of the derivatives is needed (compare the first of equations 3.34 with the first of equations 3.35). In particular, a Monte Carlo sampling method will require (many) evaluations of $\varphi(P)$ (i.e., resolutions of the forward modeling problem), but no evaluation of the derivatives $\Phi(P)$. Now, although the analytical expression for $g_2(Q)$ (second of equations 3.34) does contain the derivatives, we do not need to use this analytical expression for sampling: the images of the samples of $f_2(P)$ (obtained in the way just mentioned) are samples of $g_2(Q)$. So, the samples of $g_2(Q)$ are obtained as a by-product of the process of sampling $f_2(P)$.

3.2 Probabilities

3.2.1 Basic Definitions

There are different ways of introducing the notion of probability. One may, for instance, introduce the Kolmogorov axioms (see, for instance, Kolmogorov, 1950), or follow Jaynes' (2003) ideas. One could also start from the notion of random algorithm (i.e., of algorithm that produces random outputs), and obtain as properties the usual axioms of the theory. I choose to start the theory using (a simplified version of) the Kolmogorov axioms.

Let us, then, consider a non-empty set $\Omega$ and its power set $\wp(\Omega)$ (the set containing all the subsets of $\Omega$). If the set $\Omega$ has a finite number of elements, or if the elements are numerable, then one can consider the probability of any set $A \in \wp(\Omega)$. This is not as simple when $\Omega$ is not numerable, as, then, there are in $\wp(\Omega)$ sets so complicated that they are not measurable. This is why, as we did when introducing a measure over a set, we must restrict our consideration to a suitably chosen subset of $\wp(\Omega)$, say $\mathcal F \subseteq \wp(\Omega)$, that must be a σ-field, this implying, in particular, that $\mathcal F$ contains the empty set, $\varnothing$, and contains the whole set, $\Omega$; that if a set belongs to $\mathcal F$, then the complement of the set also belongs to $\mathcal F$; and that the union and the intersection of any finite or countably infinite sequence of sets of $\mathcal F$ belong to $\mathcal F$.

(Note: what follows is a little bit redundant with what precedes.) Typically, if $\Omega$ has a finite number of elements, or if the elements are numerable, then one chooses $\mathcal F = \wp(\Omega)$. If the elements are not numerable (for instance, when $\Omega$ is a finite-dimensional manifold) one chooses for $\mathcal F$ the Borel field of $\Omega$, that has been introduced in section 1.2.3. Remember that this is a huge set of subsets of $\Omega$, containing all the points of $\Omega$, all the open and closed sets, and all sets having any reasonable shape (in fact, all the sets to which one could unambiguously associate a volume, finite or not). Huge as this set is, it is much smaller than $\wp(\Omega)$.

Definition 3.1 Probability. Given a set $\Omega$ and chosen a set $\mathcal F$ of subsets of $\Omega$ that is a σ-field, we shall call probability function (or, when no confusion is possible, probability) a mapping $P$ from $\mathcal F$ into the real interval $[0, 1]$ such that

$$ P[\Omega] = 1 \ , \qquad (3.36) $$

and such that for any two sets of $\mathcal F$,

$$ P[A_1 \cup A_2] = P[A_1] + P[A_2] - P[A_1 \cap A_2] \ . \qquad (3.37) $$

The number $P[A]$ is called the probability value of the set $A$ (or, when no confusion is possible, the probability of the set $A$).

It follows immediately from the definition of probability that if two sets $A_1$ and $A_2$ are complementary (with respect to $\Omega$), then $P[A_2] = 1 - P[A_1]$. In particular, the probability of the empty set is zero:

$$ P[\varnothing] = 0 \ . \qquad (3.38) $$

We shall call the triplet $\{\Omega, \mathcal F, P\}$ a probability triplet. It is usually called a probability space, but we should refrain from using this terminology here: given the pair $\{\Omega, \mathcal F\}$, we shall consider below the space of all probabilities over $\{\Omega, \mathcal F\}$ (that we shall endow with an internal operation, the intersection of probabilities). So, here, what would deserve the name of probability space would be a given $\Omega$, a given σ-field $\mathcal F \subseteq \wp(\Omega)$, and the collection of all the probabilities over $\{\Omega, \mathcal F\}$. So, to avoid any confusion, we had better not use the term probability space, and call $\{\Omega, \mathcal F, P\}$ a probability triplet. Also, the set $\Omega$ is sometimes called the sample space, while the sets in $\mathcal F$ are sometimes called events. We do not need to use this terminology here⁴.

Let $\{\Omega, \mathcal F, P_1\}$ and $\{\Omega, \mathcal F, P_2\}$ be two probability triplets (i.e., let $P_1$ and $P_2$ be two possibly different probability functions defined over the same set $\mathcal F$). If for any $A \in \mathcal F$ one has

$$ P_1[A] = P_2[A] \ , \qquad (3.39) $$

then one says that $P_1$ and $P_2$ are identical, and one writes

$$ P_1 = P_2 \ . \qquad (3.40) $$

Definition 3.2 Conditional Probability. Let $\{\Omega, \mathcal F, P\}$ be a probability triplet, and let $C \in \mathcal F$ be a given subset of $\Omega$ with non-zero probability, $P[C] > 0$. The conditional probability of any set $A \in \mathcal F$ given the set $C$ is denoted $P[A \mid C]$, and is defined as

$$ P[A \mid C] = \frac{P[A \cap C]}{P[C]} \ . \qquad (3.41) $$

It is easy to see that, for any given $C$ with $P[C] > 0$, the function $A \mapsto P[A \mid C]$ is indeed a probability function (i.e., it satisfies the conditions in definition 3.1). (Note: explain briefly here the intuitive meaning of the definition of conditional probability.)

Note: write here the Bayes theorem:

$$ P[A \mid C] = \frac{P[C \mid A]\ P[A]}{P[C]} \ . \qquad (3.42) $$

---
⁴ Let us also choose to ignore what a random variable could be.

Property 3.1 If the elements of the set $\Omega$ are numerable (or if there is a finite number of them), $\Omega = \{\omega_1, \omega_2, \dots\}$, then, for any probability $P$ over $\mathcal F = \wp(\Omega)$, there exists a unique set of non-negative real numbers $\{p(\omega_1), p(\omega_2), \dots\}$, with $\sum_i p(\omega_i) = 1$, such that for any $A \subseteq \Omega$,

$$ P[A] = \sum_{\omega_i \in A} p(\omega_i) \ . \qquad (3.43) $$

It is then clear that the probability of a set containing a single element is

$$ P[\{\omega_i\}] = p(\omega_i) \ . \qquad (3.44) $$

Accordingly, we shall call the number $p(\omega_i)$ the elementary probability (or, for short, probability) of the element $\omega_i$. While the function $A \mapsto P[A]$ (defined on sets) is the probability function, we shall call the function $a \mapsto p(a)$ (defined on elements) the elementary probability function.

A look at figure 3.1 makes this property obvious. In many practical situations, one does not reason on the abstract function $P$, that associates a number to every subset, but on the collection of numbers $p_i = p(\omega_i)$, one associated to each element of the set.

Figure 3.1: From a practical point of view, defining a probability $P$ over a discrete set consists in assigning an elementary probability $p_i$ to each of the elements of the set, with $\sum_i p_i = 1$.

Note that, given a discrete set $A_0$, we can consider different elementary probability functions over $A_0$, say $p_1, p_2, \dots$ To any element $a \in A_0$ the probability function $p_1$ shall associate the probability value $p_1(a)$, the probability function $p_2$ shall associate the probability value $p_2(a)$, and so on. In a very similar way, given different sets $A_1, A_2, \dots$ (all of them subsets of $A_0$), the indicator functions associated to each set shall associate, to any element $a \in A_0$, the values $\chi_{A_1}(a), \chi_{A_2}(a), \dots$

and so on. As we call any of the $p_1(a), p_2(a), \dots$ the probability of $a$, we could call any of the $\chi_{A_1}(a), \chi_{A_2}(a), \dots$ the possibility of $a$: a possibility value of 1 meaning certainty, and a possibility value of 0 meaning impossibility.

Example 3.1 I have arbitrarily selected an element $a$ from some set $A_0$. Question: is it possible that the element $a$ belongs, in fact, to some subset $A \subseteq A_0$? Answer: the possibility that the element $a$ belongs to $A$ is $\chi_A(a)$, i.e., if $\chi_A(a) = 1$ it is certain that $a$ belongs to $A$, while if $\chi_A(a) = 0$ it is impossible that $a$ belongs to $A$.

We thus see that the notion of possibility of an element $a$ bears some resemblance to the notion of probability of an element $a$. While the possibility of an element can be 0 or can be 1, the probability of an element can be any number in the interval $[0, 1]$, so, in some loose sense, probabilities are a generalization of indicators (i.e., of possibilities). (In that sense, a probability function is a fuzzy set.) The following table displays the correspondences between sets and probability functions (the table is for probabilities over a discrete set, and a similar table is possible for probabilities over a manifold):

    set theory                                  probability theory
    ------------------------------------------  ------------------------------------------
    sets A_1, A_2, ...                          probability functions P_1, P_2, ...
    that are subsets of a set A_0               that to any A of A_0 associate the
                                                probability values P_1[A], P_2[A], ...
    indicator functions chi_{A_1}, chi_{A_2},   elementary probability functions p_1, p_2, ...
    that to any a of A_0 associate the          that to any a of A_0 associate the elementary
    indicator values chi_{A_1}(a), ...          probability values p_1(a), p_2(a), ...

This table suggests that when, in the following pages, we generalize to probability functions some of the notions usually introduced for sets (intersection of two probability functions, image of a probability function [via a mapping], reciprocal image of a probability function), we should compare the formulas obtained for elementary probability functions (or for volumetric probabilities, in the case of a manifold) with the formulas of set theory expressed in terms of indicator functions.

Note: explain here that introducing the elementary conditional probability $p(\omega \mid C)$ via

$$ P[A \mid C] = \sum_{\omega \in A} p(\omega \mid C) \ , \qquad (3.45) $$

one finds

$$ p(\omega \mid C) = \begin{cases} p(\omega) \,/\, \sum_{\omega' \in C} p(\omega') & \text{if } \omega \in C \\ 0 & \text{if } \omega \notin C \end{cases} \qquad (3.46) $$

Note: I have to define somewhere the notion of support of a probability function⁵.

When we have a probability triplet $\{\Omega, \mathcal F, P_1\}$, the set $\mathcal F$ will always be $\wp(\Omega)$ (the set of all the subsets of $\Omega$) if $\Omega$ has a numerable number of elements, the Borel set of $\Omega$ if $\Omega$ is a manifold, or some explicitly introduced set in the general case. So, to simplify the language, instead of saying that a probability function $P$ is defined over $\{\Omega, \mathcal F\}$, we shall sometimes say that a probability function is defined over $\Omega$.

---
⁵ Recall: if $M$ is a topological space, and $f$ a real function defined over $M$, the support of $f$, denoted $\mathrm{supp}(f)$, is the closure of the set where $f$ does not vanish (i.e., $\mathrm{supp}\, f$ is the smallest closed set such that $f$ is zero on the complement of the set). One has the following two properties:

$$ \mathrm{supp}(f\, g) \subseteq \mathrm{supp}(f) \cap \mathrm{supp}(g) \quad ; \quad \mathrm{supp}(f + g) \subseteq \mathrm{supp}(f) \cup \mathrm{supp}(g) \ . $$

(Note: I must understand why one has $\subseteq$ instead of $=$ in these two equations.) I must use this definition for probability functions, and explain that a probability function can be seen as a modulation of its support. I also have to explain that I wish the notions to be introduced below of image of a probability, of reciprocal image of a probability, and of intersection of two probabilities to generalize the equivalent notions in set theory (image of a set, reciprocal image of a set, and intersection of two sets), i.e., we must have

$$ \mathrm{supp}[\,\varphi[P]\,] = \varphi[\,\mathrm{supp}[P]\,] \quad ; \quad \mathrm{supp}[\, P_1 \cap P_2\,] = \mathrm{supp}[P_1] \cap \mathrm{supp}[P_2] \quad ; \quad \mathrm{supp}[\,\varphi^{-1}[Q]\,] = \varphi^{-1}[\,\mathrm{supp}[Q]\,] \ , $$

where, possibly, some of the signs $=$ have to be replaced by $\subseteq$.

3.2.2 Image of a Probability

Let $\varphi$ be a mapping from a set $A_0$ into a set $B_0$, and let $P$ be a probability function over $A_0$. As we shall see below, there is a unique probability function over $B_0$, denoted $\varphi[P]$, such that for any⁶ $B \subseteq B_0$,

$$ (\varphi[P])[B] = P[\,\varphi^{-1}[B]\,] \ . \qquad (3.47) $$

Definition 3.3 We shall say that $\varphi[P]$ is the image of the probability $P$ via the mapping $\varphi$.

So, in simple words, the image of a probability function (via a mapping $\varphi$) is such that the probability value of any subset $B$ (in the arrival set) equals the probability value (in the departure set) of the reciprocal image of $B$ (via the mapping $\varphi$).

Note: In set theory, one introduces the notion of image of a set. I must verify that the following property holds: the image of the support of a probability function equals the support of the image of the probability function:

$$ \mathrm{supp}[\,\varphi[P]\,] = \varphi[\,\mathrm{supp}[P]\,] \ . \qquad (3.48) $$

So, in this sense, our definition of image of a probability function is consistent with the definition of image of a set. In fact, the definition of image of a probability function can be seen as a generalization of the definition of image of a set, if a set is considered as a special case of a probability function (i.e., of a fuzzy set). There is another way to see this. The image of a set can be defined in terms of the indicator of a set (see equation 1.36) as: for any $b \in B_0$,

$$ \chi_{\varphi[A]}(b) = \chi_A(\,\varphi^{-1}[\{b\}]\,) \ , \qquad (3.49) $$

and bla, bla, bla.

Let us check that we have, indeed, defined a probability. First, for any $B \subseteq B_0$, $(\varphi[P])[B] = P[\,\varphi^{-1}[B]\,] \ge 0$. Second, for any $B_1 \subseteq B_0$ and any $B_2 \subseteq B_0$,

$$ (\varphi[P])[B_1 \cup B_2] = P[\,\varphi^{-1}[B_1 \cup B_2]\,] = P[\,\varphi^{-1}[B_1] \cup \varphi^{-1}[B_2]\,] = P[\,\varphi^{-1}[B_1]\,] + P[\,\varphi^{-1}[B_2]\,] - P[\,\varphi^{-1}[B_1 \cap B_2]\,] = (\varphi[P])[B_1] + (\varphi[P])[B_2] - (\varphi[P])[B_1 \cap B_2] \ . $$

So, that $\varphi[P]$ is a probability follows immediately from the fact that $P$ is a probability.

Taking $B = \varphi[A]$ in the definition 3.47 gives the property

$$ (\varphi[P])[\,\varphi[A]\,] = P[\,\varphi^{-1}[\,\varphi[A]\,]\,] \ . \qquad (3.50) $$

As (equation 1.26) $A \subseteq \varphi^{-1}[\,\varphi[A]\,]$, in general $P[\,\varphi^{-1}[\,\varphi[A]\,]\,] \ge P[A]$, and, therefore,

$$ (\varphi[P])[\,\varphi[A]\,] \ge P[A] \ , \qquad (3.51) $$

---
⁶ More rigorously, for any $B$ in the selected set $\mathcal F$ of subsets of $B_0$ (a σ-field).

the equality holding if the mapping $\varphi$ is injective (check!). In section 3.3.2 we shall examine what this definition implies when we consider that the sets $A_0$ and $B_0$ are manifolds (where the probability of any single point is, typically, zero). But when the two sets $A_0$ and $B_0$ are discrete, we can obtain a simple result:

Property 3.2 Let $A_0$ and $B_0$ be two discrete sets, and $\varphi$ a mapping from $A_0$ into $B_0$. As equation 3.47 must hold for any $B \subseteq B_0$, it must also hold for a set containing a single element. Therefore, for any $b \in B_0$,

$$ (\varphi[P])(b) = P[\,\varphi^{-1}[\{b\}]\,] \ . \qquad (3.52) $$

More simply, we can state this property as follows. If $p(\,\cdot\,)$ is the elementary probability associated with the probability $P$, and $\varphi[p](\,\cdot\,)$ is the elementary probability associated with $\varphi[P]$, then, for any element $b \in B_0$,

$$ (\varphi[p])(b) = \sum_{\text{all } a \text{ such that } \varphi(a) = b} p(a) \ . \qquad (3.53) $$

This, of course, just results from the definition of the reciprocal image of a set.

Example 3.2 Figure 3.2 suggests a discrete set $A_0 = \{a_1, a_2, a_3\}$, a discrete set $B_0 = \{b_1, b_2, b_3\}$, and a mapping $\varphi$ from $A_0$ into $B_0$. To every probability $P$ defined over $A_0$ one can associate its image $Q = \varphi[P]$ (defined over $B_0$) as just introduced. Given the mapping suggested in the figure, one obtains (using equation 3.52)

$$ Q(b_1) = P(a_1) + P(a_2) \quad ; \quad Q(b_2) = P(a_3) \quad ; \quad Q(b_3) = 0 \ . \qquad (3.54) $$

Note that from $P(a_1) + P(a_2) + P(a_3) = 1$ it follows $Q(b_1) + Q(b_2) + Q(b_3) = 1$.

Note: I have to demonstrate here the following property.

Property 3.3 Let $\varphi$ be a mapping from a set $A_0$ into a set $B_0$, and let $P$ be a probability over $A_0$. If $\{a_1, a_2, \dots\}$ is a sample of $P$, then $\{\varphi(a_1), \varphi(a_2), \dots\}$ is a sample of $\varphi[P]$.

This property has both a conceptual and a practical importance. Conceptually, because it gives an intuitive understanding of the notion of transport of a probability function. Practically, because, to obtain a sample of $\varphi[P]$, one should not try to develop an ad hoc method based on the explicit expressions associated to $\varphi[P]$ (like the expression for $\varphi[p]$ in equation 3.53). One should rather just obtain a sample of $P$, and then just map the sample as suggested by property 3.3. (Note: is the image of a σ-field a σ-field?) (Note: is the reciprocal image of a σ-field a σ-field?)

Figure 3.2: The image of a probability via a mapping. With $\varphi(a_1) = \varphi(a_2) = b_1$ and $\varphi(a_3) = b_2$: the values $p(a_1), p(a_2), p(a_3)$ are given, and $q(b_1) = p(a_1) + p(a_2)$, $q(b_2) = p(a_3)$, $q(b_3) = 0$.

3.2.3 Intersection of Probabilities

The notion of intersection of probabilities was introduced by Tarantola (1987). I think that some problems that are formulated using the notion of conditional probability (and Bayes theorem) are better formulated using this notion. This is true, in particular, for the so-called inverse problems; see an example in section XXX.

In few words, the notion of intersection of sets plays a major role when formulating problems in terms of sets. As far as one can see a probability as a generalization of (the indicator of) a set, it is natural to ask how the intersection of sets generalizes when dealing with probabilities. Of course, there will be strong similarities between the intersection of probabilities and the intersection of fuzzy sets (Zadeh, 1965), but the final equations are not quantitatively equivalent, and the domains of application of the two definitions are quite different.

Note: explain here that the intersection of probability functions can only be defined if a given probability function (perhaps an improper one, i.e., a measure function) $H$ has been chosen beforehand, such that all other probability functions are absolutely continuous with respect to it. This special probability function is named homogeneous (note: explain why it is called homogeneous). Typically, when working with discrete sets, the (homogeneous) measure of a set will simply correspond to the number of elements of the set, while when working with manifolds, the measure of a set will result from an ad hoc volume element of the manifold (see examples XXX and XXX).

Note: the tactic I am going to follow here is to define the intersection as an internal operation between (normalized) probabilities. As the homogeneous measure must be a probability, this assumes that the measure of the whole set is finite (I call it the homogeneous probability, and represent it by the letter $H$). Once the final formulas have been obtained, both for discrete sets and for manifolds, we extend the definition, when it makes sense, to sets with infinite measure.

Note: be careful, for the intersection of two probabilities to be defined, the intersection of their supports must be non-empty.

Definition 3.4 Intersection of probabilities. Let a volume triplet $T = (\Omega, \mathcal F, V)$ be given. We can consider the space of all probabilities that can be built on the triplet $T$. We endow this space of all the probabilities with an internal operation, called the intersection, and denoted with the symbol $\cap$, defined by the following conditions:

the operation is commutative, i.e., for any two probability functions,

$$ P_1 \cap P_2 = P_2 \cap P_1 \ ; \qquad (3.55) $$

the operation is associative, i.e., for any three probability functions,

$$ (P_1 \cap P_2) \cap P_3 = P_1 \cap (P_2 \cap P_3) \ ; \qquad (3.56) $$

the homogeneous probability function $H$ is a neutral element of the operation, i.e., for any probability $P$,

$$ P \cap H = H \cap P = P \ ; \qquad (3.57) $$

for whatever $A \in \mathcal F$,

$$ P_1[A] = 1 \implies (P_1 \cap P_2)[A] = 1 \ ; \qquad (3.58) $$

and for whatever $A \in \mathcal F$,

$$ P_1[A] = 0 \implies (P_1 \cap P_2)[A] = 0 \ . \qquad (3.59) $$

Note: the last condition implies that the intersection $P_1 \cap P_2$ is absolutely continuous with respect to both $P_1$ and $P_2$.

Note: The examples below demonstrate that there is at least one solution to the previous set of conditions. It remains to prove that this solution is unique⁷.

Note: I still don't know if I can prove the relation

$$ \mathrm{supp}[\, P_1 \cap P_2\,] = \mathrm{supp}[P_1] \cap \mathrm{supp}[P_2] \qquad (3.60) $$

or if I have to add it as an axiom.

Property 3.4 Assume that the set $\Omega$ is discrete, and that we choose $\mathcal F = \wp(\Omega)$. Let $P_1$ and $P_2$ be two probabilities, with elementary probabilities respectively denoted $p_1$ and $p_2$. The elementary probability representing $P_1 \cap P_2$, that we may denote $p_1 \cap p_2$, is given, for any element $\omega \in \Omega$, by

$$ (p_1 \cap p_2)(\omega) = \frac{1}{\nu}\ p_1(\omega)\ p_2(\omega) \ , \qquad (3.61) $$

where $\nu$ is the normalization factor $\nu = \sum_{\omega \in \Omega} p_1(\omega)\, p_2(\omega)$. It is obvious that with the expression above, the conditions above are satisfied. For the expression of the intersection of probability functions defined over manifolds, see property 3.10.

---
⁷ For the time being, I have taken the simple example of a set with only two elements. Denoting by $f(x, y)$ the formula that, in this simple example, expresses the intersection, the axioms impose the following conditions: $f(x, y) = f(y, x)$, $f(0, x) = 0$, $f(1/2, x) = x$, $f(1, x) = 1$, $f(x, f(y, z)) = f(f(x, y), z)$. One solution of this is the right expression, $f(x, y) = x\, y \,/\, (1 - x - y + 2\, x\, y)$, but I don't know yet if there are other solutions.
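The candidate formula of footnote 7 is easy to test numerically. The sketch below checks, at arbitrarily chosen values, the five conditions listed there (note that $1 - x - y + 2xy = xy + (1-x)(1-y)$, so the formula is just the normalized product of equation 3.61 on a two-element set):

```python
# Check of f(x, y) = x y / (1 - x - y + 2 x y) against the conditions of
# footnote 7, at arbitrary test values (x is the elementary probability of
# the first of the two elements).
f = lambda x, y: x * y / (1.0 - x - y + 2.0 * x * y)

x, y, z = 0.3, 0.7, 0.45
assert abs(f(x, y) - f(y, x)) < 1e-12                 # commutativity
assert f(0.0, x) == 0.0                               # f(0, x) = 0
assert abs(f(0.5, x) - x) < 1e-12                     # f(1/2, x) = x (H is neutral)
assert f(1.0, x) == 1.0                               # f(1, x) = 1
assert abs(f(x, f(y, z)) - f(f(x, y), z)) < 1e-12     # associativity
print("all five conditions hold at the tested values")
```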

As we have expressed the intersection of two probabilities in terms of elementary probabilities (for discrete sets) or in terms of volumetric probabilities (for manifolds), and as these are positive, normalized functions, it is clear that the intersection of two probabilities is, indeed, a probability (the operation is internal).

Note: warn here the reader that the expression 3.153 is not valid if, instead of using volumetric probabilities, one uses probability densities. See appendix ??.

So we see that the intersection of two probabilities is defined in terms of the product of the elementary probabilities (or the volumetric probabilities). At this point we may remember the second of equations 1.8, defining the intersection of sets in terms of their indicators: for any $\omega \in \Omega$, we had

$$ \chi_{A_1 \cap A_2}(\omega) = \min\{\, \chi_{A_1}(\omega),\ \chi_{A_2}(\omega)\, \} = \chi_{A_1}(\omega)\ \chi_{A_2}(\omega) \ . \qquad (3.62) $$

The second of these expressions is similar to the two equations 3.61 and 3.153, except for the normalization factor, that makes no sense for indicators. Note that the intersection of fuzzy sets is, rather, defined by the expression in the middle of 3.62. An expression like this one is forbidden to us, because of the third of the conditions in definition 3.4 (that the homogeneous probability is neutral for the intersection operation).

(Note: I have moved the definition of the union of probabilities to the appendices [appendix 5.3.5, page 194].)

3.2.4 Reciprocal Image of a Probability

Let $\varphi$ be a mapping from a set $A_0$ into a set $B_0$, and let $Q$ be a probability over $B_0$. How should we define $\varphi^{-1}[Q]$, the reciprocal image of $Q$? A look at figure 3.3 suggests that there is no obvious way to use the mapping $\varphi$ to infer the probability $\varphi^{-1}[Q]$ (contrary to what happens in figure 3.2, where the inference is obvious [for instance, via property 3.3, page 74]). As we have already defined the intersection of two probabilities (see section 3.2.3), and we have seen that the intersection of probabilities is a generalization of the notion of intersection of sets, we shall define the reciprocal image of a probability by imposing that the equivalent of equation 1.30 (reproduced in equation 3.69 here below), that is valid for images, reciprocal images, and intersections of sets, remains valid for images, reciprocal images, and intersections of probability functions.

Note: I guess that the right definition is as follows. We consider a mapping from a set $A_0$ into a set $B_0$. The mapping is arbitrary, in the sense that it is not assumed to be injective or to be surjective. Given a subset $A \subseteq A_0$ one can always (check!) consider the decomposition of $A$ into a collection of disjoint subsets $A_i$ such that

$$ A = \cup_i\, A_i \ , \qquad (3.63) $$

and such that the mapping from each of the $A_i$ into $B_0$ is injective [as an example, draw here a cubic function].

Definition 3.5 Then, given an arbitrary mapping $\varphi$ from a set $A_0$ into a set $B_0$, and given a probability $Q$ over $B_0$, there is a unique probability over $A_0$, that we shall denote $\varphi^{-1}[Q]$ and we shall call the reciprocal image of $Q$, defined by the condition that for any $A \subseteq A_0$,

$$ (\varphi^{-1}[Q])[A] = (\varphi^{-1}[Q])[\,\cup_i A_i\,] = \frac{1}{\nu} \sum_i Q[\,\varphi[A_i]\,] \ , \qquad (3.64) $$

where $A = \cup_i A_i$ is the partition introduced above, and where $\nu$ is the normalization constant (independent of $A$) ensuring that $(\varphi^{-1}[Q])[A_0] = 1$. [I have to demonstrate here that I am indeed defining a probability over $A_0$ (i.e., that the function $A \mapsto (\varphi^{-1}[Q])[A]$ so introduced satisfies the Kolmogorov axioms).]

Example 3.3 If the sets $A_0$ and $B_0$ are discrete, then the elementary probability of any element $a \in A_0$ is given by

$$ (\varphi^{-1}[q])(a) = \frac{1}{\nu}\ q(\,\varphi(a)\,) \ , \qquad (3.65) $$

where $\nu$ is the normalization factor $\nu = \sum_{a \in A_0} q(\,\varphi(a)\,)$.

See section 3.4.6 for the expression of the reciprocal image of a probability function when considering a mapping between manifolds.

Example 3.4 Figure 3.3 suggests a discrete set A₀ = {a₁, a₂, a₃}, a discrete set B₀ = {b₁, b₂, b₃}, and a mapping ϕ from A₀ into B₀. To every probability Q defined over B₀ one can associate its reciprocal image P = ϕ⁻¹[Q] as just defined. Given the mapping suggested in the figure, one obtains (using equation 3.65)

P(a₁) = Q(b₁) / ( 2 Q(b₁) + Q(b₂) )
P(a₂) = Q(b₁) / ( 2 Q(b₁) + Q(b₂) )
P(a₃) = Q(b₂) / ( 2 Q(b₁) + Q(b₂) ) .    (3.66)

Note that from Q(b₁) + Q(b₂) + Q(b₃) = 1 it follows that P(a₁) + P(a₂) + P(a₃) = 1.

[Figure 3.3: The reciprocal image of a probability via a mapping, P = ϕ⁻¹[Q]. The figure shows the sets A₀ = {a₁, a₂, a₃} and B₀ = {b₁, b₂, b₃}, the given values q(b₁), q(b₂), q(b₃), and the resulting values p(a₁) = q(b₁)/ν, p(a₂) = q(b₁)/ν, p(a₃) = q(b₂)/ν, with ν = 2 q(b₁) + q(b₂).]

Note: I have to check that the property

supp[ ϕ⁻¹[Q] ] = ϕ⁻¹[ supp[Q] ]    (3.67)

holds.
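Note: a minimal numerical sketch of equations 3.65 and 3.66 (Python; the function name and the test values are mine, for illustration only):

    # Reciprocal image of a discrete probability, eq. 3.65:
    # (phi^-1[q])(a) = q(phi(a)) / nu.
    def reciprocal_image(q, phi, A0):
        nu = sum(q[phi[a]] for a in A0)            # normalization factor
        return {a: q[phi[a]] / nu for a in A0}

    phi = {"a1": "b1", "a2": "b1", "a3": "b2"}     # the mapping of figure 3.3
    q = {"b1": 0.5, "b2": 0.3, "b3": 0.2}          # arbitrary test values
    p = reciprocal_image(q, phi, ["a1", "a2", "a3"])
    # p(a1) = p(a2) = q(b1)/(2 q(b1) + q(b2)) and p(a3) = q(b2)/(2 q(b1) + q(b2)),
    # exactly as in equation 3.66, and p is normalized:
    assert abs(sum(p.values()) - 1.0) < 1e-12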

Compatibility Property

We have introduced the notion of image of a probability, ϕ[P], of reciprocal image of a probability, ϕ⁻¹[Q], and of intersection of two probabilities, R₁ ∩ R₂. With the definitions introduced above, the following property holds.

Property 3.5 Let ϕ be a mapping from some set A₀ into some other set B₀, P be a probability function defined over A₀, and Q a probability function defined over B₀. For any mapping ϕ, and any probabilities P and Q,

ϕ[ P ∩ ϕ⁻¹[Q] ] = ϕ[P] ∩ Q .    (3.68)

This property is ALMOST demonstrated in appendix 5.3.2, where it is also demonstrated that the definition of reciprocal image of a probability introduced above is the only definition leading to this compatibility property. We may remember here the relation 1.30, demonstrated in chapter 1: when A and B are sets, ϕ[A] and ϕ⁻¹[B] respectively denote the image and the reciprocal image of a set, and C₁ ∩ C₂ denotes the intersection of two sets, one has

ϕ[ A ∩ ϕ⁻¹[B] ] = ϕ[A] ∩ B .    (3.69)
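Note: the following small Python sketch (helper names and test values are mine, not the book's) checks the compatibility property 3.5 numerically, on the discrete mapping of figure 3.3:

    # Discrete image, intersection, and reciprocal image of a probability.
    def image(p, phi, B0):                        # (phi[p])(b), section 3.2.2
        return {b: sum(v for a, v in p.items() if phi[a] == b) for b in B0}

    def intersection(p1, p2):                     # normalized product, eq. 3.61
        r = {k: p1[k] * p2[k] for k in p1}
        nu = sum(r.values())
        return {k: v / nu for k, v in r.items()}

    def reciprocal_image(q, phi, A0):             # (phi^-1[q])(a), eq. 3.65
        nu = sum(q[phi[a]] for a in A0)
        return {a: q[phi[a]] / nu for a in A0}

    A0, B0 = ["a1", "a2", "a3"], ["b1", "b2", "b3"]
    phi = {"a1": "b1", "a2": "b1", "a3": "b2"}    # the mapping of figure 3.3
    p = {"a1": 0.2, "a2": 0.5, "a3": 0.3}
    q = {"b1": 0.5, "b2": 0.3, "b3": 0.2}
    lhs = image(intersection(p, reciprocal_image(q, phi, A0)), phi, B0)
    rhs = intersection(image(p, phi, B0), q)      # the two sides of eq. 3.68
    assert all(abs(lhs[b] - rhs[b]) < 1e-12 for b in B0)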

Note: mention here figure 3.4.

[Figure 3.4: Illustration of the compatibility property, on the sets A₀ = {a₁, a₂, a₃} and B₀ = {b₁, b₂, b₃}, with given probabilities p(a₁), p(a₂), p(a₃) over A₀ and q(b₁), q(b₂), q(b₃) over B₀. Along one path, one first forms the reciprocal image r(a₁) = q(b₁)/ν, r(a₂) = q(b₁)/ν, r(a₃) = q(b₂)/ν, then the intersection u(a₁) = p(a₁) r(a₁)/ν′ = p(a₁) q(b₁)/ν′, u(a₂) = p(a₂) q(b₁)/ν′, u(a₃) = p(a₃) q(b₂)/ν′, and finally the image of u. Along the other path, one first forms the image s(b₁) = p(a₁) + p(a₂), s(b₂) = p(a₃), s(b₃) = 0, then the intersection v(b₁) = q(b₁) s(b₁)/ν′ = q(b₁) ( p(a₁) + p(a₂) )/ν′, v(b₂) = q(b₂) p(a₃)/ν′, v(b₃) = 0, with ν = 2 q(b₁) + q(b₂) and ν′ = ( p(a₁) + p(a₂) ) q(b₁) + p(a₃) q(b₂). Whether one follows the blue path or the red path, one arrives at the same values v(b). (Note: explain this better.)]

Other Properties

I should perhaps write here some properties like the two below. A mapping a ↦ b = ϕ(a) (that to any element a ∈ A₀ associates an element b ∈ B₀) is traditionally extended into a mapping A ↦ B = ϕ[A] that to any set A ⊆ A₀ associates a set B ⊆ B₀. The reciprocal of this mapping ϕ[·] is then introduced (see section 1.3), and one arrives at the two properties expressed by equations 1.26 and 1.27, which we may remember here. For any A ⊆ A₀, one has

A ⊆ ϕ⁻¹[ ϕ[A] ] ,    (3.70)

and one has A = ϕ⁻¹[ ϕ[A] ] if the mapping is injective. For any B ⊆ B₀, one has

ϕ[ ϕ⁻¹[B] ] ⊆ B ,    (3.71)

and one has ϕ[ ϕ⁻¹[B] ] = B if the mapping is surjective.

As we have extended the mapping ϕ to define a mapping P ↦ Q = ϕ[P] of probabilities (that to any probability P over A₀ associates a probability Q over B₀), we need to examine how the expression ϕ⁻¹[ ϕ[P] ] relates to P, and how the expression ϕ[ ϕ⁻¹[Q] ] relates to Q. (Note: I have not yet derived general formulas concerning subsets, so, for the time being, I write here the formulas concerning the elements of discrete sets.)

Property 3.6 For any (discrete) probability p and any mapping ϕ one has⁸ (for any element a ∈ A₀)

( ϕ⁻¹[ ϕ[p] ] )(a) = (1/ν) Σ_{a′ ∈ ϕ⁻¹[{ϕ(a)}]} p(a′) ,    (3.72)

where ν is a normalization factor. Should the mapping ϕ be injective, then the right-hand side of the expression would collapse into p(a), so one would have

ϕ⁻¹[ ϕ[p] ] = p .    (3.73)

Property 3.7 For any (discrete) probability q and any mapping ϕ one has⁹ (for any element b ∈ B₀)

( ϕ[ ϕ⁻¹[q] ] )(b) = (1/ν) n(b) q(b) ,    (3.74)

where n(b) is the number of elements in A₀ that map into the element b (i.e., the measure of the set ϕ⁻¹[{b}]), and where ν is a normalization factor. Should the mapping ϕ be bijective, then the right-hand side of the expression would collapse into q(b), so one would have

ϕ[ ϕ⁻¹[q] ] = q .    (3.75)

These two properties are illustrated in figures 3.5 and 3.6, and in the sketch below.

⁸ For one has ( ϕ⁻¹[ ϕ[p] ] )(a) = (1/ν) (ϕ[p])( ϕ(a) ) = (1/ν) Σ_{a′ ∈ ϕ⁻¹[{ϕ(a)}]} p(a′).
⁹ For one has ( ϕ[ ϕ⁻¹[q] ] )(b) = Σ_{a ∈ ϕ⁻¹[{b}]} (ϕ⁻¹[q])(a) = Σ_{a ∈ ϕ⁻¹[{b}]} (1/ν) q( ϕ(a) ) = (1/ν) Σ_{a ∈ ϕ⁻¹[{b}]} q(b) = (1/ν) n(b) q(b).
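Note: a numerical sketch of properties 3.6 and 3.7 on the mapping of figure 3.3 (Python; helper names and values are mine):

    def image(p, phi, B0):
        return {b: sum(v for a, v in p.items() if phi[a] == b) for b in B0}

    def reciprocal_image(q, phi, A0):
        nu = sum(q[phi[a]] for a in A0)
        return {a: q[phi[a]] / nu for a in A0}

    A0, B0 = ["a1", "a2", "a3"], ["b1", "b2", "b3"]
    phi = {"a1": "b1", "a2": "b1", "a3": "b2"}
    p = {"a1": 0.2, "a2": 0.5, "a3": 0.3}
    pp = reciprocal_image(image(p, phi, B0), phi, A0)  # phi^-1[phi[p]], eq. 3.72
    # pp(a1) = pp(a2) = (p(a1)+p(a2))/nu, pp(a3) = p(a3)/nu, as in figure 3.5.
    q = {"b1": 0.5, "b2": 0.3, "b3": 0.2}
    qq = image(reciprocal_image(q, phi, A0), phi, B0)  # phi[phi^-1[q]], eq. 3.74
    # qq(b1) = 2 q(b1)/nu, qq(b2) = q(b2)/nu, qq(b3) = 0, as in figure 3.6.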

[Figure 3.5: Evaluating p′ = ϕ⁻¹[ ϕ[p] ]. With p(a₁), p(a₂), p(a₃) given, one has (ϕ[p])(b₁) = p(a₁) + p(a₂), (ϕ[p])(b₂) = p(a₃), (ϕ[p])(b₃) = 0, and then p′(a₁) = ( p(a₁) + p(a₂) )/ν, p′(a₂) = ( p(a₁) + p(a₂) )/ν, p′(a₃) = p(a₃)/ν, with ν = 2 p(a₁) + 2 p(a₂) + p(a₃).]

[Figure 3.6: Evaluating q′ = ϕ[ ϕ⁻¹[q] ]. With q(b₁), q(b₂), q(b₃) given, one has (ϕ⁻¹[q])(a₁) = q(b₁)/ν, (ϕ⁻¹[q])(a₂) = q(b₁)/ν, (ϕ⁻¹[q])(a₃) = q(b₂)/ν, with ν = 2 q(b₁) + q(b₂), and then q′(b₁) = 2 q(b₁)/ν, q′(b₂) = q(b₂)/ν, q′(b₃) = 0.]

3.2.7 Marginal Probability

Note: the importance of this definition has to be downplayed: it must not correspond to a whole section.

The definition given below of marginal probability may seem less general than that found in other texts. In fact, I am just trying to give the definition that, when dealing with probabilities over manifolds, makes intrinsic sense, i.e., does not require invoking any special coordinate system. So, to introduce the notion of marginal probability we do not consider an arbitrary set, but a set that is the Cartesian product of two sets: A₀ × B₀.

Let P then be a (general) probability function over A₀ × B₀. To any set S ⊆ A₀ × B₀ the probability function P associates the probability value denoted P[S]. The probability function P induces one probability function over A₀ and one probability function over B₀, respectively denoted P_{A₀} and P_{B₀}, and called marginal probability functions, defined as follows. P_{A₀} is the function that to any set A ⊆ A₀ associates the probability value

P_{A₀}[A] ≡ P[ A × B₀ ] ,    (3.76)

and P_{B₀} is the function that to any set B ⊆ B₀ associates the probability value

P_{B₀}[B] ≡ P[ A₀ × B ] .    (3.77)

(Note: check these definitions, and demonstrate that they actually define a probability function.)

Example 3.5 If the set A₀ × B₀ is discrete, denoting a generic element of the set as {a, b}, we can introduce the elementary probability function p through the condition that for any set S ⊆ A₀ × B₀, P[S] = Σ_{{a,b}∈S} p(a, b). Similarly, we can introduce the elementary probability function p_{A₀} through the condition that for any set A ⊆ A₀, P_{A₀}[A] = Σ_{a∈A} p_{A₀}(a), and the elementary probability function p_{B₀} through the condition that for any set B ⊆ B₀, P_{B₀}[B] = Σ_{b∈B} p_{B₀}(b). It is then easy to see (note: check!) that the elementary marginal probability functions are given (for any a ∈ A₀ and any b ∈ B₀) by

p_{A₀}(a) = Σ_{b∈B₀} p(a, b)    (3.78)

and

p_{B₀}(b) = Σ_{a∈A₀} p(a, b) .    (3.79)

Example 3.6 If the two sets A₀ and B₀ are, in fact, two manifolds, then (Note: check if I really need the volume elements) Bla, bla, bla, and

C = A × B .    (3.80)

Bla, bla, bla, and

dv_C = dv_A dv_B .    (3.81)

Bla, bla, bla, and

dv̄_C = dv̄_A dv̄_B ,    (3.82)

etc.

3.3 Probabilities over Manifolds

Note: say that all the general definitions given above hold. We only need to explicitly develop the special mathematics that appears when dealing with manifolds.

3.3.1 Probability Density

In this section we consider that the set Ω is, in fact, a finite-dimensional manifold, that we may denote M. We select for F the (THE?) Borel set of M. Bla, bla, and selecting some coordinates {xⁱ} over the manifold, bla, bla, and Radon-Nikodym theorem, and bla, bla, and we write

P[A] = ∫_{{x¹,…,xⁿ}∈A} dv̄ₓ f̄(x¹, …, xⁿ) ,    (3.83)

where dv̄ₓ = dx¹ ∧ … ∧ dxⁿ. Using more elementary notations,

dv̄ₓ = dx¹ dx² … dxⁿ ,    (3.84)

and equation 3.83 can be rewritten in the non manifestly covariant form

P[A] = ∫_{{x¹,…,xⁿ}∈A} dx¹ dx² … dxⁿ f̄(x¹, …, xⁿ) .    (3.85)

The function f̄(x¹, …, xⁿ) is called the probability density (associated to the probability distribution P and to the coordinates {xⁱ}). It is a density, in the tensorial sense of the term, i.e., under a change of variables x → y it changes according to the Jacobian rule (see below).

Example 3.7 Consider a homogeneous probability distribution at the surface of a sphere of unit radius. When parameterizing a point by its spherical coordinates (θ, ϕ), the homogeneous probability distribution is represented by the (2D) probability density

f̄(θ, ϕ) = (1/4π) sin θ ,    (3.86)

and the probability of a domain A is computed as

P[A] = ∫_{{θ,ϕ}∈A} dθ dϕ f̄(θ, ϕ) ,    (3.87)

the integral over the whole surface giving one.

A probability density is a density in the tensorial sense of the term. Under a change of variables {x¹, …, xⁿ} → {y¹, …, yⁿ}, expression 3.83 becomes

P[A] = ∫_{{y¹,…,yⁿ}∈A} dv̄_y ḡ(y¹, …, yⁿ) ,    (3.88)

where dv̄_y = dy¹ ∧ … ∧ dyⁿ. As this identity must hold for any domain A, it must also hold infinitesimally, so we can write

dP = f̄(x¹, …, xⁿ) dx¹ … dxⁿ = ḡ(y¹, …, yⁿ) dy¹ … dyⁿ .    (3.89)

We have already seen that the relation between the two capacity elements associated to the two coordinate systems is (see equation 2.44) dy¹ ∧ … ∧ dyⁿ = (1/X̄) dx¹ ∧ … ∧ dxⁿ, so we immediately obtain the Jacobian rule for probability densities

ḡ(y¹, …, yⁿ) = f̄(x¹, …, xⁿ) X̄(y¹, …, yⁿ) .    (3.90)

Note: explain that the coordinates xⁱ at the right take the values xⁱ = xⁱ(y¹, …, yⁿ). Of course, this is the general rule of transformation of scalar densities (equation 2.18, page 29), whether they represent a probability density or any other density. Note that the X̄ appearing in this equation is the determinant of the matrix {Xⁱⱼ} = {∂xⁱ/∂yʲ}, not that of the matrix {Yⁱⱼ} = {∂yⁱ/∂xʲ}. Note: I must also give the formula (referred to in the appendices)

ḡ(y¹, …, yⁿ) = f̄(x¹, …, xⁿ) / Ȳ(x¹, …, xⁿ) ,    (3.91)

and, perhaps, write it as

ḡ(y) = f̄( x(y) ) / Ȳ( x(y) ) .    (3.92)

(Note: correct what follows.) In the literature, the equivalent of equation 3.90 is presented taking the absolute value of the Jacobian determinant. In principle, we do not need here this absolute value, as we have assumed above that the new variables are always classified in a way such that the Jacobian determinant is positive. It remains the case of a one-dimensional variable, that we may treat in an ad hoc way.¹⁰

(Note: I must warn the reader that if there is a natural notion of volume on the manifold, integrating densities may be quite inefficient. Note: without a metric, we cannot define the conditional probability density. Note: without the notion of volume, we cannot define the intersection of probabilities.)

¹⁰ When, for instance, passing from ρ to ℓ = 1/ρ, we have a negative value of the Jacobian determinant, dℓ/dρ = −1/ρ².
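Note: a one-dimensional numerical sketch of the Jacobian rule 3.90, for the change of variables ρ → ℓ = 1/ρ of footnote 10 (Python; the particular density is an arbitrary choice of mine):

    import numpy as np

    def f_rho(rho):                  # a density for rho (log-normal, arbitrary)
        return np.exp(-0.5 * np.log(rho)**2) / (rho * np.sqrt(2.0 * np.pi))

    def g_ell(ell):                  # g(ell) = f(rho(ell)) |d rho / d ell|
        rho = 1.0 / ell
        return f_rho(rho) * rho**2   # |d rho / d ell| = 1/ell^2 = rho^2

    ell = np.linspace(1e-3, 50.0, 200_000)
    dl = ell[1] - ell[0]
    print((g_ell(ell) * dl).sum())   # ~ 1: the transformed density stays normalized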

3.3.2 Image of a Probability Density

The notion of image of a probability function has been introduced in section 3.2.2 for probabilities defined over an arbitrary set. We are now interested in probabilities defined over a manifold, and these probabilities are represented by probability densities. So, our task here is to translate the general definition of image of a probability in terms of probability densities.

By the way, of the three notions (image of a probability, intersection of two probabilities, and reciprocal image of a probability) that we have now to revisit (to see the implications when the probabilities are defined over manifolds), it is only the first notion, that of image of a probability, that does not require any special structure over the manifold. The two other notions (intersection of two probabilities, and reciprocal image of a probability) only make sense if the manifolds have a notion of volume attached. Even worse, we shall later see that the notion of conditional probability density can only be introduced if the manifolds are metric (i.e., if the distance between points is defined).

So, consider a mapping ϕ from an m-dimensional manifold M into an n-dimensional manifold N. As we wish, in a moment, to introduce probability densities over M and over N, we need to endow each of the two manifolds with a coordinate system. So, let x ≡ {xᵅ} = {x¹, …, xᵐ} be a coordinate system over M, and let y ≡ {yⁱ} = {y¹, …, yⁿ} be a coordinate system over N. We can thus write the mapping as x ↦ y = ϕ(x).

Let P be a probability over (the Borel set of) M, with probability density f̄(x). Its image via the mapping ϕ is, as defined in section 3.2.2, a probability Q = ϕ[P] defined over (the Borel set of) N, that shall have some probability density ḡ(y). One obtains something like (see appendix)

ḡ(y) = Σ_{x : ϕ(x)=y} f̄(x) / √( det( Φ(x)ᵗ Φ(x) ) )    ( if m ≤ n ) ,    (3.93)

where Φ is the matrix of partial derivatives of the mapping ϕ. In the case m > n, bla, bla, bla. Note: explain that we can write

ḡ = ϕ[ f̄ ] .    (3.94)

For the expression of the image of a probability in terms of volumetric probabilities, see section 3.4.4.

We can give here a version of property 3.3 adapted to probabilities over manifolds. Note: I have to demonstrate here the following property.

Property 3.8 Let ϕ be a mapping from a manifold M into a manifold N, and consider a probability function over (the Borel set of) M, represented, in the given coordinates, by the probability density f̄(x). If the set of points {x₁, x₂, …} is a sample of f̄(x), then {y₁, y₂, …} = {ϕ(x₁), ϕ(x₂), …} is a sample of ḡ = ϕ[ f̄ ].

Again, to obtain a sample of ḡ = ϕ[ f̄ ], one should not try to develop an ad hoc method based on the analytic expression of ḡ (equation 3.93 or equation XXX), but, rather, one should just obtain a sample of f̄, and then map the sample as suggested by property 3.8.

Note: figure 3.7 is old, and will probably be suppressed. [Figure 3.7: Left: a mapping x ↦ y = ϕ(x) from X into Y. Center: the image of a volumetric probability (evaluating the image of a probability density is a difficult problem). Right: the preimage of a volumetric probability (evaluating the preimage of a probability density is an easy problem).]

Example 3.8 We examine here a common problem in experimental sciences: a set of quantities {y¹, y², …, y^q}, that are not directly measurable, are defined in terms of some other directly measurable quantities {x¹, x², …, x^p},

y¹ = ϕ¹(x¹, x², …, x^p)
y² = ϕ²(x¹, x², …, x^p)
⋮
y^q = ϕ^q(x¹, x², …, x^p) ;    (3.95)

the result of a measurement of the quantities {x¹, x², …, x^p} is represented by the probability density f̄(x¹, x², …, x^p); which is the probability density ḡ(y¹, y², …, y^q) that represents the information induced on the quantities {y¹, y², …, y^q}? The answer is, of course, ḡ = ϕ[ f̄ ]. One may wish to write down explicitly the function ḡ(y), in which case expression 3.93 or expression XXX has to be used (and the partial derivatives evaluated), or one may just wish to have a set of sample points of ḡ(y) (from which some simple histograms can be drawn), in which case it is much simpler to sample f̄(x) and to transport the sample points. (Note: explain here that figure 4.7 below shows a result of such a Monte Carlo transport, compared with the result of the analytical transport.) Note: explain why this is intimately related to the ISO recommendation for the transportation of uncertainties.
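Note: a minimal sketch (my own toy measurement, in Python) of the Monte Carlo transport suggested by property 3.8: sample f̄, map the samples through ϕ, and histogram them to estimate ḡ:

    import numpy as np
    rng = np.random.default_rng(0)

    x = rng.normal(3.0, 0.2, size=100_000)   # samples of f(x): measured quantity
    y = x**2 * (x - 3.0)**3                  # y = phi(x), the indirect quantity
    g_hist, edges = np.histogram(y, bins=100, density=True)   # estimates g(y)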

Note: explain here that this is one of the few problems in this book that can be fully and consistently posed in terms of probability densities. Note: for a detailed description of the standard practice of transportation of uncertainties, see Dietrich, 1991. For a complete description of metrology and calibration see Fluke, 1994.

Note: I should explain here that there is another example where the same question arises, the typical prediction problem in physics: any serious physical theory is able to make predictions (that may be confronted with experiments). An engineer, for instance, may wish to predict the load at which a given bridge may collapse, or an astrophysicist may wish to predict the flux of neutrinos from the Sun. In these situations, the parameters defining the system (the bridge or the Sun) may be known with some uncertainties, and these uncertainties shall reflect as an uncertainty on the prediction.

3.3.3 Marginal Probability Density

Note: refer here to section 3.2.7.

The notion of marginal probability density (or marginal volumetric probability) tries to address one simple question: if we have a probability density f̄(x, y) in two variables x and y, and we don't care much about y, which is the probability density for x alone? I choose here to develop the notion using the specific setting of section 3.5.4, that is well adapted to our future needs.

So, again, we consider a p-dimensional manifold R_p, metric or not, with some coordinates r = {rᵅ}, and with the capacity element dv̄_r. Consider also a q-dimensional manifold S_q, metric or not, with some coordinates s = {sᵃ}, and with the capacity element dv̄_s. We build the Cartesian product M_{p+q} = R_p × S_q of the two manifolds, i.e., we consider the (p+q)-dimensional manifold whose points consist of a couple of points, one in R_p and one in S_q. As coordinates over M_{p+q} we can obviously choose {r, s}. From the capacity elements dv̄_r and dv̄_s one can introduce the capacity element dv̄ = dv̄_r dv̄_s over M_{p+q}.

Assume now that some random process produces pairs of random points, one point of the pair in R_p and the other point in S_q. In fact, the random process is producing points on M_{p+q}. We can make a histogram, and, when enough points have materialized, we have a probability density f̄(r, s), that we assume normalized to one:

∫_{M_{p+q}} dv̄(r, s) f̄(r, s) = 1 .    (3.96)

Instead, one may have just made the histogram of the points on R_p, disregarding the points of S_q, to obtain the probability density f̄_r(r). It is clear that one has

f̄_r(r) = ∫_{S_q} dv̄_s(s) f̄(r, s) .    (3.97)

This function f̄_r(r) is called the marginal probability density for the variables r. The probability of a domain D_p ⊆ R_p is to be computed via

P(D_p) = ∫_{D_p} dv̄_r(r) f̄_r(r) ,    (3.98)

and the probability density is normed to one: ∫_{R_p} dv̄_r(r) f̄_r(r) = 1. Similarly, one may have just made the histogram of the points on S_q, disregarding the points of R_p, to obtain the probability density f̄_s(s). One then has

f̄_s(s) = ∫_{R_p} dv̄_r(r) f̄(r, s) .    (3.99)

This function f̄_s(s) is called the marginal probability density for the variables s. The probability of a domain D_q ⊆ S_q is to be computed via

P(D_q) = ∫_{D_q} dv̄_s(s) f̄_s(s) ,    (3.100)

and the probability density is normed to one: ∫_{S_q} dv̄_s(s) f̄_s(s) = 1.

To introduce this notion of marginal probability we have not assumed that the manifolds are metric. Of course, the definitions also make sense when working in a metric context. Let us introduce the basic formulas. Let the distance element over R_p be ds²_r = (g_r)_{αβ} drᵅ drᵝ, and the distance element over S_q be ds²_s = (g_s)_{ab} dsᵃ dsᵇ. Then, under our hypotheses here, the distance element over M_{p+q} is ds² = ds²_r + ds²_s. We have the metric determinants g_r = det g_r, g_s = det g_s, and g = g_r g_s. The probability densities above are related to the volumetric probabilities via

f̄(r, s) = √g f(r, s) ;  f̄_r(r) = √g_r f_r(r) ;  f̄_s(s) = √g_s f_s(s) ,    (3.101)

while the volume elements are related to the capacity elements via

dv(r, s) = √g dv̄(r, s) ;  dv_r(r) = √g_r dv̄_r(r) ;  dv_s(s) = √g_s dv̄_s(s) .    (3.102)

We can now easily translate the equations above in terms of volumetric probabilities. As an example, the marginal volumetric probability is (equation 3.99)

f_s(s) = ∫_{R_p} dv_r(r) f(r, s) ,    (3.103)

the probability of a domain D_q ⊆ S_q is evaluated as (equation 3.100)

P(D_q) = ∫_{D_q} dv_s(s) f_s(s) ,    (3.104)

and the volumetric probability is normed to one: ∫_{S_q} dv_s(s) f_s(s) = 1.
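Note: a small numerical sketch of the marginalization formulas 3.96 to 3.98, on a grid (Python; the two-dimensional density is an arbitrary choice of mine):

    import numpy as np
    r = np.linspace(-5.0, 5.0, 401)
    s = np.linspace(-5.0, 5.0, 401)
    R, S = np.meshgrid(r, s, indexing="ij")
    f = np.exp(-0.5 * (R**2 + S**2 - R * S))       # an arbitrary 2-D density
    f /= f.sum() * (r[1] - r[0]) * (s[1] - s[0])   # normalize it (eq. 3.96)
    f_r = f.sum(axis=1) * (s[1] - s[0])            # marginal over s (eq. 3.97)
    print(f_r.sum() * (r[1] - r[0]))               # ~ 1 (normalization, eq. 3.98)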

3.4 Probabilities over Volume Manifolds

3.4.1 Volumetric Probability

Bla, bla, bla. We shall assume that there is a way of defining the size, or volume, of the sets into consideration. For discrete sets, the volume of a set will just be the number of its elements. For manifolds, there is an assumption that is commonly made that, I think, is not correct: that there is a natural way of defining the volume of a set of points. We have seen in the previous chapter that there is a big difference between a capacity element and a volume element: capacity elements are attached to coordinate systems, and they are not intrinsically defined. The result is that one cannot define the volume of a set of points intrinsically unless a particular measure has been explicitly introduced: the volume measure. For instance, if the manifold is a metric manifold, the volume measure is deduced from the metric (using the square root of the determinant of the metric, see section 2.2.7). Some of the fundamental operations to be introduced below, like the intersection of two probabilities, cannot be defined unless this volume measure is given. So, we start here by introducing such a volume measure, which we will always assume is given beforehand, and once for all.

Definition 3.6 Measure. Given a set Ω and a suitable set F of subsets of Ω, a measure is a mapping M from F into the real interval [0, ∞] that satisfies the following two properties:

M(∅) = 0 ,    (3.105)

and, for any two sets inside F,

M(A₁ ∪ A₂) = M(A₁) + M(A₂) − M(A₁ ∩ A₂) .    (3.106)

Definition 3.7 Volume measure. Given a set Ω and a suitable set F of subsets of Ω, the volume measure, denoted V, is the particular measure such that: if the set Ω is discrete, for any A ∈ F, V[A] equals the number of elements in A; if the set Ω is a manifold, V is an explicitly introduced measure, that to any A ∈ F associates the volume of A, denoted

V[A] = ∫_{P∈A} dv(P)    (3.107)

(where dv is the volume element), or, if some coordinates {x¹, …, xⁿ} are introduced over the manifold,

V[A] = ∫_A dv̄(x¹, …, xⁿ) ḡ(x¹, …, xⁿ) ,    (3.108)

where dv̄ is the capacity element associated to the coordinates, and where ḡ is a given density. In a metric manifold, ḡ = √(± det g) (see section XXX for details).

Definition 3.8 Volume triplet. When a set Ω is given, a suitable set F of subsets of Ω considered, and a volume measure V over F chosen, we shall say that we have a volume triplet, that we shall denote ( Ω, F, V ).

Bla, bla, bla.

Property 3.9 If the set Ω is a (finite-dimensional) manifold, its elements are called points, that we denote P, P′, … As the points of a manifold are not numerable, one may choose for F the usual Borel field B(Ω) associated to the manifold. Assume that, as explained above, we have introduced a particular volume measure V over F = B, so we have the triplet ( Ω, B, V ). As already mentioned, to every set A ∈ B we can then associate its volume, that we write as

V[A] = ∫_{P∈A} dv(P) .    (3.109)

Then, it follows from the Radon-Nikodym theorem that for any probability P over B there exists a unique positive function f(P) satisfying

∫_{P∈Ω} dv(P) f(P) = 1 ,    (3.110)

such that for any A ∈ B,

P[A] = ∫_{P∈A} dv(P) f(P) .    (3.111)

We call f(P) the volumetric probability of the point P.

3.4.1.1 Second Piece of Text

Consider an n-dimensional smooth manifold M, over which a notion of volume (as defined in the previous chapter) exists. Then, to any domain A ⊆ M one can associate its volume:

A ↦ V[A] .    (3.112)

A particular volume distribution having been introduced over M, once for all, different probability distributions may be considered, that we are about to characterize axiomatically. We shall say that a probability distribution (or, for short, a probability) has been defined over M if to any domain A ⊆ M we can associate an adimensional real number,

A ↦ P[A] ,    (3.113)

called the probability of A, that satisfies:

Postulate 3.1 For any domain A ⊆ M,

P[A] ≥ 0 ;    (3.114)

Postulate 3.2 For disjoint domains of the manifold M, the probabilities are additive:

A₁ ∩ A₂ = ∅  ⟹  P(A₁ ∪ A₂) = P(A₁) + P(A₂) ;    (3.115)

Postulate 3.3 The probability distribution must be absolutely continuous with respect to the volume distribution, i.e., the probability P(A) of any domain A ⊆ M with vanishing volume must be zero:

V[A] = 0  ⟹  P[A] = 0 .    (3.116)

The probability of the whole manifold M may be zero, it may be finite, or it may be infinite. The first two axioms are due to Kolmogorov (1933). In common texts, there is usually an axiom concerning the behaviour of a probability when we consider an infinite collection¹¹ of sets, A₁, A₂, A₃, …, but this is a technical issue that I choose to ignore. Our third axiom here is not usually introduced, as the distinction between the volume distribution and a probability distribution is generally not made: both are just considered as examples of measure distributions. This distinction shall, in fact, play a major role in the theory that follows.

When the probability of the whole manifold is finite, a probability distribution can be renormalized, so as to have P(M) = 1. We shall then say that we face an absolute probability. If a probability distribution is not normalizable, we shall say

¹¹ Presentations of measure theory that pretend to mathematical rigor assume finite additivity or, alternatively, countable additivity. See, for instance, the interesting discussion in Jaynes (1995).

that we have a relative probability: in that case, what usually matters is not the probability P[A] of a domain A ⊆ M, but the relative probability between two domains A₁ and A₂, denoted P( A₁ ; A₂ ), and defined as

P( A₁ ; A₂ ) ≡ P(A₁) / P(A₂) .    (3.117)

3.4.1.2 Third Piece of Text

In what follows, a generic point of the manifold M is denoted with the symbol P, as in P ∈ M. The point of the manifold where some scalar quantity is considered is explicitly designated. For instance, the expression giving the volume of a domain A ⊆ M, that in equation 2.92 was simply written V[A] = ∫_A dv, shall, from now on, be written

V[A] = ∫_{P∈A} dv(P) .    (3.118)

We have just defined a probability distribution over the manifold M that is absolutely continuous with respect to the volume distribution over the manifold. Then, by virtue of the Radon-Nikodym theorem (e.g., Taylor, 1966), one can define over M a volumetric probability f(P) such that the probability of any domain A of the manifold can be obtained as

P[A] = ∫_{P∈A} dv(P) f(P) .    (3.119)

Note that this equation makes sense even if no particular coordinate system is defined over the manifold M, as the integral here can be understood in the sense suggested in figure 2.3. If a coordinate system x = {x¹, …, xⁿ} is defined over M, we may well wish to write equation 3.119 as

P[A] = ∫_{x∈A} dv_x(x) f_x(x) ,    (3.120)

where, now, dv_x(x) is to be understood as the special expression of the volume element in the coordinates x. One may be interested in using the volume element dv_x(x) directly for the integration (as suggested in figure 2.3). Alternatively, one may wish to use the coordinate lines for the integration (as suggested in figure 2.4). In this case, one writes (equation 2.84)

dv_x(x) = ḡ_x(x) dv̄_x(x) ,    (3.121)

to get

P[A] = ∫_{x∈A} dv̄_x(x) ḡ_x(x) f_x(x) .    (3.122)

Using dv̄_x(x) = dx¹ … dxⁿ (equation 2.39) and ḡ_x(x) = √(det g(x)) (equation 2.72), this expression can be written in the more explicit form

P[A] = ∫_{x∈A} dx¹ … dxⁿ √(det g(x)) f_x(x) .    (3.123)

These two (equivalent) expressions may be useful for analytical developments, but not for numerical evaluations, where one should choose a direct handling of expression 3.120.

3.4.1.3 Fourth Piece of Text

In a change of coordinates x → y(x), the expression 3.120,

P[A] = ∫_{x∈A} dv_x(x) f_x(x) ,    (3.124)

becomes

P[A] = ∫_{y∈A} dv_y(y) f_y(y) ,    (3.125)

where dv_y(y) and f_y(y) are respectively the expressions of the volume element and of the volumetric probability in the coordinates y. These are actual invariants (in the tensorial sense), so, when comparing this equation (written in the coordinates y) to equation 3.120 (written in the coordinates x), one simply has, at every point,

f_y = f_x ,  dv_y = dv_x ,    (3.126)

or, to be more explicit,

f_y(y) = f_x( x(y) ) ;  dv_y(y) = dv_x( x(y) ) .    (3.127)

That under a change of variables x → y one has f_y = f_x for volumetric probabilities is an important property. It contrasts with the property found in usual texts (where the Jacobian of the transformation appears): remember that we are considering here volumetric probabilities, not the usual probability densities. Later in this text, a systematic procedure for representing volumetric probabilities under a change of variables is suggested (see figures 4.7, 4.10 and 4.11).

3.4.2 Volumetric Histograms and Density Histograms

A volumetric probability or a probability density can be obtained as the limit of a histogram. It is important to understand which kind of histogram produces as a limit a volumetric probability, and which other kind of histogram produces a probability density. In short, when counting the number of samples inside a division of the space into cells of equal volume, one obtains a volumetric probability. If, instead, one divides the space into cells of equal capacity (i.e., in fact, one divides the space using constant coordinate increments), one obtains a probability density.

Figure 3.8 presents a one-dimensional example of the building of a histogram. The manifold M into consideration here is a one-dimensional manifold where each point represents the volumetric mass ρ = M/V of a rock. As a coordinate over this one-dimensional manifold, we can use the value ρ = M/V, but we could use its inverse ℓ = 1/ρ = V/M as well (or, as considered below, the logarithmic volumetric mass).

Let us first use the volumetric mass ρ. To make a histogram whose limit would be a volumetric probability we should divide the one-dimensional manifold M into cells of equal one-dimensional volume, i.e., of equal length. This requires that a definition of the distance between two points of the manifold is used. For reasons exposed in chapter ??, the good definition of distance between the point ρ₁ and the point ρ₂ is D = | log(ρ₂/ρ₁) |. When dividing the manifold M into cells of constant length ΔD one obtains the histogram in the middle of the figure, whose limit is a (one-dimensional) volumetric probability f(ρ). When, instead of dividing the manifold M into cells of constant length ΔD, one uses cells of constant coordinate increment Δρ, one obtains a different histogram, displayed at the top of the figure. Its limit is a probability density f̄(ρ). The relation between the volumetric probability f(ρ) and the probability density f̄(ρ) is that expressed in equation ??. As ΔD = log( (ρ + Δρ)/ρ ) = Δρ/ρ + …, in this one-dimensional example, the equivalent of equation 3.121 is

dD = (1/ρ) dρ .    (3.128)

Therefore, equation ?? here becomes

f̄(ρ) = f(ρ)/ρ ,    (3.129)

this being the relation between the volumetric probability and the probability density displayed in figure 3.8.

The advantage of using the histogram that produces a probability density is that one does not need to care about possible definitions of length or volume: whatever the coordinate being used, ρ or ℓ = 1/ρ, one has only to consider constant coordinate increments, δρ or δℓ, and make the histogram. The advantage of using the histogram that produces a volumetric probability is that, as is obvious in the figure, when dividing the manifold into consideration into cells
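Note: the two kinds of histogram can be sketched numerically as follows (Python; the synthetic "rock densities" are mine, not the Johnson and Olhoeft data):

    import numpy as np
    rng = np.random.default_rng(1)
    rho = np.exp(rng.normal(1.0, 0.5, size=10_000))   # synthetic volumetric masses

    # Density histogram: cells of constant coordinate increment (constant delta rho).
    dens_hist, _ = np.histogram(rho, bins=np.linspace(rho.min(), rho.max(), 41))
    # Volumetric histogram: cells of constant length D = |log(rho2/rho1)|,
    # i.e., bin edges in geometric progression.
    edges = np.exp(np.linspace(np.log(rho.min()), np.log(rho.max()), 41))
    vol_hist, _ = np.histogram(rho, bins=edges)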

[Figure 3.8: Histograms of the volumetric mass of the 571 different rock types quoted by Johnson and Olhoeft (1984), over the range 0 to 20 g/cm³. The histogram at the top, where cells with constant increments Δρ have been used, has a probability density f̄(ρ) as limit. The histogram in the middle, where cells with constant length ΔD = Δρ/ρ have been used, has a volumetric probability f(ρ) as limit. The relation between the two is f̄(ρ) = f(ρ)/ρ (see text for details). In reality there is one natural variable for this problem (bottom), the logarithmic volumetric mass ρ* = log₁₀(ρ/ρ₀), with ρ₀ = 1 g/cm³, as for this variable, intervals of constant length are also intervals with constant increment of the variable.]

of equal volume (here, equal length), the number of samples inside each cell tends to be more equilibrated, and the histogram converges more rapidly towards significant values.

Example 3.9 Note: explain here what a volumetric histogram and a density histogram are. Say that while the limit of a volumetric histogram is a volumetric probability, the limit of a density histogram is a probability density. Note: introduce the notion of naïve histogram.

Consider a problem where we have two physical properties to analyze. The first is the property of electric resistance-conductance of a metallic wire, as it can be characterized, for instance, by its resistance R or by its conductance C = 1/R. The second is the cold-warm property of the wire, as it can be characterized by its temperature T or its thermodynamic parameter β = 1/kT (k being the Boltzmann constant). The parameter manifold is, here, two-dimensional. In the resistance-conductance manifold, the distance between two points, characterized by the resistances R₁ and R₂, or by the conductances C₁ and C₂, is, as explained in section XXX,

D = | log(R₂/R₁) | = | log(C₂/C₁) | .    (3.130)

Similarly, in the cold-warm manifold, the distance between two points, characterized by the temperatures T₁ and T₂, or by the thermodynamic parameters β₁ and β₂, is

D = | log(T₂/T₁) | = | log(β₂/β₁) | .    (3.131)

A homogeneous probability distribution can be defined as Bla, bla, bla. In figure ??, the two histograms that can be made from the first two diagrams give the volumetric probability. The naïve histogram that could be made from the diagram at the right would give a probability density.

[Figure 3.9: Note: explain here how to make a volumetric histogram. Explain that when the electric resistance or the temperature spans orders of magnitude, some diagrams become totally impractical. (Axis labels in the figure: C, β, and ν* = log₁₀(ν/ν₀), with ν₀ = 1 Hz.)]

[Figure 3.10: Note: write this caption. (Axis labels in the figure: 0 Hz to 1500 Hz.)]

3.4.3 Homogeneous Probability Function

kaka kaka kaka

Definition 3.9 Homogeneous probability. Let a volume triplet ( Ω, F, V ) be given. If the volume measure of the whole set, V(Ω), is finite, then one can introduce the homogeneous probability, denoted H, that, by definition, to any set A ∈ F associates a probability H[A] proportional to V[A], the volume measure of A.

Example 3.10 If Ω is a set with a finite number of elements, say n, then the homogeneous probability H associates, to every element ω ∈ Ω, the (constant) elementary probability

p(ω) = 1/n .    (3.132)

Example 3.11 If Ω is a manifold with finite volume measure V(Ω), then the homogeneous probability H associates, to every point P ∈ Ω, the (constant) volumetric probability

f(P) = 1/V(Ω) .    (3.133)

Definition 3.10 Step probability. Let a volume triplet ( Ω, F, V ) be given. To every subset A ∈ F with finite volume measure V[A], we associate a probability, denoted H_A, that to any A′ ∈ F associates a probability proportional to the volume measure of A′ ∩ A.

Example 3.12 For a discrete probability, let k be the number of elements in A, k = V[A]. The elementary probability associated to H_A is (see figure 3.11)

p(ω) = 1/k if ω ∈ A ,  p(ω) = 0 if ω ∉ A .    (3.134)

Example 3.13 For a manifold, let V[A] be the volume measure of A. The volumetric probability associated to H_A is

f(P) = 1/V[A] if P ∈ A ,  f(P) = 0 if P ∉ A .    (3.135)

[Figure 3.11: The step probability H_A associated to a set A. The values of the elementary probability associated to H_A are proportional to the values of the indicator of the set A (compare this with figure 1.1): here, a set A with four elements, each carrying the elementary probability 1/4, all other elements carrying the value 0.]

Example 3.14 (Note: compare this with example 3.7.) Consider a homogeneous probability distribution at the surface of a sphere of unit radius. When parameterizing a point by its spherical coordinates (θ, ϕ), the associated (2D) volumetric probability is

f(θ, ϕ) = 1/4π .    (3.136)

The probability of a domain A of the surface is computed as

P[A] = ∫_{{θ,ϕ}∈A} ds(θ, ϕ) f(θ, ϕ) ,    (3.137)

where ds(θ, ϕ) = sin θ dθ dϕ is the usual surface element of the sphere. The total probability (over the whole sphere) of course equals one.

kaa kaka kaka

Note: explain that the measure (or size) of a discrete set is the number of its elements. The measure of a set A ⊆ A₀ can be expressed as

M[A] = Σ_{a∈A} χ_{A₀}(a) ,    (3.138)

where χ_{A₀} is the indicator function associated to the set A₀. If the measure of the whole set A₀ is finite, say M₀,

M₀ = M[A₀] ,    (3.139)

then we can introduce a probability function, denoted H, that to any A ⊆ A₀ associates the probability value

H[A] = M[A]/M₀ .    (3.140)

Then, with the elementary probability function h,

H[A] = Σ_{a∈A} h(a) ,    (3.141)

where

h(a) = χ_{A₀}(a)/M₀ .    (3.142)

kaka kaka kaka

If the set is a manifold, there is no natural measure of a set (that would be independent of any coordinate system): a special definition must be introduced. This is typically done by choosing an arbitrary system of coordinates, and, in those coordinates, selecting one particular volume density (or "measure density"). For this reason, we have assumed above the existence of a particular volume element dv. Then, the measure (or volume) of a set A ⊆ M is

M[A] = ∫_A dv .    (3.143)

Note: rewrite this section. A probability distribution P is homogeneous if the probability associated by P to any domain A ⊆ M is proportional to the volume of the domain, i.e., if there is a constant k such that, for any A ⊆ M,

P[A] = k V[A] .    (3.144)

The homogeneous probability distribution is not necessarily normed. It follows immediately from this definition that the volumetric probability f associated to the homogeneous probability distribution is constant:

f(P) = k .    (3.145)

Should one not work with volumetric probabilities, but with probability densities, one should realize that the probability density f̄ associated to the homogeneous probability distribution is not necessarily constant, for a probability density depends on the coordinates being used. When using a coordinate system x = {xⁱ}, where

the metric determinant takes the value g_x(x), the probability density representing the homogeneous probability distribution is (using equation ??)

f̄_x(x) = k √(g_x(x)) .    (3.146)

For instance, in the physical 3D space, when using spherical coordinates, the homogeneous probability density (this is a short name for "the probability density representing the homogeneous probability distribution") is f̄(r, θ, ϕ) = k r² sin θ.

3.4.4 Image of a Volumetric Probability

In section 3.3.2 we obtained the expression for the image of a probability function (defined over a manifold) in terms of probability densities. This was possible because the notion of volume is not necessary to define the image of a probability density through a mapping. We arrived at (equation 3.93)

ḡ(y) = Σ_{x : ϕ(x)=y} f̄(x) / √( det( Φ(x)ᵗ Φ(x) ) )    ( if m ≤ n ) .    (3.147)

In the case m > n, bla, bla, bla.

Now, should the manifolds have a notion of volume defined, there would be, for the coordinates x and y being used over the two manifolds, the volume densities (see chapter 2) v̄_M(x) and v̄_N(y), so that the respective volume elements can be written in terms of the capacity elements as

dv_M(x) = v̄_M(x) dv̄_M ;  dv_N(y) = v̄_N(y) dv̄_N ,    (3.148)

and the probability densities would be related to the volumetric probabilities as

f̄(x) = v̄_M(x) f(x) ;  ḡ(y) = v̄_N(y) g(y) .    (3.149)

Equation 3.147 would then translate into

g(Q) = Σ_{P : ϕ(P)=Q} f(P) v̄_M(P) / ( v̄_N( ϕ(P) ) √( det( Φ(P)ᵗ Φ(P) ) ) )    ( if m ≤ n )    (3.150)

(note: explain this). Should the two volume elements derive, in fact, from two metric tensors,

v̄_M(P) = √(det γ_M(P)) ;  v̄_N(Q) = √(det γ_N(Q)) ,    (3.151)

we could rewrite equation 3.150 as

g(Q) = Σ_{P : ϕ(P)=Q} f(P) √(det γ_M(P)) / √( det( Φ(P)ᵗ Γ( ϕ(P) ) Φ(P) ) )    ( if m ≤ n ) .    (3.152)

3.4.5 Intersection of Volumetric Probabilities

Note: say that the definition of intersection of probability functions is in section 3.2.3.

Property 3.10 Assume that the set Ω is a manifold, and that we choose for F the usual Borel field of Ω. Let P₁ and P₂ be two probabilities, with volumetric probabilities respectively denoted f₁ and f₂. The volumetric probability representing P₁ ∩ P₂, that we may denote f₁ ∩ f₂, is given, for any point P ∈ Ω, by

( f₁ ∩ f₂ )(P) = (1/ν) f₁(P) f₂(P) ,    (3.153)

where ν is the normalization factor ν = ∫_{P∈Ω} dv(P) f₁(P) f₂(P).

Example 3.15 Product of probability distributions. Let S represent the surface of the Earth, using geographical coordinates (longitude ϕ and latitude λ). An estimation of the position of a floating object at the surface of the sea by an airplane navigator gives a probability distribution for the position of the object corresponding to the (2D) volumetric probability f(ϕ, λ). By definition, then, the probability that the floating object is inside some region A of the Earth's surface is P[A] = ∫_A ds(ϕ, λ) f(ϕ, λ), where ds(ϕ, λ) = cos λ dϕ dλ. An independent (and simultaneous) estimation of the position by another airplane navigator gives a probability distribution corresponding to the volumetric probability g(ϕ, λ). How should the two volumetric probabilities f(ϕ, λ) and g(ϕ, λ) be combined to obtain a resulting volumetric probability? The answer is given by the intersection of the two volumetric probabilities:

( f ∩ g )(ϕ, λ) = f(ϕ, λ) g(ϕ, λ) / ∫_S ds(ϕ, λ) f(ϕ, λ) g(ϕ, λ) .    (3.154)
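Note: a crude numerical sketch of equation 3.154 (Python; my own discretization of the sphere and invented position estimates):

    import numpy as np
    phi = np.linspace(-np.pi, np.pi, 361)          # longitude
    lam = np.linspace(-np.pi / 2, np.pi / 2, 181)  # latitude
    PHI, LAM = np.meshgrid(phi, lam, indexing="ij")
    ds = np.cos(LAM) * (phi[1] - phi[0]) * (lam[1] - lam[0])   # surface element

    def estimate(phi0, lam0, sig):                 # a crude local bump
        return np.exp(-((PHI - phi0)**2 + (LAM - lam0)**2) / (2.0 * sig**2))

    f = estimate(0.10, 0.80, 0.05)                 # navigator 1
    g = estimate(0.12, 0.82, 0.05)                 # navigator 2
    fg = f * g / (f * g * ds).sum()                # the intersection, eq. 3.154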

3.4.6 Reciprocal Image of a Volumetric Probability

If the sets A₀ and B₀ are manifolds, with points respectively denoted A, A′, … and B, B′, …, then the volumetric probability A ↦ f(A) representing the probability P = ϕ⁻¹[Q] can be expressed in terms of the volumetric probability B ↦ g(B) representing Q via

f(A) = (1/ν) g( ϕ(A) ) ,    (3.155)

where ν = ∫_{A∈A₀} dv g( ϕ(A) ). We shall use the notation

f = ϕ⁻¹(g) .    (3.156)

See figure 3.7.

3.4.7 Compatibility Property

Note: remember that we have

ϕ( f ∩ ϕ⁻¹(g) ) = ϕ( f ) ∩ g .    (3.157)

Properties like this one suggest that volumetric probabilities can be interpreted as fuzzy sets. It is then arguable that probability theory is the fuzzy set theory.

3.4.8 Assimilation of Uncertain Observations (the Bayes-Popper Approach)

Note: the present text is not Bayes-Popper. It has to be modified. And I have to find an example where the approach becomes a theorem.

An expression like x ∈ A has a crystal-clear meaning: the element x belongs to the set A. I want to give meaning to the expression x ɛ f, where f is a volumetric probability. (For the time being, I use here the symbol ɛ instead of the symbol ∈, with the hope of later being able to use a unique symbol.) It could be something like "the point x is a sample point of the volumetric probability f(x)". But this seems not to lead to simple consequences. So, at present, my guess is that the right definition is as follows:

When f is a volumetric probability, the expression x ɛ f means that the actual value of the quantity x is unknown, but, should you have to place bets on what this actual value could be, we inform you that you should place bets according to the volumetric probability f(x).

For short: the expression x ɛ f is to be read "x PYBAT f(x)", this meaning "for x, Place Your Bets According To f(x)".

Now, if equation XXX remains valid for volumetric probabilities (it has become equation ??), will equation 1.42 also remain valid, i.e., can we demonstrate that the expression

x ɛ f AND ϕ(x) ɛ p  ⟹  x ɛ f′ AND ϕ(x) ɛ p′ ,
where f′ = f ∩ ϕ⁻¹(p) ,  p′ = ϕ( f ) ∩ p ,    (3.158)

is a theorem? The meaning of the theorem would be as follows:

There is some given (but unknown) value of x, and the associated value ϕ(x). If we are informed that we should place our bets on the actual value of x according to f(x), and, independently, we are informed that we should place our bets on the actual value of y = ϕ(x) according to p(y), then we must conclude that, in fact,

we should place our bets on the actual value of x according to f′(x), and we should place our bets on the actual value of y = ϕ(x) according to p′(y).

This conjecture may be false. But, if it is true, this would be, at last, a justification of the formulas that Tarantola (1987, 2005) and Mosegaard and Tarantola (1995) have been using. I find it quite pleasant that there is some flavor of game theory in this conjecture. I am working to demonstrate it.

3.4.9 Popper-Bayes Algorithm

Let M_p be a p-dimensional manifold, with volume element dv_p, and let M_q be a q-dimensional manifold, with volume element dv_q. The points of M_p are denoted P, P′, …, while the points of M_q are denoted Q, Q′, … Let f be a volumetric probability over M_p and let ϕ be a volumetric probability over M_q. Let

P ↦ Q = a(P)    (3.159)

be an application from M_p into M_q.

Let P be a sample point of f, and submit this point to a "survive or perish" test, the probability of survival being

π = ϕ( a(P) ) / ϕ_max    (3.160)

(and the probability of perishing being 1 − π). If the point P perishes, then we start again the "survive or perish" test with a second sample point of f, and so on, until one point survives.

Property: the surviving point is a sample point of a volumetric probability h over M_p whose normalized expression is

h(P) = (1/ν) f(P) ϕ( a(P) ) ,    (3.161)

the normalizing constant being ν = ∫_{M_p} dv_p f(P) ϕ( a(P) ). [NOTE: this is still a conjecture, that shall be demonstrated here.]
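Note: a one-dimensional sketch of the survive-or-perish algorithm (Python; all ingredients are invented for illustration; f is sampled directly here, but any sampler of f would do):

    import numpy as np
    rng = np.random.default_rng(2)

    def a(P):                        # the mapping P -> Q = a(P)
        return P**2

    def phi_vp(Q):                   # the volumetric probability phi over M_q
        return np.exp(-0.5 * (Q - 1.0)**2)

    phi_max = 1.0                    # maximum of phi_vp, reached at Q = 1
    survivors = []
    while len(survivors) < 10_000:
        P = rng.normal(0.0, 1.0)     # a sample point of f
        if rng.uniform() < phi_vp(a(P)) / phi_max:   # survive...
            survivors.append(P)      # ...so P samples h = f(P) phi(a(P)) / nu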

3.4.10 Exercise

This exercise is all explained in the caption of figure 3.12. The exercise is the same as that in section 1.4.1, except that, while here we use volumetric probabilities, in section 1.4.1 we use sets. The initial (truncated) Gaussians here (initial information on x and on y) are such that the two-sigma intervals of the Gaussians correspond to the intervals defining the sets in section 1.4.1. Therefore, the results are directly comparable.

[Figure 3.12: Two quantities x and y have some definite values x_true and y_true, that we try to uncover. For the time being, we have some (independent) information on these two quantities, represented by the volumetric probabilities f(x) and p(y) (black functions in the figure). We then learn that, in fact, y_true = ϕ(x_true), with the function x ↦ y = ϕ(x) = x² (x − 3)³ represented at the top right. To obtain the volumetric probability representing the posterior information on x_true there is only the following way. The volumetric probability p(y) is transported from the ordinate to the abscissa axis, by application of the notion of preimage of a volumetric probability. This gives the volumetric probability [ϕ⁻¹(p)](x), by application of formula ??. This function is represented in blue (dashed line) at the bottom of the figure. The volumetric probability representing the posterior information on x_true is then obtained by the intersection of f(x) and of [ϕ⁻¹(p)](x), using formula ??. This gives the function g(x) represented in red at the bottom. In total, then, we have evaluated g = f ∩ ϕ⁻¹(p). To obtain the posterior information on y_true we just transport the g(x) just obtained from the abscissa into the ordinate axis, i.e., we compute the image q = ϕ(g) of g using formula ??. This gives the function q(y) represented in red at the left of the figure. We then have q = ϕ(g) = ϕ( f ∩ ϕ⁻¹(p) ). We could have arrived at q(y) following a different route: we could first have evaluated the image of f(x), to obtain the function [ϕ( f )](y), represented in blue (dashed line) at the left. The intersection of p(y) with [ϕ( f )](y) then gives the same q(y), because, as demonstrated in the text, ϕ( f ∩ ϕ⁻¹(p) ) = ϕ( f ) ∩ p. Note: the original functions f(x) and p(y) were two Gaussians (respectively centered at x = 3 and y = 18, and with standard deviations σ_x = 1 and σ_y = 2), truncated inside the working intervals x ∈ [ −1/2, 7 ] and y ∈ [ ϕ(−1/2), ϕ(7) ].]

3.5 Probabilities over Metric Manifolds

Note: explain here that to introduce the notion of conditional volumetric probability given the constraint Q = ϕ(P), we need more than volume manifolds, we need metric manifolds (in order to be able to take a "uniform limit").

3.5.1 Conditional Probability

3.5.1.1 First Piece of Text

Note: explain here that a condition is a subset. Example: a ≥ 3 ( P( a | a ≥ 3 ) ). Example: b = a² ( P( a, b | b = a² ) ).

Consider a probability P over a set A₀, and a given set C ⊆ A₀ of nonzero probability (i.e., such that P[C] ≠ 0). The set C is called the condition. The conditional probability (with respect to the condition C) is, by definition, the probability (over A₀) that to any A ⊆ A₀ associates the number, denoted P[ A | C ], defined as

P[ A | C ] = P[ A ∩ C ] / P[C] .    (3.162)

This number is called the conditional probability of A given (the condition) C. This, of course, is the original Kolmogorov definition of conditional probability, so, to demonstrate that this defines, indeed, a probability, we can just outline here the original demonstration. (Note: do it!)

Example 3.16 A probability P over a discrete set A₀ with elements {a, a′, …} is represented by the elementary probability p defined, as usual, by p(a) = P[{a}]. Then, the probability of any set A ⊆ A₀ is P[A] = Σ_{a∈A} p(a). Given, now, a set C ⊆ A₀, the elementary probability that represents the probability P[ · | C ], denoted p( · | C ), is given (for any a ∈ A₀) by

p( a | C ) = p(a)/ν if a ∈ C ,  p( a | C ) = 0 if a ∉ C ,    (3.163)

where ν is the normalizing constant ν = Σ_{a∈C} p(a). Then, for any A ⊆ A₀, P[ A | C ] = Σ_{a∈A} p( a | C ).

Example 3.17 Let us consider a probability over ℕ × ℕ. By definition, then, each element a ∈ ℕ × ℕ is an ordered pair of natural numbers, that we may denote a = {n, m}. To any probability P over ℕ × ℕ is associated a nonnegative real function p(n, m) defined as

p(n, m) = P[ {n, m} ] .    (3.164)

Then, for any A ⊆ ℕ × ℕ,

P[A] = Σ_{{n,m}∈A} p(n, m) .    (3.165)

As suggested above, we call p(n, m) the elementary probability of the element {n, m}. While P associates a number to every subset of ℕ × ℕ, p associates a number to every element of ℕ × ℕ. (Note: I have already introduced this notion above; say where.) Introduce now the condition m = 3. This corresponds to the subset C of ℕ × ℕ made of all pairs of numbers of the form {n, 3}. If P is such that P[C] ≠ 0, we can introduce the conditional probability P[ · | C ]. To this conditional probability is associated an elementary probability q(n, m), that we may also denote p( n, m | m = 3 ). It can be expressed as

q(n, m) = p( n, m | m = 3 ) = p(n, m) / Σ_{n′∈ℕ} p(n′, 3) if m = 3 ,  and 0 if m ≠ 3 .    (3.166)

Note that, although the elementary probability q(n, m) takes only nonzero values when m = 3, it is a probability over ℕ × ℕ, not a probability over ℕ.

The definition of conditional probability applies equally well when we deal with discrete probabilities or when we deal with manifolds. But when working with manifolds, there is one particular circumstance that needs clarification: when, instead of conditioning the original probability by a subset of points that has (as a manifold) the same dimension as the original manifold, we consider a submanifold with a lower number of dimensions. As this situation shall have a tremendous practical importance (when dealing with the so-called "inverse problems"), let us examine it here.

Consider a manifold M with m dimensions, and let C be a (strict) submanifold of M. Denoting by c the number of dimensions of C, then, by hypothesis, c < m. Consider now a probability P defined over M. As usual, P[A] then denotes the probability of any set of points A ⊆ M. Can we define P[ A | C ], the conditional probability over M given the (condition represented by the) submanifold C? The answer is negative (unless some extra ingredient is added). Let us see this. (Note: explain here that we need to take a uniform limit, and, for that, we need a metric manifold.)

Example 3.18 Let M be a finite-dimensional manifold, with points denoted P, P′, …, and let C be a submanifold of M, i.e., a manifold contained in M and having a smaller number of dimensions. Let now P denote a probability function over M: to
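Note: a toy numerical version of example 3.17 (equation 3.166), on a truncated grid standing for ℕ × ℕ (Python; values are random, for illustration only):

    import numpy as np
    rng = np.random.default_rng(3)
    p = rng.random((20, 20))
    p /= p.sum()                          # elementary probability p(n, m)
    q = np.zeros_like(p)
    q[:, 3] = p[:, 3] / p[:, 3].sum()     # nonzero only on the line m = 3
    assert abs(q.sum() - 1.0) < 1e-12     # q is still a probability on N x N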

any set A ⊆ M it associates the (unconditional) probability value P[ A ]. How can we characterize the conditional probability function P[ · | C ] that to any set A ⊆ M associates the probability value P[ A | C ]? This can be done by considering some set C′ ⊆ M (having the same dimension as M), and taking a uniform limit C′ → C. Such a limit can only be taken if the manifold M is metric (note: explain why). This has two implications. First, there is a volume element over M, that we may denote dv_M, and, therefore, the unconditional probability P can be represented by a volumetric probability f(P), such that, for any A ⊆ M,

P[ A ] = ∫_{P∈A} dv_M f( P ) .    (3.167)

The second implication is that a metric over the manifold M induces a metric over any submanifold of M. Therefore, there will also be a volume element dv_C over C (whose expression we postpone to example 3.19). Given this volume element, we can introduce the conditional volumetric probability over C, that we may denote f( P | C ). By definition, then, for any set B ⊆ C,

P[ B | C ] = ∫_{P∈B} dv_C f( P | C ) .    (3.168)

It turns out (note: refer here to the demonstration in the appendices) that (at every point P of C) the conditional volumetric probability equals the unconditional one:

f( P | C ) = f( P ) .    (3.169)

This result is not very surprising, as we have introduced volumetric probabilities to have this kind of simple result (that would not hold if working with probability densities). It remains, in this example, that the most important problem, to express the induced volume element dv_C, is not solved (it is solved below).

Example 3.19 Let M and N be two finite-dimensional manifolds, with points respectively denoted P, P′, … and Q, Q′, …, and let ϕ be a mapping from M into N. We can represent this mapping as P ↦ Q = ϕ(P). The points {P, Q} of the manifold M × N that satisfy Q = ϕ(P), i.e., the points of the form { P, ϕ(P) }, constitute a submanifold, say C, of M × N. The dimension of C equals the dimension of M. Consider now a probability P over M × N. By definition, the (unconditional) probability value of any set A ⊆ M × N is P[A]. Which is the conditional probability P[ A | Q = ϕ(P) ]? Again, one starts by introducing some set C′ ⊆ M × N, and tries to define the conditional probability function by taking a uniform limit C′ → C. To define a uniform limit, we need a metric over M × N. Typically, one has a metric over M, with distance element ds²_M, and a metric over N, with distance element ds²_N, and one defines ds²_{M×N} = ds²_M + ds²_N. Let g_M denote the metric tensor over M and let g_N denote the metric tensor over N. Then, the metric tensor over M × N is g_{M×N} = g_M ⊕ g_N, and, as demonstrated in appendix XXX, the metric tensor induced over C is

g_C = g_M + Φᵗ g_N Φ ,    (3.170)

where Φ denotes the (linear) tangent mapping to the mapping ϕ (at the considered point), and where Φᵗ is the transpose of this linear mapping. We then have (i) a volume element dv_M over M (remember that a choice of coordinates over a metric manifold induces a capacity element dv̄, and that the volume element can then be expressed as dv = ḡ dv̄, with ḡ = √(det g)), (ii) a volume element dv_N over N, (iii) a volume element dv_{M×N} = dv_M dv_N (note: check this!) over M × N, and (iv) a volume element dv_C over C (the volume element associated to the metric given in equation 3.170).

We can now come back to the original (unconditional) probability P[ · ] defined over M × N. Associated to it is the (unconditional) volumetric probability f(P, Q), and, by definition, the probability of any set A ⊆ M × N is

P[ A ] = ∫_{{P,Q}∈A} dv_{M×N} f(P, Q) .    (3.171)

The conditional volumetric probability f( P, Q | Q = ϕ(P) ), associated to the conditional probability function P[ · | Q = ϕ(P) ], is, by definition, such that the conditional probability of any set B ⊆ C is obtained as

P[ B | Q = ϕ(P) ] = ∫_{{P,Q}∈B} dv_C f( P, Q | Q = ϕ(P) ) .    (3.172)

But, for the reason explained in the previous example, for any point (on C), f( P, Q | Q = ϕ(P) ) = f(P, Q), so, for any set B ⊆ C,

P[ B | Q = ϕ(P) ] = ∫_{{P,Q}∈B} dv_C f(P, Q) .    (3.173)

There are two differences between the integrals at the right in expressions 3.171 and 3.173: (i) the first integral is over a subset of the manifold M × N, while the second is over a subset of the submanifold C; and (ii) in the first integral, the volume element is the original volume element over M × N, while in the second, it is the volume element over C that is associated to the induced metric g_C (expressed in equation 3.170). (Note: say here that the expression of this volume element, when some coordinates are chosen over the manifolds, is given elsewhere in this text.)

Note: in the next section, the marginal of the conditional is defined. Using the formulas obtained elsewhere in this text (see section 3.5.4), we obtain, for the marginal of the conditional, the result that the probability of any set M′ ⊆ M is

P_M[ M′ | Q = ϕ(P) ] = (1/ν) ∫_{P∈M′} dv_M ω(P) f( P, ϕ(P) ) ,    (3.174)

where

ω(P) = √( det( g_M + Φᵗ g_N Φ ) / det g_M ) ,    (3.175)

and where ν is a normalization constant. In a typical inverse problem one has

f(P, Q) = g(P) h(Q) ,    (3.176)

and equation 3.174 becomes

    P_M[A | Q = φ(P)] = (1/ν) ∫_{P∈A} dv_M ω(P) g(P) h(φ(P)) .    (3.177)

While the prior volumetric probability is g(P), we see that the posterior volumetric probability is

    i(P) = (1/ν) g(P) L(P) ,    (3.178)

where the likelihood volumetric probability L(P) is

    L(P) = ω(P) h(φ(P)) .    (3.179)

Note that we have the factor ω(P) that is not there when, instead of the 'marginal of the conditional' approach, we follow the 'mapping of probabilities' approach.
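The role of the factor ω(P) is easy to check numerically. The following sketch is not from the text: the one-dimensional manifolds, the mapping φ(P) = P², and the Gaussian-shaped g and h are illustrative assumptions. It evaluates the posterior 3.178 on a grid, with and without the ω(P) of equation 3.175.

    import numpy as np

    # Assumed setting: one-dimensional M and N with constant metrics,
    # mapping phi(P) = P**2, Gaussian-shaped g and h (all illustrative):
    g_M, g_N = 1.0, 1.0
    phi = lambda P: P**2
    Phi = lambda P: 2.0 * P               # tangent mapping d(phi)/dP

    P = np.linspace(0.01, 3.0, 2000)
    dP = P[1] - P[0]
    gfun = lambda P: np.exp(-0.5 * (P - 1.0)**2)        # prior g(P)
    hfun = lambda Q: np.exp(-0.5 * (Q - 2.0)**2 / 0.3)  # h(Q) over N

    # omega(P) of equation 3.175 (scalars, so determinants are trivial):
    omega = np.sqrt((g_M + Phi(P) * g_N * Phi(P)) / g_M)

    post = gfun(P) * omega * hfun(phi(P))       # equation 3.177 integrand
    post /= post.sum() * dP                     # fixes the constant nu

    post_mp = gfun(P) * hfun(phi(P))            # 'mapping of probabilities'
    post_mp /= post_mp.sum() * dP
    print(P[np.argmax(post)], P[np.argmax(post_mp)])  # the two maxima differ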

3.5.1.2 Second Piece of Text

We need here to introduce a very special probability distribution, denoted H_B, that is homogeneous inside a domain B ⊂ M and zero outside: for any domain B with finite volume V(B) we then have the volumetric probability

    h_B(P) = 1/V(B) if P ∈ B ; h_B(P) = 0 if P ∉ B .    (3.180)

Warning: this definition has already been introduced. What is the result of the intersection P ∩ H_B of this special probability distribution with a normed probability distribution P? This intersection can be evaluated via the product of the volumetric probabilities, which, in normalized form (equation ??), gives (p ∧ h_B)(P) = p(P) / ∫_{P′∈B} dv(P′) p(P′) if P ∈ B, and zero if P ∉ B. As ∫_{P∈B} dv(P) p(P) = P(B), we have, in fact,

    (p ∧ h_B)(P) = p(P)/P(B) if P ∈ B ; (p ∧ h_B)(P) = 0 if P ∉ B .    (3.181)

The probability of a domain A ⊂ M is to be calculated as (P ∩ H_B)(A) = ∫_{P∈A} dv(P) (p ∧ h_B)(P) = (1/P(B)) ∫_{P∈A∩B} dv(P) p(P), and this gives

    (P ∩ H_B)(A) = P(A ∩ B) / P(B) .    (3.182)

The expression at the right corresponds to the usual definition of conditional probability. One usually says that it represents the probability of the domain A 'given' the domain B, this suggesting a possible use of the definition. For the time being, I prefer just to regard this expression as the result of the intersection of an arbitrary probability distribution P and the probability distribution H_B (that is homogeneous inside B and zero outside).

Note: should I introduce here the notion of volumetric probability, or should I simply send the reader to the appendix? Note: mention somewhere figures 3.13 and 3.14.

Figure 3.13: Illustration of the intersection of two probability distributions, via the product of the associated volumetric probabilities: (f ∧ g)(x) = k f(x) g(x).
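As a sanity check on equations 3.181–3.182, the renormalization can be reproduced on a one-dimensional grid; the exponential density and the intervals A and B below are arbitrary assumptions, not taken from the text.

    import numpy as np

    x = np.linspace(0.0, 10.0, 10001)
    dx = x[1] - x[0]
    p = np.exp(-x)                        # an arbitrary normalized density
    inB = (x > 1.0) & (x < 3.0)           # the domain B (assumed)
    inA = (x > 2.0) & (x < 5.0)           # the domain A (assumed)

    p_cond = np.where(inB, p, 0.0)
    p_cond /= p_cond.sum() * dx           # equation 3.181: p/P(B) inside B

    lhs = (p_cond * inA).sum() * dx                              # (P n H_B)(A)
    rhs = (p * (inA & inB)).sum() * dx / ((p * inB).sum() * dx)  # P(A n B)/P(B)
    print(lhs, rhs)                       # equal, as equation 3.182 states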

Figure 3.14: Illustration of the definition of conditional probability. Given an initial probability distribution P(·) (left of the figure) and a set B (middle of the figure), P(·|B) is identical to P(·) inside B (except for a renormalization factor guaranteeing that P(B|B) = 1) and vanishes outside B (right of the figure): P(A|B) = P(A∩B)/P(B) ; p(x|B) = k p(x) H(x).

3.5.1.3 Third Piece of Text

Assume that there is a probability distribution defined over an n-dimensional metric manifold M_n. This probability distribution is represented by the volumetric probability f_n(P). Inside the manifold M_n we consider a submanifold M_p with dimension p ≤ n. Which is the probability distribution induced over M_p? This situation, schematized in figure 3.15, has a well-defined solution because we assume that the initial manifold M_n is metric.¹²

Figure 3.15: In an n-dimensional metric manifold M_n, some random points suggest a probability distribution. On this manifold, there is a submanifold M_p, and we wish to evaluate the probability distribution over M_p induced by the probability distribution over M_n.

The careful, quantitative analysis of this situation is done in appendix 5.3.6. The result obtained is quite simple, and could have been guessed. Let us explain it here in some detail. The initial volumetric probability (over M_n) is f_n(P), that we may assume to be normalized to one. The probability of a domain D_n ⊂ M_n is evaluated via

    P(D_n) = ∫_{P∈D_n} dv_n(P) f_n(P) .    (3.183)

It happens (see appendix 5.3.6) that the induced volumetric probability over M_p has the value

    f_p(P) = f_n(P) / ∫_{P′∈M_p} dv_p(P′) f_n(P′)    (3.184)

at any point P ∈ M_p (and is undefined at any point P ∉ M_p). So, on the submanifold M_p, the induced volumetric probability f_p(P) takes the same values as the original volumetric probability f_n(P), except for a renormalization factor. The probability of a domain D_p ⊂ M_p is to be evaluated as

    P(D_p) = ∫_{P∈D_p} dv_p(P) f_p(P) .    (3.185)

¹² Therefore, we can consider a 'bar' of constant thickness around M_p and take the limit where the thickness tends uniformly to zero. All this can be done without using any special coordinate system, so we obtain an invariant result. If the manifold is not metric, there is no way to define a uniform limit.

Example 3.20: In the Euclidean 3D space, consider an isotropic Gaussian probability distribution with standard deviation σ. Which is the conditional (2D) volumetric probability it induces on the surface of a sphere of unit radius whose center is at unit distance from the center of the Gaussian? Using geographical coordinates (see figure 3.16), the answer is given by the (2D) volumetric probability

    f(φ, λ) = k exp( sin λ / σ² ) ,    (3.186)

where k is a norming constant (demonstration in section 5.3.14). This is the celebrated Fisher probability distribution, widely used as a model probability on the sphere's surface. The surface element over the surface of the sphere could be obtained using equations 3.190–3.192, but it is well known to be ds(φ, λ) = cos λ dφ dλ.

Figure 3.16: The spherical Fisher distribution corresponds to the conditional probability distribution induced over a sphere by a Gaussian probability distribution in a Euclidean 3D space (see example 3.20). To have a full 3D representation of the property, this figure should be rotated around the vertical axis.

Equations 3.183–3.185 contain the induced volume element dv_p(P). Sometimes, this volume element is directly known, as in example 3.20, but when dealing with abstract manifolds, while the original n-dimensional volume element dv_n(P) may be given, the p-dimensional volume element dv_p(P) must be evaluated. To do this, let us take over M_n a coordinate system x = {x¹, …, xⁿ} adapted to our problem, separating these coordinates into a group of p coordinates r = {x¹, …, x^p} and a group of q = n − p coordinates s = {s¹, …, s^q}, such that the p-dimensional manifold M_p is defined by the set of q constraints

    s = s(r) .    (3.187)

Note that the coordinates r define a coordinate system over the submanifold M_p. The two equations 3.184–3.185 can now be written

    f_p(r | s(r)) = f(r, s(r)) / ∫_{r′∈M_p} dv_p(r′) f(r′, s(r′))    (3.188)

and

    P(D_p) = ∫_{r∈D_p} dv_p(r) f_p(r | s(r)) .    (3.189)
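Example 3.20 can be checked numerically: with the surface element cos λ dφ dλ, the norming constant k of equation 3.186 must equal 1/(4π σ² sinh(1/σ²)). The sketch below is mine, not the book's, and the value of σ is an arbitrary choice.

    import numpy as np

    sigma = 0.5                                   # arbitrary choice
    lam = np.linspace(-np.pi / 2, np.pi / 2, 100001)
    dlam = lam[1] - lam[0]

    f = np.exp(np.sin(lam) / sigma**2)            # unnormalized f of 3.186
    # Integrate against the surface element cos(lam) dphi dlam (the phi
    # integral just contributes a factor 2 pi):
    total = 2.0 * np.pi * (f * np.cos(lam)).sum() * dlam
    k = 1.0 / total                               # norming constant

    # The same integral done analytically gives 4 pi sigma^2 sinh(1/sigma^2):
    k_exact = 1.0 / (4.0 * np.pi * sigma**2 * np.sinh(1.0 / sigma**2))
    print(k, k_exact)                             # the two values agree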

In equations 3.188–3.189, the notation f_p(r | s(r)) is used instead of just f_p(r) as a reminder of the constraint defining the conditional volumetric probability. The volume element of the submanifold can be written

    dv_p(r) = √(ḡ_p(r)) dv̄_p(r) ,    (3.190)

with dv̄_p(r) = dr¹ ⋯ dr^p, and where ḡ_p(r) is the metric determinant of the metric g_p(r) induced over the submanifold M_p by the original metric over M_n:

    ḡ_p(r) = det g_p(r) .    (3.191)

We have obtained in section 5.2.2 (equation 5.15, page 178):

    g_p(r) = g_rr + g_rs S + S^T g_sr + S^T g_ss S ,    (3.192)

where S is the matrix of partial derivatives of the relations s = s(r). The probability of a domain D_p ⊂ M_p can then either be computed using equation 3.189, or as

    P(D_p) = ∫_{r∈D_p} dv̄_p(r) √(ḡ_p(r)) f_p(r | s(r)) .    (3.193)

The reader should realize that we have here a conditional volumetric probability, not a conditional probability density. The expression for the conditional probability density is given in appendix 5.3.7 (equation 5.92, page 198), and is quite complicated. This is so because we care here to introduce a proper (i.e., metric) limit. The expressions proposed in the literature, that look like expression 3.188 but are written with probability densities, are quite arbitrary. The mistake, there, is to take, instead of a uniform limit, a limit that is guided by the coordinates being used, as if a coordinate increment was equivalent to a distance. See the discussion in figures 3.17 and 3.18.

Figure 3.17: In a two-dimensional manifold, some random points suggest a probability distribution. On the manifold, there is a curve, and we wish to evaluate the probability distribution over the curve induced by the probability distribution over the manifold.

Note: say somewhere that a nontrivial example of application of the notion of conditional volumetric probability is made in section 5.5.6.3 (adjusting a measurement to a theory).
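Equations 3.190–3.193 translate directly into code. Below is a minimal sketch under assumed inputs (p = 2, q = 1, a Euclidean metric, and a paraboloid constraint s(r) = r₁² + r₂²; none of these choices comes from the text):

    import numpy as np

    def induced_metric(g_rr, g_rs, g_sr, g_ss, S):
        # g_p = g_rr + g_rs S + S^T g_sr + S^T g_ss S   (equation 3.192)
        return g_rr + g_rs @ S + S.T @ g_sr + S.T @ g_ss @ S

    # Assumed inputs: p = 2, q = 1, Euclidean metric on M_3, and the
    # constraint s(r) = r1**2 + r2**2 (a paraboloid):
    p, q = 2, 1
    g_rr, g_ss = np.eye(p), np.eye(q)
    g_rs, g_sr = np.zeros((p, q)), np.zeros((q, p))

    r = np.array([0.3, 0.4])
    S = np.array([[2.0 * r[0], 2.0 * r[1]]])   # q x p matrix of ds/dr

    g_p = induced_metric(g_rr, g_rs, g_sr, g_ss, S)
    print(np.sqrt(np.linalg.det(g_p)))         # the factor in equation 3.193
    print(np.sqrt(1.0 + 4.0 * (r**2).sum()))   # = sqrt(1 + |grad s|^2) here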

Figure 3.18: To properly define the induced probability over the curve (see figure 3.17), one has to take a domain around the curve, evaluate the finite probability, and take the limit when the size of the domain tends to zero. The only intrinsic definition of the limit can be made when the considered manifold is metric, as the limit can then be taken uniform and normal to the curve. Careless definitions of conditional probability density work without assuming that there is a metric over the manifold, and just take a sort of 'vertical' limit, as suggested in the middle. This is as irrelevant as it would be to take a 'horizontal' limit (at the right). Some of the paradoxes of probability theory (like Borel's paradox) arise from this inconsistency.

Example 3.21: In the case where we work in a two-dimensional manifold M₂, with p = q = 1, we can use the plain notation r and s instead of the vector notation r and s, so that the constraint 3.187 is written

    s = s(r) ,    (3.194)

and the matrix of partial derivatives is now a simple real quantity, S = ds/dr. The conditional volumetric probability on the line s = s(r) induced by a volumetric probability f(r, s) is (equation 3.188)

    f₁(r) = f(r, s(r)) / ∫ dl(r′) f(r′, s(r′)) ,    (3.195)

where, if the metric of the manifold M₂ is written

    g(r, s) = ( g_rr(r, s)  g_rs(r, s) )
              ( g_sr(r, s)  g_ss(r, s) ) ,

the (1D) volume element is (equations 3.190–3.192)

    dl(r) = √( g_rr(r, s(r)) + 2 S(r) g_rs(r, s(r)) + S(r)² g_ss(r, s(r)) ) dr .    (3.196)

The probability of an interval r₁ < r < r₂ along the line s = s(r) is then P = ∫_{r₁}^{r₂} dl(r) f₁(r). If the constraint 3.194 is, in fact, s = s₀, then equation 3.195 simplifies into

    f₁(r) = f(r, s₀) / ∫ dl(r′) f(r′, s₀) ,    (3.197)

and, as the partial derivative vanishes, S = 0, the length element 3.196 becomes

    dl(r) = √( g_rr(r, s₀) ) dr .    (3.198)

Example 3.22: Consider two Cartesian coordinates {x, y} on the Euclidean plane, associated to the usual metric ds² = dx² + dy². It is easy to see (using, for instance, equation 2.65) that the metric matrix associated to the new coordinates (see figure 3.19)

    r = x ; s = x y    (3.199)

is

    g(r, s) = ( 1 + s²/r⁴   −s/r³ )
              ( −s/r³       1/r²  ) ,    (3.200)

with metric determinant det g(r, s) = 1/r². Assume that all we know about the position of a given point is described by the volumetric probability f(r, s). Then, we are told that, in fact, the point is on the line defined by the equation s = s₀. What can we now say about the coordinate r of the point? This is clearly a problem of conditional volumetric probability, and the information we now have on the position of the point is represented by the volumetric probability (on the line s = s₀) given by equation 3.197:

    f₁(r) = f(r, s₀) / ∫ dl(r′) f(r′, s₀) .    (3.201)

Here, considering the special form of the metric in equation 3.200, the length element given by equation 3.198 is

    dl(r) = √( 1 + s₀²/r⁴ ) dr .    (3.202)

The special case s = s₀ = 0 gives

    f₁(r) = f(r, 0) / ∫ dl(r′) f(r′, 0) ; dl(r) = dr .    (3.203)

Figure 3.19: The Euclidean plane, with, at the left, two Cartesian coordinates {x, y}, and, at the right, the two coordinates u = x ; v = x y.

Example 3.23: To address a paradox mentioned by Jaynes (2003), let us solve the same problem solved in the previous example, but using the Cartesian coordinates {x, y}. The information that was represented by the volumetric probability f(r, s) is now represented by the volumetric probability h(x, y) given by (as volumetric probabilities are invariant objects)

    h(x, y) = f(r, s)|_{r=x ; s=x y} .    (3.204)

As the condition s = 0 is equivalent to the condition y = 0, and as the metric matrix is the identity, it is clear that we shall arrive, for the (1D) volumetric probability representing the information we have on the coordinate x, at

    h₁(x) = h(x, 0) / ∫ dl(x′) h(x′, 0) ; dl(x) = dx .    (3.205)

Not only is this equation similar in form to equation 3.203; replacing here h by f (using equation 3.204), we obtain an identity that can be expressed using any of the two equivalent forms

    h₁(x) = f₁(r)|_{r=x} ; f₁(r) = h₁(x)|_{x=r} .    (3.206)

Along the line s = y = 0, the two coordinates r and x coincide, so we obtain the same volumetric probability (with the same length elements dl(x) = dx and dl(r) = dr). Trivial as it may seem, this result is not that found with the traditional definition of conditional probability density. Jaynes (2003) lists this as one of the paradoxes of probability theory. It is not a paradox; it is a mistake one makes when falling into the illusion that a conditional probability density (or a conditional volumetric probability) can be defined without invoking the existence of a metric (i.e., of a notion of distance) in the working space. This paradox is related to the Borel–Kolmogorov paradox, that I address in appendix 5.3.10.
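The computation behind examples 3.22 and 3.23 is easy to reproduce: pull the Euclidean metric of the {x, y} plane back to the coordinates {r, s} = {x, xy} through the Jacobian, and compare with equations 3.200 and 3.202. A sketch (the numerical point is arbitrary):

    import numpy as np

    def g_rs(r, s):
        # Jacobian of (x, y) with respect to (r, s), where x = r, y = s/r:
        J = np.array([[1.0, 0.0],
                      [-s / r**2, 1.0 / r]])
        return J.T @ J        # pull-back of the Euclidean metric

    r, s = 1.7, 0.6           # an arbitrary point
    g = g_rs(r, s)
    print(g)                                            # matches equation 3.200
    print(np.linalg.det(g), 1.0 / r**2)                 # det g = 1/r^2
    print(np.sqrt(g[0, 0]), np.sqrt(1.0 + s**2 / r**4)) # dl/dr of equation 3.202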

3.5.2 Marginal of a Conditional Probability

Let us place ourselves here in the situation where both conditional and marginal probabilities make sense: we have two sets A₀ and B₀, and we introduce their Cartesian product S₀ = A₀ × B₀; if the two sets A₀ and B₀ are, in fact, manifolds, they are assumed to be metric (with respective length elements ds²_{A₀} and ds²_{B₀}), and the metric over S₀ is introduced as ds²_{S₀} = ds²_{A₀} + ds²_{B₀}.

Given a particular probability function P over S₀ = A₀ × B₀ and given a particular set C ⊂ S₀ with P[C] ≠ 0 (the set C is the 'condition'), the conditional probability function given C is introduced as usual: it is the probability function over S₀ that to any set S ⊂ S₀ associates the probability value

    P[S | C] ≡ P[S ∩ C] / P[C] .    (3.207)

This conditional probability function has two marginal probability functions (see section 3.2.7): the probability function over A₀ that to any set A ⊂ A₀ associates the probability value

    P_{A₀}[A | C] ≡ P[A × B₀ | C] ,    (3.208)

and the probability function over B₀ that to any set B ⊂ B₀ associates the probability value

    P_{B₀}[B | C] ≡ P[A₀ × B | C] .    (3.209)

Explicitly, this gives

    P_{A₀}[A | C] = P[(A × B₀) ∩ C] / P[C] and P_{B₀}[B | C] = P[(A₀ × B) ∩ C] / P[C] .    (3.210)

Example 3.24: When the two sets A₀ and B₀ are discrete, bla, bla, bla, and introducing the elementary probability p(a, b) via

    P[S] = Σ_{{a,b}∈S} p(a, b) ,    (3.211)

and when the conditioning set C corresponds to a mapping a ↦ b = φ(a), then, introducing the two elementary probabilities p_{A₀}(a | b = φ(a)) and p_{B₀}(b | b = φ(a)) via

    P_{A₀}[A | b = φ(a)] = Σ_{a∈A} p_{A₀}(a | b = φ(a)) ,    (3.212)

and

    P_{B₀}[B | b = φ(a)] = Σ_{b∈B} p_{B₀}(b | b = φ(a)) ,    (3.213)

we arrive at (note: check this!)

    p_{A₀}(a | b = φ(a)) = (1/ν) p(a, φ(a))    (3.214)

and (note: explain that, for every given b ∈ B₀, the summation is over all a ∈ A₀ such that φ(a) = b)

    p_{B₀}(b | b = φ(a)) = (1/ν) Σ_{a : φ(a)=b} p(a, φ(a)) ,    (3.215)

where ν is the normalization constant ν = Σ_{a∈A₀} p(a, φ(a)) or, equivalently (note: check!), ν = Σ_{b∈B₀} Σ_{a : φ(a)=b} p(a, φ(a)).

Example 3.25: (Note: give here a concrete example of the situation analyzed in example 3.24. One of the drawings?)

Example 3.26: The most important example for us shall be when the two sets A₀ and B₀ are, in fact, manifolds, say M and N, and the conditioning set C is not an ordinary subset of M × N but a submanifold of M × N. Then…

Example 3.27: (Note: give here a concrete example of the situation analyzed in example 3.26.)

A common situation is when the original probability function P (over A₀ × B₀) is the product of its two marginal probability functions. (Note: continue this, checking that the notion of independence has already been introduced.)
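A small numerical instance of equations 3.214–3.215 makes the 'check this!' notes easy to verify; the sets, the mapping φ, and the joint probability below are arbitrary assumptions:

    import numpy as np

    A0, B0 = [0, 1, 2, 3], [0, 1]
    phi = lambda a: a % 2                 # assumed conditioning map a -> b
    rng = np.random.default_rng(0)
    p = rng.random((len(A0), len(B0)))
    p /= p.sum()                          # an arbitrary joint probability p(a, b)

    nu = sum(p[a, phi(a)] for a in A0)    # the normalization constant

    # Equation 3.214:  p_A0(a | b = phi(a)) = p(a, phi(a)) / nu
    pA = np.array([p[a, phi(a)] / nu for a in A0])

    # Equation 3.215: for each b, sum over all a such that phi(a) = b
    pB = np.array([sum(p[a, phi(a)] for a in A0 if phi(a) == b) / nu
                   for b in B0])
    print(pA.sum(), pB.sum())             # both equal one, as they should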

3.5.3 Demonstration: marginals of the conditional

Equations 3.30 and 3.32 are demonstrated as follows. One considers a p-dimensional metric manifold M, that, for the sake of the demonstration, we may endow with some coordinates x = {x^α} = {x¹, …, x^p}. Denoting by γ(x) the metric tensor, the volume element can then be written as

    dv_M(x) = √(det γ(x)) dx¹ ⋯ dx^p .    (3.216)

One also considers a q-dimensional metric manifold N, also endowed with some coordinates y = {y^i} = {y¹, …, y^q}. Denoting by Γ(y) the metric tensor, the volume element is, then,

    dv_N(y) = √(det Γ(y)) dy¹ ⋯ dy^q .    (3.217)

Finally, one considers a mapping φ from M into N, that, with the given coordinates, can be written as y = φ(x). We shall denote by Φ the tangent linear mapping, in fact the matrix of partial derivatives Φ^i_α = ∂y^i/∂x^α. This is defined at every point x, so we can write Φ(x).

One introduces the (p+q)-dimensional manifold M × N. It is easy to see that the mapping x ↦ y = φ(x) defines, inside M × N, a submanifold whose dimension is p (the dimension of M). Therefore, the coordinates {x^α} (of M) can also be used as coordinates over that submanifold. Introducing, on that submanifold, the line element ds² = ds²_M + ds²_N, one can successively write

    ds² = ds²_M + ds²_N
        = γ_αβ dx^α dx^β + Γ_ij dy^i dy^j
        = γ_αβ dx^α dx^β + Γ_ij Φ^i_α Φ^j_β dx^α dx^β
        = ( γ_αβ + Φ^i_α Γ_ij Φ^j_β ) dx^α dx^β ,    (3.218)

this showing that the components of the metric over the submanifold are γ_αβ + Φ^i_α Γ_ij Φ^j_β. Said otherwise, the metric at point x of the submanifold is γ(x) + Φ^t(x) Γ(φ(x)) Φ(x). This implies that the volume element induced on the submanifold is

    dv(x) = √( det( γ(x) + Φ^t(x) Γ(φ(x)) Φ(x) ) ) dx¹ ⋯ dx^p .    (3.219)

Let now h(x, y) be a volumetric probability over M × N. The conditional volumetric probability is defined as the limit (which one? which one? which one?). This will lead to a volumetric probability c(x) (remember that we are using the coordinates {x^α} over the submanifold) that we shall express in a moment. But let us make clear before that, to evaluate the probability of a set A (at the same time a set

of M and of the submanifold), the volumetric probability c(x) has to be integrated as

    P[A] = ∫_A dv(x) c(x) ,    (3.220)

with the volume element dv(x) expressed in equation 3.219. Because (what? what? what?), the value of c(x) at any point (x, φ(x)) is just proportional to h(x, φ(x)),

    c(x) = (1/ν) h(x, φ(x)) ,    (3.221)

where ν is the normalization constant

    ν = ∫_M dv(x) h(x, φ(x)) .    (3.222)

Now the two marginals f(x) and g(y) can be introduced by considering some elementary probability dP on the submanifold, and by projecting it into M and into N. This can be written by identifying the three expressions

    dP = h(x, φ(x)) √( det( γ(x) + Φ^t(x) Γ(φ(x)) Φ(x) ) ) dx¹ ⋯ dx^p
       = f(x) √( det γ(x) ) dx¹ ⋯ dx^p
       = g(y) √( det Γ(y) ) dy¹ ⋯ dy^q ,    (3.223)

i.e.,

    dP = h(x, φ(x)) √( det( γ(x) + Φ^t(x) Γ(φ(x)) Φ(x) ) ) dx¹ ⋯ dx^p
       = f(x) √( det γ(x) ) dx¹ ⋯ dx^p
       = g(y) √( det( Φ^t(x) Γ(φ(x)) Φ(x) ) ) dx¹ ⋯ dx^p .    (3.224)

(Note: what have I done here???) And from that, it would follow

    f(x) = (1/ν) h(x, φ(x)) √( det( γ(x) + Φ^t(x) Γ(φ(x)) Φ(x) ) / det γ(x) )    (3.225)

and

    g(y) = (1/ν) Σ_{x : φ(x)=y} h(x, y) √( det( γ(x) + Φ^t(x) Γ(y) Φ(x) ) / det( Φ^t(x) Γ(y) Φ(x) ) ) .    (3.226)

3.5.4 Bayes Theorem

3.5.4.1 First Piece of Text

When the manifold M_n just considered is general, the conditional probability distribution we have defined lives on the considered surface, and there is not much more we can do. In the special situation where the manifold M_n has been built as the Cartesian product of two manifolds R_p and S_q, we can advance a little bit further.

Case M_n = R_p × S_q. Assume that we have a p-dimensional metric manifold R_p, with some coordinates r = {r^α}, and a metric tensor denoted g_r = {g_αβ}. We also have a q-dimensional metric manifold S_q, with some coordinates s = {s^a}, and a metric tensor denoted g_s = {g_ab}. Given two such manifolds, we can always introduce the manifold M_n = M_{p+q} = R_p × S_q, i.e., a manifold whose points consist of a pair of points, one in R_p and one in S_q. As R_p and S_q are both metric manifolds, it is always possible to also endow M_{p+q} with a metric. While the distance element over R_p is ds²_r = (g_r)_αβ dr^α dr^β, and the distance element over S_q is ds²_s = (g_s)_ab ds^a ds^b, the distance element over M_{p+q} is ds² = ds²_r + ds²_s.

The relation

    s = s(r)    (3.227)

considered above (equation 3.187) can now be considered as a mapping from R_p into S_q. The conditional volumetric probability

    f_p(r | s(r)) = const · f(r, s(r))    (3.228)

of equation 3.188 is defined over the submanifold M_p ⊂ M_{p+q} defined by the constraints s = s(r). On this submanifold we integrate as (equation 3.193)

    P(D_p) = ∫_{r∈D_p} dv̄_p(r) √(ḡ_p(r)) f_p(r | s(r)) ,    (3.229)

with dv̄_p(r) = dr¹ ⋯ dr^p (because the coordinates {r^α} are also being used as coordinates over the submanifold), and where the metric determinant ḡ_p(r), given in equations 3.191 and 3.192, here simplifies into

    ḡ_p(r) = det g_p(r) ; g_p = g_r + S^T g_s S .    (3.230)

Again, this volumetric probability is defined over the submanifold of M_{p+q} corresponding to the constraints s = s(r). Can we consider a volumetric probability defined over R_p? Yes, of course, and this is quite easy, as we are already considering over the submanifold the coordinates {r^α} that are, in fact, the coordinates of R_p. The only

difference is that instead of evaluating integrals using the induced metric of equations 3.230, we have to use the actual metric of R_p, i.e., the metric g_r. The basic criterion that shall allow us to make the link between the volumetric probability f_p(r | s(r)) (in equation 3.228) and a volumetric probability, say f_r(r | s(r)), defined over R_p, is that the integral over a given domain defined by the coordinates {r^α} gives the same probability (as the coordinates {r^α} are common to the submanifold and to R_p). We easily obtain that the volumetric probability defined over R_p is

    f_r(r | s(r)) = const · √( det( g_r + S^T g_s S ) / det g_r ) f(r, s(r)) ,    (3.231)

and we integrate on a domain D_p ⊂ R_p as

    P(D_p) = ∫_{r∈D_p} dv̄_r(r) √(ḡ_r(r)) f_r(r | s(r)) = ∫_{r∈D_p} dv_r(r) f_r(r | s(r)) ,    (3.232)

with dv̄_r(r) = dr¹ ⋯ dr^p, ḡ_r(r) = det g_r, and dv_r(r) = √(ḡ_r(r)) dv̄_r(r). See figure 3.20 for a schematic representation of this definition of a volumetric probability over R_p. When there is no risk of confusion with the function f_p(r | s(r)) of equation 3.188, we shall also call f_r(r | s(r)) the conditional volumetric probability for r, given s = s(r). Remember that while f_p(r | s(r)) is defined on the p-dimensional submanifold M_p ⊂ M_{p+q}, f_r(r | s(r)) is defined over R_p.

As a special case, the relation s = s(r) may just be

    s = s₀ ,    (3.233)

in which case the matrix of partial derivatives vanishes, S = 0. In this case, the function f_r(r | s(r)) simplifies into f_r(r | s₀) = const · f(r, s₀). When dropping the index 0 in s₀, we just write f_r(r | s) = const · f(r, s), or, in normalized form,

    f_r(r | s) = f(r, s) / ∫_{R_p} dv̄_r(r) √(ḡ_r(r)) f(r, s) = f(r, s) / ∫_{R_p} dv_r(r) f(r, s) .    (3.234)

Example 3.28: With the notations of this section, consider that the metric g_r of the space R_p and the metric g_s of the space S_q are constant (i.e., that both the coordinates r^α and s^i are rectilinear coordinates in Euclidean spaces), and that the application s = s(r) is a linear application, that we can write

    s = S r ,    (3.235)

as this is consistent with the definition of S as the matrix of partial derivatives, S^i_α = ∂s^i/∂r^α. Consider that we have a Gaussian probability distribution over the space R_p, represented by the volumetric probability

    f_p(r) = (1/(2π)^{p/2}) exp( −(1/2) (r − r₀)^t g_r (r − r₀) ) ,    (3.236)

Figure 3.20: In an n-dimensional space M_n that is the Cartesian product of two spaces R_p and S_q, with coordinates r = {r¹, …, r^p} and s = {s¹, …, s^q} and metric tensors g_r and g_s, there is a volume element on each of R_p and S_q, and an induced volume element in M_n = R_p × S_q. Given a p-dimensional submanifold s = s(r) of M_n, there also is an induced volume element on it. A volumetric probability f(r, s) over M_n induces a (conditional) volumetric probability f_p(r) over the submanifold s = s(r) (equation 3.188), and, as the submanifold shares the same coordinates as R_p, a volumetric probability f_r(r) is also induced over R_p (equation 3.231).

that is normalized via ∫ dr¹ ⋯ dr^p √(det g_r) f_p(r) = 1. Similarly, consider that we also have a Gaussian probability distribution over the space S_q, represented by the volumetric probability

    f_q(s) = (1/(2π)^{q/2}) exp( −(1/2) (s − s₀)^t g_s (s − s₀) ) ,    (3.237)

that is normalized via ∫ ds¹ ⋯ ds^q √(det g_s) f_q(s) = 1. Finally, consider the (p+q)-dimensional probability distribution over the space M_{p+q} defined as the product of these two volumetric probabilities,

    f(r, s) = f_p(r) f_q(s) .    (3.238)

Given this (p+q)-dimensional volumetric probability f(r, s) and given the p-dimensional hyperplane s = S r, we obtain the conditional volumetric probability f_r(r) over R_p as given by equation 3.231. All simplifications done,¹³ one obtains the Gaussian volumetric probability¹⁴

    f_r(r) = (1/(2π)^{p/2}) √( det g′_r / det g_r ) exp( −(1/2) (r − r′₀)^t g′_r (r − r′₀) ) ,    (3.239)

where the metric g′_r (inverse of the covariance matrix) is

    g′_r = g_r + S^t g_s S    (3.240)

and where the mean r′₀ can be obtained by solving the expression¹⁵

    g′_r (r′₀ − r₀) = S^t g_s (s₀ − S r₀) .    (3.241)

Note: I should now show here that f_s(s), the volumetric probability in the space S_q, is given, in all cases (p ≥ q or p ≤ q), by

    f_s(s) = (1/(2π)^{q/2}) √( det g′_s / det g_s ) exp( −(1/2) (s − s′₀)^t g′_s (s − s′₀) ) ,    (3.242)

where the metric g′_s (inverse of the covariance matrix) is

    (g′_s)⁻¹ = S (g′_r)⁻¹ S^t    (3.243)

and where the mean s′₀ is

    s′₀ = S r′₀ .    (3.244)

Note: say that this is illustrated in figure 3.21.

¹³ Note: explain this.
¹⁴ This volumetric probability is normalized by ∫ dr¹ ⋯ dr^p √(det g_r) f_r(r) = 1.
¹⁵ Explicitly, one can write r′₀ = r₀ + (g′_r)⁻¹ S^t g_s (s₀ − S r₀), but in numerical applications, the direct resolution of the linear system 3.241 is preferable.
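Example 3.28 is directly implementable; the sketch below (whose dimensions and numerical values are arbitrary assumptions) builds g′_r from equation 3.240 and obtains r′₀ by solving the linear system 3.241, as footnote 15 recommends:

    import numpy as np

    p, q = 3, 2
    rng = np.random.default_rng(1)
    S = rng.standard_normal((q, p))       # assumed linear mapping s = S r

    g_r = np.eye(p)                       # assumed (constant) metrics,
    g_s = 2.0 * np.eye(q)                 # i.e. inverse covariance matrices

    r0 = np.array([1.0, -0.5, 0.2])       # center of f_p (assumed)
    s0 = np.array([0.3, 0.7])             # center of f_q (assumed)

    g_r_prime = g_r + S.T @ g_s @ S       # equation 3.240
    # Equation 3.241:  g'_r (r'_0 - r_0) = S^t g_s (s_0 - S r_0)
    r0_prime = r0 + np.linalg.solve(g_r_prime, S.T @ g_s @ (s0 - S @ r0))

    s0_prime = S @ r0_prime               # equation 3.244
    cov_s = S @ np.linalg.inv(g_r_prime) @ S.T   # (g'_s)^(-1), equation 3.243
    print(r0_prime, s0_prime, np.linalg.eigvalsh(cov_s))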

Figure 3.21: Provisional figure to illustrate example 3.28.

3.5.4.2 Second Piece of Text

In equation 3.234 we have written the conditional volumetric probability f_r(r | s) = f(r, s) / ∫_{R_p} dv_r(r) f(r, s), while in equation 3.103 we have written the marginal volumetric probability f_s(s) = ∫_{R_p} dv_r(r) f(r, s). Combining the two equations gives

    f(r, s) = f_r(r | s) f_s(s) ,    (3.245)

an expression that we can read as follows: a joint volumetric probability can be expressed as the product of a conditional volumetric probability times a marginal volumetric probability. Similarly, we could have obtained

    f(r, s) = f_s(s | r) f_r(r) .    (3.246)

Combining the two last equations gives the Bayes theorem

    f_s(s | r) = f_r(r | s) f_s(s) / f_r(r) ,    (3.247)

that allows one to express one of the conditionals in terms of the other conditional and the two marginals.

Of course, in general, f(r, s) ≠ f_r(r) f_s(s). When one has the property

    f(r, s) = f_r(r) f_s(s) ,    (3.248)

then it follows from the two equations 3.245–3.246 that

    f_r(r | s) = f_r(r) and f_s(s | r) = f_s(s) ,    (3.249)

i.e., the conditionals equal the marginals. This means that the variable r is independent from the variable s and vice versa. Therefore, when the relation 3.248 holds, it is said that the two variables are independent.

Example 3.29: (This example has to be updated.) Over the surface of the unit sphere, using geographical coordinates, we have the two displacement elements

    ds_φ(φ, λ) = cos λ dφ ; ds_λ(φ, λ) = dλ ,    (3.250)

with the associated surface element (as the coordinates are orthogonal) ds(φ, λ) = cos λ dφ dλ. Consider a (2D) volumetric probability f(φ, λ) over the surface of the sphere, normed under the usual condition

    ∫_surface ds(φ, λ) f(φ, λ) = ∫_{−π}^{+π} dφ ∫_{−π/2}^{+π/2} dλ cos λ f(φ, λ) = ∫_{−π/2}^{+π/2} dλ cos λ ∫_{−π}^{+π} dφ f(φ, λ) = 1 .    (3.251)

One may define the partial integrations

    η_φ(φ) = ∫_{−π/2}^{+π/2} dλ cos λ f(φ, λ) ; η_λ(λ) = ∫_{−π}^{+π} dφ f(φ, λ) ,    (3.252)

so that the probability of a sector between two meridians and of an annulus between two parallels are respectively computed as

    P(φ₁ < φ < φ₂) = ∫_{φ₁}^{φ₂} dφ η_φ(φ) ; P(λ₁ < λ < λ₂) = ∫_{λ₁}^{λ₂} dλ cos λ η_λ(λ) ,    (3.253)

but the terms dφ and cos λ dλ appearing in these two expressions are not the displacement elements on the sphere's surface (equation 3.250). The functions η_φ(φ) and η_λ(λ) should not be mistaken for marginal volumetric probabilities: as the surface of the sphere is not the Cartesian product of two 1D spaces, marginal volumetric probabilities are not defined.

Chapter 4

Examples

Bla, bla, bla

4.1 Homogeneous Probability for Elastic Parameters

In this appendix, we start from the assumption that the uncompressibility modulus and the shear modulus are Jeffreys parameters (they are the eigenvalues of the stiffness tensor c_ijkl), and find the expression of the homogeneous probability density for other sets of elastic parameters, like the set {Young's modulus, Poisson's ratio} or the set {longitudinal wave velocity, transverse wave velocity}.

Uncompressibility Modulus and Shear Modulus

The Cartesian parameters of elastic theory are the logarithm of the uncompressibility modulus and the logarithm of the shear modulus,

    κ′ = log(κ/κ₀) ; μ′ = log(μ/μ₀) ,    (4.1)

where κ₀ and μ₀ are two arbitrary constants. The homogeneous probability density is just constant for these parameters (a constant that we set arbitrarily to one),

    f_{κ′μ′}(κ′, μ′) = 1 .    (4.2)

As is often the case for homogeneous probability densities, f_{κ′μ′}(κ′, μ′) is not normalizable. Using the Jacobian rule, it is easy to transform this probability density into the equivalent one for the positive parameters themselves,

    f_{κμ}(κ, μ) = 1/(κ μ) .    (4.3)

This '1/x' form of the probability density remains invariant if we take any power of κ and of μ. In particular, if instead of using the uncompressibility κ we use the compressibility γ = 1/κ, the Jacobian rule simply gives f_{γμ}(γ, μ) = 1/(γ μ).

Associated to the probability density 4.2 there is the Euclidean definition of distance

    ds² = (dκ′)² + (dμ′)² ,    (4.4)

that corresponds, in the variables (κ, μ), to

    ds² = (dκ/κ)² + (dμ/μ)² ,    (4.5)

i.e., to the metric

    ( g_κκ  g_κμ )   ( 1/κ²  0    )
    ( g_μκ  g_μμ ) = ( 0     1/μ² ) .    (4.6)

Young Modulus and Poisson Ratio

The Young modulus Y and the Poisson ratio σ can be expressed as a function of the uncompressibility modulus and the shear modulus as

    Y = 9κμ/(3κ + μ) ; σ = (1/2) (3κ − 2μ)/(3κ + μ) ,    (4.7)

or, reciprocally,

    κ = Y/(3(1 − 2σ)) ; μ = Y/(2(1 + σ)) .    (4.8)

The absolute value of the Jacobian of the transformation is easily computed,

    J = Y / ( 2 (1 + σ)² (1 − 2σ)² ) ,    (4.9)

and the Jacobian rule transforms the probability density 4.3 into

    f_{Yσ}(Y, σ) = (1/(κμ)) J = 3 / ( Y (1 + σ)(1 − 2σ) ) ,    (4.10)

which is the probability density representing the homogeneous probability distribution for elastic parameters using the variables (Y, σ). This probability density is the product of the probability density 1/Y for the Young modulus and the probability density

    g(σ) = 3 / ( (1 + σ)(1 − 2σ) )    (4.11)

for the Poisson ratio. This probability density is represented in figure 4.1. From the definition of σ it can be demonstrated that its values must range in the interval −1 < σ < 1/2, and we see that the homogeneous probability density is singular at these points. Although most rocks have positive values of the Poisson ratio, there are materials where σ is negative (e.g., Yeganeh-Haeri et al., 1992).

Figure 4.1: The homogeneous probability density for the Poisson ratio, as deduced from the condition that the uncompressibility and the shear modulus are Jeffreys parameters.

It may be surprising that the probability density in figure 4.1 corresponds to a homogeneous distribution. If we have many samples of elastic materials, and if their

logarithmic uncompressibility modulus κ′ and their logarithmic shear modulus μ′ have a constant probability density (which is the definition of a homogeneous distribution of elastic materials), then σ will be distributed according to the g(σ) of the figure.

To be complete, let us mention that in a change of variables x^i → x^I, a metric g_ij changes to

    g_IJ = Λ^i_I Λ^j_J g_ij = (∂x^i/∂x^I) (∂x^j/∂x^J) g_ij .    (4.12)

The metric 4.5 then transforms into

    ( g_YY  g_Yσ )   ( 2/Y²                            2/((1−2σ)Y) − 1/((1+σ)Y)    )
    ( g_σY  g_σσ ) = ( 2/((1−2σ)Y) − 1/((1+σ)Y)        4/(1−2σ)² + 1/(1+σ)²        ) .    (4.13)

The surface element is

    ds_Yσ(Y, σ) = √(det g) dY dσ = 3 dY dσ / ( Y (1 + σ)(1 − 2σ) ) ,    (4.14)

a result from which expression 4.10 can be inferred.

Although the Poisson ratio has a historical interest, it is not a simple parameter, as shown by its theoretical bounds −1 < σ < 1/2, or by the form of the homogeneous probability density (figure 4.1). In fact, the Poisson ratio σ depends only on the ratio κ/μ (incompressibility modulus over shear modulus), as we have

    (1 + σ)/(1 − 2σ) = (3/2) (κ/μ) .    (4.15)

The ratio J = κ/μ of two Jeffreys parameters being a Jeffreys parameter, a useful pair of Jeffreys parameters may be {κ, J}. The ratio J = κ/μ has a physical interpretation easy to grasp (as the ratio between the uncompressibility and the shear modulus), and should be preferred, in theoretical developments, to the Poisson ratio, as it has simpler theoretical properties.

Longitudinal and Transverse Wave Velocities

Equation 4.3 gives the probability density representing the homogeneous probability distribution of elastic media, when parameterized by the uncompressibility modulus and the shear modulus:

    f_{κμ}(κ, μ) = 1/(κ μ) .    (4.16)

Should we have been interested, in addition, in the mass density ρ, then we would have arrived (as ρ is another Jeffreys parameter) at the probability density

    f_{κμρ}(κ, μ, ρ) = 1/(κ μ ρ) .    (4.17)
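Before moving on to the velocities, the claim that samples of homogeneously distributed elastic media yield Poisson ratios distributed as g(σ) can be checked by sampling. Since a flat density in t = log(κ/μ) maps exactly onto g(σ) through equation 4.15, the sketch below (with an arbitrary finite range for t, the homogeneous distribution being improper) compares a histogram of σ with equation 4.11:

    import numpy as np

    rng = np.random.default_rng(2)
    # Flat density in t = log(kappa/mu) over an (arbitrary) finite range;
    # the homogeneous distribution itself is improper:
    t = rng.uniform(-6.0, 6.0, 500_000)

    A = 1.5 * np.exp(t)                  # (1+sigma)/(1-2 sigma) = (3/2) kappa/mu
    sigma = (A - 1.0) / (2.0 * A + 1.0)  # equation 4.15 solved for sigma

    hist, edges = np.histogram(sigma, bins=60, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    g = 3.0 / ((1.0 + centers) * (1.0 - 2.0 * centers))
    g /= g.sum() * (edges[1] - edges[0])  # normalize over the sampled range
    print(hist[20:40] / g[20:40])         # close to one: the histogram follows g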

Equation 4.17 is the starting point for this section. What about the probability density representing the homogeneous probability distribution of elastic materials when we use as parameters the mass density and the two wave velocities? The longitudinal wave velocity α and the shear wave velocity β are related to the uncompressibility modulus κ and the shear modulus μ through

    α = √( (κ + 4μ/3) / ρ ) ; β = √( μ / ρ ) ,    (4.18)

and a direct use of the Jacobian rule transforms the probability density 4.17 into

    f_{αβρ}(α, β, ρ) = 1 / ( ρ α β (3/4 − β²/α²) ) ,    (4.19)

which is the answer to our question. That this function becomes singular for α = (2/√3) β is just due to the fact that the boundary α = (2/√3) β cannot be crossed: the fundamental inequalities κ > 0 and μ > 0 impose that the two velocities are linked by the inequality constraint

    α > (2/√3) β .    (4.20)

Let us focus for a moment on the homogeneous probability density for the two wave velocities (α, β) existing in an elastic solid (disregarding here the mass density ρ). We have

    f_{αβ}(α, β) = 1 / ( α β (3/4 − β²/α²) ) .    (4.21)

It is displayed in figure 4.2.

Figure 4.2: The joint homogeneous probability density for the velocities (α, β) of the longitudinal and transverse waves propagating in an elastic solid. Contrary to the incompressibility and the shear modulus, which are independent parameters, the longitudinal wave velocity and the transverse wave velocity are not independent (see text for an explanation). The scales for the velocities are unimportant: it is possible to multiply the two velocity scales by any factor without modifying the form of the probability (which is itself defined up to a multiplicative constant).

Let us demonstrate that the marginal probability density for both α and β is of the form 1/x. For this we have to compute

    f_α(α) = ∫_0^{√3 α/2} dβ f(α, β)    (4.22)

and

    f_β(β) = ∫_{2β/√3}^{∞} dα f(α, β)    (4.23)

(the bounds of integration can easily be understood by a look at figure 4.2). These integrals can be evaluated as

    f_α(α) = lim_{ε→0} ∫_{ε √3α/2}^{(1−ε) √3α/2} dβ f(α, β) = lim_{ε→0} ( (4/3) log((1−ε)/ε) ) (1/α)    (4.24)

and

    f_β(β) = lim_{ε→0} ∫_{(1+ε) 2β/√3}^{2β/(ε√3)} dα f(α, β) = lim_{ε→0} ( (2/3) log((1/ε − 1)/ε) ) (1/β) .    (4.25)

The numerical factors tend to infinity, but this is only one more manifestation of the fact that the homogeneous probability densities are usually improper (not normalizable). Dropping these numerical factors gives

    f_α(α) = 1/α    (4.26)

and

    f_β(β) = 1/β .    (4.27)

It is interesting to note that we have here an example of two parameters that look like Jeffreys parameters, but are not, because they are not independent (the homogeneous joint probability density is not the product of the homogeneous marginal probability densities). It is also worth knowing that using slownesses instead of velocities (n = 1/α, η = 1/β) leads, as one would expect, to

    f_{nηρ}(n, η, ρ) = 1 / ( ρ n η (3/4 − n²/η²) ) .    (4.28)
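A sampling check of equations 4.18–4.27 (the ranges below are arbitrary cutoffs, again because the homogeneous distribution is improper): draw κ, μ and ρ log-uniform, form α and β, and verify the constraint 4.20 and the approximately '1/x' marginals:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 300_000
    # Arbitrary cutoffs (the homogeneous distribution is improper):
    kappa = np.exp(rng.uniform(np.log(1e9), np.log(1e12), n))   # Pa
    mu    = np.exp(rng.uniform(np.log(1e9), np.log(1e12), n))   # Pa
    rho   = np.exp(rng.uniform(np.log(1e3), np.log(1e4), n))    # kg/m^3

    alpha = np.sqrt((kappa + 4.0 * mu / 3.0) / rho)   # equation 4.18
    beta  = np.sqrt(mu / rho)

    print(np.all(alpha > 2.0 / np.sqrt(3.0) * beta))  # constraint 4.20 holds
    # A '1/x' marginal means log(beta) is roughly uniform away from the cutoffs:
    hist, _ = np.histogram(np.log(beta), bins=20)
    print(hist / hist.mean())                         # roughly flat central bins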

4.2 Measuring a One-Dimensional Strain (I)

A one-dimensional material medium with an initial length X is deformed, and its length becomes Y. The strain that has affected the medium, denoted ε, is defined as

    ε = log(Y/X) .    (4.29)

A measurement of X and Y provides the information represented by a probability density f(X, Y). This induces an information on the actual value of the strain, that is represented by a probability density g(ε). The problem is to express g_ε(ε) using as inputs the definition 4.29 and the probability density f(X, Y).

We need to introduce a slack variable, say Z, in order to be able to change from the pair {X, Y} into a pair {ε, Z}. We have total freedom for the choice of Z; I choose here the (geometric) average length Z = √(XY). The change of variables is, then,

    ε = log(Y/X) , Z = √(XY)  ⟺  X = Z e^{−ε/2} , Y = Z e^{+ε/2} .    (4.30)

The probability density for the new variables, say g(ε, Z), can be obtained using the Jacobian rule:

    g(ε, Z) = f(X, Y) |det( ∂X/∂ε  ∂Y/∂ε ; ∂X/∂Z  ∂Y/∂Z )| = f(X, Y) (XY/Z) .    (4.31)

Replacing X and Y by their expressions in terms of ε and Z gives

    g(ε, Z) = Z f( Z e^{−ε/2}, Z e^{+ε/2} ) .    (4.32)

The probability density for ε is then

    g_ε(ε) = ∫_0^∞ dZ g(ε, Z) = ∫_0^∞ dZ Z f( Z e^{−ε/2}, Z e^{+ε/2} ) .    (4.33)

Should we be interested in Z instead of ε, we would evaluate

    g_Z(Z) = ∫_{−∞}^{+∞} dε g(ε, Z) = Z ∫_{−∞}^{+∞} dε f( Z e^{−ε/2}, Z e^{+ε/2} ) .    (4.34)

As an example, assume that our measurement of the initial length X and of the final length Y has produced an independent information on X and Y that can be modeled by the product of two log-normal functions¹:

    f(X, Y) = LN(X, X₀, σ_x) LN(Y, Y₀, σ_y) ,    (4.35)

¹ The log-normal model is acceptable, as X and Y are Jeffreys quantities.

where

    LN(R, R₀, σ) ≡ (1/(√(2π) σ R)) exp( −(1/2) ( (1/σ) log(R/R₀) )² ) .    (4.36)

Here, X₀ and Y₀ are respectively the centers of the probability distributions of X and Y, and σ_x and σ_y are respectively the standard deviations of the associated logarithmic variables. Then, using equation 4.33, one arrives at

    g_ε(ε) = (1/(√(2π) σ_ε)) exp( −(1/2) (ε − ε₀)² / σ_ε² ) ,    (4.37)

where

    ε₀ = log(Y₀/X₀) ; σ_ε = √(σ_x² + σ_y²) .    (4.38)

This is a normal probability density, centered at ε₀ = log(Y₀/X₀), that is the strain that would be computed from the central values of X and Y. The standard deviation of the variable ε is σ_ε = √(σ_x² + σ_y²) (remember that σ_x and σ_y are the standard deviations associated to the logarithms of X and Y).

Using equation 4.34, one arrives at

    g_Z(Z) = (1/(√(2π) σ_Z Z)) exp( −(1/2) ( (1/σ_Z) log(Z/Z₀) )² ) ,    (4.39)

where

    Z₀ = √(X₀ Y₀) ; σ_Z = √(σ_x² + σ_y²) / 2 .    (4.40)

This is a lognormal probability density, centered at Z₀ = √(X₀Y₀). Remember that Z was defined as the (geometric) average of X and Y, so it is quite reasonable that Z₀, the average of Z, equals the (geometric) average of X₀ (that is the average of X) and Y₀ (that is the average of Y). The standard deviation of the logarithmic variable associated with Z is σ_Z = √(σ_x² + σ_y²)/2.

This example has been treated using probability densities only. To pass from probability densities to volumetric probabilities we can introduce a metric in the {X, Y} manifold. As X and Y are Jeffreys quantities we can select the metric ds² = (dX/X)² + (dY/Y)². This leads, for the quantities {ε, Z}, to the metric ds² = (1/2) dε² + 2 (dZ/Z)².
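Equations 4.37–4.40 can be verified by simulation; the central values and standard deviations below are arbitrary assumptions:

    import numpy as np

    rng = np.random.default_rng(4)
    X0, Y0 = 1.0, 1.2          # assumed central values
    sx, sy = 0.05, 0.08        # assumed logarithmic standard deviations

    n = 1_000_000
    X = X0 * np.exp(sx * rng.standard_normal(n))   # lognormal, equations 4.35-4.36
    Y = Y0 * np.exp(sy * rng.standard_normal(n))

    eps = np.log(Y / X)                            # equation 4.29
    print(eps.mean(), np.log(Y0 / X0))             # eps_0 of equation 4.38
    print(eps.std(), np.hypot(sx, sy))             # sigma_eps of equation 4.38

    Z = np.sqrt(X * Y)                             # the slack variable
    print(np.log(Z).std(), np.hypot(sx, sy) / 2)   # sigma_Z of equation 4.40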

Measuring a One-Dimensional Strain (II)

A one-dimensional material medium with an initial length X is deformed into a second state, where its length is Y. The strain that has affected the medium, denoted ε, is defined as

    ε = log(Y/X) .    (4.41)

A measurement of X and Y provides the information represented by a volumetric probability f_r(Y, X). This induces an information on the actual value of the strain, that shall be represented by a volumetric probability f_s(ε). The problem is to express f_s(ε) using as inputs the definition 4.41 and the volumetric probability f_r(Y, X).

Let us introduce the two-dimensional data space R₂, over which the quantities X and Y are coordinates. The lengths X and Y being Jeffreys quantities (see discussion in section XXX), we have, in the space R₂, the distance element ds_r² = (dY/Y)² + (dX/X)², associated to the metric matrix

    g_r = ( 1/Y²  0    )
          ( 0     1/X² ) .    (4.42)

This, in particular, gives

    √(det g_r) = 1/(Y X) ,    (4.43)

so the (2D) volume element over R₂ is dv_r = dY dX/(Y X), and any volumetric probability f_r(Y, X) over R₂ is to be integrated via

    P_r = ∫∫ dY dX (1/(Y X)) f_r(Y, X) ,    (4.44)

over the appropriate bounds. In particular, a volumetric probability f_r(Y, X) is normalized if the integral over (0 < Y < ∞ ; 0 < X < ∞) equals one.

Let us also introduce the one-dimensional space of deformations S₁, over which the quantity ε is the chosen coordinate (one could as well choose the exponential of ε, or twice the strain, as coordinate). The strain being an ordinary Cartesian coordinate, we have, in the space of deformations S₁, the distance element ds_s² = dε², associated to the trivial metric matrix g_s = (1). Therefore,

    √(det g_s) = 1 .    (4.45)

The (1D) volume element over S₁ is dv_s = dε, and any volumetric probability f_s(ε) over S₁ is to be integrated via

    P_s = ∫ dε f_s(ε) ,    (4.46)

over given bounds. A volumetric probability f_s(ε) is normalized by the condition that the integral over (−∞ < ε < +∞) equals one. As suggested in the general

theory, we must change the coordinates in R₂, using as part of the coordinates those of S₁, i.e., here, using the strain ε. Then, arbitrarily selecting X as second coordinate, we pass in R₂ from the coordinates {Y, X} to the coordinates {ε, X}. Then, the Jacobian matrix defined in equation ?? is

    K = ( U )   ( ∂ε/∂Y  ∂ε/∂X )   ( 1/Y  −1/X )
        ( V ) = ( ∂X/∂Y  ∂X/∂X ) = ( 0     1   ) ,    (4.47)

and we obtain, using the metric 4.42,

    √( det( K g_r⁻¹ K^t ) ) = X .    (4.48)

Noting that the expression 4.41 can trivially be solved for Y as

    Y = X exp ε ,    (4.49)

everything is ready now to attack the problem. If a measurement of X and Y has produced the information represented by the volumetric probability f_r(Y, X), this transports into a volumetric probability f_s(ε) that is given by equation ??. Using the particular expressions 4.45, 4.48 and 4.49, this gives

    f_s(ε) = ∫_0^∞ dX (1/X) f_r( X exp ε, X ) .    (4.50)

Example 4.1: In the context of the previous example, assume that the measurement of the two lengths X and Y has provided an information on their actual values that (i) has independent uncertainties and (ii) is Gaussian (which, as indicated in section 5.3.13.2, means that the dependence of the volumetric probability on the Jeffreys quantities X and Y is expressed by the lognormal function). Then we have

    f_X(X) = (1/(√(2π) s_X)) exp( −(1/(2 s_X²)) (log(X/X₀))² )    (4.51)

and

    f_Y(Y) = (1/(√(2π) s_Y)) exp( −(1/(2 s_Y²)) (log(Y/Y₀))² ) ,    (4.52)

and

    f_r(Y, X) = f_Y(Y) f_X(X) .    (4.53)

The volumetric probability for X is centered at point X₀, with standard deviation s_X, and the volumetric probability for Y is centered at point Y₀, with standard deviation s_Y (see section ?? for a precise invariant definition of standard deviation). In this simple

example, the integration in equation 4.50 can be performed analytically, and one obtains a Gaussian probability distribution for the strain, represented by the normal function

    f_s(ε) = (1/(√(2π) s_ε)) exp( −(ε − ε₀)² / (2 s_ε²) ) ,    (4.54)

where ε₀, the center of the probability distribution for the strain, equals the logarithm of the ratio of the centers of the probability distributions for the lengths,

    ε₀ = log(Y₀/X₀) ,    (4.55)

and where s_ε², the variance of the probability distribution for the strain, equals the sum of the variances of the probability distributions for the lengths,

    s_ε² = s_X² + s_Y² .    (4.56)

4.3 Free-Fall of an Object

The one-dimensional free fall of an object (under the force of gravity) is given by the expression

    x = x₀ + v₀ t + (1/2) g t²    (4.57)

(note: explain the assumptions and what the variables are). The acceleration of gravity is assumed to have a fixed value (for instance, g = 9.81 m/s²). A firecracker is dropped that has been prepared with random values of initial position x₀, of initial velocity v₀, and flying time t, and we are interested in the position x at which it will explode. Assume the random values of {x₀, v₀, t} have been generated according to some probability density f(x₀, v₀, t). Which is the probability density of the quantity x?

This is a typical problem of transport of probabilities. Here we transport a probability distribution defined in a three-dimensional manifold into a one-dimensional manifold. We start by introducing two slack quantities that, together with x, will form a three-dimensional set. Among the infinitely many possible choices, let us take the quantities ω and τ defined through the following change of variables:

    x = x₀ + v₀ t + (1/2) g t² , ω = v₀ , τ = t  ⟺  x₀ = x − ω τ − (1/2) g τ² , v₀ = ω , t = τ ,    (4.58)

i.e., the quantity ω is, in fact, identical to the initial velocity v₀, and the quantity τ is identical to the falling time t. We can now apply the Jacobian rule to transform the probability density f(x₀, v₀, t) into a probability density g(x, ω, τ):

    g(x, ω, τ) = f(x₀, v₀, t) |det( ∂x₀/∂x  ∂v₀/∂x  ∂t/∂x ; ∂x₀/∂ω  ∂v₀/∂ω  ∂t/∂ω ; ∂x₀/∂τ  ∂v₀/∂τ  ∂t/∂τ )| .    (4.59)

Because of the particular variables ω and τ chosen, the Jacobian determinant just equals 1, and, therefore, we simply have g(x, ω, τ) = f(x₀, v₀, t). It is understood in this expression that the three variables {x₀, v₀, t} have to be replaced, in the function f, by their expressions in terms of the three variables {x, ω, τ} (at the right of formula 4.58), so we could write, more explicitly,

    g(x, ω, τ) = f( x − ω τ − (1/2) g τ², ω, τ ) .    (4.60)

The probability density we were seeking can now be obtained by integration, g_x(x) = ∫ dω ∫ dτ g(x, ω, τ). Explicitly,

    g_x(x) = ∫ dω ∫ dτ f( x − ω τ − (1/2) g τ², ω, τ ) .    (4.61)

As an example, assume that the three variables {x₀, v₀, t} in the probability density f are independent and, furthermore, that the probability distributions of x₀ and v₀ are normal:

    f(x₀, v₀, t) = N(x₀, X₀, σ_x) N(v₀, V₀, σ_v) h(t) ,    (4.62)

where

    N(u, U, σ) ≡ (1/(√(2π) σ)) exp( −(1/2) (u − U)²/σ² ) .    (4.63)

This means that we assume that the variable x₀ is centered at the value X₀ with a standard deviation σ_x, that the variable v₀ is centered at the value V₀ with a standard deviation σ_v, and that the variable t has an arbitrary probability density h(t). The computation is just a matter of replacing the expression 4.62 into 4.61, and invoking a good mathematical software to perform the analytical integrations. In fact, it is better to do this in steps. First, in equation 4.62 we replace the variables {x₀, v₀, t} by their expressions (at the right in equation 4.58) in terms of the variables {x, ω, τ}, and we input the result in equation 4.60, to obtain the explicit expression for g(x, ω, τ). We then first evaluate

    g(x, τ) = ∫ dω g(x, ω, τ) ,    (4.64)

and, then,

    g_x(x) = ∫ dτ g(x, τ) .    (4.65)

My mathematical software produces the result²

    g(x, τ) = h(τ) (1/(√(2π) σ(τ))) exp( −(1/2) (x − x̄(τ))²/σ(τ)² ) ,    (4.66)

where

    x̄(τ) = X₀ + V₀ τ + (1/2) g τ² ; σ(τ) = √( σ_x² + σ_v² τ² ) .    (4.67)

Then, g_x(x) is obtained by evaluation of the integral in equation 4.65. Note that x̄(τ) is the position the firecracker would have at time τ if its initial position was X₀ (the mean value of the distribution of x₀) and if its initial velocity was V₀ (the mean value of the distribution of v₀). Note also that the standard deviation σ(τ) increases with time (as the result of the uncertainty in the initial velocity v₀).

² Unfortunately, at the time of this computation, the integral is well evaluated by the software (Mathematica), but the simplification of the result has still to be made by hand.
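Equations 4.66–4.67 are easy to check by sampling; the distribution h(t) and the numerical values below are arbitrary assumptions:

    import numpy as np

    rng = np.random.default_rng(5)
    g = 9.81
    X0, sx = 0.0, 0.1          # assumed mean and std of x0
    V0, sv = 2.0, 0.5          # assumed mean and std of v0

    n = 1_000_000
    x0 = rng.normal(X0, sx, n)
    v0 = rng.normal(V0, sv, n)
    t = rng.uniform(0.5, 1.5, n)           # an arbitrary h(t)

    x = x0 + v0 * t + 0.5 * g * t**2       # equation 4.57

    # For a fixed flight time tau, equations 4.66-4.67 predict the mean
    # x(tau) = X0 + V0 tau + g tau^2/2 and the std sqrt(sx^2 + sv^2 tau^2):
    tau = 1.0
    sel = np.abs(t - tau) < 0.01
    print(x[sel].mean(), X0 + V0 * tau + 0.5 * g * tau**2)
    print(x[sel].std(), np.hypot(sx, sv * tau))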

4.4 Measure of Poisson's Ratio

Hooke's Law in Isotropic Media

For an elastic medium, in the limit of infinitesimal strains (Hooke's law),

    σ^ij = c^ijkl ε_kl ,    (4.68)

where c^ijkl is the stiffness tensor. If the elastic medium is isotropic,

    c^ijkl = (λ_κ/3) g^ij g^kl + (λ_μ/2) ( g^ik g^jl + g^il g^jk − (2/3) g^ij g^kl ) ,    (4.69)

where λ_κ (with multiplicity one) and λ_μ (with multiplicity five) are the two eigenvalues of the stiffness tensor c^ijkl. They are related to the common uncompressibility modulus κ and shear modulus μ through

    κ = λ_κ/3 ; μ = λ_μ/2 .    (4.70)

Hooke's law 4.68 can, alternatively, be written

    ε_ij = d_ijkl σ^kl ,    (4.71)

where d_ijkl, the inverse of the stiffness tensor, is called the compliance tensor. If the elastic medium is isotropic,

    d_ijkl = (γ/3) g_ij g_kl + (φ/2) ( g_ik g_jl + g_il g_jk − (2/3) g_ij g_kl ) ,    (4.72)

where γ (with multiplicity one) and φ (with multiplicity five) are the two eigenvalues of d_ijkl. These are, of course, the inverses of the eigenvalues of c^ijkl:

    γ = 1/λ_κ = 1/(3κ) ; φ = 1/λ_μ = 1/(2μ) .    (4.73)

From now on, I shall call γ the eigencompressibility or, if there is no risk of confusion with 1/κ, the compressibility. The quantity φ shall be called the eigenshearability or, if there is no risk of confusion with 1/μ, the shearability. With the isotropic stiffness tensor of equation 4.69, Hooke's law 4.68 becomes

    σ^ij = (λ_κ/3) g^ij ε^k_k + λ_μ ( ε^ij − (1/3) g^ij ε^k_k ) ,    (4.74)

or, equivalently, with the isotropic compliance tensor of equation 4.72, Hooke's law 4.71 becomes

    ε_ij = (γ/3) g_ij σ^k_k + φ ( σ_ij − (1/3) g_ij σ^k_k ) .    (4.75)

Definition of the Poisson's Ratio

Consider the experimental arrangement of figure 4.3, where an elastic medium is submitted to the (homogeneous) uniaxial stress (using Cartesian coordinates)

    σ_xx = σ_yy = σ_xy = σ_yz = σ_zx = 0 ; σ_zz ≠ 0 .    (4.76)

Then, Hooke's law 4.71 predicts the strain

    ε_xx = ε_yy = (1/3)(γ − φ) σ_zz ; ε_zz = (1/3)(γ + 2φ) σ_zz ; ε_xy = ε_yz = ε_zx = 0 .    (4.77)

The Young modulus Y and the Poisson ratio ν are defined as

    Y = σ_zz/ε_zz ; ν = −ε_xx/ε_zz = −ε_yy/ε_zz ,    (4.78)

and equation 4.77 gives

    Y = 3/(2φ + γ) ; ν = (φ − γ)/(2φ + γ) ,    (4.79)

with reciprocal relations

    γ = (1 − 2ν)/Y ; φ = (1 + ν)/Y .    (4.80)

Figure 4.3: A possible experimental setup for measuring the Young modulus and the Poisson ratio of an elastic medium. The measurement of the force F, of the bar length Z, and of the bar diameter X allows to estimate the two elastic parameters. Details below.

Note that when γ and φ take values inside their natural range,

    0 < γ < ∞ ; 0 < φ < ∞ ,    (4.81)

the variation of Y and ν is

    0 < Y < ∞ ; −1 < ν < +1/2 .    (4.82)

Although most materials have positive values of the Poisson ratio ν, there are materials where it is negative (see figures 4.4 and 4.5). The Poisson ratio has mainly a historical interest. Note that a simple function of it would have given a bona fide Jeffreys quantity,

    J = (1 + ν)/(1 − 2ν) = λ_κ/λ_μ ,    (4.83)

with the natural domain of variation 0 < J < ∞.

Figure 4.4: An example of a 2D elastic structure with a positive value of the Poisson ratio. When imposing a stretching in one direction (the horizontal here), the elastic structure reacts contracting in the perpendicular direction.

Figure 4.5: An example of a 2D elastic structure with a negative value of the Poisson ratio. When imposing a stretching in one direction (the horizontal here), the elastic structure reacts also stretching in the perpendicular direction.

The Parameters

Although one may be interested in the Young modulus Y and the Poisson ratio ν, we may choose to measure the compressibility γ = 1/λ_κ and the shearability φ = 1/λ_μ. Any information we may need on Y and ν can be obtained, as usual, through a change of variables. From the two first equations in expression 4.77 it follows that the relation between the elastic parameters γ and φ, the stress, and the strains is

    γ = (ε_zz + 2 ε_xx)/σ_zz ; φ = (ε_zz − ε_xx)/σ_zz .    (4.84)

As the uniaxial stress is generated by a force F applied to one of the ends of the bar (and the reaction force of the support),

    σ_zz = F/s ,    (4.85)

where s, the section of the bar, is

    s = π X²/4 .    (4.86)

The most general definition of strain (that does not assume the strains to be small) is

    ε_xx = log(X/X₀) ; ε_zz = log(Z/Z₀) ,    (4.87)

where X₀ and Z₀ are the initial lengths (see figure 4.3) and X and Z are the final lengths. We then have the final relation

    γ = (π X²/(4F)) ( log(Z/Z₀) + 2 log(X/X₀) ) ; φ = (π X²/(4F)) ( log(Z/Z₀) − log(X/X₀) ) .    (4.88)

When necessary, these two expressions shall be written

    γ = γ(X₀, Z₀, X, Z, F) ; φ = φ(X₀, Z₀, X, Z, F) .    (4.89)

We shall later need to extract from these relations the two parameters X₀ and Z₀:

    X₀ = X exp( −4F(γ − φ)/(3π X²) ) ; Z₀ = Z exp( −4F(γ + 2φ)/(3π X²) ) ,    (4.90)

expressions that, when necessary, shall be written

    X₀ = X₀(γ, φ, X, Z, F) ; Z₀ = Z₀(γ, φ, X, Z, F) .    (4.91)

The Partial Derivatives

The variables we measure are

    r = {X₀, Z₀, X, Z, F} ,    (4.92)

while we are interested in the two variables {γ, φ}. In order to have a full set of variables, we take

    s = {γ, φ, X̃, Z̃, F̃} ,    (4.93)

where

    X̃ = X ; Z̃ = Z ; F̃ = F .    (4.94)

The relation s = s(r) corresponds to these three identities plus the two relations 4.88. We can then introduce the (inverse of the) matrix of partial derivatives,

    J⁻¹ = ( ∂γ/∂X₀  ∂γ/∂Z₀  ∂γ/∂X  ∂γ/∂Z  ∂γ/∂F )
          ( ∂φ/∂X₀  ∂φ/∂Z₀  ∂φ/∂X  ∂φ/∂Z  ∂φ/∂F )
          ( ∂X̃/∂X₀  ∂X̃/∂Z₀  ∂X̃/∂X  ∂X̃/∂Z  ∂X̃/∂F )
          ( ∂Z̃/∂X₀  ∂Z̃/∂Z₀  ∂Z̃/∂X  ∂Z̃/∂Z  ∂Z̃/∂F )
          ( ∂F̃/∂X₀  ∂F̃/∂Z₀  ∂F̃/∂X  ∂F̃/∂Z  ∂F̃/∂F ) ,    (4.95)

to easily obtain

    J = 16 F² X₀ Z₀ / (3 π² X⁴) .    (4.96)

The Measurement

We measure {X₀, Z₀, X, Z, F} and describe the result of our measurement via a probability density

    f(X₀, Z₀, X, Z, F) .    (4.97)

[Note: Explain this.]

Transportation of the Probability Distribution

To obtain the probability density in the variables {γ, φ}, we just apply equations ??–??. With the present notations this gives

    g(γ, φ) = (16/(3π²)) ∫_0^∞ dX ∫_0^∞ dZ ∫ dF ( F² X₀ Z₀ / X⁴ ) f(X₀, Z₀, X, Z, F) |_{X₀=X₀(γ,φ,X,Z,F) ; Z₀=Z₀(γ,φ,X,Z,F)} ,    (4.98)

where the functions X₀ = X₀(γ, φ, X, Z, F) and Z₀ = Z₀(γ, φ, X, Z, F) are those expressed by equations 4.90–4.91. The two associated marginal probability densities are, then,

    g_γ(γ) = ∫_0^∞ dφ g(γ, φ) and g_φ(φ) = ∫_0^∞ dγ g(γ, φ) .    (4.99)

As γ and φ are Jeffreys quantities, we can easily transform these probability densities into volumetric probabilities. One has

    ḡ(γ, φ) = γ φ g(γ, φ) ,    (4.100)

and

    ḡ_γ(γ) = ∫_0^∞ (dφ/φ) ḡ(γ, φ) and ḡ_φ(φ) = ∫_0^∞ (dγ/γ) ḡ(γ, φ) .    (4.101)

To represent the results it is better to use the Cartesian parameters of the problem [note: explain]. Here, the logarithmic parameters

    γ′ = log(γ/γ₀) ; φ′ = log(φ/φ₀) ,    (4.102)

where γ₀ and φ₀ are two arbitrary constants having the dimension of a compliance, are Cartesian coordinates over the 2D space of elastic (isotropic) media. As volumetric probabilities are invariant, we simply have

    h(γ′, φ′) = ḡ(γ, φ) |_{γ=γ₀ exp γ′ ; φ=φ₀ exp φ′} .    (4.103)

Numerical Illustration

Let us use the notations N(u, u₀, s) and L(U, U₀, s) respectively for the normal and the lognormal probability densities,

    N(u, u₀, s) = (1/(√(2π) s)) exp( −(u − u₀)²/(2 s²) ) ;
    L(U, U₀, s) = (1/(√(2π) s U)) exp( −(1/(2 s²)) (log(U/U₀))² ) .    (4.104)

Assume that the result of the measurement of the quantities X₀, Z₀ (initial diameter and length of the bar), X, Z (final diameter and length of the bar), and the force F has given an information that can be represented by a probability density with independent uncertainties,

    f(X, X₀, Z, Z₀, F) = L(X₀, X₀^obs, s_X₀) L(Z₀, Z₀^obs, s_Z₀) L(X, X^obs, s_X) L(Z, Z^obs, s_Z) N(F, F^obs, s_F) ,    (4.105)

with the numerical values

    X₀^obs = 1.000 m ; s_X₀ = 0.015
    Z₀^obs = 1.000 m ; s_Z₀ = 0.015
    X^obs = 0.975 m ; s_X = 0.015
    Z^obs = 1.105 m ; s_Z = 0.015
    F^obs = 9.81 kg m/s² ; s_F ≈ 0 .

This is the probability density that appears at the right of equation 4.98. To simplify the example, I have assumed that the uncertainty on the force F is much smaller than the other uncertainties, so, in fact, F can be treated as a constant. Figure 4.6 displays the four (marginal) one-dimensional lognormal probability densities (with the small uncertainties chosen, the lognormal probability densities in 4.105 visually appear as normal probability densities). To illustrate how the uncertainties in the measurement of the lengths propagate into uncertainties in the elastic parameters, I have chosen the quite unrealistic example where the uncertainties in X and X₀ overlap: it is likely that the diameter of the rod has decreased (so the Poisson ratio is positive), but the probability that it has increased (negative Poisson ratio) is significant. In fact, as we shall see, the measurements don't even exclude the virtuality of negative elastic parameters γ and φ (this possibility being excluded by the elastic theory that is included in the present formulation).

Figure 4.6: The four 1D marginal volumetric probabilities for the initial and final lengths (diameters X₀, X and lengths Z₀, Z). Note that the uncertainties in X and X₀ overlap: it is likely that the diameter of the rod has decreased (so the Poisson ratio is positive), but the probability that it has increased (negative Poisson ratio) is significant.

Figure 4.7 represents the volumetric probability h(γ′, φ′) defined by equations 4.98, 4.103 and ??. It represents the information that the measurements of the lengths have given on the elastic parameters γ and φ. [Note: Explain this better.] [Note: Explain that negative values of γ and φ are excluded 'by hand'.] The two associated marginal volumetric probabilities are defined in equations 4.101, and are represented in figure 4.8. Note: mention here figure 4.9.

Figure 4.7: The (2D) volumetric probability for the compressibility γ and the shearability ϕ, as induced from the measurement results. At the left, a direct representation of the volumetric probability defined by equations 4.98 and 4.103. At the right, a Monte Carlo simulation of the measurement (see section ??). Here, natural logarithms are used for the axes γ* = log γQ and ϕ* = log ϕQ, with Q = 1 N/m². Of the 3000 points used, 9 fell at the left and 7 below the domain plotted, and are not represented. The zone of nonvanishing probability extends over all the space, and only the level lines automatically proposed by the plotting software have been used.

Figure 4.8: The marginal (1D) volumetric probabilities defined by equations 4.101, plotted as functions of γ* = log γQ and ϕ* = log ϕQ.

Figure 4.9: The marginal probability distributions for the lengths X and X₀. At the left, a Monte Carlo sampling of the probability distribution for X and X₀ defined by equation 4.105 (the values Z and Z₀ are also sampled, but are not shown). At the right, the same Monte Carlo sampling, but keeping only the points that correspond, through equation 4.88, to positive values of γ and ϕ (and that are, thus, acceptable by the theory of elastic media). Note that many of the points behind the diagonal bar have been suppressed.
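Continuing the previous sketch, the right-hand panel of figure 4.7 then amounts to scattering the retained samples in the logarithmic coordinates of equation 4.102, taking γ₀ = ϕ₀ = 1/Q so that γ* = log γQ and ϕ* = log ϕQ; again an illustration under the stated assumptions, not the original plotting code.

```python
import matplotlib.pyplot as plt
import numpy as np

Q = 1.0  # Q = 1 N/m^2 as in figure 4.7; gamma * Q is then dimensionless

# gamma, phi and the mask `keep` come from the previous sketch.
gamma_star = np.log(gamma[keep] * Q)
phi_star = np.log(phi[keep] * Q)

plt.scatter(gamma_star, phi_star, s=2, color="black")
plt.xlabel(r"$\gamma^* = \log \gamma Q$")
plt.ylabel(r"$\varphi^* = \log \varphi Q$")
plt.title("Monte Carlo simulation of the measurement (cf. figure 4.7, right)")
plt.show()
```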
