Probability Density versus Volumetric Probability

Albert Tarantola

September 0, 00

1 Probability Density (the Standard Definition)

Consider, in the Euclidean plane, a unit radius circle, endowed with polar coordinates {r, ϕ}. This space has been divided into small cells by taking constant radius increments Δr and constant angle increments Δϕ (see figure 1). The cells have different surfaces¹, but one says that, for the coordinates being used, they have equal capacity Δc = Δr Δϕ (this only means that the coordinate increments are all identical).

Figure 1: A circle, some coordinate lines, and some random points.

Assume that some process generates independent² random points on the circle, with a given probability distribution (a few of these points have been suggested in the figure). When enough points have been generated, we can make a histogram, counting the number of points inside each cell. If the histogram is represented by building a prism on top of each cell, the height of the prism representing the number of points in that cell, then in the limit of very small cells and a very large number of points we obtain a view of the probability density f(r, ϕ) representing the random process. More precisely, letting Δn be the number of points inside the cell at point {r, ϕ}, the probability density f(r, ϕ) is defined, when the total number of points tends to infinity, as

$$ f(r, \phi) = \lim_{\Delta r \to 0,\ \Delta\phi \to 0} \frac{\Delta n}{\Delta r \, \Delta\phi} . \tag{1} $$

Once the probability density f(r, ϕ) is known, a direct use of this definition, and of the definition of integral sum, shows that we can evaluate the probability P for the next point to materialize inside some domain A inside the circle as

$$ P(A) = \iint_A dr \, d\phi \; f(r, \phi) . \tag{2} $$

This is the expression usually taken to define the probability density.

More generally, in an abstract space (or, more technically, in a manifold) with some arbitrary coordinates {x₁, x₂, …, xₙ}, the probability of a domain A is, by definition of probability density,

$$ P(A) = \int_A dx_1 \, dx_2 \cdots dx_n \; f(x_1, x_2, \dots, x_n) . \tag{3} $$

Because of this particular way of defining a probability density, which is fundamentally associated with a given coordinate system {x₁, x₂, …, xₙ}, if one needs to evaluate the probability density (for the same random process) using some other coordinate system {y₁, y₂, …, yₙ} (this means that we choose to analyze the random process using some other variables), the new probability density g(y₁, y₂, …, yₙ) is related to the previous one through the Jacobian rule

$$ g(y_1, y_2, \dots, y_n) = f(x_1, x_2, \dots, x_n) \, \det \begin{pmatrix} \dfrac{\partial x_1}{\partial y_1} & \cdots & \dfrac{\partial x_1}{\partial y_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial x_n}{\partial y_1} & \cdots & \dfrac{\partial x_n}{\partial y_n} \end{pmatrix} . \tag{4} $$

Warning: This formula is written at a particular point P. This given point P has the coordinates {x₁, x₂, …, xₙ} in the first coordinate system, and the coordinates {y₁, y₂, …, yₙ} in the second coordinate system. What this equation says is that the values of the two functions f and g evaluated at a given point P are not identical: their ratio equals the value of the Jacobian determinant at point P.

¹ The surface of the cell going from the point {r, ϕ} to the point {r + Δr, ϕ + Δϕ} is ΔS ≈ r Δr Δϕ (that is the expression of the surface element in polar coordinates).

² When the realization of a point depends on where the previous points are located, the reasoning below is not necessarily valid.
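The histogram definition (1) and the Jacobian rule (4) can be compared numerically. The sketch below is only an illustration under stated assumptions: the random process is taken to be a uniform distribution on the unit disk (so the Cartesian density is 1/π), the cell counts are divided by the total number of points so that the estimate integrates to one, and the function names `sample_uniform_disk`, `histogram_density` and `jacobian_rule_density` are hypothetical. In the notation of equation (4), the Cartesian variables play the role of the first coordinate system and {r, ϕ} that of the second.

```python
import math
import random

def sample_uniform_disk(rng):
    """Return polar coordinates (r, phi) of a point drawn uniformly on the unit disk."""
    # Inverse-CDF sampling: P(radius <= r) = r**2 for a uniform distribution on the disk.
    r = math.sqrt(rng.random())
    phi = 2.0 * math.pi * rng.random()
    return r, phi

def histogram_density(points, n_r, n_phi):
    """Estimate f(r, phi) on cells of equal capacity dr * dphi, following equation (1)."""
    dr = 1.0 / n_r
    dphi = 2.0 * math.pi / n_phi
    counts = [[0] * n_phi for _ in range(n_r)]
    for r, phi in points:
        i = min(int(r / dr), n_r - 1)
        j = min(int(phi / dphi), n_phi - 1)
        counts[i][j] += 1
    total = len(points)
    # Assumption: counts are normalized by the total number of points.
    return [[counts[i][j] / (total * dr * dphi) for j in range(n_phi)]
            for i in range(n_r)]

def jacobian_rule_density(r, phi):
    """Density in {r, phi} predicted by equation (4) from the Cartesian density 1/pi."""
    cartesian_density = 1.0 / math.pi
    # Jacobian matrix of (x, y) = (r cos phi, r sin phi) with respect to (r, phi).
    dx_dr, dx_dphi = math.cos(phi), -r * math.sin(phi)
    dy_dr, dy_dphi = math.sin(phi), r * math.cos(phi)
    determinant = dx_dr * dy_dphi - dx_dphi * dy_dr  # equals r
    return cartesian_density * determinant

rng = random.Random(0)  # fixed seed so that the run is reproducible
N_R, N_PHI = 5, 8
points = [sample_uniform_disk(rng) for _ in range(200_000)]
estimate = histogram_density(points, N_R, N_PHI)

max_error = 0.0
for i in range(N_R):
    for j in range(N_PHI):
        r_mid = (i + 0.5) / N_R
        phi_mid = (j + 0.5) * 2.0 * math.pi / N_PHI
        predicted = jacobian_rule_density(r_mid, phi_mid)
        max_error = max(max_error, abs(estimate[i][j] - predicted))
print(f"largest |histogram - Jacobian rule| over the cells: {max_error:.4f}")
```

Under these assumptions the histogram heights grow linearly with r even though the points are spread evenly over the disk. This is the coordinate dependence of the probability density that the warning above describes.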
2 Alternative Definition

Could one have done things differently? Yes. Instead of dividing the circle into cells with constant capacity element Δc = Δr Δϕ, one could have used cells with constant surface ΔS ≈ r Δr Δϕ, as suggested in figure 2. Making a histogram, and taking the limit for infinitely small cells, would give another function, say Φ(r, ϕ), that we will refrain from calling a probability density³. With this definition, the probability of a domain A is computed via

$$ P(A) = \iint_A ds(r, \phi) \, \Phi(r, \phi) ; \qquad ds(r, \phi) = r \, dr \, d\phi , \tag{5} $$

an expression to be compared with expression 2.

Figure 2: Another division of the circle in cells using polar coordinates. Contrary to the division in cells made in figure 1, this time the cells have equal surface.

More generally, in an abstract space where the notion of volume makes sense, where some coordinates {x₁, x₂, …, xₙ} have been chosen, and where one denotes the volume element as dv(x₁, x₂, …, xₙ), the probability of a domain A would be calculated as

$$ P(A) = \int_A dv(x_1, x_2, \dots, x_n) \, \Phi(x_1, x_2, \dots, x_n) , \tag{6} $$

an expression to be compared with expression 3. With this definition, the Jacobian rule mentioned above is not valid. Had one used some other variables {y₁, y₂, …, yₙ}, one would have obtained the function Ω(y₁, y₂, …, yₙ). The relation between the two functions would here simply be

$$ \Omega(y_1, y_2, \dots, y_n) = \Phi(x_1, x_2, \dots, x_n) . \tag{7} $$

³ It could be called a volumetric probability ("volumetric" because "volume" is the term used, in any number of dimensions, for what in two dimensions is a "surface").
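Relation (7) and the integral (5) can be illustrated with a small numerical sketch. The particular function Φ(r, ϕ) = (2/π)(1 − r²) is an assumption chosen only for illustration, as are the names `phi_polar`, `omega_cartesian` and `probability_polar`. The integral is approximated with a plain midpoint rule.

```python
import math

def phi_polar(r, angle):
    """Hypothetical volumetric probability in polar variables: (2/pi) * (1 - r^2)."""
    return (2.0 / math.pi) * (1.0 - r * r)

def omega_cartesian(x, y):
    """The same volumetric probability expressed with Cartesian variables."""
    return (2.0 / math.pi) * (1.0 - x * x - y * y)

def probability_polar(r_max, n_r=400, n_phi=400):
    """Midpoint-rule version of equation (5) for the domain r <= r_max."""
    dr = r_max / n_r
    dphi = 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_phi):
            angle = (j + 0.5) * dphi
            total += phi_polar(r, angle) * r * dr * dphi  # ds = r dr dphi
    return total

# One point P, described in two coordinate systems.
r, angle = 0.6, 2.0
x, y = r * math.cos(angle), r * math.sin(angle)
value_polar = phi_polar(r, angle)
value_cartesian = omega_cartesian(x, y)
print(value_polar, value_cartesian)  # the two values agree up to rounding

# Probability of the whole unit disk and of the inner disk r <= 1/2.
whole_disk = probability_polar(1.0)
inner_disk = probability_polar(0.5)
print(whole_disk, inner_disk)
```

For this assumed Φ the exact values are 1 for the whole disk and 7/16 for the inner disk, so the midpoint sums should come out close to those numbers. No Jacobian factor appears when comparing `phi_polar` and `omega_cartesian`: only the volume element changes its expression from one coordinate system to the other.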
The same warning made before applies to equation 7: this formula is written at a particular point P. This given point P has the coordinates {x₁, x₂, …, xₙ} in the first coordinate system, and the coordinates {y₁, y₂, …, yₙ} in the second coordinate system. This equation is not saying that the two functions are identical (in their dependence on their variables). What this equation says is that the values of the two functions Φ and Ω evaluated at a given point P are identical (they are true scalars). More explicitly, one should write

$$ \Omega\big( y_1(x_1, \dots, x_n), \dots, y_n(x_1, \dots, x_n) \big) = \Phi(x_1, \dots, x_n) . $$

The definitions in this section may seem better than those in the previous section. The problem is that they are only possible because the space we are considering has a notion of volume. In the abstract spaces we face in the use of probability theory, where the variables {x₁, …, xₙ} are physical quantities (masses, times, thicknesses, electric conductivities, etc.), it may or may not be possible to introduce a notion of volume. It is to avoid this question that the founding fathers of probability theory chose to introduce the notion of probability density. In all books written for physicists⁴, the Jacobian rule is assumed. Therefore it is the definitions of the previous section that the student should keep in mind.

3 Morality

If you have a probability density in spherical coordinates, say f(r, θ, ϕ), where the volume element is dv(r, θ, ϕ) = r² sin θ dr dθ dϕ, forget about it, as you should never compute the probability of a domain A via

$$ P(A) = \iiint_A dv(r, \theta, \phi) \, f(r, \theta, \phi) \qquad \text{(wrong)} \tag{8} $$

but via

$$ P(A) = \iiint_A dr \, d\theta \, d\phi \; f(r, \theta, \phi) \qquad \text{(right)} . \tag{9} $$

4 Abstract Variables

When one has a probability density f(u, v) depending on two variables {u, v}, it is customary to use (for making histograms, and for representing the probability density itself) the two-axis representation suggested in figure 3, as if the variables were Cartesian coordinates in a Euclidean space.

⁴ We will never understand a probability book written for mathematicians (who are not able to work with real life probabilities).
Figure 3: Usual representation of two variables {u, v} appearing in a probability density f(u, v) (axes: u and v).

This is reasonable, but the student should note that strict adherence to this rule would suggest redrawing figure 1 as suggested in figure 4.

Figure 4: An alternative presentation of the data (similar to those) in figure 1. The cells in this figure (which have constant capacity element Δc = Δr Δϕ) are the same as those in figure 1 (marked lines: r = 0, r = 1/2, r = 1).
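As a closing numerical check of the rule stated in section 3, the sketch below evaluates both expressions (8) and (9) for one concrete case. The choice of a point uniformly distributed in the unit ball, with probability density f(r, θ, ϕ) = (3/4π) r² sin θ in spherical coordinates, is an assumption made only for illustration, and the names `density_spherical`, `integrate`, `capacity_weight` and `volume_weight` are hypothetical. Treating {r, θ, ϕ} as if they were Cartesian variables, in the spirit of figure 4, corresponds to the plain sum over dr dθ dϕ used in (9).

```python
import math

def density_spherical(r, theta, phi):
    """Probability density f(r, theta, phi) of a point uniform in the unit ball."""
    # Assumed example: Cartesian density 3/(4 pi) times the Jacobian r^2 sin(theta).
    return 3.0 / (4.0 * math.pi) * r * r * math.sin(theta)

def integrate(r_max, weight, n=60):
    """Midpoint-rule integral of weight * f over r <= r_max, all theta and phi."""
    dr = r_max / n
    dtheta = math.pi / n
    dphi = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            theta = (j + 0.5) * dtheta
            for k in range(n):
                phi = (k + 0.5) * dphi
                f = density_spherical(r, theta, phi)
                total += weight(r, theta) * f * dr * dtheta * dphi
    return total

def capacity_weight(r, theta):
    """Equation (9): integrate f against the plain capacity element dr dtheta dphi."""
    return 1.0

def volume_weight(r, theta):
    """Equation (8): integrate f against dv = r^2 sin(theta) dr dtheta dphi."""
    return r * r * math.sin(theta)

right_total = integrate(1.0, capacity_weight)
wrong_total = integrate(1.0, volume_weight)
right_inner = integrate(0.5, capacity_weight)
print(f"expression (9), whole ball: {right_total:.4f}")
print(f"expression (8), whole ball: {wrong_total:.4f}")
print(f"expression (9), r <= 1/2:   {right_inner:.4f}")
```

For this assumed density the exact values are 1 for expression (9) over the whole ball, 3π/20 (about 0.471) for expression (8), and 1/8 for expression (9) over the inner ball r ≤ 1/2. Expression (8) fails to give a total probability of one because the factor r² sin θ is already contained in the probability density f.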