Probability Density (1)

Let $f(x_1, x_2, \dots, x_n)$ be a probability density for the variables $\{x_1, x_2, \dots, x_n\}$. These variables can always be viewed as coordinates over an abstract space (a manifold). The probability of a domain $A$ is computed via
\[ P(A) = \int_A dx_1 \, dx_2 \cdots dx_n \; f(x_1, x_2, \dots, x_n) . \]
Even when there is a volume element $dv(x_1, x_2, \dots, x_n)$ over the space, one should never integrate a probability density using
\[ P(A) = \int_A dv(x_1, x_2, \dots, x_n) \; f(x_1, x_2, \dots, x_n) . \]
Probability Density (2)

When changing from the variables $\{x_1, x_2, \dots, x_n\}$ to some other variables $\{y_1, y_2, \dots, y_n\}$, the probability distribution that was represented by the probability density $f(x_1, x_2, \dots, x_n)$ is now represented by a probability density $g(y_1, y_2, \dots, y_n)$, and one has (Jacobian rule)
\[ g(y_1, y_2, \dots, y_n) = f(x_1, x_2, \dots, x_n) \, J , \]
where $J$ is the absolute value of the determinant of the matrix of partial derivatives
\[ \begin{pmatrix} \dfrac{\partial x_1}{\partial y_1} & \cdots & \dfrac{\partial x_1}{\partial y_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial x_n}{\partial y_1} & \cdots & \dfrac{\partial x_n}{\partial y_n} \end{pmatrix} . \]
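The Jacobian rule can be checked numerically in one dimension. The sketch below is only an illustration under assumed choices that are not in the slides: the density $f(x) = e^{-x}$ on $x > 0$, the change of variable $y = \ln x$, and a hypothetical helper `integrate` implementing a midpoint rule. It compares the probability of the same domain computed in each coordinate.

```python
import math

def f(x):
    """Density of x on x > 0 (illustrative choice: exponential with unit rate)."""
    return math.exp(-x)

def g(y):
    """Density of y = ln x from the Jacobian rule: g(y) = f(x(y)) * |dx/dy|."""
    x = math.exp(y)
    jacobian = math.exp(y)  # |dx/dy| for x = exp(y)
    return f(x) * jacobian

def integrate(func, a, b, n=20000):
    """Composite midpoint rule on [a, b] with n cells (hypothetical helper)."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

# The same domain A in each coordinate: 1 < x < 2  <=>  0 < y < ln 2.
p_x = integrate(f, 1.0, 2.0)
p_y = integrate(g, 0.0, math.log(2.0))

# g should also be normalized; it is negligible outside [-30, 5].
total_y = integrate(g, -30.0, 5.0)

print(p_x, p_y, total_y)
```

Both densities are integrated against their own coordinate differential ($dx$ or $dy$), and the two probabilities should agree up to quadrature error.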
Marginal Probability Density

When the whole set of variables $\{x_1, x_2, \dots, x_n\}$ naturally separates into two groups of variables $\{u_1, u_2, \dots, u_p\}$ and $\{v_1, v_2, \dots, v_q\}$ (with $p + q = n$), all the information concerning the variables $\{u_1, u_2, \dots, u_p\}$ alone is contained in the marginal probability density
\[ f_u(u_1, u_2, \dots, u_p) = \int_{\text{all range}} dv_1 \cdots dv_q \; f(u_1, u_2, \dots, u_p, v_1, v_2, \dots, v_q) . \]
Similarly, all the information concerning the variables $\{v_1, v_2, \dots, v_q\}$ alone is contained in the marginal probability density
\[ f_v(v_1, v_2, \dots, v_q) = \int_{\text{all range}} du_1 \cdots du_p \; f(u_1, u_2, \dots, u_p, v_1, v_2, \dots, v_q) . \]
Warning

This definition of marginal probability density is, except for some minor details of interpretation, safe. The same is not true of the definition of conditional probability density. The simple definition found in most texts is usually overinterpreted and leads to paradoxes, the most famous being the Borel paradox.
Reproduced from Kolmogorov's Foundations of the Theory of Probability (1950, pp. 50-51).

2. Explanation of a Borel Paradox

Let us choose for our basic set E the set of all points on a spherical surface. Our F will be the aggregate of all Borel sets of the spherical surface. And finally, our P(A) is to be proportional to the measure set of A. Let us now choose two diametrically opposite points for our poles, so that each meridian circle will be uniquely defined by the longitude $\psi$, $0 \le \psi < \pi$. Since $\psi$ varies from $0$ only to $\pi$, in other words, we are considering complete meridian circles (and not merely semicircles), the latitude $\theta$ must vary from $-\pi$ to $+\pi$ (and not from $-\frac{\pi}{2}$ to $+\frac{\pi}{2}$). Borel set the following problem: Required to determine the conditional probability distribution of latitude $\theta$, $-\pi \le \theta < +\pi$, for a given longitude $\psi$.

It is easy to calculate that
\[ P_\psi(\theta_1 \le \theta < \theta_2) = \frac{1}{4} \int_{\theta_1}^{\theta_2} |\cos\theta| \, d\theta . \]
The probability distribution of $\theta$ for a given $\psi$ is not uniform.

If we assume that the conditional probability distribution of $\theta$ with the hypothesis that $\xi$ lies on the given meridian circle must be uniform, then we have arrived at a contradiction.

This shows that the concept of a conditional probability with regard to an isolated given hypothesis whose probability equals 0 is inadmissible. For we can obtain a probability distribution for $\theta$ on the meridian circle only if we regard this circle as an element of the decomposition of the entire spherical surface into meridian circles with the given poles.
Conditional Probability

(Not yet conditional probability density)
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} \]
\[ f(u \mid v_0) = \frac{f(u, v_0)}{\int du \; f(u, v_0)} \]
\[ f\bigl(u \mid v = v(u)\bigr) = \frac{f\bigl(u, v(u)\bigr)}{\int du \; f\bigl(u, v(u)\bigr)} \]
The Borel Paradox

An example of the danger of overinterpreting the usual definition of conditional probability density.

Arbitrary probability density over the sphere, using spherical coordinates: $f(\theta, \varphi)$.
\[ P(A) = \int_A d\theta \, d\varphi \; f(\theta, \varphi) \]
The homogeneous probability density:
\[ f(\theta, \varphi) = \frac{\sin\theta}{4\pi} \, ; \qquad \int_0^{\pi} d\theta \int_0^{2\pi} d\varphi \; f(\theta, \varphi) = 1 \]
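The normalization above, and the earlier rule that a density is integrated against the coordinate differentials rather than a volume element, can both be checked numerically. This is a minimal sketch: the midpoint-rule helper `integrate_2d` and the grid sizes are assumptions made for illustration only.

```python
import math

def f(theta, phi):
    """Homogeneous probability density on the sphere in coordinates (theta, phi)."""
    return math.sin(theta) / (4.0 * math.pi)

def integrate_2d(func, n_theta=400, n_phi=400):
    """Midpoint rule over 0 <= theta <= pi, 0 <= phi <= 2*pi (hypothetical helper)."""
    h_t = math.pi / n_theta
    h_p = 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        t = (i + 0.5) * h_t
        for j in range(n_phi):
            p = (j + 0.5) * h_p
            total += func(t, p)
    return total * h_t * h_p

# Correct: integrate the density against d(theta) d(phi).
total_coordinate = integrate_2d(f)

# Incorrect: weight the density by the surface element
# dv = sin(theta) d(theta) d(phi) of the unit sphere.
total_volume = integrate_2d(lambda t, p: f(t, p) * math.sin(t))

print(total_coordinate, total_volume)
```

The first total should be close to 1. The second evaluates $\int \sin^2\theta / (4\pi)$, which is $\pi/4$, so it is not a valid total probability.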
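Marginal probability density for $\theta$:
\[ f_\theta(\theta) = \int_0^{2\pi} d\varphi \; f(\theta, \varphi) = \frac{\sin\theta}{2} \]
Marginal probability density for $\varphi$:
\[ f_\varphi(\varphi) = \int_0^{\pi} d\theta \; f(\theta, \varphi) = \frac{1}{2\pi} \]
Interpretation O.K.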
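As a quick numerical cross-check of the two marginals, one can integrate the homogeneous density over the other coordinate at an arbitrary point. The helper `integrate` and the evaluation points `theta_test` and `phi_test` are hypothetical choices for this sketch.

```python
import math

def f(theta, phi):
    """Homogeneous probability density sin(theta) / (4 pi)."""
    return math.sin(theta) / (4.0 * math.pi)

def integrate(func, a, b, n=2000):
    """Composite midpoint rule on [a, b] (hypothetical helper)."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def marginal_theta(theta):
    """Integrate out phi over [0, 2 pi]."""
    return integrate(lambda phi: f(theta, phi), 0.0, 2.0 * math.pi)

def marginal_phi(phi):
    """Integrate out theta over [0, pi]."""
    return integrate(lambda theta: f(theta, phi), 0.0, math.pi)

theta_test, phi_test = 0.7, 2.1  # arbitrary evaluation points
m_theta = marginal_theta(theta_test)
m_phi = marginal_phi(phi_test)

print(m_theta, math.sin(theta_test) / 2.0)
print(m_phi, 1.0 / (2.0 * math.pi))
```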
A point P has materialized on the surface of the sphere, with homogeneous probability density, and we are told that it has materialized on the meridian defined by $\varphi = \varphi_0$. What is the probability density for the colatitude $\theta$?

Conditional probability density for $\theta$ given $\varphi = \varphi_0$:
\[ f_{\theta \mid \varphi}(\theta \mid \varphi = \varphi_0) = \frac{f(\theta, \varphi_0)}{\int_0^{\pi} d\theta \; f(\theta, \varphi_0)} = \frac{\sin\theta}{2} \]
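A small Monte Carlo experiment can illustrate why the question is ill-posed as stated: "on the meridian" has probability zero, so the answer depends on how the meridian is approached. The sketch below is an illustration only, not the general theory. The meridian `phi0`, the size `eps`, the sample count and the two neighbourhood shapes are all assumptions. It conditions homogeneous samples on (1) a thin wedge of constant width in longitude and (2) a thin band at constant distance from the plane of the great circle, and compares the fraction of colatitudes that fall in the middle third of $[0, \pi]$.

```python
import math
import random

def sample_sphere(rng):
    """Draw (theta, phi) from the homogeneous density sin(theta) / (4 pi)."""
    theta = math.acos(rng.uniform(-1.0, 1.0))  # cos(theta) is uniform on [-1, 1]
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return theta, phi

def middle_third_fraction(thetas):
    """Fraction of colatitudes falling in [pi/3, 2*pi/3]."""
    inside = sum(1 for t in thetas if math.pi / 3 <= t <= 2 * math.pi / 3)
    return inside / len(thetas)

rng = random.Random(0)
phi0, eps = 1.0, 0.05  # hypothetical meridian and neighbourhood size
wedge, band = [], []
for _ in range(200_000):
    theta, phi = sample_sphere(rng)
    dphi = phi - phi0
    # Neighbourhood 1: a wedge of constant width in longitude.
    if abs(dphi) < eps:
        wedge.append(theta)
    # Neighbourhood 2: a band within distance eps of the plane of the great
    # circle, restricted to the half that contains the meridian phi = phi0.
    if abs(math.sin(theta) * math.sin(dphi)) < eps and math.cos(dphi) > 0.0:
        band.append(theta)

frac_wedge = middle_third_fraction(wedge)  # sin(theta)/2 predicts 1/2
frac_band = middle_third_fraction(band)    # a uniform density predicts 1/3

print(len(wedge), frac_wedge)
print(len(band), frac_band)
```

The wedge corresponds to conditioning on the coordinate $\varphi$ and should reproduce $\sin\theta / 2$. The band shrinks onto the same meridian but should give a nearly uniform colatitude. Both are legitimate limits, which is the content of the paradox.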
Rather than developing here a theory that is totally free from these inconsistencies (and proposing more general formulas for the conditional probability density), I choose to take the formulas above as they are, and to give (later on) the precise conditions for their validity (conditions that are not fulfilled in the Borel problem...).
Bayes Theorem

Some variables $\{u, v\} = \{u_1, u_2, \dots, v_1, v_2, \dots\}$

Joint probability density: $f(u, v)$

Marginal probability density:
\[ f_v(v) = \int du \; f(u, v) \]
Conditional probability density:
\[ f(u \mid v_0) = \frac{f(u, v_0)}{\int du \; f(u, v_0)} \qquad\Longrightarrow\qquad f(u \mid v) = \frac{f(u, v)}{\int du \; f(u, v)} \]
Using the definition of marginal probability density, the conditional probability density can be written
\[ f(u \mid v) = \frac{f(u, v)}{f_v(v)} . \]
Therefore,
\[ f(u, v) = f(u \mid v) \, f_v(v) . \]
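The chain from joint to marginal to conditional density can be traced on a simple example. In this sketch the joint density $f(u, v) = u + v$ on the unit square, the value `v0` and the midpoint-rule helper `integrate` are assumed for illustration only. For this choice $f_v(v) = \tfrac{1}{2} + v$.

```python
def joint(u, v):
    """Illustrative joint density on the unit square: f(u, v) = u + v."""
    return u + v

def integrate(func, a, b, n=2000):
    """Composite midpoint rule on [a, b] (hypothetical helper)."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def marginal_v(v):
    """f_v(v): integrate the joint density over u."""
    return integrate(lambda u: joint(u, v), 0.0, 1.0)

def conditional(u, v):
    """f(u | v) = f(u, v) / f_v(v)."""
    return joint(u, v) / marginal_v(v)

v0 = 0.3
# The conditional density should integrate to 1 over u.
norm = integrate(lambda u: conditional(u, v0), 0.0, 1.0, n=200)
# Product rule: f(u | v) f_v(v) should recover f(u, v).
u0 = 0.5
recovered = conditional(u0, v0) * marginal_v(v0)

print(marginal_v(v0), norm, recovered, joint(u0, v0))
```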