Jointly continuous distributions and the multivariate Normal


Márton Balázs and Bálint Tóth (University of Bristol / Budapest University of Technology and Economics)

October 2014

This little write-up is part of important foundations of probability that were left out of the unit Probability 1 due to lack of time and prerequisites. Elementary linear algebra and multivariate calculus are required.

1 Basics of joint distributions

We give the fundamental tools to investigate the joint behaviour of several random variables. For most cases "several" will mean two, but everything can easily be generalised to any (possibly countably infinite) number of variables. Let X and Y be random variables. The most important object to describe their joint behaviour is the following.

Definition 1 The joint distribution function of the random variables X and Y is
$$F(x, y) := P\{X \le x,\; Y \le y\}.$$

This function has an answer to every meaningful question one wants to know about the distribution of X and Y. For example, let $a < b$ and $c < d$ be fixed real numbers. Then
$$P\{a < X \le b,\; c < Y \le d\} = F(b, d) - F(a, d) - F(b, c) + F(a, c), \tag{1}$$
which can be seen in the picture below after realising that $F(x, y)$ is the probability of our random point $(X, Y)$ falling in the lower-left quadrant with corner $(x, y)$. Looking for the probability of the upper-right (non-shaded) rectangle $(a, b] \times (c, d]$, we can express exactly that using the above combination.

[Figure: the rectangle $(a, b] \times (c, d]$ in the $(x, y)$ plane, built from the four lower-left quadrants with corners $(b, d)$, $(a, d)$, $(b, c)$ and $(a, c)$.]

Here is a formal derivation of the same thing:
$$\begin{aligned}
\{a < X \le b,\; c < Y \le d\} &= \{X \le b\} \cap \{Y \le d\} \cap \big(\{X \le a\} \cup \{Y \le c\}\big)^c \\
&= \big(\{X \le b\} \cap \{Y \le d\}\big) \setminus \big[\big(\{X \le a\} \cap \{Y \le d\}\big) \cup \big(\{X \le b\} \cap \{Y \le c\}\big)\big].
\end{aligned}$$
As the subtracted set is a subset of the previous one, and the two sets in the union intersect in $\{X \le a\} \cap \{Y \le c\}$, inclusion–exclusion gives
$$\begin{aligned}
P\{a < X \le b,\; c < Y \le d\} &= P\{X \le b,\, Y \le d\} - \big[P\{X \le a,\, Y \le d\} + P\{X \le b,\, Y \le c\} - P\{X \le a,\, Y \le c\}\big] \\
&= F(b, d) - F(a, d) - F(b, c) + F(a, c).
\end{aligned}$$
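Formula (1) is easy to check numerically. Here is a minimal Python sketch (an illustration with our own choice of a dependent pair; assuming numpy): on the empirical distribution function of simulated data the identity holds up to float rounding, since both sides count the same sample points.

```python
import numpy as np

rng = np.random.default_rng(0)

# A joint distribution with dependence between X and Y (our choice).
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)

def F(s, t):
    """Empirical joint distribution function F(s,t) = P{X <= s, Y <= t}."""
    return np.mean((x <= s) & (y <= t))

a, b, c, d = -1.0, 0.5, -0.3, 1.2

lhs = np.mean((a < x) & (x <= b) & (c < y) & (y <= d))
rhs = F(b, d) - F(a, d) - F(b, c) + F(a, c)
print(lhs, rhs)  # agree up to rounding: both count the same sample points
```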

Now, every element of the σ-algebra generated by rectangles, that is, every Borel measurable set, has a probability that can be expressed in terms of $F$. Non-negativity of all such probabilities gives nontrivial conditions a function $F$ of two variables must satisfy in order to be a distribution function. For more variables the situation is even more complicated.

One of the fundamental questions regarding joint distributions is the behaviour of one single variable:

Definition 2 The marginal distribution of the random variable X is $F_X(x) := P\{X \le x\}$.

Here we do not care at all about Y. A similar definition applies to the marginal of Y, when X is completely disregarded.

Proposition 3 The marginal distributions can be expressed in terms of the joint distribution as
$$F_X(x) = \lim_{y \to \infty} F(x, y), \qquad F_Y(y) = \lim_{x \to \infty} F(x, y).$$

It is very important to notice that the marginal distributions do not contain enough information to restore the whole joint distribution.

Proof. The events $\{Y \le n\}$ are non-decreasing in $n$, and $\lim_n \{Y \le n\} = \bigcup_{n > 0} \{Y \le n\} = \Omega$. Therefore, by continuity of probability,
$$F_X(x) = P\{X \le x\} = P\big(\{X \le x\} \cap \lim_n \{Y \le n\}\big) = P\big(\lim_n \big(\{X \le x\} \cap \{Y \le n\}\big)\big) = \lim_n P\{X \le x,\; Y \le n\} = \lim_{y \to \infty} F(x, y),$$
as the intersection of the two events is also non-decreasing in $n$; finally we make use of the monotonicity of $F$ to pass from the limit along the integers $n$ to the limit along real values $y$.

The case of jointly discrete random variables is covered in the Probability 1 slides. Here we concentrate on the jointly continuous case.

Definition 4 The pair (X, Y) has a jointly continuous distribution if there exists a probability density function $f$ of two variables such that for all measurable sets $C \subseteq \mathbb{R}^2$,
$$P\{(X, Y) \in C\} = \iint_C f(x, y)\,dx\,dy.$$

It is important to notice that marginally continuous random variables are not necessarily jointly continuous. The pair X = Y (with X any continuous variable, uniform on (0, 1) say) has a distribution that is concentrated on a line, and therefore cannot have a joint density function.

A few simple propositions follow easily from the definition:

Proposition 5 For all measurable sets $A, B \subseteq \mathbb{R}$,
$$P\{X \in A,\; Y \in B\} = \int_A \int_B f(x, y)\,dy\,dx.$$
In particular, the intuitive meaning of the joint density is demonstrated by
$$P\{X \in (a, a+\varepsilon),\; Y \in (b, b+\delta)\} = \int_a^{a+\varepsilon}\!\!\int_b^{b+\delta} f(x, y)\,dy\,dx \simeq f(a, b)\,\varepsilon\delta.$$
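To see Proposition 5 and the $\varepsilon\delta$ intuition in action, here is a minimal numerical sketch (assuming scipy; the density chosen, that of two independent Exp(1) variables, is our own example):

```python
import numpy as np
from scipy.integrate import dblquad

# Joint density f(x, y) = e^{-x-y} for x, y > 0 (two independent Exp(1) variables).
def f(y, x):
    return np.exp(-x - y) if (x > 0 and y > 0) else 0.0

a, b, eps, delta = 1.0, 0.5, 0.01, 0.01

# P{X in (a, a+eps), Y in (b, b+delta)} by double integration...
p, _ = dblquad(f, a, a + eps, b, b + delta)
# ...is close to f(a, b) * eps * delta for a small rectangle.
print(p, np.exp(-a - b) * eps * delta)
```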

Proposition 6
$$F(a, b) = P\{X \le a,\; Y \le b\} = \int_{-\infty}^a \int_{-\infty}^b f(x, y)\,dy\,dx, \quad\text{from which}\quad f(a, b) = \frac{\partial^2 F(a, b)}{\partial a\,\partial b} = \frac{\partial^2 F(a, b)}{\partial b\,\partial a}.$$

Proposition 7 The marginal densities of the random variables X and Y, respectively, are
$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy \qquad\text{and}\qquad f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx.$$

Proof. For all measurable $A \subseteq \mathbb{R}$,
$$P\{X \in A\} = P\{X \in A,\; Y \in \mathbb{R}\} = \int_A \int_{-\infty}^{\infty} f(x, y)\,dy\,dx,$$
which can be compared to the definition $P\{X \in A\} = \int_A f_X(x)\,dx$ of the marginal density to conclude the statement.

Example 8 Let the joint density of X and Y be given by
$$f(x, y) = \begin{cases} e^{-x/y}\,e^{-y}/y, & \text{if } x,\, y > 0,\\ 0, & \text{otherwise.} \end{cases}$$
Determine the marginal distribution of Y.

Let $y > 0$, as otherwise the marginal density is surely zero. According to the above,
$$f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx = \int_0^{\infty} e^{-x/y}\,e^{-y}/y\,dx = e^{-y}.$$
Therefore $Y \sim \mathrm{Exp}(1)$; its marginal distribution function is $F_Y(y) = 1 - e^{-y}$ for positive $y$, and zero otherwise. Observe that finding the marginal of X would be much more troublesome.
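The integral in Example 8 can be confirmed numerically; a minimal sketch (assuming scipy, with our own helper name f_Y):

```python
import numpy as np
from scipy.integrate import quad

# Marginal of Y in Example 8: integrate f(x, y) = e^{-x/y} e^{-y} / y over x > 0.
def f_Y(y):
    val, _ = quad(lambda x: np.exp(-x / y) * np.exp(-y) / y, 0, np.inf)
    return val

for y in [0.3, 1.0, 2.5]:
    print(f_Y(y), np.exp(-y))  # matches the Exp(1) density e^{-y}
```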

2 Transforming joint distributions

In several dimensions there are various cases of transforming variables. If we are after the distribution of a real-valued function $Z = g(X_1, X_2)$ of a pair $(X_1, X_2)$, then we can write
$$F_Z(z) = P\{Z \le z\} = P\{g(X_1, X_2) \le z\} = \iint_{(x_1, x_2)\,:\, g(x_1, x_2) \le z} f(x_1, x_2)\,dx_1\,dx_2. \tag{2}$$
If, after integration, $F_Z$ is differentiable, then the density $f_Z$ of $Z$ can be obtained this way. An example follows in Section 6.

If we deal with an $\mathbb{R}^2$-valued function $(Y_1, Y_2) = \big(g_1(X_1, X_2),\, g_2(X_1, X_2)\big)$ of the pair $(X_1, X_2)$, we can still proceed along the definitions. For simplicity, we use a vector notation:
$$F_{\mathbf{Y}}(\mathbf{y}) = P\{Y_1 \le y_1,\; Y_2 \le y_2\} = P\{g_1(X_1, X_2) \le y_1,\; g_2(X_1, X_2) \le y_2\} = \iint_{(x_1, x_2)\,:\, g_1(x_1, x_2) \le y_1,\; g_2(x_1, x_2) \le y_2} f_{\mathbf{X}}(x_1, x_2)\,dx_1\,dx_2. \tag{3}$$
If the mixed partial derivative exists, it gives the joint density of $(Y_1, Y_2)$. Assuming $g$ is nice enough, one can summarise the above as:

Proposition 9 (two dimensional transformations) Let $g : \mathbb{R}^2 \to \mathbb{R}^2$ be such that it is one-to-one, it has continuous partial derivatives, and its Jacobian
$$J = \det \begin{pmatrix} \partial g_1/\partial x_1 & \partial g_1/\partial x_2 \\[2pt] \partial g_2/\partial x_1 & \partial g_2/\partial x_2 \end{pmatrix} \ne 0 \quad\text{on } \mathbb{R}^2.$$
Then $g$ is invertible, and the random variable $\mathbf{Y} := g(\mathbf{X})$ is jointly continuous with density
$$f_{\mathbf{Y}}(\mathbf{y}) = f_{\mathbf{X}}\big(g^{-1}(\mathbf{y})\big)\,|J|^{-1}.$$
The last term is the reciprocal of (the absolute value of) the Jacobi determinant taken at $\mathbf{x} = g^{-1}(\mathbf{y})$. Compare this to the one dimensional statement from Probability 1.

Proof. Continue from (3) and perform the two dimensional change of variables $\mathbf{a} := g(\mathbf{x})$:
$$F_{\mathbf{Y}}(\mathbf{y}) = \iint_{a_1 < y_1,\; a_2 < y_2} f_{\mathbf{X}}\big(g^{-1}(\mathbf{a})\big)\,|J|^{-1}\,da_1\,da_2 = \int_{-\infty}^{y_1}\!\!\int_{-\infty}^{y_2} f_{\mathbf{X}}\big(g^{-1}(\mathbf{a})\big)\,|J|^{-1}\,da_2\,da_1,$$
from which
$$f_{\mathbf{Y}}(\mathbf{y}) = \frac{\partial^2 F_{\mathbf{Y}}(\mathbf{y})}{\partial y_1\,\partial y_2} = f_{\mathbf{X}}\big(g^{-1}(\mathbf{y})\big)\,|J|^{-1}.$$

A nice and important example will be given in Proposition 16 below.
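As an illustration of Proposition 9 (a sketch under our own choice of a hypothetical affine map $g(\mathbf{x}) = M\mathbf{x} + \mathbf{b}$ applied to two i.i.d. standard normals), one can compare the transformed density with a Monte Carlo estimate:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical affine map g(x) = M x + b as a test case for Proposition 9.
M = np.array([[2.0, 1.0], [0.5, 1.5]])
b = np.array([1.0, -1.0])
J = np.linalg.det(M)               # Jacobi determinant of g (constant here)
Minv = np.linalg.inv(M)

def f_X(x1, x2):                   # density of two i.i.d. N(0,1) variables
    return np.exp(-(x1**2 + x2**2) / 2) / (2 * np.pi)

def f_Y(y):                        # Proposition 9: f_X(g^{-1}(y)) / |J|
    x = Minv @ (y - b)
    return f_X(x[0], x[1]) / abs(J)

# Monte Carlo check: the fraction of transformed samples in a small box
# around y0, divided by the box area, should be close to f_Y(y0).
X = rng.normal(size=(200_000, 2))
Y = X @ M.T + b
y0, h = np.array([1.5, 0.0]), 0.1
frac = np.mean(np.all(np.abs(Y - y0) < h / 2, axis=1))
print(frac / h**2, f_Y(y0))
```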

Proof.. That inepenence implies is trivial. To prove the reverse, let a < b an c <. Then, see, P{a < X b, c < Y = Fb, Fa, Fb, c+fa, c = F X b F Y F X a F Y F X b F Y c+f X a F Y c = F X b F X a F Y F Y c = P{a < X b P{c < Y. This is the efining property of inepenence for intervals, the extension to all orel-measurable sets is one in the usual measure-theoretic way.. If X an Y are jointly continuous an inepenent, then fx, y = Fx, y = x y x y F Xx F Y y = x F Xx y F Yy = f X x f Y y. To see the reverse, if fx, y = f X x f Y y, then for any A an measurable sets, P{X A, Y = fx, yyx = f X x f Y yyx A A = f X xx A f Y yy = P{X A P{Y. 3. This epens on the right grouping of multiplicative constants, exactly the same way as one in Probability for the iscrete case. It now follows that knowing the marginal istributions an the fact that the ranom variables are inepenent uniquely etermines the istribution. Let X Ua, b an Y Uc, be inepenent an uniformly istribute. Then their marginal ensities are constant = /b a an / c on the respective intervals, therefore the joint ensity is also constant on the Cartesian prouct of the intervals. In other wors, the ensity is constant on the rectangle a, b c,, an the value of the constant is the reciprocal of the area of the rectangle. This motivates the following Definition Fix a measurable subset C of the plane with positive an finite Lebesgue measure C. The istribution with constant ensity on C an zero outsie C is calle uniform on C. Then the value of the constant ensity is necessarily / C. The probability of falling in a measurable A C is A / C, the proportion of areas. The first thing to o in a problem involving two imensional uniformsor inepenent one imensional uniforms is rawing a picture. The problem is then often solve by comparing areas. Example 3 Juliet an John arrive ranomly an inepenently to a renez-vous between :00 an 3:00. What is the probability that the first to arrive waits more than 0 minutes for the secon one? We can assume that the respective arrival times X an Y are i.i.. U:00, 3:00 ranom variables. Therefore the joint istribution is uniform on the square :00, 3:00 :00, 3:00. Plotting the event { X Y > 0 in question: Y 3:00 Y X = 0 Y X = 0 :0 :0 X 3:00 5

Example 14 Let the joint density of X and Y be given by
$$f(x, y) = \begin{cases} 24xy, & \text{if } 0 < x < 1,\; 0 < y < 1,\; 0 < x + y < 1,\\ 0, & \text{otherwise.} \end{cases}$$
Are X and Y independent?

At first sight one might think that this function is of product form. However, including the conditions given,
$$f(x, y) = 24xy \cdot \mathbf{1}\{0 < x < 1\} \cdot \mathbf{1}\{0 < y < 1\} \cdot \mathbf{1}\{0 < x + y < 1\}$$
clearly shows that this is not the case, due to the last indicator. Any pair of variables whose distribution is not concentrated on a product set cannot be independent, and this is the case here.

4 Multivariate Normal

The following definition is given in general, $d$ dimensions.

Definition 15 Let A be a positive definite symmetric real $d \times d$ matrix (that is, $\mathbf{x}^T\!A\mathbf{x} > 0$ for any nonzero $\mathbf{x} \in \mathbb{R}^d$), and $\mathbf{m} \in \mathbb{R}^d$ a fixed vector. The vector $\mathbf{X} \in \mathbb{R}^d$ of random variables is said to have the Multivariate Normal (or Multivariate Gauss) distribution if its density is given by
$$f(\mathbf{x}) = \frac{\sqrt{\det A}}{\sqrt{(2\pi)^d}}\; e^{-(\mathbf{x} - \mathbf{m})^T A\, (\mathbf{x} - \mathbf{m})/2}, \qquad \mathbf{x} \in \mathbb{R}^d. \tag{4}$$
The matrix A will be given a very natural meaning below in Proposition 20.
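As a quick sanity check of formula (4) in $d = 2$ (a sketch with arbitrarily chosen A and m; anticipating Proposition 20 below, scipy parametrises the same density by the covariance matrix $A^{-1}$):

```python
import numpy as np
from scipy.stats import multivariate_normal

A = np.array([[2.0, 0.5], [0.5, 1.0]])   # positive definite, symmetric
m = np.array([1.0, -2.0])

def f(x):
    """Density (4) in two dimensions."""
    q = (x - m) @ A @ (x - m)
    return np.sqrt(np.linalg.det(A)) / (2 * np.pi) * np.exp(-q / 2)

x = np.array([0.3, -1.2])
print(f(x), multivariate_normal(mean=m, cov=np.linalg.inv(A)).pdf(x))
```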

Proposition 16 The $d$-dimensional multivariate normal is obtained as an affine transform of $d$ i.i.d. standard normal variables.

Proof. With the above notation, the matrix A is symmetric, therefore it has real eigenvalues $\lambda_1, \dots, \lambda_d$, and we can pick corresponding orthonormal eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_d$. The orthogonal matrix P formed with these vectors as columns diagonalises A:
$$P^T\!AP = D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_d \end{pmatrix}.$$
As A is positive definite, the eigenvalues are positive, and we can consider the diagonal matrix $\sqrt{D}$ formed by the square roots of the eigenvalues. Define the affine transformation
$$g : \mathbb{R}^d \to \mathbb{R}^d, \qquad \mathbf{x} \mapsto g(\mathbf{x}) = \sqrt{D}\,P^T(\mathbf{x} - \mathbf{m})$$
(shift by $-\mathbf{m}$, rotation according to the basis $\{\mathbf{v}_i\}_i$, then stretching by various factors in various directions). Due to
$$A = PDP^T = P\sqrt{D}\,\sqrt{D}\,P^T = \big(\sqrt{D}\,P^T\big)^T\big(\sqrt{D}\,P^T\big),$$
we can write (4) as
$$f_{\mathbf{X}}(\mathbf{x}) = \frac{\sqrt{\det D}}{\sqrt{(2\pi)^d}}\; e^{-g(\mathbf{x})^T g(\mathbf{x})/2}.$$
The function g satisfies the conditions of Proposition 9 (in $d$ dimensions); its Jacobi determinant is $\det(\sqrt{D}) \cdot \det(P^T) = \pm\det(\sqrt{D}) = \pm\sqrt{\det D}$, hence the density of the random variable $\mathbf{Y} := g(\mathbf{X})$ is
$$f_{\mathbf{Y}}(\mathbf{y}) = f_{\mathbf{X}}\big(g^{-1}(\mathbf{y})\big)\,|J|^{-1} = \frac{\sqrt{\det D}}{\sqrt{(2\pi)^d}}\; e^{-\mathbf{y}^T\mathbf{y}/2} \cdot \frac{1}{\sqrt{\det D}} = \frac{1}{(2\pi)^{d/2}}\; e^{-\mathbf{y}^T\mathbf{y}/2} = \prod_{i=1}^d \frac{1}{\sqrt{2\pi}}\; e^{-y_i^2/2},$$
which exactly means that the variables $\mathbf{Y} = (Y_1, Y_2, \dots, Y_d)$ are i.i.d. $N(0, 1)$ distributed. As
$$\mathbf{X} = g^{-1}(\mathbf{Y}) = P D^{-1/2}\,\mathbf{Y} + \mathbf{m},$$
the proof is complete.

The reverse statement is also true:

Proposition 17 Any affine transform, with nonzero Jacobi determinant, of $d$ i.i.d. standard Normal random variables results in a $d$-dimensional multivariate normal distribution.

Proof. Substituting the i.i.d. Normals into the transformation Proposition 9, we easily recognise the joint density (4) with its positive definite symmetric matrix A and vector $\mathbf{m}$.

Proposition 18 The joint distribution of $d$ i.i.d. $N(0, \sigma^2)$ random variables is rotationally invariant.

Proof. We have $\mathbf{m} = \mathbf{0}$, and A is the $1/\sigma^2$ multiple of the identity matrix; therefore, with any orthogonal transformation $g : \mathbb{R}^d \to \mathbb{R}^d$, $\mathbf{x} \mapsto g(\mathbf{x}) = P\mathbf{x}$, it follows that
$$f_{\mathbf{Y}}(\mathbf{y}) = f_{\mathbf{X}}\big(g^{-1}(\mathbf{y})\big)\,|J|^{-1} = \frac{1}{(\sqrt{2\pi}\,\sigma)^d}\; e^{-(P^T\mathbf{y})^T (P^T\mathbf{y})/(2\sigma^2)} = \frac{1}{(\sqrt{2\pi}\,\sigma)^d}\; e^{-\mathbf{y}^T\mathbf{y}/(2\sigma^2)},$$
which is still the product of $N(0, \sigma^2)$ density functions.
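The proof is constructive: it tells us how to sample $\mathbf{X}$ from $d$ i.i.d. standard normals. A minimal sketch of this recipe (our variable names; eigendecomposition via numpy):

```python
import numpy as np

rng = np.random.default_rng(3)

# Sampling the multivariate normal of Definition 15 via Proposition 16:
# X = P D^{-1/2} Y + m, with Y a vector of i.i.d. N(0,1) variables.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
m = np.array([1.0, -2.0])

lam, P = np.linalg.eigh(A)          # eigenvalues and orthonormal eigenvectors
T = P @ np.diag(lam ** -0.5)        # the matrix P D^{-1/2}

Y = rng.normal(size=(100_000, 2))   # i.i.d. standard normals
X = Y @ T.T + m                     # each row is one sample of X

print(X.mean(axis=0))               # close to m
```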

5 The covariance matrix

Definition 19 Let $X_1, X_2, \dots, X_n$ be random variables. Their covariance matrix C is the $n \times n$ matrix formed of the entries
$$C_{ij} := \mathrm{Cov}(X_i, X_j), \qquad 1 \le i, j \le n.$$

Due to the properties of the covariance, this matrix is symmetric, and its diagonal consists of the variances. Let $\mathbf{a}$ be a fixed vector; then
$$\mathbf{a}^T C\,\mathbf{a} = \sum_{i,j} a_i\,\mathrm{Cov}(X_i, X_j)\,a_j = \mathrm{Cov}\Big(\sum_i a_i X_i,\; \sum_j a_j X_j\Big) = \mathrm{Var}\Big(\sum_i a_i X_i\Big) \ge 0,$$
which shows that the covariance matrix is positive semidefinite.

Proposition 20 The covariance matrix of the Multivariate Normal distribution of Definition 15 is $A^{-1}$.

Proof. We have seen that $\mathbf{X} = P D^{-1/2}\,\mathbf{Y} + \mathbf{m}$, where P is an orthogonal matrix, $D = P^T\!AP$, and the $Y_i$ are i.i.d. standard normal variables. As the covariance matrix is translation-invariant, we can immediately forget about $\mathbf{m}$. The $(i, j)$ entry of the matrix is
$$C_{ij} = \mathrm{Cov}(X_i, X_j) = \mathrm{Cov}\Big(\sum_{k,l} P_{ik} D^{-1/2}_{kl} Y_l,\; \sum_{m,h} P_{jm} D^{-1/2}_{mh} Y_h\Big) = \sum_{k,l,m,h} P_{ik} D^{-1/2}_{kl} P_{jm} D^{-1/2}_{mh}\,\mathrm{Cov}(Y_l, Y_h) = \sum_{k,l,m,h} P_{ik} D^{-1/2}_{kl} P_{jm} D^{-1/2}_{mh}\,\delta_{lh} = \sum_{k,l,m} P_{ik} D^{-1/2}_{kl} D^{-1/2}_{lm} P^T_{mj} = \big(P D^{-1} P^T\big)_{ij} = \big(A^{-1}\big)_{ij},$$
where $\delta_{hl} = \mathbf{1}\{h = l\}$ is Kronecker's delta, and we used $A^{-1} = (PDP^T)^{-1} = P D^{-1} P^T$.
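Proposition 20 can be checked empirically with the sampler built from Proposition 16 (a sketch; A and m are our arbitrary choices from before):

```python
import numpy as np

rng = np.random.default_rng(4)

# Empirical check of Proposition 20: the covariance of samples drawn via
# X = P D^{-1/2} Y + m should be close to A^{-1}.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
m = np.array([1.0, -2.0])

lam, P = np.linalg.eigh(A)
X = rng.normal(size=(200_000, 2)) @ (P @ np.diag(lam ** -0.5)).T + m

print(np.cov(X, rowvar=False))   # sample covariance matrix
print(np.linalg.inv(A))          # the claim: Cov(X) = A^{-1}
```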

6 Continuous convolutions

Discrete, integer-valued convolutions were covered in Probability 1. Here we derive the continuous convolution formula and show a few applications. The convolution of Exponential and Gamma distributions was also covered in Probability 1.

Let X and Y be independent continuous random variables. We start with the distribution function of the sum, with the help of (2):
$$F_{X+Y}(a) = P\{X + Y \le a\} = \iint_{x + y \le a} f(x, y)\,dx\,dy = \int_{-\infty}^{\infty}\!\int_{-\infty}^{a - y} f_X(x)\,f_Y(y)\,dx\,dy = \int_{-\infty}^{\infty} F_X(a - y)\,f_Y(y)\,dy.$$
This formula is called the convolution of the distribution functions. Differentiating it gives

Proposition 21 If X and Y are continuous independent random variables, then
$$f_{X+Y}(a) = \int_{-\infty}^{\infty} f_X(a - y)\,f_Y(y)\,dy,$$
a formula known as the convolution of the density functions.

Example 22 Determine the density of the sum of two i.i.d. $U(0, 1)$ random variables.

With independent uniform variables it is always advisable to work with pictures and distribution functions. Instead, here we apply the convolution formula. The marginal densities are $f_X(z) = f_Y(z) = \mathbf{1}\{0 < z < 1\}$. Therefore, with $a \in (0, 2)$ (the density is zero elsewhere),
$$f_{X+Y}(a) = \int_{-\infty}^{\infty} \mathbf{1}\{0 < a - y < 1\} \cdot \mathbf{1}\{0 < y < 1\}\,dy = \int_{\max(0,\, a-1)}^{\min(1,\, a)} dy = \begin{cases} a, & \text{if } 0 < a \le 1,\\ 2 - a, & \text{if } 1 < a < 2. \end{cases}$$
The sum of independent uniforms is not uniform. However, the sum of independent Normals is Normal:

Proposition 23 Let $X \sim N(\mu_x, \sigma_x^2)$ and $Y \sim N(\mu_y, \sigma_y^2)$ be independent. Then $X + Y \sim N(\mu_x + \mu_y,\; \sigma_x^2 + \sigma_y^2)$.

Proof. Consider the case $\mu_x = \mu_y = 0$ first. The density of the sum is
$$f_{X+Y}(a) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma_x}\, e^{-(a-y)^2/(2\sigma_x^2)} \cdot \frac{1}{\sqrt{2\pi}\,\sigma_y}\, e^{-y^2/(2\sigma_y^2)}\,dy = \frac{1}{2\pi\sigma_x\sigma_y} \int_{-\infty}^{\infty} e^{-\frac{1}{2}\Big[\frac{(a-y)^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}\Big]}\,dy.$$
The strategy is to complete the square in the bracket so that the integral can be computed:
$$\frac{(a-y)^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} = \frac{\sigma_x^2 + \sigma_y^2}{\sigma_x^2\sigma_y^2}\, y^2 - \frac{2ay}{\sigma_x^2} + \frac{a^2}{\sigma_x^2} = \frac{\sigma_x^2 + \sigma_y^2}{\sigma_x^2\sigma_y^2} \Big(y - \frac{a\sigma_y^2}{\sigma_x^2 + \sigma_y^2}\Big)^2 + \frac{a^2}{\sigma_x^2 + \sigma_y^2}.$$
Introducing the integration variable $z = \Big(y - \frac{a\sigma_y^2}{\sigma_x^2 + \sigma_y^2}\Big)\,\frac{\sqrt{\sigma_x^2 + \sigma_y^2}}{\sigma_x\sigma_y}$, so that $dy = \frac{\sigma_x\sigma_y}{\sqrt{\sigma_x^2 + \sigma_y^2}}\,dz$,
$$f_{X+Y}(a) = \frac{1}{2\pi\sigma_x\sigma_y}\; e^{-\frac{a^2}{2(\sigma_x^2 + \sigma_y^2)}} \cdot \frac{\sigma_x\sigma_y}{\sqrt{\sigma_x^2 + \sigma_y^2}} \int_{-\infty}^{\infty} e^{-z^2/2}\,dz = \frac{1}{\sqrt{2\pi}\,\sqrt{\sigma_x^2 + \sigma_y^2}}\; e^{-\frac{a^2}{2(\sigma_x^2 + \sigma_y^2)}}.$$
This shows that $X + Y \sim N(0,\; \sigma_x^2 + \sigma_y^2)$. For the general case, we write $X + Y = (X - \mu_x) + (Y - \mu_y) + \mu_x + \mu_y$, where $X - \mu_x \sim N(0, \sigma_x^2)$ and $Y - \mu_y \sim N(0, \sigma_y^2)$ are independent; by the centered case their sum is $N(0,\; \sigma_x^2 + \sigma_y^2)$, hence $X + Y \sim N(\mu_x + \mu_y,\; \sigma_x^2 + \sigma_y^2)$.

Of course, the statement implies $X - Y \sim N(\mu_x - \mu_y,\; \sigma_x^2 + \sigma_y^2)$ as well (notice that the variances still add up!).
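Example 22's triangle density can also be recovered by discretising the convolution of Proposition 21 (a numerical sketch; the grid step h is our choice):

```python
import numpy as np

# Numerical convolution of two U(0,1) densities (Example 22): the result
# is the triangle density, a on (0,1] and 2-a on (1,2).
h = 0.001
z = np.arange(0, 1, h)
u = np.ones_like(z)                  # density of U(0,1) on a grid

conv = np.convolve(u, u) * h         # f_{X+Y} on the grid (0, 2)
a = np.arange(len(conv)) * h
triangle = np.where(a <= 1, a, 2 - a)
print(np.max(np.abs(conv - triangle)))  # discretisation error of order h
```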

Example 24 A basketball team plays $n_A = 26$ games against class A teams, and $n_B = 18$ games against class B teams. The probability of winning against a class A team is, independently, $p_A = 0.4$, while it is $p_B = 0.7$ against class B teams. What is the probability that our team wins 25 or more games? What is the probability that it wins more games against class A teams than against class B teams?

The number of games won against class A teams is $X \sim \mathrm{Binom}(n_A, p_A)$, against class B teams it is $Y \sim \mathrm{Binom}(n_B, p_B)$, and these variables are independent. The problem is that, due to $p_A \ne p_B$, the sum of these variables is not Binomial. However, the number of games is large enough for the De Moivre–Laplace Theorem to give a usable approximation. That is,
$$\frac{X - n_A p_A}{\sqrt{n_A p_A (1 - p_A)}} \approx N(0, 1) \qquad\text{and}\qquad \frac{Y - n_B p_B}{\sqrt{n_B p_B (1 - p_B)}} \approx N(0, 1).$$
We therefore norm our centered variables with a yet unknown constant C:
$$P\{X + Y \ge 25\} = P\{X + Y \ge 24.5\} = P\Big\{\frac{X - n_A p_A}{C} + \frac{Y - n_B p_B}{C} \ge \frac{24.5 - n_A p_A - n_B p_B}{C}\Big\}.$$
Writing $\frac{X - n_A p_A}{C} = \frac{X - n_A p_A}{\sqrt{n_A p_A (1-p_A)}} \cdot \frac{\sqrt{n_A p_A (1-p_A)}}{C}$, the first factor is approximately $N(0, 1)$, so this term is approximately $N\big(0,\; n_A p_A (1-p_A)/C^2\big)$; similarly, the second term is approximately $N\big(0,\; n_B p_B (1-p_B)/C^2\big)$, and their sum is approximately $N\big(0,\; [n_A p_A (1-p_A) + n_B p_B (1-p_B)]/C^2\big)$. From here we see that
$$C = \sqrt{n_A p_A (1-p_A) + n_B p_B (1-p_B)}$$
is a reasonable choice, as then the left hand-side will be close to a standard normal. The De Moivre–Laplace approximation remains valid after this norming as long as neither term dominates the other, that is, $\sqrt{n_A p_A (1-p_A)}$ and $\sqrt{n_B p_B (1-p_B)}$ do not have different orders of magnitude. In our case their values are approximately 2.50 and 1.94. Of course these are intuitive arguments; precise statements would require estimates of the error terms. With this choice of C,
$$P\{X + Y \ge 25\} \approx 1 - \Phi\Big(\frac{24.5 - n_A p_A - n_B p_B}{\sqrt{n_A p_A (1-p_A) + n_B p_B (1-p_B)}}\Big) = 1 - \Phi(0.474) \approx 0.32.$$
Similarly, the probability of winning more games against class A teams than against class B teams is
$$P\{X - Y > 0\} = P\{X - Y \ge 0.5\} = P\Big\{\frac{X - n_A p_A}{C} - \frac{Y - n_B p_B}{C} \ge \frac{0.5 - n_A p_A + n_B p_B}{C}\Big\},$$
hence we can use the same norming constant C as before (the argument goes through with $-Y$ in place of $Y$), and
$$P\{X - Y > 0\} \approx 1 - \Phi\Big(\frac{0.5 - n_A p_A + n_B p_B}{\sqrt{n_A p_A (1-p_A) + n_B p_B (1-p_B)}}\Big) = 1 - \Phi(0.853) \approx 0.20.$$
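Since X and Y take only finitely many values, the answers can also be computed exactly, which shows how good the normal approximation is (a sketch assuming scipy; the convolution of the two probability mass functions gives the distribution of X + Y):

```python
import numpy as np
from scipy.stats import binom, norm

# Example 24: exact binomial computation versus the normal approximation.
nA, pA, nB, pB = 26, 0.4, 18, 0.7
X = binom(nA, pA)
Y = binom(nB, pB)

# Exact P{X + Y >= 25} by convolving the two pmfs.
pmf = np.convolve(X.pmf(np.arange(nA + 1)), Y.pmf(np.arange(nB + 1)))
print(pmf[25:].sum())

# Exact P{X > Y} by conditioning on the value of X.
print(sum(X.pmf(k) * Y.cdf(k - 1) for k in range(nA + 1)))

# The normal approximations from the text.
C = np.sqrt(nA * pA * (1 - pA) + nB * pB * (1 - pB))
print(1 - norm.cdf((24.5 - nA * pA - nB * pB) / C))  # approx 0.32
print(1 - norm.cdf((0.5 - nA * pA + nB * pB) / C))   # approx 0.20
```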