Probability 2 - Notes 10

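Some Useful Inequalities

Lemma. If $X$ is a random variable and $g(x) \ge 0$ for all $x$ in the support of $f_X$, then $P(g(X) \ge 1) \le E[g(X)]$.

Proof (continuous case).

$$P(g(X) \ge 1) = \int_{x:\, g(x) \ge 1} f_X(x)\,dx \le \int_{x:\, g(x) \ge 1} g(x) f_X(x)\,dx \le \int g(x) f_X(x)\,dx = E[g(X)].$$

Corollaries

1. Markov's Inequality. For any $h > 0$, $P(|X| \ge h) \le \dfrac{E[|X|]}{h}$. When $X$ only takes non-negative values, then for any $h > 0$, $P(X \ge h) \le \dfrac{E[X]}{h}$.

Proof. Take $g(x) = |x|/h$ in the lemma. If $X$ only takes non-negative values, take $g(x) = x/h$ in the lemma.

2. Chebyshev's Inequality. If $E[X] = \mu$ and $\mathrm{Var}(X) = \sigma^2$, which are finite, then for any $h > 0$, $P(|X - \mu| \ge h) \le \dfrac{\sigma^2}{h^2}$.

Proof. Take $g(x) = \dfrac{(x - \mu)^2}{h^2}$ in the lemma.

Note: Chebyshev's inequality can be used to derive the weak law of large numbers. This is specified in the theorem below.

Theorem. Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables each with finite mean $\mu$ and finite variance $\sigma^2$. Then for any $\varepsilon > 0$ and $\delta > 0$ there exists an $N$ such that $P(|\bar{X}_n - \mu| \ge \varepsilon) \le \delta$ for all $n \ge N$, where $\bar{X}_n = \frac{1}{n}\sum_{j=1}^{n} X_j$.

Proof. Note that $E[\bar{X}_n] = \mu$ and $\mathrm{Var}(\bar{X}_n) = \sigma^2/n$. Consider any $\varepsilon > 0$ and $\delta > 0$. Apply Chebyshev's inequality to $\bar{X}_n$ and let $h = \varepsilon$. Then

$$P(|\bar{X}_n - \mu| \ge \varepsilon) \le \frac{\sigma^2}{n\varepsilon^2} \le \delta \quad \text{provided } n \ge \frac{\sigma^2}{\varepsilon^2 \delta}.$$

Therefore we need only choose $N \ge \dfrac{\sigma^2}{\varepsilon^2 \delta}$ to obtain the result.

Note. Observe that $\lim_{n \to \infty} P(|\bar{X}_n - \mu| \ge \varepsilon) = 0$ for any $\varepsilon > 0$. We say that $\bar{X}_n$ converges in probability to $\mu$ as $n$ tends to infinity.

Some examples using the inequalities.

1. From Markov's inequality with $h = N\mu$, if $X$ is a non-negative random variable, $P(X > N\mu) \le \dfrac{\mu}{N\mu} = \dfrac{1}{N}$ for any $N > 0$.

2. If $\sigma^2 = 0$ then from Chebyshev's inequality, for any $h > 0$,

$$P(|X - \mu| < h) = 1 - P(|X - \mu| \ge h) \ge 1 - \frac{\sigma^2}{h^2} = 1.$$

Hence $P(X = \mu) = \lim_{h \to 0} P(|X - \mu| < h) = 1$. So variance zero implies the random variable takes a single value with probability 1.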

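3. When $\sigma^2 > 0$ Chebyshev's inequality gives a lower bound on the probability that $X$ lies within $k$ standard deviations from the mean. Take $h = k\sigma$. Then

$$P(|X - \mu| < k\sigma) = 1 - P(|X - \mu| \ge k\sigma) \ge 1 - \frac{\sigma^2}{(k\sigma)^2} = 1 - \frac{1}{k^2}.$$

4. When $\sigma = 1$, how large a sample is needed if we want to be at least 95% certain that the sample mean lies within 0.5 of the true mean? We use Chebyshev's inequality for $\bar{X}_n$ with $h = 0.5$. Then

$$P(|\bar{X}_n - \mu| < 0.5) = 1 - P(|\bar{X}_n - \mu| \ge 0.5) \ge 1 - \frac{\sigma^2}{n(0.5)^2} = 1 - \frac{4}{n} \ge 0.95$$

provided $n \ge \dfrac{4}{0.05} = 80$. So we need a minimum sample size of 80.

The Central Limit Theorem

Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables each with finite mean $\mu$ and finite variance $\sigma^2$, and let $\bar{X}_n$ be the sample mean based on $X_1, \ldots, X_n$. Then we can find an approximation for $P(\bar{X}_n \le A)$ when $n$ is large by writing the event for $\bar{X}_n$ in terms of the standardized variable $Z_n = \sqrt{n}(\bar{X}_n - \mu)/\sigma$, i.e.

$$P(\bar{X}_n \le A) = P\left(Z_n \le \frac{\sqrt{n}(A - \mu)}{\sigma}\right),$$

and proving that

$$\lim_{n \to \infty} P(Z_n \le z) = \Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx,$$

which is the c.d.f. of $N(0,1)$. The proof of this result uses the m.g.f. and the following lemma.

Lemma. Let $Z_1, Z_2, \ldots$ be a sequence of random variables. If $\lim_{n \to \infty} M_{Z_n}(t) = M(t)$, which is the m.g.f. of a distribution with c.d.f. $F$, then $\lim_{n \to \infty} F_{Z_n}(z) = F(z)$ at all points $z$ at which $F$ is continuous.

Theorem (The Central Limit Theorem). Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables each with m.g.f. which exists in an open interval about zero (and so is differentiable there), with finite mean, denoted by $\mu$, and finite variance, denoted by $\sigma^2$. Let $Z_n = \sqrt{n}(\bar{X}_n - \mu)/\sigma$. Then $\lim_{n \to \infty} P(Z_n \le z) = \Phi(z)$.

Proof. Let $U_j = (X_j - \mu)/\sigma$ and let $M_U(t)$ be the common m.g.f. Then $M_U(t) = e^{-\mu t/\sigma} M_X(t/\sigma)$ exists in an open interval about $t = 0$, with $M_U(0) = 1$, $M_U'(0) = E[U] = 0$ and $M_U''(0) = E[U^2] = \mathrm{Var}(U) = 1$. So $U_1, U_2, \ldots$ are i.i.d. with mean zero and variance one. Since $Z_n = \sum_{j=1}^{n} U_j / \sqrt{n}$,

$$M_{Z_n}(t) = E\left[e^{t \sum_{j=1}^{n} U_j/\sqrt{n}}\right] = \prod_{j=1}^{n} E\left[e^{t U_j/\sqrt{n}}\right] = \left(M_U(t/\sqrt{n})\right)^n.$$

Taking logs to base $e$ gives $\ln M_{Z_n}(t) = n \ln M_U(t/\sqrt{n})$. Now let $x = 1/\sqrt{n}$ and use L'Hôpital's rule twice. Then

$$\lim_{n \to \infty} n \ln M_U(t/\sqrt{n}) = \lim_{x \to 0} \frac{\ln M_U(xt)}{x^2} = \lim_{x \to 0} \frac{t\,M_U'(xt)/M_U(xt)}{2x} = \lim_{x \to 0} \frac{t^2\left[M_U''(xt) M_U(xt) - (M_U'(xt))^2\right]}{2\,(M_U(xt))^2}$$

$$= \frac{t^2\left[M_U''(0) M_U(0) - (M_U'(0))^2\right]}{2\,(M_U(0))^2} = \frac{t^2}{2}.$$

Hence $\lim_{n \to \infty} \ln M_{Z_n}(t) = t^2/2$ and so $\lim_{n \to \infty} M_{Z_n}(t) = e^{t^2/2}$. Since this is the m.g.f. of a $N(0,1)$ distribution, using the lemma proves that

$$\lim_{n \to \infty} P(Z_n \le z) = \Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx.$$

The bivariate and multivariate normal distribution

An indirect method was used on problem sheet 9 to get you to derive standard results for a bivariate normal distribution. The results are summarised below. They may be proved directly, but this is messy unless you use matrix and vector notation. Once you do this, results can just as easily be obtained for the multivariate normal, so we may as well derive results immediately for the more general case.

Summary of results for the bivariate normal distribution.

1. If $X_1$ and $X_2$ have a bivariate normal distribution then the joint p.d.f. is

$$f_{X_1,X_2}(x_1,x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x_1-\mu_1}{\sigma_1}\right)\left(\frac{x_2-\mu_2}{\sigma_2}\right) + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\right]\right\}$$

for all $x_1, x_2$. The distribution has parameters $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho$. The parameter $\rho$ is restricted so that $-1 < \rho < 1$.

2. The joint m.g.f. is

$$M_{X_1,X_2}(t_1,t_2) = \exp\left\{\mu_1 t_1 + \mu_2 t_2 + \tfrac{1}{2}\left(\sigma_1^2 t_1^2 + 2\rho\sigma_1\sigma_2 t_1 t_2 + \sigma_2^2 t_2^2\right)\right\}.$$

This can be used to identify the parameters and find the marginal distributions: $M_{X_1}(t_1) = M_{X_1,X_2}(t_1, 0) = e^{\mu_1 t_1 + \frac{1}{2}\sigma_1^2 t_1^2}$. Hence $X_1 \sim N(\mu_1, \sigma_1^2)$. Similarly $X_2 \sim N(\mu_2, \sigma_2^2)$. Differentiating the joint m.g.f. in the standard manner shows that $\rho(X_1, X_2) = \rho$.

3. $X_1$ and $X_2$ are independent iff $\rho = 0$. This is easily seen from either the joint p.d.f. or the joint m.g.f.

4. The conditional distribution of $X_2 \mid X_1 = x_1$ is normal with mean linear in $x_1$ and variance which does not depend on $x_1$. A similar result holds for the conditional distribution of $X_1 \mid X_2 = x_2$.

Using vector and matrix notation.

$$\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}; \quad \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}; \quad \mathbf{t} = \begin{pmatrix} t_1 \\ t_2 \end{pmatrix}; \quad \mathbf{m} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}; \quad V = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$$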

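Then $\mathbf{m}$ is the vector of means and $V$ is the variance-covariance matrix. Note that $|V| = \sigma_1^2\sigma_2^2(1-\rho^2)$ and

$$V^{-1} = \frac{1}{1-\rho^2}\begin{pmatrix} \dfrac{1}{\sigma_1^2} & -\dfrac{\rho}{\sigma_1\sigma_2} \\ -\dfrac{\rho}{\sigma_1\sigma_2} & \dfrac{1}{\sigma_2^2} \end{pmatrix}.$$

Hence

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{2/2}|V|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{x}-\mathbf{m})^T V^{-1}(\mathbf{x}-\mathbf{m})} \quad \text{for all } \mathbf{x}.$$

Also $M_{\mathbf{X}}(\mathbf{t}) = e^{\mathbf{t}^T\mathbf{m} + \frac{1}{2}\mathbf{t}^T V \mathbf{t}}$.

The Multivariate Normal Distribution

We again use matrix and vector notation, but now there are $n$ random variables, so that $\mathbf{X}$, $\mathbf{x}$, $\mathbf{t}$ and $\mathbf{m}$ are now $n$-vectors with $i$th entries $X_i$, $x_i$, $t_i$ and $\mu_i$, and $V$ is the $n \times n$ matrix with $ii$th entry $\sigma_i^2$ and $ij$th entry (for $i \ne j$) $\sigma_{ij}$. Note that $V$ is symmetric, so that $V^T = V$.

The joint p.d.f. is

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}|V|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{x}-\mathbf{m})^T V^{-1}(\mathbf{x}-\mathbf{m})} \quad \text{for all } \mathbf{x}.$$

We say that $\mathbf{X} \sim N(\mathbf{m}, V)$.

We can find the joint m.g.f. quite easily.

$$M_{\mathbf{X}}(\mathbf{t}) = E\left[e^{\sum_{j=1}^{n} t_j X_j}\right] = E\left[e^{\mathbf{t}^T\mathbf{X}}\right] = \int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty} \frac{1}{(2\pi)^{n/2}|V|^{1/2}}\, e^{-\frac{1}{2}\left((\mathbf{x}-\mathbf{m})^T V^{-1}(\mathbf{x}-\mathbf{m}) - 2\mathbf{t}^T\mathbf{x}\right)}\,dx_1 \ldots dx_n$$

We do the equivalent of completing the square, i.e. we write

$$(\mathbf{x}-\mathbf{m})^T V^{-1}(\mathbf{x}-\mathbf{m}) - 2\mathbf{t}^T\mathbf{x} = (\mathbf{x}-\mathbf{m}-\mathbf{a})^T V^{-1}(\mathbf{x}-\mathbf{m}-\mathbf{a}) + b$$

for a suitable choice of the $n$-vector $\mathbf{a}$ of constants and a constant $b$. Then, since the remaining integrand is the p.d.f. of $N(\mathbf{m}+\mathbf{a}, V)$ and so integrates to 1,

$$M_{\mathbf{X}}(\mathbf{t}) = e^{-b/2}\int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty} \frac{1}{(2\pi)^{n/2}|V|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{x}-\mathbf{m}-\mathbf{a})^T V^{-1}(\mathbf{x}-\mathbf{m}-\mathbf{a})}\,dx_1 \ldots dx_n = e^{-b/2}.$$

We just need to find $\mathbf{a}$ and $b$. Expanding, we have

$$(\mathbf{x}-\mathbf{m}-\mathbf{a})^T V^{-1}(\mathbf{x}-\mathbf{m}-\mathbf{a}) + b = (\mathbf{x}-\mathbf{m})^T V^{-1}(\mathbf{x}-\mathbf{m}) - 2\mathbf{a}^T V^{-1}(\mathbf{x}-\mathbf{m}) + \mathbf{a}^T V^{-1}\mathbf{a} + b$$

$$= (\mathbf{x}-\mathbf{m})^T V^{-1}(\mathbf{x}-\mathbf{m}) - 2\mathbf{a}^T V^{-1}\mathbf{x} + \left[2\mathbf{a}^T V^{-1}\mathbf{m} + \mathbf{a}^T V^{-1}\mathbf{a} + b\right].$$

This has to equal $(\mathbf{x}-\mathbf{m})^T V^{-1}(\mathbf{x}-\mathbf{m}) - 2\mathbf{t}^T\mathbf{x}$ for all $\mathbf{x}$. Hence we need $\mathbf{a}^T V^{-1} = \mathbf{t}^T$ and $b = -\left[2\mathbf{a}^T V^{-1}\mathbf{m} + \mathbf{a}^T V^{-1}\mathbf{a}\right]$. Hence $\mathbf{a} = V\mathbf{t}$ and $b = -\left[2\mathbf{t}^T\mathbf{m} + \mathbf{t}^T V\mathbf{t}\right]$. Therefore

$$M_{\mathbf{X}}(\mathbf{t}) = e^{-b/2} = e^{\mathbf{t}^T\mathbf{m} + \frac{1}{2}\mathbf{t}^T V\mathbf{t}}.$$

Results obtained using the m.g.f.

1. Any (non-empty) subset of multivariate normals is multivariate normal. Simply put $t_j = 0$ for all $j$ for which $X_j$ is not in the subset. For example,

$$M_{X_1}(t_1) = M_{X_1,\ldots,X_n}(t_1, 0, \ldots, 0) = e^{t_1\mu_1 + t_1^2\sigma_1^2/2}.$$

Hence $X_1 \sim N(\mu_1, \sigma_1^2)$. A similar result holds for $X_i$. This identifies the parameters $\mu_i$ and $\sigma_i^2$ as the mean and variance of $X_i$. Also

$$M_{X_1,X_2}(t_1,t_2) = M_{X_1,\ldots,X_n}(t_1, t_2, 0, \ldots, 0) = e^{t_1\mu_1 + t_2\mu_2 + \frac{1}{2}\left(t_1^2\sigma_1^2 + 2\sigma_{12}t_1 t_2 + \sigma_2^2 t_2^2\right)}.$$

Hence $X_1$ and $X_2$ have a bivariate normal distribution with $\sigma_{12} = \mathrm{Cov}(X_1, X_2)$. A similar result holds for the joint distribution of $X_i$ and $X_j$ for $i \ne j$. This identifies $V$ as the variance-covariance matrix for $X_1, \ldots, X_n$.

2. $\mathbf{X}$ is a vector of independent random variables iff $V$ is diagonal (i.e. all off-diagonal entries are zero, so that $\sigma_{ij} = 0$ for $i \ne j$).

Proof. From (1), if the $X$'s are independent then $\sigma_{ij} = \mathrm{Cov}(X_i, X_j) = 0$ for all $i \ne j$, so that $V$ is diagonal. If $V$ is diagonal then $\mathbf{t}^T V\mathbf{t} = \sum_{j=1}^{n} \sigma_j^2 t_j^2$ and hence

$$M_{\mathbf{X}}(\mathbf{t}) = e^{\mathbf{t}^T\mathbf{m} + \frac{1}{2}\mathbf{t}^T V\mathbf{t}} = \prod_{j=1}^{n} e^{\mu_j t_j + \sigma_j^2 t_j^2/2} = \prod_{j=1}^{n} M_{X_j}(t_j).$$

By the uniqueness of the joint m.g.f., $X_1, \ldots, X_n$ are independent.

3. Linearly independent linear functions of multivariate normal random variables are multivariate normal random variables. If $\mathbf{Y} = A\mathbf{X} + \mathbf{b}$, where $A$ is an $n \times n$ non-singular matrix and $\mathbf{b}$ is a (column) $n$-vector of constants, then $\mathbf{Y} \sim N(A\mathbf{m} + \mathbf{b}, AVA^T)$.

Proof. Use the joint m.g.f.

$$M_{\mathbf{Y}}(\mathbf{t}) = E\left[e^{\mathbf{t}^T\mathbf{Y}}\right] = E\left[e^{\mathbf{t}^T(A\mathbf{X}+\mathbf{b})}\right] = e^{\mathbf{t}^T\mathbf{b}}\, E\left[e^{(A^T\mathbf{t})^T\mathbf{X}}\right] = e^{\mathbf{t}^T\mathbf{b}}\, M_{\mathbf{X}}(A^T\mathbf{t})$$

$$= e^{\mathbf{t}^T\mathbf{b}}\, e^{(A^T\mathbf{t})^T\mathbf{m} + \frac{1}{2}(A^T\mathbf{t})^T V (A^T\mathbf{t})} = e^{\mathbf{t}^T(A\mathbf{m}+\mathbf{b}) + \frac{1}{2}\mathbf{t}^T(AVA^T)\mathbf{t}}.$$

This is just the m.g.f. for the multivariate normal distribution with vector of means $A\mathbf{m} + \mathbf{b}$ and variance-covariance matrix $AVA^T$. Hence, from the uniqueness of the joint m.g.f., $\mathbf{Y} \sim N(A\mathbf{m} + \mathbf{b}, AVA^T)$.

Note that from (1) a subset of the $Y$'s is multivariate normal.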
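As a rough numerical illustration of result 3, the following Python sketch simulates a bivariate normal vector, applies an affine map, and compares the sample mean vector and sample covariance matrix of the result with $A\mathbf{m} + \mathbf{b}$ and $AVA^T$. The function names and the particular values of $A$, $\mathbf{m}$, $V$ and $\mathbf{b}$ are made up for illustration and do not come from the notes. The check covers only the first two moments; it does not test normality itself.

```python
import math
import random


def sample_bivariate_normal(m, V, rng):
    """Draw one point from N(m, V), with V a 2x2 positive definite matrix.

    Uses the Cholesky factor L of V (V = L L^T), so that m + L z has
    covariance V when z is a pair of independent N(0, 1) draws.
    """
    l11 = math.sqrt(V[0][0])
    l21 = V[1][0] / l11
    l22 = math.sqrt(V[1][1] - l21 ** 2)
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    return [m[0] + l11 * z1, m[1] + l21 * z1 + l22 * z2]


def affine_map(A, x, b):
    """Return A x + b for a 2x2 matrix A and 2-vectors x and b."""
    return [A[i][0] * x[0] + A[i][1] * x[1] + b[i] for i in range(2)]


def theoretical_params(A, m, V, b):
    """Mean vector A m + b and covariance matrix A V A^T."""
    mean = affine_map(A, m, b)
    AV = [[sum(A[i][k] * V[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    # (A V A^T)[i][j] = sum_k (A V)[i][k] * A[j][k]
    cov = [[sum(AV[i][k] * A[j][k] for k in range(2)) for j in range(2)]
           for i in range(2)]
    return mean, cov


def empirical_params(samples):
    """Sample mean vector and unbiased sample covariance matrix."""
    n = len(samples)
    mean = [sum(s[i] for s in samples) / n for i in range(2)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
            for j in range(2)] for i in range(2)]
    return mean, cov


def simulate(A, m, V, b, n_samples, seed=0):
    """Simulate Y = A X + b with X ~ N(m, V) and return its sample moments."""
    rng = random.Random(seed)
    ys = [affine_map(A, sample_bivariate_normal(m, V, rng), b)
          for _ in range(n_samples)]
    return empirical_params(ys)


if __name__ == "__main__":
    # Illustrative values only (assumptions, not taken from the notes).
    A = [[1.0, 2.0], [0.0, 3.0]]
    m = [1.0, -2.0]
    V = [[2.0, 0.5], [0.5, 1.0]]
    b = [0.5, -1.0]
    print("theory:   ", theoretical_params(A, m, V, b))
    print("simulated:", simulate(A, m, V, b, n_samples=50_000))
```

With a large number of samples the two printed pairs should agree to within sampling error, which shrinks roughly like $1/\sqrt{n}$.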

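NOTE. The results concerning the vector of means and variance-covariance matrix for linear functions of random variables hold regardless of the joint distribution of $X_1, \ldots, X_n$.

We define the expectation of a vector of random variables $\mathbf{X}$, $E[\mathbf{X}]$, to be the vector of the expectations, and the expectation of a matrix of random variables $Y$, $E[Y]$, to be the matrix of the expectations. Then the variance-covariance matrix of $\mathbf{X}$ is just $E\left[(\mathbf{X} - E[\mathbf{X}])(\mathbf{X} - E[\mathbf{X}])^T\right]$.

The following results are easily obtained:

(i) Let $A$ be an $m \times n$ matrix of constants, $B$ be an $m \times k$ matrix of constants and $Y$ be an $n \times k$ matrix of random variables. Then $E[AY + B] = AE[Y] + B$.

Proof. The $ij$th entry of $E[AY + B]$ is $E\left[\sum_{r=1}^{n} A_{ir}Y_{rj} + B_{ij}\right] = \sum_{r=1}^{n} A_{ir}E[Y_{rj}] + B_{ij}$, which is the $ij$th entry of $AE[Y] + B$. The result is then immediate.

(ii) Let $C$ be a $k \times m$ matrix of constants and $Y$ be an $n \times k$ matrix of random variables. Then $E[YC] = E[Y]C$.

Proof. Just transpose the equation. The result then follows from (i).

Hence if $\mathbf{Z} = A\mathbf{X} + \mathbf{b}$, where $A$ is an $m \times n$ matrix of constants, $\mathbf{b}$ is an $m$-vector of constants and $\mathbf{X}$ is an $n$-vector of random variables with $E[\mathbf{X}] = \mathbf{m}$ and variance-covariance matrix $V$, then

$$E[\mathbf{Z}] = E[A\mathbf{X} + \mathbf{b}] = AE[\mathbf{X}] + \mathbf{b} = A\mathbf{m} + \mathbf{b}.$$

Also the variance-covariance matrix for $\mathbf{Z}$ is just

$$E\left[(\mathbf{Z} - E[\mathbf{Z}])(\mathbf{Z} - E[\mathbf{Z}])^T\right] = E\left[A(\mathbf{X} - \mathbf{m})(\mathbf{X} - \mathbf{m})^T A^T\right] = A\,E\left[(\mathbf{X} - \mathbf{m})(\mathbf{X} - \mathbf{m})^T\right] A^T = AVA^T.$$

Example. Suppose that $E[X_1] = 1$, $E[X_2] = 0$, $\mathrm{Var}(X_1) = 2$, $\mathrm{Var}(X_2) = 4$ and $\mathrm{Cov}(X_1, X_2) = 1$. Let $Y_1 = X_1 + X_2$ and $Y_2 = X_1 + aX_2$. Find the means, variances and covariance, and hence find $a$ so that $Y_1$ and $Y_2$ are uncorrelated.

Writing in vector and matrix notation we have $E[\mathbf{Y}] = A\mathbf{m}$ and the variance-covariance matrix for $\mathbf{Y}$ is just $AVA^T$, where

$$\mathbf{m} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad V = \begin{pmatrix} 2 & 1 \\ 1 & 4 \end{pmatrix}, \quad A = \begin{pmatrix} 1 & 1 \\ 1 & a \end{pmatrix}.$$

Therefore

$$A\mathbf{m} = \begin{pmatrix} 1 & 1 \\ 1 & a \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$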

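$$AVA^T = \begin{pmatrix} 1 & 1 \\ 1 & a \end{pmatrix}\begin{pmatrix} 2 & 1 \\ 1 & 4 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 1 & a \end{pmatrix} = \begin{pmatrix} 8 & 3 + 5a \\ 3 + 5a & 2 + 2a + 4a^2 \end{pmatrix}$$

Hence $Y_1$ and $Y_2$ have means 1 and 1, variances 8 and $2 + 2a + 4a^2$, and covariance $3 + 5a$. They are therefore uncorrelated if $3 + 5a = 0$, i.e. if $a = -\frac{3}{5}$.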
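The arithmetic in this example can be checked exactly with rational numbers. The sketch below is only an illustration: the helper names `mat_mul`, `transpose`, `moments_of_Y` and `uncorrelating_a` are hypothetical, and the solver relies on the fact that $\mathrm{Cov}(Y_1, Y_2)$ is a linear function of $a$, so two evaluations determine its root.

```python
from fractions import Fraction


def mat_mul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*P)]


def moments_of_Y(a):
    """Return (A m, A V A^T) for the example, with Y2 = X1 + a * X2."""
    m = [[Fraction(1)], [Fraction(0)]]            # column vector of means
    V = [[Fraction(2), Fraction(1)],
         [Fraction(1), Fraction(4)]]              # variance-covariance matrix
    A = [[Fraction(1), Fraction(1)],
         [Fraction(1), Fraction(a)]]
    mean = mat_mul(A, m)
    cov = mat_mul(mat_mul(A, V), transpose(A))
    return mean, cov


def uncorrelating_a():
    """Solve Cov(Y1, Y2) = c0 + c1 * a = 0 using two evaluations."""
    c0 = moments_of_Y(0)[1][0][1]
    c1 = moments_of_Y(1)[1][0][1] - c0
    return -c0 / c1


if __name__ == "__main__":
    a = uncorrelating_a()
    print(a)  # -3/5
    mean, cov = moments_of_Y(a)
    print(mean, cov)
```

Using `Fraction` rather than floating point keeps the covariance exactly zero at the solution, which makes the check unambiguous.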