SELECTED PROOFS

De Morgan's formulas: The first one, $\overline{A \cap B} = \bar{A} \cup \bar{B}$, is clear from a Venn diagram, or from the following truth table:

  $A$    $B$    $A \cap B$    $\overline{A \cap B}$    $\bar{A}$    $\bar{B}$    $\bar{A} \cup \bar{B}$
   T      T         T                   F                  F            F                  F
   T      F         F                   T                  F            T                  T
   F      T         F                   T                  T            F                  T
   F      F         F                   T                  T            T                  T

The second one can be derived from the first by changing $A$ to $\bar{A}$ and $B$ to $\bar{B}$, thus: $\overline{\bar{A} \cap \bar{B}} = A \cup B$ (since $\bar{\bar{A}} = A$), and then taking the complement of each side: $\bar{A} \cap \bar{B} = \overline{A \cup B}$.

Product rule:
$$\Pr(A)\Pr(B|A)\Pr(C|A \cap B) = \Pr(A)\cdot\frac{\Pr(A \cap B)}{\Pr(A)}\cdot\frac{\Pr(A \cap B \cap C)}{\Pr(A \cap B)} = \Pr(A \cap B \cap C)$$

Total probability formula:
$$\sum_{\text{all }k} \Pr(B|A_k)\Pr(A_k) = \sum_{\text{all }k} \frac{\Pr(A_k \cap B)}{\Pr(A_k)}\,\Pr(A_k) = \sum_{\text{all }k} \Pr(A_k \cap B)$$
$$= \Pr\Big[\bigcup_{\text{all }k}(A_k \cap B)\Big] \quad \text{(because the } A_k \cap B \text{ are disjoint)}$$
$$= \Pr\Big[\Big(\bigcup_{\text{all }k} A_k\Big) \cap B\Big] \quad \text{(by the distributive law)}$$
$$= \Pr(B) \quad \text{(since } \bigcup_{\text{all }k} A_k = \Omega\text{)}$$

Total mean formula:
$$E(X) = \sum_i i\Pr(X = i) \quad \text{(definition of expected value)}$$
$$= \sum_i i \sum_{\text{all }k}\Pr(X = i\,|\,A_k)\Pr(A_k) \quad \text{(by the previous formula)}$$
$$= \sum_{\text{all }k}\Big[\sum_i i\Pr(X = i\,|\,A_k)\Big]\Pr(A_k) \quad \text{(interchanging the two summations)}$$
$$= \sum_{\text{all }k} E(X\,|\,A_k)\Pr(A_k) \quad \text{(by the definition of conditional expected value)}$$
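The total probability and total mean formulas can be illustrated with a minimal numeric sketch; the partition $A_1, A_2$ and all probabilities below are made-up example values, not taken from the text:

```python
# Numeric check of the total probability and total mean formulas on a
# small, made-up example: a two-set partition A_1, A_2 of the sample space
# and a 0/1-valued X (all numbers are arbitrary illustration choices).
pr_A = {1: 0.3, 2: 0.7}                       # Pr(A_k), a partition of Omega
pr_X_given_A = {1: {0: 0.5, 1: 0.5},          # Pr(X = i | A_k)
                2: {0: 0.2, 1: 0.8}}

# Total probability: Pr(X = 1) = sum over k of Pr(X = 1 | A_k) Pr(A_k)
pr_X1 = sum(pr_X_given_A[k][1] * pr_A[k] for k in pr_A)

# Total mean: E(X) = sum over k of E(X | A_k) Pr(A_k)
cond_means = {k: sum(i * p for i, p in pr_X_given_A[k].items()) for k in pr_A}
EX = sum(cond_means[k] * pr_A[k] for k in pr_A)

print(round(pr_X1, 10))   # 0.71
print(round(EX, 10))      # 0.71 too, since X is 0/1-valued, E(X) = Pr(X = 1)
```

That the two answers coincide here is no accident: for an indicator-type variable, the total mean formula reduces to the total probability formula.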
$\rho$ (or, equivalently, $\rho^2$): For any $\lambda$,
$$E\{[X - \mu_x - \lambda(Y - \mu_y)]^2\} = E[(X-\mu_x)^2] - 2\lambda E[(X-\mu_x)(Y-\mu_y)] + \lambda^2 E[(Y-\mu_y)^2]$$
$$= \text{Var}(X) - 2\lambda\,\text{Cov}(X,Y) + \lambda^2\,\text{Var}(Y) \ge 0$$
(averaging a non-negative quantity yields a non-negative answer). The last expression has its smallest possible value when its $\lambda$-derivative is zero, namely when $-2\,\text{Cov}(X,Y) + 2\lambda\,\text{Var}(Y) = 0$, or
$$\lambda = \frac{\text{Cov}(X,Y)}{\text{Var}(Y)}$$
Substituting this into the same expression yields
$$\text{Var}(X) - 2\,\frac{\text{Cov}(X,Y)}{\text{Var}(Y)}\,\text{Cov}(X,Y) + \frac{\text{Cov}(X,Y)^2}{\text{Var}(Y)^2}\,\text{Var}(Y) = \text{Var}(X) - \frac{\text{Cov}(X,Y)^2}{\text{Var}(Y)} \ge 0$$
This implies
$$\text{Cov}(X,Y)^2 \le \text{Var}(X)\,\text{Var}(Y)$$
or
$$\rho^2 = \frac{\text{Cov}(X,Y)^2}{\text{Var}(X)\,\text{Var}(Y)} \le 1$$

Expected value of a linear combination of RVs:
$$E(aX + bY + c) = \iint (ax + by + c)\,f(x,y)\,dx\,dy$$
$$= a\iint x\,f(x,y)\,dx\,dy + b\iint y\,f(x,y)\,dx\,dy + c = aE(X) + bE(Y) + c$$
(since the integral is a linear operator, which means: a constant can be taken out, and integrating a sum can be done by integrating the terms individually and adding the answers). In the discrete case, we use summation instead of integration (the rest is the same).
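The $\lambda$-minimization argument above can be watched happening on concrete numbers; the data set below is an arbitrary made-up sample, and the population (divide-by-$n$) versions of the variance and covariance are used throughout:

```python
import statistics as st

# Numeric illustration of the lambda argument: the minimized quadratic
# Var(X) - 2*lam*Cov + lam^2*Var(Y) is non-negative, forcing rho^2 <= 1.
# The data are arbitrary made-up values.
x = [1.0, 2.0, 4.0, 7.0]
y = [0.5, 1.5, 3.0, 5.0]

mx, my = st.mean(x), st.mean(y)
var_x = st.pvariance(x)            # population variance, divide by n
var_y = st.pvariance(y)
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

lam = cov / var_y                  # the minimizing lambda from the proof
q = var_x - 2 * lam * cov + lam**2 * var_y   # = Var(X) - Cov^2/Var(Y)
rho2 = cov**2 / (var_x * var_y)

print(q >= 0, rho2 <= 1)           # both True, as the proof guarantees
```

Trying other values of $\lambda$ in place of `lam` only makes `q` larger, which is exactly what the derivative argument claims.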
Variance of a linear combination of RVs:
$$\text{Var}(aX + bY + c) = \iint [(ax + by + c) - (a\mu_x + b\mu_y + c)]^2 f(x,y)\,dx\,dy$$
$$= \iint [a(x-\mu_x) + b(y-\mu_y)]^2 f(x,y)\,dx\,dy$$
$$= \iint [a^2(x-\mu_x)^2 + b^2(y-\mu_y)^2 + 2ab(x-\mu_x)(y-\mu_y)]\,f(x,y)\,dx\,dy$$
$$= a^2\,\text{Var}(X) + b^2\,\text{Var}(Y) + 2ab\,\text{Cov}(X,Y)$$

Properties of MGF: Since
$$M_X(u) = E[e^{uX}] = \int_{\text{all }x} e^{ux} f(x)\,dx$$
then, quite clearly,
$$M_{aX+b}(u) = E[e^{u(aX+b)}] = \int_{\text{all }x} e^{aux+bu} f(x)\,dx = e^{bu}\int_{\text{all }x} e^{aux} f(x)\,dx = e^{bu} M_X(au)$$
When $X$ and $Y$ are independent, then
$$M_{X+Y}(u) = E[e^{u(X+Y)}] = \iint e^{ux+uy} f_X(x)\,f_Y(y)\,dx\,dy$$
$$= \int_{\text{all }x} e^{ux} f_X(x)\,dx \int_{\text{all }y} e^{uy} f_Y(y)\,dy = M_X(u)\,M_Y(u)$$
(the integral is separable). Finally,
$$M_X(u) = E[e^{uX}] = E\Big[1 + uX + \frac{u^2}{2!}X^2 + \frac{u^3}{3!}X^3 + \dots\Big] = 1 + uE[X] + \frac{u^2}{2!}E[X^2] + \frac{u^3}{3!}E[X^3] + \dots$$
which proves that the simple moments are the coefficients of the Taylor expansion of $M_X(u)$.

Properties of PGF: Since
$$P_X(z) = E[z^X] = \sum_i z^i f(i)$$
then, when $X$ and $Y$ are independent, we get
$$P_{X+Y}(z) = E[z^{X+Y}] = \sum_i \sum_j z^{i+j} f_X(i)\,f_Y(j) = \sum_i z^i f_X(i) \sum_j z^j f_Y(j) = P_X(z)\,P_Y(z)$$
Since
$$P_X^{(k)}(z) = \sum_i i(i-1)\dots(i-k+1)\,z^{i-k} f(i)$$
we get
$$P_X^{(k)}(z)\big|_{z=1} = \sum_i i(i-1)\dots(i-k+1)\,f(i) = E[X(X-1)\dots(X-k+1)]$$
i.e. the $k^{\text{th}}$ factorial moment. Similarly,
$$P_X^{(k)}(z)\big|_{z=0} = k!\,f(k)$$

Convolution: Since
$$\Pr(X + Y < v) = \int_{-\infty}^{\infty}\int_{-\infty}^{v-x} f(x,y)\,dy\,dx$$
the pdf of $V = X + Y$ is the $v$-derivative of the above, namely
$$\int_{-\infty}^{\infty} f(x, v-x)\,dx$$
Here, we need to recall that, in general,
$$\frac{d}{dv}\int_{-\infty}^{g(v)} f(y)\,dy = g'(v)\,f[g(v)]$$

Central Limit theorem: We need the MGF of
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \sum_{i=1}^{n} \frac{X_i - \mu}{\sigma\sqrt{n}}$$
The MGF of each $\frac{X_i - \mu}{\sigma\sqrt{n}}$ expands to
$$1 + \frac{u^2}{2n} + \frac{u^3}{3!}\cdot\frac{E[(X-\mu)^3]}{\sigma^3 n^{3/2}} + \dots$$
Raising this to the power of $n$, and taking the $n \to \infty$ limit, yields
$$\Big(1 + \frac{u^2}{2n} + \frac{u^3}{3!}\cdot\frac{E[(X-\mu)^3]}{\sigma^3 n^{3/2}} + \dots\Big)^n \longrightarrow e^{u^2/2}$$
since terms with a higher-than-one power of $n$ in the denominator don't matter in the limit. This is the MGF of the standardized Normal distribution, with pdf equal to
$$f(z) = \frac{e^{-z^2/2}}{\sqrt{2\pi}}$$
(for all real $z$). Verification:
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{uz}\, e^{-z^2/2}\,dz = \frac{e^{u^2/2}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-(z-u)^2/2}\,dz = e^{u^2/2}$$

Composition:
$$E\big[z^{S_N}\big] = \sum_{n=0}^{\infty} E\big[z^{S_N}\,\big|\,N = n\big]\Pr(N = n) = \sum_{n=0}^{\infty} E\big[z^{S_n}\big]\Pr(N = n) = \sum_{n=0}^{\infty} P_X(z)^n \Pr(N = n) = P_N[P_X(z)]$$
where $S_N = \sum_{i=1}^{N} X_i$, $N$ is a RV with PGF given by $P_N(z)$, and the $X_i$ are IID from a distribution with PGF $P_X(z)$.

Binomial distribution: The sample space of the experiment consists of all $n$-letter words made up of two letters, S and F (success and failure); we know there are $2^n$ of them. Each of these has the probability of $p^i(1-p)^{n-i}$, where $p$ is the probability of a single success, and $i$ is the number of S letters the word contains. We also know that there are $\binom{n}{i}$ words with $i$ letters S; the total probability of getting $i$ successes (in any order) is thus
$$\Pr(X = i) = \binom{n}{i} p^i (1-p)^{n-i}$$
where $i$ ranges from $0$ to $n$ inclusive. Note that proving the Binomial Theorem, which states that
$$\sum_{i=0}^{n} \binom{n}{i} A^i B^{n-i} = (A+B)^n$$
for any $A$ and $B$, would be similar: expand $(A+B)(A+B)\dots(A+B)$ using the distributive law, and get a sum of all words consisting of $A$ and $B$, etc.

With the help of the Binomial Theorem, the PGF of $X$ is
$$P(z) = \sum_{i=0}^{n} \binom{n}{i} (pz)^i (1-p)^{n-i} = (1 - p + pz)^n$$
The corresponding mean is
$$\mu = n(1-p+pz)^{n-1} p\,\big|_{z=1} = np$$
the second factorial moment is
$$n(n-1)(1-p+pz)^{n-2} p^2\,\big|_{z=1} = n(n-1)p^2$$
implying, for the variance:
$$\sigma^2 = n(n-1)p^2 + \mu - \mu^2 = np(1-p)$$

Geometric: It is obvious that the probability of $i-1$ failures followed by a success is
$$\Pr(X = i) = p\,q^{i-1}$$
for any positive integer $i$, where $q \equiv 1 - p$. The corresponding PGF is
$$P(z) = \sum_{i=1}^{\infty} p\,q^{i-1} z^i = pz\,(1 + qz + q^2z^2 + q^3z^3 + \dots) = \frac{pz}{1 - qz}$$
since $1 + A + A^2 + A^3 + \dots = \frac{1}{1-A}$ for any $|A| < 1$; to prove that, do the Taylor expansion of $\frac{1}{1-A}$ (as a function of $A$). Expanding the PGF in terms of $z$ at $z = 1$ (Maple is quite good at this) yields
$$\frac{pz}{1-(1-p)z} \simeq 1 + \frac{z-1}{p} + \frac{1-p}{p^2}(z-1)^2 + \dots$$
which implies that the mean is $\frac{1}{p}$ and the variance is
$$\frac{2(1-p)}{p^2} + \frac{1}{p} - \frac{1}{p^2} = \frac{1-p}{p^2}$$

Negative binomial: Since it is a sum of $k$ independent RVs of the geometric type, its mean and variance are $k$ times the previous two results, and
$$P(z) = \Big(\frac{pz}{1-qz}\Big)^k$$
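The binomial and geometric moment formulas just derived are easy to confirm against direct sums over the pmfs; a minimal sketch, with $n$ and $p$ chosen arbitrarily (the geometric sums are truncated, since the tail beyond a thousand terms is negligible here):

```python
import math

# Sanity check of the binomial and geometric moment formulas derived above;
# n and p are arbitrary illustration choices.
n, p = 10, 0.3
q = 1 - p

# Binomial: mean and second factorial moment from direct sums over the pmf
binom = [math.comb(n, i) * p**i * q**(n - i) for i in range(n + 1)]
b_mean = sum(i * binom[i] for i in range(n + 1))
b_fact2 = sum(i * (i - 1) * binom[i] for i in range(n + 1))
b_var = b_fact2 + b_mean - b_mean**2

# Geometric: truncate the infinite sums at N terms (q**N is astronomically small)
N = 1000
geom = [p * q**(i - 1) for i in range(1, N)]
g_mean = sum(i * geom[i - 1] for i in range(1, N))
g_var = sum(i * i * geom[i - 1] for i in range(1, N)) - g_mean**2

print(abs(b_mean - n * p) < 1e-9)           # mean = np
print(abs(b_fact2 - n * (n - 1) * p**2) < 1e-9)   # = n(n-1)p^2
print(abs(b_var - n * p * q) < 1e-9)        # variance = np(1-p)
print(abs(g_mean - 1 / p) < 1e-9)           # mean = 1/p
print(abs(g_var - q / p**2) < 1e-9)         # variance = (1-p)/p^2
```

By the negative binomial argument above, multiplying `g_mean` and `g_var` by $k$ gives that distribution's mean and variance as well.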
To get the $k^{\text{th}}$ success at the $i^{\text{th}}$ trial, the first $i-1$ trials must result in exactly $k-1$ successes (in any order), and the $i^{\text{th}}$ trial must be a success. Thus, we get
$$\Pr(X = i) = \binom{i-1}{k-1} p^{k-1} q^{i-k} \cdot p = \binom{i-1}{k-1} p^k q^{i-k}$$
where $i$ is a positive integer $\ge k$.

Poisson: It can be introduced as a limit of the Binomial distribution, when $n \to \infty$ but the mean is kept constant at $\lambda$ (this implies that $p = \frac{\lambda}{n}$), namely:
$$\Pr(X = i) = \lim_{n\to\infty} \frac{n(n-1)(n-2)\dots(n-i+1)}{n^i}\cdot\frac{\lambda^i}{i!}\Big(1-\frac{\lambda}{n}\Big)^n = \frac{\lambda^i}{i!}\,e^{-\lambda}$$
since $\frac{n}{n}, \frac{n-1}{n}, \frac{n-2}{n}, \dots$ all tend to $1$, and
$$\lim_{n\to\infty}\Big(1 + \frac{a}{n}\Big)^n = e^a$$
for any $a$. Its PGF is
$$P(z) = e^{-\lambda}\sum_{i=0}^{\infty}\frac{(\lambda z)^i}{i!} = e^{-\lambda(1-z)} \simeq 1 + \lambda(z-1) + \frac{\lambda^2(z-1)^2}{2} + \dots$$
implying that the mean is $\lambda$ and the variance is $\lambda^2 + \lambda - \lambda^2 = \lambda$.

Exponential: It can be introduced as a limit of the geometric distribution, when we perform $n$ trials every unit of time, keeping the mean time (of getting the first success) fixed at $\beta$ (this implies that $p = \frac{1}{n\beta}$), thus:
$$F(x) = \Pr(X \le x) = 1 - \lim_{n\to\infty}\Big(1 - \frac{1}{n\beta}\Big)^{nx} = 1 - e^{-x/\beta}$$
for any $x > 0$. This implies that
$$f(x) = F'(x) = \frac{1}{\beta}\,e^{-x/\beta}$$
and the MGF is
$$\frac{1}{\beta}\int_0^{\infty} e^{-x/\beta + xu}\,dx = \frac{1}{\beta}\cdot\frac{e^{-x/\beta+xu}}{u - \frac{1}{\beta}}\bigg|_{x=0}^{\infty} = \frac{1}{1 - \beta u}$$
Expanding in terms of $u$ at $0$ yields
$$\frac{1}{1-\beta u} = 1 + \beta u + \beta^2 u^2 + \dots$$
telling us that the mean is $\beta$ and the variance is $2\beta^2 - \beta^2 = \beta^2$.

The memory-less property
$$\Pr(X - a > x\,|\,X > a) = \Pr(X > x)$$
(where $x$ and $a$ are positive) is verified by:
$$\Pr(X - a > x\,|\,X > a) = \frac{\Pr(X > x + a \,\cap\, X > a)}{\Pr(X > a)} = \frac{\Pr(X > x + a)}{\Pr(X > a)} = \frac{e^{-(x+a)/\beta}}{e^{-a/\beta}} = e^{-x/\beta} = \Pr(X > x)$$

Gamma: Since it is defined as a sum of $k$ independent RVs of the exponential type, its mean and variance are $k\beta$ and $k\beta^2$ respectively, and its MGF is $(1-\beta u)^{-k}$. To derive its pdf, we start with $k = 2$ and do a convolution of two exponentials, thus:
$$\frac{1}{\beta^2}\int_0^y e^{-x/\beta}\, e^{-(y-x)/\beta}\,dx = \frac{y\,e^{-y/\beta}}{\beta^2}$$
Convolution of this and another exponential ($k = 3$ case):
$$\frac{1}{\beta^3}\int_0^y x\,e^{-x/\beta}\, e^{-(y-x)/\beta}\,dx = \frac{y^2 e^{-y/\beta}}{2\beta^3}$$
And one more time ($k = 4$):
$$\frac{1}{2\beta^4}\int_0^y x^2 e^{-x/\beta}\, e^{-(y-x)/\beta}\,dx = \frac{y^3 e^{-y/\beta}}{3!\,\beta^4}$$
which makes it obvious that, in general,
$$f(x) = \frac{x^{k-1} e^{-x/\beta}}{(k-1)!\,\beta^k}$$
for $x > 0$ (zero otherwise). To verify, let's find the corresponding MGF:
$$\frac{1}{(k-1)!\,\beta^k}\int_0^{\infty} x^{k-1} e^{-x/\beta + xu}\,dx = \frac{1}{\beta^k}\Big(\frac{1}{\beta} - u\Big)^{-k} = (1 - \beta u)^{-k}$$
(check!). The distribution function is
$$F(x) = \Pr(X \le x) = 1 - \Big[1 + \frac{x}{\beta} + \frac{x^2}{2\beta^2} + \frac{x^3}{3!\,\beta^3} + \dots + \frac{x^{k-1}}{(k-1)!\,\beta^{k-1}}\Big]\, e^{-x/\beta}$$
To verify this, differentiate $F(x)$ with respect to $x$, and get $f(x)$.

Multinomial: It is a generalization of the binomial distribution, except that instead of two possible outcomes, each trial can have three (or more; our formulas will assume three). We will call them Win, Loss, and Tie. Then, the probability of $i$ wins, $j$ losses and $k$ ties is computed by
$$\Pr(X = i \,\cap\, Y = j \,\cap\, Z = k) = \binom{n}{i,j,k}\, p_x^i\, p_y^j\, p_z^k$$
which can be proven in the same manner as we did for the binomial (the sample space would now consist of all $n$-letter words built out of three letters, ...). The marginal distributions of $X$, $Y$ and $Z$ are (quite obviously) all binomial, so we can easily compute their means and variances. To find $\text{Cov}(X,Y)$, we write
$$X = X_1 + X_2 + \dots + X_n$$
$$Y = Y_1 + Y_2 + \dots + Y_n$$
where $X_1, X_2, \dots$ is the number of wins in Game 1, Game 2, ... (similarly $Y_1, Y_2, \dots$ count the losses). Obviously, each of these $2n$ RVs can have only two values, $0$ or $1$. Now
$$\text{Cov}(X_1 + X_2 + \dots + X_n,\, Y_1 + Y_2 + \dots + Y_n) = \sum_{i,j=1}^{n}\text{Cov}(X_i, Y_j) = \sum_{i=1}^{n}\text{Cov}(X_i, Y_i) + \sum_{i \ne j}\text{Cov}(X_i, Y_j) = n\,\text{Cov}(X_1, Y_1)$$
since, when $i \ne j$, the RVs are independent and have zero covariance. Since $E(X_1 Y_1) = 0$ (the $X_1 Y_1$ product cannot have any value other than $0$, as you cannot have a win and a loss in Game 1 at the same time!),
$$\text{Cov}(X_1, Y_1) = 0 - p_x p_y$$
The final formula thus reads:
$$\text{Cov}(X, Y) = -n\,p_x\,p_y$$
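The covariance formula can be confirmed exactly by enumerating the whole sample space of $n$-letter words, just as the proof describes; a minimal sketch, with $n$ and the three probabilities chosen arbitrarily:

```python
import itertools
import math

# Exact check of Cov(X, Y) = -n p_x p_y for a small multinomial by brute-force
# enumeration of all 3^n outcome words (n and the probabilities are arbitrary).
n = 5
p = {'W': 0.5, 'L': 0.3, 'T': 0.2}   # Win, Loss, Tie probabilities

EX = EY = EXY = 0.0
for word in itertools.product('WLT', repeat=n):
    pr = math.prod(p[c] for c in word)   # probability of this n-letter word
    i, j = word.count('W'), word.count('L')
    EX += i * pr
    EY += j * pr
    EXY += i * j * pr

cov = EXY - EX * EY
print(abs(cov - (-n * p['W'] * p['L'])) < 1e-9)   # True: Cov(X,Y) = -n p_x p_y
```

The negative sign makes intuitive sense: the more wins a series of $n$ games contains, the fewer slots remain for losses.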