MAS113 Introduction to Probability and Statistics

Proofs of theorems

Theorem 1 (De Morgan's Laws). See MAS110.

Theorem 2.

M1. By definition, $B$ and $A \setminus B$ are disjoint, and their union is $A$. So, because $m$ is a measure, $m(A) = m(B) + m(A \setminus B)$; rearranging gives the result. Note that more generally (i.e. not assuming $B \subseteq A$), $m(A \setminus B) = m(A) - m(A \cap B)$, by the same argument.

M2. As $m(A \setminus B) \geq 0$ by the definition of a measure, this follows immediately from M1.

M3. Apply M1 with $B = A$; then $A \setminus A = \emptyset$, so the LHS is $m(\emptyset)$, and the RHS is $m(A) - m(A) = 0$.

M4. We can write $A \cup B = (A \cap B) \cup (A \setminus B) \cup (B \setminus A)$, and the three sets here are disjoint. So, using the definition of a measure,
$$m(A \cup B) = m(A \cap B) + m(A \setminus B) + m(B \setminus A).$$
Applying M1, we get
$$m(A \cup B) = m(A \cap B) + m(A) - m(A \cap B) + m(B) - m(B \cap A),$$
which, simplifying, gives the result. (Note $A \cap B$ and $B \cap A$ are the same.)

M5. See Exercise 4.

M6. See Exercise 4.

Theorem 3 (Law of Total Probability). Because the $E_i$ form a partition, they are disjoint. Hence their intersections with $F$, $F \cap E_i$, are also disjoint. Again because the $E_i$ are a partition, any element of $F$ must be in one of them, so the union of the $F \cap E_i$ for $i = 1, \dots, n$ must be the whole of $F$, and the previous sentence says it is a disjoint union. Hence
$$P(F) = \sum_{i=1}^n P(F \cap E_i).$$
The second form of the statement (which is the more useful one in practice) follows immediately by writing $P(F \cap E_i) = P(E_i)P(F \mid E_i)$ (from the definition of conditional probability).
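As a quick numerical sketch of the Law of Total Probability, the following checks both forms exactly on a small invented example (the sample space and events here are for illustration only, not taken from the notes): a uniformly random integer from 1 to 12, partitioned by residue mod 3.

```python
# Exact check of the Law of Total Probability (Theorem 3) on an
# illustrative example: Omega = {1, ..., 12} with uniform probability,
# partitioned into E_0, E_1, E_2 by residue mod 3, and F = {w <= 5}.
from fractions import Fraction

omega = range(1, 13)
partition = {r: [w for w in omega if w % 3 == r] for r in (0, 1, 2)}
F = {w for w in omega if w <= 5}

def prob(event):
    # probability of an event under the uniform measure on omega
    return Fraction(len(set(event)), len(omega))

# Second form of the theorem: P(F) = sum over i of P(E_i) * P(F | E_i).
total = sum(
    prob(E) * Fraction(len(F & set(E)), len(E))
    for E in partition.values()
)

assert total == prob(F)  # both equal 5/12
```

Using `fractions.Fraction` keeps the arithmetic exact, so the two sides agree with equality rather than up to rounding.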
Theorem 4 (Bayes' Theorem). By the definition of conditional probability, $P(E_i \mid F) = P(E_i \cap F)/P(F)$. However, we also know from the definition of conditional probability that $P(E_i \cap F) = P(E_i)P(F \mid E_i)$. Hence
$$P(E_i \mid F) = \frac{P(E_i)P(F \mid E_i)}{P(F)}.$$

Theorem 5. Before the full proof, consider Example 35 again. Here we have a random variable $X$ with range $R_X = \{-1, 0, 1\}$, and we let $Y = X^2$. Thus $R_Y = \{0, 1\}$. By definition, we have
$$E(Y) = \sum_{y \in R_Y} y P(Y = y) = P(Y = 1)$$
(after a bit of simplification). So we need to consider the event $\{Y = 1\}$. For $Y$ to be $1$ means that either $X = -1$ or $X = 1$, and by the (obvious) disjointness of the two possibilities, $P(Y = 1) = P(X = -1) + P(X = 1)$, so we can say that $E(Y) = P(X = -1) + P(X = 1)$.

Now consider the general case, and let $Y = g(X)$. Then, by definition,
$$E(Y) = \sum_{y \in R_Y} y p_Y(y) = \sum_{y \in R_Y} y P(Y = y).$$
In the example above, we split the event $\{Y = 1\}$ up into events in terms of $X$ which give $Y = 1$. More generally, the event $\{Y = y\}$ is the disjoint union of the events $\{X = x\}$ for each $x \in R_X$ such that $g(x) = y$. (If $g$ is injective, there will be only one event in the union.) So
$$E(Y) = \sum_{y \in R_Y} y P(Y = y) = \sum_{y \in R_Y} y \sum_{\substack{x \in R_X \\ g(x) = y}} P(X = x) = \sum_{y \in R_Y} \sum_{\substack{x \in R_X \\ g(x) = y}} y\, p_X(x) = \sum_{y \in R_Y} \sum_{\substack{x \in R_X \\ g(x) = y}} g(x)\, p_X(x),$$
and the double sum here is equivalent to a single sum over $x \in R_X$, giving the result.

Theorem 6. This is a special case of Theorem 8: in the notation of that theorem, let $a = 1$ and $b = -E(X)$.
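The two computations of $E(Y)$ that Theorem 5 equates can be carried out side by side. Below is a small sketch on the Example 35 setup $Y = X^2$ with $R_X = \{-1, 0, 1\}$; the particular pmf values are invented for illustration, since the notes do not fix them here.

```python
# Sketch of Theorem 5: E(g(X)) computed via the pmf of Y = g(X), and
# directly as sum of g(x) p_X(x). The pmf below is an arbitrary choice.
from fractions import Fraction

p_X = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}
g = lambda x: x * x  # Y = X^2, as in Example 35

# Way 1: build p_Y by pooling the x-values with the same g(x),
# i.e. the disjoint-union step in the proof.
p_Y = {}
for x, p in p_X.items():
    p_Y[g(x)] = p_Y.get(g(x), 0) + p
E_Y_direct = sum(y * p for y, p in p_Y.items())

# Way 2: the formula of Theorem 5, never constructing p_Y at all.
E_Y_formula = sum(g(x) * p for x, p in p_X.items())

assert E_Y_direct == E_Y_formula == Fraction(1, 2)
```

The pooling loop is exactly the "disjoint union of events $\{X = x\}$ with $g(x) = y$" step of the proof, made concrete.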
Theorem 7. By definition and Theorem 5,
$$\mathrm{Var}(X) = E((X - E(X))^2) = \sum_x (x - E(X))^2 p_X(x).$$
Expanding the brackets, we have
$$\mathrm{Var}(X) = \sum_x x^2 p_X(x) - 2E(X) \sum_x x\, p_X(x) + E(X)^2 \sum_x p_X(x).$$
Note that here we have used the fact that $2E(X)$ and $E(X)^2$ are constants which do not depend on $x$, so can be taken outside the sum. Then $\sum_x x\, p_X(x) = E(X)$, by definition, and $\sum_x p_X(x) = 1$ as $p_X$ is a probability mass function, so we get
$$\mathrm{Var}(X) = E(X^2) - 2E(X)E(X) + E(X)^2 = E(X^2) - E(X)^2,$$
as required.

Theorem 8, mean part. By Theorem 5,
$$E(aX + b) = \sum_x (ax + b)\, p_X(x) = a \sum_x x\, p_X(x) + b \sum_x p_X(x) = aE(X) + b,$$
again using $\sum_x x\, p_X(x) = E(X)$ and $\sum_x p_X(x) = 1$. Hence we have the result.

Theorem 8, variance part. By definition,
$$\mathrm{Var}(aX + b) = E((aX + b - E(aX + b))^2).$$
By the mean part, we get
$$\mathrm{Var}(aX + b) = E((aX + b - (aE(X) + b))^2) = E((a(X - E(X)))^2) = E(a^2 (X - E(X))^2) = a^2\, \mathrm{Var}(X).$$
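These identities can be verified exactly on any finite pmf. The sketch below uses an arbitrary three-point distribution (the values are made up for illustration) and checks Theorem 7 and both parts of Theorem 8 with exact rational arithmetic.

```python
# Exact check of Theorems 7 and 8 on an arbitrary small pmf:
# Var(X) = E(X^2) - E(X)^2, E(aX+b) = aE(X)+b, Var(aX+b) = a^2 Var(X).
from fractions import Fraction

p_X = {0: Fraction(1, 6), 2: Fraction(1, 2), 5: Fraction(1, 3)}
assert sum(p_X.values()) == 1  # valid pmf

def E(g):
    # expectation of g(X), via Theorem 5
    return sum(g(x) * p for x, p in p_X.items())

mean = E(lambda x: x)
var_def = E(lambda x: (x - mean) ** 2)         # definitional variance
var_identity = E(lambda x: x * x) - mean ** 2  # Theorem 7

a, b = Fraction(3), Fraction(-4)               # arbitrary constants
mean_lin = E(lambda x: a * x + b)              # Theorem 8, mean part
var_lin = E(lambda x: (a * x + b - mean_lin) ** 2)

assert var_def == var_identity
assert mean_lin == a * mean + b
assert var_lin == a * a * var_def              # Theorem 8, variance part
```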
Theorem 9. The definition of expectation gives
$$E(X + Y) = \sum_{z \in R_{X+Y}} z P(X + Y = z).$$
Now, if $z \in R_{X+Y}$ we can write $z = x + y$ where $x \in R_X$ and $y \in R_Y$. Hence we can replace the sum over $z$ by a sum over $x$ and $y$:
$$E(X + Y) = \sum_{x \in R_X} \sum_{y \in R_Y} (x + y) P(X = x, Y = y).$$
Split the sum up:
$$E(X + Y) = \sum_{x \in R_X} \sum_{y \in R_Y} x P(X = x, Y = y) + \sum_{x \in R_X} \sum_{y \in R_Y} y P(X = x, Y = y) = \sum_{x \in R_X} x \sum_{y \in R_Y} P(X = x, Y = y) + \sum_{y \in R_Y} y \sum_{x \in R_X} P(X = x, Y = y).$$
(If $R_X$ or $R_Y$ is infinite, you'll need to take on trust that the reversal of the order of summation is OK here.) Now $\sum_{y \in R_Y} P(X = x, Y = y) = P(X = x)$, and similarly $\sum_{x \in R_X} P(X = x, Y = y) = P(Y = y)$. So we get
$$E(X + Y) = \sum_{x \in R_X} x P(X = x) + \sum_{y \in R_Y} y P(Y = y) = E(X) + E(Y).$$

Theorem 10. This is similar to Theorem 9. Start with
$$E(XY) = \sum_{z \in R_{XY}} z P(XY = z) = \sum_{x \in R_X} \sum_{y \in R_Y} (xy) P(X = x, Y = y).$$
By independence, $P(X = x, Y = y) = P(X = x)P(Y = y)$, so we get
$$E(XY) = \sum_{x \in R_X} \sum_{y \in R_Y} (xy) P(X = x)P(Y = y).$$
Now, with respect to $y$, we can regard $x$ and $P(X = x)$ as constants, so we take them out of the sum with respect to $y$, and get
$$E(XY) = \sum_{x \in R_X} x P(X = x) \sum_{y \in R_Y} y P(Y = y),$$
which immediately gives $E(XY) = E(X)E(Y)$.

Corollary 11. Start with the variance identity (Theorem 7):
$$\mathrm{Var}(X + Y) = E((X + Y)^2) - (E(X + Y))^2.$$
Use Theorems 8, 9 and 10 to get
$$\mathrm{Var}(X + Y) = E(X^2 + 2XY + Y^2) - (E(X))^2 - (E(Y))^2 - 2E(X)E(Y) = E(X^2) + E(Y^2) + 2E(XY) - (E(X))^2 - (E(Y))^2 - 2E(X)E(Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2(E(XY) - E(X)E(Y)),$$
and by Theorem 10, $E(XY) - E(X)E(Y) = 0$, giving the result.

Theorem 12. This is an easy exercise with the definitions of mean and variance: $E(X) = 0 \cdot (1 - p) + 1 \cdot p = p$, and $E(X^2)$ is also $p$ (since $X$ only takes values 0 and 1, $X$ and $X^2$ are actually the same). Hence
$$\mathrm{Var}(X) = E(X^2) - (E(X))^2 = p - p^2 = p(1 - p).$$

Theorem 13. Use the fact that $X = \sum_{i=1}^n Z_i$, where $Z_i = 1$ if trial $i$ is a success and $Z_i = 0$ if it is a failure. By assumption, the $Z_i$ are independent. Thus Theorems 9 and 12 tell us
$$E(X) = E\left(\sum_{i=1}^n Z_i\right) = \sum_{i=1}^n E(Z_i) = np,$$
and Corollary 11 and Theorem 12 give
$$\mathrm{Var}(X) = \mathrm{Var}\left(\sum_{i=1}^n Z_i\right) = \sum_{i=1}^n \mathrm{Var}(Z_i) = np(1 - p).$$

Theorem 14. We are looking for
$$\lim_{n \to \infty} \binom{n}{x} \left(\frac{\lambda}{n}\right)^x \left(1 - \frac{\lambda}{n}\right)^{n - x},$$
which, writing $\binom{n}{x} = \frac{n!}{x!(n - x)!}$ and factoring out terms which do not depend on $n$, becomes
$$\frac{\lambda^x}{x!} \lim_{n \to \infty} \frac{n(n-1)(n-2) \cdots (n - x + 1)}{n^x} \left(1 - \frac{\lambda}{n}\right)^n \left(1 - \frac{\lambda}{n}\right)^{-x}.$$
Now,
$$\lim_{n \to \infty} \frac{n(n-1) \cdots (n - x + 1)}{n^x} = \lim_{n \to \infty} \frac{n}{n} \cdot \lim_{n \to \infty} \frac{n - 1}{n} \cdots \lim_{n \to \infty} \frac{n - x + 1}{n} = 1,$$
and $\lim_{n \to \infty} (1 - \lambda/n)^{-x}$ is also 1. So we are left with
$$\frac{\lambda^x}{x!} \lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^n.$$
By Note 6.38 in MAS110, $\lim_{n \to \infty} (1 - \lambda/n)^n = e^{-\lambda}$, so we are left with
$$\frac{e^{-\lambda} \lambda^x}{x!},$$
as required.

Theorem 15: valid pmf. As $p_X(x) \geq 0$, we just need to check $\sum_{x=0}^\infty p_X(x) = 1$. Checking, we have
$$\sum_{x=0}^\infty p_X(x) = \sum_{x=0}^\infty \frac{e^{-\lambda} \lambda^x}{x!} = e^{-\lambda} \sum_{x=0}^\infty \frac{\lambda^x}{x!} = e^{-\lambda} e^{\lambda} = 1,$$
recognising the sum as the series expansion of the exponential function.

Theorem 15: mean and variance. We have
$$E(X) = \sum_{x=0}^\infty x\, \frac{e^{-\lambda} \lambda^x}{x!} = e^{-\lambda} \sum_{x=1}^\infty \frac{\lambda^x}{(x-1)!}$$
(using $x! = x(x-1)!$). Changing variables to $y = x - 1$, we get
$$E(X) = e^{-\lambda} \sum_{y=0}^\infty \frac{\lambda^{y+1}}{y!} = \lambda e^{-\lambda} \sum_{y=0}^\infty \frac{\lambda^y}{y!} = \lambda e^{-\lambda} e^{\lambda} = \lambda.$$
For the variance, we have $E(X^2) = \lambda^2 + \lambda$ (see Exercise 36) and thus $\mathrm{Var}(X) = (\lambda^2 + \lambda) - \lambda^2 = \lambda$.
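The Poisson limit and the pmf properties above are easy to see numerically. The following sketch (with $\lambda = 2$ and $x = 3$ as arbitrary choices) compares the Bin$(n, \lambda/n)$ pmf for large $n$ against the Poisson pmf, then checks that a well-truncated Poisson pmf sums to 1 with mean $\lambda$.

```python
# Numerical illustration of Theorem 14 (Poisson limit of the binomial)
# and Theorem 15. lam and x are arbitrary illustrative choices.
import math

lam, x = 2.0, 3

def binom_pmf(n, p, k):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

poisson = math.exp(-lam) * lam**x / math.factorial(x)
approx = binom_pmf(10_000, lam / 10_000, x)  # n large, p = lam/n
assert abs(approx - poisson) < 1e-3          # Theorem 14

# Theorem 15: the pmf sums to 1 and the mean is lam (the tail beyond
# 60 terms is numerically negligible for lam = 2).
pmf = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(60)]
assert abs(sum(pmf) - 1.0) < 1e-12
assert abs(sum(k * q for k, q in enumerate(pmf)) - lam) < 1e-12
```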
Theorem 16. For $x \in \mathbb{N}$,
$$F_X(x) = \sum_{a=1}^x P(X = a) = \sum_{a=1}^x (1 - p)^{a-1} p = p \cdot \frac{1 - (1-p)^x}{1 - (1 - p)}$$
(a geometric series with $x$ terms, first term $p$ and common ratio $1 - p$), and that simplifies to $1 - (1 - p)^x$, as required.

Theorem 17. By the definition of mean,
$$E(X) = \sum_{x=1}^\infty x (1 - p)^{x-1} p.$$
The Binomial Theorem (negative integer case) tells us that for $|\theta| < 1$,
$$(1 - \theta)^{-2} = \sum_{n=0}^\infty (n + 1)\theta^n = \sum_{m=1}^\infty m \theta^{m-1},$$
which you can also obtain by differentiating term by term the formula for the sum of an infinite geometric series. Using this with $\theta = 1 - p$ gives
$$\sum_{x=1}^\infty x(1-p)^{x-1} p = p \sum_{x=1}^\infty x(1-p)^{x-1} = p(1 - (1 - p))^{-2} = \frac{1}{p}.$$
For the variance, we start by finding
$$E(X(X-1)) = \sum_{x=1}^\infty x(x-1)(1-p)^{x-1} p = \sum_{x=2}^\infty x(x-1)(1-p)^{x-1} p,$$
as the $x = 1$ term is zero. Again the Binomial Theorem (or term-by-term differentiation) says
$$(1 - \theta)^{-3} = \sum_{m=2}^\infty \frac{m(m-1)}{2} \theta^{m-2},$$
and thus
$$\sum_{x=2}^\infty x(x-1)(1-p)^{x-1} p = 2p(1-p) \sum_{x=2}^\infty \frac{x(x-1)}{2}(1-p)^{x-2} = 2p(1-p)(1 - (1-p))^{-3} = \frac{2(1-p)}{p^2}.$$
Now, $E(X^2) = E(X(X-1) + X) = \frac{2(1-p)}{p^2} + \frac{1}{p}$, and
$$\mathrm{Var}(X) = E(X^2) - (E(X))^2 = \frac{2(1-p)}{p^2} + \frac{1}{p} - \frac{1}{p^2} = \frac{1-p}{p^2}.$$
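Both geometric results can be checked numerically. In this sketch $p = 0.3$ is an arbitrary choice, and the infinite sums are truncated far enough that the tail is negligible.

```python
# Check of Theorems 16 and 17 for the geometric distribution:
# cdf 1 - (1-p)^x, mean 1/p, variance (1-p)/p^2. p is arbitrary.
p = 0.3
pmf = lambda x: (1 - p) ** (x - 1) * p  # support x = 1, 2, 3, ...

# Theorem 16: F_X(x) = 1 - (1-p)^x for several values of x.
for x in (1, 2, 5, 10):
    cdf = sum(pmf(a) for a in range(1, x + 1))
    assert abs(cdf - (1 - (1 - p) ** x)) < 1e-12

# Theorem 17: E(X) = 1/p and Var(X) = (1-p)/p^2, by truncated series
# (the tail beyond x = 2000 is numerically negligible for p = 0.3).
xs = range(1, 2000)
mean = sum(x * pmf(x) for x in xs)
second = sum(x * x * pmf(x) for x in xs)
assert abs(mean - 1 / p) < 1e-9
assert abs(second - mean**2 - (1 - p) / p**2) < 1e-9
```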
Theorem 18. This is really just a repeat of Corollary 11, but without the last line, which uses the independence assumption. Start with the variance identity (Theorem 7):
$$\mathrm{Var}(X + Y) = E((X + Y)^2) - (E(X + Y))^2.$$
Use Theorem 8 to get
$$\mathrm{Var}(X + Y) = E(X^2 + 2XY + Y^2) - (E(X))^2 - (E(Y))^2 - 2E(X)E(Y) = E(X^2) + E(Y^2) + 2E(XY) - (E(X))^2 - (E(Y))^2 - 2E(X)E(Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2(E(XY) - E(X)E(Y)) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y).$$

Theorem 19.

1. By definition, $\mathrm{Cov}(X, X) = E((X - E(X))(X - E(X))) = E((X - E(X))^2)$, which is the definition of the variance.

2. We have
$$\mathrm{Cov}(aX + b, cY + d) = E((aX + b - (aE(X) + b))(cY + d - (cE(Y) + d))) = E(ac(X - E(X))(Y - E(Y))) = ac\,\mathrm{Cov}(X, Y),$$
using the definition of covariance and Theorem 8.

3. That $\mathrm{Cov}(X, Y) = 0$ if $X$ and $Y$ are independent follows from Theorem 10. That the converse does not necessarily hold is shown by Example 46.

4. We have $\mathrm{Cor}(X, X) = \frac{\mathrm{Cov}(X, X)}{\sqrt{\mathrm{Var}(X)\mathrm{Var}(X)}} = \frac{\mathrm{Var}(X)}{\mathrm{Var}(X)} = 1$. For $\mathrm{Cor}(X, -X)$, note that 2 above shows that $\mathrm{Cov}(X, -X) = -\mathrm{Var}(X)$, from which $\mathrm{Cor}(X, -X) = -1$ follows immediately.

5. Let $\sigma_X^2 = \mathrm{Var}(X)$, $\sigma_Y^2 = \mathrm{Var}(Y)$ and $c = \mathrm{Cov}(X, Y)$. Consider $\mathrm{Var}\left(X - \frac{c}{\sigma_Y^2} Y\right)$, and note that because this is a variance it must be non-negative. By Theorem 18, and also using Theorem 8 and
2 above, we get
$$\mathrm{Var}\left(X - \frac{c}{\sigma_Y^2} Y\right) = \mathrm{Var}(X) + \frac{c^2}{\sigma_Y^4}\mathrm{Var}(Y) - \frac{2c}{\sigma_Y^2}\mathrm{Cov}(X, Y) = \sigma_X^2 + \frac{c^2}{\sigma_Y^2} - \frac{2c^2}{\sigma_Y^2} = \sigma_X^2 - \frac{c^2}{\sigma_Y^2}.$$
Thus
$$\sigma_X^2 - \frac{c^2}{\sigma_Y^2} \geq 0,$$
and dividing through by $\sigma_X^2$ we get
$$\frac{\mathrm{Cov}(X, Y)^2}{\sigma_X^2 \sigma_Y^2} = \frac{c^2}{\sigma_X^2 \sigma_Y^2} \leq 1,$$
from which the result follows.

6. By 2 above, $\mathrm{Cov}(X, a + bX) = b\,\mathrm{Var}(X)$, and by Theorem 8, $\mathrm{Var}(a + bX) = b^2\,\mathrm{Var}(X)$. So we get
$$\mathrm{Cor}(X, a + bX) = \frac{b\,\mathrm{Var}(X)}{\sqrt{\mathrm{Var}(X) \cdot b^2\,\mathrm{Var}(X)}},$$
which gives the result, remembering that $\sqrt{b^2}$ is $b$ if $b > 0$ and $-b$ if $b < 0$.

Theorem 20. We want $P(X_2 = x_2, X_3 = x_3, \dots, X_k = x_k \mid X_1 = x_1)$, which by the definition of conditional probability is
$$\frac{P(X_2 = x_2, X_3 = x_3, \dots, X_k = x_k, X_1 = x_1)}{P(X_1 = x_1)}.$$
By the formulae for the multinomial and binomial distributions, this becomes
$$\frac{\dfrac{n!}{x_1! x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}}{\dfrac{n!}{x_1!(n - x_1)!}\, p_1^{x_1} (1 - p_1)^{n - x_1}}$$
and various terms cancel, giving
$$\frac{(n - x_1)!\, p_2^{x_2} p_3^{x_3} \cdots p_k^{x_k}}{x_2! \cdots x_k!\, (1 - p_1)^{n - x_1}},$$
which is the same as
$$\frac{(n - x_1)!}{x_2! \cdots x_k!} \left(\frac{p_2}{1 - p_1}\right)^{x_2} \left(\frac{p_3}{1 - p_1}\right)^{x_3} \cdots \left(\frac{p_k}{1 - p_1}\right)^{x_k}$$
(since $x_2 + \cdots + x_k = n - x_1$, the factor $(1 - p_1)^{n - x_1}$ distributes across the terms), giving the result.

Theorem 21. For $x \geq 0$,
$$F_X(x) = P(X \leq x) = \int_0^x \lambda e^{-\lambda t}\, dt = 1 - e^{-\lambda x}.$$
Note that if $x < 0$, $P(X \leq x) = 0$ as $X$ cannot be negative, so in full
$$F_X(x) = \begin{cases} 1 - e^{-\lambda x} & x \geq 0 \\ 0 & x < 0. \end{cases}$$

Theorem 22. We have
$$E(X) = \int_0^\infty \lambda x e^{-\lambda x}\, dx.$$
Integration by parts gives
$$\lambda \left( \left[ -\frac{1}{\lambda} x e^{-\lambda x} \right]_0^\infty + \frac{1}{\lambda} \int_0^\infty e^{-\lambda x}\, dx \right).$$
As $x e^{-\lambda x} \to 0$ as $x \to \infty$, we get $\int_0^\infty e^{-\lambda x}\, dx$, which gives $1/\lambda$. For the variance see Exercise 51.

Theorem 23. By the definition of conditional probability, the left-hand side is
$$\frac{P(\{X > x + a\} \cap \{X > a\})}{P(X > a)}.$$
However $\{X > x + a\} \cap \{X > a\} = \{X > x + a\}$, so we get
$$\frac{P(X > x + a)}{P(X > a)} = \frac{e^{-\lambda(x + a)}}{e^{-\lambda a}} = e^{-\lambda x} = P(X > x),$$
where we have used Theorem 21 and the fact that it implies $P(X > x) = e^{-\lambda x}$ for all $x > 0$.

Theorem 24. For $x \in [a, b]$,
$$F_X(x) = \int_a^x \frac{1}{b - a}\, dt = \frac{x - a}{b - a}.$$
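The exponential results above lend themselves to a quick numerical sketch: integrating the pdf should recover the cdf of Theorem 21, and the survival function should exhibit the memoryless property of Theorem 23. The rate $\lambda = 1.5$ and the evaluation points are arbitrary choices.

```python
# Numerical sketch of Theorems 21 and 23 for the exponential
# distribution with an arbitrary rate lam.
import math

lam = 1.5
pdf = lambda t: lam * math.exp(-lam * t)

def cdf_by_integration(x, n=200_000):
    # crude midpoint rule on [0, x]; accurate to well under 1e-6 here
    h = x / n
    return h * sum(pdf((i + 0.5) * h) for i in range(n))

# Theorem 21: the integral of the pdf matches 1 - e^{-lam x}.
for x in (0.5, 1.0, 3.0):
    assert abs(cdf_by_integration(x) - (1 - math.exp(-lam * x))) < 1e-6

# Theorem 23: memorylessness, P(X > x+a | X > a) = P(X > x), via the
# survival function S(x) = e^{-lam x}.
S = lambda x: math.exp(-lam * x)
x, a = 0.7, 2.3
assert abs(S(x + a) / S(a) - S(x)) < 1e-12
```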
Theorem 25. We have
$$E(X) = \int_a^b x \cdot \frac{1}{b - a}\, dx = \left[ \frac{x^2}{2(b - a)} \right]_a^b = \frac{b^2 - a^2}{2(b - a)} = \frac{b + a}{2}.$$
For the variance, first find
$$E(X^2) = \int_a^b x^2 \cdot \frac{1}{b - a}\, dx = \left[ \frac{x^3}{3(b - a)} \right]_a^b = \frac{b^3 - a^3}{3(b - a)} = \frac{b^2 + ab + a^2}{3}.$$
Then
$$\mathrm{Var}(X) = \frac{b^2 + ab + a^2}{3} - \frac{(a + b)^2}{4} = \frac{b^2 + a^2 - 2ab}{12} = \frac{(b - a)^2}{12}.$$

Theorem 26. By definition,
$$\Phi(-z) = \int_{-\infty}^{-z} \phi(t)\, dt.$$
Change variables to $s = -t$ and use the symmetry of $\phi$ to get
$$\Phi(-z) = \int_z^\infty \phi(-s)\, ds = \int_z^\infty \phi(s)\, ds,$$
which is $1 - \Phi(z)$, as required.

Theorem 27. For the expectation,
$$E(Z) = \int_{-\infty}^\infty z \phi(z)\, dz = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty z e^{-z^2/2}\, dz.$$
Considering the improper integral as a limit, this is
$$\frac{1}{\sqrt{2\pi}} \lim_{s, t \to \infty} \int_{-s}^t z e^{-z^2/2}\, dz,$$
which becomes
$$\frac{1}{\sqrt{2\pi}} \lim_{s, t \to \infty} \left[ -e^{-z^2/2} \right]_{-s}^t = \frac{1}{\sqrt{2\pi}} \lim_{s, t \to \infty} \left( e^{-s^2/2} - e^{-t^2/2} \right) = 0.$$
For the variance, we need to calculate
$$E(Z^2) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty z^2 e^{-z^2/2}\, dz.$$
Writing $z^2 e^{-z^2/2}$ as $z \cdot z e^{-z^2/2}$ and integrating by parts, we get
$$E(Z^2) = \frac{1}{\sqrt{2\pi}} \left( \left[ -z e^{-z^2/2} \right]_{-\infty}^\infty + \int_{-\infty}^\infty e^{-z^2/2}\, dz \right).$$
The integral is just the integral of the Normal pdf again (once the factor $\frac{1}{\sqrt{2\pi}}$ is included), so is 1, and $z e^{-z^2/2} \to 0$ both as $z \to \infty$ and as $z \to -\infty$. Hence we get $E(Z^2) = 1$, and so $\mathrm{Var}(Z) = E(Z^2) - E(Z)^2 = 1$, as $E(Z) = 0$.

Theorem 28. If $X = \mu + \sigma Z$, consider the (cumulative) distribution function of $X$:
$$F_X(x) = P(X \leq x) = P(\mu + \sigma Z \leq x) = P\left(Z \leq \frac{x - \mu}{\sigma}\right) = \Phi\left(\frac{x - \mu}{\sigma}\right).$$
To get the probability density function of $X$, differentiate, using the chain rule:
$$f_X(x) = F_X'(x) = \frac{1}{\sigma}\, \phi\left(\frac{x - \mu}{\sigma}\right).$$
That $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$ follows from Theorem 8.
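The standardisation identities of Theorem 28 can be confirmed with the standard library's `statistics.NormalDist`. In this sketch $\mu = 3$ and $\sigma = 2$ are arbitrary choices.

```python
# Check of Theorem 28: for X = mu + sigma*Z, the cdf of X is
# Phi((x-mu)/sigma) and the pdf is (1/sigma) phi((x-mu)/sigma).
from statistics import NormalDist

mu, sigma = 3.0, 2.0   # arbitrary illustrative parameters
Z = NormalDist(0.0, 1.0)  # standard Normal
X = NormalDist(mu, sigma)

for x in (-1.0, 2.5, 3.0, 7.0):
    z = (x - mu) / sigma
    assert abs(X.cdf(x) - Z.cdf(z)) < 1e-12       # F_X(x) = Phi(z)
    assert abs(X.pdf(x) - Z.pdf(z) / sigma) < 1e-12  # f_X(x) = phi(z)/sigma
```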