MAS113 Introduction to Probability and Statistics: Proofs of theorems

Theorem 1 (De Morgan's Laws). See MAS110.

Theorem 2.

M1: By definition, $B$ and $A \setminus B$ are disjoint, and their union is $A$. So, because $m$ is a measure, $m(A) = m(B) + m(A \setminus B)$; rearranging gives the result. (Note that more generally, i.e. not assuming $B \subseteq A$, we have $m(A \setminus B) = m(A) - m(A \cap B)$, by the same argument.)

M2: As $m(A \setminus B) \geq 0$ by the definition of a measure, this follows immediately from M1.

M3: Apply M1 with $B = A$; then $A \setminus A = \emptyset$, so the LHS is $m(\emptyset)$, and the RHS is $m(A) - m(A) = 0$.

M4: We can write $A \cup B = A \cup (B \setminus A)$, and the two sets here are disjoint. So, using the definition of measure, $m(A \cup B) = m(A) + m(B \setminus A)$. Applying M1, we get $m(A) + m(B) - m(A \cap B)$, which gives the result. (Note that $A \cap B$ and $B \cap A$ are the same.)

M5: See Exercise 4.

M6: See Exercise 4.

Theorem 3 (Law of Total Probability). Because the $E_i$ form a partition, they are disjoint. Hence their intersections with $F$, namely $F \cap E_i$, are also disjoint. Again because the $E_i$ are a partition, any element of $F$ must be in one of them, so the union of the $F \cap E_i$ for $i = 1, \dots, n$ must be the whole of $F$, and the previous sentence says it is a disjoint union. Hence
$$P(F) = \sum_{i=1}^{n} P(F \cap E_i).$$
The second form of the statement (which is the more useful one in practice) follows immediately by writing $P(F \cap E_i) = P(E_i)P(F \mid E_i)$ (from the definition of conditional probability).
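The Law of Total Probability is easy to check numerically. The sketch below uses an invented three-event partition with made-up probabilities, purely for illustration:

```python
# Numerical check of the Law of Total Probability:
# P(F) = sum_i P(E_i) * P(F | E_i) for a partition E_1, ..., E_n.
# The probabilities below are invented for illustration.

# P(E_i): the partition events' probabilities (they sum to 1)
p_E = [0.5, 0.3, 0.2]

# P(F | E_i): conditional probability of F given each partition event
p_F_given_E = [0.1, 0.2, 0.4]

# Second form of the theorem: P(F) = sum of P(E_i) P(F | E_i)
p_F = sum(pe * pf for pe, pf in zip(p_E, p_F_given_E))

print(p_F)  # 0.5*0.1 + 0.3*0.2 + 0.2*0.4, i.e. approximately 0.19
```

Each term $P(E_i)P(F \mid E_i)$ is exactly one of the $P(F \cap E_i)$ in the disjoint union, so the sum recovers $P(F)$.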
Theorem 4 (Bayes' Theorem). By the definition of conditional probability, $P(E_i \mid F) = P(E_i \cap F)/P(F)$. However, we also know from the definition of conditional probability that $P(E_i \cap F) = P(E_i)P(F \mid E_i)$. Hence
$$P(E_i \mid F) = \frac{P(E_i)P(F \mid E_i)}{P(F)}.$$

Theorem 5. Before the full proof, consider Example 35 again. Here we have a random variable $X$ with range $R_X = \{-1, 0, 1\}$, and we let $Y = X^2$. Thus $R_Y = \{0, 1\}$. By definition, we have $E(Y) = \sum_{y \in R_Y} y P(Y = y) = P(Y = 1)$ (after a bit of simplification). So we need to consider the event $\{Y = 1\}$. For $Y$ to be $1$ means that either $X = -1$ or $X = 1$, and by the (obvious) disjointness of the two possibilities $P(Y = 1) = P(X = -1) + P(X = 1)$, so we can say that $E(Y) = P(X = -1) + P(X = 1)$.

Now consider the general case, and let $Y = g(X)$. Then, by definition, $E(Y) = \sum_{y \in R_Y} y p_Y(y) = \sum_{y \in R_Y} y P(Y = y)$. In the example above, we split the event $\{Y = 1\}$ up into events in terms of $X$ which give $Y = 1$. More generally, the event $\{Y = y\}$ is the disjoint union of the events $\{X = x\}$ for each $x \in R_X$ such that $g(x) = y$. (If $g$ is injective, there will be only one event in the union.) So
$$E(Y) = \sum_{y \in R_Y} y P(Y = y) = \sum_{y \in R_Y} y \sum_{x \in R_X,\, g(x) = y} P(X = x) = \sum_{y \in R_Y} \sum_{x \in R_X,\, g(x) = y} y P(X = x) = \sum_{y \in R_Y} \sum_{x \in R_X,\, g(x) = y} g(x) p_X(x),$$
and the double sum here is equivalent to a single sum over all $x \in R_X$, giving the result.

Theorem 6. By definition and Theorem 5,
$$\mathrm{Var}(X) = E\bigl((X - E(X))^2\bigr) = \sum_x (x - E(X))^2 p_X(x).$$
Expanding the brackets, we have
$$\mathrm{Var}(X) = \sum_x x^2 p_X(x) - 2E(X) \sum_x x p_X(x) + E(X)^2 \sum_x p_X(x).$$
Note that here we have used the fact that $2E(X)$ and $E(X)^2$ are constants which do not depend on $x$, so can be taken outside the sum. Then $\sum_x x p_X(x) = E(X)$, by definition, and $\sum_x p_X(x) = 1$ as $p_X$ is a probability mass function, so we get
$$\mathrm{Var}(X) = E(X^2) - 2E(X)E(X) + E(X)^2 = E(X^2) - E(X)^2,$$
as required.

Theorem 7, mean part. By Theorem 5,
$$E(aX + b) = \sum_x (ax + b) p_X(x) = a \sum_x x p_X(x) + b \sum_x p_X(x) = aE(X) + b,$$
again using $\sum_x x p_X(x) = E(X)$ and $\sum_x p_X(x) = 1$. Hence we have the result.

Theorem 7, variance part. By definition,
$$\mathrm{Var}(aX + b) = E\bigl((aX + b - E(aX + b))^2\bigr).$$
By the mean part, we get
$$\mathrm{Var}(aX + b) = E\bigl((aX + b - aE(X) - b)^2\bigr) = E\bigl((a(X - E(X)))^2\bigr) = E\bigl(a^2 (X - E(X))^2\bigr) = a^2 \mathrm{Var}(X).$$

Theorem 8. The definition of expectation gives
$$E(X + Y) = \sum_{z \in R_{X+Y}} z P(X + Y = z).$$
Now, if $z \in R_{X+Y}$ we can write $z = x + y$ where $x \in R_X$ and $y \in R_Y$. Hence we can replace the sum over $z$ by a sum over $x$ and $y$:
$$E(X + Y) = \sum_{x \in R_X} \sum_{y \in R_Y} (x + y) P(X = x, Y = y).$$
Split the sum up:
$$E(X + Y) = \sum_{x \in R_X} \sum_{y \in R_Y} x P(X = x, Y = y) + \sum_{x \in R_X} \sum_{y \in R_Y} y P(X = x, Y = y) = \sum_{x \in R_X} x \sum_{y \in R_Y} P(X = x, Y = y) + \sum_{y \in R_Y} y \sum_{x \in R_X} P(X = x, Y = y).$$
(If $R_X$ or $R_Y$ is infinite, you'll need to take on trust that the reversal of the order of summation is OK here.) Now $\sum_{y \in R_Y} P(X = x, Y = y) = P(X = x)$, and similarly $\sum_{x \in R_X} P(X = x, Y = y) = P(Y = y)$. So we get
$$E(X + Y) = \sum_{x \in R_X} x P(X = x) + \sum_{y \in R_Y} y P(Y = y) = E(X) + E(Y).$$

Theorem 9. This is similar to Theorem 8. Start with
$$E(XY) = \sum_{z \in R_{XY}} z P(XY = z) = \sum_{x \in R_X} \sum_{y \in R_Y} (xy) P(X = x, Y = y).$$
By independence, $P(X = x, Y = y) = P(X = x)P(Y = y)$, so we get
$$E(XY) = \sum_{x \in R_X} \sum_{y \in R_Y} (xy) P(X = x) P(Y = y).$$
Now, with respect to $y$, we can regard $x$ and $P(X = x)$ as constants, so we take them out of the sum with respect to $y$, and get
$$E(XY) = \sum_{x \in R_X} x P(X = x) \sum_{y \in R_Y} y P(Y = y),$$
which immediately gives $E(XY) = E(X)E(Y)$.
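Theorems 8 and 9 can be checked numerically for a small pair of independent discrete random variables. The ranges and pmfs below are invented for illustration; the joint pmf is built from the product rule for independence:

```python
# Numerical check of Theorem 8 (E(X+Y) = E(X) + E(Y)) and
# Theorem 9 (E(XY) = E(X)E(Y) for independent X, Y).
# The pmfs below are invented for illustration.

p_X = {0: 0.2, 1: 0.5, 2: 0.3}   # P(X = x)
p_Y = {-1: 0.4, 3: 0.6}          # P(Y = y)

EX = sum(x * p for x, p in p_X.items())
EY = sum(y * p for y, p in p_Y.items())

# Under independence, P(X = x, Y = y) = P(X = x) P(Y = y),
# so the double sums in the proofs become products of single sums.
E_sum = sum((x + y) * px * py for x, px in p_X.items() for y, py in p_Y.items())
E_prod = sum((x * y) * px * py for x, px in p_X.items() for y, py in p_Y.items())

print(E_sum, EX + EY)    # agree, as Theorem 8 predicts
print(E_prod, EX * EY)   # agree, as Theorem 9 predicts
```

Note that Theorem 8 needs no independence: rerunning the `E_sum` computation with any joint pmf having these marginals gives the same answer, whereas `E_prod` genuinely uses the product form of the joint pmf.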
Corollary 10. Start with the variance identity (Theorem 6):
$$\mathrm{Var}(X + Y) = E\bigl((X + Y)^2\bigr) - \bigl(E(X + Y)\bigr)^2.$$
Use Theorems 8 and 7 to get
$$\mathrm{Var}(X + Y) = E(X^2 + 2XY + Y^2) - (E(X))^2 - (E(Y))^2 - 2E(X)E(Y) = E(X^2) + E(Y^2) + 2E(XY) - (E(X))^2 - (E(Y))^2 - 2E(X)E(Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\bigl(E(XY) - E(X)E(Y)\bigr),$$
and by Theorem 9, $E(XY) - E(X)E(Y) = 0$, giving the result.

Theorem 11. This is an easy exercise with the definitions of mean and variance: $E(X) = 0 \times (1-p) + 1 \times p = p$, and $E(X^2)$ is also $p$ (since $X$ only takes the values $0$ and $1$, $X$ and $X^2$ are actually the same). Hence
$$\mathrm{Var}(X) = E(X^2) - (E(X))^2 = p - p^2 = p(1-p).$$

Theorem 12. Use the fact that $X = \sum_{i=1}^{n} Z_i$, where $Z_i = 1$ if trial $i$ is a success and $Z_i = 0$ if it is a failure. By assumption, the $Z_i$ are independent. Thus Theorems 8 and 11 tell us
$$E(X) = E\left(\sum_{i=1}^{n} Z_i\right) = \sum_{i=1}^{n} E(Z_i) = np,$$
and Corollary 10 and Theorem 11 give
$$\mathrm{Var}(X) = \mathrm{Var}\left(\sum_{i=1}^{n} Z_i\right) = \sum_{i=1}^{n} \mathrm{Var}(Z_i) = np(1-p).$$

Theorem 13. We are looking for
$$\lim_{n \to \infty} \binom{n}{x} \left(\frac{\lambda}{n}\right)^x \left(1 - \frac{\lambda}{n}\right)^{n-x},$$
which, writing $\binom{n}{x} = \frac{n!}{x!(n-x)!}$ and factoring out terms which do not depend on $n$, becomes
$$\frac{\lambda^x}{x!} \lim_{n \to \infty} \frac{n(n-1)(n-2)\cdots(n-x+1)}{n^x} \left(1 - \frac{\lambda}{n}\right)^{-x} \left(1 - \frac{\lambda}{n}\right)^{n}.$$
Now,
$$\lim_{n \to \infty} \frac{n-1}{n} = \lim_{n \to \infty} \frac{n-2}{n} = \cdots = \lim_{n \to \infty} \frac{n-x+1}{n} = 1,$$
and $\lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^{-x}$ is also $1$. So we are left with
$$\frac{\lambda^x}{x!} \lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^{n}.$$
By Note 6.39 in MAS110, or below, $\lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^{n} = e^{-\lambda}$, so we are left with $\dfrac{e^{-\lambda} \lambda^x}{x!}$, as required.

Limit of $\left(1 + \frac{r}{n}\right)^{n}$. This limit is needed for Theorem 13, and also occurs in other areas of mathematics. (One example is compound interest in financial mathematics.) In fact, the exponential function is sometimes defined to be equal to this limit; the argument below assumes that we have a different definition of the exponential function, one which implies the familiar properties of the exponential and logarithmic functions, including the derivative of the latter.

Assume $r \neq 0$. We can start off with the fact that $\lim_{h \to 0} \frac{\log(1+h)}{h} = 1$. This is the statement that the derivative of $\log x$ at $x = 1$ is $1$. Multiply both sides by $r$ to get
$$\lim_{h \to 0} \frac{r \log(1+h)}{h} = r,$$
and now let $n = r/h$, so that $h = r/n$. If $r > 0$ then $n \to \infty$ corresponds to $h \to 0$ from above, and if $r < 0$ (the case we actually need in Theorem 13) then $n \to \infty$ corresponds to $h \to 0$ from below, but the limit is valid in both cases. So
$$\lim_{n \to \infty} n \log\left(1 + \frac{r}{n}\right) = r,$$
and now take exponentials of both sides and use the continuity of the exponential function to obtain
$$\lim_{n \to \infty} \left(1 + \frac{r}{n}\right)^{n} = e^{r}.$$

Theorem 14: valid pmf. As $p_X(x) \geq 0$, we just need to check $\sum_{x=0}^{\infty} p_X(x) = 1$. Checking, we have
$$\sum_{x=0}^{\infty} p_X(x) = \sum_{x=0}^{\infty} \frac{e^{-\lambda} \lambda^x}{x!} = e^{-\lambda} \sum_{x=0}^{\infty} \frac{\lambda^x}{x!} = e^{-\lambda} e^{\lambda} = 1,$$
recognising the sum as the series expansion of the exponential function.

Theorem 14: mean and variance. We have
$$E(X) = \sum_{x=0}^{\infty} x \frac{e^{-\lambda} \lambda^x}{x!} = e^{-\lambda} \sum_{x=1}^{\infty} \frac{\lambda^x}{(x-1)!}$$
(using $x! = x(x-1)!$). Changing variables to $y = x - 1$, we get
$$E(X) = e^{-\lambda} \sum_{y=0}^{\infty} \frac{\lambda^{y+1}}{y!} = \lambda e^{-\lambda} \sum_{y=0}^{\infty} \frac{\lambda^y}{y!} = \lambda e^{-\lambda} e^{\lambda} = \lambda.$$
For the variance, we have $E(X^2) = \lambda^2 + \lambda$ (see Exercise 36) and thus $\mathrm{Var}(X) = (\lambda^2 + \lambda) - \lambda^2 = \lambda$.

Theorem 15. For $x \in \mathbb{N}$,
$$F_X(x) = \sum_{a=1}^{x} P(X = a) = \sum_{a=1}^{x} (1-p)^{a-1} p = \frac{p(1 - (1-p)^x)}{1 - (1-p)}$$
(a geometric series with $x$ terms, first term $p$ and common ratio $1-p$), and that simplifies to $1 - (1-p)^x$ as required.

Theorem 16. By the definition of mean,
$$E(X) = \sum_{x=1}^{\infty} x (1-p)^{x-1} p.$$
The Binomial Theorem (negative integer case) tells us that for $|\theta| < 1$,
$$(1 - \theta)^{-2} = \sum_{n=0}^{\infty} (n+1)\theta^{n} = \sum_{m=1}^{\infty} m \theta^{m-1},$$
which you can also obtain by differentiating term by term the formula for the sum of an infinite geometric series. Using this with $\theta = 1 - p$ gives
$$\sum_{x=1}^{\infty} x(1-p)^{x-1} p = p \sum_{x=1}^{\infty} x(1-p)^{x-1} = p(1 - (1-p))^{-2} = \frac{1}{p}.$$
For the variance, we start by finding
$$E(X(X-1)) = \sum_{x=1}^{\infty} x(x-1)(1-p)^{x-1} p = \sum_{x=2}^{\infty} x(x-1)(1-p)^{x-1} p,$$
as the $x = 1$ term is zero. Again the Binomial Theorem (or term by term differentiation) says
$$(1 - \theta)^{-3} = \sum_{m=2}^{\infty} \frac{m(m-1)}{2} \theta^{m-2},$$
and thus
$$\sum_{x=2}^{\infty} x(x-1)(1-p)^{x-1} p = 2p(1-p) \sum_{x=2}^{\infty} \frac{x(x-1)}{2} (1-p)^{x-2} = 2p(1-p)(1 - (1-p))^{-3} = \frac{2(1-p)}{p^2}.$$
Now,
$$E(X^2) = E(X(X-1) + X) = \frac{2(1-p)}{p^2} + \frac{1}{p},$$
and
$$\mathrm{Var}(X) = E(X^2) - (E(X))^2 = \frac{2(1-p)}{p^2} + \frac{1}{p} - \frac{1}{p^2} = \frac{1-p}{p^2}.$$

Theorem 17. For $x \geq 0$,
$$F_X(x) = P(X \leq x) = \int_0^{x} \lambda e^{-\lambda t}\, dt = 1 - e^{-\lambda x}.$$
Note that if $x < 0$, $P(X \leq x) = 0$ as $X$ cannot be negative, so in full
$$F_X(x) = \begin{cases} 1 - e^{-\lambda x} & x \geq 0 \\ 0 & x < 0. \end{cases}$$

Theorem 18. We have $E(X) = \int_0^{\infty} \lambda x e^{-\lambda x}\, dx$. Integration by parts gives
$$\lambda \left( \left[ -\frac{1}{\lambda} x e^{-\lambda x} \right]_0^{\infty} + \frac{1}{\lambda} \int_0^{\infty} e^{-\lambda x}\, dx \right).$$
As $x e^{-\lambda x} \to 0$ as $x \to \infty$, we get $\int_0^{\infty} e^{-\lambda x}\, dx$, which gives $1/\lambda$. (In the lecture I did this carefully, with the improper integral treated as a limit of integrals from $0$ to $t$ as $t \to \infty$.) For the variance, see Exercise 45 for $E(X^2) = 2/\lambda^2$, from which it follows that $\mathrm{Var}(X) = 1/\lambda^2$.

Theorem 19. By the definition of conditional probability, the left hand side is
$$\frac{P(\{X > x + a\} \cap \{X > a\})}{P(X > a)}.$$
However $\{X > x + a\} \cap \{X > a\} = \{X > x + a\}$, so we get
$$\frac{P(X > x + a)}{P(X > a)} = \frac{e^{-\lambda(x+a)}}{e^{-\lambda a}} = e^{-\lambda x} = P(X > x),$$
where we have used Theorem 17 and the fact that it implies $P(X > x) = e^{-\lambda x}$ for all $x > 0$.

Theorem 20. For $x \in [a, b]$,
$$F_X(x) = \int_a^{x} \frac{1}{b-a}\, dt = \frac{x-a}{b-a}.$$

Theorem 21. We have
$$E(X) = \int_a^{b} x \cdot \frac{1}{b-a}\, dx = \left[ \frac{x^2}{2(b-a)} \right]_a^{b} = \frac{b^2 - a^2}{2(b-a)} = \frac{b+a}{2}.$$
For the variance, first find
$$E(X^2) = \int_a^{b} x^2 \cdot \frac{1}{b-a}\, dx = \left[ \frac{x^3}{3(b-a)} \right]_a^{b} = \frac{b^3 - a^3}{3(b-a)} = \frac{b^2 + ab + a^2}{3}.$$
Then
$$\mathrm{Var}(X) = \frac{b^2 + ab + a^2}{3} - \frac{(a+b)^2}{4} = \frac{b^2 + a^2 - 2ab}{12} = \frac{(b-a)^2}{12}.$$

Theorem 22. By definition,
$$\Phi(-z) = \int_{-\infty}^{-z} \varphi(t)\, dt.$$
Change variables to $s = -t$ and use the symmetry of $\varphi$ to get
$$\Phi(-z) = \int_{z}^{\infty} \varphi(-s)\, ds = \int_{z}^{\infty} \varphi(s)\, ds,$$
which is $1 - \Phi(z)$ as required.

Theorem 23. For the expectation,
$$E(Z) = \int_{-\infty}^{\infty} z \varphi(z)\, dz = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} z e^{-z^2/2}\, dz.$$
Considering the improper integral as a limit, this is
$$\frac{1}{\sqrt{2\pi}} \lim_{s,t \to \infty} \left[ -e^{-z^2/2} \right]_{-s}^{t},$$
which becomes
$$\frac{1}{\sqrt{2\pi}} \lim_{s,t \to \infty} \left( e^{-s^2/2} - e^{-t^2/2} \right) = 0.$$
For the variance, we need to calculate
$$E(Z^2) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} z^2 e^{-z^2/2}\, dz.$$
Writing $z^2 e^{-z^2/2}$ as $z \cdot z e^{-z^2/2}$ and integrating by parts, we get
$$E(Z^2) = \frac{1}{\sqrt{2\pi}} \left( \left[ -z e^{-z^2/2} \right]_{-\infty}^{\infty} + \int_{-\infty}^{\infty} e^{-z^2/2}\, dz \right).$$
The integral, together with the factor $\frac{1}{\sqrt{2\pi}}$, is just the integral of the Normal pdf again, so is $1$, and $z e^{-z^2/2} \to 0$ both as $z \to \infty$ and as $z \to -\infty$. Hence we get $E(Z^2) = 1$, and so $\mathrm{Var}(Z) = 1 - E(Z)^2 = 1$, as $E(Z) = 0$.

Theorem 24. If $X = \mu + \sigma Z$, consider the (cumulative) distribution function of $X$:
$$F_X(x) = P(X \leq x) = P(\mu + \sigma Z \leq x) = P\left(Z \leq \frac{x-\mu}{\sigma}\right) = \Phi\left(\frac{x-\mu}{\sigma}\right).$$
To get the probability density function of $X$, differentiate, using the chain rule:
$$f_X(x) = F_X'(x) = \frac{1}{\sigma} \varphi\left(\frac{x-\mu}{\sigma}\right).$$
That $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$ then follows from Theorem 7.
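As a numerical sanity check on Theorem 24, one can integrate this density directly and confirm that it has total mass $1$, mean $\mu$ and variance $\sigma^2$. This is only a sketch: the values $\mu = 3$, $\sigma = 2$ and the crude midpoint-rule integration are chosen for illustration.

```python
import math

# Check that f_X(x) = (1/sigma) * phi((x - mu)/sigma) integrates to 1
# and has mean mu and variance sigma^2. Illustrative values only.
mu, sigma = 3.0, 2.0

def f_X(x):
    """Density of N(mu, sigma^2), via the standard Normal pdf phi."""
    z = (x - mu) / sigma
    return math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))

# Midpoint rule over [mu - 10 sigma, mu + 10 sigma]; the density is
# negligible outside this range, so truncation error is tiny.
n = 40000
lo, hi = mu - 10 * sigma, mu + 10 * sigma
dx = (hi - lo) / n
xs = [lo + (i + 0.5) * dx for i in range(n)]

total = sum(f_X(x) for x in xs) * dx                    # should be close to 1
mean = sum(x * f_X(x) for x in xs) * dx                 # should be close to mu
var = sum((x - mean) ** 2 * f_X(x) for x in xs) * dx    # should be close to sigma^2

print(total, mean, var)
```

The same check with $\mu = 0$, $\sigma = 1$ reproduces the conclusions of Theorem 23 for the standard Normal.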