MAS113 Introduction to Probability and Statistics. Proofs of theorems

Theorem 1 (De Morgan's Laws). See MAS110.

Theorem 2.

M1. By definition, $B$ and $A \setminus B$ are disjoint, and their union is $A$. So, because $m$ is a measure, $m(A) = m(B) + m(A \setminus B)$; rearranging gives the result. Note that more generally (i.e. not assuming $B \subseteq A$) we have $m(A \setminus B) = m(A) - m(A \cap B)$, by the same argument.

M2. As $m(A \setminus B) \geq 0$ by the definition of a measure, this follows immediately from M1.

M3. Apply M1 with $B = A$; then $A \setminus A = \emptyset$, so the LHS is $m(\emptyset)$ and the RHS is $m(A) - m(A) = 0$.

M4. We can write $A \cup B = (A \cap B) \cup (A \setminus B) \cup (B \setminus A)$, and the three sets here are disjoint. So, using the definition of a measure, $m(A \cup B) = m(A \cap B) + m(A \setminus B) + m(B \setminus A)$. Applying M1, we get
$$m(A \cup B) = m(A \cap B) + m(A) - m(A \cap B) + m(B) - m(B \cap A),$$
which, simplifying, gives the result. (Note $A \cap B$ and $B \cap A$ are the same.)

M5. See Exercise 4.

M6. See Exercise 4.

Theorem 3 (Law of Total Probability). Because the $E_i$ form a partition, they are disjoint. Hence their intersections with $F$, the events $F \cap E_i$, are also disjoint. Again because the $E_i$ are a partition, any element of $F$ must be in one of them, so the union of the $F \cap E_i$ for $i = 1, \dots, n$ must be the whole of $F$, and the previous sentence says it is a disjoint union. Hence
$$P(F) = \sum_{i=1}^{n} P(F \cap E_i).$$
The second form of the statement (which is the more useful one in practice) follows immediately by writing $P(F \cap E_i) = P(E_i)P(F \mid E_i)$ (from the definition of conditional probability).
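The identity in M4 and the law of total probability lend themselves to a quick numerical sanity check. The Python sketch below verifies both exactly (using fractions) on an arbitrarily chosen 12-point sample space with the uniform probability measure; all the sets in it are made up for illustration.

```python
# Exact check of Theorem 2 (M4) and Theorem 3 on a small finite sample space
# with equally likely outcomes; the sets chosen below are arbitrary.
from fractions import Fraction

omega = set(range(1, 13))                              # sample space {1, ..., 12}
P = lambda E: Fraction(len(E), len(omega))             # uniform probability measure

A, B = {1, 2, 3, 4, 5}, {4, 5, 6, 7}
assert P(A | B) == P(A) + P(B) - P(A & B)              # M4

parts = [{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}]  # a partition of omega
F = {2, 3, 5, 9, 10}
# law of total probability: P(F) = sum of P(E_i) * P(F | E_i)
assert P(F) == sum(P(Ei) * (P(F & Ei) / P(Ei)) for Ei in parts)
print("M4 and the law of total probability hold on this example.")
```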

Theorem 4 (Bayes' Theorem). By the definition of conditional probability, $P(E_i \mid F) = P(E_i \cap F)/P(F)$. However, we also know from the definition of conditional probability that $P(E_i \cap F) = P(E_i)P(F \mid E_i)$. Hence
$$P(E_i \mid F) = \frac{P(E_i)P(F \mid E_i)}{P(F)}.$$

Theorem 5. Before the full proof, consider Example 35 again. Here we have a random variable $X$ with range $R_X = \{-1, 0, 1\}$, and we let $Y = X^2$. Thus $R_Y = \{0, 1\}$. By definition, we have $E(Y) = \sum_{y \in R_Y} y P(Y = y) = P(Y = 1)$ (after a bit of simplification). So we need to consider the event $\{Y = 1\}$. For $Y$ to be 1 means that either $X = -1$ or $X = 1$, and by the (obvious) disjointness of the two possibilities $P(Y = 1) = P(X = -1) + P(X = 1)$, so we can say that $E(Y) = P(X = -1) + P(X = 1)$.

Now consider the general case, and let $Y = g(X)$. Then, by definition, $E(Y) = \sum_{y \in R_Y} y\, p_Y(y) = \sum_{y \in R_Y} y P(Y = y)$. In the example above, we split the event $\{Y = 1\}$ up into events in terms of $X$ which give $Y = 1$. More generally, the event $\{Y = y\}$ is the disjoint union of the events $\{X = x\}$ for each $x \in R_X$ such that $g(x) = y$. (If $g$ is injective, there will be only one event in the union.) So
$$E(Y) = \sum_{y \in R_Y} y P(Y = y) = \sum_{y \in R_Y} y \sum_{x \in R_X,\ g(x) = y} P(X = x) = \sum_{y \in R_Y} \sum_{x \in R_X,\ g(x) = y} y\, p_X(x) = \sum_{y \in R_Y} \sum_{x \in R_X,\ g(x) = y} g(x)\, p_X(x),$$
and the double sum here is equivalent to $\sum_{x \in R_X} g(x)\, p_X(x)$, giving the result.

Theorem 6. This is a special case of Theorem 8: in the notation of that theorem, let $a = 1$ and $b = -E(X)$.
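The point of Theorem 5 is that $E(g(X))$ can be computed directly from the pmf of $X$, without first finding the pmf of $Y = g(X)$. The following Python sketch illustrates this for the squaring example above; the pmf on $\{-1, 0, 1\}$ is an arbitrary choice.

```python
# Check of Theorem 5 with g(x) = x**2 and an arbitrary pmf on {-1, 0, 1}.
p_X = {-1: 0.2, 0: 0.5, 1: 0.3}
g = lambda x: x ** 2

# route 1: find the pmf of Y = g(X) first, then use the definition of E(Y)
p_Y = {}
for x, p in p_X.items():
    p_Y[g(x)] = p_Y.get(g(x), 0.0) + p
E_Y_direct = sum(y * p for y, p in p_Y.items())

# route 2: Theorem 5, summing g(x) p_X(x) without finding the pmf of Y
E_Y_theorem5 = sum(g(x) * p for x, p in p_X.items())

print(E_Y_direct, E_Y_theorem5)        # both 0.5 = P(X = -1) + P(X = 1)
assert abs(E_Y_direct - E_Y_theorem5) < 1e-12
```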

Theorem 7. By definition and Theorem 5,
$$\mathrm{Var}(X) = E((X - E(X))^2) = \sum_x (x - E(X))^2\, p_X(x).$$
Expanding the brackets, we have
$$\mathrm{Var}(X) = \sum_x x^2\, p_X(x) - 2E(X) \sum_x x\, p_X(x) + E(X)^2 \sum_x p_X(x).$$
Note that here we have used the fact that $2E(X)$ and $E(X)^2$ are constants which do not depend on $x$, so can be taken outside the sums. Then $\sum_x x\, p_X(x) = E(X)$, by definition, and $\sum_x p_X(x) = 1$ as $p_X$ is a probability mass function, so we get
$$\mathrm{Var}(X) = E(X^2) - 2E(X)E(X) + E(X)^2 = E(X^2) - E(X)^2,$$
as required.

Theorem 8, mean part. By Theorem 5,
$$E(aX + b) = \sum_x (ax + b)\, p_X(x) = a \sum_x x\, p_X(x) + b \sum_x p_X(x) = aE(X) + b,$$
again using $\sum_x x\, p_X(x) = E(X)$ and $\sum_x p_X(x) = 1$. Hence we have the result.

Theorem 8, variance part. By definition,
$$\mathrm{Var}(aX + b) = E((aX + b - E(aX + b))^2).$$
By the mean part, we get
$$\mathrm{Var}(aX + b) = E((aX + b - aE(X) - b)^2) = E((a(X - E(X)))^2) = E(a^2 (X - E(X))^2) = a^2\, \mathrm{Var}(X).$$
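Theorems 7 and 8 are easy to confirm numerically. The sketch below does so for an arbitrary pmf and arbitrary constants $a$ and $b$.

```python
# Check of Theorems 7 and 8 on an arbitrary pmf; a and b are arbitrary constants.
p_X = {0: 0.1, 1: 0.4, 2: 0.3, 5: 0.2}
a, b = -3.0, 7.0

E  = lambda f: sum(f(x) * p for x, p in p_X.items())
EX = E(lambda x: x)
var_def      = E(lambda x: (x - EX) ** 2)          # definition of variance
var_shortcut = E(lambda x: x ** 2) - EX ** 2       # Theorem 7

E_aXb   = E(lambda x: a * x + b)                   # Theorem 8, mean part
var_aXb = E(lambda x: (a * x + b - E_aXb) ** 2)    # Theorem 8, variance part

assert abs(var_def - var_shortcut) < 1e-12
assert abs(E_aXb - (a * EX + b)) < 1e-12
assert abs(var_aXb - a ** 2 * var_def) < 1e-12
print("Theorems 7 and 8 check out on this pmf.")
```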

Theorem 9. The definition of expectation gives
$$E(X + Y) = \sum_{z \in R_{X+Y}} z P(X + Y = z).$$
Now, if $z \in R_{X+Y}$ we can write $z = x + y$ where $x \in R_X$ and $y \in R_Y$. Hence we can replace the sum over $z$ by a sum over $x$ and $y$:
$$E(X + Y) = \sum_{x \in R_X} \sum_{y \in R_Y} (x + y) P(X = x, Y = y).$$
Split the sum up:
$$E(X + Y) = \sum_{x \in R_X} \sum_{y \in R_Y} x P(X = x, Y = y) + \sum_{x \in R_X} \sum_{y \in R_Y} y P(X = x, Y = y) = \sum_{x \in R_X} x \sum_{y \in R_Y} P(X = x, Y = y) + \sum_{y \in R_Y} y \sum_{x \in R_X} P(X = x, Y = y).$$
(If $R_X$ or $R_Y$ is infinite, you'll need to take on trust that the reversal of the order of summation is OK here.) Now $\sum_{y \in R_Y} P(X = x, Y = y) = P(X = x)$, and similarly $\sum_{x \in R_X} P(X = x, Y = y) = P(Y = y)$. So we get
$$E(X + Y) = \sum_{x \in R_X} x P(X = x) + \sum_{y \in R_Y} y P(Y = y) = E(X) + E(Y).$$
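Note that Theorem 9 does not assume independence. The following sketch checks it on a small joint pmf which is deliberately not the product of its marginals; the probabilities are made up.

```python
# Check of Theorem 9 on a joint pmf that is deliberately NOT a product of its
# marginals (the numbers are made up), so no independence is being used.
p_joint = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}

p_X = {x: sum(p for (a, b), p in p_joint.items() if a == x) for x in (0, 1)}
p_Y = {y: sum(p for (a, b), p in p_joint.items() if b == y) for y in (0, 1)}

E_sum_direct = sum((x + y) * p for (x, y), p in p_joint.items())
E_X = sum(x * p for x, p in p_X.items())
E_Y = sum(y * p for y, p in p_Y.items())

print(E_sum_direct, E_X + E_Y)         # both 1.1
assert abs(E_sum_direct - (E_X + E_Y)) < 1e-12
```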

Theorem 10. This is similar to Theorem 9. Start with
$$E(XY) = \sum_{z \in R_{XY}} z P(XY = z) = \sum_{x \in R_X} \sum_{y \in R_Y} (xy) P(X = x, Y = y).$$
By independence, $P(X = x, Y = y) = P(X = x)P(Y = y)$, so we get
$$E(XY) = \sum_{x \in R_X} \sum_{y \in R_Y} (xy) P(X = x) P(Y = y).$$
Now, with respect to $y$, we can regard $x$ and $P(X = x)$ as constants, so we take them out of the sum with respect to $y$, and get
$$E(XY) = \sum_{x \in R_X} x P(X = x) \sum_{y \in R_Y} y P(Y = y),$$
which immediately gives $E(XY) = E(X)E(Y)$.

Corollary 11. Start with the variance identity (Theorem 7):
$$\mathrm{Var}(X + Y) = E((X + Y)^2) - (E(X + Y))^2.$$
Use Theorems 8, 9 and 10 to get
$$\mathrm{Var}(X + Y) = E(X^2 + 2XY + Y^2) - (E(X))^2 - (E(Y))^2 - 2E(X)E(Y) = E(X^2) + E(Y^2) + 2E(XY) - (E(X))^2 - (E(Y))^2 - 2E(X)E(Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2(E(XY) - E(X)E(Y)),$$
and by Theorem 10, $E(XY) - E(X)E(Y) = 0$, giving the result.

Theorem 12. This is an easy exercise with the definitions of mean and variance: $E(X) = 0 \cdot (1 - p) + 1 \cdot p = p$, and $E(X^2)$ is also $p$ (since $X$ only takes the values 0 and 1, $X$ and $X^2$ are actually the same). Hence $\mathrm{Var}(X) = E(X^2) - (E(X))^2 = p - p^2 = p(1 - p)$.

Theorem 13. Use the fact that $X = \sum_{i=1}^{n} Z_i$, where $Z_i = 1$ if trial $i$ is a success and $Z_i = 0$ if it is a failure. By assumption, the $Z_i$ are independent. Thus Theorems 9 and 12 tell us
$$E(X) = E\!\left(\sum_{i=1}^{n} Z_i\right) = \sum_{i=1}^{n} E(Z_i) = np,$$
and Corollary 11 and Theorem 12 give
$$\mathrm{Var}(X) = \mathrm{Var}\!\left(\sum_{i=1}^{n} Z_i\right) = \sum_{i=1}^{n} \mathrm{Var}(Z_i) = np(1 - p).$$
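The binomial mean and variance in Theorems 12 and 13 can be confirmed directly from the pmf; $n$ and $p$ in the sketch below are arbitrary.

```python
# Check of Theorems 12 and 13: mean and variance of a Binomial(n, p) random
# variable computed straight from its pmf; n and p are arbitrary.
from math import comb

n, p = 10, 0.3
pmf = {x: comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)}

mean = sum(x * q for x, q in pmf.items())
var = sum(x ** 2 * q for x, q in pmf.items()) - mean ** 2

print(mean, var)                            # 3.0 and 2.1
assert abs(mean - n * p) < 1e-9             # E(X) = np
assert abs(var - n * p * (1 - p)) < 1e-9    # Var(X) = np(1 - p)
```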

Theorem 14. We are looking for
$$\lim_{n \to \infty} \binom{n}{x} \left(\frac{\lambda}{n}\right)^{x} \left(1 - \frac{\lambda}{n}\right)^{n - x},$$
which, writing $\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$ and factoring out terms which do not depend on $n$, becomes
$$\frac{\lambda^x}{x!} \lim_{n \to \infty} \frac{n(n-1)(n-2)\cdots(n - x + 1)}{n^x} \left(1 - \frac{\lambda}{n}\right)^{-x} \left(1 - \frac{\lambda}{n}\right)^{n}.$$
Now,
$$\lim_{n \to \infty} \frac{n - 1}{n} = \lim_{n \to \infty} \frac{n - 2}{n} = \cdots = \lim_{n \to \infty} \frac{n - x + 1}{n} = 1,$$
and $\lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^{-x}$ is also 1. So we are left with
$$\frac{\lambda^x}{x!} \lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^{n}.$$
By Note 638 in MAS110, $\lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^{n} = e^{-\lambda}$, so we are left with
$$\frac{e^{-\lambda} \lambda^x}{x!},$$
as required.

Theorem 15: valid pmf. As $p_X(x) \geq 0$, we just need to check $\sum_{x=0}^{\infty} p_X(x) = 1$. Checking, we have
$$\sum_{x=0}^{\infty} p_X(x) = \sum_{x=0}^{\infty} \frac{e^{-\lambda} \lambda^x}{x!} = e^{-\lambda} \sum_{x=0}^{\infty} \frac{\lambda^x}{x!} = e^{-\lambda} e^{\lambda} = 1,$$
recognising the sum as the series expansion of the exponential function.

Theorem 15: mean and variance. We have
$$E(X) = \sum_{x=0}^{\infty} x\, \frac{e^{-\lambda} \lambda^x}{x!} = e^{-\lambda} \sum_{x=1}^{\infty} \frac{\lambda^x}{(x-1)!}$$
(using $x! = x(x-1)!$). Changing variables to $y = x - 1$, we get
$$E(X) = e^{-\lambda} \sum_{y=0}^{\infty} \frac{\lambda^{y+1}}{y!} = \lambda e^{-\lambda} \sum_{y=0}^{\infty} \frac{\lambda^y}{y!} = \lambda e^{-\lambda} e^{\lambda} = \lambda.$$
For the variance, we have $E(X^2) = \lambda^2 + \lambda$ (see Exercise 36) and thus $\mathrm{Var}(X) = (\lambda^2 + \lambda) - \lambda^2 = \lambda$.
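Both the Poisson limit of Theorem 14 and the Poisson mean of Theorem 15 can be seen numerically. In the sketch below, $\lambda$ and $x$ are arbitrary, and the infinite series for the mean is truncated where the tail is negligible.

```python
# Check of Theorems 14 and 15: Binomial(n, lam/n) probabilities approach
# Poisson(lam) ones as n grows, and the Poisson pmf has mean lam.  The values
# of lam and x are arbitrary, and the series for the mean is truncated.
from math import comb, exp, factorial

lam, x = 2.5, 3
poisson = exp(-lam) * lam ** x / factorial(x)
for n in (10, 100, 10_000):
    binom = comb(n, x) * (lam / n) ** x * (1 - lam / n) ** (n - x)
    print(n, binom, poisson)                # the gap shrinks as n grows

mean = sum(k * exp(-lam) * lam ** k / factorial(k) for k in range(100))
assert abs(mean - lam) < 1e-9
```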

Theorem 16. For $x \in \mathbb{N}$,
$$F_X(x) = \sum_{a=1}^{x} P(X = a) = \sum_{a=1}^{x} (1 - p)^{a-1} p = \frac{p - (1 - p)^x p}{1 - (1 - p)}$$
(geometric series with $x$ terms, first term $p$ and common ratio $1 - p$), and that simplifies to $1 - (1 - p)^x$, as required.

Theorem 17. By the definition of mean,
$$E(X) = \sum_{x=1}^{\infty} x (1 - p)^{x-1} p.$$
The Binomial Theorem (negative integer case) tells us that for $|\theta| < 1$,
$$(1 - \theta)^{-2} = \sum_{n=0}^{\infty} (n + 1)\theta^n = \sum_{m=1}^{\infty} m\theta^{m-1},$$
which you can also obtain by differentiating term by term the formula for the sum of an infinite geometric series. Using this with $\theta = 1 - p$ gives
$$\sum_{x=1}^{\infty} x (1 - p)^{x-1} p = p \sum_{x=1}^{\infty} x (1 - p)^{x-1} = p (1 - (1 - p))^{-2} = \frac{1}{p}.$$
For the variance, we start by finding
$$E(X(X - 1)) = \sum_{x=1}^{\infty} x(x - 1)(1 - p)^{x-1} p = \sum_{x=2}^{\infty} x(x - 1)(1 - p)^{x-1} p,$$
as the $x = 1$ term is zero. Again the Binomial Theorem or term-by-term differentiation says
$$(1 - \theta)^{-3} = \sum_{m=2}^{\infty} \frac{m(m - 1)}{2}\theta^{m-2},$$
and thus
$$\sum_{x=2}^{\infty} x(x - 1)(1 - p)^{x-1} p = p(1 - p) \sum_{x=2}^{\infty} x(x - 1)(1 - p)^{x-2} = 2p(1 - p)(1 - (1 - p))^{-3} = \frac{2(1 - p)}{p^2}.$$
Now, $E(X^2) = E(X(X - 1) + X) = \frac{2(1 - p)}{p^2} + \frac{1}{p}$, and
$$\mathrm{Var}(X) = E(X^2) - (E(X))^2 = \frac{2(1 - p)}{p^2} + \frac{1}{p} - \frac{1}{p^2} = \frac{1 - p}{p^2}.$$
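A quick numerical check of Theorems 16 and 17, with an arbitrary value of $p$ and the infinite sums truncated where the tail is negligible:

```python
# Check of Theorems 16 and 17 for the geometric distribution on {1, 2, 3, ...};
# p and x are arbitrary, and the infinite sums are truncated where the tail
# is negligible.
p = 0.3
pmf = lambda a: (1 - p) ** (a - 1) * p

x = 7
assert abs(sum(pmf(a) for a in range(1, x + 1)) - (1 - (1 - p) ** x)) < 1e-12

N = 2000
mean = sum(a * pmf(a) for a in range(1, N))
second = sum(a ** 2 * pmf(a) for a in range(1, N))

print(mean, second - mean ** 2)                           # about 3.333 and 7.778
assert abs(mean - 1 / p) < 1e-9                           # E(X) = 1/p
assert abs(second - mean ** 2 - (1 - p) / p ** 2) < 1e-9  # Var(X) = (1-p)/p**2
```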

Theorem 18. This is really just a repeat of Corollary 11, but without the last line, which uses the independence assumption. Start with the variance identity (Theorem 7):
$$\mathrm{Var}(X + Y) = E((X + Y)^2) - (E(X + Y))^2.$$
Use Theorem 8 to get
$$\mathrm{Var}(X + Y) = E(X^2 + 2XY + Y^2) - (E(X))^2 - (E(Y))^2 - 2E(X)E(Y) = E(X^2) + E(Y^2) + 2E(XY) - (E(X))^2 - (E(Y))^2 - 2E(X)E(Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2(E(XY) - E(X)E(Y)) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y).$$

Theorem 19.

1. By definition, $\mathrm{Cov}(X, X) = E((X - E(X))(X - E(X))) = E((X - E(X))^2)$, which is the definition of the variance.

2. We have
$$\mathrm{Cov}(aX + b, cY + d) = E((aX + b - (aE(X) + b))(cY + d - (cE(Y) + d))) = E(ac(X - E(X))(Y - E(Y))) = ac\,\mathrm{Cov}(X, Y),$$
using the definition of covariance and Theorem 8.

3. That $\mathrm{Cov}(X, Y) = 0$ if $X$ and $Y$ are independent follows from Theorem 10. That the converse does not necessarily hold is shown by Example 46.

4. We have
$$\mathrm{Cor}(X, X) = \frac{\mathrm{Cov}(X, X)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(X)}} = \frac{\mathrm{Var}(X)}{\mathrm{Var}(X)} = 1.$$
For $\mathrm{Cor}(X, -X)$, note that 2 above shows that $\mathrm{Cov}(X, -X) = -\mathrm{Var}(X)$, from which $\mathrm{Cor}(X, -X) = -1$ follows immediately.
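Theorem 18 and part 2 of Theorem 19 can be checked on any small joint pmf; the one in the sketch below, and the constants $a$, $b$, $c$, $d$, are arbitrary.

```python
# Check of Theorem 18 and part 2 of Theorem 19 on an arbitrary joint pmf.
p_joint = {(0, 0): 0.3, (0, 2): 0.1, (1, 0): 0.2, (1, 2): 0.4}

E = lambda f: sum(f(x, y) * p for (x, y), p in p_joint.items())
EX, EY = E(lambda x, y: x), E(lambda x, y: y)
var_X = E(lambda x, y: (x - EX) ** 2)
var_Y = E(lambda x, y: (y - EY) ** 2)
cov = E(lambda x, y: (x - EX) * (y - EY))

var_sum = E(lambda x, y: (x + y - EX - EY) ** 2)           # Var(X + Y) directly
assert abs(var_sum - (var_X + var_Y + 2 * cov)) < 1e-12    # Theorem 18

a, b, c, d = 2.0, 1.0, -5.0, 3.0                            # arbitrary constants
cov_lin = E(lambda x, y: (a * x + b - (a * EX + b)) * (c * y + d - (c * EY + d)))
assert abs(cov_lin - a * c * cov) < 1e-12                   # Theorem 19, part 2
print("Theorem 18 and Theorem 19(2) check out.")
```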

5. Let $\sigma_X^2 = \mathrm{Var}(X)$, $\sigma_Y^2 = \mathrm{Var}(Y)$ and $c = \mathrm{Cov}(X, Y)$. Consider $\mathrm{Var}\!\left(X - \frac{c}{\sigma_Y^2} Y\right)$, and note that because this is a variance it must be non-negative. By Theorem 18, and also using Theorem 8 and 2 above, we get
$$\mathrm{Var}\!\left(X - \frac{c}{\sigma_Y^2} Y\right) = \mathrm{Var}(X) + \frac{c^2}{\sigma_Y^4}\,\mathrm{Var}(Y) - 2\,\frac{c}{\sigma_Y^2}\,\mathrm{Cov}(X, Y) = \sigma_X^2 + \frac{c^2}{\sigma_Y^2} - 2\,\frac{c^2}{\sigma_Y^2} = \sigma_X^2 - \frac{c^2}{\sigma_Y^2}.$$
Thus
$$\sigma_X^2 - \frac{c^2}{\sigma_Y^2} \geq 0,$$
and dividing through by $\sigma_X^2$ we get
$$\mathrm{Cor}(X, Y)^2 = \frac{c^2}{\sigma_X^2 \sigma_Y^2} \leq 1,$$
from which the result follows.

6. By 2 above, $\mathrm{Cov}(X, a + bX) = b\,\mathrm{Var}(X)$, and by Theorem 8, $\mathrm{Var}(a + bX) = b^2\,\mathrm{Var}(X)$. So we get
$$\mathrm{Cor}(X, a + bX) = \frac{b\,\mathrm{Var}(X)}{\sqrt{\mathrm{Var}(X)\, b^2\,\mathrm{Var}(X)}},$$
which gives the result, remembering that $\sqrt{b^2}$ is $b$ if $b > 0$ and $-b$ if $b < 0$.
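Parts 5 and 6 can likewise be verified numerically; in the sketch below the joint pmf, the marginal pmf and the constants $a$ and $b$ are all arbitrary.

```python
# Check of Theorem 19, parts 5 and 6: |Cor(X, Y)| <= 1 on an arbitrary joint pmf,
# and Cor(X, a + bX) = +/- 1 depending on the sign of b.
from math import sqrt

p_joint = {(0, 0): 0.3, (0, 2): 0.1, (1, 0): 0.2, (1, 2): 0.4}
E = lambda f: sum(f(x, y) * p for (x, y), p in p_joint.items())
EX, EY = E(lambda x, y: x), E(lambda x, y: y)
cov = E(lambda x, y: (x - EX) * (y - EY))
cor = cov / sqrt(E(lambda x, y: (x - EX) ** 2) * E(lambda x, y: (y - EY) ** 2))
assert -1.0 <= cor <= 1.0                               # part 5

p_X = {0: 0.4, 1: 0.6}                                  # a marginal pmf for part 6
a, b = 4.0, -2.5                                        # arbitrary; b < 0 here
mX = sum(x * p for x, p in p_X.items())
mL = a + b * mX
cov_XL = sum((x - mX) * (a + b * x - mL) * p for x, p in p_X.items())
sd_X = sqrt(sum((x - mX) ** 2 * p for x, p in p_X.items()))
sd_L = sqrt(sum((a + b * x - mL) ** 2 * p for x, p in p_X.items()))
assert abs(cov_XL / (sd_X * sd_L) - (-1.0)) < 1e-12     # b < 0 gives Cor = -1
print(cor)
```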

Theorem 20. We want $P(X_2 = x_2, X_3 = x_3, \dots, X_k = x_k \mid X_1 = x_1)$, which by the definition of conditional probability is
$$\frac{P(X_2 = x_2, X_3 = x_3, \dots, X_k = x_k, X_1 = x_1)}{P(X_1 = x_1)}.$$
By the formulae for the multinomial and binomial distributions, this becomes
$$\frac{\dfrac{n!}{x_1!\, x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}}{\dfrac{n!}{x_1!\, (n - x_1)!}\, p_1^{x_1} (1 - p_1)^{n - x_1}},$$
and various terms cancel, giving
$$\frac{(n - x_1)!\, p_2^{x_2} p_3^{x_3} \cdots p_k^{x_k}}{x_2! \cdots x_k!\, (1 - p_1)^{n - x_1}},$$
which is the same as
$$\frac{(n - x_1)!}{x_2! \cdots x_k!} \left(\frac{p_2}{1 - p_1}\right)^{x_2} \left(\frac{p_3}{1 - p_1}\right)^{x_3} \cdots \left(\frac{p_k}{1 - p_1}\right)^{x_k},$$
giving the result.

Theorem 21. For $x \geq 0$,
$$F_X(x) = P(X \leq x) = \int_0^x \lambda e^{-\lambda t}\, dt = 1 - e^{-\lambda x}.$$
Note that if $x < 0$, $P(X \leq x) = 0$ as $X$ cannot be negative, so in full
$$F_X(x) = \begin{cases} 1 - e^{-\lambda x} & x \geq 0, \\ 0 & x < 0. \end{cases}$$

Theorem 22. We have
$$E(X) = \int_0^{\infty} \lambda x e^{-\lambda x}\, dx.$$
Integration by parts gives
$$\lambda \left( \left[ -\frac{1}{\lambda} x e^{-\lambda x} \right]_0^{\infty} + \frac{1}{\lambda} \int_0^{\infty} e^{-\lambda x}\, dx \right).$$
As $x e^{-\lambda x} \to 0$ as $x \to \infty$, we get $\int_0^{\infty} e^{-\lambda x}\, dx$, which gives $1/\lambda$. For the variance see Exercise 51.

Theorem 23. By the definition of conditional probability, the left-hand side is
$$\frac{P(\{X > x + a\} \cap \{X > a\})}{P(X > a)}.$$
However $\{X > x + a\} \cap \{X > a\} = \{X > x + a\}$, so we get
$$\frac{P(X > x + a)}{P(X > a)} = \frac{e^{-\lambda(x + a)}}{e^{-\lambda a}} = e^{-\lambda x} = P(X > x),$$
where we have used Theorem 21 and the fact that it implies $P(X > x) = e^{-\lambda x}$ for all $x > 0$.

Theorem 24. For $x \in [a, b]$,
$$F_X(x) = \int_a^x \frac{1}{b - a}\, dt = \frac{x - a}{b - a}.$$
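Theorems 21 to 23 can be checked by crude numerical integration of the exponential pdf. In the sketch below, $\lambda$, $x$ and $a$ are arbitrary, and the improper integral for the mean is truncated at a point where the tail is negligible.

```python
# Check of Theorems 21-23 by crude midpoint-rule integration of the exponential
# pdf; lam, x and a are arbitrary, and the integral for the mean is truncated
# at a point where the tail is negligible.
from math import exp

lam, dx = 1.7, 1e-4
pdf = lambda t: lam * exp(-lam * t)
integral = lambda f, lo, hi: sum(
    f(lo + (k + 0.5) * dx) for k in range(round((hi - lo) / dx))) * dx

x, a = 0.8, 0.5
assert abs(integral(pdf, 0, x) - (1 - exp(-lam * x))) < 1e-6          # Theorem 21
assert abs(integral(lambda t: t * pdf(t), 0, 40) - 1 / lam) < 1e-4    # Theorem 22

memoryless = exp(-lam * (x + a)) / exp(-lam * a)       # P(X > x+a) / P(X > a)
assert abs(memoryless - exp(-lam * x)) < 1e-12         # equals P(X > x)
print("Exponential CDF, mean and memorylessness check out numerically.")
```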

Theorem 25. We have
$$E(X) = \int_a^b x\, \frac{1}{b - a}\, dx = \left[ \frac{x^2}{2(b - a)} \right]_a^b = \frac{b^2 - a^2}{2(b - a)} = \frac{b + a}{2}.$$
For the variance, first find
$$E(X^2) = \int_a^b x^2\, \frac{1}{b - a}\, dx = \left[ \frac{x^3}{3(b - a)} \right]_a^b = \frac{b^3 - a^3}{3(b - a)} = \frac{b^2 + ab + a^2}{3}.$$
Then
$$\mathrm{Var}(X) = \frac{b^2 + ab + a^2}{3} - \frac{(a + b)^2}{4} = \frac{b^2 + a^2 - 2ab}{12} = \frac{(b - a)^2}{12}.$$

Theorem 26. By definition,
$$\Phi(-z) = \int_{-\infty}^{-z} \phi(t)\, dt.$$
Change variables to $s = -t$ and use the symmetry of $\phi$ to get
$$\Phi(-z) = \int_z^{\infty} \phi(-s)\, ds = \int_z^{\infty} \phi(s)\, ds,$$
which is $1 - \Phi(z)$, as required.
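A numerical check of Theorems 25 and 26, writing $\Phi$ in terms of the error function; the endpoints $a$, $b$ and the value of $z$ are arbitrary.

```python
# Check of Theorem 25 (uniform mean and variance) and Theorem 26, writing the
# standard normal CDF Phi in terms of the error function; a, b, z are arbitrary.
from math import erf, sqrt

a, b = 2.0, 9.0
n = 100_000
h = (b - a) / n
mids = [a + (k + 0.5) * h for k in range(n)]           # midpoint rule on [a, b]
mean = sum(x * h / (b - a) for x in mids)
second = sum(x ** 2 * h / (b - a) for x in mids)

assert abs(mean - (a + b) / 2) < 1e-9
assert abs(second - mean ** 2 - (b - a) ** 2 / 12) < 1e-6

Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))           # standard normal CDF
z = 1.37
assert abs(Phi(-z) - (1 - Phi(z))) < 1e-12             # Theorem 26
print(mean, second - mean ** 2)                        # 5.5 and about 4.083
```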

Theorem 27. For the expectation,
$$E(Z) = \int_{-\infty}^{\infty} z\, \phi(z)\, dz = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} z e^{-z^2/2}\, dz.$$
Considering the improper integral as a limit, this is
$$\frac{1}{\sqrt{2\pi}} \lim_{s, t \to \infty} \int_{-s}^{t} z e^{-z^2/2}\, dz,$$
which becomes
$$\frac{1}{\sqrt{2\pi}} \lim_{s, t \to \infty} \left[ -e^{-z^2/2} \right]_{-s}^{t} = \frac{1}{\sqrt{2\pi}} \lim_{s, t \to \infty} \left( e^{-s^2/2} - e^{-t^2/2} \right) = 0.$$
For the variance, we need to calculate
$$E(Z^2) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} z^2 e^{-z^2/2}\, dz.$$
Writing $z^2 e^{-z^2/2}$ as $z \cdot z e^{-z^2/2}$ and integrating by parts, we get
$$E(Z^2) = \frac{1}{\sqrt{2\pi}} \left( \left[ -z e^{-z^2/2} \right]_{-\infty}^{\infty} + \int_{-\infty}^{\infty} e^{-z^2/2}\, dz \right).$$
The integral, together with the factor $1/\sqrt{2\pi}$, is just the integral of the Normal pdf again, so is 1, and $z e^{-z^2/2} \to 0$ both as $z \to \infty$ and as $z \to -\infty$. Hence we get $E(Z^2) = 1$, and so $\mathrm{Var}(Z) = 1 - E(Z)^2 = 1$, as $E(Z) = 0$.

Theorem 28. If $X = \mu + \sigma Z$, consider the (cumulative) distribution function of $X$:
$$F_X(x) = P(X \leq x) = P(\mu + \sigma Z \leq x) = P\!\left(Z \leq \frac{x - \mu}{\sigma}\right) = \Phi\!\left(\frac{x - \mu}{\sigma}\right).$$
To get the probability density function of $X$, differentiate, using the chain rule:
$$f_X(x) = F_X'(x) = \frac{1}{\sigma}\, \phi\!\left(\frac{x - \mu}{\sigma}\right).$$
That $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$ follows from Theorem 8.
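Finally, a Monte Carlo sketch of Theorems 27 and 28: sample means and variances of simulated values should be close to the theoretical ones, up to random fluctuation. The values of $\mu$, $\sigma$ and the sample size are arbitrary.

```python
# Monte Carlo sketch for Theorems 27 and 28: simulated standard normal values
# should have mean near 0 and variance near 1, and X = mu + sigma * Z should
# have mean near mu and variance near sigma**2, up to random fluctuation.
import random

random.seed(0)
mu, sigma, N = 5.0, 2.0, 200_000
zs = [random.gauss(0.0, 1.0) for _ in range(N)]
xs = [mu + sigma * z for z in zs]

mean = lambda v: sum(v) / len(v)
var = lambda v: mean([u ** 2 for u in v]) - mean(v) ** 2

print(mean(zs), var(zs))   # close to 0 and 1   (Theorem 27)
print(mean(xs), var(xs))   # close to 5 and 4   (Theorem 28)
```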