King Saud University, College of Science, Statistics and Operations Research Department. PROBABILITY (I), 215 STAT. By Weaam Alhadlaq


Topics to be covered
1. General Review: Basic concepts of probability and random variables (discrete and continuous cases), moments, the moment generating function and its applications.
2. Discrete Probability Distributions: Uniform - Bernoulli - Binomial - Negative Binomial - Geometric - Poisson - Hypergeometric.
3. Continuous Probability Distributions: Uniform - Exponential - Normal - Beta - Gamma - Chi-squared.
4. Bivariate Distributions (discrete and continuous cases): Joint probability function - Joint distribution function - Marginal distributions - Expectations of jointly distributed random variables - Joint moment generating function.
5. Conditional Distribution: Conditional distribution function - Conditional expectation - Moment generating function.
6. Stochastic Independence of random variables: Some consequent results of independence on joint probability functions - Covariance - Correlation.
7. Distributions of functions of random variables (Transformations): The case of one variable and two variables (discrete and continuous).

References
- Probability Theory and Its Applications: M. Ibraheem Akeel and A. M. Abuammoh, KSU.
- Introduction to the Statistical Theory: Ahmed Aodah, KSU.
- Probability Theory: Jalal Mostafa Assayyad, 1427 A.H., Haves Publications House, Jeddah.
- A First Course in Probability: S. Ross, Maxwell Macmillan.
- An Introduction to Probability and Its Applications: Larson & Marx, Prentice Hall.
- Modern Probability Theory and Its Applications: Parzen, John Wiley.
- Probability; An Introduction: G. Grimmett and D. Welsh, Oxford.
- Probability and Mathematical Statistics; An Introduction: L. Lukacs, Academic Press, New York, 1972.

Credits: (3+1)

Assessment Tasks for Students During the Semester (proportion of total assessment):
1. First Term Exam: 25%
2. Second Term Exam: 25%
3. Homework and Quizzes (tutorial): 10%
4. Final Exam: 40%

Contents
Topics to be covered
References
Chapter One: Basic Concepts of Probability and Random Variables
 1.1 Basic Concepts of Probabilities
 1.2 Random Variable
  1.2.1 Discrete Random Variable
  1.2.2 Continuous Random Variable
 1.3 Probability Function
  1.3.1 Discrete Case (Probability Mass Function)
  1.3.2 Continuous Case (Probability Density Function)
 1.4 Cumulative Distribution Function (CDF)
  1.4.1 Discrete Case
  1.4.2 Continuous Case
Chapter Two: Mathematical Expectation, Moments and Moment Generating Function
 2.1 Mathematical Expectation
  2.1.1 Expected Value of a Random Variable
  2.1.2 Variance of a Random Variable
  2.1.3 Standard Deviation
  2.1.4 Mean and Variance for Linear Combination
  2.1.5 Chebyshev's Inequality
 2.2 Central Moments and Raw Moments
  2.2.1 Central Moments
  2.2.2 Raw Moments
 2.3 Moment Generating Function
  2.3.1 Finding Moments from MGF
  2.3.2 Moment Generating Function for Linear Combination
Chapter Three: Frequently Used Discrete Probability Distributions
 3.1 Discrete Uniform Distribution
 3.2 Bernoulli Distribution
 3.3 Binomial Distribution
 3.4 Geometric Distribution
 3.5 Negative Binomial Distribution
 3.6 Hypergeometric Distribution
  3.6.1 Binomial Approximation to Hypergeometric Distribution
 3.7 Poisson Distribution
  3.7.1 Poisson Approximation to Binomial Distribution
Chapter Four: Frequently Used Continuous Probability Distributions
 4.1 Uniform Distribution
 4.2 Exponential Distribution
  4.2.1 Lack of Memory Property (Memoryless)
  4.2.2 Relation Between Exponential and Poisson Distributions
 4.3 Gamma Distribution
  4.3.1 Gamma Function
  4.3.2 Some Useful Properties for the Gamma Distribution
  4.3.3 Definition of Gamma Distribution
  4.3.4 Special Cases
  4.3.5 Incomplete Gamma Function
 4.4 Chi-squared Distribution
 4.5 Beta Distribution
 4.6 Normal Distribution
  4.6.1 Some Properties of the Normal Curve f(x) of N(μ, σ)
  4.6.2 Standard Normal Distribution
  4.6.3 Calculating Probabilities of the Standard Normal Distribution
  4.6.4 Calculating Probabilities of the Normal Distribution
  4.6.5 Normal Approximation to Binomial
 4.7 Student's t Distribution
  4.7.1 Some Properties of the t Distribution
Chapter Five: Joint, Marginal, and Conditional Distributions
 5.1 Joint Distributions
  5.1.1 Joint Probability Function
  5.1.2 Joint Distribution Function
  5.1.3 Marginal Probability Distributions
  5.1.4 Joint Mathematical Expectation
  5.1.5 Joint Moment Generating Function
 5.2 Conditional Distributions
  5.2.1 Conditional Probability Function
  5.2.2 Conditional Distribution Function
  5.2.3 Conditional Expectation
  5.2.4 Moment Generating Function for Conditional Distributions
Chapter Six: Covariance, Correlation, Independence of Variables (Stochastic Independence)
 6.1 Covariance of Random Variables
 6.2 Correlation Coefficient
 6.3 Independence of Random Variables
Chapter Seven: Distributions of Functions of Random Variables (Transformations)
 7.1 Discrete Case
  7.1.1 The Case of One Variable
  7.1.2 The Case of Two Variables
 7.2 Continuous Case
  7.2.1 Distribution Function Method (CDF)
  7.2.2 Change-of-Variable Method
  7.2.3 Moment-Generating Function Method
Appendix A

Chapter One
Basic Concepts of Probability and Random Variables

1.1 Basic Concepts of Probabilities

An Experiment
An experiment is some procedure (or process) that we do.

Sample Space
The sample space of an experiment is the set of all possible outcomes of the experiment. It is also called the universal set, and is denoted by Ω.

An Event
Any subset of the sample space, A ⊆ Ω, is called an event.
Note
φ ⊆ Ω is the impossible event.
Ω ⊆ Ω is the sure event.

Complement of an Event
The complement of the event A is denoted by A^c or A'. The event A^c consists of all outcomes of Ω that are not in A.

Probability
Probability is a measure (or number) used to measure the chance of the occurrence of some event. This number is between 0 and 1.

Equally Likely Outcomes
The outcomes of an experiment are equally likely if the outcomes have the same chance of occurrence.

Probability of an Event
If the experiment has n(Ω) equally likely outcomes, then the probability of the event E is denoted by P(E) and is defined by:
P(E) = n(E) / n(Ω).

Thus, probabilities have the following properties:
1. 0 ≤ P(A) ≤ 1 for each event A.
2. P(Ω) = 1.
3. P(φ) = 0.
4. P(A^c) = 1 − P(A).

Some Operations on Events
Union: The event A ∪ B consists of all outcomes in A or in B or in both A and B.
Intersection: The event A ∩ B consists of all outcomes in both A and B.
Sub event: The event A is called a sub event of B (A ⊆ B) if event B occurs whenever the event A occurs.

Addition Rule
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Mutually Exclusive (Disjoint) Events
Two events are mutually exclusive if they have no sample points in common, or equivalently, if they have empty intersection. Events A_1, A_2, ..., A_n are mutually exclusive if A_i ∩ A_j = φ for all i ≠ j, where φ denotes the empty set with no sample points. For this case:
1. P(A ∩ B) = 0.
2. A and A^c are disjoint.
3. P(A ∪ B) = P(A) + P(B). (special case of the addition rule)
4. P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i).
(Figure: two Venn diagrams, one showing disjoint events and one showing joint (not disjoint) events.)

Exhaustive Events
The events A_1, A_2, ..., A_n are exhaustive events if A_1 ∪ A_2 ∪ ... ∪ A_n = Ω. For this case:
P(A_1 ∪ A_2 ∪ ... ∪ A_n) = P(Ω) = 1.
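The following short Python sketch (not part of the original notes; the die, the events and the helper function are illustrative choices) shows how the rule P(E) = n(E)/n(Ω) and the addition rule can be checked by direct enumeration for equally likely outcomes.

```python
# Illustrative sketch, assuming equally likely outcomes of one roll of a fair die.
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}          # sample space
A = {1, 2}                          # an event
B = {2, 3}                          # another event

def prob(event, sample_space=omega):
    """P(E) = n(E)/n(Omega) for equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

lhs = prob(A | B)                               # P(A u B) computed directly
rhs = prob(A) + prob(B) - prob(A & B)           # addition rule
print(lhs, rhs, lhs == rhs)                     # 1/2 1/2 True
```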

A Partition
A collection of events A_1, A_2, ..., A_n of a sample space Ω is called a partition of Ω if A_1, A_2, ..., A_n are mutually exclusive and exhaustive events.

Example 1.1
Ω = {1,2,3,4,5,6}, A = {1,2}, B = {2,3} and C = {1,5,6}. Find P(A), P(B), P(C), P(A ∩ B), P(B ∩ C) and P(A ∪ C).
P(A) = P(B) = 1/3, P(C) = 1/2, P(A ∩ B) = 1/6, P(B ∩ C) = 0,
P(A ∪ C) = P(A) + P(C) − P(A ∩ C) = 1/3 + 1/2 − 1/6 = 2/3.
What can we say about B and C? Since P(B ∩ C) = 0, B and C are disjoint events.

Conditional Probability
The conditional probability of A, given B, written as P(A|B), is defined to be
P(A|B) = P(A ∩ B) / P(B).

Independence
Two events A and B are said to be independent if B provides no information about whether A has occurred and vice versa. In symbols:
- P(A ∩ B) = P(A)P(B), or
- P(A|B) = P(A), or
- P(B|A) = P(B).

Return to Example 1.1. Compute P(A|C).
P(A|C) = P(A ∩ C)/P(C) = (1/6)/(1/2) = 1/3.
Are A and C independent? Since P(A|C) = 1/3 = P(A), A and C are independent.
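As a quick check, the conditional probability and the independence of A and C in Example 1.1 can also be verified by enumeration. This Python sketch is not part of the original notes and only restates the computation above in code.

```python
# Illustrative sketch: conditional probability and an independence check
# for the events of Example 1.1 (equally likely outcomes).
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
A, C = {1, 2}, {1, 5, 6}

def prob(event):
    return Fraction(len(event), len(omega))

p_A_given_C = prob(A & C) / prob(C)          # P(A|C) = P(A n C)/P(C)
print(p_A_given_C)                           # 1/3
print(p_A_given_C == prob(A))                # True -> A and C are independent
print(prob(A & C) == prob(A) * prob(C))      # True, the product rule agrees
```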

1.2 Random Variable
The outcome of an experiment need not be a number; for example, the outcome when a coin is tossed can be 'heads' or 'tails'. However, we often want to represent outcomes as numbers. A random variable is a function that associates a unique numerical value with every outcome of an experiment, X: Ω → R. The value of the random variable will vary from trial to trial as the experiment is repeated. There are two types of random variable: discrete and continuous.

1.2.1 Discrete Random Variable
The random variable X is discrete and is said to have a discrete distribution if it can take on values only from a finite set X ∈ {x_1, x_2, ..., x_n} or a countably infinite sequence X ∈ {x_1, x_2, ...}. Discrete random variables usually represent count data, such as the number of children in a family, the Friday night attendance at a cinema, the number of patients in a doctor's surgery and the number of defective light bulbs in a box of ten.

Example 1.2
Consider the experiment of successive tosses of a coin. Define a variable X as X = 1 if the first head occurs on an even-numbered toss and X = 0 if the first head occurs on an odd-numbered toss; and let the variable Y denote the number of the toss on which the first head occurs. The sample space for this experiment is Ω = {H, TH, TTH, TTTH, ...}. Therefore,
w:     H    TH   TTH   TTTH   ...
X(w):  0    1    0     1      ...
Y(w):  1    2    3     4      ...
Both X and Y are discrete random variables, where the set of all possible values of X is {0,1} (finite), and the set of all possible values of Y is {1,2,3,4,...} (infinite but countable).

1.2.2 Continuous Random Variable
A continuous random variable usually can assume numerical values from an interval of real numbers, perhaps the whole set of real numbers R; X ∈ {x: a < x < b; a, b ∈ R}. Continuous random variables are usually measurements, for example, height, weight, the amount of sugar in an orange, the time required to run a mile.

1.3 Probability Function

1.3.1 Discrete Case (Probability Mass Function)
The probability distribution of a discrete random variable is a list of probabilities associated with each of its possible values. It is called the probability mass function (pmf), which is usually denoted by f_X(x), f(x), p(x) or p_x, and is equal to P(X = x). The probability mass function must satisfy
1. f(x) ≥ 0 for all x,
2. Σ_x f(x) = 1.

Example 1.3
Consider the following game. A fair 4-sided die, with the numbers 1, 2, 3, 4, is rolled twice. If the score on the second roll is strictly greater than the score on the first, the player wins the difference in euro. If the score on the second roll is strictly less than the score on the first roll, the player loses the difference in euro. If the scores are equal, the player neither wins nor loses. If we let X denote the (possibly negative) winnings of the player, what is the probability mass function of X?
The total number of outcomes of the experiment is 4 × 4 = 16. The sample space of this experiment is Ω = {(1,1), (1,2), (1,3), (1,4), (2,1), (2,2), (2,3), ..., (4,3), (4,4)}. Thus, X can take any of the values −3, −2, −1, 0, 1, 2, 3. The pmf can be found as follows:
f(−3) = P(X = −3) = P{(4,1)} = 1/16,
f(−2) = P(X = −2) = P{(4,2)} + P{(3,1)} = 2/16,
...
f(3) = P(X = 3) = P{(1,4)} = 1/16.
Hence, the distribution of X can be written as
X:     −3    −2    −1    0     1     2     3     Total
f(x):  1/16  2/16  3/16  4/16  3/16  2/16  1/16  1

Example 1.4
Consider an experiment of tossing an unfair coin twice. The probability is 0.6 that the coin will turn up heads on any given toss, and let X be defined as the number of

heads observed. Find the range (possible values) of X, as well as its probability mass function. Then find P(X = 1), P(X = 1.5), P(1 ≤ X ≤ 3), P(1 < X ≤ 3), P(X > 4) and P(X > −2).
I. The sample space of this experiment is Ω = {HH, HT, TH, TT}; therefore, the number of heads will be 0, 1 or 2. Thus, the possible values of X are {0, 1, 2}. Since this is a finite countable set, the random variable X is discrete. Next, we need to find the pmf:
w:  HH  HT  TH  TT
X:  2   1   1   0
P(H) = 0.6 and P(T) = 0.4; therefore we have
f(0) = P(X = 0) = P(TT) = (0.4)(0.4) = 0.16,
f(1) = P(X = 1) = P(HT) + P(TH) = (0.6)(0.4) + (0.4)(0.6) = 0.48,
f(2) = P(X = 2) = P(HH) = (0.6)(0.6) = 0.36.
Hence, the distribution of X can be written as
X:     0     1     2     Total
f(x):  0.16  0.48  0.36  1
II. Now we can calculate the probabilities as follows:
1. P(X = 1) = 0.48,
2. P(X = 1.5) = 0,
3. P(1 ≤ X ≤ 3) = P(X = 1) + P(X = 2) = 0.48 + 0.36 = 0.84,
4. P(1 < X ≤ 3) = P(X = 2) = 0.36,
5. P(X > 4) = 0,
6. P(X > −2) = P(X = 0) + P(X = 1) + P(X = 2) = 1.
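A pmf of this kind can also be built by enumerating the outcomes directly. The following Python sketch (not part of the original notes) reconstructs the pmf of Example 1.4 from the four outcomes of the two tosses.

```python
# Illustrative sketch: the pmf of Example 1.4 by enumeration of HH, HT, TH, TT.
from itertools import product

p = {"H": 0.6, "T": 0.4}
pmf = {}
for w in product("HT", repeat=2):            # the four outcomes of two tosses
    x = w.count("H")                         # X = number of heads observed
    pmf[x] = pmf.get(x, 0) + p[w[0]] * p[w[1]]

print({k: round(v, 2) for k, v in pmf.items()})   # {2: 0.36, 1: 0.48, 0: 0.16}
print(round(sum(pmf.values()), 10))               # 1.0, the pmf sums to one
print(round(pmf[1] + pmf[2], 2))                  # P(1 <= X <= 3) = 0.84
```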

Example 1.5
Suppose the range of a discrete random variable is {1, 2, 3, 4}, and the probability mass function is f(x) = cx for x = 1, 2, 3, 4. Find the value of c, then calculate P(X = 3.25), P(X > 2) and P(1 < X ≤ 5).
I. Since f(x) is a pmf, it should satisfy two conditions:
1. f(x) ≥ 0, so c ≥ 0.
2. Σ_x f(x) = 1, so f(1) + f(2) + f(3) + f(4) = c + 2c + 3c + 4c = 10c = 1, giving c = 1/10.
Thus, f(x) = x/10; x = 1, 2, 3, 4.
II.
1. P(X = 3.25) = 0.
2. P(X > 2) = f(3) + f(4) = 3/10 + 4/10 = 7/10 = 0.7.
3. P(1 < X ≤ 5) = f(2) + f(3) + f(4) = 2/10 + 3/10 + 4/10 = 9/10 = 0.9.

1.3.2 Continuous Case (Probability Density Function)
The probability density function of a continuous random variable is a function which can be integrated to obtain the probability that the random variable takes a value in a given interval. It is called the probability density function (pdf) and is usually denoted by f_X(x) or f(x). Symbolically,
P(a < X < b) = ∫_a^b f(x) dx = the area under the curve of f(x) over the interval (a, b).
The probability density function must satisfy
1. f(x) ≥ 0 for all x,
2. ∫_{−∞}^{∞} f(x) dx = 1.
Note: In the continuous case, for any x ∈ R,
1. f(x) ≠ P(X = x),
2. P(X = x) = 0,
3. P(a < X < b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a ≤ X ≤ b).

Example 1.6
Let f(x) = 2x for 0 < x < 1, and f(x) = 0 otherwise.
I. Check whether f(x) is a pdf.

Since
1. 0 < x < 1 implies f(x) = 2x > 0,
2. ∫_0^1 2x dx = x² |_0^1 = 1,
f(x) is a pdf.
II. Calculate P(1/4 < X < 1/2), P(X = 0.5) and P(1/2 < X < 0.75).
1. P(1/4 < X < 1/2) = ∫_{1/4}^{1/2} 2x dx = x² |_{1/4}^{1/2} = 1/4 − 1/16 = 3/16 = 0.1875.
2. P(X = 0.5) = 0.
3. P(1/2 < X < 0.75) = ∫_{1/2}^{0.75} 2x dx = (0.75)² − (0.5)² = 0.3125.

Example 1.7
Let X be a continuous random variable with the following pdf:
f(x) = ke^(−x) for x ≥ 0, and f(x) = 0 otherwise.
I. Find k.
The probability density function must satisfy two conditions:
1. f(x) ≥ 0, so ke^(−x) ≥ 0, that is k ≥ 0.
2. ∫_0^∞ ke^(−x) dx = 1, so k[−e^(−x)]_0^∞ = k[0 − (−1)] = k = 1.
Thus f(x) = e^(−x); x ≥ 0.
II. Find P(1 < X < 3), P(X > 4) and P(X ≤ 4).
1. P(1 < X < 3) = ∫_1^3 e^(−x) dx = −[e^(−3) − e^(−1)] = e^(−1) − e^(−3) = 0.3181.
2. P(X > 4) = ∫_4^∞ e^(−x) dx = e^(−4) = 0.0183.
3. P(X ≤ 4) = 1 − P(X > 4) = 0.9817.
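The same integrals can be checked numerically. The Python sketch below (not part of the original notes; it assumes numpy and scipy are available) verifies that the density of Example 1.7 with k = 1 integrates to one and reproduces the probabilities just computed.

```python
# Illustrative sketch: numerical check of Example 1.7 with scipy.integrate.quad.
import numpy as np
from scipy.integrate import quad

f = lambda x: np.exp(-x)                     # pdf of Example 1.7 with k = 1

total, _ = quad(f, 0, np.inf)                # should be 1
p_1_3, _ = quad(f, 1, 3)                     # P(1 < X < 3)
p_gt_4, _ = quad(f, 4, np.inf)               # P(X > 4)

print(round(total, 4), round(p_1_3, 4), round(p_gt_4, 4))
# 1.0 0.3181 0.0183
```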

1.4 Cumulative Distribution Function (CDF)
All random variables (discrete and continuous) have a cumulative distribution function (CDF), denoted by F(x). It is a function giving the probability that the random variable X is less than or equal to x, for every value x. Formally, the cumulative distribution function F(x) is defined to be
F(x) = P(X ≤ x) for −∞ < x < ∞.

1.4.1 Discrete Case
For a discrete random variable, the cumulative distribution function is found by summing up the probabilities:
F(x) = Σ_{t ≤ x} f(t).

Example 1.8
Return to Examples 1.4 and 1.5. Find the distribution function (CDF) for the r.v. X.
For Example 1.4 we have
X:     0     1     2
f(x):  0.16  0.48  0.36
F(x):  0.16  0.64  1
In a formal way,
F(x) = 0      for x < 0,
     = 0.16   for 0 ≤ x < 1,
     = 0.64   for 1 ≤ x < 2,
     = 1      for x ≥ 2.
Thus, we can immediately calculate:
P(X ≤ 1) = F(1) = 0.64.
P(X > 0) = 1 − P(X ≤ 0) = 1 − F(0) = 1 − 0.16 = 0.84, and so on.
For Example 1.5 we have
F(x) = P(X ≤ x) = Σ_{i=1}^{x} i/10.
Say we want to calculate P(X ≤ 2) = F(2) = 1/10 + 2/10 = 3/10 = 0.3.
Generally,
X:     1    2    3    4
f(x):  0.1  0.2  0.3  0.4
F(x):  0.1  0.3  0.6  1
Formally,
F(x) = 0     for x < 1,
     = 0.1   for 1 ≤ x < 2,
     = 0.3   for 2 ≤ x < 3,
     = 0.6   for 3 ≤ x < 4,
     = 1     for x ≥ 4.

1.4.2 Continuous Case
For a continuous random variable, the cumulative distribution function is the integral of its probability density function over the interval (−∞, x):
F(x) = ∫_{−∞}^{x} f(t) dt.
Result
The different inequality probabilities can be written by using the CDF as follows:
P(X ≤ a) = F(a),
P(X > a) = 1 − P(X ≤ a) = 1 − F(a),
P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a) = F(b) − F(a).

Example 1.9
Return to Examples 1.6 and 1.7. Find the distribution function (CDF) for the r.v. X.
For Example 1.6 we have
F(x) = ∫_0^x 2t dt = x² for 0 < x < 1.
Formally,
F(x) = 0    for x < 0,
     = x²   for 0 ≤ x < 1,
     = 1    for x ≥ 1.
Now, we can immediately calculate probabilities of the form P(X < x) or P(X ≤ x), such as
P(X < 0.5) = F(0.5) = 0.5² = 0.25, P(X < 3) = 1, P(X < −2) = 0.
For Example 1.7 we have
F(x) = ∫_0^x e^(−t) dt = −[e^(−x) − e^0] = 1 − e^(−x) for x > 0.
Formally,
F(x) = 0            for x < 0,
     = 1 − e^(−x)   for x ≥ 0.
Now, let us find
P(X < 4) = F(4) = 1 − e^(−4) = 0.9817,
P(0.25 < X < 0.5) = F(0.5) − F(0.25) = (1 − e^(−0.5)) − (1 − e^(−0.25)) = 0.1723.

Chapter Two
Mathematical Expectation, Moments and Moment Generating Function

2.1 Mathematical Expectation
In this section, we learn a general definition of mathematical expectation, as well as some specific mathematical expectations, such as the mean and variance.

2.1.1 Expected Value of a Random Variable
For a random variable X, the expected value (mean, average, a predicted value of a variable) is denoted by E(X), μ_X or μ.
Discrete case
If f(x) is the pmf of the discrete random variable X, then the expected value of X is
μ = E(X) = Σ_x x f(x) = Σ_x x P(X = x).
Continuous case
If f(x) is the pdf of the continuous random variable X, then the expected value of X is
E(X) = ∫_{−∞}^{∞} x f(x) dx.
Note: although the integral is written with lower limit −∞ and upper limit ∞, the interval of integration is the interval of non-zero density for X.

Example 2.1
Compute the expected values of the r.v.'s presented in Examples 1.3 and 1.5.
For Example 1.3, the expected value is calculated by E(X) = Σ_x x f(x). Thus,
X:      −3     −2     −1     0     1     2     3     Total
f(x):   1/16   2/16   3/16   4/16  3/16  2/16  1/16  1
xf(x):  −3/16  −4/16  −3/16  0     3/16  4/16  3/16  E(X) = 0
For Example 1.5, the pmf is f(x) = x/10; x = 1, 2, 3, 4. Then
E(X) = Σ_x x f(x) = 1·f(1) + 2·f(2) + 3·f(3) + 4·f(4) = 1(0.1) + 2(0.2) + 3(0.3) + 4(0.4) = 3.

Example 2.2
Compute the expected values of the r.v.'s presented in Examples 1.6 and 1.7.
For Example 1.6, the pdf is f(x) = 2x; 0 < x < 1. Thus,
E(X) = ∫ x f(x) dx = ∫_0^1 x(2x) dx = 2/3.
For Example 1.7, the pdf is f(x) = e^(−x); x > 0. Hence,
E(X) = ∫_0^∞ x e^(−x) dx.
Use integration by parts: u = x, dv = e^(−x) dx, du = dx, v = −e^(−x). Hence,
E(X) = ∫_0^∞ x e^(−x) dx = [−x e^(−x)]_0^∞ + ∫_0^∞ e^(−x) dx = 0 + [−e^(−x)]_0^∞ = 1.

Expectation of a Function
If g is a function, then E(g(X)) is equal to Σ_x g(x) f(x) if X is a discrete random variable, and it is equal to ∫ g(x) f(x) dx if X is a continuous random variable.

Corollary
Let a, b ∈ R be two constants and g_1, g_2 be two functions of a random variable X. Then
E(aX ± b) = aE(X) ± b,
E(a g_1(X) + b g_2(X)) = a E(g_1(X)) + b E(g_2(X)).

Corollary
If X_1, X_2, ..., X_n are n independent r.v.'s and g_1, g_2, ..., g_n are any functions, then
E[g_1(X_1) g_2(X_2) ... g_n(X_n)] = E[g_1(X_1)] E[g_2(X_2)] ... E[g_n(X_n)] = Π_{i=1}^n E[g_i(X_i)].

Example 2.3
Compute the expected values of g(X) = X² − 1 and h(X) = 3X + 2 in Examples 1.3 and 1.5.
For Example 1.3, the expected value is calculated by

E(g(X)) = E(X² − 1) = E(X²) − 1. Thus,
X:       −3    −2    −1    0     1     2     3     Total
f(x):    1/16  2/16  3/16  4/16  3/16  2/16  1/16  1
x²f(x):  9/16  8/16  3/16  0     3/16  8/16  9/16  E(X²) = 2.5
Hence, E(X² − 1) = 2.5 − 1 = 1.5. Also, E(3X + 2) = 3E(X) + 2 = 3(0) + 2 = 2.
For Example 1.5, the expected value is calculated by
E(g(X)) = E(X² − 1) = E(X²) − 1 = (Σ_x x² f(x)) − 1 = 1(0.1) + 4(0.2) + 9(0.3) + 16(0.4) − 1 = 10 − 1 = 9.
Also, E(3X + 2) = 3E(X) + 2 = 3(3) + 2 = 11.

Example 2.4
Compute the expected value of g_1(X) = X^(2/3) in Example 1.6, and the expected value of g_2(X) = X² + X in Example 1.7.
For Example 1.6, the pdf is f(x) = 2x; 0 < x < 1. Thus,
E(g_1(X)) = E(X^(2/3)) = ∫ x^(2/3) f(x) dx = ∫_0^1 x^(2/3) (2x) dx = ∫_0^1 2x^(5/3) dx = (6/8) x^(8/3) |_0^1 = 6/8.
For Example 1.7, the pdf is f(x) = e^(−x); x > 0. Hence,
E(g_2(X)) = E(X² + X) = E(X²) + E(X).
E(X²) = ∫_0^∞ x² e^(−x) dx.
Use integration by parts: u = x², dv = e^(−x) dx, du = 2x dx, v = −e^(−x). Hence,
E(X²) = [−x² e^(−x)]_0^∞ + ∫_0^∞ 2x e^(−x) dx = 2 ∫_0^∞ x e^(−x) dx = 2E(X) = 2(1) = 2.
Therefore, E(g_2(X)) = 2 + 1 = 3.

2.1.2 Variance of a Random Variable
The variance (denoted by V(X), σ²_X or σ²) is a measure of the "dispersion" of X about the mean. A large variance indicates significant levels of probability or

density for points far from E(X). The variance is always nonnegative (σ² ≥ 0). It is defined as the average of the squared differences from the mean. Symbolically,
V(X) = E[(X − μ_X)²].
This is equivalent to
V(X) = E(X²) − μ_X² = E(X²) − [E(X)]²,
where E(X²) = Σ_x x² f(x) if X is a discrete r.v., and E(X²) = ∫ x² f(x) dx if X is a continuous r.v.
The second formula is commonly used in calculations.

2.1.3 Standard Deviation
The standard deviation of the random variable X is the square root of the variance, and is denoted by σ_X = √(σ²_X) = √V(X).

Corollary
Let a, b ∈ R be two constants. If X is a random variable with variance V(X), then V(aX ± b) = a² V(X).

Example 2.5
Compute the variance and standard deviation of the r.v. presented in Example 1.3. Then calculate V(X − 6).
V(X) = E(X²) − [E(X)]².
From Example 2.1 we found that E(X) = 0. Now, let us calculate E(X²):
X:       −3    −2    −1    0     1     2     3     Total
f(x):    1/16  2/16  3/16  4/16  3/16  2/16  1/16  1
x²f(x):  9/16  8/16  3/16  0     3/16  8/16  9/16  E(X²) = 2.5
Then V(X) = E(X²) − [E(X)]² = 2.5 − 0 = 2.5, σ_X = √2.5 = 1.58, and V(X − 6) = V(X) = 2.5.
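The following Python sketch (not part of the original notes) repeats the computation of Example 2.5 numerically from the pmf of Example 1.3, using the shortcut V(X) = E(X²) − [E(X)]².

```python
# Illustrative sketch: mean, E(X^2), variance and standard deviation for
# the pmf of Example 1.3 (winnings in the 4-sided-die game).
import numpy as np

x = np.array([-3, -2, -1, 0, 1, 2, 3])
f = np.array([1, 2, 3, 4, 3, 2, 1]) / 16          # pmf values

mean = np.sum(x * f)                              # E(X)
ex2 = np.sum(x ** 2 * f)                          # E(X^2)
var = ex2 - mean ** 2                             # V(X) = E(X^2) - [E(X)]^2
print(mean, ex2, var, np.sqrt(var))               # 0.0 2.5 2.5 1.5811...
```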

Example 2.6
Compute the variance and standard deviation of the r.v. presented in Example 1.6. Then calculate V(3X − 6).
From Example 2.2 we found that E(X) = 2/3.
E(X²) = ∫ x² f(x) dx = ∫_0^1 x²(2x) dx = 2/4 = 0.5.
V(X) = E(X²) − [E(X)]² = 0.5 − (2/3)² = 0.0556, σ_X = √0.0556 = 0.2357, and V(3X − 6) = 3² V(X) = 9(0.0556) = 0.5.

2.1.4 Mean and Variance for Linear Combination
Suppose X_1, X_2, ..., X_n are n independent random variables with means μ_1, μ_2, ..., μ_n and variances σ²_1, σ²_2, ..., σ²_n. Then the mean and variance of the linear combination Y = Σ_{i=1}^n a_i X_i, where a_1, a_2, ..., a_n are real constants, are
μ_Y = E(Y) = Σ_{i=1}^n a_i μ_i = a_1 E(X_1) + a_2 E(X_2) + ... + a_n E(X_n), and
σ²_Y = V(Y) = Σ_{i=1}^n a²_i σ²_i = a²_1 V(X_1) + a²_2 V(X_2) + ... + a²_n V(X_n),
respectively.

2.1.5 Chebyshev's Inequality
If X is a random variable with mean μ and standard deviation σ, then for any real number k > 0,
P[|X − μ| > kσ] ≤ 1/k², or equivalently
P(μ − kσ < X < μ + kσ) ≥ 1 − 1/k².

Example 2.7
Let Y be the outcome when a single fair die is rolled, with E(Y) = 3.5 and V(Y) = 2.9. Evaluate P(1 ≤ Y ≤ 6).
Since the distribution is treated as unknown, we cannot compute the exact value of the probability. To estimate the probability we will use Chebyshev's inequality.
kσ = 2.5, so k = 2.5/√2.9 = 1.47. Thus,
P(1 ≤ Y ≤ 6) = P(3.5 − 2.5 ≤ Y ≤ 3.5 + 2.5) ≥ 1 − 1/(1.47)² = 0.537.
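To see how conservative the Chebyshev bound is, the sketch below (not part of the original notes) computes the exact mean and standard deviation of the fair die, the corresponding k for the interval 3.5 ± 2.5, the Chebyshev lower bound and the exact probability. Because it uses the exact variance 35/12 rather than the rounded 2.9, the bound differs slightly from the 0.537 above.

```python
# Illustrative sketch: Chebyshev bound of Example 2.7 versus the exact probability.
import numpy as np

y = np.arange(1, 7)
f = np.full(6, 1 / 6)                      # fair-die pmf
mu = np.sum(y * f)                         # 3.5
sigma = np.sqrt(np.sum((y - mu) ** 2 * f)) # about 1.708

k = 2.5 / sigma                            # interval 3.5 +/- 2.5 covers 1..6
bound = 1 - 1 / k ** 2                     # Chebyshev lower bound
exact = np.sum(f[(y >= 1) & (y <= 6)])     # exact probability, which is 1
print(round(k, 2), round(bound, 3), round(exact, 3))   # 1.46 0.533 1.0
```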

Example 2.8
Toss 100 fair coins and let X count the number of heads, so that E(X) = 50 and V(X) = 25. Estimate the probability that 40 ≤ X ≤ 60.
To estimate the probability we will use Chebyshev's inequality.
60 = μ + σk, so 60 = 50 + 5k, hence 10 = 5k and k = 2.
P(40 < X < 60) ≥ 1 − 1/2² = 3/4.

2.2 Central Moments and Raw Moments

2.2.1 Central Moments
The r-th central moment of a random variable X (moment about the mean μ), denoted by μ_r, is the expected value of (X − μ)^r; symbolically,
μ_r = E[(X − μ)^r] for r = 0, 1, 2, ....
Therefore,
μ_0 = E[(X − μ)^0] = E(1) = 1.
The first central moment μ_1 = E[(X − μ)] = E(X) − μ = 0.
The second central moment μ_2 = E[(X − μ)²] = V(X) = σ² (the variance).

2.2.2 Raw Moments
The r-th moment about the origin of a random variable X, denoted by μ'_r, is the expected value of X^r; symbolically,
μ'_r = E(X^r) for r = 0, 1, 2, ....
Therefore,
μ'_0 = E(X^0) = E(1) = 1.
The first raw moment μ'_1 = E(X) = μ (the expected value of X).
The second raw moment μ'_2 = E(X²).

Notes
It is known that V(X) = E(X²) − [E(X)]²; thus μ_2 = μ'_2 − (μ'_1)².

Henceforth, the term "moments" will refer to raw moments.

2.3 Moment Generating Function
If X is a random variable, then its moment generating function (MGF), denoted by M_X(t) or M(t), is defined as
M_X(t) = E(e^(tX)) = Σ_x e^(tx) f(x) if X is a discrete r.v., and
M_X(t) = E(e^(tX)) = ∫ e^(tx) f(x) dx if X is a continuous r.v.
We say that the MGF of X exists if there exists a positive constant h such that M_X(t) is finite for all t ∈ [−h, h].

Notes
We always have M_X(0) = E[e^(0·X)] = 1.
There are some cases where the MGF does not exist.

Example 2.9
For each of the following random variables, find the MGF.
I. X is a discrete random variable, with pmf
f(x) = 1/3 for x = 1, and f(x) = 2/3 for x = 2.
M_X(t) = E(e^(tX)) = Σ_{x=1}^{2} e^(tx) f(x) = e^(t) f(1) + e^(2t) f(2) = (1/3)e^t + (2/3)e^(2t).
II. Y is a continuous random variable, with pdf
f(y) = 1 for 0 < y < 1.
M_Y(t) = E(e^(tY)) = ∫_0^1 e^(ty) dy = e^(ty)/t |_0^1 = (e^t − 1)/t.

Why is the MGF useful?
There are basically two reasons for this:
First, the MGF of X gives us all moments of X. That is why it is called the moment generating function.

Second, the MGF (if it exists) uniquely determines the distribution. That is, if two random variables have the same MGF, then they must have the same distribution. Thus, if you find the MGF of a random variable, you have indeed determined its distribution.

2.3.1 Finding Moments from MGF
Remember the Taylor series for e^x: for all x ∈ R, we have
e^x = 1 + x + x²/2! + x³/3! + ... = Σ_{k=0}^∞ x^k/k!.
Now, we can write
e^(tx) = Σ_{k=0}^∞ (tx)^k/k! = Σ_{k=0}^∞ t^k x^k/k!.
Thus, we have
M_X(t) = E[e^(tX)] = Σ_{k=0}^∞ E(X^k) t^k/k! = 1 + E(X)t + E(X²) t²/2! + E(X³) t³/3! + ....

Proposition
The r-th moment about the origin can be found by evaluating the r-th derivative of the moment generating function at t = 0. That is,
d^r/dt^r M(t) |_{t=0} = M^(r)(0) = E(X^r) = μ'_r.

Example 2.10
Let X be a r.v. with MGF M_X(t) = ((1/3)e^t + 2/3)^10. Derive the first and the second moments of X.
By using the previous proposition we can find the moments as follows.
The first moment is
E(X) = μ'_1 = d/dt M(t) |_{t=0} = 10((1/3)e^t + 2/3)^9 ((1/3)e^t) |_{t=0} = 10(1)^9 (1/3) = 10/3 = 3.33.
The second moment is
E(X²) = μ'_2 = d²/dt² M(t) |_{t=0} = d/dt [10((1/3)e^t + 2/3)^9 ((1/3)e^t)] |_{t=0}
      = [10·9((1/3)e^t + 2/3)^8 ((1/3)e^t)² + 10((1/3)e^t + 2/3)^9 ((1/3)e^t)] |_{t=0}
      = 90(1/9) + 10(1/3) = 10 + 10/3 = 13.33.
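Differentiating an MGF by hand can be tedious; a computer algebra system gives the same moments. The sketch below (not part of the original notes; it assumes sympy is available) applies the proposition to the MGF of Example 2.10.

```python
# Illustrative sketch: moments from an MGF by symbolic differentiation (sympy).
import sympy as sp

t = sp.symbols('t')
M = (sp.Rational(1, 3) * sp.exp(t) + sp.Rational(2, 3)) ** 10   # MGF of Example 2.10

m1 = sp.diff(M, t, 1).subs(t, 0)        # E(X)   = 10/3
m2 = sp.diff(M, t, 2).subs(t, 0)        # E(X^2) = 40/3 (about 13.33)
print(m1, m2, m2 - m1 ** 2)             # 10/3 40/3 20/9, the last being V(X)
```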

Example 2.11
Let X be a r.v. with MGF M_X(t) = (1 − 2t)^(−1/2). Derive the mean (expected value) and the standard deviation of X.
The mean is
E(X) = μ'_1 = d/dt M(t) |_{t=0} = d/dt (1 − 2t)^(−1/2) |_{t=0} = −(1/2)(1 − 2t)^(−3/2)(−2) |_{t=0} = 1.
The variance is V(X) = E(X²) − μ².
E(X²) = d²/dt² M(t) |_{t=0} = d/dt (1 − 2t)^(−3/2) |_{t=0} = −(3/2)(1 − 2t)^(−5/2)(−2) |_{t=0} = 3(1 − 2t)^(−5/2) |_{t=0} = 3.
V(X) = E(X²) − μ² = 3 − 1 = 2.
Hence, the standard deviation is σ_X = √2 = 1.41.

Example 2.12
Find the MGF of the r.v. X, then use it to find the first four moments, where
f(x) = x/2; 0 < x < 2.
M_X(t) = E(e^(tX)) = ∫_0^2 (x/2) e^(tx) dx.
Use integration by parts: u = x/2, dv = e^(tx) dx, du = dx/2, v = e^(tx)/t. Hence,
M_X(t) = x e^(tx)/(2t) |_0^2 − ∫_0^2 e^(tx)/(2t) dx = e^(2t)/t − e^(tx)/(2t²) |_0^2 = e^(2t)/t − e^(2t)/(2t²) + 1/(2t²).
Since the derivative of M_X(t) does not exist at t = 0, we will use the Taylor series form. Thus, we have to put the MGF in the form
M_X(t) = 1 + E(X)t + E(X²) t²/2! + E(X³) t³/3! + ....
We have

M_X(t) = [2t e^(2t) − e^(2t) + 1] / (2t²).
Substituting the series e^(2t) = Σ_{k=0}^∞ (2t)^k/k! gives
2t e^(2t) − e^(2t) + 1 = Σ_{m=2}^∞ 2^m (m − 1) t^m/m!,
so that
M_X(t) = Σ_{k=0}^∞ 2^(k+1)(k + 1) t^k/(k + 2)! = Σ_{k=0}^∞ [2^(k+1)/(k + 2)] t^k/k!.
Therefore, by comparing with the Taylor form, E(X^k) = 2^(k+1)/(k + 2), and the first four moments are
E(X) = 4/3; E(X²) = 2; E(X³) = 16/5; E(X⁴) = 16/3.

2.3.2 Moment Generating Function for Linear Combination
Theorem
Suppose X_1, X_2, ..., X_n are n independent random variables, and the random variable Y is defined as Y = X_1 + X_2 + ... + X_n. Then
M_Y(t) = M_{X_1 + X_2 + ... + X_n}(t) = M_{X_1}(t) M_{X_2}(t) ... M_{X_n}(t).
Proof
M_Y(t) = E[e^(tY)] = E[e^(t(X_1 + X_2 + ... + X_n))] = E[e^(tX_1) e^(tX_2) ... e^(tX_n)]
       = E[e^(tX_1)] E[e^(tX_2)] ... E[e^(tX_n)] (since the X_i's are independent)
       = M_{X_1}(t) M_{X_2}(t) ... M_{X_n}(t).
Special cases
If X and Y are two independent r.v.'s (n = 2), then M_{X+Y}(t) = M_X(t) M_Y(t).
If X and Y are i.i.d. r.v.'s (independent and identically distributed), then M_{X+Y}(t) = [M(t)]², where M(t) is the common MGF.
Proposition
If X is any random variable and Y = a + bX, then M_Y(t) = e^(at) M_X(bt).
In particular, if Z = (X − μ)/σ, then M_Z(t) = e^(−μt/σ) M_X(t/σ).

Proof
M_Y(t) = E(e^(tY)) = E(e^(t(a+bX))) = E(e^(at + btX)) = E(e^(at) e^(btX)) = e^(at) E(e^(btX)) (since e^(at) is a constant) = e^(at) M_X(bt) (from the MGF definition).

Example 2.13
Let X be a discrete random variable with values in {0, 1, 2, ...} and moment generating function M(t) = 3/(3 − t). Find, in terms of M(t), the generating functions for
I. Y = 3X + 7:
M_Y(t) = e^(7t) M_X(3t) = e^(7t) (3/(3 − 3t)) = e^(7t)/(1 − t).
II. W = −X:
M_W(t) = M_X(−t) = 3/(3 + t).

Example 2.14
Let X and Y be two independent random variables. Find the MGF for Z = X + Y if
I. M_X(t) = (e^t − 1)/t and M_Y(t) = 1/(1 − 4t):
M_Z(t) = M_X(t) M_Y(t) = ((e^t − 1)/t)(1/(1 − 4t)) = (e^t − 1)/(t(1 − 4t)).
II. M_X(t) = M_Y(t) = M(t) = (e^t − 1)/t:
M_Z(t) = [M(t)]² = ((e^t − 1)/t)².
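The rule M_{a+bX}(t) = e^(at) M_X(bt) can be verified symbolically for a concrete pmf. The sketch below (not part of the original notes; the choice Y = 7 + 3X and the two-point pmf of Example 2.9 are just convenient test cases) compares the MGF computed directly with the one obtained from the rule.

```python
# Illustrative sketch: checking M_{a+bX}(t) = e^(at) M_X(bt) with sympy.
import sympy as sp

t = sp.symbols('t')
pmf = {1: sp.Rational(1, 3), 2: sp.Rational(2, 3)}              # pmf of Example 2.9

M_X = sum(p * sp.exp(t * x) for x, p in pmf.items())            # MGF of X
M_Y_direct = sum(p * sp.exp(t * (7 + 3 * x)) for x, p in pmf.items())  # MGF of Y = 7 + 3X
M_Y_rule = sp.exp(7 * t) * M_X.subs(t, 3 * t)                   # e^(7t) M_X(3t)

print(sp.simplify(M_Y_direct - M_Y_rule))                       # 0, the two agree
```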

Chapter Three
Frequently Used Discrete Probability Distributions

Distributions to be Covered
Discrete uniform distribution.
Bernoulli distribution.
Binomial distribution.
Geometric distribution.
Negative binomial distribution.
Hypergeometric distribution.
Poisson distribution.

3.1 Discrete Uniform Distribution
The discrete uniform distribution is also known as the "equally likely outcomes" distribution. A random variable X has a discrete uniform distribution if each of the k values in its range, say x_1, x_2, ..., x_k, has equal probability. Then
f(x) = f(x; k) = 1/k for x = x_1, x_2, ..., x_k, and 0 otherwise,
where k is a constant.
Parameter of the Distribution: k ∈ N+ (number of outcomes of the experiment).

Mean and Variance
Suppose X is a discrete uniform random variable on the consecutive integers a, a+1, a+2, ..., b for a ≤ b. The mean and the variance of X are
E(X) = μ = (b + a)/2 and V(X) = σ² = [(b − a + 1)² − 1]/12.
Note
If you compute the mean and variance by their definitions, you will get the same answer.

Example 3.1
Let X represent a random variable taking on the possible values {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, where each possible value has equal probability. Find the distribution of X. Then calculate the mean and the variance.
X has a discrete uniform distribution; thus f(x) = 1/10; x = 0, 1, ..., 9. Therefore,
E(X) = (9 + 0)/2 = 4.5 and V(X) = [(9 − 0 + 1)² − 1]/12 = 8.25.

3.2 Bernoulli Distribution
Bernoulli trial
It is a trial that has only two outcomes, denoted by S for success and F for failure, with P(S) = p and P(F) = q = 1 − p. Suppose that X is a r.v. representing the number of successes (x = 0 or 1). Then X has a Bernoulli distribution (X~Ber(p)) and its pmf is given by
f(x) = f(x; p) = p^x q^(1−x) for x = 0, 1, and 0 otherwise.
Parameter of the Distribution: p (probability of success).
Mean and Variance
If X is a discrete random variable having a Bernoulli distribution with parameter p, then
E(X) = μ = p and V(X) = σ² = pq.

Example 3.2
Let X~Ber(0.6). Find the mean and the standard deviation.
p = 0.6, so q = 1 − 0.6 = 0.4.
E(X) = p = 0.6 and V(X) = pq = (0.6)(0.4) = 0.24, so σ = √0.24 = 0.49.
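A quick simulation makes these formulas concrete. The following Python sketch (not part of the original notes; sample size and seed are arbitrary) draws from the discrete uniform r.v. of Example 3.1 and the Bernoulli r.v. of Example 3.2 and compares the sample mean and variance with the theoretical values.

```python
# Illustrative sketch: simulation check of Examples 3.1 and 3.2 with numpy.
import numpy as np

rng = np.random.default_rng(0)

u = rng.integers(0, 10, size=100_000)        # discrete uniform on {0,...,9}
b = rng.random(100_000) < 0.6                # Bernoulli(p = 0.6) successes

print(u.mean(), u.var())                     # close to 4.5 and 8.25
print(b.mean(), b.var())                     # close to 0.6 and 0.24
```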

3.3 Binomial Distribution
If we perform a random experiment by repeating n independent Bernoulli trials, where the probability of success is p, then the random variable X representing the number of successes in the n trials has a binomial distribution (X~Bin(n, p)). The possible values for a binomial random variable X depend on the number n of Bernoulli trials independently repeated, and form the set {0, 1, 2, ..., n}. Thus, the pmf of X is given by
f(x) = f(x; n, p) = C(n, x) p^x q^(n−x) for x = 0, 1, ..., n, and 0 otherwise,
where C(n, x) = n!/[x!(n − x)!].
Parameters of the Distribution: n ∈ N+ (number of trials or sample size) and p (probability of success).

Characteristics of the Binomial Distribution
1. There is a fixed number, n, of identical trials.
2. The trials are independent of each other.
3. For each trial, there are only two possible outcomes (success/failure).
4. The probability of success, p, remains the same for each trial (constant).

Mean and Variance
If X is a discrete random variable having a binomial distribution with parameters n, p, then
E(X) = μ = np and V(X) = σ² = npq.
Proof
I. E(X) = np.
E(X) = Σ_x x f(x) = Σ_{x=0}^n x C(n, x) p^x q^(n−x) = Σ_{x=1}^n x C(n, x) p^x q^(n−x)
(set the summation from 1, since the x = 0 term is 0)
= Σ_{x=1}^n x · n!/[x!(n − x)!] · p^x q^(n−x) = Σ_{x=1}^n n!/[(x − 1)!(n − x)!] · p^x q^(n−x)
= np Σ_{x=1}^n (n − 1)!/[(x − 1)!(n − x)!] · p^(x−1) q^(n−x)
(assume Y = X − 1, so x = y + 1, and m = n − 1)
= np Σ_{y=0}^{m} C(m, y) p^y q^(m−y) = np(1) = np.

(The quantity C(m, y) p^y q^(m−y) is a binomial pmf for y successes in m trials; hence the summation equals 1.)
II. V(X) = npq.
V(X) = E(X²) − [E(X)]² = E(X²) − n²p².
Write E(X²) = E(X² − X) + E(X) = E[X(X − 1)] + np (add and subtract X). Now,
E[X(X − 1)] = Σ_{x=0}^n x(x − 1) C(n, x) p^x q^(n−x) = Σ_{x=2}^n x(x − 1) C(n, x) p^x q^(n−x)
(set the summation from 2, since the x = 0 and x = 1 terms are 0)
= n(n − 1)p² Σ_{x=2}^n (n − 2)!/[(x − 2)!(n − x)!] · p^(x−2) q^(n−x)
(assume Z = X − 2, so x = z + 2, and k = n − 2)
= n(n − 1)p² Σ_{z=0}^{k} C(k, z) p^z q^(k−z) = n(n − 1)p² (1) = n(n − 1)p².
(The quantity C(k, z) p^z q^(k−z) is a binomial pmf for z successes in k trials; hence the summation equals 1.)
Therefore,
V(X) = E[X(X − 1)] + np − n²p² = n(n − 1)p² + np − n²p² = n²p² − np² + np − n²p² = np(1 − p) = npq.

Moment Generating Function
If X is a discrete random variable having a binomial distribution with parameters n, p, then the MGF of X is
M_X(t) = (pe^t + q)^n.
Proof
Hint: the binomial formula gives Σ_{x=0}^n C(n, x) u^x v^(n−x) = (u + v)^n.

M_X(t) = E(e^(tX)) = Σ_{x=0}^n e^(tx) f(x) = Σ_{x=0}^n e^(tx) C(n, x) p^x q^(n−x) = Σ_{x=0}^n C(n, x) (pe^t)^x q^(n−x) = (pe^t + q)^n.

Note
The Bernoulli distribution is a special case of the binomial distribution when n = 1.

Example 3.3
Suppose 40% of a large population of registered voters favor the candidate Obama. A random sample of n = 5 voters will be selected, and X, the number favoring Obama out of 5, is to be observed. What is the probability of getting no one who favors Obama (i.e. P(X = 0))? Then compute the mean and the variance.
n = 5, p = 40/100 = 0.4, so q = 0.6, and X~Bin(5, 0.4):
f(x) = C(5, x)(0.4)^x (0.6)^(5−x); x = 0, 1, 2, 3, 4, 5.
Hence, P(X = 0) = f(0) = C(5, 0)(0.4)^0 (0.6)^5 = 0.0778.
E(X) = np = 5(0.4) = 2.
V(X) = npq = 5(0.4)(0.6) = 1.2.

Example 3.4
If the MGF of the r.v. X is M_X(t) = (0.8 + 0.2e^t)^4, find P(X ≤ 3), P(0 ≤ X < 2) and the mean.
The MGF is of the form (pe^t + q)^n; thus X~Bin(4, 0.2). So the possible values of X are 0, 1, 2, 3, 4. Therefore,
P(X ≤ 3) = f(0) + f(1) + f(2) + f(3) = 1 − f(4) = 1 − C(4, 4)(0.2)^4(0.8)^0 = 1 − 0.0016 = 0.9984.
P(0 ≤ X < 2) = f(0) + f(1) = C(4, 0)(0.2)^0(0.8)^4 + C(4, 1)(0.2)^1(0.8)^3 = 0.4096 + 0.4096 = 0.8192.
E(X) = np = 4(0.2) = 0.8.
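The same binomial probabilities are available directly from a statistics library. The Python sketch below (not part of the original notes; it assumes scipy is available) reproduces the numbers of Examples 3.3 and 3.4.

```python
# Illustrative sketch: Examples 3.3 and 3.4 with scipy.stats.binom.
from scipy.stats import binom

X = binom(n=5, p=0.4)                       # Example 3.3
print(round(X.pmf(0), 4))                   # P(X = 0) = 0.0778
print(X.mean(), X.var())                    # 2.0 and 1.2

Y = binom(n=4, p=0.2)                       # Example 3.4, read off the MGF
print(round(Y.cdf(3), 4))                   # P(X <= 3) = 0.9984
print(round(Y.pmf(0) + Y.pmf(1), 4))        # P(0 <= X < 2) = 0.8192
```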

3.4 Geometric Distribution
A single trial of an experiment results in either success with probability p, or failure with probability q = 1 − p. The experiment is performed with successive independent trials until the first success occurs. If X represents the number of trials until the first success, then X is a discrete random variable that can take the values 1, 2, 3, .... X is said to have a geometric distribution with parameter p (X~Geom(p)) and its pmf is given by
f(x) = f(x; p) = p q^(x−1) for x = 1, 2, ..., and 0 otherwise.
Parameter of the Distribution: p (probability of success).

Characteristics of the Geometric Distribution
1. The outcome of each trial is Bernoulli, that is, either a success (S) or a failure (F).
2. The probability of success is constant, P(S) = p.
3. The trials are repeated until one success occurs.
For example
A coin is tossed until a head is obtained.
From now we count the number of days until we get a rainy day.

Mean and Variance
If X is a discrete random variable having a geometric distribution with parameter p, then
E(X) = μ = 1/p and V(X) = σ² = (1 − p)/p² = q/p².
Proof
Hint: let i ∈ R with |i| < 1. Then
Σ_{n=1}^∞ n i^(n−1) = 1 + 2i + 3i² + ... = 1/(1 − i)², and
Σ_{n=1}^∞ n(n + 1) i^(n−1) = 2/(1 − i)³.
I. E(X) = 1/p.
E(X) = Σ_{x=1}^∞ x f(x) = Σ_{x=1}^∞ x p q^(x−1) = p Σ_{x=1}^∞ x q^(x−1) = p(1 + 2q + 3q² + ...) = p/(1 − q)² = p/p² = 1/p.

II. V(X) = q/p².
V(X) = E(X²) − [E(X)]² = E(X²) − (1/p)² = E(X²) − 1/p². Now,
E(X²) = Σ_{x=1}^∞ x² p q^(x−1) = Σ_{x=1}^∞ [x(x + 1) − x] p q^(x−1)
(add and subtract x)
= p Σ_{x=1}^∞ x(x + 1) q^(x−1) − Σ_{x=1}^∞ x p q^(x−1) = p · 2/(1 − q)³ − E(X) = 2p/p³ − 1/p = 2/p² − 1/p.
Therefore, V(X) = 2/p² − 1/p − 1/p² = 1/p² − 1/p = (1 − p)/p² = q/p².

Moment Generating Function
If X is a discrete random variable having a geometric distribution with parameter p, then the MGF of X is
M_X(t) = pe^t / (1 − qe^t).
Proof
Hint: for any i ∈ R with |i| < 1, Σ_{n=0}^∞ i^n = 1 + i + i² + ... = 1/(1 − i).
M_X(t) = E(e^(tX)) = Σ_{x=1}^∞ e^(tx) p q^(x−1) = pe^t Σ_{x=1}^∞ (qe^t)^(x−1) = pe^t · 1/(1 − qe^t) = pe^t/(1 − qe^t).

Example 3.5
In a certain manufacturing process it is known that, on the average, 1 in every 100 items is defective. What is the probability that the fifth item inspected is the first defective item found? Find the mean and the variance.
Let X represent the number of items inspected until the first defective item is found. The probability of success (defective item) is p = 1/100 = 0.01; thus X~Geom(0.01). So we want to find
P(X = 5) = f(5) = (0.01)(0.99)^4 = 0.0096.
The mean and variance are μ = 1/0.01 = 100 and σ² = 0.99/(0.01)² = 9900.
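The sketch below (not part of the original notes; it assumes scipy is available) checks Example 3.5 with scipy's geometric distribution, which, like these notes, counts the number of trials up to and including the first success.

```python
# Illustrative sketch: Example 3.5 with scipy.stats.geom.
from scipy.stats import geom

X = geom(p=0.01)                     # number of items until the first defective
print(round(X.pmf(5), 4))            # P(X = 5) = 0.0096
print(X.mean(), X.var())             # 100.0 and 9900.0 (up to float rounding)
```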

3.5 Negative Binomial Distribution
If r is an integer, then the negative binomial random variable X can be interpreted as the number of trials until the r-th success occurs, when successive independent trials of an experiment are performed for which the probability of success in a single trial is p. The pmf of X~NBin(r, p) is given by
f(x) = f(x; r, p) = C(x − 1, r − 1) p^r q^(x−r) for x = r, r + 1, r + 2, ..., and 0 otherwise.
Parameters of the Distribution: r ∈ N+ (number of successes), p (probability of success).

Characteristics of the Negative Binomial Distribution
1. The outcome of each trial is Bernoulli, that is, either a success (S) or a failure (F).
2. The probability of success is constant, P(S) = p.
3. The trials are repeated until r successes occur.

Note
The geometric distribution is a special case of the negative binomial distribution when r = 1.

Moment Generating Function
If X is a discrete random variable having a negative binomial distribution with parameters r, p, then the MGF of X is
M_X(t) = (pe^t/(1 − qe^t))^r.
Proof
Hint: the negative binomial series gives Σ_{k=0}^∞ C(k + r − 1, r − 1) u^k = (1 − u)^(−r).
M_X(t) = E(e^(tX)) = Σ_{x=r}^∞ e^(tx) f(x) = Σ_{x=r}^∞ e^(tx) C(x − 1, r − 1) p^r q^(x−r) = p^r e^(rt) Σ_{x=r}^∞ C(x − 1, r − 1) e^(t(x−r)) q^(x−r)

= p^r e^(rt) Σ_{x=r}^∞ C(x − 1, r − 1) (qe^t)^(x−r)
(let Y = X − r, so x = y + r)
= p^r e^(rt) Σ_{y=0}^∞ C(y + r − 1, r − 1) (qe^t)^y = p^r e^(rt) (1 − qe^t)^(−r) = (pe^t/(1 − qe^t))^r.

Mean and Variance
If X is a discrete random variable having a negative binomial distribution with parameters r, p, then
E(X) = μ = r/p and V(X) = σ² = r(1 − p)/p² = rq/p².
Proof
I. E(X) = r/p.
E(X) = d/dt M_X(t) |_{t=0} = d/dt (pe^t/(1 − qe^t))^r |_{t=0}
     = r (pe^t/(1 − qe^t))^(r−1) · [(1 − qe^t)pe^t + pe^t qe^t]/(1 − qe^t)² |_{t=0}
     = r (pe^t)^r/(1 − qe^t)^(r+1) |_{t=0} = r p^r/(1 − q)^(r+1) = r p^r/p^(r+1) = r/p.
II. V(X) = rq/p².
V(X) = E(X²) − [E(X)]².
E(X²) = d²/dt² M_X(t) |_{t=0} = d/dt [r(pe^t)^r (1 − qe^t)^(−(r+1))] |_{t=0}
      = [r(pe^t)^r (r + 1) qe^t (1 − qe^t)^(−(r+2)) + r²(pe^t)^r (1 − qe^t)^(−(r+1))] |_{t=0}
      = r(r + 1) p^r q/p^(r+2) + r² p^r/p^(r+1) = r(r + 1)q/p² + r²/p = (r² + rq)/p² (using p + q = 1).
Hence, V(X) = (r² + rq)/p² − (r/p)² = rq/p².

Example 3.6
Bob is a high school basketball player. He is a 70% free throw shooter; that means his probability of making a free throw is 0.7. During the season, what is the probability that Bob makes his third free throw on his fifth shot? Find the MGF.
Let X represent the number of throws until the third free throw is made. The probability of success (free throw) is p = 0.7; thus X~NBin(3, 0.7). So we want to find

P(X = 5) = f(5) = C(4, 2)(0.7)³(0.3)^(5−3) = 0.18522.
M_X(t) = (0.7e^t/(1 − 0.3e^t))³.

Example 3.7
Sara flips a fair coin repeatedly and counts the number of heads (successes). Find the probability that Sara gets
I. the fourth head before the seventh flip;
II. the first head on the fourth flip.
The probability of success (getting a head) is p = 0.5. Let X represent the number of flips until the fourth head is shown; thus X~NBin(4, 0.5). So we want to find
I. P(X ≤ 6) = f_X(4) + f_X(5) + f_X(6) = C(3, 3)(0.5)^4 + C(4, 3)(0.5)^5 + C(5, 3)(0.5)^6 = 0.0625 + 0.125 + 0.15625 = 0.34375.
Now, let Y represent the number of flips until the first head is shown; thus Y~Geom(0.5).
II. P(Y = 4) = f_Y(4) = (0.5)^4 = 0.0625.

Comparison
For the Bernoulli and binomial distributions the number of trials is fixed (1 for Ber. and n > 1 for Bin.) while the number of successes is variable. For the geometric and negative binomial distributions the number of trials is variable and the number of successes is fixed (1 for Geom. and r > 1 for NBin.).

3.6 Hypergeometric Distribution
In a group of M objects, K are of Type I and M − K are of Type II. If n objects are randomly chosen without replacement from the group of M, let X denote the

number that are of Type I in the group of n. Then X has a hypergeometric distribution, X~H(M, n, K). The pmf of X is
f(x) = f(x; M, n, K) = C(K, x) C(M − K, n − x) / C(M, n) for x = max[0, n − (M − K)], ..., min[n, K], and 0 otherwise.
Parameters of the Distribution: M ∈ N+ (population size), n ∈ N+ (sample size), K ∈ N+ (population elements with a certain characteristic).

Characteristics of the Hypergeometric Distribution
1. n trials in a sample taken from a finite population of size M.
2. The population (outcome of trials) has two outcomes, Success (S) and Failure (F).
3. The sample is taken without replacement.
4. Trials are dependent.
5. The probability of success changes from trial to trial.

Mean and Variance
If X is a discrete random variable having a hypergeometric distribution with parameters M, n, K, then
E(X) = μ = nK/M and V(X) = σ² = nK(M − K)(M − n) / [M²(M − 1)].
Proof
We will assume that the bounds of X are 0 and n.
I. E(X) = nK/M.
E(X) = Σ_{x=0}^n x f(x) = Σ_{x=1}^n x C(K, x) C(M − K, n − x)/C(M, n)
(set the summation from 1, since the x = 0 term is 0)
= Σ_{x=1}^n [K!/((x − 1)!(K − x)!)] C(M − K, n − x) / [M!/(n!(M − n)!)]
= (nK/M) Σ_{x=1}^n C(K − 1, x − 1) C(M − K, n − x) / C(M − 1, n − 1)
(let Y = X − 1, L = M − 1, S = K − 1 and r = n − 1)
= (nK/M) Σ_{y=0}^{r} C(S, y) C(L − S, r − y)/C(L, r) = (nK/M)(1) = nK/M
(Y~H(L, r, S), and the sum of its pmf equals 1).

II. V(X) = nK(M − K)(M − n)/[M²(M − 1)].
V(X) = E(X²) − [E(X)]².
E(X²) = Σ_{x=0}^n x² f(x) = Σ_{x=1}^n x² C(K, x) C(M − K, n − x)/C(M, n)
(set the summation from 1, since the x = 0 term is 0)
= (nK/M) Σ_{x=1}^n x C(K − 1, x − 1) C(M − K, n − x)/C(M − 1, n − 1).
We use the same variable substitution as when deriving the mean:
E(X²) = (nK/M) Σ_{y=0}^{r} (y + 1) C(S, y) C(L − S, r − y)/C(L, r)
      = (nK/M) [Σ_{y=0}^{r} y C(S, y) C(L − S, r − y)/C(L, r) + Σ_{y=0}^{r} C(S, y) C(L − S, r − y)/C(L, r)].
The first sum is the expected value of a hypergeometric random variable with parameters (L, r, S); the second sum is the total sum of that random variable's pmf. Thus,
E(X²) = (nK/M)[E(Y) + 1] = (nK/M)[rS/L + 1] = (nK/M)[(n − 1)(K − 1)/(M − 1) + 1].
Thus,
V(X) = (nK/M)[(n − 1)(K − 1)/(M − 1) + 1] − (nK/M)²
     = (nK/M) · [M(n − 1)(K − 1) + M(M − 1) − nK(M − 1)]/[M(M − 1)]
     = nK/[M²(M − 1)] · [MnK − Mn − MK + M + M² − M − nKM + nK]
     = nK/[M²(M − 1)] · [M² − Mn − MK + nK] = nK/[M²(M − 1)] · [M(M − n) − K(M − n)]
     = nK(M − n)(M − K)/[M²(M − 1)].

Example 3.8
Lots of 40 components each are called acceptable if they contain no more than 3 defectives. The procedure for sampling a lot is to select 5 components at random (without replacement) and to reject the lot if a defective is found. What is the probability that exactly one defective is found in the sample if there are 3 defectives in the entire lot?
M = 40, n = 5, K = 3.

Let X represent the number of defective items in the sample, X~H(40, 5, 3). We want to find
P(X = 1) = f(1) = C(3, 1) C(37, 4) / C(40, 5) = 0.3011.
Is the procedure being used a good one?

3.6.1 Binomial Approximation to Hypergeometric Distribution
Suppose we still have the population of size M with K units labeled as success and M − K labeled as failure, but now a sample of size n is drawn with replacement. Then, with each draw, the units remaining to be drawn look the same: still K successes and M − K failures. Thus, the probability of drawing a success on each single draw is p = K/M, and this doesn't change. When we draw without replacement, the proportion of successes changes, depending on the result of previous draws. For example, if we were to obtain a success on the first draw, then the proportion of successes for the second draw would be (K − 1)/(M − 1), whereas if we were to obtain a failure on the first draw, the proportion of successes for the second draw would be K/(M − 1).

Proposition
If the population size M grows in such a way that the proportion of successes K/M tends to p, and n is held constant, then the hypergeometric probability mass function approaches the binomial probability mass function, i.e. H(M, n, K) approaches Bin(n, p = K/M). As a rule of thumb, if the population size is more than 20 times the sample size (M > 20n), then we may use binomial probabilities in place of hypergeometric probabilities.
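The hypergeometric probability of Example 3.8 can be obtained directly from scipy. In the sketch below (not part of the original notes) note that scipy's parameter names differ from the notes: scipy's M is the population size, its n is the number of Type I items, and its N is the sample size.

```python
# Illustrative sketch: Example 3.8 with scipy.stats.hypergeom.
from scipy.stats import hypergeom

X = hypergeom(M=40, n=3, N=5)        # 40 components, 3 defective, sample of 5
print(round(X.pmf(1), 4))            # P(exactly one defective) = 0.3011
```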

Example 3.9
A box contains 6 blue and 4 red balls. An experiment is performed in which a ball is chosen and its color observed. Find the probability that after 5 trials, 3 blue balls will have been chosen when
I. the balls are replaced;
II. the balls are not replaced.
I. Let X represent the number of blue balls in the sample, X~Bin(5, 0.6). So we want to find
P(X = 3) = C(5, 3)(0.6)³(0.4)² = 0.3456.
II. Let Y represent the number of blue balls in the sample, Y~H(10, 5, 6). So we want to find
P(Y = 3) = C(6, 3) C(4, 2) / C(10, 5) = 0.4762.

Example 3.10
It is estimated that 4,000 of the 10,000 voting residents of a town are against a new sales tax. If 15 eligible voters are selected at random and asked their opinion, what is the probability that at most 3 favor the new tax? Use the binomial approximation.
M = 10,000, n = 15, K = 6,000. To use the binomial approximation we have to check whether M > 20n: M = 10,000 > 20 × 15 = 300. Thus X, the number of voters in the sample who favor the new sales tax, has (approximately) a binomial distribution with parameters n = 15, p = K/M = 0.6.
P(X ≤ 3) = f(0) + f(1) + f(2) + f(3)
         = C(15, 0)(0.6)^0(0.4)^15 + C(15, 1)(0.6)(0.4)^14 + C(15, 2)(0.6)²(0.4)^13 + C(15, 3)(0.6)³(0.4)^12 = 0.0019.

3.7 Poisson Distribution
The Poisson distribution is often used as a model for counting the number of events of a certain type that occur in a certain period of time (or space). If the r.v. X has a Poisson distribution, X~Poisson(λ), then its pmf is given by
f(x) = f(x; λ) = e^(−λ) λ^x / x! for x = 0, 1, 2, ..., and 0 otherwise.
Parameter of the Distribution: λ > 0 (the average).

For example
The number of births per hour during a given day.
The number of failures of a machine in one month.
The number of typing errors on a page.
The number of postponed baseball games due to rain.

Note
Suppose that X represents the number of customers arriving for service at a bank in a one-hour period, and that a model for X is the Poisson distribution with parameter λ. Under some reasonable assumptions (such as independence of the numbers arriving in different time intervals), it is possible to show that the number arriving in any time period also has a Poisson distribution with the appropriate parameter that is "scaled" from λ. Suppose that λ = 4, meaning that X, the number of bank customers arriving in one hour, has a mean of 4. If Y represents the number of customers arriving in 2 hours, then Y has a Poisson distribution with parameter 8. In general, for any time interval of length t, the number of customers arriving in that time interval has a Poisson distribution with parameter λt = 4t. So, the number of customers arriving during a 15-minute period (t = 1/4 hour) will have a Poisson distribution with parameter 4 × 1/4 = 1. In general, if W represents the number of customers arriving in t hours, W~Poisson(λt), and therefore
f(w) = e^(−λt)(λt)^w / w! for w = 0, 1, 2, ....

Mean and Variance
If X is a discrete random variable having a Poisson distribution with parameter λ, then
E(X) = V(X) = λ.
Proof
Hint: e^x = Σ_{n=0}^∞ x^n/n!.
I. E(X) = λ.
E(X) = Σ_{x=0}^∞ x f(x) = Σ_{x=1}^∞ x e^(−λ) λ^x/x!
(set the summation from 1, since the x = 0 term is 0)

= e^(−λ) Σ_{x=1}^∞ λ^x/(x − 1)! = λ e^(−λ) Σ_{x=1}^∞ λ^(x−1)/(x − 1)!
(let Y = X − 1)
= λ e^(−λ) Σ_{y=0}^∞ λ^y/y! = λ e^(−λ) e^λ = λ.
II. V(X) = λ.
V(X) = E(X²) − [E(X)]².
E(X²) = E(X² − X + X) = E[X(X − 1) + X] = E[X(X − 1)] + E(X) = E[X(X − 1)] + λ.
E[X(X − 1)] = Σ_{x=0}^∞ x(x − 1) f(x) = Σ_{x=2}^∞ x(x − 1) e^(−λ) λ^x/x!
(set the summation from 2, since the x = 0 and x = 1 terms are 0)
= e^(−λ) Σ_{x=2}^∞ λ^x/(x − 2)! = λ² e^(−λ) Σ_{x=2}^∞ λ^(x−2)/(x − 2)!
(let Z = X − 2)
= λ² e^(−λ) Σ_{z=0}^∞ λ^z/z! = λ² e^(−λ) e^λ = λ².
Hence, V(X) = λ² + λ − λ² = λ.

Moment Generating Function
If X is a discrete random variable having a Poisson distribution with parameter λ, then the MGF of X is
M_X(t) = e^(λ(e^t − 1)).
Proof
Hint: e^x = Σ_{n=0}^∞ x^n/n!.
M_X(t) = E(e^(tX)) = Σ_{x=0}^∞ e^(tx) f(x) = Σ_{x=0}^∞ e^(tx) e^(−λ) λ^x/x! = e^(−λ) Σ_{x=0}^∞ (e^t λ)^x/x! = e^(−λ) e^(e^t λ) = e^(λ(e^t − 1)).

Example 3.11
Suppose that the number of typing errors per page has a Poisson distribution with an average of 6 typing errors. What is the probability that
I. the number of typing errors on a page will be 7;
II. the number of typing errors on a page will be at least 2;
III. in 2 pages there will be 10 typing errors;
IV. in half a page there will be no typing errors?

I. Let X represent the number of typing errors per page. Therefore, λ_X = 6 and X~Poisson(6).
P(X = 7) = e^(−6) 6^7/7! = 0.1377.
II. P(X ≥ 2) = f(2) + f(3) + ... = 1 − P(X < 2) = 1 − f(0) − f(1) = 1 − e^(−6) 6^0/0! − e^(−6) 6^1/1! = 0.9826.
III. Let Y represent the number of typing errors in 2 pages. Therefore, λ_Y = λ_X t = 6 × 2 = 12 and Y~Poisson(12).
P(Y = 10) = e^(−12)(12)^10/10! = 0.1048.
IV. Let Z represent the number of typing errors in half a page. Therefore, λ_Z = λ_X t = 6 × 1/2 = 3 and Z~Poisson(3).
P(Z = 0) = e^(−3) 3^0/0! = 0.0498.

3.7.1 Poisson Approximation to Binomial Distribution
For those situations in which n is large (n ≥ 100) and p is very small (p ≤ 0.1), the Poisson distribution can be used to approximate the binomial distribution. The larger the n and the smaller the p, the better the approximation. The following mathematical expression for the Poisson model is used to approximate the true (binomial) result:
f(x) = e^(−np)(np)^x / x!,
where n is the sample size and p is the true probability of success (i.e. λ = np).

Example 3.12
Given that 5% of a population are left-handed, use the Poisson distribution to estimate the probability that a random sample of 100 people contains 2 or more left-handed people; then compare the result with the true probability using the binomial distribution.

Let X represent the number of left-handed people in the sample. To use the Poisson approximation we should check that n is large and p is small. Since n = 100 and p = 0.05, we can use the Poisson approximation:
λ = np = 100 × 0.05 = 5, so approximately X~Poisson(5). Thus,
P(X ≥ 2) = 1 − P(X < 2) = 1 − f(0) − f(1) = 1 − e^(−5)(5)^0/0! − e^(−5)(5)^1/1! = 1 − 6e^(−5) = 0.9596.
Now, let us use the binomial distribution:
P(X ≥ 2) = 1 − P(X < 2) = 1 − f(0) − f(1)
         = 1 − C(100, 0)(0.05)^0(0.95)^100 − C(100, 1)(0.05)^1(0.95)^99 = 1 − 0.0371 = 0.9629.
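The quality of the approximation in Example 3.12 is easy to inspect numerically. The Python sketch below (not part of the original notes; it assumes scipy is available) computes both tail probabilities.

```python
# Illustrative sketch: Poisson approximation versus exact binomial (Example 3.12).
from scipy.stats import binom, poisson

n, p = 100, 0.05
lam = n * p                                   # lambda = np = 5

approx = 1 - poisson.cdf(1, lam)              # Poisson: P(X >= 2), about 0.9596
exact = 1 - binom.cdf(1, n, p)                # binomial: P(X >= 2), about 0.9629
print(round(approx, 4), round(exact, 4))
```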

Chapter Four
Frequently Used Continuous Probability Distributions

Distributions to be Covered
Uniform distribution.
Exponential distribution.
Gamma distribution.
Chi-squared distribution.
Beta distribution.
Normal distribution.
Standard normal distribution.
t distribution.

4.1 Uniform Distribution
A uniform distribution, sometimes also known as a rectangular distribution, is a distribution that has constant probability.
(Figure: the pdf f(x) is a horizontal line over the interval from a to b.)
The probability density function for a continuous uniform distribution on the interval [a, b] is
f(x) = f(x; a, b) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise.
We write X~U(a, b).
Parameters of the Distribution: a, b ∈ R (the limits of the interval).

Mean and Variance
If X is a continuous random variable having a uniform distribution with parameters a and b, then
E(X) = (a + b)/2 and V(X) = (b − a)²/12.
Proof
Hint: b² − a² = (b − a)(b + a), b³ − a³ = (b − a)(b² + ab + a²) and (b − a)² = b² − 2ab + a².
I. E(X) = (a + b)/2.
E(X) = ∫_a^b x f(x) dx = ∫_a^b x/(b − a) dx = [x²/2]_a^b /(b − a) = (b² − a²)/[2(b − a)] = (b − a)(b + a)/[2(b − a)] = (a + b)/2.
II. V(X) = (b − a)²/12.
V(X) = E(X²) − [E(X)]².
E(X²) = ∫_a^b x² f(x) dx = ∫_a^b x²/(b − a) dx = [x³/3]_a^b /(b − a) = (b³ − a³)/[3(b − a)] = (b² + ab + a²)/3.
Hence,
V(X) = (b² + ab + a²)/3 − ((a + b)/2)² = [4b² + 4ab + 4a² − 3b² − 6ab − 3a²]/12 = (b² − 2ab + a²)/12 = (b − a)²/12.

Moment Generating Function
If X is a continuous random variable having a uniform distribution with parameters a and b, then the MGF of X is
M_X(t) = (e^(bt) − e^(at))/(t(b − a)).
Proof
M_X(t) = E(e^(tX)) = ∫_a^b e^(tx) f(x) dx = ∫_a^b e^(tx)/(b − a) dx = e^(tx)/(t(b − a)) |_a^b = (e^(bt) − e^(at))/(t(b − a)).
Note that the above derivation is valid only when t ≠ 0. However, remember that it always holds that M_X(0) = E(e^(0·X)) = E(1) = 1.

Example 4.1
The daily amount of coffee, in liters, dispensed by a machine located in an airport lobby is a random variable X having a continuous uniform distribution with a = 7 and b = 10. Find the probability that on a given day the amount of coffee dispensed by this machine will be
I. at most 8.8 liters;
II. more than 7.4 liters but less than 9.5 liters;
III. at least 12.5 liters.
f(x) = 1/(10 − 7) = 1/3; 7 ≤ x ≤ 10.
I. P(X ≤ 8.8) = ∫_7^{8.8} (1/3) dx = (8.8 − 7)/3 = 0.6.
II. P(7.4 ≤ X ≤ 9.5) = (9.5 − 7.4)/3 = 0.7.
III. P(X ≥ 12.5) = 0.

Example 4.2
A bus arrives every 10 minutes at a bus stop. It is assumed that the waiting time for a particular individual is a r.v. with a continuous uniform distribution. What is the probability that the individual waits more than 7 minutes? Find the mean and the standard deviation.
Let X be the waiting time for the individual. Thus, X~U(0, 10).
P(X > 7) = ∫_7^{10} (1/10) dx = (10 − 7)/10 = 0.3.
μ = (0 + 10)/2 = 5 and σ = √((10 − 0)²/12) = 2.89.
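The Python sketch below (not part of the original notes; it assumes scipy is available) reproduces Example 4.1 with scipy's uniform distribution, which is parameterized by loc = a and scale = b − a.

```python
# Illustrative sketch: Example 4.1 with scipy.stats.uniform.
from scipy.stats import uniform

X = uniform(loc=7, scale=3)                  # U(7, 10)
print(round(X.cdf(8.8), 2))                  # P(X <= 8.8) = 0.6
print(round(X.cdf(9.5) - X.cdf(7.4), 2))     # P(7.4 < X < 9.5) = 0.7
print(X.mean(), X.var())                     # 8.5 and 0.75
```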

4.2 Exponential Distribution
A continuous random variable X is said to have an exponential distribution, X~Exp(θ), if it has probability density function
f(x) = f(x; θ) = θe^(−θx) for x ≥ 0, and 0 otherwise,
where θ > 0.
The exponential distribution is usually used to model the time until something happens in the process.
For example
The exponential random variable is used to measure the waiting time for an elevator to come.
The time it takes to load a truck.
The waiting time at a car wash.
Another form of the exponential distribution is f(x) = (1/μ)e^(−x/μ); x ≥ 0. However, for the rest of this course, we will use the first form.
Parameter of the Distribution: θ > 0.

Cumulative Distribution Function
F(x) = P(X ≤ x) = ∫_0^x θe^(−θt) dt = −e^(−θt) |_0^x = 1 − e^(−θx).
Direct way to find probabilities
I. P(X ≤ a) = F(a) = 1 − e^(−θa).
II. P(a ≤ X ≤ b) = F(b) − F(a) = e^(−θa) − e^(−θb).
III. P(X ≥ b) = 1 − F(b) = e^(−θb).

Mean and Variance
If X is a continuous random variable having an exponential distribution with parameter θ, then
E(X) = 1/θ and V(X) = 1/θ².
Proof
I. E(X) = 1/θ.
E(X) = ∫_0^∞ x f(x) dx = ∫_0^∞ θx e^(−θx) dx.
Use integration by parts: u = x, dv = θe^(−θx) dx, du = dx, v = −e^(−θx).
E(X) = [−x e^(−θx)]_0^∞ + ∫_0^∞ e^(−θx) dx = 0 + [−e^(−θx)/θ]_0^∞ = 1/θ.

II. V(X) = 1/θ².
V(X) = E(X²) − [E(X)]².
E(X²) = ∫_0^∞ x² f(x) dx = ∫_0^∞ θx² e^(−θx) dx.
Use integration by parts: u = x², dv = θe^(−θx) dx, du = 2x dx, v = −e^(−θx).
E(X²) = [−x² e^(−θx)]_0^∞ + ∫_0^∞ 2x e^(−θx) dx = 0 + (2/θ) ∫_0^∞ θx e^(−θx) dx = (2/θ)(1/θ) = 2/θ².
Hence, V(X) = 2/θ² − (1/θ)² = 1/θ².

Moment Generating Function
If X is a continuous random variable having an exponential distribution with parameter θ, then the MGF of X is
M_X(t) = θ/(θ − t).
Proof
M_X(t) = E(e^(tX)) = ∫_0^∞ e^(tx) θe^(−θx) dx = θ ∫_0^∞ e^(−(θ−t)x) dx = θ [−e^(−(θ−t)x)/(θ − t)]_0^∞ = θ/(θ − t).
Note that the above derivation is valid only when t < θ.

Example 4.3
The time between arrivals of cars at Al's full-service gas pump follows an exponential distribution with a mean time between arrivals of 3 minutes. Al would like to know the probability that the time between two successive arrivals will be 2 minutes or less. Then find the variance.
Let X represent the time between two successive arrivals. θ = 1/3, so X~Exp(1/3).

P(X ≤ 2) = F(2) = 1 − e^(−2/3) = 0.4866.
V(X) = 1/θ² = 3² = 9.

4.2.1 Lack of Memory Property (Memoryless)
Let X be exponentially distributed with parameter θ. Suppose we know X > t. What is the probability that X is also greater than some value s + t? That is, we want to know P(X > s + t | X > t). This type of problem shows up frequently when we are interested in the time between events (such as in a queuing system). For example, suppose that customers in a particular system have exponentially distributed service times. If we have a customer that has been waiting for one minute, what is the probability that it will continue to wait for more than two minutes?
Using the definition of conditional probability, we have
P(X > s + t | X > t) = P(X > s + t, X > t)/P(X > t).
If X > s + t, then X > t is redundant, so we can simplify the numerator as
P(X > s + t | X > t) = P(X > s + t)/P(X > t).
Using the CDF of the exponential distribution,
P(X > s + t | X > t) = P(X > s + t)/P(X > t) = e^(−θ(s+t))/e^(−θt) = e^(−θs).
It turns out that the conditional probability does not depend on t. Thus, in our queue example, the probability that a customer waits for one additional minute is the same as the probability that it waits for one minute originally, regardless of how long it has been waiting. This is called the lack of memory property:
P(X > s + t | X > t) = P(X > s).

Example 4.4
On average, it takes about 5 minutes to get an elevator at the stat building. Let X be the waiting time until the elevator arrives. Find the pdf of X, then calculate the probability that
I. you will wait less than 3 minutes;
II. you will wait for more than 10 minutes;

III. you will wait for more than 7 minutes;
IV. you will wait for more than 10 minutes given that you have already waited for more than 3 minutes?
X~Exp(1/5), so f(x) = (1/5)e^(−x/5); x ≥ 0.
I. P(X < 3) = 1 − e^(−3/5) = 0.4512.
II. P(X > 10) = e^(−2) = 0.1353.
III. P(X > 7) = e^(−7/5) = 0.2466.
IV. P(X > 10 | X > 3) = P(X > 10 − 3) = P(X > 7) = 0.2466.

4.2.2 Relation Between Exponential and Poisson Distributions
An interesting feature of these two distributions is that, if the Poisson provides an appropriate description of the number of occurrences per interval of time, then the exponential will provide a description of the length of time between occurrences.
Consider the probability function for the Poisson distribution,
f(x) = e^(−λt)(λt)^x/x!; x = 0, 1, 2, ...,
where λ is the mean rate of arrivals and t is a period of time. Defining the r.v. T as the time until the first event, we have (by definition)
F(t) = P(T ≤ t) = 1 − P(T > t).
Now, the first arrival occurs after time t if and only if there are no arrivals in the interval [0, t]. Therefore,
P(T > t) = P(zero events occur in time 0 to t) = P(X = 0) = e^(−λt)(λt)^0/0! = e^(−λt).
Hence,
F(t) = P(T ≤ t) = 1 − P(T > t) = 1 − e^(−λt),
which is the distribution function of the exponential distribution with parameter λ.
Briefly, X (number of occurrences per interval of time t) ~ Poisson(λt), and Y (time between occurrences) ~ Exp(λ).
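The memoryless property of Example 4.4 can be checked numerically. The Python sketch below (not part of the original notes; it assumes scipy is available) uses scipy's exponential distribution, whose scale parameter equals 1/θ = 5 minutes here.

```python
# Illustrative sketch: memoryless property of Example 4.4 with scipy.stats.expon.
from scipy.stats import expon

X = expon(scale=5)                              # Exp(theta = 1/5)
p_more_7 = X.sf(7)                              # P(X > 7)
p_cond = X.sf(10) / X.sf(3)                     # P(X > 10 | X > 3)
print(round(p_more_7, 4), round(p_cond, 4))     # both 0.2466
```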

Example 4.5
If we know that there are on average 10 customers visiting a store within a 2-hour (120-minute) interval, then the r.v. that represents the number of customers is X ~ Poisson(λt = 10). From another point of view, the r.v. that represents the time between customer arrivals is Y ~ Exp(λ = 1/12), where the average time between customer arrivals is 1/λ = 12 minutes.

4.3 Gamma Distribution
The gamma distribution is another widely used distribution. Its importance is largely due to its relation to the exponential and normal distributions. Before introducing the gamma random variable, we need to introduce the gamma function.

4.3.1 Gamma Function
The gamma function, denoted by Γ(α), is an extension of the factorial function to real (and complex) numbers. Specifically, if n ∈ {1, 2, 3, …}, then
Γ(n) = (n − 1)!
More generally, for any positive real number α, Γ(α) is defined as
Γ(α) = ∫₀^∞ x^(α−1) e^(−x) dx;  α > 0.

4.3.2 Some Useful Properties of the Gamma Function
I. Γ(1/2) = √π.
II. Γ(α + 1) = αΓ(α), α > 0.
Proof
Γ(α + 1) = ∫₀^∞ x^α e^(−x) dx. Use integration by parts: u = x^α, dv = e^(−x) dx, so du = αx^(α−1) dx and v = −e^(−x). Then
Γ(α + 1) = [ −x^α e^(−x) ]₀^∞ + α ∫₀^∞ x^(α−1) e^(−x) dx = 0 + α ∫₀^∞ x^(α−1) e^(−x) dx = αΓ(α)  (from the gamma function definition).
III. ∫₀^∞ x^α e^(−βx) dx = Γ(α + 1)/β^(α+1);  α, β > 0.

Proof
Let y = βx, so dy = β dx and dx = dy/β; as x goes from 0 to ∞, y goes from 0 to ∞. Then
∫₀^∞ x^α e^(−βx) dx = ∫₀^∞ (y/β)^α e^(−y) (dy/β) = (1/β^(α+1)) ∫₀^∞ y^α e^(−y) dy = Γ(α + 1)/β^(α+1)  (from the gamma function definition).

Example 4.6
I. Find Γ(7/2).
II. Find the value of the integral I = ∫₀^∞ x⁶ e^(−5x) dx.
I. Γ(7/2) = (5/2) Γ(5/2) = (5/2)(3/2) Γ(3/2) = (5/2)(3/2)(1/2) Γ(1/2) = (15/8) √π.
II. I = ∫₀^∞ x⁶ e^(−5x) dx = Γ(6 + 1)/5^(6+1) = 6!/5⁷.

4.3.3 Definition of the Gamma Distribution
We now define the gamma distribution by providing its pdf.
A continuous random variable X is said to have a gamma distribution with parameters α > 0 and β > 0, denoted by X ~ Gamma(α, β), if its pdf is given by
f_X(x) = f(x; α, β) = (β^α/Γ(α)) x^(α−1) e^(−βx) for x ≥ 0, and 0 otherwise.
Parameters of the distribution: α > 0, β > 0.

Mean and Variance
If X is a continuous random variable having a gamma distribution with parameters α, β, then
E(X) = α/β and V(X) = α/β².
Proof
III. E(X) = α/β.
E(X) = ∫₀^∞ x f(x) dx = ∫₀^∞ x (β^α/Γ(α)) x^(α−1) e^(−βx) dx = (β^α/Γ(α)) ∫₀^∞ x^α e^(−βx) dx = (β^α/Γ(α)) Γ(α + 1)/β^(α+1)  (using property III)

55 = αγ(α) = α. (Using property II) Γ(α) β β IV. V(X) = α β 2. V(X) = E(X 2 ) [E(X)] 2 E(X 2 ) = = βα Γ(α+2) Γ(α) β β α x 2 f(x)dx = x 2 Γ(α) xα e βx dx α+2 (Using property III) = (α+)αγ(α) = α(α+). (Using property II) Γ(α) β 2 β 2 Hence, V(X) = α(α+) β 2 ( α β )2 = α β 2. = βα Γ(α) xα+ e βx dx Moment Generating Function If X is a continuous random variable has gamma distribution with parameters α, β then, the MGF of X is Proof M X (t) = E(e tx ) = = βα Γ(α) xα e (β t)x dx = βα β = ( (β t) α β t )α. M X (t) = ( β β t )α = ( t β ) α. β α e tx f(x)dx = e tx Γ(α) xα e βx dx = βα Γ(α) Γ(α) (β t) α (Using property III) Note that the above derivation is valid only when t < β Special Cases First Case If we let α =, we obtain f(x;, β) = βe βx ; x. Thus, we conclude that Gamma(, β) = Exp(β). 49

Second Case
If we let α = υ/2 and β = 1/2, we obtain
f(x; υ/2, 1/2) = x^(υ/2 − 1) e^(−x/2) / (2^(υ/2) Γ(υ/2));  x ≥ 0,
which is the pdf of the chi-squared distribution with υ degrees of freedom. (This distribution will be discussed in Section 4.4.)
Third Case
If we let β = 1, we obtain f(x; α, 1) = x^(α−1) e^(−x) / Γ(α); x ≥ 0, which is called the standard gamma distribution with parameter α.

4.3.5 Incomplete Gamma Function
When X follows the standard gamma distribution, its cdf is
F*(x; α) = ∫₀^x y^(α−1) e^(−y) / Γ(α) dy;  x ≥ 0.
This is also called the incomplete gamma function.
Proposition
If X ~ Gamma(α, β), then F(x; α, β) = P(X ≤ x) = F*(βx; α), where F* is the incomplete gamma function and F is the cdf of the gamma distribution.
Note
Table 4.1 in Appendix (A) provides some values of F*(x; α) for α = 1, 2, …, 10 and x = 1, 2, …, 15.
Example 4.7
Let X represent the survival time in weeks, where X ~ Gamma(6, 0.5). Find the mean and the variance, then calculate the probabilities P(6 < X < 12) and P(X < 3).
μ = α/β = 6/0.5 = 12 weeks, and σ² = α/β² = 6/(0.5)² = 24.
P(6 < X < 12) = P(X < 12) − P(X < 6) = F(12; 6, 0.5) − F(6; 6, 0.5) = F*(12 × 0.5; 6) − F*(6 × 0.5; 6) = F*(6; 6) − F*(3; 6) ≈ 0.554 − 0.084 = 0.47.
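Example 4.7 can also be reproduced without the table by evaluating the incomplete gamma function directly. A minimal Python sketch (assuming SciPy; scipy.special.gammainc is exactly the regularized F*(x; α)):

from scipy.stats import gamma
from scipy.special import gammainc

X = gamma(a=6, scale=1 / 0.5)               # X ~ Gamma(6, 0.5), scale = 1/beta

print(X.mean(), X.var())                    # 12.0 and 24.0
print(X.cdf(12) - X.cdf(6))                 # P(6 < X < 12), about 0.4704
print(gammainc(6, 6) - gammainc(6, 3))      # the same value via F*(6; 6) - F*(3; 6)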

Example 4.8
Suppose that the time, in hours, taken to repair a heat pump is a r.v. X having a gamma distribution with parameters α = β = 2. What is the probability that the next service call will require
I. at least 2 hours to repair the heat pump;
II. at most 1.25 hours to repair the heat pump.
f(x) = (2²/Γ(2)) x^(2−1) e^(−2x) = 4x e^(−2x);  x ≥ 0.
I. P(X ≥ 2) = 1 − P(X < 2) = 1 − F(2; 2, 2) = 1 − F*(4; 2) = 1 − 0.908 = 0.092.
II. P(X ≤ 1.25) = ∫₀^1.25 4x e^(−2x) dx.
Use integration by parts: u = 4x, dv = e^(−2x) dx, so du = 4 dx and v = −e^(−2x)/2. Then
P(X ≤ 1.25) = [ −2x e^(−2x) ]₀^1.25 + 2 ∫₀^1.25 e^(−2x) dx = −2.5 e^(−2.5) + (1 − e^(−2.5)) = 1 − 3.5 e^(−2.5) ≈ 0.7127.

Why do we need the gamma distribution? Any normal distribution is bell-shaped and symmetric. There are many practical situations that do not fit a symmetric distribution. The gamma family of pdfs can yield a wide variety of skewed distributions. β is called the scale parameter because values other than 1 either stretch or compress the pdf in the x-direction.
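The same kind of numerical check works for Example 4.8. A minimal Python sketch (assuming SciPy):

from scipy.stats import gamma

X = gamma(a=2, scale=1 / 2)                 # alpha = beta = 2

print(X.sf(2))                              # P(X >= 2),    about 0.0916
print(X.cdf(1.25))                          # P(X <= 1.25), about 0.7127, matching 1 - 3.5*exp(-2.5)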

58 4.4 Chi-squared Distribution The r.v. X is said to has a Chi-Squared distribution with parameter υ (X~χ 2 υ ) if its pdf is given by x υ f(x) = f(x; υ) = { 2 (υ 2 ) Γ( υ 2 e 2; x 2 ) ; otherwise Parameter of the Distribution: υ > (The degrees of freedom df ). x Mean and Variance If X is a continuous random variable has chi-squared distribution with parameter υ then, Moment Generating Function E(X) = υ and V(x) = 2υ. If X is a continuous random variable has Chi-squared distribution with parameter υ then, the MGF of X is Note M X (t) = ( 2t ) υ 2 = ( 2t) υ 2. Table 4.2 in appendix (A) provides some Lower Critical values for Chi-square distribution. Example 4.9 If X~χ 7 2 find a, b if I. P(X < a) =.. a = (From the table) II. P(X b) =.99. P(X b) = P(X < b) =.99 P(X < b) =.99 =.. Therefore, b =

4.5 Beta Distribution
A continuous random variable X is said to have a beta distribution, X ~ Beta(α, β), if it has probability density function
f(x) = f(x; α, β) = (1/B(α, β)) x^(α−1) (1 − x)^(β−1) for 0 < x < 1, and 0 otherwise,
where the beta function is defined as
B(α, β) = ∫₀¹ x^(α−1) (1 − x)^(β−1) dx = Γ(α)Γ(β)/Γ(α + β).
Thus, f(x) = (Γ(α + β)/(Γ(α)Γ(β))) x^(α−1) (1 − x)^(β−1).
Parameters of the distribution: α, β > 0.
Mean and Variance
If X is a continuous random variable having a beta distribution with parameters α and β, then
E(X) = α/(α + β) and V(X) = αβ/[(α + β)² (α + β + 1)].
Moment Generating Function
The moment generating function for the beta distribution is complicated; therefore, we will not mention it.
Example 4.10
If X ~ Beta(3, 2), find P(X < 3), P(X = 7) and P(X < 0.5).
f(x) = (Γ(3 + 2)/(Γ(3)Γ(2))) x² (1 − x) = (4!/(2! 1!)) x² (1 − x) = 12 x² (1 − x);  0 < x < 1.
P(X < 3) = ∫₀¹ 12 x² (1 − x) dx = 1 (the support of X is (0, 1)).
P(X = 7) = 0 (X is continuous, and 7 lies outside the support).
P(X < 0.5) = ∫₀^0.5 12 x² (1 − x) dx = 12 ∫₀^0.5 (x² − x³) dx = 12 [ x³/3 − x⁴/4 ]₀^0.5 = (4x³ − 3x⁴) |₀^0.5 = 0.5 − 0.1875 = 0.3125.
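A short numerical check of Example 4.10 and of the mean and variance formulas, as a minimal Python sketch (assuming SciPy):

from scipy.stats import beta

X = beta(3, 2)                              # X ~ Beta(3, 2)

print(X.mean(), 3 / 5)                      # E(X) = alpha/(alpha+beta) = 0.6
print(X.var(), (3 * 2) / (5 ** 2 * 6))      # V(X) = alpha*beta/((alpha+beta)^2 (alpha+beta+1)) = 0.04
print(X.cdf(3))                             # P(X < 3)   = 1 (support is (0, 1))
print(X.cdf(0.5))                           # P(X < 0.5) = 0.3125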

4.6 Normal Distribution
The normal distribution is one of the most important continuous distributions. Many measurable characteristics are normally or approximately normally distributed, such as height and weight. The graph of the probability density function (pdf) of a normal distribution, called the normal curve, is a bell-shaped curve.
A continuous random variable X is said to have a normal distribution, X ~ N(μ, σ), if it has probability density function
f(x) = f(x; μ, σ) = (1/(σ√(2π))) e^(−(1/2)((x − μ)/σ)²);  −∞ < x < ∞.
Parameters of the distribution: −∞ < μ < ∞ (the mean), σ > 0 (the standard deviation).
Mean and Variance
If X is a continuous random variable having a normal distribution with parameters μ and σ, then
E(X) = μ and V(X) = σ².
Moment Generating Function
If X is a continuous random variable having a normal distribution with parameters μ and σ, then the MGF of X is
M_X(t) = e^(μt + σ²t²/2).
Note
The proof for the normal distribution MGF will be reviewed later in this chapter.

61 4.6. Some properties of the normal curve f(x) of N(μ, σ) I. f(x) is symmetric about the mean μ. II. The total area under the curve of f(x) =. III. The highest point of the curve of f(x) at the mean μ. IV. The mode, which is the point on the horizontal axis where the curve is a maximum, occurs at = μ, (Mode = Median = Mean). V. The curve has its points of inflection at X = μ ± σ is concave downward if μ σ < X < μ + σ and is concave upward otherwise. VI. The normal curve approaches the horizontal axis asymptotically as we proceed in either direction away from the mean. VII. The location of the normal distribution depends on μ and its shape depends on σ. μ < μ 2 ; σ = σ 2 μ = μ 2 ; σ < σ 2 μ < μ 2 ; σ < σ 2 Where the solid line represents N(μ, σ ), and the dashed line represents N(μ 2, σ 2 ) Standard Normal Distribution The special case of the normal distribution where the mean μ = and the variance σ 2 = called the standard normal distribution denoten(,). Thus, the pdf is reduced to Notations f(z) = f(z;,) = { 2π e 2 z2 ; < z <. ; otherwise The random variable which has a standard normal distribution is usually denoted by Z. If < α < the notation z α refers to the point in the standard normal distribution Z such that P(Z < z α ) = α. 55

62 Moment Generating Function If Z is a continuous random variable has standard normal distribution then, the MGF of Z is M Z (t) = e t2 2. Proof M Z (t) = E(e tz ) = = 2π e 2 (z2 2tz+t 2 t 2) dz e tz f(z) dz = e tz 2π e 2 z2 dz (add and subtract t 2 ) = 2π e 2 (z2 2tz) dz = e t2 2 ( 2π e 2 (z2 2tz+t 2) dz 2π e 2 (z t)2 dz = e t2 2 2π e 2 (z t)2 dz = because it is a pdf of a N(t, )). Deriving the MGF of a Normal Distribution Recall the MGF of the normal distribution M X (t) = e μt+ 2 σ2 t 2. Proof We know that Z = X μ σ = e t2 2 = e t2 2. X = σz + μ. Where Z~N(,) and X~N(μ, σ). Using the theorem: If Y = a + bx M Y (t) = e at M X (bt), we get M X (t) = M σz+μ (t) = e μt M Z (σt) = e μt e 2 σ2 t 2 = e μt+ 2 σ2 t 2. Note Table 4.3 in appendix (A) provides the area to the left of Z for standard normal distribution Calculating Probabilities of Standard Normal Distribution The standard normal distribution is very important because probabilities of any normal distribution can be calculated from the probabilities of the standard normal distribution. I. P(Z a) from the table. II. P(Z b) = P(Z b) where P(Z b) from the table. III. P(a Z b) = P(Z b) P(Z a), where P(Z a) and P(Z b) from the table. 56

Proposition
If X ~ N(μ, σ), then Z = (X − μ)/σ ~ N(0, 1).

Calculating Probabilities of Normal Distribution
I. P(X ≤ a) = P(Z ≤ (a − μ)/σ), from the table.
II. P(X ≥ b) = P(Z ≥ (b − μ)/σ) = 1 − P(Z ≤ (b − μ)/σ), where P(Z ≤ (b − μ)/σ) is from the table.
III. P(a ≤ X ≤ b) = P((a − μ)/σ ≤ Z ≤ (b − μ)/σ) = P(Z ≤ (b − μ)/σ) − P(Z ≤ (a − μ)/σ), where P(Z ≤ (b − μ)/σ) and P(Z ≤ (a − μ)/σ) are from the table.

Example 4.11
If Z ~ N(0, 1), find P(Z < 0.5), P(Z > 0.98), P(Z < 0) and P(−1.33 < Z < 2.42).
P(Z < 0.5) = 0.6915.
P(Z > 0.98) = 1 − 0.8365 = 0.1635.
P(Z < 0) = 0.5.
P(−1.33 < Z < 2.42) = P(Z < 2.42) − P(Z < −1.33) = 0.9922 − 0.0918 = 0.9004.

Example 4.12
Suppose that the birth weight of Saudi babies, X, has a normal distribution with mean μ = 3.4 and standard deviation σ = 0.35.
I. Find the probability that a randomly chosen Saudi baby has a birth weight between 3 and 4 kg.
II. What is the percentage of Saudi babies who have a birth weight between 3 and 4 kg?
I. P(3 < X < 4) = P((3 − 3.4)/0.35 < Z < (4 − 3.4)/0.35) = P(−1.14 < Z < 1.71) = P(Z < 1.71) − P(Z < −1.14) = 0.9564 − 0.1271 = 0.8293.
II. P(3 < X < 4) × 100% = 0.8293 × 100% = 82.93%.
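The probabilities in Examples 4.11 and 4.12 can be checked without the table. A minimal Python sketch (assuming SciPy):

from scipy.stats import norm

Z = norm(0, 1)
print(Z.cdf(0.5))                           # P(Z < 0.5), about 0.6915
print(Z.sf(0.98))                           # P(Z > 0.98), about 0.1635
print(Z.cdf(2.42) - Z.cdf(-1.33))           # P(-1.33 < Z < 2.42), about 0.90

X = norm(loc=3.4, scale=0.35)               # birth weight from Example 4.12
print(X.cdf(4) - X.cdf(3))                  # P(3 < X < 4), about 0.83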

64 4.6.3 Normal Approximation to Binomial The binomial distribution is symmetrical (like the normal distribution) whenever p =.5. When p.5, the binomial distribution is not symmetrical. However, the closer p is to.5 and the larger the sample size n, the more symmetric the distribution becomes. Fortunately, whenever the sample size is large, you can use the normal distribution to approximate the exact probabilities of the items of interest. Proposition If X is a binomial r.v. with mean μ = np and variance σ 2 = npq, then the limiting form of the distribution of Z = X np npq, As n, is the standard normal distribution N(,). Note As a general rule, you can use the normal distribution to approximate the binomial distribution whenever np, nq 5. Steps to solving a Normal approximation to the Binomial distribution Step. Check if appropriate to use the approximation. Rule of Thumb: np 5 and nq 5. Step 2. Calculate µ = np and σ = npq where p + q =. Step 3. Approximate the r.v. X with a normal r.v. N(µ = np, σ = npq). Step 4. Use the continuity correction factor. Change all the < to and > to, remember to correct the endpoints to new value (i.e. X > 6 would change to X 7 and X < 9 would change to X 8). 2. For the, subtract.5, and for the, add.5. P(X a) = P(Y a.5), P(X b) = P(Y b +.5), P(X = c) = P(c.5 Y c +.5), where Y~N(μ, σ). 58

Step 5. Solve the normal problem:
P(X ≥ a) = P(Z ≥ ((a − 0.5) − np)/√(npq)),
P(X ≤ b) = P(Z ≤ ((b + 0.5) − np)/√(npq)),
P(X = c) = P(((c − 0.5) − np)/√(npq) ≤ Z ≤ ((c + 0.5) − np)/√(npq)).

Example 4.13
Let X be the number of times that a fair coin, when flipped 40 times, lands on heads. Use the normal distribution to find the probability that
I. it will be equal to 20;
II. it will be less than 17.
p(H) = p = 0.5 = q = p(T), n = 40, x = 20.
Step 1. np = nq = 40 × 0.5 = 20 ≥ 5, so the approximation is appropriate.
Step 2. μ = np = 20, σ = √(npq) = √10 ≈ 3.16; thus X is approximately N(20, 3.16).
Step 3.
I. P(X = 20) = P((20 − 0.5 − 20)/3.16 ≤ Z ≤ (20 + 0.5 − 20)/3.16) = P(−0.16 ≤ Z ≤ 0.16) = P(Z ≤ 0.16) − P(Z ≤ −0.16) = 0.5636 − 0.4364 = 0.1272.
II. P(X < 17) = P(X ≤ 16) = P(Z ≤ (16 + 0.5 − 20)/3.16) = P(Z ≤ −1.11) = 0.1335.
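It is instructive to compare the approximation in Example 4.13 with the exact binomial probabilities. A minimal Python sketch (assuming SciPy):

from scipy.stats import binom, norm

n, p = 40, 0.5
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5          # 20 and sqrt(10), about 3.16
Y = norm(loc=mu, scale=sigma)

# P(X = 20): exact vs. normal approximation with continuity correction
print(binom(n, p).pmf(20))                           # exact, about 0.125
print(Y.cdf(20.5) - Y.cdf(19.5))                     # approximation, about 0.126

# P(X < 17) = P(X <= 16)
print(binom(n, p).cdf(16))                           # exact
print(Y.cdf(16.5))                                   # approximation, about 0.134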

66 4.7 Student s t Distribution A continuous random variable T is said to have a t distribution with parameter ν if it has probability density function f(t) = f(t; ν) = { Γ( ν+ 2 ) νπγ( ν 2 Parameters of the Distribution: ν >. ) ( + t2 ν ) ν+ 2 ; < t <. ; otherwise 4.7. Some properties of the t distribution I. It has mean of zero. II. It is symmetric about zero (mean=median=mode=). III. Compared to the standard normal distribution, the t distribution is less peaked in the center and has higher tails. IV. It depends on the degrees of freedom. V. t distribution approaches the standard normal distribution as ν approaches. Notation P(T < t α ) = α. Note Table 4.4 in appendix (A) provides the Lower Critical Values for t Distribution. Example 4.4 If υ = 5, find P(T <.753), t.99. P(T <.753) =.95. t.99 =
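Values such as those in Example 4.14 come from the t table (Table 4.4), but they can also be computed directly. A minimal Python sketch (assuming SciPy; the degrees of freedom below are only an illustration, not necessarily those of the example):

from scipy.stats import t, norm

T = t(df=15)
print(T.cdf(1.753))              # P(T < 1.753), about 0.95 for 15 degrees of freedom
print(T.ppf(0.99))               # the point t_0.99 with P(T < t_0.99) = 0.99, about 2.60

# the t distribution approaches the standard normal as the degrees of freedom grow
print(t(df=1000).cdf(1.96), norm.cdf(1.96))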

67 Chapter Five Joint, Marginal, and Conditional Distributions In the study of probability, given at least two random variables X, Y,..., that are defined on a probability space, the joint probability distribution for X, Y,... is a probability distribution that gives the probability that each of X, Y,... falls in any particular range or discrete set of values specified for that variable. In the case of only two random variables, this is called a bivariate (joint) distribution, but the concept generalizes to any number of random variables, giving a multivariate distribution. For both discrete and continuous random variables we will discuss the following Joint Distributions. Cumulative distribution. Marginal Distributions (computed from a joint distribution). Joint Mathematical Expectation Conditional Distributions (e.g. P(Y = y X = x)). Joint Moment Generating Function. 5. Joint Distributions 5.. Joint Probability function Joint distribution of two random variables X and Y has a probability function or probability density function f(x, y) that is a function of two variables (sometimes denoted f X,Y (x, y)). Discrete Case If X = x, x 2,, x n and Y = y, y 2,, y m are two discrete random variables, then the values of the joint probability function of X and Y f(x, y) is 6

68 f(x, y) y y 2 y j y m f(x, y) y x f(x, y ) f(x, y 2 ) f(x, y j ) f(x, y m ) f X (x ) x 2 f(x 2, y ) f(x 2, y 2 ) f(x 2, y j ) f(x 2, y m ) f X (x 2 ) x i f(x i, y ) f(x i, y 2 ) f(x i, y j ) f(x i, y m ) f X (x i ) x n f(x n, y ) f(x n, y 2 ) f(x n, y j ) f(x n, y m ) f X (x n ) f(x, y) x f Y (y ) f Y (y 2 ) f Y (y j ) f Y (y m ) f(x, y) = P(X = x, Y = y) must satisfy I. f(x, y). II. x y f(x, y) =. Continuous Case If X and Y are continuous random variables, then f(x, y) must satisfy I. f(x, y). II. f(x, y) dydx x y = Joint distribution function If random variables X and Y have a joint distribution, then the cumulative distribution function is y x t= s= f(s, t) ; If X, Y are discrete r. v. s F(x, y) = P(X x, Y y) = { y x f(s, t) ds dt ; If X, Y are continouos r. v. s Note In the continuous case, 2 x y Some Properties of the Joint CDF F(x, y) = f(x, y). F(x, y) is non-decreasing in both x and y. F(x, ) = F(x). F(, y) = F(y). F(, ) =. F(x, ) = F(, y) = F(, ) =. 62

69 5..3 Marginal probability Distributions If X and Y have a joint distribution with joint density or probability function f(x, y) then The marginal distribution of X has a probability function or density function denoted f X (x) is equal to y f(x, y) ; In the discrete case f X (x) = { f(x, y) dy ; In the continouos case The marginal distribution of Y has a probability function or density function denoted f Y (y) is equal to x f(x, y) ; In the discrete case f Y (y) = { f(x, y) dx ; In the continouos case 5..4 Joint Mathematical Expectation If g(x, y) is a function of two variables, and X and Y are jointly distributed random variables with joint probability function f(x, y), then the expected value of g(x, y) is defined to be y x g(x, y)f(x, y) ; In the discrete case E[g(X, Y)] = { g(x, y)f(x, y) dxdy ; In the continouos case Special cases I. If g(x, Y) = X, we get y x xf(x, y) ; In the discrete case E[g(X, Y)] = { xf(x, y) dxdy ; In the continouos case x x y f(x, y) = x xf(x) = E(X); In the discrete case = { x f(x, y) dydx = xf(x)dx = E(X); In the continouos case Similarly for g(x, Y) = Y. II. If g(x, Y) = (X μ) 2, we get E[g(X, Y)] = { y x (x μ)2 f(x, y) ; In the discrete case (x μ) 2 f(x, y) dxdy ; In the continouos case 63

70 = { x (x μ)2 y f(x, y) ; In the discrete case (x μ) 2 f(x, y) dydx ; In the continouos case = { x (x μ)2 f(x) = V(X); In the discrete case (x μ) 2 f(x)dx = V(X); In the continouos case Similarly for g(x, Y) = (Y μ) 2. Example 5. A company that services air conditioner units in residences and office blocks is interested in how to schedule its technicians in the most efficient manner. The random variable X, taking the values,2,3 and 4, is the service time in hours. The random variable Y, taking the values,2 and 3, is the number of air conditioner units. The joint probability function for X and Y is given in the table below Y: Number of Air Conditions Units X: Service Time I. Proof that f(x, y) is a joint probability function. II. Find: f(2,), f X (3), F(2,2), P(X < 3, Y > 2), P(X + Y 4). III. Find the marginal function f Y (y). IV. Find E(Y), V(Y), E(XY). I. First, it is clear that f(x, y), x, y. Second, x y f(x, y) = =. II. f(2,) =.8. f X (3) = =.3. F(2,2) = f(,) + f(,2) + f(2,) + f(2,2) = =.43. P(X < 3, Y > 2) = f(,3) + f(2,3) =. +. =.2. P(X + Y 4) = f(,) + f(,2) + f(,3) + f(2,) + f(2,2) + f(3,) =.5. 64

71 III. Y 2 3 Sum f Y (y) IV. Y 2 3 Sum f Y (y) yf(y) E(Y) =.79 y 2 f(y) E(Y 2 ) = 3.59 Thus, E(Y) =.79, and V(Y) = E(Y 2 ) [E(Y)] 2 = 3.59 (.79) 2 = Now, 3 E(XY) = 4 y= x= xyf(x, y) = ()()(.2) + ()(2)(.8) + ()(3)(.) + + (4)(3)(.7) = Example 5.2 Consider the joint probability function f(x, y) = c(x + y); < x, y < 2. Find c, f X (x), f Y (y), F(,), V(X), E[X(X + 6)]. To find c we know that f(x, y) dydx x y Thus, f(x, y) = x+y 8. f X (x) = 2 (x + y) dy 8 f Y (y) = 2 (x + y) dx = = c (x + y) dydx = c [xy + y2 ] 2 2 dx 2 = c (2x + 2)dx 2 = c[x 2 + 2x] 2 = 8c c = 8. = y2 [xy + ] = x+ (2x + 2) = ; < x < = 8 [x2 2 + xy] 2 = 8 F(,) = P(X, Y ) = 8 (x + y) dydx (x + 8 ) dx 2 = 8 [x2 2 + x 2 ] = 8. V(X) = E(X 2 ) [E(X)] 2 E(X) = 4 2 x(x + ) dx = 2 (x 2 + x) dx 4 y+ (2y + 2) = ; < y < 2. 4 = 8 y2 [xy + ] 2 dx = = 4 [x3 3 + x2 2 ] 2 = 4 [ ] =

72 E(X 2 ) = 2 4 x2 (x + ) dx V(X) = 5 3 [7 6 ]2 = 36. = 2 (x 3 + x 2 ) dx 4 = 4 [x4 4 + x3 3 ] 2 = 4 [ ] = 5 3. E[X(X + 6)] = E(X 2 + 6X) = E(X 2 ) + 6E(X) = (7 6 ) = Joint Moment Generating Function Given jointly distributed random variables X and Y, the moment generating function of the joint distribution is M X,Y (t, t 2 ) = E(e Xt +Yt 2 ) = { ext +yt 2 x y f(x, y) ; In the discrete case e xt +yt 2 f(x, y) dxdy ; In the discrete case, where, < t, t 2 <. Some Properties If X, Y are r.v s and r, r 2 are integer values, then I. M X,Y (,) =. II. M X,Y (t, ) = M X (t ). III. M X,Y (, t 2 ) = M Y (t 2 ). IV. r t r M X,Y (t, t 2 ) t =t 2 = = E(X r ). V. VI. r 2 t 2 r2 M X,Y (t, t 2 ) t =t 2 = = E(Y r 2). r +r2 t r t 2 r2 M X,Y (t, t 2 ) t =t 2 = = E(X r Y r 2). (The (r + r 2 )th joint raw moment) Example 5.3 Consider the joint probability function X Y f(x, y) Find M(t, t 2 ), then use it to find E(X), E(XY). 3 x= 5 y= 2 M(t, t 2 ) = e xt +yt 2 f(x, y) 66

73 =.5e t 2t e t +.2e t +5t 2 +.2e 3t 2t 2 +.5e 3t +.5e 3t +5t 2 E(X) = t M X,Y (t, t 2 ) t =t 2 = =.5e t 2t e t +.2e t +5t 2 +.6e 3t 2t 2 +.5e 3t +.45e 3t +5t 2 t =t 2 = =.8. E(XY) = 2 M t t X,Y (t, t 2 ) t =t 2 = = [.5e t 2t e t +.2e t +5t t 2.6e 3t 2t 2 +.5e 3t +.45e 3t +5t 2 ] t =t 2 = = [.3e t 2t 2 + e t +5t 2.2e 3t 2t e 3t +5t 2 ] t =t 2 = = Conditional Distributions 5.2. Conditional Probability function The probability of the random variable X under the knowledge provided by the value of Y is given by Note that f X Y=y (x y) must satisfy I. f X Y=y (x y). f X Y=y (x y) = f(x,y) ; f(y) >. f(y) II. { x f X Y=y(x y) = ; In the discrete case f X Y=y (x y) dx = ; In the continuous case. Similarly, f Y X=x (y x) = f(x,y) ; f(x) >. f(x) Conditional Distribution Function The conditional CDF is as follows Note x F(x y) = P[X x Y y] = { x s= f(s y) ; In the discrete case x f(u y) du; In the continuous case F(x y) = f(x y). 67
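For a joint pmf given as a table, the marginal distributions, expectations, and conditional pmfs introduced above reduce to row and column operations on that table. A minimal Python sketch (assuming NumPy; the 2 by 3 table below is hypothetical, not the table of Example 5.1):

import numpy as np

# hypothetical joint pmf f(x, y): rows are x = 1, 2 and columns are y = 1, 2, 3
f = np.array([[0.10, 0.20, 0.10],
              [0.15, 0.25, 0.20]])
x = np.array([1, 2])
y = np.array([1, 2, 3])

fx = f.sum(axis=1)                        # marginal pmf of X
fy = f.sum(axis=0)                        # marginal pmf of Y
EX = (x * fx).sum()                       # E(X)
EXY = (np.outer(x, y) * f).sum()          # E(XY) = sum over x, y of x*y*f(x, y)
f_y_given_x1 = f[0] / fx[0]               # conditional pmf f(y | X = 1)

print(fx, fy, EX, EXY, f_y_given_x1)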

74 Example 5.4 If the joint probability function of X and Y is given by f(x, y) = 2; < x < y <. Find f(x), f(y), f(x y), f(y x), F(x y), F(y x), f(x.5), F(y.25). f(x) = 2 dy x y f(y) = 2 dx = 2y x = 2 2x = 2( x); < x <. = 2x y = 2y; < y <. f X Y=y (x y) = f(x,y) = 2 = ; < x < y <. f(y) 2y y f Y X=x (y x) = f(x,y) = 2 = ; < x < y <. f(x) 2( x) x x F(x y) = f(x y) dx y F(y x) = f(y x) dy x x = dx y y dy x x = f(x.5) =.5 = 2; < x < 2. = x y x = x ; < x < y <. y = y x x F(y.25) = y.25 = (4y ); < y < Conditional Expectation y = y x x ; < x < y <. If X and Y are two r.v s have a joint probability distribution f(x, y), then the conditional expectation of any function of X, g(x) given Y = y is E[g(X) Y = y] = { x g(x)f X Y=y(x y) ; In the discrete case g(x)f X Y=y (x y) dx; In the continuous case Similarly for E[h(Y) X = x], where h(y) is a function of the r.v. Y. Special Cases I. If g(x) = X, we get E(X Y = y) = { x xf X Y=y(x y) ; In the discrete case xf X Y=y (x y) dx; In the continuous case = μ X Y, which is the conditional expectation of X given Y (this expectation is considered as a variable of Y). II. If g(x) = (X μ X Y ) 2, we get E [(X μ X Y ) 2 Y = y] = V(X Y = y) = σ 2 X Y, 68

75 which is the conditional variance of X given Y Corollary The conditional variance can be expressed as Theorem V(X Y = y) = E(X 2 Y = y) [E(X Y = y)] 2 = E(X 2 Y = y) μ 2 X Y. Let X, Y be random variables, a, b R, and g R R. Assuming all the following expectations exist, we have I. E(a Y) = a. II. E[(aX + b) Y] = ae(x Y) + b. III. E[E(X Y)] = E(X), similarly E[E(Y X)] = E(Y). Proof The first two are not hard to prove, and we leave them to the reader. Consider (III). We prove the continuous case and leave the discrete case to the reader. E(X Y) = E[E(X Y)] = xf X Y=y (x y) dx (a function of Y). Thus, E(X Y)f Y (y) dy = [ xf X Y=y (x y) dx]f Y (y) dy = x f(x,y) f Y (y) dxdy = xf(x, y) dxdy x[ f(x, y) dy] dx f(y) = xf(x) dx = E(X). Theorem If X,Y have a joint distribution, then the marginal variance of X can be factored in the form Similarly, Example 5.5 V(X) = E Y [V(X Y)] + V Y [E(X Y)]. V(Y) = E X [V(Y X)] + V X [E(Y X)]. If f(x y) = ; < x < y <. Find E(X Y), V(X Y). y 69

76 y E(X Y) = xf(x y)dx E(X 2 y Y) = x 2 f(x y)dx y = x dx y y = x2 dx y = x2 y 2y = y ; < y <. 2 = x3 3y V(X Y) = E(X 2 Y) [E(X Y)] 2 = y2 3 y2 y = y2 = y ; < y <. Thus, ; < y < Moment Generating Function for Conditional Distributions If X, Y have a joint probability distribution, and f(x y) is the conditional probability function of X, then the moment generating function for the conditional distribution (if it exist) defined as Properties I. M X Y () =. M X Y (t) = E(e tx Y = y) = { x ext f(x y) e xt f(x y) dx II. d r dt r M X Y(t) t= = E(X r Y = y), which is the rth conditional raw moment. Example 5.6 If f(x y) = y ; < x < y <. Find M X Y(t) then use it to compute E(X Y), V(X Y). y M X Y (t) = e xt f(x y) dx Now, M X Y (t) = eyt yt y = ext y dx = ext yt y = eyt = ( (yt) r yt r= ) = ( + yt + y2 t 2 yt r! yt 2!. + y3 t 3 3! = + yt + y2 t yr t r + = + y t + y2 t2 + + yr tr +. 2! 3! (r+)! 2! 3 2! (r+) r! Therefore, + + yr t r + ) r! μ r = Hence, yr (r + ). E(X Y) = μ X Y = y 2. E(X 2 Y) = μ 2 = y2 3. V(X Y) = y2 2. 7

77 Chapter Six Covariance, Correlation, Independence of Variables (Stochastic Independence) 6. Covariance of random variables If random variables X and Y are jointly distributed with joint probability function f(x, y), then the covariance between X and Y is defined as σ X,Y = Cov(X, Y) = E{[X E(X)][Y E(Y)]} = E[(X μ X )(Y μ Y )] = E(XY) E(X)E(Y). Proof σ X,Y = Cov(X, Y) = E[(X μ X )(Y μ Y )] = E(XY Xμ Y Yμ X + μ X μ Y ) = E(XY) E(Xμ Y ) E(Yμ X ) + E(μ X μ Y ) = E(XY) μ Y E(X) μ X E(Y) + μ X μ Y = E(XY) μ Y μ X μ X μ Y + μ X μ Y = E(XY) 2μ X μ Y + μ X μ Y = E(XY) μ X μ Y = E(XY) E(X)E(Y). Some Properties for Covariance If X, Y, Z are r.v s and a,b are constants, then I. Cov(X, X) = V(X). II. Cov(X, Y) = Cov(Y, X). III. Cov(X, a) =. IV. Cov(aX, by) = abcov(x, Y). V. Cov(X + Y, Z) = Cov(X, Z) + Cov(Y, Z), this property can be generalized to n m n m i=. Cov( i= X i, j= Y j ) = j= Cov(X i, Y j ) VI. V(X ± Y) = V(X) + V(Y) ± 2Cov(X, Y). Proof II, III, IV are not hard to prove, and we leave them to the reader. I. Cov(X, X) = E(X X) E(X)E(X) = E(X 2 ) [E(X)] 2 = V(X). V. Cov(X + Y, Z) = E[(X + Y)Z] E(X + Y)E(Z) = E(XZ + YZ) [E(X) + E(Y)]E(Z) = E(XZ) + E(YZ) E(X)E(Z) E(Y)E(Z) 7

78 = [E(XZ) E(X)E(Z)] + [E(YZ) E(Y)E(Z)] = Cov(X, Z) + Cov(Y, Z). VI. V(X + Y) = Cov(X + Y, X + Y) (From I) = Cov(X, X) + Cov(X, Y) + Cov(Y, X) + Cov(Y, Y) (From V) = Cov(X, X) + Cov(X, Y) + Cov(X, Y) + Cov(Y, Y) (From II) = V(X) + V(Y) + 2Cov(X, Y). (From I) We can prove the case V(X Y)by the same way. 6.2 Correlation Coefficient The correlation is a measure of the linear relationship between X and Y. It is obtained by ρ X,Y = Corr(X, Y) = Cov(X,Y) V(X)V(Y) = Cov(X,Y) σ X σ Y, where σ X and σ Y are the standard deviations of X and Y respectively. Some Properties for Correlation If X, Y are r.v s and a,b,c,d are constants, then I. ρ X,Y = ρ Y,X. II. ρ X,X =. (strong positive relationship) III. ρ X, X =. (strong negative relationship) IV. ρ X,Y. V. ρ (ax±b),(cy±d) = ρ X,Y. Proof VI. ρ (ax±b),(cy±d) = Cov(aX±b,cY±d) = accov(x,y) = Cov(X,Y) = ρ V(aX±b)V(cY±d) a 2 V(X)c 2 V(Y) V(X)V(Y) X,Y. Example 6. If f(x, y) = x+y 8 ; < x, y < 2. Find Cov(X, Y), V(X + Y), ρ X,Y. From example 5.2 we got E(X) = E(Y) = 7, and V(X) = V(Y) =

79 Cov(X, Y) = E(XY) E(X)E(Y) 2 2 E(XY) = (xy) x+y 2 = 8 (x3 y 3 + x2 y ) 2 dy Cov(X, Y) = 4 3 (7 6 )2 =.278. ρ X,Y = Cov(X,Y) V(X)V(Y) = dxdy = (x 2 y + xy 2 ) dxdy 8 = 2 8 (8y + 3 2y2 ) dy =.99. ( )2 36 = +2y 3 8 (4y2 ) 2 3 = 4 = V(X + Y) = V(X) + V(Y) + 2Cov(X, Y) = = Independence of random variables Random variables X and Y with cumulative distribution functions F(x) and F(y) are said to be independent (or stochastically independent) if and only if the cumulative distribution function of the joint distribution F(x, y) can be factored in the form F(x, y) = F(x)F(y); for all (x, y). Alternatively, stochastic independence can be defined via the probability functions, that, X and Y are independent if and only if Corollary If X and Y are two independent r.v. s then f(x y) = f(x,y) f(y) Similaly, f(y x) = f(y). Note = f(x)f(y) f(y) f(x, y) = f(x)f(y); for all (x, y). = f(x). To proof that any two variables X and Y are independent we only need to proof one of the following I. F(x, y) = F(x)F(y). II. F(x y) = F(x). III. F(y x) = F(y). 73

80 IV. f(x, y) = f(x)f(y). V. f(x y) = f(x). VI. f(y x) = f(y) Joint Expectation Under Independence Condition If the two r.v s X and Y are independent, then I. E(XY) = E(X)E(Y), II. E(X Y) = E(X), III. E(Y X) = E(Y), (and vise versa). Example 6.2 ; < x <, < y < If f(x, y) = {, check if X and Y are independent. ; Otherwis First, let find the marginal probability functions f(x) = dy f(y) = dx Now, since = y =, and = x =. f(x, y) = = f(x)f(y). Thus, the random variables X and Y are independent. What is the distribution of X (or Y)?! Example 6.3 Are the random variables X and Y with the following joint probability density table independent? X values Y values

81 First, let find the marginal probability functions Now, since X values Y values 2 3 f X (x) f Y (y) f(,) = 8 64 = 8 8 = f X()f Y () Thus, the random variables X and Y are not independent (dependent). 8 (We can use any pair other than (,) to reject that X and Y are independent) Covariance Under Independence Condition If X and Y are two independent r.v. s then I. Cov(X, Y) =. (But the converse is not true in general) II. V(X ± Y) = V(X) + V(Y). Proof I. Cov(X, Y) = E(XY) E(X)E(Y) = E(X)E(Y) E(X)E(Y) =. II. V(X ± Y) = V(X) + V(Y) ± 2Cov(X, Y) = V(X) + V(Y) ± (From I) = V(X) + V(Y) Correlation Under Independence Condition If X and Y are two independent r.v. s then but the converse is not true in general. ρ X,Y = Corr(X, Y) = ;

82 Example 6.4 Let the joint probability density function of X and Y is X values I. Are X and Y independent. II. Find Cov(X, Y) and ρ X,Y. I. Since, f(, ) = 6 - f(y) Y values - f(x) = = = = f X( )f Y ( ). Therefore, X and Y are dependent. II. Cov(X, Y) = E(XY) E(X)E(Y) E(X) = E(Y) = =. E(XY) = =. Thus, Cov(X, Y) = =. Therefore, ρ X,Y =. 6 Note Cov(X, Y) = ρ X,Y = even that X and Y are dependent. 76
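The point of Example 6.4, zero covariance without independence, is easy to reproduce numerically. The sketch below (assuming NumPy) uses the classic construction with X uniform on {-1, 0, 1} and Y = X², which may differ in detail from the table of the example:

import numpy as np

x_vals = np.array([-1, 0, 1])
y_vals = np.array([0, 1])
# joint pmf of (X, Y) with Y = X^2: rows X = -1, 0, 1; columns Y = 0, 1
f = np.array([[0.0, 1/3],
              [1/3, 0.0],
              [0.0, 1/3]])

fx = f.sum(axis=1)
fy = f.sum(axis=0)
EX = (x_vals * fx).sum()                       # 0
EY = (y_vals * fy).sum()                       # 2/3
EXY = (np.outer(x_vals, y_vals) * f).sum()     # 0
print(EXY - EX * EY)                           # Cov(X, Y) = 0
print(f[0, 1], fx[0] * fy[1])                  # 1/3 vs 2/9: f(x, y) != f(x)f(y), so X and Y are dependent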

83 Chapter Seven Distributions of Functions of Random Variables (Transformations) In this chapter, we will study how to find the distribution of a function of a random variable with known distribution, which is called transformations of variables. 7. Discrete Case 7.. The Case of One Variable Suppose that X is a discrete random variable with probability function f(x). If g(x)is a function of x, and Y is a random variable defined by the equation Y = g(x), then Y is a discrete random variable with probability function f(y) = f(x) y=g(x) - given a value of y, find all values of x for which y = g(x), (say, g(x ) = g(x 2 ) = = g(x t ) = y), and then g(y) is the sum of those f(x i ) probabilities. Corollary If X and Y are independent random variables, and g and h are functions, then the random variables g(x) and h(y) are independent. There are two cases I. One-to-one correspondence. II. Not one-to-one correspondence. However, we will focus on the first case. If g is a one-to-one function, then the inverse image of a single value is itself a single value. For instance, g(x) = x 3, this inverse function is the cube root, while g(x) = x 2, this inverse function is the square root which may results in two values. Steps to Obtain f Y (y) for One-To-One Functions I. Compute Y values that corresponding to X values, y = g (x ), g (x 2 ),. II. Find the inverse x = g (y) III. f Y (y) = P(Y = y) = P(g(X) = y) = P(X = g (y)) = f X (g (y)). 77

84 Hence, the pmf of Y is f Y (y) = f X (g (y)); y = g(x ), g(x 2 ),. Example 7. If the r.v. X has pmf f X (x) = x ; x =,2,3,4,5. Find the pmf of the r.v. Y whrer Y = 5 X 3. Note that Y = g(x) = X 3 is a one-to-one function. Thus, x =,2,3,4,5 y = ( 3), (2 3), (3 3), (4 3), (5 3) = 2,,,,2. Y = X 3 g (Y) = X = Y + 3 g (y) = x = y + 3. f Y (y) = f X (g (y)) = f X (y + 3) = y+3 5. Thus, ; Example 7.2 f Y (y) = y+3 ; y = 2,,,,2. 5 If the r.v. X has pmf f X (x) = 3 ; x =,,2. Find the pmf of the r.v. Y whrer Y = X3. Note that Y = g(x) = X 3 is a one-to-one function. Thus, x =,,2 y = 3, 3, 2 3 =,,8. Y = X 3 g 3 (Y) = X = Y f Y (y) = f X (g (y)) = f X (y 3) = 3. Thus, ; f Y (y) = ; y =,,8. 3 = Y 3 g (y) = x = y The Case of Two Variables Suppose the two discrete r.v. s (X, X 2 ) has joint probability function f X X 2 (x, x 2 ) and joint sample space Ω X X 2. Let (Y, Y 2 ) be some function of (X, X 2 ) defined by Y = g (X, X 2 ) and Y 2 = g 2 (X, X 2 ) with the single-valued inverse given by X = g (Y, Y 2 ) and X 2 = g 2 (Y, Y 2 ). Let Ω Y Y 2 be the sample space of Y, Y 2. Then, the joint probability function of (Y, Y 2 ) is given by f Y Y 2 (y, y 2 ) = f X X 2 (g (y, y 2 ), g 2 (y, y 2 )). 78
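This bookkeeping can be automated: given a joint pmf table, the pmf of Y = X1 + X2 is obtained by summing f(x1, x2) over all pairs with x1 + x2 = y, exactly as in Example 7.3 below. A minimal Python sketch (pure Python; the small joint table is hypothetical, not the one used in the example):

from collections import defaultdict

# hypothetical joint pmf f(x1, x2)
f = {(0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
     (1, 0): 0.15, (1, 1): 0.25, (1, 2): 0.20}

fY = defaultdict(float)
for (x1, x2), p in f.items():
    fY[x1 + x2] += p                 # collect all pairs with the same sum y = x1 + x2

print(dict(fY))                      # pmf of Y = X1 + X2
print(sum(fY.values()))              # 1.0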

85 Example 7.3 Let the two r.v. s X, X 2 have a joint probability function as follow x x 2 f(x, x 2 ) Find the pmf of the r.v. Y where Y = X + X 2. x =,,2 & x 2 =,,2,3 y =,,2,3,4,5. We will compute the values of f Y (y) by equivalency as follow f Y (y) = P(Y = y), Thus, f Y () = P(X =, X 2 = ) =.6, f Y () = P(X =, X 2 = ) + P(X =, X 2 = ) = =.5, f Y (2) = P(X =, X 2 = 2) + P(X =, X 2 = ) + P(X = 2, X 2 = ) = =.26, f Y (3) = P(X =, X 2 = 3) + P(X =, X 2 = 2) + +P(X = 2, X 2 = ) = =.27, f Y (4) = P(X =, X 2 = 3) + +P(X = 2, X 2 = 2) =.9 +. =.9, f Y (5) = P(X = 2, X 2 = 3) =.7. Therefore, y f Y (y) Continuous Case There are three techniques to compute the distribution of function of random variable: Method of distribution function. (F(x)) Method of change-of-variable. (One-to-One transformation) Method of moment-generating function. (M X (t)) 79

86 7.2. Distribution Function Method (CDF) Let X,, X n ~f(x,, x n ) and Y = g(x,, X n ). Then we follow the following steps to obtain f Y (y) by using the CDF technique I. Find F X (x). (If it not given) II. Find y rang in terms of x. III. Compute F Y (y) = P(Y y) = P(g(X) y) = P(X g (y)) = F X (g (y)) over the region where Y y. IV. Compute f y (y) = df y (y) (by derivation the CDF). Example 7.4 dy Let the probability density function of a random variable X is 2x ; < x < f X (x) = { ; other wise Use the CDF method to find the probability density function of the random variable Y = 8X 3. x F X (x) = 2x dx = x 2 x = x 2 The rang of y: < x < < x 3 < < 8x 3 < 8 < y < 8. F Y (y) = P(Y y) = P(8X 3 y) = P (X 3 y 8 ) = P (X 2 y 3) = F X ( 2 y 3) = ( 2 2 y 3) = 4 y2 3 ; < y < 8. f y (y) = df y (y) dy Example 7.5 = 2 2 y 3 = y 3 6 ; < y < 8. Let X~Exp(θ) i.e. f X (x) = θe θx ; x. Use the CDF method to find the distribution of the random variable Y = e X. Since X~Exp(θ) then F X (x) = e θx. The rang of y: x e e x e y. F Y (y) = P(Y y) = P(e X y) = P(ln(e X ) ln(y)) = P(X ln(y)) = F X (ln(y)) = e θ ln(y) = e ln(y θ) = y θ ; y. 8

87 f y (y) = df y (y) dy Example 7.6 = θy (θ+) ; y. Let X~N(μ, σ). Use the CDF method to find the distribution of the random variable Z = X μ σ. f X (x) = σ 2π e 2σ (X μ)2. Note At this example we notice that it is difficult to compute F X (x), therefore, we will use the differentiation that d dy F Y(y) = f Y (y). The rang of z: < x < μ σ F Z (z) = P(Z z) = P ( X μ σ f Z (z) = df Z (z) dz f Z (z) = σ. = df X (σz+μ) dz < x μ σ < μ σ < z <. z) = P(X σz + μ) = F X(σz + μ) = df X (σz+μ) dx dx dz = f x (σz + μ) σ, (by using the chain rule) thus, i.e. Z~N(,). σ 2π e 2σ (σz+μ μ)2 = Change-of-Variable Method One Variable Definition 2π e 2σ (σz)2 = 2π e 2 z2 ; Z ; Let X be a continuous random variable with probability density function f (x) defined over the rang c < x < c 2, and, let Y = g(x) be an invertible function of X with inverse function X = g (Y). Then, using the change-of-variable technique, the probability density function of Y is f Y (y) = f X (g (y)) dg (y) dy defined over the rang g (c ) < y < g (c 2 ). Example 7.7 Use the change-of-variable method to find the distribution of the random variable Y in Example

88 g (y) = y 3 2. The rang of y: < x < < x 3 < < 8x 3 < 8 < y < 8. f Y (y) = f X (g (y)) dg (y) = f dy X ( y 3 Example 7.8 = 6 y 3 ; < y < 8. 3 ) 2 6 y 2 3 = 2 ( y ) 2 6 y 2 3 If X~Uniform(2,5). Use the change-of-variable method to find the distribution of the random variable Y = f X (x) = 3. Y = X +X g (y) = X +X. Y + YX = X Y = X YX Y = X( Y) X = Y Y. Hence, Y Y. The rang of y: 2 < x < 5 3 < + x < < x +x < < y < 5 6. f Y (y) = f X (g (y)) dg (y) = f dy X ( y ) y ( y) 2 = ( y) 2 3 = ( y) 2 3 ; 2 3 < y < 5 6. Example 7.9 Let X~Exp(λ) i.e. f X (x) = λe λx ; x. Use the change-of-variable method to find the distribution of the random variable Y = Xβ. Y = Xβ X = Y β. Hence, g (y) = Y β. The rang of y: x β xβ β y. f Y (y) = f X (g (y)) dg (y) = f dy X (y β ) βy β = λβy β e λyβ ; y >, β > ; i.e. Y~Weibull Distribution. 82
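The change-of-variable results above can also be checked by simulation. The sketch below (assuming NumPy) revisits Example 7.7: since F_X(x) = x², X can be generated as the square root of a uniform variate, and the simulated density of Y = 8X³ is compared with the derived pdf f_Y(y) = y^(−1/3)/6 on (0, 8):

import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=200_000)
x = np.sqrt(u)                               # F_X(x) = x^2, so X = sqrt(U) has pdf 2x on (0, 1)
y = 8 * x ** 3                               # the transformation of Example 7.7

# compare an empirical histogram with the derived density y^(-1/3)/6
hist, edges = np.histogram(y, bins=20, range=(0, 8), density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
print(np.round(hist[:5], 3))
print(np.round(mid[:5] ** (-1 / 3) / 6, 3))  # should be close to the histogram values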

89 Two Variables Definition Suppose the two contiuous r.v. s (X, X 2 ) has joint probability function f X X 2 (x, x 2 ) and joint sample space Ω X X 2. Let (Y, Y 2 ) be some function of (X, X 2 ) defined by Y = g (X, X 2 ) and Y 2 = g 2 (X, X 2 ) with the single-valued inverse given by X = g (Y, Y 2 ) and X 2 = g 2 (Y, Y 2 ). Let Ω Y Y 2 be the sample space of Y, Y 2. Then, we usually find Ω Y Y 2 by considering the image of Ω X X 2 under the transformation (Y, Y 2 ). The joint pdf Y and Y 2 is f Y Y 2 (y, y2) = J f X X 2 (g (y, y 2 ),g 2 (y, y 2 )), where J refers to the absolute value of the Jacobian "J" which is given by Example 7. J = x x y y 2 x 2 x 2 y y 2 = g (y,y 2 ) g (y,y 2 ) y y 2 g 2 (y,y 2 ) g 2 (y,y 2 ) y y 2 Let X and X 2 are two independent random variables having exponential distributions with parameters λ and λ 2 respectively. Find the distribution of Y = X + X 2 and Y 2 = X X +X 2. f X (x ) = λ e λ x ; x and f X2 (x 2 ) = λ 2 e λ 2x 2 ; x 2. Since X and X 2 are independent, hence f(x, x 2 ) = λ e λ x. λ 2 e λ 2x 2 = λ λ 2 e (λ x +λ 2 x 2 ) Y = X + X 2 & Y 2 = X X +X 2 X = Y Y 2 X 2 = Y Y Y 2 = Y ( Y 2 ). Hence, g (y, y 2 ) = y y 2 & g 2 (y, y 2 ) = y ( y 2 ). The rang of y & y 2 : x x + x 2 y 2 ; y y 2 y. J = x x y y 2 x 2 x 2 y y 2 y = y 2 y = y y 2 y ( y 2 ) = y. y 2 f Y Y 2 (y, y2) = J f X X 2 (g (y, y 2 ),g 2 (y, y 2 )) = y f (y y 2, y ( y 2 )) 83

90 Example 7. = λ λ 2 y e (λ y y 2 +λ 2 y ( y 2 )) = λ λ 2 y e [(λ λ 2 )y y 2 +λ 2 y ] ; y, y 2. Let X and X 2 are two independent random variables having Gamma distributions with parameters α = 2 and β =. Find the distribution of X + X 2. f X (x ) = x e x ; x and f X2 (x 2 ) = x 2 e x 2 ; x 2. Since X and X 2 are independent, hence f(x, x 2 ) = x e x. x 2 e x 2 = x x 2 e (x +x 2 ). Let Y = X & Y 2 = X + X 2 X = Y. Also, X 2 = Y 2 Y. Hence, g (y, y 2 ) = y & g 2 (y, y 2 ) = y 2 y. The rang of y & y 2 : x = y & x 2 = y 2 y y y 2. J = x x y y 2 x 2 x 2 y = = =. y 2 f Y Y 2 (y, y2) = J f(g (y, y 2 ),g 2 (y, y 2 )) = f(y, y 2 y ) = y (y 2 y )e (y +y 2 y ) = y (y 2 y )e y 2 ; y 2 y. y f Y2 (y 2 ) = 2 y (y 2 y )e y 2 dy = e y 2 ( y 2 y 2 2 i.e. Y 2 = X + X 2 ~ Gamma(4,). y 3 ) y 2 3 = e y 2 [( y 2 3 = e y y 2 2 (y y 2 y 2 ) dy y Moment-Generating Function Method 3 ) ] = y e y 2; y 2, Let X & Y are two random variables where M X (t), M Y (t) exist and equal, then, depending on the uniqueness of the moment generating function of a random variable X and Y have the same distribution. 84

91 Properties (From chapter two) If Y = ax + b, then M Y (t) = e bt M X (at). If X and Y are two independent r.v s, then M X+Y (t) = M X (t)m Y (t). Example 7.2 (Sum of Independent Gammas). Let X i ~Gamma(α i, β) i =,, n. Independent random variables. Use the moment generating function to find the distribution of Y = M Xi (t) = ( β β t )α i, i =,, n. n i= X i. M y (t) = M n i= Xi (t) = E(e t(x + +X n ) ) = E(e tx e tx n) = E(e tx ) E(e tx n) = M X (t) M Xn (t) = ( β β t )α ( β β t )α n = ( β β t ) n Thus, Y = i= X i ~ Gamma( i= α i, β). n n i= α i Example 7.3 (Linear Function of Independent Normal r.v s). Let X i ~ N(μ i, σ i 2 ) ; i =,, n. Independent random variables. Use the moment generating function to find the distribution of Y = i= a i X i. M Xi (t) = e μ it+ σ 2 i t 2 2, i =,, n. M y (t) = M n i= ai X i (t) = E(e t(a X + +a n X n ) ) = E(e ta X e ta nx n ) = E(e ta X ) E(e ta nx n ) = M X (a t) M Xn (a n t) = e μ a t+ σ 2 a 2 t 2 2 e μ na n t+ σ n 2 a 2 nt 2 2 = e t n μ ia i n Thus, Y = n i= a i X i ~ N ( i= α i μ i, σ 2 2 i a i n i= ). n i= + t2 2 n σ i 2 2 i= a i. Example 7.4 Use the moment generating function to find the distribution of Z 2 where Z~N(,). f Z (z) = z 2 2π e 2 ; < z <. M Z 2(t) = E(e tz2 ) = e tz2. z 2 2π e 2 dz = 2t 2π e ( 2 )z2 dz 85

92 = 2 2π 2t e ( 2 )z2 dz (Symmetric about ) Let u = z 2 z = u /2, dz = 2 u 2 du and u. Hence M Z 2(t) = 2 2π = Γ( 2 )( 2 2t ) 2 2π 2t e ( 2 )u. 2 u 2du = ( 2t) 2. Thus, Z 2 ~ Gamma ( 2, 2 ). Γ( 2 )( 2 2t ) 2 = 2π u 2e ( 2t 2 )u du u 2e ( 2t 2 )u du = Γ( 2 )( 2 2t ) 2 2π. () = π( 2 2t ) 2 2π Example 7.5 (Sum of two exponential r.v s). Let X, X 2 are two independent random variables have the same exponential distribution with parameter θ. f Xi (x i ) = θe θx i, x i. Use the moment generating function to find the distribution of X + X 2. M Xi (t) = θ θ t. M U (t) = M X +X 2 (t) = M X (t)m X2 (t) = θ θ t. θ θ t = ( θ θ t )2. Thus, X + X 2 ~Gamma(2, θ). 86
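The MGF results of Examples 7.14 and 7.15 can be confirmed by simulation as well. A minimal Python sketch (assuming NumPy/SciPy): the squared standard normal should match a chi-squared distribution with one degree of freedom, i.e. Gamma(1/2, 1/2), and the sum of two independent Exp(θ) variables should match Gamma(2, θ).

import numpy as np
from scipy.stats import chi2, gamma, kstest

rng = np.random.default_rng(1)

z2 = rng.standard_normal(100_000) ** 2            # Z^2 with Z ~ N(0, 1)
print(kstest(z2, chi2(df=1).cdf).statistic)       # small value: consistent with chi-squared(1) = Gamma(1/2, 1/2)

theta = 2.0
s = rng.exponential(scale=1 / theta, size=(100_000, 2)).sum(axis=1)   # X1 + X2 with Xi ~ Exp(theta)
print(kstest(s, gamma(a=2, scale=1 / theta).cdf).statistic)           # small value: consistent with Gamma(2, theta)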

Appendix A

Table 4.2: Lower critical values for the chi-square distribution, indexed by degrees of freedom (df) and lower-tail probability α.

Table 4.3: Area to the left of the Z score for the standard normal distribution.


Algorithms for Uncertainty Quantification Algorithms for Uncertainty Quantification Tobias Neckel, Ionuț-Gabriel Farcaș Lehrstuhl Informatik V Summer Semester 2017 Lecture 2: Repetition of probability theory and statistics Example: coin flip Example

More information

Brief Review of Probability

Brief Review of Probability Maura Department of Economics and Finance Università Tor Vergata Outline 1 Distribution Functions Quantiles and Modes of a Distribution 2 Example 3 Example 4 Distributions Outline Distribution Functions

More information

STA 2023 EXAM-2 Practice Problems From Chapters 4, 5, & Partly 6. With SOLUTIONS

STA 2023 EXAM-2 Practice Problems From Chapters 4, 5, & Partly 6. With SOLUTIONS STA 2023 EXAM-2 Practice Problems From Chapters 4, 5, & Partly 6 With SOLUTIONS Mudunuru Venkateswara Rao, Ph.D. STA 2023 Fall 2016 Venkat Mu ALL THE CONTENT IN THESE SOLUTIONS PRESENTED IN BLUE AND BLACK

More information

HW1 (due 10/6/05): (from textbook) 1.2.3, 1.2.9, , , (extra credit) A fashionable country club has 100 members, 30 of whom are

HW1 (due 10/6/05): (from textbook) 1.2.3, 1.2.9, , , (extra credit) A fashionable country club has 100 members, 30 of whom are HW1 (due 10/6/05): (from textbook) 1.2.3, 1.2.9, 1.2.11, 1.2.12, 1.2.16 (extra credit) A fashionable country club has 100 members, 30 of whom are lawyers. Rumor has it that 25 of the club members are liars

More information

System Simulation Part II: Mathematical and Statistical Models Chapter 5: Statistical Models

System Simulation Part II: Mathematical and Statistical Models Chapter 5: Statistical Models System Simulation Part II: Mathematical and Statistical Models Chapter 5: Statistical Models Fatih Cavdur fatihcavdur@uludag.edu.tr March 20, 2012 Introduction Introduction The world of the model-builder

More information

Topic 2: Probability & Distributions. Road Map Probability & Distributions. ECO220Y5Y: Quantitative Methods in Economics. Dr.

Topic 2: Probability & Distributions. Road Map Probability & Distributions. ECO220Y5Y: Quantitative Methods in Economics. Dr. Topic 2: Probability & Distributions ECO220Y5Y: Quantitative Methods in Economics Dr. Nick Zammit University of Toronto Department of Economics Room KN3272 n.zammit utoronto.ca November 21, 2017 Dr. Nick

More information

Random Variables Example:

Random Variables Example: Random Variables Example: We roll a fair die 6 times. Suppose we are interested in the number of 5 s in the 6 rolls. Let X = number of 5 s. Then X could be 0, 1, 2, 3, 4, 5, 6. X = 0 corresponds to the

More information

Polytechnic Institute of NYU MA 2212 MIDTERM Feb 12, 2009

Polytechnic Institute of NYU MA 2212 MIDTERM Feb 12, 2009 Polytechnic Institute of NYU MA 2212 MIDTERM Feb 12, 2009 Print Name: Signature: Section: ID #: Directions: You have 55 minutes to answer the following questions. You must show all your work as neatly

More information

Review 1: STAT Mark Carpenter, Ph.D. Professor of Statistics Department of Mathematics and Statistics. August 25, 2015

Review 1: STAT Mark Carpenter, Ph.D. Professor of Statistics Department of Mathematics and Statistics. August 25, 2015 Review : STAT 36 Mark Carpenter, Ph.D. Professor of Statistics Department of Mathematics and Statistics August 25, 25 Support of a Random Variable The support of a random variable, which is usually denoted

More information

Discrete Random Variable

Discrete Random Variable Discrete Random Variable Outcome of a random experiment need not to be a number. We are generally interested in some measurement or numerical attribute of the outcome, rather than the outcome itself. n

More information

More on Distribution Function

More on Distribution Function More on Distribution Function The distribution of a random variable X can be determined directly from its cumulative distribution function F X. Theorem: Let X be any random variable, with cumulative distribution

More information

Discrete Distributions

Discrete Distributions Chapter 2 Discrete Distributions 2.1 Random Variables of the Discrete Type An outcome space S is difficult to study if the elements of S are not numbers. However, we can associate each element/outcome

More information

STAT 418: Probability and Stochastic Processes

STAT 418: Probability and Stochastic Processes STAT 418: Probability and Stochastic Processes Spring 2016; Homework Assignments Latest updated on April 29, 2016 HW1 (Due on Jan. 21) Chapter 1 Problems 1, 8, 9, 10, 11, 18, 19, 26, 28, 30 Theoretical

More information

Random Variables and Their Distributions

Random Variables and Their Distributions Chapter 3 Random Variables and Their Distributions A random variable (r.v.) is a function that assigns one and only one numerical value to each simple event in an experiment. We will denote r.vs by capital

More information

7 Random samples and sampling distributions

7 Random samples and sampling distributions 7 Random samples and sampling distributions 7.1 Introduction - random samples We will use the term experiment in a very general way to refer to some process, procedure or natural phenomena that produces

More information

Lecture 10: Probability distributions TUESDAY, FEBRUARY 19, 2019

Lecture 10: Probability distributions TUESDAY, FEBRUARY 19, 2019 Lecture 10: Probability distributions DANIEL WELLER TUESDAY, FEBRUARY 19, 2019 Agenda What is probability? (again) Describing probabilities (distributions) Understanding probabilities (expectation) Partial

More information

Recitation 2: Probability

Recitation 2: Probability Recitation 2: Probability Colin White, Kenny Marino January 23, 2018 Outline Facts about sets Definitions and facts about probability Random Variables and Joint Distributions Characteristics of distributions

More information

Joint Distribution of Two or More Random Variables

Joint Distribution of Two or More Random Variables Joint Distribution of Two or More Random Variables Sometimes more than one measurement in the form of random variable is taken on each member of the sample space. In cases like this there will be a few

More information

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed

More information

Actuarial Science Exam 1/P

Actuarial Science Exam 1/P Actuarial Science Exam /P Ville A. Satopää December 5, 2009 Contents Review of Algebra and Calculus 2 2 Basic Probability Concepts 3 3 Conditional Probability and Independence 4 4 Combinatorial Principles,

More information

Conditional Probability

Conditional Probability Conditional Probability Idea have performed a chance experiment but don t know the outcome (ω), but have some partial information (event A) about ω. Question: given this partial information what s the

More information

Fundamental Tools - Probability Theory II

Fundamental Tools - Probability Theory II Fundamental Tools - Probability Theory II MSc Financial Mathematics The University of Warwick September 29, 2015 MSc Financial Mathematics Fundamental Tools - Probability Theory II 1 / 22 Measurable random

More information

Notes for Math 324, Part 17

Notes for Math 324, Part 17 126 Notes for Math 324, Part 17 Chapter 17 Common discrete distributions 17.1 Binomial Consider an experiment consisting by a series of trials. The only possible outcomes of the trials are success and

More information

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed

More information

Stochastic Models in Computer Science A Tutorial

Stochastic Models in Computer Science A Tutorial Stochastic Models in Computer Science A Tutorial Dr. Snehanshu Saha Department of Computer Science PESIT BSC, Bengaluru WCI 2015 - August 10 to August 13 1 Introduction 2 Random Variable 3 Introduction

More information

Discrete Random Variables

Discrete Random Variables Discrete Random Variables An Undergraduate Introduction to Financial Mathematics J. Robert Buchanan 2014 Introduction The markets can be thought of as a complex interaction of a large number of random

More information

1 Review of Probability and Distributions

1 Review of Probability and Distributions Random variables. A numerically valued function X of an outcome ω from a sample space Ω X : Ω R : ω X(ω) is called a random variable (r.v.), and usually determined by an experiment. We conventionally denote

More information

Class 26: review for final exam 18.05, Spring 2014

Class 26: review for final exam 18.05, Spring 2014 Probability Class 26: review for final eam 8.05, Spring 204 Counting Sets Inclusion-eclusion principle Rule of product (multiplication rule) Permutation and combinations Basics Outcome, sample space, event

More information

MAT 271E Probability and Statistics

MAT 271E Probability and Statistics MAT 71E Probability and Statistics Spring 013 Instructor : Class Meets : Office Hours : Textbook : Supp. Text : İlker Bayram EEB 1103 ibayram@itu.edu.tr 13.30 1.30, Wednesday EEB 5303 10.00 1.00, Wednesday

More information

S n = x + X 1 + X X n.

S n = x + X 1 + X X n. 0 Lecture 0 0. Gambler Ruin Problem Let X be a payoff if a coin toss game such that P(X = ) = P(X = ) = /2. Suppose you start with x dollars and play the game n times. Let X,X 2,...,X n be payoffs in each

More information

Some Famous Discrete Distributions. 1. Uniform 2. Bernoulli 3. Binomial 4. Negative Binomial 5. Geometric 6. Hypergeometric 7.

Some Famous Discrete Distributions. 1. Uniform 2. Bernoulli 3. Binomial 4. Negative Binomial 5. Geometric 6. Hypergeometric 7. Some Famous Discrete Distributions. Uniform 2. Bernoulli 3. Binomial 4. Negative Binomial 5. Geometric 6. Hypergeometric 7. Poisson 5.2: Discrete Uniform Distribution: If the discrete random variable X

More information

Lecture 3 Probability Basics

Lecture 3 Probability Basics Lecture 3 Probability Basics Thais Paiva STA 111 - Summer 2013 Term II July 3, 2013 Lecture Plan 1 Definitions of probability 2 Rules of probability 3 Conditional probability What is Probability? Probability

More information

STAT 3610: Review of Probability Distributions

STAT 3610: Review of Probability Distributions STAT 3610: Review of Probability Distributions Mark Carpenter Professor of Statistics Department of Mathematics and Statistics August 25, 2015 Support of a Random Variable Definition The support of a random

More information

8 Laws of large numbers

8 Laws of large numbers 8 Laws of large numbers 8.1 Introduction We first start with the idea of standardizing a random variable. Let X be a random variable with mean µ and variance σ 2. Then Z = (X µ)/σ will be a random variable

More information

1 Random Variable: Topics

1 Random Variable: Topics Note: Handouts DO NOT replace the book. In most cases, they only provide a guideline on topics and an intuitive feel. 1 Random Variable: Topics Chap 2, 2.1-2.4 and Chap 3, 3.1-3.3 What is a random variable?

More information

Dept. of Linguistics, Indiana University Fall 2015

Dept. of Linguistics, Indiana University Fall 2015 L645 Dept. of Linguistics, Indiana University Fall 2015 1 / 34 To start out the course, we need to know something about statistics and This is only an introduction; for a fuller understanding, you would

More information

Chapter 3. Discrete Random Variables and Their Probability Distributions

Chapter 3. Discrete Random Variables and Their Probability Distributions Chapter 3. Discrete Random Variables and Their Probability Distributions 1 3.4-3 The Binomial random variable The Binomial random variable is related to binomial experiments (Def 3.6) 1. The experiment

More information

Part (A): Review of Probability [Statistics I revision]

Part (A): Review of Probability [Statistics I revision] Part (A): Review of Probability [Statistics I revision] 1 Definition of Probability 1.1 Experiment An experiment is any procedure whose outcome is uncertain ffl toss a coin ffl throw a die ffl buy a lottery

More information

MATH Notebook 5 Fall 2018/2019

MATH Notebook 5 Fall 2018/2019 MATH442601 2 Notebook 5 Fall 2018/2019 prepared by Professor Jenny Baglivo c Copyright 2004-2019 by Jenny A. Baglivo. All Rights Reserved. 5 MATH442601 2 Notebook 5 3 5.1 Sequences of IID Random Variables.............................

More information

Lecture 4: Probability and Discrete Random Variables

Lecture 4: Probability and Discrete Random Variables Error Correcting Codes: Combinatorics, Algorithms and Applications (Fall 2007) Lecture 4: Probability and Discrete Random Variables Wednesday, January 21, 2009 Lecturer: Atri Rudra Scribe: Anonymous 1

More information

Math st Homework. First part of Chapter 2. Due Friday, September 17, 1999.

Math st Homework. First part of Chapter 2. Due Friday, September 17, 1999. Math 447. 1st Homework. First part of Chapter 2. Due Friday, September 17, 1999. 1. How many different seven place license plates are possible if the first 3 places are to be occupied by letters and the

More information

Introduction to Probability Theory for Graduate Economics Fall 2008

Introduction to Probability Theory for Graduate Economics Fall 2008 Introduction to Probability Theory for Graduate Economics Fall 008 Yiğit Sağlam October 10, 008 CHAPTER - RANDOM VARIABLES AND EXPECTATION 1 1 Random Variables A random variable (RV) is a real-valued function

More information

Expectations. Definition Let X be a discrete rv with set of possible values D and pmf p(x). The expected value or mean value of X, denoted by E(X ) or

Expectations. Definition Let X be a discrete rv with set of possible values D and pmf p(x). The expected value or mean value of X, denoted by E(X ) or Expectations Expectations Definition Let X be a discrete rv with set of possible values D and pmf p(x). The expected value or mean value of X, denoted by E(X ) or µ X, is E(X ) = µ X = x D x p(x) Expectations

More information

Chapter 2 Class Notes

Chapter 2 Class Notes Chapter 2 Class Notes Probability can be thought of in many ways, for example as a relative frequency of a long series of trials (e.g. flips of a coin or die) Another approach is to let an expert (such

More information