A normal distribution and other density functions involving exponential forms play the most important role in probability and statistics. They are related in a certain way, as summarized in a diagram later in this topic.

Exponential distribution. The exponential density function is defined as
\[ f(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0; \\ 0 & x < 0, \end{cases} \]
where $\lambda > 0$ is called a rate parameter. Then the cdf is computed as
\[ F(t) = \begin{cases} 1 - e^{-\lambda t} & t \ge 0; \\ 0 & t < 0. \end{cases} \]

Example. Suppose that the number of wins in a slot machine for an hour is distributed with a Poisson distribution with parameter $\lambda$. (a) If you play for $t$ hours, what is the distribution for the number of wins in $t$ hours? (b) Let $X$ be the exact time of your first win. Find $P(X > t)$. (c) What is the distribution for $X$?

Solution. (a) It is again a Poisson distribution, with parameter $\lambda t$. (b) Let $Y$ be the number of wins in $t$ hours. Then we have $P(X > t) = P(Y = 0)$. Thus,
\[ P(X > t) = e^{-\lambda t} \frac{(\lambda t)^0}{0!} = e^{-\lambda t}. \]
(c) Since $F(t) = P(X \le t) = 1 - P(X > t) = 1 - e^{-\lambda t}$, $X$ must be exponentially distributed.

Survival function and memoryless property. The function $S(t) = 1 - F(t) = P(X > t)$ is called the survival function. When $F(t)$ is an exponential distribution, we have $S(t) = e^{-\lambda t}$ for $t \ge 0$. Furthermore, we can find that
\[ P(X > t + s \mid X > s) = \frac{P(X > t + s)}{P(X > s)} = \frac{S(t+s)}{S(s)} = S(t) = P(X > t), \tag{4.1} \]
which is referred to as the memoryless property of the exponential distribution.

Example. The service time (in minutes) at a post office is exponentially distributed with $\lambda = 0.2$. (a) Suppose that no one is there, and that you get the service immediately. Find the probability that it takes more than 5 minutes to complete the business. (b) Suppose that someone else is in service ahead of you, and that you are next in line. Find the probability that you wait more than 5 minutes.

Solution. Let $X$ be the service time. (a) We obtain $P(X > 5) = 1 - P(X \le 5) = 1 - F(5) = e^{-1} \approx 0.37$.

Probability and Statistics I/November, 7
(b) The service of the person ahead of you started $s$ minutes ago. By the memoryless property,
\[ P(X > 5 + s \mid X > s) = P(X > 5) \approx 0.37. \]

Gamma distribution. The gamma function, denoted by $\Gamma(\alpha)$, is defined as
\[ \Gamma(\alpha) = \int_0^\infty u^{\alpha-1} e^{-u}\,du, \quad \alpha > 0. \]
Then the gamma density is defined as
\[ f(t) = \frac{\lambda^\alpha}{\Gamma(\alpha)}\, t^{\alpha-1} e^{-\lambda t}, \quad t \ge 0, \]
which depends on two parameters $\alpha > 0$ and $\lambda > 0$. [The textbook uses a parameter $\beta = 1/\lambda$ instead.]

Properties of gamma function.
\[ \Gamma(1) = \int_0^\infty e^{-u}\,du = 1 \]
\[ \Gamma(\alpha) = \int_0^\infty u^{\alpha-1} e^{-u}\,du = \left[ -e^{-u} u^{\alpha-1} \right]_{u=0}^{u=\infty} + (\alpha-1) \int_0^\infty u^{\alpha-2} e^{-u}\,du = (\alpha-1)\,\Gamma(\alpha-1) \]
Suppose that the parameter $\alpha$ is a positive integer $n$. Then we have
\[ \Gamma(n) = (n-1)\,\Gamma(n-1) = (n-1)(n-2)\cdots\Gamma(1) = (n-1)! \]

The mgf of gamma distribution. Let $X$ be a gamma random variable with parameter $(\alpha, \lambda)$. For $t < \lambda$, the mgf $M_X(t)$ of $X$ can be computed as follows.
\begin{align*}
M_X(t) &= \int_0^\infty e^{tx}\, \frac{\lambda^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\lambda x}\,dx
= \frac{\lambda^\alpha}{\Gamma(\alpha)} \int_0^\infty x^{\alpha-1} e^{-(\lambda-t)x}\,dx \\
&= \frac{\lambda^\alpha}{(\lambda-t)^\alpha\, \Gamma(\alpha)} \int_0^\infty \big((\lambda-t)x\big)^{\alpha-1} e^{-(\lambda-t)x}\, (\lambda-t)\,dx
= \left( \frac{\lambda}{\lambda-t} \right)^{\!\alpha} \frac{1}{\Gamma(\alpha)} \int_0^\infty u^{\alpha-1} e^{-u}\,du
= \left( \frac{\lambda}{\lambda-t} \right)^{\!\alpha}
\end{align*}
If you use $\lambda = 1/\beta$ then we obtain $M_X(t) = (1 - \beta t)^{-\alpha}$.

Parameters of gamma/exponential distribution. We call the parameter $\alpha$ a shape parameter, because changing $\alpha$ changes the shape of the density. We call the parameter $\lambda$ a rate parameter, because changing $\lambda$ merely rescales the density without changing its shape. In particular, the gamma distribution with $\alpha = 1$ becomes an exponential distribution (with parameter $\lambda$).
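As a quick numerical companion to the exponential material above, here is a sketch in plain Python reusing $\lambda = 0.2$ and $t = 5$ from the post-office example; it also confirms the Poisson connection $P(X > t) = P(Y = 0)$ from the first example.

```python
import math

# Rate and time taken from the post-office example
lam, t = 0.2, 5.0

# Exponential survival function: S(t) = e^{-lam * t}
surv = math.exp(-lam * t)

# Probability of zero Poisson(lam * t) wins in t hours: e^{-lam*t}(lam*t)^0/0!
poisson_zero = math.exp(-lam * t) * (lam * t) ** 0 / math.factorial(0)

print(round(surv, 4))        # 0.3679, i.e. e^{-1} as in the example
print(surv == poisson_zero)  # True: the two computations agree
```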
The parameter $\beta = 1/\lambda$, if you choose, is called a scale parameter.

Property of scale parameter. Let $X$ be a service time (in minutes) having a gamma distribution. If we change the unit of measurement from minutes to seconds then $(60)X$ has a gamma distribution with new scale parameter $(60)\beta$. While the scale parameter changes from $\beta$ to $(60)\beta$, the rate parameter changes from $\lambda$ to $\lambda/60$.

Theorem 3. If $X$ has a gamma distribution with parameters $\alpha$ and $\beta = 1/\lambda$ then $Y = kX$ has a gamma distribution with parameters $\alpha$ and $k\beta$.

Solution. We obtain
\[ M_Y(t) = M_X(kt) = [1 - \beta(kt)]^{-\alpha} = [1 - (k\beta)t]^{-\alpha}. \]
Thus, $Y$ has the mgf of a gamma distribution with parameters $\alpha$ and $k\beta$.

Moments of gamma distribution. We can find the first and second moments
\[ E[X] = M_X'(0) = \frac{\alpha}{\lambda} \quad\text{and}\quad E[X^2] = M_X''(0) = \frac{\alpha(\alpha+1)}{\lambda^2}. \]
Thus, we can obtain
\[ \mathrm{Var}(X) = \frac{\alpha(\alpha+1)}{\lambda^2} - \left(\frac{\alpha}{\lambda}\right)^2 = \frac{\alpha}{\lambda^2}. \]
If you use $\beta = 1/\lambda$ then $E[X] = \alpha\beta$ and $\mathrm{Var}(X) = \alpha\beta^2$.

Expectation by direct calculation. Let $X$ be a gamma random variable with parameter $(\alpha, \lambda)$. Then we can calculate $E[X]$ and $E[X^2]$ as follows.
\[ E[X] = \int_0^\infty x\, \frac{\lambda^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\lambda x}\,dx = \frac{1}{\lambda\,\Gamma(\alpha)} \int_0^\infty (\lambda x)^{\alpha} e^{-\lambda x}\, \lambda\,dx = \frac{1}{\lambda\,\Gamma(\alpha)} \int_0^\infty u^{\alpha} e^{-u}\,du = \frac{\Gamma(\alpha+1)}{\lambda\,\Gamma(\alpha)} = \frac{\alpha}{\lambda} \]

Variance by direct calculation. We have
\[ E[X^2] = \int_0^\infty x^2\, \frac{\lambda^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\lambda x}\,dx = \frac{1}{\lambda^2\,\Gamma(\alpha)} \int_0^\infty (\lambda x)^{\alpha+1} e^{-\lambda x}\, \lambda\,dx = \frac{1}{\lambda^2\,\Gamma(\alpha)} \int_0^\infty u^{\alpha+1} e^{-u}\,du = \frac{\Gamma(\alpha+2)}{\lambda^2\,\Gamma(\alpha)} = \frac{\alpha(\alpha+1)}{\lambda^2} \]
Thus, we can obtain
\[ \mathrm{Var}(X) = \frac{\alpha(\alpha+1)}{\lambda^2} - \frac{\alpha^2}{\lambda^2} = \frac{\alpha}{\lambda^2}. \]
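These moment formulas are easy to check by simulation. A Monte Carlo sketch using the standard library's `random.gammavariate`, which is parametrized by shape and scale $\beta = 1/\lambda$ (matching the textbook's convention); the values $\alpha = 2$, $\lambda = 0.5$ and the seed are illustrative choices, not part of the notes.

```python
import random
import statistics

random.seed(1)
alpha, lam = 2.0, 0.5  # illustrative shape and rate

# random.gammavariate takes (shape, scale); the scale is beta = 1/lam
sample = [random.gammavariate(alpha, 1.0 / lam) for _ in range(200_000)]

mean = statistics.fmean(sample)    # should be near alpha/lam = 4
var = statistics.variance(sample)  # should be near alpha/lam^2 = 8
print(abs(mean - 4.0) < 0.1, abs(var - 8.0) < 0.5)  # True True
```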
Sum of independent random variables. Let $X$ and $Y$ be independent gamma random variables with the respective parameters $(\alpha_1, \lambda)$ and $(\alpha_2, \lambda)$. Then the sum $Z = X + Y$ of random variables has the mgf
\[ M_Z(t) = M_X(t)\,M_Y(t) = \left(\frac{\lambda}{\lambda-t}\right)^{\!\alpha_1} \left(\frac{\lambda}{\lambda-t}\right)^{\!\alpha_2} = \left(\frac{\lambda}{\lambda-t}\right)^{\!\alpha_1+\alpha_2}, \]
which is the mgf of a gamma distribution with parameter $(\alpha_1+\alpha_2, \lambda)$. Thus, $Z$ is a gamma random variable with parameter $(\alpha_1+\alpha_2, \lambda)$.

Chi-square distribution. The gamma distribution
\[ f(x) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, x^{n/2-1} e^{-x/2}, \quad x \ge 0 \]
with $\alpha = n/2$ and $\lambda = 1/2$ is called the chi-square distribution with $n$ degrees of freedom. [It plays a vital role later in understanding another important distribution, called the t-distribution.] A chi-square random variable $X$ has the mean $E[X] = n$ and the variance $\mathrm{Var}(X) = 2n$.

Example 4. Let $X$ be a chi-square random variable with $n = 6$. (a) Find the quantiles $Q(0.05)$ and $Q(0.95)$. (b) Let $Y$ be a gamma random variable with $\alpha = 3$ and $\lambda = 1/4$. Then find $P(3.28 < Y < 25.2)$.

Solution. (a) From the table we obtain $Q(0.05) = 1.635$ and $Q(0.95) = 12.59$. (b) Since $X = Y/2$ is a chi-square random variable with $n = 6$, we obtain
\[ P(3.28 < Y < 25.2) = P(1.64 < X < 12.6) = F(12.6) - F(1.64) = 0.95 - 0.05 = 0.90. \]

Normal distribution. A normal distribution is represented by a family of distributions which have the same general shape, sometimes described as bell shaped. The normal distribution has the pdf
\[ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(x-\mu)^2}{2\sigma^2} \right], \tag{4.2} \]
which depends upon two parameters $\mu$ and $\sigma$. In (4.2), $\pi = 3.14159\ldots$ is the famous pi (the ratio of the circumference of a circle to its diameter), and $\exp(u)$ is the exponential function $e^u$ with the base $e = 2.71828\ldots$ of the natural logarithm.
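The table lookup in Example 4 can be double-checked in code. For even degrees of freedom the chi-square cdf has a closed form, because the survival function of a gamma distribution with integer shape is a Poisson tail; the sketch below uses the interval bounds 1.64 and 12.6 on the chi-square scale, as in the solution above.

```python
import math

def chi2_cdf_6df(x: float) -> float:
    """CDF of the chi-square distribution with 6 degrees of freedom.

    Chi-square(6) is Gamma(3, 1/2), and for integer shape k the survival
    function is a Poisson(x/2) tail: P(X > x) = sum_{j<k} e^{-x/2}(x/2)^j/j!
    """
    k, u = 3, x / 2.0
    tail = sum(math.exp(-u) * u ** j / math.factorial(j) for j in range(k))
    return 1.0 - tail

# Example 4(b): P(1.64 < X < 12.6) for X chi-square with 6 df
p = chi2_cdf_6df(12.6) - chi2_cdf_6df(1.64)
print(round(p, 2))  # 0.9, matching the table lookup 0.95 - 0.05
```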
Integrations of normal density functions. Much celebrated integrations are the following:
\[ \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = 1, \tag{4.3} \]
\[ \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} x\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = \mu, \qquad \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} (x-\mu)^2\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = \sigma^2. \]
Equation (4.3) guarantees that the function (4.2) always represents a probability density no matter what values the parameters $\mu$ and $\sigma$ would take.

Parameters of normal distribution. We say that a random variable $X$ is normally distributed with parameter $(\mu, \sigma^2)$ when $X$ has the pdf (4.2). Since $E[X] = \mu$ and $\mathrm{Var}(X) = \sigma^2$, the parameter $\mu$ is called a mean (or, a location parameter), and the parameter $\sigma$ a standard deviation (or, a scale parameter). The parameter $\mu$ provides the center of the distribution, and the density function $f(x)$ is symmetric around $\mu$. Small values of $\sigma$ lead to high peaks but sharp drops, while larger values of $\sigma$ lead to flatter densities. The shorthand notation $X \sim N(\mu, \sigma^2)$ is often used to express that $X$ is a normal random variable with parameter $(\mu, \sigma^2)$ with variance $\sigma^2$.

Linear transformation of normal distribution. One of the important properties of the normal distribution is that a linear transformation of a normal random variable remains normal.

Theorem 5. Assume $a \neq 0$. If $X$ is a normal random variable with parameter $(\mu, \sigma^2)$ [that is, the pdf of $X$ is given by (4.2)], then $Y = aX + b$ is also a normal random variable having parameter $(a\mu + b, (a\sigma)^2)$. In particular,
\[ \frac{X - \mu}{\sigma} \tag{4.4} \]
becomes a normal random variable with parameter $(0, 1)$, called the standard normal distribution.

Proof of linear transformation theorem. Assume that $a > 0$. Since $X$ has the pdf (4.2), we obtain the cdf of $Y$ as follows:
\[ F_Y(t) = P(aX + b \le t) = P\left( X \le \frac{t-b}{a} \right) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\frac{t-b}{a}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = \frac{1}{\sqrt{2\pi}\,a\sigma} \int_{-\infty}^{t} e^{-\frac{(y-a\mu-b)^2}{2a^2\sigma^2}}\,dy. \]
Thus, $Y$ has the pdf of $N(a\mu+b, a^2\sigma^2)$. The case with $a < 0$ is an exercise; see Problem 9.
Standard normal distribution. The normal density $\phi(x)$ with parameter $(0, 1)$ is given by
\[ \phi(x) := \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \]
and the table of the standard normal distribution is used to obtain the values for the cdf
\[ \Phi(t) := \int_{-\infty}^{t} \phi(x)\,dx. \]

How to calculate probabilities. Suppose that a tomato plant height $X$ is normally distributed with parameter $(\mu, \sigma^2)$. Then what is the probability that the tomato plant height is between $a$ and $b$? The integration
\[ P(a \le X \le b) = \int_a^b \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx \]
seems too complicated.

How to calculate probabilities, continued. If we consider the random variable $(X-\mu)/\sigma$, then the event $\{a \le X \le b\}$ is equivalent to the event
\[ \left\{ \frac{a-\mu}{\sigma} \le \frac{X-\mu}{\sigma} \le \frac{b-\mu}{\sigma} \right\}. \]
Let $a^* = \frac{a-\mu}{\sigma}$ and $b^* = \frac{b-\mu}{\sigma}$. Then in terms of probability, this means that
\[ P(a \le X \le b) = P\left( a^* \le \frac{X-\mu}{\sigma} \le b^* \right) = \int_{a^*}^{b^*} \phi(x)\,dx = \Phi(b^*) - \Phi(a^*). \tag{4.5} \]
Finally look up the values for $\Phi(a^*)$ and $\Phi(b^*)$ in the table of the standard normal distribution, and plug them into (4.5).

Example 6. The tomato plant height is normally distributed with parameter $(5, 16)$ in inches. What is the probability that the height is between 4.24 and 6.16 inches?

Solution. We can calculate $a^* = \frac{4.24 - 5}{4} = -0.19$ and $b^* = \frac{6.16 - 5}{4} = 0.29$. Then find $\Phi(-0.19) = 0.4247$ and $\Phi(0.29) = 0.6141$. Thus, the probability of interest becomes $0.1894$, or approximately $0.19$.

Summary of probability calculations. Suppose that a random variable $X$ is normally distributed with mean $\mu$ and standard deviation $\sigma$. Then we can obtain
(a) $P(a \le X \le b) = \Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)$;
(b) $P(X \le b) = \Phi\left(\frac{b-\mu}{\sigma}\right)$;
(c) $P(a \le X) = 1 - \Phi\left(\frac{a-\mu}{\sigma}\right)$.

The mgf of normal distribution. Let $X$ be a standard normal random variable. Then we can compute the mgf of $X$ as follows.
\[ M_X(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tx}\, e^{-x^2/2}\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{(x-t)^2}{2} + \frac{t^2}{2}}\,dx = e^{t^2/2}\, \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{(x-t)^2}{2}}\,dx = \exp\left( \frac{t^2}{2} \right). \]
Since $X = (Y-\mu)/\sigma$ becomes a standard normal random variable and $Y = \sigma X + \mu$, the mgf $M(t)$ of $Y$ can be given by
\[ M(t) = e^{\mu t}\, M_X(\sigma t) = \exp\left( \frac{\sigma^2 t^2}{2} + \mu t \right). \]

Moments of normal distribution. Then we can find the first and second moments
\[ E[X] = M'(0) = \mu \quad\text{and}\quad E[X^2] = M''(0) = \mu^2 + \sigma^2. \]
Thus, we can obtain $\mathrm{Var}(X) = (\mu^2 + \sigma^2) - \mu^2 = \sigma^2$.

Sum of independent random variables. Let $X$ and $Y$ be independent normal random variables with the respective parameters $(\mu_x, \sigma_x^2)$ and $(\mu_y, \sigma_y^2)$. Then the sum $Z = X + Y$ of random variables has the mgf
\[ M_Z(t) = M_X(t)\,M_Y(t) = \exp\left( \frac{\sigma_x^2 t^2}{2} + \mu_x t \right) \exp\left( \frac{\sigma_y^2 t^2}{2} + \mu_y t \right) = \exp\left( \frac{(\sigma_x^2 + \sigma_y^2)\, t^2}{2} + (\mu_x + \mu_y)\, t \right), \]
which is the mgf of a normal distribution with parameter $(\mu_x + \mu_y, \sigma_x^2 + \sigma_y^2)$. By the property (a) of mgf, we can find that $Z$ is a normal random variable with parameter $(\mu_x + \mu_y, \sigma_x^2 + \sigma_y^2)$.

Relation between normal and chi-square distribution. Let $X$ be a standard normal random variable, and let $Y = X^2$ be the square of $X$. Then we can express the cdf $F_Y$ of $Y$ as
\[ F_Y(t) = P(X^2 \le t) = P(-\sqrt{t} \le X \le \sqrt{t}) = \Phi(\sqrt{t}) - \Phi(-\sqrt{t}) \]
in terms of the standard normal cdf $\Phi$. By differentiating the cdf, we can obtain the pdf $f_Y$ as follows.
\[ f_Y(t) = \frac{d}{dt} F_Y(t) = \frac{1}{\sqrt{2\pi t}}\, e^{-t/2} = \frac{1}{2^{1/2}\,\Gamma(1/2)}\, t^{1/2 - 1} e^{-t/2}, \]
where we use $\Gamma(1/2) = \sqrt{\pi}$. Thus, the square $X^2$ of $X$ is a chi-square random variable with 1 degree of freedom.
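A quick simulation sketch of this last fact: the square of a standard normal variable should have the chi-square(1) moments $E[Y] = 1$ and $\mathrm{Var}(Y) = 2$ (the sample size and seed below are illustrative).

```python
import random
import statistics

random.seed(7)
# Y = X^2 for X standard normal should be chi-square with 1 degree of freedom
squares = [random.gauss(0.0, 1.0) ** 2 for _ in range(200_000)]

mean = statistics.fmean(squares)    # E[Y] = 1 for chi-square(1)
var = statistics.variance(squares)  # Var(Y) = 2 for chi-square(1)
print(abs(mean - 1.0) < 0.05, abs(var - 2.0) < 0.1)  # True True
```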
The relation, continued. Now let $X_1, \ldots, X_n$ be independent and identically distributed random variables (iid random variables) of the standard normal distribution. Since the $X_i^2$'s have the gamma distribution with parameter $(1/2, 1/2)$, the sum $Y = \sum_{i=1}^n X_i^2$ has the gamma distribution with parameter $(n/2, 1/2)$. That is, the sum $Y$ has the chi-square distribution with $n$ degrees of freedom.

Extrema of exponential random variables. Let $X_1, \ldots, X_n$ be independent exponential random variables with the respective parameters $\lambda_1, \ldots, \lambda_n$. Then we can define the random variable $V = \min(X_1, \ldots, X_n)$, the minimum of $X_1, \ldots, X_n$. To find the distribution of $V$, consider the survival function $P(V > t)$ of $V$ and calculate as follows.
\[ P(V > t) = P(X_1 > t, \ldots, X_n > t) = P(X_1 > t) \cdots P(X_n > t) = e^{-\lambda_1 t} \cdots e^{-\lambda_n t} = e^{-(\lambda_1 + \cdots + \lambda_n)\, t}, \]
which is the survival function of the exponential distribution with parameter $\lambda = \lambda_1 + \cdots + \lambda_n$. Thus, $V$ has an exponential distribution with parameter $\lambda = \lambda_1 + \cdots + \lambda_n$.
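A simulation sketch of the minimum property, with illustrative rates $\lambda = (0.5, 1.0, 1.5)$, so the minimum should be exponential with total rate 3 and hence mean 1/3.

```python
import random
import statistics

random.seed(5)
rates = [0.5, 1.0, 1.5]  # illustrative rates; total rate 3.0

# Simulate V = min(X_1, X_2, X_3) with X_i ~ Exponential(rates[i])
mins = [min(random.expovariate(r) for r in rates) for _ in range(200_000)]

# An Exponential(3) random variable has mean 1/3
print(abs(statistics.fmean(mins) - 1.0 / 3.0) < 0.005)  # True
```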
Summary diagram. The four distributions of this topic, with the numbered arrows (1)–(7) of the diagram described below.

Normal$(\mu, \sigma^2)$: $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, $-\infty < x < \infty$; $E[X] = \mu$, $\mathrm{Var}(X) = \sigma^2$.

Chi-square (with $m$ degrees of freedom): $f(x) = \frac{e^{-x/2}}{2\,\Gamma(m/2)} \left( \frac{x}{2} \right)^{m/2-1}$, $x > 0$; $E[X] = m$, $\mathrm{Var}(X) = 2m$.

Exponential$(\lambda)$: $f(x) = \lambda e^{-\lambda x}$, $x > 0$; $E[X] = 1/\lambda$, $\mathrm{Var}(X) = 1/\lambda^2$.

Gamma$(\alpha, \lambda)$: $f(x) = \frac{\lambda^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\lambda x}$, $x > 0$; $E[X] = \alpha/\lambda$, $\mathrm{Var}(X) = \alpha/\lambda^2$. Here $\Gamma(n) = (n-1)!$ when $n$ is a positive integer.

Relationship among random variables. (1) If $X_1$ and $X_2$ are independent normal random variables with respective parameters $(\mu_1, \sigma_1^2)$ and $(\mu_2, \sigma_2^2)$, then $X = X_1 + X_2$ is a normal random variable with $(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$. (2) If $X_1, \ldots, X_m$ are iid normal random variables with parameter $(\mu, \sigma^2)$, then $X = \sum_{k=1}^m \left( \frac{X_k - \mu}{\sigma} \right)^2$ is a chi-square ($\chi^2$) random variable with $m$ degrees of freedom. (3) $\alpha = m/2$ and $\lambda = 1/2$. (4) If $X_1, \ldots, X_n$ are independent exponential random variables with respective parameters $\lambda_1, \ldots, \lambda_n$, then $X = \min\{X_1, \ldots, X_n\}$ is an exponential random variable with parameter $\sum_{i=1}^n \lambda_i$.

Relationship among random variables, continued. (5) If $X_1, \ldots, X_n$ are iid exponential random variables with parameter $\lambda$, then $X = \sum_{k=1}^n X_k$ is a gamma random variable with $(\alpha = n, \lambda)$. (6) $\alpha = 1$. (7) If $X_1$ and $X_2$ are independent random variables having gamma distributions with respective parameters $(\alpha_1, \lambda)$ and $(\alpha_2, \lambda)$, then $X = X_1 + X_2$ is a gamma random variable with parameter $(\alpha_1 + \alpha_2, \lambda)$.
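When no table of the standard normal distribution is at hand, $\Phi$ can be computed from the error function, since $\Phi(z) = \frac{1}{2}\left(1 + \mathrm{erf}(z/\sqrt{2})\right)$. A sketch reproducing Example 6 ($\mu = 5$, $\sigma = 4$, with the bounds as read in that example):

```python
import math

def phi(z: float) -> float:
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Example 6: X ~ N(5, 16); standardize the bounds, then subtract cdf values
mu, sigma = 5.0, 4.0
a_star = (4.24 - mu) / sigma   # -0.19
b_star = (6.16 - mu) / sigma   #  0.29
print(round(phi(b_star) - phi(a_star), 4))  # 0.1894
```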
Exercises

Problem 1. Suppose that the lifetime of an electronic component follows an exponential distribution with rate parameter $\lambda = 0.1$. (a) Find the probability that the lifetime is less than 20. (b) Find the probability that the lifetime is between 5 and 15.

Problem 2. Suppose that in a certain population, individuals' heights are approximately normally distributed with parameters $\mu = 70$ and $\sigma = 3$ in. (a) What proportion of the population is over 6 ft tall? (b) What is the distribution of heights if they are expressed in centimeters?

Problem 3. Let $X$ be a normal random variable with $\mu = 5$ and $\sigma = 10$. (a) Find $P(X > 10)$. (b) Find $P(-20 < X < 15)$. (c) Find the value of $x$ such that $P(X > x) = 0.05$.

Problem 4. Suppose that $X \sim N(\mu, \sigma^2)$. (a) Find $P(|X - \mu| \le 0.675\sigma)$. (b) Find the value of $c$ in terms of $\sigma$ such that $P(\mu - c \le X \le \mu + c) = 0.95$.

Problem 5. $X$ is an exponential random variable, and $P(X < 1) = 0.05$. What is $\lambda$?

Problem 6. The lifetime in months of each component in a microchip is exponentially distributed with parameter $\mu = 0.001$. (a) For a particular component, we want to find the probability that the component breaks down before 5 months, and the probability that the component breaks down before 50 months. Use the approximation $e^{-x} \approx 1 - x$ for a small number $x$ to calculate these probabilities. (b) Suppose that the microchip contains 100 of such components and is operational until three of these components break down. The lifetimes of these components are independent and exponentially distributed with parameter $\mu$. By using the Poisson approximation, find the probability that the microchip operates functionally for 5 months, and the probability that the microchip operates functionally for 50 months.
Problem 7. The length of a phone call can be represented by an exponential distribution. Then answer the following questions. (a) A report says that 50% of phone calls end in 2 minutes. Assuming that this report is true, calculate the parameter $\lambda$ for the exponential distribution. (b) Suppose that someone arrived immediately ahead of you at a public telephone booth. Using the result of (a), find the probability that you have to wait between 1 and 2 minutes.

Problem 8. Let $X_i$, $i = 1, \ldots, 6$, be independent standard normal random variables. Find
\[ P(X_1^2 + X_2^2 + X_3^2 + X_4^2 + X_5^2 + X_6^2 \le 1.64). \]

Problem 9. Let $X$ be a normal random variable with parameter $(\mu, \sigma^2)$, and let $Y = aX + b$ with $a < 0$. (a) Show that
\[ P(Y \le t) = \int_{-\infty}^{t} f(x)\,dx, \quad\text{where}\quad f(x) = \frac{1}{|a|\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(x - a\mu - b)^2}{2(a\sigma)^2} \right]. \]
(b) Conclude that $Y$ is a normal random variable with parameter $(a\mu + b, (a\sigma)^2)$.

Problem 10. Let $X$ be a normal random variable with parameter $(\mu, \sigma^2)$, and let $Y = e^X$. Find the density function $f$ of $Y$ by forming $P(Y \le t) = \int_0^t f(x)\,dx$. This density function $f$ is called the lognormal density since $\ln Y$ is normally distributed.
Optional problems

Problem 1. Show that the standard normal density integrates to unity by completing the following questions. (a) Show that
\[ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-\frac{x^2+y^2}{2}}\,dx\,dy = \int_0^{2\pi} \int_0^{\infty} e^{-\frac{r^2}{2}}\, r\,dr\,d\theta = 2\pi. \]
(b) Show that
\[ \int_{-\infty}^{\infty} e^{-\frac{y^2}{2}}\,dy = \sqrt{2\pi}, \quad\text{and therefore}\quad \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = 1. \]

Problem 2. Suppose that a non-negative random variable $X$ has the memoryless property $P(X > t + s \mid X > s) = P(X > t)$. By completing the following questions, we will show that $X$ must be an exponential random variable. (a) Let $S(t)$ be the survival function of $X$. Show that the memoryless property implies that $S(s + t) = S(s)S(t)$ for $s, t \ge 0$. (b) Let $\kappa = S(1)$. Show that $\kappa > 0$. (c) Show that $S\left(\frac{1}{n}\right) = \kappa^{1/n}$ and $S\left(\frac{m}{n}\right) = \kappa^{m/n}$. Therefore, we can obtain $S(t) = \kappa^t$ for any $t \ge 0$. By letting $\lambda = -\ln \kappa$, we can write $S(t) = e^{-\lambda t}$.

Problem 3. The gamma function is a generalized factorial function. (a) Show
\[ \Gamma\left(\tfrac{1}{2}\right) = 2 \int_0^{\infty} e^{-u^2}\,du. \]
Hint: Use the substitution $u = \sqrt{x}$. (b) Show
\[ \int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}. \]
Hint: Use the result of optional Problem 1. (c) Show that $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$, and that
\[ \Gamma\left(\frac{n}{2}\right) = \frac{\sqrt{\pi}\,(n-1)!}{2^{n-1} \left(\frac{n-1}{2}\right)!} \]
if $n$ is an odd integer.
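The identities in optional Problem 3 can be checked numerically with `math.gamma`; a sketch (the half-integer formula below is the one stated for odd $n$):

```python
import math

# Gamma(1/2) = sqrt(pi)
print(abs(math.gamma(0.5) - math.sqrt(math.pi)) < 1e-12)  # True

# Gamma(n/2) = sqrt(pi) (n-1)! / (2^{n-1} ((n-1)/2)!) for odd n
for n in (1, 3, 5, 7):
    lhs = math.gamma(n / 2)
    rhs = math.sqrt(math.pi) * math.factorial(n - 1) / (
        2 ** (n - 1) * math.factorial((n - 1) // 2)
    )
    assert abs(lhs - rhs) < 1e-9
print("formula verified for n = 1, 3, 5, 7")
```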
Answers to exercises

Problem 1. Let $X$ be the lifetime of the electronic component.
(a) $P(X \le 20) = 1 - e^{-(0.1)(20)} = 0.8647$
(b) $P(5 \le X \le 15) = e^{-(0.1)(5)} - e^{-(0.1)(15)} = 0.38$

Problem 2. Let $X$ be a normal random variable with $\mu = 70$ and $\sigma = 3$.
(a) $P(X \ge (6)(12)) = 1 - \Phi\left( \frac{72 - 70}{3} \right) = 1 - \Phi(0.67) = 0.25$
(b) In centimeters, the height becomes $(2.54)X$, and it is normally distributed with $\mu = (2.54)(70) = 177.8$ and $\sigma = (2.54)(3) = 7.62$.

Problem 3.
(a) $P(X > 10) = 1 - \Phi\left( \frac{10 - 5}{10} \right) = 1 - \Phi(0.5) = 0.3085$.
(b) $P(-20 < X < 15) = \Phi\left( \frac{15 - 5}{10} \right) - \Phi\left( \frac{-20 - 5}{10} \right) = \Phi(1) - \Phi(-2.5) = 0.835$.
(c) Since $P(X > x) = 1 - \Phi\left( \frac{x - 5}{10} \right) = 0.05$, we must have $\Phi\left( \frac{x - 5}{10} \right) = 0.95$. By using the normal distribution table, we can find $\frac{x - 5}{10} = 1.64$. Thus, $x = 5 + (10)(1.64) = 21.4$.

Problem 4.
(a) $P(|X - \mu| \le 0.675\sigma) = P\left(-0.675 \le \frac{X - \mu}{\sigma} \le 0.675\right) = \Phi(0.675) - \Phi(-0.675) = 0.5$
(b) $P(\mu - c \le X \le \mu + c) = 0.95$ implies $P\left( -\frac{c}{\sigma} \le \frac{X - \mu}{\sigma} \le \frac{c}{\sigma} \right) = 0.95$, that is, $\Phi\left( \frac{c}{\sigma} \right) = 0.975$. Then we can find $\frac{c}{\sigma} = 1.96$. Thus, $c = 1.96\sigma$.

Problem 5. By solving $P(X < 1) = 1 - e^{-\lambda} = 0.05$, we obtain $\lambda = -\ln 0.95 = 0.0513$.

Problem 6.
(a) Let $p_1$ be the probability that the component breaks down before 5 months, and let $p_2$ be the probability that the component breaks down before 50 months. Then,
\[ p_1 = 1 - e^{-5\mu} \approx 5\mu = 0.005; \qquad p_2 = 1 - e^{-50\mu} \approx 50\mu = 0.05. \]
(b) Let $X_1$ be the number of components breaking down before 5 months, and let $X_2$ be the number of components breaking down before 50 months. Then $X_i$ has the binomial distribution
with parameter $(p_i, 100)$ for $i = 1, 2$. By using the Poisson approximation with $\lambda_1 = 100\,p_1 = 0.5$ and $\lambda_2 = 100\,p_2 = 5$, we can obtain
\[ P(X_1 \le 2) = e^{-\lambda_1}\left( 1 + \lambda_1 + \frac{\lambda_1^2}{2} \right) = 0.986; \qquad P(X_2 \le 2) = e^{-\lambda_2}\left( 1 + \lambda_2 + \frac{\lambda_2^2}{2} \right) = 0.125. \]

Problem 7. Let $X$ be the length of a phone call in minutes.
(a) By solving $P(X \le 2) = 1 - e^{-2\lambda} = 0.5$, we obtain $\lambda = -\frac{\ln 0.5}{2} = 0.347$.
(b) $P(1 \le X \le 2) = (1 - e^{-2\lambda}) - (1 - e^{-\lambda}) = 0.207$.

Problem 8. Since $Y = X_1^2 + X_2^2 + X_3^2 + X_4^2 + X_5^2 + X_6^2$ is a chi-square random variable with $n = 6$, we obtain $P(Y \le 1.64) = 0.05$.

Problem 9.
(a) Recall that $a < 0$.
\[ P(Y \le t) = P(aX + b \le t) = P\left(X \ge \frac{t - b}{a}\right) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{(t-b)/a}^{\infty} \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right] dx = \int_{-\infty}^{t} \frac{1}{|a|\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(y - a\mu - b)^2}{2(a\sigma)^2} \right] dy \]
by substituting $x = (y - b)/a$.
(b) The integrand $f(y)$ is a normal density with $(a\mu + b, (a\sigma)^2)$.

Problem 10. For $t > 0$ we obtain
\[ P(Y \le t) = P(e^X \le t) = P(X \le \ln t) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\ln t} \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right] dx = \int_0^t \frac{1}{y\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(\ln y - \mu)^2}{2\sigma^2} \right] dy \]
by substituting $x = \ln y$. Thus, the density is given by
\[ f(y) = \frac{1}{y\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(\ln y - \mu)^2}{2\sigma^2} \right], \quad y > 0. \]
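The Problem 7 values can be cross-checked numerically, reading the report as "half of all calls end within 2 minutes":

```python
import math

# Problem 7(a): 1 - e^{-2*lam} = 0.5 gives lam = ln(2)/2
lam = math.log(2.0) / 2.0
print(round(lam, 3))  # 0.347

# Problem 7(b): P(1 <= X <= 2) = e^{-lam} - e^{-2*lam}
print(round(math.exp(-lam) - math.exp(-2.0 * lam), 3))  # 0.207
```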