Continuous Random Variables

Recall: a discrete random variable takes only a finite or countably infinite number of possible values with positive probability. Often, there is interest in random variables that can (at least theoretically) take an uncountable number of possible values, e.g., the weight of a randomly selected person in a population, the length of time that a randomly selected light bulb works, the error in experimentally measuring the speed of light.

Example (Uniform Spinner, LNp.2-14): Ω = (−π, π]. For (a, b] ⊂ Ω, P((a, b]) = (b − a)/(2π). Consider the random variables X: Ω → R with X(ω) = ω, and Y: Ω → R with Y(ω) = tan(ω). Then X and Y are random variables that take an uncountable number of possible values.

Notice that P_X({X = x}) = 0 for any x ∈ R, but for a < b, P_X({X ∈ (a, b]}) = P((a, b]) = (b − a)/(2π) > 0.

Q: Can we still define a probability mass function for X? If not, what can play a role similar to a pmf for X?

Probability Density Function and Continuous Random Variable

Definition. A function f: R → R is called a probability density function (pdf) if
1. f(x) ≥ 0 for all x ∈ (−∞, ∞), and
2. ∫_{−∞}^{∞} f(x) dx = 1.

Definition. A random variable X is called continuous if there exists a pdf f such that for any set B of real numbers, P_X({X ∈ B}) = ∫_B f(x) dx. For example, P_X(a ≤ X ≤ b) = ∫_a^b f(x) dx.
Theorem. If f is a pdf, then there must exist a continuous random variable with pdf f.

Some properties:
- P_X({X = x}) = ∫_x^x f(y) dy = 0 for any x ∈ R.
- It does not matter whether the intervals are open or closed, i.e., P(X ∈ [a, b]) = P(X ∈ (a, b]) = P(X ∈ [a, b)) = P(X ∈ (a, b)).
- It is important to remember that the value f(x) of a pdf is NOT itself a probability; it is quite possible for a pdf to take values greater than 1.

Q: How to interpret the value of a pdf f(x)? For small dx,
  P(x − dx/2 ≤ X ≤ x + dx/2) = ∫_{x−dx/2}^{x+dx/2} f(y) dy ≈ f(x)·dx,
so f(x) is a measure of how likely it is that X will be near x.

We can characterize the distribution of a continuous random variable in terms of its
1. Probability Density Function (pdf)
2. Cumulative Distribution Function (cdf)
3. Moment Generating Function (mgf, Chapter 7)

Relation between the pdf and the cdf. Theorem. If F_X and f_X are the cdf and the pdf of a continuous random variable X, respectively, then
  F_X(x) = P(X ≤ x) = ∫_{−∞}^x f_X(y) dy for all −∞ < x < ∞, and
  f_X(x) = F′_X(x) = (d/dx) F_X(x) at continuity points of f_X.

Some Notes:
- For a < b, P(a < X ≤ b) = F_X(b) − F_X(a) = ∫_a^b f_X(x) dx.
- The cdf for continuous random variables has the same interpretation and properties as in the discrete case. The only difference is in plotting F_X: in the discrete case there are jumps, while in the continuous case F_X is a continuous nondecreasing function.
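The pdf–cdf relation and the interpretation of f(x)·dx can be checked numerically. A minimal sketch in Python (the exponential density with rate 2 is an illustrative choice, not from the notes):

```python
import math

# Illustrative pdf: exponential with rate lam = 2 (any valid pdf would do)
lam = 2.0
def f(x):
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def cdf_numeric(x, n=10_000):
    # Midpoint-rule approximation of F(x) = integral of f from 0 to x
    if x <= 0:
        return 0.0
    h = x / n
    return sum(f((i + 0.5) * h) for i in range(n)) * h

x = 0.7
F_exact = 1 - math.exp(-lam * x)   # closed-form cdf of the exponential
assert abs(cdf_numeric(x) - F_exact) < 1e-6

# f(x)*dx approximates P(x - dx/2 <= X <= x + dx/2) for small dx
dx = 1e-4
p_near = math.exp(-lam * (x - dx / 2)) - math.exp(-lam * (x + dx / 2))
assert abs(p_near - f(x) * dx) < 1e-10
```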
Example (Uniform Distributions)

If −∞ < α < β < ∞, then
  f(x) = 1/(β − α) if α < x ≤ β, and 0 otherwise,
is a pdf since
1. f(x) ≥ 0 for all x ∈ R, and
2. ∫_{−∞}^{∞} f(x) dx = ∫_α^β 1/(β − α) dx = (β − α)/(β − α) = 1.

Its corresponding cdf is
  F(x) = ∫_{−∞}^x f(y) dy = 0 if x ≤ α; (x − α)/(β − α) if α < x ≤ β; 1 if x > β.
(exercise) Conversely, it can easily be checked that F is a cdf and f(x) = F′(x) except at x = α and x = β. (The derivative does not exist at x = α and x = β, but that does not matter.)

An example of a Uniform distribution is the r.v. X in the Uniform Spinner example, where α = −π and β = π.

Transformation

Q: For Y = g(X), how can we find the distribution of Y?

Suppose that X is a continuous random variable with cdf F_X and pdf f_X. Consider Y = g(X), where g is a strictly monotone (increasing or decreasing) function, and let R_Y be the range of g. Note: any strictly monotone function has an inverse function, i.e., g^{−1} exists on R_Y.

The cdf of Y, denoted by F_Y:
1. Suppose that g is strictly increasing. For y ∈ R_Y,
   F_Y(y) = P(Y ≤ y) = P(g(X) ≤ y) = P(X ≤ g^{−1}(y)) = F_X(g^{−1}(y)).
2. Suppose that g is strictly decreasing. For y ∈ R_Y,
   F_Y(y) = P(Y ≤ y) = P(g(X) ≤ y) = P(X ≥ g^{−1}(y)) = 1 − P(X < g^{−1}(y)) = 1 − F_X(g^{−1}(y)).
Theorem. Let X be a continuous random variable whose cdf F_X possesses a unique inverse F_X^{−1}. Let Z = F_X(X); then Z has a uniform distribution on [0, 1].
Proof. For 0 ≤ z ≤ 1, F_Z(z) = P(F_X(X) ≤ z) = P(X ≤ F_X^{−1}(z)) = F_X(F_X^{−1}(z)) = z.

Theorem. Let U be a uniform random variable on [0, 1] and let F be a cdf which possesses a unique inverse F^{−1}. Let X = F^{−1}(U); then the cdf of X is F.
Proof. F_X(x) = P(F^{−1}(U) ≤ x) = P(U ≤ F(x)) = F_U(F(x)) = F(x).

These two theorems are useful for pseudo-random number generation in computer simulation:
- X is a r.v. ⇒ F(X) is a r.v.
- X₁, …, X_n: r.v.'s with cdf F ⇒ F(X₁), …, F(X_n): r.v.'s with distribution Uniform(0, 1).
- U₁, …, U_n: r.v.'s with distribution Uniform(0, 1) ⇒ F^{−1}(U₁), …, F^{−1}(U_n): r.v.'s with cdf F.

The pdf of Y, denoted by f_Y:
1. Suppose that g is a differentiable, strictly increasing function. For y ∈ R_Y,
   f_Y(y) = (d/dy) F_X(g^{−1}(y)) = f_X(g^{−1}(y)) · (d/dy) g^{−1}(y).
2. Suppose that g is a differentiable, strictly decreasing function. For y ∈ R_Y,
   f_Y(y) = (d/dy) [1 − F_X(g^{−1}(y))] = −f_X(g^{−1}(y)) · (d/dy) g^{−1}(y).

Theorem. Let X be a continuous random variable with pdf f_X. Let Y = g(X), where g is differentiable and strictly monotone. Then the pdf of Y, denoted by f_Y, is
  f_Y(y) = f_X(g^{−1}(y)) · |d g^{−1}(y)/dy|
for y such that y = g(x) for some x, and f_Y(y) = 0 otherwise.
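The second theorem is exactly the inverse-transform method used in simulation. A short sketch, assuming the exponential cdf F(x) = 1 − e^{−λx} as the target (the rate λ = 1.5 is an arbitrary illustrative value):

```python
import math, random

random.seed(0)
lam = 1.5  # assumed rate for this illustration

def inv_cdf(u):
    # F(x) = 1 - exp(-lam*x)  =>  F^{-1}(u) = -ln(1-u)/lam
    return -math.log(1.0 - u) / lam

# By the theorem, F^{-1}(U) with U ~ Uniform(0,1) has cdf F
samples = [inv_cdf(random.random()) for _ in range(200_000)]

# Empirical cdf at x = 1 should be close to F(1) = 1 - exp(-lam)
x = 1.0
ecdf = sum(s <= x for s in samples) / len(samples)
assert abs(ecdf - (1 - math.exp(-lam * x))) < 0.01
```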
Q: What is the role of |d g^{−1}(y)/dy|? How to interpret it? (It is the factor by which g^{−1} stretches or compresses a small interval near y, so that probability mass is preserved under the change of variables.)

Some Examples. Given the pdf f_X of a random variable X:
- find the pdf f_Y of Y = aX + b, where a ≠ 0;
- find the pdf f_Y of Y = 1/X;
- find the cdf F_Y and pdf f_Y of Y = X².

Expectation, Mean, and Variance

Definition. If X has a pdf f_X, then the expectation of X is defined by
  E(X) = ∫_{−∞}^{∞} x f_X(x) dx,
provided that the integral converges absolutely.
Example (Uniform Distributions). If f_X(x) = 1/(β − α) for α < x ≤ β and 0 otherwise, then
  E(X) = ∫_α^β x/(β − α) dx = (β² − α²)/(2(β − α)) = (α + β)/2.

Some properties of expectation:
- Expectation of Transformation. If Y = g(X), then
    E(Y) = ∫ y f_Y(y) dy = ∫ g(x) f_X(x) dx,
  provided that the integral converges absolutely. Proof: (homework)
- Expectation of Linear Function. E(aX + b) = aE(X) + b, since
    ∫ (ax + b) f_X(x) dx = a ∫ x f_X(x) dx + b ∫ f_X(x) dx = aE(X) + b.

Definition. If X has a pdf f_X, then the expectation of X is also called the mean of X or of f_X and is denoted by μ_X, so that
  μ_X = E(X) = ∫ x f_X(x) dx.
The variance of X is defined as
  Var(X) = E[(X − μ_X)²] = ∫ (x − μ_X)² f_X(x) dx,
and is denoted by σ_X². The quantity σ_X is called the standard deviation.

Some properties of mean and variance:
- The mean and variance for continuous random variables have the same intuitive interpretation as in the discrete case.
- Var(X) = E(X²) − [E(X)]²
- Variance of Linear Function. Var(aX + b) = a² Var(X)

Theorem. For a nonnegative continuous random variable X,
  E(X) = ∫_0^∞ (1 − F_X(x)) dx = ∫_0^∞ P(X > x) dx.
Proof.
  E(X) = ∫_0^∞ x f_X(x) dx = ∫_0^∞ (∫_0^x 1 dt) f_X(x) dx = ∫_0^∞ (∫_t^∞ f_X(x) dx) dt = ∫_0^∞ (1 − F_X(t)) dt.
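The uniform mean and variance can be confirmed by direct numerical integration. A small sketch (the endpoints 2 and 5 are arbitrary illustrative values):

```python
alpha, beta = 2.0, 5.0  # illustrative endpoints

def mean_var_uniform(a, b, n=100_000):
    # Midpoint-rule integrals of x*f(x) and x^2*f(x) with f = 1/(b-a) on (a, b]
    h = (b - a) / n
    f = 1.0 / (b - a)
    m = sum((a + (i + 0.5) * h) * f for i in range(n)) * h
    m2 = sum((a + (i + 0.5) * h) ** 2 * f for i in range(n)) * h
    return m, m2 - m * m     # Var(X) = E(X^2) - [E(X)]^2

m, v = mean_var_uniform(alpha, beta)
assert abs(m - (alpha + beta) / 2) < 1e-6        # E(X) = (α+β)/2
assert abs(v - (beta - alpha) ** 2 / 12) < 1e-6  # Var(X) = (β−α)²/12
```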
Example (Uniform Distributions)

Reading: textbook, Sec 5.1, 5.2, 5.3, 5.7

Some Common Continuous Distributions

Uniform Distribution

Summary for X ~ Uniform(α, β):
- Pdf: f(x) = 1/(β − α) if α < x ≤ β, and 0 otherwise.
- Cdf: F(x) = 0 if x ≤ α; (x − α)/(β − α) if α < x ≤ β; 1 if x > β.
- Parameters: −∞ < α < β < ∞
- Mean: E(X) = (α + β)/2
- Variance: Var(X) = (β − α)²/12

Exponential Distribution

For λ > 0, the function f(x) = λe^{−λx} if x ≥ 0, and 0 if x < 0, is a pdf since (1) f(x) ≥ 0 for all x ∈ R, and (2) ∫_0^∞ λe^{−λx} dx = [−e^{−λx}]_0^∞ = 1. The distribution of a random variable X with this pdf is called the exponential distribution with parameter λ.

The cdf of an exponential r.v. is F(x) = 0 for x < 0, and for x ≥ 0,
  F(x) = P(X ≤ x) = ∫_0^x λe^{−λy} dy = [−e^{−λy}]_0^x = 1 − e^{−λx}.

Theorem. The mean and variance of an exponential distribution with parameter λ are μ = 1/λ and σ² = 1/λ².
Proof. Substituting y = λx,
  E(X) = ∫_0^∞ x λe^{−λx} dx = (1/λ) ∫_0^∞ y e^{−y} dy = (1/λ) Γ(2) = 1/λ,
  E(X²) = ∫_0^∞ x² λe^{−λx} dx = (1/λ²) ∫_0^∞ y² e^{−y} dy = (1/λ²) Γ(3) = 2/λ²,
so Var(X) = E(X²) − [E(X)]² = 2/λ² − 1/λ² = 1/λ².
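Both the exponential moments and the tail-integral formula E(X) = ∫₀^∞ P(X > x) dx from the previous theorem can be checked numerically. A sketch with an illustrative rate λ = 0.5:

```python
import math, random

random.seed(1)
lam = 0.5  # illustrative rate

# Monte Carlo check of μ = 1/λ and σ² = 1/λ² using the stdlib sampler
xs = [random.expovariate(lam) for _ in range(400_000)]
m = sum(xs) / len(xs)
v = sum((x - m) ** 2 for x in xs) / len(xs)
assert abs(m - 1 / lam) < 0.02
assert abs(v - 1 / lam ** 2) < 0.1

# Tail formula: E(X) = ∫_0^∞ P(X > x) dx = ∫_0^∞ e^{−λx} dx, midpoint rule out to x = 100
h = 0.001
total = sum(math.exp(-lam * (i + 0.5) * h) * h for i in range(100_000))
assert abs(total - 1 / lam) < 1e-4
```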
Some properties:
- The exponential distribution is often used to model the length of time until an event occurs.
- The parameter λ is called the rate and is the average number of events that occur per unit time. This gives an intuitive interpretation of E(X) = 1/λ.

Theorem (relationship between the exponential, gamma, and Poisson distributions, Sec. 9.1). Let
1. T₁, T₂, T₃, … be independent and ~ exponential(λ),
2. S_k = T₁ + … + T_k, k = 1, 2, 3, …,
3. X_i be the number of S_k's that fall in the time interval (t_{i−1}, t_i], i = 1, …, m.
Then (i) X₁, …, X_m are independent, (ii) X_i ~ Poisson(λ(t_i − t_{i−1})), (iii) S_k ~ Gamma(k, λ), and (iv) the reverse statement is also true. The rate parameter λ is the same for both the Poisson and the exponential random variables.

The exponential distribution can be thought of as the continuous analogue of the geometric distribution.

Theorem. The exponential distribution (like the geometric distribution) is memoryless, i.e., for s, t ≥ 0,
  P(X > s + t | X > t) = P(X > s).
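The memoryless identity follows from P(X > x) = e^{−λx}, and can be verified directly; the values of λ, s, t below are arbitrary illustrative choices:

```python
import math

# P(X > s+t | X > t) = e^{−λ(s+t)}/e^{−λt} = e^{−λs} = P(X > s)
lam, s, t = 1.3, 0.8, 2.5   # arbitrary illustrative values

def tail(x):                 # P(X > x) for X ~ exponential(lam)
    return math.exp(-lam * x)

cond = tail(s + t) / tail(t)      # conditional probability given X > t
assert abs(cond - tail(s)) < 1e-12

# ...but {X > s+t} and {X > t} are NOT independent:
# P({X > s+t} ∩ {X > t}) = P(X > s+t) ≠ P(X > s+t) · P(X > t)
assert abs(tail(s + t) - tail(s + t) * tail(t)) > 0.01
```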
This means that the distribution of the waiting time to the next event remains the same regardless of how long we have already been waiting. This only happens when events occur (or not) totally at random, i.e., independently of past history. Notice that it does not mean the two events {X > s+t} and {X > t} are independent.

Summary for X ~ Exponential(λ):
- Pdf: f(x) = λe^{−λx} if x ≥ 0, and 0 if x < 0.
- Cdf: F(x) = 1 − e^{−λx} if x ≥ 0, and 0 if x < 0.
- Parameter: λ > 0
- Mean: E(X) = 1/λ
- Variance: Var(X) = 1/λ²

Gamma Distribution

Gamma Function. Definition. For α > 0, the gamma function is defined as Γ(α) = ∫_0^∞ x^{α−1} e^{−x} dx.
- Γ(1) = 1 and Γ(1/2) = √π (exercise).
- Γ(α + 1) = αΓ(α). Proof. By integration by parts,
    Γ(α + 1) = ∫_0^∞ x^α e^{−x} dx = [−x^α e^{−x}]_0^∞ + ∫_0^∞ α x^{α−1} e^{−x} dx = αΓ(α).
- Γ(α) = (α − 1)! if α is a positive integer.
- Γ(α/2) = √π (α − 1)! / (2^{α−1} ((α−1)/2)!) if α is an odd integer.
- The gamma function is a generalization of the factorial function.

For α, λ > 0, the function f(x) = (λ^α/Γ(α)) x^{α−1} e^{−λx} if x ≥ 0, and 0 if x < 0,
is a pdf since (1) f(x) ≥ 0 for all x ∈ R, and (2) substituting y = λx,
  ∫_0^∞ (λ^α/Γ(α)) x^{α−1} e^{−λx} dx = (1/Γ(α)) ∫_0^∞ y^{α−1} e^{−y} dy = Γ(α)/Γ(α) = 1.
The distribution of a random variable X with this pdf is called the gamma distribution with parameters α and λ.

The cdf of the gamma distribution can be expressed in terms of the incomplete gamma function γ(α, t) = ∫_0^t z^{α−1} e^{−z} dz, i.e., F(x) = 0 for x < 0, and for x ≥ 0,
  F(x) = ∫_0^x (λ^α/Γ(α)) y^{α−1} e^{−λy} dy = (1/Γ(α)) ∫_0^{λx} z^{α−1} e^{−z} dz = γ(α, λx)/Γ(α).

Theorem. The mean and variance of a gamma distribution with parameters α and λ are μ = α/λ and σ² = α/λ².
Proof.
  E(X) = ∫_0^∞ x (λ^α/Γ(α)) x^{α−1} e^{−λx} dx = (Γ(α+1)/(λ Γ(α))) ∫_0^∞ (λ^{α+1}/Γ(α+1)) x^{(α+1)−1} e^{−λx} dx = α/λ,
  E(X²) = ∫_0^∞ x² (λ^α/Γ(α)) x^{α−1} e^{−λx} dx = (Γ(α+2)/(λ² Γ(α))) ∫_0^∞ (λ^{α+2}/Γ(α+2)) x^{(α+2)−1} e^{−λx} dx = α(α+1)/λ²,
so Var(X) = α(α+1)/λ² − (α/λ)² = α/λ².

(exercise) E(X^k) = Γ(α+k)/(λ^k Γ(α)) for 0 < k, and E(1/X^k) = λ^k Γ(α−k)/Γ(α) for 0 < k < α.

Some properties:
- The gamma distribution can be used to model the waiting time until a number of random events occurs.
- When α = 1, it is exponential(λ).
- T₁, …, T_n: independent exponential(λ) r.v.'s ⇒ T₁ + … + T_n ~ Gamma(n, λ).
- The gamma distribution can be thought of as a continuous analogue of the negative binomial distribution. A summary:

    (event description)              Discrete Time Version    Continuous Time Version
    number of events                 binomial                 Poisson
    waiting time until 1 event occurs  geometric              exponential
    waiting time until r events occur  negative binomial      gamma

- α is called the shape parameter and λ the scale parameter. (Q: how to interpret α and λ from the viewpoint of waiting time?)
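The fact that a sum of independent exponential(λ) waiting times is Gamma-distributed can be checked by simulation against the Gamma moments α/λ and α/λ². A sketch with illustrative values λ = 2, n = 5:

```python
import random

random.seed(2)
lam, n = 2.0, 5   # sum of n = 5 independent exponential(λ) waiting times

# T1 + ... + Tn should follow Gamma(n, λ), with mean n/λ and variance n/λ²
sums = [sum(random.expovariate(lam) for _ in range(n)) for _ in range(200_000)]
m = sum(sums) / len(sums)
v = sum((x - m) ** 2 for x in sums) / len(sums)

assert abs(m - n / lam) < 0.02          # E(X) = α/λ = 5/2
assert abs(v - n / lam ** 2) < 0.05     # Var(X) = α/λ² = 5/4
```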
A special case of the gamma distribution occurs when α = n/2 and λ = 1/2 for some positive integer n. This is known as the Chi-squared distribution with n degrees of freedom (Chapter 6).

Summary for X ~ Gamma(α, λ):
- Pdf: f(x) = (λ^α/Γ(α)) x^{α−1} e^{−λx} if x ≥ 0, and 0 if x < 0.
- Cdf: F(x) = γ(α, λx)/Γ(α).
- Parameters: α, λ > 0
- Mean: E(X) = α/λ
- Variance: Var(X) = α/λ²

Beta Distribution

Beta Function: B(α, β) = ∫_0^1 x^{α−1} (1 − x)^{β−1} dx = Γ(α)Γ(β)/Γ(α+β).

For α, β > 0, the function
  f(x) = (Γ(α+β)/(Γ(α)Γ(β))) x^{α−1} (1 − x)^{β−1} if 0 ≤ x ≤ 1, and 0 otherwise,
is a pdf (exercise). The distribution of a random variable X with this pdf is called the beta distribution with parameters α and β.

The cdf of the beta distribution can be expressed in terms of the incomplete beta function B(x; α, β) = ∫_0^x y^{α−1} (1 − y)^{β−1} dy, i.e., F(x) = 0 for x < 0, F(x) = 1 for x > 1, and for 0 ≤ x ≤ 1, F(x) = B(x; α, β)/B(α, β).

Theorem. The mean and variance of a beta distribution with parameters α and β are μ = α/(α+β) and σ² = αβ/((α+β)²(α+β+1)).
Proof.
  E(X) = ∫_0^1 x (Γ(α+β)/(Γ(α)Γ(β))) x^{α−1} (1−x)^{β−1} dx
       = (Γ(α+β)Γ(α+1))/(Γ(α)Γ(α+β+1)) ∫_0^1 (Γ(α+β+1)/(Γ(α+1)Γ(β))) x^{(α+1)−1} (1−x)^{β−1} dx
       = α/(α+β).
  E(X²) = ∫_0^1 x² (Γ(α+β)/(Γ(α)Γ(β))) x^{α−1} (1−x)^{β−1} dx
        = (Γ(α+β)Γ(α+2))/(Γ(α)Γ(α+β+2)) ∫_0^1 (Γ(α+β+2)/(Γ(α+2)Γ(β))) x^{(α+2)−1} (1−x)^{β−1} dx
        = α(α+1)/((α+β)(α+β+1)),
so Var(X) = α(α+1)/((α+β)(α+β+1)) − (α/(α+β))² = αβ/((α+β)²(α+β+1)).

Some properties:
- When α = β = 1, the beta distribution is the same as the uniform(0, 1).
- Whenever α = β, the beta distribution is symmetric about x = 0.5, i.e., f(0.5 − δ) = f(0.5 + δ). As the common value of α and β increases, the distribution becomes more peaked at x = 0.5 and there is less probability outside of the central portion.
- When β > α, values close to 0 become more likely than those close to 1; when β < α, values close to 1 are more likely than those close to 0. (Q: How to connect this with E(X) = α/(α+β)?)

Summary for X ~ Beta(α, β):
- Pdf: f(x) = (Γ(α+β)/(Γ(α)Γ(β))) x^{α−1} (1 − x)^{β−1} if 0 ≤ x ≤ 1, and 0 otherwise.
- Cdf: F(x) = B(x; α, β)/B(α, β).
- Parameters: α, β > 0
- Mean: E(X) = α/(α+β)
- Variance: Var(X) = αβ/[(α+β)²(α+β+1)]

Normal Distribution

For μ ∈ R and σ > 0, the function
  f(x) = (1/(√(2π) σ)) e^{−(x−μ)²/(2σ²)}, −∞ < x < ∞,
is a pdf since (1) f(x) ≥ 0 for all x ∈ R, and (2) substituting y = (x − μ)/σ,
  ∫_{−∞}^{∞} (1/(√(2π) σ)) e^{−(x−μ)²/(2σ²)} dx = ∫_{−∞}^{∞} (1/√(2π)) e^{−y²/2} dy = 1,
where the last equality follows from the polar-coordinate computation: for I = ∫_{−∞}^{∞} e^{−x²/2} dx,
  I² = ∫∫ e^{−(x²+y²)/2} dx dy = ∫_0^{2π} ∫_0^∞ e^{−r²/2} r dr dθ = 2π [−e^{−r²/2}]_0^∞ = 2π,
so I = √(2π).
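The normalization ∫ f = 1 and the symmetry of the normal density can be confirmed numerically. A sketch with arbitrary illustrative parameters μ = 1, σ = 2:

```python
import math

mu, sigma = 1.0, 2.0   # illustrative parameters

def phi(x):
    # N(mu, sigma^2) density
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Midpoint-rule integral over μ ± 10σ approximates ∫ f dx = 1
h = 0.001
total = sum(phi(mu - 20 + (i + 0.5) * h) * h for i in range(40_000))
assert abs(total - 1.0) < 1e-6

# Symmetry about μ: f(μ + δ) = f(μ − δ)
assert abs(phi(mu + 0.7) - phi(mu - 0.7)) < 1e-15
```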
The distribution of a random variable X with this pdf is called the normal (Gaussian) distribution with parameters μ and σ², denoted by N(μ, σ²).

The normal pdf is a bell-shaped curve. It is symmetric about the point μ, i.e., f(μ + δ) = f(μ − δ), and falls off at a rate determined by σ. The pdf has a maximum at x = μ (as can be shown by differentiation), and the maximum height is 1/(√(2π) σ). The cdf of the normal distribution does not have a closed form.

Theorem. The mean and variance of a N(μ, σ²) distribution are μ and σ², respectively. (μ: location parameter; σ²: scale parameter.)
Proof. Substituting y = (x − μ)/σ,
  E(X) = ∫ x (1/(√(2π) σ)) e^{−(x−μ)²/(2σ²)} dx = ∫ (σy + μ)(1/√(2π)) e^{−y²/2} dy
       = (σ/√(2π)) ∫ y e^{−y²/2} dy + μ ∫ (1/√(2π)) e^{−y²/2} dy = σ·0 + μ·1 = μ,
  E(X²) = ∫ (σy + μ)² (1/√(2π)) e^{−y²/2} dy
        = σ² ∫ y² (1/√(2π)) e^{−y²/2} dy + 2σμ ∫ y (1/√(2π)) e^{−y²/2} dy + μ² ∫ (1/√(2π)) e^{−y²/2} dy
        = σ²·1 + 0 + μ²·1 = σ² + μ²,
so Var(X) = σ².

Some properties:
- The normal distribution is one of the most widely used distributions. It can be used to model the distribution of many natural phenomena.

Theorem. Suppose that X ~ N(μ, σ²). The random variable Y = aX + b, where a ≠ 0, is also normally distributed, with parameters aμ + b and a²σ², i.e., Y ~ N(aμ + b, a²σ²).
Proof. f_Y(y) = f_X((y − b)/a) · (1/|a|) = (1/(√(2π) |a|σ)) e^{−[y−(aμ+b)]²/(2a²σ²)}.
Corollary. If X ~ N(μ, σ²), then Z = (X − μ)/σ is a normal random variable with parameters 0 and 1, i.e., Z ~ N(0, 1), which is called the standard normal distribution.

The N(0, 1) distribution is very important, since properties of any other normal distribution can be found from those of the standard normal. The cdf of N(0, 1) is usually denoted by Φ.

Theorem. Suppose that X ~ N(μ, σ²). The cdf of X is F_X(x) = Φ((x − μ)/σ).
Proof. F_X(x) = P(X ≤ x) = P((X − μ)/σ ≤ (x − μ)/σ) = F_Z((x − μ)/σ) = Φ((x − μ)/σ).

Example. Suppose that X ~ N(μ, σ²). For −∞ < a < b < ∞,
  P(a < X < b) = P((a−μ)/σ < (X−μ)/σ < (b−μ)/σ) = P((a−μ)/σ < Z < (b−μ)/σ)
              = P(Z < (b−μ)/σ) − P(Z ≤ (a−μ)/σ) = Φ((b−μ)/σ) − Φ((a−μ)/σ).

Table 5.1 (textbook, p.21) gives values of Φ. To read the table:
1. Find the first value of z, up to the first decimal place, in the left-hand column.
2. Find the second decimal place across the top row.
3. The value of Φ(z) is where the row from the first step and the column from the second step intersect.

For values greater than z = 3.49, Φ(z) ≈ 1. For negative values of z, use Φ(−z) = 1 − Φ(z).

The normal distribution plays a central role in the limit theorems of probability (e.g., the CLT, Chapter 8).
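In code, the table lookup for Φ is typically replaced by the error function, since Φ(z) = (1 + erf(z/√2))/2. A sketch (the parameters μ = 100, σ = 15 are illustrative):

```python
import math

def Phi(z):
    # Standard normal cdf via the error function: Φ(z) = (1 + erf(z/√2)) / 2
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Symmetry property used with the table: Φ(−z) = 1 − Φ(z)
assert abs(Phi(-1.23) - (1 - Phi(1.23))) < 1e-15

# P(a < X < b) = Φ((b−μ)/σ) − Φ((a−μ)/σ) for X ~ N(μ, σ²)
mu, sigma = 100.0, 15.0   # illustrative parameters
a, b = 85.0, 115.0        # one standard deviation on each side of μ
p = Phi((b - mu) / sigma) - Phi((a - mu) / sigma)
assert abs(p - 0.6827) < 0.001   # ≈ 68.3% of the mass lies within ±1σ
```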
Normal approximation to the Binomial

Recall: the Poisson approximation to the binomial.

Theorem. Suppose that X_n ~ binomial(n, p). Define Z_n = (X_n − np)/√(np(1−p)). Then, as n → ∞, the distribution of Z_n converges to the N(0, 1) distribution, i.e., F_{Z_n}(z) = P(Z_n ≤ z) → Φ(z).
Proof. It is a special case of the CLT in Chapter 8.

Plot the pmf of X_n ~ binomial(n, p) and superimpose the pdf of Y_n ~ N(μ_n, σ_n²) with μ_n = np and σ_n² = np(1−p). When n is sufficiently large, the normal pdf approximates the binomial pmf, with Z_n = (Y_n − μ_n)/σ_n.

The size of n needed to achieve a good approximation depends on the value of p: for p near 0.5, moderate n is enough; for p close to zero or one, a much larger n is required.

Continuity Correction

Q: Why is a continuity correction needed?
Ans. The binomial(n, p) is a discrete random variable and we are approximating it with a continuous random variable. For example, suppose X ~ binomial(50, 0.4) and we want to find P(X = 18), which is larger than 0. With the normal pdf, however, P(Y = 18) = 0, since we are using a continuous distribution. Instead, we make a continuity correction:
  P(X = 18) ≈ P(17.5 < Y < 18.5),
and can obtain the approximate value from Table 5.1. Similarly, P(X ≤ 18) ≈ P(Y ≤ 18.5).
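The continuity-corrected approximation for this example can be compared against the exact binomial probability:

```python
import math
from math import comb

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p, k = 50, 0.4, 18                # X ~ binomial(50, 0.4); want P(X = 18)
mu = n * p                           # μ = np = 20
sigma = math.sqrt(n * p * (1 - p))   # σ = √(np(1−p)) = √12

exact = comb(n, k) * p ** k * (1 - p) ** (n - k)

# Continuity correction: P(X = 18) ≈ P(17.5 < Y < 18.5), Y ~ N(μ, σ²)
approx = Phi((k + 0.5 - mu) / sigma) - Phi((k - 0.5 - mu) / sigma)

assert abs(exact - approx) < 0.01    # the two probabilities agree closely
```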
Summary for X ~ Normal(μ, σ²):
- Pdf: f(x) = (1/(√(2π) σ)) e^{−(x−μ)²/(2σ²)}, −∞ < x < ∞.
- Cdf: no closed form, but usually denoted by Φ((x − μ)/σ).
- Parameters: μ ∈ R and σ > 0
- Mean: E(X) = μ
- Variance: Var(X) = σ²

Weibull Distribution

For α, β > 0 and ν ∈ R, the function
  f(x) = (β/α)((x − ν)/α)^{β−1} e^{−((x−ν)/α)^β} if x ≥ ν, and 0 if x < ν,
is a pdf since (1) f(x) ≥ 0 for all x ∈ R, and (2) substituting y = ((x − ν)/α)^β,
  ∫_ν^∞ (β/α)((x − ν)/α)^{β−1} e^{−((x−ν)/α)^β} dx = ∫_0^∞ e^{−y} dy = [−e^{−y}]_0^∞ = 1.
The distribution of a random variable X with this pdf is called the Weibull distribution with parameters α, β, and ν.

(exercise) The cdf of the Weibull distribution is F(x) = 1 − e^{−((x−ν)/α)^β} if x ≥ ν, and 0 if x < ν.

Theorem. The mean and variance of a Weibull distribution with parameters α, β, and ν are
  μ = αΓ(1 + 1/β) + ν and σ² = α²[Γ(1 + 2/β) − Γ(1 + 1/β)²].
Proof. Substituting y = ((x − ν)/α)^β,
  E(X) = ∫_ν^∞ x (β/α)((x − ν)/α)^{β−1} e^{−((x−ν)/α)^β} dx = ∫_0^∞ (αy^{1/β} + ν) e^{−y} dy
       = α ∫_0^∞ y^{1/β} e^{−y} dy + ν ∫_0^∞ e^{−y} dy = αΓ(1/β + 1) + ν,
  E(X²) = ∫_0^∞ (αy^{1/β} + ν)² e^{−y} dy
        = α² ∫_0^∞ y^{2/β} e^{−y} dy + 2αν ∫_0^∞ y^{1/β} e^{−y} dy + ν² ∫_0^∞ e^{−y} dy
        = α²Γ(2/β + 1) + 2ανΓ(1/β + 1) + ν².
Some properties:
- The Weibull distribution is widely used to model lifetimes.
- α: scale parameter; β: shape parameter; ν: location parameter.
- Theorem. If X ~ exponential(λ), then Y = α(λX)^{1/β} + ν is distributed as Weibull with parameters α, β, and ν. (exercise)

Cauchy Distribution

For μ ∈ R and σ > 0, the function
  f(x) = (1/π) · σ/(σ² + (x − μ)²), −∞ < x < ∞,
is a pdf since (1) f(x) ≥ 0 for all x ∈ R, and (2) substituting y = (x − μ)/σ,
  ∫_{−∞}^{∞} (1/π) · σ/(σ² + (x − μ)²) dx = (1/π) ∫_{−∞}^{∞} 1/(1 + y²) dy = (1/π) [tan^{−1}(y)]_{−∞}^{∞} = 1.
The distribution of a random variable X with this pdf is called the Cauchy distribution with parameters μ and σ, denoted by Cauchy(μ, σ).

(exercise) The cdf of the Cauchy distribution is
  F(x) = ∫_{−∞}^x (1/π) · σ/(σ² + (y − μ)²) dy = 1/2 + (1/π) tan^{−1}((x − μ)/σ), for −∞ < x < ∞.

The mean and variance of the Cauchy distribution do not exist, because the defining integrals do not converge absolutely.

Some properties:
- Cauchy is a heavy-tailed distribution.
- μ: location parameter; σ: scale parameter.
- Theorem. If X ~ Cauchy(μ, σ), then aX + b ~ Cauchy(aμ + b, |a|σ). (exercise)

Reading: textbook, Sec 5.4, 5.5, 5.6
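The Cauchy cdf above gives an inverse-transform sampler, X = μ + σ·tan(π(U − 1/2)), and since the mean does not exist, the sample median (not the sample mean) is the stable estimate of the location μ. A sketch with illustrative μ = 3, σ = 2:

```python
import math, random

random.seed(4)
mu, sigma = 3.0, 2.0   # illustrative location and scale

def F(x):
    # Cauchy(μ, σ) cdf: 1/2 + (1/π)·arctan((x − μ)/σ)
    return 0.5 + math.atan((x - mu) / sigma) / math.pi

assert abs(F(mu) - 0.5) < 1e-15   # the median is the location parameter μ

# Inverse-transform sampling: X = μ + σ·tan(π(U − 1/2)) has cdf F
xs = [mu + sigma * math.tan(math.pi * (random.random() - 0.5)) for _ in range(200_000)]
xs.sort()
median = xs[len(xs) // 2]
assert abs(median - mu) < 0.05    # sample median ≈ μ (the mean does not exist)
```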