Chapter 3.3 Continuous distributions

In this section we study several continuous distributions and their properties. Here are a few, classified by their support $S_X$. There are of course many, many more. For each of these, we will discuss the PDF/CDF, moments, parameters, relationship with other distributions, and potential applications.

$S_X$                  Families
$(a, b)$               Uniform
$(0, 1)$               Beta
$(0, \infty)$          Exponential; Gamma; Log normal; Chi-squared
$(-\infty, \infty)$    Normal; Double exponential; T; Cauchy

ST521 Chapter 3.3 Page 1

Uniform distribution

The Uniform(a, b) distribution has support $S_X = (a, b)$. The family has two parameters a and b with a < b.

The PDF is $f_X(x) = \frac{1}{b-a} I(a < x < b)$.

The CDF is $F_X(x) = \frac{x-a}{b-a}$ for $a < x < b$, with $F_X(x) = 0$ for $x \le a$ and $F_X(x) = 1$ for $x \ge b$.

The mean is $E(X) = \frac{a+b}{2}$.

The variance is $V(X) = \frac{(b-a)^2}{12}$.
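The uniform PDF, CDF, and moments can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the values a = 2, b = 5 are illustrative assumptions, not part of the notes.

```python
import random

def uniform_pdf(x, a, b):
    """PDF of Uniform(a, b): 1/(b - a) on (a, b), zero elsewhere."""
    return 1.0 / (b - a) if a < x < b else 0.0

def uniform_cdf(x, a, b):
    """CDF of Uniform(a, b): 0 below a, linear on (a, b), 1 above b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

def uniform_mean(a, b):
    return (a + b) / 2.0

def uniform_var(a, b):
    return (b - a) ** 2 / 12.0

# Monte Carlo check with illustrative values a = 2, b = 5 (assumed for the demo).
rng = random.Random(0)
a, b = 2.0, 5.0
draws = [rng.uniform(a, b) for _ in range(200_000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)

print(sample_mean, uniform_mean(a, b))   # sample mean vs (a + b) / 2
print(sample_var, uniform_var(a, b))     # sample variance vs (b - a)^2 / 12
```

The sample moments should land close to the closed-form values, with the gap shrinking as the number of draws grows.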

Beta distribution

The Beta(a, b) distribution has support $S_X = (0, 1)$ and a more flexible shape than the uniform.

The PDF is $f_X(x) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} x^{a-1} (1-x)^{b-1}$.

The gamma function for a > 0 is $\Gamma(a) = \int_0^\infty t^{a-1} \exp(-t)\,dt$, so that (integration by parts) $\Gamma(a+1) = a\Gamma(a)$ for any a > 0, and thus $\Gamma(a) = (a-1)!$ if a is a positive integer.

The two parameters a > 0 and b > 0 control the shape of the PDF. The uniform on (0, 1) is the special case a = b = 1.

What type of data might be modeled with a beta?
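The beta PDF and the gamma-function facts above can be explored with a short script. This is a sketch under stated assumptions: the helper names `beta_pdf` and `integrate` and the shape values a = 2, b = 5 are chosen here for illustration.

```python
import math

def beta_pdf(x, a, b):
    """PDF of Beta(a, b) on (0, 1), using log-gamma for numerical stability."""
    if not 0.0 < x < 1.0:
        return 0.0
    log_const = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_const + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def integrate(f, lo, hi, n=100_000):
    """Midpoint-rule integral of f over (lo, hi)."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

# Gamma(a) = (a - 1)! for positive integers a: compare Gamma(5) with 4!.
print(math.gamma(5), math.factorial(4))

# Beta(1, 1) reduces to the Uniform(0, 1) density, which is 1 on (0, 1).
print(beta_pdf(0.3, 1, 1))

# The density should integrate to 1 (illustrative shape values a = 2, b = 5).
area = integrate(lambda x: beta_pdf(x, 2, 5), 0.0, 1.0)
print(area)
```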

The mean and variance of the beta are: $E(X) = \frac{a}{a+b}$ and $V(X) = \frac{ab}{(a+b)^2(a+b+1)}$.
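One way to obtain the beta moments, sketched here, uses only the fact that every Beta PDF integrates to 1, so that $\int_0^1 x^{a-1}(1-x)^{b-1}\,dx = \Gamma(a)\Gamma(b)/\Gamma(a+b)$, together with $\Gamma(a+1) = a\Gamma(a)$:

```latex
\begin{align*}
E(X) &= \int_0^1 x \,\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, x^{a-1}(1-x)^{b-1}\,dx
      = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \cdot \frac{\Gamma(a+1)\Gamma(b)}{\Gamma(a+b+1)}
      = \frac{a}{a+b}, \\
E(X^2) &= \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \cdot \frac{\Gamma(a+2)\Gamma(b)}{\Gamma(a+b+2)}
      = \frac{a(a+1)}{(a+b)(a+b+1)}, \\
V(X) &= E(X^2) - [E(X)]^2
      = \frac{a(a+1)}{(a+b)(a+b+1)} - \frac{a^2}{(a+b)^2}
      = \frac{ab}{(a+b)^2(a+b+1)}.
\end{align*}
```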

What if Y is a test score between 0 and 30? Can we model it with a beta?
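One possible approach, sketched below, is to rescale the score to X = Y/30 so that it lies in the unit interval, and then fit a beta to X. The scores are made up for illustration, the method-of-moments fit is just one convenient choice, and the helper name `beta_method_of_moments` is hypothetical. Scores of exactly 0 or 30 fall on the boundary of the support and would need separate handling.

```python
def beta_method_of_moments(xs):
    """Method-of-moments estimates (a, b) for data xs in (0, 1)."""
    n = len(xs)
    m = sum(xs) / n
    v = sum((x - m) ** 2 for x in xs) / (n - 1)
    common = m * (1 - m) / v - 1   # positive only when v < m(1 - m)
    return m * common, (1 - m) * common

# Hypothetical test scores out of 30 (made up for illustration).
scores = [12, 18, 21, 24, 25, 26, 27, 28, 22, 19]
max_score = 30

# Rescale to the unit interval so the Beta support (0, 1) applies.
props = [y / max_score for y in scores]
a_hat, b_hat = beta_method_of_moments(props)

# Implied mean on the original 0-30 scale: 30 * a / (a + b).
implied_mean = max_score * a_hat / (a_hat + b_hat)
print(a_hat, b_hat, implied_mean)
```

By construction the implied mean matches the sample mean of the raw scores, since the method of moments equates a/(a + b) with the sample mean of the rescaled data.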

Gamma distribution

The Gamma(a, b) distribution has support $S_X = R^+$.

The PDF is $f_X(x) = \frac{x^{a-1} \exp(-x/b)}{\Gamma(a)\, b^a}$.

Sometimes the PDF is written $f_X(x) = \frac{b^a x^{a-1} \exp(-xb)}{\Gamma(a)}$, so be careful! In that version b is a rate, the reciprocal of the scale.

The two parameters a > 0 and b > 0 control the shape of the PDF: a is the shape parameter and b is the scale. The shape of the PDF changes from very skewed for small a to nearly symmetric for large a.

To see that b sets the scale, note that if c > 0 and $X \sim \text{Gamma}(a, b)$, then $Y = cX \sim \text{Gamma}(a, cb)$.

What types of data might be modeled with a gamma?
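The two parameterizations and the scaling property can be compared numerically. The sketch below assumes illustrative values a = 3, b = 2, c = 5 and hypothetical helper names. Note that Python's `random.gammavariate(alpha, beta)` treats `beta` as a scale, matching the first form of the PDF.

```python
import math
import random

def gamma_pdf_scale(x, a, b):
    """Gamma PDF with shape a and scale b: x^(a-1) exp(-x/b) / (Gamma(a) b^a)."""
    return x ** (a - 1) * math.exp(-x / b) / (math.gamma(a) * b ** a)

def gamma_pdf_rate(x, a, r):
    """Gamma PDF with shape a and rate r: r^a x^(a-1) exp(-r x) / Gamma(a)."""
    return r ** a * x ** (a - 1) * math.exp(-r * x) / math.gamma(a)

a, b, c = 3.0, 2.0, 5.0   # illustrative values, not from the notes

# The two parameterizations agree when rate = 1 / scale.
print(gamma_pdf_scale(1.7, a, b), gamma_pdf_rate(1.7, a, 1.0 / b))

# Change of variables: the density of Y = cX at y is f_X(y / c) / c,
# which should match the Gamma(a, cb) density at y.
y = 4.2
print(gamma_pdf_scale(y / c, a, b) / c, gamma_pdf_scale(y, a, c * b))

# Simulation: scale draws from Gamma(a, b) by c and look at the sample mean.
rng = random.Random(1)
scaled = [c * rng.gammavariate(a, b) for _ in range(100_000)]
scaled_mean = sum(scaled) / len(scaled)
print(scaled_mean)   # should be near a * (c * b)
```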

The mean and variance of the gamma are: $E(X) = ab$ and $V(X) = ab^2$.
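A sketch of the calculation, using the scale parameterization and the identity $\int_0^\infty x^{a+k-1}\exp(-x/b)\,dx = \Gamma(a+k)\,b^{a+k}$ (the normalizing constant of a Gamma(a + k, b) density):

```latex
\begin{align*}
E(X^k) &= \int_0^\infty x^k \,\frac{x^{a-1}\exp(-x/b)}{\Gamma(a)\,b^a}\,dx
        = \frac{\Gamma(a+k)\,b^{a+k}}{\Gamma(a)\,b^a}
        = \frac{\Gamma(a+k)}{\Gamma(a)}\,b^k, \\
E(X) &= \frac{\Gamma(a+1)}{\Gamma(a)}\,b = ab, \qquad
E(X^2) = \frac{\Gamma(a+2)}{\Gamma(a)}\,b^2 = a(a+1)b^2, \\
V(X) &= E(X^2) - [E(X)]^2 = a(a+1)b^2 - a^2b^2 = ab^2.
\end{align*}
```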

The gamma has two important special cases, the exponential and the chi-square.

If $X \sim \text{Gamma}(a, b)$ with a = p/2 and b = 2, then $X \sim \text{Chi-squared}(p)$. If the data are normal, the sample variance (suitably scaled) follows a chi-square distribution (Chapter 5).

If a = 1, then $X \sim \text{Exponential}(b)$ with PDF $f_X(x) = \frac{1}{b} \exp(-x/b)$.

The exponential has the memoryless property, sometimes used in reliability analysis: $P(X > t + c \mid X > c) = P(X > t)$.
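The memoryless property follows from the exponential survival function $P(X > x) = \exp(-x/b)$, since the ratio $\exp(-(t+c)/b)/\exp(-c/b)$ equals $\exp(-t/b)$. The sketch below checks this and the two special cases numerically; the helper names and the values b = 4, t = 1.5, c = 2, p = 5 are illustrative assumptions.

```python
import math

def exp_survival(x, b):
    """P(X > x) for X ~ Exponential(b) with scale b (mean b)."""
    return math.exp(-x / b) if x > 0 else 1.0

def gamma_pdf(x, a, b):
    """Gamma(a, b) PDF with shape a and scale b."""
    return x ** (a - 1) * math.exp(-x / b) / (math.gamma(a) * b ** a)

def chi2_pdf(x, p):
    """Chi-squared(p) PDF written out directly."""
    return x ** (p / 2 - 1) * math.exp(-x / 2) / (2 ** (p / 2) * math.gamma(p / 2))

b, t, c = 4.0, 1.5, 2.0   # illustrative values

# Memoryless property: P(X > t + c | X > c) = P(X > t + c) / P(X > c).
conditional = exp_survival(t + c, b) / exp_survival(c, b)
print(conditional, exp_survival(t, b))

# Special cases of the gamma PDF, evaluated at x = 3.
p = 5
print(gamma_pdf(3.0, p / 2, 2.0), chi2_pdf(3.0, p))     # a = p/2, b = 2
print(gamma_pdf(3.0, 1.0, b), math.exp(-3.0 / b) / b)   # a = 1
```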

Double exponential distribution

The double exponential has support on all real numbers. If $X \sim \text{DE}(\mu, \sigma)$, then

$f_X(x) = \frac{1}{2\sigma} \exp\left(-\frac{|x-\mu|}{\sigma}\right)$.

The mean is $E(X) = \mu$ and the variance is $V(X) = 2\sigma^2$.
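The stated mean and variance can be checked by numerical integration. This is a rough sketch: the midpoint rule, the truncation of the real line at 40 scale units from the center, and the values µ = 1, σ = 2 are all choices made here for illustration.

```python
import math

def de_pdf(x, mu, sigma):
    """Double exponential (Laplace) PDF: exp(-|x - mu| / sigma) / (2 sigma)."""
    return math.exp(-abs(x - mu) / sigma) / (2.0 * sigma)

def integrate(f, lo, hi, n=200_000):
    """Midpoint-rule integral of f over (lo, hi)."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

mu, sigma = 1.0, 2.0                          # illustrative values
lo, hi = mu - 40 * sigma, mu + 40 * sigma     # tail mass beyond this is negligible

total = integrate(lambda x: de_pdf(x, mu, sigma), lo, hi)
mean = integrate(lambda x: x * de_pdf(x, mu, sigma), lo, hi)
var = integrate(lambda x: (x - mu) ** 2 * de_pdf(x, mu, sigma), lo, hi)

print(total, mean, var)   # compare with 1, mu, and 2 * sigma**2
```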

Normal distribution

By far, the most common continuous distribution used in statistics is the normal (also called Gaussian) distribution. It is extremely useful because of
1. The central limit theorem
2. Mathematical tractability

If $X \sim N(\mu, \sigma^2)$, then

$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$.

The two parameters are the mean $E(X) = \mu$ and the variance $V(X) = \sigma^2$.

Since the log PDF is quadratic in the error $x - \mu$, it turns out there is a connection between the normal distribution and the sum of squared errors in a least squares analysis (Chapter 11).
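The link between the log PDF and squared errors can be made concrete. In the sketch below, `normal_pdf` and `normal_log_pdf` are hypothetical helpers parameterized by the variance (as in the notes), the data are made up, and `statistics.NormalDist` from the Python standard library serves as a reference. Note that `NormalDist` takes the standard deviation rather than the variance.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma2):
    """N(mu, sigma2) PDF, parameterized by the variance sigma2."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def normal_log_pdf(x, mu, sigma2):
    """Log PDF: a constant minus the squared error (x - mu)^2 over 2 sigma2."""
    return -0.5 * math.log(2.0 * math.pi * sigma2) - (x - mu) ** 2 / (2.0 * sigma2)

mu, sigma2 = 1.0, 4.0   # illustrative values

# Compare the hand-written PDF with the standard library reference.
reference = NormalDist(mu=mu, sigma=math.sqrt(sigma2))
print(normal_pdf(2.5, mu, sigma2), reference.pdf(2.5))

# For fixed sigma2, the summed log PDF equals a constant minus SSE / (2 sigma2),
# so maximizing it over mu is the same as minimizing the sum of squared errors.
data = [0.3, 1.9, 2.2, -0.5, 1.4]   # made-up observations
sse = sum((x - mu) ** 2 for x in data)
loglik = sum(normal_log_pdf(x, mu, sigma2) for x in data)
const = -0.5 * len(data) * math.log(2.0 * math.pi * sigma2)
print(loglik, const - sse / (2.0 * sigma2))
```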

The moment generating function is: $M_X(t) = E[\exp(tX)] = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right)$.
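A sketch of the standard derivation by completing the square in the exponent. The last integral equals 1 because its integrand is the $N(\mu + \sigma^2 t, \sigma^2)$ density:

```latex
\begin{align*}
M_X(t) &= \int_{-\infty}^{\infty} \exp(tx)\,\frac{1}{\sqrt{2\pi}\,\sigma}
          \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] dx, \\
tx - \frac{(x-\mu)^2}{2\sigma^2}
       &= -\frac{(x-\mu-\sigma^2 t)^2}{2\sigma^2} + \mu t + \frac{\sigma^2 t^2}{2}, \\
M_X(t) &= \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right)
          \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}
          \exp\left[-\frac{(x-\mu-\sigma^2 t)^2}{2\sigma^2}\right] dx
        = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right).
\end{align*}
```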

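Therefore the mean and variance are: $E(X) = M_X'(0) = \mu$ and $V(X) = M_X''(0) - [M_X'(0)]^2 = (\sigma^2 + \mu^2) - \mu^2 = \sigma^2$, where $M_X$ denotes the moment generating function of X.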

Setting $\mu = 0$ and $\sigma^2 = 1$ gives the standard normal distribution, $Z \sim N(0, 1)$.

Empirical rule:
$P(-1 < Z < 1) \approx 0.68$
$P(-2 < Z < 2) \approx 0.95$
$P(-3 < Z < 3) \approx 0.997$

If $Y = \mu + \sigma Z$, then $E(Y) = \mu + \sigma E(Z) = \mu$ and $V(Y) = \sigma^2 V(Z) = \sigma^2$. In fact, a linear combination of normals is normal (Chapter 5), so $Y \sim N(\mu, \sigma^2)$. That is, the normal distribution is a location-scale distribution (Chapter 3.5).

This works the other way too: if $Y \sim N(\mu, \sigma^2)$ and $Z = (Y - \mu)/\sigma$, then $Z \sim N(0, 1)$.
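The empirical rule and the standardization step can be reproduced with `statistics.NormalDist` from the Python standard library. The helper `normal_interval_prob` and the values µ = 10, σ = 3 are illustrative assumptions.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mean 0, standard deviation 1

# Empirical rule: P(-k < Z < k) for k = 1, 2, 3.
rule = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}
for k, prob in rule.items():
    print(k, prob)

def normal_interval_prob(lo, hi, mu, sigma):
    """P(lo < Y < hi) for Y ~ N(mu, sigma^2), computed by standardizing."""
    return Z.cdf((hi - mu) / sigma) - Z.cdf((lo - mu) / sigma)

# Illustrative values: Y ~ N(10, 3^2), probability of being within 2 SDs of the mean.
mu, sigma = 10.0, 3.0
within_two = normal_interval_prob(mu - 2 * sigma, mu + 2 * sigma, mu, sigma)
print(within_two)
```

Because standardizing maps the interval for Y onto (-2, 2) for Z, the last probability agrees with the k = 2 entry of the empirical rule whatever values of µ and σ are used.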

Another version of the empirical rule: if Y is normal, then
Y is within one standard deviation of the mean with probability approximately 0.68;
Y is within two standard deviations of the mean with probability approximately 0.95;
Y is within three standard deviations of the mean with probability approximately 0.997.

The normal distribution can be used to approximate many other distributions. For example, if $Y \sim \text{Binomial}(n, p)$, then $E(Y) = np$ and $V(Y) = np(1-p)$, and if n is large, Y is approximately $N[np, np(1-p)]$. This can be used to avoid tedious look-ups in the binomial table.

Example: If $Y \sim \text{Binomial}(1000, 0.1)$, what is $P(Y < 90)$? Use both the exact binomial and approximate normal computation.
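The example can be worked in a few lines of Python. This is one possible sketch: the exact sum uses a hypothetical `binom_pmf` helper computed on the log scale, and the normal approximation is shown both as stated and with a continuity correction. The correction is not mentioned in the notes and is included here as a common refinement.

```python
import math
from statistics import NormalDist

def binom_pmf(k, n, p):
    """Binomial(n, p) PMF, computed on the log scale to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

n, p = 1000, 0.1
mean, sd = n * p, math.sqrt(n * p * (1 - p))   # np = 100, sqrt(np(1 - p)) = sqrt(90)

# Exact: P(Y < 90) = P(Y <= 89) because Y is integer valued.
exact = sum(binom_pmf(k, n, p) for k in range(90))

# Normal approximation, without and with a continuity correction.
Z = NormalDist()
approx_plain = Z.cdf((90 - mean) / sd)
approx_cc = Z.cdf((89.5 - mean) / sd)

print(exact, approx_plain, approx_cc)
```

Since Y takes only integer values, evaluating the normal CDF at 89.5 rather than 90 usually tracks the exact probability more closely.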

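Log normal distribution

If $X \sim N(\mu, \sigma^2)$, then $Y = \exp(X) \sim \text{LogNormal}(\mu, \sigma^2)$.

Y's support is $(0, \infty)$.

$f_Y(y) = \frac{1}{y\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(\log y - \mu)^2}{2\sigma^2}\right]$ for $y > 0$.

$E(Y) = \exp\left(\mu + \frac{\sigma^2}{2}\right)$.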
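The log normal PDF comes from the change of variables $x = \log y$, which contributes the Jacobian factor $1/y$. The sketch below checks numerically that the density integrates to 1 and that the mean is $\exp(\mu + \sigma^2/2)$. It integrates over $x = \log y$ so the heavy right tail sits on a bounded grid. The helper names and the values µ = 0.5, σ² = 0.25 are illustrative assumptions.

```python
import math

def lognormal_pdf(y, mu, sigma2):
    """LogNormal(mu, sigma2) PDF: the normal PDF at log(y) times the Jacobian 1/y."""
    if y <= 0:
        return 0.0
    return (math.exp(-(math.log(y) - mu) ** 2 / (2.0 * sigma2))
            / (y * math.sqrt(2.0 * math.pi * sigma2)))

def integrate(f, lo, hi, n=200_000):
    """Midpoint-rule integral of f over (lo, hi)."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

mu, sigma2 = 0.5, 0.25   # illustrative values
sigma = math.sqrt(sigma2)
lo, hi = mu - 12 * sigma, mu + 12 * sigma   # range for x = log(y)

# Substitute y = exp(x), dy = exp(x) dx, in both integrals.
total = integrate(lambda x: lognormal_pdf(math.exp(x), mu, sigma2) * math.exp(x), lo, hi)
mean = integrate(lambda x: math.exp(x) * lognormal_pdf(math.exp(x), mu, sigma2) * math.exp(x),
                 lo, hi)

print(total)                             # compare with 1
print(mean, math.exp(mu + sigma2 / 2))   # compare with exp(mu + sigma2 / 2)
```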