Sections

Discrete univariate distributions:
  5.2 Bernoulli and Binomial distributions
  5.3 Hypergeometric distributions (just skim)
  5.4 Poisson distributions
  5.5 Negative Binomial distributions (just skim)
Continuous univariate distributions:
  5.6 Normal distributions
  5.7 Gamma distributions
  5.8 Beta distributions (just skim)
Multivariate distributions:
  5.9 Multinomial distributions (just skim)
  5.10 Bivariate normal distributions
5.1 Introduction

Families of distributions

How:
  - Parameter and parameter space
  - pf/pdf and cdf, with new notation: $f(x \mid \text{parameters})$
  - Mean, variance and the m.g.f. $\psi(t)$
  - Features, connections to other distributions, approximation
Why (the reasoning behind a distribution):
  - Natural justification for certain experiments
  - A model for the uncertainty in an experiment

"All models are wrong, but some are useful." (George Box)
5.2 Bernoulli and Binomial distributions

Bernoulli distributions

Def: Bernoulli distributions, Bernoulli(p)
A r.v. $X$ has the Bernoulli distribution with parameter $p$ if $P(X = 1) = p$ and $P(X = 0) = 1 - p$. The pf of $X$ is
\[ f(x \mid p) = \begin{cases} p^x (1-p)^{1-x} & \text{for } x = 0, 1 \\ 0 & \text{otherwise} \end{cases} \]
Parameter space: $p \in [0, 1]$

In an experiment with only two possible outcomes, "success" and "failure", let $X$ = number of successes. Then $X \sim \text{Bernoulli}(p)$, where $p$ is the probability of success.

$E(X) = p$, $\text{Var}(X) = p(1-p)$ and $\psi(t) = E(e^{tX}) = pe^t + (1 - p)$

The cdf is
\[ F(x \mid p) = \begin{cases} 0 & \text{for } x < 0 \\ 1 - p & \text{for } 0 \le x < 1 \\ 1 & \text{for } x \ge 1 \end{cases} \]
Binomial distributions

Def: Binomial distributions, Binomial(n, p)
A r.v. $X$ has the Binomial distribution with parameters $n$ and $p$ if $X$ has the pf
\[ f(x \mid n, p) = \begin{cases} \binom{n}{x} p^x (1-p)^{n-x} & \text{for } x = 0, 1, \ldots, n \\ 0 & \text{otherwise} \end{cases} \]
Parameter space: $n$ is a positive integer and $p \in [0, 1]$

If $X$ is the number of successes in $n$ independent tries where the probability of success is $p$ each time, then $X \sim \text{Binomial}(n, p)$.

Theorem 5.2.1
If $X_1, X_2, \ldots, X_n$ form $n$ Bernoulli trials with parameter $p$ (i.e. are i.i.d. Bernoulli(p)), then $X = X_1 + \cdots + X_n \sim \text{Binomial}(n, p)$.
Binomial distributions

Let $X \sim \text{Binomial}(n, p)$.

$E(X) = np$, $\text{Var}(X) = np(1-p)$

To find the m.g.f. of $X$, write $X = X_1 + \cdots + X_n$ where the $X_i$'s are i.i.d. Bernoulli(p). Then $\psi_i(t) = pe^t + 1 - p$ and we get
\[ \psi(t) = \prod_{i=1}^{n} \psi_i(t) = \prod_{i=1}^{n} \left( pe^t + 1 - p \right) = (pe^t + 1 - p)^n \]

cdf: $F(x \mid n, p) = \sum_{t=0}^{x} \binom{n}{t} p^t (1-p)^{n-t}$ = yikes!

Theorem 5.2.2
If $X_i \sim \text{Binomial}(n_i, p)$, $i = 1, \ldots, k$, and the $X_i$'s are independent, then $X = X_1 + \cdots + X_k \sim \text{Binomial}\left(\sum_{i=1}^{k} n_i, p\right)$.
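The binomial cdf has no tidy closed form, but it is easy to evaluate by summing the pf. The sketch below is only an illustration: the function names and the values n = 10, p = 0.3 are assumptions, not from the slides. It also checks $E(X) = np$ and $\text{Var}(X) = np(1-p)$ numerically.

```python
from math import comb

def binom_pmf(x, n, p):
    """pf of Binomial(n, p) at an integer x; zero outside 0..n."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(x, n, p):
    """cdf by direct summation of the pf over t = 0..floor(x)."""
    if x < 0:
        return 0.0
    upper = min(int(x), n)
    return sum(binom_pmf(t, n, p) for t in range(upper + 1))

n, p = 10, 0.3  # illustrative values only

# Mean and variance computed from the pf, to compare with np and np(1-p)
mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
var = sum((x - mean) ** 2 * binom_pmf(x, n, p) for x in range(n + 1))

print(mean, n * p)
print(var, n * p * (1 - p))
print(binom_cdf(3, n, p))
```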
Example: Blood testing (Example 5.2.7)

The setup: 1000 people need to be tested for a disease that affects 0.2% of all people. The test is guaranteed to detect the disease if it is present in a blood sample.

Task: Find all the people that have the disease.

Strategy: Test 1000 samples.
  - What's the expected number of people that have the disease?
  - Any assumptions you need to make?

Strategy (611): Divide the people into 10 groups of 100. For each group, take a portion of each of the 100 blood samples and combine them into one sample. Then test the combined blood samples (10 tests).
Example: Blood testing (Example 5.2.7) continued

Strategy (611):
  - If all of these tests are negative, then none of the 1000 people have the disease. Total number of tests needed: 10
  - If one of these tests is positive, then we test each of the 100 people in that group. Total number of tests needed: 110
  - ...
  - If all of the 10 tests are positive, we end up having to do 1010 tests.

Is this strategy better?
  - What is the expected number of tests needed?
  - When does this strategy lose?
Example: Blood testing (Example 5.2.7) continued

Let $Y_i = 1$ if the test for group $i$ is positive and $Y_i = 0$ otherwise.
Let $Y = Y_1 + \cdots + Y_{10}$ = the number of groups where every individual has to be tested.
Total number of tests needed: $T = 10 + 100Y$.

Let $Z_i$ = number of people in group $i$ that have the disease, $i = 1, \ldots, 10$. Then $Z_i \sim \text{Binomial}(100, 0.002)$.
Then $Y_i$ is a Bernoulli(p) r.v. where
\[ p = P(Y_i = 1) = P(Z_i > 0) = 1 - P(Z_i = 0) = 1 - \binom{100}{0} 0.002^0 (1 - 0.002)^{100} = 1 - 0.998^{100} = 0.181 \]
Then $Y \sim \text{Binomial}(10, 0.181)$ and
\[ E(T) = E(10 + 100Y) = 10 + 100E(Y) = 10 + 100(10 \times 0.181) = 191 \]
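The arithmetic on this slide can be reproduced in a few lines. This is a minimal sketch; the variable names are illustrative assumptions, and it keeps $p$ unrounded, so the result comes out near 191.4 rather than exactly the slide's 191.

```python
# Pooled testing: 10 groups of 100, disease prevalence 0.2%.
# All variable names here are illustrative, not from the slides.
n_groups, group_size, prevalence = 10, 100, 0.002

# P(group test is positive) = 1 - P(nobody in the group has the disease)
p_group_positive = 1 - (1 - prevalence) ** group_size

# Y ~ Binomial(n_groups, p), so E(Y) = n_groups * p and E(T) = 10 + 100 E(Y)
expected_positive_groups = n_groups * p_group_positive
expected_tests = n_groups + group_size * expected_positive_groups

print(f"p = {p_group_positive:.4f}")
print(f"E(T) = {expected_tests:.1f}")
```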
Example: Blood testing (Example 5.2.7) continued

When does this strategy (611) lose? Worst-case scenario:
\[ P(T \ge 1000) = P(Y \ge 9.9) = P(Y = 10) = \binom{10}{10} 0.181^{10} \, 0.819^{0} \approx 3.8 \times 10^{-8} \]

Question: can we go further, with a 611-A strategy? Any further improvement?
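The slide does not say what the 611-A strategy is. One possible direction, shown here purely as an assumption, is to keep the same one-round pooling scheme but vary the group size. The sketch computes the worst-case probability for groups of 100 and scans the group sizes that divide 1000 evenly; the function and variable names are hypothetical.

```python
prevalence, n_people = 0.002, 1000

def expected_tests(group_size):
    """E(T) for one round of pooled tests with equal groups, then individual retests."""
    n_groups = n_people // group_size
    p_positive = 1 - (1 - prevalence) ** group_size
    # One pooled test per group, plus group_size retests per positive group
    return n_groups + group_size * n_groups * p_positive

# Worst case for groups of 100: all 10 pooled tests come back positive
p_100 = 1 - (1 - prevalence) ** 100
p_worst = p_100 ** 10

# Assumption: only group sizes that split 1000 people evenly are considered
candidates = [g for g in range(2, n_people + 1) if n_people % g == 0]
best_size = min(candidates, key=expected_tests)

print(f"P(T >= 1000) with groups of 100: {p_worst:.2e}")
print(f"best group size: {best_size}, E(T) = {expected_tests(best_size):.1f}")
```

Under these assumptions the expected number of tests drops well below the 191 of the original strategy when the groups are made smaller, which suggests there is room for improvement.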
5.3 Hypergeometric distributions

Def: Hypergeometric distributions
A random variable $X$ has the Hypergeometric distribution with parameters $N$, $M$ and $n$ if it has the pf
\[ f(x \mid N, M, n) = \frac{\binom{N}{x} \binom{M}{n-x}}{\binom{N+M}{n}} \]
for integers $x$ with $\max(0, n - M) \le x \le \min(n, N)$, and $f(x \mid N, M, n) = 0$ otherwise.
Parameter space: $N$, $M$ and $n$ are nonnegative integers with $n \le N + M$

Reasoning:
  - Say we have a finite population with $N$ items of type I and $M$ items of type II.
  - Let $X$ be the number of items of type I when we draw a sample of size $n$ without replacement from that population.
  - Then $X$ has the hypergeometric distribution.
Hypergeometric distributions

Binomial: Sampling with replacement (effectively infinite population)
Hypergeometric: Sampling without replacement from a finite population

You can also think of the Hypergeometric distribution as a sum of dependent Bernoulli trials.

Limiting situation:
Theorem 5.3.4: If the sample size $n$ is much smaller than the total population $N + M$, then the Hypergeometric distribution with parameters $N$, $M$ and $n$ will be nearly the same as the Binomial distribution with parameters $n$ and
\[ p = \frac{N}{N + M} \]
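A quick numerical look at Theorem 5.3.4: the sketch below compares the two pfs when the sample is a large share of the population and when it is a tiny share. The population sizes and function names are illustrative assumptions.

```python
from math import comb

def hypergeom_pmf(x, N, M, n):
    """pf of Hypergeometric(N, M, n): x type-I items in n draws without replacement."""
    if x < 0 or x > n or x > N or n - x > M:
        return 0.0
    return comb(N, x) * comb(M, n - x) / comb(N + M, n)

def binom_pmf(x, n, p):
    """pf of Binomial(n, p) for x in 0..n."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def max_gap(N, M, n):
    """Largest pointwise difference between the two pfs over x = 0..n."""
    p = N / (N + M)
    return max(abs(hypergeom_pmf(x, N, M, n) - binom_pmf(x, n, p))
               for x in range(n + 1))

gap_small_population = max_gap(6, 14, 10)       # sample is half the population
gap_large_population = max_gap(3000, 7000, 10)  # sample is 0.1% of the population

print(gap_small_population, gap_large_population)
```

Both cases use $p = 0.3$ and $n = 10$; only the population size changes, so any difference in the gap comes from sampling without replacement.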
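5.4 Poisson distributions

Def: Poisson distributions, Poisson(λ)
A random variable $X$ has the Poisson distribution with mean $\lambda$ if it has the pf
\[ f(x \mid \lambda) = \begin{cases} \dfrac{e^{-\lambda} \lambda^x}{x!} & \text{for } x = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \]
Parameter space: $\lambda > 0$

Show that:
  - $f(x \mid \lambda)$ is a pf
  - $E(X) = \lambda$
  - $\text{Var}(X) = \lambda$
  - $\psi(t) = e^{\lambda(e^t - 1)}$

The cdf: $F(x \mid \lambda) = \sum_{k=0}^{x} \dfrac{e^{-\lambda} \lambda^k}{k!}$ = yikes.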
Why Poisson?

The Poisson distribution is useful for modeling uncertainty in counts / arrivals.
Examples:
  - How many calls arrive at a switchboard in one hour?
  - How many buses pass while you wait at the bus stop for 10 min?
  - How many bird nests are there in a certain area?

Under certain conditions (the Poisson postulates), the Poisson distribution can be shown to be the distribution of the number of arrivals (Poisson process). However, the Poisson distribution is often used as a model for uncertainty of counts in other types of experiments.

The Poisson distribution can also be used as an approximation to the Binomial(n, p) distribution when $n$ is large and $p$ is small.
Poisson Postulates

For $t \ge 0$, let $X_t$ be a random variable with possible values in $\mathbb{N}_0$.
(Think: $X_t$ = number of arrivals from time 0 to time $t$)

(i) Start with no arrivals: $X_0 = 0$
(ii) Arrivals in disjoint time periods are independent: $X_s$ and $X_t - X_s$ are independent if $s < t$
(iii) Number of arrivals depends only on period length: $X_s$ and $X_{t+s} - X_t$ are identically distributed
(iv) Arrival probability is proportional to period length, if the length is small: $\lim_{t \to 0} \dfrac{P(X_t = 1)}{t} = \lambda$
(v) No simultaneous arrivals: $\lim_{t \to 0} \dfrac{P(X_t > 1)}{t} = 0$

If (i)-(v) hold, then for any integer $n \ge 0$
\[ P(X_t = n) = e^{-\lambda t} \frac{(\lambda t)^n}{n!} \]
that is, $X_t \sim \text{Poisson}(\lambda t)$.

Can be defined in terms of spatial areas too.
Properties of the Poisson Distributions

Useful recursive property:
\[ P(X = x) = \frac{\lambda}{x} P(X = x - 1) \quad \text{for } x \ge 1 \]

Theorem 5.4.4: Sum of Poissons is a Poisson
If $X_1, \ldots, X_k$ are independent r.v. and $X_i \sim \text{Poisson}(\lambda_i)$ for all $i$, then
\[ X_1 + \cdots + X_k \sim \text{Poisson}\left( \sum_{i=1}^{k} \lambda_i \right) \]

Theorem 5.4.5: Approximation to Binomial
Let $X_n \sim \text{Binomial}(n, p_n)$, where $0 < p_n < 1$ for all $n$ and $\{p_n\}_{n=1}^{\infty}$ is a sequence so that $\lim_{n \to \infty} n p_n = \lambda$. Then for all $x = 0, 1, 2, \ldots$
\[ \lim_{n \to \infty} f_{X_n}(x \mid n, p_n) = e^{-\lambda} \frac{\lambda^x}{x!} = f_{\text{Poisson}}(x \mid \lambda) \]
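The recursive property gives a cheap way to tabulate the pf without computing factorials, and Theorem 5.4.4 can be checked numerically by convolving two such tables. The sketch below assumes $\lambda_1 = 2$ and $\lambda_2 = 1.5$ as arbitrary example values; the function names are illustrative.

```python
from math import exp, factorial

def poisson_pmf_direct(x, lam):
    """pf from the definition e^{-lam} lam^x / x!."""
    return exp(-lam) * lam**x / factorial(x)

def poisson_pmf_table(lam, x_max):
    """pf values for x = 0..x_max built with P(X=x) = (lam/x) P(X=x-1)."""
    probs = [exp(-lam)]
    for x in range(1, x_max + 1):
        probs.append(probs[-1] * lam / x)
    return probs

lam1, lam2, x_max = 2.0, 1.5, 30  # illustrative values only
p1 = poisson_pmf_table(lam1, x_max)
p2 = poisson_pmf_table(lam2, x_max)

# pf of X1 + X2 for independent X1, X2: P(sum = s) = sum_j P(X1 = j) P(X2 = s - j)
conv = [sum(p1[j] * p2[s - j] for j in range(s + 1)) for s in range(x_max + 1)]

# Theorem 5.4.4 says this should match Poisson(lam1 + lam2)
p_sum = poisson_pmf_table(lam1 + lam2, x_max)
max_gap = max(abs(a - b) for a, b in zip(conv, p_sum))

print(max_gap)
```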
Example: Poisson as approximation to Binomial

Recall the disease testing example. We had
\[ X = \sum_{i=1}^{1000} X_i \sim \text{Binomial}(1000, 0.002) \quad \text{and} \quad Y \sim \text{Binomial}(10, 0.181) \]
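To see how well the approximation works for these two variables, the sketch below compares each binomial pf with the Poisson pf that has the same mean $\lambda = np$. The function names and the truncation at $x = 50$ are assumptions made for illustration; beyond that point both pfs are negligible for these parameters.

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    """pf of Binomial(n, p) for x in 0..n."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    """pf of Poisson(lam) for x = 0, 1, 2, ..."""
    return exp(-lam) * lam**x / factorial(x)

def max_gap(n, p, x_cap=50):
    """Largest pointwise gap between Binomial(n, p) and Poisson(np) over x = 0..min(n, x_cap)."""
    lam = n * p
    return max(abs(binom_pmf(x, n, p) - poisson_pmf(x, lam))
               for x in range(min(n, x_cap) + 1))

gap_x = max_gap(1000, 0.002)  # n large, p small
gap_y = max_gap(10, 0.181)    # n small, p not small

print(gap_x, gap_y)
```

The first case fits the conditions of Theorem 5.4.5 (large $n$, small $p$) and the second does not, so the gap for $Y$ is expected to be noticeably larger.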
5.5 Negative Binomial distributions

Geometric distributions

Def: Geometric distributions, Geometric(p)
A random variable $X$ has the Geometric distribution with parameter $p$ if it has the pf
\[ f(x \mid p) = \begin{cases} p(1-p)^x & \text{for } x = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \]
Parameter space: $0 < p < 1$

Say we have an infinite sequence of Bernoulli trials with parameter $p$.
$X$ = number of "failures" before the first "success". Then $X \sim \text{Geometric}(p)$.
Negative Binomial distributions

Def: Negative Binomial distributions, NegBinomial(r, p)
A random variable $X$ has the Negative Binomial distribution with parameters $r$ and $p$ if it has the pf
\[ f(x \mid r, p) = \begin{cases} \binom{r + x - 1}{x} p^r (1-p)^x & \text{for } x = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases} \]
Parameter space: $0 < p < 1$ and $r$ a positive integer.

Say we have an infinite sequence of Bernoulli trials with parameter $p$.
$X$ = number of "failures" before the $r$th "success". Then $X \sim \text{NegBinomial}(r, p)$.

Geometric(p) = NegBinomial(1, p)

Theorem 5.5.2: If $X_1, \ldots, X_r$ are i.i.d. Geometric(p), then $X = X_1 + \cdots + X_r \sim \text{NegBinomial}(r, p)$.
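Both identities on this slide can be checked numerically. The sketch below, with the assumed values $p = 0.3$ and $r = 3$ and illustrative function names, confirms that NegBinomial(1, p) matches the Geometric pf and that convolving three Geometric pfs reproduces the NegBinomial(3, p) pf.

```python
from math import comb

def geom_pmf(x, p):
    """pf of Geometric(p): number of failures before the first success."""
    return p * (1 - p) ** x if x >= 0 else 0.0

def negbin_pmf(x, r, p):
    """pf of NegBinomial(r, p): number of failures before the r-th success."""
    return comb(r + x - 1, x) * p**r * (1 - p) ** x if x >= 0 else 0.0

def convolve(a, b):
    """pf of the sum of two independent counts, for totals 0..len(a)-1."""
    return [sum(a[j] * b[s - j] for j in range(s + 1)) for s in range(len(a))]

p, r, x_max = 0.3, 3, 40  # illustrative values only

geom_table = [geom_pmf(x, p) for x in range(x_max + 1)]

# Sum of r i.i.d. Geometric(p) variables, built by repeated convolution
sum_table = geom_table
for _ in range(r - 1):
    sum_table = convolve(sum_table, geom_table)

negbin_table = [negbin_pmf(x, r, p) for x in range(x_max + 1)]
max_gap = max(abs(a - b) for a, b in zip(sum_table, negbin_table))

# The pf should sum to (essentially) 1 once the tail is negligible
total = sum(negbin_pmf(x, r, p) for x in range(401))

print(max_gap, total)
```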
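5.7 Gamma distributions

The Gamma function: $\Gamma(\alpha) = \int_0^{\infty} x^{\alpha - 1} e^{-x} \, dx$
  - $\Gamma(1) = 1$ and $\Gamma(0.5) = \sqrt{\pi}$
  - $\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1)$ if $\alpha > 1$

Def: Gamma distributions, Gamma(α, β)
A continuous r.v. $X$ has the gamma distribution with parameters $\alpha$ and $\beta$ if it has the pdf
\[ f(x \mid \alpha, \beta) = \begin{cases} \dfrac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x} & \text{for } x > 0 \\ 0 & \text{otherwise} \end{cases} \]
Parameter space: $\alpha > 0$ and $\beta > 0$

Gamma(1, β) is the same as the exponential distribution with parameter $\beta$, Expo(β).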
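Properties of the gamma distributions

$\psi(t) = \left( \dfrac{\beta}{\beta - t} \right)^{\alpha}$, for $t < \beta$.

$E(X) = \dfrac{\alpha}{\beta}$ and $\text{Var}(X) = \dfrac{\alpha}{\beta^2}$

If $X_1, \ldots, X_k$ are independent Gamma($\alpha_i$, β) r.v., then
\[ X_1 + \cdots + X_k \sim \text{Gamma}\left( \sum_{i=1}^{k} \alpha_i, \beta \right) \]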
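A small simulation can make the mean and variance formulas concrete. The sketch below assumes $\alpha = 3$ and $\beta = 2$ as example values. One detail to watch: Python's `random.gammavariate` takes a shape and a scale, while the slides parameterize by the rate $\beta$, so the scale passed in is $1/\beta$.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
alpha, beta = 3.0, 2.0  # illustrative values only
n_draws = 100_000

# random.gammavariate(shape, scale): the slides' rate beta corresponds to scale 1/beta
draws = [rng.gammavariate(alpha, 1 / beta) for _ in range(n_draws)]

sample_mean = sum(draws) / n_draws
sample_var = sum((d - sample_mean) ** 2 for d in draws) / n_draws

# Sum of independent Gamma(1, beta) and Gamma(2, beta); should behave like Gamma(3, beta)
sums = [rng.gammavariate(1.0, 1 / beta) + rng.gammavariate(2.0, 1 / beta)
        for _ in range(n_draws)]
sum_mean = sum(sums) / n_draws

print(sample_mean, alpha / beta)
print(sample_var, alpha / beta**2)
print(sum_mean, alpha / beta)
```

The simulated values only approximate the theoretical ones; the agreement improves as `n_draws` grows.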
Properties of the gamma distributions

Theorem 5.7.9: Exponential distribution is memoryless
Let $X \sim \text{Expo}(\beta)$ and let $t > 0$. Then for any $h > 0$
\[ P(X \ge t + h \mid X \ge t) = P(X \ge h) \]

Theorem 5.7.12: Times between arrivals in a Poisson process
Let $Z_k$ be the time until the $k$th arrival in a Poisson process with rate $\beta$. Let $Y_1 = Z_1$ and $Y_k = Z_k - Z_{k-1}$ for $k \ge 2$. Then $Y_1, Y_2, Y_3, \ldots$ are i.i.d. with the exponential distribution with parameter $\beta$.
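The sketch below illustrates both theorems. The memoryless property is checked directly from the survival function $P(X \ge x) = e^{-\beta x}$. For the second part, the code relies on the converse of Theorem 5.7.12 (an assumption beyond what the slide states): generating i.i.d. Expo(β) gaps produces a Poisson process, so the number of arrivals in $[0, t]$ should have mean and variance close to $\beta t$. All names and parameter values are illustrative.

```python
import random
from math import exp

beta, t, h = 2.0, 1.2, 0.7  # illustrative values only

def survival(x):
    """P(X >= x) for X ~ Expo(beta), x >= 0."""
    return exp(-beta * x)

# Memoryless property: P(X >= t + h | X >= t) versus P(X >= h)
conditional = survival(t + h) / survival(t)
unconditional = survival(h)

def count_arrivals(rate, horizon, rng):
    """Number of arrivals in [0, horizon] when the gaps are i.i.d. Expo(rate)."""
    clock, count = 0.0, 0
    while True:
        clock += rng.expovariate(rate)
        if clock > horizon:
            return count
        count += 1

rng = random.Random(1)  # fixed seed so the run is repeatable
horizon, n_runs = 3.0, 50_000
counts = [count_arrivals(beta, horizon, rng) for _ in range(n_runs)]

mean_count = sum(counts) / n_runs
var_count = sum((c - mean_count) ** 2 for c in counts) / n_runs

print(conditional, unconditional)
print(mean_count, var_count, beta * horizon)
```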
5.8 Beta distributions

Def: Beta distributions, Beta(α, β)
A continuous r.v. $X$ has the beta distribution with parameters $\alpha$ and $\beta$ if it has the pdf
\[ f(x \mid \alpha, \beta) = \begin{cases} \dfrac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha - 1} (1 - x)^{\beta - 1} & \text{for } 0 < x < 1 \\ 0 & \text{otherwise} \end{cases} \]
Parameter space: $\alpha > 0$ and $\beta > 0$

Beta(1, 1) = Uniform(0, 1)

Used to model a r.v. that takes values between 0 and 1.
The Beta distributions are often used as prior distributions for probability parameters, e.g. the $p$ in the Binomial distribution.
5.6 Normal distributions

Why Normal?

  - Works well in practice. Many physical experiments have distributions that are approximately normal.
  - Central Limit Theorem: The sum of many i.i.d. random variables is approximately normally distributed.
  - Mathematically convenient, especially the multivariate normal distribution.
      - Can explicitly obtain the distribution of many functions of a normally distributed random variable.
      - Marginal and conditional distributions of a multivariate normal are also normal (multivariate or univariate).
  - Developed by Gauss and then Laplace in the early 1800s.
  - Also known as the Gaussian distributions.

[Figure: portraits of Gauss and Laplace]
Normal distributions

Def: Normal distributions, $N(\mu, \sigma^2)$
A continuous r.v. $X$ has the normal distribution with mean $\mu$ and variance $\sigma^2$ if it has the pdf
\[ f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right), \quad -\infty < x < \infty \]
Parameter space: $\mu \in \mathbb{R}$ and $\sigma^2 > 0$

Show:
  - $\psi(t) = \exp\left( \mu t + \frac{1}{2} \sigma^2 t^2 \right)$
  - $E(X) = \mu$
  - $\text{Var}(X) = \sigma^2$
The Bell curve

[Figure: the bell curve]
Standard normal

Standard normal distribution: N(0, 1)
The normal distribution with $\mu = 0$ and $\sigma^2 = 1$ is called the standard normal distribution, and its pdf and cdf are denoted by $\phi(x)$ and $\Phi(x)$.

The cdf for a normal distribution cannot be expressed in closed form and is evaluated using numerical approximations.
  - $\Phi(x)$ is tabulated in the back of the book.
  - Many calculators and programs such as R, Matlab, Excel etc. can calculate $\Phi(x)$.

$\Phi(-x) = 1 - \Phi(x)$
$\Phi^{-1}(p) = -\Phi^{-1}(1 - p)$
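Python's standard library is another option for evaluating $\phi$, $\Phi$ and $\Phi^{-1}$. The sketch below uses `statistics.NormalDist` (Python 3.8+) and checks the two symmetry identities at arbitrarily chosen points $x = 1.3$ and $p = 0.2$.

```python
from statistics import NormalDist

std_normal = NormalDist()  # defaults to mu = 0, sigma = 1

x, p = 1.3, 0.2  # arbitrary evaluation points

density = std_normal.pdf(x)        # phi(x)
cumulative = std_normal.cdf(x)     # Phi(x)
quantile = std_normal.inv_cdf(p)   # Phi^{-1}(p)

# Symmetry identities from the slide
cdf_gap = abs(std_normal.cdf(-x) - (1 - std_normal.cdf(x)))
quantile_gap = abs(std_normal.inv_cdf(p) + std_normal.inv_cdf(1 - p))

print(density, cumulative, quantile)
print(cdf_gap, quantile_gap)
```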
Properties of the normal distributions

Theorem 5.6.4: Linear transformation of a normal is still normal
If $X \sim N(\mu, \sigma^2)$ and $Y = aX + b$, where $a$ and $b$ are constants and $a \ne 0$, then
\[ Y \sim N(a\mu + b, a^2 \sigma^2) \]

Let $F$ be the cdf of $X$, where $X \sim N(\mu, \sigma^2)$. Then
\[ F(x) = \Phi\left( \frac{x - \mu}{\sigma} \right) \quad \text{and} \quad F^{-1}(p) = \mu + \sigma \Phi^{-1}(p) \]
Example: Measured Voltage

Suppose the measured voltage, $X$, in a certain electric circuit has the normal distribution with mean 120 and standard deviation 2.
  1. What is the probability that the measured voltage is between 118 and 122?
  2. Below what value will 95% of the measurements be?
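One way to work both questions numerically is sketched below, using `statistics.NormalDist`. It computes each answer twice: directly from the $N(120, 2^2)$ distribution and via the standardization $F(x) = \Phi((x - \mu)/\sigma)$, $F^{-1}(p) = \mu + \sigma\Phi^{-1}(p)$. Variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 120, 2
voltage = NormalDist(mu, sigma)
std_normal = NormalDist()

# 1. P(118 < X < 122), directly and via Phi((x - mu) / sigma)
prob_between = voltage.cdf(122) - voltage.cdf(118)
prob_via_phi = std_normal.cdf((122 - mu) / sigma) - std_normal.cdf((118 - mu) / sigma)

# 2. The value below which 95% of measurements fall, directly and via mu + sigma * Phi^{-1}(p)
q95 = voltage.inv_cdf(0.95)
q95_via_phi = mu + sigma * std_normal.inv_cdf(0.95)

print(f"P(118 < X < 122) = {prob_between:.4f}")
print(f"95% of measurements fall below {q95:.2f}")
```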
Properties of the normal distributions

Theorem 5.6.7: Linear combination of ind. normals is a normal
Let $X_1, \ldots, X_k$ be independent r.v. with $X_i \sim N(\mu_i, \sigma_i^2)$ for $i = 1, \ldots, k$. Then
\[ X_1 + \cdots + X_k \sim N\left( \mu_1 + \cdots + \mu_k, \; \sigma_1^2 + \cdots + \sigma_k^2 \right) \]
Also, if $a_1, \ldots, a_k$ and $b$ are constants where at least one $a_i$ is not zero:
\[ a_1 X_1 + \cdots + a_k X_k + b \sim N\left( b + \sum_{i=1}^{k} a_i \mu_i, \; \sum_{i=1}^{k} a_i^2 \sigma_i^2 \right) \]

In particular: the sample mean $\overline{X}_n = \dfrac{1}{n} \sum_{i=1}^{n} X_i$.
If $X_1, \ldots, X_n$ are a random sample from a $N(\mu, \sigma^2)$, what is the distribution of the sample mean?
Example: Measured voltage continued

Suppose the measured voltage, $X$, in a certain electric circuit has the normal distribution with mean 120 and standard deviation 2.
  - If three independent measurements of the voltage are made, what is the probability that the sample mean $\overline{X}_3$ will lie between 118 and 120?
  - Find $x$ that satisfies $P(|\overline{X}_3 - 120| \le x) = 0.95$
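A sketch of how these two questions could be worked: applying Theorem 5.6.7 with $a_i = 1/n$ gives $\overline{X}_n \sim N(\mu, \sigma^2/n)$, so here $\overline{X}_3 \sim N(120, 4/3)$. The second question is read as $P(|\overline{X}_3 - 120| \le x) = 0.95$, which is an assumption about the intended statement. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 120, 2, 3

# Sample mean of n independent N(mu, sigma^2) draws: N(mu, sigma^2 / n)
sample_mean_dist = NormalDist(mu, sigma / sqrt(n))

# P(118 < Xbar_3 < 120)
prob_between = sample_mean_dist.cdf(120) - sample_mean_dist.cdf(118)

# P(|Xbar_3 - 120| <= x) = 0.95 means 2.5% in each tail,
# so x = Phi^{-1}(0.975) * sigma / sqrt(n)
half_width = NormalDist().inv_cdf(0.975) * sigma / sqrt(n)

# Sanity check: probability of landing within half_width of the mean
coverage = (sample_mean_dist.cdf(mu + half_width)
            - sample_mean_dist.cdf(mu - half_width))

print(f"P(118 < Xbar < 120) = {prob_between:.4f}")
print(f"x = {half_width:.3f}, coverage = {coverage:.4f}")
```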
Area under the curve

[Figure: area under the curve]
Lognormal distributions

Def: Lognormal distributions
If $\log(X) \sim N(\mu, \sigma^2)$, then we say that $X$ has the Lognormal distribution with parameters $\mu$ and $\sigma^2$.

The support of the lognormal distribution is $(0, \infty)$.
Often used to model time before failure.

Example: Let $X$ and $Y$ be independent random variables such that $\log(X) \sim N(1.6, 4.5)$ and $\log(Y) \sim N(3, 6)$. What is the distribution of the product $XY$?
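5.10 Bivariate normal distributions

Def: Bivariate normal
Two continuous r.v. $X_1$ and $X_2$ have the bivariate normal distribution with means $\mu_1$ and $\mu_2$, variances $\sigma_1^2$ and $\sigma_2^2$ and correlation $\rho$ if they have the joint pdf
\[ f(x_1, x_2) = \frac{1}{2\pi (1 - \rho^2)^{1/2} \sigma_1 \sigma_2} \exp\left( -\frac{1}{2(1 - \rho^2)} \left[ \frac{(x_1 - \mu_1)^2}{\sigma_1^2} - 2\rho \left( \frac{x_1 - \mu_1}{\sigma_1} \right) \left( \frac{x_2 - \mu_2}{\sigma_2} \right) + \frac{(x_2 - \mu_2)^2}{\sigma_2^2} \right] \right) \qquad (1) \]
Parameter space: $\mu_i \in \mathbb{R}$, $\sigma_i^2 > 0$ for $i = 1, 2$ and $-1 < \rho < 1$ (the pdf in (1) is not defined for $\rho = \pm 1$)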
Bivariate normal pdf

[Figure: bivariate normal pdf with different $\rho$, shown as surfaces and as contours]
Bivariate normal as linear combination

Theorem 5.10.1: Bivariate normal from two ind. standard normals
Let $Z_1 \sim N(0, 1)$ and $Z_2 \sim N(0, 1)$ be independent.
Let $\mu_i \in \mathbb{R}$, $\sigma_i^2 > 0$ for $i = 1, 2$ and $-1 < \rho < 1$, and let
\[ X_1 = \sigma_1 Z_1 + \mu_1, \qquad X_2 = \sigma_2 \left( \rho Z_1 + \sqrt{1 - \rho^2} \, Z_2 \right) + \mu_2 \qquad (2) \]
Then the joint distribution of $X_1$ and $X_2$ is bivariate normal with parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$ and $\rho$.

Theorem 5.10.2 (part 1): the other way
Let $X_1$ and $X_2$ have the pdf in (1). Then there exist independent standard normal r.v. $Z_1$ and $Z_2$ so that (2) holds.
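Construction (2) doubles as a recipe for simulating bivariate normal pairs from independent standard normals. The sketch below uses assumed example parameters ($\mu_1 = 3$, $\mu_2 = 5$, $\sigma_1 = 2$, $\sigma_2 = 3$, $\rho = 0.6$) and checks that the sample means and sample correlation land near the targets.

```python
import random
from math import sqrt

rng = random.Random(2)  # fixed seed so the run is repeatable
mu1, mu2 = 3.0, 5.0
sigma1, sigma2 = 2.0, 3.0
rho = 0.6
n_draws = 100_000

xs1, xs2 = [], []
for _ in range(n_draws):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)  # independent N(0, 1) draws
    xs1.append(sigma1 * z1 + mu1)
    xs2.append(sigma2 * (rho * z1 + sqrt(1 - rho**2) * z2) + mu2)

mean1 = sum(xs1) / n_draws
mean2 = sum(xs2) / n_draws
sd1 = sqrt(sum((a - mean1) ** 2 for a in xs1) / n_draws)
sd2 = sqrt(sum((b - mean2) ** 2 for b in xs2) / n_draws)
cov = sum((a - mean1) * (b - mean2) for a, b in zip(xs1, xs2)) / n_draws
sample_corr = cov / (sd1 * sd2)

print(mean1, mean2, sd1, sd2, sample_corr)
```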
Properties of a bivariate normal

Theorem 5.10.2 (part 2)
Let $X_1$ and $X_2$ have the pdf in (1). Then the marginal distributions are
\[ X_1 \sim N(\mu_1, \sigma_1^2) \quad \text{and} \quad X_2 \sim N(\mu_2, \sigma_2^2) \]
and the correlation between $X_1$ and $X_2$ is $\rho$.

Theorem 5.10.4: The conditional is normal
Let $X_1$ and $X_2$ have the pdf in (1). Then the conditional distribution of $X_2$ given that $X_1 = x_1$ is (univariate) normal with
\[ E(X_2 \mid X_1 = x_1) = \mu_2 + \rho \sigma_2 \frac{(x_1 - \mu_1)}{\sigma_1} \quad \text{and} \quad \text{Var}(X_2 \mid X_1 = x_1) = (1 - \rho^2) \sigma_2^2 \]
Properties of a bivariate normal

Theorem 5.10.3: Uncorrelated ⇔ Independent
Let $X_1$ and $X_2$ have the bivariate normal distribution. Then $X_1$ and $X_2$ are independent if and only if they are uncorrelated.
  - Only holds for the multivariate normal distribution
  - One of the very convenient properties of the normal distribution

Theorem 5.10.5: Linear combinations are normal
Let $X_1$ and $X_2$ have the pdf in (1) and let $a_1$, $a_2$ and $b$ be constants. Then $Y = a_1 X_1 + a_2 X_2 + b$ is normally distributed with
\[ E(Y) = a_1 \mu_1 + a_2 \mu_2 + b \quad \text{and} \quad \text{Var}(Y) = a_1^2 \sigma_1^2 + a_2^2 \sigma_2^2 + 2 a_1 a_2 \rho \sigma_1 \sigma_2 \]
This extends what we already had for independent normals.
Example

Let $X_1$ and $X_2$ have the bivariate normal distribution with means $\mu_1 = 3$, $\mu_2 = 5$, variances $\sigma_1^2 = 4$, $\sigma_2^2 = 9$ and correlation $\rho = 0.6$.
  a) Find the distribution of $X_2 - 2X_1$
  b) What is the expected value of $X_2$, given that we observed $X_1 = 2$?
  c) What is the probability that $X_1 > X_2$?
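One way to work the three parts is to apply Theorem 5.10.5 (linear combinations) and Theorem 5.10.4 (conditional mean) directly. The sketch below does that; the helper name `linear_combination` is hypothetical, and part a) is read as $X_2 - 2X_1$.

```python
from math import sqrt
from statistics import NormalDist

mu1, mu2 = 3.0, 5.0
var1, var2 = 4.0, 9.0
rho = 0.6
sigma1, sigma2 = sqrt(var1), sqrt(var2)

def linear_combination(a1, a2, b=0.0):
    """Mean and variance of a1*X1 + a2*X2 + b, following Theorem 5.10.5."""
    mean = a1 * mu1 + a2 * mu2 + b
    var = a1**2 * var1 + a2**2 * var2 + 2 * a1 * a2 * rho * sigma1 * sigma2
    return mean, var

# a) X2 - 2*X1 is normal with this mean and variance
mean_a, var_a = linear_combination(-2, 1)

# b) E(X2 | X1 = 2), following Theorem 5.10.4
cond_mean = mu2 + rho * sigma2 * (2 - mu1) / sigma1

# c) P(X1 > X2) = P(X1 - X2 > 0), where X1 - X2 is normal
mean_c, var_c = linear_combination(1, -1)
prob_c = 1 - NormalDist(mean_c, sqrt(var_c)).cdf(0)

print(f"a) N({mean_a:.1f}, {var_a:.1f})")
print(f"b) {cond_mean:.1f}")
print(f"c) {prob_c:.3f}")
```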
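Multivariate normal: Matrix notation

The pdf of an $n$-dimensional normal distribution, $X \sim N(\mu, \Sigma)$:
\[ f(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu)^{\top} \Sigma^{-1} (x - \mu) \right\} \]
where
\[ \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix}, \quad x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \quad \text{and} \quad \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{1,2} & \sigma_{1,3} & \cdots & \sigma_{1,n} \\ \sigma_{2,1} & \sigma_2^2 & \sigma_{2,3} & \cdots & \sigma_{2,n} \\ \sigma_{3,1} & \sigma_{3,2} & \sigma_3^2 & \cdots & \sigma_{3,n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sigma_{n,1} & \sigma_{n,2} & \sigma_{n,3} & \cdots & \sigma_n^2 \end{pmatrix} \]
$\mu$ is the mean vector and $\Sigma$ is called the variance-covariance matrix.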
Multivariate normal: Matrix notation

The same things hold for the multivariate normal distribution as for the bivariate. Let $X \sim N(\mu, \Sigma)$.
  - Linear combinations of $X$ are normal.
  - $AX + b$ is (multivariate) normal for a fixed matrix $A$ and vector $b$.
  - The marginal distribution of $X_i$ is normal with mean $\mu_i$ and variance $\sigma_i^2$.
  - The off-diagonal elements of $\Sigma$ are the covariances between individual elements of $X$, i.e. $\text{Cov}(X_i, X_j) = \sigma_{i,j}$.
  - The joint marginal distributions are also normal, where the mean and covariance matrix are found by picking the corresponding elements from $\mu$ and rows and columns from $\Sigma$.
  - The conditional distributions are also normal (multivariate or univariate).