Introduction to Normal Distribution
Nathaniel E. Helwig
Assistant Professor of Psychology and Statistics
University of Minnesota (Twin Cities)
Updated 17-Jan-2017
Copyright © 2017 by Nathaniel E. Helwig
Outline of Notes
1) Univariate Normal: distribution form, standard normal, probability calculations, affine transformations, parameter estimation
2) Bivariate Normal: distribution form, probability calculations, affine transformations, conditional distributions
3) Multivariate Normal: distribution form, probability calculations, affine transformations, conditional distributions, parameter estimation
4) Sampling Distributions: univariate case, multivariate case
Univariate Normal
Univariate Normal: Distribution Form
Normal Density Function (Univariate)

Given a variable x ∈ R, the normal probability density function (pdf) is

f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)} = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{(x-\mu)^2}{2\sigma^2} \right\}    (1)

where
- \mu ∈ R is the mean
- \sigma > 0 is the standard deviation (\sigma^2 is the variance)
- e ≈ 2.71828 is the base of the natural logarithm

Write X ~ N(\mu, \sigma^2) to denote that X follows a normal distribution.
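Equation (1) is simple to evaluate directly. Below is a minimal sketch in Python (the notes themselves use R for computation; the function name `normal_pdf` is mine):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density from Equation (1):
    f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
```

The density peaks at x = \mu with height 1/(\sigma\sqrt{2\pi}), about 0.3989 for the standard normal.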
Univariate Normal: Standard Normal Distribution
Standard Normal

If X ~ N(0, 1), then X follows a standard normal distribution:

f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}    (2)

[Figure: standard normal pdf f(x) for x from -4 to 4; f(x) ranges from 0.0 to about 0.4.]
Univariate Normal: Probability Calculations
Probabilities and Distribution Functions

Probabilities relate to the area under the pdf:

P(a ≤ X ≤ b) = \int_a^b f(x)\,dx = F(b) - F(a)    (3)

where

F(x) = \int_{-\infty}^{x} f(u)\,du    (4)

is the cumulative distribution function (cdf).

Note: F(x) = P(X ≤ x) ⟹ 0 ≤ F(x) ≤ 1
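Equation (3) can be checked numerically: the normal cdf has the closed form \Phi(z) = (1 + \mathrm{erf}(z/\sqrt{2}))/2, so any interval probability is two cdf evaluations. A small Python sketch (function names are mine):

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """F(x) from Equation (4), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_between(a, b, mu=0.0, sigma=1.0):
    """Equation (3): P(a <= X <= b) = F(b) - F(a)."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)
```

For example, `prob_between(-1, 1)` returns about 0.6827, the familiar one-standard-deviation probability.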
Univariate Normal: Probability Calculations
Normal Probabilities

Helpful figure of normal probabilities: moving outward from \mu, each one-standard-deviation band contains about 34.1%, 13.6%, 2.1%, and 0.1% of the total probability on each side (marked at \mu ± 1\sigma, ± 2\sigma, ± 3\sigma).
From http://en.wikipedia.org/wiki/file:standard_deviation_diagram.svg
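The band percentages in the figure can be reproduced from the standard normal cdf. A quick Python check (the helper `phi` is mine):

```python
import math

def phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Probability mass in each one-sigma band on one side of the mean.
band_0_1 = phi(1.0) - phi(0.0)  # about 34.1%
band_1_2 = phi(2.0) - phi(1.0)  # about 13.6%
band_2_3 = phi(3.0) - phi(2.0)  # about 2.1%
tail_3 = 1.0 - phi(3.0)         # about 0.1%
```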
Univariate Normal: Probability Calculations
Normal Distribution Functions (Univariate)

Helpful figures of normal pdfs \varphi_{\mu,\sigma^2}(x) and cdfs \Phi_{\mu,\sigma^2}(x), plotted for (\mu = 0, \sigma^2 = 0.2), (\mu = 0, \sigma^2 = 1.0), (\mu = 0, \sigma^2 = 5.0), and (\mu = -2, \sigma^2 = 0.5).
http://en.wikipedia.org/wiki/file:normal_distribution_pdf.svg
http://en.wikipedia.org/wiki/file:normal_distribution_cdf.svg

Note that the cdf has an elongated S shape, referred to as an ogive.
Univariate Normal: Affine Transformations
Affine Transformations of Normal (Univariate)

Suppose that X ~ N(\mu, \sigma^2) and a, b ∈ R with a ≠ 0. If we define Y = aX + b, then Y ~ N(a\mu + b, a^2\sigma^2).

Suppose that X ~ N(1, 2). Determine the distributions of...
- Y = X + 3 ⟹ Y ~ N(1(1) + 3, 1^2(2)) ≡ N(4, 2)
- Y = 2X + 3 ⟹ Y ~ N(2(1) + 3, 2^2(2)) ≡ N(5, 8)
- Y = 3X + 2 ⟹ Y ~ N(3(1) + 2, 3^2(2)) ≡ N(5, 18)
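The three answers follow mechanically from the rule Y ~ N(a\mu + b, a^2\sigma^2). A one-line Python sketch (function name mine):

```python
def affine_normal(a, b, mu, var):
    """If X ~ N(mu, var) and Y = aX + b with a != 0,
    then Y ~ N(a*mu + b, a^2 * var)."""
    return (a * mu + b, a * a * var)
```

Applying it to X ~ N(1, 2) reproduces the slide's three results.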
Univariate Normal: Parameter Estimation
Likelihood Function

Suppose that x = (x_1, ..., x_n)' is an iid sample of data from a normal distribution with mean \mu and variance \sigma^2, i.e., x_i \overset{iid}{\sim} N(\mu, \sigma^2).

The likelihood function for the parameters (given the data) has the form

L(\mu, \sigma^2 | x) = \prod_{i=1}^{n} f(x_i) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(x_i - \mu)^2}{2\sigma^2} \right\}

and the log-likelihood function is given by

LL(\mu, \sigma^2 | x) = \sum_{i=1}^{n} \log(f(x_i)) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2
Univariate Normal: Parameter Estimation
Maximum Likelihood Estimate of the Mean

The MLE of the mean is the value of \mu that minimizes

\sum_{i=1}^{n}(x_i - \mu)^2 = \sum_{i=1}^{n} x_i^2 - 2n\bar{x}\mu + n\mu^2

where \bar{x} = (1/n)\sum_{i=1}^{n} x_i is the sample mean.

Taking the derivative with respect to \mu we find that

\frac{\partial \sum_{i=1}^{n}(x_i - \mu)^2}{\partial \mu} = -2n\bar{x} + 2n\mu \quad⟹\quad \hat{\mu} = \bar{x}

i.e., the sample mean \bar{x} is the MLE of the population mean \mu.
Univariate Normal: Parameter Estimation
Maximum Likelihood Estimate of the Variance

The MLE of the variance is the value of \sigma^2 that minimizes

\frac{n}{2}\log(\sigma^2) + \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \hat{\mu})^2 = \frac{n}{2}\log(\sigma^2) + \frac{\sum_{i=1}^{n} x_i^2}{2\sigma^2} - \frac{n\bar{x}^2}{2\sigma^2}

where \bar{x} = (1/n)\sum_{i=1}^{n} x_i is the sample mean.

Taking the derivative with respect to \sigma^2 we find that

\frac{\partial}{\partial \sigma^2}\left[ \frac{n}{2}\log(\sigma^2) + \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \hat{\mu})^2 \right] = \frac{n}{2\sigma^2} - \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \hat{\mu})^2

which implies that the sample variance \hat{\sigma}^2 = (1/n)\sum_{i=1}^{n}(x_i - \bar{x})^2 is the MLE of the population variance \sigma^2.
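The two MLE results above can be verified numerically: evaluating the log-likelihood at (\bar{x}, \hat{\sigma}^2) and at nearby perturbed values should show a maximum at the MLEs. A Python sketch with a small made-up data set (the data values are illustrative only):

```python
import math

def loglik(mu, var, xs):
    """Log-likelihood LL(mu, sigma^2 | x) from the previous slide."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return (-0.5 * n * math.log(2.0 * math.pi)
            - 0.5 * n * math.log(var)
            - ss / (2.0 * var))

xs = [1.2, -0.7, 0.4, 2.1, 0.0]                            # illustrative sample
mu_hat = sum(xs) / len(xs)                                 # MLE of the mean
var_hat = sum((x - mu_hat) ** 2 for x in xs) / len(xs)     # MLE of the variance (note 1/n, not 1/(n-1))
```

Perturbing either parameter away from its MLE strictly lowers the log-likelihood, consistent with the derivative calculations above.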
Bivariate Normal
Bivariate Normal: Distribution Form
Normal Density Function (Bivariate)

Given two variables x, y ∈ R, the bivariate normal pdf is

f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)}\left[ \frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y} \right] \right\}    (5)

where
- \mu_x ∈ R and \mu_y ∈ R are the marginal means
- \sigma_x ∈ R^+ and \sigma_y ∈ R^+ are the marginal standard deviations
- -1 < \rho < 1 is the correlation coefficient

X and Y are marginally normal: X ~ N(\mu_x, \sigma_x^2) and Y ~ N(\mu_y, \sigma_y^2).
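Equation (5) can be coded directly. As a sanity check, when \rho = 0 the density factorizes into the product of the two marginal normal densities. A Python sketch (function names mine):

```python
import math

def norm_pdf(x, mu, sigma):
    """Univariate normal density, for comparison."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def bvn_pdf(x, y, mx, my, sx, sy, rho):
    """Bivariate normal density from Equation (5)."""
    zx = (x - mx) / sx
    zy = (y - my) / sy
    q = (zx * zx + zy * zy - 2.0 * rho * zx * zy) / (1.0 - rho * rho)
    return math.exp(-0.5 * q) / (2.0 * math.pi * sx * sy * math.sqrt(1.0 - rho * rho))
```

With the slide parameters \sigma_x^2 = 1, \sigma_y^2 = 2, \rho = 0.6/\sqrt{2}, the peak height at the mean is 1/(2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}) ≈ 0.124, matching the scale of the surface plots that follow.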
Bivariate Normal: Distribution Form
Example: \mu_x = \mu_y = 0, \sigma_x^2 = 1, \sigma_y^2 = 2, \rho = 0.6/\sqrt{2}

[Figure: surface and contour plots of f(x, y) for x, y ∈ (-4, 4); the density ranges from 0 to about 0.12.]
http://en.wikipedia.org/wiki/file:multivariatenormal.png
Bivariate Normal: Distribution Form
Example: Different Means

[Figure: contour plots of f(x, y) for three mean settings: (\mu_x = 0, \mu_y = 0), (\mu_x = 1, \mu_y = 2), and (\mu_x = 1, \mu_y = 1).]

Note: for all three plots \sigma_x^2 = 1, \sigma_y^2 = 2, and \rho = 0.6/\sqrt{2}.
Bivariate Normal: Distribution Form
Example: Different Correlations

[Figure: contour plots of f(x, y) for \rho = 0.6/\sqrt{2}, \rho = 0, and \rho = 1.2/\sqrt{2}; the peak density rises from about 0.12 to about 0.20 as |\rho| increases.]

Note: for all three plots \mu_x = \mu_y = 0, \sigma_x^2 = 1, and \sigma_y^2 = 2.
Bivariate Normal: Distribution Form
Example: Different Variances

[Figure: contour plots of f(x, y) for \sigma_y = 1, \sigma_y = \sqrt{2}, and \sigma_y = 2; the peak density falls as \sigma_y increases.]

Note: for all three plots \mu_x = \mu_y = 0, \sigma_x^2 = 1, and \rho = 0.6/(\sigma_x\sigma_y).
Bivariate Normal: Probability Calculations
Probabilities and Multiple Integration

Probabilities still relate to the area under the pdf:

P([a_x ≤ X ≤ b_x] and [a_y ≤ Y ≤ b_y]) = \int_{a_x}^{b_x}\int_{a_y}^{b_y} f(x, y)\,dy\,dx    (6)

where \int\int f(x, y)\,dy\,dx denotes the multiple integral of the pdf f(x, y).

Defining z = (x, y)', we can still define the cdf:

F(z) = P(X ≤ x and Y ≤ y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f(u, v)\,dv\,du    (7)
Bivariate Normal: Probability Calculations
Normal Distribution Functions (Bivariate)

[Figure: surface plots of the bivariate normal pdf f(x, y) and cdf F(x, y) for x, y ∈ (-4, 4).]

Note: \mu_x = \mu_y = 0, \sigma_x^2 = 1, \sigma_y^2 = 2, and \rho = 0.6/\sqrt{2}

Note that the cdf still has an ogive shape (now in two dimensions).
Bivariate Normal: Affine Transformations
Affine Transformations of Normal (Bivariate)

Given z = (x, y)', suppose that z ~ N(\mu, \Sigma) where
- \mu = (\mu_x, \mu_y)' is the 2 × 1 mean vector
- \Sigma = \begin{pmatrix} \sigma_x^2 & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^2 \end{pmatrix} is the 2 × 2 covariance matrix

Let A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} and b = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} with A ≠ 0_{2\times 2} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.

If we define w = Az + b, then w ~ N(A\mu + b, A\Sigma A').
Bivariate Normal: Conditional Distributions
Conditional Normal (Bivariate)

The conditional distribution of a variable Y given X = x is

f_{Y|X}(y | X = x) = \frac{f_{XY}(x, y)}{f_X(x)}    (8)

where
- f_{XY}(x, y) is the joint pdf of X and Y
- f_X(x) is the marginal pdf of X

In the bivariate normal case, we have that

Y | X ~ N(\mu^*, \sigma_*^2)    (9)

where \mu^* = \mu_y + \rho\frac{\sigma_y}{\sigma_x}(x - \mu_x) and \sigma_*^2 = \sigma_y^2(1 - \rho^2)
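Equation (9) is a two-line computation. The Python sketch below (function name mine) reproduces the numbers worked out in Example #1(b) later in these notes: with the exam-score parameters and x = 80, the conditional distribution is N(69, 144).

```python
def conditional_normal(x, mx, my, sx, sy, rho):
    """Parameters (mu*, sigma*^2) of Y | X = x from Equation (9):
    mu* = my + rho * (sy / sx) * (x - mx),  sigma*^2 = sy^2 * (1 - rho^2)."""
    mu_star = my + rho * (sy / sx) * (x - mx)
    var_star = sy * sy * (1.0 - rho * rho)
    return mu_star, var_star
```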
Bivariate Normal: Conditional Distributions
Derivation of Conditional Normal

To prove Equation (9), simply write out the definition and simplify:

f_{Y|X}(y | X = x) = \frac{f_{XY}(x, y)}{f_X(x)}

= \frac{\exp\left\{ -\frac{1}{2(1-\rho^2)}\left[ \frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y} \right] \right\} \big/ \left( 2\pi\sigma_x\sigma_y\sqrt{1-\rho^2} \right)}{\exp\left\{ -\frac{(x-\mu_x)^2}{2\sigma_x^2} \right\} \big/ \left( \sigma_x\sqrt{2\pi} \right)}

= \frac{\exp\left\{ -\frac{1}{2(1-\rho^2)}\left[ \frac{\rho^2(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y} \right] \right\}}{\sqrt{2\pi}\,\sigma_y\sqrt{1-\rho^2}}

= \frac{\exp\left\{ -\frac{1}{2\sigma_y^2(1-\rho^2)}\left[ \rho^2\frac{\sigma_y^2}{\sigma_x^2}(x-\mu_x)^2 + (y-\mu_y)^2 - 2\rho\frac{\sigma_y}{\sigma_x}(x-\mu_x)(y-\mu_y) \right] \right\}}{\sqrt{2\pi}\,\sigma_y\sqrt{1-\rho^2}}

= \frac{\exp\left\{ -\frac{1}{2\sigma_y^2(1-\rho^2)}\left[ y - \mu_y - \rho\frac{\sigma_y}{\sigma_x}(x-\mu_x) \right]^2 \right\}}{\sqrt{2\pi}\,\sigma_y\sqrt{1-\rho^2}}

which completes the proof.
Bivariate Normal: Conditional Distributions
Statistical Independence for Bivariate Normal

Two variables X and Y are statistically independent if

f_{XY}(x, y) = f_X(x) f_Y(y)    (10)

where f_{XY}(x, y) is the joint pdf, and f_X(x) and f_Y(y) are the marginal pdfs.

Note that if X and Y are independent, then

f_{Y|X}(y | X = x) = \frac{f_{XY}(x, y)}{f_X(x)} = \frac{f_X(x) f_Y(y)}{f_X(x)} = f_Y(y)    (11)

so conditioning on X = x does not change the distribution of Y.

If X and Y are bivariate normal, what is the necessary and sufficient condition for X and Y to be independent? Hint: see Equation (9).
Bivariate Normal: Conditional Distributions
Example #1

A statistics class takes two exams X (Exam 1) and Y (Exam 2) where the scores follow a bivariate normal distribution with parameters:
- \mu_x = 70 and \mu_y = 60 are the marginal means
- \sigma_x = 10 and \sigma_y = 15 are the marginal standard deviations
- \rho = 0.6 is the correlation coefficient

Suppose we select a student at random. What is the probability that...
(a) the student scores over 75 on Exam 2?
(b) the student scores over 75 on Exam 2, given that the student scored X = 80 on Exam 1?
(c) the sum of his/her Exam 1 and Exam 2 scores is over 150?
(d) the student did better on Exam 1 than Exam 2?
(e) P(5X - 4Y > 150)?
Bivariate Normal: Conditional Distributions
Example #1: Part (a)

Answer for 1(a): Note that Y ~ N(60, 15^2), so the probability that the student scores over 75 on Exam 2 is

P(Y > 75) = P\left(Z > \frac{75 - 60}{15}\right) = P(Z > 1) = 1 - P(Z < 1) = 1 - \Phi(1) = 1 - 0.8413447 = 0.1586553

where \Phi(x) = \int_{-\infty}^{x} f(z)\,dz with f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2} denoting the standard normal pdf (see R code for use of pnorm to calculate this quantity).
Bivariate Normal: Conditional Distributions
Example #1: Part (b)

Answer for 1(b): Note that (Y | X = 80) ~ N(\mu^*, \sigma_*^2) where

\mu^* = \mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X) = 60 + (0.6)(15/10)(80 - 70) = 69
\sigma_*^2 = \sigma_Y^2(1 - \rho^2) = 15^2(1 - 0.6^2) = 144

If a student scored X = 80 on Exam 1, the probability that the student scores over 75 on Exam 2 is

P(Y > 75 | X = 80) = P\left(Z > \frac{75 - 69}{12}\right) = P(Z > 0.5) = 1 - \Phi(0.5) = 1 - 0.6914625 = 0.3085375
Bivariate Normal: Conditional Distributions
Example #1: Part (c)

Answer for 1(c): Note that (X + Y) ~ N(\mu, \sigma^2) where

\mu = \mu_X + \mu_Y = 70 + 60 = 130
\sigma^2 = \sigma_X^2 + \sigma_Y^2 + 2\rho\sigma_X\sigma_Y = 10^2 + 15^2 + 2(0.6)(10)(15) = 505

The probability that the sum of Exam 1 and Exam 2 is above 150 is

P(X + Y > 150) = P\left(Z > \frac{150 - 130}{\sqrt{505}}\right) = P(Z > 0.8899883) = 1 - \Phi(0.8899883) = 1 - 0.8132639 = 0.1867361
Bivariate Normal: Conditional Distributions
Example #1: Part (d)

Answer for 1(d): Note that (X - Y) ~ N(\mu, \sigma^2) where

\mu = \mu_X - \mu_Y = 70 - 60 = 10
\sigma^2 = \sigma_X^2 + \sigma_Y^2 - 2\rho\sigma_X\sigma_Y = 10^2 + 15^2 - 2(0.6)(10)(15) = 145

The probability that the student did better on Exam 1 than Exam 2 is

P(X > Y) = P(X - Y > 0) = P\left(Z > \frac{0 - 10}{\sqrt{145}}\right) = P(Z > -0.8304548) = 1 - \Phi(-0.8304548) = 1 - 0.2031408 = 0.7968592
Bivariate Normal: Conditional Distributions
Example #1: Part (e)

Answer for 1(e): Note that (5X - 4Y) ~ N(\mu, \sigma^2) where

\mu = 5\mu_X - 4\mu_Y = 5(70) - 4(60) = 110
\sigma^2 = 5^2\sigma_X^2 + (-4)^2\sigma_Y^2 + 2(5)(-4)\rho\sigma_X\sigma_Y = 25(10^2) + 16(15^2) - 2(20)(0.6)(10)(15) = 2500

Thus, the needed probability can be obtained using

P(5X - 4Y > 150) = P\left(Z > \frac{150 - 110}{\sqrt{2500}}\right) = P(Z > 0.8) = 1 - \Phi(0.8) = 1 - 0.7881446 = 0.2118554
Bivariate Normal: Conditional Distributions
Example #1: R Code

# Example 1a
> pnorm(1, lower=F)
[1] 0.1586553
> pnorm(75, mean=60, sd=15, lower=F)
[1] 0.1586553

# Example 1b
> pnorm(0.5, lower=F)
[1] 0.3085375
> pnorm(75, mean=69, sd=12, lower=F)
[1] 0.3085375

# Example 1c
> pnorm(20/sqrt(505), lower=F)
[1] 0.1867361
> pnorm(150, mean=130, sd=sqrt(505), lower=F)
[1] 0.1867361

# Example 1d
> pnorm(-10/sqrt(145), lower=F)
[1] 0.7968592
> pnorm(0, mean=10, sd=sqrt(145), lower=F)
[1] 0.7968592

# Example 1e
> pnorm(0.8, lower=F)
[1] 0.2118554
> pnorm(150, mean=110, sd=50, lower=F)
[1] 0.2118554
Multivariate Normal
Multivariate Normal: Distribution Form
Normal Density Function (Multivariate)

Given x = (x_1, ..., x_p)' with x_j ∈ R for all j, the multivariate normal pdf is

f(x) = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2}(x - \mu)'\Sigma^{-1}(x - \mu) \right\}    (12)

where
- \mu = (\mu_1, ..., \mu_p)' is the p × 1 mean vector
- \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix} is the p × p covariance matrix

Write x ~ N(\mu, \Sigma) or x ~ N_p(\mu, \Sigma) to denote that x is multivariate normal.
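Equation (12) translates directly to code. The sketch below (function name mine) uses numpy, which is an assumption beyond these notes; as a check, a diagonal \Sigma makes the density factor into a product of univariate normal densities.

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Multivariate normal density from Equation (12)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    p = mu.size
    dev = x - mu
    quad = dev @ np.linalg.solve(Sigma, dev)          # (x - mu)' Sigma^{-1} (x - mu)
    norm_const = (2.0 * np.pi) ** (p / 2.0) * np.sqrt(np.linalg.det(Sigma))
    return float(np.exp(-0.5 * quad) / norm_const)
```

Using `np.linalg.solve` avoids forming \Sigma^{-1} explicitly, which is the numerically preferred way to evaluate the quadratic form.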
Multivariate Normal: Distribution Form
Some Multivariate Normal Properties

The mean and covariance parameters have the following restrictions:
- \mu_j ∈ R for all j
- \sigma_{jj} > 0 for all j
- \sigma_{ij} = \rho_{ij}\sqrt{\sigma_{ii}\sigma_{jj}} where \rho_{ij} is the correlation between X_i and X_j
- \sigma_{ij}^2 ≤ \sigma_{ii}\sigma_{jj} for any i, j ∈ {1, ..., p} (Cauchy-Schwarz)

\Sigma is assumed to be positive definite so that \Sigma^{-1} exists.

Marginals are normal: X_j ~ N(\mu_j, \sigma_{jj}) for all j ∈ {1, ..., p}.
Multivariate Normal: Probability Calculations
Multivariate Normal Probabilities

Probabilities still relate to the area under the pdf:

P(a_j ≤ X_j ≤ b_j \ \forall j) = \int_{a_1}^{b_1} \cdots \int_{a_p}^{b_p} f(x)\,dx_p \cdots dx_1    (13)

where \int \cdots \int f(x)\,dx_p \cdots dx_1 denotes the multiple integral of f(x).

We can still define the cdf of x = (x_1, ..., x_p)':

F(x) = P(X_j ≤ x_j \ \forall j) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_p} f(u)\,du_p \cdots du_1    (14)
Multivariate Normal: Affine Transformations
Affine Transformations of Normal (Multivariate)

Suppose that x = (x_1, ..., x_p)' and that x ~ N(\mu, \Sigma) where
- \mu = \{\mu_j\}_{p \times 1} is the mean vector
- \Sigma = \{\sigma_{ij}\}_{p \times p} is the covariance matrix

Let A = \{a_{ij}\}_{n \times p} and b = \{b_i\}_{n \times 1} with A ≠ 0_{n \times p}.

If we define w = Ax + b, then w ~ N(A\mu + b, A\Sigma A').

Note: linear combinations of normal variables are normally distributed.
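The affine rule is one matrix product in numpy (an assumed dependency; function name mine). As a check, applying A = (1 1) and A = (5 -4) to the bivariate exam-score distribution of Example #1 reproduces the means and variances used in parts (c) and (e):

```python
import numpy as np

def affine_mvn(A, b, mu, Sigma):
    """If x ~ N(mu, Sigma) and w = Ax + b, then w ~ N(A mu + b, A Sigma A')."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    mu_w = A @ np.asarray(mu, dtype=float) + b
    Sigma_w = A @ np.asarray(Sigma, dtype=float) @ A.T
    return mu_w, Sigma_w

# Exam-score distribution from Example #1: mu = (70, 60)', sd = (10, 15), rho = 0.6
mu = [70.0, 60.0]
Sigma = [[100.0, 90.0], [90.0, 225.0]]  # off-diagonal: 0.6 * 10 * 15 = 90
```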
Multivariate Normal: Conditional Distributions
Multivariate Conditional Distributions

Given variables x = (x_1, ..., x_p)' and y = (y_1, ..., y_q)', we have

f_{Y|X}(y | X = x) = \frac{f_{XY}(x, y)}{f_X(x)}    (15)

where
- f_{Y|X}(y | X = x) is the conditional distribution of y given x
- f_{XY}(x, y) is the joint pdf of x and y
- f_X(x) is the marginal pdf of x
Multivariate Normal: Conditional Distributions
Conditional Normal (Multivariate)

Suppose that z ~ N(\mu, \Sigma) where
- z = (x', y')' = (x_1, ..., x_p, y_1, ..., y_q)'
- \mu = (\mu_x', \mu_y')' = (\mu_{1x}, ..., \mu_{px}, \mu_{1y}, ..., \mu_{qy})'
  Note: \mu_x is the mean vector of x, and \mu_y is the mean vector of y
- \Sigma = \begin{pmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{xy}' & \Sigma_{yy} \end{pmatrix} where (\Sigma_{xx})_{p \times p}, (\Sigma_{yy})_{q \times q}, and (\Sigma_{xy})_{p \times q}
  Note: \Sigma_{xx} is the covariance matrix of x, \Sigma_{yy} is the covariance matrix of y, and \Sigma_{xy} is the covariance matrix of x and y

In the multivariate normal case, we have that

y | x ~ N(\mu^*, \Sigma^*)    (16)

where \mu^* = \mu_y + \Sigma_{xy}'\Sigma_{xx}^{-1}(x - \mu_x) and \Sigma^* = \Sigma_{yy} - \Sigma_{xy}'\Sigma_{xx}^{-1}\Sigma_{xy}
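Equation (16) is again a direct matrix computation. A numpy sketch (numpy and the function name are my assumptions); with p = q = 1 it reduces to the bivariate formula, and plugging in the exam-score parameters from Example #1 with x = 80 recovers N(69, 144):

```python
import numpy as np

def conditional_mvn(x, mu_x, mu_y, Sxx, Sxy, Syy):
    """Parameters of y | x from Equation (16):
    mu* = mu_y + Sxy' Sxx^{-1} (x - mu_x),  Sigma* = Syy - Sxy' Sxx^{-1} Sxy."""
    x, mu_x, mu_y = (np.atleast_1d(np.asarray(v, dtype=float)) for v in (x, mu_x, mu_y))
    Sxx, Sxy, Syy = (np.atleast_2d(np.asarray(M, dtype=float)) for M in (Sxx, Sxy, Syy))
    W = Sxy.T @ np.linalg.inv(Sxx)      # q x p matrix of regression weights
    mu_star = mu_y + W @ (x - mu_x)
    Sigma_star = Syy - W @ Sxy
    return mu_star, Sigma_star
```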
Multivariate Normal: Conditional Distributions
Statistical Independence for Multivariate Normal

Using Equation (16), we have that

y | x ~ N(\mu^*, \Sigma^*) ≡ N(\mu_y, \Sigma_{yy})    (17)

if and only if \Sigma_{xy} = 0_{p \times q} (a matrix of zeros).

Note that \Sigma_{xy} = 0_{p \times q} implies that the p elements of x are uncorrelated with the q elements of y.

For multivariate normal variables: uncorrelated ⟺ independent
For non-normal variables: uncorrelated does NOT imply independent
Multivariate Normal: Conditional Distributions
Example #2

Each Delicious Candy Company store makes 3 size candy bars: regular (X_1), fun size (X_2), and big size (X_3). Assume the weights (in ounces) of the candy bars (X_1, X_2, X_3) follow a multivariate normal distribution with parameters:

\mu = \begin{pmatrix} 5 \\ 3 \\ 7 \end{pmatrix} and \Sigma = \begin{pmatrix} 4 & -1 & 0 \\ -1 & 4 & 2 \\ 0 & 2 & 9 \end{pmatrix}

Suppose we select a store at random. What is the probability that...
(a) the weight of a regular candy bar is greater than 8 oz?
(b) the weight of a regular candy bar is greater than 8 oz, given that the fun size bar weighs 1 oz and the big size bar weighs 10 oz?
(c) P(4X_1 - 3X_2 + 5X_3 < 63)?
Multivariate Normal: Conditional Distributions
Example #2: Part (a)

Answer for 2(a): Note that X_1 ~ N(5, 4), so the probability that the regular bar weighs more than 8 oz is

P(X_1 > 8) = P\left(Z > \frac{8 - 5}{2}\right) = P(Z > 1.5) = 1 - \Phi(1.5) = 1 - 0.9331928 = 0.0668072
Multivariate Normal: Conditional Distributions
Example #2: Part (b)

Answer for 2(b): (X_1 | X_2 = 1, X_3 = 10) is normally distributed; see Equation (16).

The conditional mean of (X_1 | X_2 = 1, X_3 = 10) is given by

\mu^* = \mu_{X_1} + \Sigma_{12}\Sigma_{22}^{-1}(x - \mu_2)
= 5 + \begin{pmatrix} -1 & 0 \end{pmatrix} \begin{pmatrix} 4 & 2 \\ 2 & 9 \end{pmatrix}^{-1} \begin{pmatrix} 1 - 3 \\ 10 - 7 \end{pmatrix}
= 5 + \begin{pmatrix} -1 & 0 \end{pmatrix} \frac{1}{32}\begin{pmatrix} 9 & -2 \\ -2 & 4 \end{pmatrix} \begin{pmatrix} -2 \\ 3 \end{pmatrix}
= 5 + 24/32 = 5.75
Multivariate Normal: Conditional Distributions
Example #2: Part (b) continued

Answer for 2(b) continued: The conditional variance of (X_1 | X_2 = 1, X_3 = 10) is given by

\sigma_*^2 = \sigma_{X_1}^2 - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'
= 4 - \begin{pmatrix} -1 & 0 \end{pmatrix} \begin{pmatrix} 4 & 2 \\ 2 & 9 \end{pmatrix}^{-1} \begin{pmatrix} -1 \\ 0 \end{pmatrix}
= 4 - \begin{pmatrix} -1 & 0 \end{pmatrix} \frac{1}{32}\begin{pmatrix} 9 & -2 \\ -2 & 4 \end{pmatrix} \begin{pmatrix} -1 \\ 0 \end{pmatrix}
= 4 - 9/32 = 3.71875
Multivariate Normal: Conditional Distributions
Example #2: Part (b) continued

Answer for 2(b) continued: So, if the fun size bar weighs 1 oz and the big size bar weighs 10 oz, the probability that the regular bar weighs more than 8 oz is

P(X_1 > 8 | X_2 = 1, X_3 = 10) = P\left(Z > \frac{8 - 5.75}{\sqrt{3.71875}}\right) = P(Z > 1.166767) = 1 - \Phi(1.166767) = 1 - 0.8783477 = 0.1216523
Multivariate Normal: Conditional Distributions
Example #2: Part (c)

Answer for 2(c): (4X_1 - 3X_2 + 5X_3) is normally distributed. The expectation of (4X_1 - 3X_2 + 5X_3) is given by

\mu = 4\mu_{X_1} - 3\mu_{X_2} + 5\mu_{X_3} = 4(5) - 3(3) + 5(7) = 46
Multivariate Normal: Conditional Distributions
Example #2: Part (c) continued

Answer for 2(c) continued: The variance of (4X_1 - 3X_2 + 5X_3) is given by

\sigma^2 = \begin{pmatrix} 4 & -3 & 5 \end{pmatrix} \Sigma \begin{pmatrix} 4 \\ -3 \\ 5 \end{pmatrix}
= \begin{pmatrix} 4 & -3 & 5 \end{pmatrix} \begin{pmatrix} 4 & -1 & 0 \\ -1 & 4 & 2 \\ 0 & 2 & 9 \end{pmatrix} \begin{pmatrix} 4 \\ -3 \\ 5 \end{pmatrix}
= \begin{pmatrix} 4 & -3 & 5 \end{pmatrix} \begin{pmatrix} 19 \\ -6 \\ 39 \end{pmatrix}
= 289
Multivariate Normal: Conditional Distributions
Example #2: Part (c) continued

Answer for 2(c) continued: So, the needed probability can be obtained as

P(4X_1 - 3X_2 + 5X_3 < 63) = P\left(Z < \frac{63 - 46}{\sqrt{289}}\right) = P(Z < 1) = \Phi(1) = 0.8413447
Multivariate Normal: Conditional Distributions
Example #2: R Code

# Example 2a
> pnorm(1.5, lower=F)
[1] 0.0668072
> pnorm(8, mean=5, sd=2, lower=F)
[1] 0.0668072

# Example 2b
> pnorm(2.25/sqrt(119/32), lower=F)
[1] 0.1216523
> pnorm(8, mean=5.75, sd=sqrt(119/32), lower=F)
[1] 0.1216523

# Example 2c
> pnorm(1)
[1] 0.8413447
> pnorm(63, mean=46, sd=17)
[1] 0.8413447
Multivariate Normal: Parameter Estimation
Likelihood Function

Suppose that x_i = (x_{i1}, ..., x_{ip})' is a sample from a normal distribution with mean vector \mu and covariance matrix \Sigma, i.e., x_i \overset{iid}{\sim} N(\mu, \Sigma).

The likelihood function for the parameters (given the data) has the form

L(\mu, \Sigma | X) = \prod_{i=1}^{n} f(x_i) = \prod_{i=1}^{n} \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2}(x_i - \mu)'\Sigma^{-1}(x_i - \mu) \right\}

and the log-likelihood function is given by

LL(\mu, \Sigma | X) = -\frac{np}{2}\log(2\pi) - \frac{n}{2}\log(|\Sigma|) - \frac{1}{2}\sum_{i=1}^{n}(x_i - \mu)'\Sigma^{-1}(x_i - \mu)
Multivariate Normal: Parameter Estimation
Maximum Likelihood Estimate of Mean Vector

The MLE of the mean vector is the value of \mu that minimizes

\sum_{i=1}^{n}(x_i - \mu)'\Sigma^{-1}(x_i - \mu) = \sum_{i=1}^{n} x_i'\Sigma^{-1}x_i - 2n\bar{x}'\Sigma^{-1}\mu + n\mu'\Sigma^{-1}\mu

where \bar{x} = (1/n)\sum_{i=1}^{n} x_i is the sample mean vector.

Taking the derivative with respect to \mu we find that

\frac{\partial}{\partial \mu}\sum_{i=1}^{n}(x_i - \mu)'\Sigma^{-1}(x_i - \mu) = -2n\Sigma^{-1}\bar{x} + 2n\Sigma^{-1}\mu \quad⟹\quad \hat{\mu} = \bar{x}

The sample mean vector \bar{x} is the MLE of the population mean vector \mu.
Multivariate Normal: Parameter Estimation
Maximum Likelihood Estimate of Covariance Matrix

The MLE of the covariance matrix is the value of \Sigma that minimizes

n\log(|\Sigma|) + \sum_{i=1}^{n} \mathrm{tr}\{\Sigma^{-1}(x_i - \hat{\mu})(x_i - \hat{\mu})'\}

where \hat{\mu} = \bar{x} = (1/n)\sum_{i=1}^{n} x_i is the sample mean.

Taking the derivative with respect to \Sigma^{-1} (using \log(|\Sigma|) = -\log(|\Sigma^{-1}|)) we find that

\frac{\partial}{\partial \Sigma^{-1}}\left[ n\log(|\Sigma|) + \sum_{i=1}^{n} \mathrm{tr}\{\Sigma^{-1}(x_i - \hat{\mu})(x_i - \hat{\mu})'\} \right] = -n\Sigma + \sum_{i=1}^{n}(x_i - \hat{\mu})(x_i - \hat{\mu})'

i.e., the sample covariance matrix \hat{\Sigma} = (1/n)\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})' is the MLE of the population covariance matrix \Sigma.
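Both multivariate MLEs are one line each in numpy (an assumed dependency; function name and data are mine for illustration). Note the 1/n divisor, which distinguishes the MLE from the usual unbiased 1/(n-1) sample covariance:

```python
import numpy as np

def mvn_mle(X):
    """MLEs from the slides: mu_hat = sample mean vector,
    Sigma_hat = (1/n) * sum of centered outer products."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    mu_hat = X.mean(axis=0)
    centered = X - mu_hat
    Sigma_hat = centered.T @ centered / n   # 1/n, not 1/(n-1)
    return mu_hat, Sigma_hat

X = np.array([[1.0, 2.0], [0.0, -1.0], [2.0, 0.0], [1.0, 1.0]])  # illustrative data
mu_hat, Sigma_hat = mvn_mle(X)
```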
Sampling Distributions
Sampling Distributions: Univariate Case
Univariate Sampling Distributions: \bar{x} and s^2

In the univariate normal case, we have that
- \bar{x} = (1/n)\sum_{i=1}^{n} x_i ~ N(\mu, \sigma^2/n)
- (n-1)s^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2 ~ \sigma^2\chi^2_{n-1}

\chi^2_k denotes a chi-square variable with k degrees of freedom:
\sigma^2\chi^2_k = \sum_{i=1}^{k} z_i^2 where z_i \overset{iid}{\sim} N(0, \sigma^2)
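The \chi^2 result makes tail probabilities for s^2 computable. For even degrees of freedom the \chi^2 survival function has the closed form P(\chi^2_k > x) = e^{-x/2}\sum_{j=0}^{k/2-1}(x/2)^j/j!, which the Python sketch below uses (the numbers are illustrative, not from the notes): with n = 11 and \sigma^2 = 4, P(s^2 > 6) = P(\chi^2_{10} > 10 \cdot 6/4) = P(\chi^2_{10} > 15).

```python
import math

def chi2_sf_even(x, k):
    """P(chi^2_k > x) for even k, via the Poisson-sum identity
    (chi^2 with even df is an Erlang distribution with rate 1/2)."""
    assert k % 2 == 0 and k > 0
    lam = x / 2.0
    return math.exp(-lam) * sum(lam ** j / math.factorial(j) for j in range(k // 2))

# Illustrative: x_i ~ N(mu, 4) with n = 11, so 10 s^2 / 4 ~ chi^2_10 and
# P(s^2 > 6) = P(chi^2_10 > 15), about 0.132.
p = chi2_sf_even(15.0, 10)
```

For odd degrees of freedom one would instead use an incomplete-gamma routine (e.g., `pchisq` in R with `lower=F`).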
Sampling Distributions: Multivariate Case
Multivariate Sampling Distributions: \bar{x} and S

In the multivariate normal case, we have that
- \bar{x} = (1/n)\sum_{i=1}^{n} x_i ~ N(\mu, \Sigma/n)
- (n-1)S = \sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})' ~ W_{n-1}(\Sigma)

W_k(\Sigma) denotes a Wishart variable with k degrees of freedom:
W_k(\Sigma) = \sum_{i=1}^{k} z_i z_i' where z_i \overset{iid}{\sim} N(0_p, \Sigma)