Multivariate Normal & Wishart
Hoff Chapter 7
October 21, 2010
Reading Comprehension Example

Twenty-two children are given a reading comprehension test before and after receiving a particular instruction method.
- $Y_{i,1}$: pre-instructional score for student $i$
- $Y_{i,2}$: post-instructional score for student $i$
- vector of observations for each student: $Y_i = (Y_{i,1}, Y_{i,2})$

Questions:
- Does the instructional method lead to improvements in reading comprehension (on average)?
- If so, by how much?
- Can improvements be predicted based on the first test?
Plot

[Figure: scatterplot of post-instructional score versus pre-instructional score for the 22 students]
Parameters

We could model the data as bivariate normal, $Y_i \overset{iid}{\sim} N_2(\mu, \Sigma)$. Normal distributions are characterized by their
- mean vector $\mu = (\mu_1, \mu_2)$, where $\mu_1 = E[Y_{i,1}]$ and $\mu_2 = E[Y_{i,2}]$
- variance-covariance matrix
  $\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{pmatrix}$
  where $\sigma_1^2$ is the variance of $Y_{i,1}$, $\sigma_2^2$ is the variance of $Y_{i,2}$, and $\sigma_{12}$ is the covariance between $Y_{i,1}$ and $Y_{i,2}$:
  $\sigma_{21} = \sigma_{12} = E[(Y_{i,1} - \mu_1)(Y_{i,2} - \mu_2)]$

The correlation between pre- and post-test scores is $\rho = \sigma_{12}/(\sigma_1 \sigma_2)$, a measure of the strength of association, with $-1 \le \rho \le 1$.
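As a quick numerical check of the relation $\rho = \sigma_{12}/(\sigma_1\sigma_2)$, the sketch below (using illustrative values, not the reading-comprehension data) compares the correlation implied by a covariance matrix with the empirical correlation of a large simulated bivariate normal sample:

```python
import numpy as np

# Illustrative mean and covariance (these match the prior hyperparameters
# used later in the slides, not the observed data)
mu = np.array([50.0, 50.0])
Sigma = np.array([[625.0, 312.5],
                  [312.5, 625.0]])

# Correlation implied by the covariance matrix: rho = sigma_12 / (sigma_1 * sigma_2)
rho = Sigma[0, 1] / np.sqrt(Sigma[0, 0] * Sigma[1, 1])

# Compare with the empirical correlation of a large simulated sample
rng = np.random.default_rng(0)
Y = rng.multivariate_normal(mu, Sigma, size=200_000)
rho_hat = np.corrcoef(Y[:, 0], Y[:, 1])[0, 1]
print(rho, rho_hat)   # both close to 0.5
```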
Multivariate Normal

For more than 2 dimensions, we write $Y \sim N_d(\mu, \Sigma)$.
- $E[Y] = \mu$: $d$-dimensional vector with means $E[Y_j]$
- $Cov[Y] = \Sigma$: $d \times d$ matrix whose diagonal elements are the variances of the $Y_j$ and whose off-diagonal elements are the covariances $E[(Y_j - \mu_j)(Y_k - \mu_k)]$

If $\Sigma$ is positive definite ($x^T \Sigma x > 0$ for any $x \neq 0$ in $\mathbb{R}^d$), then there is a density

$p(y) = (2\pi)^{-d/2}\, |\Sigma|^{-1/2} \exp\left(-\tfrac{1}{2}(y - \mu)^T \Sigma^{-1} (y - \mu)\right)$

or, in terms of the precision matrix $\Phi = \Sigma^{-1}$,

$p(y) = (2\pi)^{-d/2}\, |\Phi|^{1/2} \exp\left(-\tfrac{1}{2}(y - \mu)^T \Phi (y - \mu)\right)$
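To see that the covariance and precision forms of the density agree, a minimal sketch (with an assumed evaluation point and the same illustrative covariance as elsewhere in the slides) evaluates both formulas directly and compares them with SciPy's implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([50.0, 50.0])
Sigma = np.array([[625.0, 312.5],
                  [312.5, 625.0]])
Phi = np.linalg.inv(Sigma)        # precision matrix
y = np.array([60.0, 55.0])        # arbitrary evaluation point (assumed)
d = len(mu)
dev = y - mu

# Covariance form: (2*pi)^(-d/2) |Sigma|^(-1/2) exp(-dev' Sigma^{-1} dev / 2)
p_cov = (2 * np.pi) ** (-d / 2) * np.linalg.det(Sigma) ** (-0.5) \
        * np.exp(-0.5 * dev @ Phi @ dev)

# Precision form: (2*pi)^(-d/2) |Phi|^(1/2) exp(-dev' Phi dev / 2)
p_prec = (2 * np.pi) ** (-d / 2) * np.linalg.det(Phi) ** (0.5) \
         * np.exp(-0.5 * dev @ Phi @ dev)

# Reference value from scipy
p_scipy = multivariate_normal(mean=mu, cov=Sigma).pdf(y)
print(p_cov, p_prec, p_scipy)     # all three agree
```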
Bayesian Inference

Need to specify prior distributions, then use Bayes' Theorem to obtain posterior distributions.
- The normal $N_d(\mu_0, P_0^{-1})$ is the semi-conjugate prior for the mean (conjugate given the covariance matrix $\Sigma$).
- Given the mean vector, the semi-conjugate prior for $\Phi$ is the Wishart distribution, a generalization of the gamma distribution to higher dimensions.

If $\Phi \sim \text{Wishart}_d(\nu_0, \Phi_0)$, then it has density

$p(\Phi) = \frac{1}{C}\, |\Phi|^{(\nu_0 - d - 1)/2} \exp\left(-\tfrac{1}{2}\,\text{tr}(S_0 \Phi)\right)$, with $S_0 = \Phi_0^{-1}$,

where $C = 2^{\nu_0 d/2}\, \pi^{d(d-1)/4} \prod_{j=1}^{d} \Gamma\!\left(\frac{\nu_0 + 1 - j}{2}\right) |\Phi_0|^{\nu_0/2}$.

If $\Phi \sim \text{Wishart}_d(\nu_0, \Phi_0)$, then
- $E[\Phi] = \nu_0 \Phi_0$
- $E[\Sigma] = \frac{1}{\nu_0 - d - 1}\, \Phi_0^{-1}$ (for $\nu_0 > d + 1$)
- Choose $\nu_0 = d + 2$ so that $E[\Sigma] = \Phi_0^{-1}$.
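The moment formulas above can be checked by simulation. SciPy's `Wishart(df, scale)` uses the same parameterization as the slides ($E[\Phi] = \nu_0 \Phi_0$), so a Monte Carlo average of draws should match $\nu_0 \Phi_0$, and with $\nu_0 = d + 2$ the prior mean of $\Sigma$ is exactly $S_0$:

```python
import numpy as np
from scipy.stats import wishart

d = 2
nu0 = d + 2                                   # prior degrees of freedom
S0 = np.array([[625.0, 312.5],
               [312.5, 625.0]])
Phi0 = np.linalg.inv(S0)                      # Wishart scale matrix

# scipy's Wishart(df, scale) has E[Phi] = df * scale = nu0 * Phi0
draws = wishart.rvs(df=nu0, scale=Phi0, size=100_000, random_state=0)
Phi_bar = draws.mean(axis=0)                  # Monte Carlo estimate of E[Phi]

# E[Sigma] = Phi0^{-1} / (nu0 - d - 1); with nu0 = d + 2 this is exactly S0
E_Sigma = np.linalg.inv(Phi0) / (nu0 - d - 1)
print(Phi_bar)        # close to nu0 * Phi0
print(E_Sigma)        # equals S0
```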
Prior Distributions

Prior for $\mu$:

$\mu \sim N_2\left( \begin{pmatrix} 50 \\ 50 \end{pmatrix}, \begin{pmatrix} 625 & 312.5 \\ 312.5 & 625 \end{pmatrix} \right)$

- The test is designed to have a mean of 50.
- The true mean is constrained to be between 0 and 100; $\mu_0 \pm 2\sigma = (0, 100)$ implies $\sigma^2 = (50/2)^2 = 625$.
- The prior correlation is $312.5/625 = 0.50$.
Priors Continued

Prior for the precision matrix:

$\Phi \sim \text{Wishart}_2\left(4, \begin{pmatrix} 625 & 312.5 \\ 312.5 & 625 \end{pmatrix}^{-1}\right)$

- Loosely centered at the same covariance as in the normal prior for the mean.
- $\nu_0$ is a prior degrees of freedom; $\nu_0 = d + 2$ leads to $E[\Sigma] = \Phi_0^{-1} = S_0$.
- For the full conditional distributions, we just need to find the updated hyperparameters (see Hoff).
Full Conditionals

$\mu \mid \Phi, Y \sim N_d\left( (P_0 + n\Phi)^{-1}(n\Phi\bar{y} + P_0\mu_0),\; (P_0 + n\Phi)^{-1} \right)$

$\Phi \mid \mu, Y \sim \text{Wishart}_d\left( \nu_0 + n,\; \left(S_0 + \textstyle\sum_{i=1}^{n} (y_i - \mu)(y_i - \mu)^T\right)^{-1} \right)$

Limiting case: $\mu_0 = 0$, $P_0 = 0$, $S_0 = 0$, $\nu_0 = 0$ gives Jeffreys' prior.
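The two full conditionals above are all that is needed for a Gibbs sampler. A minimal sketch follows, using simulated stand-in data (the real 22-student scores are in Hoff Chapter 7) and the prior hyperparameters from the previous slides; `effect` matches the `mu[2] - mu[1]` quantity tracked in the WinBUGS model:

```python
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(1)

# Simulated stand-in for the 22 students' (pre, post) scores (assumed values)
Y = rng.multivariate_normal([47.0, 53.0],
                            [[180.0, 150.0], [150.0, 190.0]], size=22)
n, d = Y.shape
ybar = Y.mean(axis=0)

# Hyperparameters from the slides
mu0 = np.array([50.0, 50.0])
S0 = np.array([[625.0, 312.5], [312.5, 625.0]])
P0 = np.linalg.inv(S0)                 # prior precision of mu
nu0 = d + 2

mu = ybar.copy()                       # initialize at the sample mean
mus, Sigmas = [], []
for t in range(2000):
    # Phi | mu, Y ~ Wishart(nu0 + n, (S0 + sum_i (y_i - mu)(y_i - mu)')^{-1})
    R = Y - mu
    Sn = S0 + R.T @ R
    Phi = wishart.rvs(df=nu0 + n, scale=np.linalg.inv(Sn), random_state=rng)
    # mu | Phi, Y ~ N((P0 + n Phi)^{-1}(n Phi ybar + P0 mu0), (P0 + n Phi)^{-1})
    Pn = P0 + n * Phi
    Vn = np.linalg.inv(Pn)
    mun = Vn @ (n * Phi @ ybar + P0 @ mu0)
    mu = rng.multivariate_normal(mun, Vn)
    mus.append(mu)
    Sigmas.append(np.linalg.inv(Phi))

mus = np.array(mus)
effect = mus[:, 1] - mus[:, 0]         # posterior draws of mu2 - mu1
print(mus.mean(axis=0), effect.mean())
```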
WinBUGS Model

multmodel = function() {
    for (i in 1:N) {
        Y[i, 1:2] ~ dmnorm(mu[], Phi[,])
    }
    mu[1:2] ~ dmnorm(mu0[], prec[,])
    Phi[1:2, 1:2] ~ dwish(phi0[,], nu0)
    effect <- mu[2] - mu[1]
    Sigma[1:2, 1:2] <- inverse(Phi[,])
    rho <- Sigma[1, 2] / sqrt(Sigma[1, 1] * Sigma[2, 2])
    Ynew[1:2] ~ dmnorm(mu[], Phi[,])
}
Joint Distribution of µ

[Figure: scatterplot of posterior draws of (µ1, µ2)]
Predictive Distribution of a New Student

[Figure: draws of Ynew = (Y1, Y2) from the posterior predictive distribution]
Rao-Blackwellization

Monte Carlo estimates are noisy! We can do better by using the idea of Rao-Blackwellization.

$E_{MC}[g(\theta_1) \mid Y] = \frac{1}{T} \sum_t g(\theta_1^{(t)})$

Iterated expectations:

$E[g(\theta_1) \mid Y] = E_{\theta_2}\!\left[ E_{\theta_1 \mid \theta_2}[g(\theta_1) \mid Y, \theta_2] \right] = E_{\theta_2}[\hat{g}(\theta_2) \mid Y] \approx \frac{1}{T} \sum_t \hat{g}(\theta_2^{(t)}) \equiv E_{RB}[g(\theta_1) \mid Y]$

- Calculate the inner conditional expectation analytically.
- May reduce variance over the plain MCMC average (Liu et al. 1996).
- Motivated by the classical Rao-Blackwell theorem, which says that conditioning any estimator on a sufficient statistic will reduce its mean squared error (proof in Stat 215).
Density Estimation

Rather than use a histogram estimate of the density, use a Rao-Blackwell estimate. We know the posterior density of $\mu$ given $\Phi$ and $Y$:

$p(\mu \mid Y, \Phi) = \frac{1}{(2\pi)^{p/2}}\, |P_n|^{1/2} \exp\left\{ -\tfrac{1}{2} (\mu - \mu_n)^T P_n (\mu - \mu_n) \right\}$

- prior precision $P_0 = S_0^{-1}$
- posterior precision $P_n = P_0 + n\Phi$
- posterior mean $\mu_n = P_n^{-1}(P_0 \mu_0 + n\Phi\bar{Y})$

Use the MC average over draws $\Phi^{(1)}, \ldots, \Phi^{(T)}$ to integrate out $\Phi$ and obtain the marginal

$\hat{p}(\mu \mid Y) = \frac{1}{T} \sum_t p(\mu \mid Y, \Phi^{(t)})$
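A sketch of this Rao-Blackwell density estimate follows. In practice the draws $\Phi^{(t)}$ would come from the Gibbs sampler; here, to keep the example self-contained, they are stand-ins drawn from a Wishart, and `ybar` and the evaluation point are assumed values rather than the slides' actual data:

```python
import numpy as np
from scipy.stats import wishart, multivariate_normal

rng = np.random.default_rng(2)

# Assumed quantities (stand-ins, not from the slides' data)
d, n, T = 2, 22, 500
mu0 = np.array([50.0, 50.0])
S0 = np.array([[625.0, 312.5], [312.5, 625.0]])
P0 = np.linalg.inv(S0)                     # prior precision of mu
ybar = np.array([47.2, 53.9])              # assumed sample mean

# Stand-in "posterior" draws of Phi, roughly centered at (150 I)^{-1}
Phi_draws = wishart.rvs(df=d + 2 + n,
                        scale=np.eye(d) / (150.0 * (d + 2 + n)),
                        size=T, random_state=rng)

def p_mu_given_phi(mu, Phi):
    """Exact conditional density p(mu | Y, Phi): normal with precision
    Pn = P0 + n*Phi and mean mun = Pn^{-1}(P0 mu0 + n Phi ybar)."""
    Pn = P0 + n * Phi
    Vn = np.linalg.inv(Pn)
    mun = Vn @ (P0 @ mu0 + n * Phi @ ybar)
    return multivariate_normal(mean=mun, cov=Vn).pdf(mu)

# Rao-Blackwell estimate: average the exact conditional density over Phi draws
mu_point = np.array([50.0, 55.0])          # arbitrary evaluation point
p_hat = np.mean([p_mu_given_phi(mu_point, Phi) for Phi in Phi_draws])
print(p_hat)
```

Each term in the average is an exact conditional density rather than a histogram bin count, which is why the resulting curve is smooth and typically has lower Monte Carlo variance.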
Alternative Priors on Σ

There is no reason to restrict attention to semi-conjugate prior distributions, as long as the prior on Σ leads to positive definite matrices.
- See Dongchu Sun and Jim Berger's paper for an interesting discussion of priors for the multivariate normal model: http://www.stat.missouri.edu/~dsun/psfiles/sunbergerv8.pdf
- May require Metropolis-Hastings algorithms, as the full conditional distributions are not of recognizable form. This is automatic in WinBUGS.