Multivariate Normal & Wishart
Hoff Chapter 7
October 21, 2010

Reading Comprehension Example
Twenty-two children are given a reading comprehension test before and after receiving a particular instruction method.
$Y_{i,1}$: pre-instructional score for student $i$
$Y_{i,2}$: post-instructional score for student $i$
Vector of observations for each student: $Y_i = (Y_{i,1}, Y_{i,2})$
Questions:
Does the instructional method lead to improvements in reading comprehension (on average)? If so, by how much?
Can improvements be predicted based on the first test?

Plot
[Scatterplot: post-instructional score (roughly 30 to 80) versus pre-instructional score (roughly 30 to 70) for the 22 students.]

Parameters
We could model the data as bivariate normal, $Y_i \overset{iid}{\sim} N_2(\mu, \Sigma)$. Normal distributions are characterized by their
mean vector $\mu = (\mu_1, \mu_2)$, where $\mu_1 = E[Y_{i,1}]$ and $\mu_2 = E[Y_{i,2}]$
variance-covariance matrix
$$\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{pmatrix}$$
where $\sigma_1^2$ is the variance of $Y_{i,1}$, $\sigma_2^2$ is the variance of $Y_{i,2}$, and $\sigma_{12}$ is the covariance between $Y_{i,1}$ and $Y_{i,2}$, with $\sigma_{21} = \sigma_{12} = E[(Y_{i,1} - \mu_1)(Y_{i,2} - \mu_2)]$.
The correlation between pre- and post-test scores is $\rho = \sigma_{12}/(\sigma_1 \sigma_2)$, a measure of the strength of association, with $-1 \le \rho \le 1$.
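A tiny illustration (ours, not from the slides) of the correlation formula; the covariance matrix used here is made up for the example.

# rho = sigma12 / (sigma1 * sigma2), computed in base R with cov2cor()
Sigma <- matrix(c(100, 40,
                  40,  64), 2, 2)    # variances 100 and 64, covariance 40
cov2cor(Sigma)                       # off-diagonal entries equal 40 / (10 * 8) = 0.5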

Multivariate Normal
For more than 2 dimensions, we write $Y \sim N_d(\mu, \Sigma)$:
$E[Y] = \mu$: $d$-dimensional vector with means $E[Y_j]$
$Cov[Y] = \Sigma$: $d \times d$ matrix with diagonal elements that are the variances of $Y_j$ and off-diagonal elements that are the covariances $E[(Y_j - \mu_j)(Y_k - \mu_k)]$
If $\Sigma$ is positive definite ($x^T \Sigma x > 0$ for any $x \ne 0$ in $\mathbb{R}^d$), then there is a density
$$p(Y) = (2\pi)^{-d/2}\, |\Sigma|^{-1/2} \exp\!\left(-\tfrac{1}{2}(Y - \mu)^T \Sigma^{-1} (Y - \mu)\right)$$
or
$$p(Y) = (2\pi)^{-d/2}\, |\Phi|^{1/2} \exp\!\left(-\tfrac{1}{2}(Y - \mu)^T \Phi (Y - \mu)\right)$$
where $\Phi$ is the precision matrix, $\Phi = \Sigma^{-1}$.
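A small numerical check (our sketch, not from the slides) that the two forms of the density agree; it assumes the mvtnorm package is available for comparison.

library(mvtnorm)

mu    <- c(50, 50)
Sigma <- matrix(c(625, 312.5, 312.5, 625), 2, 2)
Phi   <- solve(Sigma)                      # precision matrix
y     <- c(45, 60)
d     <- length(y)

# density written in terms of Sigma
p1 <- (2 * pi)^(-d / 2) * det(Sigma)^(-1 / 2) *
      exp(-0.5 * t(y - mu) %*% solve(Sigma) %*% (y - mu))
# equivalent form in terms of the precision matrix Phi
p2 <- (2 * pi)^(-d / 2) * det(Phi)^(1 / 2) *
      exp(-0.5 * t(y - mu) %*% Phi %*% (y - mu))

c(p1, p2, dmvnorm(y, mu, Sigma))           # all three values agree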

Bayesian Inference
Need to specify prior distributions, then use Bayes' Theorem to obtain posterior distributions.
A normal $N_d(\mu_0, P_0^{-1})$ is the semi-conjugate prior for the mean (conjugate given the covariance matrix $\Sigma$).
Given the mean vector, the semi-conjugate prior for $\Phi$ is the Wishart distribution, a generalization of the gamma distribution to higher dimensions.
If $\Phi \sim \text{Wishart}_d(\nu_0, \Phi_0)$ then it has density
$$p(\Phi) = \frac{1}{C}\, |\Phi|^{(\nu_0 - d - 1)/2} \exp\!\left(-\tfrac{1}{2}\,\text{tr}(S_0 \Phi)\right), \qquad S_0 = \Phi_0^{-1},$$
where
$$C = 2^{\nu_0 d/2}\, \pi^{d(d-1)/4}\, |\Phi_0|^{\nu_0/2} \prod_{j=1}^{d} \Gamma\!\left(\frac{\nu_0 + 1 - j}{2}\right)$$
If $\Phi \sim \text{Wishart}_d(\nu_0, \Phi_0)$ then $E[\Phi] = \nu_0 \Phi_0$ and $E[\Sigma] = \frac{1}{\nu_0 - d - 1}\, \Phi_0^{-1}$.
Choose $\nu_0 = d + 2$ so that $E[\Sigma] = \Phi_0^{-1}$.
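A quick Monte Carlo check (a sketch of ours, not from the slides) of the stated prior mean $E[\Phi] = \nu_0 \Phi_0$. It assumes base R's rWishart, whose scale argument matches the parameterization used here, so that with $\nu_0 = d + 2$ the implied prior mean of $\Sigma$ is $\Phi_0^{-1} = S_0$.

d    <- 2
nu0  <- d + 2
S0   <- matrix(c(625, 312.5, 312.5, 625), 2, 2)    # prior guess for Sigma
Phi0 <- solve(S0)

draws <- rWishart(10000, df = nu0, Sigma = Phi0)   # d x d x 10000 array of precision draws
apply(draws, c(1, 2), mean)                        # approximately nu0 * Phi0
nu0 * Phi0                                         # analytic mean, for comparison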

Prior Distributions
Prior for $\mu$:
$$\mu \sim N_2\!\left(\begin{pmatrix} 50 \\ 50 \end{pmatrix}, \begin{pmatrix} 625 & 312.5 \\ 312.5 & 625 \end{pmatrix}\right)$$
The tests are designed to have a mean of 50.
The true mean is constrained to lie between 0 and 100; $\mu \pm 2\sigma = (0, 100)$ implies $\sigma^2 = (50/2)^2 = 625$.
The prior correlation is 0.50.

Priors Continued
Prior for the precision matrix:
$$\Phi \sim \text{Wishart}_2\!\left(4, \begin{pmatrix} 625 & 312.5 \\ 312.5 & 625 \end{pmatrix}^{-1}\right)$$
loosely centered at the same covariance as in the normal prior for the mean.
$\nu_0$ is the prior degrees of freedom; $\nu_0 = d + 2$ leads to $E[\Sigma] = \Phi_0^{-1} = S_0$.
For the full conditional distributions, we just need to find the updated hyperparameters (see Hoff).
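As a sketch (our own code, not the slides'), this is how the prior hyperparameters above might be assembled in R; the variable names are ours, chosen to line up with the model code that follows.

sigma2 <- (50 / 2)^2                               # (0, 100) range as mean +/- 2 sd  =>  625
rho0   <- 0.5                                      # prior correlation between the two means
mu0    <- c(50, 50)                                # tests designed to have mean 50
S0     <- matrix(c(sigma2, rho0 * sigma2,
                   rho0 * sigma2, sigma2), 2, 2)   # the 625 / 312.5 matrix
prec   <- solve(S0)                                # prior precision for mu
nu0    <- 4                                        # d + 2, so E[Sigma] = S0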

Full Conditionals
$$\mu \mid \Phi, Y \sim N_d\!\left((P_0 + n\Phi)^{-1}(n\Phi\bar{y} + P_0 m_0),\; (P_0 + n\Phi)^{-1}\right)$$
$$\Phi \mid \mu, Y \sim \text{Wishart}_d\!\left(\nu_0 + n,\; \Big(S_0 + \textstyle\sum_i (y_i - \mu)(y_i - \mu)^T\Big)^{-1}\right)$$
Limiting case: $m_0 = 0$, $P_0 = 0$, $S_0 = 0$, and an appropriate choice of $\nu_0$ recover Jeffreys' prior.
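A minimal Gibbs sampler built from these two full conditionals, as a sketch of ours (the slides fit the model in WinBUGS instead, below). It assumes Y is an n x d data matrix and uses the hyperparameter names from the previous slides: m0 and P0 are the prior mean and precision for mu, S0 is the Wishart scale on the covariance side, nu0 the prior degrees of freedom.

library(MASS)   # mvrnorm

gibbs_mvn <- function(Y, m0, P0, S0, nu0, n_iter = 5000) {
  n <- nrow(Y); d <- ncol(Y); ybar <- colMeans(Y)
  mu  <- ybar                          # initialize at the sample mean
  Phi <- solve(cov(Y))                 # initialize at the sample precision
  mu_draws  <- matrix(NA, n_iter, d)
  Phi_draws <- array(NA, c(d, d, n_iter))
  for (t in 1:n_iter) {
    ## mu | Phi, Y ~ N_d((P0 + n Phi)^{-1}(n Phi ybar + P0 m0), (P0 + n Phi)^{-1})
    Pn <- P0 + n * Phi
    Vn <- solve(Pn)
    mn <- Vn %*% (n * Phi %*% ybar + P0 %*% m0)
    mu <- mvrnorm(1, drop(mn), Vn)
    ## Phi | mu, Y ~ Wishart_d(nu0 + n, (S0 + sum_i (y_i - mu)(y_i - mu)^T)^{-1})
    Smu <- crossprod(sweep(Y, 2, mu))
    Phi <- rWishart(1, nu0 + n, solve(S0 + Smu))[, , 1]
    mu_draws[t, ]    <- mu
    Phi_draws[, , t] <- Phi
  }
  list(mu = mu_draws, Phi = Phi_draws)
}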

WinBUGS Model
multmodel = function() {
  for (i in 1:N) {
    Y[i, 1:2] ~ dmnorm(mu[], Phi[,])        # likelihood: bivariate normal
  }
  mu[1:2] ~ dmnorm(mu0[], prec[,])          # normal prior on the mean vector
  Phi[1:2, 1:2] ~ dwish(phi0[,], nu0)       # Wishart prior on the precision matrix
  effect <- mu[2] - mu[1]                   # average improvement (post minus pre)
  Sigma[1:2, 1:2] <- inverse(Phi[,])        # covariance matrix
  rho <- Sigma[1, 2] / sqrt(Sigma[1, 1] * Sigma[2, 2])   # correlation
  Ynew[1:2] ~ dmnorm(mu[], Phi[,])          # posterior predictive for a new student
}
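A rough sketch (an assumed workflow, not the slides' script) of fitting the model above from R via R2WinBUGS; `scores` is a hypothetical 22 x 2 matrix of (pre, post) test scores. Note that BUGS's dwish(R[,], k) is parameterized so that E[Phi] = k * solve(R), so the matrix passed as phi0 is S0 itself (the prior guess for Sigma), not its inverse.

library(R2WinBUGS)

S0 <- matrix(c(625, 312.5, 312.5, 625), 2, 2)
bugs_data <- list(Y = scores, N = nrow(scores),
                  mu0 = c(50, 50), prec = solve(S0),
                  phi0 = S0, nu0 = 4)

write.model(multmodel, "multmodel.txt")          # write the BUGS code to a file
fit <- bugs(bugs_data, inits = NULL,
            parameters.to.save = c("mu", "effect", "rho", "Ynew"),
            model.file = "multmodel.txt",
            n.chains = 3, n.iter = 11000, n.burnin = 1000)
print(fit)                                       # posterior summaries of mu, effect, rho, Ynew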

Joint Distribution of µ
[Figure: posterior draws of $(\mu_1, \mu_2)$, with $\mu_1$ (roughly 40 to 60) on the horizontal axis and $\mu_2$ (roughly 45 to 70) on the vertical axis.]

Predictive Distribution of a new student Y
[Figure: posterior predictive draws of $(Y_1, Y_2)$ for a new student; both axes range from 0 to 100.]

Rao-Blackwellization
Monte Carlo estimates are noisy! We can do better by using the idea of Rao-Blackwellization.
The plain Monte Carlo estimate is
$$E_{MC}[g(\theta_1) \mid Y] = \frac{1}{T}\sum_t g(\theta_1^{(t)})$$
Iterated expectations:
$$E[g(\theta_1) \mid Y] = E_{\theta_2}\!\left[E_{\theta_1 \mid \theta_2}[g(\theta_1) \mid Y, \theta_2]\right] = E_{\theta_2}[\tilde{g}(\theta_2) \mid Y] \approx \frac{1}{T}\sum_t \tilde{g}(\theta_2^{(t)}) \equiv E_{RB}[g(\theta_1) \mid Y]$$
Calculate the inner conditional expectation $\tilde{g}(\theta_2) = E[g(\theta_1) \mid Y, \theta_2]$ analytically.
This may reduce the variance relative to the plain MCMC average (Liu et al. 1996).
Motivated by the classical Rao-Blackwell theorem, which says that conditioning an estimator on a sufficient statistic will not increase (and typically reduces) its mean squared error (proof in Stat 215).

Density Estimation
Rather than use a histogram estimate of the density, use a Rao-Blackwell estimate. We know the posterior density of $\mu$ given $\Phi$ and $Y$:
$$p(\mu \mid Y, \Phi) = (2\pi)^{-d/2}\, |P_n|^{1/2} \exp\!\left\{-\tfrac{1}{2}(\mu - \mu_n)^T P_n (\mu - \mu_n)\right\}$$
with
prior precision $P_0$ (here $P_0 = S_0^{-1}$)
posterior precision $P_n = P_0 + n\Phi$
posterior mean $\mu_n = P_n^{-1}(P_0 \mu_0 + n\Phi\bar{Y})$
Use the Monte Carlo average over draws $\Phi^{(1)}, \ldots, \Phi^{(T)}$ to integrate out $\Phi$ and obtain the marginal
$$\hat{p}(\mu \mid Y) = \frac{1}{T}\sum_t p(\mu \mid Y, \Phi^{(t)})$$
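A sketch (our own code, not the slides') of this Rao-Blackwellized density estimate in R. It assumes the mvtnorm package and a d x d x T array of precision draws, for instance the $Phi component returned by the Gibbs sampler sketched earlier; m0 and P0 are the prior mean and precision for mu.

library(mvtnorm)

rb_density <- function(mu_point, Phi_draws, Y, m0, P0) {
  n <- nrow(Y); ybar <- colMeans(Y); n_draws <- dim(Phi_draws)[3]
  dens <- numeric(n_draws)
  for (t in 1:n_draws) {
    Phi <- Phi_draws[, , t]
    Pn  <- P0 + n * Phi                                  # posterior precision given Phi
    mun <- solve(Pn, P0 %*% m0 + n * Phi %*% ybar)       # posterior mean given Phi
    dens[t] <- dmvnorm(mu_point, drop(mun), solve(Pn))   # p(mu | Y, Phi^(t))
  }
  mean(dens)                                             # average over the Phi draws
}

# e.g. estimated marginal posterior density of mu at the point (47, 53),
# with 'draws' the output of gibbs_mvn() above:
# rb_density(c(47, 53), draws$Phi, Y, m0 = c(50, 50), P0 = solve(S0))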

Alternative Priors on Σ
There is no reason to restrict attention to semi-conjugate prior distributions, as long as the prior on Σ leads to positive definite matrices.
See Dongchu Sun and Jim Berger's paper for an interesting discussion of priors for the multivariate normal model: http://www.stat.missouri.edu/~dsun/psfiles/sunbergerv8.pdf
These may require Metropolis-Hastings algorithms, as the full conditional distributions are not recognizable. This is automatic in WinBUGS.