Multiparameter models (cont.)

Dr. Jarad Niemi
STAT 544 - Iowa State University
February 1, 2018

Outline

- Multinomial
- Multivariate normal
  - Unknown mean
  - Unknown mean and covariance

In the process, we'll introduce the following distributions:

- Multinomial
- Dirichlet
- Multivariate normal
- Inverse Wishart (and Wishart)
- Normal-inverse Wishart

Multinomial: Motivating examples

Multivariate count data:

- Item-response (Likert scale)
- Voting

Multinomial distribution

Suppose there are $K$ categories and each individual independently chooses category $k$ with probability $\pi_k$ such that $\sum_{k=1}^K \pi_k = 1$. Let $y_k$ be the number of individuals who choose category $k$, with $n = \sum_{k=1}^K y_k$ being the total number of individuals. Then $Y = (Y_1, \ldots, Y_K)$ has a multinomial distribution, i.e. $Y \sim \text{Mult}(n, \pi)$, with probability mass function (pmf)

$$p(y) = n! \prod_{k=1}^K \frac{\pi_k^{y_k}}{y_k!}.$$
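As a quick numerical check, here is a minimal Python sketch (the counts and probabilities are invented for illustration) that evaluates this pmf both via scipy.stats.multinomial and directly from the formula on the log scale:

```python
import numpy as np
from scipy.stats import multinomial
from scipy.special import gammaln

# Hypothetical example: n = 10 trials over K = 3 categories
n, pi = 10, np.array([0.2, 0.3, 0.5])
y = np.array([2, 3, 5])  # category counts; must sum to n

# pmf via scipy
print(multinomial.pmf(y, n=n, p=pi))

# pmf from the formula, computed on the log scale for numerical stability:
# log p(y) = log n! + sum_k [ y_k log pi_k - log y_k! ]
logp = gammaln(n + 1) + np.sum(y * np.log(pi) - gammaln(y + 1))
print(np.exp(logp))  # matches scipy's value
```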

Properties of the multinomial distribution

The multinomial distribution with pmf

$$p(y) = n! \prod_{k=1}^K \frac{\pi_k^{y_k}}{y_k!}$$

has the following properties:

- $E[Y_k] = n\pi_k$
- $V[Y_k] = n\pi_k(1 - \pi_k)$
- $\text{Cov}[Y_k, Y_{k'}] = -n\pi_k\pi_{k'}$ for $k \ne k'$

Marginally, each component of a multinomial distribution is a binomial distribution with $Y_k \sim \text{Bin}(n, \pi_k)$.

Dirichlet distribution

Let $\pi = (\pi_1, \ldots, \pi_K)$ have a Dirichlet distribution, i.e. $\pi \sim \text{Dir}(a)$, with concentration parameter $a = (a_1, \ldots, a_K)$ where $a_k > 0$ for all $k$. The probability density function (pdf) for $\pi$ is

$$p(\pi) = \frac{1}{\text{Beta}(a)} \prod_{k=1}^K \pi_k^{a_k - 1}$$

with $\sum_{k=1}^K \pi_k = 1$, where $\text{Beta}(a)$ is the multinomial beta function, i.e.

$$\text{Beta}(a) = \frac{\prod_{k=1}^K \Gamma(a_k)}{\Gamma\left(\sum_{k=1}^K a_k\right)}.$$

Properties of the Dirichlet distribution

The Dirichlet distribution with pdf $p(\pi) \propto \prod_{k=1}^K \pi_k^{a_k - 1}$ has the following properties (where $a_0 = \sum_{k=1}^K a_k$):

- $E[\pi_k] = \dfrac{a_k}{a_0}$
- $V[\pi_k] = \dfrac{a_k(a_0 - a_k)}{a_0^2(a_0 + 1)}$
- $\text{Cov}[\pi_k, \pi_{k'}] = \dfrac{-a_k a_{k'}}{a_0^2(a_0 + 1)}$ for $k \ne k'$

Marginally, each component of a Dirichlet distribution is a beta distribution with $\pi_k \sim \text{Be}(a_k, a_0 - a_k)$.
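These moment formulas are easy to verify by simulation; here is a minimal sketch (the concentration parameter is invented for illustration) using numpy's Dirichlet sampler:

```python
import numpy as np

# Hypothetical concentration parameter for K = 3 categories
a = np.array([2.0, 3.0, 5.0])
a0 = a.sum()

# Closed-form moments from the slide
print("E[pi]:", a / a0)
print("V[pi]:", a * (a0 - a) / (a0**2 * (a0 + 1)))

# Monte Carlo check
rng = np.random.default_rng(544)
draws = rng.dirichlet(a, size=100_000)
print("MC mean:", draws.mean(axis=0))
print("MC var: ", draws.var(axis=0))
```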

Multinomial: Bayesian inference

The conjugate prior for a multinomial distribution, i.e. $Y \sim \text{Mult}(n, \pi)$, with unknown probability vector $\pi$ is a Dirichlet distribution. The Jeffreys prior is a Dirichlet distribution with $a_k = 0.5$ for all $k$. Some argue that for large $K$ this prior puts too much mass on rare categories and suggest instead the Dirichlet prior with $a_k = 1/K$ for all $k$.

The posterior under a Dirichlet prior is

$$p(\pi|y) \propto p(y|\pi)p(\pi) \propto \left[ \prod_{k=1}^K \pi_k^{y_k} \right] \left[ \prod_{k=1}^K \pi_k^{a_k - 1} \right] = \prod_{k=1}^K \pi_k^{a_k + y_k - 1}.$$

Thus $\pi|y \sim \text{Dir}(a + y)$.
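Because the posterior update is just $a + y$, the full analysis takes a few lines of code. A sketch with made-up Likert-style counts and the Jeffreys prior:

```python
import numpy as np

# Dirichlet-multinomial conjugate update: pi | y ~ Dir(a + y).
# Hypothetical data: counts over K = 4 Likert-style categories.
y = np.array([12, 30, 41, 17])
a = np.full(4, 0.5)            # Jeffreys prior, a_k = 0.5

a_post = a + y                 # posterior concentration parameter
print("posterior mean:", a_post / a_post.sum())

# Posterior draws for, e.g., credible intervals
rng = np.random.default_rng(544)
draws = rng.dirichlet(a_post, size=10_000)
ci = np.percentile(draws, [2.5, 97.5], axis=0)
print("95% credible intervals per category:\n", ci)
```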

Multivariate normal distribution

Let $Y = (Y_1, \ldots, Y_K)$ have a multivariate normal distribution, i.e. $Y \sim N_K(\mu, \Sigma)$, with mean $\mu$ and variance-covariance matrix $\Sigma$. The probability density function (pdf) for $Y$ is

$$p(y) = (2\pi)^{-K/2} |\Sigma|^{-1/2} \exp\left( -\frac{1}{2} (y - \mu)' \Sigma^{-1} (y - \mu) \right).$$
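As a sanity check on the normalizing constant, this sketch (values invented) evaluates the density via scipy.stats.multivariate_normal and directly from the formula:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical 2-d example
mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
y = np.array([0.5, 0.5])

# pdf via scipy
print(multivariate_normal.pdf(y, mean=mu, cov=Sigma))

# pdf directly from the formula
K = len(mu)
resid = y - mu
quad = resid @ np.linalg.solve(Sigma, resid)
dens = (2 * np.pi) ** (-K / 2) * np.linalg.det(Sigma) ** (-0.5) * np.exp(-0.5 * quad)
print(dens)  # same value
```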

[Figure: contours of a bivariate normal distribution with correlation 0.8]

Properties of the multivariate normal distribution

The multivariate normal distribution with pdf

$$p(y) = (2\pi)^{-K/2} |\Sigma|^{-1/2} \exp\left( -\frac{1}{2} (y - \mu)' \Sigma^{-1} (y - \mu) \right)$$

has the following properties:

- $E[Y_k] = \mu_k$
- $V[Y_k] = \Sigma_{kk}$
- $\text{Cov}[Y_k, Y_{k'}] = \Sigma_{k,k'}$

Marginally, each component of a multivariate normal distribution is a normal distribution with $Y_k \sim N(\mu_k, \Sigma_{kk})$.

Conditional distributions are also normal, i.e. if

$$\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} \sim N\left( \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \right)$$

then

$$Y_1 | Y_2 = y_2 \sim N\left( \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (y_2 - \mu_2),\ \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right).$$
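The conditional-distribution formula translates directly into code. A minimal sketch (the helper name and example values are invented) that computes the conditional mean and covariance from a partitioned $\mu$ and $\Sigma$:

```python
import numpy as np

def mvn_conditional(mu, Sigma, idx1, idx2, y2):
    """Mean and covariance of Y[idx1] | Y[idx2] = y2 for Y ~ N(mu, Sigma)."""
    mu1, mu2 = mu[idx1], mu[idx2]
    S11 = Sigma[np.ix_(idx1, idx1)]
    S12 = Sigma[np.ix_(idx1, idx2)]
    S21 = Sigma[np.ix_(idx2, idx1)]
    S22 = Sigma[np.ix_(idx2, idx2)]
    # mu1 + S12 S22^{-1} (y2 - mu2)  and  S11 - S12 S22^{-1} S21
    cond_mean = mu1 + S12 @ np.linalg.solve(S22, y2 - mu2)
    cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)
    return cond_mean, cond_cov

# Hypothetical 3-d example: distribution of (Y1, Y2) given Y3 = 1.5
mu = np.zeros(3)
Sigma = np.array([[1.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.0]])
m, V = mvn_conditional(mu, Sigma, idx1=[0, 1], idx2=[2], y2=np.array([1.5]))
print(m, "\n", V)
```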

Representing independence in a multivariate normal

Let $Y \sim N(\mu, \Sigma)$ with precision matrix $\Omega = \Sigma^{-1}$.

- If $\Sigma_{k,k'} = 0$, then $Y_k$ and $Y_{k'}$ are independent of each other.
- If $\Omega_{k,k'} = 0$, then $Y_k$ and $Y_{k'}$ are conditionally independent of each other given $Y_j$ for $j \ne k, k'$.
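A small numeric illustration of the second bullet (values invented): build a precision matrix with $\Omega_{1,3} = 0$ and observe that the implied covariance $\Sigma_{1,3}$ is nonzero, i.e. conditional independence does not imply marginal independence:

```python
import numpy as np

# Precision matrix with Omega[0,2] = 0: Y_1 and Y_3 are conditionally
# independent given Y_2. Invert to see the implied covariance matrix.
Omega = np.array([[ 2.0, -1.0,  0.0],
                  [-1.0,  2.0, -1.0],
                  [ 0.0, -1.0,  2.0]])
Sigma = np.linalg.inv(Omega)
print(np.round(Sigma, 3))
# Sigma[0,2] is NOT zero: Y_1 and Y_3 are marginally dependent,
# even though they are conditionally independent given Y_2.
```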

Default inference with an unknown mean

Let $Y_i \stackrel{ind}{\sim} N_K(\mu, S)$ with default prior $p(\mu) \propto 1$, where $Y_i = (Y_{i1}, \ldots, Y_{iK})$. Then

$$p(\mu|y) \propto p(y|\mu)p(\mu) \propto \exp\left( -\frac{1}{2} \sum_{i=1}^n (y_i - \mu)' S^{-1} (y_i - \mu) \right) = \exp\left( -\frac{1}{2} \mathrm{tr}(S^{-1} S_0) \right)$$

where

$$S_0 = \sum_{i=1}^n (y_i - \mu)(y_i - \mu)'.$$

This posterior is proper if $n \ge K$ and, in that case, is

$$\mu|y \sim N_K\left( \overline{y}, \tfrac{1}{n} S \right)$$

where $\overline{y} = (\overline{y}_1, \ldots, \overline{y}_K)$ has elements $\overline{y}_k = \frac{1}{n} \sum_{i=1}^n y_{ik}$.

Conjugate inference with an unknown mean

Let $Y_i \stackrel{ind}{\sim} N_K(\mu, S)$ with conjugate prior $\mu \sim N_K(m, C)$. Then

$$p(\mu|y) \propto p(y|\mu)p(\mu) \propto \exp\left( -\frac{1}{2} \sum_{i=1}^n (y_i - \mu)' S^{-1} (y_i - \mu) \right) \exp\left( -\frac{1}{2} (\mu - m)' C^{-1} (\mu - m) \right) = \exp\left( -\frac{1}{2} (\mu - m')' C'^{-1} (\mu - m') \right)$$

and thus $\mu|y \sim N_K(m', C')$ where

$$C' = \left[ C^{-1} + nS^{-1} \right]^{-1}, \qquad m' = C'\left[ C^{-1} m + nS^{-1} \overline{y} \right].$$
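These update formulas in code, as a minimal sketch (the function name and data are invented; the covariance $S$ is treated as known):

```python
import numpy as np

def normal_mean_posterior(y, S, m, C):
    """Posterior (m', C') for mu given Y_i ~ N(mu, S) and prior mu ~ N(m, C).

    Uses C' = [C^{-1} + n S^{-1}]^{-1} and m' = C'[C^{-1} m + n S^{-1} ybar].
    """
    n = y.shape[0]
    ybar = y.mean(axis=0)
    C_inv, S_inv = np.linalg.inv(C), np.linalg.inv(S)
    C_post = np.linalg.inv(C_inv + n * S_inv)
    m_post = C_post @ (C_inv @ m + n * S_inv @ ybar)
    return m_post, C_post

# Hypothetical data: n = 50 observations from a bivariate normal
rng = np.random.default_rng(1)
S_true = np.array([[1.0, 0.3], [0.3, 2.0]])
y = rng.multivariate_normal([1.0, -1.0], S_true, size=50)

m_post, C_post = normal_mean_posterior(y, S_true, m=np.zeros(2), C=10 * np.eye(2))
print(m_post, "\n", C_post)
```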

Inverse Wishart distribution

Let the $K \times K$ matrix $\Sigma$ have an inverse Wishart distribution, i.e. $\Sigma \sim IW(v, W^{-1})$, with degrees of freedom $v > K - 1$ and positive definite scale matrix $W$. The pdf for $\Sigma$ is

$$p(\Sigma) \propto |\Sigma|^{-(v+K+1)/2} \exp\left( -\frac{1}{2} \mathrm{tr}\left( W \Sigma^{-1} \right) \right).$$

Properties of the inverse Wishart distribution

The inverse Wishart distribution with pdf

$$p(\Sigma) \propto |\Sigma|^{-(v+K+1)/2} \exp\left( -\frac{1}{2} \mathrm{tr}\left( W \Sigma^{-1} \right) \right)$$

has the following properties:

- $E[\Sigma] = (v - K - 1)^{-1} W$ for $v > K + 1$.
- Marginally, $\sigma_k^2 = \Sigma_{kk} \sim \text{Inv-}\chi^2(v, W_{kk})$.
- If a $K \times K$ matrix $W$ has a Wishart distribution, i.e. $W \sim \text{Wishart}(v, S)$, then $W^{-1} \sim IW(v, S^{-1})$.
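The mean formula can be checked by simulation. In this sketch (parameters invented), scipy's invwishart with df $= v$ and scale $= W$ matches the parameterization above, since its density uses the same $\mathrm{tr}(W\Sigma^{-1})$ kernel:

```python
import numpy as np
from scipy.stats import invwishart

# Hypothetical K = 3 example; scipy's `scale` plays the role of W here
K, v = 3, 7
W = np.diag([1.0, 2.0, 3.0])

iw = invwishart(df=v, scale=W)
print("E[Sigma] (formula):\n", W / (v - K - 1))

draws = iw.rvs(size=20_000, random_state=42)
print("E[Sigma] (Monte Carlo):\n", draws.mean(axis=0))
```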

Normal-inverse Wishart distribution

A multivariate generalization of the normal-scaled-inverse-$\chi^2$ distribution is the normal-inverse Wishart distribution. For a vector $\mu \in \mathbb{R}^K$ and $K \times K$ matrix $\Sigma$, the normal-inverse Wishart distribution is

$$\mu|\Sigma \sim N(m, \Sigma/c)$$
$$\Sigma \sim IW(v, W^{-1})$$

The marginal distribution for $\mu$, i.e. $p(\mu) = \int p(\mu|\Sigma)p(\Sigma)\,d\Sigma$, is a multivariate t-distribution, i.e.

$$\mu \sim t_{v-K+1}\left( m, W/[c(v - K + 1)] \right).$$
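A simulation sketch of this marginal (parameters invented): draw $\Sigma$, then $\mu|\Sigma$, and compare the Monte Carlo covariance of $\mu$ with the covariance of the stated multivariate t, which is the shape matrix times $df/(df-2)$:

```python
import numpy as np
from scipy.stats import invwishart

# Draw Sigma ~ IW(v, W^{-1}), then mu | Sigma ~ N(m, Sigma/c)
rng = np.random.default_rng(7)
K, c, v = 2, 2.0, 6.0
m, W = np.zeros(K), np.array([[2.0, 0.5],
                              [0.5, 1.0]])

draws = np.array([
    rng.multivariate_normal(m, invwishart.rvs(df=v, scale=W, random_state=rng) / c)
    for _ in range(20_000)
])

df = v - K + 1                    # t degrees of freedom
shape = W / (c * df)              # t shape matrix from the slide
print("t covariance:\n", shape * df / (df - 2))  # valid since df > 2
print("MC covariance:\n", np.cov(draws.T))
```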

Conjugate inference with unknown mean and covariance

Let $Y_i \stackrel{ind}{\sim} N_K(\mu, \Sigma)$ with conjugate prior

$$\mu|\Sigma \sim N(m, \Sigma/c), \qquad \Sigma \sim IW(v, W^{-1})$$

which has pdf

$$p(\mu, \Sigma) \propto |\Sigma|^{-((v+K)/2+1)} \exp\left( -\frac{1}{2} \mathrm{tr}(W\Sigma^{-1}) - \frac{c}{2} (\mu - m)' \Sigma^{-1} (\mu - m) \right).$$

The posterior is a normal-inverse Wishart with parameters

$$c' = c + n$$
$$v' = v + n$$
$$m' = \frac{c}{c+n} m + \frac{n}{c+n} \overline{y}$$
$$W' = W + S + \frac{cn}{c+n} (\overline{y} - m)(\overline{y} - m)'$$

where $S = \sum_{i=1}^n (y_i - \overline{y})(y_i - \overline{y})'$.
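The whole update as one function, as a minimal sketch (the function name and data are invented):

```python
import numpy as np

def niw_posterior(y, m, c, v, W):
    """Normal-inverse Wishart posterior update for Y_i ~ N(mu, Sigma).

    Prior: mu | Sigma ~ N(m, Sigma/c), Sigma ~ IW(v, W^{-1}).
    Returns the updated (m', c', v', W').
    """
    n = y.shape[0]
    ybar = y.mean(axis=0)
    resid = y - ybar
    S = resid.T @ resid                      # sum_i (y_i - ybar)(y_i - ybar)'
    c_post = c + n
    v_post = v + n
    m_post = (c * m + n * ybar) / (c + n)
    d = (ybar - m).reshape(-1, 1)
    W_post = W + S + (c * n / (c + n)) * (d @ d.T)
    return m_post, c_post, v_post, W_post

# Hypothetical bivariate data and a weak prior
rng = np.random.default_rng(544)
y = rng.multivariate_normal([0.0, 2.0], [[1.0, 0.5], [0.5, 1.5]], size=100)
m_post, c_post, v_post, W_post = niw_posterior(
    y, m=np.zeros(2), c=1.0, v=3.0, W=np.eye(2))
print(m_post, c_post, v_post, "\n", W_post)
```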

Default inference with unknown mean and covariance

The prior $\Sigma \sim IW(K+1, I)$ is non-informative in the sense that marginally each correlation has a uniform distribution on $(-1, 1)$.

The prior $p(\mu, \Sigma) \propto |\Sigma|^{-(K+1)/2}$, which can be thought of as a normal-inverse Wishart distribution with $c \to 0$, $v \to -1$, and $W \to 0$, results in the posterior distribution

$$\mu|\Sigma, y \sim N(\overline{y}, \Sigma/n)$$
$$\Sigma|y \sim IW(n-1, S^{-1}).$$
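Joint posterior draws under this default prior follow by composition: draw $\Sigma|y$ from the inverse Wishart, then $\mu|\Sigma, y$ from the normal. A sketch with simulated data (values invented):

```python
import numpy as np
from scipy.stats import invwishart

# Simulated data standing in for an observed sample
rng = np.random.default_rng(0)
y = rng.multivariate_normal([1.0, -1.0], [[2.0, 0.7], [0.7, 1.0]], size=40)
n, K = y.shape
ybar = y.mean(axis=0)
resid = y - ybar
S = resid.T @ resid

# Composition sampling: Sigma | y ~ IW(n-1, S^{-1}), then mu | Sigma, y ~ N(ybar, Sigma/n)
mus = []
for _ in range(5_000):
    Sigma = invwishart.rvs(df=n - 1, scale=S, random_state=rng)
    mus.append(rng.multivariate_normal(ybar, Sigma / n))
print("posterior mean of mu:", np.mean(mus, axis=0))
```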

Issues with the inverse Wishart distribution

Marginals of the IW have an IG (or scaled-inverse-$\chi^2$) distribution and therefore inherit the low density near zero, resulting in a (possible) bias for small variances toward larger values.

Due to the above issue, and the relationship between the variances and the correlations (http://www.themattsimpson.com/2012/08/20/prior-distributions-for-covariance-matrices-the-scaled-inverse-wishart-prior/), the correlations can be biased:

- small variances imply small correlations
- large variances imply large correlations

Remedies:

- Don't blindly use $I$ for the scale matrix in an IW; instead use a reasonable diagonal matrix for your data set.
- Use the scaled inverse Wishart distribution (see pg 74).
- Use the separation strategy, i.e. $\Sigma = \Delta \Lambda \Delta$ where $\Delta$ is diagonal and $\Lambda$ is a correlation matrix, so that you specify the standard deviations (or variances) and the correlations separately (see the sketch below). In this case, Gelman recommends putting the LKJ prior (see page 582) on the correlation matrix.
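A minimal numpy sketch of the separation strategy's algebra (values invented); in practice one would place priors on the standard deviations and on $\Lambda$, e.g. an LKJ prior, in a probabilistic programming language:

```python
import numpy as np

# Separation strategy: build Sigma = Delta @ Lambda @ Delta from standard
# deviations and a correlation matrix specified separately.
sd = np.array([0.5, 1.0, 2.0])        # standard deviations (diagonal of Delta)
Lambda = np.array([[1.0, 0.3, 0.1],   # correlation matrix
                   [0.3, 1.0, 0.4],
                   [0.1, 0.4, 1.0]])
Delta = np.diag(sd)
Sigma = Delta @ Lambda @ Delta
print(Sigma)

# The decomposition is easy to invert for any covariance matrix:
sd_back = np.sqrt(np.diag(Sigma))
Lambda_back = Sigma / np.outer(sd_back, sd_back)
print(np.allclose(Lambda_back, Lambda))  # True
```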