USEFUL PROPERTIES OF THE MULTIVARIATE NORMAL*


3.1 Conditionals and marginals

For Bayesian analysis it is very useful to understand how to write joint, marginal, and conditional distributions for the multivariate normal. Given a vector $x \in \mathbb{R}^p$ the multivariate normal density is
$$f(x) = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu)\right).$$
Now split the vector into two parts
$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \text{ of size } \begin{bmatrix} q \\ (p-q) \end{bmatrix}, \quad \text{and} \quad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}, \text{ of size } \begin{bmatrix} q \times q & q \times (p-q) \\ (p-q) \times q & (p-q) \times (p-q) \end{bmatrix}.$$
We now state the joint and marginal distributions and the conditional density:
$$x_1 \sim N(\mu_1, \Sigma_{11}), \qquad x_2 \sim N(\mu_2, \Sigma_{22}), \qquad x \sim N(\mu, \Sigma),$$
$$x_1 \mid x_2 \sim N\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right).$$
The same idea holds for other sizes of partitions.

3.2 Conjugate priors

Univariate normals.

3.2.1 Fixed variance, random mean. We consider the parameter $\sigma^2$ fixed, so we are interested in the conjugate prior for $\mu$:
$$\pi(\mu \mid \mu_0, \sigma_0^2) \propto \frac{1}{\sigma_0}\exp\left(-\frac{1}{2\sigma_0^2}(\mu - \mu_0)^2\right),$$
where $\mu_0$ and $\sigma_0^2$ are hyper-parameters for the prior distribution (when we don't have informative prior knowledge we typically consider $\mu_0 = 0$ and $\sigma_0^2$ large).
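As a quick numerical companion to the conditioning formula, the following sketch (illustrative, not from the notes; the function name and partition size $q$ are ours) computes the mean and covariance of $x_1 \mid x_2$ for a partitioned Gaussian:

```python
import numpy as np

def mvn_conditional(mu, Sigma, x2, q):
    """Mean and covariance of x1 | x2 for x ~ N(mu, Sigma),
    where x1 = x[:q] and x2 = x[q:]."""
    mu1, mu2 = mu[:q], mu[q:]
    S11 = Sigma[:q, :q]
    S12 = Sigma[:q, q:]
    S22 = Sigma[q:, q:]
    S22_inv = np.linalg.inv(S22)
    cond_mean = mu1 + S12 @ S22_inv @ (x2 - mu2)        # mu_1 + S12 S22^{-1} (x2 - mu_2)
    cond_cov = S11 - S12 @ S22_inv @ S12.T              # S11 - S12 S22^{-1} S21
    return cond_mean, cond_cov

# Example with p = 3, q = 1 (numbers are arbitrary)
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
m, C = mvn_conditional(mu, Sigma, x2=np.array([1.5, -0.5]), q=1)
```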

The posterior distribution for $x_1,\ldots,x_n$ with a univariate normal likelihood and the above prior will be
$$\mathrm{Post}(\mu \mid x_1,\ldots,x_n) = N\left(\frac{\sigma_0^2}{\frac{\sigma^2}{n} + \sigma_0^2}\,\bar{x} + \frac{\frac{\sigma^2}{n}}{\frac{\sigma^2}{n} + \sigma_0^2}\,\mu_0,\; \left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}\right)^{-1}\right).$$

3.2.2 Fixed mean, random variance. We will formulate this setting with two parameterizations of the scale parameter: (1) the variance $\sigma^2$, (2) the precision $\tau = \sigma^{-2}$. The two conjugate distributions are the Gamma and the inverse Gamma (really they are the same distribution, just reparameterized):
$$\mathrm{IG}(\alpha,\beta):\; f(\sigma^2) = \frac{\beta^\alpha}{\Gamma(\alpha)}(\sigma^2)^{-\alpha-1}\exp\left(-\beta/\sigma^2\right), \qquad \mathrm{Ga}(\alpha,\beta):\; f(\tau) = \frac{\beta^\alpha}{\Gamma(\alpha)}\tau^{\alpha-1}\exp(-\beta\tau).$$
The posterior distribution of $\sigma^2$ is
$$\sigma^2 \mid x_1,\ldots,x_n \sim \mathrm{IG}\left(\alpha + \frac{n}{2},\; \beta + \frac{\sum_i (x_i-\mu)^2}{2}\right).$$
The posterior distribution of $\tau$ is, not surprisingly,
$$\tau \mid x_1,\ldots,x_n \sim \mathrm{Ga}\left(\alpha + \frac{n}{2},\; \beta + \frac{\sum_i (x_i-\mu)^2}{2}\right).$$

3.2.3 Random mean, random variance. We now put the previous priors together in what is called a Bayesian hierarchical model:
$$x_i \mid \mu,\tau \overset{\text{iid}}{\sim} N(\mu,\tau), \qquad \mu \mid \tau \sim N(\mu_0, \kappa_0\tau), \qquad \tau \sim \mathrm{Ga}(\alpha,\beta),$$
where the second argument of the normal is the precision. For the above likelihood and priors the posterior distribution for the mean and precision is
$$\mu \mid \tau, x_1,\ldots,x_n \sim N\left(\frac{\mu_0\kappa_0 + n\bar{x}}{n + \kappa_0},\; \tau(n + \kappa_0)\right),$$
$$\tau \mid x_1,\ldots,x_n \sim \mathrm{Ga}\left(\alpha + \frac{n}{2},\; \beta + \frac{1}{2}\left[\sum_i (x_i - \bar{x})^2 + \frac{n\kappa_0(\bar{x} - \mu_0)^2}{n + \kappa_0}\right]\right).$$

3.2.4 Multivariate normal. Given a vector $x \in \mathbb{R}^p$ the multivariate normal density is
$$f(x) = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right).$$
We will work with the precision matrix instead of the covariance, and we will consider the following Bayesian hierarchical model:
$$x_i \mid \mu,\Lambda \overset{\text{iid}}{\sim} N(\mu,\Lambda), \qquad \mu \mid \Lambda \sim N(\mu_0, \kappa_0\Lambda), \qquad \Lambda \sim \mathrm{Wi}(\Lambda_0, n_0),$$
where the precision matrix is modeled using the Wishart distribution
$$f(\Lambda; V, n) = \frac{|\Lambda|^{(n-d-1)/2}\exp\left(-\frac{1}{2}\mathrm{tr}(\Lambda V^{-1})\right)}{2^{nd/2}|V|^{n/2}\Gamma_d(n/2)}.$$
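The closed-form univariate updates in 3.2.1 and 3.2.2 are easy to turn into code. A minimal sketch (function names are ours, assuming numpy):

```python
import numpy as np

def posterior_mean_known_variance(x, sigma2, mu0, sigma02):
    """Posterior N(mean_n, var_n) for mu with sigma^2 known (section 3.2.1)."""
    n, xbar = len(x), np.mean(x)
    var_n = 1.0 / (1.0 / sigma02 + n / sigma2)          # (1/sigma0^2 + n/sigma^2)^{-1}
    mean_n = var_n * (mu0 / sigma02 + n * xbar / sigma2)  # precision-weighted average
    return mean_n, var_n

def posterior_precision_known_mean(x, mu, alpha, beta):
    """Posterior Ga(alpha_n, beta_n) for tau with mu known (section 3.2.2)."""
    n = len(x)
    return alpha + n / 2.0, beta + np.sum((x - mu) ** 2) / 2.0
```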

For the above likelihood and priors the posterior distribution for the mean and precision is
$$\mu \mid \Lambda, x_1,\ldots,x_n \sim N\left(\frac{\mu_0\kappa_0 + n\bar{x}}{n + \kappa_0},\; \Lambda(n + \kappa_0)\right),$$
$$\Lambda \mid x_1,\ldots,x_n \sim \mathrm{Wi}\left(n_0 + \frac{n}{2},\; \Lambda_0 + \frac{1}{2}\left[S + \frac{n\kappa_0}{\kappa_0 + n}(\bar{x} - \mu_0)(\bar{x} - \mu_0)^T\right]\right),$$
where $S = \sum_i (x_i - \bar{x})(x_i - \bar{x})^T$ is the empirical scatter matrix.
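A hedged sketch of the corresponding hyper-parameter update, assuming the Wishart parameterization used above (degrees of freedom $n_0 + n/2$, rate-like matrix parameter); all names are illustrative:

```python
import numpy as np

def normal_wishart_posterior(X, mu0, kappa0, Lambda0, n0):
    """Posterior hyper-parameters for (mu, Lambda) under the hierarchical
    model of section 3.2.4. X is an (n, d) data matrix."""
    n, d = X.shape
    xbar = X.mean(axis=0)
    S = (X - xbar).T @ (X - xbar)            # scatter: sum_i (x_i - xbar)(x_i - xbar)^T
    mu_n = (kappa0 * mu0 + n * xbar) / (kappa0 + n)
    kappa_n = kappa0 + n
    diff = (xbar - mu0).reshape(-1, 1)
    Lambda_n = Lambda0 + 0.5 * (S + (n * kappa0 / (kappa0 + n)) * (diff @ diff.T))
    df_n = n0 + n / 2.0
    return mu_n, kappa_n, Lambda_n, df_n
```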

LECTURE 4
A Bayesian approach to linear regression

The main motivations behind a Bayesian formalism for inference are a coherent approach to modeling uncertainty as well as an axiomatic framework for inference. We will reformulate multivariate linear regression from a Bayesian formulation in this section. Bayesian inference involves thinking in terms of probability distributions and conditional distributions. One important idea is that of a conjugate prior. Another tool we will use extensively in this class is the multivariate normal distribution and its properties.

4.1 Conjugate priors

Given a likelihood function $p(x \mid \theta)$ and a prior $\pi(\theta)$ one can write the posterior as
$$p(\theta \mid x) = \frac{p(x \mid \theta)\pi(\theta)}{\int p(x \mid \theta')\pi(\theta')\,d\theta'} = \frac{p(x,\theta)}{p(x)},$$
where $p(x)$ is the marginal density for the data and $p(x,\theta)$ is the joint density of the data and the parameter $\theta$. The idea of a prior and likelihood being conjugate is that the prior and the posterior densities belong to the same family. We now state some examples to illustrate this idea.

Beta, Binomial: Consider the Binomial likelihood with $n$ (the number of trials) fixed,
$$f(x \mid p, n) = \binom{n}{x} p^x (1-p)^{n-x},$$
where the parameter of interest (the probability of a success) is $p \in [0,1]$. A natural prior distribution for $p$ is the Beta distribution, which has density
$$\pi(p; \alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p^{\alpha-1}(1-p)^{\beta-1}, \quad p \in (0,1) \text{ and } \alpha, \beta > 0,$$

where $\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}$ is a normalization constant. Given the prior and the likelihood densities, the posterior density (modulo normalizing constants) will take the form
$$f(p \mid x) \propto \left[\binom{n}{x} p^x (1-p)^{n-x}\right]\left[\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p^{\alpha-1}(1-p)^{\beta-1}\right] \propto p^{x+\alpha-1}(1-p)^{n-x+\beta-1},$$
which means that the posterior distribution of $p$ is also a Beta with
$$p \mid x \sim \mathrm{Beta}(\alpha + x,\; \beta + n - x).$$

Normal, Normal: Given a normal distribution with unknown mean, the density for the likelihood is
$$f(x \mid \theta, \sigma^2) \propto \exp\left(-\frac{1}{2\sigma^2}(x - \theta)^2\right),$$
and one can specify a normal prior
$$\pi(\theta; \theta_0, \tau_0^2) \propto \exp\left(-\frac{1}{2\tau_0^2}(\theta - \theta_0)^2\right),$$
with hyper-parameters $\theta_0$ and $\tau_0$. The resulting posterior distribution will have the following density function
$$f(\theta \mid x) \propto \exp\left(-\frac{1}{2\sigma^2}(x - \theta)^2\right)\exp\left(-\frac{1}{2\tau_0^2}(\theta - \theta_0)^2\right),$$
which after completing squares and reordering can be written as
$$\theta \mid x \sim N(\bar{\theta}, \tau^2), \qquad \bar{\theta} = \frac{\frac{\theta_0}{\tau_0^2} + \frac{x}{\sigma^2}}{\frac{1}{\tau_0^2} + \frac{1}{\sigma^2}}, \qquad \frac{1}{\tau^2} = \frac{1}{\tau_0^2} + \frac{1}{\sigma^2}.$$

4.2 Bayesian linear regression

We start with the likelihood as
$$f(y \mid X, \beta, \sigma^2) = \prod_{i=1}^n \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(y_i - \beta^T x_i)^2}{2\sigma^2}\right)$$
and the prior as
$$\pi(\beta) = \frac{1}{(2\pi)^{p/2}\tau_0^p}\exp\left(-\frac{\beta^T\beta}{2\tau_0^2}\right).$$
The density of the posterior is
$$\mathrm{Post}(\beta \mid D) \propto \left[\prod_{i=1}^n \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(y_i - \beta^T x_i)^2}{2\sigma^2}\right)\right]\frac{1}{(2\pi)^{p/2}\tau_0^p}\exp\left(-\frac{\beta^T\beta}{2\tau_0^2}\right).$$
With a good bit of manipulation the above can be rewritten as a multivariate normal distribution $\beta \mid Y, X, \sigma^2 \sim N_p(\mu, \Sigma)$ with
$$\Sigma = \left(\tau_0^{-2} I_p + \sigma^{-2} X^T X\right)^{-1}, \qquad \mu = \sigma^{-2}\,\Sigma\, X^T Y.$$
Note the similarities of the above distribution to the MAP estimator. Relate the mean of the above estimator to the MAP estimator.
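As a sketch of the posterior computation (ours, not the notes'; names are illustrative), the covariance and mean above can be computed directly; note that the posterior mean coincides with the ridge/MAP estimator $(X^T X + (\sigma^2/\tau_0^2) I_p)^{-1} X^T Y$:

```python
import numpy as np

def blr_posterior(X, y, sigma2, tau02):
    """Posterior N_p(mu, Sigma) for beta under the Gaussian likelihood
    and N(0, tau0^2 I) prior above. X is (n, p), y is (n,)."""
    p = X.shape[1]
    Sigma = np.linalg.inv(np.eye(p) / tau02 + (X.T @ X) / sigma2)
    mu = Sigma @ X.T @ y / sigma2        # sigma^{-2} Sigma X^T y
    return mu, Sigma
```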

Predictive distribution: Given data $D = \{(x_i, y_i)\}_{i=1}^n$ and a new value $x$, one would like to estimate $y$. This can be done using the posterior and is called the posterior predictive distribution:
$$f(y \mid D, x, \sigma^2, \tau_0^2) = \int_{\mathbb{R}^p} f(y \mid x, \beta, \sigma^2)\, f(\beta \mid Y, X, \sigma^2, \tau_0^2)\, d\beta,$$
where with some manipulation
$$y \mid D, x, \sigma^2, \tau_0^2 \sim N(\mu^*, \sigma_*^2),$$
where
$$\mu^* = \sigma^{-2}\, x^T \Sigma X^T Y, \qquad \sigma_*^2 = \sigma^2 + x^T \Sigma x.$$
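A short sketch of the predictive mean and variance at a new input (illustrative names, assuming the same setup as above; the variance decomposes into observation noise plus parameter uncertainty):

```python
import numpy as np

def blr_predictive(x_new, X, y, sigma2, tau02):
    """Posterior predictive mean and variance of y at x_new."""
    p = X.shape[1]
    Sigma = np.linalg.inv(np.eye(p) / tau02 + (X.T @ X) / sigma2)
    mu = Sigma @ X.T @ y / sigma2
    pred_mean = x_new @ mu                       # sigma^{-2} x^T Sigma X^T y
    pred_var = sigma2 + x_new @ Sigma @ x_new    # sigma^2 + x^T Sigma x
    return pred_mean, pred_var
```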