Bayesian Linear Regression

AMS-207: Bayesian Statistics


Posterior Predictive Distribution

Suppose we have observed a new set of explanatory variables $\tilde{X}$ and we want to predict the outcomes $\tilde{y}$ using the regression model.

Components of uncertainty in $p(\tilde{y} \mid y)$:

- variability of the model, represented by $\sigma^2$ and not accounted for by $\tilde{X}\beta$;
- posterior uncertainty in $\beta$ and $\sigma^2$ due to the finite sample size of $y$; as $n \to \infty$ this uncertainty decreases to zero.

Drawing a sample $\tilde{y}$ from its posterior predictive distribution can be done as follows:

1. draw $(\beta, \sigma^2)$ from $p(\beta, \sigma^2 \mid y)$;
2. draw $\tilde{y} \sim N(\tilde{X}\beta, \sigma^2 I)$.
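As a concrete illustration, here is a minimal Python sketch of this two-step composition sampler under the non-informative prior $p(\beta, \sigma^2) \propto \sigma^{-2}$; the function name and defaults are illustrative, not from the notes:

```python
import numpy as np

def sample_posterior_predictive(X, y, X_new, n_draws=4000, rng=None):
    """Composition draws from p(y_new | y) under the non-informative
    prior p(beta, sigma^2) proportional to 1/sigma^2 (a sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    n, k = X.shape
    V_beta = np.linalg.inv(X.T @ X)                 # (X'X)^{-1}
    beta_hat = V_beta @ X.T @ y                     # least-squares estimate
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - k)                    # sigma^2-hat
    draws = np.empty((n_draws, X_new.shape[0]))
    for m in range(n_draws):
        # sigma^2 | y ~ Inv-chi^2(n-k, s2), i.e. (n-k) s2 / chi^2_{n-k}
        sigma2 = (n - k) * s2 / rng.chisquare(n - k)
        # beta | sigma^2, y ~ N(beta_hat, sigma^2 V_beta)
        beta = rng.multivariate_normal(beta_hat, sigma2 * V_beta)
        # y_new | beta, sigma^2 ~ N(X_new beta, sigma^2 I)
        draws[m] = rng.normal(X_new @ beta, np.sqrt(sigma2))
    return draws
```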

Given $\sigma^2$, the future observation $\tilde{y}$ has a normal distribution, with mean and variance given by

$$E(\tilde{y} \mid \sigma^2, y) = E(E(\tilde{y} \mid \beta, \sigma^2, y) \mid \sigma^2, y) = E(\tilde{X}\beta \mid \sigma^2, y) = \tilde{X}\hat{\beta}$$

and

$$
\begin{aligned}
V(\tilde{y} \mid \sigma^2, y) &= E[V(\tilde{y} \mid \beta, \sigma^2, y) \mid \sigma^2, y] + V[E(\tilde{y} \mid \beta, \sigma^2, y) \mid \sigma^2, y] \\
&= E[\sigma^2 I \mid \sigma^2, y] + V[\tilde{X}\beta \mid \sigma^2, y] \\
&= (I + \tilde{X} V_\beta \tilde{X}^T)\,\sigma^2.
\end{aligned}
$$
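The variance decomposition can be sanity-checked by simulation. The sketch below uses arbitrary illustrative values for $\hat{\beta}$, $V_\beta$, $\sigma^2$, and $\tilde{X}$ (none of them from the notes) and compares the empirical covariance of composition draws with $(I + \tilde{X} V_\beta \tilde{X}^T)\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2, k = 0.5, 3                                  # illustrative values
A = rng.normal(size=(k, k))
V_beta = np.linalg.inv(A @ A.T + k * np.eye(k))     # some positive-definite V_beta
beta_hat = np.array([1.0, -2.0, 0.5])
X_new = rng.normal(size=(2, k))

N = 200_000
betas = rng.multivariate_normal(beta_hat, sigma2 * V_beta, size=N)
y_new = betas @ X_new.T + rng.normal(scale=np.sqrt(sigma2), size=(N, 2))

print(np.cov(y_new.T))                                   # empirical covariance
print(sigma2 * (np.eye(2) + X_new @ V_beta @ X_new.T))   # theoretical value
```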

To determine $p(\tilde{y} \mid y)$ we must average over the marginal posterior of $\sigma^2$; then

$$p(\tilde{y} \mid y) = \int N(\tilde{y} \mid \tilde{X}\hat{\beta},\, (I + \tilde{X} V_\beta \tilde{X}^T)\sigma^2)\, p(\sigma^2 \mid y)\, d\sigma^2.$$

This is a multivariate $t$ with center $\tilde{X}\hat{\beta}$, squared scale matrix $\hat{\sigma}^2 (I + \tilde{X} V_\beta \tilde{X}^T)$, and $n - k$ degrees of freedom.
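Since the marginal predictive is a multivariate $t$, it can also be sampled directly. Below is a sketch using a small simulated design (all names and numbers are illustrative) and scipy's multivariate_t distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=20)])   # toy design, n=20, k=2
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=20)
X_new = np.array([[1.0, 0.3], [1.0, -1.2]])

n, k = X.shape
V_beta = np.linalg.inv(X.T @ X)
beta_hat = V_beta @ X.T @ y
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - k)

# Multivariate t predictive: center, squared scale matrix, n-k df
pred = stats.multivariate_t(
    loc=X_new @ beta_hat,
    shape=sigma2_hat * (np.eye(2) + X_new @ V_beta @ X_new.T),
    df=n - k,
)
print(pred.rvs(size=5, random_state=rng))   # five joint draws of y_new
```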

Example

The table gives short-term radon measurements for a sample of houses in three counties in Minnesota. All the measurements were recorded on the basement level of the houses, except for those indicated with *, which were recorded on the first floor.

County       Radon measurements (pCi/L)
Blue Earth   5.0, 13.0, 7.2, 6.8, 12.8, 5.8, 9.5, 6.0, 3.8, 14.3, 1.8, 6.9, 4.7, 9.5
Clay         0.9, 12.9, 2.6, 3.5, 26.6, 1.5, 13.0, 8.8, 19.5, 2.5, 9.0, 13.1, 3.6, 6.9
Goodhue      14.3, 6.9, 7.6, 9.8, 2.6, 43.5, 4.9, 3.5, 4.8, 5.6, 3.5, 3.9, 6.7

We can define a model in terms of indicator variables as follows:

$$x_2 = \begin{cases} 1 & y_{i,j} \in \text{Blue Earth} \\ 0 & \text{otherwise} \end{cases}, \qquad x_3 = \begin{cases} 1 & y_{i,j} \in \text{Clay} \\ 0 & \text{otherwise} \end{cases}, \qquad z = \begin{cases} 1 & y_{i,j} \text{ on first floor} \\ 0 & \text{otherwise} \end{cases}$$

for $i = 1, 2, 3$, $j = 1, \dots, n_i$. Then the model can be written in the following form:

$$\log(y_{i,j}) = \mu + \alpha_i + \delta z_{i,j} + \epsilon_{i,j}, \qquad \epsilon_{i,j} \sim N(0, \sigma^2),$$

with $\mu$ the mean effect for Goodhue, $\alpha_1$ the effect of Blue Earth over $\mu$, $\alpha_2$ the effect of Clay over $\mu$, $\alpha_3 = 0$, and $\delta$ the effect of the first floor.
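In code, the design matrix for this model can be assembled as follows. This is a sketch: since the * marks identifying first-floor measurements did not survive in this transcription, the first-floor flags below are hypothetical placeholders:

```python
import numpy as np

blue_earth = [5.0, 13.0, 7.2, 6.8, 12.8, 5.8, 9.5, 6.0, 3.8, 14.3, 1.8, 6.9, 4.7, 9.5]
clay       = [0.9, 12.9, 2.6, 3.5, 26.6, 1.5, 13.0, 8.8, 19.5, 2.5, 9.0, 13.1, 3.6, 6.9]
goodhue    = [14.3, 6.9, 7.6, 9.8, 2.6, 43.5, 4.9, 3.5, 4.8, 5.6, 3.5, 3.9, 6.7]

y = np.log(np.concatenate([blue_earth, clay, goodhue]))
x2 = np.concatenate([np.ones(14), np.zeros(27)])                # Blue Earth indicator
x3 = np.concatenate([np.zeros(14), np.ones(14), np.zeros(13)])  # Clay indicator
z = np.zeros(len(y))
z[[9, 17, 23, 36]] = 1.0   # HYPOTHETICAL first-floor flags (the * marks were lost)

# Columns: intercept (mu), alpha_1, alpha_2, delta
X = np.column_stack([np.ones(len(y)), x2, x3, z])
```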

Results

[Figure: posterior summaries of the coefficients CONST, BLUE E, CLAY, and 1ST FL.]

Prediction

Assume another house is sampled at random from Blue Earth County. We have two scenarios, depending on whether the measurement we want to predict will be recorded in the basement or on the first floor. If we want to predict a basement measurement, we sample $\log(y^{\text{rep}})$ from the posterior predictive distribution $N(\mu + \alpha_1, \sigma^2)$. If we want a prediction for a first-floor measurement, we sample $\log(y^{\text{rep}})$ from the posterior predictive distribution $N(\mu + \alpha_1 + \delta, \sigma^2)$. The intervals and medians below are reported on the original pCi/L scale.

Location      95% P.I.           Median
Basement      (0.526, 29.663)    7.152
First floor   (0.266, 20.994)    5.012
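Putting the earlier sketches together, predictive summaries like the table above can be computed as follows, reusing sample_posterior_predictive and the X, y built above; because the first-floor flags there were placeholders, these numbers will not reproduce the table exactly:

```python
import numpy as np

# Rows of the design for a new Blue Earth house:
X_new = np.array([[1.0, 1.0, 0.0, 0.0],    # basement:    mu + alpha_1
                  [1.0, 1.0, 0.0, 1.0]])   # first floor: mu + alpha_1 + delta
log_draws = sample_posterior_predictive(X, y, X_new, n_draws=20_000)
radon = np.exp(log_draws)                  # back to the pCi/L scale
for name, col in zip(["Basement", "First floor"], radon.T):
    lo, med, hi = np.percentile(col, [2.5, 50, 97.5])
    print(f"{name}: 95% P.I. ({lo:.3f}, {hi:.3f}), median {med:.3f}")
```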

[Figure: histograms of posterior predictive draws of the radon measurement, one panel for the basement and one for the first floor.]

Unequal Variances

If we consider a linear model with a known general covariance matrix $V$, then we have

$$y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, V).$$

Let $V = LL'$ be the Cholesky decomposition of $V$. Then

$$L^{-1}y = L^{-1}X\beta + v, \qquad v \sim N(0, I).$$

Letting $z = L^{-1}y$ and $W = L^{-1}X$, the LSE of $\beta$ is the solution of $W'W\hat{\beta} = W'z$. This is equivalent to $X'V^{-1}X\hat{\beta} = X'V^{-1}y$.
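A minimal sketch of this whitening computation in Python (the function name is mine, not from the notes):

```python
import numpy as np
from scipy import linalg

def gls_cholesky(X, y, V):
    """Generalized least squares via whitening: with V = L L', solve
    L W = X and L z = y, then run ordinary least squares on (W, z)."""
    L = np.linalg.cholesky(V)                       # lower-triangular factor
    W = linalg.solve_triangular(L, X, lower=True)   # W = L^{-1} X
    z = linalg.solve_triangular(L, y, lower=True)   # z = L^{-1} y
    beta_hat, *_ = np.linalg.lstsq(W, z, rcond=None)
    return beta_hat
```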

The conclusion is that, in order to deal with unequal variances, we have to solve $LW = X$ and $Lz = y$. There are several interesting special cases:

1. $V = \sigma^2 V_0$, with $V_0$ known but $\sigma^2$ unknown.
2. $V$ is a diagonal matrix. Then $L_{ii} = \sqrt{V_{ii}}$, and thus $X$ and $y$ are pre-multiplied by the reciprocals of the square roots of the diagonal elements of $V$, usually denoted as weights.
3. $V_{ij} = \sigma^2 h(i, j, \phi)$, so the matrix is unknown but its elements follow a known parametric form.
4. $V$ is totally unknown. Then

$$p(V \mid \beta, y) \propto |V|^{-1/2} \exp\left\{ -\frac{1}{2} \operatorname{tr}\left( V^{-1} (y - X\beta)(y - X\beta)' \right) \right\} p(V).$$

If $p(V)$ is an inverse Wishart, then this full conditional corresponds to an inverse Wishart as well.
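For case 4, the conjugate update can be coded directly; a sketch, where the prior parameters nu0 and S0 are generic placeholders rather than values from the notes:

```python
import numpy as np
from scipy import stats

def draw_V(y, X, beta, nu0, S0, rng=None):
    """One Gibbs draw of V | beta, y when p(V) is InvWishart(nu0, S0):
    the conjugate update adds the single 'observation' y - X beta."""
    resid = y - X @ beta
    return stats.invwishart.rvs(df=nu0 + 1,
                                scale=S0 + np.outer(resid, resid),
                                random_state=rng)
```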

Including Prior Information

So far we have considered only the case where the prior for $\beta$ and $\sigma^2$ is non-informative. Using a prior for $\sigma^2$ that corresponds to an inverse gamma will not change the analysis much. We can include information about $\beta$ by using a multivariate normal, say $\beta \sim N(\beta_0, V_\beta)$. We can treat the prior for $\beta$ as $k$ additional data points by considering the augmented model

$$y_* = X_*\beta + \varepsilon_*, \qquad y_* = \begin{pmatrix} y \\ \beta_0 \end{pmatrix}, \quad X_* = \begin{pmatrix} X \\ I_k \end{pmatrix}, \quad \operatorname{var}(\varepsilon_*) = \begin{pmatrix} V & 0 \\ 0 & V_\beta \end{pmatrix}.$$

We proceed by obtaining the posterior distribution of $\beta$ from this model, assuming $p(\beta) \propto 1$.
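A sketch of the prior-as-data trick in code, computing the posterior mean of $\beta$ by running GLS on the augmented model (function and variable names are illustrative):

```python
import numpy as np

def posterior_mean_augmented(X, y, V, beta0, V_beta):
    """Posterior mean of beta when beta ~ N(beta0, V_beta), obtained by
    treating the prior as k extra rows of data in an augmented model."""
    n, k = X.shape
    X_star = np.vstack([X, np.eye(k)])              # stack X over I_k
    y_star = np.concatenate([y, beta0])             # stack y over beta_0
    V_star = np.block([[V, np.zeros((n, k))],
                       [np.zeros((k, n)), V_beta]]) # block-diagonal covariance
    Vinv = np.linalg.inv(V_star)
    return np.linalg.solve(X_star.T @ Vinv @ X_star, X_star.T @ Vinv @ y_star)
```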