Module 11: Linear Regression. Rebecca C. Steorts


Announcements Today is the last class. Homework 7 has been extended to Thursday, April 20, 11 PM. There will be no lab tomorrow. There will be office hours this week. Optional review class next Tuesday, April 25th. Module 9 has been updated to match the Hoff book's notation.

Agenda: What is linear regression? A motivating example. An application from Hoff.

Setup
$X_{n \times p}$: regression features or covariates (design matrix)
$x_{p \times 1}$: the $i$th row vector of the regression covariates
$y_{n \times 1}$: response variable (vector)
$\beta_{p \times 1}$: vector of regression coefficients
Goal: estimation of $p(y \mid x)$.
Dimensions: $y_i - \beta^T x_i = (1 \times 1) - (1 \times p)(p \times 1) = (1 \times 1)$.

Health Insurance Example We want to predict whether or not a patient has health insurance based upon one covariate or predictor variable, income. Typically, we have many predictor variables, such as income, age, education level, etc. We store the predictor variables in a matrix $X_{n \times p}$.

Normal Regression Model The Normal regression model specifies that $E[Y \mid x]$ is linear in $x$ and that the sampling variability around the mean is independently and identically distributed (iid) from a normal distribution:
$$Y_i = \beta^T x_i + e_i, \qquad e_1, \ldots, e_n \overset{iid}{\sim} \text{Normal}(0, \sigma^2)$$
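
To make the sampling model concrete, here is a minimal R sketch (not from Hoff) that simulates data from this model and checks that least squares approximately recovers the coefficients; the sample size, true $\beta$, and $\sigma$ are arbitrary illustrative choices.

set.seed(1)
n <- 50
X <- cbind(1, rnorm(n))                  # design matrix with an intercept column
beta <- c(2, -1)                         # illustrative true coefficients
sigma <- 0.5
y <- drop(X %*% beta + rnorm(n, 0, sigma))   # Y_i = beta^T x_i + e_i
coef(lm(y ~ X[, 2]))                     # approximately recovers (2, -1)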

Normal Regression Model (continued) This allows us to write down the joint sampling density:
$$p(y_1, \ldots, y_n \mid x_1, \ldots, x_n, \beta, \sigma^2) = \prod_{i=1}^n p(y_i \mid x_i, \beta, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\Big\{-\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \beta^T x_i)^2\Big\}$$

Multivariate Setup Let's assume that we have data points $(x_i, y_i)$ available for all $i = 1, \ldots, n$. $y$ is the response vector
$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}_{n \times 1},$$
$x_i$ is the $i$th row of the design matrix $X_{n \times p}$, and the regression coefficients are
$$\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{pmatrix}_{p \times 1}.$$

Multivariate Setup
$$y \mid X, \beta, \sigma^2 \sim \text{MVN}(X\beta, \sigma^2 I), \qquad \beta \sim \text{MVN}(0, \tau^2 I)$$
The likelihood in the multivariate setting simplifies to
$$p(y_1, \ldots, y_n \mid x_1, \ldots, x_n, \beta, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\Big\{-\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \beta^T x_i)^2\Big\} = (2\pi\sigma^2)^{-n/2} \exp\Big\{-\frac{1}{2\sigma^2} (y - X\beta)^T (y - X\beta)\Big\}$$
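
As a quick numerical sanity check (with illustrative values, not from the slides), the sum of squares in the first form equals the quadratic form in the second:

set.seed(1)
n <- 10
X <- cbind(1, rnorm(n))
beta <- c(2, -1)
y <- X %*% beta + rnorm(n)
r <- y - X %*% beta                      # residual vector y - X beta
c(sum(r^2), t(r) %*% r)                  # the two values agree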

Posterior computation Let $a = 1/\sigma^2$ and $b = 1/\tau^2$. Then
$$p(\beta \mid y, X) \propto p(y \mid X, \beta)\, p(\beta) \propto \exp\{-\tfrac{a}{2}(y - X\beta)^T(y - X\beta)\} \exp\{-\tfrac{b}{2}\beta^T\beta\}$$
Just like in the multivariate modules, we simplify. (Check these details on your own.) The result is
$$\beta \mid y, X \sim \text{MVN}(\mu, \Lambda^{-1}),$$
where $\Lambda = aX^TX + bI$ and $\mu = a\Lambda^{-1}X^Ty$.

Posterior computation (details)
$$p(\beta \mid y, X) \propto \exp\{-\tfrac{a}{2}(y - X\beta)^T(y - X\beta)\} \exp\{-\tfrac{b}{2}\beta^T\beta\}$$
$$= \exp\{-\tfrac{a}{2}[y^Ty - 2\beta^TX^Ty + \beta^TX^TX\beta] - \tfrac{b}{2}\beta^T\beta\}$$
$$\propto \exp\{a\beta^TX^Ty - \tfrac{a}{2}\beta^TX^TX\beta - \tfrac{b}{2}\beta^T\beta\}$$
$$= \exp\{\beta^T[aX^Ty] - \tfrac{1}{2}\beta^T(aX^TX + bI)\beta\}$$
Then $\Lambda = aX^TX + bI$ and $\mu = a\Lambda^{-1}X^Ty$.
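
The algebra can be checked numerically. A minimal sketch with illustrative data and prior values ($\sigma$ and $\tau$ are treated as known here):

set.seed(1)
n <- 50; p <- 2
X <- cbind(1, rnorm(n))
y <- X %*% c(2, -1) + rnorm(n, 0, 0.5)
a <- 1 / 0.5^2                            # a = 1/sigma^2
b <- 1 / 10^2                             # b = 1/tau^2
Lambda <- a * crossprod(X) + b * diag(p)  # Lambda = a X^T X + b I
mu <- a * solve(Lambda, crossprod(X, y))  # mu = a Lambda^{-1} X^T y
mu                                        # shrunk toward zero relative to least squares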

Linear Regression Applied to Swimming We will consider Exercise 9.1 in Hoff very closely to illustrate linear regression. The data set we consider contains times (in seconds) of four high school swimmers swimming 50 yards. There are 6 times for each student, taken every two weeks. Each row corresponds to a swimmer and a higher column index indicates a later date.

Data set
read.table("https://www.stat.washington.edu/~pdhoff/book/da
##     V1   V2   V3   V4   V5   V6
## 1 23.1 23.2 22.9 22.9 22.8 22.7
## 2 23.2 23.1 23.4 23.5 23.5 23.4
## 3 22.7 22.6 22.8 22.8 22.9 22.8
## 4 23.7 23.6 23.7 23.5 23.5 23.4

Full conditionals (Task 1) We will fit a separate linear regression model for each swimmer, with swimming time as the response and week as the explanatory variable. Let $Y_i \in \mathbb{R}^6$ be the 6 recorded times for swimmer $i$. Let
$$X_i = \begin{pmatrix} 1 & 1 \\ 1 & 3 \\ \vdots & \vdots \\ 1 & 9 \\ 1 & 11 \end{pmatrix}$$
be the design matrix for swimmer $i$. Then we use the following linear regression model:
$$Y_i \sim N_6(X_i\beta_i, \tau_i^{-1} I_6)$$
$$\beta_i \sim N_2(\beta_0, \Sigma_0)$$
$$\tau_i \sim \text{Gamma}(a, b)$$
Derive full conditionals for $\beta_i$ and $\tau_i$.

Solution (Task 1) The conditional posterior for $\beta_i$ is multivariate normal with
$$V[\beta_i \mid Y_i, X_i, \tau_i] = (\Sigma_0^{-1} + \tau_i X_i^T X_i)^{-1}$$
$$E[\beta_i \mid Y_i, X_i, \tau_i] = (\Sigma_0^{-1} + \tau_i X_i^T X_i)^{-1}(\Sigma_0^{-1}\beta_0 + \tau_i X_i^T Y_i),$$
while
$$\tau_i \mid Y_i, X_i, \beta_i \sim \text{Gamma}\Big(a + 3,\; b + \frac{(Y_i - X_i\beta_i)^T(Y_i - X_i\beta_i)}{2}\Big).$$
These derivations can be found in Hoff, Section 9.2.1.

Task 2 Complete the prior specification by choosing a, b, β 0, and Σ 0. Let your choices be informed by the fact that times for this age group tend to be between 22 and 24 seconds.

Solution (Task 2) Choose $a = b = 0.1$ so as to be somewhat uninformative. Choose $\beta_0 = [23\ 0]^T$ with
$$\Sigma_0 = \begin{pmatrix} 5 & 0 \\ 0 & 2 \end{pmatrix}.$$
This centers the intercept at 23 (the middle of the given range) and the slope at 0 (so we are assuming no systematic change over time), but we choose the variances to be somewhat large to err on the side of being less informative.

Gibbs sampler (Task 3) Code a Gibbs sampler to fit each of the models. For each swimmer $i$, obtain draws from the posterior predictive distribution for $y_i^*$, the time of swimmer $i$ if they were to swim two weeks from the last recorded time.
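
A minimal sketch of such a sampler in R, using the Task 1 full conditionals and the Task 2 priors. It assumes the MASS package for mvrnorm; since the URL on the data slide is truncated, the swim times are typed in from the printed output above, and the iteration count and starting value for tau are illustrative choices.

library(MASS)

## Swim times from the data slide: rows = swimmers, columns = weeks 1, 3, ..., 11
Y <- matrix(c(23.1, 23.2, 22.9, 22.9, 22.8, 22.7,
              23.2, 23.1, 23.4, 23.5, 23.5, 23.4,
              22.7, 22.6, 22.8, 22.8, 22.9, 22.8,
              23.7, 23.6, 23.7, 23.5, 23.5, 23.4),
            nrow = 4, byrow = TRUE)
X <- cbind(1, seq(1, 11, by = 2))     # design matrix: intercept and week

## Priors from Task 2
a <- 0.1; b <- 0.1
beta0 <- c(23, 0)
iSigma0 <- solve(diag(c(5, 2)))

S <- 5000                              # Gibbs iterations (illustrative)
xstar <- c(1, 13)                      # two weeks after the last recorded time
pred <- matrix(NA, S, 4)               # posterior predictive draws of y_i^*

for (i in 1:4) {
  yi <- Y[i, ]
  tau <- 1 / var(yi)                   # crude starting value
  for (s in 1:S) {
    ## beta_i | Y_i, tau_i ~ MVN(E, V), from the Task 1 full conditional
    V <- solve(iSigma0 + tau * crossprod(X))
    E <- drop(V %*% (iSigma0 %*% beta0 + tau * crossprod(X, yi)))
    beta <- mvrnorm(1, E, V)
    ## tau_i | Y_i, beta_i ~ Gamma(a + 3, b + SSR/2), since n = 6
    r <- yi - drop(X %*% beta)
    tau <- rgamma(1, a + 3, b + sum(r^2) / 2)
    ## posterior predictive draw at week 13
    pred[s, i] <- rnorm(1, sum(xstar * beta), 1 / sqrt(tau))
  }
}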

Posterior Prediction (Task 4) The coach has to decide which swimmer should compete in a meet two weeks from the last recorded time. Using the posterior predictive distributions, compute $\Pr\{y_i^* = \min(y_1^*, y_2^*, y_3^*, y_4^*)\}$ for each swimmer $i$ (the fastest swimmer has the smallest time), and use these probabilities to make a recommendation to the coach. This is left as an exercise.
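
Given the matrix pred of posterior predictive draws from the sketch above (one column per swimmer), these probabilities are just Monte Carlo proportions:

## Fraction of draws in which each swimmer posts the smallest (fastest) time
probs <- table(factor(apply(pred, 1, which.min), levels = 1:4)) / nrow(pred)
probs   # recommend the swimmer with the highest probability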