AMS-207: Bayesian Statistics

Linear Regression

How does a quantity $y$ vary as a function of another quantity, or vector of quantities, $x$? We are interested in $p(y \mid \theta, x)$ under a model in which the $n$ observations $(x_i, y_i)$ are exchangeable.

Notation: $y$ (continuous) is the response or outcome variable; $x = (x_1, \ldots, x_k)$ (discrete or continuous) are the explanatory variables. We denote by $y = (y_1, \ldots, y_n)$ the vector of outcomes and by $X$ the $n \times k$ matrix of explanatory variables.

Linear Regression

The normal linear model specifies that the distribution of $y \mid X$ is normal, with mean a linear function of $X$:
$$E(y_i \mid \beta, X) = \beta_1 x_{i1} + \cdots + \beta_k x_{ik}, \qquad i = 1, \ldots, n.$$
Usually $x_{i1} = 1$. In matrix notation we write $y = X\beta + \epsilon$, where $y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times k}$ and $\epsilon \sim N_n(0, \sigma^2 I)$.
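
As a quick illustration, the sketch below simulates one data set from this model in R; the sample size, coefficients and error standard deviation are arbitrary made-up values.

## Minimal sketch: simulate one data set from y = X beta + eps,
## eps ~ N(0, sigma^2 I), with an intercept column x_{i1} = 1.
## All numerical values here are arbitrary illustrations.
set.seed(1)
n <- 42; k <- 2
beta.true  <- c(-1900, 180)
sigma.true <- 280
X.sim <- cbind(1, runif(n, 20, 35))
y.sim <- drop(X.sim %*% beta.true + rnorm(n, 0, sigma.true))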

Example

42 specimens of radiata pine (Carlin & Chib, 1995; Williams, 1995). For each specimen the maximum compressive strength $y_i$ was measured, together with its density $x_i$ and its density adjusted for resin content $z_i$.

[Figure: scatterplots of strength against density and against adjusted density.]
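
A sketch of how the data might be loaded in R; the file name and layout are hypothetical, but the column names match the variables used in the lm() call later in these notes.

## Sketch only: file name and layout are hypothetical.
## Assumes a whitespace-delimited file with columns strength, density, adjusted.
pines    <- read.table("pines.txt", header = TRUE)
strength <- pines$strength
density  <- pines$density
adjusted <- pines$adjusted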

Two models can be considered in this case:
$$M_1:\; E(y_i \mid \beta^{(1)}, X) = \beta^{(1)}_1 + \beta^{(1)}_2 x_i$$
$$M_2:\; E(y_i \mid \beta^{(2)}, Z) = \beta^{(2)}_1 + \beta^{(2)}_2 z_i$$
For model $M_1$: $n = 42$, $k = 2$, $x_{i1} = 1$, $x_{i2} = x_i$, $\beta_1 = \beta^{(1)}_1$ and $\beta_2 = \beta^{(1)}_2$. For model $M_2$: $n = 42$, $k = 2$, $x_{i1} = 1$, $x_{i2} = z_i$, $\beta_1 = \beta^{(2)}_1$ and $\beta_2 = \beta^{(2)}_2$.
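
Assuming the strength, density and adjusted vectors from the sketch above, the two models correspond to two ordinary lm() fits:

## Sketch: the two competing models fitted by least squares.
fit.M1 <- lm(strength ~ density)    # M1: density as predictor
fit.M2 <- lm(strength ~ adjusted)   # M2: adjusted density as predictor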

Classical Regression

Consider $M_2$. If $y_i \sim N(\beta^{(2)}_1 + \beta^{(2)}_2 z_i, \sigma^2_2)$, the maximum likelihood estimator of $\beta^{(2)}$ is given by the solution of the normal equations $Z^T Z\, \beta^{(2)} = Z^T y$, i.e.
$$\hat\beta^{(2)} = (Z^T Z)^{-1} Z^T y.$$
Furthermore, $\hat\beta^{(2)} \sim N(\beta^{(2)}, \sigma^2_2 (Z^T Z)^{-1})$. The MLE of $\sigma^2_2$,
$$\tilde\sigma^2_2 = (y - Z\hat\beta^{(2)})^T (y - Z\hat\beta^{(2)})/n,$$
is biased; an unbiased estimator is
$$\hat\sigma^2_2 = (y - Z\hat\beta^{(2)})^T (y - Z\hat\beta^{(2)})/(n - k).$$
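
These estimators can be computed directly from the normal equations; a sketch, again assuming the strength and adjusted vectors are loaded as above:

## Sketch: MLE for model M2 computed from the normal equations.
y <- strength
Z <- cbind(1, adjusted)                     # n x 2 design matrix
n <- length(y); k <- ncol(Z)
beta.hat   <- solve(t(Z) %*% Z, t(Z) %*% y) # solves Z'Z b = Z'y
res        <- y - drop(Z %*% beta.hat)
sigma2.mle <- sum(res^2) / n                # biased MLE
sigma2.hat <- sum(res^2) / (n - k)          # unbiased estimator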

Computing the LSE

The goal is to find $\beta$ such that $\|y - X\beta\|$ is minimized. We obtain the QR decomposition of $X$: $X = QR$, where $Q$ is an orthogonal matrix ($Q'Q = I$) and $R$ is a rectangular matrix with nonzero entries only in its upper triangle. Then
$$\|y - X\beta\| = \|Q'y - Q'QR\beta\| = \|Q'y - R\beta\|.$$
Write $Q = (Q_1, Q_2)$, where $Q_1$ corresponds to the first $k$ columns of $Q$. Then
$$\|Q'y - R\beta\|^2 = \|Q_1'y - R\beta\|^2 + \|Q_2'y\|^2,$$
where in the first term on the right $R$ denotes its upper $k \times k$ triangular block. Thus, the solution to the LSE problem is given by $Q_1'y = R\hat\beta$. The residual sum of squares is $\|Q_2'y\|^2$.
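
In R this computation is available through qr() and its helper functions; a sketch using the assumed pine data vectors:

## Sketch: least squares via the QR decomposition.
X  <- cbind(1, adjusted)
qx <- qr(X)                                 # QR factorization of X
k  <- ncol(X)
R  <- qr.R(qx)                              # k x k upper triangular factor
Q1ty     <- qr.qty(qx, strength)[1:k]       # Q_1' y (first k elements of Q'y)
beta.hat <- backsolve(R, Q1ty)              # solve R beta.hat = Q_1' y
rss      <- sum(qr.resid(qx, strength)^2)   # ||Q_2' y||^2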

Fitting the linear regression in R

> pines.linear <- lm(strength ~ adjusted)
> summary(pines.linear)

Call:
lm(formula = strength ~ adjusted)

Residuals:
     Min       1Q   Median       3Q      Max
-623.907 -188.821    4.951  197.334  619.691

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -1917.639    252.874  -7.583 2.93e-09 ***
adjusted      183.273      9.304  19.698  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 276.9 on 40 degrees of freedom

Distributions

$\hat\beta \sim N(\beta, \sigma^2 (X^TX)^{-1})$. This justifies the following $100(1-\alpha)\%$ C.I. for the regression coefficient $\beta_i$:
$$\hat\beta_i \pm t_{\alpha/2,\, n-k}\, \hat\sigma \sqrt{(X^TX)^{-1}_{ii}}.$$
A 95% C.I. for $\beta^{(2)}_2$ is given by $(164.5, 202.1)$.

We can test the hypothesis $H_0: \beta_i = 0$ vs $H_1: \beta_i \neq 0$ for each $\beta_i$. The test statistic is
$$t = \frac{\hat\beta_i}{\hat\sigma \sqrt{(X^TX)^{-1}_{ii}}},$$
which follows a $t_{n-k}$ distribution under $H_0$.
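
These intervals and test statistics can be read off the fit from the previous slide; a sketch using the pines.linear object defined above:

## Sketch: interval estimates and t statistics from the lm fit.
confint(pines.linear, level = 0.95)   # 95% C.I.s; the slope interval is about (164.5, 202.1)
summary(pines.linear)$coefficients    # estimates, standard errors, t values, p-values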

F Test

When comparing two nested models we can use the F test. Let $X_0$ and $X_1$ denote the corresponding design matrices, with $q$ and $p$ columns respectively ($q < p$), and let $\hat\beta_0$, $\hat\beta_1$ be the corresponding LSEs. If $H_0$ (the smaller model) is correct, then
$$f = \frac{(\hat\beta_1^T X_1^T y - \hat\beta_0^T X_0^T y)/(p - q)}{(y^T y - \hat\beta_1^T X_1^T y)/(n - p)} \sim F_{p-q,\, n-p}.$$
Therefore, values of $f$ that are large relative to the $F_{p-q,\, n-p}$ distribution provide evidence against $H_0$.
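
In R the same comparison can be done with anova(); a sketch comparing the intercept-only model against $M_2$, with the pine data vectors assumed as before:

## Sketch: F test for nested models.
fit0 <- lm(strength ~ 1)          # smaller model (q = 1 column)
fit1 <- lm(strength ~ adjusted)   # larger model  (p = 2 columns)
anova(fit0, fit1)                 # reports f and its p-value under F_{p-q, n-p}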

Sufficient Statistics

The likelihood for a normal linear model is
$$f(y \mid \beta, \sigma^2, X) \propto \left(\frac{1}{\sigma^2}\right)^{n/2} \exp\left\{-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right\}.$$
We note that
$$(y - X\beta)'(y - X\beta) = (\beta - \hat\beta)'(X'X)(\beta - \hat\beta) + \|y - X\hat\beta\|^2,$$
where $\|y - X\hat\beta\|^2 = (n-k)\hat\sigma^2$. So
$$f(y \mid \beta, \sigma^2, X) \propto \left(\frac{1}{\sigma^2}\right)^{n/2} \exp\left\{-\frac{1}{2\sigma^2}(\beta - \hat\beta)'(X'X)(\beta - \hat\beta)\right\} \exp\left\{-\frac{(n-k)\hat\sigma^2}{2\sigma^2}\right\},$$
and $\hat\beta$ and $\hat\sigma^2$ are sufficient statistics for $\beta$ and $\sigma^2$.
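
The sum-of-squares decomposition is easy to verify numerically; a sketch using the y, Z and beta.hat objects from the earlier sketches, evaluated at an arbitrary test value of $\beta$:

## Sketch: numerical check of the sum-of-squares decomposition at an arbitrary beta.
beta0 <- c(-2000, 190)                      # arbitrary test value
lhs <- sum((y - drop(Z %*% beta0))^2)
rhs <- drop(t(beta0 - beta.hat) %*% (t(Z) %*% Z) %*% (beta0 - beta.hat)) +
       sum((y - drop(Z %*% beta.hat))^2)
all.equal(lhs, rhs)                         # TRUE up to rounding error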

The Bayesian Approach

We consider the model
$$y \mid \beta, \sigma^2, X \sim N(X\beta, \sigma^2 I), \qquad p(\beta, \sigma^2 \mid X) \propto \sigma^{-2}.$$
Notice that this model conditions on $X$. A situation where the regressors are subject to error would require a prior distribution for $X$.

The posterior distribution factors as
$$p(\beta, \sigma^2 \mid y) = p(\beta \mid \sigma^2, y)\, p(\sigma^2 \mid y).$$
Conditional posterior of $\beta$:
$$\beta \mid \sigma^2, y \sim N(\hat\beta, V_\beta \sigma^2), \quad \text{with } \hat\beta = (X^TX)^{-1}X^Ty \text{ and } V_\beta = (X^TX)^{-1}.$$

Marginal posterior of $\sigma^2$:
$$p(\sigma^2 \mid y) = \frac{p(\beta, \sigma^2 \mid y)}{p(\beta \mid \sigma^2, y)}, \qquad \sigma^2 \mid y \sim IG\!\left(\frac{n-k}{2}, \frac{(n-k)\hat\sigma^2}{2}\right).$$
Marginal posterior of $\beta$:
$$p(\beta \mid y) \propto \left(1 + \frac{(\beta - \hat\beta)^T X^T X (\beta - \hat\beta)}{(n-k)\hat\sigma^2}\right)^{-(n-k+k)/2},$$
which corresponds to a $k$-variate Student-$t$ distribution with $n-k$ degrees of freedom, location $\hat\beta$ and scale matrix $\hat\sigma^2 (X^TX)^{-1}$.

Checking that the posterior is proper: $p(\beta, \sigma^2 \mid y)$ is proper if
1. $n > k$;
2. the rank of $X$ equals $k$ (i.e. the columns of $X$ are linearly independent).

Sampling from the Posterior

1. Compute the QR factorization of $X$.
2. Obtain $\hat\beta$ as the solution of $Q_1'y = R\hat\beta$.
3. Obtain $\hat\sigma^2$ as $\|Q_2'y\|^2/(n-k)$.
4. Sample $\sigma^2 \sim IG((n-k)/2, (n-k)\hat\sigma^2/2)$.
5. Note that $X'X = R'Q'QR = R'R$, so $R$ is a Cholesky factor of $X'X$. Therefore, if $z \sim N_k(0, I)$, then $R^{-1}z \sim N_k(0, V_\beta)$. DON'T compute $R^{-1}$ explicitly: solve $R\beta = z$, then take $\sigma\beta + \hat\beta$.

To make the generation of $\beta$ more efficient, avoid computing $Q$ explicitly. Also, when operating with $R$, remember that it is an upper triangular matrix. See R routines like backsolve and qr.solve.
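
A sketch of this algorithm in R, under the noninformative prior above; the pine data vectors are assumed loaded as before and the number of draws is arbitrary.

## Sketch: posterior draws of (beta, sigma^2) under p(beta, sigma^2) propto 1/sigma^2.
n.sims <- 1000                                     # arbitrary number of draws
X <- cbind(1, adjusted); y <- strength
n <- nrow(X); k <- ncol(X)
qx <- qr(X)
R  <- qr.R(qx)                                     # upper triangular Cholesky factor of X'X
beta.hat   <- backsolve(R, qr.qty(qx, y)[1:k])     # solve R beta.hat = Q_1' y
sigma2.hat <- sum(qr.resid(qx, y)^2) / (n - k)     # ||Q_2' y||^2 / (n - k)

sigma2 <- 1 / rgamma(n.sims, (n - k) / 2, rate = (n - k) * sigma2.hat / 2)  # IG draws
beta   <- t(sapply(sigma2, function(s2) {
  z <- rnorm(k)                                    # z ~ N_k(0, I)
  drop(beta.hat + sqrt(s2) * backsolve(R, z))      # R^{-1} z ~ N_k(0, (X'X)^{-1}), shifted by beta.hat
}))
## Each row of `beta`, paired with the corresponding entry of `sigma2`, is one posterior draw.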