Linear Regression

How does a quantity y vary as a function of another quantity, or vector of quantities, x? We are interested in p(y | θ, x) under a model in which the n observations (x_i, y_i) are exchangeable.

NOTATION. y (continuous) is the response or outcome variable; x = (x_1, ..., x_k) (discrete or continuous) are the explanatory variables. We will denote by y = (y_1, ..., y_n) the vector of outcomes and by X the n × k matrix of explanatory variables.
Linear Regression

The normal linear model is a model in which the distribution of y | X is normal, with mean a linear function of X:

E(y_i | β, X) = β_1 x_{i1} + ... + β_k x_{ik},   i = 1:n.

Usually x_{i1} = 1. In matrix notation we write

y = Xβ + ε,   where y ∈ R^n, X ∈ R^{n×k} and ε ∼ N_n(0, σ² I).
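To make the model concrete, here is a minimal R sketch (not from the slides; the sample size, coefficients and noise level are purely illustrative assumptions) that simulates data from a normal linear model with an intercept and one covariate:

set.seed(1)
n <- 42; k <- 2
X <- cbind(1, runif(n, 20, 35))        # design matrix; first column is x_{i1} = 1
beta <- c(-1900, 180)                  # assumed "true" coefficients (illustrative)
sigma <- 280                           # assumed residual standard deviation
y <- as.vector(X %*% beta + rnorm(n, 0, sigma))   # y | beta, X ~ N(X beta, sigma^2 I)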
Example

42 specimens of radiata pine (Carlin & Chib, 1995 and Williams, 1995). For each specimen the maximum compressive strength y_i was measured, together with its density x_i and its density adjusted for resin content z_i.

[Figure: two scatterplots of strength (1500–4500) against density and against adjusted density (20–35).]
Two models can be considered in this case:

M_1: E(y_i | β^(1), X) = β^(1)_1 + β^(1)_2 x_i
M_2: E(y_i | β^(2), Z) = β^(2)_1 + β^(2)_2 z_i

For model M_1: n = 42, k = 2, x_{i1} = 1, x_{i2} = x_i, β_1 = β^(1)_1 and β_2 = β^(1)_2.
For model M_2: n = 42, k = 2, x_{i1} = 1, x_{i2} = z_i, β_1 = β^(2)_1 and β_2 = β^(2)_2.
Classical Regression

Consider M_2. If y_i ∼ N(β^(2)_1 + β^(2)_2 z_i, σ²_2), the maximum likelihood estimator of β^(2) is given by the solution of the normal equations Z^T Z β^(2) = Z^T y, i.e.

β̂^(2) = (Z^T Z)^{-1} Z^T y.

Furthermore, β̂^(2) ∼ N(β^(2), σ²_2 (Z^T Z)^{-1}). The MLE of σ²_2 is

σ̃²_2 = (y − Z β̂^(2))^T (y − Z β̂^(2)) / n;

however, this estimator is biased, so an unbiased estimator is given by

σ̂²_2 = (y − Z β̂^(2))^T (y − Z β̂^(2)) / (n − k).
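These formulas can be evaluated directly in R. The sketch below is mine, and it assumes the pine data are available as the vectors strength (y) and adjusted (z), the same objects used later with lm:

Z <- cbind(1, adjusted)                              # design matrix for M_2
beta2_hat <- solve(t(Z) %*% Z, t(Z) %*% strength)    # solves Z'Z beta = Z'y
res <- strength - Z %*% beta2_hat
n <- length(strength); k <- ncol(Z)
sigma2_mle      <- sum(res^2) / n                    # MLE of sigma^2_2 (biased)
sigma2_unbiased <- sum(res^2) / (n - k)              # unbiased estimator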
Computing the LSE

The goal is to find β that minimizes ‖y − Xβ‖. We obtain the QR decomposition of X: X = QR, where Q is an orthogonal matrix (Q^T Q = I) and R is a rectangular matrix whose only nonzero entries lie in its upper triangle. Then, since Q is orthogonal,

‖y − Xβ‖ = ‖Q^T y − Q^T Q R β‖ = ‖Q^T y − Rβ‖.

Write Q = (Q_1, Q_2), where Q_1 corresponds to the first k columns of Q, and let R_1 denote the upper k × k triangular block of R. Then

‖Q^T y − Rβ‖² = ‖Q_1^T y − R_1 β‖² + ‖Q_2^T y‖².

Thus the solution to the LSE problem is given by Q_1^T y = R_1 β̂, and the residual sum of squares is ‖Q_2^T y‖².
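In R this computation can be sketched with the built-in QR routines, continuing the previous sketch (Z = cbind(1, adjusted)); qr.qty applies Q^T to y without forming Q:

qrZ <- qr(Z)                              # QR factorization of the design matrix Z
Qty <- qr.qty(qrZ, strength)              # Q^T y, computed without forming Q explicitly
R1  <- qr.R(qrZ)                          # upper-triangular k x k block R_1
beta_hat <- backsolve(R1, Qty[1:ncol(Z)]) # solve R_1 beta = Q_1^T y
rss      <- sum(Qty[-(1:ncol(Z))]^2)      # residual sum of squares ||Q_2^T y||^2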
Fitting the linear regression in R

> pines.linear <- lm(strength ~ adjusted)
> summary(pines.linear)

Call:
lm(formula = strength ~ adjusted)

Residuals:
     Min       1Q   Median       3Q      Max
-623.907 -188.821    4.951  197.334  619.691

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -1917.639    252.874  -7.583 2.93e-09 ***
adjusted      183.273      9.304  19.698  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 276.9 on 40 degrees of freedom
Distributions

β̂ ∼ N(β, σ² (X^T X)^{-1}). This justifies the following 100(1 − α)% C.I. for the regression coefficient β_i:

β̂_i ± t_{α/2, n−k} σ̂ √((X^T X)^{-1}_{ii}).

A 95% C.I. for β^(2)_2 is given by (164.5, 202.1).

We can test the hypotheses H_0: β_i = 0 vs H_1: β_i ≠ 0 for each β_i. The test statistic is

t = β̂_i / (σ̂ √((X^T X)^{-1}_{ii})),

which follows a t_{n−k} distribution under H_0.
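Both the interval and the t statistics can be read off the fitted model from the previous slide (a short sketch using base R functions):

confint(pines.linear, level = 0.95)    # the "adjusted" row gives the 95% C.I. for beta_2^(2)
summary(pines.linear)$coefficients     # estimates, standard errors, t values and p-values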
F Test

When comparing two nested models we can use the F test. Let X_0 and X_1 denote the corresponding design matrices, with q and p columns respectively (q < p), and let β̂_0, β̂_1 be the LSEs. If H_0 (the smaller model) is correct, then

f = [(β̂_1^T X_1^T y − β̂_0^T X_0^T y)/(p − q)] / [(y^T y − β̂_1^T X_1^T y)/(n − p)] ∼ F_{p−q, n−p}.

Therefore, values of f that are large relative to the F_{p−q, n−p} distribution provide evidence against H_0.
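In R the nested comparison can be sketched with anova(); here the intercept-only model plays the role of X_0 and M_2 that of X_1 (an illustrative choice, not the only possible one):

fit0 <- lm(strength ~ 1)           # smaller model, design matrix X_0 (q = 1)
fit1 <- lm(strength ~ adjusted)    # larger model, design matrix X_1 (p = 2)
anova(fit0, fit1)                  # reports the f statistic and its F_{p-q, n-p} p-value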
Sufficient Statistics

The likelihood for a normal linear model is given by

f(y | β, σ², X) ∝ (1/σ²)^{n/2} exp{ −(1/(2σ²)) (y − Xβ)^T (y − Xβ) }.

We note that

(y − Xβ)^T (y − Xβ) = (β − β̂)^T (X^T X) (β − β̂) + ‖y − X β̂‖²,

and ‖y − X β̂‖² = (n − k) σ̂². So

f(y | β, σ², X) ∝ (1/σ²)^{n/2} exp{ −(1/(2σ²)) (β − β̂)^T (X^T X) (β − β̂) } exp{ −(n − k) σ̂² / (2σ²) },

and β̂ and σ̂² are sufficient statistics for β and σ².
The Bayesian Approach

We consider the model

y | β, σ², X ∼ N(Xβ, σ² I),    p(β, σ² | X) ∝ σ^{-2}.

Notice that this model conditions on X; the situation where the regressors are themselves subject to error would require a prior distribution for X.

The posterior distribution factors as

p(β, σ² | y) = p(β | σ², y) p(σ² | y).

Conditional posterior of β:

β | σ², y ∼ N(β̂, σ² V_β),    with β̂ = (X^T X)^{-1} X^T y and V_β = (X^T X)^{-1}.
Marginal posterior of σ²:

p(σ² | y) = p(β, σ² | y) / p(β | σ², y),    σ² | y ∼ IG((n − k)/2, (n − k) σ̂²/2).

Marginal posterior of β:

p(β | y) ∝ ( 1 + (β − β̂)^T X^T X (β − β̂) / ((n − k) σ̂²) )^{−(n−k+k)/2},

which corresponds to a k-variate Student-t distribution with n − k degrees of freedom, location β̂ and scale matrix σ̂² (X^T X)^{-1}.

Checking that the posterior is proper: p(β, σ² | y) is proper if
1. n > k;
2. the rank of X equals k (i.e. the columns of X are linearly independent).
Sampling from the Posterior

1. Compute the QR factorization of X.
2. Obtain β̂ as the solution of Q_1^T y = R_1 β̂.
3. Obtain σ̂² as ‖Q_2^T y‖² / (n − k).
4. Sample σ² ∼ IG((n − k)/2, (n − k) σ̂²/2).
5. Note that X^T X = R^T Q^T Q R = R^T R = R_1^T R_1, so R_1 is a Cholesky factor of X^T X. Hence, if z ∼ N_k(0, I) then R_1^{-1} z ∼ N_k(0, V_β). DON'T compute R_1^{-1} explicitly: solve R_1 b = z, then set β = β̂ + σ b, where σ is the square root of the value sampled in step 4.

To make the generation of β more efficient, avoid computing Q explicitly. Also, when operating with R_1, remember that it is an upper triangular matrix. See R routines such as backsolve and qr.solve.
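A compact R sketch of these steps (not the original course code; the function and variable names are mine) might look as follows:

sample_posterior <- function(X, y, ndraws = 1000) {
  n <- nrow(X); k <- ncol(X)
  qrX <- qr(X)                                 # step 1: QR factorization of X
  Qty <- qr.qty(qrX, y)                        # Q^T y without forming Q
  R1  <- qr.R(qrX)                             # upper-triangular Cholesky factor of X^T X
  beta_hat <- backsolve(R1, Qty[1:k])          # step 2: solve R_1 beta_hat = Q_1^T y
  s2 <- sum(Qty[-(1:k)]^2) / (n - k)           # step 3: sigma_hat^2 = ||Q_2^T y||^2 / (n - k)
  draws <- matrix(NA_real_, ndraws, k + 1)
  for (i in 1:ndraws) {
    sigma2 <- (n - k) * s2 / rchisq(1, df = n - k)   # step 4: draw from IG((n-k)/2, (n-k)s2/2)
    b <- backsolve(R1, rnorm(k))                     # step 5: R_1^{-1} z without forming R_1^{-1}
    draws[i, ] <- c(beta_hat + sqrt(sigma2) * b, sigma2)
  }
  colnames(draws) <- c(paste0("beta", 1:k), "sigma2")
  draws
}
# e.g. draws <- sample_posterior(cbind(1, adjusted), strength)

The scaled inverse chi-squared draw (n − k) σ̂² / χ²_{n−k} used above is the same distribution as the IG((n − k)/2, (n − k) σ̂²/2) of step 4.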