Module 4: Bayesian Methods Lecture 5: Linear regression


1/28 Module 4: Bayesian Methods, Lecture 5: Linear regression. Peter Hoff, Departments of Statistics and Biostatistics, University of Washington

2/28 Outline: The linear regression model

3/28 Regression models. How does an outcome Y vary as a function of x = {x_1, ..., x_p}?
- Which x_j's have an effect?
- What are the effect sizes?
- Can we predict Y as a function of x?

These questions can be assessed via a regression model p(y | x).

4/28 Regression data. Parameters in a regression model can be estimated from data:

y_1   x_{1,1} ... x_{1,p}
...
y_n   x_{n,1} ... x_{n,p}

These data are often expressed in matrix/vector form: y = (y_1, ..., y_n)^T is the n × 1 vector of outcomes, and X is the n × p matrix whose ith row x_i^T = (x_{i,1}, ..., x_{i,p}) holds subject i's predictors.
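As a concrete sketch in R (hypothetical numbers, just to fix ideas), y and X can be assembled as:

y <- c(2.1, 3.4, 4.0)       # n = 3 outcomes (made-up values)
X <- cbind(1, c(5, 7, 9))   # n x p design matrix: intercept column plus one predictor
dim(X)                      # 3 2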

5/28 FTO experiment. The FTO gene is hypothesized to be involved in growth and obesity.

Experimental design:
- 10 fto+/- mice
- 10 fto-/- mice
- Mice are sacrificed at the end of 1-5 weeks of age.
- Two mice in each group are sacrificed at each age.

6/28 FTO data. [Figure: weight (g) versus age (weeks) for the fto-/- and fto+/- groups.]

7/28 Data analysis

y = weight
x_g = fto heterozygote ∈ {0, 1} = number of + alleles
x_a = age in weeks ∈ {1, 2, 3, 4, 5}

How can we estimate p(y | x_g, x_a)?

Cell means model:

genotype   age 1     age 2     age 3     age 4     age 5
-/-        θ_{0,1}   θ_{0,2}   θ_{0,3}   θ_{0,4}   θ_{0,5}
+/-        θ_{1,1}   θ_{1,2}   θ_{1,3}   θ_{1,4}   θ_{1,5}

Problem: only two observations per cell.

8/28 Linear regression. Solution: assume smoothness as a function of age. For each group,

y = α_0 + α_1 x_a + ε.

This is a linear regression model. Linearity means linear in the parameters. We could also try the model

y = α_0 + α_1 x_a + α_2 x_a² + α_3 x_a³ + ε,

which is also a linear regression model.

9/28 Multiple linear regression. We can estimate the regressions for both groups simultaneously:

Y_i = β_1 x_{i,1} + β_2 x_{i,2} + β_3 x_{i,3} + β_4 x_{i,4} + ε_i,

where
- x_{i,1} = 1 for each subject i
- x_{i,2} = 0 if subject i is homozygous, 1 if heterozygous
- x_{i,3} = age of subject i
- x_{i,4} = x_{i,2} × x_{i,3}

Under this model,
E[Y | x] = β_1 + β_3 × age if x_2 = 0, and
E[Y | x] = (β_1 + β_2) + (β_3 + β_4) × age if x_2 = 1.
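For the FTO data this design matrix can be built directly. A sketch in R (the ordering of mice, two per genotype per age, is an assumption about the data layout):

xg <- rep(c(0, 1), each = 10)              # 0 = fto-/-, 1 = fto+/- (assumed ordering)
xa <- rep(rep(1:5, each = 2), times = 2)   # ages 1-5, two mice per genotype at each age
X  <- cbind(rep(1, 20), xg, xa, xg * xa)   # columns: intercept, genotype, age, interaction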

10/28 Multiple linear regression. [Figure, panels of weight versus age under submodels: β_3 = β_4 = 0; β_2 = β_4 = 0; β_4 = 0.]

11/28 Normal linear regression. How does each Y_i vary around E[Y_i | β, x_i]? Assumption of normal errors:

ε_1, ..., ε_n ~ i.i.d. normal(0, σ²),   Y_i = β^T x_i + ε_i.

This completely specifies the probability density of the data:

p(y_1, ..., y_n | x_1, ..., x_n, β, σ²)
  = ∏_{i=1}^n p(y_i | x_i, β, σ²)
  = (2πσ²)^{-n/2} exp{ -(1/(2σ²)) ∑_{i=1}^n (y_i - β^T x_i)² }.

12/28 Matrix form. Let y be the n-dimensional column vector (y_1, ..., y_n)^T, and let X be the n × p matrix whose ith row is x_i^T. Then the normal regression model is

{y | X, β, σ²} ~ multivariate normal(Xβ, σ² I),

where I is the n × n identity matrix and

Xβ = ( β_1 x_{1,1} + ... + β_p x_{1,p}, ..., β_1 x_{n,1} + ... + β_p x_{n,p} )^T = ( E[Y_1 | β, x_1], ..., E[Y_n | β, x_n] )^T.
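To make the sampling model concrete, one draw of y from mvn(Xβ, σ²I) can be simulated directly; a sketch with made-up parameter values (β and σ below are arbitrary, X is the design matrix built above):

beta  <- c(1, 0.5, 2, 0.3)                  # hypothetical coefficients
sigma <- 1                                  # hypothetical error sd
y.sim <- X %*% beta + rnorm(20, 0, sigma)   # one draw from mvn(X beta, sigma^2 I)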

13/28 Ordinary least squares estimation. What values of β are consistent with our data y, X? Recall

p(y_1, ..., y_n | x_1, ..., x_n, β, σ²) = (2πσ²)^{-n/2} exp{ -(1/(2σ²)) ∑_{i=1}^n (y_i - β^T x_i)² }.

This is big when SSR(β) = ∑ (y_i - β^T x_i)² is small. Expanding,

SSR(β) = ∑_{i=1}^n (y_i - β^T x_i)² = (y - Xβ)^T (y - Xβ) = y^T y - 2 β^T X^T y + β^T X^T X β.

What value of β makes this the smallest?

14/28 Calculus. Recall from calculus that
1. a minimum of a function g(z) occurs at a value z such that (d/dz) g(z) = 0;
2. the derivative of g(z) = az is a, and the derivative of g(z) = bz² is 2bz.

Therefore,

(d/dβ) SSR(β) = (d/dβ) ( y^T y - 2 β^T X^T y + β^T X^T X β ) = -2 X^T y + 2 X^T X β,

so

(d/dβ) SSR(β) = 0  ⇔  -2 X^T y + 2 X^T X β = 0  ⇔  X^T X β = X^T y  ⇔  β = (X^T X)^{-1} X^T y.

β̂_ols = (X^T X)^{-1} X^T y is the OLS estimator of β.

15/28 OLS estimation in R

### OLS estimate
beta.ols <- solve( t(X) %*% X ) %*% t(X) %*% y
c(beta.ols)
## [1] ...

### using lm
fit.ols <- lm(y ~ X[,2] + X[,3] + X[,4])
summary(fit.ols)$coef
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)      ...        ...     ...  ...e-01
## X[, 2]           ...        ...     ...  ...e-01
## X[, 3]           ...        ...     ...  ...e-06
## X[, 4]           ...        ...     ...  ...e-02

16/28 OLS estimation. [Figure: data and OLS fitted lines, weight versus age by genotype group, alongside the summary(fit.ols)$coef table from the previous slide.]

17/28 Bayesian inference for regression models

y_i = β_1 x_{i,1} + ... + β_p x_{i,p} + ε_i

Motivation:
- Posterior probability statements: Pr(β_j > 0 | y, X)
- OLS tends to overfit when p is large; Bayes is more conservative.
- Model selection and averaging
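Once posterior samples of β are available (the beta.post matrix simulated on a later slide), such probability statements reduce to Monte Carlo averages; a sketch:

mean(beta.post[, 2] > 0)   # MC approximation to Pr(beta_2 > 0 | y, X)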

18/28 Prior and posterior distribution

prior:           β ~ mvn(β_0, Σ_0)
sampling model:  y ~ mvn(Xβ, σ² I)
posterior:       β | y, X ~ mvn(β_n, Σ_n)

where

Σ_n = Var[β | y, X, σ²] = (Σ_0^{-1} + X^T X / σ²)^{-1}
β_n = E[β | y, X, σ²] = (Σ_0^{-1} + X^T X / σ²)^{-1} (Σ_0^{-1} β_0 + X^T y / σ²).

Notice:
- If Σ_0^{-1} ≪ X^T X / σ², then β_n ≈ β̂_ols.
- If Σ_0^{-1} ≫ X^T X / σ², then β_n ≈ β_0.
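A direct transcription of these formulas into R; a sketch assuming a prior mean beta0, a prior covariance Sigma0, and a fixed value sigma2 for σ² (all hypothetical names):

iSigma0 <- solve(Sigma0)                                            # prior precision
Sigma.n <- solve( iSigma0 + t(X) %*% X / sigma2 )                   # posterior variance
beta.n  <- Sigma.n %*% ( iSigma0 %*% beta0 + t(X) %*% y / sigma2 )  # posterior mean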

19/28 The g-prior. How do we pick β_0, Σ_0? The g-prior:

β ~ mvn(0, g σ² (X^T X)^{-1})

Idea: the variance of the OLS estimate β̂_ols is

Var[β̂_ols] = σ² (X^T X)^{-1} = (σ²/n) (X^T X / n)^{-1},

which is roughly the uncertainty in β from n observations. Under the g-prior,

Var[β]_gprior = g σ² (X^T X)^{-1} = (σ²/(n/g)) (X^T X / n)^{-1},

so the g-prior can roughly be viewed as the uncertainty from n/g observations. For example, g = n means the prior has the same amount of information as one observation.

20/28 Posterior distributions under the g-prior

{β | y, X, σ²} ~ mvn(β_n, Σ_n)
Σ_n = Var[β | y, X, σ²] = (g/(g+1)) σ² (X^T X)^{-1}
β_n = E[β | y, X, σ²] = (g/(g+1)) (X^T X)^{-1} X^T y

Notes:
- The posterior mean estimate β_n is simply (g/(g+1)) β̂_ols.
- The posterior variance of β is simply (g/(g+1)) Var[β̂_ols].
- g shrinks the coefficients toward zero and can prevent overfitting to the data.
- If g = n, then as n increases, inference approximates that using β̂_ols.
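These closed forms are two lines of R; a sketch reusing beta.ols from slide 15 and assuming a value sigma2 for σ²:

g <- 20                                             # e.g. g = n for the FTO data
beta.n  <- (g/(g+1)) * beta.ols                     # posterior mean: shrunken OLS estimate
Sigma.n <- (g/(g+1)) * sigma2 * solve(t(X) %*% X)   # posterior variance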

21/28 Monte Carlo simulation. What about the error variance σ²?

prior:           1/σ² ~ gamma(ν_0/2, ν_0 σ_0²/2)
sampling model:  y ~ mvn(Xβ, σ² I)
posterior:       1/σ² | y, X ~ gamma([ν_0 + n]/2, [ν_0 σ_0² + SSR_g]/2)

where SSR_g is somewhat complicated.

Simulating from the joint posterior distribution:

joint distribution:  p(σ², β | y, X) = p(σ² | y, X) × p(β | y, X, σ²)
simulation:          {σ², β} ~ p(σ², β | y, X)  ⇔  σ² ~ p(σ² | y, X), then β ~ p(β | y, X, σ²)

To simulate {σ², β} ~ p(σ², β | y, X):
1. First simulate σ² from p(σ² | y, X).
2. Use this σ² to simulate β from p(β | y, X, σ²).

Repeat thousands of times to obtain MC samples: {σ², β}^(1), ..., {σ², β}^(S).

22/28 FTO example

Priors:
1/σ² ~ gamma(1/2, σ_0²/2)   (i.e. ν_0 = 1)
β | σ² ~ mvn(0, g σ² (X^T X)^{-1})   with g = n = 20

Posteriors:
{1/σ² | y, X} ~ gamma((1 + 20)/2, (σ_0² + SSR_g)/2)
{β | y, X, σ²} ~ mvn(0.952 β̂_ols, 0.952 σ² (X^T X)^{-1})   (0.952 = g/(g+1) = 20/21)

where (X^T X)^{-1} = ... and β̂_ols = ...

23/28 R code

## data dimensions
n <- dim(X)[1] ; p <- dim(X)[2]

## prior parameters
nu0 <- 1
s20 <- summary(lm(y ~ -1 + X))$sigma^2
g <- n

## posterior calculations
Hg   <- (g/(g+1)) * X %*% solve(t(X) %*% X) %*% t(X)
SSRg <- t(y) %*% ( diag(1, nrow = n) - Hg ) %*% y

Vbeta <- g * solve(t(X) %*% X) / (g+1)
Ebeta <- Vbeta %*% t(X) %*% y

## simulate sigma^2 and beta (rmvnorm is from the mvtnorm package)
library(mvtnorm)
s2.post <- beta.post <- NULL
for(s in 1:5000)
{
  s2.post   <- c(s2.post, 1/rgamma(1, (nu0 + n)/2, (nu0*s20 + SSRg)/2))
  beta.post <- rbind(beta.post, rmvnorm(1, Ebeta, s2.post[s]*Vbeta))
}

24/28 MC approximation to posterior

s2.post[1:5]
## [1] ...

beta.post[1:5,]
##      [,1] [,2] [,3] [,4]
## [1,]  ...  ...  ...  ...
## [2,]  ...  ...  ...  ...
## [3,]  ...  ...  ...  ...
## [4,]  ...  ...  ...  ...
## [5,]  ...  ...  ...  ...

25/28 MC approximation to posterior

quantile(s2.post, probs = c(.025, .5, .975))
## 2.5%  50%  97.5%
##  ...  ...    ...

quantile(sqrt(s2.post), probs = c(.025, .5, .975))
## 2.5%  50%  97.5%
##  ...  ...    ...

apply(beta.post, 2, quantile, probs = c(.025, .5, .975))
##       [,1] [,2] [,3] [,4]
## 2.5%   ...  ...  ...  ...
## 50%    ...  ...  ...  ...
## 97.5%  ...  ...  ...  ...

26/28 OLS/Bayes comparison

apply(beta.post, 2, mean)
## [1] ...

apply(beta.post, 2, sd)
## [1] ...

summary(fit.ols)$coef
##      Estimate Std. Error t value Pr(>|t|)
## X         ...        ...     ...  ...e-01
## Xxg       ...        ...     ...  ...e-01
## Xxa       ...        ...     ...  ...e-06
## X         ...        ...     ...  ...e-02

27/28 Posterior distributions. [Figure: marginal and joint posterior distributions of β_2 and β_4.]

28/28 Summarizing the genetic effect

Genetic effect = E[y | age, +/-] - E[y | age, -/-]
              = [(β_1 + β_2) + (β_3 + β_4) × age] - [β_1 + β_3 × age]
              = β_2 + β_4 × age.

[Figure: posterior distribution of the genetic effect β_2 + β_4 × age as a function of age.]
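Given the MC samples in beta.post, the posterior of β_2 + β_4 × age at each age is a simple transformation of the draws; a sketch:

## posterior quantiles of the genetic effect beta_2 + beta_4 * age, for ages 1-5
ages <- 1:5
effect.post <- sapply(ages, function(a) beta.post[,2] + beta.post[,4]*a)   # S x 5 matrix
apply(effect.post, 2, quantile, probs = c(.025, .5, .975))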
