Bayesian Linear Models

1 Eric F. Lock UMN Division of Biostatistics, SPH 03/07/2018

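2 Linear model

For observations $y_1, \dots, y_n$, the basic linear model is

$$y_i = x_{1i}\beta_1 + \cdots + x_{pi}\beta_p + \epsilon_i,$$

where $x_{1i}, \dots, x_{pi}$ are the predictors for the $i$th observation and the $\epsilon_i$ are error terms.

In matrix form:

$$y = X\beta + \epsilon,$$

with $y = (y_1, \dots, y_n)^T$, $\epsilon = (\epsilon_1, \dots, \epsilon_n)^T$, $\beta = (\beta_1, \dots, \beta_p)^T$, and $X$ the $n \times p$ matrix with entries $X_{ij} = x_{ji}$ (predictor $j$ for observation $i$).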

3 Linear model

Assume $X$ is fixed (non-random).

Assume the errors are normal and iid with equal variance: $\epsilon \sim \mathrm{Normal}(0, \sigma^2 I)$.

Standard frequentist estimates are

$$\hat\beta = (X^T X)^{-1} X^T y \quad \text{and} \quad \hat\sigma^2 = s^2 = \frac{1}{n-p}\,(y - X\hat\beta)^T (y - X\hat\beta).$$

These estimates are unbiased and can be motivated by least squares.

Under a Bayesian framework, we put a prior on $\beta$ and $\sigma^2$.
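The frequentist estimates above can be computed in a few lines. The following is a minimal sketch in Python with NumPy; the function name `ols_fit` and the synthetic data are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def ols_fit(X, y):
    """Return (beta_hat, s2) for the model y = X beta + eps."""
    n, p = X.shape
    # Solve the least-squares problem directly instead of forming (X^T X)^{-1}.
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - p)  # unbiased variance estimate s^2
    return beta_hat, s2

# Synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

beta_hat, s2 = ols_fit(X, y)
```

Using `np.linalg.lstsq` rather than an explicit matrix inverse is a numerical-stability choice; both give the same $\hat\beta$ when $X^T X$ is well conditioned.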

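4 Uninformative priors

Consider a uniform prior for $\beta$ and the Jeffreys prior for $\sigma^2$:

$$\pi(\beta, \sigma^2) \propto \frac{1}{\sigma^2}.$$

The posterior for $\beta$, given $\sigma^2$, is

$$p(\beta \mid y, \sigma^2) = \mathrm{Normal}\left(\hat\beta,\; \sigma^2 (X^T X)^{-1}\right).$$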

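5 Uninformative priors

The marginal posterior of $\sigma^2$ is

$$p(\sigma^2 \mid y) = \mathrm{IG}\left(\frac{n-p}{2},\; \frac{(n-p)s^2}{2}\right).$$

Equivalently:

$$\sigma^2 \mid y \sim \frac{(n-p)s^2}{U}, \quad \text{where } U \sim \chi^2(n-p).$$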
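The conditional posterior for $\beta$ and the marginal posterior for $\sigma^2$ together give a direct way to draw from the joint posterior: draw $\sigma^2$ first, then $\beta$ given $\sigma^2$. The sketch below assumes NumPy; the function name `sample_posterior` and the synthetic data are hypothetical.

```python
import numpy as np

def sample_posterior(X, y, n_draws, rng):
    """Draw (beta, sigma^2) from the posterior under pi(beta, sigma^2) ∝ 1/sigma^2."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - p)
    # sigma^2 | y is distributed as (n - p) s^2 / U with U ~ chi^2(n - p).
    sigma2 = (n - p) * s2 / rng.chisquare(n - p, size=n_draws)
    # beta | y, sigma^2 ~ Normal(beta_hat, sigma^2 (X^T X)^{-1}).
    L = np.linalg.cholesky(XtX_inv)  # L @ L.T equals (X^T X)^{-1}
    z = rng.normal(size=(n_draws, p))
    beta = beta_hat + np.sqrt(sigma2)[:, None] * (z @ L.T)
    return beta, sigma2

# Synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=40)

beta_draws, sigma2_draws = sample_posterior(X, y, n_draws=20000, rng=rng)
```

Column-wise quantiles of `beta_draws` then approximate credible intervals for each coefficient.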

6 Uninformative priors

The marginal posterior for $\beta_i$ is a location-scale (shifted and scaled) t-distribution:

$$\frac{\beta_i - \hat\beta_i}{s\sqrt{(X^T X)^{-1}_{ii}}} \sim t_{n-p}.$$

For a new predictor vector $x_{n+1}$, the posterior predictive for $y_{n+1}$ is also a location-scale t-distribution:

$$\frac{y_{n+1} - x_{n+1}\hat\beta}{s\sqrt{1 + x_{n+1}(X^T X)^{-1}x_{n+1}^T}} \sim t_{n-p}.$$

All given results for $\pi(\beta, \sigma^2) \propto 1/\sigma^2$ correspond to standard frequentist inference for linear regression!
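The marginal t result translates directly into interval estimates. As a worked step, writing $t_{n-p,\,0.975}$ for the 0.975 quantile of the $t_{n-p}$ distribution (notation introduced here, not on the slides), a 95% equal-tailed credible interval for $\beta_i$ follows by inverting the standardized quantity:

```latex
\[
P\!\left( -t_{n-p,\,0.975} \le
  \frac{\beta_i - \hat\beta_i}{s\sqrt{(X^T X)^{-1}_{ii}}}
  \le t_{n-p,\,0.975} \,\middle|\, y \right) = 0.95
\quad\Longrightarrow\quad
\beta_i \in \hat\beta_i \pm t_{n-p,\,0.975}\; s\sqrt{(X^T X)^{-1}_{ii}} .
\]
```

This has the same form as the classical 95% confidence interval, which is the sense in which the uninformative prior reproduces frequentist inference.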

7 Example: Body Fat

The % body fat (BF%) is measured for 100 adult males [1], using a sophisticated and precise technique (water immersion).

Also measure the following for each person:
1: Age (in years)
2: Weight (in pounds)
3: Height (in inches)
Circumference of the neck (4), chest (5), abdomen (6), ankle (7), bicep (8), and wrist (9), in cm.

Data available at [URL missing in source].

Would like to predict BF% from the 9 additional measurements.

[1] Johnson, R. Fitting Percentage Body Fat to Simple Body Measurements, Journal of Statistics Education, 1996.

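8 Example: Body Fat

Let $\tilde y = (\tilde y_1, \dots, \tilde y_{100})$ give BF% for subjects $1, \dots, 100$.

Sample mean of $\tilde y$: 18.6%. Sample standard deviation: $s_{\tilde y} = 8.01\%$.

Let $X$ be the matrix of standardized predictors,

$$X_{i,j} = \frac{\tilde x_{i,j} - \mathrm{mean}(\tilde x_{\cdot,j})}{\mathrm{stdev}(\tilde x_{\cdot,j})},$$

where $\tilde x_{i,j}$ is measurement $j$ (unstandardized) for subject $i$.

The mean BF% for American adult men is 18.5%.

For $y = \tilde y - 18.5$, consider the model

$$y = X\beta + \epsilon.$$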
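The preprocessing described here (standardize each predictor column, center the response at 18.5) can be sketched as follows. The numbers are made up for illustration and are not the body fat data; using the sample standard deviation (denominator $n-1$) is an assumption, since the slides do not say which version of `stdev` was used.

```python
import numpy as np

def standardize_columns(X_raw):
    """Center each column at its mean and scale by its standard deviation."""
    mean = X_raw.mean(axis=0)
    sd = X_raw.std(axis=0, ddof=1)  # assumption: sample sd (n - 1 denominator)
    return (X_raw - mean) / sd

# Hypothetical measurements (age, weight, height) for four subjects.
X_raw = np.array([
    [30.0, 160.0, 68.0],
    [45.0, 180.0, 70.0],
    [52.0, 200.0, 72.0],
    [38.0, 175.0, 66.0],
])
y_tilde = np.array([15.0, 20.0, 25.0, 18.0])  # hypothetical BF% values

X = standardize_columns(X_raw)
y = y_tilde - 18.5  # center at the population mean quoted on the slide
```

Because the standardized columns have mean zero and the response is centered at a fixed reference value, the model is fit without an intercept column.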

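9 Example: Body Fat

Assume $\epsilon \sim \mathrm{Normal}(0, \sigma^2 I)$.

Use the uninformative prior $\pi(\beta, \sigma^2) \propto 1/\sigma^2$.

Recall that $p(\beta_i \mid y)$ is a location-scale t:

$$\frac{\beta_i - \hat\beta_i}{s\sqrt{(X^T X)^{-1}_{ii}}} \sim t_{91},$$

where

$$\hat\beta = (X^T X)^{-1} X^T y \quad \text{and} \quad s = \sqrt{\tfrac{1}{91}\,\lVert y - X\hat\beta \rVert^2} = 4.11.$$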

10 Example: Body Fat

Estimates and 95% credible intervals for the $\beta_i$'s:

Variable   $\hat\beta_i$   95% credible interval
Age        [missing]       (-0.186, 2.099)
Weight     [missing]       (-7.397, 2.480)
Height     [missing]       (-1.328, 1.523)
Neck       [missing]       (-1.727, 1.732)
Chest      [missing]       (-3.889, 1.526)
Abdomen    [missing]       (7.639, [missing])
Ankle      [missing]       (-1.137, 1.745)
Biceps     [missing]       (-0.935, 1.844)
Wrist      [missing]       (-3.807, [missing])

R code: Models_Rcode1.r

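11 Example: Body Fat

Recall

$$p(\sigma^2 \mid y) = \mathrm{IG}\left(\frac{91}{2},\; \frac{91\,s^2}{2}\right).$$

[Figure: posterior density of $\sigma^2$, with $s^2$ and $E(\sigma^2 \mid y)$ marked. Axes: sigma^2 (horizontal), density (vertical).]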

12 Variance estimate, uninformative priors

Note that for the uninformative prior $\pi(\beta, \sigma^2) \propto 1/\sigma^2$,

$$E(\sigma^2 \mid y) = \frac{s^2\,(n-p)}{n-p-2}.$$

However, the expected precision is

$$E(1/\sigma^2 \mid y) = \frac{1}{s^2}.$$

$s^2$ is still commonly used as a point estimate for the error variance.
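Both expectations follow from the inverse-gamma posterior $\mathrm{IG}(a, b)$ with $a = (n-p)/2$ and $b = (n-p)s^2/2$: the mean of $\sigma^2$ is $b/(a-1)$ and the mean of $1/\sigma^2$ is $a/b$. A small arithmetic check, using the values $n = 100$, $p = 9$, and $s = 4.11$ from the body fat example (variable names are illustrative):

```python
# Values taken from the body fat example on the slides.
n, p = 100, 9
s = 4.11

df = n - p          # 91 degrees of freedom
s2 = s ** 2         # about 16.89

# Inverse-gamma posterior parameters for sigma^2.
a = df / 2
b = df * s2 / 2

post_mean_var = b / (a - 1)     # E(sigma^2 | y) = s^2 (n - p) / (n - p - 2), about 17.27
post_mean_precision = a / b     # E(1 / sigma^2 | y) = 1 / s^2
```

The posterior mean of $\sigma^2$ is slightly larger than $s^2$, and the gap shrinks as $n - p$ grows.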

13 Residuals

Recall: we defined the Bayesian residual as

$$r_i = y_i - E(Y_i \mid y_{(i)}), \quad \text{where } y_{(i)} = (y_1, \dots, y_{i-1}, y_{i+1}, \dots, y_n).$$

For this context, the Bayesian residual is

$$r_i = y_i - x_i \hat\beta_{(i)}, \quad \text{where } \hat\beta_{(i)} = (X_{(i)}^T X_{(i)})^{-1} X_{(i)}^T y_{(i)}.$$

The standard (non-Bayesian) definition of the residual is

$$r_i = y_i - x_i \hat\beta.$$
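The two residual definitions differ only in whether observation $i$ is used when fitting. A minimal leave-one-out sketch, assuming NumPy and synthetic data (the function names are hypothetical):

```python
import numpy as np

def standard_residuals(X, y):
    """r_i = y_i - x_i beta_hat, with beta_hat fit on all observations."""
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    return y - X @ beta_hat

def bayesian_residuals(X, y):
    """r_i = y_i - x_i beta_hat_(i), with beta_hat_(i) fit without observation i."""
    n = X.shape[0]
    r = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i          # drop row i
        X_i, y_i = X[keep], y[keep]
        beta_i = np.linalg.solve(X_i.T @ X_i, X_i.T @ y_i)
        r[i] = y[i] - X[i] @ beta_i
    return r

# Synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(2)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, 0.0, -1.0]) + rng.normal(scale=0.4, size=20)

r_std = standard_residuals(X, y)
r_bayes = bayesian_residuals(X, y)
```

The explicit loop refits the model $n$ times. A standard identity avoids this: the leave-one-out residual equals $r_i / (1 - h_{ii})$, where $h_{ii}$ is the $i$th diagonal entry of the hat matrix $X(X^T X)^{-1}X^T$, so the Bayesian residuals are never smaller in magnitude than the standard ones.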

14 Example: Body Fat

[Figure: two panels, "Standard residuals" and "Bayesian residuals", each with axes Predicted and Residual.]

15 Example: Body Fat

[Figure: two panels, "Predicted vs observed (standard)" and "Predicted vs observed (Bayesian)", each with axes Predicted and Observed.]

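16 Normal-inverse-gamma prior

Consider independent normal priors for the $\beta_i$'s:

$$\beta \mid \sigma^2 \sim \mathrm{Normal}(0, \sigma^2 T), \quad \text{where } T_{ij} = \tau_i^2 \text{ if } i = j, \text{ and } 0 \text{ otherwise}.$$

And an inverse-gamma prior for $\sigma^2$:

$$\sigma^2 \sim \mathrm{IG}(a, b).$$

The full prior is

$$\pi(\beta, \sigma^2) = \mathrm{IG}(\sigma^2 \mid a, b) \prod_{i=1}^{p} \mathrm{Normal}(\beta_i \mid 0, \sigma^2 \tau_i^2).$$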

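17 Normal-inverse-gamma prior

The posterior for $\beta$, given $\sigma^2$, is

$$p(\beta \mid y, \sigma^2) = \mathrm{Normal}\left(\tilde\beta,\; \sigma^2 V_\beta\right),$$

where

$$\tilde\beta = (X^T X + T^{-1})^{-1} X^T y \quad \text{and} \quad V_\beta = (X^T X + T^{-1})^{-1}.$$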

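18 Normal-inverse-gamma prior

The estimate $\tilde\beta$ solves a penalized least squares criterion:

$$\tilde\beta = \operatorname*{argmin}_{\beta} \; \lVert y - X\beta \rVert^2 + \sum_{i=1}^{p} \beta_i^2 / \tau_i^2.$$

This shrinks the unbiased estimate $\hat\beta$ toward 0.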
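The link between the posterior mean and penalized least squares can be checked numerically: the gradient of the penalized criterion should vanish at $\tilde\beta = (X^T X + T^{-1})^{-1} X^T y$. The sketch below assumes NumPy; the function names, the synthetic data, and the choice $\tau_i^2 = 1$ for all $i$ are illustrative.

```python
import numpy as np

def nig_posterior_mean(X, y, tau2):
    """Return (beta_tilde, V_beta) for the prior beta | sigma^2 ~ Normal(0, sigma^2 T)."""
    T_inv = np.diag(1.0 / np.asarray(tau2, dtype=float))
    V_beta = np.linalg.inv(X.T @ X + T_inv)
    beta_tilde = V_beta @ X.T @ y
    return beta_tilde, V_beta

def penalized_loss(beta, X, y, tau2):
    """||y - X beta||^2 + sum_i beta_i^2 / tau_i^2."""
    resid = y - X @ beta
    return resid @ resid + np.sum(beta ** 2 / tau2)

# Synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(3)
X = rng.normal(size=(30, 4))
y = X @ np.array([1.5, 0.0, -0.5, 2.0]) + rng.normal(scale=1.0, size=30)
tau2 = np.full(4, 1.0)  # assumption: equal prior variances tau_i^2 = 1

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)    # unpenalized estimate
beta_tilde, V_beta = nig_posterior_mean(X, y, tau2)

# Gradient of the penalized criterion, which should be zero at beta_tilde.
grad = -2 * X.T @ (y - X @ beta_tilde) + 2 * beta_tilde / tau2
```

With equal $\tau_i^2$ this is ridge regression with penalty $1/\tau^2$. Smaller $\tau_i^2$ means a tighter prior and stronger shrinkage of coefficient $i$.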

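19 Normal-inverse-gamma prior

The marginal posterior for $\sigma^2$ is

$$p(\sigma^2 \mid y) = \mathrm{IG}(a_n, b_n),$$

where

$$a_n = a + \frac{n}{2} \quad \text{and} \quad b_n = b + \frac{1}{2}\left[y^T y - \tilde\beta^T V_\beta^{-1} \tilde\beta\right].$$

The marginal posterior for $\beta$ is a multivariate t-distribution, with

$$\frac{\beta_i - \tilde\beta_i}{\sqrt{\frac{b_n}{a_n}(V_\beta)_{ii}}} \sim t_{2a+n}.$$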
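The posterior parameters $a_n$ and $b_n$ are straightforward to compute. The sketch below assumes NumPy; the function name, the synthetic data, and the hyperparameter values $a = 2$, $b = 1$, $\tau_i^2 = 10$ are illustrative choices, not values from the slides.

```python
import numpy as np

def nig_posterior_params(X, y, tau2, a, b):
    """Return (beta_tilde, V_beta, a_n, b_n) for the normal-inverse-gamma prior."""
    n = X.shape[0]
    V_inv = X.T @ X + np.diag(1.0 / np.asarray(tau2, dtype=float))
    V_beta = np.linalg.inv(V_inv)
    beta_tilde = V_beta @ X.T @ y
    a_n = a + n / 2
    b_n = b + 0.5 * (y @ y - beta_tilde @ V_inv @ beta_tilde)
    return beta_tilde, V_beta, a_n, b_n

# Synthetic data and hyperparameters (hypothetical, for illustration only).
rng = np.random.default_rng(4)
X = rng.normal(size=(30, 2))
y = X @ np.array([1.0, -1.0]) + rng.normal(scale=0.5, size=30)
tau2 = np.array([10.0, 10.0])
a, b = 2.0, 1.0

beta_tilde, V_beta, a_n, b_n = nig_posterior_params(X, y, tau2, a, b)

# Scale of the marginal t posterior for each beta_i, with 2a + n degrees of freedom.
t_scale = np.sqrt(b_n / a_n * np.diag(V_beta))
t_df = 2 * a + X.shape[0]
```

Since $V_\beta^{-1}\tilde\beta = X^T y$, the bracketed term in $b_n$ can equivalently be written as $\lVert y - X\tilde\beta \rVert^2 + \tilde\beta^T T^{-1} \tilde\beta$, which makes clear that $b_n > b$.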

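20 Normal-inverse-gamma prior

For a new predictor vector $x_{n+1}$, the posterior predictive for $y_{n+1}$ given $\sigma^2$ is

$$y_{n+1} \mid \sigma^2, y \sim \mathrm{Normal}\left(x_{n+1}\tilde\beta,\; \sigma^2\,(1 + x_{n+1} V_\beta x_{n+1}^T)\right).$$

The full posterior predictive distribution is a location-scale t:

$$\frac{y_{n+1} - x_{n+1}\tilde\beta}{\sqrt{\frac{b_n}{a_n}\left(1 + x_{n+1} V_\beta x_{n+1}^T\right)}} \sim t_{2a+n}.$$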
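The predictive t-distribution arises from mixing the conditional normal over the inverse-gamma posterior for $\sigma^2$, and this can be mimicked by simulation. The following is a Monte Carlo sketch assuming NumPy; the function name, data, and hyperparameters are hypothetical.

```python
import numpy as np

def posterior_predictive_draws(X, y, x_new, tau2, a, b, n_draws, rng):
    """Simulate y_{n+1} under the normal-inverse-gamma model.

    Returns (draws, location, squared t scale, t degrees of freedom).
    """
    n = X.shape[0]
    V_inv = X.T @ X + np.diag(1.0 / np.asarray(tau2, dtype=float))
    V_beta = np.linalg.inv(V_inv)
    beta_tilde = V_beta @ X.T @ y
    a_n = a + n / 2
    b_n = b + 0.5 * (y @ y - beta_tilde @ V_inv @ beta_tilde)
    # sigma^2 ~ IG(a_n, b_n) is the same as 1/sigma^2 ~ Gamma(shape=a_n, rate=b_n).
    sigma2 = 1.0 / rng.gamma(shape=a_n, scale=1.0 / b_n, size=n_draws)
    var_factor = 1.0 + x_new @ V_beta @ x_new
    location = x_new @ beta_tilde
    # y_{n+1} | sigma^2 ~ Normal(location, sigma^2 * var_factor).
    draws = location + np.sqrt(sigma2 * var_factor) * rng.normal(size=n_draws)
    return draws, location, (b_n / a_n) * var_factor, 2 * a_n

# Synthetic data and hyperparameters (hypothetical, for illustration only).
rng = np.random.default_rng(5)
X = rng.normal(size=(30, 2))
y = X @ np.array([1.0, -1.0]) + rng.normal(scale=0.5, size=30)
x_new = np.array([0.5, 0.2])

draws, location, scale2, df = posterior_predictive_draws(
    X, y, x_new, tau2=[10.0, 10.0], a=2.0, b=1.0, n_draws=100_000, rng=rng
)
```

The empirical variance of `draws` should be close to `scale2 * df / (df - 2)`, the variance of a t-distribution with `df` degrees of freedom and squared scale `scale2`.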

21 Extensions

There are many other versions of the Bayesian linear model.

E.g.: could use a non-trivial mean and covariance for $\beta$:

$$\beta \sim \mathrm{Normal}(\mu_\beta, T).$$

E.g.: could relax the iid assumption for the $y_i$'s and model a general covariance:

$$y \sim \mathrm{Normal}(X\beta, \Sigma),$$

which requires a prior for $\Sigma$.

For more details and derivations see [reference missing in source] and Carlin & Louis 4.1.1.
