Bayesian Inference. Chapter 9. Linear models and regression


Bayesian Inference
Chapter 9. Linear models and regression

M. Concepción Ausín, Universidad Carlos III de Madrid
Master in Business Administration and Quantitative Methods
Master in Mathematical Engineering

Chapter 9. Linear models and regression

Objective

Illustrate the Bayesian approach to fitting normal and generalized linear models.

Recommended reading

Lindley, D.V. and Smith, A.F.M. (1972). Bayes estimates for the linear model (with discussion), Journal of the Royal Statistical Society B, 34.
Broemeling, L.D. (1985). Bayesian Analysis of Linear Models, Marcel-Dekker.
Gelman, A., Carlin, J.B., Stern, H.S. and Rubin, D.B. (2003). Bayesian Data Analysis, Chapter 8.

Chapter 9. Linear models and regression

A.F.M. Smith developed some of the central ideas in the theory and practice of modern Bayesian statistics.

Objective: to illustrate the Bayesian approach to fitting normal and generalized linear models.

Contents

0. Introduction
1. The multivariate normal distribution
   1.1. Conjugate Bayesian inference when the variance-covariance matrix is known up to a constant
   1.2. Conjugate Bayesian inference when the variance-covariance matrix is unknown
2. Normal linear models
   2.1. Conjugate Bayesian inference for normal linear models
   2.2. Example 1: ANOVA model
   2.3. Example 2: Simple linear regression model
3. Generalized linear models

The multivariate normal distribution

Firstly, we review the definition and properties of the multivariate normal distribution.

Definition. A random variable X = (X_1, ..., X_k)^T is said to have a multivariate normal distribution with mean µ and variance-covariance matrix Σ if:

    f(x | µ, Σ) = 1 / ((2π)^{k/2} |Σ|^{1/2}) exp( −(1/2) (x − µ)^T Σ^{−1} (x − µ) ),

for x ∈ R^k. In this case, we write X | µ, Σ ∼ N(µ, Σ).

The multivariate normal distribution

The following properties of the multivariate normal distribution are well known:

- Any subset of X has a (multivariate) normal distribution.
- Any linear combination Σ_{i=1}^k α_i X_i is normally distributed.
- If Y = a + BX is a linear transformation of X, then Y | µ, Σ ∼ N(a + Bµ, B Σ B^T).
- If X = (X_1, X_2)^T | µ, Σ ∼ N( (µ_1, µ_2)^T, [[Σ_11, Σ_12], [Σ_21, Σ_22]] ), then the conditional density of X_1 given X_2 = x_2 is:

    X_1 | x_2, µ, Σ ∼ N( µ_1 + Σ_12 Σ_22^{−1} (x_2 − µ_2), Σ_11 − Σ_12 Σ_22^{−1} Σ_21 ).
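The partitioned conditional formulas above translate directly into code. The following is a minimal numpy sketch (the function name and example numbers are our own, purely illustrative):

```python
import numpy as np

def mvn_conditional(mu, Sigma, idx1, idx2, x2):
    """Conditional distribution of X1 given X2 = x2 for a partitioned
    multivariate normal, applying the formulas above."""
    mu1, mu2 = mu[idx1], mu[idx2]
    S11 = Sigma[np.ix_(idx1, idx1)]
    S12 = Sigma[np.ix_(idx1, idx2)]
    S21 = Sigma[np.ix_(idx2, idx1)]
    S22_inv = np.linalg.inv(Sigma[np.ix_(idx2, idx2)])
    # Conditional mean and covariance from the partitioned-normal theorem
    cond_mean = mu1 + S12 @ S22_inv @ (x2 - mu2)
    cond_cov = S11 - S12 @ S22_inv @ S21
    return cond_mean, cond_cov

# Example: trivariate normal, conditioning on the third component
mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
cond_mean, cond_cov = mvn_conditional(mu, Sigma, [0, 1], [2], np.array([2.5]))
```

A useful sanity check is that the conditional covariance of the first block equals the inverse of the corresponding block of the precision matrix Σ^{−1}.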

The multivariate normal distribution

The likelihood function given a sample x = (x_1, ..., x_n) of data from N(µ, Σ) is:

    l(µ, Σ | x) = 1 / ((2π)^{nk/2} |Σ|^{n/2}) exp( −(1/2) Σ_{i=1}^n (x_i − µ)^T Σ^{−1} (x_i − µ) )

    ∝ 1/|Σ|^{n/2} exp( −(1/2) [ Σ_{i=1}^n (x_i − x̄)^T Σ^{−1} (x_i − x̄) + n (µ − x̄)^T Σ^{−1} (µ − x̄) ] )

    ∝ 1/|Σ|^{n/2} exp( −(1/2) [ tr(S Σ^{−1}) + n (µ − x̄)^T Σ^{−1} (µ − x̄) ] ),

where x̄ = (1/n) Σ_{i=1}^n x_i, S = Σ_{i=1}^n (x_i − x̄)(x_i − x̄)^T, and tr(M) represents the trace of the matrix M.

It is possible to carry out Bayesian inference with conjugate priors for µ and Σ. We shall consider two cases which reflect different levels of knowledge about the variance-covariance matrix, Σ.

Conjugate Bayesian inference when Σ = (1/φ)C

Firstly, consider the case where the variance-covariance matrix is known up to a constant, i.e. Σ = (1/φ)C, where C is a known matrix. Then we have X | µ, φ ∼ N(µ, (1/φ)C) and the likelihood function is:

    l(µ, φ | x) ∝ φ^{nk/2} exp( −(φ/2) [ tr(S C^{−1}) + n (µ − x̄)^T C^{−1} (µ − x̄) ] ).

Analogous to the univariate case, it can be seen that a multivariate normal-gamma prior distribution is conjugate.

Conjugate Bayesian inference when Σ = (1/φ)C

Definition. We say that (µ, φ) have a multivariate normal-gamma prior with parameters (m, V^{−1}, a, b) if:

    µ | φ ∼ N(m, (1/φ)V),
    φ ∼ G(a/2, b/2).

In this case, we write (µ, φ) ∼ NG(m, V^{−1}, a, b).

Analogous to the univariate case, the marginal distribution of µ is a multivariate, non-central t distribution.

Conjugate Bayesian inference when Σ = (1/φ)C

Definition. A (k-dimensional) random variable T = (T_1, ..., T_k)^T has a multivariate t distribution with parameters (d, µ_T, Σ_T) if:

    f(t) = Γ((d + k)/2) / ( Γ(d/2) (πd)^{k/2} |Σ_T|^{1/2} ) [ 1 + (1/d) (t − µ_T)^T Σ_T^{−1} (t − µ_T) ]^{−(d+k)/2}.

In this case, we write T ∼ T(µ_T, Σ_T, d).

Theorem. Let (µ, φ) ∼ NG(m, V^{−1}, a, b). Then the marginal density of µ is:

    µ ∼ T(m, (b/a)V, a).

Conjugate Bayesian inference when Σ = (1/φ)C

Theorem. Let X | µ, φ ∼ N(µ, (1/φ)C) and assume a priori (µ, φ) ∼ NG(m, V^{−1}, a, b). Then, given sample data x, we have:

    µ | x, φ ∼ N(m*, (1/φ)V*),
    φ | x ∼ G(a*/2, b*/2),

where:

    V* = (V^{−1} + n C^{−1})^{−1}
    m* = V* (V^{−1} m + n C^{−1} x̄)
    a* = a + nk
    b* = b + tr(S C^{−1}) + m^T V^{−1} m + n x̄^T C^{−1} x̄ − m*^T V*^{−1} m*
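The update formulas in this theorem can be coded directly. The following is a minimal numpy sketch (the function name and test data are our own, not from the notes); in the one-dimensional case with C = 1 it reproduces the familiar univariate normal-gamma update:

```python
import numpy as np

def ng_update(x, C, m, V_inv, a, b):
    """Normal-gamma posterior update for X | mu, phi ~ N(mu, C/phi)
    with prior (mu, phi) ~ NG(m, V^{-1}, a, b), following the theorem above."""
    n, k = x.shape
    xbar = x.mean(axis=0)
    S = (x - xbar).T @ (x - xbar)               # centered sum of squares matrix
    C_inv = np.linalg.inv(C)
    V_star = np.linalg.inv(V_inv + n * C_inv)
    m_star = V_star @ (V_inv @ m + n * C_inv @ xbar)
    a_star = a + n * k
    b_star = (b + np.trace(S @ C_inv) + m @ V_inv @ m
              + n * xbar @ C_inv @ xbar
              - m_star @ np.linalg.inv(V_star) @ m_star)
    return m_star, V_star, a_star, b_star

# Scalar example: three observations 1, 2, 3 with prior NG(0, 1, 2, 2)
m_star, V_star, a_star, b_star = ng_update(
    np.array([[1.0], [2.0], [3.0]]), np.eye(1),
    np.zeros(1), np.eye(1), 2.0, 2.0)
```

Here the posterior mean m* = (V^{−1} m + n x̄)/(V^{−1} + n) = 6/4 = 1.5, a weighted average of prior mean and sample mean.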

Conjugate Bayesian inference when Σ = (1/φ)C

Theorem. Given the reference prior p(µ, φ) ∝ 1/φ, the posterior distribution is:

    p(µ, φ | x) ∝ φ^{nk/2 − 1} exp( −(φ/2) [ tr(S C^{−1}) + n (µ − x̄)^T C^{−1} (µ − x̄) ] ),

so that µ, φ | x ∼ NG( x̄, n C^{−1}, (n − 1)k, tr(S C^{−1}) ), which implies that:

    µ | x, φ ∼ N( x̄, (1/(nφ)) C ),
    φ | x ∼ G( (n − 1)k / 2, tr(S C^{−1}) / 2 ),
    µ | x ∼ T( x̄, tr(S C^{−1}) / (n(n − 1)k) C, (n − 1)k ).

Conjugate Bayesian inference when Σ is unknown

In this case, it is useful to reparameterize the normal distribution in terms of the precision matrix Φ = Σ^{−1}. Then the normal likelihood function becomes:

    l(µ, Φ) ∝ |Φ|^{n/2} exp( −(1/2) [ tr(SΦ) + n (µ − x̄)^T Φ (µ − x̄) ] ).

It is clear that a conjugate prior for µ and Φ must take a similar form to the likelihood. This is a normal-Wishart distribution.

Conjugate Bayesian inference when Σ is unknown

Definition. A k × k dimensional symmetric, positive definite random variable W is said to have a Wishart distribution with parameters d and V if:

    f(W) = |W|^{(d − k − 1)/2} exp( −(1/2) tr(V^{−1} W) ) / ( 2^{dk/2} |V|^{d/2} π^{k(k−1)/4} Π_{i=1}^k Γ((d + 1 − i)/2) ),

where d > k − 1. In this case, E[W] = dV and we write W ∼ W(d, V).

If W ∼ W(d, V), then the distribution of W^{−1} is said to be an inverse Wishart distribution, W^{−1} ∼ IW(d, V^{−1}), with mean E[W^{−1}] = (1/(d − k − 1)) V^{−1}.

Conjugate Bayesian inference when Σ is unknown

Theorem. Suppose that X | µ, Φ ∼ N(µ, Φ^{−1}) and let µ | Φ ∼ N(m, (1/α)Φ^{−1}) and Φ ∼ W(d, W). Then:

    µ | Φ, x ∼ N( (αm + n x̄)/(α + n), (1/(α + n)) Φ^{−1} ),
    Φ | x ∼ W( d + n, ( W^{−1} + S + (αn/(α + n)) (m − x̄)(m − x̄)^T )^{−1} ).

Theorem. Given the limiting prior, p(Φ) ∝ |Φ|^{−(k+1)/2}, the posterior distribution is:

    µ | Φ, x ∼ N( x̄, (1/n) Φ^{−1} ),
    Φ | x ∼ W( n − 1, S^{−1} ).

Conjugate Bayesian inference when Σ is unknown

The conjugacy assumption that the prior precision of µ is proportional to the model precision Φ is very strong in many cases. Often, we may simply wish to use a prior distribution of the form µ ∼ N(m, V), where m and V are known, and a Wishart prior for Φ, say Φ ∼ W(d, W), as earlier. In this case, the conditional posterior distributions are:

    µ | Φ, x ∼ N( (V^{−1} + nΦ)^{−1} (V^{−1} m + nΦ x̄), (V^{−1} + nΦ)^{−1} ),
    Φ | µ, x ∼ W( d + n, ( W^{−1} + S + n (µ − x̄)(µ − x̄)^T )^{−1} ),

and therefore it is straightforward to set up a Gibbs sampling algorithm to sample the joint posterior, as in the univariate case.
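The Gibbs sampler alternating between these two full conditionals can be sketched as follows (a minimal illustrative implementation using numpy and scipy; function name, initial values and test data are our own assumptions, and no burn-in tuning is attempted):

```python
import numpy as np
from scipy.stats import wishart

def gibbs_mvn(x, m, V, d, W, n_iter=1000, seed=0):
    """Gibbs sampler for mu ~ N(m, V), Phi ~ W(d, W), alternating between
    the two full conditional distributions given above."""
    rng = np.random.default_rng(seed)
    n, k = x.shape
    xbar = x.mean(axis=0)
    S = (x - xbar).T @ (x - xbar)
    V_inv = np.linalg.inv(V)
    W_inv = np.linalg.inv(W)
    Phi = np.eye(k)                              # initial precision matrix
    mus, Phis = [], []
    for _ in range(n_iter):
        # mu | Phi, x
        cov = np.linalg.inv(V_inv + n * Phi)
        mean = cov @ (V_inv @ m + n * Phi @ xbar)
        mu = rng.multivariate_normal(mean, cov)
        # Phi | mu, x
        scale = np.linalg.inv(W_inv + S + n * np.outer(mu - xbar, mu - xbar))
        Phi = wishart.rvs(df=d + n, scale=scale, random_state=rng)
        mus.append(mu)
        Phis.append(Phi)
    return np.array(mus), np.array(Phis)

# Example: bivariate data centered at (1, -1)
rng_data = np.random.default_rng(1)
x = rng_data.normal(size=(50, 2)) + np.array([1.0, -1.0])
mus, Phis = gibbs_mvn(x, m=np.zeros(2), V=10 * np.eye(2),
                      d=3, W=np.eye(2), n_iter=200)
```

With a fairly flat prior (V = 10I), the posterior draws of µ concentrate near the sample mean, as the theory predicts.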

Normal linear models

Definition. A normal linear model is of the form:

    y = Xθ + ε,

where y = (y_1, ..., y_n)^T are the observed data, X is a given design matrix of size n × k, and θ = (θ_1, ..., θ_k)^T is the set of model parameters. Finally, it is usually assumed that:

    ε ∼ N( 0_n, (1/φ) I_n ).

Normal linear models

A simple example of a normal linear model is the simple linear regression model, where

    X = (1 1 ... 1; x_1 x_2 ... x_n)^T  and  θ = (α, β)^T,

that is, the first column of X is a column of ones and the second contains the covariate values.

It is easy to see that there is a conjugate, multivariate normal-gamma prior distribution for any normal linear model.

Conjugate Bayesian inference

Theorem. Consider a normal linear model, y = Xθ + ε, and assume a normal-gamma prior distribution, (θ, φ) ∼ NG(m, V^{−1}, a, b). Then the posterior distribution given y is also a normal-gamma distribution:

    θ, φ | y ∼ NG(m*, V*^{−1}, a*, b*),

where:

    m* = (X^T X + V^{−1})^{−1} (X^T y + V^{−1} m)
    V* = (X^T X + V^{−1})^{−1}
    a* = a + n
    b* = b + y^T y + m^T V^{−1} m − m*^T V*^{−1} m*
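These posterior formulas are a few lines of numpy (a minimal sketch; the function name and simulated data are our own). With a very vague prior (V^{−1} ≈ 0), the posterior mean m* essentially reproduces the least-squares estimate:

```python
import numpy as np

def nlm_posterior(y, X, m, V_inv, a, b):
    """Conjugate normal-gamma posterior for the normal linear model
    y = X theta + eps, following the theorem above."""
    n = len(y)
    XtX = X.T @ X
    V_star = np.linalg.inv(XtX + V_inv)
    m_star = V_star @ (X.T @ y + V_inv @ m)
    a_star = a + n
    # Note V*^{-1} = XtX + V_inv, so we reuse it for the b* update
    b_star = b + y @ y + m @ V_inv @ m - m_star @ (XtX + V_inv) @ m_star
    return m_star, V_star, a_star, b_star

# Simulated regression data with intercept 1 and slope 2
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(30), rng.normal(size=30)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=30)
m_star, V_star, a_star, b_star = nlm_posterior(
    y, X, np.zeros(2), 1e-8 * np.eye(2), 0.01, 0.01)
```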

Conjugate Bayesian inference

Given a normal linear model, y = Xθ + ε, and assuming a normal-gamma prior distribution, (θ, φ) ∼ NG(m, V^{−1}, a, b), it is easy to see that the predictive distribution of y is:

    y ∼ T( Xm, (b/a) (X V X^T + I), a ).

To see this, note that the distribution of y | φ is:

    y | φ ∼ N( Xm, (1/φ) (X V X^T + I) ),

so the joint distribution of y and φ is multivariate normal-gamma:

    (y, φ) ∼ NG( Xm, (X V X^T + I)^{−1}, a, b ),

and therefore the marginal of y is the multivariate t distribution obtained before.

Interpretation of the posterior mean

We have that:

    E[θ | y] = (X^T X + V^{−1})^{−1} (X^T y + V^{−1} m)
             = (X^T X + V^{−1})^{−1} ( X^T X (X^T X)^{−1} X^T y + V^{−1} m )
             = (X^T X + V^{−1})^{−1} ( X^T X θ̂ + V^{−1} m ),

where θ̂ = (X^T X)^{−1} X^T y is the maximum likelihood estimator. Thus, this expression may be interpreted as a weighted average of the prior estimator, m, and the MLE, θ̂, with weights proportional to precisions, since, conditional on φ, the prior variance was (1/φ)V and the distribution of the MLE from the classical viewpoint is:

    θ̂ | φ ∼ N( θ, (1/φ) (X^T X)^{−1} ).

Conjugate Bayesian inference

Theorem. Consider a normal linear model, y = Xθ + ε, and assume the limiting prior distribution p(θ, φ) ∝ 1/φ. Then we have:

    θ | y, φ ∼ N( θ̂, (1/φ) (X^T X)^{−1} ),
    φ | y ∼ G( (n − k)/2, (y^T y − θ̂^T (X^T X) θ̂)/2 ),

and then:

    θ | y ∼ T( θ̂, σ̂² (X^T X)^{−1}, n − k ),

where:

    σ̂² = ( y^T y − θ̂^T (X^T X) θ̂ ) / (n − k).
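Under this limiting prior, Bayesian credible intervals can be computed from the marginal t distribution and coincide with classical OLS confidence intervals. A minimal sketch (function name and simulated data are our own):

```python
import numpy as np
from scipy import stats

def reference_fit(y, X, level=0.95):
    """Posterior mean and marginal credible intervals for theta under
    p(theta, phi) proportional to 1/phi, using the theorem above."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    theta_hat = XtX_inv @ X.T @ y                # MLE = posterior mean
    resid = y - X @ theta_hat
    sigma2_hat = resid @ resid / (n - k)         # classical estimate of 1/phi
    se = np.sqrt(sigma2_hat * np.diag(XtX_inv))
    tq = stats.t.ppf(0.5 + level / 2, df=n - k)
    ci = np.column_stack([theta_hat - tq * se, theta_hat + tq * se])
    return theta_hat, ci

rng = np.random.default_rng(42)
X = np.column_stack([np.ones(40), rng.normal(size=40)])
y = X @ np.array([0.5, -1.0]) + rng.normal(size=40)
theta_hat, ci = reference_fit(y, X)
```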

Conjugate Bayesian inference

Note that σ̂² = (y^T y − θ̂^T (X^T X) θ̂)/(n − k) is the usual classical estimator of σ² = 1/φ. In this case, Bayesian credible intervals, estimators etc. will coincide with their classical counterparts. One should note, however, that the propriety of the posterior distribution in this case relies on two conditions:

1. n > k.
2. X^T X is of full rank.

If either of these two conditions is not satisfied, then the posterior distribution will be improper.

ANOVA model

The ANOVA model is an example of a normal linear model where:

    y_ij = θ_i + ε_ij,  with ε_ij ∼ N(0, 1/φ),

for i = 1, ..., k and j = 1, ..., n_i. Thus, the parameters are θ = (θ_1, ..., θ_k)^T, the observed data are y = (y_11, ..., y_1n_1, y_21, ..., y_2n_2, ..., y_k1, ..., y_kn_k)^T, and the design matrix is the n × k block matrix X = diag(1_{n_1}, ..., 1_{n_k}), whose i-th column contains ones in the rows corresponding to group i and zeros elsewhere, where 1_{n_i} denotes a column vector of n_i ones and n = n_1 + ... + n_k.

ANOVA model

Suppose we use conditionally independent normal priors, θ_i ∼ N(m_i, 1/(α_i φ)) for i = 1, ..., k, and a gamma prior φ ∼ G(a/2, b/2). Then we have:

    (θ, φ) ∼ NG(m, V^{−1}, a, b),

where m = (m_1, ..., m_k)^T and V = diag(1/α_1, ..., 1/α_k). Noting that:

    X^T X + V^{−1} = diag(n_1 + α_1, ..., n_k + α_k),

we can obtain the following posterior distributions.

ANOVA model

It is obtained that:

    θ | y, φ ∼ N( ( (n_1 ȳ_1 + α_1 m_1)/(n_1 + α_1), ..., (n_k ȳ_k + α_k m_k)/(n_k + α_k) )^T, (1/φ) diag( 1/(α_1 + n_1), ..., 1/(α_k + n_k) ) )

and

    φ | y ∼ G( (a + n)/2, ( b + Σ_{i=1}^k Σ_{j=1}^{n_i} (y_ij − ȳ_i)² + Σ_{i=1}^k (n_i α_i/(n_i + α_i)) (ȳ_i − m_i)² ) / 2 ).

ANOVA model

Often we are interested in the differences in group means, e.g. θ_1 − θ_2. Here we have:

    θ_1 − θ_2 | y, φ ∼ N( (n_1 ȳ_1 + α_1 m_1)/(n_1 + α_1) − (n_2 ȳ_2 + α_2 m_2)/(n_2 + α_2), (1/φ) ( 1/(α_1 + n_1) + 1/(α_2 + n_2) ) ),

and therefore a posterior 95% interval for θ_1 − θ_2 is given by:

    (n_1 ȳ_1 + α_1 m_1)/(n_1 + α_1) − (n_2 ȳ_2 + α_2 m_2)/(n_2 + α_2)
    ± sqrt( ( 1/(α_1 + n_1) + 1/(α_2 + n_2) ) ( b + Σ_{i=1}^k Σ_{j=1}^{n_i} (y_ij − ȳ_i)² + Σ_{i=1}^k (n_i α_i/(n_i + α_i)) (ȳ_i − m_i)² ) / (a + n) ) t_{a+n}(0.975).

ANOVA model

If we assume alternatively the reference prior, p(θ, φ) ∝ 1/φ, we have:

    θ | y, φ ∼ N( (ȳ_1, ..., ȳ_k)^T, (1/φ) diag(1/n_1, ..., 1/n_k) ),
    φ | y ∼ G( (n − k)/2, (n − k) σ̂² / 2 ),

where σ̂² = (1/(n − k)) Σ_{i=1}^k Σ_{j=1}^{n_i} (y_ij − ȳ_i)² is the classical variance estimate for this problem. A 95% posterior interval for θ_1 − θ_2 is given by:

    ȳ_1 − ȳ_2 ± σ̂ sqrt(1/n_1 + 1/n_2) t_{n−k}(0.975),

which is equal to the usual classical interval.

Simple linear regression model

Consider the simple regression model:

    y_i = α + β x_i + ε_i,

for i = 1, ..., n, where ε_i ∼ N(0, 1/φ), and suppose that we use the limiting prior:

    p(α, β, φ) ∝ 1/φ.

Simple linear regression model

Then we have that:

    (α, β)^T | y, φ ∼ N( (α̂, β̂)^T, (1/(φ n s_x)) [ [Σ_{i=1}^n x_i², −n x̄], [−n x̄, n] ] ),
    φ | y ∼ G( (n − 2)/2, s_y (1 − r²) / 2 ),

where:

    α̂ = ȳ − β̂ x̄,  β̂ = s_xy / s_x,
    s_x = Σ_{i=1}^n (x_i − x̄)²,  s_y = Σ_{i=1}^n (y_i − ȳ)²,
    s_xy = Σ_{i=1}^n (x_i − x̄)(y_i − ȳ),  r = s_xy / sqrt(s_x s_y),
    σ̂² = s_y (1 − r²) / (n − 2).

Simple linear regression model

Thus, the marginal distributions of α and β are:

    α | y ∼ T( α̂, σ̂² Σ_{i=1}^n x_i² / (n s_x), n − 2 ),
    β | y ∼ T( β̂, σ̂² / s_x, n − 2 ),

and therefore, for example, a 95% credible interval for β is given by:

    β̂ ± (σ̂ / sqrt(s_x)) t_{n−2}(0.975),

equal to the usual classical interval.
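The interval for β can be computed directly from the sufficient statistics s_x, s_y and s_xy defined above. A minimal sketch (the function name and simulated data are our own assumptions):

```python
import numpy as np
from scipy import stats

def slope_interval(x, y, level=0.95):
    """Posterior credible interval for the slope beta under the limiting
    prior, built from the sufficient statistics defined above."""
    n = len(x)
    sx = np.sum((x - x.mean()) ** 2)
    sy = np.sum((y - y.mean()) ** 2)
    sxy = np.sum((x - x.mean()) * (y - y.mean()))
    beta_hat = sxy / sx
    r2 = sxy ** 2 / (sx * sy)
    sigma2_hat = sy * (1 - r2) / (n - 2)         # residual variance estimate
    tq = stats.t.ppf(0.5 + level / 2, df=n - 2)
    half = tq * np.sqrt(sigma2_hat / sx)
    return beta_hat, beta_hat - half, beta_hat + half

rng = np.random.default_rng(0)
x = rng.normal(size=25)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=25)
beta_hat, lo, hi = slope_interval(x, y)
```

Since σ̂²/s_x equals σ̂² times the (2,2) entry of (X^T X)^{−1}, this interval agrees with the matrix-form result of the general theorem.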

Simple linear regression model

Suppose now that we wish to predict a future observation:

    y_new = α + β x_new + ε_new.

Note that:

    E[y_new | φ, y] = α̂ + β̂ x_new,
    V[y_new | φ, y] = (1/φ) [ ( Σ_{i=1}^n x_i² + n x_new² − 2 n x̄ x_new ) / (n s_x) + 1 ]
                    = (1/φ) [ ( s_x + n x̄² + n x_new² − 2 n x̄ x_new ) / (n s_x) + 1 ].

Therefore:

    y_new | φ, y ∼ N( α̂ + β̂ x_new, (1/φ) ( (x̄ − x_new)² / s_x + 1/n + 1 ) ).

Simple linear regression model

And then:

    y_new | y ∼ T( α̂ + β̂ x_new, σ̂² ( (x̄ − x_new)² / s_x + 1/n + 1 ), n − 2 ),

leading to the following 95% credible interval for y_new:

    α̂ + β̂ x_new ± σ̂ sqrt( (x̄ − x_new)² / s_x + 1/n + 1 ) t_{n−2}(0.975),

which coincides with the usual classical interval.

Generalized linear models

The generalized linear model generalizes the normal linear model by allowing the possibility of non-normal error distributions and by allowing for a non-linear relationship between y and x. A generalized linear model is specified by two functions:

1. A conditional, exponential family density function of y given x, parameterized by a mean parameter, µ = µ(x) = E[Y | x], and (possibly) a dispersion parameter, φ > 0, that is independent of x.
2. A (one-to-one) link function, g(·), which relates the mean, µ = µ(x), to the covariate vector, x, as g(µ) = xθ.

Generalized linear models

The following are generalized linear models with the canonical link function, which is the natural parameterization that leaves the exponential family distribution in canonical form.

Logistic regression. It is often used for predicting the occurrence of an event given covariates:

    Y_i | p_i ∼ Bin(n_i, p_i),
    log( p_i / (1 − p_i) ) = x_i θ.

Poisson regression. It is used for predicting the number of events in a time period given covariates:

    Y_i | λ_i ∼ P(λ_i),
    log λ_i = x_i θ.

Generalized linear models

The Bayesian specification of a GLM is completed by defining (typically normal or normal-gamma) prior distributions p(θ, φ) over the unknown model parameters. As with standard linear models, when improper priors are used, it is important to check that these lead to valid posterior distributions.

Clearly, these models will not have conjugate posterior distributions, but they are usually easily handled by Gibbs sampling. In particular, the posterior distributions from these models are usually log-concave and are thus easily sampled via adaptive rejection sampling.
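As a simpler illustration than the adaptive rejection scheme mentioned above, a Bernoulli logistic regression posterior with a normal prior can be explored with a random-walk Metropolis sampler (a minimal sketch; the prior scale, step size and simulated data are our own assumptions, not from the notes):

```python
import numpy as np

def log_post(theta, X, y, prior_sd=10.0):
    """Log posterior for logistic regression with a N(0, prior_sd^2 I) prior."""
    eta = X @ theta
    # log-likelihood: sum y*eta - log(1 + exp(eta)), computed stably
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    logprior = -0.5 * np.sum(theta ** 2) / prior_sd ** 2
    return loglik + logprior

def metropolis(X, y, n_iter=2000, step=0.1, seed=0):
    """Random-walk Metropolis for the logistic regression posterior."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    lp = log_post(theta, X, y)
    draws, accepted = [], 0
    for _ in range(n_iter):
        prop = theta + step * rng.normal(size=theta.shape)
        lp_prop = log_post(prop, X, y)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject step
            theta, lp = prop, lp_prop
            accepted += 1
        draws.append(theta)
    return np.array(draws), accepted / n_iter

# Simulated data with intercept -0.5 and slope 1.0
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
p = 1 / (1 + np.exp(-(X @ np.array([-0.5, 1.0]))))
y = (rng.uniform(size=100) < p).astype(float)
draws, acc = metropolis(X, y)
```

Log-concavity of the posterior makes such samplers well behaved, although adaptive rejection sampling within Gibbs, as noted above, is the more standard choice.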


variability of the model, represented by σ 2 and not accounted for by Xβ

variability of the model, represented by σ 2 and not accounted for by Xβ Posterior Predictive Distribution Suppose we have observed a new set of explanatory variables X and we want to predict the outcomes ỹ using the regression model. Components of uncertainty in p(ỹ y) variability

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

Lecture 3. Univariate Bayesian inference: conjugate analysis

Lecture 3. Univariate Bayesian inference: conjugate analysis Summary Lecture 3. Univariate Bayesian inference: conjugate analysis 1. Posterior predictive distributions 2. Conjugate analysis for proportions 3. Posterior predictions for proportions 4. Conjugate analysis

More information

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns

More information

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods By Oleg Makhnin 1 Introduction a b c M = d e f g h i 0 f(x)dx 1.1 Motivation 1.1.1 Just here Supresses numbering 1.1.2 After this 1.2 Literature 2 Method 2.1 New math As

More information

Random Vectors 1. STA442/2101 Fall See last slide for copyright information. 1 / 30

Random Vectors 1. STA442/2101 Fall See last slide for copyright information. 1 / 30 Random Vectors 1 STA442/2101 Fall 2017 1 See last slide for copyright information. 1 / 30 Background Reading: Renscher and Schaalje s Linear models in statistics Chapter 3 on Random Vectors and Matrices

More information

MFM Practitioner Module: Risk & Asset Allocation. John Dodson. February 3, 2010

MFM Practitioner Module: Risk & Asset Allocation. John Dodson. February 3, 2010 MFM Practitioner Module: Risk & Asset Allocation Estimator February 3, 2010 Estimator Estimator In estimation we do not endow the sample with a characterization; rather, we endow the parameters with a

More information

Bayesian Regression Linear and Logistic Regression

Bayesian Regression Linear and Logistic Regression When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Linear Mixed Models for Longitudinal Data Yan Lu April, 2018, week 15 1 / 38 Data structure t1 t2 tn i 1st subject y 11 y 12 y 1n1 Experimental 2nd subject

More information

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary

More information

AMS-207: Bayesian Statistics

AMS-207: Bayesian Statistics Linear Regression How does a quantity y, vary as a function of another quantity, or vector of quantities x? We are interested in p(y θ, x) under a model in which n observations (x i, y i ) are exchangeable.

More information

Hyperparameter estimation in Dirichlet process mixture models

Hyperparameter estimation in Dirichlet process mixture models Hyperparameter estimation in Dirichlet process mixture models By MIKE WEST Institute of Statistics and Decision Sciences Duke University, Durham NC 27706, USA. SUMMARY In Bayesian density estimation and

More information

Chapter 17: Undirected Graphical Models

Chapter 17: Undirected Graphical Models Chapter 17: Undirected Graphical Models The Elements of Statistical Learning Biaobin Jiang Department of Biological Sciences Purdue University bjiang@purdue.edu October 30, 2014 Biaobin Jiang (Purdue)

More information

Multivariate Normal & Wishart

Multivariate Normal & Wishart Multivariate Normal & Wishart Hoff Chapter 7 October 21, 2010 Reading Comprehesion Example Twenty-two children are given a reading comprehsion test before and after receiving a particular instruction method.

More information

Advanced Statistics I : Gaussian Linear Model (and beyond)

Advanced Statistics I : Gaussian Linear Model (and beyond) Advanced Statistics I : Gaussian Linear Model (and beyond) Aurélien Garivier CNRS / Telecom ParisTech Centrale Outline One and Two-Sample Statistics Linear Gaussian Model Model Reduction and model Selection

More information

The logistic regression model is thus a glm-model with canonical link function so that the log-odds equals the linear predictor, that is

The logistic regression model is thus a glm-model with canonical link function so that the log-odds equals the linear predictor, that is Example The logistic regression model is thus a glm-model with canonical link function so that the log-odds equals the linear predictor, that is log p 1 p = β 0 + β 1 f 1 (y 1 ) +... + β d f d (y d ).

More information

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature

More information

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2 Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, 2010 Jeffreys priors Lecturer: Michael I. Jordan Scribe: Timothy Hunter 1 Priors for the multivariate Gaussian Consider a multivariate

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

1 Bayesian Linear Regression (BLR)

1 Bayesian Linear Regression (BLR) Statistical Techniques in Robotics (STR, S15) Lecture#10 (Wednesday, February 11) Lecturer: Byron Boots Gaussian Properties, Bayesian Linear Regression 1 Bayesian Linear Regression (BLR) In linear regression,

More information

Hierarchical models. Dr. Jarad Niemi. August 31, Iowa State University. Jarad Niemi (Iowa State) Hierarchical models August 31, / 31

Hierarchical models. Dr. Jarad Niemi. August 31, Iowa State University. Jarad Niemi (Iowa State) Hierarchical models August 31, / 31 Hierarchical models Dr. Jarad Niemi Iowa State University August 31, 2017 Jarad Niemi (Iowa State) Hierarchical models August 31, 2017 1 / 31 Normal hierarchical model Let Y ig N(θ g, σ 2 ) for i = 1,...,

More information

Statement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:.

Statement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:. MATHEMATICAL STATISTICS Homework assignment Instructions Please turn in the homework with this cover page. You do not need to edit the solutions. Just make sure the handwriting is legible. You may discuss

More information

Estimation Theory. as Θ = (Θ 1,Θ 2,...,Θ m ) T. An estimator

Estimation Theory. as Θ = (Θ 1,Θ 2,...,Θ m ) T. An estimator Estimation Theory Estimation theory deals with finding numerical values of interesting parameters from given set of data. We start with formulating a family of models that could describe how the data were

More information

Learning Bayesian network : Given structure and completely observed data

Learning Bayesian network : Given structure and completely observed data Learning Bayesian network : Given structure and completely observed data Probabilistic Graphical Models Sharif University of Technology Spring 2017 Soleymani Learning problem Target: true distribution

More information

Bayesian linear regression

Bayesian linear regression Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding

More information

Bayesian Inference for Normal Mean

Bayesian Inference for Normal Mean Al Nosedal. University of Toronto. November 18, 2015 Likelihood of Single Observation The conditional observation distribution of y µ is Normal with mean µ and variance σ 2, which is known. Its density

More information

2 Bayesian Hierarchical Response Modeling

2 Bayesian Hierarchical Response Modeling 2 Bayesian Hierarchical Response Modeling In the first chapter, an introduction to Bayesian item response modeling was given. The Bayesian methodology requires careful specification of priors since item

More information

Nonparameteric Regression:

Nonparameteric Regression: Nonparameteric Regression: Nadaraya-Watson Kernel Regression & Gaussian Process Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro,

More information

Bayesian Image Segmentation Using MRF s Combined with Hierarchical Prior Models

Bayesian Image Segmentation Using MRF s Combined with Hierarchical Prior Models Bayesian Image Segmentation Using MRF s Combined with Hierarchical Prior Models Kohta Aoki 1 and Hiroshi Nagahashi 2 1 Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology

More information

Foundations of Statistical Inference

Foundations of Statistical Inference Foundations of Statistical Inference Julien Berestycki Department of Statistics University of Oxford MT 2015 Julien Berestycki (University of Oxford) SB2a MT 2015 1 / 16 Lecture 16 : Bayesian analysis

More information

Nuisance parameters and their treatment

Nuisance parameters and their treatment BS2 Statistical Inference, Lecture 2, Hilary Term 2008 April 2, 2008 Ancillarity Inference principles Completeness A statistic A = a(x ) is said to be ancillary if (i) The distribution of A does not depend

More information

Variational Bayesian Inference for Parametric and Non-Parametric Regression with Missing Predictor Data

Variational Bayesian Inference for Parametric and Non-Parametric Regression with Missing Predictor Data for Parametric and Non-Parametric Regression with Missing Predictor Data August 23, 2010 Introduction Bayesian inference For parametric regression: long history (e.g. Box and Tiao, 1973; Gelman, Carlin,

More information

GAUSSIAN PROCESS REGRESSION

GAUSSIAN PROCESS REGRESSION GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The

More information

An Introduction to Bayesian Linear Regression

An Introduction to Bayesian Linear Regression An Introduction to Bayesian Linear Regression APPM 5720: Bayesian Computation Fall 2018 A SIMPLE LINEAR MODEL Suppose that we observe explanatory variables x 1, x 2,..., x n and dependent variables y 1,

More information

Statistical Data Mining and Machine Learning Hilary Term 2016

Statistical Data Mining and Machine Learning Hilary Term 2016 Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

1 Mixed effect models and longitudinal data analysis

1 Mixed effect models and longitudinal data analysis 1 Mixed effect models and longitudinal data analysis Mixed effects models provide a flexible approach to any situation where data have a grouping structure which introduces some kind of correlation between

More information