Part 6: Multivariate Normal and Linear Models


2 Multiple measurements

Up until now, all of our statistical models have been univariate: models for a single measurement on each member of a sample of individuals, or on each run of a repeated experiment. However, datasets are frequently multivariate, having multiple measurements for each individual or experiment.

3 Multivariate model

The most useful (commonly used) model for multivariate data is the multivariate normal model. For a collection of variables, it allows us to jointly estimate the population means, variances, and correlations.

4 Example: reading comprehension

A sample of 22 children are given reading comprehension tests before and after receiving a particular instructional method. Each student $i$ will have two scores, $Y_{i,1}$ and $Y_{i,2}$, denoting the pre- and post-instructional scores respectively. We denote each student's pair of scores as a vector, so that

$$Y_i = \begin{pmatrix} Y_{i,1} \\ Y_{i,2} \end{pmatrix} = \begin{pmatrix} \text{score on first test} \\ \text{score on second test} \end{pmatrix}$$

5 Example: quantities of interest

The things we might be interested in include the population mean $\mu$, particularly $\mu_2 - \mu_1$:

$$E\{Y\} = \begin{pmatrix} E\{Y_{i,1}\} \\ E\{Y_{i,2}\} \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}$$

and the population covariance matrix

$$\Sigma = \mathrm{Cov}[Y] = \begin{pmatrix} E\{Y_1^2\} - E\{Y_1\}^2 & E\{Y_1 Y_2\} - E\{Y_1\}E\{Y_2\} \\ E\{Y_1 Y_2\} - E\{Y_1\}E\{Y_2\} & E\{Y_2^2\} - E\{Y_2\}^2 \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \sigma_{1,2} \\ \sigma_{1,2} & \sigma_2^2 \end{pmatrix}$$

where the correlation $\rho_{1,2} = \sigma_{1,2}/(\sigma_1\sigma_2) \in [-1, 1]$ measures the consistency of the intervention.

6 Multivariate normal

One model for describing the first- and second-order moments of multivariate data is the multivariate normal model. We say that a $p$-dimensional data vector $Y$ has a multivariate normal (MVN) distribution if its sampling density is

$$p(y \mid \mu, \Sigma) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}} \exp\left\{-\frac{1}{2}(y-\mu)^\top \Sigma^{-1} (y-\mu)\right\}$$

where

$$y = \begin{pmatrix} y_1 \\ \vdots \\ y_p \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_p \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \sigma_1^2 & \cdots & \sigma_{1,p} \\ \vdots & \ddots & \vdots \\ \sigma_{1,p} & \cdots & \sigma_p^2 \end{pmatrix}$$
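As a quick numerical check of this density formula, here is a minimal Python/NumPy sketch; the parameter values are the bivariate ones used for illustration on the next slide, and the evaluation point is made up:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Bivariate normal parameters (p = 2)
mu = np.array([50.0, 50.0])
Sigma = np.array([[64.0, 48.0],
                  [48.0, 144.0]])
y = np.array([45.0, 60.0])  # hypothetical evaluation point

# Density via the formula above
p = len(mu)
dev = y - mu
quad = dev @ np.linalg.solve(Sigma, dev)  # (y - mu)^T Sigma^{-1} (y - mu)
dens = np.exp(-0.5 * quad) / ((2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma)))

# Should agree with scipy's implementation
assert np.isclose(dens, multivariate_normal(mu, Sigma).pdf(y))
```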

7 Example: bivariate normals

[Figure: density contours of three bivariate normals with $\mu = (50, 50)$ and $(\sigma_1^2, \sigma_2^2) = (64, 144)$, for $\sigma_{1,2} = -48$ ($\rho = -0.5$), $\sigma_{1,2} = 0$ ($\rho = 0$), and $\sigma_{1,2} = 48$ ($\rho = 0.5$).]

8 Notation & marginals

We shall write $Y \sim N_p(\mu, \Sigma)$ when $Y$ has a $p$-dimensional MVN distribution. An interesting feature of the MVN distribution is that the marginal distribution of each variable is univariate normal: $Y_j \sim N(\mu_j, \sigma_j^2)$.

9 Semiconjugate prior

Recall that if $Y_1, \ldots, Y_n$ are IID (univariate) normal, then a convenient conjugate prior distribution for the population mean is also (univariate) normal. Similarly, a convenient prior distribution for the multivariate mean $\mu$ is a MVN distribution, which we will parameterize as

$$p(\mu) = N_p(\mu;\, \mu_0, \Sigma_0)$$

where $\mu_0$ and $\Sigma_0$ are the prior mean and variance of $\mu$, respectively.

10 The essence of the prior

We have that if $\mu \sim N_p(\mu_0, \Sigma_0)$, then

$$p(\mu) \propto \exp\left\{-\frac{1}{2}\mu^\top A_0\, \mu + \mu^\top b_0\right\}$$

where $A_0 = \Sigma_0^{-1}$ and $b_0 = \Sigma_0^{-1}\mu_0$. Conversely, this result says that if a random vector $\mu$ has a density on $\mathbb{R}^p$ that is proportional to $\exp\{-\frac{1}{2}\mu^\top A \mu + \mu^\top b\}$ for some matrix $A$ and vector $b$, then $\mu$ must have a MVN distribution with covariance $A^{-1}$ and mean $A^{-1}b$.
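For completeness, the one-step derivation behind $A_0$ and $b_0$ just expands the Gaussian quadratic form and drops terms free of $\mu$:

$$-\tfrac{1}{2}(\mu-\mu_0)^\top \Sigma_0^{-1}(\mu-\mu_0) = -\tfrac{1}{2}\mu^\top \underbrace{\Sigma_0^{-1}}_{A_0}\mu + \mu^\top \underbrace{\Sigma_0^{-1}\mu_0}_{b_0} - \tfrac{1}{2}\mu_0^\top \Sigma_0^{-1}\mu_0$$

The last term is constant in $\mu$ and is absorbed into the proportionality.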

11 The essence of likelihood

If the sampling model is $\{Y_1, \ldots, Y_n \mid \mu, \Sigma\} \overset{\text{iid}}{\sim} N_p(\mu, \Sigma)$, then similar calculations show that the joint sampling density of the observed vectors $y_1, \ldots, y_n$ is

$$p(y_1, \ldots, y_n \mid \mu, \Sigma) \propto \exp\left\{-\frac{1}{2}\mu^\top A_1 \mu + \mu^\top b_1\right\}$$

where $A_1 = n\Sigma^{-1}$, $b_1 = n\Sigma^{-1}\bar{y}$, and $\bar{y} = \left(\frac{1}{n}\sum_i y_{i,1}, \ldots, \frac{1}{n}\sum_i y_{i,p}\right)^\top$.

12 The essence of posterior

Combining the prior and likelihood gives the posterior

$$\begin{aligned} p(\mu \mid y_1, \ldots, y_n, \Sigma) &\propto \exp\left\{-\tfrac{1}{2}\mu^\top A_1\mu + \mu^\top b_1\right\} \exp\left\{-\tfrac{1}{2}\mu^\top A_0\mu + \mu^\top b_0\right\} \\ &= \exp\left\{-\tfrac{1}{2}\mu^\top A_n \mu + \mu^\top b_n\right\} \end{aligned}$$

where

$$A_n = A_0 + A_1 = \Sigma_0^{-1} + n\Sigma^{-1} \qquad b_n = b_0 + b_1 = \Sigma_0^{-1}\mu_0 + n\Sigma^{-1}\bar{y}$$

13 The posterior

This implies that the posterior conditional distribution of $\mu$ must be MVN with covariance $A_n^{-1}$ and mean $A_n^{-1} b_n$, so

$$\mathrm{Cov}[\mu \mid y_1, \ldots, y_n, \Sigma] = \Sigma_n = (\Sigma_0^{-1} + n\Sigma^{-1})^{-1}$$
$$E\{\mu \mid y_1, \ldots, y_n, \Sigma\} = \mu_n = \Sigma_n(\Sigma_0^{-1}\mu_0 + n\Sigma^{-1}\bar{y})$$

giving $\{\mu \mid y_1, \ldots, y_n, \Sigma\} \sim N_p(\mu_n, \Sigma_n)$. Just like in the univariate case:

- the posterior precision (inverse covariance) is the sum of the prior precision and the data precision
- the posterior expectation is a precision-weighted average of the prior expectation and the sample mean
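As a concrete sketch of this update in Python/NumPy (the function name and interface are my own, not from the slides):

```python
import numpy as np

def mu_full_conditional(Y, mu0, Sigma0, Sigma):
    """Posterior mean mu_n and covariance Sigma_n of mu given Sigma,
    under the semi-conjugate MVN model above. Y is an (n, p) data matrix."""
    n = Y.shape[0]
    ybar = Y.mean(axis=0)
    prior_prec = np.linalg.inv(Sigma0)    # Sigma_0^{-1}
    data_prec = n * np.linalg.inv(Sigma)  # n * Sigma^{-1}
    Sigma_n = np.linalg.inv(prior_prec + data_prec)
    mu_n = Sigma_n @ (prior_prec @ mu0 + data_prec @ ybar)
    return mu_n, Sigma_n
```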

14 Prior covariance elements

Just as the variance $\sigma^2$ must be positive, the covariance matrix $\Sigma$ must be positive definite, i.e., $x^\top \Sigma x > 0$ for all vectors $x \neq 0$. Positive definiteness guarantees that $\sigma_j^2 > 0$ for all $j$ and that all correlations are between $-1$ and $1$. Another requirement is that the covariance matrix must be symmetric, i.e., $\sigma_{j,k} = \sigma_{k,j}$. Any valid prior distribution for $\Sigma$ must put all of its probability mass on this complicated set of symmetric, positive definite matrices.

15 Wishart distribution

A convenient family of distributions with just these properties is the Wishart family, the multivariate analogue of the gamma family. Recall that, in the univariate normal model, the gamma distribution is conjugate for the precision $1/\sigma^2$, so the conjugate prior for $\sigma^2$ is inverse-gamma (IG). Similarly, it turns out that the Wishart distribution is a semi-conjugate prior for the precision matrix $\Sigma^{-1}$, and so the semi-conjugate prior for $\Sigma$ is the inverse-Wishart.

16 Inverse-Wishart

To obtain $\Sigma \sim \mathrm{IW}(\nu_0, S_0^{-1})$, where $S_0$ is a $p \times p$ positive-definite matrix and $\nu_0$ is a positive integer:

1. Sample $z_1, \ldots, z_{\nu_0} \overset{\text{iid}}{\sim} N_p(0, S_0^{-1})$
2. Calculate $Z^\top Z = \sum_{i=1}^{\nu_0} z_i z_i^\top$
3. Set $\Sigma = (Z^\top Z)^{-1}$

Accordingly, the precision matrix $\Sigma^{-1}$ has a $W(\nu_0, S_0^{-1})$ distribution.
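A sketch of this constructive recipe in Python/NumPy (hypothetical helper name; it requires $\nu_0 \ge p$ so that $Z^\top Z$ is invertible):

```python
import numpy as np

def rinvwishart(nu0, S0, rng):
    """Draw Sigma ~ IW(nu0, S0^{-1}) via the three-step recipe above."""
    p = S0.shape[0]
    # 1. z_1, ..., z_{nu0} iid ~ N_p(0, S0^{-1})
    Z = rng.multivariate_normal(np.zeros(p), np.linalg.inv(S0), size=nu0)
    # 2. Z^T Z = sum_i z_i z_i^T has a Wishart(nu0, S0^{-1}) distribution
    # 3. invert to get the inverse-Wishart draw
    return np.linalg.inv(Z.T @ Z)
```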

17 Expected covariance

The expected precision and covariance under an inverse-Wishart are

$$E\{\Sigma^{-1}\} = \nu_0 S_0^{-1} \qquad E\{\Sigma\} = \frac{1}{\nu_0 - p - 1} S_0$$

As a prior for a MVN covariance: if we are confident that the true $\Sigma$ is near $\Sigma_0$, then we might choose $\nu_0$ large and set $S_0 = (\nu_0 - p - 1)\Sigma_0$, so that the distribution is tightly centered around $\Sigma_0$. If not, we may choose $\nu_0 = p + 2$ and $S_0 = \Sigma_0$, so that the distribution is loosely centered around $\Sigma_0$.

18 IW (prior) density

The density for $\Sigma \sim \mathrm{IW}(\nu_0, S_0^{-1})$ is given by

$$p(\Sigma) \propto |\Sigma|^{-(\nu_0 + p + 1)/2} \exp\left\{-\frac{1}{2}\mathrm{tr}(S_0 \Sigma^{-1})\right\}$$

where $\mathrm{tr}(\cdot)$ is the trace, or sum of the diagonal elements, of a matrix. An interesting result from linear algebra is that

$$\sum_{k=1}^{K} b_k^\top A\, b_k = \mathrm{tr}(B A B^\top) = \mathrm{tr}(B^\top B A)$$

where $B$ is a matrix whose $k$th row is $b_k^\top$.

19 Convenient likelihood

To combine the IW prior distribution with the sampling distribution for $Y_1, \ldots, Y_n$ we take advantage of the above result for traces:

$$p(y_1, \ldots, y_n \mid \mu, \Sigma) \propto |\Sigma|^{-n/2} \exp\left\{-\frac{1}{2}\sum_{i=1}^n (y_i - \mu)^\top \Sigma^{-1} (y_i - \mu)\right\} = |\Sigma|^{-n/2} \exp\left\{-\frac{1}{2}\mathrm{tr}(S_\mu \Sigma^{-1})\right\}$$

where $S_\mu = \sum_{i=1}^n (y_i - \mu)(y_i - \mu)^\top$ is the residual sum of squares matrix for the vectors $y_1, \ldots, y_n$ if the population mean is presumed to be $\mu$.

20 Posterior

We are now in a convenient position to combine the prior with the likelihood as follows:

$$\begin{aligned} p(\Sigma \mid y_1, \ldots, y_n, \mu) &\propto p(y_1, \ldots, y_n \mid \mu, \Sigma)\, p(\Sigma) \\ &= |\Sigma|^{-n/2} \exp\left\{-\tfrac{1}{2}\mathrm{tr}(S_\mu \Sigma^{-1})\right\} \times |\Sigma|^{-(\nu_0+p+1)/2} \exp\left\{-\tfrac{1}{2}\mathrm{tr}(S_0 \Sigma^{-1})\right\} \\ &= |\Sigma|^{-(\nu_0+n+p+1)/2} \exp\left\{-\tfrac{1}{2}\mathrm{tr}([S_0 + S_\mu]\Sigma^{-1})\right\} \end{aligned}$$

21 Posterior

Thus we have

$$\{\Sigma \mid y_1, \ldots, y_n, \mu\} \sim \mathrm{IW}(\nu_0 + n, [S_0 + S_\mu]^{-1})$$

Even though it was rough going to obtain, hopefully the result is intuitive:

- the posterior sample size $\nu_n = \nu_0 + n$ is the sum of the prior sample size and the data sample size
- similarly, the posterior residual sum of squares $S_n = S_0 + S_\mu$ is the sum of the prior sum of squares and the data sum of squares

22 Gibbs sampler

We now have the set of full posterior conditionals

$$\{\mu \mid y_1, \ldots, y_n, \Sigma\} \sim N_p(\mu_n, \Sigma_n)$$
$$\{\Sigma \mid y_1, \ldots, y_n, \mu\} \sim \mathrm{IW}(\nu_n, S_n^{-1})$$

which we may use to construct a Gibbs sampling (GS) algorithm to obtain a MCMC approximation to the joint posterior $p(\mu, \Sigma \mid y_1, \ldots, y_n)$.

23 Gibbs sampler

Given a starting value $\Sigma^{(0)}$, the GS algorithm generates $\{\mu^{(s+1)}, \Sigma^{(s+1)}\}$ from $\{\mu^{(s)}, \Sigma^{(s)}\}$ via the following two steps (see the sketch below):

1. Sample $\mu^{(s+1)}$ from its full conditional distribution:
   a) compute $\mu_n$ and $\Sigma_n$ from $y_1, \ldots, y_n$ and $\Sigma^{(s)}$
   b) sample $\mu^{(s+1)} \sim N_p(\mu_n, \Sigma_n)$
2. Sample $\Sigma^{(s+1)}$ from its full conditional distribution:
   a) compute $S_n$ from $y_1, \ldots, y_n$ and $\mu^{(s+1)}$
   b) sample $\Sigma^{(s+1)} \sim \mathrm{IW}(\nu_n, S_n^{-1})$

Note that $\{\mu_n, \Sigma_n\}$ depend on $\Sigma$, and $S_n$ depends on $\mu$, so these must be recalculated every iteration.
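Putting the two full conditionals together, here is a minimal Gibbs sampler sketch in Python/NumPy, reusing the hypothetical helpers mu_full_conditional and rinvwishart from earlier; following the upcoming example slide, $\Sigma^{(0)}$ starts at the sample covariance:

```python
import numpy as np

def gibbs_mvn(Y, mu0, Sigma0, nu0, S0, n_draws, seed=0):
    """Gibbs sampler for p(mu, Sigma | y_1, ..., y_n) under the
    semi-conjugate MVN model."""
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    Sigma = np.cov(Y.T)  # Sigma^(0): the sample covariance
    mus, Sigmas = [], []
    for _ in range(n_draws):
        # 1. mu | y, Sigma ~ N_p(mu_n, Sigma_n)
        mu_n, Sigma_n = mu_full_conditional(Y, mu0, Sigma0, Sigma)
        mu = rng.multivariate_normal(mu_n, Sigma_n)
        # 2. Sigma | y, mu ~ IW(nu0 + n, [S0 + S_mu]^{-1})
        resid = Y - mu
        Sigma = rinvwishart(nu0 + n, S0 + resid.T @ resid, rng)
        mus.append(mu)
        Sigmas.append(Sigma)
    return np.array(mus), np.array(Sigmas)
```

Quantities such as $P(\mu_2 > \mu_1 \mid y_1, \ldots, y_n)$ are then approximated by the fraction of draws with mu[1] > mu[0].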

24 Example: reading comprehension

Let's return to our motivating reading comprehension example. We shall model the 22 pairs of scores, before and after instruction, as IID samples from a MVN distribution. We start by thinking about the priors for $\mu$ and $\Sigma$. The exam was designed to give average scores of around 50 out of 100, so $\mu_0 = (50, 50)^\top$ is sensible as a prior expectation.

25 Example: mean prior variance

Since the true mean cannot be below 0 or above 100, it is desirable to use a prior variance for $\mu$ that puts little probability outside of this range. If we take $\sigma^2_{0,1} = \sigma^2_{0,2} = (50/2)^2 = 625$ then $P(\mu_j \notin [0, 100]) \approx 0.05$. Since the two exams are measuring the same thing, whatever the true values of $\mu_1$ and $\mu_2$ are, it is probable that they are close. We can reflect this with a prior correlation of 0.5, so that $\sigma_{0,12} = 0.5 \times 625 = 312.5$.
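A quick sanity check of that tail-probability claim, assuming nothing beyond $\mu_j \sim N(50, 625)$, i.e., a standard deviation of 25:

```python
from scipy.stats import norm

# P(mu_j outside [0, 100]) when mu_j ~ N(50, 25^2): [0, 100] is mean +/- 2 sd
p_out = norm.cdf(0, loc=50, scale=25) + norm.sf(100, loc=50, scale=25)
print(round(p_out, 3))  # about 0.046, matching the ~0.05 quoted above
```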

26 Example: prior covariance & data

Some of the same logic about the range of exam scores applies to choosing a prior for $\Sigma$. We'll take $S_0 = \Sigma_0$, but only loosely center $\Sigma$ around this value by taking $\nu_0 = p + 2 = 4$. The sufficient statistics of $y_1, \ldots, y_{22}$ needed for MVN inference are

$$\bar{y} = (47.18, 53.86)^\top \qquad (s_1^2, s_2^2) = (182.16,\ \cdot\ ) \qquad s_{1,2}/(s_1 s_2) = 0.70$$

27 Example: Gibbs sampler

Initialize the GS algorithm with

$$\Sigma^{(0)} = \begin{pmatrix} s_1^2 & s_{1,2} \\ s_{1,2} & s_2^2 \end{pmatrix}$$

[Figure: scatterplot of Gibbs samples of $(\mu_1, \mu_2)$ with the "no effect" line $\mu_2 = \mu_1$, and quantile-based 95% CIs (2.5% and 97.5% quantiles) for $\mu_2 - \mu_1$.]

We find $P(\mu_2 > \mu_1 \mid y_1, \ldots, y_{22}) > 0.99$: strong evidence that the instruction is working.

28 Example: Posterior predictive

What is the probability that a randomly selected child will score higher on the second exam than on the first?

[Figure: scatterplot of posterior predictive samples $(\tilde{y}_1, \tilde{y}_2)$ with the "no effect" line, and quantile-based 95% CIs (2.5% and 97.5% quantiles) for $\tilde{y}_2 - \tilde{y}_1$.]

Here $P(\tilde{y}_2 > \tilde{y}_1 \mid y_1, \ldots, y_{22})$ is noticeably smaller: a less significant, and possibly worrying, result!

29 Regression

Regression modeling is concerned with describing how the sampling distribution of one random variable $Y$ (the response variable) varies with another variable or set of variables $x = (x_1, \ldots, x_p)$ (the explanatory variable(s), or covariates). Specifically, a regression model postulates a form for $p(y \mid x)$, the conditional distribution of $Y$ given $x$. Estimation of $p(y \mid x)$ is made using data $y_1, \ldots, y_n$ gathered under a variety of conditions $x_1, \ldots, x_n$.

30 Linear model

One simple but flexible approach to regression is via the linear (sampling) model (LM). The LM treats responses $Y_i$ as independent (but not identically distributed) realizations of a process that is linear in the explanatory variables $x_i^\top = (x_{i,1}, \ldots, x_{i,p})$, observed with Gaussian noise:

$$Y_i \overset{\text{ind}}{\sim} N(\mu_i, \sigma^2), \qquad \text{where } \mu_i = E\{Y \mid x_i\} = x_i^\top \beta$$

where the $x_i$ are known, $\beta$ is an unknown $p$-dimensional parameter vector of regression coefficients, and $\sigma^2$ is an unknown variance parameter.

31 Compact notation

The LM is usually written as $Y = X\beta + \varepsilon$, where

$$Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}, \quad X = \begin{pmatrix} x_1^\top \\ \vdots \\ x_n^\top \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$

and $\{\varepsilon_1, \ldots, \varepsilon_n\} \overset{\text{iid}}{\sim} N(0, \sigma^2)$; $X$ is the $n \times p$ design matrix. Even more compact notation is

$$Y \sim N_n(X\beta, \sigma^2 I_n)$$

where $I_n$ is the $n \times n$ identity matrix.

32 Maximum Likelihood

As long as $X^\top X$ is invertible, which is the case when $X$ is of full rank $p$, the MLE is

$$\hat\beta = (X^\top X)^{-1} X^\top Y$$

Similarly, we may find

$$\hat\sigma^2 = \frac{1}{n}\|Y - X\hat\beta\|^2 = \frac{1}{n}\sum_{i=1}^n (Y_i - x_i^\top \hat\beta)^2$$
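A minimal Python/NumPy sketch of these two estimators (using lstsq rather than an explicit matrix inverse, for numerical stability):

```python
import numpy as np

def lm_mle(X, y):
    """MLE for the LM: beta_hat = (X^T X)^{-1} X^T y and sigma2_hat = RSS / n."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2_hat = np.sum((y - X @ beta_hat) ** 2) / len(y)
    return beta_hat, sigma2_hat
```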

33 Sampling the MLE

It may be shown that

$$\hat\beta \sim N_p(\beta, \sigma^2 (X^\top X)^{-1}) \qquad \text{and} \qquad \frac{n\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-p}$$

Moreover, $\hat\beta$ and $\hat\sigma^2$ are independent. These results may be used to construct confidence intervals, test hypotheses, etc., in a frequentist setup.

34 Example: Oxygen intake

Twelve healthy men who did not exercise regularly were recruited to take part in a study of the effects of two different exercise regimens on oxygen intake. Six were randomly assigned to a 12-week flat-terrain running program; the remaining six were assigned a 12-week step aerobics program. The maximal oxygen intake of each subject was measured (in liters per minute) while running on an inclined treadmill, both before and after the 12-week program.

35 Example: Oxygen intake

Of interest is how a subject's change in maximal oxygen intake may depend upon which program they were assigned to (one explanatory variable). However, other factors, such as age, are expected to affect the change in maximal intake as well (two explanatory variables). I.e., we wish to estimate the conditional distribution of oxygen intake for a given exercise program and age.

36 Example: Data

[Figure: change in maximal oxygen uptake versus age, with points labeled by program (aerobic vs. running).]

It is easy to imagine two straight lines, one for the aerobic points and the other for the running ones. How do we estimate them, and do we need two?

37 Example: Linear in covariates

A sensible LM may be constructed as follows (see the sketch below):

$$Y_i = \beta_1 x_{i,1} + \beta_2 x_{i,2} + \beta_3 x_{i,3} + \beta_4 x_{i,4} + \varepsilon_i$$

- $x_{i,1} = 1$ for each subject $i$ (intercept term)
- $x_{i,2} = 0$ if subject $i$ is running, 1 if aerobic
- $x_{i,3} =$ the age of subject $i$
- $x_{i,4} = x_{i,2} \times x_{i,3}$ (interaction term)

Thus, the 12 rows of $X$ are comprised of 4 columns: $x_i^\top = (x_{i,1}, x_{i,2}, x_{i,3}, x_{i,4})$.
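A sketch of how this design matrix might be built in Python/NumPy; the group labels and ages below are made up for illustration (the slides do not list the raw data):

```python
import numpy as np

group = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])  # 0 = running, 1 = aerobic
age = np.array([23, 22, 22, 25, 27, 20, 31, 23, 27, 28, 22, 24])  # hypothetical ages

X = np.column_stack([
    np.ones(len(age)),  # x_{i,1}: intercept
    group,              # x_{i,2}: program indicator
    age,                # x_{i,3}: age
    group * age,        # x_{i,4}: interaction
])
```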

38 Example: Explaining the terms in the model

Under this model, the conditional expectations of $Y$ for the two different levels of $x_{i,2}$ are

$$E\{Y \mid x\} = \beta_1 + \beta_3 \times \text{age}, \quad \text{if running}$$
$$E\{Y \mid x\} = (\beta_1 + \beta_2) + (\beta_3 + \beta_4) \times \text{age}, \quad \text{if aerobic}$$

The model assumes that the relationship is linear in age for both exercise groups, with the difference in intercepts given by $\beta_2$ and the difference in slopes by $\beta_4$.

39 Example: MLE/OLS inference

[Figure: fitted OLS lines over age for the aerobic and running groups.]

Classical tests indicate that the exercise program may not be significant: in the coefficients table (Estimate, Std. Error, t value, Pr(>|t|)), $x_1$ and $x_3$ are flagged as significant (**), while $x_2$ and $x_4$ are not.

40 Bayesian LM

We demonstrate how a simple semi-conjugate prior distribution for $\beta$ and $\sigma^2$ can be used when there is information available about the parameters. In situations where prior information is unavailable or difficult to quantify, an alternative default class of prior distributions is given. We shall see how the MLE $(\hat\beta, \hat\sigma^2)$ crops up as factors in our posterior distributions, with similar sampling distributions as posteriors in the default prior case.

41 Semi-conjugate prior

The sampling density of the data, as a function of $\beta$, is

$$\begin{aligned} p(y \mid X, \beta, \sigma^2) = N_n(y;\, X\beta, \sigma^2 I) &\propto \exp\left\{-\frac{(y - X\beta)^\top (y - X\beta)}{2\sigma^2}\right\} \\ &= \exp\left\{-\frac{1}{2\sigma^2}\left[y^\top y - 2\beta^\top X^\top y + \beta^\top X^\top X \beta\right]\right\} \end{aligned}$$

The role that $\beta$ plays in the exponent looks very similar to that played by $y$, which is MVN. This suggests that a MVN prior for $\beta$ is conjugate.

42 Semi-conjugate prior

If $\beta \sim N_p(\beta_0, \Sigma_0)$, then

$$p(\beta \mid y, X, \sigma^2) \propto \exp\left\{-\frac{1}{2}\beta^\top \left(\Sigma_0^{-1} + X^\top X/\sigma^2\right)\beta + \beta^\top \left(\Sigma_0^{-1}\beta_0 + X^\top y/\sigma^2\right)\right\}$$

which we recognize as proportional to a MVN with

$$\mathrm{Var}[\beta \mid y, X, \sigma^2] = \Sigma_n = \left(\Sigma_0^{-1} + X^\top X/\sigma^2\right)^{-1}$$
$$E\{\beta \mid y, X, \sigma^2\} = \beta_n = \Sigma_n \left(\Sigma_0^{-1}\beta_0 + X^\top y/\sigma^2\right)$$

i.e., $\{\beta \mid y, X, \sigma^2\} \sim N_p(\beta_n, \Sigma_n)$.

43 Interpretation

We can gain some understanding of the posterior (conditional) by considering some limiting cases of

$$\mathrm{Var}[\beta \mid y, X, \sigma^2] = \Sigma_n = \left(\Sigma_0^{-1} + X^\top X/\sigma^2\right)^{-1} \qquad E\{\beta \mid y, X, \sigma^2\} = \beta_n = \Sigma_n\left(\Sigma_0^{-1}\beta_0 + X^\top y/\sigma^2\right)$$

If the elements of the prior precision matrix $\Sigma_0^{-1}$ are small, then $\beta_n \approx \hat\beta$, the MLE (or OLS estimator). On the other hand, if the measurement precision $1/\sigma^2$ is very small ($\sigma^2$ is very large), then $\beta_n \approx \beta_0$, the prior expectation.

44 Conjugate prior variance

As in most normal sampling problems, the semi-conjugate prior for $\sigma^2$ is IG. If $\sigma^2 \sim \mathrm{IG}(\nu_0/2, \nu_0\sigma_0^2/2)$ then

$$p(\sigma^2 \mid y, X, \beta) \propto (\sigma^2)^{-(\nu_0+n)/2 - 1} \exp\left\{-\frac{1}{2\sigma^2}\left[\nu_0\sigma_0^2 + (y - X\beta)^\top (y - X\beta)\right]\right\}$$

which we recognize as an IG density, i.e.,

$$\{\sigma^2 \mid y, X, \beta\} \sim \mathrm{IG}\left(\frac{\nu_0 + n}{2}, \frac{\nu_0\sigma_0^2 + (y - X\beta)^\top (y - X\beta)}{2}\right)$$

45 Gibbs sampler

Using these full conditionals, we may construct a GS algorithm as follows (a code sketch follows): given current values $\{\beta^{(s)}, \sigma^{2(s)}\}$, new values can be generated by

1. updating $\beta$:
   a) compute $\beta_n(y, X, \sigma^{2(s)})$ and $\Sigma_n(y, X, \sigma^{2(s)})$
   b) sample $\beta^{(s+1)} \sim N_p(\beta_n, \Sigma_n)$
2. updating $\sigma^2$:
   a) compute $s^2 = (y - X\beta^{(s+1)})^\top (y - X\beta^{(s+1)})$
   b) sample $\sigma^{2(s+1)} \sim \mathrm{IG}([\nu_0 + n]/2, [\nu_0\sigma_0^2 + s^2]/2)$
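A minimal sketch of this sampler in Python/NumPy (the function name and interface are my own; the $\sigma^2$ draw uses the fact that $1/X \sim \mathrm{IG}(a, b)$ when $X \sim \mathrm{Gamma}(\text{shape } a, \text{rate } b)$):

```python
import numpy as np

def gibbs_lm(X, y, beta0, Sigma0, nu0, sigma20, n_draws, seed=0):
    """Gibbs sampler for p(beta, sigma^2 | y, X) under the semi-conjugate LM."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Sigma0_inv = np.linalg.inv(Sigma0)
    XtX, Xty = X.T @ X, X.T @ y
    sigma2 = sigma20  # sigma^2(0): start at the prior guess
    betas, sigma2s = [], []
    for _ in range(n_draws):
        # 1. beta | y, X, sigma^2 ~ N_p(beta_n, Sigma_n)
        Sigma_n = np.linalg.inv(Sigma0_inv + XtX / sigma2)
        beta_n = Sigma_n @ (Sigma0_inv @ beta0 + Xty / sigma2)
        beta = rng.multivariate_normal(beta_n, Sigma_n)
        # 2. sigma^2 | y, X, beta ~ IG((nu0 + n)/2, (nu0*sigma20 + s^2)/2)
        ssr = np.sum((y - X @ beta) ** 2)
        sigma2 = 1.0 / rng.gamma((nu0 + n) / 2.0, 2.0 / (nu0 * sigma20 + ssr))
        betas.append(beta)
        sigma2s.append(sigma2)
    return np.array(betas), np.array(sigma2s)
```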

46 Prior difficulty

The Bayesian analysis of the LM requires a specification of the prior parameters $(\beta_0, \Sigma_0)$ and $(\nu_0, \sigma_0^2)$. There are $O(p^2)$ such parameters, so this can be quite a monumental task even for modest $p$. Even when prior information exists (as in the oxygen intake example), sometimes an analysis must be done in the absence of prior information. Fortunately, there are some convenient weakly-informative priors that are easy to use.

47 Jeffreys prior

The (independence) Jeffreys prior for the LM is

$$p(\beta, \sigma^2) \propto 1/\sigma^2, \qquad \text{i.e.,} \quad p(\beta) \propto 1 \quad \text{and} \quad p(\sigma^2) \propto 1/\sigma^2$$

We may interpret the former as $\beta \sim N(0, \infty \cdot I_p)$, and the latter as $\sigma^2 \sim \mathrm{IG}(0, 0)$, from which we may easily derive

$$\beta \mid y, X, \sigma^2 \sim N_p(\hat\beta, \sigma^2 (X^\top X)^{-1})$$
$$\sigma^2 \mid y, X \sim \mathrm{IG}((n-p)/2, n\hat\sigma^2/2)$$

Check that the posterior is proper for $n > p + 1$.
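Under this Jeffreys prior no Markov chain is needed: draw $\sigma^2$ from its marginal, then $\beta$ from its conditional. A sketch in Python/NumPy (hypothetical function name):

```python
import numpy as np

def jeffreys_lm_samples(X, y, n_draws, seed=0):
    """Direct MC for p(beta, sigma^2 | y, X) under p(beta, sigma^2) ~ 1/sigma^2."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    nss = np.sum((y - X @ beta_hat) ** 2)  # = n * sigma2_hat
    # sigma^2 | y, X ~ IG((n - p)/2, n*sigma2_hat/2)
    sigma2 = 1.0 / rng.gamma((n - p) / 2.0, 2.0 / nss, size=n_draws)
    # beta | y, X, sigma^2 ~ N_p(beta_hat, sigma^2 (X^T X)^{-1})
    betas = np.array([rng.multivariate_normal(beta_hat, s2 * XtX_inv)
                      for s2 in sigma2])
    return betas, sigma2
```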

48 Marginal posterior variance

The marginal posterior of $\sigma^2$ is available in closed form under the IG (and Jeffreys) prior, because (conditional probability)

$$p(\beta, \sigma^2 \mid y, X) = p(\beta \mid \sigma^2, y, X)\, p(\sigma^2 \mid y, X)$$

Since we can sample from both of these distributions, samples from the joint posterior $p(\beta, \sigma^2 \mid y, X)$ may be obtained by MC.

49 Example: Oxygen intake

We will use the independence Jeffreys prior for this example, with the MC procedure using the marginal posterior $p(\sigma^2 \mid y, X)$.

[Figure: histograms of the marginal posterior samples of $\beta_1, \beta_2, \beta_3, \beta_4$, annotated with quantile-based 2.5% and 97.5% limits.]

50 Example: effect of aerobics?

The posterior distributions seem to suggest only weak evidence of a difference between the two groups, as the 95% CIs for $\beta_2$ and $\beta_4$ both contain zero. However, these parameters themselves do not quite tell the whole story. According to the model, the average difference in $y$ between two people of the same age $x$ but in different exercise programs is $\beta_2 + \beta_4 x$. Thus, the posterior distribution for the effect of the aerobics program over the running program is obtained via the posterior distribution of $\beta_2 + \beta_4 x$ for each $x$.

51 Example: effect of aerobics depends on age

[Figure: posterior quantiles of $\beta_2 + \beta_4 x$ as a function of age $x$.]

This indicates reasonably strong evidence of a difference at young ages, and less at older ones.

52 Prediction (forecasting)

Suppose that we wish to obtain the posterior predictive distribution of $Y(x^*)$ at a new location $x^*$. There are several ways in which we may go about sampling from this predictive distribution. The decomposition (by the law of total probability and conditional probability)

$$p(y^* \mid y, X, x^*) = \int p(y^* \mid x^*, \beta, \sigma^2)\, p(\beta, \sigma^2 \mid y, X)\, d\beta\, d\sigma^2$$

says that we may obtain MC samples as

$$y^{*(s)} \sim N(x^{*\top}\beta^{(s)}, \sigma^{2(s)})$$
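A sketch of this composition step in Python/NumPy, given posterior draws (betas, sigma2s) from either sampler above (one predictive draw per posterior draw):

```python
import numpy as np

def predictive_draws(x_star, betas, sigma2s, seed=0):
    """y*(s) ~ N(x*^T beta^(s), sigma^2(s)) for each posterior draw s."""
    rng = np.random.default_rng(seed)
    return rng.normal(betas @ x_star, np.sqrt(sigma2s))

# Predictive quantiles, e.g. for the figure on the next slide:
# np.quantile(predictive_draws(x_star, betas, sigma2s), [0.025, 0.5, 0.975])
```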

53 Example: predictive quantiles

[Figure: posterior predictive quantiles of the change in oxygen uptake versus age, for the aerobic and running groups.]
