Part 6: Multivariate Normal and Linear Models
2 Multiple measurements Up until now, all of our statistical models have been univariate: models for a single measurement on each member of a sample of individuals, or each run of a repeated experiment. However, datasets are frequently multivariate, having multiple measurements for each individual or experiment.
3 Multivariate model The most useful (most commonly used) model for multivariate data is the multivariate normal model. For a collection of variables, it allows us to jointly estimate population means, variances, and correlations.
4 Example: reading comprehension A sample of 22 children are given reading comprehension tests before and after receiving a particular instructional method. Each student i will have two scores, Y_{i,1} and Y_{i,2}, denoting the pre- and post-instructional scores respectively. We denote each student's pair of scores as a vector:

Y_i = (Y_{i,1}, Y_{i,2})ᵀ = (score on first test, score on second test)ᵀ
5 Example: quantities of interest The things we might be interested in include the population mean μ, particularly the difference μ_2 − μ_1:

E{Y} = (E{Y_{i,1}}, E{Y_{i,2}})ᵀ = (μ_1, μ_2)ᵀ

and the population covariance matrix

Σ = Cov[Y] = [ E{Y_1²} − E{Y_1}²          E{Y_1 Y_2} − E{Y_1}E{Y_2} ]
             [ E{Y_1 Y_2} − E{Y_1}E{Y_2}  E{Y_2²} − E{Y_2}²         ]
           = [ σ_1²     σ_{1,2} ]
             [ σ_{1,2}  σ_2²    ]

where ρ_{1,2} = σ_{1,2}/(σ_1 σ_2) ∈ [−1, 1] measures the consistency of the intervention.
6 Multivariate normal One model for describing first- and second-order moments of multivariate data is the multivariate normal model. We say that a p-dimensional data vector Y has a multivariate normal (MVN) distribution if its sampling density is

p(y | μ, Σ) = (2π)^{−p/2} |Σ|^{−1/2} exp( −(1/2) (y − μ)ᵀ Σ^{−1} (y − μ) )

where

y = (y_1, ..., y_p)ᵀ,  μ = (μ_1, ..., μ_p)ᵀ,  Σ = [ σ_1²    ...  σ_{1,p} ]
                                                  [ ...     ...  ...     ]
                                                  [ σ_{1,p} ...  σ_p²    ]
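As a quick numerical check of this density, a short Python sketch (using numpy and scipy; the mean and covariance are taken from the bivariate example on the next slide, and the evaluation point y is arbitrary):

```python
import numpy as np
from scipy.stats import multivariate_normal

def mvn_density(y, mu, Sigma):
    """(2*pi)^{-p/2} |Sigma|^{-1/2} exp(-(1/2)(y-mu)' Sigma^{-1} (y-mu))"""
    p = len(mu)
    d = y - mu
    quad = d @ np.linalg.solve(Sigma, d)  # (y-mu)' Sigma^{-1} (y-mu)
    return (2 * np.pi) ** (-p / 2) * np.linalg.det(Sigma) ** (-0.5) * np.exp(-0.5 * quad)

mu = np.array([50.0, 50.0])
Sigma = np.array([[64.0, 48.0], [48.0, 144.0]])   # sigma_{1,2} = 48, so rho = 0.5
y = np.array([52.0, 60.0])
# agrees with scipy's built-in MVN density
print(np.isclose(mvn_density(y, mu, Sigma), multivariate_normal(mu, Sigma).pdf(y)))
```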
7 Example: bivariate normals [Figure: contours of three bivariate normal densities with σ_{1,2} = −48 (ρ = −0.5), σ_{1,2} = 0 (ρ = 0), and σ_{1,2} = 48 (ρ = 0.5), each with μ = (50, 50) and (σ_1², σ_2²) = (64, 144).]
8 Notation & marginals We shall write Y ~ N_p(μ, Σ) when Y has a p-dimensional MVN distribution. An interesting feature of the MVN distribution is that the marginal distribution of each variable is univariate normal: Y_j ~ N(μ_j, σ_j²).
9 Semiconjugate prior Recall that if Y_1, ..., Y_n are IID (univariate) normal, then a convenient conjugate prior distribution for the population mean is also (univariate) normal. Similarly, a convenient prior distribution for the multivariate mean μ is a MVN distribution, which we will parameterize as

p(μ) = N_p(μ; μ_0, Λ_0)

where μ_0 and Λ_0 are the prior mean and variance of μ, respectively.
10 The essence of the prior We have that if μ ~ N_p(μ_0, Λ_0), then

p(μ) ∝ exp( −(1/2) μᵀ A_0 μ + μᵀ b_0 )

where A_0 = Λ_0^{−1} and b_0 = Λ_0^{−1} μ_0. Conversely, this result says that if a random vector μ has a density on R^p that is proportional to

exp( −(1/2) μᵀ A μ + μᵀ b )

for some matrix A and vector b, then μ must have a MVN distribution with covariance A^{−1} and mean A^{−1} b.
11 The essence of the likelihood If the sampling model is {Y_1, ..., Y_n | μ, Σ} ~ iid N_p(μ, Σ), then similar calculations show that the joint sampling density of the observed vectors y_1, ..., y_n is

p(y_1, ..., y_n | μ, Σ) ∝ exp( −(1/2) μᵀ A_1 μ + μᵀ b_1 )

where A_1 = n Σ^{−1}, b_1 = n Σ^{−1} ȳ, and ȳ = ( (1/n)∑_i y_{i,1}, ..., (1/n)∑_i y_{i,p} )ᵀ.
12 The essence of the posterior Combining the prior and likelihood gives the posterior

p(μ | y_1, ..., y_n, Σ) ∝ exp( −(1/2) μᵀ A_1 μ + μᵀ b_1 ) × exp( −(1/2) μᵀ A_0 μ + μᵀ b_0 )
                        = exp( −(1/2) μᵀ A_n μ + μᵀ b_n )

where

A_n = A_0 + A_1 = Λ_0^{−1} + n Σ^{−1}
b_n = b_0 + b_1 = Λ_0^{−1} μ_0 + n Σ^{−1} ȳ
13 The posterior This implies that the conditional posterior distribution of μ must be MVN with covariance A_n^{−1} and mean A_n^{−1} b_n, so

Cov[μ | y_1, ..., y_n, Σ] = Λ_n = (Λ_0^{−1} + n Σ^{−1})^{−1}
E{μ | y_1, ..., y_n, Σ} = μ_n = Λ_n (Λ_0^{−1} μ_0 + n Σ^{−1} ȳ)

giving {μ | y_1, ..., y_n, Σ} ~ N_p(μ_n, Λ_n). Just like in the univariate case: the posterior precision (inverse covariance) is the sum of the prior precision and the data precision, and the posterior expectation is a precision-weighted average of the prior expectation and the sample mean.
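These posterior quantities are a direct computation; a minimal numpy sketch, with Σ treated as known and illustrative values (the prior mean and covariance here anticipate the reading-comprehension example, and the simulated data merely stand in for real observations):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 22
Sigma = np.array([[64.0, 48.0], [48.0, 144.0]])     # treated as known here
y = rng.multivariate_normal([48.0, 54.0], Sigma, size=n)

mu0 = np.array([50.0, 50.0])                        # prior mean mu_0
L0 = np.array([[625.0, 312.5], [312.5, 625.0]])     # prior covariance Lambda_0
ybar = y.mean(axis=0)

iL0, iS = np.linalg.inv(L0), np.linalg.inv(Sigma)
An = iL0 + n * iS            # posterior precision = prior precision + data precision
bn = iL0 @ mu0 + n * iS @ ybar
Ln = np.linalg.inv(An)       # posterior covariance Lambda_n
mun = Ln @ bn                # posterior mean mu_n (precision-weighted average)
print(mun.round(2))
```

With n = 22 and a diffuse prior (variance 625), the data precision dominates and μ_n lands very close to ȳ.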
14 Prior covariance elements Just as the variance σ² must be positive, the covariance matrix Σ must be positive definite, i.e., xᵀ Σ x > 0 for all nonzero vectors x. Positive definiteness guarantees that σ_j² > 0 for all j and that all correlations are between −1 and 1. Another requirement is that the covariance matrix must be symmetric, i.e., σ_{j,k} = σ_{k,j}. Any valid prior distribution for Σ must put all of its probability mass on this complicated set of symmetric, positive definite matrices.
15 Wishart distribution A convenient family of distributions with just these properties is the Wishart, the multivariate analogue of the gamma family. Recall that, in the univariate normal model, the gamma distribution is conjugate for the precision 1/σ², so the conjugate prior for σ² is inverse-gamma (IG). Similarly, it turns out that the Wishart distribution is a semi-conjugate prior for the precision matrix Σ^{−1}, and so the conjugate prior for Σ is inverse-Wishart.
16 Inverse-Wishart To obtain Σ ~ IW(ν_0, S_0^{−1}), where S_0 is a p × p positive-definite matrix and ν_0 is a positive integer:

1. Sample z_1, ..., z_{ν_0} ~ iid N_p(0, S_0^{−1})
2. Calculate ZᵀZ = ∑_{i=1}^{ν_0} z_i z_iᵀ
3. Set Σ = (ZᵀZ)^{−1}

Accordingly, the precision matrix Σ^{−1} has a W(ν_0, S_0^{−1}) distribution.
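The three-step construction above can be sketched directly in numpy (the ν_0 = p + 2 and S_0 values are illustrative; the Monte Carlo check uses the Wishart mean E{Σ^{−1}} = ν_0 S_0^{−1} from the next slide):

```python
import numpy as np

def riwish(nu0, S0, rng):
    """Draw Sigma ~ IW(nu0, S0^{-1}) via the constructive definition above."""
    p = S0.shape[0]
    # step 1: nu0 draws from N_p(0, S0^{-1}); step 2: Z'Z; step 3: invert
    Z = rng.multivariate_normal(np.zeros(p), np.linalg.inv(S0), size=nu0)
    return np.linalg.inv(Z.T @ Z)

rng = np.random.default_rng(0)
S0 = np.array([[625.0, 312.5], [312.5, 625.0]])
nu0 = 4                                     # p + 2, a "loosely centered" choice
draws = [riwish(nu0, S0, rng) for _ in range(10000)]
# sanity check: the average precision should be near nu0 * S0^{-1}
prec_mean = np.mean([np.linalg.inv(d) for d in draws], axis=0)
print(np.round(prec_mean / np.linalg.inv(S0), 2))   # entries near nu0
```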
17 Expected covariance The expected precision and covariance under an inverse-Wishart are

E{Σ^{−1}} = ν_0 S_0^{−1}    and    E{Σ} = S_0 / (ν_0 − p − 1)

As a prior for a MVN covariance: if we are confident that the true Σ is near some Σ_0, then we might choose ν_0 large and set S_0 = (ν_0 − p − 1) Σ_0, so that the distribution is tightly centered around Σ_0. If not, we may choose ν_0 = p + 2 and S_0 = Σ_0, so that the distribution is loosely centered around Σ_0.
18 IW (prior) density The density for Σ ~ IW(ν_0, S_0^{−1}) is given by

p(Σ) ∝ |Σ|^{−(ν_0 + p + 1)/2} exp( −(1/2) tr(S_0 Σ^{−1}) )

where tr(·) is the trace, or sum of the diagonal elements, of a matrix. A useful result from linear algebra is that

∑_{k=1}^{K} b_kᵀ A b_k = tr(B A Bᵀ) = tr(Bᵀ B A)

where B is a matrix whose k-th row is b_kᵀ.
19 Convenient likelihood To combine the IW prior distribution with the sampling distribution for Y_1, ..., Y_n we take advantage of the above result for traces:

p(y_1, ..., y_n | μ, Σ) ∝ |Σ|^{−n/2} exp( −(1/2) ∑_{i=1}^{n} (y_i − μ)ᵀ Σ^{−1} (y_i − μ) )
                        = |Σ|^{−n/2} exp( −(1/2) tr(S_μ Σ^{−1}) )

where S_μ = ∑_{i=1}^{n} (y_i − μ)(y_i − μ)ᵀ is the residual sum of squares matrix for the vectors y_1, ..., y_n if the population mean is presumed to be μ.
20 Posterior We are now in a convenient position to combine the prior with the likelihood as follows:

p(Σ | y_1, ..., y_n, μ) ∝ p(y_1, ..., y_n | μ, Σ) × p(Σ)
  ∝ |Σ|^{−n/2} exp( −(1/2) tr(S_μ Σ^{−1}) ) × |Σ|^{−(ν_0 + p + 1)/2} exp( −(1/2) tr(S_0 Σ^{−1}) )
  = |Σ|^{−(ν_0 + n + p + 1)/2} exp( −(1/2) tr([S_0 + S_μ] Σ^{−1}) )
21 Posterior Thus we have

{Σ | y_1, ..., y_n, μ} ~ IW(ν_0 + n, [S_0 + S_μ]^{−1})

Even though it was rough going to obtain, hopefully the result is intuitive: the posterior sample size ν_n = ν_0 + n is the sum of the prior sample size and the data sample size. Similarly, the posterior residual sum of squares S_n = S_0 + S_μ is the sum of the prior sum of squares and the data sum of squares.
22 Gibbs sampler We now have the set of full posterior conditionals

{μ | y_1, ..., y_n, Σ} ~ N_p(μ_n, Λ_n)
{Σ | y_1, ..., y_n, μ} ~ IW(ν_n, S_n^{−1})

which we may use to construct a Gibbs sampler (GS) algorithm to obtain a MCMC approximation to the joint posterior p(μ, Σ | y_1, ..., y_n).
23 Gibbs sampler Given a starting value Σ^{(0)}, the GS algorithm generates {μ^{(s+1)}, Σ^{(s+1)}} from {μ^{(s)}, Σ^{(s)}} via the following two steps:

1. Sample μ^{(s+1)} from its full conditional distribution:
   a) compute μ_n and Λ_n from y_1, ..., y_n and Σ^{(s)}
   b) sample μ^{(s+1)} ~ N_p(μ_n, Λ_n)
2. Sample Σ^{(s+1)} from its full conditional distribution:
   a) compute S_n from y_1, ..., y_n and μ^{(s+1)}
   b) sample Σ^{(s+1)} ~ IW(ν_n, S_n^{−1})

Note that {μ_n, Λ_n} depend on Σ and S_n depends on μ, so these must be recalculated every iteration.
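The two-step sweep above can be sketched as a short numpy function. This is a minimal sketch, not the slides' actual code: the simulated score pairs and prior settings below are illustrative stand-ins for the real reading-comprehension data.

```python
import numpy as np

def gibbs_mvn(y, mu0, L0, nu0, S0, n_sweeps=2000, seed=0):
    """Gibbs sampler for the semi-conjugate MVN model: alternate the two
    full conditionals, recomputing (mu_n, Lambda_n) and S_n each sweep."""
    rng = np.random.default_rng(seed)
    n, p = y.shape
    ybar = y.mean(axis=0)
    iL0 = np.linalg.inv(L0)
    Sigma = np.cov(y.T)                      # start at the sample covariance
    mu_draws, Sigma_draws = [], []
    for _ in range(n_sweeps):
        # 1. mu | y, Sigma ~ N_p(mu_n, Lambda_n)
        iS = np.linalg.inv(Sigma)
        Ln = np.linalg.inv(iL0 + n * iS)
        mun = Ln @ (iL0 @ mu0 + n * iS @ ybar)
        mu = rng.multivariate_normal(mun, Ln)
        # 2. Sigma | y, mu ~ IW(nu0 + n, S_n^{-1}), S_n = S0 + S_mu
        resid = y - mu
        Sn = S0 + resid.T @ resid
        Z = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Sn), size=nu0 + n)
        Sigma = np.linalg.inv(Z.T @ Z)       # constructive IW draw
        mu_draws.append(mu)
        Sigma_draws.append(Sigma)
    return np.array(mu_draws), np.array(Sigma_draws)

# illustrative data standing in for the 22 score pairs
rng = np.random.default_rng(1)
y = rng.multivariate_normal([47.0, 54.0], [[182.0, 148.0], [148.0, 244.0]], size=22)
prior = np.array([[625.0, 312.5], [312.5, 625.0]])
mu_s, Sig_s = gibbs_mvn(y, np.array([50.0, 50.0]), prior, nu0=4, S0=prior)
print((mu_s[:, 1] > mu_s[:, 0]).mean())      # estimates P(mu_2 > mu_1 | y)
```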
24 Example: reading comprehension Let's return to our motivating reading comprehension example. We shall model the 22 pairs of scores, before and after instruction, as IID samples from a MVN distribution. We start by thinking about the priors for μ and Σ. The exam was designed to give average scores of around 50 out of 100, so μ_0 = (50, 50)ᵀ is sensible as a prior expectation.
25 Example: mean prior variance Since the true mean cannot be below 0 or above 100, it is desirable to use a prior variance for μ that puts little probability outside of this range. If we take λ²_{0,1} = λ²_{0,2} = (50/2)² = 625, then P(μ_j ∉ [0, 100]) ≈ 0.05. Since the two exams are measuring the same thing, whatever the true values of μ_1 and μ_2 are, it is probable that they are close. We can reflect this with a prior correlation of 0.5, so that λ_{0,12} = 0.5 × 625 = 312.5.
26 Example: prior covariance & data Some of the same logic about the range of exam scores applies to choosing a prior for Σ. We'll take S_0 = Λ_0, but only loosely center Σ around this value by taking ν_0 = p + 2 = 4. The sufficient statistics of y_1, ..., y_22 needed for MVN inference are

ȳ = (47.18, 53.86)ᵀ,  (s_1², s_2²) = (182.16, …),  s_{1,2}/(s_1 s_2) = 0.7
27 Example: Gibbs sampler Initialize the GS algorithm with

Σ^{(0)} = [ s_1²     s_{1,2} ]
          [ s_{1,2}  s_2²    ]

[Figure: scatterplot of posterior samples of (μ_1, μ_2) with the "no effect" line μ_2 = μ_1, and quantile-based CIs (2.5% and 97.5% posterior quantiles) for μ_1 and μ_2.] We find

P(μ_2 > μ_1 | y_1, ..., y_22) > 0.99

strong evidence that the instruction is working.
28 Example: Posterior predictive What is the probability that a randomly selected child will score higher on the second exam than on the first? [Figure: scatterplot of posterior predictive samples (ỹ_1, ỹ_2) with the "no effect" line, and quantile-based CIs (2.5% and 97.5% quantiles).] Here

P(ỹ_2 > ỹ_1 | y_1, ..., y_22) = …

a less significant, and possibly worrying, result!
29 Regression Regression modeling is concerned with describing how the sampling distribution of one random variable Y (the response variable) varies with another variable or set of variables x = (x_1, ..., x_p) (the explanatory variables, or covariates). Specifically, a regression model postulates a form for p(y | x), the conditional distribution of Y given x. Estimation of p(y | x) is made using data y_1, ..., y_n gathered under a variety of conditions x_1, ..., x_n.
30 Linear model One simple but flexible approach to regression is via the linear (sampling) model (LM). The LM treats responses Y_i as independent (but not identically distributed) realizations of a process that is linear in the explanatory variables x_i = (x_{i,1}, ..., x_{i,p})ᵀ, observed with Gaussian noise:

Y_i ~ ind N(μ_i, σ²),  where μ_i = E{Y | x_i} = βᵀ x_i

Here the x_i are known, β is an unknown p-dimensional vector of regression coefficients, and σ² is an unknown variance parameter.
31 Compact notation The LM is usually written as Y = Xβ + ε, where

Y = (Y_1, ..., Y_n)ᵀ,  β = (β_1, ..., β_p)ᵀ,  ε = (ε_1, ..., ε_n)ᵀ

X is the n × p design matrix whose i-th row is x_iᵀ, and {ε_1, ..., ε_n} ~ iid N(0, σ²). In even more compact notation,

Y ~ N_n(Xβ, σ² I_n)

where I_n is the n × n identity matrix.
32 Maximum Likelihood As long as XᵀX is invertible, which is the case when X is of full rank p, the MLE is

β̂ = (XᵀX)^{−1} Xᵀ Y

Similarly, we may find

σ̂² = (1/n) ||Y − Xβ̂||² = (1/n) ∑_{i=1}^{n} (Y_i − x_iᵀ β̂)²
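These two estimators are one line each in numpy; a sketch on simulated data (the design matrix and coefficients below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # p = 3, full rank
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)    # (X'X)^{-1} X'y
sigma2_hat = np.mean((y - X @ beta_hat) ** 2)   # (1/n) ||y - X beta_hat||^2
print(beta_hat.round(2))
```

Solving the normal equations with `np.linalg.solve` (or `lstsq`) is numerically preferable to forming (XᵀX)^{−1} explicitly.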
33 Sampling the MLE It may be shown that

β̂ ~ N_p(β, σ² (XᵀX)^{−1})  and  n σ̂²/σ² ~ χ²_{n−p}

Moreover, β̂ and σ̂² are independent. These results may be used to construct confidence intervals, test hypotheses, etc., in a frequentist setup.
34 Example: Oxygen intake 12 healthy men who did not exercise regularly were recruited to take part in a study of the effects of two different exercise regimens on oxygen intake. 6 were randomly assigned to a 12-week flat-terrain running program; the remaining 6 were assigned a 12-week step aerobics program. The maximum oxygen intake of each subject was measured (in liters per minute) while running on an inclined treadmill, both before and after the 12-week program.
35 Example: Oxygen intake Of interest is how a subject's change in maximal oxygen intake may depend upon which program they were assigned to (one explanatory variable). However, other factors, such as age, are expected to affect the change in maximal intake as well (two explanatory variables). I.e., we wish to estimate the conditional distribution of oxygen intake for a given exercise program and age.
36 Example: Data [Figure: change in maximal O2 uptake plotted against age, with separate symbols for the aerobic and running groups.] It is easy to imagine two straight lines, one for the aerobic points and the other for the running ones. How do we estimate them, and do we need two?
37 Example: Linear in covariates A sensible LM may be constructed as follows:

Y_i = β_1 x_{i,1} + β_2 x_{i,2} + β_3 x_{i,3} + β_4 x_{i,4} + ε_i

x_{i,1} = 1 for each subject i (intercept term)
x_{i,2} = 0 if subject i is running, 1 if aerobic
x_{i,3} = the age of subject i
x_{i,4} = x_{i,2} × x_{i,3} (interaction term)

Thus, the 12 rows of X are comprised of 4 columns: x_iᵀ = (x_{i,1}, x_{i,2}, x_{i,3}, x_{i,4}).
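Building this design matrix is mechanical; a numpy sketch with hypothetical ages and group assignments (the slide's actual data are not reproduced here, so these numbers are purely illustrative):

```python
import numpy as np

# hypothetical ages and group labels for the 12 subjects (illustration only)
age = np.array([23, 22, 22, 25, 27, 20, 31, 23, 27, 28, 22, 24], dtype=float)
aerobic = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1], dtype=float)

X = np.column_stack([
    np.ones(12),     # x_{i,1}: intercept
    aerobic,         # x_{i,2}: 0 = running, 1 = aerobic
    age,             # x_{i,3}: age
    aerobic * age,   # x_{i,4}: interaction
])
print(X.shape)
```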
38 Example: Explaining the terms in the model Under this model the conditional expectations of Y for the two different levels of x_{i,2} are

E{Y | x} = β_1 + β_3 × age, if running
E{Y | x} = (β_1 + β_2) + (β_3 + β_4) × age, if aerobic

The model assumes that the relationship is linear in age for both exercise groups, with the difference in intercepts given by β_2 and the difference in slopes by β_4.
39 Example: MLE/OLS inference [Figure: change in maximal O2 uptake vs. age, with fitted OLS lines for the aerobic and running groups.] Classical tests indicate that the exercise program may not be significant. [Coefficients table (estimate, standard error, t value, Pr(>|t|)): the intercept (x1) and age (x3) terms are flagged significant (**), while the group (x2) and interaction (x4) terms are not.]
40 Bayesian LM We demonstrate how a simple semi-conjugate prior distribution for β and σ² can be used when there is information available about the parameters. In situations where prior information is unavailable or difficult to quantify, an alternative default class of prior distributions is given. We shall see how the MLE (β̂, σ̂²) crops up as a factor in our posterior distributions, with similar sampling distributions as posteriors in the default-prior case.
41 Semi-conjugate prior The sampling density of the data, as a function of β, is

p(y | X, β, σ²) = N_n(y; Xβ, σ² I_n)
  ∝ exp( −(y − Xβ)ᵀ(y − Xβ) / (2σ²) )
  = exp( −[yᵀy − 2βᵀXᵀy + βᵀXᵀXβ] / (2σ²) )

The role that β plays in the exponent looks very similar to that played by y, which is MVN. This suggests that a MVN prior for β is conjugate.
42 Semi-conjugate prior If β ~ N_p(β_0, Σ_0), then

p(β | y, X, σ²) ∝ exp( −(1/2) βᵀ [Σ_0^{−1} + XᵀX/σ²] β + βᵀ [Σ_0^{−1} β_0 + Xᵀy/σ²] )

which we recognize as proportional to a MVN with

Var[β | y, X, σ²] = Σ_n = (Σ_0^{−1} + XᵀX/σ²)^{−1}
E{β | y, X, σ²} = β_n = Σ_n (Σ_0^{−1} β_0 + Xᵀy/σ²)

i.e., {β | y, X, σ²} ~ N_p(β_n, Σ_n).
43 Interpretation We can gain some understanding of the conditional posterior by considering some limiting cases of

Var[β | y, X, σ²] = Σ_n = (Σ_0^{−1} + XᵀX/σ²)^{−1}
E{β | y, X, σ²} = β_n = Σ_n (Σ_0^{−1} β_0 + Xᵀy/σ²)

If the elements of the prior precision matrix Σ_0^{−1} are small, then β_n ≈ β̂, the MLE (or OLS estimator). On the other hand, if the measurement precision is very small (σ² is very large), then β_n ≈ β_0, the prior expectation.
44 Conjugate prior variance As in most normal sampling problems, the semi-conjugate prior for σ² is inverse-gamma. If σ² ~ IG(ν_0/2, ν_0 σ_0²/2), then

p(σ² | y, X, β) ∝ (σ²)^{−(ν_0+n)/2 − 1} exp( −[ν_0 σ_0² + (y − Xβ)ᵀ(y − Xβ)] / (2σ²) )

which we recognize as an IG density, i.e.,

{σ² | y, X, β} ~ IG( (ν_0 + n)/2, [ν_0 σ_0² + (y − Xβ)ᵀ(y − Xβ)]/2 )
45 Gibbs sampler Using these full conditionals, we may construct a GS algorithm as follows: given current values {β^{(s)}, σ²^{(s)}}, new values can be generated by

1. updating β:
   a) compute β_n(y, X, σ²^{(s)}) and Σ_n(y, X, σ²^{(s)})
   b) sample β^{(s+1)} ~ N_p(β_n, Σ_n)
2. updating σ²:
   a) compute SSR = (y − Xβ^{(s+1)})ᵀ(y − Xβ^{(s+1)})
   b) sample σ²^{(s+1)} ~ IG([ν_0 + n]/2, [ν_0 σ_0² + SSR]/2)
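The two updating steps can be sketched as a short numpy function; a minimal sketch with illustrative simulated data and prior settings (an IG draw is obtained as the reciprocal of a gamma draw):

```python
import numpy as np

def gibbs_lm(y, X, beta0, Sigma0, nu0, s20, n_sweeps=2000, seed=0):
    """Gibbs sampler for the semi-conjugate LM.
    Priors: beta ~ N_p(beta0, Sigma0), sigma^2 ~ IG(nu0/2, nu0*s20/2)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    iS0 = np.linalg.inv(Sigma0)
    XtX, Xty = X.T @ X, X.T @ y
    sigma2 = s20
    beta_draws, s2_draws = [], []
    for _ in range(n_sweeps):
        # 1. beta | y, X, sigma^2 ~ N_p(beta_n, Sigma_n)
        Sn = np.linalg.inv(iS0 + XtX / sigma2)
        bn = Sn @ (iS0 @ beta0 + Xty / sigma2)
        beta = rng.multivariate_normal(bn, Sn)
        # 2. sigma^2 | y, X, beta ~ IG((nu0 + n)/2, [nu0*s20 + SSR]/2)
        ssr = np.sum((y - X @ beta) ** 2)
        sigma2 = 1.0 / rng.gamma((nu0 + n) / 2.0, 2.0 / (nu0 * s20 + ssr))
        beta_draws.append(beta)
        s2_draws.append(sigma2)
    return np.array(beta_draws), np.array(s2_draws)

# illustrative data: intercept-plus-slope model
rng = np.random.default_rng(2)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)
bs, s2s = gibbs_lm(y, X, np.zeros(2), 100.0 * np.eye(2), nu0=1, s20=1.0)
print(bs.mean(axis=0).round(2))
```

With this weak prior, the posterior mean of β should sit close to the OLS estimate.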
46 Prior difficulty The Bayesian analysis of the LM requires a specification of the prior parameters (β_0, Σ_0) and (ν_0, σ_0²). There are O(p²) such parameters, so this can be quite a monumental task even for modest p. Even when prior information exists (as in the oxygen intake example), sometimes an analysis must be done in the absence of prior information. Fortunately, there are some convenient weakly informative priors that are easy to use.
47 Jeffreys prior The (independence) Jeffreys prior for the LM is

p(β, σ²) ∝ 1/σ²,  i.e.,  p(β) ∝ 1 and p(σ²) ∝ 1/σ²

We may interpret the former as β ~ N(0, c I_p) with c → ∞, and the latter as σ² ~ IG(0, 0), from which we may easily derive

β | y, X, σ² ~ N_p(β̂, σ² (XᵀX)^{−1})
σ² | y, X ~ IG((n − p)/2, n σ̂²/2)

Check that the posterior is proper for n > p + 1.
48 Marginal posterior variance The marginal posterior of the variance is available in closed form under the IG (and Jeffreys) prior (by conditional probability). Since we can sample from both of these distributions, samples from the joint posterior p(β, σ² | y, X) may be obtained by MC: first draw σ² from its marginal posterior, then draw β given that σ².
49 Example: Oxygen intake We will use the independence Jeffreys prior for this example, with the MC procedure using the marginal posterior p(σ² | y, X). [Figure: histograms of the marginal posteriors of β_1, β_2, β_3, and β_4, each annotated with its 2.5% and 97.5% posterior quantiles.]
50 Example: effect of aerobics? The posterior distributions seem to suggest only weak evidence of a difference between the two groups, as the 95% CIs for β_2 and β_4 both contain zero. However, these parameters themselves do not quite tell the whole story. According to the model, the average difference in y between two people of the same age x but in different exercise programs is β_2 + β_4 × x. Thus, the posterior distribution for the effect of the aerobics program over the running program is obtained via the posterior distribution of β_2 + β_4 × x for each x.
51 Example: effect of aerobics depends on age [Figure: posterior quantiles of β_2 + β_4 × x plotted against age x.] This indicates reasonably strong evidence of a difference at young ages and less at older ones.
52 Prediction (forecasting) Suppose that we wish to obtain the posterior predictive distribution of Y(x*) at a new location x*. There are several ways in which we may go about sampling from this predictive distribution. The decomposition (by the law of total probability and conditional probability)

p(y* | y, X, x*) = ∫ p(y* | x*, β, σ²) p(β, σ² | y, X) dβ dσ²

says that we may obtain MC samples as y*^{(s)} ~ N(x*ᵀ β^{(s)}, σ²^{(s)}).
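Under the Jeffreys prior both posterior draws are available in closed form, so Monte Carlo prediction is a short loop. A numpy sketch on simulated data (here σ² is drawn from its marginal posterior IG((n − p)/2, SSR/2), a standard result under this prior; the data and x* are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)

# posterior ingredients under the Jeffreys prior
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
ssr = np.sum((y - X @ beta_hat) ** 2)
XtXinv = np.linalg.inv(X.T @ X)
xstar = np.array([1.0, 0.5])                  # new location x*

ystar = []
for _ in range(5000):
    sigma2 = 1.0 / rng.gamma((n - p) / 2.0, 2.0 / ssr)          # sigma^2 | y, X
    beta = rng.multivariate_normal(beta_hat, sigma2 * XtXinv)   # beta | y, X, sigma^2
    ystar.append(rng.normal(xstar @ beta, np.sqrt(sigma2)))     # y* | beta, sigma^2
ystar = np.array(ystar)
print(ystar.mean().round(1))
```

Quantiles of `ystar` give the predictive intervals plotted on the next slide's figure.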
53 Example: predictive quantiles [Figure: change in maximal O2 uptake vs. age, with posterior predictive quantiles for the aerobic and running groups.]
More informationCS281A/Stat241A Lecture 22
CS281A/Stat241A Lecture 22 p. 1/4 CS281A/Stat241A Lecture 22 Monte Carlo Methods Peter Bartlett CS281A/Stat241A Lecture 22 p. 2/4 Key ideas of this lecture Sampling in Bayesian methods: Predictive distribution
More informationCorrelation and Regression
Correlation and Regression October 25, 2017 STAT 151 Class 9 Slide 1 Outline of Topics 1 Associations 2 Scatter plot 3 Correlation 4 Regression 5 Testing and estimation 6 Goodness-of-fit STAT 151 Class
More informationRegression diagnostics
Regression diagnostics Kerby Shedden Department of Statistics, University of Michigan November 5, 018 1 / 6 Motivation When working with a linear model with design matrix X, the conventional linear model
More informationCOS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION
COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION SEAN GERRISH AND CHONG WANG 1. WAYS OF ORGANIZING MODELS In probabilistic modeling, there are several ways of organizing models:
More informationEfficient Bayesian Multivariate Surface Regression
Efficient Bayesian Multivariate Surface Regression Feng Li (joint with Mattias Villani) Department of Statistics, Stockholm University October, 211 Outline of the talk 1 Flexible regression models 2 The
More informationChapter 5. Chapter 5 sections
1 / 43 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions
More information2.1 Linear regression with matrices
21 Linear regression with matrices The values of the independent variables are united into the matrix X (design matrix), the values of the outcome and the coefficient are represented by the vectors Y and
More informationCS281A/Stat241A Lecture 17
CS281A/Stat241A Lecture 17 p. 1/4 CS281A/Stat241A Lecture 17 Factor Analysis and State Space Models Peter Bartlett CS281A/Stat241A Lecture 17 p. 2/4 Key ideas of this lecture Factor Analysis. Recall: Gaussian
More informationA Primer on Statistical Inference using Maximum Likelihood
A Primer on Statistical Inference using Maximum Likelihood November 3, 2017 1 Inference via Maximum Likelihood Statistical inference is the process of using observed data to estimate features of the population.
More informationBayesian Regression Linear and Logistic Regression
When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we
More informationGibbs Sampling in Latent Variable Models #1
Gibbs Sampling in Latent Variable Models #1 Econ 690 Purdue University Outline 1 Data augmentation 2 Probit Model Probit Application A Panel Probit Panel Probit 3 The Tobit Model Example: Female Labor
More informationIntroduction to Matrix Algebra and the Multivariate Normal Distribution
Introduction to Matrix Algebra and the Multivariate Normal Distribution Introduction to Structural Equation Modeling Lecture #2 January 18, 2012 ERSH 8750: Lecture 2 Motivation for Learning the Multivariate
More informationDependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.
Practitioner Course: Portfolio Optimization September 10, 2008 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y ) (x,
More informationRegression Models - Introduction
Regression Models - Introduction In regression models there are two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent
More informationGibbs Sampling in Linear Models #1
Gibbs Sampling in Linear Models #1 Econ 690 Purdue University Justin L Tobias Gibbs Sampling #1 Outline 1 Conditional Posterior Distributions for Regression Parameters in the Linear Model [Lindley and
More informationLecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011
Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector
More informationStatistical View of Least Squares
Basic Ideas Some Examples Least Squares May 22, 2007 Basic Ideas Simple Linear Regression Basic Ideas Some Examples Least Squares Suppose we have two variables x and y Basic Ideas Simple Linear Regression
More informationLinear Regression. In this lecture we will study a particular type of regression model: the linear regression model
1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor
More informationBayesian Inference for Normal Mean
Al Nosedal. University of Toronto. November 18, 2015 Likelihood of Single Observation The conditional observation distribution of y µ is Normal with mean µ and variance σ 2, which is known. Its density
More information9. Linear Regression and Correlation
9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,
More informationEconomics 240A, Section 3: Short and Long Regression (Ch. 17) and the Multivariate Normal Distribution (Ch. 18)
Economics 240A, Section 3: Short and Long Regression (Ch. 17) and the Multivariate Normal Distribution (Ch. 18) MichaelR.Roberts Department of Economics and Department of Statistics University of California
More informationECON 4160, Autumn term Lecture 1
ECON 4160, Autumn term 2017. Lecture 1 a) Maximum Likelihood based inference. b) The bivariate normal model Ragnar Nymoen University of Oslo 24 August 2017 1 / 54 Principles of inference I Ordinary least
More information(Moderately) Advanced Hierarchical Models
(Moderately) Advanced Hierarchical Models Ben Goodrich StanCon: January 12, 2018 Ben Goodrich Advanced Hierarchical Models StanCon 1 / 18 Obligatory Disclosure Ben is an employee of Columbia University,
More informationCorrelation and Linear Regression
Correlation and Linear Regression Correlation: Relationships between Variables So far, nearly all of our discussion of inferential statistics has focused on testing for differences between group means
More informationMultivariate Regression: Part I
Topic 1 Multivariate Regression: Part I ARE/ECN 240 A Graduate Econometrics Professor: Òscar Jordà Outline of this topic Statement of the objective: we want to explain the behavior of one variable as a
More informationGaussian Models (9/9/13)
STA561: Probabilistic machine learning Gaussian Models (9/9/13) Lecturer: Barbara Engelhardt Scribes: Xi He, Jiangwei Pan, Ali Razeen, Animesh Srivastava 1 Multivariate Normal Distribution The multivariate
More informationCopula Regression RAHUL A. PARSA DRAKE UNIVERSITY & STUART A. KLUGMAN SOCIETY OF ACTUARIES CASUALTY ACTUARIAL SOCIETY MAY 18,2011
Copula Regression RAHUL A. PARSA DRAKE UNIVERSITY & STUART A. KLUGMAN SOCIETY OF ACTUARIES CASUALTY ACTUARIAL SOCIETY MAY 18,2011 Outline Ordinary Least Squares (OLS) Regression Generalized Linear Models
More informationCOMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017
COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University FEATURE EXPANSIONS FEATURE EXPANSIONS
More informationThe Multivariate Normal Distribution 1
The Multivariate Normal Distribution 1 STA 302 Fall 2017 1 See last slide for copyright information. 1 / 40 Overview 1 Moment-generating Functions 2 Definition 3 Properties 4 χ 2 and t distributions 2
More informationCONJUGATE DUMMY OBSERVATION PRIORS FOR VAR S
ECO 513 Fall 25 C. Sims CONJUGATE DUMMY OBSERVATION PRIORS FOR VAR S 1. THE GENERAL IDEA As is documented elsewhere Sims (revised 1996, 2), there is a strong tendency for estimated time series models,
More informationIntroduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models
Introduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models Matthew S. Johnson New York ASA Chapter Workshop CUNY Graduate Center New York, NY hspace1in December 17, 2009 December
More informationSTAT 100C: Linear models
STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 56 Table of Contents Multiple linear regression Linear model setup Estimation of β Geometric interpretation Estimation of σ 2 Hat matrix Gram matrix
More informationIntroduction to Probabilistic Graphical Models: Exercises
Introduction to Probabilistic Graphical Models: Exercises Cédric Archambeau Xerox Research Centre Europe cedric.archambeau@xrce.xerox.com Pascal Bootcamp Marseille, France, July 2010 Exercise 1: basics
More informationBernoulli and Poisson models
Bernoulli and Poisson models Bernoulli/binomial models Return to iid Y 1,...,Y n Bin(1, ). The sampling model/likelihood is p(y 1,...,y n ) = P y i (1 ) n P y i When combined with a prior p( ), Bayes rule
More informationRegression Review. Statistics 149. Spring Copyright c 2006 by Mark E. Irwin
Regression Review Statistics 149 Spring 2006 Copyright c 2006 by Mark E. Irwin Matrix Approach to Regression Linear Model: Y i = β 0 + β 1 X i1 +... + β p X ip + ɛ i ; ɛ i iid N(0, σ 2 ), i = 1,..., n
More informationLinear Regression (9/11/13)
STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter
More informationGaussians Linear Regression Bias-Variance Tradeoff
Readings listed in class website Gaussians Linear Regression Bias-Variance Tradeoff Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University January 22 nd, 2007 Maximum Likelihood Estimation
More informationMarkov Chain Monte Carlo
Markov Chain Monte Carlo Recall: To compute the expectation E ( h(y ) ) we use the approximation E(h(Y )) 1 n n h(y ) t=1 with Y (1),..., Y (n) h(y). Thus our aim is to sample Y (1),..., Y (n) from f(y).
More information