A Covariance Regression Model


Peter D. Hoff and Xiaoyue Niu
Departments of Statistics and Biostatistics, University of Washington, Seattle, WA
Web: www.stat.washington.edu/~hoff. This work was partially supported by NSF grant SES.

December 8, 2009

Abstract

Classical regression analysis relates the expectation of a response variable to a linear combination of explanatory variables. In this article, we propose a covariance regression model that parameterizes the covariance matrix of a multivariate response vector as a parsimonious quadratic function of explanatory variables. The approach can be seen as analogous to the mean regression model, and has a representation as a type of random effects model. Parameter estimation for covariance regression is straightforward using either an EM algorithm or a Gibbs sampling scheme. The proposed methodology provides a simple but flexible representation of heteroscedasticity across the levels of an explanatory variable, and can give better-calibrated prediction regions when compared to a homoscedastic model.

Some key words: heteroscedasticity, Markov chain Monte Carlo, multivariate, positive definite cone, random effects.

1 Introduction

Estimation of a conditional mean function µ_x = E[y|x] is a well studied data-analysis task for which there are a large number of statistical models and procedures. Less studied is the problem of estimating a covariance function Σ_x = Cov[y|x] across a range of values of an explanatory x-variable. In the univariate case, several procedures assume that the variance can be expressed as a function of the mean, i.e. σ_x^2 = g(µ_x) for some known function g (see, for example, Carroll et al. [1982]). In many such cases the data can be represented by a generalized linear model with an appropriate variance function, or perhaps the data can be transformed to a scale in which the variance is constant as a function of the mean [Box and Cox, 1964]. Other approaches separately parameterize the mean and variance, giving either a linear model for the standard deviation [Rutemiller and Bowers, 1968] or by forcing the variance to be non-negative via a link function [Smyth, 1989].

In situations where the explanatory variable x is continuous and the variance function is assumed to be smooth, Carroll [1982] and Müller and Stadtmüller [1987] propose and study kernel estimates of the variance function.

Less developed are methods for multivariate heteroscedasticity. One exception is in the context of multivariate time series, for which a variety of multivariate autoregressive conditionally heteroscedastic (ARCH) models have been developed [Engle and Kroner, 1995, Fong et al., 2006]. However, the applicability of such models is limited to situations where the heteroscedasticity is temporal in nature. In this article we develop a simple model for a covariance function {Σ_x : x ∈ X} for which the domain of the explanatory x-variable is the same as in mean regression, that is, the explanatory vector can contain continuous, discrete and categorical variables. Our model is based on an analogy with linear regression. As a function of x, the covariance regression function Σ_x is a curve within the cone of positive definite matrices. A geometric interpretation of this model is developed in Section 2, along with a representation as a random effects model. Section 3 discusses methods of parameter estimation, including an EM algorithm for obtaining maximum likelihood estimates, as well as a Gibbs sampler for Bayesian inference. Section 4 illustrates the model with a simple data analysis involving a bivariate response vector and a univariate continuous explanatory variable. Section 5 summarizes the article and suggests directions for further research.

2 A covariance regression model

2.1 Model definition and geometry

Let y ∈ R^p be a random multivariate response vector and let x ∈ R^q be a vector of explanatory variables. Our goal is to provide a parsimonious model and estimation method for Cov[y|x] = Σ_x, the conditional covariance matrix of y given x. We begin by analogy with linear regression. The simple linear regression model expresses the conditional mean µ_x = E[y|x] as a + Bx, an affine function of x. This model restricts the p-dimensional vector µ_x to a q-dimensional subspace of R^p. The set of p × p covariance matrices is the cone of positive semidefinite matrices. This cone is convex and thus closed under addition. The simplest version of our proposed covariance regression model expresses Σ_x as

    Σ_x = A + B x x^T B^T,    (1)

where A is a p × p positive-definite matrix and B is a p × q matrix. The resulting covariance function is positive definite for all x, and expresses the covariance as equal to a baseline covariance matrix A plus a rank-1, p × p positive semidefinite matrix that depends on x.
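To make the form of (1) concrete, the following R sketch evaluates Σ_x over a grid of x-values and verifies that each matrix is positive definite. The values of A and B are hypothetical and chosen only for illustration (p = 2, with an intercept so that q = 2).

```r
## Covariance regression function Sigma_x = A + B x x^T B^T (Equation 1); illustrative values only
A <- matrix(c(1.0, 0.3,
              0.3, 0.5), 2, 2)            # baseline covariance (positive definite)
B <- matrix(c(0.5, 0.2,
              0.4, 0.6), 2, 2)            # p x q coefficient matrix
sigma_x <- function(x, A, B) A + B %*% x %*% t(x) %*% t(B)

xgrid  <- lapply(seq(-2, 2, by = 0.5), function(s) c(1, s))   # explanatory vectors (1, x)
Sigmas <- lapply(xgrid, sigma_x, A = A, B = B)

## every Sigma_x is a valid covariance matrix: symmetric with strictly positive eigenvalues
all(sapply(Sigmas, function(S) all(eigen(S, symmetric = TRUE)$values > 0)))
```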

The model given by Equation 1 is in some sense a natural generalization of mean regression to a model for covariance matrices. A vector mean function lies in a vector (linear) space, and is expressed as a linear map from R^q to R^p. The covariance matrix function lies in the cone of positive definite matrices, where the natural group action is matrix multiplication on the left and right. The covariance regression model expresses the covariance function via such a map from the q × q cone to the p × p cone. Letting {b_1, ..., b_p} be the rows of B, the covariance regression model gives

    Var[y_j | x] = a_{j,j} + b_j^T x x^T b_j    (2)
    Cov[y_j, y_k | x] = a_{j,k} + b_j^T x x^T b_k.    (3)

The parameterization of the variance suggests that the model requires the variance of each element of y to be increasing in the absolute value of the elements of x, as the minimum variance is obtained when x = 0. This constraint can be alleviated by including an intercept term so that the first element of the explanatory vector is 1. For example, in the case of a single scalar explanatory variable x we write b_j = (b_{1,j}, b_{2,j})^T, giving

    Var[y_j | x] = a_{j,j} + (b_{1,j} + b_{2,j} x)^2
    Cov[y_j, y_k | x] = a_{j,k} + (b_{1,j} + b_{2,j} x)(b_{1,k} + b_{2,k} x).

For any given finite interval (c, d) ⊂ R there exist parameter values (b_{1,j}, b_{2,j}) so that the variance of y_j is either increasing or decreasing in x for x ∈ (c, d).

We now consider the geometry of the covariance regression model. For each x, the model expresses Σ_x as equal to a point A inside the positive-definite cone plus a rank-1 positive-semidefinite matrix B x x^T B^T. The latter matrix is a point on the boundary of the cone, so the range of Σ_x as a function of x can be seen as a submanifold of the boundary of the cone, but pushed into the cone by an amount A. Figure 1 represents this graphically for the simplest of cases, in which p = 2 and there is just a single scalar explanatory variable x. In this case, each covariance matrix can be expressed as a three-dimensional vector (σ_1^2, σ_2^2, σ_{1,2}) such that σ_1^2 ≥ 0, σ_2^2 ≥ 0 and |σ_{1,2}| ≤ σ_1 σ_2. The set of such points constitutes the positive semidefinite cone, whose boundary is shown by the outer surfaces in the two plots in Figure 1. The range of B x x^T B^T over all x and matrices B includes the set of all rank-1 positive semidefinite matrices, which lie on the boundary of the cone. Thus the possible range of A + B x x^T B^T for a given A is simply the boundary of the cone, translated by an amount A. Such a translated cone is shown from two perspectives in Figure 1. For a given A and B, the covariance regression model expresses Σ_x as a curve on this translated boundary. A few such curves for six different values of B are shown in black in Figure 1.

Figure 1: The positive-definite cone and a translation, from two perspectives. The outer surface is the boundary of the positive definite cone, and the inner cone is equal to the boundary plus a positive definite matrix A. Black curves on the inner cone represent covariance regression curves A + B x x^T B^T for different values of B.

2.2 Random effects representation

The covariance regression model also has an interpretation as a type of random effects model. Consider a model for observed data y_1, ..., y_n of the following form:

    y_i = µ_{x_i} + γ_i B x_i + ɛ_i    (4)
    E[ɛ_i] = 0,  Cov[ɛ_i] = A
    E[γ_i] = 0,  Var[γ_i] = 1,  E[γ_i ɛ_i] = 0.

The resulting covariance matrix for y_i given x_i is then

    E[(y_i − µ_{x_i})(y_i − µ_{x_i})^T] = E[γ_i^2 B x_i x_i^T B^T + γ_i (B x_i ɛ_i^T + ɛ_i x_i^T B^T) + ɛ_i ɛ_i^T]
                                        = B x_i x_i^T B^T + A = Σ_{x_i}.

The model given by Equation 4 can be thought of as a factor analysis model in which the latent factor for unit i is restricted to be a multiple of the unit's explanatory vector x_i.
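The random effects form (4) also gives a direct way to simulate data and to check the induced covariance numerically. The sketch below uses hypothetical A and B, a single fixed x, a zero mean function, and MASS::mvrnorm for the normal errors, and compares the sample covariance of the simulated responses to A + B x x^T B^T.

```r
set.seed(1)
A <- matrix(c(1.0, 0.3, 0.3, 0.5), 2, 2)    # Cov[eps], illustrative
B <- matrix(c(0.5, 0.2, 0.4, 0.6), 2, 2)    # p x q, illustrative
x <- c(1, 1.5)                              # (intercept, scalar predictor)

n     <- 50000
gamma <- rnorm(n)                                    # mean-zero, unit-variance random effects
eps   <- MASS::mvrnorm(n, mu = c(0, 0), Sigma = A)   # rows are eps_i^T
Y     <- outer(gamma, drop(B %*% x)) + eps           # y_i = gamma_i B x_i + eps_i  (mu_x = 0)

cov(Y)                                       # sample covariance of the simulated y_i ...
A + B %*% x %*% t(x) %*% t(B)                # ... is close to Sigma_x from Equation 1
```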

To see how this impacts the variance, let {b_1, ..., b_p} be the rows of B. The model in (4) can then be expressed as

    (y_{i,1} − µ_{x_i,1}, ..., y_{i,p} − µ_{x_i,p})^T = γ_i (b_1^T x_i, ..., b_p^T x_i)^T + (ɛ_{i,1}, ..., ɛ_{i,p})^T.    (5)

We can interpret γ_i as describing additional unit-level variability beyond that represented by ɛ_i. The vectors {b_1, ..., b_p} describe how this additional variability is manifested across the p different response variables.

Via the above random effects representation, the covariance regression model can be seen as similar in spirit to a random effects model for longitudinal data discussed in Scott and Handcock [2001]. In that article, the covariance among a set of repeated measurements y_i from a single individual i was modeled as y_i = µ_i + γ_i X_i β + ɛ_i, where X_i is an observed design matrix for the repeated measurements and γ_i is a mean-zero, unit-variance random effect. In the longitudinal data application in that article, X_i was constructed from a set of basis functions evaluated at the observed time points, and β represented unknown weights. This model induces a covariance matrix of X_i β β^T X_i^T + Cov[ɛ_i] among the observations common to an individual. For the problem we are considering in this article, where the explanatory variables are shared among all p observations of a given unit (i.e. the rows of X_i are identical and equal to x_i), the covariance matrix induced by Scott and Handcock's model reduces to (x_i^T β)^2 1 1^T + Cov[ɛ_i], which is much more restrictive than the model given by (4).

2.3 Higher rank models

The model given by Equation 1 restricts the difference between Σ_x and the baseline matrix A to be a rank-one matrix. This restriction can be lifted by extending the model to allow for higher-rank deviations. Consider the following extension of the random effects representation given by Equation 4:

    y = µ_x + γ B x + ψ C x + ɛ,    (6)

where γ and ψ are mean-zero, variance-one random variables, uncorrelated with each other and with ɛ. Under this model, the covariance of y is given by Σ_x = A + B x x^T B^T + C x x^T C^T. This model allows the deviation of Σ_x from the baseline A to be of rank 2. Additionally, we can interpret the second random effect ψ as allowing an additional, independent source of heteroscedasticity for the set of the p response variables.

For the rank-2 model, Equation 5 becomes

    (y_{i,1} − µ_{x_i,1}, ..., y_{i,p} − µ_{x_i,p})^T = γ_i (b_1^T x_i, ..., b_p^T x_i)^T + ψ_i (c_1^T x_i, ..., c_p^T x_i)^T + (ɛ_{i,1}, ..., ɛ_{i,p})^T.

Whereas the rank-1 model essentially requires that extreme residuals for one element of y co-occur with extreme residuals of the other elements, the rank-2 model provides more flexibility, allowing for heteroscedasticity across multiple elements of y without requiring extreme residuals for all or none of the elements. Further flexibility can be gained by adding additional random effects, allowing the difference between Σ_x and the baseline A to be of any desired rank.

2.4 Identifiability

We first consider identifiability for the rank-1 model and a single scalar explanatory variable x. Including an intercept term so that the explanatory vector is (1, x)^T, the model in (1) becomes

    Σ_x(A, B) = A + b_1 b_1^T + (b_1 b_2^T + b_2 b_1^T) x + b_2 b_2^T x^2.

Now suppose that (Ã, B̃) are such that Σ_x(A, B) = Σ_x(Ã, B̃) for all x ∈ R. Setting x = 0 indicates that A + b_1 b_1^T = Ã + b̃_1 b̃_1^T. Considering x = ±1 implies that b_2 b_2^T = b̃_2 b̃_2^T and thus that b_2 = ±b̃_2. If b_2 ≠ 0, we have b_1 b_2^T + b_2 b_1^T = b̃_1 b̃_2^T + b̃_2 b̃_1^T, which implies that B̃ = ±B and à = A. Thus these parameters are identifiable, at least given an adequate range of x-values.

For the rank-r model with r > 1, consider the random effects representation given by

    y_i − µ_{x_i} = ∑_{k=1}^r γ_{i,k} B^{(k)} x_i + ɛ_i.

Let B_1 = (b_1^{(1)}, ..., b_1^{(r)}) be the p × r matrix defined by the first columns of B^{(1)}, ..., B^{(r)}, and define {B_k : k = 1, ..., q} similarly. The model can then be expressed as

    y_i − µ_{x_i} = ∑_{k=1}^q x_{i,k} B_k γ_i + ɛ_i,

where γ_i = (γ_{i,1}, ..., γ_{i,r})^T. Now suppose that γ_i is allowed to have a covariance matrix Ψ not necessarily equal to the identity. The above representation shows that the model given by {B_1, ..., B_q, Ψ} is equivalent to the one given by {B_1 Ψ^{1/2}, ..., B_q Ψ^{1/2}, I}, and so without loss of generality it can be assumed that Ψ = I, i.e. the random effects are independent with unit variance. In this case, note that Cov[γ_i] = Cov[H γ_i] where H is any r × r orthonormal matrix. This implies that the covariance function Σ_x given by {B_1, ..., B_q, I} is equal to the one given by {B_1 H, ..., B_q H, I} for any orthonormal H, and so the parameters in the higher rank model are not completely identifiable. One possible identifiability constraint is to restrict B_1 = (b_1^{(1)}, ..., b_1^{(r)}), the matrix of first columns of B^{(1)}, ..., B^{(r)}, to have orthogonal columns.
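The rotation invariance described above is easy to verify numerically. In the sketch below, hypothetical matrices B^(1) and B^(2) define a rank-2 covariance function; replacing them by the linear combinations induced by an orthonormal H leaves Σ_x unchanged.

```r
set.seed(2)
p <- 3; q <- 2
B1 <- matrix(rnorm(p * q), p, q)    # B^(1), hypothetical
B2 <- matrix(rnorm(p * q), p, q)    # B^(2), hypothetical
A  <- diag(p)

sigma_x <- function(x, Blist)
  A + Reduce(`+`, lapply(Blist, function(Bk) Bk %*% x %*% t(x) %*% t(Bk)))

## an orthonormal 2 x 2 rotation H; the rotated components are Btilde^(k) = sum_l H[l, k] B^(l)
theta <- 0.7
H   <- matrix(c(cos(theta), -sin(theta), sin(theta), cos(theta)), 2, 2)
B1t <- H[1, 1] * B1 + H[2, 1] * B2
B2t <- H[1, 2] * B1 + H[2, 2] * B2

x <- c(1, -0.8)
max(abs(sigma_x(x, list(B1, B2)) - sigma_x(x, list(B1t, B2t))))   # essentially zero
```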

3 Parameter estimation

3.1 Likelihood-based inference

In this section we consider parameter estimation based on the n × p data matrix Y = (y_1, ..., y_n)^T observed under conditions X = (x_1, ..., x_n)^T. We assume normal models for all error terms:

    γ_1, ..., γ_n ~ independent normal(0, 1)    (7)
    ɛ_1, ..., ɛ_n ~ independent multivariate normal(0, A)
    y_i = µ_{x_i} + γ_i B x_i + ɛ_i.

For now, assume {µ_x : x ∈ X} are known and let E = (e_1, ..., e_n)^T be the n × p matrix of residuals. The log likelihood of the parameters based on E and X is

    l(A, B : E, X) = c − (1/2) ∑_i log |A + B x_i x_i^T B^T| − (1/2) ∑_i tr[(A + B x_i x_i^T B^T)^{-1} e_i e_i^T].    (8)

After some algebra, it can be shown that the maximum likelihood estimates of A and B satisfy the following equations:

    ∑_i Σ̂_{x_i}^{-1} = ∑_i Σ̂_{x_i}^{-1} e_i e_i^T Σ̂_{x_i}^{-1}
    ∑_i Σ̂_{x_i}^{-1} B̂ x_i x_i^T = ∑_i Σ̂_{x_i}^{-1} e_i e_i^T Σ̂_{x_i}^{-1} B̂ x_i x_i^T,

where Σ̂_x = Â + B̂ x x^T B̂^T. While not providing closed-form expressions for  and B̂, these equations indicate that the maximum likelihood estimate of the covariance function has an inverse Σ̂_{x_i}^{-1} that, loosely speaking, acts on average as a pseudo-inverse for e_i e_i^T.

While direct maximization of (8) is challenging, the random effects representation of the model allows for parameter estimation via simple iterative methods. In particular, maximum likelihood estimation via the EM algorithm is straightforward, as is Bayesian estimation using a Gibbs sampler to approximate the posterior distribution p(A, B | Y, X). Both of these methods rely on the conditional distribution of {γ_1, ..., γ_n} given {Y, X, A, B}. Straightforward calculations give {γ_i | Y, X, A, B} ~ normal(m_i, v_i), where

    v_i = (1 + x_i^T B^T A^{-1} B x_i)^{-1}
    m_i = v_i (y_i − µ_{x_i})^T A^{-1} B x_i.

Given the variety of modeling options for mean regression, we do not cover estimation of {µ_x : x ∈ X} in the next two sections. In what follows we assume {µ_x : x ∈ X} are known or fixed at some estimated values. We note that both the EM algorithm and the Gibbs sampling scheme presented below can be modified to accommodate simultaneous estimation of the mean function.
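These expressions translate directly into code. The sketch below evaluates the log likelihood in (8) up to its additive constant and computes the conditional moments m_i and v_i, taking the residuals e_i as the rows of a matrix E and the explanatory vectors x_i as the rows of X (all object names here are illustrative).

```r
## Log likelihood (8), up to an additive constant, for residual matrix E (n x p) and design X (n x q)
loglik_AB <- function(A, B, E, X) {
  ll <- 0
  for (i in seq_len(nrow(E))) {
    S  <- A + B %*% X[i, ] %*% t(X[i, ]) %*% t(B)     # Sigma_{x_i}
    ll <- ll - 0.5 * as.numeric(determinant(S, logarithm = TRUE)$modulus) -
               0.5 * sum(diag(solve(S, tcrossprod(E[i, ]))))
  }
  ll
}

## Conditional moments of gamma_i given the data and (A, B)
gamma_moments <- function(A, B, E, X) {
  Ainv <- solve(A)
  v <- 1 / (1 + rowSums((X %*% t(B) %*% Ainv %*% B) * X))   # v_i = (1 + x_i^T B^T A^{-1} B x_i)^{-1}
  m <- v * rowSums((E %*% Ainv %*% B) * X)                  # m_i = v_i e_i^T A^{-1} B x_i
  list(m = m, v = v)
}
```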

3.2 Estimation with the EM algorithm

Let e_i = y_i − µ_{x_i} and E = (e_1, ..., e_n)^T. The EM algorithm proceeds by iteratively maximizing the expected value of the complete data log-likelihood, l(A, B) = log p(E | A, B, X, γ), which is simply obtained from the multivariate normal density:

    l(A, B) = −(1/2) ( np log(2π) + n log |A| + ∑_{i=1}^n (e_i − γ_i B x_i)^T A^{-1} (e_i − γ_i B x_i) ).    (9)

Given current estimates (Â, B̂) of (A, B), one step of the EM algorithm proceeds as follows: First, m_i = E[γ_i | Â, B̂, e_i] and v_i = Var[γ_i | Â, B̂, e_i] are computed and plugged into the likelihood (9), giving

    −2 E[l(A, B) | Â, B̂] = np log(2π) + n log |A| + ∑_{i=1}^n E[(e_i − γ_i B x_i)^T A^{-1} (e_i − γ_i B x_i) | Â, B̂],

where

    E[(e_i − γ_i B x_i)^T A^{-1} (e_i − γ_i B x_i) | Â, B̂] = (e_i − m_i B x_i)^T A^{-1} (e_i − m_i B x_i) + v_i x_i^T B^T A^{-1} B x_i
                                                           = (e_i − m_i B x_i)^T A^{-1} (e_i − m_i B x_i) + (s_i x_i)^T B^T A^{-1} B (s_i x_i),

with s_i = v_i^{1/2}. Next, a 2n × q matrix X̃ is constructed, having ith row equal to m_i x_i and (n + i)th row equal to s_i x_i. Additionally, let Ẽ be the 2n × p matrix (E^T, 0 · E^T)^T, that is, E stacked above an n × p matrix of zeros. The expected value of the complete data log-likelihood can then be written as

    −2 E[l(A, B) | Â, B̂] − np log(2π) = n log |A| + tr( [Ẽ − X̃ B^T]^T [Ẽ − X̃ B^T] A^{-1} ),

which is essentially the likelihood for normal multivariate regression. The next step of the EM algorithm obtains the new values (Â, B̂) as the maximizers of this expected likelihood, which are given by

    B̂ = Ẽ^T X̃ (X̃^T X̃)^{-1}
    Â = (Ẽ − X̃ B̂^T)^T (Ẽ − X̃ B̂^T) / n.

This procedure is repeated until a desired convergence criterion has been met.
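A compact implementation of these updates might look as follows; this is a sketch of the rank-1 algorithm under the assumptions of this section (known mean function, residuals in E, explanatory vectors in X), not the packaged code referred to later in the article.

```r
## EM algorithm for the rank-1 covariance regression model (illustrative sketch)
covreg_em <- function(E, X, maxit = 500, tol = 1e-8) {
  n <- nrow(E); p <- ncol(E); q <- ncol(X)
  A <- cov(E)                                  # start from the homoscedastic estimate
  B <- matrix(0.01, p, q)
  for (it in seq_len(maxit)) {
    ## E-step: conditional moments of the gamma_i
    Ainv <- solve(A)
    v <- 1 / (1 + rowSums((X %*% t(B) %*% Ainv %*% B) * X))
    m <- v * rowSums((E %*% Ainv %*% B) * X)
    ## M-step: multivariate regression of Etilde on Xtilde
    Xt <- rbind(m * X, sqrt(v) * X)            # 2n x q matrix with rows m_i x_i and s_i x_i
    Et <- rbind(E, matrix(0, n, p))            # E stacked above an n x p matrix of zeros
    B_new <- t(Et) %*% Xt %*% solve(t(Xt) %*% Xt)
    A_new <- crossprod(Et - Xt %*% t(B_new)) / n
    converged <- max(abs(B_new - B), abs(A_new - A)) < tol
    A <- A_new; B <- B_new
    if (converged) break
  }
  list(A = A, B = B, iterations = it)
}
```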

3.3 Posterior approximation with the Gibbs sampler

A Bayesian analysis provides estimates and confidence intervals for arbitrary functions of the parameters, as well as a simple way of making predictive inference for future observations. Given a prior distribution p(A, B), inference is based on the joint posterior distribution,

    p(A, B | Y, X) ∝ p(A, B) × p(Y | X, A, B).

While this posterior distribution is not available in closed form, a Monte Carlo approximation to the joint posterior distribution of (A, B) is available via Gibbs sampling. Using the random effects representation of the model in Equation 7, the Gibbs sampler constructs a Markov chain in {A, B, γ_1, ..., γ_n} whose stationary distribution is equal to the joint posterior distribution of these quantities. Calculations are facilitated by the use of a semi-conjugate prior distribution for A and B, in which p(A) is an inverse-Wishart(A_0^{-1}, ν_0) distribution having expectation A_0/(ν_0 − p − 1), and p(B | A) is a matrix normal(B_0, A, V_0) distribution, such that E[B | A] = B_0, E[(B − B_0)(B − B_0)^T | A] = A tr(V_0) and E[(B − B_0)^T (B − B_0) | A] = V_0 tr(A).

The Gibbs sampler proceeds by iteratively sampling (A, B) and {γ_1, ..., γ_n} from their full conditional distributions. As with the EM algorithm, we consider inference given values of {µ_x : x ∈ X}, letting e_i = y_i − µ_{x_i} and E = (e_1, ..., e_n)^T. One iteration of the Gibbs sampler consists of the following steps:

1. Sample γ_i ~ normal(m_i, v_i) for each i ∈ {1, ..., n}, where v_i = (1 + x_i^T B^T A^{-1} B x_i)^{-1} and m_i = v_i e_i^T A^{-1} B x_i.

2. Sample (A, B) ~ p(A, B | E, X, γ_1, ..., γ_n) as follows:
   (a) sample A ~ inverse-Wishart(A_n^{-1}, ν_0 + n), and
   (b) sample B ~ matrix normal(B_n, A, [X_γ^T X_γ + V_0^{-1}]^{-1}),
   where X_γ = ΓX with Γ = diag(γ_1, ..., γ_n),
   B_n = (E^T X_γ + B_0 V_0^{-1})(X_γ^T X_γ + V_0^{-1})^{-1}, and
   A_n = A_0 + (E − X_γ B_n^T)^T (E − X_γ B_n^T) + (B_n − B_0) V_0^{-1} (B_n − B_0)^T.

In the absence of strong prior information, default values for the prior parameters {B_0, V_0, A_0, ν_0} can be based on other considerations. In normal regression, for example, Zellner [1986] suggests a g-prior which makes the Bayes procedure invariant to linear transformations of the design matrix X. An analogous result for covariance regression can be obtained by selecting B_0 = 0 and V_0 = g(X^T X)^{-1}, i.e. by relating the prior precision of B to the precision given by the observed design matrix. A typical choice for g is to set g = n so that, roughly speaking, the information in the prior distribution is equivalent to that contained in one observation. Such choices lead to what Kass and Wasserman [1995] call a unit-information prior distribution, which in some cases weakly centers the prior distribution around an estimate based on the data. For example, setting ν_0 = p + 2 and A_0 equal to the sample covariance matrix of E weakly centers the prior distribution of A around a homoscedastic sample estimate.
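A sketch of this sampler for the rank-1 model is given below, using the default unit-information prior choices just described; the inverse-Wishart draw uses stats::rWishart and the matrix normal draw uses Cholesky factors. Burn-in, thinning and the higher-rank extension are omitted.

```r
## Gibbs sampler for the rank-1 covariance regression model (illustrative sketch)
covreg_gibbs <- function(E, X, nscan = 5000) {
  n <- nrow(E); p <- ncol(E); q <- ncol(X)
  ## default priors: B0 = 0, V0 = n (X^T X)^{-1}, nu0 = p + 2, A0 = sample covariance of E
  B0 <- matrix(0, p, q); V0i <- crossprod(X) / n; nu0 <- p + 2; A0 <- cov(E)
  A <- A0; B <- matrix(0, p, q)
  keep <- vector("list", nscan)
  for (s in seq_len(nscan)) {
    ## 1. sample the random effects gamma_i from their full conditionals
    Ainv <- solve(A)
    v <- 1 / (1 + rowSums((X %*% t(B) %*% Ainv %*% B) * X))
    m <- v * rowSums((E %*% Ainv %*% B) * X)
    gamma <- rnorm(n, m, sqrt(v))
    ## 2. sample (A, B) given the gamma_i
    Xg <- gamma * X                                        # X_gamma = diag(gamma) X
    Vn <- solve(crossprod(Xg) + V0i)
    Bn <- (t(E) %*% Xg + B0 %*% V0i) %*% Vn
    An <- A0 + crossprod(E - Xg %*% t(Bn)) + (Bn - B0) %*% V0i %*% t(Bn - B0)
    A  <- solve(rWishart(1, nu0 + n, solve(An))[, , 1])    # inverse-Wishart(An^{-1}, nu0 + n)
    Z  <- matrix(rnorm(p * q), p, q)
    B  <- Bn + t(chol(A)) %*% Z %*% chol(Vn)               # matrix normal(Bn, A, Vn)
    keep[[s]] <- list(A = A, B = B)
  }
  keep
}
```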

3.4 Estimation for higher-rank models

Section 2.3 discussed the possibility of a more flexible covariance regression model, obtained by allowing the deviation between A and Σ_x to be of rank greater than one. The general form for a rank-r covariance regression model is given by

    y_i = µ_{x_i} + ∑_{k=1}^r γ_{i,k} B^{(k)} x_i + ɛ_i = µ_{x_i} + B (γ_i ⊗ x_i) + ɛ_i,

where B = (B^{(1)}, ..., B^{(r)}). Estimation for this model can proceed with a small modification of the Gibbs sampling algorithm given above, in which B^{(k)} and {γ_{i,k} : i = 1, ..., n} are updated for each k ∈ {1, ..., r} separately. Alternatively, the full conditional distributions of B and {γ_1, ..., γ_n} are available in closed form, and so the B- and γ-parameters for all ranks could be updated simultaneously. However, in our experience the calculation of these full conditional distributions is computationally costly: the full conditional distributions of the γ_i's involve separate matrix inversions for each i = 1, ..., n (or more precisely, for each unique value of the x_i's). Moreover, sampling the random effects associated with all ranks simultaneously greatly slows down the Markov chain without providing improved performance in terms of convergence or mixing.

An EM algorithm is also available for estimation of this general rank model. The main modification to the algorithm presented in Section 3.2 is that the conditional distribution of each γ_i is a multivariate normal distribution, which leads to a more complex E-step in the procedure, while the M-step is equivalent to a multivariate least-squares regression estimation as before. We note that, in our experience, convergence of the EM algorithm for ranks greater than 1 can be slow, presumably due to the identifiability issue described in Section 2.4. More details about these estimation algorithms for the general rank model are available from the companion computer code for this article, available at the first author's website.
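As a check on the stacked representation used above (reading B(γ_i ⊗ x_i) with B = (B^{(1)}, ..., B^{(r)}) concatenated column-wise and ⊗ the Kronecker product, a convention assumed here), the two forms of the rank-2 deviation can be compared numerically:

```r
## sum_k gamma_k B^(k) x  versus  B (gamma %x% x), with B = (B^(1), B^(2))
set.seed(3)
p <- 3; q <- 2
B1 <- matrix(rnorm(p * q), p, q); B2 <- matrix(rnorm(p * q), p, q)   # hypothetical B^(1), B^(2)
Bstack <- cbind(B1, B2)                    # p x (r q) matrix
x <- c(1, 0.4); gamma <- rnorm(2)

lhs <- gamma[1] * B1 %*% x + gamma[2] * B2 %*% x
rhs <- Bstack %*% kronecker(gamma, x)
max(abs(lhs - rhs))                        # essentially zero
```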

4 An example with a single continuous predictor

4.1 Heteroscedastic FEV and height data

To illustrate the use of the covariance regression model we analyze data on forced expiratory volume (FEV) in liters and height in inches of 654 Boston youths [Rosner, 2000]. One feature of these data is the general increase in the variance of these variables with age, as shown in Figure 2. As the mean responses for these two variables are also increasing with age, one possible modeling strategy is to apply a variance stabilizing transformation to the data. In general, such transformations presume a particular mean-variance relationship, and choosing an appropriate transformation can be prone to much subjectivity. As an alternative, a covariance regression model allows heteroscedasticity to be modeled separately from heterogeneity in the mean, and also allows for modeling on the original scale of the data.

Figure 2: FEV and height data as a function of age. The smooth lines are local polynomial fits.

4.2 Maximum likelihood estimation

Ages for the 654 subjects ranged from 3 to 19 years, although there were only two 3-year-olds and three 19-year-olds. As we will be using plug-in estimates of µ_x, we combine the data from children of ages 3 and 19 with those of the 4- and 18-year-olds, respectively, giving a sample size of at least 8 in each age category. To focus the example on the covariance regression model, we take as our data the bivariate residuals from two local polynomial regression fits (using loess in the R statistical computing environment), one for each of FEV and height. We then use the EM algorithm described in Section 3 to fit the following two covariance regression models:

    Model 1: a rank-1 model with x_i = (1, age_i^{1/2});
    Model 2: a rank-2 model with x_i = (1, age_i^{1/2}, age_i).

Note that including age^{1/2} as a regressor results in there being a linear component to the modeled relationship between age and the variances and covariance.
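The data preparation for this example can be sketched as follows. The data frame and column names below (fev_dat with columns age, fev and height) are placeholders, since the Rosner data are not bundled with this sketch; the fit uses the covreg_em sketch from Section 3.2, so only the rank-1 Model 1 is fit here.

```r
## bivariate residuals from local polynomial (loess) fits, and design matrices for Models 1 and 2
fev_dat$age_adj <- pmin(pmax(fev_dat$age, 4), 18)        # pool ages 3 and 19 with 4 and 18

e_fev    <- resid(loess(fev    ~ age_adj, data = fev_dat))
e_height <- resid(loess(height ~ age_adj, data = fev_dat))
E <- cbind(e_fev, e_height)

X1 <- cbind(1, sqrt(fev_dat$age_adj))                    # Model 1: x = (1, age^(1/2)), rank 1
X2 <- cbind(1, sqrt(fev_dat$age_adj), fev_dat$age_adj)   # Model 2: x = (1, age^(1/2), age), rank 2

fit1 <- covreg_em(E, X1)   # rank-1 fit; a rank-2 fit requires the general-rank algorithm of Section 3.4
```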

The maximized log likelihoods of the two models give them roughly the same value of the AIC. However, the increased flexibility of Model 2 over Model 1 is highlighted in Figure 3, which plots the fitted variances and covariance of FEV and height as a function of age, along with the sample variances and correlations for each age group. The plots suggest that the rank-2 model has sufficient flexibility to capture the observed trends in Σ_x as a function of age.

Figure 3: Sample variances and correlations as a function of age, along with covariance regression fits. The gray lines correspond to the rank-1 model with x = (1, age^{1/2}); the black lines correspond to the rank-2 model with x = (1, age^{1/2}, age).

4.3 Posterior predictive distributions

One potential application of the covariance regression model is to make predictive regions for multivariate observations. Erroneously assuming a covariance matrix to be constant in x could give a prediction region with correct coverage probability for an entire population, but incorrect for specific values of x, and incorrect for making generalizations to populations having a distribution of x-values that is different from that of the data. Predictive inference is straightforward to implement in the context of Bayesian estimation: the prior distributions and data generate a predictive distribution p(ỹ | x, Y, X) for each possible value of x, which can be approximated via the output from the Markov chain Monte Carlo algorithm described in Section 3.

Using Model 2 described above and the default prior distributions discussed in Section 3, 50,000 iterations of the Gibbs sampler were generated, the first 1,000 of which were discarded to allow for convergence to the stationary distribution. Parameter values were saved every 10th iteration thereafter, leaving 4,900 saved values with which to make Monte Carlo approximations. For each of the 4,900 generated values of {A, B}, we constructed Σ_x(A, B) for each age from 4 to 18, yielding 45 parameters for each value of {A, B}. Effective sample sizes (roughly, the equivalent number of independent Monte Carlo samples) for these 45 parameters were all above 1000, with the exception of σ_1^2 and σ_{1,2} for the 18-year-old age group, which had effective sample sizes of 988 and 713 respectively.
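Diagnostics of this kind can be computed from the saved draws. The sketch below uses the coda package together with the covreg_gibbs sketch from Section 3.3; since only the rank-1 sampler was sketched there, Model 1 is used for illustration rather than the rank-2 Model 2 on which the results above are based.

```r
library(coda)

draws <- covreg_gibbs(E, X1, nscan = 50000)              # E and X1 as in the previous sketch
draws <- draws[seq(1001, 50000, by = 10)]                # drop burn-in, keep every 10th: 4,900 draws

sigma_age <- function(par, age) {                        # Sigma_x at a given age, Model 1 design
  x <- c(1, sqrt(age))
  par$A + par$B %*% x %*% t(x) %*% t(par$B)
}

## effective sample sizes for Var(FEV), Var(height) and Cov(FEV, height) at age 18
S18 <- t(sapply(draws, function(par) { S <- sigma_age(par, 18); c(S[1, 1], S[2, 2], S[1, 2]) }))
effectiveSize(mcmc(S18))
```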

Figure 4: Observed data and 90% posterior predictive ellipsoids for each age (one panel per age group, 4 through 18, plotting height residual against FEV residual). The black ellipsoids correspond to the covariance regression model, and the gray to a model with constant variance.

Table 1: Observed-data coverage rates by age for the heteroscedastic predictive ellipse from the covariance regression model and the homoscedastic predictive ellipse from a constant covariance model (columns: age group, sample size, homoscedastic coverage, heteroscedastic coverage). The nominal (target) coverage rate for the ellipses is 90%.

For each age group x and each of the 4,900 values of Σ_x, a predictive sample ỹ was generated from the multivariate normal(0, Σ_x) distribution. A 90% predictive ellipse was then generated as the smallest ellipse containing 90% of the 4,900 posterior predictive ỹ-values for the given age group. These ellipses are displayed graphically in Figure 4, along with the data and an analogous predictive ellipse based on a homoscedastic (constant covariance) model. Averaged across observations from all age groups, both sets of ellipsoids contain 90.5% of the observed data, which is very close to the nominal coverage of 90%. However, as can be seen from Table 1, the homoscedastic ellipse overcovers the observed data for the younger age groups, and undercovers for the older groups. In contrast, the flexibility of the covariance regression model allows the confidence ellipsoids to change size and shape as a function of age, and thus is able to match the nominal coverage rate fairly closely across the different ages.
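A simple approximation to this predictive check is sketched below, building on the draws and the sigma_age helper from the previous sketches. It replaces the smallest-ellipse construction with a Mahalanobis-distance rule based on the predictive draws, so it is an approximation rather than the exact procedure behind Figure 4 and Table 1.

```r
## approximate 90% posterior predictive region and its observed-data coverage for one age group
pred_coverage <- function(draws, age, obs, level = 0.90) {
  ## one predictive draw of y-tilde per saved (A, B) value
  ytil <- t(sapply(draws, function(par)
    MASS::mvrnorm(1, mu = c(0, 0), Sigma = sigma_age(par, age))))
  ## region: points whose Mahalanobis distance is below the 'level' quantile of the draws
  ctr <- colMeans(ytil); S <- cov(ytil)
  cut <- quantile(mahalanobis(ytil, ctr, S), level)
  mean(mahalanobis(obs, ctr, S) <= cut)        # fraction of observed residual pairs inside
}

## e.g. coverage for the 10-year-olds, using the residual matrix E from the earlier sketch:
## pred_coverage(draws, age = 10, obs = E[fev_dat$age_adj == 10, ])
```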

5 Discussion

This article has presented a model for a covariance matrix Cov[y|x] = Σ_x as a function of an explanatory variable x. We have presented a geometric interpretation in terms of curves along the boundary of a translated positive definite cone, and have provided a random effects representation that facilitates parameter estimation. This covariance regression model goes beyond what can be provided by variance stabilizing transformations, which serve to reduce the relationship between the mean and the variance. Unlike models or methods which accommodate heteroscedasticity in the form of a mean-variance relationship, the covariance regression model allows the mean function µ_x to be parameterized separately from the variance function Σ_x.

Although the example in this article involved a single continuous predictor, the covariance regression model accommodates explanatory variables of all types, including categorical variables. This could be useful in the analysis of multivariate data sampled from a large number of groups, such as groups defined by the cross-classification of several categorical variables. For example, it may be desirable to estimate a separate covariance matrix for each combination of age group, education level, race and religion in a given population. The number of observations for each combination of explanatory variables may be quite small, making it impractical to estimate a separate covariance matrix for each group. A practical alternative would be to use a covariance regression model as a parsimonious representation of the heteroscedasticity across the groups.

Like mean regression, a challenge for covariance regression modeling is variable selection, i.e. the choice of an appropriate set of explanatory variables. One possibility is to use selection criteria such as AIC or BIC, although non-identifiability of some parameters in the higher-rank models requires a careful accounting of the number of parameters. Another possibility may be to use Bayesian procedures, either via Markov chain Monte Carlo approximations to Bayes factors, or by explicitly formulating a prior distribution that allows some coefficients to be zero with non-zero probability. Example code and an R package for the EM and Gibbs sampling algorithms are available at the first author's website.

References

G. E. P. Box and D. R. Cox. An analysis of transformations (with discussion). J. Roy. Statist. Soc. Ser. B, 26:211-252, 1964.

Raymond J. Carroll. Adapting for heteroscedasticity in linear models. Ann. Statist., 10(4):1224-1233, 1982.

Raymond J. Carroll, David Ruppert, and Robert N. Holt, Jr. Some aspects of estimation in heteroscedastic linear models. In Statistical Decision Theory and Related Topics III, Vol. 1 (West Lafayette, Ind., 1981). Academic Press, New York, 1982.

Robert F. Engle and Kenneth F. Kroner. Multivariate simultaneous generalized ARCH. Econometric Theory, 11(1):122-150, 1995.

P. W. Fong, W. K. Li, and Hong-Zhi An. A simple multivariate ARCH model specified by random coefficients. Comput. Statist. Data Anal., 51(3), 2006.

Robert E. Kass and Larry Wasserman. A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. J. Amer. Statist. Assoc., 90(431):928-934, 1995.

Hans-Georg Müller and Ulrich Stadtmüller. Estimation of heteroscedasticity in regression analysis. Ann. Statist., 15(2), 1987.

Bernard Rosner. Fundamentals of Biostatistics. Duxbury Press, 2000.

Herbert C. Rutemiller and David A. Bowers. Estimation in a heteroscedastic regression model. J. Amer. Statist. Assoc., 63, 1968.

M. A. Scott and M. S. Handcock. Covariance models for latent structure in longitudinal data. Sociological Methodology, 2001.

Gordon K. Smyth. Generalized linear models with varying dispersion. J. Roy. Statist. Soc. Ser. B, 51(1):47-60, 1989.

Arnold Zellner. On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In Bayesian Inference and Decision Techniques, volume 6 of Stud. Bayesian Econometrics Statist. North-Holland, Amsterdam, 1986.


More information

An Introduction to Mplus and Path Analysis

An Introduction to Mplus and Path Analysis An Introduction to Mplus and Path Analysis PSYC 943: Fundamentals of Multivariate Modeling Lecture 10: October 30, 2013 PSYC 943: Lecture 10 Today s Lecture Path analysis starting with multivariate regression

More information

STA 294: Stochastic Processes & Bayesian Nonparametrics

STA 294: Stochastic Processes & Bayesian Nonparametrics MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a

More information

Long-Run Covariability

Long-Run Covariability Long-Run Covariability Ulrich K. Müller and Mark W. Watson Princeton University October 2016 Motivation Study the long-run covariability/relationship between economic variables great ratios, long-run Phillips

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

Bayesian Inference. Chapter 9. Linear models and regression

Bayesian Inference. Chapter 9. Linear models and regression Bayesian Inference Chapter 9. Linear models and regression M. Concepcion Ausin Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master in Mathematical Engineering

More information

PACKAGE LMest FOR LATENT MARKOV ANALYSIS

PACKAGE LMest FOR LATENT MARKOV ANALYSIS PACKAGE LMest FOR LATENT MARKOV ANALYSIS OF LONGITUDINAL CATEGORICAL DATA Francesco Bartolucci 1, Silvia Pandofi 1, and Fulvia Pennoni 2 1 Department of Economics, University of Perugia (e-mail: francesco.bartolucci@unipg.it,

More information

Lecture 2: Linear and Mixed Models

Lecture 2: Linear and Mixed Models Lecture 2: Linear and Mixed Models Bruce Walsh lecture notes Introduction to Mixed Models SISG, Seattle 18 20 July 2018 1 Quick Review of the Major Points The general linear model can be written as y =

More information

Regression, Ridge Regression, Lasso

Regression, Ridge Regression, Lasso Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.

More information

FE670 Algorithmic Trading Strategies. Stevens Institute of Technology

FE670 Algorithmic Trading Strategies. Stevens Institute of Technology FE670 Algorithmic Trading Strategies Lecture 3. Factor Models and Their Estimation Steve Yang Stevens Institute of Technology 09/12/2012 Outline 1 The Notion of Factors 2 Factor Analysis via Maximum Likelihood

More information

Vector Auto-Regressive Models

Vector Auto-Regressive Models Vector Auto-Regressive Models Laurent Ferrara 1 1 University of Paris Nanterre M2 Oct. 2018 Overview of the presentation 1. Vector Auto-Regressions Definition Estimation Testing 2. Impulse responses functions

More information

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix Labor-Supply Shifts and Economic Fluctuations Technical Appendix Yongsung Chang Department of Economics University of Pennsylvania Frank Schorfheide Department of Economics University of Pennsylvania January

More information

Key Algebraic Results in Linear Regression

Key Algebraic Results in Linear Regression Key Algebraic Results in Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) 1 / 30 Key Algebraic Results in

More information

Modeling conditional distributions with mixture models: Theory and Inference

Modeling conditional distributions with mixture models: Theory and Inference Modeling conditional distributions with mixture models: Theory and Inference John Geweke University of Iowa, USA Journal of Applied Econometrics Invited Lecture Università di Venezia Italia June 2, 2005

More information

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review

More information

Recursive Deviance Information Criterion for the Hidden Markov Model

Recursive Deviance Information Criterion for the Hidden Markov Model International Journal of Statistics and Probability; Vol. 5, No. 1; 2016 ISSN 1927-7032 E-ISSN 1927-7040 Published by Canadian Center of Science and Education Recursive Deviance Information Criterion for

More information

An Extended BIC for Model Selection

An Extended BIC for Model Selection An Extended BIC for Model Selection at the JSM meeting 2007 - Salt Lake City Surajit Ray Boston University (Dept of Mathematics and Statistics) Joint work with James Berger, Duke University; Susie Bayarri,

More information

STAT 540: Data Analysis and Regression

STAT 540: Data Analysis and Regression STAT 540: Data Analysis and Regression Wen Zhou http://www.stat.colostate.edu/~riczw/ Email: riczw@stat.colostate.edu Department of Statistics Colorado State University Fall 205 W. Zhou (Colorado State

More information

Multivariate Survival Analysis

Multivariate Survival Analysis Multivariate Survival Analysis Previously we have assumed that either (X i, δ i ) or (X i, δ i, Z i ), i = 1,..., n, are i.i.d.. This may not always be the case. Multivariate survival data can arise in

More information