CHOICE OF REFERENCE SUBCLASS IN REGRESSION MODELS

Gilbert MacKenzie (1,2) & Defen Peng (2,3)
(1) ENSAI, Rennes; (2) Centre of Biostatistics, University of Limerick, Ireland; (3) UBC, Vancouver, Canada.
CASI, Templepatrick, Northern Ireland, May 14-16, 2014

[Photo: ENSAI Building. 2nd Int. BIO-SI Workshop, Oct 6/7th, 2011]

Outline
This talk is about the choice of reference subclass in parametric regression models with categorical variables, mainly in observational studies:
- Introduction
- Linear Model Setting
- Precision & Multi-collinearity
- Extensions to GLMs
- Conclusions

Introduction
A quotation: "There is no statistical justification for choosing one reference category or another. The choice is usually made on subject matter grounds to make the interpretations easier and the choice can easily vary from data analyst to data analyst. So, the need for a reference category can complicate interpretations and the results..." (R. Berk, 2008).
We show that a judicious choice of reference subclass can improve certain properties of the regression model.

Secondary Criteria
Many model properties are invariant to the choice of reference subclass, so we need secondary criteria:
- Precision of estimates: the total variance, $\hat{T}_r = \mathrm{tr}[V(\hat{\beta}_r)]$.
- Multi-collinearity: the condition number, $\hat{K}_r$.
- Logical considerations.
NB: the third can be evaluated in terms of the first two. We are interested in the pair $(\hat{T}_r, \hat{K}_r)$. We illustrate in terms of the linear model only, as the talk is only 15 minutes long.

Linear Model Setting
We consider the general linear model
$$Y = X\beta + \epsilon \qquad (1)$$
where $Y$ is a continuous response variable, $X$ is an $n \times p$ design matrix, $\beta$ is a $p \times 1$ column vector of regression parameters, $E(\epsilon) = 0$ and $E(\epsilon\epsilon^\top) = \sigma^2 I_n$. We will also assume that $\epsilon_i \sim N(0, \sigma^2)$, $i = 1, \dots, n$, when required. It follows immediately that
$$\hat{\beta} = (X^\top X)^{-1} X^\top Y \qquad (2)$$
and that
$$V(\hat{\beta}) = \sigma^2 (X^\top X)^{-1}, \qquad (3)$$
which implies, under the Gaussian assumption, that the Fisher information matrix is
$$I(\beta) = X^\top X / \sigma^2. \qquad (4)$$
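Equations (2)-(4) translate directly into a few lines of linear algebra. The sketch below is a minimal NumPy illustration, assuming $\sigma^2$ is known; the helper name `ols_fit` and the toy data are hypothetical.

```python
import numpy as np

def ols_fit(X, y, sigma2):
    """Return beta_hat (2), V(beta_hat) (3) and the Fisher information (4)."""
    XtX = X.T @ X
    beta_hat = np.linalg.solve(XtX, X.T @ y)  # solves (X'X) b = X'y without forming the inverse
    V = sigma2 * np.linalg.inv(XtX)           # eq. (3), sigma2 treated as known
    fisher = XtX / sigma2                     # eq. (4), Gaussian errors
    return beta_hat, V, fisher

# Toy illustration: intercept plus one covariate, noise-free data y = 1 + 2x.
X = np.column_stack([np.ones(4), np.array([0.0, 1.0, 2.0, 3.0])])
y = 1.0 + 2.0 * X[:, 1]
beta_hat, V, fisher = ols_fit(X, y, sigma2=1.0)
```

Since $V(\hat{\beta})$ is the inverse of $I(\beta)$, their product is the identity matrix.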

Form of Design Matrix
If the design matrix $X$ encodes a single categorical variable with $p = k + 1$ subclasses, $X^\top X$ may take one of two main forms:
$$X^\top X = \mathrm{diag}(n_1, n_2, \dots, n_{k+1}) \qquad (5)$$
or
$$X^\top X = \begin{pmatrix} n & n_1 & n_2 & \cdots & n_k \\ n_1 & n_1 & 0 & \cdots & 0 \\ n_2 & 0 & n_2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ n_k & 0 & 0 & \cdots & n_k \end{pmatrix}. \qquad (6)$$
In (5) we have included exactly $p = k + 1$ binary indicator variables (and no intercept); in (6) we have included an intercept term and exactly $k$ binary indicator variables.
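The two forms (5) and (6) are easy to check numerically. This sketch, assuming NumPy and a hypothetical helper `design_matrix`, builds intercept-plus-indicator coding for a chosen reference subclass and compares it with full indicator coding on a made-up allocation $(n_1, n_2, n_3) = (2, 3, 4)$.

```python
import numpy as np

def design_matrix(labels, ref):
    """Intercept column plus one 0/1 indicator per non-reference subclass (form (6))."""
    labels = np.asarray(labels)
    others = [c for c in sorted(set(labels.tolist())) if c != ref]
    cols = [np.ones(len(labels))] + [(labels == c).astype(float) for c in others]
    return np.column_stack(cols), others

labels = [1, 1, 2, 2, 2, 3, 3, 3, 3]      # allocation (n1, n2, n3) = (2, 3, 4)
X, others = design_matrix(labels, ref=3)  # subclass 3 is the reference
XtX = X.T @ X                             # arrowhead form (6)

# Full indicator coding, no intercept: X'X is diagonal, form (5).
X5 = np.column_stack([(np.asarray(labels) == c).astype(float) for c in (1, 2, 3)])
```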

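Precision
Suppose we have a sample allocation $(n_1, n_2, \dots, n_p)$ in which at least one of the allocated numbers differs from the others. Let $r$ denote the reference category, which may be chosen freely from $\{1, \dots, p\}$, and label the remaining $k = p - 1$ subclasses $1, \dots, k$. Then
$$(X^\top X)^{-1} = \frac{1}{n_r}\begin{pmatrix} 1 & -1 & -1 & \cdots & -1 \\ -1 & 1 + n_r/n_1 & 1 & \cdots & 1 \\ -1 & 1 & 1 + n_r/n_2 & \cdots & 1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -1 & 1 & 1 & \cdots & 1 + n_r/n_k \end{pmatrix} \qquad (7)$$
where $n_r = n - \sum_{j=1}^{k} n_j$ is the allocated number in the reference category.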
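As a sanity check on (7), the sketch below compares the closed form with a direct numerical inverse of the arrowhead matrix (6). It assumes NumPy; both helper names are hypothetical, and the non-reference subclasses are passed in the order $1, \dots, k$.

```python
import numpy as np

def inverse_closed_form(n_others, n_ref):
    """(X'X)^{-1} from eq. (7): (1/n_r) [[1, -1'], [-1, 11' + n_r diag(1/n_j)]]."""
    n_others = np.asarray(n_others, dtype=float)
    k = n_others.size
    M = np.empty((k + 1, k + 1))
    M[0, 0] = 1.0
    M[0, 1:] = -1.0
    M[1:, 0] = -1.0
    M[1:, 1:] = np.ones((k, k)) + np.diag(n_ref / n_others)
    return M / n_ref

def xtx_form6(n_others, n_ref):
    """Arrowhead X'X of form (6) for an intercept plus k indicators."""
    n_others = np.asarray(n_others, dtype=float)
    A = np.diag(np.concatenate([[n_ref + n_others.sum()], n_others]))  # n on top-left
    A[0, 1:] = n_others
    A[1:, 0] = n_others
    return A
```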

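Example 1: Binary Covariate
With $p = 2$, $X = (x_0, x_1)$ implies that category 2 is the reference. Writing $r$ and $s$ for the two subclasses and $[r]$ for the set of subjects in subclass $r$, the estimates with reference $r$ and with reference $s$ are
$$\hat{\beta}_r = \begin{pmatrix} \frac{1}{n_r}\sum_{i \in [r]} y_i \\ -\frac{1}{n_r}\sum_{i \in [r]} y_i + \frac{1}{n_s}\sum_{i \in [s]} y_i \end{pmatrix}, \qquad (8)$$
$$\hat{\beta}_s = \begin{pmatrix} \frac{1}{n_s}\sum_{i \in [s]} y_i \\ -\frac{1}{n_s}\sum_{i \in [s]} y_i + \frac{1}{n_r}\sum_{i \in [r]} y_i \end{pmatrix}. \qquad (9)$$
The intercepts differ, but $\hat{\beta}_{1,r} = -\hat{\beta}_{1,s}$: the slope only changes sign. On the diagonals of the $2 \times 2$ variance-covariance matrices,
$$\mathrm{diag}\,V(\hat{\beta}_r) = \left[\frac{\sigma^2}{n_r},\ \sigma^2\left(\frac{1}{n_r} + \frac{1}{n_s}\right)\right], \qquad \mathrm{diag}\,V(\hat{\beta}_s) = \left[\frac{\sigma^2}{n_s},\ \sigma^2\left(\frac{1}{n_s} + \frac{1}{n_r}\right)\right],$$
thus $\mathrm{Var}(\hat{\beta}_{1,r}) = \mathrm{Var}(\hat{\beta}_{1,s})$. Therefore, the precision of the regression coefficient is invariant to switching the reference category.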
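The sign flip and the equal slope variances in Example 1 can be reproduced with simulated data. This is a sketch assuming NumPy; the allocation $(n_r, n_s) = (30, 10)$ and the normal responses are invented for illustration, and variances are reported in units of $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_r, n_s = 30, 10                  # hypothetical allocation
y_r = rng.normal(5.0, 1.0, n_r)    # responses in subclass r
y_s = rng.normal(7.0, 1.0, n_s)    # responses in subclass s
y = np.concatenate([y_r, y_s])

def fit(indicator):
    """OLS with intercept + one indicator; returns beta_hat and diag V / sigma^2."""
    X = np.column_stack([np.ones_like(y), indicator])
    XtX_inv = np.linalg.inv(X.T @ X)
    return XtX_inv @ X.T @ y, np.diag(XtX_inv)

in_s = np.r_[np.zeros(n_r), np.ones(n_s)]
beta_r, var_r = fit(in_s)        # reference r: x_1 marks subclass s
beta_s, var_s = fit(1 - in_s)    # reference s: x_1 marks subclass r
```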

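Example 2: Two Binary Covariates
With $p = 3$, $X = (x_0, x_1, x_2)$ implies that category 3 is the reference. First, the regression coefficients differ between choices of reference (not shown). The diagonals of the variance-covariance matrices are
$$\mathrm{diag}\,V(\hat{\beta}_{r=3}) = \sigma^2\left[\frac{1}{n_3},\ \frac{1}{n_3} + \frac{1}{n_1},\ \frac{1}{n_3} + \frac{1}{n_2}\right],$$
$$\mathrm{diag}\,V(\hat{\beta}_{r=2}) = \sigma^2\left[\frac{1}{n_2},\ \frac{1}{n_2} + \frac{1}{n_1},\ \frac{1}{n_2} + \frac{1}{n_3}\right],$$
$$\mathrm{diag}\,V(\hat{\beta}_{r=1}) = \sigma^2\left[\frac{1}{n_1},\ \frac{1}{n_1} + \frac{1}{n_2},\ \frac{1}{n_1} + \frac{1}{n_3}\right].$$
Summing, $\hat{T}_r = \sigma^2\left[(p - 1)/n_r + \sum_{j=1}^{p} 1/n_j\right]$, which depends on $r$ only through $n_r$. So in LMs $\hat{T}_r$ is minimised when $n_r = n_{\max}$.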
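The total variance implied by these diagonals can be computed for every candidate reference. The sketch assumes NumPy, reports $\hat{T}_r$ with $\sigma^2 = 1$ by default, and uses a hypothetical allocation; indices are 0-based, so `best == 1` refers to the second subclass.

```python
import numpy as np

def total_variance(n, r, sigma2=1.0):
    """T_r = sigma2 * [1/n_r + sum_{j != r} (1/n_r + 1/n_j)], from the diagonal of (7)."""
    n = np.asarray(n, dtype=float)
    others = np.delete(n, r)
    return sigma2 * (1.0 / n[r] + np.sum(1.0 / n[r] + 1.0 / others))

alloc = [12, 40, 25]                                  # hypothetical (n_1, n_2, n_3)
T = [total_variance(alloc, r) for r in range(len(alloc))]
best = int(np.argmin(T))                              # reference minimising T_r
```

The minimiser coincides with the subclass holding the largest allocation, as the closed form $(p-1)/n_r + \sum_j 1/n_j$ predicts.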

Multi-collinearity
We use the condition number, $\hat{K}_r$, to measure multi-collinearity. Belsley (2004) defines the condition number of a square matrix $M$ as $K(M) = \lambda_{\max}/\lambda_{\min} = \nu_{\max}/\nu_{\min}$, where $\lambda_{\max} = \max_j(\lambda_j)$, $\lambda_{\min} = \min_j(\lambda_j)$, the $\lambda_j$, $j = 1, 2, \dots, p$, are the eigenvalues of $M$, and the $\nu$s are its singular values from the singular value decomposition (SVD); the two ratios agree for a symmetric positive-definite $M$ such as $X^\top X$. The threshold values for $K(M = X^\top X)$ are 10 and 30, indicating medium and serious degrees of multi-collinearity, respectively. We write $K_r$ for $K(M_r)$, where $M_r = X^\top X$ and the subscript $r$ indicates dependence on the reference subclass.
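A minimal way to compute $K(M)$ is from the eigenvalues of the symmetric matrix $M = X^\top X$. The sketch assumes NumPy; `severity` is a hypothetical helper that applies the 10 and 30 thresholds quoted above, and treating each threshold as inclusive is an assumption.

```python
import numpy as np

def condition_number(M):
    """K(M) = lambda_max / lambda_min for a symmetric positive-definite M."""
    eig = np.linalg.eigvalsh(M)   # eigenvalues in ascending order
    return eig[-1] / eig[0]

def severity(K, medium=10.0, serious=30.0):
    """Hypothetical labelling using the thresholds 10 and 30 (inclusive by assumption)."""
    if K >= serious:
        return "serious"
    if K >= medium:
        return "medium"
    return "low"

# X'X for the allocation (2, 3, 4) with subclass 3 as reference.
M = np.array([[9.0, 2.0, 3.0], [2.0, 2.0, 0.0], [3.0, 0.0, 3.0]])
K = condition_number(M)
```

For such matrices `np.linalg.cond(M)`, which uses the ratio of singular values, returns the same number.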

Multi-collinearity in the LM: Binary Covariate
The eigenvalues $\lambda$ of $M = X^\top X$ solve
$$\det(X^\top X - \lambda I) = \lambda^2 - (n + n_1)\lambda + n n_1 - n_1^2 = 0,$$
where $I$ is the $2 \times 2$ identity matrix, giving
$$\lambda_{\max} = \frac{n + n_1 + \sqrt{(n - n_1)^2 + 4n_1^2}}{2}, \qquad \lambda_{\min} = \frac{n + n_1 - \sqrt{(n - n_1)^2 + 4n_1^2}}{2}.$$
The condition number is then
$$K_r(X^\top X) = \frac{1 + \rho_1 + \sqrt{(1 - \rho_1)^2 + 4\rho_1^2}}{1 + \rho_1 - \sqrt{(1 - \rho_1)^2 + 4\rho_1^2}}, \qquad (10)$$
where $\rho_1 = n_1/n$.
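Formula (10) can be checked against a numerical eigen-decomposition and used to compare the two possible references for a binary covariate. This sketch assumes NumPy and a hypothetical allocation of 90 and 10 subjects out of $n = 100$; $\rho_1$ is the share of the subclass marked by the indicator, i.e. the non-reference subclass.

```python
import numpy as np

def K_binary(rho1):
    """Closed form (10): condition number of X'X for intercept + one indicator."""
    root = np.sqrt((1 - rho1) ** 2 + 4 * rho1 ** 2)
    return (1 + rho1 + root) / (1 + rho1 - root)

def K_numeric(n, n1):
    """Same quantity from the eigenvalues of X'X = [[n, n1], [n1, n1]]."""
    eig = np.linalg.eigvalsh(np.array([[n, n1], [n1, n1]], dtype=float))
    return eig[-1] / eig[0]

n, n_big, n_small = 100, 90, 10
K_ref_big = K_binary(n_small / n)    # larger subclass is the reference
K_ref_small = K_binary(n_big / n)    # smaller subclass is the reference
```

With these numbers the reference with the larger allocation gives $K \approx 11.4$, against about $38.1$ when the smaller subclass is the reference.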

Relationship between $\hat{T}_r$ and $\hat{K}_r$
We have examined this relationship in a variety of cases, analytically and via simulation, in LMs and GLMs, and the results are similar. The correlation between $\hat{T}_r$ and $\hat{K}_r$ is typically about 0.95, indicating a strong linear relationship, so minimising $\hat{T}_r$ also tends to minimise $\hat{K}_r$. Thus the stability of the model is improved by selecting $n_r = n_{\max}$ in LMs. There is no loss of information in switching the reference subclass, because the contrasts of interest are invariant to the switch. In GLMs, minimising $\hat{T}_r$ is more complicated, but the principle is the same.

Lung Cancer Study
A survival study of lung cancer in Northern Ireland (Wilkinson, 1992): a total of 855 incident cases were followed for 2 years, and 50% had died by 6 months. We are interested in who receives active treatment (and why): some 51.5% received no active treatment. This leads to a standard MLF analysis with $Y = 1$ for active treatment and $Y = 0$ otherwise, using five covariates: WHO, age, cell type, metastases and albumin. See the example on the next slide.

Conclusions
- There is more to see than suggested by Berk.
- Maximising the precision minimises the multi-collinearity.
- This should be useful in sparse data situations with many categorical covariates.
- There is no loss of information on the contrasts of interest.
- For LMs and GLMs (and beyond) the results are similar.
Overall we have created some useful tools. We hope their use will improve practice.

Acknowledgements
The work in this paper was supported by two Science Foundation Ireland (SFI) project grants. Professor MacKenzie was supported under the Mathematics Initiative, II, via the BIO-SI research programme in the Centre of Biostatistics, University of Limerick, Ireland: grant number 07/MI/012. Professor Peng is also supported via a Research Frontiers Programme award, grant number 05/RF/MAT 026.

References
ALTMAN, D. G. & ROYSTON, P. (2006). Statistics notes: The cost of dichotomising continuous variables. British Medical Journal, 332.
BELSLEY, D. A., KUH, E. & WELSCH, R. E. (2004). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley & Sons, First edition.
BERK, R. (2008). Statistical Learning from a Regression Perspective. Springer, New York.
COHEN, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
COX, D. R. & SNELL, E. J. (1989). Analysis of Binary Data. Chapman and Hall, Second edition.
CRAN.R-PROJECT (2009). R project. Package pwr. Retrieved 2010.
ELWOOD, J. H., MACKENZIE, G. & CRAN, G. (1974). Observations on single births to women resident in Belfast: Part I - Factors associated with perinatal mortality. J. Chron. Dis., 27.
FELDSTEIN, M. S. (1966). A binary variable multiple regression method of analysing factors affecting Peri-Natal mortality and other outcomes of pregnancy. Journal of the Royal Statistical Society, Series A, 129.
FRØSLIE, K. F., RØISLIEN, J., LAAKE, P., HENRIKSEN, T., QVIGSTAD, E. & VEIERØD, M. B. (2010). Categorisation of continuous exposure variables revisited. A response to the Hyperglycaemia and Adverse Pregnancy Outcome (HAPO) Study. BMC Medical Research Methodology, 10.
ISHAM (1991). Statistical Theory and Modelling, edited by D. V. Hinkley, N. Reid and E. J. Snell. Chapman and Hall.
MACKENZIE, G. & PENG, D. (2010). Properties of estimators in interval censored PH regression survival models. Submitted to Journal of the Royal Statistical Society, Series C.
NIJENHUIS, A. & WILF, H. S. (1978). Combinatorial Algorithms for Computers and Calculators. Academic Press, Second edition.
PENG, D. & MACKENZIE, G. (2014). Discrepancy and choice of reference subclass in categorical regression models. In: Statistical Modelling in Biostatistics and Bioinformatics. Springer, Munich, 260 pages.
POCOCK, S. J., COLLIER, T. J., DANDREO, K. J., DE STAVOLA, B. L., GOLDMAN, M. B., KALISH, L. A., LINDA, E. K. & VALERIE, A. M. (2004). Issues in the reporting of epidemiological studies: a survey of recent practice. British Medical Journal, 329.
RAO, C. R. & RAO, M. B. (1998). Matrix Algebra and its Applications to Statistics and Econometrics. World Scientific Publishing, Singapore, First edition.
SHAPIRO, S. S. (1980). How to test normality and other distributional assumptions. Statistical Techniques, 3.
SMITH, O. K. (1961). Eigenvalues of a symmetric 3 × 3 matrix. Communications of the ACM, 4, 168.
WISSMANN, M., TOUTENBURG, H. & SHALABH (2007). Role of categorical variables in multicollinearity in linear regression model. Technical Report Number 008, Department of Statistics, University of Munich, Germany.
WILLIAM, G. J. (2005). Regression III: Advanced methods. Lecture notes. Department of Political Science, Michigan State University, America.

Minimizing the Total Variance
Proof. Let $n_r = \max(n_1, \dots, n_p)$ and let $s \in \{1, 2, \dots, p\}$, $s \neq r$, be another choice of reference category with $n_r > n_s$. Then, from (7), the corresponding total variances (in units of $\sigma^2$) are
$$V_r = \frac{1}{n_r} + \left(\frac{1}{n_r} + \frac{1}{n_s}\right) + \sum_{j \neq r, s} \left(\frac{1}{n_r} + \frac{1}{n_j}\right)$$
and
$$V_s = \frac{1}{n_s} + \left(\frac{1}{n_s} + \frac{1}{n_r}\right) + \sum_{j \neq r, s} \left(\frac{1}{n_s} + \frac{1}{n_j}\right).$$
Since $1/n_r < 1/n_s$, we have $V_r < V_s$; i.e., choosing $n_r = n_{\max}$ minimises the total variance.

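Canonical GLMs
Consider a canonical GLM for independent responses $Y_i$ with $E(Y_i) = \mu_i = g(\theta_i)$, where $\theta_i = \sum_{u=0}^{k} x_{ui}\beta_u$ is the linear predictor and $\beta_u$, $u = 0, \dots, k$, are the $p$ regression parameters. Then the observed information matrix for $\beta$ is
$$I_o(\beta) = \left(\frac{\partial \theta^\top}{\partial \beta}\right)\left(\frac{\partial^2 K}{\partial \theta\,\partial \theta^\top}\right)\left(\frac{\partial \theta^\top}{\partial \beta}\right)^\top = \left(\frac{\partial \mu^\top}{\partial \beta}\right)\left(\frac{\partial^2 K}{\partial \theta\,\partial \theta^\top}\right)^{-1}\left(\frac{\partial \mu^\top}{\partial \beta}\right)^\top, \qquad (11)$$
where $K(\theta)$ is the cumulant function. When $\beta_0$ is the intercept, we can re-express this as the $p \times p$ matrix
$$I_o(\beta_0, \beta_c) = X^\top W X = \begin{pmatrix} \sum_i w_i & \sum_i x_{ci}^\top w_i \\ \sum_i x_{ci} w_i & \sum_i x_{ci} x_{ci}^\top w_i \end{pmatrix}, \qquad (12)$$
with $W = \mathrm{diag}(w_1, \dots, w_n)$.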
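For the logistic (binomial, logit link) case the weight in (12) is $w_i = \exp(x_i\beta)/\{1 + \exp(x_i\beta)\}^2$, and with a canonical link the observed and expected information coincide. The sketch below assembles $X^\top W X$ for a single three-level covariate; it assumes NumPy, and the helper names, coefficient values and allocation are arbitrary illustrations.

```python
import numpy as np

def logistic_weights(eta):
    """Binomial/logit weight exp(eta) / (1 + exp(eta))^2."""
    return np.exp(eta) / (1.0 + np.exp(eta)) ** 2

def information_matrix(X, beta):
    """I_o(beta) = X' W X, eq. (12), with W = diag(w_i)."""
    w = logistic_weights(X @ beta)
    return X.T @ (w[:, None] * X)   # scales row i of X by w_i before the product

labels = np.repeat([0, 1, 2], [6, 4, 10])   # subclass 0 is the reference
X = np.column_stack([np.ones(labels.size),
                     (labels == 1).astype(float),
                     (labels == 2).astype(float)])
beta = np.array([-0.4, 0.9, -1.1])
I = information_matrix(X, beta)
```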

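Structural Weights
Table 2: Structural weights
Distribution | Density (mass) function | Link function | $w_i$
Normal | $f(y; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(y - \mu)^2/(2\sigma^2)}$ | $x\beta = \mu = \theta$ | $1/\sigma^2$
Exponential | $f(y; \lambda) = \lambda e^{-\lambda y}$ | $x\beta = \mu^{-1} = \theta$ | $1/(x_i\beta)^2$
IG | $f(y; \mu, \lambda) = \left(\frac{\lambda}{2\pi y^3}\right)^{1/2} e^{-\lambda(y - \mu)^2/(2\mu^2 y)}$ | $x\beta = \mu^{-2} = \theta$ | $\lambda/\{4(x_i\beta)^{3/2}\}$
Poisson | $f(y; \lambda) = \frac{\lambda^y}{y!} e^{-\lambda}$ | $x\beta = \log(\mu) = \theta$ | $\exp(x_i\beta)$
Binomial | $f(y; n, p) = \binom{n}{y} p^y (1 - p)^{n - y}$ | $x\beta = \log\{\mu/(1 - \mu)\} = \theta$ | $\exp(x_i\beta)/\{1 + \exp(x_i\beta)\}^2$
Geometric | $f(y; p) = (1 - p)^{y - 1} p$ | $x\beta = \log\{\mu/(1 - \mu)\} = \theta$ | $1/\{1 + \exp(x_i\beta)\}$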

Extension to Canonical GLMs
For a single categorical variate across GLMs, with $\varphi$ the structural weight of Table 2 evaluated at the linear predictor, we can show that:
- the optimal choice depends on $n_r\,\varphi(\hat{\beta}_0)$;
- we should choose the subclass for which $n_r\,\varphi(\hat{\beta}_0)$ is largest;
- choosing $n_r = n_{\max}$ is usually good;
- when the observed allocation is uniform ($n_1 = n_2 = \dots = n_p$) or nearly uniform, the choice of reference subclass does not matter;
- there is an index that tells you when you need to worry about lack of uniformity.
These results extend to GLMs with multiple categorical covariates.
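For a logistic model with one categorical covariate, the MLE of the intercept under reference $r$ is the log-odds in subclass $r$, so $\varphi(\hat{\beta}_0) = \hat{p}_r(1 - \hat{p}_r)$ and the rule above reduces to maximising $n_r\hat{p}_r(1 - \hat{p}_r)$. The sketch below (NumPy, hypothetical counts and helper names, requiring $0 < \text{events} < n$ in every subclass) applies the rule and cross-checks it against the sum of the asymptotic variances of the coefficients.

```python
import numpy as np

def best_reference_logistic(n, events):
    """Index of the subclass maximising n_r * phi(beta0_hat) = n_r p_r (1 - p_r)."""
    n = np.asarray(n, dtype=float)
    p = np.asarray(events, dtype=float) / n   # subclass proportions (MLEs of the means)
    score = n * p * (1.0 - p)
    return int(np.argmax(score)), score

def total_variance_logistic(n, events, r):
    """1/(n_r w_r) + sum_{j != r} [1/(n_r w_r) + 1/(n_j w_j)], with w_j = p_j (1 - p_j)."""
    n = np.asarray(n, dtype=float)
    p = np.asarray(events, dtype=float) / n
    inv = 1.0 / (n * p * (1.0 - p))
    return inv[r] + np.sum(inv[r] + np.delete(inv, r))

n = [200, 150, 60]
events = [190, 75, 30]
best, score = best_reference_logistic(n, events)
```

Here the largest subclass (200 subjects, 95% events) is not the best reference: its score of 9.5 is smaller than the 37.5 of the second subclass, which illustrates why $n_r = n_{\max}$ is only usually good.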

Contrasts of Interest
Generally, such contrasts are conducted among the $k$ regression coefficients. For a contrast $Z = c^\top\hat{\beta}_r$ with $c = (c_0, c_1, \dots, c_k)^\top$, $c_0 = 0$ and $\sum_{j=1}^{k} c_j = 0$, we have $V(Z) = c^\top V(\hat{\beta}_r)\,c$, and from (7) the $1/n_r$ terms cancel because the $c_j$ sum to zero, leaving
$$V(Z) = \sigma^2 \sum_{j=1}^{k} c_j^2/n_j.$$
Then $Z$ does not depend on $\hat{\beta}_0$, and accordingly such contrasts are invariant to the choice of reference subclass.
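The invariance can be seen numerically: the same contrast of subclass means, written in terms of the coefficients for each possible reference, gives the same estimate and the same variance. This sketch assumes NumPy, simulated data and a hypothetical contrast vector; variances are in units of $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
labels = np.repeat([0, 1, 2], [5, 8, 12])       # hypothetical allocation (5, 8, 12)
y = rng.normal(size=labels.size) + labels       # simulated responses
c = np.array([1.0, -0.5, -0.5])                 # contrast over subclass means, sums to 0

def contrast(ref):
    """Estimate and variance / sigma^2 of the contrast when `ref` is the reference."""
    others = [j for j in range(3) if j != ref]
    X = np.column_stack([np.ones(labels.size)] +
                        [(labels == j).astype(float) for j in others])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    a = np.concatenate([[0.0], c[others]])      # c_0 = 0: the intercept is not used
    return a @ beta, a @ XtX_inv @ a

results = [contrast(r) for r in range(3)]
```

When the contrast also involves the subclass that happens to be the reference, the $1/n_r$ term contributes $c_r^2/n_r$, so the variance is $\sigma^2 \sum_j c_j^2/n_j$ over all subclasses whichever reference is used.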

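Generalization of V-C Matrix
The generalised variance-covariance matrix for GLMs is
$$I^{-1}(\beta_0, \beta_c) = \frac{1}{n_r\,\varphi(\beta_0)}\begin{pmatrix} 1 & -1 & \cdots & -1 \\ -1 & 1 + \tau_1 n_r/n_1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ -1 & 1 & \cdots & 1 + \tau_k n_r/n_k \end{pmatrix}, \qquad (13)$$
where $i \in [j]$ means subject $i$ is in the $j$th category, whence $x_{ji} = 1$ for $i$ in the $j$th category; $\tau_j = \varphi(\beta_0)/\varphi(\beta_0 + \beta_j)$; and $n_r$ and $n_j$ are the allocated numbers in the reference subclass and the $j$th subclass respectively, $j = 1, 2, \dots, k$. For the normal LM, $\varphi \equiv 1/\sigma^2$, so $\tau_j = 1$ and (13) reduces to $\sigma^2$ times (7). This matrix structure recurs in other settings (MacKenzie & Peng, 2013; Peng & MacKenzie, 2014).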
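Formula (13) can be checked against a direct inverse of $X^\top W X$ aggregated by subclass. This is a sketch assuming NumPy and logistic weights $\varphi(\eta) = e^{\eta}/(1 + e^{\eta})^2$; the allocation and coefficient values are arbitrary, and the helper names are hypothetical.

```python
import numpy as np

def phi(eta):
    """Logistic structural weight exp(eta) / (1 + exp(eta))^2."""
    return np.exp(eta) / (1.0 + np.exp(eta)) ** 2

def inverse_info_closed_form(n_r, n_others, beta0, beta_c):
    """Eq. (13): (1 / (n_r phi(beta0))) [[1, -1'], [-1, 11' + diag(tau_j n_r / n_j)]]."""
    n_others = np.asarray(n_others, dtype=float)
    tau = phi(beta0) / phi(beta0 + np.asarray(beta_c, dtype=float))
    k = n_others.size
    M = np.empty((k + 1, k + 1))
    M[0, 0] = 1.0
    M[0, 1:] = M[1:, 0] = -1.0
    M[1:, 1:] = np.ones((k, k)) + np.diag(tau * n_r / n_others)
    return M / (n_r * phi(beta0))

def inverse_info_numeric(n_r, n_others, beta0, beta_c):
    """Invert X'WX for intercept + k indicators, summing the weights within subclasses."""
    n_others = np.asarray(n_others, dtype=float)
    w = n_others * phi(beta0 + np.asarray(beta_c, dtype=float))  # subclass weight totals
    w_r = n_r * phi(beta0)                                       # reference subclass total
    I = np.diag(np.concatenate([[w_r + w.sum()], w]))
    I[0, 1:] = I[1:, 0] = w
    return np.linalg.inv(I)
```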

GLS and FGLS. Econ 671. Purdue University. Justin L. Tobias (Purdue) GLS and FGLS 1 / 22

GLS and FGLS. Econ 671. Purdue University. Justin L. Tobias (Purdue) GLS and FGLS 1 / 22 GLS and FGLS Econ 671 Purdue University Justin L. Tobias (Purdue) GLS and FGLS 1 / 22 In this lecture we continue to discuss properties associated with the GLS estimator. In addition we discuss the practical

More information

Causal Mechanisms Short Course Part II:

Causal Mechanisms Short Course Part II: Causal Mechanisms Short Course Part II: Analyzing Mechanisms with Experimental and Observational Data Teppei Yamamoto Massachusetts Institute of Technology March 24, 2012 Frontiers in the Analysis of Causal

More information

Regression Diagnostics for Survey Data

Regression Diagnostics for Survey Data Regression Diagnostics for Survey Data Richard Valliant Joint Program in Survey Methodology, University of Maryland and University of Michigan USA Jianzhu Li (Westat), Dan Liao (JPSM) 1 Introduction Topics

More information

MODELING COUNT DATA Joseph M. Hilbe

MODELING COUNT DATA Joseph M. Hilbe MODELING COUNT DATA Joseph M. Hilbe Arizona State University Count models are a subset of discrete response regression models. Count data are distributed as non-negative integers, are intrinsically heteroskedastic,

More information

((n r) 1) (r 1) ε 1 ε 2. X Z β+

((n r) 1) (r 1) ε 1 ε 2. X Z β+ Bringing Order to Outlier Diagnostics in Regression Models D.R.JensenandD.E.Ramirez Virginia Polytechnic Institute and State University and University of Virginia der@virginia.edu http://www.math.virginia.edu/

More information

Bayesian Hypothesis Testing in GLMs: One-Sided and Ordered Alternatives. 1(w i = h + 1)β h + ɛ i,

Bayesian Hypothesis Testing in GLMs: One-Sided and Ordered Alternatives. 1(w i = h + 1)β h + ɛ i, Bayesian Hypothesis Testing in GLMs: One-Sided and Ordered Alternatives Often interest may focus on comparing a null hypothesis of no difference between groups to an ordered restricted alternative. For

More information

Introduction to the predictnl function

Introduction to the predictnl function Introduction to the predictnl function Mark Clements Karolinska Institutet Abstract The predictnl generic function supports variance estimation for non-linear estimators using the delta method with finite

More information

Multivariate Statistical Analysis

Multivariate Statistical Analysis Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 4 for Applied Multivariate Analysis Outline 1 Eigen values and eigen vectors Characteristic equation Some properties of eigendecompositions

More information

TECHNICAL REPORT # 59 MAY Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study

TECHNICAL REPORT # 59 MAY Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study TECHNICAL REPORT # 59 MAY 2013 Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study Sergey Tarima, Peng He, Tao Wang, Aniko Szabo Division of Biostatistics,

More information

Ridge Regression. Flachs, Munkholt og Skotte. May 4, 2009

Ridge Regression. Flachs, Munkholt og Skotte. May 4, 2009 Ridge Regression Flachs, Munkholt og Skotte May 4, 2009 As in usual regression we consider a pair of random variables (X, Y ) with values in R p R and assume that for some (β 0, β) R +p it holds that E(Y

More information

A Hierarchical Perspective on Lee-Carter Models

A Hierarchical Perspective on Lee-Carter Models A Hierarchical Perspective on Lee-Carter Models Paul Eilers Leiden University Medical Centre L-C Workshop, Edinburgh 24 The vantage point Previous presentation: Iain Currie led you upward From Glen Gumbel

More information

Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52

Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52 Statistics for Applications Chapter 10: Generalized Linear Models (GLMs) 1/52 Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52 Components of a linear model The two

More information

Hierarchical Modelling for Univariate Spatial Data

Hierarchical Modelling for Univariate Spatial Data Hierarchical Modelling for Univariate Spatial Data Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department

More information

WU Weiterbildung. Linear Mixed Models

WU Weiterbildung. Linear Mixed Models Linear Mixed Effects Models WU Weiterbildung SLIDE 1 Outline 1 Estimation: ML vs. REML 2 Special Models On Two Levels Mixed ANOVA Or Random ANOVA Random Intercept Model Random Coefficients Model Intercept-and-Slopes-as-Outcomes

More information

Regression #3: Properties of OLS Estimator

Regression #3: Properties of OLS Estimator Regression #3: Properties of OLS Estimator Econ 671 Purdue University Justin L. Tobias (Purdue) Regression #3 1 / 20 Introduction In this lecture, we establish some desirable properties associated with

More information

Spatial Variation in Infant Mortality with Geographically Weighted Poisson Regression (GWPR) Approach

Spatial Variation in Infant Mortality with Geographically Weighted Poisson Regression (GWPR) Approach Spatial Variation in Infant Mortality with Geographically Weighted Poisson Regression (GWPR) Approach Kristina Pestaria Sinaga, Manuntun Hutahaean 2, Petrus Gea 3 1, 2, 3 University of Sumatera Utara,

More information

MIT Spring 2016

MIT Spring 2016 Generalized Linear Models MIT 18.655 Dr. Kempthorne Spring 2016 1 Outline Generalized Linear Models 1 Generalized Linear Models 2 Generalized Linear Model Data: (y i, x i ), i = 1,..., n where y i : response

More information

Generalized Linear Modeling - Logistic Regression

Generalized Linear Modeling - Logistic Regression 1 Generalized Linear Modeling - Logistic Regression Binary outcomes The logit and inverse logit interpreting coefficients and odds ratios Maximum likelihood estimation Problem of separation Evaluating

More information

High-Throughput Sequencing Course

High-Throughput Sequencing Course High-Throughput Sequencing Course DESeq Model for RNA-Seq Biostatistics and Bioinformatics Summer 2017 Outline Review: Standard linear regression model (e.g., to model gene expression as function of an

More information

Homoskedasticity. Var (u X) = σ 2. (23)

Homoskedasticity. Var (u X) = σ 2. (23) Homoskedasticity How big is the difference between the OLS estimator and the true parameter? To answer this question, we make an additional assumption called homoskedasticity: Var (u X) = σ 2. (23) This

More information

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data? When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data? Kosuke Imai Princeton University Asian Political Methodology Conference University of Sydney Joint

More information

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models Tihomir Asparouhov 1, Bengt Muthen 2 Muthen & Muthen 1 UCLA 2 Abstract Multilevel analysis often leads to modeling

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Yan Lu Jan, 2018, week 3 1 / 67 Hypothesis tests Likelihood ratio tests Wald tests Score tests 2 / 67 Generalized Likelihood ratio tests Let Y = (Y 1,

More information

Lecture 6: Methods for high-dimensional problems

Lecture 6: Methods for high-dimensional problems Lecture 6: Methods for high-dimensional problems Hector Corrada Bravo and Rafael A. Irizarry March, 2010 In this Section we will discuss methods where data lies on high-dimensional spaces. In particular,

More information

DD Advanced Machine Learning

DD Advanced Machine Learning Modelling Carl Henrik {chek}@csc.kth.se Royal Institute of Technology November 4, 2015 Who do I think you are? Mathematically competent linear algebra multivariate calculus Ok programmers Able to extend

More information

Lecture 15 (Part 2): Logistic Regression & Common Odds Ratio, (With Simulations)

Lecture 15 (Part 2): Logistic Regression & Common Odds Ratio, (With Simulations) Lecture 15 (Part 2): Logistic Regression & Common Odds Ratio, (With Simulations) Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology

More information

The Relationship Between the Power Prior and Hierarchical Models

The Relationship Between the Power Prior and Hierarchical Models Bayesian Analysis 006, Number 3, pp. 55 574 The Relationship Between the Power Prior and Hierarchical Models Ming-Hui Chen, and Joseph G. Ibrahim Abstract. The power prior has emerged as a useful informative

More information

Estimability Tools for Package Developers by Russell V. Lenth

Estimability Tools for Package Developers by Russell V. Lenth CONTRIBUTED RESEARCH ARTICLES 195 Estimability Tools for Package Developers by Russell V. Lenth Abstract When a linear model is rank-deficient, then predictions based on that model become questionable

More information

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n

More information

Towards stratified medicine instead of dichotomization, estimate a treatment effect function for a continuous covariate

Towards stratified medicine instead of dichotomization, estimate a treatment effect function for a continuous covariate Towards stratified medicine instead of dichotomization, estimate a treatment effect function for a continuous covariate Willi Sauerbrei 1, Patrick Royston 2 1 IMBI, University Medical Center Freiburg 2

More information

Vectors and Matrices Statistics with Vectors and Matrices

Vectors and Matrices Statistics with Vectors and Matrices Vectors and Matrices Statistics with Vectors and Matrices Lecture 3 September 7, 005 Analysis Lecture #3-9/7/005 Slide 1 of 55 Today s Lecture Vectors and Matrices (Supplement A - augmented with SAS proc

More information

Topic 20: Single Factor Analysis of Variance

Topic 20: Single Factor Analysis of Variance Topic 20: Single Factor Analysis of Variance Outline Single factor Analysis of Variance One set of treatments Cell means model Factor effects model Link to linear regression using indicator explanatory

More information

Approximate analysis of covariance in trials in rare diseases, in particular rare cancers

Approximate analysis of covariance in trials in rare diseases, in particular rare cancers Approximate analysis of covariance in trials in rare diseases, in particular rare cancers Stephen Senn (c) Stephen Senn 1 Acknowledgements This work is partly supported by the European Union s 7th Framework

More information

Poisson regression: Further topics

Poisson regression: Further topics Poisson regression: Further topics April 21 Overdispersion One of the defining characteristics of Poisson regression is its lack of a scale parameter: E(Y ) = Var(Y ), and no parameter is available to

More information

MATH Notebook 4 Spring 2018

MATH Notebook 4 Spring 2018 MATH448001 Notebook 4 Spring 2018 prepared by Professor Jenny Baglivo c Copyright 2010 2018 by Jenny A. Baglivo. All Rights Reserved. 4 MATH448001 Notebook 4 3 4.1 Simple Linear Model.................................

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Mean squared error matrix comparison of least aquares and Stein-rule estimators for regression coefficients under non-normal disturbances

Mean squared error matrix comparison of least aquares and Stein-rule estimators for regression coefficients under non-normal disturbances METRON - International Journal of Statistics 2008, vol. LXVI, n. 3, pp. 285-298 SHALABH HELGE TOUTENBURG CHRISTIAN HEUMANN Mean squared error matrix comparison of least aquares and Stein-rule estimators

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

Markov Chains, Stochastic Processes, and Matrix Decompositions

Markov Chains, Stochastic Processes, and Matrix Decompositions Markov Chains, Stochastic Processes, and Matrix Decompositions 5 May 2014 Outline 1 Markov Chains Outline 1 Markov Chains 2 Introduction Perron-Frobenius Matrix Decompositions and Markov Chains Spectral

More information

BIOS 2083 Linear Models c Abdus S. Wahed

BIOS 2083 Linear Models c Abdus S. Wahed Chapter 5 206 Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter

More information

Continuous Time Survival in Latent Variable Models

Continuous Time Survival in Latent Variable Models Continuous Time Survival in Latent Variable Models Tihomir Asparouhov 1, Katherine Masyn 2, Bengt Muthen 3 Muthen & Muthen 1 University of California, Davis 2 University of California, Los Angeles 3 Abstract

More information