Module 3. Latent Variable Statistical Models


As explained in Module 2, measurement error in a predictor variable will result in misleading slope coefficients, and measurement error in the response variable will result in inflated standard errors. These problems can be reduced by using latent variable statistical models in which the measurement models described in Module 2 are integrated into any of the statistical models described in Module 1. Statistical models can be specified in terms of true scores (from a strictly parallel, parallel, or tau-equivalent model) or factor scores (from a congeneric or factor analysis model). True scores and factor scores will be referred to as latent variable scores.

Several types of analyses benefit from an analysis of latent variable scores. In a GLM where $x_1$ is the predictor variable of primary interest and one or more confounding variables have been included in the model, if the confounding variables are measured with error, their confounding effects will only be partially removed from the relation between $x_1$ and $y$. If the confounding variables are represented by latent variables, then the effects of the confounding variables can be more effectively removed from the relation between $x_1$ and $y$. If two or more predictor variables measure highly similar attributes, multicollinearity problems can be avoided by using those predictor variables as indicators of a single latent variable. If two or more response variables measure highly similar attributes, the model will contain fewer path coefficients if those response variables are used as indicators of a single latent variable.

An analysis of indirect effects is another type of analysis where analyzing latent variables is preferred to analyzing variables that are measured with error. Consider the path model illustrated below.

[Path diagram: $x_1 \rightarrow y_1 \rightarrow y_2$, with path coefficient $\beta_{11}$ from $x_1$ to $y_1$, path coefficient $\gamma_{12}$ from $y_1$ to $y_2$, and prediction errors $e_1$ and $e_2$]

Measurement error in $x_1$ attenuates $\beta_{11}$, and measurement error in $y_1$ attenuates $\gamma_{12}$. If $\rho_{x_1}$ and $\rho_{y_1}$ are the reliabilities of $x_1$ and $y_1$, then the indirect effect $\beta_{11}\gamma_{12}$ is attenuated by a factor of $\rho_{x_1}\rho_{y_1}$. For instance, if both reliabilities equal .5, then the indirect effect would be attenuated by a factor of .5(.5) = .25. Furthermore, measurement error in $y_1$ and $y_2$ will inflate the standard errors of both path coefficients, which in turn will inflate the standard error for the indirect effect.

Latent variable statistical models are also attractive in applications where the factor scores in a congeneric or factor analysis model represent a better approximation to the psychological construct under investigation than what could be measured using a single measurement of the construct. For instance, if spatial ability is an important variable in a statistical model, it could be assessed using a single test such as $y_1$ = Card Rotation Test, $y_2$ = Hidden Figures Test, $y_3$ = Gestalt Picture Completion Test, or $y_4$ = Surface Development Test. However, each of these tests assesses only a particular aspect of spatial ability, and it could be argued that the factor scores in a congeneric model for $y_1$, $y_2$, $y_3$, and $y_4$ represent a more meaningful and complete representation of spatial ability.

A more general notational scheme is needed for latent variable statistical models in which some latent variables are predictor variables and some latent variables are response variables. Latent predictor variables are represented by $\xi$, and latent response variables are represented by $\eta$. The indicators of $\xi$ are represented by $x$, and the indicators of $\eta$ are represented by $y$. The unique factors (or measurement errors) are represented by $\delta$ for the indicators of $\xi$ and by $\varepsilon$ for the indicators of $\eta$. The factor loadings for $\eta$ are represented by $\lambda_y$ and the factor loadings for $\xi$ are represented by $\lambda_x$.

Several basic types of latent variable statistical models are described below.
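The attenuation of the indirect effect described earlier in this section can be checked with a small simulation. The following base R sketch is only an illustration under assumed values: the population paths (0.6 and 0.5), the sample size, and all variable names are hypothetical, and both $x_1$ and $y_1$ are given a reliability of .5 by adding error variance equal to the true-score variance.

```r
# Illustrative simulation (all values hypothetical): x -> y1 -> y2 path model
set.seed(1)
n   <- 200000
b11 <- 0.6   # assumed true path from x to y1
g12 <- 0.5   # assumed true path from y1 to y2

x_true  <- rnorm(n)                                        # true-score variance = 1
y1_true <- b11 * x_true + rnorm(n, sd = sqrt(1 - b11^2))   # true-score variance = 1
y2      <- g12 * y1_true + rnorm(n)

# Add measurement error with variance 1, so reliability = 1 / (1 + 1) = .5
x1 <- x_true  + rnorm(n, sd = 1)
y1 <- y1_true + rnorm(n, sd = 1)

# Path estimates from the error-contaminated observed scores
b11_hat <- coef(lm(y1 ~ x1))[["x1"]]
g12_hat <- coef(lm(y2 ~ y1))[["y1"]]

# Ratio of the estimated indirect effect to the true indirect effect
ratio <- (b11_hat * g12_hat) / (b11 * g12)
print(round(ratio, 3))   # should be close to .5 * .5 = .25
```

With a sample this large, the ratio should land near the product of the two reliabilities, which is the attenuation factor stated in the text.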
A path diagram and the lavaan code are given for each model type described below.

Latent Variable Regression Model

As already stated, response variable measurement error inflates the standard errors of the slope estimates, and predictor variable measurement error can attenuate or exaggerate the slope estimates depending on the pattern of correlations among the predictor variables. Furthermore, when the measurements of a response variable and a predictor variable are obtained using a common method (e.g., both are self-report measures or both are 5-point Likert scale measures), the strength of the relation between the response variable and predictor variable can be exaggerated due to common-method variance (CMV). Suppose a sample of employees is asked to self-report their level of commitment to the organization and also self-report their level of job performance. Some employees will overstate their true level of commitment and job performance while other employees will understate their true level of commitment and job performance, and this will exaggerate the estimated correlation between organizational commitment and job performance.

If CMV is a potential concern, each attribute can be measured using two or more measurement methods. A path diagram for a simple linear regression model where the predictor variable and the response variable have been measured using the same three measurement methods is illustrated below. In this example, assume that $x_1$ and $y_1$ have been measured using the same method, $x_2$ and $y_2$ have been measured using the same method, and $x_3$ and $y_3$ have been measured using the same method. This model includes covariances among the three pairs of unique factors that have a common measurement method. The estimate of $\beta_1$ could be substantially exaggerated if these covariances are not included in the model.

[Path diagram (Model 1): $\xi_1$ with indicators $x_1$, $x_2$, $x_3$ (loadings 1, $\lambda_{x2}$, $\lambda_{x3}$; unique factors $\delta_1$, $\delta_2$, $\delta_3$) predicts $\eta_1$ with indicators $y_1$, $y_2$, $y_3$ (loadings 1, $\lambda_{y2}$, $\lambda_{y3}$; unique factors $\varepsilon_1$, $\varepsilon_2$, $\varepsilon_3$) through slope $\beta_1$ with prediction error $e_1$; the covariances $\sigma_{\delta_1\varepsilon_1}$, $\sigma_{\delta_2\varepsilon_2}$, and $\sigma_{\delta_3\varepsilon_3}$ link the unique factors that share a measurement method]

The lavaan model specification for Model 1 is given below.

    reg.model <- '
      ksi =~ 1*x1 + x2 + x3
      eta =~ 1*y1 + y2 + y3
      eta ~ ksi
      x1 ~~ y1
      x2 ~~ y2
      x3 ~~ y3 '

The ksi =~ 1*x1 + x2 + x3 command defines $\xi_1$ and constrains the factor loading for $x_1$ to equal 1. The eta =~ 1*y1 + y2 + y3 command defines $\eta_1$ and constrains the factor loading for $y_1$ to equal 1. The eta ~ ksi command defines the simple linear regression model with $\eta_1$ as the response variable and $\xi_1$ as the predictor variable. The x1 ~~ y1, x2 ~~ y2, and x3 ~~ y3 commands specify the covariances among the pairs of measurements that used a common method of measurement.

ANCOVA Model with Latent Covariates

An ANCOVA model in a nonexperimental design that includes one or more confounding variables as covariates can remove the linear confounding effects of the covariates and provide an estimate of the treatment effect that more closely approximates the causal effect of treatment. However, if any of the covariates are measured with error, then the confounding effects are only partially removed and the estimated effect of treatment can be misleading.

The path diagram of a 2-group ANCOVA model with two latent covariates is shown below, where $x_5$ is a dummy variable that codes group membership (group 1 = Treatment 1 and group 2 = Treatment 2). The $\beta_3$ coefficient describes the difference in the two population treatment means after controlling for differences in the latent covariates ($\xi_1$ and $\xi_2$). The variance of $e$ represents the within-group error variance.

[Path diagram (Model 2): $\xi_1$ with indicators $x_1$ and $x_2$ (loadings fixed at 1; unique factors $\delta_1$, $\delta_2$) and $\xi_2$ with indicators $x_3$ and $x_4$ (loadings fixed at 1; unique factors $\delta_3$, $\delta_4$) predict $y$ through slopes $\beta_1$ and $\beta_2$; the dummy variable $x_5$ predicts $y$ through slope $\beta_3$; $e$ is the prediction error for $y$]

The lavaan model specification for Model 2 is given below.

    ancova.model <- '
      ksi1 =~ 1*x1 + 1*x2
      ksi2 =~ 1*x3 + 1*x4
      y ~ b3*x5 + b2*ksi2 + b1*ksi1
      ksi1 ~~ ksi2
      ksi1 ~~ x5
      ksi2 ~~ x5 '

The ksi1 =~ 1*x1 + 1*x2 command defines $\xi_1$ and constrains the two factor loadings for $x_1$ and $x_2$ to equal 1. The ksi2 =~ 1*x3 + 1*x4 command defines $\xi_2$ and constrains the two factor loadings for $x_3$ and $x_4$ to equal 1. The ksi1 ~~ ksi2, ksi1 ~~ x5, and ksi2 ~~ x5 commands specify the covariances among $x_5$, $\xi_1$, and $\xi_2$. The variances of $\delta_1$, $\delta_2$, $\delta_3$, and $\delta_4$ are unconstrained in this example, which defines a tau-equivalent measurement model for $x_1$ and $x_2$ and a tau-equivalent measurement model for $x_3$ and $x_4$. Parallel measurement models could be defined by imposing an equality constraint on the variances of $\delta_1$ and $\delta_2$ and an equality constraint on the variances of $\delta_3$ and $\delta_4$ by adding the commands x1 ~~ var1*x1, x2 ~~ var1*x2, x3 ~~ var2*x3, and x4 ~~ var2*x4.

MANOVA with Latent Response Variables

As explained in Module 1, a one-way MANOVA can be used to test the null hypothesis that the population means of all $r$ response variables are equal across all levels of the independent variable. This test does not provide useful scientific information because the null hypothesis is known to be false in virtually every application. Useful scientific or practical information can be obtained by computing Bonferroni confidence intervals for all pairwise group differences of means and for all $r$ response variables. There are $r[k(k-1)/2]$ pairwise comparisons in a $k$-group design, but analyzing and reporting all these results could be intractable unless $k$ and $r$ are both small. Furthermore, a Bonferroni adjustment for so many comparisons could produce uselessly wide confidence intervals. If the $r$ response variables represent congeneric indicators of $q$ factors, then only $q[k(k-1)/2]$ pairwise comparisons need to be examined. More importantly, the $q$ factors might have greater psychological meaning than any of the $r$ individual response variables.
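The reduction in the number of pairwise comparisons, and its effect on the Bonferroni critical value, can be illustrated with a short base R sketch. The values k = 3, r = 7, and q = 2 are assumed for illustration, and the function name n_pairwise is hypothetical.

```r
# Number of pairwise group comparisons: m variables times k(k - 1)/2 pairs of groups
n_pairwise <- function(k, m) m * choose(k, 2)

k <- 3   # assumed number of groups
r <- 7   # assumed number of observed response variables
q <- 2   # assumed number of latent response variables

n_obs <- n_pairwise(k, r)   # comparisons when analyzing the observed variables
n_lat <- n_pairwise(k, q)   # comparisons when analyzing the latent variables

# Two-sided Bonferroni critical z-values for simultaneous 95% confidence intervals
alpha <- .05
z_obs <- qnorm(1 - alpha / (2 * n_obs))
z_lat <- qnorm(1 - alpha / (2 * n_lat))

print(c(observed = n_obs, latent = n_lat))   # 21 and 6
print(round(c(z_obs = z_obs, z_lat = z_lat), 2))
```

Because the critical value grows with the number of comparisons, the intervals for the latent-variable comparisons use a smaller multiplier than the intervals for the observed-variable comparisons.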

A path diagram of a 3-group MANOVA with $q = 2$ sets of congeneric measures is shown below, where $x_1$ and $x_2$ are dummy variables that code group membership.

[Path diagram (Model 3): dummy variables $x_1$ and $x_2$ predict $\eta_1$ (slopes $\beta_{11}$ and $\beta_{21}$; prediction error $e_1$) and $\eta_2$ (slopes $\beta_{12}$ and $\beta_{22}$; prediction error $e_2$); $\eta_1$ has indicators $y_1$, $y_2$, $y_3$ (loadings 1, $\lambda_{y2}$, $\lambda_{y3}$) and $\eta_2$ has indicators $y_4$, $y_5$, $y_6$, $y_7$ (loadings 1, $\lambda_{y5}$, $\lambda_{y6}$, $\lambda_{y7}$), with unique factors $\varepsilon_1$ to $\varepsilon_7$]

In the above model with dummy coding (where $x_j = 1$ if group = $j$, 0 otherwise), $\beta_{11}$ describes the difference in the means of $\eta_1$ for levels 1 and 3 of the independent variable, and $\beta_{21}$ describes the difference in the means of $\eta_1$ for levels 2 and 3 of the independent variable. Likewise, $\beta_{12}$ describes the difference in the means of $\eta_2$ for levels 1 and 3 of the independent variable, and $\beta_{22}$ describes the difference in the means of $\eta_2$ for levels 2 and 3 of the independent variable. The mean differences for levels 1 and 2 are equal to $\beta_{11} - \beta_{21}$ and $\beta_{12} - \beta_{22}$. The lavaan model definition for Model 3 is given below.

    manova.model <- '
      eta1 =~ 1*y1 + lamy2*y2 + lamy3*y3
      eta2 =~ 1*y4 + lamy5*y5 + lamy6*y6 + lamy7*y7
      eta1 ~ b11*x1 + b21*x2
      eta2 ~ b12*x1 + b22*x2
      eta1 ~~ eta2
      b31 := b11 - b21
      b32 := b12 - b22 '

The eta1 =~ 1*y1 + lamy2*y2 + lamy3*y3 and eta2 =~ 1*y4 + lamy5*y5 + lamy6*y6 + lamy7*y7 commands define the two factors, where the first loading in each set of congeneric measurements is set equal to 1 to identify each measurement model. The eta1 ~ b11*x1 + b21*x2 and eta2 ~ b12*x1 + b22*x2 commands define the one-way MANOVA model. Each slope needs its own label because lavaan constrains parameters that share a label to be equal. The eta1 ~~ eta2 command defines the covariance between the latent variable prediction errors. The b31 := b11 - b21 and b32 := b12 - b22 commands define new parameters that describe the mean differences for levels 1 and 2 of the independent variable for the two latent response variables.

Latent Variable Path Model

An example of a latent variable path model is shown below. In this model $x_4$ and $x_5$ are assumed to be tau-equivalent measures, $y_4$ and $y_5$ are assumed to be tau-equivalent measures, $x_1$, $x_2$, and $x_3$ are assumed to be congeneric measures, and $y_1$, $y_2$, and $y_3$ are assumed to be congeneric measures. In this model, $\beta_{11}$, $\beta_{22}$, and $\gamma_{12}$ are assumed to be meaningfully large, with $\beta_{12}$ and $\beta_{21}$ assumed to be small and constrained to equal 0. The two latent predictor variables ($\xi_1$ and $\xi_2$) are assumed to be correlated. The correlation between $e_1$ and $e_2$ is assumed to be small in this example and has been constrained to equal 0. Note that the zero-constrained parameters do not appear in the path diagram.

In this model, the direct effects ($\beta_{11}$, $\beta_{22}$, $\gamma_{12}$) and the indirect effect ($\beta_{11}\gamma_{12}$) are not attenuated due to measurement error in $x_1$, $x_2$, $x_3$, $x_4$, $x_5$ and $y_1$, $y_2$, $y_3$. In addition, the measurement error in $y_4$ and $y_5$ will not inflate the standard errors of the direct and indirect effects.

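[Path diagram (Model 4): $\xi_1$ (indicators $x_1$, $x_2$, $x_3$; loadings 1, $\lambda_{x2}$, $\lambda_{x3}$; unique factors $\delta_1$ to $\delta_3$) predicts $\eta_1$ (indicators $y_1$, $y_2$, $y_3$; loadings 1, $\lambda_{y2}$, $\lambda_{y3}$; unique factors $\varepsilon_1$ to $\varepsilon_3$) through $\beta_{11}$; $\xi_2$ (indicators $x_4$, $x_5$; loadings fixed at 1; unique factors $\delta_4$, $\delta_5$) predicts $\eta_2$ (indicators $y_4$, $y_5$; loadings fixed at 1; unique factors $\varepsilon_4$, $\varepsilon_5$) through $\beta_{22}$; $\eta_1$ predicts $\eta_2$ through $\gamma_{12}$; $\sigma_{12}$ is the covariance between $\xi_1$ and $\xi_2$; $e_1$ and $e_2$ are the prediction errors]

The lavaan model specification for Model 4 is given below.

    path.model <- '
      ksi1 =~ 1*x1 + lamx2*x2 + lamx3*x3
      ksi2 =~ 1*x4 + 1*x5
      eta1 =~ 1*y1 + lamy2*y2 + lamy3*y3
      eta2 =~ 1*y4 + 1*y5
      eta1 ~ b11*ksi1
      eta2 ~ b22*ksi2 + g12*eta1
      ind := b11*g12
      ksi1 ~~ ksi2 '
    fit <- sem(path.model, data = mydata)

The ksi2 =~ 1*x4 + 1*x5 command defines $\xi_2$ and constrains the two factor loadings for $x_4$ and $x_5$ to equal 1. The eta2 =~ 1*y4 + 1*y5 command defines $\eta_2$ and constrains the two factor loadings for $y_4$ and $y_5$ to equal 1. The ksi1 =~ 1*x1 + lamx2*x2 + lamx3*x3 and eta1 =~ 1*y1 + lamy2*y2 + lamy3*y3 commands each define a congeneric measurement model. The ind := b11*g12 command defines the indirect effect of $\xi_1$ on $\eta_2$.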

The ksi1 ~~ ksi2 command specifies the covariance between $\xi_1$ and $\xi_2$. The variances of $\delta_4$ and $\delta_5$ and the variances of $\varepsilon_4$ and $\varepsilon_5$ have not been constrained in the above model specification, which defines a tau-equivalent measurement model for $x_4$ and $x_5$ and a tau-equivalent measurement model for $y_4$ and $y_5$. Parallel measurement models could be defined by imposing one equality constraint on the variances of $\delta_4$ and $\delta_5$ and another equality constraint on the variances of $\varepsilon_4$ and $\varepsilon_5$. These equality constraints can be specified by adding the commands x4 ~~ var1*x4, x5 ~~ var1*x5, y4 ~~ var2*y4, and y5 ~~ var2*y5.

The covariance between $e_1$ and $e_2$ has been constrained to equal 0 in Model 4, but this constraint could be removed by adding the command eta1 ~~ eta2 to the model specification. The $\beta_{12} = 0$ constraint could be removed by changing eta2 ~ b22*ksi2 + g12*eta1 to eta2 ~ b22*ksi2 + g12*eta1 + b12*ksi1. The $\beta_{21} = 0$ constraint could be removed by changing eta1 ~ b11*ksi1 to eta1 ~ b11*ksi1 + b21*ksi2. However, only two of these three constraints can be removed because otherwise the model will not be identified.

Latent Growth Curve Model

In a longitudinal study, suppose each participant ($i = 1$ to $n$) is measured at the same set of $k$ time points (e.g., January, February, March, April). In the simplest case, the purpose of the study is to assess the linear change in the response variable over time. In this simple case, the statistical model for one randomly selected participant can be expressed as

$y_{it} = b_{0i} + b_{1i}t + e_{it}$    (3.1)

where $b_{0i}$ is the y-intercept for participant $i$, $b_{1i}$ is the slope of the line relating time to $y$ for participant $i$, and $t$ is the time point value (e.g., $t$ = 1, 2, 3, 4). Given that the $n$ participants are assumed to be a random sample from some population, it follows that the $b_{0i}$ and $b_{1i}$ values are a random sample from a population of person-level y-intercept and slope values. Equation 3.1 is called a level-1 model.

In the same way that a statistical model describes a random sample of $y$ scores, statistical models can be used to describe a random sample of $b_{0i}$ and $b_{1i}$ values. The statistical models for $b_{0i}$ and $b_{1i}$ are called level-2 models. The following level-2 models for $b_{0i}$ and $b_{1i}$ are the simplest type because they have no predictor variables.

$b_{0i} = \beta_{00} + u_{0i}$    (3.2a)
$b_{1i} = \beta_{10} + u_{1i}$    (3.2b)

where $u_{0i}$ and $u_{1i}$ are the parameter prediction errors for the random values of $b_{0i}$ and $b_{1i}$, respectively. These parameter prediction errors are usually assumed to be correlated with each other but are assumed to be uncorrelated with the level-1 prediction errors ($e_{it}$). The variance of $u_{0i}$ describes the variability of the person-level y-intercepts and the variance of $u_{1i}$ describes the variability of the person-level slopes in the population.

A path diagram of a latent growth curve model is illustrated below (Model 5) for the case of four equally spaced time points. Note that the factor loadings for the intercept factor ($\eta_0$) are all set equal to 1 and the factor loadings for the slope factor ($\eta_1$) are set equal to 0, 1, 2, and 3. Setting the slope factor loadings to $0, 1, \ldots, k-1$ is called baseline centering. It is necessary to constrain the y-intercepts for $y_1, y_2, \ldots, y_k$ to zero in order to estimate $\beta_{00}$ and $\beta_{10}$. With baseline centering, $\beta_{00}$ describes the population mean $y$ score at baseline. The population mean of the person-level slopes relating time to $y$ is described by $\beta_{10}$. With unequally spaced time points, such as 1, 2, 5, and 10, the slope factor loadings could be set to 0, 1, 4, and 9.

[Path diagram (Model 5): intercept factor $\eta_0$ (mean $\beta_{00}$, prediction error $u_0$) with loadings of 1 on $y_1$ to $y_4$, and slope factor $\eta_1$ (mean $\beta_{10}$, prediction error $u_1$) with loadings of 0, 1, 2, and 3 on $y_1$ to $y_4$; unique factors $\varepsilon_1$ to $\varepsilon_4$]
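Baseline centering amounts to subtracting the first time value from every time value. The base R sketch below applies this to the unequally spaced example (1, 2, 5, 10) and computes the population mean of y implied by Equations 3.1 and 3.2 at each occasion. The values of beta00 and beta10 are hypothetical and chosen only for illustration.

```r
# Slope factor loadings under baseline centering: time minus the first time point
times          <- c(1, 2, 5, 10)
slope_loadings <- times - times[1]
print(slope_loadings)   # 0 1 4 9

# Hypothetical population means of the person-level intercepts and slopes
beta00 <- 20    # assumed mean y score at baseline
beta10 <- 1.5   # assumed mean change in y per unit of time

# Implied population mean of y at each occasion: beta00 + beta10 * loading
implied_means <- beta00 + beta10 * slope_loadings
print(implied_means)    # 20.0 21.5 26.0 33.5
```

The first implied mean equals beta00, which is why the intercept factor mean is interpreted as the population mean at baseline under this coding.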

The lavaan model specification for Model 5 is given below. The growth function works like the sem function, but the growth function is more convenient for latent growth curve models because it automatically specifies the intercepts ($\beta_{00}$ and $\beta_{10}$) for the intercept factor and the slope factor, and the y-intercepts for $y_1, y_2, \ldots, y_k$ are automatically constrained to equal 0.

    growth.model <- '
      inter =~ 1*y1 + 1*y2 + 1*y3 + 1*y4
      slope =~ 0*y1 + 1*y2 + 2*y3 + 3*y4 '
    fit <- growth(growth.model, data = mydata)
    parameterestimates(fit, ci = TRUE, level = .95)

Some of the variability in $b_{0i}$ and $b_{1i}$ could be explained by one or more predictor variables. Suppose that $b_{0i}$ and $b_{1i}$ are believed to be related to just one predictor variable $x_2$. We can now specify the following level-2 models for $b_{0i}$ and $b_{1i}$.

$b_{0i} = \beta_{00} + \beta_{01}x_{2i} + u_{0i}$    (3.3a)
$b_{1i} = \beta_{10} + \beta_{11}x_{2i} + u_{1i}$    (3.3b)

A predictor variable in a level-2 model is referred to as a time-invariant covariate because it is measured at a single point in time, usually at or before the first time period. For instance, suppose $y$ in Model 5 represents self-esteem measured from a sample of students at four points in time (e.g., grades 3, 4, 5, and 6). A measure of extroversion at grade 3 could be used as a time-invariant predictor of self-esteem. Demographic variables such as gender, mother's education, or number of siblings are a few other examples of time-invariant covariates. The level-2 models can have zero, one, or more time-invariant covariates. The covariates for $b_{0i}$ are usually, but not necessarily, the same as the covariates for $b_{1i}$.

The lavaan model specification for a latent growth model with one time-invariant covariate (gender) is given below.

    growth.model <- '
      inter =~ 1*y1 + 1*y2 + 1*y3 + 1*y4
      slope =~ 0*y1 + 1*y2 + 2*y3 + 3*y4
      inter ~ gender
      slope ~ gender '

In some applications, the level-1 model will include one or more predictor variables that are measured at each time period. This type of predictor variable is referred to as a time-varying covariate. Consider again the example where self-esteem is measured in grades 3, 4, 5, and 6. If academic performance is also measured each year, and we believe that self-esteem in year $t$ is related to academic performance in year $t$, then the level-1 model could be expressed as

$y_{it} = b_{0i} + b_{1i}t + b_{2i}x_{1it} + e_{it}$    (3.4)

where $x_{1it}$ is an academic performance score for student $i$ in year $t$. A level-1 model can have zero, one, or more time-varying covariates. The lavaan model specification for one time-invariant covariate (gender) and one time-varying covariate (named selfesteem1 through selfesteem4 in the code) is given below.

    growth.model <- '
      inter =~ 1*y1 + 1*y2 + 1*y3 + 1*y4
      slope =~ 0*y1 + 1*y2 + 2*y3 + 3*y4
      y1 ~ selfesteem1
      y2 ~ selfesteem2
      y3 ~ selfesteem3
      y4 ~ selfesteem4
      inter ~ gender
      slope ~ gender '

The following approximate $100(1-\alpha)\%$ confidence interval for $\sigma^2_{\beta_0}$ and $\sigma^2_{\beta_1}$ provides important information about the person-level variability in the intercept and slope factors

$\exp\left[\ln(\hat{\sigma}^2_{\beta_j}) \pm z_{\alpha/2}\sqrt{\widehat{\mathrm{var}}\{\ln(\hat{\sigma}^2_{\beta_j})\}}\right]$    (3.5)

where $\sqrt{\widehat{\mathrm{var}}\{\ln(\hat{\sigma}^2_{\beta_j})\}}$ is the standard error of $\ln(\hat{\sigma}^2_{\beta_j})$. Square roots of the endpoints of Equation 3.5 give a confidence interval for the standard deviation of the intercept or slope random variable. The computation of Equation 3.5 can be simplified by letting lavaan compute the confidence interval for $\ln(\sigma^2_{\beta_j})$, and then the endpoints can be exponentiated by hand to get a confidence interval for $\sigma^2_{\beta_j}$.
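The exponentiation step in Equation 3.5 can be sketched in base R as follows. The function name ci_log_variance is hypothetical, and the estimate and standard error passed to it are made-up numbers standing in for the estimate and standard error that a program would report for the log-variance parameter.

```r
# Confidence interval for a variance from the estimate and standard error
# of its natural logarithm (Equation 3.5)
ci_log_variance <- function(log_est, se_log, level = .95) {
  z       <- qnorm(1 - (1 - level) / 2)
  ci_log  <- log_est + c(-1, 1) * z * se_log   # interval on the log scale
  ci_var  <- exp(ci_log)                       # interval for the variance
  list(variance = ci_var, sd = sqrt(ci_var))   # square roots give the SD interval
}

# Hypothetical input: estimated slope variance of 4 with SE of ln(variance) = 0.2
out <- ci_log_variance(log_est = log(4), se_log = 0.2)
print(out$variance)
print(out$sd)
```

The interval is symmetric on the log scale, so the variance estimate is the geometric mean of the two variance endpoints rather than their midpoint.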

The following lavaan model specification labels the intercept and slope variances and defines their natural logarithms as new parameters, so that the confidence intervals needed for Equation 3.5 are reported on the log scale.

    growth.model <- '
      inter =~ 1*y1 + 1*y2 + 1*y3 + 1*y4
      slope =~ 0*y1 + 1*y2 + 2*y3 + 3*y4
      inter ~~ varinter*inter
      slope ~~ varslope*slope
      logvarinter := log(varinter)
      logvarslope := log(varslope) '
    fit <- growth(growth.model, data = mydata)
    parameterestimates(fit, ci = TRUE, level = .95)

The level-1 and level-2 models can be analyzed using mixed model statistical programs that do not require the same set of time periods for each participant. For example, mixed model programs allow one participant to be measured on occasions 1, 2, 4, and 6, a second participant to be measured on occasions 3, 5, 9, and 10, a third participant to be measured on occasions 1 and 7, and so on. However, suppose one or more of the predictor variables in the level-1 or level-2 models are latent variables. Then the mixed model programs are of no use and a latent growth curve model is required. A latent growth curve model (e.g., Model 5) can be part of a more complex model where the intercept and slope factors are predictors of other observed or latent variables. This type of analysis is not possible using mixed model programs. The confidence intervals for $\sigma^2_{\beta_0}$ and $\sigma^2_{\beta_1}$ computed in mixed model programs assume the person-level intercept and slope coefficients are normally distributed in the population. This normality assumption can be relaxed in a latent growth curve analysis using optional robust standard errors.

Multiple-Group Latent Variable Models

A $k$-group design can be represented in a GLM by including $k-1$ indicator variables as predictor variables in the model along with any quantitative predictor variables of $y$. Consider the simplest case of $k = 2$ groups with one quantitative predictor of $y$. Using dummy coding, the following model includes one quantitative predictor variable ($x_1$), one dummy coded variable ($x_2$) to code the 2-group design, and the product of $x_1$ and $x_2$ to code the interaction between $x_1$ and $x_2$

$y_i = \beta_0 + \beta_1x_{1i} + \beta_2x_{2i} + \beta_3(x_{1i}x_{2i}) + e_i$.    (3.6)

Alternatively, the above model can be represented by specifying two models, one for each of the two groups, as shown below

$y_{1i} = \beta_{10} + \beta_{11}x_{11i} + e_{1i}$    (3.7a)
$y_{2i} = \beta_{20} + \beta_{21}x_{21i} + e_{2i}$    (3.7b)

where the first subscript indicates group membership (1 or 2). It can be shown that the parameters of Equation 3.6 can be expressed in terms of the parameters of Equations 3.7a and 3.7b. Specifically, taking $x_2 = 1$ for group 1 and $x_2 = 0$ for group 2, Equation 3.6 has intercept $\beta_0$ and slope $\beta_1$ in group 2 and intercept $\beta_0 + \beta_2$ and slope $\beta_1 + \beta_3$ in group 1, so that $\beta_2 = \beta_{10} - \beta_{20}$ and $\beta_3 = \beta_{11} - \beta_{21}$. Equations 3.7a and 3.7b are sometimes preferred to Equation 3.6 when the interaction effect is expected to be non-trivial and the researcher anticipates an examination of simple slopes, which are the $\beta_{11}$ and $\beta_{21}$ coefficients in Equations 3.7a and 3.7b. More importantly, if any of the quantitative predictor variables are latent variables, Equation 3.6 is of no use because it is not possible to compute the product of a dummy variable with a latent (unmeasured) variable.

Programs like lavaan can be used to analyze latent variable models in which participants have been classified (e.g., male vs. female) or randomly assigned (e.g., treatment 1 vs. treatment 2 vs. treatment 3) into $k \geq 2$ groups. Thus, the $k$ groups can represent the levels of a classification factor or a treatment factor. The $k$ groups could also represent the combinations of two or more classification or treatment factors.

In multiple-group studies, a model can be specified within each group. The model within each group could be any of the statistical models described in Module 1, any of the measurement or confirmatory factor analysis models described in Module 2, or any of the latent variable statistical models that have been described up to this point in Module 3. Interesting research questions can be addressed by comparing or combining unstandardized or standardized parameters (e.g., slopes, factor loadings, indirect effects, total effects, reliability coefficients, factor correlations, unique factor variances, means) from the multiple groups.

Multiple-group measurement models can be used to assess measurement invariance. Strict measurement invariance across two or more groups assumes equal factor loadings, equal intercepts, and equal unique error variances across groups, although the loadings, intercepts, and error variances may differ within groups. A path diagram for a 2-group study with a strictly parallel measurement model for each group is shown below.

[Path diagram (Model 6): in Group 1, $\eta$ has indicators $y_1$, $y_2$, $y_3$ with common loading $\lambda_{y11}$, common intercept $\mu_{11}$, and unique factors $\varepsilon_{11}$, $\varepsilon_{21}$, $\varepsilon_{31}$; in Group 2, $\eta$ has the same indicators with common loading $\lambda_{y12}$, common intercept $\mu_{12}$, and unique factors $\varepsilon_{12}$, $\varepsilon_{22}$, $\varepsilon_{32}$]

The following approximate $100(1-\alpha)\%$ confidence interval for $\lambda_{yj1} - \lambda_{yj2}$ can be used to assess the similarity of the population factor loading $\lambda_{yj}$ in two study populations

$\hat{\lambda}_{yj1} - \hat{\lambda}_{yj2} \pm z_{\alpha/2}\sqrt{\widehat{\mathrm{var}}(\hat{\lambda}_{yj1}) + \widehat{\mathrm{var}}(\hat{\lambda}_{yj2})}$    (3.8)

where $\widehat{\mathrm{var}}(\hat{\lambda}_{yjk})$ is the squared standard error of $\hat{\lambda}_{yjk}$ and $\sqrt{\widehat{\mathrm{var}}(\hat{\lambda}_{yj1}) + \widehat{\mathrm{var}}(\hat{\lambda}_{yj2})}$ is the estimated standard error of $\hat{\lambda}_{yj1} - \hat{\lambda}_{yj2}$. Equation 3.8 can be used for unstandardized or standardized factor loadings, although a confidence interval for a difference in standardized loadings is usually easier to interpret.

The following approximate $100(1-\alpha)\%$ confidence interval for $\mu_{j1} - \mu_{j2}$ can be used to assess the similarity of population y-intercepts (means) in two study populations

$\hat{\mu}_{j1} - \hat{\mu}_{j2} \pm z_{\alpha/2}\sqrt{\widehat{\mathrm{var}}(\hat{\mu}_{j1}) + \widehat{\mathrm{var}}(\hat{\mu}_{j2})}$    (3.9)

where $\widehat{\mathrm{var}}(\hat{\mu}_{jk})$ is the squared standard error of $\hat{\mu}_{jk}$ and $\sqrt{\widehat{\mathrm{var}}(\hat{\mu}_{j1}) + \widehat{\mathrm{var}}(\hat{\mu}_{j2})}$ is the estimated standard error of $\hat{\mu}_{j1} - \hat{\mu}_{j2}$.

An approximate $100(1-\alpha)\%$ confidence interval for $\sigma^2_{\varepsilon 1}/\sigma^2_{\varepsilon 2}$ can be used to assess the similarity of unique error variances in two study populations

$\exp\left[\ln(\hat{\sigma}^2_{\varepsilon 1}/\hat{\sigma}^2_{\varepsilon 2}) \pm z_{\alpha/2}\sqrt{\widehat{\mathrm{var}}\{\ln(\hat{\sigma}^2_{\varepsilon 1})\} + \widehat{\mathrm{var}}\{\ln(\hat{\sigma}^2_{\varepsilon 2})\}}\right]$    (3.10)

where $\widehat{\mathrm{var}}\{\ln(\hat{\sigma}^2_{\varepsilon k})\}$ is the squared standard error of $\ln(\hat{\sigma}^2_{\varepsilon k})$. The square roots of the endpoints of Equation 3.10 give a confidence interval for the ratio of unique factor standard deviations, which is easier to interpret than a ratio of variances.

The lavaan model specification and multiple-group sem function for Model 6 are given below.

    parallel.model <- '
      eta =~ c(lam1, lam2)*y1 + c(lam1, lam2)*y2 + c(lam1, lam2)*y3
      y1 ~~ c(var1, var2)*y1
      y2 ~~ c(var1, var2)*y2
      y3 ~~ c(var1, var2)*y3
      lamdiff := lam1 - lam2
      logratio := log(var1/var2) '
    fit <- sem(parallel.model, data = mydata, std.lv = TRUE, group = "group")
    parameterestimates(fit, ci = TRUE, level = .95)

The eta =~ c(lam1, lam2)*y1 + c(lam1, lam2)*y2 + c(lam1, lam2)*y3 command defines $\eta$ with factor loadings that are constrained to be equal within each group but not across groups. The y1 ~~ c(var1, var2)*y1, y2 ~~ c(var1, var2)*y2, and y3 ~~ c(var1, var2)*y3 commands constrain the error variances to be equal within each group but not across groups. The data file contains four variables named y1, y2, y3, and group. The lamdiff := lam1 - lam2 command creates a new parameter called lamdiff that is the difference in the common factor loading in the two groups. The logratio := log(var1/var2) command creates a new parameter called logratio, which is the natural logarithm of the ratio of the common error variances in each group (var1 and var2). The parameterestimates command will compute a confidence interval for $\ln(\sigma^2_{\varepsilon 1}/\sigma^2_{\varepsilon 2})$, and the endpoints of this interval can be exponentiated by hand to give a confidence interval for $\sigma^2_{\varepsilon 1}/\sigma^2_{\varepsilon 2}$.
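Equations 3.8 to 3.10 can also be computed directly from group-specific estimates and standard errors. The base R sketch below is one way to do so. The function names ci_diff and ci_var_ratio are hypothetical, and the numeric inputs are invented for illustration rather than taken from any data set.

```r
# Equations 3.8 and 3.9: CI for a difference between two independent estimates
ci_diff <- function(est1, se1, est2, se2, level = .95) {
  z <- qnorm(1 - (1 - level) / 2)
  (est1 - est2) + c(-1, 1) * z * sqrt(se1^2 + se2^2)
}

# Equation 3.10: CI for a ratio of variances, given the standard error of
# the log of each variance estimate
ci_var_ratio <- function(var1, se_log1, var2, se_log2, level = .95) {
  z <- qnorm(1 - (1 - level) / 2)
  exp(log(var1 / var2) + c(-1, 1) * z * sqrt(se_log1^2 + se_log2^2))
}

# Hypothetical standardized loadings of .80 and .70, each with SE = .05
load_ci <- ci_diff(.80, .05, .70, .05)
print(round(load_ci, 3))

# Hypothetical error variances of 2.0 and 1.6, each with SE of ln(variance) = .15
ratio_ci <- ci_var_ratio(2.0, .15, 1.6, .15)
print(round(ratio_ci, 3))
print(round(sqrt(ratio_ci), 3))   # CI for the ratio of standard deviations
```

These functions assume the two groups are independent samples, which is why the squared standard errors are simply added.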

The parallel.model specification for Model 6 allows the loadings, intercepts, and error variances to differ across the two groups. To constrain only the loadings to be equal across groups, the group.equal option in the sem function could be used as shown below.

    fit <- sem(parallel.model, data = mydata, std.lv = TRUE, group = "group",
               group.equal = "loadings")

The following code will equality constrain the loadings, intercepts, and error variances across groups.

    fit <- sem(parallel.model, data = mydata, std.lv = TRUE, group = "group",
               group.equal = c("loadings", "intercepts", "residuals"))

[Path diagram (Model 7): in group $j$ ($j$ = 1, 2), $\xi_1$ has indicators $x_1$ and $x_2$ with common loading $\lambda_{x1j}$, $\xi_2$ has indicators $x_3$ and $x_4$ with common loading $\lambda_{x2j}$, the unique factors are $\delta_{1j}$ to $\delta_{4j}$, the factor correlation is $\rho_{12j}$, and $y$ is predicted by $\xi_1$ and $\xi_2$ through slopes $\beta_{1j}$ and $\beta_{2j}$ with prediction error $e_j$]

The path diagram for a 2-group GLM with latent predictor variables is shown above (Model 7). In this example, the two latent predictor variables are assumed to each have two tau-equivalent indicator variables. The two slope coefficients within each group ($\beta_{1j}$ and $\beta_{2j}$) are the simple slopes, and the differences in simple slopes ($\beta_{11} - \beta_{12}$ and $\beta_{21} - \beta_{22}$) describe the Group × $\xi_1$ and Group × $\xi_2$ interactions, respectively.

The following approximate $100(1-\alpha)\%$ confidence interval for a difference in slope parameters (e.g., $\beta_{j1} - \beta_{j2}$) can be used to assess the similarity of population slope parameters in two study populations

$\hat{\beta}_{j1} - \hat{\beta}_{j2} \pm z_{\alpha/2}\sqrt{\widehat{\mathrm{var}}(\hat{\beta}_{j1}) + \widehat{\mathrm{var}}(\hat{\beta}_{j2})}$    (3.11)

where $\widehat{\mathrm{var}}(\hat{\beta}_{jk})$ is the squared standard error of $\hat{\beta}_{jk}$ and $\sqrt{\widehat{\mathrm{var}}(\hat{\beta}_{j1}) + \widehat{\mathrm{var}}(\hat{\beta}_{j2})}$ is the estimated standard error of $\hat{\beta}_{j1} - \hat{\beta}_{j2}$. The slope estimates and their standard errors in Equation 3.11 can be replaced with standardized slope estimates and their standard errors.

Let $\rho_1$ represent the population correlation between two factors, two prediction errors, or two measurement errors that is estimated in group 1, and let $\rho_2$ represent the corresponding population correlation that is estimated from group 2. Let $\hat{\rho}_1$ and $\hat{\rho}_2$ denote the estimates of $\rho_1$ and $\rho_2$, respectively. Let $L_1$ and $U_1$ denote the lower and upper $100(1-\alpha)\%$ interval estimates computed from group 1 using Equation 2.17 (Module 2), and let $L_2$ and $U_2$ denote the lower and upper $100(1-\alpha)\%$ interval estimates computed from group 2 using Equation 2.17 (Module 2). Approximate lower and upper $100(1-\alpha)\%$ interval estimates for $\rho_1 - \rho_2$ are

$L = \hat{\rho}_1 - \hat{\rho}_2 - \sqrt{(\hat{\rho}_1 - L_1)^2 + (\hat{\rho}_2 - U_2)^2}$    (3.12a)
$U = \hat{\rho}_1 - \hat{\rho}_2 + \sqrt{(\hat{\rho}_1 - U_1)^2 + (\hat{\rho}_2 - L_2)^2}$.    (3.12b)

The lavaan model specification and multiple-group sem function for Model 7 are shown below. A group difference in the slope parameters is defined for each of the two predictor variables, and lavaan will then compute a confidence interval for each difference in slopes. A group.equal = "regressions" option could be added to the sem function to equality constrain the slope coefficients across groups.

    twogroupglm.model <- '
      ksi1 =~ c(lamx11, lamx12)*x1 + c(lamx11, lamx12)*x2
      ksi2 =~ c(lamx21, lamx22)*x3 + c(lamx21, lamx22)*x4
      y ~ c(b11, b12)*ksi1 + c(b21, b22)*ksi2
      b1diff := b11 - b12
      b2diff := b21 - b22 '
    fit <- sem(twogroupglm.model, data = mydata, std.lv = TRUE, group = "group")
    parameterestimates(fit)
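Equations 3.12a and 3.12b need only the two correlation estimates and their separate confidence limits, so they are easy to compute by hand or with a few lines of base R. In the sketch below the function name ci_cor_diff is hypothetical, and the estimates and limits are invented values standing in for results obtained from Equation 2.17.

```r
# Equations 3.12a and 3.12b: CI for a difference between two correlations
# from the group-specific estimates (r1, r2) and their confidence limits
ci_cor_diff <- function(r1, L1, U1, r2, L2, U2) {
  lower <- r1 - r2 - sqrt((r1 - L1)^2 + (r2 - U2)^2)
  upper <- r1 - r2 + sqrt((r1 - U1)^2 + (r2 - L2)^2)
  c(lower = lower, upper = upper)
}

# Hypothetical inputs: group 1 estimate .50 with limits (.30, .65),
# group 2 estimate .20 with limits (-.05, .42)
cor_ci <- ci_cor_diff(r1 = .50, L1 = .30, U1 = .65,
                      r2 = .20, L2 = -.05, U2 = .42)
print(round(cor_ci, 3))
```

Unlike Equation 3.11, the resulting interval is generally not symmetric about the difference in estimates, because the separate intervals for correlations are not symmetric about their estimates.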

Model Assessment

All theoretically important included paths (slopes, factor loadings, correlations) in a latent variable model should describe meaningfully large relations. Furthermore, all excluded paths should represent small or unimportant relations. As a general recommendation for parameters that have been included in the model, standardized slope coefficients should be greater than .25 in absolute value (larger is better) and standardized factor loadings should be greater than .4 in absolute value (larger is better). Theoretically important correlations among factors or observed variables should also be greater than .25 in absolute value. Standardized factor loadings are equal to Pearson correlations in factor models with a single factor or multiple uncorrelated factors. A standardized slope for a particular response variable is equal to a Pearson correlation if there is only one predictor of that response variable or if the multiple predictor variables are uncorrelated.

Confidence intervals for standardized slopes, standardized factor loadings, and correlations can be used to assess the magnitude of the parameters. Specifically, a 95% confidence interval should be completely outside the -.25 to .25 range for a standardized slope or correlation and completely outside the -.4 to .4 range for a standardized factor loading. Ideally, 95% Bonferroni confidence intervals for the included parameters will indicate that all included parameters are meaningfully large.

Model modification indices are useful in assessing model misspecification. Each index is a one degree of freedom chi-square test statistic for the null hypothesis that the parameter constraint is correct. Model modification indices for factor loadings should be examined first.
The omitted factor loading with the largest modification index can be added to the model, and if the 95% confidence interval for this standardized factor loading is completely contained within the -.4 to .4 range (smaller is better), then the researcher could argue that constraining this loading to zero is justifiable. If the 95% confidence interval for the standardized factor loading with the largest modification index is completely contained within the -.4 to .4 range, it is likely that all other factor loadings that were constrained to zero are also small. If the 95% confidence interval for this loading is completely outside the -.4 to .4 range, then this loading should be retained in the model and all model parameters need to be re-estimated. The factor loading with the largest modification index in the revised model should also be assessed. If the confidence interval for a standardized factor loading includes -.4 or .4, the statistical results are inconclusive and the researcher must decide to include or exclude that factor loading based on nonstatistical criteria.

After the excluded factor loadings have been assessed, the excluded slope parameters should be examined. The slope parameter with the largest modification index should be added to the model. If the 95% confidence interval for this standardized slope is completely contained within the -.25 to .25 range, then the researcher could argue that constraining this slope to zero is justifiable and it is likely that all other excluded slope parameters are also small. If the 95% confidence interval for this standardized slope parameter is completely outside the -.25 to .25 range, then this slope parameter should be included in the model and the excluded slope with the largest modification index in the revised model should be examined. If the confidence interval for a standardized slope includes -.25 or .25, the statistical results are inconclusive and the researcher must decide to include or exclude that slope parameter based on nonstatistical criteria.

When assessing parameter similarity across groups in a multi-group design, the confidence intervals for differences of factor loadings, slope parameters, or means will ideally be acceptably narrow and include 0. Confidence intervals for ratios of standard deviations should be acceptably narrow and include 1. If the confidence interval for the difference in parameters is completely contained within a -h to h interval, where h represents an acceptably small difference in parameter values, this would provide convincing evidence of parameter similarity. The value of h will depend on the parameter and the application. It is usually easier to specify h for a standardized factor loading or a standardized slope.
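The interval-based decision rule described above can be written as a small helper. This is a minimal base R sketch, not part of the module: the function name `classify_ci` and the example limits are hypothetical, and the cutoff (.4 for a standardized factor loading, .25 for a standardized slope, or h for a group difference) is supplied by the user.

```r
# Classify a confidence interval relative to the range -cutoff to cutoff.
classify_ci <- function(lower, upper, cutoff) {
  stopifnot(lower <= upper, cutoff > 0)
  if (lower > -cutoff && upper < cutoff) {
    "inside"        # whole interval within the range: parameter is small
  } else if (lower > cutoff || upper < -cutoff) {
    "outside"       # whole interval beyond the range: meaningfully large
  } else {
    "inconclusive"  # interval includes -cutoff or cutoff
  }
}

classify_ci(-0.12, 0.21, cutoff = 0.4)   # "inside"
classify_ci(0.46, 0.71, cutoff = 0.4)    # "outside"
classify_ci(0.18, 0.33, cutoff = 0.25)   # "inconclusive"
```

With `cutoff = h`, the same function applies to a confidence interval for a difference in standardized parameters across groups.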
For instance, if the confidence interval for the difference in standardized slopes or standardized factor loadings is completely contained within a -.1 to .1 interval, this would indicate that the standardized parameter values are very similar in the two study populations. However, a small value of h, such as .1 for a standardized slope or factor loading, will require a large sample because large samples are needed to obtain narrow confidence intervals.

If several constrained parameters are freed after an exploratory examination of the modification indices, the p-value and confidence interval for a particular parameter in the final model can be misleading. At a minimum, all exploratory modifications should be described in the research report. Ideally, the final model will be reanalyzed in a new random sample. If the researcher has access to a large random sample, the sample can be randomly divided into two samples with the exploratory analysis performed on the first sample and a confirmatory analysis performed in the second (validation) sample. Only the results in the validation sample should be reported.

Chi-square GOF tests can be used to assess the path models in Module 1 and all of the models in Modules 2 and 3. The GOF test is a test of the null hypothesis that all constraints on the model (e.g., all excluded parameters equal 0 and all equality constrained parameters are equal) are correct. The GOF test is routinely misinterpreted. Researchers incorrectly interpret a p-value greater than .05 as evidence that the model is correct. In fact, the null hypothesis is almost never correct in any real application and the p-value can exceed .05 in small samples even if the model is badly misspecified. In large samples, the p-value for a GOF test can be much less than .05 in models that are only trivially misspecified.

Chi-square model comparison tests are also very popular. In multiple group designs, one model might allow corresponding parameter values to differ across groups and another model constrains these parameters to be equal across groups. The chi-square model comparison test in this example is a test of the null hypothesis that all corresponding parameters are equal across groups. This test is routinely misinterpreted. Researchers will interpret a p-value greater than .05 as evidence that all corresponding parameters are equal across groups, and if the p-value is less than .05, researchers often conclude that the parameters differ significantly across groups. In fact, a p-value greater than .05 does not imply that the parameters are equal, and a p-value less than .05 does not imply that the parameter values are meaningfully different.
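The sample-size dependence of these chi-square tests can be illustrated numerically. The sketch below assumes the common large-sample form of the test statistic, T = (n - 1)F, where F is the minimized discrepancy (for a model comparison test, the difference in discrepancy between the two models). The value F = 0.02, the 10 degrees of freedom, and the function name `chisq_p_value` are hypothetical choices made only for illustration.

```r
# p-value of a chi-square test when the amount of misfit (F_min) is held fixed.
# Assumes T = (n - 1) * F_min, referred to a chi-square distribution with df
# degrees of freedom.
chisq_p_value <- function(n, F_min, df) {
  T_stat <- (n - 1) * F_min
  pchisq(T_stat, df = df, lower.tail = FALSE)
}

n_values <- c(100, 500, 2000)
p_values <- chisq_p_value(n_values, F_min = 0.02, df = 10)
print(data.frame(n = n_values, p = signif(p_values, 3)))
```

Under these assumptions the same small amount of misfit gives a p-value far above .05 at n = 100 and far below .05 at n = 2000, so the p-value by itself says little about the size of the misspecification.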
Confidence intervals for differences or ratios of parameters, rather than model comparison tests, are needed to determine if the corresponding parameter values are similar or dissimilar across groups.

Model comparison tests are also used to compare a model that includes all of the theoretically specified path parameters with a second model that omits all of these parameters. If the p-value for the chi-square model comparison test is less than .05, researchers incorrectly interpret this result as evidence that the model with the omitted paths is incorrect or unacceptable.

Despite the serious limitations of the GOF and model comparison tests, most social science journals expect authors to report the results of a GOF test and possibly the results of a model comparison test. The recommendation here is to supplement a GOF or model comparison test with appropriate confidence interval results.

Equivalent Models

Equivalent models are models that have identical GOF test statistics, fit index values, and degrees of freedom. For instance, the six models shown below (with error terms omitted) for three variables (a, b, c) are all equivalent models with df = 1. Additional equivalent models can be specified by replacing a one-headed arrow in any of these models with a two-headed arrow.

[Path diagrams: Model 9a (a, b, c), Model 9b (a, c, b), Model 9c (b, a, c), Model 9d (b, c, a), Model 9e (c, a, b), Model 9f (c, b, a)]

When presenting the results for a proposed model, it is important to acknowledge the existence of equivalent models because different equivalent models can have substantially different interpretations and causal implications. Some equivalent models can be ruled out based on theory or logic. In applications where two or more plausible theories are represented by equivalent models, the alternative models should be acknowledged when presenting the results of the proposed model.
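To see why equivalent models cannot be distinguished by fit, consider an assumed illustration with three standardized variables: the chain a → b → c, the reversed chain c → b → a, and the fork a ← b → c. These three models are chosen here for illustration and are not necessarily among Models 9a to 9f. Each has five free parameters, and each implies the same single constraint on the correlations:

```latex
% Assumed illustration with standardized variables a, b, c.
\begin{align*}
a \rightarrow b \rightarrow c &: \quad \rho_{ac} = \rho_{ab}\,\rho_{bc} \\
c \rightarrow b \rightarrow a &: \quad \rho_{ac} = \rho_{bc}\,\rho_{ab} \\
a \leftarrow b \rightarrow c &: \quad \rho_{ac} = \rho_{ab}\,\rho_{bc} \\
df &= \underbrace{6}_{\text{variances and covariances}}
      - \underbrace{5}_{\text{free parameters}} = 1
\end{align*}
```

Because the three models impose the same constraint, they reproduce the sample covariance matrix equally well and yield the same GOF test statistic, yet they make different causal claims.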

Assumptions

The GOF tests, model comparison tests, and all confidence intervals assume: 1) random sampling, 2) independence among the n participants, and 3) the observed random variables have an approximate multivariate normal distribution in the study population. The standard errors for path parameters, factor loadings, and correlations are sensitive primarily to the kurtosis of the observed variables. The standard errors will be too small with leptokurtic distributions and too large with platykurtic distributions. Since confidence interval results provide the best way to assess a model, leptokurtosis is more serious than platykurtosis because the confidence intervals will be misleadingly narrow with leptokurtic distributions.

As noted in Module 2, if the normality assumption for any particular observed variable has been violated, it might be possible to reduce skewness and kurtosis by transforming that variable. Data transformations might also help reduce nonlinearity and heteroscedasticity. If remedial measures cannot remove excess kurtosis, confidence intervals should be computed using robust or bootstrap standard errors. The recommendation to have a sample size of at least 100 when using ULS estimation and robust standard errors in measurement models and confirmatory factor analysis models also applies to general latent variable statistical models. For indirect and total effects, which can have highly nonnormal sampling distributions, bootstrap confidence intervals based on ULS estimates are recommended. For GOF tests and fit indices, the mean adjusted (Satorra-Bentler) test statistic based on ML estimates is recommended.

Sample Size Recommendations

There are two completely separate issues regarding sample size requirements for the tests and confidence intervals presented in Modules 1, 2, and 3. One issue is the sample size required for a test or confidence interval to perform properly.
A 95% confidence interval for some parameter is said to perform properly in a sample of size n if about 95% of the confidence intervals computed from all possible samples of size n would contain the parameter value. A hypothesis test with α = .05 in a sample of size n is said to perform properly if the null hypothesis would be rejected in about 5% of all possible samples of size n assuming the null hypothesis is true. All of the tests and confidence intervals for the GLM and MGLM are small-sample methods and will perform properly in small samples if their assumptions (e.g., random sampling, independence among participants, prediction error normality) have been satisfied. Tests and confidence intervals for latent variable models are referred to as large-sample methods that cannot be expected to perform properly in small samples. If the observed variables are platykurtic or at most moderately leptokurtic, confidence intervals based on ULS estimates with robust standard errors should be acceptable with sample sizes of at least 100. Confidence intervals based on ML estimates with robust standard errors can require larger sample sizes (e.g., 200 or more), especially if the model contains many parameters to be estimated.

The sample size needed to obtain acceptably narrow confidence intervals is a completely different issue. If the confidence intervals are too wide, the researcher will not be able to provide convincing evidence that the population factor loadings or slope parameters that have been included in the model are meaningfully large. Narrow confidence intervals are also needed to show that factor loadings and slope parameters that have been excluded from the model are small or unimportant. Large sample sizes are usually needed to obtain acceptably narrow confidence intervals, possibly much larger than the minimum sample size needed for a robust test or confidence interval to perform properly.

Sample size formulas to achieve a desired confidence interval width for latent variable model parameters are not useful because they require accurate planning values of unknown population variances and covariances. A more practical approach is to use a sample size that would produce an acceptably narrow confidence interval for a Pearson correlation ($\rho_{yx}$) between any two observed variables because the estimated slopes and factor loadings are functions of the sample correlations.
The required sample size to estimate $\rho_{yx}$ with $100(1 - \alpha)\%$ confidence and a desired confidence interval width equal to $w$ is approximately

$$n = 4(1 - \rho_{yx}^2)^2 (z_{\alpha/2}/w)^2 + 3 \qquad (3.13)$$

where $\rho_{yx}$ is a planning value of the Pearson correlation between observed variables y and x. Equation 3.13 could be used to obtain a rough approximation to the sample size needed to show that certain factor loadings or slope parameters are small. Small factor loadings or slope parameters imply that certain correlations are small and $\rho_{yx}^2$ could then be set to 0. For instance, to obtain a 95% confidence interval for $\rho_{yx}$ that has a width of .2, Equation 3.13 gives a sample size requirement of 388, which is substantially greater than the recommended minimum sample size requirement of 200 (assuming at most moderate leptokurtosis) for ULS estimates with robust standard errors and the mean adjusted (Satorra-Bentler) GOF test with ML estimates.

If the sample can be obtained in two stages, the number of participants to sample in the second stage and add to the first-stage sample to achieve the desired width ($w$) of a confidence interval for a specific parameter is approximately equal to

$$n_2 = n_1[(w_1/w)^2 - 1] \qquad (3.14)$$

where $n_1$ is the size of the first-stage sample and $w_1$ is the width (upper limit minus lower limit) of a confidence interval for a particular parameter obtained in the first-stage sample. A second-stage sample of size $n_2$ is taken from the same study population and combined with the first-stage sample. The parameters of the latent variable model are then estimated from the combined sample of size $n_1 + n_2$.

The precision of a confidence interval for a standard deviation or a ratio of standard deviations is best described by the upper limit to lower limit ratio rather than the difference. Let $r_1$ denote the upper limit to lower limit ratio for a standard deviation or a ratio of standard deviations in a first-stage sample of size $n_1$. The number of participants that should be sampled in the second stage and added to the first-stage sample to achieve the desired upper limit to lower limit ratio ($r$) is approximately equal to

$$n_2 = n_1[\{\ln(r_1)/\ln(r)\}^2 - 1]. \qquad (3.15)$$
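The three sample size approximations can be evaluated with a few lines of base R. This is a sketch under stated assumptions: the function names are hypothetical, results are rounded up to the next whole participant (an assumption for Equations 3.14 and 3.15), Equation 3.13 is taken in the form 4(1 - ρ²)²(z/w)² + 3, and the first-stage values in the last two calls are made up.

```r
# Equation 3.13: sample size for a Pearson correlation CI of width w,
# with planning value rho.
n_for_cor_ci <- function(rho, w, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  ceiling(4 * (1 - rho^2)^2 * (z / w)^2 + 3)
}

# Equation 3.14: second-stage sample size given the first-stage CI width w1.
n2_for_width <- function(n1, w1, w) {
  ceiling(n1 * ((w1 / w)^2 - 1))
}

# Equation 3.15: second-stage sample size given the first-stage
# upper limit to lower limit ratio r1.
n2_for_ratio <- function(n1, r1, r) {
  ceiling(n1 * ((log(r1) / log(r))^2 - 1))
}

n_for_cor_ci(rho = 0, w = 0.2)               # 388, the value quoted in the text
n2_for_width(n1 = 150, w1 = 0.30, w = 0.20)  # hypothetical first-stage results
n2_for_ratio(n1 = 150, r1 = 1.5, r = 1.3)    # hypothetical first-stage results
```

The first call reproduces the requirement of 388 for a 95% interval of width .2 with a planning value of 0.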


More information

sphericity, 5-29, 5-32 residuals, 7-1 spread and level, 2-17 t test, 1-13 transformations, 2-15 violations, 1-19

sphericity, 5-29, 5-32 residuals, 7-1 spread and level, 2-17 t test, 1-13 transformations, 2-15 violations, 1-19 additive tree structure, 10-28 ADDTREE, 10-51, 10-53 EXTREE, 10-31 four point condition, 10-29 ADDTREE, 10-28, 10-51, 10-53 adjusted R 2, 8-7 ALSCAL, 10-49 ANCOVA, 9-1 assumptions, 9-5 example, 9-7 MANOVA

More information

Three-Level Modeling for Factorial Experiments With Experimentally Induced Clustering

Three-Level Modeling for Factorial Experiments With Experimentally Induced Clustering Three-Level Modeling for Factorial Experiments With Experimentally Induced Clustering John J. Dziak The Pennsylvania State University Inbal Nahum-Shani The University of Michigan Copyright 016, Penn State.

More information

(Where does Ch. 7 on comparing 2 means or 2 proportions fit into this?)

(Where does Ch. 7 on comparing 2 means or 2 proportions fit into this?) 12. Comparing Groups: Analysis of Variance (ANOVA) Methods Response y Explanatory x var s Method Categorical Categorical Contingency tables (Ch. 8) (chi-squared, etc.) Quantitative Quantitative Regression

More information

In Class Review Exercises Vartanian: SW 540

In Class Review Exercises Vartanian: SW 540 In Class Review Exercises Vartanian: SW 540 1. Given the following output from an OLS model looking at income, what is the slope and intercept for those who are black and those who are not black? b SE

More information

Prepared by: Prof. Dr Bahaman Abu Samah Department of Professional Development and Continuing Education Faculty of Educational Studies Universiti

Prepared by: Prof. Dr Bahaman Abu Samah Department of Professional Development and Continuing Education Faculty of Educational Studies Universiti Prepared by: Prof. Dr Bahaman Abu Samah Department of Professional Development and Continuing Education Faculty of Educational Studies Universiti Putra Malaysia Serdang Use in experiment, quasi-experiment

More information

Systematic error, of course, can produce either an upward or downward bias.

Systematic error, of course, can produce either an upward or downward bias. Brief Overview of LISREL & Related Programs & Techniques (Optional) Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised April 6, 2015 STRUCTURAL AND MEASUREMENT MODELS:

More information

Lectures 5 & 6: Hypothesis Testing

Lectures 5 & 6: Hypothesis Testing Lectures 5 & 6: Hypothesis Testing in which you learn to apply the concept of statistical significance to OLS estimates, learn the concept of t values, how to use them in regression work and come across

More information

9. Linear Regression and Correlation

9. Linear Regression and Correlation 9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,

More information

Chapter 3: Testing alternative models of data

Chapter 3: Testing alternative models of data Chapter 3: Testing alternative models of data William Revelle Northwestern University Prepared as part of course on latent variable analysis (Psychology 454) and as a supplement to the Short Guide to R

More information

Lecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression:

Lecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression: Biost 518 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture utline Choice of Model Alternative Models Effect of data driven selection of

More information

9 Correlation and Regression

9 Correlation and Regression 9 Correlation and Regression SW, Chapter 12. Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then retakes the

More information

Outline

Outline 2559 Outline cvonck@111zeelandnet.nl 1. Review of analysis of variance (ANOVA), simple regression analysis (SRA), and path analysis (PA) 1.1 Similarities and differences between MRA with dummy variables

More information

Can you tell the relationship between students SAT scores and their college grades?

Can you tell the relationship between students SAT scores and their college grades? Correlation One Challenge Can you tell the relationship between students SAT scores and their college grades? A: The higher SAT scores are, the better GPA may be. B: The higher SAT scores are, the lower

More information

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Many economic models involve endogeneity: that is, a theoretical relationship does not fit

More information

Step 2: Select Analyze, Mixed Models, and Linear.

Step 2: Select Analyze, Mixed Models, and Linear. Example 1a. 20 employees were given a mood questionnaire on Monday, Wednesday and again on Friday. The data will be first be analyzed using a Covariance Pattern model. Step 1: Copy Example1.sav data file

More information

Multiple Regression. More Hypothesis Testing. More Hypothesis Testing The big question: What we really want to know: What we actually know: We know:

Multiple Regression. More Hypothesis Testing. More Hypothesis Testing The big question: What we really want to know: What we actually know: We know: Multiple Regression Ψ320 Ainsworth More Hypothesis Testing What we really want to know: Is the relationship in the population we have selected between X & Y strong enough that we can use the relationship

More information

Structural Equation Modeling

Structural Equation Modeling Chapter 11 Structural Equation Modeling Hans Baumgartner and Bert Weijters Hans Baumgartner, Smeal College of Business, The Pennsylvania State University, University Park, PA 16802, USA, E-mail: jxb14@psu.edu.

More information

Lecture 5: ANOVA and Correlation

Lecture 5: ANOVA and Correlation Lecture 5: ANOVA and Correlation Ani Manichaikul amanicha@jhsph.edu 23 April 2007 1 / 62 Comparing Multiple Groups Continous data: comparing means Analysis of variance Binary data: comparing proportions

More information

What is in the Book: Outline

What is in the Book: Outline Estimating and Testing Latent Interactions: Advancements in Theories and Practical Applications Herbert W Marsh Oford University Zhonglin Wen South China Normal University Hong Kong Eaminations Authority

More information

Chapter 3 ANALYSIS OF RESPONSE PROFILES

Chapter 3 ANALYSIS OF RESPONSE PROFILES Chapter 3 ANALYSIS OF RESPONSE PROFILES 78 31 Introduction In this chapter we present a method for analysing longitudinal data that imposes minimal structure or restrictions on the mean responses over

More information

psyc3010 lecture 2 factorial between-ps ANOVA I: omnibus tests

psyc3010 lecture 2 factorial between-ps ANOVA I: omnibus tests psyc3010 lecture 2 factorial between-ps ANOVA I: omnibus tests last lecture: introduction to factorial designs next lecture: factorial between-ps ANOVA II: (effect sizes and follow-up tests) 1 general

More information

Advanced Structural Equations Models I

Advanced Structural Equations Models I This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

Chapter 7 Student Lecture Notes 7-1

Chapter 7 Student Lecture Notes 7-1 Chapter 7 Student Lecture Notes 7- Chapter Goals QM353: Business Statistics Chapter 7 Multiple Regression Analysis and Model Building After completing this chapter, you should be able to: Explain model

More information

SEM 2: Structural Equation Modeling

SEM 2: Structural Equation Modeling SEM 2: Structural Equation Modeling Week 1 - Causal modeling and SEM Sacha Epskamp 18-04-2017 Course Overview Mondays: Lecture Wednesdays: Unstructured practicals Three assignments First two 20% of final

More information

General structural model Part 2: Categorical variables and beyond. Psychology 588: Covariance structure and factor models

General structural model Part 2: Categorical variables and beyond. Psychology 588: Covariance structure and factor models General structural model Part 2: Categorical variables and beyond Psychology 588: Covariance structure and factor models Categorical variables 2 Conventional (linear) SEM assumes continuous observed variables

More information

Using Mplus individual residual plots for. diagnostics and model evaluation in SEM

Using Mplus individual residual plots for. diagnostics and model evaluation in SEM Using Mplus individual residual plots for diagnostics and model evaluation in SEM Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 20 October 31, 2017 1 Introduction A variety of plots are available

More information

Applied Quantitative Methods II

Applied Quantitative Methods II Applied Quantitative Methods II Lecture 4: OLS and Statistics revision Klára Kaĺıšková Klára Kaĺıšková AQM II - Lecture 4 VŠE, SS 2016/17 1 / 68 Outline 1 Econometric analysis Properties of an estimator

More information

Alternatives to Difference Scores: Polynomial Regression and Response Surface Methodology. Jeffrey R. Edwards University of North Carolina

Alternatives to Difference Scores: Polynomial Regression and Response Surface Methodology. Jeffrey R. Edwards University of North Carolina Alternatives to Difference Scores: Polynomial Regression and Response Surface Methodology Jeffrey R. Edwards University of North Carolina 1 Outline I. Types of Difference Scores II. Questions Difference

More information

Mixed- Model Analysis of Variance. Sohad Murrar & Markus Brauer. University of Wisconsin- Madison. Target Word Count: Actual Word Count: 2755

Mixed- Model Analysis of Variance. Sohad Murrar & Markus Brauer. University of Wisconsin- Madison. Target Word Count: Actual Word Count: 2755 Mixed- Model Analysis of Variance Sohad Murrar & Markus Brauer University of Wisconsin- Madison The SAGE Encyclopedia of Educational Research, Measurement and Evaluation Target Word Count: 3000 - Actual

More information

Economics 471: Econometrics Department of Economics, Finance and Legal Studies University of Alabama

Economics 471: Econometrics Department of Economics, Finance and Legal Studies University of Alabama Economics 471: Econometrics Department of Economics, Finance and Legal Studies University of Alabama Course Packet The purpose of this packet is to show you one particular dataset and how it is used in

More information

Module 1. Study Populations

Module 1. Study Populations Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In social and behavioral research, a study population usually consists of a specific

More information

Comparing Change Scores with Lagged Dependent Variables in Models of the Effects of Parents Actions to Modify Children's Problem Behavior

Comparing Change Scores with Lagged Dependent Variables in Models of the Effects of Parents Actions to Modify Children's Problem Behavior Comparing Change Scores with Lagged Dependent Variables in Models of the Effects of Parents Actions to Modify Children's Problem Behavior David R. Johnson Department of Sociology and Haskell Sie Department

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

Consequences of measurement error. Psychology 588: Covariance structure and factor models

Consequences of measurement error. Psychology 588: Covariance structure and factor models Consequences of measurement error Psychology 588: Covariance structure and factor models Scaling indeterminacy of latent variables Scale of a latent variable is arbitrary and determined by a convention

More information

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data July 2012 Bangkok, Thailand Cosimo Beverelli (World Trade Organization) 1 Content a) Classical regression model b)

More information

Comparing IRT with Other Models

Comparing IRT with Other Models Comparing IRT with Other Models Lecture #14 ICPSR Item Response Theory Workshop Lecture #14: 1of 45 Lecture Overview The final set of slides will describe a parallel between IRT and another commonly used

More information

Assessing the relation between language comprehension and performance in general chemistry. Appendices

Assessing the relation between language comprehension and performance in general chemistry. Appendices Assessing the relation between language comprehension and performance in general chemistry Daniel T. Pyburn a, Samuel Pazicni* a, Victor A. Benassi b, and Elizabeth E. Tappin c a Department of Chemistry,

More information

ADVANCED C. MEASUREMENT INVARIANCE SEM REX B KLINE CONCORDIA

ADVANCED C. MEASUREMENT INVARIANCE SEM REX B KLINE CONCORDIA ADVANCED SEM C. MEASUREMENT INVARIANCE REX B KLINE CONCORDIA C C2 multiple model 2 data sets simultaneous C3 multiple 2 populations 2 occasions 2 methods C4 multiple unstandardized constrain to equal fit

More information

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 1: August 22, 2012

More information

Inferences About the Difference Between Two Means

Inferences About the Difference Between Two Means 7 Inferences About the Difference Between Two Means Chapter Outline 7.1 New Concepts 7.1.1 Independent Versus Dependent Samples 7.1. Hypotheses 7. Inferences About Two Independent Means 7..1 Independent

More information

Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues

Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues Overfitting Categorical Variables Interaction Terms Non-linear Terms Linear Logarithmic y = a +

More information

DESIGNING EXPERIMENTS AND ANALYZING DATA A Model Comparison Perspective

DESIGNING EXPERIMENTS AND ANALYZING DATA A Model Comparison Perspective DESIGNING EXPERIMENTS AND ANALYZING DATA A Model Comparison Perspective Second Edition Scott E. Maxwell Uniuersity of Notre Dame Harold D. Delaney Uniuersity of New Mexico J,t{,.?; LAWRENCE ERLBAUM ASSOCIATES,

More information

Longitudinal Data Analysis Using SAS Paul D. Allison, Ph.D. Upcoming Seminar: October 13-14, 2017, Boston, Massachusetts

Longitudinal Data Analysis Using SAS Paul D. Allison, Ph.D. Upcoming Seminar: October 13-14, 2017, Boston, Massachusetts Longitudinal Data Analysis Using SAS Paul D. Allison, Ph.D. Upcoming Seminar: October 13-14, 217, Boston, Massachusetts Outline 1. Opportunities and challenges of panel data. a. Data requirements b. Control

More information