Module 3: Latent Variable Statistical Models
As explained in Module 2, measurement error in a predictor variable will result in misleading slope coefficients, and measurement error in the response variable will result in inflated standard errors. These problems can be reduced by using latent variable statistical models in which the measurement models described in Module 2 are integrated into any of the statistical models described in Module 1. Statistical models can be specified in terms of true scores (from a strictly parallel, parallel, or tau-equivalent model) or factor scores (from a congeneric or factor analysis model). True scores and factor scores will be referred to as latent variable scores.

There are several types of analyses that benefit from an analysis of latent variable scores. In a GLM where x1 is the predictor variable of primary interest and one or more confounding variables have been included in the model, if the confounding variables are measured with error, their confounding effects will be only partially removed from the relation between x1 and y. If the confounding variables are represented by latent variables, then the effects of the confounding variables can be more effectively removed from the relation between x1 and y. If two or more predictor variables measure highly similar attributes, multicollinearity problems can be avoided by using those predictor variables as indicators of a single latent variable. If two or more response variables measure highly similar attributes, the model will contain fewer path coefficients if those response variables are used as indicators of a single latent variable.

An analysis of indirect effects is another type of analysis where analyzing latent variables is preferred to analyzing variables that are measured with error. Consider the path model illustrated below.

[Path diagram: x1 → y1 (β11) and y1 → y2 (γ12), with prediction errors e1 and e2 on y1 and y2]
Measurement error in x1 attenuates β11, and measurement error in y1 attenuates γ12. If ρx1 and ρy1 are the reliabilities of x1 and y1, then the indirect effect β11γ12 is attenuated by a factor of ρx1ρy1. For instance, if both reliabilities equal .5, then the indirect effect would be attenuated by a factor of .5(.5) = .25. Furthermore, measurement error in y1 and y2 will inflate the standard errors of both path coefficients, which in turn will inflate the standard error of the indirect effect.

Latent variable statistical models are also attractive in applications where the factor scores in a congeneric or factor analysis model represent a better approximation to the psychological construct under investigation than could be obtained from a single measurement of the construct. For instance, if spatial ability is an important variable in a statistical model, it could be assessed using a single test such as the y1 = Card Rotation Test, y2 = Hidden Figures Test, y3 = Gestalt Picture Completion Test, or y4 = Surface Development Test. However, each of these tests assesses only a particular aspect of spatial ability, and it could be argued that the factor scores in a congeneric model for y1, y2, y3, and y4 represent a more meaningful and complete representation of spatial ability.

A more general notational scheme is needed for latent variable statistical models in which some latent variables are predictor variables and some latent variables are response variables. Latent predictor variables are represented by ξ, and latent response variables are represented by η. The indicators of ξ are represented by x, and the indicators of η are represented by y. The unique factors (or measurement errors) are represented by δ for ξ and by ε for η. The factor loadings for η are represented by λy and the factor loadings for ξ are represented by λx. Several basic types of latent variable statistical models are described below.
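The attenuation arithmetic described above is easy to verify numerically. The sketch below uses illustrative path and reliability values (not values from any real study) to show how the two reliability biases multiply in an indirect effect.

```python
# Attenuation of an indirect effect by measurement error.
# With reliabilities rho_x1 and rho_y1, the estimated indirect
# effect b11*g12 is attenuated by the factor rho_x1 * rho_y1.

def attenuated_indirect_effect(b11, g12, rho_x1, rho_y1):
    """Return the attenuated indirect effect and the attenuation factor."""
    factor = rho_x1 * rho_y1
    return b11 * g12 * factor, factor

# Illustrative values: true paths of .6 and .4, both reliabilities .5
effect, factor = attenuated_indirect_effect(0.6, 0.4, 0.5, 0.5)
print(factor)   # 0.25, not 0.5: the two biases multiply
print(effect)   # 0.06 instead of the true indirect effect of 0.24
```

Note how halving the reliability of both variables quarters, rather than halves, the estimated indirect effect.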
A path diagram and the lavaan code are given for each example.

Latent Variable Regression Model

As already stated, response variable measurement error inflates the standard errors of the slope estimates, and predictor variable measurement error can attenuate or exaggerate the slope estimates depending on the pattern of correlations among the predictor variables. Furthermore, when the measurements of a response variable and a predictor variable are obtained using a common method (e.g., both are self-report measures or both are 5-point Likert scale measures), the strength of the relation between the response variable and predictor variable can be exaggerated due to common-method variance (CMV). Suppose a
sample of employees is asked to self-report their level of commitment to the organization and also to self-report their level of job performance. Some employees will overstate their true levels of commitment and job performance while other employees will understate them, and this will exaggerate the estimated correlation between organizational commitment and job performance. If CMV is a potential concern, each attribute can be measured using two or more measurement methods. A path diagram for a simple linear regression model where the predictor variable and the response variable have been measured using the same three measurement methods is illustrated below. In this example, assume that x1 and y1 have been measured using the same method, x2 and y2 have been measured using the same method, and x3 and y3 have been measured using the same method. This model includes covariances among the three pairs of unique factors that have a common measurement method. The estimate of β1 could be substantially exaggerated if these covariances are not included in the model.

[Path diagram for Model 1: ξ1 → η1 (β1) with prediction error e1; x1, x2, x3 are indicators of ξ1 (loadings 1, λx2, λx3; unique factors δ1, δ2, δ3) and y1, y2, y3 are indicators of η1 (loadings 1, λy2, λy3; unique factors ε1, ε2, ε3), with covariances σδ1ε1, σδ2ε2, σδ3ε3 between the same-method unique factors]

The lavaan model specification for Model 1 is given below.

reg.model <- ' ksi =~ 1*x1 + x2 + x3
               eta =~ 1*y1 + y2 + y3
               eta ~ ksi
               x1 ~~ y1
               x2 ~~ y2
               x3 ~~ y3 '
The ksi =~ 1*x1 + x2 + x3 command defines ξ1 and constrains the factor loading for x1 to equal 1. The eta =~ 1*y1 + y2 + y3 command defines η1 and constrains the factor loading for y1 to equal 1. The eta ~ ksi command defines the simple linear regression model with η1 as the response variable and ξ1 as the predictor variable. The x1 ~~ y1, x2 ~~ y2, and x3 ~~ y3 commands specify the covariances among the pairs of measurements that used a common method of measurement.

ANCOVA Model with Latent Covariates

An ANCOVA model in a nonexperimental design that includes one or more confounding variables as covariates can remove the linear confounding effects of the covariates and provide an estimate of the treatment effect that more closely approximates the causal effect of treatment. However, if any of the covariates are measured with error, then the confounding effects are only partially removed and the estimated effect of treatment can be misleading. The path diagram of a 2-group ANCOVA model with two latent covariates is shown below, where x5 is a dummy variable that codes group membership (group 1 = Treatment 1 and group 2 = Treatment 2). The β3 coefficient describes the difference in the two population treatment means after controlling for differences in the latent covariates (ξ1 and ξ2). The variance of e represents the within-group error variance.

[Path diagram for Model 2: y regressed on x5 (β3), ξ1 (β1), and ξ2 (β2), with prediction error e; x1 and x2 are indicators of ξ1 (unique factors δ1, δ2) and x3 and x4 are indicators of ξ2 (unique factors δ3, δ4), with all factor loadings fixed to 1]
The lavaan model specification for Model 2 is given below.

ancova.model <- ' ksi1 =~ 1*x1 + 1*x2
                  ksi2 =~ 1*x3 + 1*x4
                  y ~ b3*x5 + b2*ksi2 + b1*ksi1
                  ksi1 ~~ ksi2
                  ksi1 ~~ x5
                  ksi2 ~~ x5 '

The ksi1 =~ 1*x1 + 1*x2 command defines ξ1 and constrains the two factor loadings for x1 and x2 to equal 1. The ksi2 =~ 1*x3 + 1*x4 command defines ξ2 and constrains the two factor loadings for x3 and x4 to equal 1. The ksi1 ~~ ksi2, ksi1 ~~ x5, and ksi2 ~~ x5 commands specify the covariances among x5, ξ1, and ξ2. The variances of δ1, δ2, δ3, and δ4 are unconstrained in this example to define a tau-equivalent measurement model for x1 and x2 and a tau-equivalent measurement model for x3 and x4. Parallel measurement models could be defined by imposing an equality constraint on the variances of δ1 and δ2 and an equality constraint on the variances of δ3 and δ4 by adding the commands x1 ~~ var1*x1, x2 ~~ var1*x2, x3 ~~ var2*x3, and x4 ~~ var2*x4.

MANOVA with Latent Response Variables

As explained in Module 1, a one-way MANOVA can be used to test the null hypothesis that the population means of all r response variables are equal across all levels of the independent variable. This test does not provide useful scientific information because the null hypothesis is known to be false in virtually every application. Useful scientific or practical information can be obtained by computing Bonferroni confidence intervals for all pairwise group differences of means and for all r response variables. There are r[k(k − 1)/2] pairwise comparisons in a k-group design, but analyzing and reporting all these results could be intractable unless k and r are both small. Furthermore, a Bonferroni adjustment for so many comparisons could produce uselessly wide confidence intervals. If the r response variables represent congeneric indicators of q factors, then only q[k(k − 1)/2] pairwise comparisons need to be examined.
More importantly, the q factors might have greater psychological meaning than any of the r individual response variables.
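To see how quickly the comparison count grows, the sketch below counts the pairwise comparisons with and without latent response variables, using arbitrary illustrative values of k, r, and q.

```python
# Number of pairwise mean comparisons in a k-group design:
# r[k(k - 1)/2] for r observed response variables versus
# q[k(k - 1)/2] when the r variables are indicators of q factors.

def n_pairwise(k, n_vars):
    """Pairwise group comparisons, summed over all variables."""
    return n_vars * k * (k - 1) // 2

k, r, q = 4, 10, 2          # illustrative values
print(n_pairwise(k, r))     # 60 Bonferroni comparisons on observed variables
print(n_pairwise(k, q))     # 12 comparisons on the latent variables
```

With a Bonferroni adjustment split over 12 rather than 60 intervals, each interval can be substantially narrower.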
A path diagram of a 3-group MANOVA with q = 2 sets of congeneric measures is shown below, where x1 and x2 are dummy variables that code group membership.

[Path diagram for Model 3: η1 regressed on x1 (β11) and x2 (β21) with prediction error e1; η2 regressed on x1 (β12) and x2 (β22) with prediction error e2; y1, y2, y3 are congeneric indicators of η1 (loadings 1, λy2, λy3; measurement errors ε1, ε2, ε3) and y4, y5, y6, y7 are congeneric indicators of η2 (loadings 1, λy5, λy6, λy7; measurement errors ε4, ε5, ε6, ε7)]

In the above model with dummy coding (where xj = 1 if group = j, 0 otherwise), β11 describes the difference in the means of η1 for levels 1 and 3 of the independent variable, and β21 describes the difference in the means of η1 for levels 2 and 3 of the independent variable. Likewise, β12 describes the difference in the means of η2 for levels 1 and 3 of the independent variable, and β22 describes the difference in the means of η2 for levels 2 and 3 of the independent variable. The mean differences for levels 1 and 2 are equal to β11 − β21 and β12 − β22. The lavaan model definition for Model 3 is given below.
manova.model <- ' eta1 =~ 1*y1 + lamy2*y2 + lamy3*y3
                  eta2 =~ 1*y4 + lamy5*y5 + lamy6*y6 + lamy7*y7
                  eta1 ~ b11*x1 + b21*x2
                  eta2 ~ b12*x1 + b22*x2
                  eta1 ~~ eta2
                  b31 := b11 - b21
                  b32 := b12 - b22 '

The eta1 =~ 1*y1 + lamy2*y2 + lamy3*y3 and eta2 =~ 1*y4 + lamy5*y5 + lamy6*y6 + lamy7*y7 commands define the two factors, where the first loading in each set of congeneric measures is set equal to 1 to identify each measurement model. The eta1 ~ b11*x1 + b21*x2 and eta2 ~ b12*x1 + b22*x2 commands define the one-way MANOVA model. The eta1 ~~ eta2 command defines the covariance between the latent variable prediction errors. The b31 := b11 - b21 and b32 := b12 - b22 commands define new parameters that describe the mean differences for levels 1 and 2 of the independent variable for the two latent response variables.

Latent Variable Path Model

An example of a latent variable path model is shown below. In this model, x4 and x5 are assumed to be tau-equivalent measures, y4 and y5 are assumed to be tau-equivalent measures, x1, x2, and x3 are assumed to be congeneric measures, and y1, y2, and y3 are assumed to be congeneric measures. In this model, β11, β22, and γ12 are assumed to be meaningfully large, while β12 and β21 are assumed to be small and constrained to equal 0. The two latent predictor variables (ξ1 and ξ2) are assumed to be correlated. The correlation between e1 and e2 is assumed to be small in this example and has been constrained to equal 0. Note that the zero-constrained parameters do not appear in the path diagram. In this model, the direct effects (β11, β22, γ12) and the indirect effect (β11γ12) are not attenuated by measurement error in x1, x2, x3, x4, x5 and y1, y2, y3. In addition, measurement error in y4 and y5 will not inflate the standard errors of the direct and indirect effects.
[Path diagram for Model 4: ξ1 → η1 (β11), η1 → η2 (γ12), and ξ2 → η2 (β22); ξ1 and ξ2 are correlated (σ12); x1, x2, x3 are congeneric indicators of ξ1 (loadings 1, λx2, λx3; unique factors δ1, δ2, δ3), x4 and x5 are tau-equivalent indicators of ξ2 (unique factors δ4, δ5), y1, y2, y3 are congeneric indicators of η1 (loadings 1, λy2, λy3; measurement errors ε1, ε2, ε3), and y4 and y5 are tau-equivalent indicators of η2 (measurement errors ε4, ε5); e1 and e2 are the prediction errors of η1 and η2]

The lavaan model specification for Model 4 is given below.

path.model <- ' ksi1 =~ 1*x1 + lamx2*x2 + lamx3*x3
                ksi2 =~ 1*x4 + 1*x5
                eta1 =~ 1*y1 + lamy2*y2 + lamy3*y3
                eta2 =~ 1*y4 + 1*y5
                eta1 ~ b11*ksi1
                eta2 ~ b22*ksi2 + g12*eta1
                ind := b11*g12
                ksi1 ~~ ksi2 '

fit <- sem(path.model, data = mydata)

The ksi2 =~ 1*x4 + 1*x5 command defines ξ2 and constrains the two factor loadings for x4 and x5 to equal 1. The eta2 =~ 1*y4 + 1*y5 command defines η2 and constrains the two factor loadings for y4 and y5 to equal 1. The ksi1 =~ 1*x1 + lamx2*x2 + lamx3*x3 and eta1 =~ 1*y1 + lamy2*y2 + lamy3*y3 commands each define a congeneric measurement model. The ind := b11*g12 command defines the indirect effect of ξ1 on η2.
The ksi1 ~~ ksi2 command specifies the covariance between ξ1 and ξ2. The variances of δ4 and δ5 and the variances of ε4 and ε5 have not been constrained in the above model specification, which defines a tau-equivalent measurement model for x4 and x5 and a tau-equivalent measurement model for y4 and y5. Parallel measurement models could be defined by imposing one equality constraint on the variances of δ4 and δ5 and another equality constraint on the variances of ε4 and ε5. These equality constraints can be specified by adding the commands x4 ~~ var1*x4, x5 ~~ var1*x5, y4 ~~ var2*y4, and y5 ~~ var2*y5. The covariance between e1 and e2 has been constrained to equal 0 in Model 4, but this constraint could be removed by adding the command eta1 ~~ eta2 to the model specification. The β12 = 0 constraint could be removed by changing eta2 ~ b22*ksi2 + g12*eta1 to eta2 ~ b22*ksi2 + g12*eta1 + b12*ksi1. The β21 = 0 constraint could be removed by changing eta1 ~ b11*ksi1 to eta1 ~ b11*ksi1 + b21*ksi2. However, only two of these three constraints can be removed because otherwise the model will not be identified.

Latent Growth Curve Model

In a longitudinal study, suppose each participant (i = 1 to n) is measured at the same set of k time points (e.g., Jan, Feb, Mar, Apr). In the simplest case, the purpose of the study is to assess the linear change in the response variable over time. In this simple case, the statistical model for one randomly selected participant can be expressed as

y_it = b0i + b1i t + e_it    (3.1)

where b0i is the y-intercept for participant i, b1i is the slope of the line relating time to y for participant i, and t is the time point value (e.g., t = 1, 2, 3, 4). Given that the n participants are assumed to be a random sample from some population, it follows that the b0i and b1i values are a random sample from a population of person-level y-intercept and slope values. Equation 3.1 is called a level-1 model.
In the same way that a statistical model describes a random sample of y scores, statistical models can be used to describe a random sample of b0i and b1i values. The statistical models for b0i and b1i are called level-2 models. The following level-2 models for b0i and b1i are the simplest type because they have no predictor variables.
b0i = β00 + u0i    (3.2a)
b1i = β10 + u1i    (3.2b)

where u0i and u1i are the parameter prediction errors for the random values of b0i and b1i, respectively. These parameter prediction errors are usually assumed to be correlated with each other but are assumed to be uncorrelated with the level-1 prediction errors (e_it). The variance of u0i describes the variability of the person-level y-intercepts and the variance of u1i describes the variability of the person-level slopes in the population.

A path diagram of a latent growth curve model is illustrated below (Model 5) for the case of four equally-spaced time points. Note that the factor loadings for the intercept factor (η0) are all set equal to 1 and the factor loadings for the slope factor (η1) are set equal to 0, 1, 2, and 3. Setting the slope factor loadings to 0, 1, ..., k − 1 is called baseline centering. It is necessary to constrain the y-intercepts for y1, y2, ..., yk to zero in order to estimate β00 and β10. With baseline centering, β00 describes the population mean y score at baseline. The population mean of the person-level slopes relating time to y is described by β10. With unequally-spaced time points, such as 1, 2, 5, and 10, the slope factor loadings could be set to 0, 1, 4, and 9.

[Path diagram for Model 5: intercept factor η0 (loadings 1, 1, 1, 1) and slope factor η1 (loadings 0, 1, 2, 3) predict y1 through y4 (measurement errors ε1 through ε4); η0 and η1 have means β00 and β10 and prediction errors u0 and u1]
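The level-1 and level-2 equations can be made concrete with a small simulation. The sketch below uses purely illustrative population values (not estimates from any real data) to generate person-level intercepts and slopes from Equations 3.2a and 3.2b and then the repeated measures from Equation 3.1.

```python
import random

random.seed(1)

# Illustrative population parameters (assumptions for this sketch)
beta00, beta10 = 50.0, 2.0          # mean intercept and mean slope
sd_u0, sd_u1, sd_e = 5.0, 0.5, 1.0  # SDs of u0i, u1i, and e_it

n, times = 5000, [0, 1, 2, 3]       # baseline centering: t = 0, 1, 2, 3

data = []
for i in range(n):
    b0 = beta00 + random.gauss(0, sd_u0)   # level-2 model for b0i (Eq. 3.2a)
    b1 = beta10 + random.gauss(0, sd_u1)   # level-2 model for b1i (Eq. 3.2b)
    row = [b0 + b1 * t + random.gauss(0, sd_e) for t in times]  # level-1
    data.append(row)

# With baseline centering, the mean score at t = 0 estimates beta00 and
# the mean per-unit change estimates beta10.
mean_baseline = sum(row[0] for row in data) / n
mean_change = sum((row[3] - row[0]) / 3 for row in data) / n
print(round(mean_baseline, 2), round(mean_change, 2))
```

In a large sample the two summaries recover the assumed β00 = 50 and β10 = 2, which is what the intercept and slope factor means estimate in Model 5.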
The lavaan model specification for Model 5 is given below. The growth function works like the sem function, but the growth function is more convenient for latent growth curve models because it automatically specifies the intercepts (β00 and β10) for the intercept factor and the slope factor, and the y-intercepts for y1, y2, ..., yk are automatically constrained to equal 0.

growth.model <- ' inter =~ 1*y1 + 1*y2 + 1*y3 + 1*y4
                  slope =~ 0*y1 + 1*y2 + 2*y3 + 3*y4 '

fit <- growth(growth.model, data = mydata)
parameterestimates(fit, ci = T, level = .95)

Some of the variability in b0i and b1i could be explained by one or more predictor variables. Suppose that b0i and b1i are believed to be related to just one predictor variable x2. We can now specify the following level-2 models for b0i and b1i.

b0i = β00 + β01 x2i + u0i    (3.3a)
b1i = β10 + β11 x2i + u1i    (3.3b)

A predictor variable in a level-2 model is referred to as a time-invariant covariate because it is measured at a single point in time, usually at or before the first time period. For instance, suppose y in Model 5 represents self-esteem measured from a sample of students at four points in time (e.g., grades 3, 4, 5, and 6). A measure of extroversion at grade 3 could be used as a time-invariant predictor of self-esteem. Demographic variables such as gender, mother's education, or number of siblings are a few other examples of time-invariant covariates. The level-2 models can have zero, one, or more time-invariant covariates. The covariates for b0i are usually, but not necessarily, the same as the covariates for b1i. The lavaan model specification for a latent growth model with one time-invariant covariate (gender) is given below.

growth.model <- ' inter =~ 1*y1 + 1*y2 + 1*y3 + 1*y4
                  slope =~ 0*y1 + 1*y2 + 2*y3 + 3*y4
                  inter ~ gender
                  slope ~ gender '
In some applications, the level-1 model will include one or more predictor variables that are measured at each time period. This type of predictor variable is referred to as a time-varying covariate. Consider again the example where self-esteem is measured in grades 3, 4, 5, and 6. If academic performance is also measured each year, and we believe that self-esteem in year t is related to academic performance in year t, then the level-1 model could be expressed as

y_it = b0i + b1i t + b2i x1it + e_it    (3.4)

where x1it is an academic performance score for student i in year t. A level-1 model can have zero, one, or more time-varying covariates. The lavaan model specification for one time-invariant covariate (gender) and one time-varying covariate (self-esteem) is given below.

growth.model <- ' inter =~ 1*y1 + 1*y2 + 1*y3 + 1*y4
                  slope =~ 0*y1 + 1*y2 + 2*y3 + 3*y4
                  y1 ~ selfesteem1
                  y2 ~ selfesteem2
                  y3 ~ selfesteem3
                  y4 ~ selfesteem4
                  inter ~ gender
                  slope ~ gender '

The following approximate 100(1 − α)% confidence interval for σ²β0 and σ²β1 provides important information about the person-level variability in the intercept and slope factors

exp[ln(σ²βj) ± zα/2 √var{ln(σ²βj)}]    (3.5)

where var{ln(σ²βj)} is the squared standard error of ln(σ²βj). Square roots of the endpoints of Equation 3.5 give a confidence interval for the standard deviation of the intercept or slope random variable. The computation of Equation 3.5 can be simplified by letting lavaan compute the confidence interval for ln(σ²βj) and then exponentiating the endpoints by hand to obtain a confidence interval for σ²βj.
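The exponentiate-by-hand step can be sketched numerically. The values below (a log-variance estimate and its standard error) are illustrative only, standing in for what lavaan would report for a defined log-variance parameter.

```python
import math

# Illustrative values: ln(variance) estimate and its standard error,
# as lavaan would report for a defined parameter such as log(varslope).
log_var, se = math.log(0.40), 0.25
z = 1.96  # z_{alpha/2} for a 95% interval

# CI for ln(variance), then exponentiate the endpoints (Equation 3.5)
lo, hi = math.exp(log_var - z * se), math.exp(log_var + z * se)
print(round(lo, 3), round(hi, 3))                # CI for the variance
print(round(lo ** 0.5, 3), round(hi ** 0.5, 3))  # CI for the SD
```

Note that the resulting interval is not symmetric around the variance estimate of 0.40, which is expected for an interval constructed on the log scale.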
growth.model <- ' inter =~ 1*y1 + 1*y2 + 1*y3 + 1*y4
                  slope =~ 0*y1 + 1*y2 + 2*y3 + 3*y4
                  inter ~~ varinter*inter
                  slope ~~ varslope*slope
                  logvarinter := log(varinter)
                  logvarslope := log(varslope) '

fit <- growth(growth.model, data = mydata)
parameterestimates(fit, ci = T, level = .95)

The level-1 and level-2 models can be analyzed using mixed model statistical programs that do not require the same set of time periods for each participant. For example, mixed model programs allow one participant to be measured on occasions 1, 2, 4, and 6, a second participant to be measured on occasions 3, 5, 9, and 10, a third participant to be measured on occasions 1 and 7, and so on. However, suppose one or more of the predictor variables in the level-1 or level-2 models are latent variables. Then the mixed model programs are of no use and a latent growth curve model is required. A latent growth curve model (e.g., Model 5) can also be part of a more complex model where the intercept and slope factors are predictors of other observed or latent variables; this type of analysis is not possible using mixed model programs. The confidence intervals for σ²β0 and σ²β1 computed in mixed model programs assume the person-level intercept and slope coefficients are normally distributed in the population. This normality assumption can be relaxed in a latent growth curve analysis using optional robust standard errors.

Multiple-Group Latent Variable Models

A k-group design can be represented in a GLM by including k − 1 indicator variables as predictor variables in the model along with any quantitative predictor variables of y. Consider the simplest case of k = 2 groups with one quantitative predictor of y. Using dummy coding, the following model includes one quantitative predictor variable (x1), one dummy-coded variable (x2) to code the 2-group design, and the product of x1 and x2 to code the interaction between x1 and x2

y_i = β0 + β1 x1i + β2 x2i + β3 (x1i x2i) + e_i.    (3.6)

Alternatively, the above model can be represented by specifying two models, one for each of the two groups, as shown below
y_1i = β10 + β11 x11i + e_1i    (3.7a)
y_2i = β20 + β21 x21i + e_2i    (3.7b)

where the first subscript indicates group membership (1 or 2). It can be shown that the parameters of Equation 3.6 can be expressed in terms of the parameters of Equations 3.7a and 3.7b. Specifically, it can be shown that β2 = β10 − β20 and β3 = β11 − β21. Equations 3.7a and 3.7b are sometimes preferred to Equation 3.6 when the interaction effect is expected to be non-trivial and the researcher anticipates an examination of simple slopes, which are the β11 and β21 coefficients in Equations 3.7a and 3.7b. More importantly, if any of the quantitative predictor variables are latent variables, Equation 3.6 is of no use because it is not possible to compute the product of a dummy variable with a latent (unmeasured) variable.

Programs like lavaan can be used to analyze latent variable models in which participants have been classified (e.g., male vs. female) or randomly assigned (e.g., treatment 1 vs. treatment 2 vs. treatment 3) into k ≥ 2 groups. Thus, the k groups can represent the levels of a classification factor or a treatment factor. The k groups could also represent the combinations of two or more classification or treatment factors. In multiple-group studies, a model can be specified within each group. The model within each group could be any of the statistical models described in Module 1, any of the measurement or confirmatory factor analysis models described in Module 2, or any of the latent variable statistical models that have been described up to this point in Module 3. Interesting research questions can be addressed by comparing or combining unstandardized or standardized parameters (e.g., slopes, factor loadings, indirect effects, total effects, reliability coefficients, factor correlations, unique factor variances, means) from the multiple groups. Multiple-group measurement models can be used to assess measurement invariance.
Strict measurement invariance across two or more groups assumes equal factor loadings, equal intercepts, and equal unique error variances across groups, although the loadings, intercepts, and error variances may differ within groups. A path diagram for a 2-group study with a strictly parallel measurement model for each group is shown below.
[Path diagram for Model 6: within each group k (k = 1, 2), a single factor η has strictly parallel indicators y1, y2, y3 with common loading λy1k, common intercept μ1k, and measurement errors ε1k, ε2k, ε3k]

The following approximate 100(1 − α)% confidence interval for λyj1 − λyj2 can be used to assess the similarity of the population factor loading λyj in two study populations

λyj1 − λyj2 ± zα/2 √[var(λyj1) + var(λyj2)]    (3.8)

where var(λyjk) is the squared standard error of λyjk and √[var(λyj1) + var(λyj2)] is the estimated standard error of λyj1 − λyj2. Equation 3.8 can be used for unstandardized or standardized factor loadings, although a confidence interval for a difference in standardized loadings is usually easier to interpret. The following approximate 100(1 − α)% confidence interval for μj1 − μj2 can be used to assess the similarity of the population y-intercepts (means) in two study populations

μj1 − μj2 ± zα/2 √[var(μj1) + var(μj2)]    (3.9)
where var(μjk) is the squared standard error of μjk and √[var(μj1) + var(μj2)] is the estimated standard error of μj1 − μj2. An approximate 100(1 − α)% confidence interval for σ²ε1/σ²ε2 can be used to assess the similarity of the unique error variances in two study populations

exp[ln(σ²ε1/σ²ε2) ± zα/2 √(var{ln(σ²ε1)} + var{ln(σ²ε2)})]    (3.10)

where var{ln(σ²εk)} is the squared standard error of ln(σ²εk). The square roots of the endpoints of Equation 3.10 give a confidence interval for the ratio of unique factor standard deviations, which is easier to interpret than a ratio of variances. The lavaan model specification and multiple-group sem function for Model 6 are given below.

parallel.model <- ' eta =~ c(lam1, lam2)*y1 + c(lam1, lam2)*y2 + c(lam1, lam2)*y3
                    y1 ~~ c(var1, var2)*y1
                    y2 ~~ c(var1, var2)*y2
                    y3 ~~ c(var1, var2)*y3
                    lamdiff := lam1 - lam2
                    logratio := log(var1/var2) '

fit <- sem(parallel.model, data = mydata, std.lv = T, group = "group")
parameterestimates(fit, ci = T, level = .95)

The eta =~ c(lam1, lam2)*y1 + c(lam1, lam2)*y2 + c(lam1, lam2)*y3 command defines η with equality-constrained factor loadings within each group but not across groups. The y1 ~~ c(var1, var2)*y1, y2 ~~ c(var1, var2)*y2, and y3 ~~ c(var1, var2)*y3 commands constrain the error variances to be equal within each group but not across groups. The data file contains four variables named y1, y2, y3, and group. The lamdiff := lam1 - lam2 command creates a new parameter called lamdiff that is the difference in the common factor loading in the two groups. The logratio := log(var1/var2) command creates a new parameter called logratio, which is the natural logarithm of the ratio of the common error variances in the two groups (var1 and var2). The parameterestimates command will compute a confidence interval for ln(σ²ε1/σ²ε2), and the endpoints of this interval can be exponentiated by hand to give a confidence interval for σ²ε1/σ²ε2.
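Equation 3.8 (and, with means substituted, Equation 3.9) can also be computed by hand from per-group output. The sketch below uses illustrative loading estimates and standard errors, not values from any real fit.

```python
import math

def diff_ci(est1, se1, est2, se2, z=1.96):
    """Equations 3.8/3.9: CI for a difference of two independent estimates."""
    diff = est1 - est2
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)  # sqrt of summed squared SEs
    return diff - z * se_diff, diff + z * se_diff

# Illustrative standardized loadings and standard errors from two groups
lo, hi = diff_ci(0.80, 0.05, 0.65, 0.06)
print(round(lo, 3), round(hi, 3))
```

Here the interval barely includes 0, so the loading difference of .15 could not be declared meaningfully large at the 95% level.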
The above code allows the loadings, intercepts, and error variances to differ across the two groups. To constrain only the loadings to be equal across groups, the group.equal option in the sem function could be used as shown below.

fit <- sem(parallel.model, data = mydata, std.lv = T, group = "group",
           group.equal = "loadings")

The following code will equality-constrain the loadings, intercepts, and error variances across groups.

fit <- sem(parallel.model, data = mydata, std.lv = T, group = "group",
           group.equal = c("loadings", "intercepts", "residuals"))

[Path diagram for Model 7: within each group k (k = 1, 2), y is regressed on ξ1 (β1k) and ξ2 (β2k) with prediction error ek; x1 and x2 are tau-equivalent indicators of ξ1 (common loading λx1k; unique factors δ1k, δ2k), x3 and x4 are tau-equivalent indicators of ξ2 (common loading λx2k; unique factors δ3k, δ4k), and ξ1 and ξ2 are correlated (ρ12k)]

The path diagram for a 2-group GLM with latent predictor variables is shown above (Model 7). In this example, the two latent predictor variables are assumed to each have two tau-equivalent indicator variables. The two slope coefficients within each group (β1j and β2j) are the simple slopes, and the differences in simple slopes (β11 − β12 and β21 − β22) describe the Group × ξ1 and Group × ξ2 interactions, respectively.
The following approximate 100(1 − α)% confidence interval for a difference in slope parameters (e.g., βj1 − βj2) can be used to assess the similarity of the population slope parameters in two study populations

βj1 − βj2 ± zα/2 √[var(βj1) + var(βj2)]    (3.11)

where var(βjk) is the squared standard error of βjk and √[var(βj1) + var(βj2)] is the estimated standard error of βj1 − βj2. The slope estimates and their standard errors in Equation 3.11 can be replaced with standardized slope estimates and their standard errors.

Let ρ1 represent the population correlation between two factors, two prediction errors, or two measurement errors that is estimated in group 1, and let ρ2 represent the corresponding population correlation that is estimated from group 2. Let ρ̂1 and ρ̂2 denote the estimates of ρ1 and ρ2, respectively. Let L1 and U1 denote the lower and upper 100(1 − α)% interval estimates computed from group 1 using Equation 2.17 (Module 2), and let L2 and U2 denote the lower and upper 100(1 − α)% interval estimates computed from group 2 using Equation 2.17 (Module 2). Approximate lower and upper 100(1 − α)% interval estimates for ρ1 − ρ2 are

L = ρ̂1 − ρ̂2 − √[(ρ̂1 − L1)² + (ρ̂2 − U2)²]    (3.12a)
U = ρ̂1 − ρ̂2 + √[(ρ̂1 − U1)² + (ρ̂2 − L2)²]    (3.12b)

The lavaan model specification and multiple-group sem function for Model 7 are shown below. A group difference in the slope parameters is defined for each of the two predictor variables, and lavaan will then compute Equation 3.11 for each difference. A group.equal = "regressions" option could be added to the sem function to equality-constrain the slope coefficients across groups.

twogroupglm.model <- ' ksi1 =~ c(lamx11, lamx12)*x1 + c(lamx11, lamx12)*x2
                       ksi2 =~ c(lamx21, lamx22)*x3 + c(lamx21, lamx22)*x4
                       y ~ c(b11, b12)*ksi1 + c(b21, b22)*ksi2
                       b1diff := b11 - b12
                       b2diff := b21 - b22 '

fit <- sem(twogroupglm.model, data = mydata, std.lv = T, group = "group")
parameterestimates(fit)
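Equations 3.12a and 3.12b are computed by hand from the two within-group interval estimates. The sketch below uses illustrative correlation estimates and 95% interval endpoints, not values from any real analysis.

```python
import math

def corr_diff_ci(r1, l1, u1, r2, l2, u2):
    """Equations 3.12a and 3.12b: approximate CI for rho1 - rho2,
    built from the two within-group estimates and interval endpoints."""
    lower = r1 - r2 - math.sqrt((r1 - l1) ** 2 + (r2 - u2) ** 2)
    upper = r1 - r2 + math.sqrt((r1 - u1) ** 2 + (r2 - l2) ** 2)
    return lower, upper

# Illustrative estimates and 95% interval endpoints from the two groups
lo, hi = corr_diff_ci(0.60, 0.45, 0.72, 0.40, 0.22, 0.56)
print(round(lo, 3), round(hi, 3))
```

Unlike Equation 3.11, this construction lets the interval for the difference inherit the asymmetry of the two within-group correlation intervals.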
Model Assessment

All theoretically important included paths (slopes, factor loadings, correlations) in a latent variable model should describe meaningfully large relations. Furthermore, all excluded paths should represent small or unimportant relations. As a general recommendation for parameters that have been included in the model, standardized slope coefficients should be greater than .25 in absolute value (larger is better) and standardized factor loadings should be greater than .4 in absolute value (larger is better). Theoretically important correlations among factors or observed variables should also be greater than .25 in absolute value. Standardized factor loadings are equal to Pearson correlations in factor models with a single factor or multiple uncorrelated factors. A standardized slope for a particular response variable is equal to a Pearson correlation if there is only one predictor of that response variable or if the multiple predictor variables are uncorrelated. Confidence intervals for standardized slopes, standardized factor loadings, and correlations can be used to assess the magnitude of the parameters. Specifically, a 95% confidence interval should be completely outside the −.25 to .25 range for a standardized slope or correlation and completely outside the −.4 to .4 range for a standardized factor loading. Ideally, 95% Bonferroni confidence intervals for the included parameters will indicate that all included parameters are meaningfully large.

Model modification indices are useful in assessing model misspecification. Each index is a one degree of freedom chi-square test statistic for the null hypothesis that the parameter constraint is correct. Model modification indices for factor loadings should be examined first.
The omitted factor loading with the largest modification index can be added to the model, and if the 95% confidence interval for this factor loading is completely contained within the -.4 to .4 range (smaller is better), then the researcher could argue that constraining this loading to zero is justifiable. If the 95% confidence interval for the standardized factor loading with the largest modification index is completely contained within the -.4 to .4 range, it is likely that all other factor loadings that were constrained to zero are also small. If the 95% confidence interval for this loading is completely outside the -.4 to .4 range, then this loading should be retained in the model and all model parameters need to be re-estimated. The factor loading with the largest modification index in the revised model should also be assessed. If the confidence interval for a standardized factor loading includes -.4 or .4, the statistical results are inconclusive and
the researcher must decide to include or exclude that factor loading based on non-statistical criteria. After the excluded factor loadings have been assessed, the excluded slope parameters should be examined. The slope parameter with the largest modification index should be added to the model. If the 95% confidence interval for this standardized slope is completely contained within the -.25 to .25 range, then the researcher could argue that constraining this slope to zero is justifiable and it is likely that all other excluded slope parameters are also small. If the 95% confidence interval for this standardized slope parameter is completely outside the -.25 to .25 range, then this slope parameter should be included in the model and the excluded slope with the largest modification index in the revised model should be examined. If the confidence interval for a standardized slope includes -.25 or .25, the statistical results are inconclusive and the researcher must decide to include or exclude that slope parameter based on non-statistical criteria. When assessing parameter similarity across groups in a multi-group design, the confidence intervals for differences of factor loadings, slope parameters, or means will ideally be acceptably narrow and include 0. Confidence intervals for ratios of standard deviations should be acceptably narrow and include 1. If the confidence interval for the difference in parameters is completely contained within a -h to h interval, where h represents an acceptably small difference in parameter values, this would provide convincing evidence of parameter similarity. The value of h will depend on the parameter and the application. It is usually easier to specify h for a standardized factor loading or a standardized slope. 
For instance, if the confidence interval for a difference in standardized slopes or standardized factor loadings is completely contained within a -.1 to .1 interval, this would indicate that the standardized parameter values are very similar in the two study populations. However, a small value of h, such as .1 for a standardized slope or factor loading, will require a large sample because large samples are needed to obtain narrow confidence intervals. If several constrained parameters are unconstrained after an exploratory examination of the modification indices, the p-value and confidence interval for a particular parameter in the final model can be misleading. At a minimum, all exploratory modifications should be described in the research report. Ideally, the final model will be reanalyzed in a new
random sample. If the researcher has access to a large random sample, the sample can be randomly divided into two samples with the exploratory analysis performed on the first sample and a confirmatory analysis performed in the second (validation) sample. Only the results in the validation sample should be reported. Chi-square GOF tests can be used to assess the path models in Module 1 and all of the models in Modules 2 and 3. The GOF test is a test of the null hypothesis that all constraints (e.g., all excluded parameters equal 0 and all equality-constrained parameters are equal) on the model are correct. The GOF test is routinely misinterpreted. Researchers incorrectly interpret a p-value greater than .05 as evidence that the model is correct. In fact, the null hypothesis is almost never correct in any real application, and the p-value can exceed .05 in small samples even if the model is badly misspecified. In large samples, the p-value for a GOF test can be much less than .05 in models that are only trivially misspecified. Chi-square model comparison tests are also very popular. In multiple group designs, one model might allow corresponding parameter values to differ across groups and another model constrains these parameters to be equal across groups. The chi-square model comparison test in this example is a test of the null hypothesis that all corresponding parameters are equal across groups. This test is routinely misinterpreted. Researchers will interpret a p-value greater than .05 as evidence that all corresponding parameters are equal across groups, and if the p-value is less than .05, researchers often conclude that the parameters differ meaningfully across groups. In fact, a p-value greater than .05 does not imply the parameters are equal, and a p-value less than .05 does not imply that the parameter values are meaningfully different. 
Confidence intervals for differences or ratios of parameters, rather than model comparison tests, are needed to determine if the corresponding parameter values are similar or dissimilar across groups. Model comparison tests are also used to compare a model that includes all of the theoretically specified path parameters with a second model that omits all of these parameters. If the p-value for the chi-square model comparison test is less than .05, researchers incorrectly interpret this result as evidence that the model with the theoretically specified paths is correct or acceptable. Despite the serious limitations of the GOF and model comparison tests, most social science journals expect authors to report the results of a GOF test and possibly the results of a model comparison test. The recommendation
here is to supplement a GOF or model comparison test with appropriate confidence interval results.

Equivalent Models

Equivalent models are models that have identical GOF test statistic and fit index values with identical degrees of freedom. For instance, the six models shown below (with error terms omitted) for three variables (a, b, c) are all equivalent models with df = 1. Additional equivalent models can be specified by replacing a one-headed arrow in any of these models with a two-headed arrow.

a → b → c (Model 9a)
a → c → b (Model 9b)
b → a → c (Model 9c)
b → c → a (Model 9d)
c → a → b (Model 9e)
c → b → a (Model 9f)

When presenting the results for a proposed model, it is important to acknowledge the existence of equivalent models because different equivalent models can have substantially different interpretations and causal implications. Some equivalent models can be ruled out based on theory or logic. In applications where two or more plausible theories are represented by equivalent models, the alternative models should be acknowledged when presenting the results of the proposed model.
Assumptions

The GOF tests, model comparison tests, and all confidence intervals assume: 1) random sampling, 2) independence among the n participants, and 3) the observed random variables have an approximate multivariate normal distribution in the study population. The standard errors for path parameters, factor loadings, and correlations are sensitive primarily to the kurtosis of the observed variables. The standard errors will be too small with leptokurtic distributions and too large with platykurtic distributions. Since confidence interval results provide the best way to assess a model, leptokurtosis is more serious than platykurtosis because the confidence intervals will be misleadingly narrow with leptokurtic distributions. As noted in Module 2, if the normality assumption for any particular observed variable has been violated, it might be possible to reduce skewness and kurtosis by transforming that variable. Data transformations might also help reduce nonlinearity and heteroscedasticity. If remedial measures cannot remove excess kurtosis, confidence intervals should be computed using robust or bootstrap standard errors. The recommendation to have a sample size of at least 100 when using ULS estimation and robust standard errors in measurement models and confirmatory factor analysis models also applies to general latent variable statistical models. For indirect and total effects, which can have highly nonnormal sampling distributions, bootstrap confidence intervals based on ULS estimates are recommended. For GOF tests and fit indices, the mean adjusted (Satorra-Bentler) test statistic based on ML estimates is recommended.

Sample Size Recommendations

There are two completely separate issues regarding sample size requirements for the tests and confidence intervals presented in Modules 1, 2, and 3. One issue is the sample size required for a test or confidence interval to perform properly. 
A 95% confidence interval for some parameter is said to perform properly in a sample of size n if about 95% of the confidence intervals computed from all possible samples of size n would contain the parameter value. A hypothesis test with α = .05 in a sample of size n is said to perform properly if the null hypothesis would be rejected in about 5% of all possible samples of size n, assuming the null hypothesis is true. All of the tests and confidence intervals for the GLM and MGLM are small-sample methods and will perform properly in small samples if their assumptions (e.g., random sampling, independence among participants,
prediction error normality) have been satisfied. Tests and confidence intervals for latent variable models are large-sample methods that cannot be expected to perform properly in small samples. If the observed variables are platykurtic or at most moderately leptokurtic, confidence intervals based on ULS estimates with robust standard errors should be acceptable with sample sizes of at least 100. Confidence intervals based on ML estimates with robust standard errors can require larger sample sizes (e.g., 200 or more), especially if the model contains many parameters to be estimated. The sample size needed to obtain acceptably narrow confidence intervals is a completely different issue. If the confidence intervals are too wide, the researcher will not be able to provide convincing evidence that the population factor loadings or slope parameters that have been included in the model are meaningfully large. Narrow confidence intervals are also needed to show that factor loadings and slope parameters that have been excluded from the model are small or unimportant. Large sample sizes are usually needed to obtain acceptably narrow confidence intervals, possibly much larger than the minimum sample size needed for a robust test or confidence interval to perform properly. Sample size formulas to achieve a desired confidence interval width for latent variable model parameters are not useful because they require accurate planning values of unknown population variances and covariances. A more practical approach is to use a sample size that would produce an acceptably narrow confidence interval for a Pearson correlation (ρ yx ) between any two observed variables, because the estimated slopes and factor loadings are functions of the sample correlations. 
The required sample size to estimate ρ yx with 100(1 - α)% confidence and a desired confidence interval width equal to w is approximately

n = 4(1 - ρ yx ²)²(z α/2 /w)² + 3 (3.13)

where ρ yx is a planning value of the Pearson correlation between observed variables y and x. Equation 3.13 could be used to obtain a rough approximation to the sample size needed to show that certain factor loadings or slope parameters are small. Small factor loadings or slope parameters imply that certain correlations are small, and the planning value of ρ yx could then be set to 0. For instance, to obtain a 95% confidence interval for ρ yx that has a width of .2,
Equation 3.13 gives a sample size requirement of 388, which is substantially greater than the recommended minimum sample size requirement of 200 (assuming at most moderate leptokurtosis) for ULS estimates with robust standard errors and the mean adjusted (Satorra-Bentler) GOF test with ML estimates. If the sample can be obtained in two stages, the number of participants to sample in the second stage and add to the first-stage sample to achieve the desired confidence interval width (w) for a specific parameter is approximately equal to

n 2 = n 1 [(w 1 /w)² - 1] (3.14)

where n 1 is the size of the first-stage sample and w 1 is the width (upper limit minus lower limit) of a confidence interval for that parameter obtained in the first-stage sample. A second-stage sample of size n 2 is taken from the same study population and combined with the first-stage sample. The parameters of the latent variable model are then estimated from the combined sample of size n 1 + n 2. The precision of a confidence interval for a standard deviation or a ratio of standard deviations is best described by the upper limit to lower limit ratio rather than the difference. Let r 1 denote the upper limit to lower limit ratio for a standard deviation or a ratio of standard deviations in a first-stage sample of size n 1. The number of participants that should be sampled in the second stage and added to the first-stage sample to achieve a desired upper limit to lower limit ratio (r) is approximately equal to

n 2 = n 1 [{ln(r 1 )/ln(r)}² - 1]. (3.15)
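The planning formulas in Equations 3.13-3.15 involve only simple arithmetic and can be scripted directly. The sketch below (Python for illustration; the function names are mine) reproduces the n = 388 example for a width-.2 interval with a planning correlation of 0 and applies the two second-stage formulas to hypothetical first-stage results. Sample sizes are rounded up to the next whole number, which is consistent with the 388 reported above (4(1.96/.2)² + 3 ≈ 387.1).

```python
import math

def n_for_corr_width(rho, w, z=1.959964):
    # Equation 3.13: sample size to estimate a Pearson correlation with
    # a 95% CI of width w, given planning value rho
    return math.ceil(4 * (1 - rho**2)**2 * (z / w)**2 + 3)

def n2_for_width(n1, w1, w):
    # Equation 3.14: second-stage n to shrink a first-stage CI of width
    # w1 (obtained from n1 participants) down to a desired width w
    return math.ceil(n1 * ((w1 / w)**2 - 1))

def n2_for_ratio(n1, r1, r):
    # Equation 3.15: second-stage n to shrink a first-stage upper/lower
    # limit ratio r1 down to a desired ratio r
    return math.ceil(n1 * ((math.log(r1) / math.log(r))**2 - 1))

print(n_for_corr_width(0.0, 0.2))    # 388, matching the example in the text
print(n2_for_width(100, 0.4, 0.2))   # 300 additional participants
print(n2_for_ratio(100, 2.0, 1.5))   # 193 additional participants
```

Note that halving a confidence interval width requires roughly quadrupling the total sample size, which is exactly what Equation 3.14 produces (n 2 = 3n 1).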
Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Many economic models involve endogeneity: that is, a theoretical relationship does not fit
More informationStep 2: Select Analyze, Mixed Models, and Linear.
Example 1a. 20 employees were given a mood questionnaire on Monday, Wednesday and again on Friday. The data will be first be analyzed using a Covariance Pattern model. Step 1: Copy Example1.sav data file
More informationMultiple Regression. More Hypothesis Testing. More Hypothesis Testing The big question: What we really want to know: What we actually know: We know:
Multiple Regression Ψ320 Ainsworth More Hypothesis Testing What we really want to know: Is the relationship in the population we have selected between X & Y strong enough that we can use the relationship
More informationStructural Equation Modeling
Chapter 11 Structural Equation Modeling Hans Baumgartner and Bert Weijters Hans Baumgartner, Smeal College of Business, The Pennsylvania State University, University Park, PA 16802, USA, E-mail: jxb14@psu.edu.
More informationLecture 5: ANOVA and Correlation
Lecture 5: ANOVA and Correlation Ani Manichaikul amanicha@jhsph.edu 23 April 2007 1 / 62 Comparing Multiple Groups Continous data: comparing means Analysis of variance Binary data: comparing proportions
More informationWhat is in the Book: Outline
Estimating and Testing Latent Interactions: Advancements in Theories and Practical Applications Herbert W Marsh Oford University Zhonglin Wen South China Normal University Hong Kong Eaminations Authority
More informationChapter 3 ANALYSIS OF RESPONSE PROFILES
Chapter 3 ANALYSIS OF RESPONSE PROFILES 78 31 Introduction In this chapter we present a method for analysing longitudinal data that imposes minimal structure or restrictions on the mean responses over
More informationpsyc3010 lecture 2 factorial between-ps ANOVA I: omnibus tests
psyc3010 lecture 2 factorial between-ps ANOVA I: omnibus tests last lecture: introduction to factorial designs next lecture: factorial between-ps ANOVA II: (effect sizes and follow-up tests) 1 general
More informationAdvanced Structural Equations Models I
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More informationChapter 7 Student Lecture Notes 7-1
Chapter 7 Student Lecture Notes 7- Chapter Goals QM353: Business Statistics Chapter 7 Multiple Regression Analysis and Model Building After completing this chapter, you should be able to: Explain model
More informationSEM 2: Structural Equation Modeling
SEM 2: Structural Equation Modeling Week 1 - Causal modeling and SEM Sacha Epskamp 18-04-2017 Course Overview Mondays: Lecture Wednesdays: Unstructured practicals Three assignments First two 20% of final
More informationGeneral structural model Part 2: Categorical variables and beyond. Psychology 588: Covariance structure and factor models
General structural model Part 2: Categorical variables and beyond Psychology 588: Covariance structure and factor models Categorical variables 2 Conventional (linear) SEM assumes continuous observed variables
More informationUsing Mplus individual residual plots for. diagnostics and model evaluation in SEM
Using Mplus individual residual plots for diagnostics and model evaluation in SEM Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 20 October 31, 2017 1 Introduction A variety of plots are available
More informationApplied Quantitative Methods II
Applied Quantitative Methods II Lecture 4: OLS and Statistics revision Klára Kaĺıšková Klára Kaĺıšková AQM II - Lecture 4 VŠE, SS 2016/17 1 / 68 Outline 1 Econometric analysis Properties of an estimator
More informationAlternatives to Difference Scores: Polynomial Regression and Response Surface Methodology. Jeffrey R. Edwards University of North Carolina
Alternatives to Difference Scores: Polynomial Regression and Response Surface Methodology Jeffrey R. Edwards University of North Carolina 1 Outline I. Types of Difference Scores II. Questions Difference
More informationMixed- Model Analysis of Variance. Sohad Murrar & Markus Brauer. University of Wisconsin- Madison. Target Word Count: Actual Word Count: 2755
Mixed- Model Analysis of Variance Sohad Murrar & Markus Brauer University of Wisconsin- Madison The SAGE Encyclopedia of Educational Research, Measurement and Evaluation Target Word Count: 3000 - Actual
More informationEconomics 471: Econometrics Department of Economics, Finance and Legal Studies University of Alabama
Economics 471: Econometrics Department of Economics, Finance and Legal Studies University of Alabama Course Packet The purpose of this packet is to show you one particular dataset and how it is used in
More informationModule 1. Study Populations
Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In social and behavioral research, a study population usually consists of a specific
More informationComparing Change Scores with Lagged Dependent Variables in Models of the Effects of Parents Actions to Modify Children's Problem Behavior
Comparing Change Scores with Lagged Dependent Variables in Models of the Effects of Parents Actions to Modify Children's Problem Behavior David R. Johnson Department of Sociology and Haskell Sie Department
More informationIntroduction to Statistical Analysis
Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive
More informationConsequences of measurement error. Psychology 588: Covariance structure and factor models
Consequences of measurement error Psychology 588: Covariance structure and factor models Scaling indeterminacy of latent variables Scale of a latent variable is arbitrary and determined by a convention
More informationRecent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data
Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data July 2012 Bangkok, Thailand Cosimo Beverelli (World Trade Organization) 1 Content a) Classical regression model b)
More informationComparing IRT with Other Models
Comparing IRT with Other Models Lecture #14 ICPSR Item Response Theory Workshop Lecture #14: 1of 45 Lecture Overview The final set of slides will describe a parallel between IRT and another commonly used
More informationAssessing the relation between language comprehension and performance in general chemistry. Appendices
Assessing the relation between language comprehension and performance in general chemistry Daniel T. Pyburn a, Samuel Pazicni* a, Victor A. Benassi b, and Elizabeth E. Tappin c a Department of Chemistry,
More informationADVANCED C. MEASUREMENT INVARIANCE SEM REX B KLINE CONCORDIA
ADVANCED SEM C. MEASUREMENT INVARIANCE REX B KLINE CONCORDIA C C2 multiple model 2 data sets simultaneous C3 multiple 2 populations 2 occasions 2 methods C4 multiple unstandardized constrain to equal fit
More informationCourse Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model
Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 1: August 22, 2012
More informationInferences About the Difference Between Two Means
7 Inferences About the Difference Between Two Means Chapter Outline 7.1 New Concepts 7.1.1 Independent Versus Dependent Samples 7.1. Hypotheses 7. Inferences About Two Independent Means 7..1 Independent
More informationTrendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues
Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues Overfitting Categorical Variables Interaction Terms Non-linear Terms Linear Logarithmic y = a +
More informationDESIGNING EXPERIMENTS AND ANALYZING DATA A Model Comparison Perspective
DESIGNING EXPERIMENTS AND ANALYZING DATA A Model Comparison Perspective Second Edition Scott E. Maxwell Uniuersity of Notre Dame Harold D. Delaney Uniuersity of New Mexico J,t{,.?; LAWRENCE ERLBAUM ASSOCIATES,
More informationLongitudinal Data Analysis Using SAS Paul D. Allison, Ph.D. Upcoming Seminar: October 13-14, 2017, Boston, Massachusetts
Longitudinal Data Analysis Using SAS Paul D. Allison, Ph.D. Upcoming Seminar: October 13-14, 217, Boston, Massachusetts Outline 1. Opportunities and challenges of panel data. a. Data requirements b. Control
More information