Structural Equation Modeling

Chapter 11
Structural Equation Modeling
Hans Baumgartner and Bert Weijters

Hans Baumgartner, Smeal College of Business, The Pennsylvania State University, University Park, PA 16802, USA
Bert Weijters, Department of Personnel Management, Work and Organizational Psychology, Ghent University, Dunantlaan 2, 9000 Ghent, Belgium

Introduction

The term structural equation modeling (SEM) refers to a family of multivariate techniques concerned with the examination of relationships between constructs (conceptual or latent variables) that can generally be measured only imperfectly by observed variables. For example, a researcher may be interested in the determinants of consumers' use of self-scanning when buying groceries in a grocery store (Weijters et al. 2007). Although the actual use of a self-scanning device is directly observable, antecedents such as consumers' attitude toward self-scanning technology or specific beliefs about the benefits of using self-scanning (perceived usefulness, perceived ease of use, etc.) cannot be directly observed and have to be assessed indirectly via self-report or other means. SEM has two important advantages over other, related techniques (e.g., exploratory factor analysis, regression analysis). First, and perhaps most importantly, SEM enables a sophisticated analysis of how well the theoretical concepts of interest are measured by observable measures. When using SEM for measurement analysis, a researcher will usually specify an explicit measurement model in which each observed variable is linked to a theoretical concept of interest (conceived of as a latent variable of substantive interest) and to measurement error. More complex measurement models can also be formulated, in which the observed measures share sources of systematic covariation other than a common underlying construct. This is important because most conceptual variables of interest can only be measured with error (both random and systematic), and ignoring measurement error has undesirable effects on model estimation and testing.
Second, SEM makes it possible to investigate complex patterns of relationships among the constructs in one's theory and to assess, both in an overall sense and in terms of specific relations between constructs, how well the hypothesized model represents the data. The model describing the relationships between the constructs in one's theory is usually called the latent variable (or structural) model. For example, a researcher can test whether several perceived benefits of self-scanning technologies influence the actual use of such technologies directly or indirectly via attitudes toward these technologies (i.e., whether attitudes mediate the effects of beliefs on behavior), and can also investigate whether the relationships of interest are invariant across gender or other potential moderators.

Initially, SEM was designed for linear relationships between continuous or quasi-continuous observed variables that originated from a single population and for which the assumption of multivariate normality was reasonable. However, substantial progress has been made in broadening the scope of SEM. The model has been extended to represent data from multiple populations (multi-sample analysis), and the heterogeneity can even be unobserved (mixture models; see Chap. 13). Observed variables need not be continuous (e.g., binary, ordinal, or count variables can be modeled), and the latent variables can also be discrete. Estimation procedures that correct for violations of multivariate normality are available, and Bayesian estimation procedures have been incorporated into some programs. Missing data can be readily accommodated during model estimation. Models for nonlinear relationships (e.g., interactions between latent variables) have been developed. Traditional cross-sectional models have been supplemented with increasingly sophisticated longitudinal models such as latent curve models. Complex survey designs (e.g., stratification, cluster sampling) can be handled easily, and structural models can be specified at several levels in multilevel models. Because of space constraints, this chapter focuses on cross-sectional confirmatory factor models and on full structural equation models, which combine a confirmatory factor model with a path model for the latent variables.
We will emphasize models for continuous observed and latent variables, although we will briefly mention extensions to other observed variable types. After reviewing some general issues of model specification, estimation, and testing (Sect. 11.2), we will discuss confirmatory factor models (Sect. 11.3) and full structural equation models (Sect. 11.4) in more detail. We will also present an empirical example to illustrate SEM in a particular context (Sect. 11.5). Finally, in the last section, we briefly discuss common applications of SEM in marketing and provide an overview of computer programs available for estimating and testing structural equation models.

An Overview of Structural Equation Modeling

Specification of Structural Equation Models

A structural equation model can be specified algebraically or graphically. Since a graphical representation, if done correctly, is a complete formulation of the underlying model and often communicates the desired specification more intuitively, we will emphasize graphical models. To make the discussion more concrete, we will consider a specific model. According to the model shown in Fig. 11.1, a consumer's attitude toward the use of self-scanning technologies (SST) is a function of five types of benefits: perceived usefulness (PU), perceived ease of use (PEU), reliability (REL), fun (FUN), and newness (NEW). Attitude toward the use of self-scanning (ATT), in turn, influences a consumer's actual use of self-scanning (USE). Because USE is a 0/1 observed dependent variable, a probit transformation of the probability of using SST is employed so that the relationship of actual use with its antecedents is linear; a logit specification would be another possibility.

The six unobserved constructs in this simple model are shown as ellipses (or circles), which signifies that they are the conceptual (latent) entities of theoretical interest. The five benefit constructs are called exogenous latent variables because they are not influenced by other latent variables in the model. In contrast, attitude and USE are endogenous latent variables (although USE is not really latent in the present case) because they depend on other constructs in the model. The Greek letters ξ (ksi) and η (eta) are sometimes used to refer to exogenous and endogenous latent variables, respectively, but more descriptive names are used in the present case. Directed paths from exogenous to endogenous latent variables are sometimes called γ (gamma), and directed paths from endogenous to other endogenous latent variables are called β (beta), although it is not necessary to make this distinction. The model assumes that the determinants of an endogenous latent variable do not account for all of the variation in the variable, which implies that an error term (ζ, zeta) is associated with each endogenous latent variable (a so-called error in equation or equation disturbance); there is no error term for Probit[P(USE)] since it is fixed in the present case. Curved arrows starting and ending at the same variable indicate variances, and two-way arrows between variables indicate covariances. For example, the curved arrows associated with the five belief factors are the variances of the exogenous constructs, which are denoted by $\phi_{ii}$ (phi). For simplicity, the variances of the errors in equations (usually denoted by $\psi_{ii}$, psi) and the covariances between the exogenous constructs ($\phi_{ij}$) are not shown explicitly in the model; usually, non-zero covariances between the exogenous constructs are assumed by default.
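In the Greek-letter notation just described, the latent variable model can be written compactly in matrix form. The display below is a sketch that transcribes the model in Fig. 11.1 into the common LISREL-style notation $\eta = B\eta + \Gamma\xi + \zeta$; the matrix layout is an illustration, not a formula given in the chapter. ATT and Probit[P(USE)] are the endogenous variables and the five benefit factors are the exogenous ones:

```latex
\boldsymbol{\eta} = \mathbf{B}\boldsymbol{\eta} + \boldsymbol{\Gamma}\boldsymbol{\xi} + \boldsymbol{\zeta},
\qquad
\boldsymbol{\eta} = \begin{pmatrix} \mathrm{ATT} \\ \operatorname{Probit}[P(\mathrm{USE})] \end{pmatrix},
\quad
\boldsymbol{\xi} = \begin{pmatrix} \mathrm{PU} \\ \mathrm{PEU} \\ \mathrm{REL} \\ \mathrm{FUN} \\ \mathrm{NEW} \end{pmatrix},
\\[6pt]
\mathbf{B} = \begin{pmatrix} 0 & 0 \\ \beta & 0 \end{pmatrix},
\quad
\boldsymbol{\Gamma} = \begin{pmatrix} \gamma_1 & \gamma_2 & \gamma_3 & \gamma_4 & \gamma_5 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix},
\quad
\boldsymbol{\zeta} = \begin{pmatrix} \zeta_1 \\ 0 \end{pmatrix},
\quad
\operatorname{Cov}(\boldsymbol{\xi}) = \boldsymbol{\Phi},
\quad
\operatorname{Var}(\zeta_1) = \psi_{11}
```

Multiplying out the first row gives ATT = γ1 PU + γ2 PEU + γ3 REL + γ4 FUN + γ5 NEW + ζ1, and the second row gives Probit[P(USE)] = β ATT. The zeros in the second row of Γ encode the assumption that the benefit factors have no direct effects on USE, and the zero second element of ζ reflects the absence of an error term for Probit[P(USE)].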

Fig. 11.1 Graphical Illustration of a Specific Structural Equation Model

If the constructs in one's theory are latent variables, they have to be linked to observed measures. Except for USE, each of the other six constructs is measured by three observed (manifest) variables or indicators, which are shown as rectangles (or squares). The letters x and y are sometimes used to refer to the indicators of exogenous and endogenous latent constructs, respectively, but more descriptive names are used in the present case. The model assumes that a respondent's observed score on a given variable is a function of the underlying latent variable of theoretical interest; this is called a reflective indicator model, and the corresponding indicators are sometimes called effect indicators. Since observed variables are fallible, there is also a unique component of variation, which is frequently (and somewhat inaccurately) equated with random error variance. The errors are usually denoted by δ (delta) for indicators of exogenous latent variables and ε (epsilon) for indicators of endogenous latent variables; the corresponding variances are $\theta^{\delta}$ and $\theta^{\varepsilon}$ (theta), respectively (these are not shown explicitly in Fig. 11.1). The strength of the relationship between an indicator and its underlying latent variable (construct, factor) is called a factor loading and is usually denoted by λ (lambda). The observed USE measure is a 0/1 variable in the present case (self-scanning was or was not used during the particular shopping trip in question), and one may assume that the observed variable is a crude (binary) measure of an underlying latent variable representing a consumer's propensity to use self-scanning. This requires the estimation of a threshold parameter.

Of course, the model depicted in Fig. 11.1 can also be specified algebraically, as shown in Table 11.1. In Table 11.1, it is assumed that all relationships between variables are linear. This is not explicitly expressed in Fig. 11.1 (which could be interpreted as a nonparametric structural equation model), but relationships between variables are usually assumed to be linear (especially when the model is estimated with commonly used SEM programs), unless a distribution other than the normal distribution is specified for a variable.

Latent variable model:

$$
\begin{aligned}
\mathrm{ATT} &= \gamma_1\,\mathrm{PU} + \gamma_2\,\mathrm{PEU} + \gamma_3\,\mathrm{REL} + \gamma_4\,\mathrm{FUN} + \gamma_5\,\mathrm{NEW} + \zeta_1\\
\operatorname{Probit}[P(\mathrm{USE})] &= \beta\,\mathrm{ATT}\\
\operatorname{Var}(\zeta_1) &= \psi_{11}\\
\operatorname{Var}(\xi_i) &= \phi_{ii}\\
\operatorname{Cov}(\xi_i, \xi_j) &= \phi_{ij}
\end{aligned}
$$

Measurement model:

$$
\begin{aligned}
\mathrm{PU1} &= \lambda^x_{1}\,\mathrm{PU} + \delta_{1}, & \mathrm{PU2} &= \lambda^x_{2}\,\mathrm{PU} + \delta_{2}, & \mathrm{PU3} &= \lambda^x_{3}\,\mathrm{PU} + \delta_{3}\\
\mathrm{PEU1} &= \lambda^x_{4}\,\mathrm{PEU} + \delta_{4}, & \mathrm{PEU2} &= \lambda^x_{5}\,\mathrm{PEU} + \delta_{5}, & \mathrm{PEU3} &= \lambda^x_{6}\,\mathrm{PEU} + \delta_{6}\\
\mathrm{REL1} &= \lambda^x_{7}\,\mathrm{REL} + \delta_{7}, & \mathrm{REL2} &= \lambda^x_{8}\,\mathrm{REL} + \delta_{8}, & \mathrm{REL3} &= \lambda^x_{9}\,\mathrm{REL} + \delta_{9}\\
\mathrm{FUN1} &= \lambda^x_{10}\,\mathrm{FUN} + \delta_{10}, & \mathrm{FUN2} &= \lambda^x_{11}\,\mathrm{FUN} + \delta_{11}, & \mathrm{FUN3} &= \lambda^x_{12}\,\mathrm{FUN} + \delta_{12}\\
\mathrm{NEW1} &= \lambda^x_{13}\,\mathrm{NEW} + \delta_{13}, & \mathrm{NEW2} &= \lambda^x_{14}\,\mathrm{NEW} + \delta_{14}, & \mathrm{NEW3} &= \lambda^x_{15}\,\mathrm{NEW} + \delta_{15}\\
\mathrm{ATT1} &= \lambda^y_{1}\,\mathrm{ATT} + \varepsilon_{1}, & \mathrm{ATT2} &= \lambda^y_{2}\,\mathrm{ATT} + \varepsilon_{2}, & \mathrm{ATT3} &= \lambda^y_{3}\,\mathrm{ATT} + \varepsilon_{3}\\
\mathrm{USE} &= 1 \text{ if } \operatorname{Probit}[P(\mathrm{USE})] > \nu \text{ and } \mathrm{USE} = 0 \text{ otherwise, where } \nu \text{ is a threshold parameter}\\
\operatorname{Var}(\delta_i) &= \theta^{\delta}_{ii}, & \operatorname{Var}(\varepsilon_i) &= \theta^{\varepsilon}_{ii}
\end{aligned}
$$

Table 11.1 Algebraic specification of the model in Fig. 11.1

Note that the model in Fig. 11.1 and Table 11.1 is specified for observed variables that have been mean-centered. In this case, latent variable means and equation intercepts can be ignored. Although the means can be estimated, they usually do not provide important additional information. However, in multi-sample analysis, to be discussed below, means may be relevant (e.g., one may want to compare means across samples) and are often modeled explicitly.

The hypothesized model shown in Fig. 11.1 contains six relationships between constructs that are specified to be nonzero (i.e., the effects of the five belief factors on attitude and the effect of attitude on USE). However, one could argue that the relationships that are assumed to be zero are even more important, because these restrictions allow the researcher to test the plausibility of the specified model. The model in Fig. 11.1 contains several restrictions. First, it is hypothesized that, controlling for attitude, there are no direct effects of the five benefit factors on the use of self-scanning. Technically speaking, the model assumes that the effects of benefit beliefs on the use of self-scanning are mediated by consumers' attitudes (see Chap. 8). Second, the errors in equations are hypothesized to be uncorrelated. This means that there are no influences on attitude and use that are common to both constructs other than those contained in the model. Third, each observed variable is allowed to load only on its assumed underlying factor; non-target loadings are specified to be zero. Fourth, the model assumes that all errors of measurement are uncorrelated. Models in which at least some of the error correlations are nonzero could be entertained. Testing the model on empirical data will show whether these assumptions are justified.

Before a model can be estimated or tested, it is important to ascertain that the specified model is actually identified. Identification refers to whether or not a unique solution for the model parameters exists. A distinctive aspect of structural equation models is that many variables in the model are unobserved. For example, in the measurement equations for the observed variables, all the right-hand side variables are unobserved (see Table 11.1). A first requirement for identification is that the scale in which the latent variables are measured be fixed. This can be done by setting one factor loading per latent variable to one or by standardizing the factor variance to one. In Fig. 11.1, one loading per factor was fixed at one. A second requirement is that the number of model parameters (i.e., the number of parameters to be estimated) not be greater than the number of unique elements in the variance-covariance matrix of the observed measures. The number of unique variances and covariances is p(p+1)/2, where p is the number of observed variables (19 in the present case), and p(p+1)/2 − r, where r is the number of model parameters, is the degrees of freedom of the model. This requirement therefore says that the number of degrees of freedom must be nonnegative. This is a necessary condition for model identification, but it is not sufficient. If the model in Fig. 11.1 did not contain a categorical variable, the number of estimated parameters would be 53, and the model would have 137 degrees of freedom (19 × 20/2 = 190 unique variances and covariances minus 53 parameters). Because of the presence of the 0/1 USE variable, the situation is more complex, but the model still has 137 degrees of freedom. Thus, the necessary condition for identification is satisfied. There are no identification rules that are both necessary and sufficient and that can be applied to any type of model. This makes determining model identification a nontrivial task, at least for certain models. However, simple identification rules are available for commonly encountered models, and some of these will be described later in the chapter.

Estimation of Structural Equation Models

Estimation means finding values of the model parameters such that the discrepancy between the sample variance-covariance matrix of the observed variables (S) and the variance-covariance matrix implied by the estimated model parameters ($\hat{\Sigma}$) is minimized. Although several estimation procedures are available (e.g., unweighted least squares, weighted least squares), maximum likelihood (ML) estimation based on the assumption of multivariate normality of the observed variables is often the default method of choice (see Sect. 6.4, Vol. I). ML estimation assumes that the observations are independently and identically multivariate-normally distributed and that the sample size is large (but see Chap. 12 for a discussion of alternative methods that relax these assumptions). The researcher has to ensure that these assumptions are not too grossly violated (e.g., that the skewness and kurtosis of the observed variables, both individually and jointly, are not excessive). Although all estimation methods are iterative procedures, convergence is usually not a problem unless the model is severely misspecified or very complex. Small sample sizes and a very small number of indicators per factor may also create problems. If ML estimation is appropriate, the resulting estimates have a variety of desirable properties, such as consistency, asymptotic (large-sample) efficiency, and asymptotic normality. Missing values are easily accommodated as long as the missing-data mechanism is completely random or random conditional on the observed data.

Unfortunately, data are often not multivariate normally distributed, and this is problematic if non-normal variables serve as dependent variables (or outcomes) in the analysis (i.e., if they appear on the left-hand side of any model equation). For example, the distribution of the observed variables may not be symmetric, or the distribution may be too flat or too peaked. If categorical or other types of variables are used in the analysis, normality is also violated. Although estimation procedures are available that do not depend on the assumption of multivariate normality (often called asymptotically distribution-free methods), they are frequently impractical because they require very large samples in order to work well. A more promising approach seems to be the use of various robust estimators that apply corrections to the usual test statistic of overall model fit and to the estimated standard errors. An example is the Satorra-Bentler scaled chi-square test statistic and the corresponding robust standard errors. Other correction procedures, which can also be used to correct for non-independence of observations, are available as well.

As mentioned earlier, a major advantage of SEM is that measurement error in the observed variables is explicitly accounted for. It is well known that if observed variables that are measured with error are correlated, the resulting correlations are attenuated (i.e., lower than they should be). When multiple items are available to measure a construct of interest, averaging several fallible indicators takes unreliability into account to some extent, and the correlation between such averages will be partly purged of the distorting influence of measurement error, but the correction is usually insufficient. SEM automatically controls for the presence of measurement error, and the correlations between the latent variables (factor correlations) are corrected for attenuation.

Testing Structural Equation Models

Testing the Overall Fit of Structural Equation Models

The fit of a specified model to empirical data can be tested with a chi-square test, which examines whether the null hypothesis of perfect fit is tenable. In principle, this is an attractive test of the overall fit of the model, but in practice there are two problems. First, the test is based on strong assumptions, which are often not met in real data (although, as explained earlier, robust versions of the test are available). Second, the test requires a large sample size, yet as the sample size increases, it becomes more likely that (possibly minor and practically unimportant) misspecifications will lead to the rejection of a hypothesized model. Because of these shortcomings of the chi-square test of overall model fit, many alternative fit indices have been proposed. Although researchers' reliance on these fit indices is somewhat controversial (model evaluation is based on mere rules of thumb, and some authors argue that researchers dismiss a significant chi-square test too easily), several alternative fit indices are often reported in practice. Definitions, brief explanations, important characteristics, and commonly used cutoffs for assessing model fit are summarized in Table 11.2.
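To make the chi-square statistic less abstract, here is a minimal sketch, assuming ML estimation. The ML discrepancy function is F = ln|Σ| + tr(SΣ⁻¹) − ln|S| − p, and the minimum fit function chi-square listed in Table 11.2 is (N − 1) times this function evaluated at the fitted matrix. The function names (f_ml, chi_square_ml, etc.) are hypothetical and do not come from any SEM package. The code only evaluates F for a given S and Σ; it performs none of the iterative parameter estimation that SEM programs carry out. Plain Python lists are used so that it runs without third-party libraries, which makes it suitable for small matrices only.

```python
import math

# Sketch: evaluate the ML discrepancy function for a sample matrix S and a
# model-implied matrix Sigma (both symmetric positive definite, small p).


def cholesky(a):
    """Return lower-triangular L with a = L L^T."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def log_det(a):
    """ln|a| from the diagonal of the Cholesky factor."""
    low = cholesky(a)
    return 2.0 * sum(math.log(low[i][i]) for i in range(len(a)))


def trace_sigma_inv_s(sigma, s):
    """tr(S Sigma^-1) = tr(Sigma^-1 S), solving Sigma x = (column of S) via Cholesky."""
    low = cholesky(sigma)
    n = len(sigma)
    total = 0.0
    for col in range(n):
        b = [s[row][col] for row in range(n)]
        y = [0.0] * n
        for i in range(n):  # forward substitution: L y = b
            y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
        x = [0.0] * n
        for i in reversed(range(n)):  # back substitution: L^T x = y
            x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
        total += x[col]  # diagonal element (col, col) of Sigma^-1 S
    return total


def f_ml(s, sigma):
    """F = ln|Sigma| + tr(S Sigma^-1) - ln|S| - p; zero when Sigma equals S."""
    p = len(s)
    return log_det(sigma) + trace_sigma_inv_s(sigma, s) - log_det(s) - p


def chi_square_ml(s, sigma, n):
    """(N - 1) * F, the chi-square statistic if sigma is the ML-fitted matrix."""
    return (n - 1) * f_ml(s, sigma)


# Hypothetical data: two variables, and a model that fixes their covariance at zero.
S = [[2.0, 0.5], [0.5, 1.0]]
Sigma_hat = [[2.0, 0.0], [0.0, 1.0]]
print(f_ml(S, S))  # essentially zero: perfect fit
print(chi_square_ml(S, Sigma_hat, n=200))
```

In the example, Sigma_hat is the ML solution of a hypothetical model that fixes the covariance of the two variables at zero. Its free variances simply reproduce the sample variances, so (N − 1)F is that model's chi-square with 3 − 2 = 1 degree of freedom.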

Minimum fit function chi-square (χ²)
   Definition (1): $(N-1)\,f$
   Characteristics (2): BF, SA, NNO, NP
   Interpretation and use: Tests the hypothesis that the specified model fits perfectly (within the limits of sampling error); the obtained χ² value should be smaller than $\chi^2_{crit}$; note that the minimum fit function χ² is only one possible chi-square statistic and that different discrepancy functions will yield different χ² values.

Root mean square error of approximation (RMSEA)
   Definition (1): $\sqrt{(\chi^2 - df)\,/\,[(N-1)\,df]}$
   Characteristics (2): BF, SA, NNO, P
   Interpretation and use: Estimates how well the fitted model approximates the population covariance matrix per df; Browne and Cudeck (1992) suggest that a value of .05 indicates a close fit and that values up to .08 are reasonable; Hu and Bentler (1999) recommend a cutoff value of .06; a p-value for testing the hypothesis that the discrepancy is smaller than .05 may be calculated (the so-called test of close fit).

Bayesian information criterion (BIC)
   Definition (1): $\chi^2 + r \ln N$ or $\chi^2 - df \ln N$
   Characteristics (2): BF, SA, NNO, P
   Interpretation and use: Based on statistical information theory and used for testing competing (possibly non-nested) models; the model with the smallest BIC is selected.

Root mean squared residual ((S)RMR)
   Definition (1): $\sqrt{2\sum_{i}\sum_{j \le i}(s_{ij} - \hat{\sigma}_{ij})^2 \,/\, [p(p+1)]}$
   Characteristics (2): BF, SA, NO or NNO, NP
   Interpretation and use: Measures the average size of the residuals between the fitted and sample covariance matrices; if a correlation matrix is analyzed, RMR is standardized to fall within the [0, 1] interval (SRMR), otherwise it is only bounded from below; a cutoff of .05 is often used for SRMR; Hu and Bentler (1999) recommend a cutoff value close to .08.

Comparative fit index (CFI)
   Definition (1): $1 - \dfrac{\max(\chi^2_t - df_t,\ 0)}{\max(\chi^2_n - df_n,\ \chi^2_t - df_t,\ 0)}$
   Characteristics (2): GF, IM, NO, NP
   Interpretation and use: Measures the proportionate improvement in fit (defined in terms of noncentrality, i.e., χ² − df) as one moves from the baseline to the target model; originally, values greater than .90 were deemed acceptable, but Hu and Bentler (1999) recommend a cutoff value of .95.

Tucker-Lewis nonnormed fit index (TLI, NNFI)
   Definition (1): $\dfrac{(\chi^2_n - df_n)/df_n - (\chi^2_t - df_t)/df_t}{(\chi^2_n - df_n)/df_n}$
   Characteristics (2): GF, IM, ANO, P
   Interpretation and use: Measures the proportionate improvement in fit (defined in terms of noncentrality) as one moves from the baseline to the target model, per df; originally, values greater than .90 were deemed acceptable, but Hu and Bentler (1999) recommend a stricter cutoff.

Notes: (1) N = sample size; f = minimum of the fitting function; df = degrees of freedom; r = number of parameters estimated; p = number of observed variables; $\chi^2_{crit}$ = critical value of the χ² distribution with the appropriate number of degrees of freedom and for a given significance level; the subscripts n and t refer to the null (or baseline) and target models, respectively. The baseline model is usually the model of complete independence of all observed variables. (2) GF = goodness-of-fit index (i.e., the larger the fit index, the better the fit); BF = badness-of-fit index (i.e., the smaller the fit index, the better the fit); SA = stand-alone fit index (i.e., the model is evaluated in an absolute sense); IM = incremental fit index (i.e., the model is evaluated relative to a baseline model); NO = normed (in the sample) fit index; ANO = normed (in the population) fit index, but only approximately normed in the sample (i.e., can fall outside the [0, 1] interval); NNO = nonnormed fit index; NP = no correction for parsimony; P = correction for parsimony.

Table 11.2 Summary of commonly used overall fit (or lack-of-fit) indices
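The formulas in Table 11.2 translate directly into code. The Python sketch below is illustrative only. The function names are hypothetical, the chi-square values in the example are made up (they are not results from the self-scanning study), and SEM programs differ in small details: some use N rather than N − 1 in the RMSEA, and SRMR can be computed in slightly different ways. Here, residuals are rescaled by the sample standard deviations, and a negative RMSEA numerator is truncated at zero.

```python
import math

# Sketch of the Table 11.2 formulas. Inputs are chi-square values and df that an
# SEM program would report; the example numbers below are made up.


def rmsea(chi2, df, n):
    """sqrt((chi2 - df) / ((N - 1) df)), with a negative numerator truncated at 0."""
    return math.sqrt(max(chi2 - df, 0.0) / ((n - 1) * df))


def cfi(chi2_t, df_t, chi2_n, df_n):
    """1 - max(chi2_t - df_t, 0) / max(chi2_n - df_n, chi2_t - df_t, 0)."""
    denominator = max(chi2_n - df_n, chi2_t - df_t, 0.0)
    if denominator == 0.0:
        return 1.0  # no noncentrality in either model
    return 1.0 - max(chi2_t - df_t, 0.0) / denominator


def tli(chi2_t, df_t, chi2_n, df_n):
    """Relative reduction in noncentrality per df; can fall outside [0, 1]."""
    baseline = (chi2_n - df_n) / df_n
    target = (chi2_t - df_t) / df_t
    return (baseline - target) / baseline


def bic(chi2, r, n):
    """The chi2 + r ln N form; compare competing models fitted to the same data."""
    return chi2 + r * math.log(n)


def srmr(s, sigma):
    """Root mean square of residuals s_ij - sigma_ij rescaled to a correlation metric."""
    p = len(s)
    total = 0.0
    for i in range(p):
        for j in range(i + 1):  # the p(p+1)/2 unique elements, diagonal included
            residual = (s[i][j] - sigma[i][j]) / math.sqrt(s[i][i] * s[j][j])
            total += residual ** 2
    return math.sqrt(2.0 * total / (p * (p + 1)))


# Hypothetical values: with p = 19 observed variables the independence (baseline)
# model has 19 * 18 / 2 = 171 df; the target model has 137 df and r = 53 parameters.
chi2_t, df_t, chi2_n, df_n, n = 150.0, 137, 2000.0, 171, 300
print(f"RMSEA = {rmsea(chi2_t, df_t, n):.3f}")
print(f"CFI   = {cfi(chi2_t, df_t, chi2_n, df_n):.3f}")
print(f"TLI   = {tli(chi2_t, df_t, chi2_n, df_n):.3f}")
print(f"BIC   = {bic(chi2_t, 53, n):.1f}")
```

Because the indices respond differently to sample size, model complexity, and the baseline model, computing several of them side by side is one way to see the disagreements among fit indices that can arise in practice.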

We offer the following guidelines to researchers assessing the overall fit of a model. First, a significant chi-square statistic should not be ignored because of the presumed weaknesses of the test; after all, a significant chi-square value does show that the model is inconsistent with the data. Close inspection of the hypothesized model is necessary to determine whether or not the discrepancies identified by the chi-square test are serious (even if some of the alternative fit indices suggest that the fit of the model is reasonable). Second, different fit indices suggest different conclusions surprisingly often (e.g., the CFI may indicate a good fit, whereas the RMSEA is problematic). In these cases, particular care is required in interpreting the model results. Third, a hypothesized model may be problematic even when the overall fit indices are favorable (e.g., if estimated error variances are negative or path coefficients have the wrong sign). Fourth, a well-fitting model is not necessarily the true model; there may be other models that fit equally or nearly equally well. In summary, overall fit indices seem to be most helpful in alerting researchers to possible problems with the specified model.

Model Modification

If a model is found to be deficient, it should be respecified. Two tools are useful in this regard. First, a researcher can inspect the residuals, which express the difference between a sample variance or covariance and the variance or covariance implied by the hypothesized model. So-called standardized residuals are most helpful, because they correct both for differences in the metrics in which different observed variables are measured and for sampling fluctuation. A standardized residual can be interpreted as a z-value for testing whether the residual is significantly different from zero.
For example, if there is a large positive standardized residual between two variables, it means that the specified model cannot fully account for the covariation between the two variables; a respecification that increases the implied covariance is called for.

Second, a researcher can study the modification indices for the specified model. A modification index is essentially a Lagrange multiplier test of whether a certain model restriction is consistent with the data (e.g., whether a certain parameter is actually zero or whether an equality constraint holds). If a modification index (MI) is larger than 3.84 (i.e., 1.96 squared), the revised model in which the parameter is freely estimated is expected to fit significantly better (at α = .05), and the estimated parameter is likely to be significant. Most computer programs also report an expected parameter change (EPC) statistic, which indicates the likely value of the freely estimated parameter. Modification indices have to be used with care because there is no guarantee that a specification search based on MIs will recover the true model, in part because an added parameter may simply model an idiosyncratic characteristic of the data set at hand. For this reason, it is best to validate data-driven model modifications on a new data set. Often, quite a few MIs will be significant, and it may not be obvious which parameters to add first (the final model is likely to depend on the sequence in which parameters are added). Finally, model modification should not be based on statistical considerations alone; strong reliance on prior theory and conceptual understanding of the context at hand is the best guide to meaningful model modifications.

Assessing the Local Fit of Structural Equation Models

Often, researchers will iterate between examining the overall fit of the model, inspecting residuals and modification indices, and looking at some of the details of the specified model. However, once the researcher is comfortable with the final model, this model has to be interpreted in detail. Usually, this will involve the following. First, all model parameters are checked for consistency with expectations, and significance tests are conducted at least for the parameters that are of substantive interest. Second, depending on the model, certain other analyses will be conducted. For example, a researcher will usually want to report evidence about the reliability and convergent validity of the observed measures, as well as the discriminant validity of the constructs. Third, for models containing endogenous latent variables, the amount of variance in each endogenous variable explained by its antecedent latent variables should be reported. Finally, for some models one may want to conduct particular model comparisons. For example, if a model contains three layers of relationships, one may wish to examine to what extent the variables in the middle layer mediate (channel) the relationships between the variables in the first and third layers. Or, if a multi-sample analysis is performed, one may wish to test the invariance of particular paths across different groups. More details about local fit assessment will be provided below.

Confirmatory Measurement Models

Congeneric Measurement Models

Conceptual variables frequently cannot be measured directly, and sets of individually imperfect observed variables are used as proxies for the underlying constructs of interest. SEM is very useful for ascertaining the quality of construct measurement because it enables a detailed assessment of the reliability and validity of measurement instruments, as described below. The analysis usually starts with a congeneric measurement model in which (continuous) observed variables are seen as effects of underlying constructs (i.e., the measurement model is reflective), each observed variable loads on one and only one factor (if the model contains multiple factors), the common factors are correlated, and the unique factors are assumed to be uncorrelated. These assumptions are not always realistic, but the model can be modified if the original model is too restrictive. An illustrative congeneric measurement model corresponding to the antecedent constructs in the model in Fig. 11.1 is shown in the following figure.

Figure: An Illustrative Congeneric Measurement Model
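For a congeneric model such as this one, the covariance matrix implied by the parameters takes the standard form Σ = ΛΦΛᵀ + Θ. Λ holds the loadings (zero for non-target loadings), Φ holds the factor variances and covariances, and Θ holds the unique variances (diagonal, because the unique factors are uncorrelated). Counting the free parameters in these matrices and subtracting them from p(p + 1)/2 gives the degrees of freedom used in the necessary identification condition discussed earlier. Below is a minimal pure-Python sketch with made-up values and hypothetical function names, assuming one loading per factor is fixed at 1 (as in Fig. 11.1) and all factors are allowed to covary.

```python
# Sketch: model-implied covariance matrix of a congeneric measurement model and
# the degrees-of-freedom count. All names and numbers are illustrative.


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def implied_covariance(lam, phi, theta):
    """Sigma = Lambda Phi Lambda^T + Theta, with Theta diagonal.

    lam: p x m loadings (zeros for non-target loadings); phi: m x m factor
    covariance matrix; theta: list of p unique variances.
    """
    sigma = matmul(matmul(lam, phi), transpose(lam))
    for i, t in enumerate(theta):
        sigma[i][i] += t
    return sigma


def congeneric_parameter_count(indicators_per_factor):
    """Free parameters with one loading per factor fixed at 1 and correlated factors."""
    m = len(indicators_per_factor)
    free_loadings = sum(k - 1 for k in indicators_per_factor)
    unique_variances = sum(indicators_per_factor)
    factor_variances_and_covariances = m * (m + 1) // 2
    return free_loadings + unique_variances + factor_variances_and_covariances


def degrees_of_freedom(p, r):
    """p(p+1)/2 unique variances and covariances minus r free parameters."""
    return p * (p + 1) // 2 - r


# Two factors with three indicators each; the first loading of each factor is 1.
lam = [[1.0, 0.0], [0.8, 0.0], [0.9, 0.0],
       [0.0, 1.0], [0.0, 0.7], [0.0, 0.6]]
phi = [[1.0, 0.4], [0.4, 0.8]]
theta = [0.5, 0.4, 0.3, 0.5, 0.6, 0.7]
sigma = implied_covariance(lam, phi, theta)

r = congeneric_parameter_count([3, 3])  # 4 loadings + 6 unique variances + 3 (co)variances
print(r, degrees_of_freedom(6, r))      # 13 free parameters, 21 - 13 = 8 df
```

With these helpers, degrees_of_freedom(19, 53) returns 137, matching the count given earlier for the model in Fig. 11.1 with USE treated as continuous. A single factor with three indicators yields 0 df (just identified), and a lone factor with two indicators yields −1. A nonnegative count is necessary but not sufficient for identification.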

Although reflective measurement models are reasonable in many situations, researchers should carefully evaluate whether observed measures can be assumed to be the effects of underlying latent variables. Sometimes, constructs are better thought of as being caused by their indicators (so-called formative measurement models). For example, satisfaction with a product probably does not lead to satisfaction with particular aspects of the product, but is a function of satisfaction with these aspects. Chapter 12 discusses formative measurement models in more detail (see also Diamantopoulos, Riefler, and Roth 2008).

Several simple identification rules are available for congeneric measurement models (see Bollen 1989). If at least three indicators per factor are available, a congeneric measurement model is identified, even if the factors are uncorrelated. If a factor has only two indicators, the factor has to have at least one nonzero correlation with another factor, or the model has to be constrained further (e.g., the factor loadings have to be set equal). If there is only a single indicator of a given construct, the error variance of this measure has to be set to zero or to another assumed value (e.g., based on the measure's reliability observed in other studies).

Provided that the congeneric measurement model is found to be reasonably consistent with the data, the following measurement issues should be investigated. First, the indicators of a given construct should be substantially related to the target construct, both individually and as a set. In the congeneric measurement model, the observed variance in a measure consists of only two sources: substantive variance (variance due to the underlying construct) and unique variance.
If one assumes that unique variance is equal to random error variance (usually, it is difficult to separate random error variance from other sources of unique variance), convergent validity is the same as reliability, and we will henceforth use the term reliability for simplicity. Individually, an item should load significantly on its target factor, and each item's observed variance should contain a substantial amount of substantive variance. One index, called individual-item reliability (IIR) or individual-item convergent validity (IICV), is defined as the squared correlation between a measure $x_i$ and its underlying construct $\xi_j$ (i.e., the proportion of the total variance in $x_i$ that is substantive variance), which can be computed as follows:

$$\mathrm{IIR}_{x_i} = \frac{\lambda_{ij}^{2}\,\phi_{jj}}{\lambda_{ij}^{2}\,\phi_{jj} + \theta_{ii}} \qquad (11.1)$$

where $\lambda_{ij}$ is the loading of indicator $x_i$ on construct $\xi_j$, $\phi_{jj}$ is the variance of $\xi_j$, and $\theta_{ii}$ is the unique variance in $x_i$. One common rule of thumb is that at least half of the total variance in an indicator should be substantive variance (i.e., IIR ≥ .5). One can also summarize the reliability of all indicators of a given construct by computing the average of the individual-item reliabilities. This is usually called average variance extracted (AVE), that is,

$$\mathrm{AVE}_{\xi_j} = \frac{\sum_{i} \mathrm{IIR}_{x_i}}{K} \qquad (11.2)$$

where $K$ is the number of indicators ($x_i$) for the construct in question ($\xi_j$). Similar to IIR, a common rule of thumb is that AVE should be at least .5. As a set, all measures of a given construct combined should be strongly related to the underlying construct. One common index is composite reliability (CR), which is defined as the squared correlation between an unweighted sum (or average) of the measures of a construct and the construct itself. CR is a generalization of coefficient alpha to a situation in which items can have different loadings on the underlying factor, and it can be computed as follows:

$$\mathrm{CR}_{\xi_j} = \frac{\left(\sum_{i} \lambda_{ij}\right)^{2} \phi_{jj}}{\left(\sum_{i} \lambda_{ij}\right)^{2} \phi_{jj} + \sum_{i} \theta_{ii}} \qquad (11.3)$$

CR should be at least .7 and preferably higher. Second, indicators should be primarily related to their underlying construct and not to other constructs. In a congeneric model, loadings on non-target factors are set to zero a priori, but the researcher has to evaluate whether this assumption is justified by looking at the relevant modification indices and expected parameter changes. This criterion can be thought of as an assessment of discriminant validity at the item level. Third, the constructs themselves should not be too highly correlated if they are to be distinct. This is called discriminant validity at the construct level. One way to test discriminant validity is to construct a confidence interval around each construct correlation and check whether the confidence interval includes one. However, this is a weak criterion of discriminant validity because with a large sample and precise estimates of the factor correlations, the factor correlations will usually be distinct from one, even if the correlations are quite high. A stronger test of discriminant validity is the criterion proposed by Fornell and Larcker (1981). This criterion says that each squared factor correlation should be smaller than the AVE for the two constructs involved in the correlation. Intuitively, this rule means that a construct should be more strongly related to its own indicators than to another construct from which it is supposedly distinct. Up to this point, the assumption has been that individual items are used as indicators of each latent variable. In principle, having more indicators to measure a latent variable is beneficial, but in practice a large number of indicators may not be practical (e.g., too many parameters have to be estimated, the required sample size becomes prohibitive, and it will be difficult to obtain a well-fitting model). Sometimes, researchers combine individual items into parcels and use the sum or average score of the items in the parcel as an indicator.
Such a strategy may be unavoidable when the number of items in a scale is rather large (e.g., a personality scale may consist of 20 or more items) and has certain advantages (e.g., parceling may be used strategically to correct for lack of normality), but parceling has to be used with care (e.g., the items in a parcel should be unidimensional). Particularly when the factor structure of a set of items is not well understood, item parceling is not recommended. An alternative to item parceling is to average all the measures of a given construct, fix the loading of this average on the construct to one, and set its error variance to (1 − α) times the variance of the average of the observed measures, where α is an estimate of the reliability of the composite of observed measures (such as coefficient α). However, the same caution as for item parceling applies here as well.

More Complex Measurement Models

The congeneric measurement model makes strong assumptions about the factor loading matrix and the covariance matrix of the unique factors: each indicator loads on a single substantive factor, and the unique factors are uncorrelated. It is possible to relax the assumption that the loadings of observed measures on nontarget factors are zero. In Exploratory Structural Equation Modeling (ESEM), the congeneric confirmatory factor model is replaced with an exploratory factor model in which the number of factors is determined a priori and the initial factor solution is rotated using target rotation (Marsh et al. 2014). The fit of the congeneric factor model can be compared to the fit of an exploratory structural equation model using a chi-square difference test (based on the difference between the two chi-square values and the difference in the degrees of freedom of the two models). Ideally, the restrictions in the congeneric factor model will not decrease the fit substantially, although frequently the fit does get worse. An alternative method for modeling a more flexible factor pattern is based on Bayesian Structural Equation Modeling (BSEM) (Muthén and Asparouhov 2012). In this approach, informative priors with a small variance are specified for the cross-loadings (e.g., a normal prior with a mean of zero and a variance of .01 for the standardized loadings, which implies that about 95 percent of the prior probability lies between −.2 and +.2) (see Chap. 16 on Bayesian Analysis). Although both methods tend to improve the fit of specified models and may avoid distortions of the factor solution when the congeneric measurement model is clearly inconsistent with the data, the two approaches abandon the ideal that an indicator should be related to only a single construct, which creates problems with the interpretation of the hypothesized factors.

The assumption that the substantive factors specified in the congeneric measurement model are the only sources of covariation between observed measures is also limiting. Frequently, there will be significant modification indices suggesting that the covariation between certain unique factors should be freely estimated. However, there have to be plausible conceptual reasons for introducing correlated errors, because otherwise the resulting respecification of the model will come across as ad hoc. As an example of a theoretically justified model modification, consider a situation in which some of the indicators are reverse-keyed. There is extensive evidence that in this case items keyed in the same direction tend to be more highly correlated than items keyed in opposite directions, so specifying sources of covariation besides substantive overlap seems reasonable (see Weijters, Baumgartner, and Schillewaert 2013). Method effects have been modeled in two ways. The first approach is generally referred to as the correlated uniqueness model (Marsh 1989). This method consists of allowing correlations among certain error terms, but instead of introducing the error correlations in an ad hoc fashion, they are motivated by a priori hypotheses. For example, correlated uniquenesses might be specified for all items that share the same keying direction (i.e., the reversed items, the regular items, or both). The second approach involves specifying method factors for the hypothesized method effects.
Sometimes, a global method factor is posited to underlie all items in one's model, but this is only meaningful under special circumstances (e.g., when both regular and reversed items are available to measure a construct or several constructs; see Weijters et al. 2013), because otherwise method variance will be confounded with substantive variance. More likely, a method factor will be specified for subsets of items that share a common method (e.g., reversed items). Of course, it is possible to model multiple method factors if several sources of method bias are thought to be present.

Multi-sample Measurement Models

Sometimes, researchers want to conduct a measurement analysis across different populations of respondents. This is particularly useful in cross-cultural research, where certain conditions of measurement invariance have to be satisfied before meaningful comparisons across cultures can be performed. Multi-sample measurement models are also useful for comparing factor means across groups, which requires incorporating the means of the observed variables into the analysis. Three types of measurement invariance are particularly important. First, at the most basic level, the same factor model has to hold in each population if constructs are to be compared across groups. This is sometimes called configural invariance. Second, one can test whether the factor loadings of corresponding items are the same across groups. This is referred to as metric invariance. Third, if the means of the variables are included in the model (which is important when the means of constructs are to be compared across groups), one can test whether the intercept of the regression of each observed variable on the underlying factor is the same in each group. This is called scalar invariance in the literature. As discussed by Steenkamp and Baumgartner (1998), if a researcher wants to investigate the strength of directional relationships between constructs across groups, metric invariance (equality of factor loadings) has to hold, and if latent construct means are to be compared across groups, scalar invariance (invariance of measurement intercepts) has to hold as well. It is not necessary that all loadings or all measurement intercepts be invariant across groups, but at least two indicators per factor have to exhibit metric and scalar invariance. For details, the interested reader is referred to Steenkamp and Baumgartner (1998).

Multi-sample measurement analysis may be thought of as an instance of population heterogeneity in which the heterogeneity is known. Multi-sample models are most useful when the number of distinct groups is small to moderate, and in this situation such fixed-effects models are a straightforward approach to testing for moderator effects. As the number of groups gets large, a random-effects specification may be more useful, and if the moderator is continuous, a model with interaction effects is preferable (i.e., continuous moderators should not be discretized). It is also possible to estimate models in which the heterogeneity is unknown and the researcher tries to recover the population heterogeneity from the data. Unknown population heterogeneity is discussed in Chap.

Measurement Models Based on Item Response Theory

Simulation evidence suggests that the assumption of continuous, normally distributed observed variables, while never literally true, is reasonable if the response scale has at least 5 to 7 distinct categories, the response scale category labels were chosen carefully to be equidistant, and the distribution of the data is symmetric. However, there are situations in which these assumptions are difficult to justify, such as when there are only two response options (e.g., yes or no). An attractive approach that explicitly takes into account the discreteness of the data is item response theory (IRT; see Kamata and Bauer 2008). The IRT model can be developed by assuming that the variables that are actually observed are discretized versions of underlying continuous response variables. Therefore, the conventional measurement model has to be extended by specifying how the discretized variable that is actually observed is related to the underlying continuous response variable. In the so-called two-parameter IRT model, the probability that a person will provide a response of 1 on item $i$, given $\xi_j$, is expressed as follows:

$$P(x_i = 1 \mid \xi_j) = F(a_i \xi_j + \nu_i) = F\big(a_i(\xi_j - b_i)\big) \qquad (11.4)$$

where $F$ is either the normal or the logistic cumulative distribution function (so that $b_i = -\nu_i / a_i$). Equation (11.4) specifies a sigmoid relationship between the probability of a response of 1 to an item and the latent construct (referred to as an item characteristic curve); $a_i$ is called the discrimination parameter (which shows the sensitivity of the item to discriminate between respondents with different values of $\xi_j$ around the point of inflection of the sigmoid curve) and $b_i$ the difficulty parameter (i.e., the value of $\xi_j$ at which the probability of a response of 1 is .5). The model is similar to logistic or probit regression, except that the explanatory variable $\xi_j$ is latent rather than observed (Wu and Zumbo 2007). The IRT model for binary data can be extended to ordinal responses. The interested reader is referred to Baumgartner and Weijters (2016) for a recent discussion.

Full Structural Equation Models

A full structural equation model can be thought of as a combination of a confirmatory factor model with a latent variable path model. There is a measurement model for both the exogenous and endogenous latent variables, and the latent variable path model (sometimes called the structural model) specifies the relationships between the constructs in one's model. Since measurement models were discussed previously, this section focuses on the latent variable path model. Two kinds of latent variable models can be distinguished. In recursive models, one cannot trace a series of directed (one-way) paths from a latent variable back to the same latent variable (there are no bidirectional effects or feedback loops), and all errors in equations (equation disturbances) are uncorrelated. In nonrecursive models, at least one of these conditions is violated. Although nonrecursive models can be specified, questions have been raised about the meaningfulness of such models when all the constructs are measured at the same point in time.

When proposing a full structural equation model, it is important to show that the model is identified. The so-called two-step rule, which is a sufficient condition for identification, is often used for this purpose (Bollen 1989). In the first step, one shows that the measurement model corresponding to the structural equation model (in which no structural specification is imposed on the latent variable model and the constructs are allowed to correlate freely) is identified. Identification rules for measurement models have already been discussed. In the second step, the latent variables can be treated as observed (since their variances and covariances were shown to be identified in the first step), and one shows that the remaining model parameters (the relationships between the latent variables and the variances and covariances of the errors in equations) are identified. If there are no directed relationships between the endogenous latent variables (i.e., there are no nonzero β's; see Sect.) or the latent variable model is recursive, the model is identified. If the model is nonrecursive, other identification rules may be applicable (e.g., the rank rule). It is not always easy to show theoretically that a model is identified. Frequently, researchers rely on the computer program used for estimation and testing to alert them to identification problems. A preferable approach may be to start with a model that is known to be identified and to free desired parameters one at a time, provided the modification index for the parameter in question is significant.
If a modification index is significant, the parameter in question is probably identified. If the modification index is zero, the parameter is probably not identified. If the modification index is nonsignificant, the freely estimated parameter is likely to be nonsignificant as well, so there should be little interest in freeing the parameter.

When assessing the overall fit of a model, one should not only assess the model's fit in isolation, but also compare the target model to several other models (Anderson and Gerbing 1988). The overall fit of the target model is a function of the fit of the measurement model and the fit of the latent variable model. On the one hand, a measurement model in which the latent variables are freely correlated provides an upper limit on the fit of the latent variable model, because the latent variable model is saturated. Such a model assesses the fit of the measurement model only, and if the measurement model does not fit adequately, it has to be respecified. On the other hand, a measurement model in which the latent variables are uncorrelated (the so-called model of structural independence) provides a baseline of comparison to evaluate how much the consideration of relationships between the constructs, as hypothesized in the target model, improves the fit of the model. Note that the model of structural independence is only identified if at least three indicators are available for each latent variable (unless one of the constructs is assumed to be measured perfectly by a single indicator or a certain amount of reliability is assumed). Ideally, the target model should fit much better than the baseline model of structural independence, and as well as (or nearly as well as) the saturated structural model, even though fewer relationships among the latent variables are estimated. It should be noted that the issue of whether the specified model is able to account for the covariances between observed variables (covariance fit) is distinct from the issue of whether the specified model can account for the variation in each endogenous latent variable (variance fit).
For example, it is possible that a model fits very well overall, but only a very small portion of the variance in the endogenous constructs is explained. Thus, it is necessary to provide evidence about the explained variance in each endogenous latent variable.
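The comparisons described above (the target model against the saturated measurement model and against the model of structural independence), like the ESEM versus congeneric comparison mentioned earlier, are typically carried out with a chi-square difference test when the models are nested and estimated by normal-theory ML. The sketch below is a minimal standard-library Python illustration; the function names are hypothetical, the p-value routine assumes integer degrees of freedom, and it is not valid as-is for robust or WLSMV estimators, which require a scaled difference test.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with integer df."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 2 (even case) or df = 1 (odd case) ...
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    # ... then step up two df at a time using
    # Q(x, k + 2) = Q(x, k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q

def chi2_difference_test(chi2_restricted, df_restricted, chi2_free, df_free):
    """Chi-square difference test for two nested models fitted by normal-theory ML.

    The restricted model (e.g., structural independence) has more degrees of
    freedom than the less restricted one (e.g., the saturated measurement model).
    """
    d_chi2 = chi2_restricted - chi2_free
    d_df = df_restricted - df_free
    if d_df <= 0:
        raise ValueError("the restricted model must have more degrees of freedom")
    return d_chi2, d_df, chi2_sf(d_chi2, d_df)

print(round(chi2_sf(6.77, 7), 2))
```

For instance, the upper-tail probability of 6.77 on 7 degrees of freedom is about .45, the p-value reported for the test of invariant structural coefficients in the empirical example (where the difference statistic itself came from Mplus's DIFFTEST procedure rather than a simple subtraction).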

If a multi-stage latent variable model is specified (e.g., A → B → C), it is often of interest to test whether the effect of the antecedent (e.g., A) on the outcome (e.g., C) is completely mediated (i.e., there is no direct effect of A on C) or at least partially mediated by the intervening variables (i.e., at least some of the total effect of A on C goes through B), and how strong the mediated effect is. Most computer programs provide estimates and statistical tests of direct, indirect, and total effects. Research has shown that normal-theory tests of indirect effects are not always trustworthy, and alternatives based on bootstrapping are available.

SEM was initially developed for models containing only linear relationships. For example, LISREL, the first commonly used program for SEM, stands for Linear Structural Relations (Jöreskog and Sörbom 2006). However, the model has been extended to accommodate nonlinear effects of latent variables, particularly interaction effects. Several different approaches are available; interested readers are referred to Marsh et al. The approach implemented in Mplus, based on the method proposed by Klein and Moosbrugger (2000), is very easy to use and has been shown to perform well in simulations.

Researchers are often interested in comparing structural paths across different populations. For example, it may be of interest to assess whether the effects of the perceived benefits of self-scanning on attitudes toward self-scanning, or the effect of attitude on the use of self-scanning, are invariant across gender. In order for such comparisons to be meaningful, the measurement model has to exhibit metric invariance across the populations to be compared. In other words, the factor loadings of corresponding items have to be the same across groups. Although full metric invariance is not required, at least two items per construct must have invariant loadings.
Since one loading per factor is fixed at one to set the scale of each factor, this implies that at least one additional loading has to be invariant. If at least two indicators per factor are constrained to be invariant, the modification indices on the loadings of these items will show whether the constraints are satisfied. Provided that a sufficient number of items per factor is invariant (i.e., at least two), the structural paths of interest can be compared across samples using a chi-square difference test.

Empirical Example

Introduction

As an empirical example, we analyze data that were collected from shoppers in stores of a grocery retail chain in Western Europe to study the determinants of consumers' use of self-scanning technology (SST). The self-scanners were hand-held devices that were made available on a shelf at the entrance of the store. Customers choosing the self-scanning option used the device throughout their shopping trip to scan the barcodes on all items they selected from the shelves. At check-out, self-scanner users then proceeded to separate fast lanes. Different teams of research associates simultaneously collected the data in six stores of the grocery retailer over the course of three days. Data collection consisted of two stages. In the first stage, research associates approached shoppers upon entering the store and, if shoppers agreed to participate, administered a questionnaire with closed-ended questions. The entry survey contained filter questions to screen out people who were unaware of self-scanning devices and to restrict the sample to customers with a loyalty card, given the retailer's policy of offering self-scanning devices only to loyal customers. The main questionnaire consisted of a series of items measuring attitudes toward SST as well as the perceived attributes of SST and some demographic background variables, including gender. The items are reported in Table 11.3. In the second stage of data collection, after customers had done their shopping and had checked out their purchases, respondents' use or non-use of self-scanning was recorded by matching unique codes provided to respondents in the entry and exit data.

Perceived usefulness (PU)
PU1 Self-scanning will allow me to shop faster.
PU2 Self-scanning will make me more efficient while shopping.
PU3 Self-scanning reduces the waiting time at the cash register.
Perceived ease of use (PEU)
PEU1 Self-scanning will be effortless.
PEU2 Self-scanning will be easy.
PEU3 Self-scanning will be user-friendly.
Reliability (REL)
REL1 Self-scanning will be reliable.
REL2 I expect self-scanning to work well.
REL3 Self-scanning will have a faultless result.
Fun (FUN)
FUN1 Self-scanning will be entertaining.
FUN2 Self-scanning will be fun.
FUN3 Self-scanning will be enjoyable.
Newness (NEW)
NEW1 Self-scanning is outmoded - Self-scanning is progressive
NEW2 Self-scanning is old - Self-scanning is new
NEW3 Self-scanning is obsolete - Self-scanning is innovative
Attitude (ATT)
ATT1 Unfavorable - Favorable
ATT2 I dislike it - I like it
ATT3 Bad - Good
Note: All items were administered using a 5-point rating scale format and the instruction "What is your position on the following statements?", with the exception of the attitude scale, which contained the following question stem: "How would you describe your feelings toward using self-scanning in this store?"
Table 11.3 Questionnaire Items for the Empirical Example

A total of 1492 shoppers were approached for participation in the survey. Of these, 709 agreed to participate. Finally, 497 questionnaires contained complete data for customers who were eligible to participate in the study (i.e., they were aware of self-scanning, were in possession of a loyalty card, had purchased at least one product, and their observed self-scanning use or non-use could be matched with their entry survey data). In this sample, 65% were female and 35% male. Further, 63% had had education after secondary school. As for age, 1% were aged 12-19, 21% 20-29, 21% 30-39, 28% 40-49, 19% 50-59, 7% 60-69, 2% 70-79, and 1% 80 years or older. Finally, 36% used self-scanning during their visit to the store.

Analyses and Results

In what follows, we illustrate the use of SEM on the self-scanning data, roughly following the outline of the preceding exposition. Thus, we start with a CFA of the five belief constructs. Next, we test measurement invariance of this factor structure across men and women (multi-sample measurement). We then move on to full SEM, testing a two-group (men/women) mediation model in which the five belief factors are used as antecedents of self-scanning use, mediated by attitude toward self-scanning use. All analyses were run in Mplus.

Measurement Analysis

Our first aim is to assess the factor structure of the five belief factors (PU, PEU, REL, FUN, and NEW). Note that the factor models are intended as stand-alone examples of a measurement analysis. If a factor analysis were used as a precursor to a full structural equation model, it would be common to also include the endogenous constructs and their indicators in the measurement analysis. We start by running an exploratory factor analysis in which the 15 belief items freely load on five factors, using the default ML estimator with oblique GEOMIN rotation. This model shows acceptable fit to the data: χ²(40) = , p < .001; RMSEA = .048 (90% confidence interval (CI) = [.034, .062]); SRMR = .014; CFI = .989; TLI = .970. Each of the five factors shows loadings for the three target items that are statistically significant (p < .05) and substantial (all loadings were greater than .50, and most were greater than .80). There were also six significant cross-loadings, suggesting that the factor pattern does not have perfect simple structure. However, these six cross-loadings do not seem problematic, as they are small (most are smaller than .10, and none is greater than .20).

We proceed to test a confirmatory factor analysis (CFA) of the five belief factors. Even though the CFA model fits the data significantly worse than the exploratory factor model (the two models are nested and can be compared with a chi-square difference test, Δχ²(40) = , p < .001), the fit of the CFA model is deemed acceptable, especially in terms of the alternative fit indices: χ²(80) = 195.70; RMSEA = .054 (90% CI = [.044, .064]); SRMR = .037; CFI = .972; TLI = .963. Closer inspection of the local fit of the model shows that five modification indices for factor loadings constrained to zero have a value greater than 10; these five modification indices are for non-target loadings identified in the exploratory factor analysis. Although statistically significant, they are not large enough to warrant model modifications, as this would come at the expense of parsimony and replicability. Table 11.4 reports the CFA results for individual items and factors. Overall, the results are satisfactory, with the exception of two items that have problematic IIR values (less than .50). All AVE values are at least .50 and all CR values are larger than .70, in support of convergent validity. Table 11.5 evaluates discriminant and convergent validity by showing the AVEs and correlations for all factors. Discriminant validity is supported, as the squared correlations between constructs are smaller than the AVEs of the constructs involved in each correlation.
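As a check on the degrees of freedom reported above, the sketch below counts the free parameters of the two models for 15 items and five factors: a congeneric CFA (one marker loading per factor fixed at 1, freely correlated factors, uncorrelated unique factors, no structured means) and an ML exploratory factor model. It also computes the RMSEA point estimate as sqrt(max(χ² − df, 0) / (df(N − 1))). The function names and scaling choices are assumptions made for illustration; some programs use N rather than N − 1 in the RMSEA denominator.

```python
import math

def cfa_df(n_items, n_factors):
    """df of a congeneric CFA: each item loads on one factor, one marker loading
    per factor fixed at 1, factors freely correlated, unique factors uncorrelated."""
    moments = n_items * (n_items + 1) // 2             # distinct variances and covariances
    free_loadings = n_items - n_factors                # marker loadings are fixed
    unique_variances = n_items
    factor_moments = n_factors * (n_factors + 1) // 2  # factor variances and covariances
    return moments - (free_loadings + unique_variances + factor_moments)

def efa_df(n_items, n_factors):
    """df of an ML exploratory factor model with n_factors factors."""
    return ((n_items - n_factors) ** 2 - (n_items + n_factors)) // 2

def rmsea(chi2, df, n):
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# 15 belief items, 5 factors, N = 497, as in the example above.
print(cfa_df(15, 5), efa_df(15, 5), cfa_df(15, 5) - efa_df(15, 5))
print(round(rmsea(195.70, 80, 497), 3))
```

The counts reproduce the 80 and 40 degrees of freedom of the CFA and the exploratory model, and the 40 degrees of freedom of their difference test; the RMSEA point estimate rounds to .054 whether N − 1 or N is used.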

Table 11.4 CFA factor structure (columns: standardized factor loading, IIR, AVE, CR; rows: the items of PU, PEU, FUN, REL, and NEW)
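The IIR, AVE, and CR columns of a table like Table 11.4, and the Fornell-Larcker comparison summarized in Table 11.5, follow directly from Eqs. (11.1) to (11.3). A minimal Python sketch, assuming a standardized solution (φ_jj = 1 and θ_ii = 1 − λ_ij²); the loadings and the factor correlation in the usage lines are hypothetical values chosen for illustration, not the estimates from this study.

```python
def iir(loading, phi, theta):
    """Individual-item reliability, Eq. (11.1): share of substantive variance."""
    substantive = loading ** 2 * phi
    return substantive / (substantive + theta)

def ave(loadings, phi, thetas):
    """Average variance extracted, Eq. (11.2): the mean of the item IIRs."""
    return sum(iir(lam, phi, t) for lam, t in zip(loadings, thetas)) / len(loadings)

def composite_reliability(loadings, phi, thetas):
    """Composite reliability, Eq. (11.3)."""
    common = sum(loadings) ** 2 * phi
    return common / (common + sum(thetas))

def fornell_larcker_ok(ave_a, ave_b, corr_ab):
    """Fornell-Larcker criterion: squared factor correlation below both AVEs."""
    return corr_ab ** 2 < min(ave_a, ave_b)

# Hypothetical standardized loadings for two factors (phi = 1).
loadings_a = [0.85, 0.80, 0.60]
loadings_b = [0.90, 0.85, 0.80]
thetas_a = [1 - lam ** 2 for lam in loadings_a]
thetas_b = [1 - lam ** 2 for lam in loadings_b]

ave_a = ave(loadings_a, 1.0, thetas_a)
ave_b = ave(loadings_b, 1.0, thetas_b)
print([round(iir(lam, 1.0, t), 2) for lam, t in zip(loadings_a, thetas_a)])
print(round(ave_a, 3), round(composite_reliability(loadings_a, 1.0, thetas_a), 3))
print(fornell_larcker_ok(ave_a, ave_b, corr_ab=0.55))
```

With these made-up loadings, the third item of the first factor has an IIR of .36, below the .5 rule of thumb, even though the factor's AVE (about .57) and CR (about .80) pass their thresholds; a factor correlation of .55 (squared .30) satisfies the Fornell-Larcker criterion.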

Table 11.5 Factor correlations, composite reliability (CR), and average variance extracted (AVE) (columns: CR, PU, PEU, FUN, REL, NEW; rows: PU, PEU, FUN, REL, NEW)
Note: Values on the diagonal for PU through NEW represent AVE. Below-diagonal values are inter-factor correlations. Above-diagonal values are squared inter-factor correlations. CR refers to composite reliability.

Now that we have established a viable factor model, we can test for measurement invariance between male and female respondents. To this end, we use the same CFA model as before, but additionally specify gender as the grouping variable and run a sequence of three models with constraints corresponding to configural invariance, metric invariance, and scalar invariance. Table 11.6 reports the model fit results.

Table 11.6 Model fit indices for measurement invariance tests (models: configural, metric, scalar; columns: χ², df, Δχ², Δdf, p, CFI, TLI, SRMR, BIC, RMSEA)

The comparison of the metric invariance model with the configural invariance model shows no significant deterioration in fit, so metric invariance can be accepted. Strictly speaking, the χ² difference test of scalar invariance against metric invariance is significant at the .05 level, but there are good reasons to nevertheless accept scalar invariance: the χ² difference is small, and the alternative fit indices (CFI, TLI, SRMR, and RMSEA) do not deteriorate much, particularly those that take model parsimony into account (TLI and RMSEA). The information-theory-based fit index BIC is lowest for the scalar invariance model. Moreover, closer inspection of the results shows that the modification indices are rather small (the highest modification index for an item intercept is 6.42). In sum, it is reasonable to conclude that the five beliefs related to self-scanning are measured equivalently among men and women, both in terms of scale metrics and item intercepts. As a result, we can use the CFA model to compare factor means. To do so, we set the factor means to zero in the male group while freely estimating the factor means in the female group. None of the factor means differs significantly across groups, although two differences come close: the means of PEU (t = , p = .096) and REL (t = , p = .088) are somewhat lower for women than for men.
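Why scalar invariance is needed before factor means are compared can be seen from the mean structure of the measurement model. In the derivation below, τ_ig denotes the intercept of item i in group g, κ_jg the mean of ξ_j in group g, and δ_ig a unique factor with mean zero; this LISREL-style notation is introduced here for illustration only. As in the analysis above, the factor mean of the reference group (g = 1, men) is fixed at zero.

```latex
\begin{aligned}
x_{ig} &= \tau_{ig} + \lambda_{ij,g}\,\xi_{jg} + \delta_{ig} \\
\mu_{ig} = E(x_{ig}) &= \tau_{ig} + \lambda_{ij,g}\,\kappa_{jg} \\
\mu_{i2} - \mu_{i1} &= (\tau_{i2} - \tau_{i1}) + \lambda_{ij,2}\,\kappa_{j2} - \lambda_{ij,1}\,\kappa_{j1} \\
&= \lambda_{ij}\,\kappa_{j2}
  \quad \text{if } \lambda_{ij,1} = \lambda_{ij,2} = \lambda_{ij},\; \tau_{i1} = \tau_{i2},\; \kappa_{j1} = 0
\end{aligned}
```

Under metric and scalar invariance, the observed mean difference on each item is the latent mean difference rescaled by the common loading. If the intercepts differ, the term τ_i2 − τ_i1 is absorbed into the observed mean difference and cannot be separated from κ_j2.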

38 Full Structural Equation Model To illustrate the use of full SEM, we test the model shown in Fig. 1, although we include gender as a grouping variable and test the invariance of structural paths across men and women. In order for comparisons of structural coefficients to be meaningful, we imposed equality of factor loadings across groups. It was already established that the belief items satisfy metric invariance, and additional analyses showed that metric invariance also held for the indicators of attitude. Table 11.7 reports the model fit indices for a partial mediation model in which the five belief factors influence USE (more specifically, the probit of the probability of use of SST) both directly and indirectly via attitude (model A) and a model with full mediation in which there are no direct effects of the five belief factors on USE (model B). Model B shows significantly worse fit than model A. Closer inspection of the results reveals a significant modification index for the direct effect of PEU on USE in the female group. In model C, we therefore release the direct effect of PEU on USE, and the resulting model does not show a deterioration in fit relative to model A. We can conclude that there are no direct effects of four of the belief factors (PU, REL, FUN, and NEW) on USE, but PEU has a direct effect for women. Fig presents the unstandardized path coefficients estimated for model C. Note that the regressions of USE on ATT and on PEU are probit regressions, which means that the path coefficients are interpreted as the increase in the probit index (z-score) of the probability of USE of SST for a unit increase in attitude or PEU (as measured by a five point scale). Although the path coefficients are not identical for men and women, none of the coefficients were significantly different across groups (the chi-square difference test comparing a model with freely estimated coefficients and a model with invariant coefficients was χ²(7) = 6.77, p =.45). 
PU, PEU, and FUN have significant effects on ATT for both men and women, the effect of REL is marginal for women, and the effect of NEW is nonsignificant for both groups. PEU also has a direct effect on USE for women. In a bootstrap analysis based on 1000 bootstrap samples, however, the effect of FUN on ATT for men is only marginal. Based on a Sobel test, the indirect effects of PU, PEU, and FUN are significant for both men and women, and the indirect effect of REL is marginal for women. However, a bootstrap analysis with 1000 bootstrap samples shows that some of these indirect effects are fragile: for men, the indirect effect of PEU is not significant and that of FUN is only marginal, and for women, the indirect effect of REL is nonsignificant. Note that these are naïve indirect effects, not causally defined indirect effects (see Muthén and Asparouhov 2015). Fig. 11.2 also reports the R² values for the endogenous constructs, which range from .55 to .71. In summary, the findings show that perceptions of PU, PEU, and FUN determine consumers' attitude toward self-scanning technology, and that attitude influences actual use of self-scanning. PEU also has a direct effect on USE for women, but overall the structural model is largely invariant across genders.

Table 11.7 Model Fit Indices for Different Models
[Table: rows are models A, B, and C; columns are the exact-fit test (χ², df, p), the χ² difference test (χ², df, p), CFI, TLI, WRMR, and RMSEA with the lower (Lo) and upper (Hi) bounds of its 90% CI.]

Note: Model A = partial mediation (direct and indirect effects); Model B = full mediation (no direct effects); Model C = full mediation with the exception of a direct effect of PEU on USE. The χ² difference tests are based on the DIFFTEST procedure in Mplus because the regular χ² difference test is not appropriate for the estimation procedure used here (WLSMV, owing to the binary USE measure). A probit link is assumed for USE. Lo and Hi refer to the lower and upper bounds of the 90% CI for RMSEA. WRMR is the weighted root mean square residual (for which the fit heuristics listed in Table 11.2 are not applicable).
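The χ² difference tests in this section are computed with the DIFFTEST procedure in Mplus, which the sketch below does not reproduce. Its final step, however, is an ordinary upper-tail χ² probability, and this stdlib-only Python sketch shows that step for the reported test of invariant structural coefficients, χ²(7) = 6.77. It uses the closed-form survival function of the χ² distribution for integer degrees of freedom; `chi2_sf` is a hypothetical helper name, and `scipy.stats.chi2.sf` should give the same values where SciPy is available.

```python
from math import erfc, exp, pi, sqrt


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for X ~ chi-square(df), integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return exp(-half) * total
    # Odd df: erfc(sqrt(x/2))
    #         + sqrt(2x/pi) * exp(-x/2) * sum_{j=1}^{(df-1)/2} x^(j-1) / (1*3*...*(2j-1))
    term, total = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return erfc(sqrt(half)) + sqrt(2.0 * x / pi) * exp(-half) * total


# Reported test of invariant structural coefficients across men and women.
p_value = chi2_sf(6.77, 7)
print(f"chi2(7) = 6.77 -> p = {p_value:.3f}")
```

`chi2_sf(6.77, 7)` evaluates to about 0.453, consistent with the reported p = .45.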

Fig. 11.2 Parameter Estimates for Model C
Note: Unstandardized path coefficients; *** = p < .001; ** = p < .01; * = p < .05; (*) = .05 < p < .10.
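The contrast drawn in this section between Sobel-test and bootstrap results for the indirect effects can be illustrated with a simplified Python sketch. It assumes a single mediator with observed variables, ordinary least squares, and simulated data (true a = 0.4, b = 0.3), so it only mimics the logic of the two tests, not the chapter's latent-variable probit model estimated in Mplus. The Sobel z divides the indirect effect ab by its first-order delta-method standard error, √(b²SE_a² + a²SE_b²), whereas the percentile bootstrap resamples cases 1000 times and takes the 2.5th and 97.5th percentiles of the re-estimated ab. All function names are hypothetical.

```python
import random
import statistics
from math import erfc, sqrt


def centered_cross_sum(u: list[float], v: list[float]) -> float:
    """Sum of products of deviations from the means: sum (u_i - mean u)(v_i - mean v)."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))


def mediation_paths(x: list[float], m: list[float], y: list[float]) -> tuple[float, float, float, float]:
    """OLS estimates (a, se_a, b, se_b): a from M on X, b from Y on M controlling for X."""
    n = len(x)
    sxx, smm, syy = centered_cross_sum(x, x), centered_cross_sum(m, m), centered_cross_sum(y, y)
    sxm, sxy, smy = centered_cross_sum(x, m), centered_cross_sum(x, y), centered_cross_sum(m, y)
    # Path a: simple regression of M on X.
    a = sxm / sxx
    sse_m = max(smm - a * sxm, 0.0)
    se_a = sqrt(sse_m / (n - 2) / sxx)
    # Path b (and direct effect c'): regression of Y on M and X via the normal equations.
    det = smm * sxx - sxm ** 2
    b = (sxx * smy - sxm * sxy) / det
    c_prime = (smm * sxy - sxm * smy) / det
    sse_y = max(syy - b * smy - c_prime * sxy, 0.0)
    se_b = sqrt(sse_y / (n - 3) / (smm - sxm ** 2 / sxx))
    return a, se_a, b, se_b


def sobel_test(a: float, se_a: float, b: float, se_b: float) -> tuple[float, float]:
    """Sobel z for the indirect effect a*b and its two-sided normal p-value."""
    z = a * b / sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    return z, erfc(abs(z) / sqrt(2.0))


def bootstrap_percentile_ci(x, m, y, n_boot: int = 1000, seed: int = 1) -> tuple[float, float]:
    """95% percentile CI for a*b from n_boot case-resampled data sets."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        a, _, b, _ = mediation_paths([x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx])
        estimates.append(a * b)
    cuts = statistics.quantiles(estimates, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]


# Hypothetical simulated data, NOT the self-scanning survey data.
rng = random.Random(42)
x = [rng.gauss(0.0, 1.0) for _ in range(300)]
m = [0.4 * xi + rng.gauss(0.0, 1.0) for xi in x]
y = [0.3 * mi + 0.1 * xi + rng.gauss(0.0, 1.0) for xi, mi in zip(x, m)]

a, se_a, b, se_b = mediation_paths(x, m, y)
z, p = sobel_test(a, se_a, b, se_b)
lo, hi = bootstrap_percentile_ci(x, m, y)
print(f"ab = {a * b:.3f}; Sobel z = {z:.2f}, p = {p:.4f}; 95% bootstrap CI [{lo:.3f}, {hi:.3f}]")
```

Because the sampling distribution of ab is typically skewed, the percentile interval need not be symmetric around the estimate, which is one reason the two procedures can disagree for marginal effects, as they do for the indirect effects of PEU and FUN among men in the results above.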

11.6 Recent applications of SEM and computer programs for SEM

Early reviews of SEM in marketing are provided by Baumgartner and Homburg (1996) and Hulland, Chow, and Lam (1996). An update of the Baumgartner and Homburg review, covering articles published in the major marketing journals until 2007, is available in Martínez-López, Gázquez-Abad, and Sousa (2013). SEM is often used in scale development studies and is particularly useful for examining measurement invariance of instruments in cross-cultural research (see Hult et al. 2008). SEM is also quite common in survey-based managerial research in marketing (see Homburg, Stierl, and Bornemann 2013 for a recent example). The empirical illustrations described in this chapter were estimated using Mplus 7.4 ( Several other programs exist and are commonly used for model
