Introduction to Structural Equation Modelling Answers to Exercises

Introduction to Structural Equation Modelling Answers to Exercises John Fox Applied Statistics With R WU-Wien, May/June 2006 1. 4.1 (a) Recursive y 3 = γ 31 x 1 + γ 32 x 2 + ε 6 y 4 = γ 41 x 1 + γ 42 x 2 + ε 7 y 5 = β 53 y 3 + β 54 y 4 + ε 8 (b) Nonrecursive y 3 = γ 31 x 1 + β 34 y 4 + ε 6 y 4 = γ 42 x 2 + β 43 y 3 + ε 7 y 5 = β 53 y 3 + β 54 y 4 + ε 8 (c) Block recursive y 3 = γ 31 x 1 + β 34 y 4 + ε 6 y 4 = γ 42 x 2 + β 43 y 4 + ε 7 y 5 = γ 51 x 1 + γ 52 x 2 + β 53 y 3 + β 54 y 4 + ε 8 (d) Block recursive y 3 = γ 31 x 1 + γ 32 x 2 + β 34 y 4 + ε 6 y 4 = γ 41 x 1 + γ 42 x 2 + β 43 y 4 + ε 7 y 5 = β 53 y 3 + β 54 y 4 + ε 8 (e) Nonrecursive y 2 = γ 21 x 1 + ε 4 y 3 = γ 31 x 1 + β 32 y 2 + ε 5 (f) Recursive y 2 = γ 21 x 1 + ε 4 y 3 = γ 31 x 1 + β 32 y 2 + ε 5 (g) Nonrecursive y 10 = γ 10,1 x 1 + γ 10,2 x 2 + γ 10,3 x 3 + γ 10,4 x 4 + γ 10,5 x 5 + γ 10,6 x 6 + γ 10,7 x 7 + γ 10,8 x 8 + β 10,11 y 11 + ε 12 y 11 = γ 11,2 x 2 + γ 11,3 x 3 + γ 11,4 x 4 + γ 11,5 x 5 + γ 11,6 x 6 + γ 11,7 x 7 + γ 10,8 x 8 + γ 11,9 x 9 + β 11,10 y 11 + ε 13 1

(j) Recursive y 5 = γ 52 x 2 + γ 53 x 3 + γ 54 x 4 + ε 8 y 6 = γ 61 x 1 + γ 62 x 2 + γ 63 x 3 + β 65 y 5 + ε 9 y 7 = γ 71 x 1 + γ 72 x 2 + γ 73 x 3 + β 76 y 6 + ε 10 4.2 The general causal structure of the model seems reasonable to me. It s always possible in observational data to imagine that the exogenous variables are actually correlated with the error, but the burden should be on a critic to indicate what omitted variables are likely to make this true. The model could be more articulated, treating some of the exogenous variables as endogenous, perhaps in a block-recursvive structure, but this doesn t invalidate the model as specified. I wonder about the distribution of the endogenous variables, about the linearity of effects, and about possible interactions (e.g., of other explanatory variables with race), but without access to the original data, it s impossible to know whether these concerns are valid. 4.5 Here, too, the general causal structure of the model seems reasonable, but the assumption that the errors are uncorrelated does not: I expect that the omitted causes of the endogenous variables are similar. As well, all of the endogenous variables are counts, which are probably positively skewed. Although one can t know without checking the data, assuming normally distributed errors here is probably not a good idea. 2. (a) IV estimating equations: s z1 y = s z1 x 1 b β1 + s z1 x 2 b β2 s z2 y = s z2 x 1 b β1 + s z2 x 2 b β2 s z3y = s z3x 1 b β1 + s z3x 2 b β2 (b) We have three estimating equations but only two unknown parameters; generally, the estimating equations will be inconsistent there will be no pair of values b β 1, b β 2 that satisfies all three simultaneously. (c) [optional] There are several ways to proceed. One simple way would be to arbitrarily get rid of one of the IVs. After all, we have more IVs than we need, and any two will give us consistent estimates of β 1 and β 2. (But see the later discussion of estimating overidentified equations.) 3. Identification: (a) overidentified (b) just-identified (c) just-identified (d) underidentified (e) underidentified (f) just-identified (g) just-identified (j) overidentified 2

Estimation: (a) OLS (b) IV (or equivalently, 2SLS or FIML) (c) IV (or equivalently, 2SLS or FIML), taking account of the block-recursive structure (so x 3 and x 4 can be used as IVs in estimating for the equation for y 5 ) (d) The model cannot be estimated (although the structural equation for y 5 could be estimated by 2SLS). (e) The model cannot be estimated (although the structural equation for y 2 could be estimated by OLS). (f) OLS (g) IV (or equivalently, 2SLS or FIML) (j) OLS Rindfuss et al: Because this is a just-identified model, IV, 2SLS, and FIML estimates will coincide. Estimates by FIML from the sem function: library(sem) Rindfuss <- matrix(c( + 456.6769, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, + -0.9201, 0.0894, 0, 0, 0, 0, 0, 0, 0, 0, 0, + -15.8253, 0.1416, 9.2112, 0, 0, 0, 0, 0, 0, 0, 0, + -3.2442, 0.0124, 0.3908, 0.2209, 0, 0, 0, 0, 0, 0, 0, + -1.3205, 0.0451, 0.2181, 0.0491, 0.2294, 0, 0, 0, 0, 0, 0, + -0.4631, 0.0174,-0.0458,-0.0055, 0.0132, 0.1498, 0, 0, 0, 0, 0, + 0.4768,-0.0191, 0.0179,-0.0295,-0.0589,-0.0085, 0.1772, 0, 0, 0, 0, + -0.3143, 0.0031, 0.0291,-0.0096,-0.0018, 0.0089,-0.0014, 0.1170, 0, 0, 0, + 0.2356, 0.0031, 0.0018,-0.0045,-0.0039, 0.0021,-0.0003, 0.0009, 0.0888, 0, 0, + 18.6603,-0.1567,-2.3493,-0.2052,-0.2385,-0.1434,-0.0119,-0.1380, 0.0267, 5.5596, 0, + 16.2133,-0.2305,-1.4237,-0.2262,-0.3458, 0.1752, 0.1683,-0.1702, 0.2626, 3.6580,16.6382), + 11, 11, byrow=true) rownames(rindfuss) <- colnames(rindfuss) <- c("fatherocc", "Black", "Sibs", "Farm", "South", + "Parents", "Catholic", "Smoking", "Miscarriage", "Education", "FirstBirth") Rindfuss.mod <- specify.model() 1: FatherOcc - Education, gamma10.1, NA 2: Black - Education, gamma10.2, NA 3: Sibs - Education, gamma10.3, NA 4: Farm - Education, gamma10.4, NA 5: South - Education, gamma10.5, NA 6: Parents - Education, gamma10.6, NA 7: Catholic - Education, gamma10.7, NA 8: Smoking - Education, gamma10.8, NA 9: FirstBirth - Education, beta10.11, NA 10: Black - FirstBirth, gamma11.2, NA 11: Sibs - FirstBirth, gamma11.3, NA 12: Farm - FirstBirth, gamma11.4, NA 13: South - FirstBirth, gamma11.5, NA 14: Parents - FirstBirth, gamma11.6, NA 15: Catholic - FirstBirth, gamma11.7, NA 16: Smoking - FirstBirth, gamma11.8, NA 17: Miscarriage- FirstBirth, gamma11.9, NA 3

18: Education - FirstBirth, beta11.10, NA 19: Education <- Education, sigma10.10,na 20: FirstBirth<- FirstBirth, sigma11.11,na 21: Education <- FirstBirth, sigma10.11,na 22: Read 21 records Rindfuss.sem <- sem(rindfuss.mod, Rindfuss, 1766, + fixed.x=c("fatherocc", "Black", "Sibs", "Farm", "South", + "Parents", "Catholic", "Smoking", "Miscarriage")) summary(rindfuss.sem) Model Chisquare = 1.0268e-06 Df = 0 Pr(Chisq) = NA Chisquare (null model) = 18142 Df = 55 Goodness-of-fit index = 1 Adjusted goodness-of-fit index = NA RMSEA index = Inf 90 % CI: (Inf, Inf) Bentler-Bonnett NFI = 1 Tucker-Lewis NNFI = -Inf Bentler CFI = 1 BIC = NA Normalized Residuals Min. 1st Qu. Median Mean 3rd Qu. Max. -3.17e-04 0.00e+00 0.00e+00 2.72e-05 5.82e-15 5.81e-04 Parameter Estimates Estimate Std Error z value Pr( z ) gamma10.1 0.027441 0.0027177 10.09709 0.0000e+00 Education <--- FatherOcc gamma10.2-0.604368 0.1957844-3.08691 2.0225e-03 Education <--- Black gamma10.3-0.168397 0.0163504-10.29930 0.0000e+00 Education <--- Sibs gamma10.4-0.116996 0.1091451-1.07193 2.8375e-01 Education <--- Farm gamma10.5-0.538643 0.1168517-4.60963 4.0338e-06 Education <--- South gamma10.6-0.886834 0.1495746-5.92904 3.0471e-09 Education <--- Parents gamma10.7-0.517759 0.1174769-4.40733 1.0465e-05 Education <--- Catholic gamma10.8-0.881146 0.1566237-5.62588 1.8457e-08 Education <--- Smoking beta10.11 0.084854 0.0534017 1.58898 1.1206e-01 Education <--- FirstBirth gamma11.2-1.434353 0.3349486-4.28231 1.8497e-05 FirstBirth <--- Black gamma11.3 0.103361 0.0418505 2.46976 1.3520e-02 FirstBirth <--- Sibs gamma11.4-0.085089 0.2059630-0.41313 6.7951e-01 FirstBirth <--- Farm gamma11.5-0.308320 0.2172264-1.41935 1.5580e-01 FirstBirth <--- South gamma11.6 2.242559 0.2537962 8.83606 0.0000e+00 FirstBirth <--- Parents gamma11.7 0.831387 0.2245214 3.70293 2.1313e-04 FirstBirth <--- Catholic gamma11.8-0.646233 0.2958561-2.18428 2.8942e-02 FirstBirth <--- Smoking gamma11.9 2.691497 0.2885664 9.32713 0.0000e+00 FirstBirth <--- Miscarriage beta11.10 0.838723 0.1452543 5.77417 7.7333e-09 FirstBirth <--- Education sigma10.10 3.728918 0.1874051 19.89764 0.0000e+00 Education <-- Education sigma11.11 12.672309 0.5015538 25.26610 0.0000e+00 FirstBirth <-- FirstBirth sigma10.11-1.918187 0.8668871-2.21273 2.6916e-02 FirstBirth <-- Education Iterations = 52 4

The reciprocal effects between education and age at first birth are primarily what are of interest here. The coefficient for the direct effect of age at first birth on education at marriage is (surprisingly) positive, but non-significant. The coefficient for the direct effect of education on age at first birth is positive (as expected) and highly statistically significant. The estimated correlation between the errors is r 10,11 = 1.918 3.729 12.672 = 0.279 This isn t a very large correlaton, but I would have expected it to be positive (and the negative disturbance covariance is statistically significant), so there is a suggestion that the model may be misspecified. Lincoln: Because this is a recursive model, OLS and FIML estimates coincide. Estimates by FIML from the sem function: Lincoln <- matrix(c( + 0.007744, 0, 0, 0, 0, 0, 0, + 0.000635,0.000400, 0, 0, 0, 0, 0, + 0.052401,0.005077,1.065024, 0, 0, 0, 0, + 0.006624,0.001471,0.066069,0.037636, 0, 0, 0, + 0.054564,0.012024,0.823108,0.137249,1.809025, 0, 0, + 0.084675,0.015990,1.131609,0.171958,2.025220,2.496400, 0, + 0.103616,0.019572,1.325756,0.184820,1.969703,2.567911,2.989441), + 7, 7, byrow=true) rownames(lincoln) <- colnames(lincoln) <- c("unionstaff", "Employment", + "logworkers", "Unionized", "Strikes", "Strikers", "PersonDays") Lincoln.mod <- specify.model() 1: Employment - Strikes, gamma52, NA 2: logworkers - Strikes, gamma53, NA 3: Unionized - Strikes, gamma54, NA 4: UnionStaff - Strikers, gamma61, NA 5: Employment - Strikers, gamma62, NA 6: logworkers - Strikers, gamma63, NA 7: Strikes - Strikers, beta65, NA 8: UnionStaff - PersonDays,gamma71, NA 9: Employment - PersonDays,gamma72, NA 10: logworkers - PersonDays,gamma73, NA 11: Strikers - PersonDays,beta76, NA 12: Strikes <- Strikes, sigma55, NA 13: Strikers <- Strikers, sigma66, NA 14: PersonDays <- PersonDays,sigma77, NA 15: Read 14 records Lincoln.sem <- sem(lincoln.mod, Lincoln, 78, + fixed.x=c("unionstaff", "Employment", "logworkers", "Unionized")) summary(lincoln.sem) Model Chisquare = 7.4465 Df = 4 Pr(Chisq) = 0.11409 Chisquare (null model) = -485.02 Df = 21 Goodness-of-fit index = 0.97236 Adjusted goodness-of-fit index = 0.80649 RMSEA index = 0.10578 90 % CI: (NA, 0.22272) Bentler-Bonnett NFI = 1.0154 Tucker-Lewis NNFI = 1.0358 Bentler CFI = 0 BIC = -17.764 5

Normalized Residuals Min. 1st Qu. Median Mean 3rd Qu. Max. -1.05e-01 0.00e+00 2.58e-15 2.33e-02 1.59e-02 2.49e-01 Parameter Estimates Estimate Std Error z value Pr( z ) gamma52 15.26815 5.809577 2.6281 8.5863e-03 Strikes <--- Employment gamma53 0.57330 0.110366 5.1945 2.0528e-07 Strikes <--- logworkers gamma54 2.04359 0.614975 3.3230 8.9041e-04 Strikes <--- Unionized gamma61 2.70454 0.545643 4.9566 7.1731e-07 Strikers <--- UnionStaff gamma62 5.84584 2.156703 2.7105 6.7172e-03 Strikers <--- Employment gamma63 0.19965 0.050311 3.9683 7.2402e-05 Strikers <--- logworkers beta65 0.90824 0.037721 24.0778 0.0000e+00 Strikers <--- Strikes gamma71 2.36159 0.814651 2.8989 3.7447e-03 PersonDays <--- UnionStaff gamma72 11.80125 3.220081 3.6649 2.4744e-04 PersonDays <--- Employment gamma73 0.27940 0.077421 3.6089 3.0754e-04 PersonDays <--- logworkers beta76 0.74630 0.056402 13.2318 0.0000e+00 PersonDays <--- Strikers sigma55 0.87308 0.140737 6.2036 5.5187e-10 Strikes <-- Strikes sigma66 0.10861 0.017508 6.2036 5.5187e-10 Strikers <-- Strikers sigma77 0.22691 0.036578 6.2036 5.5186e-10 PersonDays <-- PersonDays Iterations = 0 Because this model is overidentified, it was possible that it would not do a good job of reproducing the observed correlational structure of the data, but the overidentification test (labelled Model Chisquare in the output) is nonsignificant. Desipte the small sample size, all of the specified structural parameters are statistically significant. Not surprisingly, given the nature of the three endogenous variables (numbers of strikes, strikers, and person-days lost to strikes), the coefficients b β 65 and b β 76 are particularly highly significant. 4. (a) Label the variables as follows structural submodel: x 1 x 2 y 1 y 2 y 3 y 4 ξ 1 η 1 η 2 Education SEI Anomia67 Powerless67 Anomia71 Powerless71 SES Alienation67 Alienation71 η 1 = γ 11 ξ 1 + ζ 1 η 2 = γ 21 ξ 1 + β 21 η 1 + ζ 2 6

measurement submodel: x 1 = ξ 1 + δ 1 x 2 = λ x 21ξ 1 + δ 2 y 1 = η 1 + ε 1 y 2 = λ y 21 η 1 + ε 2 y 3 = η 2 + ε 3 y 4 = λ y 42 η 2 + ε 4 note that Cov(ε 1,ε 3 ) and Cov(ε 2,ε 4 ) are not prespecified to be 0. (b) Allowing correlated measurement errors across waves of the panel study appears to make sense. Specifying that the structural disturbances ζ 1 and ζ 2 are uncorrelated across waves almost surely does not (and will likely lead to the direct effect of Alienation67 on Alienation71 being over-estimated). Having only a single exogenous cause of alienation makes for a very sparse model and, in my opinion, calls into question the exogeneity of SES. Isn t SES likely to be correlated with other possible causes of alienation? race, ethnicity, and age come immediately to mind. Finally, it doesn t make sense to me to think of SES as a cause, rather than a consequence, of Education and SEI. (The resulting model would not be identified, I believe.) (c) Parameters: γ 11,γ 21,β 21,λ x 21,λ y 21,λy 42,φ 11,ψ 11,ψ 22,θ δ 11,θ δ 22,θ ε 11,θ ε 22,θ ε 33,θ ε 44,θ ε 13,θ ε 24 (17 in all) Number of observed variances and covariances = (6)(6 + 1)/2 = 21. Given that the model is identified, it is overidentified since there are more observed variances and covariances than parameters to estimate. (d) S.Wheaton <- matrix(c( + 11.834, 0, 0, 0, 0, 0, + 6.947, 9.364, 0, 0, 0, 0, + 6.819, 5.091, 12.532, 0, 0, 0, + 4.783, 5.028, 7.495, 9.986, 0, 0, + -3.839, -3.889, -3.841, -3.625, 9.610, 0, + -21.899, -18.831, -21.748, -18.775, 35.522, 450.288), + 6, 6) rownames(s.wheaton) <- colnames(s.wheaton) <- + c( Anomia67, Powerless67, Anomia71, Powerless71, Education, SEI ) model.wheaton.1 <- specify.model() 1: Alienation67 - Anomia67, NA, 1 2: Alienation67 - Powerless67, lamby21, NA 3: Alienation71 - Anomia71, NA, 1 4: Alienation71 - Powerless71, lamby42, NA 5: SES - Education, NA, 1 6: SES - SEI, lambx21, NA 7: SES - Alienation67, gam11, NA 8: SES - Alienation71, gam21, NA 9: Alienation67 - Alienation71, beta21, NA 10: Anomia67 <- Anomia67, thee11, NA 11: Powerless67 <- Powerless67, thee22, NA 12: Anomia71 <- Anomia71, thee33, NA 13: Powerless71 <- Powerless71, thee44, NA 14: Anomia67 <- Anomia71, thee13, NA 15: Powerless67 <- Powerless71, thee24, NA 16: Education <- Education, thed11, NA 17: SEI <- SEI, thed22, NA 7

18: Alienation67 <- Alienation67, psi11, NA 19: Alienation71 <- Alienation71, psi22, NA 20: SES <- SES, phi11, NA 21: Read 20 records sem.wheaton.1 <- sem(model.wheaton.1, S.Wheaton, 932) summary(sem.wheaton.1) Model Chisquare = 4.7302 Df = 4 Pr(Chisq) = 0.31612 Chisquare (null model) = 22260 Df = 15 Goodness-of-fit index = 0.99831 Adjusted goodness-of-fit index = 0.99115 RMSEA index = 0.014003 90 % CI: (NA, 0.05315) Bentler-Bonnett NFI = 0.99979 Tucker-Lewis NNFI = 0.99988 Bentler CFI = 0.99997 BIC = -29.786 Normalized Residuals Min. 1st Qu. Median Mean 3rd Qu. Max. -0.593000-0.019500 0.000278-0.019900 0.015200 0.521000 Parameter Estimates Estimate Std Error z value Pr( z ) lamby21 0.97874 0.062048 15.7739 0.0000e+00 Powerless67 <--- Alienation67 lamby42 0.92207 0.059797 15.4199 0.0000e+00 Powerless71 <--- Alienation71 lambx21 5.21949 0.425771 12.2589 0.0000e+00 SEI <--- SES gam11-0.57500 0.057981-9.9171 0.0000e+00 Alienation67 <--- SES gam21-0.22676 0.053059-4.2738 1.9217e-05 Alienation71 <--- SES beta21 0.60705 0.051264 11.8418 0.0000e+00 Alienation71 <--- Alienation67 thee11 4.73579 0.457165 10.3590 0.0000e+00 Anomia67 <-- Anomia67 thee22 2.56607 0.406834 6.3074 2.8375e-10 Powerless67 <-- Powerless67 thee33 4.40390 0.518061 8.5007 0.0000e+00 Anomia71 <-- Anomia71 thee44 3.07320 0.436967 7.0330 2.0210e-12 Powerless71 <-- Powerless71 thee13 1.62472 0.315952 5.1423 2.7137e-07 Anomia71 <-- Anomia67 thee24 0.33906 0.263239 1.2880 1.9773e-01 Powerless71 <-- Powerless67 thed11 2.80436 0.512637 5.4705 4.4889e-08 Education <-- Education thed22 264.88096 18.256607 14.5088 0.0000e+00 SEI <-- SEI psi11 4.84670 0.463167 10.4643 0.0000e+00 Alienation67 <-- Alienation67 psi22 4.08769 0.404604 10.1029 0.0000e+00 Alienation71 <-- Alienation71 phi11 6.80551 0.653827 10.4087 0.0000e+00 SES <-- SES Iterations = 92 With the exception of the measurement-error covariance of the two powerlessness measures, all of the parameter estimates are highly statistically significant. The estimates of the structural parameters appear reasonable (although, as indicated above, the coefficient β b 21 is probably an over-estimate). The overidentification test and fit indices suggest that the model does a good job of reproducing the correlational structure of the data. (e) model.wheaton.2 <- specify.model() 1: Alienation67 - Anomia67, NA, 1 2: Alienation67 - Powerless67, lamby*, NA 3: Alienation71 - Anomia71, NA, 1 8

4: Alienation71 - Powerless71, lamby*, NA 5: SES - Education, NA, 1 6: SES - SEI, lambx21, NA 7: SES - Alienation67, gam11, NA 8: SES - Alienation71, gam21, NA 9: Alienation67 - Alienation71, beta21, NA 10: Anomia67 <- Anomia67, thee1*, NA 11: Powerless67 <- Powerless67, thee2*, NA 12: Anomia71 <- Anomia71, thee1*, NA 13: Powerless71 <- Powerless71, thee2*, NA 14: Anomia67 <- Anomia71, thee13, NA 15: Powerless67 <- Powerless71, thee24, NA 16: Education <- Education, thed11, NA 17: SEI <- SEI, thed22, NA 18: Alienation67 <- Alienation67, psi11, NA 19: Alienation71 <- Alienation71, psi22, NA 20: SES <- SES, phi11, NA 21: Read 20 records sem.wheaton.2 <- sem(model.wheaton.2, S.Wheaton, 932) sem.wheaton.2 Model Chisquare = 6.058066 Df = 7 lamby* lambx21 gam11 gam21 beta21 thee1* 0.9546483 5.2273024-0.5830131-0.2187253 0.5956085 4.6191514 thee2* thee13 thee24 thed11 thed22 psi11 2.7813075 1.6481588 0.3142721 2.8144930 264.6041076 4.9183452 psi22 phi11 3.9786787 6.7954548 Iterations = 84 1 - pchisq(6.058066-4.73018, 7-4) [1] 0.7225222 pchisq(deviance(sem.wheaton.2) - deviance(sem.wheaton.1), + df.residual(sem.wheaton.2) - df.residual(sem.wheaton.1), + lower.tail=false) [1] 0.7225222 The nonsignificant likelihood-ratio test comparing the two models suggests that constraining the measurement-model parameters to be equal across waves is consistent with the data. 9