SEM Day 1 Lab Exercises SPIDA 2007 Dave Flora

Today we will see how to estimate CFA models and interpret output using both SAS and LISREL. In SAS, commands for specifying SEMs are given as linear equations, using PROC CALIS. In LISREL, commands for specifying SEMs are given by describing the matrices of the model. First, we will estimate the basic two-factor CFA for depression and anxiety described during the lecture.

Let's start with SAS. As we learned in the lecture, a CFA model can be estimated directly from a covariance matrix (i.e., a raw data file is not needed). The following SAS code reads the covariance matrix into a temporary, internal data set called anxdep. The covariance matrix itself is in the file anxdep.txt in the sem folder (i.e., h:\courses\spida\sem). You can just copy-paste the numbers from the text file, but then you will have to type the rest of the SAS code:

data anxdep(type=cov);
  _type_ = 'cov';
  input _name_ $ bdi cesd dac bai stai ipat;
  datalines;
bdi  .70  .   .   .   .   .
cesd .47 .60  .   .   .   .
dac  .43 .33 .50  .   .   .
bai  .23 .15 .20 .56  .   .
stai .22 .13 .17 .39 .53  .
ipat .17 .14 .16 .39 .36 .50
;
run;

The periods fill the redundant upper triangle of the symmetric matrix (SAS reads them as missing values). Note how the variable names in the data set match the names of the variables in the CFA example from today's lecture: bdi, cesd, and dac are hypothesized indicators of the first factor (depression); bai, stai, and ipat are hypothesized indicators of the second factor (anxiety).
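Because a covariance matrix is symmetric, only its lower triangle carries information. As a rough illustration outside SAS, this Python sketch takes the rows exactly as typed in the DATALINES block, ignores the "." placeholders, and mirrors the lower triangle to rebuild the full 6 x 6 matrix. The function name full_matrix is hypothetical and is not part of SAS.

```python
# Rows exactly as typed in the SAS DATALINES block; "." marks the
# redundant upper-triangle cells that SAS reads as missing values.
ROWS = [
    "bdi  .70  .   .   .   .   .",
    "cesd .47 .60  .   .   .   .",
    "dac  .43 .33 .50  .   .   .",
    "bai  .23 .15 .20 .56  .   .",
    "stai .22 .13 .17 .39 .53  .",
    "ipat .17 .14 .16 .39 .36 .50",
]


def full_matrix(rows):
    """Return (names, symmetric matrix) built from lower-triangle rows."""
    names = []
    lower = []
    for row in rows:
        name, *cells = row.split()
        names.append(name)
        # Keep only the non-missing (lower-triangle) cells of this row.
        lower.append([float(c) for c in cells if c != "."])
    p = len(names)
    cov = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            # Copy each lower-triangle value into both (i, j) and (j, i).
            cov[i][j] = cov[j][i] = lower[i][j]
    return names, cov


names, cov = full_matrix(ROWS)
print(names)
print(cov[3])  # the bai row, now including its covariances with bdi, cesd, dac
```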

Exercise 1

Remember from the lecture that there are two common ways of defining the scales of the latent variables in CFA. One is to set the variance of each factor (psi11 and psi22) equal to one, while leaving all factor loadings free. The following PROC CALIS code accomplishes this specification:

*setting latent variable scales by constraining their variance;
proc calis data=anxdep covariance nobs=138 pshort stderr;
  lineqs
    bdi  = ly11 f1 + e1,
    cesd = ly21 f1 + e2,
    dac  = ly31 f1 + e3,
    bai  = ly42 f2 + e4,
    stai = ly52 f2 + e5,
    ipat = ly62 f2 + e6;
  std
    f1 f2 = 1 1,
    e1 e2 e3 e4 e5 e6 = te11 te22 te33 te44 te55 te66;
  cov
    f1 f2 = ps21;
run;

PROC CALIS statement: The covariance option tells SAS to analyze the covariance matrix (rather than a correlation matrix). nobs tells SAS the number of observations (N) that were used to calculate the sample covariance matrix. pshort suppresses a good bit of the default technical output. stderr tells SAS to include in the output the standard error (and t-test) for each parameter estimate.

LINEQS statement: Used to tell SAS the linear equations implied by the model. For a standard CFA model, each equation gives an observed variable (left side of the equals sign) as a linear function (right side of the equals sign) of a latent variable multiplied by a factor loading (i.e., a lambda value), plus a residual variable. For example, in bdi = ly11 f1 + e1, bdi is the observed variable, ly11 is a name for the factor loading, f1 is the name of the latent variable, and e1 is the name of the residual variable. Refer to today's lecture notes to see where these equations come from. This one is $y_1 = \lambda_{11}\eta_1 + \varepsilon_1$.

ly11 is an arbitrary name for the factor loading parameter that I made up to remind myself that this is the same as the LY(1,1) parameter ($\lambda_{11}$) in LISREL (see below). f1 is an arbitrary name for the first factor that I made up (maybe I should have used dep for depression instead).
Note that proper SAS syntax puts a space between ly11 and f1, even though the linear equation actually means ly11 times f1! Using ly11*f1 won't work! Finally, e1 is an arbitrary name for this equation's residual variable that I made up ("e" for epsilon). Note that the equations are separated by commas, not semicolons! That's because all of the equations are part of a single LINEQS statement. The last equation does end with a semicolon, signifying the end of the LINEQS statement.

STD statement: Used to tell SAS which latent variable variances are parameters to estimate and which are fixed values. f1 f2 = 1 1 tells SAS to fix the variances of the latent variables f1 and f2 (defined in the LINEQS statement) to one. e1 e2 e3 e4 e5 e6 = te11 te22 te33 te44 te55 te66 tells SAS that the variance of e1 is a to-be-estimated parameter called te11, the variance of e2 is a to-be-estimated parameter called te22, and so on, where e1-e6 were defined in the LINEQS statement.

Again, te11 is an arbitrary name that I made up, this time to remind myself that this is the same as the TE(1,1) parameter in LISREL. te11 through te66 are the diagonal elements of the $\Theta_\varepsilon$ (theta-epsilon) matrix. Again, refer to the lecture notes to see what that is. Note once more that a comma separates the two lines within this statement, and a semicolon ends the statement.

COV statement: Tells SAS which covariances among latent variables (including residual variables) to estimate. All others are set to zero by default. Thus, f1 f2 = ps21; tells SAS to estimate the covariance between latent variables f1 and f2, and to call this parameter ps21. This is the covariance between the depression and anxiety latent variables. ps21 is an arbitrary name I made up to match the PS(2,1) parameter in LISREL; it is the $\psi_{21}$ element from the lecture notes. If we were instead estimating a model in which f1 linearly predicts f2, rather than merely allowing them to covary, we would drop this COV statement and instead add a new equation to the LINEQS statement: f2 = ga21 f1 + ze2, where ga21 is the regression coefficient $\gamma_{21}$ and ze2 is the residual $\zeta_2$ for f2 (the variance of ze2, rather than that of f2, would then be listed in the STD statement).

Enter the PROC CALIS code above and run it. Check your log window to make sure there are no errors or warnings! Scroll to the top of the output and examine it. First, you will see some information regarding the model that you have specified. The observed variables (bdi cesd dac bai stai ipat) are listed as "manifest endogenous." They are endogenous because they appear as dependent variables in the LINEQS statement. The latent variables f1 and f2 are listed as "latent exogenous" because they are independent variables predicting the dependent, endogenous variables. Finally, the residual variables are "error exogenous." Scrolling down, you will see some technical information about the maximum likelihood estimation algorithm employed by SAS and how well it performed for this particular analysis.
The main thing you want to see here is "GCONV convergence criterion satisfied." Next comes information about model fit. Look for the chi-square, RMSEA, and CFI values that were in today's lecture notes. There are many other fit indices! Some of these are old indices, such as GFI and AGFI, that nobody uses any more (or at least shouldn't use), and some are newer indices that can be useful but are perhaps less popular than RMSEA and CFI, for various reasons. Then you will find the parameter estimates for the model, along with standard errors and t-statistics (which we essentially treat as Z). The way SAS aligns the output is unfortunate. For example:

   bdi     =   0.7783*f1   + 1.0000 e1
   Std Err     0.0590 ly11
   t Value    13.2015

The factor loading (ly11 = $\lambda_{11}$) describing the relationship between the depression latent variable (f1) and the observed variable bdi is ly11 = 0.7783, which has standard error 0.059. The parameter estimate divided by its standard error gives Z = 13.2015 (dividing the rounded printed values, .7783/.059, gives about 13.19), which has p < .001, leading us to reject the null hypothesis that ly11 = 0. Note that the variances of f1 and f2 equal 1, and these values do not have standard errors because they were constrained. The variances of e1-e6 (i.e., te11-te66) do have estimates accompanied by standard errors and t-statistics.
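The Z-test described above is just the estimate divided by its standard error, compared with the standard normal distribution. A minimal Python sketch, using the rounded values printed by SAS (so Z comes out near 13.19 rather than the 13.2015 SAS computes from unrounded values); z_test is a hypothetical helper name:

```python
import math


def z_test(estimate, std_err):
    """Return (z, two-sided p) for H0: parameter = 0."""
    z = estimate / std_err
    # Two-sided normal p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


z, p = z_test(0.7783, 0.0590)  # ly11 and its standard error from SAS
print(f"z = {z:.2f}, p = {p:.2e}")
```

Any |Z| above 1.96 corresponds to p < .05 (two-tailed), so a Z around 13 is far beyond conventional cutoffs.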

Finally, there is also output labeled "Manifest Variable Equations with Standardized Estimates." This output represents what is commonly called the completely standardized solution, meaning that these are the parameter estimates that would be obtained if all of the observed variables had been standardized before the model was estimated (or, equivalently, if we had analyzed a correlation matrix instead of a covariance matrix). Note that because we constrained the variances of f1 and f2 to equal one, the covariance between f1 and f2 equals the correlation between f1 and f2.

Exercise 2

Re-parameterize the model so that the scale of the latent variables is established by constraining the factor loading for bdi to equal 1 and the factor loading for bai to equal 1, allowing the variances of the two latent variables to be freely estimated. The PROC CALIS code below does this; note how it differs from the code given above:

*setting latent variable scales by fixing one loading per factor to 1;
proc calis data=anxdep covariance nobs=138 pshort stderr;
  lineqs
    bdi  = 1 f1 + e1,
    cesd = ly21 f1 + e2,
    dac  = ly31 f1 + e3,
    bai  = 1 f2 + e4,
    stai = ly52 f2 + e5,
    ipat = ly62 f2 + e6;
  std
    f1 f2 = ps11 ps22,
    e1 e2 e3 e4 e5 e6 = te11 te22 te33 te44 te55 te66;
  cov
    f1 f2 = ps21;
run;

Compare and contrast the output from this parameterization with that from the previous parameterization. What's the same? What's different?

Exercise 3

Modify the model above to include a free parameter for the residual covariance between bdi and bai. This is the $\theta_{41}$ parameter in the lecture notes (the covariance between $\varepsilon_1$ and $\varepsilon_4$). Hint: You will have to add this parameter to the COV statement. Does the Z-test for this parameter estimate suggest that it's an important parameter to include in the model? Can you reproduce the chi-square difference test described in the lecture notes? Is this consistent with the Z-test?
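The two parameterizations in Exercises 1 and 2 are equivalent: if D holds the marker loadings (ly11 and ly42 from Exercise 1) on its diagonal, then $\Lambda^* = \Lambda D^{-1}$ and $\Psi^* = D \Psi D$ give exactly the same $\Lambda\Psi\Lambda'$, so the fit and residual variances do not change. The Python sketch below applies that rescaling. It is only an illustration: the loadings are the rounded LISREL estimates reported later in this handout, and psi21 = 0.5 is a hypothetical value, not a result from the lab.

```python
def to_marker_scale(loadings_by_factor, psi21):
    """Map unit-variance CFA estimates onto the marker-indicator scale.

    loadings_by_factor: one list of loadings per factor, marker first
    (bdi for depression, bai for anxiety).
    psi21: factor covariance from the run with factor variances fixed to 1.
    Residual variances (te11-te66) are unchanged, so they are not needed.
    """
    markers = [lams[0] for lams in loadings_by_factor]
    # Dividing each factor's loadings by its marker loading fixes the marker at 1.
    new_loadings = [[lam / m for lam in lams]
                    for lams, m in zip(loadings_by_factor, markers)]
    # Each factor variance becomes (marker loading)^2 * 1.
    new_variances = [m ** 2 for m in markers]
    # The factor covariance is rescaled by both marker loadings.
    new_psi21 = markers[0] * markers[1] * psi21
    return new_loadings, new_variances, new_psi21


# Rounded loadings (bdi, cesd, dac) and (bai, stai, ipat); psi21 is HYPOTHETICAL.
loadings = [[0.78, 0.60, 0.55], [0.65, 0.60, 0.59]]
new_loadings, new_variances, new_psi21 = to_marker_scale(loadings, 0.5)
print(new_loadings, new_variances, new_psi21)
```

Comparing these rescaled numbers with the Exercise 2 output (using the real ps21 from Exercise 1) is one way to answer "what's the same, and what's different".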

NOW for LISREL!

After first starting LISREL, you will want to open a new syntax window. You can do so by clicking on the File pull-down menu and then choosing the New option. When the New window pops open, choose the top option, "Syntax Only." A new text editor then opens, into which you will type your syntax. Again, you can just copy-paste the covariance matrix from the file anxdep.txt. Please refer to today's lecture notes to understand what's going on!

Reading in the covariance matrix:

!depression and anxiety CFA example
DA NI=6 NO=138 MA=CM
CM
.70
.47 .60
.43 .33 .50
.23 .15 .20 .56
.22 .13 .17 .39 .53
.17 .14 .16 .39 .36 .50
LA
BDI CESD DAC BAI STAI IPAT

The first line gives a title. The DA ("data") line tells LISREL that there are 6 indicators (NI), or observed variables, and 138 observations (NO), and that the matrix to be analyzed (MA) is a covariance matrix (CM). The CM line signifies that the covariance matrix is given below it. The LA ("label") line signifies that the observed variable names are given below it.

Next, the LISREL model specification begins like this:

MO NY=6 NE=2 LY=FU,FI PS=SY,FI TE=SY,FI
LE
depression anxiety
FR LY(1,1) LY(2,1) LY(3,1) LY(4,2) LY(5,2) LY(6,2)
FR TE(1,1) TE(2,2) TE(3,3) TE(4,4) TE(5,5) TE(6,6)
VA 1.0 PS(1,1) PS(2,2)
FR PS(2,1)

MO signifies the beginning of the model command. NY=6 says that there are 6 observed Y variables. NE=2 says that there are two eta (η) latent variables. LE signifies that the next line contains labels for each eta.

In LISREL, we describe the model to be estimated by describing the matrices that form the model. The matrix of factor loadings is called Lambda-Y, or LY:

$$\Lambda_y = \begin{pmatrix} \lambda_{11} & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ 0 & \lambda_{42} \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \end{pmatrix}$$

We start out by specifying the form of the matrix: LY=FU,FI. FU means "full" and FI means "fixed." Based on the number of observed variables and latent variables, LISREL knows the dimensions of LY. By using the FU option, we are telling LISREL to create a complete matrix (i.e., rather than a diagonal or symmetric matrix). The FI option tells LISREL to fill the matrix with fixed zeros. Next, we tell LISREL which parts of the LY matrix to estimate, or free, with the FR command:

FR LY(1,1) LY(2,1) LY(3,1) LY(4,2) LY(5,2) LY(6,2)

TE=SY,FI signifies that theta-epsilon (TE) starts out symmetric (SY) with fixed (FI) values of zero in all places. Adding this command frees the diagonal values to be estimated:

FR TE(1,1) TE(2,2) TE(3,3) TE(4,4) TE(5,5) TE(6,6)

so that

$$\Theta_\varepsilon = \begin{pmatrix} \theta_{11} & & & & & \\ 0 & \theta_{22} & & & & \\ 0 & 0 & \theta_{33} & & & \\ 0 & 0 & 0 & \theta_{44} & & \\ 0 & 0 & 0 & 0 & \theta_{55} & \\ 0 & 0 & 0 & 0 & 0 & \theta_{66} \end{pmatrix}$$

PS=SY,FI signifies that psi (PS) also starts out symmetric and fixed. Adding these lines sets the diagonal elements to 1 and frees the off-diagonal element:

VA 1.0 PS(1,1) PS(2,2)
FR PS(2,1)

so that

$$\Psi = \begin{pmatrix} 1 & \\ \psi_{21} & 1 \end{pmatrix}$$
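Together, these three matrices imply a covariance matrix for the observed variables, $\Sigma = \Lambda_y \Psi \Lambda_y' + \Theta_\varepsilon$, and maximum likelihood estimation chooses the free parameters so that $\Sigma$ is as close as possible to the sample covariance matrix. The Python sketch below computes $\Sigma$ from matrices with the same zero/free pattern as the LISREL specification. All numeric values in it are hypothetical, chosen only to illustrate the arithmetic.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def implied_cov(lam, psi, theta):
    """Sigma = Lambda * Psi * Lambda' + Theta."""
    common = matmul(matmul(lam, psi), transpose(lam))
    return [[common[i][j] + theta[i][j] for j in range(len(theta))]
            for i in range(len(theta))]


# HYPOTHETICAL values for illustration only (not estimates from this lab).
lam_values = [0.8, 0.6, 0.5, 0.7, 0.6, 0.6]
# Same pattern as FR LY(1,1) ... LY(6,2): each indicator loads on one factor.
LY = [[lam_values[0], 0.0], [lam_values[1], 0.0], [lam_values[2], 0.0],
      [0.0, lam_values[3]], [0.0, lam_values[4]], [0.0, lam_values[5]]]
# VA 1.0 PS(1,1) PS(2,2) and a free PS(2,1).
PS = [[1.0, 0.4], [0.4, 1.0]]
# Diagonal TE, as freed by FR TE(1,1) ... TE(6,6).
te = [0.1, 0.2, 0.25, 0.1, 0.15, 0.15]
TE = [[te[i] if i == j else 0.0 for j in range(6)] for i in range(6)]

sigma = implied_cov(LY, PS, TE)
print(round(sigma[3][0], 3))  # cov(BAI, BDI) = lambda11 * psi21 * lambda42
```

Note that, under this model, every covariance between a depression indicator and an anxiety indicator runs through $\psi_{21}$.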

Complete LISREL program:

Depression and anxiety CFA example
DA NI=6 NO=138 MA=CM
CM
.70
.47 .60
.43 .33 .50
.23 .15 .20 .56
.22 .13 .17 .39 .53
.17 .14 .16 .39 .36 .50
LA
BDI CESD DAC BAI STAI IPAT
MO NY=6 NE=2 LY=FU,FI PS=SY,FI TE=SY,FI
LE
depression anxiety
FR LY(1,1) LY(2,1) LY(3,1) LY(4,2) LY(5,2) LY(6,2)
FR TE(1,1) TE(2,2) TE(3,3) TE(4,4) TE(5,5) TE(6,6)
VA 1.0 PS(1,1) PS(2,2)
FR PS(2,1)
PATH DIAGRAM
OU

Including the PATH DIAGRAM line under the model specification commands will generate the path diagram corresponding to the model. Finally, OU is the line for output options; by leaving it without any options, we get the default output.

Exercise 4

Run the LISREL syntax above. After you have typed the commands, you can run the syntax either by clicking on the "L" icon on the toolbar or by choosing Run LISREL from the File pull-down menu. LISREL will then ask you to save your syntax (if you haven't already done so). Notice that, by default, LISREL wants to save the commands as type Simplis Syntax (*.spl). Change it to type Lisrel Syntax (*.ls8).

Also, make sure you change the location where the file will be saved to your personal space, rather than the default LISREL folder. This will also be the location of the output file.

Assuming there are no errors, the path diagram itself will appear. Note that parameter estimates and a few model fit statistics are printed on the path diagram. However, in order to see all of the output, you will need to open the output window. You can do so by clicking on the Window pull-down menu and choosing the corresponding *.out file.

Examine the contents of the output file. After reproducing your syntax and the covariance matrix that was analyzed, LISREL gives a count of the parameters in the model according to the different matrices. For example, the 6 parameters of the lambda-y matrix are listed like this:

         LAMBDA-Y

               depressi    anxiety
               --------   --------
     BDI           1          0
     CESD          2          0
     DAC           3          0
     BAI           0          4
     STAI          0          5
     IPAT          0          6

The zeros correspond to parameters that are not part of the model (fixed at zero), whereas the non-zero values number the parameters to be estimated. This helps confirm that your syntax described the model as you wanted it.
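The parameter count can also be checked by hand: with p observed variables there are p(p+1)/2 distinct variances and covariances, and the model degrees of freedom are that number minus the number of free parameters. A small Python sketch for the models in Exercises 1 through 6 (the labels in the dictionary are just descriptive strings, not program output):

```python
def degrees_of_freedom(n_observed, n_free):
    """df = number of distinct variances/covariances minus free parameters."""
    n_moments = n_observed * (n_observed + 1) // 2
    return n_moments - n_free


models = {
    # 6 loadings + 6 residual variances + 1 factor covariance
    "Ex. 1/4: factor variances fixed to 1": 6 + 6 + 1,
    # 4 loadings + 2 factor variances + 1 factor covariance + 6 residual variances
    "Ex. 2/5: marker loadings fixed to 1": 4 + 2 + 1 + 6,
    # the Ex. 2/5 model plus the bdi-bai residual covariance
    "Ex. 3/6: plus residual covariance": 4 + 2 + 1 + 6 + 1,
}

for label, n_free in models.items():
    print(f"{label}: {n_free} free parameters, df = {degrees_of_freedom(6, n_free)}")
```

Both ways of setting the factor scales use 13 free parameters (df = 8), which is one reason their chi-square values agree; freeing the residual covariance costs one df.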

Next come the actual parameter estimates, along with standard errors and Z-scores. Hopefully these are the same as you got in SAS! For example, the estimates of the lambda-y matrix look like this:

         LISREL Estimates (Maximum Likelihood)

         LAMBDA-Y

               depressi    anxiety
               --------   --------
     BDI         0.78        - -
                (0.06)
                 13.20

     CESD        0.60        - -
                (0.06)
                 10.25

     DAC         0.55        - -
                (0.05)
                 10.42

     BAI         - -         0.65
                            (0.05)
                             12.22

     STAI        - -         0.60
                            (0.05)
                             11.29

     IPAT        - -         0.59
                            (0.05)
                             11.59

The factor loading of the BDI on the depression latent variable is 0.78, with a standard error of 0.06. We know the factor loading is significant because Z = 13.20, far above 1.96. Finally, after the parameter estimates, LISREL gives a lengthy list of model fit statistics.
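Two of the fit statistics highlighted in the lecture, RMSEA and CFI, are simple functions of chi-square values. The Python sketch below uses the common textbook formulas; it is only an illustration, and SAS or LISREL may differ in details (for example, some programs use N rather than N - 1 in the RMSEA denominator). The chi-square values in the usage lines are hypothetical.

```python
import math


def rmsea(chi2, df, n):
    """Point estimate of RMSEA for a model with sample size n."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative fit index of model m relative to baseline model b.

    The baseline is usually the independence model, in which all
    covariances among the observed variables are fixed at zero.
    """
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)  # d_m >= 0, so this is also >= 0
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b


# HYPOTHETICAL chi-square values, for illustration only (N = 138, df = 8).
print(round(rmsea(24.0, 8, 138), 3))
print(round(cfi(24.0, 8, 500.0, 15), 3))
```

A model whose chi-square does not exceed its df gets RMSEA = 0 and CFI = 1.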

Exercise 5

Change the LISREL syntax to re-parameterize the model so that the scale of the latent variables is established by constraining the factor loading for bdi to equal 1 and the factor loading for bai to equal 1, allowing the variances of the two latent variables to be freely estimated. The following syntax will do it:

MO NY=6 NE=2 LY=FU,FI PS=SY,FR TE=SY,FI
LE
depression anxiety
FR LY(2,1) LY(3,1) LY(5,2) LY(6,2)
FR TE(1,1) TE(2,2) TE(3,3) TE(4,4) TE(5,5) TE(6,6)
VA 1.0 LY(1,1) LY(4,2)
PATH DIAGRAM
OU

How is this syntax different from the previous LISREL syntax? Which SAS output is reproduced using this LISREL syntax?

Exercise 6

Modify the model above to include a free parameter for the residual covariance between bdi and bai. This is the $\theta_{41}$ parameter in the lecture notes (the covariance between $\varepsilon_1$ and $\varepsilon_4$). Hint: You will have to free an element of the TE matrix. Does the Z-test for this parameter estimate suggest that it's an important parameter to include in the model? Can you reproduce the chi-square difference test described in the lecture notes? Is it consistent with the Z-test? Is it consistent with what you found using SAS?
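For the chi-square difference test in Exercises 3 and 6, the difference between the chi-square of the more restricted model and that of the model with the extra free parameter is itself a chi-square with df equal to the difference in df (here 1). A minimal Python sketch; the chi-square values in the usage line are hypothetical, so substitute the ones from your own output. For a single parameter, the squared Z (Wald) statistic and this difference are asymptotically equivalent, which is why the two tests usually agree.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability P(X > x) for a chi-square variate with 1 df."""
    # A chi-square(1) variable is Z**2 for standard normal Z, so
    # P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    x = max(x, 0.0)
    return math.erfc(math.sqrt(x / 2.0))


def chi2_difference(chi2_restricted, df_restricted, chi2_full, df_full):
    """Chi-square difference test for two nested models differing by 1 df."""
    delta_chi2 = chi2_restricted - chi2_full
    delta_df = df_restricted - df_full
    if delta_df != 1:
        raise ValueError("this sketch only handles a 1-df difference")
    return delta_chi2, delta_df, chi2_sf_df1(delta_chi2)


# HYPOTHETICAL values: model without vs. with the bdi-bai residual covariance.
print(chi2_difference(20.0, 8, 10.0, 7))
```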

Exercise 7 (maybe for homework?)

The file saas.txt contains data for 104 participants who completed the following psychological questionnaires:

1. Brief Fear of Negative Evaluation scale (BFNE)
2. Social Interaction Anxiety Scale (SIAS)
3. Social Phobia Scale (SPS)
4. Body Image Ideals Questionnaire (BIQ)
5. Appearance Schemas Inventory (ASI)
6. Appearance Evaluation subscale (APPEVAL)
7. Overweight Preoccupation subscale (OWPREOC)
8. Social Physique Anxiety Scale (SPAS)
9. Social Appearance Anxiety Scale (SAAS)

Using either SAS PROC CALIS or LISREL or (even better) both, estimate a CFA model testing the theory that the BFNE, SIAS, and SPS measure one latent construct ("social anxiety"), while the BIQ, ASI, APPEVAL, and OWPREOC measure another latent construct ("negative body image") that is correlated with the first. Next, it is not known which of these two factors (or both) is more strongly related to the SPAS and SAAS. Thus, your model should include paths linking these two measures to both constructs:

[Figure: path diagram. Social Anxiety → SPS, SIAS, BFNE; Negative Body Image → BIQ, ASI, AppEval, OwPreoc; both factors → SAAS and SPAS.]

See below for instructions on reading the data into the software.
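Before writing the syntax, it can help to lay out the loading pattern (1 = free, 0 = fixed at zero), much like the LISREL FR commands, and count the free parameters. The Python sketch below assumes the Exercise 1 convention of fixing both factor variances to 1; with marker loadings instead, the count comes out the same.

```python
# Loading pattern for the Exercise 7 model: 1 = free loading, 0 = fixed at zero.
# Columns: (social anxiety, negative body image).
PATTERN = {
    "bfne":    (1, 0),
    "sias":    (1, 0),
    "sps":     (1, 0),
    "biq":     (0, 1),
    "asi":     (0, 1),
    "appeval": (0, 1),
    "owpreoc": (0, 1),
    "spas":    (1, 1),  # loads on both factors
    "saas":    (1, 1),  # loads on both factors
}

p = len(PATTERN)
free_loadings = sum(sum(row) for row in PATTERN.values())
free_residual_variances = p   # one residual variance per observed variable
free_factor_covariances = 1   # the two factors are allowed to correlate
# Factor variances are fixed to 1 here (assumed scaling convention).
n_free = free_loadings + free_residual_variances + free_factor_covariances
df = p * (p + 1) // 2 - n_free
print(free_loadings, n_free, df)  # 11 21 24
```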

This code will read the data into SAS (the filename statement points the fileref sem at the course folder given above, in case it is not already defined):

filename sem 'h:\courses\spida\sem';
data saas;
  infile sem('saas.txt');
  input bfne sias sps biq asi appeval owpreoc spas saas;
run;

This code, typed directly into the syntax window, reads the data file into LISREL:

CFA example 2
DA NI=9 NO=104 MA=CM
LA
bfne sias sps biq asi appeval owpreoc spas saas
RA FI='h:\courses\spida\sem\saas.txt'
SE
1 2 3 4 5 6 7 8 9/

Again, the first line gives a title for the analysis. The DA line again specifies the number of observed indicators in the data file (NI=9) and the number of observations (NO=104), and tells LISREL to analyze covariances (MA=CM). Also as before, the LA line signifies that labels for the observed variables are given below it. Now, the RA line ("RA" stands for raw data) gives the location and name of the data file to be analyzed ("FI" stands for file). Next, the SE command ("SE" stands for select) tells LISREL that a list of numbers will follow on the next line. These numbers tell LISREL which variables to include in the analysis, as well as the order of the variables in terms of the model specification. More on this tomorrow.
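With raw data and MA=CM, the program first computes a sample covariance matrix from the raw scores and then fits the model to it, and the SE line picks and orders the variables. The Python sketch below mimics those two steps under stated assumptions: it uses the usual N - 1 denominator, and the helpers covariance_matrix and select, along with the tiny data set, are hypothetical illustrations rather than LISREL internals.

```python
def covariance_matrix(rows):
    """Sample covariance matrix (N - 1 denominator) from raw data rows."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows)
            cov[i][j] = cov[j][i] = s / (n - 1)
    return cov


def select(cov, order):
    """Keep and reorder variables by 1-based position, like an SE line."""
    idx = [k - 1 for k in order]
    return [[cov[i][j] for j in idx] for i in idx]


# HYPOTHETICAL toy data: 3 participants, 2 variables.
toy = [[1, 2], [2, 4], [3, 6]]
cov = covariance_matrix(toy)
print(cov)                  # [[1.0, 2.0], [2.0, 4.0]]
print(select(cov, [2, 1]))  # variables swapped: [[4.0, 2.0], [2.0, 1.0]]
```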