Structural Equation Modeling, 17:703–711, 2010
Copyright Taylor & Francis Group, LLC
ISSN: 1070-5511 print/1532-8007 online
DOI: 10.1080/10705511.2010.510074

SOFTWARE REVIEW

The Specification of Causal Models with Tetrad IV: A Review

J. A. Landsheer
Utrecht University

Tetrad IV is a program designed for the specification of causal models. It is specifically designed to search for causal relations, but also offers the possibility to estimate the parameters of a structural equation model. It offers a remarkable graphical user interface, which facilitates building, evaluating, and searching for causal models. The search algorithms make it possible to find alternatives for existing models, as well as to find new models when a theoretical directive is lacking. This is illustrated by the detection of a causal model for longitudinal data, which is a viable alternative for a latent growth model.

Tetrad IV is in various respects a remarkable program for causal model evaluation. It is a freeware program that can be used for structural equation modeling (SEM), but beyond that it has various search algorithms that help to discover alternative models. Tetrad IV is the fourth implementation of various routines directed at the search for causal explanations of statistical data. Glymour developed this program for respecification of linear latent variable models in collaboration with his students, Kelly, Scheines, and Spirtes (Glymour, Scheines, Spirtes, & Kelly, 1987). In the 1990s, Scheines, Spirtes, and Glymour worked on the causal interpretation of Bayesian networks, which is also integrated in the Tetrad program (Spirtes, Glymour, & Scheines, 2001). The basic interest of the creators of Tetrad is in search strategies for model specification (Scheines, Spirtes, Glymour, Meek, & Richardson, 1998).
These search routines make use of tetrads, the relations between sets of four covariance elements, defined as $\tau_{ghij} = \sigma_{gh}\sigma_{ij} - \sigma_{gi}\sigma_{hj}$. When a tetrad is equal to zero, it is called a vanishing tetrad. Glymour et al. (1987) used these vanishing tetrads and vanishing partial correlations to explore the covariance matrix. This approach has an advantage in that it does not require numerical minimization and therefore avoids the convergence problem. In addition, a model that is not identified by its parameters can still be identified in terms of vanishing tetrads. These search algorithms can be considered a form of exploratory model building.

Correspondence should be addressed to J. A. Landsheer, Department of Methodology and Statistics, Faculty of Social Sciences, Utrecht University, P. O. Box 80140, Utrecht NL3508TC, Netherlands. E-mail: J.A.Landsheer@uu.nl

The program, which is implemented in Java, is freely available from http://www.phil.cmu.edu/projects/tetrad. In this article, Tetrad IV version 4.3.8-14 is evaluated. All references to Tetrad concern this version.

This review is focused on the usability of the Tetrad IV program for model evaluation. The latent growth model has been used as an example of model analysis. The example data (Appendix) concern the longitudinal development of adolescent aggression from age 12 onward, collected in five annual waves between 2001 and 2006 (Akse, Hale, Engels, Raaijmakers, & Meeus, 2004). These data have also been used in a search session for causal relations. All illustrative figures are directly obtained from Tetrad.

THE PROGRAM

Interface

Tetrad's interface defines several tools that allow for a gradual approach to statistical tasks. Figure 1 shows the workbench with the tools that are available in Tetrad. The tools for the analysis of a structural equation model are Data, for loading data from a file or entering data; Graph, which can be used to create an instance of a SEM graph; Parametric Model, which takes a Graph as input and allows the user to set starting values for the different parameters or to fix a parameter to its starting value; and Estimator, which takes a Parametric Model and Data as input and immediately produces a full information maximum likelihood estimate of the parameters. The graphical interface displays these tools and allows their manipulation and connection. After placing an instance of each tool on the workbench, the user connects the tasks in logical order.
When the user hovers over a box with the mouse, the program helpfully indicates the possibilities for this logical order. Furthermore, a template can be used for commonly connected tasks, including the estimation of a structural equation model from loaded data. The user can double-click on each of the boxes to perform the tasks necessary for data analysis. Another important tool for data analysis is Manipulated Data, which takes tabular data as input and allows for different forms of data manipulation, including imputation of missing values, variable conversion, and the creation of new variables. After manipulation, the data can be used in the same way as the originally loaded data.

FIGURE 1 The workbench of Tetrad IV, provided with the tools for structural equation modeling.

Documentation

The Web site makes various documentation available for Tetrad. First, a brief tutorial allows a user with some statistical experience to work with the program within a few hours. It works through several examples that show the use of the various tools in Tetrad. A user quickly learns that, besides SEM, the program offers several search tools that are not available in other SEM programs. Second, a manual offers some technical documentation and shows the possibilities of Tetrad IV in a more or less systematic way. Third, the program itself offers a help file with further explanation of all tools and their use. Fourth, the Web site offers several references for the study of the theoretical background of the program (Scheines et al., 1998; Spirtes et al., 2001). Getting acquainted with all the possibilities that Tetrad offers for the search for alternative models takes considerably more than a few hours.

Search Algorithms

A unique feature of Tetrad is that it offers a variety of algorithms to search for causal explanations of a body of data. Among other goals, it can search for causal path models, for latent structure, for unobserved confounders in a measurement model, or for linear feedback relations. In this search process, Tetrad can incorporate prior background knowledge, such as temporal order or clusters. Such knowledge can be used to resolve ambiguous directionality, represented by a symmetric arrow (covariance), as well as to indicate independence between variables. The search process yields not a single model but a class of models, each of which fits the data equally well. The search result can suggest alternative models, which often fit the data as well as a commonly used model. Ting (1998) provided a critical discussion of the background of Tetrad, and Wood (1998) offered some insights into how the Tetrad approach can be used to obtain a better understanding of structural equation models.
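The vanishing partial correlations mentioned in the introduction are the kind of constraint that such searches evaluate. The following is a minimal sketch of how one such constraint could be tested, not a reproduction of Tetrad's own routines. The function names and the toy covariance matrix are hypothetical, and the use of Fisher's z transformation is an assumption: it is one common test for a vanishing partial correlation, and the review does not state which test Tetrad applies.

```python
import math


def correlation(cov, i, j):
    """Pearson correlation between variables i and j from a covariance matrix."""
    return cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])


def partial_correlation(cov, i, j, given=()):
    """Partial correlation of i and j given the variables in `given`,
    computed with the standard recursive formula."""
    if not given:
        return correlation(cov, i, j)
    k, rest = given[0], tuple(given[1:])
    r_ij = partial_correlation(cov, i, j, rest)
    r_ik = partial_correlation(cov, i, k, rest)
    r_jk = partial_correlation(cov, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def fisher_z_p_value(r, n, n_given):
    """Two-sided p value for H0: partial correlation = 0 (Fisher z, assumed test)."""
    z = math.sqrt(n - n_given - 3) * math.atanh(r)
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical covariance matrix for a common-cause structure X <- Z -> Y,
# with variables ordered X, Y, Z. The numbers are made up for illustration.
TOY_COV = [
    [2.0, 1.0, 1.0],
    [1.0, 2.0, 1.0],
    [1.0, 1.0, 1.0],
]

if __name__ == "__main__":
    n = 498  # sample size borrowed from the review; the matrix itself is invented
    r_marginal = partial_correlation(TOY_COV, 0, 1)
    r_given_z = partial_correlation(TOY_COV, 0, 1, (2,))
    print("r(X, Y)     =", r_marginal, "p =", fisher_z_p_value(r_marginal, n, 0))
    print("r(X, Y | Z) =", r_given_z, "p =", fisher_z_p_value(r_given_z, n, 1))
```

In this toy example X and Y are correlated marginally, but the correlation vanishes once Z is conditioned on, which is the pattern a constraint-based search would read as the absence of a direct edge between X and Y.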
A SEM SESSION

The user soon discovers that data can be loaded from a delimited text file (Appendix). Figure 1 shows the tools that are necessary for SEM. Tetrad IV allows for two types of data: a covariance matrix or tabular raw data with a line of data for each respondent. Tetrad does not permit loading means alongside a covariance matrix, presumably because means are not relevant for the search routines that are the primary interest of the authors.

A longitudinal data set with five measurements is used, on which a latent growth model is tested. The latent growth model (Figure 2) is a very popular model for longitudinal data, which has shown an excellent fit in a wide range of research problems. The latent growth model is a class of its own: because all causal relations are fixed, the basic model offers no structural explanation of the data and, as such, carries no causal meaning. It does provide a linear model of longitudinal data. A remarkable feature is that the model in many cases offers an acceptable fit, often without the need for correlated errors between measurements. Commonly, the basic model does not include a measurement model. Although a measurement model could augment the latent growth model, it is seldom applied. Creating measurement models for longitudinal data is a challenge, as measurement invariance is a critical requirement (Meredith, 1993; Pitts, West, & Tein, 1996) but is not easily accomplished (Wicherts et al., 2004). Naturally, the theoretical basis for the latent growth model varies greatly between the different areas in which it has been applied, even though developmental theories that assume a strictly linear relation between time and the developmental variable in focus are scarce.

The basic question here is how the Tetrad program can help to build and evaluate models. After loading the data, the next step is to define the model in a graph (Figure 2).
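To make concrete what "all causal relations are fixed" implies, the following sketch computes the covariance matrix implied by a basic linear latent growth model. It assumes the conventional coding, which the review does not spell out: every measurement loads 1 on the intercept factor I and 0, 1, 2, 3, 4 on the slope factor S. The function name and all parameter values are hypothetical and are not estimates from the example data.

```python
def lgm_implied_covariance(var_i, var_s, cov_is, residual_vars, times=None):
    """Model-implied covariance matrix of a linear latent growth model.

    Assumed (conventional) specification: y_t = I + time_t * S + e_t, with
    intercept loadings fixed to 1 and slope loadings fixed to `times`.
    """
    n = len(residual_vars)
    if times is None:
        times = list(range(n))  # assumed slope loadings 0, 1, ..., n - 1
    sigma = [[0.0] * n for _ in range(n)]
    for a, t_a in enumerate(times):
        for b, t_b in enumerate(times):
            # cov(y_a, y_b) = var(I) + (t_a + t_b) * cov(I, S) + t_a * t_b * var(S)
            sigma[a][b] = var_i + (t_a + t_b) * cov_is + t_a * t_b * var_s
        sigma[a][a] += residual_vars[a]  # residual variance only on the diagonal
    return sigma


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    implied = lgm_implied_covariance(
        var_i=1.0, var_s=0.5, cov_is=-0.2, residual_vars=[0.1] * 5
    )
    for row in implied:
        print(["%.2f" % value for value in row])
```

Only the two factor variances, their covariance, and the residual variances are free under this specification, so the implied variance of each measurement is a quadratic function of its time score.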
Tetrad uses the variable names of the observed variables as these are provided in the input file. Drawing the model is straightforward and works without a glitch. The interface is easy to use for a Windows user. The next step is the provision of parameters in the Parametric Model. In both the Graph and the Parametric Model, the user can choose whether or not to show the errors. Double-clicking the parameter name opens a window in which each parameter can be fixed or provided with a suitable starting value. Oddly enough, the covariance between I and S is only provided with a name when the residuals are not shown. Only then can Tetrad's dialog window be opened to fix this parameter or to provide a starting value.

FIGURE 2 A latent growth model for five measurements.

After all information is provided, double-clicking the Estimator quickly opens a window with the results, perhaps too quickly, because the first result was a bad estimation of the parameters, for which Tetrad offered no fit estimates. Other programs such as Mx (Neale, Boker, Xie, & Maes, 1999) or LISREL (Jöreskog & Sörbom, 1996) had no problem estimating this model on these data. The covariance between I and S is often important for the estimation of latent growth models, and after a better starting value (zero) was provided for this covariance, Tetrad produced the same results as Mx and LISREL. The estimated model (Figure 3) has a sufficient fit, $\chi^2(7, N = 498) = 9.67$, $p = .21$; BIC = $-33.80$. Inspection of the results shows that the estimates provided are standard deviations instead of variances. However, when a fixed value is provided for a residual, the variance has to be entered, whereas the square root of this value is shown as the fixed estimate. In the current version of Tetrad IV, it is not possible to apply equality restrictions or restrictions that are dependent on other parameters.

FIGURE 3 Parameter estimates.

A SEARCH SESSION

When it is assumed that the true causal hypothesis is acyclic and that there is no hidden common cause between any two variables in the data set, the PC search algorithm can be used to search for causal paths. When applied to our data set, this gives a graph that identifies a class of viable models (Figure 4). None of the edges (lines between variables) are directed, which indicates that no asymmetric (causal) relations are detected.
This means that the longitudinal data do not reveal any temporal ordering. Furthermore, we can see that X1 and X5 have two edges, whereas the other variables have three edges. This different status of the first and last measurements is no surprise. We can now apply our prior knowledge concerning the temporal ordering of the variables and the relationships between the variables, which results in Figure 5.

FIGURE 4 Search result from the PC algorithm.

Figure 5 shows a regular autoregressive pattern with a first-order autoregression coefficient that indicates the relation between the same variable at time $t$ and time $t + 1$, and a second-order coefficient between time $t$ and time $t + 2$. The autoregressive process is described by coefficients that reflect the amount of stability in the relative rank order of individuals between two or more points in time. Other model parameters are the (unexplained) variances of the repeated measures. The interpretation of the variance of the first measurement, which is generally the largest, is fundamentally different from that of the disturbance variances or unexplained variances of subsequent measurements, as the first measurement is not explained by preceding measurements. Autoregressive coefficients do not necessarily represent a simple developmental process, and the growth or change curve resulting from a set of autoregressive effects could be complex. Autoregressive coefficients are often used in the prediction of future developments, especially when the coefficients are stable.

Parameter estimates are shown in Figure 6. The autoregressive model does not fit very well, $\chi^2(3, N = 498) = 9.68$, $p = .02$; BIC = $-8.95$. As the second-order autoregressive coefficients are very close to each other, we can assume that these are stable. Imposing an equality restriction (which is possible here by fixing the coefficients to the same constant) improves the fit, $\chi^2(6, N = 498) = 9.71$, $p = .14$; BIC = $-27.55$. If we ignore the first measurement because it has no predictors, we see that the first-order autoregressive coefficients increase, whereas the second-order coefficients are stable. Also, the residuals decrease. This indicates increased predictability as the respondents grow older.
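The BIC values reported for the three fitted models can be checked against the reported chi-square statistics, degrees of freedom, and sample size. The sketch below assumes that BIC is defined as the chi-square statistic minus the degrees of freedom times ln(N). This definition is inferred from the reported numbers and is not stated in the review. Under it, all three values are negative, with lower values indicating a better trade-off between fit and parsimony. The function name and the model labels are hypothetical.

```python
import math


def bic_from_chi_square(chi_square, df, n):
    """BIC under the assumed definition chi_square - df * ln(n)."""
    return chi_square - df * math.log(n)


# (chi-square, degrees of freedom) as reported in the review, with N = 498.
REPORTED_FITS = {
    "latent growth model": (9.67, 7),
    "second-order autoregressive model": (9.68, 3),
    "autoregressive model, second-order coefficients fixed": (9.71, 6),
}

if __name__ == "__main__":
    for label, (chi_square, df) in REPORTED_FITS.items():
        print(label, round(bic_from_chi_square(chi_square, df, 498), 2))
```

By this criterion the latent growth model and the restricted autoregressive model are both preferred to the unrestricted autoregressive model, which uses more free parameters for nearly the same chi-square value.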
For the prediction of future events, this autoregressive model might have advantages over the latent growth model. In the latent growth model, the modeled variances of the repeated measures grow beyond the point in time where minimum variance is expected. The latent growth model predicts further growth and is asymptotically unstable as a result. Its applicability for the prediction of future events beyond the points of measurement is therefore limited.

FIGURE 5 Applying prior knowledge concerning temporal ordering.

FIGURE 6 Parameter estimates of a second-order autoregression model.

The variance of the observed variables can be decomposed into unexplained residual variance and variance that can be explained by previous measurements. Neither the autoregressive model nor the latent growth model is, as such, closely linked to substantive developmental theory. Also, both models lack a measurement model.

Liu (2009) offered a more complex example of the use of the Tetrad III search algorithms Purify and Mimbuild, which are also available in Tetrad IV. The goal of Purify is to obtain a measurement model, and the goal of Mimbuild is to discover causal models among latent variables, each of which is measured by multiple indicators. Liu used Tetrad to detect conditional independencies among the latent constructs and in this way rejected two of three hypotheses based on the technology acceptance model. He concluded that the application of the search algorithms available in Tetrad constitutes a contribution in its own right and, more specifically, helps to detect spurious relations. Liu found that the technology acceptance model can be validated when tested in isolation, but fails within a larger nomological network that includes relevant antecedents and consequences.

Haughton, Kamis, and Scholten (2006) compared Tetrad with two other packages for the detection of causal structure in data sets. They applied the programs to a data set of 30 variables. They concluded that Tetrad is helpful for finding indirect predictors that might not appear at all in a traditional regression. They recommended Tetrad for its strong functionality, modularity, and flexibility. They also noted that there is a significant learning curve to using all the tools and options properly.

CONCLUSIONS

Tetrad offers several possibilities for model building and evaluation not found in other programs.
The layered graphical interface facilitates building a model, and the search routines facilitate the definition of alternative models. Tetrad's interface relies on graphical representation more heavily than the interfaces of most other statistical programs. It allows for the stepwise definition of several statistical tasks and permits saving an integrated session of all performed statistical tasks that belong together. In this way, it documents both the performed tasks and their logical order in an entirely graphical way. The display of the various graphs is attractive.

As an instrument for SEM, Tetrad IV leaves a few things to be desired. With several models, the program seems to get stuck in a local minimum. As the provision of adequate starting values is an art in itself, this is not always a problem that a user can easily solve. The possibilities to constrain the parameters are limited. Clearly, the primary interest of the authors of Tetrad is the search routines. Although the wish of the authors to provide a way to estimate the search results in the same program is understandable, it seems advisable to integrate an existing engine, such as the Mx engine, which is also available in the public domain and has proven itself in many studies. The other program glitches that have been mentioned should be easier to solve.

Although the reservation of Ting (1998) against data mining techniques can be applied to Tetrad, the basic principle that a single data set can be successfully fitted by various models makes it worthwhile to look for alternative models. The results of Tetrad can be viewed as a way to let the data speak without further thought, but they can also be viewed as a way to open the mind to sensible alternatives. The second-order autoregressive model that resulted from the search routine shows that looking for alternative models might be sensible.

In conclusion, it is worthwhile to experiment with Tetrad. A look at its graphical interface is recommended for anyone who is involved in the development of graphical interfaces for statistical analysis. The search routines are of interest to anyone who is interested in alternative models.

REFERENCES

Akse, J., Hale, W. W., Engels, R. C., Raaijmakers, Q. A., & Meeus, W. H. (2004). Personality, perceived parental rejection and problem behavior in adolescence. Social Psychiatry and Psychiatric Epidemiology, 39(12), 980–988.
Glymour, C., Scheines, R., Spirtes, P., & Kelly, K. (1987). Discovering causal structure: Artificial intelligence, philosophy of science, and statistical modeling. New York: Academic.
Haughton, D., Kamis, A., & Scholten, P. (2006). A review of three directed acyclic graphs software packages. The American Statistician, 60(3), 272–286.
Jöreskog, K. G., & Sörbom, D. (1996). LISREL 8: User's reference guide. Chicago: Scientific Software.
Liu, L. (2009). Technology acceptance model: A replicated test using TETRAD. International Journal of Intelligent Systems, 24(12), 1230–1242.
Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58(4), 525–543.
Neale, M. C., Boker, S. M., Xie, G., & Maes, H. M. (1999). Mx statistical modeling. Richmond: Virginia Commonwealth University. Retrieved September 27, 1999, from ftp://ftp.vcu.edu/pub/mx/doc/mxman2.doc
Pitts, S. C., West, S. G., & Tein, J. Y. (1996). Longitudinal measurement models in evaluation research: Examining stability and change. Evaluation and Program Planning, 19(4), 333–350.
Scheines, R., Spirtes, P., Glymour, C., Meek, C., & Richardson, T. (1998). The TETRAD project: Constraint based aids to causal model specification. Multivariate Behavioral Research, 33(1), 65–117.
Spirtes, P., Glymour, C., & Scheines, R. (2001). Causation, prediction, and search (2nd ed.). Cambridge, MA: MIT Press.
Ting, K. (1998). The TETRAD approach to model respecification. Multivariate Behavioral Research, 33(1), 157–164.
Wicherts, J. M., Dolan, C. V., Hessen, D. J., Oosterveld, P., van Baal, G. C., Boomsma, D. I., et al. (2004). Are intelligence tests measurement invariant over time? Investigating the nature of the Flynn effect. Intelligence, 32(5), 509–537.
Wood, P. K. (1998). Response to the TETRAD Project: Constraint based aids to causal model specification. Multivariate Behavioral Research, 33(1), 149–156.
APPENDIX

COVARIANCE MATRIX IN TETRAD FORMAT

498
X1 X2 X3 X4 X5
.336618
.147144 .287906
.130140 .143475 .280884
.098720 .122244 .147060 .231527
.094766 .109852 .134042 .148953 .239601
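As a small illustration of the tetrad differences defined in the introduction, the following sketch computes them directly from the covariance matrix above. It is not Tetrad's own code, and the function names are hypothetical. Deciding whether a tetrad difference vanishes requires a statistical test that accounts for sampling error; that test is deliberately omitted here, so the output shows raw differences only.

```python
from itertools import combinations

# Lower triangle of the Appendix covariance matrix (N = 498), variables X1..X5.
LOWER_TRIANGLE = [
    [0.336618],
    [0.147144, 0.287906],
    [0.130140, 0.143475, 0.280884],
    [0.098720, 0.122244, 0.147060, 0.231527],
    [0.094766, 0.109852, 0.134042, 0.148953, 0.239601],
]
NAMES = ["X1", "X2", "X3", "X4", "X5"]


def to_full_matrix(lower):
    """Expand a lower-triangular listing into a full symmetric matrix."""
    n = len(lower)
    full = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(lower):
        for j, value in enumerate(row):
            full[i][j] = full[j][i] = value
    return full


def tetrad_difference(cov, g, h, i, j):
    """tau_ghij = sigma_gh * sigma_ij - sigma_gi * sigma_hj."""
    return cov[g][h] * cov[i][j] - cov[g][i] * cov[h][j]


def all_tetrad_differences(cov):
    """Three tetrad differences for every set of four distinct variables."""
    result = {}
    for g, h, i, j in combinations(range(len(cov)), 4):
        for order in ((g, h, i, j), (g, h, j, i), (g, i, j, h)):
            result[order] = tetrad_difference(cov, *order)
    return result


if __name__ == "__main__":
    covariance = to_full_matrix(LOWER_TRIANGLE)
    for order, value in all_tetrad_differences(covariance).items():
        label = "".join(NAMES[k] for k in order)
        print("tau(%s) = %.6f" % (label, value))
```

For each set of four variables only two of the three differences are algebraically independent, because the first equals the second minus the third.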